
This is to certify that the
dissertation entitled

MINIMUM DISTANCE
MEASUREMENT ERRORS MODEL FITTING

presented by

WEIXING SONG

has been accepted towards fulﬁllment
of the requirements for the

Ph.D. degree in Department of Statistics and
Probability

 

 


Major Professor’s Signature

Date

 

MSU is an Afﬁrmative Action/Equal Opportunity Institution

 


Minimum Distance

Measurement Errors Model Fitting
By

Weixing Song

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements

for the degree of
DOCTOR OF PHILOSOPHY
Department of Statistics and Probability

2006

ABSTRACT

Minimum Distance

Measurement Errors Model Fitting
By

Weixing Song

This work proposes a class of minimum distance tests for ﬁtting a parametric
regression model to a class of regression functions in the measurement error models.
In the errors-in-variables model case, these tests are based on certain minimized L2
distances between a nonparametric regression function estimator and a deconvolution
kernel estimator of the regression function of the parametric model being ﬁtted. In
the Berkson model case, these tests are based on certain minimized distances between
a nonparametric regression function estimator and the parametric model being ﬁtted.
The thesis establishes the asymptotic normality of the proposed test statistics under
the null hypothesis and that of the corresponding minimum distance estimators in
both cases. Simulation studies show that the testing procedures are quite satisfactory
in the preservation of the finite sample level and in terms of a power comparison.

ACKNOWLEDGMENTS

I wish to express my sincere gratitude to my advisor Professor Hira L. Koul
for his invaluable guidance. It would have been impossible for me to ﬁnish this
dissertation without the uncountable number of hours he spent sharing his knowledge
and discussing various ideas throughout the study. His general way of thinking about
statistical problems and of solving them will help my future research.

I would also like to thank Professors Sarat Dass, R. V. Ramamoorthi and Richard
Baillie for serving on my guidance committee. Many thanks to Professors Connie Page
and Dennis Gilliland for their advice when I was at the consulting service. Finally,
I would like to thank the Department of Statistics and Probability for offering me
graduate assistantships, and the Graduate School for offering me the Dissertation
Completion Fellowship so that I could complete my graduate studies at the Michigan
State University.

Last but not least, I would like to give my thanks to my mother Fuying Song
and my wife Xiuqin Bai, whose patient love enabled me to complete this work.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Introduction
1 Minimum Distance Errors-in-Variables Model Fitting
    1.1 Introduction
    1.2 Assumptions
    1.3 Asymptotic normality of $\hat\theta_n$
    1.4 Asymptotic normality of the minimized distance
    1.5 Simulations
    1.6 Discussion
        1.6.1 Sample Size Allocation
        1.6.2 General Errors-in-Variables Model Fitting
2 Minimum Distance Berkson Model Fitting
    2.1 Introduction
    2.2 Assumptions
    2.3 The Consistency of $\theta_n^*$ and $\hat\theta_n$
    2.4 Asymptotic Distribution of $\hat\theta_n$
    2.5 Asymptotic Distribution of the Minimized Distance
    2.6 Simulations
BIBLIOGRAPHY

LIST OF TABLES

1.1 Mean and MSE of $\hat\theta_n$, d = 1, q = 1, Double Exponential
1.2 Levels and powers of the MD test, d = 1, q = 1, Double Exponential
1.3 Mean and MSE of $\hat\theta_n$, d = 1, q = 1, Normal
1.4 Levels and powers of the MD test, d = 1, q = 1, Normal
1.5 Mean and MSE of $\hat\theta_n$, d = 1, q = 2, Double Exponential
1.6 Levels and powers of the MD test, d = 1, q = 2, Double Exponential
1.7 Mean and MSE of $\hat\theta_n$, d = 2, q = 2, Double Exponential
1.8 Levels and powers of the MD test, d = 2, q = 2, Double Exponential
1.9 $n_1 = n_2$, d = 1, q = 1, Double Exponential
1.10 Same sample, d = 1, q = 1, Double Exponential
1.11 $n_1 = n_2$, d = 1, q = 1, Normal
1.12 Same sample, d = 1, q = 1, Normal
1.13 Same sample, d = 2, q = 2, Double Exponential
2.1 Mean and MSE of $\hat\theta_n$, d = 1, q = 1
2.2 Levels and powers of the MD test, d = 1, q = 1
2.3 Mean and MSE of $\hat\theta_n$, d = 2, q = 2
2.4 Levels and powers of the MD test, d = 2, q = 2

LIST OF FIGURES

2.1 Comparison Plot

Introduction

In the classical regression model, a set of variables, say a $d$-dimensional predictor
$X$, is used to explain the response $Y$, a one-dimensional real random variable, and
both $X$ and $Y$ are observable. In real applications, however, the predictor $X$ is not
always observable. To deal with statistical inference in this case, statisticians
proposed the so-called measurement errors model, in which a surrogate of $X$, say $Z$,
is observed. The main issue in measurement errors models is how to investigate the
statistical relationship between $X$ and $Y$ based on data from $Z$ and $Y$.

Based on the stochastic structure between $X$ and $Z$, measurement errors models can
usually be divided into two classes: the error models, which include the
errors-in-variables models, in which $Z = X + u$, and the error calibration models, in
which $Z = \alpha + \beta X + u$; and the Berkson models (or regression calibration
models), in which $X = Z + \eta$. Here $u$ and $\eta$ are measurement errors. See Carroll,
Ruppert and Stefanski (1995) for the details of this classification.
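The difference between the two error structures can be made concrete with a small simulation. The following is only a sketch under assumptions that are not part of the thesis: the latent or observed design is standard normal, both error terms are normal, and the function names `simulate_eiv` and `simulate_berkson` are hypothetical.

```python
import numpy as np

def simulate_eiv(n, mu, sigma_u=0.5, sigma_eps=0.3, seed=0):
    """Errors-in-variables model: Z = X + u, Y = mu(X) + eps; only (Z, Y) is returned."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                    # unobservable predictor (assumed N(0, 1))
    u = rng.normal(scale=sigma_u, size=n)     # measurement error, independent of X
    z = x + u                                 # observed surrogate
    y = mu(x) + rng.normal(scale=sigma_eps, size=n)
    return z, y

def simulate_berkson(n, mu, sigma_eta=0.5, sigma_eps=0.3, seed=0):
    """Berkson model: X = Z + eta, Y = mu(X) + eps; only (Z, Y) is returned."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                    # observed surrogate (assumed N(0, 1))
    eta = rng.normal(scale=sigma_eta, size=n) # measurement error, independent of Z
    x = z + eta                               # unobservable predictor
    y = mu(x) + rng.normal(scale=sigma_eps, size=n)
    return z, y
```

With a linear $\mu$, a least squares fit of $Y$ on $Z$ recovers the slope in the Berkson case but shrinks it towards zero in the errors-in-variables case, which is one reason the two classes are treated separately.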

Measurement errors regression models have received continuing attention in the
statistical literature over the last century. For literature reviews on
errors-in-variables models, see Gleser (1981), Anderson (1984), Fuller (1987), Bickel
and Ritov (1987), Carroll and Hall (1988), Fan (1991a, 1991b), Fan and Truong
(1993), Carroll, Ruppert and Stefanski (1995), and the references therein. For the
Berkson models, see Rudemo et al. (1989), Huwang and Huang (2000), and
Wang (2003, 2004). Most of the existing literature has focused on the estimation
problem. The model checking, or lack-of-fit testing, problem has not been discussed
thoroughly; only sporadic results on this topic can be found in the literature.

In the errors-in-variables model case, Fuller (1987) discusses a graphical method
for lack-of-fit testing of a linear errors-in-variables regression model. Carroll and
Spiegelman (1992) consider graphical and numerical diagnostics for nonlinearity and
heteroscedasticity in linear regression models with errors in variables. Zhu, Song and
Cui (2003) considered lack-of-fit testing in polynomial regression with errors
in variables and constructed a residual-based test of score type, but their method has
two limitations. First, the predictor is one-dimensional and the regression function
under the null hypothesis is polynomial; second, the density function of the predictor
is assumed to be known, which is generally unrealistic in real applications. Cheng
and Kukush (2004) also addressed the same problem based on so-called adjusted least
squares estimators. Few results on errors-in-variables regression model checking
that do not impose strict conditions are available in the literature.

The Berkson model has a relatively simpler structure than the errors-in-variables model
in that the density function of the predictor can be estimated by the usual kernel
method. As for the errors-in-variables models, there is a vast literature on the
estimation of the parameters, but no discussion of the model checking problem
for this case.

Many interesting and profound results, on the other hand, are available for the
regression model checking problem in the absence of errors in the predictor; see, e.g.,
Eubank and Spiegelman (1990), An and Cheng (1991), Eubank and Hart (1992, 1993),
Hart (1997), Stute (1996), Zheng (1996), Stute, Thies, and Zhu (1998), Khmaladze
and Koul (2004), among others. For a general discussion of model fitting in
the classical regression case, a good reference is Hart (1997). Stute (1996) and Stute,
Thies, and Zhu (1998) constructed test statistics based on certain marked empirical
processes. Their simulation results show that the testing procedure is quite satisfactory,
but it can only be used in the one-dimensional case. The recent paper
of Koul and Ni (2004) (K-N) uses the minimum distance (MD) ideas developed by
Wolfowitz (1953, 1954, 1957) to propose tests of lack-of-fit for the regression model
without errors in variables. Their work can deal with the multidimensional
case. In a finite sample comparison of these tests with some other existing tests, they
noted that a member of this class preserves the asymptotic level and has very high
power against some alternatives compared to some other existing lack-of-fit tests.
Our work extends this methodology to the measurement errors model setup.

To be specific, in the classical regression setup, let $X$, $Y$ be random variables,
with $X$ being $d$-dimensional and $Y$ one-dimensional with $E|Y| < \infty$. Let
$\mu(x) = E(Y|X = x)$ denote the regression function, and let
$\{m_\theta(\cdot) : \theta \in \Theta\}$, $\Theta \subset \mathbb{R}^q$, $q \ge 1$,
be a given parametric model. The statistical problem of interest here is to test the
following hypothesis:

$$H_0 : \mu(x) = m_{\theta_0}(x), \text{ for some } \theta_0 \in \Theta \text{ and all } x \in I, \quad \text{vs.} \quad H_1 : H_0 \text{ is not true}, \tag{1}$$

where $I$ is a compact subset of $\mathbb{R}^d$, $d \ge 1$, based on a random sample
$(X_i, Y_i)$, $1 \le i \le n$, from the distribution of $(X, Y)$. In the K-N paper, the
design is random but observable. Let $K$, $K^*$ be two possibly different density kernels
on $[-1, 1]^d$. For any bandwidth sequence $h$, let

$$K_h(x) := \frac{1}{h^d} K\Big(\frac{x}{h}\Big), \quad K_{hi}(x) := K_h(x - X_i), \quad \hat f_{Xh}(x) := \frac{1}{n}\sum_{i=1}^n K^*_{hi}(x),$$

where $K^*_{hi}$ is defined like $K_{hi}$ with $K$ replaced by $K^*$. Note that
$\hat f_{Xh}$ is the kernel estimator of $f_X$ corresponding to the kernel $K^*$. K-N
define

$$T_n(\theta) := \int_I \Big[\frac{1}{n}\sum_{i=1}^n K_{hi}(x)\big(Y_i - m_\theta(X_i)\big)\Big]^2 \hat f_{Xw}^{-2}(x)\, dG(x), \quad \theta \in \Theta, \tag{2}$$

and $\hat\theta_n := \arg\min_{\theta \in \Theta} T_n(\theta)$, where
$w = w_n \sim (\log n/n)^{1/(d+4)}$, and $h = h_n$ is a bandwidth depending on the
sample size $n$. For some crucial technical reasons, different bandwidths $h$ and $w$
are chosen. The integrating measure $G$ is a $\sigma$-finite measure on $\mathbb{R}^d$,
which may be chosen so that the test has good power. Under the null hypothesis and some
regularity conditions, the consistency and asymptotic normality of $\hat\theta_n$ are
proved. K-N also showed that the asymptotic null distribution of
$n h^{d/2}\hat\Gamma_n^{-1/2}(T_n(\hat\theta_n) - \hat C_n)$ is standard normal, where

$$\hat C_n := \frac{1}{n^2}\sum_{i=1}^n \int_I K_{hi}^2(x)\,\hat\varepsilon_i^2\, \hat f_{Xw}^{-2}(x)\, dG(x), \quad \hat\varepsilon_i := Y_i - m_{\hat\theta_n}(X_i),$$

$$\hat\Gamma_n := \frac{2h^d}{n^2}\sum_{i \ne j}\Big(\int_I K_{hi}(x) K_{hj}(x)\,\hat\varepsilon_i \hat\varepsilon_j\, \hat f_{Xw}^{-2}(x)\, dG(x)\Big)^2.$$

Thus, the test that rejects the null hypothesis whenever
$n h^{d/2}\hat\Gamma_n^{-1/2}|T_n(\hat\theta_n) - \hat C_n| > z_{\alpha/2}$ is of
asymptotic size $\alpha$, where $z_\alpha$ is the $100(1 - \alpha)$th percentile of the
standard normal distribution. Unlike other related papers, K-N do not need the null
regression function to be twice continuously differentiable in the parameter vector. The
asymptotic normality of $\hat\theta_n$ and $T_n(\hat\theta_n)$ was made feasible by
using different bandwidths for the estimation of the numerator and the denominator
in the nonparametric regression function estimator. A consequence of the above
asymptotic normality result is that, at least for large samples, one need not use any
resampling method to implement these tests.
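To make the K-N distance in (2) concrete, the following is a minimal numerical sketch for $d = 1$. It is not taken from K-N or from this thesis: the Epanechnikov kernel is used for both $K$ and $K^*$, $G$ is assumed to be Lebesgue measure on the interval spanned by an equally spaced grid, the integral is approximated by the trapezoid rule, and the names `epanechnikov` and `kn_distance` are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """A density kernel supported on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def kn_distance(theta, x, y, m, h, w, grid):
    """Trapezoid-rule approximation of T_n(theta) in (2) for d = 1.

    m(x, theta) is the parametric regression function; `grid` must be equally
    spaced and lie where the design density estimate is positive.
    """
    diff = grid[:, None] - x[None, :]
    k_h = epanechnikov(diff / h) / h                  # K_{hi} evaluated on the grid
    f_w = (epanechnikov(diff / w) / w).mean(axis=1)   # density estimate with bandwidth w
    resid = y - m(x, theta)                           # Y_i - m_theta(X_i)
    num = (k_h * resid[None, :]).mean(axis=1)         # kernel-weighted average residual
    integrand = (num / f_w) ** 2
    step = grid[1] - grid[0]
    return step * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
```

The minimum distance estimator would then be obtained by minimizing `kn_distance` over `theta` with any numerical optimizer; for a null model that is linear in the parameter, the minimizer has a closed form.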

In this thesis, we will discuss, in the measurement errors setup, how to develop
testing procedures for the following hypothesis:

$$H_0 : \mu(x) = m_{\theta_0}(x), \text{ for some } \theta_0 \in \Theta \text{ and all } x, \quad \text{vs.} \quad H_1 : H_0 \text{ is not true}. \tag{3}$$

From K-N's procedure, we know that if we want to use the minimum distance
method, a kernel-type regression estimator must be constructed, and this in turn
requires an estimator of the density function of the predictor. This
is not a problem in the classical regression case, where the predictor is observable. In
the measurement errors models, however, the predictor $X$ is not observable, so
K-N's minimum distance method needs some modification before it can be applied.

We now briefly describe the modification needed for the errors-in-variables model.
It consists of two steps:

Step 1. Hypothesis Change: The hypothesis (3) concerns the regression
function $\mu(x)$, which depends on the true predictor, but the true predictor is not
observable. By recognizing that $\nu(z) := E(Y|Z = z) = E(\mu(X)|Z = z)$, we consider
the new regression model $Y = \nu(Z) + \zeta$, where the error $\zeta$ is uncorrelated
with $Z$ and has mean 0. The problem of testing $H_0$ can be transformed into that of
testing $\nu(z) = \nu_{\theta_0}(z)$, where $\nu_\theta(z) := E(m_\theta(X)|Z = z)$.
Since $Z$ is observable, we can construct a classical kernel estimator of the new
regression function $\nu(z)$.

Step 2. Deconvolution Kernel Density Estimator: The minimum distance
will be constructed from the classical kernel estimator of $\nu(z)$ and a suitable
estimator of $\nu_\theta(z) := E(m_\theta(X)|Z = z)$ under the null hypothesis. Note
that, under the null hypothesis,

$$\nu_\theta(z) = \frac{\int m_\theta(x) f_X(x) f_u(z - x)\,dx}{\int f_X(x) f_u(z - x)\,dx}.$$

To estimate this quantity for given $\theta$, we need an estimator of $f_X$. In this
connection the deconvolution kernel density estimators are found to be useful. Putting
the deconvolution kernel density estimator of $f_X$ into the above expression, we
construct the deconvolution kernel estimator of $\nu_\theta(z)$.

To obtain the asymptotic distribution of the test statistic, we need to consider
the asymptotic behavior of the deconvolution kernel estimator of $\nu_\theta(z)$.
Although we extend the result of Stefanski and Carroll (1991) to a more general case,
the convergence rate of the deconvolution kernel estimator is still slower than that of
the classical kernel estimator. This causes some difficulty in proving the technical
results. To overcome this difficulty, we adopt a sample splitting technique. The sample
splitting scheme required in the proof is not so realistic in certain cases, but the
simulation results show that the test statistic behaves well even if we do not follow
the sample splitting scheme.

In the Berkson model case, things become relatively easy. From $X = Z + \eta$
and the independence between $Z$ and $\eta$, $E(Y|Z)$ is known under the null
hypothesis except for the parameter. After changing the hypothesis, the testing
procedure can be developed in a way similar to that in the errors-in-variables model
case.

This thesis is organized as follows. Chapter 1 discusses model fitting for the
errors-in-variables model in which the regression function under the null hypothesis
is linear in the parameters. Theorem 1.3.1 gives the asymptotic distribution of the
underlying parameter estimator. Theorem 1.4.1 gives the asymptotic distribution of the
minimized distance under the null hypothesis. A test statistic can therefore be
constructed based on this theorem. Several simulations are presented in Section 1.5.
Some problems related to the sample allocation scheme and the results about the general
errors-in-variables models are discussed in the subsequent section.

Chapter 2 discusses minimum distance model fitting in the Berkson model. Corollary ??
and Theorem 2.3.1 state the consistency of the underlying parameter estimators.
Theorem 2.4.1 and Theorem 2.5.1 give the asymptotic distributions of the parameter
estimator and of the minimized distance under the null hypothesis. A test statistic
can therefore be constructed based on Theorem 2.5.1. Simulations conducted in
Section 2.6 show that the testing procedure is quite satisfactory.

CHAPTER 1

Minimum Distance

Errors-in-Variables Model Fitting

1.1 Introduction

The findings in the classical regression case motivate one to look for tests of
lack-of-fit in the presence of errors in variables based on the above minimized
distances. Since the predictors in errors-in-variables models are unobservable, the
above procedures clearly need some modification. To be specific, in the
errors-in-variables regression model of interest here, one observes $(Z_i, Y_i)$ obeying
the model

$$Y_i = \mu(X_i) + \varepsilon_i, \quad Z_i = X_i + u_i, \quad 1 \le i \le n, \tag{1.1}$$

where the $X_i$'s are the unobservable $d$-dimensional random design variables. We
additionally assume that $(X_i, \varepsilon_i, u_i, Z_i, Y_i)$, $i = 1, 2, \cdots, n$,
are i.i.d. copies of $(X, \varepsilon, u, Z, Y)$. The r.v.'s $(X, u, \varepsilon)$ are
assumed to be mutually independent, with $u$ being $d$-dimensional and $\varepsilon$
being one-dimensional, $E(\varepsilon) = 0$, $E(u) = 0$, and their marginal
distributions having densities $f_X$, $f_u$, and $f_\varepsilon$, respectively. For the
sake of identifiability, the density $f_u$ is assumed to be known. This is a common and
standard assumption in the literature on errors-in-variables regression models. The
densities $f_X$ and $f_\varepsilon$ need not be known. The problem of interest in this
chapter is to develop tests for the hypothesis

$$H_0 : \mu(x) = \theta_0^T r(x), \text{ for some } \theta_0 \in \mathbb{R}^q, \quad \text{vs.} \quad H_1 : H_0 \text{ is not true}, \tag{1.2}$$

in the model (1.1).

One way of constructing tests here is to first recognize that the independence of $X$
and $\varepsilon$ and $E(\varepsilon) = 0$ imply that
$\nu(z) := E(Y|Z = z) = E(\mu(X)|Z = z)$. Thus one can consider the new regression model
$Y = \nu(Z) + \zeta$, where the conditional expectation $E(\zeta|Z) = 0$, hence $\zeta$
is uncorrelated with $Z$. The problem of testing $H_0$ is now transformed into that of
testing $\nu(z) = \nu_{\theta_0}(z)$, where $\nu_\theta(z) := \theta^T E(r(X)|Z = z)$.
Note that for any $z$ for which $f_Z(z) > 0$, we have

$$\nu_\theta(z) = \frac{\theta^T \int r(x) f_X(x) f_u(z - x)\,dx}{\int f_X(x) f_u(z - x)\,dx}. \tag{1.3}$$

From (1.3) one sees that if $f_X$ is known, then $f_Z$ is known, and hence $\nu_\theta$
is known except for $\theta$. Let $Q(z) := E(r(X)|Z = z)$. Therefore a modification of
K-N's procedure in this case is as follows. Define

$$T_n(\theta) := \int \Big[\frac{1}{n}\sum_{i=1}^n K_{hi}(z)\big(Y_i - \theta^T Q(Z_i)\big)\Big]^2 \frac{dG(z)}{f_Z^2(z)}, \quad \theta \in \mathbb{R}^q,$$

$$\tilde\theta_n := \arg\min_{\theta \in \mathbb{R}^q} T_n(\theta).$$
Here $h$ is a bandwidth depending only on $n$, and $K_{hi}(z)$ is redefined as
$K((z - Z_i)/h)/h^d$ for any kernel function $K$ and bandwidth $h$. Then we may use
$\tilde\theta_n$ to estimate $\theta$, and construct the test based on
$T_n(\tilde\theta_n)$. Unfortunately, $f_X$ is generally not known, and hence $f_Z$ and
$Q(z)$ are unknown. This makes the above procedure infeasible. To construct the test
statistic, one needs estimators of $f_Z$ and $Q(z)$. In this connection the
deconvolution kernel density estimators are found to be useful.

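For any density $L$ on $\mathbb{R}^d$, let $\phi_L$ denote its characteristic function
and define

$$L_h(x) := \frac{1}{(2\pi)^d}\int_{\mathbb{R}^d} \exp(-\mathrm{i}\, t^T x)\, \frac{\phi_L(t)}{\phi_u(t/h)}\,dt, \quad \mathrm{i} := (-1)^{1/2}, \quad x \in \mathbb{R}^d, \tag{1.4}$$

$$\hat f_{Xh}(x) := \frac{1}{n h^d}\sum_{i=1}^n L_h\Big(\frac{x - Z_i}{h}\Big), \quad x \in \mathbb{R}^d.$$

The above $L_h$ is called the deconvolution kernel function, while $\hat f_{Xh}$ is
called the deconvolution kernel density estimator of $f_X$, cf. Masry (1993), Carroll,
Ruppert and Stefanski (1995).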
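For a concrete instance of (1.4), take $d = 1$, let $L$ be the standard normal density, and let $u$ be double exponential with variance $\sigma_u^2$, so that $1/\phi_u(t/h) = 1 + \sigma_u^2 t^2/(2h^2)$. Then (1.4) reduces to $L_h(x) = L(x) - \{\sigma_u^2/(2h^2)\}L''(x) = L(x)\{1 - \sigma_u^2 (x^2 - 1)/(2h^2)\}$. The sketch below implements this special case only; the choice of kernel, the function names, and the simulated data in the usage are assumptions made for illustration.

```python
import numpy as np

def deconv_kernel_laplace(x, h, sigma_u):
    """L_h(x) of (1.4) for d = 1, a standard normal kernel L and double
    exponential measurement error with variance sigma_u**2."""
    phi = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)   # L(x)
    return phi * (1.0 - (sigma_u ** 2 / (2.0 * h ** 2)) * (x ** 2 - 1.0))

def deconv_density(x_grid, z, h, sigma_u):
    """Deconvolution kernel density estimate of f_X on x_grid from surrogates z."""
    t = (x_grid[:, None] - z[None, :]) / h
    return deconv_kernel_laplace(t, h, sigma_u).mean(axis=1) / h
```

Since $\int (x^2 - 1)L(x)\,dx = 0$, this $L_h$ integrates to one, although it takes negative values in the tails, so the resulting estimate integrates to one but need not be nonnegative everywhere.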

Note that $Q(z)$ is equal to $R(z)/f_Z(z)$, where
$R(z) = \int r(x) f_X(x) f_u(z - x)\,dx$ and $f_Z(z) = \int f_X(x) f_u(z - x)\,dx$. Then
one can estimate $Q(z)$ by

$$\hat Q_n(z) = \hat R_n(z)/\hat f_{Zh}(z), \tag{1.5}$$

where $\hat R_n(z) = \int r(x) \hat f_{Xh}(x) f_u(z - x)\,dx$ and
$\hat f_{Zh}(z) = \int \hat f_{Xh}(x) f_u(z - x)\,dx$. At this point, it is worth
mentioning that, by the definition of $L_h$ and a direct calculation, one can show that
$\hat f_{Zh}$ is nothing but the classical kernel estimator of $f_Z$ with kernel $L$
and bandwidth $h$. That is, $\hat f_{Zh}(z) = \sum_{i=1}^n L((z - Z_i)/h)/(n h^d)$.

Our proposed inference procedures will be based on the analogs of $T_n$ in which
$Q(z)$ is replaced by the above estimator $\hat Q_n$, and $f_Z$ is replaced by a kernel
estimator.

A very important question related to the above procedure is the following: Are
the two hypotheses, $H_{10} : \mu(x) = \theta_0^T r(x)$, for some $\theta_0$ and all
$x$, and $H_{20} : \nu(z) = \theta_0^T E(r(X)|Z = z)$, for some $\theta_0$ and all $z$,
equivalent? The answer is negative in general, but in some special cases these two
hypotheses are equivalent. See the general discussion in Section 1.6.2.

The large sample behavior of the deconvolution kernel density estimators strongly
depends on the smoothness of the distribution of the measurement error $u$. Using the
terms from Fan and Truong (1993), a distribution is called ordinary smooth if the
tails of its characteristic function decay to 0 at an algebraic rate; it is called super
smooth if its characteristic function has tails approaching 0 exponentially fast. As
Masry (1993) showed, the local and global rates of convergence of the sequences of
deconvolution kernel density estimators are slower than those of the classical kernel
density estimators. Moreover, these convergence rates are much slower in the super
smooth cases than in the ordinary smooth cases. But Stefanski and Carroll (1991)
show that in the one-dimensional case with $r(x) = x$, for estimating $E(X|Z = z)$
by $\hat Q_n(z)$, faster rates are obtainable. For example, in the case of normal
measurement error, the mean squared error rate of convergence of $\hat f_{Xh}$ to $f_X$
is of order $(\log n)^{-2}$, while the convergence rate of $\hat Q_n(z)$ to
$E(X|Z = z)$ is of order $n^{-4/7}$. Even so, this rate is still slower than the mean
squared error convergence rate of the classical kernel estimator, which is $n^{-4/5}$
in the one-dimensional case. This creates extra difficulty when considering the
asymptotic behavior of the analogs of the corresponding MD estimators and test
statistics. In fact, if we base the estimators of $f_X$, hence of $Q(z)$, and the other
quantities on the same sample, the consistency of the corresponding MD estimator is
still available, but its asymptotic normality and that of the corresponding MD test
statistic may not be obtained. We overcome this difficulty by using different
bandwidths and by splitting the full sample, say $S$, with sample size $n$ into two
subsamples, $S_1$ with size $n_1$ and $S_2$ with size $n_2$, then using the subsample
$S_2$ to estimate $f_X$, hence $Q(z)$, and the subsample $S_1$ to estimate the
remaining quantities. The sample size allocation scheme is stated in Section 1.2.

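To be precise, let

$$\hat f_{Zh_2}(z) := \frac{1}{n_1}\sum_{i=1}^{n_1} K^*_{h_2 i}(z), \quad \hat f_{Xw_1}(x) := \frac{1}{n_2 w_1^d}\sum_{j=n_1+1}^{n} L_{w_1}\Big(\frac{x - Z_j}{w_1}\Big),$$

$$\hat R_{n_2}(z) := \int r(x) \hat f_{Xw_1}(x) f_u(z - x)\,dx, \quad \hat f_{Zw_2}(z) := \int \hat f_{Xw_2}(x) f_u(z - x)\,dx,$$

$$\hat Q_{n_2}(z) := \hat R_{n_2}(z)/\hat f_{Zw_2}(z),$$

where $h_1$, $h_2$ depend on $n_1$, and $w_1$ and $w_2$ depend on $n_2$. Now define

$$M_n(\theta) := \int \Big[\frac{1}{n_1}\sum_{i=1}^{n_1} K_{h_1 i}(z)\big(Y_i - \theta^T \hat Q_{n_2}(Z_i)\big)\Big]^2 d\hat\psi_{h_2}(z), \quad \hat\theta_n := \arg\inf_{\theta \in \mathbb{R}^q} M_n(\theta), \tag{1.6}$$

where $d\hat\psi_{h_2}(z) := dG(z)/\hat f_{Zh_2}^2(z)$, as in (1.7).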

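Then we may use $\hat\theta_n$ to estimate $\theta$, and construct the test statistic
through $M_n(\hat\theta_n)$. We first prove the consistency of $\hat\theta_n$ for
$\theta$, then the asymptotic normality of $\sqrt{n_1}(\hat\theta_n - \theta_0)$.
Finally, let

$$\hat\zeta_i := Y_i - \hat\theta_n^T \hat Q_{n_2}(Z_i), \quad \hat C_n := \frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\,\hat\zeta_i^2\, d\hat\psi_{h_2}(z),$$

$$\hat\Gamma_n := \frac{2 h_1^d}{n_1^2}\sum_{i \ne j}\Big(\int K_{h_1 i}(z) K_{h_1 j}(z)\,\hat\zeta_i \hat\zeta_j\, d\hat\psi_{h_2}(z)\Big)^2, \quad d\hat\psi_{h_2}(z) := \frac{dG(z)}{\hat f_{Zh_2}^2(z)}. \tag{1.7}$$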

We prove that the asymptotic null distribution of the normalized test statistic
$n_1 h_1^{d/2}\hat\Gamma_n^{-1/2}(M_n(\hat\theta_n) - \hat C_n)$ is standard normal.
Consequently, the test that rejects $H_0$ whenever
$n_1 h_1^{d/2}\hat\Gamma_n^{-1/2}|M_n(\hat\theta_n) - \hat C_n| > z_{\alpha/2}$ is of
asymptotic size $\alpha$.
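Once $M_n(\hat\theta_n)$, $\hat C_n$ and $\hat\Gamma_n$ have been computed, the decision rule above is a one-line calculation. The sketch below only packages that rule; the function name `md_test` and its argument names are hypothetical, and the three input quantities are assumed to have been computed elsewhere.

```python
from statistics import NormalDist

def md_test(m_n, c_n, gamma_n, n1, h1, d=1, alpha=0.05):
    """Return the normalized MD statistic and the decision (True means reject H0).

    m_n, c_n, gamma_n stand for M_n(theta_hat), C_hat_n and Gamma_hat_n.
    """
    stat = n1 * h1 ** (d / 2) * (m_n - c_n) / gamma_n ** 0.5
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}
    return stat, abs(stat) > z_crit
```

For example, `md_test(0.012, 0.010, 0.04, n1=200, h1=0.25)` gives a normalized statistic of about 1, so $H_0$ is not rejected at the 5% level.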

This chapter is organized as follows. Section 1.2 states the needed assumptions.
A multidimensional extension of Lemma A1 in Stefanski and Carroll (1991) is also
proved there, together with some other needed results. Section 1.3 proves the
asymptotic normality of the MD estimator. The asymptotic normality of the MD test
statistic is discussed in Section 1.4. Section 1.5 includes some results from a finite
sample simulation study.

In the sequel, $c$ will denote a generic finite positive constant whose value depends
on the context. For any vector $b$, $b^T$ denotes its transpose. For any function $f$,
we will use $\dot f$, $\ddot f$ to denote the first and the second derivative with
respect to its argument. The convergence in distribution is denoted by
$\Longrightarrow_d$, $N_d(a, B)$ stands for the $d$-dimensional normal distribution with
mean vector $a$ and covariance matrix $B$, and $E_{S_1}$ denotes the conditional
expectation given the subsample $S_1$. The integration with respect to the $G$-measure
is understood to be over the compact set $I$.
1.2 Assumptions

This section first states the various conditions needed in this chapter. About the
errors, the underlying design and the integrating $\sigma$-finite measure $G$, we
assume the following:

(e1) The random variables $\{(Z_i, Y_i) : Z_i \in \mathbb{R}^d, Y_i \in \mathbb{R}, i = 1, 2, \cdots, n\}$
from (1.1) are i.i.d. with the conditional expectation $\nu(z) = E(Y|Z = z)$ satisfying
$\int \nu^2\, dG < \infty$, where $G$ is a $\sigma$-finite measure on $\mathbb{R}^d$.

(e2) $0 < \sigma_\varepsilon^2 = E\varepsilon^2 < \infty$, $E\|r(X)\|^2 < \infty$, and the
function $\delta^2(z) = E[(\theta_0^T r(X) - \theta_0^T Q(Z))^2 | Z = z]$ is a.s. ($G$)
continuous on $I$.

(e3) $E|\varepsilon|^{2+\delta} < \infty$, $E\|r(X)\|^{2+\delta} < \infty$, for some $\delta > 0$.

(e4) $E|\varepsilon|^4 < \infty$, $E\|r(X)\|^4 < \infty$.

(u) The density function $f_u$ is continuous and $\int |\phi_u(t)|\,dt < \infty$.

(f1) The density $f_X$ of the $d$-dimensional r.v. $X$ and all its possible first and
second derivatives are continuous and bounded.

(f2) For some $\delta_0 > 0$, the density $f_Z$ is bounded below on the compact subset
$I_{\delta_0}$ of $\mathbb{R}^d$, where for any $\delta > 0$

$$I_\delta = \Big\{y \in \mathbb{R}^d : \max_{1 \le j \le d}|y_j - z_j| \le \delta,\ y = (y_1, \cdots, y_d)^T,\ z = (z_1, \cdots, z_d)^T,\ z \in I\Big\}. \tag{1.8}$$

(g) $G$ has a continuous Lebesgue density $g$.

About the null model we need to assume the following:

(m1) There exists a positive continuous function $J(z)$ such that, as $\|t\| \to \infty$,

$$\frac{\big\|\int (r(z - x) - r(z)) \exp(-\mathrm{i}\, t^T x) f_u(x)\,dx\big\|}{\|t\|^\alpha\, |\phi_u(t)|} \le J(z)$$

for some $\alpha \ge 0$ and all $z \in \mathbb{R}^d$, and $EJ^2(Z) < \infty$.

(m2) $E\|r(Z)\|^2 < \infty$, $EI^2(Z) < \infty$, where $I(z) = \int \|r(x)\| f_u(z - x)\,dx$.
About the kernel functions, we assume:

($\ell$) The kernel function $L$ is a density, symmetric around the origin, with
$\|t\|^\alpha |\phi_L(t)| < \infty$ for all $t \in \mathbb{R}^d$. Moreover,
$\int \|v\|^2 L(v)\,dv < \infty$ and $\int \|t\|^\beta |\phi_L(t)|\,dt < \infty$ for
$\beta = 0, \alpha$, with $\alpha$ as in (m1).

About the bandwidths and sample size we need to assume the following:

(n) With $n$ denoting the sample size, let $n_1$, $n_2$ be two positive integers such that
$n = n_1 + n_2$, $n_2 = [n_1^b]$, $b > 1 + (d + 2\alpha)/4$, where $\alpha$ is as in (m1).

(h1) $h_1 \sim n_1^{-a}$, where $0 < a < \min(1/(2d), 4/(d(d + 4)))$.

(h2) $h_2 = c_1 (\log(n_1)/n_1)^{1/(d+4)}$.

(w1) $w_1 = n_2^{-1/(d+4+2\alpha)}$.

(w2) $w_2 = c_2 (\log(n_2)/n_2)^{1/(d+4)}$.
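The sample size allocation (n) and the bandwidth conditions (h1), (h2), (w1), (w2) can be collected into a small helper. This is a sketch only: the function name `allocate`, the default exponent $b$ (taken slightly above the bound in (n)), the default $a$ (taken as 0.9 times the bound in (h1)) and the constants $c_1 = c_2 = 1$ are illustrative assumptions, not choices made in the thesis.

```python
import math

def allocate(n1, d=1, alpha=0.0, b=None, a=None, c1=1.0, c2=1.0):
    """Subsample size n2 and bandwidths implied by (n), (h1), (h2), (w1), (w2)."""
    b_min = 1.0 + (d + 2.0 * alpha) / 4.0
    if b is None:
        b = b_min + 0.01                         # any b > 1 + (d + 2*alpha)/4
    if b <= b_min:
        raise ValueError("b must exceed 1 + (d + 2*alpha)/4")
    a_max = min(1.0 / (2.0 * d), 4.0 / (d * (d + 4.0)))
    if a is None:
        a = 0.9 * a_max                          # any 0 < a < a_max
    if not 0.0 < a < a_max:
        raise ValueError("a must lie in (0, min(1/(2d), 4/(d(d+4))))")
    n2 = math.floor(n1 ** b)                     # n2 = [n1^b]
    return {
        "n1": n1,
        "n2": n2,
        "n": n1 + n2,
        "h1": n1 ** (-a),
        "h2": c1 * (math.log(n1) / n1) ** (1.0 / (d + 4)),
        "w1": n2 ** (-1.0 / (d + 4 + 2.0 * alpha)),
        "w2": c2 * (math.log(n2) / n2) ** (1.0 / (d + 4)),
    }
```

For instance, with $d = 1$, $\alpha = 0$ and $b = 1.26$, a first subsample of size $n_1 = 100$ requires $n_2 = 331$, which illustrates how quickly the second subsample must grow relative to the first.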

Assumption (m1) is not so strict as it appears. Some commonly used regression
functions such as polynomial and exponential functions indeed satisfy this assumption
as shown below.

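Example 1: Suppose $d = q$, $r(x) = x$, and $u \sim N_d(0, \Sigma_u)$. Then,

$$\frac{\big\|\int (r(z - x) - r(z)) \exp(-\mathrm{i}\, t^T x) f_u(x)\,dx\big\|}{|\phi_u(t)|} = \Big\|\int x \exp(-\mathrm{i}\, t^T x) f_u(x)\,dx\Big\| \cdot \exp(t^T \Sigma_u t/2) = \Big\|\frac{\partial \phi_u(t)}{\partial t}\Big\| \cdot \exp(t^T \Sigma_u t/2) \le c\|t\|,$$

where the constant $c$ depends only on $\Sigma_u$. Hence (m1) holds with $\alpha = 1$ and
$J(z) = c$.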

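Example 2: Suppose $d = q = 1$, $r(x) = x$, and $u$ has a double exponential distribution
with mean 0 and variance $\sigma_u^2$. In this case, $\phi_u(t) = 1/(1 + \sigma_u^2 t^2/2)$ and

$$\frac{\big|\int (r(z - x) - r(z)) \exp(-\mathrm{i}\, t x) f_u(x)\,dx\big|}{|\phi_u(t)|} = \frac{\big|\int x \exp(-\mathrm{i}\, t x) f_u(x)\,dx\big|}{|\phi_u(t)|} = \Big|\frac{\dot\phi_u(t)}{\phi_u(t)}\Big| = \frac{\sigma_u^2 |t|}{1 + \sigma_u^2 t^2/2} \le c,$$

with $c$ now depending only on $\sigma_u^2$. Hence, as $|t| \to \infty$, (m1) holds for
$\alpha = 0$ and $J(z) = c$.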
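The ratio in Example 2 can be checked numerically. The sketch below evaluates the left-hand side of (m1) with $\alpha = 0$ for $r(x) = x$ and double exponential $u$ by a trapezoid rule, so that it can be compared with $\sigma_u^2|t|/(1 + \sigma_u^2 t^2/2)$, the value of $|\dot\phi_u(t)/\phi_u(t)|$ obtained by differentiating $\phi_u$. The function name, the integration range and the grid size are assumptions chosen for illustration.

```python
import numpy as np

def m1_ratio_laplace(t, sigma_u, half_width=40.0, n_grid=400_001):
    """|int x exp(-i t x) f_u(x) dx| / |phi_u(t)| for double exponential u
    with mean 0 and variance sigma_u**2 (trapezoid rule on a symmetric grid)."""
    scale = sigma_u / np.sqrt(2.0)                   # Laplace scale: variance = 2 * scale**2
    x = np.linspace(-half_width, half_width, n_grid) # odd grid size puts a node at the kink x = 0
    f_u = np.exp(-np.abs(x) / scale) / (2.0 * scale)
    integrand = x * np.exp(-1j * t * x) * f_u
    dx = x[1] - x[0]
    integral = 0.5 * dx * np.sum(integrand[1:] + integrand[:-1])
    phi_u = 1.0 / (1.0 + sigma_u ** 2 * t ** 2 / 2.0)
    return abs(integral) / phi_u
```

The closed form attains its maximum $\sigma_u/\sqrt{2}$ at $|t| = \sqrt{2}/\sigma_u$ and decreases to 0 as $|t| \to \infty$, which is consistent with taking $\alpha = 0$ and a constant $J$.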

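Example 3: Suppose $d = q = 1$, $r(x) = e^x$, and $u \sim N(0, \sigma_u^2)$. Then

$$\Big|\int (r(z - x) - r(z)) \exp(-\mathrm{i}\, t x) f_u(x)\,dx\Big| = \Big|\int (e^{z - x} - e^z) \exp(-\mathrm{i}\, t x) f_u(x)\,dx\Big| \le e^z\Big[\Big|\int e^{-x} e^{-\mathrm{i}\, t x} f_u(x)\,dx\Big| + |\phi_u(t)|\Big] \le c\, e^z |\phi_u(t)|,$$

where $c$ is some positive number depending only on $\sigma_u^2$. Hence (m1) holds for
$\alpha = 0$ and $J(z) = c e^z$.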
Next, we give some general preliminaries needed in the proofs below.
In the case of $r(x) = x$ and $d = 1$, Stefanski and Carroll (1991) obtain the following
results:

$$\{E\hat R_{n_2}(z) - R(z)\}^2 \le c\, w_1^4 (1 + z^2), \quad \mathrm{Var}(\hat R_{n_2}(z)) \le c\, n_2^{-1} w_1^{-(1+2\alpha)} (1 + z^2),$$

for all $z \in \mathbb{R}$, under the assumptions (i) $f_X$, $\dot f_X$ and $\ddot f_X$
are continuous and bounded; (ii) $\int |\phi_u(t)|\,dt < \infty$; (iii) as
$|t| \to \infty$, $|\dot\phi_u(t)/\phi_u(t)| = O(|t|^\alpha)$, for some $\alpha \ge 0$;
(iv) $n_2 \to \infty$, and $w_1 \to 0$. The kernel function $L$ used in the
deconvolution estimator is assumed to be four times continuously differentiable,
compactly supported and real valued. The following lemma is a multidimensional
extension of the above results, which will be used frequently in the sequel.
the above results which will be frequently used in the sequel.
Lemma 1.2.1 Suppose $d \ge 1$, and (f1), (u), (m1), ($\ell$) hold. Then for any
$z \in \mathbb{R}^d$,

$$\|E\hat R_{n_2}(z) - R(z)\|^2 \le c\, w_1^4 I^2(z),$$

$$E\|\hat R_{n_2}(z) - E\hat R_{n_2}(z)\|^2 \le \frac{c}{n_2 w_1^{d+2\alpha}}\big(J^2(z) + \|r(z)\|^2\big),$$

where $I(z)$ is as in (m2), $J(z)$ is as in (m1), and $c$ is a constant not depending
on $z$, $n_2$ and $w_1$.

Proof. A direct calculation yields that for any $x \in \mathbb{R}^d$,
$E\hat f_{Xw_1}(x) = \int L(v) f_X(x - v w_1)\,dv$. By assumption (f1), there exists a
vector $a(x, v)$ such that $f_X(x - w_1 v)$ has a Taylor expansion up to the second order,
$f_X(x - w_1 v) = f_X(x) - w_1 v^T \dot f_X(x) + w_1^2 v^T \ddot f_X(a(x, v)) v/2$. Hence

$$E\hat R_{n_2}(z) = \iint r(x) L(v) f_X(x - w_1 v) f_u(z - x)\,dv\,dx$$
$$= \iint r(x) L(v) f_X(x) f_u(z - x)\,dv\,dx - w_1 \iint r(x) L(v) v^T \dot f_X(x) f_u(z - x)\,dv\,dx$$
$$\quad + \frac{w_1^2}{2}\iint r(x) L(v) v^T \ddot f_X(a(x, v)) v f_u(z - x)\,dv\,dx.$$

Assumption ($\ell$) implies that the first term is
$\int r(x) f_X(x) f_u(z - x)\,dx = R(z)$, the second term vanishes because
$\int v L(v)\,dv = 0$, while the norm of the third term is bounded above by
$c\, w_1^2 I(z)$ by assumption (f1), where $c$ is a positive constant depending only on
the kernel function $L$. Therefore, the first claim in the lemma holds.

Note that $\hat R_{n_2}(z) - E\hat R_{n_2}(z)$ is an average of i.i.d. centered random
vectors. A routine calculation shows that

$$E\|\hat R_{n_2}(z) - E\hat R_{n_2}(z)\|^2 \le \frac{1}{n_2 w_1^{2d}}\, E\Big\|\int r(x) L_{w_1}((x - Z)/w_1) f_u(z - x)\,dx\Big\|^2$$

by using the fact that the variance is bounded above by the second moment. Let
$D(t, z) = \int r(x) f_u(z - x) \exp(-\mathrm{i}\, t^T x)\,dx$. By the definition of the
deconvolution kernel $L_h$, it follows that

$$\frac{1}{w_1^{2d}}\, E\Big\|\int r(x) L_{w_1}((x - Z)/w_1) f_u(z - x)\,dx\Big\|^2 = \iint \frac{D^T(t, z) D(s, z)\, \phi_L(t w_1) \phi_L(s w_1)\, \phi_X(t + s) \phi_u(t + s)}{(2\pi)^{2d}\, \phi_u(t) \phi_u(s)}\,dt\,ds.$$

By a change of variable,
$D(t, z) = \exp(-\mathrm{i}\, t^T z) \int r(z - x) f_u(x) \exp(\mathrm{i}\, t^T x)\,dx$.
Adding and subtracting $r(z)$ from $r(z - x)$ in the integrand, we obtain

$$D(t, z) = \exp(-\mathrm{i}\, t^T z)\, \phi_u(t)\Big[r(z) + \frac{\int (r(z - x) - r(z)) f_u(x) \exp(\mathrm{i}\, t^T x)\,dx}{\phi_u(t)}\Big].$$
for all 2 6 Rd. Hence EIIRn2(2) — ERnZ(2)||2 is bounded above by

2112(2)“2
712
+W / f ((12110 +1121101121221122221112212+ 21ldtds

2
CJ 2(2)//lltlla||8|lal¢LUw1)<1‘>L(5w1)cDuU+ Slldtds-

n

/ |¢L(tw1)gbL(sw1)ou(t + s)|dtds (1.9)

+

 

Note that for any m, p = 0 or 02, from assumption (5), we have
/ 112(1P11211m12L(221mus-2112.12 .2 31122.12

_._ _2d . . ,
3 21p m / (Itupusu’"12L(112L(s122((2221/21112222

18

 

 

222;”""‘2” / (1211”"12u211122((2+21121112222

(.wi'p—m—2d/||s||m|oL(s)|(/Iou((t +s)/z_1.'1)|dt)ds

= 1‘ ”‘m‘d / 1121(’"'I2L(21122 - / 12.121122 = 1‘22”“"’"“.

| /\

The second claim in the lemma follows from (1.9) by using the above inequality. [3
By the usual bias and variance decomposition of the mean squared error, the following
inequality is a direct consequence of Lemma 1.2.1:

$$E\|\hat R_{n_2}(z) - R(z)\|^2 \le c\, w_1^4 I^2(z) + \frac{c}{n_2 w_1^{d+2\alpha}}\big(J^2(z) + \|r(z)\|^2\big).$$

If the bandwidth $w_1$ is chosen by assumption (w1), then

$$E\|\hat R_{n_2}(z) - R(z)\|^2 \le c\, n_2^{-\frac{4}{d+2\alpha+4}}\big(I^2(z) + J^2(z) + \|r(z)\|^2\big). \tag{1.10}$$

In the sequel, we will write

$$\gamma(z) := I^2(z) + J^2(z) + \|r(z)\|^2. \tag{1.11}$$

The following lemma will be used repeatedly. It appears, along with its proof,
as Theorem 2.2 part (2) in Bosq (1998). We state the lemma for a sample size $n$ and
a bandwidth $h$; they may be replaced by $n_1$ or $n_2$, and $h_2$ or $w_2$, according
to the context.

Lemma 1.2.2 Let $\hat f_Z$ be the kernel estimator with a kernel $K$ that satisfies a
Lipschitz condition and with bandwidth $h$. If $f_Z$ is twice continuously
differentiable, and the bandwidth $h$ is chosen to be $c_n (\log(n)/n)^{1/(d+4)}$, where
$c_n \to c > 0$, then

$$(\log_k n)^{-1} (n/\log n)^{2/(d+4)} \sup_{z \in I} |\hat f_Z(z) - f_Z(z)| \to 0 \quad \text{a.s.}$$

for any positive integer $k$ and compact set $I$.

1.3 Asymptotic normality of $\hat\theta_n$

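Recall the definitions in (1.6). Because the null model is linear in $\theta$, the minimizer
$\hat\theta_n$ has an explicit form obtained by setting the derivative of $M_n(\theta)$ with respect to
$\theta$ equal to 0, which gives the equation

$$\int \frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\hat Q_{n_2}(Z_i)\cdot\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\hat Q_{n_2}^T(Z_i)\,d\hat\psi_{h_2}(z)\;\hat\theta_n
= \int \frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)Y_i\cdot\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\hat Q_{n_2}(Z_i)\,d\hat\psi_{h_2}(z).$$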
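Adding and subtracting $\theta_0^T\hat Q_{n_2}(Z_i)$ from $Y_i$ and rearranging, we see that
$\hat\theta_n$ satisfies the following equation:

$$\int \frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\hat Q_{n_2}(Z_i)\cdot\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\hat Q_{n_2}^T(Z_i)\,d\hat\psi_{h_2}(z)\cdot(\hat\theta_n - \theta_0)
= \int \frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\big(Y_i - \theta_0^T\hat Q_{n_2}(Z_i)\big)\cdot\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\hat Q_{n_2}(Z_i)\,d\hat\psi_{h_2}(z). \qquad (1.12)$$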
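Because the normal equation is linear in the parameter, the estimator can be computed by solving one small linear system. The following is a minimal sketch for $d = 1$, not the author's implementation: the function names, the quadrature grid, the quadrature weights standing in for the integrating measure, and the stand-in values used for the estimated $Q$ are all assumptions made for illustration.

```python
import numpy as np

def epanechnikov(u):
    """Kernel K(u) = 0.75 * (1 - u^2) on [-1, 1]."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def md_estimate(Z, Y, Q_hat, grid, weights, h1):
    """Solve the normal equation A @ theta = b for the MD estimator (d = 1).

    Z, Y    : arrays of shape (n1,)
    Q_hat   : array of shape (n1, q); row i is an estimate of Q(Z_i)
    grid    : quadrature points z_1 < ... < z_m (hypothetical grid)
    weights : quadrature weights approximating the integrating measure on the grid
    """
    # K_mat[m, i] = K((z_m - Z_i) / h1) / h1
    K_mat = epanechnikov((grid[:, None] - Z[None, :]) / h1) / h1
    n1 = len(Z)
    mu = K_mat @ Q_hat / n1               # (m, q): kernel average of Q_hat at each z
    ky = K_mat @ Y / n1                   # (m,):   kernel average of Y at each z
    A = (mu * weights[:, None]).T @ mu    # integral of mu mu^T
    b = (mu * weights[:, None]).T @ ky    # integral of (kernel average of Y) * mu
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
n1 = 200
Z = rng.uniform(-1.0, 1.0, n1)
Q_hat = np.column_stack([Z, Z**2])        # stand-in for the estimated Q, illustration only
theta_true = np.array([1.0, 2.0])
Y = Q_hat @ theta_true                    # noise-free responses
grid = np.linspace(-0.8, 0.8, 161)
weights = np.full(grid.shape, grid[1] - grid[0])
theta_hat = md_estimate(Z, Y, Q_hat, grid, weights, h1=0.3)
```

With noise-free responses the right hand side equals the matrix times the true parameter, so the solver should return the true parameter up to rounding error; this makes a convenient sanity check before adding noise or a real estimate of $Q$.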

The above explicit relation between $\hat\theta_n - \theta_0$ and the other quantities allows us, in contrast
to K-N, to investigate the asymptotic distribution of $\hat\theta_n$ without proving
consistency in advance. Most importantly, the separation of $\hat\theta_n$ from $\hat R_{n_2}(z)$ makes
a conditional expectation argument in the following proofs relatively easy. To keep
the exposition concise, let

$$U_{n_1}(z) := \frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\big(Y_i - \theta_0^TQ(Z_i)\big), \qquad
D_n(z) := \frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\big(\hat Q_{n_2}(Z_i) - Q(Z_i)\big), \qquad (1.13)$$

$$\mu_{n_1}(z) := \frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)Q(Z_i), \qquad
\Delta_{n_1}(z) := \frac{1}{\hat f^2_{Zh_2}(z)} - \frac{1}{f_Z^2(z)}.$$

The main result in this section is the following theorem:

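Theorem 1.3.1 Suppose $H_0$, (e1), (e2), (e3), (u), (f1), (f2), (m1), (m2), ($\ell$), (n),
(h1), (h2), (w1), and (w2) hold. Then $\sqrt{n_1}(\hat\theta_n - \theta_0) \Rightarrow N_d(0, \Sigma_0^{-1}\Sigma\Sigma_0^{-1})$, where

$$\Sigma_0 = \int Q(z)Q^T(z)\,dG(z), \qquad \Sigma = \int \frac{\tau^2(z)Q(z)Q^T(z)g^2(z)}{f_Z(z)}\,dz,$$

and $\tau^2(z) = \sigma_\varepsilon^2 + \delta^2(z)$, where $\sigma_\varepsilon^2$ and $\delta^2(z)$ are defined as in (e2).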

Proof. It suffices to show that the matrix before $\hat\theta_n - \theta_0$ on the left hand side of
(1.12) converges to $\Sigma_0$ in probability, and $\sqrt{n_1}$ times the right hand side of (1.12) is
asymptotically normal with mean vector 0 and covariance matrix $\Sigma$.

Consider the second claim first. Adding and subtracting $\theta_0^TQ(Z_i)$ from $Y_i - \theta_0^T\hat Q_{n_2}(Z_i)$
in the first factor of the integrand, adding and subtracting $Q(Z_i)$ from
$\hat Q_{n_2}(Z_i)$ in the second factor of the integrand, and replacing $1/\hat f^2_{Zh_2}(z)$ by
$1/\hat f^2_{Zh_2}(z) - 1/f_Z^2(z) + 1/f_Z^2(z) =: \Delta_{n_1}(z) + 1/f_Z^2(z)$, $\sqrt{n_1}$ times the right hand side of (1.12)
can be written as the sum of the following eight terms:

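$$S_{n1} := \sqrt{n_1}\int U_{n_1}(z)D_n(z)\Delta_{n_1}(z)\,dG(z), \qquad S_{n2} := \sqrt{n_1}\int U_{n_1}(z)D_n(z)\,d\psi(z),$$
$$S_{n3} := \sqrt{n_1}\int U_{n_1}(z)\mu_{n_1}(z)\Delta_{n_1}(z)\,dG(z), \qquad S_{n4} := \sqrt{n_1}\int U_{n_1}(z)\mu_{n_1}(z)\,d\psi(z),$$
$$S_{n5} := -\sqrt{n_1}\int D_n(z)D_n^T(z)\Delta_{n_1}(z)\,dG(z)\,\theta_0, \qquad S_{n6} := -\sqrt{n_1}\int D_n(z)D_n^T(z)\,d\psi(z)\,\theta_0,$$
$$S_{n7} := -\sqrt{n_1}\int \mu_{n_1}(z)D_n^T(z)\Delta_{n_1}(z)\,dG(z)\,\theta_0, \qquad S_{n8} := -\sqrt{n_1}\int \mu_{n_1}(z)D_n^T(z)\,d\psi(z)\,\theta_0.$$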

Among these terms, $S_{n4}$ is asymptotically normal with mean vector 0 and covariance
matrix $\Sigma$. The proof uses the Lindeberg-Feller central limit theorem, and the arguments
are exactly the same as in K-N with $m_{\theta_0}(X_i)$ and $\dot m_{\theta_0}(X_i)$ there replaced by $\theta_0^TQ(Z_i)$
and $Q(Z_i)$ here, respectively. The proof is omitted. All the other seven terms are of
the order $o_p(1)$. Since the proofs are similar, only $S_{n8} = o_p(1)$ will be shown below
for the sake of brevity. We note that, by using a method similar to that in K-N, we can
show that $U_{n_1}(z)$ is $O_p(1/\sqrt{n_1h_1^d})$, which is used in proving $S_{nl} = o_p(1)$ for $l = 1, 2, 3$.
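First, notice that the kernel function $K$ has compact support $[-1,1]^d$, so $K_{h_1 i}(z)$
is nonzero only if the distance between each coordinate pair of $Z_i$ and $z$ is no more
than $h_1$. On the other hand, the integrating measure has compact support $\mathcal{I}$, so if we
define

$$\mathcal{I}_{h_1} := \{y \in \mathbb{R}^d : |y_j - z_j| \le h_1,\ j = 1, \dots, d,\ y = (y_1, \dots, y_d)^T,\ z = (z_1, \dots, z_d)^T,\ z \in \mathcal{I}\},$$

then $\mathcal{I}_{h_1}$ is a compact set in $\mathbb{R}^d$, and $K_{h_1 i}(z) = 0$ if $Z_i \notin \mathcal{I}_{h_1}$. Hence, without loss
of generality, we can assume all $Z_i \in \mathcal{I}_{h_1}$. Since $f_Z$ is bounded from below on the
compact set $\mathcal{I}_{\epsilon_0}$ by assumption (f2) and $\mathcal{I}_{h_1} \subset \mathcal{I}_{\epsilon_0}$ for $n_1$ large enough, from
assumption (w2) and Lemma 1.2.2 we obtain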

 

 

 

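$$\sup_{z \in \mathcal{I}_{h_1}}\Big|\frac{f_Z(z)}{\hat f_{Zw_2}(z)} - 1\Big| = o\Big((\log_k n_2)\Big(\frac{\log n_2}{n_2}\Big)^{2/(d+4)}\Big) \ \text{a.s.}, \qquad
\sup_{z \in \mathcal{I}_{h_1}}\frac{f_Z(z)}{\hat f_{Zw_2}(z)} = O_p(1). \qquad (1.14)$$

Secondly, we have the following inequality: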

 

 

 

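$$\|\hat Q_{n_2}(Z_i) - Q(Z_i)\| \le \frac{\|\hat R_{n_2}(Z_i) - R(Z_i)\|}{f_Z(Z_i)}\cdot\frac{f_Z(Z_i)}{\hat f_{Zw_2}(Z_i)} + \Big|\frac{f_Z(Z_i)}{\hat f_{Zw_2}(Z_i)} - 1\Big|\cdot\|Q(Z_i)\|. \qquad (1.15)$$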

 

 

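Recall the definition of $S_{n8}$. We have

$$\|S_{n8}\| \le \sqrt{n_1}\,\|\theta_0\|\int \frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|\hat Q_{n_2}(Z_i) - Q(Z_i)\|\cdot\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|Q(Z_i)\|\,d\psi(z).$$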

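From (1.15) and (1.14), this upper bound satisfies

$$\sqrt{n_1}\,O_p(1)\,A_{n11} + \sqrt{n_1}\,o\Big((\log_k n_2)\Big(\frac{\log n_2}{n_2}\Big)^{2/(d+4)}\Big)A_{n12}, \qquad (1.16)$$

where

$$A_{n11} = \int \frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|\hat R_{n_2}(Z_i) - R(Z_i)\|\cdot\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|Q(Z_i)\|\,d\psi(z),$$

$$A_{n12} = \int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|Q(Z_i)\|\Big]^2 d\psi(z).$$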

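By the Cauchy-Schwarz inequality, $A_{n11}^2$ is bounded above by

$$\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|\hat R_{n_2}(Z_i) - R(Z_i)\|\Big]^2 d\psi(z)\cdot\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|Q(Z_i)\|\Big]^2 d\psi(z).$$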

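Note that

$$E\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|\hat R_{n_2}(Z_i) - R(Z_i)\|\Big]^2 d\psi(z)$$
$$= \int E\Big(\frac{1}{n_1^2}\sum_{i,j=1}^{n_1}K_{h_1 i}(z)K_{h_1 j}(z)\|\hat R_{n_2}(Z_i) - R(Z_i)\|\cdot\|\hat R_{n_2}(Z_j) - R(Z_j)\|\Big)d\psi(z)$$
$$= \int E\Big(\frac{1}{n_1^2}\sum_{i,j=1}^{n_1}K_{h_1 i}(z)K_{h_1 j}(z)\,E_{S_1}\big(\|\hat R_{n_2}(Z_i) - R(Z_i)\|\cdot\|\hat R_{n_2}(Z_j) - R(Z_j)\|\big)\Big)d\psi(z).$$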

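By the Cauchy-Schwarz inequality again,

$$E_{S_1}\big(\|\hat R_{n_2}(Z_i) - R(Z_i)\|\cdot\|\hat R_{n_2}(Z_j) - R(Z_j)\|\big)
\le \big(E_{S_1}\|\hat R_{n_2}(Z_i) - R(Z_i)\|^2\big)^{1/2}\big(E_{S_1}\|\hat R_{n_2}(Z_j) - R(Z_j)\|^2\big)^{1/2},$$

which in turn, from the independence of the subsamples $S_1$ and $S_2$, the choice of
bandwidth $w_1$, and (1.10), is bounded above by $Cn_2^{-4/(d+2\alpha+4)}T^{1/2}(Z_i)T^{1/2}(Z_j)$,
where $T$ is defined in (1.11). So

$$E\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|\hat R_{n_2}(Z_i) - R(Z_i)\|\Big]^2 d\psi(z)
\le Cn_2^{-4/(d+2\alpha+4)}\,E\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)T^{1/2}(Z_i)\Big]^2 d\psi(z).$$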

Using a method similar to that in K-N, together with assumptions (m1) and (m2),
we can show that

$$\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)T^{1/2}(Z_i)\Big]^2 d\psi(z) = O_p(1), \qquad
\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|Q(Z_i)\|\Big]^2 d\psi(z) = O_p(1). \qquad (1.17)$$
Finally, from (1.16), we obtain

$$\|S_{n8}\| \le \sqrt{n_1}\,O_p\big(n_2^{-2/(d+2\alpha+4)}\big) + \sqrt{n_1}\,o_p\Big((\log_k n_2)\Big(\frac{\log n_2}{n_2}\Big)^{2/(d+4)}\Big),$$

which is of the order $o_p(1)$ by assumption (n).
To finish the proof, we only need to show that the matrix before $\hat\theta_n - \theta_0$ on the left
hand side of (1.12) converges to $\Sigma_0$ in probability. Adding and subtracting $Q(Z_i)$
from $\hat Q_{n_2}(Z_i)$, this matrix can be written as the sum of the following eight terms:

$$T_{n1} := \int D_n(z)D_n^T(z)\Delta_{n_1}(z)\,dG(z), \qquad T_{n2} := \int D_n(z)\mu_{n_1}^T(z)\Delta_{n_1}(z)\,dG(z),$$
$$T_{n3} := \int \mu_{n_1}(z)D_n^T(z)\Delta_{n_1}(z)\,dG(z), \qquad T_{n4} := \int \mu_{n_1}(z)\mu_{n_1}^T(z)\Delta_{n_1}(z)\,dG(z),$$
$$T_{n5} := \int D_n(z)D_n^T(z)\,d\psi(z), \qquad T_{n6} := \int D_n(z)\mu_{n_1}^T(z)\,d\psi(z),$$
$$T_{n7} := \int \mu_{n_1}(z)D_n^T(z)\,d\psi(z), \qquad T_{n8} := \int \mu_{n_1}(z)\mu_{n_1}^T(z)\,d\psi(z).$$

Notice the connection between $T_{n1}$ and $S_{n5}$; $T_{n2}$, $T_{n3}$ and $S_{n7}$; $T_{n5}$ and $S_{n6}$;
$T_{n6}$, $T_{n7}$ and $S_{n8}$. By using arguments similar to those above, we can verify that
$T_{nl} = o_p(1)$ for $l = 1, 2, 3, 5, 6, 7$. From (1.14) and the second fact in (1.17),
$T_{n4}$ is also of the order $o_p(1)$. Finally, employing a method similar to that in K-N, we
can show that $T_{n8}$ converges to $\Sigma_0$ in probability, thereby proving the theorem. $\Box$

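1.4 Asymptotic normality of the minimized distance

This section contains a proof of the asymptotic normality of the minimized distance
$M_n(\hat\theta_n)$. To state the result precisely, the following notations are needed:

$$\xi_i := Y_i - \theta_0^TQ(Z_i), \qquad \zeta_i := Y_i - \theta_0^T\hat Q_{n_2}(Z_i),$$

$$\tilde C_n := n_1^{-2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\zeta_i^2\,d\psi(z), \qquad
\tilde M_n(\theta_0) := \int\Big[n_1^{-1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\zeta_i\Big]^2 d\psi(z),$$

$$\Gamma := 2\int(\tau^2(z))^2 g(z)\,d\psi(z)\cdot\int\Big[\int K(u)K(u+v)\,du\Big]^2 dv,$$

where $\tau^2(z)$ is as in Theorem 1.3.1.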

The main result proved in this section is the following:

Theorem 1.4.1 Suppose $H_0$, (e1), (e2), (e4), (u), (f1), (f2), (m1), (m2), ($\ell$),
(n), (h1), (h2), (w1) and (w2) hold. Then $n_1h_1^{d/2}\hat\Gamma_n^{-1/2}(M_n(\hat\theta_n) - \hat C_n) \Rightarrow N(0,1)$,
where $\hat C_n$, $\hat\Gamma_n$ are as in (1.7).

The proof of this theorem is facilitated by the following ﬁve lemmas:

Lemma 1.4.1 If $H_0$, (e1), (e2), (e4), (u), (f1), (f2), (m1), (m2), ($\ell$), (n), (h1),
(w1) and (w2) hold, then $n_1h_1^{d/2}(\tilde M_n(\theta_0) - \tilde C_n) \Rightarrow N_1(0, \Gamma)$.
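Proof. Replacing $\zeta_i$ by $\xi_i + \theta_0^T(Q(Z_i) - \hat Q_{n_2}(Z_i))$ in the definition of $\tilde M_n(\theta_0)$ and
expanding the quadratic term, $\tilde M_n(\theta_0) - \tilde C_n$ can be written as the sum of the
following four terms:

$$B_{n1} := \frac{1}{n_1^2}\sum_{i\ne j}\int K_{h_1 i}(z)K_{h_1 j}(z)\xi_i\xi_j\,d\psi(z),$$
$$B_{n2} := \frac{1}{n_1^2}\sum_{i\ne j}\int K_{h_1 i}(z)K_{h_1 j}(z)\xi_i\,\theta_0^T\big(Q(Z_j) - \hat Q_{n_2}(Z_j)\big)\,d\psi(z),$$
$$B_{n3} := \frac{1}{n_1^2}\sum_{i\ne j}\int K_{h_1 i}(z)K_{h_1 j}(z)\xi_j\,\theta_0^T\big(Q(Z_i) - \hat Q_{n_2}(Z_i)\big)\,d\psi(z),$$

and

$$B_{n4} := \frac{1}{n_1^2}\sum_{i\ne j}\int K_{h_1 i}(z)K_{h_1 j}(z)\,\theta_0^T\big(Q(Z_i) - \hat Q_{n_2}(Z_i)\big)\cdot\theta_0^T\big(Q(Z_j) - \hat Q_{n_2}(Z_j)\big)\,d\psi(z).$$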

 

 

Using a method similar to that in K-N, one can show that $n_1h_1^{d/2}B_{n1} \Rightarrow N_1(0, \Gamma)$.
To prove the lemma, it is sufficient to show that $n_1h_1^{d/2}B_{nl} = o_p(1)$ for $l = 2, 3, 4$. We
begin with the case of $l = 2$.
By (1.14) and the inequality (1.15), and letting

$$C_{n,j} := \sum_{i\ne j}\int K_{h_1 i}(z)K_{h_1 j}(z)\xi_i\,d\psi(z),$$

$|B_{n2}|$ is bounded above by the sum $B_{n21} + B_{n22}$, where

$$B_{n21} := O_p(1)\cdot\frac{1}{n_1^2}\sum_{j=1}^{n_1}\|\hat R_{n_2}(Z_j) - R(Z_j)\|\cdot|C_{n,j}|,$$
$$B_{n22} := o\Big((\log_k n_2)\Big(\frac{\log n_2}{n_2}\Big)^{2/(d+4)}\Big)\cdot\frac{1}{n_1^2}\sum_{j=1}^{n_1}\|Q(Z_j)\|\cdot|C_{n,j}|.$$

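On the one hand, by the conditional expectation argument and inequality (1.10), we
have

$$E\,\frac{1}{n_1^2}\sum_{j=1}^{n_1}\|\hat R_{n_2}(Z_j) - R(Z_j)\|\cdot|C_{n,j}|
= E\,\frac{1}{n_1^2}\sum_{j=1}^{n_1}E_{S_1}\big(\|\hat R_{n_2}(Z_j) - R(Z_j)\|\big)\cdot|C_{n,j}|$$
$$\le Cn_2^{-2/(d+2\alpha+4)}\,E\Big[\frac{1}{n_1^2}\sum_{j=1}^{n_1}T^{1/2}(Z_j)|C_{n,j}|\Big]
= Cn_2^{-2/(d+2\alpha+4)}\,\frac{1}{n_1}\,E\big[T^{1/2}(Z_1)\cdot|C_{n,1}|\big].$$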
Now, consider the asymptotic behavior of $E[T^{1/2}(Z_1)\cdot|C_{n,1}|]$. Instead of considering
the expectation, we investigate the second moment. It is easy to see that $E\,T(Z_1)C_{n,1}^2$
equals

$$E\,T(Z_1)\sum_{i\ne 1}\sum_{j\ne 1}\iint K_{h_1 i}(z)K_{h_1 1}(z)K_{h_1 j}(y)K_{h_1 1}(y)\xi_i\xi_j\,d\psi(z)\,d\psi(y) \qquad (1.18)$$
$$= (n_1 - 1)\iint E\big(K_{h_1 2}(z)K_{h_1 2}(y)\xi_2^2\big)\,E\big(K_{h_1 1}(z)K_{h_1 1}(y)T(Z_1)\big)\,d\psi(z)\,d\psi(y).$$

The second equality follows from the independence of $\xi_i$, $i = 1, \dots, n_1$, and $E\xi_1 = 0$. But

$$E\big(K_{h_1 2}(z)K_{h_1 2}(y)\xi_2^2\big) = \frac{1}{h_1^d}\int K(v)K\Big(\frac{y - z}{h_1} + v\Big)\big(\sigma_\varepsilon^2 + \delta^2(z - h_1v)\big)f_Z(z - h_1v)\,dv.$$

Similarly, we can show that

$$E\big(K_{h_1 1}(z)K_{h_1 1}(y)T(Z_1)\big) = \frac{1}{h_1^d}\int K(v)K\Big(\frac{y - z}{h_1} + v\Big)T(z - h_1v)f_Z(z - h_1v)\,dv.$$

Putting these two expectations back in (1.18), changing variables $y = z + h_1u$,
and using the continuity of $f_Z$, $\delta^2(z)$, $g(z)$, and $T(z)$, we obtain
$E\,T(Z_1)C_{n,1}^2 = O((n_1 - 1)h_1^{-d})$. Therefore,

$$E\,\frac{1}{n_1^2}\sum_{j=1}^{n_1}\|\hat R_{n_2}(Z_j) - R(Z_j)\|\cdot|C_{n,j}| = O\big(n_2^{-2/(d+2\alpha+4)}\cdot n_1^{-1/2}h_1^{-d/2}\big).$$

This, in turn, implies $B_{n21} = O_p(n_1^{-2b/(d+2\alpha+4) - 1/2}h_1^{-d/2})$, by assumption (n).
Similarly, one can show that $n_1^{-2}\sum_{j=1}^{n_1}\|Q(Z_j)\|\cdot|C_{n,j}|$ is of the
order $O_p(n_1^{-1/2}h_1^{-d/2})$. Thus, $B_{n22} = o_p((\log_k n_1)(\log n_1/n_1^b)^{2/(d+4)}\cdot n_1^{-1/2}h_1^{-d/2})$. Hence

$$n_1h_1^{d/2}|B_{n2}| = O_p\big(n_1^{\frac12 - \frac{2b}{d+2\alpha+4}}\big) + o_p\big(n_1^{\frac12 - \frac{2b}{d+4}}\log_k n_1(\log n_1)^{\frac{2}{d+4}}\big)$$

is of the order $o_p(1)$ since $b > (d + 2\alpha + 4)/4$ by assumption (n).

By exactly the same method as above, we can show that $n_1h_1^{d/2}B_{n3} = o_p(1)$.
It remains to show that $n_1h_1^{d/2}B_{n4} = o_p(1)$. Note that

$$|B_{n4}| \le \frac{\|\theta_0\|^2}{n_1^2}\sum_{i\ne j}\int K_{h_1 i}(z)K_{h_1 j}(z)\|\hat Q_{n_2}(Z_i) - Q(Z_i)\|\cdot\|\hat Q_{n_2}(Z_j) - Q(Z_j)\|\,d\psi(z).$$

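From (1.15), the right hand side of the above inequality is bounded above by the sum

$$O_p(1)\cdot B_{n41} + o_p\Big((\log_k n_2)\Big(\frac{\log n_2}{n_2}\Big)^{2/(d+4)}\Big)(B_{n42} + B_{n43})
+ o_p\Big((\log_k n_2)^2\Big(\frac{\log n_2}{n_2}\Big)^{4/(d+4)}\Big)B_{n44}, \qquad (1.19)$$

where

$$B_{n41} := \frac{1}{n_1^2}\sum_{i\ne j}\int K_{h_1 i}(z)K_{h_1 j}(z)\|\hat R_{n_2}(Z_i) - R(Z_i)\|\cdot\|\hat R_{n_2}(Z_j) - R(Z_j)\|\,d\psi(z),$$
$$B_{n42} := \frac{1}{n_1^2}\sum_{i\ne j}\int K_{h_1 i}(z)K_{h_1 j}(z)\|\hat R_{n_2}(Z_i) - R(Z_i)\|\cdot\|Q(Z_j)\|\,d\psi(z),$$
$$B_{n43} := \frac{1}{n_1^2}\sum_{i\ne j}\int K_{h_1 i}(z)K_{h_1 j}(z)\|\hat R_{n_2}(Z_j) - R(Z_j)\|\cdot\|Q(Z_i)\|\,d\psi(z),$$
$$B_{n44} := \frac{1}{n_1^2}\sum_{i\ne j}\int K_{h_1 i}(z)K_{h_1 j}(z)\|Q(Z_i)\|\cdot\|Q(Z_j)\|\,d\psi(z).$$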

By a conditional expectation argument, the Cauchy-Schwarz inequality, (2.2), and the
continuity of $f_Z$ and $T(z)$, we obtain

$$E B_{n41} \le Cn_2^{-4/(d+2\alpha+4)}\int\big[E\,K_{h_1}(z - Z)T^{1/2}(Z)\big]^2 d\psi(z) = O\big(n_2^{-4/(d+2\alpha+4)}\big).$$

This implies $B_{n41} = O_p(n_2^{-4/(d+2\alpha+4)})$. Since $b > (d + 2\alpha + 4)/4$ by assumption
(n), it follows that

$$n_1h_1^{d/2}\cdot O_p(1)B_{n41} = n_1h_1^{d/2}\cdot O_p(1)\,O_p\big(n_1^{-4b/(d+2\alpha+4)}\big) = o_p(1).$$

Similarly, we can show

$$B_{n42} = O_p\big(n_2^{-2/(d+2\alpha+4)}\big), \qquad B_{n43} = O_p\big(n_2^{-2/(d+2\alpha+4)}\big), \qquad B_{n44} = O_p(1).$$
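Therefore, for $l = 2, 3$,

$$n_1h_1^{d/2}\,o_p\Big((\log_k n_2)\Big(\frac{\log n_2}{n_2}\Big)^{2/(d+4)}\Big)B_{n4l}
= o_p\Big(n_1^{1 - \frac{2b}{d+4} - \frac{2b}{d+2\alpha+4}}h_1^{d/2}(\log_k n_1)(\log n_1)^{\frac{2}{d+4}}\Big),$$

which is of the order $o_p(1)$ by assumption (n). For $B_{n44}$, we have

$$n_1h_1^{d/2}\cdot o_p\Big((\log_k n_2)^2\Big(\frac{\log n_2}{n_2}\Big)^{4/(d+4)}\Big)B_{n44}
= o_p\Big(n_1^{1 - \frac{4b}{d+4}}h_1^{d/2}(\log_k n_1)^2(\log n_1)^{\frac{4}{d+4}}\Big),$$

which is also of the order $o_p(1)$. Finally, from the above and (1.19), we obtain
$n_1h_1^{d/2}B_{n4} = o_p(1)$, thereby proving the lemma. $\Box$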

Lemma 1.4.2 In addition to the conditions in Lemma 1.4.1, suppose (h2) also holds.
Then $n_1h_1^{d/2}(M_n(\hat\theta_n) - M_n(\theta_0)) = o_p(1)$.
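Proof. Recall the definition of $M_n(\theta)$. Adding and subtracting

$$\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\theta_0^T\hat Q_{n_2}(Z_i)$$

in the squared integrand of $M_n(\hat\theta_n)$, we can write $M_n(\hat\theta_n) - M_n(\theta_0)$ as the sum
$W_{n1} + 2W_{n2}$, where

$$W_{n1} := \int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)(\theta_0 - \hat\theta_n)^T\hat Q_{n_2}(Z_i)\Big]^2 d\hat\psi_{h_2}(z),$$
$$W_{n2} := \int\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\zeta_i\cdot\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)(\theta_0 - \hat\theta_n)^T\hat Q_{n_2}(Z_i)\,d\hat\psi_{h_2}(z),$$

and $\zeta_i := Y_i - \theta_0^T\hat Q_{n_2}(Z_i)$. It is easy to see that

$$W_{n1} \le 2\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)(\theta_0 - \hat\theta_n)^T\big(\hat Q_{n_2}(Z_i) - Q(Z_i)\big)\Big]^2 d\hat\psi_{h_2}(z)
+ 2\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)(\theta_0 - \hat\theta_n)^TQ(Z_i)\Big]^2 d\hat\psi_{h_2}(z). \qquad (1.20)$$

We write the first term on the right hand side as $W_{n11}$ and the second term as $W_{n12}$.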
On the one hand, note that $W_{n11}$ is bounded above by

$$2\|\hat\theta_n - \theta_0\|^2\sup_{z\in\mathcal{I}}\Big|\frac{f_Z^2(z)}{\hat f_{Zh_2}^2(z)}\Big|\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|\hat Q_{n_2}(Z_i) - Q(Z_i)\|\Big]^2 d\psi(z).$$

By the conditional expectation argument used in the previous part, we can
show that the integral is of the order $o_p(1)$. By assumption (w2), the
compactness of $\mathcal{I}_{h_1}$, and the asymptotic behavior of $\hat\theta_n - \theta_0$ stated in Theorem 1.3.1,
$n_1h_1^{d/2}W_{n11} = o_p(h_1^{d/2}) = o_p(1)$. On the other hand, $W_{n12}$ is bounded above by

$$2\|\hat\theta_n - \theta_0\|^2\sup_{z\in\mathcal{I}}\Big|\frac{f_Z^2(z)}{\hat f_{Zh_2}^2(z)}\Big|\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|Q(Z_i)\|\Big]^2 d\psi(z).$$

Since the integral is of the order $O_p(1)$, $n_1h_1^{d/2}W_{n12} = O_p(h_1^{d/2}) = o_p(1)$
is easily obtained. Therefore, $n_1h_1^{d/2}W_{n1} = o_p(1)$ is proved.

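Now, consider $W_{n2}$. Rewrite it as

$$W_{n2} = \int\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\zeta_i\cdot\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\hat Q_{n_2}^T(Z_i)\,d\hat\psi_{h_2}(z)\,(\theta_0 - \hat\theta_n).$$

Note that the integral part of $W_{n2}$ is the same as the expression on the right hand side of
(1.12); thus

$$W_{n2} = (\hat\theta_n - \theta_0)^T\int\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\hat Q_{n_2}(Z_i)\cdot\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\hat Q_{n_2}^T(Z_i)\,d\hat\psi_{h_2}(z)\,(\theta_0 - \hat\theta_n).$$

Therefore, $|W_{n2}|$ is bounded above by

$$\|\hat\theta_n - \theta_0\|^2\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|\hat Q_{n_2}(Z_i)\|\Big]^2 d\hat\psi_{h_2}(z).$$

Adding and subtracting $Q(Z_i)$ from $\hat Q_{n_2}(Z_i)$, it turns out that $|W_{n2}|$ is further
bounded above by the sum $W_{n21} + W_{n22}$, where

$$W_{n21} := 2\|\hat\theta_n - \theta_0\|^2\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|\hat Q_{n_2}(Z_i) - Q(Z_i)\|\Big]^2 d\hat\psi_{h_2}(z),$$
$$W_{n22} := 2\|\hat\theta_n - \theta_0\|^2\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\|Q(Z_i)\|\Big]^2 d\hat\psi_{h_2}(z).$$

Arguing as for $W_{n11}$ and $W_{n12}$, we can show

$$n_1h_1^{d/2}|W_{n21}| = o_p(1), \qquad n_1h_1^{d/2}|W_{n22}| = o_p(1).$$

Therefore $n_1h_1^{d/2}|W_{n2}| = o_p(1)$. Together with the result $n_1h_1^{d/2}|W_{n1}| = o_p(1)$,
the lemma is proved. $\Box$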
Lemma 1.4.3 If $H_0$, (e1), (e2), (u), (f1), (f2), (m1), (m2), ($\ell$), (n), (h1), (h2),
(w1) and (w2) hold, then $n_1h_1^{d/2}(M_n(\theta_0) - \tilde M_n(\theta_0)) = o_p(1)$.

Proof. Recall the definition of $\zeta_i$ and $U_{n_1}(z)$. Note that $n_1h_1^{d/2}|M_n(\theta_0) - \tilde M_n(\theta_0)|$
is bounded above by

$$n_1h_1^{d/2}\sup_{z\in\mathcal{I}}\Big|\frac{f_Z^2(z)}{\hat f_{Zh_2}^2(z)} - 1\Big|\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\zeta_i\Big]^2 d\psi(z).$$

Replacing $\zeta_i$ by $\xi_i + \theta_0^T(Q(Z_i) - \hat Q_{n_2}(Z_i))$, the integral in the above bound
can be bounded above by the sum

$$2\int U_{n_1}^2(z)\,d\psi(z) + 2\int\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}K_{h_1 i}(z)\theta_0^T\big(Q(Z_i) - \hat Q_{n_2}(Z_i)\big)\Big]^2 d\psi(z).$$

The first term is of the order $O_p((n_1h_1^d)^{-1})$, which is obtained by a method similar
to that in K-N, while the second term, by the conditional expectation argument,
has the same order as

$$\sup_{z\in\mathcal{I}_{h_1}}\Big|\frac{f_Z(z)}{\hat f_{Zw_2}(z)}\Big|^2\cdot O\big(n_2^{-4/(d+2\alpha+4)}\big) + \sup_{z\in\mathcal{I}_{h_1}}\Big|\frac{f_Z(z)}{\hat f_{Zw_2}(z)} - 1\Big|^2\cdot O_p(1).$$

Therefore, $n_1h_1^{d/2}|M_n(\theta_0) - \tilde M_n(\theta_0)|$ is less than or equal to

$$o_p\Big(n_1h_1^{d/2}\cdot\frac{1}{n_1h_1^d}\cdot\log_k n_1(\log n_1/n_1)^{2/(d+4)}\Big)$$
$$+\ o_p\Big(n_1h_1^{d/2}\cdot\log_k n_1(\log n_1/n_1)^{2/(d+4)}\cdot n_1^{-4b/(d+2\alpha+4)}\Big)$$
$$+\ o_p\Big(n_1h_1^{d/2}\cdot\log_k n_1(\log n_1/n_1)^{2/(d+4)}\cdot\log_k^2 n_1(\log n_1)^{4/(d+4)}n_1^{-4b/(d+4)}\Big).$$

All three terms are of the order $o_p(1)$ by assumptions (n), (h1), (h2), (w1)
and (w2). Hence the lemma. $\Box$

Lemma 1.4.4 If $H_0$, (e1), (e2), (e4), (u), (f1), (f2), (m1), (m2), ($\ell$), (n), (h1),
(h2), (w1) and (w2) hold, then $n_1h_1^{d/2}(\hat C_n - \tilde C_n) = o_p(1)$.

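Proof. Recall the notation $\Delta_{n_1}(z)$ in (1.13). Adding and subtracting $\theta_0^T\hat Q_{n_2}(Z_i)$
from $Y_i$ in the integrand of $\hat C_n$ and expanding the quadratic term, $\hat C_n - \tilde C_n$ can
be rewritten as the sum of $C_{nl}$, $l = 1, 2, 3, 4, 5$, where

$$C_{n1} := \frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\big(Y_i - \theta_0^T\hat Q_{n_2}(Z_i)\big)^2\Delta_{n_1}(z)\,dG(z),$$
$$C_{n2} := \frac{2}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\big(Y_i - \theta_0^T\hat Q_{n_2}(Z_i)\big)\cdot(\theta_0 - \hat\theta_n)^T\hat Q_{n_2}(Z_i)\,\Delta_{n_1}(z)\,dG(z),$$
$$C_{n3} := \frac{2}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\big(Y_i - \theta_0^T\hat Q_{n_2}(Z_i)\big)\cdot(\theta_0 - \hat\theta_n)^T\hat Q_{n_2}(Z_i)\,d\psi(z),$$
$$C_{n4} := \frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\big((\theta_0 - \hat\theta_n)^T\hat Q_{n_2}(Z_i)\big)^2\Delta_{n_1}(z)\,dG(z),$$
$$C_{n5} := \frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\big((\theta_0 - \hat\theta_n)^T\hat Q_{n_2}(Z_i)\big)^2 d\psi(z).$$

To prove the lemma, it is enough to prove $n_1h_1^{d/2}C_{nl} = o_p(1)$ for $l = 1, 2, 3, 4, 5$.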

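For the case of $l = 1$, first notice that

$$|C_{n1}| \le 2\sup_{z\in\mathcal{I}}|\Delta_{n_1}(z)|\,\frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\xi_i^2\,d\psi(z)
+ 2\sup_{z\in\mathcal{I}}|\Delta_{n_1}(z)|\,\frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\big(\theta_0^T(Q(Z_i) - \hat Q_{n_2}(Z_i))\big)^2 d\psi(z)
= C_{n11} + C_{n12}.$$

Since $n_1^{-2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\xi_i^2\,d\psi(z) = O_p(1/(n_1h_1^d))$ by a routine expectation argument,

$$n_1h_1^{d/2}|C_{n11}| = o_p\big(n_1h_1^{d/2}\cdot(\log_k n_1)(\log n_1)^{2/(d+4)}n_1^{-2/(d+4)}\cdot(n_1h_1^d)^{-1}\big)
= o_p\big(n_1^{ad/2 - 2/(d+4)}\cdot(\log_k n_1)(\log n_1)^{2/(d+4)}\big) = o_p(1).$$

Second, from the compactness of $\Theta$, we have

$$C_{n12} \le 2\|\theta_0\|^2\sup_{z\in\mathcal{I}}|\Delta_{n_1}(z)|\cdot\frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\|Q(Z_i) - \hat Q_{n_2}(Z_i)\|^2 d\psi(z).$$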

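Again by the conditional expectation argument, the second factor of the above
expression has the same order as

$$O_p\big(n_2^{-4/(d+2\alpha+4)}\big)\cdot\sup_{z\in\mathcal{I}_{h_1}}\Big|\frac{f_Z(z)}{\hat f_{Zw_2}(z)}\Big|^2\cdot\frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)T(Z_i)\,d\psi(z)
+ \sup_{z\in\mathcal{I}_{h_1}}\Big|\frac{f_Z(z)}{\hat f_{Zw_2}(z)} - 1\Big|^2\cdot\frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\|Q(Z_i)\|^2 d\psi(z).$$

Because

$$\frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)T(Z_i)\,d\psi(z) = O_p(1/(n_1h_1^d)), \qquad
\frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\|Q(Z_i)\|^2 d\psi(z) = O_p(1/(n_1h_1^d)),$$

from (h2), (w2), and Lemma 1.2.2, we obtain that $n_1h_1^{d/2}|C_{n12}|$ is of the order

$$o_p\big(n_1^{-2/(d+4) - 4b/(d+2\alpha+4)}h_1^{-d/2}(\log_k n_1)(\log n_1)^{2/(d+4)}\big)
+ o_p\big(h_1^{-d/2}n_1^{-2/(d+4) - 4b/(d+4)}(\log_k n_1)^3(\log n_1)^{6/(d+4)}\big),$$

which is $o_p(1)$ by assumption (h1). Hence we get $n_1h_1^{d/2}|C_{n1}| = o_p(1)$.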
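Now we will show that $n_1h_1^{d/2}|C_{n3}| = o_p(1)$. Once we prove this,
$n_1h_1^{d/2}|C_{n2}| = o_p(1)$ is a natural consequence. In fact,

$$C_{n3} = \frac{2}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\big(\xi_i + \theta_0^TQ(Z_i) - \theta_0^T\hat Q_{n_2}(Z_i)\big)\cdot(\theta_0 - \hat\theta_n)^T\big(\hat Q_{n_2}(Z_i) - Q(Z_i) + Q(Z_i)\big)\,d\psi(z).$$

So $|C_{n3}|$ is bounded above by the sum $2(C_{n31} + C_{n32} + C_{n33} + C_{n34})$, where

$$C_{n31} := \frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)|\xi_i|\,\|\theta_0 - \hat\theta_n\|\,\|\hat Q_{n_2}(Z_i) - Q(Z_i)\|\,d\psi(z),$$
$$C_{n32} := \frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)|\xi_i|\,\|\theta_0 - \hat\theta_n\|\,\|Q(Z_i)\|\,d\psi(z),$$
$$C_{n33} := \frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\|\theta_0\|\,\|\theta_0 - \hat\theta_n\|\,\|\hat Q_{n_2}(Z_i) - Q(Z_i)\|^2 d\psi(z),$$
$$C_{n34} := \frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\|\theta_0\|\,\|\theta_0 - \hat\theta_n\|\,\|\hat Q_{n_2}(Z_i) - Q(Z_i)\|\,\|Q(Z_i)\|\,d\psi(z).$$

It is sufficient to show that $n_1h_1^{d/2}|C_{n3l}| = o_p(1)$ for $l = 1, 2, 3, 4$. Because the
proofs are similar, here we only show $n_1h_1^{d/2}|C_{n32}| = o_p(1)$; the others are omitted for
the sake of brevity. In fact, note that

$$\frac{1}{n_1^2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)|\xi_i|\,\|Q(Z_i)\|\,d\psi(z) = O_p(1/(n_1h_1^d))$$

by an expectation argument. Then, from $\|\hat\theta_n - \theta_0\| = O_p(n_1^{-1/2})$ by Theorem 1.3.1,
we have $n_1h_1^{d/2}|C_{n32}| = n_1h_1^{d/2}\,O_p(n_1^{-1/2})\,O_p(1/(n_1h_1^d)) = O_p(n_1^{-1/2}h_1^{-d/2})$.
Because $n_1^{-1/2}h_1^{-d/2} = n_1^{-1/2 + ad/2}$ and $a < 1/(2d)$ by assumption (h1), the above
expression is $o_p(1)$. Similarly, we can show that the same results hold for $C_{n4}$ and
$C_{n5}$. Details are left out. $\Box$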
Lemma 1.4.5 Under the same conditions as in Lemma 1.4.4, $\hat\Gamma_n - \Gamma = o_p(1)$.

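Proof. Recall the notation for $\xi_i$. Define

$$\tilde\Gamma_n := 2h_1^dn_1^{-2}\sum_{i\ne j}\Big(\int K_{h_1 i}(z)K_{h_1 j}(z)\xi_i\xi_j\,d\hat\psi_{h_2}(z)\Big)^2.$$

The lemma is proved by showing that

$$\hat\Gamma_n - \tilde\Gamma_n = o_p(1), \qquad \tilde\Gamma_n - \Gamma = o_p(1). \qquad (1.21)$$

The second claim can be shown using the same method as in K-N, so we only
prove the first claim. Write $u_n := \hat\theta_n - \theta_0$, $r_i := \theta_0^TQ(Z_i) - \hat\theta_n^T\hat Q_{n_2}(Z_i)$. Now $\hat\Gamma_n$
can be expressed as the sum of $\tilde\Gamma_n$ and the following terms:

$$B_{n1} = 2h_1^dn_1^{-2}\sum_{i\ne j}\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)\xi_ir_j\,d\hat\psi_{h_2}(z) + \int K_{h_1 i}(z)K_{h_1 j}(z)\xi_jr_i\,d\hat\psi_{h_2}(z) + \int K_{h_1 i}(z)K_{h_1 j}(z)r_ir_j\,d\hat\psi_{h_2}(z)\Big]^2,$$

$$B_{n2} = 4h_1^dn_1^{-2}\sum_{i\ne j}\Big(\int K_{h_1 i}(z)K_{h_1 j}(z)\xi_i\xi_j\,d\hat\psi_{h_2}(z)\Big)\cdot\Big(\int K_{h_1 i}(z)K_{h_1 j}(z)\big(\xi_ir_j + \xi_jr_i + r_ir_j\big)\,d\hat\psi_{h_2}(z)\Big),$$

so it suffices to show that both terms are of the order $o_p(1)$. Applying the Cauchy-Schwarz
inequality to the double sum, one can see that we only need to show the following:

$$h_1^dn_1^{-2}\sum_{i\ne j}\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_ir_j|\,d\hat\psi_{h_2}(z)\Big]^2 = o_p(1), \qquad (1.22)$$
$$h_1^dn_1^{-2}\sum_{i\ne j}\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|r_ir_j|\,d\hat\psi_{h_2}(z)\Big]^2 = o_p(1),$$
$$h_1^dn_1^{-2}\sum_{i\ne j}\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i\xi_j|\,d\hat\psi_{h_2}(z)\Big]^2 = O_p(1).$$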

The third claim in (1.22) can be proved by using the same argument as in K-N. Now,
consider the first claim above. From Lemma 1.2.2, we only need to show that the claim
is true when $d\hat\psi_{h_2}(z)$ is replaced by $d\psi(z)$. Since $r_j$ does not depend on the
integration variable, the left hand side of the first claim after the replacement can be
rewritten as

$$h_1^dn_1^{-2}\sum_{i\ne j}|r_j|^2\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,d\psi(z)\Big]^2. \qquad (1.23)$$
#J’

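Note that $r_j = \theta_0^T(Q(Z_j) - \hat Q_{n_2}(Z_j)) - u_n^TQ(Z_j) - u_n^T(\hat Q_{n_2}(Z_j) - Q(Z_j))$, so (1.23)
can be bounded above by the sum of the following three terms:

$$A_{n1} := 3h_1^dn_1^{-2}\|\theta_0\|^2\sum_{i\ne j}\|\hat Q_{n_2}(Z_j) - Q(Z_j)\|^2\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,d\psi(z)\Big]^2,$$
$$A_{n2} := 3h_1^dn_1^{-2}\|u_n\|^2\sum_{i\ne j}\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,\|Q(Z_j)\|\,d\psi(z)\Big]^2,$$
$$A_{n3} := 3h_1^dn_1^{-2}\|u_n\|^2\sum_{i\ne j}\|\hat Q_{n_2}(Z_j) - Q(Z_j)\|^2\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,d\psi(z)\Big]^2.$$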
#J

$A_{n2} = o_p(1)$ follows from the fact that $u_n = \hat\theta_n - \theta_0 = o_p(1)$, and that

$$h_1^dn_1^{-2}\sum_{i\ne j}\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,\|Q(Z_j)\|\,d\psi(z)\Big]^2 = O_p(1),$$

which can be shown by using the same argument as in K-N. Let us consider $A_{n3}$.
Using the inequality (1.15), Lemma 1.2.2 or (1.14), and the compactness of $\Theta$, it is
easy to see that $A_{n3}$ is bounded above by the sum $A_{n31} + A_{n32}$, where

$$A_{n31} = O_p(1)\,h_1^dn_1^{-2}\sum_{i\ne j}\|\hat R_{n_2}(Z_j) - R(Z_j)\|^2\cdot\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,d\psi(z)\Big]^2,$$
$$A_{n32} = o_p(1)\,h_1^dn_1^{-2}\sum_{i\ne j}\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,\|Q(Z_j)\|\,d\psi(z)\Big]^2.$$

Applying the conditional expectation argument to the second factor in $A_{n31}$, and using
(1.10) and the elementary inequality $a < (1 + a)^2$, we can show

$$E\Big[h_1^dn_1^{-2}\sum_{i\ne j}\|\hat R_{n_2}(Z_j) - R(Z_j)\|^2\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,d\psi(z)\Big]^2\Big]$$
$$= E\Big[h_1^dn_1^{-2}\sum_{i\ne j}\big(E_{S_1}\|\hat R_{n_2}(Z_j) - R(Z_j)\|^2\big)\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,d\psi(z)\Big]^2\Big]$$
$$\le Cn_2^{-4/(d+2\alpha+4)}\,E\Big[h_1^dn_1^{-2}\sum_{i\ne j}\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,|1 + T(Z_j)|\,d\psi(z)\Big]^2\Big].$$

The expectation on the right hand side of the above inequality turns out to be $O(1)$ by
the same argument as in K-N. So,

$$h_1^dn_1^{-2}\sum_{i\ne j}\|\hat R_{n_2}(Z_j) - R(Z_j)\|^2\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,d\psi(z)\Big]^2 = o_p(1).$$

This, in turn, implies that the second factor in $A_{n31}$ is $o_p(1)$. The same method as in
K-N also leads to the following fact:

$$h_1^dn_1^{-2}\sum_{i\ne j}\Big[\int K_{h_1 i}(z)K_{h_1 j}(z)|\xi_i|\,\|Q(Z_j)\|\,d\psi(z)\Big]^2 = O_p(1).$$

Hence $A_{n32} = o_p(1)$. Therefore, $B_{n1} = o_p(1)$ and $B_{n2} = o_p(1)$, thereby proving
the first claim in (1.21), hence the lemma. $\Box$

We end this section with some remarks. First, the MD estimator and testing
procedure depend on the choice of the integrating measure. In the classical regression
case, K-N provides some guidelines on how to choose $G$. The same guidelines also
apply here. For example, in the one-dimensional case, the asymptotic variance of
$\sqrt{n_1}(\hat\theta_n - \theta_0)$ attains its minimum if $G$ is chosen to be $\hat f_{Zh_2}(z)$. As far as the MD
test statistic $M_n(\hat\theta_n)$ is concerned, the choice of $G$ will depend on the alternatives.
In the classical regression case, K-N found that the test has high power against the
selected alternatives if the density function is chosen to be the square of the density
estimator of the design variables. The same phenomenon occurs in our case. Secondly,
since replacing $\hat\Gamma_n$ in Theorem 1.4.1 by another consistent estimator of $\Gamma$ does not affect
the validity of the result, we can choose some other consistent estimator of $\Gamma$, for
example,

$$\bar\Gamma_n := C\int\Big(\frac{\sum_{i=1}^{n_1}K_{h_1 i}(z)\big(Y_i - \hat\theta_n^T\hat Q_{n_2}(Z_i)\big)^2}{n_1\hat f_{Zh_2}(z)}\Big)^2 g(z)\,d\hat\psi_{h_2}(z), \qquad (1.24)$$

to make the test procedure computationally efficient, where the constant $C$ is equal to

$$2\int\Big[\int K(u)K(u + v)\,du\Big]^2 dv.$$

1.5 Simulations

This section contains results of four simulations corresponding to the following cases:
Case 1: $d = q = 1$ and $m_\theta$ linear, the regression error $\varepsilon$ is chosen to be normal
and the measurement error $u$ double exponential; Case 2: $d = q = 1$ and $m_\theta$ linear,
both $\varepsilon$ and $u$ are chosen to be normal; Case 3: $d = 1$, $q = 2$, and $m_\theta$ polynomial,
$\varepsilon$ is chosen to be normal and $u$ double exponential; Case 4:
$d = q = 2$ and $m_\theta$ linear, $\varepsilon$ is chosen to be normal and $u$
double exponential. In each case the Monte Carlo average of $\hat\theta_n$, MSE($\hat\theta_n$), and the empirical
levels and powers of the MD test are reported. The asymptotic level is taken to be
0.05 in all cases. For any random variable $W$, we will use $\{W_{jk_j}\}_{k_j=1}^{n_j}$, $j = 1, 2$,
to denote the $j$-th subsample $S_j$ from $W$ with sample size $n_j$. So the full sample is
$S_1 \cup S_2$. Finally, to make the simulation less time consuming, $\bar\Gamma_n$ defined in (1.24)
will be used in the test statistic instead of $\hat\Gamma_n$. So the value of the test statistic is
calculated by $\hat D_n := n_1h_1^{d/2}\bar\Gamma_n^{-1/2}(M_n(\hat\theta_n) - \hat C_n)$.

Case 1: In this case, $\{X_{jk_j}\}_{k_j=1}^{n_j}$ are obtained as a random sample from the
uniform distribution on $[-1, 1]$, $\{\varepsilon_{jk_j}\}_{k_j=1}^{n_j}$ are obtained as a random sample from
the normal distribution $N(0, (0.1)^2)$, and $\{u_{jk_j}\}_{k_j=1}^{n_j}$ are obtained as a random
sample from the double exponential distribution with mean 0 and variance 0.01. The
parametric model is taken to be $m_\theta(X) = \theta X$, and the true parameter is $\theta_0 = 1$. Then
$(Y_{jk_j}, Z_{jk_j})$ are generated using the model

$$Y_{jk_j} = X_{jk_j} + \varepsilon_{jk_j}, \qquad Z_{jk_j} = X_{jk_j} + u_{jk_j},$$

$k_j = 1, 2, \dots, n_j$, $j = 1, 2$. From Example 2, we know that assumption (m1)
holds for $\alpha = 0$. The kernel functions $K$ and $K^*$ and the bandwidths used in all the
simulations are

$$K(z) = K^*(z) = \frac{3}{4}(1 - z^2)I(|z| \le 1), \qquad h_1 = an_1^{-1/3}, \qquad h_2 = bn_1^{-1/5}(\log n_1)^{1/5}, \qquad (1.25)$$

with some choices for $a$ and $b$. For the chosen kernel function (1.25), the constant
$C$ in $\bar\Gamma_n$ is equal to 0.7642. The kernel function used in (1.4) is chosen to be the
standard normal, so that the deconvolution kernel function with bandwidth $w$ takes
the form

$$L_w(x) = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{x^2}{2}\Big)\Big[1 - \frac{0.005(x^2 - 1)}{w^2}\Big],$$

and the bandwidths are $w_1 = n_2^{-1/5}$, $w_2 = (\log(n_2)/n_2)^{1/5}$, chosen according to
assumptions (w1) and (w2). Correspondingly, $\hat Q_{n_2}(z) = \hat R_{n_2}(z)/\hat f_{Zw_2}(z)$, where

$$\hat R_{n_2}(z) := \int x\hat f_{Xw_1}(x)f_u(z - x)\,dx, \qquad \hat f_{Zw_2}(z) := \int \hat f_{Xw_2}(x)f_u(z - x)\,dx.$$
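The closed form of the deconvolution kernel in this case follows from the characteristic function of the double exponential error, $\phi_u(t) = 1/(1 + 0.005t^2)$, which gives $L_w(x) = L(x) - (0.005/w^2)L''(x)$ for the standard normal density $L$. The sketch below evaluates this kernel and a deconvolution density estimate of the form $(n_2w)^{-1}\sum_i L_w((x - Z_i)/w)$; that this is exactly the estimator in (1.4) is an assumption, and the function names and seed are illustrative only.

```python
import numpy as np

S2 = 0.005  # squared scale of the double exponential error: variance 0.01 = 2 * S2

def deconv_kernel_laplace(x, w):
    """L_w(x) = phi(x) * [1 - S2 * (x^2 - 1) / w^2], phi the standard normal density."""
    phi = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    return phi * (1.0 - S2 * (x**2 - 1.0) / w**2)

def deconv_density(x, Z, w):
    """Deconvolution kernel density estimate at points x from contaminated data Z."""
    u = (np.asarray(x)[:, None] - Z[None, :]) / w
    return deconv_kernel_laplace(u, w).mean(axis=1) / w

# Case 1 design: X uniform on [-1, 1], u double exponential with variance 0.01.
rng = np.random.default_rng(1)
n2 = 134
X = rng.uniform(-1.0, 1.0, n2)
u = rng.laplace(loc=0.0, scale=np.sqrt(S2), size=n2)
Z = X + u
w1 = n2 ** (-1.0 / 5.0)
f_hat = deconv_density(np.linspace(-1.0, 1.0, 5), Z, w1)

# Sanity check: L_w integrates to one because (x^2 - 1) * phi(x) integrates to zero.
xs = np.linspace(-10.0, 10.0, 20001)
mass = np.sum(deconv_kernel_laplace(xs, w1)) * (xs[1] - xs[0])
```

Unlike an ordinary kernel, $L_w$ can take negative values in the tails when $w$ is small, so the resulting density estimate need not be nonnegative everywhere.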

Table 1.1 reports the Monte Carlo mean and the MSE($\hat\theta_n$) under $H_0$ for the sample
sizes $n_1 = 50, 100, 200, 300, 500$ and, correspondingly, $n_2 = 134, 317, 753, 1250, 2366$, each
repeated 1000 times. One can see that there appears to be a small bias in $\hat\theta_n$ for all chosen
sample sizes and, as expected, the MSE decreases as the sample size increases.

(n1, n2)   (50, 134)   (100, 317)   (200, 753)   (300, 1250)   (500, 2366)
Mean       1.0103      1.0095       1.0102       1.0105        1.0098
MSE        0.0014      0.0007       0.0004       0.0003        0.0002

Table 1.1: Mean and MSE of $\hat\theta_n$, d = 1, q = 1, Double Exponential

To assess the level and power behavior of the $\hat D_n$ test, we chose the following four
models to simulate data from.
Model 0: $Y = X + \varepsilon$,
Model 1: $Y = X + 0.3X^2 + \varepsilon$,
Model 2: $Y = X + 1.4\exp(-0.2X^2) + \varepsilon$,
Model 3: $Y = XI(X \ge 0.2) + \varepsilon$.
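The four data-generating models can be sketched as follows, under the Case 1 design described earlier (uniform $X$, normal regression error with standard deviation 0.1, double exponential measurement error with variance 0.01). The function name, the seed, and the sample size are illustrative assumptions rather than the settings used for the reported tables.

```python
import numpy as np

def simulate_case1(model, n, rng):
    """Draw (X, Z, Y) under the Case 1 design for Model 0, 1, 2 or 3."""
    X = rng.uniform(-1.0, 1.0, n)
    eps = rng.normal(0.0, 0.1, n)                  # regression error, N(0, 0.1^2)
    u = rng.laplace(0.0, np.sqrt(0.005), n)        # Laplace error, variance 2 * 0.005 = 0.01
    if model == 0:
        mean = X
    elif model == 1:
        mean = X + 0.3 * X**2
    elif model == 2:
        mean = X + 1.4 * np.exp(-0.2 * X**2)
    elif model == 3:
        mean = X * (X >= 0.2)
    else:
        raise ValueError("model must be 0, 1, 2 or 3")
    return X, X + u, mean + eps

rng = np.random.default_rng(2006)
X0, Z0, Y0 = simulate_case1(0, 10_000, rng)
X3, Z3, Y3 = simulate_case1(3, 10_000, rng)
```

In an actual experiment only the pairs (Z, Y) would be passed to the test, since X is unobserved; X is returned here only so that the generated data can be checked.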
To assess the effect of the choice of $(a, b)$ that appear in the bandwidths on the
level and power, we ran the simulations for numerous choices of $(a, b)$, ranging from
0.3 to 1. Table 1.2 reports the simulation results pertaining to $\hat D_n$ for six choices
of $(a, b)$. The simulation results for the other choices were similar to those reported
here. Data from Model 0 in this table are used to study the empirical sizes, and data from
Models 1 to 3 are used to study the empirical powers of the test. These quantities are
obtained by computing $\#\{|\hat D_n| \ge 1.96\}/1000$.
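The rejection frequency above is simply the proportion of Monte Carlo replications in which the absolute value of the statistic reaches the two-sided normal critical value. A minimal sketch, with a hypothetical function name and toy input values:

```python
import numpy as np

def rejection_rate(stats, critical_value=1.96):
    """Proportion of replications with |statistic| >= critical_value."""
    stats = np.asarray(stats, dtype=float)
    return np.mean(np.abs(stats) >= critical_value)

# Toy input: four hypothetical statistic values, two of which exceed 1.96 in absolute value.
example_rate = rejection_rate([0.3, -2.5, 1.0, 2.1])
```

Applied to statistics simulated under Model 0 this gives the empirical level, and under Models 1 to 3 the empirical power.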

From Table 1.2, one sees that the empirical level is sensitive to the choice of $(a, b)$
for moderate sample sizes ($n_1 \le 200$) but gets closer to the asymptotic level of 0.05
as the sample size increases, and hence is stable over the chosen values of
$(a, b)$ for large sample sizes. On the other hand, the empirical power appears to be far
less sensitive to the values of $(a, b)$ for sample sizes of 100 and more. Even though
the theory is not applicable to Model 3, it was included here to see the effect of the
discontinuity in the regression function on the power of the minimum distance test.
In our simulation, the discontinuity of the regression function has little effect on the power of
the minimum distance test.

Case 2: The measurement error in this case has the normal distribution $N(0, (0.1)^2)$. By
Example 1 in Section 2, assumption (m1) is satisfied with $\alpha = 1$. Hence,
by the sample allocation scheme (n), the sample sizes are $n_2 = [n_1^b]$, $b > 7/4$. In the
simulation, we choose $b = 7/4 + 0.0001$. The bandwidths are chosen to be

$$h_1 = n_1^{-1/3}, \qquad h_2 = (\log(n_1)/n_1)^{1/5}, \qquad w_1 = n_2^{-1/7}, \qquad w_2 = (\log(n_2)/n_2)^{1/5}$$

by assumptions (h1), (h2), (w1) and (w2). The kernel functions $K$, $K^*$ are the
same as in the first case, while the density function $L$ has a Fourier transform given
by $\phi_L(t) = \max\{(1 - t^2)^3, 0\}$; the corresponding deconvolution kernel function then
takes the form

$$L_w(x) = \frac{1}{\pi}\int_0^1\cos(tx)(1 - t^2)^3\exp(0.005t^2/w^2)\,dt.$$
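This kernel has no simple closed form, but the integral is over a bounded interval and can be evaluated numerically. The sketch below uses a midpoint rule; the function name, the number of nodes, and the evaluation points are assumptions made for illustration.

```python
import numpy as np

def deconv_kernel_normal(x, w, n_nodes=20000):
    """L_w(x) = (1/pi) * int_0^1 cos(t x) (1 - t^2)^3 exp(0.005 t^2 / w^2) dt (midpoint rule)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = (np.arange(n_nodes) + 0.5) / n_nodes       # midpoints of a uniform grid on [0, 1]
    integrand = (np.cos(np.outer(x, t))
                 * (1.0 - t**2) ** 3
                 * np.exp(0.005 * t**2 / w**2))
    return integrand.mean(axis=1) / np.pi

w1 = 941 ** (-1.0 / 7.0)                           # Case 2 bandwidth for n2 = 941
values = deconv_kernel_normal([0.0, 1.0, 2.0], w1)

# With a very large w the exponential factor is essentially 1,
# and L_w(0) reduces to (1/pi) * int_0^1 (1 - t^2)^3 dt = 16 / (35 * pi).
limit_at_zero = deconv_kernel_normal(0.0, 1e6)[0]
```

The compact support of $\phi_L$ is what keeps the integrand bounded here: for normal measurement error the factor $\exp(0.005t^2/w^2)$ grows rapidly in $t$ when $w$ is small.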

Table 1.3 reports the Monte Carlo mean and the MSE of the MD estimator $\hat\theta_n$
under $H_0$. One can see that there appears to be a small bias in $\hat\theta_n$ for all chosen sample
sizes and, as expected, the MSE decreases as the sample size increases.

To assess the level and power behavior of the $\hat D_n$ test, we chose the following four
models to simulate data from.

Model 0: $Y = X + \varepsilon$,
Model 1: $Y = X + 0.3X^2 + \varepsilon$,
Model 2: $Y = X + 1.4\exp(-0.2X^2) + \varepsilon$,
Model 3: $Y = XI(X \ge 0.2) + \varepsilon$.

Table 1.4 reports the simulation results pertaining to $\hat D_n$. Data from Model 0 in
this table are used to study the empirical sizes, and data from Models 1 to 3 are used to
study the empirical powers of the test.

Case 3: This simulation considers the case of $d = 1$, $q = 2$. Everything here is the same
as in Case 1 except that the null model we want to test is $m_\theta(X) = \theta_1X + \theta_2X^2$. The
true parameters are $\theta_1 = 1$, $\theta_2 = 2$. It is easy to see that $\hat R_{n_2}(z)$ takes the form

$$\hat R_{n_2}(z) = \Big(\int x\hat f_{Xw_1}(x)f_u(z - x)\,dx,\ \int x^2\hat f_{Xw_1}(x)f_u(z - x)\,dx\Big)^T.$$

Table 1.5 reports the Monte Carlo mean and the MSE of the MD estimator $\hat\theta_n =
(\hat\theta_{n1}, \hat\theta_{n2})$ under $H_0$. One can see that there appears to be a small bias in $\hat\theta_n$ for all chosen
sample sizes and, as expected, the MSE decreases as the sample size increases.

 

Model     (a, b)       (n1, n2):  (50, 134)   (100, 317)   (200, 753)   (300, 1250)   (500, 2366)
Model 0   (0.3, 0.5)              0.003       0.008        0.009        0.020         0.041
          (0.3, 0.8)              0.008       0.014        0.017        0.031         0.053
          (0.5, 0.5)              0.010       0.011        0.020        0.030         0.049
          (0.8, 0.8)              0.020       0.024        0.027        0.042         0.052
          (1.0, 0.8)              0.024       0.028        0.026        0.039         0.050
          (1.0, 1.0)              0.028       0.037        0.030        0.048         0.054
Model 1   (0.3, 0.5)              0.407       0.865        0.987        0.997         1.000
          (0.3, 0.8)              0.491       0.888        0.990        0.998         1.000
          (0.5, 0.5)              0.704       0.975        0.999        1.000         1.000
          (0.8, 0.8)              0.896       0.997        1.000        1.000         1.000
          (1.0, 0.8)              0.921       0.999        1.000        1.000         1.000
          (1.0, 1.0)              0.926       0.997        1.000        1.000         1.000
Model 2   (0.3, 0.5)              0.898       0.972        0.999        0.999         1.000
          (0.3, 0.8)              0.919       0.976        0.999        0.999         1.000
          (0.5, 0.5)              0.985       0.999        0.999        1.000         1.000
          (0.8, 0.8)              0.998       1.000        1.000        1.000         1.000
          (1.0, 0.8)              0.999       1.000        1.000        1.000         1.000
          (1.0, 1.0)              0.999       1.000        1.000        1.000         1.000
Model 3   (0.3, 0.5)              0.774       0.959        0.993        0.998         1.000
          (0.3, 0.8)              0.807       0.964        0.993        0.998         1.000
          (0.5, 0.5)              0.933       0.966        0.999        1.000         1.000
          (0.8, 0.8)              0.999       1.000        1.000        1.000         1.000
          (1.0, 0.8)              0.992       1.000        1.000        1.000         1.000
          (1.0, 1.0)              0.988       1.000        1.000        1.000         1.000

Table 1.2: Levels and powers of the MD test, d = 1, q = 1, Double Exponential

 

 

 

 

 

 

 

 

 

 

(n1, n2)   (50, 941)   (100, 3164)   (200, 10643)   (300, 21638)   (500, 52902)
Mean       1.0051      1.0078        1.0085         1.0101         1.0169
MSE        0.0013      0.0007        0.0004         0.0003         0.0004

Table 1.3: Mean and MSE of $\hat\theta_n$, d = 1, q = 1, Normal

Model \ (n1, n2)   (50, 941)   (100, 3164)   (200, 10643)   (300, 21638)   (500, 52902)
Model 0            0.018       0.022         0.029          0.035          0.049
Model 1            0.918       0.999         1.000          1.000          1.000
Model 2            0.999       1.000         1.000          1.000          1.000
Model 3            0.993       1.000         1.000          1.000          1.000

Table 1.4: Levels and powers of the MD test, d = 1, q = 1, Normal

(n1, n2)                     (50, 134)   (100, 317)   (200, 753)   (300, 1250)   (500, 2366)
Mean of $\hat\theta_{n1}$    1.0169      1.0144       1.0139       1.0136        1.0128
MSE of $\hat\theta_{n1}$     0.0058      0.0031       0.0015       0.0011        0.0007
Mean of $\hat\theta_{n2}$    2.0450      2.0452       2.0463       2.0493        2.0473
MSE of $\hat\theta_{n2}$     0.0124      0.0076       0.0046       0.0042        0.0033

Table 1.5: Mean and MSE of $\hat\theta_n$, d = 1, q = 2, Double Exponential

 

 

 

 

Model \ (n1, n2)   (50, 134)   (100, 317)   (200, 753)   (300, 1250)   (500, 2366)
Model 0            0.001       0.009        0.019        0.029         0.046
Model 1            0.297       0.815        0.999        1.000         1.000
Model 2            0.528       0.965        0.999        1.000         1.000
Model 3            0.996       0.999        1.000        1.000         1.000

Table 1.6: Levels and powers of the MD test, d = 1, q = 2, Double Exponential

To assess the level and power behavior of the $\hat D_n$ test, we chose the following four
models to simulate data from.
Model 0: $Y = X + 2X^2 + \varepsilon$,
Model 1: $Y = X + 2X^2 + 0.3X^3 + 0.1 + \varepsilon$,
Model 2: $Y = X + 2X^2 + 1.4\exp(-0.2X^2) + \varepsilon$,
Model 3: $Y = X + 2X^2\sin(X) + \varepsilon$.

Table 1.6 reports the simulation results pertaining to $\hat D_n$. Data from Model 0 in
this table are used to study the empirical sizes, and data from Models 1 to 3 are used to
study the empirical powers of the test.

Case 4: This simulation considers the case of $d = 2$, $q = 2$. The null model we want
to test is $m_\theta(X) = \theta_1X_1 + \theta_2X_2$. The true parameters are $\theta_1 = 1$, $\theta_2 = 2$. The
kernel functions $K$ and $K^*$ and the bandwidths used in the simulation are

$$K(z_1, z_2) = K^*(z_1, z_2) = \frac{9}{16}(1 - z_1^2)(1 - z_2^2)I(|z_1| \le 1, |z_2| \le 1), \qquad (1.26)$$

$$h_1 = n_1^{-1/5}, \qquad h_2 = n_1^{-1/6}(\log n_1)^{1/6}.$$

 

 

 

 

 

 

(n1, n2)                      (50, 354)   (100, 1001)   (200, 2830)   (300, 5200)   (500, 11188)
Mean of $\hat\theta_{n1}$     1.0099      1.0120        1.0115        1.0094        1.0113
MSE of $\hat\theta_{n1}$      0.0042      0.0019        0.0011        0.0008        0.0005
Mean of $\hat\theta_{n2}$     2.0202      2.0220        2.0213        2.0225        2.0209
MSE of $\hat\theta_{n2}$      0.0042      0.0027        0.0014        0.0011        0.0008

Table 1.7: Mean and MSE of $\hat\theta_n$, d = 2, q = 2, Double Exponential

For the chosen kernel function (1.26), the constant C in $\hat\Gamma_n$ is equal to 0.292. The
kernel function used in (1.4) is chosen to be the bivariate standard normal density, so
the deconvolution kernel function with bandwidth w takes the form

$$L_w(x) = \frac{1}{2\pi}\exp\Big(-\frac{x_1^2 + x_2^2}{2}\Big)\Big[1 - \frac{0.005(x_1^2 - 1)}{w^2}\Big]\Big[1 - \frac{0.005(x_2^2 - 1)}{w^2}\Big].$$

Since (m1) holds for $\alpha = 0$, the bandwidths $w_1 = n_2^{-1/6}$ and
$w_2 = (\log(n_2)/n_2)^{1/6}$ are chosen according to assumptions (w1) and (w2).
According to assumption (n), we take $n_2 = n_1^{1.5001}$.
Table 1.7 reports the Monte Carlo mean and the MSE of the MD estimator
$\hat\theta_n = (\hat\theta_{n1}, \hat\theta_{n2})$ under $H_0$. There appears to be a small bias in
$\hat\theta_n$ for all chosen sample sizes and, as expected, the MSE decreases as the
sample size increases.
To assess the level and power behavior of the MD test, we simulated data from the
following four models.
Model 0: $Y = X_1 + 2X_2 + \varepsilon$,
Model 1: $Y = X_1 + 2X_2 + 0.3X_1X_2 + 0.9 + \varepsilon$,
Model 2: $Y = X_1 + 2X_2 + 1.4(\exp(-0.2X_1) - \exp(0.7X_2)) + \varepsilon$,
Model 3: $Y = X_1 I(X_2 \ge 0.2) + \varepsilon$.

 

(n1, n2)
Model      (50, 354)   (100, 1001)   (200, 2830)   (300, 5200)   (500, 11188)
Model 0    0.002       0.012         0.018         0.016         0.038
Model 1    0.908       0.998         1.000         1.000         1.000
Model 2    0.992       0.999         1.000         1.000         1.000
Model 3    0.935       0.996         1.000         1.000         1.000

Table 1.8: Levels and powers of the MD test, d = 2, q = 2, Double Exponential

Table 1.8 reports the simulation results for the MD test. Data from Model 0 are
used to study the empirical sizes, and data from Models 1 to 3 are used to study
the empirical powers of the test.

1.6 Discussion

1.6.1 Sample Size Allocation

The simulation studies show that the proposed testing procedures are quite satisfactory
in preserving the finite sample level and in terms of power. However, the proofs of
the above theorems require the sample size allocation assumption (n) to ensure that
the estimator $\hat Q_{n_2}(z)$ has a faster convergence rate. Assumption (n) plays a very
important role in the theoretical argument, but it is unattractive to practitioners.
For example, in simulation Case 1, where the measurement error follows a double
exponential distribution, the sample size allocation is $n_2 = [n_1^b]$ with b = 1.2501,
so the size $n_2$ of the second subsample $S_2$ grows as a power of the size $n_1$ of the
first subsample. If $n_1 = 500$, then $n_2$ is at least 2365 and the full sample size is at
least 2865, which is perhaps not easily available in practice. The situation becomes
even worse when the measurement error is super smooth or d > 1. For example, in
Case 2, where the measurement error has a normal distribution, $n_2$ is at least 52902
if $n_1 = 500$; in Case 4, where d = 2, $n_2$ is at least 11188 if $n_1 = 500$.
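
The growth of $n_2$ under the allocation rule can be checked numerically. The following
Python sketch computes $n_2$ from $n_1$ and the exponent b. The function name
`second_subsample_size` is hypothetical, and rounding up to the next integer is an
assumption inferred from the sample sizes listed in the tables; the text only writes
$[n_1^b]$ without stating the rounding convention.

```python
import math


def second_subsample_size(n1: int, b: float) -> int:
    """Return n1**b rounded up to the next integer (the rounding is an assumption)."""
    return math.ceil(n1 ** b)


# Exponents quoted in the text: b = 1.2501 and, for Case 4, b = 1.5001.
for n1 in (50, 100, 200, 300, 500):
    print(n1, second_subsample_size(n1, 1.2501), second_subsample_size(n1, 1.5001))
```

Under this rounding, b = 1.5001 reproduces the Case 4 pairs such as (100, 1001) and
(500, 11188), which illustrates how quickly the required second subsample grows.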
An interesting question then arises: what is the small sample behavior of the test
procedure if (1) $n_1 = n_2$ and the two subsamples $S_1$ and $S_2$ are independent, or
(2) $n = n_1 = n_2$ and the same sample is used in the test? We have no theory at this
point about the asymptotic behavior of $M_n(\hat\theta_n)$ in these settings. For d = 1, we
only conduct some Monte Carlo simulations to examine the performance of the test
procedure; see Tables 1.9 to 1.12. These tables report the levels and powers of the
MD test when the measurement error follows the same double exponential and normal
distributions as in the previous section, and the null and alternative models are the
same as in Case 1.

Sample size: (n1, n2)
Model      (50, 50)   (100, 100)   (200, 200)   (300, 300)   (500, 500)
Model 0    0.008      0.036        0.033        0.038        0.049
Model 1    0.938      1.000        1.000        1.000        1.000
Model 2    1.000      1.000        1.000        1.000        1.000
Model 3    0.990      1.000        1.000        1.000        1.000

Table 1.9: $n_1 = n_2$, d = 1, q = 1, Double exponential

 

Sample size
Model      50      100     200     300     500
Model 0    0.015   0.024   0.036   0.043   0.047
Model 1    0.934   1.000   1.000   1.000   1.000
Model 2    0.999   1.000   1.000   1.000   1.000
Model 3    0.991   1.000   1.000   1.000   1.000

Table 1.10: Same sample, d = 1, q = 1, Double exponential

 

Sample size: (n1, n2)
Model      (50, 50)   (100, 100)   (200, 200)   (300, 300)   (500, 500)
Model 0    0.013      0.023        0.027        0.035        0.047
Model 1    0.931      0.999        1.000        1.000        1.000
Model 2    1.000      1.000        1.000        1.000        1.000
Model 3    0.984      1.000        1.000        1.000        1.000

Table 1.11: $n_1 = n_2$, d = 1, q = 1, Normal

Sample size
Model      50      100     200     300     500
Model 0    0.017   0.019   0.036   0.036   0.051
Model 1    0.954   0.998   1.000   1.000   1.000
Model 2    0.999   1.000   1.000   1.000   1.000
Model 3    0.992   1.000   1.000   1.000   1.000

Table 1.12: Same sample, d = 1, q = 1, Normal

 

Sample size
Model      50      100     200     300     500
Model 0    0.000   0.004   0.010   0.018   0.041
Model 1    0.628   0.996   1.000   1.000   1.000
Model 2    0.994   0.999   1.000   1.000   1.000
Model 3    0.844   0.998   1.000   1.000   1.000

Table 1.13: Same sample, d = 2, q = 2, Double Exponential

To our surprise, the simulation results for the first three cases, in which d = 1, are
very good. There is almost no difference between the simulation results obtained
under the allocation scheme required by our theory and those obtained by ignoring it.
In Case 4 with d = 2, we only conduct the simulation for $S_1 = S_2$; see Table 1.13.
The test procedure is conservative for small sample sizes, but the empirical level is
close to the nominal level 0.05 when the sample size reaches 500. This phenomenon
suggests that Theorem 1.3.1 and Theorem 1.4.1 may still be valid after relaxing some
conditions, such as (n), and perhaps even the assumptions on the choices of the
bandwidths.

1.6.2 General Errors-in-Variables Model Fitting

In the previous sections we have discussed the model fitting problem in the
errors-in-variables models in which the regression function is linear in $\theta$ under the
null hypothesis. The separation between the parameter and the predictor enables
us not only to get an explicit expression for the estimator, but also to utilize a
conditional expectation argument, so that we can use Lemma 1.2.1 to get a better
sample allocation scheme. If the regression function under the null hypothesis has
a general form other than the one discussed in this chapter, things become
complicated.

For the sake of brevity, this section only reports the results we obtained for the
general errors-in-variables model fitting.

To be specific, in the errors-in-variables model (1.1), the problem of interest is to
develop tests for the following hypotheses:

$$H_0: \mu(x) = m_{\theta_0}(x), \text{ for some } \theta_0 \in \Theta, \quad \text{vs.} \quad H_1: H_0 \text{ is not true}, \tag{1.27}$$

where $\{m_\theta(x) : \theta \in \Theta\}$ is a given parametric family. Just as in the special case
considered in the previous sections, the problem of testing $H_0$ is transformed into
testing $\nu(z) = \nu_{\theta_0}(z)$, where now $\nu_\theta(z) := E(m_\theta(X)|Z = z)$. A very important
question related to this change of hypothesis is the following: are the two hypotheses
$H_{10}: \mu(x) = m_{\theta_0}(x)$, for some $\theta_0$ and all x, and $H_{20}: \nu(z) = \nu_{\theta_0}(z)$, for some
$\theta_0$ and all z, equivalent? The answer is negative in general, because for two
measurable functions $m_1(x)$, $m_2(x)$, $E(m_1(X)|Z = z) = E(m_2(X)|Z = z)$ for all z
need not imply $m_1(x) = m_2(x)$ for all x. In this case, if our test rejects $H_{20}$, then
we can reject $H_{10}$ as well, but if the test fails to reject $H_{20}$, then we can say nothing
about $H_{10}$. Note that $E(m_1(X)|Z = z) = E(m_2(X)|Z = z)$ is equivalent to

$$\int m_1(x) f_X(x) f_u(z - x)\,dx = \int m_2(x) f_X(x) f_u(z - x)\,dx$$

for all z. Hence if $f_u(z - \cdot)$, as a family of distributions with parameter $z \in R^d$,
forms a complete family, then these two hypotheses are indeed equivalent. This is the
case, for example, for the normal distribution and, if d = 1, for the double exponential
distribution.

From (1.3) one sees that if $f_X$ is known then $f_Z$ is known and hence $\nu_\theta$ is known
except for $\theta$. Therefore a modification of K-N's procedure in this case is as follows.

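Let

$$T_n^*(\theta) := \int \Big[\frac{1}{n f_Z(z)}\sum_{i=1}^n K_{hi}(z) Y_i - \nu_\theta(z)\Big]^2 dG(z), \quad \theta \in \Theta, \tag{1.28}$$

$$T_n(\theta) := \int \Big[\frac{1}{n f_Z(z)}\sum_{i=1}^n K_{hi}(z)\big(Y_i - \nu_\theta(Z_i)\big)\Big]^2 dG(z), \quad \theta \in \Theta,$$

$$\theta_n^* := \arg\min_{\theta\in\Theta} T_n^*(\theta), \qquad \hat\theta_n := \arg\min_{\theta\in\Theta} T_n(\theta).$$

Here h is a bandwidth depending only on n. Then we may use $\hat\theta_n$ to estimate $\theta$, and
construct the test statistic through $T_n(\hat\theta_n)$.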

Unfortunately, $f_X$ is generally not known and hence $f_Z$ and $\nu_\theta$ are unknown.
This makes the above procedures infeasible. To construct the test statistic, one needs
estimators for $f_Z$ and $\nu_\theta$. For $f_Z$, one can still use the classical kernel estimator, with
a possibly different kernel function K* and a bandwidth $h_2$. So one only needs to find
an estimator for $\nu_\theta$. Using the deconvolution kernel density estimator $\hat f_{Xh_3}$ of $f_X$
with bandwidth $h_3$, one can estimate $\nu_\theta(z)$ by

$$\hat\nu_\theta(z) = \frac{\int m_\theta(x)\hat f_{Xh_3}(x) f_u(z - x)\,dx}{\hat f_{Zh_3}(z)}, \qquad \hat f_{Zh_3}(z) = \int \hat f_{Xh_3}(x) f_u(z - x)\,dx.$$

Our proposed inference procedures will be based on the analogs of $T_n$ where $\nu_\theta(z)$ in
(1.28) is replaced by its estimator $\hat\nu_\theta(z)$.

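To be precise, we use the first $n_1 = n_1(n)$, $n_1 < n$, observations to estimate
$f_Z$, and use all n observations to estimate $f_X$. The bandwidths $h_1$, $h_2$ will depend
on the subsample size $n_1$, and $h_3$ will still depend on the full sample size n.

Replace $\nu_\theta(z)$ in (1.28) by its estimator $\hat\nu_\theta(z)$ and define

$$M_n^*(\theta) := \int \Big[\frac{1}{n_1\hat f_{Zh_2}(z)}\sum_{i=1}^{n_1} K_{h_1i}(z) Y_i - \hat\nu_\theta(z)\Big]^2 dG(z),$$

$$M_n(\theta) := \int \Big[\frac{1}{n_1\hat f_{Zh_2}(z)}\sum_{i=1}^{n_1} K_{h_1i}(z)\big(Y_i - \hat\nu_\theta(Z_i)\big)\Big]^2 dG(z), \quad \theta \in \Theta,$$

$$\theta_n^* := \arg\inf_{\theta\in\Theta} M_n^*(\theta), \qquad \hat\theta_n := \arg\inf_{\theta\in\Theta} M_n(\theta).$$

Then we may use $\hat\theta_n$ to estimate $\theta$, and construct the test statistic through $M_n(\hat\theta_n)$.
We can show that $\theta_n^*$ converges to $\theta_0$ in probability. As is clear, $\theta_n^*$ is really not an
estimator, but we need this convergence result to prove the consistency of $\hat\theta_n$ for $\theta_0$
and the asymptotic normality of $\sqrt{n_1}(\hat\theta_n - \theta_0)$. Finally, let g be the density of G, and
let

$$\zeta_i := Y_i - \nu_{\theta_0}(Z_i), \qquad \hat\zeta_i := Y_i - \hat\nu_{\hat\theta_n}(Z_i),$$

$$\tilde C_n := n_1^{-2}\sum_{i=1}^{n_1}\int K_{h_1i}^2(z)\,\zeta_i^2\,d\psi(z), \qquad \hat C_n := n_1^{-2}\sum_{i=1}^{n_1}\int K_{h_1i}^2(z)\,\hat\zeta_i^2\,d\hat\psi_{h_2}(z),$$

$$\hat\Gamma_n := 2h_1^d n_1^{-2}\sum_{i\ne j}\Big(\int K_{h_1i}(z)K_{h_1j}(z)\,\hat\zeta_i\hat\zeta_j\,d\hat\psi_{h_2}(z)\Big)^2,$$

$$\tau^2(z) := \sigma_\varepsilon^2 + E\big((m_{\theta_0}(X) - \nu_{\theta_0}(Z))^2 \,\big|\, Z = z\big), \qquad \sigma_\varepsilon^2 := \mathrm{Var}(\varepsilon),$$

$$\Gamma := 2\int (\tau^2(z))^2 g(z)\,d\psi(z)\cdot\int\Big(\int K(u)K(u+v)\,du\Big)^2 dv,$$

$$d\psi(z) := \frac{dG(z)}{f_Z^2(z)}, \qquad d\hat\psi_{h_2}(z) := \frac{dG(z)}{\hat f_{Zh_2}^2(z)}.$$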

Under an appropriate sample size allocation scheme, and under the null hypothesis
and other regularity conditions, we can show that the asymptotic distribution of
$n_1 h_1^{d/2}\hat\Gamma_n^{-1/2}(M_n(\hat\theta_n) - \hat C_n)$ is standard normal. But the sample allocation scheme
$n_1 = n_1(n)$ is not feasible, particularly in the super smooth case. Simulation results
show that, if we do not follow the sample allocation scheme, just as in the previous
section, the test statistic still behaves quite satisfactorily.

CHAPTER 2

Minimum Distance Berkson Model Fitting

2.1 Introduction

The Berkson model is also commonly used in real applications. As an example,
consider the herbicide study of Rudemo, et al. (1989) in which a nominal measured
amount Z of herbicide was applied to a plant but the actual amount X absorbed by the
plant is unobservable. As another example, from Wang (2004), an epidemiologist
studies the severity of a lung disease, Y, among the residents in a city in relation to
the amount of certain air pollutants, X. The amount of the air pollutants, Z, can be
measured at certain observation stations in the city, but the actual exposure of the
residents to the pollutants, X, is unobservable and may vary randomly from the
Z-values. In both cases, X can be expressed as Z plus a random error. There are many
similar examples in agricultural or medical studies; see, e.g., Fuller (1987), Carroll,
Ruppert and Stefanski (1995), among others.

All these examples can be formalized into the so-called Berkson model

$$Y = \mu(X) + \varepsilon, \qquad X = Z + \eta, \tag{2.1}$$

where $\eta$ and $\varepsilon$ are random errors with $E\varepsilon = 0$, $\eta$ is d-dimensional, and Z
is the observable d-dimensional control variable. All three variables $\varepsilon$, $\eta$, and Z are
assumed to be mutually independent.

The parametric Berkson model, where the regression function is of a parametric
form $\{m_\theta(x) : x \in R^d, \theta \in \Theta \subset R^q\}$, $q \ge 1$, has been the focus of numerous authors.
Fuller (1987) and Cheng and Van Ness (1999), among others, discuss estimation
in the linear Berkson measurement error models. For nonlinear models, Carroll et al.
(1995) and references therein consider the estimation problem using the regression
calibration method. Huwang and Huang (2000) study the estimation problem when
$m_\theta(x)$ is a polynomial in x of a known order and show that the least squares estimators
based on the first two conditional moments of Y, given Z, are consistent. Wang (2003,
2004) addresses the same problem in general nonlinear models and shows that the
minimum distance estimators based on the first two conditional moments of Y, given Z,
are consistent and asymptotically normal.

But the literature appears to be scant on the lack-of-fit testing problem in this
important model. This paper makes an attempt at filling this void. To be precise, with
(X, Y) obeying the model (2.1), the problem of interest here is to test the hypothesis

$$H_0: \mu(x) = m_{\theta_0}(x), \text{ for some } \theta_0 \in \Theta \text{ and for all } x; \qquad H_1: H_0 \text{ is not true},$$

based on a random sample $(Z_i, Y_i)$, $1 \le i \le n$, from the distribution of (Z, Y).

In contrast, many interesting and profound results are available for the regression
model checking problem in the absence of errors in the independent variables;
see, e.g., Eubank and Spiegelman (1990), An and Cheng (1991), Hart (1997) and
references therein, Stute (1997), Stute, Thies, and Zhu (1998), among others. The
recent paper of Koul and Ni (2004) uses the minimum distance methodology to
propose tests of lack-of-fit for the regression model without errors in variables. In a
finite sample comparison of these tests with some other existing tests, they noted that
a member of this class preserves the asymptotic level and has very high power against
some alternatives compared to some other existing lack-of-fit tests. This paper
extends this methodology to the above Berkson model.

To be specific, Koul and Ni (2004) (K-N) considered the following tests of $H_0$
where the design is random and observable, and the errors are heteroscedastic. For
any density kernel K, let $K_h(x) := K(x/h)/h^d$, h > 0, $x \in R^d$. Define, as in K-N,

$$\hat f_w(x) := \frac{1}{n}\sum_{i=1}^n K_w^*(x - X_i), \qquad w = w_n \sim (\log n/n)^{1/(d+4)},$$

$$T_n(\theta) := \int_C \Big[\frac{1}{n}\sum_{j=1}^n K_h(x - X_j)\big(Y_j - m_\theta(X_j)\big)\Big]^2 \frac{dG(x)}{\hat f_w^2(x)},$$

and $\hat\theta_n := \arg\min_{\theta\in\Theta} T_n(\theta)$, where K, K* are density kernel functions, possibly
different, $h = h_n$ and $w = w_n$ are the window widths, depending on the sample size
n, and G is a σ-finite measure on C, a compact subset of $R^d$. They proved
the consistency and asymptotic normality of this estimator, and that the asymptotic
null distribution, under $H_0$, of $\hat D_n := nh^{d/2}(T_n(\hat\theta_n) - \hat C_n)/\hat\Gamma_n^{1/2}$ is standard normal,
where

$$\hat C_n := \frac{1}{n^2}\sum_{i=1}^n \int K_h^2(x - X_i)\,\hat\varepsilon_i^2\,\hat f_w^{-2}(x)\,dG(x), \qquad \hat\varepsilon_i := Y_i - m_{\hat\theta_n}(X_i),$$

$$\hat\Gamma_n := h^d n^{-2}\sum_{i\ne j}\Big(\int K_h(x - X_i)K_h(x - X_j)\,\hat\varepsilon_i\hat\varepsilon_j\,\hat f_w^{-2}(x)\,dG(x)\Big)^2.$$

The test based on $\hat D_n$ is preferable to the tests developed by Härdle and Mammen
(1993) and Zheng (1996). Unlike in these and other related papers, K-N do not
need the null regression function to be twice continuously differentiable in the
parameter vector, nor do their proofs need the rate for uniform consistency of
nonparametric regression function estimators. Moreover, the asymptotic normality of
$n^{1/2}(\hat\theta_n - \theta_0)$ and $\hat D_n$ was made feasible by using different window widths for the
estimation of the numerator and denominator in the nonparametric regression function
estimation. A consequence of the above asymptotic normality result is that, at least
for large samples, one does not need to use any resampling method to implement these
tests.
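
Because the null distribution is standard normal, carrying out the test reduces to a
standardization followed by a comparison with a normal quantile. The sketch below
illustrates this arithmetic only. The function names are hypothetical, the inputs are
made-up numbers, and the two-sided rejection rule is an assumption; the text states the
limiting distribution but not the rejection region.

```python
import math
from statistics import NormalDist


def standardized_md_statistic(n, h, d, t_min, c_hat, gamma_hat):
    """Return n * h**(d/2) * (T_n(theta_hat) - C_hat) / sqrt(Gamma_hat)."""
    return n * h ** (d / 2.0) * (t_min - c_hat) / math.sqrt(gamma_hat)


def reject_null(stat, alpha=0.05):
    """Two-sided normal test (assumed rule): reject when |stat| exceeds the quantile."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(stat) > critical


# Made-up inputs, used only to show the computation.
stat = standardized_md_statistic(n=100, h=0.2, d=1, t_min=0.05, c_hat=0.04, gamma_hat=0.01)
print(round(stat, 4), reject_null(stat))
```

No bootstrap or other resampling step appears here, which is the practical advantage
noted above for large samples.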

These findings motivate one to look for tests of lack-of-fit in the Berkson
model based on the above minimized distances. Since the predictors in Berkson
models are unobservable, the above procedures clearly need some modifications.

Let $f_\varepsilon$, $f_X$, $f_\eta$, $f_Z$ denote the density functions of the r.v.'s in their subscripts and
$\sigma_\varepsilon^2$ denote the variance of $\varepsilon$. In linear regression models, if one is interested in making
inference about the coefficient parameters only, these density functions need not be
known. Berkson (1950) pointed out that the ordinary least squares estimators are
unbiased and consistent in these models and one can simply ignore the measurement
error $\eta$. But if the regression model is nonlinear or if there are other parameters in
the Berkson model that need to be estimated, then extra information about these
densities should be supplied to ensure identifiability. A standard assumption
in the literature is that $f_\eta$ is known, or unknown only up to a Euclidean
parameter vector; cf. Carroll, et al. (1995), Huwang and Huang (2000), Wang (2004),
among others. Throughout this paper, we shall assume that $f_\eta$ is known unless the
regression function under the null hypothesis is linear.

To adapt K-N's procedure to the current setup, we first need to obtain a
nonparametric estimator of $\mu$. Note that in the model (2.1), $f_X(x) = \int f_Z(z) f_\eta(x - z)\,dz$.
Let K be a kernel density,

$$\hat f_Z(z) = n^{-1}\sum_{i=1}^n K_h(z - Z_i)$$

be the kernel estimator of $f_Z(z)$, and

$$R_h(x, z) := \int K_h(y - z) f_\eta(x - y)\,dy, \quad x, z \in R^d.$$

It is then natural to estimate $f_X(x)$ by

$$\hat f_X(x) = \int \hat f_Z(z) f_\eta(x - z)\,dz = \frac{1}{n}\sum_{i=1}^n R_h(x, Z_i), \quad x \in R^d.$$

Given the estimator $\hat f_X(x)$, one is then tempted to estimate the regression function
$\mu(x)$ by

$$\hat J_n(x) = \frac{\sum_{i=1}^n R_h(x, Z_i) Y_i}{n\hat f_X(x)}.$$

Unfortunately, the classical argument shows that $\hat J_n(x)$ is not a consistent estimator of
$\mu(x)$. It is in fact consistent for $J(x) = E[H(Z)|X = x]$, where $H(z) = E[\mu(X)|Z = z]$.

We include the following simulation study to illustrate this point. Consider the
model $Y = X^2 + \varepsilon$, $X = Z + \eta$, where $\varepsilon$ and $\eta$ are Gaussian r.v.'s with means zero
and variances 0.01 and 0.05, respectively. The r.v. Z is standard Gaussian. Then
$J(x) = 0.0976 + 0.907x^2$. We generated 500 samples from this model, calculated $\hat J_n$,
and then put all three graphs, $\hat J_n(x)$, $\mu(x) = x^2$, and $J(x) = 0.0976 + 0.907x^2$, into one
plot in Figure 2.1. The curves with solid, dash-dot, and dotted lines are those of $\hat J_n$,
J(x), and $\mu(x) = x^2$, respectively.

Figure 2.1: Comparison Plot
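
The coefficients of J(x) in this example follow from the Gaussian conditional
distribution of Z given X. The short sketch below reproduces them; the function name
`berkson_quadratic_j` is hypothetical, and the derivation assumes exactly the setup of
the example (independent zero-mean Gaussian Z and $\eta$, and $\mu(x) = x^2$).

```python
def berkson_quadratic_j(var_z: float, var_eta: float):
    """Coefficients (a, b) of J(x) = a + b * x**2 when mu(x) = x**2.

    Assumes Z ~ N(0, var_z) and eta ~ N(0, var_eta) are independent and X = Z + eta.
    Then H(z) = z**2 + var_eta and Z | X = x is normal with mean shrink * x.
    """
    shrink = var_z / (var_z + var_eta)               # E[Z | X = x] = shrink * x
    cond_var = var_z * var_eta / (var_z + var_eta)   # Var(Z | X = x)
    intercept = var_eta + cond_var                   # constant part of E[H(Z) | X = x]
    slope = shrink ** 2                              # coefficient of x**2
    return intercept, slope


a, b = berkson_quadratic_j(1.0, 0.05)
print(round(a, 4), round(b, 3))
```

With variances 1 and 0.05 this returns an intercept of about 0.0976 and a slope of about
0.907, matching the expression for J(x) quoted in the example.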
To overcome this difficulty, one way to proceed is as follows. Define
$H_\theta(z) := E[m_\theta(X)|Z = z]$, $J_\theta(x) := E[H_\theta(Z)|X = x]$,

$$\tilde Q_n(\theta) := \int_C \Big[\frac{1}{n\hat f_X(x)}\sum_{i=1}^n R_h(x, Z_i) Y_i - J_\theta(x)\Big]^2 dG(x), \tag{2.2}$$

$$Q_n(\theta) := \int_C \Big[\frac{1}{n\hat f_X(x)}\sum_{i=1}^n R_h(x, Z_i)\big[Y_i - H_\theta(Z_i)\big]\Big]^2 dG(x),$$

and $\tilde\theta_n := \arg\min_{\theta\in\Theta}\tilde Q_n(\theta)$, $\bar\theta_n := \arg\min_{\theta\in\Theta} Q_n(\theta)$.

Under some conditions, we can show that $\tilde\theta_n$, $\bar\theta_n$ are weakly consistent for $\theta_0$, and
that the asymptotic null distribution of the test statistic based on the suitably
standardized minimum distance $Q_n(\bar\theta_n)$ is the same as that of a degenerate U-statistic,
whose asymptotic distribution in turn is that of an infinite weighted sum of
centered chi-square random variables. Since the kernel function in the degenerate
U-statistic is complicated, the computation of its eigenvalues and eigenfunctions
is not easy, and hence this test is hard to implement in practice.

An alternative way to proceed, as we do here, is to recognize that $E(Y|Z) = H(Z)$
and hence to consider the new regression model $Y = H(Z) + \zeta$, where the error $\zeta$
is uncorrelated with Z and has mean zero. The problem of testing $H_0$ is now
transformed into testing $H(z) = H_{\theta_0}(z)$. Thus we make the following modification of the
above K-N procedure to adjust for not observing the design variable. Let

$$\hat f_{Zw}(z) := \frac{1}{n}\sum_{i=1}^n K_w^*(z - Z_i), \quad w \sim (\log n/n)^{1/(d+4)}, \qquad \hat H_n(z) := \frac{\sum_{i=1}^n K_h(z - Z_i) Y_i}{n\hat f_{Zw}(z)}, \quad z \in R^d.$$

Note that $\hat H_n$ is a nonparametric estimator of the conditional expectation
$H(z) = E(\mu(X)|Z = z)$. Define

$$M_n^*(\theta) := \int_I \Big[\frac{1}{n\hat f_{Zw}(z)}\sum_{i=1}^n K_h(z - Z_i) Y_i - H_\theta(z)\Big]^2 dG(z),$$

$$M_n(\theta) := \int_I \Big[\frac{1}{n\hat f_{Zw}(z)}\sum_{i=1}^n K_h(z - Z_i)\big[Y_i - H_\theta(Z_i)\big]\Big]^2 dG(z),$$

$$\theta_n^* := \arg\min_{\theta\in\Theta} M_n^*(\theta), \qquad \hat\theta_n := \arg\min_{\theta\in\Theta} M_n(\theta),$$

where G is a measure supported on a compact subset $I \subset R^d$. We consider $M_n$ to be
the right analog of the above $T_n$ for the Berkson model. Let $\theta_0$ be the true parameter
under $H_0$. This paper proves that $\theta_n^*$ converges in probability to $\theta_0$ under $H_0$. This
in turn is used to prove the consistency of $\hat\theta_n$ for $\theta_0$ and the asymptotic normality of
$\sqrt{n}(\hat\theta_n - \theta_0)$ under $H_0$. Additionally, we prove that the asymptotic null distribution
of the normalized test statistic $nh^{d/2}\hat\Gamma_n^{-1/2}(M_n(\hat\theta_n) - \hat C_n)$, based on the minimum
distance $M_n(\hat\theta_n)$, is standard normal, which, unlike the first modification (2.2),
can be easily used to implement this testing procedure, at least for large samples.
Here,

$$d\hat\psi_{h_2}(z) := \frac{dG(z)}{\hat f_{Zw}^2(z)}, \quad z \in R^d, \qquad \hat\zeta_i := Y_i - H_{\hat\theta_n}(Z_i), \quad 1 \le i \le n, \tag{2.3}$$

$$\hat C_n := \frac{1}{n^2}\sum_{i=1}^n \int K_h^2(z - Z_i)\,\hat\zeta_i^2\,d\hat\psi_{h_2}(z),$$

$$\hat\Gamma_n := 2n^{-2}h^d\sum_{i\ne j}\Big(\int K_h(z - Z_i)K_h(z - Z_j)\,\hat\zeta_i\hat\zeta_j\,d\hat\psi_{h_2}(z)\Big)^2.$$

We note that there is a typo in the definition of $\hat\Gamma_n$ in K-N: there should be a
factor of 2 in it as well.
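
To make the definition of the minimum distance estimator concrete, the following
sketch computes it in the simplest case d = 1 with a linear null model
$m_\theta(x) = \theta x$. Since $E\eta = 0$ is assumed here, $H_\theta(z) = \theta z$ and $M_n(\theta)$ is
quadratic in $\theta$, so the minimizer has a closed form. Everything specific in the code
is an illustrative assumption rather than the setup used in the thesis: the function
names, the Epanechnikov kernel for both K and K*, G taken as uniform weights on a grid
over [-1, 1], the bandwidth choices, and the simulated data.

```python
import math
import random


def epanechnikov(u: float) -> float:
    """Epanechnikov density kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def md_estimate_linear(z, y, h, w, grid):
    """Minimize a Riemann-sum version of M_n(theta) when H_theta(z) = theta * z.

    With A(g) = n^-1 sum K_h(g - Z_i) Y_i and B(g) = n^-1 sum K_h(g - Z_i) Z_i,
    the criterion is sum over g of [A(g) - theta * B(g)]^2 / f_w(g)^2,
    whose minimizer is a ratio of two weighted sums.
    """
    n = len(z)
    num = den = 0.0
    for g in grid:
        k_h = [epanechnikov((g - zi) / h) / h for zi in z]
        a = sum(k * yi for k, yi in zip(k_h, y)) / n
        b = sum(k * zi for k, zi in zip(k_h, z)) / n
        f_w = sum(epanechnikov((g - zi) / w) / w for zi in z) / n
        if f_w <= 0.0:
            continue  # no observations near this grid point
        weight = 1.0 / (f_w * f_w)  # dG / f_w^2 with equal grid weights
        num += a * b * weight
        den += b * b * weight
    return num / den


random.seed(0)
n, theta0 = 400, 1.0
z = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [zi + random.gauss(0.0, math.sqrt(0.05)) for zi in z]  # unobserved X = Z + eta
y = [theta0 * xi + random.gauss(0.0, 0.1) for xi in x]      # Y = theta0 * X + eps
h = n ** (-1.0 / 3.0)
w = (math.log(n) / n) ** (1.0 / 5.0)
grid = [-1.0 + 0.05 * k for k in range(41)]
theta_hat = md_estimate_linear(z, y, h, w, grid)
print(theta_hat)
```

Only the observable pairs $(Z_i, Y_i)$ enter the estimator; the simulated X values are
used solely to generate Y. For nonlinear $m_\theta$ the closed form is no longer available and
a numerical minimizer over $\theta$ would be needed, together with $H_\theta$ computed from the
known $f_\eta$.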

The paper is organized as follows. The needed assumptions are stated in the next
section. Section 3 contains the proofs of consistency of $\theta_n^*$ and $\hat\theta_n$, while Sections 4
and 5 contain the proofs of the asymptotic normality of $\hat\theta_n$ and of the proposed test
statistic. The simulation results in Section 6 show little bias in the estimator $\hat\theta_n$ for
all chosen sample sizes. The finite sample level approximates the nominal level well
for larger sample sizes, and the empirical power is high (above 0.9) for moderate to
large sample sizes against the chosen alternatives.

2.2 Assumptions

Here we state the assumptions needed in this paper. Throughout the paper $\theta_0$
denotes the true parameter value under $H_0$. About the errors, the underlying design
and G we assume the following:

(e1) The random variables $\{(Z_i, Y_i) : Z_i \in R^d, i = 1, 2, \ldots, n\}$ are i.i.d. with the
conditional expectation $H(z) = E(Y|Z = z)$ satisfying $\int H^2(z)\,dG(z) < \infty$,
where G is a σ-finite measure on I.

(e2) $0 < \sigma_\varepsilon^2 < \infty$, $E m_{\theta_0}^2(X) < \infty$, and the function
$\tau^2(z) = E[(m_{\theta_0}(X) - H_{\theta_0}(Z))^2 | Z = z]$ is a.s. (G) continuous on I.

(e3) $E|\varepsilon|^{2+\delta} < \infty$, $E|m_{\theta_0}(X) - H_{\theta_0}(Z)|^{2+\delta} < \infty$, for some $\delta > 0$.

(e4) $E|\varepsilon|^4 < \infty$, $E[m_{\theta_0}(X) - H_{\theta_0}(Z)]^4 < \infty$.

(f1) The density $f_Z$ is uniformly continuous and bounded from below on I.

(f2) The density $f_Z$ is twice continuously differentiable.

(g) The integrating measure G has a continuous Lebesgue density g on I.

About the kernel functions K and K*, we shall assume the following:

(k) The kernel functions K, K* are positive symmetric square integrable densities
on $[-1, 1]^d$. In addition, K* satisfies a Lipschitz condition.

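About the parametric family of functions to be fitted we need to assume the
following:

(m1) For each $\theta$, $m_\theta(x)$ is a.s. continuous w.r.t. the Lebesgue measure.

(m2) The function $H_\theta(z)$ is identifiable w.r.t. $\theta$, i.e., if $H_{\theta_1}(z) = H_{\theta_2}(z)$ for almost
all z (G), then $\theta_1 = \theta_2$.

(m3) For some positive continuous function $\ell$ on I with $E\ell(Z) < \infty$ and for some
$\beta > 0$,

$$|H_{\theta_2}(z) - H_{\theta_1}(z)| \le \|\theta_2 - \theta_1\|^\beta\,\ell(z), \quad \forall\,\theta_1, \theta_2 \in \Theta,\ z \in I.$$

(m4) For every z, $H_\theta(z)$ is differentiable in $\theta$ in a neighborhood of $\theta_0$ with the vector
of derivatives $\dot H_\theta(z)$, such that for every $0 < k < \infty$,

$$\sup_{1\le i\le n,\ \sqrt{nh_n^d}\,\|\theta - \theta_0\| \le k} \frac{|H_\theta(Z_i) - H_{\theta_0}(Z_i) - (\theta - \theta_0)'\dot H_{\theta_0}(Z_i)|}{\|\theta - \theta_0\|} = o_p(1).$$

(m5) For every $0 < k < \infty$,

$$\sup_{1\le i\le n,\ \sqrt{nh_n^d}\,\|\theta - \theta_0\| \le k} h_n^{-d/2}\,\|\dot H_\theta(Z_i) - \dot H_{\theta_0}(Z_i)\| = o_p(1), \quad \forall\, n > N_\varepsilon.$$

(m6) $\Sigma_0 := \int \dot H_{\theta_0}\dot H_{\theta_0}'\,dG$ is positive definite.

About the bandwidth $h_n$ we shall make the following assumptions:

(h1) $h_n \to 0$ as $n \to \infty$.

(h2) $nh_n^{2d} \to \infty$ as $n \to \infty$.

(h3) $h_n \sim n^{-a}$, where $a < \min(1/(2d),\, 4/(d(d + 4)))$.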
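
The admissible range of the exponent a in (h3) depends only on the dimension d. The
following sketch evaluates the bound exactly with rational arithmetic; the function
name `bandwidth_exponent_bound` is hypothetical.

```python
from fractions import Fraction


def bandwidth_exponent_bound(d: int) -> Fraction:
    """Upper bound min(1/(2d), 4/(d(d+4))) on the exponent a in h_n ~ n**(-a)."""
    return min(Fraction(1, 2 * d), Fraction(4, d * (d + 4)))


for d in (1, 2, 3, 5):
    print(d, bandwidth_exponent_bound(d))
```

For d up to 4 the binding term is 1/(2d), so the bound is 1/2 for d = 1 and 1/4 for
d = 2; for d = 5 the second term takes over and the bound becomes 4/45.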

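The above conditions are similar to those imposed in K-N on the model $m_\theta$.

Consider the following conditions in terms of the given model.

(m2') The parametric family of models $m_\theta(x)$ is identifiable w.r.t. $\theta$, i.e., if
$m_{\theta_1}(x) = m_{\theta_2}(x)$ for almost all x, then $\theta_1 = \theta_2$.

(m3') For some positive continuous function L on $R^d$ with $EL(X) < \infty$ and for
some $\beta > 0$,

$$|m_{\theta_2}(x) - m_{\theta_1}(x)| \le \|\theta_2 - \theta_1\|^\beta L(x), \quad \forall\,\theta_1, \theta_2 \in \Theta,\ x \in R^d.$$

(m4') The function $m_\theta(x)$ is differentiable in $\theta$ in a neighborhood of $\theta_0$, with the
vector of differentials $\dot m_{\theta_0}$ such that for every $k < \infty$,

$$\sup_{x\in R^d,\ \sqrt{nh_n^d}\,\|\theta - \theta_0\| \le k} \frac{|m_\theta(x) - m_{\theta_0}(x) - (\theta - \theta_0)'\dot m_{\theta_0}(x)|}{\|\theta - \theta_0\|} = o_p(1).$$

(m5') For every $0 < k < \infty$,

$$\sup_{x\in R^d,\ \sqrt{nh_n^d}\,\|\theta - \theta_0\| \le k} h_n^{-d/2}\,\|\dot m_\theta(x) - \dot m_{\theta_0}(x)\| = o_p(1), \quad \forall\, n > N_\varepsilon.$$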

In some cases, (m2) and (m2') are equivalent. For example, this holds if the family
of densities $\{f_\eta(\cdot - z);\ z \in R\}$ is complete. Similarly, if $m_\theta(x) = \theta'\gamma(x)$
and $\int \gamma(x) f_\eta(x - z)\,dx \ne 0$ for all z, then (m2) and (m2') are also equivalent.

We can also show that (m3')-(m5') imply (m3)-(m5), respectively. This follows
because $H_\theta(z) = \int m_\theta(x) f_\eta(x - z)\,dx$. Thus under (m3'),

$$|H_{\theta_2}(z) - H_{\theta_1}(z)| \le \|\theta_2 - \theta_1\|^\beta \int L(x) f_\eta(x - z)\,dx, \quad \forall\, z \in R^d.$$

Hence (m3) holds with $\ell(z) = \int L(x) f_\eta(x - z)\,dx$. Note that $E\ell(Z) = EL(X) < \infty$.

Similarly, using the fact that $\int f_\eta(x - z)\,dx \equiv 1$, the left hand side of (m4) is
bounded above by

$$\sup_{x\in R^d,\ \sqrt{nh_n^d}\,\|\theta - \theta_0\| \le k} \frac{|m_\theta(x) - m_{\theta_0}(x) - (\theta - \theta_0)'\dot m_{\theta_0}(x)|}{\|\theta - \theta_0\|} = o(1),$$

by (m4'). Similarly, (m5') implies (m5), and (m1) implies that $H_\theta(z)$ is a.s. continuous
in z (G).
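
The representation $H_\theta(z) = \int m_\theta(x) f_\eta(x - z)\,dx$ can be evaluated numerically when
$f_\eta$ is known. The sketch below uses a midpoint rule and compares the result with the
closed form for a quadratic model. The function names, the Gaussian choice of $f_\eta$, the
model $m_\theta(x) = \theta x^2$, and the integration settings are all assumptions made for
illustration.

```python
import math


def normal_pdf(t: float, sd: float) -> float:
    """Density of N(0, sd**2) at t."""
    return math.exp(-0.5 * (t / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def h_theta(z, m_theta, sd_eta, half_width=8.0, steps=4000):
    """Midpoint-rule approximation of the integral of m_theta(x) * f_eta(x - z) dx.

    f_eta is taken to be the N(0, sd_eta**2) density (an illustrative choice),
    and the integral is truncated to z +/- half_width * sd_eta.
    """
    lo = z - half_width * sd_eta
    dx = 2.0 * half_width * sd_eta / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += m_theta(x) * normal_pdf(x - z, sd_eta) * dx
    return total


theta = 2.0
sd_eta = math.sqrt(0.05)
approx = h_theta(1.0, lambda x: theta * x * x, sd_eta)
exact = theta * (1.0 ** 2 + 0.05)  # E[theta * (z + eta)**2] = theta * (z**2 + Var(eta))
print(approx, exact)
```

The same routine could supply $H_\theta(Z_i)$ inside $M_n(\theta)$ for a nonlinear model, which is
where knowledge of $f_\eta$ enters the procedure.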

The conditions (m1)-(m6) are trivially satisfied by the model $m_\theta(x) = \theta'\gamma(x)$
provided the components of $E[\gamma(X)|Z = z]$ are continuous and non-zero on I, and the
matrix $\int E[\gamma(X)\gamma'(X)|Z = z]\,dG(z)$ is positive definite.

The conditions (e1), (e2), (f1), (k), (m1)-(m3), (h1) and (h2) suffice for the
consistency of $\hat\theta_n$, while these plus (e3), (f2), (m4), (m5), (m6) and (h3) are needed for
the asymptotic normality of $\hat\theta_n$. The asymptotic normality of $M_n(\hat\theta_n)$ needs (e1),
(e2), (e3), (e4), (f1)-(m6), and (h3). Of course, (h3) implies (h1) and (h2).

Let $\hat f_{Zh}$ denote the kernel density estimator of $f_Z$ with bandwidth $h \equiv h_n$. From
Mack and Silverman (1982), we obtain that under (f1), (k), (h1) and (h2),

$$\sup_{z\in I}|\hat f_{Zh}(z) - f_Z(z)| = o_p(1), \quad \sup_{z\in I}|\hat f_{Zw}(z) - f_Z(z)| = o_p(1), \quad \sup_{z\in I}\Big|\frac{f_Z(z)}{\hat f_{Zw}(z)} - 1\Big| = o_p(1). \tag{2.4}$$

These conclusions are often used in the proofs below.

In the sequel, the true parameter $\theta_0$ is assumed to be an inner point of $\Theta$ and
$\zeta := Y - H_{\theta_0}(Z)$. The integrals with respect to the G-measure are understood
to be over the compact set I. Convergence in distribution is denoted by $\to_d$,
and $N_p(a, B)$ denotes the p-dimensional normal distribution with mean vector a and
covariance matrix B, $p \ge 1$. We shall also need the following notation.

$$d\psi(z) := \frac{dG(z)}{f_Z^2(z)}, \qquad \sigma_\zeta^2(z) := \mathrm{Var}_{\theta_0}(\zeta|Z = z) = \sigma_\varepsilon^2 + \tau^2(z), \quad z \in R^d, \tag{2.5}$$

$$\zeta_i := Y_i - H_{\theta_0}(Z_i), \quad 1 \le i \le n, \qquad \tilde C_n := \frac{1}{n^2}\sum_{i=1}^n \int K_h^2(z - Z_i)\,\zeta_i^2\,d\psi(z).$$
i=1

2.3 The Consistency of $\theta_n^*$ and $\hat\theta_n$

This section proves the consistency of $\theta_n^*$ and $\hat\theta_n$. Let $L_2(G)$ denote the class of square
integrable real valued functions on $R^d$ with respect to G. Define

$$\rho(\nu_1, \nu_2) := \int [\nu_1(x) - \nu_2(x)]^2\,dG(x), \quad \nu_1, \nu_2 \in L_2(G),$$

and the map $T(\nu) := \arg\min_{\theta\in\Theta}\rho(\nu, H_\theta)$, $\nu \in L_2(G)$.
The following lemma is useful in the proofs here. Its proof is similar to that
of Theorem 1 in Beran (1977).

Lemma 2.3.1 Let $H_\theta$ satisfy conditions (m1)-(m3). Then the following hold.
(a) $T(\nu)$ always exists, for all $\nu \in L_2(G)$.
(b) If $T(\nu)$ is unique, then T is continuous at $\nu$ in the sense that for any sequence
$\{\nu_n\} \subset L_2(G)$ converging to $\nu$ in $L_2(G)$, $T(\nu_n) \to T(\nu)$, i.e.,
$\rho(\nu_n, \nu) \to 0$ implies $T(\nu_n) \to T(\nu)$, as $n \to \infty$.
(c) $T(H_\theta(\cdot)) = \theta$, uniquely for all $\theta \in \Theta$.

Recall the notation at (2.3) and (2.5). As in K-N, for any integral $J := \int r\,d\hat\psi_{h_2}$, the
replacement of $d\hat\psi_{h_2}$ by $d\psi$ is reflected by the notation $\tilde J := \int r\,d\psi$. We also need to
define, for $z \in R^d$, $\theta \in R^q$,

$$\mu_n(z, \theta) := \frac{1}{n}\sum_{i=1}^n K_h(z - Z_i) H_\theta(Z_i), \tag{2.6}$$

$$\dot\mu_n(z, \theta) := \frac{1}{n}\sum_{i=1}^n K_h(z - Z_i)\dot H_\theta(Z_i),$$

$$U_n(z, \theta) := \frac{1}{n}\sum_{i=1}^n K_h(z - Z_i) Y_i - \mu_n(z, \theta) = \frac{1}{n}\sum_{i=1}^n K_h(z - Z_i)[Y_i - H_\theta(Z_i)], \qquad U_n(z) := U_n(z, \theta_0),$$

$$\mathcal{Z}_n(z, \theta) := \mu_n(z, \theta) - \mu_n(z, \theta_0) = \frac{1}{n}\sum_{i=1}^n K_h(z - Z_i)[H_\theta(Z_i) - H_{\theta_0}(Z_i)].$$

These entities are the analogs of the similar entities defined at (3.1) in K-N. The
main difference is that $m_\theta$ there is replaced by $H_\theta$ and the $X_i$'s by the $Z_i$'s. A
consequence of Lemma 2.3.1 is the following

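Corollary 2.3.1 Suppose $H_0$, (e1), (e2), (f1), and (m1)-(m3) hold. Then $\theta_n^* \to \theta_0$
in probability.

Proof. We shall use part (b) of Lemma 2.3.1 with $\nu_n = \hat H_n$ and $\nu = H_{\theta_0}$.
Note that $M_n^*(\theta_0) = \rho(\hat H_n, H_{\theta_0})$, $\theta_n^* = T(\hat H_n)$, and by the identifiability condition
(m2), $T(H_{\theta_0}) = \theta_0$ is unique. It thus suffices to prove

$$\rho(\hat H_n, H_{\theta_0}) = o_p(1). \tag{2.7}$$

To show this, by plugging in $Y_i = \zeta_i + H_{\theta_0}(Z_i)$ and expanding the quadratic
integrand, $\rho(\hat H_n, H_{\theta_0})$ is bounded above by the sum $2[C_{n1} + C_{n2}(\theta_0)]$, where

$$C_{n1} := \int U_n^2(z)\,d\hat\psi_{h_2}(z), \qquad C_{n2}(\theta) := \int [\mu_n(z, \theta) - \hat f_{Zw}(z) H_\theta(z)]^2\,d\hat\psi_{h_2}(z), \quad \theta \in R^q.$$

By Fubini and the independence of Z and $\varepsilon$, we have

$$E\int U_n^2(z)\,d\psi(z) = \frac{1}{n}\int E K_h^2(z - Z_1)\big(\sigma_\varepsilon^2 + \tau^2(Z_1)\big)\,d\psi(z). \tag{2.8}$$

By the uniform continuity of $f_Z$ ensured by (f1),

$$E K_h^2(z - Z_1) = \frac{1}{h^{2d}}\int K^2\Big(\frac{z - y}{h}\Big) f_Z(y)\,dy = \frac{1}{h^d}\int K^2(y) f_Z(z - yh)\,dy = O(1/h^d).$$

Similarly, using additionally the a.s. continuity of $\tau^2(z)$, we also have

$$E K_h^2(z - Z_1)\tau^2(Z_1) = E\Big[\frac{1}{h^{2d}} K^2\Big(\frac{z - Z_1}{h}\Big)\tau^2(Z_1)\Big] = O(1/h^d).$$

These calculations imply that

$$E\int U_n^2(z)\,d\psi(z) = O\Big(\frac{1}{nh^d}\Big) \quad \text{and} \quad \int U_n^2(z)\,d\psi(z) = O_p\Big(\frac{1}{nh^d}\Big). \tag{2.9}$$

Hence by (2.4), we obtain

$$C_{n1} \le \sup_{z\in I}\frac{f_Z^2(z)}{\hat f_{Zw}^2(z)}\int U_n^2(z)\,d\psi(z) = O_p\Big(\frac{1}{nh^d}\Big) = o_p(1).$$

Let

$$e_h(z, \theta) := E K_h(z - Z_1) H_\theta(Z_1) = \int K(u) H_\theta(z - uh) f_Z(z - uh)\,du,$$

$$e_w^*(z, \theta) := E K_w^*(z - Z_1) H_\theta(z) = \int K^*(u) f_Z(z - uw)\,du\; H_\theta(z).$$

By adding and subtracting $e_h(z, \theta)$ and $e_w^*(z, \theta)$ in the quadratic term of the integrand
in $C_{n2}$, and using a method similar to that in K-N, one can show that $C_{n2}(\theta_0) = o_p(1)$
by (f1), (m1). This proves (2.7) and hence the corollary. □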

Remark 2.3.1 Lemma 2.3.1 and Corollary 2.3.1 are similar to those in the K-N paper.
The only difference is that here we have $H_\theta$ and $Z_i$ in place of $m_\theta$ and $X_i$ there, so that
the current $\zeta_i$ are analogs of the $\varepsilon_i$ of the K-N paper. Another difference is that, because
of the measurement error in the design, (2.8) here has the extra variance term $\tau^2(Z)$,
although the asymptotic order of this expectation, given in (2.9) above, is the same as
in the model without measurement error. Thus, from now onwards, we shall be brief in
many of the proofs below.

The proof of the following theorem is exactly similar to that of Theorem 3.1 of
K-N after the above modifications are made there. Details are left out for the
sake of brevity.

Theorem 2.3.1 Suppose (e1), (e2), (e3), (f1), (m1)-(m3), and (h2) hold. Then
under $H_0$, $\hat\theta_n \to \theta_0$ in probability.

2.4 Asymptotic Distribution of 6,,

In this section, we shall prove the asymptotic normality of ﬁ(6n — 60). The ﬁrst

step towards this goal is to show that
nhd||6n — 00))? = op(1). (2.10)
Recall the definition of Zn, and let Dn(6) = f 2721(2, 6)d1,bh2(2). We claim that
nthn(6n) = op(1). (2.11)

To see this, observe that nhdM-n,(60) = nhdﬂrrli 2:111 Kh(z — Zi)Ci]2dz/}h2(z) :-
Op(1) by (2.9) and (2.4). But, according to the deﬁnition of 6n, one has .M-n(6n) g

72

Mn(60). so nhdllln(6n) = Op(1). This fact, together with the inequality Dn(6n) S
2.6/[71((971) + 2Mn(60), proves (2.11).
Next, we shall show that for any a > 0, there exists an Na such that
P(Dn(6n)/||6n — 60H2 2 a + ”bilrll:1 bTEOb) > 1 — a, Vn > Na, (2.12)
where 20 as in (m6). The claim (2.10) then will follow from (2.11), (2.12), (m6), and
the fact.
nthn<éni = nhdllén — 6012 - [Dam/116}. — 60112].
To prove (2.12), let
an := in — 60, (2.13)

I ' .
dni 2= Hén(Zi) — H90(Z.i) — unH60(Zi)) IS 2 S n,

271“?) I: / [bl ° 72—12: Kh(2 - ZilH60(Zi)] 2dzbh2(2), b E Rq.
i=1

 

 

Note that
Dn(6n) /Z72l(2,6n) ~ 1/2 1/2
_._— : ——dibh(2)2D1+DQ—2D D , (2.14)
nan -90||2 Hun“? 2 " n "1 "2
where
D -= [[liK (2—Z-)( dm' )]ch5 (.~)
n1 ' ”ll—1 h 2 “an“ ’ hQ “ ’
I —1 n . ' .
"2 ' Hun“ h2

By the assumption (m4) and (2.10), one veriﬁes that Dnl = op(1). For the term

D712, note that

11,,22 inf 2,,(11). (2.15)
llbll=1

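Decompose $\Sigma_n(b)$ as the sum of
$$\Sigma_{n1}(b) := \int\Big[b'\,\frac{1}{n}\sum_{i=1}^n K_h(z-Z_i)\dot H_{\theta_0}(Z_i)\Big]^2 d\psi(z)$$
and the remainder $\Sigma_n(b) - \Sigma_{n1}(b)$, in which the integrating measure $\psi$ is replaced by $\hat\psi_{h2} - \psi$. Note that $EK_h(z-Z)\dot H_{\theta_0}(Z) = \dot H_{\theta_0}(z)f_Z(z) + o(1)$. Hence, by the Law of Large Numbers, $\Sigma_{n1}(b) \to b'\Sigma_0 b$, for every $b \in R^q$. Moreover,
$$|\Sigma_n(b) - \Sigma_{n1}(b)| \le \sup_{z\in I}\Big|\frac{f_Z^2(z)}{\hat f_{Zw}^2(z)} - 1\Big|\,\Sigma_{n1}(b) = o_p(1), \quad \forall\, b \in R^q.$$

Also, note that for any $\delta > 0$, and any two unit vectors $b_1, b_2 \in R^q$ with $\|b_1 - b_2\| \le \delta$, one has
$$|\Sigma_{n1}(b_2) - \Sigma_{n1}(b_1)| \le \Big|\int\Big[(b_2-b_1)'\frac{1}{n}\sum_{i=1}^n K_h(z-Z_i)\dot H_{\theta_0}(Z_i)\Big]^2 d\psi(z)\Big| + 2\Big|\int\Big[(b_2-b_1)'\frac{1}{n}\sum_{i=1}^n K_h(z-Z_i)\dot H_{\theta_0}(Z_i)\Big]\Big[b_1'\frac{1}{n}\sum_{i=1}^n K_h(z-Z_i)\dot H_{\theta_0}(Z_i)\Big]d\psi(z)\Big|$$
$$\le \delta(\delta+2)\int\Big\|\frac{1}{n}\sum_{i=1}^n K_h(z-Z_i)\dot H_{\theta_0}(Z_i)\Big\|^2 d\psi(z).$$

But the average inside the norm in the second factor tends to $\dot H_{\theta_0}(z)f_Z(z)$ in probability, so the second factor is $O_p(1)$. From these observations and the compactness of $\{b \in R^q : \|b\| = 1\}$, one obtains $\sup_{\|b\|=1}|\Sigma_n(b) - b'\Sigma_0 b| = o_p(1)$. This fact, together with (2.15), implies (2.12) in a routine fashion, and also concludes the proof of (2.10). We remark here that the inequality (2.14) above corrects a typo in the K-N paper in the equation just above (4.8) there on page 120.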


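We shall now prove the asymptotic normality of $\sqrt{n}(\hat\theta_n - \theta_0)$. The proof is classical in nature. Recall the definition of $M_n(\theta)$, and let $\dot M_n(\theta) = -2\int U_n(z,\theta)\dot\mu_n(z,\theta)\,d\hat\psi_{h2}(z)$. Since $\theta_0$ is an interior point of $\Theta$, by consistency, for sufficiently large $n$, $\hat\theta_n$ will be in the interior of $\Theta$, and $\dot M_n(\hat\theta_n) = 0$, with arbitrarily large probability. But the equation $\dot M_n(\hat\theta_n) = 0$ is equivalent to
$$\int U_n(z)\dot\mu_n(z,\hat\theta_n)\,d\hat\psi_{h2}(z) = \int \mathcal{Z}_n(z,\hat\theta_n)\dot\mu_n(z,\hat\theta_n)\,d\hat\psi_{h2}(z). \tag{2.16}$$

We shall show that $\sqrt{n}\,\times$ the left hand side of this equation converges in distribution to a normal random variable, while the right hand side of this equation equals $R_n(\hat\theta_n - \theta_0)$, for all $n \ge 1$, with $R_n = \Sigma_0 + o_p(1)$. To establish the first of these two claims, rewrite this random variable as the sum $S_n + S_{n1} + g_{n1} + g_{n2} + g_{n3} + g_{n4}$, where
$$S_n := \int U_n(z)\dot\mu_h(z)\,d\psi(z), \qquad \dot\mu_h(z) := EK_h(z-Z)\dot H_{\theta_0}(Z),$$
$$S_{n1} := \int U_n(z)\dot\mu_h(z)\big(1/\hat f_{Zw}^2(z) - 1/f_Z^2(z)\big)\,dG(z),$$
$$g_{n1} := \int U_n(z)\big[\dot\mu_n(z,\theta_0) - \dot\mu_h(z)\big]\,d\psi(z),$$
$$g_{n2} := \int U_n(z)\big[\dot\mu_n(z,\theta_0) - \dot\mu_h(z)\big]\big(1/\hat f_{Zw}^2(z) - 1/f_Z^2(z)\big)\,dG(z),$$
$$g_{n3} := \int U_n(z)\big[\dot\mu_n(z,\hat\theta_n) - \dot\mu_n(z,\theta_0)\big]\,d\psi(z),$$
$$g_{n4} := \int U_n(z)\big[\dot\mu_n(z,\hat\theta_n) - \dot\mu_n(z,\theta_0)\big]\big(1/\hat f_{Zw}^2(z) - 1/f_Z^2(z)\big)\,dG(z).$$
We need the following lemmas.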
Lemma 2.4.1 Suppose (e1), (e2), (f1), (k), (m1)-(m6), (h1), (h2) and $H_0$ hold.

(i) If, additionally, (e3) and (g) hold, then $\sqrt{n}\,S_n \to_d N(0, \Sigma)$, where
$$\Sigma = \int\frac{(\sigma_\varepsilon^2 + \tau^2(u))\,\dot H_{\theta_0}(u)\dot H_{\theta_0}'(u)\,g^2(u)}{f_Z(u)}\,du.$$

(ii) If, additionally, (f2) and (h3) hold, then
$$\sqrt{n}\,|S_{n1}| = o_p(1). \tag{2.17}$$

Lemma 2.4.2 Under $H_0$, (e1), (e2), (f1), (k), (m1), (m2), (m4), (m5), (h1) and (h2),
$$n^{1/2}g_{nk} = o_p(1), \quad k = 1, 2, 3, 4.$$

The proof of (2.17) is facilitated by the following lemma, which along with its proof appears as Theorem 2.2 part (2) in Bosq (1998).

Lemma 2.4.3 Let $\hat f_{Zw}(z)$ be the kernel estimator associated with a kernel $K^*$ which satisfies a Lipschitz condition. If (f2) holds and $w$ is chosen to be $a_n(\log n/n)^{1/(d+4)}$, where $a_n \to a_0 > 0$, then
$$(\log_k n)^{-1}(n/\log n)^{2/(d+4)}\sup_{z\in I}|\hat f_{Zw}(z) - f_Z(z)| \to 0 \quad \text{a.s.}$$
for any positive integer $k$.
Proof of Lemma 2.4.1. Again this proof is similar to that of Lemma 4.1 of K-N, but we include details here to see how the difference in the asymptotic variance appears. For convenience, we shall give the proof here only for the case $q = 1$, i.e., when $\dot\mu_h(z)$ is one dimensional. For the multidimensional case the result can be proved by using a linear combination of its components instead of $\dot\mu_h(z)$, and applying the same argument.

Let $s_{ni} := \int K_h(z-Z_i)\zeta_i\dot\mu_h(z)\,d\psi(z)$. Then $\sqrt{n}\,S_n$ can be rewritten as $\sqrt{n}\,S_n = n^{-1/2}\sum_{i=1}^n s_{ni}$. Note that $s_{ni}$, $1 \le i \le n$, are i.i.d. centered random variables for each $n$. By the Lindeberg-Feller CLT, it suffices to show that
$$Es_{n1}^2 \to \Sigma, \qquad Es_{n1}^2 I[|s_{n1}| > n^{1/2}\lambda] \to 0, \quad \text{for all } \lambda > 0. \tag{2.18}$$

In fact, one can show that $Es_{n1}^2$ is equal to
$$\iiint K(v)K(t)\,\sigma_\zeta^2(u)f_Z(u)\,\frac{\dot\mu_h(u+vh)\dot\mu_h(u+th)\,g(u+vh)g(u+th)}{f_Z^2(u+vh)f_Z^2(u+th)}\,du\,dv\,dt \to \Sigma,$$
thereby proving the first claim in (2.18). To prove the second claim, note that by the Hölder inequality, $Es_{n1}^2 I[|s_{n1}| > n^{1/2}\lambda]$ is bounded above by
$$\lambda^{-\delta}n^{-\delta/2}E|s_{n1}|^{2+\delta} \le \lambda^{-\delta}n^{-\delta/2}E\Big(\Big[\int|K_h(z-Z)\dot\mu_h(z)|^2\,d\psi(z)\Big]^{(2+\delta)/2}\cdot|\zeta|^{2+\delta}\Big).$$
By assumption (e3), this upper bound is seen to be of the order $O((nh^d)^{-\delta/2}) = o(1)$ by (h2), thereby proving the second claim in (2.18). The proof of (2.17) uses Lemma 2.4.3 and is similar to that of (4.6) of K-N, hence no details are given. □

Proof of Lemma 2.4.2. This proof is similar to that of Lemma 4.2 in K-N with
obvious modiﬁcations. Details are left out for the sake of brevity.

Next, we shall show that the right hand side of (2.16) equals $R_n(\hat\theta_n - \theta_0)$, where $R_n = \Sigma_0 + o_p(1)$. Recall the notation at (2.13). The right hand side of (2.16) can be written as the sum $W_{n1} + W_{n2}$, where
$$W_{n1} := \|u_n\|\int\dot\mu_n(z,\hat\theta_n)\,\frac{1}{n}\sum_{i=1}^n K_h(z-Z_i)\frac{d_{ni}}{\|u_n\|}\,d\hat\psi_{h2}(z),$$
$$W_{n2} := \int\dot\mu_n(z,\hat\theta_n)\dot\mu_n'(z,\theta_0)\,d\hat\psi_{h2}(z)\cdot u_n.$$
Observe that
$$n^{-1/2}\int E\|K_h(z-Z)\dot H_{\theta_0}(Z)\|^2\,d\psi(z) = O(n^{-1/2}h^{-d}) = o(1). \tag{2.19}$$

By (2.4), (2.19) and the assumptions (m4), (m5), we can show that $\|W_{n1}\| = o_p(\|u_n\|)$ and $W_{n2} = (\Sigma_0 + o_p(1))\,u_n$. This proves $R_n = \Sigma_0 + o_p(1)$.


 

Upon combining these results about the left hand side and the right hand side of

(2.16), we obtain the following theorem.

Theorem 2.4.1 Assume (e1)-(e3), (f1), (f2), (g), (k), (m1)-(m5), and (h3) hold. Then under $H_0$,
$$\sqrt{n}(\hat\theta_n - \theta_0) = \Sigma_0^{-1}n^{1/2}S_n + o_p(1).$$
Consequently, $\sqrt{n}(\hat\theta_n - \theta_0) \Rightarrow N(0, \Sigma_0^{-1}\Sigma\Sigma_0^{-1})$, where $\Sigma$ and $\Sigma_0$ are defined in Lemma 2.4.1 and (m6), respectively.

The above theorem shows that the asymptotic variance of $\sqrt{n}(\hat\theta_n - \theta_0)$ consists of two parts. The part involving $\sigma_\varepsilon^2$ reflects the variation in the regression model, while the part involving $\tau^2$ reflects the variation in the measurement error. This is the major difference between the asymptotic distribution of the m.d. estimators discussed for the classical regression model in the K-N paper and for the Berkson model here.

2.5 Asymptotic Distribution of the Minimized Distance

This section contains a proof of the asymptotic distribution of the minimized distance $M_n(\hat\theta_n)$. Recall the notation in (2.3). The main result proved in this section is the following.

Theorem 2.5.1 Suppose (e1), (e2), (e4), (f1), (f2), (g), (k), (m1)-(m5) and (h3) hold. Then under $H_0$, $nh^{d/2}(M_n(\hat\theta_n) - \hat C_n) \to_d N_1(0, \Gamma)$. Moreover, $|\hat\Gamma_n\Gamma^{-1} - 1| = o_p(1)$.

Consequently, the test that rejects $H_0$ whenever $nh^{d/2}\hat\Gamma_n^{-1/2}|M_n(\hat\theta_n) - \hat C_n| > z_{\alpha/2}$ is of the asymptotic size $\alpha$, where $z_\alpha$ is the $100(1-\alpha)\%$ percentile of the standard normal distribution.

Our proof of this theorem is facilitated by the following ﬁve lemmas.

Lemma 2.5.1 Suppose (e1), (e2), (e4), (f1), (g), (k), (h1) and (h2) hold, then under $H_0$,
$$nh^{d/2}(\tilde M_n(\theta_0) - \tilde C_n) \to_d N_1(0, \Gamma).$$

Lemma 2.5.2 Suppose (e1), (e2), (f1), (k), (m3)-(m5), (h1) and (h2) hold, then under $H_0$,
$$nh^{d/2}|M_n(\hat\theta_n) - M_n(\theta_0)| = o_p(1).$$

Lemma 2.5.3 Suppose (e1), (e2), (f1), (f2), (k), (m3)-(m5) and (h3) hold, then under $H_0$,
$$nh^{d/2}|M_n(\theta_0) - \tilde M_n(\theta_0)| = o_p(1).$$

Lemma 2.5.4 Under the same conditions as in Lemma 2.5.3,
$$nh^{d/2}|\hat C_n - \tilde C_n| = o_p(1).$$

Lemma 2.5.5 Under the same conditions as in Lemma 2.5.2, $\hat\Gamma_n - \Gamma = o_p(1)$. Consequently, the positive definiteness of $\Gamma$ implies $|\hat\Gamma_n\Gamma^{-1} - 1| = o_p(1)$.

The proof of Lemma 2.5.1 is facilitated by Theorem 1 of Hall (1984), which is reproduced here for the sake of completeness.

Theorem 2.5.2 Let $\tilde Z_i$, $1 \le i \le n$, be i.i.d. random vectors, and let
$$U_n := \sum_{1\le i<j\le n} H_n(\tilde Z_i, \tilde Z_j), \qquad G_n(x, y) := EH_n(\tilde Z_1, x)H_n(\tilde Z_1, y),$$
where $H_n$ is a sequence of measurable functions symmetric under permutation, with
$$E[H_n(\tilde Z_1, \tilde Z_2)\,|\,\tilde Z_1] = 0, \qquad EH_n^2(\tilde Z_1, \tilde Z_2) < \infty, \quad \forall\, n \ge 1.$$
If, additionally,
$$\frac{EG_n^2(\tilde Z_1, \tilde Z_2) + n^{-1}EH_n^4(\tilde Z_1, \tilde Z_2)}{[EH_n^2(\tilde Z_1, \tilde Z_2)]^2} \to 0, \quad \text{as } n \to \infty,$$
then $U_n$ is asymptotically normally distributed with the mean 0 and the variance $\frac{n^2}{2}EH_n^2(\tilde Z_1, \tilde Z_2)$.

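Proof of Lemma 2.5.1. Note that $\tilde M_n(\theta_0)$ can be written as the sum of $\tilde C_n$ and $M_{n2}$, where
$$M_{n2} := \frac{1}{n^2}\sum_{i\ne j}\int K_h(z-Z_i)K_h(z-Z_j)\zeta_i\zeta_j\,d\psi(z).$$

We shall prove that $nh^{d/2}M_{n2} \to_d N_1(0, \Gamma)$ with the help of Theorem 2.5.2. Let $\tilde Z_i = (Z_i, \zeta_i)$ and $H_n(\tilde Z_i, \tilde Z_j) = n^{-1}h^{d/2}\int K_h(z-Z_i)K_h(z-Z_j)\zeta_i\zeta_j\,d\psi(z)$. Then $nh^{d/2}M_{n2} = 2\sum_{i<j}H_n(\tilde Z_i, \tilde Z_j)$.

Observe that $H_n(\tilde Z_i, \tilde Z_j)$ is symmetric, $E[H_n(\tilde Z_1, \tilde Z_2)\,|\,\tilde Z_1] = 0$, and $EH_n^2(\tilde Z_1, \tilde Z_2)$ equals
$$\frac{1}{n^2h^d}\iint\Big[\int K(u)K\Big(\frac{y-x}{h} + u\Big)\sigma_\zeta^2(x-uh)f_Z(x-uh)\,du\Big]^2 d\psi(x)\,d\psi(y),$$
which is finite for each $n \ge 1$. Hence, to apply Theorem 2.5.2, it remains to show that
$$\frac{EG_n^2(\tilde Z_1, \tilde Z_2)}{[EH_n^2(\tilde Z_1, \tilde Z_2)]^2} = o(1), \qquad \frac{n^{-1}EH_n^4(\tilde Z_1, \tilde Z_2)}{[EH_n^2(\tilde Z_1, \tilde Z_2)]^2} = o(1). \tag{2.20}$$
But by a method similar to that in K-N's paper, we can show that
$$EG_n^2(\tilde Z_1, \tilde Z_2) = O(n^{-4}h^d), \qquad EH_n^4(\tilde Z_1, \tilde Z_2) = O(n^{-4}h^{-d}), \tag{2.21}$$
$$EH_n^2(\tilde Z_1, \tilde Z_2) = \frac{1}{n^2h^d}\iint\Big[\int K(u)K\Big(\frac{y-x}{h} + u\Big)\sigma_\zeta^2(x-uh)f_Z(x-uh)\,du\Big]^2 d\psi(x)\,d\psi(y) = O(n^{-2}). \tag{2.22}$$

This verifies (2.20). By (2.22), the continuity of $\sigma_\zeta^2(z)$ and $f_Z(z)$, we obtain that $2n^2EH_n^2(\tilde Z_1, \tilde Z_2)$ converges to
$$2\iiiint K(u)K(w+u)K(v)K(v+w)\,(\sigma_\zeta^2(x))^2 f_Z^2(x)\,\frac{g^2(x)}{f_Z^4(x)}\,dx\,du\,dv\,dw \tag{2.23}$$
$$= 2\int\frac{(\sigma_\zeta^2(x))^2g^2(x)}{f_Z^2(x)}\,dx\int\Big(\int K(u)K(w+u)\,du\Big)^2dw.$$

This completes the proof of Lemma 2.5.1. □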

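Proof of Lemma 2.5.2. Recall the definitions of $U_n(z)$ and $\mathcal{Z}_n(z,\theta)$ from (2.6). Add and subtract $H_{\theta_0}(Z_i)$ to the $i$-th summand inside the squared integrand of $M_n(\hat\theta_n)$, to obtain that
$$M_n(\theta_0) - M_n(\hat\theta_n) = 2\int U_n(z)\mathcal{Z}_n(z,\hat\theta_n)\,d\hat\psi_{h2}(z) - \int\mathcal{Z}_n^2(z,\hat\theta_n)\,d\hat\psi_{h2}(z) =: 2Q_1 - Q_2.$$
It thus suffices to show that
$$nh^{d/2}Q_1 = o_p(1), \qquad nh^{d/2}Q_2 = o_p(1). \tag{2.24}$$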

By subtracting and adding $(\hat\theta_n - \theta_0)'\dot H_{\theta_0}(Z_i)$ to the $i$-th summand of the second factor in $Q_1$, we can rewrite $Q_1$ as the sum of $Q_{11}$ and $Q_{12}$, where
$$Q_{11} := \int U_n(z)\Big[\frac{1}{n}\sum_{i=1}^n K_h(z-Z_i)d_{ni}\Big]d\hat\psi_{h2}(z), \qquad Q_{12} := u_n'\int U_n(z)\dot\mu_n(z,\theta_0)\,d\hat\psi_{h2}(z),$$
where $d_{ni}$ are as in (2.13). By (2.10), for any $\eta > 0$, there exist $k < \infty$ and $N < \infty$ such that $P(A_n) \ge 1 - \eta$ for all $n > N$, where $A_n = \{(nh^d)^{1/2}\|\hat\theta_n - \theta_0\| \le k\}$. By the Cauchy-Schwarz inequality, (2.4), (2.9), and the fact
$$\int d\hat\psi_{h2}(z) = O_p(1), \tag{2.25}$$
we obtain that on $A_n$, $nh^{d/2}|Q_{11}|$ is bounded above by
$$\sqrt{n}\,\|\hat\theta_n - \theta_0\|\cdot(nh^d)^{1/2}\sup_{1\le i\le n,\ (nh^d)^{1/2}\|\theta-\theta_0\|\le k}\frac{|d_{ni}|}{\|\theta - \theta_0\|}\cdot O_p((nh^d)^{-1/2}).$$

This bound in turn is $o_p(1)$ by Theorem 2.4.1 and the assumption (m4). Hence to prove the first claim in (2.24), it remains to show that $nh^{d/2}|Q_{12}| = o_p(1)$. But $Q_{12}$ can be written as the sum of $Q_{121}$ and $Q_{122}$, where
$$Q_{121} = (\hat\theta_n - \theta_0)'\int U_n(z)\dot\mu_n(z,\hat\theta_n)\,d\hat\psi_{h2}(z), \qquad Q_{122} = -(\hat\theta_n - \theta_0)'\int U_n(z)\big[\dot\mu_n(z,\hat\theta_n) - \dot\mu_n(z,\theta_0)\big]d\hat\psi_{h2}(z).$$
Arguing as above, on the event $A_n$, $(nh^{d/2}|Q_{122}|)^2$ is bounded above by
$$n^2h^d\|\hat\theta_n - \theta_0\|^2\cdot O_p((nh^d)^{-1})\cdot\max_{1\le i\le n}\|\dot H_{\hat\theta_n}(Z_i) - \dot H_{\theta_0}(Z_i)\|^2\cdot O_p(1) = o_p(1),$$
by (2.4), (2.9), (2.25) and assumptions (m4) and (h2). Next, note that $Q_{121}$ is $u_n'$ times the expression in the left hand side of (2.16). Thus it is equal to
$$u_n'\int\mathcal{Z}_n(z,\hat\theta_n)\dot\mu_n(z,\hat\theta_n)\,d\hat\psi_{h2}(z) = u_n'\int\mathcal{Z}_n(z,\hat\theta_n)\dot\mu_n(z,\theta_0)\,d\hat\psi_{h2}(z) + u_n'\int\mathcal{Z}_n(z,\hat\theta_n)\big[\dot\mu_n(z,\hat\theta_n) - \dot\mu_n(z,\theta_0)\big]d\hat\psi_{h2}(z) =: D_1 + D_2.$$

By the Cauchy-Schwarz inequality, (2.4), (2.25), assumption (m1) and the compactness of $\Theta$, $nh^{d/2}|D_1| \le nh^{d/2}\|\hat\theta_n - \theta_0\|^2O_p(1) = O_p(h^{d/2}) = o_p(1)$ by Theorem 2.4.1 and (h2). Similarly, one can show that $nh^{d/2}|D_2| \le nh^{d/2}\|\hat\theta_n - \theta_0\|^2O_p(1) = O_p(h^{d/2}) = o_p(1)$. This completes the proof of the first claim in (2.24).

The proof of the second claim in (2.24) is similar. Details are left out for the sake of brevity. □

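Proof of Lemma 2.5.3. Note that
$$nh^{d/2}|M_n(\theta_0) - \tilde M_n(\theta_0)| = nh^{d/2}\Big|\int\Big[\frac{1}{n}\sum_{i=1}^n K_h(z-Z_i)\zeta_i\Big]^2\Big(\frac{1}{\hat f_{Zw}^2(z)} - \frac{1}{f_Z^2(z)}\Big)dG(z)\Big|$$
$$\le nh^{d/2}\cdot O_p((nh^d)^{-1})\cdot O_p\big((\log_k n)(\log n/n)^{2/(d+4)}\big) = o_p(1)$$
by (2.9) and Lemma 2.4.3. Hence the lemma.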

Proof of Lemma 2.5.4. For convenience, let $t_i := H_{\hat\theta_n}(Z_i) - H_{\theta_0}(Z_i)$ and $\Delta_n(z) := f_Z^2(z)/\hat f_{Zw}^2(z) - 1$. Then one obtains
$$\hat C_n = \frac{1}{n^2}\sum_{i=1}^n\int K_h^2(z-Z_i)\hat\zeta_i^2\,d\hat\psi_{h2}(z) = \frac{1}{n^2}\sum_{i=1}^n\int K_h^2(z-Z_i)(\zeta_i - t_i)^2\,d\hat\psi_{h2}(z).$$
It can be written as the sum of $A_{n1}$ and $A_{n2}$, where
$$A_{n1} := \frac{1}{n^2}\sum_{i=1}^n\int K_h^2(z-Z_i)(\zeta_i - t_i)^2\,d\psi(z), \qquad A_{n2} := \frac{1}{n^2}\sum_{i=1}^n\int K_h^2(z-Z_i)(\zeta_i - t_i)^2\Delta_n(z)\,d\psi(z).$$

In order to prove the lemma, it suffices to show that
$$nh^{d/2}(A_{n1} - \tilde C_n) = o_p(1), \qquad nh^{d/2}A_{n2} = o_p(1). \tag{2.26}$$

By expanding the term $(\zeta_i - t_i)^2$ in $A_{n1}$ and noting that $\max_{1\le i\le n}|t_i|^2 = O_p((nh^d)^{-1})$ by (m4) and (2.10), the first claim in (2.26) follows by an argument similar to that in K-N.

To prove the second claim in (2.26), note that $A_{n2}$ can be written as
$$A_{n2} = \frac{1}{n^2}\sum_{i=1}^n\int K_h^2(z-Z_i)\zeta_i^2\Delta_n(z)\,d\psi(z) + \frac{1}{n^2}\sum_{i=1}^n\int K_h^2(z-Z_i)t_i^2\Delta_n(z)\,d\psi(z) - \frac{2}{n^2}\sum_{i=1}^n\int K_h^2(z-Z_i)\zeta_it_i\Delta_n(z)\,d\psi(z).$$

But all the three terms on the right hand side are of the order $o_p((nh^{d/2})^{-1})$, thereby completing the proof of the second claim of (2.26), and hence that of the lemma. □

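Proof of Lemma 2.5.5. Define
$$\tilde\Gamma_n := 2n^{-2}h^d\sum_{i\ne j}\Big(\int K_h(z-Z_i)K_h(z-Z_j)\zeta_i\zeta_j\,d\psi(z)\Big)^2 = 2\sum_{i\ne j}H_n^2(\tilde Z_i, \tilde Z_j),$$
$$\Gamma_n := 2h^d(n-1)n^{-1}\iint\big[EK_h(x-Z)K_h(y-Z)\sigma_\zeta^2(Z)\big]^2d\psi(x)\,d\psi(y).$$
We shall prove
$$\hat\Gamma_n - \tilde\Gamma_n = o_p(1), \qquad \tilde\Gamma_n - \Gamma_n = o_p(1), \qquad \Gamma_n - \Gamma = o_p(1). \tag{2.27}$$

Note that $\hat\Gamma_n$ can be rewritten as the sum of the following three terms:
$$B_1 := 2n^{-2}h^d\sum_{i\ne j}\Big(\int K_h(z-Z_i)K_h(z-Z_j)(\zeta_i - t_i)(\zeta_j - t_j)\,d\psi(z)\Big)^2,$$
$$B_2 := 2n^{-2}h^d\sum_{i\ne j}\Big(\int K_h(z-Z_i)K_h(z-Z_j)(\zeta_i - t_i)(\zeta_j - t_j)\Delta_n(z)\,d\psi(z)\Big)^2,$$
$$B_3 := 4n^{-2}h^d\sum_{i\ne j}\Big(\int K_h(z-Z_i)K_h(z-Z_j)(\zeta_i - t_i)(\zeta_j - t_j)\,d\psi(z)\cdot\int K_h(z-Z_i)K_h(z-Z_j)(\zeta_i - t_i)(\zeta_j - t_j)\Delta_n(z)\,d\psi(z)\Big).$$
So, to prove the first claim in (2.27), it suffices to show that
$$B_1 - \tilde\Gamma_n = o_p(1), \qquad B_2 = o_p(1), \qquad B_3 = o_p(1). \tag{2.28}$$

By taking the expectation, Fubini and usual calculation one can obtain
$$n^{-2}h^d\sum_{i\ne j}\Big(\int K_h(z-Z_i)K_h(z-Z_j)|\zeta_i||\zeta_j|\,d\psi(z)\Big)^2 = O_p(1), \tag{2.29}$$
$$n^{-2}h^d\sum_{i\ne j}\Big(\int K_h(z-Z_i)K_h(z-Z_j)|\zeta_i|\,d\psi(z)\Big)^2 = O_p(1), \tag{2.30}$$
$$n^{-2}h^d\sum_{i\ne j}\Big(\int K_h(z-Z_i)K_h(z-Z_j)\,d\psi(z)\Big)^2 = O_p(1). \tag{2.31}$$
Furthermore, we also have
$$\sup_{z\in I}|\Delta_n(z)| = o_p(1), \qquad \max_{1\le i\le n}|t_i| = o_p(1) \tag{2.32}$$
by (2.4), (m4) and (2.10). By expanding $(\zeta_i - t_i)(\zeta_j - t_j)$ and the quadratic terms in $B_1$, we have
$$|B_1 - \tilde\Gamma_n| \le \frac{2h^d}{n^2}\sum_{i\ne j}\Big(\int K_h(z-Z_i)K_h(z-Z_j)\big(|t_it_j| + |\zeta_it_j| + |\zeta_jt_i|\big)\,d\psi(z)\Big)^2$$
$$\qquad + \frac{4h^d}{n^2}\sum_{i\ne j}\Big(\int K_h(z-Z_i)K_h(z-Z_j)|\zeta_i\zeta_j|\,d\psi(z)\Big)\times\Big(\int K_h(z-Z_i)K_h(z-Z_j)\big(|t_it_j| + |\zeta_it_j| + |\zeta_jt_i|\big)\,d\psi(z)\Big)$$
$$=: B_{n1} + B_{n2}.$$

By (2.30), (2.31) and (2.32), one has $B_{n1} = o_p(1)$ and $B_{n2} = o_p(1)$. Hence $|B_1 - \tilde\Gamma_n| = o_p(1)$.

Next, consider $B_2$. Note that
$$B_2 \le 2\sup_z|\Delta_n(z)|^2\,n^{-2}h^d\sum_{i\ne j}\Big(\int K_h(z-Z_i)K_h(z-Z_j)|\zeta_i - t_i||\zeta_j - t_j|\,d\psi(z)\Big)^2,$$
which is of the order $o_p(1)$ by the inequality $|\zeta_i - t_i|\cdot|\zeta_j - t_j| \le |\zeta_i\zeta_j| + (|t_it_j| + |\zeta_it_j| + |\zeta_jt_i|)$ and expanding the quadratic terms, and by (2.32), (2.29), and the results that $B_{n1} = o_p(1)$, $B_{n2} = o_p(1)$. Finally, again an application of the Cauchy-Schwarz inequality to the double sum yields $B_3 = o_p(1)$. This completes the proof of (2.28) and hence that of the first claim in (2.27).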

To prove the second claim in (2.27), note that $\Gamma_n = E\tilde\Gamma_n$. Hence, with $c_{ij} = \int K_h(z-Z_i)K_h(z-Z_j)\zeta_i\zeta_j\,d\psi(z)$, one obtains
$$E[\tilde\Gamma_n - \Gamma_n]^2 = 4n^{-4}h^{2d}E\Big[\sum_{i\ne j}(c_{ij}^2 - Ec_{ij}^2)\Big]^2 \le 4n^{-4}h^{2d}\sum_{i\ne j}E(c_{ij}^4) + 4n^{-4}h^{2d}\sum_{i\ne j\ne k}Ec_{ij}^2c_{ik}^2$$
$$= 4\sum_{i\ne j}EH_n^4(\tilde Z_i, \tilde Z_j) + 4\sum_{i\ne j\ne k}EH_n^2(\tilde Z_i, \tilde Z_j)H_n^2(\tilde Z_i, \tilde Z_k)$$
$$\le 4(n^2 + n^3)EH_n^4(\tilde Z_1, \tilde Z_2) = O(n^{-1}h^{-d}) = o(1),$$
by (2.21) and (h1), thereby proving the second claim in (2.27).

The third claim in (2.27) is easily obtained from the following fact:
$$\Gamma_n = 2h^d(n-1)n^{-1}\iint\big[EK_h(x-Z)K_h(y-Z)\sigma_\zeta^2(Z)\big]^2d\psi(x)\,d\psi(y) = 2n(n-1)EH_n^2(\tilde Z_1, \tilde Z_2) = \frac{4(n-1)}{n}\cdot\frac{n^2}{2}EH_n^2(\tilde Z_1, \tilde Z_2) \to \Gamma$$
by (2.23). This completes the proof of Lemma 2.5.5. □

2.6 Simulations

This section contains results of two simulation studies corresponding to the following cases: Case 1: $d = q = 1$ and $m_\theta$ linear; Case 2: $d = q = 2$ and $m_\theta$ nonlinear. In each case the Monte Carlo average values of $\hat\theta_n$, MSE($\hat\theta_n$), and the empirical levels and powers of the m.d. test are reported. The asymptotic level is taken to be 0.05 in all cases.

In the first case $\{Z_i\}_{i=1}^n$ are obtained as a random sample from the uniform distribution on $[-1, 1]$, and $\{\varepsilon_i\}_{i=1}^n$ and $\{\eta_i\}_{i=1}^n$ are obtained as two independent random samples from $N_1(0, (0.1)^2)$. Then $(X_i, Y_i)$ are generated using the model $Y_i = \mu(X_i) + \varepsilon_i$, $X_i = Z_i + \eta_i$, $i = 1, 2, \ldots, n$.

The kernel functions and the bandwidths used in the simulation are
$$K(z) = K^*(z) = \tfrac{3}{4}(1 - z^2)I(|z| \le 1), \qquad h = an^{-1/3}, \qquad w = bn^{-1/5}(\log n)^{1/5},$$
with some choices for $a$ and $b$. The integrating measure $G$ is taken to be the uniform measure on $[-1, 1]$.
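The parametric model is taken to be $m_\theta(x) = \theta x$, $x, \theta \in R$, $\theta_0 = 1$. Then, $H_\theta(z) = \theta z$. In this case various calculations simplify as follows. By taking the derivative of $M_n(\theta)$ in $\theta$ and solving the equation $\partial M_n(\theta)/\partial\theta = 0$, we obtain $\hat\theta_n = A_n/B_n$, where
$$A_n = \int_{-1}^{1}\Big[\sum_{i=1}^n K_h(z-Z_i)Y_i\Big]\cdot\Big[\sum_{i=1}^n K_h(z-Z_i)Z_i\Big]\cdot\Big[\sum_{i=1}^n K_w(z-Z_i)\Big]^{-2}dz,$$
$$B_n = \int_{-1}^{1}\Big[\sum_{i=1}^n K_h(z-Z_i)Z_i\Big]^2\cdot\Big[\sum_{i=1}^n K_w(z-Z_i)\Big]^{-2}dz.$$
Then,
$$M_n(\hat\theta_n) = \int_{-1}^{1}\Big(\sum_{i=1}^n K_h(z-Z_i)(Y_i - \hat\theta_nZ_i)\Big)^2\cdot\Big(\sum_{i=1}^n K_w(z-Z_i)\Big)^{-2}dz,$$
$$\hat C_n = \int_{-1}^{1}\Big(\sum_{i=1}^n K_h^2(z-Z_i)(Y_i - \hat\theta_nZ_i)^2\Big)\cdot\Big(\sum_{i=1}^n K_w(z-Z_i)\Big)^{-2}dz.$$

The value of the test statistic is calculated by $\hat D_n := nh^{d/2}\hat\Gamma_n^{-1/2}(M_n(\hat\theta_n) - \hat C_n)$.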
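The Case 1 computations above can be sketched numerically as follows. This is only an illustrative sketch, not the code used for the reported tables: the integrals over $[-1, 1]$ are approximated by a midpoint rule (the grid size `m`, the seed and the choice $a = b = 0.5$ are assumptions), and the variance estimate is taken in the assumed form $\hat\Gamma_n = 2h^dn^{-2}\sum_{i\ne j}(\int K_h(z-Z_i)K_h(z-Z_j)\hat\zeta_i\hat\zeta_j\,d\hat\psi(z))^2$ with $d\hat\psi(z) = dz/\hat f_{Zw}^2(z)$, which is the sample analogue of $\tilde\Gamma_n$ in the proof of Lemma 2.5.5. The function name `md_fit_linear` is hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """K(u) = (3/4)(1 - u^2) on |u| <= 1, zero elsewhere."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def md_fit_linear(Z, Y, a=0.5, b=0.5, m=2000):
    """Minimum distance fit of m_theta(x) = theta * x, so H_theta(z) = theta * z.

    Returns (theta_hat, D_hat). Integrals over [-1, 1] use a midpoint rule
    with m nodes (an illustrative choice).
    """
    n = len(Z)
    h = a * n ** (-1 / 3)                          # bandwidth for K_h
    w = b * n ** (-1 / 5) * np.log(n) ** (1 / 5)   # bandwidth for K_w
    dz = 2.0 / m
    z = -1.0 + dz * (np.arange(m) + 0.5)           # midpoint grid on [-1, 1]

    diff = z[:, None] - Z[None, :]                 # (m, n) array of z - Z_i
    Kh = epanechnikov(diff / h) / h                # K_h(z - Z_i)
    Sw = (epanechnikov(diff / w) / w).sum(axis=1)  # sum_i K_w(z - Z_i)
    wt = dz / Sw**2                                # midpoint weight times Sw^(-2)

    # theta_hat = A_n / B_n
    KhZ = Kh @ Z
    A = np.sum((Kh @ Y) * KhZ * wt)
    B = np.sum(KhZ**2 * wt)
    theta = A / B

    res = Y - theta * Z                            # residuals Y_i - theta_hat * Z_i
    M = np.sum((Kh @ res) ** 2 * wt)               # M_n(theta_hat)
    C = np.sum(((Kh**2) @ (res**2)) * wt)          # C_hat_n

    # Assumed form of Gamma_hat_n: 2 h n^2 sum_{i != j} (res_i res_j I_ij)^2,
    # where I_ij = int K_h(z - Z_i) K_h(z - Z_j) / Sw(z)^2 dz (here d = 1).
    I = (Kh * wt[:, None]).T @ Kh
    T = (np.outer(res, res) * I) ** 2
    np.fill_diagonal(T, 0.0)
    Gamma = 2.0 * h * n**2 * T.sum()

    D = n * np.sqrt(h) * (M - C) / np.sqrt(Gamma)
    return theta, D

rng = np.random.default_rng(0)
n = 200
Z = rng.uniform(-1.0, 1.0, n)
X = Z + rng.normal(0.0, 0.1, n)                    # Berkson structure X = Z + eta
eps = rng.normal(0.0, 0.1, n)
theta0, D0 = md_fit_linear(Z, X + eps)                 # data from the null model
theta1, D1 = md_fit_linear(Z, X + 0.3 * X**2 + eps)    # data with a quadratic departure
print(theta0, D0, D1)
```

Because $\sum_iK_w(z-Z_i) = n\hat f_{Zw}(z)$, the factors of $n$ cancel in $A_n/B_n$, $M_n$ and $\hat C_n$, which is why the sketch works with unnormalized kernel sums throughout.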
Table 2.1 reports the Monte Carlo mean and the MSE of $\hat\theta_n$ under $H_0$ for the sample sizes 50, 100, 200, 500, each repeated 1000 times. One can see there appears to be little bias in $\hat\theta_n$ for all chosen sample sizes and, as expected, the MSE decreases as the sample size increases.

Sample size      50        100       200       500
Mean             1.0003    0.9987    1.0006    0.9998
MSE              0.0012    0.0006    0.0003    0.0001

Table 2.1: Mean and MSE of $\hat\theta_n$, d = 1, q = 1

To assess the level and power behavior of the $\hat D_n$ test, we chose the following four models to simulate data from. In each of these cases $X_i = Z_i + \eta_i$.

Model 0: $Y_i = X_i + \varepsilon_i$,
Model 1: $Y_i = X_i + 0.3X_i^2 + \varepsilon_i$,
Model 2: $Y_i = X_i + 1.4\exp(-0.2X_i^2) + \varepsilon_i$,
Model 3: $Y_i = X_iI(X_i \ge 0.2) + \varepsilon_i$.
To assess the effect of the choice of $(a, b)$ that appear in the bandwidths on the level and power, we ran the simulations for numerous choices of $(a, b)$, ranging from 0.2 to 1. Table 2.2 reports the simulation results pertaining to $\hat D_n$ for three choices of $(a, b)$. The simulation results for the other choices were similar to those reported here. Data from Model 0 in this table are used to study the empirical sizes, and data from Models 1 to 3 are used to study the empirical powers of the test. These entities are obtained by computing $\#\{|\hat D_n| \ge 1.96\}/1000$.
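The counting rule above can be written as a small helper. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the input is assumed to be a list of simulated values of the test statistic, one per Monte Carlo replication.

```python
from statistics import NormalDist

def critical_value(alpha=0.05):
    """Two-sided standard normal critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def empirical_rejection_rate(stats, alpha=0.05):
    """Fraction of replications with |D_n| >= z_{alpha/2}."""
    crit = critical_value(alpha)
    return sum(abs(d) >= crit for d in stats) / len(stats)

# Toy illustration with four made-up statistic values.
rate = empirical_rejection_rate([0.0, 2.5, -3.0, 1.0])
print(critical_value(), rate)
```

Applied to statistics simulated under Model 0 this gives an empirical level, and applied to statistics simulated under Models 1 to 3 it gives an empirical power.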

From Table 2.2, one sees that the empirical level is sensitive to the choice of $(a, b)$ for moderate sample sizes ($n \le 200$) but gets closer to the asymptotic level of 0.05 with the increase in the sample size, and hence is stable over the chosen values of $(a, b)$ for large sample sizes. On the other hand, the empirical power appears to be far less sensitive to the values of $(a, b)$ for the sample sizes of 100 and more. Even though the theory we developed is not applicable to Model 3, it was included here to see the effect of the discontinuity in the regression function on the power of the minimum distance test. In our simulation, the discontinuity of the regression has little effect on the power of the minimum distance test.
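As a rough cross-check of Table 2.1 against Theorem 2.4.1, the asymptotic variance $\Sigma_0^{-1}\Sigma\Sigma_0^{-1}$ can be evaluated for the Case 1 design. The sketch below rests on assumptions that are not restated in this section: $\Sigma_0 = \int\dot H_{\theta_0}^2\,dG$ as the form of (m6), $\tau^2(z) = E[(m_{\theta_0}(X) - H_{\theta_0}(Z))^2\,|\,Z = z] = \theta_0^2\,\mathrm{Var}(\eta)$ for the linear model, and $g \equiv 1$ on $[-1, 1]$ (the ratio is unchanged if $g$ is rescaled by a constant).

```python
import numpy as np

sigma_eps2 = 0.1**2              # Var(eps)
tau2 = 1.0**2 * 0.1**2           # assumed tau^2(z) = theta_0^2 * Var(eta)

m = 100000                       # midpoint rule on [-1, 1]
dz = 2.0 / m
z = -1.0 + dz * (np.arange(m) + 0.5)
g = np.ones_like(z)              # assumed density of G on [-1, 1]
fZ = np.full_like(z, 0.5)        # density of Z ~ Uniform[-1, 1]
Hdot = z                         # derivative of H_theta(z) = theta * z in theta

Sigma = np.sum((sigma_eps2 + tau2) * Hdot**2 * g**2 / fZ) * dz
Sigma0 = np.sum(Hdot**2 * g) * dz
avar = Sigma / Sigma0**2         # scalar version of Sigma0^{-1} Sigma Sigma0^{-1}

for n, mse in [(50, 0.0012), (100, 0.0006), (200, 0.0003), (500, 0.0001)]:
    print(n, avar / n, mse)      # asymptotic approximation vs. Table 2.1
```

Under these assumptions the asymptotic variance is 0.06, so the approximation $0.06/n$ gives 0.0012, 0.0006, 0.0003 and 0.00012 for $n = 50, 100, 200, 500$, in line with the MSE row of Table 2.1.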

Now consider Case 2, where $d = 2$, $q = 2$ and $m_\theta(x) = \theta_1x_1 + \exp(\theta_2x_2)$, $\theta = (\theta_1, \theta_2)^T \in R^2$, $x_1, x_2 \in R$. Accordingly, here $H_\theta(z) = \theta_1z_1 + \exp(\theta_2z_2 + 0.005\theta_2^2)$. The true $\theta_0 = (1, 2)^T$ was used in the simulations.
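The expression for $H_\theta$ follows from the normal moment generating function: with $X_2 = z_2 + \eta_2$ and $\eta_2 \sim N(0, (0.1)^2)$, $E\exp(\theta_2X_2) = \exp(\theta_2z_2 + \theta_2^2(0.1)^2/2)$. A quick Monte Carlo sanity check is sketched below; the evaluation point `z`, the seed and the number of draws are arbitrary illustrative choices, and the measurement error distribution is the one described for this case.

```python
import numpy as np

def H(theta, z, sd_eta=0.1):
    """Closed form of E[m_theta(X) | Z = z] when X = z + eta, eta ~ N(0, sd_eta^2 I)."""
    t1, t2 = theta
    return t1 * z[0] + np.exp(t2 * z[1] + 0.5 * (sd_eta * t2) ** 2)

rng = np.random.default_rng(1)
theta = (1.0, 2.0)
z = np.array([0.3, -0.4])

eta = rng.normal(0.0, 0.1, size=(400000, 2))   # Berkson measurement errors
X = z + eta
mc = np.mean(theta[0] * X[:, 0] + np.exp(theta[1] * X[:, 1]))

print(mc, H(theta, z))   # Monte Carlo average vs. closed form
```

For $\theta_2 = 2$ the correction term is $0.5\,(0.1 \cdot 2)^2 = 0.02 = 0.005\,\theta_2^2$, matching the constant in $H_\theta$ above.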

                          Sample size
Model     (a, b)        50       100      200      500
Model 0   (0.3, 0.2)    0.007    0.026    0.028    0.048
          (0.5, 0.5)    0.014    0.022    0.040    0.051
          (1.0, 1.0)    0.021    0.020    0.031    0.043
Model 1   (0.3, 0.2)    0.754    0.987    1.000    1.000
          (0.5, 0.5)    0.945    1.000    1.000    1.000
          (1.0, 1.0)    1.000    1.000    1.000    1.000
Model 2   (0.3, 0.2)    0.857    0.996    1.000    1.000
          (0.5, 0.5)    0.999    1.000    1.000    1.000
          (1.0, 1.0)    1.000    1.000    1.000    1.000
Model 3   (0.3, 0.2)    0.874    0.993    1.000    1.000
          (0.5, 0.5)    1.000    1.000    1.000    1.000
          (1.0, 1.0)    1.000    1.000    1.000    1.000

Table 2.2: Levels and powers of the m.d. test, d = 1, q = 1

 

 

 

In all models below, $\{Z_i = (Z_{1i}, Z_{2i})^T\}_{i=1}^n$ are obtained as a random sample from the uniform distribution on $[-1, 1]^2$, $\{\varepsilon_i\}_{i=1}^n$ are obtained from $N_1(0, (0.1)^2)$, and $\{\eta_i = (\eta_{1i}, \eta_{2i})^T\}_{i=1}^n$ are obtained from the bivariate normal distribution with mean vector 0 and the diagonal covariance matrix with both diagonal entries equal to $(0.1)^2$. We simulated data from the following four models, where $X_i = Z_i + \eta_i$.
Model 0: $Y_i = X_{1i} + \exp(2X_{2i}) + \varepsilon_i$,
Model 1: $Y_i = X_{1i} + \exp(2X_{2i}) + 1.4X_{1i}^2 + 1 + \varepsilon_i$,
Model 2: $Y_i = X_{1i} + \exp(2X_{2i}) + 1.4X_{1i}^2X_{2i}^2 + \varepsilon_i$,
Model 3: $Y_i = X_{1i} + \exp(2X_{2i}) + 1.4(\exp(-0.2X_{1i}) + \exp(0.7X_{2i}^2)) + \varepsilon_i$.
The kernel function and the bandwidths used in the simulation were taken to be
$$K(z) = K^*(z) = \tfrac{9}{16}(1 - z_1^2)(1 - z_2^2)I(|z_1| \le 1, |z_2| \le 1), \qquad h = n^{-1/4.5}, \qquad w = n^{-1/6}(\log n)^{1/6}.$$

The sample sizes chosen are 50, 100, 200 and 300, each repeated 1000 times. Table 2.3 lists the means and the MSE of the estimator $\hat\theta_n = (\hat\theta_{n1}, \hat\theta_{n2})$, which are obtained by minimizing $M_n(\theta)$ and employing the Newton-Raphson algorithm. As in Case 1, one sees little bias in the estimator for all chosen sample sizes.

Table 2.4 gives the empirical sizes and powers for testing Model 0 against Models 1-3. The entries in Table 2.4 corresponding to Model 0 are used to study the empirical size of the m.d. test, and the entries from Models 1-3 are used to study the empirical power of the test. From this table one sees that our m.d. test is conservative when the sample sizes are small, while the sizes do increase with the sample sizes and indeed preserve the nominal size 0.05. It also shows that the m.d. test performs well for sample sizes larger than 200 at all alternatives.

 

Sample size                   50        100       200       300
Mean of $\hat\theta_{n1}$     0.9978    0.9973    0.9974    0.9988
MSE of $\hat\theta_{n1}$      0.0190    0.0095    0.0053    0.0034
Mean of $\hat\theta_{n2}$     1.9962    1.9965    2.0013    2.0004
MSE of $\hat\theta_{n2}$      0.0063    0.0028    0.0014    0.0010

Table 2.3: Mean and MSE of $\hat\theta_n$, d = 2, q = 2

 

 

 

 

Sample size    50       100      200      300
Model 0        0.003    0.019    0.049    0.052
Model 1        0.158    0.843    0.979    0.996
Model 2        0.165    0.840    0.976    0.992
Model 3        0.044    0.608    0.954    0.997

Table 2.4: Levels and powers of the m.d. test, d = 2, q = 2

 

BIBLIOGRAPHY

[1] An, H.Z., Cheng, B. (1991). A Kolmogorov-Smirnov type statistic with application to test for nonlinearity in time series. Int. Statist. Rev. 59, 287-307.

[2] Anderson, T.W. (1984). Estimating Linear Statistical Relationships. Ann. Statist. 12, 1-45.

[3] Beran, R.J. (1977). Minimum Hellinger distance estimates for parametric models. Ann. Statist. 5, 445-463.

[4] Berkson, J. (1950). Are these two regressions? J. Amer. Statist. Assoc. 5, 164-180.

[5] Bickel, P.J. & Ritov, Y. (1987). Efficient Estimation in the Errors in Variables Model. Ann. Statist. 15, 2, 513-540.

[6] Bosq, D. (1998). Nonparametric statistics for stochastic processes: Estimation and Prediction, 2nd edition. Springer Lecture Notes in Statistics, 110. Springer-Verlag, New York, Inc.

[7] Carroll, R.J. & Hall, P. (1988). Optimal rates of convergence for deconvoluting a density. JASA 83, 1184-1185.

[8] Carroll, R.J. & Spiegelman, C.H. (1992). Diagnostics for nonlinearity and heteroscedasticity in errors in variables regression. Technometrics 34, 186-196.

[9] Carroll, R.J., Ruppert, D. and Stefanski, L.A. (1995). Measurement Error in Nonlinear Models. Chapman & Hall/CRC, Boca Raton.

[10] Cheng, C. and Van Ness, J.W. (1999). Statistical regression with measurement error. Arnold, London.

[11] Cheng, C.L. and Kukush, A.G. (2004). A goodness-of-fit test for a polynomial errors-in-variables model, 56, 641-661.

[12] Elias Masry (1993). Strong consistency and rates for deconvolution of multivariate densities of stationary process. Stochastic Processes and their Applications 47, 53-74.

[13] Eubank, R.L., Hart, J.D. (1992). Testing the goodness of fit in regression via order selection criteria. Ann. Statist. 20, 1412-1425.

[14] Eubank, R.L., Hart, J.D. (1993). Commonality of CUMSUM, von Neumann and smoothing based goodness-of-fit tests. Biometrika 80, 89-98.

[15] Eubank, R.L., Spiegelman, C.H. (1990). Testing the goodness of fit of a linear model via nonparametric regression techniques. J. Amer. Statist. Assoc. 85, 387-392.

[16] Fan, J. (1991a). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist. 19, 1257-1272.

[17] Fan, J. (1991b). Asymptotic normality for deconvolution kernel density estimators. Sankhyā Ser. A 53, 97-110.

[18] Fan, J. & Truong, K.T. (1993). Nonparametric regression with errors in variables. Ann. Statist. 21, 1900-1925.

[19] Fuller, W.A. (1987). Measurement Error Models. Wiley, New York.

[20] Gleser, L.J. (1981). Estimation in a Multivariate "Errors in Variables" Regression Model: Large Sample Results. Ann. Statist. 9, 1, 24-44.

[21] Hart, J.D. (1997). Nonparametric smoothing and lack-of-fit tests. Springer-Verlag, New York, Inc.

[22] Huwang, L. and Huang, Y.H.S. (2000). On errors-in-variables in polynomial regression - Berkson case. Statist. Sinica 10, 923-936.

[23] Koul, Hira L. and Pingping Ni (2004). Minimum distance regression model checking. J. Stat. Plann. Inference 119, No. 1, 109-141.

[24] Mack, Y.P. and Silverman, B.W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. Wahrsch. Gebiete 61, 405-415.

[25] Rudemo, M., Ruppert, D. and Streibig, J. (1989). Random effect models in nonlinear regression with applications to bioassay. Biometrics 45, 349-362.

[26] Stute, W. (1997). Nonparametric model checks for regression. Ann. Statist. 25, 613-641.

[27] Stefanski, L.A. and Carroll, R.J. (1991). Deconvolution-based score tests in measurement error models. The Annals of Statistics 19, 249-259.

[28] Stute, W. (1997). Nonparametric model checks for regression. Ann. Statist. 25, 613-641.

[29] Stute, W., Thies, S., Zhu, L.X. (1998). Model checks for regression: an innovation process approach. Ann. Statist. 26, 1916-1934.

[30] Wang, L. (2003). Estimation of nonlinear Berkson-type measurement errors models. Statist. Sinica 13, 1201-1210.

[31] Wang, L. (2004). Estimation of nonlinear models with Berkson measurement errors. Ann. Statist. 32, 6, 2559-2579.

[32] Wolfowitz, J. (1953). Estimation by the minimum distance method. Ann. Inst. Statist. Math., Tokyo, 5, 9-23.

[33] Wolfowitz, J. (1954). Estimation by the minimum distance method in nonparametric stochastic difference equation. Ann. Math. Statist. 25, 203-217.

[34] Wolfowitz, J. (1957). The minimum distance method. Ann. Math. Statist. 28, 75-88.

[35] Zheng, J.X. (1996). A consistent test of functional form via estimation technique. J. Econometrics 75, 263-289.

[36] Zhu, L.X., Song, W.X. & Cui, H.J. (2003). Testing lack-of-fit for a polynomial errors-in-variables model. Acta Math. Appl. Sin. Engl. Ser. 19, 353-362.
