m LIBRARY
2- Michigan State
2008 University

This is to certify that the
dissertation entitled

ON SOME INFERENCE PROBLEMS FOR CURRENT
STATUS DATA

presented by-
DEEPA AGGARWAL

has been accepted towards fulﬁllment
of the requirements for the

Ph.D. degree in STATISTICS AND
PROBABILITY

 

 

ﬂag/'—

 

Major Professor's Signature
”“27 9,1 M?
Date

MSU is an Afﬁrmative Action/Equal Opportunity Institution

 

 

-.-—.—.—p-.~._-—.-.~.-.~.-.—.-—.—-—u--.—.—.-—.--—---—-—.--—.-.—---._.—.—.—‘a.-.-—-_I-.--—.—.—-—-—.-—o—.—.—-—u—.-n-

 

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5/08 KrlProﬂAccS-Pres/CIRC/DateDue.mdd

 

On Some Inference Problems For
Current Status Data

By

Deepa Aggarwal

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2008

ABSTRACT

On Some Inference Problems For
Current Status Data

By

Deepa Aggarwal

In the current status or interval censored case 1 data, one does not observe the
event occurrence time but only the inspection time and whether the event has occurred
prior to the inspection time or not. This thesis consists of two parts. The ﬁrst
part pertains to ﬁtting a parametric model to the distribution function of the event
occurrence time in the one sample set up with current status data. In this part,
we ﬁrst discuss two analogous minimum distance inference procedures for ﬁtting a
regression function in the classical regression set up. These distances are based on
squared deviations of a nonparametric regression function estimator and the model
being ﬁtted. In the one distance the integrating measure is a— ﬁnite and in the second,
it is data dependent. The thesis establishes asymptotic normality of the proposed
empirical minimum distance statistic and that of the corresponding estimator under
the ﬁtted model in a general regression set up. Then, these results for empirical
minimum distance test are adapted to ﬁt a parametric model to the distribution of

the event occurrence time based on current status data. It also contains a ﬁnite

sample comparison of the proposed test with Koul and Yi test and the one sample
Cramér-von Mises test based on nonparametric maximum likelihood estimator of the
distribution function of the event occurrence time.

The second part of the thesis pertains to testing for the equality of the two event
occurrence time distribution functions in the two sample setting when the data is in-
terval censored case 1 from both samples. It derives the asymptotic distribution of the
underlying test statistic both under the null hypothesis and under local alternatives.
It also contains a ﬁnite sample comparison of the proposed test with the two sample
Cramér-von Mises test based on nonparametric maximum likelihood estimators of

the time to event distribution functions.

ACKNOWLEDGMENTS

I wish to express my deepest regard to my advisor Professor Hira L. Koul for his
invaluable guidance, generous support. I have enjoyed our interactions tremendously.
I thank Dr Koul for providing me with countless opportunities to grow both personally
and professionally. This dissertation could not have been completed without his help
and support.

I would also like to thank Professors Sarat Dass, Dennis Gilliland and Habib
Salehi for serving on my guidance committee. My special thanks go to Professors
Connie Page, Dennis Gilliland and Sandra Herman for their advice when I was at the
consulting service.

I would like to thank my family especially my mother, my husband and my daugh-
ter for all the support and encouragement provided by them during my graduate
education.

This research was partly supported by the NSF grant DMS 0704130 with Professor

Koul as P.I.

iv

TABLE OF CONTENTS

1 Introduction 1

2 Empirical Minimum Distance
Lack-Of—Fit Testing

In Regression Model 8
2.1 Introduction ................................ 8
2.2 Assumptions ................................ 12
2.3 Consistency of 0;; and ﬁn ......................... 14
2.4 Asymptotic distribution of «in ...................... 28
2.5 Asymptotic normality of Mn (0n) .................... 41

3 Minimum Distance Goodness-Of—Fit Tests

For Current Status Data 60
3.1 Introduction ................................ 60
3.2 Minimum Distance Statistics and Tests ................ 63
3.3 Empirical Minimum Distance Statistic ................. 67
3.4 Simulations ................................ 68

4 Testing the equality of two distributions with Current Status Data 75

4.1 Introduction ................................ 75
4.2 Asymptotic behavior under the null hypothesis and local alternatives 78
4.3 Simulations ................................ 88
BIBLIOGRAPHY 93

3.1

3.2

3.3

3.4

3.5

3.6

3.7

3.8

4.1

4.2

4.3

4.4

4.5

4.6

4.7

4.8

LIST OF TABLES

Mean and MSE of tin, X, T N exp(1), 00 = 1 ............. 72
Empirical sizes of Mn(én), X, T ~ exp(1) ............... 72
Power of Mn(én), T ~ exp(1), (c1,c2) = (.9, 1) ............ 73
Simulated percentiles of CV1, X, T N exp(1) ............... 73
Empirical sizes of CV1, X, T ~ exp(1) ................. 73
Mean and MSE of (An, X, T ~ exp(1), 00 = 1 ............. 74
Power of CV1, T = exp(1) ......................... 74
Empirical sizes, X,T ~ exp(1), (01,02) = (.9, .8) ............ 74
Empirical sizes of T, X, Y N exp(1), S,T ~ exp(1) .......... 90
Empirical sizes of T, X, Y ~ exp(1), (n1,n2) = (180, 200) ...... 90
Power of T, S,T ~ exp(1), X ~ exp(1), n1 = 77.2 = 50 ......... 91
Simulated percentiles of CV2, X, Y ~ exp(1), S,T ~ exp(1.5) ..... 91
Empirical sizes of CV2, X, Y ~ exp(1), S, T N exp(1.5) ........ 91
Simulated 95th percentile of CV2, X, Y ~ exp(1). ........... 92
Empirical sizes, X, Y N exp(1), (c1,c2) = (25,1). ........... 92
Power, S,T ~ exp(1), X ~ exp(1), (01,02) = (.2, .9). ......... 92

vi

CHAPTER 1

Introduction

In recent years there has been a considerable research on the analysis of interval-
censored data. In the case I interval censored data, an event occurrence time X
is unobservable, but one observes an inspection time T and whether an event has
occurred prior to this time or not. This type of data is also known as current status
data. It is different from the right censored data where one observes true life time in
the case of no censoring and only censoring time when life time is censored.

Current status data often arises in epidemiology, demography and economics. For
example, as mentioned in Jewell and Van der Laan (2004), in the study of infectious
disease Human Immunodeﬁciency Virus (HIV), in particular, the partner studies of
HIV infection. These partnerships are assumed to include an index case who has been
infected via some external source, and a susceptible partner who has no other means
of infection except the contact with the index case. Suppose X denotes the time
from infection of the index case to infection of the susceptible partner and T is the

time the susceptible partner is examined after infection of the index case. Then the

infection status of the susceptible partner provides current status data. For some more
applications for current status data, see Hoel and Walburg (1972), Finkelstein and
Wolfe (1985), Finkelstein (1986), Diamond, McDonald and Shah (1986), Diamond
and McDonald (1991), Keiding (1991) and Jewell and Van der Laan (2004).

This thesis is concerned with the following two problems. The ﬁrst problem per-
tains to ﬁtting a parametric model to the distribution function of the event occurrence
time in the one sample set up with current status data. The second problem is con-
cerned with testing for the equality of the two event occurrence times distribution
functions in the two sample setting when the data is interval censored case 1 from
both samples.

We shall now focus on the ﬁrst problem for the moment. To describe this problem a
bit more precisely, let F denote the distribution function (d.f.) of the event occurrence
time X, G be a subset of q-dimensional Euclidean space IN, and (Fe, 0 E G} be
a known parametric family of d.f.’s on [0,00). Let T be the inspection time and
6 := I (X S T) and I be a compact interval of [0, 00), where I [A] denotes the
indicator function of the event A. We assume X to be independent of T. The problem

of interest here is to test the hypothesis
H01 : F(t) = F90“), for all t E I, for some 90 E 6,
against the alternative
H11 : H01 is not true.

It is natural to base tests of H01 on a distance between the nonparametric max-
imum likelihood estimates F of F and (Fe, 0 E 6}. One such test statistic is the

2

Cramér—von-Mises statistic

cvl = 1an / (F(x) —F0(z))2dﬁ(x).

But unfortunately neither the ﬁnite sample nor the asymptotic null distribution of this
statistic is known because of the complicated nature of the distribution of F‘. Even
asymptotic distribution of a suitably standardized E is intractable, cf., Groeneboom
and Wellner (1992).

An alternative way to proceed is to use the well known regression relationship
between 5 and F(T), i.e., E(6|T) = F (T), and the fact that this regression is het-
eroscedastic. In this context then the problem of testing H01 is equivalent to testing
the lack—of—ﬁt of the parametric regression model {F9, 6 E 6}.

There is a vast literature on the problem of testing for the lack—of-ﬁt of a para-
metric regression model. The monograph of Hart (1997) provides a nice overview
on the subjet till 1997. Using the ideas of Khmaladze (1979), Stute, Thies and Zhu
(1998) proposed an asymptotically distribution free test for this problem based on
a martingale transform of a certain marked empirical process of the residuals. Koul
and Ni (2005) used the minimum distance methodology to propose a class of tests for
the same problem. In all this literature the data is completely observable.

Using the above mentioned equivalence between testing H01 and the correspond-
ing lack-of-ﬁt testing of a regression model, Koul and Yi (2006) adapted the Stute—
Thies-Zhu test to test for H01. They provide sufﬁcient conditions for consistency
of their test at a ﬁxed alternative and derive an expression for its asymptotic power

against local alternatives.

Koul and Ni (2004) used the integrated square distance between a kernel type
nonparametric estimator of the regression function and the model being ﬁtted, where
the integrating measure is a a—ﬁnite measure. A practical problem that arises in using
these statistics is the choice of the integrating measure. Although one may choose
this by using some optimality criteria, such a measure will invariably depend on the
model being ﬁtted and the design distribution.

In this thesis we ﬁrst discuss two analogous minimum distance inference proce-
dures in the classical regression set up, ﬁrst when the integrating measure is a—ﬁnite
and second when the integrating measure is data dependent-viz, the empirical d.f. of
the design variable. We prove asymptotic normality of the proposed empirical mini-
mum distance statistic and that of the corresponding estimator under the ﬁtted model
in a general regression set up. Then, these results for empirical minimum distance
test are adapted to ﬁt a parametric model to the distribution of the event occurrence
times based on current status data. We also show consistency of the proposed mini-
mum distance tests against a ﬁxed alternative and obtain asymptotic power against
a class of local alternatives for current status data.

We now describe the second problem of this thesis. To describe it more precisely,
let F1 (F2) denote the d.f. of event occurrence time X (Y) from the ﬁrst (second)
population, and let S (T) be the corresponding inspection time. In the two sample
current status data set up, one observes (6, S) and (n, T), where 6 = I [X S S] and

17 = I [Y S T]. The problem of interest here is to test the null hypothesis that the

two event occurrence distributions are the same, i.e.
H02: F1(a:) = F2(rc), for all :1: E I,
against the alternative
H12: F1(:r) 79 172(1), for some :2: E I.

Similar to the one sample set up discussed above, it is natural to base tests of
H02 on nonparametric maximum likelihood estimates F1 and F2 of F1 and F2. One

such test is based on the Cramér-von—Mises statistic

 

 

CV2 : 7117-2712 ‘/J:' (131(1) - F2(33))2d13‘1 (at)
1217:2112 /1_(ﬁ‘1(:r)— ﬁ2($))2dﬁg($).

Again for the same reasons given above asymptotic null distribution of such a statistic
is not currently tractable.

An alternative way to proceed is to use the well known regression relationship
between 6 and 171(5), and 77 and F2(T), i.e., E((5|S) = F1(S) and E(77IT) = F2(T),
and the fact that these two regressions are heteroscedastic. In this context then the
problem of testing H02 is equivalent to testing the equality of the two regression
functions under heteroscedasticity.

The problem of comparing the two regression functions has been discussed by
several authors. In general, see, e.g., Hall and Hart (1990), King, Hart and Wehrly
(1991), Carroll and Hall (1992), Delgado (1993), Kulasekera (1995), Koul and Schick
(1997, 2003), Neumeyer and Dette (2003), among others. The data is completely
observable in the above mentioned literature.

5

Koul and Schick (2003) proposed a test using covariate matching for the same
problem in a general regression set up. In this thesis, we adapt this test to the two
sample current status data and discuss its asymptotic normality under a general set
of assumptions.

This thesis is organized as follows. Chapter 2 studies empirical minimum dis-
tance tests of lack of ﬁt in classical regression set up. Corollary 2.3.1 and Theorem
2.3.1 state and prove consistency of empirical minimum distance estimates of the
underlying parameters of the model being ﬁtted. Theorem 2.4.1 and Theorem 2.5.1
give asymptotic distribution of the parameter estimator and the empirical minimum
distance statistic under the null hypothesis.

In chapter 3, section 2, we apply the results of Koul and Ni (2004) for minimum
distance tests of goodness of ﬁt hypothesis based on current status data. After that,
we discuss consistency of these tests against a ﬁxed alternative and obtain asymptotic
power against a class of local alternatives. Section 3.3 uses the results of Chapter 2
for empirical minimum distance test to ﬁt a parametric model to the distribution of
the event occurrence times based on current status data.

Section 3.4 reports the numerical results of the three simulation studies in the one
sample set up. The ﬁrst one assesses the ﬁnite sample level and power behavior of
the empirical minimum distance test. The simulation results of empirical minimum
distance statistic are consistent with asymptotic theory. Also, simulation results show
little bias in the estimator of the best ﬁtted parameter for all the chosen sample sizes.
The second simulation study investigates Monte Carlo size and power behavior of

the Cramér-Von-Mises test CV1. The ﬁnite sample level of this test approximates

6

the nominal level well for all the chosen sample sizes. The third simulation study
investigates Monte Carlo size comparison of empirical minimum distance test, CV1,
and Koul and Yi (2006) test. Simulation results show that empirical sizes are better
for C V1 and Koul and Yi (2006) test as compared to empirical minimum distance test,
when sample size is less than 200. But when sample size is 200 or large, empirical
sizes are comparable in all the three tests. In our simulations, F is obtained by
the one step procedure for the calculation of the nonparametric maximum likelihood
estimator, based on isotonic regression, cf. Groeneboom and Wellner (1992).

Chapter 4 deals with the problem of testing the equality of two distribution func-
tions against the two sided alternative based on the current status data. Proposition
4.2.1 discuss asymptotic normality of the underlying test statistic under a general set
of assumptions. Section 4.3 reports the numerical results of the two simulation stud-
ies. The ﬁrst one assesses the ﬁnite sample level and power behavior of the proposed
test statistic. The simulation results of the proposed test statistic are consistent with
asymptotic theory.

In the second simulation study, the ﬁnite sample comparison of the proposed test
statistic with the two sample Cramér-Von Mises test is made. Simulation results show
that for all the chosen alternatives, bandwidths and sample sizes, signiﬁcance level and
power of the proposed and CV2 tests are comparable. Again, in our simulations, F1
and F2 are obtained by the one step procedure for the calculation of the nonparametric

maximum likelihood estimator, based on isotonic regression.

CHAPTER 2

Empirical Minimum Distance
Lack-Of-Fit Testing

In Regression Model

2.1 Introduction

This chapter discusses an empirical minimum distance method for ﬁtting a parametric
model to the regression function “(3) :2 E(Y|X = :13), :r 6 Rd, d 2 1, assuming it
exists, where Y is the one dimensional response variable and X is a d dimensional
design variable. Let {m0(:z:) : a: 6 Rd, 6 E G C 189, q 2 1} be a given parametric
family of regression functions and let I be a compact subset of Rd. The problem of

interest is that of model checking, i.e., to test the hypothesis

H0 : p(x) = m90(:r), for some 60 E 8 and for all :1: E I;

H1 : H0 is not true,

based on a random sample (Xi, Y,;), 1 S 2' g n, from the distribution of (X, Y).

Several authors have addressed the problem of regression model checking: see,
Hart (1997) and references there in. The recent paper of Koul and Ni (2004) (K—N)
uses the minimum distance ideas of Wolfowitz (1953, 1954, 1957) and Beran (1977,
1978) to propose tests of lack-of-ﬁt for the regression model with heteroscedastic
errors. In a ﬁnite sample comparison of these tests with some other existing tests,
they noted that a member of this class preserves the asymptotic level and has very
high power against some alternatives when compared to some other existing lack-of-
ﬁt tests. The distance used in their paper is the integrated square deviation between
a nonparametric estimator of the regression function and the parametric model being
ﬁtted with respect to a general integrating measure.

To be speciﬁc, K—N considered the following tests of H0 where the design is random
and observable, and the errors are heteroscedastic. Let (I) be a sigma ﬁnite measure
on Rd, G denote the d.f of the design variable X, and Gn be the empirical d.f. based
on Xi, 1 g i S n. For any density kernel K, let Kh(:r) 2: K(a:/h)/hd, h > 0, .r 6 Rd.

Deﬁne, as in K-N,

 

the) = $2: Km — x»,
i=1
i=1
1 n 2 d<I>(:c)
77L 6 3: — K 27 — Xi Y2- — m X2- ,
< > [I In; M >( 9( >>] 9122(1))

and tin := argminﬂeeTnW), where K, K * are kernel density functions, possibly

different, h = hn and w = 2117; are the window widths, depending on the sam-

ple size n. K—N gave some sufﬁcient conditions on the underlying entities for con-
sistency and asymptotic normality of tin under HO, and asymptotic normality of
Dn := nhg/2(Tn(1§n) — ISM/6132 under H0 , where

Rn I: '—— dq) 2.1
"2 .221/1 gaze) ( ) I )

372' = Yi'mén (X,),

.. _ Kh($-X,)Kh($-X')E,§' 2
9n := 712/143; (/ 9h“) 3 J d<I>(:c)) .

 

 

A practical problem that arises in using these statistics is the choice of the inte-
grating measure <I>. Although one may choose (I) using some optimality criteria, such
a (I) will invariably depend on the model being ﬁtted and the design distribution. One
way to simplify the choice of (I) is to use the empirical d.f. of design in the above
entities.

We are thus motivated to propose empirical minimum distance tests of lack-of-ﬁt

in the classical regression model. Accordingly, let 1']- : 1(Xj E I) and deﬁne

K},(Xj— —X ,)(Y,-—— m9(X,- ))

n 2
Mn“) 72-12:" I: 9111(le Ij ,

 

and (in := argminoeean).

We also need the following entities:

 

. K x—-—X - ,)Y,'
mn(:c) := — n: n*h(92 *llx , a: 6 Rd,
w
M519) :2 n—l ”1(mn(x —m9(x- n21 -, gang,
J

6,"; := argminjgleeMnW).

10

In this thesis we prove the consistency of 6;; and 877,. We also prove asymptotic
normality of Jaw”, — 60), and nhd/ 2(1I71n(00) — Cn) under H0, where Cn is given
below at (2.2). Then, similar to K-N, sequences of estimators C’n and I‘n are provided
such that Cn is nhd/z- consistent for Cu and En is consistent for I‘, and under
some sufﬁcient conditions on the underlying entities, asymptotic null distribution of
nhd/ 2F; 1/ 2(anﬁn) — C77,), is shown to be standard normal. These results are

similar in nature to Theorems 4.1 and 5.1 of K—N. Here,

Tl. . __ .
“ ._ —3 h J z' ,

 

 

ll
.5
I
3

 

~ . 2

,, K X——X'K X—X’e-e-
n, .= win-4: X h” 2) h“ J) 2 J: ,
. . 9 (X1)
2%] l h
where 02(3) := E[(Y — m60 (112))2IX = 1r] , x 6 Rd, and g is Lebesgue density of G.

This chapter is organized as follows. Section 2.2 states the needed assumptions.
In the beginning of section 2.3, we summarize some of the results of K-N and Koul
and Song (2006) (K-S) for the sake of completeness. Section 2.3 contains the proofs of
consistency of 6;; and 0n, while section 2.4 and 2.5 contains the proofs of asymptotic

normality of ﬁn and that of the proposed empirical minimum distance test statistic,

respectively.

11

2.2 Assumptions

Here we shall state the assumptions that are in K-N for reference where theorems
and lemmas are proved. Throughout the thesis 00 denotes the true parameter value
under H0 assumed to be in the interior of 8. About the errors, the underlying design

and the 0— ﬁnite measure (P on Rd we assume the following:

(e1) The random variables {(X,,Y,-) : X,- E Rd,Y, E R1; = 1,2,--- ,n} are i.i.d.,
EIYI < 00, and the conditional expectation 11(17) := E(Y|X = :r) satisfying
f p2(:1:)d<I>(x) < 00.

(e2) E(Y — ,u(X))2 < co, and the function 0%) = E[(Y — ,u(X))2|X = I] is as.
((1)) continuous on I and 030 (.13) is continuous on Rd.

(e3) EIY — n(X)|2+6 < 00, for some 6 > 0.

(e4) EIY — [.l.(X)|4 < 00 and 760(zr) := E[(Y -— m90(X))4|X = x] is continuous on
Rd.

(g1) The d.f. G of the design variable X has a uniformly continuous Lebesgue density
9 that is bounded from below on I.

(g2) The density 9 of the d.f. G is twice continuously differentiable.

(p) (I) has a continuous Lebesgue density qS.
About the kernel functions K and K *, we shall assume the following:

(k) The kernels K, K * are positive symmetric density functions on [—-1,1]d with
ﬁnite variances. In addition, K is a bounded kernel and K * satisﬁes a Lipschitz

condition.

12

About the parametric family of functions to be ﬁtted we need to assume the

following:

(m1) For each 6, m9(a:) is a.e. continuous in 51:, w.r.t. the integrating measure <I> on
Rd.

(m2) The function ing (1:) is identiﬁable w.r.t. 6. i.e., if mgl (11:) = "1.92 (x) for almost

all :r(<I>), then 61 = 62.

(m3) For some positive continuous function E on Rd with E€(X) < co and some

ﬂ>0,

|m92(:r) — m61(:r)| g "02 -— olnﬁea), val, 92 6 6,3: 6 I.

m4 For every 3;, m .1: is differentiable in 6 in a neighborhood of 6 with the vector
6 0

of derivatives #1903), which is continuous on Rd such that for every 0 < k < co,

m X- —m X- — 6—6 ’r'n X-
I 6( 1,) 00( 7,) ( O) 60( z)I 20131),
"9-90"

 

sup
where C := {1 g 1' S n, Vnhg|l6 - 60” _<_ k}.
(m5) For every 0 < k < oo,
—d/2 . .
Slép hn ||m0(X,) — m60(X,)|| 2 019(1), Vn > NE.
. 2 .___ . . I . . . .
(m6) f ||m90|| d‘I) < co, and 20 .— fmgomgodq) 1S posmve deﬁnite.
About the bandwidth hn we shall make the following assumptions:

(hl) fin—103511400.

13

(h2) nhgd —> 00 as n——> oo.

(h3) hn ~ 11—“, where a < min(1/2d,4/(d(d + 4))).

Let 9,, and 9.2:) denote the kernel density estimators of g with bandwidth h and
112, respectively. From Mack and Silverman (1982), we obtain that under (g1), (k),

(hl) and (112),

:23 léhtv) - g($)| = 019(1), £1611; lgiiitv) - 9(m)| = 012(1), (2-3)
9(56) _ =0
:22 game) 1| 10’-

 

These conclusions are often used in the proofs below.

In the sequel, 5 := Y — m90(X). The integrals with respect to <1) and G measures
are understood to be over the compact set I. The inequality ((1 + b)2 S 2(a2 + b2),
for any real numbers a, b, is often used without mention in the proofs below. The
convergence in distribution is denoted by —+ d and Np(a, B) denotes the p—dimensional

normal distribution with mean vector or and covariance matrix B, p 2 1.

2.3 Consistency of 6,”; and 6,,

This section proves the consistency of 6;; and 6”. To state and prove these results
we need some more notation. For a a-ﬁnite measure Or on d—dimensional Borel space
(Rd,Bd), let L2 ((1) denote a class of square integrable real valued functions on Rd

with respect to a. Deﬁne
pm. V2; e) == [Io/1m — WWW),
pow/2) := /I(V1(x)-V2(x))2dG(w),

l4

Pn(V1iV2) :-—- /I<u1<x>-u2(x)>2dcn(x)

Tl.
= n—12(ui(X,)—u2<xj>)21j. Viz/2613(0).
i=1

and the maps

T(I/, Q) := argmingee p(1/, me; Q), T(V) :2 argmingee p(u,m6),

Tn(1/) := argmingee pn(u,m6), u E L2(G), n 2 1.
The following lemma has its roots in Beran (1977) and is proved in Ni (2002).

Lemma 2.3.1 Let m satisfy the conditions (m1), (m2), and (m3). Then the follow-
ing hold.

(a) VV E L2(Q), T(V; Q) always exists.

(b) If T(u; Q) is unique, then T (V; Q) is continuous at 1/ in the following sense:
For any sequence of {un}, V E L2(Q), p(z/n, V; Q) -—> 0 implies T(l/n; Q) —-> T(u; Q).

(c) V 6 E O, T(m6) = 6, uniquely.

We need an analog of this lemma for the random distance pn and the correspond-

ing Tn given as follows.

Lemma 2.3.2 Let m satisfy the conditions (m1), (m2), and (m3) with Q replaced
by G. Then the following hold.

(a) VV E L2(G), T(z/) always exists, and Tn(1/) exists Vn 2 1, mp].

(b) If T(u) is unique, then the following holds. For any sequence of {Un},V E

L2(G).

pn(un, V) ——>p 0, implies Tn(un) -—->p T(l/), as n ——+ 00.

15

(c) V 6 E 9, T(m6(-)) = 6, uniquely, and Tn(m6(o)) = 6 uniquely, for all n 2 1,
w.p.1.
Proof. The following proof is a suitable modiﬁcation of the proof that appears in Ni.
Proof of Part (a). The existence of T(V) follows from (a) of Lemma 2.3.1. We shall
prove that the family of random functions 6 H pn(u,m0), n 2 1, is almost surely
equi-continuous. Then the claim (a) pertaining to Tn follows from the compactness

of 8. By the Cauchy-Schwarz inequality, for any 19, 6 E 8,

1 2 1 2
Ipn(Vi 771,9) _' pn(Vim6)I S Pn(m19im0) + 2107/ (Vim0)pn/ (mﬂimg)

But, by (m3),

pn(m,,,m,) = [Ilma($)-ma(x))2d0n(x)sll19-9ll2n_lz€2(xi)1i-

i=1

Since 6 is continuous on Rd and I is compact then I? is bounded on I and hence
sup pn(m,9,m6) S C||19 — 6H2, w.p.1.
n21
Similarly, under (m3),
p(m,9,m9) 3 one — 9H2 van 6 9.

Because of the SLLN’s and because my, V E L2(G), pn(i/,m,9) —+ p(u,m,9), as.
for each 19 E 8. Also in view of the above bounds, both functions 19 I—) pn(-,m,9)
and 19 H p(-, my) are Lipschitz(2) uniformly in n and with probability 1. These facts

together with the compactness of 9 imply
sup |pn(u,m6) — p(u, m9)| —+ 0, as, as n —+ 00, (2.4)
Slﬁp‘pn(u’ "‘19) —— pn(u,m9)| -—> 0, as, as “19 — 6|] —> 0,

16

thereby completing the proof of equi-continuity of the map 6 H pn(u,m6), and of

part (a).

Proof of part (b). Let {Vn}, V in L2(G) be such that
pawn, u) = ope). (2.5)
Let 6 = T(l/), 6n = Tn(un). For an e > 0, let
Ame == {PnO/niV) S 6’ IMO/me) - p(V.m.9)| S 6}-

By (2.4) and (2.5), there is an N6 such that

F(Am) 2 1 — c, vn > NE. (2.6)

Now, by the deﬁnition of Tn,
pn(1/n,m,9n)§ pn(un,m0), Vn Z 1,w.p.1.

By subtracting and adding 11 inside of the square of the integrand, expanding the

quadratic and using the Cauchy—Schwarz inequality on the cross product term,
1 2 1 2
men, m9) s me, me) + man, u) + 2M (un. m/ (me).
On A7116, we thus obtain

man, man) 3 pa, mg) + e + 2.1/ 2(. + pox, m9>)1/2. (2.7)

On the other hand, again by the deﬁnition of T, Tn, 6, and 1%, pn(u,m6) S

pn(u,m,9n), for all n 2 1, as. This, together with an argument like the above,

17

implies

Pn(Vnim19n) _ PRU/img)
2 Pn(Vn,mgn) — pn(V.m.9.,,)

1 2 1 2
2 pn<un.u>—2pn/ (um/)4 (V.m.9,,)Vn.>_1, w.p.1.

But,
pn(1/,m,9n) S 6pn(un,u)+4pn(u,m6).

Hence, on A7115,

Pn(Vn,m,9n)
2 pn(u,m9) + [mo/"1”) “ 2671/2971, V){6Pn(Vni V) + 41071.01, m6)}1/2
Z IMP/imp) — e — 261/2(26 + p(1/,m9))1/2

> p(u,m6) — 26 — 261/2(2€ + p(V,m9))1/2.

Thus, in view of (2.6), (2.7), and the arbitrariness of e, we obtain

pn(Vn.m.9,,) = puma + ope).

From these facts it follows that 197,, —+p 6. For, suppose 19n 4+ 6, in probability.

Then, by the compactness of 6, there are subsequences 197% of {1977,} such that 197;, k -+

19 7Q 6, and by (2.8), pnk(1/nk,m,9nk) —-+ p(u, my), in probability. Hence, p(u, m,9)

p(1/,m9), implying, in view of the uniqueness of T(I/), a contradiction, unless 19 = 6.

Proof of part (c). The claim here follows from the identiﬁability condition (m2)

withQ=G.

18

As in K-N, for any average L. = n"1 E? _X1(7( j)/g,"2,(XJ-)), the replacement of
gw by g is reﬂected by the notation L := n‘12?=1(7(Xj)/9(Xj)). We also need

to deﬁne for x E Rd, 6 E R9,
maze) == - :2: Kh( z — "1909) (2.9)
,[in(x, 6) := — n: Kh(x — X,-)m6l(X,),

(171(1), 6) I: Kh(£lI — X2)” — [171(2), 6)

£1101: — X’l)[Y'i — m0(X'i)l7 (171(3) 3: Un(.’L‘,00)

i};
5.
M0?) 2: Emma): EXha-Xmgou).
Zn(x,6) :2 un(x, 6) — an(x, 60)
= -ZKh<.x—X ilma< Xa— m90(X)l
Un($)llh($) 1n Un(X] ____)_Xj]IJ
Sn 2: -—-———— -:_ ,
/ 92(56) (mm) %n;[g,,2( (Xj)

n 2
Cn2(6) := ij: I[ﬂ—W—ngg) (jX) —m9(Xj)] Ij,
21

2 .2 [0,0 0<x)m90(x>m90(x)’ «We

9 (x) dx.

 

Many of these entities are the empirical analogues of the entities deﬁned at (3.1) in
K-N.

Now, we will summarize the results of K-N and K—S for the sake of completeness.
The next two results state the consistency of 19;; and 6n.

Result 2-3-1. Suppose H0, (e1), (e2), (k), (g1), (h1), (h2), and (m1)-(m3) hold.

19

Then, 19,”; —+ 60, in probability.

Result 2-3—2. Suppose H0, (e1), (e2), (k), (g1), (hl), (h2), and (m1)-(m3) hold.
Then, in ——» 60, in probability.

The following result gives asymptotic normality of 197,.

Result 2-3-3. Suppose H0, (e1), (e2), (e3), (g1), (g2), (k), (p) , (h3), and

(m1)—(m5) hold. Then,

Consequently, 711/2097, - 60) —->d Nq(0, 26123261), where 8;; and 2 are as in (2.9)
and 230 is as in (m6).

The following result states asymptotic normality of the minimized distance
min).

Result 2-3—4. Suppose H0, (e1), (e2), (e4), (k), (p), (g1), (g2), (h3), and (m1)-
(m5) hold. Then, nhd/2(Tn(19n)—Rn) T’d N1(0, 1‘). Moreover, IQnFTl —1| = 0p(1),
where ﬁn, Rn, and F are as in (2.1) and (2.2).

The following result from K-S gives consistency of 19:, and 197,, for T(m), where m
is a given regression function, different from the model being ﬁtted.

Result 2-3-5. Suppose (k), (g1), (m3) hold, and m is a given regression function
such that m E {mm 6 E O}, m E L2(Q), and T(m) is unique.

(a) In addition, suppose m is a.e.(Q) continuous. Then 19:, = T(m) + 013(1).

(b) In addition, suppose m is continuous on I. Then 6n = T(m) + 019(1).

Next, we shall prove consistency of empirical minimum distance estimates of the

underlying parameter vectors under H0.

20

Corollary 2.3.] Assume H0, (e1), (e2), (91), (k), (m1)-(m3), (hl), and (h2) hold,

with Q replaced by G. Then 6:, —+ 60, in probability.

Proof. Note that Mﬁ(60) 2: pn(mn, m00)’ 6;“, = Tn(mn), and by the identiﬁability

condition m2 , T m = 6 . It thus sufﬁces to prove
60 0

Pn(mn,m90) = 0p(1)- (2-10)

To prove this, substitute m60 (X,) + e,- for Y,- inside the ith summand of M7”: (60) and
expand the quadratic summand to obtain that mean, mgo) is bounded above by the
sum QICnl + 0112 (60)], where Cnlv 07,2 are as in (2.9). It thus sufﬁces to show that
both of these terms are op(1).

2

Since 6, is conditionally centered, given X,, and by continuity of g and 060’

assured by (e2), (g1), (k) and (h2), we obtain

 

 

 

 

 

 

 

 

 

”—1 n U’n(Xj) . 2
E( 'Jélgwﬂlj (2.11)
_ " K(X X,)e 2
3:12:14 h M) l
j 1
2
=n‘3h—2dK2(0)ZE 9,), 1,2] +n‘3ZE [7‘09 (X176 02 1,]
Xi
K2(0) 000031),” 090 (x— uh)-g(x uh)K2 (u)
:(nhd)2/I 9(37) +nhd/fo 9(55) dxdu
=O(1/nhd).
Hence
{1}”: M12 —- o ((nhd)_1) (212)
i=1 909') J — p ’ '

and, by (2.3), Cnl = 039(1). Next, we shall show
Chemo) = 0p(1)- (2.13)

By taking the summations for i = j and i 7e j, and by using the inequality

(a + b)2 g 2(a2 + b2), for any real numbers a, b,

 

 

Cn2(9) S 210n21(0)+5‘n22(9)l. 668. (2-14)
where
. _ “ “(Xh(0)—Xt(0))mg(x~) 2

__ ,, 3 J . .

0.21(0)— )2 gwj, I, . (215)
. _ n l (X(X-—X.-)m(X.-)—Xt(X-—-X.-)m(X-)) 2
Cn22(0) : n 3: 2: h J 6 g*(X~) J 6 J 2.9

i=1»i¢j w J

 

By the compactness of 8, every open cover of 9 has a ﬁnite subcover {6); 1 g j S k},

say. For any 6 > 0 such that ||6 —— 6j|| g 6, and by (m3),

supoeeE(én21(9))

2K2(0) (h—2d _ w—2d)

S ———:—,——— sup sup
11

193k “9"9jIIS5

(172.9(1)-—mg,.(a:))2 (ninja)?
X +—-————— dx
H he) at») l

 

 

2CK2(0) —2d___ —2d 25 k
g n, (h w )(5 +152 [I 9,5,) dx.

Thus by (g1), (k), and by continuity of m9,V6 E O, we obtain

supeeeE(C‘n21(6)) = 0(nhd)’1. (216)

Hence Cn21(60) :2 op(1) follows from (h2), and (2.3)..

22

To deal with C7722, let, for j 74 i
6},(.’L‘, 0) = E[I(h(Xj -- X,)m9(X,)lXj = :13],
efu(n,6) = E[K,";,(X,- — X,)m9(XJ-)|Xj = .13].

By adding and subtracting eh(X]-,6) and e;,(XJ-,6) in the quadratic term of the

summand of Cn22, one obtains

 

 

 

 

 

én22(9) S 3Cn221(6)+30n222(6)+3Cn223(9)i 668, (217)
where
n ' .- 'X(X-—-X.-)m(X.-)-e (X-,6)) 2
07.2216) = n22 :22 h 2 .(X2, " 2 1)] (2.18)
j=1 _ 911) J
_ " (2.2-[Xi1X-—X.-)mn(X-)—et(X-.e)) 2
an (6) ___ n 3 J .7 * .7 .7 I] ,
222 3;] 910(le J
__ (n—l)2 " €h(Xj 9)-661(Xja9) _2
Cn223(6) _ n3 32::1I 961(le I]

By the fact that the variance is bounded above by the second moment, one obtains

\7’6EO,

 

~ 1
ECn221(9) S 7 X E
n . .
15‘]
Again proceeding as for (2.16), for any 6 > 0 such that “6 — 6le _<_ 5, we obtain

9(X ') J

[Kh(Xj - Xi)m9(Xi)1.] 2
J

 

68111,)3 E(Cn221(0)) (219)
e
2 Kh(Xj ~ X,)(m6(X,-) - may-(X0) 2
S sup sup —3 2: E' (X) I]-
133516 IIQ-lelSJ 7‘ 2,4]- 9 J

_ Kh(Xj-Xi)msj(Xi) . 2
-+- sup 32E I]
lgjgkn #3. 909')

 

23

 

_<_ sup sup ;E—Zne 0II22E
1<J<kll9 93' ”<57; iaéj

+ su E[Kh(Xj " Xi)m6j(xi)l_]2
p -3- 22E
1<j<k n £75]-

Kh(X--X->e<X3-> ,2
l M > 1,]

 

909') J

2622 K2(u)e2<y—uh)g(y —uh)
W/ / 3(3) my
K2<u>m§ (y - uh>g(y - uh)

sup // dydu
+nhd1<j<k19(y)

= 0(nhd)-'1 ,

 

by (m3), (g1), (k), and by continuity of m6,\7’6 E 6. Hence, 071221090) =
0p((nhd)"1) follows from (2.3). Similarly, we can obtain that Cn222(60) =
03((nhdr1).

Next, we shall show C'n223(90) 2 017(1). By adding and subtracting Eeh(Xj,0)

and E63209, 6) in the quadratic term of the summand of 033223, one obtains

 

 

 

 

Cn223(6) s 3131(0) +31n2(9) +31n3(0), 6 e 6, (2.20)
where
n 2
1 l€h(Xj,0) - Eeh(Xj,9)
I 0 == — I- , 2.21)
P 2
"’2 nj=1 _ 9271(Xj) J ’
n r at: 2
1 Eeh(X -,9) — Eew(X -,9)
1,3309) = 52 7 *(X) 2 Ij .
j=1 _ 911) J
But V6 E 9,

 

211.7712 "U 2 “-‘U,
Efn1(0) s %E[/K H 9(X gig? h)I(XEI)du].

24

Similar to the argument as in (2.19), by boundedness of 9, (m3), (g1), (k), and by

continuity of m6,V0 E 6, we thus obtain

supgeeEntW) = 0(n)—1. (2.22)

Hence 17,1 (60): 019(1) follows from (2.3). Similarly, one can obtain that 1112(90) =

019(1). Also, by continuity of mg and g, one readily sees that
Ein3(0) = 0(1), for each9 E 6. (2.23)

Hence, In3(60)= 0(1) follows from (2.3). This completes the proof of (2.13) and
hence that of (2.10) and the Corollary 2.3.1. Cl

Remark 2.3.1: The basic ideas of the above proof is the same as that of Corollary
3.1 in K-N. The only difference is in some details, like e.g. in the derivation of the
bounds (2.11) and (2.19). This phenomena is true in many proofs that follow. So we
shall be brief in these proofs whenever possible.

Before stating the next result we give a fact that is often used in the proofs below.
Under (g1), (k), and (h2), independence of Xi’s, and for any continuous function a,
one obtain

n
71—1 2 E
j=1

_ 12. 0X-
n 1:K:(X:—Xi)g((x;;lj

 

(2.24)

 

 

 

 

 

 

 

 

 

2K2(0)" 2 7‘ Kh(X"’Xz')a(Xi) 2
s 2:12;? 13,222+] g3: 21E [g2 2,33.) ,]
J: 2 3
_ 2K2(0) a(X)I(XEI) 2 2 Kh(X'-Xi)a(xi) 2
— (nhd)2 [ 9(X) ] +n_3§JE[ 3909') j]

 

+5? 2 E Kh(Xj -X-)Kh(2)(()j(j—)Xl)a(X,-,1_)a(Xl)1j]
#221

25

 

21:10) 02(rr)d K2(u )az(:r — uh)g(:c— uh)
(Ti/id)? 2i1'"le 9(rc +nhd 2/ /I( 9(23) dx d“

+2/[/(K(u:a (x—uh)g(:r—uh)d 122]—g—(1—x-)dx

= 0(1) + 0(1) + 0(1) = 0(1).
We now proceed to state and prove

Theorem 2.3.1 Suppose H0, (61), (62), (91), (k), (m1)-(m3), (hl), and (h?) hold

with (I) replaced by G. Then,
bin -—-> 90, in probability.
Proof. Arguing as in K-N, we shall again use part (b) of Lemma 2.3.2 with V(:r) E

m90(;r), z/n(;r) E mén(:c). Then by (m2), (in = Tn(un), 60 = T(V), uniquely. It thus

sufﬁces to show that
supganm — MW): = 0pm. (2.25)

For, (2.25) implies that

Mmén) = Mn(én) + 010(1), M3092“): Mn(9n )+ 010(1)

M,*,(én) — M;;(6;‘,) = Mn(én) — Mn(6;';) + 033(1). (2.26)
By the deﬁnitions of én and 6;, for every n, the left hand side of (2.26) is non negative,
while the ﬁrst term on the right hand side is non-positive. Hence,

Mﬂén) Mn(9n )= 012(1)

This together with the fact that Mﬂaﬁ) .<_. M;(00) and (2.10) then proves the
required result.

26

Arguing as in the proof of Theorem 3.1 of K-N, it thus sufﬁces to Show that

sngnZUJ) = 019(1), sgpllln(6)=0p(1). (2.27)

Using the same argument as in (2.14) - (2.23), one obtains, (77,209) = 017(1), for

each 0 E G. This and (2.3) in turn imply that

V6 6 e. (2.28)

 

C 2(9) < sup 92(27) 5 n2(9)=0 (1)
" ‘ 3619* 2m" p ’

By the Cauchy-Schwarz inequality, for any 61, 02 E 9

ICn2(92) — 03,2be _<_ 2091 + E2) + 403320122 + E211/2,

where, by (m3),
" K1109 — Xi)(m62(Xi) — m61(Xi))I

n 2
._ -3 .
E1 '_ n 2 [2 9221(Xj) 3]

i=1
S ”92"61ll2’68uP161'29( H )"[ 3 j: 1(2 h(J 9X( le) ( )Ij) l

9322 (9:)

 

E2 :— 22—1:[lm62(Xj)—m61(xj)lzj2] <n62 61II22n "IZWX X->I-12
j=1 j: —1

Hence (2.24) applied with O E (f, (2.3) together with the compactness of G and

(2.28) completes the proof of the ﬁrst part of (2.27).

To prove the second part of (2.27), note that by adding and subtracting m00(X,-)

to the ithsummand in Mn(6), we obtain

2 2
< gl —::—2 ——-—2J

 

 

27

But, by a similar argument as for (2.19), and by (2.24) applied with a E Z ,

Zn 391- 2
sup9[n '1j21[ (XJ J)J]] (2.29)

9( X3)

 

2
" Kh(Xj - Xi)€(Xi)Ij

n
5 ll9 — 60n22n‘3 Z [Z .
j=1 i=1 9(X])

 

This together with (2.12) complete the proof of the second part of (2.27), and hence

that of Theorem 2.3.1. CI

2.4 Asymptotic distribution of én

In this section we shall prove the asymptotic normality of n1/2((§n — 60). Let

[23(2) := Etna, 60) = Eth: — X)m30(X), (2.30)

.. .:_1jzl Un(Xj)(/I;Y]:()Xj)1j .

We shall prove the following

Theorem 2.4.1 Assume that (61), (e2), (e3), (91), (92), (k), (m1)-(m5), and (h3)

hold, and (I) is replaced by G. Then under H0,
n1/2(9n _ (90) = 251,3”an + 033(1), (2.31)

Consequently, n1/2(én — 00) _’d Nq(0, 2612261), where 20 and E are as in (m6)

and (2.9), respectively.

Proof. The proof consists of several steps. The ﬁrst is to show that

nhdué‘n—90u2 = 03(1). (2.32)

28

Let

 

._ ”—1 n Zn(X"9)Ij 2

We claim
nthn(én) = 033(1). (2.33)

To see this observe that

 

 

 

” n U (X-)I- 2‘
d d —1 n J J
- j=1[ 9212(XJ) ] J
n '1
dl -—1 U"(XJ)IJ 2
S nh n 2[ J
_ 3:1 9(le
’2 U X- I; 2 2
+nhd[n_1 Z [M] J sup 9 2(13) -1|
j=1 9(XJ) x61 92):) (1')

 

by (2.12) and (2.3). But, by deﬁnition, M33023) 3 M72190), implying that
nhdMn(én,) = Op(1). These facts together with the inequality Dn(6) _<_ 2[Mn(60) +
Mn(6n)] proves (2.33).

To complete the proof of (2.32), arguing as in K-N, it sufﬁces to show for any

0 < a < 1, there exists an Na such that

D -
P(—.—nﬁﬂ—2 _>_ a+ inf bTZOb) > 1 — a, Vn > Na, (2-34)
"on — 90H Hbll=1

where 20 as in (m6). To prove (2.34), let
an := (in — 00, (2.35)

I . .
dni 3: "297109) — "2900(2) — unm00(Xz'), 1 _<_. z _<_ n,

29

 

 

 

1 n - l n d ' 2
on, = n- Z - 219,09- -X.-> A 2,- /g(Xj) ,
. n . Nun”
3:1 - 2221
n 'u ’p X-,0 1-2
0722 I: "-1: n n( ]X(.)) J
A .. 1 2 1 2
Then, we have Dn(6n)/||6n — 00||2 _>_ Dnl + Dn2 — 2Dn/l D122 .

By assumption (m4), consistency of én, and by using (2.24) with a = 1, one

veriﬁes that Dnl = 019(1). For the term D712, note that

||b||=1

where

n b" X-,6 I- 2
271(0) := 72-1: M“ J 0) J , beRQ_
j=1 91(le

 

Now, we will prove that for each I) 6 Kg, ”1)“ = 1, 277(1)) ——> b'EOb, in probability.

For this it sufﬁces to show that
E[En(b) — #201212 = 0(1), Vb e 13‘]. (2.37)

Rewrite

 

I .
b Kh(Xj - Xi)m90(Xi)Ij>2

1 n n
272(1)) = -—;,ZZ( M.)

 

n .
J=1 2:1 .7
I . - I
+_1_ i Z (J Kh(Xj - Xi)m90(Xi)Kh(Xj — Xk)m90(xk) bIj
3 2 .

where

 

b’KMXJ- — Xamaom)

2
2122(1)) : n13ZZ[ 9(Xj) Ij] I

 

 

 

z jyéz
K 0 b’Kh,(X- 43mg (xamg (X->’b
27‘3“» = n3);clz:§[ J 9208].) O J 11')
z z ]

 

1 (”(1109 - Xilm60(xi)Kh(Xj — Xklm90(Xk)' b
21240)) : "7:; . 92(X') Ij -
276],k .7

The left hand side of (2.37) is less than or equal to
6[E2%1(b ) + 1.3222(1)) + 13233, 3b( )1 )+ 2E[2,,4(b) — b’ 20312
Thus to prove (2.37), it is enough to show for each b 6 Kg,
1323,13) = 0(1), 3232(3) = 0(1), (2.33)

0(1), E[Zn4(b) — b’EOb]2 = 0(1).

323,3(3)

Now, we shall prove the ﬁrst part of (2.38). By the Cauchy-Schwarz inequality,

4 llm mu?
232(1)) )_K(O)]le92( 60 1]].

 

 

n1 5h4d Xj)

Therefore, by (g1), (k) and (h2), one obtains

4 IIm (on2
g 5:492 / _29__d
n4h4d I 9(3)

= 0(nhd)—1 = 0(1).

(17

Esup 272110))
b

 

 

 

Similarly,
K 0 b’Kh(Xj- lemo (Xi)ThI9(X j)’b 2
E53233“) < 71543532? 9203]) O 11'
z j 2
K2“) Kiﬂr-y)llm90(x)ll2llm90(y)ll2g(y)
< —— dxd
_ n2h4d//I 93(3) 31
Km) K4(u)llm90( z>II2IImgO(z-hu>u29 (cc—1m)
S ”Tth// g3(:z:) dxdu

= 0(n2h3dr1 20(1).

31

by (g1), (k), (h2), and continuity of 75290. This proves the third claim in (2.38).

Now we shall prove the fourth claim in (2.38). Since,

E2724“)

_ E[b'KMX2 - X1)m90(X1)Kh(X2 - X3)m90(X3)’12 b
92(X2)

:_— bI/1_//K(u)K(v)T'n90(x — hu)7i290(x — vh)’

x g($ _ hu)g(x — Uh) du d1) d2: b
9(23)

 

 

—+ b’ZOb.

Thus, to prove E[Zn4(b) —b’20b]2 = 0(1), it is enough to show E23403 = (b’EO (2)2.

Now,
2
23n4(b) : Z: Z zijkzlmn
i¢j¢kl¢m¢n
S 0 Z Zijk(zlmi+zlmj+zljm)+ Z zijkzlmn
i¢j¢k¢m751 i7éj7ék7ém¢l7én
= 272,410?) + Z3n42(b) + 371430)) + 212.44%), say,
where

. - I
_ ._ b’ _3 Kh(Xj—Xi)m90(Xz')Kh(Xj-Xk)m90(Xk) I- b
Ziﬂt" ” 2X. J
9( J)

By independence of Xi’s, (k), (m1), (g1), and (h2), one obtain for each b 6 KW,

 

Ilbll = 1,
E27241“)

g 5:73 [f [ [1' K(u)K(v)b’r'n90(x—uh)1'n90(a:—vh)’b

X (gee — uh);((:)— vim”? dz] 2,, d,

= 0(nh2d)_1 = 0(1).

32

Similarly, one can obtain that E2n42(b) = 0(1), for each (2 E W such that “b” = 1.
Again by independence of Xi’s, (k), (m1), and (g1), one obtain for each b E Rq,

Ilbll =1,

E3n43(b)
sn—f [ f /( K(u K(v)llmgo ( —uh)llllv‘n90(x — vh)”
we: — hu)g(x — hv)du (11))29-3 (3)33
= 0(n—1)= 0(1).
Also by independence of Xi’s, E2n44(b) = (EZn4(b))2 ——> (b’ZOb)2, for each b 6 EU,
“b“ = 1. This also completes the proof of (2.37).

Also note that for any A > O, and any two unit vectors ()1, b2 6 Rq, ”bl“ = 1 =

H52”, “()2 - b1“ S A, we have

 

l2n1(b2)— Z37"L1(b1)l
,1 ” Kh(Xj -Xz')"n00(Xz') )2
j

n—1 b—b) —
,.>::,[<2 .,2 IX.)

i=1
_1 n 1 " Kh(Xj_Xi)m60(Xi)
n J;[(b2—b1)lgz 909) 1']-
X[_l_>_’1_ " Kh(:j‘X:)7h00(Xi)I]
J

n i=1 9(X ')

 

 

 

+2

 

 

 

n Kh(Xj ’Xi)m60(X i) 2

909') Ij

S (/\+2n

 

           

 

 

But similar to the argument as for Zn,

1 n _1 n . 2
E[gzjun :Kh,<xj—x.->mg,<x >/g<x Joy-III]
3:1 2:1
—+ / ||m90($)||2dG(:r), in probability,

33

Hence, the second factor is 019(1). From these observations and the compactness of
{b E R4 : “b“ = 1}, one obtains supllbll=1||2n(b) — (1,201)” = 013(1).

This fact, together with (2.36), implies (2.34), and also concludes the proof of
(2.32).

Now, we shall prove asymptotic normality of n1/2(én — 60). Recall the deﬁnitions

of (2.9) and (2.35), and let

__ Un(XJ 6)/‘n(x '16)
an 2711: ll 9111209) 3 Ijl.

 

Since 00 is an interior point of 9, by consistency, for sufﬁciently large n, (in will be
in the interior of G, and Mn(én) = 0, with arbitrarily large probability. But the

equation Mn(én) = O is equivalent to
1 EU

In the ﬁnal step of the proof we shall show that \/1_1 X the left hand side of this

 

 

Xj)#n(XJ #9701 13'ng —1:[ZTI’(X.’6nMn(X.16n)Ij . (2.39)

gm 2( Xj) 9732(Xj)

 

equation converges in distribution to a normal random variable, while the right hand
side of this equation equals Qn(én — 60), for all n 2 1, with Qn = 230 + 019(1). To
establish the ﬁrst of these two claims, rewrite this random variable as the sum of

Sn + Snl + 9n] + 9,0 + 9113 + 97,4, where 3n is as in (2.30) and

 

Sm == n‘IZIUnOf w X))(gw<X )—g—2(Xj>>zj1,
j: —1
.= "_1 " Un(Xj)[/~."n(Xj>90)_ﬂh(Xj)lI.]
9n]. ' j;1[ 92(Xj) J 1
W == n*lz[ Un(Xj>Ipn(Xj.eo>~12h<XjI<932(XJ->—g"2(Xj))I:rj].
j=1

34

Un( (Xj)[l1n(Xj an) " ﬂn(Xj» 90)]

—1 .
{:1 92(Xj) 1’ ’

9724 := n‘lZIUn(X,->tzn(X,-,én)-pn(Xj.6o)I(g.;2(X,->~g“2(X,-)>I)I.

 

971.3

Note that these gnj’s are the empirical analogs of the similar entities in K—N. Anal-

ogous to the proof there, we need the following lemmas.

Lemma 2.4.1 Suppose (61), (e2), (91), (k), (m1)-(m5), (h1), (h?) and Hg with (I)
replaced by G hold.
(i) If, additionally, (e3) hold, then \ﬂiSAn —’d Nq(0, 2), where E is as in (2.9).

(ii) If, additionally, (g2) and (h3) holds, then VA 6 R9
x/FLIIX’Snlll = 0pm). (2.40)

Lemma 2.4.2 Suppose (e1), (e2), (g1), (k), (m1), (m2), (m4), (m5), (M), and (h?)

with (D replaced by G hold. Then, under ’HO, VA 6 Rq
n1/2llAignkH = 0pm), k = 1,234. (2.41)

The proof of (2.40) is facilitated by the following lemma, which along with its

proof appears as Theorem 2.2 part (2) in Bosq (1998).

Lemma 2.4.3 Let 9:0(13), :1: 6 Rd, d 2 1, be the kernel estimator associated
with a kernel K * which satisﬁes a Lipschitz condition. If (92) holds and wn =

an(log n/n)1/(d+4), where an ——> a0 > 0, then for any positive integer 1:,

(log;. n)-1(../ logo” <d+4> 52p l9w*(:v) — g(sv)| —> o

35

Proof of Lemma 2.4.1. Let Sn denote 8n of (2.9) with (I) 2: Gn. To prove the ﬁrst

part of Lemma 2.4.1, by Slutsky’s Theorem, it sufﬁces to show that
[”571 ”*d Nq(0a 2); ﬁllén - 5n“ = Op“)- (242)

First part of (2.42) follows from Lemma 4.1 of K-N. To prove the second part of

(2.42), it sufﬁces to show that, VA 6 1R",

lam/Von — sun? = 0(1). (2.43)

—3/2

Let 1),,(23) = A'ﬂh(:r) and aij := n 8,; Cija where

 

dG'(:r) .

 

6.. 2 Kh(XJ' " Xilfhwjll. _/ Khlx — Xithl-T)
2] 122(le J I 92”)

Now, the left hand side of equation (2.43) can be rewritten as the following sum:
6: E(a?Z-) + Z E(a12j) + 4 Z E(“ij“ii) + Z E(aijaiml- (2.44)
.- #i #2‘ man"
To prove (2.43), it sufﬁces to show that each of the four terms of (2.44) are 0(1). By

continuity of 77290, (k), and (g1), VA E Rq and V2: 6 I, one obtains

supIIz>h(z)II s EKh(x—X)IIX’m9 (X>II (2.45)
n O

= /K(u)||A’7n60 (a: — hu)||g(x — hu)du

= 0(1).

Now, we shall show that 23-75,- E(a22j) = 0(1). By (2.45), (k), (g1), and continuity

of 030, forj 7é i, and VA 6 IR",

36

K2 x— {/2 :1;
= ”—2 [0303) [I ’2 9(3)) ’2 29(y)d:vdy

2
_<_ Cn —3h—d/1260 02 (x— hu )/K (2:26:34, u>dudx

 

 

= 0(n_3h-d)

Hence, by (h2), 2.7-5.5, E(a2 j_) — 0(nh d)_1=o,(1) VA 6 KW. Similarly, we can
show that Z, E(a2 i) = 0(1), VA 6 HM.

Note that, VA 6 KW, Vi 75 j , E[cz-lez- = y] = 0. Thus, by the independence
of Xi’s, 23-79,- E(a,;ja,-,-|X,-)= 0- — Zm¢i3£j Ea(,- jaileiv) for all n 2 1. This
completes the proof of the second part of (2.42), and hence that of the ﬁrst part of
Lemma 2.4.1. I

To prove (2.40), by the Cauchy-Schwarz inequality, (2.45) , (2.12), by Lemma

2.4.3, and by (h3), we obtain VA 6 Kg

 

 

 

 

 

 

 

 

 

 

 

 

an'5n112
21 n Un(Xj)l'/h(Xj) Jnr 92(1):) 2
< n — I- su ——1
‘ _ngl 9209-) 3 $612 922,2(15)
i_1 n Un(Xj)Ij 2 _1 WWW 2 92(2)
2 22 J; 909') Z 909') Ij i229 *2(a:)-1
.. ,_1 W W- ,_1 2... 92cc) 2
< C Elly 909) 53'— XXX .3} 93(22) 1
d 1 2 4
= n0p((nh )— )0p(1)0p((logkn) (hemmm)
2 .4— Gal—32-
: 0p((logkn) (logn) +4n +4) =op(1).
This completes the proof of Lemma 2.4.1. [:1

37

Proof of Lemma 2.4.2. Let I'ln 2: A’fin, 19h := A’ph, and 1) := A’rn. By the

Cauchy-Schwarz inequality, VA 6 Rq

 

UnX-I- 2
n1/2IIX’9n1ll s (n‘1/22[——g-((—,{—))i] ) (2.46)
n - 2
”_1/2 Ill'/n(X 60)—uh(X,-)II ,
x< ,‘El 909) z, .

By (2.11) and (h2),

En—l/Z Z

The second factor is bounded by 2bn1 + 2bn2, where

IX 0)); X K X yf/ d0 2
||h() 60f -) f h( —y)00(y)ll (3.0%],

__ n5——/2 Z 1[ 909 )

I[Un__;_Xi) ‘22)]: -——-—J2 = 0(n_1/2h—d) = 0(1).

 

 

 

bn2 = "221/2 2%
j=1
IIKh(Xj — X.)1>90(X.-) -— f Xh(X,- - 2290(2))” dam/)1 2
"37:3 g(X,—) 1'

Now, by using (g1), (k), (h2), and continuity of mo, we obtain that the expected

value of bnl is bounded above by

_2___K2(0) / I___Iveo(x)II2d

 

n3/2h2d 9(3)
2 (f K(u)llz>oo(x-hu)llg(x — hu)du)2
J'W/z go) 22 = “(2'

To handle bn2: ﬁrst note that conditional on X j, the inner term of bn2 is (n— 1) / n
times the average of centered i.i.d. r.v.’s. Using the fact that the variance is bounded

above by the second moment, we obtain that the expected value of bn2 is bounded

38

above by

 

K X —X z; X 2
"-1/2E[1 h( 2 g()1()2|;(90( 1)” 2]

X2 (u) Mow — mum — uh) _
=n1—_/12hd// g($) du d1: — 0(1).

 

This completes the proof of (2.41) for k = 1. This together with (2.1) implies (2.41)
for k = 2.

To prove (2.41) for k = 3, similarly by the Cauchy-Schwarz inequality, VA 6 Rq

1/29 n3”2

Un(X ” IIVn(X -,én>—z>n<X-,oo>u 2
( 1Zl__-)_Ijl2"_)( 2i 909-) J Ijl)

lln

 

But the second summation is bounded above by

)2 _321th(X—X,-j)z 2
saw > mm ﬁlm: .( J)

= 0p(hd) X 012(1),

 

by (2.32) and the assumption (m5), and by (2.24) applied with a E 1. This together
with (2.12) proves (2.41) for k = 3. The proof of (2.41) for k = 4 uses (2.41) for k = 3
and (2.3), thereby, completing the proof of Lemma 2.4.2.

Next, we shall show that the right hand side of (2.39) equals Qn(én — 60), where

Qn = 20 + 019(1). Recall the notation at (2.9) and (2.35). Let

 

n l- A
_ . . 9h(X') d -
Vn 1= n 1: Vn(Xj,9n)";§‘J—Ij'”7nl"'],
nlD(ngX9)l°/’(jé)
._ _1 ”X *0217, 1 n .
Ln ._ n El 102)“ j) I], VAEIR‘].
i=1

 

39

So, the right hand side of (2.39) can be written as the sum [Vnugz + Ln]un. But,

 

 

 

Id
"V75“ S max1<z<n 712“” ll n1 1”
n A
_ 9h(X ) .
uvnln := n 1: ——1—uun<x- 6n>ur snvnun+uvn12|t
*2 J
j=1 9w (Xj)
where,
lan11|| == maxlgi<nllmgn (X-)—m00(X,-)||
x[n—3 :(Zz_ —_-j1Kh(X Xilz’)2]
. 9w(X ) J ,
3:1 J
“V H n_1 Z": (71"1 Z?=1Kh(xj -Xi)llm90(xi)”1_) x (EL Xj)l_)
1112 = * . j
j=l 9112(le 9w(X )9

By (2.3), (2.32), the assumption (m5), and by using (2.24) with a E 1, "anlll =
019(1). Also, by the Cauchy-Schwarz inequality, (2.3), and by using (2.24) with a E

andl on the ﬁrst and second term, respectively, one obtains

n —1 n . __ . ‘ . 2 1/2
n X = KMX lellma 09)”
_1 Z 2 1 J 0 .

n —1 n , . 2 1/2
n—l ” Zi=1Kh(XJ_X1) ,
X J;( 9321(le I]

=%mx%m=%m.

m9

0

 

 

 

Hence, ”V111“ = 019(1). This together with (m4) then implies that “anl = 019(1), and
by consistency of én, we have llugVnunll 2 019(1).

Next, we will show that “Lnunll = 013(1). For this write Ln := Lnl + L712, where

 

 

L ._ _1 n an(Xj,60)[Dn(Xj,én)—1>n(XJ-,00)]’I.
n1 '— TL 2 *2(X) J ’
j=l _ 9w J
L ._ _1" 'vn<Xj,eo>vn<XJ-ﬂo>’1.
j=1 _ 9w J

 

40

But, by (2.3), (m5), by the Cauchy-Schwarz inequality and (2.24) , VA 6 Rq, “Lnl II 2

019(1), while

 

 

 

 

 

 

 

_ n-1Xj:60)9h(Xja00)’Ij
"2 ,2?!“ 912%,)
{171,0 -—l'/ 330 2
”.42: ||( (X3 0) :12)“ 2;ng 0))” Ij]
n ||(Vn(XJ :ol-Vh( Xj,90))||||Vh(Xj,90)|| .
nlj; [ 9132(Xj) 1]]

But, by using same argument as in the second factor of the right hand side of (2.46),
and by (2.3), this upper bound is op(1). Moreover, similar to the argument used for
Zn in (2.37) and using (2.3), one obtains VA 6 KW,

lth(X j902) Vh(X 90)'||
—1
; 9w 2(Xj)

 

This proves Qn = 20 + op( 1), thereby also completing the proof of Theorem 2.4.1.

2.5 Asymptotic normality of Mn(én)

This section contains a proof of the asymptotic normality of the minimized distance
Mn(én). The replacement of 91"}, by g in Mn and Tn is reflected by notation Mn and

Tn. The main result proved in this section is the following

Theorem 2.5.1 Suppose (CI), (62), (e4), (91), (92), (k), {m1)-(m5) and (h3) with<1>
replaced by G hold. Then under H0, nhd/2(Mn(dn) - C'n) —+d N1(0, F). Moreover,

If‘nP—l — 1| = 013(1).

41

Consequently, the test that rejects H0 whenever nhd/Zf‘;1/2|Mn(én) —- C'nl > Za/2
is of the asymptotic size a, where 20 is the 100(1 — a)% percentile of the standard
normal distribution.

Analogous to the proof in K—N, the proof of this theorem is facilitated by the

following ﬁve lemmas.

Lemma 2.5.1 Suppose (e1), (e2), (e4), (91), (k), (hl), and (h2) with <1) replaced by

G hold. Then, under ’HO,
nhd/2(A7In(60) — a.) 2d N1(0, r).

Lemma 2.5.2 Suppose (61), (e2), (91), (k), (m3)-(m5), (M), and (h2) with (I) re—

placed by G hold. Then under H0,
nhd/2an(én) — ano): = 0100).

Lemma 2.5.3 Suppose (e1), (e2), (91), (92), (k), (m3)-(m5), and (h?) hold with (I)

replaced by G. Then under H0,
nhd/2|Mn(90) -— anon = 0pm).
Lemma 2.5.4 Under the same conditions as in Lemma 2. 5. 3,
nhd/ZIC‘n —- on) = 0,,(1).

Lemma 2.5.5 Under the same conditions as in Lemma 2.5.2, f‘n — F = 019(1),

Consequently, the positive deﬁniteness of F implies [fur-1 — 1| = 0p(1).

The proof of Lemma 2.5.1 is facilitated by Theorem 1 of Hall (1984) which is
reproduced here for the sake of completeness.

42

Theorem 2.5.2 Let fit-,1 S i S n, be i.i.d. random vectors, and let

Un := Z Hn()?2in)i GTl($ay) I: EHn(X1,$)Hn(X1a3/),
l_<_i<j_<_n

where Hn is a sequence of measurable functions symmetric under permutation with

E[Hn(X1,X2)|X1] = 0, EH2,(X1,XQ) < 00 v n 2 1.

If, additionally,

 

EG%(X1: X2) + n-1 EH14L(X1,X2) _,
[EH%(X1,X2)12

O, asn—+oo,

then Un is asymptotically normally distributed with the mean 0 and the variance

”2 EH7%(X11)22)/2'

Proof of Lemma 2.5.1. Let

 

~ 1 n 2d<I>(:c)
Tn 6 := — K a: — Xi Y2- — m Xi —,
( ) (1b,; h< >( 9( ))] 92a)
- ._ _1_ " Kg(x—X,)é,2
Rn ._ ”2221/1 9201:) d<I>(:r).

To prove Lemma 2.5.1, by Slutsky’s Theorem, it sufﬁces to show that
nhd/ 2(21(60) — fin) —+d N1(0, r), (2.47)
nltd/2|1V4n(6’o)- rm): = ops),
mid/2mm — on) = 0,,(1).
The ﬁrst claim in (2.47) is proved in Lemma 5.1 of K-N.

For the second claim, it suffices to show that E [nhd/ 2|Mn(00) — Tn (60)”2 = 0(1).

Let

 

dG(a:) ,

 

f.. ._ K1109 - XilKh(Xj - X191 _/ Kh(a: — X,)K,,(x — Xk)
iJk '— 9209.) .7 I 92(x)

43

and eijk = ﬂgﬂfijh Hence,
[nhd/2(Mn(90) Tn(90))l = 2:26 eijk-
i
Expanding the quadratic and using the fact E [ f,- j kIX'l’ X k] = O, \7’ j at i, k, we obtain
E[nhd/2(Mn<oo> — Tn(90))]2 (2.48)

2 2
-<- 04282 eiii + Zleiiie ejjj +ez‘jie J'iJ +ei 2J2 ie+eijj iii “we Jii +6 m
i J7“

2
+eljjejji + eiijejji + eiij + emjejn- + eijiejjj + ej’ljeiiil] .
To prove the required claim, it sufﬁces to show that all terms on the right hand side
of inequality (2.48) are 0(1).
By (e4), (g1), (k), and (h2), one obtains

E [ Z 62221]

i

 

 

 

 

_ hd K%(0)11 Kg (:1: — X1) 2
_ 3E 790(x1)[ 920(1) —/1 9(3) dx] ]
2K4(O) T__60 (13)“ K4(u)g( (a: — hu)r90 (:c — hu)
3 (nhd)3 /;r— g3(x)d +n3h2d/z / 92(1') dim

= 0(nhd)-1= 0(1).

Similarly, using the independence of Xi’s, by (e4), (g1), (k), and (h2), one shows
El2j#ileiiiejjjl = 0(1).
Next,

hd 2 2
E[éezjieﬁj] = ;E[090(X1)090(X2)f121f212l-
J 2

44

By the independence of Xi’s, (g1), and (k) and continuity of 050,

Eiago(X1)0§0(X2)f121f212]

4(u)o2 (:c)o2 (x— hu)
= h3d1/I [K g:()g(wf0hU) dadx

“2353 f [/1 K2(u)K2(v)060(x — hu)ogo(x —— hu — hv)
g(a: — hu —- vh)
g($)9($ - hu)

1: — mag (a: — hu)

K2 (u)g( 2
O _ d —3
+h_2_d[.//I g(:c) dxdu -O(h ) .

Hence, El2j¢i eijiejij] = 0(nh2d)“1 = 0(1). Similarly, one can show that other

 

dxdudv

 

 

terms of (2.48) are 0(1), thereby completing the proof of the second claim in (2.47).

To prove the third claim in (2.47), it sufﬁces to Show that

Let

 

2 ._ . 2 __ .
(in: Kh(XJ X1)I._/ Mdga)
.3 9209-) 3 2: 92(2) ’
hd/22 ~ ~
and bij =—2—Ld,-j Then, [rind/2w” — 07.)] = ZiZJ-bZ-j. Expanding the

quadratic and using the fact that Eldileil = 0, V j # i, we obtain

Elnhd/2(Rn — Grill2

< C1512?“ + Z [bzibmm + bimalmm + bim + bmill]
i mséi

To prove the required claim, it sufﬁces to show that all the terms on the right hand

side of above inequality are 0( 1).

45

Thus by (e4), (g1), (k), and (h2) and continuity of r60, one obtains

4 r (x)
E[Ziﬁl 5 ill??? [I get-fwd

K2(u)g1/2(x — hu)r1/2((x — hu) 2
2 90
+n3h2d / [/1 9(3) d2: d“

= 0(nhd)’1 = 0(1).

 

 

For the fourth term, since

hd
El: bimbmil - 7Elago (X1)090 (X2)d12d21]
maéi
2

thus, by independence of Xi’s, (g1), and (k) and continuity of 000, one obtains

E[0§0(X1)0§0 (X2ld12d21l
K4(u)000 (at)000(1:—hu)

—2;§8—1///1_K2(u)1{2(v)000(x — uh)ogO (2c — vh — uh)
g(:r — vh — hu)

game—hm
a: — hu)060 (x — uh) 2

+h_2—d1[//1'K2(u)g( 9(5'3) dxdu =O(hd)"3

Hence, E l2m¢i bimbmil = 0(nh2al)"1 = 0(1). By similar arguments, one can show

 

 

dxdudv

 

that other terms are also 0(1). Hence we are done with the third part of (2.47) and
also with the proof of Lemma 2.5.1. [:1
Proof of Lemma 2.5.2. Recall the deﬁnitions of Un and Zn from (2.9). Add

and subtract mgo (Xi) to the ith summand inside the squared integrand of Mn(én),

46

to obtain that

Mn(90) — Mn(én)
=n_1 :1[Un(X]*2MZn( 9nllj J]

 

 

9w 2(Xj)
_ . X19“ )1".
1 2 [22? W212?) fl
= 2Q1] - —Q12, say.
It thus sufﬁces to show that
nhd/2Q1 = op(1), nhd/2Q2 = 013(1) (2.49)

Add and subtract (én — 60)’7'n00(X,-) to the ith summand of Zn(Xj,én,), we can

rewrite

 

X
621 = WIZFW ﬁg“ jd") mfg]

9w 2( XjU)n(
_1X.7')HTL(X]60)
+(0n _ 60), n JZU 9* 10% Xj) Ij

= Q11+Q12, say,

 

where dm- are as in (2.35). By using (2.3) and (2.24) with a E 1, one obtains

Xj) 2
lzliﬁX 1].] = 0,,(1) (2.50)

By the Cauchy-Schwarz inequality, (2.3), (2.12) and (2.50), one obtain that (nhd/2

lQlll) is bounded above by

ldni l

W Zuén — 00||(nhd)1/2~Op((nhd)—1/2) -maxim.
n _

47

But, by (m4) and (2.32), this entire bound in turn is 019(1). Hence, to prove the
ﬁrst part of (2.49), it remains to prove that nhd/ZIQmI = 013(1). But Q12 can be

rewritten as

 

 

n U77(X )#n(X] 6in)

(an _6),n—11(g[ J I]

_(én _ 60) ”_1 Z": [UM XXj)lﬂn(J ' ,29n) — Iln(Xj,90)le]
j=1 9u12X(j)

= Q121 - Q122, SEW-

But, by the Cauchy-Schwarz inequality, (2.3), (2.12), (2.50), one obtains (nhd/2

“0122“)? is bounded above by
thduén - 60u20p(nhd>—1maxinménm.) — Th60(X7;)ll-

By assumptions (m5) and (h2), and consistency of an for 90, this entire bound is
0p(1).

Next, note that the average in (2121 is the same as the expression in the left hand
side of (2.39). Thus it is equal to

. " zn(x.,én) pn(x (in)
(an — 0 )ln-1 [ J I]

=(9n —00)n ), n_1 j: 1:7:UO([J jaJo‘n) MAX] j60)l_j:l

(2.51)

 

 

 

920 209')
_ Zn(XJ 9n )l#n(X 671—) MM 90)]
6n _ 60)”, I: ll 9w2(Xj) Ij]

=D1 + D2, say,

48

But, by the Cauchy-Schwarz inequality, (2.3), (2.29) and (2.24) with a E moo, VA 6

1M,

Emir” 2"01”)

_<- nhd/2l|(én——6 0)”le _1 Z 1|:Z'n(XX jion)l, 132% If [Ill/7“)? Xgn)lle:|2

9112(le j=l (XJ')

 

z nhd/2“(én — 60)II2O(1).= 0(1),

by Theorem 2.4.1 and the assumption (m5) and (h2). Hence, nhd/ZIIDIH = op(1).
Similarly, one obtains (nhd/2IID2II) is bounded above by nhd/2ll(én — 60)||2op(1) =
019(1).

This completes the proof of the ﬁrst part of (2.49). The proof of the second part
of (2.49) is similar. [:1

Proof of Lemma 2.5.3. By (2.11) and Lemma 2.4.3.

nhd/gerzWo) — Numb):

 

 

 

 

U2<X 92b)
< nhd/2 —1
" [" £19209]: 452% 93.01:)
= nhd/20p((nhd)—1)-Op((logkn)(logn/n)1/d+4) =op(1). El
Proof of Lemma 2.5.4. Let
ti=m§n(xi)—m00(xi)a Awe) :=92(x>(*“2(1>’2<z)).

Rewrite an as

 

2 ._ .€._.2
K(X] X)(z t2) 1]]

an = ”—3 n h
22;) .2,

911) J)

49

 

 

 

z ”_3 n n K2109- X,)(c, t,)2
,2]; [ 512(le J
n n K2(X- -X-)(c- __ t-)2Aw(X-)
+,,.3 [ h J . . 3,]
i=2]; 9122(Xj) J

Now, it sufﬁces to prove that
nhd/2(A,,1 — on) = 0,90), nhd/2An2 = 010(1). (2.52)

By expanding the quadratic term in the summand, Anl can be written as the sum

Of 67;, A7112, and A7113, where

 

 

n K2(X--X')t2
—3 h. J z
= n z:[ 2..., 21.],
i=1j=1 9 J
n n K2(X- — X-)e-t-
__ , _3 h J Z 22 .
Am ~— 2:2:[ 920., 1.]
i=1j=1 J

By (m4) and (2.32), one obtains that maxlSiSnltz-I2 = 0p((nhd)_1). Moreover,

by (g1), (k), and (h2), one obtains that

“l 3: E 192102; )Ijll

 

 

 

 

 

 

 

i=1j=1
__K_2__(0)" K;,(Xj- -X i)
d )2: E 92X szlJrl "3 2 E 92 61:3}
7.21.2 _1 ( X.) W (X )
_K__2__(0) 1 K2(u) g(:1: — uh)du
7:2th /;r—)d 9(r) +nhd I/ 9(I)
= 0((nhdr1).
Hence,
K2( X —X-
lAn12l S max1<i<nlti 971—3: 2 [M 92 X )Ij] (2-53)
i=1j=1 ( J)

= 0p<(nhd)—1> x 0p((nhd)_l) = 0p(<nhd)—2).

50

Next, by ($1). (k), and (112).

”—3: Z [Kh(Xj( Xialéilzjﬂ

i=1j=1
=“K2(0) 1 K2(X--X.-)Ie.-I
n3h2dZE [922:2 X-) I’lJr—EZE[ h 9909) Ijl

71..
#J

 

 

 

 

 

 

<K2(O) 000(th
fl

 

 

 

<(nhd)2 9(1)?)
2(u) 090 (a: —— uh) g(:v -— uh)
+(nhd)/ fK 9(13) dxdu
- ounhdﬂ).
Hence,
Kh(Xj( leléz'l
lAn13l S 2maX1gignltil7? 3: Z Xj) Ij
2=1j=1

= opanhdrl”) x op<<nhd) )=op(<nhdr3/2).

Consequently,

Inhd/2(An1—én)l -—— nhd/2(Op((nhd)‘2+0p((nhd)‘3/2))

= Op((nh_3d/2)-1)+ op((nh2d)-1/2) = 0,,(1).

To prove the second claim in (2.52), rewrite An2 as the sum of An21, A7222 and

 

 

 

 

An23,where
n " "K2(X-—X-)52A (X)

__ -—3 h J 2 z w J ,
A7121 — n 2: 2(X) Jl,
i=1j21- 9 J

n n FK2(X-—-X-)t2A (X')
—3 h J z w J
A7122 = n :2 2(X_Z) ] ’
i=1j=1- 9 J ,
n n "K2(X-—X-)€-t'A (x.)
_ —3 h J J H w J ,
An23 "‘ 72. Z: 2(X-) J '
i=1j=1- 9 J

 

51

 

 

 

 

 

i=1j=1 9 j
K2(0) n 22 J 1 [1922109] X0622 ]
= E 1-
2 2 2
_ K2(0) 060(18) 1 K (22) 000(m-uh) g(:z:—-uh)
— n2h2d/I 9(3) dx+ nhd).//I 9(3) dxdu
= 0<(nhd)“1>.

Hence,

|nhd/2An21| 3 mild/2 sup lAw(a:)|n _3 2: 2K
161 2=1j— 1

= Mid/20p(logkn(logn/n)2/d+4) ng((nhd)(_1)

 

Xiz.)e
x-> 19]

 

= Op(h_d/glogkn(logn/n)2/d+4) = 0p(1),
by Lemma 2.4.3 and (2.3). Similarly, one obtains that
Inhd/QAnZZI

n n K2(X- —X-)
__ Z
S mild/2 sup IAw($ $)|max1_<_i<n|tz' |2 n 3: Z [ h QJX- Ii]

 

= nhd/ 20P(10gk”(108n/ n)2/ d+4)0p((nhd)_1)Op((”hd)—1)

= op<(nh3d/2>—1)= 0pm,
and

lnhd/gAn23l

 

192m- - Mia-II]
J

92(Xj)

n n
g 2nhd/2 sup IAw($)lmaxlgignltiln—3 Z Z [
xEI 2' lj—l

52

= nhd/Zop(Iogkn(logn/n>2/d+4)op(W)*1/2)0p((nhd)—1>

= cam/125*”) = 0pm),

thereby completing the proof of the claim in (2.52), and hence that of the Lemma
2.5.4. E]

Proof of Lemma 2.5.5. Recall the notation from (2.1), (2.2). Let

n 2w

To prove the ﬁrst part of Lemma 2.5.5, we need to prove the following steps:

EKh(a: — X)Kh(y —- X)o§O(X) 2
9(95)9(y)

 

dG(a:) dG (y)

 

lf‘n —— m = 0pm), lfn —- in = 0,,(1) (2.54)
Ign — 972' = 010(1), 9n —> I‘.
Now, we shall prove the ﬁrst part of (2.54). For the sake of convenience, write

Kh(Xj — X2") by Ki(XJ-) and Ah(.r) :2 g2(.2:)(§;2(:1:) — g"2(:r)). Now, rewrite f‘n

as the sum of the following three terms:

 

 

.__. d _2 -1 19(le >-K (szez— tixcj-tj) 2
Bl ._ 2h 12 2;}; _ 2 920(1) I, ,
32 .___ 2hdn_22n —1: K2(Xl) Kj(Xl)(€22— tij)(€ “’9 j)Ah(Xl)I 1,21] ,
#J’ - 920(1)

 

 

_ XlK (X )(67, t¢)(€'—t')
B3 := Zhdn 2}: (n1: ’ 912W) 9 91,)

i752 -
Ki(Xz)K°(Xz)(€i- tz'jXC -t jrlAMXz) 11)]

X ”-1 J
( Z 92(Xl)

l

 

In order to prove the ﬁrst part of (2.54), it sufﬁces to prove that

Bl — Ln = 0p(1), B2 = 0p(1), and B3 = 019(1). (2.55)

53

For this, we shall show that

K°X K-X e; c,-
B :2 hdn—ZZ ”—1: 2( l) _7( I” 2”le
92(X) l

iaéj l 1

This expression is bounded by the sum of the following two terms:
'2

2K2(0) e2e2[K9(X9) _+K2(Xj) .
n4hd #j 9 9 920(2) 2 92(le J,
K'(X)K-(X) '

._ d —4 2 2 2 l J l

#1 1m 9209’) ,

By using (g1), (k), and continuity of 030, we obtain
[K%(X2)€%€% J

94(X2)
—d ‘30 (:1: — uh)a§0 (x)g(:r -— uh)
= h [I] 93(33) dudx

2

 

 

H1

 

 

’

2

 

 

 

 

II
,9
a".
a.
V

Hence, EH1 = 0(nhd)_2 = 0(1), by (h2), and H1 2 op(1).

Next, rewrite H2 as the sum of the following two terms:

 

 

 

K2 x K2 X 5262.
H21 = hdn—4 Z 2( z)4J( z), 112:
179279] 9 (X1)
H22 = h dn_ 4 Z Ki(Xl)Kj(X1)K,-(Xm)Kj(Xm)ezze%IlIm.
m752°5£j¢l 92(X1)92(Xm)
By (92), (81): (k), (112), and independence of Xi’s,
K2(X3)K§<x3)a§ (X903 (x2)
21 (n ) [ 940(3) 3
2
= (nhd)_1/I[/K2(u)ogo(x—uh)g(a:—uh)du] g—3(:L‘)d:c

= 0(1).

54

Similarly, by (e2), (g1), and (k),

EH22

= [If// [K(u)K(w)K(v + u)K(w + ”>030“ — uh — vh)ago(x - Uh — wh)

X 9(2: — vh -- wh)g(a: — ’Uh — uh)
9(15 - vh)g(~’c)

 

dw du dv dz]

= 0(1).

Hence, H2 = Op(1), and (2.56) is proved.

By a similar argument, under (2.56), (e2), (g1), (k), and (h2) one obtains

 

 

 

Min—22L, 4: K2 2(X;)2Kj (Xz)|€2lzl]2 =Op(1), (2.57)
#J’ - (X1)
__ Xz'()Xl )K j(Xl)
dn 2§""_1Z 92(X2) l2 =0p(1). (2.58)
2 J -
Furthermore,
sup lAh(x)| = 019(1), by (2.3), (2.59)
2:61
maxlgignltil = 019(1), by (m4) and (2.32). (2.60)

Note that by expanding (c,- — ti)(ej — tj) and the quadratic terms, '31 — in] is
bounded above by the sum of 812 and B13, where

2
. d _2K2(X1)Kj(Xz)(|t2tj|+|6212|+lt2€'l)
312 .= 2h ’n. E[n— 1}: 92(Xl) '7 It ,

2763' 1

313 := 4hdn_2}:(n— 1: K2(Xl)Kj g2((X)l)l€2 9'11)

272]: z

 

 

 

X n_1 Ki(Xz)Kj(Xl)l(ltitjl + l€2til + Itz'éjl)
( Z 92(Xz) II .

l

55

But 812 = op(1) by (2.57), (2.58),(2.60), and the fact that ti’s are free of X1. Sim-
ilarly, by applying the Cauchy-Schwarz inequality to the double sum and by (2.56)
813 : 019(1). Hence lBl — f‘nl = 017(1).

Next, consider 82. By using the inequality
léz' - tilléj - tjl S Iéz'éjl + ”£th + lfz‘tz‘l + lti€jl,
and by (2.59),

132 S 2 sup IAh(fr)| [312 + B] = 0p(1).
:rEI

Similarly, an application of the Cauchy-Schwarz inequality to the double sum
yields B3 = 013(1). This completes the proof of (2.55), and hence that of the ﬁrst part
of (2.54).

To prove the second part of (2.54), it sufﬁces to Show that E[f‘n — ﬁn]? = 0(1).

Let

 

S. 'kl [Ki(Xk)Kj:Xk)K:(Xz)Kj(X1)Ik11
2]

2(ch) 2(Xz)
_K/fz z($)Kj ($)Kz' (30K 3'30 y>dxdy ,

9((x)9 y)

 

d .. ..
and uz’jkl: [772 62 62’ J'Szjkl Hence, [P71 - 9n] = ijéz' 2k 21 uijkl'

Expanding the quadratic, we obtain

lf‘n - 5n]2 = 22222 X ZZuz-jszmnpq

ijaéz'k l mnyémp q
Since we have four kernel terms in each Sijkl term, thus by using (g1) and (k)
h—Sd)

maximum order of Es,- j kl smnpq = O( , and hence that of E[uz’jkl umnpq] =

56

2d
h—g— x 0(h—8d) = 0(n—8h‘6d). Also, we have eight summations, if there are at
n

most ﬁve different subscripts in the summations, then the expected value of those
terms which has at most ﬁve different subscripts is at most n5 X 0(n—8h-6d) =
0(n‘3h’6d), and hence by (h2), it is 0(1).

So, we will only discuss those terms, which are involved with more than ﬁve
different subscripts in the summations. According to that criteria, [f‘n — 67212 is

bounded by the following terms:

2 Z Z Z Z Z “ijklumnpq, (2.61)

j#i k l ”#m p#i’j7k3l1m,n q?éi,j7k)l,m3n)p

”2 = Z Z Z Z Z uz‘jkl
j¢2 n#m k#zij3m)n l#z)j1k)m)n p¢23j3k3l3min
X lumnpi + umnpj + umnpm + umnpnl,

U3 = Z: Z Z ”mnpk

is“ 7175771 kﬂdﬂﬁ P¢i,j,k,m,n
Xluijii + “ijz'j + “ijim + “ijin + uz’jjz’ + uz’jjj + '“ijjm + uz‘jjn

U1

+uz'jmz' + “ijmj + uijmm + uz‘jmn + uijnz' + “z'jnj + uijnm + uz‘jnn],

U4 = Z “2' j kl umnpl’
p¢i¢j¢k¢m¢n¢z

U5 2 Z luz'jkn(umipk + umjpk + “mnpk)
##jaékaémaén

+umnpk(uijki + uz’jkj + ”ijkk) + 7‘2’jkm(uinpk + “jnpk + umnpk)

+(umnpi + umnpj + umnpm + "mnzmxuijki + uz’jkj + “ijkk + uz’jkm

+uz‘jkn) + (“ijz'k + uz'jjk)umnpk + (Umpm + ujnpm + uknprn)uijkm'
To prove the claim, if sufﬁces to show that the expected value of all of these terms is

0(1).

57

Note that, for all k, 1% i,j and k 75 l, ElsijkllXi’le = 0. Now, by using this

fact, Vzmsé 233', k,m,n, 1, and p # q,

Hence, E(U1) = 0.

Similarly, for all k,l 75 i,j, m, 71,}? and k 74 l,
E(uijl€l umnMIXi,Xj,Xm,Xn,Xp) = 0

and, expected value of other terms of U2 is zero. Hence, EU2 = 0. Similarly, for all
km 74 2333mm and k 25 p, E(U3) = 0-

Again by the above fact,

hzd 2 2 2
E(U4) = 753(616365868123485674]
h2d
= —,—,-E[{E[030<X1)U§O(X2>3123429514191
W 2 2 2
= TEl090(X1)090(lefEl31234lX3aX4l} l=0-

By independence of Xi’s, expected value of the ﬁrst term of U5 is equal to

2d
6 h 2 2
n Eluz’jknumz'pkl = n2 E[700(X1)090(X2)090(X5)8123485163]-

By independence of Xi’s, (g1), (k) and continuity of 060 and 790, one obtains

2 2
E [T90(X1)090(X2)090(X5)8123435123l

= Elmo(X1){E[0§0(X2)81234|X1,X3:X4]}2]

2&3[/;'// [/TOIJZCC—hv—hw)ogo(:c—hu—hw)K(U)K(v)

2 g(:r «- hv - hw)

3 3 dvdwdx
g (av-hwy (x)

 

xK(w + v)K(w + u)g(:r — hu — hw)du]

58

+ f [/1/ [7,13% .. hv —— hw)a§0<x - hu — hw)K<u>K(v>

_ _ 1/2 _ _ 2
g(a: hu hw)g (a: hv hw)dudwdx] dv]

 

xK(w + v)K('w + 1‘) g(:z:)g(:17 - wh)

_—. 0(h_4d).

Hence, the expected value of the ﬁrst term of U5 is equal to 0(72.hal)“2 = 0(1), by
(h2). Similarly, by using (g1), (k) and (h2), expected value of the other terms of U5
is 0(1). Hence, the second part of (2.54) is proved.

The proof of the third and fourth part of (2.54) is given in Lemma 5.5 of K-N.

Hence (2.54) is proved, and so is the Lemma 2.5.5. Cl

59

CHAPTER 3

Minimum Distance

Goodness-Of-Fit Tests

For Current Status Data

3. 1 Introduction

This chapter discusses a minimum distance method for ﬁtting a parametric model to
the distribution function of the event occurrence time in the one sample set up with
current status data. Let X and T denote the event occurrence and inspection times,
respectively. Let F (G) denote the d.f. of X (T). Assume X and T are independent. In
the current status data set up, one observes 6 = I [X S T] and T, where I [A] denotes
the indicator function of the event A. Let .A := {F0(t) : t E R+, 0 E 8 C Rq, q 2 1}

be a given parametric family of d.f.’s. Let I be a compact sub-interval of [0, 00). The

60

problem of interest here is to test the hypothesis
H01 : F(t) = F000), for all t E I, for some 00 E 9,
against the alternative
H11 : H01 is not true.

based on the random sample {(Ti, 67;) : 1 S i S n} from the distribution of (T, 6).
In this chapter, we adapt the inference procedures discussed in chapter 2 to the
current status data. More precisely, let 03(Ti) 2 F9 (Ti)(1 — F9(T,-)), and consider

the regression model
6i = F6(Ti) + 06(Tz')Ci, I S ’i S n.

Here {(2} are i.i.d. r.v’s such that E((,-|T,-) = O and E(§2-2|T,;) = 1, , 1 g i g n. we
shall be using the notation of chapter 2 with X, Y, p(:c) and me replaced by T, (5,

F (t) and F9, respectively, where now (1 = 1. Thus e.g., now

 

 

 

 

l Tl K t—T- 6-—F T- 2
W, .= fl [nazl Myself“ an» m), (3.1)
1 n 1 n Kh(Tj-Ti)(5i_F6(Ti)) 2
Mn(9) 3: - [”2 * ' ,
nj=1 ”i=1 9w“) ‘7
Dn I: nhl/Z égl/2(Tn(1§n) '- Rn),
6n := argminoeeTn(0), 611:: argmingeeMnW),

-- n K2 t—T- é?
Rn :: n—ZZ/ h(*2 1.) 1d
i=1 I 9w (0

. K (t—‘T°)K (t-fl-)é.-é' 2
._ -2 d h i h _7 a
9n ._ 2n h E j (/ §h(t) J d<I>(t)) ’

 

<I>(t), 52-2: i—F~ (Ti), lsisn,

 

61

2 .7"
j=1i 1 9i}; (Tj)

- . 2
. K T—T-K T-T'E'E'
Fn := thn—4Z (2: h( l 2)“ h( l J) z 311) .
#j z 9h(Tl)

This chapter is organized as follows. Section 2 adapts the results discussed in

 

chapter 2 based on minimum distance statistic to the current status data. First, we
discuss consistency of 19;; and 127; for T(F*), where F * E L2(<I>) is a d.f., different
from the null model A. Then, we discuss the consistency of 19;, 1972 and asymptotic
normality of 1% and Dn, under H01. Similar to Koul and Song (2006) (K-S), we also
obtain consistency of Dn against a ﬁxed alternative, under some regularity conditions.

Additionally, we obtain asymptotic power of the proposed minimum distance tests
under a class of local alternatives Hln : F(t) = F6005) + gb(t)/nh1/2, where 2/1 is a
continuously differentiable function such that f w2d<l> < 00 and f F9¢d<l> = O, for all
6 E 9.

Section 4 adapts the results of chapter 2 based on empirical minimum distance
statistic to the current status data and discusses consistency of 6}“, and 6n and asymp-
totic normality of 6n and nhl/zf‘;l/2(Mn(én) — Cm), under H01.

Section 5 reports results of the three simulation studies. The ﬁrst simulatiOn
study investigates Monte Carlo size and power of empirical minimum distance test.
The ﬁnite sample level approximates the nominal level well for large sample sizes.
Simulation results also show little bias in the estimator 6n, for all the chosen sample
sizes.

The second simulation study investigates the empirical size and power behavior

62

of the Cramér-Von—Mises test CV1, where CV1 is deﬁned in chapter 1. Since the
asymptotic distribution of C V1 is not known, so in order to ﬁnd the Monte Carlo levels
and powers of this test, we need to estimate its out off points. Estimated cut off points
are obtained by ﬁrst getting 10,000 values of C V1 and then by ﬁnding percentiles from
the distribution of 10,000 values. The ﬁnite sample level approximates the nominal
level well for all the chosen sample sizes. In our simulations, F is computed by the
one step procedure for calculating the nonparametric maximum likelihood estimator,
based on isotonic regression, cf. Groeneboom and Wellner (1992).

The third simulation study investigates Monte Carlo size comparison of the em-
pirical minimum distance test with the tests of Koul and Yi (2006) (KY) and CV1.
Simulation results show that the empirical levels of CV1 and KY tests are better
than Mn(én), when sample size is less than 200. But when the sample size is 200,

the signiﬁcance levels of all the three tests are comparable to each other.

3.2 Minimum Distance Statistics and Tests

In this section we adapt the results discussed in section 2 based on a class of minimum
distance statistics to the current status data. Here we shall be using the same assump-
tions discussed in chapter 2 with X, Y, ”(22) and ma replaced by T, 6, F(t) and F9,
respectively, where now d = 1 and I is a bounded interval in [0, 00). Also under the
current status data set up, assumptions (e1), (e2), (e3), and (e4) are automatically
satisﬁed.

First, we discuss the consistency of 19}: and 1§n for T(F*), where F * E L2(<I>) is a

63

d.f., different from the null model A. Let H11 : F(t) = F*(t), t E I.

Lemma 3.2.1 Suppose assumptions (k), (91), and (m3) of chapter 2 hold with ma
replaced by F6, (1 = 1 and I is a bounded sub-interval in [0, 00). Let F * be a given
d.f. such that F* ¢ A, F* E L2(<I>), and T(F*) is unique.

(a) In addition, suppose F * is a.e. ((1)) continuous. Then, under H11, 19;: =
T(F*) + op(1).

(b) In addition, suppose F * is continuous on I. Then, under H11, dn = T(F'*) +

Upon taking F * = F 00 in the above result one immediately obtains the following:

Corollary 3.2.1 Suppose assumptions (91), (k), (m1)-(m3), (hl), and (h2) of chap-
ter 2 hold with me replaced by F9, where now d = 1 and I is a bounded sub-interval

in [0, 00). Then, under H01, 19,"; ——> 60, and dn -—> 6’0 in probability.
Next result gives the asymptotic normality of n1/2(1§n — 60) under H01. Let
Fn(t, 6) := 11-1 1': Kh( (1— 7",), (3.2)
n

Fn(t,6) :2 n—12Kh(r,)1—)F9(T,-,)

Fh(1) := EFn(t,60)= EKh(t—T)F,90(T),

 

_ mow)

sn _ f1 92(t) d<I>(t),

2 __ F90(t)(1“F00(t))F00(t)F90(0,4520)dt
°_ /l' 900

64

Corollary 3.2.2 Suppose assumptions (g1), (g2), (p), (k), (m1)-(m5), and (h3) of
chapter 2 hold with my replaced by F6, where now (1 = 1 and I is a bounded sub-
interval in [0, 00). Then, under H01, n1/2(i§n — 60) = 261n1/2Sn + op(1). Conse-
quently,

n1/2(1§n — 00) —»d Nq(0, 23512251).

Next, we state the asymptotic normality result about Dn under H01. Let

F (t)(1F (t))¢(t)2
r := 2/;r{ 00 9(th }2d 1(/ /K(u) K()v+udu)2dv.

Corollary 3.2.3 Suppose assumptions (91), (92), (p), (k), (m1)-(m5) and (h3) of

 

chapter 2 hold with me replaced by F6, where now d = 1 and I is a bounded sub-
interval of [0,00). Then under H01, Dn -—+d N1(O,F) and [STEP—1 — 1| = 019(1),

where Q", is as in (3.1).

Consequently, the test that rejects H01 whenever anI > z a /2 is of the asymptotic
size a, where za is the 100(1 — a)% percentile of the standard normal distribution.

The following corollary provides a set of sufﬁcient conditions under which IDnI —>
oo, in probability, for any sequence of consistent estimators tin of T(F*) under the

ﬁxed alternative H11.

Corollary 3.2.4 Suppose assumptions (9]), (g2), (p), (k), (m3), (h3) of chapter 2
hold with ma replaced by F9, where now d = 1 and I is a bounded sub-interval of
[0,00). Assume the alternative hypothesis H11 hold with the additional assumption
that infg p(F*, F0) > 0. Then, for any sequence of consistent estimator 157;, of T(F *),
IDnl —> oo, in probalility. Consequently, the test that rejects whenever |’Dn| > za is
consistent against the ﬁxed alternative H11.

65

Its proof is similar to that of Theorem 5.1 in KS adapted to the current status
data.
Next, let 1,!) be a known continuously differentiable real valued function. In addi-

tion, assume i/J E L2(<I>) and
fF9¢d<I> = 0, ‘v’ 0 E 8. (3.3)
Consider the sequence of local alternatives
111,, = F(t) = F10(t)+ 1.112(1). 11. = 1/(nh1/2)1/2- (3.4)

The following corollary gives asymptotic power of the minimum distance test against
the local alternative H In- Its proof is similar to that of Theorem 5.3 in K-S adapted

to the current status data.

Corollary 3.2.5 Suppose assumptions (91), (92), (p), (k), (m4), and (h3) of chap-
ter 2 hold with m9 replaced by F9, where now d = 1 and I is a bounded sub-
interval of [0, 00), then under the local alternative hypothesis (3.3) and (3.4), 'Dn —> d

N(F’1/2fw2d<1>,1).

The following corollary gives the asymptotic distribution of Sn under H 1n- Its

proof is similar to that of Theorem 5.2 in K-S adapted to the current status data.

Corollary 3.2.6 Suppose assumptions (91), (92), (p), (k), (m1)-(m6), (h3) of chap-
ter 2 hold with m9 replaced by F9, where now at = 1 and I is a bounded sub-
interval of [0, 00), then under the local alternative (3.3) and (3.4), ill/2(3); —00) "’d

Nq(o,2512251).

66

3.3 Empirical Minimum Distance Statistic

In this section, we adapt the results of chapter 2 based on empirical minimum distance
statistic to the current status data. First, we discuss the consistency of 9;“, and Sn.
The consistency of these estimators for 60 under H01 follows from Lemma 2.3.2 of

chapter 2. Applying Corollary 2.3.1 to the current status set up, we have

Corollary 3.3.1 Suppose assumptions (91), (k), (m1)-(m3), (hl), and (h2) of chap-
ter 2 hold with m9 and (I) replaced by F9 and C, respectively, where now d = 1 and I

is a bounded sub«interval of [0, 00). Then under H01, 0;; —1 60, in probability.

Applying Theorem 2.3.1 to the current status set up, we have

Corollary 3.3.2 Suppose assumptions (g1), (k), (m1)-(m3), (hl), and (h2) of chap-
ter 2 hold with m9 and (I) replaced by F9 and C, respectively, where now (1 = 1 and I

is a bounded sub-interval in [0, 00). Then, under H01,
(in —1 60, in probability.

Now, we discuss asymptotic normality of n1/2(én — 60). Let

n .' . .
a. z: 11-1 U”(TJ)Fh(TJ)IJ.
9(7)“)

 

i=1
Applying Theorem 2.4.1 to the current status set up, we obtain

Corollary 3.3.3 Suppose (61), (g1), (92), (k), (m1)-(m5) and (h3) of chapter 2
hold with m9 and (I) replaced by F9 and G, respectively, where now d = 1 and I is a

bounded sub-interval in [0, 00). Then under H01,

Til/2(én—00) = 261n1/23n+0p(1),

67

Consequently, n1/2(dn —60) _’d Nq(0, 2612261), where 20 and Z are as in (m6)

and (3.2), respectively.

Next, we discuss asymptotic distribution of the empirical minimized distance

Mn (Sn). It follows from Theorem 2.5.1 adapted to current status data.

Corollary 3.3.4 Suppose (e1), (g1), (92), (k), (m1)-(m5) and (h3) of chapter 2
hold with m9 and (I) replaced by F9 and C, respectively, where now d = 1 and I is a
bounded sub-interval in [0, 00). Then under H01, nh1/2(Mn(dn) —C'n) —’d N1(0, F).

Moreover, If‘nI‘_1 —— II = op(1), where f‘n and Ch are as in (3.1).

Consequently, the test that rejects H0 whenever nh1/2f‘; 1/ 2IMn(l9An) — Cnl > Za/g

is of the asymptotic size a.

3.4 Simulations

This section contains the results of three simulation studies. The ﬁrst one assesses
finite sample level and power behavior of the empirical minimum distance test statistic
Mn(0). The second simulation study investigates ﬁnite sample level behavior of the
Cramér-Von-Mises test CV1. The third simulation study investigates a Monte Carlo
size comparison of Mn (6), CV1 , and KY test. The simulations are done using Matlab.

The kernel functions and the bandwidths used in the simulations are

Ka) = 11*(1): 2i“ — x2>mx1 s 1)

h = c1n_1/3, w =c2n_1/5(l0gn)1/5,

68

with some choices for CI and c2. In the tables below, exp(d), 6’ > 0, de-
notes the exponential distribution with parameter 6 under the null hypothesis
H01 :F = exp(0), for somed > 0. The Weibull distribution with density w(t)
:2 ba—btb“1exp(—t/a)b is denoted by W(a, b) and G(a,b) represents the Gamma
distribution with density g(t) := mta_13$P(—t/b)1 a > 0, b > 0. The asymp-
totic level is taken to be 0.05 in all the cases. The sample sizes chosen are 50, 100,
200, each repeated 1,000 times.

Table 3.1 reports the Monte Carlo mean and the MSE(én) under H01 which are
obtained by minimizing Mn(0) and employing the Newton-Raphson algorithm. The
sample sizes chosen are 50, 100, 200, 500, each repeated 1,000 times. One can see
there appears to be little bias in én for all the chosen sample sizes and MSE decreases
as the sample size increases.

To assess the effect of the choice of (c1, eg) that appears in the bandwidths on the
level and power, we ran the simulations for various choices of (c1, c2), ranging from
0.1 to 1. Table 3.2 reports the simulation results for those (c1, c2) which gave the best
results. The entries in the tables for Mn(én) are obtained by computing the number
of times (lnhl/Qfgl/2(Mn(dn) — Cn)| 2 1.96)/1,000. Table 3.2 summarizes the
empirical levels for test statistic Mn(én). It shows that as the sample size increases
the simulated levels are getting closer to the asymptotic level 0.05.

Table 3.3 represents the power for test statistic Mn(dn) for four different alterna-
tives, when (c1,c2) = (.9, 1). It shows that the power is getting better as the sample
size increases.

The second simulation study investigates the behavior of the Cramér—Von—Mises

69

test CV1. Since the asymptotic distribution of CV1 is not known, so in order to
ﬁnd the Monte Carlo levels and powers of this test, we need to estimate its out off
points. Estimated cut off points are obtained by ﬁrst generating 10,000 values of
CV1 and then by ﬁnding percentiles from these 10,000 values. After that, for CV1,
the empirical level and power are obtained by computing the number of (CV1 2
estimated cut off point) / 1,000. In our simulations, F is obtained by the one step
procedure for the calculation of the nonparametric maximum likelihood estimator,
based on isotonic regression, cf. Groeneboom and Wellner (1992).

Table 3.4 contains the simulated 90th, 95th, 97.5th, 99th, and 99.5th percentiles
of CV1 when distributions of X, T are exp(1). Table 3.5 represents simulated signiﬁ-
cance levels by using the corresponding simulated percentiles given in Table 3.4 when
testing F = exp(1) and the distribution of T is exp(1). It shows that the simulated
signiﬁcance levels of CV1 for different chosen sample sizes are very close to the true

nominal sizes.

Let

A

Cn := argminCEe CV1(().

I

Table 3.6 reports the Monte Carlo mean and the MSE of (in) under F = exp(1) which
are obtained by minimizing CV1 and employing the Newton-Raphson algorithm. One
can see there appears to be little bias in Sn for all the chosen sample sizes and MSE
decreases as the sample size increases.

Table 3.7 represents the power of CV1 for ﬁve different alternatives, when distri-

bution of T is exp(1). It shows that the power is getting better as the sample size

70

increases.

In the third simulation study, we make a comparison of the Monte Carlo level of
the proposed empirical minimum distance test Mn (Sn) with the other two tests C V1
and KY.

KY test. Let S M L E denote the maximum likelihood estimator, obtained by

using the the following score statistics Sn(6) given in (3.3) of KY:

SW) := Z [——5_:6_T, — 1]T,-.

1:1 1“ 8
Let
. n
Un(t) 2: 12—1/2 ZI
i=1

(T <1) 11—1 Z
é(T-+T-)/2

TjTie— z j (1 _ e—gTi)—1/2(l _ 8—19Tj)—1/21(Tj S tATﬂ] 5“-
,.

 

 

__ —ér —éT _
"1211:17135 11(1—1 k) 11(Tk2Tj)

Let Gn denote the empirical distribution of the design variable Ti, 1 S i g n and
t0 = 99th percentile of On. The KY test statistic is

Rn: sup M.
OStStO V 017,00)

As shown in KY, the limiting null distribution of Rn is the same as that of
3111303131 |B(t)|, where B is the standard Brownian motion. The 95th percentile

of this distribution is approximately equal to 2.24241, which is obtained from the fact

P( sup |B(t)|<b)=P(B (1) <b)+2:(— 1) F((21—1)1<B(1)<(21+1)1)
03131

The empirical size is computed by using #{Kn > 2.24241} / 1000.
Table 3.8 shows comparison of simulated signiﬁcance levels for M91271), CV1 and
Rn. For the simulated signiﬁcance levels of CV1 we used the percentiles given in

71

Table 3.4. It shows that the empirical levels of statistics C V1 and K n tests are better
than Mn(61n), when sample size is less than 200. But when the sample size is 200,

the signiﬁcance levels of all the three tests are comparable to each other.

Table 3.1: Mean and MSE of bin, X, T ~ exp(1), 90 = 1

 

 

 

Sample Size 50 100 200 500
Mean 1.0442 1.0172 1.0025 1.0016
MSE 0.3324 0.1780 0.1216 0.0748

 

 

 

 

 

 

 

Table 3.2: Empirical sizes of Mn(én), X, T ~ exp(1)

 

 

c1, 02\n

50

100

200

 

0.5, 0.2

0.024

0.044

0.049

 

0.8, 0.6

0.061

0.058

0.046

 

0.8, 0.7

0.088

0.062

0.056

 

0.9, 0.8

0.07

0.061

0.05

 

0.9, .9

0.07

0.059

0.049

 

 

0.9, 1

 

0.094

 

0.058

 

0.045

 

 

72

Table 3.3: Power of Mn(dn), T ~ exp(1), (cl,c2) = (.9, l)

 

X\n

50

100

200

 

c(2,1)

0.975

1

l

 

G(1,3)

0.927

0.999

1

 

W(1,5)

0.211

0.404

0.628

 

 

 

W(1,2)

 

0.452

0.693

 

0.929

 

 

Table 3.4: Simulated percentiles of CV1, X, T ~ exp(1)

 

 

 

 

 

 

 

 

 

 

 

 

Percentile \ n 50 100 200
99.5 0.0412 0.0245 0.0152
99 0.0359 0.0216 0.0131
97.5 0.0298 0.0177 0.0112
95 0.0236 0.0156 0.0095
90 0.0183 0.0125 0.0077

 

 

Table 3.5: Empirical sizes of CV1, X, T ~ exp(1)

 

 

 

 

 

 

 

 

 

 

True level\ n 50 100 200
0.005 0.004 0.003 0.004
0.01 0.011 0.009 0.008
0.025 0.025 0.029 0.026
0.05 0.048 0.056 0.049

0.1 0.101 0.106 0.09

 

 

 

73

 

Table 3.6: Mean and MSE of (An, X, T ~ exp(1), 90 = 1

 

 

 

 

 

 

 

 

Sample Size 50 100 200
Mean 1.6815 1.4594 1.3088
MSE 0.5558 0.3152 0.1878

 

Table 3.7: Power of CV1, T = exp(1).

 

 

 

 

 

 

 

 

 

 

 

 

Dist. of X\ n 50 100 200
G(1,3) 0.95 0.999 1
G(2,1) 0.975 1 1
W(1,.5) 0.413 0.624 0.891
W(1,1.5) 0.394 0.45 0.575
W(1,2) 0.619 0.779 0.957

 

 

 

 

 

Table 3.8: Empirical sizes, X,T ~ exp(1), (c1,c2) = (.9, .8)

 

 

 

 

 

 

 

Tests\ 11 50 100 200
Mn .074 0.07 0.055
1%,, .04 0.049 0.052
CV1 0.049 0.048 0.051

 

 

 

 

 

 

74

CHAPTER 4

Testing the equality of two
distributions with Current Status

Data

4.1 Introduction

This chapter discusses the problem of testing the equality of two distribution func-
tions based on current status data. Accordingly, let X, S (Y, T) denote the event
occurrence and inspection times, respectively, from the ﬁrst (second) population. Let
F1 (F2) denote d.f. ofX (Y) and G1 (02) denote the d.f. of S (T). Let X1,.. .,Xn1
(Y1, . . . 1Yn2) be i.i.d. F1 (F2) and S1, . . .,Sn1 (T1, . . . ,Tn2) be i.i.d. G1 (02) ran-
dom variables. Assume all random variables are mutually independent. In the two
sample current status data set up, one observes (6,38,), 1 g i g 111, and (nj,Tj),

1Sjgng,where6=I[XSS],n=I[YST].

75

The problem of interest here is to test the null hypothesis that the two event

occurrence distributions are the same, i.e.
H02: F1(:r) = F2(:7:), for all a: E I,
against the alternative
H12: F1(:1:) 75 F2(:r), for some :1: E I,

where I is a compact sub-interval of [0, 00).
In this chapter we adapt the test proposed by Koul and Schick (2003)(K-Sh) to
the two sample current status data. More precisely, let 0%(Si) = F1(S,-)(1— F1(S,')),

0%(Tj) = F2(Tj)(1 —- F2(Tj)) and consider the regression models
5,,- := 171(52') + 01(Silcli’ 1 S i _<_ n1,
nj :: F2(Tj) + 02(lec2j’ 1 Sj _<_ n2.

Here (11" (23- are i.i.d. r.v.’s such that E((1,-|Si) = 0 = E((2j|Tj) and E(C‘12i|Sz-) =
1 = E(C%j|TJ-), 1 g i 3 n1, 1 S j 3 n2. Assume also that 01, G2 have positive
densities 91 and 92 on [0, 00), respectively, and that F1, F2 have bounded densities.

Let U denote the set of all nonnegative functions that vanish off I and whose

restrictions to I are continuous. Consider the integral

r = / 11(11)[F1(11) — F2(x)]dx, 11 e u.

A possible choice for u is the indicator 1: of the interval I. The integral 1‘ is 0 if the

null hypothesis holds, and is non-zero under the alternative H12, for all u E U.

76

Let K be a symmetric density with compact support [—1,1] and a = an be a
bandwidth sequence, and let

1 "1 "'2 WMMTﬂw

T = _—
n1712 i—1j_1 91(51)92(Tj )(

 

77j)-Ka(51— —T j)
Observe that

E(T)=//\/1KS—\/_lF1(t1)lKa(8-t)dsdt

which is close to I‘ for small a. Thus I provides an estimate of I‘ if 91 and 92 are
known, which is rarely the case.

This suggests to replace the densities in T by their estimates. Accordingly, let

n1 n2

1! )1/ (T9)( —17‘)K (S-—T-), (4.1)
T=n11nzgjzllﬁ 2 J a Z J

 

where 13k is an estimate of ”k = ﬁ/gk, k = 1, 2, constructed from the pooled sample
such that 1% (0:) = 0 for a: Q! I. Similar to K-Sh, the estimators of ”k can be obtained
when u is known and when u = u7 as described in Remark 4.2.5 below. So that the

asymptotic normality of I both under the null hypothesis and under local alternatives

n1n2

l
F=F+N_2, N:=———-———,
1 2 7 n1+n2

(4-2)

can be obtained, where '7 is a non-negative continuous function such that 7(0) 2
0,7(00) = 0 and 0 < fu(x)7(z)dx < 00.

The rest of the chapter is organized as follows. Section 2 discusses asymptotic
normality of I under a general set of assumptions on the estimates 191 and 192. Section

3 reports the numerical results of the two simulation studies. The ﬁrst one assesses

77

the ﬁnite sample level and power behavior of I test. The simulation results of the
test statistic I are consistent with asymptotic theory.

In the second study, the ﬁnite sample comparison of I and CV2 tests is made,
where CV2 is deﬁned in chapter 1. Since the asymptotic distribution of CV2 is
not known, so in order to find Monte Carlo levels and powers of this test, we need to
estimate its cut off points. Estimated cut off points are obtained by ﬁrst getting 10,000
values of CV2 and then by ﬁnding percentiles from the distribution of these 10,000
values. Simulation results show that for all the chosen alternatives and bandwidths,
signiﬁcance level of CV2 is better than I, and power of I is better than CV2, when
sample sizes are 50 and 100. But when sample size is 200, signiﬁcance level and power.
of I and CV2 tests are comparable. In our simulations, F1 and F2 are computed
by the one step procedure for calculating the nonparametric maximum likelihood

estimator, based on isotonic regression, cf. Groeneboom and Wellner (1992).

4.2 Asymptotic behavior under the null hypothe-

sis and local alternatives

This section discusses the behavior of the test statistic I given in (4.1) under the
null hypothesis and under the alternatives (4.2). Note that the choice 7 = 0 in (4.2)
corresponds to the null hypothesis. To stress the dependence of local alternative on
the parameter 7 we write Pry for the underlying probability measure and E7 for the

corresponding expectation.

78

Arguing as in K—Sh, we shall describe the asymptotic behavior of I as the sample

sizes n1 and n2 tend to 00. For this we need the following assumptions.

(A.1) The function u E U, the set of all non-negative functions that vanish off I and
whose restrictions to I are continuous.

(A2) The densities 91 and 92 are bounded and their restrictions to I are positive
and continuous.

(A.3) For any pairs of d.f.’s (F1,Gl), and (F2,G2), P(0 < F1(S) < 1) = 1 and
P(0 < F2(T) < 1) = 1.

(A4) The weight function K is a symmetric Lipschitz-continuous density with com-
pact support [-1, 1].

(A5) The bandwidth a is chosen such that a2N —+ 0 and aNc -+ 00, for some c < 1.

Note that 01(5) 2 0 as, implies either F1(S) = 0 or F1 (S) = 1 as. In the former

case E(6 S) = 0 implies 6 = 0 as. Hence 6 — F1(S) = 0 as. Similarly F1(S) =

 

E(6lS) = 1 as. implies 6 = 1 as. and hence 6 — F1 (S) = 0 as. Thus 01 (S) = 0 as,
implies 6-— F1(S) = 0 as, and, under (A3), P(01(S) > 0) = P(0 < F1(S) < 1) = 1.
Similarly, P(O < F2(T) < 1) = P(02(T) > 0) = 1.

The condition (A3) is a joint condition on the supports of (F1, G1) and (F2, G2).
For example, if distributions of X and S are exponential with the scale parameter
61 > 0, then P(0 < F1(S) < 1) = P(0 < e_01S < 1) = 1. But if the distributions
of X and S are U(0, 1) and exponential with the scale parameter 0 > 0, respectively,

then P(0 < F1(S) < 1) = P(0 < S < 1) = 1 — e‘g, and hence in this case the

79

ﬁrst part of (A.3) does not hold. A sufficient condition for (A.3) is that F1 (F2) be
strictly increasing on the support of G1 (6'2).

Also note that under (A2) and (A.3) the functions 91: 92 and 0%, 0% are bounded
and bounded away from zero on interval I and so are the functions 01/ 91 and 02 2./92

To establish the asymptotic normality of I, rewrite this statistic as

 

n2 _1 n1
1 .. N 2
igrlﬂg'SDC1i—EZT2(Tj)02(Tj)C2j+—£—1-—Zr1(Si)6(Sz-)+T4,
i=1 i=1
where
r19) = S1(z)— 12212091120. T—,-),
2j: —1
1
T2(2) = 172(x);1‘21:91(51)Ka(x—S,), xE[0,oo),
and
n1 n2
T4=,,,, ZZV1(S)02(T2).(F2(S)—F2T(2,—1)Ka(S T-)
1 2i=1j=1

As in K-Sh, the following additional deﬁnitions and assumptions are made .to
analyze the asymptotic behavior of I. Let S = (311 . . .,Sn1), T = (T1, . . . .Tnz).
6 = (61, . . . ,6n1), ’7 = (171,... 177712). and 6,- (773') be the vector obtained from 6(1))

by removing 6,- (71]).

Deﬁnition 4.2.1 We say the estimator rk is consistent and cross-validated (CCV)

on I for the function rk if the following conditions hold:

111
N 22119) )E.1(r1(S.-)-r1(s .)) 131:0..(1)
"Ii: 1
N ”2
‘7. 2 12(1 1211.2(1 ) 211912111: 012(1)
n2j= 1

80

N 1< atlaxnljugEmrnx)E.(1=1(x)ls,6.1>2151 = 0.2(1),

N K131111212 :21} E,[(.2(.)_ E7IT2(T)IT. 12,1)2IT] = 0.,(11.

We say i‘k is a modiﬁcation of fk if P7(SuPa:€I Ii‘k(:z:) — fk(:c)| > O) —+ 0. We say
fl: is essentially CCV on I for rk if there exists a modiﬁcation of 72k which is CCV
on I for rk.

Assumption 4.2.2 The estimate 72k is essentially CCV on I for T1: = u/gk for
k = l, 2.

The following result gives a sufﬁcient condition for Assumption 4.2.2. Its proof is

similar to that of Lemma 2.4 in K-Sh and hence no details are given.

Lemma 4.2.1 Suppose there are modiﬁcations 17k of 19k such that, for k=1,2

0 g 32(1) 3 K, :1: e I, (4.3)
for some ﬁnite constant K,
51—1-2 2 E7101 — V1(S 5.))2IS] : 0.2(1), (44)
7,12}: E.((z>2(T,- > - u.(T,-)>21T1 = 0.20), (4.5)
N 1 3.1%.] :gﬂylm (T) 571171 (T)IS.5.'])2IS] = 0127(1). (46)
N 12.3.22 :21; E.((u2(a:)— E.{T2(x>IT, 2,112.11 = 0.20). (4.7)

Then, Assumption 4.2.2 holds.

The next result gives the asymptotic distribution of I under the alternative (4.2)

for any 7 including the case 7 = 0.

81

Proposition 4.2.1 Suppose the conditions (A.1)-(A5) and Assumption 4.2.2 hold.

Then, under Pry, N1/2(T — F)/T converges in distribution to a N(0, 1) r.v., where
2 _ 2
r — f u <x>lq1¢1(x>+q2¢2<x>1dx,

1/21 = 1710- F1)/91, $2 = F2(1— F2)/92, C11: N/n1 and (12 = N/n2-

Details of the proof of this result are similar to those appearing in K-Sh and left out
for the sake of brevity.

Remark 4.2.3 The above result suggests a test which rejects H02 for large values
of ITI. To implement such a test we need a consistent estimate +2 of 72. Given
such an estimator f2, we have under the above assumptions that N1/2(’j' — I‘)/7“'
is asymptotically standard normal under Pry, where I‘ = 0 under H02. Let now (I)
denote the standard normal distribution function and z a /2 be its (1 — a/2)-quantile.
Then a test that rejects H02 if |(N1/2’j')/%| 2 Zak/2’ has the asymptotic level a.
Moreover, from the above result, the asymptotic power of this test, under P7, is

1— <I>(za/2 — H) + <I>(—za/2 — n), where

_ f u(:r)'y(:r)d:1:.

K—
T

 

Note that the value of I: does not change if we replace u by cu, with c a positive
constant.

Remark 4.2.4 Optimal u. Similar to K-Sh, the optimal u can be achieved such
that it maximizes the asymptotic power, or equivalently the function It, under (4.2)

for a speciﬁc function 7. An application of the Cauchy-Schwarz inequality shows that

82

n is maximized by the choice

it. = VII
7 q1w1(x)+ q2¢2(z)’

 

(4.8)

u:

and the maximal value of r; is

2(x)I (2:) 1/2
W = (/ (111/1135) +€2¢2($)dx) '

 

The optimal ury depends on the sample sizes, the density functions 91 and 92, and

the distribution functions F1 and F2. Next, we shall present the estimates of gk, 0%,

7'2 and ”IV

Estimates of 9k, 0120’ T2 and Vk. Similar to K-Sh, estimates of Vk, k = 1,2,
can be found for ﬁxed given u and for the (unknown) optimal u = u7. For this we
need estimates of the inspection time densities and variance function.

The inspection time densities 91 and 92 can be estimated by the kernel density

estimates

"1

. 1

91m = a Z Kh1(t — 5,),
i=1

. 1
92(t) = -- Z Kh2(t — Tj), t 6 RN,
With bandwidth hk, k = 1, 2. Its expected value is
5k“) = /9k(t+hky)K(y)d(3/). t6 R+.

Lemma 4.2.2 Suppose (A2), (A4) hold and the bandwidth hk is such that hk —) 0

and hkng ——> 00 for some c < 1. Then the following hold:

sumezlékm - §k(t)| = 0p7(1)7 (4-9)

/(‘g‘k(t) — gk(t))2dt _. 0. (4.10)

83

Details of the proof of this result are similar to those appearing in K-Sh and left out

for the sake of brevity.
Next, consider the following estimator of 0,26, k = 1, 2:

Tl .
62(t) = 2,2110% 7'; H1(Si))2K61(t — Si)
Ziil K610E " 51;)
n2 ._ . . 2 __ .
Zj=1(’7] “2(TJ» K62“ T3) t6 IR+

.2
02“) z n
Zjil K02“ " Tj)

where uk, 1: = 1,2 is the kernel regression estimate
n
,(t) 2.116.Kb,<t- s.)
#1 = n ’
2,211 K1210 “ 52')
n2
2 77 'Kb (t - T')
J 2 J t 6 EN.

.2 i=1
2.7:} Kb2(t - Tj)

Here the bandwidths bk: and Ck are chosen such that bk ——> 0, Ck ——> 0,11%!”c + Ck) —)

00, for some c < 1%. The following lemma gives the needed properties of this estimator

It follows from Lemma 3.3 of K-Sh.

Lemma 4.2.3 Suppose (A.2)—(A.5) hold, and the conditional fourth moment ”k is
2 is essentially CCV on I for 0% and

bounded on an open interval I. Then 6k
(4.11)

superIo?C -— oil 2 019.7(1), k = 1,2.

 

 

Now, consider the following estimator of variance function T2:
111 .2 n2 ~2 .
. 1 u (S) . 1 u (T ).
T2 = (11;.— Z .2 z ”i629 + ”n— .2 J “22(le (4'12)
1i=1 91(Si) 2 j=1 {12(le

The following Lemma proves the consistency of this estimator.

84

Lemma 4.2.4 Suppose the assumptions of Proposition 4.2.1 and Lemma 4.2.3 hold
2 = 7'2 +0p7(1), where

and a be a uniformly consistent estimator of u on I. Then, i

2 __ 2 + 2
7' — Tl 72
u2(x)F1($)(1 — F1($))d01($) + (12 / u2(x)F2(a:2)((1)- F2($))d02(x).
92 33

= (11/ 9%(93)

Proof. Note that (4.12) can be written as 7"2 = i1 + 1‘2, where

 

 

Let f2 = $12 + i22, where

 

1 n1 112(3))

~2
T1 = ‘11— .
"1 2:21 9%(52')

6%(‘S'Z)a

 

In order to prove $2 = 72 + op7(1), it sufﬁces to prove that
(4.13)

2 + emu), %2 = 72 + 0,9,0).

.2

T=T

For the ﬁrst claim in (4.13), it sufﬁces to show that
(4.14)

2

1 =

+ rf+op,(1), ‘22 =e22+op,(1).

By the choice of n1 and n2, 0 < ql < 1. Thus, for the ﬁrst claim in (4.14), it sufﬁces

 

 

 

 

to show that
n1 .2 2 «2
1 [u (32;) - u (3010160
51— : [ .209) = 0mm). (4.15)
i=1 91 2
Now, the left hand side of (4.15) is bounded above by
«2
. 0 (a?)
sup [who — uze): .; l =-— 0127(1),
:36 91(55)

85

by uniform consistency of a, 61, 91, (A2) and (A.3). This completes the proof of
the ﬁrst claim in (4.14). The proof of the second claim in (4.14) is similar, thereby
completing the proof of the ﬁrst part of (4.13).

To prove the second claim in (4.13), it sufﬁces to show that

12 = T12 + op,(1), “22 = 13+ 01.70). (4.16)

By the choice of n1 and n2, 0 < ‘11 < 1. Thus, for the ﬁrst claim in (4.16), it suffices

to Show that

n .
i i aha-was.) _ / man) dc (x)
"1 - 612(5 ') x 1

i=1 1 Z
By the Law of Large Numbers,

" MS )4 he) 42(x>a%(x)
—— n11: / —T——dGl (at), in probability.
924(3 ) 91W)

Thus, to prove (4.17), it remains to prove that

n .
i 21: 1,2,5.) [4%) _ sass]
n1 ,2, ‘ 4345.) gas.)
By the triangle inequality, the left hand side of (4.18) is bounded above by the sum

 

 

 

 

 

 

 

of the following two terms:

1 "“1
"119

1 2 1
A := —— s o , —— .
2 ”1 22111 ( )1(S)l91(154) £090]

 

[ 2(8.)— ms») ,

 

 

 

n
Note that, for uknown andu = u7, ”—1122- =11u2(SZ-) = 0197(1), and by (A2),
infxel'gl (3:) > 0. Hence

.2111

o :1: 2(2:

AISsup|1(?2—(I1 ”172112“,
xEI 91(3) i=1

= 0P7(1)0P7(1)— _ 0137(1))

86

 

by uniform consistency of 61 and (4.11). Similarly, because 0%(S) < 1, VS,

 

l§¥(x)—g¥(x)l 1 ”1 2
--—- u Si
er £04440) ”1.; ( )
= op,(1)op7(1>=om<1>,

242$

by uniform consistency of 61 and (A2). This completes the proof of the ﬁrst claim
in (4.16). The proof of the second claim in (4.16) is similar, thereby completing the
proof of Lemma 4.2.4.

Remark 4.2.5 Estimation of ”k' Estimation of ”k when u is known. As-
sume (A.1), (A2) and (A4) hold. Then Vk can be estimated by ﬁ/gk with his
as mentioned in Lemma 4.2.2. We shall now show that these estimates satisfy the
assumptions of Lemma 4.2.1 and hence Assumption 4.2.2. It follows from (A2) that
gk(t) > 4,6 for all t E I and for some ,6 > 0. Thus, by (A4), §k(t) > 26 for all t E I.
In view of (4.9), 17k = ﬁ/(ﬁk V6) is a modiﬁcation of 19k. We then obtain (4.3) from
the boundedness of u, while (4.4) and (4.5) follows from (4.9) and (4.10). Of course,
(4.6) and (4.7) holds as 17k does not depend on 6 and 1).

Estimation of Vk when u = u/y. Here we shall discuss the estimation of ”k =
WIT/9k) where '7 is a known non negative continuous function. In view of (4.8), an
obvious estimate of 11.7 is

. , where ibk=:—, k=1,2.

. _ 711
’7 — A
(11¢1 + 021/12 9k

21.

Similar to K-Sh, we can easily verify the assumptions of Lemma 4.2.1 for Vk =

M/gk by using Lemma 4.2.2 and 4.2.3.

87

4.3 Simulations

This section examines the Monte Carlo comparison of the test statistics I and C V2
based on 10,000 replications. For simplicity we took I = (0,5) and u(:r) = 11-. The
simulations are done using Matlab. The kernel function used for w, 91 and 92 in the
simulations is g(l — 2:2)1 (le S 1). Let c1 be the bandwidth used for w. Similar
to K-Sh, the values chosen for CI are 0.2 and 0.25. Also the bandwidths used for
densities 91: 92 in the simulations are hl = h2 = c2(log(n)/n)1/5. In the tables
below, exp(A) denote the exponential distribution with parameter /\ and wei(a,b)
represents the weibull distribution with density w(t) :2 ba‘btb_1ezp(—t/a)b. The
asymptotic level is taken to be 0.05 in all the cases. We used 7:2 of (4.12) to compute
I. The entries in the table for I test statistic are obtained by computing the number
of (If) 2 1.96)/10,000.

Since the asymptotic distribution of CV2 is not known, so in order to ﬁnd the
Monte Carlo levels and the Monte Carlo powers of this test, we need to estimate its
cut off points. Estimated cut off points are obtained by ﬁrst getting 10,000 values
of CV2 and then by ﬁnding percentiles from the distribution of these 10,000 values.
After that, for CV2, the signiﬁcance levels and powers are obtained by computing the
number of (CV2 2 estimated cut off point) / 10, 000.

Table 4.1 summarizes the empirical levels for test statistic I when sample sizes
for both the populations are the same with chosen values of Cl and c2. The sample
sizes chosen here are 50, 100 and 200. It shows that as the sample size increases the

simulated levels are getting closer to the asymptotic level 0.05.

88

Table 4.2 represents the empirical levels for test statistic I when sample sizes for
the two populations are not the same for all the chosen values of Cl and c2 and chosen
inspection time densities. It shows that the simulated levels are consistent with the
asymptotic theory when sample sizes are not the same for the two populations.

Table 4.3 shows the simulated power of I for six different alternatives and chosen
values of Cl and c2 when sample size for both the populations is 50. It shows that
the power is getting better as the parameter of exponential distribution increases.

Table 4.4 represents the simulated 95th, 97.5th, 99th, 99.5th and 90th percentiles
of CV2 for sample sizes 40, 80, 100, 200 when distribution of X, Y is exp(1) and
distribution of S, T is exp( 1.5). Table 4.5 represents the simulated signiﬁcance level
by using the corresponding simulated percentiles given in table 4.4 for sample sizes 40,
80, 100, 200 when distribution of X, Y is exp(1) and distribution of S, T is exp(1.5).
It shows that the simulated signiﬁcance levels of CV2 for different chosen sample sizes
are very close to the true nominal size.

Table 4.6 represents the simulated 95th percentile of C V2 for sample sizes 50, 100,
200 and for all the chosen inspection time densities.

Table 4.7 shows comparison of simulated signiﬁcance levels for I and CV2 for
different inspection time densities and different sample sizes. For the simulated sig-
niﬁcance levels of CV2 we used the percentiles given in Table 4.6. It shows that the
empirical levels of statistics CV2 is better than I, when sample size is small. But
when sample size is large, then the results of I and CV2 are close to each other.

Table 4.8 represents the comparison of power between I and CV2 for different

chosen alternatives and sample sizes. For the power of CV2 we used the percentiles

89

given in Table 4.6. It shows that the power of statistics I is better than CV2, when
sample size is small. But when sample size is large, then power of statistics I and

CV2 is comparable to each other.

Table 4.1: Empirical sizes of I, X, Y ~ exp(1), S,T ~ exp(1)

 

 

 

 

 

 

 

cl,c2 n1=n2=50 n1=n2=100 n1=n2=200
0.2, 0.6 0.1151 0.0852 0.062
0.2, 0.9 0.1112 0.0812 0.0564
0.25, 0.8 0.0763 0.0698 0.0585
0.25, 0.9 0.0843 0.0655 0.0595
0.25, 1 0.1016 0.086 0.052

 

 

 

 

 

 

Table 4.2: Empirical sizes of I, X, Y N exp(1), (n1, n2) = (180, 200)

 

 

 

 

 

 

c1, c2 S,T := exp(1.5) S,T :2 exp(1) S := exp(1),T := exp(1.5)
0.2, 0.6 0.061 0.0589 0.062
0.2, 0.9 0.056 0.0551 0.0573
0.25, 0.8 0.0573 0.058 0.0549
0.25, 0.9 0.0598 0.0586 0.059
0.25, 1 , 0.0535 0.0522 0.0560

 

 

 

 

 

 

 

 

90

 

Table 4.3: Power of I, S,T ~ exp(1), X ~ exp(1), n1 2 n2 = 50

 

 

 

 

 

 

 

 

 

 

 

 

 

c1, c2\Y exp(.5) exp(1.5) exp(2) exp(3) exp(4) exp(5)
0.2, 0.6 0.5381 0.2779 0.5762 0.9095 0.9634 0.9916
0.2, 0.9 0.4875 0.2767 0.5454 0.8891 0.9688 0.9940
0.25, 0.6 0.5253 0.2684 0.5960 0.9197 0.9772 0.9965
0.25, 0.8 0.5213 0.2777 0.5687 0.8980 0.9800 0.9913
0.25, 0.9 0.5227 0.2739 0.5900 0.8732 0.9761 0.9947
0.25, 1 0.5040 0.2645 0.5459 0.8943 0.9676 0.9848

 

 

Table 4.4: Simulated percentiles of CV2, X, Y ~ exp(1), S, T ~ exp(1.5)

 

 

 

 

 

 

 

 

 

 

 

Percentile\n1 = n2 40 80 100 200
99.5 0.1896 0.108 0.0866 0.0466
99 0.1433 0.0934 0.0755 0.041
97.5 0.1413 0.0787 0.0649 0.0318
95 0.1189 0.0671 0.0563 0.0321
90 0.0974 0.0571 0.0471 0.0275

 

Table 4.5: Empirical sizes of CV2, X, Y ~ exp(1), S, T ~ exp(1.5)

 

 

 

 

 

 

 

 

 

 

 

 

True level\n1 = n2 40 80 100 200
0.005 0.00498 0.0053 0.0051 0.0049
0.01 0.0099 0.0105 0.011 0.0121
0.025 0.02456 0.0254 0.0255 0.0249
0.05 0.05 0.0502 0.0501 0.0510
0.1 0.1022 0.1015 0.1014 0.0998

 

91

 

 

 

Table 4.6: Simulated 95th percentile of CV2, X, Y ~ exp(1).

 

 

 

 

 

 

 

 

Dist. of S, T n1=n2=50 n1=n2=100 n1=n2=200
exp(1) 0.0999 0.0551 0.031
exp(1.5) 0.1008 0.0563 0.0321
exp(1), exp(1.5) 0.1011 0.0556 0.0311

 

 

 

 

Table 4.7: Empirical sizes, X, Y ~ exp(1), (c1, c2) 2 (25,1).

 

 

 

 

 

 

 

S, T exp(1.5) exp(1) exp(1), exp(1.5)

n1 = n2 T CV T CV T CV
50 0.1013 0.0510 0.0965 0.0486 0.1115 0.0480
100 0.853 0.0497 0.0729 0.0494 0.0876 0.0504
200 0.0521 0.0482 0.0559 0.0466 0.058 0.0465

 

 

 

 

 

 

Table 4.8: Power, S,T ~ exp(1), X N exp(1), (c1, c2) = (.2, .9).

 

 

 

 

 

 

 

 

 

 

 

 

n1=n2 50 100 200
Dist. 61v T CV T CV T CV
exp(0.5) 0.5016 0.2845 0.7067 0.5328 0.8941 0.8465
exp(1.5) 0.2677 0.1478 0.4136 0.2365 0.6511 0.4085
exp(2) 0.5624 0.3606 0.8223 0.6200 0.9756 0.8699
exp(3) 0.7976 0.7209 0.9812 0.9429 1 1

w(.2,1) 0.9468 0.9488 0.9997 0.9993 1 1

w(.5,1) 0.5487 0.3876 0.8281 0.6586 0.9804 0.8999
w(1.5,1) 0.2138 0.1298 0.3438 0.2129 0.5120 0.3922
w(2, 1) 0.4445 0.3086 0.6568 0.5418 0.8673 0.8488

 

 

 

 

 

92

 

 

BIBLIOGRAPHY

[1] Ayer, M.; Brunk, H.D.; Ewing, G.M.; Reid, W.T.; Silverman, E. (1955). An
empirical distribution function for sampling with incomplete information. Ann.
Math. Statist. 26, 641-647.

[2] Beran, R.J. (1977 ) Minimum Hellinger distance estimates for parametric models.
Ann. Statist. 5, 445-463.

[3] Carroll, R.J.; Hall, P. (1992). Semiparametric comparison of regression curves
via normal likelihoods. Austral. J. Statist. 34, 471-487.

[4] Delgado, MA. (1993). Testing the equality of nonparametric regression curves.
Statist. Probab. Lett. 17, 199-204.

[5] Diamond, I.D.; Mcdonald, J. W.; Shah, I. H. (1986). Proportional hazards models
for current status data: application to the study of differentials in age at weaning
in Pakistan Demography, 23, 607-620.

[6] Diamond, I.D.; Mcdonald, J. W. (1991). Analysis of current status data, In
Demographic Applications of Event History Analysis (J fﬂussell, R. Hankinson
and J .Tilton, Eds), 231-252. Oxford Univ. Press.

[7] Finkelstein, D.M.; Wolfe, RA. (1985). A semiparametric model for regression
analysis of interval—censored failure time data. Biometrics, 41, 933-945.

[8] Finkelstein, D.M. (1986). A proportional hazards model for interval-censored
failure time data. Biometrics, 42, 845—854.

[9] Groeneboom, P.; Wellner, J. A. (1992). Information bounds and nonparametric
maximum likelihood estimation. DMV Seminar, 19, Birkhauser Verlag , Basel.

[10] Hall, P. (1984). Central limit theorem for integrated square error of multivariate
nonparametric density estimators. J. Mult. Analysis. 14, 1-16.

93

[11] Hall, P.; Hart, J. D. (1990). Bootstrap test for difference between means in
nonparametric regression. J. Amer. Statist. Assoc. 85, 1039-1049.

[12] Hart, JD. (1997). Nonparametric smoothing and lack-of-ﬁt tests. Springer-
Verlag, New York, Inc.

[13] Hoel, D. G.; Walburg; H. E. (1972). Statistical analysis of survival experiment.
J. National Cancer Institute, 49, 361-372.

[14] Jewell, N. P.; Van der Laan, M. (2004). Current status data : review, recent
developments and open problems. Advances in survival analysis, 625-642, Hand-
book of Statist., 23, Elsevier, Amsterdam.

[15] Keiding, N. (1991). Age-specific incidence and prevalence: A statistical perspec-
tive (with discussion) J.Roy. Statist. Soc. Ser. A., 154, 371-412.

[16] Khmaladze, E .V.(1981) Martingale approach in the theory of goodness-of-ﬁt
tests. Theor. Probability Appl. 26, 240-257.

[17] King, E; Hart, J .D.; Wehrly, TE. (1991). Testing the equality of two regression
curves using linear smoothers. Statist. Probab. Lett. 12, 239-247.

[18] Koul, H. L.; Ni, P. (2004). Minimum distance regression model checking, J. Stat.
Plann. Inference, 119, No.1, 109-141.

[19] Koul, H. L.; Schick, A. (1997). Testing for the equality of two nonparametric
regression curves. J. Statist. Plann.Inference. 65, 293—314.

[20] Koul, H. L.; Schick, A. (2003). Testing for superiority among two regression
curves. J. Statist. Plann. Inference. 117, 15-33.

[21] Koul, H. L.; Song, W. (2008) Minimum distance regression model checking with
Berkson measurement errors. To appear in Ann. Math. Statist.

[22] Koul, H. L.; Yi, T. (2006). Goodness-of—ﬁt testing in interval censoring case 1.
Statist. Probab. Letters. 76, 709-718.

[23] Kulasekera, K. B. (1995). Comparison of regression curves using quasi-residuals.
J. Amer. Statist. Assoc. 431, 1085—1093.

[24] Mack, Y.P.; Silverman, B.W. (1982). Weak and strong uniform consistency of
kernel regression estimates, Z. Wahrsch. Gebiete, 61, 405-415.

94

[25] Neumeyer, N.; Dette, H. (2003). Nonparametric comparison of regression curves:
an empirical process approach. Ann. Statist. 31, 880-920.

[26] Ni, P. (2002). Minimum distance regression and autoregressive model ﬁtting. In
Thesis. The Department of Statistics and Probability, Michigan State University.

[27] Shen, X. (2000). Linear regression with current status data. J. Amer. Statist.
Assoc. 451, 842-852.

[28] Stute, W.; Thies, S.; Zhu, L.X. (1998). Model checks for regression: an innovation
process approach. Ann. Statist. 26, 1916-1934.

95