TESTING OF REGRESSION FUNCTIONS WHEN RESPONSES ARE MISSING AT RANDOM

By

Xiaoyu Li

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Statistics

2012

ABSTRACT

TESTING OF REGRESSION FUNCTIONS WHEN RESPONSES ARE MISSING AT RANDOM

By Xiaoyu Li

This thesis consists of two chapters. The first chapter proposes a class of minimum distance tests for fitting a parametric regression model to a regression function when some responses are missing at random. These tests are based on a class of minimum integrated square distances between a kernel type estimator of the regression function and the parametric regression function being fitted. The estimators of the regression function are based on two completed data sets constructed by the imputation and inverse probability weighting methods. The corresponding test statistics are shown to have asymptotic normal distributions under the null hypothesis. Some simulation results are also presented.

The second chapter considers the problem of testing the equality of two nonparametric regression curves against a one-sided alternative, based on two samples with possibly distinct design and error densities, when responses are missing at random. This chapter proposes a class of tests using imputation and covariate matching. The asymptotic distributions of these test statistics are shown to be Gaussian under the null hypothesis and under a class of local nonparametric alternatives. The consistency of these tests against a large class of fixed alternatives is also established. This chapter also includes a simulation study, which assesses the finite sample behavior of a member of this class of tests.

Copyright by
XIAOYU LI
2012

ACKNOWLEDGMENTS

I would like to sincerely and gratefully thank my advisor, Professor Hira L. Koul, for his excellent guidance and great patience during the past five years. He sets an example for my career with his great enthusiasm for science, his serious attitude, his hard work, and his extraordinary kindness to students. His love of statistics and mathematics encourages me to keep working in my research.

I also wish to thank Professors Vidyadhar S. Mandrekar, Yimin Xiao, and David Todem for serving on my dissertation committee. Special thanks to Professors Lijian Yang and Yimin Xiao for their help in my graduate study and life. Many thanks to Professor Vidyadhar S. Mandrekar for his teaching and encouragement, and to Professor James Stapleton for his help since the first day I came to Michigan State University. I am grateful to the Department of Statistics and Probability for offering an assistantship that supported me through my graduate studies.

Finally, I would like to thank my family for their love, which enabled me to complete this work and pursue my career goal.

This research was supported by NSF Grant DMS 0704130, P.I. Professor Hira L. Koul.

TABLE OF CONTENTS

List of Tables

Chapter 1  Minimum Distance Regression Model Checking when Responses are Missing At Random
  1.1  Introduction
  1.2  Assumptions
  1.3  Consistency of the minimum distance estimators
  1.4  Asymptotic distribution of the minimum distance estimators under H0
  1.5  Asymptotic distribution of the test statistics under H0
  1.6  Simulations

Chapter 2  Testing for Superiority of Two Regression Curves when Responses are Missing At Random
  2.1  Introduction
  2.2  Assumptions
  2.3  Asymptotic distribution of the test statistic under H0, H1N, and H1
  2.4  Some suggested estimators
  2.5  Simulations

Bibliography

LIST OF TABLES

Table 1.1  Empirical sizes and powers for model 0 vs. models 1-4 with X ∼ N(0, V1) and ε ∼ N(0, (.3)²)
Table 1.2  Empirical sizes and powers for model 0 vs. models 1-4 with X ∼ N(0, V2) and ε ∼ N(0, (.3)²)
Table 1.3  Mean and s.d. of θ̂n1 under model 0 with X ∼ N(0, V1), ε ∼ N(0, (.3)²), and E(δ|X = x) = ∆1(x)
Table 2.1  Empirical sizes of V̂, with coefficients ρ1, ρ2, ρ3, and ∆l = Dl, l = 1, 2
Table 2.2  Empirical sizes of V̂, with coefficients ρ1, ρ2, ρ3, and ∆l = 1, l = 1, 2
Table 2.3  Empirical powers of V̂ with ρ1, ρ2, ρ3 as in Table 2.1, and ∆l = Dl, l = 1, 2
Table 2.4  Empirical powers of V̂ with ρ1, ρ2, ρ3 as in Table 2.2, and ∆l = 1, l = 1, 2
Table 2.5  Empirical sizes and powers of V̂ with ρ1 = ρ2 = ρ3 = 1 and ∆l = Dl, l = 1, 2
Table 2.6  Empirical sizes and powers of V̂ with ρ1 = ρ2 = ρ3 = 1 and ∆l = 1, l = 1, 2

Chapter 1

Minimum Distance Regression Model Checking when Responses are Missing At Random

1.1  Introduction

In this chapter, we discuss a class of minimum distance tests for fitting a parametric model to the regression function, based on the imputation and inverse probability weighting methods, when responses are missing at random. To be specific, let X be an explanatory variable of dimension d, d ≥ 1, let Y be a one-dimensional response variable with E|Y| < ∞, and let δ be the indicator of whether the response is observed, i.e., δ = 1 if Y is observed and δ = 0 if Y is missing. The missing mechanism is missing at random (MAR): δ and Y are conditionally independent given X, i.e.,

    P(δ = 1 | Y, X) = P(δ = 1 | X),  a.s.;

see Little and Rubin (1987). Let µ(x) = E(Y | X = x), x ∈ R^d, denote the regression function, and consider the regression model

    Y = µ(X) + ε,                                                     (1.1)

with responses missing at random. Let {m_θ(·) : θ ∈ Θ}, Θ ⊂ R^q, be a given parametric model and let I be a compact subset of R^d. The problem of interest is to test the hypothesis

    H0 : µ(x) = m_{θ0}(x), for some θ0 ∈ Θ and for all x ∈ I,   vs.   H1 : H0 is not true,

based on the random sample {(Xi, δi Yi) : i = 1, 2, ..., n} from the distribution of (X, δY) in model (1.1). One is also interested in finding the parameter θ ∈ Θ that best fits the data under the null hypothesis.
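To fix ideas, the following minimal sketch (not part of the thesis) generates data from model (1.1) under this missing at random mechanism. The linear null model with θ0 = (0.5, 0.8)', the design covariance V1, the N(0, (.3)²) errors, and the logistic selection probability ∆1 are the choices made later in the simulation study of Section 1.6; all function and variable names below are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def m_theta(x, theta):
    # Parametric null model: m_theta(x) = theta' x (the linear model of Section 1.6).
    return x @ theta

def Delta1(x):
    # P(delta = 1 | X = x): the logistic selection probability Delta_1 of Section 1.6.
    return 1.0 / (1.0 + np.exp(-0.8 - 0.5 * x[:, 0] - 0.5 * x[:, 1]))

n, d = 200, 2
theta0 = np.array([0.5, 0.8])
V1 = np.diag([0.36, 1.0])
X = rng.multivariate_normal(np.zeros(d), V1, size=n)
eps = rng.normal(scale=0.3, size=n)
Y = m_theta(X, theta0) + eps                  # model (1.1) under H0
delta = rng.binomial(1, Delta1(X))            # MAR: delta depends on X only, not on Y
Y_obs = np.where(delta == 1, Y, np.nan)       # only (X_i, delta_i * Y_i) is observed
```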
Regression model checking when data are completely observed is a classical problem in statistics. Many interesting results are available; see, e.g., Eubank and Spiegelman (1990), Eubank and Hart (1992, 1993), Härdle and Mammen (1993), Zheng (1996), Hart (1997), Stute et al. (1998), Koul and Ni (2004), Koul and Song (2009), and Koul (2011), among others. Hart (1997) summarized numerous testing procedures. Koul and Ni (2004) (K-N) proposed a class of tests based on certain minimized L2 distances between a nonparametric estimator of the regression function and the parametric model being fitted. They proved asymptotic normality of the minimum distance estimators and the proposed test statistics under the fitted model, and consistency of the proposed tests against a class of fixed alternatives. Koul and Song (2009) extended this minimum distance methodology to the regression model with Berkson measurement errors. They also obtained the asymptotic power of the proposed tests against a class of local alternatives. Koul (2011) implemented the minimum distance methodology in the classical regression model with non-random design, uniform on [0, 1]. Sun and Wang (2009) considered the model checking problem when data are missing at random. They constructed complete data sets by the imputation and inverse probability weighting methods, and proposed two score-type and two empirical process based test statistics. The asymptotic behavior of these test statistics was investigated under the null hypothesis and local alternatives.

In this chapter we focus on adapting the minimum distance testing method of K-N to the missing at random setup, when the data are completed by the imputation and inverse probability weighting methods. To describe the testing procedure, we need to estimate µ(x). Since, under H0, µ is parametric, we only need to estimate θ0 at a √n-consistent rate. Let α̂n be such an estimator of θ0 based on the random sample. A suggested choice of α̂n is given in Theorem 1.4.1, Section 1.4 below. Let K̃ be a symmetric kernel function on [−1, 1]^d, let b = bn be a bandwidth sequence of positive numbers, and let K̃_b(y) := b_n^{−d} K̃(y/b_n), y ∈ R^d. For x ∈ R^d, let ∆(x) = P(δ = 1 | X = x), and

    ∆̂n(x) = [ Σ_{i=1}^n δi K̃_b(x − Xi) ] / [ Σ_{i=1}^n K̃_b(x − Xi) ].

Note that ∆̂n(x) is the Nadaraya-Watson kernel estimator of ∆(x). We construct two complete data sets {(Xi, Ŷij), i = 1, ..., n}, j = 1, 2, by the imputation and inverse probability weighting methods, respectively, where

    Ŷi1 = δi Yi + (1 − δi) m_{α̂n}(Xi),  i = 1, ..., n;                                      (1.2)

    Ŷi2 = [δi / ∆̂n(Xi)] Yi + [1 − δi / ∆̂n(Xi)] m_{α̂n}(Xi),  i = 1, ..., n.                   (1.3)

To proceed further, let K and K* be kernel functions on [−1, 1]^d, with K_h(y) := h^{−d} K(y/h) and K*_w(y) := w^{−d} K*(y/w); let h = hn and w = wn be window width sequences of positive numbers, and let G be a σ-finite measure on R^d with Lebesgue density g. Assume the design variable X has a uniformly continuous Lebesgue density f that is bounded from below on I. Define

    f̂_h(x) = n^{−1} Σ_{i=1}^n K_h(x − Xi),    f̂_w(x) = n^{−1} Σ_{i=1}^n K*_w(x − Xi),   x ∈ R^d,

where hn ∼ n^{−a} with 0 < a < min(1/(2d), 4/(d(d + 4))), and wn ∼ (log n/n)^{1/(d+4)}. Adaptive versions of the L2 distances proposed in K-N in the current setup are

    T̂nj(θ) = ∫_I [ n^{−1} Σ_{i=1}^n K_h(x − Xi)(Ŷij − m_θ(Xi)) ]² {f̂_w(x)}^{−2} dG(x),   θ ∈ R^q,

and the corresponding minimum distance estimators are

    θ̂nj := arg min_{θ∈Θ} T̂nj(θ),   j = 1, 2.

The proposed tests of H0 are to be based on T̂nj(θ̂nj), j = 1, 2.
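Schematically, the procedure consists of completing the data via (1.2)-(1.3) and then minimizing the integrated squared distance T̂nj. The sketch below (ours, not the thesis code) builds on the data-generating sketch above; it uses the product Epanechnikov kernel and takes g ≡ 1 on I, as in Section 1.6, and it approximates the G-integral on a grid over I. The helper names, the grid approximation, and the numerical safeguards are illustrative choices, not part of the thesis.

```python
import numpy as np
from scipy.optimize import minimize

def kern(u):
    # Product Epanechnikov kernel on [-1, 1]^d.
    u = np.atleast_2d(u)
    inside = np.all(np.abs(u) <= 1.0, axis=-1)
    return np.where(inside, np.prod(0.75 * (1.0 - u ** 2), axis=-1), 0.0)

def delta_hat(x, X, delta, b):
    # Nadaraya-Watson estimator of Delta(x) = P(delta = 1 | X = x).
    w = kern((x - X) / b)
    return np.sum(w * delta) / max(np.sum(w), 1e-12)

def completed_responses(X, Y, delta, alpha_hat, b, m):
    # Completed data sets (1.2) (imputation) and (1.3) (inverse probability weighting);
    # missing responses may be stored as NaN in Y.
    m_fit = m(X, alpha_hat)
    Y1 = np.where(delta == 1, Y, m_fit)
    Dhat = np.array([delta_hat(x, X, delta, b) for x in X])
    w = delta / Dhat
    Y2 = w * np.nan_to_num(Y) + (1.0 - w) * m_fit
    return Y1, Y2

def T_hat(theta, X, Ycomp, grid, h, w_bw, m, vol_I):
    # T^_nj(theta): the G-integral is approximated on a grid of points in I, with g == 1.
    d = X.shape[1]
    cell = vol_I / len(grid)
    total = 0.0
    for x in grid:
        Kh = kern((x - X) / h) / h ** d
        Kw = kern((x - X) / w_bw) / w_bw ** d
        f_w = np.mean(Kw)                         # f^_w(x)
        s = np.mean(Kh * (Ycomp - m(X, theta)))   # n^{-1} sum_i K_h(x - X_i)(Y^_ij - m_theta(X_i))
        total += (s ** 2) / max(f_w, 1e-12) ** 2 * cell
    return total

# The minimum distance estimator theta^_nj minimizes T^_nj over Theta, e.g.
# theta_hat = minimize(T_hat, x0=alpha_hat,
#                      args=(X, Y1, grid, h, w_bw, m_theta, vol_I)).x
```

To proceed further, we need more notation.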
Let ˆ εij := Yij − mθ (Xi ), ˆ ˆ nj 4 j = 1, 2, (1.4) n ˆ Cnj := n−2 i=1 I 2 Kh (x − Xi )ˆ2 {fw (x)}−2 dG(x), εij ˆ ˆ Γnj := 2hd n−2 i=k I ˆ Kh (x − Xi )Kh (x − Xk )ˆij εkj {fh (x)}−2 dG(x) ε ˆ ˆ ˆ ˆ ˆ ˆ 1/2 Dnj := nhd/2 (Tnj (θnj ) − Cnj )/Γnj , 2 , j = 1, 2. ˆ For each j = 1, 2, the proposed test rejects H0 whenever |Dnj | is large. Asymptotic normality ˆ ˆ of n1/2 (θnj − θ0 ) and Dnj , j = 1, 2, under H0 are established in Section 1.4 and Section 1.5, ˆ respectively. Consistency of θnj , j = 1, 2, under H0 is given in Section 1.3. Assumptions and preliminary lemmas needed to prove all these results are stated in Section 1.2, while Section 1.6 is devoted to simulation studies. In the sequel, we write h for hn , w for wn , and b for bn ; the integrals with respect to the G-measure are understood to be over the set I; all limits are taken as n → ∞, unless specified otherwise; for any two sequences of real numbers an and bn , notation an ∼ bn means that an /bn → 1; the convergence in probability is denoted by →p , in distribution, by →d , and almost surely, by →a.s. ; the r-dimension normal distribution with mean vector a and covariance matrix B is denoted by Nr (a, B), and N (a, B) = N1 (a, B). Denoted by Φ the standard normal cumulative distribution function, and zα the (1 − α)-quantile. 1.2 Assumptions Here we shall state the needed assumptions. (e1) (Xi , δi Yi ); Xi ∈ Rd , Yi ∈ R, δi = 0 or 1, i = 1, 2, · · · , n, are i.i.d. random vectors with δ = 1, if Y is observed, and δ = 0, if Y is missing; δ and ε are conditionally independent, given X. 5 (e2) E(ε|X = x) = 0, Eε2 < ∞. The function σ 2 (x) := E(ε2 |X = x) is a.e. in (G) continuous on I, and ∆(x) := E(δ|X) = P (δ = 1|X = x) is positive and Lipschitzcontinuous of order 1 on an open interval containing I. (e3) E|ε|2+δ0 < ∞, for some δ0 > 0. (e4) Eε4 < ∞. (f1) The design variable X has a uniformly continuous Lebesgue density f that is bounded from below on an open interval containing I. (f2) The density f is twice continuously differentiable with a compact support. (g) G is a σ-finite measure on Rd and has a continuous Lebesgue density g. (k1) The kernels K and K ∗ are positive symmetric square integrable densities on [−1, 1]d . In addition, K ∗ satisfies Lipschitz-continuity of order 1. ˜ (k2) The kernel K is positive symmetric square integrable density on [−1, 1]d , satisfying ˜ Lipschitz-continuity of order γ, γ > 0. K(u) attains its maximum at u = 0. (m1) For each θ, mθ (x) is a.s. continuous in x w.r.t. integrating measure G. (m2) The parametric family of models mθ (x) is identifiable w.r.t. θ, i.e., if mθ1 (x) = mθ2 (x), for almost all x(G), then θ1 = θ2 . (m3) For some positive continuous function on I and for some β > 0, |mθ2 (x) − mθ1 (x)| ≤ θ2 − θ1 β (x), ∀θ2 , θ1 ∈ Θ, x ∈ I. (m4) The true parameter θ0 is an inner point of Θ. For every x, mθ (x) is differentiable in θ in a neighborhood of θ0 with the vector of derivatives mθ (x), such that for every ˙ 6 ε > 0, k < ∞, lim sup P mθ (Xi ) − mθ0 (Xi ) − (θ − θ0 )T mθ0 (Xi ) ˙ √ sup n θ − θ0 1≤i≤n, nhd θ−θ0 ≤k >ε = 0. (m5) The vector function x → mθ0 (x) is continuous in x ∈ I and for every ε > 0, ˙ there is an Nε < ∞ such that for every 0 < k < ∞, max P 1≤i≤n,(nhd )1/2 θ−θ0 ≤k (m6) n−1 h−d/2 mθ (Xi ) − mθ0 (Xi ) ≥ ε ≤ ε, ˙ ˙ n ˙T ˙ i=1 δi mθ0 (Xi )mθ0 (Xi ), ∀n > Nε . ˙θ n ≥ q, and E[δ mθ0 (X)mT (X)] are positive defi˙ 0 nite. (a) The estimator αn is ˆ √ n-consistent for θ0 under H0 . (b1) nbd → ∞, nbd+1 → 0. n n (b2) bn ∼ n−r , where 1/(d + 1) < r < 1/d. (h1) hn → 0. 
(h2) nh2d → ∞. n (h3) hn ∼ n−a , where 0 < a < min(1/(2d), 4/(d(d + 4))). (h4) hn ∼ n−a , where 0 < a < 1/d − r, with r in (b2). (w) wn = an (log n/n)1/(d+4) , an → a0 > 0. Note that (h3) implies (h1) and (h2), (h4) implies (h3), and (b2) implies (b1). Among these assumptions, (e3), (e4), (f1), (f2), (g), (k1), (m1)-(m5), (h1)-(h3), (w), and part of (e1) and (e2), are similar as in K-N when no data are missing; conditions on δ and ∆ in (e1) and (e2) 7 are for the missing data at random setup; (m6) and (a) are used for the imputation method, while (k2), (a), (b1), (b2), and (h4) are for the inverse probability weighting method. An example of r in (b2) and a in (h4) is r = (2d + 1)/(2d(d + 1)), a = 1/(2d(d + 1)). We need the following notation in the proofs later. For i = 1, · · · , n, j = 1, 2, x ∈ Rd , define ε∗ := i2 ε∗ := δi εi , i1 δi ε, ∆(Xi ) i ˜ Yi1 := mθ0 (Xi ) + ε∗ , i1 (1.5) ˜ Yi2 := mθ0 (Xi ) + ε∗ , i2 ∗ ∗ Kwi := Kw (x − Xi ), Khi (x) := Kh (x − Xi ), n n ˆ fh (x) := n−1 dψ(x) := ˜ ˜ Kbi (x) := Kb (x − Xi ), Khi (x), i=1 {f (x)}−2 dG(x), ˆ fw (x) := n−1 n ∗ Kwi (x), ˆ fb (x) := n−1 i=1 i=1 ˆ ˆ dψh (x) := {fh (x)}−2 dG(x), n µn (x, θ) := ˆ ˆ dψw (x) := {fw (x)}−2 dG(x), n n−1 Khi (x)mθ (Xi ), µn (x, θ) := ˙ n−1 i=1 n Zn (x, θ) := n−1 Khi (x)mθ (Xi ), ˙ i=1 Khi (x)(mθ (Xi ) − mθ0 (Xi )), i=1 µh (x) := E µn (x, θ0 ) = EKh (x − X)mθ0 (X), ˙ ˙ ˙ n µnδ (x, θ) := n−1 ˙ Khi (x)(1 − δi )mθ (Xi ), ˙ i=1 µhδ (x) := E µnδ (x, θ0 ) = EKh (x − X)(1 − δ)mθ0 (X), ˙ ˙ ˙ n Unj (x, θ) := n−1 ˜ Khi (x)(Yij − mθ (Xi )), i=1 n ˆ Unj (x, θ) := n−1 ˆ Khi (x)(Yij − mθ (Xi )), i=1 n Unj (x) := Unj (x, θ0 ) = n−1 Khi (x)ε∗ , ij i=1 n Tnj (θ) := ˜ Kbi (x), n−1 2 ˜ Khi (x)(Yij − mθ (Xi )) dψ(x), i=1 8 θ ∈ Rq , n n−1 ˜ Tnj (θ) := 2 ˆ ˜ Khi (x)(Yij − mθ (Xi )) dψw (x), θ ∈ Rq , i=1 ˜ ˜ θnj := arg min Tnj (θ), θ∈Θ ˜ εij := Yij − mθ (Xi ), ˜ ˜ nj n 2 n−1 ˜ Anj := ˆ ˜ Khi (x)(Yij − Yij ) dψ(x), i=1 n Cnj := n−2 2 ˜ Khi (x)(Yij − mθ0 (Xi ))2 dψ(x), i=1 rn (x) := ˆ 1 1 − , ˆ n (x) ∆(x) ∆ un := αn − θ0 , ˆ rni := rn (Xi ), ˆ ˆ dni := mαn (Xi ) − mθ0 (Xi ) − uT mθ0 (Xi ). ˆ n ˙ The following lemmas are found useful in proofs later. Lemma 1.2.1 is facilitated by Mack and Silverman (1982), and Lemma 1.2.3 is derived by Theorem 3 of Collomb and H¨rdle a (1986). Lemma 1.2.1. Under the conditions (f1), (k1), (h1), and (h2), the following hold. ˆ sup |fh (x) − f (x)| = op (1), (1.6) ˆ sup |fw (x) − f (x)| = op (1), (1.7) x∈I x∈I f (x) − 1 = op (1). ˆ x∈I fw (x) (1.8) sup Lemma 1.2.2. (Theorem 2.2 part (2), Bosq (1998)) Under the assumptions (f2), (k1), and (w), we have for ∀k > 0, and k ∈ N, ˆ (logk n)−1 (n/ log n)2/(d+4) sup fw (x) − f (x) → 0, x∈I 9 a.s. (1.9) Lemma 1.2.3. Suppose (e2), (f2), (k2), and (b1) hold, then ˆ sup |fb (x) − f (x)| = op (1), (1.10) ˆ sup |∆n (x) − ∆(x)| = op (1), (1.11) x∈I x∈I 1 1 − = op (1), ˆ n (x) ∆(x) x∈I ∆ 1 1 n1/2 bd/2 (log n)−1/2 sup − = Op (1). ˆ ∆(x) x∈I ∆n (x) sup 1.3 (1.12) (1.13) Consistency of the minimum distance estimators ˆ In this section we prove the consistency of the minimum distance estimators θnj , j = 1, 2, under H0 . To state the results, we need Lemma 3.1 in K-N as a preliminary reproduced here for the sake of completeness. Let L2 (G) denote a class of square integrable real valued functions on Rd with respect to G. Define ρ(ν1 , ν2 ) := (ν1 (x) − ν2 (x))2 dG(x), ν1 , ν2 ∈ L2 (G), and the map M(ν) := arg min ρ(ν, mθ ), ν ∈ L2 (G). θ∈Θ Lemma 1.3.1. (Koul and Ni (2004)) Let m satisfy conditions (m1)-(m3). Then the following hold. (a) M(ν) always exists, ∀ν ∈ L2 (G). 
10 (b) If M(ν) is unique, then M is continuous at ν in the sense that for any sequence of {νn } ∈ L2 (G) converging to ν in L2 (G), M(νn ) → M(ν), i.e., ρ(νn , ν) → 0 implies M(νn ) → M(ν) as n → ∞. (c) M(mθ (·)) = θ, uniquely for ∀θ ∈ Θ. ˆ We now proceed to state and prove the consistency of θnj , j = 1, 2. Theorem 1.3.1. Under H0 , (e1), (e2), (f1), (k1), (m1)-(m4), (a), (h1), and (h2), ˆ θnj →p θ0 , j = 1, 2. Proof. The basic idea of the proof is the same as in K-N, Theorem 3.1; Only details ˆ ˜ with respect to Yij − Yij , i = 1, · · · , n, are different. By part (c) in Lemma 1.3.1, one has ˆ θnj = M(mθ ), j = 1, 2, and θ0 = M(mθ0 ). Then it suffices to prove ρ(mθ , mθ0 ) = op (1), ˆ ˆ nj nj j = 1, 2, by part (b1) in Lemma 1.3.1. Define n mnj (x) := ˜ n−1 n ˜ ˆ Khi (x)Yij /fw (x), mnj (x) := ˆ i=1 n−1 ˆ ˆ Khi (x)Yij /fw (x), i=1 ˆ Rnj (θ) = [mnj (x) − mθ (x)]2 dG(x), ˆ θ ∈ Rq , Cn (θ) := ˆ ˆ [µn (x, θ) − fw (x)mθ ]2 dψw (x). ˆ ˆ βnj := arg min Rnj (θ), θ∈Θ By the fact that ˆ ˆ ˆ ρ(mθ , mθ0 ) ≤ 2[ρ(mθ , mnj ) + ρ(mnj , mθ0 )] = 2[Rnj (θnj ) + Rnj (θ0 )], ˆ ˆ ˆ ˆ nj nj 11 it suffices to show ˆ Rnj (θ0 ) = op (1), j = 1, 2, ˆ ˆ Rnj (θnj ) = op (1), (1.14) j = 1, 2. (1.15) If we can prove (1.14) and the following result ˆ ˆ sup |Tnj (θ) − Rnj (θ)| = op (1), j = 1, 2, (1.16) θ∈Θ ˆ we can obtain (1.15). This is because the definition of βnj and (1.14) lead to the result ˆ ˆ ˆ ˆ Rnj (βnj ) = op (1), which together with (1.16) leads to Tnj (βnj ) = op (1); by the definition of ˆ ˆ ˆ θnj , one has Tnj (θnj ) = op (1); this result and (1.16) bring the claim (1.15). Therefore, we only need to prove (1.14) and (1.16). ˜ Recall Anj from (1.5). To prove (1.14), note that ˆ [fw (x)(mnj (x) − mnj (x)) + Unj (x) ˆ ˜ ˆ Rnj (θ0 ) = ˆ ˆ +µn (x, θ0 ) − fw (x)mθ0 (x)]2 dψw (x) ≤ 3 +3 [mnj (x) − mnj (x)]2 dG(x) + 3 ˆ ˜ 2 ˆ Unj (x)dψw (x) ˆ ˆ [µn (x, θ0 ) − fw (x)mθ0 (x)]2 dψw (x) ˆ2 ˜ ˜ ≤ 3(1 + sup |f 2 (x)/fw (x) − 1|)Anj + 3Tnj (θ0 ) + 3Cn (θ0 ), j = 1, 2, x∈I By Fubini, the continuity of f , σ 2 , and ∆, assured by (e2) and (f1), and by (k1) and (h2), we have 12 E 2 Un1 (x)dψ(x) = n−1 2 EKh (x − X)∆(X)σ 2 (X)dψ(x) = O((nhd )−1 ), E 2 Un2 (x)dψ(x) = n−1 2 EKh (x − X){∆(X)}−1 σ 2 (X)dψ(x) = O((nhd )−1 ), so that Tnj (θ0 ) = 2 Unj (x)dψ(x) = Op ((nhd )−1 ), j = 1, 2. Together by (1.8), we have ˆ ˜ Tnj (θ0 ) ≤ sup |f (x)/fw (x)|2 Tnj (θ0 ) = Op ((nhd )−1 ), j = 1, 2. x∈I The claim Cn (θ0 ) = op (1) can be derived by the same argument as that of proving (3.5) in K-N. Note that for i = 1, · · · , n, ˆ ˜ Yi1 − Yi1 = (1 − δi )(mαn (Xi ) − mθ0 (Xi )), ˆ ˆ ˜ Yi2 − Yi2 = rni δi εi + 1 − ˆ (1.17) δi (mαn (Xi ) − mθ0 (Xi )) ˆ ∆(Xi ) −ˆni δi (mαn (Xi ) − mθ0 (Xi )). r ˆ Recall un and dni from (1.5). By calculation in (3.9) in K-N, (m4), and (a), we have |d |2 ˜ An1 ≤ 2 un 2 max ni 2 1≤i≤n un ˆ2 fh (x)dψ(x) n +2 un n−1 2 Khi (x)(1 − δi ) mθ0 (Xi ) ˙ 2 dψ(x) i=1 = op (1), (1.18) n ˜ An2 ≤ 3 +3 n−1 2 Khi (x)ˆni δi εi dψ(x) r i=1 n n−1 i=1 Khi (x) 1 − 2 δi (mαn (Xi ) − mθ0 (Xi )) dψ(x) ˆ ∆(Xi ) 13 n n−1 +3 2 Khi (x)ˆni δi (mαn (Xi ) − mθ0 (Xi )) dψ(x) r ˆ i=1 n ≤ 3 n−1 2 Khi (x)ˆni δi εi dψ(x) r i=1 +6 un 2 d2 ni max 1≤i≤n un 2 n n−1 Khi (x) 1 + i=1 n +6 un 2 +6 +6 n−1 Khi (x) 1 − δi ∆(Xi ) 2 δi ∆(Xi ) mθ0 (Xi ) ˙ 2 dψ(x) dψ(x) i=1 n 2 d2 ni Khi (x)δi dψ(x) n−1 sup rni ˆ2 un 2 max 1≤i≤n un 2 1≤i≤n i=1 n 2 n−1 Khi (x)δi mθ0 (Xi ) dψ(x) ˙ un 2 sup rni ˆ2 1≤i≤n i=1 = op (1), (1.19) Therefore, together with (1.8) and (1.12), we obtain (1.14). 
To prove (1.16), write ˆ ˆ Tnj (θ) − Rnj (θ) mnj (x) − ˆ = = − −2 µn (x, θ) 2 dG(x) − ˆ fw (x) [mnj (x) − mθ (x)]2 dG(x) ˆ 2 µn (x, θ) − mθ (x) dG(x) ˆ fw (x) mnj (x) − ˆ µn (x, θ) ˆ fw (x) µn (x, θ) − mθ (x) dG(x), ˆ fw (x) j = 1, 2. By Cauchy-Schwarz (C-S) inequality, we have 1/2 ˆ ˆ ˆ1/2 sup |Tnj (θ) − Rnj (θ)| ≤ sup Cn (θ) + 2 sup Tnj (θ)Cn (θ), θ∈Θ θ∈Θ θ∈Θ Hence it suffices to prove 14 j = 1, 2. sup Cn (θ) = op (1), θ∈Θ ˆ sup Tnj (θ) = Op (1), j = 1, 2. (1.20) θ∈Θ One can prove the first claim in (1.20) by the same argument as in proving (3.14) in K-N. ˜ To prove the second part of (1.20), note that by adding and subtracting Yij to the i-th ˆ summand in Tnj (θ), we obtain ˆ2 ˜ ˆ Tnj (θ) ≤ 2(1 + sup |f 2 (x)/fw (x) − 1|) Anj + x∈I [Unj (x) − Zn (x, θ)]2 dψ(x) ˆ2 ≤ 2(1 + sup |f 2 (x)/fw (x) − 1|) x∈I ˜ × Anj + 2 2 Unj (x)dψ(x) + 2 From (3.16) in K-N, one obtains supθ∈Θ 2 Zn (x, θ)dψ(x) . 2 Zn (x, θ)dψ(x) = Op (1). By (1.8) and Anj = op (1), 2 ˆ Unj (x)dψ(x) = op (1) in the argument above, we have supθ∈Θ Tnj (θ) = Op (1), j = 1, 2. Thus the proof of the theorem is complete. 15 1.4 Asymptotic distribution of the minimum distance estimators under H0 ˆ This section states and proves the asymptotic normality of θnj , j = 1, 2. To proceed further, we need the following notation. Let Σ0 := mθ0 (x)mT (x)g(x)dx, ˙ ˙θ Σ∗ := 0 (1 − ∆(x))mθ0 (x)mT (x)g(x)dx, ˙ ˙θ Σ1 := σ 2 (x)∆(x)mθ0 (x)mT (x)g 2 (x)(f (x))−1 dx ˙ ˙θ (1.21) 0 0 0 ˜ ˙θ σ 2 (x)∆(x)mθ0 (x)mT (x)g(x)dx Σ−1 Σ∗ ˙ 0 0 Σ∗ + 2 1 0 ˜ +Σ∗ Σ−1 0 0 ˜ ˙θ σ 2 (x)∆(x)mθ0 (x)mT (x)f (x)dx Σ−1 Σ∗ , ˙ 0 0 0 Σ2 := σ 2 (x)mθ0 (x)mT (x)g 2 (x)(∆(x)f (x))−1 dx, ˙ ˙θ ˜ Σ0 := ∆(x)mθ0 (x)mT (x)f (x)dx, ˙ ˙θ 0 0 n ˜ Σn := n−1 i=1 δi mθ0 (Xi )mT (Xi ), ˙ ˙θ 0 n ˜ Sn := n−1 δi εi mθ0 (Xi ), ˙ Snj := Unj (x)µh (x)dψ(x), ˙ j = 1, 2. i=1 Theorem 1.4.1. Suppose H0 , (e1), (e2), (e3), (f1), (f2), (g), (k1), (m1)-(m5), (a), and (h3) hold. Then, ˆ n1/2 (θn1 − θ0 ) = Σ−1 n1/2 {Sn1 + Σ∗ (ˆ n − θ0 )} + op (1), 0 α 0 (1.22) where αn is in (1.2), and ˆ ˆ n1/2 (θn1 − θ0 ) = Op (1). 16 (1.23) T If under H0 , mθ0 (x) is a linear function of θ0 , i.e. mθ0 (x) = θ0 l(x), for all x ∈ I, where ˜n l(x) satisfies (m1)-(m3) and (m6), we can take αn = Σ−1 {n−1 ˆ n ˙ i=1 δi Yi mθ0 (Xi )}, which is the least square estimator and satisfies condition (a), and we obtain ˆ ˜ ˜ n1/2 (θn1 − θ0 ) = Σ−1 n1/2 {Sn1 + Σ∗ Σ−1 Sn } + op (1). 0 n 0 (1.24) If (k2), (b2), and (h4) hold, one has ˆ n1/2 (θn2 − θ0 ) = Σ−1 n1/2 Sn2 + op (1). 0 (1.25) Consequently, (1.24) and (1.25) lead to ˆ n1/2 (θnj − θ0 ) →d Nq (0, Σ−1 Σj Σ−1 ), 0 0 j = 1, 2. (1.26) ˜ ˜ ˜ Here Σ0 , Σ∗ , Σ0 , Σn , Sn , Snj , and Σj , j = 1, 2, are in (1.21). 0 Proof. We prove the theorem in two steps, following the routine to prove Theorem 4.1 in K-N. Step 1. The first step is to show that ˆ nhd θnj − θ0 2 = Op (1), Let Dn (θ) := j = 1, 2. (1.27) 2 Zn (x, θ)dψ(x). Note that ˆ nhd Dn (θnj ) = nhd ˆ ˆnj − θ0 2 Dn (θnj ) , θ ˆ θnj − θ0 2 17 j = 1, 2. It suffices to prove ˆ nhd Dn (θnj ) = Op (1), j = 1, 2, (1.28) because the rest follows the a similar argument used in proving (4.4) in K-N, if the correˆ ˆ sponding θn is changed to θnj , j = 1, 2. Observe that ˆ nhd Dn (θnj ) = nhd ˆ ˆ ˆ [Unj (x, θnj ) − Unj (x, θ0 )]2 dψ(x) ˆ2 ≤ 2nhd (1 + sup |fw (x)/f 2 (x) − 1|) x∈I × ˆ ˆ ˆ2 Unj (x, θnj )dψw (x) + ˆ ˆ2 Unj (x, θ0 )dψw (x) ˆ2 ˆ ≤ 4nhd (1 + sup |fw (x)/f 2 (x) − 1|)Tnj (θ0 ) x∈I ≤ 8nhd (1 + ˆ2 ˆ2 ˜ sup |fw (x)/f 2 (x) − 1|)(1 + sup |f 2 (x)/fw (x) − 1|){Tnj (θ0 ) + Anj }. 
x∈I x∈I ˜ By (1.7), (1.8), and Tnj (θ0 ) = Op ((nhd )−1 ), j = 1, 2, it suffices to prove nhd Anj = Op (1), j = 1, 2. This result hold for j = 1 because of (1.18). When j = 2, by (a), (1.12), and calculation in (1.19), it suffices to show the following results: n nhd n−1 2 Khi (x)ˆni δi εi dψ(x) = Op (1). r (1.29) i=1 To prove (1.29), we have n nhd E n−1 n 2 Khi (x)ˆni δi εi dψ(x) = n−1 hd r i=1 E rni δi ε2 ˆ 2 i i=1 18 2 Khi (x)dψ(x) = hd E δˆn (X)ε2 r2 = h−d E × 2 Kh (x − X)dψ(x) K 2 ((x − z)/h){f (x)}−2 g(x)f (z){∆(z)}−1 σ 2 (z) ˜ ˜ (∆(z) − 1)K(0) + n (∆(z) − δi )Kbi (z) 2 i=2 dzdx ˜ ˜ K(0) + n δi Kbi (z) i=2 K 2 (u){f (z + uh)}−2 g(z + uh)f (z){∆(z)}−1 σ 2 (z) = ×E ˜ ˜ (∆(z) − 1)K(0) + n (∆(z) − δi )Kbi (z) 2 i=2 dzdu, ˜ ˜ K(0) + n δi Kbi (z)) i=2 where the last equality is derived by Fubini’s theorem. Let Bn (z) := E ˜2 ˜ (∆(z) − 1)2 K 2 (0) + n (∆(z) − δi )2 Kbi (z) i=1 , ˜ ˜ [K(0) + n δi Kbi (z))]2 i=1 z ∈ Rd . Let Ib be the bn -neighborhood of compact set I. By (e2), (f1), and (k1), it is sufficient ˜ to show supz∈I Bn (z) = O(1). Let In (z) := K(0) + b n ˜ i=1 δi Kbi (z), z ∈ Ib , n ≥ 1, and ˜ I0 (z) ≡ K(0). For any z ∈ Ib , write Bn (z) = Bn1 (z) + B2n (z) + 2Bn3 (z) − 2Bn4 (z), where ˜ Bn1 (z) := E [In (z)]−2 (∆(z) − 1)2 K 2 (0) , n Bn2 (z) := E [In (z)]−2 Bn3 (z) := E [In (z)]−2 ˜2 (∆(z) − δi )2 Kbi (z) , i=1 ˜ ˜ (∆(z) − δi )(∆(z) − δj )Kbi (z)Kbj (z) , 1≤i 0, 21 [−1,1]d (a + c0 )−2 du − [−1,1]d ˜ (a + K(u))−2 du ˜ ˜ K 2 (u) + 2aK(u) − c2 − 2ac0 0 du 2 (a + K(u))2 ˜ d (a + c0 ) [−1,1] ˜ ˜ K 2 (u) − c2 K(u) − c0 0 du + 2a = (a + c0 )−2 du ˜ ˜ [−1,1]d (a + K(u))2 [−1,1]d (a + K(u))2 ˜ c2 K 2 (u) −2 −2 0 du du − a ≥ (a + c0 ) 2 2 [−1,1]d a [−1,1]d (2a) ˜ c0 K(u) +2a du − du 2 2 [−1,1]d a [−1,1]d (2a) = ≥ (a + c0 )−2 (2a)−2 ˜ K 2 (u)du − 2d+2 c2 + (2a)−1 1 − 2d+2 c0 0 ≥ 0, thus, E [−1,1]d ˜ [In−1 + K(u)]−2 du = E En−1 ≤ E [−1,1]d [−1,1]d ˜ [In−1 + K(u)]−2 du [In−1 + c0 ]−2 du = 2d E[In−1 + c0 ]−2 . By a similar argument used in proving (1.30), we have k, j = 0, 1, · · · , n, E[In−k (z) + jc0 ]−2 ≤ {1 − p(2b)d ρ1∗ (z, b)}E[In−k−1 (z) + jc0 ]−2 +p(2b)d ρ∗ (z, b)E[In−k−1 (z) + (j + 1)c0 ]−2 . 1 Therefore, by (1.30) and (1.31), the following hold: 22 (1.31) E[In (z)]−2 ≤ {1 − p(2b)d ρ1∗ (z, b)} {1 − p(2b)d ρ1∗ (z, b)}E[In−2 (z)]−2 +p(2b)d ρ∗ (z, b)E[In−2 (z) + c0 ]−2 1 +{p(2b)d ρ∗ (z, b)} {1 − p(2b)d ρ1∗ (z, b)}E[In−2 (z) + c0 ]−2 1 +p(2b)d ρ∗ (z, b)E[In−2 (z) + 2c0 ]−2 1 2 = k=0 2 {1 − p(2b)d ρ1∗ (z, b)}2−k {p(2b)d ρ∗ (z, b)}k E[In−2 (z) + kc0 ]−2 1 k ≤ {1 − p(2b)d ρ1∗ (z, b)}2 {1 − p(2b)d ρ1∗ (z, b)}E[In−3 (z)]−2 +p(2b)d ρ∗ (z, b)E[In−3 (z) + c0 ]−2 1 +2{1 − p(2b)d ρ1∗ (z, b)}{p(2b)d ρ∗ (z, b)} {1 − p(2b)d ρ1∗ (z, b)} 1 ×E[In−3 (z) + c0 ]−2 + p(2b)d ρ∗ (z, b)E[In−3 (z) + 2c0 ]−2 1 +{p(2b)d ρ∗ (z, b)}2 {1 − p(2b)d ρ1∗ (z, b)}E[In−3 (z) + 2c0 ]−2 1 +p(2b)d ρ∗ (z, b)E[In−3 (z) + 3c0 ]−2 1 3 = k=0 3 {1 − p(2b)d ρ1∗ (z, b)}3−k {p(2b)d ρ∗ (z, b)}k E[In−3 (z) + kc0 ]−2 1 k ≤ ··· n ≤ k=0 n {1 − p(2b)d ρ1∗ (z, b)}n−k {p(2b)d ρ∗ (z, b)}k E[I0 (z) + kc0 ]−2 1 k ˜ ≤ {1 − p(2b)d ρ1∗ (z, b)}n [K(0)]−2 n +c−2 0 k=1 n −2 k {1 − p(2b)d ρ1∗ (z, b)}n−k {p(2b)d ρ∗ (z, b)}k . 1 k (1.32) By (e2) and (f1), for large enough n, f (x) and ∆(x) are bounded and bounded below from zero, and Lipschitz-continuous on Ib . Let f and ∆ denote the Lipschitz constants of f and 23 ∆, respectively. Define c1 := min ρ1∗ (z, b) > 0, c2 := ( f sup ∆(z) + ∆ sup f (z)), z∈Ib p(z, b) := ˜ z∈I2b z∈I2b p(2b)d ρ∗ (z, b) 1 . 
1 + p(2b)d (ρ∗ (z, b) − ρ1∗ (z, b)) 1 By (1.30) and the fact that supz∈I (ρ∗ (z, b) − ρ1∗ (z, b)) ≤ 2bd1/2 c2 , we have b 1 ˜ E[In (z)]−2 ≤ {1 − p(2b)d c1 }n [K(0)]−2 + c−2 {1 + p(2b)d+1 d1/2 c2 }n 0 n × k=1 n −2 k {1 − p(z, b)}n−k {˜(z, b)}k . ˜ p k (1.33) Hence, nbd E[In (z)]−2 ≤ ˜ nbd [K(0)]−2 +c−2 0 {1 − p(2b)d c {1 + p(2b)d+1 d1/2 c n ×nbd k=1 1 2 d −(p(2b)d c1 )−1 −n(2b) pc1 } d+1 pd1/2 c 2 (p(2b)d+1 d1/2 c2 )−1 n(2b) } n −2 k {1 − p(z, b)}n−k {˜(z, b)}k . ˜ p k Note that n!(k!)−1 ((n − k)!)−1 {1 − p(z, b)}n−k {˜(z, b)}k is the probability mass function of ˜ p the Binomial(n, p(z, b)) distribution. Recall the Chernoff’s bound for a r.v. ζ ∼ B(n, p0 ), ˜ and a constant η ∈ (0, 1), P (ζ < (1 − η)np0 ) < exp(−np0 η 2 /2). Using this bound, with η = 1/2, we obtain that for any z ∈ Ib , 24 n nbd k=1 n −2 k {1 − p(z, b)}n−k {˜(z, b)}k ˜ p k n˜(z,b)/2 p = nbd n n −2 k {1 − p(z, b)}n−k {˜(z, b)}k ˜ p k + k=1 n˜(z,b)/2 p k= n˜(z,b)/2 +1 p n {1 − p(z, b)}n−k {˜(z, b)}k + nbd {n˜(z, b)/2}−2 ˜ p p k ≤ nbd ≤ k=1 d exp(−n˜(z, b)/8) + nbd {n˜(z, b)/2}−2 nb p p = nbd exp(−nbd 2d−3 pc1 (1 + p(2b)d+1 d1/2 c2 )−1 ) +(nbd )−1 41−d p−2 c−2 (1 + p(2b)d+1 d1/2 c2 )2 = O((nbd )−1 ), 1 by condition (b1). Together with the fact that d −1 {1 − p(2b)d c1 }−(p(2b) c1 ) → exp(1), d+1 d1/2 c )−1 2 {1 + p(2b)d+1 d1/2 c2 }(p(2b) → exp(1), we have nbd E[In (z)]−2 = O((nbd )−1 ), z ∈ Ib , sup nbd E[In (z)]−2 = O((nbd )−1 ). z∈Ib Hence sup Bn1 (z) = O((nbd )−2 ), z∈Ib sup Bn2 (z) = O((nbd )−1 ). z∈Ib 25 Observe that Bn3 (z) ˜ ˜ (∆(z) − δn−1 )(∆(z) − δn )Kb(n−1) (z)Kbn (z) = n(n − 1)E ˜ ˜ [In−2 (z) + δn−1 K (z) + δn Kbn (z)]2 b(n−1) = n(n − 1)b2d E ˜ ˜ K(u)K(v) 2 ∆ (z)(1 − ∆(z − bu))(1 − ∆(z − bv)) [In−2 (z)]2 ˜ ˜ K(u)K(v) −2 ∆(z)(1 − ∆(z)) ˜ [In−2 (z) + K(u)]2 ×∆(z − bu)(1 − ∆(z − bv)) + ˜ ˜ K(u)K(v) (∆(z) − 1)2 2 ˜ ˜ [In−2 (z) + K(u) + K(v)] ×∆(z − bu)∆(z − bv) ×f (z − bu)f (z − bv)dudv, thus we have |Bn3 (z)| ≤ n(n − 1)b2d E ˜ ˜ K(u)K(v) 2 [∆ (z)(1 − ∆(z − bu))(1 − ∆(z − bv)) [In−2 (z)]2 −2∆(z)(1 − ∆(z))∆(z − bu)(1 − ∆(z − bv)) +(∆(z) − 1)2 ∆(z − bu)∆(z − bv)] +2 ˜ ˜ ˜ ˜ K(u)K(v) K(u)K(v) − ˜ [In−2 (z)]2 [In−2 (z) + K(u)]2 ×∆(z)(1 − ∆(z))∆(z − bu)(1 − ∆(z − bv)) ×f (z − bu)f (z − bv)dudv ≤ n(n − 1)b2d E ˜ ˜ K(u)K(v) [ |∆(z) − ∆(z − bu)||∆(z) − ∆(z − bv)| [In−2 (z)]2 +∆(z)(1 − ∆(z))|∆(z − bv) − ∆(z − bu)| ] 26 +4 ˜ ˜ K 2 (u)K(v) ∆(z)(1 − ∆(z))∆(z − bu)(1 − ∆(z − bv)) [In−2 (z)]3 ×f (z − bu)f (z − bv)dudv ≤ nbd+2 (nbd )E[In−2 (z)]−2 2 ∆ ˜ ˜ K(u)K(v) u × v f (z − bu)f (z − bv)dudv +nbd+1 (nbd )E[In−2 (z)]−2 ∆(z)(1 − ∆(z)) ∆ ˜ ˜ K(u)K(v) u − v f (z − bu)f (z − bv)dudv × +4(nbd )2 E[In−2 (z)]−3 ∆(z)(1 − ∆(z)) ˜ ˜ K 2 (u)K(v)∆(z − bu)(1 − ∆(z − bv))f (z − bu)f (z − bv)dudv. × By a similar argument used in proving (1.33), for z ∈ Ib and j = 3, 4, · · · , one has ˜ E[In (z)]−j ≤ {1 − p(2b)d c1 }n [K(0)]−j −j +c0 {1 + p(2b)d+1 d1/2 c2 }n n k=1 n −j k {1 − p(z, b)}n−k {˜(z, b)}k , ˜ p k hence by (b1) and Chernoff’s bound, we obtain that n2 b2d E[In (z)]−3 ˜ ≤ n2 b2d [K(0)]−3 {1 − p(2b)d c1 }n n +c−3 {1 + p(2b)d+1 d1/2 c2 }n × n2 b2d 0 k=1 ≤ ˜ n2 b2d [K(0)]−3 {1 − p(2b)d c 1 n −3 k {1 − p(z, b)}n−k {˜(z, b)}k ˜ p k d −(p(2b)d c1 )−1 −n(2b) pc1 } +c−3 {1 + p(2b)d+1 d1/2 c2 }(p(2b) 0 d+1 d1/2 c )−1 n(2b)d+1 pd1/2 c2 2 × n2 b2d {n˜(z, b)/2}−3 + n2 b2d exp(−n˜(z, b)/8) p p 27 ˜ ∼ n2 b2d [K(0)]−3 exp(−n(2b)d pc1 ) +c−3 exp(n(2b)d+1 pd1/2 c2 ) (nbd )−1 81−d p−3 c−3 (1 + p(2b)d+1 d1/2 c2 )3 0 1 +n2 b2d exp(−nbd 2d−3 pc1 (1 + p(2b)d+1 d1/2 c2 )−1 ) = O((nbd )−1 ), for any z ∈ Ib , and supz∈I |Bn3 (z)| = O((nbd )−1 ). 
With the fact that b ˜ Bn4 (z) = 2n(1 − ∆(z))K(0)E ˜ (∆(z) − δn )Kbn (z) [In (z)]2 ˜ = 2nbd (1 − ∆(z))K(0) ×E ∆(z)(1 − ∆(z − bu)) ˜ K(u)f (z − bu) [In−1 (z)]2 (1 − ∆(z))∆(z − bu) − du, ˜ [In−1 (z) + K(u)]2 we have |Bn4 (z)| ˜ ≤ 2nbd (1 − ∆(z))K(0) ×E ˜ K(u) |∆(z) − ∆(z − bu)| [In−1 (z)]2 ˜ ˜ K(u) K(u) + − (1 − ∆(z))∆(z − bu) f (z − bu)du ˜ [In−1 (z)]2 [In−1 (z) + K(u)]2 ˜ = 2nbd+1 (1 − ∆(z))K(0)E[In−1 (z)]−2 ∆ ˜ +4nbd (1 − ∆(z))2 K(0)E[In−1 (z)]−3 ˜ K(u) u f (z − bu)du ˜ K 2 (u)∆(z − bu)f (z − bu)du = O(b(nbd )−1 ) + O((nbd )−2 ) = O((nbd )−2 ), and supz∈I |Bn4 (z)| = O((nbd )−2 ). Thus we have supz∈I |Bn (z)| = O((nbd )−1 ), and b b 28 n n−1 nhd 2 Khi (x)ˆni δi εi dψ(x) = Op ((nbd )−1 ). r i=1 Moreover, one obtains n n−1 2 Khi (x)ˆni δi εi dψ(x) = Op ((nhd )−1 (nbd )−1 )) = op (n−1 ). r (1.34) i=1 This completes the proof of (1.29), and hence we obtain (1.27). Step 2. In this part, we shall prove (1.22)-(1.26) in two steps, (2.a) and (2.b). (2.a) We will prove (1.22), (1.24) and (1.25) by similar arguments used in proving the asymptotic normality of the minimum distance estimator when data is complete in K-N. Let ˙ ˆ Tnj (θ) := −2 ˆ ˆ Unj (x, θ)µn (x, θ)dψw (x), ˙ j = 1, 2, ˆ be the derivative of Tnj (θ) with respect to θ. Since θ0 is an interior point of Θ by condition ˆ ˆ (m4), and θnj is consistent for θnj by Theorem 1.3.1, θnj will be in the interior of Θ and ˙ ˆ ˆ Tnj (θnj ) = 0 with arbitrarily large probability for all sufficient large n. The equation ˙ ˆ Tnj (θ) = 0 is equivalent to ˆ ˆ ˆ ˆ ˆ (Unj (x, θnj ) − Unj (x, θnj ))µn (x, θnj )dψw (x) + ˙ = ˆ ˙ ˆ ˆ Zn (x, θnj )µn (x, θnj )dψw (x), ˆ ˆ Unj (x)µn (x, θnj )dψw (x) ˙ j = 1, 2. (1.35) ˆ By similar proof as that of (4.16) in K-N, the right-hand side of (1.35) equals Rn (θnj − θ0 ) for all n ≥ 1, with Rn = Σ0 + op (1); while for the second term on the left-hand side, one has 29 ˆ ˆ Unj (x)µn (x, θnj )dψw (x) = Snj + op (n−1/2 ) by similar proofs as those of Lemma 4.1 and ˙ Lemma 4.2 in K-N, with Un and εi replaced by Unj and ε∗ in (1.5), respectively. Recall un ij and dni from (1.5). For the first term on the left-hand side with j = 1, note that ˆ ˆ ˆ ˆ ˆ (Un1 (x, θn1 ) − Un1 (x, θn1 ))µn (x, θn1 )dψw (x) ˙ n = un n−1 i=1 n +uT n n−1 d ˆ ˆ ˙ Khi (x)(1 − δi ) ni µn (x, θn1 )dψw (x) un ˆ ˆ Khi (x)(1 − δi )mθ0 (Xi ) µn (x, θn1 )dψw (x) := Jn1 + Jn2 . ˙ ˙ i=1 By (m4), (m5), (a), and result (1.8), we obtain n1/2 Jn1 |dni | 1≤i≤n un |d | ≤ n1/2 un max ni 1≤i≤n un ≤ n1/2 un max + max 1≤i≤n ˆ ˆ ˆ fh (x) µn (x, θn1 ) dψw (x) ˙ ˆ ˆ fh (x) µn (x, θ0 ) dψw (x) ˙ mθ (Xi ) − mθ0 (Xi ) ˙ˆ ˙ n1 ˆ ˆ fh (x)dψw (x) = op (1). Moreover, observe that T n1/2 Jn2 = n1/2 uT n ˆ µhδ (x)µT (x)dψw (x) ˙ ˙h +n1/2 uT n ˆ µhδ (x){µT (x, θ0 ) − µT (x)}dψw (x) ˙ ˙n ˙h +n1/2 uT n ˆ ˆ µhδ (x){µT (x, θn1 ) − µT (x, θ0 )}dψw (x) ˙ ˙n ˙n +n1/2 uT n ˆ {µnδ (x, θ0 ) − µhδ (x)}µT (x)dψw (x) ˙ ˙ ˙h +n1/2 uT n ˆ {µnδ (x, θ0 ) − µhδ (x)}{µT (x, θ0 ) − µT (x)}dψw (x) ˙ ˙ ˙n ˙h +n1/2 uT n ˆ ˆ {µnδ (x, θ0 ) − µhδ (x)}{µT (x, θn1 ) − µT (x, θ0 )}dψw (x). 
˙ ˙ ˙n ˙n 30 On the right-hand side of last equality, the last five terms are op (1), because of (m5), (a), (1.8), C-S inequality and the fact that {µn (x, θ0 ) − µh (x)}{µT (x, θ0 ) − µT (x)}dψ(x) ˙ ˙ ˙n ˙h E = V ar(µn (x, θ0 ))dψ(x) = Op ((nhd )−1 ), ˙ ˆ ˆ {µn (x, θn1 ) − µn (x, θ0 )}{µT (x, θn1 ) − µT (x, θ0 )}dψ(x) ˙ ˙ ˙n ˙n = ˆ2 fh (x)dψ(x) max (mθ (Xi ) − mθ0 (Xi ))(mT (Xi ) − mT (Xi )) = op (hd ), ˙ˆ ˙ ˙ˆ ˙θ θn1 0 n1 1≤i≤n {µnδ (x, θ0 ) − µhδ (x)}{µT (x, θ0 ) − µδ hT (x)}dψ(x) ˙ ˙ ˙ nδ ˙ E = V ar(µnδ (x, θ0 ))dψ(x) = Op ((nhd )−1 ). ˙ For the first term, by (m4), (m5), (a), (1.8), and C-S inequality, one has ˆ µh (x)µT (x)dψw (x) = Σ∗ + op (1). ˙ ˙ hδ 0 Hence (1.22) holds. If under H0 , mθ0 (x) is a linear function of θ0 , and αn is the least square ˆ ˜n ˜ estimator, we have un = Σ−1 Sn and result (1.24). To prove (1.25), it suffices to show that when j = 2, the first term in the left-hand side of (1.35) multiplied by n1/2 is op (1). Note that by C-S inequality, ˆ ˆ ˆ ˆ ˆ (Un2 (x, θn2 ) − Un2 (x, θn2 ))µn (x, θn2 )dψw (x) ˙ ≤ ˆ2 1 + sup |f 2 (x)/fw (x) − 1| x∈I By the fact that 2 ˜ An2 2 ˆ µn (x, θn2 ) 2 dψ(x). ˙ ˆ ˆ2 µn (x, θn2 ) 2 dψ(x) = Op (1), and supx∈I |f 2 (x)/fw (x) − 1| = op (1) ˙ ˜ derived by (1.8), and it suffices to prove An2 = op (n−1 ), which in turn follows (a), (1.12), 31 (1.19), and (1.34). (2.b) We shall prove (1.26) in this step. Based on (1.24) and (1.25), it suffices to prove that ˜ ˜ n1/2 {Sn1 + Σ∗ Σ−1 Sn } →d Nq (0, Σ1 ), 0 n (1.36) n1/2 Sn2 →d Nq (0, Σ2 ). (1.37) The proof of (1.37) is similar as that of Lemma 4.1 (a) in K-N, if εi , σ 2 and Σ there are replaced by δi εi /∆(Xi ), σ 2 /∆, and Σ2 in (1.5), respectively. To prove (1.36), note that ˜ ˜n ˜ n1/2 Sn = Op (1) by the Central Limit Theorem, and Σ−1 = Σ−1 + op (1) by Law of Large 0 Numbers and routine calculations. Thus we have ˜ ˜ ˜ ˜ ˜ ˜ ˜ n1/2 {Sn1 + Σ∗ Σ−1 Sn } = n1/2 {Sn1 + Σ∗ Σ−1 Sn } + Σ∗ (Σ−1 − Σ−1 )(n1/2 Sn ) 0 n 0 n 0 0 0 ˜ ˜ = n1/2 {Sn1 + Σ∗ Σ−1 Sn } + op (1), 0 0 ˜ ˜ and it suffices to show n1/2 {Sn1 + Σ∗ Σ−1 Sn } →d Nq (0, Σ1 ). Write 0 0 ˜ n1/2 {Sn1 + Σ∗ Σ−1 sn1 } 0 0 n = ˜ Khi (x)µh (x)dψ(x) + Σ∗ Σ−1 mθ0 (Xi ) δi εi ˙ 0 0 ˙ n−1/2 i=1 n = n−1/2 sni , say. i=1 Note that by (e1) and (e2), {sni , i = 1, · · · , n} are i.i.d. centered r.v.’s for each n. By the Lindeberg-Feller C.L.T., it suffices to prove that as n → ∞, 32 Es2 → Σ1 , n1 (1.38) E{s2 I(|sn1 | > n1/2 η)} → 0 ∀η > 0. n1 (1.39) By the continuity of σ 2 , ∆, f , and g, we obtain ˜ Kh (x − X)µh (x)dψ(x) + Σ∗ Σ−1 mθ0 (X) ˙ 0 0 ˙ Es2 = E n1 2 ∆(X)σ 2 (X) Kh (x − X)Kh (y − X)σ 2 (X)∆(X)µh (x)µT (y)dψ(x)dψ(y) ˙ ˙h = E +E ˜ Kh (x − X)σ 2 (X)∆(X)µh (x)mT (X)dψ(x)Σ−1 Σ∗ ˙ ˙θ 0 0 0 ˜ +Σ∗ Σ−1 E 0 0 ˙h Kh (x − X)σ 2 (X)∆(X)mθ0 (X)µT (x)dψ(x) ˙ ˜ ˜ +Σ∗ Σ−1 E[mθ0 (X)mT (X)σ 2 (X)∆(X)]Σ−1 Σ∗ ˙ ˙θ 0 0 0 0 0 σ 2 (x)∆(x)mθ0 (x)mT (x)(f (x))−1 g 2 (x)dx ˙ ˙θ → 0 +2 ˜ σ 2 (x)∆(x)mθ0 (x)mT (x)g(x)dx Σ−1 Σ∗ ˙ ˙θ 0 0 0 ˜ +Σ∗ Σ−1 0 0 ˜ ˙θ σ 2 (x)∆(x)mθ0 (x)mT (x)f (x)dx Σ−1 Σ∗ = Σ1 , ˙ 0 0 0 Hence (1.38) is proved. Note that by the H¨lder’s inequality, the L.H.S. of (1.39) with η = δ0 o in (e3) is bounded by 2+δ Cn−δ0 /2 Esn1 0 ˜ Kh (x − X)µh (x)dψ(x) + Σ∗ Σ−1 mθ0 (X) ˙ 0 0 ˙ = Cn−δ0 /2 E ≤ Cn−δ0 /2 E 2 Kh (x − X)µh (x)dψ(x) ˙ 2+δ0 ˜ +Cn−δ0 /2 E[{2Σ∗ Σ−1 mθ0 (X)}2+δ0 |δε|2+δ0 ] 0 0 ˙ 33 |δε|2+δ0 2+δ0 |δε|2+δ0 2+δ0 ≤ Cn−δ0 /2 22+δ0 E (Kh (x − X)µh (x)) 2 dψ(x) ˙ 2 dψ(x) δ0 |δε|2+δ0 ˜ +Cn−δ0 /2 E[{2Σ∗ Σ−1 mθ0 (X)}2+δ0 |δε|2+δ0 ] 0 0 ˙ = Op ((nhd )−δ0 /2 ). Therefore the proof is complete. Remark 1.4.1. (Choice of G). Assuming f = 0 implies g = 0. 
When q = 1 and σ 2 (x) ≡ σ 2 , ˆ a constant, the asymptotic variance of θn1 satisfies ˜ v1 : = σ 2 Σ−1 + σ 2 Σ−2 0 0 ∆(x)m2 (x)(f (x))−1 g 2 (x)dx ˙θ ∆(x)m2 (x)f (x)dx ˙θ − 0 −1 0 ∆(x)m2 (x)g(x)dx ˙θ 2 0 ˜ ≥ σ 2 Σ−1 , 0 because, by C-S inequality, ∆(x)m2 (x)g(x)dx ˙θ 2 0 ˙ ∆1/2 (x)mθ0 (x)f 1/2 (x)∆1/2 (x)mθ0 (x)f −1/2 (x)g(x)dx ˙ = ≤ ∆(x)m2 (x)f (x)dx ˙θ 0 2 ∆(x)m2 (x)(f (x))−1 g 2 (x)dx, ˙θ 0 ˆ with equality if and only if g ∝ f ; and the asymptotic variance of θn2 satisfies v2 : = σ 2 ≥ σ2 (∆(x))−1 m2 (x)g 2 (x)(f (x))−1 dx ˙θ 0 ∆(x)m2 (x)f (x)dx ˙θ 0 ˜ = σ 2 Σ−1 , 0 34 −1 m2 (x)g(x)dx ˙θ 0 −2 because m2 (x)g(x)dx ˙θ 2 0 (∆(x))−1/2 mθ0 (x)g(x)(f (x))−1/2 (∆(x))1/2 mθ0 (x)(f (x))1/2 dx ˙ ˙ = ≤ (∆(x))−1 m2 (x)g 2 (x)(f (x))−1 dx ˙θ 0 2 ∆(x)m2 (x)f (x)dx, ˙θ 0 with equality if and only if g ∝ f ∆. This implies that both lower bounds on the asymptotic ˆ variances of θnj , j = 1, 2, are at that of the least square estimator’s when the regression function is linear. 1.5 Asymptotic distribution of the test statistics under H0 ˆ In this section we shall discuss the asymptotic null distribution of Dnj in Theorem 1.5.1. Theorem 1.5.1. Assume that H0 , (e1), (e2), (e3), (e4), (f1), (f2), (g), (k1), (m1)-(m5), (a), and (h3) hold. Then, ˆ Dn1 →d N (0, 1). If, in addition, (k2), (b2), and (h4) hold, then, ˆ Dn2 →d N (0, 1). ˆ Consequently, for each j = 1, 2, the test that rejects H0 whenever |Dnj | > zα/2 , is of the 35 asymptotic size α. The proof of Theorem 1.5.1 is facilitated by Lemma 1.5.2-1.5.7. The idea of the proof is similar to that of Theorem 5.1 in K-N. Lemma 1.5.1 is applied to prove Lemma 1.5.2. ˜ Lemma 1.5.1. (Theorem 1 of Hall (1984)) Let Xi , 1 ≤ i ≤ n, be i.i.d. random vectors, and let ˜ ˜ Hn (Xi , Xj ), Un := ˜ ˜ Gn (x, y) = EHn (X1 , x)Hn (X1 , y), 1≤i 0 implies |Γnj Γ−1 − 1| = op (1), j = 1, 2. j Proof. The proof of Lemma 1.5.7 is similar to that of Lemma 5.5 in K-N. Recall vw , ti1 , ti2 , si , ai , ci , qi from (1.43), and vn , tn1 , tn2 , sn , an , cn , qn , wn1 , wn2 from (1.44). Let for k = 1, 2, ˜ Γnk := 2hd n−2 Khi (x)Khj (x)ε∗ ε∗ dψ(x) ik jk 2 . i=j From result (1.41), it suffices to show ˜ Γnk − Γnk = op (1), ˆ ˜ Γnk − Γnk = op (1), k = 1, 2. (1.45) The first claim in (1.45) is proved similarly as (5.13) in K-N. For the second claim, note that 43 ˆ ˜ Γn1 − Γn1 = 2hd n−2 Khi (x)Khj (x)(ε∗ + wi1 )(ε∗ + wj1 )(1 + vw (x))dψ(x) j1 i1 i=j −2hd n−2 Khi (x)Khj (x)ε∗ ε∗ dψ(x) i1 j1 2 i=j = 2hd n−2 Khi (x)Khj (x)(ε∗ + wi1 )(ε∗ + wj1 )dψ(x) j1 i1 2 i=j +2hd n−2 Khi (x)Khj (x)(ε∗ + wi1 )(ε∗ + wj1 )vw (x)dψ(x) i1 j1 2 i=j +4hd n−2 Khi (x)Khj (x)(ε∗ + wi1 )(ε∗ + wj1 )dψ(x) i1 j1 i=j × Khi (x)Khj (x)(ε∗ + wi1 )(ε∗ + wj1 )vw (x)dψ(x) i1 j1 Khi (x)Khj (x)ε∗ ε∗ dψ(x) i1 j1 −2hd n−2 2 , i=j ˆ ˜ Γn2 − Γn2 = 2hd n−2 Khi (x)Khj (x){ε∗ (1 + ai ) + wi2 } i2 i=j ×{ε∗ (1 + aj ) + wj2 }(1 + vw (x))dψ(x) j2 −2hd n−2 Khi (x)Khj (x)ε∗ ε∗ dψ(x) i2 j2 2 i=j = 2hd n−2 Khi (x)Khj (x){ε∗ (1 + ai ) + wi2 } i2 i=j ×{ε∗ (1 + aj ) + wj2 }dψ(x) j2 +2hd n−2 2 Khi (x)Khj (x){ε∗ (1 + ai ) + wi2 } i2 i=j ×{ε∗ (1 + aj ) + wj2 }vw (x)dψ(x) j2 44 2 2 2 +4hd n−2 Khi (x)Khj (x){ε∗ (1 + ai ) + wi2 } i2 i=j ×{ε∗ (1 + aj ) + wj2 }dψ(x) j2 Khi (x)Khj (x){ε∗ (1 + ai ) + wi2 } i2 × ×{ε∗ (1 + aj ) + wj2 }vw (x)dψ(x) j2 −2hd n−2 Khi (x)Khj (x)ε∗ ε∗ dψ(x) i2 j2 2 . 
i=j By Fubini’s theorem and taking the expected value, one obtains Wnk,2,2 := 2hd n−2 (ε∗ )2 (ε∗ )2 ik jk 2 Khi (x)Khj (x)dψ(x) = Op (1), i=j Wnk,2,1 := 2hd n−2 (ε∗ )2 |ε∗ | ik jk Khi (x)Khj (x)dψ(x) 2 = Op (1), i=j Wnk,2,0 := 2hd n−2 (ε∗ )2 ik 2 Khi (x)Khj (x)dψ(x) = Op (1), i=j Wnk,1,1 := 2hd n−2 |ε∗ | |ε∗ | ik jk Khi (x)Khj (x)dψ(x) 2 = Op (1), i=j Wnk,1,0 := 2hd n−2 |ε∗ | ik Khi (x)Khj (x)dψ(x) 2 = Op (1), i=j Wnk,0,0 := 2hd n−2 Khi (x)Khj (x)dψ(x) 2 = Op (1), k = 1, 2. i=j Hence, we have 2 4 ˆ ˜ |Γn1 − Γn1 | ≤ (1 + vn )2 {2wn1 Wn1,2,0 + wn1 Wn1,0,0 + 4wn1 Wn1,2,1 2 3 2 +4wn1 Wn1,1,1 + 4wn1 Wn1,1,0 } + (2vn + vn )Wn1,2,2 = op (1), 45 2 ˆ ˜ |Γn2 − Γn2 | ≤ (1 + vn )2 {(2an + a2 )Wn1,2,2 + 2wn2 (1 + a2 )2 Wn2,2,0 n 4 +wn2 Wn2,0,0 + 4wn2 (1 + an )3 Wn2,2,1 2 3 +4wn2 (1 + an )2 Wn2,1,1 + 4wn2 (1 + an )Wn2,1,0 } 2 +(2vn + vn )Wn1,2,2 = op (1). Therefore the second claim of (1.45) is proved, and so is Lemma 1.5.7. 1.6 Simulations In this section two simulation studies are reported. The first investigates behavior of the ˆ empirical size and power of the test I(|Dn1 | > 1.96) with g(x) ≡ 1 on [−1, 1]2 at 4 alternatives under different designs and data missing probabilities. The second lists the mean ˆ and standard deviation of the minimum distance parameter estimator θn1 . In both studies, d = 2, and the completed data set are constructed using imputation method. All simulations are based on 1000 replications. In the first study, we compare the empirical size and power of the test at 4 alternatives, on 2 designs X, and 3 data missing probabilities ∆(X). More precisely, the design variables Xi = (X1i , X2i )T , i = 1, · · · , n, are i.i.d bivariate normal N (0, Vk ), k = 1, 2, with    0.36 0  V1 =  , 0 1   1  V2 =  46 0.64  . 0.64 1 (1.46) The three choices of ∆(x), x = (x1 , x2 )T , are as follows: ∆1 (x) = (1 + e−0.8−0.5x1 −0.5x2 )−1 , (1.47) ∆2 (x) = (1 + e−0.2−0.3x1 −0.3x2 )−1 , ∆3 ≡ 1, the complete data. These choices are similar to those in Sun and Wang (2009). They use the data missing probabilities {1+exp(−0.3−0.3x)}−1 , {1+exp(−1.0−0.8x)}−1 , and 1−0.4 exp(−5(x−0.4)2 ) when d = 1. The error distribution is N (0, (0.3)2 ). The regression function under the null T hypothesis is µ(x) = θ0 l(x), where θ0 = (0.5, 0.8)T , l(x) = x = (x1 , x2 )T . The regression models are as follows: M odel 0. δi Yi = δi µ(Xi ) + δi εi , M odel 1. δi Yi = δi µ(Xi ) + 0.5δi (X1i − 0.2)(X2i − 0.4) + δi εi , M odel 2. δi Yi = δi µ(Xi ) + 0.5δi (X1i X2i − 1) + δi εi , 2 2 M odel 3. δi Yi = δi µ(Xi ) + 2δi {exp(−0.4X1i ) − exp(0.6X2i )} + δi εi , M odel 4. δi Yi = δi X1i I(X2i > 0.2) + δi εi , The nominal level is α = 0.05. The sample sizes considered are n = 50, 100, 200. The first 2 tables describe empirical sizes and powers in models 0-4. Model 0 is the null model while model 1-4 are the alternatives. These empirical levels and powers are computed by ˆ the relative frequency of the event {|Dn1 | > 1.96} in corresponding models. Bandwidths h = n−1/4.5 and w = (log n/n)1/6 are chosen because of (h3) and (1.9). The kernels are 47 K(u, v) ≡ K 1 (u)K 1 (v) and K ∗ ≡ K, with K 1 (u) := 3 (1 − u2 )I(|u| ≤ 1). 4 Table 1.1: Empirical ε ∼ N (0, (.3)2 ) n ∆ Model 0 Model 1 Model 2 Model 3 Model 4 sizes and powers for model 0 vs. 
models 1-4 with X ∼ N(0, V1) and ε ∼ N(0, (.3)²).

                 n = 50                 n = 100                n = 200
            ∆1     ∆2     ∆3       ∆1     ∆2     ∆3       ∆1     ∆2     ∆3
Model 0    .020   .027   .031     .029   .029   .036     .033   .034   .042
Model 1    .103   .079   .224     .278   .176   .586     .633   .513   .935
Model 2    .993   .941   1        1      .999   1        1      1      1
Model 3    .315   .203   .999     .351   .270   1        .375   .338   1
Model 4    .241   .159   .484     .671   .497   .905     .980   .920   1

Table 1.1 gives the empirical sizes and powers for testing model 0 against models 1-4 with design X ∼ N(0, V1), when the data are randomly missing with either of the two missing data probabilities or with no missing data. In the simulation, the empirical size of the test for model 0 stays below 0.05. As the sample size increases, it gradually approaches the asymptotic level and is quite close to it at sample size 200. On the other hand, the empirical power of the test exceeds 0.05 against each of the alternatives 1-4 at all the sample sizes considered, and approaches 1 as the sample size increases; against alternative 2, in particular, the power is above 0.94 even at sample size 50. Comparing the three data missing probabilities, we observe that the level behavior is affected by the data missing probability, while the power is affected much more.

Table 1.2: Empirical sizes and powers for model 0 vs. models 1-4 with X ∼ N(0, V2) and ε ∼ N(0, (.3)²).

                 n = 50                 n = 100                n = 200
            ∆1     ∆2     ∆3       ∆1     ∆2     ∆3       ∆1     ∆2     ∆3
Model 0    .025   .027   .030     .029   .031   .036     .035   .037   .043
Model 1    .115   .103   .371     .199   .164   .677     .479   .373   .952
Model 2    .965   .831   1        .999   .991   1        1      1      1
Model 3    .237   .187   1        .272   .209   1        .274   .227   1
Model 4    .203   .144   .529     .596   .471   .927     .957   .892   1

Table 1.2 lists the empirical sizes and powers with design X ∼ N(0, V2). In addition to conclusions similar to those from the first table, we also find that the power and level behavior are affected by the dependence between the design variable coordinates, although not by much. The results for model 4 in both tables show that the discontinuity of the regression function has an effect on the power of the test, since the power changes dramatically as the sample size increases.

Table 1.3: Mean and s.d. of θ̂n1 under model 0 with X ∼ N(0, V1), ε ∼ N(0, (.3)²), and E(δ|X = x) = ∆1(x).

              n = 50            n = 100           n = 200
Mean        (.494, .804)      (.503, .800)      (.499, .800)
Std dev     (.110, .084)      (.078, .061)      (.052, .043)

The second study gives the mean and standard deviation of each component of θ̂n1 under the null hypothesis model 0 with normal error ε ∼ N(0, (0.3)²) when d = q = 2. The design covariance and the data missing probability are chosen to be V1 in (1.46) and ∆1 in (1.47), respectively. The regression function and parameter are the same as in the first study. The results listed in Table 1.3 show that the minimum distance estimator of the parameter is very close to the true parameter and that its standard deviation is quite small.

Chapter 2

Testing for Superiority of Two Regression Curves when Responses are Missing At Random

2.1  Introduction

This chapter considers a class of tests using covariate matching for comparing two nonparametric regression curves against a one-sided alternative, when responses are missing at random. More precisely, let (Xk, δk Yk), k = 1, 2, be the two groups of random variables, where Xk is a one-dimensional explanatory variable, Yk is a one-dimensional response variable, and δk is the indicator of whether the response is observed, i.e., δk = 1 if Yk is observed and δk = 0 if Yk is missing, k = 1, 2. We say Yk is missing at random if δk and Yk are conditionally independent given Xk, i.e.,
P (δk = 1|Yk , Xk ) = P (δk = 1|Xk ), a.s., k = 1, 2; see Little and Rubin (1987). 50 Now, let µk (x) := E(Yk |Xk = x), x ∈ R, k = 1, 2, be the two regression functions so that Yk = µk (Xk ) + εk , E(εk |Xk = x) = 0, ∀ x ∈ R, k = 1, 2. Let I be a compact interval in R. The problem of interest is to test the hypothesis H0 : µ1 (x) = µ2 (x), for all x ∈ I, H1 : µ1 (x) ≥ µ2 (x), for all x ∈ I with strict inequality for at least one x ∈ I, based on independent samples {(Xk,i , δk,i Yk,i ) : i = 1, · · · , nk } from the distributions of (Xk , δk Yk ), k = 1, 2, respectively. Moreover, let φ be a non-negative continuous function on R. One is interested in the asymptotic power of a given test against the local alternatives H1N : µ1 (x) = µ2 (x) + N −1/2 φ(x), N := n1 n2 , n1 + n2 for all x ∈ I. (2.1) When we observe complete data, this testing problem has been addressed by many researchers. In particular, Hall et. al (1997) proposed a class of tests based on the covariatematching, and the local averaging interpolation rule. They proved the asymptotic normality of the proposed statistics under general alternatives, allowing design and error densities to be different. They also proposed an adaptive version of their test that achieves the optimal power against a sequence of local alternatives. Koul and Schick (1997) proposed four 51 classes of tests under the assumption of possibly distinct design but common error densities. They gave a general asymptotic optimality theory against a sequence of local alternatives. One of these classes of covariate-matched tests is shown to have desirable asymptotic power properties against several alternatives. Koul and Schick (2003) (K-S) developed this class of test further and derive their asymptotic power for the local alternatives, under the heteroscedastic setting with possibly distinct error and design densities in the two regression models. They obtained an upper bound on the asymptotic power of all tests against a given sequence of local alternatives using a semiparametric approach, and showed that a member of this class of tests achieves this upper bound. This chapter discusses the above one-sided testing problem when responses are missing at random. We construct a complete data set by imputing kernel-type estimates for the regression functions, and investigate the asymptotic properties of the modified version for missing at random setup of the covariate-matched test statistic proposed in K-S under null hypothesis and local alternatives. The consistency of the tests based on these statistics is also discussed. To set up the analysis, let U be the set of all non-negative functions that are continuous on I and vanish off I. Assume that Xk has a density gk that is bounded away from zero on I, k = 1, 2. Let K be a symmetric Lipschitz continuous kernel density with compact support [−1, 1], a = aN , bk = bk,n , ck = ck,n , and dk = dk,n , be bandwidth k k k sequences. Let Kh (y) := K(y/h)/h, y ∈ R, h = a, bk , ck , dk . The estimators of regression functions and the constructed responses are, respectively, µk (x) := ˆ nk i=1 δk,i Yk,i Kbk (x − Xk,i ) , nk i=1 δk,i Kbk (x − Xk,i ) ˆ Yk,i := δk,i Yk,i + (1 − δk,i )ˆk (Xk,i ), µ 52 1 ≤ i ≤ nk , k = 1, 2. For each k = 1, 2, let vk be a non-negative estimate of vk := ˆ √ u/gk which vanishes off I. 
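The imputation step here parallels (1.2) in Chapter 1, with the parametric fit replaced by the kernel estimator µ̂k computed from the observed responses of the k-th sample. A minimal sketch (ours; the names, the kernel choice, and the numerical safeguard are illustrative) is:

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def mu_hat(x, X, Y, delta, b):
    # Kernel regression estimate of mu_k(x) using only the observed (delta = 1) pairs.
    w = delta * epanechnikov((x - X) / b)
    return np.sum(w * np.nan_to_num(Y)) / max(np.sum(w), 1e-12)

def impute_sample(X, Y, delta, b):
    # Completed responses Y^_{k,i} = delta_{k,i} Y_{k,i} + (1 - delta_{k,i}) mu^_k(X_{k,i}).
    fitted = np.array([mu_hat(x, X, Y, delta, b) for x in X])
    return np.where(delta == 1, Y, fitted)

# Applied separately to the two samples (X1, Y1, delta1) and (X2, Y2, delta2), with their
# own bandwidths b1 and b2, this yields the completed responses entering T and T^ below.
```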
The covariate-matched statistic and the adaptive version with responses missing at random, respectively, are 1 T := n1 n2 n1 n2 v1 (X1,i )v2 (X2,j )(Y1,i − Y2,j )Ka (X1,i − X2,j ), i=1 j=1 and ˆ T := 1 n1 n2 n1 n2 ˆ ˆ v1 (X1,i )ˆ2 (X2,j )(Y1,i − Y2,j )Ka (X1,i − X2,j ). ˆ v i=1 j=1 The needed assumptions and conditions to state the main results are given in Section 2.2. ˆ Section 2.3 states the asymptotic normality of T under H0 and H1N , and the consistency of ˆ the test based on T . The optimal u to maximize the asymptotic power against H1N is also ˆ discussed. Section 2.4 gives the estimates needed to construct T and the corresponding test. Simulation studies are set in Section 2.5. 2.2 Assumptions In this section we shall state the needed assumptions. The following assumptions are similar to those in K-S. For each k = 1, 2, (e1) (Xk,i , δk,i Yk,i ) : Xk,i ∈ R, Yk,i ∈ R, δk,i = 0 or 1, i = 1, 2, · · · , nk , are i.i.d. random vectors with δk,i = 1, if Yk,i is observed, and δk,i = 0, if Yk,i is missing; µk (x) = E(Yk,1 |Xk,1 = x), x ∈ R, εk,i = Yk,i − µk (Xk,i ), δk,i and εk,i are conn n 1 2 ditionally independent, given Xk,i . {(X1,i , δ1,i Y1,i )}i=1 and {(X2,j , δ2,j Y2,j )}j=1 are 53 independent. 2 (e2) Eε2 < ∞, σk (x) := E(ε2 |Xk,1 = x) and ∆k (x) := E(δk,1 = 1|Xk,1 = x) are k,1 k,1 continuous and positive on I. 4 (e3) νk (x) := E(ε4 |Xk,1 ), x ∈ R, is bounded on an open interval containing I. k,1 2 (e4) σk and ∆k are twice continuously differentiable on I. (g1) The design variable Xk,1 has a bounded Lebesgue density gk which is continuous and positive on I. (g2) The density g is twice continuously differentiable on I. (k) The kernel w is symmetric square integrable continuous density with compact support [−1, 1]. In addition, w satisfies Lipschitz-continuity of order 1. (m) µ1 is continuous. µ2 is Lipschitz-continuous of order 1 with Lipschitz constant µ2 . (p) φ is a non-negative continuous function. (q) ξ is a non-negative continuous function with ξ(x) > 0 for at least one x ∈ I. (u) U is the set of all non-negative functions that vanish off I and whose restrictions to I are continuous. (w1) a2 N → 0, aN η1 → ∞, for some η1 ∈ (1/2, 1). η (w2) b2 nk → 0, bk nk2 → ∞, for some η2 ∈ (1/2, 1). k η (w3) ck → 0, dk → 0, (ck +dk )nk3 → ∞ for some η3 ∈ (0, 1/2), (c5 +d5 )nk (log nk )−1 ≤ k k C for some C < ∞. (z) {Ik,1 , · · · , Ik,B } partitions I into disjoint intervals of equal length πk , with πk → 0 k 1/2 and nk πk → ∞. 54 2 Note that (e2) and (g1) imply that for each k = 1, 2, the functions gk , σk , and ∆k , are bounded and uniformly continuous on the compact interval I, and bounded away from zero on I. Rewrite H1 into the form: H1 : µ1 = µ2 + ξ, where ξ satisfies (q) and u(x)ξ(x)dx > 0, u ∈ U. (2.2) To state the main results, we need the following set of additional conditions on estimators. They are motivated by Schick (1987), and proposed in K-S as Definition 2.1, Assumption 2.3, and Lemma 2.4, for the case of complete responses. These conditions are reproduced as follows, only with changes from the case of complete responses to data missing at random setup. We need these conditions not only under H0 and H1N in (2.1), but also under H1 in (2.2). Let X := (X1,1 , · · · , X1,n1 , X2,1 , · · · , X2,n2 ), (2.3) δ := (δ1,1 , · · · , δ1,n1 , δ2,1 , · · · , δ2,n2 ), Y := (Y1,1 , · · · , Y1,n1 , Y2,1 , · · · , Y2,n2 ), rk (x) = u(x)/gk (x), x ∈ I. and Yk,j be the vector obtained from Y by removing Yk,j , j = 1, · · · , nk , k = 1, 2. Definition 2.2.1. 
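Before turning to the asymptotic theory, the following sketch (ours, not from the thesis) indicates how the covariate-matched statistic T̂ of Section 2.1 could be computed from the completed samples, once estimates v̂1, v̂2 of the weight functions are available; Section 2.4 discusses such estimates. The helper names are illustrative.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def covariate_matched_stat(X1, Y1c, X2, Y2c, v1, v2, a):
    # T^ = (n1 n2)^{-1} sum_i sum_j v^_1(X1i) v^_2(X2j) (Y^_1i - Y^_2j) K_a(X1i - X2j).
    n1, n2 = len(X1), len(X2)
    diff_x = X1[:, None] - X2[None, :]              # n1 x n2 matrix of X1i - X2j
    Ka = epanechnikov(diff_x / a) / a               # K_a(X1i - X2j)
    diff_y = Y1c[:, None] - Y2c[None, :]            # Y^_1i - Y^_2j
    weights = v1(X1)[:, None] * v2(X2)[None, :]     # v^_1(X1i) v^_2(X2j)
    return np.sum(weights * diff_y * Ka) / (n1 * n2)

# With Y1c, Y2c the completed responses from the sketch above and v1, v2 either the known
# v_k or their estimates v^_k, the one-sided test rejects H0 for large values of
# sqrt(N) * T^ standardized by an estimate of tau in (2.17), N = n1 n2 / (n1 + n2).
```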
The estimator rk of rk is said to be consistent and cross-validated on I ˆ for the function rk (short CCV on I for rk ) if the following two conditions hold: 55 n N k 1I (Xk,i )E[{ˆk (Xk,i ) − rk (Xk,i )}2 |X, δ] = op (1), r n2 i=1 k N max sup E[{ˆk (x) − E[ˆk (x)|X, δ, Yk,j ]}2 |X, δ] = op (1). r r 1≤j≤nk x∈I (2.4) (2.5) We say rk is a modification of rk if P (supx∈I |˜k (x) − rk (x)| > 0) → 0. We say rk is ˜ ˆ r ˆ ˆ essentially CCV on I for rk if there exists a modification of rk which is CCV on I for rk . ˆ Assumption 2.2.1. The estimate rk is essentially CCV on I for rk for k = 1, 2. ˆ Lemma 2.2.1. Suppose there are modifications vk of vk such that, for k = 1, 2 and l = 1, 2, ˜ ˆ 0 ≤ vk (x) ≤ M, ˜ x ∈ I, (2.6) for some finite constant M , and such that 1 nl nl E[{˜k (Xl,i ) − vk (Xl,i )}2 |X, δ] = op (1), v (2.7) i=1 N max sup E[{˜k (x) − E[˜k (x)|X, δ, Yl,i ]}2 |X, δ] = op (1). v v 1≤i≤nl x∈I (2.8) Then Assumption 2.2.1 holds. The proof of Lemma 2.2.1 follows that of Lemma 2.4 in K-S, only with changes from X to (X, δ). Since this proof does not involve the responses Y but only the designs (X, δ), the above lemma holds under H0 , H1N , and H1 . Remark 2.2.1. Suppose modifications vk of vk exist and satisfy (2.6)-(2.8), k = 1, 2. K-S ˜ ˆ 56 show in their proof of Lemma 2.4 that the estimators 1 r1 (x) := v1 (x) ˆ ˆ n2 r2 (x) := v2 (x) ˆ ˆ 1 n1 n2 v2 (X2,j )Ka (x − X2,j ), ˆ and v1 (X1,i )Ka (x − X1,i ), ˆ x ∈ R, (2.9) j=1 n1 i=1 are essentially CCV on I for r1 , and r2 , respectively, and their respective modifications can be chosen as 1 r1 (x) = v1 (x) ˜ ˜ n2 r2 (x) = v2 (x) ˜ ˜ 1 n1 n2 v2 (X2,j )Ka (x − X2,j ), ˜ (2.10) j=1 n1 v1 (X1,i )Ka (x − X1,i ). ˜ i=1 We also need the following notation and results in the proofs later. Let hk (x) := ∆k (x)gk (x), 1 ˆ hk (x) := nk λk := inf hk (x), x∈I nk δk,l Kb (x − Xk,l ), k l=1 k = 1, 2. 1 gk (x) := ˆ nk (2.11) nk Kb (x − Xk,l ). k l=1 Lemma 2.2.2. Let tk = tnk , k = 1, 2, be bandwidths satisfying tk → 0 and nk t5 (log nk )−1 ≤ k C for some C < ∞. Assume (e2), (e4), (g1), and (g2) hold. Then the following hold. 1 sup x∈I nk nk Ktk (x − Xk,i ) − gk (x) = op (1). (2.12) i=1 nk 1 δk,i Ktk (x − Xk,i ) − hk (x) = op (1). x∈I nk i=1 sup 57 (2.13) 1 sup x∈I nk nk εk,i δk,i Ktk (x − Xk,i ) = op (1). (2.14) i=1 nk 1 ε2 δk,i Ktk (x − Xk,i ) − Eε2 δk,1 Ktk (x − Xk,1 ) = op (1). k,i k,1 nk x∈I i=1 nk 1 nk i=1 δk,i Ktk (x − Xk,i ) sup − ∆k (x) = op (1). nk 1 Ktk (x − Xk,i ) x∈I n i=1 sup (2.15) (2.16) k This lemma is obtained from Theorem 3 of Collomb and H¨rdle (1986). a 2.3 Asymptotic distribution of the test statistic under H0, H1N, and H1 ˆ In this section we discuss the asymptotic distribution of T against H1N in Theorem 2.3.1. The asymptotic null distribution is included because the choice φ = 0 corresponds to the ˆ null hypothesis. The asymptotic behavior of T against H1 is given in Theorem 2.3.2, while consistency of the corresponding test against H1 is stated in Remark 2.3.1. K-S propose an optimal u to test H0 against H1N when data is complete. When responses are missing at random, a similar optimal u can be derived. This result is given in Remark 2.3.1. The following definitions are used in the theorems and remarks below. n2 N = , n1 n1 + n2 2 σ1 (x) ψ1 (x) := , ∆1 (x)g1 (x) q1 := τ 2 := N n1 = , n2 n1 + n2 2 σ2 (x) ψ2 (x) := , ∆2 (x)g2 (x) q2 := u2 (x)[q1 ψ1 (x) + q2 ψ2 (x)]dx, 58 D := (2.17) x ∈ R, u(x)(µ1 (x) − µ2 (x))dx. Theorem 2.3.1. Assume that (e1), (e2), (e4), (g1), (g2), (k), (m), (p), (u), (w1), (w2), and Assumption 2.2.1 hold. 
Theorem 2.3.1. Assume that (e1), (e2), (e4), (g1), (g2), (k), (m), (p), (u), (w1), (w2), and Assumption 2.2.1 hold. Then, under $H_{1N}$ of (2.1),
$$ N^{1/2}\Big(\hat T - D - \Big[\frac{1}{n_1}\sum_{i=1}^{n_1}\frac{u(X_{1,i})}{\Delta_1(X_{1,i})\, g_1(X_{1,i})}\,\delta_{1,i}\varepsilon_{1,i} - \frac{1}{n_2}\sum_{j=1}^{n_2}\frac{u(X_{2,j})}{\Delta_2(X_{2,j})\, g_2(X_{2,j})}\,\delta_{2,j}\varepsilon_{2,j}\Big]\Big) = o_p(1), $$
as both sample sizes $n_1$ and $n_2$ tend to infinity. Consequently, under $H_{1N}$,
$$ N^{1/2}(\hat T - D) \to_d N(0, \tau^2), \qquad \text{as } n_1\wedge n_2 \to\infty. $$

Proof. Recall $r_k$ from (2.3), $\hat r_k$ from (2.9), and $\hat g_k$, $\hat h_k$ from (2.11), $k = 1, 2$. For $x\in\mathbb{R}$ and $k, m = 1, 2$, let
$$ \bar\mu_{k,m}(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\mu_m(X_{k,l})\,\delta_{k,l} K_{b_k}(x - X_{k,l})\Big/\hat h_k(x), \quad \bar\varepsilon_k(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\varepsilon_{k,l}\,\delta_{k,l} K_{b_k}(x - X_{k,l})\Big/\hat h_k(x), \quad \bar\phi_k(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\phi(X_{k,l})\,\delta_{k,l} K_{b_k}(x - X_{k,l})\Big/\hat h_k(x). $$
Suppose $H_{1N}$ holds. With the above definitions, write $\hat T = A_1 + B_1 - B_2 + C_1 - C_2 + R_1 + R_2$, where
$$ A_1 := \frac{1}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\hat v_1(X_{1,i})\hat v_2(X_{2,j})\,\big[\mu_2(X_{1,i}) - \mu_2(X_{2,j})\big]\, K_a(X_{1,i} - X_{2,j}), $$
$$ B_1 := \frac{1}{n_1}\sum_{i=1}^{n_1}\hat r_1(X_{1,i})(1 - \delta_{1,i})\,\big[\bar\mu_{1,2}(X_{1,i}) - \mu_2(X_{1,i})\big], \qquad B_2 := \frac{1}{n_2}\sum_{j=1}^{n_2}\hat r_2(X_{2,j})(1 - \delta_{2,j})\,\big[\bar\mu_{2,2}(X_{2,j}) - \mu_2(X_{2,j})\big], $$
$$ C_1 := \frac{1}{n_1}\sum_{i=1}^{n_1}\hat r_1(X_{1,i})\,\big[\delta_{1,i}\varepsilon_{1,i} + (1 - \delta_{1,i})\bar\varepsilon_1(X_{1,i})\big], \qquad C_2 := \frac{1}{n_2}\sum_{j=1}^{n_2}\hat r_2(X_{2,j})\,\big[\delta_{2,j}\varepsilon_{2,j} + (1 - \delta_{2,j})\bar\varepsilon_2(X_{2,j})\big], $$
$$ R_1 := \frac{N^{-1/2}}{n_1}\sum_{i=1}^{n_1}\hat r_1(X_{1,i})\,\phi(X_{1,i}), \qquad R_2 := \frac{N^{-1/2}}{n_1}\sum_{i=1}^{n_1}\hat r_1(X_{1,i})(1 - \delta_{1,i})\,\big[\bar\phi_1(X_{1,i}) - \phi(X_{1,i})\big]. $$
In the following, we shall show that
$$ N^{1/2} A_1 = o_p(1); \qquad N^{1/2} B_k = o_p(1), \quad k = 1, 2; \tag{2.18} $$
$$ N^{1/2} C_k = \frac{N^{1/2}}{n_k}\sum_{i=1}^{n_k}\frac{r_k(X_{k,i})}{\Delta_k(X_{k,i})}\,\delta_{k,i}\varepsilon_{k,i} + o_p(1), \quad k = 1, 2; \tag{2.19} $$
$$ N^{1/2} R_1 = N^{1/2} D + o_p(1); \qquad N^{1/2} R_2 = o_p(1). \tag{2.20} $$
Claim (2.18) is proved by an argument similar to the proof of Theorem 2.6 in K-S, and some details of the proof of (2.19) are also inspired by that proof.

Recall the Lipschitz constant $\dot\mu_2$ of $\mu_2$ from condition (m). By (g1), (m), (u), (w1), Assumption 2.2.1, and routine calculations, one has
$$ N^{1/2}|A_1| \le N^{1/2}\,\dot\mu_2\, a\,\frac{1}{n_1}\sum_{i=1}^{n_1}\hat r_1(X_{1,i}) = o_p(1). $$
From (g1), (m), (u), (w2), Assumption 2.2.1, and the fact that
$$ \big|\bar\mu_{k,2}(X_{k,i}) - \mu_2(X_{k,i})\big| \le \frac{1}{n_k}\sum_{l=1}^{n_k}\big|\mu_2(X_{k,l}) - \mu_2(X_{k,i})\big|\,\delta_{k,l} K_{b_k}(X_{k,i} - X_{k,l})\Big/\hat h_k(X_{k,i}) \le \dot\mu_2\, b_k, \quad k = 1, 2, $$
one obtains
$$ N^{1/2}|B_k| \le N^{1/2}\,\dot\mu_2\, b_k\,\frac{1}{n_k}\sum_{i=1}^{n_k}\hat r_k(X_{k,i}) = o_p(1), \quad k = 1, 2. $$
For each $k = 1, 2$, note that
$$ C_k = \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}\Big[\frac{1}{n_k}\sum_{l=1}^{n_k}\frac{\hat r_k(X_{k,l})(1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l})}{\hat h_k(X_{k,l})} + \delta_{k,i}\,\hat r_k(X_{k,i})\Big]. $$
Write $C_k = C_{k,1} + C_{k,2} + C_{k,3} + C_{k,4}$, where
$$ C_{k,1} := \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}\Big[\frac{1}{n_k}\sum_{l=1}^{n_k}\frac{r_k(X_{k,l})(1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l})}{h_k(X_{k,l})} - \frac{\delta_{k,i}(1 - \Delta_k(X_{k,i}))\, r_k(X_{k,i})}{\Delta_k(X_{k,i})}\Big], $$
$$ C_{k,2} := \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}\,\frac{1}{n_k}\sum_{l=1}^{n_k}\Big[\frac{\hat r_k(X_{k,l})}{\hat h_k(X_{k,l})} - \frac{r_k(X_{k,l})}{h_k(X_{k,l})}\Big](1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l}), $$
$$ C_{k,3} := \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}\delta_{k,i}\,\big[\hat r_k(X_{k,i}) - r_k(X_{k,i})\big], \qquad C_{k,4} := \frac{1}{n_k}\sum_{i=1}^{n_k}\frac{\varepsilon_{k,i}\delta_{k,i}\, r_k(X_{k,i})}{\Delta_k(X_{k,i})}. $$
For $i, l = 1, \ldots, n_k$, $k = 1, 2$, let
$$ I_{k,i,l} := \frac{r_k(X_{k,l})(1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l})}{h_k(X_{k,l})}, \qquad J_{k,i} := \frac{\delta_{k,i}(1 - \Delta_k(X_{k,i}))\, r_k(X_{k,i})}{\Delta_k(X_{k,i})}. $$
By (e1), (e2), (g1), (u), (w2), and routine calculations, one has $E(N^{1/2} C_{k,1}) = 0$ and
$$ \mathrm{Var}(N^{1/2} C_{k,1}) = \frac{N}{n_k}\, E\Big[\varepsilon_{k,1}^2\Big(\frac{1}{n_k}\sum_{l=1}^{n_k} I_{k,1,l} - J_{k,1}\Big)^2\Big] = \frac{N}{n_k}\, E\Big[\varepsilon_{k,1}^2\Big(\frac{n_k - 1}{n_k^2}\, I_{k,1,2}^2 + \frac{(n_k - 1)(n_k - 2)}{n_k^2}\, I_{k,1,2} I_{k,1,3} - \frac{2(n_k - 1)}{n_k}\, J_{k,1} I_{k,1,2} + J_{k,1}^2\Big)\Big] $$
$$ = \frac{N}{n_k}\Big[\frac{n_k - 1}{n_k^2\, b_k}\int\!\!\int\sigma_k^2(x)\Delta_k(x) g_k(x)\, r_k^2(x + b_k u)\,(1 - \Delta_k(x + b_k u))\,(\Delta_k(x + b_k u))^{-2}\,(g_k(x + b_k u))^{-1}\, K^2(u)\,du\,dx $$
$$ \qquad + \frac{(n_k - 1)(n_k - 2)}{n_k^2}\int\!\!\int\!\!\int\sigma_k^2(x)\Delta_k(x) g_k(x)\, r_k(x + b_k u)(1 - \Delta_k(x + b_k u))(\Delta_k(x + b_k u))^{-1}\, r_k(x + b_k v)(1 - \Delta_k(x + b_k v))(\Delta_k(x + b_k v))^{-1}\, K(u) K(v)\,du\,dv\,dx $$
$$ \qquad - \frac{2(n_k - 1)}{n_k}\int\!\!\int\sigma_k^2(x)\, r_k(x)(1 - \Delta_k(x))\, g_k(x)\, r_k(x + b_k u)(1 - \Delta_k(x + b_k u))(\Delta_k(x + b_k u))^{-1}\, K(u)\,du\,dx + \int\sigma_k^2(x)\, r_k^2(x)\,(1 - \Delta_k(x))^2(\Delta_k(x))^{-1}\, g_k(x)\,dx\Big] \to 0, \quad k = 1, 2. $$
Hence $N^{1/2} C_{k,1} = o_p(1)$, $k = 1, 2$.
Recall the modification $\tilde r_k$ of $\hat r_k$ defined in (2.10), which is CCV on $I$ for $r_k$. For $i, j, m = 1, \ldots, n_k$, $k = 1, 2$, let
$$ \tilde r_{k,i}(x) := E[\tilde r_k(x)\mid X, \delta, Y_{k,i}], \qquad \tilde r_{k,i,j}(x) := E[\tilde r_{k,i}(x)\mid X, \delta, Y_{k,j}], $$
$$ \hat M_{k,i} := \frac{1}{n_k}\sum_{l=1}^{n_k}\Big[\frac{\hat r_k(X_{k,l})}{\hat h_k(X_{k,l})} - \frac{r_k(X_{k,l})}{h_k(X_{k,l})}\Big](1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l}), \qquad \tilde M_{k,i} := \frac{1}{n_k}\sum_{l=1}^{n_k}\Big[\frac{\tilde r_k(X_{k,l})}{\hat h_k(X_{k,l})} - \frac{r_k(X_{k,l})}{h_k(X_{k,l})}\Big](1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l}), $$
$$ \tilde M_{k,i;j} := E[\tilde M_{k,i}\mid X, \delta, Y_{k,j}], \qquad \tilde M_{k,i;j,m} := E[\tilde M_{k,i;j}\mid X, \delta, Y_{k,m}]. $$
Then we have
$$ C_{k,2} = \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}\tilde M_{k,i;i} + \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}(\tilde M_{k,i} - \tilde M_{k,i;i}) + \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}(\hat M_{k,i} - \tilde M_{k,i}) = C_{k,2,1} + C_{k,2,2} + C_{k,2,3}, \ \text{say}. $$
For each $k = 1, 2$, let $Q_{k,l;i,j} := E[(\tilde r_{k,i}(X_{k,l}) - \tilde r_{k,i,j}(X_{k,l}))^2\mid X, \delta]$, $l, i, j = 1, \ldots, n_k$. By the C-S inequality, one has
$$ S_{k,1} := \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E[(\tilde M_{k,i;i} - \tilde M_{k,i;i,j})^2\mid X, \delta] = \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E\Big[\Big(\frac{1}{n_k}\sum_{l=1}^{n_k}\frac{\tilde r_{k,i}(X_{k,l}) - \tilde r_{k,i,j}(X_{k,l})}{\hat h_k(X_{k,l})}\,(1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l})\Big)^2\,\Big|\, X, \delta\Big] $$
$$ \le \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E\Big[\frac{1}{n_k}\sum_{l=1}^{n_k}\frac{(\tilde r_{k,i}(X_{k,l}) - \tilde r_{k,i,j}(X_{k,l}))^2}{\hat h_k^2(X_{k,l})}\, K_{b_k}(X_{k,i} - X_{k,l})\,\Big|\, X, \delta\Big]\times\frac{1}{n_k}\sum_{l=1}^{n_k}(1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l}) \le \sup_{x\in I}\hat g_k(x)\,\frac{N}{n_k^3}\sum_{l=1}^{n_k}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k}\frac{Q_{k,l;i,j}\, K_{b_k}(X_{k,i} - X_{k,l})}{\hat h_k^2(X_{k,l})} = S_{k,1,1} + S_{k,1,2}, \ \text{say}, $$
where $S_{k,1,1}$ and $S_{k,1,2}$ denote the last bound multiplied by the indicators $I[\cap_{m=1}^{n_k}\{\hat h_k(X_{k,m}) \ge \lambda_k/2\}]$ and $I[\cup_{m=1}^{n_k}\{\hat h_k(X_{k,m}) < \lambda_k/2\}]$, respectively.

By Assumption 2.2.1, (2.5), (e1), and the C-S inequality, one obtains
$$ \sup_{1\le i,l\le n_k}\frac{N}{n_k}\sum_{j=1}^{n_k} Q_{k,l;i,j} \le \sup_{1\le i,l\le n_k}\frac{N}{n_k}\sum_{j=1}^{n_k} E\big[E\big(\{\tilde r_k(X_{k,l}) - \tilde r_{k,j}(X_{k,l})\}^2\mid X, \delta, Y_{k,i}\big)\mid X, \delta\big] \le N\max_{1\le j\le n_k}\sup_{x\in I} E[\{\tilde r_k(x) - \tilde r_{k,j}(x)\}^2\mid X, \delta] = o_p(1). \tag{2.21} $$
Equation (2.12) in Lemma 2.2.2 shows $\sup_{x\in I}\hat g_k(x) = O_p(1)$. Together with (2.11) and (2.21), we have
$$ S_{k,1,1} \le \Big[\frac{1}{n_k^2}\sum_{l=1}^{n_k}\sum_{i=1}^{n_k} K_{b_k}(X_{k,i} - X_{k,l})\Big]\Big[\sup_{1\le i,l\le n_k}\frac{N}{n_k}\sum_{j=1}^{n_k} Q_{k,l;i,j}\Big]\,\frac{\sup_{x\in I}\hat g_k(x)\, I[\cap_{m=1}^{n_k}\{\hat h_k(X_{k,m}) \ge \lambda_k/2\}]}{(\lambda_k/2)^2} \le \frac{\big[\sup_{x\in I}\hat g_k(x)\big]^2}{(\lambda_k/2)^2}\,\sup_{1\le i,l\le n_k}\frac{N}{n_k}\sum_{j=1}^{n_k} Q_{k,l;i,j} = o_p(1). $$
Equation (2.13) in Lemma 2.2.2 yields
$$ P\Big(\bigcup_{i=1}^{n_k}\{\hat h_k(X_{k,i}) < \lambda_k/2\}\Big) \le P\Big(\max_{1\le i\le n_k}|\hat h_k(X_{k,i}) - h_k(X_{k,i})| > \lambda_k/2\Big) \le P\Big(\sup_{x\in I}|\hat h_k(x) - h_k(x)| > \lambda_k/2\Big) \to 0. $$
Together with the fact that $\cup_{i=1}^{n_k}\{\hat h_k(X_{k,i}) < \lambda_k/2\}\in\sigma(X, \delta)$, this gives $S_{k,1,2} = o_p(1)$. Thus $S_{k,1} = o_p(1)$, $k = 1, 2$.

Let $D_i := \varepsilon_{k,i}\tilde M_{k,i;i}$ and $D_{i,j} := E[D_i\mid X, \delta, Y_{k,j}]$, $i, j = 1, \ldots, n_k$, $k = 1, 2$. Note that, by (e1), $D_{i,i} = 0$ and $E[D_i D_j\mid X, \delta] = E[(D_i - D_{i,j})(D_j - D_{j,i})\mid X, \delta]$. From (e2), one has
$$ E[(N^{1/2} C_{k,2,1})^2\mid X, \delta] = \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E[(D_i - D_{i,j})(D_j - D_{j,i})\mid X, \delta] \le \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E[(D_i - D_{i,j})^2\mid X, \delta] $$
$$ = \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E[\varepsilon_{k,i}^2(\tilde M_{k,i;i} - \tilde M_{k,i;i,j})^2\mid X, \delta] \le S_{k,1}\,\sup_{x\in I}\sigma_k^2(x) = o_p(1), \quad k = 1, 2. $$
Thus $N^{1/2} C_{k,2,1} = o_p(1)$, $k = 1, 2$. By an argument similar to the one showing $S_{k,1} = o_p(1)$, one has
$$ S_{k,2} := \frac{N}{n_k}\sum_{i=1}^{n_k} E[(\tilde M_{k,i} - \tilde M_{k,i;i})^2\mid X, \delta] = o_p(1), \quad k = 1, 2. $$
This, together with (e2) and the C-S inequality, leads to
$$ (N^{1/2} C_{k,2,2})^2 \le \Big[\frac{N}{n_k}\sum_{i=1}^{n_k}(\tilde M_{k,i} - \tilde M_{k,i;i})^2\Big]\Big[\frac{1}{n_k}\sum_{j=1}^{n_k}\varepsilon_{k,j}^2\Big] = o_p(1), \quad k = 1, 2. $$
Since $P(|N^{1/2} C_{k,2,3}| > 0) \le P(\sup_{x\in I}|\tilde r_k(x) - \hat r_k(x)| > 0) \to 0$, we have $N^{1/2} C_{k,2,3} = o_p(1)$, $k = 1, 2$.
Therefore one obtains $N^{1/2} C_{k,2} = o_p(1)$, $k = 1, 2$. By an argument similar to the proof of Theorem 2.6 in K-S, $N^{1/2} C_{k,3} = o_p(1)$ can be derived. Then one has
$$ N^{1/2} C_k = \frac{N^{1/2}}{n_k}\sum_{i=1}^{n_k}\frac{r_k(X_{k,i})}{\Delta_k(X_{k,i})}\,\delta_{k,i}\varepsilon_{k,i} + o_p(1), \quad k = 1, 2. $$
Furthermore, by Assumption 2.2.1, (2.4), (p), the C-S inequality, and the Law of Large Numbers, one obtains
$$ N^{1/2} R_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} r_1(X_{1,i})\,\phi(X_{1,i}) + \frac{1}{n_1}\sum_{i=1}^{n_1}\big(\tilde r_1(X_{1,i}) - r_1(X_{1,i})\big)\,\phi(X_{1,i}) + \frac{1}{n_1}\sum_{i=1}^{n_1}\big(\hat r_1(X_{1,i}) - \tilde r_1(X_{1,i})\big)\,\phi(X_{1,i}) = N^{1/2} D + o_p(1), $$
and
$$ N^{1/2}|R_2| \le \Big[\frac{1}{n_1}\sum_{i=1}^{n_1}\hat r_1^2(X_{1,i})\Big]^{1/2}\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}\big(\bar\phi_1(X_{1,i}) - \phi(X_{1,i})\big)^2\Big]^{1/2}. $$
Combined with the following result,
$$ \frac{1}{n_1}\sum_{i=1}^{n_1}\big(\bar\phi_1(X_{1,i}) - \phi(X_{1,i})\big)^2 = \frac{1}{n_1}\sum_{i=1}^{n_1}\Big[\frac{1}{n_1}\sum_{l=1}^{n_1}\big(\phi(X_{1,l}) - \phi(X_{1,i})\big)\,\delta_{1,l} K_{b_1}(X_{1,i} - X_{1,l})\Big/\hat h_1(X_{1,i})\Big]^2 \le \frac{1}{n_1}\sum_{i=1}^{n_1}\frac{1}{n_1}\sum_{l=1}^{n_1}\big(\phi(X_{1,l}) - \phi(X_{1,i})\big)^2\,\delta_{1,l} K_{b_1}(X_{1,i} - X_{1,l})\Big/\hat h_1(X_{1,i}) $$
$$ = \frac{1}{n_1^2}\sum_{i=1}^{n_1}\sum_{l=1}^{n_1}\big(\phi(X_{1,l}) - \phi(X_{1,i})\big)^2\,\delta_{1,l} K_{b_1}(X_{1,i} - X_{1,l})\Big/\hat h_1(X_{1,i})\,\Big\{I\Big[\bigcap_{m=1}^{n_1}\{\hat h_1(X_{1,m}) \ge \lambda_1/2\}\Big] + I\Big[\bigcup_{m=1}^{n_1}\{\hat h_1(X_{1,m}) < \lambda_1/2\}\Big]\Big\} = o_p(1) + o_p(1) = o_p(1), $$
we have $N^{1/2} R_2 = o_p(1)$. Therefore, one obtains
$$ N^{1/2}\hat T = N^{1/2}\Big[ D + \frac{1}{n_1}\sum_{i=1}^{n_1}\frac{r_1(X_{1,i})}{\Delta_1(X_{1,i})}\,\delta_{1,i}\varepsilon_{1,i} - \frac{1}{n_2}\sum_{j=1}^{n_2}\frac{r_2(X_{2,j})}{\Delta_2(X_{2,j})}\,\delta_{2,j}\varepsilon_{2,j}\Big] + o_p(1). $$
This completes the proof.

Theorem 2.3.2. Suppose (e1), (e2), (e4), (g1), (g2), (k), (m), (p), (u), (w1), (w2), and Assumption 2.2.1 hold. Then, under $H_1$ of (2.2), one has $N^{1/2}\hat T \to_p \infty$.

The proof of Theorem 2.3.2 is similar to that of Theorem 2.3.1, the only difference being that $N^{1/2}(R_1 + R_2) \to_p \infty$ under $H_1$.

Remark 2.3.1. Let $\gamma := \int u(x)\phi(x)\,dx\,/\,\tau$. Assume that, under $H_0$, $H_1$, and $H_{1N}$, the assumptions of Theorem 2.3.1 hold and that there exists an estimator $\hat\tau^2$ of $\tau^2$ satisfying $\hat\tau^2 = \tau^2 + o_p(1)$. Then one has
$$ N^{1/2}\hat T/\hat\tau \to_d N(0, 1) \ \text{under } H_0, \qquad N^{1/2}\hat T/\hat\tau \to_d N(\gamma, 1) \ \text{under } H_{1N}, \qquad N^{1/2}\hat T/\hat\tau \to_p \infty \ \text{under } H_1. $$
Consequently, the asymptotic level of the test
$$ \hat V = I\{N^{1/2}\hat T/\hat\tau \ge z_\alpha\} \tag{2.22} $$
is $\alpha$. The asymptotic power of this test under $H_{1N}$ is $1 - \Phi(z_\alpha - \gamma)$. An application of the Cauchy-Schwarz (C-S) inequality shows that $\gamma$, and hence the asymptotic power, is maximized by the choice
$$ u = u_\phi := \frac{\phi\, 1_I}{q_1\psi_1 + q_2\psi_2}. $$
The maximal asymptotic power is $1 - \Phi(z_\alpha - \gamma_\phi)$, where
$$ \gamma_\phi := \Big(\int\frac{\phi^2(x)\, 1_I(x)}{q_1\psi_1(x) + q_2\psi_2(x)}\,dx\Big)^{1/2} $$
is the maximal value of $\gamma$. This result is similar to that for complete responses discussed in Remarks 2.8 and 2.9 of K-S. The only difference in the missing at random setup is that $\Delta_k(x)$ appears in the denominator of $\psi_k$, $k = 1, 2$; the result coincides with the complete-response case when $\Delta_k \equiv 1$, $k = 1, 2$.

2.4 Some suggested estimators

In this section we consider estimators of $v_k$ and $\tau^2$. K-S give such estimators for a given $u$ and for the (unknown) optimal $u$ when responses are complete, and discuss their properties. When responses are missing at random, similar estimators and properties remain valid. They are listed below for the sake of completeness.

The following discussion gives an estimator of $v_k$, $k = 1, 2$. Recall $\hat g_k$ and $\hat h_k$ from (2.11). When $u$ is known, consider
$$ \hat v_k := \sqrt{u}\,/\,\hat g_k, \qquad \tilde v_k := \sqrt{u}\,/\,(\hat g_k \vee \eta), \tag{2.23} $$
where $\eta$ is a positive number satisfying $g_k(x) > 4\eta$ for all $x\in I$. Then $\tilde v_k$ is a modification of $\hat v_k$ which satisfies the assumptions of Lemma 2.2.1, so that Assumption 2.2.1 holds.
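The following is a small sketch, under the stated assumptions only, of how the plug-in weight $\hat v_k$ and its truncated modification $\tilde v_k$ of (2.23) might be computed; the kernel and the truncation constant $\eta$ are user choices, and the function names are illustrative.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def g_hat(x, design, b):
    """Kernel density estimate g_hat_k(x) = n_k^{-1} sum_l K_b(x - X_{k,l})."""
    return epanechnikov((x[:, None] - design[None, :]) / b).mean(axis=1) / b

def v_hat(x, design, b, u, eta=None):
    """Plug-in weight sqrt(u)/g_hat_k of (2.23); if eta is supplied, the
    truncated modification sqrt(u)/(g_hat_k v eta) is returned instead."""
    g = g_hat(x, design, b)
    if eta is not None:
        g = np.maximum(g, eta)
    return np.sqrt(u(x)) / g
```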
When $u = u_\phi$ with a known non-negative continuous function $\phi$, let $c_k$ and $d_k$ be bandwidths satisfying (w3), and consider
$$ \hat\mu_{k,c}(x) := \frac{\sum_{j=1}^{n_k} Y_{k,j}\,\delta_{k,j} K_{c_k}(x - X_{k,j})}{\sum_{j=1}^{n_k}\delta_{k,j} K_{c_k}(x - X_{k,j})}, \qquad \hat\sigma_k^2(x) := \frac{\sum_{j=1}^{n_k}\big(Y_{k,j} - \hat\mu_{k,c}(X_{k,j})\big)^2\,\delta_{k,j} K_{d_k}(x - X_{k,j})}{\sum_{j=1}^{n_k}\delta_{k,j} K_{d_k}(x - X_{k,j})}, \tag{2.24} $$
$$ \hat\psi_k := \frac{\hat\sigma_k^2}{\hat h_k}, \qquad \hat u_\phi := \frac{\phi\, 1_I}{q_1\hat\psi_1 + q_2\hat\psi_2}, \qquad \hat v_k := \frac{\sqrt{\hat u_\phi}}{\hat g_k}, \qquad x\in\mathbb{R}, \ k = 1, 2. $$
Arguing as in the estimation of $v_k$ for $u = u_\phi$ in Section 3 of K-S, one can find a modification $\tilde v_k$ of $\hat v_k$ which satisfies the assumptions of Lemma 2.2.1, so that Assumption 2.2.1 holds. The following lemma gives the needed properties of $\hat\sigma_k^2$.

Lemma 2.4.1. Suppose (e1), (e2), (e3), (e4), (g1), (g2), (k), (m), (p), (q), (u), and (w3) hold. Then, for each $k = 1, 2$,
$$ \sup_{x\in I}|\hat\sigma_k^2(x) - \sigma_k^2(x)| = o_p(1), \qquad \text{under } H_0, \ H_1, \text{ and } H_{1N}, \tag{2.25} $$
and $\hat\sigma_k^2$ is essentially CCV on $I$ for $\sigma_k^2$, under $H_0$, $H_1$, and $H_{1N}$.

Proof. First we prove (2.25) under $H_{1N}$; the case $\phi = 0$ yields the result under $H_0$. For $k = 1, 2$ and $x\in I$, define
$$ \hat h_{k,c}(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\delta_{k,l} K_{c_k}(x - X_{k,l}), \qquad \bar\mu_{k,c}(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\mu_k(X_{k,l})\,\delta_{k,l} K_{c_k}(x - X_{k,l})\Big/\hat h_{k,c}(x), \qquad \bar\varepsilon_{k,c}(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\varepsilon_{k,l}\,\delta_{k,l} K_{c_k}(x - X_{k,l})\Big/\hat h_{k,c}(x), $$
and define $\hat h_{k,d}(x)$, $\bar\mu_{k,d}(x)$, and $\bar\varepsilon_{k,d}(x)$ analogously with the bandwidth $d_k$ in place of $c_k$. One can write $\hat\sigma_k^2(x) - \sigma_k^2(x)$ as the sum of the following terms:
$$ Z_{k,1}(x) = \frac{1}{n_k}\sum_{j=1}^{n_k}\big(\mu_k(X_{k,j}) - \bar\mu_{k,c}(X_{k,j})\big)^2\,\delta_{k,j} K_{d_k}(x - X_{k,j})\Big/\hat h_{k,d}(x), $$
$$ Z_{k,2}(x) = \frac{1}{n_k}\sum_{j=1}^{n_k}\bar\varepsilon_{k,c}^2(X_{k,j})\,\delta_{k,j} K_{d_k}(x - X_{k,j})\Big/\hat h_{k,d}(x) - \frac{2}{n_k}\sum_{j=1}^{n_k}\varepsilon_{k,j}\,\bar\varepsilon_{k,c}(X_{k,j})\,\delta_{k,j} K_{d_k}(x - X_{k,j})\Big/\hat h_{k,d}(x) + \Big[\frac{1}{n_k}\sum_{j=1}^{n_k}\varepsilon_{k,j}^2\,\delta_{k,j} K_{d_k}(x - X_{k,j})\Big/\hat h_{k,d}(x) - \sigma_k^2(x)\Big] = Z_{k,2,1} - Z_{k,2,2} + Z_{k,2,3}, \ \text{say}, $$
$$ Z_{k,3}(x) = \frac{2}{n_k}\sum_{j=1}^{n_k}\big(\mu_k(X_{k,j}) - \bar\mu_{k,c}(X_{k,j})\big)\big(\varepsilon_{k,j} - \bar\varepsilon_{k,c}(X_{k,j})\big)\,\delta_{k,j} K_{d_k}(x - X_{k,j})\Big/\hat h_{k,d}(x). $$
By (m), (p), and (u), we have $\sup_{x\in I} Z_{k,1}(x) \le 2\dot\mu_2^2 c_k^2 + o_p(n_k^{-1}) = o_p(1)$. From (2.13) and (2.14) in Lemma 2.2.2,
$$ \sup_{x\in I} Z_{k,2,1}(x) \le \max_{1\le j\le n_k}\bar\varepsilon_{k,c}^2(X_{k,j}) \le \sup_{x\in I}\Big[\frac{h_k(x)}{\hat h_{k,c}(x)}\Big]^2\,\sup_{x\in I}\frac{\big[\frac{1}{n_k}\sum_{l=1}^{n_k}\varepsilon_{k,l}\delta_{k,l} K_{c_k}(x - X_{k,l})\big]^2}{h_k^2(x)} = o_p(1). $$
From (2.13) and (2.15) in Lemma 2.2.2, one obtains
$$ \sup_{x\in I}|Z_{k,2,3}(x)| \le \sup_{x\in I}\frac{\big|\frac{1}{n_k}\sum_{j=1}^{n_k}\varepsilon_{k,j}^2\delta_{k,j} K_{d_k}(x - X_{k,j}) - \sigma_k^2(x)\, h_k(x)\big|}{h_k(x)}\times\sup_{x\in I}\frac{h_k(x)}{\hat h_{k,d}(x)} + \sup_{x\in I}\sigma_k^2(x)\,\sup_{x\in I}\Big|\frac{h_k(x)}{\hat h_{k,d}(x)} - 1\Big| = o_p(1)\, O_p(1) + O(1)\, o_p(1) = o_p(1). $$
By the C-S inequality, $\sup_{x\in I}|Z_{k,2,2}(x)| = o_p(1)$ and $\sup_{x\in I}|Z_{k,3}(x)| = o_p(1)$. Therefore $\sup_{x\in I}|\hat\sigma_k^2(x) - \sigma_k^2(x)| = o_p(1)$ holds under $H_{1N}$.

Under $H_1$ of (2.2), the above proof remains the same except for the treatment of $Z_{1,1}(x)$. By (m), (q), (u), and the compactness of $I$,
$$ \sup_{x\in I} Z_{1,1}(x) \le \sup_{x\in I,\,0\le t\le c_1}\big(\mu_1(x) - \mu_1(x + t)\big)^2 \le \sup_{x\in I,\,0\le t_1\le c_1} 2\big(\mu_2(x) - \mu_2(x + t_1)\big)^2 + \sup_{x\in I,\,0\le t_2\le c_1} 2\big(\xi(x) - \xi(x + t_2)\big)^2 \le 2\dot\mu_2^2 c_1^2 + \sup_{x\in I,\,0\le t_2\le c_1} 2\big(\xi(x) - \xi(x + t_2)\big)^2 = o_p(1). $$
Therefore (2.25) holds under $H_1$. The remaining assertions of the lemma can be proved in a routine fashion. This completes the proof.

To estimate $\tau^2$, let $\{I_{k,1}, \ldots, I_{k,B_k}\}$ and $\pi_k$ be as in assumption (z). Define
$$ \hat\Delta_k(x) := \frac{\sum_{j=1}^{n_k}\delta_{k,j} K_{c_k}(x - X_{k,j})}{\sum_{j=1}^{n_k} K_{c_k}(x - X_{k,j})}, \quad x\in\mathbb{R}, \qquad \tilde g_k(x) := \frac{1}{n_k\pi_k}\sum_{j=1}^{n_k} 1\{X_{k,j}\in I_{k,i}\}, \quad x\in I_{k,i}, \quad k = 1, 2. $$
By Remark 3.2 in K-S, $\tilde g_k(x)$ is a simple bin estimate which, under condition (z), is uniformly consistent for $g_k(x)$, $x\in I$. Recall $\hat r_k$ from (2.9).
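A minimal sketch of how the complete-case kernel estimators entering (2.24) and the missingness-probability estimate $\hat\Delta_k$ might be computed is given below. It assumes the missing responses are stored as zeros in the vector $\delta_k Y_k$ (they are excluded through the indicators), and all function names are illustrative only.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def nw_complete_case(x, design, resp, miss, band):
    """delta-weighted Nadaraya-Watson smoother: only observed responses
    (miss == 1) enter the local average, as in mu_hat_{k,c} of (2.24).
    `resp` holds delta*Y, i.e. missing entries are set to 0."""
    k = epanechnikov((x[:, None] - design[None, :]) / band) / band
    return (k * miss * resp).sum(axis=1) / (k * miss).sum(axis=1)

def sigma2_hat(x, design, resp, miss, c, d):
    """Two-stage variance estimate sigma_hat_k^2 of (2.24): squared residuals
    from the bandwidth-c fit are smoothed with bandwidth d over observed cases."""
    resid2 = (resp - nw_complete_case(design, design, resp, miss, c))**2
    return nw_complete_case(x, design, resid2, miss, d)

def delta_hat(x, design, miss, c):
    """Kernel estimate of Delta_k(x) = P(delta = 1 | X = x)."""
    k = epanechnikov((x[:, None] - design[None, :]) / c) / c
    return (k * miss).sum(axis=1) / k.sum(axis=1)
```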
Because $\tau^2$ from (2.17) can be expressed as
$$ \tau^2 = q_1\int\frac{r_1^2(x)\,\sigma_1^2(x)\, g_1(x)}{\Delta_1(x)}\,dx + q_2\int\frac{r_2^2(x)\,\sigma_2^2(x)\, g_2(x)}{\Delta_2(x)}\,dx = q_1\int\frac{v_1^4(x)\,\sigma_1^2(x)\, g_1^3(x)}{\Delta_1(x)}\,dx + q_2\int\frac{v_2^4(x)\,\sigma_2^2(x)\, g_2^3(x)}{\Delta_2(x)}\,dx, $$
we consider two estimators of $\tau^2$:
$$ \hat\tau^2 := q_1\,\frac{1}{n_1}\sum_{i=1}^{n_1}\frac{\hat r_1^2(X_{1,i})\,\hat\sigma_1^2(X_{1,i})}{\hat\Delta_1(X_{1,i})} + q_2\,\frac{1}{n_2}\sum_{j=1}^{n_2}\frac{\hat r_2^2(X_{2,j})\,\hat\sigma_2^2(X_{2,j})}{\hat\Delta_2(X_{2,j})}, $$
$$ \hat\tau_*^2 := q_1\,\frac{1}{n_1}\sum_{i=1}^{n_1}\frac{\hat v_1^4(X_{1,i})\,\hat\sigma_1^2(X_{1,i})\,\tilde g_1^2(X_{1,i})}{\hat\Delta_1(X_{1,i})} + q_2\,\frac{1}{n_2}\sum_{j=1}^{n_2}\frac{\hat v_2^4(X_{2,j})\,\hat\sigma_2^2(X_{2,j})\,\tilde g_2^2(X_{2,j})}{\hat\Delta_2(X_{2,j})}. $$
These estimators have the following properties, which can be proved in a routine fashion.

Lemma 2.4.2. Suppose the assumptions of Lemma 2.4.1, (w1), and (z) hold. Then
$$ \hat\tau^2 = \tau^2 + o_p(1) \qquad \text{and} \qquad \hat\tau_*^2 = \tau^2 + o_p(1) $$
hold under $H_0$, $H_1$, and $H_{1N}$.

2.5 Simulations

In this section we investigate the empirical size and power of the test $\hat V$ defined in (2.22) against local and fixed alternatives. To be specific, let $I = [0, 1]$, and let $Z_1$ and $Z_2$ be independent standard normal random variables, independent of $\{X_1, X_2, \delta_1, \delta_2\}$. Recall $\hat u_\phi$ defined in (2.24). The design and error distributions, and the functions $\phi$, $\xi$, $u$, $\mu_2$, $\Delta_l$, $l = 1, 2$, are chosen as follows:
$$ X_1 \sim N(0, (0.7)^2), \quad X_2 \sim N(0, 1), \quad X_1 \text{ and } X_2 \text{ independent}; \qquad \varepsilon_1 = \frac{Z_1 X_1}{1 + X_1^2}, \quad \varepsilon_2 = Z_2(1 + X_2^2); $$
$$ \Delta_l(x) = D_l(x), \ l = 1, 2, \ \text{where } D_1(x) = \{1 + \exp(-0.5 - 0.5x)\}^{-1}, \ D_2(x) = \{1 + \exp(-2 - 2x)\}^{-1}, \quad \text{or } \Delta_l(x) \equiv 1, \ l = 1, 2, \ \text{for complete responses}; $$
$$ \phi(x) = \phi_j(x), \ j = 0, 1, 2, 3, \ \text{where } \phi_0(x) = 0, \ \phi_1(x) = (x + 1)^2, \ \phi_2(x) = 2e^x, \ \phi_3(x) = 4\cos(x); \qquad \xi(x) = \xi_j(x) := \phi_j(x), \ j = 1, 2, 3; $$
$$ u(x) = u_j(x) := 1_{[0,1]}(x)\,\phi_j(x), \ j = 1, 2, 3, \quad \text{or} \quad u(x) = u_j^*(x) := \hat u_{\phi_j}(x), \ j = 1, 2, 3; \qquad \mu_2(x) = \log(x^2 + 0.5). $$
The kernel is chosen to be $K(u) := \frac{3}{4}(1 - u^2)\, I\{|u|\le 1\}$, with bandwidths $a = \rho_1 N^{-2/3}$, $b_k = \rho_2 n_k^{-2/3}$, and $c_k = d_k = \rho_3 n_k^{-1/4}$, $k = 1, 2$, where $\rho_i$, $i = 1, 2, 3$, are constants. The sample sizes are $n_1 = n_2 = 50, 100, 200$. All simulations are based on 2000 replications. The nominal level is $\alpha = 0.05$. The empirical sizes and powers are computed as the relative frequency of the event $\{N^{1/2}\hat T/\hat\tau \ge 1.645\}$.

Table 2.1: Empirical sizes of $\hat V$, with coefficients $\rho_1, \rho_2, \rho_3$, and $\Delta_l = D_l$, $l = 1, 2$.

                           u1          u1*         u2          u2*         u3          u3*
  (ρ1, ρ2, ρ3)         (.5,.2,.8)  (.5,.2,.8)  (.8,.5,.8)  (.2,.5,.8)  (.2,.2,.5)  (.2,.5,.8)
  n1 = n2 = 50            .066        .071        .077        .066        .072        .068
  n1 = n2 = 100           .057        .059        .053        .062        .058        .055
  n1 = n2 = 200           .050        .052        .051        .052        .050        .049

Table 2.2: Empirical sizes of $\hat V$, with coefficients $\rho_1, \rho_2, \rho_3$, and $\Delta_l \equiv 1$, $l = 1, 2$.

                           u1          u1*         u2          u2*         u3          u3*
  (ρ1, ρ2, ρ3)         (.8,.5,.8)  (.8,.8,.8)  (.2,.2,.8)  (.5,.8,.8)  (.2,.2,.8)  (.2,.2,.8)
  n1 = n2 = 50            .073        .085        .072        .084        .071        .063
  n1 = n2 = 100           .066        .079        .065        .074        .061        .073
  n1 = n2 = 200           .052        .052        .049        .051        .052        .050

Before computing the empirical powers, we choose suitable coefficients $(\rho_1, \rho_2, \rho_3)$ for the bandwidths, for each $u$ and the corresponding test, so as to make the empirical size close to 0.05 at $n_1 = n_2 = 200$. To find such coefficients, we compare the empirical sizes over all choices of $\rho_i\in\{0.2, 0.5, 0.8\}$, $i = 1, 2, 3$, and pick the combination whose empirical size at $n_1 = n_2 = 200$ is closest to 0.05. For each $u$, the empirical sizes with the chosen $(\rho_1, \rho_2, \rho_3)$ at $n_1 = n_2 = 50$ and $100$ are also listed. The results for data with responses missing at random, i.e. $\Delta_l = D_l$, $l = 1, 2$, are given in Table 2.1, while the results for complete data, $\Delta_l \equiv 1$, $l = 1, 2$, are reported in Table 2.2.
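For concreteness, a brief sketch of how data could be generated under the missing at random design just described is given below. It is illustrative only; the case $\mu_1 = \mu_2$ shown corresponds to the null hypothesis, and a local alternative would add $N^{-1/2}\phi_j$ to $\mu_2$ for sample 1. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_sample(n, design_sd, err_fn, mu_fn, miss_fn):
    """One MAR sample (X, delta, delta*Y): the response is kept only when
    delta = 1, which occurs with probability miss_fn(x) = Delta_l(x)."""
    x = rng.normal(0.0, design_sd, size=n)
    z = rng.standard_normal(n)
    y = mu_fn(x) + err_fn(x, z)
    delta = rng.binomial(1, miss_fn(x))
    return x, delta, delta * y

mu2 = lambda x: np.log(x**2 + 0.5)
D1 = lambda x: 1.0 / (1.0 + np.exp(-0.5 - 0.5 * x))
D2 = lambda x: 1.0 / (1.0 + np.exp(-2.0 - 2.0 * x))

# sample 1: X1 ~ N(0, 0.7^2), eps1 = Z1 X1 / (1 + X1^2); mu1 = mu2 (null case)
x1, d1, dy1 = simulate_sample(200, 0.7, lambda x, z: z * x / (1 + x**2), mu2, D1)
# sample 2: X2 ~ N(0, 1),     eps2 = Z2 (1 + X2^2)
x2, d2, dy2 = simulate_sample(200, 1.0, lambda x, z: z * (1 + x**2), mu2, D2)
```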
Note that these choices of the $\rho_i$ are merely reasonably good ones among many others; there is no single best choice. For large sample sizes $n_1$ and $n_2$, the behavior of $\hat V$ under the null hypothesis is not affected by the choice of these coefficients.

Table 2.3: Empirical powers of $\hat V$ with $\rho_1, \rho_2, \rho_3$ as in Table 2.1, and $\Delta_l = D_l$, $l = 1, 2$.

  φ                  φ1                   φ2                   φ3
  n1 = n2        50    100   200      50    100   200      50    100   200
  u = u1       .268   .238  .238    .420   .384  .379    .431   .402  .410
  u = u1*      .183   .230  .308    .268   .281  .357    .292   .303  .379
  u = u2       .281   .265  .269    .436   .388  .389    .472   .403  .388
  u = u2*      .230   .255  .341    .318   .339  .399    .344   .368  .421
  u = u3       .242   .233  .215    .356   .370  .351    .436   .427  .444
  u = u3*      .210   .228  .282    .308   .324  .411    .371   .395  .464

Table 2.4: Empirical powers of $\hat V$ with $\rho_1, \rho_2, \rho_3$ as in Table 2.2, and $\Delta_l \equiv 1$, $l = 1, 2$.

  φ                  φ1                   φ2                   φ3
  n1 = n2        50    100   200      50    100   200      50    100   200
  u = u1       .339   .295  .280    .519   .472  .429    .495   .434  .404
  u = u1*      .345   .302  .303    .509   .503  .468    .541   .490  .483
  u = u2       .238   .247  .236    .382   .353  .373    .401   .380  .376
  u = u2*      .325   .308  .282    .503   .477  .494    .542   .507  .503
  u = u3       .234   .215  .237    .360   .376  .373    .448   .445  .455
  u = u3*      .207   .185  .201    .314   .310  .329    .405   .420  .404

Table 2.5: Empirical sizes and powers of $\hat V$ with $\rho_1 = \rho_2 = \rho_3 = 1$ and $\Delta_l = D_l$, $l = 1, 2$.

  φ                  φ0                   φ1                   φ2                   φ3
  n1 = n2        50    100   200      50    100   200      50    100   200      50    100   200
  u = u1       .070   .055  .056    .280   .262  .243    .439   .405  .394    .410   .362  .332
  u = u1*      .070   .060  .056    .293   .288  .275    .459   .448  .440    .463   .439  .454
  u = u2       .068   .060  .054    .292   .264  .244    .450   .419  .374    .417   .391  .361
  u = u2*      .080   .052  .054    .299   .273  .280    .457   .434  .458    .474   .489  .490
  u = u3       .065   .058  .060    .281   .258  .242    .439   .418  .396    .503   .488  .476
  u = u3*      .065   .058  .059    .252   .241  .230    .412   .394  .411    .521   .508  .516

Table 2.6: Empirical sizes and powers of $\hat V$ with $\rho_1 = \rho_2 = \rho_3 = 1$ and $\Delta_l \equiv 1$, $l = 1, 2$.

  φ                  φ0                   φ1                   φ2                   φ3
  n1 = n2        50    100   200      50    100   200      50    100   200      50    100   200
  u = u1       .068   .061  .060    .305   .294  .263    .477   .452  .439    .449   .394  .364
  u = u1*      .073   .074  .059    .331   .307  .317    .525   .524  .472    .537   .537  .533
  u = u2       .076   .056  .045    .315   .290  .281    .494   .456  .463    .474   .412  .392
  u = u2*      .088   .061  .066    .340   .315  .320    .525   .514  .509    .571   .557  .541
  u = u3       .063   .052  .041    .301   .266  .265    .500   .470  .474    .553   .541  .535
  u = u3*      .078   .054  .062    .274   .288  .253    .488   .467  .471    .611   .601  .610

Tables 2.3 and 2.4 give the empirical powers of $\hat V$ against $H_{1N}$ of (2.1), for missing data and for complete data, respectively. These empirical powers, for each test and corresponding $u$, are computed with the bandwidth coefficients $(\rho_1, \rho_2, \rho_3)$ given in Tables 2.1 and 2.2.

Tables 2.5 and 2.6 compare the empirical powers of $\hat V$ for the different choices of $u$ against $H_{1N}$, using the common coefficients $\rho_1 = \rho_2 = \rho_3 = 1$, for missing data and for complete data, respectively. In each table, the empirical sizes get closer to 0.05 as the sample sizes increase. For each $\phi = \phi_j$, $j = 1, 2, 3$, the test $\hat V$ with $u = u_j^*$ has the largest, or one of the largest, empirical powers among all choices of $u$, which is consistent with the result in Remark 2.3.1. Moreover, for each $j = 1, 2, 3$, when $\phi = \phi_j$ the choice $u = u_j^*$ yields larger empirical powers than $u = u_j$. Comparing the two tables, one sees that the empirical powers of all these tests, at all three sample sizes, are larger for complete data than for missing data, while the empirical sizes do not differ much. This indicates that the probability of missing responses affects the power of the test.
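The empirical sizes and powers above are rejection frequencies over 2000 replications. A minimal sketch of that bookkeeping, assuming a user-supplied routine that simulates both samples and returns the standardized statistic $N^{1/2}\hat T/\hat\tau$ for one replication, is as follows; the names are illustrative.

```python
import numpy as np

def empirical_rejection_rate(stat_fn, n_rep=2000, z_alpha=1.645, seed=1):
    """Relative frequency of {N^{1/2} T_hat / tau_hat >= z_alpha}, i.e. the
    empirical size (under the null) or power (under an alternative) of V_hat."""
    rng = np.random.default_rng(seed)
    rejections = sum(stat_fn(rng) >= z_alpha for _ in range(n_rep))
    return rejections / n_rep
```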
All of the empirical powers of $\hat V$, with the above choices of $u$, against $H_1$ of (2.2) with $\xi = \xi_j$, $j = 1, 2, 3$, are equal to 1, for both the missing data and the complete data, and for all three sample sizes. This in turn reflects the consistency of $\hat V$.

BIBLIOGRAPHY

[1] Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes, 2nd Edition. Springer, Berlin.

[2] Collomb, G. and Härdle, W. (1986). Strong uniform convergence rates in robust nonparametric time series analysis and prediction: kernel regression estimation from dependent observations. Stochastic Process. Appl., 23, no. 1, 77-89.

[3] Eubank, R.L. and Hart, J.D. (1992). Testing goodness-of-fit in regression via order selection criteria. Ann. Statist., 20, 1412-1425.

[4] Eubank, R.L. and Hart, J.D. (1993). Commonality of CUSUM, von Neumann and smoothing based goodness-of-fit tests. Biometrika, 80, 89-98.

[5] Eubank, R.L. and Spiegelman, C.H. (1990). Testing the goodness of fit of a linear model via nonparametric regression techniques. J. Amer. Statist. Assoc., 85, 387-392.

[6] Hall, P., Huber, C., and Speckman, P.L. (1997). Covariate-matched one-sided tests for the difference between functional means. J. Amer. Statist. Assoc., 92, 1074-1083.

[7] Hall, P. (1984). Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivariate Anal., 14, 1-16.

[8] Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist., 21, 1926-1947.

[9] Hart, J.D. (1997). Nonparametric Smoothing and Lack-of-fit Tests. Springer, New York.

[10] Koul, H.L. (2011). Minimum distance lack-of-fit tests for fixed design. J. Statist. Plann. Inference, 141, 65-79.

[11] Koul, H.L. and Ni, P.P. (2004). Minimum distance regression model checking. J. Statist. Plann. Inference, 119, 109-141.

[12] Koul, H.L. and Schick, A. (1997). Testing the equality of two regression curves. J. Statist. Plann. Inference, 65, 293-314.

[13] Koul, H.L. and Schick, A. (2003). Testing for superiority among two regression curves. J. Statist. Plann. Inference, 117, 15-33.

[14] Koul, H.L. and Song, W.X. (2009). Minimum distance regression model checking with Berkson measurement errors. Ann. Statist., 37, 132-156.

[15] Little, R.J.A. and Rubin, D.B. (1987). Statistical Analysis with Missing Data. Wiley, New York.

[16] Mack, Y.P. and Silverman, B.W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. Wahrsch. Verw. Gebiete, 61, 405-415.

[17] Schick, A. (1987). A note on the construction of asymptotically linear estimators. J. Statist. Plann. Inference, 16, 89-105. Correction (1989), 22, 269-270.

[18] Stute, W., Thies, S., and Zhu, L.X. (1998). Model checks for regression: an innovation process approach. Ann. Statist., 26, 1916-1934.

[19] Sun, Z.H. and Wang, Q.H. (2009). Checking the adequacy of a general linear model with responses missing at random. J. Statist. Plann. Inference, 139, 3588-3604.

[20] Zheng, J.X. (1996). A consistent test of functional form via nonparametric estimation techniques. J. Econometrics, 75, 263-289.