This is to certify that the dissertation entitled MINIMUM DISTANCE MEASUREMENT ERRORS MODEL FITTING presented by WEIXING SONG has been accepted towards fulfillment of the requirements for the Ph.D. degree in Department of Statistics and Probability.

Minimum Distance Measurement Errors Model Fitting

By Weixing Song

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2006

ABSTRACT

This work proposes a class of minimum distance tests for fitting a parametric regression model to a class of regression functions in the measurement error models. In the errors-in-variables model case, these tests are based on certain minimized L2 distances between a nonparametric regression function estimator and a deconvolution kernel estimator of the regression function of the parametric model being fitted. In the Berkson model case, these tests are based on certain minimized distances between a nonparametric regression function estimator and the parametric model being fitted. The thesis establishes the asymptotic normality of the proposed test statistics under the null hypothesis and that of the corresponding minimum distance estimators in both cases. Simulation studies show that the testing procedures are quite satisfactory in the preservation of the finite sample level and in terms of a power comparison.

ACKNOWLEDGMENTS

I wish to express my sincere gratitude to my advisor Professor Hira L. Koul for his invaluable guidance.
It would have been impossible for me to finish this dissertation without the countless hours he spent sharing his knowledge and discussing various ideas throughout the study. His general way of thinking about statistical problems and of solving them will help my future research. I would also like to thank Professors Sarat Dass, R. V. Ramamoorthi and Richard Baillie for serving on my guidance committee. Many thanks to Professors Connie Page and Dennis Gilliland for their advice when I was at the consulting service. Finally, I would like to thank the Department of Statistics and Probability for offering me graduate assistantships, and the Graduate School for offering me the Dissertation Completion Fellowship so that I could complete my graduate studies at Michigan State University. Last but not least, I would like to give my thanks to my mother Fuying Song and my wife Xiuqin Bai, whose patient love enabled me to complete this work.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Introduction
1 Minimum Distance Errors-in-Variables Model Fitting
  1.1 Introduction
  1.2 Assumptions
  1.3 Asymptotic normality of $\hat\theta_n$
  1.4 Asymptotic normality of the minimized distance
  1.5 Simulations
  1.6 Discussion
    1.6.1 Sample Size Allocation
    1.6.2 General Errors-in-Variables Model Fitting
2 Minimum Distance Berkson Model Fitting
  2.1 Introduction
  2.2 Assumptions
  2.3 The Consistency of $\theta_n^*$ and $\hat\theta_n$
  2.4 Asymptotic Distribution of $\hat\theta_n$
  2.5 Asymptotic Distribution of the Minimized Distance
  2.6 Simulations
BIBLIOGRAPHY

LIST OF TABLES

1.1 Mean and MSE of $\hat\theta_n$, d = 1, q = 1, Double Exponential
1.2 Levels and powers of the M.D. test, d = 1, q = 1, Double Exponential
1.3 Mean and MSE of $\hat\theta_n$, d = 1, q = 1, Normal
1.4 Levels and powers of the M.D. test, d = 1, q = 1, Normal
1.5 Mean and MSE of $\hat\theta_n$, d = 1, q = 2, Double Exponential
1.6 Levels and powers of the M.D. test, d = 1, q = 2, Double Exponential
1.7 Mean and MSE of $\hat\theta_n$, d = 2, q = 2, Double Exponential
1.8 Levels and powers of the M.D. test, d = 2, q = 2, Double Exponential
1.9 $n_1 = n_2$, d = 1, q = 1, Double Exponential
1.10 Same sample, d = 1, q = 1, Double Exponential
1.11 $n_1 = n_2$, d = 1, q = 1, Normal
1.12 Same sample, d = 1, q = 1, Normal
1.13 Same sample, d = 2, q = 2, Double Exponential
2.1 Mean and MSE of $\hat\theta_n$, d = 1, q = 1
2.2 Levels and powers of the M.D. test, d = 1, q = 1
2.3 Mean and MSE of $\hat\theta_n$, d = 2, q = 2
2.4 Levels and powers of the M.D. test, d = 2, q = 2

LIST OF FIGURES

2.1 Comparison Plot

Introduction

In the classical regression model, a set of variables, say a $d$-dimensional predictor $X$, is used to explain the response $Y$, a one-dimensional real random variable; both $X$ and $Y$ are observable. In real applications, however, the predictor $X$ is not always observable. To deal with statistical inference in this case, statisticians proposed the so-called measurement errors model, in which a surrogate of $X$, say $Z$, is observed. The main issue in measurement errors models is then how to investigate the statistical relationship between $X$ and $Y$ based on the data on $Z$ and $Y$.
Based on the stochastic structure between $X$ and $Z$, measurement errors models can usually be divided into two classes: error models, which include the errors-in-variables models, in which $Z = X + u$, and the error calibration models, in which $Z = a + BX + u$; and the Berkson models (or regression calibration models), in which $X = Z + \eta$. Here $u$ and $\eta$ are measurement errors. For details on this classification, see Carroll, Ruppert and Stefanski (1995).

Measurement errors regression models have received continuing attention in the statistical literature over the last century. For literature reviews on errors-in-variables models, see Gleser (1981), Anderson (1984), Fuller (1987), Bickel and Ritov (1987), Carroll and Hall (1988), Fan (1991a, 1991b), Fan and Truong (1993), Carroll, Ruppert and Stefanski (1995), and the references therein. For the Berkson models, see Rudemo et al. (1989), Huwang and Huang (2000), and Wang (2003, 2004).

Most of the existing literature has focused on the estimation problem. The model checking, or lack-of-fit testing, problem has not been discussed thoroughly; only sporadic results on this topic can be found. In the errors-in-variables model case, Fuller (1987) discusses a graphical method for lack-of-fit testing of a linear errors-in-variables regression model. Carroll and Spiegelman (1992) consider graphical and numerical diagnostics for nonlinearity and heteroscedasticity in linear regression models with errors in variables. Zhu, Song and Cui (2003) considered lack-of-fit testing in polynomial regression with errors in variables and constructed a residual-based test of score type, but their method has two limitations. First, the predictor is one-dimensional and the regression function under the null hypothesis is polynomial; second, the density function of the predictor is assumed to be known, which is generally unrealistic in real applications.
Cheng and Kukush (2004) also addressed the same problem based on so-called adjusted least squares estimators. Few results on errors-in-variables regression model checking without imposing strict conditions are available in the literature.

The Berkson model has a relatively simpler structure than the errors-in-variables model in that the density function of the predictor can be estimated by the usual kernel method. As with the errors-in-variables models, there is a vast literature on the estimation of the parameters, but no discussion of the model checking problem for this case.

Many interesting and profound results, on the other hand, are available for the regression model checking problem in the absence of errors in the predictor; see, e.g., Eubank and Spiegelman (1990), An and Cheng (1991), Eubank and Hart (1992, 1993), Hart (1997), Stute (1996), Zheng (1996), Stute, Thies, and Zhu (1998), Khmaladze and Koul (2004), among others. For a general discussion of model fitting in the classical regression case, a good reference is Hart (1997). Stute (1996) and Stute, Thies, and Zhu (1998) constructed a test statistic based on certain marked empirical processes. Their simulation results show that the testing procedure is quite satisfactory, but it can only be used in the one-dimensional case.

The recent paper of Koul and Ni (2004) (K-N) uses the minimum distance (MD) ideas developed by Wolfowitz (1953, 1954, 1957) to propose tests of lack-of-fit for the regression model without errors in variables. Their work can be used to deal with the multidimensional case. In a finite sample comparison of these tests with some other existing tests, they noted that a member of this class preserves the asymptotic level and has very high power against some alternatives compared to some other existing lack-of-fit tests. Our work extends this methodology to the measurement errors model setup.
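As a concrete illustration of the two measurement error structures distinguished earlier in this introduction, the following minimal sketch simulates data from an errors-in-variables model ($Z = X + u$) and from a Berkson model ($X = Z + \eta$). The normal distributions, the linear regression function, the noise levels and the function names are assumptions made only for this illustration; they are not the simulation designs of the thesis.

```python
import numpy as np

def simulate_errors_in_variables(n, theta=1.0, sigma_u=0.5, seed=0):
    """Errors-in-variables: the surrogate is Z = X + u, with u independent of X."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)        # unobservable true predictor (assumed N(0, 1))
    u = rng.normal(0.0, sigma_u, n)    # measurement error
    eps = rng.normal(0.0, 0.1, n)      # regression error
    y = theta * x + eps                # hypothetical linear regression function
    z = x + u                          # what the statistician actually observes
    return x, z, y

def simulate_berkson(n, theta=1.0, sigma_eta=0.5, seed=0):
    """Berkson: the true predictor is X = Z + eta, with eta independent of Z."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, 1.0, n)        # observed surrogate (assumed N(0, 1))
    eta = rng.normal(0.0, sigma_eta, n)
    eps = rng.normal(0.0, 0.1, n)
    x = z + eta                        # unobservable true predictor
    y = theta * x + eps
    return x, z, y
```

In the first model the surrogate is more dispersed than the true predictor, while in the second the true predictor is more dispersed than the surrogate; only $(Z, Y)$ would be available to the analyst in either case.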
To be specific, in the classical regression setup, let $X$, $Y$ be random variables, with $X$ being $d$-dimensional and $Y$ one-dimensional with $E|Y| < \infty$. Let $\mu(x) = E(Y \mid X = x)$ denote the regression function, and let $\{m_\theta(\cdot) : \theta \in \Theta\}$, $\Theta \subset \mathbb{R}^q$, $q \ge 1$, be a given parametric model. The statistical problem of interest here is to test the following hypothesis:

$H_0 : \mu(x) = m_{\theta_0}(x)$, for some $\theta_0 \in \Theta$ and all $x \in \mathcal{I}$, vs. $H_1 : H_0$ is not true, (1)

where $\mathcal{I}$ is a compact subset of $\mathbb{R}^d$, $d \ge 1$, based on a random sample $(X_i, Y_i)$, $1 \le i \le n$, from the distribution of $(X, Y)$. In the K-N paper, the design is random but observable. Let $K$, $K^*$ be two possibly different density kernels on $[-1, 1]^d$. For any bandwidth sequence $h$, let

$K_h(x) := h^{-d} K(x/h)$, $K_{hi}(x) := K_h(x - X_i)$, $\hat f_{Xh}(x) := n^{-1} \sum_{i=1}^{n} K^*_{hi}(x)$.

Note that $\hat f_{Xh}$ is the kernel estimator of $f_X$ corresponding to the kernel $K^*$. K-N define the minimum distance statistic $T_n(\theta)$ and the corresponding test [...], which rejects $H_0$ when the normalized statistic exceeds $z_{\alpha/2}$ and is of the asymptotic size $\alpha$, where $z_\alpha$ is the $(1 - \alpha)$th percentile of the standard normal distribution. Unlike in other related papers, K-N do not need the null regression function to be twice continuously differentiable in the parameter vector. The asymptotic normality of $\tilde\theta_n$ and $T_n(\tilde\theta_n)$ was made feasible by recognizing the need to use different bandwidths for the estimation of the numerator and the denominator in the nonparametric regression function estimator. A consequence of this asymptotic normality result is that, at least for large samples, one need not use any resampling method to implement these tests.

In this thesis, we will discuss, in the measurement errors setup, how to develop testing procedures for the following hypothesis:

$H_0 : \mu(x) = m_{\theta_0}(x)$, for some $\theta_0 \in \Theta$ and all $x$, vs. $H_1 : H_0$ is not true. (3)

From K-N's procedure, we know that if we want to use the minimum distance method, a kernel-type regression estimator must be constructed, but this in turn implies that we must find an estimator for the density function of the predictor.
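The remark about using different bandwidths in the numerator and the denominator of the nonparametric regression estimator can be made concrete with a small sketch. The code below is a one-dimensional illustration only: the Epanechnikov kernel, the function names and the choice of a single kernel for both parts are assumptions made here, not the construction used by K-N.

```python
import numpy as np

def epanechnikov(u):
    """A density kernel supported on [-1, 1] (illustrative choice)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_regression(x_eval, x, y, h, w):
    """Kernel regression estimate with bandwidth h in the numerator and
    bandwidth w in the density estimate of the denominator (d = 1)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x_eval[:, None] - x[None, :]           # all pairs x - X_i
    k_h = epanechnikov(diff / h) / h              # K_h(x - X_i)
    k_w = epanechnikov(diff / w) / w              # K_w(x - X_i)
    numerator = (k_h * y[None, :]).mean(axis=1)   # n^{-1} sum K_hi(x) Y_i
    density = k_w.mean(axis=1)                    # n^{-1} sum K_wi(x)
    return numerator / density
```

The estimate is only defined at points where the density estimate in the denominator is positive, which is one reason the integration in the test statistic is restricted to a compact set on which the design density is bounded away from zero.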
This is not a problem in the classical regression case, since the predictor is observable. In the measurement errors models, however, the predictor $X$ is not observable, so adapting K-N's minimum distance method requires some modification of the above procedure.

We now briefly describe the modification needed for the errors-in-variables model. It consists of two steps:

Step 1. Hypothesis Change: The hypothesis (3) concerns the regression function $\mu(x)$, which depends on the true predictor, but the true predictor is not observable. By recognizing that $\nu(z) := E(Y \mid Z = z) = E(\mu(X) \mid Z = z)$, we consider the new regression model $Y = \nu(Z) + \zeta$, where the error $\zeta$ is uncorrelated with $Z$ and has mean 0. The problem of testing $H_0$ can then be transformed into testing $\nu(z) = \nu_{\theta_0}(z)$, where $\nu_\theta(z) := E(m_\theta(X) \mid Z = z)$. Since $Z$ is observable, we can construct a classical kernel estimator of the new regression function $\nu(z)$.

Step 2. Deconvolution Kernel Density Estimator: The minimum distance will be constructed from the classical kernel estimator of $\nu(z)$ and a proper estimator of $\nu_\theta(z) := E(m_\theta(X) \mid Z = z)$ under the null hypothesis. Note that, under the null hypothesis,

$\nu_\theta(z) = \dfrac{\int m_\theta(x) f_X(x) f_u(z - x)\,dx}{\int f_X(x) f_u(z - x)\,dx}$.

To estimate this quantity for given $\theta$, we need an estimator of $f_X$. In this connection the deconvolution kernel density estimators are found to be useful. Plugging the deconvolution kernel density estimator of $f_X$ into the above expression, we obtain the deconvolution kernel estimator of $\nu_\theta(z)$.

To obtain the asymptotic distribution of the test statistic, we need to consider the asymptotic behavior of the deconvolution kernel estimator of $\nu_\theta(z)$. Although we extend the result of Stefanski and Carroll (1991) to a more general case, the convergence rate of the deconvolution kernel estimator is still slower than that of the classical kernel estimator. This creates some difficulty in proving the technical results.
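The ratio of integrals in Step 2 can be checked numerically in a case where the answer is known in closed form. The sketch below assumes, purely for illustration, that $d = 1$, $m_\theta(x) = \theta x$, $X \sim N(0, \sigma_X^2)$ and $u \sim N(0, \sigma_u^2)$, in which case $\nu_\theta(z) = \theta z \sigma_X^2/(\sigma_X^2 + \sigma_u^2)$. The function names and the integration grid are hypothetical.

```python
import numpy as np

def normal_pdf(t, sigma):
    """Density of N(0, sigma^2) evaluated at t."""
    return np.exp(-0.5 * (t / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def nu_theta(z, theta, sigma_x=1.0, sigma_u=0.5):
    """Approximate  int m_theta(x) f_X(x) f_u(z - x) dx / int f_X(x) f_u(z - x) dx
    for m_theta(x) = theta * x by a Riemann sum on a fixed grid."""
    grid = np.linspace(-10.0, 10.0, 20001)
    weight = normal_pdf(grid, sigma_x) * normal_pdf(z - grid, sigma_u)
    # The common grid spacing cancels between numerator and denominator.
    return np.sum(theta * grid * weight) / np.sum(weight)

def nu_theta_closed_form(z, theta, sigma_x=1.0, sigma_u=0.5):
    """Closed form theta * E(X | Z = z) under the normal assumptions above."""
    return theta * z * sigma_x**2 / (sigma_x**2 + sigma_u**2)
```

In practice $f_X$ is unknown, which is exactly why the deconvolution kernel density estimator has to replace it in this ratio.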
To overcome this difficulty, we adopt the sample splitting technique. The sample splitting scheme required in the proof is not so realistic in certain cases, but the simulation results show that the test statistic behaves well even if we do not follow the sample splitting scheme.

In the Berkson model case, things become relatively easy. From $X = Z + \eta$ and the independence between $Z$ and $\eta$, $E(Y \mid Z)$ is known under the null hypothesis except for the parameter. After changing the hypothesis, the testing procedure can be developed in a similar way as in the errors-in-variables model case.

This thesis is organized as follows. Chapter 1 discusses model fitting for the errors-in-variables model in which the regression function under the null hypothesis is linear in the parameters. Theorem 1.3.1 gives the asymptotic distribution of the underlying parameter estimator. Theorem 1.4.1 gives the asymptotic distribution of the minimized distance under the null hypothesis. A test statistic can therefore be constructed based on this theorem. Several simulations are presented in Section 1.5. Some problems related to the sample allocation scheme and results on the general errors-in-variables models are discussed in the subsequent section.

Chapter 2 discusses minimum distance model fitting in the Berkson model. Corollary ?? and Theorem 2.3.1 state the consistency of the underlying parameter estimators; Theorem 2.4.1 and Theorem 2.5.1 give the asymptotic distributions of the parameter estimator and of the minimized distance under the null hypothesis. A test statistic can therefore be constructed based on Theorem 2.5.1. Simulations conducted in Section 2.6 show that the testing procedure is quite satisfactory.

CHAPTER 1

Minimum Distance Errors-in-Variables Model Fitting

1.1 Introduction

The findings in the classical regression case motivate one to look for tests of lack-of-fit in the presence of errors in variables based on the above minimized distances.
Since the predictors in errors-in-variables models are unobservable, clearly the above procedures need some modification. To be specific, in the errors-in-variables regression model of interest here, one observes $(Z_i, Y_i)$ obeying the model

$Y_i = \mu(X_i) + \varepsilon_i$, $Z_i = X_i + u_i$, $1 \le i \le n$, (1.1)

where the $X_i$'s are the unobservable $d$-dimensional random design variables. We additionally assume that $(X_i, \varepsilon_i, u_i, Z_i, Y_i)$, $i = 1, 2, \dots, n$, are i.i.d. copies of $(X, \varepsilon, u, Z, Y)$. The r.v.'s $(X, u, \varepsilon)$ are assumed to be mutually independent, with $u$ being $d$-dimensional and $\varepsilon$ being one-dimensional, $E(\varepsilon) = 0$, $E(u) = 0$, and their marginal distributions having densities $f_X$, $f_u$ and $f_\varepsilon$, respectively. For the sake of identifiability, the density $f_u$ is assumed to be known. This is a common and standard assumption in the literature on errors-in-variables regression models. The densities $f_X$ and $f_\varepsilon$ need not be known.

The problem of interest in this chapter is to develop tests for the hypothesis

$H_0 : \mu(x) = \theta_0^T r(x)$, for some $\theta_0 \in \mathbb{R}^q$, vs. $H_1 : H_0$ is not true, (1.2)

in the model (1.1). A way of constructing tests here is to first recognize that the independence of $X$ and $\varepsilon$ and $E(\varepsilon) = 0$ imply that $\nu(z) := E(Y \mid Z = z) = E(\mu(X) \mid Z = z)$. Thus one can consider the new regression model $Y = \nu(Z) + \zeta$, where the conditional expectation $E(\zeta \mid Z) = 0$, hence $\zeta$ is uncorrelated with $Z$. The problem of testing $H_0$ is now transformed into testing $\nu(z) = \nu_{\theta_0}(z)$, where $\nu_\theta(z) := \theta^T E(r(X) \mid Z = z)$. Note that for any $z$ for which $f_Z(z) > 0$, we have

$\nu_\theta(z) = \dfrac{\theta^T \int r(x) f_X(x) f_u(z - x)\,dx}{\int f_X(x) f_u(z - x)\,dx}$. (1.3)

From (1.3) one sees that if $f_X$ is known, then $f_Z$ is known and hence $\nu_\theta$ is known except for $\theta$. Let $Q(z) := E(r(X) \mid Z = z)$. Therefore a modification of K-N's procedure in this case is as follows. Define

$T_n(\theta) := \int \Big[ \dfrac{1}{n} \sum_{i=1}^{n} K_{hi}(z) \big( Y_i - \theta^T Q(Z_i) \big) \Big]^2 \dfrac{dG(z)}{f_Z^2(z)}$, $\theta \in \mathbb{R}^q$,

$\tilde\theta_n := \arg\min_{\theta \in \mathbb{R}^q} T_n(\theta)$.

Here $h$ is a bandwidth depending only on $n$, and $K_{hi}(z)$ is redefined as $K((z - Z_i)/h)/h^d$ for any kernel function $K$ and bandwidth $h$. Then we may use $\tilde\theta_n$ to estimate $\theta$, and construct the test based on $T_n(\tilde\theta_n)$. Unfortunately, $f_X$ is generally not known and hence $f_Z$ and $Q(z)$ are unknown. This makes the above procedures infeasible. To construct the test statistic, one needs estimators for $f_Z$ and $Q(z)$. In this connection the deconvolution kernel density estimators are found to be useful. For any density $L$ on $\mathbb{R}^d$, let $\phi_L$ denote its characteristic function and define

$L_h(x) := \dfrac{1}{(2\pi)^d} \int_{\mathbb{R}^d} \exp(-i t \cdot x) \dfrac{\phi_L(t)}{\phi_u(t/h)}\,dt$, $i = (-1)^{1/2}$, $x \in \mathbb{R}^d$, (1.4)

$\hat f_{Xh}(x) := \dfrac{1}{n h^d} \sum_{i=1}^{n} L_h\Big( \dfrac{x - Z_i}{h} \Big)$, $x \in \mathbb{R}^d$.

The above $L_h$ is called the deconvolution kernel function, while $\hat f_{Xh}$ is called the deconvolution kernel density estimator of $f_X$; cf. Masry (1993), Carroll, Ruppert and Stefanski (1995). Note that $Q(z)$ is equal to $R(z)/f_Z(z)$, where $R(z) = \int r(x) f_X(x) f_u(z - x)\,dx$ and $f_Z(z) = \int f_X(x) f_u(z - x)\,dx$. Then one can estimate $Q(z)$ by

$\hat Q_n(z) = \hat R_n(z) / \hat f_{Zh}(z)$, (1.5)

where $\hat R_n(z) = \int r(x) \hat f_{Xh}(x) f_u(z - x)\,dx$ and $\hat f_{Zh}(z) = \int \hat f_{Xh}(x) f_u(z - x)\,dx$. At this point, it is worth mentioning that, by the definition of $L_h$ and a direct calculation, one can show that $\hat f_{Zh}$ is nothing but the classical kernel estimator of $f_Z$ with kernel $L$ and bandwidth $h$. That is, $\hat f_{Zh}(z) = \sum_{i=1}^{n} L((z - Z_i)/h)/(n h^d)$. Our proposed inference procedures will be based on the analogs of $T_n$ where $Q(z)$ is replaced by the above estimator $\hat Q_n$, and $f_Z$ is replaced by a kernel estimator.

A very important question related to the above procedure is the following: Are the two hypotheses, $H_{10} : \mu(x) = \theta_0^T r(x)$, for some $\theta_0$ and all $x$, and $H_{20} : \nu(z) = \theta_0^T E(r(X) \mid Z = z)$, for some $\theta_0$ and all $z$, equivalent? The answer is negative in general, but in some special cases these two hypotheses are equivalent.
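To make the deconvolution kernel in (1.4) concrete, the following minimal sketch treats one case in which the integral has a closed form. It assumes $d = 1$, a standard normal kernel $L$ and a double exponential (Laplace) measurement error with scale $b$, so that $1/\phi_u(t/h) = 1 + b^2 t^2/h^2$ and hence $L_h(x) = L(x) - (b^2/h^2) L''(x) = L(x)\{1 - (b/h)^2 (x^2 - 1)\}$. The kernel choice, the bandwidth, the sample and the function names are assumptions for illustration, not the settings used in the thesis.

```python
import numpy as np

def deconvolution_kernel_laplace(x, h, b):
    """L_h(x) for a standard normal kernel L and Laplace(0, b) measurement error:
    L(x) * (1 - (b / h)**2 * (x**2 - 1))."""
    gauss = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    return gauss * (1.0 - (b / h) ** 2 * (x**2 - 1.0))

def deconvolution_density(x_eval, z, h, b):
    """Deconvolution kernel density estimate (n h)^{-1} sum_i L_h((x - Z_i) / h), d = 1."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    z = np.asarray(z, dtype=float)
    scaled = (x_eval[:, None] - z[None, :]) / h
    return deconvolution_kernel_laplace(scaled, h, b).mean(axis=1) / h

# Illustrative data: X ~ N(0, 1) observed with Laplace error of scale 0.3.
rng = np.random.default_rng(1)
x_true = rng.normal(size=500)
z_obs = x_true + rng.laplace(scale=0.3, size=500)
grid = np.linspace(-12.0, 12.0, 2401)
f_hat = deconvolution_density(grid, z_obs, h=0.4, b=0.3)
```

The estimate integrates to one because $L_h$ does, but $L_h$ takes negative values for large $|x|$, so the estimate itself need not be nonnegative everywhere.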
See a general discussion in Section 1.6.2.

The large sample behavior of the deconvolution kernel density estimators strongly depends on the smoothness of the distribution of the measurement error $u$. Using the terms from Fan and Truong (1993), a distribution is called ordinary smooth if the tails of its characteristic function decay to 0 at an algebraic rate; it is called super smooth if its characteristic function has tails approaching 0 exponentially fast. As Masry (1993) showed, the local and global rates of convergence of the sequences of deconvolution kernel density estimators are slower than those of the classical kernel density estimators. Moreover, these convergence rates are much slower in the super smooth cases than in the ordinary smooth cases. But Stefanski and Carroll (1991) show that in the one-dimensional case with $r(x) = x$, for estimating $E(X \mid Z = z)$ by $\hat Q_n(z)$, faster rates are obtainable. For example, in the case of normal measurement error, the mean squared error rate of convergence of $\hat f_{Xh}$ to $f_X$ is of order $(\log n)^{-2}$, while the convergence rate of $\hat Q_n(z)$ to $E(X \mid Z = z)$ is of order $n^{-4/7}$. Even so, this rate is still slower than the mean squared error convergence rate of the classical kernel estimator, which is $n^{-4/5}$ in the one-dimensional case. This creates extra difficulty when considering the asymptotic behavior of the analogs of the corresponding MD estimators and test statistics. In fact, if we base the estimators of $f_X$, hence of $Q(z)$, and the other quantities on the same sample, the consistency of the corresponding MD estimator is still available, but its asymptotic normality and that of the corresponding MD test statistic may not be obtained.
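For orientation, the two smoothness classes can be illustrated with the two error distributions named in the list of tables, the double exponential and the normal. The parametrizations below (scale $b$ and standard deviation $\sigma_u$) are assumptions made for this illustration in the one-dimensional case.

```latex
% Double exponential (Laplace) error with scale b > 0: ordinary smooth,
% since the characteristic function decays at the algebraic rate |t|^{-2}.
\phi_u(t) = \frac{1}{1 + b^2 t^2},
\qquad |\phi_u(t)| \sim b^{-2} |t|^{-2} \quad \text{as } |t| \to \infty .

% Normal error N(0, \sigma_u^2): super smooth,
% since the characteristic function decays exponentially fast.
\phi_u(t) = \exp\!\left( -\tfrac{1}{2} \sigma_u^2 t^2 \right) .
```

Because $1/\phi_u(t/h)$ enters the deconvolution kernel (1.4), a faster decay of $\phi_u$ inflates the high-frequency part of the kernel more strongly, which is the source of the slower rates in the super smooth case.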
We overcome this difficulty by using different bandwidths and splitting the full sample, say $S$, with sample size $n$ into two subsamples, $S_1$ with size $n_1$ and $S_2$ with size $n_2$, then using the subsample $S_2$ to estimate $f_X$, hence $Q(z)$, and the subsample $S_1$ to estimate the remaining quantities. The sample size allocation scheme is stated in Section 1.2. To be precise, let

$\hat f_{Zh_2}(z) := \sum_{i=1}^{n_1} K^*_{h_2 i}(z)/n_1$, $\hat f_{Xw}(x) := \sum_{j=n_1+1}^{n} L_w((x - Z_j)/w)/(n_2 w^d)$,

$\hat R_{n_2}(z) := \int r(x) \hat f_{Xw_1}(x) f_u(z - x)\,dx$, $\hat f_{Zw_2}(z) := \int \hat f_{Xw_2}(x) f_u(z - x)\,dx$,

$\hat Q_{n_2}(z) := \hat R_{n_2}(z)/\hat f_{Zw_2}(z)$,

where $h_1$, $h_2$ depend on $n_1$, and $w_1$ and $w_2$ depend on $n_2$. Now define

$M_n(\theta) := \int \Big[ \dfrac{1}{n_1} \sum_{i=1}^{n_1} K_{h_1 i}(z) \big( Y_i - \theta^T \hat Q_{n_2}(Z_i) \big) \Big]^2 \dfrac{dG(z)}{\hat f_{Zh_2}^2(z)}$, $\hat\theta_n := \arg\inf_{\theta \in \mathbb{R}^q} M_n(\theta)$. (1.6)

Then we may use $\hat\theta_n$ to estimate $\theta$, and construct the test statistic through $M_n(\hat\theta_n)$. We first prove the consistency of $\hat\theta_n$ for $\theta$, then the asymptotic normality of $\sqrt{n_1}(\hat\theta_n - \theta_0)$. Finally, let

$\hat\zeta_i := Y_i - \hat\theta_n^T \hat Q_{n_2}(Z_i)$, $\hat C_n := n_1^{-2} \sum_{i=1}^{n_1} \int K_{h_1 i}^2(z) \hat\zeta_i^2\,d\hat\psi_{h_2}(z)$,

$\hat\Gamma_n := 2 h_1^d n_1^{-2} \sum_{i \ne j} \Big( \int K_{h_1 i}(z) K_{h_1 j}(z) \hat\zeta_i \hat\zeta_j\,d\hat\psi_{h_2}(z) \Big)^2$, $d\hat\psi_{h_2}(z) := \dfrac{dG(z)}{\hat f_{Zh_2}^2(z)}$. (1.7)

We prove that the asymptotic null distribution of the normalized test statistic $n_1 h_1^{d/2} \hat\Gamma_n^{-1/2} (M_n(\hat\theta_n) - \hat C_n)$ is standard normal. Consequently, the test that rejects $H_0$ whenever $n_1 h_1^{d/2} \hat\Gamma_n^{-1/2} |M_n(\hat\theta_n) - \hat C_n| > z_{\alpha/2}$ is of the asymptotic size $\alpha$.

This chapter is organized as follows. Section 1.2 states the needed assumptions. A multidimensional extension of Lemma A1 in Stefanski and Carroll (1991) is also proved there, together with some other needed results. Section 1.3 proves the asymptotic normality of the MD estimator. The asymptotic normality of the MD test statistic is discussed in Section 1.4. Section 1.5 includes some results from a finite sample simulation study.

In the sequel, $c$ will denote a generic finite positive constant whose value depends on the context. For any vector $b$, $b^T$ denotes its transpose.
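The sample splitting described above can be sketched as a small helper. Assumption (n) of Section 1.2, as we read it, takes $n = n_1 + n_2$ with $n_2 = [n_1^b]$ for some $b > 1 + (d + 2\alpha)/4$. The function below, whose name and search strategy are hypothetical, picks the largest $n_1$ for which $n_1 + [n_1^b]$ does not exceed the available sample size, so a few observations may be left unused.

```python
import math

def split_sizes(n, b):
    """Return (n1, n2) with n2 = floor(n1**b) and n1 the largest integer
    such that n1 + floor(n1**b) <= n.  Requires b > 1 and n >= 2."""
    if b <= 1.0:
        raise ValueError("b must exceed 1 so that the second subsample dominates")
    if n < 2:
        raise ValueError("need at least two observations to split")
    n1 = 1
    # Increase n1 while the next candidate split still fits into the sample.
    while (n1 + 1) + math.floor((n1 + 1) ** b) <= n:
        n1 += 1
    return n1, math.floor(n1 ** b)
```

With $b > 1$ the subsample used for the deconvolution step is much larger than the one used for the remaining quantities, which compensates for the slower convergence rate of the deconvolution kernel estimator.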
For any function $f$, we will use $\dot f$, $\ddot f$ to denote its first and second derivatives with respect to its argument. Convergence in distribution is denoted by $\Longrightarrow$, $N_d(a, B)$ stands for the $d$-dimensional normal distribution with mean vector $a$ and covariance matrix $B$, and $E_{S_1}$ denotes the conditional expectation given the subsample $S_1$. The integration with respect to the $G$-measure is understood to be over the compact set $\mathcal{I}$.

1.2 Assumptions

This section first states the various conditions needed in this chapter. About the errors, the underlying design and the integrating $\sigma$-finite measure $G$, we assume the following:

(e1) The random variables $\{(Z_i, Y_i) : Z_i \in \mathbb{R}^d, Y_i \in \mathbb{R}, i = 1, 2, \dots, n\}$ from (1.1) are i.i.d. with the conditional expectation $\nu(z) = E(Y \mid Z = z)$ satisfying $\int \nu^2\,dG < \infty$, where $G$ is a $\sigma$-finite measure on $\mathbb{R}^d$.

(e2) $0 < \sigma_\varepsilon^2 = E\varepsilon^2 < \infty$, $E\|r(X)\|^2 < \infty$, and the function $\delta^2(z) = E[(\theta_0^T r(X) - \theta_0^T Q(Z))^2 \mid Z = z]$ is a.s. ($G$) continuous on $\mathcal{I}$.

(e3) $E|\varepsilon|^{2+\delta} < \infty$, $E\|r(X)\|^{2+\delta} < \infty$, for some $\delta > 0$.

(e4) $E|\varepsilon|^4 < \infty$, $E\|r(X)\|^4 < \infty$.

(u) The density function $f_u$ is continuous and $\int |\phi_u(t)|\,dt < \infty$.

(f1) The density $f_X$ of the $d$-dimensional r.v. $X$ and all its possible first and second derivatives are continuous and bounded.

(f2) For some $\delta_0 > 0$, the density $f_Z$ is bounded below on the compact subset $\mathcal{I}_{\delta_0}$ of $\mathbb{R}^d$, where for any $\delta > 0$,

$\mathcal{I}_\delta = \{ y \in \mathbb{R}^d : \max_{1 \le j \le d} |y_j - z_j| \le \delta,\ y = (y_1, \dots, y_d)^T,\ z = (z_1, \dots, z_d)^T,\ z \in \mathcal{I} \}$. (1.8)

(g) $G$ has a continuous Lebesgue density $g$.

About the null model we need to assume the following:

(m1) There exists a positive continuous function $J(z)$ such that, as $\|t\| \to \infty$,

$\dfrac{\big\| \int (r(z - x) - r(z)) \exp(-i t^T x) f_u(x)\,dx \big\|}{\|t\|^\alpha |\phi_u(t)|} \le J(z)$,

for some $\alpha \ge 0$ and all $z \in \mathbb{R}^d$, and $EJ^2(Z) < \infty$.

(m2) $E\|r(Z)\|^2 < \infty$, $EI^2(Z) < \infty$, where $I(z) := \int \|r(x)\| f_u(z - x)\,dx$.

About the kernel functions.
we assume: (f) The kernel function L is a density, symmetric around the origin, ||t||0|¢L(t)| < 00, for all t 6 Rd; l\=‘loreover, f H‘UH2L(U)dU < 00 and f lltlllglgbL(t)|dt < 00 for ,3 = 0, a, with a: as in (1111). About the bandwidths and sample size we need to assume the following: (11) With n denoting the sample size, let n1, 112 be two positive integers such that n = 111 +112, 11.2 = [11?], b > 1+ (d+ 2a)/4, where 01 is as in (m1). (hl) hl ~ 11?, where a. < min(1/2d,4/d(d+ 4)). (112) 112 = cabana/12111461”). —1 d+4+2a (WI) 1111 = n2 /( ). (W2) 1112 = 02(log(n2)/n2)1/(d+4). Assumption (m1) is not so strict as it appears. Some commonly used regression functions such as polynomial and exponential functions indeed satisfy this assumption as shown below. Example 1: Suppose dzq, r(;1:) = 1:, and 11 ~ Nd(0, Eu). Then, f oo, lou(t)/¢u(t)| = 0(ltla), for some 16 a Z 0; (iv) 112 —+ 00, and U11 —+ 0. The kernel function L used in the deconvo- lution estimator is assumed to be four-times continuously differentiable, compactly supported and real valued. The following lemma. is a multidimensional extension of the above results which will be frequently used in the sequel. Lemma 1.2.1 Suppose d 2 1, and (f1), (11), (1111), (h!) hold. Then for any 2 6 Rd, ||ERn2(2)—R(2)||2 3 21111201), C T122113! 131161.2(21— Bang/21112 s (126122?“ + 112(211121. where [(2) is as in (m2), J(2) is as in (1111) and where c is a constant not depending on 2, n2 and 1111. Proof. A direct calculation yields that for any 2 6 Rd, Ewa1(x) = f L(v)fX(a: — u-wfldv. By assumption (f1), there exists a vector a(:r, 11) such that f X(:1: — 111111) has a Taylor expansion up to the second order, fX(:1: — 111111) = fX(:1:) - wlvaX(a:) + wgvauX (a(:1:, 11))11/2. Hence ERn2() ;//"II(')L (11 )f(X :1: — 111111)fu(2 — 51:)dudrr =1~L(5w1)cDuU+ Slldtds- n / |¢L(tw1)gbL(sw1)ou(t + s)|dtds (1.9) + Note that for any m, p = 0 or 02, from assumption (5), we have / 112(1P11211m12L(221mus-2112.12 .2 31122.12 _._ _2d . . 
, 3 21p m / (Itupusu’"12L(112L(s122((2221/21112222 18 222;”""‘2” / (1211”"12u211122((2+21121112222 (.wi'p—m—2d/||s||m|oL(s)|(/Iou((t +s)/z_1.'1)|dt)ds = 1‘ ”‘m‘d / 1121(’"'I2L(21122 - / 12.121122 = 1‘22”“"’"“. | /\ The second claim in the lemma follows from (1.9) by using the above inequality. [3 By the usual bias and variance decomposition of mean square error, the following inequality is a direct consequence of Lemma 1.2.1, C 2711222221— 11(2112 5 222111221 + (12212140 + 112(211121. d 1 71211) If the bandwidth wl is chosen by assumption (wl), then _ 4 231222.2(21 — 12(2112 s 222 3+20+4 (12(2) + 12(2) + (12(21121. (1.101 In the sequel, we will write 7(2) := 12(2) + 12(2) + ”2(2)“? (1.11) The following lemma we will be used repeatedly, which along with its proof appears as Theorem 2.2 part (2) in Bosq (1998). We state the lemma for a sample size n and a bandwidth h, they may be replaced by n1 or n2. h2 or 1112 according to the context. Lemma 1.2.2 Let fZ be the kernel estimator with a kernel K which satisfies a Lip- schitz condition and bandwidth h. If fZ is twice continuously differentiable, and the bandwidth h is chosen to be cn(log(n)/n.)1/(d+4), where ('n —1 c > 0, then (1221... 211(2/ 122(2112/(“9 2:1} (12(2) — .1Z(21( ——> 0 for any positive integer k and compact set I. 19 1.3 Asymptotic normality of 92 Recall the definitions in (1.6). Because the null model is linear in 6’. so the minimizer 6." has an explicit form obtained by setting the derivative of [1171(6) with respect to 6 equal to O, which gives the equation "1 ”1 /I"11 ZKh12(Z) Qn(Z2 iZKh11@)Qn2(Zi)dl/w2(z') 63n 1 "’1 f1 22—1 21 19.112111" 31—2 1211221222122. 1d2h,(21 Adding and subtracting 63Qn2(Zz-) from Y2 and doing some routing arrangement, én will satisfy the following equation: ”1 £7111 ZKhli( ZlQn2( Zi') —Tl11in§11Khlz(:1)Qn2(Zi)dI/3h2(z) . 
(én _00) 1 T A 1 - , (1.12) The above explicit relation between én — 90 and the other quantities allows us, com- pared to K—N, to investigate the asymptotic distribution of én without proving the consistency in advance. Most importantly, the separation of bin from R-n,2(2) makes a conditional expectation argument in the following proofs relatively easy. To keep the exposition concise, let ”1 2121(21 == 51—211,.”(21m—63Qw211. (1.131 Dn(2) := %1::Khli@l(Qng(Zil Q(Zillv 20 l 1 1111 := A1, A1,, (2) := 7—.— _ _._ 1( "1,- :2: 12(31 féhzbz) f%(z) The main result in this section is the following theorem: Theorem 1.3.1 Suppose H0, {61), (e2), (85’), (u), (f1), (f2), (m1), (m2), (5), (n), (111), (112), (1111), and (11121 hold, then ,x—n1(én — 90) => Nd(0,2612261), where 2 T 2 _ , T2 Z = T(z)Q(z)Q (219(21z 2:0— /Q(~)Q (1dG(1, 2 j M d, and 72(2) = a? + 62(2), where 02, and 62(2) are defined as in (62). Proof. It suffices to show that the matrix before an — 00 on the left hand side of (1.12) converges to 20 in probability, and \/n_1 times the right hand side of (1.12) is asymptotically normal with mean vector 0 and covariance matrix 2. Consider the second claim first. Adding and subtracting 03Q(Z,-) from Y,- — 60 Qn2(Z ,) 1n the first factor of the integrand, and adding and subtracting Q(Z,- ) from Qn2(Z ,) 1n the second factor of the integrand, replacing 1 / f Z h (z ) by l/fgh2(z) — 1/f%(z)+1/fZ(z) := An1(z) + l/fZ(z), \/n—1- times the right hand side of (1.12) can be written as the sum of the following eight terms: 3111 4‘ / U111 71(22)An1( 21dG(211 S112 =fl / U111(21Dn(21dw(211 51,13 4“ / U111 11n1(21A111(21dG(211 1144— / U111 1211111()dw(21, 8,15 := —¢— / D11 T(2.1An1(21dc(2160, -¢r/Dn.(:01€(21d11(2160, S117 == ya / Dr1-1(2112n1(21A111(21dG(21 60, 5718 2: —,/nl/D,;(z);1,£1(z)dti‘(z)60. Cr) :3 cu ll 21 Among these terms, 8,14 is asymptotically normal with mean vector 0 and covariance matrix )3. 
The proof uses Lindeberg—Feller central limit theorem, and the arguments are exactly the same as in K-N with ”’60(Xi) and #26009) there replaced by 6362(le) and Q(Zl-) here, respectively. The proof is omitted. All the other seven terms are of the order 0p(1). Since the proofs are similar, only Sn8 = 013(1) will be shown below for the sake of brevity. We note that by using a similar method as in K-N, we can show Un1(2) is Op(1/ nlh‘li), which is used in proving Snl = 0p(1) for l = 1,2,3. First, notice that the kernel function K has compact support [—1,1]d, so K h 12- is not 0 only if the distances between each coordinate pair of 2i and z are no more than h. on the other hand, the integrating measure has compact support I, so if we define y: (1113.” ayle,Z=(21,“‘ ,Zd)T,ZEI}, then Zhl is a compact set in Rd, and Khli = 0 if 2i ¢ Ihl. Hence, without loss of generality, we can assume all Z1: 6 Ih 1. Since f Z is bounded from below on the compact set 150 by assumption (f2) and Ihl C 160 for 711 large enough, so from assumption (W2), Lemma 1.2.2. we obtain ., 2 sup Tf—Z—fi — 1 = 0((10gkn2)(logn2)m) as, (1.14) ZEIhl fZ'll’2(Z) n2 sup Tf—Z—EL' 7- Op(1). ZEZhl qu72(2) Secondly, we have the following inequality: 22 . Illinngl') - R(Z,-)|l fZ(Zz) . Z: — Z- < . - ~ ————.fZ(Zi) —1 - Z- 115 + quQsz') ”Q( I)“. ( . ) Recall the definition of 5728' We have 771 llSnsll _<. \/_||90|| / 2319.1. zIIIQn2( I— Q(Zz°)ll i1 2: Kh1,(zIIIQ( g 2)m) 712 24 which is of the order 0p(l) by the assumption (n). To finish the proof, we only need to show the matrix before an — 60 on the left hand side of (1.12) converges to 20 in probability. Adding and subtracting Q(Zi) from Qn 2(Z 2") this matrix can be written as the sum of the following eight terms: Tm 2: /Dn (aszAan n2—/Dn>11T11dc11nw11mn1d012i T715 2: /Dn()Dn (2.')d1,/")(Z), T116 :=/Dn(z)u£1(z)d¢(z), Tn7 z= fun112>Dn12>dw, Tns:=/un11z>u%"1dw. Notice the connection between T 111 and S725, TnQanB and Sn7,T 715 and 31161 TnGan'? 
and 3718- By using similar argument as above, we can verify that Tn! = 0p(1) for l = 1,2,3,4,5,6,7. From (1.14), and the second fact in (1.17), T714 is also of the order of 0p( 1). Finally, employing similar method as in K-N, we can show Tn 8 converges to 20 in probability. Thereby proving the theorem. [:1 1.4 Asymptotic normality of the minimized dis- tance This section contains a proof of the asymptotic normality of the minimized distance Alb-Adm). To state the result precisely, the following notations are needed: a 12.431212 1 (wag—agengzi), 721 6n, := _2Z/Kh12 (2(11;)(z ), 1\7[n(00):= /[n1 121(h11(3)Ci]2dl/"'3( ), 25 T A F 2: 2/(T2(z))29(z)du(z)-/[/K(11.)K(u+1')du]2d11. where 72(:) is as in Theorem 1.3.1. The main result proved in this section is the following: Theorem 1.4.1 Suppose If H0, (61), (e2), (e4), (11), (f1), (f2), (ml), (7112), (K), (n), (M). (I12) ,(wl) and (1122) hold, then 711hf/2f‘51/2(A171,(6An)— 6771):) N(0,1), where 677. f‘n are as in {1.7). The proof of this theorem is facilitated by the following five lemmas: Lemma 1.4.1 If H0, (61), (62), (e4), (11), (f1), (f2), (m1), (m2), (5), (n), (hl), (1111) and (1112) hold, then d/2 ,” ~ Proof. Replacing C,- by 5,- + 193 (Q(Z,-) — Qn,2(Z,-)) in the definition MMBO) and expand the quadratic term, n1h7/2(1\~[n(60) — C'n) Can be written as the sum of the following four terms: B 1 nl n—g Khlzlz )Kh1j( Z)€1€jdw(z) "1 3,, == —"2— K111 )11,1,()(11,13‘1Q(z ) Q11212>>11211 711 n1 2 ,/ ”1 3112 := i¥§/K h1i( 3,)Kh1j( Z)151'90(Q (Zj)-Qng(Zj))dh9(Z)a "1 Z 12 / and BN4 '= 73%:Zl/Khlihflgh1jm60 (leil —Q712(Zi)) 7» J 631Q12,)- 131212))1111) 26 Using the similar method as in K-N. one can show that. filial/28,71 => 1N'd(0, P). 1 . . . . . d/2 , To prove the lemma, it is suthIent to show enlhl Bnl = op(1) for l = 2,3,4. We begin with the case of l = 2. 
By (1.14) and the inequality (1.15), and let CIIIj =§/Kh11)(z Kh1j(z )E1dw(z ) 2 J then 811.2 is bounded above by the sum 37121 + 8,122, where n1 1 A 8121 := 0111—, 11121212,>—R12,-)H1011,11, ”1j=1 logn2 3% 1711 B1122 z= 0((logkn2)( 1, ) )-—,ZiIIQ12,->n-Icn1,n. ”31 :1 On the one hand, by the conditional expectation argument and inequality (1.10), we have —, 2111111121 (2 ,-)— R1 Zhu-1011,11 Enlj=1 ”1 = "—1—2 Z[E31(HRn2(Zj—) R(Zj)”) ' [anjll En1j=1 ——2/(d+2a+4) 1 2Z g 0712 E[12 g: T /( ])|ijl] n2 1j=1 —2 (1+2 +4 :11, /( a )aElTl/2(21)'|Cn,illl- Now. consider the asymptotic behavior of E [71/ 2(Z1) - ICm-l I]. Instead of consider the expectation, we investigate the second moment. It is easy to see that ET(Zl )C12111 27 equals to T()Z1 Z Z //1\/,1,(2 )Ixhl1(z)Kh1j(y)Kh11(y)€,€jd1&(z)d1,b(y) (1.18) zaéljaél = 1121 —1>E// 119,121: 111,211 >131 E1111,1(2)I<,,,1(1)7121)1211211111). The second equality is from the independence 011,-, i = 1, - - - ,nl and E61 = 0. But E(Kh12(3)Kh12(1/)€%)= 1K11212>1<1121y>12§M2122)» = Zé?/11(5)?)K("’I:1“)(0,21115211))1121111111 1 Similarly, we can show that — 10103 + 15212 — hlv))fZ(z — 1111mm E(Kh11(2)Kh11(y)T(Zi)) = l—ld-/K(v)K y _ - v)T(z - hiv)fz(z - h110611)- 11 Putting back these two expectations in (1.18), and changing variables y = z + hlu, then by the continuity of f2, 62(2), 9(2), and T(z), we obtain ET(Z1)C2 nil : (n1 — 1)h1—d. Therefore, —2 d+2a+4 1 —d 2 7:1[IIR712(Z] RZ( dll'lCni'll—‘Oolg /( )— nl—lhl /). J J M V En1j=1 b d 2 d 2 r This, in turn, implies 81221: Op(n1—2 /( +2a+4)— 1/ h1— / ), by assump— " tion (11). Similarlv one can show 11 {2:311 1[||Q(Zj)|| - ICnin is of the 1.3.1 1 2 —d 2 2/(d+4) order Op(n1— /h1 /). Thus, 87,22 = op((logkn1)(logn1/nfi) - . Hence —12—d2 nl/h1/) 1 2b 1 2b 2 d 2 _“—‘_-—_ _ 1 nlhl/ an2| 2 011(11? d+211+4> + Op(n1Q m logk 111(log nflm) 28 is of the order 0p(1) since b > ((1 + 2a + 4)/4 by assumption (n). . .,.....,,.d/2 By exactly same method as a )o\ e, we can show that. nlh1 B713 = 0p(1). . 
' d/2 7 It remains to show that nlh1 B774 2 0p(1). Note that 1 n1 , 2 ‘ (3,41 s A? Z. [1,11,121Khlj12111601 112712121) — c2121)“- i¢jA ”Q'ngizj) - Q(Zj)”dl¥/‘(3)- From (1.15), the right hand side of above inequality is bounded above by the sum 2 l . 01,11) . 3,,41 + 0p((logk n2)( 0g ”2)m) (37242 + 13,,43) (1.19) n2 +019 ((10122 E2) (10:32:?) ELL4:71) 37144’ where 1 n1 2 87141 := 7 [K1,11121Kh1j121-1122712121) «(El->11- "1 #1 (1117:2123-1— 31211112121212), 711 1 A 87142 == 72 / Kh1112)1IIdw12), "12221 1 n1 , B7143 3: fiZ/KMAZNfile-llan2(Zj)—R(Zj)|| ' ”1212011211121, 12212“ 1 ”1 En44 == 72 [K11112>1-111212111111121231111212). ”1 2211' By a conditional expectation argument, Cauchy—Schwarz inequality, (2.2), and the continuity of f Z and T(z), we obtain n—4/(d+2a'+4) EEn41 s 2222‘4/‘d+2“’+4) / E11<111(2171/21211122212) = 01 2 >. 29 —4/(d+2a+4) This implies B7141 = Op(n.2 ), since I) > (d + 211 + 4)/4 by assumption (n), so that d 2 d. 2 —4b d+2a+4 nlhl/ -op(1)B,,41= nlhl/ . 010(1)op(n1 /( l) = 0,,(1). Similarly, we can show —2 d+2a+4 . —2 d+20+4 37142 = 019012 /( )), 812.43 = 019012 /( )), Bn44 = 010(1)- Therefore, for l = 2, 3, 2 nthll/Zop((10gk RB) (13%?) m)Bn4l 1_d?£4—d+223+4 (11/2 32—4 2 0p(n1 [11 (logkn1)(logn1) + ) which is of the order op(l) by assumption (n). For 87,44, we have 4 d2 logn d— "lhl/ '0p((login2)( n22) +4)Bn44 4b 1—3— d 2 4 2pm, +4h1/(108%"1)(10g"1)a+4) which is also of the order 0p(1). Finally, from above and (1.19), we prove nlhcll/2Bn4 = 019(1). Thereby proving the lemma. Lemma 1.4.2 In addition to the conditions in Lemma 1.4.1, suppose (h?) also holds, d 2 2 then nlhl/ (ann) — 1122160)) = 0,011). Proof. Recall the definitions of A{n(6). Adding and subtracting 1 711 T 2 a Z Khfli’v’WQ Qngizi) i=1 30 in the squared integrand of A«In(dn), we can write 11.1,,(én) — 111;;(60) as the sum l’an + 2Wn2, where "1 v 1 A A 2 1‘ 11,11 := / [;§:K,,,112)160——9n.>TQn2(Z2)] 21221212), 1i=1 n1 W112 := [7,112. 
2: Khlim n11: Kh1i( Z)(90- 9n)T Qn2(Zz')d?/1h2(z), and (2.: Y, — 190 Qn2(Z 2") Easy to see that 1 n1 2 2 2 111,1 3 2] [5—1-:Kh,,122)1904219112212122)421211)] 222,212111201 i=1 1 2 T 2 1 +2] [a 2:1 Khlz-(ZXQO — 9n) Q(Zi)] d¢h2(2)- 2: We write the first term on the right hand side as Wnll and the second term as Wn12- On the one hand, note that Wnll is bounded above by f (Z) 2 Ha — 6012 sup | Z ( Hui ZK21111~>HQ21212 > — Q( 22-111] 2112) “21023 By the conditional expectation argument as we used in the previous part, we can show that the integral part is indeed of the order 019(1). By assumption (W2), the compactness of I h 1, and the asymptotic behavior of én — 90 stated in Theorem 1.3.1, nlhcli/QWnll = op(hC11/2) = 029(1). On the other hand, Wnl2 is bounded above by Ilén—90||2-sup'fZ——Zw—-— (2:) /[ ”1:2:Kh11o1111Q12122111]2121 Since the integral part is of the order 019(1), so what/2 1'17,le = 01,013” 2 l = 0p(1) .. 7 . . d/2 , 1s easily obtalned. Therefore, nlh1 l/an : op(1) IS proved. Now, consider VVnQ. Rewrite it. as n1 "1 1 , , 11,,2 =/n—1 z.2211K,1,1,()cz 251;ZKh1,(z)Q,,2(z,)d1/1h2(z)2(90 4)”). 31 Note that integral part of IV"? is same as the expression on the right hand side of (1.12). thus 711 Wing = (972—90)T/n11ZKhlz@)Qn2(Zi’) —1nZKh1i@)Qisz(Zi)dl;’h2(W (90—911)- Therefore. W n2 is bounded above by 1132—6011 2122;] 2131me 111222212 ll] 21231.21 2) Adding and subtracting Q(Z i) from Qn2(Z -), it turns out that W122 is further bounded above by the sum an21 + ang, where W221 == 21162—60112/1221‘12Khl.11122212) 621221122111212), W222 := 21112-60112 /1n;1Z19.1.12)11212211122231212) i=1 Arguing as in Wnll and Wn121 we can show d 2 d 2 nlhl/ |Wn21| =op11), nlhl/ Iangl =op11). Therefore n.11hd/2 lunglz 019(1). Together with the result n1h1d2/ [IV n—ll — op(1), the lemma is proved. [:1 Lemma 1.4.3 If H0, {81), {62), (11), (fl), (f2), (ml), (m2), (5), (n). (11.1), {h2), (101) and (11.12) hold, n,1hcll/2(1l«fn(60) — ilTn,(60)) = 0p(1). Proof. 
Recall the definition of (l- and Un1(z ). Note that nlh (ll/2|Mn(90) — Mn(00)| is bounded above by 2 d 2 f (2) . nlhl/ sup ——2—Z————— 1| ][n—l— 72:1Kh12(z)(l]2 (111(3). 361 th2 1i 32 Replace CZ- bx éi + 67( (Q(Z i)_ Qn2(Z 21)), the integral part of the above inequality can be bounded above by the sum 2/113 ( )clc >+2/ [n1 ZKhllW (QZ( Zil‘Qn2(Zzfi))]2d’C¥’(Z)- The first term is of the order Op((n1h‘11)_1/2) which is obtained by the similar method as in K-N, while the second term, by the conditional expectation argument, has the same order as 2 ., 2 sup %Z—(:)—- -O(n2—4/(d+2a+4)) + sup I%Z—(Z—)— — 1 2 - Op(1). zEIhl wa2(3) zEIhl wa2(z) Therefore ,lnlh d2/ |1\In(60)—A71n(60)| is less than or equal to d 2 1 0p(n1hl/ -—d -logk n,1(logn1/n1)2/(d+4)) nhl' d 2 —4b d+2 +4 + 0p(n1h1/ - logk n,1(log'n.1/n.1)2/(d+4) - n1 /( a )) + 0p(n1h(11/2 ~10gk n1(logn1/n1)2/(d+4) -log% n1(logn.1)4/(d+4)n1_4b/(d+4)). All the three terms are of the order 0p(1) by the assumptions (n), (hl), (112), (WI) and (w2). Hence the lemma. Cl Lemma 1.4.4 If H0, (61), (62), (64), (u), {f1}, (f2), (m1), (7722), (i), (n), (h1), {h2}, (2111) and (2112) hold, nlhii/2(C‘n — (3'71) : 0p(1). Proof. Recall the notation An1(z) in (1.13). Adding and subtracting Oan,2(Zi) from Yz in the integrand of hCn, then expand the quadratic term. then Cn — (in can be rewritten as the sum of C721? 1 = 1,2,3,4, 5, where 33 721 1 C- 1 1: 2: h 7:(~ 90 Qn2(Z 2'2» Ann )dWZ)’ n 111 12': l/K2 1 2 n1 . A , . Cng := ”—22 / Kai-(2)0}‘93Q712(Zi))'(90—9n,)TQ712(Zi)An1(Zldwzl, 1i=1 n1 Cn3 == $Z/K131W 90Qn2('—))(90 én) Tengzadwz), 12' 1 Ca := 7 /K,2,1,m 42%" 6271202,»-(<90—én>TQn2(Zi))2An1(awe), 0715 2: '7” Z/K§1,m ‘gan2(Zi))((00‘én)TQn2(Zi))2dtb(z). To prove the lemma, it is enough to prove nlhf/2Cnl = 0p(1) for l = 1, 2, 3,4, 5. For the case of l = 1, first notice that lCnll g ZSuplAn1)('|11:1/Kh1i(—§nZ)€z'2d¢(2) 261 ”12: 11 2 A ~) . K2 z)(90T(Q - — “. Z- 2d + supl n1( 1.) 
h i( Q( Z2) Qn2( 1,)» 19(2) 261 nl i—l 1 = Cn11+Cn12- Since 721—2112:_lth1i(z(z){i2d¢(z) = Op(1/n1hcli) by a routing expectation argu- ment, so "lhii/Qlcnlll = 0p("1h(11/2 '(108k n1)(logn1)2/(d+4)n1‘2/(d+4) . (n.1h1)—1) 0p ("ad/2—2/(dnwl) _ (1ng n1)(logn1)2/(d+4)) = 012(1). Second, from the compactness of 9, we have 34 U] 1231,2123 (Q(2,~1 — (“22222111221221 712122] Khl,z1(211Q 1 (922(221112d2e1. Again by the conditional expectation argument, the second factor of the above ex— pression has the same order as —4/(d+2a+4) fZ(Z)' 2 O n - su ———~ K2 z)T Z d p( 2 ) 26121 llew2(Z'2 ) 711%:211/ h1i(z ( )1,“ ) + sup IJZ_(Z_)__1| 1%}: 1‘/K}IIZ(Z)(leQZizlllzd‘M ) zeIhl fzu,2(Z ) Because 22 / K1112 1(222 Z1dw(z=1 (Jpn/12111311, n1i=11 -2- Z / A2122311162(ZZ-1112c12( 1= ope/2112311, n1i=1 so, from (h2), (W2), and Lemma 1. 2. 2, we obtain n1h1d2/ lCn12l is of the order Op (nl—Q/(CH4)—4b/(d+2a+4)h1_d/2(logk n11)(log n1)2/(d+4)) + 0,,(121 —d/2n ”1— 2/(d+4)—4b/(d+4)(10g1gn1)(10gn1)6/(d+4)) which is 012(1) by assumption (h1) Hence we get nlh1 (12/ lCn1l_ —- 013(1). Now we will show that 71112.1/2 ICn3| 2 012(1). Once we prove this, then nlh1 2|Cn2l = 019(1) is a natural consequence. In fact, 0.23 = 3%;{31/K3412 (62+90Q(Z 1— 63Q2,2(2211 Z— ‘(90 -én)T(Q1-12(Z2)- Q(Zi) + Q(Zilldw(2)- 35 So lCn3| is bounded above by the sum 2(Cn31 + C7132 + 01,133 + C7134), where 01131:: 22/ 211, 1121-11160— (911111Q12121 Q12111121121, 01132 := 51:2/1931112112111160 — @1111111Q12111111121, 01123 := ~112]21111211160111160—é111111Q12(21-1—Q(21111221(21, 01134 == 22 / 11,1,11110111110_1n1111131212,1_ Q12111111Q(2111121(21. n1i=1 It is sufficient to show that nlh1 d/2 lC n3ll = 0p(1) for l = 1,2,3,4. Because the proofs are similar, here we only show nlhcli/2ICR3QI = 017(1), others are omitted for the sake of brevity. In fact, note that ”1112[2111121121111Q12111121( 1=—— 010/1111?) 2— 1 by a expectation argument, then from ”On — 90H— — 019(n1 12/ ) by Theorem 1. 3. 1, 1 —d 2 we have 11111;”? 
10,,321 = 2111/2 111911 4011 ppm/11111611) = 0,1111- 1211/ / ). Be- cause nl- 1/2h1—d/22 nil/2+ad/2 and a < 1/2d. by assumption (hl), so the above expression is 0p(l). Similarly, we can show that the same results hold for Cn4 and C715. Details are left out. El Lemma 1.4.5 Under the same conditions as in Lemma 1.4.4, fn — P = 0p(1). Proof. Recall the notation for Q. Define _2 1 2 Pn=2h1n1 Z (A/Khliu )Kh1](z )fifjdd)h2(2)) . 36 The len'n'na is proved by showing that fin — fin = 0pm, rn — r = 0,)(1). (1.21) But the second claim can be shown using the same method as in K-NJ so we only prove the first claim. Write an := én — 60, 'ri := 93Q(Z..)— 6,7; Q712(Z ). Now I}; can be expressed as the sum of f” and the following terms: "'1 B"1 = 2hil'nI222 [/Kh12(z(z)Kh1J-(z >52 rJ-duh2(z z) 2751' A 2 +/Kh1;(1 >Kh1j( >972de +th12(z >Kh1jirjdwh2(z>], 3,12 _ 4th 711-2: (th1J(z (z)KhIJ-(z)€zfjd1g71h2(z )) iaéj (fKh1i((:)Kh1j(:)€iTjd§:h2((3)-J— so it suffices to show that both terms are of the order 019(1). Applying the Cauchy- Schwarz inequality to the double sum, one can see that we only need to show the following: Kain {2 K )K ()IE-r-ldz?‘ <~>]2—o (1) (122) h1i(z hlj .z ] ”12.2 ~ - P ' i#j[ d "1 A 2 h1n2[fKh1i(z)Kh1J-(z)|1‘irjIdu’)h2(3)] = op(1), #1“ d 2"1 A 2 ml Z[jKh,1i(z>Kh1j(2)], #J’ An3 = 3hi’n; 22IIIIII ZIIQIJIZJ I2-[fKIJ1J-Iz )KhlJ-(Z)l€z'ldw(z)]2- #J A712 = 0p(1) can be shown be the fact that un = én — 60 = 0p(1), and that d 2 n1 2 III-r21 Z [ / KI1J 7/4. In the simulation, we choose b = 7/4 + 0.0001. The band widths are chosen to be 1 3 h1 = 121/ , he —- (10s(n1)/721)1/5. “’1 = 712—1/7, (112 = (log(712)/712)1/5 by the assumptions (hl), (112), (WI) and (w2). The kernel functions K, K* are the same as in the first case, while the density function L has a Fourier transform given by (1914(1) = max{(1 — t2)3, 0}, the corresponding deconvolution kernel function then 43 takes the form 1 I Lu»(;r) = ;/0 cos(t:r.)(1 — 1‘2)3 ex1)(0.005t2/u12)dt. 
Table 1.3 reports the Monte Carlo mean and the MSE of the MD estimator 071, under H0. One can see there appears to be small bias in 0n for all chosen sample sizes and as expected, the MSE decreases as the sample size increases. To assess the level and power behavior of the fin test, we Chose the following four models to simulate data from. Model 0: Y = X + 8, Model 1: Y = X + 0.3X2 + 5, Model 2: Y = X +1.4exp(—0.2X2)+ 5, Model 3: Y = XI(X 2 0.2) + 8. Table 1.4 reports the simulation results pertaining to fin. Data from Model 0 in this table are used to study the empirical sizes, and from Models 1 to 3 are used to study the empirical powers of the test. Case 3: This simulation considers the case of d = 1, q = 2. Everything here is same as in Case 1 except the null model we want to test is m9(X) = 81X + 62X2. The true parameters are 01 = 1, 02 = 2. Easy to see that Rn2(z) takes the form £1,120) 2: (/$qul1(T)fU(z —1)d$,/$2wal($lfU-(Z ‘0‘”)73 Table 1.5 reports the Monte Carlo mean and the MSE of the MD estimator én = (0,,1,0n2) under H0. One can see there appears to be small bias in 0,, for all chosen sample sizes and as expected, the MSE decreases as the sample size increases. 
Table 1.2: Levels and powers of the MD test, d = 1, q = 1, Double Exponential

                       (n1, n2)
Model     (a, b)       (50, 134)   (100, 317)   (200, 753)   (300, 1250)   (500, 2366)
Model 0   (0.3, 0.5)   0.003       0.008        0.009        0.020         0.041
          (0.3, 0.8)   0.008       0.014        0.017        0.031         0.053
          (0.5, 0.5)   0.010       0.011        0.020        0.030         0.049
          (0.8, 0.8)   0.020       0.024        0.027        0.042         0.052
          (1.0, 0.8)   0.024       0.028        0.026        0.039         0.050
          (1.0, 1.0)   0.028       0.037        0.030        0.048         0.054
Model 1   (0.3, 0.5)   0.407       0.865        0.987        0.997         1.000
          (0.3, 0.8)   0.491       0.888        0.990        0.998         1.000
          (0.5, 0.5)   0.704       0.975        0.999        1.000         1.000
          (0.8, 0.8)   0.896       0.997        1.000        1.000         1.000
          (1.0, 0.8)   0.921       0.999        1.000        1.000         1.000
          (1.0, 1.0)   0.926       0.997        1.000        1.000         1.000
Model 2   (0.3, 0.5)   0.898       0.972        0.999        0.999         1.000
          (0.3, 0.8)   0.919       0.976        0.999        0.999         1.000
          (0.5, 0.5)   0.985       0.999        0.999        1.000         1.000
          (0.8, 0.8)   0.998       1.000        1.000        1.000         1.000
          (1.0, 0.8)   0.999       1.000        1.000        1.000         1.000
          (1.0, 1.0)   0.999       1.000        1.000        1.000         1.000
Model 3   (0.3, 0.5)   0.774       0.959        0.993        0.998         1.000
          (0.3, 0.8)   0.807       0.964        0.993        0.998         1.000
          (0.5, 0.5)   0.933       0.966        0.999        1.000         1.000
          (0.8, 0.8)   0.999       1.000        1.000        1.000         1.000
          (1.0, 0.8)   0.992       1.000        1.000        1.000         1.000
          (1.0, 1.0)   0.988       1.000        1.000        1.000         1.000

Table 1.3: Mean and MSE of $\hat\theta_n$, d = 1, q = 1, Normal

(n1, n2)   (50, 941)   (100, 3164)   (200, 10643)   (300, 21638)   (500, 52902)
Mean       1.0051      1.0078        1.0085         1.0101         1.0169
MSE        0.0013      0.0007        0.0004         0.0003         0.0004

Table 1.4: Levels and powers of the MD test, d = 1, q = 1, Normal

           (n1, n2)
Model      (50, 941)   (100, 3164)   (200, 10643)   (300, 21638)   (500, 52902)
Model 0    0.018       0.022         0.029          0.035          0.049
Model 1    0.918       0.999         1.000          1.000          1.000
Model 2    0.999       1.000         1.000          1.000          1.000
Model 3    0.993       1.000         1.000          1.000          1.000

Table 1.5: Mean and MSE of $\hat\theta_n$, d = 1, q = 2, Double Exponential

(n1, n2)                      (50, 134)   (100, 317)   (200, 753)   (300, 1250)   (500, 2366)
Mean of $\hat\theta_{n1}$     1.0169      1.0144       1.0139       1.0136        1.0128
MSE of $\hat\theta_{n1}$      0.0058      0.0031       0.0015       0.0011        0.0007
Mean of $\hat\theta_{n2}$     2.0450      2.0452       2.0463       2.0493        2.0473
MSE of $\hat\theta_{n2}$      0.0124      0.0076       0.0046       0.0042        0.0033

Table 1.6: Levels and powers of the MD test, d = 1, q = 2, Double Exponential

           (n1, n2)
Model      (50, 134)   (100, 317)   (200, 753)   (300, 1250)   (500, 2366)
Model 0    0.001       0.009        0.019        0.029         0.046
Model 1    0.297       0.815        0.999        1.000         1.000
Model 2    0.528       0.965        0.999        1.000         1.000
Model 3    0.996       0.999        1.000        1.000         1.000

To assess the level and power behavior of the $\hat{\mathcal D}_n$ test, we chose the following four models to simulate data from:

Model 0: $Y = X + 2X^2 + \varepsilon$,
Model 1: $Y = X + 2X^2 + 0.3X^3 + 0.1 + \varepsilon$,
Model 2: $Y = X + 2X^2 + 1.4\exp(-0.2X^2) + \varepsilon$,
Model 3: $Y = X + 2X^2\sin(X) + \varepsilon$.

Table 1.6 reports the simulation results pertaining to $\hat{\mathcal D}_n$. Data from Model 0 in this table are used to study the empirical sizes, and data from Models 1 to 3 are used to study the empirical powers of the test.

Case 4: This simulation considers the case of $d = 2$, $q = 2$. The null model we want to test is $m_\theta(X) = \theta_1 X_1 + \theta_2 X_2$. The true parameters are $\theta_1 = 1$, $\theta_2 = 2$. The kernel functions $K$ and $K^*$ and the bandwidths used in the simulation are

$$K(z_1, z_2) = K^*(z_1, z_2) = \frac{9}{16}(1 - z_1^2)(1 - z_2^2)\,I(|z_1| \le 1, |z_2| \le 1), \qquad (1.26)$$

$$h_1 = n_1^{-1/5}, \qquad h_2 = n_1^{-1/6}(\log n_1)^{1/6}.$$

Table 1.7: Mean and MSE of $\hat\theta_n$, d = 2, q = 2, Double Exponential

(n1, n2)                      (50, 354)   (100, 1001)   (200, 2830)   (300, 5200)   (500, 11188)
Mean of $\hat\theta_{n1}$     1.0099      1.0120        1.0115        1.0094        1.0113
MSE of $\hat\theta_{n1}$      0.0042      0.0019        0.0011        0.0008        0.0005
Mean of $\hat\theta_{n2}$     2.0202      2.0220        2.0213        2.0225        2.0209
MSE of $\hat\theta_{n2}$      0.0042      0.0027        0.0014        0.0011        0.0008

For the chosen kernel function (1.26), the constant $C$ in $\hat\Gamma_n$ is equal to 0.292.
The kernel function used in (1.4) is chosen to be the bivariate standard normal, so the deconvolution kernel function with bandwidth $w$ takes the form

$$L_w(x) = \frac{1}{2\pi}\exp\Big(-\frac{x_1^2 + x_2^2}{2}\Big)\Big[1 - \frac{0.005\,(x_1^2 - 1)}{w^2}\Big]\Big[1 - \frac{0.005\,(x_2^2 - 1)}{w^2}\Big].$$

Since (m1) holds for $\alpha = 0$, the bandwidths $w_1 = n_2^{-1/6}$, $w_2 = (\log(n_2)/n_2)^{1/6}$ are chosen by assumptions (w1) and (w2). According to assumption (n), we take $n_2 = n_1^{1.5001}$.

Table 1.7 reports the Monte Carlo mean and the MSE of the MD estimator $\hat\theta_n = (\hat\theta_{n1}, \hat\theta_{n2})$ under $H_0$. There appears to be a small bias in $\hat\theta_n$ for all chosen sample sizes and, as expected, the MSE decreases as the sample size increases.

To assess the level and power behavior of the $\hat{\mathcal D}_n$ test, we chose the following four models to simulate data from:

Model 0: $Y = X_1 + 2X_2 + \varepsilon$,
Model 1: $Y = X_1 + 2X_2 + 0.3X_1X_2 + 0.9 + \varepsilon$,
Model 2: $Y = X_1 + 2X_2 + 1.4(\exp(-0.2X_1) - \exp(0.7X_2)) + \varepsilon$,
Model 3: $Y = X_1\,I(X_2 \ge 0.2) + \varepsilon$.

Table 1.8: Levels and powers of the MD test, d = 2, q = 2, Double Exponential

           (n1, n2)
Model      (50, 354)   (100, 1001)   (200, 2830)   (300, 5200)   (500, 11188)
Model 0    0.002       0.012         0.018         0.016         0.038
Model 1    0.908       0.998         1.000         1.000         1.000
Model 2    0.992       0.999         1.000         1.000         1.000
Model 3    0.935       0.996         1.000         1.000         1.000

Table 1.8 reports the simulation results pertaining to $\hat{\mathcal D}_n$. Data from Model 0 in this table are used to study the empirical sizes, and data from Models 1 to 3 are used to study the empirical powers of the test.

1.6 Discussion

1.6.1 Sample Size Allocation

The simulation studies show that the proposed testing procedures are quite satisfactory in the preservation of the finite sample level and in terms of a power comparison. But in the proofs of the above theorems, we need the sample size allocation assumption (n) to ensure that the estimator $\hat Q_{n_2}(z)$ has a faster convergence rate. The assumption (n) plays a very important role in the theoretical argument, but it is unattractive to a practitioner.
For example, in simulation Case 1, where the measurement error follows a double exponential distribution, the sample size allocation is $n_2 = [n_1^b]$ with $b = 1.2501$. The size $n_2$ of the second subsample $S_2$ thus grows as a power of the size $n_1$ of the first subsample. If $n_1 = 500$, $n_2$ is at least 2365, so the full sample has size 2865, which is perhaps not easily available in practice. The situation becomes even worse when the measurement error is super smooth or $d > 1$. For example, in Case 2, where the measurement error has a normal distribution, $n_2$ is at least 52902 if $n_1 = 500$; in Case 4, where $d = 2$, $n_2$ is at least 11188 if $n_1 = 500$.

An interesting question then arises: what is the small sample behavior of the test procedure if (1) $n_1 = n_2$ and the two subsamples $S_1$ and $S_2$ are independent, or (2) $n = n_1 = n_2$ and the same sample is used throughout the test? We have no theory at this point about the asymptotic behavior of $M_n(\hat\theta_n)$ in these settings. For $d = 1$, we only conduct some Monte Carlo simulations to see the performance of the test procedure; see Tables 1.9 to 1.12. In these tables, which report the levels and powers of the MD test, the measurement error follows the same double exponential and normal distributions as in the previous section, and the null and alternative models are the same as in Case 1.
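To make the cost of assumption (n) concrete, the allocation $n_2 = [n_1^b]$ can be tabulated directly. The sketch below is illustrative only: the helper name `second_subsample_size` is hypothetical, and rounding up is an assumption, chosen because it reproduces the $(n_1, n_2)$ pairs listed in the simulation tables (truncating instead gives values one smaller).

```python
import math

def second_subsample_size(n1, b):
    """Smallest integer >= n1**b (rounding up is an assumption, see lead-in)."""
    return math.ceil(n1 ** b)

# Exponents b used in the simulations: 1.2501 (Case 1), 7/4 + 0.0001 (Case 2), 1.5001 (Case 4)
allocation = {
    "Case 1 (double exponential, d = 1)": 1.2501,
    "Case 2 (normal, d = 1)": 1.7501,
    "Case 4 (double exponential, d = 2)": 1.5001,
}

for label, b in allocation.items():
    pairs = [(n1, second_subsample_size(n1, b)) for n1 in (50, 100, 200, 300, 500)]
    print(label, pairs)
```

Already at $n_1 = 500$ the second subsample is roughly 5, 22, and 106 times larger than the first for $b = 1.2501$, $1.5001$, and $1.7501$ respectively, which is the practical difficulty discussed above.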
Table 1.9: $n_1 = n_2$, d = 1, q = 1, Double Exponential

           Sample size (n1, n2)
Model      (50, 50)   (100, 100)   (200, 200)   (300, 300)   (500, 500)
Model 0    0.008      0.036        0.033        0.038        0.049
Model 1    0.938      1.000        1.000        1.000        1.000
Model 2    1.000      1.000        1.000        1.000        1.000
Model 3    0.990      1.000        1.000        1.000        1.000

Table 1.10: Same sample, d = 1, q = 1, Double Exponential

           Sample size
Model      50       100      200      300      500
Model 0    0.015    0.024    0.036    0.043    0.047
Model 1    0.934    1.000    1.000    1.000    1.000
Model 2    0.999    1.000    1.000    1.000    1.000
Model 3    0.991    1.000    1.000    1.000    1.000

Table 1.11: $n_1 = n_2$, d = 1, q = 1, Normal

           Sample size (n1, n2)
Model      (50, 50)   (100, 100)   (200, 200)   (300, 300)   (500, 500)
Model 0    0.013      0.023        0.027        0.035        0.047
Model 1    0.931      0.999        1.000        1.000        1.000
Model 2    1.000      1.000        1.000        1.000        1.000
Model 3    0.984      1.000        1.000        1.000        1.000

Table 1.12: Same sample, d = 1, q = 1, Normal

           Sample size
Model      50       100      200      300      500
Model 0    0.017    0.019    0.036    0.036    0.051
Model 1    0.954    0.998    1.000    1.000    1.000
Model 2    0.999    1.000    1.000    1.000    1.000
Model 3    0.992    1.000    1.000    1.000    1.000

Table 1.13: Same sample, d = 2, q = 2, Double Exponential

           Sample size
Model      50       100      200      300      500
Model 0    0.000    0.004    0.010    0.018    0.041
Model 1    0.628    0.996    1.000    1.000    1.000
Model 2    0.994    0.999    1.000    1.000    1.000
Model 3    0.844    0.998    1.000    1.000    1.000

To our surprise, the simulation results for the first three cases, in which $d = 1$, are very good. There is almost no difference between the simulation results based on our theory and those obtained by simply ignoring the theory. In Case 4 with $d = 2$, we only conduct the simulation for $S_1 = S_2$; see Table 1.13. The test procedure is conservative for small sample sizes, but the empirical level is close to the nominal level 0.05 when the sample size reaches 500. This phenomenon suggests that, after relaxing some conditions, such as (n), and even the assumptions on the choices of the bandwidths, Theorem 1.3.1 and Theorem 1.4.1 may still be valid.
1.6.2 General Errors-in-Variables Model Fitting

In the previous sections we discussed the model fitting problem in errors-in-variables models in which the regression function is linear in $\theta$ under the null hypothesis. The separation between the parameter and the predictor enables us not only to get an explicit expression for the estimator, but also to utilize a conditional expectation argument, so that we can use Lemma 1.2.1 to get a better sample allocation scheme. If the regression function under the null hypothesis has a more general form than the one discussed in this chapter, things become complicated. For the sake of brevity, this section only reports the results we obtained for general errors-in-variables model fitting.

To be specific, in the errors-in-variables model (1.1), the problem of interest is to develop tests for the following hypotheses:

$$H_0: \mu(x) = m_{\theta_0}(x) \ \text{for some } \theta_0 \in \Theta, \quad \text{vs.} \quad H_1: H_0 \text{ is not true}, \qquad (1.27)$$

where $\{m_\theta(x): \theta \in \Theta\}$ is a given parametric family. Just as in the special case considered in the previous sections, the problem of testing $H_0$ is transformed into testing $\nu(z) = \nu_{\theta_0}(z)$, where now $\nu_\theta(z) := E(m_\theta(X)\,|\,Z = z)$. A very important question related to this change of hypothesis is the following: are the two hypotheses, $H_{10}: \mu(x) = m_{\theta_0}(x)$ for some $\theta_0$ and all $x$, and $H_{20}: \nu(z) = \nu_{\theta_0}(z)$ for some $\theta_0$ and all $z$, equivalent? The answer is negative in general, because for any two measurable functions $m_1(x)$, $m_2(x)$, $E(m_1(X)\,|\,Z = z) = E(m_2(X)\,|\,Z = z)$ for all $z$ need not imply $m_1(x) = m_2(x)$ for all $x$. In this case, if our test rejects $H_{20}$, then we can reject $H_{10}$ as well, but if the test fails to reject $H_{20}$, then we can say nothing about $H_{10}$. Note that $E(m_1(X)\,|\,Z = z) = E(m_2(X)\,|\,Z = z)$ is equivalent to

$$\int m_1(x) f_X(x) f_u(z - x)\,dx = \int m_2(x) f_X(x) f_u(z - x)\,dx$$

for all $z$. Hence if $f_u(z - \cdot)$, as a distribution family with parameter $z \in \mathbb{R}^d$, forms a complete family, then these two hypotheses are indeed equivalent.
This is the case, for example, for the normal distribution and, if $d = 1$, for the double exponential distribution.

From (1.3) one sees that if $f_X$ is known then $f_Z$ is known and hence $\nu_\theta$ is known except for $\theta$. Therefore a modification of K-N's procedure in this case is as follows. Let

$$T_n^*(\theta) := \int \Big[\frac{1}{n f_Z(z)}\sum_{i=1}^n K_{hi}(z)\,Y_i - \nu_\theta(z)\Big]^2 dG(z), \quad \theta \in \Theta, \qquad (1.28)$$

$$T_n(\theta) := \int \Big[\frac{1}{n f_Z(z)}\sum_{i=1}^n K_{hi}(z)\,(Y_i - \nu_\theta(Z_i))\Big]^2 dG(z), \quad \theta \in \Theta,$$

$$\theta_n^* := \arg\min_{\theta\in\Theta} T_n^*(\theta), \qquad \hat\theta_n := \arg\min_{\theta\in\Theta} T_n(\theta).$$

Here $h$ is a bandwidth depending only on $n$. Then we may use $\hat\theta_n$ to estimate $\theta$, and construct the test statistic through $T_n(\hat\theta_n)$.

Unfortunately, $f_X$ is generally not known and hence $f_Z$ and $\nu_\theta$ are unknown. This makes the above procedures infeasible. To construct the test statistic, one needs estimators for $f_Z$ and $\nu_\theta$. For $f_Z$, one can still use the classical kernel estimator, with a possibly different kernel function $K^*$ and a bandwidth $h_2$. So one only needs to find an estimator for $\nu_\theta$. Using the deconvolution kernel density estimator $\hat f_{Xh_3}$ of $f_X$ with bandwidth $h_3$, one can estimate $\nu_\theta(z)$ by

$$\hat\nu_\theta(z) = \frac{\int m_\theta(x)\,\hat f_{Xh_3}(x)\,f_u(z - x)\,dx}{\hat f_{Zh_3}(z)}, \qquad \hat f_{Zh_3}(z) = \int \hat f_{Xh_3}(x)\,f_u(z - x)\,dx.$$

Our proposed inference procedures will be based on the analogs of $T_n$ where $\nu_\theta(z)$ in (1.28) is replaced by its estimator $\hat\nu_\theta(z)$. To be precise, we use the first $n_1 = n_1(n) < n$ observations to estimate $f_Z$, and use all $n$ observations to estimate $f_X$. The bandwidths $h_1$, $h_2$ will depend on the subsample size $n_1$, and $h_3$ will still depend on the full sample size $n$. Replace $\nu_\theta(z)$ in (1.28) by its estimator $\hat\nu_\theta(z)$ and define

$$M_n^*(\theta) := \int \Big[\frac{1}{n_1 \hat f_{Zh_2}(z)}\sum_{i=1}^{n_1} K_{h_1 i}(z)\,Y_i - \hat\nu_\theta(z)\Big]^2 dG(z),$$

$$M_n(\theta) := \int \Big[\frac{1}{n_1 \hat f_{Zh_2}(z)}\sum_{i=1}^{n_1} K_{h_1 i}(z)\,(Y_i - \hat\nu_\theta(Z_i))\Big]^2 dG(z), \quad \theta \in \Theta,$$

$$\theta_n^* := \arg\inf_{\theta\in\Theta} M_n^*(\theta), \qquad \hat\theta_n := \arg\inf_{\theta\in\Theta} M_n(\theta).$$

Then we may use $\hat\theta_n$ to estimate $\theta$, and construct the test statistic through $M_n(\hat\theta_n)$. We can show that $\theta_n^*$ converges to $\theta$ in probability.
But as is clear, $\theta_n^*$ is really not an estimator; we need this convergence result, however, to prove the consistency of $\hat\theta_n$ for $\theta$ and the asymptotic normality of $\sqrt{n_1}(\hat\theta_n - \theta_0)$. Finally, let $g$ be a density of $G$, and let

$$\zeta_i := Y_i - \nu_{\theta_0}(Z_i), \qquad \hat\zeta_i := Y_i - \hat\nu_{\hat\theta_n}(Z_i),$$

$$\tilde C_n := n_1^{-2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\,\zeta_i^2\,d\psi(z), \qquad \hat C_n := n_1^{-2}\sum_{i=1}^{n_1}\int K_{h_1 i}^2(z)\,\hat\zeta_i^2\,d\hat\psi_{h_2}(z),$$

$$\hat\Gamma_n := 2h_1^d n_1^{-2}\sum_{i\ne j}\Big(\int K_{h_1 i}(z)K_{h_1 j}(z)\,\hat\zeta_i\hat\zeta_j\,d\hat\psi_{h_2}(z)\Big)^2,$$

$$\tau^2(z) := \sigma_\varepsilon^2 + E\big((m_{\theta_0}(X) - \nu_{\theta_0}(Z))^2\,\big|\,Z = z\big), \qquad \sigma_\varepsilon^2 := \mathrm{Var}(\varepsilon),$$

$$\Gamma := 2\int (\tau^2(z))^2 g(z)\,d\psi(z)\cdot\int\Big(\int K(u)K(u+v)\,du\Big)^2 dv,$$

$$d\psi(z) := \frac{dG(z)}{f_Z^2(z)}, \qquad d\hat\psi_{h_2}(z) := \frac{dG(z)}{\hat f_{Zh_2}^2(z)}.$$

Under an appropriate sample size allocation scheme, under the null hypothesis, and under other regularity conditions, we can show that the asymptotic distribution of $n_1 h_1^{d/2}\hat\Gamma_n^{-1/2}(M_n(\hat\theta_n) - \hat C_n)$ is standard normal. But the sample allocation scheme $n_1 = n_1(n)$ is not feasible, particularly in the super smooth case. Simulation results show that, if we do not follow the sample allocation scheme, just as in the previous section, the test statistic behaves quite satisfactorily.

CHAPTER 2

Minimum Distance Berkson Model Fitting

2.1 Introduction

The Berkson model is also commonly used in real applications. As an example, consider the herbicide study of Rudemo, et al. (1989) in which a nominal measured amount $Z$ of herbicide was applied to a plant but the actual amount $X$ absorbed by the plant is unobservable. As another example, from Wang (2004), an epidemiologist studies the severity of a lung disease, $Y$, among the residents in a city in relation to the amount of certain air pollutants, $X$. The amount of the air pollutants $Z$ can be measured at certain observation stations in the city, but the actual exposure of the residents to the pollutants, $X$, is unobservable and may vary randomly from the $Z$-values. In both cases, $X$ can be expressed as $Z$ plus a random error. There are many similar examples in agricultural or medical studies; see, e.g., Fuller (1987), Carroll, Ruppert and Stefanski (1995), among others.
All these examples can be formalized into the so-called Berkson model

$$Y = \mu(X) + \varepsilon, \qquad X = Z + \eta, \qquad (2.1)$$

where $\eta$ and $\varepsilon$ are random errors with $E\varepsilon = 0$, $\eta$ is $d$-dimensional, and $Z$ is the observable $d$-dimensional control variable. All three variables $\varepsilon$, $\eta$, and $Z$ are assumed to be mutually independent.

The parametric Berkson model, where the regression function is of a parametric form $\{m_\theta(x): x \in \mathbb{R}^d, \theta \in \Theta \subset \mathbb{R}^q\}$, $q \ge 1$, has been the focus of numerous authors. Fuller (1987) and Cheng and Van Ness (1999), among others, discuss estimation in linear Berkson measurement error models. For nonlinear models, Carroll et al. (1995) and references therein consider the estimation problem using the regression calibration method. Huwang and Huang (2000) study the estimation problem when $m_\theta(x)$ is a polynomial in $x$ of a known order and show that the least squares estimators based on the first two conditional moments of $Y$, given $Z$, are consistent. Wang (2003, 2004) addresses the same problem in general nonlinear models and shows that the estimators obtained by minimizing the first two conditional moments of $Y$, given $Z$, are consistent and asymptotically normal.

But the literature appears to be scant on the lack-of-fit testing problem in this important model. This paper makes an attempt at filling this void. To be precise, with $(X, Y)$ obeying the model (2.1), the problem of interest here is to test the hypothesis

$$H_0: \mu(x) = m_{\theta_0}(x) \ \text{for some } \theta_0 \in \Theta \text{ and for all } x; \qquad H_1: H_0 \text{ is not true},$$

based on a random sample $(Z_i, Y_i)$, $1 \le i \le n$, from the distribution of $(Z, Y)$; recall that $X$ itself is not observed.

Many interesting and profound results, on the contrary, are available for the regression model checking problem in the absence of errors in the independent variables; see, e.g., Eubank and Spiegelman (1990), An and Cheng (1991), Hart (1997) and references therein, Stute (1997), Stute, Thies, and Zhu (1998), among others.
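The difference between the Berkson structure $X = Z + \eta$ in (2.1) and the classical errors-in-variables structure of Chapter 1, where the observed variable is the latent predictor plus noise, can be seen in a small simulation. The sketch below is an illustration under assumptions not taken from the text: a linear regression function $\mu(x) = 2x$, Gaussian variables, error variance 0.5, and a hypothetical helper `ols_slope`. In the Berkson case $E(Y \mid Z) = 2Z$, so least squares on the observed $Z$ recovers the slope; in the classical case the slope is attenuated by the factor $\mathrm{Var}(X)/(\mathrm{Var}(X) + \mathrm{Var}(u))$.

```python
import numpy as np

def ols_slope(x, y):
    """Least squares slope of y on x, with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(0)
n, theta = 5000, 2.0          # illustrative sample size and slope (assumptions)

# Berkson model (2.1): Z is observed, X = Z + eta is not.
Z = rng.normal(0.0, 1.0, n)
X = Z + rng.normal(0.0, np.sqrt(0.5), n)
Y = theta * X + rng.normal(0.0, 0.1, n)
slope_berkson = ols_slope(Z, Y)          # close to theta = 2

# Classical errors-in-variables: X is latent, Z_eiv = X + u is observed.
X_eiv = rng.normal(0.0, 1.0, n)
Z_eiv = X_eiv + rng.normal(0.0, np.sqrt(0.5), n)
Y_eiv = theta * X_eiv + rng.normal(0.0, 0.1, n)
slope_classical = ols_slope(Z_eiv, Y_eiv)  # close to theta / 1.5 = 4/3

print(round(slope_berkson, 2), round(slope_classical, 2))
```

This only covers the linear case; for a nonlinear $\mu$ the Berkson error no longer averages out, which is why the density of $\eta$ enters the procedures developed in this chapter.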
The recent paper of Koul and Ni (2004) uses the minimum distance methodology to propose tests of lack-of-fit for the regression model without errors in variables. In a finite sample comparison of these tests with some other existing tests, they noted that a member of this class preserves the asymptotic level and has very high power against some alternatives compared to some other existing lack-of-fit tests. This paper extends this methodology to the above Berkson model.

To be specific, Koul and Ni (2004) (K-N) considered the following tests of $H_0$ where the design is random and observable, and the errors are heteroscedastic. For any density kernel $K$, let $K_h(x) := K(x/h)/h^d$, $h > 0$, $x \in \mathbb{R}^d$. Define, as in K-N,

$$\hat f_w(x) := \frac{1}{n}\sum_{i=1}^n K_w^*(x - X_i), \qquad w = w_n \sim (\log n/n)^{1/(d+4)},$$

$$T_n(\theta) := \int_{\mathcal C}\Big[\frac{1}{n}\sum_{j=1}^n K_h(x - X_j)\,(Y_j - m_\theta(X_j))\Big]^2 \frac{dG(x)}{\hat f_w^2(x)},$$

and $\hat\theta_n := \arg\min_{\theta\in\Theta} T_n(\theta)$, where $K$, $K^*$ are density kernel functions, possibly different, $h = h_n$ and $w = w_n$ are the window widths, depending on the sample size $n$, and $G$ is a sigma-finite measure on $\mathcal C$, which is a compact subset of $\mathbb{R}^d$. They proved the consistency and asymptotic normality of this estimator, and that the asymptotic null distribution, under $H_0$, of $\mathcal D_n := n h^{d/2}(T_n(\hat\theta_n) - \hat C_n)/\hat\Gamma_n^{1/2}$ is standard normal, where

$$\hat C_n := \frac{1}{n^2}\sum_{i=1}^n \int_{\mathcal C} K_h^2(x - X_i)\,\hat\varepsilon_i^2\,\hat f_w^{-2}(x)\,dG(x), \qquad \hat\varepsilon_i = Y_i - m_{\hat\theta_n}(X_i),$$

$$\hat\Gamma_n := \frac{2h^d}{n^2}\sum_{i\ne j}\Big(\int_{\mathcal C} K_h(x - X_i)K_h(x - X_j)\,\hat\varepsilon_i\hat\varepsilon_j\,\hat f_w^{-2}(x)\,dG(x)\Big)^2.$$

The test based on $\mathcal D_n$ is preferable over the tests developed by Härdle and Mammen (1993) and Zheng (1996). Unlike in these and other related papers, K-N do not need the null regression function to be twice continuously differentiable in the parameter vector, nor do their proofs need the rate for uniform consistency of nonparametric regression function estimators. Moreover, the asymptotic normality of $n^{1/2}(\hat\theta_n - \theta_0)$ and $\mathcal D_n$ was made feasible by recognizing the need to use different window widths for the estimation of the numerator and denominator in the nonparametric regression function estimation.
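For a one-dimensional linear null model $m_\theta(x) = \theta x$, the K-N distance $T_n(\theta)$ is quadratic in $\theta$, so its minimizer has a closed form. The sketch below illustrates this under several assumptions that are not part of the text: $d = 1$, an Epanechnikov kernel for both $K$ and $K^*$, $G$ uniform on $[-1, 1]$ approximated by an equally spaced grid, bandwidth $h = n^{-1/3}$, and simulated data with standard normal design. The function name `md_fit_linear` is hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov density kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def md_fit_linear(X, Y, h, w, x_grid):
    """Minimize T_n(theta) for m_theta(x) = theta * x, with G uniform on x_grid.

    Writing A(x) = mean K_h(x - X_i) Y_i and B(x) = mean K_h(x - X_i) X_i,
    T_n(theta) = int (A - theta B)^2 / f_w^2 dG, which is quadratic in theta.
    """
    n = len(X)
    U = x_grid[:, None] - X[None, :]
    Kh = epanechnikov(U / h) / h
    f_w = (epanechnikov(U / w) / w).mean(axis=1)   # density estimate with bandwidth w
    A = Kh @ Y / n
    B = Kh @ X / n
    weight = 1.0 / f_w**2
    theta_hat = float(np.sum(A * B * weight) / np.sum(B**2 * weight))
    T_min = float(np.mean((A - theta_hat * B) ** 2 * weight))  # Riemann sum for the dG integral
    return theta_hat, T_min

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=n)
Y = 1.0 * X + 0.1 * rng.normal(size=n)         # data generated under H0 with theta_0 = 1
h = n ** (-1.0 / 3.0)                          # illustrative choice of h
w = (np.log(n) / n) ** (1.0 / 5.0)             # w ~ (log n / n)^(1/(d+4)) with d = 1
theta_hat, T_min = md_fit_linear(X, Y, h, w, np.linspace(-1.0, 1.0, 201))
```

The two bandwidths `h` and `w` are deliberately different, mirroring the point that the numerator and the density estimate in the denominator use separate window widths. The centering $\hat C_n$ and scaling $\hat\Gamma_n$ needed for the test statistic are omitted from this sketch.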
A consequence of the above asymptotic normality result is that, at least for large samples, one does not need to use any resampling method to implement these tests.

These findings thus motivate one to look for tests of lack-of-fit in the Berkson model based on the above minimized distances. Since the predictors in Berkson models are unobservable, clearly the above procedures need some modifications. Let $f_\varepsilon$, $f_X$, $f_\eta$, $f_Z$ denote the density functions of the r.v.'s in their subscripts and $\sigma_\varepsilon^2$ denote the variance of $\varepsilon$. In linear regression models, if one is interested in making inference about the coefficient parameters only, these density functions need not be known. Berkson (1950) pointed out that the ordinary least squares estimators are unbiased and consistent in these models and one can simply ignore the measurement error $\eta$. But if the regression model is nonlinear, or if there are other parameters in the Berkson model that need to be estimated, then extra information about these densities should be supplied to ensure identifiability. A standard assumption in the literature is that $f_\eta$ is known, or unknown only up to a Euclidean parameter vector; cf. Carroll et al. (1995), Huwang and Huang (2000), Wang (2004), among others. Throughout this paper, we shall assume that $f_\eta$ is known unless the regression function under the null hypothesis is linear.

To adapt K-N's procedure to the current setup, we first need to obtain a nonparametric estimator of $\mu$. Note that in the model (2.1), $f_X(x) = \int f_Z(z) f_\eta(x - z)\, dz$. Let $K$ be a kernel density,
\[ \hat f_Z(z) = n^{-1} \sum_{i=1}^n K_h(z - Z_i) \]
be the kernel estimator of $f_Z(z)$, and
\[ R_h(x, z) := \int K_h(y - z) f_\eta(x - y)\, dy, \qquad x, z \in \mathbb{R}^d. \]
It is then natural to estimate $f_X(x)$ by
\[ \hat f_X(x) = \int \hat f_Z(z) f_\eta(x - z)\, dz = n^{-1} \sum_{i=1}^n R_h(x, Z_i), \qquad x \in \mathbb{R}^d. \]
Given the estimator $\hat f_X(x)$, one is then tempted to estimate the regression function $\mu(x)$ by
\[ \hat J_n(x) = \frac{\sum_{i=1}^n R_h(x, Z_i) Y_i}{n \hat f_X(x)} . \]
Unfortunately, the classical argument shows that $\hat J_n(x)$ is not a consistent estimator of $\mu(x)$.
It is in fact consistent for $J(x) = E[H(Z) \mid X = x]$, where $H(z) = E[\mu(X) \mid Z = z]$.

We include the following simulation study to illustrate this point. Consider the model $Y = X^2 + \varepsilon$, $X = Z + \eta$, where $\varepsilon$ and $\eta$ are Gaussian r.v.'s with means zero and variances 0.01 and 0.05, respectively. The r.v. $Z$ is standard Gaussian. Then $J(x) = 0.0976 + 0.907x^2$. We generated 500 samples from this model, calculated $\hat J_n$, and then put all three graphs, $\hat J_n(x)$, $\mu(x) = x^2$, and $J(x) = 0.0976 + 0.907x^2$, into one plot in Figure 2.1. The curves with solid, dash-dot, and dotted lines are those of $\hat J_n$, $J(x)$, and $\mu(x) = x^2$, respectively.

Figure 2.1: Comparison plot

To overcome this difficulty, one way to proceed is as follows. Define $H_\theta(z) := E[m_\theta(X) \mid Z = z]$, $J_\theta(x) := E[H_\theta(Z) \mid X = x]$,
\[ \tilde T_n(\theta) = \int_C \Big[ \frac{1}{n \hat f_X(x)} \sum_{i=1}^n R_h(x, Z_i) Y_i - J_\theta(x) \Big]^2 dG(x), \tag{2.2} \]
\[ \tilde Q_n(\theta) = \int_C \Big[ \frac{1}{n \hat f_X(x)} \sum_{i=1}^n R_h(x, Z_i) [Y_i - H_\theta(Z_i)] \Big]^2 dG(x), \]
and $\tilde\theta_n = \arg\min_{\theta \in \Theta} \tilde T_n(\theta)$, $\bar\theta_n = \arg\min_{\theta \in \Theta} \tilde Q_n(\theta)$. Under some conditions, we can show that $\tilde\theta_n$, $\bar\theta_n$ are weakly consistent for $\theta_0$, and that the asymptotic null distribution of the test statistic based on the suitably standardized minimum distance $\tilde Q_n(\bar\theta_n)$ is the same as that of a degenerate U-statistic, whose asymptotic distribution in turn is the same as that of an infinite sum of weighted centered chi-square random variables. Since the kernel function in the degenerate U-statistic is complicated, the computation of the eigenvalues and the eigenfunctions is not easy, and hence this test is hard to implement in practice.

An alternative way to proceed, as we do here, is to recognize that $E(Y \mid Z) = H(Z)$ and hence to consider the new regression model $Y = H(Z) + \zeta$, where the error $\zeta$ is uncorrelated with $Z$ and has mean zero. The problem of testing for $H_0$ is now transformed to that of testing for $H(z) = H_{\theta_0}(z)$. Thus we make the following modification of the above K-N procedure to adjust for not observing the design variable.
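The constants 0.0976 and 0.907 in the simulation example above ($Y = X^2 + \varepsilon$, $X = Z + \eta$) follow from standard Gaussian conditioning. The short sketch below recomputes them. The function name and its arguments are illustrative and are not part of the original study; the only inputs are the variances of $Z$ and $\eta$ stated in the text.

```python
def quadratic_berkson_limit(var_z=1.0, var_eta=0.05):
    """Return (a, b) such that J(x) = a + b * x**2 when mu(x) = x**2.

    Assumes Z ~ N(0, var_z) and eta ~ N(0, var_eta) are independent and
    X = Z + eta, as in the simulation example of the text.
    """
    var_x = var_z + var_eta
    # Gaussian conditioning: Z | X = x ~ N(shrink * x, cond_var).
    shrink = var_z / var_x
    cond_var = var_z * var_eta / var_x
    # H(z) = E[(z + eta)^2] = z^2 + var_eta, hence
    # J(x) = E[Z^2 | X = x] + var_eta = cond_var + shrink^2 x^2 + var_eta.
    a = cond_var + var_eta
    b = shrink ** 2
    return a, b


a, b = quadratic_berkson_limit()
print(round(a, 4), round(b, 4))
```

Since $b < 1$ and $a > 0$, the limit $J$ is flatter than $\mu(x) = x^2$ and shifted upward, which is the discrepancy the comparison plot illustrates.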
Let
\[ \hat f_{Zw}(z) := n^{-1} \sum_{i=1}^n K^*_w(z - Z_i), \qquad w \sim (\log n / n)^{1/(d+4)}, \]
\[ \hat H_n(z) := \frac{\sum_{i=1}^n K_h(z - Z_i) Y_i}{n \hat f_{Zw}(z)}, \qquad z \in \mathbb{R}^d. \]
Note that $\hat H_n$ is a nonparametric estimator of the conditional expectation $H(z) = E(\mu(X) \mid Z = z)$. Define
\[ M_n^*(\theta) = \int_I \Big[ \frac{1}{n \hat f_{Zw}(z)} \sum_{i=1}^n K_h(z - Z_i) Y_i - H_\theta(z) \Big]^2 dG(z), \]
\[ M_n(\theta) = \int_I \Big[ \frac{1}{n \hat f_{Zw}(z)} \sum_{i=1}^n K_h(z - Z_i) [Y_i - H_\theta(Z_i)] \Big]^2 dG(z), \]
\[ \theta_n^* = \arg\min_{\theta \in \Theta} M_n^*(\theta), \qquad \hat\theta_n = \arg\min_{\theta \in \Theta} M_n(\theta), \]
where $G$ is a measure supported on a compact subset $I \subset \mathbb{R}^d$. We consider $M_n$ to be the right analog of the above $T_n$ for the Berkson model.

Let $\theta_0$ be the true parameter under $H_0$. This paper proves that $\theta_n^*$ converges in probability to $\theta_0$ under $H_0$. This in turn is used to prove the consistency of $\hat\theta_n$ for $\theta_0$, and the asymptotic normality of $\sqrt{n}(\hat\theta_n - \theta_0)$, under $H_0$. Additionally, we prove that the asymptotic null distribution of the normalized test statistic $n h^{d/2} \hat\Gamma_n^{-1/2} (M_n(\hat\theta_n) - \hat C_n)$, based on the minimum distance $M_n(\hat\theta_n)$, is standard normal, which, unlike the first modification (2.2), can be easily used to implement this testing procedure, at least for large samples. Here,
\[ d\hat\psi(z) := \frac{dG(z)}{\hat f_{Zw}^2(z)}, \quad z \in \mathbb{R}^d, \qquad \hat\zeta_i := Y_i - H_{\hat\theta_n}(Z_i), \quad 1 \le i \le n, \tag{2.3} \]
\[ \hat C_n := n^{-2} \sum_{i=1}^n \int K_h^2(z - Z_i)\, \hat\zeta_i^2\, d\hat\psi(z), \]
\[ \hat\Gamma_n := 2 n^{-2} h^d \sum_{i \ne j} \Big( \int K_h(z - Z_i) K_h(z - Z_j)\, \hat\zeta_i \hat\zeta_j\, d\hat\psi(z) \Big)^2 . \]
We note that there is a typo in the definition of $\hat\Gamma_n$ in K-N: there should be a factor of 2 in there also.

The paper is organized as follows. The needed assumptions are stated in the next section. Section 3 contains the proofs of consistency of $\theta_n^*$ and $\hat\theta_n$, while Sections 4 and 5 contain the proofs of the asymptotic normality of $\hat\theta_n$ and that of the proposed test statistic. The simulation results in Section 6 show little bias in the estimator $\hat\theta_n$ for all chosen sample sizes. The finite sample level approximates the nominal level well for larger sample sizes, and the empirical power is high (above 0.9) for moderate to large sample sizes against the chosen alternatives.

2.2 Assumptions

Here we shall state the needed assumptions in this paper.
Throughout the paper $\theta_0$ denotes the true parameter value under $H_0$. About the errors, the underlying design and $G$ we assume the following:

(e1) The random variables $\{(Z_i, Y_i) : Z_i \in \mathbb{R}^d,\ i = 1, 2, \cdots, n\}$ are i.i.d. with the conditional expectation $H(z) = E(Y \mid Z = z)$ satisfying $\int H^2(z)\, dG(z) < \infty$, where $G$ is a $\sigma$-finite measure on $I$.

(e2) $0 < \sigma_\varepsilon^2 < \infty$, $E m_{\theta_0}^2(X) < \infty$, and the function $\tau^2(z) = E[(m_{\theta_0}(X) - H_{\theta_0}(Z))^2 \mid Z = z]$ is a.s. ($G$) continuous on $I$.

(e3) $E|\varepsilon|^{2+\delta} < \infty$, $E|m_{\theta_0}(X) - H_{\theta_0}(Z)|^{2+\delta} < \infty$, for some $\delta > 0$.

(e4) $E|\varepsilon|^4 < \infty$, $E[m_{\theta_0}(X) - H_{\theta_0}(Z)]^4 < \infty$.

(f1) The density $f_Z$ is uniformly continuous and bounded from below on $I$.

(f2) The density $f_Z$ is twice continuously differentiable.

(g) The integrating measure $G$ has a continuous Lebesgue density $g$ on $I$.

About the kernel functions $K$ and $K^*$, we shall assume the following:

(k) The kernel functions $K$, $K^*$ are positive symmetric square integrable densities on $[-1, 1]^d$. In addition, $K^*$ satisfies a Lipschitz condition.

About the parametric family of functions to be fitted we need to assume the following:

(m1) For each $\theta$, $m_\theta(x)$ is a.s. continuous w.r.t. the Lebesgue measure.

(m2) The function $H_\theta(z)$ is identifiable w.r.t. $\theta$, i.e., if $H_{\theta_1}(z) = H_{\theta_2}(z)$ for almost all $z$ ($G$), then $\theta_1 = \theta_2$.

(m3) For some positive continuous function $\ell$ on $I$ with $E\ell(Z) < \infty$ and for some $\beta > 0$,
\[ |H_{\theta_2}(z) - H_{\theta_1}(z)| \le \|\theta_2 - \theta_1\|^\beta\, \ell(z), \qquad \forall\, \theta_1, \theta_2 \in \Theta,\ z \in I. \]

(m4) For every $z$, $H_\theta(z)$ is differentiable in $\theta$ in a neighborhood of $\theta_0$ with the vector of derivatives $\dot H_\theta(z)$, such that for every $0 < k < \infty$,
\[ \sup_{1 \le i \le n,\ \sqrt{n h_n^d}\, \|\theta - \theta_0\| \le k} \frac{|H_\theta(Z_i) - H_{\theta_0}(Z_i) - (\theta - \theta_0)' \dot H_{\theta_0}(Z_i)|}{\|\theta - \theta_0\|} = o_p(1). \]

(m5) For every $0 < k < \infty$,
\[ \sup_{1 \le i \le n,\ \sqrt{n h_n^d}\, \|\theta - \theta_0\| \le k} h_n^{-d/2}\, \|\dot H_\theta(Z_i) - \dot H_{\theta_0}(Z_i)\| = o_p(1), \qquad \forall\, n > N_\varepsilon. \]

(m6) $\Sigma_0 := \int \dot H_{\theta_0} \dot H_{\theta_0}'\, dG$ is positive definite.

About the bandwidth $h_n$ we shall make the following assumptions:

(h1) $h_n \to 0$ as $n \to \infty$.

(h2) $n h_n^{2d} \to \infty$ as $n \to \infty$.

(h3) $h_n \sim n^{-a}$, where $a < \min(1/(2d),\ 4/(d(d+4)))$.

The above conditions are similar to those imposed in K-N on the model $m_\theta$.
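As a small illustration of the bandwidth conditions (h1)-(h3), the following sketch computes the upper bound on the exponent $a$ in (h3) for a given dimension $d$ and returns the pair $(h, w)$ with $h = n^{-a}$ and $w = (\log n/n)^{1/(d+4)}$. The helper names are hypothetical, and taking the proportionality constants in $h_n \sim n^{-a}$ and $w_n \sim (\log n/n)^{1/(d+4)}$ equal to one is an assumption made only for this example.

```python
import math


def h3_exponent_bound(d):
    """Upper bound on a in (h3): a < min(1/(2d), 4/(d(d+4)))."""
    return min(1.0 / (2 * d), 4.0 / (d * (d + 4)))


def bandwidths(n, d, a):
    """Return (h, w) with h = n^(-a) and w = (log n / n)^(1/(d+4)).

    Raises ValueError when a violates (h3); a > 0 is needed for (h1).
    """
    if not 0.0 < a < h3_exponent_bound(d):
        raise ValueError("exponent a violates condition (h3)")
    h = n ** (-a)
    w = (math.log(n) / n) ** (1.0 / (d + 4))
    return h, w


# For d = 2 the bound is 1/4, so an exponent such as a = 1/4.5 is admissible.
bound_d2 = h3_exponent_bound(2)
h, w = bandwidths(n=300, d=2, a=1 / 4.5)
```

Note that with $h_n = n^{-a}$, condition (h2) reads $n^{1 - 2da} \to \infty$, which holds exactly when $a < 1/(2d)$, so (h3) already implies (h1) and (h2) for $a > 0$.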
Consider the following conditions in terms of the given model. 66 (m2’) The parametric family of models m9(;r) is identifiable w.r.t. 6, i.e., if m9l(. r): m92(.r I.) for almost all :13, then 61- — 62. (m3’) For some positive continuous function L on Rd with EL(X) < 00 and for some 1'3 > 0, |m92(;r) — m91(2:)| S ”62 — 61ll'3L(:r), V91.92 6 8,1? 6 Rd. (1114’) The function m9(1:) is differentiable in (9 in a neighborhood of 60, with the vector of differential #190 such that for every k < oo, Imam - m90($) — (9 — 90) 'm90(I)l sup ”0 _ 90“ = op(1). xEle,(/nh%||0—00||Sk (m5’) For every 0 < k < 00, —d 2 . . sup hn / Ilmgos) —m90(x)n =0p(1), Vn> N5. xERd,-(/nh%||0—90||Sk In some cases, (m2) and (m2’) are equivalent. For example, if the family of densities {f77(~ — 2); 2 E IR} is complete then this holds. Similarly, if "19(1) = 6’7(x) and f( 7(x )(f9 :6 — 2)d:r 75 0, for all 2, then also (m2) and (m2’) are equivalent. We can also Show that (m3’)-(m5’) imply (m3)-(m5), respectively. This follows because H9() .=_f m9 (1‘)fo — 2)d;1:. Thus under (n13’), |H92d and Np(a, B) denotes the p-dimensional normal distribution with mean vector (1 and 68 covariance matrix B. p 2 1. We shall also need the following notation. (16' 2 ([1,;(2) :2 2( ). 02(2) 2: Var9 (CIZ = 2) = a? + 72(2), 2 6 Rd. (2.5) f2(3) Q 0 ~ 1 Zn 2 2 C2 = ll—H90(Z,), 1 0 implies T(1/n) —> T(I/), as n —> 00. (c). T(H9(-)) = 6. uniquely for V6 6 9. Recall the notation at (2.3) and (2.5). As in K-N, for any integral J := f rdlf), the replacement. of dc; by dw( 2) is reflected by the notation j := f rdui'. We also need to 69 define, for a 6 E R9, untzfl) := 3 411,13 — ZilH6(Zilv (2.6) #n(~ 6) = 121194.. — 2.1119(2) (no.9) = % :1 Kht: — 22W, — maze) ,2 = ,1; :KW - ZillYi — H6020], Un(2) == Un(z,90) ,: anza 9) == 111n 60, in probability. 2.4 Asymptotic Distribution of 6,, In this section, we shall prove the asymptotic normality of fi(6n — 60). The first step towards this goal is to show that nhd||6n — 00))? = op(1). 
(2.10) Recall the definition of Zn, and let Dn(6) = f 2721(2, 6)d1,bh2(2). We claim that nthn(6n) = op(1). (2.11) To see this, observe that nhdM-n,(60) = nhdflrrli 2:111 Kh(z — Zi)Ci]2dz/}h2(z) :- Op(1) by (2.9) and (2.4). But, according to the definition of 6n, one has .M-n(6n) g 72 Mn(60). so nhdllln(6n) = Op(1). This fact, together with the inequality Dn(6n) S 2.6/[71((971) + 2Mn(60), proves (2.11). Next, we shall show that for any a > 0, there exists an Na such that P(Dn(6n)/||6n — 60H2 2 a + ”bilrll:1 bTEOb) > 1 — a, Vn > Na, (2.12) where 20 as in (m6). The claim (2.10) then will follow from (2.11), (2.12), (m6), and the fact. nthn<éni = nhdllén — 6012 - [Dam/116}. — 60112]. To prove (2.12), let an := in — 60, (2.13) I ' . dni 2= Hén(Zi) — H90(Z.i) — unH60(Zi)) IS 2 S n, 271“?) I: / [bl ° 72—12: Kh(2 - ZilH60(Zi)] 2dzbh2(2), b E Rq. i=1 Note that Dn(6n) /Z72l(2,6n) ~ 1/2 1/2 _._— : ——dibh(2)2D1+DQ—2D D , (2.14) nan -90||2 Hun“? 2 " n "1 "2 where D -= [[liK (2—Z-)( dm' )]ch5 (.~) n1 ' ”ll—1 h 2 “an“ ’ hQ “ ’ I —1 n . ' . "2 ' Hun“ h2 By the assumption (m4) and (2.10), one verifies that Dnl = op(1). For the term D712, note that 11,,22 inf 2,,(11). (2.15) llbll=1 73 Decompose 211(1)) 2 /[b’.1_ Z Khl’: — Zi)H90(Z2j)]2dtlr(2) Note that. EKh(2 — Z)H90(Z) = H90(2)fZ(2) + 0(1). Hence, by the Law of Large Numbers, Z,,,1(b) —+ b’ZOb, for every b E IN. Moreover, ‘ ———1|.>:,,1(b)=op(1), VbeRq. zEI wa(Z) Also, note that for any 6 > 0, and any two unit vectors b1, b2 E Rd and ||b1 —b2|I S (5, one has |$n1(b2) - 3711011)! = l/[<12—bl>'%§:Khd01(1/f§,, — 1/f§>dc —12n)<1/f§w 0, then (log,c n)_1(n/log n)2/(d+4) sgglfzwb) — fZ(2)| ——> 0 as. for any positive integer k. Proof of Lemma 2.4.1. Again this proof is similar to that of Lemma 4.1 of K-N but we include details here to see how the difference in the asymptotic variance appears. For convenience, we shall give the proof here only for the case q = 1, i.e., when [1},(2) is one dimensional. 
For multidimensional case the result can be proved by using linear combination of its components instead of [1 [1(2), and applying the same argument. Let 3,”: := f Kh(2 — Zi)C,jp,,(2)d1,/L1(2). Then fiS-n can be rewritten as J55" = 71-1/2 22:15,”. Note that 511i : 1 S i g n are i.i.d. centered random variables for each n. By the Lindeberg—Feller CLT, it suffices to Show that E8311 ——+ Z. E.s,2_111[|sn1|>n1/2A]—> 0, for VA > 0. (2.18) 76 2 711 is equal to In fact, one can show that Es ff]Kl'vlmtlaghdfz(U)/l),(u+th)/1,19%th) guflih) g(u+th) 2 dudvdt —> Z}, fZ(u +1.1h)fZ(u + th) thereby proving the first claim in (2.18). To prove the second claim, note that by the Holder inequality, E33111 [lsnll > Til/QM is bounded above by .. __. __ _ , (2+6)/2 I 2 A 5n 5/2133‘31 g A 5n 6/2E([/lKh(2—Z)uh(2) dw(2)] -|(|2+5). By assumption (e3), this upper bound is seen to be of the order 0((nh2)_6/2) = 0(1) by (h2), thereby proving the second claim in (2.18). The proof of (2.17) uses Lemma 2.4.3 and is similar to that of (4.6) of K-N, hence no details are given. Cl Proof of Lemma 2.4.2. This proof is similar to that of Lemma 4.2 in K-N with obvious modifications. Details are left out for the sake of brevity. Next, we shall show that the right hand side of (2.16) equals Rn(6n — 60), where Rn = 20 + 019(1). Recall the notation at (2.13). The right hand side of (2.16) can be written as the sum Wnl + W712, where dni Hun“ W n1 TL . ~ 1 . llunll - / me. 9n); 2 Kh - Un- Observe that n—l/Z/ EllKh(2 — Z)H90(Z)||2dy’1(2) = 0(n—1/2h—d) = 0(1). (2.19) By (2.4), (2.19) and the assumptions (m4), (m5), we can show that lanl“ = op(||un||) and W212 = 20 + 019(1). This proves Rn = 20 + 019(1). 77 Upon combining these results about the left hand side and the right hand side of (2.16), we obtain the following theorem. Theorem 2.4.1 Assume (e1)-(e.3’), (f1), (f2), (9). (k), (m1)-(m5), and (h3) hold. Then under H0, \/7—l(én —' 60) = 26.1711/2871 + 019(1). 
Consequently, $\sqrt{n}(\hat\theta_n - \theta_0) \Rightarrow N_q(0, \Sigma_0^{-1} \Sigma \Sigma_0^{-1})$, where $\Sigma$ and $\Sigma_0$ are defined in Lemma 2.4.1 and (m6), respectively.

The above theorem shows that the asymptotic variance of $\sqrt{n}(\hat\theta_n - \theta_0)$ consists of two parts. The part involving the element $\sigma_\varepsilon^2$ reflects the variation in the regression model, while the part involving the component $\tau^2$ reflects the variation in the measurement error. This is the major difference between the asymptotic distributions of the m.d. estimators discussed for the classical regression model in the K-N paper and for the Berkson model here.

2.5 Asymptotic Distribution of the Minimized Distance

This section contains a proof of the asymptotic distribution of the minimized distance $M_n(\hat\theta_n)$. Recall the notation in (2.3). The main result proved in this section is the following.

Theorem 2.5.1 Suppose (e1), (e2), (e4), (f1), (f2), (g), (k), (m1)-(m5) and (h3) hold. Then under $H_0$, $n h^{d/2} (M_n(\hat\theta_n) - \hat C_n) \to_d N_1(0, \Gamma)$. Moreover, $|\hat\Gamma_n \Gamma^{-1} - 1| = o_p(1)$.

Consequently, the test that rejects $H_0$ whenever $n h^{d/2} \hat\Gamma_n^{-1/2} |M_n(\hat\theta_n) - \hat C_n| > z_{\alpha/2}$ is of the asymptotic size $\alpha$, where $z_\alpha$ is the $100(1 - \alpha)\%$ percentile of the standard normal distribution.

Our proof of this theorem is facilitated by the following five lemmas.

Lemma 2.5.1 Suppose (e1), (e2), (e4), (f1), (g), (k), (h1) and (h2) hold. Then under $H_0$, $n h^{d/2} (\tilde M_n(\theta_0) - \tilde C_n) \to_d N_1(0, \Gamma)$.

Lemma 2.5.2 Suppose (e1), (e2), (f1), (k), (m3)-(m5), (h1) and (h2) hold. Then under $H_0$, $n h^{d/2} |M_n(\hat\theta_n) - M_n(\theta_0)| = o_p(1)$.

Lemma 2.5.3 Suppose (e1), (e2), (f1), (f2), (k), (m3)-(m5) and (h3) hold. Then under $H_0$, $n h^{d/2} |M_n(\theta_0) - \tilde M_n(\theta_0)| = o_p(1)$.

Lemma 2.5.4 Under the same conditions as in Lemma 2.5.3, $n h^{d/2} |\hat C_n - \tilde C_n| = o_p(1)$.

Lemma 2.5.5 Under the same conditions as in Lemma 2.5.2, $\hat\Gamma_n - \Gamma = o_p(1)$. Consequently, the positive definiteness of $\Gamma$ implies $|\hat\Gamma_n \Gamma^{-1} - 1| = o_p(1)$.

The proof of Lemma 2.5.1 is facilitated by Theorem 1 of Hall (1984), which is reproduced here for the sake of completeness.

Theorem 2.5.2 Let $\tilde Z_i$, $1 \le i \le n$, be i.i.d.
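random vectors, and let
\[ U_n := \sum_{1 \le i < j \le n} H_n(\tilde Z_i, \tilde Z_j), \qquad G_n(x, y) := E H_n(\tilde Z_1, x) H_n(\tilde Z_1, y), \]
where $H_n$ is a sequence of measurable functions symmetric under permutation, with
\[ E[H_n(\tilde Z_1, \tilde Z_2) \mid \tilde Z_1] = 0, \qquad E H_n^2(\tilde Z_1, \tilde Z_2) < \infty, \qquad \forall\, n \ge 1. \]
If, additionally,
\[ \frac{E G_n^2(\tilde Z_1, \tilde Z_2) + n^{-1} E H_n^4(\tilde Z_1, \tilde Z_2)}{[E H_n^2(\tilde Z_1, \tilde Z_2)]^2} \to 0, \qquad \text{as } n \to \infty, \]
then $U_n$ is asymptotically normally distributed with mean 0 and variance $n^2 E H_n^2(\tilde Z_1, \tilde Z_2)/2$.

Proof of Lemma 2.5.1. Note that $\tilde M_n(\theta_0)$ can be written as the sum of $\tilde C_n$ and $\tilde M_{n2}$, where
\[ \tilde M_{n2} := \frac{1}{n^2} \sum_{i \ne j} \int K_h(z - Z_i) K_h(z - Z_j)\, \zeta_i \zeta_j\, d\psi(z). \]
We shall prove that $n h^{d/2} \tilde M_{n2} \to_d N_1(0, \Gamma)$ with the help of Theorem 2.5.2. Let $\tilde Z_i = (Z_i, \zeta_i)$ and
\[ H_n(\tilde Z_i, \tilde Z_j) = n^{-1} h^{d/2} \int K_h(z - Z_i) K_h(z - Z_j)\, \zeta_i \zeta_j\, d\psi(z). \]
Then $n h^{d/2} \tilde M_{n2} = 2 \sum_{i < j} H_n(\tilde Z_i, \tilde Z_j)$. Observe that $H_n(\tilde Z_i, \tilde Z_j)$ is symmetric, $E[H_n(\tilde Z_1, \tilde Z_2) \mid \tilde Z_1] = 0$, and $E H_n^2(\tilde Z_1, \tilde Z_2)$ equals
\[ \frac{1}{n^2 h^d} \iint \Big[ \int K(u) K\Big(\frac{y - x}{h} + u\Big) \sigma_\zeta^2(x - uh) f_Z(x - uh)\, du \Big]^2 d\psi(x)\, d\psi(y), \]
which is finite for each $n \ge 1$. Hence, to apply Theorem 2.5.2, it remains to show that
\[ \frac{E G_n^2(\tilde Z_1, \tilde Z_2)}{[E H_n^2(\tilde Z_1, \tilde Z_2)]^2} \to 0, \qquad \frac{n^{-1} E H_n^4(\tilde Z_1, \tilde Z_2)}{[E H_n^2(\tilde Z_1, \tilde Z_2)]^2} \to 0. \tag{2.20} \]
But by a method similar to that in K-N, we can show that
\[ E G_n^2(\tilde Z_1, \tilde Z_2) = O(n^{-4} h^d), \qquad E H_n^4(\tilde Z_1, \tilde Z_2) = O(n^{-4} h^{-d}), \tag{2.21} \]
\[ E H_n^2(\tilde Z_1, \tilde Z_2) = \frac{1}{n^2 h^d} \iint \Big[ \int K(u) K\Big(\frac{y - x}{h} + u\Big) \sigma_\zeta^2(x - uh) f_Z(x - uh)\, du \Big]^2 d\psi(x)\, d\psi(y) = O(n^{-2}). \tag{2.22} \]
This verifies (2.20). By (2.22) and the continuity of $\sigma_\zeta^2(z)$ and $f_Z(z)$, we obtain that $2 n^2 E H_n^2(\tilde Z_1, \tilde Z_2)$ converges to
\[ 2 \iiiint K(u) K(w + u) K(v) K(v + w)\, (\sigma_\zeta^2(x))^2 f_Z^2(x)\, \frac{g^2(x)}{f_Z^4(x)}\, dx\, du\, dv\, dw \tag{2.23} \]
\[ = 2 \int \frac{(\sigma_\zeta^2(x))^2 g^2(x)}{f_Z^2(x)}\, dx \int \Big( \int K(u) K(w + u)\, du \Big)^2 dw. \]
This completes the proof of Lemma 2.5.1. $\Box$

Proof of Lemma 2.5.2. Recall the definitions of $U_n(z)$ and $Z_n(z, \theta)$ from (2.6). Add and subtract $H_{\theta_0}(Z_i)$ to the $i$-th summand inside the squared integrand of $M_n(\hat\theta_n)$, to obtain that
\[ M_n(\theta_0) - M_n(\hat\theta_n) = 2 \int U_n(z) Z_n(z, \hat\theta_n)\, d\hat\psi(z) - \int Z_n^2(z, \hat\theta_n)\, d\hat\psi(z) =: 2 Q_1 - Q_2. \]
It thus suffices to show that
\[ n h^{d/2} Q_1 = o_p(1), \qquad n h^{d/2} Q_2 = o_p(1). \tag{2.24} \]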
(2.24) 81 By subtracting and adding (63" — 60)’H60(Zi) to the i-th summand of the second factor in Q1, we can rewrite Q1 as the sum of Q11 and Q12, where 1 n , Q11 == /Un( )[n ZKh(Z_Zi)d'nild7#/)h2(3), Q12 3: “rt/U729) MM 2 90)d¢h2(2 2) where dm’ are as in (2.13). By (2.10), for for any 77 > 0, there exists a k < 00, N < 00, such that P(An) _>_ 1— 77 for all n > N, where An = {(nhd)1/2||én — 60” S k}. By the Cauchy-Schwarz inequality, (2.4), (2.9), and the fact / wa (1th z) 0pm (2.25) we obtain that on An, nhd/ 2lQ11l is bounded above by _d__m' IT M 0 p((nhd) V2). fnnén — Hon ' (nhdfl/2 sup IsiSn,(nhd)1/2ll9—90ll l'lglflénllHnCZz')‘H60(sz)ll -0p<1>=op<1>, by (2.4), (2.9), (2.25) and assumptions (m4) and (h2). Next. note that 62121 is the same as the expression in the left hand side of (2.16). Thus it is equal to 82 tin/3M2 9n)#n(3 ~énld1+5h2(zl = “71/an .éninn(z.90)dt7)h2(z> +11.;l/Zn(z.éyl) [11.»n,(z,é-r1,) — flln(z,90)]dzfjh2(z) 2: D1+ D2. By Cauchy-Schwarz inequality, (2.4), (2.25), assumption (m1) and the compactness of G, nhd/2ID1I S nhd/Qllén — 60||0p(1) = Op(hd/2) = 019(1) by Theorem 2.4.1 and (h2). Similarly, one can show that nhd/2ID2| S nhd/zllén — 60||0p(1) = 0p(hd/2) = 0p(1). This completes the proof of the first claim in (2.24). The proof of the second claim in (2.24) is similar. Details are left out for the sake of brevity. Cl Proof of Lemma 2.5.3. Note that nhd/2|111n(90) - anon _—_n.d/2 in z—-'.'2 1 — 1 Z h l / lngm Z’Kll (fgwe) f§lda< )l _<. Mud/‘2 - 0p((nhd)“1> . 0p<2An i=1 1 n 1 n = .7? ' 1 K212 — zaqfamadwz) + 7—1—2— 21 / Kgcz — Zi)t22An(z)d1/J(z) Z: 2: 2 n ___2_ Z /Kg(z — Zl')CitiAn(Z)d'Cl’(z)- But all the three terms on the right hand side are of the order 0p((nhd/2)—1). Thereby completing the proof of the second claim of (2.26), and hence that of the lemma. [:1 Proof of Lemma 2.5.5. Define _ , . 
2 - - Pin 3: 2n 211d 2 ([11,1(3 — Z.i)1\h(3 — Zj)CjdeU(/7)) = 2 Z H721(Z'i’Zj)’ #J' 1791' 2hd(n — l)n_1//[E11h(r — Z)11'h(y — Z)UE(Z)]2dz,/b(r)d¢i(y). Pn, We shall prove It"; — fn = 01)(1).fn —- Fn '2 01)(1), Fn — P = 0p(1). (2.27) 84 Note that Fn, can be rewritten as the sum of the following three terms: 2 B1 ;= 2,—2/.d2(/1(h(.3,_2/.z—)Ah(—ZJ-)(C —t )(cJ— tj)du(z)) , #J 2 B2 := 2n(_2hd:: (Lt—[11,, KW ‘21)“) —' HMCJ' " tlenaldt’i’Wl) ) #J' B3 ;= 4.2-2m; (faw z— Z)Kw(z-Z,-)<<.- —t.:)<<,- —t,—)dw<2)- 2 J / Kh(z)) =0p<1), #J‘ 2 "Md: W19, Z-)Kh(z — Zj)|C,-|d-¢b(z)) = 010(1). iaéJ' n-2,.) Z ( [Kc Z'lKh(Z — Z,-)d-u>(z))2 = 0pm. #J' Furthermore, we also have supA 2 =01, max t =01 zEI (1() p() 13i = K*=,—95(1—z%><1—z3>1<121|31.122131). h = 71—1/4'5, w=n—1/6(logn)1/6. The sample sizes chosen are 50, 100, 200 and 300, each repeated 1000 times. Table 2.3 lists the means and the MSE of the estimator én = (énlv 97,2), which are obtained by minimizing 1147109) and employing the Newton-Raphson algorithm. As in the case 1, one sees little bias in the estimator for all chosen sample sizes. Table 2.4 gives the empirical sizes and powers for testing Model 0 against Models 1 - 3. The entries in Table 2.4 corresponding to Model 0 are used to study the empirical size of the m.d. test, and the entries from Models 1 - 3 are used to study the empirical power of the test. From this table one sees that our m.d. test is conservative when the sam le sizes are small. while the sizes do increase with the sam le sizes and indeed P . 91 preserve the nominal size 0.05. It also shows that the m.d. test performs well for sample sizes larger than 200 at all alternatives. 
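To make the testing procedure studied in these simulations more concrete, the following is a minimal sketch of how the m.d. statistic might be computed in a one-dimensional setting. It is not the code used for the tables. It assumes $d = 1$, a linear null model $m_\theta(x) = \theta x$ (so that $H_\theta(z) = \theta z$ when $E\eta = 0$, and the minimizer of $M_n$ has a closed form), Epanechnikov kernels for both $K$ and $K^*$, $G$ equal to Lebesgue measure on $I = [-1, 1]$, integrals replaced by Riemann sums on a grid, and $h = n^{-1/3}$. The function and variable names are illustrative.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov density on [-1, 1] (assumed choice for K and K*)."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)


def md_test(Z, Y, h, w, grid):
    """Sketch of the m.d. test for the linear null H_theta(z) = theta * z, d = 1."""
    n = len(Z)
    dz = grid[1] - grid[0]
    diff = grid[:, None] - Z[None, :]                # shape (m, n)
    Kh = epanechnikov(diff / h) / h                  # K_h(z - Z_i) on the grid
    f_zw = epanechnikov(diff / w).mean(axis=1) / w   # density estimate with bandwidth w
    psi = dz / f_zw ** 2                             # Riemann weights for dG / f_zw^2
    A = Kh @ Y / n                                   # n^{-1} sum_i K_h(z - Z_i) Y_i
    B = Kh @ Z / n                                   # n^{-1} sum_i K_h(z - Z_i) Z_i
    theta = np.sum(A * B * psi) / np.sum(B * B * psi)  # minimizer of M_n (linear null)
    zeta = Y - theta * Z                             # residuals zeta_i
    W = Kh.T @ (Kh * psi[:, None])                   # W[i, j] ~ int K_h(z-Z_i) K_h(z-Z_j) dpsi
    P = W * np.outer(zeta, zeta)
    Mn = P.sum() / n ** 2                            # minimized distance M_n(theta)
    Cn = np.trace(P) / n ** 2                        # centering term C_n
    off = P - np.diag(np.diag(P))                    # terms with i != j
    Gamma = 2.0 * h * np.sum(off ** 2) / n ** 2      # variance estimate Gamma_n with d = 1
    stat = n * np.sqrt(h) * (Mn - Cn) / np.sqrt(Gamma)
    return {"theta": theta, "Mn": Mn, "Cn": Cn, "Gamma": Gamma, "stat": stat}


# Illustrative data generated under the null: Y = X + eps, X = Z + eta.
rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=n)
X = Z + rng.normal(scale=np.sqrt(0.05), size=n)
Y = X + rng.normal(scale=0.1, size=n)
h = n ** (-1.0 / 3.0)                   # assumed exponent; satisfies (h3) for d = 1
w = (np.log(n) / n) ** (1.0 / 5.0)      # (log n / n)^{1/(d+4)} with d = 1
grid = np.linspace(-1.0, 1.0, 201)
out = md_test(Z, Y, h, w, grid)
reject = abs(out["stat"]) > 1.96        # two-sided test at nominal level 0.05
```

For a nonlinear null model the closed-form minimizer would be replaced by a numerical minimization of $M_n(\theta)$, and $H_\theta(z)$ would have to be computed from $m_\theta$ and the known density $f_\eta$.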
Sample size        50       100      200      300
Mean of θ̂_n1      0.9978   0.9973   0.9974   0.9988
MSE of θ̂_n1       0.0190   0.0095   0.0053   0.0034
Mean of θ̂_n2      1.9962   1.9965   2.0013   2.0004
MSE of θ̂_n2       0.0063   0.0028   0.0014   0.0010

Table 2.3: Mean and MSE of θ̂_n, d = 2, q = 2

Sample size   50      100     200     300
Model 0       0.003   0.019   0.049   0.052
Model 1       0.158   0.843   0.979   0.996
Model 2       0.165   0.840   0.976   0.992
Model 3       0.044   0.608   0.954   0.997

Table 2.4: Levels and powers of the m.d. test, d = 2, q = 2

BIBLIOGRAPHY

[1] An, H.Z. and Cheng, B. (1991). A Kolmogorov-Smirnov type statistic with application to test for nonlinearity in time series. Int. Statist. Rev. 59, 287-307.

[2] Anderson, T.W. (1984). Estimating Linear Statistical Relationships. Ann. Statist. 12, 1-45.

[3] Beran, R.J. (1977). Minimum Hellinger distance estimates for parametric models. Ann. Statist. 5, 445-463.

[4] Berkson, J. (1950). Are there two regressions? J. Amer. Statist. Assoc. 45, 164-180.

[5] Bickel, P.J. & Ritov, Y. (1987). Efficient Estimation in the Errors in Variables Model. Ann. Statist. 15, 2, 513-540.

[6] Bosq, D. (1998). Nonparametric statistics for stochastic processes: Estimation and Prediction, 2nd edition. Springer Lecture Notes in Statistics, 110. Springer-Verlag, New York, Inc.

[7] Carroll, R.J. & Hall, P. (1988). Optimal rates of convergence for deconvoluting a density. JASA 83, 1184-1185.

[8] Carroll, R.J. & Spiegelman, C.H. (1992). Diagnostics for nonlinearity and heteroscedasticity in errors in variables regression. Technometrics 34, 186-196.

[9] Carroll, R.J., Ruppert, D. and Stefanski, L.A. (1995). Measurement Error in Nonlinear Models. Chapman & Hall/CRC, Boca Raton.

[10] Cheng, C. and Van Ness, J.W. (1999). Statistical regression with measurement error. Arnold, London.

[11] Cheng, C.L. and Kukush, A.G. (2004). A goodness-of-fit test for a polynomial errors-in-variables model, 56, 641-661.

[12] Masry, E. (1993). Strong consistency and rates for deconvolution of multivariate densities of stationary process. Stochastic Processes and their Applications 47, 53-74.

[13] Eubank, R.L. and Hart, J.D. (1992). Testing the goodness of fit in regression via order selection criteria. Ann. Statist. 20, 1412-1425.

[14] Eubank, R.L. and Hart, J.D. (1993). Commonality of CUMSUM, von Neumann and smoothing based goodness-of-fit tests. Biometrika 80, 89-98.

[15] Eubank, R.L. and Spiegelman, C.H. (1990). Testing the goodness of fit of a linear model via nonparametric regression techniques. J. Amer. Statist. Assoc. 85, 387-392.

[16] Fan, J. (1991a). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist. 19, 1257-1272.

[17] Fan, J. (1991b). Asymptotic normality for deconvolution kernel density estimators. Sankhyā Ser. A 53, 97-110.

[18] Fan, J. & Truong, K.T. (1993). Nonparametric regression with errors in variables. Ann. Statist. 21, 1900-1925.

[19] Fuller, W.A. (1987). Measurement Error Models. Wiley, New York.

[20] Gleser, L.J. (1981). Estimation in a Multivariate "Errors in Variables" Regression Model: Large Sample Results. Ann. Statist. 9, 1, 24-44.

[21] Hart, J.D. (1997). Nonparametric smoothing and lack-of-fit tests. Springer-Verlag, New York, Inc.

[22] Huwang, L. and Huang, Y.H.S. (2000). On errors-in-variables in polynomial regression - Berkson case. Statist. Sinica 10, 923-936.

[23] Koul, H.L. and Ni, P. (2004). Minimum distance regression model checking. J. Stat. Plann. Inference 119, No. 1, 109-141.

[24] Mack, Y.P. and Silverman, B.W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. Wahrsch. Gebiete 61, 405-415.

[25] Rudemo, M., Ruppert, D. and Streibig, J. (1989). Random effect models in nonlinear regression with applications to bioassay. Biometrics 45, 349-362.

[26] Stute, W. (1997). Nonparametric model checks for regression. Ann. Statist. 25, 613-641.

[27] Stefanski, L.A. and Carroll, R.J. (1991). Deconvolution-based score tests in measurement error models. The Annals of Statistics 19, 249-259.
[28] Stute, W. (1997). Nonparametric model checks for regression. Ann. Statist. 25, 613-641.

[29] Stute, W., Thies, S. and Zhu, L.X. (1998). Model checks for regression: an innovation process approach. Ann. Statist. 26, 1916-1934.

[30] Wang, L. (2003). Estimation of nonlinear Berkson-type measurement errors models. Statist. Sinica 13, 1201-1210.

[31] Wang, L. (2004). Estimation of nonlinear models with Berkson measurement errors. Ann. Statist. 32, 6, 2559-2579.

[32] Wolfowitz, J. (1953). Estimation by the minimum distance method. Ann. Inst. Statist. Math., Tokyo, 5, 9-23.

[33] Wolfowitz, J. (1954). Estimation by the minimum distance method in nonparametric stochastic difference equation. Ann. Math. Statist. 25, 203-217.

[34] Wolfowitz, J. (1957). The minimum distance method. Ann. Math. Statist. 28, 75-88.

[35] Zheng, J.X. (1996). A consistent test of functional form via estimation technique. J. Econometrics 75, 263-289.

[36] Zhu, L.X., Song, W.X. & Cui, H.J. (2003). Testing lack-of-fit for a polynomial errors-in-variables model. Acta Math. Appl. Sin. Engl. Ser. 19, 353-362.