GOODNESS-OF-FIT TESTING OF ERROR DISTRIBUTION IN NONPARAMETRIC ARCH(1) MODELS AND LINEAR MEASUREMENT ERROR MODELS

By

Xiaoqing Zhu

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics—Doctor of Philosophy

2015

ABSTRACT

GOODNESS-OF-FIT TESTING OF ERROR DISTRIBUTION IN NONPARAMETRIC ARCH(1) MODELS AND LINEAR MEASUREMENT ERROR MODELS

By

Xiaoqing Zhu

This thesis discusses the goodness-of-fit testing of an error distribution in a nonparametric autoregressive conditionally heteroscedastic model of order one and in the linear measurement error model.

For the nonparametric autoregressive conditionally heteroscedastic model of order one, the test is based on a weighted empirical distribution function of the residuals, where the residuals are obtained from a local linear fit of the autoregressive and heteroscedasticity functions, and the weights are chosen to adjust for the undesirable behavior of these nonparametric estimators in the tails of their domains. An asymptotically distribution free test is obtained via the Khmaladze martingale transformation. A simulation study is included to assess the finite sample level and power behavior of this test. It exhibits some superiority of this test over the classical Kolmogorov-Smirnov and Cramér-von Mises tests in terms of the finite sample level and power.

For the linear measurement error model, a class of test statistics is proposed, based on the integrated square difference between a deconvolution kernel density estimator of the regression model error density and a smoothed version of the null error density, an analog of the so-called Bickel-Rosenblatt test statistic. The asymptotic null distributions of the proposed test statistics are derived for both the ordinary smooth and super smooth cases. The asymptotic powers of the proposed tests against a fixed alternative and a class of local nonparametric alternatives are also described for both cases.
A finite sample simulation study shows some superiority of the proposed test compared to some other tests.

To my beloved parents, Shijun Zhu and Guangxia Lv, my brother, Yingming Zhu, and my boyfriend, Silong Zhang.

ACKNOWLEDGMENTS

Foremost, I would like to express my sincere gratitude and appreciation to my advisor, Dr. Hira L. Koul, for his continuous support of my Ph.D. study and research, and for his patient guidance, encouragement, valuable criticism, and immense knowledge. His enlightening ideas, constructive comments, numerous pieces of advice, and the countless hours he spent made it possible for me to finish this dissertation. His enthusiasm for research and his optimistic and energetic attitude will help me in my future life.

Besides my advisor, I would also like to thank Dr. Lyudmila Sakhanenko, Dr. David Todem, and Dr. Yimin Xiao for serving as members of my doctoral committee and for their invaluable suggestions.

My sincere thanks also go to Dr. Shiyuan Zhong from the Department of Geography at MSU, for offering me collaboration opportunities in her group and providing me chances to work on diverse, exciting projects. I would also like to thank Dr. Weixing Song from the Department of Statistics at Kansas State University for his interesting course on measurement error models. I would like to express my deepest appreciation to Dr. Zhiying Wen from Tsinghua University for his continuous support and invaluable encouragement all these years.

Thanks also go to the entire faculty and staff of the Department of Statistics and Probability who have taught me and helped me during my study at MSU. My special thanks go to Dr. Yimin Xiao for his interesting courses, valuable advice, and encouragement. Thanks to the Graduate School, the College of Natural Science, and the Department of Statistics and Probability, which provided me with the Dissertation Continuation Fellowship and traveling fellowships for working on my thesis and attending conferences.
This dissertation was also supported in part by the grant NSF DMS 1205271, P.I. Hira L. Koul.

I would also like to thank my academic family members: Liqian Cai, Bin Gao, Tao He, Abhishek Kaul, Lisi Pei, Dr. Xin Qi, Honglang Wang, Yuzhen Zhou, and all other students from the Department of Statistics and Probability, for the numerous discussions and all the fun we had in the last four years.

Last but not least, I would like to express my profound gratitude to my beloved parents, Shijun Zhu and Guangxia Lv, my older brother, Yingming Zhu, and my boyfriend, Silong Zhang, for their love, endless support, and encouragement all these years.

TABLE OF CONTENTS

LIST OF TABLES
KEY TO ABBREVIATIONS
Chapter 1  Introduction
Chapter 2  Nonparametric ARCH(1) Models
  2.1 Introduction
  2.2 Autoregressive and Variance Functions Estimation
  2.3 Goodness-of-fit Tests
    2.3.1 Asymptotic expansion for the weighted empirical distribution function
    2.3.2 Khmaladze martingale transformation
    2.3.3 Examples
    2.3.4 Limiting process
  2.4 Simulations
  2.5 Proofs
Chapter 3  Linear Measurement Error Models
  3.1 Introduction
  3.2 Asymptotic Null Distribution
    3.2.1 Ordinary smooth case
    3.2.2 Super smooth case
  3.3 Consistency and Asymptotic Power
  3.4 Simulations
    3.4.1 Ordinary smooth case
    3.4.2 Super smooth case
  3.5 Proofs
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1  Monte Carlo critical values of the KS and CvM tests.
Table 2.2  Empirical levels of the $U_n$ test.
Table 2.3  Empirical powers of tests based on $\hat\sigma_2^2$.
Table 3.1  Monte Carlo critical values of all the tests, ordinary smooth case.
Table 3.2  Empirical powers against chosen alternatives, ordinary smooth case.
Table 3.3  Empirical powers against mixture normal (left panel) and logistic alternatives, ordinary smooth case.
Table 3.4  Monte Carlo critical values of the $T_{KS}$, $T_{CvM}$, and $W_n$ tests, super smooth case.
Table 3.5  Empirical powers against alternative distributions, super smooth case.
Table 3.6  Empirical powers against mixture normal (left panel) and logistic distributions, super smooth case.

KEY TO ABBREVIATIONS

• ARCH(1): autoregressive conditionally heteroscedastic model of order 1.
• Gof: goodness-of-fit.
• KS: Kolmogorov-Smirnov.
• CvM: Cramér-von Mises.
• KK: Khmaladze and Koul (2009).
• MSW: Müller, Schick, and Wefelmeyer (2012).
• d.f.: distribution function.
• i.i.d.: independent and identically distributed.

Chapter 1

Introduction

One of the classical problems of statistical inference is to test whether a given random sample comes from a given continuous distribution. This is the so-called goodness-of-fit testing problem. A well known test for this problem is the Kolmogorov test based on the empirical distribution function, which is asymptotically distribution free. This is desirable because it makes the test implementable for large or moderate sample sizes. This property is lost as soon as a nuisance parameter is present in the testing problem, as happens to be the case when, for example, one is fitting a given distribution up to an unknown location parameter, or up to unknown location and scale parameters. Similarly, analogous tests based on the residual empirical process in regression or in autoregressive conditionally heteroscedastic time series models are not asymptotically distribution free for fitting a known distribution function (d.f.) to the error d.f.

One way to obtain asymptotically distribution free tests from the residual empirical process in these models is to base tests on its Khmaladze (1981) martingale transform. This has been done successfully in parametric and nonparametric regression models in Khmaladze and Koul (2004, 2009). Müller, Schick and Wefelmeyer (2012) developed an analogous transformation test, based on a certain weighted residual empirical process, for fitting a known error d.f. in nonparametric autoregressive time series models of order 1. Chapter 2 of this thesis pertains to developing and analyzing an analogously transformed process for fitting a known d.f. to the error d.f. in nonparametric autoregressive conditionally heteroscedastic time series models of order 1. The supremum test based on this transform is asymptotically distribution free.
A finite sample study shows the accuracy of the asymptotic null distribution of this test, and that its empirical power dominates that of the Kolmogorov test based on the weighted residual empirical process at all chosen alternatives, levels, and sample sizes.

Another way to obtain asymptotically distribution free tests in these problems is to assume that densities exist and use nonparametric density estimators to fit a given density. For fitting a known density in the one sample set up, Bickel and Rosenblatt (1973) were the first to investigate the asymptotic null distribution of a test based on an $L_2$-distance between a kernel type density estimator and its null expected value. The asymptotic null distribution of a suitably standardized version of this statistic was shown to be standard Gaussian. Since then, numerous papers have appeared proposing tests based on analogs of this statistic in various models having some nuisance parameters. A desirable property of this statistic is that its asymptotic null distribution is not affected by not knowing the nuisance parameters in the one sample location-scale model. Lee and Na (2002), Bachmann and Dette (2005), Horváth and Zitikis (2006) and Koul and Mimoto (2012) observed that this fact continues to hold for the analog of this statistic when fitting an error density based on residuals in parametric autoregressive and generalized autoregressive conditionally heteroscedastic time series models. A similar fact has been observed to hold by Ducharme and Lafaye de Micheaux (2004) in parametric autoregressive moving average models, by Cheng and Sun (2008) in parametric nonlinear autoregressive time series models, by Bercu and Portier (2008) for multivariate ARMAX models in adaptive tracking, and by Na (2009) for infinite-order autoregressive models.
Regression models in which covariates are not directly observable abound in real world applications, as is evidenced by the three monographs of Fuller (1987), Carroll, Ruppert and Stefanski (1995), and Cheng and Van Ness (1999). In these models one observes a surrogate for the covariates, contaminated with some error. These are known as measurement error regression models, or errors-in-variables regression models. Statistical inference in these models is highly sensitive to the knowledge of the error distributions. Knowing the regression model error distribution can help to develop efficient inference for the underlying parameters in these models. It is thus of interest to develop goodness-of-fit tests for fitting a known error density to the regression model error density in the presence of measurement error in the covariates.

Chapter 3 of this thesis pertains to developing goodness-of-fit tests for this testing problem in linear measurement error regression models. The test statistics are of the above $L_2$-distance type, based on a class of deconvolution error density estimators and the smoothed version of the null error density. Two types of tail behavior of the measurement error distribution are considered: the ordinary smooth case and the super smooth case. For each case, a comprehensive theoretical analysis of the asymptotic distributions of these statistics under the null hypothesis, under a fixed alternative, and under a sequence of local nonparametric alternatives is presented. A member of this class of tests is compared via a finite sample simulation with some other tests. It dominates several of these tests in terms of power at the chosen alternatives when the measurement error is large.

Chapter 2

Nonparametric ARCH(1) Models

2.1 Introduction

In recent years, there has been considerable focus on providing asymptotically distribution free tests for fitting a known error distribution in regression and in autoregressive and moving average models.
Boldin (1982, 1990), Koul (1991, 2002), Khmaladze and Koul (2004), Koul and Ling (2006), among others, focus on tests based on the residual empirical distribution function (d.f.) in parametric cases. Khmaladze and Koul (2009) provide martingale transform tests based on the residual empirical d.f. for nonparametric regression models, and Müller, Schick, and Wefelmeyer (2012) provide similar tests for fitting an error distribution in semiparametric partially linear regression models. The focus of the present chapter is to analyze an analog of the above tests for fitting an error distribution in nonparametric autoregressive conditionally heteroscedastic models of order 1 (ARCH(1)).

One of the main problems faced here is the construction of the nonparametric residuals so that the corresponding residual empirical d.f. obeys a uniform asymptotic linearity expansion up to the first order. Müller et al. (2009) obtained this type of result for nonparametric homoscedastic autoregressive time series models of order 1. In this chapter we extend this result to a class of ARCH(1) models.

The chapter is organized as follows. In section 2, we introduce the local linear estimators of the autoregressive and variance functions and state their uniform strong consistency. The asymptotic uniform linear expansion of a suitably standardized weighted residual empirical process based on the corresponding residuals, and the asymptotic distributions of the test based on the martingale transform of these weighted residual empirical processes, are established in section 3. Several examples of error d.f.'s where the results of this chapter are applicable are also discussed in section 3. A simulation study in section 4 shows that the finite sample power of the martingale transform test is uniformly higher than that of the Kolmogorov-Smirnov test based on a weighted residual empirical process at all chosen alternatives.
This finding is consistent with that reported in Khmaladze and Koul (2009) (KK) when dealing with nonparametric regression models. The same simulation study also shows some superiority of the proposed test over the Cramér-von Mises test based on a weighted residual empirical process in terms of the finite sample level and power at the chosen alternatives. The proofs of some technical results pertaining to the nonparametric estimators of the autoregressive and heteroscedasticity functions, and those of the asymptotic uniform linearity of the weighted residual empirical process, are deferred to the last section of this chapter, section 2.5.

One of the novelties of this chapter is the implementation of the Khmaladze martingale transform test in ARCH(1) models even when the incomplete Fisher information matrix is singular. In the location set up alone this matrix is known to be singular for the double exponential error distribution. In this chapter we note that this matrix is singular also for a class of t-distributions in the present location-scale context, unlike in the location set up, where it is nonsingular, as was noted in KK.

2.2 Autoregressive and Variance Functions Estimation

Consider the nonparametric ARCH model of order 1,
$$X_i = m(X_{i-1}) + \sigma(X_{i-1})\varepsilon_i, \quad i \in \mathbb{Z} := \{0, \pm 1, \cdots\}, \qquad (2.2.1)$$
where $\varepsilon_i$, $i \in \mathbb{Z}$, are independent copies of a standardized random variable (r.v.) $\varepsilon$, and $\varepsilon_i$ is independent of $X_{i-1}$, for all $i \in \mathbb{Z}$. Note that then
$$m(x) = E(X_i \mid X_{i-1} = x), \qquad \sigma^2(x) = E\{(X_i - m(X_{i-1}))^2 \mid X_{i-1} = x\}, \quad x \in \mathbb{R},\ i \in \mathbb{Z}.$$

Let $F$ be a known d.f. We are interested in testing the hypothesis that the d.f. of $\varepsilon$ is $F$. Any test of such a hypothesis has to be based on the estimated residuals, which in turn need suitable estimators of the nonparametric functions $m$ and $\sigma$. Several researchers have investigated numerous nonparametric estimators of $m$ and $\sigma$ in regression and autoregressive models.
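As a quick illustration (not part of the thesis's simulation study of section 2.4), a path from model (2.2.1) can be generated as follows; the particular choices of $m$ and $\sigma$ below are hypothetical and serve only to show the recursion.

```python
import math
import random

def simulate_arch1(n, m, sigma, burn_in=200, seed=12345):
    """Draw a path from X_i = m(X_{i-1}) + sigma(X_{i-1}) * eps_i (model (2.2.1)),
    with i.i.d. standard normal innovations eps_i."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for i in range(n + burn_in):
        x = m(x) + sigma(x) * rng.gauss(0.0, 1.0)
        if i >= burn_in:            # discard burn-in to approximate stationarity
            path.append(x)
    return path

# Hypothetical autoregressive and heteroscedasticity functions (illustrative only):
m_fn = lambda x: 0.3 * x
sigma_fn = lambda x: math.sqrt(0.5 + 0.2 * x * x)
X = simulate_arch1(500, m_fn, sigma_fn)
```

Under these illustrative choices the drift and volatility coefficients are small enough for the simulated path to remain stable; any test of the error d.f. would then be computed from residuals estimated from such a path.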
In order to use these estimators in the above testing problem, one needs their uniform consistency. For homoscedastic regression models with a bounded dependent variable, Ojeda (2008) established the Hölder continuity properties of the local polynomial estimators of the regression function in the one-dimensional covariate case. For heteroscedastic regression models, Neumeyer and Van Keilegom (2010) established the uniform consistency of the local polynomial estimators of the regression and variance functions in the case of multidimensional covariates. To estimate the variance function, they use estimators of the type $\hat a - \hat m^2$ (see also Yao and Tong (1994)), where $\hat a(x)$ and $\hat m(x)$ are estimators of $E(Y^2 \mid X = x)$ and $m(x)$, respectively.

For homoscedastic autoregressive models, Masry (1996) proved the uniform consistency over compact sets of multivariate local polynomial estimators of the autoregressive function, provided the time series is $\alpha$-mixing. For stationary and ergodic autoregressive time series of order 1, MSW proved the uniform consistency, over a sequence of compact intervals increasing to $\mathbb{R}$, of the local linear estimators of the autoregressive function. For the one-dimensional $\alpha$-mixing time series model, Neumeyer and Selk (2013) proved the uniform consistency, over a sequence of compact intervals increasing to $\mathbb{R}$, of the Nadaraya-Watson estimators of the autoregressive and variance functions. Fan and Yao (1998) provided the asymptotic properties of an efficient fully adaptive estimator of the variance function, i.e., the local linear estimator of $E\{(Y - \hat m(X))^2 \mid X = x\}$, in the one-dimensional $\beta$-mixing case. As an alternative to mixing conditions, Wu, Huang and Huang (2010) gave a moment contracting condition describing the dependence properties of a general autoregressive model, and established the uniform consistency, over a bounded compact set, of Nadaraya-Watson type estimators of the autoregressive function.
Based on the moment contracting condition for a one-dimensional stationary autoregressive model, Borkowski and Mielniczuk (2012) established the asymptotic distributional properties of the efficient fully adaptive local linear estimator of the variance function $E\{(Y - \hat m(X))^2 \mid X = x\}$.

To proceed further, we now define the estimators of interest here. Let $K$ and $W$ be kernel density functions and let $h_1$ and $h_2$ be bandwidths. Define
$$(\hat a_0(x), \hat b_0(x)) = \arg\min_{\alpha,\beta} \sum_{i=1}^n \big[X_i - \alpha - \beta(X_{i-1} - x)\big]^2 K\Big(\frac{X_{i-1} - x}{h_1}\Big), \quad x \in \mathbb{R}. \qquad (2.2.2)$$
Note that $\hat a_0(x)$ and $\hat b_0(x)$ are the local linear estimators of $m(x)$ and of the first derivative $\dot m(x)$ of $m(x)$, respectively. Henceforth, $\hat m(x) = \hat a_0(x)$.

To estimate $\sigma^2(x)$, we shall consider the following two methods. The first one is based on Yao and Tong (1994), where $\hat\sigma^2(x) \equiv \hat\sigma_1^2(x) = \hat a_1(x) - \hat m^2(x)$, and
$$(\hat a_1(x), \hat b_1(x)) = \arg\min_{\alpha,\beta} \sum_{i=1}^n \big[X_i^2 - \alpha - \beta(X_{i-1} - x)\big]^2 W\Big(\frac{X_{i-1} - x}{h_2}\Big). \qquad (2.2.3)$$
The second estimator is based on the work of Fan and Yao (1998), who suggested an efficient fully adaptive procedure, $\hat\sigma^2(x) \equiv \hat\sigma_2^2(x) = \hat a_2(x)$, where
$$(\hat a_2(x), \hat b_2(x)) = \arg\min_{\alpha,\beta} \sum_{i=1}^n \big[\hat r_i - \alpha - \beta(X_{i-1} - x)\big]^2 W\Big(\frac{X_{i-1} - x}{h_2}\Big). \qquad (2.2.4)$$
Here $\hat r_i = [X_i - \hat m(X_{i-1})]^2$. We shall show that both of these estimators of $\sigma^2(x)$ yield the same asymptotic result for the proposed goodness-of-fit tests under similar conditions.

Here we shall present some consistency results for these estimators. In order to do so, we need the following assumptions. In the sequel, for any twice differentiable function $g$, $\dot g$ and $\ddot g$ denote the first and second derivatives of $g$, respectively. All limits are taken as $n \to \infty$, unless specified otherwise.

Assumptions:

(E) There exists some $b > 1 + \sqrt3$ such that $E[|X_0|^{2b}] < \infty$ and $E[|\varepsilon_1|^{2b}] < \infty$.

(F) The innovations $\varepsilon_j$, $j \in \mathbb{Z}$, are i.i.d. $F$. The density $f$ of $F$ is continuously differentiable, and $\sup_{x\in\mathbb{R}} |x f(x)| < \infty$ as well as $\sup_{x\in\mathbb{R}} |x^2 \dot f(x)| < \infty$.
(H) The sequences of bandwidths satisfy $h_i = \alpha_i c_n$, $i = 1, 2$, with $\alpha_i > 0$, $c_n \to 0$, and
$$(\log n)^\eta/(n c_n^{2+\sqrt3}) \to 0, \qquad n c_n^4 (\log n)^\eta \to 0, \quad \forall\ \eta > 0. \qquad (2.2.5)$$
If $\hat\sigma_2^2(x)$ is used, $c_n$ also satisfies
$$(\log n)^\eta/(n c_n^{3.8}) \to 0, \quad \forall\ \eta > 0. \qquad (2.2.6)$$

(I) The two sequences of real numbers $a_n$, $b_n$ satisfy the following conditions: $a_n < 0 < b_n$; $-a_n$ and $b_n$ tend to infinity such that, for some $0 \le r_1 < \infty$, $(b_n - a_n) = O((\log n)^{r_1})$, and $P(X_0 \le a_n + \lambda) + P(X_0 \ge b_n - \lambda) = o((\log n)^{-1})$, for any $\lambda > 0$.

(KZ) For the $\alpha$-mixing process, the kernel density $K$ is supported on $[-1, 1]$, symmetric around 0, and three times differentiable, with all three derivatives bounded. Moreover, $K(1) = \dot K(1) = 0$. The kernel $W$ satisfies the same conditions.

(KZ$'$) For the geometric moment contracting process, the kernel density $K$ is supported on $[-1, 1]$, symmetric around 0, and three times continuously differentiable. The kernel $W$ satisfies the same conditions.

(M) The functions $m$ and $\sigma$ are four times differentiable, and there exist constants $0 < d_1 < d_2 < \infty$, $0 \le r_q, r_s < \infty$, and sequences $q_n$, $q_{n,\sigma}$ such that, for all sufficiently large $n$, $d_1 < q_n < d_2 (\log n)^{r_q}$, $d_1 < q_{n,\sigma} < d_2 (\log n)^{r_s}$, $\sup_{x \in [a_n - h_1,\, b_n + h_1]} |m^{(k)}(x)| = O(q_n)$ and $\sup_{x \in [a_n - h_2,\, b_n + h_2]} |\sigma^{(k)}(x)| = O(q_n)$, $k = 0, 1, 2, 3, 4$, and $(\inf_{x \in I_n} |\sigma(x)|)^{-1} = O(q_{n,\sigma})$, where $h_1$, $h_2$ are as in (H) above.

(X) The observations $X_j$, $j \in \mathbb{Z}$, have a common marginal density $g$, which is bounded and four times differentiable with bounded derivatives. The density is also bounded away from zero on compact intervals. There exists some $0 \le r_g < \infty$ such that $q_{n,g} = (\inf_{x \in I_n} g(x))^{-1} = O((\log n)^{r_g})$, where $I_n := [a_n, b_n]$, with $a_n$, $b_n$ as in Assumption (I).

(Z) The process $(X_j)_{j\in\mathbb{Z}}$ is $\alpha$-mixing with mixing coefficient $\alpha(n) = O(n^{-\kappa})$, for some
$$\kappa > \max\Big\{\frac{(3+\sqrt3)b + 2 + \sqrt3}{2\big[(1+\sqrt3)b - 2(2+\sqrt3)\big]},\ 7\Big\}.$$
Moreover, $\sup_{x\in\mathbb{R}} (|m(x)| + |\sigma(x)|)^{2k} g(x) < \infty$, and there exists a $j^* \ge 1$ such that
$$\sup_{x, x' \in \mathbb{R}} (|m(x)| + |\sigma(x)|)^k (|m(x')| + |\sigma(x')|)^k g_{X_0, X_{j-1}}(x, x') < \infty, \quad \forall\ j > j^* + 1,$$
for $k = 1, 2$, where $g_{U,V}$ denotes the joint density of any two r.v.'s $(U, V)$.

(Z$'$) $X_n = J(\cdots, \varepsilon_{n-1}, \varepsilon_n)$ is measurable with respect to the $\sigma$-field generated by $\cdots, \varepsilon_{n-1}, \varepsilon_n$. Also, $(X_t)_{t\in\mathbb{Z}}$ is geometric moment contracting, i.e., with $\|Y\|_p = (E|Y|^p)^{1/p}$, for $n > 0$, some $q > 1$, and $0 < r < 1$,
$$\|X_n - X_n^*\|_q = O(r^n),$$
where $X_n^* = J(\cdots, \varepsilon_{-1}, \varepsilon_0^*, \cdots, \varepsilon_{n-1}, \varepsilon_n)$ and $\varepsilon_0^*$ is an independent copy of $\varepsilon_0$.

The above assumptions (E), (F), (H), (I), (KZ), (M), (X) and (Z) are similar to the conditions in Neumeyer and Selk (2013) for mixing processes. Assumption (Z$'$) is similar to that in Borkowski and Mielniczuk (2012) for processes satisfying the moment contracting condition, and the kernel conditions (KZ$'$) are similar to those in Müller, Schick, and Wefelmeyer (2009) (MSW). The relation (2.2.6) in assumption (H) is needed only for the analysis of $\hat\sigma_2^2(x)$.

We are now ready to state a uniform consistency result for the above estimators of $m$ and $\sigma^2$. Its proof is deferred to the last section. Throughout the chapter, $I_n := [a_n, b_n]$, with $a_n$, $b_n$ as in Assumption (I).

Lemma 2.2.1 Suppose (2.2.1), (F), (H), (I), (KZ) or (KZ$'$), (X), (Z) or (Z$'$), and (M) hold. Then
$$\sup_{x\in I_n}\Big|\frac{\hat m(x) - m(x)}{\sigma(x)}\Big| = O_p\big(c_n^{-1/2} n^{-1/2} (\log n)^{1/2} Q_n\big), \qquad (2.2.7)$$
$$\sup_{x\in I_n}\Big|\frac{\hat\sigma_i(x) - \sigma(x)}{\sigma(x)}\Big| = O_p\big(c_n^{-1/2} n^{-1/2} (\log n)^{1/2} Q_n^2\big), \quad i = 1, 2, \qquad (2.2.8)$$
where $Q_n = q_n q_{n,g} q_{n,\sigma}$.

The next section describes the proposed weighted empirical d.f. $\hat F$, the Khmaladze martingale transform test based on $\hat F$, its asymptotic distribution under the null hypothesis, and the computation of the test statistic for several distributions.

2.3 Goodness-of-fit Tests

2.3.1 Asymptotic expansion for the weighted empirical distribution function

To begin with, we need to introduce the weighted residual empirical d.f.
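The residuals entering this empirical d.f. come from the local linear fits (2.2.2)-(2.2.4). As a minimal numerical sketch (not the thesis's implementation), the fit at a point and the Fan-Yao style standardized residuals can be computed as below; the quartic kernel (which satisfies $K(1) = \dot K(1) = 0$, cf. (KZ)) and the bandwidth are hypothetical choices, and the tail weighting introduced next is omitted here.

```python
def local_linear(x, xs, ys, h):
    """Local linear fit at x: minimize sum_i K((x_i - x)/h)*(y_i - a - b(x_i - x))^2,
    as in (2.2.2); returns (a_hat, b_hat)."""
    K = lambda u: (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0   # quartic kernel on [-1,1]
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        w, d = K((xi - x) / h), xi - x
        s0 += w; s1 += w * d; s2 += w * d * d
        t0 += w * yi; t1 += w * yi * d
    det = s0 * s2 - s1 * s1                  # 2x2 weighted normal equations
    if det == 0.0:
        return float("nan"), float("nan")
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det

def residuals(X, h):
    """Standardized residuals (X_j - m_hat(X_{j-1})) / sigma_hat_2(X_{j-1}), with
    sigma_hat_2^2 the fit (2.2.4) to the squared residuals r_hat_i (Fan-Yao)."""
    lagged, resp = X[:-1], X[1:]
    m_hat = [local_linear(x, lagged, resp, h)[0] for x in lagged]
    r_hat = [(y - m) ** 2 for y, m in zip(resp, m_hat)]
    # clip the variance fit away from zero; the clipping level is an ad hoc safeguard
    s2_hat = [max(local_linear(x, lagged, r_hat, h)[0], 1e-8) for x in lagged]
    return [(y - m) / s ** 0.5 for y, m, s in zip(resp, m_hat, s2_hat)]
```

On data that are exactly linear in a neighborhood of $x$, the weighted least squares fit recovers the intercept and slope exactly, which is a convenient correctness check for the solver.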
Unlike in the regression case, MSW noted that the dependence and unboundedness of the observations create some technical difficulties in autoregressive time series models, because of the poor performance of the estimator $\hat m(x)$ for large values of $x$. They used only those residuals $\hat\varepsilon_j = X_j - \hat m(X_{j-1})$ for which $X_{j-1}$ falls in the interval $I_n = [a_n, b_n]$. Analogously, we use the following weighted residual empirical process.

Fix a $\lambda > 0$. Let $\omega_n(x) \in (0, 1)$ be a sequence of functions, arbitrarily defined for $x$ in the intervals $[a_n, a_n + \lambda)$ and $(b_n - \lambda, b_n]$. In addition, assume that $\omega_n(x)$ is three times differentiable in $x$, with all three derivatives uniformly bounded, i.e., $\sup_{n\in\mathbb{N}}\sup_{x\in\mathbb{R}} |\omega_n^{(j)}(x)| < \infty$, $j = 1, 2, 3$, and satisfies
$$\omega_n(x) = \begin{cases} 1, & x \in [a_n + \lambda,\ b_n - \lambda], \\ 0, & x \notin [a_n, b_n]. \end{cases} \qquad (2.3.1)$$

Let $\omega_{nj} = \omega_n(X_{j-1})$ and $\bar\omega_j = \omega_{nj}/\sum_{i=1}^n \omega_{ni}$, $j = 1, \cdots, n$. Let $\hat\varepsilon_j := (X_j - \hat m(X_{j-1}))/\hat\sigma(X_{j-1})$, where $\hat m$, $\hat\sigma$ are as in the previous section. Then the weighted residual empirical d.f. of interest is
$$\hat F(x) = \sum_{j=1}^n \bar\omega_j\, I(\hat\varepsilon_j \le x), \quad x \in \mathbb{R}. \qquad (2.3.2)$$
We also need the empirical d.f. based on the true errors,
$$F_n(x) = \frac1n \sum_{j=1}^n I[\varepsilon_j \le x], \quad x \in \mathbb{R}.$$

For the one-dimensional homoscedastic autoregressive model, where $\hat\varepsilon_j = X_j - \hat m(X_{j-1})$, MSW established, under the null hypothesis and some conditions, that
$$\sup_{x\in\mathbb{R}}\Big|\hat F(x) - F_n(x) - f(x)\,\frac1n\sum_{j=1}^n \varepsilon_j\Big| = o_p(n^{-1/2}).$$
Neumeyer and Selk (2013) obtained an analogous result for the ARCH(1) model (2.2.1), using nonparametric residuals based on Nadaraya-Watson type estimators of the autoregressive and variance functions. Under some conditions, they proved that
$$\sup_{x\in\mathbb{R}}\Big|\hat F(x) - F_n(x) - f(x)\,\frac1n\sum_{j=1}^n \Big[\varepsilon_j + \frac{x}{2}(\varepsilon_j^2 - 1)\Big]\Big| = o_p(n^{-1/2}). \qquad (2.3.3)$$

Theorem 2.3.1 below shows that this result continues to hold when residuals are based on the local linear fitting of $m(x)$ and $\sigma^2(x)$ as defined in (2.2.2)-(2.2.4).
Theorem 2.3.1 Under the assumptions (2.2.1), (E), (F), (H), (I), (KZ) or (KZ$'$), (M), (X) and (Z) or (Z$'$), (2.3.3) continues to hold.

2.3.2 Khmaladze martingale transformation

The classical tests for the goodness-of-fit testing of an error distribution are the Kolmogorov-Smirnov (KS) and Cramér-von Mises (CvM) tests. Using the asymptotic expansion (2.3.3), we readily obtain the following

Corollary 2.3.1 Under the conditions of Theorem 2.3.1,
$$KS = n^{1/2}\sup_{x\in\mathbb{R}}|\hat F(x) - F(x)| \to_d \sup_{x\in\mathbb{R}}|R(x)|, \qquad CvM = n\int (\hat F(x) - F(x))^2\, d\hat F(x) \to_d \int R^2(x)\,dF(x),$$
where $R(x)$ is a zero-mean Gaussian process with covariance function
$$\mathrm{Cov}(R(x_1), R(x_2)) = E\Big[\Big(I(\varepsilon \le x_1) - F(x_1) + f(x_1)\big(\varepsilon + \tfrac{x_1}{2}(\varepsilon^2 - 1)\big)\Big)\Big(I(\varepsilon \le x_2) - F(x_2) + f(x_2)\big(\varepsilon + \tfrac{x_2}{2}(\varepsilon^2 - 1)\big)\Big)\Big].$$

Clearly, these limiting null distributions depend on $F$ in a complicated fashion, and to date no theoretical results about their quantiles are available, which makes these tests impractical to implement, even for large samples. Instead, we propose to use the Khmaladze martingale transformation of $\hat F$ to obtain asymptotically distribution free tests.

To proceed further, as in KK, assume $F$ has an absolutely continuous density $f$ with a.e. derivative $\dot f$. Let $\psi_f(x) = -\dot f(x)/f(x)$. We assume
$$I(f) = \int \psi_f^2(x)\,dF(x) = \int \frac{\dot f^2(x)}{f(x)}\,dx < \infty. \qquad (2.3.4)$$
Note that $E\varepsilon^2 < \infty$ and (2.3.4) imply
$$\int [x\psi_f(x) - 1]^2\,dF(x) < \infty. \qquad (2.3.5)$$
Thus (2.3.4) and (2.3.5) guarantee the finiteness of the Fisher information for the location and scale parameters.

Consider the extended score function vector $h(x) = (1, \psi_f(x), x\psi_f(x) - 1)^T$ for the location-scale family $F((y - \theta)/\sigma)$, with respect to both $\theta$ and $\sigma$, at $\theta = 0$ and $\sigma = 1$. Define the incomplete information matrix
$$\Gamma_F(x) = \int_x^\infty h(y)h^T(y)\,dF(y) = \begin{pmatrix} 1 - F(x) & f(x) & x f(x) \\ f(x) & \int_x^\infty \dot f^2(y)/f(y)\,dy & \int_x^\infty (f(y) + y\dot f(y))\dot f(y)/f(y)\,dy \\ x f(x) & \int_x^\infty (f(y) + y\dot f(y))\dot f(y)/f(y)\,dy & \int_x^\infty (f(y) + y\dot f(y))^2/f(y)\,dy \end{pmatrix}.$$

Suppose $\Gamma_F(x)$ is nonsingular for all $x \in \mathbb{R}$, and define, as in KK, for a signed measure $v$,
$$K(x, v) = \int_{-\infty}^x h^T(y)\,\Gamma_F^{-1}(y) \int_y^\infty h(z)\,dv(z)\,dF(y), \quad x \in \mathbb{R}. \qquad (2.3.6)$$
If we define the vector function
$$H(x) = \int_{-\infty}^x h\,dF = (F(x), -f(x), -x f(x))^T,$$
then, analogous to (2.4) of KK, we obtain
$$H^T(x) - K(x, H^T) = 0, \quad \forall\ x \in \mathbb{R}. \qquad (2.3.7)$$

Let $\hat v_n(x) = \sqrt n\,[\hat F(x) - F(x)]$ and $v_n(x) = \sqrt n\,[F_n(x) - F(x)]$, $x \in \mathbb{R}$. The Khmaladze martingale transformed processes $\hat U_n$ and $U_n$ are defined as
$$\hat U_n(x) = \sqrt n\,[\hat F(x) - K(x, \hat F)] = \hat v_n(x) - K(x, \hat v_n), \qquad U_n(x) = \sqrt n\,[F_n(x) - K(x, F_n)] = v_n(x) - K(x, v_n). \qquad (2.3.8)$$
Based on the asymptotic expansion (2.3.3), we can rewrite
$$\hat U_n(x) = U_n(x) + \eta_n(x), \qquad \eta_n(x) = \xi_n(x) - K(x, \xi_n),$$
$$\xi_n(x) = \hat v_n(x) - v_n(x) - f(x)\,\frac{1}{\sqrt n}\sum_{j=1}^n\Big[\varepsilon_j + \frac{x}{2}(\varepsilon_j^2 - 1)\Big], \qquad \sup_x |\xi_n(x)| = o_p(1).$$

If the matrix $\Gamma_F(x)$ is singular, then $\Gamma_F^{-1}(x)$ cannot be uniquely defined. But the above transformation is still well defined, as is evidenced in the following lemma. This lemma is an extension of Lemma 2.1 of KK, suitable for the location-scale set up. As mentioned in KK, it is an adaptation and simplification of a more general argument presented in Nikabadze (1987) and Tsigroshvili (1998).

Lemma 2.3.1 Suppose, for some $x_0$ such that $0 < F(x_0) < 1$, the matrix $\Gamma_F(x)$, for $x > x_0$, degenerates to the form
$$\Gamma_F(x) = (1 - F(x))\begin{pmatrix} 1 & 1 & x \\ 1 & 1 & x \\ x & x & x^2 + 1 \end{pmatrix}, \quad \forall\ x > x_0, \qquad (2.3.9)$$
or
$$\Gamma_F(x) = (1 - F(x))\begin{pmatrix} 1 & \frac{k}{x} & k \\[2pt] \frac{k}{x} & \frac{k(k+1)^2}{(k+2)x^2} & \frac{k^2}{x} \\[2pt] k & \frac{k^2}{x} & k^2 \end{pmatrix}, \quad \forall\ x > x_0, \text{ some } k > 0. \qquad (2.3.10)$$
Then, in both cases, the equalities (2.3.7) and, hence, (2.3.8) are still valid. Besides, for (2.3.9),
$$h^T(x)\Gamma_F^{-1}(x)\int_x^\infty h(y)\,dv_n(y) = -\,\frac{2v_n(x) - \int_x^\infty v_n(y)\,dy}{1 - F(x)}, \quad x \in \mathbb{R}; \qquad (2.3.11)$$
for (2.3.10),
$$h^T(x)\Gamma_F^{-1}(x)\int_x^\infty h(y)\,dv_n(y) = -\,\frac{(k+1)\Big[2v_n(x) + (k+2)x\displaystyle\int_x^\infty \frac{v_n(y)}{y^2}\,dy\Big]}{k\,(1 - F(x))}, \quad x \in \mathbb{R}. \qquad (2.3.12)$$
The conclusions (2.3.11) and (2.3.12) continue to hold with $v_n$ replaced by $\hat v_n$.

Proof. The proof of this lemma is similar to that of Lemma 2.1 of KK, which was proved for the location model only, where the analog of $\Gamma$ is $2\times2$. In the present set up $\Gamma$ is a $3\times3$ matrix, which creates some complexity. For the sake of self-containment and completeness, we give the details needed to deal with this situation.

When $\Gamma_F(x)$ is degenerate of the form (2.3.9), $h(x) = (1, 1, x - 1)^T$. The image of the linear operator $\Gamma_F(x)$ on $\mathbb{R}^3$ is
$$\mathcal I(\Gamma_F(x)) = \{b : b = \Gamma_F(x)\,a, \text{ for some } a \in \mathbb{R}^3\} = \{b : b = (1 - F(x))(\beta, \beta, \beta x + \gamma)^T,\ \beta, \gamma \in \mathbb{R}\},$$
and the kernel of this operator is
$$\mathcal K(\Gamma_F(x)) = \{a : \Gamma_F(x)\,a = 0\} = \{a : a = \alpha(1, -1, 0)^T,\ \alpha \in \mathbb{R}\}.$$
To prove the equalities (2.3.7), it suffices to show that for any $b \in \mathcal I(\Gamma_F(x))$ and $a \in \mathcal K(\Gamma_F(x))$,
$$h(x)^T \Gamma_F^{-1}(x)\Gamma_F(x)(b + a) = h(x)^T(b + a).$$
Note that for any such $b$ and $a$,
$$\Gamma_F(x)(b + a) = \Gamma_F(x)\,b = \big(2\beta + \beta x^2 + \gamma x,\ 2\beta + \beta x^2 + \gamma x,\ 3\beta x + \beta x^3 + \gamma x^2 + \gamma\big)^T.$$
For any $g = (\lambda, \lambda, \lambda x + \eta)^T \in \mathcal I(\Gamma_F(x))$, if $\Gamma_F(x)\,g = \Gamma_F(x)\,b$, then
$$2\lambda + \lambda x^2 + \eta x = 2\beta + \beta x^2 + \gamma x, \qquad 3\lambda x + \lambda x^3 + \eta x^2 + \eta = 3\beta x + \beta x^3 + \gamma x^2 + \gamma.$$
From these two equations we obtain $\lambda = \beta$ and $\eta = \gamma$. Hence $\Gamma_F^{-1}(x)$ is any linear operator on $\mathcal I(\Gamma_F(x))$ such that
$$\Gamma_F^{-1}(x)\Gamma_F(x)\,b = b + a_1, \quad a_1 \in \mathcal K(\Gamma_F(x)).$$
From this fact, and since $h^T a = 0$ for any $a \in \mathcal K(\Gamma_F(x))$, we obtain
$$h(x)^T \Gamma_F^{-1}(x)\Gamma_F(x)(b + a) = h(x)^T \Gamma_F^{-1}(x)\Gamma_F(x)\,b = h(x)^T(b + a_1) = h(x)^T(b + a).$$
Similarly, one proves (2.3.7) in the case where $\Gamma_F(x)$ is degenerate of the form (2.3.10). This completes the proof of (2.3.7), which in turn yields the claims (2.3.11) and (2.3.12) for $v_n$ and $\hat v_n$ in an obvious way.

Sometimes it is convenient to use the time transformation $t = F(x)$, $u_n(t) = v_n(F^{-1}(t))$, $\hat u_n(t) = \hat v_n(F^{-1}(t))$, $\gamma(t) = h(F^{-1}(t))$, and $\Gamma_t = \int_t^1 \gamma(s)\gamma(s)^T\,ds$, $0 \le t \le 1$.
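As a computational aside, the weighted residual empirical d.f. of (2.3.2) and the raw Kolmogorov-Smirnov distance appearing in Corollary 2.3.1 are straightforward to evaluate from residuals and weights; the sketch below uses a standard normal null and equal weights purely as an example.

```python
import math

def phi_cdf(x):
    """Standard normal d.f. (an example null F)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def weighted_edf_ks(resid, weights, F):
    """sup_x |F_hat(x) - F(x)| for the weighted residual empirical d.f.
    F_hat(x) = sum_j w_bar_j I(eps_hat_j <= x), cf. (2.3.2).
    The supremum is attained just before or at a jump point of F_hat."""
    total = sum(weights)
    sup, cum = 0.0, 0.0
    for e, w in sorted(zip(resid, weights)):
        Fe = F(e)
        sup = max(sup, abs(cum - Fe))   # just before the jump at e
        cum += w / total                # w_bar_j = w_j / sum_i w_i
        sup = max(sup, abs(cum - Fe))   # at the jump
    return sup

D = weighted_edf_ks([-1.0, 0.0, 1.0], [1.0, 1.0, 1.0], phi_cdf)
# the KS statistic of Corollary 2.3.1 would be sqrt(n) * D
```

With equal weights this reduces to the classical Kolmogorov distance; the tail weights $\bar\omega_j$ of (2.3.1)-(2.3.2) simply enter through the `weights` argument.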
Now consider a function-parametric version of the u- and un-processes and their transforms: for ϕ ∈ L₂[0, 1],

u(ϕ) = ∫_0^1 ϕ(s) du(s),   un(ϕ) = ∫_0^1 ϕ(s) dun(s),

K(ϕ) = K(ϕ, u) = ∫_0^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) du(s) dt,
Kn(ϕ) = K(ϕ, un) = ∫_0^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) dun(s) dt,

b(ϕ) = u(ϕ) − K(ϕ),   bn(ϕ) = un(ϕ) − Kn(ϕ).

Write b(t) and bn(t) for b(ϕ) and bn(ϕ), respectively, when ϕ(·) = I(· ≤ t). Then

b(t) = u(t) − ∫_0^t γᵀ(z) Γz^{−1} ∫_z^1 γ(s) du(s) dz,   t ∈ [0, 1],   (2.3.13)
bn(t) = un(t) − ∫_0^t γᵀ(z) Γz^{−1} ∫_z^1 γ(s) dun(s) dz,   t ∈ [0, 1].

If Φ ⊂ L₂[0, 1] is a subset of square integrable functions such that the sequence un(ϕ), n ≥ 1, is equicontinuous on Φ, uniformly in n, then un →d u in ℓ∞(Φ), where u is the standard Brownian bridge and ℓ∞(Φ) is the set of all uniformly bounded real valued functions on Φ (see van der Vaart and Wellner (1996)).

The following theorem describes the weak convergence of the process Kn(ϕ), ϕ ∈ Φ. It is an extension of Theorem 2.1 of KK, which is valid for the location model only, to the location-scale model.

Theorem 2.3.2 (i) Let L₂,ε ⊂ L₂[0, 1] be the subspace of all square integrable functions which are equal to 0 on the interval (1 − ε, 1]. Then Kn →d K on L₂,ε, for any 0 < ε < 1.

(ii) Let, for any arbitrarily small but fixed ε > 0, C < ∞, and α < 1/2, Φ_ε ⊂ L₂[0, 1] be the class of all square integrable functions satisfying the following right tail condition:

|ϕ(s)| ≤ C [γᵀ(s) Γs^{−1} γ(s)]^{−1/2} (1 − s)^{−1/2−α},   ∀ s > 1 − ε.   (2.3.14)

Then Kn →d K on Φ_ε.

The following theorem describes the weak limit of the bn process and is an extension of Theorem 2.2 of KK to the location-scale set up. Recall that, as in van der Vaart and Wellner (1996), a family of Gaussian random variables b(ϕ), ϕ ∈ Φ, Φ ⊂ L₂[0, 1], that is continuous on Φ with covariance function E b(ϕ)b(ϕ′) = ∫_0^1 ϕ(t)ϕ′(t) dt, is called Brownian motion on Φ.

Theorem 2.3.3 (i) Let Φ be a Donsker class, that is, let un →d u in ℓ∞(Φ).
Then, for every ε > 0, bn →d b in ℓ∞(Φ ∩ Φ_ε), where {b(ϕ), ϕ ∈ Φ} is standard Brownian motion.

(ii) If the envelope function Ψ(t) of (2.3.14) tends to a positive (finite or infinite) limit at t = 1, then for the process (2.3.13) we have bn →d b on [0, 1].

2.3.3 Examples

Here we shall assess the behavior of γᵀ(s)Γs^{−1}γ(s), as s → 1, for some well known distributions. This is needed to understand the behavior of the bound in (2.3.14), which in turn sheds some light on the class of functions ϕ one can use in this testing problem. Many technical details are similar to those appearing in KK, who deal with the location model only, where Γs is a 2 × 2 matrix. In the current set up we are dealing with a 3 × 3 matrix, which makes the derivations somewhat more involved.

First, let F be the standard normal d.f. Then h(x) = (1, x, x² − 1)ᵀ. With ζ ≡ ζ(x) = f(x)/(1 − F(x)), we obtain

Γ_{F(x)} = (1 − F(x)) ×
    [ 1    ζ           xζ             ]
    [ ζ    1 + xζ      (1 + x²)ζ      ]
    [ xζ   (1 + x²)ζ   2 + (x + x³)ζ  ],

Γ_{F(x)}^{−1} = {(1 − F(x)) D(x)}^{−1} ×
    [ 2 − ζ² + 3xζ − x²ζ² + x³ζ    −2ζ                      ζ² − xζ         ]
    [ −2ζ                          2 + xζ − x²ζ² + x³ζ      −ζ + xζ² − x²ζ  ]
    [ ζ² − xζ                      −ζ + xζ² − x²ζ           1 − ζ² + xζ     ],

where D(x) = 2 − 3ζ² + 3xζ + xζ³ − 2x²ζ² + x³ζ, and

hᵀ(x) Γ_{F(x)}^{−1} h(x) = (1 − F(x))^{−1} (3 − 4ζ² + 4xζ + x²ζ² + x⁴ − 2x³ζ) / (2 − 3ζ² + 3xζ + xζ³ − 2x²ζ² + x³ζ).

Using the asymptotic expansion for the tail of the normal d.f., we obtain, as in KK,

ζ(x) = x / (1 − S(x)),   where S(x) = Σ_{i≥1} (−1)^{i−1} (2i − 1)!! / x^{2i} = 1/x² − 3/x⁴ + 15/x⁶ − ··· .

From this one can derive that

(3 − 4ζ² + 4xζ + x²ζ² + x⁴ − 2x³ζ) / (2 − 3ζ² + 3xζ + xζ³ − 2x²ζ² + x³ζ) → 9/5,   x → ∞,

and hence hᵀ(x) Γ_{F(x)}^{−1} h(x) ∼ 9(1 − F(x))^{−1}/5, x → ∞, or equivalently,

γᵀ(s) Γs^{−1} γ(s) ∼ 9(1 − s)^{−1}/5,   s → 1.

This result is similar to the one obtained in KK for the location model only, where 9/5 is replaced by 2.

Next, consider the logistic d.f. F(x) with scale parameter 1, or equivalently ψf(x) = 2F(x) − 1.
Then h(x) = (1, 2F(x) − 1, x(2F(x) − 1) − 1)ᵀ, or, in terms of s = F(x), γ(s) = h(F^{−1}(s)) = (1, 2s − 1, F^{−1}(s)(2s − 1) − 1)ᵀ. When s is close to 1, Γs ∼ (1 − s) M(s), where, with x = F^{−1}(s), the matrix M(s) has first row and column (1, s, xs), entry M₂₂(s) = (1 − 2s + 4s²)/3, and the remaining entries are similar, if lengthier, exact expressions. From this expansion one can verify that γᵀ(s)Γs^{−1}γ(s) ∼ (1 − s)^{−1}, as s → 1. This result differs from the one reported in KK, where the analogous γ and Γs satisfy γᵀ(s)Γs^{−1}γ(s) = 4(1 − s)^{−1}, for all 0 ≤ s < 1.

Next, consider the double exponential d.f. with density f(x) = e^{−|x|}/2. For x > 0 we get h(x) = (1, 1, x − 1)ᵀ, and Γ_{F(x)} is degenerate and equal to (2.3.9). An argument similar to the proof of Lemma 2.3.1 yields hᵀ(x) Γ_{F(x)}^{−1} h(x) = 2(1 − F(x))^{−1}, for all x > 0 with F(x) < 1.

Finally, consider the Student t_k distribution with k degrees of freedom. In this case,

f(x) = (1/√(πk)) (Γ((k+1)/2)/Γ(k/2)) (1 + x²/k)^{−(k+1)/2}.

As shown in KK, using the results of Soms (1976), for every k ≥ 1,

1 − F(x) ∼ d_k / x^k,   f(x) ∼ k d_k / x^{k+1},   d_k = (1/√π) (Γ((k+1)/2)/Γ(k/2)) k^{k/2},

ψf(x) = ((k+1)/k) x / (1 + x²/k) ∼ (k+1)/x,   x → ∞.

Hence h(x) = (1, ψf(x), x ψf(x) − 1)ᵀ ∼ (1, (k+1)/x, k)ᵀ, and Γ_{F(x)} degenerates and has the form (2.3.10). This is unlike the location model case, where KK observed that the analog of Γ_{F(x)} is non-degenerate. Nevertheless, one still has the same right tail behavior for the quadratic form γ(s)ᵀΓs^{−1}γ(s) as in the location model case, viz.,

γ(s)ᵀ Γs^{−1} γ(s) ∼ {2(k + 1)/k} (1 − s)^{−1},   s → 1.

2.3.4 Limiting process

In this section we discuss the weak convergence of the Ûn process. Towards this goal we assume the same tail condition for v̂n as in KK, which is that for some 0 < β < 1/2,

sup_{y > x} |v̂n(y)| / (1 − F(y))^β = op(1),   as x → ∞,   (2.3.15)

uniformly in n.
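Returning briefly to the normal example above: the tail expansion ζ(x) = x/(1 − S(x)) is easy to check numerically. The sketch below (my illustration, hypothetical helper names) compares the exact normal hazard ζ(x) = f(x)/(1 − F(x)) with the three-term truncation S(x) = 1/x² − 3/x⁴ + 15/x⁶.

```python
from scipy.stats import norm

def zeta(x):
    # exact standard normal hazard f(x) / (1 - F(x)); norm.sf is the survival function
    return norm.pdf(x) / norm.sf(x)

def zeta_approx(x):
    # truncated tail expansion S(x) = 1/x^2 - 3/x^4 + 15/x^6
    S = 1.0 / x**2 - 3.0 / x**4 + 15.0 / x**6
    return x / (1.0 - S)
```

Already at x = 5 the two values agree to within about 10⁻³, and the agreement improves rapidly as x grows, which is why the expansion suffices for the limit computations above.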
To simplify the notation, we let ψ₁(x) = −ḟ(x)/f(x), ψ₂(x) = −x ḟ(x)/f(x) − 1, and denote the right tail conditional moments of ψ₁ and ψ₂ by

E_x ψᵢ = E[ψᵢ(e₁) | e₁ > x],   ψᵢ₀ = ψᵢ − E_x ψᵢ,   Var_x(ψᵢ) = Var[ψᵢ(e₁) | e₁ > x],   i = 1, 2,
Cov_x(ψ₁, ψ₂) = Cov[ψ₁(e₁), ψ₂(e₁) | e₁ > x].

Now we formulate three more conditions on F:

(a) For any ε > 0, the functions ψᵢ(F^{−1}), i = 1, 2, are monotone on [1 − ε, 1].

(b) For some δ > 0, ε > 0 and some C < ∞, and for all x such that F(x) > 1 − ε,

(1 − F(x)) |hᵀ(x) Γ_{F(x)}^{−1} (0, ψ₁₀(x), 0)ᵀ| = |ψ₁₀² Var_x(ψ₂) − ψ₁₀ψ₂₀ Cov_x(ψ₁, ψ₂)| / [Var_x(ψ₁)Var_x(ψ₂) − Cov_x²(ψ₁, ψ₂)] ≤ C (1 − F(x))^{−2δ},

(1 − F(x)) |hᵀ(x) Γ_{F(x)}^{−1} (0, 0, ψ₂₀(x))ᵀ| = |ψ₂₀² Var_x(ψ₁) − ψ₁₀ψ₂₀ Cov_x(ψ₁, ψ₂)| / [Var_x(ψ₁)Var_x(ψ₂) − Cov_x²(ψ₁, ψ₂)] ≤ C (1 − F(x))^{−2δ}.

Note that in terms of the above notation, with t = F(x),

γᵀ(t) Γt^{−1} γ(t) = (1 − F(x))^{−1} × { 1 + [ψ₁₀² Var_x(ψ₂) + ψ₂₀² Var_x(ψ₁) − 2ψ₁₀ψ₂₀ Cov_x(ψ₁, ψ₂)] / [Var_x(ψ₁)Var_x(ψ₂) − Cov_x²(ψ₁, ψ₂)] }.

Hence condition (b) implies

γᵀ(t) Γt^{−1} γ(t) ≤ C (1 − t)^{−1−2δ},   ∀ t > 1 − ε.   (2.3.16)

(c) For some 0 < C < ∞ and β > 0 as in (2.3.15),

|∫_x^∞ [1 − F(y)]^β dψᵢ(y)| ≤ C |ψᵢ₀(x)|,   i = 1, 2.

Remark 2.3.1 As mentioned in KK, (2.3.15) also holds for vn for any 0 < β < 1/2. Conditions (a), (b) and (c) are easy to check for all the examples in Section 2.3.3, even with δ = 0 in condition (b), by following similar procedures, so we omit the details here.

Now we consider the asymptotic behavior of K(ϕ, ξn), which is

K(ϕ, ξn) = ∫_0^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) ξn(F^{−1}(ds)) dt,

for a given indexing class Φ of functions from L₂[0, 1]. Let Φ ∘ F = {ϕ(F(·)), ϕ ∈ Φ}. We can prove a result on the limiting process for Ûn similar to Theorem 4.1 in KK.

Theorem 2.3.4 (i) Suppose conditions (2.3.15) and (a)-(c) are satisfied with β > δ. Then, on the class Φ_ε as in Theorem 2.3.2, with α < β − δ, we have

sup_{ϕ ∈ Φ_ε} |K(ϕ, ξn)| = op(1),   n → ∞.
Therefore, if Φ is a Donsker class, then, for every ε > 0, Ûn →d b in ℓ∞(Φ ∩ Φ_ε ∘ F), where {b(ϕ), ϕ ∈ Φ} is standard Brownian motion.

(ii) If, in addition, δ < α, then for the time transformed process Ûn(F^{−1}(·)) of (2.3.8), Ûn(F^{−1}(·)) →d b(·) in D[0, 1].

Proof. The proof below is similar to that of Theorem 4.1 in KK. Note that

γᵀ(t) Γt^{−1} (0, a₁, 0)ᵀ = [ψ₁₀ Var_x(ψ₂) − ψ₂₀ Cov_x(ψ₁, ψ₂)] a₁ / {(1 − F(x)) [Var_x(ψ₁)Var_x(ψ₂) − Cov_x²(ψ₁, ψ₂)]},
γᵀ(t) Γt^{−1} (0, 0, a₂)ᵀ = [ψ₂₀ Var_x(ψ₁) − ψ₁₀ Cov_x(ψ₁, ψ₂)] a₂ / {(1 − F(x)) [Var_x(ψ₁)Var_x(ψ₂) − Cov_x²(ψ₁, ψ₂)]}.

These equalities, used with aᵢ = ∫_t^1 (1 − s)^β dψᵢ(F^{−1}(s)), i = 1, 2, combined with conditions (b) and (c), yield

|γᵀ(t) Γt^{−1} (0, a₁, a₂)ᵀ| ≤ C (1 − t)^{−1−2δ},   ∀ 1 − ε < t < 1.   (2.3.17)

Now we prove the first claim.

(i) Denote ξn(t) = ξn(x) with t = F(x). Because of the singularities at t = 0 and t = 1 in both integrals in K(ϕ, ξn), we will isolate the neighborhood of t = 1; the neighborhood of t = 0 can be treated more easily. First assume Γt > 0 for all t < 1. Then

∫_0^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) ξn(ds) dt
    = ∫_0^{1−ε} ϕ(t) γᵀ(t) Γt^{−1} ∫_t^{1−ε} γ(s) ξn(ds) dt
    + ∫_0^{1−ε} ϕ(t) γᵀ(t) Γt^{−1} ∫_{1−ε}^1 γ(s) ξn(ds) dt
    + ∫_{1−ε}^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) ξn(ds) dt.

We shall show that each of these three terms is op(1). First consider the third summand on the right-hand side. By definition,

ξn(t) = ûn(t) − un(t) − f(F^{−1}(t)) n^{−1/2} Σ_{i=1}^n [εᵢ + (F^{−1}(t)/2)(εᵢ² − 1)].

The third summand is then the sum of two terms, one corresponding to the difference ûn − un and the other corresponding to the remaining term. Now, since df(F^{−1}(s)) = ψf(x) f(x) dx and d[F^{−1}(s) f(F^{−1}(s))] = [1 + x ψf(x)] f(x) dx, F(x) = s, the quantity

∫_{1−ε}^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) [df(F^{−1}(s)) + d(F^{−1}(s) f(F^{−1}(s)))] dt

is the sum of the second and third coordinates of ∫_{1−ε}^1 ϕ(t) γ(t) dt, and is small for small ε anyway.
Assumption (a) guarantees the monotonicity of ψf (F −1 ) and dF −1 (s)f (F −1 (s)), so the integration by parts is justified, and we obtain 1 1−ε ϕ(t)γ T (t)Γ−1 t 1 = 1−ε 1 γ(t)ˆ un (ds)dt t ϕ(t)γ T (t)Γ−1 − γ(t)ˆ un (t) − t 1 uˆn (s)dγ(s) dt. t Using assumption (2.3.14) on ϕ and (2.3.16), we obtain 1 1−ε ϕ(t)γ T (t)Γ−1 un (t)dt t γ(t)ˆ 1 ≤ C 1−ε 1 ≤ C 1−ε which is small for small 1/2 [γ T (t)Γ−1 t γ(t)] 1 (1 − t)1+α+δ−β 1 (1 − t)1/2+α−β |ˆ un (t)| β t>1−ε (1 − t) dt sup |ˆ un (t)| , β t>1−ε (1 − t) dt sup as soon as α < β − δ. Note that t1 uˆn (s)dΓ(s) = T 0, t1 uˆn (s)dψf (F −1 (s)), t1 uˆn (s)d(F −1 (s)ψf (F −1 (s)) . Using monotonicity of ψf (F −1 (s)) and F −1 (s)ψf (F −1 (s)) for small enough ε, we obtain, 27 for all t > 1 − ε, 1 t 1 t uˆn (s)dψf (F −1 (s)) < C uˆn (s)d(F −1 (s)ψf (F −1 (s))) < C 1 t |ˆ un (t)| ; β t>1−ε (1 − t) (1 − s)β dψf (F −1 (s)) sup 1 t (2.3.18) |ˆ un (t)| . β t>1−ε (1 − t) (1 − s)β d(F −1 (s)ψf (F −1 (s))) sup Therefore, using (2.3.17), for the double integral 1 1−ε ϕ(t)γ T (t)Γ−1 t 1 1 uˆn (s)dγ(s)dt ≤ C t |ˆ un (t)| , β t>1−ε (1 − t) (1 − t)−1−2δ dt sup 1−ε which is small as soon as α < β − δ. The same conclusion is true for uˆn replaced by un . Since (2.3.18) implies the smallness of 1 and uˆn (s)d(F −1 (s)ψf (F −1 (s))) and 1−ε 1 1−ε 1 uˆn (s)dψf (F −1 (s)) 1−ε 1 1−ε un (s)dψf (F −1 (s)); un (s)d(F −1 (s)ψf (F −1 (s))), to prove that the middle summand on the right-hand side is small one needs only finiteness of ψ1 (x), ψ2 (x) in each x with 0 < F (x) < 1, which follows from (a). This and uniform in x smallness of ξn proves smallness of the first summand as well. The smallness of integrals ε 0 ϕ(t)γ T (t)Γ−1 t γ(t) 1 t γ(s)ξn (ds)dt, −1 follows from Γ−1 t ∼ Γ0 for small t, and square integrability of ϕ and Γ. 28 If Γt is degenerate of the form (2.3.9) for any t > t0 , we get γ T (t)Γ−1 t 2ξn (t) − t1 ξn (t)dt . 
γᵀ(t) Γt^{−1} ∫_t^1 γ(s) ξn(ds) = − [2 ξn(t) − ∫_t^1 ξn(s) ds] / (1 − t).

If Γt is degenerate of the type (2.3.10) for any t > t₀, we get

γᵀ(t) Γt^{−1} ∫_t^1 γ(s) ξn(ds) = − ((k+1)/k) [2 ξn(t) + (k+2) F^{−1}(t) ∫_t^1 ξn(s)/F^{−1}(s)² ds] / (1 − t).

The smallness of all tail integrals then easily follows from the tail condition (2.3.15) for our choice of the indexing functions ϕ.

(ii) Since for δ < α the envelope function Ψ(t) of (2.3.14) satisfies the inequality Ψ(t) ≥ (1 − t)^{δ−α}, it has a positive, finite or infinite, lower limit at t = 1. We can choose an indexing class of indicator functions ϕ(t) = I[τ ≤ t], and the claim follows.

2.4 Simulations

In this section we report the findings of a simulation study. To examine the performance of the proposed test, we consider the following autoregressive and conditional variance functions:

m(x) = (1/2 + x²/2)^{1/2} − 1/2,   σ²(x) = 3/4 + x²/4,   x ∈ R.

Under the null hypothesis, F is the d.f. of a standardized normal r.v.; then, as in Section 2.3.3, h(x) = (1, x, x² − 1)ᵀ, and Γ_{F(y)}^{−1} is as in (2.3.15). The interval In := [−log(n), log(n)]. For the purpose of computation, we use the following representation:

Un(x) = n^{−1/2} Σ_{i=1}^n ω̄ᵢ [I(êᵢ ≤ x) − h(êᵢ)ᵀ G(x ∧ êᵢ)],   x ∈ R,

where

G(x) = ∫_{y ≤ x} Γ_{F(y)}^{−1} h(y) dF(y),   ε̂ᵢ := (Xᵢ − m̂(Xᵢ₋₁))/σ̂(Xᵢ₋₁),   êᵢ := ε̂ᵢ I(−log n ≤ Xᵢ₋₁ ≤ log n).

Let ê₍ⱼ₎, 1 ≤ j ≤ n, denote the ordered residuals êᵢ, 1 ≤ i ≤ n. Then

Un := sup_{x∈R} |Un(x)| = max{ max_{1≤j≤n} |Un(ê₍ⱼ₎)|, sup_{x < ê₍₁₎} |Un(x)| }.

The asymptotic critical values of the Un-test are the critical values of the distribution of sup_{0≤t≤1} |b(t)|. From Khmaladze and Koul (2004), these critical values at the levels 5%, 2.5% and 1% are 2.24241, 2.49771 and 2.80705, respectively.
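These critical values are quantiles of sup_{0≤t≤1} |b(t)| for standard Brownian motion, and can be reproduced, up to discretization and Monte Carlo error, by simulating scaled random walks. This is my sketch only; the exact constants come from Khmaladze and Koul (2004), not from this code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 5000, 1000
dt = 1.0 / n_steps

# simulate Brownian paths on [0, 1] and record sup_t |B(t)| for each path
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
sup_abs = np.abs(np.cumsum(increments, axis=1)).max(axis=1)

# Monte Carlo critical values; compare with 2.24241, 2.49771, 2.80705
crit = {lvl: np.quantile(sup_abs, 1.0 - lvl) for lvl in (0.05, 0.025, 0.01)}
```

Since the discrete-time maximum misses excursions between grid points, the simulated quantiles sit slightly below the exact constants; refining the grid shrinks the gap.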
To compare the effect of the two estimators σ̂₁²(x) and σ̂₂²(x) of σ²(x), given at (2.2.2) and (2.2.4), on the finite sample behavior of the test, we first compared the type I errors for different sample sizes, obtained by computing the number of times Un exceeded the given asymptotic critical value, divided by the number of repetitions, based on the sample sizes n = 300, 500, each repeated 1000 times. The results are displayed in Table 2.2. One sees that σ̂₂² is more effective than σ̂₁² in preserving the nominal level of this test.

We then used the adaptive estimator σ̂₂² to examine the finite sample power of the proposed Khmaladze martingale transform test Un. The alternatives chosen are mixtures of the standard normal distribution and the standardized t-distribution with 4 degrees of freedom, i.e., (1 − p)N(0, 1) + p t₄/√2, for p ∈ [0, 1].

We compared the Un test with the two classical tests, the KS and CvM tests. The critical values for these two tests were obtained by the Monte Carlo method, with n = 500 and 1000 repetitions for each test. The critical values thus obtained are given in Table 2.1.

Table 2.1: Monte Carlo critical values of the KS and CvM tests.

    Level    KS        CvM
    0.01     1.03159   0.21080
    0.025    0.93812   0.17630
    0.05     0.86067   0.15036

The empirical powers, i.e., the relative rejection frequencies under the chosen alternatives, for all three tests based on the sample sizes n = 300 and n = 500, with 1000 repetitions and levels 5%, 2.5% and 1%, are displayed in Table 2.3. As in KK, the martingale transform test Un again has larger empirical power than the KS test, uniformly at all chosen levels and for all values of p. Its empirical powers are also higher than those of the CvM test, at all chosen levels and for all values of p, except for p = .8 and p = 1. In this simulation study the time series Xᵢ was generated as follows.
For each simulation, 900 + n observations of Xᵢ were generated, and only the last n observations were used in the test, to ensure stationarity. The local linear estimators m̂ and σ̂² were calculated using the biweight kernel function K(x) ≡ W(x) ≡ 15(1 − x²)² I(|x| ≤ 1)/16. Both bandwidths were chosen by a rule of thumb as

h₁ = h₂ = 1.06 · min(sd(ê), IQR(ê)/1.34) · n^{−2/(6+1.9)},

where ê is the vector of all residuals with Xᵢ₋₁ ∈ In = [−log n, log n], i = 1, ..., n, and IQR denotes the interquartile range.

Let s = (log n − |x|)/0.1, x ∈ R. The weight function used was

wn(x) = 0,                          x ∉ [−log n, log n];
      = 1,                          x ∈ [−log n + 0.1, log n − 0.1];
      = −20s⁷ + 70s⁶ − 84s⁵ + 35s⁴, otherwise.

Table 2.2: Empirical levels of the Un test.

                    σ̂₁²                          σ̂₂²
    Level    n=300   n=500   n=600      n=300   n=500   n=600
    0.05     0.014   0.021   0.031      0.031   0.047   0.051
    0.025    0.005   0.010   0.014      0.009   0.017   0.027
    0.01     0.005   0.006   0.006      0.004   0.008   0.010

Table 2.3: Empirical powers of tests based on σ̂₂².

                         n = 300                     n = 500
    p      Level    Un      KS      CvM        Un      KS      CvM
    0      0.050    0.030   0.053   0.049      0.049   0.049   0.052
           0.025    0.018   0.022   0.018      0.020   0.021   0.029
           0.010    0.007   0.006   0.007      0.007   0.014   0.013
    0.2    0.050    0.073   0.038   0.041      0.134   0.046   0.052
           0.025    0.057   0.015   0.024      0.118   0.025   0.025
           0.010    0.045   0.006   0.012      0.106   0.013   0.014
    0.4    0.050    0.148   0.071   0.099      0.303   0.089   0.169
           0.025    0.129   0.049   0.066      0.263   0.052   0.117
           0.010    0.110   0.024   0.037      0.229   0.030   0.075
    0.6    0.050    0.261   0.109   0.182      0.494   0.241   0.411
           0.025    0.223   0.066   0.131      0.445   0.172   0.336
           0.010    0.188   0.032   0.076      0.398   0.101   0.253
    0.8    0.050    0.404   0.216   0.408      0.673   0.422   0.716
           0.025    0.342   0.141   0.311      0.612   0.326   0.627
           0.010    0.300   0.087   0.217      0.563   0.209   0.516
    1      0.050    0.556   0.331   0.575      0.812   0.587   0.873
           0.025    0.499   0.235   0.478      0.760   0.447   0.816
           0.010    0.437   0.153   0.368      0.710   0.356   0.738

2.5 Proofs

In this section we give the proof of Theorem 2.3.1. To this end, we first list some useful lemmas.
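Before turning to the proofs, a small computational aside on the weight function of Section 2.4 (hypothetical helper names, not from the dissertation). The boundary polynomial, written as 35s⁴ − 84s⁵ + 70s⁶ − 20s⁷, is the degree-7 "smoothstep" interpolant: it rises from 0 at s = 0 to 1 at s = 1 with three derivatives vanishing at both endpoints, so wn is smooth across the two cut-offs.

```python
import numpy as np

def smooth_ramp(s):
    # 35 s^4 - 84 s^5 + 70 s^6 - 20 s^7, evaluated in Horner form;
    # equals 0 at s = 0 and 1 at s = 1, with flat (C^3) junctions
    return (((-20.0 * s + 70.0) * s - 84.0) * s + 35.0) * s**4

def w_n(x, n):
    # weight: 0 outside [-log n, log n], 1 on [-log n + 0.1, log n - 0.1],
    # and the smooth ramp with s = (log n - |x|)/0.1 in between
    L = np.log(n)
    ax = np.abs(np.asarray(x, dtype=float))
    s = (L - ax) / 0.1
    return np.where(ax > L, 0.0, np.where(ax <= L - 0.1, 1.0, smooth_ramp(s)))
```

By symmetry of the ramp, wn equals exactly 1/2 halfway through each 0.1-wide boundary band.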
For α-mixing processes we can follow the same proofs as in Selk and Neumeyer (2013), and for moment contracting stationary processes the proofs are similar to those of Wu et al. (2010). Accordingly, many details in the proofs that follow will be brief.

Let t₁, t₂, ... be measurable functions which are bounded by the same constant B. Let

Tn(x) = (1/(n h₁)) Σ_{j=1}^n tn(Xⱼ) K((Xⱼ − x)/h₁),   x ∈ R.   (2.5.1)

We have

Lemma 2.5.1 Under the conditions of Theorem 2.3.1,

sup_{x ∈ In} |Tn(x) − E(Tn(x))| = Op((log n/(n cn))^{1/2}).

Proof. (i) Under condition (Z) for α-mixing processes, the proof is similar to that of Lemma B.1 in Selk and Neumeyer (2013), with k = 0 in their proof. (ii) For moment contracting processes, since t₁, t₂, ... are bounded on In, the claim follows from Proposition 2 and Lemma 4 of Wu et al. (2010).

Next, consider

Un,l(x) = (1/(n h₁)) Σ_{j=1}^n εⱼ σ(Xⱼ₋₁) K^{(l)}((Xⱼ₋₁ − x)/h₁),   x ∈ In, l = 0, 1, 2,   (2.5.2)

where K^{(l)} is the l-th derivative of K. We have

Lemma 2.5.2 Under the conditions of Theorem 2.3.1,

sup_{x ∈ In, l = 0, 1, 2} |Un,l(x)| = Op(n^{−1/2} (log n)^{1/2} cn^{−1/2−l} + cn² qn).

Proof. (i) Under condition (Z) for α-mixing processes, this follows from Lemmas B.1 and B.2 of Selk and Neumeyer (2013), applied with k = 1. (ii) Under condition (Z′) for moment contracting processes, it follows, because of stationarity, from Lemma 4 of Müller et al. (2009).

Proof of Lemma 2.2.1. The general idea of the proof of this lemma and of Theorem 2.3.1 is similar to that of Theorem 1 in Müller et al. (2009), so we use similar notation as in their paper and shall be brief whenever possible. Let Kᵢ(u) = uⁱ K(u), i ≥ 0, and

p̂ᵢ(x) = (1/(n h₁)) Σ_{j=1}^n Kᵢ((Xⱼ₋₁ − x)/h₁),   q̂ᵢ(x) = (1/(n h₁)) Σ_{j=1}^n Xⱼ Kᵢ((Xⱼ₋₁ − x)/h₁),   x ∈ R.

On the event p̂₂(x)p̂₀(x) − p̂₁²(x) > 0,

m̂(x) = (p̂₂(x) q̂₀(x) − p̂₁(x) q̂₁(x)) / (p̂₂(x) p̂₀(x) − p̂₁²(x)).

Assumptions (F), (H), (K) and Lemma 2.5.1 imply

sup_{x ∈ In} |p̂ᵢ(x) − E[p̂ᵢ(x)]| = Op(h₁),   i = 0, 1, 2, ... .
(2.5.3) x∈In Let p¯i (x) = E[ˆ pi (x)] and λi = Ki (u)du = ui K(u)du. Note that p¯i (x) = g(x − h1 u)ui K(u)du, and λ0 = 1, λ1 = 0, λ2 > 0. By (2.5.3), pˆi /g − λi In + p¯i /g − λi In = Op (h1 ), i = 0, 1, 2, · · · . Hence pˆ2 (x)ˆ p0 (x) − pˆ21 (x) − λ2 g 2 In = Op (h1 ). 34 (2.5.4) With (inf x∈In g(x))−1 = qn,g in assumption (X), there exists an η > 0 such that 2 inf |ˆ p2 (x)ˆ p0 (x) − pˆ21 (x)| > η → 1. P qn,g x∈In (2.5.5) Write qˆi = Ai + Bi , for i = 0, 1, where 1 Ai (x) = nh1 Bi (x) = 1 nh1 n σ(Xj−1 )εj Ki j=1 n m(Xj−1 )Ki j=1 Xj−1 − x , h1 Xj−1 − x , h1 x ∈ R. Since the second derivative m ¨ of m is bounded, a Taylor expansion shows that 1 (Bi − mˆ pi − mh ˙ 1 pˆi+1 − mh ¨ 21 pˆi+2 )/g In = Op (h31 ), 2 where (2.5.6) · In denotes the super norm over In . Note that the proof of the properties of σ ˆ12 is similar to one for m, ˆ so we give the details for m ˆ and σ ˆ22 only. By gn = up (hn ), we mean that there exists constant C > 0, such that P ( gn In ≤ C hn In ) → 1. Based on the analysis above, we obtain the following expansions, which are similar to those appearing in Yao and Tong (1994). With rˆj ≡ (Xj − m(X ˆ j−1 ))2 , m(x) ˆ − m(x) = 1 nh1 g(x) n σ(Xj−1 )εj K j=1 Xj−1 − x h2 λ2 + 1 m(x) ¨ + up (Rn,1(x) ),(2.5.7) h1 2 35 σ ˆ22 (x) − σ22 (x) 1 = nh2 g(x) (2.5.8) n Xj−1 − x {ˆ rj − σ 2 (x) − σ˙ 2 (x)(Xj−1 − x)} + up {Rn,2 (x)}, h2 W j=1 where 1 Rn,1 (x) = ng(x) n + j=1 1 Rn,2 (x) = ng(x) n + n σ(Xj−1 )εj K j=1 Xj−1 − x h1 Xj−1 − x Xj−1 − x σ(Xj− )εj K h1 h1 n W j=1 2 q h3 ); + O(qn,g n 1 Xj−1 − x {ˆ rj − σ 2 − σ˙ 2 (x)(Xj−1 − x)} h2 Xj−1 − x Xj−1 − x {ˆ rj − σ 2 − σ˙ 2 (x)(Xj−1 − x)} W h2 h2 j=1 2 q 2 h3 ). +O(qn,g n 2 From Lemma 2.5.2, we have 1 sup x∈In nh1 g(x) n σ(Xj−1 )εj Ki j=1 Xj−1 − x h1 = Op qn qn,g log n 1/2 . nh1 From (2.5.7) and the above bounds we readily obtain sup |m(x) ˆ − m(x)| = Op −1/2 −1/2 n (log n)1/2 cn qn qn,g . x∈In Combining this fact with condition (M) completes the proof of (2.2.7). 
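The ratio m̂(x) = (p̂₂q̂₀ − p̂₁q̂₁)/(p̂₂p̂₀ − p̂₁²) is just the intercept of a kernel-weighted least squares line. The sketch below (my illustration, hypothetical names) implements it with the biweight kernel used in Section 2.4; the common factor (n h₁)^{−1} cancels in the ratio, so plain sample means are used. A local linear fit reproduces a linear regression function exactly in the noiseless case, which gives a convenient check.

```python
import numpy as np

def biweight(u):
    # K(u) = 15 (1 - u^2)^2 / 16 on [-1, 1], the kernel used in the simulations
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def local_linear(x, X, Y, h):
    # p_i(x) ~ mean of u^i K(u), q_i(x) ~ mean of Y u^i K(u), u = (X - x)/h
    u = (X - x) / h
    k = biweight(u)
    p0, p1, p2 = k.mean(), (u * k).mean(), (u**2 * k).mean()
    q0, q1 = (Y * k).mean(), (Y * u * k).mean()
    return (p2 * q0 - p1 * q1) / (p2 * p0 - p1**2)
```

Because the numerator and denominator share the same weights, m̂(x) = a + bx whenever Y = a + bX with no noise, regardless of the bandwidth, provided the local design matrix is nonsingular.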
36 (2.5.9) To deal with σ ˆ22 , a similar analysis as in Fan and Yao (2002) can be followed, where rˆj = {Xj − m(X ˆ j−1 )}2 = {σ(Xj−1 )εj + m(Xj−1 ) − m(X ˆ j−1 )}2 ˆ j−1 )} = σ 2 (Xj−1 )ε2j + 2σ(Xj−1 )εj {m(Xj−1 ) − m(X +{m(Xj−1 ) − m(X ˆ j−1 )}2 . Then σ ˆ22 (x) − σ 2 (x) = J1 + J2 − J3 + J4 + Op (h2 )(|J1 + J2 − J3 + J4 | + |J1∗ + J2∗ − J3∗ + J4∗ |), where 1 J1 = nh2 g(x) J2 = J3 = J4 = 1 nh2 g(x) 2 nh2 g(x) 1 nh2 g(x) n W Xj−1 − x {σ 2 (Xj−1 ) − σ 2 (x) − σ˙ 2 (x)(Xj−1 − x)}, h2 W Xj−1 − x 2 σ (Xj−1 )(ε2j − 1), h2 W Xj−1 − x σ(Xj−1 )εj {m(X ˆ j−1 ) − m(Xj−1 )}, h2 W Xj−1 − x {m(X ˆ j−1 ) − m(Xj−1 )}2 , h2 j=1 n j=1 n j=1 n j=1 and Ji∗ is defined in the same way as Ji with one more factor h−1 2 (Xj−1 − x) in the jth summand, for j = 1, · · · , n and i = 1, · · · , 4. Condition (M) implies J1 In = Op (qn qn,g h22 ), 37 and from Lemma 2.5.2, we obtain log n 1/2 . J2 In = Op qn2 qn,g nh2 Based on (2.5.9), 3 q 2 log n J4 In = Op qn,g n nh h . 1 2 To deal with J3 , rewrite J3 = J31 + J32 + J33 , where 1 J31 = 2 n h1 h2 g(x) n K i,j=1 g −1 (Xi−1 )W = h21 λ2 J32 = nh2 g(x) |J33 | ≤ Op (1) n2 h2 Xi−1 − Xj−1 σ(Xi−1 )σ(Xj−1 )εi εj h1 Xj−1 − x Xi−1 − x + g −1 (Xj−1 )W h2 h2 1 φij , n2 h1 h2 g(x) 1≤i,j≤n n W Xi−1 − x σ(Xi−1 )εi m(X ¨ i−1 ), h2 W Xi−1 − Xj−1 Xi−1 − x K σ(Xi−1 )σ(Xj−1 )|εi |εj /g(Xi−1 ) , h2 h1 i=1 n i,j=1 where Xi−1 − Xj−1 Xi−1 − x σ(Xi−1 )σ(Xj−1 )εi εj g −1 (Xi−1 )W h1 h2 Xj−1 − x +g −1 (Xj−1 )W . h2 φij = K 38 Argue as in Borkowski and Mielniczuk (2012), to obtain E φij 2 2 ). = Op (n2 c2n qn4 qn,g 1≤i,j≤n To obtain the uniform bound, we consider the equal-length cover Ink and with center xnk , k = 1, · · · , L(n), for In , where L(n) = O((log n)r1 /(c3n (ncn )1/2 qn qn,g )). Then sup |J31 (x)| ≤ x∈In max sup 1≤k≤L(n) x∈In ∩I nk |J31 (x) − J31 (xnk )| + max 1≤k≤L(n) |J31 (xnk )| = R1 + R2 . Note that R1 ≤ For any 2 C(log n)r1 qn3 qn,g L(n)h2 (nh1 )1/2 = Op qn2 qn,g c2n . 
> 0, by the relation (2.2.6) in assumption (H), for a constant C < ∞, −1 c−2 R > qn−2 qn,g 2 n P ≤ L(n)P −1 c−2 qn−2 qn,g n 1 φij n2 h1 h2 g(x) 1≤i≤j≤n C(log n)r1 >ε 2 1 E φ ij 2 c3n (ncn )1/2 qn qn,g 2 n4 c4n h21 h22 qn4 qn,g in1/2 ,1≤i≤n |t|>n1/2 √ √ ≤ 2(1 − F ( n/2)) + 2F (1 − n/2). Since F has a finite second moment, we have F (t) = o(t−2 ), as t → −∞ and 1 − F (t) = o(t−2 ), as t → ∞. This implies that sup ˆ Tˆ − H(t, 0, 0)| = op (n−1/2 ). |H(t, S, |t|>n1/2 So we are left to show sup ˆ Tˆ − H(t, 0, 0)| = op (n−1/2 ). |H(t, S, |t|≤n1/2 42 (2.5.11) Now let δ = 1/(1 + √ 3). For any interval I, let C11+δ (I) be the set of differentiable functions h on R that satisfy h I,δ ≤ 1, where h I,δ = h I + h˙ I + ˙ ˙ |h(x) − h(y)| . |x − y|δ x,y∈I,x=y sup Now let Dn = {u + ν : u ∈ Un , ν ∈ Vn }, where Un = {h ∈ C(R) : h In ≤ n−1/2 }, −1/2 Vn = {h ∈ C11+δ (R) : h In ≤ n−1/2 cn log nQ2n }, with Qn = qn qn,g qn,σ . Let uˆ(x) := m(x) ˆ − m(x) − vˆ(x), and uˆσ (x) := σ ˆ 2 (x) − σ 2 (x) − vˆσ (x), where 1 vˆ(x) := nh1 g(x) vˆσ (x) := 1 nh2 g(x) n σ(Xj−1 )εj K j=1 n W j=1 Xj−1 − x + Op (qn c2n ), h1 Xj−1 − x 2 σ (Xj−1 )(ε2j − 1). h2 It follows from Lemma 2.5.2 and similar argument as in Selk and Neumeyer (2013), Sˆ and Tˆ belong to Dn with probability tending to one. So (2.5.11) will be followed if we prove sup |H(t, S, T − H(t, 0, 0)| = op (n−1/2 ). |t|≤n1/2 ,S,T ∈Dn To this end, set ηn = n−1/2 . Let t1 , · · · , tMn be ηn -net of [−n1/2 , n1/2 ], and set 43 ν1 , · · · , νNn for Vn . We can choose the former net such that Mn ≤ 2 + n, (2.5.12) Nn ≤ exp(K∗ (2 + bn − an )n1/(2+2δ) ), (2.5.13) the second net is where K∗ is some positive constant, see also (Van der Vaart and Wellner (1996)). Note that ν1 , · · · , νNn is an 2ηn -net for Dn . 
We have sup |H(t, S, T ) − H(t, 0, 0)| |t|≤n1/t ,S,T ∈Dn ≤ max |Hn (ti , νl , νm ) − Hn (ti , 0, 0)| + max Di,l,m , i,l,m i,l,m where Di,l,m = sup |t−ti |≤ηn , S−νl I ≤2ηn , T −νm I ≤2ηn |H(ti , S, T ) − H(t, νl , νm )| +|H(ti , 0, T ) − H(t, 0, νm )| + |H(ti , S, 0) − H(t, νl , 0)| + |Hn (ti , 0, 0) − Hn (t, 0, 0)|. For |t − ti | ≤ ηn , S − νl I ≤ 2ηn , T − νm I ≤ 2ηn , we have I y ≤ ti + νl (x) + νm (x)ti − ηn (A + 3) ≤ I y ≤ t + S(x) + T (x)t ≤ I y ≤ ti + νl (x) + νm (x)ti − ηn (A + 3) , 44 and F ti + νl (x) + νm (x)ti − ηn (A + 3) ≤ F t + S(x) + T (x)t ≤ F ti + νl (x) + νm (x)ti + ηn (A + 3) , for all y ∈ R and x ∈ In , where A = |T | + 2|ti | + 2ηn . Hence |H(ti , S, T ) − H(t, νl , νm )| ≤ |H ti + ηn (A + 3), νl (x), νm (x) − H ti − ηn (A + 3), νl (x), νm (x) | + 2Ri,l,m , with Ri,l,m n = j=1 ωnj {F ti + νl (x) + νm (x)ti + ηn (A + 3) − F ti + νl (x) + νm (x)ti − ηn (A + 3) } n ≤ 2ηn (sup |Af (ξ)| + 3 f ∞ ), say. t for some ξ is between ti + νl (x) + νm (x)ti − ηn (A + 3) and ti + νl (x) + νm (x)ti + ηn (A + 3). By assumption (F), there exists some L, such that |Af (ξ)| < L < ∞. Similarly, we derive the bound for the following terms, |H(ti , 0, T ) − H(t, 0, νm )| ≤ |H ti + ηn (A + 1), 0, νm (x) − H ti − ηn (A + 1), 0, νm (x) | ≤ 4ηn L + 4 f ∞ , |H(ti , S, 0) − H(t, νl , 0)|) ≤ |H ti + 3ηn , νl (x), 0 − H ti − 3ηn , νl (x), 0 | ≤ ηn 12 f ∞ , |Hn (ti , 0, 0) − Hn (t, 0, 0)| ≤ |H(ti + ηn , 0, 0) − H(ti − ηn , 0, 0)| ≤ ηn 4 f ∞ . 45 So ˆ Tˆ − H(t, 0, 0)| = T1 + T2 + T3 + T4 + T5 + ηn (8L + 32 f ∞ ), |H(t, S, sup |t|≤n1/2 ,S,T ∈Dn where T1 = max |H(ti , νl , νm ) − H(ti , 0, 0)|, i,l,m T2 = max |H ti + ηn (A + 3), νl (x), νm (x) − H(ti − ηn (A + 3), νl (x), νm (x))|, i,l,m T3 = max |H ti + ηn (A + 1), 0, νm (x) − H ti − ηn (A + 1), 0, νm (x) |, i,l,m T4 = max |H ti + 3ηn , νl (x), 0 − H ti − 3ηn , νl (x), 0 |, i,l,m T5 = max |H(ti + ηn , 0, 0) − H(ti − ηn , 0, 0)|. 
i,l,m To continue, for any υi and τi , i = 1, 2, let Yj = ωnj I εj ≤ s + υ1 (Xj−1 ) + τ1 (Xj−1 )s − I εj ≤ t + υ2 (Xj−1 ) + τ2 (Xj−1 )t − F s + υ1 (Xj−1 ) + τ1 (Xj−1 )s + F t + υ2 (Xj−1 ) + τ2 (Xj−1 )t . We have |Yj | ≤ 2, E(Yj |X0 , · · · , Xj−1 ) = 0, and n E Yj2 |X0 , · · · , Xj−1 Vn = j=1 n ≤ F s + υ1 (Xj−1 ) + τ1 (Xj−1 )s − bF t + υ2 (Xj−1 ) + τ2 (Xj−1 )t j=1 ≤ n f (ξ) s + υ1 (Xj−1 ) + τ1 (Xj−1 )s − t + υ2 (Xj−1 ) + τ2 (Xj−1 )t where ξ is between s + υ1 (Xj−1 ) + τ1 (Xj−1 )s and 46 , t + υ2 (Xj−1 ) + τ2 (Xj−1 )t. Since supt |tf (t)| < ∞, there exists some constant L, such that Vn ≤ n{ f ∞ (|s − t|(1 + σ In ) + υ1 − υ2 In ) + L τ1 − τ2 In } = n f ∞ B. Then by martingale inequality in Freedman (1975), n P (|H(s, υ1 , τ1 )| − H(t, υ2 , τ2 )| > βn1/2 ) Yj > βn1/2 , Vn ≤ n f ∞ B , = P j=1 ≤ 2 exp(− −1/2 Also νl In ≤ n−1/2 cn β 2n 4βn1/2 + 2n f ∞ B ). log nQ2n + ηn . Thus we obtain that P (T1 > βn−1/2 ) P (|H(ti , νl , νm )| − H(ti , 0, 0)| > βn1/2 ) ≤ i,l,m ≤ 2Mn Nn2 exp − β 2n 4βn1/2 −1/2 + 4n(n−1/2 cn log nQ2n (L + 1) + ηn ) . f ∞ Similarly, there exists some constant L2 and L3 , such that P (T2 > βn−1/2 ) ≤ 2Mn Nn2 exp − P (T3 > βn−1/2 ) ≤ 2Mn Nn2 exp − P (T4 P (T5 β 2n 4βn1/2 + nηn (L2 + 12 f ∞ ) β 2n 4βn1/2 + nηn (L3 + 4 f ∞ ) β 2n > βn−1/2 ) ≤ 2Mn Nn2 exp − , 4βn1/2 + 12nηn f ∞ β 2n > βn−1/2 ) ≤ 2Mn Nn2 exp − . 4βn1/2 + 4nηn f ∞ As δ = 1/(1 + √ , , 3) and relation (2.2.5) in condition (H), together with relations (2.5.12) 47 and (2.5.13), we obtain that P (Ti > βn−1/2 ) → 0, i = 1, 2, · · · , 5, β > 0. This completes the proof of (2.5.10) and hence the proof of Theorem 2.3.1 Lemma 2.5.3 Under the conditions of Theorem 2.3.1, 1 n n j=1 n m(X ˆ j−1 ) − m(Xj−1 ) 1 ωn (Xj−1 ) = εj + op (n−1/2 ), σ( Xj−1 ) n j=1 and for i = 1, 2, 1 n n ωn (Xj−1 ) j=1 n σ ˆi (Xj−1 ) − σ(Xj−1 ) 1 = (ε2j − 1) + op (n−1/2 ). σ( Xj−1 ) 2n j=1 Proof. To prove the first equation, from the proof of Lemma 2.2.1, we have 1 m(x) ˆ − m(x) = nh1 g(x) n σ(Xj )εj K j=1 Xj − x + op (n−1/2 ). 
Then we only need to prove

(1/n) Σ_{i=1}^n [ωn(Xᵢ)/(n h₁ g(Xᵢ) σ(Xᵢ))] Σ_{j=1}^n σ(Xⱼ) εⱼ K((Xⱼ − Xᵢ)/h₁) = (1/n) Σ_{j=1}^n εⱼ + op(n^{−1/2}).

Denote

d̂(x) = Σ_{i=1}^n [ωn(Xᵢ) σ(x)/(n h₁ g(Xᵢ) σ(Xᵢ))] K((x − Xᵢ)/h₁).

Let d̄(x) = E(d̂(x)); we have

d̄(x) = ∫ [ωn(u) σ(x)/(h₁ σ(u))] K((x − u)/h₁) du.

Then E[(d̄(X) − 1)²] → 0. Therefore

(1/n) Σ_{j=1}^n εⱼ (d̄(Xⱼ) − 1) = op(n^{−1/2}).

Thus we only need to prove that

(1/n) Σ_{j=1}^n εⱼ d̃(Xⱼ) = op(n^{−1/2}),   (2.5.14)

where d̃(x) = d̂(x) − d̄(x). But the proof of (2.5.14) is similar to that of Lemma B.3 of Selk and Neumeyer (2013) under the mixing condition (Z), and to the one appearing in Section 5 of Müller et al. (2009) under the moment contracting condition (Z′). The second equation follows by a similar proof.

Chapter 3

Linear Measurement Error Models

3.1 Introduction

The problem of fitting an error distribution in regression models has been well studied when covariates are fully observed; see, e.g., Loynes (1980), Koul (2002), Khmaladze and Koul (2004, 2009) and the references therein. However, in practice there are numerous real world applications where covariates are not observable. Instead, one observes some surrogates for the covariates. The monographs of Cheng and Van Ness (1999), Fuller (2009) and Carroll, Ruppert, Stefanski, and Crainiceanu (2012) are full of such important applications. These models are often called errors-in-variables models or measurement error models. Relatively little is known about fitting an error distribution to the regression model in this setting. In this chapter we investigate a class of tests for this testing problem based on deconvoluted density estimators of the error density.

Let p ≥ 1 be the given dimension of the covariate vector X. In a multiple linear regression model with measurement error in X, one observes the response variable Y and a surrogate p-vector Z obeying the model

Y = α + β′X + ε,   Z = X + u,   (3.1.1)

for some α ∈ R, β ∈ Rᵖ, where the p-vector u is the measurement error in X.
Here b′ denotes the transpose of any vector b ∈ Rᵖ. The variables ε, u and X are assumed to be mutually independent, with Eε = 0 and Eu = 0. For model identifiability reasons, we assume the density g of the measurement error u to be known. Let f denote the density of ε, and let f₀ be a known density with zero mean. Consider the problem of testing the hypothesis

H₀: f = f₀   vs.   H₁: f ≠ f₀,   (3.1.2)

based on a random sample (Yᵢ, Zᵢ), 1 ≤ i ≤ n, from the joint distribution of (Y, Z) obeying the model (3.1.1).

Note that if β = 0 in (3.1.1), then Y bears no relation to X, and hence whether X is observable or not is irrelevant for making inference about f. In particular, any goodness-of-fit test based on Yᵢ, 1 ≤ i ≤ n, useful for fitting a density up to an unknown location parameter, may be used to test the above hypotheses. Thus, from now onwards we shall assume β ≠ 0 in this chapter.

Since we observe Z instead of X, we shall rewrite the model (3.1.1) as

Y = α + β′Z + e,   e = ε − β′u.

Because u and ε are independent, the density of e is

h(v) = ∫ f(v + β′u) g(u) du,   v ∈ R.

Let

h₀(v) = ∫ f₀(v + β′u) g(u) du,   v ∈ R.

As argued in Koul and Song (2012), there is a one-to-one map between the densities of ε and e. Hence, testing for H₀ is equivalent to testing for

H₀: h = h₀   vs.   H₁: h ≠ h₀.   (3.1.3)

In the one sample i.i.d. set up, the Bickel and Rosenblatt (1973) goodness-of-fit test for fitting a known density is based on an L₂ distance between a kernel density estimator and its null expected value. This test can be adapted to fitting an error density up to an unknown location parameter, with the density estimator based on the estimated residuals. The resulting statistic has the property that its asymptotic null distribution is not affected by not knowing the location parameter. In other words, not knowing the nuisance location parameter has no effect on the asymptotic level of the test based on the analog of this statistic.
What is remarkable is that this property continues to hold in several more complicated additive models. Lee and Na (2002), Bachmann and Dette (2005), and Koul and Mimoto (2012) observed that this fact continues to hold for the analog of this statistic when fitting an error density based on residuals in autoregressive and generalized autoregressive conditionally heteroscedastic time series models. This property makes these $L_2$-distance type tests more desirable than tests based on residual empirical processes, because the asymptotic null distribution of the standardized residual empirical process depends on the estimators of the underlying nuisance parameters in these models in a complicated fashion. In all of these works the data are completely observable.

In the above measurement error model, Koul and Song (2012) proposed an analogous class of tests for the testing problem (3.1.3) based on kernel density estimators of $h$ obtained directly from the residuals $Y_i - \hat\alpha - \hat\beta' Z_i$, $1 \le i \le n$, where $\hat\alpha, \hat\beta$ are some $n^{1/2}$-consistent estimators of $\alpha, \beta$ under $H_0$. Alternatively, because $f$ is involved in the convolution $h$, it is natural to construct tests of $H_0$ based on deconvolution density estimators. In this chapter we develop analogs of the above tests for testing $H_0$ based on deconvolution density estimators.

There is a vast literature on deconvolution estimators of the density of $X$ in the measurement error model (3.1.1), as is evidenced in the papers of Carroll and Hall (1988), Stefanski and Carroll (1990), Fan (1991), van Es and Uh (2004), and Delaigle and Hall (2006), among others. The goodness-of-fit testing problem pertaining to the density function of $X$ has been studied by several authors, including Butucea (2004), Holzmann and Boysen (2006), Holzmann, Bissantz and Munk (2007), and Loubes and Marteau (2014).
All of these authors use analogs of the above $L_2$-distance type tests, based either on the deconvolution estimator of the density of $X$ or on a density estimator of the density of $Z$. None of them address the above problem of testing (3.1.2) or (3.1.3) pertaining to the error density in the measurement error model (3.1.1).

Consider the model (3.1.1) and assume for the time being that $\alpha, \beta$ are known. Since we observe $Y$ and $Z$, we can construct a kernel density estimator of the density $h$ of $e := Y - \alpha - \beta' Z = \varepsilon - \beta' u$, which is also an estimator of the convolution of the density $f$ of $\varepsilon$ with the known density of $\beta' u$. From this we obtain a deconvolution density estimator of $f$, which we shall use to construct tests of $H_0$.

Let $\Phi_\gamma$ denote the characteristic function of a density $\gamma$. Proceeding a bit more precisely, by the independence of $\varepsilon$ and $u$, $\Phi_h(t) = \Phi_f(t)\Phi_g(-\beta t)$. Assuming $\Phi_g(t) \ne 0$ for all $t \in \mathbb{R}^p$, the characteristic function of $\varepsilon$ is $\Phi_f(t) = \Phi_h(t)/\Phi_g(-\beta t)$. Using the data $Y_i, Z_i$, $1 \le i \le n$, an estimate of $\Phi_h$ is provided by the empirical characteristic function $\Psi_n(t) := n^{-1}\sum_{j=1}^n e^{i t e_j}$ of $e_j := Y_j - \alpha - \beta' Z_j$, $1 \le j \le n$. A kernel density estimator of $h$ is
\[
h_n(x, \alpha, \beta) = \frac{1}{nb}\sum_{j=1}^n K\Big(\frac{x - e_j}{b}\Big),
\]
where $K$ is a kernel function whose characteristic function $\Phi_K$ is compactly supported and $b > 0$ is a bandwidth sequence. The characteristic function of $h_n$ is $\Phi_K(bt)\Psi_n(t)$. Since $\Phi_g$ is known, a kernel estimate of $\Phi_f(t)$ is $\Phi_K(bt)\Psi_n(t)/\Phi_g(-\beta t)$. By the inversion formula,
\[
f_n(x, \alpha, \beta) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\,\Phi_K(bt)\,\frac{\Psi_n(t)}{\Phi_g(-\beta t)}\,dt
\]
is a deconvolution estimate of $f$ when $\alpha$ and $\beta$ are known. But in practice $\alpha, \beta$ are seldom known. Let $\hat\alpha, \hat\beta$ be estimators of $\alpha, \beta$, respectively. Then the corresponding deconvolution estimator of $f$ is $\hat f_n(x) := f_n(x, \hat\alpha, \hat\beta)$, obtained from $f_n$ after replacing $\alpha, \beta$ by $\hat\alpha, \hat\beta$, respectively.
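The deconvolution estimator $f_n(x, \alpha, \beta)$ can be evaluated by direct numerical computation of the inversion integral. The following sketch is illustrative only (not the thesis code): the sample size, the bandwidth $b = 0.2$, the Laplace measurement error and the sinc kernel (for which $\Phi_K \equiv 1$ on $[-1, 1]$) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, s, b = 2000, 1.0, 0.3, 0.2
eps = rng.normal(0.0, 0.5, n)                 # epsilon ~ f = N(0, 0.25)
u = rng.laplace(0.0, s, n)                    # measurement error with known Phi_g
e = eps - beta * u                            # e_j = Y_j - alpha - beta*Z_j

t = np.linspace(-1/b, 1/b, 2001)              # Phi_K(bt) = 1 on this range (sinc kernel)
dt = t[1] - t[0]
psi_n = np.exp(1j * t[None, :] * e[:, None]).mean(axis=0)   # empirical cf Psi_n(t)
phi_g = 1.0 / (1 + (s * beta * t)**2)         # Phi_g(-beta t) for Laplace(0, s)

# inversion formula: f_n(x) = (1/2pi) int e^{-itx} Phi_K(bt) Psi_n(t)/Phi_g(-beta t) dt
x = np.linspace(-2.0, 2.0, 201)
fn = (np.exp(-1j * np.outer(x, t)) * psi_n / phi_g).sum(axis=1).real * dt / (2 * np.pi)
```

For this kind of draw, `fn` near $x = 0$ should be close to the true value $f(0) = 1/\sqrt{0.5\pi} \approx 0.798$, up to smoothing bias and deconvolution noise.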
The proposed class of tests of $H_0$, one for each $K$ and $b$, is to be based on
\[
\hat T_n = \int_{\mathbb{R}} \big(\hat f_n(x) - K_b * f_0(x)\big)^2\,dx,
\]
where for any function $\gamma$, $K_b * \gamma(x) := b^{-1}\int K((x - y)/b)\gamma(y)\,dy$.

It is well known that the convergence rate of deconvolution density estimators depends sensitively on the tail behaviour of the characteristic function of the underlying measurement error, which in the present setup is $\Phi_g$. There are two general cases: one is the ordinary smooth case, where $|\Phi_g(t)|$ is of polynomial order $|t|^{-\kappa}$, for some $\kappa > 0$, as $|t| \to \infty$; the other is the super smooth case, where $|\Phi_g(t)|$ is of the order $|t|^{\lambda_0} e^{-|t|^\lambda/\nu}$, for some $\lambda_0 \in \mathbb{R}$, $\lambda > 0$ and $\nu > 0$, as $|t| \to \infty$.

In this chapter, we obtain the asymptotic distributions of $\hat T_n$ under $H_0$ in both the ordinary smooth and super smooth cases in Section 2. The consistency against a fixed alternative and the asymptotic power against a class of local nonparametric alternatives and against a fixed alternative, for both cases, are described in Section 3. The findings of a finite sample simulation that compares the empirical power of a member of the proposed class of tests with that of the Kolmogorov–Smirnov and Cramér–von Mises tests based on the empirical d.f. of $\{\hat e_j := Y_j - \hat\alpha - \hat\beta' Z_j,\ 1 \le j \le n\}$, and with a Koul and Song (2012) test based on $h_n(\cdot, \hat\alpha, \hat\beta)$, are presented in Section 4. The comparison is made for three choices of the measurement error variance $\sigma_u^2$. In the ordinary smooth case, the proposed test dominates the Koul–Song test at almost all chosen alternatives for all three choices of $\sigma_u^2$. It also dominates the other two tests for the larger values of $\sigma_u^2$ at most of the chosen alternatives and for the larger sample size. The findings in the super smooth case are similar. In general, the proposed test has better empirical power at the chosen alternatives compared to some of these other tests for larger values of $\sigma_u^2$, while the Cramér–von Mises test dominates in terms of empirical power for smaller values of $\sigma_u^2$.
See Section 3.4 for more on this finite sample comparison.

Throughout this chapter, $N(\mu, \sigma^2)$ denotes the normal distribution with mean $\mu$ and variance $\sigma^2$, all limits are taken as $n \to \infty$, $\to_d$ and $\to_p$ denote convergence in distribution and in probability, respectively, and the range of integration in all integrals is $\mathbb{R}$, unless specified otherwise.

3.2 Asymptotic Null Distribution

This section discusses the asymptotic null distribution of $\hat T_n$ for the ordinary smooth and super smooth cases.

3.2.1 Ordinary smooth case

Here we shall first derive the limiting null distribution of $\hat T_n$ for the ordinary smooth case. To begin with, we state the needed assumptions.

(A): The characteristic function $\Phi_g$ of the error vector $u$ satisfies $\Phi_g(t) \ne 0$ for all $t \in \mathbb{R}^p$, and $|\Phi_g(t)| \approx \|t\|^{-\kappa}$ for a $\kappa > 0$, i.e., there are $c, C > 0$ such that $c\|t\|^{-\kappa} \le |\Phi_g(t)| \le C\|t\|^{-\kappa}$ for all sufficiently large $\|t\|$.

(B): The characteristic function $\Phi_f$ of the density $f$ of $\varepsilon$ satisfies $|\Phi_f(t)| = O(|t|^{-r})$, for some $r > 1$, as $|t| \to \infty$.

(C): The characteristic function $\Phi_K$ of the kernel function $K$ is symmetric around $0$ and compactly supported on $[-1, 1]$.

(D): $E\{\|X\|^4 + |\varepsilon|^4 + \|u\|^4\} < \infty$.

Next, define $\psi(\beta, s, t) := \Phi_g(\beta t + \beta s)\Phi_f(t + s)$, and let
\[
T_n(\alpha, \beta) := \int \big(f_n(x, \alpha, \beta) - K_b * f_0(x)\big)^2\,dx,
\]
\[
C_{M,b} := \int \frac{|\Phi_K(tb)|^2}{|\Phi_g(\beta t)|^2}\,dt, \qquad
C_{V,b} := \int\!\!\int \frac{|\Phi_K(tb)|^2\,|\Phi_K(sb)|^2}{|\Phi_g(\beta t)|^2\,|\Phi_g(\beta s)|^2}\,|\psi(\beta, s, t)|^2\,ds\,dt.
\]
Using Theorem 1 of Holzmann et al. (2007), one can derive the following result. Suppose $H_0$ and the assumptions (A)–(C) hold and $b \to 0$, $nb \to \infty$. Then
\[
C_{M,b} \approx b^{-(2\kappa+1)}, \qquad C_{V,b} \approx b^{-(4\kappa+1)}, \qquad (3.2.1)
\]
\[
n\,C_{V,b}^{-1/2}\Big(T_n(\alpha, \beta) - \frac{C_{M,b}}{2\pi n}\Big) \to_d N\big(0,\, 1/2\pi^2\big). \qquad (3.2.2)
\]
Note that $\hat T_n = T_n(\hat\alpha, \hat\beta)$. Thus we need the above results to hold with $\alpha, \beta$ replaced by $\hat\alpha$ and $\hat\beta$, respectively. Accordingly, write $\hat C_{M,b}$, $\hat C_{V,b}$ and $\hat\Psi_n(t)$ for $C_{M,b}$, $C_{V,b}$ and $\Psi_n(t)$, respectively, when $\alpha, \beta$ are replaced by $\hat\alpha$ and $\hat\beta$.
We are now ready to state the following theorem, which provides yet another example where the asymptotic null distributions of these $L_2$-distance statistics are not affected by not knowing the nuisance parameters $\alpha, \beta$.

Theorem 3.2.1 Suppose $H_0$ holds, assumptions (A), (B) with $r > 3/2$, (C) and (D) hold, and
\[
n^{1/2}\big\{|\hat\alpha - \alpha| + \|\hat\beta - \beta\|\big\} = O_p(1). \qquad (3.2.3)
\]
In addition, suppose $b \to 0$ and $n b^{\max\{2\kappa+3,\,3.5\}} \to \infty$, with $\kappa$ as in (A). Then $\hat C_{M,b} \approx b^{-(2\kappa+1)}$, $\hat C_{V,b} \approx b^{-(4\kappa+1)}$, and
\[
n\,\hat C_{V,b}^{-1/2}\Big(\hat T_n - \frac{\hat C_{M,b}}{2\pi n}\Big) \to_d N\Big(0,\, \frac{1}{2\pi^2}\Big). \qquad (3.2.4)
\]
The proof of this theorem is given in the last section.

Let $z_a$ be the $(1-a)100$th percentile of the $N(0, 1)$ distribution. An immediate consequence of (3.2.4) is that for any $0 < a < 1$, the test that rejects $H_0$ whenever
\[
\mathcal{T}_n := \sqrt{2}\,\pi n\,\hat C_{V,b}^{-1/2}\Big|\hat T_n - \frac{\hat C_{M,b}}{2\pi n}\Big| > z_{a/2}
\]
has asymptotic size $a$.

Examples of $g$ that satisfy assumption (A) include the uniform distribution, with $\kappa = 1$; gamma distributions with shape parameter $\gamma$, where $\kappa = \gamma$; the exponential, where $\kappa = 1$; and the Laplace distribution with location $0$ and scale $1$, where $\kappa = 2$. The class of regression error densities $f$ that satisfy assumption (B) includes the Laplace, where $r = 2$, and the normal and Cauchy, for any $r > 0$.

3.2.2 Super smooth case

Now we consider the problem of obtaining the limiting distribution of $\hat T_n$ in the super smooth case. Here we need the following assumptions.

(A$'$): The characteristic function $\Phi_g$ of the error variable $u$ satisfies $\Phi_g(t) \ne 0$ for any $t \in \mathbb{R}^p$. For any $\beta \in \mathbb{R}^p$ with $\beta_k \ne 0$ for $k = 1, \ldots, p$, $|\Phi_g(\beta t)| \sim C(\beta)\,|t|^{\lambda_0}\, e^{-|t|^\lambda/\nu(\beta)}$ as $|t| \to \infty$, for a $\lambda > 1$, $C(\beta) > 0$, $\nu(\beta) > 0$, and $\lambda_0 \in \mathbb{R}$. Moreover, $C(\beta)$ and $\nu(\beta)$ have bounded first derivatives.

(B$'$): The density $f$ is square-integrable, and $E\varepsilon^2 < \infty$.

(C$'$): The characteristic function $\Phi_K$ of the kernel function $K$ is symmetric around $0$ and compactly supported on $[-1, 1]$. Moreover, $\Phi_K(0) = 1$, and there exist $A > 0$, $\omega \ge 0$ such that $\Phi_K(1 - t) = A t^\omega + o(t^\omega)$, as $t \to 0$.
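Assumption (C$'$) is easy to verify numerically for a concrete kernel. For instance, for the kernel with $\Phi_K(t) = (1 - t^2)^3$ one has $\Phi_K(1 - t) = (2t - t^2)^3 = 8t^3(1 - t/2)^3$, so $A = 8$ and $\omega = 3$; the following small check (an illustrative sketch, not from the thesis) confirms this behavior.

```python
import numpy as np

phi_K = lambda t: (1.0 - t**2)**3          # kernel cf with support [-1, 1]
h = np.array([1e-1, 1e-2, 1e-3, 1e-4])
ratio = phi_K(1.0 - h) / (8.0 * h**3)      # Phi_K(1-h) / (A h^omega) with A=8, omega=3
# ratio equals (1 - h/2)^3, which tends to 1 as h -> 0
```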
From Holzmann and Boysen (2006) we can deduce that under the conditions (A$'$)–(C$'$), as $n \to \infty$ and $b \to 0$,
\[
\frac{(2\lambda)^{1+2\omega}\,\pi\, C^2(\beta)\, n}{A^2\,\nu^{1+2\omega}(\beta)\, b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\exp\big\{2/(\nu(\beta)b^{\lambda})\big\}\,\Gamma(2\omega+1)}\; T_n(\alpha, \beta) \to_d \chi_2^2/2, \qquad (3.2.5)
\]
where $\chi_2^2$ is a r.v. having the chi-square distribution with 2 degrees of freedom, and $\Gamma(\cdot)$ is the gamma function.

In order to derive a similar result for $\hat T_n$, we need the following additional condition. Let $\dot q$ denote the first derivative of any function $q$.

(D$'$): There exists some $\lambda_1 > 1$ such that $|\Phi_f(t)| = O(|t|^{-\lambda_1})$ as $|t| \to \infty$.

Theorem 3.2.2 Suppose $H_0$ and the assumptions (A$'$), (B$'$), (C$'$), (D$'$), (D) hold, $b \to 0$, and
\[
n b^{-\eta}\,\exp\big\{-2/(\nu(\beta)b^{\lambda})\big\} \to \infty, \quad \text{for any } \eta > 0. \qquad (3.2.6)
\]
Then
\[
T_{n,s} := \frac{(2\lambda)^{1+2\omega}\,\pi\, C(\hat\beta)^2\, n}{A^2\,\nu(\hat\beta)^{1+2\omega}\, b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\exp\big\{2/(\nu(\hat\beta)b^{\lambda})\big\}\,\Gamma(2\omega+1)}\; \hat T_n \to_d \chi_2^2/2. \qquad (3.2.7)
\]
Note that the factor multiplying $\hat T_n$ here is completely known. Again, the proof of this theorem appears in the last section. The corresponding test rejects $H_0$, with asymptotic size $a$ for $0 < a < 1$, whenever $T_{n,s} > \mathcal{X}_a/2$, where $\mathcal{X}_a$ is the $(1-a)100$th percentile of the $\chi_2^2$ distribution.

Examples satisfying assumption (A$'$) include normal densities. If $g$ is the standard normal density, then $C_g = 1$, $\lambda_0 = 0$, $\lambda = 2$ and $\nu = 2$. For kernel functions satisfying assumption (C$'$), Holzmann and Boysen (2006) used the sinc kernel $K(x) = \sin(x)/(\pi x)$, with $A = 1$ and $\omega = 0$, and Fan (1992) used $\Phi_K(t) = (1 - t^2)^3$, with $A = 8$ and $\omega = 3$. Other suitable kernel functions can be found in Delaigle and Hall (2006).

3.3 Consistency and Asymptotic Power

In this section we shall discuss the consistency and the asymptotic power against fixed and local nonparametric alternatives of the above tests, for both the ordinary and super smooth cases.

Consistency. Let $f_1$ be another fixed density of $\varepsilon$ such that
\[
\|f_1 - f_0\| := \Big(\int \big(f_1(x) - f_0(x)\big)^2\,dx\Big)^{1/2} > 0. \qquad (3.3.1)
\]
Consider the fixed alternative $H_1: f(x) = f_1(x)$, for all $x \in \mathbb{R}$.
The following two theorems yield the consistency of the above $\mathcal{T}_n$ and $T_{n,s}$ tests against $H_1$ for the ordinary and super smooth cases, respectively.

Theorem 3.3.1 Suppose assumptions (A) and (C) hold, $f_0$ and $f_1$ satisfy (B) with $r > 3/2$ and have finite fourth moments, and (3.2.3) holds under $H_1$. Furthermore, suppose (D) holds, $b \to 0$, and $n b^{\max\{2\kappa+3,\,3.5\}} \to \infty$. Then
\[
\sqrt{2}\,\pi n\,\hat C_{V,b}^{-1/2}\Big(\hat T_n - \frac{\hat C_{M,b}}{2\pi n}\Big) \to_p \infty. \qquad (3.3.2)
\]

Theorem 3.3.2 Assume (3.2.3) holds under $H_1$, and that the assumptions of Theorem 3.2.2 hold. Then
\[
\frac{n}{b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\exp\big\{2/(\nu(\hat\beta)b^{\lambda})\big\}}\; \hat T_n \to_p \infty.
\]

Asymptotic local power. First we consider the ordinary smooth case. We shall describe the asymptotic distribution of $\hat T_n$ under the sequence of local nonparametric alternatives
\[
f_{1n}(x) = f_0(x) + \delta_{1n}\,\ell(x), \quad x \in \mathbb{R}, \qquad \delta_{1n} = \frac{(C_{V,b}/2)^{1/4}}{(n\pi)^{1/2}},
\]
with $f_{1n}$ a nonnegative function, $\ell \in L_2(\mathbb{R})$, and $\int \ell(x)\,dx = 0$. We obtain

Theorem 3.3.3 Suppose the assumptions of Theorem 3.2.1 hold and that (3.2.3) holds under $H_{1n}: f = f_{1n}$. Then, under $H_{1n}$,
\[
\sqrt{2}\,\pi n\,\hat C_{V,b}^{-1/2}\big(\hat T_n - \hat C_{M,b}/(2\pi n)\big) \to_d N\big(\|\ell\|^2,\, 1\big).
\]

Similarly, for the super smooth case, consider the sequence of local nonparametric alternatives
\[
f_{2n}(x) = f_0(x) + \delta_{2n}\,\ell(x), \quad x \in \mathbb{R}, \qquad
\delta_{2n} = \Big(\frac{(2\lambda)^{1+2\omega}\,\pi\, C(\beta)^2\, n}{A^2\,\nu(\beta)^{1+2\omega}\, b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\exp\big\{2/(\nu(\beta)b^{\lambda})\big\}\,\Gamma(2\omega+1)}\Big)^{-1/2},
\]
with $f_{2n}$ a nonnegative function, $\ell \in L_2(\mathbb{R})$, and $\int \ell(x)\,dx = 0$. We obtain

Theorem 3.3.4 Suppose the assumptions of Theorem 3.2.2 hold and (3.2.3) holds under $H_{2n}: f = f_{2n}$. Then, under $H_{2n}$,
\[
\frac{(2\lambda)^{1+2\omega}\,\pi\, C(\hat\beta)^2\, n}{A^2\,\nu(\hat\beta)^{1+2\omega}\, b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\exp\big\{2/(\nu(\hat\beta)b^{\lambda})\big\}\,\Gamma(2\omega+1)}\; \hat T_n - \|\ell\|^2 \to_d \chi_2^2/2.
\]

The above two theorems show that the proposed tests can detect alternatives which converge to $f_0$ at a rate slower than $n^{-1/2}$.

Asymptotic power against a fixed alternative. Now we describe the asymptotic power for the ordinary smooth case against a fixed alternative $f_1$ with $\|f_1 - f_0\| > 0$. To proceed further we state the following result, which follows from Theorem 2 of Holzmann et al. (2007).
Assume $f_1 \ne f_0$ satisfies (3.3.1), assumptions (A) and (C) hold, $f_1$ and $f_0$ satisfy assumption (B) for some $r > \kappa + 1$ and have bounded second derivatives, $b \to 0$, and (3.2.6) holds. Then, under $H_1$,
\[
n^{1/2}\big(T_n(\alpha, \beta) - \|K_b * (f_1 - f_0)\|^2\big) \to_d N(0, \tau_0^2), \qquad (3.3.3)
\]
where
\[
\tau_0^2 = \frac{1}{2\pi^3}\,\mathrm{Var}\Big(\int e^{-it\varepsilon}\,\frac{\Phi_{f_1}(t) - \Phi_{f_0}(t)}{\Phi_g(\beta t)}\,dt\Big).
\]
We shall use this result to analyze the asymptotic distribution of $\hat T_n$ under the fixed alternative $H_1$. To proceed further, let $\mu_Z := EZ$, and suppose the first derivatives $\dot f_1$ and $\dot f_0$ exist. Define
\[
A_f = 2\int (f_1 - f_0)(x)\,\dot f_0(x)\,dx, \qquad B_f = 2\mu_Z \int (f_1 - f_0)(x)\,\dot f_1(x)\,dx.
\]

Theorem 3.3.5 Assume that (A), (C) and (D) hold, $f_1$ and $f_0$ satisfy assumption (B) with $r > \kappa + 1$, $r > 3/2$, with $\kappa$ as in (A), and have bounded second derivatives. Also, assume (3.3.1) and (3.2.3) hold under $H_1$. Furthermore, if $b \to 0$ and $n b^{\max\{4\kappa+2,\,2\kappa+3\}} \to \infty$, then
\[
n^{1/2}\big(\hat T_n - \|K_b * (f_1 - f_0)\|^2 - (\hat\alpha - \alpha)A_f - (\hat\beta - \beta)' B_f\big) \to_d N(0, \tau_0^2). \qquad (3.3.4)
\]

Note that the effect of estimating $\alpha$ and $\beta$ introduces another bias term, $n^{1/2}\big((\hat\alpha - \alpha)A_f + (\hat\beta - \beta)' B_f\big)$, into the asymptotic distribution of the statistic $\hat T_n$. This bias vanishes if to begin with there is no intercept parameter in the model and $\mu_Z = 0$. It can also be absorbed into the limit distribution under the following linearity condition on the estimators. Suppose that, under $H_1$, the estimators $\hat\alpha$ and $\hat\beta$ satisfy the expansions
\[
\hat\alpha - \alpha = \frac{1}{n}\sum_{j=1}^n \eta_j + o_p(n^{-1/2}), \qquad (3.3.5)
\]
\[
\hat\beta_k - \beta_k = \frac{1}{n}\sum_{j=1}^n \zeta_{jk} + o_p(n^{-1/2}), \quad k = 1, \ldots, p, \qquad (3.3.6)
\]
where the $\eta_j$ are i.i.d. with $E\eta = 0$, $\mathrm{Var}(\eta) > 0$ and $E|\eta|^{2+\vartheta} < \infty$, for some $\vartheta > 0$. Moreover, the same conditions are satisfied by the $\zeta_{jk}$'s, and for $i \ne j \ne k$, $\eta_i$, $\zeta_j$ and $e_k$ are mutually independent. Examples of estimators $\hat\alpha, \hat\beta$ satisfying these two conditions include the naive least squares estimators, the maximum likelihood estimators (see Hušková and Meintanis (2007)), and the bias-corrected estimators (see Fuller (1987)).
Using the above expansions, we obtain

Theorem 3.3.6 Assume the conditions of Theorem 3.3.5 hold and that $\hat\alpha$ and $\hat\beta$ satisfy (3.3.5)–(3.3.6). Then, for some $\tau > 0$,
\[
n^{1/2}\big(\hat T_n - \|K_b * (f_1 - f_0)\|^2\big) \to_d N(0, \tau^2). \qquad (3.3.7)
\]
The form of $\tau$ is described in the proof of this theorem in the last section; see (3.5.26). Although $\tau$ is complicated to compute in practice, bootstrap methods can be used to estimate it.

For the super smooth case, in order to obtain a similar result, we need the following stronger assumption on $f_1$ and $f_0$:

(B$^*$): The characteristic function $\Phi_f$ of the density $f$ of $\varepsilon$ satisfies $|\Phi_f(t)| = O\big(|t|^{\xi_0}\, e^{-|t|^\xi/\zeta}\big)$, for some $\xi_0 \in \mathbb{R}$, $\zeta > 0$ and $\xi > \lambda$.

Assumption (B$^*$) implies (D$'$), and ensures that $\int |\Phi_f(t)/\Phi_g(\beta t)|\,dt < \infty$. An example of $f$ and $g$ satisfying this condition is where $f$ is a normal density with variance smaller than 1 and $g$ is the standard normal density.

A result analogous to (3.3.3) can be obtained in the super smooth case as well, by following the proof of Theorem 2 in Holzmann et al. (2007) with known $\alpha$ and $\beta$. To be precise, assume $f_1, f_0$ satisfy (3.3.1), assumptions (A$'$) and (C$'$) hold, $f_1$ and $f_0$ satisfy assumption (B$^*$), $b \to 0$, and
\[
n b^{-\eta}\,\exp\big\{-4/(\nu(\beta)b^{\lambda})\big\} \to \infty, \quad \text{for any } \eta > 0. \qquad (3.3.8)
\]
Then (3.3.3) holds. In the case of unknown $\alpha$ and $\beta$, we obtain the following theorem.

Theorem 3.3.7 Suppose assumptions (A$'$), (C$'$) and (B$^*$) hold, and $f_1, f_0$ satisfy (3.3.1) and have bounded second derivatives. If, in addition, $b \to 0$ and (3.2.6) holds, then (3.3.4) holds. Furthermore, if $\hat\alpha$ and $\hat\beta$ satisfy (3.3.5)–(3.3.6), then (3.3.7) holds for some $\tau > 0$.

3.4 Simulations

In this section we report the findings of some extensive simulations, which assess the finite sample level and power behavior of a member of the above class of tests. The results are presented in two subsections, for the ordinary and super smooth cases.
3.4.1 Ordinary smooth case

Consider the measurement error model
\[
Y = 1 + X + \varepsilon, \qquad Z = X + u, \qquad (3.4.1)
\]
where $X \sim N(0, 1)$ and $\Phi_g(t) = 16/(4 + \sigma_u^2 t^2)^2$. This $\Phi_g$ satisfies assumption (A) of the ordinary smooth case with $\kappa = 4$. We wish to test the hypothesis that $\varepsilon \sim N(0, 0.25)$, i.e., $f_0$ in $H_0$ is the density of the normal distribution with mean zero and variance $0.25$. As in Koul and Song (2012), we use the bias-corrected estimators $\hat\alpha = \bar Y - \hat\beta\bar Z$ and $\hat\beta = S_{ZY}/(S_{ZZ} - \sigma_u^2)$, where $\bar Y$ and $\bar Z$ denote the sample means of $Y$ and $Z$, and $S_{ZY}$ and $S_{ZZ}$ denote the sample covariance of $Z$ and $Y$ and the sample variance of $Z$, respectively. In the deconvolution estimator of $f$ we used the sinc kernel $K(x) = \sin x/(\pi x)$. The proposed test rejects $H_0$ for large values of $\mathcal{T}_n := n\hat C_{V,b}^{-1/2}\big|\hat T_n - \hat C_{M,b}/(2\pi n)\big|$.

We shall compare this test with the Kolmogorov–Smirnov ($T_{KS}$) and Cramér–von Mises ($T_{CvM}$) tests and the $W_n$ test proposed by Koul and Song (2012), all based directly on the residuals $\hat e_i := Y_i - \hat\alpha - \hat\beta Z_i$, $1 \le i \le n$. The first two statistics are defined as
\[
T_{KS} := \sup_{x \in \mathbb{R}} n^{1/2}\,|\hat F_n(x) - F_0(x)|, \qquad
T_{CvM} := n\int \big(\hat F_n(x) - F_0(x)\big)^2\,dF_0(x),
\]
where $\hat F_n(x) := n^{-1}\sum_{i=1}^n I(\hat e_i \le x)$. To define $W_n$, let $\varphi$ be a density kernel on $\mathbb{R}$, $\varphi_2(u) := \int \varphi(v)\varphi(u + v)\,dv$, let $c \equiv c_n$ be another window width, let $w$ be a compactly supported density on $\mathbb{R}$, and let
\[
\tilde h_n(x) := \frac{1}{nc}\sum_{j=1}^n \varphi\Big(\frac{x - \hat e_j}{c}\Big), \qquad
\hat C_n := \frac{1}{n^2 c^2}\sum_{i=1}^n \int \varphi^2\Big(\frac{v - \hat e_i}{c}\Big)\, w(v)\,dv,
\]
\[
\hat\Gamma_n := 2\int \tilde h_n^2(x)\, w^2(x)\,dx \int \varphi_2^2(u)\,du.
\]
Then, with $h_0(x, \hat\beta) := \int f_0(x + \hat\beta u)\, g(u)\,du$,
\[
W_n := n c^{1/2}\,\hat\Gamma_n^{-1}\Big(\int \big(\tilde h_n(x) - h_0(x, \hat\beta)\big)^2\, w(x)\,dx - \hat C_n\Big).
\]
In this simulation study, we chose the kernel function $\varphi$ to be the standard normal density and the bandwidth $c = n^{-0.27}$, and $w(\cdot)$ was chosen to be the uniform density on the closed interval $[-6, 6]$. All three tests reject $H_0$ for large values of their corresponding statistics.
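The bias-corrected estimators used above can be sketched as follows. This is an illustrative simulation, not the thesis code; here $u$ is generated as the sum of two independent Laplace$(0, \sigma_u/2)$ variables, whose characteristic function is exactly $\Phi_g(t) = 16/(4 + \sigma_u^2 t^2)^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta, su2 = 50000, 1.0, 1.0, 0.5
su = np.sqrt(su2)
X = rng.normal(0.0, 1.0, n)
eps = rng.normal(0.0, 0.5, n)                               # epsilon ~ N(0, 0.25) under H0
u = rng.laplace(0.0, su/2, n) + rng.laplace(0.0, su/2, n)   # Phi_g(t) = 16/(4 + su2*t^2)^2
Y, Z = alpha + beta * X + eps, X + u

S_ZZ = Z.var()                                      # sample variance of Z
S_ZY = ((Z - Z.mean()) * (Y - Y.mean())).mean()     # sample covariance of Z and Y
beta_naive = S_ZY / S_ZZ                            # attenuated toward beta/(1 + su2)
beta_hat = S_ZY / (S_ZZ - su2)                      # bias-corrected slope
alpha_hat = Y.mean() - beta_hat * Z.mean()          # bias-corrected intercept
```

The naive slope is attenuated by the factor $1/(1 + \sigma_u^2)$; subtracting $\sigma_u^2$ from $S_{ZZ}$ removes this bias, which is why the corrected estimators are used throughout the simulations.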
To assess the effect of the measurement error on the finite sample level and power of these tests, we conducted simulations for the three values $\sigma_u^2 = 0.25$, $\sigma_u^2 = 0.5$ and $\sigma_u^2 = 1$, with bandwidths $b = 0.5n^{-1/12}$, $b = 0.65n^{-1/12}$ and $b = 0.8n^{-1/12}$, respectively. It is well known that the approximation of the distributions of test statistics based on density estimators by their asymptotic distributions is generally slow. For that reason, in this simulation study we used the Monte Carlo method to obtain the critical values of all the tests considered. At level $0.05$, the critical values of all four tests were simulated by the Monte Carlo method, based on sample sizes 300 and 500, repeated 1000 times. The 95% quantiles were calculated over the 1000 repetitions, and the mean values of these quantiles were taken as the critical values, given in Table 3.1 for the different values of $\sigma_u^2$. The Monte Carlo level of the three tests $\hat T_n$, $T_{KS}$ and $T_{CvM}$ is relatively more robust against variation in the measurement error than that of the $W_n$ test.

  n    sigma_u^2   T-hat_n   T_KS      T_CvM     W_n
 300   0.25        1.18118   0.86714   0.16988   1.34799
 300   0.5         1.14531   0.87355   0.17395   1.40213
 300   1           1.15600   0.88576   0.18056   1.49112
 500   0.25        1.19852   0.86223   0.16929   1.40258
 500   0.5         1.20847   0.86674   0.17357   1.44800
 500   1           1.20484   0.87829   0.18073   1.52883

Table 3.1: Monte Carlo critical values of all the tests, ordinary smooth case.

The alternatives considered here are $t$-distributions with $k$ degrees of freedom, denoted $t_k$, for $k = 4, 6, 8, 10, 15, 20$, and the double exponential (DE) and logistic (L) distributions, all having zero mean and standard deviation $0.5$. The sample sizes chosen are 300 and 500 and the level is $0.05$.
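The Monte Carlo critical-value procedure can be illustrated in a stripped-down form. The sketch below is idealized and not the thesis code: it simulates the Kolmogorov–Smirnov statistic directly under the null distribution of the residuals, ignoring the estimation of $\alpha, \beta$ and the measurement error structure, and uses fewer repetitions than the 1000 used in the study.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(2)
n, reps = 300, 500
# N(0, 0.25) cdf: F0(x) = 0.5*(1 + erf(x / (sigma*sqrt(2)))) with sigma = 0.5
F0 = lambda x: 0.5 * (1.0 + np.array([erf(v) for v in x / (0.5 * np.sqrt(2))]))

stats = np.empty(reps)
for r in range(reps):
    e = np.sort(rng.normal(0.0, 0.5, n))     # idealized residuals drawn under H0
    i = np.arange(1, n + 1)
    Fe = F0(e)
    d = np.maximum(i / n - Fe, Fe - (i - 1) / n).max()
    stats[r] = np.sqrt(n) * d                # T_KS for this repetition

crit = np.quantile(stats, 0.95)              # Monte Carlo 5% critical value
```

Because no parameters are estimated in this idealized version, `crit` lands near the asymptotic Kolmogorov 95% point of about 1.358; with estimated parameters and the measurement error structure, the study obtains the smaller values reported in Table 3.1.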
From Table 3.2 we see that, in terms of empirical power, the $\hat T_n$ test dominates the $W_n$ test uniformly across the chosen alternatives, sample sizes and values of $\sigma_u^2$; for $n = 500$ it also dominates the $T_{KS}$ test at almost all chosen alternatives when $\sigma_u^2 = .5, 1$, while the $T_{CvM}$ test dominates all other tests for the smallest value of $\sigma_u^2$.

We also considered the following normal and logistic mixture alternatives:
\[
f_1 = 0.5\,N(-\mu, 0.25) + 0.5\,N(\mu, 0.25), \quad \mu > 0, \qquad
f_2 = 0.5\,\ell(-\lambda, 1.5/\pi) + 0.5\,\ell(\lambda, 1.5/\pi), \quad \lambda \ge 0,
\]
where $\ell(a, b)$ is the density of the logistic d.f. $1/(1 + e^{-(x-a)/b})$.

The empirical powers for the normal and logistic mixture alternatives are given in Table 3.3. In both cases the sample sizes are 300 and 500 and the level is $0.05$. From Table 3.3 one observes the following. First, as $\sigma_u^2$ increases, the empirical powers generally decrease. Secondly, for the alternatives $f_1$ and $\sigma_u^2 = 1$, the proposed test $\hat T_n$ based on the deconvolution density estimator has larger empirical powers than the $T_{KS}$ and $W_n$ tests in most of the cases, while the $T_{CvM}$ test dominates all other three tests in terms of empirical power. For $\sigma_u^2 = 0.25$, the $W_n$ test dominates the three tests $\hat T_n$, $T_{KS}$ and $T_{CvM}$ at all normal mixture alternatives. For $\sigma_u^2 = 0.5$, the $\hat T_n$ test has larger empirical powers than $T_{KS}$, but smaller empirical powers than $T_{CvM}$ and $W_n$. Similar phenomena can be seen in Table 3.3 for the alternatives $f_2$.
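Sampling from the mixture alternatives above is straightforward. The following sketch is illustrative (the choice $\mu = 0.4$ and the sample size are assumptions for the example); it draws from $f_1$ and checks its first two moments.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_f1(n, mu):
    """Draw from the normal mixture 0.5 N(-mu, 0.25) + 0.5 N(mu, 0.25)."""
    signs = rng.choice([-1.0, 1.0], size=n)      # pick a mixture component
    return rng.normal(signs * mu, 0.5, size=n)   # component sd is 0.5

x = sample_f1(200000, 0.4)
# by construction the mixture has mean 0 and variance 0.25 + mu^2 = 0.41
```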
  n    sigma_u^2  Test     t4     t6     t8     t10    t15    t20    DE     L
 300   0.25       T-hat_n  0.240  0.093  0.072  0.052  0.047  0.060  0.183  0.059
                  T_KS     0.191  0.084  0.062  0.058  0.041  0.053  0.150  0.064
                  T_CvM    0.268  0.103  0.065  0.053  0.053  0.062  0.226  0.068
                  W_n      0.035  0.023  0.023  0.027  0.027  0.039  0.033  0.036
       0.5        T-hat_n  0.093  0.072  0.038  0.051  0.052  0.047  0.050  0.050
                  T_KS     0.064  0.057  0.055  0.054  0.060  0.053  0.051  0.049
                  T_CvM    0.100  0.077  0.065  0.052  0.056  0.051  0.071  0.049
                  W_n      0.018  0.037  0.038  0.043  0.040  0.046  0.032  0.034
       1          T-hat_n  0.046  0.050  0.043  0.040  0.049  0.049  0.046  0.048
                  T_KS     0.052  0.046  0.046  0.050  0.055  0.047  0.047  0.040
                  T_CvM    0.042  0.052  0.047  0.050  0.057  0.060  0.054  0.047
                  W_n      0.037  0.041  0.050  0.039  0.046  0.038  0.043  0.038
 500   0.25       T-hat_n  0.397  0.158  0.092  0.078  0.068  0.060  0.344  0.097
                  T_KS     0.244  0.122  0.082  0.077  0.050  0.044  0.223  0.076
                  T_CvM    0.398  0.159  0.101  0.083  0.054  0.053  0.315  0.090
                  W_n      0.051  0.037  0.027  0.032  0.034  0.035  0.059  0.025
       0.5        T-hat_n  0.131  0.052  0.049  0.054  0.043  0.049  0.107  0.044
                  T_KS     0.113  0.070  0.055  0.062  0.052  0.064  0.106  0.053
                  T_CvM    0.162  0.059  0.057  0.062  0.045  0.069  0.129  0.052
                  W_n      0.059  0.043  0.034  0.041  0.048  0.049  0.043  0.037
       1          T-hat_n  0.069  0.050  0.045  0.052  0.061  0.047  0.060  0.063
                  T_KS     0.059  0.049  0.043  0.044  0.053  0.050  0.049  0.050
                  T_CvM    0.058  0.046  0.042  0.056  0.057  0.049  0.068  0.059
                  W_n      0.042  0.034  0.054  0.047  0.048  0.041  0.034  0.048

Table 3.2: Empirical powers against chosen alternatives, ordinary smooth case.

  n    sigma_u^2  Test     mu=0.2  mu=0.4  mu=0.6  mu=0.8   la=1.2  la=1.4  la=1.6  la=1.8
 300   0.25       T-hat_n  0.084   0.672   0.999   1.000    0.252   0.523   0.767   0.929
                  T_KS     0.078   0.577   0.998   1.000    0.236   0.460   0.703   0.889
                  T_CvM    0.101   0.787   1.000   1.000    0.367   0.651   0.862   0.966
                  W_n      0.137   0.818   1.000   1.000    0.419   0.687   0.874   0.968
       0.5        T-hat_n  0.071   0.457   0.973   1.000    0.162   0.363   0.551   0.742
                  T_KS     0.078   0.350   0.920   0.998    0.163   0.278   0.431   0.645
                  T_CvM    0.082   0.514   0.985   1.000    0.226   0.430   0.631   0.824
                  W_n      0.095   0.473   0.974   1.000    0.206   0.403   0.584   0.772
       1          T-hat_n  0.079   0.262   0.729   0.964    0.147   0.209   0.291   0.450
                  T_KS     0.059   0.199   0.593   0.924    0.117   0.164   0.228   0.347
                  T_CvM    0.079   0.265   0.760   0.974    0.153   0.233   0.328   0.496
                  W_n      0.075   0.212   0.655   0.949    0.132   0.181   0.255   0.404
 500   0.25       T-hat_n  0.113   0.893   1.000   1.000    0.390   0.747   0.947   0.992
                  T_KS     0.095   0.835   1.000   1.000    0.393   0.697   0.913   0.987
                  T_CvM    0.143   0.955   1.000   1.000    0.586   0.879   0.985   1.000
                  W_n      0.172   0.955   1.000   1.000    0.585   0.872   0.982   1.000
       0.5        T-hat_n  0.063   0.646   0.998   1.000    0.300   0.520   0.776   0.922
                  T_KS     0.091   0.529   0.990   1.000    0.247   0.444   0.668   0.858
                  T_CvM    0.084   0.709   0.999   1.000    0.373   0.619   0.845   0.961
                  W_n      0.094   0.639   0.999   1.000    0.322   0.531   0.790   0.934
       1          T-hat_n  0.089   0.394   0.898   0.996    0.182   0.304   0.446   0.627
                  T_KS     0.065   0.305   0.801   0.988    0.156   0.218   0.379   0.528
                  T_CvM    0.084   0.415   0.930   1.000    0.197   0.342   0.484   0.681
                  W_n      0.073   0.320   0.855   0.995    0.153   0.265   0.373   0.550

Table 3.3: Empirical powers against mixture normal (left panel) and logistic (right panel) alternatives, ordinary smooth case.

3.4.2 Super smooth case

Now consider the measurement error model (3.4.1), where again $X \sim N(0, 1)$ and $\varepsilon \sim N(0, 0.25)$, but $u \sim N(0, \sigma_u^2)$. The bias-corrected estimators are again used to estimate $\alpha$ and $\beta$.
The sinc kernel $K(x) = \sin x/(\pi x)$ is used for the deconvolution kernel estimator, with the bandwidths $b = 0.55(\log n)^{-0.5}$, $b = (\sqrt{0.5} + 0.05)(\log n)^{-0.5}$ and $b = 1.15(\log n)^{-0.4}$ when $\sigma_u^2 = 0.25$, $\sigma_u^2 = 0.5$ and $\sigma_u^2 = 1$, respectively. Thus $C_g = 1$, $\nu = 2/\sigma_u^2$, $\lambda_0 = 0$, $\lambda = 2$, $A = 1$ and $\omega = 0$ in (3.2.7). The left side of (3.2.7) can then be written as
\[
\hat T_{n,s} := \frac{2\pi n\,\sigma_u^2\,\hat\beta^2}{b\,\exp\big(|\hat\beta\sigma_u|^2/b^2\big)}\;\hat T_n.
\]
The Monte Carlo distribution of $\hat T_{n,s}$, for the sample size 1000 based on 1000 repetitions, is very close to $\chi_2^2/2$. Hence the critical values of this test were obtained from the $\chi_2^2/2$ distribution. To examine the power, we compared our test with the same three direct tests as in the previous section. We generated the critical values for $T_{KS}$, $T_{CvM}$ and $W_n$, defined as above, by the Monte Carlo method, based on sample sizes 500 and 1000, repeated 1000 times. The 95% quantiles were calculated over the 1000 repetitions and the mean values of these quantiles were taken as the critical values, listed in Table 3.4.

  n     sigma_u^2   T_KS      T_CvM     W_n
 500    0.25        0.85655   0.16540   1.39447
 500    0.5         0.85670   0.16500   1.45467
 500    1           0.85545   0.16446   1.53780
 1000   0.25        0.85038   0.16482   1.46119
 1000   0.5         0.85183   0.16535   1.51706
 1000   1           0.85210   0.16492   1.59195

Table 3.4: Monte Carlo critical values of the $T_{KS}$, $T_{CvM}$ and $W_n$ tests, super smooth case.

We consider the same alternatives as in the ordinary smooth case of Subsection 3.4.1. The empirical powers against the $t$, double exponential and logistic distributions are given in Table 3.5. From this table one sees that the proposed deconvolution test provides the largest empirical powers in all cases compared to the other three testing methods when $\sigma_u^2 = 1$, while it dominates the $W_n$ test for the smaller values of $\sigma_u^2$. The empirical powers against normal and logistic mixture alternatives are given in Table 3.6, for sample sizes 500 and 1000.
From this table we see that, for both the normal and logistic mixture alternatives, the $W_n$ test dominates the $\hat T_{n,s}$ and $T_{KS}$ tests for all chosen sample sizes and for all values of $\sigma_u^2$, while the $T_{CvM}$ test dominates all other tests uniformly.

  n     sigma_u^2  Test      t3     t4     t5     t6     t8     t10    DE     L
 500    0.25       T-hat_ns  0.207  0.118  0.111  0.087  0.072  0.061  0.121  0.075
                   T_KS      0.451  0.184  0.119  0.090  0.071  0.062  0.130  0.071
                   T_CvM     0.694  0.300  0.160  0.120  0.082  0.066  0.206  0.082
                   W_n       0.146  0.037  0.037  0.035  0.029  0.040  0.034  0.035
        0.5        T-hat_ns  0.143  0.105  0.087  0.080  0.065  0.069  0.105  0.072
                   T_KS      0.152  0.086  0.073  0.066  0.053  0.045  0.073  0.051
                   T_CvM     0.250  0.104  0.082  0.059  0.055  0.058  0.088  0.045
                   W_n       0.043  0.047  0.051  0.039  0.046  0.051  0.045  0.034
        1          T-hat_ns  0.155  0.093  0.088  0.071  0.074  0.058  0.070  0.075
                   T_KS      0.084  0.050  0.046  0.044  0.054  0.047  0.054  0.051
                   T_CvM     0.087  0.059  0.052  0.047  0.053  0.049  0.057  0.050
                   W_n       0.043  0.047  0.044  0.045  0.049  0.050  0.044  0.054
 1000   0.25       T-hat_ns  0.262  0.117  0.096  0.067  0.066  0.062  0.165  0.062
                   T_KS      0.714  0.364  0.193  0.123  0.090  0.067  0.274  0.080
                   T_CvM     0.960  0.551  0.304  0.171  0.114  0.088  0.403  0.101
                   W_n       0.499  0.098  0.051  0.036  0.038  0.037  0.095  0.044
        0.5        T-hat_ns  0.184  0.085  0.079  0.066  0.046  0.060  0.081  0.075
                   T_KS      0.273  0.102  0.090  0.072  0.056  0.048  0.100  0.046
                   T_CvM     0.468  0.164  0.120  0.090  0.060  0.053  0.095  0.048
                   W_n       0.092  0.041  0.041  0.044  0.041  0.049  0.046  0.050
        1          T-hat_ns  0.199  0.116  0.088  0.078  0.074  0.063  0.081  0.073
                   T_KS      0.094  0.062  0.051  0.054  0.048  0.044  0.049  0.047
                   T_CvM     0.135  0.072  0.057  0.053  0.063  0.047  0.050  0.039
                   W_n       0.066  0.035  0.047  0.058  0.043  0.055  0.050  0.042

Table 3.5: Empirical powers against alternative distributions, super smooth case.

  n     sigma_u^2  Test      mu=0.2  mu=0.4  mu=0.6  mu=0.8   la=1.2  la=1.4  la=1.6  la=1.8
 500    0.25       T-hat_ns  0.048   0.140   0.547   0.890    0.092   0.134   0.189   0.305
                   T_KS      0.079   0.800   1.000   1.000    0.389   0.645   0.853   0.974
                   T_CvM     0.136   0.930   1.000   1.000    0.565   0.844   0.961   0.997
                   W_n       0.143   0.908   1.000   1.000    0.502   0.810   0.942   0.997
        0.5        T-hat_ns  0.048   0.065   0.207   0.599    0.079   0.105   0.165   0.240
                   T_KS      0.042   0.171   0.803   0.997    0.199   0.326   0.549   0.765
                   T_CvM     0.049   0.263   0.938   1.000    0.297   0.512   0.732   0.897
                   W_n       0.045   0.191   0.835   0.999    0.195   0.362   0.558   0.783
        1          T-hat_ns  0.040   0.042   0.352   0.925    0.042   0.071   0.159   0.291
                   T_KS      0.049   0.099   0.452   0.865    0.115   0.164   0.286   0.342
                   T_CvM     0.059   0.128   0.590   0.950    0.145   0.237   0.412   0.523
                   W_n       0.054   0.085   0.359   0.828    0.081   0.127   0.227   0.279
 1000   0.25       T-hat_ns  0.063   0.150   0.684   0.979    0.076   0.110   0.208   0.356
                   T_KS      0.154   0.983   1.000   1.000    0.632   0.929   0.996   0.998
                   T_CvM     0.225   1.000   1.000   1.000    0.824   0.984   0.997   0.998
                   W_n       0.187   0.994   1.000   1.000    0.755   0.966   0.997   0.998
        0.5        T-hat_ns  0.070   0.153   0.553   0.892    0.070   0.108   0.169   0.271
                   T_KS      0.096   0.742   1.000   1.000    0.361   0.634   0.879   0.984
                   T_CvM     0.147   0.898   1.000   1.000    0.557   0.803   0.964   0.996
                   W_n       0.044   0.631   1.000   1.000    0.342   0.610   0.862   0.973
        1          T-hat_ns  0.034   0.275   0.968   1.000    0.094   0.231   0.365   0.625
                   T_KS      0.067   0.368   0.933   1.000    0.174   0.295   0.484   0.660
                   T_CvM     0.083   0.511   0.977   1.000    0.252   0.420   0.649   0.837
                   W_n       0.052   0.238   0.879   1.000    0.126   0.199   0.358   0.508

Table 3.6: Empirical powers against mixture normal (left panel) and logistic (right panel) distributions, super smooth case.

3.5 Proofs

Here we present the proofs of Theorems 3.2.1–3.3.7. For brevity, we write $T_n := T_n(\alpha, \beta)$ and $f_n(x) := f_n(x, \alpha, \beta)$ with known $\alpha, \beta$, and $\hat f_n(x) := f_n(x, \hat\alpha, \hat\beta)$.

Since $C_{V,b} \approx b^{-(4\kappa+1)}$, we first show that
\[
n b^{2\kappa} \int (\hat f_n - f_n)^2(x)\,dx = o_p(1). \qquad (3.5.1)
\]
Using Parseval's identity, we have
\[
\int (\hat f_n - f_n)^2(x)\,dx
= \frac{1}{4\pi^2}\int \Big|\int e^{-itx}\,\Phi_K(bt)\Big(\frac{\hat\Psi_n(t)}{\Phi_g(-\hat\beta t)} - \frac{\Psi_n(t)}{\Phi_g(-\beta t)}\Big)dt\Big|^2 dx
= \frac{1}{2\pi}\int |\Phi_K(bt)|^2\,\Big|\frac{\hat\Psi_n(t)}{\Phi_g(-\hat\beta t)} - \frac{\Psi_n(t)}{\Phi_g(-\beta t)}\Big|^2 dt
\]
\[
\le \frac{1}{\pi}\int |\Phi_K(bt)|^2\,\frac{|\hat\Psi_n(t) - \Psi_n(t)|^2}{|\Phi_g(-\hat\beta t)|^2}\,dt
+ \frac{1}{\pi}\int |\Phi_K(bt)\Psi_n(t)|^2\,\frac{|\Phi_g(-\hat\beta t) - \Phi_g(-\beta t)|^2}{|\Phi_g(-\beta t)\Phi_g(-\hat\beta t)|^2}\,dt
=: \frac{1}{\pi} S_1 + \frac{1}{\pi} S_2, \ \text{say}. \qquad (3.5.2)
\]
Since $\Phi_K$ is supported on $[-1, 1]$, $\Phi_K(bt) = 0$ for $|t| > 1/b$. Thus in the above two integrals $t \in [-1/b, 1/b]$. Since $\mu_g := \int \|x\|\, g(x)\,dx < \infty$, $\dot\Phi_g$ exists and is uniformly bounded above by $\mu_g$. This fact, together with (3.2.3) and assumption (A), implies
\[
|\Phi_g(-\hat\beta t) - \Phi_g(-\beta t)| \le \mu_g\,|t|\,\|\hat\beta - \beta\|, \qquad (3.5.3)
\]
\[
\max_{|t| \le 1/b}\Big|\frac{\Phi_g(-\hat\beta t)}{\Phi_g(-\beta t)} - 1\Big|
= \max_{|t| \le 1/b}\frac{|\Phi_g(-\hat\beta t) - \Phi_g(-\beta t)|}{|\Phi_g(-\beta t)|}
= O_p\big(n^{-1/2} b^{-\kappa-1}\big). \qquad (3.5.4)
\]
Let $A_n := \{|\Phi_g(-\hat\beta t)| \ge |\Phi_g(-\beta t)|/2,\ \forall\, t \in [-1/b, 1/b]\}$. Since $nb^{2\kappa+3} \to \infty$, (3.5.4) implies $P(A_n) \to 1$. Thus we need only restrict our attention to $A_n$.

Consider $S_2$. Conditions (A) and (B) imply that there exist $M$, $c_\beta$, $C_\beta$ and $C_f$ such that, for all $|t| > M$, $c_\beta |t|^{-\kappa} \le |\Phi_g(\beta t)| \le C_\beta |t|^{-\kappa}$ and $|\Phi_f(t)| \le C_f |t|^{-r}$. Take $n$ large enough so that $M < 1/b$. Split the integral in $S_2$ into the two ranges $|t| \le M$ and $|t| > M$. Then, by (3.2.3) and (3.5.3), we obtain that on the event $A_n$, $S_2$ is bounded above by
\[
4\mu_g^2\,\|\hat\beta - \beta\|^2 \int_{M < |t| \le 1/b} \frac{|t\,\Phi_K(bt)\Psi_n(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt + O_p(n^{-1})
\]
\[
\le 8\mu_g^2\,\|\hat\beta - \beta\|^2 \Big\{\int_{M < |t| \le 1/b} \frac{|t\,\Phi_K(bt)|^2\,|\Psi_n(t) - \Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt
+ \int_{M < |t| \le 1/b} \frac{|t\,\Phi_K(bt)\Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt\Big\} + O_p(n^{-1}).
\]
By Parseval's identity,
\[
T_n(\alpha, \beta) = \frac{1}{2\pi}\int |\Phi_K(bt)|^2\,\frac{|\Psi_n(t) - \Phi_h(t)|^2}{|\Phi_g(-\beta t)|^2}\,dt = O_p\big(n^{-1} b^{-2\kappa-1}\big), \qquad (3.5.5)
\]
because of (3.2.1) and (3.2.2). Because $|\Phi_g(\beta t)|^{-2} \le c_\beta^{-2}|t|^{2\kappa}$ for $|t| > M$, the first term within the curly brackets in the above bound is bounded above by a constant multiple of $b^{-2\kappa-2}\,T_n(\alpha, \beta) = O_p(n^{-1} b^{-4\kappa-3})$. Similarly, assumptions (A) and (B) imply
\[
\int_{|t| > M} \frac{|t\,\Phi_K(bt)\Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt
= \int_{|t| > M} \frac{|t\,\Phi_K(bt)\Phi_f(t)|^2}{|\Phi_g(-\beta t)|^2}\,dt
= O\big(b^{\min(2r-2\kappa-3,\,0)}\big).
\]
4 2 |Φ (−βt)| |Φ (−βt)| g g |t|>M |t|>M Hence, in view of (3.2.3), S2 = Op (n−2 b−4κ−3 ) + Op (n−1 bmin(2r−2κ−3,0) ) = op (n−1 b−2κ ). Next, to analyze S1 . Let 1 S11 := n2 |ΦK (bt)|2 | 1 S12 := n2 |ΦK (bt)|2 | 1 S13 := n2 |ΦK (bt)|2 | n ˆ j=1 t(β − β) it(Yj −α−β Zj ) 2 | Zj e |Φg (−βt)|2 it(Yj −α−β Zj ) 2 n | j=1 te 2 |Φg (−βt)| n ˆ j=1 t((β − β) 76 dt, it(Yj −α−β Zj ) 2 | Zj )2 e |Φg (−βt)|2 dt, dt. (3.5.6) Using the fact Yj − α − β Zj = εj − β uj , we obtain on the event An , S1 ≤ 4 (3.5.7) |ΦK (bt)|2 ˆ n (t) − Ψn (t)|2 |Ψ dt |Φg (−βt)|2 |ΦK (bt)|2 | 16 ≤ n2 n ˆ j=1 t(β − β) Zj e it(εj −β uj ) 2 | |Φg (−βt)|2 16(ˆ α − α)2 + n2 it(εj −β uj ) 2 n | j=1 te |Φg (−βt)|2 |ΦK (bt)|2 | |ΦK (bt)|2 | 16 + 2 2 n b n ˆ j=1 t((β − β) Zj )2 e dt dt it(εj −β uj ) 2 | |Φg (−βt)|2 16(ˆ α − α)4 + n 2 b2 it(εj −β uj ) 2 n | j=1 te |Φg (−βt)|2 |ΦK (bt)|2 | dt dt + Op n−3 b−2κ−7 = 16[S11 + (ˆ α − α)2 S12 + b−2 S13 + (ˆ α − α)4 b−2 S12 ] + Op n−3 b−2κ−7 , by (3.2.3), assumption (A), and the fact that n 3 j=1 |Zj | = Op (n). Now, consider S11 . p S11 ≤ p k=1 (βˆk − βk )2 n2 |ΦK (bt)|2 | it(εj −β uj ) 2 n | j=1 tZkj e |Φg (−βt)|2 Since X, u and ε are mutually independent, for any k = 1, · · · , p, EZk e it(εj −β uj ) = EXk Φh (t) + Φf (t)Euk e−itβ u . 77 dt We use this to obtain |ΦK (bt)|2 | 1 n2 it(εj −β uj ) 2 n | j=1 tZkj e 2 |Φg (−βt)| dt it(εj −β uj ) it(ε −β uj ) 2 n − EZkj e j ] j=1 [Zkj e b2 |Φg (−βt)|2 2 n |ΦK (bt)|2 j=1 EXk tΦh (t)] dt |Φg (−βt)|2 n −itβ u | |ΦK (bt)|2 j=1 tΦf (t)Euk e dt. |Φg (−βt)|2 |ΦK (bt)|2 3 ≤ n2 3 + 2 n 3 + 2 n (3.5.8) dt An argument similar to the one used in the proof of Theorem 1 in Holzmann et al. (2007) implies that the first summand in the upper bound of (3.5.8) is Op (n−1 b−2κ−3 ). The second summand is Op (1), by assumption (B), and Φh (t)/Φg (−βt) = Φf (t). 
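The object driving all of these bounds is the deconvolution kernel density estimator, obtained by Fourier inversion of a damped, deconvolved empirical characteristic function. The following minimal numerical sketch, which is not part of the formal development of this chapter, computes such an estimator under illustrative assumptions: standard normal model errors, Laplace measurement error (an ordinary smooth law with $\kappa = 2$), a kernel with $\Phi_K(t) = (1-t^2)^3$ supported on $[-1,1]$, and arbitrary choices of $n$ and $b$.

```python
import numpy as np

def deconv_kde(v, x_grid, b, sigma_u):
    """Deconvolution kernel density estimate of the error density from
    contaminated observations v_j = eps_j - u_j, with u_j ~ Laplace(0, sigma_u)
    (ordinary smooth: its characteristic function decays like |t|^-2).

    Phi_K(t) = (1 - t^2)^3 on [-1, 1], so Phi_K(bt) = 0 for |t| > 1/b,
    matching the compact support of Phi_K assumed in the proofs.
    """
    t = np.linspace(-1.0 / b, 1.0 / b, 1501)               # frequency grid
    dt = t[1] - t[0]
    phi_K = np.clip(1.0 - (b * t) ** 2, 0.0, None) ** 3    # kernel cf
    psi_n = np.exp(1j * np.outer(t, v)).mean(axis=1)       # empirical cf of v_j
    phi_g = 1.0 / (1.0 + (sigma_u * t) ** 2)               # Laplace cf (symmetric)
    integrand = phi_K * psi_n / phi_g                      # damped, deconvolved cf
    # Fourier inversion: f_hat(x) = (1/2pi) int e^{-itx} Phi_K(bt) Psi_n(t)/Phi_g(-t) dt
    inv = np.exp(-1j * np.outer(x_grid, t))
    return (inv * integrand).sum(axis=1).real * dt / (2.0 * np.pi)

rng = np.random.default_rng(0)
n, sigma_u, b = 500, 0.5, 0.3            # illustrative sample size and bandwidth
eps = rng.standard_normal(n)             # true error density: N(0, 1)
v = eps - rng.laplace(0.0, sigma_u, n)   # contaminated residuals
x = np.linspace(-6.0, 6.0, 241)
f_hat = deconv_kde(v, x, b, sigma_u)
mass = f_hat.sum() * (x[1] - x[0])       # total mass, close to 1
```

Because $\Phi_K(0)\Psi_n(0)/\Phi_g(0) = 1$, the estimate integrates to approximately one even though, like all deconvolution estimates, it may dip below zero in the tails.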
To analyze the third summand in the upper bound of (3.5.8), decompose the integral into the two ranges $|t|>M$ and $|t|\le M$, and use conditions (A) and (B) to show that the term with integration over $|t|\le M$ is $O_p(1)$, while the term with $|t|>M$ is of the order $O_p(b^{\min(2r-2\kappa-3,0)})$, thereby showing that the third summand in (3.5.8) is of the order $O_p(1)+O_p(b^{2r-2\kappa-3})$. Thus
\[
S_{11}=O_p(n^{-1}b^{-2\kappa-3})+O_p(1)+O_p(b^{2r-2\kappa-3}).
\tag{3.5.9}
\]
Similarly one obtains that $S_{12}$ and $S_{13}$ are of the same order as $S_{11}$. Then (3.5.7), (3.5.9), $nb^{2\kappa+3}\to\infty$ and $nb^{7/2}\to\infty$ imply
\[
nb^{2\kappa}S_1=O_p(n^{-1}b^{-3})+O_p(b^{2\kappa})+O_p(b^{2r-3})+O_p(n^{-2}b^{-7})=o_p(1).
\]
This together with (3.5.6) completes the proof of (3.5.1).

From (3.5.1) and (3.5.5) we obtain
\[
\hat T_n-T_n=\int(f_n-\hat f_n)^2(x)\,dx+2\int(f_n-\hat f_n)(f_n-K_b*f_0)(x)\,dx=o_p(n^{-1}b^{-2\kappa-1/2}),
\]
by (3.2.1) and (3.2.2). Hence, in view of (3.2.2),
\[
n\,C_{V,b}^{-1/2}\big(\hat T_n-C_{M,b}/(2\pi n)\big)\to_d N\big(0,\,1/2\pi^2\big).
\tag{3.5.10}
\]
To complete the proof of (3.2.4), it suffices to show that
\[
\text{(a)}\quad \Big|1-\frac{\hat C_{V,b}^{1/2}}{C_{V,b}^{1/2}}\Big|=o_p(b^{1/2}),\qquad
\text{(b)}\quad \Big|\frac{\hat C_{M,b}}{\hat C_{V,b}^{1/2}}-\frac{C_{M,b}}{C_{V,b}^{1/2}}\Big|=o_p(1).
\tag{3.5.11}
\]
To show (3.5.11)(a), recall $\psi(\beta,s,t):=\Phi_g(\beta t+\beta s)\,\Phi_f(t+s)$. Then
\[
\begin{aligned}
|C_{V,b}-\hat C_{V,b}|
&=\Big|\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2}{|\Phi_g(\beta t)|^2|\Phi_g(\beta s)|^2}\,|\psi(\beta,s,t)|^2\,ds\,dt
-\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2}{|\Phi_g(\hat\beta t)|^2|\Phi_g(\hat\beta s)|^2}\,|\psi(\hat\beta,s,t)|^2\,ds\,dt\Big|\\
&\le\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2\,\big||\Phi_g(\beta t)|^2-|\Phi_g(\hat\beta t)|^2\big|}{|\Phi_g(\beta t)|^2|\Phi_g(\hat\beta t)|^2|\Phi_g(\beta s)|^2}\,|\psi(\beta,s,t)|^2\,ds\,dt\\
&\quad+\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2\,\big||\Phi_g(\beta s)|^2-|\Phi_g(\hat\beta s)|^2\big|}{|\Phi_g(\hat\beta t)|^2|\Phi_g(\beta s)|^2|\Phi_g(\hat\beta s)|^2}\,|\psi(\beta,s,t)|^2\,ds\,dt\\
&\quad+\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2}{|\Phi_g(\hat\beta t)|^2|\Phi_g(\hat\beta s)|^2}\,\big||\psi(\beta,s,t)|^2-|\psi(\hat\beta,s,t)|^2\big|\,ds\,dt.
\end{aligned}
\]
In view of (3.5.4), the first term in the above bound is bounded from above by
\[
\max_{|t|\le 1/b}\Big|1-\frac{|\Phi_g(\beta t)|^2}{|\Phi_g(\hat\beta t)|^2}\Big|
\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2}{|\Phi_g(\beta t)|^2|\Phi_g(\beta s)|^2}\,|\psi(\beta,s,t)|^2\,ds\,dt
=O_p\big(n^{-1/2}b^{-1-\kappa}\,C_{V,b}\big).
\]
The other two terms in the above bound are bounded similarly.
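Parseval's identity is what licenses the repeated passage between the $x$-domain squared distances and their $t$-domain forms, as in (3.5.2) and (3.5.5). The toy computation below checks the identity numerically for a statistic of the form $T_n(\alpha,\beta)$; the error laws, kernel and bandwidth are illustrative assumptions only, not the choices fixed by the theorems.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma_u, b = 400, 0.5, 0.3
v = rng.standard_normal(n) - rng.laplace(0.0, sigma_u, n)  # eps - u, H0: eps ~ N(0,1)

t = np.linspace(-1.0 / b, 1.0 / b, 2001)
dt = t[1] - t[0]
phi_K = np.clip(1.0 - (b * t) ** 2, 0.0, None) ** 3   # kernel cf, support [-1, 1]
psi_n = np.exp(1j * np.outer(t, v)).mean(axis=1)      # empirical cf of the residuals
phi_g = 1.0 / (1.0 + (sigma_u * t) ** 2)              # Laplace cf (symmetric)
phi_h = np.exp(-t ** 2 / 2.0) * phi_g                 # cf of eps - u under H0

# t-domain form: T_n = (1/2pi) int |Phi_K(bt)|^2 |Psi_n - Phi_h|^2 / |Phi_g|^2 dt
T_t = (np.abs(phi_K * (psi_n - phi_h)) ** 2 / phi_g ** 2).sum() * dt / (2.0 * np.pi)

# x-domain form: T_n = int (f_n - K_b * f_0)^2 dx, with f_n - K_b * f_0 recovered
# by Fourier inversion of its transform Phi_K(bt) (Psi_n - Phi_h) / Phi_g
x = np.linspace(-10.0, 10.0, 801)
dx = x[1] - x[0]
diff_cf = phi_K * (psi_n - phi_h) / phi_g
diff = (np.exp(-1j * np.outer(x, t)) * diff_cf).sum(axis=1).real * dt / (2.0 * np.pi)
T_x = (diff ** 2).sum() * dx

rel_err = abs(T_x - T_t) / T_t   # the two forms agree up to discretization error
```

The agreement is close because the integrand is band-limited to $[-1/b, 1/b]$, so a fine $x$-grid on a wide interval captures essentially all of the $L_2$ mass.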
Together with (3.2.1) and $nb^{2\kappa+1}\to\infty$, we obtain
\[
\big|1-\hat C_{V,b}/C_{V,b}\big|=O_p(n^{-1/2}b^{-1-\kappa})=o_p(b^{1/2}),
\]
which implies (3.5.11)(a). Next, consider (3.5.11)(b). Applying (3.2.1), (3.5.11)(a) and $nb^{2\kappa+1}\to\infty$,
\[
\begin{aligned}
\big|\hat C_{M,b}/\hat C_{V,b}^{1/2}-C_{M,b}/C_{V,b}^{1/2}\big|
&\le \big|\hat C_{M,b}-C_{M,b}\big|\,\hat C_{V,b}^{-1/2}
+\frac{C_{M,b}}{C_{V,b}^{1/2}}\Big|1-\frac{C_{V,b}^{1/2}}{\hat C_{V,b}^{1/2}}\Big|\\
&\le \max_{|t|\le 1/b}\Big|1-\frac{|\Phi_g(\beta t)|^2}{|\Phi_g(\hat\beta t)|^2}\Big|\,C_{M,b}\,\hat C_{V,b}^{-1/2}+o_p(1)
=O_p(n^{-1/2}b^{-3/2-\kappa})=o_p(1).
\end{aligned}
\]
This completes the proof of (3.5.11), which combined with (3.5.10) also proves (3.2.4), thereby completing the proof of Theorem 3.2.1.

Proof of Theorem 3.2.2. Let $\zeta_\beta(b):=\exp(2\nu(\beta)/b^\lambda)$, $\beta\in\mathbb{R}$. We shall first show that
\[
\frac{n}{b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\zeta_\beta(b)}\int(\hat f_n-f_n)^2(x)\,dx=o_p(1).
\tag{3.5.12}
\]
The proof is similar to that in the ordinary smooth case; we list only the main differences. First, arguing as for (3.5.4), in the super smooth case assumption (A$'$) implies
\[
\max_{|t|\le 1/b}\Big|\frac{\Phi_g(-\hat\beta t)}{\Phi_g(-\beta t)}-1\Big|
=\max_{|t|\le 1/b}\Big|\frac{\Phi_g(-\hat\beta t)-\Phi_g(-\beta t)}{\Phi_g(-\beta t)}\Big|
=O_p\big(n^{-1/2}b^{-1+2\lambda_0}\,\zeta_\beta^{1/2}(b)\big).
\tag{3.5.13}
\]
Hence, with $A_n:=\{|\Phi_g(-\hat\beta t)|\ge|\Phi_g(-\beta t)|/2,\ t\in[-1/b,1/b]\}$, (3.2.6) implies $P(A_n)\to 1$.

Assumptions (B$'$) and (D$'$) imply that there exist constants $M,c_\beta,C_\beta<\infty$ such that, for $|t|>M$,
\[
c_\beta|t|^{\lambda_0}e^{-\nu(\beta)|t|^\lambda}\le|\Phi_g(t)|\le C_\beta|t|^{\lambda_0}e^{-\nu(\beta)|t|^\lambda},
\qquad |\Phi_f(t)|\le C_{g1}|t|^{-\lambda_1}.
\]
Also, on the event $A_n$, there exists some $\tilde\beta$ between $\hat\beta$ and $\beta$ such that $S_2$ is bounded from above by
\[
\frac{2\mu_g\|\hat\beta-\beta\|^2}{b^2}\int_{|t|\ge M}|\Phi_K(bt)|^2\,\frac{|\Psi_n(t)-\Phi_h(t)|^2+|\Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt+O_p(n^{-1}).
\]
Based on (3.2.5),
\[
\int_{|t|\ge M}\frac{|\Phi_K(bt)|^2\,|\Psi_n(t)-\Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt=O_p\big(n^{-1}b^{\lambda-1+2\lambda\omega+4\lambda_0}\,\zeta_\beta^2(b)\big).
\]
From Lemma 5 in van Es and Uh (2005), it follows that
\[
\int_{|t|\ge M}\frac{|\Phi_K(bt)\,\Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt=O_p\big(b^{2\lambda_0+2\lambda_1+\lambda(1+2\omega)}\,\zeta_\beta(b)\big).
\]
So when $\lambda_1>1$ and $n,b$ satisfy (3.2.6), we have $S_2=o_p\big(n^{-1}b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\zeta_\beta(b)\big)$.

Now consider $S_1$. Following the same argument as in (3.5.7), and using assumptions (A$'$) and (B$'$), we obtain
\[
S_1\le 8\,\|\hat\beta-\beta\|^2 b^{-2} S_{11}+8(\hat\alpha-\alpha)^2 b^{-2} S_{12}+O_p\Big(\frac{b^{2\lambda_0-1}\,\zeta_\beta(b)}{n^2b^4}\Big).
\tag{3.5.14}
\]
Consider $S_{11}$ first.
Similarly to (3.5.8), together with assumptions (A$'$)–(D$'$), we obtain
\[
S_{11}=O_p\big(n^{-1}b^{-1+2\lambda_0}\,\zeta_\beta(b)\big)+O_p(1)+O_p\big(b^{2\lambda_0-1+2\lambda_1+\lambda(1+2\omega)}\,\zeta_\beta(b)\big).
\tag{3.5.15}
\]
$S_{12}$ can be handled in the same way. Thus the above arguments, (3.2.6), (3.5.14) and (3.5.15) imply
\[
\frac{nS_1}{b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\zeta_\beta(b)}
=O_p(n^{-1}b^{-\lambda-2\lambda\omega})+O_p(n^{-1}b^{-\lambda-2\lambda\omega-1-2\lambda_0})
+O(b^{2\lambda_1-2})+O_p(n^{-1}b^{-\lambda-4-2\lambda\omega})=o_p(1).
\]
This completes the proof of (3.5.12). Combining this with (3.2.5), we obtain
\[
\hat T_n-T_n=\int(f_n-\hat f_n)^2(x)\,dx+2\int(f_n-\hat f_n)(f_n-K_b*f_0)(x)\,dx
=o_p\big(n^{-1}b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\zeta_\beta(b)\big).
\tag{3.5.16}
\]
Also, since $\hat\beta-\beta=O_p(n^{-1/2})$ and the first derivatives of $\nu(\beta)$ and $C(\beta)$ exist,
\[
\big|1-\exp\big(-2(\nu(\hat\beta)-\nu(\beta))/b^\lambda\big)\big|=o_p(1).
\tag{3.5.17}
\]
Then (3.5.16) and (3.5.17) yield (3.2.7), which completes the proof of Theorem 3.2.2.

Proof of Theorem 3.3.1. Define
\[
\tilde T_n:=\int\big(\hat f_n(x)-K_b*f_1(x)\big)^2\,dx.
\]
Arguing as in the proof of Theorem 3.2.1, we obtain
\[
n\,\tilde C_{V,b}^{-1/2}\big(\tilde T_n-\hat C_{M,b}/(2\pi n)\big)\to_d N(0,\,1/2\pi^2),
\tag{3.5.18}
\]
where $\tilde C_{V,b}$ is the same as $\hat C_{V,b}$ with $f$ replaced by $f_1$. Hence $\tilde C_{V,b}\asymp b^{-(4\kappa+1)}$. Next, consider
\[
nb^{2\kappa+1/2}\big(\hat T_n-\tilde T_n\big)
=nb^{2\kappa+1/2}\int\big(K_b*f_0(x)-K_b*f_1(x)\big)^2\,dx
+2nb^{2\kappa+1/2}\int\big(\hat f_n(x)-K_b*f_1(x)\big)\big(K_b*f_1(x)-K_b*f_0(x)\big)\,dx.
\]
Because $\int\big(K_b*f_0(x)-K_b*f_1(x)\big)^2\,dx\to\|f_1-f_0\|^2>0$ and $nb^{2\kappa+3}\to\infty$, the first term on the right hand side above is of the order $O(nb^{2\kappa+1/2})\to\infty$, while, by (3.5.18) and the Cauchy–Schwarz inequality, the second term is of the order $o_p(nb^{2\kappa+1/2})$. This completes the proof of Theorem 3.3.1.

The proofs of Theorems 3.3.2, 3.3.3 and 3.3.4 are similar to those of Theorems 3.3.1 and 3.2.1, and hence the details are omitted.

Proof of Theorem 3.3.5. For the sake of completeness, we first give a brief proof of (3.3.3). For $j=1,\cdots,n$, let
\[
D_j=\frac{1}{\pi}\int|\Phi_K(tb)|^2\Big(\frac{e^{it(\varepsilon_j-\beta'u_j)}}{\Phi_g(-\beta t)}-\Phi_{f_1}(t)\Big)\big(\Phi_{f_1}(t)-\Phi_{f_0}(t)\big)\,dt.
\]
Note that, since $K$ is symmetric, $D_j$ is real. Rewrite
\[
T_n-\|K_b*(f_1-f_0)\|^2=\int(f_n-K_b*f_1)^2\,dx+2\int(f_n-K_b*f_1)\big(K_b*(f_1-f_0)\big)\,dx.
\]
Recall (3.2.2) and that $nb^{4\kappa+2}\to\infty$.
Hence, the first term on the right hand side above is $O_p(n^{-1}b^{-2\kappa-1})=o_p(n^{-1/2})$. Using Parseval's equation, the second term can be written as $n^{-1}\sum_{j=1}^n D_j$. Note that the $D_j$'s are identically distributed, mutually independent r.v.'s (forming an array in $b$), with $ED_1=0$ and $\mathrm{Var}(D_1)$ converging to
\[
\tau_0^2:=\frac{1}{\pi^2}\int\!\!\int\frac{\Phi_h(t-s)}{\Phi_g(\beta s)\Phi_g(-\beta t)}\big(\Phi_{f_1}(s)-\Phi_{f_0}(s)\big)\big(\Phi_{f_1}(t)-\Phi_{f_0}(t)\big)\,ds\,dt
-\frac{1}{\pi^2}\Big(\int\Phi_{f_1}(-t)\big(\Phi_{f_1}(t)-\Phi_{f_0}(t)\big)\,dt\Big)^2
=\mathrm{Var}\Big(\frac{1}{\pi}\int e^{-it\varepsilon}\,\frac{\Phi_{f_1}(t)-\Phi_{f_0}(t)}{\Phi_g(\beta t)}\,dt\Big).
\]
Moreover,
\[
E|D_1|^4\le\frac{1}{\pi^4}\Big(\int\frac{\big(1+|\Phi_{f_1}(t)|\big)\big(|\Phi_{f_1}(t)|+|\Phi_{f_0}(t)|\big)}{|\Phi_g(-\beta t)|}\,dt\Big)^4=O(1),
\]
by assumption (B) with $r>\kappa+1$. Hence one obtains (3.3.3) by the Lindeberg–Feller CLT.

To complete the proof of Theorem 3.3.5, first consider the case where $\alpha$ is known, so that $\hat f_n$ is based on the residuals $Y_i-\alpha-\hat\beta'Z_i$ only. Without loss of generality, assume $\alpha=0$. Under the alternative $H_1$,
\[
n^{1/2}(\hat T_n-T_n)=n^{1/2}\int(\hat f_n-f_n)^2(x)\,dx
+2n^{1/2}\int(\hat f_n-f_n)(f_n-K_b*f_1)(x)\,dx
+2n^{1/2}\int(\hat f_n-f_n)\big(K_b*f_1-K_b*f_0\big)(x)\,dx.
\]
The same proof as that of (3.5.1) and $nb^{4\kappa+2}\to\infty$ imply
\[
n^{1/2}\int(\hat f_n-f_n)^2(x)\,dx=o_p(n^{-1/2}b^{-2\kappa})=o_p(1).
\tag{3.5.19}
\]
This fact, together with (3.2.4) and the Cauchy–Schwarz inequality, implies
\[
2n^{1/2}\int(\hat f_n-f_n)(f_n-K_b*f_1)(x)\,dx=o_p(n^{-1}b^{-3\kappa-1})=o_p(1).
\tag{3.5.20}
\]
To deal with the remaining term, let $\Delta_f(x):=(K_b*f_1-K_b*f_0)(x)$, and rewrite $\int(\hat f_n-f_n)(x)\,\Delta_f(x)\,dx$ as the sum of the following two terms:
\[
D_1:=\int\!\!\int e^{-itx}\,\Phi_K(bt)\,\frac{\sum_{j=1}^n\big(e^{it(\varepsilon_j-\beta'u_j+(\beta-\hat\beta)'Z_j)}-e^{it(\varepsilon_j-\beta'u_j)}\big)}{2\pi n\,\Phi_g(-\hat\beta t)}\,dt\,\Delta_f(x)\,dx,
\]
\[
D_2:=\int\!\!\int e^{-itx}\,\Phi_K(bt)\,\frac{\sum_{j=1}^n e^{it(\varepsilon_j-\beta'u_j)}}{2\pi n}\Big(\frac{1}{\Phi_g(-\hat\beta t)}-\frac{1}{\Phi_g(-\beta t)}\Big)\,dt\,\Delta_f(x)\,dx.
\]
Consider $D_1$ first. Since $nb^{4\kappa+2}\to\infty$ and $\kappa>1$, uniformly in $|t|\le 1/b$,
\[
\frac{1}{n}\sum_{j=1}^n\big(e^{it(\varepsilon_j-\beta'u_j+(\beta-\hat\beta)'Z_j)}-e^{it(\varepsilon_j-\beta'u_j)}\big)
=it(\beta-\hat\beta)'\,\frac{1}{n}\sum_{j=1}^n Z_j\,e^{it(\varepsilon_j-\beta'u_j)}+o_p(n^{-1/2}).
\]
Let
\[
C_0:=\int\!\!\int\frac{t\,e^{-itx}\,\Phi_K(bt)\,\sum_{j=1}^n\big[Z_{jk}e^{it(\varepsilon_j-\beta'u_j)}-EZ_{jk}e^{it(\varepsilon_j-\beta'u_j)}\big]}{2\pi n\,\Phi_g(-\beta t)}\,dt\,\Delta_f(x)\,dx.
\]
Then $EC_0=0$ and
\[
EC_0^2\le\frac{E|Z_k|^2}{n}\Big(\int\frac{|\Phi_K(bt)|}{|\Phi_g(-\beta t)|}\,dt\Big)^2\Big(\int|\dot\Delta_f(x)|\,dx\Big)^2
=O(n^{-1}b^{-2\kappa-2})=o(1).
\tag{3.5.21}
\]
Hence $C_0=o_p(1)$. Since $EZ_k e^{it(\varepsilon-\beta'u)}=\mu_{Z_k}\Phi_h(t)+\Phi_{f_1}(t)\,Eu_k e^{-i\beta'ut}$, assumption (B) with $r>\kappa+1$ and the relation $\Phi_h(t)=\Phi_{f_1}(t)\,\Phi_g(-\beta t)$ imply
\[
\int\!\!\int\frac{t\,e^{-itx}\,\Phi_K(bt)}{\Phi_g(-\beta t)}\,EZ_k e^{it(\varepsilon-\beta'u)}\,dt\,\Delta_f(x)\,dx=O(1).
\]
Together with (3.5.4) and $nb^{4\kappa+2}\to\infty$, the above analysis yields
\[
D_1+\frac{i(\hat\beta-\beta)'}{2\pi}\int\!\!\int\frac{t\,e^{-itx}\,\Phi_K(bt)}{\Phi_g(-\beta t)}\,EZ e^{it(\varepsilon-\beta'u)}\,dt\,\Delta_f(x)\,dx
=O_p(n^{-1}b^{-\kappa-1})=o_p(n^{-1/2}).
\tag{3.5.22}
\]
Next consider $D_2$. Uniformly in $|t|\le 1/b$,
\[
\frac{1}{\Phi_g(-\hat\beta t)}-\frac{1}{\Phi_g(-\beta t)}
=\sum_{k=1}^p\Big\{\frac{it(\beta_k-\hat\beta_k)\,Eu_k e^{-i\beta'ut}}{\Phi_g^2(-\beta t)}
-\frac{(\beta_k-\hat\beta_k)^2t^2\,Eu_k^2e^{-i\beta'ut}}{\Phi_g^2(-\beta t)}\Big\}
+O_p(n^{-3/2}b^{-3-2\kappa}).
\]
Let
\[
C_1:=\int\!\!\int\frac{t\,e^{-itx}\,\Phi_K(bt)\,\sum_{j=1}^n\big(e^{it(\varepsilon_j-\beta'u_j)}-\Phi_h(t)\big)\,Eu_ke^{-i\beta'ut}}{2\pi n\,\Phi_g^2(-\beta t)}\,dt\,\Delta_f(x)\,dx,
\]
\[
C_2:=\int\!\!\int\frac{t^2\,e^{-itx}\,\Phi_K(bt)\,\sum_{j=1}^n\big(e^{it(\varepsilon_j-\beta'u_j)}-\Phi_h(t)\big)\,Eu_k^2e^{-i\beta'ut}}{2\pi n\,\Phi_g^2(-\beta t)}\,dt\,\Delta_f(x)\,dx.
\]
Note that $EC_i=0$, $i=1,2$, and the same arguments as for (3.5.21) yield
\[
EC_1^2=O(n^{-1}b^{-4\kappa-2})=o(1),\qquad EC_2^2=O(n^{-1}b^{-4\kappa-4})=o(b^{-2}).
\]
Hence $C_1=o_p(1)$ and $C_2=o_p(b^{-1})$. Since $\Phi_h(t)=\Phi_{f_1}(t)\Phi_g(-\beta t)$ and $nb^{4\kappa+2}\to\infty$, by (3.5.4) and assumption (B) with $r>\kappa+1$ we obtain
\[
D_2-\frac{i(\hat\beta-\beta)'}{2\pi}\int\!\!\int\frac{t\,e^{-itx}\,\Phi_K(bt)\,\Phi_{f_1}(t)\,Eue^{-i\beta'ut}}{\Phi_g(-\beta t)}\,dt\,\Delta_f(x)\,dx=o_p(n^{-1/2}).
\tag{3.5.23}
\]
Also,
\[
-\frac{1}{2\pi}\int it\,e^{-itx}\,\Phi_K(bt)\,\Phi_{f_1}(t)\,dt=K_b*\dot f_1(x).
\]
Combining this with (3.5.22) and (3.5.23), we obtain
\[
2\int(\hat f_n-f_n)(x)\,\Delta_f(x)\,dx=(\hat\beta-\beta)'B_f+o_p(n^{-1/2}).
\]
Recalling (3.5.19) and (3.5.20), it follows immediately that
\[
n^{1/2}\big(\hat T_n-T_n-(\hat\beta-\beta)'B_f\big)=o_p(1).
\tag{3.5.24}
\]

Next, consider the case when the intercept parameter $\alpha$ is unknown. Let $a=\alpha-\hat\alpha$. Then
\[
\begin{aligned}
\hat T_n&=\int\big(f_n(x,\alpha,\hat\beta)-K_b*f_0(x+a)\big)^2\,dx\\
&=\int\big(f_n(x,\alpha,\hat\beta)-K_b*f_0(x)\big)^2\,dx
+\int\big(K_b*f_0(x+a)-K_b*f_0(x)\big)^2\,dx\\
&\quad-2\int\big(f_n(x,\alpha,\hat\beta)-K_b*f_0(x)\big)\big(K_b*f_0(x+a)-K_b*f_0(x)\big)\,dx.
\end{aligned}
\]
The first term on the right hand side above is $T_n(\alpha,\hat\beta)$, and from (3.5.24) we have
\[
n^{1/2}\big(T_n(\alpha,\hat\beta)-T_n-(\hat\beta-\beta)'B_f\big)=o_p(1).
\]
Because $\dot f_0$ exists and is finite, and $a=O_p(n^{-1/2})$, the second term is $O_p(n^{-1})$. To deal with the third term, rewrite the factor multiplying $-2$ as the sum of the following three terms:
\[
\int\big(f_n(x,\alpha,\hat\beta)-f_n(x)\big)\big(K_b*f_0(x+a)-K_b*f_0(x)\big)\,dx,
\]
\[
\int\big(f_n(x)-K_b*f_1(x)\big)\big(K_b*f_0(x+a)-K_b*f_0(x)\big)\,dx,
\]
\[
\int\big(K_b*f_1(x)-K_b*f_0(x)\big)\big(K_b*f_0(x+a)-K_b*f_0(x)\big)\,dx.
\]
By the Cauchy–Schwarz inequality, together with $a=O_p(n^{-1/2})$, (3.5.18) and (3.5.24), each of the first two terms above is $o_p(n^{-1/2})$. The finiteness of $\ddot f_0$ implies that the third term equals
\[
a\int\big(K_b*f_1(x)-K_b*f_0(x)\big)\,K_b*\dot f_0(x)\,dx+o_p(n^{-1/2}).
\]
The above analysis and (3.5.24) imply
\[
n^{1/2}\big(\hat T_n-T_n-(\hat\beta-\beta)'B_f-(\hat\alpha-\alpha)A_f\big)=o_p(1).
\tag{3.5.25}
\]
This fact together with (3.3.3) completes the proof of Theorem 3.3.5.

Proof of Theorem 3.3.6. For $\hat T_n$, recall (3.3.5), (3.3.6) and (3.5.25). Using the details in the proof of Theorem 3.3.5, we obtain
\[
\hat T_n-\|K_b*(f_1-f_0)\|^2=\frac{1}{n}\sum_{j=1}^n\big(D_j+\eta_jA_f+\zeta_j'B_f\big)+o_p(n^{-1/2}).
\]
Write
\[
\tau^2:=\mathrm{Var}\big(D_1+\eta_1A_f+\zeta_1'B_f\big).
\tag{3.5.26}
\]
Since $D_j+\eta_jA_f+\zeta_j'B_f$, $j=1,\cdots,n$, are i.i.d. zero mean r.v.'s (forming an array in $b$), with $E|D_1|^4=O(1)$, $E|\eta_1|^{2+\vartheta}<\infty$ and $E\|\zeta_1\|^{2+\vartheta}<\infty$ for some $\vartheta>0$, the claim (3.3.7) follows by the Lindeberg–Feller CLT, thereby completing the proof.

The proof of Theorem 3.3.7 is similar to the arguments in the proofs of Theorems 3.3.5 and 3.3.6, and hence the details are omitted.

BIBLIOGRAPHY

[1] Bachmann, D. and Dette, H. (2005). A note on the Bickel–Rosenblatt test in autoregressive time series. Statistics and Probability Letters, 74(3), 221–234.
[2] Bercu, B. and Portier, B. (2008). Kernel density estimation and goodness-of-fit test in adaptive tracking. SIAM Journal on Control and Optimization, 47(5), 2440–2457.
[3] Bickel, P. and Rosenblatt, M. (1973).
On some global measures of the deviations of density function estimates. The Annals of Statistics, 1, 1071–1095.
[4] Boldin, M. V. (1982). An estimate of the distribution of the noise in an autoregressive scheme. Theory of Probability and Its Applications, 27(4), 866–871.
[5] Boldin, M. V. (1990). On testing hypotheses in the sliding average scheme by the Kolmogorov–Smirnov and ω² tests. Theory of Probability and Its Applications, 34(4), 699–704.
[6] Borkowski, P. and Mielniczuk, J. (2012). Performance of variance function estimators for autoregressive time series of order one: asymptotic normality and numerical study. Control and Cybernetics, 41, 415–441.
[7] Butucea, C. (2004). Asymptotic normality of the integrated square error of a density estimator in the convolution model. Sort, 28(1), 9–26.
[8] Cheng, C.-L. and Van Ness, J. W. (1999). Statistical Regression with Measurement Error. John Wiley & Sons.
[9] Cheng, F. and Sun, S. (2008). A goodness-of-fit test of the errors in nonlinear autoregressive time series models. Statistics & Probability Letters, 78(1), 50–59.
[10] Carroll, R. J. and Hall, P. (1988). Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association, 83(404), 1184–1186.
[11] Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2012). Measurement Error in Nonlinear Models: A Modern Perspective. CRC Press.
[12] Delaigle, A. and Hall, P. (2006). On optimal kernel choice for deconvolution. Statistics and Probability Letters, 76(15), 1594–1602.
[13] Ducharme, G. R. and Lafaye de Micheaux, P. (2004). Goodness-of-fit tests of normality for the innovations in ARMA models. Journal of Time Series Analysis, 25(3), 373–395.
[14] Durbin, J. (1973). Distribution Theory for Tests Based on the Sample Distribution Function. CBMS Regional Conference Series in Applied Mathematics 9. SIAM, Philadelphia.
[15] Fan, J. (1991). Asymptotic normality for deconvolution kernel density estimators.
Sankhyā: The Indian Journal of Statistics, Series A, 53(1), 97–110.
[16] Fan, J. and Yao, Q. (1998). Efficient estimation of conditional variance functions in stochastic regression. Biometrika, 85(3), 645–660.
[17] Fan, J. and Yao, Q. (2002). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer Series in Statistics, Springer-Verlag New York, Inc.
[18] Freedman, D. A. (1975). On tail probabilities for martingales. The Annals of Probability, 3(1), 100–118.
[19] Fuller, W. A. (2009). Measurement Error Models. John Wiley & Sons.
[20] Horváth, L. and Zitikis, R. (2006). Testing goodness of fit based on densities of GARCH innovations. Econometric Theory, 22(3), 457–482.
[21] Jiang, J. (2001). Goodness-of-fit tests for mixed model diagnostics. The Annals of Statistics, 1137–1164.
[22] Khmaladze, E. V. (1982). Martingale approach in the theory of goodness-of-fit tests. Theory of Probability & Its Applications, 26(2), 240–257.
[23] Khmaladze, E. V. (1993). Goodness of fit problems and scanning innovation martingales. The Annals of Statistics, 21(2), 798–829.
[24] Khmaladze, E. V. and Koul, H. L. (2004). Martingale transforms goodness-of-fit tests in regression models. The Annals of Statistics, 32, 995–1034.
[25] Khmaladze, E. V. and Koul, H. L. (2009). Goodness-of-fit problem for errors in nonparametric regression. The Annals of Statistics, 37, 3165–3185.
[26] Koul, H. L. (1969). Asymptotic behavior of Wilcoxon type confidence regions in multiple linear regression. Annals of Mathematical Statistics, 40, 1950–1979.
[27] Koul, H. L. (1970). Some convergence theorems for ranks and weighted empirical cumulatives. Annals of Mathematical Statistics, 41, 1768–1773.
[28] Koul, H. L. (1991). A weak convergence result useful in robust autoregression. Journal of Statistical Planning and Inference, 29(3), 291–308.
[29] Koul, H. L. (2002). Weighted Empirical Processes in Dynamic Nonlinear Models. Second Edition. Lecture Notes in Statistics, 166. Springer-Verlag New York, Inc.
[30] Koul, H. L. and Ling, S. (2006). Fitting an error distribution in some heteroscedastic time series models. The Annals of Statistics, 34(2), 994–1012.
[31] Koul, H. L. and Mimoto, N. (2012). A goodness-of-fit test for GARCH innovation density. Metrika, 75(1), 127–149.
[32] Koul, H. L. and Song, W. (2012). A class of goodness-of-fit tests in linear errors-in-variables model. Journal de la Société Française de Statistique, 153(1), 52–70.
[33] Lee, S. and Na, S. (2002). On the Bickel–Rosenblatt test for first-order autoregressive models. Statistics and Probability Letters, 56, 23–35.
[34] Loynes, R. M. (1980). The empirical d.f. of residuals from generalized regression. The Annals of Statistics, 8, 285–298.
[35] Loubes, J.-M. and Marteau, C. (2004). Goodness-of-fit testing strategies from indirect observations. Journal of Nonparametric Statistics, 26(1), 85–99.
[36] Masry, E. (1996). Multivariate local polynomial regression for time series: uniform strong consistency and rates. Journal of Time Series Analysis, 17(6), 571–599.
[37] Müller, U. U., Schick, A., and Wefelmeyer, W. (2007). Estimating the error distribution function in semiparametric regression. Statistics & Decisions, 25(1), 1–18.
[38] Müller, U. U., Schick, A., and Wefelmeyer, W. (2009a). Estimating the innovation distribution in nonparametric autoregression. Probability Theory and Related Fields, 144(1-2), 53–77.
[39] Müller, U. U., Schick, A., and Wefelmeyer, W. (2009b). Estimating the error distribution function in nonparametric regression with multivariate covariates. Statistics & Probability Letters, 79(7), 957–964.
[40] Müller, U. U., Schick, A., and Wefelmeyer, W. (2012). Estimating the error distribution function in semiparametric additive regression models. Journal of Statistical Planning and Inference, 142(2), 552–566.
[41] Na, S. (2009). Goodness-of-fit test using residuals in infinite-order autoregressive models.
Journal of the Korean Statistical Society, 38(3), 287–295.
[42] Neumeyer, N. and Van Keilegom, I. (2010). Estimating the error distribution in nonparametric multiple regression with applications to model testing. Journal of Multivariate Analysis, 101(5), 1067–1078.
[43] Neumeyer, N. and Selk, L. (2013). A note on non-parametric testing for Gaussian innovations in AR-ARCH models. Journal of Time Series Analysis, 34(3), 362–367.
[44] Nikabadze, A. M. (1988). On a method for constructing goodness-of-fit tests for parametric hypotheses in R^m. Theory of Probability and Its Applications, 32(3), 539–544.
[45] Ojeda, J. L. (2008). Hölder continuity properties of the local polynomial estimator. Prepublicaciones del Seminario Matemático, 4, 1–21.
[46] Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 27(3), 832–837.
[47] Selk, L. and Neumeyer, N. (2013). Testing for a change of the innovation distribution in nonparametric autoregression: the sequential empirical process approach. Scandinavian Journal of Statistics, 40(4), 770–788.
[48] Soms, A. P. (1976). An asymptotic expansion for the tail area of the t-distribution. Journal of the American Statistical Association, 71, 728–730.
[49] Stefanski, L. A. and Carroll, R. J. (1990). Deconvolving kernel density estimators. Statistics, 21(2), 169–184.
[50] Tsigroshvili, Z. (1998). Some notes on goodness-of-fit tests and innovation martingales. Proceedings of A. Razmadze Mathematical Institute, 117, 89–102.
[51] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
[52] Van Es, A. and Uh, H.-W. (2004). Asymptotic normality of nonparametric kernel type deconvolution density estimators: crossing the Cauchy boundary. Nonparametric Statistics, 16(1-2), 261–277.
[53] Wu, W. B., Huang, Y. and Huang, Y. (2010). Kernel estimation for time series: an asymptotic theory.
Stochastic Processes and their Applications, 120(12), 2412–2431.
[54] Yao, Q. and Tong, H. (1994). Quantifying the influence of initial values on non-linear prediction. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 56(4), 701–725.