TAIL ESTIMATION OF THE SPECTRAL DENSITY UNDER FIXED-DOMAIN ASYMPTOTICS By Wei-Ying Wu A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Statistics 2011

ABSTRACT

TAIL ESTIMATION OF THE SPECTRAL DENSITY UNDER FIXED-DOMAIN ASYMPTOTICS

By Wei-Ying Wu

For spatial statistics, two asymptotic approaches are usually considered: increasing domain asymptotics and fixed-domain asymptotics (or infill asymptotics). Under increasing domain asymptotics, the sampled data increase with the increasing spatial domain, while under infill asymptotics, data are observed on a fixed region with the distance between neighboring observations tending to zero. The consistency and asymptotic results under these two asymptotic frameworks can be quite different. For example, not all parameters are consistently estimable under infill asymptotics, while consistency holds for those parameters under increasing domain asymptotics (Zhang 2004). For a stationary Gaussian random field on R^d with a spectral density f(λ) that satisfies f(λ) ∼ c |λ|^{−θ} as |λ| → ∞, the parameters c and θ control the tail behavior of the spectral density: θ is related to the smoothness of the random field, and c can be used to determine the orthogonality of probability measures for a fixed θ. Specifically, c corresponds to the microergodic parameter mentioned in Du et al. (2009) when a Matérn covariance is assumed. Additionally, under infill asymptotics, the tail behavior of the spectral density dominates the performance of prediction and determines the equivalence of probability measures. For these reasons, estimating c and θ is of significant statistical interest. When the explicit form of f is known, its corresponding covariance structure can be computed through the Fourier transform. Therefore, spatial domain methodologies such as the maximum likelihood estimator (MLE) or the tapered MLE can be used to estimate c and θ.
Unfortunately, the exact form of f is typically unknown in practice. In this situation, spatial domain methods cannot be applied without the covariance information. In my work, for data observed on grid points, two methods that utilize tail frequency information are proposed to estimate c and θ. One of them can be viewed as a weighted local Whittle type estimator. Under the proposed approaches, neither the explicit form of f nor a restriction on the dimension is necessary. The asymptotic properties of the proposed estimators under infill asymptotics (or fixed-domain asymptotics) are investigated in this dissertation, together with simulation studies.

ACKNOWLEDGMENT

I would like to give my deepest thanks to my supervisors, Professor Yimin Xiao and Professor Chae Young Lim, for their invaluable support and encouragement during my PhD studies at Michigan State University. They always patiently shared and explained their knowledge to me. Without their help and guidance, it would have been impossible for me to finish this work. Also, I thank Professor Chae Young Lim for providing me financial support from her MSU grant (MSU 08-IRGP-1532) for my studies in the year 2009-2010. I also appreciate the help from my other committee members, Professor Mark M. Meerschaert and Professor Zhengfang Zhou. Thank you for your detailed suggestions and the time spent on my dissertation. My next thanks go to my best friends and classmates, Wei-Wen Hsu, Gengxin Li, Hsiu-Ching Chang and Sumit Sinha, for their help and suggestions over the past years. I am indebted to Professor Lijian Yang, Ms. Suzanne Watson and Mr. Eric Segur for their tremendous help and constant support. I want to thank all the people who have helped and inspired me during my doctoral studies. Most importantly, I give my thanks to my wife Wei-Ling, my parents and other family members, whose patient love enabled me to complete this work.

TABLE OF CONTENTS

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Increasing domain and fixed-domain asymptotics . . . . . . . . . . 2
1.2 The tail behavior of the spectral density . . . . . . . . . . . . . . . 4
1.2.1 Equivalence of probability measures . . . . . . . . . . . . . . . . 5
1.2.2 Prediction under fixed-domain asymptotics . . . . . . . . . . . . 7

2 Main Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Preliminary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Asymptotic properties of a smoothed periodogram . . . . . . . . . 14
2.3 Approach I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.1 Estimation of c under the known θ . . . . . . . . . . . . . . . . 18
2.3.2 Estimation of θ under the known c . . . . . . . . . . . . . . . . 20
2.3.3 Estimation under unknown c and θ . . . . . . . . . . . . . . . . 21
2.4 Approach II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.1 Estimation of c under known θ . . . . . . . . . . . . . . . . . . 23
2.4.2 Estimation of θ under known c . . . . . . . . . . . . . . . . . . 24
2.4.3 Estimation under unknown θ and c . . . . . . . . . . . . . . . . 26

3 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

5 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.1 The properties of g_{c,θ}(λ) . . . . . . . . . . . . . . . . . . . . . 40
5.2 Proofs of Theorems in Section 2 . . . . . . . . . . . . . . . . . . . 45
5.2.1 Proofs of Theorems in Section 2.2 . . . . . . . . . . . . . . . . . 45
5.2.2 Proofs of Theorems in Section 2.3 . . . . . . . . . . . . . . . . . 47
5.2.3 Proofs of Theorems in Section 2.4 . . . . . . . . . . . . . . . . . 73

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

LIST OF TABLES

3.1 Estimation of θ under known c . . . . . . . . . . . . . . . . . . . . 31
3.2 Estimation of c under known θ . . . . . . . . . . . . . . . . . . . . 31
3.3 Estimation of θ under known c . . . . . . . . . . . . . . . . . . . . 31
3.4 Estimation of θ under known c . . . . . . . . . . . . . . . . . . . . 32
3.5 Estimation of c under known θ . . . . . . . . . . . . . . . . . . . . 32
3.6 Estimation of θ under known c . . . . . . . . . . . . . . . . . . . . 32
3.7 Estimation of θ under known c (Second approach) . . . . . . . . . . 33
3.8 Estimation of θ under unknown c for Example 1 . . . . . . . . . . . 33
3.9 Estimation of θ under unknown c for Example 2 . . . . . . . . . . . 33

LIST OF FIGURES

3.1 Histogram of Example 1 on different c. . . . . . . . . . . . . . . . . 34
3.2 Histogram of Example 2 on different c. . . . . . . . . . . . . . . . . 35
3.3 Histogram of Example 2 with different grid sizes on wrong c. . . . . 36

Chapter 1

Introduction

With recent advances in technology, we are facing enormous amounts of data. When such data sets are observed on a regular grid, spectral analysis is popular due to the fast computation afforded by the Fast Fourier Transform. For example, parameters of the spectral density of a stationary lattice process can be estimated using a Whittle likelihood [Whittle (1954)], which is computationally more efficient than the maximum likelihood method in the spatial domain. In my dissertation, I propose new methodologies, developed from the perspective of spectral analysis, to estimate parameters that control the tail behavior of the spectral density of a stationary Gaussian random field under fixed-domain asymptotics, one of the two standard sampling schemes in spatial statistics. The other sampling scheme is increasing domain asymptotics. Before explaining my research problem, I first introduce these two sampling schemes and their differences.
1.1 Increasing domain and fixed-domain asymptotics

Spatial data on a grid can often be regarded as a realization of a random field on a lattice. That is, for a random field Z(s) on R^d, data are observed at ϕJ for J ∈ ∏_{j=1}^d {1, · · · , m_j}, where ϕ is the grid length. When ϕ is fixed and the sample size is increasing (increasing domain asymptotics), asymptotic properties of parameter estimates in the spectral domain have been studied by many authors [see, e.g., Whittle (1954), Guyon (1982, 1995), Boissy et al. (2005) and Guo et al. (2009)]. For example, Guyon (1982) studied asymptotic properties of estimators using a Whittle likelihood or its variants when a parametric model is assumed for the spectral density of a stationary process on a lattice. Guo et al. (2009) studied asymptotic properties of estimators of long-range dependence parameters for anisotropic spatial linear processes using a local Whittle likelihood method, in which a parametric form is assumed only near zero frequency. This is an extension of Robinson's (1995) work on time series.

For spatial data, it is often natural to assume that the data are observed on a bounded domain of interest; therefore, more observations on the bounded domain means that the distance between observations, ϕ, decreases as the number of observations increases. This sampling scheme requires a different asymptotic framework, called fixed-domain asymptotics [Stein (1999)] (or infill asymptotics [Cressie (1993)]). It has been shown that asymptotic results under fixed-domain asymptotics can be different from those under increasing-domain asymptotics [see, e.g., Mardia and Marshall (1984), Ying (1991, 1993), and Zhang (2004)]. For example, Zhang (2004) showed that not all parameters in the Matérn covariance model of a stationary Gaussian random field on R^d are consistently estimable when d is smaller than or equal to 3.
He also showed that a reparameterized quantity, which is a function of the variance and scale parameters, can be estimated consistently by the maximum likelihood method. On the other hand, under increasing-domain asymptotics, the maximum likelihood estimators (MLEs) of the variance and scale parameters of a stationary Gaussian process are consistent and asymptotically normal [Mardia and Marshall (1984)].

Although not all parameters can be estimated consistently under fixed-domain asymptotics, a microergodic parameter can be [see, e.g., Ying (1991, 1993), Zhang (2004), Zhang and Zimmerman (2005), Du et al. (2009), and Anderes (2010)]. The microergodicity of functions of parameters determines the equivalence of probability measures, and a microergodic parameter is the quantity that affects the asymptotic mean squared prediction error under fixed-domain asymptotics [Stein (1990, 1999)].

Although more asymptotic results have become available recently under fixed-domain asymptotics, they are still few in contrast with the vast literature on increasing-domain asymptotics. Also, most results are for specific models of covariance functions. For example, Ying (1991, 1993) and Chen et al. (2000) studied asymptotic properties of estimators of a microergodic parameter in the exponential covariance function, while Zhang (2004), Loh (2005), Kaufman et al. (2008), Du et al. (2009) and Anderes (2010) investigated asymptotic properties of estimators for the Matérn covariance function. For the estimation of the fractal dimension in the spatial domain under fixed-domain asymptotics, Constantine and Hall (1994) estimated the effective fractal dimension using the variogram of a non-Gaussian stationary process on R. Chan and Wood (2004) introduced an increment-based estimator of the fractal dimension of a function of a stationary Gaussian random field on R^d when d = 1 or 2. These asymptotic results are established in the spatial domain.
Asymptotic results in the spectral domain under fixed-domain asymptotics are even scarcer. Stein (1995) studied asymptotic properties of a spatial periodogram of a filtered version of a stationary Gaussian random field. Lim and Stein (2008) extended the results of Stein (1995) and showed asymptotic normality of a smoothed spatial cross-periodogram under fixed-domain asymptotics. Regarding parameter estimation in the spectral domain under fixed-domain asymptotics, Chan et al. (1995) proposed a periodogram-based estimator of the fractal dimension of a stationary Gaussian random field when d = 1.

From the above discussion, it follows that the properties under increasing domain and fixed-domain asymptotics are quite different, and that more research is needed for fixed-domain asymptotics. In the next section, I introduce my research problem under fixed-domain asymptotics.

1.2 The tail behavior of the spectral density

In this dissertation, I propose estimators of parameters that control the tail behavior of the spectral density of a stationary Gaussian random field when the data are observed on a grid within a bounded domain, and I study their asymptotic properties under fixed-domain asymptotics. Let f(λ) be the spectral density of a stationary Gaussian random field Z(s) on R^d, and assume that

f(λ) ∼ c |λ|^{−θ} as |λ| → ∞, λ ∈ R^d, (1.1)

where | · | is the usual Euclidean norm and θ > d to ensure integrability of f. That is, we only assume a power law for the tail behavior of the spectral density and do not assume any specific parametric form of the spectral density. In the following subsections, the interest in the tail behavior is motivated from two perspectives: the equivalence of probability measures and prediction.

1.2.1 Equivalence of probability measures

Two probability measures P_1 and P_2 on a measurable space (Ω, F) are equivalent, denoted by P_1 ≡ P_2, if for any A ∈ F, P_1(A) = 0 implies P_2(A) = 0 and vice versa.
We usually assume F is generated by the paths of the process {Z(s), s ∈ D}. When the process is stationary, many criteria based on spectral densities have been developed to determine the equivalence of probability measures [see, e.g., Ibragimov (1978), Yadrenko (1983) and Du (2009a)].

Theorem 1. (Yadrenko (1983)) Let P_i, i = 1, 2, be two probability measures such that under P_i, the process {Z(s), s ∈ R^d} is stationary Gaussian with mean 0 and second-order spectral density f_i(λ), λ ∈ R^d. If, for some θ > d, f_1(λ)|λ|^θ is bounded away from 0 and ∞ as |λ| → ∞, and for some finite c,

∫_{|λ|>c} { (f_2(λ) − f_1(λ)) / f_1(λ) }^2 dλ < ∞, (1.2)

then P_1 ≡ P_2 on the paths of Z(s), s ∈ D, for any bounded subset D ⊂ R^d.

The integrability in (1.2) is determined by the tails of the spectral densities. For example, if the f_i(λ) are isotropic, i.e., depend only on |λ|, then (1.2) holds when there exists some ϵ > 0 such that

f_1(λ)/f_2(λ) − 1 = O(|λ|^{−(d/2+ϵ)}) as |λ| → ∞. (1.3)

This implies that the equivalence of probability measures can be verified from the decay rates of the spectral densities. Many applications of the equivalence of measures have been explored to reduce the computational burden, such as the tapering method. Let l_n(θ) be the log likelihood of the observed data:

l_n(θ) = −(n/2) log(2π) − (1/2) log[det V_n] − (1/2) X_n' V_n^{−1} X_n, (1.4)

where n is the sample size, X_n is the data vector and V_n is the covariance matrix. The computational cost of obtaining the maximum likelihood estimator (MLE) can be expensive. To reduce the computational burden, a tapering method on the covariance function can be used:

Ṽ(l, θ) = V(l, θ) ◦ V_tap(l),

where V(l, θ) is the covariance function of the underlying process that depends on the parameter θ (possibly a set of parameters), V_tap(l) is the taper, a known positive function that is 0 beyond a threshold distance, and "◦" is the Schur or Hadamard product.
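As a small aside, the Schur-product construction just described can be sketched in a few lines. This is only an illustration: the exponential covariance and the spherical taper below are common choices used here for concreteness, not models prescribed by this dissertation, and the grid and range values are arbitrary.

```python
import math

def exp_cov(h, sigma2=1.0, alpha=1.0):
    # exponential covariance C(h) = sigma^2 * exp(-alpha * |h|) (illustrative choice)
    return sigma2 * math.exp(-alpha * abs(h))

def spherical_taper(h, r=0.5):
    # a valid taper: nonnegative, equal to 1 at 0, exactly 0 beyond the range r
    x = abs(h) / r
    return (1 - 1.5 * x + 0.5 * x ** 3) if x < 1 else 0.0

# locations on a fixed (bounded) domain [0, 1]
locs = [i / 10 for i in range(11)]
V = [[exp_cov(s - t) for t in locs] for s in locs]

# Schur (elementwise) product with the taper: V_tilde = V o V_tap
V_tilde = [[V[i][j] * spherical_taper(locs[i] - locs[j])
            for j in range(len(locs))] for i in range(len(locs))]

# entries beyond the taper range are exactly zero, giving a sparse matrix
print(V_tilde[0][10])  # distance 1.0 > r = 0.5 -> 0.0
```

The point of the construction is visible in the last line: every covariance entry whose lag exceeds the taper range is exactly zero, which is what makes the tapered likelihood cheaper to evaluate.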
By replacing V(l, θ) with Ṽ(l, θ), the tapered likelihood is obtained as

l_{n,tap}(θ) = −(n/2) log(2π) − (1/2) log[det Ṽ_n(l, θ)] − (1/2) X_n' Ṽ_n(l, θ)^{−1} X_n. (1.5)

The consistency of the estimator based on l_{n,tap}(θ) holds if the probability measure under Ṽ(l, θ) is equivalent to the one under V(l, θ) [see Zhang (2004)]. More theoretical discussion of the tapering method can be found in Chapter 3 of Du (2009a).

1.2.2 Prediction under fixed-domain asymptotics

Another motivation for studying the tail behavior of the spectral density comes from its role in prediction. In spatial statistics, the best linear unbiased prediction is called kriging. Let Z(s) be a mean zero stationary process, and suppose the data are sampled at locations {s_1, s_2, s_3, . . .} which are dense in a bounded region D ⊆ R^d; that is, infill sampling is used. Further, let s* be a new location that we would like to explore. Let Ẑ(s*, n) be the best linear unbiased prediction of Z(s*) based on the data Z(s_1), Z(s_2), . . . , Z(s_n), and let e(s*, n) be the error between Z(s*) and Ẑ(s*, n). The following theorem compares the prediction performance under a correct measure P_1 and a misspecified measure P_2.

Theorem 2. (Stein (1999), p. 252) Let Z(s) be a mean zero stationary Gaussian random field under probability measure P_i with spectral density f_i, for i = 1, 2. If there exists some ρ > 0 such that f_1(λ)|λ|^ρ is bounded away from 0 and ∞, and f_2(λ)/f_1(λ) → 1 as |λ| → ∞, then

lim_{n→∞} E_1( e_2(s*, n) − e_1(s*, n) )^2 / E_1( e_1(s*, n) )^2 = 0,

lim_{n→∞} E_2( e_2(s*, n) )^2 / E_1( e_2(s*, n) )^2 = 1, (1.6)

where E_i(·) and e_i(·) are the expectation and prediction error under probability measure P_i, for i = 1, 2.

The above result means that, no matter which of the two probability measures is used, prediction performance is asymptotically equivalent under fixed-domain sampling if the tail behavior of f_2 matches that of f_1.
Thus, understanding the tail behavior of the spectral density is of great importance in spatial statistics.

In this dissertation, we introduce two approaches to estimate the parameters that control the tail behavior of the spectral density, that is, c and θ in (1.1). One of the proposed estimators is obtained by minimizing an objective function that can be viewed as a weighted Whittle likelihood, in which Fourier frequencies near a pre-specified non-zero frequency are considered. This approach is similar to the local Whittle likelihood method introduced by Robinson (1995) for estimating a long-range dependence parameter in time series analysis. For a stationary lattice process, Robinson (1995) proposed to estimate a long-range dependence parameter by minimizing the Whittle likelihood over Fourier frequencies near zero, since the long-range dependence parameter is controlled by the behavior of the spectral density near zero. In contrast, we are interested in estimating parameters that govern the spectral density of a random field when the frequency is very large, so we need to focus on Fourier frequencies that are away from zero.

In our work, we establish consistency and asymptotic normality of the estimators of c and of θ, respectively, when the other parameter is known. Some properties are also discussed when both parameters are unknown. In particular, if the Matérn covariance model is considered, c is related to a microergodic parameter. Consider the Matérn spectral density given by

f(λ) = σ^2 α^{2ν} / ( π^{d/2} (α^2 + |λ|^2)^{ν+d/2} ), λ ∈ R^d. (1.7)

The Matérn spectral density has three parameters (σ^2, α, ν), where σ^2 is the variance parameter, α is the scale parameter and ν is the smoothness parameter. Since the Matérn spectral density satisfies

f(λ) ∼ (σ^2 α^{2ν} / π^{d/2}) |λ|^{−(2ν+d)} as |λ| → ∞,

we have c ≡ σ^2 α^{2ν}/π^{d/2} and θ ≡ 2ν + d, and σ^2 α^{2ν} is a microergodic parameter. Thus, estimating σ^2 α^{2ν} when ν is known is equivalent to estimating c when θ is known.
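The Matérn-to-(c, θ) mapping above can be checked numerically. The sketch below uses arbitrary illustrative parameter values (σ² = 2, α = 1.5, ν = 0.5, d = 1) and verifies that f(λ)|λ|^θ approaches c = σ²α^{2ν}/π^{d/2} as |λ| grows:

```python
import math

def matern_sd(lam, sigma2, alpha, nu, d=1):
    # Matérn spectral density as parameterized in (1.7)
    return sigma2 * alpha**(2 * nu) / (math.pi**(d / 2) * (alpha**2 + lam**2)**(nu + d / 2))

sigma2, alpha, nu, d = 2.0, 1.5, 0.5, 1
c = sigma2 * alpha**(2 * nu) / math.pi**(d / 2)   # tail constant c = sigma^2 alpha^{2 nu} / pi^{d/2}
theta = 2 * nu + d                                # tail exponent theta = 2 nu + d

# f(lam) * |lam|^theta / c should approach 1 as |lam| grows
for lam in (10.0, 100.0, 1000.0):
    print(lam, matern_sd(lam, sigma2, alpha, nu, d) * lam**theta / c)
```

The printed ratios approach 1 at the rate (1 + α²/λ²)^{−(ν+d/2)}, confirming that the tail of (1.7) is exactly of the power-law form (1.1).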
There are several references that investigate estimation of σ^2 α^{2ν} in the spatial domain. Zhang (2004) showed that σ^2 and α can be estimated only in the form σ^2 α^{2ν} under fixed-domain asymptotics when ν is known and d ≤ 3. Du et al. (2009) investigated asymptotic properties of the MLE and the tapered MLE of σ^2 α^{2ν} when ν is known, α is fixed and d = 1 for a stationary Gaussian process. Anderes (2010) proposed an increment-based estimator of σ^2 α^{2ν} for a geometric anisotropic Matérn covariance function and showed that α can be estimated separately when d > 4.

The parameter θ is related to the fractal index (or fractal dimension) when {Z(s), s ∈ R^d} is a stationary isotropic Gaussian process. For example, for a stationary Gaussian random field on R^d, suppose that its covariance function C(t) satisfies

C(t) ∼ C(0) − k|t|^α as |t| → 0 (1.8)

for some k and 0 < α ≤ 2. In this case, α is the fractal index that governs the roughness of the sample paths of the process, and the fractal dimension D becomes D = d + (1 − α/2). This follows from Theorem 5.1 in Xue and Xiao (2010). When α = 2 in (1.8), the sample functions may be differentiable; this can be determined by the smoothness of C(t) in terms of the spectral measure of {Z(s), s ∈ R^d}. Further information can be found in Adler and Taylor (2007) and Xue and Xiao (2010). By an Abelian-type theorem, when (1.8) holds, the corresponding spectral density satisfies

f(λ) ∼ k′ |λ|^{−(α+d)} as |λ| → ∞,

so that θ ≡ α + d in our setting.

The rest of this dissertation is organized in the following manner. In Chapter 2, we explain our settings and assumptions, extend the results in Stein (1995) and Lim and Stein (2008) to a more relaxed condition, and then introduce our estimators and state theorems on the asymptotic properties of the proposed estimators. A simulation study is presented in Chapter 3. In Chapter 4, we discuss some issues related to our approach and possible extensions of the current work.
In the final chapter, we give proofs of our theoretical results.

Chapter 2

Main Results

2.1 Preliminary

In this work, we consider a stationary Gaussian random field Z(s) on R^d with spectral density f(λ) that satisfies (1.1). Define a lattice process Y_ϕ(J) by Y_ϕ(J) ≡ Z(ϕJ), where J ∈ Z^d, the set of d-dimensional integer-valued vectors. The corresponding spectral density of Y_ϕ(J) is

f̄_ϕ(λ) = ϕ^{−d} ∑_{Q ∈ Z^d} f( (λ + 2πQ)/ϕ ), for λ ∈ (−π, π]^d.

Typically, f̄_ϕ(λ) has a peak near the origin which gets higher as ϕ → 0. This causes a problem when estimating the spectral density using the periodogram [Stein (1995)]. To alleviate the problem, we difference the data with a discrete Laplacian operator, as proposed by Stein (1995). The Laplacian operator is defined by

∆_ϕ Z(s) = ∑_{j=1}^d { Z(s + ϕe_j) − 2Z(s) + Z(s − ϕe_j) },

where e_j is the unit vector whose jth entry is 1. Depending on the behavior of the spectral density at high frequencies, we can apply the Laplacian operator iteratively to control the peak near the origin. Define Y_ϕ^τ(J) ≡ (∆_ϕ)^τ Z(ϕJ) as the lattice process obtained by applying the Laplacian operator τ times. Then, as shown by Stein (1995), its corresponding spectral density becomes

f̄_ϕ^τ(λ) = { ∑_{j=1}^d 4 sin^2(λ_j/2) }^{2τ} f̄_ϕ(λ). (2.1)

Under condition (1.1), the limit of f̄_ϕ^τ(λ) as ϕ → 0, after scaling by ϕ^{d−θ}, is

ϕ^{d−θ} f̄_ϕ^τ(λ) → c { ∑_{j=1}^d 4 sin^2(λ_j/2) }^{2τ} ∑_{Q ∈ Z^d} |λ + 2πQ|^{−θ} for λ ≠ 0.

Define

g_{c,θ}(λ) = c { ∑_{j=1}^d 4 sin^2(λ_j/2) }^{2τ} ∑_{Q ∈ Z^d} |λ + 2πQ|^{−θ} for λ ∈ (−π, π]^d \ {0}, and g_{c,θ}(0) = 0. (2.2)

The limit function g_{c,θ}(λ) is integrable when τ is chosen such that 4τ − θ > −d. When d = 1, simple differencing is preferable, as discussed in Stein (1995); then 4τ is replaced with 2τ in our results. Now suppose that Z(s) is observed on the lattice ϕJ.
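For concreteness, the limit function g_{c,θ} in (2.2) can be evaluated numerically by truncating the sum over Q. The sketch below takes d = 1 with illustrative choices τ = 1, θ = 2 and a truncation level qmax; none of these values are prescribed by the dissertation.

```python
import math

def g(lam, c, theta, tau=1, qmax=200):
    # g_{c,theta}(lambda) in (2.2) for d = 1, with the sum over Q truncated at |Q| <= qmax
    if lam == 0.0:
        return 0.0
    factor = (4 * math.sin(lam / 2) ** 2) ** (2 * tau)
    tail = sum(abs(lam + 2 * math.pi * q) ** (-theta) for q in range(-qmax, qmax + 1))
    return c * factor * tail

# integrability requires 4*tau - theta > -d; here 4*1 - 2 > -1 holds
# (for theta = 2, tau = 1, the exact value at lambda = pi/2 works out to 2*c)
print(g(math.pi / 2, c=1.0, theta=2.0))
```

The truncation error is O(qmax^{1−θ}), so a few hundred terms suffice for the moderate θ used here; g is also linear in c, which is what makes the closed-form estimator of c in Chapter 2 possible.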
More specifically, we assume that we observe Y_ϕ^τ(J) at J ∈ T_m = {1, . . . , m}^d after differencing Z(s) with the Laplacian operator τ times. We further assume that ϕ = m^{−1}, so that the number of observations increases within a bounded observation domain. The spectral density of Y_ϕ^τ(J) can be estimated by a periodogram, which is defined using a discrete Fourier transform of the data. That is, the periodogram is defined by

I_m^τ(λ) = (2πm)^{−d} |D(λ)|^2,

where D(λ) is the discrete Fourier transform of the data given by

D(λ) = ∑_{J ∈ T_m} Y_ϕ^τ(J) exp{−i λ^T J}.

We consider the periodogram only at the Fourier frequencies 2πm^{−1}J for J ∈ T_m = {−⌊(m − 1)/2⌋, · · · , m − ⌊m/2⌋}^d, where ⌊x⌋ is the largest integer not greater than x. A smoothed periodogram at Fourier frequencies is defined by

Î_m^τ(2πJ/m) = ∑_{K ∈ T_m} W_h(K) I_m^τ( 2π(J + K)/m ),

with weights W_h(K) given by

W_h(K) = Λ_h(2πK/m) / ∑_{L ∈ T_m} Λ_h(2πL/m), (2.3)

where Λ_h(s) = (1/h) Λ(s/h) I_{{||s|| ≤ h}} for a symmetric continuous function Λ on R^d that satisfies Λ(s) ≥ 0 and Λ(0) > 0, and I_A is the indicator function of the set A. The norm || · || is defined by ||s|| = max{|s_1|, |s_2|, . . . , |s_d|}. For positive functions a and b, a(λ) ≍ b(λ) for λ ∈ A means that there exist constants C_1 and C_2 such that 0 < C_1 ≤ a(λ)/b(λ) ≤ C_2 < ∞ for all possible λ ∈ A. For the asymptotic results in this dissertation, we consider the following assumption on the spectral density f(λ).

Assumption 1. The spectral density f(λ) of a stationary Gaussian random field {Z(s), s ∈ R^d} satisfies:

A. f(λ) ∼ c |λ|^{−θ} as |λ| → ∞;

B. f(λ) is twice differentiable and there exists a positive constant C such that for |λ| > C,

f(λ) ≍ (1 + |λ|)^{−θ}, (∂/∂λ_j) f(λ) ≍ (1 + |λ|)^{−(θ+1)}, and (∂²/∂λ_j ∂λ_k) f(λ) ≍ (1 + |λ|)^{−(θ+2)} (2.4)

for j, k = 1, . . . , d.

2.2 Asymptotic properties of a smoothed periodogram

Asymptotic properties of a spatial periodogram and a smoothed spatial periodogram under fixed-domain asymptotics were investigated by Stein (1995) and Lim and Stein (2008).
They assume that the spectral density f is twice differentiable and satisfies (2.4) for all λ ∈ R^d. This assumption says that the spectral density f(λ) behaves like (1 + |λ|)^{−θ} for all λ, which is a much stronger condition than (1.1). However, this condition allows one to find asymptotic bounds on the expectation, variance and covariance of a spatial periodogram at the Fourier frequency 2πJ/m for each m ≠ 0 and J such that ∥J∥ ≠ 0. Consistency and asymptotic normality of a smoothed spatial periodogram at the Fourier frequency 2πJ/m, however, are shown when lim_{m→∞} 2πJ/m = µ ≠ 0; that is, J should not be close to zero asymptotically. Since we make use of asymptotic properties of a smoothed spatial periodogram at such Fourier frequencies under the more general Assumption 1, we extend some of the results in Stein (1995) and Lim and Stein (2008) under Assumption 1. We focus only on a smoothed spatial periodogram in the following theorem, but results for a smoothed spatial cross-periodogram can be shown similarly. Throughout the dissertation, →p denotes convergence in probability and →d denotes convergence in distribution.

Theorem 3. Suppose that the spectral density f of a stationary Gaussian random field Z(s) on R^d satisfies Assumption 1. Also suppose that 4τ > θ − 1 and h = Cm^{−γ} for some C > 0, where γ satisfies max{(d − 2)/d, 0} < γ < 1. Further, assume that lim_{m→∞} 2πJ/m = µ and 0 < ∥µ∥ < π. Then we have

Î_m^τ(2πJ/m) / f̄_ϕ^τ(2πJ/m) →p 1 (2.5)

and

m^η ( m^{−(d−θ)} Î_m^τ(2πJ/m) − g_{c,θ}(µ) ) →d N( 0, (Λ_2/Λ_1^2) (2π/C)^d g_{c,θ}^2(µ) ), (2.6)

where η = d(1 − γ)/2 and Λ_r = ∫_{[−1,1]^d} Λ^r(s) ds.

Remark 1. The function g_{c,θ} is integrable under 4τ > θ − d, which is implied by the condition 4τ > θ − 1. The condition 4τ > θ − 1 is necessary to show that E( Î_m^τ(2πJ/m) ) / f̄_ϕ^τ(2πJ/m) → 1, and the condition max{(d − 2)/d, 0} < γ < 1 is needed to show that Var( Î_m^τ(2πJ/m) / f̄_ϕ^τ(2πJ/m) ) → 0, so that (2.5) can be shown.
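To fix ideas, the pipeline leading to the smoothed periodogram of Theorem 3 can be sketched end to end for d = 1: difference the lattice data with the discrete Laplacian, compute the periodogram I_m^τ by a direct DFT, and average it with kernel weights. The white-noise input and the constant (uniform) kernel below are placeholders for illustration; a real study would simulate Z from a target spectral density.

```python
import cmath
import math
import random

m, tau = 64, 1
phi = 1.0 / m
# toy observations Z(phi * j) on a grid over a bounded domain (synthetic stand-in)
random.seed(0)
Z = [random.gauss(0.0, 1.0) for _ in range(m + 2 * tau)]

# apply the discrete Laplacian tau times: (Delta Y)(j) = Y(j+1) - 2 Y(j) + Y(j-1)
Y = Z[:]
for _ in range(tau):
    Y = [Y[j + 1] - 2 * Y[j] + Y[j - 1] for j in range(1, len(Y) - 1)]

def periodogram(Y, lam):
    # I_m^tau(lambda) = (2 pi m)^{-d} |D(lambda)|^2, D the discrete Fourier transform
    D = sum(y * cmath.exp(-1j * lam * (j + 1)) for j, y in enumerate(Y))
    return abs(D) ** 2 / (2 * math.pi * m)

def smoothed(Y, J, h):
    # smoothed periodogram at the Fourier frequency 2 pi J / m; a constant kernel
    # Lambda gives equal weights W_h(K) over {K : |2 pi K / m| <= h}
    ks = [k for k in range(-m // 2 + 1, m // 2 + 1) if abs(2 * math.pi * k / m) <= h]
    w = 1.0 / len(ks)
    return sum(w * periodogram(Y, 2 * math.pi * (J + k) / m) for k in ks)

print(smoothed(Y, J=m // 4, h=0.3))
```

With h = Cm^{−γ}, the window shrinks as m grows while the number of Fourier frequencies inside it still increases, which is the bandwidth regime used throughout Theorem 3.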
2.3 Approach I

To estimate the parameters c and θ, we consider the following objective function to be minimized:

L(c, θ) = ∑_{K ∈ T_m} W_h(K) { log( m^{d−θ} g_{c,θ}(2π(J + K)/m) ) + I_m^τ(2π(J + K)/m) / ( m^{d−θ} g_{c,θ}(2π(J + K)/m) ) }, (2.7)

where W_h(K) is given in (2.3). In L(c, θ), 2πJ/m is any given Fourier frequency that satisfies ∥J∥ ≍ m, so that 2πJ/m is away from 0. L(c, θ) can be viewed as a weighted Whittle likelihood function. When Λ is a nonzero constant function, W_h(K) ≡ 1/|K| for K ∈ K, where K = {K ∈ T_m : ||2πK/m|| ≤ h} and |K| is the number of elements in the set K. Then L(c, θ) is of the form of a local Whittle likelihood for the lattice data {Y_ϕ^τ(J), J ∈ T_m} in which the true spectral density is replaced with m^{d−θ} g_{c,θ}. Note that g_{c,θ}(λ) is the limit of the spectral density of Y_ϕ^τ(J), after being scaled by m^{−(d−θ)}, for non-zero λ when ϕ = m^{−1}. The summation in L(c, θ) is over the Fourier frequencies near 2πJ/m, by letting h → 0 as m → ∞. While a local Whittle likelihood method for estimating a long-range dependence parameter in time series considers Fourier frequencies near zero, we consider Fourier frequencies near a pre-specified non-zero frequency. For example, by choosing J such that ⌊2πJ/m⌋ = (π/2) 1_d, where 1_d is the d-dimensional vector of ones, L(c, θ) considers frequencies only near (π/2)1_d.

2.3.1 Estimation of c under the known θ

We consider the estimator of c obtained by minimizing L(c, θ) when θ is known. Thus, the proposed estimator of c when θ is known to be θ_0 is given by

ĉ = arg min_{c ∈ C} L(c, θ_0),

where C is the parameter space of c. ĉ has an explicit expression obtained by solving ∂L(c, θ_0)/∂c = 0:

ĉ = ∑_{K ∈ T_m} W_h(K) I_m^τ(2π(J + K)/m) / ( m^{d−θ_0} g_0(2π(J + K)/m) ), (2.8)

where g_0 ≡ g_{1,θ_0}. The following theorem establishes the consistency and asymptotic normality of the estimator ĉ.

Theorem 4. Suppose that the spectral density f of a stationary Gaussian random field Z(s) on R^d satisfies Assumption 1.
Also suppose that 4τ > θ_0 − 1 for a known θ_0 and h = Cm^{−γ} for some C > 0, where γ satisfies d/(d + 2) < γ < 1. Further, assume that J satisfies ⌊2πJ/m⌋ = (π/2) 1_d and that the true parameter c is in the interior of the parameter space C, which is a closed interval. Then, for ĉ given in (2.8), we have

ĉ →p c, (2.9)

and

m^η (ĉ − c) →d N( 0, c^2 (Λ_2/Λ_1^2) (2π/C)^d ), (2.10)

where Λ_r = ∫_{[−1,1]^d} Λ^r(s) ds and η = d(1 − γ)/2.

Remark 2. Theorem 4 can also be proved when we replace θ_0 in (2.8) with a consistent estimator θ̂, as long as the estimator θ̂ satisfies θ̂ − θ_0 = o_p((log(m))^{−1}).

Remark 3. We can prove Theorem 4 for J such that lim_{m→∞} 2πJ/m = µ and 0 < ∥µ∥ < π instead of the specific choice ⌊2πJ/m⌋ = (π/2)1_d, which we choose for simplicity in the proof.

When we choose Λ as a constant function and C = (1/2)π^2, we have

m^η (ĉ − c) →d N( 0, 2^d c^2 π^{−d} ).

For the Matérn spectral density given in (1.7) with d = 1, Du et al. (2009) showed that for any fixed α_1 with known ν, the maximum likelihood estimator of σ^2 satisfies

n^{1/2} ( σ̂^2 α_1^{2ν} − σ_0^2 α_0^{2ν} ) →d N( 0, 2(σ_0^2 α_0^{2ν})^2 ), (2.11)

where n is the sample size, and σ_0^2 and α_0 are the true parameters. Note that m is the sample size of Y_ϕ^τ, the τ-times differenced lattice process of Z(s). Since π^{1/2} c = σ^2 α^{2ν} for d = 1, we have the same asymptotic variance as in (2.11). However, our approach has a slower convergence rate, since η < 1/3 when d = 1, as we use only partial information. This is also the case for the local Whittle likelihood method in Robinson (1995).

2.3.2 Estimation of θ under the known c

To estimate θ, we assume that c is known to be c_0. The proposed estimator of θ is then given by

θ̂ = arg min_{θ ∈ Θ} L(c_0, θ), (2.12)

where Θ is the parameter space of θ. The consistency and the convergence rate of the proposed estimator θ̂ are given in the following theorem.

Theorem 5.
Suppose that the spectral density f of a stationary Gaussian random field Z(s) on R^d satisfies Assumption 1. Also suppose that 4τ > θ − 1 and h = Cm^{−γ} for some C > 0, where γ satisfies d/(d + 2) < γ < 1. Further, assume that J satisfies ⌊2πJ/m⌋ = (π/2) 1_d and that the true parameter θ is in the interior of the parameter space Θ, which is a closed interval. Then, for θ̂ given in (2.12), we have

θ̂ →p θ. (2.13)

In addition,

θ̂ − θ = o_p((log m)^{−1}). (2.14)

Remark 4. The consistency of θ̂ is not enough to determine the asymptotic distribution of θ̂, since θ appears in the exponent of m in the expression of L(c, θ). For the proof of the asymptotic distribution, we need the rate of convergence given in (2.14).

From Theorem 5, we can now show the following theorem for the asymptotic distribution of θ̂.

Theorem 6. Under the conditions of Theorem 5, we have

log(m) m^η (θ̂ − θ) →d N( 0, (Λ_2/Λ_1^2) (2π/C)^d ),

where η = d(1 − γ)/2.

Remark 5. Note that we have a different convergence rate for θ̂ compared to the convergence rate for ĉ given in Theorem 4. The additional log(m) term comes from the fact that θ appears in the exponent of m in the expression of L(c, θ).

2.3.3 Estimation under unknown c and θ

In the previous discussion, we considered the estimation of one parameter when the other parameter is known. In practice, however, both may be unknown. To handle this situation, c is assigned an arbitrary fixed value c*. The estimator of θ is then defined by

θ̂ = arg min_{θ ∈ Θ} L(c*, θ). (2.15)

Theorem 7. Suppose that the spectral density f of a stationary Gaussian random field Z(s) on R^d satisfies Assumption 1. Also suppose that 4τ > θ − 1 and h = Cm^{−γ} for some C > 0, where γ satisfies d/(d + 2) < γ < 1. Further, assume that J satisfies ⌊2πJ/m⌋ = (π/2) 1_d and that the true parameter θ is in the interior of the parameter space Θ, which is a closed interval. Then, for θ̂ given in (2.15), we have

θ̂ →p θ. (2.16)

Furthermore,

θ̂ − θ = O_p((log m)^{−1}).
(2.17)

In contrast to Theorem 5, the convergence rate of θ̂ here is slower. With this convergence rate, we cannot establish the asymptotic distribution of θ̂. We could also consider estimating c0 by minimizing L(θ̂, c), with θ̂ defined in (2.15); that is,

ĉ = Σ_{K∈T_m} W_h(K) I_m^τ(2π(J+K)/m) / ( m^{d−θ̂} g_{θ̂}(2π(J+K)/m) ),   (2.18)

where θ̂ is the estimate of θ given in (2.15) with the fixed c*. However, the consistency of ĉ is not guaranteed. Instead, we obtain the following result, which can be easily derived from Corollary 1:

ĉ − c0 = O_p(1).

2.4 Approach II

In Section 2.3, we developed a local Whittle type estimator which uses Fourier frequency information around 2πJ/m = (π/2)1_d. As the sample size increases, however, the Fourier frequencies used in the estimator become very close to 2πJ/m = (π/2)1_d. Thus, we could use g_{c,θ}(·) only at [2πJ/m]. In this section, we present another estimation methodology, which directly uses the smoothed periodogram at a fixed frequency. The alternative estimator is obtained by minimizing

R(c, θ) = log( m^{d−θ} g_{c,θ}(2πJ/m) ) + Î_m^τ(2πJ/m) / ( m^{d−θ} g_{c,θ}(2πJ/m) ).   (2.19)

Asymptotic properties are discussed in the rest of this section, organized as in Section 2.3. Most theoretical results for the new estimators are identical to those obtained in Section 2.3, but require some changes in the proofs.

2.4.1 Estimation of c when θ is known

The estimator of c is obtained by minimizing R(c, θ) when θ is known. When θ is known to be θ0, the proposed estimator of c is

ĉ = arg min_{c∈C} R(c, θ0),

where C is the parameter space of c. As in Section 2.3, the explicit form of ĉ is obtained by solving ∂R(c, θ0)/∂c = 0:

ĉ = Î_m^τ(2πJ/m) / ( m^{d−θ0} g_0(2πJ/m) ),   (2.20)

where g_0 ≡ g_{1,θ0}. The same consistency and asymptotic results as in Section 2.3 hold for this estimator.

Theorem 8.
Suppose that the spectral density f of a stationary Gaussian random field Z(s) on R^d satisfies Assumption 1. Also suppose that 4τ > θ0 − 1 for a known θ0, and h = Cm^{−γ} for some C > 0, where γ satisfies d/(d+2) < γ < 1. Further, assume that J satisfies ⌊2πJ/m⌋ = (π/2)1_d and that the true parameter c is in the interior of the parameter space C, which is a closed interval. Then, for ĉ given in (2.20), we have

ĉ →_p c,   (2.21)

and

m^η (ĉ − c) →_d N( 0, c² (Λ₂/Λ₁²) (2π/C)^d ),   (2.22)

where Λ_r = ∫_{[−1,1]^d} Λ^r(s) ds and η = d(1 − γ)/2.

2.4.2 Estimation of θ when c is known

Using (2.19), we can consider

θ̂ = arg min_{θ∈Θ} R(c0, θ),   (2.23)

where Θ is the parameter space of θ, when c is known to be c0. The following theorem gives the consistency and the convergence rate of the new estimator θ̂ defined in (2.23).

Theorem 9. Suppose that the spectral density f of a stationary Gaussian random field Z(s) on R^d satisfies Assumption 1. Also suppose that 4τ > θ − 1 and h = Cm^{−γ} for some C > 0, where γ satisfies d/(d+2) < γ < 1. Further, assume that J satisfies ⌊2πJ/m⌋ = (π/2)1_d and that the true parameter θ is in the interior of the parameter space Θ, which is a closed interval. Then, for θ̂ given in (2.23), we have

θ̂ →_p θ.   (2.24)

In addition,

θ̂ − θ = o_p((log m)^{−1}).   (2.25)

Remark 6. As in Section 2.3, the rate of convergence given in (2.25) is useful for studying the asymptotic properties of θ̂; the same result as in Section 2.3 can be shown.

Theorem 10. Under the conditions of Theorem 9, we have

log(m) m^η (θ̂ − θ) →_d N( 0, (Λ₂/Λ₁²) (2π/C)^d ),

where η = d(1 − γ)/2.

2.4.3 Estimation when both θ and c are unknown

In this subsection, we again consider the situation when both parameters are unknown. With a given c*, which may differ from the true value c0, the estimator of θ is

θ̂ = arg min_{θ∈Θ} R(c*, θ).   (2.26)

Then, we have the following results, similar to those in Section 2.3.3.

Theorem 11.
Suppose that the spectral density f of a stationary Gaussian random field Z(s) on R^d satisfies Assumption 1. Also suppose that 4τ > θ0 − 1 for a known θ0, and h = Cm^{−γ} for some C > 0, where γ satisfies d/(d+2) < γ < 1. Further, assume that J satisfies ⌊2πJ/m⌋ = (π/2)1_d and that the true parameter θ is in the interior of the parameter space Θ, which is a closed interval. Then, for θ̂ given in (2.26), we have

θ̂ →_p θ,   (2.27)

and

θ̂ − θ = O_p((log m)^{−1}).   (2.28)

Moreover, if

ĉ = Î_m^τ(2πJ/m) / ( m^{d−θ̂} g_{θ̂}(2πJ/m) )   (2.29)

is viewed as an estimator of the true value c0, we can show ĉ − c0 = O_p(1).

Depending on the value of c*, overestimation or underestimation of θ̂ relative to the true value θ0 occurs, as the following result shows.

Theorem 12. (i) When c* < c0, there exists M such that P(θ0 ≤ θ̂) = 0 for m > M. (ii) When c* > c0, there exists M such that P(θ0 ≥ θ̂) = 0 for m > M.

Remark 7. The overestimation and underestimation properties for the first approach are also observed in the simulation study. However, the theoretical results would be more complicated than for the second approach, because the effects of B_m and C_m would need to be examined.

Chapter 3

Simulation Study

In this chapter, simulation studies with various models are presented to validate the asymptotic results obtained in Chapter 2. Although the estimators constructed in Chapter 2 work in higher dimensions, the one-dimensional Matérn covariance model with various parameter values is considered here.

Let Z(s) be a stationary Gaussian process on R with a Matérn covariance function whose spectral density is (see, e.g., Stein 1999, p. 31)

f(λ) = σ² (α² + λ²)^{−ν−1/2}.   (3.1)

Data are generated using the Matlab routine "mvnrnd" with covariances following (3.1). We consider the region D = [0, 10] with grid sizes ϕ = 0.1, 0.05 and 0.025, corresponding to m = 100, 200 and 400. For each case, 500 data sets are simulated, giving 500 parameter estimates.
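As a concrete illustration of this setup, the following sketch simulates the first example of this chapter (ν = 1/2, where the Matérn covariance reduces to a scaled exponential) on a grid over D = [0, 10]. This is our own minimal Python analogue of the Matlab-based simulation; the normalizing constant π σ²/α comes from Fourier-inverting f(λ) = σ²(α² + λ²)^{−1}, and the function names are illustrative, not the dissertation's.

```python
import numpy as np

rng = np.random.default_rng(0)

def matern_half_cov(h, sigma2=1 / np.pi, alpha=1.0):
    # For nu = 1/2, f(lambda) = sigma2 (alpha^2 + lambda^2)^{-1}
    # Fourier-inverts to C(h) = (pi sigma2 / alpha) exp(-alpha |h|).
    return (np.pi * sigma2 / alpha) * np.exp(-alpha * np.abs(h))

m = 100                    # grid size, corresponding to phi = 0.1
grid = 0.1 * np.arange(m)  # observation sites in D = [0, 10]
cov = matern_half_cov(grid[:, None] - grid[None, :])
z = rng.multivariate_normal(np.zeros(m), cov)  # one simulated path
```

With σ² = 1/π and α = 1 the marginal variance is C(0) = 1, and (c, θ) = (1/π, 2) as in the first example below.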
To simplify computation, Λ is taken to be a constant function, so that W_h(K) is the same for each K ∈ K. The four-times finite difference operator (τ = 4) is applied to the simulated data, and C = 1 and γ = 1/3 are chosen for the bandwidth. The notation used in the tables is as follows: m is the sample size, |K| is the number of non-zero weights W_h(K), Bias is the average bias of the estimates, and STD is the standard deviation of the estimates.

In the first example, we consider (α, σ², ν) = (1, 1/π, 1/2). In this case, the true parameters (c, θ) are (1/π, 2). Tables 3.1 and 3.2 report the estimates of θ and c, respectively. The Bias columns of Tables 3.1 and 3.2 show that the errors between the estimates and the true values are less than 10^{−2}, and the STD columns show that the estimates are highly concentrated. Compared with the sample size m, the number of non-zero weights |K| under the present bandwidth setting is small; that is, only a small number of frequencies is used. A wider bandwidth is also considered by replacing C = 1 with C = 5; the results are shown in Table 3.3. Both Bias and STD are slightly improved under the wider bandwidth.

The second simulation example comes from (3.1) with (α, σ², ν) = (1, 1/π, 3/2), which implies (c, θ) = (1/π, 4). Under the same setting as in the previous example with C = 1, the Bias and STD in Tables 3.4 and 3.5 show similar results. Again, C = 5 is applied to widen the bandwidth, and the results are shown in Table 3.6. Although the STD improves, the Bias in Table 3.6 does not. Tables 3.3 and 3.6 suggest that the accuracy of estimation is affected by the choice of bandwidth; finding an optimal bandwidth is therefore important, and we leave this for future research.

Under the same simulation setting as Table 3.6, the second approach is also applied; the results are shown in Table 3.7.
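The constant-kernel computation described above can be mimicked in a few lines. This is only a schematic of the weighted periodogram average behind (2.8) in d = 1: `g0_vals` stands in for g₀ evaluated at the selected frequencies (the actual g₀ involves an aliased lattice sum), and all names and scalings here are our own illustrative choices.

```python
import numpy as np

def tau_difference(z, tau=4):
    # Apply the finite-difference operator tau times (tau = 4 above).
    for _ in range(tau):
        z = np.diff(z)
    return z

def weighted_c_hat(z, theta0, g0_vals, j_idx, half_width, tau=4):
    y = tau_difference(z, tau)
    m = len(y)
    # Periodogram of the differenced series at Fourier frequencies.
    periodogram = np.abs(np.fft.fft(y)) ** 2 / (2 * np.pi * m)
    # Frequencies 2*pi*(j_idx + k)/m for k = -half_width..half_width.
    ks = np.arange(j_idx - half_width, j_idx + half_width + 1)
    # Constant kernel: each of the |K| frequencies gets weight 1/|K|,
    # so the estimator reduces to an average of the scaled ratios
    # I(lambda_k) / (m^{d - theta0} g0(lambda_k)) with d = 1.
    return np.mean(periodogram[ks] / (m ** (1.0 - theta0) * g0_vals))
```

Plugging in the true g₀ values at frequencies near π/2 would reproduce the "known θ" column of the experiments; here the point is only the structure of the weighted average.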
Compared with Table 3.6, the performance of the second approach is similar to that of the first, which matches the theoretical results found above.

We next consider estimating θ when c is also unknown, in the two previous examples with true values (θ, c) = (2, 1/π) and (θ, c) = (4, 1/π). θ is estimated when c is fixed at 2, 1, 0.2 and 0.1. The simulation results for the two examples under the different values of c are shown in Tables 3.8 and 3.9, and the corresponding histograms appear in Figures 3.1 and 3.2. When the assumed c is larger than the true value, the Bias is positive and grows as c increases. In Figures 3.1 and 3.2, if the selected c is 1/π (the true value), the estimates are distributed around both sides of the true value of θ; when the selected c differs from 1/π, most estimates fall to the left or to the right of the true value. Moreover, in Figure 3.3, the estimates gradually move toward the true value as the sample size increases.

Table 3.1: Estimation of θ under known c
m     |K|   Wh(K)   Bias      STD
100    7    1/7     0.039     0.129
200   10    1/10    0.009     0.088
400   17    1/17    0.009     0.05

Table 3.2: Estimation of c under known θ
m     |K|   Wh(K)   Bias      STD
100    7    1/7     0.00072   0.12
200   10    1/10    0.0039    0.0945
400   17    1/17    0.0024    0.078

Table 3.3: Estimation of θ under known c
m     |K|   Wh(K)   Bias      STD
100   33    1/33    -0.0024   0.0618
200   52    1/52    0.004     0.038
400   83    1/83    0.002     0.0256

Table 3.4: Estimation of θ under known c
m     |K|   Wh(K)   Bias      STD
100    7    1/7     0.032     0.138
200   10    1/10    0.02      0.094
400   17    1/17    0.011     0.058

Table 3.5: Estimation of c under known θ
m     |K|   Wh(K)   Bias      STD
100    7    1/7     0.014     0.132
200   10    1/10    0.003     0.094
400   17    1/17    -0.003    0.077

Table 3.6: Estimation of θ under known c
m     |K|   Wh(K)   Bias      STD
100   33    1/33    0.04      0.066
200   52    1/52    0.031     0.042
400   83    1/83    -0.027    0.027

Table 3.7: Estimation of θ under known c (second approach)
m     |K|   Wh(K)   Bias      STD
100   33    1/33    0.004     0.077
200   52    1/52    0.026     0.047
400   83    1/83    -0.027    0.03

Table 3.8: Estimation of θ under unknown c for Example 1
c     |K|   Wh(K)   Bias      STD
2     52    1/52    0.4907    0.0421
1     52    1/52    0.2996    0.0419
1/π   52    1/52    0.004     0.0378
0.2   52    1/52    -0.1364   0.0417
0.1   52    1/52    -0.3180   0.0378

Table 3.9: Estimation of θ under unknown c for Example 2
c     |K|   Wh(K)   Bias      STD
2     52    1/52    0.5332    0.0418
1     52    1/52    0.2245    0.0415
1/π   52    1/52    0.031     0.042
0.2   52    1/52    -0.1309   0.0413
0.1   52    1/52    -0.3331   0.0415

[Figure 3.1: Histograms of the estimates of θ for Example 1 under c = 1/π, 1, 2, 0.2 and 0.1.]

[Figure 3.2: Histograms of the estimates of θ for Example 2 under the same values of c.]

[Figure 3.3: Histograms for Example 2 with grid sizes ϕ = 0.1, 0.05 and 0.025 under a wrong value of c.]

Chapter 4

Discussion

In this dissertation, we first extended the result of Lim and Stein (2008) under weaker assumptions. We then proposed two approaches to estimate c and θ, the parameters that govern the tail behavior of the spectral density of a stationary Gaussian random field on R^d. The proposed estimators are obtained by minimizing the objective functions given in (2.7) and (2.19). The first approach makes use of frequency information around 2πJ/m, while the second employs only the information at [2πJ/m] = (π/2)1_d. In terms of the proofs of the asymptotic results and the simulation comparisons, there is little difference between the two approaches.

As mentioned in Chapter 2, the objective function given in (2.7) is similar to the one used in the local Whittle likelihood method when the kernel function Λ in W_h(K) is constant. When we replace m^{d−θ} g_{c,θ} with f̄_ϕ^τ(λ) and remove W_h(K) in (2.7), it can be thought of as an approximation to the likelihood of Y_ϕ^τ(J).
This approximation, however, has not been verified under fixed-domain asymptotics. One might think that a similar technique could be applied to prove the validity of the Whittle approximation to the likelihood, since Y_ϕ^τ(J) is a lattice process. However, the spectral density f̄_ϕ^τ(λ) of Y_ϕ^τ(J) converges to zero, which requires a different approach; further investigation is needed.

The weights in (2.7) are controlled by the bandwidth h, which can be interpreted as the proportion of Fourier frequencies considered in the objective function. In our theorems, we assume h = Cm^{−γ} for some constant C. In the proofs, we make use of the properties of the smoothed spatial periodogram Î_m^τ. The simulation results also change with the bandwidth. Thus, one could seek the optimal bandwidth that minimizes the mean squared error of Î_m^τ. However, this requires explicit expressions for the bias and variance of Î_m^τ(λ), which calls for further investigation.

It would be more useful to estimate c and θ jointly, or to estimate θ when c is unknown. Due to the form of g_{c,θ}, proving the asymptotic properties of such estimators under fixed-domain asymptotics is challenging and requires different mathematical tools. Although some contributions, including theoretical results, have been made for the case in which both parameters are unknown, more work is still needed. In the current method, to estimate θ, c is fixed at a value c*, but the convergence rate of θ̂ may then be slower. To handle this problem, we believe that updating c* through θ̂ could be more reasonable, but how to update both estimators iteratively remains an open question.

Approaches based on the fractal index could be another way to study the tail behavior of the spectral density. By Abelian-type theorems, relationships between the tail of the spectral density and the behavior of the covariance function at the origin are known.
In this situation, methodologies for the fractal index may be useful, but the details have to be considered carefully. We also believe that our approaches should extend to processes with stationary increments. Finally, in our work, data are sampled on regular grid points; in practice, the irregularly-spaced case is more interesting. Several ideas developed for increasing-domain asymptotics may also be valid for fixed-domain asymptotics. We are also interested in extending our univariate approaches to the multivariate setting.

Chapter 5

Appendix

5.1 Properties of g_{c,θ}(λ)

Some properties of the function g_{c,θ}(λ) are discussed in this appendix. These properties are used in the proofs given in Appendix 5.2.1. Recall that

g_{c,θ}(λ) = c { Σ_{j=1}^d 4 sin²(λ_j/2) }^{2τ} Σ_{Q∈Z^d} |λ + 2πQ|^{−θ}.

For the function g_{c,θ}(λ), let ∇g be the gradient of g with respect to λ, and let ġ and g̈ denote the first and second derivatives of g_{c,θ}(λ) with respect to θ, respectively. That is, ∇g = (∂g/∂λ₁, ..., ∂g/∂λ_d), ġ = ∂g_{c,θ}(λ)/∂θ and g̈ = ∂²g_{c,θ}(λ)/∂θ².

We denote A_ρ = [−π, π]^d \ (−ρ, ρ)^d for a fixed ρ with 0 < ρ < 1. Since we assume in Chapter 2 that the parameter space Θ is a closed interval, let Θ = [θ_L, θ_U] with θ_L > d. Although Lemma 1 can be shown for any fixed ρ with 0 < ρ < 1, we further assume that ρ is small enough that all Fourier frequencies near (π/2)1_d considered in R(c, θ) are contained in A_ρ.

Lemma 1. The following properties hold for g_{c,θ}(λ). Let c > 0 be a fixed constant.

(a) There exist constants K_L and K_U such that for all (θ, λ) ∈ Θ × A_ρ,

0 < K_L ≤ g_{c,θ}(λ) ≤ K_U < ∞.   (5.1)

(b) For any θ₁, θ₂ ∈ Θ, there exist constants K_L and K_U such that for all λ ∈ A_ρ,

0 < K_L ≤ g_{c,θ₁}(λ)/g_{c,θ₂}(λ) ≤ K_U < ∞.   (5.2)

(c) ∇g, ġ, g̈, ġ/g and ∇(ġ/g) are uniformly bounded on Θ × A_ρ.

(d) g_{c,θ}(λ) is continuous on Θ × A_ρ.

Proof.
Since g_{c,θ}(λ) is linear in c, it is enough to consider g_{1,θ}(λ). First, we find upper and lower bounds for Σ_{Q∈Z^d} |λ + 2πQ|^{−θ}. For all (θ, λ) ∈ Θ × A_ρ, we have

Σ_{Q∈Z^d} |λ + 2πQ|^{−θ} ≥ π^{−θ_U} > 0

and

Σ_{Q∈Z^d} |λ + 2πQ|^{−θ} ≤ Σ_{Q∈Z^d\{0}} |λ + 2πQ|^{−θ_L} + ϵ^{−θ_U} ≤ (2π)^d ϵ^{d−θ_L}/(θ_L − d) + ϵ^{−θ_U},

where the last inequality follows from

Σ_{Q∈Z^d\{0}} |λ + 2πQ|^{−θ_L} ≤ ∫_{|y|≥1} |λ + 2πy|^{−θ_L} dy ≤ ∫_{|z|≥ϵ} (2π)^d |z|^{−θ_L} dz = ∫_{x≥ϵ} (2π)^d x^{d−1} x^{−θ_L} dx = (2π)^d ϵ^{d−θ_L}/(θ_L − d),   (5.3)

since θ_L > d. Thus, we have

0 < k_L ≤ Σ_{Q∈Z^d} |λ + 2πQ|^{−θ} ≤ k_U < ∞,   (5.4)

where k_L = π^{−θ_U} and k_U = (2π)^d ϵ^{d−θ_L}/(θ_L − d) + ϵ^{−θ_U}. Then (a) follows from (5.4), from

(4d sin²(ϵ/2))^{2τ} ≤ { Σ_{j=1}^d 4 sin²(λ_j/2) }^{2τ} ≤ (4d)^{2τ},

and by setting K_L ≡ c (4d sin²(ϵ/2))^{2τ} k_L and K_U ≡ c (4d)^{2τ} k_U.

(b) follows from observing that Σ_{Q∈Z^d} |λ + 2πQ|^{−θ} has lower and upper bounds that are uniform on Θ × A_ρ, as given in (5.4).

For (c), by the product rule we have

∂g/∂λ_i = c 4τ { Σ_{j=1}^d 4 sin²(λ_j/2) }^{2τ−1} sin(λ_i) Σ_{Q∈Z^d} |λ + 2πQ|^{−θ} − cθ { Σ_{j=1}^d 4 sin²(λ_j/2) }^{2τ} Σ_{Q∈Z^d} (λ_i + 2πQ_i) |λ + 2πQ|^{−θ−2},

so that |∂g/∂λ_i| ≤ K k_U for some constant K > 0 and k_U given in (5.4), which implies the uniform boundedness of ∇g on Θ × A_ρ.

For the uniform bounds of ġ and g̈, we first compute

ġ = −c { Σ_{j=1}^d 4 sin²(λ_j/2) }^{2τ} Σ_{Q∈Z^d} |λ + 2πQ|^{−θ} log|λ + 2πQ|,

g̈ = c { Σ_{j=1}^d 4 sin²(λ_j/2) }^{2τ} Σ_{Q∈Z^d} |λ + 2πQ|^{−θ} (log|λ + 2πQ|)².

Since, for a given β > 0, we can find x₀ and K such that |log x| ≤ K x^β for all x > x₀, we can show that there exist n₀, K₁ and K₂ satisfying

|ġ| ≤ K₁ + K₂ Σ_{Q∈Z^d, ∥Q∥≥n₀} |λ + 2πQ|^{−θ+β}

for some fixed β > 0. Choosing β = (θ_L − d)/2, we can show that

Σ_{Q∈Z^d, ∥Q∥≥n₀} |λ + 2πQ|^{−θ+β} < ∞

using an argument similar to the one for (5.3), which gives the uniform boundedness of ġ. Similarly, we can show the uniform boundedness of g̈.

The uniform boundedness of ġ/g follows from the uniform boundedness of ġ and (a).
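As a quick numerical companion to Lemma 1(a) in d = 1, one can evaluate g_{1,θ} by truncating the lattice sum; the truncation level and grid below are our own choices, and the summability of the discarded tail is exactly what the bound (5.3) guarantees.

```python
import numpy as np

def g_one(lam, theta, tau=4, n_trunc=500):
    # g_{1,theta}(lambda) in d = 1: (4 sin^2(lambda/2))^{2 tau}
    # times the lattice sum over Q, truncated at |Q| <= n_trunc.
    # The neglected tail is summable since theta > d = 1, by (5.3).
    q = np.arange(-n_trunc, n_trunc + 1)
    lattice = np.sum(np.abs(lam + 2 * np.pi * q) ** (-theta))
    return (4 * np.sin(lam / 2) ** 2) ** (2 * tau) * lattice

# Lemma 1(a): g stays bounded away from 0 and infinity on A_rho,
# here checked on a grid inside [0.5, pi] for theta = 2.
vals = [g_one(lam, 2.0) for lam in np.linspace(0.5, np.pi, 50)]
```

Evaluating the ratio g_one(λ, θ₁)/g_one(λ, θ₂) over the same grid likewise illustrates the uniform bounds in Lemma 1(b).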
To show the uniform boundedness of ∇(ġ/g), consider (with T(λ) = Σ_{Q∈Z^d} |λ + 2πQ|^{−θ})

∂(ġ/g)/∂λ_i = − [ Σ_{Q∈Z^d} (λ_i + 2πQ_i) |λ + 2πQ|^{−θ−2} (1 − θ log|λ + 2πQ|) ] / T(λ)
− θ [ Σ_{Q∈Z^d} (λ_i + 2πQ_i) |λ + 2πQ|^{−θ−2} ] [ Σ_{Q∈Z^d} |λ + 2πQ|^{−θ} log|λ + 2πQ| ] / T(λ)².

Since the denominators in the expression of ∂(ġ/g)/∂λ_i have uniform lower bounds as shown in (5.4), it is enough to find uniform bounds for the numerators. Observing that |λ_i + 2πQ_i| ≤ |λ + 2πQ| and |λ + 2πQ|^{−1} ≤ K for some K > 0 on A_ρ, each numerator in the expression of ∂(ġ/g)/∂λ_i is uniformly bounded on Θ × A_ρ by an argument similar to the one used for ġ.

To show (d), it is enough to show the continuity of Σ_{Q∈Z^d} |λ + 2πQ|^{−θ} on Θ × A_ρ, since { Σ_{j=1}^d 4 sin²(λ_j/2) }^{2τ} is continuous on A_ρ. It can be easily shown that

Σ_{Q∈Z^d, ∥Q∥>n} |λ + 2πQ|^{−θ}

converges to zero uniformly on Θ × A_ρ as n → ∞, which implies the uniform convergence of Σ_{Q∈Z^d, ∥Q∥≤n} |λ + 2πQ|^{−θ}. Thus, the continuity of g_{c,θ}(λ) follows from the continuity of |λ + 2πQ|^{−θ}.

5.2 Proofs of Theorems in Chapter 2

5.2.1 Proofs of Theorems in Section 2.2

Proof of Theorem 3. If f(λ) satisfies (2.4) for all λ, then (2.5) and (2.6) hold by results in Stein (1995) and Lim and Stein (2008). To prove (2.5) and (2.6) when (2.4) holds only for large λ, we need to show that the effect of f(λ) on |λ| ≤ C is negligible. Consider a spectral density k(λ) which satisfies k(λ) ∼ c|λ|^{−θ} as |λ| → ∞, is twice differentiable, and satisfies (2.4) for all λ. Also assume that k(λ) ≡ f(λ) for |λ| > C.

Let I_m^{f,τ}(λ) be the periodogram at λ from the observations under f(λ), and

a_{m,ϕ}^{f,τ}(J, K) = (2πm)^{−d} ∫_{R^d} { Σ_{j=1}^d 4 sin²(ϕλ_j/2) }^{2τ} f(λ) Φ(λ, J, K) dλ,

where

Φ(λ, J, K) = Π_{j=1}^d sin²(mϕλ_j/2) / [ sin(ϕλ_j/2 + πJ_j/m) sin(ϕλ_j/2 + πK_j/m) ].

Note that

E( I_m^{f,τ}(2πJ/m) ) = a_{m,ϕ}^{f,τ}(J, J),
Var( I_m^{f,τ}(2πJ/m) ) = a_{m,ϕ}^{f,τ}(J, J)² + a_{m,ϕ}^{f,τ}(J, −J)².
(2.5) and (2.6) follow from Theorems 3, 6 and 12 in Lim and Stein (2008), once those theorems hold for f under Assumption 1. The key part of the proofs of these theorems under Assumption 1 is to show

E( I_m^{f,τ}(2πJ/m) ) / f̄_ϕ^τ(2πJ/m) = 1 + O(m^{−β₁}),   (5.5)
Var( I_m^{f,τ}(2πJ/m) ) / f̄_ϕ^τ(2πJ/m)² = 1 + O(m^{−β₂}),   (5.6)

for some β₁, β₂ > 0. Once (5.5) and (5.6) are shown, the other parts of the proofs are similar to the proofs in Lim and Stein (2008). Since the results in Stein (1995) and Lim and Stein (2008) hold for k(λ), we have (5.5) and (5.6) for k(λ). Then, (5.5) and (5.6) for f(λ) follow from

a_{m,ϕ}^{f,τ}(J, ±J) − a_{m,ϕ}^{k,τ}(J, ±J) = O(m^{−d−4τ}),   (5.7)

for J that satisfies ∥J∥ ≍ m and 2J/m ∉ Z^d. (5.7) holds since

| a_{m,ϕ}^{f,τ}(J, ±J) − a_{m,ϕ}^{k,τ}(J, ±J) |
= (2πm)^{−d} | ∫_{|λ|≤C} { Σ_{j=1}^d 4 sin²(ϕλ_j/2) }^{2τ} (f(λ) − k(λ)) Φ(λ, J, ±J) dλ |
≤ (2πm)^{−d} ∫_{|λ|≤C} { Σ_{j=1}^d 4 sin²(ϕλ_j/2) }^{2τ} |f(λ) − k(λ)| Φ(λ, J, ±J) dλ
≤ v m^{−d−4τ}

for some positive constant v, since k(λ) ≡ f(λ) for |λ| > C and ϕλ_j/2 ± πJ_j/m stays away from zero and π when m is large.

5.2.2 Proofs of Theorems in Section 2.3

Proof of Theorem 4. To show the weak consistency of ĉ, we consider upper and lower bounds for ĉ. Let

K_U = arg max_{K∈T_m, W_h(K)≠0} g_0(2π(J+K)/m)

and

K_L = arg min_{K∈T_m, W_h(K)≠0} g_0(2π(J+K)/m).

Recall that g_0 = g_{1,θ0}. Then, we have

Σ_{K∈T_m} W_h(K) I_m^τ(2π(J+K)/m) / ( m^{d−θ0} g_0(2π(J+K_U)/m) ) ≤ ĉ ≤ Σ_{K∈T_m} W_h(K) I_m^τ(2π(J+K)/m) / ( m^{d−θ0} g_0(2π(J+K_L)/m) ),

which can be rewritten as

c Î_m^τ(2πJ/m) / ( m^{d−θ0} g_{c,θ0}(2π(J+K_U)/m) ) ≤ ĉ ≤ c Î_m^τ(2πJ/m) / ( m^{d−θ0} g_{c,θ0}(2π(J+K_L)/m) )   (5.8)

with probability one. Note that both g_{c,θ0}(2π(J+K_U)/m) and g_{c,θ0}(2π(J+K_L)/m) converge to g_{c,θ0}((π/2)1_d) by the continuity of g_{c,θ}(λ), and m^{−(d−θ0)} Î_m^τ(2πJ/m) converges to g_{c,θ0}((π/2)1_d) in probability by Theorem 3. Thus, it follows that ĉ converges to c in probability.
For the asymptotic distribution of ĉ, note that we have

m^η ( Î_m^τ(2πJ/m)/m^{d−θ0} − g_{c,θ0}((π/2)1_d) ) →_d N( 0, (Λ₂/Λ₁²) (2π/C)^d g_{c,θ0}²((π/2)1_d) )   (5.9)

from Proposition 12 in Lim and Stein (2008), and

m^η ( g_{c,θ0}(2π(J+K_E)/m) − g_{c,θ0}((π/2)1_d) ) → 0   (5.10)

for E = U or L, since 4τ > θ0 − 1, h = Cm^{−γ} and d/(d+2) < γ < 1. Then, (2.10) follows from (5.9) and (5.10).

To prove Theorem 5, we need the following lemmas.

Lemma 2. Consider the function h_m(x) = −log(x) + d_m(x − 1), where d_m is positive and depends on a positive integer m. Also assume that d_m → 1 as m → ∞. Then, for a given r with 0 < r < 1, there exist δ_r > 0 and M_r such that for all m ≥ M_r,

h_m(x) > δ_r for any x ∈ Z_r,

where Z_r = {z : |z − 1| > r, z > 0}.

Proof. It can be easily shown that, for any positive integer m, h_m(x) is a convex function on (0, ∞) and is minimized at x = 1/d_m with h_m(1/d_m) ≤ 0. Let h_∞(x) = −log(x) + x − 1. Since d_m → 1, for any r ∈ (0, 1) there exists M_r > 0 such that for all m ≥ M_r, we have |1/d_m − 1| ≤ r and

min{ h_m(1−r), h_m(1+r) } > (1/2) min{ h_∞(1−r), h_∞(1+r) } > 0.

Hence for all x ∈ Z_r, we have

h_m(x) ≥ min{ h_m(1−r), h_m(1+r) } > (1/2) min{ h_∞(1−r), h_∞(1+r) } ≡ δ_r.

The following lemma shows that L(c0, θ1) − L(c0, θ0) can be bounded from below by three terms, two of which can be neglected.

Lemma 3. For a positive integer m and θ1 ∈ Θ, we have

L(c0, θ1) − L(c0, θ0) ≥ A_m + B_m + C_m,

where

A_m = −log( m^{θ1−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ1}(2π(J+S_m)/m) )
+ ( Î_m^τ(2πJ/m) / ( m^{d−θ0} g_{c0,θ0}(2π(J+K_M)/m) ) ) ( m^{θ1−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ1}(2π(J+S_m)/m) − 1 ),   (5.11)

B_m = log( g_{c0,θ0}(2π(J+S_m)/m) g_{c0,θ1}(2π(J+S_M)/m) / ( g_{c0,θ0}(2π(J+S_M)/m) g_{c0,θ1}(2π(J+S_m)/m) ) ),   (5.12)
C_m = ( Î_m^τ(2πJ/m) / ( m^{d−θ0} g_{c0,θ0}(2π(J+K_M)/m) ) ) ( 1 − g_{c0,θ0}(2π(J+K_M)/m) / g_{c0,θ0}(2π(J+K_m)/m) ).   (5.13)

In (5.11)-(5.13), K_M, K_m, S_M and S_m are defined as

K_M = arg max_{K∈T_m, W_h(K)≠0} g_{c0,θ0}(2π(J+K)/m),
K_m = arg min_{K∈T_m, W_h(K)≠0} g_{c0,θ0}(2π(J+K)/m),
S_M = arg max_{K∈T_m, W_h(K)≠0} log( g_{c0,θ0}(2π(J+K)/m) / g_{c0,θ1}(2π(J+K)/m) ),
S_m = arg min_{K∈T_m, W_h(K)≠0} g_{c0,θ0}(2π(J+K)/m) / g_{c0,θ1}(2π(J+K)/m).

Furthermore,

sup_{θ∈Θ} |B_m| = o(1),   (5.14)
C_m = o_p(1),   (5.15)

where (5.15) holds under the conditions of Theorem 5.

Proof. From the expression of L(c, θ) given in (2.7), we have

L(c0, θ1) − L(c0, θ0)
= − Σ_{K∈T_m} W_h(K) log( m^{θ1−θ0} g_{c0,θ0}(2π(J+K)/m) / g_{c0,θ1}(2π(J+K)/m) )
+ Σ_{K∈T_m} W_h(K) m^{θ1−θ0} ( g_{c0,θ0}(2π(J+K)/m) / g_{c0,θ1}(2π(J+K)/m) ) I_m^τ(2π(J+K)/m) / ( m^{d−θ0} g_{c0,θ0}(2π(J+K)/m) )
− Σ_{K∈T_m} W_h(K) I_m^τ(2π(J+K)/m) / ( m^{d−θ0} g_{c0,θ0}(2π(J+K)/m) )
≥ −log( m^{θ1−θ0} g_{c0,θ0}(2π(J+S_M)/m) / g_{c0,θ1}(2π(J+S_M)/m) )
+ m^{θ1−θ0} ( g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ1}(2π(J+S_m)/m) ) Σ_{K∈T_m} W_h(K) I_m^τ(2π(J+K)/m) / ( m^{d−θ0} g_{c0,θ0}(2π(J+K_M)/m) )
− Σ_{K∈T_m} W_h(K) I_m^τ(2π(J+K)/m) / ( m^{d−θ0} g_{c0,θ0}(2π(J+K_m)/m) )
=: H_m.

H_m is further decomposed as

H_m = −log( m^{θ1−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ1}(2π(J+S_m)/m) )
+ ( Î_m^τ(2πJ/m) / ( m^{d−θ0} g_{c0,θ0}(2π(J+K_M)/m) ) ) ( m^{θ1−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ1}(2π(J+S_m)/m) − 1 )
+ log( g_{c0,θ0}(2π(J+S_m)/m) g_{c0,θ1}(2π(J+S_M)/m) / ( g_{c0,θ0}(2π(J+S_M)/m) g_{c0,θ1}(2π(J+S_m)/m) ) )
+ ( Î_m^τ(2πJ/m) / ( m^{d−θ0} g_{c0,θ0}(2π(J+K_M)/m) ) ) ( 1 − g_{c0,θ0}(2π(J+K_M)/m) / g_{c0,θ0}(2π(J+K_m)/m) ),

which is A_m + B_m + C_m given in (5.11)-(5.13).
Note that 2π(J+K_M)/m, 2π(J+K_m)/m, 2π(J+S_M)/m and 2π(J+S_m)/m converge to (π/2)1_d as m → ∞. Note also that the convergence of 2π(J+S_M)/m and 2π(J+S_m)/m holds uniformly in θ1 on Θ, because h → 0. The continuity of g_{c0,θ} in Lemma 1 implies that, as m → ∞,

log( g_{c0,θ0}(2π(J+S_m)/m) g_{c0,θ1}(2π(J+S_M)/m) / ( g_{c0,θ0}(2π(J+S_M)/m) g_{c0,θ1}(2π(J+S_m)/m) ) ) → 0   (5.16)

uniformly in θ1 on Θ; therefore sup_Θ |B_m| = o(1). Also, we have

m^{−(d−θ0)} Î_m^τ(2πJ/m) / g_{c0,θ0}(2π(J+K_M)/m) →_p 1,

since m^{−(d−θ0)} Î_m^τ(2πJ/m) / g_{c0,θ0}((π/2)1_d) converges to one in probability by Theorem 3 and g_{c0,θ0}(2π(J+K_M)/m) converges to g_{c0,θ0}((π/2)1_d). Together with

1 − g_{c0,θ0}(2π(J+K_M)/m) / g_{c0,θ0}(2π(J+K_m)/m) → 0,

this shows that C_m converges to zero in probability.

Theorem 13 (Egorov's theorem; Folland 1999). Suppose that ν(X) < ∞, and f₁, f₂, ... and f are measurable complex-valued functions on X such that f_n → f a.e. Then for every ϵ > 0 there exists E ⊆ X such that ν(E) < ϵ and f_n → f uniformly on E^c.

Proof of Theorem 5. Let (Ω, F, P) be the probability space on which the stationary Gaussian random field Z(s) is defined. To emphasize the dependence on m, we write θ̂_m instead of θ̂ in this proof. Note that we have

P( L(c0, θ̂_m) − L(c0, θ0) ≤ 0 ) = 1   (5.17)

for any positive integer m, by the definition of θ̂_m. We prove the theorem by deriving a contradiction to (5.17) when θ̂_m does not converge to θ0 in probability.

Suppose that θ̂_m does not converge to θ0 in probability. Then there exist ϵ > 0, δ > 0 and M₁ such that for m ≥ M₁,

P( |θ̂_m − θ0| > ϵ ) > δ.

We define D_m = {ω ∈ Ω : |θ̂_m(ω) − θ0| > ϵ}. By Lemma 3, we have

L(c0, θ̂_m) − L(c0, θ0) ≥ A_m + B_m + C_m,

where A_m, B_m and C_m are given in (5.11)-(5.13) with θ1 = θ̂_m.
Also, note that

A_m = h_m( m^{θ̂−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ̂}(2π(J+S_m)/m) ),

where h_m(·) is defined in Lemma 2 with

d_m = Î_m^τ(2πJ/m) / ( m^{d−θ0} g_{c0,θ0}(2π(J+K_M)/m) ),   (5.18)

and K_M is defined in Lemma 3.

We will show that there exist a subsequence {m_k} of {m} and a subset of D_{m_k} such that, for large enough m_k, A_{m_k} + B_{m_k} + C_{m_k} is bounded away from zero.

By Theorem 3 and the convergence of g_{c0,θ0}(2π(J+K_M)/m) to g_{c0,θ0}((π/2)1_d), we have d_m →_p 1. Then there exists a subsequence {m_k} of {m} such that d_{m_k} converges to one almost surely. By (5.15) in Lemma 3, the almost sure convergence of d_{m_k} implies that C_{m_k}, given in (5.13), converges to zero almost surely. To use Lemma 2, we need uniform convergence of d_{m_k}, which is obtained by Egorov's theorem (Folland 1999): there exists G_δ ⊂ Ω such that d_{m_k} and C_{m_k} converge uniformly on G_δ and P(G_δ) > 1 − δ/2.

On the other hand, there exists M₂, which does not depend on ω, such that for m_k ≥ M₂,

| m_k^{θ̂_{m_k}−θ0} g_{c0,θ0}(2π(J+S_{m_k})/m_k) / g_{c0,θ̂_{m_k}}(2π(J+S_{m_k})/m_k) − 1 | > 1/2   (5.19)

for all ω ∈ D_{m_k}, because of the uniform boundedness of g_{c0,θ0}/g_{c0,θ1}. Let H_{m_k} = D_{m_k} ∩ G_δ. Note that P(H_{m_k}) > δ/2 > 0 for m_k ≥ M₁. Then, by Lemma 2 with r = 1/2, there exist δ_r > 0 and M_r such that for m_k ≥ M_r,

A_{m_k} = −log( m_k^{θ̂−θ0} g_{c0,θ0}(2π(J+S_{m_k})/m_k) / g_{c0,θ̂}(2π(J+S_{m_k})/m_k) )
+ ( Î_{m_k}^τ(2πJ/m_k) / ( m_k^{d−θ0} g_{c0,θ0}(2π(J+K_M)/m_k) ) ) ( m_k^{θ̂−θ0} g_{c0,θ0}(2π(J+S_{m_k})/m_k) / g_{c0,θ̂}(2π(J+S_{m_k})/m_k) − 1 )
> δ_r   (5.20)

uniformly on H_{m_k}. Note here that M_r ≥ max{M₁, M₂}. By the uniform convergence of |B_m| on Θ shown in Lemma 3, there exists M₃ such that for m_k ≥ M₃,

|B_{m_k}| < δ_r/4   (5.21)

with θ1 = θ̂_{m_k}(ω), uniformly for ω ∈ Ω. The uniform convergence of C_{m_k} on G_δ allows us to find M₄ such that for m_k ≥ M₄,

|C_{m_k}| < δ_r/4   (5.22)

uniformly on H_{m_k}.
Therefore, for m_k ≥ max{M_r, M₃, M₄}, we have

A_{m_k} + B_{m_k} + C_{m_k} ≥ A_{m_k} − |B_{m_k}| − |C_{m_k}| > δ_r/2

on H_{m_k}, which leads to

L(c0, θ̂_{m_k}) − L(c0, θ0) > δ_r/2   (5.23)

on H_{m_k}. Since P(H_{m_k}) > δ/2 > 0, this contradicts (5.17), which completes the proof. Here, we do not need P(∩_k H_{m_k}) > 0, since (5.17) must hold for every m > 0.

To show (2.14), it is enough to show that m^{θ̂−θ0} →_p 1, which is equivalent to showing that

g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ̂}(2π(J+S_m)/m) →_p 1,   (5.24)

m^{θ̂−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ̂}(2π(J+S_m)/m) →_p 1.   (5.25)

(5.24) follows from the consistency of θ̂ and the continuity of g_{c0,θ} shown in Lemma 1.

To show (5.25), notice that we have

P( L(c0, θ̂) − L(c0, θ0) ≤ 0 ) = 1   (5.26)

for each m > 0 by the definition of θ̂, and we have

P( L(c0, θ̂) − L(c0, θ0) ≥ A_m + B_m + C_m ) = 1

by Lemma 3. Suppose that (5.25) does not hold. Then there exist r > 0, δ > 0 and M₁ such that

P( | m^{θ̂−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ̂}(2π(J+S_m)/m) − 1 | > r ) > δ

for all m ≥ M₁. On the other hand, there exists a subsequence {m_k} of {m} such that d_{m_k} → 1, B_m → 0 and C_m → 0 almost surely, where d_m is given in (5.18), and B_m and C_m are given in (5.12) and (5.13) with θ1 = θ̂. Then, by Egorov's theorem, there exists Ω_δ ⊂ Ω such that P(Ω_δ) > 1 − δ/2 and d_{m_k}, B_{m_k} and C_{m_k} converge uniformly on Ω_δ. As in Lemma 2, for a_{m_k}, a nonzero solution of h_{m_k}(b_{m_k}) = 0, where

b_m = m^{θ̂−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ̂}(2π(J+S_m)/m),

there exists M₂ such that |a_{m_k} − 1| ≤ r uniformly on Ω_δ for all m_k ≥ M₂. Now, define

D_m = { ω : | m^{θ̂−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ̂}(2π(J+S_m)/m) − 1 | > r }.   (5.27)

Note that P(D_{m_k} ∩ Ω_δ) ≥ δ/2 > 0 for all m_k ≥ max{M₁, M₂}. Similarly to the proof of Lemma 2, for each m_k ≥ max{M₁, M₂}, there exists δ_r > 0 such that A_{m_k} > δ_r for all ω ∈ D_{m_k} ∩ Ω_δ. This implies that P(A_{m_k} > δ_r) ≥ δ/2 for each m_k ≥ max{M₁, M₂}. Note that δ_r does not depend on m_k, as can be seen in Lemma 2.
Meanwhile, there exists M₃ such that for m_k ≥ M₃, |B_{m_k}| ≤ δ_r/4 and |C_{m_k}| ≤ δ_r/4 for all ω ∈ Ω_δ. Hence we have

P( L(c0, θ̂) − L(c0, θ0) > δ_r/2 ) ≥ δ/2

for m_k ≥ max{M₁, M₂, M₃}, which contradicts (5.26). Thus, (5.25) is proved.

Alternative proof of Theorem 5. To show the consistency of θ̂, for a given ϵ with 0 < ϵ < min{θ_U − θ0, θ0 − θ_L}/2, define Θ_ϵ = {θ : |θ − θ0| ≤ ϵ} and let Θ_ϵ^c be the complement of Θ_ϵ. Then, we have

P( θ̂ ∈ Θ_ϵ^c ∩ Θ ) = P( inf_{Θ_ϵ^c ∩ Θ} L(c0, θ) ≤ inf_{Θ_ϵ ∩ Θ} L(c0, θ) )
≤ P( inf_{Θ_ϵ^c ∩ Θ} ( L(c0, θ) − L(c0, θ0) ) ≤ 0 ).

By Lemma 3, we also have

inf_{Θ_ϵ^c ∩ Θ} ( L(c0, θ) − L(c0, θ0) ) ≥ inf_{Θ_ϵ^c ∩ Θ} ( A_m + B_m + C_m )
≥ inf_{Θ_ϵ^c ∩ Θ} ( A_m − |B_m| ) + C_m
≥ inf_{Θ_ϵ^c ∩ Θ} A_m − sup_Θ |B_m| + C_m,

where A_m, B_m and C_m are given in (5.11)-(5.13). Thus, to show the consistency of θ̂, it is enough to show that there exists δ > 0 such that

P( inf_{Θ_ϵ^c ∩ Θ} A_m + C_m > δ ) → 1,

since B_m is deterministic with sup_Θ |B_m| → 0 as m → ∞. We can write A_m as

A_m = h_m( m^{θ−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ}(2π(J+S_m)/m) ),

where h_m(·) is defined in Lemma 2 with

d_m = Î_m^τ(2πJ/m) / ( m^{d−θ0} g_{c0,θ0}(2π(J+K_M)/m) ),   (5.28)

and K_M is defined in Lemma 3. For θ ∈ Θ_ϵ^c ∩ Θ, if θ > θ0 + ϵ,

m^{θ−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ}(2π(J+S_m)/m) → ∞

as m → ∞, because of the uniform boundedness of g_{c0,θ0}/g_{c0,θ} shown in Lemma 1. Similarly, if θ < θ0 − ϵ,

m^{θ−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ}(2π(J+S_m)/m) → 0

as m → ∞. Thus, there exists M₁ such that for m ≥ M₁,

| m^{θ−θ0} g_{c0,θ0}(2π(J+S_m)/m) / g_{c0,θ}(2π(J+S_m)/m) − 1 | > 1/2   (5.29)

for all θ ∈ Θ_ϵ^c ∩ Θ, because of the uniform boundedness of g_{c0,θ0}/g_{c0,θ}.

By Theorem 12 in Lim and Stein (2008) and the convergence of g_{c0,θ0}(2π(J+K_M)/m) to g_{c0,θ0}((π/2)1_d), d_m →_p 1. Similarly, we can show that C_m →_p 0.
Then there exists a $\delta > 0$ such that
$$ P\left( \inf_{\Theta_\epsilon^c \cap \Theta} A_m + C_m > \delta \right) \longrightarrow 1 \tag{5.30} $$
by Lemma 2 with $r = 1/2$ and the fact that the randomness of $A_m$ and $C_m$ comes from the same quantity $d_m$. This completes the proof of (2.13).

To prove Theorem 6, we need the following lemma.

Lemma 4. Under the conditions of Theorem 5, let $\eta = d(1-\gamma)/2$. We have

(a)
$$ m^\eta \left[ \sum_{K \in T_m} W_h(K)\, \frac{I^\tau_m(2\pi(J + K)/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi(J + K)/m)} - 1 \right] \xrightarrow{d} N\left( 0,\ \frac{\Lambda_2}{\Lambda_1^2} \left( \frac{2\pi}{C} \right)^d \right), \tag{5.31} $$

(b)
$$ \sum_{K \in T_m} W_h(K) \left( 1 - \frac{I^\tau_m(2\pi(J + K)/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi(J + K)/m)} \right) \frac{\dot g_{c_0,\theta_0}(2\pi(J + K)/m)}{g_{c_0,\theta_0}(2\pi(J + K)/m)} = O_p(m^{-\eta}). \tag{5.32} $$

Proof. To prove (5.31), we find the asymptotic distribution of its lower and upper bounds. It can easily be shown that
$$ LB_m \le m^\eta \left[ \sum_{K \in T_m} W_h(K)\, \frac{I^\tau_m(2\pi(J + K)/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi(J + K)/m)} - 1 \right] \le UB_m, $$
where
$$ LB_m = m^\eta \left( \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi(J + K_M)/m)} - 1 \right), \tag{5.33} $$
$$ UB_m = m^\eta \left( \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi(J + K_m)/m)} - 1 \right), \tag{5.34} $$
with $K_M$ and $K_m$ as given in Lemma 3. We rewrite $LB_m$ as
$$ LB_m = m^\eta \left[ \left( \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)} - 1 \right) \frac{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}{g_{c_0,\theta_0}(2\pi(J + K_M)/m)} + \frac{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}{g_{c_0,\theta_0}(2\pi(J + K_M)/m)} - 1 \right]. $$
By Lemma 1 and $\gamma > d/(d+2)$, we have
$$ \frac{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}{g_{c_0,\theta_0}(2\pi(J + K_M)/m)} \longrightarrow 1, \qquad m^\eta \left( \frac{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}{g_{c_0,\theta_0}(2\pi(J + K_M)/m)} - 1 \right) \longrightarrow 0. $$
Thus, by Theorem 3,
$$ LB_m \xrightarrow{d} N\left( 0,\ \frac{\Lambda_2}{\Lambda_1^2} \left( \frac{2\pi}{C} \right)^d \right). $$
Similarly, we can show
$$ UB_m \xrightarrow{d} N\left( 0,\ \frac{\Lambda_2}{\Lambda_1^2} \left( \frac{2\pi}{C} \right)^d \right). $$
The lower and upper bounds converge to the same distribution, which implies (5.31).

To show (5.32), we rewrite the LHS of (5.32) as
$$ \sum_{K \in T_m} W_h(K) \left( 1 - \frac{I^\tau_m(2\pi(J + K)/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi(J + K)/m)} \right) \frac{\dot g_{c_0,\theta_0}(2\pi(J + K)/m)}{g_{c_0,\theta_0}(2\pi(J + K)/m)} $$
$$ = \sum_{K \in T_m} W_h(K) \left[ \frac{\dot g_{c_0,\theta_0}(2\pi(J + K)/m)}{g_{c_0,\theta_0}(2\pi(J + K)/m)} - \frac{\dot g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)} \right] - \sum_{K \in T_m} W_h(K)\, \frac{I^\tau_m(2\pi(J + K)/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi(J + K)/m)}\, \frac{\dot g_{c_0,\theta_0}(2\pi(J + K)/m)}{g_{c_0,\theta_0}(2\pi(J + K)/m)} + \frac{\dot g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}. $$
By Lemma 1 and $\gamma > d/(d+2)$, we can show that
$$ m^\eta \sum_{K \in T_m} W_h(K) \left[ \frac{\dot g_{c_0,\theta_0}(2\pi(J + K)/m)}{g_{c_0,\theta_0}(2\pi(J + K)/m)} - \frac{\dot g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)} \right] \longrightarrow 0. $$
Also, it can easily be shown that
$$ LB_m \le \sum_{K \in T_m} W_h(K)\, \frac{I^\tau_m(2\pi(J + K)/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi(J + K)/m)}\, \frac{\dot g_{c_0,\theta_0}(2\pi(J + K)/m)}{g_{c_0,\theta_0}(2\pi(J + K)/m)} \le UB_m, $$
where
$$ LB_m = \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}\; \frac{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)\, \dot g_{c_0,\theta_0}(2\pi(J + P_m)/m)}{g^2_{c_0,\theta_0}(2\pi(J + P_m)/m)}, \qquad UB_m = \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}\; \frac{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)\, \dot g_{c_0,\theta_0}(2\pi(J + P_M)/m)}{g^2_{c_0,\theta_0}(2\pi(J + P_M)/m)}, $$
with
$$ P_M = \arg\max_{\{K \in T_m,\, W_h(K) \neq 0\}} \frac{\dot g_{c_0,\theta_0}(2\pi(J + K)/m)}{g^2_{c_0,\theta_0}(2\pi(J + K)/m)}, \qquad P_m = \arg\min_{\{K \in T_m,\, W_h(K) \neq 0\}} \frac{\dot g_{c_0,\theta_0}(2\pi(J + K)/m)}{g^2_{c_0,\theta_0}(2\pi(J + K)/m)}. $$
By Lemma 1, $\gamma > d/(d+2)$ and Theorem 3, we can show that
$$ m^\eta \left( LB_m - \frac{\dot g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)} \right) \xrightarrow{d} N\left( 0,\ \left( \frac{\dot g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)} \right)^2 \frac{\Lambda_2}{\Lambda_1^2} \left( \frac{2\pi}{C} \right)^d \right), $$
and the same limit holds for $m^\eta \left( UB_m - \dot g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)/g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d) \right)$. This completes the proof of (5.32).

Proof of Theorem 6. Let $\dot L = \partial L/\partial\theta$ and $\ddot L = \partial^2 L/\partial\theta^2$. To show the asymptotic distribution of $\hat\theta$, we consider the Taylor expansion of $\dot L(c_0, \hat\theta)$ around $\theta_0$:
$$ \dot L(c_0, \hat\theta) = \dot L(c_0, \theta_0) + \ddot L(c_0, \bar\theta)(\hat\theta - \theta_0), $$
where $\bar\theta$ lies on the line segment between $\hat\theta$ and $\theta_0$. Since $\dot L(c_0, \hat\theta) = 0$, we have
$$ \log(m)\, m^\eta (\hat\theta - \theta_0) = -\log(m)\, m^\eta \left( \ddot L(c_0, \bar\theta) \right)^{-1} \dot L(c_0, \theta_0). $$
Thus, it is enough to show
$$ (\log(m))^{-1} m^\eta\, \dot L(c_0, \theta_0) \xrightarrow{d} N\left( 0,\ \frac{\Lambda_2}{\Lambda_1^2} \left( \frac{2\pi}{C} \right)^d \right), \tag{5.35} $$
$$ (\log(m))^{-2}\, \ddot L(c_0, \bar\theta) \xrightarrow{p} 1. \tag{5.36} $$
Since
$$ \dot L(c_0, \theta_0) = -\log(m) + \sum_{K \in T_m} W_h(K)\, \frac{\dot g_{c_0,\theta_0}(2\pi(J + K)/m)}{g_{c_0,\theta_0}(2\pi(J + K)/m)} - \sum_{K \in T_m} W_h(K)\, I^\tau_m(2\pi(J + K)/m)\, \frac{-\log(m)\, m^{d-\theta_0} g_{c_0,\theta_0}(2\pi(J + K)/m) + m^{d-\theta_0} \dot g_{c_0,\theta_0}(2\pi(J + K)/m)}{\left( m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi(J + K)/m) \right)^2} $$
$$ = \log(m) \left[ \sum_{K \in T_m} W_h(K)\, \frac{I^\tau_m(2\pi(J + K)/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi(J + K)/m)} - 1 \right] + \sum_{K \in T_m} W_h(K) \left( 1 - \frac{I^\tau_m(2\pi(J + K)/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi(J + K)/m)} \right) \frac{\dot g_{c_0,\theta_0}(2\pi(J + K)/m)}{g_{c_0,\theta_0}(2\pi(J + K)/m)}, $$
we see that (5.35) follows from Lemma 4.

Next we prove (5.36). After some simplification, we have
$$ \ddot L(c_0, \bar\theta) = (\log(m))^2 \sum_{K \in T_m} W_h(K)\, \frac{I^\tau_m(2\pi(J + K)/m)}{m^{d-\bar\theta}\, g_{c_0,\bar\theta}(2\pi(J + K)/m)} - 2\log(m) \sum_{K \in T_m} W_h(K)\, \frac{I^\tau_m(2\pi(J + K)/m)\, \dot g_{c_0,\bar\theta}(2\pi(J + K)/m)}{m^{d-\bar\theta}\, g^2_{c_0,\bar\theta}(2\pi(J + K)/m)} $$
$$ + 2 \sum_{K \in T_m} W_h(K)\, \frac{I^\tau_m(2\pi(J + K)/m)\, \dot g^2_{c_0,\bar\theta}(2\pi(J + K)/m)}{m^{d-\bar\theta}\, g^3_{c_0,\bar\theta}(2\pi(J + K)/m)} + \sum_{K \in T_m} W_h(K) \left( 1 - \frac{I^\tau_m(2\pi(J + K)/m)}{m^{d-\bar\theta}\, g_{c_0,\bar\theta}(2\pi(J + K)/m)} \right) \frac{\ddot g_{c_0,\bar\theta}(2\pi(J + K)/m)}{g_{c_0,\bar\theta}(2\pi(J + K)/m)} - \sum_{K \in T_m} W_h(K)\, \frac{\dot g^2_{c_0,\bar\theta}(2\pi(J + K)/m)}{g^2_{c_0,\bar\theta}(2\pi(J + K)/m)} =: E_1 + E_2, $$
where $E_1$ is the first term (the one with $(\log(m))^2$) and $E_2$ collects the last four terms in the expression of $\ddot L(c_0, \bar\theta)$. First, we want to show that
$$ (\log(m))^{-2} E_1 \xrightarrow{p} 1. \tag{5.37} $$
It can easily be shown that $LB_m \le (\log(m))^{-2} E_1 \le UB_m$, where
$$ LB_m = \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}\; m^{\bar\theta - \theta_0}\, \frac{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}{g_{c_0,\bar\theta}(2\pi(J + P_M)/m)}, \qquad UB_m = \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}\; m^{\bar\theta - \theta_0}\, \frac{g_{c_0,\theta_0}((\pi/2)\mathbf{1}_d)}{g_{c_0,\bar\theta}(2\pi(J + P_m)/m)}, $$
with
$$ P_M = \arg\max_{\{K \in T_m,\, W_h(K) \neq 0\}} g_{c_0,\bar\theta}(2\pi(J + K)/m), \qquad P_m = \arg\min_{\{K \in T_m,\, W_h(K) \neq 0\}} g_{c_0,\bar\theta}(2\pi(J + K)/m). $$
By Theorem 3, (2.14) in Theorem 5 and Lemma 1, we can show that both $LB_m$ and $UB_m$ converge to one in probability, which in turn implies (5.37). In a similar way, we can show that $(\log(m))^{-1} E_2 = O_p(1)$. Together with (5.37), this proves (5.36), which completes the proof.
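Every contradiction argument above runs through the function $h_m(x) = -\log(x) + d_m(x-1)$ of Lemma 2, which Lemma 5 below extends: $h_m$ is convex, vanishes at $x = 1$, attains its minimum at $x = 1/d_m$, and is bounded away from zero once $x$ is bounded away from $1$. A small numerical sketch of these properties (the value of $d_m$ is hypothetical, chosen only for illustration):

```python
import math

def h(x, d_m):
    """h_m(x) = -log(x) + d_m*(x - 1), the convex function of Lemmas 2 and 5."""
    return -math.log(x) + d_m * (x - 1)

d_m = 1.05  # hypothetical member of a sequence with limit d close to 1

# h_m(1) = 0 for every d_m
assert abs(h(1.0, d_m)) < 1e-12
# the minimum is at x = 1/d_m, with value log(d_m) + 1 - d_m <= 0
assert h(1 / d_m, d_m) <= 0
# away from x = 1, h_m grows without bound (property (i) of f_c in Lemma 5)
assert h(0.01, d_m) > 1 and h(100.0, d_m) > 1
```

The assertions mirror properties (i)–(iii) used in the proofs: the only region where $h_m$ can be small is a neighborhood of $x = 1$, which is exactly why forcing the ratio $b_m$ away from $1$ forces $A_m$ above a fixed $\delta_r$.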
In order to prove Theorem 7, we extend Lemma 2 to a more general situation.

Lemma 5. Consider a function $h_m(x) = -\log(x) + d_m(x - 1)$, $x > 0$, where $\{d_m\}$ is a sequence of positive numbers such that $d_m \to d > 0$ as $m \to \infty$. Then there exist $r_l \in (0, 1)$, $r_u \in (1, \infty)$ and $M$ such that for all $m \ge M$, $h_m(x) > 1$ for all $x \in (0, r_l] \cup [r_u, \infty)$.

Proof. Since $d_m \to d > 0$, for every $\epsilon \in (0, d)$ there exists $M$ such that for all $m \ge M$, $|d_m - d| < \epsilon$, that is, $d - \epsilon < d_m < d + \epsilon$. For any fixed $c > 0$, the function $f_c(x) = -\log(x) + c(x - 1)$, $x > 0$, has the following properties:

(i) $f_c(x) \to \infty$ as $x \to 0^+$ or $x \to \infty$.

(ii) $f_c'(x) = -\frac{1}{x} + c$, so $f_c'(x) = 0 \Leftrightarrow x = \frac{1}{c}$; moreover, $f_c'(x) < 0$ if $x < \frac{1}{c}$ and $f_c'(x) > 0$ if $x > \frac{1}{c}$.

(iii) $f_c$ attains its minimum at $x = \frac{1}{c}$ and $f_c(\frac{1}{c}) \le 0$ ($f_c(\frac{1}{c}) < 0$ if $c \neq 1$; otherwise $f_1(1) = 0$).

Hence we can find $x_1 < \frac{1}{c} < x_2$ such that $f_c(x) \ge 1$ if $0 < x \le x_1$ or $x \ge x_2$. Now we apply the above facts with $c = d - \epsilon$ and $c = d + \epsilon$ to get the following:

(a) If $0 < x \le x_1$, then $h_m(x) = -\log(x) + d_m(x - 1) \ge -\log(x) + (d + \epsilon)(x - 1) \ge 1$.

(b) If $x \ge x_2$, then $h_m(x) = -\log(x) + d_m(x - 1) \ge -\log(x) + (d - \epsilon)(x - 1) \ge 1$.

This proves the Lemma.

To prove Theorem 7, we first find a lower bound for $L(c^*, \theta_1) - L(c^*, \theta_0)$. The construction of this lower bound follows by replacing $c_0$ in (5.11), (5.12) and (5.13) of Lemma 3 with $c^*$. The lower bound again consists of three terms, two of which are dominated by the third.

Lemma 6. For a positive integer $m$ and any $\theta_1 \in \Theta$, we have
$$ L(c^*, \theta_1) - L(c^*, \theta_0) \ge A_m + B_m + C_m, $$
where
$$ A_m = -\log\left( m^{\theta_1 - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi(J + S_m)/m)}{g_{c^*,\theta_1}(2\pi(J + S_m)/m)} \right) + \frac{\hat I^\delta_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c^*,\theta_0}(2\pi(J + K_M)/m)} \left( m^{\theta_1 - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi(J + S_m)/m)}{g_{c^*,\theta_1}(2\pi(J + S_m)/m)} - 1 \right), \tag{5.38} $$
$$ B_m = \log\left( \frac{g_{c^*,\theta_0}(2\pi(J + S_m)/m)\; g_{c^*,\theta_1}(2\pi(J + S_M)/m)}{g_{c^*,\theta_0}(2\pi(J + S_M)/m)\; g_{c^*,\theta_1}(2\pi(J + S_m)/m)} \right), \tag{5.39} $$
$$ C_m = \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c^*,\theta_0}(2\pi(J + K_M)/m)} \left( 1 - \frac{g_{c^*,\theta_0}(2\pi(J + K_M)/m)}{g_{c^*,\theta_0}(2\pi(J + K_m)/m)} \right). \tag{5.40} $$
In (5.38)–(5.40), $K_M$, $K_m$, $S_M$ and $S_m$ are defined as
$$ K_M = \arg\max_{\{K \in T_m,\, W_h(K) \neq 0\}} g_{c^*,\theta_0}(2\pi(J + K)/m), \qquad K_m = \arg\min_{\{K \in T_m,\, W_h(K) \neq 0\}} g_{c^*,\theta_0}(2\pi(J + K)/m), $$
$$ S_M = \arg\max_{\{K \in T_m,\, W_h(K) \neq 0\}} \log\left( \frac{g_{c^*,\theta_0}(2\pi(J + K)/m)}{g_{c^*,\theta_1}(2\pi(J + K)/m)} \right), \qquad S_m = \arg\min_{\{K \in T_m,\, W_h(K) \neq 0\}} \frac{g_{c^*,\theta_0}(2\pi(J + K)/m)}{g_{c^*,\theta_1}(2\pi(J + K)/m)}. $$
Furthermore,
$$ \sup_{\theta \in \Theta} |B_m| = o(1), \tag{5.41} $$
$$ C_m = o_p(1), \tag{5.42} $$
where (5.42) holds under the conditions of Theorem 5.

Proof of Lemma 6. The proof proceeds in the same way as that of Lemma 3, so we omit the details.

Proof of Theorem 7. Let $(\Omega, \mathcal{F}, P)$ be the probability space on which the stationary Gaussian random field $Z(s)$ is defined. To emphasize the dependence on $m$, we write $\hat\theta_m$ instead of $\hat\theta$ in this proof. Note that we have
$$ P\left( L(c^*, \hat\theta_m) - L(c^*, \theta_0) \le 0 \right) = 1 \tag{5.43} $$
for any positive integer $m$, by the definition of $\hat\theta_m$. We prove the theorem by deriving a contradiction to (5.43) when $\hat\theta_m$ does not converge to $\theta_0$ in probability.

Suppose that $\hat\theta_m$ does not converge to $\theta_0$ in probability. Then there exist $\epsilon > 0$, $\delta > 0$ and $M_1$ such that for $m \ge M_1$,
$$ P(|\hat\theta_m - \theta_0| > \epsilon) > \delta. $$
We define $D_m = \{\omega \in \Omega : |\hat\theta_m - \theta_0| > \epsilon\}$. By Lemma 6, we have
$$ L(c^*, \hat\theta_m) - L(c^*, \theta_0) \ge A_m + B_m + C_m, $$
where $A_m$, $B_m$ and $C_m$ are given in (5.38)–(5.40) with $\theta_1 = \hat\theta$. Also, note that
$$ A_m = h_m\left( m^{\hat\theta - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi(J + S_m)/m)}{g_{c^*,\hat\theta}(2\pi(J + S_m)/m)} \right), $$
where $h_m(\cdot)$ is defined in Lemma 5 with
$$ d_m = \frac{\hat I^\delta_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c^*,\theta_0}(2\pi(J + K_M)/m)}, \tag{5.44} $$
where $K_M$ is defined in Lemma 6. We are going to show that there exist a subsequence $\{m_k\}$ of $\{m\}$ and a subset of $D_{m_k}$ on which, for large enough $m_k$, $A_{m_k} + B_{m_k} + C_{m_k}$ is bounded away from zero.

By Theorem 3 and the convergence of $g_{c^*,\theta_0}(2\pi(J + K_M)/m)$ to $g_{c^*,\theta_0}((\pi/2)\mathbf{1}_d)$, we have $d_m \xrightarrow{p} d = c_0/c^*$.
Then there exists a subsequence $\{m_k\}$ of $\{m\}$ such that $d_{m_k}$ converges to $d$ almost surely. By (5.17) in Lemma 3, almost sure convergence of $d_{m_k}$ implies that $C_{m_k}$ given in (5.40) converges to zero almost surely. To use Lemma 5, we need uniform convergence of $d_{m_k}$, which is obtained by Egorov's Theorem (Folland, 1999). By Egorov's Theorem, there exists $G_\delta \subset \Omega$ such that $d_{m_k}$ and $C_{m_k}$ converge uniformly on $G_\delta$ and $P(G_\delta) > 1 - \delta/2$.

On the other hand, there exists an $M_2$, which does not depend on $\omega$, such that for $m_k \ge M_2$,
$$ m_k^{\hat\theta_{m_k} - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi(J + S_{m_k})/m_k)}{g_{c^*,\hat\theta_{m_k}}(2\pi(J + S_{m_k})/m_k)} \tag{5.45} $$
falls outside $(r_l, r_u)$ for all $\omega \in D_{m_k}$, because of the uniform boundedness of $g_{c^*,\theta_0}/g_{c^*,\theta_1}$.

Let $H_{m_k} = D_{m_k} \cap G_\delta$. Note that $P(H_{m_k}) > \delta/2 > 0$ for $m_k \ge M_1$. Then, by Lemma 5, there exist $\delta_r > 0$ and $M_r$ such that for $m_k \ge M_r$,
$$ A_{m_k} = -\log\left( m_k^{\hat\theta_{m_k} - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi(J + S_{m_k})/m_k)}{g_{c^*,\hat\theta_{m_k}}(2\pi(J + S_{m_k})/m_k)} \right) + \frac{\hat I^\delta_{m_k}(2\pi J/m_k)}{m_k^{d-\theta_0}\, g_{c^*,\theta_0}(2\pi(J + K_M)/m_k)} \left( m_k^{\hat\theta_{m_k} - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi(J + S_{m_k})/m_k)}{g_{c^*,\hat\theta_{m_k}}(2\pi(J + S_{m_k})/m_k)} - 1 \right) > \delta_r \tag{5.46} $$
uniformly on $H_{m_k}$. Note here that $M_r \ge \max\{M_1, M_2\}$. By the uniform convergence of $|B_m|$ on $\Theta$ shown in Lemma 6, there exists an $M_3$ such that for $m_k \ge M_3$,
$$ |B_{m_k}| < \frac{\delta_r}{4} \tag{5.47} $$
with $\theta_1 = \hat\theta_{m_k}(\omega)$, uniformly in $\omega \in \Omega$. The uniform convergence of $C_{m_k}$ on $G_\delta$ allows us to find $M_4$ such that for $m_k \ge M_4$,
$$ |C_{m_k}| < \frac{\delta_r}{4} \tag{5.48} $$
uniformly on $H_{m_k}$. Therefore, for $m_k \ge \max\{M_r, M_3, M_4\}$, we have $A_{m_k} + B_{m_k} + C_{m_k} \ge A_{m_k} - |B_{m_k}| - |C_{m_k}| > \delta_r/2$ on $H_{m_k}$, which leads to
$$ L(c^*, \hat\theta_{m_k}) - L(c^*, \theta_0) > \frac{\delta_r}{2} \tag{5.49} $$
on $H_{m_k}$. Since $P(H_{m_k}) > \delta/2 > 0$, this contradicts (5.43), which completes the proof. Here we do not need $P(\cap_k H_{m_k}) > 0$, since (5.43) must hold for every $m > 0$.

(2.17) follows from
$$ \lim_{m \to \infty} P\left( m^{\hat\theta - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi(J + S_m)/m)}{g_{c^*,\hat\theta}(2\pi(J + S_m)/m)} \in (r_l, r_u) \right) = 1; $$
otherwise, the same contradiction to (5.43) would be found.
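For intuition on why the minimizer $\hat\theta$ of the weighted Whittle-type objective sits at $\theta_0$: the inequality $-\log x + x - 1 \ge 0$ behind Lemma 2 implies that, if the normalized periodogram ordinates were exactly equal to their limiting values, the objective $\sum_K W_h(K)\,[\log(m^{d-\theta} g_{c_0,\theta}(\lambda_K)) + I^\tau_m(\lambda_K)/(m^{d-\theta} g_{c_0,\theta}(\lambda_K))]$ (the function whose $\theta$-derivative appears in the proof of Theorem 6) would be minimized exactly at $\theta_0$. A toy numerical sketch of this fact, assuming the idealized tail model $g_{c,\theta}(\lambda) = c\,|\lambda|^{-\theta}$ and noiseless "periodogram" values; all concrete numbers below are hypothetical and for illustration only:

```python
import numpy as np

def g(lam, c, theta):
    # hypothetical tail model g_{c,theta}(lambda) = c * |lambda|^(-theta)
    return c * np.abs(lam) ** (-theta)

def objective(theta, lam, I, W, m, d, c0):
    # weighted Whittle-type objective: sum_K W(K) [ log(scale) + I(K)/scale ]
    scale = m ** (d - theta) * g(lam, c0, theta)
    return np.sum(W * (np.log(scale) + I / scale))

m, d = 64, 1
c0, theta0 = 2.0, 1.5
lam = 2 * np.pi * np.arange(5, 15) / m      # frequencies near a fixed J (hypothetical choice)
W = np.ones_like(lam) / lam.size            # uniform weights W_h(K), summing to one
I = m ** (d - theta0) * g(lam, c0, theta0)  # idealized ordinates: equal to their limit, no noise

grid = np.linspace(1.0, 2.0, 201)
theta_hat = grid[np.argmin([objective(t, lam, I, W, m, d, c0) for t in grid])]
# with noiseless ordinates, each summand of L(theta) - L(theta0) is -log(rho) + rho - 1 >= 0,
# so the grid minimizer sits at theta0
assert abs(theta_hat - theta0) < 1e-8
```

With real data the ordinates fluctuate around their limits, which is exactly what the $A_m + B_m + C_m$ decomposition and the convergence of $d_m$ control in the proofs above.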
To prove (2.29), let
$$ K_M = \arg\max_{\{K \in T_m,\, W_h(K) \neq 0\}} g_{\hat\theta}(2\pi(J + K)/m), \qquad K_m = \arg\min_{\{K \in T_m,\, W_h(K) \neq 0\}} g_{\hat\theta}(2\pi(J + K)/m). $$
Write
$$ \hat c = \sum_{K \in T_m} W_h(K)\, \frac{I^\tau_m(2\pi(J + K)/m)}{m^{d-\hat\theta}\, g_{\hat\theta}(2\pi(J + K)/m)}. $$
Then
$$ c\, m^{\hat\theta - \theta}\, \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta}\, g_{\theta,c}(2\pi(J + K_M)/m)}\, \frac{g_\theta(2\pi(J + K_M)/m)}{g_{\hat\theta}(2\pi(J + K_M)/m)} \le \hat c \le c\, m^{\hat\theta - \theta}\, \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta}\, g_{\theta,c}(2\pi(J + K_m)/m)}\, \frac{g_\theta(2\pi(J + K_m)/m)}{g_{\hat\theta}(2\pi(J + K_m)/m)}. $$
By Theorem 3 and the convergence of $g_{c,\theta}(2\pi(J + K_m)/m)$ and $g_{c,\theta}(2\pi(J + K_M)/m)$ to $g_{c,\theta}((\pi/2)\mathbf{1}_d)$,
$$ \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta}\, g_{\theta,c}(2\pi(J + K_M)/m)} \xrightarrow{p} 1 \qquad \text{and} \qquad \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta}\, g_{\theta,c}(2\pi(J + K_m)/m)} \xrightarrow{p} 1. $$
Corollary 1 then follows, because $\hat\theta - \theta = O_p(\log(m)^{-1})$ and $g_{c,\theta}$ is bounded.

5.2.3 Proofs of Theorems in Section 2.4

The ideas used to verify the theoretical results for the second estimator, defined in Section 2.4, are similar to those of Section 2.3. The proofs are simpler and follow from Theorem 3 and Lemma 2.

Proof of Theorem 8. Compared with the first estimator in Section 2.3, the consistency of $\hat c$ is obtained directly, because $m^{-(d-\theta_0)}\, \hat I^\tau_m(2\pi J/m)$ converges to $g_{c,\theta_0}((\pi/2)\mathbf{1}_d)$ in probability by Theorem 3:
$$ \hat c = \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}\, g_{\theta_0}(2\pi J/m)} \xrightarrow{p} c. $$
The asymptotic distribution of $\hat c$ comes from Theorem 3:
$$ m^\eta \left( \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}} - g_{c,\theta_0}((\pi/2)\mathbf{1}_d) \right) \xrightarrow{d} N\left( 0,\ \frac{\Lambda_2}{\Lambda_1^2} \left( \frac{2\pi}{C} \right)^d g^2_{c,\theta_0}((\pi/2)\mathbf{1}_d) \right). \tag{5.50} $$

Proof of Theorem 9. For all $\theta_1$ and $\theta_2$ in $\Theta$,
$$ R(c_0, \theta_1) - R(c_0, \theta_2) = -\log\left( m^{\theta_1 - \theta_2}\, \frac{g_{c_0,\theta_2}(2\pi J/m)}{g_{c_0,\theta_1}(2\pi J/m)} \right) + \frac{\hat I^\delta_m(2\pi J/m)}{m^{d-\theta_2}\, g_{c_0,\theta_2}(2\pi J/m)} \left( m^{\theta_1 - \theta_2}\, \frac{g_{c_0,\theta_2}(2\pi J/m)}{g_{c_0,\theta_1}(2\pi J/m)} - 1 \right). $$
We again suppose that $Z(s)$ is a stationary Gaussian random field defined on the probability space $(\Omega, \mathcal{F}, P)$, and we write $\hat\theta_m$ for $\hat\theta$ in this proof. The main idea of the proof is to find a contradiction to
$$ P\left( R(c_0, \hat\theta_m) - R(c_0, \theta_0) \le 0 \right) = 1, \tag{5.51} $$
which holds for any positive integer $m$ by the definition of $\hat\theta_m$, when $\hat\theta_m$ does not converge to $\theta_0$ in probability.
Suppose that $\hat\theta_m$ does not converge to $\theta_0$ in probability. Then there exist $\epsilon > 0$, $\delta > 0$ and $M_1$ such that for $m \ge M_1$,
$$ P(|\hat\theta_m - \theta_0| > \epsilon) > \delta. $$
We define $D_m = \{\omega \in \Omega : |\hat\theta_m - \theta_0| > \epsilon\}$ and
$$ d_m = \frac{\hat I^\delta_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi J/m)}. \tag{5.52} $$
By Theorem 3, we know $d_m \xrightarrow{p} 1$. Then there exists a subsequence $\{m_k\}$ of $\{m\}$ such that $d_{m_k}$ converges to one almost surely. To use Lemma 2, we need uniform convergence of $d_{m_k}$, which is obtained by Egorov's Theorem (Folland, 1999). By Egorov's Theorem, there exists $G_\delta \subset \Omega$ such that $d_{m_k}$ converges uniformly on $G_\delta$ and $P(G_\delta) > 1 - \delta/2$. On the other hand, there exists an $M_2$, which does not depend on $\omega$, such that for $m_k \ge M_2$,
$$ \left| m_k^{\hat\theta_{m_k} - \theta_0}\, \frac{g_{c_0,\theta_0}(2\pi J/m_k)}{g_{c_0,\hat\theta_{m_k}}(2\pi J/m_k)} - 1 \right| > \frac{1}{2} \tag{5.53} $$
for all $\omega \in D_{m_k}$, because of the uniform boundedness of $g_{c_0,\theta_0}/g_{c_0,\theta_1}$.

Let $H_{m_k} = D_{m_k} \cap G_\delta$. Note that $P(H_{m_k}) > \delta/2 > 0$ for $m_k \ge M_1$. Then, by Lemma 2 with $r = 1/2$, there exist $\delta_r > 0$ and $M_r$ such that for $m_k \ge M_r$,
$$ -\log\left( m_k^{\hat\theta_{m_k} - \theta_0}\, \frac{g_{c_0,\theta_0}(2\pi J/m_k)}{g_{c_0,\hat\theta_{m_k}}(2\pi J/m_k)} \right) + \frac{\hat I^\delta_{m_k}(2\pi J/m_k)}{m_k^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi J/m_k)} \left( m_k^{\hat\theta_{m_k} - \theta_0}\, \frac{g_{c_0,\theta_0}(2\pi J/m_k)}{g_{c_0,\hat\theta_{m_k}}(2\pi J/m_k)} - 1 \right) > \delta_r \tag{5.54} $$
uniformly on $H_{m_k}$. Note here that $M_r \ge \max\{M_1, M_2\}$. Since $P(H_{m_k}) > \delta/2 > 0$, this contradicts (5.51), which completes the proof, because (5.51) holds for any $m > 0$.

To show (2.14), it is enough to show that $m^{\hat\theta - \theta_0} \xrightarrow{p} 1$, which is equivalent to showing
$$ m^{\hat\theta - \theta_0}\, \frac{g_{c_0,\theta_0}(2\pi J/m)}{g_{c_0,\hat\theta}(2\pi J/m)} \xrightarrow{p} 1, \tag{5.55} $$
because
$$ \frac{g_{c_0,\theta_0}(2\pi J/m)}{g_{c_0,\hat\theta}(2\pi J/m)} \xrightarrow{p} 1. \tag{5.56} $$
(5.56) follows from the consistency of $\hat\theta$ and the continuity of $g_{c_0,\theta}$ shown in Lemma 5.1. To show (5.55), notice that we have
$$ P\left( R(c_0, \hat\theta) - R(c_0, \theta_0) \le 0 \right) = 1 \tag{5.57} $$
for each $m > 0$ by the definition of $\hat\theta$. Suppose that (5.55) does not hold. Then there exist $r > 0$, $\delta > 0$ and $M_1$ such that
$$ P\left( \left| m^{\hat\theta - \theta_0}\, \frac{g_{c_0,\theta_0}(2\pi J/m)}{g_{c_0,\hat\theta}(2\pi J/m)} - 1 \right| > r \right) > \delta $$
for all $m \ge M_1$.
On the other hand, there exists a subsequence $\{m_k\}$ of $\{m\}$ such that $d_{m_k} \to 1$ almost surely. Then, by Egorov's Theorem, there exists $\Omega_\delta \subset \Omega$ such that $P(\Omega_\delta) > 1 - \delta/2$ and $d_{m_k}$ converges uniformly on $\Omega_\delta$. Now define
$$ D_m = \left\{ \omega : \left| m^{\hat\theta - \theta_0}\, \frac{g_{c_0,\theta_0}(2\pi J/m)}{g_{c_0,\hat\theta}(2\pi J/m)} - 1 \right| > r \right\}. \tag{5.58} $$
Note that $P(D_{m_k} \cap \Omega_\delta) \ge \delta/2 > 0$ for all $m_k \ge \max\{M_1, M_r\}$. Similarly to the proof of Lemma 2, for each $m_k \ge \max\{M_1, M_r\}$, there exists $\delta_r > 0$ such that $R(c_0, \hat\theta) - R(c_0, \theta_0) > \delta_r$ for all $\omega \in D_{m_k} \cap \Omega_\delta$. This implies that
$$ P\left( R(c_0, \hat\theta) - R(c_0, \theta_0) > \delta_r \right) \ge \delta/2 $$
for each $m_k \ge \max\{M_1, M_r\}$, which contradicts (5.57); thus (5.55) is proved. Note that $\delta_r$ does not depend on $m_k$, as can be seen in Lemma 2.

Proof of Theorem 10. Let $\dot R = \partial R/\partial\theta$ and $\ddot R = \partial^2 R/\partial\theta^2$. To show the asymptotic distribution of $\hat\theta$, we consider the Taylor expansion of $\dot R(c_0, \hat\theta)$ around $\theta_0$:
$$ \dot R(c_0, \hat\theta) = \dot R(c_0, \theta_0) + \ddot R(c_0, \bar\theta)(\hat\theta - \theta_0), $$
where $\bar\theta$ lies on the line segment between $\hat\theta$ and $\theta_0$. Since $\dot R(c_0, \hat\theta) = 0$, we have
$$ \log(m)\, m^\eta (\hat\theta - \theta_0) = -\log(m)\, m^\eta \left( \ddot R(c_0, \bar\theta) \right)^{-1} \dot R(c_0, \theta_0). $$
Thus, it is enough to show
$$ (\log(m))^{-1} m^\eta\, \dot R(c_0, \theta_0) \xrightarrow{d} N\left( 0,\ \frac{\Lambda_2}{\Lambda_1^2} \left( \frac{2\pi}{C} \right)^d \right), \tag{5.59} $$
$$ (\log(m))^{-2}\, \ddot R(c_0, \bar\theta) \xrightarrow{p} 1. \tag{5.60} $$
Since
$$ \dot R(c_0, \theta_0) = -\log(m) + \frac{\dot g_{c_0,\theta_0}(2\pi J/m)}{g_{c_0,\theta_0}(2\pi J/m)} - \hat I^\tau_m(2\pi J/m)\, \frac{-\log(m)\, m^{d-\theta_0} g_{c_0,\theta_0}(2\pi J/m) + m^{d-\theta_0} \dot g_{c_0,\theta_0}(2\pi J/m)}{\left( m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi J/m) \right)^2} $$
$$ = \log(m) \left( \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi J/m)} - 1 \right) + \left( 1 - \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c_0,\theta_0}(2\pi J/m)} \right) \frac{\dot g_{c_0,\theta_0}(2\pi J/m)}{g_{c_0,\theta_0}(2\pi J/m)}, $$
we see that (5.59) follows from Lemma 4. Next we prove (5.60).
After some simplification, we have
$$ \ddot R(c_0, \bar\theta) = (\log(m))^2\, \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\bar\theta}\, g_{c_0,\bar\theta}(2\pi J/m)} - 2\log(m)\, \frac{\hat I^\tau_m(2\pi J/m)\, \dot g_{c_0,\bar\theta}(2\pi J/m)}{m^{d-\bar\theta}\, g^2_{c_0,\bar\theta}(2\pi J/m)} + 2\, \frac{\hat I^\tau_m(2\pi J/m)\, \dot g^2_{c_0,\bar\theta}(2\pi J/m)}{m^{d-\bar\theta}\, g^3_{c_0,\bar\theta}(2\pi J/m)} + \left( 1 - \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\bar\theta}\, g_{c_0,\bar\theta}(2\pi J/m)} \right) \frac{\ddot g_{c_0,\bar\theta}(2\pi J/m)}{g_{c_0,\bar\theta}(2\pi J/m)} - \frac{\dot g^2_{c_0,\bar\theta}(2\pi J/m)}{g^2_{c_0,\bar\theta}(2\pi J/m)} =: E_1 + E_2, $$
where $E_1$ is the first term (the one with $(\log(m))^2$) and $E_2$ collects the last four terms in the expression of $\ddot R(c_0, \bar\theta)$. From Theorem 3, (2.14) in Theorem 5 and Lemma 1, we obtain
$$ (\log(m))^{-2} E_1 \xrightarrow{p} 1 \qquad \text{and} \qquad (\log(m))^{-1} E_2 = O_p(1), $$
which together prove (5.60).

Proof of Theorem 11. Let $(\Omega, \mathcal{F}, P)$ be the probability space on which the stationary Gaussian random field $Z(s)$ is defined. To emphasize the dependence on $m$, we write $\hat\theta_m$ instead of $\hat\theta$ in this proof. From the previous discussion,
$$ R(c^*, \hat\theta_m) - R(c^*, \theta_0) = -\log\left( m^{\hat\theta_m - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi J/m)}{g_{c^*,\hat\theta_m}(2\pi J/m)} \right) + \frac{\hat I^\delta_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c^*,\theta_0}(2\pi J/m)} \left( m^{\hat\theta_m - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi J/m)}{g_{c^*,\hat\theta_m}(2\pi J/m)} - 1 \right) $$
and
$$ P\left( R(c^*, \hat\theta_m) - R(c^*, \theta_0) \le 0 \right) = 1 \tag{5.61} $$
for any positive integer $m$, by the definition of $\hat\theta_m$. We prove the theorem by deriving a contradiction to (5.61) when $\hat\theta_m$ does not converge to $\theta_0$ in probability.

Suppose that $\hat\theta_m$ does not converge to $\theta_0$ in probability. Then there exist $\epsilon > 0$, $\delta > 0$ and $M_1$ such that for $m \ge M_1$,
$$ P(|\hat\theta_m - \theta_0| > \epsilon) > \delta. $$
We define $D_m = \{\omega \in \Omega : |\hat\theta_m - \theta_0| > \epsilon\}$ and
$$ d_m = \frac{\hat I^\delta_m(2\pi J/m)}{m^{d-\theta_0}\, g_{c^*,\theta_0}(2\pi J/m)}. \tag{5.62} $$
By Theorem 3, we have $d_m \xrightarrow{p} d = c_0/c^*$. Then there exists a subsequence $\{m_k\}$ of $\{m\}$ such that $d_{m_k}$ converges to $d$ almost surely. To use Lemma 5, we need uniform convergence of $d_{m_k}$, which is obtained by Egorov's Theorem (Folland, 1999). By Egorov's Theorem, there exists $G_\delta \subset \Omega$ such that $d_{m_k}$ converges uniformly on $G_\delta$ and $P(G_\delta) > 1 - \delta/2$.
On the other hand, there exists an $M_2$, which does not depend on $\omega$, such that for $m_k \ge M_2$,
$$ m_k^{\hat\theta_{m_k} - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi J/m_k)}{g_{c^*,\hat\theta_{m_k}}(2\pi J/m_k)} \tag{5.63} $$
falls outside $(r_l, r_u)$ for all $\omega \in D_{m_k}$, because of the uniform boundedness of $g_{c^*,\theta_0}/g_{c^*,\theta_1}$. Let $H_{m_k} = D_{m_k} \cap G_\delta$. Note that $P(H_{m_k}) > \delta/2 > 0$ for $m_k \ge M_1$. Then, by Lemma 5, there exist $\delta_r > 0$ and $M_r$ such that for $m_k \ge M_r$,
$$ R(c^*, \hat\theta_{m_k}) - R(c^*, \theta_0) = -\log\left( m_k^{\hat\theta_{m_k} - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi J/m_k)}{g_{c^*,\hat\theta_{m_k}}(2\pi J/m_k)} \right) + \frac{\hat I^\delta_{m_k}(2\pi J/m_k)}{m_k^{d-\theta_0}\, g_{c^*,\theta_0}(2\pi J/m_k)} \left( m_k^{\hat\theta_{m_k} - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi J/m_k)}{g_{c^*,\hat\theta_{m_k}}(2\pi J/m_k)} - 1 \right) \tag{5.64} $$
$$ > \delta_r \tag{5.65} $$
uniformly on $H_{m_k}$. Note here that $M_r \ge \max\{M_1, M_2\}$. This contradicts (5.61), which completes the proof. Here we do not need $P(\cap_k H_{m_k}) > 0$, since (5.61) must hold for every $m > 0$.

Write
$$ \hat c = \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\hat\theta}\, g_{\hat\theta}(2\pi J/m)}. $$
Then
$$ \hat c = c\, m^{\hat\theta - \theta}\, \frac{\hat I^\tau_m(2\pi J/m)}{m^{d-\theta}\, g_{\theta,c}(2\pi J/m)}\, \frac{g_\theta(2\pi J/m)}{g_{\hat\theta}(2\pi J/m)}. $$
(2.29) then follows from $\hat\theta - \theta = O_p(\log(m)^{-1})$, the boundedness of $g_{c,\theta}$, and Theorem 3.

Lemma 7. Consider a function $h_m(x) = -\log(x) + d_m(x - 1)$, where $d_m$ is positive and depends on the positive integer $m$. Assume that $d_m \to d > 1$ (or $d < 1$) as $m \to \infty$. Then there exists some $M$ such that for all $m \ge M$, $h_m(x) > 0$ for any $x > 1$ (respectively, $x < 1$).

Proof. (For $d > 1$.) As discussed above, $h_m(x)$ is a convex function on $(0, \infty)$ for any positive integer $m$ and is minimized at $x = 1/d_m$, with $h_m(1/d_m) \le 0$. Because $d_m \to d > 1$, there exists $M$ such that $d_m > 1$ if $m \ge M$. The two intersection points of $h_m$ with the $x$-axis are $1$ and $u_m < 1$. By the convexity of $h_m$, when $m \ge M$, $h_m(x) > 0$ if $x > 1$. The case $d < 1$ is analogous.

Proof of Theorem 12. Suppose that result (i) of Theorem 12 does not hold. Then there exist $\delta$ and $M_1$ such that
$$ P\left( \theta_0 < \hat\theta_m \right) > \delta $$
for $m > M_1$. By Theorem 3, we have $d_m \xrightarrow{p} d = c_0/c^*$. Then there exists a subsequence $\{m_k\}$ of $\{m\}$ such that $d_{m_k}$ converges to $d$ almost surely.
To use Lemma 7, we need uniform convergence of $d_{m_k}$, which is obtained by Egorov's Theorem (Folland, 1999). By Egorov's Theorem, there exists $G_\delta \subset \Omega$ such that $d_{m_k}$ converges uniformly on $G_\delta$ and $P(G_\delta) > 1 - \delta/2$. Assume $c^* < c_0$. Then $d = c_0/c^* > 1$, and by the uniform convergence there exists $M$ such that $d_{m_k} > 1$ when $m_k > M$. Assume that $\omega \in \Omega_{m_k} = \{\omega : \theta_0 < \hat\theta_{m_k}\}$. Then
$$ g_{c^*,\theta_0}(2\pi J/m) > g_{c^*,\hat\theta_{m_k}}(2\pi J/m) $$
because of the monotonicity of $g_{c^*,\theta}$ in $\theta$, and hence
$$ m_k^{\hat\theta_{m_k} - \theta_0}\, \frac{g_{c^*,\theta_0}(2\pi J/m_k)}{g_{c^*,\hat\theta_{m_k}}(2\pi J/m_k)} > 1 \tag{5.66} $$
for all $\omega \in \Omega_{m_k} \cap G_\delta$. Because $R(c^*, \hat\theta_{m_k}) - R(c^*, \theta_0) > 0$ on $\Omega_{m_k} \cap G_\delta$ by Lemma 7, and $P(\Omega_{m_k} \cap G_\delta) > 0$ when $m_k > M$, this contradicts (5.61). Result (ii) is proven in a similar way.

BIBLIOGRAPHY

[1] Adler, R. J. and Taylor, J. E. (2007). Random Fields and Geometry. Springer Monographs in Mathematics, Springer, New York.

[2] Boissy, Y., Bhattacharyya, B. B., Li, X. and Richardson, G. D. (2005). Parameter estimates for fractional autoregressive spatial processes. Ann. Statist. 33, 2553-2567.

[3] Chan, G., Hall, P. and Poskitt, D. S. (1995). Periodogram-based estimators of fractal properties. Ann. Statist. 23, 1684-1711.

[4] Chan, G. and Wood, A. T. A. (2000). Increment-based estimators of fractal dimension for two-dimensional surface data. Statist. Sinica 10, 343-376.

[5] Chan, G. and Wood, A. T. A. (2004). Estimation of fractal dimension for a class of non-Gaussian stationary processes and fields. Ann. Statist. 32, 1222-1260.

[6] Chen, H.-S., Simpson, D. G. and Ying, Z. (2000). Infill asymptotics for a stochastic process model with measurement error. Statist. Sinica 10, 141-156.

[7] Constantine, A. G. and Hall, P. (1994). Characterizing surface smoothness via estimation of effective fractal dimension. J. R. Statist. Soc. B 56, 97-113.

[8] Cressie, N. A. C. (1993). Statistics for Spatial Data (rev. ed.). John Wiley, New York.

[9] Du, J. (2009a). Asymptotic and computational methods in spatial statistics. (Ph.D. Thesis). Michigan State University, 1-111.

[10] Du, J., Zhang, H. and Mandrekar, V. S. (2009). Fixed-domain asymptotic properties of tapered maximum likelihood estimators. Ann. Statist. 100, 993-1028.

[11] Folland, G. B. (1999). Real Analysis. Wiley-Interscience, New York.

[12] Furrer, R., Genton, M. G. and Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics 15, 502-523.

[13] Guo, H., Lim, C. and Meerschaert, M. M. (2009). Local Whittle estimator for anisotropic random fields. J. Multivariate Anal. 100, 993-1028.

[14] Guyon, X. (1982). Parameter estimation for a stationary process on a d-dimensional lattice. Biometrika 69, 95-105.

[15] Guyon, X. (1995). Random Fields on a Network: Modeling, Statistics, and Applications. Springer-Verlag, New York.

[16] Ibragimov, I. A. and Rozanov, Y. A. (1978). Gaussian Random Processes. Springer, New York. MR0543837

[17] Kaufman, C., Schervish, M. and Nychka, D. (2008). Covariance tapering for likelihood-based estimation in large spatial datasets. J. Amer. Statist. Assoc. 103, 1545-1555.

[18] Loh, W.-L. (2005). Fixed-domain asymptotics for a subclass of Matern-type Gaussian random fields. Ann. Statist. 33, 2344-2394.

[19] Lim, C. and Stein, M. L. (2008). Properties of spatial cross-periodograms using fixed-domain asymptotics. J. Multivariate Anal. 99, 1962-1984.

[20] Mardia, K. V. and Marshall, R. J. (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71, 135-146.

[21] Robinson, P. M. (1995). Gaussian semiparametric estimation of long range dependence. Ann. Statist. 23, 1630-1661.

[22] Stein, M. L. (1988). Asymptotically efficient prediction of a random field with a misspecified covariance function. Ann. Statist. 16, 55-63.

[23] Stein, M. L. (1990). Uniform asymptotic optimality of linear predictions of a random field using an incorrect second-order structure. Ann. Statist. 18, 850-872.

[24] Stein, M. L. (1990). Bounds on the efficiency of linear predictions using an incorrect covariance function. Ann. Statist. 18, 1116-1138.

[25] Stein, M. L. (1993). A simple condition for asymptotic optimality of linear predictions of random fields. Statistics and Probability Letters 17, 399-404.

[26] Stein, M. L. (1995). Fixed-domain asymptotics for spatial periodograms. J. Amer. Statist. Assoc. 432, 1962-1984.

[27] Stein, M. L. (1999). Interpolation of Spatial Data. Springer, New York.

[28] Whittle, P. (1954). On stationary processes in the plane. Biometrika 41, 434-449.

[29] Xue, Y. and Xiao, Y. (2010). Fractal and smoothness properties of space-time Gaussian models. To appear in Frontiers Math.

[30] Yadrenko, M. (1983). Spectral Theory of Random Fields. Optimization Software, New York.

[31] Ying, Z. (1991). Asymptotic properties of a maximum likelihood estimator with data from a Gaussian process. J. Multivariate Anal. 36, 280-296.

[32] Ying, Z. (1993). Maximum likelihood estimation of parameters under a spatial sampling scheme. Ann. Statist. 21, 1567-1590.

[33] Zhang, H. (2004). Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. J. Amer. Statist. Assoc. 465, 250-261.

[34] Zhang, H. and Zimmerman, D. L. (2005). Towards reconciling two asymptotic frameworks in spatial statistics. Biometrika 92, 921-936.