NONSTATIONARY GAUSSIAN RANDOM FIELDS WITH APPLICATION TO SPACE AND SPACE–TIME MODELING

By Abolfazl Safikhani

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics – Doctor of Philosophy

2015

ABSTRACT

NONSTATIONARY GAUSSIAN RANDOM FIELDS WITH APPLICATION TO SPACE AND SPACE–TIME MODELING

By Abolfazl Safikhani

Space–time models have become increasingly popular in scientific studies. Gaussian random fields (GRFs) are an important tool in this field. Developing covariance structures for GRFs, parameter estimation, and prediction in big spatial datasets are central challenges in spatio–temporal modeling. My thesis consists of three main chapters dealing with these challenges.

In chapter 2, we investigate the problem of equivalence of GRFs with stationary increments and obtain sufficient conditions for equivalence in terms of the behavior of the spectral measures at infinity. The main results are then applied to a rich family of nonstationary space–time models with possible anisotropic behavior.

In chapter 3, we propose a nonstationary parametric model in which the underlying Gaussian random field may have different regularities in different directions, and thus can be used to model anisotropy. Using the theory of equivalence of Gaussian measures under nonstationary assumptions, strong consistency of the tapered likelihood-based estimator of the variance component under fixed-domain asymptotics is derived by placing mild conditions on the spectral behavior of the tapering covariance function.

In chapter 4, we merge the ideas of factor modeling and low-rank spatial process approximation to propose a class of hierarchical low-rank spatial factor models, which offer a rich and flexible option for modeling a large vector of outcomes observed at a large number of locations. A Markov chain Monte Carlo algorithm is developed for estimation, and the full posterior distributions are recovered in a Bayesian predictive framework.

Keywords: Space–time models, nonstationary Gaussian random fields, equivalence of Gaussian measures, covariance tapering, fixed-domain asymptotics, spatial factor modeling, Gaussian predictive processes, hierarchical Bayesian modeling

I dedicate this dissertation to my wonderful parents.

ACKNOWLEDGMENTS

I wish to express my deepest gratitude to my advisors Dr. Yimin Xiao and Dr. Alla Sikorskii. Their guidance, patience and thoughtfulness made this work possible. It was a privilege to work with them. I would like to thank my thesis committee members Drs. Andrew Finley, V. S. Mandrekar, and Chae Young Lim for their time and interest. I also appreciate the help and support of the Department of Statistics and Probability over the last five years.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1  Introduction
Chapter 2  Spectral Conditions for Equivalence of Gaussian Random Fields with Stationary Increments
  2.1 Introduction
  2.2 Preliminary
  2.3 Main Results
  2.4 Application
Chapter 3  Covariance Tapering for Anisotropic Nonstationary Gaussian Random Fields with Application to Large Scale Spatial Data Sets
  3.1 Introduction
  3.2 Preliminary
  3.3 Main Result
  3.4 Simulation Results
Chapter 4  Spatial factor models and dimension reducing Gaussian predictive processes for high dimensional remotely sensed data and forest variables
  4.1 Model
  4.2 Identifiability Issue
  4.3 Prior Specification
  4.4 Approximation using Gaussian predictive processes
  4.5 Data equation
  4.6 Computations
    4.6.1 Updating the mean part b
    4.6.2 Updating ω
    4.6.3 Updating the loading factors Λ
    4.6.4 Updating the nuggets τ
  4.7 Simulations
  4.8 Real Data Analysis
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1  Results for estimation of σ²
Table 4.1  Parameter estimates with their 95% credible intervals
Table 4.2  Parameter estimates with their 95% credible intervals

LIST OF FIGURES

Figure 3.1  3D Simulated Surface on [0, 1]² with increments 0.02
Figure 3.2  2D Projection of the Simulated Surface on [0, 1]² with increments 0.02
Figure 3.3  Simulated Surface on [0, 1]² with increments 0.07
Figure 4.1  AGB measurements from the Penobscot Experimental Forest in Bradley, Maine
Figure 4.2  Fitted plot of AGB measurements together with the observed values
Figure 4.3  LiDAR data with fitted values and credible bands

Chapter 1

Introduction

Space–time models have become increasingly popular in scientific studies such as geology, climatology, geophysics, and the environmental and atmospheric sciences ([9] and [41]). Gaussian random fields (GRFs) are an important tool in this field. Developing covariance structures for GRFs, parameter estimation, and prediction in big spatial datasets are central challenges in spatio–temporal modeling. My thesis consists of three main chapters dealing with these challenges. Finding conditions for equivalence of Gaussian measures is necessary for studying the asymptotics in the framework of covariance tapering.
Equivalence of Gaussian measures is a classical problem in probability theory that has been studied since early 60s and 70s. Necessary and sufficient conditions for the equivalence of general Gaussian measures are given in [22], [15], [19]. However, the classical conditions are rather abstract, and may not be directly applicable to concrete cases. More verifiable conditions are then stated by restricting to a particular family of Gaussian measures. There are criteria for equivalence of stationary Gaussian processes in terms of their spectral densities in [25], and more general for stationary GRFs in [37] and [48]. However, investigation on the equivalence of nonstationary GRFs is limited to some special cases which are discussed in recent works (see, e.g., [7], [42], and [44]). In chapter 2, we study the equivalence of GRFs with stationary increments, and give a spectral condition under which the two GRFs will induce equivalent measures. This condition is easy to verify, and the results are applicable to different families of GRFs such 1 as the ones with the power law as their spectral densities with different power exponents in different dimensions, thus allowing anisotropy. Researchers across scientific fields are asking increasingly complex questions that require spatio–temporal models able to accommodate rich association structures. In such settings, traditional assumptions such as stationarity and isotropy on the underlying GRFs may not be appropriate. These assumptions become increasingly tenuous as the size of the dataset and domain increase, which is increasingly common in many applied settings. There have been considerable attempts in recent literature to construct nonstationary anisotropic spatial models (See [24], [17], [30], [31], [26], [1] and references therein for the construction of such nonstationary models). In chapter 3, we study covariance estimation for a family of anisotropic GRFs with stationary increments using covariance tapering (See [18] for definition). Covariance tapering introduces sparsity to allow for more efficient matrix operations which is essential in dealing with large scale data sets. Consistency of MLE under the tapered covariance structure is established by putting a mild condition on the decay rate of the spectral density of the tapering function at infinity. This result also deals with the problem of equivalence of nonstationary GRFs. Researchers are increasingly seeking inference about multiple outcome variables across spatial and temporal domains. This interest in applied community is due in large part to widely available geographic information systems (GIS), geographic position systems (GPS), and open source long-term monitoring networks. In such settings, models should capture both within location (i.e., among outcome variable) and between location covariance structures. Delivering this inference is often difficult due to the massive size of the given dataset. In chapter 4, we merge two ideas of factor modeling and low-rank spatial processes approximation to develop and propose a class of hierarchical low-rank spatial factor models 2 which offer a rich and flexible modeling option for dealing with large vector of outcomes observed at large number of locations. A Markov chain Monte Carlo algorithm is developed for estimation, and further, the full posterior distributions are recovered in a Bayesian predictive framework. 
We illustrate our methodology with simulation experiments and an environmental data set involving Light Detection and Ranging (LiDAR) signals together with above-ground biomass (AGB) from the Penobscot Experimental Forest in Bradley, Maine. 3 Chapter 2 Spectral Conditions for Equivalence of Gaussian Random Fields with stationary increments Space-time models have become increasingly popular in scientific studies. Recent works in this field consist of construction of nonstationary anisotropic Gaussian random fields (GRFs). However, the problem of equivalence of measures corresponding to GRFs, which has direct consequences on the study of such models, is discussed mostly for the stationary case, and results for the nonstationary case are limited. In this chapter, we investigate the problem of equivalence of GRFs with stationary increments, and obtain sufficient conditions for equivalence in terms of the behavior of the spectral measures at infinity. Further, the main results are applied to a rich family of nonstationary space-time models with possible anisotropy behavior. 2.1 Introduction Space-time models have become increasingly popular in scientific studies such as geology, climatology, geophysics, environmental and atmospheric sciences, etc ([9] and [41]). Gaussian random fields (GRFs) are basically the main tools in space-time modeling. Most of the para4 metric models proposed are GRFs with specific parametric covariance structure (See [10], [20], [43] and [21] for rich families of space-time covariance functions). The main job then is to find consistent estimates for the parameters, and finally use them for prediction of the underlying random field at unobserved locations, and also in performing statistical inference. Given a parametric family of Gaussian distributions, an important question is whether all the parameters are consistently estimable. First step to answer this question demands an investigation on the equivalence of the corresponding Gaussian measures involved in the model, since if two parameters in the parameter space give equivalent Gaussian distributions, then it is impossible to find consistent estimators for all the parameters involved regardless of the method chosen for estimation (See for example [50] for discussion on inconsistent estimation in Matern covariance functions). Another application of equivalence of Gaussian measures comes from covariance structure misspecification, and its effect on spatial interpolation ([41], chapter 4). In spatial statistics, an important goal is to predict the underlying GRF at unobserved locations. The common method is the best linear unbiased predictor (BLUP), which in known as kriging (See [41] for definition and properties of kriging). This, however needs information about the covarince function of the underlying GRF, which is in fact unknown. Using misspecified covariance functions in prediction may increase the prediction error. [41] proved that covariance functions which yield to equivalent Gaussian measures, have asymptotically the same prediction error rate (See also [39] and [40]). Therefore, finding sufficient conditions under which two covarince functions result in equivalent Gaussian distributions has direct impact in evaluating the prediction error in interpolation of spatial data, and thus proving asymptotically optimal prediction under misspecified covariance structure. There are other applications of equivalence and perpendicularity of GRFs in spatial modeling. 
For example, we refer to [18] and [28] for the application in covariance tapering. 5 Equivalence of Gaussian measures is a classical problem in probability theory that has been studied since early 60’s and 70’s. Necessary and sufficient conditions for the equivalence of general Gaussian measures are given in [22], [15], [19], and references therein (See also [32], [27] for conditions in terms of time domain reproducing kernel Hilbert spaces). These conditions are rather abstract, and may not be directly applicable to concrete cases. More specific conditions are then stated by restricting to a particular family of Gaussian measures. There are criteria for equivalence of stationary Gaussian processes in terms of their spectral densities in [25], and more general for stationary GRFs in [37] and [48]. However, investigation on the equivalence of nonstationary GRFs is limited to some special cases which are discussed in recent works. For instance, we refer to [7] on mixed fractional Brownian motion, [5] on Volterra processes, [38] on multiparameter Gaussian processes, and [42] on a family of power law covariance functions. Recently, [44] found sufficient conditions for equivalence of Gaussian processes with stationary increments in terms of their spectral densities similar to one in [25] for the stationary case. However, this result is only for one dimensional Gaussian processes, and the case for multidimensional GRFs with stationary increments is not discussed. Space-time models have become more complex, and traditional assumptions such as stationarity and isotropy on the underlying GRFs might not be appropriate in data analysis in different fields such as geophysics, environmental and atmospheric sciences, etc. There have been considerable attempts in recent works for constructing nonstationary anisotropic spatial models (We refer to [24], [17], [30], [31],[26], [1] and references therein for the construction of such nonstationary models). Therefore, studying the equivalence of nonstationary GRFs is of interest in the study of such nontrivial families of space-time models. In this paper, we will investigate the equivalence of GRFs with stationary increments which might have 6 different regularities in each direction, and thus allows anisotropy as well, and find sufficient conditions for their equivalence in terms of the behavior of their spectral measures at infinity. The rest of the chapter is organized as follows. We start section 2.2 by introducing some useful Hilbert spaces connected to the frequency domain, and study their structure. In section 2.3, we state the main result of the paper, which is sufficient conditions for equivalence of GRFs with stationary increments using the tail behavior of their spectral densities. In the last section, we apply the main results to a rich family of anisotropic nonstationary spatio-temporal Gaussian models. 2.2 Preliminary Let X = {Xt : t ∈ Rd } be a centered GRF with stationary increments with X(0) = 0. According to [49], we have the following spectral representations for X, and for the covariance structure of X, respectively: d X(t) = C(t, s) = E(Xt Xs ) = Rd Rd ei t,λ − 1 W (dλ), ei t,λ − 1 e−i s,λ − 1 F (dλ), (2.2.1) (2.2.2) where W is a complex-valued Gaussian random measure with control measure F , which is a non-negative symmetric measure on Rd \ {0}, called the spectral measure of X satisfying |λ|2 Rd 1 + |λ|2 F (dλ) < ∞. 
(2.2.3) If the spectral measure is absolutely continuous with respect to the Lebesgue measure 7 on Rd , we will call its Radon-Nikodym derivative spectral density, denoted by f (λ). We will give conditions in terms of the spectral densities for the equivalence of GRFs with stationary increments, but first, we need to define precisely equivalent GRFs. Definition 2.2.1. For a fixed D ⊆ Rd , we call two GRFs X = {Xt : t ∈ Rd }, Y = {Yt : t ∈ Rd } equivalent if they induce equivalent measures on the measurable space RD , B RD , in which B RD is the σ-field generated by the Borel subsets of RD . Moreover, we call them locally equivalent if they are equivalent on any bounded subset of Rd . Such spectral representations for the covariance structure of GRFs of the type given in 2.2.2 makes an important bridge between the problem of equivalence of GRFs and the description of the space generated by the linear combinations of the kernel functions in the spectral representation. For that purpose, in this section we need to study such linear spaces, and explore their properties which will be valuable in the next section. For that, let’s define for a fixed D ⊆ Rd bounded, and a measure F on Rd satisfying 2.2.3, an incomplete Hilbert space LeD = span et (λ) := ei t,λ − 1 : t ∈ D with the inner product et , es F = Rd et (λ)es (λ) F (dλ). We denote the closure in L2 (F ) of LeD by LD (F ). Also, for T > 0 define ΠT := λ = (λ1 , ..., λd ) ∈ Rd : −T ≤ λj ≤ T, ∀j = 1, ..., d . By looking at the basis functions generating LeD , we realize that they are entire functions defined on Cd , and they are of finite exponential type (See [36] for definition and properties). However, the elements in the completed Hilbert space, LD (F ), may not have the same properties. These properties enable us to apply Paley-Wiener type theorems, and get nice description of the elements in theses Hilbert spaces. This problem is discussed in details in 8 [34] and [33], and there are conditions mentioned in [33] on the measure F for this purpose. Therefore, we need to put further assumption on the spectral measure F in order to get the same nice properties. Here is the crucial assumption on the tail behavior of the spectral density f (λ), assuming it exists: (C1) There exist c, k, η > 0 such that for all |λ| > k, we have f (λ) ≥ |λ|c η . This assumption on the spectral density will force the elements in LD (F ) to be entire functions of finite exponential type. The next two lemmas will prove this statement. The following lemma is taken from [46], Lemma 2.2, and we state it here again for completeness. Lemma 2.2.1. Suppose that the spectral density f satisfies (C1). Then for fixed T > 0, there exists positive constants C and M such that for all functions φ of the form n φ(λ) = k ak ei t ,λ − 1 , (2.2.4) k=1 where ak ∈ R and tk ∈ ΠT , we have for all z ∈ Cd |φ(z)| ≤ C φ F exp{M |z|}. (2.2.5) Moreover, for fixed C1 > 0, there exists a positive constant C2 such that for all functions of the form 2.2.4, we have |φ(z)| ≤ C2 |z| φ F (2.2.6) for all |z| ≤ C1 . One can use 2.2.5 to define the limiting functions in LΠ (F ) in such a way that they T also satisfy both 2.2.5 and 2.2.6. We will prove this in the next Lemma. 9 Lemma 2.2.2. Let F be a spectral measure on Rd which satisfies (C1). Then, for each T > 0, the space LΠ (F ) consists of the restriction to Rd of entire functions on Cd of finite T exponential type. Moreover, 2.2.6 is also true for all functions in LΠ (F ). T Proof. The idea of the proof is similar to [33], p. 304. 
Take a sequence φn ∈ LeΠ , such that T φn − φ F → 0 for some φ ∈ LΠ (F ). Then, it is a Cauchy sequence in L2 (F ), and using T 2.2.5, we get |φn (z) − φm (z)| ≤ C φn − φm F exp{M |z|} (2.2.7) This means for each fixed z ∈ Cd , the sequence φn (z) is a Cauchy sequence in C. So, it is convergent. Let’s denote the limit by φ(z). Now, since limit in L2 (F ) sense implies the almost sure convergence for a subsequence, φ = φ a.e. with respect to F . From now on, we will take φ as our favorite version of the limits of functions in LeΠ . Therefore, the elements in T the space LΠ (F ), are not only the T L2 (F ) limits of functions in LeΠ , but also the pointwise T limits as well. Thus, both 2.2.5 and 2.2.6 are true for all the elements in LΠ (F ). The only T thing left to prove is that these functions are entire functions on Cd . But this is true since any element of the space LΠ (F ) is the locally uniform limit of functions of the form 2.2.4 T which are obviously entire functions, and thus, they are also entire functions as well (This is called Weirerstrass Thoerem. See for example [14], Proposition 2.8, p. 52). This lemma shows that if the spectral density satisfies the assumption (C1), we can complete the space LeΠ in such a way that the resulting functions are locally uniform T limits of entire functions, and hence, they are entire functions as well, and they are also of finite exponential type. Further, since 2.2.5 is true for all the elements in the Hilbert space LΠ (F ), we can see that the point evaluators, i.e. the functionals on LΠ (F ) of the T T 10 form φ → φ(z) for each fixed z ∈ Cd are bounded operators. Now, we can apply Riesz representation Theorem (See [23], Theorem 3. p. 31 ) to prove that the space LΠ (F ) is T a Reproducing Kernel Hilbert Space (RKHS) in the sense of [2]. This means there exists a family of functions KT (., .) from Rd × Rd to C such that first of all KT (ω, .) ∈ LΠ (F ) for T all ω ∈ Rd , and secondly, for every φ ∈ LΠ (F ) and ω ∈ Rd , we have the following kernel T property φ, KT (ω, .) F = Rd φ(λ)KT (ω, λ) F (dλ) = φ(ω). (2.2.8) Also, it is worthwhile to mention that the set of all functions {KT (ω, .) : ω ∈ Rd } is dense in LΠ (F ) (To see this, note that if φ ∈ LΠ (F ) is orthogonal to KT (ω, .) for all ω ∈ Rd , T T then φ(ω) = φ, KT (ω, .) F = 0, which implies φ = 0.), and further for all ψ ∈ L2 (F ), the function ω → ψ, KT (ω, .) F (2.2.9) is the orthogonal projection of ψ on LΠ (F ) (See the proof in [2], p. 345). We denote this T projection by πL (F ) ψ. ΠT Finding explicit forms of the reproducing kernels is not an easy job. However, in order to prove the results in section 2.3, we need upper bounds for the growth rate of the diagonal elements of the reproducing kernels at origin and also at infinity. The following proposition proves an important growth rate for the diagonal elements in the reproducing kernels. Proposition 2.2.1. Suppose that the spectral density of F , f (λ) satisfies (C1). Then, for fixed T > 0 and C1 > 0, there exists a positive constant C2 such that KT (ω, ω) ≤ C2 |ω|2 11 (2.2.10) for all |ω| < C1 . Proof. Since for fixed ω ∈ Rd , KT (ω, .) ∈ LΠ (F ), we can apply Lemma 2.2.2 to these T functions. This means that the inequality 2.2.6 can be applied to them. After doing so, we get |KT (ω, λ)| ≤ C2 |λ| KT (ω, .) F = C2 |λ|(KT (ω, ω))1/2 for all ω ∈ Rd and λ ∈ Cd with |λ| < C1 . Now, if we put λ = ω, since the constant C2 doesn’t depend on ω, we will get the desired result. 
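As a concrete illustration of the inner product ⟨e_t, e_s⟩_F on which these spaces are built — by (2.2.2) it is nothing but the covariance C(t, s) — the following sketch evaluates the Gram matrix of a few kernels e_t numerically. This is only an informal numerical aside: the one-dimensional spectral density |λ|⁻² is a hypothetical toy choice (up to a constant it corresponds to Brownian motion, which gives an easy sanity check), and the truncation of the frequency integral is purely for numerical convenience.

```python
import numpy as np
from scipy.integrate import quad

def f(lam):
    # Toy 1-d spectral density of power-law type (hypothetical choice);
    # it satisfies the integrability condition (2.2.3).
    return np.abs(lam) ** (-2.0)

def cov(t, s, lam_max=500.0):
    """Numerically evaluate C(t, s) = <e_t, e_s>_F as in (2.2.2).

    Since f is symmetric, the imaginary part of
    (e^{i t lam} - 1)(e^{-i s lam} - 1) f(lam) integrates to zero, leaving the
    even real integrand [cos((t-s) lam) - cos(t lam) - cos(s lam) + 1] f(lam).
    The frequency integral is truncated at lam_max (tail error O(1/lam_max)).
    """
    integrand = lambda lam: (np.cos((t - s) * lam) - np.cos(t * lam)
                             - np.cos(s * lam) + 1.0) * f(lam)
    # start just above 0 to avoid a 0 * inf evaluation in floating point
    val, _ = quad(integrand, 1e-9, lam_max, limit=500)
    return 2.0 * val

ts = [0.2, 0.5, 0.8, 1.0]                          # a few points of Pi_T, T = 1
G = np.array([[cov(t, s) for s in ts] for t in ts])
# For f(lam) = |lam|^{-2} one can check C(t, s) = 2*pi*min(t, s), so the
# scaled Gram matrix below should be close to the matrix of pairwise minima.
print(np.round(G / (2.0 * np.pi), 3))
```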
We also need to define another Hilbert space based on the tensor product of the elements in LΠ (F ). For that purpose, first we define LeΠ ⊗ LeΠ to be the span of functions T T T (et ⊗ es )(ω, λ) := et (ω)es (λ) with t, s ∈ ΠT . Now, denote by LΠ (F ) ⊗ LΠ (F ) the closure T T in L2 (F ⊗ F ) of the space LeΠ ⊗ LeΠ . According to [2], Theorem 1, p. 361, the new Hilbert T T space LΠ (F ) ⊗ LΠ (F ) is also a RKHS with reproducing kernel T T ((ω1 , λ1 ), (ω2 , λ2 )) → KT (ω1 , ω2 )KT (λ1 , λ2 ). (2.2.11) This means in fact that for ψ ∈ LΠ (F ) ⊗ LΠ (F ), T T ψ, KT (ω, .) ⊗ KT (λ, .) F ⊗F = ψ(ω, λ). (2.2.12) We finish this section by a lemma stating that the norm of the elements in spaces LΠ (F ) T only depends on the tail behavior of the spectral measure F . 12 Lemma 2.2.3. Suppose f0 and f1 are two spectral densities satisfying the condition (C1), and further f0 (λ) f1 (λ) as |λ| → ∞. Then, LΠ (f0 ) = LΠ (f1 ), and further there exist T T positive constants C3 and C4 such that for all φ ∈ LΠ (f0 ) T C3 φ f ≤ φ f ≤ C4 φ f . 1 0 1 Proof. Suppose g ∈ LΠ (f0 ). There exists functions gn of the form 2.2.4 such that T g − gn f → 0 as n → ∞. Now, using lemma 2.2.2, we get 0 g − gn 2f 1 = Rd |g(λ) − gn (λ)|2 f1 (λ) dλ ≤ C2 g − gn 2f 0 |λ| 0 if and only if: φ F , ∀φ ∈ LeΠ , 1 T (i) φ F 0 (ii) ψ(ω, λ) = KT0 (ω, λ) − Rd KT0 (ω, γ)KT0 (λ, γ) F1 (dγ) ∈ LΠ (F0 ) ⊗ LΠ (F0 ) where T T KT0 (., .) are the reproducing kernels of the space LΠ (F0 ). T Proof. First, assume that the measures induced by them are equivalent. Then, by theorem 2.3.1, there exists a function ψ ∈ LΠ (F0 ) ⊗ LΠ (F0 ) such that the equation 2.3.1 will T T hold. Now, because of bilinearity and continuity of inner product together with the fact that LΠ (F0 ) = LΠ (F1 ), we get T T φ1 , φ2 F − φ1 , φ2 F = ψ, φ1 ⊗ φ2 F ⊗F 0 1 0 0 (2.3.2) for all φ1 , φ2 ∈ LΠ (F0 ) = LΠ (F1 ). Now, simply choose for fixed ω, λ ∈ Rd , φ1 (γ) = T T KT0 (ω, γ) and φ2 (γ) = KT0 (λ, γ), and replace them in equation 2.3.2 to get ψ(ω, λ) = KT0 (ω, λ) − Rd KT0 (ω, γ)KT0 (λ, γ) F1 (dγ). Conversely, since ψ ∈ LΠ (F0 ) ⊗ LΠ (F0 ), by reproducing kernel property we get ψ(ω, λ) = T T ψ, KT0 (ω, .) ⊗ KT0 (λ, .) F ⊗F . Also, note that by the form of ψ, we have 0 0 16 ψ(ω, λ) = KT0 (ω, .), KT0 (λ, .) F − KT0 (ω, .), KT0 (λ, .) F . Combining them together, we get 0 1 KT0 (ω, .), KT0 (λ, .) F − KT0 (ω, .), KT0 (λ, .) F = ψ, KT0 (ω, .) ⊗ KT0 (λ, .) F ⊗F . (2.3.3) 0 1 0 0 Now, since the span{KT0 (ω, .); ω ∈ Rd } is dense in LΠ (F0 )(= LΠ (F1 )), the equality 2.3.3 T T will be true for all the elements in LΠ (F0 ). T Remark 2.3.1. In order to prove that the function ψ appeared in Theorem 2.3.2 is in the space LΠ (F0 ) ⊗ LΠ (F0 ), we only need to show that it belongs to L2 (F0 ⊗ F0 ). To prove T T this fact, we will show that the orthogonal projection of ψ on LΠ (F0 ) ⊗ LΠ (F0 ) is in fact T T itself. For that, observe πL ΠT (F0 )⊗LΠT (F0 ) (ψ) (ω, λ) = = = ψ, KT0 (ω, .) ⊗ KT0 (λ, .) Rd Rd F0 ⊗F0 ψ(x, y)KT0 (ω, x)KT0 (λ, y) F0 (dx)F0 (dy) Rd Rd R d KT0 (x, γ)KT0 (y, γ)KT0 (ω, x)KT0 (λ, y) × F0 (dx)F0 (dy)(F0 (dγ) − F1 (dγ)) = Rd KT0 (ω, γ)KT0 (λ, γ) (F0 (dγ) − F1 (dγ)) = ψ(ω, λ). This is a huge reduction which strengthen the applicability of Theorem 2.3.2. By Remark 2.3.1, checking the second assumption in Theorem 2.3.2 is reduced to only checking ψ ∈ L2 (F0 ⊗ F0 ). However, checking the first assumption seems to be not an easy job since we need to compare the norms of all the elements in the space LeΠ under T 17 two different measures. 
For that purpose, in the following, we will try to find equivalent conditions which may be easier to verify in application. It is well known (See [13], p. 1009 ) that for ψ ∈ L2 (F ⊗ F ), one can define a HilbertSchmidt operator on L2 (F ) in the following way (ψφ) (ω) = Rd ψ(ω, λ)φ(λ) F (dλ) (2.3.4) for every φ ∈ L2 (F ). If we use specifically the ψ defined in Theorem 2.3.2, and restrict the domain to LΠ (F ), we will have again a Hilbert-Schmidt operator on LΠ (F ). Note that T T the image of the operator ψ is in fact inside the LΠ (F ). To prove this, observe that for T φ ∈ LΠ (F ), T πL ΠT (ψφ) (ω) = = = (ψφ)(x)KT (ω, x) F (dx) Rd Rd Rd φ(y)ψ(x, y)KT (ω, x) F (dx)F (dy) Rd Rd Rd φ(y)KT (x, γ)KT (y, γ)KT (ω, x) F (dx)F (dy) ×(F (dγ) − F1 (dγ)) = = Rd Rd Rd φ(y)KT (ω, γ)KT (y, γ) (F (dγ) − F1 (dγ))F (dy) φ(y)ψ(ω, y) F (dy) = (ψφ)(ω). This argument shows that ψφ ∈ LΠ (F ) for any φ ∈ LΠ (F ). Also, observe that since T T ψ(ω, λ) = ψ(λ, ω), the operator ψ is self-adjoint. This fact together with compactness of this operator (Since ψ is a Hilbert-Schmidt operator, it is already compact, see [13], p. 1009) enable us to use the spectral Theorem for compact normal operators (See [13], Corollary 5, 18 p. 905), which we will use in the proof of the next Theorem. In fact, next theorem shows that the first condition in Theorem 2.3.2 can be replaced by 1 ∈ / σ(ψ), where σ(ψ) is the spectrum of the operator ψ (See [13], p. 902 for definition). Theorem 2.3.3. Two GRFs with stationary increments and spectral measures F0 and F1 with F0 satisfying the condition (C1), are equivalent on ΠT if and only if the function defined by ψ(ω, λ) = KT0 (ω, λ) − Rd KT0 (ω, γ)KT0 (λ, γ) F1 (dγ) (2.3.5) belongs to LΠ (F0 ) ⊗ LΠ (F0 ), and 1 ∈ / σ(ψ). T T Proof. From 2.3.2, by putting φ1 = φ2 = φ, and the definition of the operator ψ in 2.3.4, we get φ 2F 1 =1− φ 2F 0 ψφ, φ F 0 2 φ F (2.3.6) 0 for all φ ∈ LΠ (F0 ). This simply implies that first σ(ψ) ⊆ (−∞, 1], and second there exists T a finite positive constant C such that φ F ≤ C φ F for all φ ∈ LeΠ since ψ is a bounded 1 0 T operator. This fact shows that proving ψ ∈ L2 (F0 ⊗ F0 ) is helping us to verify half of what we need in the first condition of theorem 2.3.2 as well. What remains is to show that 1∈ / σ(ψ) if and only if there exists a positive constant c such that φ F ≤ c φ F for all 0 1 φ ∈ LeΠ . T First, suppose that φ F ≤ c φ F for some c > 0. If 1 ∈ σ(ψ), it means that 0 1 there exists φ ∈ LΠ (F0 ) with φ F = 1 such that ψφ = φ. Putting it in 2.3.6, we get T 0 φ F = 0 which is contradiction. Conversely, suppose 1 ∈ / σ(ψ), and also there exists a 1 sequence φn ∈ LeΠ T such that φn F = 1 for all n ≥ 1, and φn F → 0 as n → ∞. 0 1 Since ψ is a self-adjoint compact operator, by Corollary 5 p. 905 in [13], there exists a 19 countable orthonormal basis for LΠ (F0 ) consisting of eigenvectors of ψ, denoting them T by gj , j = 1, 2, ... with corresponding eigenvalues λj . Now, each φn has the representation φn = ∞ j=1 anj gj for anj ∈ R. Putting this sequence back to the equation 2.3.6, we get that ψφn , φn F → 1 which means 0 ∞ 2 j=1 anj , ∞ 2 j=1 anj λj → 1 as n → ∞. Now, since 1 = φn 2F = we can rewrite the above equation as 0 ≤ 0 ∞ 2 j=1 anj (1 − λj ) → 0 (This quantity is non-negative since all the eigenvalues are bounded above by 1). Since 1 ∈ / σ(ψ), and {λj , j = 1, 2, ...} has no accumulation points in C except possibly 0 (See [13], Corollary 5 p. 905), there exists ∞ 2 j=1 anj (1 − λj ) ≥ > 0 such that sup {λj , j ≥ 1} = 1 − . 
However, this implies that ∞ 2 j=1 anj = for all n ∈ N, which is contradiction by the fact that this sequence must go to 0 when n goes to ∞. This completes the proof. Remark 2.3.2. Note that based on the proof of theorem 2.3.3, we can change the first condition in theorem 2.3.2 to (i) There exists a positive constant c such that φ F ≤ c φ F for all φ ∈ LeΠ . 0 1 T As lemma 2.2.3 emphasizes that the behavior of the spectral measure at origin does not affect the structure of the space LΠ (F ), one might expect the same formation in terms of the T equivalence of Gaussian measures. The following Theorem shows that changing the spectral measure on bounded subsets of Rd will not affect the equivalence of the corresponding GRFs. In other words, for checking the equivalence of GRFs, only the behavior of their spectral measures at infinity is important. Theorem 2.3.4. Suppose two GRFs with stationary increments have spectral measures F0 and F1 such that F0 satisfies the condition (C1), and F0 = F1 on I c , where I is a bounded subset of Rd . Then, these two GRFs are locally equivalent. 20 Proof. Define F1 (dλ) = 1I c (λ)F0 (dλ). First, we show that F0 and F1 will produce locally equivalent GRFs with stationary increments. For that, fix T > 0. We will investigate the equivalence of measures on ΠT . The function ψ appearing in Theorem 2.3.3 in this case is given by ψ(ω, λ) = I KT0 (ω, γ)KT0 (λ, γ) F0 (dγ). Now, by Fubini’s Theorem, and the reproducing kernel property we get ψ 2F ⊗F = 0 0 2 Rd R d = I I I Rd KT0 (ω, γ)KT0 (λ, γ) F0 (dγ) F0 (dω)F0 (dλ) KT0 (ω, γ)KT0 (ω, γ )F0 (dω) R KT0 (λ, γ )KT0 (λ, γ)F0 (dλ) d ×F0 (dγ)F0 (dγ ) 2 = I I KT0 (γ, γ ) F0 (dγ)F0 (dγ ). Now, using Proposition 2.2.1, and the fact that KT0 (γ, γ ) ψ 2F ⊗F ≤ 0 0 I 2 ≤ KT0 (γ, γ)KT0 (γ , γ ), we get KT0 (γ, γ) F0 (dγ) 2 ≤ const. I 2 2 |γ| F0 (dγ) < ∞ by 2.2.3. This implies ψ ∈ LΠ (F0 ) ⊗ LΠ (F0 ). T T It remains to show that 1 ∈ / σ(ψ). For that purpose, take an arbitrary φ ∈ LΠ (F0 ), and T observe that (We use the fact that KT0 (ω, λ) = KT0 (λ, ω). See [2], p. 344 ) 21 (ψφ) (λ) = = = Rd Rd φ(ω)ψ(λ, ω) F0 (dω) φ(ω) I Rd = I I KT0 (λ, γ)KT0 (ω, γ) F0 (dγ) F0 (dω) φ(ω) KT0 (λ, γ)KT0 (γ, ω) F0 (dω)F0 (dγ) φ(γ) KT0 (λ, γ) F0 (dγ) = πL ΠT (F0 ) φ1I (λ). Therefore, if ψφ = φ, it implies in particular that φ F ≤ φ1I F . This means φ = 0 0 0 almost everywhere with respect to F0 in I c . Hence, since φ is an entire function, this implies that φ = 0. Thus, 1 cannot be in the spectrum of ψ. So far, we proved GRFs with spectral measures F0 and F1 are locally equivalent, but since F0 = F1 on I, similarly, we can say that F1 and F1 produce locally equivalent GRFs. Putting these two together, we get the desired result. Theorems 2.3.2, and 2.3.3 give necessary and sufficient conditions for equivalence of GRFs with stationary increments, but still verifying the conditions mentioned in these two theorems for some concrete cases might be difficult. In the literature, there are sufficient conditions for equivalence of certain GRFs in terms of only their spectral densities. These conditions are easily verifiable once the two spectral densities are given. For example, we refer to [25], theorem 17, p. 104, [37], theorem 4, and [48], theorem 4, p. 156 for stationary GRFs, and also [44], theorem 5.3, [45], theorem 1, and [42] theorem 1 for some nonstationary cases. Therefore, in the rest of this section, we attempt to find similar sufficient conditions for equivalence, only in terms of the spectral measures. 22 Theorem 2.3.5. 
Suppose that the spectral measure F0 and F1 have positive densities f0 and f1 with respect to the Lebesgue measure, and F0 satisfies the condition (C1). If there exists a finite constant C > 0 such that φ F ≤ C φ F for all φ ∈ LeΠ , and 0 1 T f1 (λ) − f0 (λ) 2 0 KT (λ, λ)f0 (λ) dλ < ∞ f0 (λ) |λ|>k (2.3.7) for some k > 0, then GRFs with stationary increments and spectral measures F0 and F1 are equivalent on ΠT . Proof. Applying theorem 2.3.4, we can change the value of f1 on any bounded set, without having any consequences on the equivalence. So, we assume here that f0 = f1 on |λ| ≤ k. The function ψ in Theorem 2.3.3 here will be of the form ψ(ω, λ) = Rd KT0 (ω, γ)KT0 (λ, γ) (f0 (λ) − f1 (λ)) dγ f − f1 = πL (F ) KT0 (ω, .) 0 ΠT 0 f0 (observe that since KT0 (ω, λ) 2 (λ). f −f ≤ KT0 (ω, ω)KT0 (λ, λ), 2.3.7 implies that KT0 (ω, .) 0f 1 ∈ 0 L2 (F0 ) for all ω ∈ Rd , and hence using the orthogonal projection makes sense). Now, it 23 follows that Rd R d |ψ(ω, λ)|2 F0 (dλ)F0 (dω) = ≤ = 2 f − f1 πL (F ) KT0 (ω, .) 0 D 0 f0 Rd F0 (dω) F0 f0 − f1 2 0 KT (ω, .) F0 (dω) f0 Rd F0 Rd R 2 |KT0 (ω, γ)| d f0 (γ) − f1 (γ) 2 F0 (dγ) f0 (γ) ×F0 (dω) = = f0 (γ) − f1 (γ) 2 0 KT (γ, γ)f0 (γ) dγ f0 (γ) Rd f1 (γ) − f0 (γ) 2 0 KT (γ, γ)f0 (γ) dγ f0 (γ) |γ|>k < ∞ by the integrability assumption 2.3.7. Hence ψ ∈ L2 (F0 ⊗ F0 ), and this completes the proof. In the condition 2.3.7, in addition to the behavior of the spectral densities at infinity, the growth rate of the diagonal elements of the reproducing kernels of the space LΠ (F0 ) at T infinity is also playing an important role. Since finding explicit forms of reproducing kernels are difficult, we need at least to find upper bounds for the growth rate of the diagonal terms. The following condition on spectral density helps us to accomplish this task: (C2) For spectral density f , there exist an entire function φ0 on Cd of finite exponential type such that f (λ) |φ0 (λ)|2 as |λ| → ∞ on Rd . As it was mentioned, condition (C2) will force an upper bound for the behavior of the reproducing kernels on the diagonal at infinity, and the following lemma explores this fact. 24 Lemma 2.3.1. Suppose f0 is a spectral density such that it satisfies (C1) and (C2) for some entire function φ0 . Then, for T > 0, there exists a finite constant C > 0 such that the reproducing kernel KT0 of LΠ (f0 ) satisfies T 2 KT0 (ω, λ) KT0 (ω, ω) ≤C f0 (λ) for all real ω, and all λ large enough. In particular, KT0 (λ, λ) ≤ C f0 (λ) (2.3.8) for all λ large enough. Proof. The idea of the proof is similar to the one in Lemma 3 p. 57 in [45]. Put f (λ) = |φ0 (λ)|2 . Since f and f0 are comparable at ∞, and f is bounded around 0, it is clear that f is satisfying both conditions 2.2.3 and (C1). This means we can define LΠ (f ) the same T way, and this space is also a RKHS. Consider an arbitrary orthonormal basis for this space, and denote them by ψk , k = 1, 2, ... . Now, by Lemma 2.2.2, they are entire functions on Cd with finite exponential type which doesn’t depend on k. Further, we know ψk φ0 ∈ L2 (Rd ) since Rd |ψk (λ)φ0 (λ)|2 dλ = Rd |ψk (λ)|2 f (λ) dλ = 1 < ∞. Therefore, we can apply the Paley-Wiener Theorem ([36], Theorem 3.4.2. p. 171) to get ψk φ0 = gk for certain functions gk ∈ L2 (B) where B is a bounded subset of Rd (Here h stands for the Fourier transform of h). By Parseval relation, we can deduce that gk ’s are 25 orthonormal in L2 (B). It follows from Bessel’s inequality that |ψk (λ)|2 f (λ) = k k B e−i t,λ gk (t) dt 2 2 ei t,λ dt ≤ B = m(B), where m(B) is the Lebesgue measure of B. 
Therefore, |ψk (λ)|2 ≤ m(B)/f (λ) ≤ C/f0 (λ) k for large enough λ. Now, consider the reproducing kernels of LΠ (f0 ), denoting them by T KT0 (ω, .). Since f (λ) f0 (λ) as |λ| → ∞, by lemma (2.2.3), KT0 (ω, .) belong to LΠ (f ) T as well for ll ω ∈ Rd . Thus, we can expand it using the basis ψk , and get KT0 (ω, λ) = k KT0 (ω, .), ψk f ψk (λ), and then by Cauchy-Schwarz and lemma 2.2.3, we get 2 |KT0 (ω, λ)| 2 ≤ c = |ψk (λ)|2 KT0 (ω, .) f ≤ k 2 0 KT (ω, .) f 0 |ψk (λ)|2 k |ψk (λ)|2 , c KT0 (ω, ω) k which makes the proof complete. Theorem 2.3.5 in combination with the lemma 2.3.1 will lead to an appealing result. If the relative difference between two spectral densities is square integrable at infinity, then the corresponding GRFs with stationary increments will be locally equivalent. We finish this 26 section by proving this fact. Theorem 2.3.6. Suppose that the spectral measures F0 and F1 have positive densities f0 and f1 with respect to the Lebesgue measure, with f0 satisfying (C1) and (C2) for some entire function φ0 on Cd of finite exponential type. If there exists a finite constant k > 0 such that f1 (λ) − f0 (λ) 2 dλ < ∞, f0 (λ) |λ|>k (2.3.9) then GRFs with stationary increments having spectral measures F0 and F1 are locally equivalent. Proof. Combine Lemma 2.3.1, and Theorem 2.3.5. The only thing we need to additionally prove is that 1 ∈ / σ(ψ). In spirit of Theorem 2.3.4, we can assume that f0 = f1 on |λ| ≤ k. Now, take an arbitrary element φ ∈ LD (f0 ), and observe that using the multidimensional Paley-Wiener Theorem ([36], Theorem 3.4.2. p. 171), we get φφ0 is the inverse Fourier transform of a squared integrable function g with bounded support, B in Rd . This implies that |φ(λ)φ0 (λ)|2 = e−i λ,γ g(γ) dγ 2 B 2 e−i λ,γ g 2 (λ) dλ ≤ B g 2 (λ) dλ = B < ∞ for all λ ∈ Rd . This means φφ0 is bounded on Rd . This fact together with 2.3.9 imply that 27 f −f φ 0f 1 ∈ L2 (f0 ). Now, observe that 0 (ψφ) (λ) = = = Rd Rd φ(ω)ψ(λ, ω) F0 (dω) φ(ω) |γ|>k |γ|>k Rd = φ(ω) KT0 (λ, γ)KT0 (γ, ω) φ(γ) |γ|>k = πL KT0 (λ, γ)KT0 (ω, γ) ΠT (F0 ) f0 (γ) − f1 (γ) f0 (γ) f0 (γ) − f1 (γ) f0 (γ) f0 (γ) − f1 (γ) f0 (γ) F0 (dγ) F0 (dω) F0 (dω)F0 (dγ) KT0 (λ, γ) F0 (dγ) f − f1 φ 0 1{|γ|>k} (λ). f0 Now, similar to the proof of the Theorem 2.3.4, if ψφ = φ, we get that f −f . Letting k → ∞, by the Dominated Convergence Theorem, φ F ≤ φ 0f 1 1{|γ|>k} 0 0 F0 we get φ F = 0. This implies that φ = 0. Thus, 1 cannot be in the spectrum of ψ. The 0 proof is complete. 2.4 Application In this section, we apply the results in section 2.3 to some anisotropic GRFs with stationary increments. In particular, we consider GRFs with stationary increments and spectral density of the form f (λ) = 1 , βj γ d |λ | j=1 j where λ = (λ1 , ..., λd ) ∈ Rd \{0}, βj > 0 for all j = 1, ..., d, and γ > (2.4.1) d 1 j=1 βj . The latter condition guaranties the integrability condition in 2.2.3 for spectral measures (See Proposition 2.1 in [47]). Fractal and smoothness properties of this family of spatio-temporal 28 models are discussed in [47]. In order to apply the main results to this model, we need to verify the assumptions needed. First of all, condition (C1) is obviously satisfied for spectral densities of the form 2.4.1. Next lemma provides the proof that these spectral densities also satisfy (C2). Lemma 2.4.1. Spectral density functions of the form 2.4.1 satisfy condition (C2). Proof. First of all, it is obvious that 1 1 βj γ d j=1 |λj | 1+ , βj γ d |λ | j j=1 as |λ| → ∞. 
Therefore, it suffices to prove the lemma for functions of the form on the right hand side. Now, similar to the construction made in the proof of lemma 2.3 in [29], we can find a function φ ∈ L2 (B) for some bounded subset B ⊆ Rd such that 2 1 1+ βj d j=1 |λj | γ φ(λ) , as |λ| → ∞, which φ is the Fourier transform of φ. By Paley-Wiener Theorem ([36], Theorem 3.4.2. p. 171), φ is actually the restriction on Rd of an entire function on Cd with finite exponential type. Therefore, the proof is complete. In order to apply the main results, first we consider a similar family of spectral densities of the form f (λ) = 1 Hj Q+2 d j=1 |λj | , where λ = (λ1 , ..., λd ) ∈ Rd \{0}, 0 < Hj < 1 for all j = 1, ..., d, and Q = (2.4.2) d 1 j=1 Hj (See the connection between this family and the ones of the form 2.4.1 in remark 2.2 in [47]). 29 Theorem 2.4.1. Suppose f0 and f1 are spectral densities of the form 2.4.2 with parameters Hj0 and Hj1 , respectively. Then, GRFs with stationary increments and spectral densities f0 and f1 are locally equivalent if and only if Hj0 = Hj1 for all j = 1, ..., d. Proof. Suppose for some k ∈ {1, ..., d}, Hk0 < Hk1 . By Lemma 3.2 in [47], there exist c1 , c2 > 0, such that for all t ∈ Rd d 2Hji |t| c1 j=1 d ≤ et 2f i ≤ c2 2Hji |t| , (2.4.3) j=1 for i = 0, 1. If we simply choose t ∈ Rd with tk = l, and tj = 0 for j = k, we get et 2f 1 et 2f 0 c 2 Hk1 −Hk0 ≤ 2l → 0 as l → 0. c1 This violates the necessary condition for equivalence of Gaussian measures in Theorem 2.3.1. Therefore, the proof in complete. Now, we go back to the more flexible spectral densities for space-time models with stationary increments given in 2.4.1. Next theorem proves that under certain conditions, the mixture of spectral densities of the form 2.4.1 will be equivalent to the one with the lowest decay rate at infinity. Similar results of this type for Gaussian processes can be seen in [44] and [7]. Theorem 2.4.2. Suppose X and Y are two independent centered GRFs with stationary increments with spcetral densities of the form 2.4.1 with parameters (βj , γ) and (βj , γ ), 30 respectively. Then, if γ > 1 2 d j=1 1 βj + max1≤j≤d βj βj γ, (2.4.4) then X and X + Y will be locally equivalent. Proof. Using theorem 2.3.6, we only need to show βj 2γ d j=1 |λj | |λ|>1 βj d j=1 |λj | 2γ dλ < ∞. (2.4.5) By using the inequality (a + b)p ≤ 2p (ap + bp ), we can break the integral in 2.4.5 into d integrals. Thus, it’s enough to show for each fixed j = 1, ..., d Ij := 2β γ |λj | j |λ|>1 β d k k=1 |λk | 2γ dλ < ∞. (2.4.6) √ Since |λ| > 1, this implies that |λk | > 1/ d for some k ∈ {1, ..., d}. We need to make two cases. Case I is when k = j, and case II is when k = j. In both cases, we use the following fact that given positive constants β and γ, and nonnegative constant b, there exists a finite positive constant c such that for all a > 0 ∞ 0 ∞ − γ− 1 − b xb yb β β dx = a γ γ dy a + xβ 1 + yβ 0   − γ− 1 − b   β β c a if βγ − b > 1, =    ∞ if βγ − b ≤ 1. 31 (2.4.7) First, let’s consider case I. By, applying d times the formula 2.4.7, ∞ Ij ≤ √1 d ∞ 2β γ |λj | j 0 dλ 0 d βr r=1 |λr | d−1 2βj γ ∞ ≤ c ∞ ... |λj | √1 d 2γ − β |λj | j 2γ dλj 1 r=j β r < ∞ 1 r=j β r since βj 2γ − > 2βj γ + 1 due to condition 2.4.4. Next, we consider case II, where k = j. Similar to case I, we use formula 2.4.7 iteratively, but we do the integration in different order. Denote integration in λi for i = j, k by dλ\j,k , and observe ∞ Ij ≤ √1 d dλk ... 
0 0 √1 d βr d r=1 |λr | d−1 ∞ ∞ ≤ c 2β γ |λj | j ∞ ∞ dλk ∞ 0 0 ∞ √1 d βr d r=1 |λr | 1 β |λk | k 2γ − dλj dλ\j,k 1 ... d−2 ≤ c 2γ 2βj γ βj − 1 r=k β r 2βj γ 2γ − 1 − βj βj dλ\j,k dλk < ∞, where the second and the third inequality follow since 2γ βj > 2βj γ + 1 and βk 2γ − 2βj γ βj − 1 r=k β r > 1, respectively using the assumption 2.4.4. Therefore, the proof is complete. Next, we will consider similar situation as in theorem 2.4.2, but this time we put discrete 32 spectral measure mixed with the ones of the form 2.4.1. For that purpose, consider discrete spectral measure of the form F ({−γ n }) = F ({γ n }) = αn , where γ n ∈ Rd , αn ≥ 0, for n ≥ 1, and |γ n |2 ∞ n=1 1+|γ n |2 αn (2.4.8) < ∞. If {γ n , n = 1, 2, ...} is a bounded subset of Rd , then in view of theorem 2.3.4, this spectral measure won’t affect the equivalence of Gaussian measures. Therefore, we consider here only the case where |γ n | → ∞ as n → ∞. Theorem 2.4.3. Let X and Y be two independent centered GRFs with stationary increments with spectral measures FX and FY . Suppose FX has density with respect to Lebesgue measure on Rd , denoted by f , which satisfies both conditions (C1) and (C2), and FY is a discrete measure of the form 2.4.8. Then, if n>N αn < ∞, f (γ n ) (2.4.9) for some N ≥ 1, then X and X + Y are locally equivalent. Proof. First of all, using theorem 2.3.4, we can assume αn = 0 for n = 1, ..., N , without having any consequences on the equivalence of Gaussian measures. Second, observe that for all φ ∈ LeΠ , φ F ≤ φ F +F , which by remark 2.3.2, is equivalent to condition (i) in X X Y T theorem 2.3.2. All we need to prove is then to show that the function ψ in theorem 2.3.2 is in L2 (f ⊗ f ). For that, using Foubini’s theorem and the reproducing kernel property we 33 have 2 ψ 2f ⊗f = Rd Rd n>N = αn αm n,m>N αn KT (ω, γ n )KT (λ, γ n ) f (ω)f (λ)dωdλ R d Rd KT (ω, γ n )KT (λ, γ n )KT (ω, γ m )KT (λ, γ m ) × f (ω)f (λ)dωdλ αn αm |KT (γ m , γ n )|2 = n,m>N  ≤  n>N 2 αn  f (γ n ) < ∞ by the assumption 2.4.9. The proof is complete. 34 Chapter 3 Covariance Tapering for Anisotropic Nonstationary Gaussian Random Fields with Application to Large Scale Spatial Data Sets Estimating the covariance structure of spatial random processes is an important step in spatial data analysis. Maximum likelihood estimation is a popular method in spatial models based on Gaussian random fields. But calculating the likelihood in large scale data sets is computationally infeasible due to the heavy computation of the precision matrix. One way to mitigate this issue, which is due to Furrer et al. (2006), is to “taper” the covariance matrix. While most of the results in the current literature focus on isotropic tapering for stationary Gaussian processes, there are many cases in application that require modeling of anisotropy and/or nonstationarity. In this chapter, we propose a nonstationary parametric model, in which the underlying Gaussian random field may have different regularities in different directions, thus can be applied to model anisotropy. Using the theory of equivalence of Gaussian measures under nonstationary assumption, strong consistency of the tapered likelihood based estimation of the variance component under fixed domain asymptotics are 35 derived by putting mild conditions on the spectral behavior of the tapering covariance function. The procedure is illustrated with numerical simulation. 
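Before turning to the details, the following sketch indicates one simple way to generate approximate realizations of a mean-zero GRF with stationary increments directly from its spectral density, by replacing the stochastic integral X(t) = ∫(e^{i⟨t,λ⟩} − 1) W(dλ) of Chapter 2 with a Riemann sum over a truncated frequency grid. This is an illustration under that truncation only; it is not claimed to be the exact procedure used to produce the surfaces in Figures 3.1–3.3. The density below anticipates the anisotropic model (3.2.1) introduced in Section 3.2, with the parameter values quoted there for Figures 3.1 and 3.2.

```python
import numpy as np

def spectral_density(lam1, lam2, H=(1.5, 3.0), nu=0.4, sigma2=1.0):
    # Anisotropic density of the form (3.2.1) (defined in Section 3.2):
    # f(lam) = sigma^2 / (1 + sum_j |lam_j|^{H_j})^(Q + nu),  Q = sum_j 1/H_j.
    Q = sum(1.0 / h for h in H)
    return sigma2 / (1.0 + np.abs(lam1) ** H[0] + np.abs(lam2) ** H[1]) ** (Q + nu)

def simulate_increment_field(grid, lam_max=60.0, n_freq=80, seed=0):
    """Approximate spectral simulation of a mean-zero GRF with stationary
    increments, X(0) = 0, based on X(t) = int (e^{i<t,lam>} - 1) W(dlam):
    the integral is replaced by the Riemann sum
        X(t) ~ sum_k sqrt(f(lam_k) dlam) [(cos<t,lam_k> - 1) xi_k
                                          + sin<t,lam_k> eta_k],
    with xi_k, eta_k iid N(0, 1) over a truncated symmetric frequency grid.
    """
    rng = np.random.default_rng(seed)
    freqs = np.linspace(-lam_max, lam_max, n_freq)
    dlam = (freqs[1] - freqs[0]) ** 2
    L1, L2 = (a.ravel() for a in np.meshgrid(freqs, freqs, indexing="ij"))
    amp = np.sqrt(spectral_density(L1, L2) * dlam)
    xi, eta = rng.standard_normal(L1.size), rng.standard_normal(L1.size)
    X = np.zeros((grid.size, grid.size))
    for i, t1 in enumerate(grid):
        for j, t2 in enumerate(grid):
            phase = t1 * L1 + t2 * L2
            X[i, j] = np.sum(amp * ((np.cos(phase) - 1.0) * xi
                                    + np.sin(phase) * eta))
    return X

grid = np.arange(0.0, 1.0 + 1e-9, 0.02)    # [0, 1]^2 with increments 0.02
surface = simulate_increment_field(grid)   # a 51 x 51 simulated surface
```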
3.1 Introduction

Spatial statistics is an active research area with many applications in geology, agricultural science, environmental science, and climate studies [9, 41]. A common problem in this field is the estimation of the covariance structure in spatial models based on Gaussian random fields. Likelihood-based estimators are the most popular choice for estimating the covariance parameters. However, for large spatial data sets, evaluating the likelihood is computationally infeasible because of the cost of computing the precision matrix. There are different ways to overcome this issue. One idea is to set the off-diagonal entries of the covariance matrix to zero, or to keep the first k subdiagonals for some integer k and set the remaining entries to zero, which is called banding. The resulting matrix is sparse, and existing sparse-matrix algorithms can handle it efficiently. The problem with banding is that the resulting matrix may not be positive definite, which is a serious drawback since every valid covariance matrix must be positive definite. An alternative approach is to multiply the covariance function by a positive definite, compactly supported correlation function. The resulting covariance matrix is again sparse but remains positive definite. This is called covariance tapering [18]. A natural question is how tapering can be used to construct consistent estimators of the covariance parameters in spatial regression models. Kaufman et al. (2008) proposed two different likelihood-based estimators of the parameters of the Matérn covariance function and proved strong consistency of the estimates using results on the equivalence of stationary Gaussian measures [37]. See also [12] for further results on the asymptotics of tapered maximum likelihood estimators. Consistency is proved there under two strong assumptions: isotropy and stationarity of the underlying Gaussian random field. However, in many applications the data exhibit anisotropy and nonstationarity, and there has been general interest (not necessarily in the context of covariance tapering) in dropping these conditions in spatial modeling (see [11] and [1]). In this chapter, we relax both conditions simultaneously. In section 3.2, we introduce a class of parametric spatial models that are anisotropic and nonstationary. In section 3.3, we state the main result of the chapter: strong consistency of the tapered likelihood-based estimate of the variance parameter. In the last section, we demonstrate the proposed procedure via numerical simulation.

3.2 Preliminary

In this section, we introduce a class of Gaussian random fields that can be used as spatial models. A Gaussian field model is uniquely defined by a mean structure and a covariance structure. However, finding new covariance models can be hard, since covariance structures must be chosen from the family of positive definite functions and positive definiteness is complicated to verify. An alternative is to use the spectral representations of positive definite functions or variograms. We consider here specifically Gaussian field models with stationary increments.
By applying the results in [49], the covariance function of a Gaussian random field with stationary increments on R^d is determined by a symmetric non-negative measure on R^d \ {0} satisfying a certain integrability condition (see, e.g., [47] for more details). This measure is called the spectral measure. If it is absolutely continuous with respect to Lebesgue measure on R^d, we call its Radon–Nikodym derivative the "spectral density". We propose the following spectral density:

    f(λ) = σ² / (1 + Σ_{j=1}^d |λ_j|^{H_j})^{Q+ν},    (3.2.1)

where λ = (λ_1, ..., λ_d) ∈ R^d, Q = Σ_{j=1}^d 1/H_j, and σ², ν and the H_j's are all positive parameters. σ² is the variance component, and the H_j's are related to the smoothness of the model in different directions. Figures 3.1 and 3.2 show realizations of such a Gaussian field over the two-dimensional grid [0, 1]² with increments of 0.02, with parameters H_1 = 1.5, H_2 = 3, and ν = 0.4. Having different parameters H_j, j = 1, ..., d, for the different components of λ ∈ R^d allows the underlying random field to have different regularities in different directions, and therefore this model is able to capture anisotropy appearing in spatial data sets (see [47] for fractal and smoothness properties of these models; see also [43]).

3.3 Main Result

In this section, we state the main result of the chapter, which is the consistency of the tapered likelihood estimation of the variance parameter σ² in the model introduced in the last section under fixed-domain asymptotics. The following theorem shows how to obtain equivalent Gaussian measures in the tapering setup, which is necessary for proving strong consistency, by placing mild conditions on the spectral behavior of the tapering covariance function.

[Figure 3.1: 3D Simulated Surface on [0, 1]² with increments 0.02]

Theorem 1. Suppose {Z(t), t ∈ T}, where T ⊆ R^d, is a mean-zero Gaussian random field with stationary increments whose spectral density is of the form (3.2.1) for some positive constants σ², ν, and H_j, j = 1, ..., d. Denote the covariance function of the process by K_0(x). Let K_1(x) = K_0(x)K_t(x), where K_t(x) is a correlation function whose Fourier transform f_t exists and satisfies

    f_t(λ) ≤ M / (1 + Σ_{j=1}^d |λ_j|^{H_j})^{Q+ν+ε}    (3.3.1)

for some ε > max{Q/2, 1/min_j{H_j/2}} and M > 0. Then, if H_j > 1 for j = 1, ..., d, the Gaussian measures induced by K_0 and K_1 are equivalent on the paths of {Z(t), t ∈ T} for any bounded subset T ⊂ R^d.

[Figure 3.2: 2D Projection of the Simulated Surface on [0, 1]² with increments 0.02]

Remark 1. The proof of this theorem is similar to that of Theorem 1 in [28]. However, due to the nonstationarity assumption, it relies on the study of the equivalence of Gaussian random fields with stationary increments, which is discussed in detail in the second chapter.

Suppose one observes the random process at locations s_j, j = 1, ..., n, in some bounded domain in R^d, and K_t(x) is the tapering covariance function. Fix a tapering parameter γ > 0. Define the tapering matrix T(γ) by T(γ)_{ij} = K_t(‖s_i − s_j‖; γ). In particular, if ‖s_i − s_j‖ > γ, then T(γ)_{ij} = 0. Denote the true covariance matrix of the process by Σ(θ). The idea of tapering is to use Σ(θ) ◦ T(γ) as the new covariance matrix in the likelihood of the process, where ◦ denotes the elementwise (Schur) matrix product.
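To make the construction of Σ(θ) ◦ T(γ) concrete, the sketch below builds a tapered covariance matrix with a Wendland-type compactly supported taper. Whether this particular taper coincides with the one used in [28] is not assumed, and the stand-in covariance K0 is purely illustrative; the point is that entries vanish exactly beyond distance γ, so the matrix can be stored and factorized in sparse form, while the Schur product of positive definite matrices remains positive semi-definite.

```python
import numpy as np
from scipy import sparse

def wendland_taper(d, gamma):
    # One common compactly supported correlation (a Wendland-type taper);
    # whether it matches the taper used in [28] is not assumed here.
    # K_t(d; gamma) = (1 - d/gamma)_+^4 (1 + 4 d/gamma), zero for d >= gamma.
    r = d / gamma
    return np.where(r < 1.0, (1.0 - r) ** 4 * (1.0 + 4.0 * r), 0.0)

def tapered_covariance(locs, K0, gamma):
    """Form Sigma(theta) o T(gamma) at the observation locations.

    locs  : (n, 2) array of locations s_1, ..., s_n
    K0    : function K0(s_i, s_j) returning the model covariance
    gamma : taper range; entries with ||s_i - s_j|| > gamma are exactly zero
    """
    n = locs.shape[0]
    D = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    Sigma = np.array([[K0(locs[i], locs[j]) for j in range(n)] for i in range(n)])
    # Schur product of two positive (semi-)definite matrices stays positive
    # semi-definite, so sparsity is gained without giving up validity.
    return sparse.csr_matrix(Sigma * wendland_taper(D, gamma))

# Stand-in covariance, purely for illustration (not the K0 induced by (3.2.1)):
rng = np.random.default_rng(1)
locs = rng.uniform(0.0, 1.0, size=(200, 2))
K0 = lambda s, t: np.exp(-np.linalg.norm(s - t))
for gamma in (0.7, 0.4, 0.2):
    C = tapered_covariance(locs, K0, gamma)
    print(gamma, round(C.nnz / 200 ** 2, 3))      # fraction of nonzero entries
```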
The tapered maximum likelihood estimator (Tapered MLE) of σ² is then

    σ̂²_t = argmax_{σ²} (tapered likelihood)
          = argmax_{σ²} { −(1/2) log|Σ(θ) ◦ T(γ)| − (1/2) Zᵀ (Σ(θ) ◦ T(γ))⁻¹ Z },

where Z is the vector of observed values of the random field at the specified locations. Evaluating the inverse of Σ(θ) ◦ T(γ) is much faster than inverting Σ(θ) in many cases, especially for large n; this is the benefit of the Tapered MLE. However, this estimator can be biased in general. In [28], another estimator based on the same tapering idea is presented which is unbiased. We call it the Adjusted Tapered MLE, defined as

    σ̂²_adj = argmax_{σ²} (adjusted tapered likelihood)
            = argmax_{σ²} { −(1/2) log|Σ(θ) ◦ T(γ)| − (1/2) Zᵀ [(Σ(θ) ◦ T(γ))⁻¹ ◦ T(γ)] Z }.

We will see in section 3.4 that the Adjusted Tapered MLE performs better than the Tapered MLE due to its unbiasedness. Theorem 1 is the key ingredient in proving consistency of the tapered maximum likelihood estimators of the variance component σ². The outline of the proof is similar to Theorem 2 in [28] and relies on the above theorem; we omit it here.

3.4 Simulation Results

In this section, we illustrate the methods discussed above by applying them to simulated spatial data sets. We simulated 1000 data sets, each consisting of a multivariate Gaussian vector of length 14 × 14 = 196. The locations form a two-dimensional grid over [0, 1]² with increments 0.07, to which we added random noise in each coordinate, uniformly distributed on [−0.01, 0.01]. We used the spectral density (3.2.1) to generate the normal vectors, with H_1 = 1.5, H_2 = 2, ν = 0.5, and σ² = 1. Figure 3.3 shows a simulated surface using the above parameters over the specified grid.

[Figure 3.3: Simulated Surface on [0, 1]² with increments 0.07]

We used the same tapering function as in [28] with tapering parameters γ = 0.7, 0.4, 0.2, respectively, and applied three different estimators of σ²: the MLE, the Tapered MLE, and the Adjusted Tapered MLE (see [28] for details). Table 3.1 shows the results for the different tapering parameters. The values in the table are averages of the estimates over the 1000 repetitions, with standard errors in parentheses. As expected, as the tapering parameter decreases, the Tapered MLE becomes more and more biased, whereas the Adjusted Tapered MLE remains almost unbiased regardless of the tapering parameter.

Table 3.1 Results for estimation of σ²

                          γ = 0.7        γ = 0.4        γ = 0.2
  MLE                     0.999 (0.10)   0.999 (0.10)   0.999 (0.10)
  Tapered MLE             0.437 (0.06)   0.409 (0.07)   0.400 (0.17)
  Adjusted Tapered MLE    1.025 (0.53)   1.019 (0.47)   1.031 (0.61)

These simulations show that, compared to the MLE, the Adjusted Tapered MLE is a good alternative for estimating covariance parameters in spatial data analysis, and it is computationally feasible. Moreover, the method has the potential to be generalized to more complicated nonstationary and anisotropic models. There are other methods for covariance estimation in large spatial data sets. For example, in [11], nonstationary spatial models are defined through fixed basis functions, and a weighted least squares method (rather than the MLE approach) is used for covariance parameter estimation, with an emphasis on finding the best linear unbiased prediction (BLUP).
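For reference, when σ² enters the covariance purely as a scale factor, Σ(θ) = σ²Σ₀ with Σ₀ fixed, both criteria displayed above are maximized in closed form: differentiating with respect to σ² gives σ̂²_t = ZᵀC⁻¹Z/n and σ̂²_adj = Zᵀ(C⁻¹ ◦ T(γ))Z/n with C = Σ₀ ◦ T(γ). The sketch below computes both; it is a simplification for illustration (dense linear algebra, remaining covariance parameters held fixed) rather than the exact implementation behind Table 3.1.

```python
import numpy as np

def tapered_variance_estimates(Z, Sigma0, T):
    """Closed-form maximizers of the tapered and adjusted tapered criteria
    when Sigma(theta) = sigma^2 * Sigma0 with Sigma0 held fixed
    (an illustrative simplification; in practice sparse Cholesky
    factorizations of Sigma0 o T would replace the dense solves).

    Z      : (n,) vector of observations
    Sigma0 : (n, n) covariance matrix at unit variance component
    T      : (n, n) taper matrix T(gamma)
    """
    n = Z.shape[0]
    C = Sigma0 * T                                  # Sigma0 o T(gamma)
    # tapered:  maximize -0.5*(n log s2 + log|C|) - 0.5 * Z' C^{-1} Z / s2
    sigma2_tap = Z @ np.linalg.solve(C, Z) / n
    # adjusted: same, but with the quadratic form Z' (C^{-1} o T) Z
    sigma2_adj = Z @ ((np.linalg.inv(C) * T) @ Z) / n
    return sigma2_tap, sigma2_adj
```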
Chapter 4

Spatial factor models and dimension-reducing Gaussian predictive processes for high-dimensional remotely sensed data and forest variables

Combining spatially explicit long-term forest inventory data with remotely sensed information from Light Detection and Ranging (LiDAR) data sets through statistical models can be a powerful tool for predicting and mapping above-ground biomass (AGB) at a range of geographic scales. We present and examine a novel modeling approach to improve prediction of AGB using LiDAR data. More specifically, we introduce a multivariate spatial regression model in which both the LiDAR signals and AGB are treated as response variables. Two sources of computational infeasibility in the proposed model need to be addressed. First, the number of response variables is large, because the LiDAR signals and AGB are modeled jointly (in our data set, each location has more than 100 LiDAR measurements at different heights). Second, the number of spatial locations is also large (more than 451 locations in our data). In other words, dimension reduction is needed in two respects: (i) the length of the vector of outcomes, and (ii) the very large number of spatial locations. We use spatial factor modeling for the first and low-rank spatial processes for the second: latent variable (factor) models are commonly used to handle a large vector of outcomes, while low-rank spatial processes offer a rich and flexible modeling option for a large number of locations. We introduce the model in the first section, followed by a brief discussion of the identifiability of the model in Section 4.2. Preparations for the Bayesian hierarchical framework for estimation and prediction are made in Sections 4.3, 4.5 and 4.6. In Section 4.4 we explain in detail the low-rank spatial processes chosen to reduce the computational time. We finish the chapter with a simulation study and a real data analysis.

4.1 Model

The following hierarchical factor model is considered:

    \begin{pmatrix} z(s) \\ y(s) \end{pmatrix} = \begin{pmatrix} x(s)'\beta \\ x_y(s)'\eta \end{pmatrix} + \begin{pmatrix} A & 0 \\ \alpha_f & \alpha_\nu \end{pmatrix} \begin{pmatrix} f(s) \\ \nu(s) \end{pmatrix} + \begin{pmatrix} \epsilon_z(s) \\ \epsilon_y(s) \end{pmatrix},    (4.1.1)

where z(s) is an m × 1 vector, y(s) ∈ R, x(s) is a t × m matrix of covariates for the high-dimensional signal z(s), x_y(s) is an r × 1 vector of covariates for the target variable y(s), and β and η are t × 1 and r × 1 vectors of parameters, respectively. A is an m × q matrix of factor loadings with elements satisfying a_{1j} > 0 for all j = 1, ..., q, α_f = (α_{f,1}, α_{f,2}, ..., α_{f,q}) is a 1 × q vector of coefficients, and α_ν > 0. Also, f(s) = (f_1(s), f_2(s), ..., f_q(s))', where each f_j(s) follows an independent spatial GP(0, C_j(s_1, s_2; θ_j)) with unit variance, and ν(s) follows a GP(0, C_ν(·, ·; θ_ν)) with unit variance, independent of all the f_j's. The errors ε_z(s) and ε_y(s) are N(0, Σ_{ε_z} := diag(τ_1², ..., τ_m²)) and N(0, τ_y²), respectively, where the τ_j²'s and τ_y² represent the usual geostatistical nuggets for z(s) and y(s). In this model, z(s) represents the LiDAR data, y(s) is the biomass, and the f_j(s)'s are the factors, q in total.

4.2 Identifiability Issue

Define

    Λ := \begin{pmatrix} A & 0 \\ \alpha_f & \alpha_\nu \end{pmatrix}  and  ω(s) := \begin{pmatrix} f(s) \\ \nu(s) \end{pmatrix}.

The identifiability issue in model (4.1.1) relates to the existence of a (q + 1) × (q + 1) orthogonal matrix P (i.e., P'P = PP' = I_{q+1}) such that Pω(s) has the same covariance structure as ω(s).
Following the argument in [35], the only such matrices in this case are those whose entries lie in {−1, 0, 1} with exactly one nonzero element in each row and each column. In other words, changing the signs of some components of ω(s) and permuting some of its elements are the only two groups of transformations that can lead to another Gaussian random field with the same distribution as ω(s). Both cases must be handled for the model to be identifiable. The first group is prevented by constraining the signs of the parameters in Λ: in each column of Λ, the sign of one element must be fixed. Here we require positive elements in the first row of A together with α_ν > 0, which resolves the first part. Permuting the elements of ω(s) is prevented by constraining the priors on the covariance parameters of f(s) and ν(s), which is addressed in the next section.

4.3 Prior Specification

We put multivariate normal priors on the mean parameters β and η, with means µ_β and µ_η and covariances Σ_β and Σ_η, respectively (we will use flat priors, i.e., Σ_β^{-1} = Σ_η^{-1} = 0 and µ_β = µ_η = 0). The nugget effects τ_j², j = 1, ..., m, and τ_y² are assigned inverse gamma priors IG(a, b), with a = 2 (so that the prior variance is infinite) and b = 5. All parameters in the matrix Λ are independent normal or truncated normal random variables, depending on whether their signs are fixed beforehand. More precisely, a_{ij} ∼ N(0, c_a²) for i ≠ 1, α_{f,i} ∼ N(0, c_α²) for i = 1, ..., q, a_{1i} ∼ N(0, c_a²) I(a_{1i} > 0) for i = 1, ..., q, and α_ν ∼ N(0, c_a²) I(α_ν > 0). The correlation functions of the f_j's and ν are taken to be exponential, i.e., cov(f_j(s), f_j(t)) := ρ_j(s, t; φ_j) = exp(−φ_j ‖s − t‖) for j = 1, ..., q, and cov(ν(s), ν(t)) := ρ_ν(s, t; φ_ν) = exp(−φ_ν ‖s − t‖). Because of the identifiability issue discussed in Section 4.2, we construct a joint prior for the φ_j's that enforces an ordering. In particular, we set

    π(φ) = π(φ_1) π(φ_2 | φ_1) ⋯ π(φ_q | φ_{q−1}, ..., φ_1),    (4.3.1)

where π(φ_1) is a uniform density with support (φ_l, φ_u), with φ_l = −log(0.05)/d_max and φ_u = −log(0.01)/d_min, where d_min and d_max are the minimum and maximum distances across all locations, and, for j = 2, 3, ..., q,

    π(φ_j | φ_{j−1}, ..., φ_1) ∝ exp{−c_j (φ_j − φ_{j−1})} I(φ_{j−1} < φ_j < φ_u).    (4.3.2)

Based on the discussion in [35], we fix c_j = 2j as a reasonable choice. Also, φ_ν ∼ Uniform(φ_l, φ_u).

4.4 Approximation using Gaussian predictive processes

Because of the computational burden, we replace the random effects f(s) and ν(s) by low-rank approximations. One way to do this is to project the Gaussian random fields onto the space generated by their values at a set of selected locations, called "knots" (we refer to [4] and [16] for a complete discussion of Gaussian predictive processes). Suppose S* = {s*_1, ..., s*_{n*}} is the set of knots used to approximate both fields f(s) and ν(s). We replace f(s) and ν(s) with f*(s) and ν*(s) as follows.
Define the Gaussian field

    \tilde{f}^*_k(s) \sim N\left(f^*_k(s),\ \sigma^{*2}_k(s)\right),    (4.4.1)

where

    f^*_k(s) := d_k(s; φ_k)' D^*_k(φ_k)^{-1} f_k(S^*),    σ^{*2}_k(s) := 1 − d_k(s; φ_k)' D^*_k(φ_k)^{-1} d_k(s; φ_k),    (4.4.2)

and

    \tilde{ν}^*(s) \sim N\left(ν^*(s),\ σ^{*2}_ν(s)\right),    (4.4.3)

where

    ν^*(s) := d_ν(s; φ_ν)' D^*_ν(φ_ν)^{-1} ν(S^*),    σ^{*2}_ν(s) := 1 − d_ν(s; φ_ν)' D^*_ν(φ_ν)^{-1} d_ν(s; φ_ν).    (4.4.4)

Here d_k(s; φ_k) is the n* × 1 vector whose i-th element is ρ_k(s, s*_i; φ_k), D*_k(φ_k) is the n* × n* covariance matrix whose (i, j)-th entry is ρ_k(s*_i, s*_j; φ_k), and f_k(S*) is the n* × 1 vector of values of the field f_k at the knots. The case of ν(s) is exactly analogous to that of the f_k's, with the index k replaced by ν. After this replacement, model (4.1.1) can be written as

    \begin{pmatrix} z(s) \\ y(s) \end{pmatrix} = \begin{pmatrix} x(s)'\beta \\ x_y(s)'\eta \end{pmatrix} + \begin{pmatrix} A & 0 \\ \alpha_f & \alpha_\nu \end{pmatrix} \begin{pmatrix} f^*(s) \\ \nu^*(s) \end{pmatrix} + \begin{pmatrix} \epsilon^*_z(s) \\ \epsilon^*_y(s) \end{pmatrix},    (4.4.5)

where f*(s) = (f*_1(s), f*_2(s), ..., f*_q(s))', ε*_z(s) ∼ N(0, Σ_{ε*_z}(s) := A Σ_{f̃*}(s) A' + Σ_{ε_z}), and ε*_y(s) ∼ N(0, σ*²_{ε_y}(s) := α_f Σ_{f̃*}(s) α_f' + α_ν² σ*²_ν(s) + τ_y²), with Σ_{f̃*}(s) the q × q diagonal matrix with diagonal elements σ*²_k(s), k = 1, ..., q.

4.5 Data equation

Let

    Y(s) = \begin{pmatrix} z(s) \\ y(s) \end{pmatrix}, \quad X(s) = \begin{pmatrix} x(s)' & 0 \\ 0 & x_y(s)' \end{pmatrix}, \quad b = \begin{pmatrix} \beta \\ \eta \end{pmatrix}, \quad \omega^*(s) = \begin{pmatrix} f^*(s) \\ \nu^*(s) \end{pmatrix}, \quad \epsilon^*(s) = \begin{pmatrix} \epsilon^*_z(s) \\ \epsilon^*_y(s) \end{pmatrix}.    (4.5.1)

The model can then be written as

    Y(s_i) = X(s_i) b + Λ ω*(s_i) + ε*(s_i),    i = 1, ..., n,    (4.5.2)

and in matrix form as

    Y = X b + \boldsymbol{Λ} ω* + ε*,    (4.5.3)

where

    Y = \begin{pmatrix} Y(s_1) \\ \vdots \\ Y(s_n) \end{pmatrix}, \quad X = \begin{pmatrix} X(s_1) \\ \vdots \\ X(s_n) \end{pmatrix}, \quad \boldsymbol{Λ} = \begin{pmatrix} Λ & & 0 \\ & \ddots & \\ 0 & & Λ \end{pmatrix}, \quad ε^* = \begin{pmatrix} ε^*(s_1) \\ \vdots \\ ε^*(s_n) \end{pmatrix}, \quad ω^* = \begin{pmatrix} ω^*(s_1) \\ \vdots \\ ω^*(s_n) \end{pmatrix}.    (4.5.4)

Here ω* ∼ N(0, Σ_{ω*}) and ε* ∼ N(0, Σ_{ε*}), with

    Σ_{ε*} = \begin{pmatrix} Σ_{ε*}(s_1) & & 0 \\ & \ddots & \\ 0 & & Σ_{ε*}(s_n) \end{pmatrix}, \quad Σ_{ω*} = \begin{pmatrix} Σ_{ω*}(s_1) & & 0 \\ & \ddots & \\ 0 & & Σ_{ω*}(s_n) \end{pmatrix},    (4.5.5)

and

    Σ_{ε*}(s_i) = \begin{pmatrix} Σ_{ε*_z}(s_i) & 0 \\ 0 & σ*²_{ε_y}(s_i) \end{pmatrix}, \quad Σ_{ω*}(s_i) = \begin{pmatrix} Σ_{f*}(s_i) & 0 \\ 0 & σ²_{ν*}(s_i) \end{pmatrix},    (4.5.6)

where Σ_{f*}(s) is the q × q diagonal matrix with diagonal elements d_k(s; φ_k)' D*_k(φ_k)^{-1} d_k(s; φ_k), k = 1, ..., q, and σ²_{ν*}(s) = d_ν(s; φ_ν)' D*_ν(φ_ν)^{-1} d_ν(s; φ_ν).

4.6 Computations

The first step is to derive the full conditional distributions of the parameters in the model.

4.6.1 Updating the mean part b

The full conditional of β is N_t(µ̃_β, Σ̃_β) with

    Σ̃_β = \left( Σ_β^{-1} + \sum_{i=1}^{n} x(s_i) Σ_{ε_z}^{-1}(s_i) x(s_i)' \right)^{-1}    (4.6.1)

and

    µ̃_β = Σ̃_β \left( Σ_β^{-1} µ_β + \sum_{i=1}^{n} x(s_i) Σ_{ε_z}^{-1}(s_i) \left(z(s_i) − A f(s_i)\right) \right),    (4.6.2)

where Σ_{ε_z}^{-1}(s) is the m × m diagonal matrix with j-th diagonal element τ_j^{-2}, j = 1, ..., m. The full conditional of η is N_r(µ̃_η, Σ̃_η) with

    Σ̃_η = \left( Σ_η^{-1} + \sum_{i=1}^{n} x_y(s_i) x_y(s_i)' / τ_y² \right)^{-1}    (4.6.3)

and

    µ̃_η = Σ̃_η \left( Σ_η^{-1} µ_η + \sum_{i=1}^{n} x_y(s_i) τ_y^{-2} \left(y(s_i) − λ_ν ω(s_i)\right) \right),    (4.6.4)

where λ_ν is the last row of the matrix Λ.

4.6.2 Updating ω

ω consists of two components, f and ν, which are updated similarly. For k = 1, ..., q, the knot-level vector f_k(S*) has full conditional N_{n*}(µ̃_{f_k}, Σ̃_{f_k}) with

    Σ̃_{f_k} = D*_k(φ_k) \left( D*_k(φ_k) + D_k(φ_k)' Σ_{f_k}(φ_k)^{-1} D_k(φ_k) \right)^{-1} D*_k(φ_k)    (4.6.5)

and

    µ̃_{f_k} = Σ̃_{f_k} D*_k(φ_k)^{-1} D_k(φ_k)' Σ_{f_k}(φ_k)^{-1} f_k,    (4.6.6)

where Σ_{f_k}(φ_k) is the n × n diagonal matrix whose j-th diagonal element is σ*²_k(s_j), and D_k(φ_k) is the n × n* covariance matrix between the observation locations and the knots for the k-th factor f_k.
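As an illustration of the predictive-process projection in (4.4.2), the following minimal Python sketch maps knot-level factor values to arbitrary locations under the exponential correlation of Section 4.3. The function names are hypothetical and this is not the code used for the analyses in this chapter.

```python
import numpy as np
from scipy.spatial.distance import cdist

def exp_corr(A, B, phi):
    """Exponential correlation rho(s, t; phi) = exp(-phi * ||s - t||) between two location sets."""
    return np.exp(-phi * cdist(A, B))

def predictive_process(locs, knots, f_knots, phi):
    """Project knot values f(S*) onto locations, as in (4.4.2).

    Returns f_star(s) = d(s; phi)' D*(phi)^{-1} f(S*) and the residual
    variances sigma2_star(s) = 1 - d(s; phi)' D*(phi)^{-1} d(s; phi).
    """
    D_star = exp_corr(knots, knots, phi)          # n* x n* correlation among knots
    d = exp_corr(locs, knots, phi)                # n x n* cross-correlations d(s_i; phi)'
    A = np.linalg.solve(D_star, d.T)              # D*^{-1} d(s)' for all locations at once
    f_star = A.T @ f_knots                        # low-rank interpolant at the n locations
    sigma2_star = 1.0 - np.sum(d * A.T, axis=1)   # correction variances from (4.4.2)
    return f_star, sigma2_star
```

All n locations are handled through n* × n* linear algebra, which is the source of the computational savings exploited in Section 4.6.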
4.6.3 Updating the loading factors Λ

For j = 2, ..., m, the full conditional of the j-th row of Λ is N_q(m_j, K_j), where

    K_j^{-1} = c_a^{-2} I_q + τ_j^{-2} F'F    (4.6.7)

and

    m_j = τ_j^{-2} K_j F'(z_j − µ_j),    (4.6.8)

where F = (f(s_1), ..., f(s_n))' and µ_j is the mean vector of z_j. The full conditional of the first row of Λ is normal, truncated at 0, with mean m_1 and covariance K_1. The last row of Λ follows N_{q+1}(m_ν, K_ν) with

    K_ν^{-1} = c_a^{-2} I_{q+1} + τ_y^{-2} W'W    (4.6.9)

and

    m_ν = τ_y^{-2} K_ν W'(y − µ_y),    (4.6.10)

where W = (ω(s_1), ..., ω(s_n))' and µ_y is the mean vector of y. The last element of this row is truncated at 0.

4.6.4 Updating the nuggets τ

For j = 1, ..., m, the full conditional of τ_j² is inverse gamma with shape parameter a + n/2 and scale parameter b + ½ (z_j − µ_j − F λ_j)'(z_j − µ_j − F λ_j), where λ_j is the j-th row of the matrix Λ. Similarly, τ_y² is inverse gamma with the same shape parameter and scale parameter b + ½ (y − µ_y − W λ_ν)'(y − µ_y − W λ_ν), where λ_ν is the last row of Λ.

4.7 Simulations

We illustrate the performance of the model with a simulation study. We generated data at 256 locations across a [0, 50] × [0, 50] square using the following parameters:

    b = \begin{pmatrix} 5 \\ 10 \\ 12 \end{pmatrix}, \quad Λ = \begin{pmatrix} 2 & 1 & 0 \\ −1 & 2 & 0 \\ 1 & −1 & 2 \end{pmatrix}, \quad φ = \begin{pmatrix} 0.1 \\ 0.4 \\ 0.12 \end{pmatrix}, \quad Σ_ε = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

In this simulation, t = r = 1 and m = q = 2. We used Gaussian predictive processes with 81 knots to reduce the computation time. Table 4.1 shows the parameter estimates together with their 95% credible intervals; it can be observed from this table that all the true parameter values lie within the 95% credible intervals computed from the posterior distributions.

Table 4.1: Parameter estimates with their 95% credible intervals

    parameter   true value   estimate      95% credible interval
    β1          5            6.278622      (4.213607, 8.506537)
    β2          10           9.152451      (8.141619, 10.197632)
    η           12           12.13515      (10.67103, 13.93106)
    τ1²         2            1.7069400     (0.9133798, 2.6578697)
    τ2²         5            2.6038261     (0.8570427, 6.4833734)
    τy²         1            1.4963317     (0.7562288, 2.4505370)
    Λ1,1        2            2.235054      (1.418790, 2.978264)
    Λ1,2        1            0.6389909     (0.2377660, 1.4958831)
    Λ2,1        −1           −1.0007031    (−1.8632592, −0.3081513)
    Λ2,2        2            2.371756      (1.055427, 2.886742)
    Λ3,1        1            0.6106929     (−0.1468869, 2.2537335)
    Λ3,2        −1           −0.8785957    (−1.3010258, −0.5326576)
    Λ3,3        2            2.430128      (1.571113, 3.469093)
    φ1          0.1          0.06601322    (0.04838031, 0.12743472)
    φ2          0.4          0.6352006     (0.3916554, 0.8975183)
    φν          0.12         0.08413867    (0.04986396, 0.19860435)

4.8 Real Data Analysis

In this section, the proposed model is assessed using LiDAR data acquired by NASA Goddard's LiDAR, Hyperspectral & Thermal (G-LiHT) imager and field inventory data from the Penobscot Experimental Forest in Bradley, Maine. G-LiHT is an airborne platform developed, in part, to examine how future space-based LiDAR missions, e.g., ICESat-2 or GEDI, may be combined with field-based validation measurements to build predictive models for AGB and other forest variables; see [3] and [8] for more information. The data set consists of two parts: (i) biomass measurements at more than 451 locations, and (ii) LiDAR signals observed at more than 100 different heights at more than 5000 locations, including those for which biomass (AGB) measurements are available. Figure 4.1 plots the Penobscot Experimental Forest together with the AGB measurements.
Figure 4.1: AGB measurements from the Penobscot Experimental Forest in Bradley, Maine.

For the analysis we used only the portion of the data for which both AGB measurements and LiDAR data are available, so that the multivariate model proposed in this chapter could be fit. This portion consists of 451 locations with 64 outcome variables (63 from the LiDAR data and one from the AGB). The maximum number of factors is set to 10. Stochastic factor selection is carried out as in [35], by placing independent Bernoulli priors on keeping each factor in the model (see Section 2.3 of [35] for more on adaptive Bayesian factor selection). Further, 100 knots are used for the low-rank Gaussian predictive process approximation. We ran 20,000 MCMC iterations, including 2,000 samples for burn-in. Table 4.2 summarizes the fitted model, including some of the parameter estimates with their credible intervals. Of the 10 factors, factor 5 was not selected by the stochastic selection procedure; the very wide credible interval for the coefficient α_{f,5}, compared with all the other factor loadings, reflects the fact that its corresponding factor was not selected. Moreover, Figure 4.2 shows both the real data and the fitted values from the fitted model, and Figure 4.3 shows the fitted values for the LiDAR data at several different locations.

Figure 4.2: Fitted plot of AGB measurements together with the observed values.

Figure 4.3: LiDAR data with fitted values and credible bands.

Table 4.2: Parameter estimates with their 95% credible intervals

    parameter   estimate      95% credible interval
    η0          0.9912472     (0.9559148, 1.0267694)
    τy²         0.1186832     (0.1039515, 0.1366400)
    αf,1        0.1887652     (−0.2071523, 0.5462206)
    αf,2        −0.1370774    (−0.4929575, 0.1614393)
    αf,3        0.0883730     (−0.16048225, 0.32833808)
    αf,4        0.0604460     (−0.16171807, 0.27297671)
    αf,5        −0.016114     (−6.11216314, 6.18284331)
    αf,6        −0.021537     (−0.24357843, 0.08823335)
    αf,7        0.0242004     (−0.13238865, 0.18407468)
    αf,8        −0.115645     (−0.30288243, 0.01502838)
    αf,9        0.1121742     (−0.2331279, 0.2656526)
    αf,10       −0.165493     (−0.2868342, 0.1519692)
    αν          0.112052      (0.005656521, 0.357829027)
    φ1          0.7551075     (0.7501629, 0.7775895)
    φ2          0.9855927     (0.9161703, 1.0884193)
    φ3          1.375376      (1.211139, 1.602903)
    φ4          2.140675      (1.734234, 2.813785)
    φ5          14.867140     (3.371117, 35.734567)
    φ6          31.397823     (5.248678, 53.407113)
    φ7          46.783236     (8.355846, 67.885794)
    φ8          61.83730      (18.15725, 81.25511)
    φ9          77.90531      (35.90097, 91.89012)
    φ10         95.05987      (55.68514, 99.91256)
    φν          0.7620842     (0.7504561, 0.8102750)

BIBLIOGRAPHY

[1] Anderes, E. B. and Stein, M. L.: "Estimating deformations of isotropic Gaussian random fields on the plane", The Annals of Statistics, 719–741 (2008)

[2] Aronszajn, N.: "Theory of reproducing kernels", Transactions of the American Mathematical Society, 68(3), 337–404 (1950)

[3] Awadallah, A., Abbott, V., Wynne, R., and Nelson, R.: "Estimating forest canopy height and biophysical parameters using photon-counting laser altimetry", Proc. 13th International Conference on LiDAR Applications for Assessing Forest Ecosystems, 129–136 (2013)

[4] Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H.: "Gaussian predictive process models for large spatial data sets", Journal of the Royal Statistical Society, Series B, 70(4), 825–848 (2008)

[5] Baudoin, F. and Nualart, D.: "Equivalence of Volterra processes", Stochastic Processes and their Applications, 107(2), 327–350 (2003)

[6] Chatterji, S. D. and Mandrekar, V.: "Equivalence and singularity of Gaussian measures and applications", Probabilistic Analysis and Related Topics, 1, 169–197 (1978)
[7] Cheridito, P.: "Mixed fractional Brownian motion", Bernoulli, 7(6), 913–934 (2001)

[8] Cook, B., Corp, L., Nelson, R., Middleton, E., Morton, D., McCorkel, J., Masek, J., Ranson, K., Ly, V., and Montesano, P.: "NASA Goddard's LiDAR, Hyperspectral and Thermal (G-LiHT) Airborne Imager", Remote Sensing, 5, 4045–4066 (2013)

[9] Cressie, N. A. C.: "Statistics for Spatial Data", 2nd ed., New York: John Wiley & Sons (1993)

[10] Cressie, N. and Huang, H. C.: "Classes of nonseparable, spatio–temporal stationary covariance functions", Journal of the American Statistical Association, 94(448), 1330–1339 (1999)

[11] Cressie, N. and Johannesson, G.: "Fixed rank kriging for very large spatial data sets", Journal of the Royal Statistical Society, Series B, 70(1), 209–226 (2008)

[12] Du, J., Zhang, H., and Mandrekar, V. S.: "Fixed–domain asymptotic properties of tapered maximum likelihood estimators", The Annals of Statistics, 37(6A), 3330–3361 (2009)

[13] Dunford, N. and Schwartz, J. T.: "Linear Operators. Part 2: Spectral Theory. Self Adjoint Operators in Hilbert Space", New York (1963)

[14] Ebeling, W.: "Functions of Several Complex Variables and Their Singularities", Vol. 83, American Mathematical Society (2007)

[15] Feldman, J.: "Equivalence and perpendicularity of Gaussian processes", Pacific J. Math., 8(4), 699–708 (1958)

[16] Finley, A. O., Sang, H., Banerjee, S., and Gelfand, A. E.: "Improving the performance of predictive process modeling for large datasets", Computational Statistics and Data Analysis, 53(8), 2873–2884 (2009)

[17] Fuentes, M. and Smith, R. L.: "A new class of nonstationary spatial models", Technical report, North Carolina State University, Raleigh, NC (2001)

[18] Furrer, R., Genton, M. G., and Nychka, D.: "Covariance tapering for interpolation of large spatial datasets", Journal of Computational and Graphical Statistics, 15(3) (2006)

[19] Gikhman, I. I. and Skorokhod, A. V.: "The Theory of Stochastic Processes: I", Springer (1965)

[20] Gneiting, T.: "Nonseparable, stationary covariance functions for space–time data", Journal of the American Statistical Association, 97(458), 590–600 (2002)

[21] Gneiting, T., Genton, M., and Guttorp, P.: "Geostatistical space–time models, stationarity, separability and full symmetry", Statistical Methods for Spatio–Temporal Systems, 151–175 (2007)

[22] Hajek, J.: "On a Property of Normal Distribution of Any Stochastic Processes", Math. Statist. Prob., 245–252 (1958)

[23] Halmos, P. R.: "Introduction to Hilbert Space and the Theory of Spectral Multiplicity", New York: Chelsea (1957)

[24] Higdon, D., Swall, J., and Kern, J.: "Non–stationary spatial modeling", Bayesian Statistics, 6(1), 761–768 (1999)

[25] Ibragimov, I. A. and Rozanov, I. A.: "Gaussian Random Processes", New York: Springer-Verlag (1978)

[26] Jun, M. and Stein, M. L.: "Nonstationary covariance models for global data", The Annals of Applied Statistics, 1271–1289 (2008)

[27] Kallianpur, G. and Oodaira, H.: "Non–anticipative representations of equivalent Gaussian processes", The Annals of Probability, 1(1), 104–122 (1973)

[28] Kaufman, C. G., Schervish, M. J., and Nychka, D. W.: "Covariance tapering for likelihood-based estimation in large spatial data sets", Journal of the American Statistical Association, 103(484), 1545–1555 (2008)

[29] Luan, N. and Xiao, Y.: "Spectral conditions for strong local nondeterminism and exact Hausdorff measure of ranges of Gaussian random fields", Journal of Fourier Analysis and Applications, 18(1), 118–145 (2012)
[30] Nychka, D., Wikle, C., and Royle, J. A.: "Multiresolution models for nonstationary spatial covariance functions", Statistical Modeling, 2(4), 315–331 (2002)

[31] Paciorek, C. J. and Schervish, M. J.: "Spatial modeling using a new class of nonstationary covariance functions", Environmetrics, 17(5), 483–506 (2006)

[32] Parzen, E.: "Probability density functionals and reproducing kernel Hilbert spaces", Proceedings of the Symposium on Time Series Analysis, 196, 155–169 (1963)

[33] Pitt, L. D.: "Stationary Gaussian Markov fields on Rd with a deterministic component", Journal of Multivariate Analysis, 5(3), 300–311 (1975)

[34] Pitt, L. D.: "Some problems in the spectral theory of stationary processes on Rd", Indiana Univ. Math. J., 23, 343–365 (1973)

[35] Ren, Q. and Banerjee, S.: "Hierarchical Factor Models for Large Spatially Misaligned Data: A Low-Rank Predictive Process Approach", Biometrics, 69(1), 19–30 (2013)

[36] Ronkin, L. I.: "Introduction to the Theory of Entire Functions of Several Variables", American Mathematical Society (1974)

[37] Skorokhod, A. V. and Yadrenko, M. I.: "On absolute continuity of measures corresponding to homogeneous Gaussian fields", Theory of Probability and Its Applications, 18(1), 27–40 (1973)

[38] Sottinen, T. and Tudor, C. A.: "On the equivalence of multiparameter Gaussian processes", Journal of Theoretical Probability, 19(2), 461–485 (2006)

[39] Stein, M. L.: "Uniform asymptotic optimality of linear predictions of a random field using an incorrect second-order structure", The Annals of Statistics, 850–872 (1990)

[40] Stein, M. L.: "Predicting random fields with increasing dense observations", The Annals of Applied Probability, 9(1), 242–273 (1999)

[41] Stein, M. L.: "Interpolation of Spatial Data: Some Theory for Kriging", Springer (1999)

[42] Stein, M. L.: "Equivalence of Gaussian measures for some nonstationary random fields", Journal of Statistical Planning and Inference, 123(1), 1–11 (2004)

[43] Stein, M. L.: "Space–time covariance functions", Journal of the American Statistical Association, 100(469), 310–321 (2005)

[44] Van Zanten, H.: "When is a linear combination of independent fBms equivalent to a single fBm?", Stochastic Processes and their Applications, 117(1), 57–70 (2007)

[45] Van Zanten, H.: "A remark on the equivalence of Gaussian processes", Electronic Communications in Probability, 13, 54–59 (2008)

[46] Xiao, Y.: "Strong local nondeterminism and sample path properties of Gaussian random fields", in Asymptotic Theory in Probability and Statistics with Applications (T.-L. Lai, Q.-M. Shao and L. Qian, eds.), Higher Education Press, Beijing, 136–176 (2007)

[47] Xue, Y. and Xiao, Y.: "Fractal and smoothness properties of space-time Gaussian models", Frontiers of Mathematics in China, 6(6), 1217–1248 (2011)

[48] Yadrenko, M. I.: "Spectral Theory of Random Fields", A. V. Balakrishnan (Ed.), Optimization Software, Publications Division (1983)

[49] Yaglom, A. M.: "Some classes of random fields in n-dimensional space, related to stationary random processes", Theory of Probability and Its Applications, 2(3), 273–320 (1957)

[50] Zhang, H.: "Inconsistent estimation and asymptotically equal interpolations in model–based geostatistics", Journal of the American Statistical Association, 99(465), 250–261 (2004)