NONSTATIONARY GAUSSIAN RANDOM FIELDS WITH APPLICATION TO SPACE AND SPACE–TIME MODELING

By Abolfazl Safikhani

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics – Doctor of Philosophy

2015

ABSTRACT

NONSTATIONARY GAUSSIAN RANDOM FIELDS WITH APPLICATION TO SPACE AND SPACE–TIME MODELING

By Abolfazl Safikhani

Space–time models have become increasingly popular in scientific studies. Gaussian random fields (GRFs) are an important tool in this field. Developing covariance structures for GRFs, parameter estimation, and prediction in big spatial datasets are central challenges in spatio–temporal modeling. My thesis consists of three main chapters dealing with these challenges.

In chapter 2, we investigate the problem of equivalence of GRFs with stationary increments and obtain sufficient conditions for equivalence in terms of the behavior of the spectral measures at infinity. The main results are then applied to a rich family of nonstationary space–time models with possible anisotropic behavior.

In chapter 3, we propose a nonstationary parametric model in which the underlying Gaussian random field may have different regularities in different directions, and thus can be used to model anisotropy. Using the theory of equivalence of Gaussian measures under nonstationary assumptions, strong consistency of the tapered likelihood-based estimator of the variance component under fixed-domain asymptotics is derived by placing mild conditions on the spectral behavior of the tapering covariance function.

In chapter 4, we merge the ideas of factor modeling and low-rank spatial process approximation to propose a class of hierarchical low-rank spatial factor models, which offer a rich and flexible option for modeling a large vector of outcomes observed at a large number of locations. A Markov chain Monte Carlo algorithm is developed for estimation, and the full posterior distributions are recovered in a Bayesian predictive framework.

Keywords: Space–time models, nonstationary Gaussian random fields, equivalence of Gaussian measures, covariance tapering, fixed-domain asymptotics, spatial factor modeling, Gaussian predictive processes, hierarchical Bayesian modeling

I dedicate this dissertation to my wonderful parents.

ACKNOWLEDGMENTS

I wish to express my deepest gratitude to my advisors Dr. Yimin Xiao and Dr. Alla Sikorskii. Their guidance, patience and thoughtfulness made this work possible. It was a privilege to work with them. I would like to thank my thesis committee members Drs. Andrew Finley, V. S. Mandrekar, and Chae Young Lim for their time and interest. I also appreciate the help and support of the Department of Statistics and Probability over the last five years.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1  Introduction
Chapter 2  Spectral Conditions for Equivalence of Gaussian Random Fields with Stationary Increments
  2.1 Introduction
  2.2 Preliminary
  2.3 Main Results
  2.4 Application
Chapter 3  Covariance Tapering for Anisotropic Nonstationary Gaussian Random Fields with Application to Large Scale Spatial Data Sets
  3.1 Introduction
  3.2 Preliminary
  3.3 Main Result
  3.4 Simulation Results
Chapter 4  Spatial factor models and dimension reducing Gaussian predictive processes for high dimensional remotely sensed data and forest variables
  4.1 Model
  4.2 Identifiability Issue
  4.3 Prior Specification
  4.4 Approximation using Gaussian predictive processes
  4.5 Data equation
  4.6 Computations
    4.6.1 Updating the mean part b
    4.6.2 Updating ω
    4.6.3 Updating the loading factors Λ
    4.6.4 Updating the nuggets τ
  4.7 Simulations
  4.8 Real Data Analysis
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1  Results for estimation of σ²
Table 4.1  Parameter estimates with their 95% credible intervals
Table 4.2  Parameter estimates with their 95% credible intervals

LIST OF FIGURES

Figure 3.1  3D Simulated Surface on [0, 1]² with increments 0.02
Figure 3.2  2D Projection of the Simulated Surface on [0, 1]² with increments 0.02
Figure 3.3  Simulated Surface on [0, 1]² with increments 0.07
Figure 4.1  AGB measurements from the Penobscot Experimental Forest in Bradley, Maine
Figure 4.2  Fitted plot of AGB measurements together with the observed values
Figure 4.3  LiDAR data with fitted values and credible bands

Chapter 1

Introduction

Space–time models have become increasingly popular in scientific studies such as geology, climatology, geophysics, and the environmental and atmospheric sciences ([9] and [41]). Gaussian random fields (GRFs) are an important tool in this field. Developing covariance structures for GRFs, parameter estimation, and prediction in big spatial datasets are central challenges in spatio–temporal modeling. My thesis consists of three main chapters dealing with these challenges. Finding conditions for equivalence of Gaussian measures is necessary for studying the asymptotics in the framework of covariance tapering.
Equivalence of Gaussian measures is a classical problem in probability theory that has been studied since early 60s and 70s. Necessary and sufficient conditions for the equivalence of general Gaussian measures are given in [22], [15], [19]. However, the classical conditions are rather abstract, and may not be directly applicable to concrete cases. More verifiable conditions are then stated by restricting to a particular family of Gaussian measures. There are criteria for equivalence of stationary Gaussian processes in terms of their spectral densities in [25], and more general for stationary GRFs in [37] and [48]. However, investigation on the equivalence of nonstationary GRFs is limited to some special cases which are discussed in recent works (see, e.g., [7], [42], and [44]). In chapter 2, we study the equivalence of GRFs with stationary increments, and give a spectral condition under which the two GRFs will induce equivalent measures. This condition is easy to verify, and the results are applicable to different families of GRFs such 1 as the ones with the power law as their spectral densities with different power exponents in different dimensions, thus allowing anisotropy. Researchers across scientific fields are asking increasingly complex questions that require spatio–temporal models able to accommodate rich association structures. In such settings, traditional assumptions such as stationarity and isotropy on the underlying GRFs may not be appropriate. These assumptions become increasingly tenuous as the size of the dataset and domain increase, which is increasingly common in many applied settings. There have been considerable attempts in recent literature to construct nonstationary anisotropic spatial models (See [24], [17], [30], [31], [26], [1] and references therein for the construction of such nonstationary models). In chapter 3, we study covariance estimation for a family of anisotropic GRFs with stationary increments using covariance tapering (See [18] for definition). Covariance tapering introduces sparsity to allow for more efficient matrix operations which is essential in dealing with large scale data sets. Consistency of MLE under the tapered covariance structure is established by putting a mild condition on the decay rate of the spectral density of the tapering function at infinity. This result also deals with the problem of equivalence of nonstationary GRFs. Researchers are increasingly seeking inference about multiple outcome variables across spatial and temporal domains. This interest in applied community is due in large part to widely available geographic information systems (GIS), geographic position systems (GPS), and open source long-term monitoring networks. In such settings, models should capture both within location (i.e., among outcome variable) and between location covariance structures. Delivering this inference is often difficult due to the massive size of the given dataset. In chapter 4, we merge two ideas of factor modeling and low-rank spatial processes approximation to develop and propose a class of hierarchical low-rank spatial factor models 2 which offer a rich and flexible modeling option for dealing with large vector of outcomes observed at large number of locations. A Markov chain Monte Carlo algorithm is developed for estimation, and further, the full posterior distributions are recovered in a Bayesian predictive framework. 
We illustrate our methodology with simulation experiments and an environmental data set involving Light Detection and Ranging (LiDAR) signals together with above-ground biomass (AGB) from the Penobscot Experimental Forest in Bradley, Maine. 3 Chapter 2 Spectral Conditions for Equivalence of Gaussian Random Fields with stationary increments Space-time models have become increasingly popular in scientific studies. Recent works in this field consist of construction of nonstationary anisotropic Gaussian random fields (GRFs). However, the problem of equivalence of measures corresponding to GRFs, which has direct consequences on the study of such models, is discussed mostly for the stationary case, and results for the nonstationary case are limited. In this chapter, we investigate the problem of equivalence of GRFs with stationary increments, and obtain sufficient conditions for equivalence in terms of the behavior of the spectral measures at infinity. Further, the main results are applied to a rich family of nonstationary space-time models with possible anisotropy behavior. 2.1 Introduction Space-time models have become increasingly popular in scientific studies such as geology, climatology, geophysics, environmental and atmospheric sciences, etc ([9] and [41]). Gaussian random fields (GRFs) are basically the main tools in space-time modeling. Most of the para4 metric models proposed are GRFs with specific parametric covariance structure (See [10], [20], [43] and [21] for rich families of space-time covariance functions). The main job then is to find consistent estimates for the parameters, and finally use them for prediction of the underlying random field at unobserved locations, and also in performing statistical inference. Given a parametric family of Gaussian distributions, an important question is whether all the parameters are consistently estimable. First step to answer this question demands an investigation on the equivalence of the corresponding Gaussian measures involved in the model, since if two parameters in the parameter space give equivalent Gaussian distributions, then it is impossible to find consistent estimators for all the parameters involved regardless of the method chosen for estimation (See for example [50] for discussion on inconsistent estimation in Matern covariance functions). Another application of equivalence of Gaussian measures comes from covariance structure misspecification, and its effect on spatial interpolation ([41], chapter 4). In spatial statistics, an important goal is to predict the underlying GRF at unobserved locations. The common method is the best linear unbiased predictor (BLUP), which in known as kriging (See [41] for definition and properties of kriging). This, however needs information about the covarince function of the underlying GRF, which is in fact unknown. Using misspecified covariance functions in prediction may increase the prediction error. [41] proved that covariance functions which yield to equivalent Gaussian measures, have asymptotically the same prediction error rate (See also [39] and [40]). Therefore, finding sufficient conditions under which two covarince functions result in equivalent Gaussian distributions has direct impact in evaluating the prediction error in interpolation of spatial data, and thus proving asymptotically optimal prediction under misspecified covariance structure. There are other applications of equivalence and perpendicularity of GRFs in spatial modeling. 
For example, we refer to [18] and [28] for the application in covariance tapering. 5 Equivalence of Gaussian measures is a classical problem in probability theory that has been studied since early 60’s and 70’s. Necessary and sufficient conditions for the equivalence of general Gaussian measures are given in [22], [15], [19], and references therein (See also [32], [27] for conditions in terms of time domain reproducing kernel Hilbert spaces). These conditions are rather abstract, and may not be directly applicable to concrete cases. More specific conditions are then stated by restricting to a particular family of Gaussian measures. There are criteria for equivalence of stationary Gaussian processes in terms of their spectral densities in [25], and more general for stationary GRFs in [37] and [48]. However, investigation on the equivalence of nonstationary GRFs is limited to some special cases which are discussed in recent works. For instance, we refer to [7] on mixed fractional Brownian motion, [5] on Volterra processes, [38] on multiparameter Gaussian processes, and [42] on a family of power law covariance functions. Recently, [44] found sufficient conditions for equivalence of Gaussian processes with stationary increments in terms of their spectral densities similar to one in [25] for the stationary case. However, this result is only for one dimensional Gaussian processes, and the case for multidimensional GRFs with stationary increments is not discussed. Space-time models have become more complex, and traditional assumptions such as stationarity and isotropy on the underlying GRFs might not be appropriate in data analysis in different fields such as geophysics, environmental and atmospheric sciences, etc. There have been considerable attempts in recent works for constructing nonstationary anisotropic spatial models (We refer to [24], [17], [30], [31],[26], [1] and references therein for the construction of such nonstationary models). Therefore, studying the equivalence of nonstationary GRFs is of interest in the study of such nontrivial families of space-time models. In this paper, we will investigate the equivalence of GRFs with stationary increments which might have 6 different regularities in each direction, and thus allows anisotropy as well, and find sufficient conditions for their equivalence in terms of the behavior of their spectral measures at infinity. The rest of the chapter is organized as follows. We start section 2.2 by introducing some useful Hilbert spaces connected to the frequency domain, and study their structure. In section 2.3, we state the main result of the paper, which is sufficient conditions for equivalence of GRFs with stationary increments using the tail behavior of their spectral densities. In the last section, we apply the main results to a rich family of anisotropic nonstationary spatio-temporal Gaussian models. 2.2 Preliminary Let X = {Xt : t ∈ Rd } be a centered GRF with stationary increments with X(0) = 0. According to [49], we have the following spectral representations for X, and for the covariance structure of X, respectively: d X(t) = C(t, s) = E(Xt Xs ) = Rd Rd ei t,λ − 1 W (dλ), ei t,λ − 1 e−i s,λ − 1 F (dλ), (2.2.1) (2.2.2) where W is a complex-valued Gaussian random measure with control measure F , which is a non-negative symmetric measure on Rd \ {0}, called the spectral measure of X satisfying |λ|2 Rd 1 + |λ|2 F (dλ) < ∞. 
(2.2.3) If the spectral measure is absolutely continuous with respect to the Lebesgue measure 7 on Rd , we will call its Radon-Nikodym derivative spectral density, denoted by f (λ). We will give conditions in terms of the spectral densities for the equivalence of GRFs with stationary increments, but first, we need to define precisely equivalent GRFs. Definition 2.2.1. For a fixed D ⊆ Rd , we call two GRFs X = {Xt : t ∈ Rd }, Y = {Yt : t ∈ Rd } equivalent if they induce equivalent measures on the measurable space RD , B RD , in which B RD is the σ-field generated by the Borel subsets of RD . Moreover, we call them locally equivalent if they are equivalent on any bounded subset of Rd . Such spectral representations for the covariance structure of GRFs of the type given in 2.2.2 makes an important bridge between the problem of equivalence of GRFs and the description of the space generated by the linear combinations of the kernel functions in the spectral representation. For that purpose, in this section we need to study such linear spaces, and explore their properties which will be valuable in the next section. For that, let’s define for a fixed D ⊆ Rd bounded, and a measure F on Rd satisfying 2.2.3, an incomplete Hilbert space LeD = span et (λ) := ei t,λ − 1 : t ∈ D with the inner product et , es F = Rd et (λ)es (λ) F (dλ). We denote the closure in L2 (F ) of LeD by LD (F ). Also, for T > 0 define ΠT := λ = (λ1 , ..., λd ) ∈ Rd : −T ≤ λj ≤ T, ∀j = 1, ..., d . By looking at the basis functions generating LeD , we realize that they are entire functions defined on Cd , and they are of finite exponential type (See [36] for definition and properties). However, the elements in the completed Hilbert space, LD (F ), may not have the same properties. These properties enable us to apply Paley-Wiener type theorems, and get nice description of the elements in theses Hilbert spaces. This problem is discussed in details in 8 [34] and [33], and there are conditions mentioned in [33] on the measure F for this purpose. Therefore, we need to put further assumption on the spectral measure F in order to get the same nice properties. Here is the crucial assumption on the tail behavior of the spectral density f (λ), assuming it exists: (C1) There exist c, k, η > 0 such that for all |λ| > k, we have f (λ) ≥ |λ|c η . This assumption on the spectral density will force the elements in LD (F ) to be entire functions of finite exponential type. The next two lemmas will prove this statement. The following lemma is taken from [46], Lemma 2.2, and we state it here again for completeness. Lemma 2.2.1. Suppose that the spectral density f satisfies (C1). Then for fixed T > 0, there exists positive constants C and M such that for all functions φ of the form n φ(λ) = k ak ei t ,λ − 1 , (2.2.4) k=1 where ak ∈ R and tk ∈ ΠT , we have for all z ∈ Cd |φ(z)| ≤ C φ F exp{M |z|}. (2.2.5) Moreover, for fixed C1 > 0, there exists a positive constant C2 such that for all functions of the form 2.2.4, we have |φ(z)| ≤ C2 |z| φ F (2.2.6) for all |z| ≤ C1 . One can use 2.2.5 to define the limiting functions in LΠ (F ) in such a way that they T also satisfy both 2.2.5 and 2.2.6. We will prove this in the next Lemma. 9 Lemma 2.2.2. Let F be a spectral measure on Rd which satisfies (C1). Then, for each T > 0, the space LΠ (F ) consists of the restriction to Rd of entire functions on Cd of finite T exponential type. Moreover, 2.2.6 is also true for all functions in LΠ (F ). T Proof. The idea of the proof is similar to [33], p. 304. 
Take a sequence φn ∈ LeΠ , such that T φn − φ F → 0 for some φ ∈ LΠ (F ). Then, it is a Cauchy sequence in L2 (F ), and using T 2.2.5, we get |φn (z) − φm (z)| ≤ C φn − φm F exp{M |z|} (2.2.7) This means for each fixed z ∈ Cd , the sequence φn (z) is a Cauchy sequence in C. So, it is convergent. Let’s denote the limit by φ(z). Now, since limit in L2 (F ) sense implies the almost sure convergence for a subsequence, φ = φ a.e. with respect to F . From now on, we will take φ as our favorite version of the limits of functions in LeΠ . Therefore, the elements in T the space LΠ (F ), are not only the T L2 (F ) limits of functions in LeΠ , but also the pointwise T limits as well. Thus, both 2.2.5 and 2.2.6 are true for all the elements in LΠ (F ). The only T thing left to prove is that these functions are entire functions on Cd . But this is true since any element of the space LΠ (F ) is the locally uniform limit of functions of the form 2.2.4 T which are obviously entire functions, and thus, they are also entire functions as well (This is called Weirerstrass Thoerem. See for example [14], Proposition 2.8, p. 52). This lemma shows that if the spectral density satisfies the assumption (C1), we can complete the space LeΠ in such a way that the resulting functions are locally uniform T limits of entire functions, and hence, they are entire functions as well, and they are also of finite exponential type. Further, since 2.2.5 is true for all the elements in the Hilbert space LΠ (F ), we can see that the point evaluators, i.e. the functionals on LΠ (F ) of the T T 10 form φ → φ(z) for each fixed z ∈ Cd are bounded operators. Now, we can apply Riesz representation Theorem (See [23], Theorem 3. p. 31 ) to prove that the space LΠ (F ) is T a Reproducing Kernel Hilbert Space (RKHS) in the sense of [2]. This means there exists a family of functions KT (., .) from Rd × Rd to C such that first of all KT (ω, .) ∈ LΠ (F ) for T all ω ∈ Rd , and secondly, for every φ ∈ LΠ (F ) and ω ∈ Rd , we have the following kernel T property φ, KT (ω, .) F = Rd φ(λ)KT (ω, λ) F (dλ) = φ(ω). (2.2.8) Also, it is worthwhile to mention that the set of all functions {KT (ω, .) : ω ∈ Rd } is dense in LΠ (F ) (To see this, note that if φ ∈ LΠ (F ) is orthogonal to KT (ω, .) for all ω ∈ Rd , T T then φ(ω) = φ, KT (ω, .) F = 0, which implies φ = 0.), and further for all ψ ∈ L2 (F ), the function ω → ψ, KT (ω, .) F (2.2.9) is the orthogonal projection of ψ on LΠ (F ) (See the proof in [2], p. 345). We denote this T projection by πL (F ) ψ. ΠT Finding explicit forms of the reproducing kernels is not an easy job. However, in order to prove the results in section 2.3, we need upper bounds for the growth rate of the diagonal elements of the reproducing kernels at origin and also at infinity. The following proposition proves an important growth rate for the diagonal elements in the reproducing kernels. Proposition 2.2.1. Suppose that the spectral density of F , f (λ) satisfies (C1). Then, for fixed T > 0 and C1 > 0, there exists a positive constant C2 such that KT (ω, ω) ≤ C2 |ω|2 11 (2.2.10) for all |ω| < C1 . Proof. Since for fixed ω ∈ Rd , KT (ω, .) ∈ LΠ (F ), we can apply Lemma 2.2.2 to these T functions. This means that the inequality 2.2.6 can be applied to them. After doing so, we get |KT (ω, λ)| ≤ C2 |λ| KT (ω, .) F = C2 |λ|(KT (ω, ω))1/2 for all ω ∈ Rd and λ ∈ Cd with |λ| < C1 . Now, if we put λ = ω, since the constant C2 doesn’t depend on ω, we will get the desired result. 
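As a concrete illustration of the inner product ⟨e_t, e_s⟩_F on which these spaces are built — by (2.2.2) it is nothing but the covariance C(t, s) — the following sketch evaluates the Gram matrix of a few kernels e_t numerically. This is only an informal numerical aside: the one-dimensional spectral density |λ|⁻² is a hypothetical toy choice (up to a constant it corresponds to Brownian motion, which gives an easy sanity check), and the truncation of the frequency integral is purely for numerical convenience.

```python
import numpy as np
from scipy.integrate import quad

def f(lam):
    # Toy 1-d spectral density of power-law type (hypothetical choice);
    # it satisfies the integrability condition (2.2.3).
    return np.abs(lam) ** (-2.0)

def cov(t, s, lam_max=500.0):
    """Numerically evaluate C(t, s) = <e_t, e_s>_F as in (2.2.2).

    Since f is symmetric, the imaginary part of
    (e^{i t lam} - 1)(e^{-i s lam} - 1) f(lam) integrates to zero, leaving the
    even real integrand [cos((t-s) lam) - cos(t lam) - cos(s lam) + 1] f(lam).
    The frequency integral is truncated at lam_max (tail error O(1/lam_max)).
    """
    integrand = lambda lam: (np.cos((t - s) * lam) - np.cos(t * lam)
                             - np.cos(s * lam) + 1.0) * f(lam)
    # start just above 0 to avoid a 0 * inf evaluation in floating point
    val, _ = quad(integrand, 1e-9, lam_max, limit=500)
    return 2.0 * val

ts = [0.2, 0.5, 0.8, 1.0]                          # a few points of Pi_T, T = 1
G = np.array([[cov(t, s) for s in ts] for t in ts])
# For f(lam) = |lam|^{-2} one can check C(t, s) = 2*pi*min(t, s), so the
# scaled Gram matrix below should be close to the matrix of pairwise minima.
print(np.round(G / (2.0 * np.pi), 3))
```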
We also need to define another Hilbert space based on the tensor product of the elements in LΠ (F ). For that purpose, first we define LeΠ ⊗ LeΠ to be the span of functions T T T (et ⊗ es )(ω, λ) := et (ω)es (λ) with t, s ∈ ΠT . Now, denote by LΠ (F ) ⊗ LΠ (F ) the closure T T in L2 (F ⊗ F ) of the space LeΠ ⊗ LeΠ . According to [2], Theorem 1, p. 361, the new Hilbert T T space LΠ (F ) ⊗ LΠ (F ) is also a RKHS with reproducing kernel T T ((ω1 , λ1 ), (ω2 , λ2 )) → KT (ω1 , ω2 )KT (λ1 , λ2 ). (2.2.11) This means in fact that for ψ ∈ LΠ (F ) ⊗ LΠ (F ), T T ψ, KT (ω, .) ⊗ KT (λ, .) F ⊗F = ψ(ω, λ). (2.2.12) We finish this section by a lemma stating that the norm of the elements in spaces LΠ (F ) T only depends on the tail behavior of the spectral measure F . 12 Lemma 2.2.3. Suppose f0 and f1 are two spectral densities satisfying the condition (C1), and further f0 (λ) f1 (λ) as |λ| → ∞. Then, LΠ (f0 ) = LΠ (f1 ), and further there exist T T positive constants C3 and C4 such that for all φ ∈ LΠ (f0 ) T C3 φ f ≤ φ f ≤ C4 φ f . 1 0 1 Proof. Suppose g ∈ LΠ (f0 ). There exists functions gn of the form 2.2.4 such that T g − gn f → 0 as n → ∞. Now, using lemma 2.2.2, we get 0 g − gn 2f 1 = Rd |g(λ) − gn (λ)|2 f1 (λ) dλ ≤ C2 g − gn 2f 0 |λ| 0 if and only if: φ F , ∀φ ∈ LeΠ , 1 T (i) φ F 0 (ii) ψ(ω, λ) = KT0 (ω, λ) − Rd KT0 (ω, γ)KT0 (λ, γ) F1 (dγ) ∈ LΠ (F0 ) ⊗ LΠ (F0 ) where T T KT0 (., .) are the reproducing kernels of the space LΠ (F0 ). T Proof. First, assume that the measures induced by them are equivalent. Then, by theorem 2.3.1, there exists a function ψ ∈ LΠ (F0 ) ⊗ LΠ (F0 ) such that the equation 2.3.1 will T T hold. Now, because of bilinearity and continuity of inner product together with the fact that LΠ (F0 ) = LΠ (F1 ), we get T T φ1 , φ2 F − φ1 , φ2 F = ψ, φ1 ⊗ φ2 F ⊗F 0 1 0 0 (2.3.2) for all φ1 , φ2 ∈ LΠ (F0 ) = LΠ (F1 ). Now, simply choose for fixed ω, λ ∈ Rd , φ1 (γ) = T T KT0 (ω, γ) and φ2 (γ) = KT0 (λ, γ), and replace them in equation 2.3.2 to get ψ(ω, λ) = KT0 (ω, λ) − Rd KT0 (ω, γ)KT0 (λ, γ) F1 (dγ). Conversely, since ψ ∈ LΠ (F0 ) ⊗ LΠ (F0 ), by reproducing kernel property we get ψ(ω, λ) = T T ψ, KT0 (ω, .) ⊗ KT0 (λ, .) F ⊗F . Also, note that by the form of ψ, we have 0 0 16 ψ(ω, λ) = KT0 (ω, .), KT0 (λ, .) F − KT0 (ω, .), KT0 (λ, .) F . Combining them together, we get 0 1 KT0 (ω, .), KT0 (λ, .) F − KT0 (ω, .), KT0 (λ, .) F = ψ, KT0 (ω, .) ⊗ KT0 (λ, .) F ⊗F . (2.3.3) 0 1 0 0 Now, since the span{KT0 (ω, .); ω ∈ Rd } is dense in LΠ (F0 )(= LΠ (F1 )), the equality 2.3.3 T T will be true for all the elements in LΠ (F0 ). T Remark 2.3.1. In order to prove that the function ψ appeared in Theorem 2.3.2 is in the space LΠ (F0 ) ⊗ LΠ (F0 ), we only need to show that it belongs to L2 (F0 ⊗ F0 ). To prove T T this fact, we will show that the orthogonal projection of ψ on LΠ (F0 ) ⊗ LΠ (F0 ) is in fact T T itself. For that, observe πL ΠT (F0 )⊗LΠT (F0 ) (ψ) (ω, λ) = = = ψ, KT0 (ω, .) ⊗ KT0 (λ, .) Rd Rd F0 ⊗F0 ψ(x, y)KT0 (ω, x)KT0 (λ, y) F0 (dx)F0 (dy) Rd Rd R d KT0 (x, γ)KT0 (y, γ)KT0 (ω, x)KT0 (λ, y) × F0 (dx)F0 (dy)(F0 (dγ) − F1 (dγ)) = Rd KT0 (ω, γ)KT0 (λ, γ) (F0 (dγ) − F1 (dγ)) = ψ(ω, λ). This is a huge reduction which strengthen the applicability of Theorem 2.3.2. By Remark 2.3.1, checking the second assumption in Theorem 2.3.2 is reduced to only checking ψ ∈ L2 (F0 ⊗ F0 ). However, checking the first assumption seems to be not an easy job since we need to compare the norms of all the elements in the space LeΠ under T 17 two different measures. 
For that purpose, in the following, we will try to find equivalent conditions which may be easier to verify in application. It is well known (See [13], p. 1009 ) that for ψ ∈ L2 (F ⊗ F ), one can define a HilbertSchmidt operator on L2 (F ) in the following way (ψφ) (ω) = Rd ψ(ω, λ)φ(λ) F (dλ) (2.3.4) for every φ ∈ L2 (F ). If we use specifically the ψ defined in Theorem 2.3.2, and restrict the domain to LΠ (F ), we will have again a Hilbert-Schmidt operator on LΠ (F ). Note that T T the image of the operator ψ is in fact inside the LΠ (F ). To prove this, observe that for T φ ∈ LΠ (F ), T πL ΠT (ψφ) (ω) = = = (ψφ)(x)KT (ω, x) F (dx) Rd Rd Rd φ(y)ψ(x, y)KT (ω, x) F (dx)F (dy) Rd Rd Rd φ(y)KT (x, γ)KT (y, γ)KT (ω, x) F (dx)F (dy) ×(F (dγ) − F1 (dγ)) = = Rd Rd Rd φ(y)KT (ω, γ)KT (y, γ) (F (dγ) − F1 (dγ))F (dy) φ(y)ψ(ω, y) F (dy) = (ψφ)(ω). This argument shows that ψφ ∈ LΠ (F ) for any φ ∈ LΠ (F ). Also, observe that since T T ψ(ω, λ) = ψ(λ, ω), the operator ψ is self-adjoint. This fact together with compactness of this operator (Since ψ is a Hilbert-Schmidt operator, it is already compact, see [13], p. 1009) enable us to use the spectral Theorem for compact normal operators (See [13], Corollary 5, 18 p. 905), which we will use in the proof of the next Theorem. In fact, next theorem shows that the first condition in Theorem 2.3.2 can be replaced by 1 ∈ / σ(ψ), where σ(ψ) is the spectrum of the operator ψ (See [13], p. 902 for definition). Theorem 2.3.3. Two GRFs with stationary increments and spectral measures F0 and F1 with F0 satisfying the condition (C1), are equivalent on ΠT if and only if the function defined by ψ(ω, λ) = KT0 (ω, λ) − Rd KT0 (ω, γ)KT0 (λ, γ) F1 (dγ) (2.3.5) belongs to LΠ (F0 ) ⊗ LΠ (F0 ), and 1 ∈ / σ(ψ). T T Proof. From 2.3.2, by putting φ1 = φ2 = φ, and the definition of the operator ψ in 2.3.4, we get φ 2F 1 =1− φ 2F 0 ψφ, φ F 0 2 φ F (2.3.6) 0 for all φ ∈ LΠ (F0 ). This simply implies that first σ(ψ) ⊆ (−∞, 1], and second there exists T a finite positive constant C such that φ F ≤ C φ F for all φ ∈ LeΠ since ψ is a bounded 1 0 T operator. This fact shows that proving ψ ∈ L2 (F0 ⊗ F0 ) is helping us to verify half of what we need in the first condition of theorem 2.3.2 as well. What remains is to show that 1∈ / σ(ψ) if and only if there exists a positive constant c such that φ F ≤ c φ F for all 0 1 φ ∈ LeΠ . T First, suppose that φ F ≤ c φ F for some c > 0. If 1 ∈ σ(ψ), it means that 0 1 there exists φ ∈ LΠ (F0 ) with φ F = 1 such that ψφ = φ. Putting it in 2.3.6, we get T 0 φ F = 0 which is contradiction. Conversely, suppose 1 ∈ / σ(ψ), and also there exists a 1 sequence φn ∈ LeΠ T such that φn F = 1 for all n ≥ 1, and φn F → 0 as n → ∞. 0 1 Since ψ is a self-adjoint compact operator, by Corollary 5 p. 905 in [13], there exists a 19 countable orthonormal basis for LΠ (F0 ) consisting of eigenvectors of ψ, denoting them T by gj , j = 1, 2, ... with corresponding eigenvalues λj . Now, each φn has the representation φn = ∞ j=1 anj gj for anj ∈ R. Putting this sequence back to the equation 2.3.6, we get that ψφn , φn F → 1 which means 0 ∞ 2 j=1 anj , ∞ 2 j=1 anj λj → 1 as n → ∞. Now, since 1 = φn 2F = we can rewrite the above equation as 0 ≤ 0 ∞ 2 j=1 anj (1 − λj ) → 0 (This quantity is non-negative since all the eigenvalues are bounded above by 1). Since 1 ∈ / σ(ψ), and {λj , j = 1, 2, ...} has no accumulation points in C except possibly 0 (See [13], Corollary 5 p. 905), there exists ∞ 2 j=1 anj (1 − λj ) ≥ > 0 such that sup {λj , j ≥ 1} = 1 − . 
However, this implies that ∞ 2 j=1 anj = for all n ∈ N, which is contradiction by the fact that this sequence must go to 0 when n goes to ∞. This completes the proof. Remark 2.3.2. Note that based on the proof of theorem 2.3.3, we can change the first condition in theorem 2.3.2 to (i) There exists a positive constant c such that φ F ≤ c φ F for all φ ∈ LeΠ . 0 1 T As lemma 2.2.3 emphasizes that the behavior of the spectral measure at origin does not affect the structure of the space LΠ (F ), one might expect the same formation in terms of the T equivalence of Gaussian measures. The following Theorem shows that changing the spectral measure on bounded subsets of Rd will not affect the equivalence of the corresponding GRFs. In other words, for checking the equivalence of GRFs, only the behavior of their spectral measures at infinity is important. Theorem 2.3.4. Suppose two GRFs with stationary increments have spectral measures F0 and F1 such that F0 satisfies the condition (C1), and F0 = F1 on I c , where I is a bounded subset of Rd . Then, these two GRFs are locally equivalent. 20 Proof. Define F1 (dλ) = 1I c (λ)F0 (dλ). First, we show that F0 and F1 will produce locally equivalent GRFs with stationary increments. For that, fix T > 0. We will investigate the equivalence of measures on ΠT . The function ψ appearing in Theorem 2.3.3 in this case is given by ψ(ω, λ) = I KT0 (ω, γ)KT0 (λ, γ) F0 (dγ). Now, by Fubini’s Theorem, and the reproducing kernel property we get ψ 2F ⊗F = 0 0 2 Rd R d = I I I Rd KT0 (ω, γ)KT0 (λ, γ) F0 (dγ) F0 (dω)F0 (dλ) KT0 (ω, γ)KT0 (ω, γ )F0 (dω) R KT0 (λ, γ )KT0 (λ, γ)F0 (dλ) d ×F0 (dγ)F0 (dγ ) 2 = I I KT0 (γ, γ ) F0 (dγ)F0 (dγ ). Now, using Proposition 2.2.1, and the fact that KT0 (γ, γ ) ψ 2F ⊗F ≤ 0 0 I 2 ≤ KT0 (γ, γ)KT0 (γ , γ ), we get KT0 (γ, γ) F0 (dγ) 2 ≤ const. I 2 2 |γ| F0 (dγ) < ∞ by 2.2.3. This implies ψ ∈ LΠ (F0 ) ⊗ LΠ (F0 ). T T It remains to show that 1 ∈ / σ(ψ). For that purpose, take an arbitrary φ ∈ LΠ (F0 ), and T observe that (We use the fact that KT0 (ω, λ) = KT0 (λ, ω). See [2], p. 344 ) 21 (ψφ) (λ) = = = Rd Rd φ(ω)ψ(λ, ω) F0 (dω) φ(ω) I Rd = I I KT0 (λ, γ)KT0 (ω, γ) F0 (dγ) F0 (dω) φ(ω) KT0 (λ, γ)KT0 (γ, ω) F0 (dω)F0 (dγ) φ(γ) KT0 (λ, γ) F0 (dγ) = πL ΠT (F0 ) φ1I (λ). Therefore, if ψφ = φ, it implies in particular that φ F ≤ φ1I F . This means φ = 0 0 0 almost everywhere with respect to F0 in I c . Hence, since φ is an entire function, this implies that φ = 0. Thus, 1 cannot be in the spectrum of ψ. So far, we proved GRFs with spectral measures F0 and F1 are locally equivalent, but since F0 = F1 on I, similarly, we can say that F1 and F1 produce locally equivalent GRFs. Putting these two together, we get the desired result. Theorems 2.3.2, and 2.3.3 give necessary and sufficient conditions for equivalence of GRFs with stationary increments, but still verifying the conditions mentioned in these two theorems for some concrete cases might be difficult. In the literature, there are sufficient conditions for equivalence of certain GRFs in terms of only their spectral densities. These conditions are easily verifiable once the two spectral densities are given. For example, we refer to [25], theorem 17, p. 104, [37], theorem 4, and [48], theorem 4, p. 156 for stationary GRFs, and also [44], theorem 5.3, [45], theorem 1, and [42] theorem 1 for some nonstationary cases. Therefore, in the rest of this section, we attempt to find similar sufficient conditions for equivalence, only in terms of the spectral measures. 22 Theorem 2.3.5. 
Suppose that the spectral measure F0 and F1 have positive densities f0 and f1 with respect to the Lebesgue measure, and F0 satisfies the condition (C1). If there exists a finite constant C > 0 such that φ F ≤ C φ F for all φ ∈ LeΠ , and 0 1 T f1 (λ) − f0 (λ) 2 0 KT (λ, λ)f0 (λ) dλ < ∞ f0 (λ) |λ|>k (2.3.7) for some k > 0, then GRFs with stationary increments and spectral measures F0 and F1 are equivalent on ΠT . Proof. Applying theorem 2.3.4, we can change the value of f1 on any bounded set, without having any consequences on the equivalence. So, we assume here that f0 = f1 on |λ| ≤ k. The function ψ in Theorem 2.3.3 here will be of the form ψ(ω, λ) = Rd KT0 (ω, γ)KT0 (λ, γ) (f0 (λ) − f1 (λ)) dγ f − f1 = πL (F ) KT0 (ω, .) 0 ΠT 0 f0 (observe that since KT0 (ω, λ) 2 (λ). f −f ≤ KT0 (ω, ω)KT0 (λ, λ), 2.3.7 implies that KT0 (ω, .) 0f 1 ∈ 0 L2 (F0 ) for all ω ∈ Rd , and hence using the orthogonal projection makes sense). Now, it 23 follows that Rd R d |ψ(ω, λ)|2 F0 (dλ)F0 (dω) = ≤ = 2 f − f1 πL (F ) KT0 (ω, .) 0 D 0 f0 Rd F0 (dω) F0 f0 − f1 2 0 KT (ω, .) F0 (dω) f0 Rd F0 Rd R 2 |KT0 (ω, γ)| d f0 (γ) − f1 (γ) 2 F0 (dγ) f0 (γ) ×F0 (dω) = = f0 (γ) − f1 (γ) 2 0 KT (γ, γ)f0 (γ) dγ f0 (γ) Rd f1 (γ) − f0 (γ) 2 0 KT (γ, γ)f0 (γ) dγ f0 (γ) |γ|>k < ∞ by the integrability assumption 2.3.7. Hence ψ ∈ L2 (F0 ⊗ F0 ), and this completes the proof. In the condition 2.3.7, in addition to the behavior of the spectral densities at infinity, the growth rate of the diagonal elements of the reproducing kernels of the space LΠ (F0 ) at T infinity is also playing an important role. Since finding explicit forms of reproducing kernels are difficult, we need at least to find upper bounds for the growth rate of the diagonal terms. The following condition on spectral density helps us to accomplish this task: (C2) For spectral density f , there exist an entire function φ0 on Cd of finite exponential type such that f (λ) |φ0 (λ)|2 as |λ| → ∞ on Rd . As it was mentioned, condition (C2) will force an upper bound for the behavior of the reproducing kernels on the diagonal at infinity, and the following lemma explores this fact. 24 Lemma 2.3.1. Suppose f0 is a spectral density such that it satisfies (C1) and (C2) for some entire function φ0 . Then, for T > 0, there exists a finite constant C > 0 such that the reproducing kernel KT0 of LΠ (f0 ) satisfies T 2 KT0 (ω, λ) KT0 (ω, ω) ≤C f0 (λ) for all real ω, and all λ large enough. In particular, KT0 (λ, λ) ≤ C f0 (λ) (2.3.8) for all λ large enough. Proof. The idea of the proof is similar to the one in Lemma 3 p. 57 in [45]. Put f (λ) = |φ0 (λ)|2 . Since f and f0 are comparable at ∞, and f is bounded around 0, it is clear that f is satisfying both conditions 2.2.3 and (C1). This means we can define LΠ (f ) the same T way, and this space is also a RKHS. Consider an arbitrary orthonormal basis for this space, and denote them by ψk , k = 1, 2, ... . Now, by Lemma 2.2.2, they are entire functions on Cd with finite exponential type which doesn’t depend on k. Further, we know ψk φ0 ∈ L2 (Rd ) since Rd |ψk (λ)φ0 (λ)|2 dλ = Rd |ψk (λ)|2 f (λ) dλ = 1 < ∞. Therefore, we can apply the Paley-Wiener Theorem ([36], Theorem 3.4.2. p. 171) to get ψk φ0 = gk for certain functions gk ∈ L2 (B) where B is a bounded subset of Rd (Here h stands for the Fourier transform of h). By Parseval relation, we can deduce that gk ’s are 25 orthonormal in L2 (B). It follows from Bessel’s inequality that |ψk (λ)|2 f (λ) = k k B e−i t,λ gk (t) dt 2 2 ei t,λ dt ≤ B = m(B), where m(B) is the Lebesgue measure of B. 
Therefore, |ψk (λ)|2 ≤ m(B)/f (λ) ≤ C/f0 (λ) k for large enough λ. Now, consider the reproducing kernels of LΠ (f0 ), denoting them by T KT0 (ω, .). Since f (λ) f0 (λ) as |λ| → ∞, by lemma (2.2.3), KT0 (ω, .) belong to LΠ (f ) T as well for ll ω ∈ Rd . Thus, we can expand it using the basis ψk , and get KT0 (ω, λ) = k KT0 (ω, .), ψk f ψk (λ), and then by Cauchy-Schwarz and lemma 2.2.3, we get 2 |KT0 (ω, λ)| 2 ≤ c = |ψk (λ)|2 KT0 (ω, .) f ≤ k 2 0 KT (ω, .) f 0 |ψk (λ)|2 k |ψk (λ)|2 , c KT0 (ω, ω) k which makes the proof complete. Theorem 2.3.5 in combination with the lemma 2.3.1 will lead to an appealing result. If the relative difference between two spectral densities is square integrable at infinity, then the corresponding GRFs with stationary increments will be locally equivalent. We finish this 26 section by proving this fact. Theorem 2.3.6. Suppose that the spectral measures F0 and F1 have positive densities f0 and f1 with respect to the Lebesgue measure, with f0 satisfying (C1) and (C2) for some entire function φ0 on Cd of finite exponential type. If there exists a finite constant k > 0 such that f1 (λ) − f0 (λ) 2 dλ < ∞, f0 (λ) |λ|>k (2.3.9) then GRFs with stationary increments having spectral measures F0 and F1 are locally equivalent. Proof. Combine Lemma 2.3.1, and Theorem 2.3.5. The only thing we need to additionally prove is that 1 ∈ / σ(ψ). In spirit of Theorem 2.3.4, we can assume that f0 = f1 on |λ| ≤ k. Now, take an arbitrary element φ ∈ LD (f0 ), and observe that using the multidimensional Paley-Wiener Theorem ([36], Theorem 3.4.2. p. 171), we get φφ0 is the inverse Fourier transform of a squared integrable function g with bounded support, B in Rd . This implies that |φ(λ)φ0 (λ)|2 = e−i λ,γ g(γ) dγ 2 B 2 e−i λ,γ g 2 (λ) dλ ≤ B g 2 (λ) dλ = B < ∞ for all λ ∈ Rd . This means φφ0 is bounded on Rd . This fact together with 2.3.9 imply that 27 f −f φ 0f 1 ∈ L2 (f0 ). Now, observe that 0 (ψφ) (λ) = = = Rd Rd φ(ω)ψ(λ, ω) F0 (dω) φ(ω) |γ|>k |γ|>k Rd = φ(ω) KT0 (λ, γ)KT0 (γ, ω) φ(γ) |γ|>k = πL KT0 (λ, γ)KT0 (ω, γ) ΠT (F0 ) f0 (γ) − f1 (γ) f0 (γ) f0 (γ) − f1 (γ) f0 (γ) f0 (γ) − f1 (γ) f0 (γ) F0 (dγ) F0 (dω) F0 (dω)F0 (dγ) KT0 (λ, γ) F0 (dγ) f − f1 φ 0 1{|γ|>k} (λ). f0 Now, similar to the proof of the Theorem 2.3.4, if ψφ = φ, we get that f −f . Letting k → ∞, by the Dominated Convergence Theorem, φ F ≤ φ 0f 1 1{|γ|>k} 0 0 F0 we get φ F = 0. This implies that φ = 0. Thus, 1 cannot be in the spectrum of ψ. The 0 proof is complete. 2.4 Application In this section, we apply the results in section 2.3 to some anisotropic GRFs with stationary increments. In particular, we consider GRFs with stationary increments and spectral density of the form f (λ) = 1 , βj γ d |λ | j=1 j where λ = (λ1 , ..., λd ) ∈ Rd \{0}, βj > 0 for all j = 1, ..., d, and γ > (2.4.1) d 1 j=1 βj . The latter condition guaranties the integrability condition in 2.2.3 for spectral measures (See Proposition 2.1 in [47]). Fractal and smoothness properties of this family of spatio-temporal 28 models are discussed in [47]. In order to apply the main results to this model, we need to verify the assumptions needed. First of all, condition (C1) is obviously satisfied for spectral densities of the form 2.4.1. Next lemma provides the proof that these spectral densities also satisfy (C2). Lemma 2.4.1. Spectral density functions of the form 2.4.1 satisfy condition (C2). Proof. First of all, it is obvious that 1 1 βj γ d j=1 |λj | 1+ , βj γ d |λ | j j=1 as |λ| → ∞. 
Therefore, it suffices to prove the lemma for functions of the form on the right hand side. Now, similar to the construction made in the proof of lemma 2.3 in [29], we can find a function φ ∈ L2 (B) for some bounded subset B ⊆ Rd such that 2 1 1+ βj d j=1 |λj | γ φ(λ) , as |λ| → ∞, which φ is the Fourier transform of φ. By Paley-Wiener Theorem ([36], Theorem 3.4.2. p. 171), φ is actually the restriction on Rd of an entire function on Cd with finite exponential type. Therefore, the proof is complete. In order to apply the main results, first we consider a similar family of spectral densities of the form f (λ) = 1 Hj Q+2 d j=1 |λj | , where λ = (λ1 , ..., λd ) ∈ Rd \{0}, 0 < Hj < 1 for all j = 1, ..., d, and Q = (2.4.2) d 1 j=1 Hj (See the connection between this family and the ones of the form 2.4.1 in remark 2.2 in [47]). 29 Theorem 2.4.1. Suppose f0 and f1 are spectral densities of the form 2.4.2 with parameters Hj0 and Hj1 , respectively. Then, GRFs with stationary increments and spectral densities f0 and f1 are locally equivalent if and only if Hj0 = Hj1 for all j = 1, ..., d. Proof. Suppose for some k ∈ {1, ..., d}, Hk0 < Hk1 . By Lemma 3.2 in [47], there exist c1 , c2 > 0, such that for all t ∈ Rd d 2Hji |t| c1 j=1 d ≤ et 2f i ≤ c2 2Hji |t| , (2.4.3) j=1 for i = 0, 1. If we simply choose t ∈ Rd with tk = l, and tj = 0 for j = k, we get et 2f 1 et 2f 0 c 2 Hk1 −Hk0 ≤ 2l → 0 as l → 0. c1 This violates the necessary condition for equivalence of Gaussian measures in Theorem 2.3.1. Therefore, the proof in complete. Now, we go back to the more flexible spectral densities for space-time models with stationary increments given in 2.4.1. Next theorem proves that under certain conditions, the mixture of spectral densities of the form 2.4.1 will be equivalent to the one with the lowest decay rate at infinity. Similar results of this type for Gaussian processes can be seen in [44] and [7]. Theorem 2.4.2. Suppose X and Y are two independent centered GRFs with stationary increments with spcetral densities of the form 2.4.1 with parameters (βj , γ) and (βj , γ ), 30 respectively. Then, if γ > 1 2 d j=1 1 βj + max1≤j≤d βj βj γ, (2.4.4) then X and X + Y will be locally equivalent. Proof. Using theorem 2.3.6, we only need to show βj 2γ d j=1 |λj | |λ|>1 βj d j=1 |λj | 2γ dλ < ∞. (2.4.5) By using the inequality (a + b)p ≤ 2p (ap + bp ), we can break the integral in 2.4.5 into d integrals. Thus, it’s enough to show for each fixed j = 1, ..., d Ij := 2β γ |λj | j |λ|>1 β d k k=1 |λk | 2γ dλ < ∞. (2.4.6) √ Since |λ| > 1, this implies that |λk | > 1/ d for some k ∈ {1, ..., d}. We need to make two cases. Case I is when k = j, and case II is when k = j. In both cases, we use the following fact that given positive constants β and γ, and nonnegative constant b, there exists a finite positive constant c such that for all a > 0 ∞ 0 ∞ − γ− 1 − b xb yb β β dx = a γ γ dy a + xβ 1 + yβ 0   − γ− 1 − b   β β c a if βγ − b > 1, =    ∞ if βγ − b ≤ 1. 31 (2.4.7) First, let’s consider case I. By, applying d times the formula 2.4.7, ∞ Ij ≤ √1 d ∞ 2β γ |λj | j 0 dλ 0 d βr r=1 |λr | d−1 2βj γ ∞ ≤ c ∞ ... |λj | √1 d 2γ − β |λj | j 2γ dλj 1 r=j β r < ∞ 1 r=j β r since βj 2γ − > 2βj γ + 1 due to condition 2.4.4. Next, we consider case II, where k = j. Similar to case I, we use formula 2.4.7 iteratively, but we do the integration in different order. Denote integration in λi for i = j, k by dλ\j,k , and observe ∞ Ij ≤ √1 d dλk ... 
0 0 √1 d βr d r=1 |λr | d−1 ∞ ∞ ≤ c 2β γ |λj | j ∞ ∞ dλk ∞ 0 0 ∞ √1 d βr d r=1 |λr | 1 β |λk | k 2γ − dλj dλ\j,k 1 ... d−2 ≤ c 2γ 2βj γ βj − 1 r=k β r 2βj γ 2γ − 1 − βj βj dλ\j,k dλk < ∞, where the second and the third inequality follow since 2γ βj > 2βj γ + 1 and βk 2γ − 2βj γ βj − 1 r=k β r > 1, respectively using the assumption 2.4.4. Therefore, the proof is complete. Next, we will consider similar situation as in theorem 2.4.2, but this time we put discrete 32 spectral measure mixed with the ones of the form 2.4.1. For that purpose, consider discrete spectral measure of the form F ({−γ n }) = F ({γ n }) = αn , where γ n ∈ Rd , αn ≥ 0, for n ≥ 1, and |γ n |2 ∞ n=1 1+|γ n |2 αn (2.4.8) < ∞. If {γ n , n = 1, 2, ...} is a bounded subset of Rd , then in view of theorem 2.3.4, this spectral measure won’t affect the equivalence of Gaussian measures. Therefore, we consider here only the case where |γ n | → ∞ as n → ∞. Theorem 2.4.3. Let X and Y be two independent centered GRFs with stationary increments with spectral measures FX and FY . Suppose FX has density with respect to Lebesgue measure on Rd , denoted by f , which satisfies both conditions (C1) and (C2), and FY is a discrete measure of the form 2.4.8. Then, if n>N αn < ∞, f (γ n ) (2.4.9) for some N ≥ 1, then X and X + Y are locally equivalent. Proof. First of all, using theorem 2.3.4, we can assume αn = 0 for n = 1, ..., N , without having any consequences on the equivalence of Gaussian measures. Second, observe that for all φ ∈ LeΠ , φ F ≤ φ F +F , which by remark 2.3.2, is equivalent to condition (i) in X X Y T theorem 2.3.2. All we need to prove is then to show that the function ψ in theorem 2.3.2 is in L2 (f ⊗ f ). For that, using Foubini’s theorem and the reproducing kernel property we 33 have 2 ψ 2f ⊗f = Rd Rd n>N = αn αm n,m>N αn KT (ω, γ n )KT (λ, γ n ) f (ω)f (λ)dωdλ R d Rd KT (ω, γ n )KT (λ, γ n )KT (ω, γ m )KT (λ, γ m ) × f (ω)f (λ)dωdλ αn αm |KT (γ m , γ n )|2 = n,m>N  ≤  n>N 2 αn  f (γ n ) < ∞ by the assumption 2.4.9. The proof is complete. 34 Chapter 3 Covariance Tapering for Anisotropic Nonstationary Gaussian Random Fields with Application to Large Scale Spatial Data Sets Estimating the covariance structure of spatial random processes is an important step in spatial data analysis. Maximum likelihood estimation is a popular method in spatial models based on Gaussian random fields. But calculating the likelihood in large scale data sets is computationally infeasible due to the heavy computation of the precision matrix. One way to mitigate this issue, which is due to Furrer et al. (2006), is to “taper” the covariance matrix. While most of the results in the current literature focus on isotropic tapering for stationary Gaussian processes, there are many cases in application that require modeling of anisotropy and/or nonstationarity. In this chapter, we propose a nonstationary parametric model, in which the underlying Gaussian random field may have different regularities in different directions, thus can be applied to model anisotropy. Using the theory of equivalence of Gaussian measures under nonstationary assumption, strong consistency of the tapered likelihood based estimation of the variance component under fixed domain asymptotics are 35 derived by putting mild conditions on the spectral behavior of the tapering covariance function. The procedure is illustrated with numerical simulation. 
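Before turning to the details, the following sketch indicates one simple way to generate approximate realizations of a mean-zero GRF with stationary increments directly from its spectral density, by replacing the stochastic integral X(t) = ∫(e^{i⟨t,λ⟩} − 1) W(dλ) of Chapter 2 with a Riemann sum over a truncated frequency grid. This is an illustration under that truncation only; it is not claimed to be the exact procedure used to produce the surfaces in Figures 3.1–3.3. The density below anticipates the anisotropic model (3.2.1) introduced in Section 3.2, with the parameter values quoted there for Figures 3.1 and 3.2.

```python
import numpy as np

def spectral_density(lam1, lam2, H=(1.5, 3.0), nu=0.4, sigma2=1.0):
    # Anisotropic density of the form (3.2.1) (defined in Section 3.2):
    # f(lam) = sigma^2 / (1 + sum_j |lam_j|^{H_j})^(Q + nu),  Q = sum_j 1/H_j.
    Q = sum(1.0 / h for h in H)
    return sigma2 / (1.0 + np.abs(lam1) ** H[0] + np.abs(lam2) ** H[1]) ** (Q + nu)

def simulate_increment_field(grid, lam_max=60.0, n_freq=80, seed=0):
    """Approximate spectral simulation of a mean-zero GRF with stationary
    increments, X(0) = 0, based on X(t) = int (e^{i<t,lam>} - 1) W(dlam):
    the integral is replaced by the Riemann sum
        X(t) ~ sum_k sqrt(f(lam_k) dlam) [(cos<t,lam_k> - 1) xi_k
                                          + sin<t,lam_k> eta_k],
    with xi_k, eta_k iid N(0, 1) over a truncated symmetric frequency grid.
    """
    rng = np.random.default_rng(seed)
    freqs = np.linspace(-lam_max, lam_max, n_freq)
    dlam = (freqs[1] - freqs[0]) ** 2
    L1, L2 = (a.ravel() for a in np.meshgrid(freqs, freqs, indexing="ij"))
    amp = np.sqrt(spectral_density(L1, L2) * dlam)
    xi, eta = rng.standard_normal(L1.size), rng.standard_normal(L1.size)
    X = np.zeros((grid.size, grid.size))
    for i, t1 in enumerate(grid):
        for j, t2 in enumerate(grid):
            phase = t1 * L1 + t2 * L2
            X[i, j] = np.sum(amp * ((np.cos(phase) - 1.0) * xi
                                    + np.sin(phase) * eta))
    return X

grid = np.arange(0.0, 1.0 + 1e-9, 0.02)    # [0, 1]^2 with increments 0.02
surface = simulate_increment_field(grid)   # a 51 x 51 simulated surface
```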
3.1 Introduction

Spatial statistics is an active research area with many applications in geology, agricultural science, environmental science, and climate studies [9, 41]. A common problem in this field is the estimation of the covariance structure in spatial models based on Gaussian random fields. Likelihood-based estimators are the most popular choice for estimating the covariance parameters. However, for large spatial data sets, evaluating the likelihood is computationally infeasible because of the cost of computing the precision matrix. There are different ways to overcome this issue. One idea is to set the off-diagonal entries of the covariance matrix to zero, or to keep the first k subdiagonals for some integer k and set the remaining entries to zero, which is called banding. The resulting matrix is sparse, and existing sparse-matrix algorithms can handle it efficiently. The problem with banding is that the resulting matrix may not be positive definite, which is a serious drawback since every valid covariance matrix must be positive definite. An alternative approach is to multiply the covariance function by a positive definite, compactly supported correlation function. The resulting covariance matrix is again sparse but remains positive definite. This is called covariance tapering [18]. A natural question is how tapering can be used to construct consistent estimators of the covariance parameters in spatial regression models. Kaufman et al. (2008) proposed two different likelihood-based estimators of the parameters of the Matérn covariance function and proved strong consistency of the estimates using results on the equivalence of stationary Gaussian measures [37]. See also [12] for further results on the asymptotics of tapered maximum likelihood estimators. Consistency is proved there under two strong assumptions: isotropy and stationarity of the underlying Gaussian random field. However, in many applications the data exhibit anisotropy and nonstationarity, and there has been general interest (not necessarily in the context of covariance tapering) in dropping these conditions in spatial modeling (see [11] and [1]). In this chapter, we relax both conditions simultaneously. In section 3.2, we introduce a class of parametric spatial models that are anisotropic and nonstationary. In section 3.3, we state the main result of the chapter: strong consistency of the tapered likelihood-based estimate of the variance parameter. In the last section, we demonstrate the proposed procedure via numerical simulation.

3.2 Preliminary

In this section, we introduce a class of Gaussian random fields that can be used as spatial models. A Gaussian field model is uniquely defined by a mean structure and a covariance structure. However, finding new covariance models can be hard, since covariance structures must be chosen from the family of positive definite functions and positive definiteness is complicated to verify. An alternative is to use the spectral representations of positive definite functions or variograms. We consider here specifically Gaussian field models with stationary increments.
By applying the results in [49], the covariance function of a Gaussian random field with stationary increments on R^d is determined by a symmetric non-negative measure on R^d \ {0} satisfying a certain integrability condition (see, e.g., [47] for more details). This measure is called the spectral measure. If it is absolutely continuous with respect to Lebesgue measure on R^d, we call its Radon–Nikodym derivative the "spectral density". We propose the following spectral density:

    f(λ) = σ² / (1 + Σ_{j=1}^d |λ_j|^{H_j})^{Q+ν},    (3.2.1)

where λ = (λ_1, ..., λ_d) ∈ R^d, Q = Σ_{j=1}^d 1/H_j, and σ², ν and the H_j's are all positive parameters. σ² is the variance component, and the H_j's are related to the smoothness of the model in different directions. Figures 3.1 and 3.2 show realizations of such a Gaussian field over the two-dimensional grid [0, 1]² with increments of 0.02, with parameters H_1 = 1.5, H_2 = 3, and ν = 0.4. Having different parameters H_j, j = 1, ..., d, for the different components of λ ∈ R^d allows the underlying random field to have different regularities in different directions, and therefore this model is able to capture anisotropy appearing in spatial data sets (see [47] for fractal and smoothness properties of these models; see also [43]).

3.3 Main Result

In this section, we state the main result of the chapter, which is the consistency of the tapered likelihood estimation of the variance parameter σ² in the model introduced in the last section under fixed-domain asymptotics. The following theorem shows how to obtain equivalent Gaussian measures in the tapering setup, which is necessary for proving strong consistency, by placing mild conditions on the spectral behavior of the tapering covariance function.

[Figure 3.1: 3D Simulated Surface on [0, 1]² with increments 0.02]

Theorem 1. Suppose {Z(t), t ∈ T}, where T ⊆ R^d, is a mean-zero Gaussian random field with stationary increments whose spectral density is of the form (3.2.1) for some positive constants σ², ν, and H_j, j = 1, ..., d. Denote the covariance function of the process by K_0(x). Let K_1(x) = K_0(x)K_t(x), where K_t(x) is a correlation function whose Fourier transform f_t exists and satisfies

    f_t(λ) ≤ M / (1 + Σ_{j=1}^d |λ_j|^{H_j})^{Q+ν+ε}    (3.3.1)

for some ε > max{Q/2, 1/min_j{H_j/2}} and M > 0. Then, if H_j > 1 for j = 1, ..., d, the Gaussian measures induced by K_0 and K_1 are equivalent on the paths of {Z(t), t ∈ T} for any bounded subset T ⊂ R^d.

[Figure 3.2: 2D Projection of the Simulated Surface on [0, 1]² with increments 0.02]

Remark 1. The proof of this theorem is similar to that of Theorem 1 in [28]. However, due to the nonstationarity assumption, it relies on the study of the equivalence of Gaussian random fields with stationary increments, which is discussed in detail in the second chapter.

Suppose one observes the random process at locations s_j, j = 1, ..., n, in some bounded domain in R^d, and K_t(x) is the tapering covariance function. Fix a tapering parameter γ > 0. Define the tapering matrix T(γ) by T(γ)_{ij} = K_t(‖s_i − s_j‖; γ). In particular, if ‖s_i − s_j‖ > γ, then T(γ)_{ij} = 0. Denote the true covariance matrix of the process by Σ(θ). The idea of tapering is to use Σ(θ) ◦ T(γ) as the new covariance matrix in the likelihood of the process, where ◦ denotes the elementwise (Schur) matrix product.
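To make the construction of Σ(θ) ◦ T(γ) concrete, the sketch below builds a tapered covariance matrix with a Wendland-type compactly supported taper. Whether this particular taper coincides with the one used in [28] is not assumed, and the stand-in covariance K0 is purely illustrative; the point is that entries vanish exactly beyond distance γ, so the matrix can be stored and factorized in sparse form, while the Schur product of positive definite matrices remains positive semi-definite.

```python
import numpy as np
from scipy import sparse

def wendland_taper(d, gamma):
    # One common compactly supported correlation (a Wendland-type taper);
    # whether it matches the taper used in [28] is not assumed here.
    # K_t(d; gamma) = (1 - d/gamma)_+^4 (1 + 4 d/gamma), zero for d >= gamma.
    r = d / gamma
    return np.where(r < 1.0, (1.0 - r) ** 4 * (1.0 + 4.0 * r), 0.0)

def tapered_covariance(locs, K0, gamma):
    """Form Sigma(theta) o T(gamma) at the observation locations.

    locs  : (n, 2) array of locations s_1, ..., s_n
    K0    : function K0(s_i, s_j) returning the model covariance
    gamma : taper range; entries with ||s_i - s_j|| > gamma are exactly zero
    """
    n = locs.shape[0]
    D = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    Sigma = np.array([[K0(locs[i], locs[j]) for j in range(n)] for i in range(n)])
    # Schur product of two positive (semi-)definite matrices stays positive
    # semi-definite, so sparsity is gained without giving up validity.
    return sparse.csr_matrix(Sigma * wendland_taper(D, gamma))

# Stand-in covariance, purely for illustration (not the K0 induced by (3.2.1)):
rng = np.random.default_rng(1)
locs = rng.uniform(0.0, 1.0, size=(200, 2))
K0 = lambda s, t: np.exp(-np.linalg.norm(s - t))
for gamma in (0.7, 0.4, 0.2):
    C = tapered_covariance(locs, K0, gamma)
    print(gamma, round(C.nnz / 200 ** 2, 3))      # fraction of nonzero entries
```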
The tapered maximum likelihood estimator (Tapered MLE) of σ² is then

    σ̂²_t = argmax_{σ²} (tapered likelihood)
          = argmax_{σ²} { −(1/2) log|Σ(θ) ◦ T(γ)| − (1/2) Zᵀ (Σ(θ) ◦ T(γ))⁻¹ Z },

where Z is the vector of observed values of the random field at the specified locations. Evaluating the inverse of Σ(θ) ◦ T(γ) is much faster than inverting Σ(θ) in many cases, especially for large n; this is the benefit of the Tapered MLE. However, this estimator can be biased in general. In [28], another estimator based on the same tapering idea is presented which is unbiased. We call it the Adjusted Tapered MLE, defined as

    σ̂²_adj = argmax_{σ²} (adjusted tapered likelihood)
            = argmax_{σ²} { −(1/2) log|Σ(θ) ◦ T(γ)| − (1/2) Zᵀ [(Σ(θ) ◦ T(γ))⁻¹ ◦ T(γ)] Z }.

We will see in section 3.4 that the Adjusted Tapered MLE performs better than the Tapered MLE due to its unbiasedness. Theorem 1 is the key ingredient in proving consistency of the tapered maximum likelihood estimators of the variance component σ². The outline of the proof is similar to Theorem 2 in [28] and relies on the above theorem; we omit it here.

3.4 Simulation Results

In this section, we illustrate the methods discussed above by applying them to simulated spatial data sets. We simulated 1000 data sets, each consisting of a multivariate Gaussian vector of length 14 × 14 = 196. The locations form a two-dimensional grid over [0, 1]² with increments 0.07, to which we added random noise in each coordinate, uniformly distributed on [−0.01, 0.01]. We used the spectral density (3.2.1) to generate the normal vectors, with H_1 = 1.5, H_2 = 2, ν = 0.5, and σ² = 1. Figure 3.3 shows a simulated surface using the above parameters over the specified grid.

[Figure 3.3: Simulated Surface on [0, 1]² with increments 0.07]

We used the same tapering function as in [28] with tapering parameters γ = 0.7, 0.4, 0.2, respectively, and applied three different estimators of σ²: the MLE, the Tapered MLE, and the Adjusted Tapered MLE (see [28] for details). Table 3.1 shows the results for the different tapering parameters. The values in the table are averages of the estimates over the 1000 repetitions, with standard errors in parentheses. As expected, as the tapering parameter decreases, the Tapered MLE becomes more and more biased, whereas the Adjusted Tapered MLE remains almost unbiased regardless of the tapering parameter.

Table 3.1 Results for estimation of σ²

                          γ = 0.7        γ = 0.4        γ = 0.2
  MLE                     0.999 (0.10)   0.999 (0.10)   0.999 (0.10)
  Tapered MLE             0.437 (0.06)   0.409 (0.07)   0.400 (0.17)
  Adjusted Tapered MLE    1.025 (0.53)   1.019 (0.47)   1.031 (0.61)

These simulations show that, compared to the MLE, the Adjusted Tapered MLE is a good alternative for estimating covariance parameters in spatial data analysis, and it is computationally feasible. Moreover, the method has the potential to be generalized to more complicated nonstationary and anisotropic models. There are other methods for covariance estimation in large spatial data sets. For example, in [11], nonstationary spatial models are defined through fixed basis functions, and a weighted least squares method (rather than the MLE approach) is used for covariance parameter estimation, with an emphasis on finding the best linear unbiased prediction (BLUP).
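For reference, when σ² enters the covariance purely as a scale factor, Σ(θ) = σ²Σ₀ with Σ₀ fixed, both criteria displayed above are maximized in closed form: differentiating with respect to σ² gives σ̂²_t = ZᵀC⁻¹Z/n and σ̂²_adj = Zᵀ(C⁻¹ ◦ T(γ))Z/n with C = Σ₀ ◦ T(γ). The sketch below computes both; it is a simplification for illustration (dense linear algebra, remaining covariance parameters held fixed) rather than the exact implementation behind Table 3.1.

```python
import numpy as np

def tapered_variance_estimates(Z, Sigma0, T):
    """Closed-form maximizers of the tapered and adjusted tapered criteria
    when Sigma(theta) = sigma^2 * Sigma0 with Sigma0 held fixed
    (an illustrative simplification; in practice sparse Cholesky
    factorizations of Sigma0 o T would replace the dense solves).

    Z      : (n,) vector of observations
    Sigma0 : (n, n) covariance matrix at unit variance component
    T      : (n, n) taper matrix T(gamma)
    """
    n = Z.shape[0]
    C = Sigma0 * T                                  # Sigma0 o T(gamma)
    # tapered:  maximize -0.5*(n log s2 + log|C|) - 0.5 * Z' C^{-1} Z / s2
    sigma2_tap = Z @ np.linalg.solve(C, Z) / n
    # adjusted: same, but with the quadratic form Z' (C^{-1} o T) Z
    sigma2_adj = Z @ ((np.linalg.inv(C) * T) @ Z) / n
    return sigma2_tap, sigma2_adj
```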
Chapter 4

Spatial factor models and dimension-reducing Gaussian predictive processes for high-dimensional remotely sensed data and forest variables

Combining spatially explicit long-term forest inventory data with remotely sensed information from Light Detection and Ranging (LiDAR) data sets through statistical models can be a powerful tool for predicting and mapping above-ground biomass (AGB) at a range of geographic scales. We present and examine a novel modeling approach to improve prediction of AGB using LiDAR data. More specifically, we introduce a multivariate spatial regression model in which both the LiDAR signals and AGB are treated as response variables. Two sources of computational infeasibility in the proposed model need to be addressed. First, the number of response variables is large, because the LiDAR signals and AGB are modeled jointly (in our data set, each location has more than 100 LiDAR measurements at different heights). Second, the number of spatial locations is also large (more than 451 locations in our data). In other words, dimension reduction is needed in two respects: (i) the length of the vector of outcomes, and (ii) the very large number of spatial locations. We use spatial factor modeling for the first and low-rank spatial processes for the second: latent variable (factor) models are commonly used to handle a large vector of outcomes, while low-rank spatial processes offer a rich and flexible modeling option for a large number of locations. We introduce the model in the first section, followed by a brief discussion of the identifiability of the model in Section 4.2. Preparations for the Bayesian hierarchical framework for estimation and prediction are made in Sections 4.3, 4.5 and 4.6. In Section 4.4 we explain in detail the low-rank spatial processes chosen to reduce the computational time. We finish the chapter with a simulation study and a real data analysis.

4.1 Model

The following hierarchical factor model is considered:

    \begin{pmatrix} z(s) \\ y(s) \end{pmatrix} = \begin{pmatrix} x(s)'\beta \\ x_y(s)'\eta \end{pmatrix} + \begin{pmatrix} A & 0 \\ \alpha_f & \alpha_\nu \end{pmatrix} \begin{pmatrix} f(s) \\ \nu(s) \end{pmatrix} + \begin{pmatrix} \epsilon_z(s) \\ \epsilon_y(s) \end{pmatrix},    (4.1.1)

where z(s) is an m × 1 vector, y(s) ∈ R, x(s) is a t × m matrix of covariates for the high-dimensional signal z(s), x_y(s) is an r × 1 vector of covariates for the target variable y(s), and β and η are t × 1 and r × 1 vectors of parameters, respectively. A is an m × q matrix of factor loadings with elements satisfying a_{1j} > 0 for all j = 1, ..., q, α_f = (α_{f,1}, α_{f,2}, ..., α_{f,q}) is a 1 × q vector of coefficients, and α_ν > 0. Also, f(s) = (f_1(s), f_2(s), ..., f_q(s))', where each f_j(s) follows an independent spatial GP(0, C_j(s_1, s_2; θ_j)) with unit variance, and ν(s) follows a GP(0, C_ν(·, ·; θ_ν)) with unit variance, independent of all the f_j's. The errors ε_z(s) and ε_y(s) are N(0, Σ_{ε_z} := diag(τ_1², ..., τ_m²)) and N(0, τ_y²), respectively, where the τ_j²'s and τ_y² represent the usual geostatistical nuggets for z(s) and y(s). In this model, z(s) represents the LiDAR data, y(s) is the biomass, and the f_j(s)'s are the factors, q in total.

4.2 Identifiability Issue

Define

    Λ := \begin{pmatrix} A & 0 \\ \alpha_f & \alpha_\nu \end{pmatrix}  and  ω(s) := \begin{pmatrix} f(s) \\ \nu(s) \end{pmatrix}.

The identifiability issue in model (4.1.1) relates to the existence of a (q + 1) × (q + 1) orthogonal matrix P (i.e., P'P = PP' = I_{q+1}) such that Pω(s) has the same covariance structure as ω(s).
Following the argument in [35], the only such matrices in this case are those whose entries lie in {−1, 0, 1} with exactly one nonzero element in each row and each column. In other words, changing the signs of some components of ω(s) and permuting some of its elements are the only two groups of transformations that can lead to another Gaussian random field with the same distribution as ω(s). Both cases must be handled for the model to be identifiable. The first group is prevented by constraining the signs of the parameters in Λ: in each column of Λ, the sign of one element must be fixed. Here we require positive elements in the first row of A together with α_ν > 0, which resolves the first part. Permuting the elements of ω(s) is prevented by constraining the priors on the covariance parameters of f(s) and ν(s), which is addressed in the next section.

4.3 Prior Specification

We put multivariate normal priors on the mean parameters β and η, with means µ_β and µ_η and covariances Σ_β and Σ_η, respectively (we will use flat priors, i.e., Σ_β^{-1} = Σ_η^{-1} = 0 and µ_β = µ_η = 0). The nugget effects τ_j², j = 1, ..., m, and τ_y² are assigned inverse gamma priors IG(a, b), with a = 2 (so that the prior variance is infinite) and b = 5. All parameters in the matrix Λ are independent normal or truncated normal random variables, depending on whether their signs are fixed beforehand. More precisely, a_{ij} ∼ N(0, c_a²) for i ≠ 1, α_{f,i} ∼ N(0, c_α²) for i = 1, ..., q, a_{1i} ∼ N(0, c_a²) I(a_{1i} > 0) for i = 1, ..., q, and α_ν ∼ N(0, c_a²) I(α_ν > 0). The correlation functions of the f_j's and ν are taken to be exponential, i.e., cov(f_j(s), f_j(t)) := ρ_j(s, t; φ_j) = exp(−φ_j ‖s − t‖) for j = 1, ..., q, and cov(ν(s), ν(t)) := ρ_ν(s, t; φ_ν) = exp(−φ_ν ‖s − t‖). Because of the identifiability issue discussed in Section 4.2, we construct a joint prior for the φ_j's that enforces an ordering. In particular, we set

    π(φ) = π(φ_1) π(φ_2 | φ_1) ⋯ π(φ_q | φ_{q−1}, ..., φ_1),    (4.3.1)

where π(φ_1) is a uniform density with support (φ_l, φ_u), with φ_l = −log(0.05)/d_max and φ_u = −log(0.01)/d_min, where d_min and d_max are the minimum and maximum distances across all locations, and, for j = 2, 3, ..., q,

    π(φ_j | φ_{j−1}, ..., φ_1) ∝ exp{−c_j (φ_j − φ_{j−1})} I(φ_{j−1} < φ_j < φ_u).    (4.3.2)

Based on the discussion in [35], we fix c_j = 2j as a reasonable choice. Also, φ_ν ∼ Uniform(φ_l, φ_u).

4.4 Approximation using Gaussian predictive processes

Because of the computational burden, we replace the random effects f(s) and ν(s) by low-rank approximations. One way to do this is to project the Gaussian random fields onto the space generated by their values at a set of selected locations, called "knots" (we refer to [4] and [16] for a complete discussion of Gaussian predictive processes). Suppose S* = {s*_1, ..., s*_{n*}} is the set of knots used to approximate both fields f(s) and ν(s). We replace f(s) and ν(s) with f*(s) and ν*(s) as follows.
Define the Gaussian field

    \tilde{f}^*_k(s) \sim N\left(f^*_k(s),\ \sigma^{*2}_k(s)\right),    (4.4.1)

where

    f^*_k(s) := d_k(s; φ_k)' D^*_k(φ_k)^{-1} f_k(S^*),    σ^{*2}_k(s) := 1 − d_k(s; φ_k)' D^*_k(φ_k)^{-1} d_k(s; φ_k),    (4.4.2)

and

    \tilde{ν}^*(s) \sim N\left(ν^*(s),\ σ^{*2}_ν(s)\right),    (4.4.3)

where

    ν^*(s) := d_ν(s; φ_ν)' D^*_ν(φ_ν)^{-1} ν(S^*),    σ^{*2}_ν(s) := 1 − d_ν(s; φ_ν)' D^*_ν(φ_ν)^{-1} d_ν(s; φ_ν).    (4.4.4)

Here d_k(s; φ_k) is the n* × 1 vector whose i-th element is ρ_k(s, s*_i; φ_k), D*_k(φ_k) is the n* × n* covariance matrix whose (i, j)-th entry is ρ_k(s*_i, s*_j; φ_k), and f_k(S*) is the n* × 1 vector of values of the field f_k at the knots. The case of ν(s) is exactly analogous to that of the f_k's, with the index k replaced by ν. After this replacement, model (4.1.1) can be written as

    \begin{pmatrix} z(s) \\ y(s) \end{pmatrix} = \begin{pmatrix} x(s)'\beta \\ x_y(s)'\eta \end{pmatrix} + \begin{pmatrix} A & 0 \\ \alpha_f & \alpha_\nu \end{pmatrix} \begin{pmatrix} f^*(s) \\ \nu^*(s) \end{pmatrix} + \begin{pmatrix} \epsilon^*_z(s) \\ \epsilon^*_y(s) \end{pmatrix},    (4.4.5)

where f*(s) = (f*_1(s), f*_2(s), ..., f*_q(s))', ε*_z(s) ∼ N(0, Σ_{ε*_z}(s) := A Σ_{f̃*}(s) A' + Σ_{ε_z}), and ε*_y(s) ∼ N(0, σ*²_{ε_y}(s) := α_f Σ_{f̃*}(s) α_f' + α_ν² σ*²_ν(s) + τ_y²), with Σ_{f̃*}(s) the q × q diagonal matrix with diagonal elements σ*²_k(s), k = 1, ..., q.

4.5 Data equation

Let

    Y(s) = \begin{pmatrix} z(s) \\ y(s) \end{pmatrix}, \quad X(s) = \begin{pmatrix} x(s)' & 0 \\ 0 & x_y(s)' \end{pmatrix}, \quad b = \begin{pmatrix} \beta \\ \eta \end{pmatrix}, \quad \omega^*(s) = \begin{pmatrix} f^*(s) \\ \nu^*(s) \end{pmatrix}, \quad \epsilon^*(s) = \begin{pmatrix} \epsilon^*_z(s) \\ \epsilon^*_y(s) \end{pmatrix}.    (4.5.1)

The model can then be written as

    Y(s_i) = X(s_i) b + Λ ω*(s_i) + ε*(s_i),    i = 1, ..., n,    (4.5.2)

and in matrix form as

    Y = X b + \boldsymbol{Λ} ω* + ε*,    (4.5.3)

where

    Y = \begin{pmatrix} Y(s_1) \\ \vdots \\ Y(s_n) \end{pmatrix}, \quad X = \begin{pmatrix} X(s_1) \\ \vdots \\ X(s_n) \end{pmatrix}, \quad \boldsymbol{Λ} = \begin{pmatrix} Λ & & 0 \\ & \ddots & \\ 0 & & Λ \end{pmatrix}, \quad ε^* = \begin{pmatrix} ε^*(s_1) \\ \vdots \\ ε^*(s_n) \end{pmatrix}, \quad ω^* = \begin{pmatrix} ω^*(s_1) \\ \vdots \\ ω^*(s_n) \end{pmatrix}.    (4.5.4)

Here ω* ∼ N(0, Σ_{ω*}) and ε* ∼ N(0, Σ_{ε*}), with

    Σ_{ε*} = \begin{pmatrix} Σ_{ε*}(s_1) & & 0 \\ & \ddots & \\ 0 & & Σ_{ε*}(s_n) \end{pmatrix}, \quad Σ_{ω*} = \begin{pmatrix} Σ_{ω*}(s_1) & & 0 \\ & \ddots & \\ 0 & & Σ_{ω*}(s_n) \end{pmatrix},    (4.5.5)

and

    Σ_{ε*}(s_i) = \begin{pmatrix} Σ_{ε*_z}(s_i) & 0 \\ 0 & σ*²_{ε_y}(s_i) \end{pmatrix}, \quad Σ_{ω*}(s_i) = \begin{pmatrix} Σ_{f*}(s_i) & 0 \\ 0 & σ²_{ν*}(s_i) \end{pmatrix},    (4.5.6)

where Σ_{f*}(s) is the q × q diagonal matrix with diagonal elements d_k(s; φ_k)' D*_k(φ_k)^{-1} d_k(s; φ_k), k = 1, ..., q, and σ²_{ν*}(s) = d_ν(s; φ_ν)' D*_ν(φ_ν)^{-1} d_ν(s; φ_ν).

4.6 Computations

The first step is to derive the full conditional distributions of the parameters in the model.

4.6.1 Updating the mean part b

The full conditional of β is N_t(µ̃_β, Σ̃_β) with

    Σ̃_β = \left( Σ_β^{-1} + \sum_{i=1}^{n} x(s_i) Σ_{ε_z}^{-1}(s_i) x(s_i)' \right)^{-1}    (4.6.1)

and

    µ̃_β = Σ̃_β \left( Σ_β^{-1} µ_β + \sum_{i=1}^{n} x(s_i) Σ_{ε_z}^{-1}(s_i) \left(z(s_i) − A f(s_i)\right) \right),    (4.6.2)

where Σ_{ε_z}^{-1}(s) is the m × m diagonal matrix with j-th diagonal element τ_j^{-2}, j = 1, ..., m. The full conditional of η is N_r(µ̃_η, Σ̃_η) with

    Σ̃_η = \left( Σ_η^{-1} + \sum_{i=1}^{n} x_y(s_i) x_y(s_i)' / τ_y² \right)^{-1}    (4.6.3)

and

    µ̃_η = Σ̃_η \left( Σ_η^{-1} µ_η + \sum_{i=1}^{n} x_y(s_i) τ_y^{-2} \left(y(s_i) − λ_ν ω(s_i)\right) \right),    (4.6.4)

where λ_ν is the last row of the matrix Λ.

4.6.2 Updating ω

ω consists of two components, f and ν, which are updated similarly. For k = 1, ..., q, the knot-level vector f_k(S*) has full conditional N_{n*}(µ̃_{f_k}, Σ̃_{f_k}) with

    Σ̃_{f_k} = D*_k(φ_k) \left( D*_k(φ_k) + D_k(φ_k)' Σ_{f_k}(φ_k)^{-1} D_k(φ_k) \right)^{-1} D*_k(φ_k)    (4.6.5)

and

    µ̃_{f_k} = Σ̃_{f_k} D*_k(φ_k)^{-1} D_k(φ_k)' Σ_{f_k}(φ_k)^{-1} f_k,    (4.6.6)

where Σ_{f_k}(φ_k) is the n × n diagonal matrix whose j-th diagonal element is σ*²_k(s_j), and D_k(φ_k) is the n × n* covariance matrix between the observation locations and the knots for the k-th factor f_k.
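As an illustration of the predictive-process projection in (4.4.2), the following minimal Python sketch maps knot-level factor values to arbitrary locations under the exponential correlation of Section 4.3. The function names are hypothetical and this is not the code used for the analyses in this chapter.

```python
import numpy as np
from scipy.spatial.distance import cdist

def exp_corr(A, B, phi):
    """Exponential correlation rho(s, t; phi) = exp(-phi * ||s - t||) between two location sets."""
    return np.exp(-phi * cdist(A, B))

def predictive_process(locs, knots, f_knots, phi):
    """Project knot values f(S*) onto locations, as in (4.4.2).

    Returns f_star(s) = d(s; phi)' D*(phi)^{-1} f(S*) and the residual
    variances sigma2_star(s) = 1 - d(s; phi)' D*(phi)^{-1} d(s; phi).
    """
    D_star = exp_corr(knots, knots, phi)          # n* x n* correlation among knots
    d = exp_corr(locs, knots, phi)                # n x n* cross-correlations d(s_i; phi)'
    A = np.linalg.solve(D_star, d.T)              # D*^{-1} d(s)' for all locations at once
    f_star = A.T @ f_knots                        # low-rank interpolant at the n locations
    sigma2_star = 1.0 - np.sum(d * A.T, axis=1)   # correction variances from (4.4.2)
    return f_star, sigma2_star
```

All n locations are handled through n* × n* linear algebra, which is the source of the computational savings exploited in Section 4.6.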
4.6.3 Updating the loading factors Λ

For j = 2, ..., m, the full conditional of the j-th row of Λ is N_q(m_j, K_j), where

    K_j^{-1} = c_a^{-2} I_q + τ_j^{-2} F'F    (4.6.7)

and

    m_j = τ_j^{-2} K_j F'(z_j − µ_j),    (4.6.8)

where F = (f(s_1), ..., f(s_n))' and µ_j is the mean vector of z_j. The full conditional of the first row of Λ is normal, truncated at 0, with mean m_1 and covariance K_1. The last row of Λ follows N_{q+1}(m_ν, K_ν) with

    K_ν^{-1} = c_a^{-2} I_{q+1} + τ_y^{-2} W'W    (4.6.9)

and

    m_ν = τ_y^{-2} K_ν W'(y − µ_y),    (4.6.10)

where W = (ω(s_1), ..., ω(s_n))' and µ_y is the mean vector of y. The last element of this row is truncated at 0.

4.6.4 Updating the nuggets τ

For j = 1, ..., m, the full conditional of τ_j² is inverse gamma with shape parameter a + n/2 and scale parameter b + ½ (z_j − µ_j − F λ_j)'(z_j − µ_j − F λ_j), where λ_j is the j-th row of the matrix Λ. Similarly, τ_y² is inverse gamma with the same shape parameter and scale parameter b + ½ (y − µ_y − W λ_ν)'(y − µ_y − W λ_ν), where λ_ν is the last row of Λ.

4.7 Simulations

We illustrate the performance of the model with a simulation study. We generated data at 256 locations across a [0, 50] × [0, 50] square using the following parameters:

    b = \begin{pmatrix} 5 \\ 10 \\ 12 \end{pmatrix}, \quad Λ = \begin{pmatrix} 2 & 1 & 0 \\ −1 & 2 & 0 \\ 1 & −1 & 2 \end{pmatrix}, \quad φ = \begin{pmatrix} 0.1 \\ 0.4 \\ 0.12 \end{pmatrix}, \quad Σ_ε = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

In this simulation, t = r = 1 and m = q = 2. We used Gaussian predictive processes with 81 knots to reduce the computation time. Table 4.1 shows the parameter estimates together with their 95% credible intervals; it can be observed from this table that all the true parameter values lie within the 95% credible intervals computed from the posterior distributions.

Table 4.1: Parameter estimates with their 95% credible intervals

    parameter   true value   estimate      95% credible interval
    β1          5            6.278622      (4.213607, 8.506537)
    β2          10           9.152451      (8.141619, 10.197632)
    η           12           12.13515      (10.67103, 13.93106)
    τ1²         2            1.7069400     (0.9133798, 2.6578697)
    τ2²         5            2.6038261     (0.8570427, 6.4833734)
    τy²         1            1.4963317     (0.7562288, 2.4505370)
    Λ1,1        2            2.235054      (1.418790, 2.978264)
    Λ1,2        1            0.6389909     (0.2377660, 1.4958831)
    Λ2,1        −1           −1.0007031    (−1.8632592, −0.3081513)
    Λ2,2        2            2.371756      (1.055427, 2.886742)
    Λ3,1        1            0.6106929     (−0.1468869, 2.2537335)
    Λ3,2        −1           −0.8785957    (−1.3010258, −0.5326576)
    Λ3,3        2            2.430128      (1.571113, 3.469093)
    φ1          0.1          0.06601322    (0.04838031, 0.12743472)
    φ2          0.4          0.6352006     (0.3916554, 0.8975183)
    φν          0.12         0.08413867    (0.04986396, 0.19860435)

4.8 Real Data Analysis

In this section, the proposed model is assessed using LiDAR data acquired by NASA Goddard's LiDAR, Hyperspectral & Thermal (G-LiHT) imager and field inventory data from the Penobscot Experimental Forest in Bradley, Maine. G-LiHT is an airborne platform developed, in part, to examine how future space-based LiDAR missions, e.g., ICESat-2 or GEDI, may be combined with field-based validation measurements to build predictive models for AGB and other forest variables; see [3] and [8] for more information. The data set consists of two parts: (i) biomass measurements at more than 451 locations, and (ii) LiDAR signals observed at more than 100 different heights at more than 5000 locations, including those for which biomass (AGB) measurements are available. Figure 4.1 plots the Penobscot Experimental Forest together with the AGB measurements.
Figure 4.1: AGB measurements from the Penobscot Experimental Forest in Bradley, Maine.

For the analysis we used only the portion of the data for which both AGB measurements and LiDAR data are available, so that the multivariate model proposed in this chapter could be fit. This portion consists of 451 locations with 64 outcome variables (63 from the LiDAR data and one from the AGB). The maximum number of factors is set to 10. Stochastic factor selection is carried out as in [35], by placing independent Bernoulli priors on keeping each factor in the model (see Section 2.3 of [35] for more on adaptive Bayesian factor selection). Further, 100 knots are used for the low-rank Gaussian predictive process approximation. We ran 20,000 MCMC iterations, including 2,000 samples for burn-in. Table 4.2 summarizes the fitted model, including some of the parameter estimates with their credible intervals. Of the 10 factors, factor 5 was not selected by the stochastic selection procedure; the very wide credible interval for the coefficient α_{f,5}, compared with all the other factor loadings, reflects the fact that its corresponding factor was not selected. Moreover, Figure 4.2 shows both the real data and the fitted values from the fitted model, and Figure 4.3 shows the fitted values for the LiDAR data at several different locations.

Figure 4.2: Fitted plot of AGB measurements together with the observed values.

Figure 4.3: LiDAR data with fitted values and credible bands.

Table 4.2: Parameter estimates with their 95% credible intervals

    parameter   estimate      95% credible interval
    η0          0.9912472     (0.9559148, 1.0267694)
    τy²         0.1186832     (0.1039515, 0.1366400)
    αf,1        0.1887652     (−0.2071523, 0.5462206)
    αf,2        −0.1370774    (−0.4929575, 0.1614393)
    αf,3        0.0883730     (−0.16048225, 0.32833808)
    αf,4        0.0604460     (−0.16171807, 0.27297671)
    αf,5        −0.016114     (−6.11216314, 6.18284331)
    αf,6        −0.021537     (−0.24357843, 0.08823335)
    αf,7        0.0242004     (−0.13238865, 0.18407468)
    αf,8        −0.115645     (−0.30288243, 0.01502838)
    αf,9        0.1121742     (−0.2331279, 0.2656526)
    αf,10       −0.165493     (−0.2868342, 0.1519692)
    αν          0.112052      (0.005656521, 0.357829027)
    φ1          0.7551075     (0.7501629, 0.7775895)
    φ2          0.9855927     (0.9161703, 1.0884193)
    φ3          1.375376      (1.211139, 1.602903)
    φ4          2.140675      (1.734234, 2.813785)
    φ5          14.867140     (3.371117, 35.734567)
    φ6          31.397823     (5.248678, 53.407113)
    φ7          46.783236     (8.355846, 67.885794)
    φ8          61.83730      (18.15725, 81.25511)
    φ9          77.90531      (35.90097, 91.89012)
    φ10         95.05987      (55.68514, 99.91256)
    φν          0.7620842     (0.7504561, 0.8102750)

BIBLIOGRAPHY

[1] Anderes, E. B. and Stein, M. L.: "Estimating deformations of isotropic Gaussian random fields on the plane", The Annals of Statistics, 719–741 (2008)

[2] Aronszajn, N.: "Theory of reproducing kernels", Transactions of the American Mathematical Society, 68(3), 337–404 (1950)

[3] Awadallah, A., Abbott, V., Wynne, R., and Nelson, R.: "Estimating forest canopy height and biophysical parameters using photon-counting laser altimetry", Proc. 13th International Conference on LiDAR Applications for Assessing Forest Ecosystems, 129–136 (2013)

[4] Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H.: "Gaussian predictive process models for large spatial data sets", Journal of the Royal Statistical Society, Series B, 70(4), 825–848 (2008)

[5] Baudoin, F. and Nualart, D.: "Equivalence of Volterra processes", Stochastic Processes and their Applications, 107(2), 327–350 (2003)

[6] Chatterji, S. D. and Mandrekar, V.: "Equivalence and singularity of Gaussian measures and applications", Probabilistic Analysis and Related Topics, 1, 169–197 (1978)
[7] Cheridito, P.: "Mixed fractional Brownian motion", Bernoulli, 7(6), 913–934 (2001)

[8] Cook, B., Corp, L., Nelson, R., Middleton, E., Morton, D., McCorkel, J., Masek, J., Ranson, K., Ly, V., and Montesano, P.: "NASA Goddard's LiDAR, Hyperspectral and Thermal (G-LiHT) Airborne Imager", Remote Sensing, 5, 4045–4066 (2013)

[9] Cressie, N. A. C.: "Statistics for Spatial Data", 2nd ed., New York: John Wiley & Sons (1993)

[10] Cressie, N. and Huang, H. C.: "Classes of nonseparable, spatio–temporal stationary covariance functions", Journal of the American Statistical Association, 94(448), 1330–1339 (1999)

[11] Cressie, N. and Johannesson, G.: "Fixed rank kriging for very large spatial data sets", Journal of the Royal Statistical Society, Series B, 70(1), 209–226 (2008)

[12] Du, J., Zhang, H., and Mandrekar, V. S.: "Fixed–domain asymptotic properties of tapered maximum likelihood estimators", The Annals of Statistics, 37(6A), 3330–3361 (2009)

[13] Dunford, N. and Schwartz, J. T.: "Linear Operators. Part 2: Spectral Theory. Self Adjoint Operators in Hilbert Space", New York (1963)

[14] Ebeling, W.: "Functions of Several Complex Variables and Their Singularities", Vol. 83, American Mathematical Society (2007)

[15] Feldman, J.: "Equivalence and perpendicularity of Gaussian processes", Pacific J. Math., 8(4), 699–708 (1958)

[16] Finley, A. O., Sang, H., Banerjee, S., and Gelfand, A. E.: "Improving the performance of predictive process modeling for large datasets", Computational Statistics and Data Analysis, 53(8), 2873–2884 (2009)

[17] Fuentes, M. and Smith, R. L.: "A new class of nonstationary spatial models", Technical report, North Carolina State University, Raleigh, NC (2001)

[18] Furrer, R., Genton, M. G., and Nychka, D.: "Covariance tapering for interpolation of large spatial datasets", Journal of Computational and Graphical Statistics, 15(3) (2006)

[19] Gikhman, I. I. and Skorokhod, A. V.: "The Theory of Stochastic Processes: I", Springer (1965)

[20] Gneiting, T.: "Nonseparable, stationary covariance functions for space–time data", Journal of the American Statistical Association, 97(458), 590–600 (2002)

[21] Gneiting, T., Genton, M., and Guttorp, P.: "Geostatistical space–time models, stationarity, separability and full symmetry", Statistical Methods for Spatio–Temporal Systems, 151–175 (2007)

[22] Hajek, J.: "On a Property of Normal Distribution of Any Stochastic Processes", Math. Statist. Prob., 245–252 (1958)

[23] Halmos, P. R.: "Introduction to Hilbert Space and the Theory of Spectral Multiplicity", New York: Chelsea (1957)

[24] Higdon, D., Swall, J., and Kern, J.: "Non–stationary spatial modeling", Bayesian Statistics, 6(1), 761–768 (1999)

[25] Ibragimov, I. A. and Rozanov, I. A.: "Gaussian Random Processes", New York: Springer-Verlag (1978)

[26] Jun, M. and Stein, M. L.: "Nonstationary covariance models for global data", The Annals of Applied Statistics, 1271–1289 (2008)

[27] Kallianpur, G. and Oodaira, H.: "Non–anticipative representations of equivalent Gaussian processes", The Annals of Probability, 1(1), 104–122 (1973)

[28] Kaufman, C. G., Schervish, M. J., and Nychka, D. W.: "Covariance tapering for likelihood-based estimation in large spatial data sets", Journal of the American Statistical Association, 103(484), 1545–1555 (2008)

[29] Luan, N. and Xiao, Y.: "Spectral conditions for strong local nondeterminism and exact Hausdorff measure of ranges of Gaussian random fields", Journal of Fourier Analysis and Applications, 18(1), 118–145 (2012)
[30] Nychka, D., Wikle, C., and Royle, J. A.: "Multiresolution models for nonstationary spatial covariance functions", Statistical Modeling, 2(4), 315–331 (2002)

[31] Paciorek, C. J. and Schervish, M. J.: "Spatial modeling using a new class of nonstationary covariance functions", Environmetrics, 17(5), 483–506 (2006)

[32] Parzen, E.: "Probability density functionals and reproducing kernel Hilbert spaces", Proceedings of the Symposium on Time Series Analysis, 196, 155–169 (1963)

[33] Pitt, L. D.: "Stationary Gaussian Markov fields on Rd with a deterministic component", Journal of Multivariate Analysis, 5(3), 300–311 (1975)

[34] Pitt, L. D.: "Some problems in the spectral theory of stationary processes on Rd", Indiana Univ. Math. J., 23, 343–365 (1973)

[35] Ren, Q. and Banerjee, S.: "Hierarchical Factor Models for Large Spatially Misaligned Data: A Low-Rank Predictive Process Approach", Biometrics, 69(1), 19–30 (2013)

[36] Ronkin, L. I.: "Introduction to the Theory of Entire Functions of Several Variables", American Mathematical Society (1974)

[37] Skorokhod, A. V. and Yadrenko, M. I.: "On absolute continuity of measures corresponding to homogeneous Gaussian fields", Theory of Probability and Its Applications, 18(1), 27–40 (1973)

[38] Sottinen, T. and Tudor, C. A.: "On the equivalence of multiparameter Gaussian processes", Journal of Theoretical Probability, 19(2), 461–485 (2006)

[39] Stein, M. L.: "Uniform asymptotic optimality of linear predictions of a random field using an incorrect second-order structure", The Annals of Statistics, 850–872 (1990)

[40] Stein, M. L.: "Predicting random fields with increasing dense observations", The Annals of Applied Probability, 9(1), 242–273 (1999)

[41] Stein, M. L.: "Interpolation of Spatial Data: Some Theory for Kriging", Springer (1999)

[42] Stein, M. L.: "Equivalence of Gaussian measures for some nonstationary random fields", Journal of Statistical Planning and Inference, 123(1), 1–11 (2004)

[43] Stein, M. L.: "Space–time covariance functions", Journal of the American Statistical Association, 100(469), 310–321 (2005)

[44] Van Zanten, H.: "When is a linear combination of independent fBms equivalent to a single fBm?", Stochastic Processes and their Applications, 117(1), 57–70 (2007)

[45] Van Zanten, H.: "A remark on the equivalence of Gaussian processes", Electronic Communications in Probability, 13, 54–59 (2008)

[46] Xiao, Y.: "Strong local nondeterminism and sample path properties of Gaussian random fields", in Asymptotic Theory in Probability and Statistics with Applications (T.-L. Lai, Q.-M. Shao and L. Qian, eds.), Higher Education Press, Beijing, 136–176 (2007)

[47] Xue, Y. and Xiao, Y.: "Fractal and smoothness properties of space-time Gaussian models", Frontiers of Mathematics in China, 6(6), 1217–1248 (2011)

[48] Yadrenko, M. I.: "Spectral Theory of Random Fields", A. V. Balakrishnan (Ed.), Optimization Software, Publications Division (1983)

[49] Yaglom, A. M.: "Some classes of random fields in n-dimensional space, related to stationary random processes", Theory of Probability and Its Applications, 2(3), 273–320 (1957)

[50] Zhang, H.: "Inconsistent estimation and asymptotically equal interpolations in model–based geostatistics", Journal of the American Statistical Association, 99(465), 250–261 (2004)