By
Nian Liu
A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Statistics—Doctor of Philosophy
2024
PARAMETER ESTIMATION FOR UNIVARIATE AND BIVARIATE GAUSSIAN
PROCESSES AND FIELDS
ABSTRACT
Gaussian random fields are widely studied in various subject areas. This dissertation focuses
on estimating covariance parameters of stationary Gaussian random fields based on both regularly
and irregularly spaced sampling points, as well as investigating the infill asymptotic properties of
the estimators.
We first consider a bivariate Gaussian random process and propose an increment-based estimator
for the smoothness parameter in the cross-covariance function, for which the strong consistency
and asymptotic normality hold under the infill asymptotic framework. We further study the joint
asymptotic distribution of estimators for smoothness parameters in the cross-covariance and autocovariance
functions. Subsequently, we estimate the scale parameter and range parameters of a
univariate anisotropic Ornstein-Uhlenbeck field based on quadratic forms of vectors of observations.
The estimators we propose are computationally more efficient than the maximum likelihood
estimators (MLEs) but have infill asymptotic performance similar to that of the MLEs. Another computational
complexity reduction method we use is the Vecchia approximation. We estimate the scale parameter
in the Matérn covariance function using the maximizer of the likelihood approximated by the
standard Vecchia approach. We study the bias resulting from a misspecified range parameter and
the conditioning variables of the Vecchia approximation. The theoretical results in this work are
illustrated by simulations.
Copyright by
NIAN LIU
2024
ACKNOWLEDGEMENTS
The research in this dissertation was partially supported by the NSF grant DMS-2153846.
I would like to express my genuine gratitude to my advisor, Dr. Yimin Xiao, for his support,
encouragement, and guidance in my research and career development. I would also like to express
my appreciation to Dr. Andrew Finley, Dr. Shlomo Levental, Dr. Haolei Weng, and Dr. Dongsheng
Wu for serving on my guidance committee and providing me with valuable suggestions. In addition,
I appreciate the faculty and staff in the Department of Statistics and Probability for their help during
my PhD program.
I would also like to thank my family and friends for their care and support. I am more than
fortunate to be surrounded by such warm and kind people.
TABLE OF CONTENTS
CHAPTER 1 INTRODUCTION
CHAPTER 2 ESTIMATION OF SMOOTHNESS PARAMETERS
2.1 Introduction
2.2 Estimating the Cross Smoothness Parameter
2.3 Irregular Sampling
CHAPTER 3 ANISOTROPIC ORNSTEIN-UHLENBECK FIELD
3.1 Introduction
3.2 Product Estimation
3.3 Separable Estimation
3.4 Simulation
3.5 Discussion
CHAPTER 4 VECCHIA APPROXIMATION
4.1 Introduction
4.2 Maximum Likelihood Estimator for 𝜎²
4.3 Simulation
BIBLIOGRAPHY
APPENDIX A QUADRATIC VARIATIONS FROM IRREGULAR SAMPLING
APPENDIX B HIGH EXCURSION PROBABILITY
APPENDIX C STOCHASTIC PARTIAL DIFFERENTIAL EQUATION
CHAPTER 1
INTRODUCTION
Gaussian random fields (GRFs) are essential tools in spatial statistics, physics, finance, image
processing, and various other areas. A random field, as a generalization of a stochastic process, is
a collection of random variables indexed by elements in a topological space, which could be taken
as R𝑑 (𝑑 ≥ 1). This work focuses on estimating covariance parameters of stationary GRFs and
investigating infill asymptotic properties of the estimators.
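The covariance function of a univariate stationary isotropic GRF $\{X(\mathbf{t}), \mathbf{t} \in \mathbb{R}^d\}$ considered by
Anderes and Stein (2008) and Loh (2015) is written as

$$\mathrm{Cov}(X(\mathbf{s}), X(\mathbf{t}+\mathbf{s})) = \sum_{k=0}^{\lfloor \nu \rfloor} \beta_k \|\mathbf{t}\|^{2k} + \beta_\nu^* G_\nu(\|\mathbf{t}\|) + O(\|\mathbf{t}\|^{2\nu+\tau}) \quad \text{as } \|\mathbf{t}\| \to 0, \ \forall\, \mathbf{s}, \mathbf{t} \in \mathbb{R}^d, \tag{1.1}$$

where $\|\cdot\|$ denotes the Euclidean norm, $\beta_0 > 0$, $\beta_\nu^* \neq 0$, and $\tau > 0$ are constants, $\lfloor \nu \rfloor = \max\{\nu_0 \in \mathbb{Z} : \nu_0 < \nu\}$, and $G_\nu : [0, \infty) \to \mathbb{R}$ is defined by

$$G_\nu(x) = \begin{cases} x^{2\nu} + x^{2\nu}(\log x - 1)\,\mathbf{1}_{\mathbb{Z}}(\nu), & x > 0,\\ 0, & x = 0. \end{cases}$$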
This model includes the Matérn and exponential classes of covariance functions, which are widely
used in spatial interpolation (Stein, 1999; Gramacy, 2020).
The isotropic exponential class covariance function is defined as

$$\sigma^2 \exp\left(-\theta \|\mathbf{s}\|^{2\nu}\right), \quad \mathbf{s} \in \mathbb{R}^d, \tag{1.2}$$

where $\sigma^2 > 0$, $\theta > 0$, and $0 < \nu \le 1$. The case $0 < \nu < 1$ is contained in model (1.1) with
$\beta_0 = \sigma^2$. When $\nu = 1/2$, the function (1.2) is called the Ornstein-Uhlenbeck covariance function,
which is also a special case of the Matérn class of covariance functions. The Matérn covariance
model

$$(\theta\|\mathbf{t}\|)^{\nu} K_\nu(\theta\|\mathbf{t}\|), \quad \mathbf{t} \in \mathbb{R}^d, \tag{1.3}$$
where 𝐾𝜈 is the modified Bessel function of the second kind with order 𝜈, was proposed by von
Kármán (1948) with 𝜈 = 1/3 and 𝑑 = 3. Some properties of the Matérn model were demonstrated in
Matérn (1986), Kent (1989), and Stein (1999). The stochastic partial differential equation (SPDE)
that generates a Gaussian process on R𝑑 with the Matérn covariance function is presented in Whittle
(1954) and Whittle (1963) as

$$\left(\nabla^2 - \theta^2\right)^{p} \xi(\mathbf{x}) = \epsilon(\mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^d, \tag{1.4}$$

where $\nabla^2$ is the Laplace operator, $\theta > 0$ and $p > d/4$ are constants, and $\epsilon$ is the Gaussian white noise
with unit variance. The covariance function of $\xi$ as a solution to (1.4) is

$$E(\xi(\mathbf{s})\xi(\mathbf{t}+\mathbf{s})) = \frac{(\|\mathbf{t}\|/\theta)^{2p-d/2} K_{2p-d/2}(\theta\|\mathbf{t}\|)}{2^{2p-1}\Gamma(2p)}, \quad \mathbf{t}, \mathbf{s} \in \mathbb{R}^d. \tag{1.5}$$
A more general class of stationary GRFs on R2 derived from second-order SPDEs was discussed by
Heine (1955). Later, Vecchia (1985) introduced the derivation of covariance functions from spectral
densities of stationary GRFs on R2, and showed the corresponding SPDEs. One generalization of
model (1.3) is the spatio-temporal covariance function (Cressie and Huang, 1999; Gneiting, 2002;
De Iaco et al., 2002; Ma, 2005, 2008). Jones and Zhang (1997) considered the spatio-temporal
random field defined by the SPDE
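$$\left(\left(\sum_{i=1}^{d}\frac{\partial^2}{\partial s_i^2}\right)^{p} - c\,\frac{\partial}{\partial t}\right) Z(\mathbf{s}; t) = \epsilon(\mathbf{s}; t), \quad \mathbf{s} = (s_1, s_2, \ldots, s_d)' \in \mathbb{R}^d, \ t \in \mathbb{R},$$

where $p > d/2$ and $c > 0$ are constants, and $\epsilon(\mathbf{s}; t)$ is the Gaussian white noise.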
For the multivariate GRF {𝑋(t), t ∈ R𝑑 }, where 𝑋 ∈ R𝑝 and 𝑝 ≥ 1, Gneiting et al. (2010)
introduced a multivariate Matérn model, where the marginal and cross-covariance functions of a
multivariate spatial random field are all of the Matérn type. Hu et al. (2013) introduced an approach
to construct multivariate Gaussian random fields (GRFs) using systems of SPDEs. Based on systems
of SPDEs with additive type G noise whose marginal covariance functions are of Matérn
type, Bolin and Wallin (2020) formulated a new class of multivariate non-Gaussian models. SPDE
models for GRFs are also researched by Hu and Steinsland (2016), Leonenko et al. (2011), Carrizo
Vergara (2018), and Lindgren et al. (2011, 2022).
The Matérn and exponential classes of covariance functions both involve three main types of
parameters. The scale parameter 𝜎² equals the variance of 𝑋(t) at any t ∈ R𝑑. The range
parameter 𝜃 measures how fast the correlation decays with distance. The smoothness
parameter 𝜈 controls smoothness properties such as the mean square differentiability of the random
field. More specifically, 𝑋 is 𝑛 times mean square differentiable if and only if 𝑛 < 𝜈 (Stein, 1999;
Anderes and Stein, 2008).
The increasing-domain asymptotics and infill (fixed-domain) asymptotics are two frameworks
under which the covariance parameter estimations for GRFs have been studied (Cressie, 1993;
Stein, 1999). Under the increasing-domain asymptotic framework, the minimum distance between
sampling locations is bounded away from zero, and the sampling region grows as the sample size 𝑁
increases. Under infill asymptotics, the sampling region is fixed and bounded, and the mesh of the
sampling points decreases as the sample size 𝑁 tends to infinity. In addition, there is a third asymptotic
framework called hybrid asymptotics or mixed domain asymptotics, under which the sampling
locations increasingly densely fill in any given subregion of the unbounded sampling region (Stein,
1999; Lahiri, 2003; Lahiri and Mukherjee, 2004; Chang et al., 2017).
This work focuses on the infill asymptotic framework, which plays an important role in spatial
sampling design and kriging (Stein, 1999; Zhu and Zhang, 2006). Assuming the smoothness
parameter 𝜈 is known, Zhang (2004), Du et al. (2009), Wang and Loh (2011), and Kaufman and
Shaby (2013) provided infill asymptotic results for the MLE and tapered MLE of the microergodic
parameter of the GRF with the Matérn covariance function; while Bevilacqua et al. (2019) studied
infill asymptotics for MLE of the microergodic parameter in the generalized Wendland covariance
function, which exhibits the same behavior as the Matérn function at the origin according to
Gneiting (2002). Using quadratic variations defined for irregularly spaced sampling designs
(described in more detail in Appendices A.2–A.3), Loh et al. (2021) also estimated the microergodic
parameter of the Matérn covariance function under the infill asymptotic framework.
The estimation of the smoothness parameter has also been widely studied. For the fractal
dimension, which measures the smoothness of sample paths of a stochastic process, existing
estimation approaches include the box-counting method (Hall and Wood, 1993), the variogram estimator
(Constantine and Hall, 1994), the periodogram-based estimator (Chan et al., 1995), and the variation
method (Dubuc et al., 1989), among others. The infill asymptotic behavior of increment-based estimators for
the smoothness parameter of a stationary GRF was studied by Kent and Wood (1997), Chan and
Wood (2000), Loh (2015), and Loh et al. (2021). For time series or spatial data, Gneiting et al.
(2012) discussed various types of estimators of the fractal dimension under the infill asymptotic
framework, considering both stationary and nonstationary univariate GRF models. Zhou and Xiao
(2018) studied the joint infill asymptotic properties of increment-based estimators for smoothness
parameters in the autocovariance functions of two coordinates of {𝑋(𝑡) = (𝑋1(𝑡), 𝑋2 (𝑡))𝑇 , 𝑡 ∈ R},
which extended the work of Kent and Wood (1997) to the bivariate case.
The subsequent chapters are organized as follows. In Chapter 2, we consider the bivariate
model {𝑋(𝑡) = (𝑋1(𝑡), 𝑋2(𝑡))𝑇 , 𝑡 ∈ R} studied by Zhou and Xiao (2018) and propose an increment-based
estimator for the smoothness parameter in the cross-covariance function of 𝑋(𝑡), based on
both regularly and irregularly spaced sampling points. The strong consistency and asymptotic normality
of the estimator are established under the infill asymptotic framework. In Chapter 3, we
estimate the scale parameter and range parameters of a univariate anisotropic Ornstein-Uhlenbeck
field on R2. The proposed estimators have asymptotic behavior similar to that of the MLEs but at a lower
computational cost. In Chapter 4, we estimate the scale parameter in the Matérn covariance function
using the MLE, whose computational complexity is reduced by the Vecchia approximation. We
study the bias resulting from a misspecified range parameter and the conditioning variables of the
Vecchia approximation. Simulation results are presented in each chapter to illustrate the theoretical
results.
CHAPTER 2
ESTIMATION OF SMOOTHNESS PARAMETERS
2.1 Introduction
Based on the infill asymptotic behaviors of quadratic variations (Lévy, 1940; Baxter, 1956;
Grenander, 1981), the increment-based methods have been used by several authors to consistently
estimate the smoothness parameter of a univariate stationary Gaussian random field under the infill
asymptotic framework (Istas and Lang, 1997; Kent and Wood, 1997; Chan and Wood, 2000; Loh,
2015; Loh et al., 2021). Consider a Gaussian process $X$ observed on $0 = t_0 < t_1 < \cdots < t_n = 1$.
Istas and Lang (1997) and Kent and Wood (1997) independently generalized the quadratic variation,
defined as $\sum_{j=1}^{n}(X(t_j) - X(t_{j-1}))^2$, using increment vectors. The empirical mean of the squared
increment process defined by Kent and Wood (1997) is equivalent to the empirical quadratic variation studied
by Istas and Lang (1997). An increment of order $p$ is a vector $a = (a_{-J}, a_{1-J}, \ldots, a_J)^T \in \mathbb{R}^{2J+1}$
($J > 0$) satisfying

$$\sum_{j=-J}^{J} j^q a_j \begin{cases} = 0, & 0 \le q \le p,\\ \neq 0, & q = p + 1. \end{cases}$$
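The moment conditions above are easy to check mechanically. The sketch below computes the order $p$ of a candidate increment using exact rational arithmetic, so that a vanishing moment is never misjudged through rounding. The helper name `increment_order` is hypothetical. For example, the second-order difference $a = (1, -2, 1)^T$ used later in this chapter is an increment of order $p = 1$.

```python
from fractions import Fraction


def increment_order(a):
    """Order p of an increment a = (a_{-J}, ..., a_J): the largest p with
    sum_j j**q * a_j == 0 for all 0 <= q <= p (and != 0 at q = p + 1).

    Returns None if sum(a) != 0, i.e. a is not an increment of any order.
    """
    if len(a) % 2 == 0:
        raise ValueError("a must have odd length 2J + 1")
    J = (len(a) - 1) // 2
    idx = range(-J, J + 1)
    q = 0
    # Exact arithmetic; Fraction(0) ** 0 == 1, matching the convention 0**0 = 1.
    while sum(Fraction(j) ** q * Fraction(x) for j, x in zip(idx, a)) == 0:
        q += 1
        if q > len(a):  # all moments vanish only for the zero vector
            raise ValueError("a is the zero vector")
    return q - 1 if q > 0 else None


print(increment_order([1, -2, 1]))        # second difference
print(increment_order([1, -4, 6, -4, 1]))  # fourth difference
```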
The increment-based estimators could also be used for estimating the fractal dimension of nonstationary
GRFs (Zhu and Stein, 2002; Begyn, 2005; Kubilius and Melichov, 2010).
Denote by $X = \{(X_1(t), X_2(t))^T, t \in \mathbb{R}\}$ a bivariate stationary Gaussian process with zero mean
and covariance function

$$C(t) = \begin{pmatrix} C_{11}(t) & C_{12}(t)\\ C_{21}(t) & C_{22}(t) \end{pmatrix}. \tag{2.1}$$

Assume that as $|t| \to 0$,

$$C_{ii}(t) = \sigma_i^2 - c_{ii}|t|^{\alpha_{ii}} + o(|t|^{\alpha_{ii}}), \tag{2.2}$$

$$C_{ij}(t) = \rho\sigma_1\sigma_2\left(1 - c_{12}|t|^{\alpha_{12}} + o(|t|^{\alpha_{12}})\right), \tag{2.3}$$

where $\sigma_i, c_{ii}, c_{ij} > 0$, $\alpha_{ii} \in (0, 2)$, $|\rho| \in (0, 1)$, $i, j \in \{1, 2\}$, $i \neq j$. Following the framework
of Gneiting et al. (2010), Zhou and Xiao (2018) imposed the following assumptions to make the
covariance function (2.1) valid:

$$\alpha_{12} > (\alpha_{11} + \alpha_{22})/2 \quad \text{or} \quad \alpha_{12} = (\alpha_{11} + \alpha_{22})/2 \ \text{and} \ c_{12}^2\rho^2\sigma_1^2\sigma_2^2 < c_{11}c_{22}.$$
2.2 Estimating the Cross Smoothness Parameter
Consider the Gaussian process $X$ modeled by (2.1)–(2.3). When $\alpha_{12} = (\alpha_{11} + \alpha_{22})/2$, the cross
smoothness parameter $\alpha_{12}$ can be estimated using estimators for $\alpha_{11}$ and $\alpha_{22}$, so this case can be
treated using the results in Zhou and Xiao (2018). In the following, we focus on the case
$\alpha_{12} > (\alpha_{11} + \alpha_{22})/2$ and construct an increment-based estimator for $\alpha_{12}$.
The regularity conditions below are introduced for the convenience of subsequent analysis.
Consider the condition $(A_q)$ in Kent and Wood (1997) for the $q$th derivative of the covariance function
$C_{ij}$, that is,

$$C_{ij}^{(q)}(t) = -A_{ij}\frac{\alpha_{ij}!}{q!}|t|^{\alpha_{ij}-q} + o(|t|^{\alpha_{ij}-q}) \tag{2.4}$$

as $|t| \to 0$, where $q \ge 1$, $i, j \in \{1, 2\}$, $A_{ii} = c_{ii}$, $A_{12} = A_{21} = \rho\sigma_1\sigma_2c_{12}$, and $\alpha_{ij}!/q! = \alpha_{ij}(\alpha_{ij} - 1)\cdots(\alpha_{ij} - q + 1)$.
Under the infill asymptotic framework, Section 2.2.1 discusses the covariation of 𝑋, and Section
2.2.2 further studies asymptotic properties of the increment-based estimator for 𝛼12. Some
simulation results are presented in Section 2.2.3.
2.2.1 Covariation
Let $a = (a_{-J}, a_{1-J}, \ldots, a_J)^T$ be an increment of order $p$. Denote by $X^u_{n,i} \in \mathbb{R}^{n(2J+1)}$ the vector of
observations of component $X_i$, where $i = 1, 2$, $u = 1, 2, \ldots, m$ and $n \in \mathbb{Z}_+$. For $j = 1, 2, \ldots, 2J + 1$
and $k = 1, 2, \ldots, n$, let

$$(X^u_{n,i})_{j+(k-1)(2J+1)} = X_i\left(\frac{k + u(j - J - 1)}{n}\right).$$

In other words, for $k = 1, 2, \ldots, n(2J + 1)$,

$$(X^u_{n,i})_k = X_i\left(\frac{k_J + 1 + u(k - k_J(2J + 1) - J - 1)}{n}\right),$$

where $k_J = \max\{j \in \mathbb{Z} : j < k/(2J + 1)\}$. Define
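$$Y^u_n := \begin{pmatrix} Y^u_{n,1}\\ Y^u_{n,2} \end{pmatrix} = \begin{pmatrix} n^{\alpha_{11}/2}(I_n \otimes a^T) & 0\\ 0 & n^{\alpha_{22}/2}(I_n \otimes a^T) \end{pmatrix} \begin{pmatrix} X^u_{n,1}\\ X^u_{n,2} \end{pmatrix},$$

where $\otimes$ denotes the Kronecker product. More specifically, for $k = 1, \ldots, n$,

$$(Y^u_{n,i})_k = n^{\alpha_{ii}/2}\sum_{j=1}^{2J+1} a_{j-J-1}(X^u_{n,i})_{j+(k-1)(2J+1)}.$$

Denote by

$$Z^u_{n,12}(k) = n^{\alpha_{12}-(\alpha_{11}+\alpha_{22})/2}(Y^u_{n,1})_k(Y^u_{n,2})_k, \quad k = 1, \ldots, n,$$

and define the covariation as

$$\bar{Z}^u_{n,12} = \frac{1}{n}\sum_{j=1}^{n} Z^u_{n,12}(j) = \frac{1}{2}n^{\alpha_{12}-(\alpha_{11}+\alpha_{22})/2-1}(Y^u_n)^T \begin{pmatrix} 0 & I_n\\ I_n & 0 \end{pmatrix} Y^u_n. \tag{2.5}$$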
We first discuss the infill asymptotic properties of the covariations $\bar{Z}^u_{n,12}$, based on which the estimator
for $\alpha_{12}$ will be constructed (see (2.27) below).
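The construction of $X^u_{n,i}$, $Y^u_{n,i}$ and $\bar Z^u_{n,12}$ can be traced step by step in code. The minimal sketch below stands in deterministic functions for the two sample paths, which is only an illustration (any vectorized callable works). It evaluates (2.5) both as the average of $Z^u_{n,12}(k)$ and as the quadratic form, and confirms that the two agree. The helper names are hypothetical. Dense Kronecker products are used for readability, not efficiency.

```python
import numpy as np


def sample_vector(X, n, u, J):
    """X^u_{n,i}: entry j + (k-1)(2J+1) equals X((k + u(j-J-1))/n),
    k = 1..n, j = 1..2J+1 (X is any vectorized callable on the real line)."""
    k = np.arange(1, n + 1)[:, None]        # rows k = 1..n
    j = np.arange(1, 2 * J + 2)[None, :]    # columns j = 1..2J+1
    return X((k + u * (j - J - 1)) / n).ravel()  # row-major order = j + (k-1)(2J+1)


def covariation(X1, X2, a, n, u, alpha11, alpha22, alpha12):
    """bar Z^u_{n,12} from (2.5), computed in the two equivalent ways."""
    a = np.asarray(a, dtype=float)
    J = (len(a) - 1) // 2
    B = np.kron(np.eye(n), a[None, :])      # I_n ⊗ a^T, shape n x n(2J+1)
    Y1 = n ** (alpha11 / 2) * B @ sample_vector(X1, n, u, J)
    Y2 = n ** (alpha22 / 2) * B @ sample_vector(X2, n, u, J)
    # (1/n) * sum_k Z^u_{n,12}(k)
    zbar_mean = np.mean(n ** (alpha12 - (alpha11 + alpha22) / 2) * Y1 * Y2)
    # quadratic form with the block matrix [[0, I_n], [I_n, 0]]
    Y = np.concatenate([Y1, Y2])
    Z0, I = np.zeros((n, n)), np.eye(n)
    S = np.block([[Z0, I], [I, Z0]])
    zbar_quad = 0.5 * n ** (alpha12 - (alpha11 + alpha22) / 2 - 1) * (Y @ S @ Y)
    return zbar_mean, zbar_quad


z1, z2 = covariation(np.sin, np.cos, [1, -2, 1], n=50, u=2,
                     alpha11=1.0, alpha22=1.0, alpha12=1.5)
print(z1, z2)
```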
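Theorem 1. Assume (2.4) holds for $q = 2p + 3$ and $i, j \in \{1, 2\}$. Then for all $u = 1, \ldots, m$,

$$\bar{Z}^u_{n,12} \xrightarrow{P} Au^{\alpha_{12}}$$

as $n \to \infty$ if $\alpha_{11} + \alpha_{22} < 2\alpha_{12} < \alpha_{11} + \alpha_{22} + 1 < 4p + 4$ or $4p + 3 < \alpha_{11} + \alpha_{22} < 2\alpha_{12} < 4p + 4$,
where $A = -\rho\sigma_1\sigma_2c_{12}\sum_{k,l=-J}^{J} a_ka_l|k - l|^{\alpha_{12}}$.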
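The constant $A$ in Theorem 1 is a finite double sum. For the second-order difference $a = (1, -2, 1)^T$, the sum $\sum_{k,l} a_ka_l|k-l|^{\alpha_{12}}$ equals $2^{\alpha_{12}+1} - 8$, so $A = \rho\sigma_1\sigma_2c_{12}(8 - 2^{\alpha_{12}+1})$. This vanishes at $\alpha_{12} = 2$, consistent with the statement after (2.7). The sketch below (hypothetical helper name, NumPy assumed) evaluates $A$ for a general increment.

```python
import numpy as np


def constant_A(a, alpha12, rho, sigma1, sigma2, c12):
    """A = -rho*sigma1*sigma2*c12 * sum_{k,l=-J}^{J} a_k a_l |k - l|**alpha12."""
    a = np.asarray(a, dtype=float)
    J = (len(a) - 1) // 2
    idx = np.arange(-J, J + 1)
    lags = np.abs(idx[:, None] - idx[None, :]).astype(float)  # |k - l|
    return -rho * sigma1 * sigma2 * c12 * (a @ (lags ** alpha12) @ a)


alpha = 1.3
A = constant_A([1, -2, 1], alpha, rho=0.5, sigma1=2.0, sigma2=1.5, c12=0.7)
closed_form = 0.5 * 2.0 * 1.5 * 0.7 * (8 - 2 ** (alpha + 1))
print(A, closed_form)
```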
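Proof. Based on (2.2) and (2.3), for any $j, k = 1, \ldots, n$ and any $u, v = 1, \ldots, m$,

$$\begin{aligned}
\sigma^{uv}_{n,ir}(k - j) &:= E[(Y^u_{n,i})_j(Y^v_{n,r})_k] = n^{(\alpha_{ii}+\alpha_{rr})/2}\sum_{s,t=-J}^{J} a_sa_t E\left[X_i\left(\frac{j + su}{n}\right)X_r\left(\frac{k + tv}{n}\right)\right]\\
&= n^{(\alpha_{ii}+\alpha_{rr})/2}\sum_{s,t} a_sa_t C_{ir}\left(\frac{j - k + su - tv}{n}\right)\\
&= -A_{ir}n^{(\alpha_{ii}+\alpha_{rr})/2-\alpha_{ir}}\sum_{s,t} a_sa_t|j - k + su - tv|^{\alpha_{ir}} + o\left(n^{(\alpha_{ii}+\alpha_{rr})/2-\alpha_{ir}}\right)\\
&\to \begin{cases} -A_{ii}\sum_{s,t} a_sa_t|j - k + su - tv|^{\alpha_{ii}}, & i = r,\\ 0, & i \neq r \end{cases}
\end{aligned} \tag{2.6}$$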
as 𝑛 → ∞, where 𝑖, 𝑟 ∈ {1, 2}. Thus,
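$$\begin{aligned}
E[Z^u_{n,12}(j)] &= n^{\alpha_{12}-(\alpha_{11}+\alpha_{22})/2}E[(Y^u_{n,1})_j(Y^u_{n,2})_j]\\
&= -\rho\sigma_1\sigma_2c_{12}\sum_{k,l} a_ka_l|k - l|^{\alpha_{12}}u^{\alpha_{12}} + o(1)\\
&\to Au^{\alpha_{12}} \quad \text{as } n \to \infty,
\end{aligned} \tag{2.7}$$

where $A = 0$ if $\alpha_{12}/2 \in \mathbb{Z}$ and $p \ge \alpha_{12}/2$, due to the fact that $\sum_{k,l} a_ka_l(k - l)^r = 0$ for $r \le 2p + 1$.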
If (2.4) holds for $q = 2p + 3$, then for all $-n < h < n$ there exists $h^*$ between $h$ and $h + su - tv$ such
that

$$\sum_{s,t} a_sa_tC_{ir}\left(\frac{h + su - tv}{n}\right) = \frac{2(uv)^{p+1}}{(2p + 2)!\,n^{2p+2}}\left(D_1^2C^{(2p+2)}_{ir}\left(\frac{h}{n}\right) + \frac{u + v}{n(2p + 3)}D_1D_2C^{(2p+3)}_{ir}\left(\frac{h^*}{n}\right)\right), \tag{2.8}$$

where $i, r \in \{1, 2\}$, $D_1 = \sum_s a_ss^{p+1}$, and $D_2 = \sum_s a_ss^{p+2}$. As a result, when $j - k = h$,
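$$\begin{aligned}
\mathrm{Cov}(Z^u_{n,12}(j), Z^v_{n,12}(k)) &= E[Z^u_{n,12}(j)Z^v_{n,12}(k)] - E[Z^u_{n,12}(j)]E[Z^v_{n,12}(k)]\\
&= n^{2\alpha_{12}-(\alpha_{11}+\alpha_{22})}\Big(E[(Y^u_{n,1})_j(Y^v_{n,1})_k]E[(Y^u_{n,2})_j(Y^v_{n,2})_k]\\
&\qquad + E[(Y^u_{n,1})_j(Y^v_{n,2})_k]E[(Y^v_{n,1})_k(Y^u_{n,2})_j]\Big)\\
&= n^{2\alpha_{12}}\left(\frac{2(uv)^{p+1}D_1}{(2p + 2)!\,n^{2p+2}}\right)^2\left(F^{uv}_{n,12}(h)^2 + F^{uv}_{n,11}(h)F^{uv}_{n,22}(h)\right),
\end{aligned} \tag{2.9}$$

where for $i, r \in \{1, 2\}$,

$$F^{uv}_{n,ir}(h) = D_1C^{(2p+2)}_{ir}\left(\frac{h}{n}\right) + \frac{u + v}{n(2p + 3)}D_2C^{(2p+3)}_{ir}\left(\frac{h^*}{n}\right).$$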
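As $h/n \to 0$,

$$\begin{aligned}
F^{uv}_{n,12}(h)^2 &= \left(\frac{h}{n}\right)^{2\alpha_{12}-(4p+4)}\left(A_{12}\frac{\alpha_{12}!}{(2p + 2)!}\right)^2\Bigg(D_1D_2\frac{u + v}{2p + 3}\,2(\alpha_{12} - 2p - 2)|h|^{-1} + D_1^2\\
&\qquad + D_2^2\frac{(u + v)^2}{(2p + 3)^2}(\alpha_{12} - 2p - 2)^2|h|^{-2}\Bigg)(1 + o(1)),\\
F^{uv}_{n,11}(h)F^{uv}_{n,22}(h) &= \left(\frac{h}{n}\right)^{\alpha_{11}+\alpha_{22}-(4p+4)}A_{11}A_{22}\frac{\alpha_{11}!}{(2p + 2)!}\frac{\alpha_{22}!}{(2p + 2)!}\Bigg(D_1D_2\frac{u + v}{2p + 3}(\alpha_{11} + \alpha_{22} - 4p - 4)|h|^{-1} + D_1^2\\
&\qquad + D_2^2\frac{(u + v)^2}{(2p + 3)^2}(\alpha_{11} - 2p - 2)(\alpha_{22} - 2p - 2)|h|^{-2}\Bigg)(1 + o(1)).
\end{aligned}$$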
It was shown in the proof of Theorem 1 in Kent and Wood (1997) that as $n \to \infty$,

$$\sum_{h=-n+1}^{n-1}\left(1 - \frac{|h|}{n}\right)|h|^a = \begin{cases} O(1), & \text{if } a < -1;\\ O(n^{a+1}), & \text{if } a > -1. \end{cases}$$
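Hence, as $n \to \infty$,

$$\begin{aligned}
\mathrm{Cov}(\bar{Z}^u_{n,12}, \bar{Z}^v_{n,12}) &= \frac{1}{n}\sum_{h=-n+1}^{n-1}\left(1 - \frac{|h|}{n}\right)\mathrm{Cov}(Z^u_{n,12}(0), Z^v_{n,12}(h))\\
&= n^{2\alpha_{12}-(4p+4)-1}\left(\frac{2(uv)^{p+1}D_1}{(2p + 2)!}\right)^2\sum_{h=-n+1}^{n-1}\left(1 - \frac{|h|}{n}\right)\left(F^{uv}_{n,12}(h)^2 + F^{uv}_{n,11}(h)F^{uv}_{n,22}(h)\right)\\
&= \begin{cases} O(n^{2\alpha_{12}-(\alpha_{11}+\alpha_{22})-1}), & \text{if } \alpha_{11} + \alpha_{22} < 4p + 3;\\ O(n^{2\alpha_{12}-(4p+4)}), & \text{if } \alpha_{11} + \alpha_{22} > 4p + 3. \end{cases}
\end{aligned} \tag{2.10}$$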
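It follows from (2.7) and (2.10) that, when $\alpha_{11} + \alpha_{22} < 2\alpha_{12} < \alpha_{11} + \alpha_{22} + 1 < 4p + 4$ or
$4p + 3 < \alpha_{11} + \alpha_{22} < 2\alpha_{12} < 4p + 4$, the variance of $\bar{Z}^u_{n,12}$ tends to zero, and hence $\bar{Z}^u_{n,12} \xrightarrow{P} Au^{\alpha_{12}}$ as $n \to \infty$.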
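Remark. Under the conditions of Theorem 1, we have the following consequences.
(i) Take $p = 0$. Then for $\alpha_{11} + \alpha_{22} < 3$, $\bar{Z}^u_{n,12} \xrightarrow{P} Au^{\alpha_{12}}$ as $n \to \infty$ if $\alpha_{11} + \alpha_{22} < 2\alpha_{12} < \alpha_{11} + \alpha_{22} + 1$; for $\alpha_{11} + \alpha_{22} > 3$, the convergence holds if $\alpha_{11} + \alpha_{22} < 2\alpha_{12} < 4$.
(ii) Take $p \ge 1$. Then for any $\alpha_{11}, \alpha_{22} \in (0, 2)$, $\bar{Z}^u_{n,12} \xrightarrow{P} Au^{\alpha_{12}}$ as $n \to \infty$ if $\alpha_{11} + \alpha_{22} < 2\alpha_{12} < \alpha_{11} + \alpha_{22} + 1$.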
The convergence in probability in Theorem 1 can be strengthened to almost sure convergence
by applying the following lemma and the Borel–Cantelli Lemma.
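Lemma 1. Under the conditions of Theorem 1, for all $u = 1, \ldots, m$ there exists a constant $C \in (0, \infty)$
independent of $n$ such that for all sufficiently large $n$ and all $0 < \xi < 1$,

$$P\left(\left|\frac{(\bar{Z}^u_{n,12})^2 - E[(\bar{Z}^u_{n,12})^2]}{E[(\bar{Z}^u_{n,12})^2]}\right| > \xi\right) \le C\exp\left(-n^{\min\{\alpha_{11}+\alpha_{22}+1,\,4p+4\}/2-\alpha_{12}}\,\frac{\xi}{4 - \xi}\right). \tag{2.11}$$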
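Proof. For $n \ge 1$ and $u = 1, \ldots, m$, denote

$$M^u_n = \frac{1}{2}n^{\alpha_{12}-(\alpha_{11}+\alpha_{22})/2-1}(\Sigma_Y^{1/2})^T\begin{pmatrix} 0 & I_n\\ I_n & 0 \end{pmatrix}\Sigma_Y^{1/2},$$

where $\Sigma_Y$ is the covariance matrix of $Y^u_n$. Then, according to (2.5), $\bar{Z}^u_{n,12} \overset{d}{=} U^TM^u_nU$, where $U \sim N(0, I_{2n})$. By the Hanson-Wright inequality,
there exist constants $C_1, C_2$ that do not depend on $n$ or $u$ such that for all $0 < \xi < 1$,

$$P\left(\left|\frac{\bar{Z}^u_{n,12} - E\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}}\right| > \xi\right) \le 2\exp\left(-\min\left\{\frac{C_1\xi|E\bar{Z}^u_{n,12}|}{\|M^u_n\|_2}, \frac{C_2\xi^2|E\bar{Z}^u_{n,12}|^2}{\|M^u_n\|_F^2}\right\}\right).$$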
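Under the conditions of Theorem 1, as $n \to \infty$ we have

$$\|M^u_n\|_F^2 = \mathrm{tr}((M^u_n)^2) = \mathrm{var}(\bar{Z}^u_{n,12})/2 = \begin{cases} O(n^{2\alpha_{12}-(\alpha_{11}+\alpha_{22})-1}), & \text{if } \alpha_{11} + \alpha_{22} < 4p + 3;\\ O(n^{2\alpha_{12}-(4p+4)}), & \text{if } \alpha_{11} + \alpha_{22} > 4p + 3. \end{cases} \tag{2.12}$$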
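Since $E\bar{Z}^u_{n,12} \to Au^{\alpha_{12}}$ as $n \to \infty$ and $\|M^u_n\|_2 \le \|M^u_n\|_F$, there exists a constant $C_0 \in (0, \infty)$
that does not depend on $n$ but may depend on $u$ such that

$$P\left(\left|\frac{\bar{Z}^u_{n,12} - E\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}}\right| > \xi\right) \le C_0\exp\left(-n^{\min\{\alpha_{11}+\alpha_{22}+1,\,4p+4\}/2-\alpha_{12}}\,\xi\right). \tag{2.13}$$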
Under the conditions of Theorem 1,

$$\frac{(E\bar{Z}^u_{n,12})^2}{E[(\bar{Z}^u_{n,12})^2]} = \frac{E[(\bar{Z}^u_{n,12})^2] - \mathrm{var}(\bar{Z}^u_{n,12})}{E[(\bar{Z}^u_{n,12})^2]} \to 1 \quad \text{as } n \to \infty.$$
Thus, for all $0 < \xi < 1$, $1 - \xi/2 < (E\bar{Z}^u_{n,12})^2/E[(\bar{Z}^u_{n,12})^2] < 1 + \xi/2$ when $n$ is large enough. Together
with (2.13), this implies
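$$\begin{aligned}
&P\left(\left|\frac{(\bar{Z}^u_{n,12})^2 - E[(\bar{Z}^u_{n,12})^2]}{E[(\bar{Z}^u_{n,12})^2]}\right| > \xi\right)\\
&\le P\left(\frac{(E\bar{Z}^u_{n,12})^2}{E[(\bar{Z}^u_{n,12})^2]}\left|\left(\frac{\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}}\right)^2 - 1\right| + \left|\frac{(E\bar{Z}^u_{n,12})^2}{E[(\bar{Z}^u_{n,12})^2]} - 1\right| > \xi\right)\\
&= P\left(\left|\left(\frac{\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}}\right)^2 - 1\right| > \frac{\xi + (E\bar{Z}^u_{n,12})^2/E[(\bar{Z}^u_{n,12})^2] - 1}{(E\bar{Z}^u_{n,12})^2/E[(\bar{Z}^u_{n,12})^2]}\right)\\
&\le P\left(\left|\left(\frac{\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}}\right)^2 - 1\right| > \frac{\xi - \xi/2}{1 - \xi/2}\right) \quad \text{for large } n\\
&= P\left(\left|\left(\frac{\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}}\right)^2 - 1\right| > \frac{\xi}{2 - \xi}\right)\\
&\le P\left(\left|\frac{\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}} - 1\right|\cdot\left|\frac{\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}} + 1\right| > \frac{\xi}{2 - \xi},\ \left|\frac{\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}} - 1\right| \le \frac{\xi}{2 - \xi}\right) + P\left(\left|\frac{\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}} - 1\right| > \frac{\xi}{2 - \xi}\right)\\
&\le P\left(\left|\frac{\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}} - 1\right| > \frac{\xi/(2 - \xi)}{2 + \xi/(2 - \xi)}\right) + P\left(\left|\frac{\bar{Z}^u_{n,12}}{E\bar{Z}^u_{n,12}} - 1\right| > \frac{\xi}{2 - \xi}\right)\\
&\le C\exp\left(-n^{\min\{\alpha_{11}+\alpha_{22}+1,\,4p+4\}/2-\alpha_{12}}\,\frac{\xi}{4 - \xi}\right)
\end{aligned}$$

for some constant $C \in (0, \infty)$ that is independent of $n$ and $\xi$ but may depend on $u$.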
The joint asymptotic distribution of the covariations is presented in the following theorem.
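Theorem 2. Denote by $\bar{Z}_{n,12} = (\bar{Z}^1_{n,12}, \ldots, \bar{Z}^m_{n,12})^T$ and take $p \ge 1$. When $\alpha_{11} + \alpha_{22} < 2\alpha_{12}$ and
(2.4) holds for $q = 2p + 2$,

$$n^{1/2+(\alpha_{11}+\alpha_{22})/2-\alpha_{12}}(\bar{Z}_{n,12} - E\bar{Z}_{n,12}) \xrightarrow{d} N(0, \Phi) \tag{2.14}$$

as $n \to \infty$, where the matrix $\Phi \in \mathbb{R}^{m\times m}$ has entries

$$\Phi_{u,v} = A_{11}A_{22}\sum_{h=-\infty}^{\infty}\sum_{s,t,j,l=-J}^{J} a_sa_ta_ja_l|h + su - tv|^{\alpha_{11}}|h + ju - lv|^{\alpha_{22}}, \quad 1 \le u, v \le m. \tag{2.15}$$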
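Because the sum over $(s, t, j, l)$ in (2.15) factorizes, $\Phi_{u,v} = A_{11}A_{22}\sum_h g_{\alpha_{11}}(h)\,g_{\alpha_{22}}(h)$ with $g_\alpha(h) = \sum_{s,t} a_sa_t|h + su - tv|^{\alpha}$. This makes the entries cheap to approximate numerically. The sketch below truncates the sum over $h$ at $|h| \le H$; the truncation level is an assumption of this illustration. The decay $|h|^{\alpha_{11}+\alpha_{22}-4p-4}$ used in the proof suggests a modest $H$ suffices when $p \ge 1$. NumPy is assumed and the helper names are hypothetical.

```python
import numpy as np


def lag_sums(a, hs, u, v, alpha):
    """g(h) = sum_{s,t=-J}^{J} a_s a_t |h + s*u - t*v|**alpha for every h in hs."""
    a = np.asarray(a, dtype=float)
    J = (len(a) - 1) // 2
    idx = np.arange(-J, J + 1)
    d = np.abs(hs[:, None, None] + u * idx[None, :, None] - v * idx[None, None, :])
    return np.einsum("s,hst,t->h", a, d.astype(float) ** alpha, a)


def phi_matrix(a, m, alpha11, alpha22, A11, A22, H=200):
    """Approximate Phi in (2.15), truncating the sum over h to |h| <= H."""
    hs = np.arange(-H, H + 1)
    Phi = np.empty((m, m))
    for u in range(1, m + 1):
        for v in range(1, m + 1):
            g11 = lag_sums(a, hs, u, v, alpha11)
            g22 = lag_sums(a, hs, u, v, alpha22)
            Phi[u - 1, v - 1] = A11 * A22 * np.sum(g11 * g22)
    return Phi


a = [1.0, -2.0, 1.0]
Phi = phi_matrix(a, m=3, alpha11=0.8, alpha22=1.2, A11=1.0, A22=0.5)
print(np.round(Phi, 4))
```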
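Proof. By the Cramér-Wold theorem, to prove the asymptotic normality of $\bar{Z}_{n,12}$, it suffices to show
that for all $\boldsymbol{\gamma} \in \mathbb{R}^m$,

$$n^{1/2+(\alpha_{11}+\alpha_{22})/2-\alpha_{12}}\boldsymbol{\gamma}^T(\bar{Z}_{n,12} - E\bar{Z}_{n,12}) \xrightarrow{d} N(0, \boldsymbol{\gamma}^T\Phi\boldsymbol{\gamma}) \tag{2.16}$$

as $n \to \infty$.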
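Denote by

$$W_n = \left(Y^1_{n,1}(1), \ldots, Y^m_{n,1}(1), Y^1_{n,1}(2), \ldots, Y^m_{n,1}(n), Y^1_{n,2}(1), \ldots, Y^m_{n,2}(n)\right)^T \in \mathbb{R}^{2mn}, \tag{2.17}$$

then

$$n^{1/2+(\alpha_{11}+\alpha_{22})/2-\alpha_{12}}\boldsymbol{\gamma}^T\bar{Z}_{n,12} = \frac{1}{2}n^{-1/2}W_n^T\begin{pmatrix} 0 & \mathrm{diag}(1_n \otimes \boldsymbol{\gamma})\\ \mathrm{diag}(1_n \otimes \boldsymbol{\gamma}) & 0 \end{pmatrix}W_n,$$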
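where $\mathrm{diag}(x)$ maps a vector $x$ to the diagonal matrix whose diagonal is $x$, and $1_n \in \mathbb{R}^n$ is the vector with
all entries equal to 1. Let $V_n = \mathrm{Cov}(W_n)$ and

$$G_n = \frac{1}{2}n^{-1/2}(V_n^{1/2})^T\begin{pmatrix} 0 & \mathrm{diag}(1_n \otimes \boldsymbol{\gamma})\\ \mathrm{diag}(1_n \otimes \boldsymbol{\gamma}) & 0 \end{pmatrix}V_n^{1/2}. \tag{2.18}$$

Then $n^{1/2+(\alpha_{11}+\alpha_{22})/2-\alpha_{12}}\boldsymbol{\gamma}^T\bar{Z}_{n,12} \overset{d}{=} \epsilon_n^TG_n\epsilon_n \overset{d}{=} \epsilon_n^T\mathrm{diag}(\mathrm{eig}(G_n))\epsilon_n$ for $\epsilon_n \sim N(0, I_{2mn})$, where $\mathrm{eig}(G_n)$ is the vector of eigenvalues of $G_n$.
It follows from the proof of Theorem 2 in Zhou and Xiao (2018) that (2.16) holds if $\mathrm{Tr}(G_n^4) \to 0$
and $2\mathrm{Tr}(G_n^2) \to \boldsymbol{\gamma}^T\Phi\boldsymbol{\gamma}$ as $n \to \infty$.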
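Some context for why the two trace conditions suffice. For symmetric $G$ and $\epsilon \sim N(0, I)$, the quadratic form $\epsilon^TG\epsilon$ has the law of $\sum_i\lambda_i\chi^2_{1,i}$. Its cumulants are therefore $\kappa_r = 2^{r-1}(r-1)!\,\mathrm{Tr}(G^r)$. The variance is $2\mathrm{Tr}(G^2)$, and the standardized fourth cumulant is $12\,\mathrm{Tr}(G^4)/\mathrm{Tr}(G^2)^2$, which vanishes when $\mathrm{Tr}(G^4) \to 0$ while $\mathrm{Tr}(G^2)$ stays bounded away from zero. The sketch below checks these identities numerically on a toy random symmetric matrix, which is an assumption of the illustration and not the $G_n$ of the proof. NumPy is assumed.

```python
import math
import numpy as np


def quadratic_form_cumulants(G, r_max=4):
    """Cumulants of Q = eps^T G eps (eps ~ N(0, I), G symmetric):
    kappa_r = 2**(r - 1) * (r - 1)! * Tr(G**r), computed from eigenvalues."""
    lam = np.linalg.eigvalsh(G)
    return [2 ** (r - 1) * math.factorial(r - 1) * float(np.sum(lam ** r))
            for r in range(1, r_max + 1)]


def standardized_fourth_cumulant(G):
    """kappa_4 / kappa_2**2 = 12 Tr(G^4) / Tr(G^2)**2 (zero for a normal law)."""
    k = quadratic_form_cumulants(G)
    return k[3] / k[1] ** 2


rng = np.random.default_rng(0)
n = 100
M = rng.standard_normal((n, n))
G = (M + M.T) / 2                        # toy symmetric matrix
G /= np.sqrt(2 * np.trace(G @ G))        # normalize so that 2 Tr(G^2) = 1
kappa = quadratic_form_cumulants(G)

eps = rng.standard_normal((10000, n))
Q = np.sum((eps @ G) * eps, axis=1)      # 10000 draws of eps^T G eps
print(Q.mean(), kappa[0])                # sample mean vs Tr(G)
print(Q.var(), kappa[1])                 # sample variance vs 2 Tr(G^2) = 1
print(standardized_fourth_cumulant(G))   # small because Tr(G^4) << Tr(G^2)^2
```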
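Let

$$H_n = V_n\begin{pmatrix} 0 & \mathrm{diag}(1_n \otimes \boldsymbol{\gamma})\\ \mathrm{diag}(1_n \otimes \boldsymbol{\gamma}) & 0 \end{pmatrix},$$

then for $i_1, i_2 \in \{1, 2\}$, $j_1, j_2 \in \{1, \ldots, n\}$, and $k_1, k_2 \in \{1, \ldots, m\}$,

$$H_n((i_1 - 1)mn + (j_1 - 1)m + k_1,\ (i_2 - 1)mn + (j_2 - 1)m + k_2) = \gamma_{k_2}\sigma^{k_1k_2}_{n,i_1(3-i_2)}(j_2 - j_1).$$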
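Thus,

$$\begin{aligned}
\mathrm{Tr}(H_n^4) &= \sum_{k_1,\ldots,k_4=1}^{m}\gamma_{k_1}\gamma_{k_2}\gamma_{k_3}\gamma_{k_4}\sum_{i_1,\ldots,i_4=1}^{2}\sum_{j_1,\ldots,j_4=1}^{n}\Big(\sigma^{k_1k_2}_{n,i_1(3-i_2)}(j_2 - j_1)\sigma^{k_2k_3}_{n,i_2(3-i_3)}(j_3 - j_2)\\
&\qquad\qquad \times\sigma^{k_3k_4}_{n,i_3(3-i_4)}(j_4 - j_3)\sigma^{k_4k_1}_{n,i_4(3-i_1)}(j_1 - j_4)\Big)\\
&\le \sum_{k_1,\ldots,k_4=1}^{m}|\gamma_{k_1}\gamma_{k_2}\gamma_{k_3}\gamma_{k_4}|\sum_{i_1,\ldots,i_4=1}^{2}n\sum_{|h_1|,|h_2|,|h_3|<n}\Big|\sigma^{k_1k_2}_{n,i_1(3-i_2)}(h_1)\sigma^{k_2k_3}_{n,i_2(3-i_3)}(h_2)\\
&\qquad\qquad \times\sigma^{k_3k_4}_{n,i_3(3-i_4)}(h_3)\sigma^{k_4k_1}_{n,i_4(3-i_1)}(h_1 + h_2 + h_3)\Big|,
\end{aligned}$$

$$\begin{aligned}
\mathrm{Tr}(H_n^2) &= 2\sum_{k_1,k_2=1}^{m}\gamma_{k_1}\gamma_{k_2}\sum_{j_1,j_2=1}^{n}\left(\sigma^{k_1k_2}_{n,12}(j_2 - j_1)^2 + \sigma^{k_1k_2}_{n,11}(j_2 - j_1)\sigma^{k_1k_2}_{n,22}(j_2 - j_1)\right)\\
&= 2n\sum_{k_1,k_2=1}^{m}\gamma_{k_1}\gamma_{k_2}\sum_{|h|<n}\left(1 - \frac{|h|}{n}\right)\left(\sigma^{k_1k_2}_{n,12}(h)^2 + \sigma^{k_1k_2}_{n,11}(h)\sigma^{k_1k_2}_{n,22}(h)\right).
\end{aligned}$$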
For any fixed $h$, the convergence of $\sigma^{uv}_{n,ir}(h)$ as $n \to \infty$ is presented in (2.6). By Theorem 1 in
Kent and Wood (1997) and Lemma 2 in Zhou and Xiao (2018), when $\alpha_{11} + \alpha_{22} < 2\alpha_{12}$ and (2.4)
holds for $q = 2p + 2$,

$$\sigma^{uv}_{n,ii}(h) = O(|h|^{\alpha_{ii}-2p-2}) \quad \text{and} \quad \sigma^{uv}_{n,12}(h) = O(|h|^{(\alpha_{11}+\alpha_{22})/2-2p-2}) \tag{2.19}$$

uniformly for $n > |h|$. If $p \ge 1$, then $\alpha_{ii} - 2p - 2 < -2$ and $(\alpha_{11} + \alpha_{22})/2 - 2p - 2 < -2$ hold for
any $\alpha_{11}, \alpha_{22} \in (0, 2)$. Hence there exists a constant $c_0 > 0$ such that
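$$\begin{aligned}
&\sum_{h_1,h_2,h_3=1-n}^{n-1}\left|\sigma^{k_1k_2}_{n,i_1(3-i_2)}(h_1)\sigma^{k_2k_3}_{n,i_2(3-i_3)}(h_2)\sigma^{k_3k_4}_{n,i_3(3-i_4)}(h_3)\sigma^{k_4k_1}_{n,i_4(3-i_1)}(h_1 + h_2 + h_3)\right|\\
&\le c_0\sum_{h_1,h_2,h_3=1-n}^{n-1}|h_1|^{\frac{\alpha_{i_1i_1}+\alpha_{(3-i_2)(3-i_2)}}{2}-2p-2}|h_2|^{\frac{\alpha_{i_2i_2}+\alpha_{(3-i_3)(3-i_3)}}{2}-2p-2}|h_3|^{\frac{\alpha_{i_3i_3}+\alpha_{(3-i_4)(3-i_4)}}{2}-2p-2} = O(1)
\end{aligned}$$

as $n \to \infty$, for all $i_1, i_2, i_3, i_4 \in \{1, 2\}$. Consequently, $\mathrm{Tr}(H_n^4) = O(n)$ and

$$\mathrm{Tr}(G_n^4) = \left(\frac{1}{2}n^{-1/2}\right)^4\mathrm{Tr}(H_n^4) = O(n^{-1}) \to 0$$

as $n \to \infty$.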
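For $u, v \in \{1, \ldots, m\}$ and $h \in \mathbb{Z}$, define

$$d^{uv}_n(h) := \mathbf{1}_{\{|h|<n\}}\left(1 - \frac{|h|}{n}\right)\left(\sigma^{uv}_{n,12}(h)^2 + \sigma^{uv}_{n,11}(h)\sigma^{uv}_{n,22}(h)\right).$$

Then for any fixed $h$,

$$d^{uv}_n(h) \to A_{11}A_{22}\sum_{s,t,j,l=-J}^{J} a_sa_ta_ja_l|h + su - tv|^{\alpha_{11}}|h + ju - lv|^{\alpha_{22}}$$

as $n \to \infty$. Moreover,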
$$|d^{uv}_n(h)| \le \left|\sigma^{uv}_{n,12}(h)^2 + \sigma^{uv}_{n,11}(h)\sigma^{uv}_{n,22}(h)\right| = O(|h|^{\alpha_{11}+\alpha_{22}-4p-4})$$

uniformly for $n > |h|$. If $p \ge 1$, then $\alpha_{11} + \alpha_{22} - 4p - 4 < -4$ and $\sum_{h=-\infty}^{\infty}|h|^{\alpha_{11}+\alpha_{22}-4p-4} < \infty$. Thus
for any $u, v \in \{1, \ldots, m\}$, $\{d^{uv}_n(h), h \in \mathbb{Z}\}$ is dominated by a summable sequence. It therefore
follows from the dominated convergence theorem that
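$$\begin{aligned}
\mathrm{Tr}(G_n^2) &= \frac{1}{4n}\mathrm{Tr}(H_n^2) = \frac{1}{2}\sum_{k_1,k_2=1}^{m}\gamma_{k_1}\gamma_{k_2}\sum_{h=-\infty}^{\infty}d^{k_1k_2}_n(h)\\
&\to \frac{A_{11}A_{22}}{2}\sum_{k_1,k_2=1}^{m}\gamma_{k_1}\gamma_{k_2}\sum_{h=-\infty}^{\infty}\sum_{s,t,j,l=-J}^{J} a_sa_ta_ja_l|h + sk_1 - tk_2|^{\alpha_{11}}|h + jk_1 - lk_2|^{\alpha_{22}} =: \frac{1}{2}\boldsymbol{\gamma}^T\Phi\boldsymbol{\gamma} \quad \text{as } n \to \infty,
\end{aligned}$$

where $\Phi \in \mathbb{R}^{m\times m}$ is a constant matrix with entries defined in (2.15).
This proves Theorem 2.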
Taking $p = 1$, $J = 1$, and $a = (1, -2, 1)^T$, we further discuss the joint asymptotic distribution
of the covariations defined in this chapter and the quadratic variations $\bar{Z}_{n,1}$, $\bar{Z}_{n,2}$ studied by Zhou and
Xiao (2018), where $\bar{Z}_{n,i} = (\bar{Z}^1_{n,i}, \ldots, \bar{Z}^m_{n,i})^T$ and

$$\bar{Z}^u_{n,i} = \frac{1}{n}(Y^u_{n,i})^TY^u_{n,i}, \quad u = 1, \ldots, m, \ i = 1, 2. \tag{2.20}$$
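Theorem 3. When $\alpha_{11} + \alpha_{22} < 2\alpha_{12}$ and (2.4) holds for $q = 4$,

$$n^{D_\alpha}\begin{pmatrix} \bar{Z}_{n,1} - E\bar{Z}_{n,1}\\ \bar{Z}_{n,2} - E\bar{Z}_{n,2}\\ \bar{Z}_{n,12} - E\bar{Z}_{n,12} \end{pmatrix} \xrightarrow{d} N\left(0, \begin{pmatrix} \Phi_1 & 0 & 0\\ 0 & \Phi_2 & 0\\ 0 & 0 & \Phi \end{pmatrix}\right) \tag{2.21}$$

as $n \to \infty$, where

$$D_\alpha = \begin{pmatrix} \frac{1}{2} & 0 & 0\\ 0 & \frac{1}{2} & 0\\ 0 & 0 & \frac{1 + \alpha_{11} + \alpha_{22}}{2} - \alpha_{12} \end{pmatrix},$$

the matrix $\Phi \in \mathbb{R}^{m\times m}$ is as defined in Theorem 2, and the matrices $\Phi_i \in \mathbb{R}^{m\times m}$ have entries

$$(\Phi_i)_{u,v} = 2A_{ii}^2\sum_{h=-\infty}^{\infty}\left(\sum_{s,t=-1}^{1} a_sa_t|h + su - tv|^{\alpha_{ii}}\right)^2, \quad i = 1, 2. \tag{2.22}$$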
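Proof. By the Cramér-Wold theorem, it suffices to prove that for all $\boldsymbol{\gamma}_1 = (\gamma_{1,1}, \ldots, \gamma_{1,m})^T$, $\boldsymbol{\gamma}_2 = (\gamma_{2,1}, \ldots, \gamma_{2,m})^T$, and $\boldsymbol{\gamma}_{12} = (\gamma_{12,1}, \ldots, \gamma_{12,m})^T \in \mathbb{R}^m$,

$$\sqrt{n}\left(\boldsymbol{\gamma}_1^T(\bar{Z}_{n,1} - E\bar{Z}_{n,1}) + \boldsymbol{\gamma}_2^T(\bar{Z}_{n,2} - E\bar{Z}_{n,2}) + n^{\frac{\alpha_{11}+\alpha_{22}}{2}-\alpha_{12}}\boldsymbol{\gamma}_{12}^T(\bar{Z}_{n,12} - E\bar{Z}_{n,12})\right) \xrightarrow{d} N(0, \boldsymbol{\gamma}_1^T\Phi_1\boldsymbol{\gamma}_1 + \boldsymbol{\gamma}_2^T\Phi_2\boldsymbol{\gamma}_2 + \boldsymbol{\gamma}_{12}^T\Phi\boldsymbol{\gamma}_{12}) \tag{2.23}$$

as $n \to \infty$.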
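Recall the notation $W_n$ defined in (2.17) and $V_n = \mathrm{Cov}(W_n)$, and let

$$\Lambda_n = \frac{2}{\sqrt{n}}(V_n^{1/2})^T\Gamma_nV_n^{1/2}, \tag{2.24}$$

where

$$\Gamma_n = \begin{pmatrix} \mathrm{diag}(1_n \otimes \boldsymbol{\gamma}_1) & 0\\ 0 & \mathrm{diag}(1_n \otimes \boldsymbol{\gamma}_2) \end{pmatrix}. \tag{2.25}$$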
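It follows from the definitions of $\bar{Z}_{n,1}$, $\bar{Z}_{n,2}$, and $\bar{Z}_{n,12}$ that

$$n^{D_\alpha}\begin{pmatrix} \boldsymbol{\gamma}_1\\ \boldsymbol{\gamma}_2\\ \boldsymbol{\gamma}_{12} \end{pmatrix}^T\begin{pmatrix} \bar{Z}_{n,1}\\ \bar{Z}_{n,2}\\ \bar{Z}_{n,12} \end{pmatrix} = \frac{1}{\sqrt{n}}W_n^T\begin{pmatrix} \mathrm{diag}(1_n \otimes \boldsymbol{\gamma}_1) & \frac{1}{2}\mathrm{diag}(1_n \otimes \boldsymbol{\gamma}_{12})\\ \frac{1}{2}\mathrm{diag}(1_n \otimes \boldsymbol{\gamma}_{12}) & \mathrm{diag}(1_n \otimes \boldsymbol{\gamma}_2) \end{pmatrix}W_n \overset{d}{=} \epsilon_n^T\left(G_n + \frac{1}{2}\Lambda_n\right)\epsilon_n,$$

where $\epsilon_n \sim N(0, I_{2mn})$ and $G_n$ is defined in (2.18) with $\boldsymbol{\gamma} = \boldsymbol{\gamma}_{12}$. Therefore, it remains to prove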
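$$\mathrm{Tr}\left(\left(G_n + \tfrac{1}{2}\Lambda_n\right)^2\right) \to \frac{1}{2}\left(\boldsymbol{\gamma}_1^T\Phi_1\boldsymbol{\gamma}_1 + \boldsymbol{\gamma}_2^T\Phi_2\boldsymbol{\gamma}_2 + \boldsymbol{\gamma}_{12}^T\Phi\boldsymbol{\gamma}_{12}\right) \quad \text{and} \quad \mathrm{Tr}\left(\left(G_n + \tfrac{1}{2}\Lambda_n\right)^4\right) \to 0$$

as $n \to \infty$.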
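It has been proved by Zhou and Xiao (2018) that as $n \to \infty$,

$$\mathrm{Tr}(\Lambda_n^2) \to 2\left(\boldsymbol{\gamma}_1^T\Phi_1\boldsymbol{\gamma}_1 + \boldsymbol{\gamma}_2^T\Phi_2\boldsymbol{\gamma}_2\right) \quad \text{and} \quad \mathrm{Tr}(\Lambda_n^4) \to 0$$

when $\alpha_{11} + \alpha_{22} < 2\alpha_{12}$ and (2.4) holds for $q = 4$. Since the conditions of Theorem 2 are satisfied, we
also have

$$\mathrm{Tr}(G_n^2) \to \frac{1}{2}\boldsymbol{\gamma}_{12}^T\Phi\boldsymbol{\gamma}_{12} \quad \text{and} \quad \mathrm{Tr}(G_n^4) \to 0$$

as $n \to \infty$. Moreover,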
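$$\begin{aligned}
\mathrm{Tr}(G_n\Lambda_n) &= \frac{1}{n}\mathrm{Tr}\left(V_n\begin{pmatrix} 0 & \mathrm{diag}(1_n \otimes \boldsymbol{\gamma}_{12})\\ \mathrm{diag}(1_n \otimes \boldsymbol{\gamma}_{12}) & 0 \end{pmatrix}V_n\Gamma_n\right) = \frac{1}{n}\sum_{\ell_1,\ell_2=1}^{2mn}(H_n)_{\ell_1,\ell_2}(V_n\Gamma_n)_{\ell_2,\ell_1}\\
&= \frac{1}{n}\sum_{k_1,k_2=1}^{m}\sum_{i_1,i_2=1}^{2}\gamma_{i_1,k_1}\gamma_{12,k_2}\sum_{j_1,j_2=1}^{n}\sigma^{k_1k_2}_{n,i_1(3-i_2)}(j_2 - j_1)\sigma^{k_2k_1}_{n,i_2i_1}(j_2 - j_1)\\
&= \sum_{k_1,k_2=1}^{m}\sum_{i_1,i_2=1}^{2}\gamma_{i_1,k_1}\gamma_{12,k_2}\sum_{|h|<n}\left(1 - \frac{|h|}{n}\right)\sigma^{k_1k_2}_{n,i_1(3-i_2)}(h)\sigma^{k_2k_1}_{n,i_2i_1}(h) \to 0 \quad \text{as } n \to \infty
\end{aligned} \tag{2.26}$$

by the dominated convergence theorem, where $H_n$ is as in the proof of Theorem 2 with $\boldsymbol{\gamma} = \boldsymbol{\gamma}_{12}$, since $\sigma^{uv}_{n,12}(h) \to 0$ as $n \to \infty$ for any $u, v = 1, \ldots, m$ and
any fixed $h$. Due to the fact that

$$\mathrm{Card}\{(j_1, \ldots, j_4) : 1 \le j_1, \ldots, j_4 \le n,\ j_{i+1} - j_i = h_i\ (i = 1, 2, 3)\} \le n,$$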
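we have

$$\begin{aligned}
\mathrm{Tr}\left((G_n\Lambda_n)^2\right) &= \frac{1}{n^2}\mathrm{Tr}\left((H_nV_n\Gamma_n)^2\right) = \frac{1}{n^2}\sum_{\ell_1,\ldots,\ell_4=1}^{2mn}(H_n)_{\ell_1,\ell_2}(V_n\Gamma_n)_{\ell_2,\ell_3}(H_n)_{\ell_3,\ell_4}(V_n\Gamma_n)_{\ell_4,\ell_1}\\
&= \frac{1}{n^2}\sum_{k_1,\ldots,k_4=1}^{m}\sum_{i_1,\ldots,i_4=1}^{2}\gamma_{i_1,k_1}\gamma_{12,k_2}\gamma_{i_3,k_3}\gamma_{12,k_4}\sum_{j_1,\ldots,j_4=1}^{n}\sigma^{k_1k_2}_{n,i_1(3-i_2)}(j_2 - j_1)\sigma^{k_2k_3}_{n,i_2i_3}(j_2 - j_3)\\
&\qquad\qquad \times\sigma^{k_3k_4}_{n,i_3(3-i_4)}(j_4 - j_3)\sigma^{k_4k_1}_{n,i_4i_1}(j_4 - j_1)\\
&\le \frac{1}{n}\sum_{k_1,\ldots,k_4=1}^{m}\sum_{i_1,\ldots,i_4=1}^{2}\left|\gamma_{i_1,k_1}\gamma_{12,k_2}\gamma_{i_3,k_3}\gamma_{12,k_4}\right|\sum_{h_1,h_2,h_3=1-n}^{n-1}\Big|\sigma^{k_1k_2}_{n,i_1(3-i_2)}(h_1)\sigma^{k_2k_3}_{n,i_2i_3}(h_2)\\
&\qquad\qquad \times\sigma^{k_3k_4}_{n,i_3(3-i_4)}(h_3)\sigma^{k_4k_1}_{n,i_4i_1}(h_1 + h_2 + h_3)\Big|.
\end{aligned}$$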
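Following steps similar to those in the proof of Theorem 2, there exists a constant $c_0 > 0$ such that

$$\mathrm{Tr}\left((G_n\Lambda_n)^2\right) \le \frac{c_0}{n}\sum_{k_1,\ldots,k_4=1}^{m}\sum_{i_1,\ldots,i_4=1}^{2}\left|\gamma_{12,k_2}\gamma_{i_3,k_3}\gamma_{12,k_4}\right|\sum_{h_1,h_2,h_3=1-n}^{n-1}|h_1|^{\frac{\alpha_{i_1i_1}+\alpha_{(3-i_2)(3-i_2)}}{2}-4}|h_2|^{\frac{\alpha_{i_2i_2}+\alpha_{i_3i_3}}{2}-4}|h_3|^{\frac{\alpha_{i_3i_3}+\alpha_{(3-i_4)(3-i_4)}}{2}-4} = O(n^{-1}) \quad \text{as } n \to \infty,$$

since $\frac{1}{2}(\alpha_{ii} + \alpha_{jj}) - 4 < -2$ for all $i, j = 1, 2$.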
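Consequently, as $n \to \infty$,

$$\mathrm{Tr}\left(\left(G_n + \tfrac{1}{2}\Lambda_n\right)^2\right) = \mathrm{Tr}(G_n^2) + \frac{1}{4}\mathrm{Tr}(\Lambda_n^2) + \mathrm{Tr}(G_n\Lambda_n) \to \frac{1}{2}\left(\boldsymbol{\gamma}_1^T\Phi_1\boldsymbol{\gamma}_1 + \boldsymbol{\gamma}_2^T\Phi_2\boldsymbol{\gamma}_2 + \boldsymbol{\gamma}_{12}^T\Phi\boldsymbol{\gamma}_{12}\right),$$

where the entries of $\Phi_1$, $\Phi_2$, and $\Phi$ are defined in (2.22) and (2.15). The Cauchy–Schwarz inequality
implies that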
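$$\begin{aligned}
\mathrm{Tr}\left(\left(G_n + \tfrac{1}{2}\Lambda_n\right)^4\right) &= \mathrm{Tr}(G_n^4) + 2\mathrm{Tr}(G_n^3\Lambda_n) + \frac{1}{2}\mathrm{Tr}\left((G_n\Lambda_n)^2\right) + \mathrm{Tr}(G_n^2\Lambda_n^2) + \frac{1}{2}\mathrm{Tr}(G_n\Lambda_n^3) + \frac{1}{2^4}\mathrm{Tr}(\Lambda_n^4)\\
&\le \mathrm{Tr}(G_n^4) + 2\sqrt{\mathrm{Tr}(G_n^6)\mathrm{Tr}(\Lambda_n^2)} + \frac{1}{2}\mathrm{Tr}\left((G_n\Lambda_n)^2\right) + \sqrt{\mathrm{Tr}(G_n^4)\mathrm{Tr}(\Lambda_n^4)}\\
&\qquad + \frac{1}{2}\sqrt{\mathrm{Tr}(G_n^2)\mathrm{Tr}(\Lambda_n^6)} + \frac{1}{2^4}\mathrm{Tr}(\Lambda_n^4)\\
&\to 0
\end{aligned}$$

as $n \to \infty$. This finishes the proof via the convergence of the moment generating function.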
2.2.2 Convergence of Estimator
Define the estimator of $\alpha_{12}$ as

$$\hat{\alpha}_{12} = \frac{1}{2}\sum_{u=1}^{m}L_u\log(\bar{Z}^u_{n,12})^2, \tag{2.27}$$

where $\{L_u, u = 1, \ldots, m\}$ is a list of constants satisfying $\sum_{u=1}^{m}L_u = 0$ and $\sum_{u=1}^{m}L_u\log u = 1$.
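One admissible choice of the constants, assumed here for illustration and not necessarily the choice used in this chapter's simulations, is the least-squares slope weights on $\log u$. These satisfy both constraints by construction. The sketch below computes them and evaluates (2.27). It also checks two properties that follow from the constraints. First, when $\bar Z^u_{n,12} = Au^{\alpha_{12}}$ exactly, the estimator returns $\alpha_{12}$ whatever the sign of $A$. Second, a common rescaling of all covariations has no effect, which is why the unknown factor $n^{\alpha_{12}-1}/2$ can be dropped. NumPy is assumed and the helper names are hypothetical.

```python
import numpy as np


def regression_weights(m):
    """L_u = (log u - mean log u) / sum (log u - mean log u)**2, u = 1..m.
    These satisfy sum(L) = 0 and sum(L * log u) = 1."""
    x = np.log(np.arange(1, m + 1))
    xc = x - x.mean()
    return xc / np.sum(xc ** 2)


def alpha12_hat(zbars, L=None):
    """Estimator (2.27): 0.5 * sum_u L_u * log(zbar_u ** 2)."""
    zbars = np.asarray(zbars, dtype=float)
    if L is None:
        L = regression_weights(len(zbars))
    return 0.5 * np.sum(L * np.log(zbars ** 2))


u = np.arange(1, 5)
z = -0.8 * u ** 1.3                        # idealized covariations A * u**alpha12, A < 0
print(alpha12_hat(z), alpha12_hat(5.0 * z))  # both equal 1.3 up to rounding
```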
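Plugging in the definition of $\bar{Z}^u_{n,12}$ given in (2.5), $\hat{\alpha}_{12}$ is a function of the observed process $X^u_n$ and the
increment $a$ only:

$$\begin{aligned}
\hat{\alpha}_{12} &= \frac{1}{2}\sum_{u=1}^{m}L_u\log\left(\frac{1}{2}n^{\alpha_{12}-1}(X^u_n)^T\begin{pmatrix} 0 & I_n \otimes (aa^T)\\ I_n \otimes (aa^T) & 0 \end{pmatrix}X^u_n\right)^2\\
&= \frac{1}{2}\sum_{u=1}^{m}L_u\log\left((X^u_n)^T\begin{pmatrix} 0 & I_n \otimes (aa^T)\\ I_n \otimes (aa^T) & 0 \end{pmatrix}X^u_n\right)^2,
\end{aligned} \tag{2.28}$$

where $X^u_n = ((X^u_{n,1})^T, (X^u_{n,2})^T)^T$, and the second equality holds because $\sum_{u=1}^{m}L_u = 0$, so the factor $\frac{1}{2}n^{\alpha_{12}-1}$, which does not depend on $u$, drops out.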
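Theorem 4. Assume the increment $a = (a_{-J}, a_{1-J}, \ldots, a_J)^T$ of order $p$ satisfies

$$\sum_{k,l=-J}^{J} a_ka_l|k - l|^{\alpha_{12}} \neq 0,$$

and (2.4) holds for $q = 2p + 3$ and $i, j \in \{1, 2\}$. If $\alpha_{11} + \alpha_{22} < 2\alpha_{12} < \alpha_{11} + \alpha_{22} + 1 < 4p + 4$ or
$4p + 3 < \alpha_{11} + \alpha_{22} < 2\alpha_{12} < 4p + 4$, then $\hat{\alpha}_{12} \xrightarrow{a.s.} \alpha_{12}$ as $n \to \infty$.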
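Proof. It follows from Lemma 1 and the Borel–Cantelli Lemma that for all $u = 1, \ldots, m$,

$$\frac{(\bar{Z}^u_{n,12})^2}{E[(\bar{Z}^u_{n,12})^2]} \xrightarrow{a.s.} 1 \quad \text{as } n \to \infty.$$

When $\alpha_{11} + \alpha_{22} < 2\alpha_{12} < \alpha_{11} + \alpha_{22} + 1 < 4p + 4$ or $4p + 3 < \alpha_{11} + \alpha_{22} < 2\alpha_{12} < 4p + 4$, (2.7) and
(2.10) imply that

$$E[(\bar{Z}^u_{n,12})^2] = \mathrm{Var}(\bar{Z}^u_{n,12}) + (E\bar{Z}^u_{n,12})^2 \to A^2u^{2\alpha_{12}},$$

where $A = -\rho\sigma_1\sigma_2c_{12}\sum_{k,l} a_ka_l|k - l|^{\alpha_{12}}$. When $\sum_{k,l} a_ka_l|k - l|^{\alpha_{12}} \neq 0$, $\hat{\alpha}_{12}$ defined in (2.27) can
be written as

$$\begin{aligned}
\hat{\alpha}_{12} &= \frac{1}{2}\sum_{u=1}^{m}L_u\left(\log\frac{(\bar{Z}^u_{n,12})^2}{E[(\bar{Z}^u_{n,12})^2]} + \log E[(\bar{Z}^u_{n,12})^2]\right)\\
&= \frac{1}{2}\sum_{u=1}^{m}L_u\log\frac{(\bar{Z}^u_{n,12})^2}{E[(\bar{Z}^u_{n,12})^2]} + \frac{1}{2}\sum_{u=1}^{m}L_u\log E[(\bar{Z}^u_{n,12})^2]\\
&\xrightarrow{a.s.} \frac{1}{2}\sum_{u=1}^{m}L_u\log 1 + \frac{1}{2}\sum_{u=1}^{m}L_u\log(A^2u^{2\alpha_{12}}) = \alpha_{12}
\end{aligned}$$

as $n \to \infty$ by the continuous mapping theorem.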
To derive the asymptotic normality of $\hat{\alpha}_{12}$, we further assume that as $t \to 0$,

$$C_{12}(t) = C_{21}(t) = \rho\sigma_1\sigma_2\left(1 - c_{12}|t|^{\alpha_{12}} + O(|t|^{\alpha_{12}+\beta_{12}})\right), \tag{2.29}$$

for some $\beta_{12} > 0$. It follows from (2.7) that $E[Z^u_{n,12}(j)] = Au^{\alpha_{12}} + O(n^{-\beta_{12}})$. The following
corollary is straightforward when a further assumption is made on $\beta_{12}$.
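Corollary 1. Under the conditions of Theorem 2, if $\alpha_{12} + \beta_{12} > (\alpha_{11} + \alpha_{22} + 1)/2$, then

$$n^{1/2+(\alpha_{11}+\alpha_{22})/2-\alpha_{12}}(\bar{Z}_{n,12} - A\phi) \xrightarrow{d} N(0, \Phi) \tag{2.30}$$

as $n \to \infty$, where $\phi \in \mathbb{R}^m$ and $\phi_j = j^{\alpha_{12}}$, $j = 1, \ldots, m$.
The asymptotic normality of $\hat{\alpha}_{12}$ then follows from the multivariate delta method.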
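Theorem 5. Take $p \ge 1$ and assume (2.4) holds for $q = 2p + 2$. When $A \neq 0$, if $\alpha_{11} + \alpha_{22} < 2\alpha_{12}$
and $\alpha_{12} + \beta_{12} > (\alpha_{11} + \alpha_{22} + 1)/2$, then

$$n^{1/2+(\alpha_{11}+\alpha_{22})/2-\alpha_{12}}(\hat{\alpha}_{12} - \alpha_{12}) \xrightarrow{d} N(0, A^{-2}\tilde{L}^T\Phi\tilde{L}) \tag{2.31}$$

as $n \to \infty$, where $\tilde{L} = (L_1, L_2/2^{\alpha_{12}}, \ldots, L_m/m^{\alpha_{12}})^T \in \mathbb{R}^m$.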
Proof. Define a mapping 𝑓 : R𝑚 → R by
𝑓 (𝑥) =
1
2
Õ𝑚
𝑢=1
𝐿𝑢 log 𝑥2
𝑢, ∀𝑥 = (𝑥1, . . . , 𝑥𝑚) ∈ R𝑚.
Then 𝑓 (¯
𝑍
𝑛,12) = ˆ 𝛼12, 𝑓 (𝐴𝜙) = 𝛼12. When 𝐴 ≠ 0, 𝑓 is continuously differentiable in a neighborhood
of 𝐴𝜙 and ∇ 𝑓 (𝐴𝜙) = 𝐴−1˜
𝐿
.
Use the multivariate Taylor’s theorem,
𝑛1/2+(𝛼11+𝛼22)/2−𝛼12 ( ˆ 𝛼12 − 𝛼12) = 𝑛1/2+(𝛼11+𝛼22)/2−𝛼12∇ 𝑓 (𝐴𝑛) (¯𝑍𝑛,12 − 𝐴𝜙),
where |𝐴𝑛 − 𝐴𝜙| < |¯
𝑍
𝑛,12 − 𝐴𝜙|. As 𝑛 → ∞, Theorem 1 implies ¯
𝑍
𝑛,12
𝑃 →
𝐴𝜙, so we also have
𝐴𝑛
𝑃 →
𝐴𝜙. Applying the continuous mapping theorem, ∇ 𝑓 (𝐴𝑛) 𝑃 → ∇ 𝑓 (𝐴𝜙). It follows from
Corollary 1 and Slutsky’s theorem that as 𝑛 → ∞,
𝑛1/2+(𝛼11+𝛼22)/2−𝛼12∇ 𝑓 (𝐴𝑛) (¯
𝑍
𝑛,12 − 𝐴𝜙) 𝑑 →∇ 𝑓 (𝐴𝜙)𝑁(0,Φ) 𝑑 = 𝑁(0, 𝐴−2˜
𝐿
𝑇Φ𝐿).
This finishes the proof.
Take 𝑝 = 1, 𝐽 = 1, and 𝑎 = (1, −2, 1)𝑇 . As was studied by Kent and Wood (1997) and Zhou
and Xiao (2018), the estimators
ˆ 𝛼𝑖𝑖 =
Õ𝑚
𝑢=1
𝐿𝑖,𝑢 log ¯
𝑍
𝑢
𝑛,𝑖 , 𝑖 = 1, 2 (2.32)
are strongly consistent and jointly converge in distribution to a multivariate Gaussian distribution,
where ¯
𝑍
𝑢
𝑛,𝑖 ’s are defined in (2.20), 𝐿𝑖,𝑢’s are constants such that
Í𝑚
𝑢=1 𝐿𝑖,𝑢 = 0 and
Í𝑚
𝑢=1 𝐿𝑖,𝑢 log 𝑢 =
1. The following theorem presents the joint asymptotic distribution of ˆ 𝛼11, ˆ 𝛼22, and ˆ 𝛼12 as 𝑛 → ∞.
20
Theorem 6. Assume that as |𝑡 | → 0, (2.29) holds with 𝛼12 + 𝛽12 > (𝛼11 + 𝛼22 + 1)/2, and
𝐶𝑖𝑖 (𝑡) = 𝜎2
𝑖
− 𝑐𝑖𝑖 |𝑡 |𝛼𝑖𝑖 + 𝑂(|𝑡 |𝛼𝑖𝑖+𝛽𝑖𝑖 ), 𝑖 = 1, 2
for some constants 𝛽11, 𝛽22 > 1/2. If 2𝛼12 > 𝛼11 + 𝛼22, 𝛼12 ≠ 2, and (2.4) holds for 𝑞 = 4, then as
𝑛 → ∞,
𝑛𝐷𝛼
©­­­­­
«
ˆ 𝛼11 − 𝛼11
ˆ 𝛼22 − 𝛼22
ˆ 𝛼12 − 𝛼12
ª®®®®®
¬
𝑑 →
𝑁
©­­­­­
«
0,
©­­­­­
«
𝐴−2
1 ˜
𝐿
𝑇1
Φ1˜
𝐿
1
𝐴−2
2 ˜
𝐿
𝑇2
Φ2˜
𝐿
2
𝐴−2˜
𝐿
𝑇3
Φ˜
𝐿
3
ª®®®®®
¬
ª®®®®®
¬
, (2.33)
where 𝐴𝑖 = 𝑐𝑖𝑖 (8 − 2𝛼𝑖𝑖+1) and ˜
𝐿
𝑖 = (𝐿𝑖,1, 𝐿𝑖,2/2𝛼𝑖𝑖 , . . . , 𝐿𝑖,𝑚/𝑚𝛼𝑖𝑖 )𝑇 ∈ R𝑚 for 𝑖 = 1, 2, 𝐴 =
𝜌𝜎1𝜎2𝑐12(8 − 2𝛼12+1), ˜
𝐿
3 = (𝐿3,1, 𝐿3,2/2𝛼12 , . . . , 𝐿3,𝑚/𝑚𝛼12 )𝑇 ∈ R𝑚, the matrices Φ1,Φ2,Φ ∈
R𝑚×𝑚 and 𝐷𝛼 are as defined in Theorem 3.
Proof. When 𝑎 = (1, −2, 1)𝑇 , we have
𝐴 = −𝜌𝜎1𝜎2𝑐12
Õ𝐽
𝑘,𝑙=−𝐽
𝑎𝑘𝑎𝑙 |𝑘 − 𝑙 |𝛼12 = 𝜌𝜎1𝜎2𝑐12 (8 − 2𝛼12+1).
It follows from (2.7) and Equation (14) in Zhou and Xiao (2018) that as 𝑛 → ∞,
𝑛𝐷𝛼
©­­­­­
«
𝐸¯
𝑍
𝑛,1 − 𝐴1𝜙1
𝐸¯
𝑍
𝑛,2 − 𝐴2𝜙2
𝐸¯
𝑍
𝑛,12 − 𝐴𝜙
ª®®®®®
¬
=
©­­­­­
«
𝑂

𝑛1/2−𝛽11

𝑂

𝑛1/2−𝛽22

𝑂

𝑛(1+𝛼11+𝛼22)/2−𝛼12−𝛽12

ª®®®®®
¬
→ 0 (2.34)
if 𝛽11, 𝛽22 > 1/2 and 𝛼12 + 𝛽12 > (𝛼11 + 𝛼22 + 1)/2, where 𝜙𝑖 = (1, 2𝛼𝑖𝑖 , . . . , 𝑚𝛼𝑖𝑖 )𝑇 for 𝑖 = 1, 2,
and 𝜙 = (1, 2𝛼12 , . . . , 𝑚𝛼12 )𝑇 . Together with Theorem 3 this implies that
𝑛𝐷𝛼
©­­­­­
«
¯𝑍
𝑛,1 − 𝐴1𝜙1
¯𝑍
𝑛,2 − 𝐴2𝜙2
¯𝑍
𝑛,12 − 𝐴𝜙
ª®®®®®
¬
𝑑 →
𝑁
©­­­­­
«
0,
©­­­­­
«
Φ1
Φ2
Φ
ª®®®®®
¬
ª®®®®®
¬
(2.35)
as 𝑛 → ∞.
Define a mapping f : R2𝑚
>0
× R ↦→ R3 as
f(x) =
©­­­­­
«
Í𝑚
𝑢=1 𝐿1,𝑢 log 𝑥1,𝑢
Í𝑚
𝑢=1 𝐿2,𝑢 log 𝑥2,𝑢
1
2
Í𝑚
𝑢=1 𝐿3,𝑢 log 𝑥2
3,𝑢
ª®®®®®
¬
21
for any x = (𝑥1,1, . . . , 𝑥1,𝑚, 𝑥2,1, . . . , 𝑥2,𝑚, 𝑥3,1, . . . , 𝑥3,𝑚) ∈ R2𝑚
>0
×R, where 𝐿𝑖,𝑢’s are constants such
that
Í𝑚
𝑢=1 𝐿𝑖,𝑢 = 0 and
Í𝑚
𝑢=1 𝐿𝑖,𝑢 log 𝑢 = 1, ∀𝑖 ∈ {1, 2, 3}. Denote by ¯
𝑍
𝑛 = (¯
𝑍
𝑇
𝑛,1, ¯
𝑍
𝑇
𝑛,2, ¯
𝑍
𝑇
𝑛,12
)𝑇 and
𝝓 = (𝐴1(𝜙1)𝑇 , 𝐴2(𝜙2)𝑇 , 𝐴𝜙𝑇 )𝑇 , then
f(¯
𝑍
𝑛) = ( ˆ 𝛼11, ˆ 𝛼22, ˆ 𝛼12)𝑇 , f(𝝓) = (𝛼11, 𝛼22, 𝛼12)𝑇 .
When 𝛼12 ≠ 2, 𝐴 = 𝜌𝜎1𝜎2𝑐12 (8 − 2𝛼12+1) ≠ 0 and f is thus continuously differentiable in a
neighborhood of 𝝓. Moreover, ∇f(𝝓) = (𝐴−1
1 ˜
𝐿
𝑇1
, 𝐴−1
2 ˜
𝐿
𝑇2
, 𝐴−1˜
𝐿
𝑇3
)𝑇 .
In a similar manner as in the proof of Theorem 5, it could be proved that as 𝑛 → ∞,
𝑛𝐷𝛼
©­­­­­
«
ˆ 𝛼11 − 𝛼11
ˆ 𝛼22 − 𝛼22
ˆ 𝛼12 − 𝛼12
ª®®®®®
¬
𝑑 →
∇f(𝝓)𝑁
©­­­­­
«
0,
©­­­­­
«
Φ1
Φ2
Φ
ª®®®®®
¬
ª®®®®®
¬
𝑑 =
𝑁
©­­­­­
«
0,
©­­­­­
«
𝐴−2
1 ˜
𝐿
𝑇1
Φ1˜
𝐿
1
𝐴−2
2 ˜
𝐿
𝑇2
Φ2˜
𝐿
2
𝐴−2˜
𝐿
𝑇3
Φ˜
𝐿
3
ª®®®®®
¬
ª®®®®®
¬
.
This finishes the proof.
2.2.3 Simulation
Denote by 𝑀𝜈 the Matérn covariance function with parameter 𝜈. Namely,
𝑀𝜈 (𝑡) = 21−𝜈Γ(𝜈)−1|𝑡 |𝜈𝐾𝜈 (|𝑡 |)
= 1 − Γ(1 − 𝜈)
4𝜈Γ(1 + 𝜈)
|𝑡 |2𝜈 + 1
4(1 − 𝜈)
|𝑡 |2 + 𝑂(|𝑡 |2𝜈+2) + 𝑂(|𝑡 |4) as 𝑡 → 0.
Take 𝐶11 = 𝐶22 = 𝑀0.5 and 𝐶12 = 𝐶21 = 0.5𝑀0.55. Let 𝑚 = 50, 𝑝 = 1, 𝑎 = (1, −2, 1)𝑇 and
𝑛 ∈ {200, 250, . . . , 1500}. For each value of 𝑛, generate 3000 independent realizations of the
process 𝑋. In this case, 𝜎1 = 𝜎2 = 1, 𝛼11 = 𝛼22 = 1, 𝜌 = 0.5, 𝛼12 = 1.1 > (𝛼11 +𝛼22)/2, 𝛽12 = 0.9,
𝑐12 = 0.51.1Γ(1 − 0.55)/Γ(1 + 0.55), 𝑐11 = 𝑐22 = 0.5Γ(0.5)/Γ(1.5),
𝐴 = −𝜌𝜎1𝜎2𝑐12
Õ
𝑘,𝑙
𝑎𝑘𝑎𝑙 |𝑘 − 𝑙 |𝛼12 = 𝑐12 (4 − 21.1) ≈ 1.9177 ≠ 0,
22
𝛼12 + 𝛽12 = 2 > 3/2 = (𝛼11 + 𝛼22 + 1)/2.
It follows from Theorem 2 that ∀𝑢 = 1, . . . , 𝑚,
Φ𝑢,𝑢 = 𝐴11𝐴22
Õ∞
ℎ=−∞
Õ𝐽
𝑠,𝑡, 𝑗 ,𝑙=−𝐽
𝑎𝑠𝑎𝑡𝑎 𝑗𝑎𝑙 |ℎ + 𝑠𝑢 − 𝑡𝑣|𝛼11 |ℎ + 𝑗𝑢 − 𝑙𝑣|𝛼22
= (𝐴11)2
Õ∞
ℎ=−∞
(6|ℎ| − 4|ℎ + 𝑢| + |ℎ + 2𝑢| − 4|ℎ − 𝑢| + |ℎ + 2𝑢|)2
=

Γ(0.5)
2Γ(1.5)
2
16𝑢2 + 2
Õ𝑢
ℎ=1
(6ℎ − 4(ℎ + 𝑢) + 4𝑢 − 4(𝑢 − ℎ))2
+2
Õ2𝑢
ℎ=𝑢+1
(6ℎ − 4(ℎ + 𝑢) + 4𝑢 − 4(ℎ − 𝑢))2 + 2
Õ∞
ℎ=2𝑢+1
(6ℎ − 4(ℎ + 𝑢) + 2ℎ − 4(ℎ − 𝑢))2
!
=
8
3
(4𝑢3 + 5𝑢)
is the asymptotic marginal variance of 𝑛1/2+(𝛼11+𝛼22)/2−𝛼12¯
𝑍
𝑢
𝑛,12 as (2.15) presented. The empirical
marginal distributions of ¯
𝑍
𝑢
𝑛,12 (𝑢 = 1, 10, 20, 30, 40, 50) when 𝑛 = 1500 are shown in Figure 2.1,
where 3000 realizations are presented in the histogram.
Take ˆ 𝛼12 as the ordinary least squares estimator for 𝛽1 in the linear regression model
1
2
log(¯
𝑍
𝑛,12)2 =
©­­­­­­­­
«
1 log 1
1 log 2
...
...
1 log𝑚
ª®®®®®®®®
¬
©­­
«
𝛽0
𝛽1
ª®®
¬
,
then as was simplified by Kent and Wood (1997),
ˆ 𝛼12 =
1
2
Õ𝑚
𝑢=1
log 𝑢 − 1
𝑚
Í𝑚𝑣
=1 log 𝑣
Í𝑚
𝑢=1

log 𝑢 − 1
𝑚
Í𝑚𝑣
=1 log 𝑣
2 log(¯
𝑍
𝑢
𝑛,12
)2,
which is an example of the estimator defined in (2.27). Since conditions in Theorem 4 are satisfied,
ˆ 𝛼12 is a strongly consistent estimator for 𝛼12. The asymptotic normality follows from Theorem 5.
Figure 2.3 and 2.2 confirm these claims.
2.3 Irregular Sampling
Since regularly spaced data is not always available, it is of practical importance to study estimators
of the smoothness parameter based on irregular sampling designs. Given observations of
23
u = 1
Density
−20 −10 0 10 20
0.00 0.02 0.04 0.06 0.08
u = 10
Density
−400 −200 0 200 400
0.000 0.001 0.002 0.003 0.004
u = 20
Density
−1000 −500 0 500 1000
0.0000 0.0004 0.0008 0.0012
u = 30
Density
−2000 −1000 0 1000 2000
0e+00 2e−04 4e−04 6e−04
u = 40
Density
−2000 0 2000 4000
0e+00 2e−04 4e−04
u = 50
Density
−4000 −2000 0 2000 4000 6000
0e+00 1e−04 2e−04 3e−04
Figure 2.1The empirical distribution of
√
𝑛1−2𝛼12+𝛼11+𝛼22 (¯
𝑍
𝑢
𝑛
− 𝐴𝑢𝛼12 ) when 𝑛 = 1500 with 3000
realizations. The red curve is the density function of 𝑁(0, 8(4𝑢3 + 5𝑢)/3).
n1-2a12+a11+a22(a^
12 - a12)
Density
−40 −20 0 20 40
0.00 0.04 0.08 0.12
Figure 2.2The empirical distribution of
√
𝑛1−2𝛼12+𝛼11+𝛼22 ( ˆ 𝛼12 − 𝛼12) when 𝑛 = 1500 with 3000
realizations. The red curve is the density function of 𝑁(0, 𝐴−2˜
𝐿
𝑇Φ𝑛˜
𝐿
), where Φ𝑛 is the empirical
covariance matrix of ¯
𝑍
𝑛,12 with 3000 realizations when 𝑛 = 1500.
24
200 400 600 800 1000 1200 1400
0.20 0.25 0.30
n
Bias
Figure 2.3The average absolute value of bias among 3000 realizations when
𝑛 = 200, 250, . . . , 1500.
a Gaussian process, constructing quadratic variations of a certain order is an essential step when
defining increment-based estimators of the smoothness parameter. When the observation locations
are not evenly spaced, coefficients of the increment discussed in Section 2.2 will be related to distances
between sampling points. Begyn (2005), Loh (2015), and Loh et al. (2021) proposed several
irregular sampling designs, based on which the infill asymptotic properties of quadratic variations
are studied. Details of the irregular sampling designs are included in Appendix A.
In Section 2.3.1, we discuss the joint behaviors of quadratic variations for two coordinates in the
bivariate model based on the deformed sampling design. In Section 2.3.2, we define a strong consistent
estimator for the cross smoothness parameter and present the rate of almost sure convergence
for estimators based on the stratified sampling design.
2.3.1 Quadratic Variations
Consider a special case of the bivariate stationary Gaussian process 𝑋(𝑡) = (𝑋1 (𝑡), 𝑋2(𝑡)) defined
in (2.1-2.3). Let the autocovariance function for each coordinate of 𝑋 and the cross-covariance
25
function of 𝑋 all take the following form such that ∀𝑡, 𝑠 ∈ R and ∀𝑖, 𝑗 ∈ {1, 2},
𝐶𝑖 𝑗 (𝑡) =
⌊𝛼Õ𝑖 𝑗/2⌋
𝑘=0
𝛽𝑘 (𝜃𝑖 𝑗 |𝑡 |)2𝑘 + 𝛽∗
𝛼𝑖 𝑗𝐺𝛼𝑖 𝑗
(𝜃𝑖 𝑗 |𝑡 |) + 𝑂(|𝑡 |𝛼𝑖 𝑗+𝜏) (2.36)
as |𝑡 | → 0 for some constant 𝜏 > 0, where 𝛽0 = 𝜎𝑖𝜎𝑗 (𝜌+(1−𝜌)1𝑖=𝑗 ), ⌊𝑥⌋ = max{𝑥0 ∈ Z : 𝑥0 < 𝑥},
𝛽∗
𝛼𝑖 𝑗 ≠ 0, and 𝐺𝛼𝑖 𝑗 : [0, ∞) ↦→ R is defined by
𝐺𝛼𝑖 𝑗
(𝑥) = 𝑥𝛼𝑖 𝑗 + 𝑥𝛼𝑖 𝑗 (log 𝑥 − 1)1Z(𝛼𝑖 𝑗/2)
when 𝑥 > 0 and 𝐺𝛼𝑖 𝑗
(0) = 0.
Under the setting of deformed sampling design defined in (A.3), we study the cross-covariance
of quadratic variations defined in (A.6) for coordinates 𝑋1 and 𝑋2.
Proposition 1. For dilation 𝜃 ∈ {1, 2} and the order of increment ℓ ∈ {1, 2, . . . , ⌊(𝑛 − 1)/𝜃⌋},
𝐸(𝑉1
𝜃,ℓ𝑉2
𝜃,ℓ
)
𝐸𝑉1
𝜃,ℓ𝐸𝑉2
𝜃,ℓ
=
8>>>>>>>>
<
>>>>>>>>:
𝑂(𝑛𝛼11+𝛼22−2𝛼12−1) if 𝛼12 < 2ℓ − 1/2,
𝑂(𝑛𝛼11+𝛼22−2𝛼12−1 log 𝑛) if 𝛼12 = 2ℓ − 1/2,
𝑂(𝑛𝛼11+𝛼22−4ℓ) if 𝛼12 > 2ℓ − 1/2,
where 𝑉𝑖𝜃
,ℓ is the quadratic variation of 𝑋𝑖 (𝑖 = 1, 2) as defined in (A.6).
Proof. For the brevity of symbols, denote by 𝑎𝑖 = (𝑎𝜃,ℓ;𝑖,𝑘 )ℓ
𝑘=0 the vector of increment defined in
(A.4). Write 𝑋 𝑗
𝑖 = (𝑋𝑗 (𝑡𝑖+𝜃𝑘 ))ℓ
𝑘=0 and ∇𝜃,ℓ𝑋 𝑗
𝑖 = 𝑎𝑇𝑖
𝑋 𝑗
𝑖 . Then
𝐸(𝑉1
𝜃,ℓ𝑉2
𝜃,ℓ
) = 𝐸
𝑛Õ−𝜃ℓ
𝑖, 𝑗=1
©­
«
Õℓ
𝑘=0
𝑎𝜃,ℓ;𝑖,𝑘 𝑋1(𝑡𝑖+𝜃𝑘 )
!2
Õℓ
𝑘=0
𝑎𝜃,ℓ; 𝑗 ,𝑘 𝑋2(𝑡 𝑗+𝜃𝑘 )
!2
ª®
¬
= 𝐸
𝑛Õ−𝜃ℓ
𝑖, 𝑗=1

𝑎𝑇𝑖
𝑋1
𝑖
2 
𝑎𝑇𝑗
𝑋2
𝑗
2
=
𝑛Õ−𝜃ℓ
𝑖, 𝑗=1

𝐸

(𝑋1
𝑖
)𝑇 (𝑎𝑖𝑎𝑇𝑖
)𝑋1
𝑖

𝐸
h
(𝑋2
𝑗
)𝑇 (𝑎 𝑗𝑎𝑇𝑗
)𝑋2
𝑗
i
+ 2

𝐸
h
(𝑋1
𝑖
)𝑇 (𝑎𝑖𝑎𝑇𝑗
)𝑋2
𝑗
i 2
=
𝑛Õ−𝜃ℓ
𝑖, 𝑗=1
𝐸(∇𝜃,ℓ𝑋1
𝑖
)2𝐸(∇𝜃,ℓ𝑋2
𝑗
)2 + 2
𝑛Õ−𝜃ℓ
𝑖, 𝑗=1

𝐸
h
(𝑋1
𝑖
)𝑇 (𝑎𝑖𝑎𝑇𝑗
)𝑋2
𝑗
i 2
.
26
By Theorem 1 (a) in Loh (2015),
𝑛Õ−𝜃ℓ
𝑖, 𝑗=1
𝐸(∇𝜃,ℓ𝑋1
𝑖
)2𝐸(∇𝜃,ℓ𝑋2
𝑗
)2 = 𝐸𝑉1
𝜃,ℓ𝐸𝑉2
𝜃,ℓ = 𝑂(𝑛2ℓ+1−𝛼11 ) · 𝑂(𝑛2ℓ+1−𝛼22 ) (2.37)
as 𝑛 → ∞.
With the cross-covariance function defined in (2.36),
𝑛Õ−𝜃ℓ
𝑖, 𝑗=1

𝐸
h
(𝑋1
𝑖
)𝑇 (𝑎𝑖𝑎𝑇𝑗
)𝑋2
𝑗
i 2
= 𝑂
©­­
«
𝑛Õ−𝜃ℓ
𝑖, 𝑗=1
©­
«
Õℓ
𝑝,𝑞=0
𝑎𝜃,ℓ;𝑖,𝑝𝑎𝜃,ℓ; 𝑗 ,𝑞 |𝑡𝑖+𝜃 𝑝 − 𝑡 𝑗+𝜃𝑞 |𝛼12ª®
¬
2
ª®®
¬
as 𝑛 → ∞. The properties of ℓth order increment imply that as 𝑛 → ∞,
𝑛Õ−𝜃ℓ
𝑖, 𝑗=1
©­
«
Õℓ
𝑝,𝑞=0
𝑎𝜃,ℓ;𝑖,𝑝𝑎𝜃,ℓ; 𝑗 ,𝑞 |𝑡𝑖+𝜃 𝑝 − 𝑡 𝑗+𝜃𝑞 |𝛼12ª®
¬
2
=
Õ
|𝑖−𝑗 |≤𝜃ℓ+1
©­
«
Õℓ
𝑝,𝑞=0
𝑂(𝑛2ℓ)

𝑖 − 𝑗 + 𝜃(𝑝 − 𝑞)
𝑛 − 1 𝜑(1) (0) + 𝑂(𝑛−2)
𝛼12ª® ¬
2
+
Õ
|𝑖−𝑗 |>𝜃ℓ+1
©­
«
Õℓ
𝑝,𝑞=0
𝑎𝜃,ℓ;𝑖,𝑝𝑎𝜃,ℓ; 𝑗 ,𝑞 |𝑡𝑖+𝜃 𝑝 − 𝑡 𝑗+𝜃𝑞 |𝛼12ª®
¬
2
:=𝐴𝑛 + 𝐵𝑛,
where 𝐴𝑛 = 𝑂(𝑛1+4ℓ−2𝛼12 ) and
𝐵𝑛 ≤
Õ
|𝑖−𝑗 |>𝜃ℓ+1
©­
«
Õℓ
𝑝,𝑞=0
|𝑎𝜃,ℓ;𝑖,𝑝𝑎𝜃,ℓ; 𝑗 ,𝑞 | · |𝑡𝑖+𝜃 𝑝 − 𝑡 𝑗+𝜃𝑞 |𝛼12+2ℓ−2ℓª®
¬
2
≤
Õ
|𝑖−𝑗 |>𝜃ℓ+1
©­
«
max
0≤𝑝,𝑞≤ℓ
|𝑡𝑖+𝜃 𝑝 − 𝑡 𝑗+𝜃𝑞 |𝛼12−2ℓ
Õℓ
𝑝,𝑞=0
|𝑎𝜃,ℓ;𝑖,𝑝𝑎𝜃,ℓ; 𝑗 ,𝑞 (𝑡𝑖+𝜃 𝑝 − 𝑡 𝑗+𝜃𝑞)2ℓ |ª®
¬
2
= 𝑂(1)
Õ
|𝑖−𝑗 |>𝜃ℓ+1
max
0≤𝑝,𝑞≤ℓ
|𝑡𝑖+𝜃 𝑝 − 𝑡 𝑗+𝜃𝑞 |2𝛼12−4ℓ
= 𝑂(𝑛2)
¹ 1
1/𝑛
𝑠2𝛼12−4ℓ𝑑𝑠.
Thus,
𝑛Õ−𝜃ℓ
𝑖, 𝑗=1
©­
«
Õℓ
𝑝,𝑞=0
𝑎𝜃,ℓ;𝑖,𝑝𝑎𝜃,ℓ; 𝑗 ,𝑞 |𝑡𝑖+𝜃 𝑝 − 𝑡 𝑗+𝜃𝑞 |𝛼12ª®
¬
2
=
8>>>>>>>>
<
>>>>>>>>:
𝑂(𝑛1+4ℓ−2𝛼12 ) if 𝛼12 < 2ℓ − 1/2,
𝑂(𝑛2 log 𝑛) if 𝛼12 = 2ℓ − 1/2,
𝑂(𝑛2) if 𝛼12 > 2ℓ − 1/2.
27
This finishes the proof together with (2.37).
For a stationary GRF 𝑋 on R𝑑 with zero mean and the isotropic Matérn covariance function
𝐶(t) =
𝜎2(𝜂||t||)𝜈
2𝜈−1Γ(𝜈)
𝜅𝜈 (𝜂||t||), ∀t ∈ R𝑑, (2.38)
where 𝜎, 𝜂, 𝜈 > 0 are constants, we discuss the finite sample joint distribution of 𝑉1,1,ℓ and 𝑉2,1,ℓ
in the remaining of this section. The quadratic variations 𝑉𝜃,𝑑,ℓ are defined in (A.21). Consider the
case when 𝑑 = 1 and 0 < 𝜈 < ℓ, 𝜈 ∉ Z. Write
∇𝜃,ℓ𝑋 =
􀀀
∇𝜃,1,ℓ𝑋𝑖
𝑛−2ℓ𝜔𝑛
𝑖=1
and denote by 𝑉𝑢𝑣 (𝑛, ℓ) = (∇𝑢,ℓ𝑋)𝑇∇𝑣,ℓ𝑋, 𝑊𝑢𝑣 (𝑛, ℓ) = 𝐶𝑜𝑣(∇𝑢,ℓ𝑋, ∇𝑣,ℓ𝑋) for 𝑢, 𝑣 ∈ {1, 2}. For
the brevity, write 𝑉𝑢𝑣 (𝑛, ℓ) as 𝑉𝑢𝑣 and 𝑊𝑢𝑣 (𝑛, ℓ) as 𝑊𝑢𝑣 in the following text.
It follows from Eq.(15) in Loh et al. (2021) that as 𝑛 → ∞,
(𝑊𝑢𝑣)𝑖,𝑖+ℎ
=𝛽∗
𝜈
Õℓ
𝑗 ,𝑘=0
𝑐i,𝑢,1,ℓ ( 𝑗 )𝑐i+h,𝑣,1,ℓ (𝑘)

ℎ + (𝑣𝑘 − 𝑢 𝑗 )𝜔𝑛 + 𝛿𝑖+ℎ,𝑘 − 𝛿𝑖, 𝑗
𝑛

2𝜈
+ 𝑂
𝜔𝑛
𝑛
2ℓ

+ 𝑂
𝜔𝑛
𝑛
2𝜈+2
=𝛽∗
𝜈
Õℓ
𝑗 ,𝑘=0

𝑐ℓ ( 𝑗 ) + 𝑂(𝜔−1
𝑛
)
 
𝑐ℓ (𝑘) + 𝑂(𝜔−1
𝑛
)
ℎ + (𝑣𝑘 − 𝑢 𝑗 )𝜔𝑛 + 𝛿𝑖+ℎ,𝑘 − 𝛿𝑖, 𝑗
𝑛

2𝜈
+ 𝑜
𝜔𝑛
𝑛
2𝜈

=𝛽∗
𝜈
Õℓ
𝑗 ,𝑘=0
𝑐ℓ ( 𝑗 )𝑐ℓ (𝑘)

ℎ + (𝑣𝑘 − 𝑢 𝑗 )𝜔𝑛 + 𝛿𝑖+ℎ,𝑘 − 𝛿𝑖, 𝑗
𝑛

2𝜈
+ 𝑜
𝜔𝑛
𝑛
2𝜈

=
𝜔𝑛
𝑛
2𝜈
𝛽∗
𝜈
Õℓ
𝑗 ,𝑘=0
𝑐ℓ ( 𝑗 )𝑐ℓ (𝑘)

𝑣𝑘 − 𝑢 𝑗 +
ℎ + 𝛿𝑖+ℎ,𝑘 − 𝛿𝑖, 𝑗
𝜔𝑛

2𝜈
+ 𝑜
𝜔𝑛
𝑛
2𝜈

(2.39)
for any 1 ≤ 𝑖 ≤ 𝑖 + ℎ ≤ 𝑛 − 2ℓ𝜔𝑛. Denote by 𝑎𝑢𝑣 (𝜈, ℓ) = 𝛽∗
𝜈
Íℓ
𝑗
,𝑘=0 𝑐ℓ ( 𝑗 )𝑐ℓ (𝑘) |𝑣𝑘 − 𝑢 𝑗 |2𝜈, then
∀1 ≤ 𝑖 ≤ 𝑖 + ℎ ≤ 𝑛 − 2ℓ𝜔𝑛,
(𝑛/𝜔𝑛)2𝜈 (𝑊𝑢𝑣)𝑖,𝑖+ℎ → 𝑎𝑢𝑣 (𝜈, ℓ) (2.40)
as 𝑛 → ∞.
28
Take 𝜖 ∼ 𝑁(0, 𝐼𝑛−2ℓ𝜔𝑛
), then for 𝜃 = 1, 2,
(𝑛/𝜔𝑛)2𝜈
𝑛 − 2ℓ𝜔𝑛
𝑉𝜃,1,ℓ
𝑑 =
(𝑛/𝜔𝑛)2𝜈
𝑛 − 2ℓ𝜔𝑛
𝜖𝑇𝑊𝜃𝜃𝜖
𝑑 =
𝜖𝑇
 (𝑛/𝜔𝑛)2𝜈
𝑛 − 2ℓ𝜔𝑛
𝑑𝑖𝑎𝑔(eig(𝑊𝜃𝜃 ))

𝜖 := 𝜖𝑇Λ𝜃𝑛
𝜖,
the cumulant generating function of which is
log 𝐸𝑒𝑡𝜖𝑇Λ𝜃
𝑛 𝜖 =
𝑛−Õ2ℓ𝜔𝑛
𝑘=1
log(1 − 2𝑡𝜆𝑘 )−1/2
=
1
2
Õ∞
𝑚=1
(2𝑡)𝑚
𝑚
𝑛−Õ2ℓ𝜔𝑛
𝑘=1
𝜆𝑚𝑘
,
where 𝑡 < min(𝜆−1
𝑘
) and 𝜆𝑘 , 𝑘 = 1, . . . , 𝑛 − 2ℓ𝜔𝑛 are diagonal elements of Λ𝜃𝑛
.
Denote by 𝑟𝑛 = (𝑛/𝜔𝑛)2𝜈
𝑛−2ℓ𝜔𝑛
and recall the notation 𝑊𝜃 = 𝑉𝜃,1,ℓ/𝐸𝑉𝜃,1,ℓ for 𝜃 = 1, 2. Write 𝐻𝑛 =
𝑊22−𝑊21𝑊−1
11𝑊12, then ∇2,ℓ𝑋|∇1,ℓ𝑋 ∼ 𝑁(𝑊21𝑊−1
11
∇1,ℓ𝑋, 𝐻𝑛) and the moment generating function
of 𝑉2,1,ℓ |∇1,ℓ𝑋 is
𝑀𝑉2,1,ℓ |∇1,ℓ 𝑋 (𝑡)
=|𝐼 − 2𝑡𝐻𝑛|−1/2 exp

−1
2
(∇1,ℓ𝑋)𝑇𝑊−1
11𝑊12

𝐼 − (𝐼 − 2𝑡𝐻𝑛)−1

𝐻−1
𝑛 𝑊21𝑊−1
11
∇1,ℓ𝑋

, (2.41)
where 𝐼 is the (𝑛 − 2ℓ𝜔𝑛)-dimensional identity matrix. Moreover, the moment generating function
of the vector ˜
𝑉
:= 𝑟𝑛 (𝑉1,1,ℓ,𝑉2,1,ℓ)𝑇 is
𝑀˜
𝑉
(𝑠, 𝑡) =

𝐼2(𝑛−2ℓ𝜔𝑛) − 2
©­­
«
𝑟𝑛𝑡𝐻𝑛 0
0 𝐻𝑠𝑡
𝑛 𝑊11
ª®®
¬

−1/2
, (2.42)
where
𝐻𝑠𝑡
𝑛 = 𝑟𝑛𝑠𝐼 − 1
2𝑊−1
11𝑊12

𝐼 − (𝐼 − 2𝑟𝑛𝑡𝐻𝑛)−1

𝐻−1
𝑛 𝑊21𝑊−1
11 .
29
This is due to the fact that
𝑀˜
𝑉
(𝑠, 𝑡) = 𝐸

𝑒𝑟𝑛 (𝑠𝑉1,1,ℓ+𝑡𝑉2,1,ℓ ) 
= 𝐸

𝑒𝑟𝑛𝑠𝑉1,1,ℓ𝐸

𝑒𝑟𝑛𝑡𝑉2,1,ℓ |∇1,ℓ𝑋
 
= 𝐸

𝑒𝑟𝑛𝑠𝑉1,1,ℓ𝑀𝑉2,1,ℓ |∇1,ℓ 𝑋 (𝑟𝑛𝑡)

= |𝐼 − 2𝑟𝑛𝑡𝐻𝑛|−1/2𝐸
h
exp

(∇1,ℓ𝑋)𝑇𝐻𝑠𝑡
𝑛
∇1,ℓ𝑋
i
= |𝐼 − 2𝑟𝑛𝑡𝐻𝑛|−1/2𝑀(∇1,ℓ 𝑋)𝑇𝐻𝑠𝑡
𝑛 ∇1,ℓ 𝑋
(1)
= |𝐼 − 2𝑟𝑛𝑡𝐻𝑛|−1/2|𝐼 − 2𝐻𝑠𝑡
𝑛 𝑊11|−1/2
=

𝐼2(𝑛−2ℓ𝜔𝑛) − 2
©­­
«
𝑟𝑛𝑡𝐻𝑛 0
0 𝐻𝑠𝑡
𝑛 𝑊11
ª®®
¬

−1/2
.
2.3.2 Estimating Smoothness Parameters
We first consider a univariate stationary GRF 𝑋 on R𝑑 with zero mean and the isotropic Matérn
covariance function (2.38). Based on the stratified design introduced in Appendix A.2.3, the following
results on the rate of convergence hold for ˆ 𝜈𝑛,ℓ defined in (A.26).
Proposition 2. When 𝑑 ∈ {1, 2, 3} and ℓ ∈ Z+,
1. if 0 < 𝜈 ≤ ℓ − 1, then
𝑛𝑑(1−𝛾0)/2−𝑘 (𝜈ˆ𝑛,ℓ − 𝜈) 𝑎→.𝑠. 0 as 𝑛 → ∞
for any (𝑑(1 − 𝛾0)/2 − 𝛾0) ∨ (𝑑/2 − 2) (1 − 𝛾0) < 𝑘 < 𝑑(1 − 𝛾0)/2;
2. if ℓ − 1 < 𝜈 < ℓ − 𝑑/4, then
𝑛𝑑(1−𝛾0)/2−𝑘 ( ˆ 𝜈𝑛,ℓ − 𝜈) 𝑎→.𝑠. 0 as 𝑛 → ∞
for any (𝑑(1 − 𝛾0)/2 − 𝛾0) ∨ (𝑑/2 − 2ℓ + 2𝜈) (1 − 𝛾0) < 𝑘 < 𝑑(1 − 𝛾0)/2;
3. if 𝜈 = ℓ − 𝑑/4, then
𝑛𝑑(1−𝛾0)/2−𝑘 (log 𝑛)−1/2 ( ˆ 𝜈𝑛,ℓ − 𝜈) 𝑎→.𝑠. 0 as 𝑛 → ∞
30
for any (𝑑(1 − 𝛾0)/2 − 𝛾0) ∨ (𝑑/2 − 2ℓ + 2𝜈) (1 − 𝛾0) < 𝑘 < 𝑑(1 − 𝛾0)/2;
4. if ℓ − 𝑑/4 < 𝜈 < ℓ, then
𝑛(2ℓ−2𝜈) (1−𝛾0)−𝑘 (𝜈ˆ𝑛,ℓ − 𝜈) 𝑎→.𝑠. 0 as 𝑛 → ∞
for any (2ℓ − 2𝜈) (1 − 𝛾0) − 𝛾0 < 𝑘 < (2ℓ − 2𝜈) (1 − 𝛾0).
Proof. Theorem 1(a) in Loh et al. (2021) implies that as 𝑛 → ∞
ˆ 𝜈𝑛,ℓ − 𝜈 =
log(𝑉2,𝑑,ℓ/𝑉1,𝑑,ℓ) − log(22𝜈)
2 log 2
=
1
2 log 2
log©­
«
𝑉2,𝑑,ℓ/𝐸𝑉2,𝑑,ℓ
𝑉1,𝑑,ℓ/𝐸𝑉1,𝑑,ℓ
· 𝐸𝑉2,𝑑,ℓ
𝐸𝑉1,𝑑,ℓ
22𝜈
ª®
¬
=
1
2 log 2
log©­
«
𝑉2,𝑑,ℓ/𝐸𝑉2,𝑑,ℓ
𝑉1,𝑑,ℓ/𝐸𝑉1,𝑑,ℓ
􀀀
22𝜈 + 𝑂(ℎ(𝑛))

22𝜈
ª®
¬
=
1
2 log 2
log

𝑉2,𝑑,ℓ/𝐸𝑉2,𝑑,ℓ
𝑉1,𝑑,ℓ/𝐸𝑉1,𝑑,ℓ
(1 + 𝑂(ℎ(𝑛)))

, (2.43)
where
ℎ(𝑛) =
8>>>>>>>>
<
>>>>>>>>:
𝑛−𝛾0 + 𝑛(𝛾0−1) ( (2ℓ−2𝜈)∧2) if 𝜈 ∉ Z,
𝑛−𝛾0 + 𝑛2(𝛾0−1) log 𝑛 if 𝜈 = ℓ − 1,
𝑛−𝛾0 + 𝑛2(𝛾0−1) if 0 < 𝜈 ≤ ℓ − 2, 𝜈 ∈ Z.
Denote by 𝑊𝜃 = 𝑉𝜃,𝑑,ℓ/𝐸𝑉𝜃,𝑑,ℓ for 𝜃 = 1, 2, then it suffices to find the convergence rate of
𝑊2/𝑊1 − 1. It was proved in Loh et al. (2021) (P21-25) that
𝑃(|𝑊𝜃 − 1| ≥ 𝜖) ≤ 2 exp

−𝐶 min

𝜖
𝑎𝑛
,
𝜖2
𝑏𝑛

, ∀𝜖 > 0,
where as 𝑛 → ∞,
𝑎𝑛 =
8>>>>>>>>
<
>>>>>>>>:
𝑂(𝑛𝑑(𝛾0−1)) if 𝜈 < ℓ − 𝑑/2,
𝑂(𝑛𝑑(𝛾0−1) log 𝑛) if 𝜈 = ℓ − 𝑑/2,
𝑂(𝑛(2ℓ−2𝜈) (𝛾0−1)) if ℓ − 𝑑/2 < 𝜈 < ℓ,
31
𝑏𝑛 =
8>>>>>>>>
<
>>>>>>>>:
𝑂(𝑛𝑑(𝛾0−1)) if 𝜈 < ℓ − 𝑑/4,
𝑂(𝑛𝑑(𝛾0−1) log 𝑛) if 𝜈 = ℓ − 𝑑/4,
𝑂(𝑛(4ℓ−4𝜈) (𝛾0−1)) if ℓ − 𝑑/4 < 𝜈 < ℓ.
Then for any positive constant 𝑐0,
𝑃(𝑐0|𝑊𝜃 − 1| ≥ 𝜖) ≤ 2 exp
−𝐶 min
(
𝜖
𝑐0𝑎𝑛
,
𝜖2
𝑐2
0𝑏𝑛
)!
, ∀𝜖 > 0.
By the Borel-Cantelli lemma, for 𝜃 = 1, 2, 𝑓 (𝑛, 𝑘) (𝑊𝜃 − 1) → 0 a.s. as 𝑛 → ∞ for any 𝑘 > 0,
where
𝑓 (𝑛, 𝑘) =
8>>>>>>>>
<
>>>>>>>>:
𝑛𝑑(1−𝛾0)/2−𝑘 if 𝜈 < ℓ − 𝑑/4,
𝑛𝑑(1−𝛾0)/2−𝑘 (log 𝑛)−1/2 if 𝜈 = ℓ − 𝑑/4,
𝑛(2ℓ−2𝜈) (1−𝛾0)−𝑘 if ℓ − 𝑑/4 < 𝜈 < ℓ.
Thus, 𝑓 (𝑛, 𝑘) (𝑊2/𝑊1 − 1) = 𝑓 (𝑛, 𝑘) ( (𝑊2 − 1) − (𝑊1 − 1))/𝑊1 → 0 a.s. as 𝑛 → ∞ for any 𝑘 > 0.
It follows from (2.43) that as 𝑛 → ∞,
𝑓 (𝑛, 𝑘) ( ˆ 𝜈𝑛,ℓ − 𝜈) =
𝑓 (𝑛, 𝑘)
2 log 2
log

𝑊2
𝑊1
(1 + 𝑂(ℎ(𝑛)))

∼ 𝑓 (𝑛, 𝑘)

𝑊2
𝑊1
(1 + 𝑂(ℎ(𝑛))) − 1

= 𝑓 (𝑛, 𝑘) (𝑊2/𝑊1 − 1) + 𝑓 (𝑛, 𝑘)𝑂(ℎ(𝑛)).
When 𝑑 ∈ {1, 2, 3}, it always holds that ℓ − 1 < ℓ − 𝑑/4 < ℓ and 𝑑/4 ∉ Z, so
𝑓 (𝑛, 𝑘)ℎ(𝑛) =
8>>>>>>>>>>>>>>>>
<
>>>>>>>>>>>>>>>>:
𝑛𝑑(1−𝛾0)/2−𝛾0−𝑘 + 𝑛(𝑑/2−2) (1−𝛾0)−𝑘 if 0 < 𝜈 < ℓ − 1,
𝑛𝑑(1−𝛾0)/2−𝛾0−𝑘 + 𝑛(𝑑/2−2) (1−𝛾0)−𝑘 log 𝑛 if 𝜈 = ℓ − 1,
𝑛𝑑(1−𝛾0)/2−𝛾0−𝑘 + 𝑛(𝑑/2−2ℓ+2𝜈) (1−𝛾0)−𝑘 if ℓ − 1 < 𝜈 < ℓ − 𝑑/4,
(𝑛𝑑(1−𝛾0)/2−𝛾0−𝑘 + 𝑛(𝑑/2−2ℓ+2𝜈) (1−𝛾0)−𝑘 ) (log 𝑛)−1/2 if 𝜈 = ℓ − 𝑑/4,
𝑛(2ℓ−2𝜈) (1−𝛾0)−𝛾0−𝑘 + 𝑛−𝑘 if ℓ − 𝑑/4 < 𝜈 < ℓ.
This finishes the proof.
32
Remark 1. Briefly speaking, as 𝑛 → ∞, it holds that
𝑛(1−𝛾0) (𝑑/2∧(2ℓ−2𝜈))−𝑘 (𝜈ˆ − 𝜈) 𝑎→.𝑠. 0 if 𝜈 ≠ ℓ − 𝑑/4, (2.44)
𝑛𝑑(1−𝛾0)/2−𝑘 (log 𝑛)−1/2 (𝜈ˆ − 𝜈) 𝑎→.𝑠. 0 if 𝜈 = ℓ − 𝑑/4, (2.45)
where 𝑘 is a constant whose range depends on 𝑑, 𝛾0, and ℓ − 𝜈.
In the remaining of this section, we consider a bivariate Gaussian process 𝑋(𝑡) = (𝑋1(𝑡), 𝑋2(𝑡))
with zero mean and covariance function
𝐶(𝑡) =
©­­
«
𝐶11 (𝑡) 𝐶12 (𝑡)
𝐶21(𝑡) 𝐶22 (𝑡)
ª®®
¬
,
where 𝐶𝑖 𝑗 is the Matérn covariance function
𝐶𝑖 𝑗 (𝑡) =
𝜎2
𝑖 𝑗
(𝜂𝑖 𝑗 |𝑡 |)𝜈𝑖 𝑗
2𝜈𝑖 𝑗−1Γ(𝜈𝑖 𝑗 )
𝜅𝜈𝑖 𝑗
(𝜂𝑖 𝑗 |𝑡 |), ∀𝑡 ∈ R, (2.46)
where 𝑖, 𝑗 ∈ {1, 2}, 𝜎12 = 𝜎21 = 𝜌𝜎11𝜎22, 𝜈𝑖 𝑗 , 𝜂𝑖 𝑗 , 𝜎11, 𝜎22 > 0, |𝜌| ∈ (0, 1).
Under the stratified sampling design introduced in Appendix A.2.3, write
𝑌𝜃
𝑛,1 = (∇1
𝜃,1,ℓ𝑋1, ∇1
𝜃,1,ℓ𝑋2, . . . , ∇1
𝜃,1,ℓ𝑋𝑛−2ℓ𝜔𝑛
)𝑇 ,
𝑌𝜃
𝑛,2 = (∇2
𝜃,1,ℓ𝑋1, ∇2
𝜃,1,ℓ𝑋2, . . . , ∇2
𝜃,1,ℓ𝑋𝑛−2ℓ𝜔𝑛
)𝑇 ,
𝑌𝜃
𝑛 =
©­­
«
𝑌𝜃
𝑛,1
𝑌𝜃
𝑛,2
ª®®
¬
∈ R2(𝑛−2ℓ𝜔𝑛) ,
and define the covariation as
𝑍𝜃
𝑛,12 =
Õ
1≤𝑖≤𝑛−2ℓ𝜔𝑛

∇1
𝜃,1,ℓ𝑋𝑖
 
∇2
𝜃,1,ℓ𝑋𝑖

=
1
2
(𝑌𝜃
𝑛
)𝑇 ©­­
«
0 𝐼𝑛−2ℓ𝜔𝑛
𝐼𝑛−2ℓ𝜔𝑛 0
ª®®
¬
𝑌𝜃
𝑛 , (2.47)
where 𝜃 ∈ {1, 2}, ℓ ∈ Z+, and
∇𝑘
𝜃,1,ℓ𝑋𝑖 =
Õℓ¯
𝑗=0
𝑐i,𝜃,1,ℓ ( 𝑗 )𝑋𝑘 (xi, 𝑗 ), 𝑖 ∈ {1, . . . , 𝑛 − 2ℓ𝜔𝑛}, 𝑘 = 1, 2. (2.48)
33
Proposition 3. When 2(𝜈11 + 𝜈22) < 4𝜈12 < {(2(𝜈11 + 𝜈22) + 1) ∧ 4ℓ} and 𝜈11 ∨ 𝜈22 < ℓ,
𝑍𝜃
𝑛,12
𝐸𝑍𝜃
𝑛,12
𝑎→.𝑠. 1 as 𝑛 → ∞, (2.49)
where 𝜃 ∈ {1, 2} and ℓ ∈ Z+.
Proof. It follows from Theorem 1 (a) in Loh et al. (2021) that as 𝑛 → ∞,
𝐸𝑍𝜃
𝑛,12 =

𝜔𝑛𝜃
𝑛
2𝜈12
(𝑛 − 2ℓ𝜔𝑛)©­
«
𝛽∗ Õ
1≤ 𝑗 ,𝑘≤ℓ
𝑐 𝑗 ,𝜃,1,ℓ𝑐𝑘,𝜃,1,ℓ𝐺𝜈12
(| 𝑗 − 𝑘 |) + 𝑜(1)ª®
¬
, (2.50)
where 𝜃 ∈ {1, 2} and ℓ ∈ Z+. For 𝑘 = 1, 2, let
∇𝑘
𝜃,ℓ𝑋 =

∇𝑘
𝜃,1,ℓ𝑋𝑖
𝑛−2ℓ𝜔𝑛
𝑖=1
and write 𝑊𝑘
𝜃𝜃
(𝑛, ℓ) = 𝐶𝑜𝑣(∇𝑘
𝜃,ℓ𝑋, ∇𝑘
𝜃,ℓ𝑋), 𝑊12
𝜃𝜃
(𝑛, ℓ) = 𝐶𝑜𝑣(∇1
𝜃,ℓ𝑋, ∇2
𝜃,ℓ𝑋). Then the variance of
the covariation follows
𝑣𝑎𝑟
𝑍𝜃
𝑛,12
𝐸𝑍𝜃
𝑛,12
!
=
𝐸(𝑍𝜃
𝑛,12
)2 − (𝐸𝑍𝜃
𝑛,12
)2
(𝐸𝑍𝜃
𝑛,12
)2
=
Í
1≤𝑖, 𝑗≤𝑛−2ℓ𝜔𝑛 𝐸

∇1
𝜃,1,ℓ𝑋𝑖∇1
𝜃,1,ℓ𝑋𝑗∇2
𝜃,1,ℓ𝑋𝑖∇2
𝜃,1,ℓ𝑋𝑗

− (𝐸𝑍𝜃
𝑛,12
)2
(𝐸𝑍𝜃
𝑛,12
)2
=
(𝐸𝑍𝜃
𝑛,12
)2 + Í
1≤𝑖, 𝑗≤𝑛−2ℓ𝜔𝑛

(𝑊1
𝜃𝜃
)𝑖, 𝑗 (𝑊2
𝜃𝜃
)𝑖, 𝑗 + (𝑊12
𝜃𝜃
)2
𝑖, 𝑗

− (𝐸𝑍𝜃
𝑛,12
)2
(𝐸𝑍𝜃
𝑛,12
)2
=
1
(𝐸𝑍𝜃
𝑛,12
)2
Õ
1≤𝑖, 𝑗≤𝑛−2ℓ𝜔𝑛
(𝑊1
𝜃𝜃
)𝑖, 𝑗 (𝑊2
𝜃𝜃
)𝑖, 𝑗 + (𝑊12
𝜃𝜃
)2
𝑖, 𝑗 .
It follows from the same manner as in (3.18-3.19) of Loh et al. (2021) that, based on the definition
of 𝑐𝑖,𝜃,1,ℓ in (A.20) and the Taylor expansion of the function 𝐶12,
1
(𝐸𝑍𝜃
𝑛,12
)2
Õ
1≤𝑖, 𝑗≤𝑛−2ℓ𝜔𝑛
(𝑊12
𝜃𝜃
)2
𝑖, 𝑗 =
8>>>>>>>>
<
>>>>>>>>:
𝑂
􀀀𝜔𝑛
𝑛

, 0 < 𝜈12 < ℓ − 1/4,
𝑂

𝜔𝑛
𝑛 log 𝑛
𝜔𝑛

, 𝜈12 = ℓ − 1/4,
𝑂
􀀀𝜔𝑛
𝑛
4ℓ−4𝜈12

, ℓ − 1/4 < 𝜈12 < ℓ
(2.51)
34
as 𝑛 → ∞. Similarly, when 𝜈11 ∨ 𝜈22 < ℓ,
1
(𝐸𝑍𝜃
𝑛,12
)2
Õ
1≤𝑖, 𝑗≤𝑛−2ℓ𝜔𝑛
(𝑊1
𝜃𝜃
)𝑖, 𝑗 (𝑊2
𝜃𝜃
)𝑖, 𝑗
=
8>>>>>>>>
<
>>>>>>>>:
𝑂
􀀀𝜔𝑛
𝑛
2𝜈11+2𝜈22−4𝜈12+1

, 0 < 2(𝜈11 + 𝜈22) < 4ℓ − 1,
𝑂
􀀀𝜔𝑛
𝑛
2𝜈11+2𝜈22−4𝜈12+1 log 𝑛
𝜔𝑛

, 2(𝜈11 + 𝜈22) = 4ℓ − 1,
𝑂
􀀀𝜔𝑛
𝑛
4ℓ−4𝜈12

, 4ℓ − 1 < 2(𝜈11 + 𝜈22) < 4ℓ
(2.52)
as 𝑛 → ∞. Thus,
𝑣𝑎𝑟
𝑍𝜃
𝑛,12
𝐸𝑍𝜃
𝑛,12
!
=
8>>>>>>>>
<
>>>>>>>>:
𝑂
􀀀𝜔𝑛
𝑛
2𝜈11+2𝜈22−4𝜈12+1

, 0 < 2(𝜈11 + 𝜈22) < 4𝜈12 ≤ 4ℓ − 1,
𝑂
􀀀𝜔𝑛
𝑛
2𝜈11+2𝜈22−4𝜈12+1 log 𝑛
𝜔𝑛

, 4ℓ − 1 = 2(𝜈11 + 𝜈22) < 4𝜈12 < 4ℓ,
𝑂
􀀀𝜔𝑛
𝑛
4ℓ−4𝜈12

, 4ℓ − 1 < 2(𝜈11 + 𝜈22) < 4𝜈12 < 4ℓ
(2.53)
as 𝑛 → ∞. Consequently, when 2(𝜈11 + 𝜈22) < 4𝜈12 < {(2(𝜈11 + 𝜈22) + 1) ∧ 4ℓ} and 𝜈11 ∨ 𝜈22 < ℓ,
𝑍𝜃
𝑛,12
𝐸𝑍𝜃
𝑛,12
𝑃 →
1 as 𝑛 → ∞.
According to the definition in (2.47),
𝑍𝜃
𝑛,12
𝐸𝑍𝜃
𝑛,12
d=
𝑈𝑇Σ𝜃𝑛 𝑈,
where 𝑈 ∼ 𝑁(0, 𝐼2(𝑛−2ℓ𝜔𝑛)) and
Σ𝜃𝑛
=
1
2𝐸𝑍𝜃
𝑛,12
Cov(𝑌𝜃
𝑛
)1/2©­­
«
0 𝐼𝑛−2ℓ𝜔𝑛
𝐼𝑛−2ℓ𝜔𝑛 0
ª®®
¬
Cov(𝑌𝜃
𝑛
)1/2.
The Hanson-Wright inequality implies that there exists an absolute constant 𝐶 > 0 such that ∀𝜖 > 0,
𝑃

𝑍𝜃
𝑛,12
𝐸𝑍𝜃
𝑛,12
− 1

≥ 𝜖
!
= 𝑃

𝑈𝑇Σ𝜃𝑛
𝑈 − 𝐸[𝑈𝑇Σ𝜃𝑛
𝑈]

≥ 𝜖

≤ 2 exp
−𝐶 min
(
𝜖
||Σ𝜃 𝑛
||2
,
𝜖2
||Σ𝜃 𝑛
||2
𝐹
)!
. (2.54)
Since ||Σ𝜃𝑛 ||2 ≤ ||Σ𝜃𝑛
||𝐹 and
||Σ𝜃𝑛
||2
𝐹 =
1
2𝑣𝑎𝑟
𝑍𝜃
𝑛,12
𝐸𝑍𝜃
𝑛,12
!
,
35
the Borel-Cantelli lemma together with (2.53) and (2.54) induces that if 2(𝜈11 + 𝜈22) < 4𝜈12 <
{(2(𝜈11 + 𝜈22) + 1) ∧ 4ℓ} and 𝜈11 ∨ 𝜈22 < ℓ, then
𝑍𝜃
𝑛,12
𝐸𝑍𝜃
𝑛,12
𝑎→.𝑠. 1 as 𝑛 → ∞. (2.55)
This finishes the proof.
Consequently, the estimator defined as
ˆ 𝜈12 =
log(𝑍2
𝑛,12
/𝑍1
𝑛,12
)2
4 log 2
(2.56)
is a strongly consistent estimator for 𝜈12 based on irregularly spaced data.
Theorem 7. Under the conditions of Proposition 3,
ˆ 𝜈12
𝑎→.𝑠. 𝜈12 as 𝑛 → ∞. (2.57)
Proof. It follows from (2.50) that
𝐸𝑍2
𝑛,12
𝐸𝑍1
𝑛,12
→ 22𝜈12 as 𝑛 → ∞.
By the result of Proposition 3,
𝑍2
𝑛,12
𝑍1
𝑛,12
=
𝑍2
𝑛,12
/𝐸𝑍2
𝑛,12
𝑍1
𝑛,12
/𝐸𝑍1
𝑛,12
·
𝐸𝑍2
𝑛,12
𝐸𝑍1
𝑛,12
𝑎→.𝑠. 22𝜈12 as 𝑛 → ∞. (2.58)
The proof is completed by applying the continuous mapping theorem.
36
CHAPTER 3
ANISOTROPIC ORNSTEIN-UHLENBECK FIELD
3.1 Introduction
Proposed by Uhlenbeck and Ornstein (1930), the Ornstein-Uhlenbeck process is widely used
in spatial statistics and finance. Denote by {𝑊(𝑢, 𝑡); 𝑢, 𝑡 ∈ R+} a standard Wiener field, then the
random field
𝑋(𝑢, 𝑡) = 𝜎 exp(−𝜆𝑢 − 𝜇𝑡)𝑊

𝑒2𝜆𝑢, 𝑒2𝜇𝑡

, 𝑢, 𝑡 ∈ R (3.1)
is a zero-mean stationary Ornstein-Uhlenbeck field on R2 with covariance function
Cov (𝑋 (𝑢, 𝑡) , 𝑋 (𝑣, 𝑠)) = 𝜎2 exp (−𝜆|𝑢 − 𝑣| − 𝜇|𝑡 − 𝑠|) , ∀𝑢, 𝑡, 𝑣, 𝑠 ∈ R, (3.2)
where (𝜎2, 𝜆, 𝜇) ∈ R3
>0. As indicated by Theorem 7.2 in Piterbarg (1995), the parameters 𝜎2,
𝜆, and 𝜇 characterize the high excursion probability of 𝑋 on a closed Jordan set (the details are
provided in Appendix B). Estimating their values is thus of significance in extreme value theory
and has applications in risk assessment for rare events.
Ying (1993) proves the strong consistency and asymptotic normality of the maximum likelihood
estimators (MLEs) for 𝜎2, 𝜆, and 𝜇 in (3.2), thus has presented the identifiability of the parameters.
The MLEs are asymptotically efficient as shown by van der Vaart (1996). The MLE is also
commonly used to estimate covariance parameter under other models. For Gaussian random fields
on R𝑑 (𝑑 = 1, 2, 3) with the isotropic Matérn covariance function, Bachoc et al. (2019) studied
the asymptotic distributions of MLE and constrained MLE for the variance and correlation length
parameters. Bevilacqua et al. (2019) investigated strong consistency and asymptotic distribution of
the MLE for the microergodic parameters in generalized Wendland covariance functions. However,
the calculation of precision matrices and numerical optimizations usually make it computationally
expensive to get MLEs. To reduce the computational cost, approaches aiming at sparse covariance
matrices or sparse precision matrices have been widely studied, such as covariance tapering (Furrer
et al., 2006; Kaufman et al., 2008; Du et al., 2009) and Vecchia approximations (Vecchia, 1988;
Pardo-Igúzquiza and Dowd, 1997; Katzfuss and Guinness, 2021).
37
For Gaussian random fields with the covariance function
Cov (𝑋 (u) , 𝑋 (v)) = 𝜎2
Ö𝑑
𝑖=1
exp (−𝜃𝑖 |𝑢𝑖 − 𝑣𝑖 |𝛾) , ∀u, v ∈ R𝑑, (3.3)
Lam and Loh (2000) proved the strong consistency of MLEs for 𝜃1, . . . , 𝜃𝑑 when 𝛾 = 2, based
on observations on a regular lattice. Later, Wang (2010) provided consistent estimators for 𝜎2
and 𝜃1, . . . , 𝜃𝑑 using quadratic variations and spectral analysis when 𝑑 ≥ 2 and 0 < 𝛾 < 1. The
covariance function of the Ornstein-Uhlenbeck field 𝑋 we consider in this chapter is a special case
of (3.3) with 𝑑 = 2 and 𝛾 = 1. Since 𝑋 is Markovian, its precision matrix has sparse closed-form
expression (Baldi Antognini and Zagoraiou, 2010), which reduces the computational complexity
and the memory storage requirement of MLEs. The estimators we propose in this chapter are
computationally more efficient than MLEs, while their strong consistency and asymptotic normality
still hold.
This chapter is organized as follows. We formulate estimations for 𝜎2𝜇, 𝜎2𝜆, and 𝜎2𝜆𝜇 in
Section 3.2 based on MLEs. Section 3.3 includes estimations for 𝜎2, 𝜆, and 𝜇, as well as the
asymptotic behaviors of the estimators. Some simulation results are presented in Section 3.4. In
Section 3.5, conclusions and our future research plans are provided.
3.2 Product Estimation
Denote by 𝑥𝑖 𝑗 = 𝑋
􀀀
𝑢𝑖 , 𝑡 𝑗

, 𝑥𝑖 = (𝑥𝑖1, . . . , 𝑥𝑖𝑛)𝑇 , 𝑥 = (𝑥𝑇1 , 𝑥𝑇2 , . . . , 𝑥𝑇
𝑚
)𝑇 ∈ R𝑚𝑛 and
𝐴 (𝜆) =

𝑒−𝜆|𝑢𝑖−𝑢 𝑗 |

𝑚×𝑚
, 𝐵 (𝜇) =

𝑒−𝜇|𝑡𝑖−𝑡 𝑗 |

𝑛×𝑛
, (3.4)
where 0 = 𝑢0 < 𝑢1 < · · · < 𝑢𝑚 = 1, 0 = 𝑡0 < 𝑡1 < · · · < 𝑡𝑛 = 1. Then 𝑥 ∼ 𝑁
􀀀
0, 𝜎2𝐴 (𝜆) ⊗ 𝐵 (𝜇)

.
For notational convenience, write Δ𝑢𝑖 = 𝑢𝑖 − 𝑢𝑖−1 (𝑖 = 1, · · · , 𝑚) and Δ𝑡𝑖 = 𝑡𝑖 − 𝑡𝑖−1 (𝑖 = 1, · · · , 𝑛).
Suppose max𝑖 Δ𝑢𝑖 → 0 as 𝑚 → ∞ and max𝑖 Δ𝑡𝑖 → 0 as 𝑛 → ∞. Define estimators for 𝜎2𝜇, 𝜎2𝜆,
and 𝜎2𝜆𝜇 as
d𝜎2𝜇 =
1
𝑛
Õ𝑚
𝑖=1
𝑥𝑇
𝑖·𝐵−1 (1)𝑥𝑖·Δ𝑢𝑖 , (3.5)
d𝜎2𝜆 =
1
𝑚
Õ𝑛
𝑗=1
𝑥𝑇 · 𝑗 𝐴−1(1)𝑥· 𝑗Δ𝑡 𝑗 , (3.6)
38
[𝜎2𝜆𝜇 =
1
𝑚𝑛
𝑥𝑇

𝐴−1 (1) ⊗ 𝐵−1 (1)

𝑥. (3.7)
In what follows, we discuss the asymptotic behaviors of the estimators in (3.5-3.7) as 𝑛 → ∞ and
𝑚 → ∞.
Proposition 4. Under model (3.2), as 𝑛 → ∞ and 𝑚 → ∞,
𝐸d𝜎2𝜇 = 𝜎2𝜇 − 𝜎2 (𝜇 + 1)2 − 4
2𝑛
+ 𝑜(𝑛−1),
𝐸d𝜎2𝜆 = 𝜎2𝜆 − 𝜎2 (𝜆 + 1)2 − 4
2𝑚
+ 𝑜(𝑚−1),
𝐸[𝜎2𝜆𝜇 = 𝜎2𝜆𝜇 − 𝜎2𝑚𝜆( (𝜇 + 1)2 − 4) + 𝑛𝜇( (𝜆 + 1)2 − 4)
2𝑚𝑛
+ 𝑜(𝑛−1) + 𝑜(𝑚−1).
Proof. For any 1 ≤ 𝑖 ≤ 𝑚, since 𝑥𝑖· ∼ 𝑁
􀀀
0, 𝜎2𝐵(𝜇)

, we have
𝐸

1
𝑛
𝑥𝑇
𝑖·𝐵−1 (1)𝑥𝑖·

=
𝜎2
𝑛
Tr

𝑀𝐵
𝜇

,
where 𝑀𝐵
𝜇 = 𝐵−1 (1)𝐵(𝜇). As a result,
𝐸d𝜎2𝜇 =
Õ𝑚
𝑖=1
𝐸

1
𝑛
𝑥𝑇
𝑖·𝐵−1(1)𝑥𝑖·

Δ𝑢𝑖 =
𝜎2
𝑛
Tr

𝑀𝐵
𝜇

because
Í𝑚
𝑖=1 Δ𝑢𝑖 = 1.
It is well known that the 𝑛 × 𝑛 precision matrix 𝐵−1 (1) has entries as

𝐵−1 (1)

𝑖, 𝑗
=
8>>>>>>>>>>>>>>>>
<
>>>>>>>>>>>>>>>>:
1
1−exp(−2|𝑡1−𝑡2 |) , if 𝑖 = 𝑗 = 1,
1
1−exp(−2|𝑡𝑛−1−𝑡𝑛|) , if 𝑖 = 𝑗 = 𝑛,
1
1−exp(−2|𝑡𝑖−1−𝑡𝑖 |) + exp(−2|𝑡𝑖−𝑡𝑖+1 |)
1−exp(−2|𝑡𝑖−𝑡𝑖+1 |) , if 1 < 𝑖 = 𝑗 < 𝑛,
− exp(−|𝑡𝑖−𝑡 𝑗 |)
1−exp(−2|𝑡𝑖−𝑡 𝑗 |) , if |𝑖 − 𝑗 | = 1,
0, if |𝑖 − 𝑗 | > 1.
Thus, the entries of 𝑀𝐵
𝜇 are

𝑀𝐵
𝜇

𝑖, 𝑗
=
8>>>>>>>>
<
>>>>>>>>:
˜𝐵
2𝑏1 𝑗 − 𝑞1 𝑗 , if 𝑖 = 1,
(˜
𝐵
𝑖 + 𝐵𝑖)𝑏𝑖 𝑗 − 𝑝𝑖 𝑗 − 𝑞𝑖 𝑗 , if 1 < 𝑖 < 𝑛,
˜𝐵
𝑛𝑏𝑛 𝑗 − 𝑝𝑛 𝑗 , if 𝑖 = 𝑛,
(3.8)
39
where 1 ≤ 𝑗 ≤ 𝑛,
𝐵𝑖 =
exp(−2|𝑡𝑖+1 − 𝑡𝑖 |)
1 − exp(−2|𝑡𝑖+1 − 𝑡𝑖 |) , ˜
𝐵
𝑖 = 1 + 𝐵𝑖−1, 𝑏𝑖 𝑗 = exp(−𝜇|𝑡𝑖 − 𝑡 𝑗 |),
𝑝𝑖 𝑗 =
exp(−|𝑡𝑖−1 − 𝑡𝑖 |)
1 − exp(−2|𝑡𝑖−1 − 𝑡𝑖 |) 𝑏(𝑖−1) 𝑗 , 𝑞𝑖 𝑗 =
exp(−|𝑡𝑖+1 − 𝑡𝑖 |)
1 − exp(−2|𝑡𝑖+1 − 𝑡𝑖 |) 𝑏(𝑖+1) 𝑗 .
Since max𝑖 Δ𝑡𝑖 → 0 as 𝑛 → ∞, it further holds that
Tr

𝑀𝐵
𝜇

= 𝑛 + 2
Õ𝑛
𝑖=2
𝑒−2Δ𝑡𝑖 (1 − 𝑒−(𝜇−1)Δ𝑡𝑖 )
1 − 𝑒−2Δ𝑡𝑖
= 𝑛 + (𝜇 − 1)
Õ𝑛
𝑖=2

1 − Δ𝑡𝑖 + 𝑂

(Δ𝑡𝑖)2

= 𝑛 + (𝜇 − 1) (𝑛 − 1 − (1 − 𝑡1) + 𝑜(1))
and 𝐸d𝜎2𝜇 = 𝜎2
𝑛 Tr

𝑀𝐵
𝜇

= 𝜎2𝜇 − 𝜎2 (𝜇+1)2−4
2𝑛
+ 𝑜(𝑛−1) as 𝑛 → ∞. In a similar manner, there is
𝐸d𝜎2𝜆 = 𝜎2𝜆 − 𝜎2 (𝜆 + 1)2 − 4
2𝑚
+ 𝑜(𝑚−1)
as 𝑚 → ∞.
Moreover,
𝐸[𝜎2𝜆𝜇 =
1
𝑚𝑛
𝐸𝑥𝑇

𝐴−1(1) ⊗ 𝐵−1(1)

𝑥
=
𝜎2
𝑚𝑛
Tr

𝐴−1 (1) ⊗ 𝐵−1 (1)

(𝐴(𝜆) ⊗ 𝐵(𝜇))

=
𝜎2
𝑚𝑛
Tr

𝐴−1(1)𝐴(𝜆)

Tr

𝐵−1(1)𝐵(𝜇)

=
1
𝜎2 𝐸d𝜎2𝜆𝐸d𝜎2𝜇
= 𝜎2𝜆𝜇 − 𝜎2𝑚𝜆( (𝜇 + 1)2 − 4) + 𝑛𝜇( (𝜆 + 1)2 − 4)
2𝑚𝑛
+ 𝑜(𝑛−1) + 𝑜(𝑚−1)
as 𝑛 → ∞ and 𝑚 → ∞. This finishes the proof.
Proposition 4 indicates that d𝜎2𝜆, d𝜎2𝜇, and [𝜎2𝜆𝜇 are asymptotically unbiased estimators for
𝜎2𝜆, 𝜎2𝜇, and 𝜎2𝜆𝜇. To further study the convergence of variances of the estimators, we first
introduce the following lemma regarding variances of quadratic forms.
40
Lemma 2. Under model (3.2), as 𝑛 → ∞ and 𝑚 → ∞,
Var

1
𝑛
𝑥𝑇
𝑖·𝐵−1(1)𝑥𝑖·

=
2
𝑛
(𝜎2𝜇)2 + 𝑂(𝑛−2), ∀1 ≤ 𝑖 ≤ 𝑚, (3.9)
Var

1
𝑚
𝑥𝑇 · 𝑗 𝐴−1(1)𝑥· 𝑗

=
2
𝑚
(𝜎2𝜆)2 + 𝑂(𝑚−2), ∀1 ≤ 𝑗 ≤ 𝑛. (3.10)
Proof. Since 𝑥𝑖· ∼ 𝑁
􀀀
0, 𝜎2𝐵(𝜇)

for any 1 ≤ 𝑖 ≤ 𝑚, we have
Var

1
𝑛
𝑥𝑇
𝑖·𝐵−1(1)𝑥𝑖·

= 2

𝜎2
𝑛
2
Tr

(𝑀𝐵
𝜇
)2

,
where 𝑀𝐵
𝜇 = 𝐵−1 (1)𝐵(𝜇). Recall the entries of 𝑀𝐵
𝜇 in (3.8), we thus have
Tr

(𝑀𝐵
𝜇
)2

=
􀀀˜
𝐵
2𝑏11 − 𝑞11
2 + 2
􀀀˜
𝐵
2𝑏1𝑛 − 𝑞1𝑛
 􀀀˜
𝐵
𝑛𝑏𝑛1 − 𝑝𝑛1

+
􀀀˜
𝐵
𝑛𝑏𝑛𝑛 − 𝑝𝑛𝑛
2
+ 2
Õ𝑛−1
𝑖=2
􀀀˜
𝐵
2𝑏1𝑖 − 𝑞1𝑖
 􀀀􀀀˜
𝐵
𝑖 + 𝐵𝑖

𝑏𝑖1 − 𝑝𝑖1 − 𝑞𝑖1

+ 2
Õ𝑛−1
𝑖=2
􀀀˜
𝐵
𝑛𝑏𝑛𝑖 − 𝑝𝑛𝑖
 􀀀􀀀˜
𝐵
𝑖 + 𝐵𝑖

𝑏𝑖𝑛 − 𝑝𝑖𝑛 − 𝑞𝑖𝑛

+
Õ𝑛−1
𝑘=2
Õ𝑛−1
𝑖=2
􀀀􀀀˜
𝐵
𝑘 + 𝐵𝑘

𝑏𝑘𝑖 − 𝑝𝑘𝑖 − 𝑞𝑘𝑖
 􀀀􀀀˜
𝐵
𝑖 + 𝐵𝑖

𝑏𝑖𝑘 − 𝑝𝑖𝑘 − 𝑞𝑖𝑘

.
For the convenience of expression, we introduce a few more notations as below. Denote by
𝑇1 = (˜
𝐵
2 − 𝑞11)2 + (˜
𝐵
𝑛 − 𝑝𝑛𝑛)2 +
Õ𝑛−1
𝑘=2
(˜𝐵
𝑘 + 𝐵𝑘 − 𝑝𝑘 𝑘 − 𝑞𝑘 𝑘 )2,
𝑇2 =
Õ𝑛−1
𝑖=2
􀀀
(˜
𝐵
2𝑏𝑖1 − 𝑞1𝑖)
􀀀
(˜
𝐵
𝑖 + 𝐵𝑖)𝑏𝑖1 − 𝑝𝑖1 − 𝑞𝑖1

+ (˜
𝐵
𝑛𝑏𝑖𝑛 − 𝑝𝑛𝑖)
􀀀
(˜
𝐵
𝑖 + 𝐵𝑖)𝑏𝑖𝑛 − 𝑝𝑖𝑛 − 𝑞𝑖𝑛

,
𝑇3 =
Õ𝑛−1
𝑖,𝑘=2
𝑘≠𝑖
􀀀
(˜
𝐵
𝑘 + 𝐵𝑘 )𝑏𝑘𝑖 − 𝑝𝑘𝑖 − 𝑞𝑘𝑖
 􀀀
(˜
𝐵
𝑖 + 𝐵𝑖)𝑏𝑖𝑘 − 𝑝𝑖𝑘 − 𝑞𝑖𝑘

,
𝑇4 = (˜
𝐵
2𝑏1𝑛 − 𝑞1𝑛) (˜
𝐵
𝑛𝑏1𝑛 − 𝑝𝑛1),
then Tr

(𝑀𝐵
𝜇
)2

= 𝑇1 + 2𝑇2 + 𝑇3 + 2𝑇4. As 𝑛 → ∞,
𝑇1 =
1
2
(𝜇 + 1)2 − 1
4
(𝜇 + 1) (𝜇2 − 1) (Δ𝑡2 + Δ𝑡𝑛) + (𝑛 − 2)𝜇2 − 1
2 𝜇(𝜇2 − 1) (𝑡𝑛 − 𝑡2 + 𝑡𝑛−1 − 𝑡1)
+
Õ𝑛
𝑘=2
𝑂( (Δ𝑡𝑘 )2) +
Õ𝑛−1
𝑘=2
𝑂(Δ𝑡𝑘Δ𝑡𝑘+1)
=𝑛𝜇2 + 𝑂(1).
41
It thus remains to prove 2𝑇2 + 𝑇3 + 2𝑇4 = 𝑂(1) as 𝑛 → ∞.
As was previously defined,
˜𝐵
2𝑏1𝑘 − 𝑞1𝑘 = 𝑒−𝜇(𝑡𝑘−𝑡1) 1 − 𝑒−(1−𝜇)Δ𝑡2
1 − 𝑒−2Δ𝑡2
, ∀𝑘 ≥ 2,
˜𝐵
𝑛𝑏𝑛𝑘 − 𝑝𝑛𝑘 = 𝑒−𝜇(𝑡𝑛−𝑡𝑘 ) 1 − 𝑒−(1−𝜇)Δ𝑡𝑛
1 − 𝑒−2Δ𝑡𝑛
, ∀𝑘 ≤ 𝑛 − 1.
For any 2 ≤ 𝑖, 𝑘 ≤ 𝑛 − 1 and 𝑖 ≠ 𝑘,
(˜
𝐵
𝑘 + 𝐵𝑘 )𝑏𝑘𝑖 − 𝑝𝑘𝑖 − 𝑞𝑘𝑖
=
𝑒−𝜇|𝑡𝑘−𝑡𝑖 | − 𝑒−Δ𝑡𝑘−𝜇|𝑡𝑘−1−𝑡𝑖 |
1 − 𝑒−2Δ𝑡𝑘
+ 𝑒−2Δ𝑡𝑘+1−𝜇|𝑡𝑘−𝑡𝑖 | − 𝑒−Δ𝑡𝑘+1−𝜇|𝑡𝑘+1−𝑡𝑖 |
1 − 𝑒−2Δ𝑡𝑘+1
=
8>>>>
<
>>>>:
𝑒−𝜇(𝑡𝑘−𝑡𝑖 )

1−𝑒−(1−𝜇)Δ𝑡𝑘
1−𝑒−2Δ𝑡𝑘
+ 𝑒−2Δ𝑡𝑘+1−𝑒−(1+𝜇)Δ𝑡𝑘+1
1−𝑒−2Δ𝑡𝑘+1

, if 𝑖 ≤ 𝑘 − 1,
𝑒−𝜇(𝑡𝑖−𝑡𝑘 )

1−𝑒−(1+𝜇)Δ𝑡𝑘
1−𝑒−2Δ𝑡𝑘
+ 𝑒−2Δ𝑡𝑘+1−𝑒−(1−𝜇)Δ𝑡𝑘+1
1−𝑒−2Δ𝑡𝑘+1

, if 𝑖 ≥ 𝑘 + 1.
Thus as 𝑛 → ∞,
𝑇3 =
1
8

1 − 𝜇2
2 Õ𝑛−1
𝑖,𝑘=2
𝑖<𝑘
𝑒−2𝜇(𝑡𝑘−𝑡𝑖 )

𝑡𝑘+1 − 𝑡𝑘−1 + 𝑂( (Δ𝑡𝑘 )2) + 𝑂( (Δ𝑡𝑘+1)2)

(𝑡𝑖+1 − 𝑡𝑖−1
+𝑂( (Δ𝑡𝑖)2) + 𝑂( (Δ𝑡𝑖+1)2)

∝
Õ𝑛−1
𝑖,𝑘=2
𝑖<𝑘
𝑒−2𝜇(𝑡𝑘−𝑡𝑖 ) (𝑡𝑘+1 − 𝑡𝑘−1) (𝑡𝑖+1 − 𝑡𝑖−1) + 𝑜(1)
≤ 2
Õ𝑛−1
𝑘=2
(Δ𝑡𝑘 + Δ𝑡𝑘+1) + 𝑜(1)
= 𝑂(1).
Similarly, 𝑇2 = 𝑂(1) and 𝑇4 = 𝑂(1) as 𝑛 → ∞. This finishes the proof of (3.9). The proof of (3.10)
follows the same manner and is thus omitted.
Based on Lemma 2, the rates of convergence for d𝜎2𝜆, d𝜎2𝜇, and[𝜎2𝜆𝜇 are derived as follows.
Proposition 5. Under model (3.2), as 𝑛 → ∞ and 𝑚 → ∞,
Var(d𝜎2𝜇) =
1
𝑛𝜆2

2𝜆 − 1 + 𝑒−2𝜆

(𝜎2𝜇)2 + 𝑂(𝑛−2),
42
Var(d𝜎2𝜆) =
1
𝑚𝜇2

2𝜇 − 1 + 𝑒−2𝜇

(𝜎2𝜆)2 + 𝑂(𝑚−2),
Var([𝜎2𝜆𝜇) =
2
𝑚𝑛

𝜎2𝜆𝜇
2
+ 𝑂

𝑚−2𝑛−1

+ 𝑂

𝑚−1𝑛−2

.
Proof. Under model (3.2),
Var(d𝜎2𝜇) = Var

1
𝑛
𝑥𝑇

𝐷𝑚 ⊗ 𝐵−1 (1)

𝑥

= 2

𝜎2
𝑛
2
Tr

(𝐷𝑚𝐴(𝜆)) ⊗

𝐵−1(1)𝐵(𝜇)
2
= 2

𝜎2
𝑛
2
Tr

(𝐷𝑚𝐴(𝜆))2

Tr

(𝑀𝐵
𝜇
)2

,
where 𝐷𝑚 denotes the 𝑚 × 𝑚 diagonal matrix with (𝐷𝑚)𝑖𝑖 = Δ𝑢𝑖 , 𝑖 = 1, 2, . . . , 𝑚.
As 𝑚 → ∞,
Tr

(𝐷𝑚𝐴(𝜆))2

=
Õ𝑚
𝑖, 𝑗=1
Δ𝑢𝑖Δ𝑢 𝑗 𝑒−2𝜆|𝑢𝑖−𝑢 𝑗 |
→
¹ 1
0
¹ 1
0
𝑒−2𝜆|𝑥−𝑦|𝑑𝑥𝑑𝑦
=
2𝜆 − 1 + 𝑒−2𝜆
2𝜆2 .
(3.11)
It follows from the proof of Lemma 2 that Tr

(𝑀𝐵
𝜇
)2

= 𝑛𝜇2 + 𝑂(1) as 𝑛 → ∞. Thus,
Var(d𝜎2𝜇) =
1
𝑛𝜆2

2𝜆 − 1 + 𝑒−2𝜆

(𝜎2𝜇)2 + 𝑂(𝑛−2)
as 𝑛 → ∞ and 𝑚 → ∞. The proof for the variance of d𝜎2𝜆 follows the same manner.
Moreover, as 𝑛 → ∞ and 𝑚 → ∞,
Var([𝜎2𝜆𝜇) = 2

𝜎2
𝑚𝑛
2
Tr

𝐴−1(1)𝐴(𝜆)

⊗

𝐵−1 (1)𝐵(𝜇)
2
=
1
2𝜎4 Var

1
𝑛
𝑥𝑇1·𝐵−1(1)𝑥1·

Var

1
𝑚
𝑥𝑇 · 𝑗 𝐴−1 (1)𝑥· 𝑗

=
2
𝑚𝑛

𝜎2𝜆𝜇
2
+ 𝑂

𝑚−2𝑛−1

+ 𝑂

𝑚−1𝑛−2

by the results of Lemma 2.
43
For each of the estimators formulated in (3.5-3.7), its asymptotic distribution is shown in the
following theorem.
Theorem 8. Under model (3.2), as 𝑛 → ∞ and 𝑚 → ∞,
√
𝑛(d𝜎2𝜇 − 𝜎2𝜇) 𝑑 → 𝑁

0,

2
𝜆
− 1 − 𝑒−2𝜆
𝜆2

(𝜎2𝜇)2

,
√
𝑚(d𝜎2𝜆 − 𝜎2𝜆) 𝑑 → 𝑁

0,

2
𝜇
− 1 − 𝑒−2𝜇
𝜇2

(𝜎2𝜆)2

.
Furthermore, when 𝑚 = 𝑟𝑛 and 𝑛 → ∞,
√
𝑚𝑛([𝜎2𝜆𝜇 − 𝜎2𝜆𝜇) 𝑑 → 𝑁

−𝜎2 𝑟𝜆( (𝜇 + 1)2 − 4) + 𝜇( (𝜆 + 1)2 − 4)
2
√
𝑟
, 2(𝜎2𝜆𝜇)2

.
Proof. Under model (3.2), the joint density of 𝑥 is
𝑝𝐽𝑚
𝑛
(𝜎2, 𝜆, 𝜇) := (2𝜋𝜎2)−𝑚𝑛/2| (𝐴(𝜆) ⊗ 𝐵(𝜇)) |−1/2 exp

− 1
2𝜎2 𝑥𝑇 (𝐴(𝜆) ⊗ 𝐵(𝜇))−1 𝑥

. (3.12)
For any 𝑚, 𝑛 ∈ Z+,
√
𝑚𝑛([𝜎2𝜆𝜇 − 𝜎2𝜆𝜇) =
1 √
𝑚𝑛
𝑥𝑇

𝐴−1(1) ⊗ 𝐵−1(1)

− 𝜆𝜇

𝐴−1 (𝜆) ⊗ 𝐵−1(𝜇)

𝑥
+
√
𝑚𝑛𝜆𝜇

𝑥𝑇 𝐴−1(𝜆) ⊗ 𝐵−1(𝜇)
𝑚𝑛
𝑥 − 𝜎2

=
2𝜎2
√
𝑚𝑛

𝐸 log 𝑝𝐽𝑚
𝑛
(𝜎2𝜆𝜇, 1, 1)
𝑝𝐽𝑚𝑛 (𝜎2, 𝜆, 𝜇)
− log 𝑝𝐽𝑚
𝑛
(𝜎2𝜆𝜇, 1, 1)
𝑝𝐽𝑚
𝑛 (𝜎2, 𝜆, 𝜇)

+ 1 √
𝑚𝑛
𝐸𝑥𝑇

𝐴−1 (1) ⊗ 𝐵−1(1)

− 𝜆𝜇

𝐴−1(𝜆) ⊗ 𝐵−1 (𝜇)

𝑥
+
√
𝑚𝑛𝜆𝜇

𝑥𝑇 𝐴−1(𝜆) ⊗ 𝐵−1(𝜇)
𝑚𝑛
𝑥 − 𝜎2

=
2𝜎2
√
𝑚𝑛

𝐸 log 𝑝𝐽𝑚
𝑛
(𝜎2𝜆𝜇, 1, 1)
𝑝𝐽𝑚
𝑛 (𝜎2, 𝜆, 𝜇)
− log 𝑝𝐽𝑚
𝑛
(𝜎2𝜆𝜇, 1, 1)
𝑝𝐽𝑚
𝑛 (𝜎2, 𝜆, 𝜇)

+
√
𝑚𝑛

𝐸[𝜎2𝜆𝜇 − 𝜎2𝜆𝜇

+ 𝜆𝜇
√
𝑚𝑛

𝑥𝑇

𝐴−1 (𝜆) ⊗ 𝐵−1(𝜇)

𝑥 − 𝐸𝑥𝑇

𝐴−1 (𝜆) ⊗ 𝐵−1 (𝜇)

𝑥

.
Since the probability measure corresponding to 𝑝𝐽𝑚
𝑛
(𝜎2, 𝜆, 𝜇) and the probability measure corresponding
to 𝑝𝐽𝑚
𝑛
(𝜎2𝜆𝜇, 1, 1) are equivalent (Ying, 1993), the Radon-Nikodym derivative satisfies
44
(Ibragimov and Rozanov, 1978)
𝑃

0 < lim
𝑚𝑛→∞
𝑝𝐽𝑚
𝑛
(𝜎2𝜆𝜇, 1, 1)
𝑝𝐽𝑚
𝑛 (𝜎2, 𝜆, 𝜇)
< ∞

= 1 and − ∞ < 𝐸 log

lim
𝑚𝑛→∞
𝑝𝐽𝑚
𝑛
(𝜎2𝜆𝜇, 1, 1)
𝑝𝐽𝑚
𝑛 (𝜎2, 𝜆, 𝜇)

< ∞.
Thus as 𝑚𝑛 → ∞,
2𝜎2
√
𝑚𝑛

𝐸 log 𝑝𝐽𝑚
𝑛
(𝜎2𝜆𝜇, 1, 1)
𝑝𝐽𝑚
𝑛 (𝜎2, 𝜆, 𝜇)
− log 𝑝𝐽𝑚
𝑛
(𝜎2𝜆𝜇, 1, 1)
𝑝𝐽𝑚
𝑛 (𝜎2, 𝜆, 𝜇)

=
2𝜎2
√
𝑚𝑛
􀀀
𝑂(1) − 𝑂𝑝 (1)

= 𝑜𝑝 (1).
By the Central Limit Theorem, as 𝑚𝑛 → ∞
𝜆𝜇
√
𝑚𝑛

𝑥𝑇

𝐴−1 (𝜆) ⊗ 𝐵−1 (𝜇)

𝑥 − 𝐸𝑥𝑇

𝐴−1(𝜆) ⊗ 𝐵−1 (𝜇)

𝑥
 𝑑 → 𝑁(0, 2(𝜎2𝜆𝜇)2).
By Proposition 4, when 𝑚 = 𝑟𝑛 and 𝑛 → ∞,
√
𝑚𝑛

𝐸[𝜎2𝜆𝜇 − 𝜎2𝜆𝜇

= −𝜎2 𝑟𝜆( (𝜇 + 1)2 − 4) + 𝜇( (𝜆 + 1)2 − 4)
2
√
𝑟
+ 𝑜(1).
As a result, when 𝑚 = 𝑟𝑛 and 𝑛 → ∞,
√
𝑚𝑛([𝜎2𝜆𝜇 − 𝜎2𝜆𝜇) 𝑑 → 𝑁

−𝜎2 𝑟𝜆( (𝜇 + 1)2 − 4) + 𝜇( (𝜆 + 1)2 − 4)
2
√
𝑟
, 2(𝜎2𝜆𝜇)2

. (3.13)
For any 0 ≤ 𝑢 ≤ 1, the joint density of 𝑦𝑢· := (𝑋(𝑢, 𝑡1), 𝑋(𝑢, 𝑡2), . . . , 𝑋(𝑢, 𝑡𝑛)) is
𝑝𝐵
𝑛
(𝜎2, 𝜇; 𝑢) := (2𝜋𝜎2)−𝑛/2|𝐵(𝜇) |−1/2 exp

− 1
2𝜎2 𝑦𝑇𝑢
·𝐵−1 (𝜇)𝑦𝑢·

. (3.14)
Recall that 𝐷𝑚 is the 𝑚×𝑚 diagonal matrix with (𝐷𝑚)𝑖𝑖 = Δ𝑢𝑖 , 𝑖 = 1, 2, . . . , 𝑚. For any 𝑚, 𝑛 ∈ Z+,
√
𝑛(d𝜎2𝜇 − 𝜎2𝜇) =
1 √
𝑛
𝑥𝑇

𝐷𝑚 ⊗ 𝐵−1(1)

− 𝜇

𝐷𝑚 ⊗ 𝐵−1(𝜇)

𝑥
+
√
𝑛𝜎2𝜇

𝑥𝑇 𝐷𝑚 ⊗ 𝐵−1(𝜇)
𝜎2𝑛
𝑥 − 1

=
2𝜎2
√
𝑛
Õ𝑚
𝑖=1
Δ𝑢𝑖

𝐸 log 𝑝𝐵
𝑛
(𝜎2𝜇, 1; 𝑢𝑖)
𝑝𝐵
𝑛 (𝜎2, 𝜇; 𝑢𝑖)
− log 𝑝𝐵
𝑛
(𝜎2𝜇, 1; 𝑢𝑖)
𝑝𝐵
𝑛 (𝜎2, 𝜇; 𝑢𝑖)

+
√
𝑛𝜎2𝜇

𝑥𝑇 𝐷𝑚 ⊗ 𝐵−1(𝜇)
𝜎2𝑛
𝑥 − 1

(3.15)
+ 1 √
𝑛
𝐸𝑥𝑇

𝐷𝑚 ⊗ 𝐵−1(1)

− 𝜇

𝐷𝑚 ⊗ 𝐵−1(𝜇)

𝑥
=
2𝜎2
√
𝑛
𝐸 log 𝑝𝐵
𝑛
(𝜎2𝜇, 1; 𝑢1)
𝑝𝐵
𝑛 (𝜎2, 𝜇; 𝑢1)
−
Õ𝑚
𝑖=1
Δ𝑢𝑖 log 𝑝𝐵
𝑛
(𝜎2𝜇, 1; 𝑢𝑖)
𝑝𝐵
𝑛 (𝜎2, 𝜇; 𝑢𝑖)
!
+
√
𝑛

𝐸d𝜎2𝜇 − 𝜎2𝜇

+ 𝜇
√
𝑛

𝑥𝑇

𝐷𝑚 ⊗ 𝐵−1 (𝜇)

𝑥 − 𝐸𝑥𝑇

𝐷𝑚 ⊗ 𝐵−1 (𝜇)

𝑥

. (3.16)
45
By Proposition 4, as 𝑚 → ∞ and 𝑛 → ∞,
√
𝑛

𝐸d𝜎2𝜇 − 𝜎2𝜇

= 𝑜(1). (3.17)
Denote by
𝐻𝑛 :=
1 √
𝑛

𝐷𝑚 ⊗ 𝐵−1(𝜇)

(𝐴(𝜆) ⊗ 𝐵(𝜇)) ,
then as 𝑛 → ∞,
Tr(𝐻2
𝑛
) =
Õ𝑚
𝑘, 𝑗=1
Δ𝑢𝑘Δ𝑢 𝑗 𝑒−2𝜆|𝑢𝑘−𝑢 𝑗 | → 1
𝜆
− 1 − 𝑒−2𝜆
2𝜆2 ,
Tr(𝐻4
𝑛
) <
𝑛
𝑛2
→ 0.
The convergence of moment generating function implies that as 𝑚 → ∞ and 𝑛 → ∞,
𝜇
√
𝑛

𝑥𝑇

𝐷𝑚 ⊗ 𝐵−1 (𝜇)

𝑥 − 𝐸𝑥𝑇

𝐷𝑚 ⊗ 𝐵−1(𝜇)

𝑥
 𝑑 → 𝑁

0,

2
𝜆
− 1 − 𝑒−2𝜆
𝜆2

(𝜎2𝜇)2

. (3.18)
Since ∀0 ≤ 𝑢 ≤ 1, the probability measure corresponding to 𝑝𝐵
𝑛
(𝜎2, 𝜇; 𝑢) and the probability
measure corresponding to 𝑝𝐵
𝑛
(𝜎2𝜇, 1; 𝑢) are equivalent (Ying, 1991), the Radon-Nikodym derivative
satisfies (Ibragimov and Rozanov, 1978)
𝑃

0 < 𝜌𝐵
𝑢 < ∞

= 1, (3.19)
−∞ < 𝐸 log 𝜌𝐵
𝑢 < ∞, (3.20)
where 𝜌𝐵
𝑢 = lim𝑛→∞
𝑝𝐵
𝑛
(𝜎2𝜇,1;𝑢)
𝑝𝐵
𝑛 (𝜎2,𝜇;𝑢) .
Moreover, since the probability measure corresponding to 𝑝𝐽𝑚
𝑛
(𝜎2, 𝜆, 𝜇) and the probability
measure corresponding to 𝑝𝐽𝑚
𝑛
(𝜎2𝜇, 𝜆, 1) are equivalent (Ying, 1993), the Radon-Nikodym derivative
satisfies (Ibragimov and Rozanov, 1978)
𝑃

0 < lim
𝑚𝑛→∞
𝑝𝐽𝑚
𝑛
(𝜎2𝜇, 𝜆, 1)
𝑝𝐽𝑚
𝑛 (𝜎2, 𝜆, 𝜇)
< ∞

= 1.
Thus as 𝑚, 𝑛 → ∞,
log 𝑝𝐽𝑚
𝑛
(𝜎2𝜇, 𝜆, 1)
𝑝𝐽𝑚
𝑛 (𝜎2, 𝜆, 𝜇)
= − 𝑚
2
log
|𝜎2𝜇𝐵(1) |
|𝜎2𝐵(𝜇) |
− 1
2𝑥𝑇

𝐴−1 (𝜆) ⊗

1
𝜎2𝜇
𝐵−1 (1) − 1
𝜎2 𝐵−1 (𝜇)

𝑥
=𝑂𝑝 (1).
46
For any 𝑚, 𝑛 ≥ 1, denote by
𝐽𝑚𝑛 = 𝑥𝑇

1
𝑚
𝐴−1 (𝜆) − 𝐷𝑚

⊗

1
𝜎2𝜇
𝐵−1(1) − 1
𝜎2 𝐵−1(𝜇)

𝑥.
Since Tr (𝐷𝑚𝐴(𝜆)) =
Í𝑚
𝑖=1 Δ𝑢𝑖 = 1 and Tr
􀀀
(𝐷𝑚𝐴(𝜆))2
=
Í𝑚
𝑖, 𝑗=1 Δ𝑢𝑖Δ𝑢 𝑗 𝑒−2𝜆|𝑢𝑖−𝑢𝑘 | = 𝑂(1), there
are
𝐸𝐽𝑚𝑛 = Tr

1
𝑚
𝐴−1(𝜆) − 𝐷𝑚

⊗

1
𝜎2𝜇
𝐵−1 (1) − 1
𝜎2 𝐵−1(𝜇)
 
𝐴(𝜆) ⊗ 𝜎2𝐵(𝜇)

= Tr

1
𝑚
𝐼𝑚 − 𝐷𝑚𝐴(𝜆)

⊗

1
𝜇
𝐵−1(1)𝐵(𝜇) − 𝐼𝑛

= Tr

1
𝑚
𝐼𝑚 − 𝐷𝑚𝐴(𝜆)

Tr

1
𝜇
𝐵−1 (1)𝐵(𝜇) − 𝐼𝑛

= 0, ∀𝑚, 𝑛 ≥ 1,
and
Var(𝐽𝑚𝑛) = 2Tr

1
𝑚
𝐼𝑚 − 𝐷𝑚𝐴(𝜆)

⊗

1
𝜇
𝐵−1 (1)𝐵(𝜇) − 𝐼𝑛
2!
=
1
𝑚

1 + 𝑚Tr

(𝐷𝑚𝐴(𝜆))2

− 2Tr (𝐷𝑚𝐴(𝜆))
 
1
𝜇2 Tr

(𝑀𝐵
𝜇
)2

+ Tr(𝐼𝑛) − 2
𝜇
Tr

𝑀𝐵
𝜇

=
1
𝑚
𝑂(𝑚) (𝑛 + 𝑂(1) + 𝑛 − 2(𝑛 + 𝑂(1)))
= 𝑂(1) as 𝑚, 𝑛 → ∞,
where 𝑀𝐵
𝜇 = 𝐵−1 (1)𝐵(𝜇). Thus, 𝐽𝑚𝑛 = 𝑂𝑝 (1) as 𝑚, 𝑛 → ∞. Hence,
Õ𝑚
𝑖=1
Δ𝑢𝑖 log 𝑝𝐵
𝑛
(𝜎2𝜇, 1; 𝑢𝑖)
𝑝𝐵
𝑛 (𝜎2, 𝜇; 𝑢𝑖)
= −1
2
log
|𝜎2𝜇𝐵(1) |
|𝜎2𝐵(𝜇) |
− 1
2𝑥𝑇

𝐷𝑚 ⊗

1
𝜎2𝜇
𝐵−1 (1) − 1
𝜎2 𝐵−1 (𝜇)

𝑥
=
1
𝑚
log 𝑝𝐽𝑚
𝑛
(𝜎2𝜇, 𝜆, 1)
𝑝𝐽𝑚
𝑛 (𝜎2, 𝜆, 𝜇)
+ 1
2 𝐽𝑚𝑛
= 𝑂𝑝 (1) (3.21)
as 𝑚, 𝑛 → ∞. Moreover, it is implied by (3.20) that
𝐸 log 𝑝𝐵
𝑛
(𝜎2𝜇, 1; 𝑢1)
𝑝𝐵
𝑛 (𝜎2, 𝜇; 𝑢1)
= 𝑂(1). (3.22)
47
As a result of (3.17-3.22), as 𝑚 → ∞ and 𝑛 → ∞,
√
𝑛(d𝜎2𝜇 − 𝜎2𝜇) 𝑑 → 𝑁

0,

2
𝜆
− 1 − 𝑒−2𝜆
𝜆2

(𝜎2𝜇)2

. (3.23)
Similarly, for any 0 ≤ 𝑡 ≤ 1, the joint density of 𝑦·𝑡 := (𝑋(𝑢1, 𝑡), 𝑋(𝑢2, 𝑡), . . . , 𝑋(𝑢𝑚, 𝑡)) is
𝑝𝐴𝑚
(𝜎2, 𝜆; 𝑡) := (2𝜋𝜎2)−𝑚/2|𝐴(𝜆) |−1/2 exp

− 1
2𝜎2 𝑦𝑇 ·𝑡 𝐴−1 (𝜆)𝑦·𝑡

. (3.24)
Denote by ˜
𝐷
𝑛 the 𝑛 × 𝑛 diagonal matrix with ( ˜
𝐷
𝑛)𝑖𝑖 = Δ𝑡𝑖 , 𝑖 = 1, 2, . . . , 𝑛. Then for any 𝑚, 𝑛 ∈ Z+,
√
𝑚(d𝜎2𝜆 − 𝜎2𝜆) =
2𝜎2
√
𝑚
𝐸 log 𝑝𝐴𝑚
(𝜎2𝜆, 1; 𝑡1)
𝑝𝐴 𝑚
(𝜎2, 𝜆; 𝑡1)
−
Õ𝑛
𝑖=1
Δ𝑡𝑖 log 𝑝𝐴𝑚
(𝜎2𝜆, 1; 𝑡𝑖)
𝑝𝐴 𝑚
(𝜎2, 𝜆; 𝑡𝑖)
!
+
√
𝑚

𝐸d𝜎2𝜆 − 𝜎2𝜆

+ 𝜆
√
𝑚

𝑥𝑇

𝐴−1 ⊗ ˜
𝐷
𝑛

𝑥 − 𝐸𝑥𝑇

𝐴−1 ⊗ ˜
𝐷
𝑛

𝑥

= 𝑜𝑝 (1) + 𝑜(1) + 𝜎2𝜆

𝑥𝑇 ˜
𝐻
𝑚𝑥 − 𝐸𝑥𝑇 ˜
𝐻
𝑚𝑥

as 𝑚, 𝑛 → ∞, (3.25)
where ˜
𝐻
𝑚 := 1 √
𝑚
􀀀
𝐴−1(𝜆) ⊗ ˜
𝐷
𝑛

(𝐴(𝜆) ⊗ 𝐵(𝜇)). Thus,
√
𝑚(d𝜎2𝜆 − 𝜎2𝜆) 𝑑 → 𝑁

0,

2
𝜇
− 1 − 𝑒−2𝜇
𝜇2

(𝜎2𝜆)2

(3.26)
as 𝑚, 𝑛 → ∞.
3.3 Separable Estimation
Based on the results presented in Section 3.2, define estimators
ˆ𝜆
=
[𝜎2𝜆𝜇
d𝜎2𝜇
=
𝑥𝑇 􀀀
𝐴−1 (1) ⊗ 𝐵−1 (1)

𝑥
𝑚
Í𝑚
𝑖=1 𝑥𝑇
𝑖·𝐵−1(1)𝑥𝑖·Δ𝑢𝑖
, (3.27)
𝜇ˆ =
[𝜎2𝜆𝜇
d𝜎2𝜆
=
𝑥𝑇 􀀀
𝐴−1(1) ⊗ 𝐵−1(1)

𝑥
𝑛
Í𝑛𝑗
=1 𝑥𝑇 · 𝑗 𝐴−1 (1)𝑥· 𝑗Δ𝑡 𝑗
, (3.28)
and
ˆ𝜎2 =
d𝜎2𝜆d𝜎2𝜇
[𝜎2𝜆𝜇
=
Í𝑛𝑗
=1 𝑥𝑇 · 𝑗 𝐴−1 (1)𝑥· 𝑗Δ𝑡 𝑗
 􀀀Í𝑚
𝑖=1 𝑥𝑇
𝑖·𝐵−1 (1)𝑥𝑖·Δ𝑢𝑖

𝑥𝑇
􀀀
𝐴−1 (1) ⊗ 𝐵−1 (1)

𝑥
, (3.29)
where d𝜎2𝜇, d𝜎2𝜆, and [𝜎2𝜆𝜇 are defined in (3.5-3.7), matrices 𝐴 and 𝐵 are defined in (3.4). The
main results of this chapter are regarding the joint asymptotic normality and the strong consistency
of ˆ
𝜆
, 𝜇ˆ, and 𝜎ˆ 2.
48
Theorem 9. Under model (3.2), if 𝑚/𝑛 → 𝑟 as 𝑛 → ∞, then
√
𝑚
©­­­­­
«
ˆ𝜆
− 𝜆
𝜇ˆ − 𝜇
ˆ𝜎2 − 𝜎2
ª®®®®®
¬
𝑑 →
𝑁
©­­­­­
«
0,
©­­­­­
«
𝑟𝐶𝜆 0 −𝑟𝜎2 𝐶𝜆
𝜆
0 𝐶𝜇 −𝜎2 𝐶𝜇
𝜇
−𝑟𝜎2 𝐶𝜆
𝜆
−𝜎2 𝐶𝜇
𝜇 𝜎4

𝐶𝜇
𝜇2 + 𝑟 𝐶𝜆
𝜆2

ª®®®®®
¬
ª®®®®®
¬
as 𝑛 → ∞, (3.30)
where 𝐶𝜆 = 2𝜆 − 1 + 𝑒−2𝜆 and 𝐶𝜇 = 2𝜇 − 1 + 𝑒−2𝜇.
Proof. It was shown in the proof of Theorem 8 that when 𝑚/𝑛 → 𝑟 and 𝑛 → ∞,
√
𝑚
©­­­­­
«
d𝜎2𝜇 − 𝜎2𝜇
d𝜎2𝜆 − 𝜎2𝜆
[𝜎2𝜆𝜇 − 𝜎2𝜆𝜇
ª®®®®®
¬
=
©­­­­­
«
√
𝑟𝜎2𝜇

𝑥𝑇 𝐷𝑚⊗𝐵−1 (𝜇)
𝜎2
√
𝑛
𝑥 − 𝐸𝑥𝑇 𝐷𝑚⊗𝐵−1 (𝜇)
𝜎2
√
𝑛
𝑥

+ 𝑜𝑝 (1)
𝜎2𝜆

𝑥𝑇 𝐴−1 (𝜆)⊗ ˜
𝐷
𝑛
𝜎2
√
𝑚
𝑥 − 𝐸𝑥𝑇 𝐴−1 (𝜆)⊗ ˜
𝐷
𝑛
𝜎2
√
𝑚
𝑥

+ 𝑜𝑝 (1)
𝜎√2𝜆𝜇
𝑛

𝑥𝑇 𝐴−1 (𝜆)⊗𝐵−1 (𝜇)
𝜎2
√
𝑚𝑛
𝑥 − 𝐸𝑥𝑇 𝐴−1 (𝜆)⊗𝐵−1 (𝜇)
𝜎2
√
𝑚𝑛
𝑥

+ 𝑂

√1
𝑛

+ 𝑜𝑝 (1)
ª®®®®®
¬
= 𝑉 − 𝐸𝑉 + 𝑜𝑝 (1),
where
𝑉 =

√
𝑟𝜎2𝜇

𝑥𝑇 𝐷𝑚⊗𝐵−1 (𝜇)
𝜎2
√
𝑛
𝑥

, 𝜎2𝜆

𝑥𝑇 𝐴−1 (𝜆)⊗ ˜
𝐷
𝑛
𝜎2
√
𝑚
𝑥

, 𝜎√2𝜆𝜇
𝑛

𝑥𝑇 𝐴−1 (𝜆)⊗𝐵−1 (𝜇)
𝜎2
√
𝑚𝑛
𝑥
𝑇
.
For any 𝛾 = (𝛾1, 𝛾2, 𝛾3)𝑇 ∈ R3
>0,
𝛾𝑇𝑉 = 𝑥𝑇

𝛾1
√
𝑟𝜎2𝜇
𝐷𝑚 ⊗ 𝐵−1(𝜇)
𝜎2
√
𝑛
+ 𝛾2𝜎2𝜆
𝐴−1(𝜆) ⊗ ˜
𝐷
𝑛
𝜎2
√
𝑚
+ 𝛾3
𝜎2𝜆𝜇
√
𝑛
𝐴−1 (𝜆) ⊗ 𝐵−1(𝜇)
𝜎2
√
𝑚𝑛

𝑥
:= 𝑥𝑇𝑀𝑚𝑛𝑥.
It was revealed in the proof of Theorem 8 that
˜𝑀
𝑚𝑛 := 𝑀𝑚𝑛𝜎2𝐴(𝜆) ⊗ 𝐵(𝜇)) (3.31)
= 𝛾1
√
𝑟𝜎2𝜇𝐻𝑛 + 𝛾2𝜎2𝜆 ˜
𝐻
𝑚 + 𝛾3
𝜎2𝜆𝜇
𝑛
√
𝑚
𝐼𝑚𝑛, (3.32)
where matrices 𝐻𝑛 and ˜
𝐻
𝑚 satisfy that as 𝑚, 𝑛 → ∞,
Tr(𝐻2
𝑛
) → 1
𝜆
− 1 − 𝑒−2𝜆
2𝜆2 , Tr( ˜
𝐻
2𝑚
) → 1
𝜇
− 1 − 𝑒−2𝜇
2𝜇2 ;
Tr(𝐻𝑘
𝑛
) = 𝑜(1), Tr( ˜
𝐻
𝑘𝑚
) = 𝑜(1), ∀𝑘 ≥ 3.
49
Moreover, ∀𝑚, 𝑛 ∈ 𝑍+,
Tr(𝐻𝑛) =
1 √
𝑛
Tr(𝐷𝑚𝐴(𝜆) ⊗ 𝐼𝑛) =
√
𝑛,
Tr( ˜
𝐻
𝑚) =
1 √
𝑚
Tr(𝐼𝑚 ⊗ ˜
𝐷
𝑛𝐵(𝜇)) =
√
𝑚;
Tr(𝐻𝑛 ˜
𝐻
𝑚) =
1 √
𝑚𝑛
Tr(𝐷𝑚𝐴(𝜆) ⊗ ˜
𝐷
𝑛𝐵(𝜇)) =
Tr(𝐷𝑚𝐴(𝜆))Tr( ˜
𝐷
𝑛𝐵(𝜇))
√
𝑚𝑛
=
1 √
𝑚𝑛
;
Tr(𝐻𝑘
𝑛 ˜
𝐻
𝑚) =
Tr
􀀀
(𝐷𝑚𝐴(𝜆))𝑘 
Tr( ˜
𝐷
𝑛𝐵(𝜇))
√
𝑛𝑘𝑚
=
Tr(𝐻𝑘
𝑛
)
𝑛
√
𝑚
, Tr(𝐻𝑛 ˜
𝐻
𝑘𝑚
) =
Tr( ˜
𝐻
𝑘𝑚
)
𝑚
√
𝑛
, ∀𝑘 ≥ 2;
Tr

(𝐻𝑛 ˜
𝐻
𝑚)2

=
Tr
􀀀
(𝐷𝑚𝐴(𝜆))2
Tr
􀀀
( ˜
𝐷
𝑛𝐵(𝜇))2
𝑚𝑛
=
Tr(𝐻2
𝑛
)Tr( ˜
𝐻
2𝑚
)
𝑚𝑛
.
Thus when 𝑚/𝑛 → 𝑟 and 𝑛 → ∞,
Tr( ˜
𝑀
2𝑚
𝑛
) = (𝛾1
√
𝑟𝜎2𝜇)2Tr(𝐻2
𝑛
) + (𝛾2𝜎2𝜆)2Tr( ˜
𝐻
2𝑚
) + 𝑂(Tr(𝐻𝑛 ˜
𝐻
𝑚)) + 𝑂

Tr(𝐻𝑛)
𝑛
√
𝑚

+ 𝑂

Tr( ˜
𝐻
𝑚)
𝑛
√
𝑚

+ 𝑂

1
𝑛

→ (𝛾1
√
𝑟𝜎2𝜇)2 2𝜆 − 1 + 𝑒−2𝜆
2𝜆2
+ (𝛾2𝜎2𝜆)2 2𝜇 − 1 + 𝑒−2𝜇
2𝜇2 , (3.33)
Tr( ˜
𝑀
4𝑚
𝑛
) = 𝑂

Tr(𝐻4
𝑛
)

+ 𝑂

Tr( ˜
𝐻
4𝑚
)

+ 𝑂

Tr( (𝐻𝑛 ˜
𝐻
𝑚)2)

+ 𝑂

Tr(𝐻3
𝑛 ˜
𝐻
𝑚)

+ 𝑂

Tr(𝐻𝑛 ˜
𝐻
3𝑚
)

+ 1
𝑛
√
𝑚

𝑂

Tr(𝐻3
𝑛
)

+ 𝑂

Tr( ˜
𝐻
3𝑚
)

+ 𝑂

Tr(𝐻2
𝑛 ˜
𝐻
𝑚)

+ 𝑂

Tr(𝐻𝑛 ˜
𝐻
2𝑚
)

+ 1
𝑛2𝑚

𝑂

Tr(𝐻2
𝑛
)

+ 𝑂
􀀀
Tr(𝐻𝑛 ˜
𝐻
𝑚)

+ 𝑂

Tr( ˜
𝐻
2𝑚
)

+ 1
𝑛3
√
𝑚3
􀀀
𝑂 (Tr(𝐻𝑛)) + 𝑂
􀀀
Tr( ˜𝐻𝑚)

→ 0, (3.34)
Hence, the convergence of the moment generating function for 𝛾𝑇 (𝑉 −𝐸𝑉) implies that it is asymptotically
Gaussian with zero mean and the variance equals
2 lim
𝑚/𝑛→𝑟,𝑛→∞
Tr( ˜
𝑀
2𝑚
𝑛
) = 2

𝑟 (𝛾1𝜎2𝜇)2 2𝜆 − 1 + 𝑒−2𝜆
2𝜆2
+ (𝛾2𝜎2𝜆)2 2𝜇 − 1 + 𝑒−2𝜇
2𝜇2

.
By the Cramér–Wold theorem, when 𝑚/𝑛 → 𝑟 as 𝑛 → ∞,
√
𝑚
©­­­­­
«
d𝜎2𝜇 − 𝜎2𝜇
d𝜎2𝜆 − 𝜎2𝜆
[𝜎2𝜆𝜇 − 𝜎2𝜆𝜇
ª®®®®®
¬
𝑑 →
𝑁
©­­­­­
«
0,
©­­­­­
«
2𝑟 (𝜎2𝜇)2 2𝜆−1+𝑒−2𝜆
2𝜆2 0 0
0 2(𝜎2𝜆)2 2𝜇−1+𝑒−2𝜇
2𝜇2 0
0 0 0
ª®®®®®
¬
ª®®®®®
¬
. (3.35)
50
Define function 𝑔 : R3
>0
↦→ R3
>0 as
𝑔(𝑥, 𝑦, 𝑧) = (𝑧/𝑥, 𝑧/𝑦, 𝑥𝑦/𝑧), ∀(𝑥, 𝑦, 𝑧) ∈ R3
>0. (3.36)
Then the Jacobian matrix of 𝑔 is
𝐽𝑔 (𝑥, 𝑦, 𝑧) =
©­­­­­
«
−𝑧/𝑥2 0 1/𝑥
0 −𝑧/𝑦2 1/𝑦
𝑦/𝑧 𝑥/𝑧 −𝑥𝑦/𝑧2
ª®®®®®
¬
.
It follows from the definition that
𝑔

d𝜎2𝜇,d𝜎2𝜆,[𝜎2𝜆𝜇

=

ˆ𝜆
, 𝜇ˆ, 𝜎ˆ 2

,
𝑔

𝜎2𝜇, 𝜎2𝜆, 𝜎2𝜆𝜇

=

𝜆, 𝜇, 𝜎2

,
𝐽𝑔

𝜎2𝜇, 𝜎2𝜆, 𝜎2𝜆𝜇

©­­­­­
«
2𝑟 (𝜎2𝜇)2 2𝜆−1+𝑒−2𝜆
2𝜆2 0 0
0 2(𝜎2𝜆)2 2𝜇−1+𝑒−2𝜇
2𝜇2 0
0 0 0
ª®®®®®
¬
𝐽𝑔

𝜎2𝜇, 𝜎2𝜆, 𝜎2𝜆𝜇
𝑇
=
©­­­­­
«
𝑟 (2𝜆 − 1 + 𝑒−2𝜆) 0 −𝑟𝜎2
𝜆
(2𝜆 − 1 + 𝑒−2𝜆)
0 2𝜇 − 1 + 𝑒−2𝜇 −𝜎2
𝜇
(2𝜇 − 1 + 𝑒−2𝜇)
−𝑟𝜎2
𝜆
(2𝜆 − 1 + 𝑒−2𝜆) −𝜎2
𝜇
(2𝜇 − 1 + 𝑒−2𝜇) 𝜎4

2𝜇−1+𝑒−2𝜇
𝜇2 + 𝑟 2𝜆−1+𝑒−2𝜆
𝜆2

ª®®®®®
¬
.
Thus when 𝑚/𝑛 → 𝑟 as 𝑛 → ∞,
√
𝑚
©­­­­­
«
ˆ𝜆
− 𝜆
𝜇ˆ − 𝜇
ˆ𝜎2 − 𝜎2
ª®®®®® ¬
𝑑 →
𝑁
©­­­­­
«
0,
©­­­­­
«
𝑟 (2𝜆 − 1 + 𝑒−2𝜆) 0 −𝑟𝜎2
𝜆
(2𝜆 − 1 + 𝑒−2𝜆)
0 2𝜇 − 1 + 𝑒−2𝜇 −𝜎2
𝜇
(2𝜇 − 1 + 𝑒−2𝜇)
−𝑟𝜎2
𝜆
(2𝜆 − 1 + 𝑒−2𝜆) −𝜎2
𝜇
(2𝜇 − 1 + 𝑒−2𝜇) 𝜎4

2𝜇−1+𝑒−2𝜇
𝜇2 + 𝑟 2𝜆−1+𝑒−2𝜆
𝜆2

ª®®®®®
¬
ª®®®®®
¬
.
The proof is finished using the multivariate delta method.
51
Remark 2. The estimators ˆ
𝜆
and 𝜇ˆ are asymptotically independent. This is due to the zero entries
of 𝐽𝑔 as well as the asymptotic independence of d𝜎2𝜇 and d𝜎2𝜆, which is based on the fact that
Tr(𝐷𝑚𝐴(𝜆)) = Tr( ˜
𝐷
𝑛𝐵(𝜇)) = 1, ∀𝑚, 𝑛.
Besides the asymptotic normality, estimators ˆ
𝜆
, 𝜇ˆ, and 𝜎ˆ 2 are also strongly consistent.
Theorem 10. Under model (3.2), as 𝑚, 𝑛 → ∞,

ˆ𝜆
, 𝜇ˆ, 𝜎ˆ 2
 𝑎→.𝑠.

𝜆, 𝜇, 𝜎2

. (3.37)
Proof. Since the function 𝑔 defined in (3.36) is a continuous function, the continuous mapping
theorem makes it suffice to prove

d𝜎2𝜇,d𝜎2𝜆,[𝜎2𝜆𝜇
 𝑎→.𝑠.

𝜎2𝜇, 𝜎2𝜆, 𝜎2𝜆𝜇

as 𝑚, 𝑛 → ∞.
For any (𝜆0, 𝜇0) ∈ R2
>0, there always exists a compact region C in R2
>0 that contains (𝜆0, 𝜇0) and
(1, 1) as its interior points. Therefore (4.13) and (4.14) in the proof of Theorem 1 in Ying (1993)
both hold. Namely, as 𝑛 → ∞,
𝑥𝑇1·𝐵−1(1)𝑥1· +
Õ𝑚
𝑖=2
(𝑥𝑖· − 𝑒−𝜀𝑖𝑥(𝑖−1)·)𝑇𝐵−1 (1) (𝑥𝑖· − 𝑒−𝜀𝑖𝑥(𝑖−1)·)
1 − 𝑒−2𝜀𝑖
𝑎.𝑠. = 𝜆0𝜇0𝜎2
0
Õ𝑚
𝑖=2
Õ𝑛
𝑘=2
𝜔2
𝑖𝑘
+ [𝜆0𝜎2
0
+ 𝜆0𝜇0𝜎2
0
(1 − 𝜇0) +
𝜆0(1 − 𝜇0)2𝜎2
0
2
]𝑚
+ [𝜇0𝜎2
0
+ 𝜆0𝜇0𝜎2
0
(1 − 𝜆0) +
𝜇0(1 − 𝜆0)2𝜎2
0
2
]𝑛 + 𝑜(𝑛).
52
As a result, as 𝑚, 𝑛 → ∞,
𝑙𝑚,𝑛 (1, 1, 𝜎2) − 𝑙𝑚,𝑛 (1, 1, 𝜆0𝜇0𝜎2
0
)
=(1 + 𝑚 − 1 + 𝑛 − 1 + (𝑚 − 1) (𝑛 − 1)) log( 𝜎2
𝜆0𝜇0𝜎2
0
)
+ ( 1
𝜎2
− 1
𝜆0𝜇0𝜎2
0
) [𝑥𝑇1 𝐵−1(1)𝑥1 +
Õ𝑚
𝑖=2
(𝑥𝑖 − 𝑒−𝜀𝑖𝑥𝑖−1)𝑇𝐵−1(1) (𝑥𝑖 − 𝑒−𝜀𝑖𝑥𝑖−1)
1 − 𝑒−2𝜀𝑖
]
=(
𝜆0𝜇0𝜎2
0
𝜎2
− 1)
Õ𝑚
𝑖=2
Õ𝑛
𝑘=2
𝜔2
𝑖𝑘
− (𝑚 − 1) (𝑛 − 1) log(
𝜆0𝜇0𝜎2
0
𝜎2
)
+ ( 1
𝜎2
− 1
𝜆0𝜇0𝜎2
0
) [𝜆0𝜎2
0
+ 𝜆0𝜇0𝜎2
0
(1 − 𝜇0) +
𝜆0 (1 − 𝜇0)2𝜎2
0
2
]𝑚
+ ( 1
𝜎2
− 1
𝜆0𝜇0𝜎2
0
) [𝜇0𝜎2
0
+ 𝜆0𝜇0𝜎2
0
(1 − 𝜆0) +
𝜇0(1 − 𝜆0)2𝜎2
0
2
]𝑛
+ (𝑚 + 𝑛 − 1) log( 𝜎2
𝜆0𝜇0𝜎2
0
) + 𝑜(𝑛)
𝑎.𝑠. = (𝑚 − 1) (𝑛 − 1) (
𝜆0𝜇0𝜎2
0
𝜎2
− 1 − log(
𝜆0𝜇0𝜎2
0
𝜎2
)) + 𝑜(𝑚𝑛), (3.38)
where the last equality holds since
Í𝑚
𝑖=2
Í𝑛𝑘
=2
(𝜔2
𝑖𝑘
− 1) = 𝑜(𝑚𝑛) almost surely. Thus,
𝑙𝑚,𝑛 (1, 1, 𝜎2) − 𝑙𝑚,𝑛 (1, 1, 𝜆0𝜇0𝜎2
0
) → ∞ a.s.
as 𝑚, 𝑛 → ∞ if 𝜎2 ≠ 𝜆0𝜇0𝜎2
0 . Together with Lemma 4 in Ying (1991), the result above entails
argmin
𝜎2
𝑙𝑚,𝑛 (1, 1, 𝜎2) 𝑎→.𝑠. 𝜆0𝜇0𝜎2
0 (3.39)
as 𝑚, 𝑛 → ∞. Hence as 𝑚, 𝑛 → ∞,[𝜎2𝜆𝜇
𝑎→.𝑠. 𝜎2𝜆𝜇.
It remains to prove that as 𝑚, 𝑛 → ∞, d𝜎2𝜇
𝑎→.𝑠. 𝜎2𝜇 and d𝜎2𝜆
𝑎→.𝑠. 𝜎2𝜆. It follows from the
definition that under model (3.2),
d𝜎2𝜇 =
1
𝑛
𝑥𝑇

𝐷𝑚 ⊗ 𝐵−1 (1)

𝑥
𝑑 =
𝜖𝑇Λ𝑚𝑛𝜖,
where 𝜖 ∼ 𝑁(0, 𝐼𝑚𝑛) and Λ𝑚𝑛 is a diagonal matrix whose diagonal entries are eigenvalues of the
matrix
𝜎2
𝑛

(𝐴1/2(𝜆))𝑇𝐷𝑚𝐴1/2 (𝜆)

⊗

(𝐵1/2 (𝜇))𝑇𝐵−1 (1)𝐵1/2(𝜇)

.
53
By the result of Proposition 5,
||Λ𝑚𝑛 ||2
𝐹 =Tr

𝜎2
𝑛

(𝐴1/2 (𝜆))𝑇𝐷𝑚𝐴1/2(𝜆)

⊗

(𝐵1/2(𝜇))𝑇𝐵−1(1)𝐵1/2 (𝜇)
2!
=
1
2
Var

d𝜎2𝜇

=𝑂(𝑛−1) as 𝑚, 𝑛 → ∞.
Moreover, ||Λ𝑚𝑛 ||2 ≤ ||Λ𝑚𝑛 ||𝐹 = 𝑂(𝑛−1/2) as 𝑚, 𝑛 → ∞. Thus, the Hanson-Wright inequality
implies that for sufficiently large 𝑛, ∃𝐶0 > 0 such that
𝑃

d𝜎2𝜇 − 𝐸d𝜎2𝜇

≥ 𝜉

≤ 2 exp
−𝐶 min
(
𝜉
||Λ𝑚𝑛 ||2
,
𝜉2
||Λ𝑚𝑛||2
𝐹
)!
≤ 2 exp(−𝐶0
√
𝑛𝜉), ∀𝜉 > 0, (3.40)
where 𝐶 > 0 is an absolute constant. It hence follows from the Borel–Cantelli lemma that d𝜎2𝜇 −
𝐸d𝜎2𝜇
𝑎→.𝑠. 0 as 𝑚, 𝑛 → ∞. By the results of Proposition 4,
d𝜎2𝜇 − 𝜎2𝜇 = d𝜎2𝜇 − 𝐸d𝜎2𝜇 + 𝐸d𝜎2𝜇 − 𝜎2𝜇
𝑎→.𝑠. 0 (3.41)
as 𝑚, 𝑛 → ∞.
In a similar manner, it can be proved thatd𝜎2𝜆
𝑎→.𝑠. 𝜎2𝜆 as 𝑚, 𝑛 → ∞. This finishes the proof.
3.4 Simulation
Let 𝜆 = 0.5, 𝜇 = 10, 𝜎2 = 4. For each value of the sample size 𝑛 = 500, 600, . . . , 2000 and
𝑚 = 0.5𝑛, we set irregular sampling locations as 𝑢0 = 𝑡0 = 0, 𝑢𝑚 = 𝑡𝑛 = 1, and
(𝑢𝑖 , 𝑡 𝑗 ) =

𝑖
𝑚
+ 𝑈𝑖𝑢
,
𝑗
𝑛
+ 𝑈𝑗
𝑡

, ∀0 < 𝑖 < 𝑚, 0 < 𝑗 < 𝑛,
where𝑈𝑖𝑢
𝑖.𝑖∼.𝑑. 𝑈

− 1
2𝑚, 1
2𝑚

and𝑈𝑗
𝑡
𝑖.𝑖∼.𝑑. 𝑈

− 1
2𝑛 , 1
2𝑛

are independent uniformly distributed random
variables. Given sampling locations, we run 1000 realizations and calculate ˆ
𝜆
, 𝜇ˆ, and 𝜎ˆ 2 as defined
in Section 3.3. One realization when 𝑛 = 500 is shown in Figure 3.1. The averaged absolute value
54
0.2 0.4 0.6 0.8 1.0
0.2 0.4 0.6 0.8 1.0
u
t
−6
−4
−2
0
2
4
Figure 3.1A simulated OU field with 𝑚 = 250 and 𝑛 = 500.
Table 3.1Empirical quantiles of standardized bias when estimating 𝜆.
𝜆
𝑁(0, 1)
𝑛 500 1000 2000
5% -1.4462 -1.5188 -1.4547 -1.6448
25% -0.6030 -0.5308 -0.5681 -0.6744
50% 0.1559 0.0893 0.0763 0
75% 0.9224 0.7342 0.7039 0.6744
95% 1.9886 1.8533 1.7377 1.6448
of bias for each sample size and the histogram of bias when 𝑛 = 2000 are shown in Figure 3.2. For
𝑛 = 500, 1000, 2000, some empirical quantiles of
√
𝑚(ˆ
𝜆
− 𝜆)
p
𝑟 (2𝜆 − 1 + 𝑒−2𝜆)
,
√
𝑚(𝜇ˆ − 𝜇)
p
2𝜇 − 1 + 𝑒−2𝜇
, and
√
𝑚( ˆ𝜎2 − 𝜎2)
r
𝜎4

2𝜇−1+𝑒−2𝜇
𝜇2 + 𝑟 2𝜆−1+𝑒−2𝜆
𝜆2

are shown in Tables 3.1-3.3.
3.5 Discussion
We proposed estimators for covariance parameters of an anisotropic Ornstein-Uhlenbeck field
observed on [0, 1]2. The estimators ˆ
𝜆
, 𝜇ˆ, and 𝜎ˆ 2 formulated in Section 3.3 are strongly consistent
and have lower computational complexity than the MLEs of 𝜆, 𝜇, and 𝜎2. As the sample size goes
to infinity, the estimators we proposed asymptotically follow normal distribution, but have higher
55
400 600 800 1000
0.012 0.016 0.020 0.024
l
m
bias
m(l^
- l)
r(2l - 1 + e-2l)
Density
−3 −2 −1 0 1 2 3 4
0.0 0.1 0.2 0.3 0.4
400 600 800 1000
0.12 0.16 0.20 0.24
m
m
bias
m(m^ - m)
2m - 1 + e-2m
Density
−4 −2 0 2 4
0.0 0.1 0.2 0.3 0.4
400 600 800 1000
0.10 0.12 0.14 0.16 0.18 0.20
s2
m
bias
m(s2 ^ - s2)
s2 (2m - 1 + e-2m) m2 + r(2l - 1 + e-2l) l2
Density
−3 −2 −1 0 1 2 3
0.0 0.1 0.2 0.3 0.4
Figure 3.2The plots in the first row present averaged absolute values of bias for
𝑛 = 500, 600, . . . , 2000 and 𝑚 = 𝑛/2 among 1000 realizations. The second row of plots present
the empirical distributions of bias with 1000 realizations when 𝑛 = 2000 and 𝑚 = 1000, where the
red curve indicates the density function of 𝑁(0, 1).
Table 3.2Empirical quantiles of standardized bias when estimating 𝜇.
𝜇
𝑁(0, 1)
𝑛 500 1000 2000
5% -1.9248 -1.8978 -1.6819 -1.6448
25% -1.0806 -0.9460 -0.8577 -0.6744
50% -0.3960 -0.3193 -0.2047 0
75% 0.3473 0.3897 0.4762 0.6744
95% 1.4002 1.3349 1.5174 1.6448
Table 3.3Empirical quantiles of standardized bias when estimating 𝜎2.
𝜎2
𝑁(0, 1)
𝑛 500 1000 2000
5% -1.6784 -1.5597 -1.6782 -1.6448
25% -0.7708 -0.6583 -0.6788 -0.6744
50% -0.0558 -0.0277 0 0
75% 0.7103 0.6323 0.6719 0.6744
95% 1.7328 1.5499 1.4902 1.6448
56
variance compared with the MLEs studied by Ying (1993). This presents a trade-off between the
computational cost and the estimation accuracy.
The sampling grid based on which ˆ
𝜆
, 𝜇ˆ, and 𝜎ˆ 2 are formulated is defined by lines parallel to
the coordinate axes. For a significantly anisotropic OU field such as the one shown in Figure 3.1,
the coordinate axes are distinguishable. When values of 𝜆 and 𝜇 are close, however, it could be
difficult to determine directions along which observations should be taken. It is thus of interest to
study the properties of estimators when sampling directions are not parallel to the coordinate axes.
The main results presented in this chapter focus on the asymptotic behaviors of the estimators. It
would also be interesting to study their finite-sample distributions and measure the distance between
a finite-sample distribution and the asymptotic distribution. The statistical inference for parameters
𝜆, 𝜇, and 𝜎2 is also worth analyzing. The exploration of these topics is reserved for future research
work.
57
CHAPTER 4
VECCHIA APPROXIMATION
4.1 Introduction
Consider a zero-mean Gaussian process 𝑋 with the Matérn covariance function
𝐶𝑜𝑣(𝑋(𝑡), 𝑋(𝑡 + 𝑑)) = 𝐾(𝑑) = 𝜎2 (𝜃𝑑)𝜈
Γ(𝜈)2𝜈−1
K𝜈 (𝜃𝑑), (4.1)
where 𝜃 > 0, 𝜈 > 0, 𝜎2 > 0, Γ is the gamma function, and K𝜈 is the modified Bessel function of the
second kind. Denote by 𝑋𝑛 = (𝑋(𝑡𝑛
1
), 𝑋(𝑡𝑛
2
), . . . , 𝑋(𝑡𝑛
𝑛
)) the observations of 𝑋 with sample size 𝑛.
When 𝜈 ≠ 1
2 , 𝑋 is not Markovian and the sparse precision matrix of 𝑋𝑛 discussed in Chapter 3 is not
valid. It is thus necessary to study other approaches to reduce the computational cost of the MLE.
The existing approaches to achieve computational efficiency include covariance tapering (Furrer
et al., 2006; Kaufman et al., 2008; Du et al., 2009), Gaussian Markov random fields representation
(Rue and Held, 2005; Lindgren et al., 2011), multiresolution approximation (Nychka et al., 2015;
Katzfuss, 2017), etc.
The Vecchia approximation is a method to reduce the computational burden through sparse
precision matrices. Write the joint density function of 𝑋(𝑡𝑛
1
), 𝑋(𝑡𝑛
2
), . . . , 𝑋(𝑡𝑛
𝑛
) as
𝑓𝑛 = 𝑓𝑋(𝑡𝑛
1
)
Ö𝑛
𝑖=2
𝑓𝑋(𝑡𝑛
𝑖
) |𝑋(𝑡𝑛
𝑖−1
)...𝑋(𝑡𝑛
1
) .
The Vecchia’s method (Vecchia, 1988) approximates 𝑓𝑛 by
ˆ 𝑓𝑛 = 𝑓𝑋(𝑡𝑛
1
)
Ö𝑛
𝑖=2
𝑓𝑋(𝑡𝑛
𝑖
) |𝑋(𝑡𝑛
𝑖−1
)...𝑋(𝑡𝑛
1∨(𝑖−𝑘)
) (4.2)
for some 𝑘 ≪ 𝑛, which makes the precision matrix of 𝑋𝑛 a band matrix and could thus significantly
reduce the computational complexity. The accuracy of Vecchia approximation has been discussed
in both theoretical and practical aspects (Stein et al., 2004; Datta et al., 2016; Guinness, 2018; Finley
et al., 2019; Zhang et al., 2021; Cao et al., 2022). Under a more general framework proposed
by Katzfuss and Guinness (2021), where the conditioning vector contains both observed data and
latent variables, the nearest-neighbor Gaussian process, latent autoregressive process, multiresolu-
58
tion approximation, and many other popular Gaussian process approximation methods are special
cases of the Vecchia approach.
In the remainder of this chapter, we focus on the standard Vecchia approximation and estimate
the scale parameter in the Matérn covariance function by MLE solved from the approximated likelihood.
The effects of the misspecified range parameter and the conditioning variables on the bias
are discussed in Section 4.2, and simulation results are presented in Section 4.3.
4.2 Maximum Likelihood Estimator for 𝜎2
Under a regular sampling design on fixed domain, we have 𝑡𝑛
𝑖 = 𝑖/𝑛 for 𝑖 = 1, 2, . . . , 𝑛. When
𝜈 is known, the expectation of MLE for 𝜎2 from Vecchia approximation satisfies the following
results.
Proposition 6. Denote by ˆ𝜎2 the MLE for 𝜎2 from Vecchia approximation with 𝜈 known and 𝜃
replaced by some fixed 𝜃0 > 0. When 𝑘 = 1 in (4.2), 𝐸 ˆ𝜎2 = 𝜎2 for any 𝑛 ≥ 2 if 𝜃0 = 𝜃, and
𝐸 ˆ𝜎2 =
8>>>>>>>>
<
>>>>>>>>:
𝜎2

𝜃
𝜃0
2𝜈
+ 𝑂(𝑛2𝜈−2) + 𝑂(𝑛−1) + 𝑂(𝑛−2𝜈), 𝜈 < 1,
𝜎2

𝜃
𝜃0
2
+ 𝑂( (log 𝑛)−1), 𝜈 = 1,
𝜎2

𝜃
𝜃0
2
+ 𝑂(𝑛−1) + 𝑂(𝑛2−2𝜈), 𝜈 > 1, 𝜈 ∉ Z
as 𝑛 → ∞ if 𝜃0 ≠ 𝜃. When 𝑘 = 2 in (4.2) and 𝜃0 ≠ 𝜃,
𝐸 ˆ𝜎2 =
8>>>>>>>>>>>>>>>>
<
>>>>>>>>>>>>>>>>:
𝜎2

𝜃
𝜃0
2𝜈
+ 𝑂(𝑛2𝜈−2) + 𝑂(𝑛−1) + 𝑂(𝑛−2𝜈), 𝜈 < 1,
𝜎2

𝜃
𝜃0
2
+ 𝑂( (log 𝑛)−1), 𝜈 = 1,
𝜎2

𝜃
𝜃0
2𝜈
+ 𝑂(𝑛−1) + 𝑂(𝑛2−2𝜈) + 𝑂(𝑛2𝜈−4), 1 < 𝜈 < 2,
𝜎2

𝜃
𝜃0
4
+ 𝑂( (log 𝑛)−1), 𝜈 = 2,
𝜎2

𝜃
𝜃0
4
+ 𝜎2𝛽2
6𝜏−𝛽2

𝜃
𝜃0
2
− 1
2
+ 𝑂(𝑛−1) + 𝑂(𝑛4−2𝜈), 𝜈 > 2, 𝜈 ∉ Z
as 𝑛 → ∞, where 𝜏 = Γ(1−𝜈)
25Γ(3−𝜈) and 𝛽 = 1
4(1−𝜈) .
Proof. Denote for 1 ≤ 𝑖 ≤ 𝑛 that
𝐾0
𝑛,𝑖 =
(𝜃0𝑖/𝑛)𝜈
Γ(𝜈)2𝜈−1
K𝜈 (𝜃0𝑖/𝑛)
59
for some fixed 𝜃0 > 0, and write 𝐾𝑛,𝑖 = 𝜎−2𝐾(𝑖/𝑛). It follows from (9.6.2) and (9.6.10) in
Abramowitz and Stegun (1948) that for 𝜈 ∉ Z,
𝑥𝜈
Γ(𝜈)2𝜈−1
K𝜈 (𝑥) = 1 − 𝛼𝑥2𝜈 + 𝛽𝑥2 + 𝜏𝑥4 + 𝑂(𝑥2𝜈+2) + 𝑂(𝑥6) + 𝑂(𝑥2𝜈+4) as 𝑥 → 0, (4.3)
where 𝛼 = Γ(1−𝜈)
4𝜈Γ(1+𝜈) , 𝜏 = Γ(1−𝜈)
25Γ(3−𝜈) , and 𝛽 = 1
4(1−𝜈) . The gamma function Γ on R is defined as
Γ(𝑥) =
8>>>>
<
>>>>:
¯ ∞
0 𝑡𝑥−1𝑒−𝑡d𝑡, 𝑥 > 0,
Γ(𝑥+𝑛+1)
𝑥(𝑥+1)···(𝑥+𝑛) , 𝑥 < 0, 𝑥 ∉ Z,
(4.4)
where 𝑛 is chosen such that 𝑥+𝑛 > 0. For 𝜈 ∈ Z, it follows from (9.6.10) and (9.6.11) in Abramowitz
and Stegun (1948) that
𝑥𝜈
Γ(𝜈)2𝜈−1
K𝜈 (𝑥) =
Õ𝜈−1
𝑘=0
(−1)𝑘
(𝜈 − 𝑘 − 1)!
𝑘!(𝜈 − 1)!
 𝑥
2
2𝑘
+ 2(−1)𝜈+1
(𝜈 − 1)!
log
 𝑥
2
Õ∞
𝑘=0
1
𝑘!(𝜈 + 𝑘)!
 𝑥
2
2𝜈+2𝑘
+ (−1)𝜈
Õ∞
𝑘=1
Í𝑘ℎ =1
2
ℎ
+ Í𝑘+𝜈
ℎ=𝑘+1
1
ℎ
− 2𝛾
𝑘!(𝜈 + 𝑘)!(𝜈 − 1)!
 𝑥
2
2𝜈+2𝑘
+
(−1)𝜈
(𝜈 − 1)!𝜈!
Õ𝜈
ℎ=1
1
ℎ
− 2𝛾
!  𝑥
2
2𝜈
=
Õ∞
𝑘=0

𝑐𝜈,𝑘𝑥2𝑘 + ˜ 𝑐𝜈,𝑘𝑥2𝜈+2𝑘 log 𝑥

, (4.5)
where 𝛾 is the Euler’s constant, 𝑐𝜈,𝑘 , ˜ 𝑐𝜈,𝑘 are constants depending only on 𝜈 and 𝑘.
Case 1. When 𝑘 = 1, the approximated joint density is
ˆ 𝑓𝑛 (𝑥1, . . . , 𝑥𝑛) = (2𝜋𝜎2)−𝑛2 (1 − 𝐾2
𝑛,1
)−𝑛−1
2 exp
− 1
2𝜎2
𝑥2
1
+ 1
1 − 𝐾2
𝑛,1
Õ𝑛
𝑖=2
(𝑥𝑖 − 𝑥𝑖−1𝐾𝑛,1)2
!!
(4.6)
since 𝑋(𝑡𝑛
𝑖
) |𝑋(𝑡𝑛
𝑖−1
) ∼ 𝑁

𝑋(𝑡𝑛
𝑖−1
)𝐾𝑛,1, 𝜎2 (1 − 𝐾2
𝑛,1
)

. Hence,
log ˆ 𝑓𝑛 (𝑥1, . . . , 𝑥𝑛) |𝜃=𝜃0= −𝑛
2
log 𝜎2 − 1
2𝜎2
𝑥2
1
+ 1
1 − (𝐾0
𝑛 )2
Õ𝑛
𝑖=2
(𝑥𝑖 − 𝑥𝑖−1𝐾0
𝑛
)2
!
+ 𝐶, (4.7)
where 𝐾0
𝑛 = 𝐾0
𝑛,1, 𝐶 is a constant not depending on 𝜎2. The MLE of 𝜎2 calculated from (4.7) is
thus
ˆ𝜎2 =
1
𝑛
𝑥2
1
+ 1
1 − (𝐾0
𝑛 )2
Õ𝑛
𝑖=2
(𝑥𝑖 − 𝑥𝑖−1𝐾0
𝑛
)2
!
, (4.8)
60
where 𝑥𝑖 = 𝑋(𝑖/𝑛), 𝑖 = 1, . . . , 𝑛. Under model (4.1), there is
𝐸 ˆ𝜎2 =
𝜎2
𝑛

1 + (𝑛 − 1) 1 + (𝐾0
𝑛
)2 − 2𝐾0
𝑛𝐾𝑛,1
1 − (𝐾0
𝑛 )2

for any 𝑛 ≥ 2. Consequently, 𝐸 ˆ𝜎2 = 𝜎2 always holds when 𝜃0 = 𝜃. Cases when 𝜃0 ≠ 𝜃 are
discussed below.
When 0 < 𝜈 < 1, (4.3) implies that as 𝑛 → ∞,
1 + (𝐾0
𝑛
)2 − 2𝐾0
𝑛𝐾𝑛,1
1 − (𝐾0
𝑛 )2 =
𝜃2𝜈 + 𝛼𝑛−2𝜈𝜃2𝜈 (𝜃2𝜈 − 𝜃2𝜈
0
/2) − 𝑛2𝜈−2𝜃2𝛽/𝛼 + 𝑂(𝑛−2)
𝜃2𝜈
0
+ 𝛼𝑛−2𝜈 (𝜃4𝜈
0
/2) − 𝑛2𝜈−2𝜃2
0𝛽/𝛼 + 𝑂(𝑛−2)
=

𝜃
𝜃0
2𝜈
+ 𝑂(𝑛2𝜈−2) + 𝑂(𝑛−2𝜈) + 𝑂(𝑛−2).
Hence,
𝐸 ˆ𝜎2 = 𝜎2

𝜃
𝜃0
2𝜈
+ 𝑂(𝑛2𝜈−2) + 𝑂(𝑛−1) + 𝑂(𝑛−2𝜈).
When 𝜈 > 1 and 𝜈 ∉ Z, (4.3) implies that as 𝑛 → ∞,
1 + (𝐾0
𝑛
)2 − 2𝐾0
𝑛𝐾𝑛,1
1 − (𝐾0
𝑛 )2 =
−2𝛽𝜃2/𝑛2 + 2𝛼𝜃2𝜈/𝑛2𝜈 + 𝛽2(𝜃4
0
− 2𝜃2𝜃2
0
)/𝑛4 − 2𝜏(𝜃/𝑛)4 + 𝑂(𝑛−2−2𝜈)
−2𝛽𝜃2
0
/𝑛2 + 2𝛼𝜃2𝜈
0
/𝑛2𝜈 − 𝛽2(𝜃0/𝑛)4 − 2𝜏(𝜃0/𝑛)4 + 𝑂(𝑛−2−2𝜈)
=

𝜃
𝜃0
2𝜈
+ 𝑂(𝑛2−2𝜈) + 𝑂(𝑛−2𝜈) + 𝑂(𝑛−2)
and
𝐸 ˆ𝜎2 = 𝜎2

𝜃
𝜃0
2𝜈
+ 𝑂(𝑛2−2𝜈−2) + 𝑂(𝑛−1).
When 𝜈 = 1, it follows from (4.5) that
𝑥𝜈
Γ(𝜈)2𝜈−1
K𝜈 (𝑥) = 1 + 𝑐1𝑥2 log(1/𝑥) + 𝑐2𝑥2 + 𝑐3𝑥4 log(1/𝑥) + 𝑐4𝑥4 + 𝑂(𝑥6 log 𝑥) (4.9)
as 𝑥 → 0, where 𝑐1, 𝑐2, 𝑐3, 𝑐4 are constants only depending on 𝜈. Thus,
1 + (𝐾0
𝑛
)2 − 2𝐾0
𝑛𝐾𝑛,1
1 − (𝐾0
𝑛 )2
=
𝑟22𝑐1𝑛−2 log 𝑛 + 2𝑐2𝑛−2 − 2𝑐1𝑛−2(𝑟2 log 𝑟 + (1 + 𝑟2)𝑐2/𝑐1) + 𝑂(𝑛−4(log 𝑛)2)
2𝑐1𝑛−2 log 𝑛 − 2𝑐2𝑛−2 + 𝑂(𝑛−4(log 𝑛)2)
=𝑟2 + 𝑂( (log 𝑛)−1) + 𝑂( (log 𝑛)−2),
61
where 𝑟 = 𝜃/𝜃0. Hence,
𝐸 ˆ𝜎2 = 𝜎2

𝜃
𝜃0
2
+ 𝑂( (log 𝑛)−1).
Case 2. When 𝑘 = 2, the approximated joint density is
ˆ 𝑓𝑛 (𝑥1, . . . , 𝑥𝑛)
=
(2𝜋𝜎2𝑏)−𝑛2 𝑏
q
1 − 𝐾2
𝑛,1
exp
− 1
2𝜎2
𝑥2
1
+
(𝑥2 − 𝐾𝑛,1𝑥1)2
1 − 𝐾2
𝑛,1
+ 1
𝑏
Õ𝑛
𝑖=3
(𝑥𝑖 − 𝑎1𝑥𝑖−1 − 𝑎2𝑥𝑖−2)2
!!
,
where 𝑎1 = 𝐾𝑛,1−𝐾𝑛,1𝐾𝑛,2
1−(𝐾𝑛,1)2 , 𝑎2 = 𝐾𝑛,2−(𝐾𝑛,1)2
1−(𝐾𝑛,1)2 , and 𝑏 = 1 − (𝐾𝑛,1)2+(𝐾𝑛,2)2−2(𝐾𝑛,1)2𝐾𝑛,2
1−(𝐾𝑛,1)2 . This is due to
©­­­­­
«
𝑋(𝑡𝑛
𝑖
)
𝑋(𝑡𝑛
𝑖−1
)
𝑋(𝑡𝑛
𝑖−2
)
ª®®®®®
¬
∼ 𝑁
©­­­­­
«
0, 𝜎2
©­­­­­
«
1 𝐾(|𝑡𝑛
𝑖
− 𝑡𝑛
𝑖−1
|) 𝐾(|𝑡𝑛
𝑖
− 𝑡𝑛
𝑖−2
|)
𝐾(|𝑡𝑛
𝑖
− 𝑡𝑛
𝑖−1
|) 1 𝐾(|𝑡𝑛
𝑖−1
− 𝑡𝑛
𝑖−2
|)
𝐾(|𝑡𝑛
𝑖
− 𝑡𝑛
𝑖−2
|) 𝐾(|𝑡𝑛
𝑖−1
− 𝑡𝑛
𝑖−2
|) 1
ª®®®®® ¬
ª®®®®® ¬
and the regular sampling design, which implies that ∀3 ≤ 𝑖 ≤ 𝑛,
𝑋(𝑡𝑛
𝑖
) | (𝑋(𝑡𝑛
𝑖−1
), 𝑋(𝑡𝑛
𝑖−2
)) ∼ 𝑁

𝑎1𝑋(𝑡𝑛
𝑖−1
) + 𝑎2𝑋(𝑡𝑛
𝑖−2
), 𝜎2𝑏

.
Take arg max𝜎2 log ˆ 𝑓𝑛 and plug in 𝜃 = 𝜃0, then
ˆ𝜎2 =
1
𝑛
𝑥2
1
+
(𝑥2 − 𝐾0
𝑛,1𝑥1)2
1 − (𝐾0
𝑛,1
)2
+ 1
𝑏0
Õ𝑛
𝑖=3
(𝑥𝑖 − 𝑎01𝑥𝑖−1 − 𝑎02
𝑥𝑖−2)2
!
, (4.10)
where 𝑎01
=
𝐾0
𝑛
,1
−𝐾0
𝑛
,1𝐾0
𝑛
,2
1−(𝐾0
𝑛
,1
)2 , 𝑎02
=
𝐾0
𝑛
,2
−(𝐾0
𝑛
,1
)2
1−(𝐾0
𝑛
,1
)2 , and 𝑏0 =
(𝐾0𝑛
,1
)2+(𝐾0
𝑛
,2
)2−2(𝐾0
𝑛
,1
)2𝐾0
𝑛
,2
1−(𝐾0
𝑛
,1
)2 . This estimator can
also be written as a quadratic form
ˆ𝜎2 =
1
𝑛
𝑋𝑇
𝑛 𝑀−1𝑋𝑛,
where 𝑋𝑛 = (𝑋(𝑡𝑛
1
), 𝑋(𝑡𝑛
2
), . . . , 𝑋(𝑡𝑛
𝑛
)) and
𝑀−1 =
©­­­­­­­­­­­­­­­­­­­
«
1
1−(𝐾0
𝑛
,1
)2
+ (𝑎02
)2
𝑏0
𝑎01
𝑎02
𝑏0 − 𝐾0
𝑛
,1
1−(𝐾0
𝑛
,1
)2
−𝑎02
𝑏0
𝑎01
𝑎02
𝑏0 − 𝐾0
𝑛
,1
1−(𝐾0
𝑛
,1
)2
1
1−(𝐾0
𝑛
,1
)2
+ 𝑎02
12
𝑏0
𝑎0
12
𝑏0
−𝑎02
𝑏0
−𝑎02
𝑏0
𝑎0
12
𝑏0
1+𝑎02
12
𝑏0
𝑎0
12
𝑏0
. . .
−𝑎02
𝑏0
𝑎0
12
𝑏0
. . .
. . .
. . .
. . .
1+𝑎02
12
𝑏0
𝑎0
12
𝑏0
−𝑎02
𝑏0
. . . 𝑎0
12
𝑏0
1+(𝑎01
)2
𝑏0
−𝑎01
𝑏0
−𝑎02
𝑏0
−𝑎01
𝑏0
1
𝑏0
ª®®®®®®®®®®®®®®®®®®®
¬
62
is an 𝑛-dimensional pentadiagonal matrix, where 𝑎0
12 = 𝑎01
𝑎02
− 𝑎01
, 𝑎02
12 = (𝑎01
)2 + (𝑎02
)2.
Denote by 𝜎2Σ the covariance matrix of 𝑋𝑛, then Σ𝑖 𝑗 = 𝐾𝑛,|𝑖−𝑗 | and
𝐸 ˆ𝜎2 =
𝜎2
𝑛
Tr(𝑀−1Σ)
=
𝜎2
𝑛
2 + (𝑛 − 2)
(1 + (𝐾0
𝑛,1
)2 − 2𝐾0
𝑛,1𝐾𝑛,1) (1 − 𝐾0
𝑛,2
)
(1 + 𝐾0
𝑛,2
− 2(𝐾0
𝑛,1
)2) (1 − (𝐾0
𝑛,1
)2)
+
2(𝐾0
𝑛,2
− (𝐾0
𝑛,1
)2) (1 − 𝐾𝑛,2)
(1 + 𝐾0
𝑛,2
− 2(𝐾0
𝑛,1
)2) (1 − 𝐾0
𝑛,2
)
!!
:=
𝜎2
𝑛
(2 + (𝑛 − 2)𝐴𝑛) . (4.11)
Consequently, 𝐸 ˆ𝜎2 = 𝜎2 always holds when 𝜃0 = 𝜃. Cases when 𝜃0 ≠ 𝜃 are discussed below.
After similar steps as did in Case 1, it follows from (4.3) that when 𝜈 ∉ Z,
𝐴𝑛 =
8>>>>>>>>
<
>>>>>>>>:

𝜃
𝜃0
2𝜈
+ 𝑂(𝑛2𝜈−2) + 𝑂(𝑛−2𝜈), if 𝜈 < 1,

𝜃
𝜃0
2𝜈
+ 𝑂(𝑛2−2𝜈) + 𝑂(𝑛2𝜈−4), if 1 < 𝜈 < 2,

𝜃
𝜃0
4
+ 𝛽2
6𝜏−𝛽2

𝜃
𝜃0
2
− 1
2
+ 𝑂(𝑛4−2𝜈) + 𝑂(𝑛−2), if 𝜈 > 2.
When 𝜈 = 1, it follows from (4.5) and (4.9) that as 𝑛 → ∞,
1 + (𝐾0
𝑛,1
)2 − 2𝐾0
𝑛,1𝐾𝑛,1
1 − (𝐾0
𝑛,1
)2 = 𝑟2 + 𝑟2 log 𝑟
log(𝜃0/𝑛)
+ 𝑐2𝑟2 log 𝑟
𝑐1(log(𝜃0/𝑛))2
+ 𝑂( (log 𝑛)−3),
1 − 𝐾0
𝑛,2
1 + 𝐾0
𝑛,2
− 2(𝐾0
𝑛,1
)2 = −log(𝜃0/𝑛)
log 2
− log 2 − 𝑐2/𝑐1
log 2
+ 𝑂(𝑛−2(log 𝑛)3),
𝐾0
𝑛,2
− (𝐾0
𝑛,1
)2
1 + 𝐾0
𝑛,2
− 2(𝐾0
𝑛,1
)2 =
log(𝜃0/𝑛)
2 log 2
+ 2 log 2 − 𝑐2/𝑐1
2 log 2
+ 𝑂(𝑛−2(log 𝑛)3),
1 − 𝐾𝑛,2
1 − 𝐾0
𝑛,2
= 𝑟2 + 𝑟2 log 𝑟
log(2𝜃0/𝑛)
+ 𝑐2𝑟2 log 𝑟
𝑐1 (log(2𝜃0/𝑛))2
+ 𝑂( (log 𝑛)−3),
where 𝑟 = 𝜃/𝜃0. Hence,
𝐴𝑛 =

𝜃
𝜃0
2
+ 𝑂( (log 𝑛)−1) + 𝑂( (log 𝑛)−2).
Similarly, when 𝜈 = 2, it follows from (4.5) that
𝑥𝜈
Γ(𝜈)2𝜈−1
K𝜈 (𝑥) = 1 + 𝑐′
2𝑥2 + 𝑐′
3𝑥4 log(1/𝑥) + 𝑐′
4𝑥4 + 𝑂(𝑥6 log 𝑥) (4.12)
63
as 𝑥 → 0, where 𝑐′
2, 𝑐′
3, 𝑐′
4 are constants only depending on 𝜈. Thus, as 𝑛 → ∞,
1 + (𝐾0
𝑛,1
)2 − 2𝐾0
𝑛,1𝐾𝑛,1
1 − (𝐾0
𝑛,1
)2 = 𝑟2 + (𝑟2 − 𝑟4)
𝑐′
3
𝑐′
2

𝜃0
𝑛
2
log

𝜃0
𝑛

+ 𝑂(𝑛−2),
1 − 𝐾0
𝑛,2
1 + 𝐾0
𝑛,2
− 2(𝐾0
𝑛,1
)2 = − 4
3
+
𝑐′
2𝑛2(16𝑐′
4
− 16𝑐′
3 log 2 − 2(𝑐′
2
)2 − 4𝑐′
4
)
36(𝑐′
3𝜃0 log(𝜃0/𝑛))2
+
𝑐′
2𝑛2
3𝑐′
3𝜃2
0 log(𝜃0/𝑛)
+ 𝑂(𝑛2(log 𝑛)−3),
𝐾0
𝑛,2
− (𝐾0
𝑛,1
)2
1 + 𝐾0
𝑛,2
− 2(𝐾0
𝑛,1
)2 =
7
6
−
𝑐′
2𝑛2(16𝑐′
4
− 16𝑐′
3 log 2 − 2(𝑐′
2
)2 − 4𝑐′
4
)
72(𝑐′
3𝜃0 log(𝜃0/𝑛))2
−
𝑐′
2𝑛2
6𝑐′
3𝜃2
0 log(𝜃0/𝑛)
+ 𝑂(𝑛2(log 𝑛)−3),
1 − 𝐾𝑛,2
1 − 𝐾0
𝑛,2
= 𝑟2 + 4(𝑟2 − 𝑟4)
𝑐′
3
𝑐′
2

𝜃0
𝑛
2
log

2𝜃0
𝑛

+ 𝑂(𝑛−2),
𝐴𝑛 =

𝜃
𝜃0
4
+ 𝑂( (log 𝑛)−1) + 𝑂( (log 𝑛)−2).
This together with (4.11) finishes the proof.
Remark. Only 𝑘 = 1, 2 are considered in Proposition 6 since the corresponding Vecchia approximation
is computationally efficient. If 𝜃 is known, then taking 𝜃0 = 𝜃 when construct ˆ𝜎2 will
result in unbiased estimator for 𝜎2.
4.3 Simulation
Let 𝜎2 = 1 and 𝜃 = 5 in (4.1). For each value of 𝑛 ∈ {200, 250, . . . , 1000}, generate 15000
independent realizations of 𝑋. In the following text, denote by 𝜎2
𝜈,𝑘 = lim𝑛→∞ 𝐸 ˆ𝜎2, whose value
is proved in Proposition 6.
Fix 𝜃0 = 1 when solving for MLE of 𝜎2 using the Vecchia approximation (4.2). For (𝜈, 𝑘) ∈
{(0.3, 1), (1.3, 1), (1.3, 2)}, the first row of plots in Figure 4.1 presents the boxplot of ˆ𝜎2 − 𝜎2
𝜈,𝑘
among 15000 realizations at each sample size 𝑛. The second row of plots in Figure 4.1 presents
the empirical distribution of ˆ𝜎2 − 𝜎2
𝜈,𝑘 when 𝑛 = 1000, where the red curve indicates the density
64
200 350 500 650 800 950
−1.0 −0.5 0.0 0.5 1.0
n = 0.3, k = 1
n
200 350 500 650 800 950
−20 0 20 40 60
n = 1.3, k = 1
n
200 350 500 650 800 950
−20 −10 0 10 20
n = 1.3, k = 2
n
Density
−0.4 −0.2 0.0 0.2 0.4
0.0 0.5 1.0 1.5 2.0 2.5 3.0
Density
−10 0 10 20 30 40 50
0.00 0.01 0.02 0.03 0.04 0.05
Density
−10 −5 0 5 10
0.00 0.02 0.04 0.06 0.08 0.10 0.12
Figure 4.1Empirical distributions of bias with 15000 realizations. (𝜎2 = 1, 𝜃 = 5, 𝜃0 = 1.)
function of normal distribution with zero mean and standard deviation equals the empirical standard
deviation of ˆ𝜎2−𝜎2
𝜈,𝑘 among 15000 realizations. For the same three pairs of values of (𝜈, 𝑘), Figure
4.2 presents the average and standard deviation of absolute values of ˆ𝜎2 − 𝜎2
𝜈,𝑘 at each sample size
𝑛 among 15000 realizations when (𝜈, 𝑘) = (0.3, 1) and (𝜈, 𝑘) = (1.3, 2). For the case when
(𝜈, 𝑘) = (1.3, 1), 50000 realizations are generated since the estimator ˆ𝜎2 has a larger variance.
Fix 𝜃0 = 𝜃 = 5 when solving for MLE of 𝜎2 using the Vecchia approximation (4.2), then
𝜎2
𝜈,𝑘 = 𝜎2 = 1. For the same dataset of realizations, plots in Figure 4.3 include the boxplot of ˆ𝜎2−𝜎2
among 15000 realizations at each sample size 𝑛, as well as the empirical distribution of ˆ𝜎2 − 𝜎2
when 𝑛 = 1000, where the red curve indicates the density function of normal distribution with
zero mean and standard deviation equals the empirical standard deviation of ˆ𝜎2 −𝜎2 among 15000
realizations. Figure 4.4 presents the average and standard deviation of absolute values of ˆ𝜎2 − 𝜎2
at each sample size 𝑛 among 15000 realizations when (𝜈, 𝑘) = (0.3, 1) and (𝜈, 𝑘) = (1.3, 2). For
the case when (𝜈, 𝑘) = (1.3, 1), since the variance of ˆ𝜎2 is larger, 50000 realizations are generated.
The first row of plots in Figure 4.2 and Figure 4.4 illustrate Proposition 6. Furthermore, it is
65
200 400 600 800 1000
0.10 0.12 0.14 0.16 0.18 0.20 0.22
n = 0.3, k = 1
n
bias
200 400 600 800 1000
6.20 6.25 6.30 6.35 6.40 6.45
n = 1.3, k = 1
n
bias
200 400 600 800 1000
2.5 3.0 3.5 4.0 4.5 5.0 5.5
n = 1.3, k = 2
n
bias
200 400 600 800 1000
0.15 0.20 0.25
n = 0.3, k = 1
n
sd
200 400 600 800 1000
7.88 7.90 7.92 7.94 7.96 7.98
n = 1.3, k = 1
n
sd
200 400 600 800 1000
3 4 5 6
n = 1.3, k = 2
n
sd
Figure 4.2The average and standard deviation for absolute value of bias when
𝑛 = 200, 250, . . . , 1000. (𝜎2 = 1, 𝜃 = 5, 𝜃0 = 1.)
indicated by the simulation results that when 𝑘 < 𝜈, the standard deviation of ˆ𝜎2 is not significantly
reduced as the sample size increases, and the empirical distribution of ˆ𝜎2−𝜎2
𝜈,𝑘 appears to be rightskewed.
When 𝑘 > 𝜈, however, the standard deviation of ˆ𝜎2 decreases as the sample size increases,
and the empirical distribution of ˆ𝜎2 − 𝜎2
𝜈,𝑘 when 𝑛 = 1000 is close to normal distribution. As is
observed from Figure 4.2, the standard deviation of ˆ𝜎2 when (𝜈, 𝑘) = (0.3, 1) is smaller compared
with the case when (𝜈, 𝑘) = (1.3, 2). Let 𝜃0 = 𝜃, then (𝜈, 𝑘) = (0.3, 1) and (𝜈, 𝑘) = (1.3, 2) result
in similar values of the standard deviation of ˆ𝜎2, as is shown in Figure 4.4.
For future research, it is interesting to perform theoretical analysis for more asymptotic properties
of ˆ𝜎2, including the convergence rate of its variance and its asymptotic distribution. The
sampling design considered in this chapter is limited to a regular grid on the line, which is also the
sampling design studied in Section III of Zhang et al. (2021). It is challenging but interesting to
extend the existing results to irregular sampling designs on R (𝑑 ≥ 1).
66
200 350 500 650 800 950
−0.2 0.0 0.2 0.4
n = 0.3, k = 1
n
200 350 500 650 800 950
0 1 2 3
n = 1.3, k = 1
n
200 350 500 650 800 950
−0.2 0.0 0.2 0.4
n = 1.3, k = 2
n
Density
−0.1 0.0 0.1 0.2
0 2 4 6 8
Density
−1 0 1 2 3
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4
Density
−0.2 −0.1 0.0 0.1 0.2
0 2 4 6 8
Figure 4.3Empirical distributions of bias with 15000 realizations. (𝜎2 = 1, 𝜃 = 5, 𝜃0 = 5.)
200 400 600 800 1000
0.04 0.05 0.06 0.07 0.08
n = 0.3, k = 1
n
bias
200 400 600 800 1000
0.250 0.254 0.258 0.262
n = 1.3, k = 1
n
bias
200 400 600 800 1000
0.04 0.05 0.06 0.07 0.08
n = 1.3, k = 2
n
bias
200 400 600 800 1000
0.05 0.06 0.07 0.08 0.09 0.10
n = 0.3, k = 1
n
sd
200 400 600 800 1000
0.325 0.330 0.335
n = 1.3, k = 1
n
sd
200 400 600 800 1000
0.05 0.06 0.07 0.08 0.09 0.10
n = 1.3, k = 2
n
sd
Figure 4.4The average and standard deviation for absolute value of bias when
𝑛 = 200, 250, . . . , 1000. (𝜎2 = 1, 𝜃 = 5, 𝜃0 = 5.)
67
BIBLIOGRAPHY
[1] Milton Abramowitz and Irene A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, volume 55. US Government Printing Office, 1964.
[2] Ethan B. Anderes and Michael L. Stein. Estimating deformations of isotropic Gaussian random fields on the plane. The Annals of Statistics, 36(2):719–741, 2008. doi: 10.1214/009053607000000893. URL https://doi.org/10.1214/009053607000000893.
[3] François Bachoc, Agnès Lagnoux, and Andrés F. López-Lopera. Maximum likelihood estimation for Gaussian processes under inequality constraints. Electronic Journal of Statistics, 13(2):2921–2969, 2019. doi: 10.1214/19-EJS1587. URL https://doi.org/10.1214/19-EJS1587.
[4] Alessandro Baldi Antognini and Maroussa Zagoraiou. Exact optimal designs for computer experiments via kriging metamodelling. Journal of Statistical Planning and Inference, 140(9):2607–2617, 2010. ISSN 0378-3758. doi: 10.1016/j.jspi.2010.03.027. URL https://www.sciencedirect.com/science/article/pii/S0378375810001400.
[5] Glen Earl Baxter. A strong limit theorem for Gaussian processes. Proceedings of the American Mathematical Society, 7(3):522–527, 1956. URL https://doi.org/10.1090/S0002-9939-1956-0090920-6.
[6] Arnaud Begyn. Quadratic Variations along Irregular Subdivisions for Gaussian Processes. Electronic Journal of Probability, 10:691–717, 2005. doi: 10.1214/EJP.v10-245. URL https://doi.org/10.1214/EJP.v10-245.
[7] Moreno Bevilacqua, Tarik Faouzi, Reinhard Furrer, and Emilio Porcu. Estimation and prediction using generalized Wendland covariance functions under fixed domain asymptotics. The Annals of Statistics, 47(2):828–856, 2019. ISSN 00905364, 21688966. URL https://www.jstor.org/stable/26581883.
[8] David Bolin and Jonas Wallin. Multivariate type G Matérn stochastic partial differential equation random fields. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(1):215–239, 2020.
[9] Jian Cao, Joseph Guinness, Marc G. Genton, and Matthias Katzfuss. Scalable Gaussian-process regression and variable selection using Vecchia approximations. Journal of Machine Learning Research, 23(348):1–30, 2022.
[10] Ricardo Carrizo Vergara. Development of geostatistical models using stochastic partial differential equations. PhD thesis, Université Paris sciences et lettres, December 2018. URL https://pastel.hal.science/tel-02188146.
[11] Grace Chan and Andrew T. A. Wood. Increment-based estimators of fractal dimension for two-dimensional surface data. Statistica Sinica, 10, 2000. ISSN 10170405.
[12] Grace Chan, Peter Hall, and D. S. Poskitt. Periodogram-Based Estimators of Fractal Properties. The Annals of Statistics, 23(5):1684–1711, 1995. doi: 10.1214/aos/1176324319. URL https://doi.org/10.1214/aos/1176324319.
[13] Chih-Hao Chang, Hsin-Cheng Huang, and Ching-Kang Ing. Mixed domain asymptotics for a stochastic process model with time trend and measurement error. Bernoulli, 23(1):159–190, 2017. doi: 10.3150/15-BEJ740. URL https://doi.org/10.3150/15-BEJ740.
[14] A. G. Constantine and Peter Hall. Characterizing Surface Smoothness Via Estimation of Effective Fractal Dimension. Journal of the Royal Statistical Society: Series B (Methodological), 56(1):97–113, 12 1994. ISSN 0035-9246. doi: 10.1111/j.2517-6161.1994.tb01963.x. URL https://doi.org/10.1111/j.2517-6161.1994.tb01963.x.
[15] Noel Cressie and Hsin-Cheng Huang. Classes of nonseparable, spatio-temporal stationary covariance functions. Journal of the American Statistical Association, 94(448):1330–1340, 1999. ISSN 01621459. URL http://www.jstor.org/stable/2669946.
[16] Noel A. C. Cressie. Statistics for Spatial Data. J. Wiley, 1993.
[17] Abhirup Datta, Sudipto Banerjee, Andrew O. Finley, and Alan E. Gelfand. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111(514):800–812, 2016.
[18] Sandra De Iaco, Donald Myers, and Donato Posa. Nonseparable space-time covariance models: Some parametric families. Mathematical Geology, 34:23–42, 01 2002. ISSN 1573-8868. doi: 10.1023/A:1014075310344.
[19] Juan Du, Hao Zhang, and V. S. Mandrekar. Fixed-domain asymptotic properties of tapered maximum likelihood estimators. The Annals of Statistics, 37(6A):3330–3361, 2009. ISSN 00905364, 21688966. URL http://www.jstor.org/stable/25662196.
[20] B. Dubuc, J. F. Quiniou, C. Roques-Carmes, C. Tricot, and S. W. Zucker. Evaluating the fractal dimension of profiles. Phys. Rev. A, 39:1500–1512, Feb 1989. doi: 10.1103/PhysRevA.39.1500. URL https://link.aps.org/doi/10.1103/PhysRevA.39.1500.
[21] Andrew O. Finley, Abhirup Datta, Bruce D. Cook, Douglas C. Morton, Hans E. Andersen, and Sudipto Banerjee. Efficient algorithms for Bayesian nearest neighbor Gaussian processes. Journal of Computational and Graphical Statistics, 28(2):401–414, 2019. doi: 10.1080/10618600.2018.1537924. URL https://doi.org/10.1080/10618600.2018.1537924. PMID: 31543693.
[22] Reinhard Furrer, Marc G. Genton, and Douglas Nychka. Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3):502–523, 2006. doi: 10.1198/106186006X132178. URL https://doi.org/10.1198/106186006X132178.
[23] Reinhard Furrer, Marc G. Genton, and Douglas Nychka. Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3):502–523, 2006. ISSN 10618600. URL http://www.jstor.org/stable/27594195.
[24] R. Gay and C. C. Heyde. On a class of random field models which allows long range dependence. Biometrika, 77(2):401–403, 1990. ISSN 00063444. URL http://www.jstor.org/stable/2336820.
[25] Tilmann Gneiting. Nonseparable, stationary covariance functions for space-time data. Journal of the American Statistical Association, 97(458):590–600, 2002. ISSN 01621459. URL http://www.jstor.org/stable/3085674.
[26] Tilmann Gneiting, William Kleiber, and Martin Schlather. Matérn cross-covariance functions for multivariate random fields. Journal of the American Statistical Association, 105(491):1167–1177, 2010. doi: 10.1198/jasa.2010.tm09420. URL https://doi.org/10.1198/jasa.2010.tm09420.
[27] Tilmann Gneiting, Hana Ševčíková, and Donald B. Percival. Estimators of Fractal Dimension: Assessing the Roughness of Time Series and Spatial Data. Statistical Science, 27(2):247–277, 2012. doi: 10.1214/11-STS370. URL https://doi.org/10.1214/11-STS370.
[28] Robert B. Gramacy. Surrogates: Gaussian process modeling, design, and optimization for the applied sciences. Chapman and Hall/CRC, 2020.
[29] Ulf Grenander. Abstract Inference. Wiley, 1981.
[30] Joseph Guinness. Permutation and grouping methods for sharpening Gaussian process approximations. Technometrics, 60(4):415–429, 2018.
[31] Peter Hall and Andrew Wood. On the performance of box-counting estimators of fractal dimension. Biometrika, 80(1):246–251, 03 1993. ISSN 0006-3444. doi: 10.1093/biomet/80.1.246. URL https://doi.org/10.1093/biomet/80.1.246.
[32] V. Heine. Models for two-dimensional stationary stochastic processes. Biometrika, 42(1/2):170–178, 1955. ISSN 00063444. URL http://www.jstor.org/stable/2333434.
[33] Xiangping Hu and Ingelin Steinsland. Spatial modeling with system of stochastic partial differential equations. Wiley Interdisciplinary Reviews: Computational Statistics, 8(2):112–125, 2016.
[34] Xiangping Hu, Daniel Simpson, Finn Lindgren, and Håvard Rue. Multivariate Gaussian random fields using systems of stochastic partial differential equations. arXiv preprint arXiv:1307.1379, 2013.
[35] I. A. Ibragimov and Y. A. Rozanov. Gaussian Random Processes. Springer-Verlag, 1978.
[36] Jacques Istas and Gabriel Lang. Quadratic variations and estimation of the local Hölder index of a Gaussian process. Annales de l'I.H.P. Probabilités et statistiques, 33(4):407–436, 1997. URL http://www.numdam.org/item/AIHPB_1997__33_4_407_0/.
[37] Richard H. Jones and Yiming Zhang. Models for continuous stationary space-time processes. In Timothy G. Gregoire, David R. Brillinger, Peter J. Diggle, Estelle Russek-Cohen, William G. Warren, and Russell D. Wolfinger, editors, Modelling Longitudinal and Spatially Correlated Data, pages 289–298, New York, NY, 1997. Springer New York. ISBN 978-1-4612-0699-6.
[38] Matthias Katzfuss. A multi-resolution approximation for massive spatial datasets. Journal of the American Statistical Association, 112(517):201–214, 2017. doi: 10.1080/01621459.2015.1123632. URL https://doi.org/10.1080/01621459.2015.1123632.
[39] Matthias Katzfuss and Joseph Guinness. A general framework for Vecchia approximations of Gaussian processes. Statistical Science, 36(1):124–141, 2021.
[40] C. G. Kaufman and B. A. Shaby. The role of the range parameter for estimation and prediction in geostatistics. Biometrika, 100(2):473–484, 2013. ISSN 00063444. URL http://www.jstor.org/stable/43304571.
[41] Cari G. Kaufman, Mark J. Schervish, and Douglas W. Nychka. Covariance tapering for likelihood-based estimation in large spatial data sets. Journal of the American Statistical Association, 103(484):1545–1555, 2008. doi: 10.1198/016214508000000959. URL https://doi.org/10.1198/016214508000000959.
[42] Cari G. Kaufman, Mark J. Schervish, and Douglas W. Nychka. Covariance tapering for likelihood-based estimation in large spatial data sets. Journal of the American Statistical Association, 103(484):1545–1555, 2008. ISSN 01621459. URL http://www.jstor.org/stable/27640203.
[43] John T. Kent. Continuity Properties for Random Fields. The Annals of Probability, 17(4):1432–1440, 1989. doi: 10.1214/aop/1176991163. URL https://doi.org/10.1214/aop/1176991163.
[44] John T. Kent and Andrew T. A. Wood. Estimating the fractal dimension of a locally self-similar Gaussian process by using increments. Journal of the Royal Statistical Society. Series B (Methodological), 59(3):679–699, 1997. ISSN 00359246. URL http://www.jstor.org/stable/2346018.
[45] K. Kubilius and D. Melichov. Quadratic variations and estimation of the Hurst index of the solution of SDE driven by a fractional Brownian motion. Lithuanian Mathematical Journal, 50(4):401–417, Nov 2010.
[46] S. N. Lahiri. Central limit theorems for weighted sums of a spatial process under a class of stochastic and fixed designs. Sankhyā: The Indian Journal of Statistics (2003-2007), 65(2):356–388, 2003. ISSN 09727671. URL http://www.jstor.org/stable/25053269.
[47] S. N. Lahiri and Kanchan Mukherjee. Asymptotic distributions of M-estimators in a spatial regression model under some fixed and stochastic spatial sampling designs. Annals of the Institute of Statistical Mathematics, 56(2):225–250, June 2004.
[48] Tao-Kai Lam and Wei-Liem Loh. Estimating structured correlation matrices in smooth Gaussian random field models. The Annals of Statistics, 28(3):880–904, 2000. doi: 10.1214/aos/1015952003. URL https://doi.org/10.1214/aos/1015952003.
[49] Nikolai Leonenko, Maria D. Ruiz-Medina, and Murad S. Taqqu. Fractional Elliptic, Hyperbolic and Parabolic Random Fields. Electronic Journal of Probability, 16:1134–1172, 2011. doi: 10.1214/EJP.v16-891. URL https://doi.org/10.1214/EJP.v16-891.
[50] Paul Lévy. Le mouvement brownien plan. American Journal of Mathematics, 62:487–550, 1940. URL https://mathscinet.ams.org/mathscinet-getitem?mr=2734.
[51] Finn Lindgren, Håvard Rue, and Johan Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(4):423–498, 2011. doi: 10.1111/j.1467-9868.2011.00777.x. URL https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9868.2011.00777.x.
[52] Finn Lindgren, David Bolin, and Håvard Rue. The SPDE approach for Gaussian and non-Gaussian fields: 10 years and still running. Spatial Statistics, 50:100599, 2022. ISSN 2211-6753. doi: 10.1016/j.spasta.2022.100599. URL https://www.sciencedirect.com/science/article/pii/S2211675322000057. Special Issue: The Impact of Spatial Statistics.
[53] Wei-Liem Loh. Estimating the smoothness of a Gaussian random field from irregularly spaced data via higher-order quadratic variations. The Annals of Statistics, 43(6):2766–2794, 2015. doi: 10.1214/15-AOS1365. URL https://doi.org/10.1214/15-AOS1365.
[54] Wei-Liem Loh, Saifei Sun, and Jun Wen. On fixed-domain asymptotics, parameter estimation and isotropic Gaussian random fields with Matérn covariance functions. The Annals of Statistics, 49(6):3127–3152, 2021. doi: 10.1214/21-AOS2077. URL https://doi.org/10.1214/21-AOS2077.
[55] Wei-Liem Loh, Saifei Sun, and Jun Wen. Supplement to "On fixed-domain asymptotics, parameter estimation and isotropic Gaussian random fields with Matérn covariance functions", 2021.
[56] Chunsheng Ma. Spatio-temporal variograms and covariance models. Advances in Applied Probability, 37(3):706–725, 2005. ISSN 00018678. URL http://www.jstor.org/stable/30037351.
[57] Chunsheng Ma. Recent developments on the construction of spatio-temporal covariance models. Stochastic Environmental Research and Risk Assessment, 22:39–47, 03 2008. doi: 10.1007/s00477-007-0154-x.
[58] Bertil Matérn. Spatial Variation. Springer New York, NY, 1986.
[59] M. M. Meerschaert and A. Sikorskii. Stochastic Models for Fractional Calculus. De Gruyter Studies in Mathematics. De Gruyter, 2011. ISBN 9783110258165.
[60] Douglas Nychka, Soutir Bandyopadhyay, Dorit Hammerling, Finn Lindgren, and Stephan Sain. A multiresolution Gaussian process model for the analysis of large spatial datasets. Journal of Computational and Graphical Statistics, 24(2):579–599, 2015. doi: 10.1080/10618600.2014.914946. URL https://doi.org/10.1080/10618600.2014.914946.
[61] Eulogio Pardo-Igúzquiza and Peter A. Dowd. AMLE3D: A computer program for the inference of spatial covariance parameters by approximate maximum likelihood estimation. Computers & Geosciences, 23(7):793–805, 1997.
[62] Vladimir I. Piterbarg. Asymptotic Methods in the Theory of Gaussian Processes and Fields. American Mathematical Society, 1995.
[63] Håvard Rue and Leonhard Held. Gaussian Markov random fields: theory and applications. Chapman and Hall/CRC, 2005. URL https://doi.org/10.1201/9780203492024.
[64] Michael L. Stein, Zhiyi Chi, and Leah J. Welty. Approximating Likelihoods for Large Spatial Data Sets. Journal of the Royal Statistical Society Series B: Statistical Methodology, 66(2):275–296, 04 2004. ISSN 1369-7412. doi: 10.1046/j.1369-7412.2003.05512.x. URL https://doi.org/10.1046/j.1369-7412.2003.05512.x.
[65] Michael Leonard Stein. Interpolation of Spatial Data: Some Theory for Kriging. Springer, 1999.
[66] George E. Uhlenbeck and Leonard S. Ornstein. On the theory of the Brownian motion. Physical Review, 36(5):823, 1930.
[67] Balth. van der Pol and H. Bremmer. Operational calculus based on the two-sided Laplace integral. Cambridge University Press, 1950.
[68] Aad van der Vaart. Maximum likelihood estimation under a spatial sampling scheme. The Annals of Statistics, 24(5):2049–2057, 1996. doi: 10.1214/aos/1069362309. URL https://doi.org/10.1214/aos/1069362309.
[69] A. V. Vecchia. A general class of models for stationary two-dimensional random processes. Biometrika, 72(2):281–291, 1985. ISSN 00063444. URL http://www.jstor.org/stable/2336080.
[70] A. V. Vecchia. Estimation and model identification for continuous spatial processes. J. Roy. Statist. Soc. Ser. B, 50(2):297–312, 1988. ISSN 0035-9246. URL http://links.jstor.org/sici?sici=0035-9246(1988)50:2<297:EAMIFC>2.0.CO;2-D&origin=MSN.
[71] Theodore von Kármán. Progress in the statistical theory of turbulence. Proceedings of the National Academy of Sciences of the United States of America, 34(11):530–539, 1948. doi: 10.1073/pnas.34.11.530.
[72] Daqing Wang. Fixed domain asymptotics and consistent estimation for Gaussian random field models in spatial statistics and computer experiments, 2010. URL https://scholarbank.nus.edu.sg/handle/10635/19060.
[73] Daqing Wang and Wei-Liem Loh. On fixed-domain asymptotics and covariance tapering in Gaussian random field models. Electronic Journal of Statistics, 5:238–269, 2011. doi: 10.1214/11-EJS607. URL https://doi.org/10.1214/11-EJS607.
[74] P. Whittle. On stationary processes in the plane. Biometrika, 41(3/4):434–449, 1954. ISSN 00063444. URL http://www.jstor.org/stable/2332724.
[75] P. Whittle. Stochastic processes in several dimensions. Bulletin of the International Statistical Institute, 40:974–994, 1963.
[76] Zhiliang Ying. Asymptotic properties of a maximum likelihood estimator with data from a Gaussian process. Journal of Multivariate Analysis, 36(2):280–296, 1991. doi: 10.1016/0047-259X(91)90062-7.
[77] Zhiliang Ying. Maximum likelihood estimation of parameters under a spatial sampling scheme. The Annals of Statistics, 21, 9 1993. ISSN 0090-5364. doi: 10.1214/aos/1176349272.
[78] Hao Zhang. Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. Journal of the American Statistical Association, 99(465):250–261, 2004. doi: 10.1198/016214504000000241. URL https://doi.org/10.1198/016214504000000241.
[79] Lu Zhang, Wenpin Tang, and Sudipto Banerjee. Fixed-domain asymptotics under Vecchia's approximation of spatial process likelihoods. arXiv preprint arXiv:2101.08861, 2021. URL https://doi.org/10.48550/arXiv.2101.08861.
[80] Yuzhen Zhou and Yimin Xiao. Joint asymptotics for estimating the fractal indices of bivariate Gaussian processes. Journal of Multivariate Analysis, 165:56–72, 2018. ISSN 0047-259X. doi: 10.1016/j.jmva.2017.12.001. URL https://www.sciencedirect.com/science/article/pii/S0047259X17307509.
[81] Zhengyuan Zhu and Michael L. Stein. Parameter estimation for fractional Brownian surfaces. Statistica Sinica, 12(3):863–883, 2002. ISSN 10170405, 19968507. URL http://www.jstor.org/stable/24306999.
[82] Zhengyuan Zhu and Hao Zhang. Spatial sampling design under the infill asymptotic framework. Environmetrics, 17(4):323–337, 2006. doi: 10.1002/env.772. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/env.772.
APPENDIX A
QUADRATIC VARIATIONS FROM IRREGULAR SAMPLING
A.1 𝑑 = 1
(6) studied quadratic variations defined using irregular observations of a process $(X_t)_{t\in[0,1]}$ with Gaussian increments. Suppose $(X_t)$ is observed at
$$0 = t_0^{(n)} < t_1^{(n)} < \cdots < t_{N_n}^{(n)} = 1, \qquad n \in \mathbb{N},$$
and denote $\Delta t_k^{(n)} = t_{k+1}^{(n)} - t_k^{(n)}$, $k = 0, \ldots, N_n - 1$. Write $\Delta t_k^{(n)}$ as $\Delta t_k$ for brevity. Let
$$\Delta X_k = \Delta t_{k-1} X_{t_{k+1}} + \Delta t_k X_{t_{k-1}} - (\Delta t_{k-1} + \Delta t_k) X_{t_k}. \tag{A.1}$$
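It is straightforward to verify that
$$t_{k+1}^q \Delta t_{k-1} + t_{k-1}^q \Delta t_k - t_k^q (\Delta t_{k-1} + \Delta t_k) = 0, \qquad q = 0, 1;$$
$$t_{k+1}^2 \Delta t_{k-1} + t_{k-1}^2 \Delta t_k - t_k^2 (\Delta t_{k-1} + \Delta t_k) = \Delta t_{k-1}\,\Delta t_k\,(\Delta t_{k-1} + \Delta t_k) \neq 0.$$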
The second-order quadratic variation is then defined as
$$V_n(X) = 2\sum_{k=1}^{N_n-1} \frac{\Delta t_k\, (\Delta X_k)^2}{(\Delta t_{k-1})^{\frac{3-\gamma}{2}}\, (\Delta t_k)^{\frac{3-\gamma}{2}}\, (\Delta t_{k-1} + \Delta t_k)}, \tag{A.2}$$
where $\gamma > 0$ is related to the smoothness of $(X_t)$. For example, if $(X_t)$ is a fractional Brownian motion with Hurst index $H$, then $\gamma = 2 - 2H$.
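The increments (A.1) and the quadratic variation (A.2) translate directly into array operations. The Python/NumPy sketch below is illustrative (the function names are hypothetical, not taken from (6)); it assumes `t` holds the ordered times $t_0^{(n)}, \ldots, t_{N_n}^{(n)}$, `x` the observations at those times, and that $\gamma$ is supplied by the user, as (A.2) requires.

```python
import numpy as np

def irregular_second_increments(x, t):
    """Delta X_k of (A.1) for k = 1, ..., N_n - 1, returned as a 0-based array."""
    x = np.asarray(x, dtype=float)
    dt = np.diff(np.asarray(t, dtype=float))   # dt[k] = t_{k+1} - t_k
    dt_prev, dt_next = dt[:-1], dt[1:]         # Delta t_{k-1} and Delta t_k
    return dt_prev * x[2:] + dt_next * x[:-2] - (dt_prev + dt_next) * x[1:-1]

def begyn_quadratic_variation(x, t, gamma):
    """V_n(X) of (A.2); gamma must be supplied by the user."""
    dt = np.diff(np.asarray(t, dtype=float))
    dt_prev, dt_next = dt[:-1], dt[1:]
    dx = irregular_second_increments(x, t)
    p = (3.0 - gamma) / 2.0
    return 2.0 * np.sum(dt_next * dx**2 / (dt_prev**p * dt_next**p * (dt_prev + dt_next)))

# Hypothetical irregular design on [0, 1]: 0 = t_0 < ... < t_{N_n} = 1
t = np.sort(np.concatenate([[0.0, 1.0], np.random.default_rng(1).uniform(0.0, 1.0, 20)]))
# Affine paths are annihilated (the q = 0, 1 identities), up to rounding error
print(np.max(np.abs(irregular_second_increments(3.0 - 2.0 * t, t))))
```

The affine check mirrors the $q = 0, 1$ identities, and for $X_t = t^2$ each $\Delta X_k$ equals $\Delta t_{k-1}\Delta t_k(\Delta t_{k-1} + \Delta t_k)$.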
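Denote $m_n = \max\{\Delta t_k^{(n)} : 0 \le k \le N_n - 1\}$ and $p_n = \min\{\Delta t_k^{(n)} : 0 \le k \le N_n - 1\}$. It is assumed in (6) that
(i) for a sequence of positive real numbers $(l_k)_{k\ge 1}$,
$$\lim_{n\to\infty} \sup_{1\le k\le N_n-1} \left| \frac{\Delta t_{k-1}^{(n)}}{\Delta t_k^{(n)}} - l_k \right| = 0;$$
(ii) $m_n = O(p_n)$ as $n \to \infty$;
(iii) $p_n = o\left(\frac{1}{\log n}\right)$ as $n \to \infty$.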
With irregular observations satisfying the assumptions above, (6) proves the almost sure convergence of $V_n(X)$ under some regularity conditions on $(X_t)$.
Although (6) considered a general class of irregular observations, the quadratic variation in (A.2) cannot be evaluated when $\gamma$ is unknown. Moreover, $\gamma$ cannot be estimated from it unless the observations are regularly spaced, i.e., $m_n = p_n = 1/N_n$. In contrast, the quadratic variations defined by (53) do not depend on unknown parameters.
(53) considered a stationary, isotropic Gaussian random field $X$ on $\mathbb{R}^d$, $d = 1, 2$. When $d = 1$, define the irregular lattice points
$$t_i = \varphi\left(\frac{i-1}{n-1}\right), \qquad i = 1, \ldots, n, \tag{A.3}$$
for $n \ge 2$, where $\varphi : \mathbb{R} \to \mathbb{R}$ is a twice continuously differentiable function with $\varphi(0) = 0$, $\varphi(1) = 1$ and $\min_{0\le s\le 1} \varphi'(s) > 0$.
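For $\theta \in \{1, 2\}$ and $\ell \in \{1, 2, \ldots, \lfloor (n-1)/\theta \rfloor\}$, define
$$a_{\theta,\ell;i,k} = \frac{\ell!}{\prod_{0\le j\le \ell,\, j\ne k} (t_{i+\theta k} - t_{i+\theta j})}, \qquad k = 0, \ldots, \ell, \tag{A.4}$$
$$\nabla_{\theta,\ell} X_i = \sum_{k=0}^{\ell} a_{\theta,\ell;i,k}\, X(t_{i+\theta k}), \qquad i = 1, \ldots, n - \theta\ell. \tag{A.5}$$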
Lemma 1 in (53) shows that
$$\sum_{k=0}^{\ell} a_{\theta,\ell;i,k}\, t_{i+\theta k}^q = \begin{cases} 0, & q = 0, \ldots, \ell - 1, \\ \ell!, & q = \ell. \end{cases}$$
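The $\ell$th order quadratic variations are defined as
$$V_{\theta,\ell} = \sum_{i=1}^{n-\theta\ell} \left(\nabla_{\theta,\ell} X_i\right)^2, \qquad \theta \in \{1, 2\},\ \ell \in \{1, 2, \ldots, \lfloor (n-1)/\theta \rfloor\}. \tag{A.6}$$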
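A minimal sketch of (A.4)-(A.6), assuming the illustrative map $\varphi(s) = s + 0.3\,s(1-s)$, which satisfies $\varphi(0) = 0$, $\varphi(1) = 1$ and $\varphi' \ge 0.7$ on $[0, 1]$. The weights are $\ell!$ times the Lagrange divided-difference weights, so the numerical check reproduces Lemma 1; indices are 0-based and the function names are hypothetical.

```python
import math
import numpy as np

def loh_weights(t, i, theta, ell):
    """a_{theta,ell;i,k}, k = 0, ..., ell, of (A.4); i is a 0-based start index."""
    nodes = t[i + theta * np.arange(ell + 1)]
    return np.array([math.factorial(ell) / np.prod(nodes[k] - np.delete(nodes, k))
                     for k in range(ell + 1)])

def loh_quadratic_variation(x, t, theta, ell):
    """V_{theta,ell} of (A.6): sum of squared increments (A.5)."""
    total = 0.0
    for i in range(len(t) - theta * ell):      # i = 1, ..., n - theta*ell (1-based)
        idx = i + theta * np.arange(ell + 1)
        total += float(loh_weights(t, i, theta, ell) @ x[idx]) ** 2
    return total

# Irregular lattice (A.3) with the illustrative map phi(s) = s + 0.3 s (1 - s)
n = 50
s = np.arange(n) / (n - 1)
t = s + 0.3 * s * (1.0 - s)

w = loh_weights(t, 10, 2, 3)                   # theta = 2, ell = 3
nodes = t[10 + 2 * np.arange(4)]
print([float(w @ nodes**q) for q in range(4)])  # approximately [0, 0, 0, 3! = 6]
```

Because each $\nabla_{\theta,\ell}X_i$ equals $\ell!$ for $X(t) = t^\ell$, the quadratic variation of $X(t) = t^2$ with $\theta = 1$, $\ell = 2$ is exactly $4(n-2)$ up to rounding.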
A.2 𝑑 > 1
A.2.1 Observations along a curve
(53) studied the case when $d = 2$ and $X$ is observed along a fixed curve in $\mathbb{R}^2$. Assume that
(i) there exist $\epsilon > 0$ and $L > 0$ such that $\gamma : (-\epsilon, L + \epsilon) \to \mathbb{R}^d$ is a $C^2$-curve parameterized by arc length;
(ii) there exists $C > 0$ such that $\|\gamma(t^*) - \gamma(t)\| \ge C|t^* - t|$ for all $t^*, t \in [0, L]$.
Write $X_i = X(\gamma(t_i))$ and $d_{i,j} = \|\gamma(t_i) - \gamma(t_j)\|$ for $1 \le i, j \le n$, where $t_i$ is defined in (A.3).
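For $\theta, \ell \in \{1, 2\}$, define
$$b_{\theta,\ell;i,k} = \frac{\ell}{\prod_{0\le j\le\ell,\, j\ne k} (d_{i,i+\theta k} - d_{i,i+\theta j})}, \qquad k = 0, \ldots, \ell, \tag{A.7}$$
$$\tilde\nabla_{\theta,\ell} X_i = \sum_{k=0}^{\ell} b_{\theta,\ell;i,k}\, X_{i+\theta k}, \qquad i = 1, \ldots, n - \theta\ell. \tag{A.8}$$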
Lemma 1 in (53) shows that
$$\sum_{k=0}^{\ell} b_{\theta,\ell;i,k}\, d_{i,i+\theta k}^q = \begin{cases} 0, & q = 0, \ldots, \ell - 1, \\ \ell, & q = \ell. \end{cases}$$
The $\ell$th order quadratic variations are constructed as
$$\tilde V_{\theta,\ell} = \sum_{i=1}^{n-\theta\ell} \left(\tilde\nabla_{\theta,\ell} X_i\right)^2, \qquad \theta, \ell \in \{1, 2\}. \tag{A.9}$$
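The curve version differs from (A.4) only in that the distances $d_{i,i+\theta k}$ replace the parameter values. The sketch below assumes, purely for illustration, the unit circle $\gamma(t) = (\cos t, \sin t)$, which is parameterized by arc length, sampled at the points (A.3); it checks Lemma 1 for the $b$ weights. The function names are hypothetical.

```python
import numpy as np

def curve_weights(dist, ell):
    """b_{theta,ell;i,k}, k = 0, ..., ell, of (A.7) from dist[k] = d_{i,i+theta k}."""
    dist = np.asarray(dist, dtype=float)
    return np.array([ell / np.prod(dist[k] - np.delete(dist, k)) for k in range(ell + 1)])

def curve_quadratic_variation(x, pts, theta, ell):
    """tilde V_{theta,ell} of (A.9) for observations x at curve points pts (shape n x 2)."""
    total = 0.0
    for i in range(len(x) - theta * ell):      # i = 1, ..., n - theta*ell (1-based)
        idx = i + theta * np.arange(ell + 1)
        dist = np.linalg.norm(pts[idx] - pts[i], axis=1)   # d_{i,i+theta k}, dist[0] = 0
        total += float(curve_weights(dist, ell) @ x[idx]) ** 2
    return total

# Illustrative curve: unit circle gamma(t) = (cos t, sin t), observed at t_i from (A.3)
# with the assumed map phi(s) = s + 0.3 s (1 - s)
n = 40
s = np.arange(n) / (n - 1)
t = s + 0.3 * s * (1.0 - s)
pts = np.column_stack([np.cos(t), np.sin(t)])
```

Only constants are guaranteed to be annihilated here; a linear function of position is generally not, because distances are not linear in position.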
A.2.2 Observations on deformed lattice
When $d = 2$ and $X$ is observed on a deformed lattice in $\mathbb{R}^2$, (53) also defined the corresponding second-order quadratic variations.
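Consider an open set $\Omega \subset \mathbb{R}^2$ with $[0,1]^2 \subset \Omega$, and a $C^2(\Omega)$ diffeomorphism $\tilde\varphi : \Omega \to \mathbb{R}^2$. Let $\tilde\varphi = (\varphi_1, \varphi_2)$. Write $X_{i_1,i_2} = X(\mathbf{x}_{i_1,i_2})$, where
$$\mathbf{x}_{i_1,i_2} = \left(x_1^{i_1,i_2},\ x_2^{i_1,i_2}\right)' = \left(\varphi_1(i_1/n, i_2/n),\ \varphi_2(i_1/n, i_2/n)\right)'$$
for $1 \le i_1, i_2 \le n$.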
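For $\theta \in \{1, 2\}$ and $1 \le i_1, i_2 \le n - \theta$, let
$$A_{\theta;i_1,i_2} = \begin{pmatrix} x_1^{i_1+\theta,i_2} - x_1^{i_1,i_2} & x_2^{i_1+\theta,i_2} - x_2^{i_1,i_2} \\ x_1^{i_1,i_2+\theta} - x_1^{i_1,i_2} & x_2^{i_1,i_2+\theta} - x_2^{i_1,i_2} \end{pmatrix},$$
$$B_{\theta;i_1,i_2} = \begin{pmatrix} x_1^{i_1+\theta,i_2} - x_1^{i_1+\theta,i_2+\theta} & x_2^{i_1+\theta,i_2} - x_2^{i_1+\theta,i_2+\theta} \\ x_1^{i_1,i_2+\theta} - x_1^{i_1+\theta,i_2+\theta} & x_2^{i_1,i_2+\theta} - x_2^{i_1+\theta,i_2+\theta} \end{pmatrix}.$$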
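Then define
$$\begin{pmatrix} \tilde\nabla_{\theta,1} X_{i_1,i_2} \\ \tilde\nabla_{\theta,2} X_{i_1,i_2} \end{pmatrix} = B_{\theta;i_1,i_2}^{-1} \begin{pmatrix} X_{i_1+\theta,i_2} - X_{i_1+\theta,i_2+\theta} \\ X_{i_1,i_2+\theta} - X_{i_1+\theta,i_2+\theta} \end{pmatrix} - A_{\theta;i_1,i_2}^{-1} \begin{pmatrix} X_{i_1+\theta,i_2} - X_{i_1,i_2} \\ X_{i_1,i_2+\theta} - X_{i_1,i_2} \end{pmatrix} \tag{A.10}$$
$$= \sum_{0\le k_1,k_2\le 1} \begin{pmatrix} c^{k_1,k_2}_{\theta,1;i_1,i_2}\, X_{i_1+\theta k_1,\, i_2+\theta k_2} \\ c^{k_1,k_2}_{\theta,2;i_1,i_2}\, X_{i_1+\theta k_1,\, i_2+\theta k_2} \end{pmatrix}, \tag{A.11}$$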
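where $B_{\theta;i_1,i_2}^{-1}$ and $A_{\theta;i_1,i_2}^{-1}$ exist for large enough $n$ since $\tilde\varphi$ is a diffeomorphism. Lemma 2 in (53) shows that for $j, \ell \in \{1, 2\}$,
$$\sum_{0\le k_1,k_2\le 1} c^{k_1,k_2}_{\theta,\ell;i_1,i_2} \left(x_j^{i_1+\theta k_1,\, i_2+\theta k_2}\right)^q = 0, \qquad q = 0, 1.$$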
The second-order quadratic variations are defined as
$$\tilde V_{\theta,\ell} = \sum_{1\le i_1,i_2\le n-\theta} \left(\tilde\nabla_{\theta,\ell} X_{i_1,i_2}\right)^2, \qquad \theta, \ell \in \{1, 2\}. \tag{A.12}$$
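The construction (A.10) compares two secant-based gradient estimates, one anchored at $\mathbf{x}_{i_1,i_2}$ (through $A$) and one at $\mathbf{x}_{i_1+\theta,i_2+\theta}$ (through $B$), so affine functions cancel exactly, in line with Lemma 2. The sketch below uses an assumed, illustrative diffeomorphism; the function names are hypothetical and indices are 0-based.

```python
import numpy as np

def deformed_increments(X, pts, theta, i1, i2):
    """(tilde nabla_{theta,1} X, tilde nabla_{theta,2} X) at (i1, i2) via (A.10).

    pts[i1, i2] is the location x_{i1,i2} (a 2-vector) and X[i1, i2] the observation there.
    """
    x00, x10 = pts[i1, i2], pts[i1 + theta, i2]
    x01, x11 = pts[i1, i2 + theta], pts[i1 + theta, i2 + theta]
    A = np.vstack([x10 - x00, x01 - x00])          # A_{theta;i1,i2}
    B = np.vstack([x10 - x11, x01 - x11])          # B_{theta;i1,i2}
    grad_B = np.linalg.solve(B, [X[i1 + theta, i2] - X[i1 + theta, i2 + theta],
                                 X[i1, i2 + theta] - X[i1 + theta, i2 + theta]])
    grad_A = np.linalg.solve(A, [X[i1 + theta, i2] - X[i1, i2],
                                 X[i1, i2 + theta] - X[i1, i2]])
    return grad_B - grad_A

def deformed_quadratic_variations(X, pts, theta):
    """(tilde V_{theta,1}, tilde V_{theta,2}) of (A.12)."""
    n = X.shape[0]
    v = np.zeros(2)
    for i1 in range(n - theta):
        for i2 in range(n - theta):
            v += deformed_increments(X, pts, theta, i1, i2) ** 2
    return v

# Illustrative map phi~(u1, u2) = (u1 + 0.1 u1 u2, u2 + 0.05 u1^2) at u = (i1/n, i2/n)
n = 20
u1, u2 = np.meshgrid(np.arange(1, n + 1) / n, np.arange(1, n + 1) / n, indexing="ij")
pts = np.stack([u1 + 0.1 * u1 * u2, u2 + 0.05 * u1**2], axis=-1)
X_affine = 2.0 + 3.0 * pts[..., 0] - pts[..., 1]
print(deformed_quadratic_variations(X_affine, pts, 1))   # numerically zero
```

Replacing `X_affine` by a field with a mixed term, such as `pts[..., 0] * pts[..., 1]`, gives clearly nonzero values.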
For the quadratic variations defined in (A.6), (A.9) and (A.12), (53) establishes the asymptotic rates of their expectations and variances as $n \to \infty$ under some regularity conditions on $X$.
(54) focused on the stationary GRF 𝑋 on R𝑑 with isotropic Matérn covariance function, and
studied quadratic variations constructed from irregular observations of 𝑋 when 𝑑 > 2.
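The definition in (A.12) is extended to the case where $X$ is observed on $[0,1]^d$ with $d \in \mathbb{Z}^+$. Consider an open set $\Omega \subset \mathbb{R}^d$ with $[0,1]^d \subset \Omega$, and a $C^2(\Omega)$ diffeomorphism $\boldsymbol{\varphi} = (\varphi_1, \ldots, \varphi_d) : \Omega \to \mathbb{R}^d$. Write
$$\mathbf{x}(\mathbf{i}) = \left(x_1(\mathbf{i}), \ldots, x_d(\mathbf{i})\right)' = \left(\varphi_1\!\left(\frac{\mathbf{i}}{n}\right), \ldots, \varphi_d\!\left(\frac{\mathbf{i}}{n}\right)\right)'$$
and $X_{i_1,\ldots,i_d} = X(\mathbf{x}(\mathbf{i}))$, where $\mathbf{i} = (i_1, \ldots, i_d)'$ and $1 \le i_1, \ldots, i_d \le n$. The sample size is thus $n^d$.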
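For $\theta \in \{1, 2\}$ and $\ell \in \mathbb{Z}^+$, let
$$\bar\ell = \sum_{l=1}^{\ell} \binom{l + d - 1}{d - 1}, \tag{A.13}$$
$$\mathbf{x}_{\mathbf{i},j} = (x_{\mathbf{i},j;1}, \ldots, x_{\mathbf{i},j;d})' = \mathbf{x}(i_1 + k_1\theta, \ldots, i_d + k_d\theta), \qquad j = 0, \ldots, \bar\ell,$$
$$\tilde{\mathbf{y}}_{\mathbf{i},j} = \frac{n}{\theta}\left(\mathbf{x}_{\mathbf{i},j} - \mathbf{x}_{\mathbf{i},0}\right), \qquad j = 1, \ldots, \bar\ell,$$
where $i_1, \ldots, i_d \in \{1, \ldots, n - \ell\theta\}$, $k_1, \ldots, k_d \in \{0, 1, \ldots, \ell\}$ with $k_1 + \cdots + k_d \in \{0, 1, \ldots, \ell\}$, $j$ indexes the combinations $(k_1, \ldots, k_d)$ in lexicographic order, and $\mathbf{x}_{\mathbf{i},0} = \mathbf{x}(\mathbf{i})$. The detailed ordering rule is described in Section 5.1 of (54).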
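For $l = 1, \ldots, \ell$ and $\mathbf{s} = (s_1, \ldots, s_d)' \in \mathbb{R}^d$, define
$$\mathbf{a}_{\langle d,l\rangle}(\mathbf{s}) = \left(\prod_{k=1}^{d} \frac{s_k^{l_k}}{l_k!}\right)_{l_1+\cdots+l_d = l} \in \mathbb{R}^{\binom{l+d-1}{d-1}}, \tag{A.14}$$
where $l_1, \ldots, l_d \in \{0, 1, \ldots, \ell\}$ and $\sum_{k=1}^{d} l_k = l$. The elements of $\mathbf{a}_{\langle d,l\rangle}(\mathbf{s})$ are arranged in lexicographic order with respect to $(l_1, \ldots, l_d)$. Define the $\bar\ell \times \bar\ell$ matrix
$$\tilde A_{\mathbf{i},\theta,d,\ell} = \begin{pmatrix} \mathbf{a}_{\langle d,1\rangle}(\tilde{\mathbf{y}}_{\mathbf{i},1}) & \mathbf{a}_{\langle d,2\rangle}(\tilde{\mathbf{y}}_{\mathbf{i},1}) & \cdots & \mathbf{a}_{\langle d,\ell\rangle}(\tilde{\mathbf{y}}_{\mathbf{i},1}) \\ \mathbf{a}_{\langle d,1\rangle}(\tilde{\mathbf{y}}_{\mathbf{i},2}) & \mathbf{a}_{\langle d,2\rangle}(\tilde{\mathbf{y}}_{\mathbf{i},2}) & \cdots & \mathbf{a}_{\langle d,\ell\rangle}(\tilde{\mathbf{y}}_{\mathbf{i},2}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{a}_{\langle d,1\rangle}(\tilde{\mathbf{y}}_{\mathbf{i},\bar\ell}) & \mathbf{a}_{\langle d,2\rangle}(\tilde{\mathbf{y}}_{\mathbf{i},\bar\ell}) & \cdots & \mathbf{a}_{\langle d,\ell\rangle}(\tilde{\mathbf{y}}_{\mathbf{i},\bar\ell}) \end{pmatrix} \tag{A.15}$$
and assume $\det \tilde A_{\mathbf{i},\theta,d,\ell} \neq 0$ for all $i_1, \ldots, i_d \in \{1, \ldots, n - \ell\theta\}$.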
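Denote $\tilde A^{-1}_{\mathbf{i},\theta,d,\ell} = \left(\tilde\alpha^{j,k}_{\mathbf{i},\theta,d,\ell}\right)_{1\le j,k\le\bar\ell}$ and let
$$\tilde c_{\mathbf{i},\theta,d,\ell}(j) = \begin{cases} \tilde\alpha^{\bar\ell,j}_{\mathbf{i},\theta,d,\ell}, & j = 1, \ldots, \bar\ell, \\ -\sum_{k=1}^{\bar\ell} \tilde\alpha^{\bar\ell,k}_{\mathbf{i},\theta,d,\ell}, & j = 0. \end{cases} \tag{A.16}$$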
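For $\theta \in \{1, 2\}$ and $\ell \in \mathbb{Z}^+$, define
$$\tilde\nabla_{\theta,d,\ell} X_{i_1,\ldots,i_d} = \sum_{j=0}^{\bar\ell} \tilde c_{\mathbf{i},\theta,d,\ell}(j)\, X(\mathbf{x}_{\mathbf{i},j}), \qquad i_1, \ldots, i_d \in \{1, \ldots, n - 2\ell\}. \tag{A.17}$$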
The $\ell$th order quadratic variation is then defined as
$$\tilde V_{\theta,d,\ell} = \sum_{1\le i_1,\ldots,i_d\le n-2\ell} \left(\tilde\nabla_{\theta,d,\ell} X_{i_1,\ldots,i_d}\right)^2. \tag{A.18}$$
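The general construction (A.13)-(A.17) can be sketched as follows. Two assumptions are made for illustration: the lattice offsets $(k_1, \ldots, k_d)$ and the multi-indices $(l_1, \ldots, l_d)$ are both sorted in plain ascending lexicographic order (the exact rule in Section 5.1 of (54) may differ), and the sampling map is a hypothetical mild deformation. Because the last row of $\tilde A^{-1}$ extracts the coefficient of the last degree-$\ell$ monomial in the $\tilde{\mathbf{y}}$ coordinates, the weights annihilate every polynomial of degree below $\ell$ and return 1 for that monomial. Function names are hypothetical.

```python
import itertools
import math
import numpy as np

def ell_bar(d, ell):
    """bar ell of (A.13): number of monomials of degree 1, ..., ell in d variables."""
    return sum(math.comb(l + d - 1, d - 1) for l in range(1, ell + 1))

def multi_indices(d, l):
    """All (l_1, ..., l_d) with nonnegative entries summing to l, in lexicographic order."""
    return sorted(m for m in itertools.product(range(l + 1), repeat=d) if sum(m) == l)

def a_vec(s, l):
    """a_<d,l>(s) of (A.14): entries prod_k s_k^{l_k} / l_k! over multi_indices(d, l)."""
    return np.array([math.prod(s[k] ** m[k] / math.factorial(m[k]) for k in range(len(s)))
                     for m in multi_indices(len(s), l)])

def increment_weights(x_of, i, theta, ell, n):
    """Locations x_{i,j} and weights tilde c(j), j = 0, ..., bar ell, following (A.15)-(A.16).

    x_of(idx) returns the sampling location x(idx) in R^d for an integer index tuple idx.
    """
    d = len(i)
    offsets = sorted(k for k in itertools.product(range(ell + 1), repeat=d) if sum(k) <= ell)
    locs = np.array([x_of(tuple(a + b * theta for a, b in zip(i, k))) for k in offsets])
    y = (n / theta) * (locs[1:] - locs[0])                  # tilde y_{i,j}, j = 1..bar ell
    A = np.array([np.concatenate([a_vec(yj, l) for l in range(1, ell + 1)]) for yj in y])
    last_row = np.linalg.inv(A)[-1]                         # alpha^{bar ell, j}, j = 1..bar ell
    return locs, np.concatenate([[-last_row.sum()], last_row])

# Hypothetical map x(i) = phi(i / n), phi(u1, u2) = (u1 + 0.1 u1 u2, u2 + 0.05 u1^2)
n = 30
def x_of(idx):
    u1, u2 = idx[0] / n, idx[1] / n
    return np.array([u1 + 0.1 * u1 * u2, u2 + 0.05 * u1**2])

locs, c = increment_weights(x_of, (3, 4), theta=2, ell=2, n=n)
```

Summing the squared values of `c @ X(locs)` over $\mathbf{i}$ then gives (A.18).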
A.2.3 Stratified sampling
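Let
$$\mathbf{x}(\mathbf{i}) = \left(x_1(\mathbf{i}), \ldots, x_d(\mathbf{i})\right)' = \left(\frac{i_1 - 1 + \delta_{\mathbf{i};1}}{n}, \ldots, \frac{i_d - 1 + \delta_{\mathbf{i};d}}{n}\right)' \in [0,1)^d,$$
where $\mathbf{i} = (i_1, \ldots, i_d)'$, $1 \le i_1, \ldots, i_d \le n$, and $0 \le \delta_{\mathbf{i};k} < 1$ $(k = 1, \ldots, d)$ are constants that may vary with $n$. Let $\omega_n$ be an integer depending only on $n$ such that $\omega_n = O(n^{\gamma_0})$ as $n \to \infty$, where $\gamma_0 \in (0, 1)$ is a constant.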
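For $\theta \in \{1, 2\}$ and $\ell \in \mathbb{Z}^+$, let
$$\mathbf{x}_{\mathbf{i},j} = (x_{\mathbf{i},j;1}, \ldots, x_{\mathbf{i},j;d})' = \mathbf{x}(i_1 + k_1\omega_n\theta, \ldots, i_d + k_d\omega_n\theta), \qquad j = 0, \ldots, \bar\ell,$$
$$\mathbf{y}_{\mathbf{i},j} = \frac{n}{\omega_n\theta}\left(\mathbf{x}_{\mathbf{i},j} - \mathbf{x}_{\mathbf{i},0}\right), \qquad j = 1, \ldots, \bar\ell,$$
where $i_1, \ldots, i_d \in \{1, \ldots, n - \ell\omega_n\theta\}$ and the other notation is as defined in Section A.2.2. Define the $\bar\ell \times \bar\ell$ matrix
$$A_{\mathbf{i},\theta,d,\ell} = \begin{pmatrix} \mathbf{a}_{\langle d,1\rangle}(\mathbf{y}_{\mathbf{i},1}) & \mathbf{a}_{\langle d,2\rangle}(\mathbf{y}_{\mathbf{i},1}) & \cdots & \mathbf{a}_{\langle d,\ell\rangle}(\mathbf{y}_{\mathbf{i},1}) \\ \mathbf{a}_{\langle d,1\rangle}(\mathbf{y}_{\mathbf{i},2}) & \mathbf{a}_{\langle d,2\rangle}(\mathbf{y}_{\mathbf{i},2}) & \cdots & \mathbf{a}_{\langle d,\ell\rangle}(\mathbf{y}_{\mathbf{i},2}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{a}_{\langle d,1\rangle}(\mathbf{y}_{\mathbf{i},\bar\ell}) & \mathbf{a}_{\langle d,2\rangle}(\mathbf{y}_{\mathbf{i},\bar\ell}) & \cdots & \mathbf{a}_{\langle d,\ell\rangle}(\mathbf{y}_{\mathbf{i},\bar\ell}) \end{pmatrix}, \tag{A.19}$$
where $\mathbf{a}_{\langle d,l\rangle}(\cdot)$ is defined in (A.14). Assume $\det A_{\mathbf{i},\theta,d,\ell} \neq 0$ for all $i_1, \ldots, i_d \in \{1, \ldots, n - \ell\omega_n\theta\}$.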
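Then denote $A^{-1}_{\mathbf{i},\theta,d,\ell} = \left(\alpha^{j,k}_{\mathbf{i},\theta,d,\ell}\right)_{1\le j,k\le\bar\ell}$. Let
$$c_{\mathbf{i},\theta,d,\ell}(j) = \begin{cases} \alpha^{\bar\ell,j}_{\mathbf{i},\theta,d,\ell}, & j = 1, \ldots, \bar\ell, \\ -\sum_{k=1}^{\bar\ell} \alpha^{\bar\ell,k}_{\mathbf{i},\theta,d,\ell}, & j = 0. \end{cases} \tag{A.20}$$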
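The $\ell$th order quadratic variation is then defined as
$$V_{\theta,d,\ell} = \sum_{1\le i_1,\ldots,i_d\le n-2\ell\omega_n} \left(\nabla_{\theta,d,\ell} X_{i_1,\ldots,i_d}\right)^2, \tag{A.21}$$
where $\theta \in \{1, 2\}$, $\ell \in \mathbb{Z}^+$ and
$$\nabla_{\theta,d,\ell} X_{i_1,\ldots,i_d} = \sum_{j=0}^{\bar\ell} c_{\mathbf{i},\theta,d,\ell}(j)\, X(\mathbf{x}_{\mathbf{i},j}), \qquad i_1, \ldots, i_d \in \{1, \ldots, n - 2\ell\omega_n\}. \tag{A.22}$$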
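For $d = 1$ and $\ell = 2$, (A.19) is a $2\times 2$ matrix and the weights (A.20) take a few lines. The sketch assumes i.i.d. uniform jitters $\delta_i$ and the illustrative choice $\omega_n = \lfloor n^{0.3}\rfloor$; neither is prescribed by (54), and the function names are hypothetical. In the rescaled coordinates $y$ the weights annihilate constants and linear functions, so $X(x) = x^2$ yields the constant increment $2(\omega_n\theta/n)^2$.

```python
import numpy as np

def stratified_points_1d(n, delta):
    """x(i) = (i - 1 + delta_i) / n, i = 1, ..., n, with jitters 0 <= delta_i < 1."""
    return (np.arange(n) + np.asarray(delta, dtype=float)) / n

def stratified_weights_1d(x, i, theta, omega, n):
    """c_{i,theta,1,2}(j), j = 0, 1, 2, of (A.20) for d = 1, ell = 2 (0-based i)."""
    lag = omega * theta
    y1 = (n / lag) * (x[i + lag] - x[i])            # y_{i,1}
    y2 = (n / lag) * (x[i + 2 * lag] - x[i])        # y_{i,2}
    A = np.array([[y1, y1**2 / 2.0], [y2, y2**2 / 2.0]])   # (A.19) with d = 1, ell = 2
    last_row = np.linalg.inv(A)[-1]
    return np.array([-last_row.sum(), last_row[0], last_row[1]])

def stratified_qv_1d(values, x, theta, omega, n):
    """V_{theta,1,2} of (A.21) for d = 1: sum over i = 1, ..., n - 4 omega (1-based)."""
    lag = omega * theta
    total = 0.0
    for i in range(n - 4 * omega):
        c = stratified_weights_1d(x, i, theta, omega, n)
        total += float(c @ values[[i, i + lag, i + 2 * lag]]) ** 2
    return total

n = 400
omega = int(np.floor(n ** 0.3))      # illustrative omega_n = O(n^{gamma_0}) with gamma_0 = 0.3
x = stratified_points_1d(n, np.random.default_rng(2).uniform(0.0, 1.0, size=n))
```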
A.3 Randomized Sampling Design
Section 4 in (54) considered random sampling on $[0,1)^d$, where $d \in \{1, 2, 3\}$. This design extends the stratified sampling design of Section A.2.3.
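Let $\mathbf{x}_1, \ldots, \mathbf{x}_N$ be a sequence of i.i.d. random vectors in $\mathbb{R}^d$ that are independent of the GRF $X$. Assume the probability density function $p(\mathbf{x})$ of $\mathbf{x}_1$ satisfies
$$\int_{[0,1)^d} p(\mathbf{x})\, d\mathbf{x} = 1 \quad \text{and} \quad \inf_{\mathbf{x}\in[0,1)^d} p(\mathbf{x}) \ge p_0 > 0. \tag{A.23}$$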
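When $p_0$ in (A.23) is unknown, let
$$n_\tau = \left\lfloor \left(\frac{N}{\tau \log^2 N}\right)^{1/d} \right\rfloor, \qquad \forall\, \tau > 0.$$
Let $\hat\tau$ be the smallest real number greater than or equal to 1 such that
$$\{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \cap \prod_{k=1}^{d} \left[\frac{i_k - 1}{n_{\hat\tau}}, \frac{i_k}{n_{\hat\tau}}\right) \neq \emptyset, \qquad \forall\, i_1, \ldots, i_d \in \{1, \ldots, n_{\hat\tau}\}.$$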
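Consider only the effective sample
$$\left\{ \left(\mathbf{x}_j, X(\mathbf{x}_j)\right) : \mathbf{x}_j \in \prod_{k=1}^{d} \left[\frac{i_k - 1}{n_{\hat\tau}}, \frac{i_k}{n_{\hat\tau}}\right),\ i_1, \ldots, i_d \in \{1, \ldots, n_{\hat\tau}\},\ j \in \{1, \ldots, N\} \right\}. \tag{A.24}$$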
Take a subset of the $\mathbf{x}_j$'s in (A.24) such that for each $\mathbf{i} = (i_1, \ldots, i_d)'$ with $1 \le i_1, \ldots, i_d \le n_{\hat\tau}$, exactly one $j$ satisfies $\mathbf{x}_j \in \prod_{k=1}^{d} \left[\frac{i_k - 1}{n_{\hat\tau}}, \frac{i_k}{n_{\hat\tau}}\right)$. Write the selected $\mathbf{x}_j$ as $\mathbf{x}(\mathbf{i})$. The randomized sampling design then reduces to the stratified sampling design with sample size $n_{\hat\tau}^d$. Thus, the $\ell$th order quadratic variations can be defined as in (A.21), where $\theta \in \{1, 2\}$, $\ell \in \mathbb{Z}^+$ and $n$ is replaced by $n_{\hat\tau}$.
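When $p_0$ in (A.23) is known, let $\tau_0 = 3/p_0$ and
$$\bar n_\tau = \left\lfloor \left(\frac{N}{\tau \log N}\right)^{1/d} \right\rfloor,$$
where $\tau \ge \tau_0$. Let $\bar\tau$ be the smallest real number greater than or equal to $\tau_0$ such that
$$\{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \cap \prod_{k=1}^{d} \left[\frac{i_k - 1}{\bar n_{\bar\tau}}, \frac{i_k}{\bar n_{\bar\tau}}\right) \neq \emptyset, \qquad \forall\, i_1, \ldots, i_d \in \{1, \ldots, \bar n_{\bar\tau}\}.$$
The effective sample is defined as in (A.24) with $n_{\hat\tau}$ replaced by $\bar n_{\bar\tau}$. Similarly, the $\ell$th order quadratic variations are defined as in (A.21), where $\theta \in \{1, 2\}$, $\ell \in \mathbb{Z}^+$ and $n$ is replaced by $\bar n_{\bar\tau}$.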
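The reduction of the randomized design to a stratified one can be sketched as follows. Since $n_\tau$ is nonincreasing in $\tau$ and takes every integer value below $n_{\tau_{\min}}$, searching for the smallest admissible $\tau$ amounts to scanning grid sizes $m$ downward. The sketch keeps the first point found in each cell; that choice is an assumption (any one point per cell fits the description above), and the function names are hypothetical.

```python
import math
import numpy as np

def cells_first_point(points, m):
    """Map each occupied cell of the m^d grid on [0,1)^d to the first point falling in it."""
    cells = {}
    for p in points:
        if np.all((p >= 0.0) & (p < 1.0)):
            idx = tuple(np.minimum(np.floor(p * m).astype(int), m - 1))
            cells.setdefault(idx, p)
    return cells

def randomized_to_stratified(points, tau_min=1.0, log_power=2):
    """Largest m <= n_{tau_min} with every cell occupied, plus one selected point per cell.

    tau_min=1, log_power=2 mimics the unknown-p0 rule floor((N / (tau log^2 N))^{1/d});
    tau_min=3/p0, log_power=1 mimics the known-p0 rule.
    """
    N, d = points.shape
    m_max = int(math.floor((N / (tau_min * math.log(N) ** log_power)) ** (1.0 / d)))
    for m in range(m_max, 0, -1):          # increasing tau decreases n_tau
        cells = cells_first_point(points, m)
        if len(cells) == m ** d:
            return m, cells
    raise ValueError("no grid size with every cell occupied")

pts = np.random.default_rng(3).uniform(0.0, 1.0, size=(5000, 2))
m, chosen = randomized_to_stratified(pts)
```

The selected points `chosen[idx]` play the role of $\mathbf{x}(\mathbf{i})$, and $m$ plays the role of $n_{\hat\tau}$ in (A.21).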
A.4 Estimating Smoothness Parameters
Based on the a.s. convergence of the quadratic variation defined in (A.2), when a fractional Ornstein-Uhlenbeck process $O^H$ is observed at regularly spaced points, its fractional parameter $H \in (0, 1)$ has the strongly consistent estimator
$$\hat H_n = \frac{1}{2} - \frac{\log \sum_{k=1}^{N_n-1} \left(O^H_{\frac{k+1}{N_n}} + O^H_{\frac{k-1}{N_n}} - 2\,O^H_{\frac{k}{N_n}}\right)^2}{2\log N_n}, \tag{A.25}$$
where $1/N_n = o(1/\log n)$.
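The estimator (A.25) needs only the second differences of equally spaced observations. As an illustration, the sketch applies it to the classical stationary Ornstein-Uhlenbeck process (the case $H = 1/2$), simulated exactly on the grid through its AR(1) representation; the simulation settings and function names are assumptions. Here the expected sum of squared second differences is close to $2\sigma^2$, so the estimate sits roughly $\log(2\sigma^2)/(2\log N_n)$ below $1/2$, which shows how slowly the logarithmic normalization removes the constant.

```python
import numpy as np

def simulate_ou(n_steps, lam=1.0, sigma=1.0, seed=0):
    """Exact simulation of a stationary OU process dX = -lam X dt + sigma dW at k / n_steps."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-lam / n_steps)
    var = sigma**2 / (2.0 * lam)                   # stationary variance
    x = np.empty(n_steps + 1)
    x[0] = rng.normal(0.0, np.sqrt(var))
    noise = rng.normal(0.0, np.sqrt(var * (1.0 - rho**2)), size=n_steps)
    for k in range(n_steps):
        x[k + 1] = rho * x[k] + noise[k]
    return x

def hurst_estimate(x):
    """hat H_n of (A.25) from observations at k / N_n, k = 0, ..., N_n."""
    n_steps = len(x) - 1
    d2 = x[2:] + x[:-2] - 2.0 * x[1:-1]            # k = 1, ..., N_n - 1
    return 0.5 - np.log(np.sum(d2**2)) / (2.0 * np.log(n_steps))

N = 2**14
h_hat = hurst_estimate(simulate_ou(N))
print(h_hat)
```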
Quadratic variations constructed in (A.6), (A.9) and (A.12) are used to estimate the smoothness parameter $\nu$ in the covariance function (1.1).
The estimators of $\nu$ defined by (53) are minimizers of functions that depend on the sampling locations and the quadratic variations. Although they have no closed-form expressions, the estimators are proved to be strongly consistent when $\ell > \nu$ and the observations are on $[0, 1]$ or along a curve. When $X$ is observed on a deformed lattice with $\nu \in (0, 2)$ and $\ell \in \{1, 2\}$, the estimator defined using (A.12) is also proved to be strongly consistent.
The Matérn covariance function belongs to the class of functions defined in (1.1). To estimate its smoothness parameter $\nu$, define
$$\hat\nu_{n,\ell} = \frac{\log\left(V_{2,d,\ell}/V_{1,d,\ell}\right)}{2\log 2}, \tag{A.26}$$
where $V_{\theta,d,\ell}$, $\theta = 1, 2$, are the quadratic variations defined in (A.18), (A.21) and Section A.3 for the respective sampling designs. When $\ell > \nu$, (54) proves that $\hat\nu_{n,\ell} \to \nu$ a.s. as $n \to \infty$.
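For $d = 1$ on the regular grid $x(i) = i/n$ with $\ell = 2$, one has $\tilde{\mathbf{y}}_{\mathbf{i},1} = 1$ and $\tilde{\mathbf{y}}_{\mathbf{i},2} = 2$, so (A.15) is $\begin{pmatrix}1 & 1/2\\ 2 & 2\end{pmatrix}$, whose inverse has last row $(-2, 1)$; by (A.16) the weights are $(1, -2, 1)$, the ordinary second difference at lag $\theta$. The sketch below applies (A.26) in this special case to a simulated process with exponential covariance, i.e. the Matérn case $\nu = 1/2$; the simulation settings and function names are illustrative assumptions.

```python
import numpy as np

def simulate_ou_grid(n, lam=1.0, seed=0):
    """Stationary process with covariance exp(-lam |t|) at t = 1/n, ..., 1 (exact AR(1))."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-lam / n)
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for k in range(1, n):
        x[k] = rho * x[k - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal()
    return x

def nu_hat_regular_1d(x):
    """(A.26) with d = 1, ell = 2 on the regular grid x(i) = i / n."""
    n = len(x)
    i = np.arange(n - 4)                            # i = 1, ..., n - 2*ell (1-based)
    v = {theta: np.sum((x[i] - 2.0 * x[i + theta] + x[i + 2 * theta]) ** 2)
         for theta in (1, 2)}
    return np.log(v[2] / v[1]) / (2.0 * np.log(2.0))

nu_hat = nu_hat_regular_1d(simulate_ou_grid(2**14))
print(nu_hat)   # should be near 1/2
```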
APPENDIX B
HIGH EXCURSION PROBABILITY
We first introduce some notation and definitions from (62).
The structural modulus of a vector $\mathbf{t} \in \mathbb{R}^n$ is defined as
$$|\mathbf{t}|_{E,\alpha} = \sum_{i=1}^{k} \left(\sum_{j=E(i-1)+1}^{E(i)} t_j^2\right)^{\alpha_i/2},$$
where $E = \{e_1, e_2, \ldots, e_k\}$ and $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ with $e_i \in \mathbb{Z}^+$ and $0 < \alpha_i \le 2$ $(i = 1, 2, \ldots, k)$, $\sum_{i=1}^{k} e_i = n$, $E(i) = \sum_{j=0}^{i} e_j$ and $e_0 = 0$. A structure $(E, \alpha)$ defines a partition of the space $\mathbb{R}^n$ into a direct product of orthogonal subspaces, $\mathbb{R}^n = \times_{i=1}^{k} \mathbb{R}^{e_i}$, such that the restriction of the structural modulus $|\mathbf{t}|_{E,\alpha}$ to each of them is a Euclidean norm raised to the power $\alpha_i$, $i = 1, 2, \ldots, k$, respectively.
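Example 1. Let $n = k = 2$ and $E = \{1, 1\}$. Then $E(0) = 0$, $E(1) = 1$, $E(2) = 2$, and
$$|\mathbf{t}|_{E,\alpha} = |t_1|^{\alpha_1} + |t_2|^{\alpha_2}, \qquad \forall\, \mathbf{t} = (t_1, t_2) \in \mathbb{R}^2,$$
where $0 < \alpha_1, \alpha_2 \le 2$.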
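The structural modulus is a sum of blockwise Euclidean norms raised to the powers $\alpha_i$. A short sketch (the function name is hypothetical) that reproduces Example 1:

```python
import numpy as np

def structural_modulus(t, E, alpha):
    """|t|_{E,alpha}: sum over blocks i of (squared Euclidean norm of block i)^(alpha_i / 2)."""
    t = np.asarray(t, dtype=float)
    if sum(E) != t.size or len(E) != len(alpha):
        raise ValueError("need sum(E) == len(t) and len(E) == len(alpha)")
    bounds = np.concatenate([[0], np.cumsum(E)])        # E(0) = 0, E(1), ..., E(k)
    return float(sum(np.sum(t[bounds[i]:bounds[i + 1]] ** 2) ** (a / 2.0)
                     for i, a in enumerate(alpha)))

# Example 1 with alpha = {1, 2}: |t_1| + |t_2|^2
print(structural_modulus([3.0, -2.0], [1, 1], [1, 2]))   # 3 + 4 = 7.0
```

With a single block ($E = \{n\}$, $\alpha = \{1\}$) the modulus reduces to the Euclidean norm.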
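Let $\chi(\mathbf{t})$, $\mathbf{t} \in \mathbb{R}^n$, be a Gaussian field with continuous trajectories such that
$$E\chi(\mathbf{t}) = -|\mathbf{t}|_{E,\alpha}, \qquad \operatorname{Cov}\left(\chi(\mathbf{t}), \chi(\mathbf{s})\right) = |\mathbf{t}|_{E,\alpha} + |\mathbf{s}|_{E,\alpha} - |\mathbf{t} - \mathbf{s}|_{E,\alpha},$$
where $\alpha_i \le 2$ ensures that this covariance function is valid. For any compact set $T \subset \mathbb{R}^n$ and matrix $M \in \mathbb{R}^{n\times n}$, denote
$$H^{M}_{(E,\alpha),(E',\alpha')}(T) = E\exp\left(\max_{\mathbf{t}\in T}\left(\chi(\mathbf{t}) - |M\mathbf{t}|_{E',\alpha'}\right)\right).$$
Write $H_{E,\alpha}(T) = H^{0}_{(E,\alpha),(E',\alpha')}(T)$, where $0$ is the zero matrix.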
A set $A \subset \mathbb{R}^n$ is called Jordan measurable if its interior and closure have the same Lebesgue measure, i.e., its boundary has Lebesgue measure zero. The family $\{A_u, u > 0\}$ is said to blow up slowly with rate $\kappa > 0$ if each of these sets contains a unit cube and $\operatorname{mes}(A_u) = O(e^{\kappa u^2/2})$ as $u \to \infty$.
Theorem 7.2 in (62) is stated below, where the subscript $\cdot_{E,\alpha}$ is abbreviated as $\cdot_\alpha$.
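Theorem 11. (62) Let $\{X(\mathbf{t}), \mathbf{t} \in \mathbb{R}^n\}$ be a homogeneous Gaussian field with zero mean whose covariance function $r(\mathbf{t})$ satisfies the following: there exist a non-degenerate matrix $C$ and a structure $(E, \alpha)$ such that
$$r(C\mathbf{t}) = 1 - |\mathbf{t}|_\alpha + o(|\mathbf{t}|_\alpha) \ \text{ as } \mathbf{t} \to 0, \qquad r(\mathbf{t}) \to 0 \ \text{ as } \|\mathbf{t}\| \to \infty. \tag{B.1}$$
Then there exists a number $\kappa > 0$ such that for any system of closed Jordan measurable sets $\{A_u\}$ blowing up slowly with rate $\kappa$,
$$P\left(\max_{\mathbf{t}\in A_u} X(\mathbf{t}) > u\right) = H_\alpha\, \operatorname{mes}(A_u)\, |\det C^{-1}| \prod_{i=1}^{k} u^{2e_i/\alpha_i}\, \Psi(u)\, (1 + o(1)) \quad \text{as } u \to \infty, \tag{B.2}$$
where
$$H_\alpha = \lim_{t\to\infty} \frac{H_\alpha([0, t]^n)}{t^n} \quad \text{and} \quad \Psi(u) = \frac{1}{\sqrt{2\pi}} \int_u^\infty \exp(-x^2/2)\, dx.$$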
Remark 3. The zero-mean stationary Ornstein-Uhlenbeck field $X$ with the covariance function defined in (3.2) and $\sigma^2 = 1$ satisfies the conditions of Theorem 11 with $n = 2$, $E = \{1, 1\}$, $\alpha = \{1, 1\}$, and
$$C = \begin{pmatrix} 1/\lambda & 0 \\ 0 & 1/\mu \end{pmatrix}.$$
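As a worked check of Remark 3, the display below specializes (B.2). It assumes, for illustration, that the covariance (3.2) with $\sigma^2 = 1$ is $r(\mathbf{t}) = \exp(-\lambda|t_1| - \mu|t_2|)$ with $\lambda, \mu > 0$; if (3.2) differs, the first line must be adjusted. Under this assumption $r(\mathbf{t}) \to 0$ as $\|\mathbf{t}\| \to \infty$, and the constant factorizes because, for $E = \{1, 1\}$, $\chi(\mathbf{t}) = \chi_1(t_1) + \chi_2(t_2)$ with independent components, so $H_\alpha = H_1^2$, where $H_1 = 1$ is the classical Pickands constant.

```latex
\begin{align*}
r(C\mathbf{t}) &= \exp\!\left(-\lambda\left|t_1/\lambda\right| - \mu\left|t_2/\mu\right|\right) = e^{-|t_1|-|t_2|}
  = 1 - |\mathbf{t}|_{\alpha} + o(|\mathbf{t}|_{\alpha}), \quad \mathbf{t} \to 0,\\
|\det C^{-1}| &= \lambda\mu, \qquad \prod_{i=1}^{2} u^{2e_i/\alpha_i} = u^{4}, \qquad H_{\alpha} = H_1^{2} = 1,\\
P\Big(\max_{\mathbf{t}\in A_u} X(\mathbf{t}) > u\Big) &= \lambda\mu\,\operatorname{mes}(A_u)\,u^{4}\,\Psi(u)\,(1+o(1)), \quad u \to \infty.
\end{align*}
```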
APPENDIX C
STOCHASTIC PARTIAL DIFFERENTIAL EQUATION
Write the two-sided Laplace transform of a function $h$ as
$$\mathcal{L}_h(p) = \int_{-\infty}^{\infty} e^{-px} h(x)\, dx, \tag{C.1}$$
and denote by $D^n$ the differential operator of order $n$, i.e., $D^n h(x) = \frac{d^n}{dx^n} h(x)$. It follows from the differentiation rule on pages 48–50 of (67) that
$$\mathcal{L}_{D^n h}(p) = p^n \mathcal{L}_h(p), \qquad \forall\, n \in \mathbb{Z}^+, \tag{C.2}$$
when
$$\lim_{x\to\infty} e^{-px} h(x) - \lim_{x\to-\infty} e^{-px} h(x) = 0.$$
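Rule (C.2) can be checked numerically. The sketch uses the illustrative function $h(x) = e^{-x^2}$, for which $\mathcal{L}_h(p) = \sqrt{\pi}\,e^{p^2/4}$ and the boundary condition holds for every real $p$, and approximates (C.1) by the trapezoidal rule on a truncated grid (an assumption that is harmless here because $h$ decays rapidly). The function name is hypothetical.

```python
import numpy as np

def two_sided_laplace(h_vals, x, p):
    """Trapezoidal approximation of (C.1), int e^{-p x} h(x) dx, on the grid x."""
    f = np.exp(-p * x) * h_vals
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

x = np.linspace(-20.0, 20.0, 200_001)
h = np.exp(-x**2)                 # illustrative h(x) = exp(-x^2)
dh = -2.0 * x * np.exp(-x**2)     # D^1 h(x)
p = 0.7
lh = two_sided_laplace(h, x, p)
ldh = two_sided_laplace(dh, x, p)
print(ldh, p * lh)                # (C.2) with n = 1: the two values agree
```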
The case $n \notin \mathbb{Z}^+$ is discussed in (59). We first introduce the definition of fractional derivatives. For any $\alpha > 0$ and step size $h > 0$, define the fractional difference operator $\Delta^\alpha$ as
$$\Delta^\alpha f(x) = \sum_{j=0}^{\infty} \frac{\Gamma(\alpha + 1)}{j!\,\Gamma(\alpha - j + 1)} (-1)^j f(x - jh)$$
and write the fractional derivative in the Grünwald-Letnikov finite difference form as
$$D^\alpha f(x) := \frac{d^\alpha f(x)}{dx^\alpha} = \lim_{h\to 0} \frac{\Delta^\alpha f(x)}{h^\alpha}. \tag{C.3}$$
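The Grünwald-Letnikov form (C.3) can be evaluated by truncating the series. The weights $(-1)^j\Gamma(\alpha+1)/(j!\,\Gamma(\alpha-j+1))$ are computed through the ratio $w_j/w_{j-1} = (j-1-\alpha)/j$, which avoids the poles of $\Gamma$. For the illustrative choice $f(x) = e^x$, the binomial series gives $\Delta^\alpha f(x)/h^\alpha = e^x\left((1-e^{-h})/h\right)^\alpha \to e^x$, which the sketch uses as a check; the truncation length, step size and function names are assumptions.

```python
import numpy as np

def gl_weights(alpha, n_terms):
    """w_j = (-1)^j Gamma(alpha + 1) / (j! Gamma(alpha - j + 1)), j = 0, ..., n_terms - 1."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for j in range(1, n_terms):
        w[j] = w[j - 1] * (j - 1 - alpha) / j
    return w

def gl_derivative(f, x, alpha, h, n_terms):
    """Truncated Grünwald-Letnikov quotient Delta^alpha f(x) / h^alpha from (C.3)."""
    j = np.arange(n_terms)
    return float(np.sum(gl_weights(alpha, n_terms) * f(x - j * h)) / h**alpha)

# f(x) = e^x: the exact quotient is e^x ((1 - e^{-h}) / h)^alpha, which tends to e^x
val = gl_derivative(np.exp, 0.3, 0.5, 1e-3, 40_000)
print(val, np.exp(0.3))
```

For $\alpha = 1$ the weights reduce to $(1, -1, 0, \ldots)$, i.e. the backward difference quotient.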
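Alternative integral forms of the fractional derivative are also presented in (59), as shown in Tables C.1–C.2. Consider the Riemann-Liouville fractional derivative of order $0 < \alpha < 1$, whose Laplace transform is
$$\int_{-\infty}^{\infty} e^{-px} D^\alpha f(x)\, dx = \int_{-\infty}^{\infty} e^{-px} \frac{d}{dx} \int_0^\infty f(x - y) \frac{y^{-\alpha}}{\Gamma(1-\alpha)}\, dy\, dx$$
$$= \frac{1}{\Gamma(1-\alpha)} \left( \left[ e^{-px} \int_0^\infty f(x - y)\, y^{-\alpha}\, dy \right]_{x=-\infty}^{\infty} - \int_{-\infty}^{\infty} \int_0^\infty f(x - y)\, y^{-\alpha}\, dy\, d e^{-px} \right) := \frac{1}{\Gamma(1-\alpha)} (I_1 - I_2),$$
where, for $p > 0$,
$$I_2 = -p \int_0^\infty e^{-py} y^{-\alpha} \int_{-\infty}^{\infty} f(z)\, e^{-pz}\, dz\, dy = -p\,\Gamma(1-\alpha)\, p^{\alpha-1} \mathcal{L}_f(p) = -\Gamma(1-\alpha)\, p^{\alpha} \mathcal{L}_f(p)$$
when $e^{-px} y^{-\alpha} f(x - y)$ is integrable; here the substitution $z = x - y$ and the identity $\int_0^\infty e^{-py} y^{-\alpha}\, dy = \Gamma(1-\alpha)\, p^{\alpha-1}$ are used. If it further holds that
$$\lim_{x\to\infty} e^{-px} \int_0^\infty f(x - y)\, y^{-\alpha}\, dy - \lim_{x\to-\infty} e^{-px} \int_0^\infty f(x - y)\, y^{-\alpha}\, dy = 0,$$
i.e., $I_1 = 0$, then
$$\mathcal{L}_{D^\alpha f}(p) = p^\alpha \mathcal{L}_f(p).$$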
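| Form | Expression |
|---|---|
| Generator form | $\int_0^\infty \left(f(x) - f(x-y)\right) \frac{\alpha\, y^{-\alpha-1}}{\Gamma(1-\alpha)}\, dy$ |
| Caputo form | $\int_0^\infty \frac{d}{dx} f(x-y)\, \frac{y^{-\alpha}}{\Gamma(1-\alpha)}\, dy$ |
| Riemann-Liouville form | $\frac{d}{dx} \int_0^\infty f(x-y)\, \frac{y^{-\alpha}}{\Gamma(1-\alpha)}\, dy$ |

Table C.1: Alternative integral forms for the fractional derivative when $0 < \alpha < 1$.

| Form | Expression |
|---|---|
| Generator form | $\int_0^\infty \left(f(x-y) - f(x) + y \frac{d}{dx} f(x)\right) \frac{\alpha(\alpha-1)\, y^{-\alpha-1}}{\Gamma(2-\alpha)}\, dy$ |
| Caputo form | $\int_0^\infty \frac{d^2}{dx^2} f(x-y)\, \frac{y^{1-\alpha}}{\Gamma(2-\alpha)}\, dy$ |
| Riemann-Liouville form | $\frac{d^2}{dx^2} \int_0^\infty f(x-y)\, \frac{y^{1-\alpha}}{\Gamma(2-\alpha)}\, dy$ |

Table C.2: Alternative integral forms for the fractional derivative when $1 < \alpha < 2$.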
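Consider the stochastic partial differential equation (SPDE)
$$L\left(\frac{\partial}{\partial t_1}, \frac{\partial}{\partial t_2}\right) X(t_1, t_2) = \epsilon(t_1, t_2), \quad t_1, t_2 \in \mathbb{R}, \tag{C.4}$$
where $L$ is a linear differential operator. The Green's function of $L$ satisfies
$$L\left(\frac{\partial}{\partial t_1}, \frac{\partial}{\partial t_2}\right) G(t_1, t_2) = \delta_0(t_1)\,\delta_0(t_2), \quad t_1, t_2 \in \mathbb{R}, \tag{C.5}$$
where $\delta_0$ is the Dirac measure at 0.
When $\epsilon$ is Gaussian white noise, it holds that
$$E[\epsilon(s_1, s_2)\,\epsilon(s_1 + t_1, s_2 + t_2)] = \delta_0(t_1)\,\delta_0(t_2), \quad \forall s_1, s_2, t_1, t_2 \in \mathbb{R}. \tag{C.6}$$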
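Since the solution of (C.4) is $X(t_1, t_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} G(t_1 - u_1, t_2 - u_2)\,\epsilon(u_1, u_2)\,du_1\,du_2$, (C.6) gives the covariance function of $X$:
$$
\begin{aligned}
C(t_1, t_2) &:= E[X(s_1, s_2)\,X(s_1 + t_1, s_2 + t_2)], \quad \forall s_1, s_2, t_1, t_2 \in \mathbb{R} \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} G(u_1, u_2)\,G(u_1 + t_1, u_2 + t_2)\,du_1\,du_2, \quad \forall t_1, t_2 \in \mathbb{R}.
\end{aligned}
\tag{C.7}
$$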
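As a small illustration of (C.7), assume the hypothetical separable operator $L = \left(\frac{\partial}{\partial t_1} + \lambda\right)\left(\frac{\partial}{\partial t_2} + \mu\right)$ with $\lambda, \mu > 0$, chosen only because everything is explicit. Its Green's function is $G(t_1, t_2) = e^{-\lambda t_1 - \mu t_2}$ for $t_1, t_2 \ge 0$ and $0$ otherwise, so the double integral in (C.7) factorizes and should equal $e^{-\lambda|t_1| - \mu|t_2|}/(4\lambda\mu)$. The sketch below evaluates the two one-dimensional integrals with the midpoint rule; the cutoff and grid size are illustrative.

```python
import math

def green_autocorr_1d(rate, t, n=20_000):
    """Midpoint-rule value of int G(s) G(s + t) ds for G(s) = exp(-rate s) 1{s >= 0}."""
    lo = max(0.0, -t)            # both s >= 0 and s + t >= 0 are required
    hi = lo + 50.0 / rate        # exp(-2 rate s) is negligible beyond this point
    ds = (hi - lo) / n
    total = 0.0
    for i in range(n):
        s = lo + (i + 0.5) * ds
        total += math.exp(-rate * s) * math.exp(-rate * (s + t))
    return total * ds

def covariance_c7(lam, mu, t1, t2):
    """(C.7) for the hypothetical separable Green's function: the integral factorizes."""
    return green_autocorr_1d(lam, t1) * green_autocorr_1d(mu, t2)

def covariance_closed_form(lam, mu, t1, t2):
    return math.exp(-lam * abs(t1) - mu * abs(t2)) / (4.0 * lam * mu)
```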
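As presented in (32), when the operator $L$ takes the form of
$$L\left(\frac{\partial}{\partial t_1}, \frac{\partial}{\partial t_2}\right) = c_1 \frac{\partial^2}{\partial t_1^2} + c_2 \frac{\partial^2}{\partial t_2^2} + c_3 \frac{\partial^2}{\partial t_1 \partial t_2} + c_4 \frac{\partial}{\partial t_1} + c_5 \frac{\partial}{\partial t_2} + c_6, \tag{C.8}$$
the Laplace transforms of the Green's function and the covariance function derived from (C.4) satisfy
$$\mathcal{L}_G(p, q) = \frac{1}{L(p, q)}, \tag{C.9}$$
$$\mathcal{L}_C(p, q) = \frac{1}{L(p, q)\,L(-p, -q)}. \tag{C.10}$$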
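Consider the hypothetical special case of (C.8) with $c_1 = c_2 = 0$, $c_3 = 1$, $c_4 = \mu$, $c_5 = \lambda$, $c_6 = \lambda\mu$, i.e. $L(p, q) = (p + \lambda)(q + \mu)$ with $\lambda, \mu > 0$. Its covariance obtained from (C.7) is $C(t_1, t_2) = e^{-\lambda|t_1| - \mu|t_2|}/(4\lambda\mu)$, and (C.10) predicts $\mathcal{L}_C(p, q) = 1/\big((\lambda^2 - p^2)(\mu^2 - q^2)\big)$ for $|p| < \lambda$, $|q| < \mu$. The sketch below compares a trapezoid-rule two-sided transform of this $C$ with $1/\big(L(p, q)L(-p, -q)\big)$; the cutoffs and grid sizes are illustrative assumptions.

```python
import math

def laplace_1d_exp_cov(rate, p, n=40_000):
    """Trapezoid-rule two-sided Laplace transform of exp(-rate|t|) / (2 rate), |p| < rate."""
    t_max = 60.0 / (rate - abs(p))   # slowest tail decays like exp(-(rate - |p|)|t|)
    dt = 2.0 * t_max / n             # n is even, so the kink at t = 0 is a grid node

    def g(t):
        return math.exp(-p * t - rate * abs(t)) / (2.0 * rate)

    total = 0.5 * (g(-t_max) + g(t_max)) + sum(g(-t_max + i * dt) for i in range(1, n))
    return total * dt

def laplace_cov_2d(lam, mu, p, q):
    # C(t1, t2) factorizes, so its two-sided Laplace transform factorizes too
    return laplace_1d_exp_cov(lam, p) * laplace_1d_exp_cov(mu, q)

def rhs_c10(lam, mu, p, q):
    def L(a, b):                     # hypothetical operator symbol L(p, q)
        return (a + lam) * (b + mu)
    return 1.0 / (L(p, q) * L(-p, -q))
```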
As a special case of (C.8), the elliptic form of the operator $L$ is discussed in (74), where the corresponding SPDE is
$$\left( \frac{\partial^2}{\partial t_1^2} + \frac{\partial^2}{\partial t_2^2} - \gamma^2 \right) X(t_1, t_2) = \epsilon(t_1, t_2). \tag{C.11}$$
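Denote by $K_\ell$ the modified Bessel function of the second kind of order $\ell$. The Green's function for (C.11) is thus
$$G(t_1, t_2) = \mathcal{L}^{-1}\left[ \frac{1}{p^2 + q^2 - \gamma^2} \right] = -\frac{1}{2\pi} K_0\left( \gamma \sqrt{t_1^2 + t_2^2} \right).$$
The minus sign arises because $\left( \frac{\partial^2}{\partial t_1^2} + \frac{\partial^2}{\partial t_2^2} - \gamma^2 \right) \frac{1}{2\pi} K_0\big(\gamma \sqrt{t_1^2 + t_2^2}\big) = -\delta_0(t_1)\,\delta_0(t_2)$; it cancels in (C.7), so the covariance of $X$ is unaffected.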
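The Bessel-function form of the Green's function can be checked numerically. A minimal sketch, assuming the standard integral representation $K_0(z) = \int_0^{\infty} e^{-z\cosh u}\,du$ for $z > 0$ (evaluated with a truncated trapezoid rule whose cutoff and grid are illustrative choices): it verifies that $g(r) = K_0(\gamma r)$ solves the radial equation $g'' + g'/r - \gamma^2 g = 0$ for $r > 0$, and that $K_0(z) + \log(z/2) + \gamma_E \to 0$ as $z \to 0$, where $\gamma_E$ is the Euler-Mascheroni constant. The first property says that the operator in (C.11) annihilates $K_0(\gamma r)$ away from the origin; the second gives the logarithmic singularity $-\frac{1}{2\pi}\log r$ of $K_0(\gamma r)/(2\pi)$ that fixes the normalizing factor $\frac{1}{2\pi}$.

```python
import math

EULER_GAMMA = 0.5772156649015329

def k0(z, u_max=30.0, n=20_000):
    """K_0(z), z > 0, via K_0(z) = int_0^inf exp(-z cosh u) du (trapezoid rule).

    The integrand is even in u and decays double-exponentially, so the
    trapezoid rule converges very quickly; the value at u_max is negligible.
    """
    du = u_max / n
    total = 0.5 * math.exp(-z)               # half weight at u = 0
    for i in range(1, n):
        total += math.exp(-z * math.cosh(i * du))
    return total * du

def radial_residual(gamma, r, dr=1e-3):
    """Central-difference value of g'' + g'/r - gamma^2 g for g(r) = K_0(gamma r)."""
    g_minus, g0, g_plus = (k0(gamma * (r + d)) for d in (-dr, 0.0, dr))
    g2 = (g_plus - 2.0 * g0 + g_minus) / dr ** 2
    g1 = (g_plus - g_minus) / (2.0 * dr)
    return g2 + g1 / r - gamma ** 2 * g0
```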
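The spectral density function of $X$, i.e. the Fourier transform of the covariance function $C$, is derived as
$$f_X(\xi, \eta) = \frac{1}{(2\pi)^2} \mathcal{L}_C(i\xi, i\eta) = \frac{1}{(2\pi)^2 \left( -\xi^2 - \eta^2 - \gamma^2 \right)^2} \propto \frac{1}{\left( \xi^2 + \eta^2 + \gamma^2 \right)^2}.$$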
(24) considered the SPDE
$$\left( \nabla^2 - \beta^2 \right)^\nu X(t_1, t_2) = \epsilon(t_1, t_2), \tag{C.12}$$
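where $\nabla^2 = \partial^2/\partial t_1^2 + \partial^2/\partial t_2^2$, $\epsilon$ is a white noise field, $\beta \in \mathbb{R}$, $\nu > 0$, and
$$\left( \nabla^2 - \beta^2 \right)^\nu = (-1)^\nu \sum_{j=0}^{\infty} \binom{\nu}{j} \left( -\nabla^2 \right)^j \beta^{2(\nu - j)}. \tag{C.13}$$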
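The Green's function of $\left( \nabla^2 - \beta^2 \right)^\nu$ satisfies
$$(-1)^\nu \sum_{j=0}^{\infty} \binom{\nu}{j} \left( -\nabla^2 \right)^j \beta^{2(\nu - j)} G(t_1, t_2) = \delta_0(t_1)\,\delta_0(t_2). \tag{C.14}$$
Taking the Laplace transform on both sides of (C.14) yields
$$(-1)^\nu \sum_{j=0}^{\infty} \binom{\nu}{j} \left( -p^2 - q^2 \right)^j \beta^{2(\nu - j)}\, \mathcal{L}_G(p, q) = 1.$$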
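Thus,
$$\mathcal{L}_G(p, q) = \left( \sum_{j=0}^{\infty} \binom{\nu}{j} \left( p^2 + q^2 \right)^j \left( -\beta^2 \right)^{\nu - j} \right)^{-1} = \frac{1}{\left( p^2 + q^2 - \beta^2 \right)^\nu}.$$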
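The spectral density function of $X$ is
$$f_X(\xi, \eta) = \frac{1}{(2\pi)^2} \left( (-1)^\nu \sum_{j=0}^{\infty} \binom{\nu}{j} \left( -(i\xi)^2 - (i\eta)^2 \right)^j \beta^{2(\nu - j)} \right)^{-2} \propto \frac{1}{\left( \xi^2 + \eta^2 + \beta^2 \right)^{2\nu}},$$
which is also presented in (75).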
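On the spectral side, the series in (C.13) becomes, up to the factor $(-1)^\nu$, the generalized binomial expansion $\sum_{j=0}^{\infty} \binom{\nu}{j} (\xi^2 + \eta^2)^j \beta^{2(\nu - j)} = (\xi^2 + \eta^2 + \beta^2)^\nu$, which converges for $\xi^2 + \eta^2 < \beta^2$. A minimal sketch, assuming that range and a fixed truncation `n_terms` (an illustrative choice), sums the series with the coefficient recursion $\binom{\nu}{j} = \binom{\nu}{j-1}(\nu - j + 1)/j$ and returns the unnormalized spectral density $(\xi^2 + \eta^2 + \beta^2)^{-2\nu}$.

```python
def binomial_series(nu, x, b, n_terms=200):
    """Truncated sum_j binom(nu, j) x^j b^(nu - j), which equals (x + b)^nu for 0 <= x < b."""
    coef, ratio_pow, total = 1.0, 1.0, 0.0
    for j in range(n_terms):
        total += coef * ratio_pow          # binom(nu, j) * (x / b)^j
        coef *= (nu - j) / (j + 1)         # binom(nu, j + 1) from binom(nu, j)
        ratio_pow *= x / b
    return b ** nu * total

def spectral_density_unnormalized(xi, eta, beta, nu):
    """(xi^2 + eta^2 + beta^2)^(-2 nu), computed from the series behind (C.13)."""
    return binomial_series(nu, xi ** 2 + eta ** 2, beta ** 2) ** (-2.0)
```

For integer $\nu$ the coefficients vanish after $j = \nu$ and the series is an exact finite sum.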