MINIMAX LOWER BOUNDS IN HIGH ORDER TENSOR MODELS WITH APPLICATIONS TO NEUROIMAGING

By

Chitrak Banerjee

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics — Doctor of Philosophy

2020

ABSTRACT

MINIMAX LOWER BOUNDS IN HIGH ORDER TENSOR MODELS WITH APPLICATIONS TO NEUROIMAGING

By Chitrak Banerjee

The minimax principle is a very useful concept in mathematical statistics for finding optimal estimators. While the unbiasedness and invariance principles are useful tools for finding optimal estimators, they are often restrictive and in certain cases may not even yield optimal estimators, see Ferguson (1967). The minimax principle, on the other hand, is based on ordering decision rules by their maximum risk and is often less restrictive. While there are several methods for finding minimax optimal estimators, such as those due to Hájek, Le Cam, Fano and Assouad, in our work we specifically use Hájek's and Fano's methods to explore the minimax optimality of integral curve estimators in high order tensor models.

High angular resolution diffusion imaging (HARDI) is a popular in-vivo brain imaging technique proposed by Özarslan and Mareci (2003). Besides the mathematical model for HARDI, successful tracing of neural fibers using HARDI presents the challenge of estimation and uncertainty quantification in the presence of measurement errors. Our work here is based on the semi-parametric estimation method proposed by Carmichael and Sakhanenko (2015), where the authors provided a consistent method for tracing fibers in the presence of measurement error using HARDI.

The first work described here establishes that the estimators proposed in Carmichael and Sakhanenko (2015) are minimax optimal with respect to their asymptotic risk. The framework of HARDI allows us to accommodate complex neural fiber structures where fiber tracts cross each other, converge, diverge, "fan out" or "kiss"; thus our work generalizes the minimax lower bound results in Sakhanenko (2012), where a similar result was established under a simpler model in which imaging signals are modeled by a vector field perturbed by an additive noise.

The second work establishes global bounds for the integral curve estimators proposed by Carmichael and Sakhanenko (2015), suggesting that the asymptotic rate of convergence of the integral curve estimator is minimax optimal not only locally but also globally. Additionally, in the simulation study of our second work we introduce a metric based on global minimax optimal rates which can compare the relative accuracy of different imaging protocols that are used to obtain HARDI data.

Copyright by CHITRAK BANERJEE 2020

Dedicated to my late grandmother Anjali Banerjee; her fighting spirit and commitment to social justice inspire me every day.

ACKNOWLEDGMENTS

I would like to extend my gratitude towards my advisor Professor Lyudmila Sakhanenko, Professor Hira Lal Koul and my mentor Dhruv B. Sharma for their constant support and encouragement. I would also like to thank Dr. David C. Zhu; it was a great pleasure working with him. I would also like to thank my committee members Professor Tapabrata Maiti and Professor Frederi G. Viens. During my five years at Michigan State University, I had a journey filled with excitement and challenges. I would not have made it to the end if my best friend Piyali Basak hadn't been there beside me.
Lastly I would like to thank my friends and colleagues Metin Arda Eroğlu, Andrew Dennhardt and many others with whom I shared a lot of good moments.

CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
Chapter 1 Introduction
  1.1 Review of DT-MRI
  1.2 Review of minimax lower bounds
    1.2.1 Hájek's principle
    1.2.2 Fano's principle
  1.3 Review of Integral curve estimation
  1.4 Model
    1.4.1 Assumptions
    1.4.2 Estimation
    1.4.3 Asymptotic Distribution
Chapter 2 Local Minimax Bounds
  2.1 Introduction
    2.1.1 Notations
  2.2 Assumptions and main results
    2.2.1 Assumptions
    2.2.2 Parametric subclass of tensors
  2.3 Experiment
  2.4 Remarks and Discussion
  2.5 Proofs
    2.5.1 Constructed parametric subclass
    2.5.2 Main Lemmas
    2.5.3 Proofs of Theorems
      2.5.3.1 Proof of Theorem 1
      2.5.3.2 Proof of Theorem 2
Chapter 3 Global Minimax Bound
  3.1 Introduction
  3.2 Assumptions and main results
    3.2.1 Parametric subclass of tensors
  3.3 Simulation study and Data analysis
    3.3.1 Simulation Study
    3.3.2 Neuroimaging example
  3.4 Remarks and Conclusion
  3.5 Proofs
    3.5.1 Proof of Lemma 10
    3.5.2 Proof of Theorem 5
    3.5.3 Proof of Theorem 6
Chapter 4 Future research
  4.1 Extension of integral curve estimation
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1: The 25th percentile, median and 75th percentile of the κ values are ordered top to bottom in each line for different combinations of SNR and thickness of fibers; the sample size is n = 60^3.

Table 3.2: Protocols for HARDI. Note that the second protocol has a total of 6 b0 images after 2 repetitions of 3 b0 images.

Table 3.3: Comparison of the constant κ for tracing of anterior fibers based on imaging datasets obtained via different scanning protocols. Here δ = 0.003, β = 10^{-7}.

LIST OF FIGURES

Figure 1.1: A neuronal fiber bundle across the genu of corpus callosum is created based on the C-S (2015) method and is shown on the axial plane in (a) 2D and (b) 3D views.

Figure 2.1: Boxplot of log ||X̂_n(t_0) − x(t_0)||/log n for the C-pattern. We traced each fiber for 30 steps of size δ = 0.02. 100 independent samples of n = n_0^3 observations are used to create each boxplot. The labels on the x-axis mark n_0. The theoretical value is −1/3. The location t_0 is the endpoint of the trace.

Figure 2.2: Boxplot of log ||X̂_n(t_0) − x(t_0)||/log n for the Y-pattern. We traced each fiber for 30 steps of size δ = 0.02. 100 independent samples of n = n_0^3 observations are used to create each boxplot. The labels on the x-axis mark n_0. The theoretical value is −1/3. We trace the main branch.

Figure 3.1: We trace the integral curve using the Carmichael and Sakhanenko (2015) method creating the "Y" pattern. Here sample size n = 60^3, signal-to-noise ratio SNR = 2, thickness of the fiber ε = 0.04, step size δ = 2 and the constant β = 10^{-7}.

Figure 3.2: The red, blue and green lines show the 25th, 50th and 75th percentiles of the κ values across all the sample sizes n = n_0^3, n_0 = 30, . . . , 100 with n_0 increasing by 10 at each step, repeated 100 times. Here we have used step size δ = 0.02 and β = 10^{-7} for each of the iterations.

Figure 3.3: The computation times in seconds for simulation of κ values for different choices of SNR and thickness values.

Figure 3.4: A neuronal fiber bundle across the genu of corpus callosum is created based on the Carmichael and Sakhanenko (2015) method. In (a) one particular seed point was used. The confidence ellipsoids along the fiber have 95% confidence. In (b) several seed points were used to create several estimated fibers. Two different branches are shown in two colors.

LIST OF ALGORITHMS

Algorithm 1: Estimation of Integral Curves
Chapter 1

Introduction

Neuroimaging is one of the most important biomedical imaging tools and plays a key role in detecting anomalies in the human brain due to injuries such as concussions, as well as brain tumors, cognitive impairments, the onset of Alzheimer's Disease (AD) and many other brain related illnesses. Thus, it is often imperative to have a proper imaging technique which can better equip neurosurgeons and clinicians to administer the correct treatment for their patients. The two most common imaging techniques in use at present are computed tomography (CT) and magnetic resonance imaging (MRI). While a CT scan uses X-rays, MRI uses radio waves, causing less potential harm to human tissue due to radiation from high frequency beams. In our work we will specifically explore some interesting statistical properties of one such imaging technique, high angular resolution diffusion imaging (HARDI), which builds on diffusion tensor (DT-MRI) scans.

1.1 Review of DT-MRI

Magnetic resonance imaging (MRI) utilizes the dynamics of self-spinning protons, most commonly in water molecules, as the source of energy to generate the MRI signal. Under a strong magnetic field, a group of these spins forms a net magnetization. This net magnetization can be perturbed by a radio frequency (RF) electromagnetic wave. Its wobbling (precessing) phenomenon can be measured as signals by an MRI scanner. By manipulating the magnetic field using gradients, we can identify the locations of the signal, which in turn allows us to generate images. As MRI involves no ionizing radiation and thus the potential damage to the human body is minimal, it has become an important tool in both clinical and research applications.

The phenomenon of water diffusion is further taken advantage of in MRI to develop diffusion weighted imaging (DWI). Water diffusion in the presence of a magnetic field gradient leads to MRI signal loss. In an unrestricted environment, water and other molecules move or diffuse randomly in three dimensions as a result of thermal energy. This motion is called Brownian motion. By studying the Brownian motion of molecules (water molecules in our case) in the brain, we can provide information regarding the neuronal structural connectivity in vivo. These measurements have been made possible with DWI, see Basser et al. (1994), Le Bihan et al. (2001), which applies diffusion-weighted gradients in various directions to assess the diffusing directions of the water molecules. With DWI data, as in the commonly used diffusion tensor imaging (DTI) techniques, diffusivity values and the principal diffusion orientation can be estimated at each voxel. Since healthy axons contain intact myelin sheaths and tend to align in organized orientations, water diffusivity in a voxel tends to be preferentially along the direction of the axonal bundles. By inspecting the orientations of the diffusion tensors at neighboring voxels, axonal fiber bundles can be traced. Successful axonal tracing can be used to understand the structural connections between brain regions, see Le Bihan et al. (2001), Zhu and Majumdar (2014). It can also be used to assess axonal changes over time in applications such as brain maturation in young children, see Chang and Zhu (2013), and axonal degeneration in Alzheimer's disease, see Zhu et al. (2013).
However, successful tractography based on DWI data faces some fundamental challenges: specifically, the need for a high signal-to-noise ratio (SNR), high spatial resolution, a relatively long scan time, the ability to resolve crossing fibers, full coverage of the tracts of interest, and the ability to trace in regions with low diffusion anisotropy. To address the issue of crossing fibers, high angular resolution diffusion imaging (HARDI), proposed by Özarslan and Mareci (2003) within the DWI framework, has gained some success. The issues related to neuronal fiber tractography in DWI motivated our research on integral curve estimation.

1.2 Review of minimax lower bounds

Here we review some of the fundamental principles and methods that we commonly use in our model to find minimax lower bounds. Suppose Θ is a nonempty set commonly referred to as the parameter space, and A is a nonempty set of actions available to the statistician, called the action space. Let w be a non-negative real-valued function defined on Θ × A, referred to as the loss function. Also, suppose X is a random variable from the probability space (X, B, P_θ). A statistical decision problem, or statistical game, is a game (Θ, A, w) coupled with an experiment involving a random variable X whose distribution P_θ depends on the value of the unknown parameter θ ∈ Θ. On the basis of the outcome of the experiment X = x, the statistician chooses an action d(x) ∈ A. Such a function d, which maps the sample space X into A, is called a decision or a statistical decision rule. Therefore d(X), and hence w(θ, d(X)), is a random quantity. The non-negative quantity defined by

E_θ w(θ, d(X)) = ∫ w(θ, d(x)) dP_θ(x)

is called the risk function of the decision rule d. To illustrate, consider the following example: let X_1, . . . , X_n be a sample from N(µ, 1), where µ ∈ R is unknown and R is the parameter space. Suppose based on X_1, . . . , X_n we define the decision rule d(X_1, . . . , X_n) = X̄; then under squared error loss the risk function is given by E_µ(X̄ − µ)^2, which equals 1/n for every µ.

The fundamental problem of decision theory can be stated as follows: given a game (Θ, A, w) and a random observable X whose distribution depends on θ ∈ Θ, how can we choose the best decision rule? Traditionally there are two fundamental approaches to finding an optimal decision rule, see Ferguson (1967). The first is to restrict the available decision rules; examples of such restrictions include unbiasedness and invariance. The second is to order the decision rules; examples are the Bayes principle and the minimax principle. In our work we are interested in the minimax principle. A decision rule d_0 is said to be minimax if

sup_{θ ∈ Θ} E_θ w(θ, d_0) = inf_d sup_{θ ∈ Θ} E_θ w(θ, d).

In other words, a minimax decision rule, if it exists, is the decision rule that minimizes the maximum risk among all possible decision rules d. There are many proposed methods for finding minimax decision rules. In light of our problem we will review Hájek's principle and the method due to Fano.

1.2.1 Hájek's principle

Hájek's principle for finding a minimax decision rule is based on the fundamental principle of Local Asymptotic Normality (LAN). As described in Ibragimov and Khasminskii (2013), suppose (X^n, B^n, P_{θ,n}), θ ∈ Θ ⊂ R^k, is a family of statistical experiments or random variables.
Then the family Pθ,n with the density f depending on θ is called locally asymptotic normal (LAN) at t ∈ Θ as n → ∞, if for some non-degenerate k × k matrix Ψ(n) = Ψ(n, t) 4 and any u ∈ Rk, the representation Zn,t(u) = dP t+Ψ(n)u,n dPt,n (Xn) = exp (cid:18) uT ∆n,t − 1 2 (cid:19) (cid:107)u(cid:107)2 + ϕn(u, t) , is vaild, where the distribution or the law of ∆n,t L(∆n,t|Pt,n) (cid:55)→ N (0,I), n → ∞, where I is the identity matrix of order k. Moreover, for any u ∈ Rk, ϕn(u, t) (cid:55)→ 0 in probability Pt,n as n → ∞. The quantities ∆n,t and Ψ(n, t) are given by Ψ(n, t) = (nI(t))−1/2 , n(cid:88) ∆n,t = (nI(t))−1/2 i=1 ∂ ln f (Xi, t) ∂t , where I(t) is the information matrix. Next we would like to review the concept of statistical regularity. A family of random variables X with the density p(x; θ), θ ∈ Θ, is called regular if 1. p(x; θ) is a continuous function on Θ for ν−almost all x. 2. X has finite Fisher’s information at each point θ ∈ Θ. 3. The function ∂ ∂θ p1/2(x; θ) is continuous in the space L2(ν), where L2(ν) is the space of functions whose second order moments with respect to measure ν are finite. Note that if the density p(x; θ) satisfies conditions 2 and 3 above it can be modified on sets of ν−measure zero (which may depend on θ) in such a manner that it becomes a 5 continuous function of θ. Furthermore, if we consider the measure ν as a probability measure and p1/2(X; θ), a random function of θ then it will satisfy the condition E(cid:16) p1/2(X; θ + h) − p1/2(X; θ) (cid:17)2 ≤ Bh2, where B > 0 is a constant. Now with the definition of statistical regularity, let us present the following theorem due to H´ajek, which is an important result to show that a family of random variables is LAN. Theorem 1. Let Θ ⊂ Rk, fj be the density of the j−th regular experiment depending on θ and the matrix Ψ2(n, θ) is a positive definite matrix and the following conditions are satisfied: 1. For any u0 > 0 n(cid:88) j=1 n→∞ sup lim |u| 0, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Ψ−1(n) Then the family of measures Pθ,n(A) =(cid:82) ···(cid:82) n(cid:81) lim n→∞ Ej j=1 A j=1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)2+δ = 0. ∂ ln f (Xi, t) ∂t f (xj, θ)νj(dx) satisfies the LAN condition 6 for θ = t with  n(cid:88) j=1 Ψ−1(n, t) = Ij(t) −1/2 n(cid:88) and ∆n,t = Ψ−1(n, t) ∂ ln f (Xi, t) . ∂t j=1 Note that condition 2 in theorem 1 sometimes can be replaced by Lindeberg’s condition which often does not require finiteness of moments of order (2 + δ). However Lindeberg’s condition is usually harder to verify. Next we define a class of the loss functions Wε,2 as follows 1. The function w ∈ Wε,2 is non-negative on Rk, where k is the dimension of the parameter set; moreover, w(0) = 0 and w(u) is continuous at u = 0 but is not identically 0. 2. Function w is symmetric, that is w(u) = w(−u). 3. The sets {u : w(u) < c} are convex sets for all c > 0. 4. Any function w ∈ Wε,2 grows slower than exp(ε|u|2), ε > 0 as |u| → ∞. Having defined the class of loss functions Wε,2, let us state the main lemma due to H´ajek for establishing the minimax lower bound for the asymptotic risk of the estimators of θ. Lemma 1 (H´ajek’s Lemma). Suppose X1, X2, . . . is a sequence of random variables from a regular family of distributions and let the probability measure induced by Xn, Pθ,n satisfy the LAN condition at the point θ = t, with a normalizing matrix Ψ(n, t) such that Ψ(n, t) → Ψ(t) as n → ∞ where Ψ(t), t ∈ Θ, is positive definite. Then for any family of estimators 7 Tn = T (X1, . . . 
, X_n), any loss function w ∈ W_{ε,2}, and any δ > 0, we have

lim_{n→∞} sup_{|θ−t|<δ} E_θ w( Ψ^{-1}(n, t)(T_n − θ) ) ≥ (1/(2π)^{k/2}) ∫_{R^k} w(x) exp(−||x||^2/2) dx.    (1.2.1.1)

To find the minimax estimator one can optimize the left hand side of (1.2.1.1) with respect to all possible estimators of θ.

1.2.2 Fano's principle

The principle of Fano, see Devroye (1987), is built upon Shannon's information and a discretization of the parameter space. Here we first introduce the key concepts of Shannon information and Kullback–Leibler (KL) divergence. Suppose X is a discrete random variable with probabilities p_1, . . . , p_n depending on θ ∈ Θ; then

I(X, θ) = − Σ_{i=1}^n p_i log p_i

is called the Shannon information or entropy function. If X is absolutely continuous with respect to Lebesgue measure and the density f of X depends on θ ∈ Θ, then the Shannon entropy or information is given by I(X, θ) = −E_θ(log f(X)). Next we define the KL divergence. Suppose X is a random variable with density f depending on a parameter θ ∈ Θ; then the KL divergence between the densities f_θ and f_θ' is given by

K(f_θ, f_θ') = ∫ f_θ log( f_θ / f_θ' ) dν(x),

where ν is the Lebesgue measure.

Lemma 2 (Fano's Lemma). Let X be a random variable with density equal to one of the r + 1 possible densities f_1, . . . , f_{r+1}, where K(f_i, f_j) ≤ β for all i ≠ j. Let π(X) ∈ {1, . . . , r + 1} be an estimate of the index. Then

sup_i P_i( π(X) ≠ i ) ≥ 1 − (β + log 2)/log r,

where P_i is the probability measure induced by f_i.

In our work we have extended Fano's lemma to a multidimensional setting to prove the global minimax bound for the integral curve estimators. Besides the principles of Hájek and Fano, there are other useful principles due to Le Cam and Assouad, see Guntuboyina (2011), which we do not pursue in this work.

1.3 Review of Integral curve estimation

The problem of fiber tracing from the imaging data obtained from DT-MRI was first considered as a problem of integral curve estimation by Koltchinskii et al. (2007). They considered the model where a vector field v : G → R^d was observed at locations X_i ∈ G, i = 1, . . . , n, perturbed by an additive noise. The model equation is

V_i = v(X_i) + ξ_i,

where ξ_i, i = 1, . . . , n, are i.i.d. bounded random vectors with Eξ = 0 and Cov(ξ, ξ) = Σ. The neural fibers were modeled as the solution of the ODE, or equivalently the integral equation,

dx(t)/dt = v(x(t)), t ≥ 0, x(0) ∈ G   ⟺   x(t) = x(0) + ∫_0^t v(x(s)) ds.

In their work the authors developed a theoretically rigorous non-parametric approach that provides an estimate X̂(t), t ≥ 0, based on the data (X_i, V_i), i = 1, . . . , n. As an alternative method, probabilistic fiber tractography, as described in Behrens et al. (2007), is another popular technique in DWI because it can assess the relative strength of a fiber connection. However, this technique employs Monte Carlo sampling and bootstrap techniques, and depends on somewhat arbitrary prior parameter assumptions based on fully parametric models. When the parameter assumptions are incorrect, the error from the repeated Monte Carlo sampling often exacerbates the estimation problem. To build a more sophisticated data-driven model upon the methodology proposed by Koltchinskii et al. (2007), the "low-order" DTI model of Carmichael and Sakhanenko (2016) and the "high-order" HARDI model of C-S (2015) were proposed recently.
With these approaches, they demonstrated tighter confidence ellipsoids around the fibers, and more robustness in handling crossing fibers than with other DTI methods, see C-S (2015). The present work will concentrate on HARDI model by C-S (2015). To further motivate this model we display an enhanced image of fiber tracing. (a) 2D view on axial plane. (b) 3D view on axial plane. Figure 1.1: A neuronal fiber bundle across the genu of corpus callosum is created based on the C-S (2015) method and is shown on the axial plane in (a) 2D and (b) 3D views. 10 Figure 1.1 reveals the tracing of a fiber bundle across the genu of the corpus callosum by the method described in C-S (2015), which contains thick axonal fibers connecting the two cerebral hemispheres and enables the communication between them. The branches are shown in magenta and cyan colors. The blue region consists of 95% confidence ellipsoids surrounding the estimated curve, which are obtained using the asymptotics of the integral curve estimators of the fibers. Therefore, the C-S (2015) estimation method can provide a measure of uncertainty surrounding the estimated curve. In the next sections in this chapter we will present some key elements from C-S (2015). 1.4 Model Let S(x, g) denote the relative amount of water diffusion along a spatial direction g ∈ R3, (cid:107)g(cid:107) = 1 and at a location (also called a voxel) x. ¨Ozarslan and Mareci (2003) and Descoteaux et. al. (2006) have proposed a model for HARDI using a super-symmetric tensor D of order M (even) and rank R ≥ 2 by the following equation: (cid:18) S(x, g) (cid:19) S0(x) log = −c d(cid:88) d(cid:88) i1=1 iM =1 . . . Di1...iM (x)gi1 . . . giM + σ(x, g)ξg, (1.4.1) where S0(x) is the amount of water diffusion without any magnetic field gradient; σ(x, g) > 0; ξg describes the noise, and the constant c depends on several factors involved into the imaging procedure, see Carmichael and Sakhanenko (2015) for more details. Depending on the type of imaging the dimension d of the location x would either be 2 or 3. Similar construction of DT-MRI model can also be found in Ying et. al. (2007), where the authors described the concept of super-symmetric tensors in detail. 11 At any fixed location x the log-losses log are stacked into the vector Y (x) given by (cid:18)S(x, g) (cid:19) S0(x) of signal along N directions g1, ..., gN Y (x) = BD(x) + Σ1/2(x)Ξx, (1.4.2) where D is a vector representation of the super-symmetric tensor D and the matrix B is constructed out of vectors g1, ..., gN . Therefore, given a set of points x1, . . . , xn in a bounded open convex subset G of Rd, one observes Yi = BD(xi) + Σ1/2(xi)ξi, (1.4.3) where JM = (M + 1)(M + 2)/2 and N = JM m for some m ≥ 1, B ∈ RN×JM , Yi, ξi ∈ RN and Σ(Xi) is a N × N symmetric positive definite matrix. R(cid:80) A super-symmetric tensor D of the rank R and the order M (even) can be represented vr ⊗ . . . ⊗ vr for some v1, . . . , vR ∈ Rd, where the notation u ⊗ w means the by D = outer product of vectors u, w ∈ Rd, which is simply a 2D tensor with the components (cid:123)(cid:122) (cid:125) (u ⊗ w)ij = uiwj for i, j = 1, . . . , d. Also we will use v⊗M = v ⊗ . . . ⊗ v , v ∈ Rd, as an (cid:124) r=1 abbreviated notation for tensor products. Then by definition for all r = 1, . . . , R the pair M times λ(r), v(r) minimizes the Frobenius norm d(cid:88) . . . d(cid:88) (cid:16) (cid:17)2 , D (r) i1...iM − λvi1 . . . viM iM =1 i1=1 D(r) = D(r−1) − λ(r−1)(v(r−1))⊗M , D(1) = D. (1.4.4) The quantities λ(1), . . . , λ(R) and v(1), . . . 
v(R) are called the pseudo-eigenvalues and pseudo- 12 eigenvectors of the tensor D respectively; see Ying et. al. (2007) for more details. Define the integral curves arising out of the differential equations involving the pseudo- eigenvectors for r = 1, . . . , R as dx(r)(t) dt = v(r)(x(r)(t)), t ≥ 0, x(r)(0) = a ∈ G. (1.4.5) Under the HARDI model these integral curves serve as models of axonal fibers inside a human brain. 1.4.1 Assumptions The key assumptions of the estimation process in Carmichael and Sakhanenko (2015) are: (A1) G is a bounded open set in Rd with Lebesgue measure 1. It contains the support of the twice continuously differentiable everywhere, super-symmetric tensor field D : Rd (cid:55)−→ of even order M > 2 and rank 1 ≤ R ≤ (M + 2)/2. For a vector v and a tensor RdM D define the matrix-valued function T : Rd × RdM (cid:55)−→ Rd2 as d(cid:88) d(cid:88) T (v, D)km := (M − 1) . . . Dkmi3...iM vi3 . . . viM , k, m = 1, . . . d. (1.4.1.1) i3=1 iM =1 Then assume that Ker(T (v(r), D(r)) − λ(r)I) = 0 everywhere in the support of D for r = 1, . . . , R, where Ker(T ) stands for the kernel of the linear map T i.e. the space of all vectors that are zero under T . (A2) The initial point a lies inside the support of D(·). (A3) There exists a number τ > 0 such that for all t1, t2 ∈ (0, τ ) with t1 (cid:54)= t2, x(r)(t1) (cid:54)= x(r)(t2) for all r = 1, . . . , R. 13 (A4) Locations {Xj, j ≥ 1} are independent and uniformly distributed in G. (A5) The observed data {(Xj, Y (Xj)), j = 1, . . . , n}, obeys the model given in equation (1.4.3) with a fixed non-random known real-valued N × JM matrix B, an unknown symmetric positive definite N × N tensor field Σ : Rd (cid:55)−→ RN 2 continuous on G , un- observable random N -component vectors Ξj, j = 1, . . . , n, respectively. Additionally, it is assumed that BT B is invertible and EΣ4 kl(X1) < ∞, 1 ≤ k, l ≤ N . (A6) The random measurement error vectors Ξj, j ≥ 1, are i.i.d and independent of loca- tions. Also, EΞ1 = 0 and EΞ1,kΞ1,l = δkl for all 1 ≤ k, l ≤ N , where δkl is the Kronecker delta. (A7) The kernel K is non-negative and twice continuously differentiable on its bounded support. Moreover, (cid:82) K(x)dx = 1,(cid:82) xK(x)dx = 0. Rd Rd (A8) The bandwidth hn satisfies the condition nhd+3 n → β > 0 as n → ∞, where β is a known fixed number. 1.4.2 Estimation The estimation of the integral curves was proposed by Carmichael and Sakhanenko (2015) in the following steps: 14 Algorithm 1: Estimation of Integral Curves Input: B, Y, Xj, K j = 1(1)n Output: ˆX (r) n (t) 1 Estimate D at locations Xj, using the ordinary LSE defined by ˜D(Xj) = (BT B)−1BT Y (Xj), j = 1(1)n or the weighted LSE given by ˜D(Xj) = (BT Σ−1(Xj)B)−1BT Σ−1(Xj)Y (Xj), j = 1(1)n and since Σ is generally unknown, this relationship can be iterated, see Zhu et. al. (2007,2009). 2 Estimate D at every other location x ∈ G using the kernel smoothing estimator ˆD at locations in-between the measurement locations Xj using (cid:18) x − Xj (cid:19) n(cid:88) j=1 ˆDn(x) = 1 nhd n K hn ˜D(Xj), (1.4.2.1) where K is a kernel function and hn is a bandwidth. 3 Estimate v(r)(x), r = 1(1)R using the iteration method described in (1.4.4) for any x ∈ G. Thus, for all r = 1(1)R we get pseudo-eigenvalues ˆλ(r) n (x) and pseudo- eigenvectors ˆv(r) 4 Finally, estimate x(r)(t), t ∈ [0, τ ], r = 1(1)R, using the solution of the ODE given by an n (x). estimator of the integral curve x(r)(t), t ∈ [0, τ ], dˆx(r) n (t) dt = ˆv(r) n (ˆx(r) n (t)), t ≥ 0, ˆx(r) n (0) = a. 
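To make the four steps above concrete, here is a minimal end-to-end sketch in Python. It is not the implementation of Carmichael and Sakhanenko (2015): for brevity the field has a single rank-one component (R = 1) of order M = 4 in d = 2 dimensions, the design matrix B is taken to be the identity so that the least-squares fit in step 1 returns the raw observations, step 3 is replaced by a plain higher-order power iteration, and the Gaussian kernel (rather than the compactly supported kernel required by (A7)), bandwidth, noise level and step size are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
d, M = 2, 4                      # spatial dimension and (even) tensor order
n, h, delta = 4000, 0.08, 0.02   # sample size, bandwidth and step size (illustrative)

def outer_power(v, M):
    # rank-one super-symmetric tensor v (x) ... (x) v as a full d^M array
    T = v
    for _ in range(M - 1):
        T = np.multiply.outer(T, v)
    return T

def v_true(x):
    # a smooth unit vector field whose integral curves bend (a C-like pattern)
    w = np.array([-(x[1] - 0.5), x[0] - 0.5])
    return w / np.linalg.norm(w)

# Step 1 (simplified): per-location tensors observed with additive noise; with B = I
# the ordinary least-squares estimate D~(X_j) is just the observation itself.
X = rng.uniform(0.0, 1.0, size=(n, d))
D_tilde = np.stack([outer_power(v_true(x), M) + 0.05 * rng.standard_normal((d,) * M)
                    for x in X])

def D_hat(x):
    # Step 2: kernel smoothing as in (1.4.2.1), here with a Gaussian kernel
    u = (x - X) / h
    w = np.exp(-0.5 * np.sum(u ** 2, axis=1)) / (2 * np.pi) ** (d / 2)
    return np.tensordot(w, D_tilde, axes=(0, 0)) / (n * h ** d)

def leading_vector(D, iters=50):
    # Step 3 (simplified): higher-order power iteration for the leading pseudo-eigenvector
    v = np.ones(d) / np.sqrt(d)
    for _ in range(iters):
        v_new = np.einsum('ijkl,j,k,l->i', D, v, v, v)
        v = v_new / np.linalg.norm(v_new)
    return v

# Step 4: Euler integration of dx/dt = v_hat(x) from the seed point a
x, prev, path = np.array([0.5, 0.2]), None, []
for _ in range(30):
    path.append(x.copy())
    v = leading_vector(D_hat(x))
    if prev is not None and v @ prev < 0:   # keep a consistent orientation along the trace
        v = -v
    prev = v
    x = x + delta * v
print(np.array(path))

With a smooth field and moderate noise the traced polyline stays close to the integral curve of v_true; the full method additionally uses the weighted LSE of step 1, the deflation (1.4.4) to handle R ≥ 2 crossing components, and a bandwidth obeying condition (A8).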
The details and implementations of this algorithm in a simulation study and in a real data analysis can be found in Carmichael and Sakhanenko (2015).

1.4.3 Asymptotic Distribution

Under the assumptions (A1)–(A8), Carmichael and Sakhanenko (2015) established that, as n → ∞,

sqrt(n h_n^{d−1}) ( x̂_n^{(r)}(t) − x^{(r)}(t) ) →_d G(t),  t ∈ [0, τ],  for all r = 1, . . . , R,    (1.4.3.1)

where G(t) is a Gaussian process that depends on D, K, x^{(r)} and β > 0, the latter being the tuning constant from condition (A8). This result indicates that there exist asymptotically normal estimators of the respective integral curves with a convergence rate of O(n^{2/(d+3)}). The goal of this chapter is to prove that this rate is optimal in the minimax sense under some appropriate loss. Theorem 1 in the present chapter establishes the minimax rate optimality of the estimators of the integral curves in Carmichael and Sakhanenko (2015).

Another result is described in Carmichael and Sakhanenko (2015): for a fixed r = 1, . . . , R there exists a unique point τ_r ∈ (0, τ) for which the sequence

sqrt(n h_n^{d−1}) [ min_{t∈[0,τ]} |x̂_n^{(r)}(t) − z|^2 − |x^{(r)}(τ_r) − z|^2 ]    (1.4.3.2)

converges to a non-degenerate distribution. The quantity described in (1.4.3.2) represents the minimum L2 distance between the estimated integral curve and a point z. In fact a more general result holds. Let Γ be a closed subset of G, and let d(x, y) be a distance between x and y in R^d. Now define d(x, Γ) = inf_{y∈Γ} d(x, y). Then for a strictly increasing function m defined on R+ (for example m(u) = u^2 or m(u) = u, u > 0), let ϕ(x) = m(d(x, Γ)). Subsequently, the sequence (1.4.3.2) can be generalized to

sqrt(n h_n^{d−1}) [ min_{t∈[0,τ]} ϕ( x̂_n^{(r)}(t) ) − min_{t∈[0,τ]} ϕ( x^{(r)}(t) ) ],  r = 1, . . . , R.    (1.4.3.3)

Theorem 2 in this chapter ensures the optimal rate of convergence of (1.4.3.3) in the minimax sense for each r = 1, . . . , R. Hence, it will guarantee that the tests of whether a fiber reaches a region, based on the statistics min_{t∈[0,τ]} ϕ( x̂_n^{(r)}(t) ), have minimax-optimal rates under appropriate loss functions.
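For later reference we record the short calculation (ours, added for convenience) that converts the bandwidth condition (A8) into the rate quoted above:

n h_n^{d+3} → β  implies  h_n ≈ (β/n)^{1/(d+3)},  so that
(n h_n^{d−1})^{1/2} ≈ ( n (β/n)^{(d−1)/(d+3)} )^{1/2} = β^{(d−1)/(2(d+3))} n^{(1/2)(1 − (d−1)/(d+3))} = β^{(d−1)/(2(d+3))} n^{2/(d+3)}.

Thus the normalization sqrt(n h_n^{d−1}) in (1.4.3.1) grows exactly at the rate n^{2/(d+3)}; for a 3D image (d = 3) the estimation error is of order n^{−1/3}, which is the theoretical value −1/3 used for log ||X̂_n(t_0) − x(t_0)||/log n in the simulation study of Section 2.3.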
While Cator (2011) has studied minimax lower bounds in the nonparametric estimation problem of a monotone regression (or density) function, Guntuboyina (2011) has provided a more generalized framework for the study of minimax lower bounds with an extensive theoretical 18 foundation in semi- and nonparametric estimation problems. Thus, given the problem of accurately estimating neural fibers we believe that minimax lower bounds for the integral curve estimators of fibers will play an important role. Sakhanenko (2012) in her work established the minimax lower bounds for the asymptotic risk of the estimators of the integral curve under a simpler model where imaging signals were modeled by a vector field perturbed by an additive noise. That work does not ensure the point-wise convergence rate of the asymptotic risk of integral curve estimators are optimal in the minimax sense when we see fiber patterns that cross each other, converge, diverge, ‘fan out’ or ‘kiss’. To assess such situations we will consider the HARDI model, and we will establish minimax lower bounds for the asymptotic risk of the fiber estimators there. Addi- tionally, our construction relaxes the orthogonality of axonal fiber tracts, a key consequence of the vector model by Sakhanenko (2012), hence the present work will generalize the results of that paper. The rest of the chapter is organized as follows. In Section 2 we will introduce the theo- rems, notations, assumptions along with the construction of the parametric subclass inside the general tensor class, in order to establish the minimax optimal rates of the asymptotic risk of the fiber estimators. Section 3 will illustrate our theory via a simulation example. The concluding remarks will be provided in Section 4. All the detailed proofs of our results along with the necessary lemmas will be described in Section 5 of this chapter. 2.1.1 Notations Throughout our work the following notations are used. For x ∈ Rd, xl represents its l-th coordinate, l = 1, . . . , d, | · | denotes the absolute value of a real number, (cid:107)·(cid:107) represents the standard Euclidean norm of a vector, and for any matrix A, (cid:107)A(cid:107)2 F = T r(AAT ) denotes 19 the matrix norm (or otherwise known as Frobenius norm). Similarly, for a tensor D, we define (cid:107)D(cid:107)2 , see De Lathauwer et. al. (2000) for reference. Also D2 . . . d(cid:80) d(cid:80) F := d(cid:80) i1=1 iM =1 i1...iM (cid:104)x, y(cid:105) = xiyi. Next, for a function f : Rd (cid:55)−→ R, ∇f denotes the gradient and ∇2f denotes the tensor (matrix) field of the second order partial derivatives of f. Finally, Det(·) i=1 denotes the determinant value of the corresponding matrix. 2.2 Assumptions and main results In this section the general results regarding minimax optimality of the estimators of the integral curves defined in (1.4.5), are stated along with additional notations and assumptions. 2.2.1 Assumptions We require a slight modification of the assumptions introduced in the previous chapter, in order to propose the results for minimax optimality of the integral curve estimators. (A9) Conditions (A1)–(A6) hold as described in the previous chapter. (A10) Noise variables {ξi : i = 1, 2, . . .} are independent and identically distributed with a common density function f : RN (cid:55)−→ R+ independent of {Xi : i = 1, 2, . . .}. (A11) The function (A12) (cid:82)(cid:13)(cid:13)∇2√ f (y)(cid:13)(cid:13)2 √ f is twice differentiable everywhere in RN . dy < ∞ and(cid:82) f (y)(cid:107)∇ log f (y)(cid:107)4dy < ∞. 
(A13) In addition to (A5), for all x ∈ G the matrix Σ(x) is finite positive definite such that its smallest eigenvalue µmin(x) ≥ µ > 0. 20 Note that since f is a density function, assumption (A9) guarantees that there exists an open set S ∈ RN , where ∇√ (cid:82) f (y)(cid:107)∇ log f (y)(cid:107)αdy < ∞ for any α ∈ [0, 4] by means of H¨older’s inequality. Assumption dition (F3) in Sakhanenko (2012), which along with the fact that f is a density, implies f is not zero. Assumption (A12) is similar to the con- (A13) on the scaling matrix is not very restrictive, and it allows to bound any arbitrary finite power of Σ−1 matrix within the set G in its matrix norm. Throughout this chapter, the following classes are considered. For a fixed τ > 0 let D2(a, G, τ ) be the class of super-symmetric tensor fields D which satisfy conditions (A1)– (A3). Let W be the class of all non-trivial even functions w : R (cid:55)−→ R+ that are 0 at 0, and whose subgraphs are convex, see Ibragimov and Khasminskii (2013). Examples of such functions could be u2,|u|, I(|u| > c). Let En(τ ) denote the class of all possible estimators of the integral curves x(r)(t), r = 1, . . . , R, t ∈ [0, τ ], based on the data. Theorem 2. Let τ > 0, 1 ≤ R ≤ (M + 2)/2. Suppose the assumptions (A9) – (A13) hold. Then for any point a ∈ G, any t0 ∈ (0, τ ], any function w ∈ W and any unit vectors e1, . . . , eR ∈ Rd the following holds (cid:104) ˆX (1) n (t0) − x(1) n (t0), e1(cid:105) ... (cid:104) ˆX (R) n (t0) − x(R) n (t0), eR(cid:105) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   > 0. lim inf n→∞ inf n ∈En(τ ) n ,..., ˆX (R) ˆX (1) sup D∈D2(a,G,τ ) Ew n2/(d+3) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  The result simply means that the integral curve estimator ˆX (r) n (t0), r = 1(1)R, minimizes the maximum risk among all the estimators inside the class D2(a, G, τ ) at any point t0 ∈ (0, τ ]. In calculation of the asymptotic risk we use the Euclidean norm of error projections appropriately scaled by the asymptotic rate of convergence n2/(d+3). It is interesting to note that the errors in estimation ˆX (r) n (t0) − x (r) n (t0), r = 1(1)R are projected on the directions 21 given by unit vectors e1, . . . , eR that are not needed to be orthogonal. In our proof we would use a tensor field of rank R = 2, but our result can be generalized for any tensor field of rank R ≥ 2 in a similar fashion. Next, in order to state Theorem 2, let us define for a fixed τ > 0 the class Fn(τ ) of all (cid:16) (cid:17) possible estimators of inf t∈[0,τ ] ϕ (r) n (t) x based on the data. Theorem 3. Suppose the assumptions (A9) – (A13) hold. Then for any closed subset Γ ⊂ G, any point a ∈ G\Γ such that ϕ is continuously differentiable at a with ∇ϕ(a) (cid:54)= 0, for any τ > 0 and any function w ∈ W we have lim inf n→∞ inf n ∈Fn(τ ) ˆF (1) n ,..., ˆF (R) sup D∈D2(a,G,τ ) Ew n2/(d+3) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  n − inf ˆF (1) t∈[0,τ ] ... n − inf ˆF (R) t∈[0,τ ] (cid:16) (cid:16) ϕ ϕ x(1) n (t) x(R) n (t) (cid:17) (cid:17) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   > 0. Theorem 3 in this chapter corresponds to the connectivity test described in Corollary 1 of Theorem 1 in C-S (2015). 
The connectivity of brain fibers refers to a question of whether a curve starting at a given initial point a travels through a region of interest z. For example, there is a C pattern across the genu of Corpus Callosum that connects the (cid:16) (cid:17) the functional inf t∈[0,τ ] ϕ (r) n (t) x left and right lobes in a human brain as on Figure 1. In such situations the estimator of , r = 1(1)R constructed using the integral curve estimator ˆX (r) n (t0), r = 1(1)R, is also minimax, which means that it minimizes the supremum of the error in estimation among all other estimators uniformly, with respect to the class of tensors D2(a, G, τ ) under any arbitrary loss function w ∈ W. Next, we describe the parametric subclass construction, which will be pivotal for estab- lishing the two theorems that we have described above. 22 2.2.2 Parametric subclass of tensors The main idea of the proofs is to bound the supremum over D2(a, G, τ ) from below by a supremum over a suitable parametric subclass and connect the deviation between estimated integral curve and the true integral curve with the deviation between the parameter and its estimator. Then we will apply H´ajek’s Lemma that provides the positive lower bound for the parameter estimators for the parameters inside the subclass. Thus, in this section we construct the parametric subclass. Without loss of generality assume R = 2. For the parameters θ1, θ2 ∈ R we select numbers λ1 > λ2 > 0 and vector- 0 , g : Rd (cid:55)−→ Rd in a special way described after the tensor class construction. as its i1, .., iM -th Recall that for a vector v the super-symmetric tensor v⊗M has vi1 (1) 0 , v fields v . . . viM (2) component. Define (1) (1) (2) (2) n (x, θ2)⊗M n (x, θ1)⊗M + λ2v 0 (x) + θ1n−αgn(x))⊗M} + λ2{(v 0 (x)⊗M + θ1n−αv 0 (x)⊗M + θ2n−αv Dn(x, θ) = λ1v = λ1{(v = λ1{v + λ2{v = D0(x) + θ1n−αD1(x) + θ2n−αD2(x) + n−2αM + n−2αM 0 (x)⊗M−1 ⊗ gn(x) + n−2αM 0 (x)⊗M−1 ⊗ gn(x) + n−2αM 0 (x) + θ2n−αgn(x))⊗M} n (x, θ1)} n (x, θ2)} (1) n (x, θ1) (2) n (x, θ2), (1) (2) (1) (2) (1) (2) (2.2.2.1) where gn(x) = g(x1, nγx2, ..., nγxd), γ > 0, (2.2.2.2) 23 D0(x) = λ1v D1(x) = λ1v D2(x) = λ2v M (1) n (x, θ1) = λ1 M (2) n (x, θ2) = λ2 (2) 0 (x)⊗M , (1) (1) (2) 0 (x)⊗M + λ2v 0 (x)⊗M−1 ⊗ gn(x), 0 (x)⊗M−1 ⊗ gn(x), M(cid:88) 0 (x)⊗M−i ⊗ (θi M(cid:88) 0 (x)⊗M−i ⊗ (θi i=2 (1) v (2) v 1n−iαgn(x)⊗i), 2n−iαgn(x)⊗i). (2.2.2.3) i=2 It is evident that the pseudo-eigenvectors of Dn are (1) n (x, θ1) = v v (2) n (x, θ2) = v v (1) 0 (x) + θ1n−αgn(x), 0 (x) + θ2n−αgn(x). (2) In addition to the assumptions made in Section 2.2, we assume the following for the pseudo- eigenvalues and pseudo-eigenvectors in the parametric subclass of the tensors. (A14) Numbers α, γ > 0 are chosen so that 1 − γ(d − 1) − 2α = 0, α = 2/(d + 3). (A15) The numbers λ1, λ2 representing pseudo-eigenvalues of the tensor Dn are such that 0 (x) : G (cid:55)−→ Rd are bounded and continuous 0 (x)} has zero Lebesgue measure, (1) 0 (x) = cv (1) λ1 > λ2 > 0. The vector fields v 0 (x), v such that for any c ∈ R the set {x : v (2) (2) 24 and the following inequality holds (cid:32) (M − 1)qM−2(x)λ2 λ1 1 + (cid:20)(M − 1)q2(x) (M − 2) (cid:21)(cid:33) − 1 (cid:54)= 0, where q(x) = (cid:104)v (1) 0 (x), v (2) 0 (x)(cid:105). Additionally, we assume the integral curves associated with v (i) 0 , i = 1, 2, satisfy con- dition (A3). (A16) The constants θ1, θ2 ∈ R are bounded. 
The function g : G (cid:55)−→ Rd is such that (cid:107)g(cid:107)M is L4 integrable on Rd and the following relationships hold (cid:107)v (i) 0 (x) + θ1n−αgn(x)(cid:107) = 1, for all x ∈ G, i = 1, 2. Note that (A14) boils down to γ = (1 − 2α)/(d − 1) = 1/(d + 3) = α/2. Also note that (1) 0 (x), v v (2) 0 (x) and g have the support G. Therefore, all the tensor fields also have the support G, thus (A2) is trivially satisfied for Dn. Moreover, (A16) is quite natural since the (i) perturbations from v 0 are assumed to be small. Finally, it is easy to note that if q = 0 then it implies that v (1) 0 (x) and v (2) 0 (x) are orthogonal, and hence, (A15) will be trivially satisfied.Therefore, the class containing v (1) 0 (x), v (2) 0 (x) and g satisfying (A15) and (A16) is fairly large. 25 2.3 Experiment We begin this section by introducing the imaging protocol behind HARDI. Figure 1.1 in the introduction was obtained from a Diffusion Weighted Imaging (DWI) dataset that was collected from a 26-year-old healthy male brain on a GE 3T Signa HDx MR scanner (GE Healthcare, Waukesha, WI) with an 8-channel head coil. The subject signed the consent form approved by the Michigan State University Institutional Review Board. DWI images were acquired with a spin-echo echo-planar imaging (EPI) sequence for 13 minutes with the following parameters: 48 contiguous 2.4-mm axial slices in an interleaved order, FOV = 22 cm × 22 cm, matrix size = 128 × 128, number of excitations (NEX) = 1, TE = 72.3 ms, TR = 11.5 s, 60 diffusion-weighted volumes (one per gradient direction) with b = 1000 s/mm2, 6 volumes with b = 0 and parallel imaging acceleration factor = 2. The seed point is chosen in the corpus callosum (CC), which contains thick axonal fibers connecting the two cerebral hemispheres and enabling the communication between them. The general anatomical locations of these axonal fibers are well established. These fibers are often used to evaluate new techniques in fiber tractography. On Figure 1.1 a fiber in the anterior part of the CC, called the genu of CC, is constructed with δ = 0.003, β = 10−7. The branches are shown in magenta and cyan colors, they were traced for 70 and 50 steps. According to Theorem 1 log (cid:107) ˆXn(t0) − x(t0)(cid:107)/ log n is asymptotically close to −2/(d+3), which is −1/3 in case of 3D image. Even though the general anatomical locations of some fibers are well studied, the exact true fibers x(t0) are not quite known. Also the same subject is usually not scanned repeatedly for 100 times to assess the estimation error empirically. That is why we will illustrate our results via an artificial simulation study to trace some 26 typical patterns of fibers when the underlying truth is known mathematically. We consider the 2 typical patterns for axonal fibers: C-pattern and Y-pattern. We use the same design of these patterns as described in Sakhanenko and DeLaura (2017). For each pattern the tensor field was generated perturbed by a normal noise. Then a random sample of n0 × n0 × n0 observations on a regular grid was taken, and a fiber was estimated. The distance (cid:107) ˆXn(t0) − x(t0)(cid:107) at the endpoint was computed. The procedure was repeated 100 times independently. For Y-pattern the endpoint t0 is chosen on the main branch. We used the fiber thickness 0.04 and signal-to-noise ratio of 5 in the setup of Sakhanenko and DeLaura (2017). The tuning parameter β was 0.0001. 
The results are presented in Figures 2.1 and 2.2 as boxplots of the 100 values of log ||X̂_n(t_0) − x(t_0)||/log n for each sample size n = n_0^3.

Figure 2.1: Boxplot of log ||X̂_n(t_0) − x(t_0)||/log n for the C-pattern. We traced each fiber for 30 steps of size δ = 0.02. 100 independent samples of n = n_0^3 observations are used to create each boxplot. The labels on the x-axis mark n_0. The theoretical value is −1/3. The location t_0 is the endpoint of the trace.

Figure 2.2: Boxplot of log ||X̂_n(t_0) − x(t_0)||/log n for the Y-pattern. We traced each fiber for 30 steps of size δ = 0.02. 100 independent samples of n = n_0^3 observations are used to create each boxplot. The labels on the x-axis mark n_0. The theoretical value is −1/3. We trace the main branch.

The boxplots of the values log ||X̂_n(t_0) − x(t_0)||/log n hover around the theoretical value −1/3 for all sample sizes and both patterns. They are uniformly closer to the theoretical value for the C-pattern than for the Y-pattern, which is expected due to branching.

2.4 Remarks and Discussion

In the construction of the proof we have assumed that the order M of the tensor D can be any even number. Recent developments in the research have shown cases where M could be taken as 4 or 6. In general d is always 2 or 3, depending on the imaging technique. Therefore, if order M = 4 tensors are studied then the rank of the tensor can be at most R = 3, see (A1). In that situation the proofs of Theorems 2 and 3 will follow along the lines of Sakhanenko (2012). The idea comes from the fact that v_n^{(1)}(x, θ_1) and v_n^{(2)}(x, θ_2) can be chosen orthogonal to each other. Thus, v_n^{(r)}(x, θ_r), 1 ≤ r ≤ 3, can be chosen to be unit vector fields along the coordinate axes, and therefore the orthogonality can be utilized in the proof. But for the
We will then use H´ajek’s lemma (lemma 7) for the vector of parameters (θ1, θ2) to establish the lower bound for the asymptotic risk of the integral curve estimators in D2(a, G, τ ), see e.g. the (i) (i) n (t0) − x book by Ibragimov and Khasminskii (2013). But Hajek’s lemma requires the family of 29 densities derived in Lemma 2 to be so-called locally asymptotically normal (LAN) with well- defined limit of information matrices that are defined in Lemma 3. Lemmas 4 and 5 check LAN condition, meanwhile Lemma 5 and Lemma 6 establish conditions required for H´ajek’s lemma. In particular, Lemma 6 proves Lyapunov’s condition. 2.5.1 Constructed parametric subclass The representation (2.2.2.1) is the parametric construction of the tensor field Dn. In order to establish the minimax lower bound this construction will be followed throughout this chapter. Below the first Lemma is presented which ensures that the construction as proposed above satisfies the assumption (A1) that has been stated in the introduction, see Sakhanenko et. al. (2015) for reference. Recall (1) n (x, θ1) = v v (2) n (x, θ2) = v v (1) 0 (x) + θ1n−αgn(x), 0 (x) + θ2n−αgn(x). (2) For the ease of notation define the following expressions D (2) n (θ2) = λ2v D (1) n (θ1) = λ1v (2) n (x, θ2)⊗M n (x, θ1)⊗M + λ2v (1) (2) n (x, θ2)⊗M (2.5.1.1a) (2.5.1.1b) (2.5.1.2) (2.5.1.3) 30 T (v (i) n (x, θi), D (i) n (θi))km d(cid:88) d(cid:88) D (i) (n)kmi3...id (θi)v (i) (n)i3 (x, θi) . . . v (i) (n)iM (x, θi), (2.5.1.4) =(M − 1) . . . i3=1 iM =1 where gn(x) is defined in equation (2.2.2.2). It is also assumed that for i = 1, 2: (cid:107) v (i) n (x, θi) (cid:107)= 1, subsequently (cid:107) v (i) 0 (x) (cid:107)=: ci < 1. (2.5.1.5) Lemma 3. For the tensors D (2) n (θ2) and D (1) n (θ1) the following relations hold for all x ∈ G Ker(T (v Ker(T (v (2) n (x, θ2), D (1) n (x, θ1), D (2) n (θ2)) − λ2I) = 0, n (θ1)) − λ1I) = 0. (1) (2.5.1.6) (2.5.1.7) 31 Proof. Det(T (v (2) n (x, θ2), D d(cid:88) (2) n (θ2)) − λ2I) d(cid:88) =Det (M − 1) . . . (cid:16) i3=1 iM =1 (cid:17) − λ2δkm (cid:16) =Det (M − 1) λ2v d(cid:88) (v (2) (n)iM iM =1 . . . ( =λd =Det((M − 1) λ2 v 2 Det((M − 1) v d(cid:88) 2 {−(M − 1) ( =λd i=1 D (2) (n)kmi3...id (θ2)v (2) (n)i3 (x, θ2) . . . v (2) (n)iM (x, θ2) d(cid:88) i3=1 (v (2) (n)i3 (x, θ2))2) (2.5.1.8) (cid:17) (2) (n)k(x, θ2) v (2) (n)m(x, θ2) ( (x, θ2))2) − λ2δkm (2) (n)k(x, θ2) v (2) n (x, θ2) v (2) (n)m(x, θ2) − λ2I) by the assumption in 2.5.1.5 n (x, θ2)T − I) (2) (2) (n)i(x, θ2))2) + 1} by algebraic manipulation (v = − λd 2(M − 2) (cid:54)= 0 is ensured as long as M > 2. Similarly for equation (2.5.1.7) it can be seen Det(T (v (1) n (x, θ1), D (1) n (θ1)) − λ1I) d(cid:88) d(cid:88) . . . i3=1 iM =1 =Det((M − 1) − λ1δkm). D (1) (n)kmi3...id (θ1)v (1) (n)i3 (x, θ1) . . . v (1) (n)iM (x, θ1) (2.5.1.9) 32 To simplify the expression consider d(cid:88) d(cid:88) i3=1 . . . . . . = d(cid:88) d(cid:88) iM =1 D (cid:16) i3=1 iM =1 (1) (n)kmi3...id (θ1)v (1) (n)i3 (x, θ1) . . . v (1) (n)iM (x, θ1) λ1 v (1) (n)k(x, θ1) v (1) (n)m(x, θ1) (v (1) (n)i3 (x, θ1))2 . . . (v (1) (n)iM (x, θ1))2 + λ2 v (2) (n)k(x, θ2) v (2) (n)m(x, θ2) (v (x, θ1)v (2) (n)i3 (x, θ2)) (1) (n)i3 (cid:17) . . . (v (1) (n)iM (1) (2) (n)iM (x, θ1)v (x, θ2)) (n)m(x, θ1) + λ2 qM−2 n v =λ1 v (1) (n)k(x, θ1) v (2) (n)k(x, θ2) v (2) (n)m(x, θ2), where Denote the matrix d(cid:88) i=1 qn = (1) (n)i(x, θ1)v v (2) (n)i(x, θ2), qM−2 n = q(cid:48) n > 0, since M > 2 is even. A = (M − 1)λ1v (1) n (x, θ1)v (1) n (x, θ1)T − λ1I, which is non-singular by (2.5.1.8). 
Rewrite equation (2.5.1.9) as follows (2.5.1.10) (2.5.1.11) n (θ1)) − λ1I) (1) Det(T (v (1) n (x, θ1), D = Det((M − 1)[λ1 v = Det(A + {(M − 1)λ2q(cid:48) nv (2) n (x, θ2)}v (2) n (x, θ2)T ) (1) n (x, θ1) v (1) n (x, θ1) + λ2 q(cid:48) (2) n (x, θ2) v n v (2) n (x, θ2)] − λ1I) (2.5.1.12) = Det(A)(1 + v (2) n (x, θ2)T A−1(M − 1)λ2q(cid:48) nv (2) n (x, θ2)), 33 where we used the identity Det(A + uvT ) = (1 + vT A−1u)Det(A). Next, we apply the identity to obtain A−1 (B + ceT )−1 = B−1 − B−1ceT B−1 1 + eT B−1c . (2.5.1.13) The expression for A−1 becomes (1) n (x, θ1)}v (1) n (x, θ1)T (− 1 I) λ1 n (x, θ1)} (1) I){(M − 1)λ1v (1) n (x, θ1)T A−1 = − 1 λ1 I − 1 + v (− 1 λ1 (1) I){(M − 1)λ1v n (x, θ1)T (− 1 λ1 (1) n (x, θ1)v 1 − (M − 1) (1) n (x, θ1)v (2 − M )λ1 (1) n (x, θ1)v (M − 2) I − (M − 1)/λ1v I − (M − 1)v (M − 1)v −I + = − 1 λ1 = − 1 λ1 (cid:34) = 1 λ1 (1) n (x, θ1)T (1) n (x, θ1)T (2.5.1.14) (cid:35) . 34 Using above rewrite equation (2.5.1.12) (cid:17) (2) n (x, θ2)T A−1{(M − 1)λ2q(cid:48) nv (M − 1)v (2) n (x, θ2) (cid:105) (1) n (x, θ1)v (M − 2) (1) n (x, θ1)T (cid:35) (2.5.1.15) (cid:16) (cid:32) (cid:32) =Det(A) Det(A) 1 + v =Det(A) 1 + v (cid:2)(M − 1)λ2q(cid:48) (2) n (x, θ2)T 1 λ1 (cid:104) − I + n (x, θ2)(cid:3)(cid:33) (cid:34) (cid:105)(cid:33) (2) n (x, θ2)T + −v nv (2) 1 + (2) n (x, θ2) 1 (cid:104) λ1 (M − 1)λ2q(cid:48) (cid:20) nv (cid:34) −(M − 1)λ2q(cid:48) n + −(M − 1)λ2qM−2 1 λ1 1 λ1 (M − 1)qM−2 (cid:32) n λ2 =Det(A)(1 + =Det(A)(1 + =Det(A) 1 + n λ1 qn(M − 1)v (1) n (x, θ1)T (M − 2) n (cid:21) (M − 2) ) λ2(M − 1)2qM n(M − 1)2λ2q(cid:48) q2 (cid:21)(cid:33) (M − 2) (cid:20)(M − 1)q2 + n n − 1 (M − 2) (cid:54)= 0. (cid:35) ) (1) 0 (x) and v (2) Note if g, v 0 (x) are orthogonal to each other then the inequality in (2.5.1.15) holds for any n provided R ≤ d − 1. In general, a close investigation of the quantity given 0 (x)(cid:105) + o(1) and together with by qn = (cid:104)v n (x, θ2)(cid:105) reveals that qn = (cid:104)v (1) n (x, θ1), v (1) 0 (x), v (2) (2) (A13), it implies that the above inequality holds for all n large enough. Thus, this Lemma shows that the parametric reconstruction of the tensor subclass is compatible with (A1) and can be used for investigation of asymptotic lower bounds for the estimation of the integral curve. 35 2.5.2 Main Lemmas The next section will use the ideas and techniques mentioned in Ibragimov and Khasminskii (2013) in order to utilize it further in the proofs of the two Main Theorems. The lower bound of rates of convergence for the difference in parameter and its estimate are obtained from the results for LAN families of distributions. A well known result due to H´ajek (1972) will be presented in this subsection. Lemma 4 will introduce the density function for the subclass model. Lemma 5 and 6 would imply that the estimation problem inside parameter subclass yields a regular experiment. Lemma 5, 7, and 8 would imply the conditions of the Theorem II.3.1’ in Ibragimov and Khasminskii (2013), which would further imply that the family of distributions in (2.5.2.2) is LAN. Note that lemma 9 is somewhat similar to the formula II.12.19 in Ibragimov and Khasminskii (2013), which also known as the Lemma due to H´ajek (1972). Consider a typical observation under the parametric subclass. It would be written as (cid:32) (cid:16) (xi, yi) = xi, B D0(xi) + θ1n−αD1(xi) + θ2n−αD2(xi) + n−2αM (cid:17) (cid:33) (1) n (x, θ1) (2.5.2.1) + Σ1/2(xi)ξi , i = 1, . . . , n. + n−2αM (2) n (x, θ2) Lemma 4. 
Then the function given by: φn(x, y, θ1, θ2) = f (Σ−1/2(x)(y − B(D0(x) + θ1n−αD1(x) + n−2αM + θ2n−αD2(x) + n−2αM (1) n (x, θ1) n (x, θ2))))Det(Σ−1/2(x))I(x  G) (2) (2.5.2.2) represents the density for (X, Y ). Proof. Note that φn(x, y, θ1, θ2) ≥ 0. To check that (cid:82)(cid:82) φn(x, y, θ1, θ2)dxdy = 1 let us 36 consider the following: (cid:90) (cid:90) f (Σ−1/2(x)(y − B(D0(x) + θ1n−αD1(x) + n−2αM (1) n (x, θ1) + θ2n−αD2(x) + n−2αM (2) n (x, θ2))))Det(Σ−1/2(x))I(x  G). (2.5.2.3) Apply the transformation (cid:16) ˜y = Σ−1/2(x) y − B(cid:0)D0(x) + θ1n−αD1(x) + n−2αM (1) n (x, θ1) n (x, θ2)(cid:1)(cid:17) (2) + θ2n−αD2(x) + n−2αM ˜x = x, , (2.5.2.4) where the Jacobian matrix is given by ∂x ∂ ˜x ∂y ∂ ˜x Id  = ∂x ∂ ˜y ∂y ∂ ˜y J = ∂x ∂ ˜y 0 Σ−1/2(˜x)  . (2.5.2.5) Therefore, | Det(J) |=| Det(Σ1/2(˜x)) |, and hence the following is obtained (cid:90) (cid:90) f (˜y) | Det(Σ−1/2(˜x)) || Det(Σ1/2(˜x)) | I(˜x  G)d˜xd˜y = 1. 37 Define the following quantities, which are components of information matrices ∂θ1 ∂θ2 ∂ ln φn φn(x, y, θ1, θ2)dxdy, |∂ ln φn ∂θ1 |∂ ln φn ∂θ2 |2φn(x, y, θ1, θ2)dxdy, |2φn(x, y, θ1, θ2)dxdy, (cid:90) (cid:90) (cid:90) (cid:90) (cid:90) (cid:90) ∂ ln φn | (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD1(x1, 0, . . . , 0)(cid:105) |2 dxdy, (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD1(x1, 0, . . . , 0)(cid:105) (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD2(x1, 0, . . . , 0)(cid:105)dxdy, | (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD2(x1, 0, . . . , 0)(cid:105) |2 dxdy. Ψ11(n) =n Ψ22(n) =n Ψ12(n) =n (cid:90) (cid:90) [c1,c2]×Rd−1 [c1,c2]×Rd−1 (cid:90) (cid:90) (cid:90) RN RN (cid:90) (1) 0 = I (12) 0 = I (2) 0 = I RN [c1,c2]×Rd−1 Lemma 5. Both the matrix I0 as well as Ψ(n) are finite and positive definite, and Ψ11(n) Ψ12(n) Ψ12(n) Ψ22(n)  =  I (1) 0 (12) I 0 (12) I 0 (2) I 0  = I0. n→∞ Ψ(n) = lim lim n→∞ (2.5.2.6) (2.5.2.7) 38 Proof. For x ∈ G we have ln φn = ln f (Σ−1/2(x)(y − B(D0(x) + θ1n−αD1(x) + n−2αM ln|Σ(x)| + θ2n−αD2(x) + n−2αM (2) n (x, θ2)))) − 1 2 (1) n (x, θ1) ⇒ ∂ ln φn ∂θ1 = −f−1(cid:104)∇f, n−αΣ−1/2(x)BD1(x) + n−2αΣ−1/2(x)B∇M = −f−1((cid:104)∇f, n−αΣ−1/2(x)BD1(x)(cid:105) (1) n (x, θ1)(cid:105) (2.5.2.8) + (cid:104)∇f, n−2αΣ−1/2(x)B∇M (1) n (x, θ1)(cid:105)). In the above equation the argument of f is suppressed. Similarly, ∂ ln φn ∂θ2 = −f−1((cid:104)∇f, n−αΣ−1/2(x)BD2(x)(cid:105) + (cid:104)∇f, n−2αΣ−1/2(x)B∇M (2) n (x, θ1)(cid:105)). (2.5.2.9) A simple application of Cauchy Schwarz Inequality reveals that Ψ(n) is positive definite except for the case where (x, y) : ∂ ln φn ∂θ1 = k ∂ ln φn ∂θ2 , k ∈ R. (2.5.2.10) A careful term by term comparison of (2.5.2.10) with (2.5.2.8) and (2.5.2.9) shows that the equality holds on the set: {x : v (1) 0 (x) = cv (2) 0 (x), c ∈ R}. (2.5.2.11) By (A15) the Lebesgue measure of this set is 0. Next, a careful investigation of the quantity 39 Ψ11(n) shows: Ψ11(n) = n |2φn(x, y, θ1, θ2)dxdy | ∂ ln φn ∂θ1 f−1|(cid:104)∇f, n−αΣ−1/2(x)BD1(x)(cid:105) (cid:90) (cid:90) (cid:90) (cid:90) = n (cid:90) (cid:90) + (cid:104)∇f, n−2αΣ−1/2(x)B∇M = n + (cid:104)∇f, n−2αΣ−1/2(x)B∇M f−1|(cid:104)∇f, n−αΣ−1/2(x)BD1(x)(cid:105) (1) n (x, θ1)(cid:105)|2dxdy (2.5.2.12) (1) n (x, θ1)(cid:105)|2Det(Σ−1/2(x))I(x ∈ G)dxdy. We consider the following transformation: i = 2, . . . , d & ˜x1 = x1, ˜xi = nγxi ˜y = Σ−1/2(x)(y − B(D0(x) + θ1n−αD1(x) + n−2αM + θ2n−α2D2(x) + n−2αM (2) n (x, θ2))). (1) n (x, θ1) (2.5.2.13) Define ˜Gn = {˜x : x ∈ G}. 
Also since G is a convex set, it can be shown that ˜Gn converges to [c1, c2] × Rd−1. The Jacobian of the transformation is: ∂x ∂ ˜x ∂y ∂ ˜x  =  1 ∂x ∂ ˜y ∂y ∂ ˜y J =  0 0 n−γId−1 ∂x ∂ ˜y  0 Σ1/2(˜x1, n−γ ˜x−1) (2.5.2.14) |Det(J)| = n−γ(d−1)|Det(Σ1/2(˜x1, n−γ ˜x−1))| = n−γ(d−1)Det(Σ1/2(˜x1, n−γ ˜x−1)), where x−1 = (x2, . . . , xd). In what follows it is understood that the functional argument x changes to (˜x, n−γ ˜x−1) and the coordinate system changes from (x, y) to (˜x, ˜y). Also it is 40 understood that the range of integration remains the same for ˜y coordinate, which is RN . However, the range of integration with respect to ˜x changes to ˜Gn, which has been defined above. Now rewrite equation (2.5.2.12) suppressing the arguments ˜y, (˜x, n−γ ˜x−1), θ1, θ2 for the notational simplicity. Ψ11(n) =n1−γ(d−1) (cid:90) (cid:90) (cid:16) | (cid:104)f−1/2∇f, n−αΣ−1/2BD1(cid:105) n (cid:105) |2(cid:17) (1) (cid:90) (cid:90) + (cid:104)f−1/2∇f, n−2αΣ−1/2B∇M (cid:104) n1−γ(d−1)−2α | (cid:104)f−1/2∇f, Σ−1/2BD1(cid:105) |2 d˜xd˜y (cid:90) (cid:90) + n1−γ(d−1)−4α | (cid:104)f−1/2∇f, Σ−1/2B∇M ≤2 (cid:105) n (cid:105) |2 d˜xd˜y (1) . Det(Σ−1/2)Det(Σ1/2)d˜xd˜y Applying C.S. Inequality on the first and second terms in (2.5.2.15) yields | (cid:104)f−1/2∇f, Σ−1/2BD1(cid:105) |2 ≤ (cid:107)f−1/2∇f(cid:107)2(cid:107)Σ−1/2BD1(cid:107)2, n (cid:105) |2 ≤ (cid:107)f−1/2∇f(cid:107)2(cid:107)Σ−1/2B∇M | (cid:104)f−1/2∇f, Σ−1/2B∇M (1) (1) n (cid:107)2. (2.5.2.15) (2.5.2.16) Therefore, equation (2.5.2.15) can be simply written as Ψ11(n) ≤2n1−γ(d−1)−2α(cid:16)(cid:90) (cid:17)(cid:16) (cid:107)f−1/2∇f(cid:107)2d˜y (cid:90) (cid:90) (cid:90) (˜x1,n−γ ˜x−1)∈ ˜Gn (cid:17) (cid:107)Σ−1/2BD1(cid:107)2d˜x (cid:107)Σ−1/2B∇M (1) n (cid:107)2d˜x) + 2n1−γ(d−1)−4α( ≤2n1−γ(d−1)−2α(cid:16)(cid:90) + 2n1−γ(d−1)−4α(cid:16)(cid:90) (cid:107)f−1/2∇f(cid:107)2d˜y)( (cid:17)(cid:16)(cid:90) (cid:17)(cid:16)(cid:90) (˜x1,n−γ ˜x−1)∈ ˜Gn F(cid:107)B(cid:107)2 (cid:107)Σ−1/2(cid:107)2 (cid:107)Σ−1/2(cid:107)2 f(cid:107)∇ log f(cid:107)2d˜y f(cid:107)∇ log f(cid:107)2d˜y F(cid:107)D1(cid:107)2 F(cid:107)B(cid:107)2 F d˜x) n (cid:107)2 F(cid:107)∇M (1) F d˜x (2.5.2.17) (cid:17) , Note that because of (A5) for all x ∈ G, Σ(x) is assumed positive definite and (cid:107)Σ(cid:107)4 F < ∞ 41 and therefore if we consider an eigen-decomposition of Σ such that Σ(x) = Q(x)Λ(x)Q(x)T , where Λ(x) = Diag(µ1(x), . . . , µd(x)) consists of finite eigenvalues bounded away from 0 and ∞ for all x ∈ G, Q(x) is an orthogonal matrix. Then for all (˜x1, n−γ ˜x2, . . . , n−γ ˜xd) ∈ ˜Gn it can be seen that (cid:107)Σ−1/2(˜x1, n−γ ˜x−1)(cid:107)2 F < ∞. Also recall that Di = λi(v vectorize the symmetrized version of Di, which is given by (i) 0 )⊗M−1 ⊗ g for i = 1, 2. In order to bound Di one needs to (cid:16) (i) 0 )⊗M−2 Dsym i = 1 M λi g ⊗ (v (i) (i) 0 )⊗M−1 + v (cid:17) 0 ⊗ g ⊗ . . . ⊗ (v 0 )⊗M−1 ⊗ g(x) (i) , + . . . + (v and therefore, (cid:107)Di(cid:107) ≤ (cid:107)Dsym =⇒ (cid:107)Di(cid:107) ≤ (cid:107)Dsym i i (cid:107) = λi(cid:107)v (cid:107) = λi(cid:107)v (i) 0 (cid:107)M−1(cid:107)g(cid:107) 0 (cid:107)M−1(cid:107)g(cid:107) ∀ (˜x1, n−γ ˜x−1) ∈ ˜Gn. (i) Using a similar type of argument one can construct a symmetrized version of ∇M (i) n by symmetrizing each additive component of ∇M (i) n , and hence it can be concluded that (cid:107)∇M (i) n (cid:107) ≤ (cid:107)(∇M (i) n )sym(cid:107) ≤ λi(a (i) 2 (cid:107)v (i) 0 (cid:107)M−2(cid:107)g(cid:107)2 + . . . + a (i) M(cid:107)g(cid:107)M ), where a (i) 2 , . . . 
, a (i) M > 0 are appropriate constants. Combining these bounds it can be seen 42 that equation (2.5.2.17) becomes Ψ11(n) ≤ 2n1−γ(d−1)−2α(cid:16)(cid:90) + 2n1−γ(d−1)−4α(cid:16)(cid:90) (cid:16) (cid:90) (cid:17) (cid:107)g(˜x)(cid:107)2d˜x (cid:90) ˜Gn Ψ11 1 (cid:17)(cid:16) (cid:17) f(cid:107)∇ log f(cid:107)2d˜y L f(cid:107)∇ log f(cid:107)2d˜y Ψ11 L 2 (1) 2 (cid:107)v (1) 0 (cid:107)M−2(cid:107)g(˜x)(cid:107)2 + . . . + a (a (cid:17) < ∞ (1) M (cid:107)g(˜x)(cid:107)M )d˜x (2.5.2.18) ˜Gn Ψ11 due to assumptions in Section 3, where L 1 , L Ψ11 2 > 0 are appropriate constants. Finally, Ψ11(n) =n1−γ(d−1)−2α +2n1−γ(d−1)−3α (cid:90) (cid:90) (cid:90) (cid:90) | (cid:104)f−1/2∇f (˜y), Σ−1/2((˜x1, n−γ ˜x−1))BD1((˜x1, n−γ ˜x−1))(cid:105) |2 d˜xd˜y | (cid:104)f−1/2∇f (˜y), Σ−1/2((˜x1, n−γ ˜x−1))BD1((˜x1, n−γ ˜x−1))(cid:105) | | (cid:104)f−1/2∇f (˜y), Σ−1/2((˜x1, n−γ ˜x−1))B∇M (1) n ((˜x1, n−γ ˜x−1), θ1)(cid:105) | d˜xd˜y +n1−γ(d−1)−4α (cid:90) (cid:90) =n1−γ(d−1)−2α +o(n−α). (cid:90) (cid:90) | (cid:104)f−1/2∇f (˜y), Σ−1/2((˜x1, n−γ ˜x−1))B∇M (1) | (cid:104)∇(cid:112)f (˜y), Σ−1/2((˜x1, n−γ ˜x−1))BD1((˜x1, n−γ ˜x−1))(cid:105) |2 d˜xd˜y n ((˜x1, n−γ ˜x−1), θ1)(cid:105) |2 d˜xd˜y (2.5.2.19) Using Lebesgue DCT the following is obtained (cid:90) (cid:90) | (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD1(x1, 0, . . . , 0)(cid:105) |2 dxdy lim n→∞ Ψ11(n) = RN [c1,c2]×Rd−1 = I (1) 0 , where D1(x1, 0, . . . , 0) = λ1v (1) 0 (x1, 0, . . . , 0)⊗M−1⊗ g(x1, . . . , xd). In a very similar manner 43 it can be shown that lim n→∞ Ψ12(n) = RN (cid:90) (cid:90) (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD1(x1, 0, . . . , 0)(cid:105) (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD2(x1, 0, . . . , 0)(cid:105)dxdy = I (12) (cid:90) | (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD2(x1, 0, . . . , 0)(cid:105) |2 dxdy [c1,c2]×Rd−1 (cid:90) , 0 lim n→∞ Ψ22(n) = RN [c1,c2]×Rd−1 = I (2) 0 . Also it is very interesting to note that applying CS Inequality yields (12) (I 0 )2 < I (1) 0 I (2) 0 . The equality case can be ignored from the fact that equality happens iff (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD1(x1, 0, . . . , 0)(cid:105) =k (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD2(x1, 0, . . . , 0)(cid:105) , k ∈ R, ⇒v (1) 0 (x1, 0, . . . , 0) = c v (2) 0 (x1, 0, . . . , 0) , c ∈ R. (2.5.2.20) (2.5.2.21) (2.5.2.22) Again by (A15) Lebesgue measure of such a set in (2.5.2.22) is 0. Therefore, the proof is complete. The lemma below establishes the fact that the function ∇√ φn is continuous with respect to (θ1, θ2) in the space of L2(G×RN ), which along with Lemma 5 will ensure that the family of distributions φn constitutes a regular experiment. √ Lemma 6. The function ∇√ to (θ1, θ2) on L2(G × RN ). φn(x, y, θ1, θ2) ≡ (cid:18) ∂ (cid:19) √ φn ∂θ2 ∂ , is continuous with respect φn ∂θ1 44 Proof. It will be enough to show that for any l ≡ (l1, l2) ∈ R2 2 (cid:90) (cid:90) (cid:104)∇(cid:112)φn(x, y, θ1, θ2), l(cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)θ+h 2 θ where L > 0 is a finite constant. 
Note that I(x ∈ G)dxdy ≤ (cid:107)h(cid:107)2 (cid:107)l(cid:107)2 n−4α−(d−1)γL, (2.5.2.23) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2) √ ∂ φn ∂θ2 − l2 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) (2.5.2.24) = ≤2l2 1 ≤4l2 1 θ √ √ − ∂ + 2l2 2 − l1 φn ∂θ1 ∂ φn ∂θ1 √ (cid:104)∇(cid:112)φn(x, y, θ1, θ2), l(cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)θ+h l1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2)  ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2)  ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2) (cid:34) ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2)  ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) (cid:34) ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2)  ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) φn ∂θ1 √ φn ∂θ1 √ φn ∂θ2 √ φn ∂θ2 φn ∂θ2 − ∂ √ √ + + +4l2 2 √ ∂ φn ∂θ1 √ φn ∂θ1 − ∂ √ − ∂ φn ∂θ1 √ φn ∂θ1 √ − ∂ φn ∂θ2 − ∂ φn ∂θ2 √ ∂ φn ∂θ2 √ + l2 φn ∂θ2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) 2(cid:35) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) 2(cid:35) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) . Recall the transformation (2.5.2.13) that we used earlier ˜y = Σ−1/2(x)(y − B(D0(x) + θ1n−αD1(x) + n−2αM (1) n (x, θ1) + θ2n−α2D2(x) + n−2αM (2) n (x, θ2))), 45 therefore, ∂ √ φn ∂θ1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) = −n−α (cid:113) (cid:28) BT Σ−1/2(x)∇(cid:112)f (˜y), D1(x) + n−α∇M (cid:29) (1) n (x, θ1) Det(Σ−1/2(x))I(x ∈ G). Then √ ∂ φn ∂θ1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2) (cid:28) = − n−α BT Σ−1/2(x)∇ (cid:113) f (˜y − Σ−1/2(x)Bh1(n−αD1(x) + n−2α∇M (1) n (x, t1))), (2.5.2.25) (cid:29)(cid:113) D1(x) + n−α∇M (1) n (x, θ1) + n−αh1∇2M (1) n (x, t2) Det(Σ−1/2(x))I(x ∈ G), where t1, t2 are some numbers in (θ1, θ1 + h1) than could depend on x. √ In the following expression we consider the integral of the squared difference of the partial derivatives of φn with respect to θ1 at two different coordinate points (θ1 + h1, θ2) and (θ1, θ2). To carry out the integration, the transformation that is given in Equation (2.5.2.13) will be used. Also for notational simplicity often times the arguments involving ˜x, associated with the change of variable x = (˜x1, n−γ ˜x−1), will be suppressed in the following expressions from here onward. Also it is understood that the range of integration remains the same for 46 ˜y coordinate, which is RN . However, the range of integration of ˜x changes to ˜Gn. I(x  G)dxdy √ φn ∂θ1 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2) (cid:90) (cid:90) (cid:20)(cid:28) (cid:90) (cid:90)  ∂ (cid:18) (cid:113) f (˜y − Σ−1/2Bh1(n−αD1 + n−2α∇M (1) √ φn ∂θ1 BT Σ−1/2 − ∂ =n−2α−(d−1)γ ∇ (cid:29) n ((˜x1, n−γ ˜x−1), θ1) + (cid:28) D1 + n−α∇M (1) (cid:113) f (˜y − Σ−1/2Bh1(n−αD1 + n−2α∇M (1) BT Σ−1/2(˜x1, n−γ ˜x−1) ∇ n−αh1∇2M (1) (cid:29)(cid:21)2 n ((˜x1, n−γ ˜x−1), t2) n ((˜x1, n−γ ˜x−1), t1))) − ∇(cid:112)f (˜y) (cid:19) , (2.5.2.26) n ((˜x1, n−γ ˜x−1), t1))), Det(Σ1/2)d˜xd˜y, which can be bounded by the sum of the following two terms in Equation (2.5.2.27) and (2.5.2.30) respectively, using simple algebraic inequality (a + b)2 ≤ 2a2 + 2b2. 
The first term 47 is (cid:90) (cid:90) (cid:20)(cid:28) (cid:18) 2n−2α−(d−1)γ BT Σ−1/2 (cid:113) f (˜y − Σ−1/2Bh1(n−αD1 + n−2α∇M (1) ∇ n ((˜x1, n−γ ˜x−1), t1))) − ∇(cid:112)f (˜y) (cid:19) , (cid:29)2 (cid:90) (cid:90) (cid:20)(cid:28) n ((˜x1, n−γ ˜x−1), θ1) Det(Σ1/2)d˜xd˜y D1 + n−α∇M (1) (cid:18)(cid:90) 1 =2n−2α−(d−1)γ ∇2 0 BT Σ−1/2 (cid:113) f (˜y − τ Σ−1/2Bh1(n−αD1 + n−2α∇M (1) n ((˜x1, n−γ ˜x−1), t1)))dτ (cid:19) (2.5.2.27) =2n−4α−(d−1)γh2 (cid:29)2 Σ−1/2Bh1(n−αD1 + n−2α∇M (1) n ((˜x1, n−γ ˜x−1), t1)), (cid:90) (cid:90) (cid:20)(cid:28) (cid:18) (cid:19) D1 + n−α∇M (1) n ((˜x1, n−γ ˜x−1), θ1) ∇2(cid:112)f (y) (cid:29)2 Σ−1/2B(D1 + n−α∇M (1) D1 + n−α∇M (1) n ((˜x1, n−γ ˜x−1), t1)), n ((˜x1, n−γ ˜x−1), θ1) BT Σ−1/2 Det(Σ1/2)d˜xd˜y Det(Σ1/2)d˜xdy, 1 where the following change of variable is used y = ˜y − τ h1n−α(Σ−1/2BD1 + n−αΣ−1/2B∇M (1) n (·, t1)). (2.5.2.28) 48 Now this term can be bounded further by (cid:90) (cid:90) (cid:13)(cid:13)(cid:13)(cid:13)BT Σ−1/2(˜x1, n−γ ˜x−1) (cid:18) 2n−2α−(d−1)γh2 1 (cid:19) Σ−1/2(˜x1, n−γ ˜x−1) n ((˜x1, n−γ ˜x−1), t1)) ∇2(cid:112)f (y) (cid:13)(cid:13)(cid:13)(cid:13)2 (cid:13)(cid:13)(cid:13)(cid:13)2 n ((˜x1, n−γ ˜x−1), θ1) Det(Σ1/2(˜x1, n−γ ˜x−1))d˜xdy (D1(˜x1, n−γ ˜x−1) + n−α∇M (1) n ((˜x1, n−γ ˜x−1), θ1) Det(Σ−3/2(˜x1, n−γ ˜x−1))d˜xdy (2.5.2.29) 1 × ≤ 2Ln−2α−(d−1)γh2 × B(D1(˜x1, n−γ ˜x−1) + n−α∇M (1) (cid:19) ∇2(cid:112)f (y) (cid:13)(cid:13)(cid:13)(cid:13)D1(˜x1, n−γ ˜x−1) + n−α∇M (1) (cid:90) (cid:90) (cid:13)(cid:13)(cid:13)(cid:13)(cid:18) (cid:13)(cid:13)(cid:13)(cid:13)D1(˜x1, n−γ ˜x−1) + n−α∇M (1) (cid:18)(cid:90) ≤ 2Ln−2α−(d−1)γ(1 + O(n−α))h2 × 1 ≤ 2Ln−2α−(d−1)γ(1 + O(n−α))h2 1 × × (cid:18)(cid:90) (cid:107) g(˜x)(cid:107)4(cid:107)v(1) 0 (˜x1, n−γ ˜x−1)(cid:107)4(M−1)d˜x ≤ 2Ln−2α−(d−1)γ(1 + O(n−α))h2 1 (cid:13)(cid:13)(cid:13)(cid:13)2 dy (cid:13)(cid:13)(cid:13)(cid:13)2 n ((˜x1, n−γ ˜x−1), t1)) (cid:33) (cid:32)(cid:90) (cid:13)(cid:13)(cid:13)(cid:13)∇2(cid:112)f (y) (cid:13)(cid:13)(cid:13)(cid:13)2 (cid:19) (cid:32)(cid:90) (cid:13)(cid:13)(cid:13)(cid:13)∇2(cid:112)f (y) (cid:33) (cid:13)(cid:13)(cid:13)(cid:13)2 (cid:19) (cid:90) (cid:13)(cid:13)(cid:13)(cid:13)∇2(cid:112)f (y) (cid:13)(cid:13)(cid:13)(cid:13)2 (cid:107) g(˜x)(cid:107)4d˜x (cid:90) dy dy (cid:107)D1(˜x1, n−γ ˜x−1)(cid:107)4Det(Σ−3/2(˜x1, n−γ ˜x−1))d˜x ≤ h2 1n−4α−(d−1)γL(1) 1 (1 + O(n−α)), where we used CS inequality and the triangle inequality for Euclidean norms. In the last inequality we used the assumptions on g and v D1 using, (cid:107)D1(cid:107) ≤ λ1(cid:107)v (cid:107)∇M (1) 0 , and we bounded the Euclidean norm of 0 (cid:107)M−1(cid:107)g(cid:107) ∀ (˜x1, n−γ ˜x−1) ∈ ˜Gn. On the other hand, we used M(cid:107)g(cid:107)M ) as another bound on the Euclidean norm n (cid:107) ≤ λi(a n . We also used the observation that eigenvalues of Σ(x), x ∈ G are assumed to be 0 (cid:107)M−2(cid:107)g(cid:107)2 +. . .+a 2 (cid:107)v (1) (i) (i) (i) (i) (i) of M bounded away from 0 and ∞. Finally L, L (1) 1 > 0 are generic constants. 
49 The second term in the sum that bounds (2.5.2.26) is given by 2n−4α−(d−1)γh2 1 Det(Σ1/2)d˜xdy ≤2n−4α−(d−1)γh2 BT Σ−1/2∇(cid:112)f (y),∇2M (cid:13)(cid:13)(cid:13)(cid:13)2(cid:13)(cid:13)(cid:13)(cid:13)∇2M (cid:90) (cid:90) (cid:28) (cid:90) (cid:90) (cid:13)(cid:13)(cid:13)(cid:13)BT Σ−1/2∇(cid:112)f (y) (cid:90) (cid:107)∇(cid:112)f (y)(cid:107)2dy Det(Σ1/2)d˜xdy 1n−4α−(d−1)γ 1n−4α−(d−1)γL (cid:107)∇2M (cid:90) (2) 1 , 1 ≤2h2 ≤2h2 (1) (cid:29)2 n ((˜x1, n−γ ˜x−1), t2) (cid:13)(cid:13)(cid:13)(cid:13)2 n ((˜x1, n−γ ˜x−1), t2) (1) (2.5.2.30) (1) n (t2)(cid:107)2Det(Σ−1/2)d˜x where the following transformation is used y = ˜y − h1n−αΣ−1/2(˜x)BD1(˜x) − h1n−2αΣ−1/2(˜x)B∇M (1) n (˜x, t2), (2.5.2.31) also a simple application of CS inequality separates out the quantities involving y and ˜x. Furthermore, ∇2M sponding symmetrized version in a similar fashion as we did it for ∇M (1) n (t2) can be bounded in its matrix norm using the bound on its corre- (1) n . Also L (2) 1 > 0 is generic constant which bounds the product of the integrals in (2.5.2.30). Combining (2.5.2.28) and (2.5.2.30), we obtain the following bound for all sufficiently large n (cid:90) (cid:90) ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2) √ φn ∂θ1 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) √ φn ∂θ1 − ∂ I(x  G)dxdy ≤ h2 1n−4α−(d−1)γL1. (2.5.2.32) Next, we bound the integral of the squared difference of the partial derivative of √ φn with respect to θ1 at two points (θ1, θ2 + h2) and (θ1, θ2) respectively. Recall the transformation 50 (2.5.2.13) that we used earlier (cid:16) ˜y = Σ−1/2(x) y − B(cid:0)D0(x) + θ1n−αD1(x) + n−2αM + n−2αM (1) n (x, θ1) + θ2n−α2D2(x) n (x, θ2)(cid:1)(cid:17) (2) , therefore, ∂ √ φn ∂θ1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) = −n−α (cid:113) Det(Σ−1/2(x))I(x ∈ G). (cid:28) BT Σ−1/2(x)∇(cid:112)f (˜y), D1(x) + n−α∇M (cid:29) (1) n (x, θ1) Then √ φn ∂θ1 ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) (cid:28) BT Σ−1/2(x) = −n−α (cid:113) f (˜y − Σ−1/2(x)Bh2(n−αD2(x) + n−2α∇M × ∇ D1(x) + n−α∇M (1) n (x, θ1) Det(Σ−1/2(x))I(x ∈ G), (cid:29)(cid:113) (2) n (x, t3))), (2.5.2.33) where t3 is a number in (θ2, θ2 + h2) which may depend on x. Next, the transformation (2.5.2.13) is used. Recall that the transformed variable is x = (˜x1, n−γ ˜x−1) and ˜x is inte- 51 grated over ˜Gn. 
√ √ φn ∂θ1 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) (cid:90) (cid:90) (cid:28) (cid:90) (cid:90)  ∂ (cid:18) (cid:113) f (˜y − Σ−1/2Bh2(n−αD2 + n−2α∇M (2) BT Σ−1/2 φn ∂θ1 − ∂ ∇ × =n−2α−(d−1)γ (cid:29)2 I(x  G)dxdy (cid:19) n ((˜x1, n−γ ˜x−1), t3))) − ∇(cid:112)f (˜y) , I(˜x ∈ ˜Gn)Det(Σ1/2)d˜xd˜y =n−4α−(d−1)γh2 (cid:90) (cid:90) (cid:28) n ((˜x1, n−γ ˜x−1), θ1) D1 + n−α∇M (1) (cid:18)(cid:90) 1 (cid:113) f (˜y − τ Σ−1/2Bh2(n−αD2 + n−2α∇M (2) BT Σ−1/2 ∇2 × 2 n ((˜x1, n−γ ˜x−1), t3)))dτ I(˜x ∈ ˜Gn)Det(Σ1/2)d˜xd˜y (cid:19) (2.5.2.34) =n−4α−(d−1)γh2 2 0 × Σ−1/2B(D2 + n−α∇M (2) D1 + n−α∇M (1) BT Σ−1/2 n ((˜x1, n−γ ˜x−1), t3))), n ((˜x1, n−γ ˜x−1), t3))), (cid:19) (cid:29)2 (cid:90) (cid:90) (cid:28) (cid:18) n ((˜x1, n−γ ˜x−1), θ1) ∇2(cid:112)f (y) (cid:29)2 (cid:19) n ((˜x1, n−γ ˜x−1), θ1) ∇2(cid:112)f (y) (cid:13)(cid:13)(cid:13)(cid:13)2 (D2 + n−α∇M (2) (cid:19)2 (cid:90) (cid:18) n ((˜x1, n−γ ˜x−1), θ1) (cid:90) ∇2(cid:112)f (y) (cid:32)(cid:90) (cid:18) (cid:19)2 ∇2(cid:112)f (y) (cid:19) dy dy × Σ−1/2B(D2 + n−α∇M (2) D1 + n−α∇M (1) (cid:90) (cid:90) (cid:13)(cid:13)(cid:13)(cid:13)(cid:18) (cid:13)(cid:13)(cid:13)(cid:13)D1 + n−α∇M (1) ≤n−4α−(d−1)γh2 2L × ≤n−4α−(d−1)γ(1 + O(n−α))h2 2L ≤n−4α−(d−1)γ(1 + O(n−α))h2 2L (cid:18)(cid:90) (cid:107)g(˜x)(cid:107)4(cid:107)v(2) 0 (˜x1, n−γ ˜x−1)(cid:107)4d˜x (cid:13)(cid:13)(cid:13)(cid:13)2 I(˜x ∈ ˜Gn)Det(Σ1/2)d˜xdy n ((˜x1, n−γ ˜x−1), t3))) I(˜x ∈ ˜Gn)Det(Σ−3/2)d˜xdy (cid:33) (cid:107)D2(cid:107)4Det(Σ1/2)d˜x ≤h2 2n−4α−(d−1)γL2(1 + O(n−α)). The bound (2.5.2.34) can be obtained using similar arguments we used earlier in this proof to bound D1 and D2 in Euclidean norm and using the fact that eigenvalues of Σ(x) are 52 bounded away from 0 and ∞ so that the matrix norm of Σ−3/2(x) is finite. The bounds that have been established in (2.5.2.32) and (2.5.2.34) can be used with each of the components of equation (2.5.2.24) to obtain the final bound. It is possible because the expressions are symmetric in arguments θ1 and θ2. Finally, tying up all the bounds together for some generic constant L > 0 we obtain for all sufficiently large n (cid:90) (cid:90) (cid:104)∇(cid:112)φn(x, y, θ1, θ2), l(cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)θ+h 2 θ I(x ∈ G)dxdy ≤4l2 1n−4α−(d−1)γL1 + 4l2 2n−4α−(d−1)γL2 + 4l2 2n−4α−(d−1)γL3 1h2 1h2 2h2 1n−4α−(d−1)γL4 + 4l2 2h2 ≤(cid:107)h(cid:107)2 (cid:107)l(cid:107)2 n−4α−(d−1)γL. (2.5.2.35) Next, we check condition (1) in Theorem II.6.1 in Ibragimov and Khasminskii (2013), which together with Lemma 6 will imply that the family of distributions φn is LAN. Lemma 7. For any k > 0 (cid:90) (cid:90) (cid:28) ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)θ+Ψ−1/2(n)u θ (cid:29)2 , Ψ−1/2(n)u I (x ∈ G) dxdy = 0. 
(2.5.2.36) √ φn ∂θ n→∞ sup lim (cid:107)u(cid:107) 0, we obtain n→∞ sup lim (cid:107)u(cid:107) 1, a, b ∈ R, 54 the following can be observed (cid:13)(cid:13)(cid:13)(cid:13)2+δ = ≤2δ/2 ∂ ln φn ∂θ ∂θ1 ∂ ln φn (cid:13)(cid:13)(cid:13)(cid:13)Ψ−1/2(n) (cid:34)(cid:12)(cid:12)(cid:12)(cid:12)a11(n) (cid:34)(cid:12)(cid:12)(cid:12)(cid:12)a11(n) (cid:34)(cid:12)(cid:12)(cid:12)(cid:12)a11(n) (cid:12)(cid:12)(cid:12)(cid:12)a21(n) ∂θ1 + ∂ ln φn ≤21+3δ/2 + a12(n) ∂ ln φn ∂θ1 + a12(n) ∂ ln φn ∂θ1 (cid:12)(cid:12)(cid:12)(cid:12)2+δ + It is enough to show that (cid:12)(cid:12)(cid:12)(cid:12)2(cid:35)(2+δ)/2 (cid:12)(cid:12)(cid:12)(cid:12)2+δ(cid:35) ∂θ2 ∂ ln φn ∂ ln φn + a22(n) ∂ ln φn ∂θ2 ∂ ln φn ∂θ1 + a22(n) + + + ∂θ2 ∂θ1 ∂θ2 ∂θ2 ∂ ln φn ∂ ln φn ∂ ln φn (cid:12)(cid:12)(cid:12)(cid:12)2 (cid:12)(cid:12)(cid:12)(cid:12)a21(n) (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:12)(cid:12)(cid:12)(cid:12)a21(n) (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:12)(cid:12)(cid:12)(cid:12)a12(n) (cid:12)(cid:12)(cid:12)(cid:12)2+δ(cid:35) (cid:12)(cid:12)(cid:12)(cid:12)a22(n) (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:12) ∂ ln φn (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:12) ∂ ln φn ∂ ln φn ∂θ1 ∂θ2 ∂θ2 . lim n→∞ n lim n→∞ n φn(x, y, θ1, θ2)dxdy = 0, φn(x, y, θ1, θ2)dxdy = 0. (2.5.2.39) (2.5.2.40) (2.5.2.41) For the following expression again the change of variables described in Equation (2.5.2.13) will be used. Since the transformation on x is x = (˜x1, n−γ ˜x−1), the arguments involving ˜x will be suppressed for notational simplicity and the corresponding range of integration will be ˜Gn. Therefore, using the expression for ∂ ln φn ∂θ1 from equation (2.5.2.8) the following can 55 be obtained: (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:12) ∂ ln φn (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)f−1(cid:104)∇f, n−αΣ−1/2(x)BD1(x) + n−2αΣ−1/2(x)B∇M (1) φn(x, y, θ1, θ2)dxdy ∂θ1 n =n n (x, θ1)(cid:105)(cid:12)(cid:12)(cid:12)2+δ × φn(x, y, θ1, θ2)dxdy ≤21+δn1−α(2+δ) + 21+δn1−2α(2+δ) =21+δn1−γ(d−1)−α(2+δ) × f−(1+δ)(˜y)d˜xd˜y +21+δn1−γ(d−1)−2α(2+δ) × f−(1+δ)(˜y)d˜xd˜y ≤Ln1−γ(d−1)−α(2+δ) +Ln1−γ(d−1)−2α(2+δ) ≤Ln1−γ(d−1)−α(2+δ) n (x, θ1) (cid:12)(cid:12)(cid:12)2+δ φn(x, y, θ1, θ2)dxdy φn(x, y, θ1, θ2)dxdy (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)f−1(cid:104)∇f, Σ−1/2(x)BD1(x)(cid:105)(cid:12)(cid:12)(cid:12)2+δ (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)f−1(cid:104)∇f, Σ−1/2(x)B∇M (1) (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:104)∇f (˜y), Σ−1/2(˜x1, n−γ ˜x−1)BD1(˜x1, n−γ ˜x−1)(cid:105)2(cid:12)(cid:12)(cid:12)(1+δ/2) (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:104)∇f (˜y), Σ−1/2)B∇M (1) n ((˜x1, n−γ ˜x−1)), θ1)(cid:105)2(cid:12)(cid:12)(cid:12)1+δ/2 (cid:90) (cid:13)(cid:13)(cid:13)Σ−1/2D1 (cid:13)(cid:13)(cid:13)2+δ (cid:90) (cid:13)(cid:13)(cid:13)Σ−1/2∇M (1) (cid:19)L(1) (cid:90) (cid:19) (cid:90) (cid:90) (cid:18)(cid:90) (cid:18)(cid:90) (cid:107)∇f (˜y)(cid:107)2+δ f−(1+δ)(˜y)d˜y (cid:107)∇f (˜y)(cid:107)2+δ f−(1+δ)(˜y)d˜y (cid:107)∇ log f (˜y)(cid:107)2+δ f (˜y)d˜y (cid:107)g(˜x)(cid:107)2+δd˜x n (θ1) ˜x∈ ˜Gn d˜x d˜x n +Ln1−γ(d−1)−2α(2+δ) (cid:107)∇ log f (˜y)(cid:107)2+δ f (˜y)d˜y L(2) n (cid:90) ˜x∈ ˜Gn (cid:107)a(i) 2 (cid:107)v(i) 0 (cid:107)M−2(cid:107)g(˜x)(cid:107)2 + . . . + a(i) M(cid:107)g(˜x)(cid:107)M(cid:107)2+δd˜x ≤n1−γ(d−1)−α(2+δ)L, (2) (1) n , L n where L, L are generic constants. 
We use CS inequality, the boundedness of the eigenvalues of Σ(˜x1, n−γ ˜x−1) and the boundedness of the Euclidean norm of the vector field 0 (˜x1, n−γ ˜x−1). As it can be noted the technique that has been used to bound D1, M (1) n (1) v 56 (2.5.2.42) (cid:13)(cid:13)(cid:13)2+δ   and Σ here, is the same technique that have been used in the previous Lemmas 5 and 7. Therefore, (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:12)φ−1 n (cid:12)(cid:12)(cid:12)(cid:12)2+δ ∂φn ∂θ1 lim n→∞ n φn(x, y, θ1, θ2)dxdy ≤ lim n→∞ O(n−αδ) = 0. With a similar argument equation (2.5.2.41) can also be established. Hence, we conclude the proof of Lyapunov’s condition. Finally, we present the famous lemma due to H´ajek (1972) tailored for our purposes. This lemma will serve as a crucial connection to prove the minimax lower bound for the integral curves based on tensor fields from the class D2(a, G, τ ). Lemma 9. For any estimator ˆθn of the parameter θ ≡ (θ1, θ2) in the parametric family of distributions φn described in lemma 4 satisfying LAN condition, any loss function w ∈ W, and for any b > 0, for which Kb is the square [−b, b]2 in R2, the following holds: lim inf n→∞ {(θ1,θ2): I (θ1,θ2) ∈Kb} sup 1/2 0 ≥0.25(2π)−1 w ((cid:107)y(cid:107)) exp En θ w (cid:18) −(cid:107)y(cid:107) 2 ˆθ1,n − θ1 ˆθ2,n − θ2 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   1/2 0 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)I  (cid:19) dy, Kb/2 (cid:90) (cid:90) (cid:90) n(cid:89) i=1 57 where En θ is the expectation with respect to family of measures . . . φn(xi, yi, θ1, θ2)dxidyi. 2.5.3 Proofs of Theorems 2.5.3.1 Proof of Theorem 1 The integral curves are defined to follow along the corresponding unit vector fields. Since (cid:107)v 0 (cid:107) = cr < 1, we define the following integral curves (r) ∂ ∂t x (r) 0 (t) = ∂ ∂t x (r) n (t; θr) = v v (cid:107)v (r) 0 (t)) 0 (t))(cid:107), (r) (r) 0 (x (r) 0 (x (r) (r) n (x n (t; θr)), (2.5.3.1) r = 1, 2. Note that suppressing the argument inside x (1) 0 (s) in the vector field it can be observed that, (1) 0 (cid:107)2 + 2θ1n−α(cid:104)v (1) 0 , g(cid:105) + (cid:107)g(cid:107)2θ2 1n−2α 1 = (cid:107)v (1) (1) 0 + θ1n−αg(cid:107)2 = (cid:107)v =⇒ 1 − (cid:107)v =⇒ 1 − (cid:107)v (cid:107)v 0 (cid:107)2 = 2θ1n−α(cid:104)v 2θ1n−α(cid:104)v 0 (cid:107) 0 (cid:107) = (1) (1) (1) 0 , g(cid:105) + (cid:107)g(cid:107)2θ2 0 , g(cid:105) + (cid:107)g(cid:107)2θ2 (1 + c1)c1 (1) 1n−2α 1n−2α . 58 Define for r = 1, (1) n (t) = x ∆ (1) 0 (t) (1) n (s; θ1); θ1)ds − t(cid:90) 0 v (cid:107)v (1) 0 (x (1) 0 (x (1) 0 (s)) 0 (s))(cid:107) ds (1) (cid:105) (1) n (t; θ1) − x t(cid:90) t(cid:90) (1) n (x v 0 (cid:104) (1) 0 (x v = = 0 + θ1n−α t(cid:90)  1(cid:90) 0 0 = (1) 0 (x (1) 0 (s)) ds (1) n (s; θ1)) − v t(cid:90) (cid:104) g(x 0 − θ1n−α(cid:107)g(x (1) n (s; θ1)) − 2(cid:104)v 0 (s))(cid:107)2v (1 + c1)c1 n (s; θ1) + (1 − λ)x (1) (1) ∇v (1) 0 (λx (1) 0 , g(cid:105)v (1) 0 (x (1 + c1)c1 (1) 0 (s)) (1) 0 (x (1) 0 (s)) ds (cid:105)  ∆ (1) 0 (s))dλ (1) n (s)ds t(cid:90) (cid:104) 0 + θ1n−α g(x − θ1n−α(cid:107)g(x (1) n (s; θ1)) − 2(cid:104)v 0 (s))(cid:107)2v (1 + c1)c1 (1) 0 (x (1) (1) 0 , g(cid:105)v (1) 0 (x (1 + c1)c1 (1) 0 (s)) (cid:105) (1) 0 (s)) ds. 
Then we obtain (cid:13)(cid:13)(cid:13)∆(1) n (t) (cid:13)(cid:13)(cid:13) ≤ |θ1|n−α t(cid:90) + max λ∈[0,1] 0 ≤ |θ1|n−α (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) + An−α (cid:33) (cid:13)(cid:13)(cid:13) ds (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)∆(1) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) + An−α (cid:33) (cid:13)(cid:13)(cid:13) ds}, 0 (s)) n (s) 0 , g(cid:105)v(1) 0 (x(1) (1 + c1)c1 0 (s)) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) 2(cid:104)v(1) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) 2(cid:104)v(1) t(cid:90) 0 t(cid:90) 0 (cid:32)(cid:13)(cid:13)(cid:13)g(x(1) (cid:13)(cid:13)(cid:13)∇v(1) (cid:32)(cid:13)(cid:13)(cid:13)g(x(1) t(cid:90) max λ∈[0,1] exp{ (cid:13)(cid:13)(cid:13) + n (s; θ1)) n (s; θ1)) (cid:13)(cid:13)(cid:13) + (cid:13)(cid:13)(cid:13)∇v(1) 0 59 0 (λx(1) n (s, θ1) + (1 − λ)x(1) 0 (s)) 0 , g(cid:105)v(1) 0 (x(1) (1 + c1)c1 0 (s)) 0 (λx(1) n (s, θ1) + (1 − λ)x(1) (2.5.3.2) ds ds (2.5.3.3) where A > 0 is some positive constant. The last step in equation (2.5.3.3) is done using (1) are continuously differentiable Gr¨onwall-Bellman-type inequality. Recall that g and v 0 vector fields defined on an open bounded convex subset G of Rd. This implies that they are L2 bounded as well. Therefore, (cid:13)(cid:13)(cid:13) ≤ |θ1|n−αC (1) 0 . Similarly, it can also be (1) n (t) established that (2) n (t) (2) 0 . Next, consider (cid:13)(cid:13)(cid:13)∆ (cid:13)(cid:13)(cid:13) ≤ |θ2|n−αC (cid:105) (cid:13)(cid:13)(cid:13)∆ (cid:104) v(1) 0 (x(1) + θ1n−α 0 (x(1) 0 (s)) ds n (s; θ1)) − v(1) t(cid:90) (cid:104) g(x(1) n (s; θ1)) − 2(cid:104)v(1) 0 (s))(cid:107)2v(1) (1 + c1)c1 0 − θ1n−α(cid:107)g(x(1) 0 , g(cid:105)v(1) 0 (x(1) (1 + c1)c1 0 (s)) 0 (x(1) 0 (s)) (cid:105) ds ∇v(1) 0 (x(1) 0 (s))∆(1) n (s)ds ∆(1) n (t) = = t(cid:90) 0 t(cid:90) 0 + 0 Denote (cid:104)∇2v(1) 0 (λx(1) n (s; θ1) + (1 − λ)x(1) 0 (s)), (∆(1) n (s))T(cid:105)dλ n (s)ds (2.5.3.4)  ∆(1)  1(cid:90) t(cid:90) t(cid:90) 0 0 (cid:34)  1(cid:90) t(cid:90) 0 1 2 + θ1n−α + θ1n−α t(cid:90) − (cid:35)  ∆(1) g(x(1) 0 (s)) − 2(cid:104)v(1) 0 , g(cid:105)v(1) 0 (x(1) (1 + c1)c1 0 (s)) ds ∇g(ρx(1) n (s, θ1) + (1 − ρ)x(1) 0 (s))dρ n (s)ds 0 0 1n−2α(cid:107)g(x(1) θ2 0 (s))(cid:107)2v(1) (1 + c1)c1 0 (x(1) 0 (s)) . ˜g(x (1) 0 (s)) = g(x 0 (s)) − 2(cid:104)v (1) (1) 0 , g(cid:105)v (1) 0 (x (1 + c1)c1 (1) 0 (s)) . 60 Therefore, by the triangle inequality for the Euclidean norm (1) n (t) − ∇v (1) 0 (x (1) 0 (s))∆ (1) n (s)ds − θ1n−α ˜g(x t(cid:90) 0 ≤ 1 2 t(cid:90) 0  1(cid:90) t(cid:90) 0 (1) 0 (λx (cid:13)(cid:13)(cid:13)∇2v  1(cid:90) (cid:13)(cid:13)(cid:13)∇g(ρx + |θ1|n−α t(cid:90) 0 (1) 0 (s))ds (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:107)∆ (cid:13)(cid:13)(cid:13) dλ (cid:13)(cid:13)(cid:13)∆ (cid:13)(cid:13)(cid:13) dρ (1) n + (1 − λ)x (1) 0 ) (1) n + (1 − ρ)x (1) 0 ) (1) n (s)(cid:107)2ds (cid:13)(cid:13)(cid:13) ds (1) n (s) 0 0 + θ2 1n−2α t(cid:90) 0 Hence it is evident that for all sufficiently large n (1) n (t) − ∇v (1) 0 (x (1) 0 (s))∆ (1) n (s)ds − θ1n−α (cid:107)g(x (1) 0 (s))(cid:107)2(cid:107)v (1) 0 (x (1 + c1)c1 (1) 0 (s))(cid:107) ds. (2.5.3.5) t(cid:90) 0 ˜g(x (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ Cn−2α. 
(1) 0 (s))ds (2.5.3.6) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)∆ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)∆ t(cid:90) 0 t(cid:90) So the integral equation along with the corresponding differential equation from the inequal- ity in (2.5.3.6) is given by, (1) n (t) = ∆ ∇v (1) 0 (x (1) 0 (s))∆ (1) n (s)ds + θ1n−α 0 (1) n (t) = ∇v ∆ (1) 0 (x ∂ ∂t (1) 0 (t))∆ (1) n (t) + θ1n−α˜g(x (1) 0 (t)) + O(n−2α). t(cid:90) 0 (1) 0 (s))ds + O(n−2α), ˜g(x (2.5.3.7) In order to solve the ODE given in equation (2.5.3.7) note that there exists a function G(1)(s, t) : (t, s) ∈ [0, T ]2 which satisfies the following conditions: 1. ∂ ∂t G(1)(s, t) = ∇v (1) 0 (x (1) 0 (t))G(1)(s, t), 0 ≤ s ≤ t ≤ T ; 2. G(1)(t, t) = Id×d; 61 3. G(1)(s, t) ≡ 0 , 0 ≤ t < s ≤ T. Then by Theorem (2.2) in chapter (7) in Coddington and Levinson (1955) the solution to equation (2.5.3.7) is given using the Green’s function G(1)(s, t) as the following (1) n (t) = θ1n−α ∆ G(1)(s, t)˜g(x (1) 0 (s))ds + O(n−2α) ∀t ∈ [0, T ] . Which in turn provides the following relation (1) n (t0; θ1) − x x (1) 0 (t0) = θ1n−α G(1)(s, t0)˜g(x (1) 0 (s))ds + O(n−2α). (2.5.3.8) 0 Similarly, for ∆ (2) n (t) = x (2) n (t; θ2) − x (2) 0 (t) the differential equation can be given by, (2) n (t) = ∇v ∆ (2) 0 (x ∂ ∂t (2) 0 (t))∆ (2) n (t) + θ2n−α˜g(x (2) 0 (t)) + O(n−2α), (2.5.3.9) and the solution of the differential equation can be given via G(2)(s, t0), another Green’s function which satisfies the above mentioned three conditions with v (2) 0 . (2) n (t0; θ2) − x x (2) 0 (t0) = θ2n−α G(2)(s, t0)˜g(x (2) 0 (s))ds + O(n−2α). (2.5.3.10) t(cid:90) 0 t0(cid:90) t0(cid:90) 0 62 For two unit vectors ei ∈ Rd, i = 1, 2, that are non-orthogonal to ˜g(x (i) 0 (t)), t ∈ [0, T ], i = 1, 2, respectively, observe that (cid:104)x(1) n (t0; θ1) − x(1) 0 (t0), e1(cid:105) = θ1n−α (cid:104)x(2) n (t0; θ2) − x(2) 0 (t0), e2(cid:105) = θ2n−α  t0(cid:90)  t0(cid:90) 0 0 (cid:104)G(1)(s, t0)˜g(x(1) 0 (s)), e1(cid:105)ds (cid:104)G(2)(s, t0)˜g(x(2) 0 (s)), e2(cid:105)ds  (1 + o(1)) ,  (1 + o(1)) . (2.5.3.11) Note that due to conditions (1) and (2) on G(r) there exists an  > 0 such that G(r)(s∗, t) = Id×d + o(1), s∗ ∈ [t − , t] . Hence, it is evident that for r = 1, 2, t0(cid:90) (cid:104)G(r)(s, t0)˜g(x Cr = 0 (r) 0 (s)), er(cid:105)ds (cid:54)= 0. (2.5.3.12) Therefore, for some r (1) n (t0; θ1), r (2) n (t0; θ2) converging to 0 as n → ∞ we have, (cid:16)(cid:104)x (cid:16)(cid:104)x θ1 = θ2 = nα C1 nα C2 (1) 0 (t0), e1(cid:105)(cid:17)(cid:16) 0 (t0), e2(cid:105)(cid:17)(cid:16) (2) (1) n (t0, θ1) − x n (t0, θ2) − x (2) (cid:17) (cid:17) 1 + r (1) n (t0; θ1) 1 + r (2) n (t0; θ2) , . (2.5.3.13) For r = 1, 2 an estimator of θr based on the integral curve estimate ˆX (r) n (t0) for x (r) n (t0, θr) can be constructed as: ˆθr,n = nα Cr (cid:16)(cid:104) ˆX (r) n (t0) − x 0 (t0), er(cid:105)(cid:17)(cid:16) (r) 1 + R (r) n (t0) (cid:17) , (2.5.3.14) where R (r) n (t0) converges to 0 as n → ∞ and satisfies, sup |θr|≤M | R (r) n (t0) − r (r) n (t0; θr) | ≤ sup |θr|≤M | (cid:104) ˆX (r) n (t0) − x (r) n (t0, θr), er(cid:105) | . (2.5.3.15) nα Cr 63 Now notice that, + nα n (t0) − x(r) | ˆθr,n − θr | ≤ nα | Cr | | (cid:104) ˆX (r) | Cr | | (cid:104) ˆX (r) ≤ nα n (t0) − x(r) | Cr | | (cid:104) ˆX (r) | θr | | 1 + r(r) n (t0; θr) | + n (t0, θr), er(cid:105) | n (t0, θr), er(cid:105) || 1 + R(r) n (t0) | | R(r) n (t0) − r(r) n (t0; θr) | . 
n (t0) − x(r) 0 (t0), er(cid:105)R(r) n (t0) − (cid:104)x(r) n (t0, θr) − x(r) 0 (t0), er(cid:105)r(r) n (t0; θr) | (2.5.3.16) Due to (A16), an M0 > 0 is chosen such that {(θ1, θ2) : I θr |≤ M0} for some square Kb ∈ R2. Therefore, we get the following inequality 1/2 0 (θ1, θ2) ∈ Kb} ⊂ {(θ1, θ2) : | (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   1/2 0 ˆθ2,n − θ2 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)I ˆθ1,n − θ1  (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   ˆθ1,n − θ1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  (cid:13)(cid:13)(cid:13) (2 + 2M0)nα ˆθ2,n − θ2 E ˜w (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)I  (cid:13)(cid:13)(cid:13)I 1/2 0 1/2 0 {(θ1,θ2): I (θ1,θ2) ∈Kb} sup 1/2 0 ≤ ≤ sup {(θ1,θ2): |θi|≤M0} sup {(θ1,θ2): |θi|≤M0} E ˜w E ˜w (cid:104) ˆX (cid:104) ˆX (1) (1) n (t0) − x n (t0, θ1), e1(cid:105) C1 n (t0) − x n (t0, θ2), e2(cid:105) (2) C2 (2) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   . (2.5.3.17) Since ˜w ∈ W, it is an increasing function and (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)I 1/2 0 ˆθ1,n − θ1 ˆθ2,n − θ2 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤(cid:13)(cid:13)(cid:13)I  1/2 0 (cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ˆθ1,n − θ1 ˆθ2,n − θ2 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ,  is used in equation (2.5.3.17) along with the relation established in (2.5.3.15) and (2.5.3.16). 64 Finally, define ˜w inequality sup D∈D2(a,G,τ ) Ew 1/2 0 (cid:13)(cid:13)(cid:13)I (cid:13)(cid:13)(cid:13) (2 + 2M0) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:104) ˆX nα n (t0) − x n (t0) − x (cid:104) ˆX (2) (1) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  y1 C1 y2 C2 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  = w ((cid:107)y(cid:107)) and, therefore, we have the  (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   ≥ sup {(θ1,θ2): |θi|≤M0} Ew (1) n (t0) − x n (t0) − x (2) (1) n (t0, θ1), e1(cid:105) n (t0, θ2), e2(cid:105) (2) (2.5.3.18) Therefore, using H´ajek’s lemma for ˜w, the following gives us the ultimate lower bound (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  .  (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   (1) (2) (cid:104) ˆX n (t0, θ1), e1(cid:105) n (t0, θ2), e2(cid:105) nα (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:104) ˆX (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:104) ˆX nα (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)I ˆθ1,n − θ1  (cid:19) ˆθ2,n − θ2 1/2 0 (cid:104) ˆX dy. lim inf n→∞ inf n ∈En(T ) (2) ˆX (1) n , ˆX sup D∈D2(a,G,τ ) Ew ≥ lim inf n→∞ {(θ1,θ2): I (θ1,θ2) ∈Kb} sup 1/2 0 (cid:90) ≥ 0.25(2π)−1 ˜w ((cid:107)y(cid:107)) exp (cid:18) E ˜w −(cid:107)y(cid:107) 2 (1) n (t0) − x n (t0) − x (2) (1) n (t0, θ1), e1(cid:105) n (t0, θ2), e2(cid:105) (2) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   (2.5.3.19) Kb/2 This provides a nontrivial bound for the minimax loss to the class of estimators for all b > 0. Thus, by choosing b arbitrarily large the proof can be completed. 2.5.3.2 Proof of Theorem 2 Proof. To prove Theorem 3 the structure of Theorem 2 is followed. Let t0 = T . Since ∇ϕ(a) (cid:54)= 0, without loss of generality one can assume a = 0. 
Then there exists a neighbor- hood of 0, say N2ρ = {x ∈ Rd : |x| ≤ 2ρ}, where ∇ϕ(x) (cid:54)= 0. 65 (1) Also v 0 (2) and v 0 are chosen such that the corresponding integral curves x (1) 0 (t) and x (2) (cid:16) 0 (t), t ∈ [0, T ], stay in N2ρ. Then for all t ∈ [0, T ] we have (cid:16) (cid:16) (cid:16) (cid:17) (cid:17)(cid:17)(cid:48) (r) 0 (t) (r) 0 (t) (r) , v 0 (cid:28) ∇ϕ (cid:16) (cid:17) (cid:16) (cid:16) = ϕ x x are chosen in such a way that ϕ (cid:17)(cid:29) (cid:17)(cid:17)(cid:48) x (r) 0 (t) (cid:54)= 0. x (r) 0 (t) < 0 which is ensured and monotonicity of ϕ. (cid:16) x ϕ (cid:17) (r) 0 (T ) (cid:16) (cid:17) . x (r) 0 (t) = inf t∈[0,T ] ϕ (r) Moreover, v 0 (r) 0 (t) (r) by the continuity of v 0 x As a result, (cid:13)(cid:13)(cid:13)∆ (cid:13)(cid:13)(cid:13) ≤ |θr| n−αC (r) 0 (r) n (t) Since ρ and M0 such that for all n ≥ n0, Then by using continuity of v (cid:12)(cid:12)(cid:12)x (r) n (t; θr) for all t ∈ [0, T ] and |θr| ≤ M0 there exists n0 depending on (cid:12)(cid:12)(cid:12) ≤ 2ρ for all t ∈ [0, T ] and |θr| ≤ M0. (cid:16) 0 (x),∇ϕ(x), we can assume (cid:17)(cid:29) (cid:17) (r) (r) n (t; θr) x (r) , v 0 (r) n (t; θr) x < 0 (cid:16) (cid:28) ∇ϕ for all t ∈ [0, T ], |θr| ≤ M0 and for all large enough n ≥ n0. Then for all t ∈ [0, T ], |θr| ≤ M0 and all large enough n ≥ n1 depending on ρ, M0 and g we observe (cid:16) (cid:104)∇ϕ x = (cid:104)∇ϕ (cid:16) (cid:17) (cid:16) (cid:16) (cid:17) (cid:16) (cid:17)(cid:105) (r) n (t; θr) (r) , v n (r) n (t; θr) x (r) n (t; θr) x , (r) v 0 (r) n (t; θr) x (cid:17) (cid:16) x + θrn−αg (cid:17)(cid:17)(cid:105) < 0. (r) n (t; θr) Since g and ∇ϕ are bounded on N2ρ, then (cid:17) (cid:16) ϕ (r) n (T ; θr) x = inf t∈[0,T ] (cid:16) x ϕ (cid:17) (r) n (t; θr) . 66 As a result, Suppose Kr = seen that (cid:16) (cid:16) ϕ (cid:17) (cid:17) (r) n (t; θr) x (r) 0 (t) ϕ t∈[0,T ] (cid:16) (cid:17) − inf (cid:17) (cid:16) (cid:17) − ϕ (cid:17)(cid:17)(cid:48)(cid:16) (cid:17)(cid:17)(cid:48) n (T ; θr) − x θrn−αC (r) 0 (T ) (r) x x = ϕ (r) n (T ; θr) (cid:16) x ϕ inf t∈[0,T ] (cid:16) x (cid:16) (cid:16) (cid:16) (cid:16) ϕ ϕ = = x (r) 0 (T ) x (r) 0 (T ) (cid:17)(cid:17)(cid:48) (2.5.3.20) (1 + o(1)) (r) 0 (T ) (r) 0 (T ) (1 + o(1)) . x (r) 0 (T ) C (r) 0 (T ) then similarly to the proof of Theorem 2 it can be (cid:32) (cid:16) inf t∈[0,T ] ϕ (r) n (t; θr) x (cid:17) − inf t∈[0,T ] (cid:16) x ϕ (cid:17)(cid:33)(cid:16) (r) 0 (t) (cid:17) . 1 + r (r) n (T ; θi) θr = nα Kr Also the estimate of θr is constructed through an arbitrary estimate of the minimum distance (r) ˆF n and appropriately chosen sequences r (r) n (T ; θr) & R (r) n (T ), and it is given by (cid:32) (cid:16) (cid:17)(cid:33)(cid:16) (cid:17) ˆθr,n = nα Kr (r) n − inf ˆF t∈[0,T ] ϕ x (r) 0 (t) 1 + R (r) n (T ) . The rest of the proof follows the same lines of the proof of Theorem 2. 67 Chapter 3 Global Minimax Bound 3.1 Introduction In the previous chapter we already established pointwise rate of convergence for the asymp- totic risk of the CS estimators is minimax optimal, see also Banerjee, Sakhanenko and Zhu (2019). The present chapter establishes the rate of convergence of the asymptotic risk of the integral curve estimator is minimax optimal, globally. Historically, in nonparametric estimation framework Stone (1982) established global minimax optimal rates for the es- timators in a simple nonparametric regression setting. 
Some more recent works include Raskutti, Wainwright and Yu (2012), where the authors established global minimax optimal rates for sparse additive models over reproducing kernel Hilbert spaces (RKHS) in an L1-type convex optimization framework. Guntuboyina and Sen (2015) and Kim and Samworth (2016) established global rates of convergence in univariate convex regression and in log-concave density estimation, respectively. Interestingly, in both of these works the estimator achieves a similar asymptotic rate of convergence, which is globally minimax. In our work we establish similar results, but in a semi-parametric setting involving a high order tensor structure of the signals. Moreover, we combine this globally minimax optimal rate with a Monte Carlo (MC) simulation study to compare different scanning protocols and find the one that gives the smallest stable global lower bound on the asymptotic risk of the estimated fiber trajectories. This comparison method harnesses the global information in fibers, as opposed to a t-test applied to an image summary statistic.

The rest of the chapter is organized as follows. In section 2 we introduce the assumptions and main results, the definition of the integrated and supremum norms for our problem, and the basic underlying assumptions needed for the inference described specifically in this chapter. In section 3 we describe the physical phenomenon behind DT-MRI, our simulation study, and results from a real data analysis. Additionally, in this section we provide a simulation-based choice of the protocol that gives the lowest global lower bound on the asymptotic risk of the estimators. All the proofs, with the necessary lemmas and propositions, are provided in section 5.

3.2 Assumptions and main results

Here we describe the main theorems and the lemma required to establish the minimax lower bound for the global asymptotic risk of the integral curve estimator. First, in addition to the assumptions in chapter 1, (A5) and (A6), let us introduce an assumption on the density f of the noise variable \xi.

(A5') Noise variables \{\xi_i : i = 1, 2, \ldots\} are i.i.d. with a common density f such that all the second order partial derivatives of the function
g(u) := -\int_{\mathbb{R}^N} \ln\Big(1 + \frac{f(z + u) - f(z)}{f(z)}\Big) f(z)\,dz, \qquad u \in \mathbb{R}^N,
are continuous at 0.

The class of densities described by condition (A5') is fairly extensive. In particular, normal densities satisfy (A5'). Moreover, if f is "regular" as defined in Ibragimov and Khasminskii (2013), then the second order partial derivatives of g can be written as
g''_{ij}(u) = \int f'_i(y) f'_j(y - u)/f(y)\,dy = \int f'_j(y) f'_i(y - u)/f(y)\,dy, \qquad i, j = 1, \ldots, N.
The assertion of (A5') can be understood by following the first part of Lemma I.7.1 in Ibragimov and Khasminskii (2013): it follows from the finiteness of the Fisher information
I_{ii} = \int (f'_i(z))^2/f(z)\,dz, \qquad i = 1, \ldots, N,
and a continuity-type condition
\int (f'_i(y + h) - f'_i(y))^2/f(y)\,dy \le C|h|^2, \qquad i = 1, \ldots, N,
for all h such that |h| \le \varepsilon, for some \varepsilon > 0 and a constant C > 0. These conditions are similar to conditions (b) and (c) in the definition of a regular experiment in Ibragimov and Khasminskii (2013).

Now we present the main theorems, along with some motivation, to establish the global optimal bounds for integral curve estimators. We also present the parametric subclass of tensors and its construction.
Furthermore, we describe Lemma 10, showing that the constructed parametric subclass satisfies assumption (A1). In addition to the class of tensors, we introduce the following classes, essential for the results presented in this chapter. Let \tilde{W} be the class of all even functions w : \mathbb{R}^R \mapsto \mathbb{R} that are non-decreasing on \mathbb{R}_+, equal to 0 at 0, and such that w(x) > 0 for all x > 0. Examples of such functions are \sum_{r=1}^{R} u_r^2 and \sum_{r=1}^{R} |u_r|. Let E_n(a^*, T) denote the class of all possible estimators of the curves x^{(r)}(t), t \in [0, T], r = 1, \ldots, R, obtained by solving the ODE involving pseudo-eigenvectors v^{(r)}, r = 1, \ldots, R. Recall that the integrated L_p-norm for a vector function y(t) = (y_1(t), \ldots, y_d(t)), t \in [0, T], is defined as
\|y\|_{p,T} := \Big( \int_0^T \sum_{i=1}^{d} |y_i(t)|^p \, dt \Big)^{1/p}, \qquad 1 \le p < \infty,
and
\|y\|_{\infty,T} := \sup_{t \in [0,T]} \max_{1 \le k \le d} |y_k(t)|.

Theorem 4. Assume conditions (A1)-(A6) and (A5') hold and 1 \le R \le (M + 2)/2. Then for any number T > 0, any point a^* \in G, and any function w \in \tilde{W}, we have
\liminf_{n \to \infty} \inf_{\hat{X}^{(1)}_n, \ldots, \hat{X}^{(R)}_n \in E_n(a^*, T)} \sup_{D \in D_2(a^*, G, T)} E\, w\Big( n^{2/(d+3)} \big( \|\hat{X}^{(1)}_n - x^{(1)}\|_{p,T}, \ldots, \|\hat{X}^{(R)}_n - x^{(R)}\|_{p,T} \big) \Big) > 0.

Theorem 5. Assume conditions (A1)-(A6) and (A5') hold and 1 \le R \le (M + 2)/2. Define, for c, k > 0,
D_{c,k} = \{ D \in D_2(a^*, G, T) : \|D - D_0\|_{\infty} \le c\,n^{-2/(d+3)} \text{ and } \|D'' - D''_0\|_{\infty} \le k \}.
Then for any numbers c > 0, T > 0, any point a^* \in G, any function w \in \tilde{W}, any 1 \le p \le \infty, and some k > 0, we have
\liminf_{n \to \infty} \inf_{\hat{X}^{(1)}_n, \ldots, \hat{X}^{(R)}_n \in E_n(a^*, T)} \sup_{D \in D_{c,k}} E\, w\Big( n^{2/(d+3)} \big( \|\hat{X}^{(1)}_n - x^{(1)}\|_{p,T}, \ldots, \|\hat{X}^{(R)}_n - x^{(R)}\|_{p,T} \big) \Big) > 0.

Theorems 4 and 5 establish that the ensemble of integral curve estimators \hat{X}^{(r)}_n(t), r = 1, \ldots, R, t \in [0, T], minimizes the maximum risk among all the estimators inside the respective classes under the integrated norm and the appropriate loss function w. Thus, we achieve the minimax lower bound under the integrated norm with loss function w. If we analyze Theorem 4 carefully, we see that it is an immediate corollary of Theorem 5, since the class D_{c,k} \subset D_2(a^*, G, T). In the statements of both theorems the norm of the errors is scaled by n^{2/(d+3)}, which matches the asymptotic rate of convergence of the Carmichael and Sakhanenko (2015) estimator.

Theorem 6. Assume conditions (A1)-(A8) and (A5') hold. Let \{\hat{X}^{(r)}_n : r = 1, \ldots, R\} be the integral curve estimators of the solutions of the ODEs involving the pseudo-eigenvectors v^{(r)}(t), r = 1, \ldots, R.
1. Additionally, let E\|\xi_i\|_q^q < \infty for some q \ge 4. Then for each r = 1, \ldots, R, for any number T > 0, for any point a^* \in G and any 2 \le p \le q, we have
\sup_n \sup_{D \in D_2(a^*, G, T)} E\big\| n^{2/(d+3)} \big( \hat{X}^{(r)}_n - x^{(r)} \big) \big\|_{p,T} < \infty.
2. Moreover, for each r = 1, \ldots, R, for any number T > 0 and for any point a^* \in G, we have
\sup_n \sup_{D \in D_2(a^*, G, T)} E\big\| n^{2/(d+3)} \big( \hat{X}^{(r)}_n - x^{(r)} \big) \big\|_{\infty,T} < \infty.

Theorem 6 exploits the moment conditions on the noise variables to establish that the maximum risk of the appropriately scaled integral curve estimators, under the integrated L_p-norm or the supremum norm, remains bounded by a finite constant. These three theorems together study the global bounds for the asymptotic risk of the integral curve estimator and show that it is minimax.
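As a concrete illustration of the norms and the scaling appearing in Theorems 4-6, the short sketch below evaluates the integrated L_p-norm and the supremum norm of the discrepancy between a sampled curve and a perturbed copy of it, and multiplies by n^{2/(d+3)}. The curves, dimension, noise level and sample size are placeholder assumptions; the sketch illustrates the norms only and is not the estimation method of Carmichael and Sakhanenko (2015).

```python
import numpy as np

def lp_norm(y, t, p):
    """||y||_{p,T} = ( int_0^T sum_i |y_i(t)|^p dt )^{1/p} for a curve sampled on the grid t."""
    return np.trapz(np.sum(np.abs(y) ** p, axis=1), t) ** (1.0 / p)

def sup_norm(y):
    """||y||_{inf,T} = sup_t max_k |y_k(t)|."""
    return np.max(np.abs(y))

d, T, n = 3, 1.0, 60 ** 3                   # spatial dimension, horizon, sample size (illustrative)
t = np.linspace(0.0, T, 400)

x_true = np.stack([t, np.sin(t), t ** 2], axis=1)          # hypothetical true curve x(t)
rng = np.random.default_rng(1)
x_hat = x_true + n ** (-2.0 / (d + 3)) * rng.normal(scale=0.1, size=x_true.shape)  # hypothetical estimate

scale = n ** (2.0 / (d + 3))                # the factor appearing in Theorems 4-6
print("scaled L2 discrepancy :", scale * lp_norm(x_hat - x_true, t, p=2))
print("scaled sup discrepancy:", scale * sup_norm(x_hat - x_true))
```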
3.2.1 Parametric subclass of tensors

In order to prove the minimax global rate for the asymptotic risk of the integral curve estimator, we start by proposing a construction of a parametric subclass of D_2(a^*, G, T). We start with perturbing the curves, which will perturb the resulting gradient vector field, translating the perturbation ultimately to the tensor field in D_2(a^*, G, T).

For each r = 1, \ldots, R, let x^{(r)}_0(t, a^*), t \in [0, T], denote the integral curve starting at a^*, driven by the vectors v^{(r)}_0(x^{(r)}_0(t, a^*)), where x^{(r)}_0(0, a^*) = a^*. Additionally, for small \epsilon > 0, consider A_\epsilon, an \epsilon-neighborhood of a^* (in Euclidean norm) in a (d-1)-dimensional hyperspace transversal to the flow at a^*. Suppose the volume swept by A_\epsilon under the flow v^{(r)}_0 is denoted by
G^{(r)}_{\epsilon,T} = \{ x^{(r)} = x^{(r)}_0(t, a) : t \in [0, T],\ a \in A_\epsilon \} \subset G,
for all r = 1, \ldots, R. Then G^{(r)}_{\epsilon,\tau} defines a neighborhood of the integral curve x^{(r)}_0(t, a^*), t \in [0, \tau], as we vary the initial point a. Define
x^{(r)}_b(t, a) = x^{(r)}_0\big(t + n^{-\alpha} \varphi^{(r)}_b(t)\, \psi^{(r)}(n^{\gamma}|a - a^*|),\ a\big),
where \varphi^{(r)}_b(t), t \in [0, T], is a family of twice continuously differentiable functions indexed by b \in \{0, 1\}^P. Also suppose that \varphi^{(r)}_b(t) \not\equiv 0, \varphi^{(r)}_b(0) = 0, \varphi^{(r)}_b(T) = 0, and -1 < \nabla\varphi^{(r)}_b(t) \le 1, for r = 1, \ldots, R. Additionally assume that \psi^{(r)}(z), z > 0, is a three times continuously differentiable function such that
0 < \psi^{(r)}(z) < c/\|v^{(r)}_0\|_{\infty}, \quad \nabla\psi^{(r)}(0) = \nabla^2\psi^{(r)}(0) = 0, \quad \nabla\psi^{(r)}(z) \le 0 \text{ for } z > 0, \quad \psi^{(r)}(z) = 0 \text{ for } z > \epsilon.
Note that the perturbation in the parameter t is small enough, and for all r = 1, \ldots, R the perturbation vanishes far enough from x^{(r)}_0(t, a^*), t \in [0, T]. Then the corresponding perturbation in the vector field can be found as
\frac{d}{dt} x^{(r)}_b(t, a) = v^{(r)}_b(x^{(r)}_b(t, a)) = v^{(r)}_0\big(x^{(r)}_0(t + n^{-\alpha}\varphi^{(r)}_b(t)\psi^{(r)}(n^{\gamma}|a - a^*|), a)\big) \big(1 + n^{-\alpha}\nabla\varphi^{(r)}_b(t)\psi^{(r)}(n^{\gamma}|a - a^*|)\big). \quad (3.2.1.1)
By the flow box theorem (see Lemma 3.2.120 in Chicone (1999)), for x^{(r)} \in G^{(r)}_{\epsilon,T} there are uniquely defined twice continuously differentiable functions t^{(r)}_b(x) \in [0, T] and a_b(x) \in A_\epsilon such that there is a b-perturbed integral curve starting in A_\epsilon which goes through x^{(r)} and
x^{(r)}_b(t^{(r)}_b(x^{(r)}), a_b(x^{(r)})) = x,
for r = 1, \ldots, R, since v^{(r)}_0(x) \neq 0 for all x \in G^{(r)}_{\epsilon,T}. So the expression in (3.2.1.1) can be written as
v^{(r)}_{b,n}(x) = v^{(r)}_0(x)\big(1 + n^{-\alpha}\nabla\varphi^{(r)}_b(t^{(r)}_b(x))\psi^{(r)}(n^{\gamma}|a_b(x) - a^*|)\big). \quad (3.2.1.2)
Now, for a fixed b \in \{0, 1\}^P, we could construct an order-M, rank-R tensor such that
D_b(x) = \lambda_1 v^{(1)}_{b,n}(x)^{\otimes M} + \lambda_2 v^{(2)}_{b,n}(x)^{\otimes M} + \ldots + \lambda_R v^{(R)}_{b,n}(x)^{\otimes M}. \quad (3.2.1.3)
Also define the tensor corresponding to the vectors v^{(r)}_0(x), r = 1, \ldots, R, named D_0, as
D_0(x) = \lambda_1 v^{(1)}_0(x)^{\otimes M} + \lambda_2 v^{(2)}_0(x)^{\otimes M} + \ldots + \lambda_R v^{(R)}_0(x)^{\otimes M}, \quad (3.2.1.4)
where the pseudo-eigenvalues \lambda_r and eigenvectors v^{(r)}_0, r = 1, \ldots, R, are chosen in such a way that D_0 belongs to the parametric subclass in D_2(a^*, G, T), meaning in particular that it satisfies (A1).

Below we present the first lemma, which will ensure that the construction proposed above yields a parametric family of tensors that satisfy condition (A1). Let us introduce some additional notation: for p = 1, \ldots, R,
D^{(p)}_0(x) = \sum_{r=p}^{R} \lambda_r v^{(r)}_0(x)^{\otimes M}, \qquad D^{(p)}_b(x) = \sum_{r=p}^{R} \lambda_r v^{(r)}_{b,n}(x)^{\otimes M}. \quad (3.2.1.5)
Also we assume that \|v^{(r)}_0(x)\| = 1; then, writing \|v^{(r)}_{b,n}(x)\|^2 =: c^{(r)}_{b,n}, it is easy to note that c^{(r)}_{b,n} \to 1 as n \to \infty.
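To make the perturbation construction concrete, the sketch below builds a b-perturbed curve x_b(t, a) = x_0(t + n^{-\alpha}\varphi_b(t)\psi(n^{\gamma}|a - a^*|), a) from an explicit base curve and explicit bump-type choices of \varphi_b and \psi. The base curve, \varphi_b, \psi and the exponents are illustrative assumptions that satisfy the stated conditions only qualitatively; they are not the functions used in the proofs.

```python
import numpy as np

alpha, gamma, n = 0.5, 0.25, 10_000     # illustrative exponents and sample size (assumptions)
eps, a_star, T = 0.2, 0.0, 1.0          # neighborhood radius, base starting point, time horizon

def x0(t, a):
    """Hypothetical base integral curve x_0(t, a) in R^3, indexed by the starting point a."""
    return np.stack([t, np.sin(t) + a, 0.3 * t ** 2], axis=-1)

def phi_b(t):
    """Twice differentiable perturbation with phi(0) = phi(T) = 0 and |phi'| < 1."""
    return 0.25 * np.sin(np.pi * t / T) ** 2

def psi(z):
    """Smooth, non-increasing cutoff: positive for z < eps and zero for z >= eps (illustrative)."""
    z = np.abs(z)
    denom = np.maximum(eps ** 2 - z ** 2, 1e-12)
    return np.where(z < eps, np.exp(-z ** 2 / denom), 0.0)

def x_b(t, a):
    """Perturbed curve x_b(t, a) = x_0(t + n^{-alpha} phi_b(t) psi(n^gamma |a - a*|), a)."""
    shift = n ** (-alpha) * phi_b(t) * psi(n ** gamma * np.abs(a - a_star))
    return x0(t + shift, a)

t = np.linspace(0.0, T, 200)
for a in (a_star, a_star + 2 * eps):    # a seed on the base curve and one outside the shrinking neighborhood
    gap = np.max(np.linalg.norm(x_b(t, a) - x0(t, a), axis=-1))
    print(f"a = {a:+.2f}: max |x_b - x_0| = {gap:.2e}")
```

The output illustrates the two qualitative features used later: the perturbation is of order n^{-\alpha} near the base curve and vanishes identically once the starting point is far from a^*.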
Lemma 10. For p = 1, \ldots, R and for all x \in G^{(r)}_{\epsilon,T}, r = 1, 2, the tensors D^{(p)}_b satisfy
Ker\big(T(v^{(p)}_{b,n}(x), D^{(p)}_b(x)) - \lambda_p I\big) = 0.

We provide the proof of this lemma, along with the proofs of the theorems, in section 5 of this chapter.

3.3 Simulation study and Data analysis

3.3.1 Simulation Study

Following the setup of Sakhanenko and DeLaura (2017), we consider a simulation study based on the "Y" pattern that often presents itself when neural fibers branch. Based on the signal-to-noise ratio (SNR) and the thickness of the fibers, we provide a 3D plot (see Figure 3.1) of an estimated integral curve along with 95% pointwise confidence ellipsoids computed using the method proposed in Carmichael and Sakhanenko (2015).

Figure 3.1: We trace the integral curve using the Carmichael and Sakhanenko (2015) method, creating the "Y" pattern. (a) 3-dimensional trace of the "Y" pattern. (b) 3-dimensional trace of the "Y" pattern with 95% confidence ellipsoids. Here the sample size is n = 60^3, the signal-to-noise ratio is SNR = 2, the thickness of the fiber is \epsilon = 0.04, the step size is \delta = 2, and the constant \beta = 10^{-7}.

We simulate the "Y" pattern using several sample sizes from 30^3 to 100^3. For each curve we computed the estimated constant in the lower bound in Theorem 2, which is
\kappa = w\big( n^{1/3} ( \|\hat{X}^{(1)}_n - x^{(1)}\|_{2,T},\ \|\hat{X}^{(2)}_n - x^{(2)}\|_{2,T} ) \big).
As our method is globally minimax with respect to the asymptotic risk of the estimators, we can compare these \kappa values across different scenarios to choose the best one. Below we provide the 25th percentile, the median and the 75th percentile of the \kappa values simulated 100 times for each of the eight different sample sizes.

Figure 3.2: The red, blue and green lines show the 25th, 50th and 75th percentiles of the \kappa values across all the sample sizes n = n_0^3, n_0 = 30, \ldots, 100, increasing n_0 by 10 at each step, repeated 100 times. Here we used step size \delta = 0.02 and \beta = 10^{-7} for each of the iterations.

As we can see from Figure 3.2, the median of the \kappa values tends to stabilize when we use n_0 = 60. Therefore, next we investigate the robustness of the \kappa values when we vary the signal-to-noise ratio (SNR) and the thickness of the fibers with the sample size n = 60^3. In Table 3.1 we provide the results.
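Before turning to Table 3.1, here is a minimal sketch of how the quartiles of \kappa can be computed from repeated simulations. The two "branches", the noise level and the weight function are placeholder assumptions (we take w(u_1, u_2) = u_1^2 + u_2^2, one of the admissible losses in \tilde{W}, and n^{1/3} = n^{2/(d+3)} for d = 3); the sketch is not the "Y"-pattern tracer itself.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 1.0
t = np.linspace(0.0, T, 200)

# Two hypothetical true branches of a "Y"-like pattern (placeholders, not the simulated HARDI field).
x1 = np.stack([t, t, 0.2 * t], axis=1)
x2 = np.stack([t, -t, 0.2 * t], axis=1)

def l2_norm(y, t):
    """Integrated L2-norm ||y||_{2,T}."""
    return np.sqrt(np.trapz(np.sum(y ** 2, axis=1), t))

def kappa(hat1, hat2, n, w=lambda u1, u2: u1 ** 2 + u2 ** 2):
    """kappa = w( n^{1/3} ||X1_hat - x1||_{2,T}, n^{1/3} ||X2_hat - x2||_{2,T} )."""
    r = n ** (1.0 / 3.0)
    return w(r * l2_norm(hat1 - x1, t), r * l2_norm(hat2 - x2, t))

n0 = 60
n = n0 ** 3
vals = []
for _ in range(100):                                   # 100 Monte Carlo repetitions
    noise = lambda: rng.normal(scale=0.05 * n ** (-1.0 / 3.0), size=x1.shape)
    vals.append(kappa(x1 + noise(), x2 + noise(), n))

q25, q50, q75 = np.percentile(vals, [25, 50, 75])
print(f"n0 = {n0}: kappa quartiles = {q25:.3g}, {q50:.3g}, {q75:.3g}")
```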
Table 3.1: The 25th percentile, median and 75th percentile of the \kappa values for different combinations of SNR and fiber thickness; the sample size is n = 60^3.

                              Fiber thickness
SNR   Quartile   0.02             0.04              0.06              0.08              0.1
2     25th       2.14 x 10^-4     9.82 x 10^-5      7.27 x 10^-5      7.6 x 10^-5       8.19 x 10^-5
2     median     3.25 x 10^-3     2.88 x 10^-4      2.05 x 10^-4      1.95 x 10^-4      2.6 x 10^-4
2     75th       2.63 x 10^-2     2.39 x 10^-3      1.22 x 10^-3      8.3 x 10^-4       1.13 x 10^-3
4     25th       2.04 x 10^-7     1.14 x 10^-7      1.05 x 10^-7      1.07 x 10^-7      9.4 x 10^-8
4     median     1.17 x 10^-5     2.39 x 10^-7      2.34 x 10^-7      2.55 x 10^-7      2.35 x 10^-7
4     75th       7.38 x 10^-4     1.49 x 10^-6      5.74 x 10^-7      8.09 x 10^-7      6.1 x 10^-7
6     25th       1.78 x 10^-8     3.49 x 10^-9      2.42 x 10^-9      3.05 x 10^-9      2.99 x 10^-9
6     median     4.16 x 10^-7     8.82 x 10^-9      4.86 x 10^-9      6.3 x 10^-9       5.55 x 10^-9
6     75th       5.3 x 10^-5      2.93 x 10^-8      1.85 x 10^-8      2.18 x 10^-8      1.35 x 10^-8
8     25th       5.25 x 10^-10    1.86 x 10^-10     2.17 x 10^-10     2.11 x 10^-10     2.03 x 10^-10
8     median     4.31 x 10^-8     4.02 x 10^-10     3.7 x 10^-10      4.02 x 10^-10     3.93 x 10^-10
8     75th       1.87 x 10^-6     1.13 x 10^-9      1.53 x 10^-9      8.12 x 10^-10     7.28 x 10^-10
10    25th       1.05 x 10^-10    2.58 x 10^-11     3 x 10^-11        2.67 x 10^-11     2.78 x 10^-11
10    median     6.1 x 10^-9      5.21 x 10^-11     5.97 x 10^-11     5.28 x 10^-11     5.05 x 10^-11
10    75th       9.07 x 10^-8     1.07 x 10^-10     1.34 x 10^-10     1.02 x 10^-10     1.53 x 10^-10

From Table 3.1 it is evident that if we increase the SNR, the \kappa value decreases, indicating that as the signal gets stronger, the traced curve estimates the true curve more accurately, and hence the asymptotic risk decreases. However, the thickness of the curves does not affect the value of \kappa in a significant way. Overall, the smallest thickness seems to have the worst performance, while the best value of \kappa is obtained more often when the thickness is close to 0.06, indicating that the uncertainty in estimation is higher when the thickness of the curves is extremely high or low.

Figure 3.3: The computation times in seconds for the simulation of \kappa values for different choices of SNR and thickness values.

From Figure 3.3 we can see that the optimal computation time is achieved when SNR = 4 and the thickness of the fiber is 0.02. In practice, SNR values normally range from 3 to 5 for HARDI data, which suggests that our method could also achieve optimality with respect to computation time on real-life HARDI data.

3.3.2 Neuroimaging example

Several DWI datasets were collected from a 28-year-old healthy male brain on a GE 3T Signa HDx MR scanner (GE Healthcare, Waukesha, WI) with an 8-channel head coil. The subject signed the consent form approved by the Michigan State University Institutional Review Board. DWI images were acquired with a spin-echo echo-planar imaging (EPI) sequence for several minutes per session, using the protocols with the parameters summarized in Table 3.2. All protocols had 48 contiguous 2.4-mm axial slices in an interleaved order and matrix size = 128 x 128, TE = 72.3 ms, TR = 7.5 s, b = 1000 s/mm^2, FOV = 22 cm x 22 cm, and parallel imaging acceleration factor = 2.

Table 3.2: Protocols for HARDI. Note that the second protocol has a total of 6 b0 images after 2 repetitions of 3 b0 images.

Protocol                 scan time    slice size   NEX   # of slices   TR       # of b0 images
30 directions            6.5 mins     2.4 mm       1     48            11.5 s   3
30 directions, 2 reps    12.9 mins    2.4 mm       2     48            11.5 s   3
60 directions            12.9 mins    2.4 mm       1     48            11.5 s   6
90 directions            19.2 mins    2.4 mm       1     48            11.5 s   9
150 directions           20 mins      2.4 mm       1     32            7.5 s    9

We traced axonal fibers in the anterior part of the corpus callosum, which connect the right and left frontal lobes. The general anatomical locations of these axonal fibers are well established. These fibers can be used to evaluate new techniques in fiber tractography.
Several initial points were chosen in the region of interest (ROI) based on anatomical considerations. Under each protocol, starting from each seed point, we used the estimation technique in Carmichael and Sakhanenko (2015) to trace a fiber until it ran into water. Figure 3.4 provides the reference images for the protocol with 60 directions. For each curve we computed the estimated constant in the lower bound in Theorem 2, which is κ = w(n^{1/3} ‖X̂_n − x‖_{2,T}). We used the weight function w(u) = u^2/T. Since the estimation method has the rate optimality property (with respect to n), the only thing left to optimize is the constant. It can be used to compare different rate-optimal estimators; in this study the estimators differ according to the underlying protocols used to obtain the data.

Figure 3.4: A neuronal fiber bundle across the genu of the corpus callosum, created with the Carmichael and Sakhanenko (2015) method. (a) Single seed point: one particular seed point was used, and the confidence ellipsoids along the fiber have 95% confidence level. (b) Multiple seed points: several seed points were used to create several estimated fibers; two different branches are shown in two colors.

A smaller constant κ indicates a more successful estimator. Table 3.3 summarizes our findings.

Table 3.3: Comparison of the constant κ for tracing of anterior fibers based on imaging datasets obtained via different scanning protocols. Here δ = 0.003, β = 10^{-7}.

Protocol               Number of voxels   ROI size   25th     Median   75th
30 directions          128 × 128 × 48     119        0.1156   0.3630   1.6954
30 directions, 2 reps  128 × 128 × 48     122        0.0915   0.2851   0.8066
60 directions          128 × 128 × 48     120        0.0417   0.1211   0.3596
90 directions          128 × 128 × 48     119        0.0481   0.1955   1.1941
150 directions         128 × 128 × 32     171        0.1869   0.5762   1.4582

The protocol with 60 directions has the lowest κ, while the protocol with 90 directions has the second smallest κ. For each protocol the distribution of the constant values with respect to the initial point is skewed right, which is expected since the uncertainty increases along the fibers. It is quite interesting that the design with 60 different directions gives a lower κ than the block design in which 30 directions are independently repeated twice. Usually, block designs yield smaller variances for the estimated fibers locally, so one might argue that block designs have an advantage locally, see Sakhanenko (2013). However, here the block design performs worse in a global sense. We speculate that this is due to the noise behaviour along the whole fiber.

3.4 Remarks and Conclusion

In summary, in this work we have proved the minimax optimality of the asymptotic risk of the nonparametric integral curve estimators described in Carmichael and Sakhanenko (2015) over the whole domain of the imaging field G. Therefore, we have established the global minimax optimality of the estimation method. Although the asymptotic rates that we proposed are minimax optimal in the global sense, one can further optimize the constant κ involved in the risk function to sharpen the results. In our data analysis we have provided a comparative study of different imaging protocols with respect to this constant, and we have found the protocol that provides the optimal value of the global asymptotic risk. The analysis was performed on a single subject (human brain); we have similar results for another subject.
This can be further studied with more different subjects to understand if the optimal protocol for scanning procedure remains the same across subjects. However, it is beyond the scope of this work and could be explored as a nice applied direction for future research in this topic. On the theoretical side, one can explore the constant in lower bounds and obtain optimal constants to refine theoretical results further. 82 3.5 Proofs 3.5.1 Proof of Lemma 10 Proof. In order to prove Lemma 10, let us first assume n is sufficiently large so that all the terms with o(n−α) can be ignored and all assertions hold up to terms of order o(n−α). Then note that for r = R T (v =T (v (R) (R) b,n (x), D b (x))km (R) 0 (R) (x), D 0 (x))km d(cid:88) d(cid:88) i3=1 iM =1 =λR(M − 1) . . . v (R) 0k (x) v (R) 0m (x) (v (R) 0i3 (x))2 . . . (v (R) 0iM (x))2 − λRδkm =λR[(M − 1)v0kv0m − δkm]. To show Ker(T (v (R) 0 (R) (x), D 0 (x)) − λRI) = 0, we consider any arbitrary u ∈ Rd such that (cid:104)λR[(M − 1)v (R) 0k v (R) 0m − δkm], u(cid:105) = 0 (M − 1)v (R) 0k v (R) 0m um = d(cid:88) m=1 δkmum d(cid:88) d(cid:88) k=1 (R) v 0k uk (M − 1)(v (R) 0k )2v (R) 0m um = (R) v 0k uk =⇒ d(cid:88) =⇒ d(cid:88) m=1 d(cid:88) k=1 m=1 =⇒ (M − 1) =⇒ d(cid:88) d(cid:88) v (R) 0m um = m=1 k=1 v (R) 0m um = 0. m=1 83 Therefore, it implies that, if uk = 0, k = 1, . . . , d, then Ker(T (v (R) 0 (R) (x), D 0 (x))− λRI) = 0. Similarly, (T (v (p) 0 (x), D =(M − 1) R(cid:88) (p) 0 (x)) − λpI)km d(cid:88) d(cid:88) λr . . . iM =1 r=p i3=1 − δkmλp (cid:104) = (M − 1)λpv (p) 0k (x)v (p) 0m(x) − δkmλp (R) 0k (x)v v (R) 0m (x) v (R) 0i3 (x) . . . v (R) 0iM (x) v (p) 0i3 (x) . . . v (p) 0iM (x) (cid:105) + (M − 1) R(cid:88) r=p+1 λrv (r) 0k (x)v (r) 0m(x)qM−2 r,p , (3.5.1.1) where qr,p = (cid:104)v Ker(T (v (p) 0 (x), D (p) (r) 0 , v (p) 0 (cid:105), p (cid:54)= r = 1, . . . , R. Now consider a generic u ∈ Rd, which belongs to 0 (x)) − λpI) then d(cid:88) m=1 (M − 1)λpv (p) 0k (x)v d(cid:88) R(cid:88) m=1 r=p+1 (p) 0m(x)um + (M − 1) d(cid:88) (p) 0k (x) we get v Then taking a sum over d(cid:88) d(cid:88) m=1 k=1 (p) 0m(x)um + (M − 1) v (p) 0m(x)um + (M − 1) v R(cid:88) d(cid:88) R(cid:88) m=1 r=p+1 λrqM−1 r,p λrv r,p = λp (r) 0m(x)umqM−1 d(cid:88) (r) 0m(x)um = 0. v (M − 1)λp (M − 2)λp λrv (r) 0k (x)v (r) 0m(x)umqM−2 r,p = λpuk. (3.5.1.2) d(cid:88) k=1 (p) 0k (x)uk, v m=1 r=p+1 m=1 84 d(cid:80) d(cid:88) k=1 m=1 (M − 1)λpql,p Similarly, taking (l) 0k (x) on (3.5.1.2), we get for l = p + 1, . . . , R, v (p) 0m(x)um + (M − 1) v R(cid:88) r=p+1 d(cid:88) m=1 (r) 0m(x)um v (3.5.1.3) r,p λrql,rqM−1 d(cid:88) = λp (l) 0m(x)um. v d(cid:80) m=1 Suppose m=1 (l) 0m(x)um = Al, then together with (3.5.1.2) and (3.5.1.3) we get a linear v system of equations: λp(M − 2)Ap + (M − 1) R(cid:88) λrqM−1 R(cid:88) r=p+1 r,p Ar = 0, λp(M − 1)ql,pAp + (M − 1) λrql,rqM−2 r,p Ar − λpAl = 0, (3.5.1.4) l = p + 1, . . . , R. r=p+1 We can write these R − p + 1 equations in R − p + 1 variables Al, l = p + 1, . . . , R given by (3.5.1.4), in matrix notations: QR,pA = 0, where A = (Ap, Ap+1, . . . , AR) and  QR,p =  . (M − 2)λp (M − 1)λpqp+1,p (M − 1)λp+1qM−1 p+1,p (M − 1)λp+1qM−2 p+1,p − λp . . . . . . (M − 1)λRqM−1 R,p (M − 1)λRqp+1,RqM−1 R,p ... (M − 1)λpqR,p . . . . . . (M − 1)λRqM−2 R,p − λp 85 Now suppose Det(QR,p) (cid:54)= 0, then d(cid:88) Al = v (l) 0mum = 0, l = p, p + 1, . . . , R, (3.5.1.5) or in other words m=1  (p) v 01 . . . v (p) 0d ... (R) v 01 . . . v (R) 0d  = 0.   u1 ... 
ud Now putting equation (3.5.1.5) back in (3.5.1.3), we get uk = 0, k = 1, . . . , d. Next we need to find under what conditions the (R− p + 1)× (R− p + 1) matrix QR,p is of the full rank for p = 1, . . . , R. Now in order to ensure Det(QR,p) (cid:54)= 0, p = 1, . . . , R, consider the following cases: Case I (k) If R ≤ d, then after choosing R − p pseudo-eigenvectors we can choose the rest p of pseudo- 0 , k = R − p + 1, . . . , R, that are mutually orthogonal, and as a result we eigenvectors v will have qk,m = 0, k, m = R − p + 1, . . . , R. Therefore, as long as λp (cid:54)= 0, we will have Det(QR,p) (cid:54)= 0. Case II If R > d then first let us consider R = d + 1. In that case, we can choose the first d pseudo- (k) eigenvectors v 0 Det(QR,p) (cid:54)= 0, p = 1, . . . , R. to be orthogonal, hence qi,j = δij, i, j = 1, . . . , d. Now we need to check 1. When p = R, we have dim(QR,p) = 1 × 1 and Det(QR,p) (cid:54)= 0 =⇒ λR(M − 2) (cid:54)= 0. 86 2. When p = R − 1, we have dim(QR,p) = 2 × 2 and Det(QR,p) (cid:54)= 0   =⇒ Det (M − 2)λR−1 (M − 1)λRqM−1 R,R−1 (M − 1)λR−1qR,R−1 (M − 1)λRqM−2 R,R−1 − (M − 2) − (M − 1)2qM R,R−1 − λR R,R−1 (cid:54)= 0. =⇒ (M − 1)(M − 2)qM−2   (cid:54)= 0 (3.5.1.6) Now (3.5.1.6) is a polynomial in qR,R−1, that can yield at most M roots, provided λR−1 (cid:54)= 0. Hence, we can choose v such that qR,R−1 satisfies (3.5.1.6). (R) 0 3. When p = R−2, then we have dim(QR,p) = 3×3. Also we use the fact that qR−1,R−2 = 0, as first d pseudo-eigenvectors are orthogonal. Here in this case, the QR,p matrix is given by  (M − 2)λR−2 (M − 1)λR−1qR−1,R−2 (M − 1)λR−2qR−1,R−2 (M − 1)λR−2qR,R−2 (M − 1)λR−1qM−2 (M − 1)λR−1qR−1,RqM−2 R−1,R−2 − λR−1 R−1,R−2  , (M − 1)λR−1qM−1 R,R−2 (M − 1)λRqR−1,RqM−2 R,R−2 (M − 1)λRqM−2 R,R−2 − λR and therefore, Det(QR,p) (cid:54)= 0   =⇒ Det (M − 2)λR−2 0 (M − 1)λR−2qR,R−2 0 −λR−1 0 =⇒ − ((M − 1)(M − 2)qM−2 R,R−2 − (M − 2) − (M − 1)2qM   (cid:54)= 0 (M − 1)λR−1qM−1 R,R−2 (M − 1)λRqR−1,RqM−2 R,R−2 (M − 1)λRqM−2 R,R−2 − λR R,R−2) (cid:54)= 0. 87 Note that the last assertion is similar to (3.5.1.6) provided λR, λR−1, λR−2 (cid:54)= 0. 4. Similarly, for p = R − 3, we have dim(QR,p) = 4 × 4 and here we use the fact that qR−2,R−3, qR−1,R−3 = 0 using similar arguments. Hence, Det(QR,p) (cid:54)= 0 (M − 2)λR−2 =⇒ Det 0 0   0 −λR−2 0 0 0 −λR−1 0 (M − 1)λRqM−1 R,R−3 (M − 1)λRqR−2,RqM−2 R,R−3 (M − 1)λRqR−1,RqM−2 R,R−3 (M − 1)λRqM−2 R,R−3 − λR R,R−3) (cid:54)= 0. 0 =⇒ (−1)2((M − 1)(M − 2)qM−2 0 R,R−3 − (M − 2) − (M − 1)2qM   (cid:54)= 0 (R) In this way we can continue and at each step we need to choose v 0 , in such a way that, (−1)(R−p−1)((M − 1)(M − 2)qM−2 R,p − (M − 2) − (M − 1)2qM R,p) (cid:54)= 0, for p = 1, . . . , R − 1. Finally for any R = d + k, where k > 1, by induction we will get a polynomial expression in qR,R−k+1 qR,R−k qR,R−k−1 . . . qR−1,R−k+1 . . . ... qR,p qR,p , qR−k+1,R−k . . . qR−k+1,p similar to (3.5.1.6), where each of the qR1,R2 fixed integer degree. Therefore, we can choose λ1, . . . , λR (cid:54)= 0 and {v ∈ [−1, 1] and the polynomial will be of a 0 } such that (1) 0 , . . . , v (R) 88 Det(QR,p) (cid:54)= 0, p = 1, . . . , R. Next, we prove Theorem 1 and 2 using the construction that we already mentioned using some intermediate results. It is easy note that as Dc,k ⊂ D2(a∗, G, T ), therefore Theorem 4 is an immediate corollary of Theorem 5. 3.5.2 Proof of Theorem 5 Proof. 
The proof of Theorem 5 will be based on the following Lemmas and are similar to the construction given in Ibragimov and Has’minskii (2013) and (1990). We additionally provide a result that extends Fano’s Lemma in the multidimensional parameter space. We use that result in Lemma 12 tailored to fit our problem and similar to Theorem 5.7 in Devroye (1987). Lemma 11 provides a bound for the function g defined in the earlier section. In Lemma 13 we prove the smoothness condition on the perturbed class of tensor fields Dc,k. Finally, Lemma 14 and 15 provide a construction for the tensor fields and integral curves which we will make use in the proof of Theorem 5. Lemma 11. Let f satisfy condition (A5(cid:48)). Then there exists such δ1 > 0, that for any vector u, satisfying |u| ≤ δ1 we have, g(u) ≤ C(f, δ1)|u|2, where C(f, δ1) is a positive constant. The proof of the Lemma 11 will follow along the same lines of the proof of Lemma 1 from Sakhanenko (2011). Below we consider Proposition 1 in which the Fano’s Lemma is described for multidimensional parameters. 89 Proposition 1. Let X be a random variable whose density depends on parameter from a mul- tidimensional parameter space. Suppose, the densities are indexed by i = (i1, . . . , iR), fi are such that KL divergence between them K(fi, fj) ≤ β and 1 ≤ ir ≤ lr + 1, r = 1, . . . , R. Fur- thermore, for the parameters (θ(1), . . . , θ(R)), for each r let the estimate of θ(r) be Ψ(r)(X), which takes value in {1, . . . , lr + 1}, L = (l1 + 1) . . . (lR + 1). Then, Pi (Ψ(X) (cid:54)= i) ≥ 1 − β + ln 2 ln(L − 1) , sup i where Pi1,...,iR is the probability induced by fi. Proof. Let θ = (θ(1), . . . , θ(R)), be a random vector such that, P(θ(1) = i1, . . . , θ(R) = iR) = 1 L , 1 ≤ ir ≤ lr + 1, r = 1, . . . , R. For ease of notation assume Ψ(X) = (Ψ(1)(X), . . . , Ψ(R)(X)). Then, (cid:88) i1,...,iR P(θ(1) = i1, . . . , θ(R) = iR|X) ln P(θ(1) = i1, . . . , θ(R) = iR|X) (cid:19) (cid:18) P(θ = i|X) =P(θ = Ψ(X)|X) ln(P (θ = Ψ(X)|X)) + P(θ (cid:54)= Ψ(X)|X)log(P (θ (cid:54)= Ψ(X)|X)) P(θ = i|X) (cid:88) i(cid:54)=Ψ(X) + P(θ (cid:54)= Ψ(X)|X) P(θ (cid:54)= Ψ(X)|X) ln P(θ (cid:54)= Ψ(X)|X) ≥ − ln 2 − P(θ (cid:54)= Ψ(X)|X) ln(L − 1), where we applied Lemma (5.1) from the Devroye (1987) twice. The quantity on the left- 90 hand side of this chain of inequalities will now be bounded from above. Indeed, P(θ = i|X) = P(θ(1) = i1, . . . , θ(R) = iR|X) (cid:80) = fi1,...,iR (X) fj1,...,jR . (X) j1,...,jR Thus, P(θ = i|X) ln(P(θ = i|X))) fj(x)dx  1 L  ≥ 1 L fj(x) (cid:88) j ln fj(x) (cid:88) j E( i i (cid:88) (cid:90) (cid:88) (cid:90) (cid:88) (cid:90) (cid:88) (cid:88) 1 L2 1 L i,j i 1 L2 i,j = = = =   fi(x)(cid:80)  1 (cid:88)  fi(x)dx, since, ln fj(x) L j j j ln fj(x) fi(x)(cid:80)  fi(x)(cid:80) (cid:33) (cid:32) j ln fj(x) ln fi fj fidx − lnL K(fi, fj) − lnL ≤β − lnL. We conclude that Hence, β − lnL ≥ − ln 2 − P(θ (cid:54)= Ψ(X)) ln(L − 1). Pi (Ψ(X) (cid:54)= i) ≥ P(Ψ(X) (cid:54)= θ) ≥ lnL − β − ln 2 ln(L − 1) . sup i 91 Finally, Pi (Ψ(X) (cid:54)= i) ≥ 1 − β + ln 2 ln(L − 1) . sup i Next, we describe Proposition 2 to establish a condition for providing a bound for the KL- |v(x)|2dx divergence of the densities in our parametric subclass. Now let us define (cid:107)v(cid:107)2 G =(cid:82) G for a vector field v. Proposition 2. Let Pi, fi be respectively the probability measure and density induced by , i = (i1, . . . , iR). Then for any two indices i (cid:54)= j such that (cid:16) (cid:17) X, BDi(X) + Σ1/2(X)ξ 1 ≤ ir, jr ≤ lr + 1, r = 1, . 
. . , R, (cid:107)Di − Dj(cid:107)2 G ≤ β =⇒ K(fi, fj) ≤ C(f, δ1)CΣ,Bβ, where C(f, δ1) is a constant introduced in Lemma 11 and CΣ,B is a constant depending on Σ and B only. 92 f (Σ−1/2(x)(y − BDi(x))) f (Σ−1/2(x)(y − BDj(x))) f (˜y − Σ−1/2(x)BDi(x)) f (˜y − Σ−1/2(x)BDj(x)) ln ln f (Σ−1/2(x)(y − BDi(x)))dxdy f (˜y − Σ−1/2(x)BDi(x))Det(Σ1/2(x))dxd˜y. Proof. K(fi, fj) (cid:90) (cid:90) G (cid:90) (cid:90) RN = = G = − = − = − RN Using the change of variable ˜y = Σ−1/2y on y only: (cid:90) (cid:90) (cid:90) (cid:90) (cid:17) z + Σ−1/2(x)B(Di(x) − Dj(x)) f (˜y − Σ−1/2(x)BDj(x)) (cid:16) f (˜y − Σ−1/2(x)BDi(x)) RN ln G f ln f (z) f (˜y − Σ−1/2(x)BDi(x))Det(Σ1/2(x))dxd˜y f (z)Det(Σ1/2(x))dxd˜y, G RN using the change of variable z = ˜y − Σ−1/2(x)BDi(x), (cid:90) (cid:90) z + Σ−1/2(x)B(Di(x) − Dj(x)) (cid:17) − f (z) (cid:16) f 1 + ln f (z)  f (z)Det(Σ1/2(x))dxdz. G RN Finally applying Lemma 11 we obtain (cid:90) (cid:90) G K(fi, fj) ≤ C(f, δ1) ≤ C(f, δ1) |Σ−1/2(x)B(Di(x) − Dj(x))|2Det(Σ1/2(x))dx (cid:107)Σ−1/2(x)(cid:107)2 F(cid:107)B(cid:107)2 F(cid:107)Di(x) − Dj(x)(cid:107)2 F Det(Σ1/2(x))dx G ≤ C(f, δ1)CΣ,B(cid:107)Dθ − D˜θ(cid:107)2 ≤ C(f, δ1)CΣ,Bβ. G 93 Lemma 12. Let Dl be a class of L tensor fields Dθ, θ = (θ(1), . . . , θ(R)) inside the parametric subclass of Dc,k, such that for any θ (cid:54)= ˜θ and positive constants β, δ (r) 2 , r = 1, . . . , R, (cid:107)Dθ − D ˜θ(cid:107)2 G ≤ δ1,(cid:107)Dθ − D ˜θ(cid:107)2 G ≤ β and (cid:107)x (r) θ − x (r) ˜θ (cid:107)1,T ≥ 2δ (r) 2 , θ ∈ Rd denotes the integral curve corresponding to v where for each r = 1, . . . , R and θ, x starting at a∗. In addition, let f satisfy condition (A5(cid:48)). Then for any w ∈ ˜W we have, (r) (r) θ Ew((cid:107) ˆX (1) n − x(1)(cid:107)1,T , . . . ,(cid:107) ˆX sup D∈DL (R) n − x(R)(cid:107)1,T ) (cid:18) ≥ |x(r)|≥δ inf (r) 2 : r=1,...,R w(|x(1)|, . . . ,|x(R)|) 1 − nCΣ,BC(f, δ1)β + ln 2 ln(L − 1) (cid:19) . Proof. The proof of this Lemma will follow the structure of the modified version of Fano’s Lemma that we described in Proposition 1. Let Θ be a uniform random vector such that 1 (cid:16) (cid:17) P(θ(1) = i1, . . . , θ(R) = iR) = (l1 + 1) . . . (lR + 1) , 1 ≤ ir ≤ lr + 1, r = 1, . . . , R. Let Pi be the probability measure induced by X, BDi(X) + Σ1/2(X)ξ , i = (i1, . . . , iR). For each r = 1, . . . , R, define Ψ(X1, . . . , Xn, ξ1, . . . , ξn) = (Ψ(1)(X), . . . , Ψ(R)(X)), where Ψ(R)(X) = θ(r) if (cid:107) ˆX n − x(r)(cid:107)1,T ≥ δ (r) (r) 2 . 94 Then by Proposition 1 (modified version of Fano’s Lemma) we have (cid:16)(cid:107) ˆX Ew (1) n − x(1)(cid:107)1,T , . . . ,(cid:107) ˆX (R) n − x(R)(cid:107)1,T (1) n − x(1)(cid:107)1,T , . . . ,(cid:107) ˆX (R) n − x(R)(cid:107)1,T (cid:17) (cid:17) sup D∈D2(a∗,G,T ) (cid:16)(cid:107) ˆX ≥ sup D∈Dl Ew ≥ ≥ ≥ |x(r)|≥δ inf (r) 2 : r=1,...,R |x(r)|≥δ inf (r) 2 : r=1,...,R |x(r)|≥δ inf (r) 2 : r=1,...,R w(|x(1)|, . . . ,|x(R)|) sup D∈Dl w(|x(1)|, . . . ,|x(R)|) max i (cid:18) P((cid:107) ˆX (r) n − x (r) θ (cid:107)1,T ≥ δ (r) 2 ; r = 1, . . . , R) Pi(ψ(r)(X) (cid:54)= θ(r) r = 1, . . . , R) w(|x(1)|, . . . ,|x(R)|) 1 − nI((X1, ξ1), θ) + ln 2 ln(L − 1) (cid:19) , where I ((X1, ξ1), θ) is the Shannon’s information. Notice that the Shannon’s information as mentioned above can be bounded from above as in Has’minskii and Ibragimov (1990): f (Σ−1/2(x)(y − BDj(x))) I ((X1, ξ1), Θ) =L−1 =L−1 l1+1(cid:88) i1=1 l1+1(cid:88) i1=1 (cid:90) lR+1(cid:88) (cid:90) (cid:32) ln . . . 
G RN iR=1 L−1 (cid:80) f (Σ−1/2(x)(y − BDi(x))) (cid:33) f (Σ−1/2(x)(y − BDi(x))) lR+1(cid:88) L−1(cid:80) f (˜y − Σ−1/2(x)BDi(x)) f (˜y − Σ−1/2(x)BDj(x)) j1,...,jR (cid:32) RN iR=1 (cid:33) f (˜y − Σ−1/2(x)BDi(x))Det(Σ1/2(x)) dxd˜y. (cid:90) dxdy (cid:90) G . . . ln j Using the change of variable ˜y = Σ−1/2y on y only and using the concavity of log we can 95 further bound I ((X1, ξ1), Θ) from above by l1+1(cid:88) i1=1 L−1 (cid:90) lR+1(cid:88) (cid:90) (cid:32) ln . . . f (˜y − Σ−1/2(x)BDi(x)) (cid:33) f (˜y − Σ−1/2(x)BD(x)) f (˜y − Σ−1/2(x)BDi(x))Det(Σ1/2(x)) iR=1 RN G dxd˜y. Now we can bound the integral above similarly to the proof of Proposition 2. Hence, we get I ((X1, ξ1), Θ) i1=1 l1+1(cid:88) l1+1(cid:88) l1+1(cid:88) i1=1 . . . . . . . . . iR=1 lR+1(cid:88) lR+1(cid:88) lR+1(cid:88) iR=1 (cid:90) (cid:90) G G C(f, δ1) C(f, δ1) |Σ−1/2(x)BDi(x) − Σ−1/2(x)BD(x)|2Det(Σ1/2(x))dx (cid:107)Σ−1/2(x)(cid:107)2 F(cid:107)B(cid:107)2 F(cid:107)Di(x) − D(x)(cid:107)2 F Det(Σ1/2(x))dx C(f, δ1)CΣ,B(cid:107)Dθ − D˜θ(cid:107)2 G ≤ L−1 ≤ L−1 ≤ L−1 i1=1 iR=1 ≤ C(f, δ1)CΣ,Bβ. This argument along with Propositions 1 and 2, completes the proof of Lemma 12. Next, we define a P/4-separated net in L1-norm B by using b ∈ {0, 1}P similar to Sakhanenko (2013). The cardinality of such set B is larger than exp(P/2), see Ibragimov and Has’minskii (1980, 1981). Recall (3.2.1.2): (r) b,n(x) = v v (r) 0 (x)(1 + n−αϕ b (cid:48)(r) (r) b (x))ψ(r)(nγ|ab(x) − a∗|)). (t Note that by construction, the starting point ab(x) ∈ Aε does not depend on b, therefore 96 ab(x) = a0(x), for all b ∈ {0, 1}P . Moreover, (r) 0 (x) = t t (r) b (x) + n−αϕ (r) b (t (r) b (x))ψ(r)(nγ|a0(x) − a∗|). (3.5.2.1) Following the construction of Sakhanenko (2011), suppose, for each r = 1, . . . , R, we can construct ϕ(r)(t) : R (cid:55)→ [−1, 1] be a twice continuously differentiable function with support [0, 1] such that and ϕ(r)(0) = ϕ(r)(1) = 0,−1 < ϕ (cid:48)(r)(t) ≤ 1, P(cid:88) i=1 ϕ (r) b (t) = bihϕ(r)((t − (i − 1)h)/h), (r) where P = P1nδ, h = T /P. Moreover, suppose the components of ϕ are supported on b equi-length subintervals [(i − 1)h, ih], i = 1, . . . , P. It is very important for us now to show that the tensor subclass is indeed in D2(a∗, G, T ), in particular its members are twice dif- ferentiable. The smoothness condition is a requirement as stated in (A1), which is pivotal to our estimation process of the tensor model of the fiber tracts. Below we would intro- duce Lemma 13 which will establish that the tensors in our parametric subclass are twice continuously differentiable. Lemma 13. For all b ∈ B, we have Db ∈ Dc,k. Proof. Following the arguments in Lemma 3 of Sakhanenko (2011) we note that the quantity (r) t b , r = 1, . . . , R, is twice continuously differentiable. Therefore, the implicit differentiation (r) of the equation (3.5.2.1) would imply the gradient and the Hessian of t b , r = 1, . . . , R, are 97 bounded for all sufficiently large n. In other words, for each r = 1, . . . , R, (cid:48)(r) (cid:107)t b (x)(cid:107)2 ≤ |t 0 (cid:48)(r) (x)| + ε 1 − ε (r) 1 (r) 1 and (cid:107)t (cid:48)(cid:48)(r) b (x)(cid:107)F ≤ ε (r) 2 for some positive constants ε (r) 1 , ε (r) 2 . Also, (cid:48)(r) ϕ b (cid:48)(cid:48)(r) ϕ b (cid:48)(cid:48)(cid:48)(r) b ϕ (t) = (t) = (t) = i=1 P(cid:88) P(cid:88) P(cid:88) i=1 i=1 (cid:48)(r)((t − (i − 1)h)/h), biϕ (cid:48)(cid:48)(r)((t − (i − 1)h)/h)nδP1/T, biϕ (cid:48)(cid:48)(cid:48)(r)((t − (i − 1)h)/h)n2δP 2 biϕ 1 /T 2. Recall (3.2.1.4), D0(x) = R(cid:80) r=1 (r) 0 (x)⊗M . 
Then we can rewrite the tensor elements by λrv  M(cid:89) j=1 R(cid:88) r=1 λr  . (r) v 0,ij (x) D0,i1,...,iM (x) = Taking the first and the second derivatives of D0,i1,...,iM (x) in xu and xu, xv respectively we 98 get ∂D0,i1,...,iM (x) ∂xu = ∂2D0,i1,...,iM ∂xu∂xv (x) = R(cid:88) R(cid:88) r=1 r=1 λr λr + j=1  M(cid:88) (cid:32) M(cid:88) M(cid:88) j=1 v (r) 0,il (x) M(cid:89) (cid:0) M(cid:89) l=1,l(cid:54)=j ∂v (x) (r) 0,ij ∂xu ∂2v (r) 0,ij (x) ∂xu∂xv (r) v 0,ij (x) l=1,l(cid:54)=j (r) 0,ij2 ∂xv ∂v ∂v (x) (r) 0,ij1 ∂xu  , (x)(cid:1) (cid:16) M(cid:89) j1, j2 = 1 j1 (cid:54)= j2 l = 1 l (cid:54)= j1, j2 (cid:17)(cid:33) . (r) v 0,il (x) (3.5.2.2) Similarly (3.2.1.3) states that 0 (x)⊗M(cid:0)1 + n−αϕ (r) (cid:48)(r) b b (x))ψ(r)(nγ|ab(x) − a∗|)(cid:1)M . (r) (t λrv Db(x) = R(cid:88) r=1 R(cid:88) R(cid:88) r=1 Hence, we can rewrite the above tensor element-wise as follows Db,i1,...,iM (x) = = λrv (x) . . . v (r) 0,i1 M(cid:89) λr( (r) v 0,ij (x)) r=1 j=1 (r) 0,iM (x) (cid:18) 1 + n−αϕ (cid:48)(r) b (cid:18) 1 + n−αϕ (cid:48)(r) b (t (cid:19)M b (x))ψ(r)(nγ|a0(x) − a∗|) (cid:19)M (r) (r) b (x))ψ(r)(nγ|a0(x) − a∗|) (t . Now taking the partial derivative of the tensor component Db,i1,...,iM (x) with respect to xu, 99 we get = + r=1 R(cid:88) R(cid:88) (cid:16) r=1 ∂Db,i1,...,iM (x) ∂xu (cid:16) 1 + n−αϕ (cid:48)(r) b λr M(cid:89) (cid:17)M b (x))ψ(r)(nγ|a0(x) − a∗|) (t(r) (cid:32) (cid:16)  M(cid:88) j=1 (x) ∂v(r) 0,ij ∂xu  (x) M(cid:89) (cid:17)M−1 × v(r) 0,il l=1,l(cid:54)=j λrn−α( v(r) 0,ij (x)) M 1 + n−αϕ (cid:48)(r) b b (x))ψ(r)(nγ|a0(x) − a∗|) (t(r) j=1 (t(r) b (x)) ∂t(r) b (x) ∂xu (cid:48)(cid:48)(r) ϕ b ψ(r)(nγ|a0(x) − a∗|) (cid:48)(r) + nγϕ b (t(r) b (x))ψ (cid:48)(r)(nγ|a0(x) − a∗|) (a0(x) − a∗)T |a0(x) − a∗| ∂a0(x) ∂xu (cid:17)(cid:33) . 
Next, the second order derivative of the tensor component Db,i1,...,iM (x) with respect to xu 100 (cid:17)M b (x))ψ(r)(nγ|a0(x) − a∗|) (t(r) M(cid:89) (cid:16) (x) (x) ∂v(r) 0,ij1 ∂xu ∂v(r) 0,ij2 ∂xv v(r) 0,il l=1, l(cid:54)=j1,j2 (3.5.2.3) (cid:0) M(cid:89) l=1,l(cid:54)=j (x)(cid:1) v(r) 0,ij (x) ∂2v(r) 0,ij ∂xu∂xv and xv, where u (cid:54)= v, is given by ∂2Db,i1,...,iM (x) ∂xu∂xv (cid:16) = λr 1 + n−αϕ (cid:48)(r) b M(cid:88) R(cid:88) (cid:16) j1,j2=1, j1(cid:54)=j2 R(cid:88) r=1 + (cid:32) r=1 (cid:48)(cid:48)(r) ϕ b (t(r) b (x)) ∂t(r) b (x) ∂xu (a0(x) − a∗)T |a0(x) − a∗| ∂a0(x) ∂xu (cid:32) j=1 (x) (cid:32) M(cid:88) (cid:17)(cid:33) (cid:17)M−1  (t(r) (x) b (x))ψ +2M n−α 1 + n−αϕ (cid:48)(r) b b (x))ψ(r)(nγ|a0(x) − a∗|) (t(r) λr ψ(r)(nγ|a0(x) − a∗|) + nγϕ (cid:48)(r) b (cid:48)(r)(nγ|a0(x) − a∗|) (cid:33) M(cid:88) M(cid:89) (x) ∂v(r) 0,ij ∂xu v(r) 0,il l=1,l(cid:54)=j j=1 ∂t(r) b (x) ∂xu +M (M − 1)n−2α (cid:48)(cid:48)(r) b ϕ (t(r) b (x)) ψ(r)(nγ|a0(x) − a∗|) (cid:48)(r) + nγϕ b b (x))ψ (cid:48)(r)(nγ|a0(x) − a∗|) (t(r)  M(cid:89) R(cid:88) λr r=1 j=1 (cid:48)(cid:48)(cid:48)(r) ϕ b (t(r) b (x)) (cid:16) (x) v(r) 0,ij (cid:32) (cid:33)T +M n−α (cid:32) ∂t(r) b (x) ∂xu ∂t(r) b (x) ∂xv ψ(r)(nγ|a0(x) − a∗|) 1 + n−αϕ (cid:48)(r) b b (x))ψ(r)(nγ|a0(x) − a∗|) (t(r) (cid:33)2 (a0(x) − a∗)T |a0(x) − a∗| ∂a0(x) ∂xu (cid:17)M−1 × (cid:48)(cid:48)(r) + ϕ b (t(r) b (x)) ∂2t(r) b (x) ∂xu∂xv ψ(r)(nγ|a0(x) − a∗|) (cid:48)(cid:48)(r) + nγϕ b (cid:16) ∂t(r) (t(r) b (x))ψ (cid:18) (a0(x) − a∗)T |a0(x) − a∗| b (x) ∂xu (cid:48)(r) b (cid:48)(r)(nγ|a0(x) − a∗|)× (cid:19) + ∂xv ∂a0(x) ∂t(r) b (x) ∂xv (cid:18) (a0(x) − a∗)T (cid:48)(r)(nγ|a0(x) − a∗|) ×(cid:16)(cid:18) ∂a0(x) (cid:48)(cid:48)(r)(nγ|a0(x) − a∗|) |a0(x) − a∗| (cid:18) (a0(x) − a∗)T (cid:19)T ∂a0(x) |a0(x) − a∗| ∂a0(x) ∂xv + n2γϕ (t(r) b (x))ψ (cid:48)(r) + nγϕ b (t(r) b (x))ψ (a0(x) − a∗)T |a0(x) − a∗| + ∂2a0(x) ∂xu∂xv − (a0(x) − a∗)T |a0(x) − a∗|3 ∂a0(x) ∂xu (a0(x) − a∗)T ∂a0(x) ∂xv ∂xv ∂xu |a0(x) − a∗| (cid:19)(cid:17) (cid:17)(cid:33) . ∂a0(x) (cid:19)(cid:18) (a0(x) − a∗)T ∂xu |a0(x) − a∗| 1 (cid:19) ∂a0(x) ∂xu 101 The leading term in (3.5.2.3) is n2γ−αϕ (cid:48)(r) b (t(r) b (x))ψ (cid:48)(cid:48)(r)(nγ|a0(x) − a∗|) (cid:18) (a0(x) − a∗)T |a0(x) − a∗| (cid:19)(cid:18) (a0(x) − a∗)T |a0(x) − a∗| ∂a0(x) ∂xv (cid:19) . ∂a0(x) ∂xu Now by construction the vector fields v (r) 0 (x), r = 1, . . . , R, are bounded and twice continu- ously differentiable, and the functions ϕ(r), ψ(r), r = 1, . . . , R, are bounded thrice and twice continuously differentiable, respectively. Therefore, in the expression of (3.5.2.3), if α > 0 and γ = α/2, then Db satisfies (cid:107)Db−D0(cid:107)∞ ≤ cn−α and (cid:107)D (cid:48)(cid:48) b −D (cid:48)(cid:48) 0(cid:107)∞ ≤ k for some constant k > 0. Now we move on to our next lemma which shows that the difference in the tensors for any two b ∈ B is bounded in the integrated L2-norm. Lemma 14. For any b, ˜b ∈ B, such that b (cid:54)= ˜b and any large enough n we have (cid:107)Db(x) − D˜b(x)(cid:107)2 G ≤ Cn−(d−1)γ+2α, where C > 0 is a constant. 102 r=1 λrv (r) b,n(x)⊗M . Now for fixed b, ˜b ∈ B, we have Proof. 
From (3.2.1.3) we have Db(x) =(cid:80)R (cid:90) (cid:107)Db(x) − D˜b(x)(cid:107)2 (cid:90) (cid:90) (x)⊗M(cid:17)(cid:107)2 (cid:107)Db(x) − D˜b(x)(cid:107)2 b,n(x)⊗M − λrv(r) λrv(r) F dx F dx ˜b,n = = G G G (cid:16) (cid:16) (cid:107) R(cid:88) (cid:107) R(cid:88) r=1 λrv(r) 0 (x)⊗M (1 + n−αϕ (cid:48)(r) b = − λrv(r) 0 (x)⊗M (1 + n−αϕ 0 (x)⊗M(cid:16) (cid:48)(r) ˜b (t(r) ˜b (1 + n−αϕ (cid:48)(r) b (cid:107) R(cid:88) λrv(r) b (x))ψ(r)(nγ|a0(x) − a∗|))M (t(r) (x))ψ(r)(nγ|a0(x) − a∗|))M(cid:17)(cid:107)2 F dx b (x))ψ(r)(nγ|a0(x) − a∗|))M (t(r) (cid:48)(r) − (1 + n−αϕ ˜b (t(r) ˜b r(cid:107)v(r) λ2 0 (x)⊗M(cid:107)2 (cid:48)(r) − (1 + n−αϕ ˜b (t(r) ˜b F dx (x))ψ(r)(nγ|a0(x) − a∗|))M(cid:17)(cid:107)2 (cid:16) (x))ψ(r)(nγ|a0(x) − a∗|))M(cid:17)2 (cid:104)(cid:16) (1 + n−αϕ (cid:48)(r) b F dx r(cid:107)v(r) λ2 0 (x)⊗M(cid:107)2 F M n−αψ(r)(nγ|a0(x) − a∗|)(ϕ (cid:48)(r) b r=1 G (cid:90) (cid:90) (cid:90) = ≤ = r=1 G r=1 G R(cid:88) R(cid:88) r=1 G b (x))ψ(r)(nγ|a0(x) − a∗|))M (t(r) b (x)) − ϕ (t(r) (cid:48)(r) ˜b (t(r) ˜b (x))) (cid:17) Cϕ,ψ (cid:105)2 dx, where the sequence C(n) R(cid:88) (cid:16) r=1 λ2 r sup x∈G ϕ,ψ = 1 + O(n−2α), 0 (x)⊗M(cid:107)2 (cid:17)(cid:90) F (cid:107)v(r) G =Cn−2αM 2 (ψ(r)(nγ|a0(x) − a∗|))2(ϕ (cid:48)(r) b b (x)) − ϕ (t(r) (cid:48)(r) ˜b (t(r) ˜b (x)))2dx. (3.5.2.4) In the above expression we use the simple identity: (1 + u)M − (1 + v)M = (u − v)(cid:0)M + (cid:18)M (cid:19) 2 (u + v) + (cid:18)M (cid:19) 3 (u2 + uv + v2) + . . .(cid:1) and the fact that functions ϕ, ψ are bounded. Hence, we can find a bounded sequence 103 (n) ϕ,ψ with which we can bound the o(n−α) terms in the expression (3.5.2.4). Now for each C r = 1, . . . , R, integrals (cid:90) (ψ(r)(nγ|a0(x) − a∗|))2(ϕ (cid:48)(r) b (r) (cid:48)(r) b (x)) − ϕ ˜b (t (r) (t ˜b (x)))2dx G can be bounded by Crn−(d−1)γ using Lemma 4 in Sakhanenko (2011), where Cr > 0 is some generic constant depending on r. Hence, we conclude that (cid:107)Db(x) − D˜b(x)(cid:107)2 G ≤ Cn−2α−(d−1)γ for some generic C > 0. Next, we present the lemma which shows the curves driven by the pseudo-eigenvectors are separated in L1-norm inside the class indexed by B. Lemma 15. For each r = 1, . . . , R and for any b, ˜b ∈ B, such that b (cid:54)= ˜b and any large enough n we have (cid:107)x (r) b − x (r) ˜b (cid:107)1,T ≥ Cn−α−δ for some C > 0 depending on r. 104 Proof. For any b, ˜b ∈ B, b (cid:54)= ˜b, we have (·, a∗)(cid:107)1,T b (t)ψ(0), a∗) − x (r) (t))ψ(0) 0 (r) (r) (r) (cid:107)x = = (cid:90) T b (·, a∗) − x (r) ˜b (cid:32) 0 (t + n−αϕ (cid:107)x (cid:90) T (cid:107)n−α(ϕ b (t) − ϕ (r) (cid:90) 1 ˜b (cid:32) 0 (x(t + n−αψ(0)(πϕ (cid:90) T =n−αψ(0) (cid:90) 1 (r) v 0 0 0 (t)| (r) |ϕ b (t) − ϕ (r) ˜b (cid:90) 1 v (cid:16)(cid:107) ≥n−αψ(0) (cid:90) T 0 |ϕ 0 inf t∈[0,T ] b (t) − ϕ (cid:90) T (r) ˜b (r) (t)|dt =n−αψ(0)Cr |ϕ (r) b (t) − ϕ (r) ˜b (t)|dt. 0 (r) 0 (t + n−αϕ (r) ˜b (t)ψ(0), a∗)(cid:107)1dt (cid:33) dt (t)), a∗))dπ(cid:107)1 (r) b (t) + (1 − π)ϕ (r) ˜b (cid:17)(cid:33) dt (r) (r) 0 (x(t + n−αψ(0)(πϕ b (t) + (1 − π)ϕ (cid:34)(cid:16) (r) ˜b 0 (x(t + n−αψ(0)(πϕ b (t) + (1 − π)ϕ (cid:35) (r) ˜b (t)), a∗))dπ(cid:107)1 (r) (r) (cid:107) v 0 (cid:17) (t)), a∗))dπ(cid:107)1 In order to complete the proof of this lemma we need to show that (r) 0 (x(t + n−αψ(0)(πϕ v (r) b (t) + (1 − π)ϕ (r) ˜b (t)), a∗))dπ(cid:107)1 > 0. 
(3.5.2.5) Let us suppose there exists an n0 such that for all n ≥ n0, (cid:90) 1 0 (cid:107) inf t∈[0,T ] (cid:90) 1 0 (cid:107) inf t∈[0,T ] (r) 0 (x(t + n−αψ(0)(πϕ v (r) b (t) + (1 − π)ϕ (r) ˜b (t)), a∗))dπ(cid:107)1 = 0. 105 Then there exists t0 ∈ [0, T ] such that for n ≥ n0 (r) 0 (x(t0 + n−αψ(0)(πϕ v (r) b (t0) + (1 − π)ϕ (r) ˜b (t0)), a∗))dπ = 0. (3.5.2.6) Using the change of variable t0 + n−αψ(0)(πϕ (r) b (t0) + (1 − π)ϕ (r) ˜b (t0)) = u, we can write (cid:90) 1 0 (3.5.2.6) as (cid:90) t0+n−αψ(0)ϕ (r) b t0+n−αψ(0)ϕ (r) ˜b 0 (x(t0,n + n−αψ(0)(πnϕ (t0) (t0) (r) =⇒ v (r) 0 (x(u, a∗)) v n−αψ(0)(ϕ du (r) (r) b (t0)ϕ ˜b = 0 (t0)) (r) b (t0,n) + (1 − πn)ϕ (r) ˜b (t0,n)))) = 0, where {t0,n}n≥1, t0,n → t0 and πn ∈ [0, 1]. This would imply that there exits x ∈ X0 ⊂ G 0 (x) = 0, x ∈ X0, which violates the assumption that we have imposed in such that v (r) (A1) and (A3). Therefore (3.5.2.5) holds true. Now by following the proof of Lemma 5 in Sakhanenko (2011) the integral (cid:90) T 0 |ϕ (r) b (t) − ϕ (r) ˜b (t)|dt can be shown to be bounded from below by Cn−δ, where C is a generic constant depending on r. Hence, ultimately, we can conclude (cid:107)x (r) b − x (r) ˜b (cid:107)1,T ≥ Cn−α−δ. 106 Now going back to the expression in Theorem 5 using Lemma 12 we obtain ˆX (1) n ,..., ˆX inf n ∈En(a∗,T ) (R) sup D∈Dc,k Ew ˆX (1) n ,..., ˆX inf n ∈En(a∗,T ) (R) Ew sup D∈DL ≥ ≥ (cid:16) n2/(d+3)(cid:16)(cid:107) ˆX n2/(d+3)(cid:16)(cid:107) ˆX (cid:16) (cid:18) (1) n − x(1)(cid:107)p,T , . . . ,(cid:107) ˆX (1) n − x(1)(cid:107)p,T , . . . ,(cid:107) ˆX |x(r)|≥δ inf (r) 2 : r=1,...,R wn(|x(1)|, . . . ,|x(R)|) 1 − nCΣ,BC(f, δ1)β + ln 2 ln(L − 1) (R) (cid:17)(cid:17) n − x(R)(cid:107)p,T (cid:17)(cid:17) (R) n − x(R)(cid:107)p,T (cid:19) . (3.5.2.7) In the above expression we substitute CΣ,BC(f, δ1) = C > 0 as a generic constant and choose β = Cn1−2α−(d−1)γ, δ 2 = C/2n−α−δ. Also it can be shown that L ≥ exp(P/2), (r) see Ibragimov and Has’minskii (1980, 1981), hence by an algebraic manipulation we get ln(L − 1) ≥ P/2 − 1. Therefore, we can rewrite (3.5.2.7) as (cid:16) n2/(d+3)(cid:16)(cid:107) ˆX ˆX (1) n ,..., ˆX inf n ∈En(a∗,T ) (R) Ew sup D∈Dc,k = |x(r)|≥C/2n−α−δ : r=1,...,R inf wn(|x(1)|, . . . ,|x(R)|) >0.5 inf |x(r)|≥0.5Ch: r=1,...,R w(|x(1)|, . . . ,|x(R)|) > 0, (cid:17)(cid:17) n − x(R)(cid:107)p,T (1) (R) n − x(1)(cid:107)p,T , . . . ,(cid:107) ˆX (cid:32) 1 − n1−2α−(d−1)γC2 + ln 2 (cid:33) P/2 − 1 where wn(|x(1)|, . . . ,|x(R)|) = w (cid:16) n2/(d+3)(cid:16)|x(1)|, . . . ,|x(R)|(cid:17)(cid:17) (cid:16) P1−2 (3.5.2.8) . Note that while obtaining (cid:17)− ln 2, and this completes (3.5.2.8) we have chosen α = 2/(d+3), γ = α/2 and δ = 0. Also we can choose P1 sufficiently large, where P = P1nδ, introduced before such that C2 < 4 the proof of Theorem 5. 107 3.5.3 Proof of Theorem 6 To prove part (a), first fix r in 1, . . . , R. Then we can write, E(cid:107) ˆX (r) n − x(r)(cid:107)p,T = E (cid:107) ˆX (r) n (t) − x(r)(t)(cid:107)p pdt  τ(cid:90)  τ(cid:90) 0 0 1/p 1/p ≤ E(cid:107) ˆX (r) n (t) − x(r)(t)(cid:107)p pdt , using Jensen(cid:48)s inequality. (3.5.3.1) Recall that D denote the vectorized representation of the super-symmetric tensor D. 
Using this notation and by definition of an integral curve estimator we can write = = = ˆX 0 D (r) n (t) − x(r)(t) t(cid:90) (cid:16) t(cid:90) (cid:16) (cid:16) ˆX v(r)(cid:16) ˆDn v(r)(cid:16) (cid:16) ˆX t(cid:90) (cid:16) v(r)(cid:16) ∇v(r)(cid:16) (cid:16) t(cid:90) ∇v(r)(cid:16) 0 t(cid:90) 0 + (cid:16) D D D + 0 (r) n (s) (r) n (s) (cid:16) ˆX D D ds ds x(r)(s) x(r)(s) (cid:17)(cid:17)(cid:17) (cid:17)(cid:17)(cid:17) (cid:16) (cid:17)(cid:17) − v(r)(cid:16) (cid:16) (cid:17)(cid:17) − v(r)(cid:16) (cid:17)(cid:17) − v(r)(cid:16) (cid:16) ˆX (cid:17)(cid:16) ˆX (cid:16) (cid:17)(cid:17)∇D (cid:17) n (s) − x(r)(s) (cid:17)(cid:17)(cid:16) ˆDn(x(r)(s)) − D(x(r)(s)) (cid:17) (cid:17)(cid:17)(cid:17) (r) n (s) x(r)(s) ds (r) D (r) n (s) x(r)(s) x(r)(s) ds ds + R (r) n (t), 0 108 where the remainder term R (r) n (t) is given by = R t(cid:90) 0 + (r) n (t) D D (cid:16) (cid:16) (r) n (s) v(r)(cid:16) (cid:16) ˆX (cid:17)(cid:17) − v(r)(cid:16) (cid:16) ˆX (cid:17) n (s) − x(r)(s) t(cid:90) (cid:16) ˆX (cid:16) ˆX (cid:17)(cid:17) − v(r)(cid:16) v(r)(cid:16) (cid:16) ˆDn(x(r)(s)) − D(x(r)(s)) (cid:17) (r) n (s) (cid:16) ds (r) D D ds. 0 (cid:17)(cid:17) − ∇v(r)(cid:16) D (cid:16) (cid:17)(cid:17)∇D x(r)(s) (cid:16) (cid:17)(cid:17) x(r)(s) x(r)(s) (cid:17)(cid:17) − ∇v(r)(cid:16) (r) n (s) (cid:16) D (cid:17)(cid:17)(cid:17) x(r)(s) Now similar to Koltchinskii et al. (2007) we can decompose (r) n (s) − x(r)(s) = Z (r) n (s) + δ ˆX (r) n (s), for each r = 1, . . . , R. Then we can write = x(r)(s) (r) n (s)ds + R (r) n (t) ˆX 0 D (r) n (t) − x(r)(t) t(cid:90) ∇v(r)(cid:16) (cid:16) t(cid:90) ∇v(r)(cid:16) t(cid:90) ∇v(r)(cid:16) + + 0 (cid:16) δ x(r)(s) (cid:17)(cid:17)∇D (cid:16) (cid:17) (cid:17)(cid:16) (cid:17)(cid:16) ˆDn − D (cid:17)(cid:17)∇D (cid:16) (cid:17) (cid:17) D(x(r)(s)) x(r)(s) ds D x(r)(s) x(r)(s) (r) n (s)ds. Z 0 109 Here the stochastic processes δ (r) n (t), Z (r) n (t) for any t ∈ [0, τ ], are represented by (r) n (s)ds + R (r) n (t) δ x(r)(s) ds (cid:17) (cid:17) x(r)(s) x(r)(s) (r) n (s)ds. Z (r) n (t) = δ (r) n (t) = Z x(r)(s) (cid:17)(cid:17)∇D (cid:16) (cid:17) (cid:17)(cid:17)(cid:16) ˆDn − D (cid:17)(cid:16) (cid:16) (cid:17)(cid:17)∇D (cid:17) t(cid:90) t(cid:90) 0 D D x(r)(s) x(r)(s) (cid:16) ∇v(r)(cid:16) (cid:16) ∇v(r)(cid:16) t(cid:90) ∇v(r)(cid:16) (cid:17)(cid:17)∇D (cid:17)(cid:17)(cid:16) ˆDn − D (cid:16) (cid:16) x(r)(s) x(r)(s) D 0 0 D + (cid:16) (cid:16) D x(r)(s) = v (cid:48)(r)(x(r)(s)). Then Z (r) n (t) satisfies the (cid:17)(cid:16) (cid:17) x(r)(s) + Z (r) n (t), Z (r) n (0) = 0. (3.5.3.2) Let us denote ∇v(r)(cid:16) = ∇v(r)(cid:16) equation: dZ (r) n (t) dt (r) n (s) = Z 0 where U (r) : R2 (cid:55)→ Rd2 The solution to the stochastic differential equation in (3.5.3.1), can be represented by the d−dimesional random process t(cid:90) U (r)(t, s)∇v(r)(cid:16) (cid:16) D x(r)(s) (cid:17)(cid:17)(cid:16) ˆDn − D (cid:17)(cid:16) (cid:17) x(r)(s) ds, t ∈ [0, τ ], is the Green’s function defined as the solution of the PDE given by: = ∇v(r)(cid:16) (cid:16) (cid:17)(cid:17)∇D (cid:16) (cid:17) D x(r)(t) x(r)(t) U (r)(t, s), ∂U (r)(t, s) ∂t U (r)(s, s) = Id×d. 110 (r) n (t)(cid:107)p + (cid:107)EZ (r) n (t)(cid:107)p + (cid:107)δ (cid:17)p (r) n (t)(cid:107)p . 
(3.5.3.3) ˜D(Xj), where ˜D(Xj) is the LSE estimator of D(Xj) Now from (3.5.3.1) we have E(cid:107) ˆX (r) p ≤ E(cid:16)(cid:107)Z n (t) − x(r)(t)(cid:107)p (cid:18) x−Xj n(cid:80) (r) n (t) − EZ (cid:19) Since ˆDn(x) = 1 nhd n K j=1 hn given in (1.4.3), we can write ˜D(Xj) = (BT B)−1BT Y (Xj) = (BT B)−1BT (BD(Xj) + Σ1/2(Xj)Ξj) = D(Xj) + Γj, where Γj = (BT B)−1BT Σ1/2(Xj)Ξj. n(cid:80) i=1 (cid:90) Now let us denote Z (r) n (t) = χ (r) i (t), where (cid:33) ds(cid:0)D(Xi) + Γj (cid:1) , x(r)(s) − Xj χ (r) i f (r) t (r) t f (s)K (t) = (s) = I[0,t](s)U (r)(t, s)∇v(r)(cid:16) hn (cid:17) D(x(r)(s)) . Then using similar arguments from Carmichael and Sakhanenko (2015) we can derive (cid:90) (cid:90) EZ (r) n (t) = −hn + 1 2 h2 n (cid:90) (s)∇D(x(r)(s)) f (r) t uK(u)duds f (r) t (s) K(u)(cid:104)∇2D(x(r)(s))u, u(cid:105)duds + o(h2 n). Since we select the bandwidth hn = n−1/(d+3), we get EZ β (t) is the mean of the limiting Gaussian process G(r)(t) for integral curve estimator (r) n (t) = , where (r) µ µ n (r) β (cid:113) (t)+oP (1) nhd−1 111 (cid:32) (cid:90) ˆX (r) n (t) and β is the constant that has been introduced in (A8). Hence, (r) β (t)(cid:113) (cid:107) µ nhd−1 n (cid:107)p = n−2/(d+3)(cid:107)µ (r) β (t)(cid:107)p. Also notice that using similar arguments from Carmichael and Sakhanenko (2015) we get (r) n (t)| = oP ((nhd−1 |δ n )−1/2). sup 0≤t≤τ Now applying Rosenthal’s inequality from Ibragimov and Ibragimov (2008) to independent n with finite p−th moments, we obtain mean zero random variables (t) − Eχ /nhd (t) χ (r) i (r) i (cid:16) (cid:16) E(cid:107) n(cid:88) ≤ d(cid:88) i=1 k=1 C(r)(p) max (cid:32) n(cid:88) E(cid:16)(cid:16) (cid:17) (t) (cid:17) E|(cid:16) χ (r) i (cid:40) n(cid:88) i=1 (r) i,k (t) − Eχ χ (t) − Eχ χ (r) i /nhd n(cid:107)p p (cid:17) (cid:17)2(cid:33)p/2(cid:41) /nhd (r) i,k (t) , n|p, (r) i,k (t) − Eχ (cid:17) (r) i,k (t) /nhd n i=1 112 where C(r)(p) > 0 depends on p and r only. Now we can bound the p−th moment of χ (r) i,k using the similar arguments from Koltchinskii et al. (2007). ds1 (Dk(Xi) + Γj,k) × . . .× (cid:17) E|(cid:16) (cid:90) R ≤C(nhd i,k (t) (cid:90) i,k (t) − Eχ(r) χ(r) n)−pE (cid:32) R f (r) t f (r) t (sp)K (cid:33) (cid:32) n|p /nhd x(r)(s1) − Xi (cid:33) hn (s1)K x(r)(sp) − Xi hn (cid:32) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:90) |f (r) (s1)| . . .|f (r) (sp)| t dsp (Dk(Xi) + Γj,k) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤C(cid:48)(nhd n)−p (cid:32) Rd+p x(r)(s1) − y K =C(cid:48)hd n(nhd (cid:90) hn n)−p (cid:32) Rd+p (cid:33)(cid:33) ds1 . . . dspdy (cid:32) x(r)(sp) − y . . . K hn (s1)| . . .|f (r) |f (r) t x(r)(s2) − x(r)(s1) K(z)K z + t (cid:33) (sp)| × . . . × K (cid:32) z + (cid:33)(cid:33) ds1 . . . dspdz x(r)(sp) − x(r)(s1) hn =C(cid:48)hd+p−1 n (nhd n)−p (s1)||f (r) (s1 + τ2h)| . . .|f (r) (s1 + τph)|K(z)× t t (cid:33) (cid:32) (cid:90) (cid:90) Rp−1 (cid:90) (cid:32) hn |f (r) t τ2hn τphn (cid:19)1/p Rd+p x(r)(s1 + τ2h) − x(r)(s1) × . . .× x(r)(s1 + τph) − x(r)(s1) t (cid:33) (cid:33)(cid:33) (cid:18)(cid:90) dzds1dτ2 . . . dτp (cid:19)1/p(cid:18)(cid:90) ≤C(cid:48)hd+p−1 (nhd n)−p ˜Λ(τ2, . . . , τp) |f (r) t (s1)|pds1 (cid:32) (cid:32) K z + τ2 K z + τp n (cid:18)(cid:90) |f (r) t (s1 + τphn)|pds1 dτ2 . . . dτp (cid:18)(cid:90) ˜Λ(τ2, . . . , τp) (cid:19) ≤C(cid:48)(cid:48)hd+p−1 n (nhd n)−p Rp−1 ≤C1hd+p−1 n (nhd n)−p. |f (r) t (s1)|pds1 113 (cid:19)1/p × . . .× |f (r) t (s1 + τ2hn)|pds1 (3.5.3.4) Here, in (3.5.3.4), ˜Λ(τ2, . . . 
, τp) = sup s1,...,sp∈[0,τ ] (cid:90) Rd (cid:32) K(z)K z + τ2 x(r)(s2) − x(r)(s1) s2 − s1 (cid:33) (cid:32) z + τp × . . . × K (cid:33) dz, x(r)(sp) − x(r)(s1) sp − s1 is analogous to Λ(τ1, τ2, τ3) in Carmichael and Sakhanenko (2015). Dk is the k-th component of vectorized D. Also C1, C, C(cid:48) and C(cid:48)(cid:48) are positive constants depending on B, D, U (r). Thus, from (3.5.3.3), we obtain by replacing hn = n−1/(d+3), i=1 (r) i (t) (cid:16) /nhd n(cid:107)p p (r) χ i (t) − Eχ E(cid:107) n(cid:88) (cid:17) C1nhd+p−1 (cid:40) n (nhd n)p C(cid:48) =dC(r)(p)n−3p/(d+3) max 1n(4−p)/(d+3), C(cid:48) C2nhd+1 n (nhd n)2 ≤dC(r)(p) max (cid:40) (cid:32) (cid:33)p/2(cid:41) , 2np/(d+3) (cid:41) (3.5.3.5) =dC(r)(p)C(cid:48) 2n−2p/(d+3), where C(cid:48) 1, C(cid:48) 2 are positive constants. As a result, we have E(cid:107) ˆX (r) n − x(r)(cid:107)p p ≤n−2p/(d+3)b((cid:107)µ (r) β (t)(cid:107)p) E(cid:107) ˆX (r) n − x(r)(cid:107)p,T ≤n−2/(d+3) b((cid:107)µ (r) β (t)(cid:107)pdt  τ(cid:90) 1/p for some bounded and continuous function b : [0,∞) (cid:55)→ [0,∞). This concludes the proof of 0 part (a) in Theorem 3. 114 To prove part (b), we first notice that we can write E sup t∈[0,τ ] max 1≤k≤d | ˆX (r) n,k(t) − x ≤(cid:107)U (r)(cid:107)∞E sup t∈[0,τ ] max 1≤k≤d (r) k (t)| (cid:16) (r) χ i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n(cid:88) i=1 (cid:17) (t) /nhd n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + sup t∈[0,τ ] max 1≤k≤d (t) − Eχ (r) i |µ (r) β,k(t)|(cid:113) nhd n + E sup t∈[0,τ ] max 1≤k≤d |δ (r) k (t)|. We already know sup t∈[0,τ ] L2-norm similar to Sakhanenko (2011) we can bound, n |δ(r)(t)| = oP ((nhd−1 )−1/2). Now using a maximal inequality for (cid:17) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:17) (r) χ i (cid:16) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n(cid:88) i=1 i=1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n(cid:88) E sup (cid:16) n(cid:80) χ (r) i √ ≤ . E sup t∈[0,τ ] (t) − Eχ (r) i (t) (t) (r) i (cid:16) (r) χ i /nhd n /nhd n t∈[0,τ ] max 1≤k≤d d max 1≤k≤d (t) − Eχ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)21/2 (cid:17) (s) = I[0,t](s)U (r)(t, s)∇v(r)(cid:16) (cid:16)E|W (r)(t1) − W (r)(t2)|2(cid:17)1/2 ≤ C(r)n−2/(d+3)|t2 − t1|. (t) − Eχ /nhd (r) t (r) i i=1 (t) Let us denote, W (r)(t) = few steps of (3.5.3.4) with p = 2 and f 0 ≤ t1 < t2 ≤ τ we get n, t ∈ [0, τ ]. Then repeating the first (cid:17) D(x(r)(s)) , for (3.5.3.6) Observe that the diameter of [0, τ ] in the metric m(t1, t2) = C(r)n−2/(d+3)|t2 − t1| is diam([0, τ ], m) = τ C(r)n−2/(d+3) and the maximal number of ε-separated points in [0, τ ] with respect to metric m is Dist(ε, m) = τ C(r)n−2/(d+3)/ε. Now using the inequality on 115 page 100 in van der Vaart and Wellner (1996) yields (cid:32) |W (r)(t)|2 E sup t∈[0,τ ] (cid:33)1/2 ≤(cid:16)E|W (r)(t0)|2(cid:17)1/2 + K(r) (cid:90) 0 diam([0,τ ],m) Dist1/2(ε, m)dε and we can write (cid:90) diam([0,τ ],m) Dist1/2(ε, m)dε = (cid:90) τ C(r)n−2/(d+3) 0 0 On the other hand, from (3.5.3.5) we have generic constant C(r) > 0. Then we have the bound: (cid:17)1/2 τ C(r)n−2/(d+3)/ε dε = 2τ C(r)n−2/(d+3). (cid:16) (cid:16)E|W (r)(t0)|2(cid:17)1/2 ≤ C(r)n−2/(d+3) for some (cid:33)1/2 ≤ C(r)n−2/(d+3). (cid:32) |W (r)(t)|2 E sup t∈[0,τ ] As a result finally we get E sup t∈[0,τ ] max 1≤k≤d | ˆX (r) n,k(t) − x (r) k (t)| ≤ C(r)n−2/(d+3) for some generic C(r) > 0, which concludes the proof of Theorem 6. 
Chapter 4 Future research

In our work we have presented a comprehensive framework for establishing both local and global minimax lower bounds for the asymptotic risk of the integral curve estimator in high order tensor models. Although we have developed the theoretical results specifically for the HARDI model, our work can be generalized to other semi-parametric models in a similar fashion. One shortcoming of our current method of estimation is the computational issues that arise near the branching of integral curves. The computational efficiency of our method can be researched further, which could also reduce the computation time near the branching of the fibers. Another interesting direction for future work could be to find minimax estimators in the style of Efromovich (1998) by further optimizing the constants in the lower bounds for both local and global risks. However, this type of analysis may involve more technical details in the proofs of the results. Moreover, the methodology we developed in Chapter 3 can be used for comparisons of different protocols in a much broader setting with many patients involved in a study. In such a setting we can compare the accuracy of the protocols while controlling for the individual effects of each patient. Thus, it could give us a better understanding of the valuable metric that we have explored in Chapter 3, which could potentially be deployed in future neuroimaging studies for assessing accuracy.

4.1 Extension of integral curve estimation

We would also like to remark that integral curve estimation with proper uncertainty quantification is a powerful tool that can be used to study the more general stochastic process X_t, t ∈ [0, ∞), given by the model

dX_t = µ(X_t, t) dt + σ(X_t, t) dW_t,

where µ(X_t, t) is the drift parameter, σ^2(X_t, t)/2 is the diffusion parameter and W_t, t ∈ [0, ∞), is a Wiener process. The Fokker-Planck equation for the density p(x, t) of the stochastic process X_t is given by

∂p(x, t)/∂t = −(∂/∂x)[µ(x, t) p(x, t)] + (∂^2/∂x^2)[σ^2(x, t) p(x, t)/2].

Some of the contemporary work where a similar model has been used can be found in Zheng et al. (2019) and Toppaladoddi and Wettlaufer (2017). While in Zheng et al. (2019) the authors proposed maximum likelihood estimators of the parameters in the Fokker-Planck equation in the presence of measurement errors modeled by an α-stable Lévy noise, Toppaladoddi and Wettlaufer (2017) studied the numerical aspects of the solution to the Fokker-Planck equation to model the density of the thickness of glacial ice sheets. In this type of framework we can extend integral curve estimation to the stochastic partial differential equation (SPDE) setting with proper uncertainty quantification. Since our method uses relaxed assumptions on the measurement errors involved in the stochastic differential equation, the estimates that we provide are more robust with respect to the underlying model, while achieving optimal confidence bounds at the same time.
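As a small numerical illustration of the model above, the following sketch (with arbitrary, hypothetical drift and diffusion coefficients chosen purely for demonstration) simulates a sample path of dX_t = µ(X_t, t)dt + σ(X_t, t)dW_t by the Euler-Maruyama scheme and compares it with the deterministic integral curve driven by the drift alone, which is the kind of object an extended estimation framework would target in this setting.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX_t = mu(X_t, t) dt + sigma(X_t, t) dW_t on [0, T]."""
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))      # Wiener increment over one step
        x[k + 1] = x[k] + mu(x[k], t[k]) * dt + sigma(x[k], t[k]) * dw
    return t, x

# Hypothetical coefficients for illustration: linear drift, constant diffusion.
mu = lambda x, t: -0.5 * x
sigma = lambda x, t: 0.3

rng = np.random.default_rng(0)
t, path = euler_maruyama(mu, sigma, x0=1.0, T=5.0, n_steps=500, rng=rng)

# The noiseless integral curve solves dx/dt = mu(x, t); for this drift, x(t) = exp(-0.5 t).
integral_curve = np.exp(-0.5 * t)
```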
BIBLIOGRAPHY

[1] Banerjee, C., Sakhanenko, L., Zhu, D.C. Lower Bounds for Accuracy of Estimation in Magnetic Resonance High Angular Resolution Diffusion Imaging Data, J. Indian Soc. Probab. Stat. (2019).

[2] Basser, P.J., Mattiello, J., Le Bihan, D., MR diffusion tensor spectroscopy and imaging, Biophys. J., Vol. 66, 259–267 (1994).

[3] Behrens, T.E., Berg, H.J., Jbabdi, S., Rushworth, M.F., Woolrich, M.W. (2007) Probabilistic diffusion tractography with multiple fibre orientations: What can we gain?, Neuroimage, Vol. 34, 144–155.

[4] Carmichael, O., Sakhanenko, L. Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data, Linear Algebra and its Applications, Vol. 473, 377–403 (2015).

[5] Carmichael, O., Sakhanenko, L. Integral curves from noisy diffusion MRI data with closed-form uncertainty estimates, Stat. Inference Stoch. Process., 19, 289–319 (2016).

[6] Cator, E. (2011) Adaptivity and optimality of the monotone least-squares estimator, Bernoulli, Vol. 17(2), 714–735.

[7] Chang, S.E., Zhu, D.C., Neural network connectivity differences in children who stutter, Brain, Vol. 136, 3709–3726 (2013).

[8] Chicone, C., Ordinary differential equations with applications, Springer Science and Business Media, (2006).

[9] Coddington, E.A., Levinson, N. (1955) Theory of Ordinary Differential Equations. McGraw-Hill, New York.

[10] De Lathauwer, L., De Moor, B., Vandewalle, J. (2000) On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., Vol. 21, 1324–1342.

[11] Devroye, L., A course in density estimation, Progress in Probability and Statistics, Vol. 14, Birkhäuser, Boston, (1987).

[12] Descoteaux, M., Angelino, E., Fitzgibbons, S., Deriche, R. Apparent diffusion coefficients from high angular resolution diffusion imaging: estimation and applications, Magn. Reson. Med., Vol. 56, 395–410, (2006).

[13] Efromovich, S. (1998) Simultaneous sharp estimation of functions and their derivatives, Ann. Statist., Vol. 26, No. 1, 273–278.

[14] Efromovich, S. (2008) Adaptive estimation of and oracle inequalities for probability densities and characteristic functions, Ann. Statist., Vol. 36, No. 3, 1127–1155.

[15] Efromovich, S. (2014) Nonparametric Curve Estimation, Springer.

[16] Efromovich, S. (2017) Missing, modified, and large-p-small-n data in nonparametric curve estimation, Calcutta Stat. Assoc. Bulletin, 69(1), 1–34.

[17] Efromovich, S. (2018) Missing and Modified Data in Nonparametric Estimation: With R Examples, Chapman and Hall/CRC Monographs on Statistics and Applied Probability.

[18] Ferguson, T.S., Mathematical statistics: A decision theoretic approach, Academic Press, New York, (1967).

[19] Guntuboyina, A., Sen, B. Global risk bounds and adaptation in univariate convex regression, Probab. Theory Relat. Fields, Vol. 163, 379–411 (2015).

[20] Guntuboyina, A. (2011) Minimax Lower Bounds (Doctoral dissertation), Yale University, New Haven, USA.

[21] Hájek, J. (1972) Local asymptotic minimax and admissibility in estimation, Proc. Sixth Berkeley Symp. on Math. Statist. and Prob., Vol. 1, 175–194.

[22] Has'minskii, R.Z., Ibragimov, I.A., On density estimation in the view of Kolmogorov's ideas in approximation theory, Ann. Statist., Vol. 18, No. 3, 999–1010 (1990).

[23] Ibragimov, M., Ibragimov, R., Optimal constants in the Rosenthal inequality for random variables with zero odd moments, Statistics and Probability Letters, Vol. 78, 186–189, (2008).

[24] Ibragimov, I.A., Has'minskii, R.Z. Statistical Estimation - Asymptotic Theory, Springer Science and Business Media, (2013).

[25] Ibragimov, I.A., Khasminskii, R.Z., On estimation of the density function, Zap. Nauch. Sem. LOMI AN SSSR, 98, 61–85, (1980).

[26] Ibragimov, I.A., Khasminskii, R.Z., Further results on nonparametric density estimation, Zap. Nauch. Sem. LOMI AN SSSR, 108, 73–89 (1981).

[27] Kim, A.K.H., Samworth, R.J., Global rates of convergence in log-concave density estimation, Ann. Statist., Vol. 44, No. 6, 2756–2779, (2016).
[28] Koltchinskii, V., Sakhanenko, L., Cai, S. Integral Curves of Noisy Vector Fields and Statistical Problems in Diffusion Tensor Imaging: Nonparametric Kernel Estimation and Hypotheses Testing, Ann. Statist., Vol. 35, No. 4, 1576–1607, (2007).

[29] Le Bihan, D., Mangin, J.F., Poupon, C., Clark, C.A., Pappata, S., Molko, N., Chabriat, H., Diffusion tensor imaging: concepts and applications, J. Magn. Reson. Imaging, Vol. 13, 534–546, (2001).

[30] Özarslan, E., Mareci, T. Generalized diffusion tensor imaging and analytical relationships between diffusion tensor imaging and high angular resolution diffusion imaging, Magn. Reson. Med., Vol. 50, 955–965, (2003).

[31] Raskutti, G., Wainwright, M.J., Yu, B., Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming, Journal of Machine Learning Research, Vol. 13, 389–427, (2012).

[32] Sakhanenko, L. How to choose the number of gradient directions for estimation problems from noisy diffusion tensor data. In: Lahiri, S., Schick, A., SenGupta, A., Sriram, T.N. (Eds), Contemporary Developments in Statistical Theory: A Festschrift for Hira Lal Koul, Springer, pp. 305–311, (2013).

[33] Sakhanenko, L. (2012) Lower bounds for accuracy of estimation in Diffusion Tensor Imaging, Theory Probab. Appl., Vol. 54, 168–177.

[34] Sakhanenko, L., Global rate optimality in a model for Diffusion Tensor Imaging, Theory Probab. Appl., Vol. 55, No. 1, 77–90, (2011).

[35] Sakhanenko, L. and DeLaura, M. A comparison study of statistical tractography methodologies for Diffusion Tensor Imaging, International Journal of Statistics: Advances in Theory and Applications, 1(1), 93–110, (2017).

[36] Stone, C.J., Optimal global rates of convergence for nonparametric regression, Ann. Statist., Vol. 10, No. 4, 1040–1053, (1982).

[37] Toppaladoddi, S., Wettlaufer, J.S., Statistical Mechanics and the Climatology of the Arctic Sea Ice Thickness Distribution, J. Stat. Phys., Vol. 167, 683–702, (2017).

[38] van der Vaart, A.W., Wellner, J.A., Weak Convergence and Empirical Processes with Applications to Statistics, Springer Series in Statistics, Springer, New York, NY, (1996).

[39] Ying, L., Zou, Y., Klemer, D., and Wang, J., Determination of Fiber Orientation in MRI Diffusion Tensor Imaging Based on Higher-Order Tensor Decomposition, Proceedings of the 29th Annual International Conference of the IEEE EMBS, 2065–2068, (2007).

[40] Zheng, Y., Fang, Y., Duan, J., Sun, X., Fu, X., Kurths, J., The maximum likelihood climate change for global warming under the influence of greenhouse effect and Lévy noise, Chaos: An Interdisciplinary Journal of Nonlinear Science, Vol. 1(1), (2020).

[41] Zhu, D.C., Majumdar, S., Integration of resting-state FMRI and diffusion-weighted MRI connectivity analyses of the human brain: limitations and improvement, J. Neuroimaging, Vol. 24, 176–186, (2014).

[42] Zhu, D.C., Majumdar, S., Korolev, I.O., Berger, K.L., Bozoki, A.C., Alzheimer's disease and amnestic mild cognitive impairment weaken connections within the default mode network: a multi-modal imaging study, J. Alzheimers Dis., Vol. 34, 969–984, (2013).

[43] Zhu, H., Zhang, H., Ibrahim, J., and Peterson, B. (2007) Statistical analysis of diffusion tensors in diffusion-weighted magnetic resonance image data, J. Amer. Statist. Assoc., Vol. 102, 1081–1110.

[44] Zhu, H., Li, Y., Ibrahim, I., Shi, X., An, H., Chen, Y., Gao, W., Lin, W., Rowe, D. and Peterson, B. (2009) Regression models for identifying noise sources in magnetic resonance images, J. Amer. Statist. Assoc., Vol. 104, 623–637.