MINIMAX LOWER BOUNDS IN HIGH ORDER TENSOR MODELS WITH APPLICATIONS TO NEUROIMAGING

By

Chitrak Banerjee

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics — Doctor of Philosophy

2020

ABSTRACT

MINIMAX LOWER BOUNDS IN HIGH ORDER TENSOR MODELS WITH APPLICATIONS TO NEUROIMAGING

By Chitrak Banerjee

The minimax principle is a very useful concept in mathematical statistics for finding optimal estimators. While the unbiasedness and invariance principles are useful tools for finding optimal estimators, they are often restrictive and in certain cases may not even yield optimal estimators, see Ferguson (1967). The minimax principle, on the other hand, is based on ordering decision rules by their maximum risk and is often less restrictive. While there are several methods for finding minimax optimal estimators, such as those due to Hájek, Le Cam, Fano and Assouad, in our work we specifically use Hájek's and Fano's methods to explore the minimax optimality of integral curve estimators in high order tensor models.

High angular resolution diffusion imaging (HARDI) is a popular in-vivo brain imaging technique proposed by Özarslan and Mareci (2003). Besides the mathematical model for HARDI, successful tracing of neural fibers using HARDI presents the challenge of estimation and uncertainty quantification in the presence of measurement errors. Our work here is based on the semi-parametric estimation method proposed by Carmichael and Sakhanenko (2015), where the authors provided a consistent method for tracing fibers in the presence of measurement error using HARDI.

The first work described here establishes that the estimators proposed in Carmichael and Sakhanenko (2015) are minimax optimal with respect to their asymptotic risk. The framework of HARDI allows us to accommodate complex neural fiber structures where fiber tracts cross each other, converge, diverge, "fan out" or "kiss"; thus our work generalizes the minimax lower bound results in Sakhanenko (2012), where a similar result was established under a simpler model in which imaging signals are modeled by a vector field perturbed by an additive noise.

The second work establishes global bounds for the integral curve estimators proposed by Carmichael and Sakhanenko (2015), suggesting that the asymptotic rate of convergence of the integral curve estimator is minimax optimal not only locally but also globally. Additionally, in the simulation study of our second work we introduce a metric based on global minimax optimal rates which can compare the relative accuracy of different imaging protocols that are used to obtain HARDI data.

Copyright by CHITRAK BANERJEE 2020

Dedicated to my late grandmother Anjali Banerjee; her fighting spirit and commitment to social justice inspire me every day.

ACKNOWLEDGMENTS

I would like to extend my gratitude towards my advisor Professor Lyudmila Sakhanenko, Professor Hira Lal Koul and my mentor Dhruv B. Sharma for their constant support and encouragement. I would also like to thank Dr. David C. Zhu; it was a great pleasure working with him. I would also like to thank my committee members Professor Tapabrata Maiti and Professor Frederi G. Viens. During my five years at Michigan State University, I had a journey filled with excitement and challenges. I would not have made it to the end if my best friend Piyali Basak hadn't been there beside me.
Lastly I would like to thank my friends and colleagues Metin Arda Eroğlu, Andrew Dennhardt and many others with whom I shared a lot of good moments.

CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
Chapter 1 Introduction
  1.1 Review of DT-MRI
  1.2 Review of minimax lower bounds
    1.2.1 Hájek's principle
    1.2.2 Fano's principle
  1.3 Review of Integral curve estimation
  1.4 Model
    1.4.1 Assumptions
    1.4.2 Estimation
    1.4.3 Asymptotic Distribution
Chapter 2 Local Minimax Bounds
  2.1 Introduction
    2.1.1 Notations
  2.2 Assumptions and main results
    2.2.1 Assumptions
    2.2.2 Parametric subclass of tensors
  2.3 Experiment
  2.4 Remarks and Discussion
  2.5 Proofs
    2.5.1 Constructed parametric subclass
    2.5.2 Main Lemmas
    2.5.3 Proofs of Theorems
      2.5.3.1 Proof of Theorem 1
      2.5.3.2 Proof of Theorem 2
Chapter 3 Global Minimax Bound
  3.1 Introduction
  3.2 Assumptions and main results
    3.2.1 Parametric subclass of tensors
  3.3 Simulation study and Data analysis
    3.3.1 Simulation Study
    3.3.2 Neuroimaging example
  3.4 Remarks and Conclusion
  3.5 Proofs
    3.5.1 Proof of Lemma 10
    3.5.2 Proof of Theorem 5
    3.5.3 Proof of Theorem 6
Chapter 4 Future research
  4.1 Extension of integral curve estimation
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1: The 25th percentile, median and 75th percentile of the κ values are ordered top to bottom in each line for different combinations of SNR and thickness of fibers; the sample size is n = 60^3.

Table 3.2: Protocols for HARDI. Note that the second protocol has a total of 6 b0 images after 2 repetitions of 3 b0 images.

Table 3.3: Comparison of the constant κ for tracing of anterior fibers based on imaging datasets obtained via different scanning protocols. Here δ = 0.003, β = 10^{-7}.

LIST OF FIGURES

Figure 1.1: A neuronal fiber bundle across the genu of corpus callosum is created based on the C-S (2015) method and is shown on the axial plane in (a) 2D and (b) 3D views.

Figure 2.1: Boxplot of log ||X̂_n(t_0) − x(t_0)||/log n for the C-pattern. We traced each fiber for 30 steps of size δ = 0.02. 100 independent samples of n = n_0^3 observations are used to create each boxplot. The labels on the x-axis mark n_0. The theoretical value is −1/3. The location t_0 is the endpoint of the trace.

Figure 2.2: Boxplot of log ||X̂_n(t_0) − x(t_0)||/log n for the Y-pattern. We traced each fiber for 30 steps of size δ = 0.02. 100 independent samples of n = n_0^3 observations are used to create each boxplot. The labels on the x-axis mark n_0. The theoretical value is −1/3. We trace the main branch.

Figure 3.1: We trace the integral curve using the Carmichael and Sakhanenko (2015) method creating the "Y" pattern. Here sample size n = 60^3, signal-to-noise ratio SNR = 2, thickness of the fiber ε = 0.04, step size δ = 2 and the constant β = 10^{-7}.

Figure 3.2: The red, blue and green lines show the 25th, 50th and 75th percentiles of the κ values across all the sample sizes n = n_0^3, n_0 = 30, . . . , 100 with n_0 increasing by 10 at each step, repeated 100 times. Here we have used step size δ = 0.02 and β = 10^{-7} for each of the iterations.

Figure 3.3: The computation times in seconds for simulation of κ values for different choices of SNR and thickness values.

Figure 3.4: A neuronal fiber bundle across the genu of corpus callosum is created based on the Carmichael and Sakhanenko (2015) method. In (a) one particular seed point was used. The confidence ellipsoids along the fiber have 95% confidence. In (b) several seed points were used to create several estimated fibers. Two different branches are shown in two colors.

LIST OF ALGORITHMS

Algorithm 1: Estimation of Integral Curves
Chapter 1

Introduction

Neuroimaging is one of the most important biomedical imaging tools and plays a key role in detecting anomalies in the human brain due to injuries such as concussions, as well as brain tumors, cognitive impairments, the onset of Alzheimer's Disease (AD) and many other brain related illnesses. Thus, it is often imperative to have a proper imaging technique which can better equip neurosurgeons and clinicians to administer the correct treatment for their patients. The two most common imaging techniques in use at present are computed tomography (CT) and magnetic resonance imaging (MRI). While a CT scan uses X-rays, MRI uses radio waves, causing less potential harm to human tissue due to radiation from high frequency beams. In our work we will specifically explore some interesting statistical properties of one such imaging technique, high angular resolution diffusion imaging (HARDI), which builds on diffusion tensor (DT-MRI) scans.

1.1 Review of DT-MRI

Magnetic resonance imaging (MRI) utilizes the dynamics of self-spinning protons, most commonly in water molecules, as the source of energy to generate the MRI signal. Under a strong magnetic field, a group of these spins forms a net magnetization. This net magnetization can be perturbed by a radio frequency (RF) electromagnetic wave. Its wobbling (precessing) phenomenon can be measured as signals by an MRI scanner. By manipulating the magnetic field using gradients, we can identify the locations of the signal, which in turn allows us to generate images. As MRI involves no ionizing radiation and thus the potential damage to the human body is minimal, it has become an important tool in both clinical and research applications.

The phenomenon of water diffusion is further taken advantage of in MRI to develop diffusion weighted imaging (DWI). Water diffusion in the presence of a magnetic field gradient leads to MRI signal loss. In an unrestricted environment, water and other molecules move or diffuse randomly in three dimensions as a result of thermal energy. This motion is called Brownian motion. By studying the Brownian motion of molecules (water molecules in our case) in the brain, we can provide information regarding the neuronal structural connectivity in vivo. These measurements have been made possible with DWI, see Basser et al. (1994), Le Bihan et al. (2001), which applies diffusion-weighted gradients in various directions to assess the diffusing directions of the water molecules. With DWI data, as in the commonly used diffusion tensor imaging (DTI) techniques, diffusivity values and the principal diffusion orientation can be estimated at each voxel. Since healthy axons contain intact myelin sheaths and tend to align in organized orientations, water diffusivity in a voxel tends to be preferentially along the direction of the axonal bundles. By inspecting the orientations of the diffusion tensors at neighboring voxels, axonal fiber bundles can be traced. Successful axonal tracing can be used to understand the structural connections between brain regions, see Le Bihan et al. (2001), Zhu and Majumdar (2014). It can also be used to assess axonal changes over time in applications such as brain maturation in young children, see Chang and Zhu (2013), and axonal degeneration in Alzheimer's disease, see Zhu et al. (2013).
However, successful tractography based on DWI data faces some fundamental challenges: specifically, the need for a high signal-to-noise ratio (SNR), high spatial resolution, a relatively long scan time, the ability to resolve crossing fibers, full coverage of the tracts of interest, and the ability to trace in regions with low diffusion anisotropy. To address the issue of crossing fibers, high angular resolution diffusion imaging (HARDI), proposed by Özarslan and Mareci (2003) within the DWI framework, has gained some success. The issues related to neuronal fiber tractography in DWI motivated our research on integral curve estimation.

1.2 Review of minimax lower bounds

Here we review some of the fundamental principles and methods that we commonly use in our model to find minimax lower bounds. Suppose Θ is a nonempty set commonly referred to as the parameter space, and A is a nonempty set of actions available to the statistician, called the action space. Let w be a non-negative real-valued function defined on Θ × A, referred to as the loss function. Also, suppose X is a random variable from the probability space (X, B, P_θ). A statistical decision problem, or statistical game, is a game (Θ, A, w) coupled with an experiment involving a random variable X whose distribution P_θ depends on the value of the unknown parameter θ ∈ Θ. On the basis of the outcome of the experiment X = x, the statistician chooses an action d(x) ∈ A. Such a function d, which maps the sample space X into A, is called a decision or a statistical decision rule. Therefore d(X), and hence w(θ, d(X)), is a random quantity. The non-negative quantity defined by

E_θ w(θ, d(X)) = ∫ w(θ, d(x)) dP_θ(x)

is called the risk function of the decision rule d. To illustrate, consider the following example: let X_1, . . . , X_n be a sample from N(µ, 1), where µ ∈ R is unknown and R is the parameter space. Suppose based on X_1, . . . , X_n we define the decision rule d(X_1, . . . , X_n) = X̄; then under squared error loss the risk function is given by E_µ(X̄ − µ)^2, which equals 1/n for every µ.

The fundamental problem of decision theory can be stated as follows: given a game (Θ, A, w) and a random observable X whose distribution depends on θ ∈ Θ, how can we choose the best decision rule? Traditionally there are two fundamental approaches to finding an optimal decision rule, see Ferguson (1967). The first is to restrict the available decision rules; examples of such restrictions include unbiasedness and invariance. The second is to order the decision rules; examples are the Bayes principle and the minimax principle. In our work we are interested in the minimax principle. A decision rule d_0 is said to be minimax if

sup_{θ ∈ Θ} E_θ w(θ, d_0) = inf_d sup_{θ ∈ Θ} E_θ w(θ, d).

In other words, a minimax decision rule, if it exists, is the decision rule that minimizes the maximum risk among all possible decision rules d. There are many proposed methods for finding minimax decision rules. In light of our problem we will review Hájek's principle and the method due to Fano.

1.2.1 Hájek's principle

Hájek's principle for finding a minimax decision rule is based on the fundamental principle of Local Asymptotic Normality (LAN). As described in Ibragimov and Khasminskii (2013), suppose (X^n, B^n, P_{θ,n}), θ ∈ Θ ⊂ R^k, is a family of statistical experiments or random variables.
Then the family Pθ,n with the density f depending on θ is called locally asymptotic normal (LAN) at t ∈ Θ as n → ∞, if for some non-degenerate k × k matrix Ψ(n) = Ψ(n, t) 4 and any u ∈ Rk, the representation Zn,t(u) = dP t+Ψ(n)u,n dPt,n (Xn) = exp (cid:18) uT ∆n,t − 1 2 (cid:19) (cid:107)u(cid:107)2 + ϕn(u, t) , is vaild, where the distribution or the law of ∆n,t L(∆n,t|Pt,n) (cid:55)→ N (0,I), n → ∞, where I is the identity matrix of order k. Moreover, for any u ∈ Rk, ϕn(u, t) (cid:55)→ 0 in probability Pt,n as n → ∞. The quantities ∆n,t and Ψ(n, t) are given by Ψ(n, t) = (nI(t))−1/2 , n(cid:88) ∆n,t = (nI(t))−1/2 i=1 ∂ ln f (Xi, t) ∂t , where I(t) is the information matrix. Next we would like to review the concept of statistical regularity. A family of random variables X with the density p(x; θ), θ ∈ Θ, is called regular if 1. p(x; θ) is a continuous function on Θ for ν−almost all x. 2. X has finite Fisher’s information at each point θ ∈ Θ. 3. The function ∂ ∂θ p1/2(x; θ) is continuous in the space L2(ν), where L2(ν) is the space of functions whose second order moments with respect to measure ν are finite. Note that if the density p(x; θ) satisfies conditions 2 and 3 above it can be modified on sets of ν−measure zero (which may depend on θ) in such a manner that it becomes a 5 continuous function of θ. Furthermore, if we consider the measure ν as a probability measure and p1/2(X; θ), a random function of θ then it will satisfy the condition E(cid:16) p1/2(X; θ + h) − p1/2(X; θ) (cid:17)2 ≤ Bh2, where B > 0 is a constant. Now with the definition of statistical regularity, let us present the following theorem due to H´ajek, which is an important result to show that a family of random variables is LAN. Theorem 1. Let Θ ⊂ Rk, fj be the density of the j−th regular experiment depending on θ and the matrix Ψ2(n, θ) is a positive definite matrix and the following conditions are satisfied: 1. For any u0 > 0 n(cid:88) j=1 n→∞ sup lim |u| 0, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)Ψ−1(n) Then the family of measures Pθ,n(A) =(cid:82) ···(cid:82) n(cid:81) lim n→∞ Ej j=1 A j=1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)2+δ = 0. ∂ ln f (Xi, t) ∂t f (xj, θ)νj(dx) satisfies the LAN condition 6 for θ = t with  n(cid:88) j=1 Ψ−1(n, t) = Ij(t) −1/2 n(cid:88) and ∆n,t = Ψ−1(n, t) ∂ ln f (Xi, t) . ∂t j=1 Note that condition 2 in theorem 1 sometimes can be replaced by Lindeberg’s condition which often does not require finiteness of moments of order (2 + δ). However Lindeberg’s condition is usually harder to verify. Next we define a class of the loss functions Wε,2 as follows 1. The function w ∈ Wε,2 is non-negative on Rk, where k is the dimension of the parameter set; moreover, w(0) = 0 and w(u) is continuous at u = 0 but is not identically 0. 2. Function w is symmetric, that is w(u) = w(−u). 3. The sets {u : w(u) < c} are convex sets for all c > 0. 4. Any function w ∈ Wε,2 grows slower than exp(ε|u|2), ε > 0 as |u| → ∞. Having defined the class of loss functions Wε,2, let us state the main lemma due to H´ajek for establishing the minimax lower bound for the asymptotic risk of the estimators of θ. Lemma 1 (H´ajek’s Lemma). Suppose X1, X2, . . . is a sequence of random variables from a regular family of distributions and let the probability measure induced by Xn, Pθ,n satisfy the LAN condition at the point θ = t, with a normalizing matrix Ψ(n, t) such that Ψ(n, t) → Ψ(t) as n → ∞ where Ψ(t), t ∈ Θ, is positive definite. Then for any family of estimators 7 Tn = T (X1, . . . 
, X_n), any loss function w ∈ W_{ε,2}, and any δ > 0, we have

lim_{n→∞} sup_{|θ−t|<δ} E_θ w( Ψ^{-1}(n, t)(T_n − θ) ) ≥ (1/(2π)^{k/2}) ∫_{R^k} w(x) exp(−||x||^2/2) dx.    (1.2.1.1)

To find the minimax estimator one can optimize the left hand side of (1.2.1.1) with respect to all possible estimators of θ.

1.2.2 Fano's principle

The principle of Fano, see Devroye (1987), is built upon Shannon's information and a discretization of the parameter space. Here we first introduce the key concepts of Shannon information and Kullback–Leibler (KL) divergence. Suppose X is a discrete random variable with probabilities p_1, . . . , p_n depending on θ ∈ Θ; then

I(X, θ) = − Σ_{i=1}^n p_i log p_i

is called the Shannon information or entropy function. If X is absolutely continuous with respect to Lebesgue measure and the density f of X depends on θ ∈ Θ, then the Shannon entropy or information is given by I(X, θ) = −E_θ(log f(X)). Next we define the KL divergence. Suppose X is a random variable with density f depending on a parameter θ ∈ Θ; then the KL divergence between the densities f_θ and f_θ' is given by

K(f_θ, f_θ') = ∫ f_θ log( f_θ / f_θ' ) dν(x),

where ν is the Lebesgue measure.

Lemma 2 (Fano's Lemma). Let X be a random variable with density equal to one of the r + 1 possible densities f_1, . . . , f_{r+1}, where K(f_i, f_j) ≤ β for all i ≠ j. Let π(X) ∈ {1, . . . , r + 1} be an estimate of the index. Then

sup_i P_i( π(X) ≠ i ) ≥ 1 − (β + log 2)/log r,

where P_i is the probability measure induced by f_i.

In our work we have extended Fano's lemma to a multidimensional setting to prove the global minimax bound for the integral curve estimators. Besides the principles of Hájek and Fano, there are other useful principles due to Le Cam and Assouad, see Guntuboyina (2011), which we do not pursue in this work.

1.3 Review of Integral curve estimation

The problem of fiber tracing from the imaging data obtained from DT-MRI was first considered as a problem of integral curve estimation by Koltchinskii et al. (2007). They considered the model where a vector field v : G → R^d was observed at locations X_i ∈ G, i = 1, . . . , n, perturbed by an additive noise. The model equation is

V_i = v(X_i) + ξ_i,

where ξ_i, i = 1, . . . , n, are i.i.d. bounded random vectors with Eξ = 0 and Cov(ξ, ξ) = Σ. The neural fibers were modeled as the solution of the ODE, or equivalently the integral equation,

dx(t)/dt = v(x(t)), t ≥ 0, x(0) ∈ G   ⟺   x(t) = x(0) + ∫_0^t v(x(s)) ds.

In their work the authors developed a theoretically rigorous non-parametric approach that provides an estimate X̂(t), t ≥ 0, based on the data (X_i, V_i), i = 1, . . . , n. As an alternative method, probabilistic fiber tractography, as described in Behrens et al. (2007), is another popular technique in DWI because it can assess the relative strength of a fiber connection. However, this technique employs Monte Carlo sampling and bootstrap techniques, and depends on somewhat arbitrary prior parameter assumptions based on fully parametric models. When the parameter assumptions are incorrect, the error from the repeated Monte Carlo sampling often exacerbates the estimation problem. To build a more sophisticated data-driven model upon the methodology proposed by Koltchinskii et al. (2007), the "low-order" DTI model of Carmichael and Sakhanenko (2016) and the "high-order" HARDI model of C-S (2015) were proposed recently.
With these approaches, they demonstrated tighter confidence ellipsoids around the fibers, and more robustness in handling crossing fibers than with other DTI methods, see C-S (2015). The present work will concentrate on HARDI model by C-S (2015). To further motivate this model we display an enhanced image of fiber tracing. (a) 2D view on axial plane. (b) 3D view on axial plane. Figure 1.1: A neuronal fiber bundle across the genu of corpus callosum is created based on the C-S (2015) method and is shown on the axial plane in (a) 2D and (b) 3D views. 10 Figure 1.1 reveals the tracing of a fiber bundle across the genu of the corpus callosum by the method described in C-S (2015), which contains thick axonal fibers connecting the two cerebral hemispheres and enables the communication between them. The branches are shown in magenta and cyan colors. The blue region consists of 95% confidence ellipsoids surrounding the estimated curve, which are obtained using the asymptotics of the integral curve estimators of the fibers. Therefore, the C-S (2015) estimation method can provide a measure of uncertainty surrounding the estimated curve. In the next sections in this chapter we will present some key elements from C-S (2015). 1.4 Model Let S(x, g) denote the relative amount of water diffusion along a spatial direction g ∈ R3, (cid:107)g(cid:107) = 1 and at a location (also called a voxel) x. ¨Ozarslan and Mareci (2003) and Descoteaux et. al. (2006) have proposed a model for HARDI using a super-symmetric tensor D of order M (even) and rank R ≥ 2 by the following equation: (cid:18) S(x, g) (cid:19) S0(x) log = −c d(cid:88) d(cid:88) i1=1 iM =1 . . . Di1...iM (x)gi1 . . . giM + σ(x, g)ξg, (1.4.1) where S0(x) is the amount of water diffusion without any magnetic field gradient; σ(x, g) > 0; ξg describes the noise, and the constant c depends on several factors involved into the imaging procedure, see Carmichael and Sakhanenko (2015) for more details. Depending on the type of imaging the dimension d of the location x would either be 2 or 3. Similar construction of DT-MRI model can also be found in Ying et. al. (2007), where the authors described the concept of super-symmetric tensors in detail. 11 At any fixed location x the log-losses log are stacked into the vector Y (x) given by (cid:18)S(x, g) (cid:19) S0(x) of signal along N directions g1, ..., gN Y (x) = BD(x) + Σ1/2(x)Ξx, (1.4.2) where D is a vector representation of the super-symmetric tensor D and the matrix B is constructed out of vectors g1, ..., gN . Therefore, given a set of points x1, . . . , xn in a bounded open convex subset G of Rd, one observes Yi = BD(xi) + Σ1/2(xi)ξi, (1.4.3) where JM = (M + 1)(M + 2)/2 and N = JM m for some m ≥ 1, B ∈ RN×JM , Yi, ξi ∈ RN and Σ(Xi) is a N × N symmetric positive definite matrix. R(cid:80) A super-symmetric tensor D of the rank R and the order M (even) can be represented vr ⊗ . . . ⊗ vr for some v1, . . . , vR ∈ Rd, where the notation u ⊗ w means the by D = outer product of vectors u, w ∈ Rd, which is simply a 2D tensor with the components (cid:123)(cid:122) (cid:125) (u ⊗ w)ij = uiwj for i, j = 1, . . . , d. Also we will use v⊗M = v ⊗ . . . ⊗ v , v ∈ Rd, as an (cid:124) r=1 abbreviated notation for tensor products. Then by definition for all r = 1, . . . , R the pair M times λ(r), v(r) minimizes the Frobenius norm d(cid:88) . . . d(cid:88) (cid:16) (cid:17)2 , D (r) i1...iM − λvi1 . . . viM iM =1 i1=1 D(r) = D(r−1) − λ(r−1)(v(r−1))⊗M , D(1) = D. (1.4.4) The quantities λ(1), . . . , λ(R) and v(1), . . . 
v(R) are called the pseudo-eigenvalues and pseudo- 12 eigenvectors of the tensor D respectively; see Ying et. al. (2007) for more details. Define the integral curves arising out of the differential equations involving the pseudo- eigenvectors for r = 1, . . . , R as dx(r)(t) dt = v(r)(x(r)(t)), t ≥ 0, x(r)(0) = a ∈ G. (1.4.5) Under the HARDI model these integral curves serve as models of axonal fibers inside a human brain. 1.4.1 Assumptions The key assumptions of the estimation process in Carmichael and Sakhanenko (2015) are: (A1) G is a bounded open set in Rd with Lebesgue measure 1. It contains the support of the twice continuously differentiable everywhere, super-symmetric tensor field D : Rd (cid:55)−→ of even order M > 2 and rank 1 ≤ R ≤ (M + 2)/2. For a vector v and a tensor RdM D define the matrix-valued function T : Rd × RdM (cid:55)−→ Rd2 as d(cid:88) d(cid:88) T (v, D)km := (M − 1) . . . Dkmi3...iM vi3 . . . viM , k, m = 1, . . . d. (1.4.1.1) i3=1 iM =1 Then assume that Ker(T (v(r), D(r)) − λ(r)I) = 0 everywhere in the support of D for r = 1, . . . , R, where Ker(T ) stands for the kernel of the linear map T i.e. the space of all vectors that are zero under T . (A2) The initial point a lies inside the support of D(·). (A3) There exists a number τ > 0 such that for all t1, t2 ∈ (0, τ ) with t1 (cid:54)= t2, x(r)(t1) (cid:54)= x(r)(t2) for all r = 1, . . . , R. 13 (A4) Locations {Xj, j ≥ 1} are independent and uniformly distributed in G. (A5) The observed data {(Xj, Y (Xj)), j = 1, . . . , n}, obeys the model given in equation (1.4.3) with a fixed non-random known real-valued N × JM matrix B, an unknown symmetric positive definite N × N tensor field Σ : Rd (cid:55)−→ RN 2 continuous on G , un- observable random N -component vectors Ξj, j = 1, . . . , n, respectively. Additionally, it is assumed that BT B is invertible and EΣ4 kl(X1) < ∞, 1 ≤ k, l ≤ N . (A6) The random measurement error vectors Ξj, j ≥ 1, are i.i.d and independent of loca- tions. Also, EΞ1 = 0 and EΞ1,kΞ1,l = δkl for all 1 ≤ k, l ≤ N , where δkl is the Kronecker delta. (A7) The kernel K is non-negative and twice continuously differentiable on its bounded support. Moreover, (cid:82) K(x)dx = 1,(cid:82) xK(x)dx = 0. Rd Rd (A8) The bandwidth hn satisfies the condition nhd+3 n → β > 0 as n → ∞, where β is a known fixed number. 1.4.2 Estimation The estimation of the integral curves was proposed by Carmichael and Sakhanenko (2015) in the following steps: 14 Algorithm 1: Estimation of Integral Curves Input: B, Y, Xj, K j = 1(1)n Output: ˆX (r) n (t) 1 Estimate D at locations Xj, using the ordinary LSE defined by ˜D(Xj) = (BT B)−1BT Y (Xj), j = 1(1)n or the weighted LSE given by ˜D(Xj) = (BT Σ−1(Xj)B)−1BT Σ−1(Xj)Y (Xj), j = 1(1)n and since Σ is generally unknown, this relationship can be iterated, see Zhu et. al. (2007,2009). 2 Estimate D at every other location x ∈ G using the kernel smoothing estimator ˆD at locations in-between the measurement locations Xj using (cid:18) x − Xj (cid:19) n(cid:88) j=1 ˆDn(x) = 1 nhd n K hn ˜D(Xj), (1.4.2.1) where K is a kernel function and hn is a bandwidth. 3 Estimate v(r)(x), r = 1(1)R using the iteration method described in (1.4.4) for any x ∈ G. Thus, for all r = 1(1)R we get pseudo-eigenvalues ˆλ(r) n (x) and pseudo- eigenvectors ˆv(r) 4 Finally, estimate x(r)(t), t ∈ [0, τ ], r = 1(1)R, using the solution of the ODE given by an n (x). estimator of the integral curve x(r)(t), t ∈ [0, τ ], dˆx(r) n (t) dt = ˆv(r) n (ˆx(r) n (t)), t ≥ 0, ˆx(r) n (0) = a. 
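To make the four steps above concrete, here is a minimal end-to-end sketch in Python. It is not the implementation of Carmichael and Sakhanenko (2015): for brevity the field has a single rank-one component (R = 1) of order M = 4 in d = 2 dimensions, the design matrix B is taken to be the identity so that the least-squares fit in step 1 returns the raw observations, step 3 is replaced by a plain higher-order power iteration, and the Gaussian kernel (rather than the compactly supported kernel required by (A7)), bandwidth, noise level and step size are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
d, M = 2, 4                      # spatial dimension and (even) tensor order
n, h, delta = 4000, 0.08, 0.02   # sample size, bandwidth and step size (illustrative)

def outer_power(v, M):
    # rank-one super-symmetric tensor v (x) ... (x) v as a full d^M array
    T = v
    for _ in range(M - 1):
        T = np.multiply.outer(T, v)
    return T

def v_true(x):
    # a smooth unit vector field whose integral curves bend (a C-like pattern)
    w = np.array([-(x[1] - 0.5), x[0] - 0.5])
    return w / np.linalg.norm(w)

# Step 1 (simplified): per-location tensors observed with additive noise; with B = I
# the ordinary least-squares estimate D~(X_j) is just the observation itself.
X = rng.uniform(0.0, 1.0, size=(n, d))
D_tilde = np.stack([outer_power(v_true(x), M) + 0.05 * rng.standard_normal((d,) * M)
                    for x in X])

def D_hat(x):
    # Step 2: kernel smoothing as in (1.4.2.1), here with a Gaussian kernel
    u = (x - X) / h
    w = np.exp(-0.5 * np.sum(u ** 2, axis=1)) / (2 * np.pi) ** (d / 2)
    return np.tensordot(w, D_tilde, axes=(0, 0)) / (n * h ** d)

def leading_vector(D, iters=50):
    # Step 3 (simplified): higher-order power iteration for the leading pseudo-eigenvector
    v = np.ones(d) / np.sqrt(d)
    for _ in range(iters):
        v_new = np.einsum('ijkl,j,k,l->i', D, v, v, v)
        v = v_new / np.linalg.norm(v_new)
    return v

# Step 4: Euler integration of dx/dt = v_hat(x) from the seed point a
x, prev, path = np.array([0.5, 0.2]), None, []
for _ in range(30):
    path.append(x.copy())
    v = leading_vector(D_hat(x))
    if prev is not None and v @ prev < 0:   # keep a consistent orientation along the trace
        v = -v
    prev = v
    x = x + delta * v
print(np.array(path))

With a smooth field and moderate noise the traced polyline stays close to the integral curve of v_true; the full method additionally uses the weighted LSE of step 1, the deflation (1.4.4) to handle R ≥ 2 crossing components, and a bandwidth obeying condition (A8).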
The details and implementations of this algorithm in a simulation study and in a real data analysis can be found in Carmichael and Sakhanenko (2015).

1.4.3 Asymptotic Distribution

Under the assumptions (A1)–(A8), Carmichael and Sakhanenko (2015) established that, as n → ∞,

sqrt(n h_n^{d−1}) ( x̂_n^{(r)}(t) − x^{(r)}(t) ) →_d G(t),  t ∈ [0, τ],  for all r = 1, . . . , R,    (1.4.3.1)

where G(t) is a Gaussian process that depends on D, K, x^{(r)} and β > 0, the latter being the tuning constant from condition (A8). This result indicates that there exist asymptotically normal estimators of the respective integral curves with a convergence rate of O(n^{2/(d+3)}). The goal of this chapter is to prove that this rate is optimal in the minimax sense under some appropriate loss. Theorem 1 in the present chapter establishes the minimax rate optimality of the estimators of the integral curves in Carmichael and Sakhanenko (2015).

Another result is described in Carmichael and Sakhanenko (2015): for a fixed r = 1, . . . , R there exists a unique point τ_r ∈ (0, τ) for which the sequence

sqrt(n h_n^{d−1}) [ min_{t∈[0,τ]} |x̂_n^{(r)}(t) − z|^2 − |x^{(r)}(τ_r) − z|^2 ]    (1.4.3.2)

converges to a non-degenerate distribution. The quantity described in (1.4.3.2) represents the minimum L2 distance between the estimated integral curve and a point z. In fact a more general result holds. Let Γ be a closed subset of G, and let d(x, y) be a distance between x and y in R^d. Now define d(x, Γ) = inf_{y∈Γ} d(x, y). Then for a strictly increasing function m defined on R+ (for example m(u) = u^2 or m(u) = u, u > 0), let ϕ(x) = m(d(x, Γ)). Subsequently, the sequence (1.4.3.2) can be generalized to

sqrt(n h_n^{d−1}) [ min_{t∈[0,τ]} ϕ( x̂_n^{(r)}(t) ) − min_{t∈[0,τ]} ϕ( x^{(r)}(t) ) ],  r = 1, . . . , R.    (1.4.3.3)

Theorem 2 in this chapter ensures the optimal rate of convergence of (1.4.3.3) in the minimax sense for each r = 1, . . . , R. Hence, it will guarantee that the tests of whether a fiber reaches a region, based on the statistics min_{t∈[0,τ]} ϕ( x̂_n^{(r)}(t) ), have minimax-optimal rates under appropriate loss functions.
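For later reference we record the short calculation (ours, added for convenience) that converts the bandwidth condition (A8) into the rate quoted above:

n h_n^{d+3} → β  implies  h_n ≈ (β/n)^{1/(d+3)},  so that
(n h_n^{d−1})^{1/2} ≈ ( n (β/n)^{(d−1)/(d+3)} )^{1/2} = β^{(d−1)/(2(d+3))} n^{(1/2)(1 − (d−1)/(d+3))} = β^{(d−1)/(2(d+3))} n^{2/(d+3)}.

Thus the normalization sqrt(n h_n^{d−1}) in (1.4.3.1) grows exactly at the rate n^{2/(d+3)}; for a 3D image (d = 3) the estimation error is of order n^{−1/3}, which is the theoretical value −1/3 used for log ||X̂_n(t_0) − x(t_0)||/log n in the simulation study of Section 2.3.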
While Cator (2011) has studied minimax lower bounds in the nonparametric estimation problem of a monotone regression (or density) function, Guntuboyina (2011) has provided a more generalized framework for the study of minimax lower bounds with an extensive theoretical 18 foundation in semi- and nonparametric estimation problems. Thus, given the problem of accurately estimating neural fibers we believe that minimax lower bounds for the integral curve estimators of fibers will play an important role. Sakhanenko (2012) in her work established the minimax lower bounds for the asymptotic risk of the estimators of the integral curve under a simpler model where imaging signals were modeled by a vector field perturbed by an additive noise. That work does not ensure the point-wise convergence rate of the asymptotic risk of integral curve estimators are optimal in the minimax sense when we see fiber patterns that cross each other, converge, diverge, ‘fan out’ or ‘kiss’. To assess such situations we will consider the HARDI model, and we will establish minimax lower bounds for the asymptotic risk of the fiber estimators there. Addi- tionally, our construction relaxes the orthogonality of axonal fiber tracts, a key consequence of the vector model by Sakhanenko (2012), hence the present work will generalize the results of that paper. The rest of the chapter is organized as follows. In Section 2 we will introduce the theo- rems, notations, assumptions along with the construction of the parametric subclass inside the general tensor class, in order to establish the minimax optimal rates of the asymptotic risk of the fiber estimators. Section 3 will illustrate our theory via a simulation example. The concluding remarks will be provided in Section 4. All the detailed proofs of our results along with the necessary lemmas will be described in Section 5 of this chapter. 2.1.1 Notations Throughout our work the following notations are used. For x ∈ Rd, xl represents its l-th coordinate, l = 1, . . . , d, | · | denotes the absolute value of a real number, (cid:107)·(cid:107) represents the standard Euclidean norm of a vector, and for any matrix A, (cid:107)A(cid:107)2 F = T r(AAT ) denotes 19 the matrix norm (or otherwise known as Frobenius norm). Similarly, for a tensor D, we define (cid:107)D(cid:107)2 , see De Lathauwer et. al. (2000) for reference. Also D2 . . . d(cid:80) d(cid:80) F := d(cid:80) i1=1 iM =1 i1...iM (cid:104)x, y(cid:105) = xiyi. Next, for a function f : Rd (cid:55)−→ R, ∇f denotes the gradient and ∇2f denotes the tensor (matrix) field of the second order partial derivatives of f. Finally, Det(·) i=1 denotes the determinant value of the corresponding matrix. 2.2 Assumptions and main results In this section the general results regarding minimax optimality of the estimators of the integral curves defined in (1.4.5), are stated along with additional notations and assumptions. 2.2.1 Assumptions We require a slight modification of the assumptions introduced in the previous chapter, in order to propose the results for minimax optimality of the integral curve estimators. (A9) Conditions (A1)–(A6) hold as described in the previous chapter. (A10) Noise variables {ξi : i = 1, 2, . . .} are independent and identically distributed with a common density function f : RN (cid:55)−→ R+ independent of {Xi : i = 1, 2, . . .}. (A11) The function (A12) (cid:82)(cid:13)(cid:13)∇2√ f (y)(cid:13)(cid:13)2 √ f is twice differentiable everywhere in RN . dy < ∞ and(cid:82) f (y)(cid:107)∇ log f (y)(cid:107)4dy < ∞. 
(A13) In addition to (A5), for all x ∈ G the matrix Σ(x) is finite positive definite such that its smallest eigenvalue µmin(x) ≥ µ > 0. 20 Note that since f is a density function, assumption (A9) guarantees that there exists an open set S ∈ RN , where ∇√ (cid:82) f (y)(cid:107)∇ log f (y)(cid:107)αdy < ∞ for any α ∈ [0, 4] by means of H¨older’s inequality. Assumption dition (F3) in Sakhanenko (2012), which along with the fact that f is a density, implies f is not zero. Assumption (A12) is similar to the con- (A13) on the scaling matrix is not very restrictive, and it allows to bound any arbitrary finite power of Σ−1 matrix within the set G in its matrix norm. Throughout this chapter, the following classes are considered. For a fixed τ > 0 let D2(a, G, τ ) be the class of super-symmetric tensor fields D which satisfy conditions (A1)– (A3). Let W be the class of all non-trivial even functions w : R (cid:55)−→ R+ that are 0 at 0, and whose subgraphs are convex, see Ibragimov and Khasminskii (2013). Examples of such functions could be u2,|u|, I(|u| > c). Let En(τ ) denote the class of all possible estimators of the integral curves x(r)(t), r = 1, . . . , R, t ∈ [0, τ ], based on the data. Theorem 2. Let τ > 0, 1 ≤ R ≤ (M + 2)/2. Suppose the assumptions (A9) – (A13) hold. Then for any point a ∈ G, any t0 ∈ (0, τ ], any function w ∈ W and any unit vectors e1, . . . , eR ∈ Rd the following holds (cid:104) ˆX (1) n (t0) − x(1) n (t0), e1(cid:105) ... (cid:104) ˆX (R) n (t0) − x(R) n (t0), eR(cid:105) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   > 0. lim inf n→∞ inf n ∈En(τ ) n ,..., ˆX (R) ˆX (1) sup D∈D2(a,G,τ ) Ew n2/(d+3) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  The result simply means that the integral curve estimator ˆX (r) n (t0), r = 1(1)R, minimizes the maximum risk among all the estimators inside the class D2(a, G, τ ) at any point t0 ∈ (0, τ ]. In calculation of the asymptotic risk we use the Euclidean norm of error projections appropriately scaled by the asymptotic rate of convergence n2/(d+3). It is interesting to note that the errors in estimation ˆX (r) n (t0) − x (r) n (t0), r = 1(1)R are projected on the directions 21 given by unit vectors e1, . . . , eR that are not needed to be orthogonal. In our proof we would use a tensor field of rank R = 2, but our result can be generalized for any tensor field of rank R ≥ 2 in a similar fashion. Next, in order to state Theorem 2, let us define for a fixed τ > 0 the class Fn(τ ) of all (cid:16) (cid:17) possible estimators of inf t∈[0,τ ] ϕ (r) n (t) x based on the data. Theorem 3. Suppose the assumptions (A9) – (A13) hold. Then for any closed subset Γ ⊂ G, any point a ∈ G\Γ such that ϕ is continuously differentiable at a with ∇ϕ(a) (cid:54)= 0, for any τ > 0 and any function w ∈ W we have lim inf n→∞ inf n ∈Fn(τ ) ˆF (1) n ,..., ˆF (R) sup D∈D2(a,G,τ ) Ew n2/(d+3) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  n − inf ˆF (1) t∈[0,τ ] ... n − inf ˆF (R) t∈[0,τ ] (cid:16) (cid:16) ϕ ϕ x(1) n (t) x(R) n (t) (cid:17) (cid:17) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   > 0. Theorem 3 in this chapter corresponds to the connectivity test described in Corollary 1 of Theorem 1 in C-S (2015). 
The connectivity of brain fibers refers to a question of whether a curve starting at a given initial point a travels through a region of interest z. For example, there is a C pattern across the genu of Corpus Callosum that connects the (cid:16) (cid:17) the functional inf t∈[0,τ ] ϕ (r) n (t) x left and right lobes in a human brain as on Figure 1. In such situations the estimator of , r = 1(1)R constructed using the integral curve estimator ˆX (r) n (t0), r = 1(1)R, is also minimax, which means that it minimizes the supremum of the error in estimation among all other estimators uniformly, with respect to the class of tensors D2(a, G, τ ) under any arbitrary loss function w ∈ W. Next, we describe the parametric subclass construction, which will be pivotal for estab- lishing the two theorems that we have described above. 22 2.2.2 Parametric subclass of tensors The main idea of the proofs is to bound the supremum over D2(a, G, τ ) from below by a supremum over a suitable parametric subclass and connect the deviation between estimated integral curve and the true integral curve with the deviation between the parameter and its estimator. Then we will apply H´ajek’s Lemma that provides the positive lower bound for the parameter estimators for the parameters inside the subclass. Thus, in this section we construct the parametric subclass. Without loss of generality assume R = 2. For the parameters θ1, θ2 ∈ R we select numbers λ1 > λ2 > 0 and vector- 0 , g : Rd (cid:55)−→ Rd in a special way described after the tensor class construction. as its i1, .., iM -th Recall that for a vector v the super-symmetric tensor v⊗M has vi1 (1) 0 , v fields v . . . viM (2) component. Define (1) (1) (2) (2) n (x, θ2)⊗M n (x, θ1)⊗M + λ2v 0 (x) + θ1n−αgn(x))⊗M} + λ2{(v 0 (x)⊗M + θ1n−αv 0 (x)⊗M + θ2n−αv Dn(x, θ) = λ1v = λ1{(v = λ1{v + λ2{v = D0(x) + θ1n−αD1(x) + θ2n−αD2(x) + n−2αM + n−2αM 0 (x)⊗M−1 ⊗ gn(x) + n−2αM 0 (x)⊗M−1 ⊗ gn(x) + n−2αM 0 (x) + θ2n−αgn(x))⊗M} n (x, θ1)} n (x, θ2)} (1) n (x, θ1) (2) n (x, θ2), (1) (2) (1) (2) (1) (2) (2.2.2.1) where gn(x) = g(x1, nγx2, ..., nγxd), γ > 0, (2.2.2.2) 23 D0(x) = λ1v D1(x) = λ1v D2(x) = λ2v M (1) n (x, θ1) = λ1 M (2) n (x, θ2) = λ2 (2) 0 (x)⊗M , (1) (1) (2) 0 (x)⊗M + λ2v 0 (x)⊗M−1 ⊗ gn(x), 0 (x)⊗M−1 ⊗ gn(x), M(cid:88) 0 (x)⊗M−i ⊗ (θi M(cid:88) 0 (x)⊗M−i ⊗ (θi i=2 (1) v (2) v 1n−iαgn(x)⊗i), 2n−iαgn(x)⊗i). (2.2.2.3) i=2 It is evident that the pseudo-eigenvectors of Dn are (1) n (x, θ1) = v v (2) n (x, θ2) = v v (1) 0 (x) + θ1n−αgn(x), 0 (x) + θ2n−αgn(x). (2) In addition to the assumptions made in Section 2.2, we assume the following for the pseudo- eigenvalues and pseudo-eigenvectors in the parametric subclass of the tensors. (A14) Numbers α, γ > 0 are chosen so that 1 − γ(d − 1) − 2α = 0, α = 2/(d + 3). (A15) The numbers λ1, λ2 representing pseudo-eigenvalues of the tensor Dn are such that 0 (x) : G (cid:55)−→ Rd are bounded and continuous 0 (x)} has zero Lebesgue measure, (1) 0 (x) = cv (1) λ1 > λ2 > 0. The vector fields v 0 (x), v such that for any c ∈ R the set {x : v (2) (2) 24 and the following inequality holds (cid:32) (M − 1)qM−2(x)λ2 λ1 1 + (cid:20)(M − 1)q2(x) (M − 2) (cid:21)(cid:33) − 1 (cid:54)= 0, where q(x) = (cid:104)v (1) 0 (x), v (2) 0 (x)(cid:105). Additionally, we assume the integral curves associated with v (i) 0 , i = 1, 2, satisfy con- dition (A3). (A16) The constants θ1, θ2 ∈ R are bounded. 
The function g : G (cid:55)−→ Rd is such that (cid:107)g(cid:107)M is L4 integrable on Rd and the following relationships hold (cid:107)v (i) 0 (x) + θ1n−αgn(x)(cid:107) = 1, for all x ∈ G, i = 1, 2. Note that (A14) boils down to γ = (1 − 2α)/(d − 1) = 1/(d + 3) = α/2. Also note that (1) 0 (x), v v (2) 0 (x) and g have the support G. Therefore, all the tensor fields also have the support G, thus (A2) is trivially satisfied for Dn. Moreover, (A16) is quite natural since the (i) perturbations from v 0 are assumed to be small. Finally, it is easy to note that if q = 0 then it implies that v (1) 0 (x) and v (2) 0 (x) are orthogonal, and hence, (A15) will be trivially satisfied.Therefore, the class containing v (1) 0 (x), v (2) 0 (x) and g satisfying (A15) and (A16) is fairly large. 25 2.3 Experiment We begin this section by introducing the imaging protocol behind HARDI. Figure 1.1 in the introduction was obtained from a Diffusion Weighted Imaging (DWI) dataset that was collected from a 26-year-old healthy male brain on a GE 3T Signa HDx MR scanner (GE Healthcare, Waukesha, WI) with an 8-channel head coil. The subject signed the consent form approved by the Michigan State University Institutional Review Board. DWI images were acquired with a spin-echo echo-planar imaging (EPI) sequence for 13 minutes with the following parameters: 48 contiguous 2.4-mm axial slices in an interleaved order, FOV = 22 cm × 22 cm, matrix size = 128 × 128, number of excitations (NEX) = 1, TE = 72.3 ms, TR = 11.5 s, 60 diffusion-weighted volumes (one per gradient direction) with b = 1000 s/mm2, 6 volumes with b = 0 and parallel imaging acceleration factor = 2. The seed point is chosen in the corpus callosum (CC), which contains thick axonal fibers connecting the two cerebral hemispheres and enabling the communication between them. The general anatomical locations of these axonal fibers are well established. These fibers are often used to evaluate new techniques in fiber tractography. On Figure 1.1 a fiber in the anterior part of the CC, called the genu of CC, is constructed with δ = 0.003, β = 10−7. The branches are shown in magenta and cyan colors, they were traced for 70 and 50 steps. According to Theorem 1 log (cid:107) ˆXn(t0) − x(t0)(cid:107)/ log n is asymptotically close to −2/(d+3), which is −1/3 in case of 3D image. Even though the general anatomical locations of some fibers are well studied, the exact true fibers x(t0) are not quite known. Also the same subject is usually not scanned repeatedly for 100 times to assess the estimation error empirically. That is why we will illustrate our results via an artificial simulation study to trace some 26 typical patterns of fibers when the underlying truth is known mathematically. We consider the 2 typical patterns for axonal fibers: C-pattern and Y-pattern. We use the same design of these patterns as described in Sakhanenko and DeLaura (2017). For each pattern the tensor field was generated perturbed by a normal noise. Then a random sample of n0 × n0 × n0 observations on a regular grid was taken, and a fiber was estimated. The distance (cid:107) ˆXn(t0) − x(t0)(cid:107) at the endpoint was computed. The procedure was repeated 100 times independently. For Y-pattern the endpoint t0 is chosen on the main branch. We used the fiber thickness 0.04 and signal-to-noise ratio of 5 in the setup of Sakhanenko and DeLaura (2017). The tuning parameter β was 0.0001. 
The results are presented in Figures 2.1 and 2.2 as boxplots of the 100 values of log ||X̂_n(t_0) − x(t_0)||/log n for each sample size n = n_0^3.

Figure 2.1: Boxplot of log ||X̂_n(t_0) − x(t_0)||/log n for the C-pattern. We traced each fiber for 30 steps of size δ = 0.02. 100 independent samples of n = n_0^3 observations are used to create each boxplot. The labels on the x-axis mark n_0. The theoretical value is −1/3. The location t_0 is the endpoint of the trace.

Figure 2.2: Boxplot of log ||X̂_n(t_0) − x(t_0)||/log n for the Y-pattern. We traced each fiber for 30 steps of size δ = 0.02. 100 independent samples of n = n_0^3 observations are used to create each boxplot. The labels on the x-axis mark n_0. The theoretical value is −1/3. We trace the main branch.

The boxplots of the values log ||X̂_n(t_0) − x(t_0)||/log n hover around the theoretical value −1/3 for all sample sizes and both patterns. They are uniformly closer to the theoretical value for the C-pattern than for the Y-pattern, which is expected due to branching.

2.4 Remarks and Discussion

In the construction of the proof we have assumed that the order M of the tensor D can be any even number. Recent developments in the research have shown cases where M could be taken as 4 or 6. In general d is always 2 or 3, depending on the imaging technique. Therefore, if order M = 4 tensors are studied then the rank of the tensor can be at most R = 3, see (A1). In that situation the proofs of Theorems 2 and 3 will follow along the lines of Sakhanenko (2012). The idea comes from the fact that v_n^{(1)}(x, θ_1) and v_n^{(2)}(x, θ_2) can be chosen orthogonal to each other. Thus, v_n^{(r)}(x, θ_r), 1 ≤ r ≤ 3, can be chosen to be unit vector fields along the coordinate axes, and therefore the orthogonality can be utilized in the proof. But for the
We will then use H´ajek’s lemma (lemma 7) for the vector of parameters (θ1, θ2) to establish the lower bound for the asymptotic risk of the integral curve estimators in D2(a, G, τ ), see e.g. the (i) (i) n (t0) − x book by Ibragimov and Khasminskii (2013). But Hajek’s lemma requires the family of 29 densities derived in Lemma 2 to be so-called locally asymptotically normal (LAN) with well- defined limit of information matrices that are defined in Lemma 3. Lemmas 4 and 5 check LAN condition, meanwhile Lemma 5 and Lemma 6 establish conditions required for H´ajek’s lemma. In particular, Lemma 6 proves Lyapunov’s condition. 2.5.1 Constructed parametric subclass The representation (2.2.2.1) is the parametric construction of the tensor field Dn. In order to establish the minimax lower bound this construction will be followed throughout this chapter. Below the first Lemma is presented which ensures that the construction as proposed above satisfies the assumption (A1) that has been stated in the introduction, see Sakhanenko et. al. (2015) for reference. Recall (1) n (x, θ1) = v v (2) n (x, θ2) = v v (1) 0 (x) + θ1n−αgn(x), 0 (x) + θ2n−αgn(x). (2) For the ease of notation define the following expressions D (2) n (θ2) = λ2v D (1) n (θ1) = λ1v (2) n (x, θ2)⊗M n (x, θ1)⊗M + λ2v (1) (2) n (x, θ2)⊗M (2.5.1.1a) (2.5.1.1b) (2.5.1.2) (2.5.1.3) 30 T (v (i) n (x, θi), D (i) n (θi))km d(cid:88) d(cid:88) D (i) (n)kmi3...id (θi)v (i) (n)i3 (x, θi) . . . v (i) (n)iM (x, θi), (2.5.1.4) =(M − 1) . . . i3=1 iM =1 where gn(x) is defined in equation (2.2.2.2). It is also assumed that for i = 1, 2: (cid:107) v (i) n (x, θi) (cid:107)= 1, subsequently (cid:107) v (i) 0 (x) (cid:107)=: ci < 1. (2.5.1.5) Lemma 3. For the tensors D (2) n (θ2) and D (1) n (θ1) the following relations hold for all x ∈ G Ker(T (v Ker(T (v (2) n (x, θ2), D (1) n (x, θ1), D (2) n (θ2)) − λ2I) = 0, n (θ1)) − λ1I) = 0. (1) (2.5.1.6) (2.5.1.7) 31 Proof. Det(T (v (2) n (x, θ2), D d(cid:88) (2) n (θ2)) − λ2I) d(cid:88) =Det (M − 1) . . . (cid:16) i3=1 iM =1 (cid:17) − λ2δkm (cid:16) =Det (M − 1) λ2v d(cid:88) (v (2) (n)iM iM =1 . . . ( =λd =Det((M − 1) λ2 v 2 Det((M − 1) v d(cid:88) 2 {−(M − 1) ( =λd i=1 D (2) (n)kmi3...id (θ2)v (2) (n)i3 (x, θ2) . . . v (2) (n)iM (x, θ2) d(cid:88) i3=1 (v (2) (n)i3 (x, θ2))2) (2.5.1.8) (cid:17) (2) (n)k(x, θ2) v (2) (n)m(x, θ2) ( (x, θ2))2) − λ2δkm (2) (n)k(x, θ2) v (2) n (x, θ2) v (2) (n)m(x, θ2) − λ2I) by the assumption in 2.5.1.5 n (x, θ2)T − I) (2) (2) (n)i(x, θ2))2) + 1} by algebraic manipulation (v = − λd 2(M − 2) (cid:54)= 0 is ensured as long as M > 2. Similarly for equation (2.5.1.7) it can be seen Det(T (v (1) n (x, θ1), D (1) n (θ1)) − λ1I) d(cid:88) d(cid:88) . . . i3=1 iM =1 =Det((M − 1) − λ1δkm). D (1) (n)kmi3...id (θ1)v (1) (n)i3 (x, θ1) . . . v (1) (n)iM (x, θ1) (2.5.1.9) 32 To simplify the expression consider d(cid:88) d(cid:88) i3=1 . . . . . . = d(cid:88) d(cid:88) iM =1 D (cid:16) i3=1 iM =1 (1) (n)kmi3...id (θ1)v (1) (n)i3 (x, θ1) . . . v (1) (n)iM (x, θ1) λ1 v (1) (n)k(x, θ1) v (1) (n)m(x, θ1) (v (1) (n)i3 (x, θ1))2 . . . (v (1) (n)iM (x, θ1))2 + λ2 v (2) (n)k(x, θ2) v (2) (n)m(x, θ2) (v (x, θ1)v (2) (n)i3 (x, θ2)) (1) (n)i3 (cid:17) . . . (v (1) (n)iM (1) (2) (n)iM (x, θ1)v (x, θ2)) (n)m(x, θ1) + λ2 qM−2 n v =λ1 v (1) (n)k(x, θ1) v (2) (n)k(x, θ2) v (2) (n)m(x, θ2), where Denote the matrix d(cid:88) i=1 qn = (1) (n)i(x, θ1)v v (2) (n)i(x, θ2), qM−2 n = q(cid:48) n > 0, since M > 2 is even. A = (M − 1)λ1v (1) n (x, θ1)v (1) n (x, θ1)T − λ1I, which is non-singular by (2.5.1.8). 
Rewrite equation (2.5.1.9) as follows (2.5.1.10) (2.5.1.11) n (θ1)) − λ1I) (1) Det(T (v (1) n (x, θ1), D = Det((M − 1)[λ1 v = Det(A + {(M − 1)λ2q(cid:48) nv (2) n (x, θ2)}v (2) n (x, θ2)T ) (1) n (x, θ1) v (1) n (x, θ1) + λ2 q(cid:48) (2) n (x, θ2) v n v (2) n (x, θ2)] − λ1I) (2.5.1.12) = Det(A)(1 + v (2) n (x, θ2)T A−1(M − 1)λ2q(cid:48) nv (2) n (x, θ2)), 33 where we used the identity Det(A + uvT ) = (1 + vT A−1u)Det(A). Next, we apply the identity to obtain A−1 (B + ceT )−1 = B−1 − B−1ceT B−1 1 + eT B−1c . (2.5.1.13) The expression for A−1 becomes (1) n (x, θ1)}v (1) n (x, θ1)T (− 1 I) λ1 n (x, θ1)} (1) I){(M − 1)λ1v (1) n (x, θ1)T A−1 = − 1 λ1 I − 1 + v (− 1 λ1 (1) I){(M − 1)λ1v n (x, θ1)T (− 1 λ1 (1) n (x, θ1)v 1 − (M − 1) (1) n (x, θ1)v (2 − M )λ1 (1) n (x, θ1)v (M − 2) I − (M − 1)/λ1v I − (M − 1)v (M − 1)v −I + = − 1 λ1 = − 1 λ1 (cid:34) = 1 λ1 (1) n (x, θ1)T (1) n (x, θ1)T (2.5.1.14) (cid:35) . 34 Using above rewrite equation (2.5.1.12) (cid:17) (2) n (x, θ2)T A−1{(M − 1)λ2q(cid:48) nv (M − 1)v (2) n (x, θ2) (cid:105) (1) n (x, θ1)v (M − 2) (1) n (x, θ1)T (cid:35) (2.5.1.15) (cid:16) (cid:32) (cid:32) =Det(A) Det(A) 1 + v =Det(A) 1 + v (cid:2)(M − 1)λ2q(cid:48) (2) n (x, θ2)T 1 λ1 (cid:104) − I + n (x, θ2)(cid:3)(cid:33) (cid:34) (cid:105)(cid:33) (2) n (x, θ2)T + −v nv (2) 1 + (2) n (x, θ2) 1 (cid:104) λ1 (M − 1)λ2q(cid:48) (cid:20) nv (cid:34) −(M − 1)λ2q(cid:48) n + −(M − 1)λ2qM−2 1 λ1 1 λ1 (M − 1)qM−2 (cid:32) n λ2 =Det(A)(1 + =Det(A)(1 + =Det(A) 1 + n λ1 qn(M − 1)v (1) n (x, θ1)T (M − 2) n (cid:21) (M − 2) ) λ2(M − 1)2qM n(M − 1)2λ2q(cid:48) q2 (cid:21)(cid:33) (M − 2) (cid:20)(M − 1)q2 + n n − 1 (M − 2) (cid:54)= 0. (cid:35) ) (1) 0 (x) and v (2) Note if g, v 0 (x) are orthogonal to each other then the inequality in (2.5.1.15) holds for any n provided R ≤ d − 1. In general, a close investigation of the quantity given 0 (x)(cid:105) + o(1) and together with by qn = (cid:104)v n (x, θ2)(cid:105) reveals that qn = (cid:104)v (1) n (x, θ1), v (1) 0 (x), v (2) (2) (A13), it implies that the above inequality holds for all n large enough. Thus, this Lemma shows that the parametric reconstruction of the tensor subclass is compatible with (A1) and can be used for investigation of asymptotic lower bounds for the estimation of the integral curve. 35 2.5.2 Main Lemmas The next section will use the ideas and techniques mentioned in Ibragimov and Khasminskii (2013) in order to utilize it further in the proofs of the two Main Theorems. The lower bound of rates of convergence for the difference in parameter and its estimate are obtained from the results for LAN families of distributions. A well known result due to H´ajek (1972) will be presented in this subsection. Lemma 4 will introduce the density function for the subclass model. Lemma 5 and 6 would imply that the estimation problem inside parameter subclass yields a regular experiment. Lemma 5, 7, and 8 would imply the conditions of the Theorem II.3.1’ in Ibragimov and Khasminskii (2013), which would further imply that the family of distributions in (2.5.2.2) is LAN. Note that lemma 9 is somewhat similar to the formula II.12.19 in Ibragimov and Khasminskii (2013), which also known as the Lemma due to H´ajek (1972). Consider a typical observation under the parametric subclass. It would be written as (cid:32) (cid:16) (xi, yi) = xi, B D0(xi) + θ1n−αD1(xi) + θ2n−αD2(xi) + n−2αM (cid:17) (cid:33) (1) n (x, θ1) (2.5.2.1) + Σ1/2(xi)ξi , i = 1, . . . , n. + n−2αM (2) n (x, θ2) Lemma 4. 
Then the function given by: φn(x, y, θ1, θ2) = f (Σ−1/2(x)(y − B(D0(x) + θ1n−αD1(x) + n−2αM + θ2n−αD2(x) + n−2αM (1) n (x, θ1) n (x, θ2))))Det(Σ−1/2(x))I(x  G) (2) (2.5.2.2) represents the density for (X, Y ). Proof. Note that φn(x, y, θ1, θ2) ≥ 0. To check that (cid:82)(cid:82) φn(x, y, θ1, θ2)dxdy = 1 let us 36 consider the following: (cid:90) (cid:90) f (Σ−1/2(x)(y − B(D0(x) + θ1n−αD1(x) + n−2αM (1) n (x, θ1) + θ2n−αD2(x) + n−2αM (2) n (x, θ2))))Det(Σ−1/2(x))I(x  G). (2.5.2.3) Apply the transformation (cid:16) ˜y = Σ−1/2(x) y − B(cid:0)D0(x) + θ1n−αD1(x) + n−2αM (1) n (x, θ1) n (x, θ2)(cid:1)(cid:17) (2) + θ2n−αD2(x) + n−2αM ˜x = x, , (2.5.2.4) where the Jacobian matrix is given by ∂x ∂ ˜x ∂y ∂ ˜x Id  = ∂x ∂ ˜y ∂y ∂ ˜y J = ∂x ∂ ˜y 0 Σ−1/2(˜x)  . (2.5.2.5) Therefore, | Det(J) |=| Det(Σ1/2(˜x)) |, and hence the following is obtained (cid:90) (cid:90) f (˜y) | Det(Σ−1/2(˜x)) || Det(Σ1/2(˜x)) | I(˜x  G)d˜xd˜y = 1. 37 Define the following quantities, which are components of information matrices ∂θ1 ∂θ2 ∂ ln φn φn(x, y, θ1, θ2)dxdy, |∂ ln φn ∂θ1 |∂ ln φn ∂θ2 |2φn(x, y, θ1, θ2)dxdy, |2φn(x, y, θ1, θ2)dxdy, (cid:90) (cid:90) (cid:90) (cid:90) (cid:90) (cid:90) ∂ ln φn | (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD1(x1, 0, . . . , 0)(cid:105) |2 dxdy, (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD1(x1, 0, . . . , 0)(cid:105) (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD2(x1, 0, . . . , 0)(cid:105)dxdy, | (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD2(x1, 0, . . . , 0)(cid:105) |2 dxdy. Ψ11(n) =n Ψ22(n) =n Ψ12(n) =n (cid:90) (cid:90) [c1,c2]×Rd−1 [c1,c2]×Rd−1 (cid:90) (cid:90) (cid:90) RN RN (cid:90) (1) 0 = I (12) 0 = I (2) 0 = I RN [c1,c2]×Rd−1 Lemma 5. Both the matrix I0 as well as Ψ(n) are finite and positive definite, and Ψ11(n) Ψ12(n) Ψ12(n) Ψ22(n)  =  I (1) 0 (12) I 0 (12) I 0 (2) I 0  = I0. n→∞ Ψ(n) = lim lim n→∞ (2.5.2.6) (2.5.2.7) 38 Proof. For x ∈ G we have ln φn = ln f (Σ−1/2(x)(y − B(D0(x) + θ1n−αD1(x) + n−2αM ln|Σ(x)| + θ2n−αD2(x) + n−2αM (2) n (x, θ2)))) − 1 2 (1) n (x, θ1) ⇒ ∂ ln φn ∂θ1 = −f−1(cid:104)∇f, n−αΣ−1/2(x)BD1(x) + n−2αΣ−1/2(x)B∇M = −f−1((cid:104)∇f, n−αΣ−1/2(x)BD1(x)(cid:105) (1) n (x, θ1)(cid:105) (2.5.2.8) + (cid:104)∇f, n−2αΣ−1/2(x)B∇M (1) n (x, θ1)(cid:105)). In the above equation the argument of f is suppressed. Similarly, ∂ ln φn ∂θ2 = −f−1((cid:104)∇f, n−αΣ−1/2(x)BD2(x)(cid:105) + (cid:104)∇f, n−2αΣ−1/2(x)B∇M (2) n (x, θ1)(cid:105)). (2.5.2.9) A simple application of Cauchy Schwarz Inequality reveals that Ψ(n) is positive definite except for the case where (x, y) : ∂ ln φn ∂θ1 = k ∂ ln φn ∂θ2 , k ∈ R. (2.5.2.10) A careful term by term comparison of (2.5.2.10) with (2.5.2.8) and (2.5.2.9) shows that the equality holds on the set: {x : v (1) 0 (x) = cv (2) 0 (x), c ∈ R}. (2.5.2.11) By (A15) the Lebesgue measure of this set is 0. Next, a careful investigation of the quantity 39 Ψ11(n) shows: Ψ11(n) = n |2φn(x, y, θ1, θ2)dxdy | ∂ ln φn ∂θ1 f−1|(cid:104)∇f, n−αΣ−1/2(x)BD1(x)(cid:105) (cid:90) (cid:90) (cid:90) (cid:90) = n (cid:90) (cid:90) + (cid:104)∇f, n−2αΣ−1/2(x)B∇M = n + (cid:104)∇f, n−2αΣ−1/2(x)B∇M f−1|(cid:104)∇f, n−αΣ−1/2(x)BD1(x)(cid:105) (1) n (x, θ1)(cid:105)|2dxdy (2.5.2.12) (1) n (x, θ1)(cid:105)|2Det(Σ−1/2(x))I(x ∈ G)dxdy. We consider the following transformation: i = 2, . . . , d & ˜x1 = x1, ˜xi = nγxi ˜y = Σ−1/2(x)(y − B(D0(x) + θ1n−αD1(x) + n−2αM + θ2n−α2D2(x) + n−2αM (2) n (x, θ2))). (1) n (x, θ1) (2.5.2.13) Define ˜Gn = {˜x : x ∈ G}. 
Also since G is a convex set, it can be shown that ˜Gn converges to [c1, c2] × Rd−1. The Jacobian of the transformation is: ∂x ∂ ˜x ∂y ∂ ˜x  =  1 ∂x ∂ ˜y ∂y ∂ ˜y J =  0 0 n−γId−1 ∂x ∂ ˜y  0 Σ1/2(˜x1, n−γ ˜x−1) (2.5.2.14) |Det(J)| = n−γ(d−1)|Det(Σ1/2(˜x1, n−γ ˜x−1))| = n−γ(d−1)Det(Σ1/2(˜x1, n−γ ˜x−1)), where x−1 = (x2, . . . , xd). In what follows it is understood that the functional argument x changes to (˜x, n−γ ˜x−1) and the coordinate system changes from (x, y) to (˜x, ˜y). Also it is 40 understood that the range of integration remains the same for ˜y coordinate, which is RN . However, the range of integration with respect to ˜x changes to ˜Gn, which has been defined above. Now rewrite equation (2.5.2.12) suppressing the arguments ˜y, (˜x, n−γ ˜x−1), θ1, θ2 for the notational simplicity. Ψ11(n) =n1−γ(d−1) (cid:90) (cid:90) (cid:16) | (cid:104)f−1/2∇f, n−αΣ−1/2BD1(cid:105) n (cid:105) |2(cid:17) (1) (cid:90) (cid:90) + (cid:104)f−1/2∇f, n−2αΣ−1/2B∇M (cid:104) n1−γ(d−1)−2α | (cid:104)f−1/2∇f, Σ−1/2BD1(cid:105) |2 d˜xd˜y (cid:90) (cid:90) + n1−γ(d−1)−4α | (cid:104)f−1/2∇f, Σ−1/2B∇M ≤2 (cid:105) n (cid:105) |2 d˜xd˜y (1) . Det(Σ−1/2)Det(Σ1/2)d˜xd˜y Applying C.S. Inequality on the first and second terms in (2.5.2.15) yields | (cid:104)f−1/2∇f, Σ−1/2BD1(cid:105) |2 ≤ (cid:107)f−1/2∇f(cid:107)2(cid:107)Σ−1/2BD1(cid:107)2, n (cid:105) |2 ≤ (cid:107)f−1/2∇f(cid:107)2(cid:107)Σ−1/2B∇M | (cid:104)f−1/2∇f, Σ−1/2B∇M (1) (1) n (cid:107)2. (2.5.2.15) (2.5.2.16) Therefore, equation (2.5.2.15) can be simply written as Ψ11(n) ≤2n1−γ(d−1)−2α(cid:16)(cid:90) (cid:17)(cid:16) (cid:107)f−1/2∇f(cid:107)2d˜y (cid:90) (cid:90) (cid:90) (˜x1,n−γ ˜x−1)∈ ˜Gn (cid:17) (cid:107)Σ−1/2BD1(cid:107)2d˜x (cid:107)Σ−1/2B∇M (1) n (cid:107)2d˜x) + 2n1−γ(d−1)−4α( ≤2n1−γ(d−1)−2α(cid:16)(cid:90) + 2n1−γ(d−1)−4α(cid:16)(cid:90) (cid:107)f−1/2∇f(cid:107)2d˜y)( (cid:17)(cid:16)(cid:90) (cid:17)(cid:16)(cid:90) (˜x1,n−γ ˜x−1)∈ ˜Gn F(cid:107)B(cid:107)2 (cid:107)Σ−1/2(cid:107)2 (cid:107)Σ−1/2(cid:107)2 f(cid:107)∇ log f(cid:107)2d˜y f(cid:107)∇ log f(cid:107)2d˜y F(cid:107)D1(cid:107)2 F(cid:107)B(cid:107)2 F d˜x) n (cid:107)2 F(cid:107)∇M (1) F d˜x (2.5.2.17) (cid:17) , Note that because of (A5) for all x ∈ G, Σ(x) is assumed positive definite and (cid:107)Σ(cid:107)4 F < ∞ 41 and therefore if we consider an eigen-decomposition of Σ such that Σ(x) = Q(x)Λ(x)Q(x)T , where Λ(x) = Diag(µ1(x), . . . , µd(x)) consists of finite eigenvalues bounded away from 0 and ∞ for all x ∈ G, Q(x) is an orthogonal matrix. Then for all (˜x1, n−γ ˜x2, . . . , n−γ ˜xd) ∈ ˜Gn it can be seen that (cid:107)Σ−1/2(˜x1, n−γ ˜x−1)(cid:107)2 F < ∞. Also recall that Di = λi(v vectorize the symmetrized version of Di, which is given by (i) 0 )⊗M−1 ⊗ g for i = 1, 2. In order to bound Di one needs to (cid:16) (i) 0 )⊗M−2 Dsym i = 1 M λi g ⊗ (v (i) (i) 0 )⊗M−1 + v (cid:17) 0 ⊗ g ⊗ . . . ⊗ (v 0 )⊗M−1 ⊗ g(x) (i) , + . . . + (v and therefore, (cid:107)Di(cid:107) ≤ (cid:107)Dsym =⇒ (cid:107)Di(cid:107) ≤ (cid:107)Dsym i i (cid:107) = λi(cid:107)v (cid:107) = λi(cid:107)v (i) 0 (cid:107)M−1(cid:107)g(cid:107) 0 (cid:107)M−1(cid:107)g(cid:107) ∀ (˜x1, n−γ ˜x−1) ∈ ˜Gn. (i) Using a similar type of argument one can construct a symmetrized version of ∇M (i) n by symmetrizing each additive component of ∇M (i) n , and hence it can be concluded that (cid:107)∇M (i) n (cid:107) ≤ (cid:107)(∇M (i) n )sym(cid:107) ≤ λi(a (i) 2 (cid:107)v (i) 0 (cid:107)M−2(cid:107)g(cid:107)2 + . . . + a (i) M(cid:107)g(cid:107)M ), where a (i) 2 , . . . 
, a (i) M > 0 are appropriate constants. Combining these bounds it can be seen 42 that equation (2.5.2.17) becomes Ψ11(n) ≤ 2n1−γ(d−1)−2α(cid:16)(cid:90) + 2n1−γ(d−1)−4α(cid:16)(cid:90) (cid:16) (cid:90) (cid:17) (cid:107)g(˜x)(cid:107)2d˜x (cid:90) ˜Gn Ψ11 1 (cid:17)(cid:16) (cid:17) f(cid:107)∇ log f(cid:107)2d˜y L f(cid:107)∇ log f(cid:107)2d˜y Ψ11 L 2 (1) 2 (cid:107)v (1) 0 (cid:107)M−2(cid:107)g(˜x)(cid:107)2 + . . . + a (a (cid:17) < ∞ (1) M (cid:107)g(˜x)(cid:107)M )d˜x (2.5.2.18) ˜Gn Ψ11 due to assumptions in Section 3, where L 1 , L Ψ11 2 > 0 are appropriate constants. Finally, Ψ11(n) =n1−γ(d−1)−2α +2n1−γ(d−1)−3α (cid:90) (cid:90) (cid:90) (cid:90) | (cid:104)f−1/2∇f (˜y), Σ−1/2((˜x1, n−γ ˜x−1))BD1((˜x1, n−γ ˜x−1))(cid:105) |2 d˜xd˜y | (cid:104)f−1/2∇f (˜y), Σ−1/2((˜x1, n−γ ˜x−1))BD1((˜x1, n−γ ˜x−1))(cid:105) | | (cid:104)f−1/2∇f (˜y), Σ−1/2((˜x1, n−γ ˜x−1))B∇M (1) n ((˜x1, n−γ ˜x−1), θ1)(cid:105) | d˜xd˜y +n1−γ(d−1)−4α (cid:90) (cid:90) =n1−γ(d−1)−2α +o(n−α). (cid:90) (cid:90) | (cid:104)f−1/2∇f (˜y), Σ−1/2((˜x1, n−γ ˜x−1))B∇M (1) | (cid:104)∇(cid:112)f (˜y), Σ−1/2((˜x1, n−γ ˜x−1))BD1((˜x1, n−γ ˜x−1))(cid:105) |2 d˜xd˜y n ((˜x1, n−γ ˜x−1), θ1)(cid:105) |2 d˜xd˜y (2.5.2.19) Using Lebesgue DCT the following is obtained (cid:90) (cid:90) | (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD1(x1, 0, . . . , 0)(cid:105) |2 dxdy lim n→∞ Ψ11(n) = RN [c1,c2]×Rd−1 = I (1) 0 , where D1(x1, 0, . . . , 0) = λ1v (1) 0 (x1, 0, . . . , 0)⊗M−1⊗ g(x1, . . . , xd). In a very similar manner 43 it can be shown that lim n→∞ Ψ12(n) = RN (cid:90) (cid:90) (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD1(x1, 0, . . . , 0)(cid:105) (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD2(x1, 0, . . . , 0)(cid:105)dxdy = I (12) (cid:90) | (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD2(x1, 0, . . . , 0)(cid:105) |2 dxdy [c1,c2]×Rd−1 (cid:90) , 0 lim n→∞ Ψ22(n) = RN [c1,c2]×Rd−1 = I (2) 0 . Also it is very interesting to note that applying CS Inequality yields (12) (I 0 )2 < I (1) 0 I (2) 0 . The equality case can be ignored from the fact that equality happens iff (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD1(x1, 0, . . . , 0)(cid:105) =k (cid:104)∇(cid:112)f (y), Σ−1/2(x1, 0, . . . , 0)BD2(x1, 0, . . . , 0)(cid:105) , k ∈ R, ⇒v (1) 0 (x1, 0, . . . , 0) = c v (2) 0 (x1, 0, . . . , 0) , c ∈ R. (2.5.2.20) (2.5.2.21) (2.5.2.22) Again by (A15) Lebesgue measure of such a set in (2.5.2.22) is 0. Therefore, the proof is complete. The lemma below establishes the fact that the function ∇√ φn is continuous with respect to (θ1, θ2) in the space of L2(G×RN ), which along with Lemma 5 will ensure that the family of distributions φn constitutes a regular experiment. √ Lemma 6. The function ∇√ to (θ1, θ2) on L2(G × RN ). φn(x, y, θ1, θ2) ≡ (cid:18) ∂ (cid:19) √ φn ∂θ2 ∂ , is continuous with respect φn ∂θ1 44 Proof. It will be enough to show that for any l ≡ (l1, l2) ∈ R2 2 (cid:90) (cid:90) (cid:104)∇(cid:112)φn(x, y, θ1, θ2), l(cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)θ+h 2 θ where L > 0 is a finite constant. 
Note that I(x ∈ G)dxdy ≤ (cid:107)h(cid:107)2 (cid:107)l(cid:107)2 n−4α−(d−1)γL, (2.5.2.23) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2) √ ∂ φn ∂θ2 − l2 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) (2.5.2.24) = ≤2l2 1 ≤4l2 1 θ √ √ − ∂ + 2l2 2 − l1 φn ∂θ1 ∂ φn ∂θ1 √ (cid:104)∇(cid:112)φn(x, y, θ1, θ2), l(cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)θ+h l1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2)  ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2)  ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2) (cid:34) ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2)  ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) (cid:34) ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2+h2)  ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) φn ∂θ1 √ φn ∂θ1 √ φn ∂θ2 √ φn ∂θ2 φn ∂θ2 − ∂ √ √ + + +4l2 2 √ ∂ φn ∂θ1 √ φn ∂θ1 − ∂ √ − ∂ φn ∂θ1 √ φn ∂θ1 √ − ∂ φn ∂θ2 − ∂ φn ∂θ2 √ ∂ φn ∂θ2 √ + l2 φn ∂θ2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) 2(cid:35) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) 2(cid:35) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) . Recall the transformation (2.5.2.13) that we used earlier ˜y = Σ−1/2(x)(y − B(D0(x) + θ1n−αD1(x) + n−2αM (1) n (x, θ1) + θ2n−α2D2(x) + n−2αM (2) n (x, θ2))), 45 therefore, ∂ √ φn ∂θ1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) = −n−α (cid:113) (cid:28) BT Σ−1/2(x)∇(cid:112)f (˜y), D1(x) + n−α∇M (cid:29) (1) n (x, θ1) Det(Σ−1/2(x))I(x ∈ G). Then √ ∂ φn ∂θ1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2) (cid:28) = − n−α BT Σ−1/2(x)∇ (cid:113) f (˜y − Σ−1/2(x)Bh1(n−αD1(x) + n−2α∇M (1) n (x, t1))), (2.5.2.25) (cid:29)(cid:113) D1(x) + n−α∇M (1) n (x, θ1) + n−αh1∇2M (1) n (x, t2) Det(Σ−1/2(x))I(x ∈ G), where t1, t2 are some numbers in (θ1, θ1 + h1) than could depend on x. √ In the following expression we consider the integral of the squared difference of the partial derivatives of φn with respect to θ1 at two different coordinate points (θ1 + h1, θ2) and (θ1, θ2). To carry out the integration, the transformation that is given in Equation (2.5.2.13) will be used. Also for notational simplicity often times the arguments involving ˜x, associated with the change of variable x = (˜x1, n−γ ˜x−1), will be suppressed in the following expressions from here onward. Also it is understood that the range of integration remains the same for 46 ˜y coordinate, which is RN . However, the range of integration of ˜x changes to ˜Gn. I(x  G)dxdy √ φn ∂θ1 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2) (cid:90) (cid:90) (cid:20)(cid:28) (cid:90) (cid:90)  ∂ (cid:18) (cid:113) f (˜y − Σ−1/2Bh1(n−αD1 + n−2α∇M (1) √ φn ∂θ1 BT Σ−1/2 − ∂ =n−2α−(d−1)γ ∇ (cid:29) n ((˜x1, n−γ ˜x−1), θ1) + (cid:28) D1 + n−α∇M (1) (cid:113) f (˜y − Σ−1/2Bh1(n−αD1 + n−2α∇M (1) BT Σ−1/2(˜x1, n−γ ˜x−1) ∇ n−αh1∇2M (1) (cid:29)(cid:21)2 n ((˜x1, n−γ ˜x−1), t2) n ((˜x1, n−γ ˜x−1), t1))) − ∇(cid:112)f (˜y) (cid:19) , (2.5.2.26) n ((˜x1, n−γ ˜x−1), t1))), Det(Σ1/2)d˜xd˜y, which can be bounded by the sum of the following two terms in Equation (2.5.2.27) and (2.5.2.30) respectively, using simple algebraic inequality (a + b)2 ≤ 2a2 + 2b2. 
The first term 47 is (cid:90) (cid:90) (cid:20)(cid:28) (cid:18) 2n−2α−(d−1)γ BT Σ−1/2 (cid:113) f (˜y − Σ−1/2Bh1(n−αD1 + n−2α∇M (1) ∇ n ((˜x1, n−γ ˜x−1), t1))) − ∇(cid:112)f (˜y) (cid:19) , (cid:29)2 (cid:90) (cid:90) (cid:20)(cid:28) n ((˜x1, n−γ ˜x−1), θ1) Det(Σ1/2)d˜xd˜y D1 + n−α∇M (1) (cid:18)(cid:90) 1 =2n−2α−(d−1)γ ∇2 0 BT Σ−1/2 (cid:113) f (˜y − τ Σ−1/2Bh1(n−αD1 + n−2α∇M (1) n ((˜x1, n−γ ˜x−1), t1)))dτ (cid:19) (2.5.2.27) =2n−4α−(d−1)γh2 (cid:29)2 Σ−1/2Bh1(n−αD1 + n−2α∇M (1) n ((˜x1, n−γ ˜x−1), t1)), (cid:90) (cid:90) (cid:20)(cid:28) (cid:18) (cid:19) D1 + n−α∇M (1) n ((˜x1, n−γ ˜x−1), θ1) ∇2(cid:112)f (y) (cid:29)2 Σ−1/2B(D1 + n−α∇M (1) D1 + n−α∇M (1) n ((˜x1, n−γ ˜x−1), t1)), n ((˜x1, n−γ ˜x−1), θ1) BT Σ−1/2 Det(Σ1/2)d˜xd˜y Det(Σ1/2)d˜xdy, 1 where the following change of variable is used y = ˜y − τ h1n−α(Σ−1/2BD1 + n−αΣ−1/2B∇M (1) n (·, t1)). (2.5.2.28) 48 Now this term can be bounded further by (cid:90) (cid:90) (cid:13)(cid:13)(cid:13)(cid:13)BT Σ−1/2(˜x1, n−γ ˜x−1) (cid:18) 2n−2α−(d−1)γh2 1 (cid:19) Σ−1/2(˜x1, n−γ ˜x−1) n ((˜x1, n−γ ˜x−1), t1)) ∇2(cid:112)f (y) (cid:13)(cid:13)(cid:13)(cid:13)2 (cid:13)(cid:13)(cid:13)(cid:13)2 n ((˜x1, n−γ ˜x−1), θ1) Det(Σ1/2(˜x1, n−γ ˜x−1))d˜xdy (D1(˜x1, n−γ ˜x−1) + n−α∇M (1) n ((˜x1, n−γ ˜x−1), θ1) Det(Σ−3/2(˜x1, n−γ ˜x−1))d˜xdy (2.5.2.29) 1 × ≤ 2Ln−2α−(d−1)γh2 × B(D1(˜x1, n−γ ˜x−1) + n−α∇M (1) (cid:19) ∇2(cid:112)f (y) (cid:13)(cid:13)(cid:13)(cid:13)D1(˜x1, n−γ ˜x−1) + n−α∇M (1) (cid:90) (cid:90) (cid:13)(cid:13)(cid:13)(cid:13)(cid:18) (cid:13)(cid:13)(cid:13)(cid:13)D1(˜x1, n−γ ˜x−1) + n−α∇M (1) (cid:18)(cid:90) ≤ 2Ln−2α−(d−1)γ(1 + O(n−α))h2 × 1 ≤ 2Ln−2α−(d−1)γ(1 + O(n−α))h2 1 × × (cid:18)(cid:90) (cid:107) g(˜x)(cid:107)4(cid:107)v(1) 0 (˜x1, n−γ ˜x−1)(cid:107)4(M−1)d˜x ≤ 2Ln−2α−(d−1)γ(1 + O(n−α))h2 1 (cid:13)(cid:13)(cid:13)(cid:13)2 dy (cid:13)(cid:13)(cid:13)(cid:13)2 n ((˜x1, n−γ ˜x−1), t1)) (cid:33) (cid:32)(cid:90) (cid:13)(cid:13)(cid:13)(cid:13)∇2(cid:112)f (y) (cid:13)(cid:13)(cid:13)(cid:13)2 (cid:19) (cid:32)(cid:90) (cid:13)(cid:13)(cid:13)(cid:13)∇2(cid:112)f (y) (cid:33) (cid:13)(cid:13)(cid:13)(cid:13)2 (cid:19) (cid:90) (cid:13)(cid:13)(cid:13)(cid:13)∇2(cid:112)f (y) (cid:13)(cid:13)(cid:13)(cid:13)2 (cid:107) g(˜x)(cid:107)4d˜x (cid:90) dy dy (cid:107)D1(˜x1, n−γ ˜x−1)(cid:107)4Det(Σ−3/2(˜x1, n−γ ˜x−1))d˜x ≤ h2 1n−4α−(d−1)γL(1) 1 (1 + O(n−α)), where we used CS inequality and the triangle inequality for Euclidean norms. In the last inequality we used the assumptions on g and v D1 using, (cid:107)D1(cid:107) ≤ λ1(cid:107)v (cid:107)∇M (1) 0 , and we bounded the Euclidean norm of 0 (cid:107)M−1(cid:107)g(cid:107) ∀ (˜x1, n−γ ˜x−1) ∈ ˜Gn. On the other hand, we used M(cid:107)g(cid:107)M ) as another bound on the Euclidean norm n (cid:107) ≤ λi(a n . We also used the observation that eigenvalues of Σ(x), x ∈ G are assumed to be 0 (cid:107)M−2(cid:107)g(cid:107)2 +. . .+a 2 (cid:107)v (1) (i) (i) (i) (i) (i) of M bounded away from 0 and ∞. Finally L, L (1) 1 > 0 are generic constants. 
49 The second term in the sum that bounds (2.5.2.26) is given by 2n−4α−(d−1)γh2 1 Det(Σ1/2)d˜xdy ≤2n−4α−(d−1)γh2 BT Σ−1/2∇(cid:112)f (y),∇2M (cid:13)(cid:13)(cid:13)(cid:13)2(cid:13)(cid:13)(cid:13)(cid:13)∇2M (cid:90) (cid:90) (cid:28) (cid:90) (cid:90) (cid:13)(cid:13)(cid:13)(cid:13)BT Σ−1/2∇(cid:112)f (y) (cid:90) (cid:107)∇(cid:112)f (y)(cid:107)2dy Det(Σ1/2)d˜xdy 1n−4α−(d−1)γ 1n−4α−(d−1)γL (cid:107)∇2M (cid:90) (2) 1 , 1 ≤2h2 ≤2h2 (1) (cid:29)2 n ((˜x1, n−γ ˜x−1), t2) (cid:13)(cid:13)(cid:13)(cid:13)2 n ((˜x1, n−γ ˜x−1), t2) (1) (2.5.2.30) (1) n (t2)(cid:107)2Det(Σ−1/2)d˜x where the following transformation is used y = ˜y − h1n−αΣ−1/2(˜x)BD1(˜x) − h1n−2αΣ−1/2(˜x)B∇M (1) n (˜x, t2), (2.5.2.31) also a simple application of CS inequality separates out the quantities involving y and ˜x. Furthermore, ∇2M sponding symmetrized version in a similar fashion as we did it for ∇M (1) n (t2) can be bounded in its matrix norm using the bound on its corre- (1) n . Also L (2) 1 > 0 is generic constant which bounds the product of the integrals in (2.5.2.30). Combining (2.5.2.28) and (2.5.2.30), we obtain the following bound for all sufficiently large n (cid:90) (cid:90) ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1+h1,θ2) √ φn ∂θ1 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) √ φn ∂θ1 − ∂ I(x  G)dxdy ≤ h2 1n−4α−(d−1)γL1. (2.5.2.32) Next, we bound the integral of the squared difference of the partial derivative of √ φn with respect to θ1 at two points (θ1, θ2 + h2) and (θ1, θ2) respectively. Recall the transformation 50 (2.5.2.13) that we used earlier (cid:16) ˜y = Σ−1/2(x) y − B(cid:0)D0(x) + θ1n−αD1(x) + n−2αM + n−2αM (1) n (x, θ1) + θ2n−α2D2(x) n (x, θ2)(cid:1)(cid:17) (2) , therefore, ∂ √ φn ∂θ1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) = −n−α (cid:113) Det(Σ−1/2(x))I(x ∈ G). (cid:28) BT Σ−1/2(x)∇(cid:112)f (˜y), D1(x) + n−α∇M (cid:29) (1) n (x, θ1) Then √ φn ∂θ1 ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) (cid:28) BT Σ−1/2(x) = −n−α (cid:113) f (˜y − Σ−1/2(x)Bh2(n−αD2(x) + n−2α∇M × ∇ D1(x) + n−α∇M (1) n (x, θ1) Det(Σ−1/2(x))I(x ∈ G), (cid:29)(cid:113) (2) n (x, t3))), (2.5.2.33) where t3 is a number in (θ2, θ2 + h2) which may depend on x. Next, the transformation (2.5.2.13) is used. Recall that the transformed variable is x = (˜x1, n−γ ˜x−1) and ˜x is inte- 51 grated over ˜Gn. 
√ √ φn ∂θ1 2 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(θ1,θ2+h2) (cid:90) (cid:90) (cid:28) (cid:90) (cid:90)  ∂ (cid:18) (cid:113) f (˜y − Σ−1/2Bh2(n−αD2 + n−2α∇M (2) BT Σ−1/2 φn ∂θ1 − ∂ ∇ × =n−2α−(d−1)γ (cid:29)2 I(x  G)dxdy (cid:19) n ((˜x1, n−γ ˜x−1), t3))) − ∇(cid:112)f (˜y) , I(˜x ∈ ˜Gn)Det(Σ1/2)d˜xd˜y =n−4α−(d−1)γh2 (cid:90) (cid:90) (cid:28) n ((˜x1, n−γ ˜x−1), θ1) D1 + n−α∇M (1) (cid:18)(cid:90) 1 (cid:113) f (˜y − τ Σ−1/2Bh2(n−αD2 + n−2α∇M (2) BT Σ−1/2 ∇2 × 2 n ((˜x1, n−γ ˜x−1), t3)))dτ I(˜x ∈ ˜Gn)Det(Σ1/2)d˜xd˜y (cid:19) (2.5.2.34) =n−4α−(d−1)γh2 2 0 × Σ−1/2B(D2 + n−α∇M (2) D1 + n−α∇M (1) BT Σ−1/2 n ((˜x1, n−γ ˜x−1), t3))), n ((˜x1, n−γ ˜x−1), t3))), (cid:19) (cid:29)2 (cid:90) (cid:90) (cid:28) (cid:18) n ((˜x1, n−γ ˜x−1), θ1) ∇2(cid:112)f (y) (cid:29)2 (cid:19) n ((˜x1, n−γ ˜x−1), θ1) ∇2(cid:112)f (y) (cid:13)(cid:13)(cid:13)(cid:13)2 (D2 + n−α∇M (2) (cid:19)2 (cid:90) (cid:18) n ((˜x1, n−γ ˜x−1), θ1) (cid:90) ∇2(cid:112)f (y) (cid:32)(cid:90) (cid:18) (cid:19)2 ∇2(cid:112)f (y) (cid:19) dy dy × Σ−1/2B(D2 + n−α∇M (2) D1 + n−α∇M (1) (cid:90) (cid:90) (cid:13)(cid:13)(cid:13)(cid:13)(cid:18) (cid:13)(cid:13)(cid:13)(cid:13)D1 + n−α∇M (1) ≤n−4α−(d−1)γh2 2L × ≤n−4α−(d−1)γ(1 + O(n−α))h2 2L ≤n−4α−(d−1)γ(1 + O(n−α))h2 2L (cid:18)(cid:90) (cid:107)g(˜x)(cid:107)4(cid:107)v(2) 0 (˜x1, n−γ ˜x−1)(cid:107)4d˜x (cid:13)(cid:13)(cid:13)(cid:13)2 I(˜x ∈ ˜Gn)Det(Σ1/2)d˜xdy n ((˜x1, n−γ ˜x−1), t3))) I(˜x ∈ ˜Gn)Det(Σ−3/2)d˜xdy (cid:33) (cid:107)D2(cid:107)4Det(Σ1/2)d˜x ≤h2 2n−4α−(d−1)γL2(1 + O(n−α)). The bound (2.5.2.34) can be obtained using similar arguments we used earlier in this proof to bound D1 and D2 in Euclidean norm and using the fact that eigenvalues of Σ(x) are 52 bounded away from 0 and ∞ so that the matrix norm of Σ−3/2(x) is finite. The bounds that have been established in (2.5.2.32) and (2.5.2.34) can be used with each of the components of equation (2.5.2.24) to obtain the final bound. It is possible because the expressions are symmetric in arguments θ1 and θ2. Finally, tying up all the bounds together for some generic constant L > 0 we obtain for all sufficiently large n (cid:90) (cid:90) (cid:104)∇(cid:112)φn(x, y, θ1, θ2), l(cid:105) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)θ+h 2 θ I(x ∈ G)dxdy ≤4l2 1n−4α−(d−1)γL1 + 4l2 2n−4α−(d−1)γL2 + 4l2 2n−4α−(d−1)γL3 1h2 1h2 2h2 1n−4α−(d−1)γL4 + 4l2 2h2 ≤(cid:107)h(cid:107)2 (cid:107)l(cid:107)2 n−4α−(d−1)γL. (2.5.2.35) Next, we check condition (1) in Theorem II.6.1 in Ibragimov and Khasminskii (2013), which together with Lemma 6 will imply that the family of distributions φn is LAN. Lemma 7. For any k > 0 (cid:90) (cid:90) (cid:28) ∂ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)θ+Ψ−1/2(n)u θ (cid:29)2 , Ψ−1/2(n)u I (x ∈ G) dxdy = 0. 
(2.5.2.36) √ φn ∂θ n→∞ sup lim (cid:107)u(cid:107) 0, we obtain n→∞ sup lim (cid:107)u(cid:107) 1, a, b ∈ R, 54 the following can be observed (cid:13)(cid:13)(cid:13)(cid:13)2+δ = ≤2δ/2 ∂ ln φn ∂θ ∂θ1 ∂ ln φn (cid:13)(cid:13)(cid:13)(cid:13)Ψ−1/2(n) (cid:34)(cid:12)(cid:12)(cid:12)(cid:12)a11(n) (cid:34)(cid:12)(cid:12)(cid:12)(cid:12)a11(n) (cid:34)(cid:12)(cid:12)(cid:12)(cid:12)a11(n) (cid:12)(cid:12)(cid:12)(cid:12)a21(n) ∂θ1 + ∂ ln φn ≤21+3δ/2 + a12(n) ∂ ln φn ∂θ1 + a12(n) ∂ ln φn ∂θ1 (cid:12)(cid:12)(cid:12)(cid:12)2+δ + It is enough to show that (cid:12)(cid:12)(cid:12)(cid:12)2(cid:35)(2+δ)/2 (cid:12)(cid:12)(cid:12)(cid:12)2+δ(cid:35) ∂θ2 ∂ ln φn ∂ ln φn + a22(n) ∂ ln φn ∂θ2 ∂ ln φn ∂θ1 + a22(n) + + + ∂θ2 ∂θ1 ∂θ2 ∂θ2 ∂ ln φn ∂ ln φn ∂ ln φn (cid:12)(cid:12)(cid:12)(cid:12)2 (cid:12)(cid:12)(cid:12)(cid:12)a21(n) (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:12)(cid:12)(cid:12)(cid:12)a21(n) (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:12)(cid:12)(cid:12)(cid:12)a12(n) (cid:12)(cid:12)(cid:12)(cid:12)2+δ(cid:35) (cid:12)(cid:12)(cid:12)(cid:12)a22(n) (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:12) ∂ ln φn (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:12) ∂ ln φn ∂ ln φn ∂θ1 ∂θ2 ∂θ2 . lim n→∞ n lim n→∞ n φn(x, y, θ1, θ2)dxdy = 0, φn(x, y, θ1, θ2)dxdy = 0. (2.5.2.39) (2.5.2.40) (2.5.2.41) For the following expression again the change of variables described in Equation (2.5.2.13) will be used. Since the transformation on x is x = (˜x1, n−γ ˜x−1), the arguments involving ˜x will be suppressed for notational simplicity and the corresponding range of integration will be ˜Gn. Therefore, using the expression for ∂ ln φn ∂θ1 from equation (2.5.2.8) the following can 55 be obtained: (cid:12)(cid:12)(cid:12)(cid:12)2+δ (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:12) ∂ ln φn (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)f−1(cid:104)∇f, n−αΣ−1/2(x)BD1(x) + n−2αΣ−1/2(x)B∇M (1) φn(x, y, θ1, θ2)dxdy ∂θ1 n =n n (x, θ1)(cid:105)(cid:12)(cid:12)(cid:12)2+δ × φn(x, y, θ1, θ2)dxdy ≤21+δn1−α(2+δ) + 21+δn1−2α(2+δ) =21+δn1−γ(d−1)−α(2+δ) × f−(1+δ)(˜y)d˜xd˜y +21+δn1−γ(d−1)−2α(2+δ) × f−(1+δ)(˜y)d˜xd˜y ≤Ln1−γ(d−1)−α(2+δ) +Ln1−γ(d−1)−2α(2+δ) ≤Ln1−γ(d−1)−α(2+δ) n (x, θ1) (cid:12)(cid:12)(cid:12)2+δ φn(x, y, θ1, θ2)dxdy φn(x, y, θ1, θ2)dxdy (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)f−1(cid:104)∇f, Σ−1/2(x)BD1(x)(cid:105)(cid:12)(cid:12)(cid:12)2+δ (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)f−1(cid:104)∇f, Σ−1/2(x)B∇M (1) (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:104)∇f (˜y), Σ−1/2(˜x1, n−γ ˜x−1)BD1(˜x1, n−γ ˜x−1)(cid:105)2(cid:12)(cid:12)(cid:12)(1+δ/2) (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:104)∇f (˜y), Σ−1/2)B∇M (1) n ((˜x1, n−γ ˜x−1)), θ1)(cid:105)2(cid:12)(cid:12)(cid:12)1+δ/2 (cid:90) (cid:13)(cid:13)(cid:13)Σ−1/2D1 (cid:13)(cid:13)(cid:13)2+δ (cid:90) (cid:13)(cid:13)(cid:13)Σ−1/2∇M (1) (cid:19)L(1) (cid:90) (cid:19) (cid:90) (cid:90) (cid:18)(cid:90) (cid:18)(cid:90) (cid:107)∇f (˜y)(cid:107)2+δ f−(1+δ)(˜y)d˜y (cid:107)∇f (˜y)(cid:107)2+δ f−(1+δ)(˜y)d˜y (cid:107)∇ log f (˜y)(cid:107)2+δ f (˜y)d˜y (cid:107)g(˜x)(cid:107)2+δd˜x n (θ1) ˜x∈ ˜Gn d˜x d˜x n +Ln1−γ(d−1)−2α(2+δ) (cid:107)∇ log f (˜y)(cid:107)2+δ f (˜y)d˜y L(2) n (cid:90) ˜x∈ ˜Gn (cid:107)a(i) 2 (cid:107)v(i) 0 (cid:107)M−2(cid:107)g(˜x)(cid:107)2 + . . . + a(i) M(cid:107)g(˜x)(cid:107)M(cid:107)2+δd˜x ≤n1−γ(d−1)−α(2+δ)L, (2) (1) n , L n where L, L are generic constants. 
We use CS inequality, the boundedness of the eigenvalues of Σ(˜x1, n−γ ˜x−1) and the boundedness of the Euclidean norm of the vector field 0 (˜x1, n−γ ˜x−1). As it can be noted the technique that has been used to bound D1, M (1) n (1) v 56 (2.5.2.42) (cid:13)(cid:13)(cid:13)2+δ   and Σ here, is the same technique that have been used in the previous Lemmas 5 and 7. Therefore, (cid:90) (cid:90) (cid:12)(cid:12)(cid:12)(cid:12)φ−1 n (cid:12)(cid:12)(cid:12)(cid:12)2+δ ∂φn ∂θ1 lim n→∞ n φn(x, y, θ1, θ2)dxdy ≤ lim n→∞ O(n−αδ) = 0. With a similar argument equation (2.5.2.41) can also be established. Hence, we conclude the proof of Lyapunov’s condition. Finally, we present the famous lemma due to H´ajek (1972) tailored for our purposes. This lemma will serve as a crucial connection to prove the minimax lower bound for the integral curves based on tensor fields from the class D2(a, G, τ ). Lemma 9. For any estimator ˆθn of the parameter θ ≡ (θ1, θ2) in the parametric family of distributions φn described in lemma 4 satisfying LAN condition, any loss function w ∈ W, and for any b > 0, for which Kb is the square [−b, b]2 in R2, the following holds: lim inf n→∞ {(θ1,θ2): I (θ1,θ2) ∈Kb} sup 1/2 0 ≥0.25(2π)−1 w ((cid:107)y(cid:107)) exp En θ w (cid:18) −(cid:107)y(cid:107) 2 ˆθ1,n − θ1 ˆθ2,n − θ2 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   1/2 0 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)I  (cid:19) dy, Kb/2 (cid:90) (cid:90) (cid:90) n(cid:89) i=1 57 where En θ is the expectation with respect to family of measures . . . φn(xi, yi, θ1, θ2)dxidyi. 2.5.3 Proofs of Theorems 2.5.3.1 Proof of Theorem 1 The integral curves are defined to follow along the corresponding unit vector fields. Since (cid:107)v 0 (cid:107) = cr < 1, we define the following integral curves (r) ∂ ∂t x (r) 0 (t) = ∂ ∂t x (r) n (t; θr) = v v (cid:107)v (r) 0 (t)) 0 (t))(cid:107), (r) (r) 0 (x (r) 0 (x (r) (r) n (x n (t; θr)), (2.5.3.1) r = 1, 2. Note that suppressing the argument inside x (1) 0 (s) in the vector field it can be observed that, (1) 0 (cid:107)2 + 2θ1n−α(cid:104)v (1) 0 , g(cid:105) + (cid:107)g(cid:107)2θ2 1n−2α 1 = (cid:107)v (1) (1) 0 + θ1n−αg(cid:107)2 = (cid:107)v =⇒ 1 − (cid:107)v =⇒ 1 − (cid:107)v (cid:107)v 0 (cid:107)2 = 2θ1n−α(cid:104)v 2θ1n−α(cid:104)v 0 (cid:107) 0 (cid:107) = (1) (1) (1) 0 , g(cid:105) + (cid:107)g(cid:107)2θ2 0 , g(cid:105) + (cid:107)g(cid:107)2θ2 (1 + c1)c1 (1) 1n−2α 1n−2α . 58 Define for r = 1, (1) n (t) = x ∆ (1) 0 (t) (1) n (s; θ1); θ1)ds − t(cid:90) 0 v (cid:107)v (1) 0 (x (1) 0 (x (1) 0 (s)) 0 (s))(cid:107) ds (1) (cid:105) (1) n (t; θ1) − x t(cid:90) t(cid:90) (1) n (x v 0 (cid:104) (1) 0 (x v = = 0 + θ1n−α t(cid:90)  1(cid:90) 0 0 = (1) 0 (x (1) 0 (s)) ds (1) n (s; θ1)) − v t(cid:90) (cid:104) g(x 0 − θ1n−α(cid:107)g(x (1) n (s; θ1)) − 2(cid:104)v 0 (s))(cid:107)2v (1 + c1)c1 n (s; θ1) + (1 − λ)x (1) (1) ∇v (1) 0 (λx (1) 0 , g(cid:105)v (1) 0 (x (1 + c1)c1 (1) 0 (s)) (1) 0 (x (1) 0 (s)) ds (cid:105)  ∆ (1) 0 (s))dλ (1) n (s)ds t(cid:90) (cid:104) 0 + θ1n−α g(x − θ1n−α(cid:107)g(x (1) n (s; θ1)) − 2(cid:104)v 0 (s))(cid:107)2v (1 + c1)c1 (1) 0 (x (1) (1) 0 , g(cid:105)v (1) 0 (x (1 + c1)c1 (1) 0 (s)) (cid:105) (1) 0 (s)) ds. 
Then we obtain (cid:13)(cid:13)(cid:13)∆(1) n (t) (cid:13)(cid:13)(cid:13) ≤ |θ1|n−α t(cid:90) + max λ∈[0,1] 0 ≤ |θ1|n−α (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) + An−α (cid:33) (cid:13)(cid:13)(cid:13) ds (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)∆(1) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) + An−α (cid:33) (cid:13)(cid:13)(cid:13) ds}, 0 (s)) n (s) 0 , g(cid:105)v(1) 0 (x(1) (1 + c1)c1 0 (s)) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) 2(cid:104)v(1) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) 2(cid:104)v(1) t(cid:90) 0 t(cid:90) 0 (cid:32)(cid:13)(cid:13)(cid:13)g(x(1) (cid:13)(cid:13)(cid:13)∇v(1) (cid:32)(cid:13)(cid:13)(cid:13)g(x(1) t(cid:90) max λ∈[0,1] exp{ (cid:13)(cid:13)(cid:13) + n (s; θ1)) n (s; θ1)) (cid:13)(cid:13)(cid:13) + (cid:13)(cid:13)(cid:13)∇v(1) 0 59 0 (λx(1) n (s, θ1) + (1 − λ)x(1) 0 (s)) 0 , g(cid:105)v(1) 0 (x(1) (1 + c1)c1 0 (s)) 0 (λx(1) n (s, θ1) + (1 − λ)x(1) (2.5.3.2) ds ds (2.5.3.3) where A > 0 is some positive constant. The last step in equation (2.5.3.3) is done using (1) are continuously differentiable Gr¨onwall-Bellman-type inequality. Recall that g and v 0 vector fields defined on an open bounded convex subset G of Rd. This implies that they are L2 bounded as well. Therefore, (cid:13)(cid:13)(cid:13) ≤ |θ1|n−αC (1) 0 . Similarly, it can also be (1) n (t) established that (2) n (t) (2) 0 . Next, consider (cid:13)(cid:13)(cid:13)∆ (cid:13)(cid:13)(cid:13) ≤ |θ2|n−αC (cid:105) (cid:13)(cid:13)(cid:13)∆ (cid:104) v(1) 0 (x(1) + θ1n−α 0 (x(1) 0 (s)) ds n (s; θ1)) − v(1) t(cid:90) (cid:104) g(x(1) n (s; θ1)) − 2(cid:104)v(1) 0 (s))(cid:107)2v(1) (1 + c1)c1 0 − θ1n−α(cid:107)g(x(1) 0 , g(cid:105)v(1) 0 (x(1) (1 + c1)c1 0 (s)) 0 (x(1) 0 (s)) (cid:105) ds ∇v(1) 0 (x(1) 0 (s))∆(1) n (s)ds ∆(1) n (t) = = t(cid:90) 0 t(cid:90) 0 + 0 Denote (cid:104)∇2v(1) 0 (λx(1) n (s; θ1) + (1 − λ)x(1) 0 (s)), (∆(1) n (s))T(cid:105)dλ n (s)ds (2.5.3.4)  ∆(1)  1(cid:90) t(cid:90) t(cid:90) 0 0 (cid:34)  1(cid:90) t(cid:90) 0 1 2 + θ1n−α + θ1n−α t(cid:90) − (cid:35)  ∆(1) g(x(1) 0 (s)) − 2(cid:104)v(1) 0 , g(cid:105)v(1) 0 (x(1) (1 + c1)c1 0 (s)) ds ∇g(ρx(1) n (s, θ1) + (1 − ρ)x(1) 0 (s))dρ n (s)ds 0 0 1n−2α(cid:107)g(x(1) θ2 0 (s))(cid:107)2v(1) (1 + c1)c1 0 (x(1) 0 (s)) . ˜g(x (1) 0 (s)) = g(x 0 (s)) − 2(cid:104)v (1) (1) 0 , g(cid:105)v (1) 0 (x (1 + c1)c1 (1) 0 (s)) . 60 Therefore, by the triangle inequality for the Euclidean norm (1) n (t) − ∇v (1) 0 (x (1) 0 (s))∆ (1) n (s)ds − θ1n−α ˜g(x t(cid:90) 0 ≤ 1 2 t(cid:90) 0  1(cid:90) t(cid:90) 0 (1) 0 (λx (cid:13)(cid:13)(cid:13)∇2v  1(cid:90) (cid:13)(cid:13)(cid:13)∇g(ρx + |θ1|n−α t(cid:90) 0 (1) 0 (s))ds (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:107)∆ (cid:13)(cid:13)(cid:13) dλ (cid:13)(cid:13)(cid:13)∆ (cid:13)(cid:13)(cid:13) dρ (1) n + (1 − λ)x (1) 0 ) (1) n + (1 − ρ)x (1) 0 ) (1) n (s)(cid:107)2ds (cid:13)(cid:13)(cid:13) ds (1) n (s) 0 0 + θ2 1n−2α t(cid:90) 0 Hence it is evident that for all sufficiently large n (1) n (t) − ∇v (1) 0 (x (1) 0 (s))∆ (1) n (s)ds − θ1n−α (cid:107)g(x (1) 0 (s))(cid:107)2(cid:107)v (1) 0 (x (1 + c1)c1 (1) 0 (s))(cid:107) ds. (2.5.3.5) t(cid:90) 0 ˜g(x (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ Cn−2α. 
(1) 0 (s))ds (2.5.3.6) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)∆ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)∆ t(cid:90) 0 t(cid:90) So the integral equation along with the corresponding differential equation from the inequal- ity in (2.5.3.6) is given by, (1) n (t) = ∆ ∇v (1) 0 (x (1) 0 (s))∆ (1) n (s)ds + θ1n−α 0 (1) n (t) = ∇v ∆ (1) 0 (x ∂ ∂t (1) 0 (t))∆ (1) n (t) + θ1n−α˜g(x (1) 0 (t)) + O(n−2α). t(cid:90) 0 (1) 0 (s))ds + O(n−2α), ˜g(x (2.5.3.7) In order to solve the ODE given in equation (2.5.3.7) note that there exists a function G(1)(s, t) : (t, s) ∈ [0, T ]2 which satisfies the following conditions: 1. ∂ ∂t G(1)(s, t) = ∇v (1) 0 (x (1) 0 (t))G(1)(s, t), 0 ≤ s ≤ t ≤ T ; 2. G(1)(t, t) = Id×d; 61 3. G(1)(s, t) ≡ 0 , 0 ≤ t < s ≤ T. Then by Theorem (2.2) in chapter (7) in Coddington and Levinson (1955) the solution to equation (2.5.3.7) is given using the Green’s function G(1)(s, t) as the following (1) n (t) = θ1n−α ∆ G(1)(s, t)˜g(x (1) 0 (s))ds + O(n−2α) ∀t ∈ [0, T ] . Which in turn provides the following relation (1) n (t0; θ1) − x x (1) 0 (t0) = θ1n−α G(1)(s, t0)˜g(x (1) 0 (s))ds + O(n−2α). (2.5.3.8) 0 Similarly, for ∆ (2) n (t) = x (2) n (t; θ2) − x (2) 0 (t) the differential equation can be given by, (2) n (t) = ∇v ∆ (2) 0 (x ∂ ∂t (2) 0 (t))∆ (2) n (t) + θ2n−α˜g(x (2) 0 (t)) + O(n−2α), (2.5.3.9) and the solution of the differential equation can be given via G(2)(s, t0), another Green’s function which satisfies the above mentioned three conditions with v (2) 0 . (2) n (t0; θ2) − x x (2) 0 (t0) = θ2n−α G(2)(s, t0)˜g(x (2) 0 (s))ds + O(n−2α). (2.5.3.10) t(cid:90) 0 t0(cid:90) t0(cid:90) 0 62 For two unit vectors ei ∈ Rd, i = 1, 2, that are non-orthogonal to ˜g(x (i) 0 (t)), t ∈ [0, T ], i = 1, 2, respectively, observe that (cid:104)x(1) n (t0; θ1) − x(1) 0 (t0), e1(cid:105) = θ1n−α (cid:104)x(2) n (t0; θ2) − x(2) 0 (t0), e2(cid:105) = θ2n−α  t0(cid:90)  t0(cid:90) 0 0 (cid:104)G(1)(s, t0)˜g(x(1) 0 (s)), e1(cid:105)ds (cid:104)G(2)(s, t0)˜g(x(2) 0 (s)), e2(cid:105)ds  (1 + o(1)) ,  (1 + o(1)) . (2.5.3.11) Note that due to conditions (1) and (2) on G(r) there exists an  > 0 such that G(r)(s∗, t) = Id×d + o(1), s∗ ∈ [t − , t] . Hence, it is evident that for r = 1, 2, t0(cid:90) (cid:104)G(r)(s, t0)˜g(x Cr = 0 (r) 0 (s)), er(cid:105)ds (cid:54)= 0. (2.5.3.12) Therefore, for some r (1) n (t0; θ1), r (2) n (t0; θ2) converging to 0 as n → ∞ we have, (cid:16)(cid:104)x (cid:16)(cid:104)x θ1 = θ2 = nα C1 nα C2 (1) 0 (t0), e1(cid:105)(cid:17)(cid:16) 0 (t0), e2(cid:105)(cid:17)(cid:16) (2) (1) n (t0, θ1) − x n (t0, θ2) − x (2) (cid:17) (cid:17) 1 + r (1) n (t0; θ1) 1 + r (2) n (t0; θ2) , . (2.5.3.13) For r = 1, 2 an estimator of θr based on the integral curve estimate ˆX (r) n (t0) for x (r) n (t0, θr) can be constructed as: ˆθr,n = nα Cr (cid:16)(cid:104) ˆX (r) n (t0) − x 0 (t0), er(cid:105)(cid:17)(cid:16) (r) 1 + R (r) n (t0) (cid:17) , (2.5.3.14) where R (r) n (t0) converges to 0 as n → ∞ and satisfies, sup |θr|≤M | R (r) n (t0) − r (r) n (t0; θr) | ≤ sup |θr|≤M | (cid:104) ˆX (r) n (t0) − x (r) n (t0, θr), er(cid:105) | . (2.5.3.15) nα Cr 63 Now notice that, + nα n (t0) − x(r) | ˆθr,n − θr | ≤ nα | Cr | | (cid:104) ˆX (r) | Cr | | (cid:104) ˆX (r) ≤ nα n (t0) − x(r) | Cr | | (cid:104) ˆX (r) | θr | | 1 + r(r) n (t0; θr) | + n (t0, θr), er(cid:105) | n (t0, θr), er(cid:105) || 1 + R(r) n (t0) | | R(r) n (t0) − r(r) n (t0; θr) | . 
n (t0) − x(r) 0 (t0), er(cid:105)R(r) n (t0) − (cid:104)x(r) n (t0, θr) − x(r) 0 (t0), er(cid:105)r(r) n (t0; θr) | (2.5.3.16) Due to (A16), an M0 > 0 is chosen such that {(θ1, θ2) : I θr |≤ M0} for some square Kb ∈ R2. Therefore, we get the following inequality 1/2 0 (θ1, θ2) ∈ Kb} ⊂ {(θ1, θ2) : | (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   1/2 0 ˆθ2,n − θ2 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)I ˆθ1,n − θ1  (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   ˆθ1,n − θ1 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  (cid:13)(cid:13)(cid:13) (2 + 2M0)nα ˆθ2,n − θ2 E ˜w (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)I  (cid:13)(cid:13)(cid:13)I 1/2 0 1/2 0 {(θ1,θ2): I (θ1,θ2) ∈Kb} sup 1/2 0 ≤ ≤ sup {(θ1,θ2): |θi|≤M0} sup {(θ1,θ2): |θi|≤M0} E ˜w E ˜w (cid:104) ˆX (cid:104) ˆX (1) (1) n (t0) − x n (t0, θ1), e1(cid:105) C1 n (t0) − x n (t0, θ2), e2(cid:105) (2) C2 (2) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   . (2.5.3.17) Since ˜w ∈ W, it is an increasing function and (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)I 1/2 0 ˆθ1,n − θ1 ˆθ2,n − θ2 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤(cid:13)(cid:13)(cid:13)I  1/2 0 (cid:13)(cid:13)(cid:13) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ˆθ1,n − θ1 ˆθ2,n − θ2 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ,  is used in equation (2.5.3.17) along with the relation established in (2.5.3.15) and (2.5.3.16). 64 Finally, define ˜w inequality sup D∈D2(a,G,τ ) Ew 1/2 0 (cid:13)(cid:13)(cid:13)I (cid:13)(cid:13)(cid:13) (2 + 2M0) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:104) ˆX nα n (t0) − x n (t0) − x (cid:104) ˆX (2) (1) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  y1 C1 y2 C2 (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  = w ((cid:107)y(cid:107)) and, therefore, we have the  (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   ≥ sup {(θ1,θ2): |θi|≤M0} Ew (1) n (t0) − x n (t0) − x (2) (1) n (t0, θ1), e1(cid:105) n (t0, θ2), e2(cid:105) (2) (2.5.3.18) Therefore, using H´ajek’s lemma for ˜w, the following gives us the ultimate lower bound (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)  .  (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   (1) (2) (cid:104) ˆX n (t0, θ1), e1(cid:105) n (t0, θ2), e2(cid:105) nα (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:104) ˆX (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) (cid:104) ˆX nα (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)I ˆθ1,n − θ1  (cid:19) ˆθ2,n − θ2 1/2 0 (cid:104) ˆX dy. lim inf n→∞ inf n ∈En(T ) (2) ˆX (1) n , ˆX sup D∈D2(a,G,τ ) Ew ≥ lim inf n→∞ {(θ1,θ2): I (θ1,θ2) ∈Kb} sup 1/2 0 (cid:90) ≥ 0.25(2π)−1 ˜w ((cid:107)y(cid:107)) exp (cid:18) E ˜w −(cid:107)y(cid:107) 2 (1) n (t0) − x n (t0) − x (2) (1) n (t0, θ1), e1(cid:105) n (t0, θ2), e2(cid:105) (2) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)   (2.5.3.19) Kb/2 This provides a nontrivial bound for the minimax loss to the class of estimators for all b > 0. Thus, by choosing b arbitrarily large the proof can be completed. 2.5.3.2 Proof of Theorem 2 Proof. To prove Theorem 3 the structure of Theorem 2 is followed. Let t0 = T . Since ∇ϕ(a) (cid:54)= 0, without loss of generality one can assume a = 0. 
Then there exists a neighbor- hood of 0, say N2ρ = {x ∈ Rd : |x| ≤ 2ρ}, where ∇ϕ(x) (cid:54)= 0. 65 (1) Also v 0 (2) and v 0 are chosen such that the corresponding integral curves x (1) 0 (t) and x (2) (cid:16) 0 (t), t ∈ [0, T ], stay in N2ρ. Then for all t ∈ [0, T ] we have (cid:16) (cid:16) (cid:16) (cid:17) (cid:17)(cid:17)(cid:48) (r) 0 (t) (r) 0 (t) (r) , v 0 (cid:28) ∇ϕ (cid:16) (cid:17) (cid:16) (cid:16) = ϕ x x are chosen in such a way that ϕ (cid:17)(cid:29) (cid:17)(cid:17)(cid:48) x (r) 0 (t) (cid:54)= 0. x (r) 0 (t) < 0 which is ensured and monotonicity of ϕ. (cid:16) x ϕ (cid:17) (r) 0 (T ) (cid:16) (cid:17) . x (r) 0 (t) = inf t∈[0,T ] ϕ (r) Moreover, v 0 (r) 0 (t) (r) by the continuity of v 0 x As a result, (cid:13)(cid:13)(cid:13)∆ (cid:13)(cid:13)(cid:13) ≤ |θr| n−αC (r) 0 (r) n (t) Since ρ and M0 such that for all n ≥ n0, Then by using continuity of v (cid:12)(cid:12)(cid:12)x (r) n (t; θr) for all t ∈ [0, T ] and |θr| ≤ M0 there exists n0 depending on (cid:12)(cid:12)(cid:12) ≤ 2ρ for all t ∈ [0, T ] and |θr| ≤ M0. (cid:16) 0 (x),∇ϕ(x), we can assume (cid:17)(cid:29) (cid:17) (r) (r) n (t; θr) x (r) , v 0 (r) n (t; θr) x < 0 (cid:16) (cid:28) ∇ϕ for all t ∈ [0, T ], |θr| ≤ M0 and for all large enough n ≥ n0. Then for all t ∈ [0, T ], |θr| ≤ M0 and all large enough n ≥ n1 depending on ρ, M0 and g we observe (cid:16) (cid:104)∇ϕ x = (cid:104)∇ϕ (cid:16) (cid:17) (cid:16) (cid:16) (cid:17) (cid:16) (cid:17)(cid:105) (r) n (t; θr) (r) , v n (r) n (t; θr) x (r) n (t; θr) x , (r) v 0 (r) n (t; θr) x (cid:17) (cid:16) x + θrn−αg (cid:17)(cid:17)(cid:105) < 0. (r) n (t; θr) Since g and ∇ϕ are bounded on N2ρ, then (cid:17) (cid:16) ϕ (r) n (T ; θr) x = inf t∈[0,T ] (cid:16) x ϕ (cid:17) (r) n (t; θr) . 66 As a result, Suppose Kr = seen that (cid:16) (cid:16) ϕ (cid:17) (cid:17) (r) n (t; θr) x (r) 0 (t) ϕ t∈[0,T ] (cid:16) (cid:17) − inf (cid:17) (cid:16) (cid:17) − ϕ (cid:17)(cid:17)(cid:48)(cid:16) (cid:17)(cid:17)(cid:48) n (T ; θr) − x θrn−αC (r) 0 (T ) (r) x x = ϕ (r) n (T ; θr) (cid:16) x ϕ inf t∈[0,T ] (cid:16) x (cid:16) (cid:16) (cid:16) (cid:16) ϕ ϕ = = x (r) 0 (T ) x (r) 0 (T ) (cid:17)(cid:17)(cid:48) (2.5.3.20) (1 + o(1)) (r) 0 (T ) (r) 0 (T ) (1 + o(1)) . x (r) 0 (T ) C (r) 0 (T ) then similarly to the proof of Theorem 2 it can be (cid:32) (cid:16) inf t∈[0,T ] ϕ (r) n (t; θr) x (cid:17) − inf t∈[0,T ] (cid:16) x ϕ (cid:17)(cid:33)(cid:16) (r) 0 (t) (cid:17) . 1 + r (r) n (T ; θi) θr = nα Kr Also the estimate of θr is constructed through an arbitrary estimate of the minimum distance (r) ˆF n and appropriately chosen sequences r (r) n (T ; θr) & R (r) n (T ), and it is given by (cid:32) (cid:16) (cid:17)(cid:33)(cid:16) (cid:17) ˆθr,n = nα Kr (r) n − inf ˆF t∈[0,T ] ϕ x (r) 0 (t) 1 + R (r) n (T ) . The rest of the proof follows the same lines of the proof of Theorem 2. 67 Chapter 3 Global Minimax Bound 3.1 Introduction In the previous chapter we already established pointwise rate of convergence for the asymp- totic risk of the CS estimators is minimax optimal, see also Banerjee, Sakhanenko and Zhu (2019). The present chapter establishes the rate of convergence of the asymptotic risk of the integral curve estimator is minimax optimal, globally. Historically, in nonparametric estimation framework Stone (1982) established global minimax optimal rates for the es- timators in a simple nonparametric regression setting. 
Some more recent works include Raskutti, Wainwright and Yu (2012), where the authors established global minimax optimal rates for sparse additive models over reproducing kernel Hilbert spaces (RKHS) in an L1-type convex optimization framework. Guntuboyina and Sen (2015) and Kim and Samworth (2016) established global rates of convergence in univariate convex regression and in log-concave density estimation, respectively. Interestingly, in both of these works the estimator achieves a similar asymptotic rate of convergence, which is globally minimax. In our work we establish similar results, but in a semi-parametric setting involving a high order tensor structure of the signals. Moreover, we combine this globally minimax optimal rate with a Monte Carlo (MC) simulation study to compare different scanning protocols and find the one that gives the smallest stable global lower bound on the asymptotic risk of the estimated fiber trajectories. This comparison method harnesses the global information in fibers, as opposed to a t-test applied to an image summary statistic.

The rest of the chapter is organized as follows. In section 2 we introduce the assumptions and main results, the definition of the integrated and supremum norms for our problem, and the basic underlying assumptions needed for the inference described specifically in this chapter. In section 3 we describe the physical phenomenon behind DT-MRI, our simulation study, and results from a real data analysis. Additionally, in this section we provide a simulation-based choice of the protocol that gives the lowest global lower bound on the asymptotic risk of the estimators. All the proofs, with the necessary lemmas and propositions, are provided in section 5.

3.2 Assumptions and main results

Here we describe the main theorems and the lemma required to establish the minimax lower bound for the global asymptotic risk of the integral curve estimator. First, in addition to the assumptions in chapter 1, (A5) and (A6), let us introduce an assumption on the density f of the noise variable \xi.

(A5') Noise variables \{\xi_i : i = 1, 2, \ldots\} are i.i.d. with a common density f such that all the second order partial derivatives of the function
g(u) := -\int_{\mathbb{R}^N} \ln\Big(1 + \frac{f(z + u) - f(z)}{f(z)}\Big) f(z)\,dz, \qquad u \in \mathbb{R}^N,
are continuous at 0.

The class of densities described by condition (A5') is fairly extensive. In particular, normal densities satisfy (A5'). Moreover, if f is "regular" as defined in Ibragimov and Khasminskii (2013), then the second order partial derivatives of g can be written as
g''_{ij}(u) = \int f'_i(y) f'_j(y - u)/f(y)\,dy = \int f'_j(y) f'_i(y - u)/f(y)\,dy, \qquad i, j = 1, \ldots, N.
The assertion of (A5') can be understood by following the first part of Lemma I.7.1 in Ibragimov and Khasminskii (2013): it follows from the finiteness of the Fisher information
I_{ii} = \int (f'_i(z))^2/f(z)\,dz, \qquad i = 1, \ldots, N,
and a continuity-type condition
\int (f'_i(y + h) - f'_i(y))^2/f(y)\,dy \le C|h|^2, \qquad i = 1, \ldots, N,
for all h such that |h| \le \varepsilon, for some \varepsilon > 0 and a constant C > 0. These conditions are similar to conditions (b) and (c) in the definition of a regular experiment in Ibragimov and Khasminskii (2013).

Now we present the main theorems, along with some motivation, to establish the global optimal bounds for integral curve estimators. We also present the parametric subclass of tensors and its construction.
Furthermore, we describe Lemma 10, showing that the constructed parametric subclass satisfies assumption (A1). In addition to the class of tensors, we introduce the following classes, essential for the results presented in this chapter. Let \tilde{W} be the class of all even functions w : \mathbb{R}^R \mapsto \mathbb{R} that are non-decreasing on \mathbb{R}_+, equal to 0 at 0, and such that w(x) > 0 for all x > 0. Examples of such functions are \sum_{r=1}^{R} u_r^2 and \sum_{r=1}^{R} |u_r|. Let E_n(a^*, T) denote the class of all possible estimators of the curves x^{(r)}(t), t \in [0, T], r = 1, \ldots, R, obtained by solving the ODE involving pseudo-eigenvectors v^{(r)}, r = 1, \ldots, R. Recall that the integrated L_p-norm for a vector function y(t) = (y_1(t), \ldots, y_d(t)), t \in [0, T], is defined as
\|y\|_{p,T} := \Big( \int_0^T \sum_{i=1}^{d} |y_i(t)|^p \, dt \Big)^{1/p}, \qquad 1 \le p < \infty,
and
\|y\|_{\infty,T} := \sup_{t \in [0,T]} \max_{1 \le k \le d} |y_k(t)|.

Theorem 4. Assume conditions (A1)-(A6) and (A5') hold and 1 \le R \le (M + 2)/2. Then for any number T > 0, any point a^* \in G, and any function w \in \tilde{W}, we have
\liminf_{n \to \infty} \inf_{\hat{X}^{(1)}_n, \ldots, \hat{X}^{(R)}_n \in E_n(a^*, T)} \sup_{D \in D_2(a^*, G, T)} E\, w\Big( n^{2/(d+3)} \big( \|\hat{X}^{(1)}_n - x^{(1)}\|_{p,T}, \ldots, \|\hat{X}^{(R)}_n - x^{(R)}\|_{p,T} \big) \Big) > 0.

Theorem 5. Assume conditions (A1)-(A6) and (A5') hold and 1 \le R \le (M + 2)/2. Define, for c, k > 0,
D_{c,k} = \{ D \in D_2(a^*, G, T) : \|D - D_0\|_{\infty} \le c\,n^{-2/(d+3)} \text{ and } \|D'' - D''_0\|_{\infty} \le k \}.
Then for any numbers c > 0, T > 0, any point a^* \in G, any function w \in \tilde{W}, any 1 \le p \le \infty, and some k > 0, we have
\liminf_{n \to \infty} \inf_{\hat{X}^{(1)}_n, \ldots, \hat{X}^{(R)}_n \in E_n(a^*, T)} \sup_{D \in D_{c,k}} E\, w\Big( n^{2/(d+3)} \big( \|\hat{X}^{(1)}_n - x^{(1)}\|_{p,T}, \ldots, \|\hat{X}^{(R)}_n - x^{(R)}\|_{p,T} \big) \Big) > 0.

Theorems 4 and 5 establish that the ensemble of integral curve estimators \hat{X}^{(r)}_n(t), r = 1, \ldots, R, t \in [0, T], minimizes the maximum risk among all the estimators inside the respective classes under the integrated norm and the appropriate loss function w. Thus, we achieve the minimax lower bound under the integrated norm with loss function w. If we analyze Theorem 4 carefully, we see that it is an immediate corollary of Theorem 5, since the class D_{c,k} \subset D_2(a^*, G, T). In the statements of both theorems the norm of the errors is scaled by n^{2/(d+3)}, which matches the asymptotic rate of convergence of the Carmichael and Sakhanenko (2015) estimator.

Theorem 6. Assume conditions (A1)-(A8) and (A5') hold. Let \{\hat{X}^{(r)}_n : r = 1, \ldots, R\} be the integral curve estimators of the solutions of the ODEs involving the pseudo-eigenvectors v^{(r)}(t), r = 1, \ldots, R.
1. Additionally, let E\|\xi_i\|_q^q < \infty for some q \ge 4. Then for each r = 1, \ldots, R, for any number T > 0, for any point a^* \in G and any 2 \le p \le q, we have
\sup_n \sup_{D \in D_2(a^*, G, T)} E\big\| n^{2/(d+3)} \big( \hat{X}^{(r)}_n - x^{(r)} \big) \big\|_{p,T} < \infty.
2. Moreover, for each r = 1, \ldots, R, for any number T > 0 and for any point a^* \in G, we have
\sup_n \sup_{D \in D_2(a^*, G, T)} E\big\| n^{2/(d+3)} \big( \hat{X}^{(r)}_n - x^{(r)} \big) \big\|_{\infty,T} < \infty.

Theorem 6 exploits the moment conditions on the noise variables to establish that the maximum risk of the appropriately scaled integral curve estimators, under the integrated L_p-norm or the supremum norm, remains bounded by a finite constant. These three theorems together study the global bounds for the asymptotic risk of the integral curve estimator and show that it is minimax.
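As a concrete illustration of the norms and the scaling appearing in Theorems 4-6, the short sketch below evaluates the integrated L_p-norm and the supremum norm of the discrepancy between a sampled curve and a perturbed copy of it, and multiplies by n^{2/(d+3)}. The curves, dimension, noise level and sample size are placeholder assumptions; the sketch illustrates the norms only and is not the estimation method of Carmichael and Sakhanenko (2015).

```python
import numpy as np

def lp_norm(y, t, p):
    """||y||_{p,T} = ( int_0^T sum_i |y_i(t)|^p dt )^{1/p} for a curve sampled on the grid t."""
    return np.trapz(np.sum(np.abs(y) ** p, axis=1), t) ** (1.0 / p)

def sup_norm(y):
    """||y||_{inf,T} = sup_t max_k |y_k(t)|."""
    return np.max(np.abs(y))

d, T, n = 3, 1.0, 60 ** 3                   # spatial dimension, horizon, sample size (illustrative)
t = np.linspace(0.0, T, 400)

x_true = np.stack([t, np.sin(t), t ** 2], axis=1)          # hypothetical true curve x(t)
rng = np.random.default_rng(1)
x_hat = x_true + n ** (-2.0 / (d + 3)) * rng.normal(scale=0.1, size=x_true.shape)  # hypothetical estimate

scale = n ** (2.0 / (d + 3))                # the factor appearing in Theorems 4-6
print("scaled L2 discrepancy :", scale * lp_norm(x_hat - x_true, t, p=2))
print("scaled sup discrepancy:", scale * sup_norm(x_hat - x_true))
```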
3.2.1 Parametric subclass of tensors

In order to prove the minimax global rate for the asymptotic risk of the integral curve estimator, we start by proposing a construction of a parametric subclass of D_2(a^*, G, T). We start with perturbing the curves, which will perturb the resulting gradient vector field, translating the perturbation ultimately to the tensor field in D_2(a^*, G, T).

For each r = 1, \ldots, R, let x^{(r)}_0(t, a^*), t \in [0, T], denote the integral curve starting at a^*, driven by the vectors v^{(r)}_0(x^{(r)}_0(t, a^*)), where x^{(r)}_0(0, a^*) = a^*. Additionally, for small \epsilon > 0, consider A_\epsilon, an \epsilon-neighborhood of a^* (in Euclidean norm) in a (d-1)-dimensional hyperspace transversal to the flow at a^*. Suppose the volume swept by A_\epsilon under the flow v^{(r)}_0 is denoted by
G^{(r)}_{\epsilon,T} = \{ x^{(r)} = x^{(r)}_0(t, a) : t \in [0, T],\ a \in A_\epsilon \} \subset G,
for all r = 1, \ldots, R. Then G^{(r)}_{\epsilon,\tau} defines a neighborhood of the integral curve x^{(r)}_0(t, a^*), t \in [0, \tau], as we vary the initial point a. Define
x^{(r)}_b(t, a) = x^{(r)}_0\big(t + n^{-\alpha} \varphi^{(r)}_b(t)\, \psi^{(r)}(n^{\gamma}|a - a^*|),\ a\big),
where \varphi^{(r)}_b(t), t \in [0, T], is a family of twice continuously differentiable functions indexed by b \in \{0, 1\}^P. Also suppose that \varphi^{(r)}_b(t) \not\equiv 0, \varphi^{(r)}_b(0) = 0, \varphi^{(r)}_b(T) = 0, and -1 < \nabla\varphi^{(r)}_b(t) \le 1, for r = 1, \ldots, R. Additionally assume that \psi^{(r)}(z), z > 0, is a three times continuously differentiable function such that
0 < \psi^{(r)}(z) < c/\|v^{(r)}_0\|_{\infty}, \quad \nabla\psi^{(r)}(0) = \nabla^2\psi^{(r)}(0) = 0, \quad \nabla\psi^{(r)}(z) \le 0 \text{ for } z > 0, \quad \psi^{(r)}(z) = 0 \text{ for } z > \epsilon.
Note that the perturbation in the parameter t is small enough, and for all r = 1, \ldots, R the perturbation vanishes far enough from x^{(r)}_0(t, a^*), t \in [0, T]. Then the corresponding perturbation in the vector field can be found as
\frac{d}{dt} x^{(r)}_b(t, a) = v^{(r)}_b(x^{(r)}_b(t, a)) = v^{(r)}_0\big(x^{(r)}_0(t + n^{-\alpha}\varphi^{(r)}_b(t)\psi^{(r)}(n^{\gamma}|a - a^*|), a)\big) \big(1 + n^{-\alpha}\nabla\varphi^{(r)}_b(t)\psi^{(r)}(n^{\gamma}|a - a^*|)\big). \quad (3.2.1.1)
By the flow box theorem (see Lemma 3.2.120 in Chicone (1999)), for x^{(r)} \in G^{(r)}_{\epsilon,T} there are uniquely defined twice continuously differentiable functions t^{(r)}_b(x) \in [0, T] and a_b(x) \in A_\epsilon such that there is a b-perturbed integral curve starting in A_\epsilon which goes through x^{(r)} and
x^{(r)}_b(t^{(r)}_b(x^{(r)}), a_b(x^{(r)})) = x,
for r = 1, \ldots, R, since v^{(r)}_0(x) \neq 0 for all x \in G^{(r)}_{\epsilon,T}. So the expression in (3.2.1.1) can be written as
v^{(r)}_{b,n}(x) = v^{(r)}_0(x)\big(1 + n^{-\alpha}\nabla\varphi^{(r)}_b(t^{(r)}_b(x))\psi^{(r)}(n^{\gamma}|a_b(x) - a^*|)\big). \quad (3.2.1.2)
Now, for a fixed b \in \{0, 1\}^P, we could construct an order-M, rank-R tensor such that
D_b(x) = \lambda_1 v^{(1)}_{b,n}(x)^{\otimes M} + \lambda_2 v^{(2)}_{b,n}(x)^{\otimes M} + \ldots + \lambda_R v^{(R)}_{b,n}(x)^{\otimes M}. \quad (3.2.1.3)
Also define the tensor corresponding to the vectors v^{(r)}_0(x), r = 1, \ldots, R, named D_0, as
D_0(x) = \lambda_1 v^{(1)}_0(x)^{\otimes M} + \lambda_2 v^{(2)}_0(x)^{\otimes M} + \ldots + \lambda_R v^{(R)}_0(x)^{\otimes M}, \quad (3.2.1.4)
where the pseudo-eigenvalues \lambda_r and eigenvectors v^{(r)}_0, r = 1, \ldots, R, are chosen in such a way that D_0 belongs to the parametric subclass in D_2(a^*, G, T), meaning in particular that it satisfies (A1).

Below we present the first lemma, which will ensure that the construction proposed above yields a parametric family of tensors that satisfy condition (A1). Let us introduce some additional notation: for p = 1, \ldots, R,
D^{(p)}_0(x) = \sum_{r=p}^{R} \lambda_r v^{(r)}_0(x)^{\otimes M}, \qquad D^{(p)}_b(x) = \sum_{r=p}^{R} \lambda_r v^{(r)}_{b,n}(x)^{\otimes M}. \quad (3.2.1.5)
Also we assume that \|v^{(r)}_0(x)\| = 1; then, writing \|v^{(r)}_{b,n}(x)\|^2 =: c^{(r)}_{b,n}, it is easy to note that c^{(r)}_{b,n} \to 1 as n \to \infty.
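To make the perturbation construction concrete, the sketch below builds a b-perturbed curve x_b(t, a) = x_0(t + n^{-\alpha}\varphi_b(t)\psi(n^{\gamma}|a - a^*|), a) from an explicit base curve and explicit bump-type choices of \varphi_b and \psi. The base curve, \varphi_b, \psi and the exponents are illustrative assumptions that satisfy the stated conditions only qualitatively; they are not the functions used in the proofs.

```python
import numpy as np

alpha, gamma, n = 0.5, 0.25, 10_000     # illustrative exponents and sample size (assumptions)
eps, a_star, T = 0.2, 0.0, 1.0          # neighborhood radius, base starting point, time horizon

def x0(t, a):
    """Hypothetical base integral curve x_0(t, a) in R^3, indexed by the starting point a."""
    return np.stack([t, np.sin(t) + a, 0.3 * t ** 2], axis=-1)

def phi_b(t):
    """Twice differentiable perturbation with phi(0) = phi(T) = 0 and |phi'| < 1."""
    return 0.25 * np.sin(np.pi * t / T) ** 2

def psi(z):
    """Smooth, non-increasing cutoff: positive for z < eps and zero for z >= eps (illustrative)."""
    z = np.abs(z)
    denom = np.maximum(eps ** 2 - z ** 2, 1e-12)
    return np.where(z < eps, np.exp(-z ** 2 / denom), 0.0)

def x_b(t, a):
    """Perturbed curve x_b(t, a) = x_0(t + n^{-alpha} phi_b(t) psi(n^gamma |a - a*|), a)."""
    shift = n ** (-alpha) * phi_b(t) * psi(n ** gamma * np.abs(a - a_star))
    return x0(t + shift, a)

t = np.linspace(0.0, T, 200)
for a in (a_star, a_star + 2 * eps):    # a seed on the base curve and one outside the shrinking neighborhood
    gap = np.max(np.linalg.norm(x_b(t, a) - x0(t, a), axis=-1))
    print(f"a = {a:+.2f}: max |x_b - x_0| = {gap:.2e}")
```

The output illustrates the two qualitative features used later: the perturbation is of order n^{-\alpha} near the base curve and vanishes identically once the starting point is far from a^*.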
Lemma 10. For p = 1, \ldots, R and for all x \in G^{(r)}_{\epsilon,T}, r = 1, 2, the tensors D^{(p)}_b satisfy
Ker\big(T(v^{(p)}_{b,n}(x), D^{(p)}_b(x)) - \lambda_p I\big) = 0.

We provide the proof of this lemma, along with the proofs of the theorems, in section 5 of this chapter.

3.3 Simulation study and Data analysis

3.3.1 Simulation Study

Following the setup of Sakhanenko and DeLaura (2017), we consider a simulation study based on the "Y" pattern that often presents itself when neural fibers branch. Based on the signal-to-noise ratio (SNR) and the thickness of the fibers, we provide a 3D plot (see Figure 3.1) of an estimated integral curve along with 95% pointwise confidence ellipsoids computed using the method proposed in Carmichael and Sakhanenko (2015).

Figure 3.1: We trace the integral curve using the Carmichael and Sakhanenko (2015) method, creating the "Y" pattern. (a) 3-dimensional trace of the "Y" pattern. (b) 3-dimensional trace of the "Y" pattern with 95% confidence ellipsoids. Here the sample size is n = 60^3, the signal-to-noise ratio is SNR = 2, the thickness of the fiber is \epsilon = 0.04, the step size is \delta = 2, and the constant \beta = 10^{-7}.

We simulate the "Y" pattern using several sample sizes from 30^3 to 100^3. For each curve we computed the estimated constant in the lower bound in Theorem 2, which is
\kappa = w\big( n^{1/3} ( \|\hat{X}^{(1)}_n - x^{(1)}\|_{2,T},\ \|\hat{X}^{(2)}_n - x^{(2)}\|_{2,T} ) \big).
As our method is globally minimax with respect to the asymptotic risk of the estimators, we can compare these \kappa values across different scenarios to choose the best one. Below we provide the 25th percentile, the median and the 75th percentile of the \kappa values simulated 100 times for each of the eight different sample sizes.

Figure 3.2: The red, blue and green lines show the 25th, 50th and 75th percentiles of the \kappa values across all the sample sizes n = n_0^3, n_0 = 30, \ldots, 100, increasing n_0 by 10 at each step, repeated 100 times. Here we used step size \delta = 0.02 and \beta = 10^{-7} for each of the iterations.

As we can see from Figure 3.2, the median of the \kappa values tends to stabilize when we use n_0 = 60. Therefore, next we investigate the robustness of the \kappa values when we vary the signal-to-noise ratio (SNR) and the thickness of the fibers with the sample size n = 60^3. In Table 3.1 we provide the results.
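Before turning to Table 3.1, here is a minimal sketch of how the quartiles of \kappa can be computed from repeated simulations. The two "branches", the noise level and the weight function are placeholder assumptions (we take w(u_1, u_2) = u_1^2 + u_2^2, one of the admissible losses in \tilde{W}, and n^{1/3} = n^{2/(d+3)} for d = 3); the sketch is not the "Y"-pattern tracer itself.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 1.0
t = np.linspace(0.0, T, 200)

# Two hypothetical true branches of a "Y"-like pattern (placeholders, not the simulated HARDI field).
x1 = np.stack([t, t, 0.2 * t], axis=1)
x2 = np.stack([t, -t, 0.2 * t], axis=1)

def l2_norm(y, t):
    """Integrated L2-norm ||y||_{2,T}."""
    return np.sqrt(np.trapz(np.sum(y ** 2, axis=1), t))

def kappa(hat1, hat2, n, w=lambda u1, u2: u1 ** 2 + u2 ** 2):
    """kappa = w( n^{1/3} ||X1_hat - x1||_{2,T}, n^{1/3} ||X2_hat - x2||_{2,T} )."""
    r = n ** (1.0 / 3.0)
    return w(r * l2_norm(hat1 - x1, t), r * l2_norm(hat2 - x2, t))

n0 = 60
n = n0 ** 3
vals = []
for _ in range(100):                                   # 100 Monte Carlo repetitions
    noise = lambda: rng.normal(scale=0.05 * n ** (-1.0 / 3.0), size=x1.shape)
    vals.append(kappa(x1 + noise(), x2 + noise(), n))

q25, q50, q75 = np.percentile(vals, [25, 50, 75])
print(f"n0 = {n0}: kappa quartiles = {q25:.3g}, {q50:.3g}, {q75:.3g}")
```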
Table 3.1: The 25th percentile, median and 75th percentile of the \kappa values for different combinations of SNR and fiber thickness; the sample size is n = 60^3.

                              Fiber thickness
SNR   Quartile   0.02             0.04              0.06              0.08              0.1
2     25th       2.14 x 10^-4     9.82 x 10^-5      7.27 x 10^-5      7.6 x 10^-5       8.19 x 10^-5
2     median     3.25 x 10^-3     2.88 x 10^-4      2.05 x 10^-4      1.95 x 10^-4      2.6 x 10^-4
2     75th       2.63 x 10^-2     2.39 x 10^-3      1.22 x 10^-3      8.3 x 10^-4       1.13 x 10^-3
4     25th       2.04 x 10^-7     1.14 x 10^-7      1.05 x 10^-7      1.07 x 10^-7      9.4 x 10^-8
4     median     1.17 x 10^-5     2.39 x 10^-7      2.34 x 10^-7      2.55 x 10^-7      2.35 x 10^-7
4     75th       7.38 x 10^-4     1.49 x 10^-6      5.74 x 10^-7      8.09 x 10^-7      6.1 x 10^-7
6     25th       1.78 x 10^-8     3.49 x 10^-9      2.42 x 10^-9      3.05 x 10^-9      2.99 x 10^-9
6     median     4.16 x 10^-7     8.82 x 10^-9      4.86 x 10^-9      6.3 x 10^-9       5.55 x 10^-9
6     75th       5.3 x 10^-5      2.93 x 10^-8      1.85 x 10^-8      2.18 x 10^-8      1.35 x 10^-8
8     25th       5.25 x 10^-10    1.86 x 10^-10     2.17 x 10^-10     2.11 x 10^-10     2.03 x 10^-10
8     median     4.31 x 10^-8     4.02 x 10^-10     3.7 x 10^-10      4.02 x 10^-10     3.93 x 10^-10
8     75th       1.87 x 10^-6     1.13 x 10^-9      1.53 x 10^-9      8.12 x 10^-10     7.28 x 10^-10
10    25th       1.05 x 10^-10    2.58 x 10^-11     3 x 10^-11        2.67 x 10^-11     2.78 x 10^-11
10    median     6.1 x 10^-9      5.21 x 10^-11     5.97 x 10^-11     5.28 x 10^-11     5.05 x 10^-11
10    75th       9.07 x 10^-8     1.07 x 10^-10     1.34 x 10^-10     1.02 x 10^-10     1.53 x 10^-10

From Table 3.1 it is evident that if we increase the SNR, the \kappa value decreases, indicating that as the signal gets stronger, the traced curve estimates the true curve more accurately, and hence the asymptotic risk decreases. However, the thickness of the curves does not affect the value of \kappa in a significant way. Overall, the smallest thickness seems to have the worst performance, while the best value of \kappa is obtained more often when the thickness is close to 0.06, indicating that the uncertainty in estimation is higher when the thickness of the curves is extremely high or low.

Figure 3.3: The computation times in seconds for the simulation of \kappa values for different choices of SNR and thickness values.

From Figure 3.3 we can see that the optimal computation time is achieved when SNR = 4 and the thickness of the fiber is 0.02. In practice, SNR values normally range from 3 to 5 for HARDI data, which suggests that our method could also achieve optimality with respect to computation time on real-life HARDI data.

3.3.2 Neuroimaging example

Several DWI datasets were collected from a 28-year-old healthy male brain on a GE 3T Signa HDx MR scanner (GE Healthcare, Waukesha, WI) with an 8-channel head coil. The subject signed the consent form approved by the Michigan State University Institutional Review Board. DWI images were acquired with a spin-echo echo-planar imaging (EPI) sequence for several minutes per session, using the protocols with the parameters summarized in Table 3.2. All protocols had 48 contiguous 2.4-mm axial slices in an interleaved order and matrix size = 128 x 128, TE = 72.3 ms, TR = 7.5 s, b = 1000 s/mm^2, FOV = 22 cm x 22 cm, and parallel imaging acceleration factor = 2.

Table 3.2: Protocols for HARDI. Note that the second protocol has a total of 6 b0 images after 2 repetitions of 3 b0 images.

Protocol                 scan time    slice size   NEX   # of slices   TR       # of b0 images
30 directions            6.5 mins     2.4 mm       1     48            11.5 s   3
30 directions, 2 reps    12.9 mins    2.4 mm       2     48            11.5 s   3
60 directions            12.9 mins    2.4 mm       1     48            11.5 s   6
90 directions            19.2 mins    2.4 mm       1     48            11.5 s   9
150 directions           20 mins      2.4 mm       1     32            7.5 s    9

We traced axonal fibers in the anterior part of the corpus callosum, which connect the right and left frontal lobes. The general anatomical locations of these axonal fibers are well established. These fibers can be used to evaluate new techniques in fiber tractography.
Several initial points were chosen in the region of interest (ROI) based on anatomical considerations. Under each protocol, starting from each seed point, we used the estimation technique in Carmichael and Sakhanenko (2015) to trace a fiber until it ran into water. Figure 3.4 provides the reference images for the protocol with 60 directions. For each curve we computed the estimated constant in the lower bound in Theorem 2, which is κ = w(n^{1/3} ‖X̂_n − x‖_{2,T}). We used the weight function w(u) = u^2/T. Since the estimation method has the rate optimality property (with respect to n), the only thing left to optimize is the constant. It can be used to compare different rate-optimal estimators; in this study the estimators differ according to the underlying protocols used to obtain the data.

Figure 3.4: A neuronal fiber bundle across the genu of the corpus callosum, created with the Carmichael and Sakhanenko (2015) method. (a) Single seed point: one particular seed point was used, and the confidence ellipsoids along the fiber have 95% confidence level. (b) Multiple seed points: several seed points were used to create several estimated fibers; two different branches are shown in two colors.

A smaller constant κ indicates a more successful estimator. Table 3.3 summarizes our findings.

Table 3.3: Comparison of the constant κ for tracing of anterior fibers based on imaging datasets obtained via different scanning protocols. Here δ = 0.003, β = 10^{-7}.

Protocol               Number of voxels   ROI size   25th     Median   75th
30 directions          128 × 128 × 48     119        0.1156   0.3630   1.6954
30 directions, 2 reps  128 × 128 × 48     122        0.0915   0.2851   0.8066
60 directions          128 × 128 × 48     120        0.0417   0.1211   0.3596
90 directions          128 × 128 × 48     119        0.0481   0.1955   1.1941
150 directions         128 × 128 × 32     171        0.1869   0.5762   1.4582

The protocol with 60 directions has the lowest κ, while the protocol with 90 directions has the second smallest κ. For each protocol the distribution of the constant values with respect to the initial point is skewed right, which is expected since the uncertainty increases along the fibers. It is quite interesting that the design with 60 different directions gives a lower κ than the block design in which 30 directions are independently repeated twice. Usually, block designs yield smaller variances for the estimated fibers locally, so one might argue that block designs have an advantage locally, see Sakhanenko (2013). However, here the block design performs worse in a global sense. We speculate that this is due to the noise behaviour along the whole fiber.

3.4 Remarks and Conclusion

In summary, in this work we have proved the minimax optimality of the asymptotic risk of the nonparametric integral curve estimators described in Carmichael and Sakhanenko (2015) over the whole domain of the imaging field G. Therefore, we have established the global minimax optimality of the estimation method. Although the asymptotic rates that we proposed are minimax optimal in the global sense, one can further optimize the constant κ involved in the risk function to sharpen the results. In our data analysis we have provided a comparative study of different imaging protocols with respect to this constant, and we have found the protocol that provides the optimal value of the global asymptotic risk. The analysis was performed on a single subject (human brain); we have similar results for another subject.
This can be further studied with more different subjects to understand if the optimal protocol for scanning procedure remains the same across subjects. However, it is beyond the scope of this work and could be explored as a nice applied direction for future research in this topic. On the theoretical side, one can explore the constant in lower bounds and obtain optimal constants to refine theoretical results further. 82 3.5 Proofs 3.5.1 Proof of Lemma 10 Proof. In order to prove Lemma 10, let us first assume n is sufficiently large so that all the terms with o(n−α) can be ignored and all assertions hold up to terms of order o(n−α). Then note that for r = R T (v =T (v (R) (R) b,n (x), D b (x))km (R) 0 (R) (x), D 0 (x))km d(cid:88) d(cid:88) i3=1 iM =1 =λR(M − 1) . . . v (R) 0k (x) v (R) 0m (x) (v (R) 0i3 (x))2 . . . (v (R) 0iM (x))2 − λRδkm =λR[(M − 1)v0kv0m − δkm]. To show Ker(T (v (R) 0 (R) (x), D 0 (x)) − λRI) = 0, we consider any arbitrary u ∈ Rd such that (cid:104)λR[(M − 1)v (R) 0k v (R) 0m − δkm], u(cid:105) = 0 (M − 1)v (R) 0k v (R) 0m um = d(cid:88) m=1 δkmum d(cid:88) d(cid:88) k=1 (R) v 0k uk (M − 1)(v (R) 0k )2v (R) 0m um = (R) v 0k uk =⇒ d(cid:88) =⇒ d(cid:88) m=1 d(cid:88) k=1 m=1 =⇒ (M − 1) =⇒ d(cid:88) d(cid:88) v (R) 0m um = m=1 k=1 v (R) 0m um = 0. m=1 83 Therefore, it implies that, if uk = 0, k = 1, . . . , d, then Ker(T (v (R) 0 (R) (x), D 0 (x))− λRI) = 0. Similarly, (T (v (p) 0 (x), D =(M − 1) R(cid:88) (p) 0 (x)) − λpI)km d(cid:88) d(cid:88) λr . . . iM =1 r=p i3=1 − δkmλp (cid:104) = (M − 1)λpv (p) 0k (x)v (p) 0m(x) − δkmλp (R) 0k (x)v v (R) 0m (x) v (R) 0i3 (x) . . . v (R) 0iM (x) v (p) 0i3 (x) . . . v (p) 0iM (x) (cid:105) + (M − 1) R(cid:88) r=p+1 λrv (r) 0k (x)v (r) 0m(x)qM−2 r,p , (3.5.1.1) where qr,p = (cid:104)v Ker(T (v (p) 0 (x), D (p) (r) 0 , v (p) 0 (cid:105), p (cid:54)= r = 1, . . . , R. Now consider a generic u ∈ Rd, which belongs to 0 (x)) − λpI) then d(cid:88) m=1 (M − 1)λpv (p) 0k (x)v d(cid:88) R(cid:88) m=1 r=p+1 (p) 0m(x)um + (M − 1) d(cid:88) (p) 0k (x) we get v Then taking a sum over d(cid:88) d(cid:88) m=1 k=1 (p) 0m(x)um + (M − 1) v (p) 0m(x)um + (M − 1) v R(cid:88) d(cid:88) R(cid:88) m=1 r=p+1 λrqM−1 r,p λrv r,p = λp (r) 0m(x)umqM−1 d(cid:88) (r) 0m(x)um = 0. v (M − 1)λp (M − 2)λp λrv (r) 0k (x)v (r) 0m(x)umqM−2 r,p = λpuk. (3.5.1.2) d(cid:88) k=1 (p) 0k (x)uk, v m=1 r=p+1 m=1 84 d(cid:80) d(cid:88) k=1 m=1 (M − 1)λpql,p Similarly, taking (l) 0k (x) on (3.5.1.2), we get for l = p + 1, . . . , R, v (p) 0m(x)um + (M − 1) v R(cid:88) r=p+1 d(cid:88) m=1 (r) 0m(x)um v (3.5.1.3) r,p λrql,rqM−1 d(cid:88) = λp (l) 0m(x)um. v d(cid:80) m=1 Suppose m=1 (l) 0m(x)um = Al, then together with (3.5.1.2) and (3.5.1.3) we get a linear v system of equations: λp(M − 2)Ap + (M − 1) R(cid:88) λrqM−1 R(cid:88) r=p+1 r,p Ar = 0, λp(M − 1)ql,pAp + (M − 1) λrql,rqM−2 r,p Ar − λpAl = 0, (3.5.1.4) l = p + 1, . . . , R. r=p+1 We can write these R − p + 1 equations in R − p + 1 variables Al, l = p + 1, . . . , R given by (3.5.1.4), in matrix notations: QR,pA = 0, where A = (Ap, Ap+1, . . . , AR) and  QR,p =  . (M − 2)λp (M − 1)λpqp+1,p (M − 1)λp+1qM−1 p+1,p (M − 1)λp+1qM−2 p+1,p − λp . . . . . . (M − 1)λRqM−1 R,p (M − 1)λRqp+1,RqM−1 R,p ... (M − 1)λpqR,p . . . . . . (M − 1)λRqM−2 R,p − λp 85 Now suppose Det(QR,p) (cid:54)= 0, then d(cid:88) Al = v (l) 0mum = 0, l = p, p + 1, . . . , R, (3.5.1.5) or in other words m=1  (p) v 01 . . . v (p) 0d ... (R) v 01 . . . v (R) 0d  = 0.   u1 ... 
ud Now putting equation (3.5.1.5) back in (3.5.1.3), we get uk = 0, k = 1, . . . , d. Next we need to find under what conditions the (R− p + 1)× (R− p + 1) matrix QR,p is of the full rank for p = 1, . . . , R. Now in order to ensure Det(QR,p) (cid:54)= 0, p = 1, . . . , R, consider the following cases: Case I (k) If R ≤ d, then after choosing R − p pseudo-eigenvectors we can choose the rest p of pseudo- 0 , k = R − p + 1, . . . , R, that are mutually orthogonal, and as a result we eigenvectors v will have qk,m = 0, k, m = R − p + 1, . . . , R. Therefore, as long as λp (cid:54)= 0, we will have Det(QR,p) (cid:54)= 0. Case II If R > d then first let us consider R = d + 1. In that case, we can choose the first d pseudo- (k) eigenvectors v 0 Det(QR,p) (cid:54)= 0, p = 1, . . . , R. to be orthogonal, hence qi,j = δij, i, j = 1, . . . , d. Now we need to check 1. When p = R, we have dim(QR,p) = 1 × 1 and Det(QR,p) (cid:54)= 0 =⇒ λR(M − 2) (cid:54)= 0. 86 2. When p = R − 1, we have dim(QR,p) = 2 × 2 and Det(QR,p) (cid:54)= 0   =⇒ Det (M − 2)λR−1 (M − 1)λRqM−1 R,R−1 (M − 1)λR−1qR,R−1 (M − 1)λRqM−2 R,R−1 − (M − 2) − (M − 1)2qM R,R−1 − λR R,R−1 (cid:54)= 0. =⇒ (M − 1)(M − 2)qM−2   (cid:54)= 0 (3.5.1.6) Now (3.5.1.6) is a polynomial in qR,R−1, that can yield at most M roots, provided λR−1 (cid:54)= 0. Hence, we can choose v such that qR,R−1 satisfies (3.5.1.6). (R) 0 3. When p = R−2, then we have dim(QR,p) = 3×3. Also we use the fact that qR−1,R−2 = 0, as first d pseudo-eigenvectors are orthogonal. Here in this case, the QR,p matrix is given by  (M − 2)λR−2 (M − 1)λR−1qR−1,R−2 (M − 1)λR−2qR−1,R−2 (M − 1)λR−2qR,R−2 (M − 1)λR−1qM−2 (M − 1)λR−1qR−1,RqM−2 R−1,R−2 − λR−1 R−1,R−2  , (M − 1)λR−1qM−1 R,R−2 (M − 1)λRqR−1,RqM−2 R,R−2 (M − 1)λRqM−2 R,R−2 − λR and therefore, Det(QR,p) (cid:54)= 0   =⇒ Det (M − 2)λR−2 0 (M − 1)λR−2qR,R−2 0 −λR−1 0 =⇒ − ((M − 1)(M − 2)qM−2 R,R−2 − (M − 2) − (M − 1)2qM   (cid:54)= 0 (M − 1)λR−1qM−1 R,R−2 (M − 1)λRqR−1,RqM−2 R,R−2 (M − 1)λRqM−2 R,R−2 − λR R,R−2) (cid:54)= 0. 87 Note that the last assertion is similar to (3.5.1.6) provided λR, λR−1, λR−2 (cid:54)= 0. 4. Similarly, for p = R − 3, we have dim(QR,p) = 4 × 4 and here we use the fact that qR−2,R−3, qR−1,R−3 = 0 using similar arguments. Hence, Det(QR,p) (cid:54)= 0 (M − 2)λR−2 =⇒ Det 0 0   0 −λR−2 0 0 0 −λR−1 0 (M − 1)λRqM−1 R,R−3 (M − 1)λRqR−2,RqM−2 R,R−3 (M − 1)λRqR−1,RqM−2 R,R−3 (M − 1)λRqM−2 R,R−3 − λR R,R−3) (cid:54)= 0. 0 =⇒ (−1)2((M − 1)(M − 2)qM−2 0 R,R−3 − (M − 2) − (M − 1)2qM   (cid:54)= 0 (R) In this way we can continue and at each step we need to choose v 0 , in such a way that, (−1)(R−p−1)((M − 1)(M − 2)qM−2 R,p − (M − 2) − (M − 1)2qM R,p) (cid:54)= 0, for p = 1, . . . , R − 1. Finally for any R = d + k, where k > 1, by induction we will get a polynomial expression in qR,R−k+1 qR,R−k qR,R−k−1 . . . qR−1,R−k+1 . . . ... qR,p qR,p , qR−k+1,R−k . . . qR−k+1,p similar to (3.5.1.6), where each of the qR1,R2 fixed integer degree. Therefore, we can choose λ1, . . . , λR (cid:54)= 0 and {v ∈ [−1, 1] and the polynomial will be of a 0 } such that (1) 0 , . . . , v (R) 88 Det(QR,p) (cid:54)= 0, p = 1, . . . , R. Next, we prove Theorem 1 and 2 using the construction that we already mentioned using some intermediate results. It is easy note that as Dc,k ⊂ D2(a∗, G, T ), therefore Theorem 4 is an immediate corollary of Theorem 5. 3.5.2 Proof of Theorem 5 Proof. 
The proof of Theorem 5 will be based on the following Lemmas and are similar to the construction given in Ibragimov and Has’minskii (2013) and (1990). We additionally provide a result that extends Fano’s Lemma in the multidimensional parameter space. We use that result in Lemma 12 tailored to fit our problem and similar to Theorem 5.7 in Devroye (1987). Lemma 11 provides a bound for the function g defined in the earlier section. In Lemma 13 we prove the smoothness condition on the perturbed class of tensor fields Dc,k. Finally, Lemma 14 and 15 provide a construction for the tensor fields and integral curves which we will make use in the proof of Theorem 5. Lemma 11. Let f satisfy condition (A5(cid:48)). Then there exists such δ1 > 0, that for any vector u, satisfying |u| ≤ δ1 we have, g(u) ≤ C(f, δ1)|u|2, where C(f, δ1) is a positive constant. The proof of the Lemma 11 will follow along the same lines of the proof of Lemma 1 from Sakhanenko (2011). Below we consider Proposition 1 in which the Fano’s Lemma is described for multidimensional parameters. 89 Proposition 1. Let X be a random variable whose density depends on parameter from a mul- tidimensional parameter space. Suppose, the densities are indexed by i = (i1, . . . , iR), fi are such that KL divergence between them K(fi, fj) ≤ β and 1 ≤ ir ≤ lr + 1, r = 1, . . . , R. Fur- thermore, for the parameters (θ(1), . . . , θ(R)), for each r let the estimate of θ(r) be Ψ(r)(X), which takes value in {1, . . . , lr + 1}, L = (l1 + 1) . . . (lR + 1). Then, Pi (Ψ(X) (cid:54)= i) ≥ 1 − β + ln 2 ln(L − 1) , sup i where Pi1,...,iR is the probability induced by fi. Proof. Let θ = (θ(1), . . . , θ(R)), be a random vector such that, P(θ(1) = i1, . . . , θ(R) = iR) = 1 L , 1 ≤ ir ≤ lr + 1, r = 1, . . . , R. For ease of notation assume Ψ(X) = (Ψ(1)(X), . . . , Ψ(R)(X)). Then, (cid:88) i1,...,iR P(θ(1) = i1, . . . , θ(R) = iR|X) ln P(θ(1) = i1, . . . , θ(R) = iR|X) (cid:19) (cid:18) P(θ = i|X) =P(θ = Ψ(X)|X) ln(P (θ = Ψ(X)|X)) + P(θ (cid:54)= Ψ(X)|X)log(P (θ (cid:54)= Ψ(X)|X)) P(θ = i|X) (cid:88) i(cid:54)=Ψ(X) + P(θ (cid:54)= Ψ(X)|X) P(θ (cid:54)= Ψ(X)|X) ln P(θ (cid:54)= Ψ(X)|X) ≥ − ln 2 − P(θ (cid:54)= Ψ(X)|X) ln(L − 1), where we applied Lemma (5.1) from the Devroye (1987) twice. The quantity on the left- 90 hand side of this chain of inequalities will now be bounded from above. Indeed, P(θ = i|X) = P(θ(1) = i1, . . . , θ(R) = iR|X) (cid:80) = fi1,...,iR (X) fj1,...,jR . (X) j1,...,jR Thus, P(θ = i|X) ln(P(θ = i|X))) fj(x)dx  1 L  ≥ 1 L fj(x) (cid:88) j ln fj(x) (cid:88) j E( i i (cid:88) (cid:90) (cid:88) (cid:90) (cid:88) (cid:90) (cid:88) (cid:88) 1 L2 1 L i,j i 1 L2 i,j = = = =   fi(x)(cid:80)  1 (cid:88)  fi(x)dx, since, ln fj(x) L j j j ln fj(x) fi(x)(cid:80)  fi(x)(cid:80) (cid:33) (cid:32) j ln fj(x) ln fi fj fidx − lnL K(fi, fj) − lnL ≤β − lnL. We conclude that Hence, β − lnL ≥ − ln 2 − P(θ (cid:54)= Ψ(X)) ln(L − 1). Pi (Ψ(X) (cid:54)= i) ≥ P(Ψ(X) (cid:54)= θ) ≥ lnL − β − ln 2 ln(L − 1) . sup i 91 Finally, Pi (Ψ(X) (cid:54)= i) ≥ 1 − β + ln 2 ln(L − 1) . sup i Next, we describe Proposition 2 to establish a condition for providing a bound for the KL- |v(x)|2dx divergence of the densities in our parametric subclass. Now let us define (cid:107)v(cid:107)2 G =(cid:82) G for a vector field v. Proposition 2. Let Pi, fi be respectively the probability measure and density induced by , i = (i1, . . . , iR). Then for any two indices i (cid:54)= j such that (cid:16) (cid:17) X, BDi(X) + Σ1/2(X)ξ 1 ≤ ir, jr ≤ lr + 1, r = 1, . 
. . , R, (cid:107)Di − Dj(cid:107)2 G ≤ β =⇒ K(fi, fj) ≤ C(f, δ1)CΣ,Bβ, where C(f, δ1) is a constant introduced in Lemma 11 and CΣ,B is a constant depending on Σ and B only. 92 f (Σ−1/2(x)(y − BDi(x))) f (Σ−1/2(x)(y − BDj(x))) f (˜y − Σ−1/2(x)BDi(x)) f (˜y − Σ−1/2(x)BDj(x)) ln ln f (Σ−1/2(x)(y − BDi(x)))dxdy f (˜y − Σ−1/2(x)BDi(x))Det(Σ1/2(x))dxd˜y. Proof. K(fi, fj) (cid:90) (cid:90) G (cid:90) (cid:90) RN = = G = − = − = − RN Using the change of variable ˜y = Σ−1/2y on y only: (cid:90) (cid:90) (cid:90) (cid:90) (cid:17) z + Σ−1/2(x)B(Di(x) − Dj(x)) f (˜y − Σ−1/2(x)BDj(x)) (cid:16) f (˜y − Σ−1/2(x)BDi(x)) RN ln G f ln f (z) f (˜y − Σ−1/2(x)BDi(x))Det(Σ1/2(x))dxd˜y f (z)Det(Σ1/2(x))dxd˜y, G RN using the change of variable z = ˜y − Σ−1/2(x)BDi(x), (cid:90) (cid:90) z + Σ−1/2(x)B(Di(x) − Dj(x)) (cid:17) − f (z) (cid:16) f 1 + ln f (z)  f (z)Det(Σ1/2(x))dxdz. G RN Finally applying Lemma 11 we obtain (cid:90) (cid:90) G K(fi, fj) ≤ C(f, δ1) ≤ C(f, δ1) |Σ−1/2(x)B(Di(x) − Dj(x))|2Det(Σ1/2(x))dx (cid:107)Σ−1/2(x)(cid:107)2 F(cid:107)B(cid:107)2 F(cid:107)Di(x) − Dj(x)(cid:107)2 F Det(Σ1/2(x))dx G ≤ C(f, δ1)CΣ,B(cid:107)Dθ − D˜θ(cid:107)2 ≤ C(f, δ1)CΣ,Bβ. G 93 Lemma 12. Let Dl be a class of L tensor fields Dθ, θ = (θ(1), . . . , θ(R)) inside the parametric subclass of Dc,k, such that for any θ (cid:54)= ˜θ and positive constants β, δ (r) 2 , r = 1, . . . , R, (cid:107)Dθ − D ˜θ(cid:107)2 G ≤ δ1,(cid:107)Dθ − D ˜θ(cid:107)2 G ≤ β and (cid:107)x (r) θ − x (r) ˜θ (cid:107)1,T ≥ 2δ (r) 2 , θ ∈ Rd denotes the integral curve corresponding to v where for each r = 1, . . . , R and θ, x starting at a∗. In addition, let f satisfy condition (A5(cid:48)). Then for any w ∈ ˜W we have, (r) (r) θ Ew((cid:107) ˆX (1) n − x(1)(cid:107)1,T , . . . ,(cid:107) ˆX sup D∈DL (R) n − x(R)(cid:107)1,T ) (cid:18) ≥ |x(r)|≥δ inf (r) 2 : r=1,...,R w(|x(1)|, . . . ,|x(R)|) 1 − nCΣ,BC(f, δ1)β + ln 2 ln(L − 1) (cid:19) . Proof. The proof of this Lemma will follow the structure of the modified version of Fano’s Lemma that we described in Proposition 1. Let Θ be a uniform random vector such that 1 (cid:16) (cid:17) P(θ(1) = i1, . . . , θ(R) = iR) = (l1 + 1) . . . (lR + 1) , 1 ≤ ir ≤ lr + 1, r = 1, . . . , R. Let Pi be the probability measure induced by X, BDi(X) + Σ1/2(X)ξ , i = (i1, . . . , iR). For each r = 1, . . . , R, define Ψ(X1, . . . , Xn, ξ1, . . . , ξn) = (Ψ(1)(X), . . . , Ψ(R)(X)), where Ψ(R)(X) = θ(r) if (cid:107) ˆX n − x(r)(cid:107)1,T ≥ δ (r) (r) 2 . 94 Then by Proposition 1 (modified version of Fano’s Lemma) we have (cid:16)(cid:107) ˆX Ew (1) n − x(1)(cid:107)1,T , . . . ,(cid:107) ˆX (R) n − x(R)(cid:107)1,T (1) n − x(1)(cid:107)1,T , . . . ,(cid:107) ˆX (R) n − x(R)(cid:107)1,T (cid:17) (cid:17) sup D∈D2(a∗,G,T ) (cid:16)(cid:107) ˆX ≥ sup D∈Dl Ew ≥ ≥ ≥ |x(r)|≥δ inf (r) 2 : r=1,...,R |x(r)|≥δ inf (r) 2 : r=1,...,R |x(r)|≥δ inf (r) 2 : r=1,...,R w(|x(1)|, . . . ,|x(R)|) sup D∈Dl w(|x(1)|, . . . ,|x(R)|) max i (cid:18) P((cid:107) ˆX (r) n − x (r) θ (cid:107)1,T ≥ δ (r) 2 ; r = 1, . . . , R) Pi(ψ(r)(X) (cid:54)= θ(r) r = 1, . . . , R) w(|x(1)|, . . . ,|x(R)|) 1 − nI((X1, ξ1), θ) + ln 2 ln(L − 1) (cid:19) , where I ((X1, ξ1), θ) is the Shannon’s information. Notice that the Shannon’s information as mentioned above can be bounded from above as in Has’minskii and Ibragimov (1990): f (Σ−1/2(x)(y − BDj(x))) I ((X1, ξ1), Θ) =L−1 =L−1 l1+1(cid:88) i1=1 l1+1(cid:88) i1=1 (cid:90) lR+1(cid:88) (cid:90) (cid:32) ln . . . 
G RN iR=1 L−1 (cid:80) f (Σ−1/2(x)(y − BDi(x))) (cid:33) f (Σ−1/2(x)(y − BDi(x))) lR+1(cid:88) L−1(cid:80) f (˜y − Σ−1/2(x)BDi(x)) f (˜y − Σ−1/2(x)BDj(x)) j1,...,jR (cid:32) RN iR=1 (cid:33) f (˜y − Σ−1/2(x)BDi(x))Det(Σ1/2(x)) dxd˜y. (cid:90) dxdy (cid:90) G . . . ln j Using the change of variable ˜y = Σ−1/2y on y only and using the concavity of log we can 95 further bound I ((X1, ξ1), Θ) from above by l1+1(cid:88) i1=1 L−1 (cid:90) lR+1(cid:88) (cid:90) (cid:32) ln . . . f (˜y − Σ−1/2(x)BDi(x)) (cid:33) f (˜y − Σ−1/2(x)BD(x)) f (˜y − Σ−1/2(x)BDi(x))Det(Σ1/2(x)) iR=1 RN G dxd˜y. Now we can bound the integral above similarly to the proof of Proposition 2. Hence, we get I ((X1, ξ1), Θ) i1=1 l1+1(cid:88) l1+1(cid:88) l1+1(cid:88) i1=1 . . . . . . . . . iR=1 lR+1(cid:88) lR+1(cid:88) lR+1(cid:88) iR=1 (cid:90) (cid:90) G G C(f, δ1) C(f, δ1) |Σ−1/2(x)BDi(x) − Σ−1/2(x)BD(x)|2Det(Σ1/2(x))dx (cid:107)Σ−1/2(x)(cid:107)2 F(cid:107)B(cid:107)2 F(cid:107)Di(x) − D(x)(cid:107)2 F Det(Σ1/2(x))dx C(f, δ1)CΣ,B(cid:107)Dθ − D˜θ(cid:107)2 G ≤ L−1 ≤ L−1 ≤ L−1 i1=1 iR=1 ≤ C(f, δ1)CΣ,Bβ. This argument along with Propositions 1 and 2, completes the proof of Lemma 12. Next, we define a P/4-separated net in L1-norm B by using b ∈ {0, 1}P similar to Sakhanenko (2013). The cardinality of such set B is larger than exp(P/2), see Ibragimov and Has’minskii (1980, 1981). Recall (3.2.1.2): (r) b,n(x) = v v (r) 0 (x)(1 + n−αϕ b (cid:48)(r) (r) b (x))ψ(r)(nγ|ab(x) − a∗|)). (t Note that by construction, the starting point ab(x) ∈ Aε does not depend on b, therefore 96 ab(x) = a0(x), for all b ∈ {0, 1}P . Moreover, (r) 0 (x) = t t (r) b (x) + n−αϕ (r) b (t (r) b (x))ψ(r)(nγ|a0(x) − a∗|). (3.5.2.1) Following the construction of Sakhanenko (2011), suppose, for each r = 1, . . . , R, we can construct ϕ(r)(t) : R (cid:55)→ [−1, 1] be a twice continuously differentiable function with support [0, 1] such that and ϕ(r)(0) = ϕ(r)(1) = 0,−1 < ϕ (cid:48)(r)(t) ≤ 1, P(cid:88) i=1 ϕ (r) b (t) = bihϕ(r)((t − (i − 1)h)/h), (r) where P = P1nδ, h = T /P. Moreover, suppose the components of ϕ are supported on b equi-length subintervals [(i − 1)h, ih], i = 1, . . . , P. It is very important for us now to show that the tensor subclass is indeed in D2(a∗, G, T ), in particular its members are twice dif- ferentiable. The smoothness condition is a requirement as stated in (A1), which is pivotal to our estimation process of the tensor model of the fiber tracts. Below we would intro- duce Lemma 13 which will establish that the tensors in our parametric subclass are twice continuously differentiable. Lemma 13. For all b ∈ B, we have Db ∈ Dc,k. Proof. Following the arguments in Lemma 3 of Sakhanenko (2011) we note that the quantity (r) t b , r = 1, . . . , R, is twice continuously differentiable. Therefore, the implicit differentiation (r) of the equation (3.5.2.1) would imply the gradient and the Hessian of t b , r = 1, . . . , R, are 97 bounded for all sufficiently large n. In other words, for each r = 1, . . . , R, (cid:48)(r) (cid:107)t b (x)(cid:107)2 ≤ |t 0 (cid:48)(r) (x)| + ε 1 − ε (r) 1 (r) 1 and (cid:107)t (cid:48)(cid:48)(r) b (x)(cid:107)F ≤ ε (r) 2 for some positive constants ε (r) 1 , ε (r) 2 . Also, (cid:48)(r) ϕ b (cid:48)(cid:48)(r) ϕ b (cid:48)(cid:48)(cid:48)(r) b ϕ (t) = (t) = (t) = i=1 P(cid:88) P(cid:88) P(cid:88) i=1 i=1 (cid:48)(r)((t − (i − 1)h)/h), biϕ (cid:48)(cid:48)(r)((t − (i − 1)h)/h)nδP1/T, biϕ (cid:48)(cid:48)(cid:48)(r)((t − (i − 1)h)/h)n2δP 2 biϕ 1 /T 2. Recall (3.2.1.4), D0(x) = R(cid:80) r=1 (r) 0 (x)⊗M . 
Then we can rewrite the tensor elements by λrv  M(cid:89) j=1 R(cid:88) r=1 λr  . (r) v 0,ij (x) D0,i1,...,iM (x) = Taking the first and the second derivatives of D0,i1,...,iM (x) in xu and xu, xv respectively we 98 get ∂D0,i1,...,iM (x) ∂xu = ∂2D0,i1,...,iM ∂xu∂xv (x) = R(cid:88) R(cid:88) r=1 r=1 λr λr + j=1  M(cid:88) (cid:32) M(cid:88) M(cid:88) j=1 v (r) 0,il (x) M(cid:89) (cid:0) M(cid:89) l=1,l(cid:54)=j ∂v (x) (r) 0,ij ∂xu ∂2v (r) 0,ij (x) ∂xu∂xv (r) v 0,ij (x) l=1,l(cid:54)=j (r) 0,ij2 ∂xv ∂v ∂v (x) (r) 0,ij1 ∂xu  , (x)(cid:1) (cid:16) M(cid:89) j1, j2 = 1 j1 (cid:54)= j2 l = 1 l (cid:54)= j1, j2 (cid:17)(cid:33) . (r) v 0,il (x) (3.5.2.2) Similarly (3.2.1.3) states that 0 (x)⊗M(cid:0)1 + n−αϕ (r) (cid:48)(r) b b (x))ψ(r)(nγ|ab(x) − a∗|)(cid:1)M . (r) (t λrv Db(x) = R(cid:88) r=1 R(cid:88) R(cid:88) r=1 Hence, we can rewrite the above tensor element-wise as follows Db,i1,...,iM (x) = = λrv (x) . . . v (r) 0,i1 M(cid:89) λr( (r) v 0,ij (x)) r=1 j=1 (r) 0,iM (x) (cid:18) 1 + n−αϕ (cid:48)(r) b (cid:18) 1 + n−αϕ (cid:48)(r) b (t (cid:19)M b (x))ψ(r)(nγ|a0(x) − a∗|) (cid:19)M (r) (r) b (x))ψ(r)(nγ|a0(x) − a∗|) (t . Now taking the partial derivative of the tensor component Db,i1,...,iM (x) with respect to xu, 99 we get = + r=1 R(cid:88) R(cid:88) (cid:16) r=1 ∂Db,i1,...,iM (x) ∂xu (cid:16) 1 + n−αϕ (cid:48)(r) b λr M(cid:89) (cid:17)M b (x))ψ(r)(nγ|a0(x) − a∗|) (t(r) (cid:32) (cid:16)  M(cid:88) j=1 (x) ∂v(r) 0,ij ∂xu  (x) M(cid:89) (cid:17)M−1 × v(r) 0,il l=1,l(cid:54)=j λrn−α( v(r) 0,ij (x)) M 1 + n−αϕ (cid:48)(r) b b (x))ψ(r)(nγ|a0(x) − a∗|) (t(r) j=1 (t(r) b (x)) ∂t(r) b (x) ∂xu (cid:48)(cid:48)(r) ϕ b ψ(r)(nγ|a0(x) − a∗|) (cid:48)(r) + nγϕ b (t(r) b (x))ψ (cid:48)(r)(nγ|a0(x) − a∗|) (a0(x) − a∗)T |a0(x) − a∗| ∂a0(x) ∂xu (cid:17)(cid:33) . 
Next, the second order derivative of the tensor component Db,i1,...,iM (x) with respect to xu 100 (cid:17)M b (x))ψ(r)(nγ|a0(x) − a∗|) (t(r) M(cid:89) (cid:16) (x) (x) ∂v(r) 0,ij1 ∂xu ∂v(r) 0,ij2 ∂xv v(r) 0,il l=1, l(cid:54)=j1,j2 (3.5.2.3) (cid:0) M(cid:89) l=1,l(cid:54)=j (x)(cid:1) v(r) 0,ij (x) ∂2v(r) 0,ij ∂xu∂xv and xv, where u (cid:54)= v, is given by ∂2Db,i1,...,iM (x) ∂xu∂xv (cid:16) = λr 1 + n−αϕ (cid:48)(r) b M(cid:88) R(cid:88) (cid:16) j1,j2=1, j1(cid:54)=j2 R(cid:88) r=1 + (cid:32) r=1 (cid:48)(cid:48)(r) ϕ b (t(r) b (x)) ∂t(r) b (x) ∂xu (a0(x) − a∗)T |a0(x) − a∗| ∂a0(x) ∂xu (cid:32) j=1 (x) (cid:32) M(cid:88) (cid:17)(cid:33) (cid:17)M−1  (t(r) (x) b (x))ψ +2M n−α 1 + n−αϕ (cid:48)(r) b b (x))ψ(r)(nγ|a0(x) − a∗|) (t(r) λr ψ(r)(nγ|a0(x) − a∗|) + nγϕ (cid:48)(r) b (cid:48)(r)(nγ|a0(x) − a∗|) (cid:33) M(cid:88) M(cid:89) (x) ∂v(r) 0,ij ∂xu v(r) 0,il l=1,l(cid:54)=j j=1 ∂t(r) b (x) ∂xu +M (M − 1)n−2α (cid:48)(cid:48)(r) b ϕ (t(r) b (x)) ψ(r)(nγ|a0(x) − a∗|) (cid:48)(r) + nγϕ b b (x))ψ (cid:48)(r)(nγ|a0(x) − a∗|) (t(r)  M(cid:89) R(cid:88) λr r=1 j=1 (cid:48)(cid:48)(cid:48)(r) ϕ b (t(r) b (x)) (cid:16) (x) v(r) 0,ij (cid:32) (cid:33)T +M n−α (cid:32) ∂t(r) b (x) ∂xu ∂t(r) b (x) ∂xv ψ(r)(nγ|a0(x) − a∗|) 1 + n−αϕ (cid:48)(r) b b (x))ψ(r)(nγ|a0(x) − a∗|) (t(r) (cid:33)2 (a0(x) − a∗)T |a0(x) − a∗| ∂a0(x) ∂xu (cid:17)M−1 × (cid:48)(cid:48)(r) + ϕ b (t(r) b (x)) ∂2t(r) b (x) ∂xu∂xv ψ(r)(nγ|a0(x) − a∗|) (cid:48)(cid:48)(r) + nγϕ b (cid:16) ∂t(r) (t(r) b (x))ψ (cid:18) (a0(x) − a∗)T |a0(x) − a∗| b (x) ∂xu (cid:48)(r) b (cid:48)(r)(nγ|a0(x) − a∗|)× (cid:19) + ∂xv ∂a0(x) ∂t(r) b (x) ∂xv (cid:18) (a0(x) − a∗)T (cid:48)(r)(nγ|a0(x) − a∗|) ×(cid:16)(cid:18) ∂a0(x) (cid:48)(cid:48)(r)(nγ|a0(x) − a∗|) |a0(x) − a∗| (cid:18) (a0(x) − a∗)T (cid:19)T ∂a0(x) |a0(x) − a∗| ∂a0(x) ∂xv + n2γϕ (t(r) b (x))ψ (cid:48)(r) + nγϕ b (t(r) b (x))ψ (a0(x) − a∗)T |a0(x) − a∗| + ∂2a0(x) ∂xu∂xv − (a0(x) − a∗)T |a0(x) − a∗|3 ∂a0(x) ∂xu (a0(x) − a∗)T ∂a0(x) ∂xv ∂xv ∂xu |a0(x) − a∗| (cid:19)(cid:17) (cid:17)(cid:33) . ∂a0(x) (cid:19)(cid:18) (a0(x) − a∗)T ∂xu |a0(x) − a∗| 1 (cid:19) ∂a0(x) ∂xu 101 The leading term in (3.5.2.3) is n2γ−αϕ (cid:48)(r) b (t(r) b (x))ψ (cid:48)(cid:48)(r)(nγ|a0(x) − a∗|) (cid:18) (a0(x) − a∗)T |a0(x) − a∗| (cid:19)(cid:18) (a0(x) − a∗)T |a0(x) − a∗| ∂a0(x) ∂xv (cid:19) . ∂a0(x) ∂xu Now by construction the vector fields v (r) 0 (x), r = 1, . . . , R, are bounded and twice continu- ously differentiable, and the functions ϕ(r), ψ(r), r = 1, . . . , R, are bounded thrice and twice continuously differentiable, respectively. Therefore, in the expression of (3.5.2.3), if α > 0 and γ = α/2, then Db satisfies (cid:107)Db−D0(cid:107)∞ ≤ cn−α and (cid:107)D (cid:48)(cid:48) b −D (cid:48)(cid:48) 0(cid:107)∞ ≤ k for some constant k > 0. Now we move on to our next lemma which shows that the difference in the tensors for any two b ∈ B is bounded in the integrated L2-norm. Lemma 14. For any b, ˜b ∈ B, such that b (cid:54)= ˜b and any large enough n we have (cid:107)Db(x) − D˜b(x)(cid:107)2 G ≤ Cn−(d−1)γ+2α, where C > 0 is a constant. 102 r=1 λrv (r) b,n(x)⊗M . Now for fixed b, ˜b ∈ B, we have Proof. 
From (3.2.1.3) we have Db(x) =(cid:80)R (cid:90) (cid:107)Db(x) − D˜b(x)(cid:107)2 (cid:90) (cid:90) (x)⊗M(cid:17)(cid:107)2 (cid:107)Db(x) − D˜b(x)(cid:107)2 b,n(x)⊗M − λrv(r) λrv(r) F dx F dx ˜b,n = = G G G (cid:16) (cid:16) (cid:107) R(cid:88) (cid:107) R(cid:88) r=1 λrv(r) 0 (x)⊗M (1 + n−αϕ (cid:48)(r) b = − λrv(r) 0 (x)⊗M (1 + n−αϕ 0 (x)⊗M(cid:16) (cid:48)(r) ˜b (t(r) ˜b (1 + n−αϕ (cid:48)(r) b (cid:107) R(cid:88) λrv(r) b (x))ψ(r)(nγ|a0(x) − a∗|))M (t(r) (x))ψ(r)(nγ|a0(x) − a∗|))M(cid:17)(cid:107)2 F dx b (x))ψ(r)(nγ|a0(x) − a∗|))M (t(r) (cid:48)(r) − (1 + n−αϕ ˜b (t(r) ˜b r(cid:107)v(r) λ2 0 (x)⊗M(cid:107)2 (cid:48)(r) − (1 + n−αϕ ˜b (t(r) ˜b F dx (x))ψ(r)(nγ|a0(x) − a∗|))M(cid:17)(cid:107)2 (cid:16) (x))ψ(r)(nγ|a0(x) − a∗|))M(cid:17)2 (cid:104)(cid:16) (1 + n−αϕ (cid:48)(r) b F dx r(cid:107)v(r) λ2 0 (x)⊗M(cid:107)2 F M n−αψ(r)(nγ|a0(x) − a∗|)(ϕ (cid:48)(r) b r=1 G (cid:90) (cid:90) (cid:90) = ≤ = r=1 G r=1 G R(cid:88) R(cid:88) r=1 G b (x))ψ(r)(nγ|a0(x) − a∗|))M (t(r) b (x)) − ϕ (t(r) (cid:48)(r) ˜b (t(r) ˜b (x))) (cid:17) Cϕ,ψ (cid:105)2 dx, where the sequence C(n) R(cid:88) (cid:16) r=1 λ2 r sup x∈G ϕ,ψ = 1 + O(n−2α), 0 (x)⊗M(cid:107)2 (cid:17)(cid:90) F (cid:107)v(r) G =Cn−2αM 2 (ψ(r)(nγ|a0(x) − a∗|))2(ϕ (cid:48)(r) b b (x)) − ϕ (t(r) (cid:48)(r) ˜b (t(r) ˜b (x)))2dx. (3.5.2.4) In the above expression we use the simple identity: (1 + u)M − (1 + v)M = (u − v)(cid:0)M + (cid:18)M (cid:19) 2 (u + v) + (cid:18)M (cid:19) 3 (u2 + uv + v2) + . . .(cid:1) and the fact that functions ϕ, ψ are bounded. Hence, we can find a bounded sequence 103 (n) ϕ,ψ with which we can bound the o(n−α) terms in the expression (3.5.2.4). Now for each C r = 1, . . . , R, integrals (cid:90) (ψ(r)(nγ|a0(x) − a∗|))2(ϕ (cid:48)(r) b (r) (cid:48)(r) b (x)) − ϕ ˜b (t (r) (t ˜b (x)))2dx G can be bounded by Crn−(d−1)γ using Lemma 4 in Sakhanenko (2011), where Cr > 0 is some generic constant depending on r. Hence, we conclude that (cid:107)Db(x) − D˜b(x)(cid:107)2 G ≤ Cn−2α−(d−1)γ for some generic C > 0. Next, we present the lemma which shows the curves driven by the pseudo-eigenvectors are separated in L1-norm inside the class indexed by B. Lemma 15. For each r = 1, . . . , R and for any b, ˜b ∈ B, such that b (cid:54)= ˜b and any large enough n we have (cid:107)x (r) b − x (r) ˜b (cid:107)1,T ≥ Cn−α−δ for some C > 0 depending on r. 104 Proof. For any b, ˜b ∈ B, b (cid:54)= ˜b, we have (·, a∗)(cid:107)1,T b (t)ψ(0), a∗) − x (r) (t))ψ(0) 0 (r) (r) (r) (cid:107)x = = (cid:90) T b (·, a∗) − x (r) ˜b (cid:32) 0 (t + n−αϕ (cid:107)x (cid:90) T (cid:107)n−α(ϕ b (t) − ϕ (r) (cid:90) 1 ˜b (cid:32) 0 (x(t + n−αψ(0)(πϕ (cid:90) T =n−αψ(0) (cid:90) 1 (r) v 0 0 0 (t)| (r) |ϕ b (t) − ϕ (r) ˜b (cid:90) 1 v (cid:16)(cid:107) ≥n−αψ(0) (cid:90) T 0 |ϕ 0 inf t∈[0,T ] b (t) − ϕ (cid:90) T (r) ˜b (r) (t)|dt =n−αψ(0)Cr |ϕ (r) b (t) − ϕ (r) ˜b (t)|dt. 0 (r) 0 (t + n−αϕ (r) ˜b (t)ψ(0), a∗)(cid:107)1dt (cid:33) dt (t)), a∗))dπ(cid:107)1 (r) b (t) + (1 − π)ϕ (r) ˜b (cid:17)(cid:33) dt (r) (r) 0 (x(t + n−αψ(0)(πϕ b (t) + (1 − π)ϕ (cid:34)(cid:16) (r) ˜b 0 (x(t + n−αψ(0)(πϕ b (t) + (1 − π)ϕ (cid:35) (r) ˜b (t)), a∗))dπ(cid:107)1 (r) (r) (cid:107) v 0 (cid:17) (t)), a∗))dπ(cid:107)1 In order to complete the proof of this lemma we need to show that (r) 0 (x(t + n−αψ(0)(πϕ v (r) b (t) + (1 − π)ϕ (r) ˜b (t)), a∗))dπ(cid:107)1 > 0. 
(3.5.2.5) Let us suppose there exists an n0 such that for all n ≥ n0, (cid:90) 1 0 (cid:107) inf t∈[0,T ] (cid:90) 1 0 (cid:107) inf t∈[0,T ] (r) 0 (x(t + n−αψ(0)(πϕ v (r) b (t) + (1 − π)ϕ (r) ˜b (t)), a∗))dπ(cid:107)1 = 0. 105 Then there exists t0 ∈ [0, T ] such that for n ≥ n0 (r) 0 (x(t0 + n−αψ(0)(πϕ v (r) b (t0) + (1 − π)ϕ (r) ˜b (t0)), a∗))dπ = 0. (3.5.2.6) Using the change of variable t0 + n−αψ(0)(πϕ (r) b (t0) + (1 − π)ϕ (r) ˜b (t0)) = u, we can write (cid:90) 1 0 (3.5.2.6) as (cid:90) t0+n−αψ(0)ϕ (r) b t0+n−αψ(0)ϕ (r) ˜b 0 (x(t0,n + n−αψ(0)(πnϕ (t0) (t0) (r) =⇒ v (r) 0 (x(u, a∗)) v n−αψ(0)(ϕ du (r) (r) b (t0)ϕ ˜b = 0 (t0)) (r) b (t0,n) + (1 − πn)ϕ (r) ˜b (t0,n)))) = 0, where {t0,n}n≥1, t0,n → t0 and πn ∈ [0, 1]. This would imply that there exits x ∈ X0 ⊂ G 0 (x) = 0, x ∈ X0, which violates the assumption that we have imposed in such that v (r) (A1) and (A3). Therefore (3.5.2.5) holds true. Now by following the proof of Lemma 5 in Sakhanenko (2011) the integral (cid:90) T 0 |ϕ (r) b (t) − ϕ (r) ˜b (t)|dt can be shown to be bounded from below by Cn−δ, where C is a generic constant depending on r. Hence, ultimately, we can conclude (cid:107)x (r) b − x (r) ˜b (cid:107)1,T ≥ Cn−α−δ. 106 Now going back to the expression in Theorem 5 using Lemma 12 we obtain ˆX (1) n ,..., ˆX inf n ∈En(a∗,T ) (R) sup D∈Dc,k Ew ˆX (1) n ,..., ˆX inf n ∈En(a∗,T ) (R) Ew sup D∈DL ≥ ≥ (cid:16) n2/(d+3)(cid:16)(cid:107) ˆX n2/(d+3)(cid:16)(cid:107) ˆX (cid:16) (cid:18) (1) n − x(1)(cid:107)p,T , . . . ,(cid:107) ˆX (1) n − x(1)(cid:107)p,T , . . . ,(cid:107) ˆX |x(r)|≥δ inf (r) 2 : r=1,...,R wn(|x(1)|, . . . ,|x(R)|) 1 − nCΣ,BC(f, δ1)β + ln 2 ln(L − 1) (R) (cid:17)(cid:17) n − x(R)(cid:107)p,T (cid:17)(cid:17) (R) n − x(R)(cid:107)p,T (cid:19) . (3.5.2.7) In the above expression we substitute CΣ,BC(f, δ1) = C > 0 as a generic constant and choose β = Cn1−2α−(d−1)γ, δ 2 = C/2n−α−δ. Also it can be shown that L ≥ exp(P/2), (r) see Ibragimov and Has’minskii (1980, 1981), hence by an algebraic manipulation we get ln(L − 1) ≥ P/2 − 1. Therefore, we can rewrite (3.5.2.7) as (cid:16) n2/(d+3)(cid:16)(cid:107) ˆX ˆX (1) n ,..., ˆX inf n ∈En(a∗,T ) (R) Ew sup D∈Dc,k = |x(r)|≥C/2n−α−δ : r=1,...,R inf wn(|x(1)|, . . . ,|x(R)|) >0.5 inf |x(r)|≥0.5Ch: r=1,...,R w(|x(1)|, . . . ,|x(R)|) > 0, (cid:17)(cid:17) n − x(R)(cid:107)p,T (1) (R) n − x(1)(cid:107)p,T , . . . ,(cid:107) ˆX (cid:32) 1 − n1−2α−(d−1)γC2 + ln 2 (cid:33) P/2 − 1 where wn(|x(1)|, . . . ,|x(R)|) = w (cid:16) n2/(d+3)(cid:16)|x(1)|, . . . ,|x(R)|(cid:17)(cid:17) (cid:16) P1−2 (3.5.2.8) . Note that while obtaining (cid:17)− ln 2, and this completes (3.5.2.8) we have chosen α = 2/(d+3), γ = α/2 and δ = 0. Also we can choose P1 sufficiently large, where P = P1nδ, introduced before such that C2 < 4 the proof of Theorem 5. 107 3.5.3 Proof of Theorem 6 To prove part (a), first fix r in 1, . . . , R. Then we can write, E(cid:107) ˆX (r) n − x(r)(cid:107)p,T = E (cid:107) ˆX (r) n (t) − x(r)(t)(cid:107)p pdt  τ(cid:90)  τ(cid:90) 0 0 1/p 1/p ≤ E(cid:107) ˆX (r) n (t) − x(r)(t)(cid:107)p pdt , using Jensen(cid:48)s inequality. (3.5.3.1) Recall that D denote the vectorized representation of the super-symmetric tensor D. 
Using this notation and by definition of an integral curve estimator we can write = = = ˆX 0 D (r) n (t) − x(r)(t) t(cid:90) (cid:16) t(cid:90) (cid:16) (cid:16) ˆX v(r)(cid:16) ˆDn v(r)(cid:16) (cid:16) ˆX t(cid:90) (cid:16) v(r)(cid:16) ∇v(r)(cid:16) (cid:16) t(cid:90) ∇v(r)(cid:16) 0 t(cid:90) 0 + (cid:16) D D D + 0 (r) n (s) (r) n (s) (cid:16) ˆX D D ds ds x(r)(s) x(r)(s) (cid:17)(cid:17)(cid:17) (cid:17)(cid:17)(cid:17) (cid:16) (cid:17)(cid:17) − v(r)(cid:16) (cid:16) (cid:17)(cid:17) − v(r)(cid:16) (cid:17)(cid:17) − v(r)(cid:16) (cid:16) ˆX (cid:17)(cid:16) ˆX (cid:16) (cid:17)(cid:17)∇D (cid:17) n (s) − x(r)(s) (cid:17)(cid:17)(cid:16) ˆDn(x(r)(s)) − D(x(r)(s)) (cid:17) (cid:17)(cid:17)(cid:17) (r) n (s) x(r)(s) ds (r) D (r) n (s) x(r)(s) x(r)(s) ds ds + R (r) n (t), 0 108 where the remainder term R (r) n (t) is given by = R t(cid:90) 0 + (r) n (t) D D (cid:16) (cid:16) (r) n (s) v(r)(cid:16) (cid:16) ˆX (cid:17)(cid:17) − v(r)(cid:16) (cid:16) ˆX (cid:17) n (s) − x(r)(s) t(cid:90) (cid:16) ˆX (cid:16) ˆX (cid:17)(cid:17) − v(r)(cid:16) v(r)(cid:16) (cid:16) ˆDn(x(r)(s)) − D(x(r)(s)) (cid:17) (r) n (s) (cid:16) ds (r) D D ds. 0 (cid:17)(cid:17) − ∇v(r)(cid:16) D (cid:16) (cid:17)(cid:17)∇D x(r)(s) (cid:16) (cid:17)(cid:17) x(r)(s) x(r)(s) (cid:17)(cid:17) − ∇v(r)(cid:16) (r) n (s) (cid:16) D (cid:17)(cid:17)(cid:17) x(r)(s) Now similar to Koltchinskii et al. (2007) we can decompose (r) n (s) − x(r)(s) = Z (r) n (s) + δ ˆX (r) n (s), for each r = 1, . . . , R. Then we can write = x(r)(s) (r) n (s)ds + R (r) n (t) ˆX 0 D (r) n (t) − x(r)(t) t(cid:90) ∇v(r)(cid:16) (cid:16) t(cid:90) ∇v(r)(cid:16) t(cid:90) ∇v(r)(cid:16) + + 0 (cid:16) δ x(r)(s) (cid:17)(cid:17)∇D (cid:16) (cid:17) (cid:17)(cid:16) (cid:17)(cid:16) ˆDn − D (cid:17)(cid:17)∇D (cid:16) (cid:17) (cid:17) D(x(r)(s)) x(r)(s) ds D x(r)(s) x(r)(s) (r) n (s)ds. Z 0 109 Here the stochastic processes δ (r) n (t), Z (r) n (t) for any t ∈ [0, τ ], are represented by (r) n (s)ds + R (r) n (t) δ x(r)(s) ds (cid:17) (cid:17) x(r)(s) x(r)(s) (r) n (s)ds. Z (r) n (t) = δ (r) n (t) = Z x(r)(s) (cid:17)(cid:17)∇D (cid:16) (cid:17) (cid:17)(cid:17)(cid:16) ˆDn − D (cid:17)(cid:16) (cid:16) (cid:17)(cid:17)∇D (cid:17) t(cid:90) t(cid:90) 0 D D x(r)(s) x(r)(s) (cid:16) ∇v(r)(cid:16) (cid:16) ∇v(r)(cid:16) t(cid:90) ∇v(r)(cid:16) (cid:17)(cid:17)∇D (cid:17)(cid:17)(cid:16) ˆDn − D (cid:16) (cid:16) x(r)(s) x(r)(s) D 0 0 D + (cid:16) (cid:16) D x(r)(s) = v (cid:48)(r)(x(r)(s)). Then Z (r) n (t) satisfies the (cid:17)(cid:16) (cid:17) x(r)(s) + Z (r) n (t), Z (r) n (0) = 0. (3.5.3.2) Let us denote ∇v(r)(cid:16) = ∇v(r)(cid:16) equation: dZ (r) n (t) dt (r) n (s) = Z 0 where U (r) : R2 (cid:55)→ Rd2 The solution to the stochastic differential equation in (3.5.3.1), can be represented by the d−dimesional random process t(cid:90) U (r)(t, s)∇v(r)(cid:16) (cid:16) D x(r)(s) (cid:17)(cid:17)(cid:16) ˆDn − D (cid:17)(cid:16) (cid:17) x(r)(s) ds, t ∈ [0, τ ], is the Green’s function defined as the solution of the PDE given by: = ∇v(r)(cid:16) (cid:16) (cid:17)(cid:17)∇D (cid:16) (cid:17) D x(r)(t) x(r)(t) U (r)(t, s), ∂U (r)(t, s) ∂t U (r)(s, s) = Id×d. 110 (r) n (t)(cid:107)p + (cid:107)EZ (r) n (t)(cid:107)p + (cid:107)δ (cid:17)p (r) n (t)(cid:107)p . 
(3.5.3.3) ˜D(Xj), where ˜D(Xj) is the LSE estimator of D(Xj) Now from (3.5.3.1) we have E(cid:107) ˆX (r) p ≤ E(cid:16)(cid:107)Z n (t) − x(r)(t)(cid:107)p (cid:18) x−Xj n(cid:80) (r) n (t) − EZ (cid:19) Since ˆDn(x) = 1 nhd n K j=1 hn given in (1.4.3), we can write ˜D(Xj) = (BT B)−1BT Y (Xj) = (BT B)−1BT (BD(Xj) + Σ1/2(Xj)Ξj) = D(Xj) + Γj, where Γj = (BT B)−1BT Σ1/2(Xj)Ξj. n(cid:80) i=1 (cid:90) Now let us denote Z (r) n (t) = χ (r) i (t), where (cid:33) ds(cid:0)D(Xi) + Γj (cid:1) , x(r)(s) − Xj χ (r) i f (r) t (r) t f (s)K (t) = (s) = I[0,t](s)U (r)(t, s)∇v(r)(cid:16) hn (cid:17) D(x(r)(s)) . Then using similar arguments from Carmichael and Sakhanenko (2015) we can derive (cid:90) (cid:90) EZ (r) n (t) = −hn + 1 2 h2 n (cid:90) (s)∇D(x(r)(s)) f (r) t uK(u)duds f (r) t (s) K(u)(cid:104)∇2D(x(r)(s))u, u(cid:105)duds + o(h2 n). Since we select the bandwidth hn = n−1/(d+3), we get EZ β (t) is the mean of the limiting Gaussian process G(r)(t) for integral curve estimator (r) n (t) = , where (r) µ µ n (r) β (cid:113) (t)+oP (1) nhd−1 111 (cid:32) (cid:90) ˆX (r) n (t) and β is the constant that has been introduced in (A8). Hence, (r) β (t)(cid:113) (cid:107) µ nhd−1 n (cid:107)p = n−2/(d+3)(cid:107)µ (r) β (t)(cid:107)p. Also notice that using similar arguments from Carmichael and Sakhanenko (2015) we get (r) n (t)| = oP ((nhd−1 |δ n )−1/2). sup 0≤t≤τ Now applying Rosenthal’s inequality from Ibragimov and Ibragimov (2008) to independent n with finite p−th moments, we obtain mean zero random variables (t) − Eχ /nhd (t) χ (r) i (r) i (cid:16) (cid:16) E(cid:107) n(cid:88) ≤ d(cid:88) i=1 k=1 C(r)(p) max (cid:32) n(cid:88) E(cid:16)(cid:16) (cid:17) (t) (cid:17) E|(cid:16) χ (r) i (cid:40) n(cid:88) i=1 (r) i,k (t) − Eχ χ (t) − Eχ χ (r) i /nhd n(cid:107)p p (cid:17) (cid:17)2(cid:33)p/2(cid:41) /nhd (r) i,k (t) , n|p, (r) i,k (t) − Eχ (cid:17) (r) i,k (t) /nhd n i=1 112 where C(r)(p) > 0 depends on p and r only. Now we can bound the p−th moment of χ (r) i,k using the similar arguments from Koltchinskii et al. (2007). ds1 (Dk(Xi) + Γj,k) × . . .× (cid:17) E|(cid:16) (cid:90) R ≤C(nhd i,k (t) (cid:90) i,k (t) − Eχ(r) χ(r) n)−pE (cid:32) R f (r) t f (r) t (sp)K (cid:33) (cid:32) n|p /nhd x(r)(s1) − Xi (cid:33) hn (s1)K x(r)(sp) − Xi hn (cid:32) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:90) |f (r) (s1)| . . .|f (r) (sp)| t dsp (Dk(Xi) + Γj,k) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤C(cid:48)(nhd n)−p (cid:32) Rd+p x(r)(s1) − y K =C(cid:48)hd n(nhd (cid:90) hn n)−p (cid:32) Rd+p (cid:33)(cid:33) ds1 . . . dspdy (cid:32) x(r)(sp) − y . . . K hn (s1)| . . .|f (r) |f (r) t x(r)(s2) − x(r)(s1) K(z)K z + t (cid:33) (sp)| × . . . × K (cid:32) z + (cid:33)(cid:33) ds1 . . . dspdz x(r)(sp) − x(r)(s1) hn =C(cid:48)hd+p−1 n (nhd n)−p (s1)||f (r) (s1 + τ2h)| . . .|f (r) (s1 + τph)|K(z)× t t (cid:33) (cid:32) (cid:90) (cid:90) Rp−1 (cid:90) (cid:32) hn |f (r) t τ2hn τphn (cid:19)1/p Rd+p x(r)(s1 + τ2h) − x(r)(s1) × . . .× x(r)(s1 + τph) − x(r)(s1) t (cid:33) (cid:33)(cid:33) (cid:18)(cid:90) dzds1dτ2 . . . dτp (cid:19)1/p(cid:18)(cid:90) ≤C(cid:48)hd+p−1 (nhd n)−p ˜Λ(τ2, . . . , τp) |f (r) t (s1)|pds1 (cid:32) (cid:32) K z + τ2 K z + τp n (cid:18)(cid:90) |f (r) t (s1 + τphn)|pds1 dτ2 . . . dτp (cid:18)(cid:90) ˜Λ(τ2, . . . , τp) (cid:19) ≤C(cid:48)(cid:48)hd+p−1 n (nhd n)−p Rp−1 ≤C1hd+p−1 n (nhd n)−p. |f (r) t (s1)|pds1 113 (cid:19)1/p × . . .× |f (r) t (s1 + τ2hn)|pds1 (3.5.3.4) Here, in (3.5.3.4), ˜Λ(τ2, . . . 
, τp) = sup s1,...,sp∈[0,τ ] (cid:90) Rd (cid:32) K(z)K z + τ2 x(r)(s2) − x(r)(s1) s2 − s1 (cid:33) (cid:32) z + τp × . . . × K (cid:33) dz, x(r)(sp) − x(r)(s1) sp − s1 is analogous to Λ(τ1, τ2, τ3) in Carmichael and Sakhanenko (2015). Dk is the k-th component of vectorized D. Also C1, C, C(cid:48) and C(cid:48)(cid:48) are positive constants depending on B, D, U (r). Thus, from (3.5.3.3), we obtain by replacing hn = n−1/(d+3), i=1 (r) i (t) (cid:16) /nhd n(cid:107)p p (r) χ i (t) − Eχ E(cid:107) n(cid:88) (cid:17) C1nhd+p−1 (cid:40) n (nhd n)p C(cid:48) =dC(r)(p)n−3p/(d+3) max 1n(4−p)/(d+3), C(cid:48) C2nhd+1 n (nhd n)2 ≤dC(r)(p) max (cid:40) (cid:32) (cid:33)p/2(cid:41) , 2np/(d+3) (cid:41) (3.5.3.5) =dC(r)(p)C(cid:48) 2n−2p/(d+3), where C(cid:48) 1, C(cid:48) 2 are positive constants. As a result, we have E(cid:107) ˆX (r) n − x(r)(cid:107)p p ≤n−2p/(d+3)b((cid:107)µ (r) β (t)(cid:107)p) E(cid:107) ˆX (r) n − x(r)(cid:107)p,T ≤n−2/(d+3) b((cid:107)µ (r) β (t)(cid:107)pdt  τ(cid:90) 1/p for some bounded and continuous function b : [0,∞) (cid:55)→ [0,∞). This concludes the proof of 0 part (a) in Theorem 3. 114 To prove part (b), we first notice that we can write E sup t∈[0,τ ] max 1≤k≤d | ˆX (r) n,k(t) − x ≤(cid:107)U (r)(cid:107)∞E sup t∈[0,τ ] max 1≤k≤d (r) k (t)| (cid:16) (r) χ i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n(cid:88) i=1 (cid:17) (t) /nhd n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) + sup t∈[0,τ ] max 1≤k≤d (t) − Eχ (r) i |µ (r) β,k(t)|(cid:113) nhd n + E sup t∈[0,τ ] max 1≤k≤d |δ (r) k (t)|. We already know sup t∈[0,τ ] L2-norm similar to Sakhanenko (2011) we can bound, n |δ(r)(t)| = oP ((nhd−1 )−1/2). Now using a maximal inequality for (cid:17) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:17) (r) χ i (cid:16) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n(cid:88) i=1 i=1 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n(cid:88) E sup (cid:16) n(cid:80) χ (r) i √ ≤ . E sup t∈[0,τ ] (t) − Eχ (r) i (t) (t) (r) i (cid:16) (r) χ i /nhd n /nhd n t∈[0,τ ] max 1≤k≤d d max 1≤k≤d (t) − Eχ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)21/2 (cid:17) (s) = I[0,t](s)U (r)(t, s)∇v(r)(cid:16) (cid:16)E|W (r)(t1) − W (r)(t2)|2(cid:17)1/2 ≤ C(r)n−2/(d+3)|t2 − t1|. (t) − Eχ /nhd (r) t (r) i i=1 (t) Let us denote, W (r)(t) = few steps of (3.5.3.4) with p = 2 and f 0 ≤ t1 < t2 ≤ τ we get n, t ∈ [0, τ ]. Then repeating the first (cid:17) D(x(r)(s)) , for (3.5.3.6) Observe that the diameter of [0, τ ] in the metric m(t1, t2) = C(r)n−2/(d+3)|t2 − t1| is diam([0, τ ], m) = τ C(r)n−2/(d+3) and the maximal number of ε-separated points in [0, τ ] with respect to metric m is Dist(ε, m) = τ C(r)n−2/(d+3)/ε. Now using the inequality on 115 page 100 in van der Vaart and Wellner (1996) yields (cid:32) |W (r)(t)|2 E sup t∈[0,τ ] (cid:33)1/2 ≤(cid:16)E|W (r)(t0)|2(cid:17)1/2 + K(r) (cid:90) 0 diam([0,τ ],m) Dist1/2(ε, m)dε and we can write (cid:90) diam([0,τ ],m) Dist1/2(ε, m)dε = (cid:90) τ C(r)n−2/(d+3) 0 0 On the other hand, from (3.5.3.5) we have generic constant C(r) > 0. Then we have the bound: (cid:17)1/2 τ C(r)n−2/(d+3)/ε dε = 2τ C(r)n−2/(d+3). (cid:16) (cid:16)E|W (r)(t0)|2(cid:17)1/2 ≤ C(r)n−2/(d+3) for some (cid:33)1/2 ≤ C(r)n−2/(d+3). (cid:32) |W (r)(t)|2 E sup t∈[0,τ ] As a result finally we get E sup t∈[0,τ ] max 1≤k≤d | ˆX (r) n,k(t) − x (r) k (t)| ≤ C(r)n−2/(d+3) for some generic C(r) > 0, which concludes the proof of Theorem 6. 
Chapter 4 Future research

In our work we have presented a comprehensive framework for establishing both local and global minimax lower bounds for the asymptotic risk of the integral curve estimator in high order tensor models. Although we have developed the theoretical results specifically for the HARDI model, our work can be generalized to other semi-parametric models in a similar fashion. One shortcoming of our current method of estimation is the computational issues that arise near the branching of integral curves. The computational efficiency of our method can be researched further, which could also reduce the computation time near the branching of the fibers. Another interesting direction for future work could be to find minimax estimators in the style of Efromovich (1998) by further optimizing the constants in the lower bounds for both local and global risks. However, this type of analysis may involve more technical details in the proofs of the results. Moreover, the methodology we developed in Chapter 3 can be used for comparisons of different protocols in a much broader setting with many patients involved in a study. In such a setting we can compare the accuracy of the protocols while controlling for the individual effects of each patient. Thus, it could give us a better understanding of the valuable metric that we have explored in Chapter 3, which could potentially be deployed in future neuroimaging studies for assessing accuracy.

4.1 Extension of integral curve estimation

We would also like to remark that integral curve estimation with proper uncertainty quantification is a powerful tool that can be used to study the more general stochastic process X_t, t ∈ [0, ∞), given by the model

dX_t = µ(X_t, t) dt + σ(X_t, t) dW_t,

where µ(X_t, t) is the drift parameter, σ^2(X_t, t)/2 is the diffusion parameter and W_t, t ∈ [0, ∞), is a Wiener process. The Fokker-Planck equation for the density p(x, t) of the stochastic process X_t is given by

∂p(x, t)/∂t = −(∂/∂x)[µ(x, t) p(x, t)] + (∂^2/∂x^2)[σ^2(x, t) p(x, t)/2].

Some of the contemporary work where a similar model has been used can be found in Zheng et al. (2019) and Toppaladoddi and Wettlaufer (2017). While in Zheng et al. (2019) the authors proposed maximum likelihood estimators of the parameters in the Fokker-Planck equation in the presence of measurement errors modeled by an α-stable Lévy noise, Toppaladoddi and Wettlaufer (2017) studied the numerical aspects of the solution to the Fokker-Planck equation to model the density of the thickness of glacial ice sheets. In this type of framework we can extend integral curve estimation to the stochastic partial differential equation (SPDE) setting with proper uncertainty quantification. Since our method uses relaxed assumptions on the measurement errors involved in the stochastic differential equation, the estimates that we provide are more robust with respect to the underlying model, while achieving optimal confidence bounds at the same time.
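As a small numerical illustration of the model above, the following sketch (with arbitrary, hypothetical drift and diffusion coefficients chosen purely for demonstration) simulates a sample path of dX_t = µ(X_t, t)dt + σ(X_t, t)dW_t by the Euler-Maruyama scheme and compares it with the deterministic integral curve driven by the drift alone, which is the kind of object an extended estimation framework would target in this setting.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX_t = mu(X_t, t) dt + sigma(X_t, t) dW_t on [0, T]."""
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))      # Wiener increment over one step
        x[k + 1] = x[k] + mu(x[k], t[k]) * dt + sigma(x[k], t[k]) * dw
    return t, x

# Hypothetical coefficients for illustration: linear drift, constant diffusion.
mu = lambda x, t: -0.5 * x
sigma = lambda x, t: 0.3

rng = np.random.default_rng(0)
t, path = euler_maruyama(mu, sigma, x0=1.0, T=5.0, n_steps=500, rng=rng)

# The noiseless integral curve solves dx/dt = mu(x, t); for this drift, x(t) = exp(-0.5 t).
integral_curve = np.exp(-0.5 * t)
```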
BIBLIOGRAPHY

[1] Banerjee, C., Sakhanenko, L., Zhu, D.C. Lower Bounds for Accuracy of Estimation in Magnetic Resonance High Angular Resolution Diffusion Imaging Data, J. Indian Soc. Probab. Stat. (2019).

[2] Basser, P.J., Mattiello, J., Le Bihan, D., MR diffusion tensor spectroscopy and imaging, Biophys. J., Vol. 66, 259–267 (1994).

[3] Behrens, T.E., Berg, H.J., Jbabdi, S., Rushworth, M.F., Woolrich, M.W. (2007) Probabilistic diffusion tractography with multiple fibre orientations: What can we gain?, Neuroimage, Vol. 34, 144–155.

[4] Carmichael, O., Sakhanenko, L. Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data, Linear Algebra and its Applications, Vol. 473, 377–403 (2015).

[5] Carmichael, O., Sakhanenko, L. Integral curves from noisy diffusion MRI data with closed-form uncertainty estimates, Stat. Inference Stoch. Process., 19, 289–319 (2016).

[6] Cator, E. (2011) Adaptivity and optimality of the monotone least-squares estimator, Bernoulli, Vol. 17(2), 714–735.

[7] Chang, S.E., Zhu, D.C., Neural network connectivity differences in children who stutter, Brain, Vol. 136, 3709–3726 (2013).

[8] Chicone, C., Ordinary differential equations with applications, Springer Science and Business Media, (2006).

[9] Coddington, E.A., Levinson, N. (1955) Theory of Ordinary Differential Equations. McGraw-Hill, New York.

[10] De Lathauwer, L., De Moor, B., Vandewalle, J. (2000) On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl., Vol. 21, 1324–1342.

[11] Devroye, L., A course in density estimation, Progress in Probability and Statistics, Vol. 14, Birkhäuser, Boston, (1987).

[12] Descoteaux, M., Angelino, E., Fitzgibbons, S., Deriche, R. Apparent diffusion coefficients from high angular resolution diffusion imaging: estimation and applications, Magn. Reson. Med., Vol. 56, 395–410, (2006).

[13] Efromovich, S. (1998) Simultaneous sharp estimation of functions and their derivatives, Ann. Statist., Vol. 26, No. 1, 273–278.

[14] Efromovich, S. (2008) Adaptive estimation of and oracle inequalities for probability densities and characteristic functions, Ann. Statist., Vol. 36, No. 3, 1127–1155.

[15] Efromovich, S. (2014) Nonparametric Curve Estimation, Springer.

[16] Efromovich, S. (2017) Missing, modified, and large-p-small-n data in nonparametric curve estimation, Calcutta Stat. Assoc. Bulletin, 69(1), 1–34.

[17] Efromovich, S. (2018) Missing and Modified Data in Nonparametric Estimation: With R Examples, Chapman and Hall/CRC Monographs on Statistics and Applied Probability.

[18] Ferguson, T.S., Mathematical statistics: A decision theoretic approach, Academic Press, New York, (1967).

[19] Guntuboyina, A., Sen, B. Global risk bounds and adaptation in univariate convex regression, Probab. Theory Relat. Fields, Vol. 163, 379–411 (2015).

[20] Guntuboyina, A. (2011) Minimax Lower Bounds (Doctoral dissertation), Yale University, New Haven, USA.

[21] Hájek, J. (1972) Local asymptotic minimax and admissibility in estimation, Proc. Sixth Berkeley Symp. on Math. Statist. and Prob., Vol. 1, 175–194.

[22] Has'minskii, R.Z., Ibragimov, I.A., On density estimation in the view of Kolmogorov's ideas in approximation theory, Ann. Statist., Vol. 18, No. 3, 999–1010 (1990).

[23] Ibragimov, M., Ibragimov, R., Optimal constants in the Rosenthal inequality for random variables with zero odd moments, Statistics and Probability Letters, Vol. 78, 186–189, (2008).

[24] Ibragimov, I.A., Has'minskii, R.Z. Statistical Estimation - Asymptotic Theory, Springer Science and Business Media, (2013).

[25] Ibragimov, I.A., Khasminskii, R.Z., On estimation of the density function, Zap. Nauch. Sem. LOMI AN SSSR, 98, 61–85, (1980).

[26] Ibragimov, I.A., Khasminskii, R.Z., Further results on nonparametric density estimation, Zap. Nauch. Sem. LOMI AN SSSR, 108, 73–89 (1981).

[27] Kim, A.K.H., Samworth, R.J., Global rates of convergence in log-concave density estimation, Ann. Statist., Vol. 44, No. 6, 2756–2779, (2016).
[28] Koltchinskii, V., Sakhanenko, L., Cai, S. Integral Curves of Noisy Vector Fields and Statistical Problems in Diffusion Tensor Imaging: Nonparametric Kernel Estimation and Hypotheses Testing, Ann. Statist., Vol. 35, No. 4, 1576–1607, (2007).

[29] Le Bihan, D., Mangin, J.F., Poupon, C., Clark, C.A., Pappata, S., Molko, N., Chabriat, H., Diffusion tensor imaging: concepts and applications, J. Magn. Reson. Imaging, Vol. 13, 534–546, (2001).

[30] Özarslan, E., Mareci, T. Generalized diffusion tensor imaging and analytical relationships between diffusion tensor imaging and high angular resolution diffusion imaging, Magn. Reson. Med., Vol. 50, 955–965, (2003).

[31] Raskutti, G., Wainwright, M.J., Yu, B., Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming, Journal of Machine Learning Research, Vol. 13, 389–427, (2012).

[32] Sakhanenko, L. How to choose the number of gradient directions for estimation problems from noisy diffusion tensor data. In: Lahiri, S., Schick, A., SenGupta, A., Sriram, T.N. (Eds), Contemporary Developments in Statistical Theory: A Festschrift for Hira Lal Koul, Springer, pp. 305–311, (2013).

[33] Sakhanenko, L. (2012) Lower bounds for accuracy of estimation in Diffusion Tensor Imaging, Theory Probab. Appl., Vol. 54, 168–177.

[34] Sakhanenko, L., Global rate optimality in a model for Diffusion Tensor Imaging, Theory Probab. Appl., Vol. 55, No. 1, 77–90, (2011).

[35] Sakhanenko, L. and DeLaura, M. A comparison study of statistical tractography methodologies for Diffusion Tensor Imaging, International Journal of Statistics: Advances in Theory and Applications, 1(1), 93–110, (2017).

[36] Stone, C.J., Optimal global rates of convergence for nonparametric regression, Ann. Statist., Vol. 10, No. 4, 1040–1053, (1982).

[37] Toppaladoddi, S., Wettlaufer, J.S., Statistical Mechanics and the Climatology of the Arctic Sea Ice Thickness Distribution, J. Stat. Phys., Vol. 167, 683–702, (2017).

[38] van der Vaart, A.W., Wellner, J.A., Weak Convergence and Empirical Processes with Applications to Statistics, Springer Series in Statistics, Springer, New York, NY, (1996).

[39] Ying, L., Zou, Y., Klemer, D., and Wang, J., Determination of Fiber Orientation in MRI Diffusion Tensor Imaging Based on Higher-Order Tensor Decomposition, Proceedings of the 29th Annual International Conference of the IEEE EMBS, 2065–2068, (2007).

[40] Zheng, Y., Fang, Y., Duan, J., Sun, X., Fu, X., Kurths, J., The maximum likelihood climate change for global warming under the influence of greenhouse effect and Lévy noise, Chaos: An Interdisciplinary Journal of Nonlinear Science, Vol. 1(1), (2020).

[41] Zhu, D.C., Majumdar, S., Integration of resting-state FMRI and diffusion-weighted MRI connectivity analyses of the human brain: limitations and improvement, J. Neuroimaging, Vol. 24, 176–186, (2014).

[42] Zhu, D.C., Majumdar, S., Korolev, I.O., Berger, K.L., Bozoki, A.C., Alzheimer's disease and amnestic mild cognitive impairment weaken connections within the default mode network: a multi-modal imaging study, J. Alzheimers Dis., Vol. 34, 969–984, (2013).

[43] Zhu, H., Zhang, H., Ibrahim, J., and Peterson, B. (2007) Statistical analysis of diffusion tensors in diffusion-weighted magnetic resonance image data, J. Amer. Statist. Assoc., Vol. 102, 1081–1110.

[44] Zhu, H., Li, Y., Ibrahim, I., Shi, X., An, H., Chen, Y., Gao, W., Lin, W., Rowe, D. and Peterson, B. (2009) Regression models for identifying noise sources in magnetic resonance images, J. Amer. Statist. Assoc., Vol. 104, 623–637.