Fast and memory-efficient subspace embeddings for tensor data with applications
The widespread use of multi-sensor technology and the emergence of big data sets have made it necessary to develop more versatile tools for representing higher-order data with multiple aspects and high dimensionality. Data in the form of multidimensional arrays, also referred to as tensors, arise in a variety of applications, including chemometrics, physics, hyperspectral imaging, high-resolution videos, neuroimaging, biometrics, and social network analysis. Early multi-way data analysis approaches reformatted such tensor data as large vectors or matrices and then resorted to dimensionality reduction methods developed for low-dimensional data. However, vectorizing tensors discards the inherent multi-way structure of the data and the possible correlations between different dimensions, in some cases degrading the performance of vector-based methods. Moreover, vectorizing tensors often produces vectors of extremely high dimensionality that can render most existing methods computationally impractical. In the case of dimension reduction, the enormous amount of memory needed to store the embedding matrix becomes the main obstacle. This highlights the need for approaches that operate on tensor data in their multidimensional form. For example, to reduce the dimension of an $n_1 \times n_2 \times \dots \times n_d$ tensor to $m_1 \times m_2 \times \dots \times m_d$ with $m_j \leq n_j$, MPCA\footnote{Multilinear Principal Component Analysis} reduces the memory required for the projection from $\prod_{j=1}^d m_jn_j$ entries for vector PCA to $\sum_{j=1}^d m_jn_j$, which can be a considerable improvement. On the other hand, tensor dimension reduction methods such as MPCA need training samples from which to learn the projection matrices. This makes such methods time-consuming and computationally less efficient than oblivious approaches such as the Johnson-Lindenstrauss (JL) embedding.
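To make the memory comparison concrete, the two counts $\prod_{j=1}^d m_jn_j$ and $\sum_{j=1}^d m_jn_j$ can be evaluated directly. The following is a minimal Python sketch; the function names and the example sizes (a $256 \times 256 \times 256$ tensor reduced to $32 \times 32 \times 32$) are hypothetical and are not taken from the thesis.

```python
from math import prod


def pca_projection_entries(n_dims, m_dims):
    """Entries in the single (prod m_j) x (prod n_j) matrix used by vector PCA."""
    return prod(m * n for m, n in zip(m_dims, n_dims))


def mpca_projection_entries(n_dims, m_dims):
    """Entries in the d mode-wise m_j x n_j matrices used by MPCA."""
    return sum(m * n for m, n in zip(m_dims, n_dims))


# Hypothetical sizes: reduce a 256 x 256 x 256 tensor to 32 x 32 x 32.
n_dims = (256, 256, 256)
m_dims = (32, 32, 32)
print(pca_projection_entries(n_dims, m_dims))   # 549755813888
print(mpca_projection_entries(n_dims, m_dims))  # 24576
```

At 4 bytes per entry, this is 2 TiB for the vector PCA projection versus 96 KiB for the three MPCA factor matrices.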
The term \textit{oblivious} refers to the fact that no data samples are needed beforehand to learn the embedding that projects a new data sample onto a lower-dimensional space.

In this thesis, a review of tensor concepts and algebra, as well as of common tensor decompositions, is presented first. Next, a mode-wise JL approach is proposed for compressing tensors without reshaping them into potentially very large vectors. Theoretical guarantees for the norm and inner product approximation errors, as well as theoretical bounds on the embedding dimension, are presented for data with low CP rank, and the corresponding effects of basis coherence assumptions are addressed. Experiments are performed using various choices of embedding matrices. The results verify the validity of one- and two-stage mode-wise JL embeddings in preserving the norm of MRI data and of synthetic data constructed from both coherent and incoherent bases. Two novel applications of the proposed mode-wise JL method are discussed. (i) Approximate solutions to least squares problems as a computationally efficient way of fitting tensor decompositions: the proposed approach is incorporated as a stage in the fitting procedure and is tested on relatively low-rank MRI data. The results show an improvement in computational complexity at a slight cost in the accuracy of the solution, measured in the Euclidean norm. (ii) Many-Body Perturbation Theory problems involving energy calculations: in large model spaces, the dimensions of the tensors can grow quickly, making the direct calculation of perturbative correction terms challenging. The second-order energy correction term and the one-body radius correction are formulated and modeled as inner products in such a way that mode-wise JL can be used to reduce the computational complexity of the calculations.
Experiments on data from various nuclei, across model spaces of different sizes, show that for large model spaces substantial compression can be achieved at the price of small errors in the estimated energy values.
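The core mode-wise idea can be illustrated with a small sketch. Each mode of a 3-way tensor is multiplied by its own random matrix, so no matrix ever acts on the fully vectorized tensor. This is a minimal pure-Python illustration that assumes i.i.d. Gaussian embedding matrices scaled by $1/\sqrt{m_j}$, which is one of several possible choices of embedding matrix. The function names, tensor sizes, and seeds are hypothetical, and the sketch shows only a single mode-wise stage, not the thesis's two-stage construction. Because the map is linear and a fixed seed reproduces the same matrices, the inner product of two embedded tensors can stand in for the original inner product, which is the property the perturbation theory application relies on.

```python
import math
import random


def shape3(tensor):
    """Shape [n1, n2, n3] of a 3-way tensor stored as nested lists."""
    return [len(tensor), len(tensor[0]), len(tensor[0][0])]


def mode_product(tensor, matrix, mode):
    """Multiply every mode-`mode` fiber of `tensor` by `matrix` (a list of rows
    of length n_mode). Only the size of that mode changes."""
    in_shape = shape3(tensor)
    out_shape = list(in_shape)
    out_shape[mode] = len(matrix)
    out = [[[0.0] * out_shape[2] for _ in range(out_shape[1])]
           for _ in range(out_shape[0])]
    for i in range(out_shape[0]):
        for j in range(out_shape[1]):
            for k in range(out_shape[2]):
                row = matrix[(i, j, k)[mode]]
                src = [i, j, k]
                total = 0.0
                for t in range(in_shape[mode]):
                    src[mode] = t  # walk along the fiber in the contracted mode
                    total += row[t] * tensor[src[0]][src[1]][src[2]]
                out[i][j][k] = total
    return out


def gaussian_jl_matrix(m, n, rng):
    """m x n matrix with i.i.d. N(0, 1/m) entries, so E||A x||^2 = ||x||^2."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def modewise_jl(tensor, target_dims, seed):
    """Embed a 3-way tensor mode by mode; the same seed gives the same map."""
    rng = random.Random(seed)
    out = tensor
    for mode, m in enumerate(target_dims):
        a = gaussian_jl_matrix(m, shape3(out)[mode], rng)
        out = mode_product(out, a, mode)
    return out


def inner(x, y):
    """Frobenius inner product of two equally shaped 3-way tensors."""
    return sum(a * b for sx, sy in zip(x, y) for fx, fy in zip(sx, sy)
               for a, b in zip(fx, fy))


def random_tensor(shape, rng):
    return [[[rng.gauss(0.0, 1.0) for _ in range(shape[2])]
             for _ in range(shape[1])] for _ in range(shape[0])]


# Hypothetical sizes: embed 12 x 12 x 12 tensors into 8 x 8 x 8.
rng = random.Random(0)
X = random_tensor((12, 12, 12), rng)
Z = random_tensor((12, 12, 12), rng)
Y = [[[x + 0.5 * z for x, z in zip(fx, fz)] for fx, fz in zip(sx, sz)]
     for sx, sz in zip(X, Z)]

# Using the same seed applies the same linear map to X and Y.
EX = modewise_jl(X, (8, 8, 8), seed=1)
EY = modewise_jl(Y, (8, 8, 8), seed=1)

print(shape3(EX))  # [8, 8, 8]
print(math.sqrt(inner(EX, EX) / inner(X, X)))  # ||L(X)|| / ||X||, should be near 1
print(inner(EX, EY), inner(X, Y))  # estimated vs. exact inner product
```

Only the three $8 \times 12$ matrices are stored; the $512 \times 1728$ matrix that a vectorized JL embedding of the same output size would need is never formed.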
In Collections: Electronic Theses & Dissertations
Copyright Status: In Copyright
Material Type: Theses
Authors: Zare, Ali
Thesis Advisors: Iwen, Mark A.
Committee Members: Aviyente, Selin; Wang, Rongrong; Xie, Yuying
Date: 2022
Subjects: Data mining
Degree Level: Doctoral
Language: English
Pages: xi, 113 pages
ISBN: 9798834078241
Permalink: https://doi.org/doi:10.25335/1ebyfn97