Fast and memory-efficient subspace embeddings for tensor data with applications

The widespread use of multisensor technology and the emergence of big data sets have brought the necessity to develop more versatile tools to represent higher-order data with multiple aspects and high dimensionality. Data in the form of multidimensional arrays, also referred to as tensors, arise in a variety of applications including chemometrics, physics, hyperspectral imaging, high-resolution videos, neuroimaging, biometrics, and social network analysis. Early multiway data analysis approaches used to reformat such tensor data as large vectors or matrices and would then resort to dimensionality reduction methods developed for low-dimensional data. However, by vectorizing tensors, the inherent multiway structure of the data and the possible correlation between different dimensions will be lost, in some cases resulting in a degradation in the performance of vector-based methods. Moreover, in many cases, vectorizing tensors leads to vectors with extremely high dimensionality that might render most existing methods computationally impractical. In the case of dimension reduction, the enormous amount of memory needed to store the embedding matrix becomes the main obstacle. This highlights the need for approaches that are applied to tensor data in their multi-dimensional form. To reduce the dimension of an $n_1 \times n_2 \times \dots \times n_d$ tensor to $m_1 \times m_2 \times \dots \times m_d$ with $m_j \leq n_j$, MPCA\footnote{Multilinear Principal Component Analysis} would change the memory requirement from $\prod_{j=1}^d m_jn_j$ for vector PCA to $\sum_{j=1}^d m_jn_j$, which can be a considerable improvement.On the other hand, tensor dimension reduction methods such as MPCA need training samples for the projection matrices to be learned. This makes such methods time consuming and computationally less efficient than oblivious approaches such as the Johnson-Lindenstrauss embedding. The term \textit{oblivious} refers to the fact that one does not need any data samples beforehand to learn the embedding that projects a new data sample onto a lower-dimensional space.\\ In this thesis, first a review of tensor concepts and algebra as well as common tensor decompositions is presented. Next, a modewise JL approach is proposed for compressing tensors without reshaping them into potentially very large vectors. Theoretical guarantees for the norm and inner product approximation errors as well as theoretical bounds on the embedding dimension are presented for data with low CP rank, and the corresponding effects of basis coherence assumptions are addressed. Experiments are performed using various choices of embedding matrices. Results verify the validity of one- and two-stage modewise JL embeddings in preserving the norm of MRI and synthesized data constructed from both coherent and incoherent bases. Two novel applications of the proposed modewise JL method are discussed. (i) Approximate solutions to least squares problems as a computationally efficient way of fitting tensor decompositions: The proposed approach is incorporated as a stage in the fitting procedure, and is tested on relatively low-rank MRI data. Results show improvement in computational complexity at a slight cost in the accuracy of the solution in the Euclidean norm. (ii) Many-Body Perturbation Theory problems involving energy calculations: In large model spaces, the dimension sizes of tensors can grow fast, rendering the direct calculation of perturbative correction terms challenging. The second-order energy correction term as well as the one-body radius correction are formulated and modeled as inner products in such a way that modewise JL can be used to reduce the computational complexity of the calculations. Experiments are performed on data from various nuclei in different model space sizes, and show that in the case of large model spaces, very good compression can be achieved at the price of small errors in the estimated energy values.

Read

In Collections: Electronic Theses & Dissertations

Copyright Status: In Copyright

Material Type: Theses

Authors: Zare, Ali

Thesis Advisors: Iwen, Mark A.

Committee Members: Aviyente, Selin
Wang, Rongrong
Xie, Yuying

Date Published: 2022

Subjects: Data mining

Program of Study: Computational Mathematics, Science and Engineering - Doctor of Philosophy

Degree Level: Doctoral

Language: English

Pages: xi, 113 pages

ISBN: 9798834078241

Permalink: https://doi.org/doi:10.25335/1eby-fn97

Fast and memory-efficient subspace embeddings for tensor data with applications

Full text