Exploring low-rank prior in high-dimensional data
High-dimensional data plays a ubiquitous role in real applications, ranging from biology and computer vision to social media. The large dimensionality poses new challenges for statistical methods due to the "curse of dimensionality". To overcome these challenges, many statistical and machine learning approaches have been developed that impose additional assumptions on the data. One popular assumption is the low-rank prior, which assumes that the high-dimensional data lies in a low-dimensional subspace and therefore approximately exhibits low-rank structure. In this dissertation, we explore various applications of the low-rank prior.

Chapter 2 studies the stability of leading singular subspaces. Many widely used algorithms in numerical analysis, matrix completion, and matrix denoising rely on the low-rank assumption, such as Principal Component Analysis and Singular Value Hard Thresholding, and many of these methods involve the computation of the Singular Value Decomposition (SVD). To study the stability of these algorithms, in Chapter 2 we establish a useful set of formulae for the sin Θ distance between the original and the perturbed singular subspaces. Following this, we further derive a collection of new results on SVD perturbation-related problems.

In Chapter 3, we employ the low-rank prior for manifold denoising problems. Specifically, we generalize the Robust PCA (RPCA) method to the manifold setting and propose an optimization framework that separates the sparse component from the noisy data. It is worth noting that in this chapter we generalize the low-rank prior to a more general form to accommodate data with a more complex structure: instead of assuming the data itself lies in a low-dimensional subspace as in RPCA, we assume the clean data is distributed around a low-dimensional manifold. Therefore, if we consider a local neighborhood, the corresponding sub-matrix will be approximately low rank.

Subsequently, in Chapter 4 we study the stability of invariant subspaces for eigensystems. Specifically, we focus on the case where the eigensystem is ill-conditioned and explore how the condition numbers affect the stability of invariant subspaces.

The material presented in this dissertation encompasses several publications and preprints in the fields of Statistics, Numerical Linear Algebra, and Machine Learning, including Lyu and Wang (2020a); Lyu et al. (2019); Lyu and Wang (2022).
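The sin Θ distance mentioned in the Chapter 2 summary measures the angle between two subspaces: the singular values of UᵀÛ are the cosines of the principal angles between the column spaces of U and Û. A minimal sketch of this standard computation, using NumPy (the function name is illustrative, not from the dissertation):

```python
import numpy as np

def sin_theta_distance(U, U_hat):
    """Sine of the largest principal angle between the column
    spaces of U and U_hat, both (n, k) with orthonormal columns."""
    # Singular values of U^T U_hat are the cosines of the principal angles.
    cosines = np.linalg.svd(U.T @ U_hat, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)
    # The largest angle corresponds to the smallest cosine.
    return np.sqrt(1.0 - np.min(cosines) ** 2)
```

Identical subspaces give a distance of 0, while orthogonal directions give 1; perturbation bounds of the Davis–Kahan/Wedin type control how fast this quantity grows as the matrix is perturbed.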
- In Collections: Electronic Theses & Dissertations
- Copyright Status: Attribution-NoDerivatives 4.0 International
- Material Type: Theses
- Authors: Lyu, He
- Thesis Advisors: Wang, Rongrong R.
- Committee Members: Wang, Rongrong R.; Iwen, Mark M.; Tang, Jiliang J.; Xie, Yuying Y.
- Date Published: 2023
- Subjects: Engineering; Statistics
- Degree Level: Doctoral
- Language: English
- Pages: 126 pages
- ISBN: 9798379702236
- Permalink: https://doi.org/doi:10.25335/c47y-1s04