GRAPH-BASED CLUSTERING ALGORITHMS FOR SINGLE-CELL RNA SEQUENCING DATA: METHODS AND THEORY
Single-cell RNA sequencing (scRNAseq) extracts gene expression information from each cell of a tissue, producing data sets of tens of thousands to millions of points (cells). Clustering cells by the similarity of their gene expression helps reveal their functions and hence characterize the cell types in a tissue. This dissertation focuses on the most widely used clustering methodology for scRNAseq data: clustering based on a graph representation of the data points (cells as vertices of a graph). First, we show how existing methods can effectively identify an important group of tumor-growth-related cells in the analysis of head and neck cancer scRNAseq data. The newly discovered marker genes can potentially facilitate new therapy approaches. Second, we introduce a novel clustering method that preserves both the global data geometry and the cluster structure, via multidimensional scaling based on power-weighted path metrics. The new method outperforms prevailing scRNAseq clustering algorithms on a wide range of benchmarking data sets. Third, we study spectral clustering on shared nearest neighbors (SNN) graphs. In contrast to current ad hoc methods for selecting the number of neighbors, we develop a general cross-validation tuning algorithm to achieve effective clustering. Moreover, we provide a comprehensive theoretical analysis of SNN-based spectral clustering in the nonparametric setting. Our theoretical results reveal an optimal range of the number of neighbors for cluster identification and characterize the impact of data density on spectral clustering.
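As a rough illustration of the shared nearest neighbors (SNN) graph mentioned in the abstract, the sketch below builds an SNN affinity matrix on toy 2-D points. It assumes one common definition, in which the weight between two points is the number of neighbors their k-nearest-neighbor lists have in common; the dissertation's exact construction may differ. The function names `knn_sets` and `snn_affinity` and the toy data are hypothetical and are not taken from the thesis.

```python
import math

def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by distance to p and keep the k closest.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors

def snn_affinity(points, k):
    """Symmetric matrix whose (i, j) entry counts the neighbors shared by i and j.

    Assumption: shared-neighbor count as the edge weight (one common SNN variant).
    """
    nbrs = knn_sets(points, k)
    n = len(points)
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = len(nbrs[i] & nbrs[j])
    return W

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
W = snn_affinity(points, k=2)
```

On this toy input the affinity matrix is block-structured: points in the same group share a neighbor, while points in different groups share none. A spectral clustering step would then work with the eigenvectors of a graph Laplacian built from `W`; the choice of `k` is the tuning parameter the abstract refers to.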
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Manousidaki, Andriana
- Thesis Advisors: Weng, Haolei; Xie, Yuying
- Committee Members: Little, Anna; Cui, Yuehua
- Date Published: 2023
- Subjects: Biometry; Statistics
- Program of Study: Statistics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 96 pages
- Permalink: https://doi.org/doi:10.25335/revb-wy89