Single Cells Are Biological Tokens: Towards Cell Language Models
The rapid advancement of single-cell technologies allows multiple molecular features to be measured simultaneously within individual cells, providing unprecedented multimodal data through single-cell multi-omics and spatial omics technologies. This dissertation addresses the challenges of modeling these multimodal interactions with deep learning. We present two series of studies. The first applies graph neural networks and graph transformers to model relations between multimodal features while incorporating external domain knowledge: we propose the Single-cell Multi-Omics GNN (scMoGNN) and the Single-cell Multi-Omics Transformer (scMoFormer); the latter extends the former and demonstrates the promise of transformers for single-cell multi-omics representation learning. The second applies transformers to spatial omics representation learning: we propose the Spatial Transformer (SpaFormer), a transformer-based masked-autoencoder framework for extracting cellular context information and imputing spatial transcriptomics data. Despite the effectiveness of these models, their knowledge transferability across tasks and datasets remains limited. To overcome this, we introduce a transformer-based foundation model, the Cell Pre-trained Language Model (CellPLM), which encodes inter-cellular relations and multimodal features, demonstrating the significant potential of foundation models for future research in single-cell biology.
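To make the "cells as tokens" idea concrete, the sketch below shows a minimal transformer masked autoencoder over cell-by-gene expression matrices, in the spirit of the masked-autoencoder pretraining described in the abstract. It is not taken from the dissertation: the class names, dimensions, and random masking scheme are illustrative assumptions, and a real spatial model would also inject spatial positional information for each cell.

```python
# Illustrative sketch (assumed, not the dissertation's code): each cell's
# expression vector is one token, so a tissue patch becomes a "sentence" of
# cells that attend to one another through a transformer encoder.
import torch
import torch.nn as nn

class CellMaskedAutoencoder(nn.Module):
    def __init__(self, n_genes: int, d_model: int = 128, n_layers: int = 4,
                 n_heads: int = 4, mask_ratio: float = 0.3):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(n_genes, d_model)        # cell -> token embedding
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.decoder = nn.Linear(d_model, n_genes)      # reconstruct expression

    def forward(self, x):                               # x: (batch, n_cells, n_genes)
        tokens = self.embed(x)
        # Randomly mask a fraction of cells and replace them with a learned token.
        mask = torch.rand(x.shape[:2], device=x.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        hidden = self.encoder(tokens)                   # cells attend to tissue context
        return self.decoder(hidden), mask

# Toy usage: 2 tissue patches, 64 cells each, 500 genes (all sizes arbitrary).
model = CellMaskedAutoencoder(n_genes=500)
x = torch.rand(2, 64, 500)
recon, mask = model(x)
loss = ((recon - x) ** 2)[mask].mean()                  # reconstruct only masked cells
```

Because the reconstruction loss is computed only on masked cells, the model must infer a cell's expression from the context provided by neighboring cells, which is the mechanism that supports context-aware imputation of spatial transcriptomics data.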
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Wen, Hongzhi
- Thesis Advisors: Tang, Jiliang
- Committee Members: Tu, Guan-Hua; Liu, Hui; Xie, Yuying
- Date Published: 2024
- Subjects: Bioinformatics; Artificial intelligence
- Program of Study: Computer Science - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 127 pages
- Permalink: https://doi.org/doi:10.25335/xdxk-jd57