Statistical methods for single cell gene expression: differential expression, curve estimation and graphical modelling
This dissertation presents a set of statistical methods developed for the analysis of single cell gene expression datasets. Gene expression profiling of single cells has led to unprecedented progress in understanding normal physiology, disease progression and developmental processes. However, despite many improvements in high throughput sequencing, various technical factors, including cell-cycle heterogeneity, library size differences, amplification bias, and low RNA capture per cell, lead to high noise in scRNA-seq experiments. A primary characteristic of these datasets is a high proportion of zeros, which represent transcripts whose expression falls below the level of detection. Statistical methods capable of modelling novel single cell experiments are developed, and new estimation strategies are proposed and validated using simulated and real data experiments.

In Chapter 1, the motivation and underlying philosophies of single cell gene expression analysis are reviewed. Methods for the analysis of dose response experiments and gene co-expression networks are reviewed, and novel statistical hypotheses to be investigated using single cell experiments are discussed.

In Chapter 2, I analyze a unique in vivo dose response hepatic scRNA-seq dataset consisting of 9 dose groups with 3 biological replicates, covering 11 distinct liver cell types and more than 100,000 cells. A hurdle model for multiple group data is proposed, which models the bimodality of single cell gene expression within multiple groups. Based on the model assumptions, I derive a fit-for-purpose Bayesian test for simultaneously testing differences in mean gene expression and zero proportions across multiple dose groups. For comparison, the counterpart likelihood-ratio test for differential expression, which incorporates testing for both components, is also derived. This chapter was originally published in [1].

In Chapter 3, dose response curve estimation for single cell experiments is studied. Current protocols for genomic dose response modelling are only capable of modelling bulk and microarray datasets. A semiparametric regression model is proposed for joint dose response curve estimation across multiple cell types while accounting for confounding covariates. A novel, scalable and efficient optimization algorithm using the MM philosophy is proposed for the estimation of both monotone and non-monotone curves. Two relevant tests of hypothesis are discussed, and the proposed methods are validated using several simulated datasets.

In Chapter 4, co-expression network estimation is studied using graph signal processing. A kernelized signed graph learning approach is developed for learning single cell gene co-expression networks, based on the assumption that gene expression is smooth over activating edges. Performance is assessed using real human and mouse embryonic datasets. This chapter was originally published in [2].
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Saha, Satabdi
- Thesis Advisors: Maiti, Tapabrata
- Committee Members: Sakhanenko, Lyudmila; Levental, Shlomo; Bhattacharya, Sudin
- Date Published: 2022
- Program of Study: Statistics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xvii, 160 pages
- ISBN: 9798841797449
- Permalink: https://doi.org/doi:10.25335/yq4q-8e06