Computational discovery and annotations of cell-type specific long-range gene regulation
Long-range regulation by distal enhancers plays critical roles in cell-type specific transcriptional programs. Delineating the mechanisms underlying long-range enhancer regulation will improve our systems-level understanding of gene regulatory networks and their functional impact on human diseases. Although experimental approaches exist to infer cell-type specific long-range regulation, they suffer from low resolution or high false negative rates. Recent technological advances make it possible to profile regulatory activities comprehensively across multiple layers, bringing us into the multi-omics era. Here, we made use of these rapidly growing data resources and integrated them into machine learning models to uncover the effects of long-range regulation, especially in diseases. In the first study, on androgen-induced gene regulation in the ovary and its impact on female fertility, we identified a total of 190 annotated, significantly differentially expressed genes (DEGs). A change in the H3K27me3 histone modification level was observed in more than half of the DEGs, highlighting the importance of complex long-range multi-enhancer regulation of androgen receptor-regulated genes in ovarian cells. However, computational prediction of genome-wide enhancer-promoter interactions remains challenging because of limited accuracy and a lack of knowledge about the molecular mechanisms. Recent biological investigations have found that protein-protein interactions (PPIs) between transcription factors (TFs) participate in the regulation of chromatin loops. Therefore, we developed a novel predictive model for cell-type specific enhancer-promoter interactions that leverages TF PPI signatures. In a series of rigorous performance comparisons, the new model achieves superior performance over other methods.
In this chromatin loop prediction model, TF binding inferred from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is essential for prioritizing specific TF PPIs that may mediate cell-type specific long-range regulatory interactions and for revealing new mechanistic understanding of enhancer regulation. When processing ChIP-seq data, we found that, on average, 25% of ChIP-seq reads can be aligned to multiple positions in the reference genome. Traditional pipelines discard these reads, which causes a large loss of information. To recover this information, we developed a Bayesian model and designed a Gibbs sampling algorithm to align these reads properly. Evidence from a series of biological comparisons indicated significantly better performance of this model over the competing tool. In summary, our studies took full advantage of the rapidly growing data of the multi-omics era to provide a novel view of cell-type specific long-range regulation by distal enhancers and its effects on diseases.
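The idea of assigning multi-mapped reads by Gibbs sampling can be illustrated with a generic sketch. This is not the model developed in the thesis: it assumes a simple collapsed sampler in which each ambiguous read is reassigned to one of its candidate locations with probability proportional to the reads currently supporting that location plus a pseudocount. The function name `gibbs_assign_multireads`, its parameters, and the toy location labels are all hypothetical.

```python
import random
from collections import Counter


def gibbs_assign_multireads(unique_counts, candidates, n_iter=200,
                            burn_in=50, alpha=1.0, seed=0):
    """Illustrative collapsed Gibbs sampler for multi-mapped reads.

    unique_counts: dict mapping location -> number of uniquely mapped reads.
    candidates:    list of lists; candidates[i] holds the distinct locations
                   that multi-read i can align to.
    alpha:         pseudocount (assumed symmetric Dirichlet-style prior).
    Returns, for each multi-read, a dict of estimated assignment
    probabilities over its candidate locations.
    """
    rng = random.Random(seed)
    counts = Counter(unique_counts)

    # Start from a random assignment of every multi-read.
    z = [rng.choice(cand) for cand in candidates]
    for loc in z:
        counts[loc] += 1

    tallies = [Counter() for _ in candidates]
    for it in range(n_iter):
        for i, cand in enumerate(candidates):
            # Remove read i, then resample its location given all other reads.
            counts[z[i]] -= 1
            weights = [counts[loc] + alpha for loc in cand]
            z[i] = rng.choices(cand, weights=weights, k=1)[0]
            counts[z[i]] += 1
            # Record the sampled location once burn-in is over.
            if it >= burn_in:
                tallies[i][z[i]] += 1

    kept = n_iter - burn_in
    return [{loc: tallies[i][loc] / kept for loc in cand}
            for i, cand in enumerate(candidates)]
```

Calling `gibbs_assign_multireads({"A": 50, "B": 0}, [["A", "B"]] * 3)` returns one probability table per multi-read. Under these assumptions, a location with strong unique-read support attracts most of the ambiguous reads, which is the intuition behind recovering reads that a standard pipeline would discard.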
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Huang, Binbin
- Thesis Advisors: Wang, Jianrong
- Committee Members: Arnosti, David; Yan, Ming; Cui, Yuehua
- Date Published: 2021
- Subjects: Bioinformatics; Biology; Mathematics
- Degree Level: Doctoral
- Language: English
- Pages: 175 pages
- ISBN: 9798762100809
- Permalink: https://doi.org/doi:10.25335/590w-kd66