Defining the characteristics and roles of functional genomic sequences using computational approaches
"Advances in biotechnology have provided a wealth of sequencing data that is transforming our view of a genome. Eukaryotic genomes, initially thought to contain discrete genes in a sea of non-functional DNA, have been found to exhibit pervasive biochemical activity, particularly transcription. However, whether this biochemical activity is functional (i.e. under evolutionary selection) or the result of noisy activity of cellular machinery represents a fundamental debate of the post-genome era. The research described in this dissertation focuses on two open questions confronting genome biology: 1) Where are the functional elements within a genome? 2) What roles are functional elements performing? For the first question, I focused on transcribed regions in unannotated, intergenic regions of genomes, which represent functionally ambiguous sequences. To determine which and how many intergenic transcribed regions (ITRs) represent functional sequences, machine learning-based function prediction models were established using Arabidopsis thaliana as a model. The prediction models were able to successfully distinguish between benchmark functional (phenotype genes) and non-functional sequences (pseudogenes) using evolutionary, biochemical, and sequence-based structural features. When applied to ITRs, ~40% of ITRs were predicted as functional, suggesting ITRs primarily represent transcriptional noise. I further investigated the evolutionary histories of ITRs in four grass (Poaceae) species. ITRs were found to be primarily species-specific and exhibit recent duplicates, with rare examples of ancient duplicate retention. In addition, ITR duplicates and orthologs were usually not expressed. Function prediction models were also generated in Oryza sativa (rice) that predicted ~60% of rice ITRs as nonfunctional. The results of function prediction models and evaluating evolutionary histories both suggest ITRs are primarily non-functional sequences.
However, I also provide a list of potentially-functional ITRs that should be considered high priority targets for future experimental studies. For the second question, I established a machine learning framework to predict mutant phenotypes, which provide potent evidence for the role of a gene. Phenotype predictions were focused on essential genes (those with lethal mutant phenotypes) in A. thaliana, as these genes represent a historically well-studied group. Combining 57 expression, duplication, evolutionary, and gene network characteristics through machine learning methods accurately distinguished between genes with lethal and non-lethal mutant phenotypes. Additionally, essential gene prediction models could be applied across species; essential gene prediction models generated in A. thaliana could identify essential genes in rice and Saccharomyces cerevisiae. Thus, machine-learning represents a promising avenue of prioritization of candidate genes for large-scale phenotyping efforts. Overall, the research described in this dissertation highlights computational approaches as highly effective in defining functional sequences and classifying the likely roles of genes."--Pages ii-iii.
- In Collections: Electronic Theses & Dissertations
- Copyright Status: Attribution-NonCommercial-ShareAlike 4.0 International
- Material Type: Theses
- Authors: Lloyd, John P.
- Thesis Advisors: Shiu, Shin-Han
- Committee Members: Buell, Carol R.; Last, Robert L.; Mias, George I.
- Date: 2017
- Program of Study: Plant Biology - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xv, 194 pages
- ISBN: 9780355353679; 0355353679
- Permalink: https://doi.org/doi:10.25335/prr8-0f44