Search results
(1 - 5 of 5)
- Title
- Computational identification and analysis of non-coding RNAs in large-scale biological data
- Creator
- Lei, Jikai
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
Non-protein-coding RNAs (ncRNAs) are RNA molecules that function directly at the level of RNA without being translated into protein. They perform important biological functions in all three domains of life: Eukarya, Bacteria, and Archaea. To understand the working mechanisms and functions of ncRNAs in various species, a fundamental step is to identify both known and novel ncRNAs from large-scale biological data.

Large-scale genomic data includes both genomic sequence data and next-generation sequencing (NGS) data. Both types of data provide great opportunities for identifying ncRNAs. For genomic sequence data, many ncRNA identification tools based on comparative sequence analysis have been developed. These methods work well for ncRNAs that have strong sequence similarity, but they are not well suited to detecting ncRNAs that are remotely homologous. NGS, while it opens a new horizon for annotating and understanding known and novel ncRNAs, also introduces many challenges. First, existing genomic sequence search tools cannot be readily applied to NGS data because NGS technology produces short, fragmentary reads. Second, most NGS data sets are large, and the high resource requirements of existing algorithms make them infeasible on such data. Third, metagenomic sequencing, which uses NGS technology to sequence uncultured, complex microbial communities directly from their natural habitats, further aggravates these difficulties. Thus, the massive amount of genomic sequence data and NGS data calls for efficient algorithms and tools for ncRNA annotation.

In this dissertation, I present three computational methods and tools to efficiently identify ncRNAs from large-scale biological data. Chain-RNA is a tool that combines sequence similarity and structure similarity to locate cross-species conserved RNA elements with low sequence similarity in genomic sequence data. It achieves significantly higher sensitivity in identifying remotely conserved ncRNA elements than sequence-based methods such as BLAST, and is much faster than existing structural alignment tools. miR-PREFeR (miRNA PREdiction From small RNA-Seq data) uses expression patterns of miRNA and follows the criteria for plant microRNA annotation to accurately predict plant miRNAs from one or more small RNA-Seq data samples. It is sensitive, accurate, and fast, and has a low memory footprint. metaCRISPR focuses on identifying Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) from large-scale metagenomic sequencing data. It uses a k-mer hash table to efficiently detect reads that belong to CRISPRs in the raw metagenomic data set. Overlap-graph-based clustering is then conducted on the reduced data set to separate different CRISPRs. A set of graph-based algorithms is used to assemble and recover CRISPRs from the clusters.
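As a rough illustration of the k-mer hash table idea mentioned in this abstract, the sketch below counts every k-mer across a set of reads and keeps the reads that contain at least one frequently repeated k-mer. This is not metaCRISPR's actual algorithm, which the abstract does not specify; the function names, the toy reads, and the `min_count` threshold are all assumptions made for the example.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Build a hash table mapping each k-mer to its count across all reads."""
    table = Counter()
    for read in reads:
        table.update(kmers(read, k))
    return table


def candidate_reads(reads, k, min_count):
    """Keep reads that contain at least one k-mer seen min_count or more times.

    Assumption: repeat-derived k-mers are over-represented, so a simple
    count threshold is enough to flag them in this toy setting.
    """
    table = count_kmers(reads, k)
    return [r for r in reads if any(table[km] >= min_count for km in kmers(r, k))]


# Toy reads (hypothetical): the first three share the 8-mer GTTTCAGA.
reads = [
    "GTTTCAGACCTAAAA",
    "CCGTTTCAGATTGCA",
    "TTAGGTTTCAGACGG",
    "ACGTACGGATCCTAG",
]
hits = candidate_reads(reads, k=8, min_count=3)
```

A plain dictionary keeps the example short. At metagenomic scale the table would need a more memory-efficient representation, and the reduced read set would then feed the clustering and assembly steps the abstract describes.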
- Title
- Mechanisms of adaptation and speciation : an experimental study using artificial life
- Creator
- Anderson, Carlos Jesus
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
-
Detailed experimental studies in evolutionary biology are sometimes difficult, even with model organisms. Theoretical models alleviate some of these difficulties and often provide clean results, but they cannot always capture the complexity of dynamic evolutionary processes. Artificial life systems, which fall somewhere between model organisms and theoretical models, have been used successfully to study evolutionary biology. These systems simulate simple organisms that replicate, acquire random mutations, and reproduce differentially; as a consequence, they evolve naturally (i.e., evolution itself is not simulated). Here I use the software Avida to study several open questions on the genetic mechanisms of adaptation and speciation.

In Chapter 1 (p. 13), I investigated whether beneficial alleles during adaptation came from new mutations or from standing genetic variation (alleles already present in the population). I found that most beneficial alleles came from standing genetic variation, but new mutations were necessary for long-term evolution. I also found that adaptation from standing genetic variation was faster than adaptation from new mutations. Finally, I found that recombination brought together beneficial combinations of alleles from standing genetic variation.

In Chapter 2 (p. 31), I investigated the probability of compensatory adaptation vs. reversion. Compensatory adaptation is the fixation of mutations that ameliorate the effects of deleterious mutations while the original deleterious mutations remain fixed. I found that compensatory adaptation was very common, but the window of opportunity for reversion was increased when the initial fitness of the population was high, the population size was large, and the mutation rate was high. The window of opportunity for reversion was constrained because negative epistatic interactions with compensatory mutations prevented the revertant from being beneficial to the population.

In Chapter 3 (p. 58), I showed experimentally that compensatory adaptation can lead to reproductive isolation (specifically, postzygotic isolation). In addition, I found that the strength of this isolation was independent of the effect size of the original deleterious mutations. Finally, I found that deleterious and compensatory mutations contribute equally to reproductive isolation.

Reproductive isolation between populations often evolves as a byproduct of independent adaptation to new environments, but the selective pressures of these environments may be divergent ('ecological speciation') or uniform ('mutation-order speciation'). In Chapter 4 (p. 75), I directly compared the strength of postzygotic isolation generated by ecological and mutation-order processes, with and without migration. I found that ecological speciation generally formed stronger isolation than mutation-order speciation and that mutation-order speciation was more sensitive to migration than ecological speciation.

Under the Dobzhansky-Muller model of speciation, hybrid inviability or sterility results from the evolution of genetic incompatibilities (DMIs) between species-specific alleles. This model predicts that the number of pairwise DMIs between species should increase quadratically through time, but the few tests of this 'snowball effect' have had conflicting results. In Chapter 5 (p. 101), I show that pairwise DMIs accumulated quadratically, supporting the snowball effect. I found that more complex genetic interactions involved alleles that rescued pairwise incompatibilities, which explains the discrepancy between the expected and observed accumulation of DMIs.
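The quadratic prediction mentioned in this abstract can be motivated with a short counting argument. The notation below is introduced here for illustration and does not come from the dissertation: $K$ is the number of substitutions separating two lineages, and $p$ is an assumed constant probability that any given pair of diverged alleles is incompatible.

```latex
\[
  \mathbb{E}[\text{pairwise DMIs}]
    \;=\; p \binom{K}{2}
    \;=\; \frac{p\,K(K-1)}{2}
    \;\approx\; \frac{p\,K^{2}}{2}
  \quad \text{for large } K .
\]
```

If substitutions accumulate roughly linearly in time ($K \propto t$), the expected number of pairwise incompatibilities grows roughly as $t^{2}$, which is the 'snowball effect'. This simplified form ignores the higher-order interactions that the abstract reports as rescuing some pairwise incompatibilities.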
- Title
- Novel computational approaches to investigate microbial diversity
- Creator
- Zhang, Qingpeng
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
Species diversity is an important measurement of ecological communities. Scientists believe that there is a strong relationship between species diversity and ecosystem processes. However, efforts to investigate microbial diversity using whole-genome shotgun read data are still scarce. Through novel applications of data structures and the development of novel algorithms, we first developed an efficient k-mer counting approach and approaches to enable scalable streaming analysis of large and error-prone short-read shotgun data sets. Building on these efforts, we then developed a statistical framework allowing for scalable diversity analysis of large, complex metagenomes without the need for assembly or reference sequences. This method is evaluated on multiple large metagenomes from different environments, such as seawater, the human microbiome, and soil. Given the rapid growth of sequencing data, this method is promising for analyzing highly diverse samples with relatively low computational requirements. Further, as the method does not depend on reference genomes, it also provides opportunities to tackle the large amounts of unknowns we find in metagenomic datasets.
- Title
- Postmortem microbiome computational methods and applications
- Creator
- Kaszubinski, Sierra Frances
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
Microbial communities have potential evidential utility for forensic applications. However, bioinformatic analysis of high-throughput sequencing data varies widely among laboratories and can potentially affect downstream forensic analyses and data interpretations. To illustrate the importance of standardizing methodology, we compared analyses of postmortem microbiome samples using several bioinformatic pipelines while varying the minimum library size (the minimum number of sequences per sample) and the sample size. Using the same input sequence data, we found that the choice of pipeline significantly affected the resulting microbial communities. Increasing the minimum library size and the sample size increased the number of low-abundance and infrequent taxa detected. Our results show that bioinformatic pipeline and parameter choice significantly affect the resulting microbial communities, which is important for forensic applications. One such forensic application is the potential postmortem reflection of manner of death (MOD) and cause of death (COD). Microbial community metrics have linked the postmortem microbiome with antemortem health status. To further explore this association, we demonstrated that postmortem microbiomes could differentiate beta-dispersion among M/COD, especially for cardiovascular disease and drug-related deaths. Beta-dispersion associated with M/COD has potential forensic utility to aid certifiers of death by providing additional evidence for death determination. Supplemental files, including tables of raw data and additional statistical tests, are available online and are denoted in the text as table 'S'.
- Title
- Reprogramming to the nervous system : a computational and candidate gene approach
- Creator
- Alicea, Bradly John
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
-
The creation of stem-like cells, neuronal cells, and skeletal muscle fibers from a generic somatic precursor phenotype has many potential applications. These uses range from cell therapy to disease modeling. The enabling methodology for these applications is known as direct cellular reprogramming. While the biological underpinnings of cellular reprogramming go back to the work of Gurdon and other developmental biologists, the direct approach is a rather recent development. Therefore, our understanding of the reprogramming process is largely based on isolated findings and interesting results. A true synthesis, particularly from a systems perspective, is lacking. In this dissertation, I will attempt to build toward an intellectual synthesis of direct reprogramming by critically examining four types of phenotypic conversion that result in production of nervous system components: induced pluripotency (iPS), induced neuronal (iN), induced skeletal muscle (iSM), and induced cardiomyocyte (iCM). Since potential applications range from tools for basic science to disease modeling and bionic technologies, a common context is essential.

This intellectual synthesis will be defined through several research endeavors. The first investigation introduces a set of experiments in which multiple fibroblast cell lines are converted to two terminal phenotypes: iN and iSM. The efficiency and infectability of cells subjected to each reprogramming regimen are then compared both statistically and quantitatively. This set of experiments also resulted in the development of novel analytical methods for measuring reprogramming efficiency and infectability. The second investigation features a critical review and statistical analysis of iPS reprogramming, specifically when compared to indirect reprogramming (SCNT-ES) and related stem-like cells. The third investigation is a review and theoretical synthesis that stakes out new directions in our understanding of the direct reprogramming process, including recent computational modeling endeavors and results from the iPS, iN, and iCM experiments. To further unify the outcomes of these studies, additional results related to Chapter 2 and directions for future research will be presented. The additional results will allow for further interpretation and insight into the role of diversity in direct reprogramming. The future directions include both experimental approaches (a technique called mechanism disruption) and computational approaches (preliminary results for an agent-based population-level approximation of direct reprogramming). The insights provided here will hopefully provide a framework for theoretical development and a guide for traditional biologists and systems biologists alike.