Search results
-
Title
-
Computational identification and analysis of non-coding RNAs in large-scale biological data
-
Creator
-
Lei, Jikai
-
Date
-
2015
-
Collection
-
Electronic Theses & Dissertations
-
Description
-
Non-protein-coding RNAs (ncRNAs) are RNA molecules that function directly at the level of RNA without being translated into protein. They perform important biological functions in all three domains of life, i.e. Eukarya, Bacteria and Archaea. To understand the working mechanisms and the functions of ncRNAs in various species, a fundamental step is to identify both known and novel ncRNAs from large-scale biological data.
Large-scale genomic data includes both genomic sequence data and next generation sequencing (NGS) data. Both types of genomic data provide great opportunities for identifying ncRNAs. For genomic sequence data, many ncRNA identification tools that use comparative sequence analysis have been developed. These methods work well for ncRNAs that have strong sequence similarity. However, they are not well suited for detecting ncRNAs that are remotely homologous. NGS, while it opens a new horizon for annotating and understanding known and novel ncRNAs, also introduces many challenges. First, existing genomic sequence searching tools cannot be readily applied to NGS data because NGS technology produces short, fragmentary reads. Second, most NGS data sets are large-scale, and existing algorithms are infeasible on them because of their high resource requirements. Third, metagenomic sequencing, which uses NGS technology to sequence uncultured, complex microbial communities directly from their natural habitats, further aggravates these difficulties. Thus, the massive amounts of genomic sequence data and NGS data call for efficient algorithms and tools for ncRNA annotation.
In this dissertation, I present three computational methods and tools to efficiently identify ncRNAs from large-scale biological data. Chain-RNA is a tool that combines sequence similarity and structure similarity to locate cross-species conserved RNA elements with low sequence similarity in genomic sequence data. It achieves significantly higher sensitivity in identifying remotely conserved ncRNA elements than sequence-based methods such as BLAST, and is much faster than existing structural alignment tools. miR-PREFeR (miRNA PREdiction From small RNA-Seq data) uses the expression patterns of miRNAs and follows the criteria for plant microRNA annotation to accurately predict plant miRNAs from one or more small RNA-Seq data samples. It is sensitive, accurate and fast, and has a low memory footprint. metaCRISPR focuses on identifying Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) from large-scale metagenomic sequencing data. It uses a k-mer hash table to efficiently detect reads that belong to CRISPRs in the raw metagenomic data set. Overlap-graph-based clustering is then conducted on the reduced data set to separate different CRISPRs. Finally, a set of graph-based algorithms is used to assemble and recover CRISPRs from the clusters.
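As a rough illustration of the k-mer hash table idea mentioned for metaCRISPR, the Python sketch below keeps only reads that share at least one k-mer with several other reads, which is one simple way repeat-containing reads could be separated from the rest. It is not the dissertation's implementation: the function names, the choice of k, and the `min_reads` threshold are all assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_table(reads, k):
    """Map each k-mer to the number of reads that contain it.

    A Counter is a hash table; each read contributes at most 1 per k-mer.
    """
    table = Counter()
    for read in reads:
        table.update(set(kmers(read, k)))
    return table


def candidate_reads(reads, k, min_reads):
    """Keep reads holding at least one k-mer seen in >= min_reads reads.

    Hypothetical filter: k and min_reads are illustrative parameters,
    not values taken from metaCRISPR.
    """
    table = build_kmer_table(reads, k)
    return [
        read for read in reads
        if any(table[km] >= min_reads for km in kmers(read, k))
    ]
```

Reads that carry a shared repeat unit would pass this filter, while reads whose k-mers are unique to them would be dropped, shrinking the data set before any clustering step.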
-
-
Title
-
Investigating fast folding of RNA pseudoknot VPK with an ultrafast microfluidic mixer
-
Creator
-
Meindl, Andreas John
-
Date
-
2012
-
Collection
-
Electronic Theses & Dissertations
-
Description
-
Despite advances in understanding the theory behind RNA folding, ab initio prediction of the folding process has not yet been achieved. Given only the sequence information, we still cannot determine the precise three-dimensional structure of either RNA or protein. By knowing the kinetics of folding, we hope to learn more about the arrangement of secondary and tertiary structure.
For that reason, we investigated the folding process of the RNA pseudoknot VPK with our microfluidic mixing device. VPK (variant pseudoknot) is a variant of the mouse mammary tumor virus (MMTV) pseudoknot that was specifically designed to prevent the formation of alternative base pairings in the stem regions. Using two differently labeled samples, VPK-2AP and F-VPK, under high-salt and low-salt folding conditions, we analyzed the folding process and determined the different folding rates. Our results agree well with recently published findings, but they also suggest the possible existence of folding transitions not seen in T-jump experiments. The measured folding times range from 0.5 to several milliseconds. The folding process seems to have different pathways, with one of the stems of the pseudoknot forming before, and perhaps even initiating, the folding of the other.