Search results
(1 - 4 of 4)
- Title
- Reprogramming to the nervous system : a computational and candidate gene approach
- Creator
- Alicea, Bradly John
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
- The creation of stem-like cells, neuronal cells, and skeletal muscle fibers from a generic somatic precursor phenotype has many potential applications. These uses range from cell therapy to disease modeling. The enabling methodology for these applications is known as direct cellular reprogramming. While the biological underpinnings of cellular reprogramming go back to the work of Gurdon and other developmental biologists, the direct approach is a rather recent development. Therefore, our understanding of the reprogramming process is largely based on isolated findings and interesting results. A true synthesis, particularly from a systems perspective, is lacking. In this dissertation, I will attempt to build toward an intellectual synthesis of direct reprogramming by critically examining four types of phenotypic conversion that result in production of nervous system components: induced pluripotency (iPS), induced neuronal (iN), induced skeletal muscle (iSM), and induced cardiomyocyte (iCM). Since potential applications range from tools for basic science to disease modeling and bionic technologies, a common context is essential. This intellectual synthesis will be defined through several research endeavors. The first investigation introduces a set of experiments in which multiple fibroblast cell lines are converted to two terminal phenotypes: iN and iSM. The efficiency and infectability of cells subjected to each reprogramming regimen are then compared both statistically and quantitatively. This set of experiments also resulted in the development of novel analytical methods for measuring reprogramming efficiency and infectability. The second investigation features a critical review and statistical analysis of iPS reprogramming, specifically when compared to indirect reprogramming (SCNT-ES) and related stem-like cells.
The third investigation is a review and theoretical synthesis which stakes out new directions in our understanding of the direct reprogramming process, including recent computational modeling endeavors and results from the iPS, iN and induced cardiomyocyte (iCM) experiments. To further unify the outcomes of these studies, additional results related to Chapter 2 and directions for future research will be presented. The additional results will allow for further interpretation and insight into the role of diversity in direct reprogramming. These future directions include both experimental approaches (a technique called mechanism disruption) and computational approaches (preliminary results for an agent-based population-level approximation of direct reprogramming). The insights presented here will hopefully provide a framework for theoretical development and a guide for traditional biologists and systems biologists alike.
- Title
- Novel computational approaches to investigate microbial diversity
- Creator
- Zhang, Qingpeng
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
- Species diversity is an important measurement of ecological communities. Scientists believe that there is a strong relationship between species diversity and ecosystem processes. However, efforts to investigate microbial diversity using whole genome shotgun reads data are still scarce. With novel applications of data structures and the development of novel algorithms, firstly we developed an efficient k-mer counting approach and approaches to enable scalable streaming analysis of large and error-prone short-read shotgun data sets. Then, based on these efforts, we developed a statistical framework allowing for scalable diversity analysis of large, complex metagenomes without the need for assembly or reference sequences. This method is evaluated on multiple large metagenomes from different environments, such as seawater, human microbiome, and soil. Given the velocity in growth of sequencing data, this method is promising for analyzing highly diverse samples with relatively low computational requirements. Further, as the method does not depend on reference genomes, it also provides opportunities to tackle the large amounts of unknowns we find in metagenomic datasets.
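As a rough illustration of what "k-mer counting" means in the abstract above, here is a minimal sketch in Python. It counts k-mers exactly with an ordinary hash table (`collections.Counter`); this is an assumption made for clarity and is not the memory-efficient approach the dissertation develops. The names `count_kmers` and `distinct_kmers` are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across an iterable of reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def distinct_kmers(counts, min_count=1):
    """Number of distinct k-mers seen at least min_count times."""
    return sum(1 for c in counts.values() if c >= min_count)


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTA"]  # toy reads, not real data
    counts = count_kmers(reads, k=3)
    print(counts)
    print(distinct_kmers(counts), distinct_kmers(counts, min_count=2))
```

Raising `min_count` is one simple way to ignore k-mers that appear only once, which in error-prone short reads are often sequencing errors; whether and how the dissertation filters such k-mers is not stated in the abstract.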
- Title
- Studying the effects of sampling on the efficiency and accuracy of k-mer indexes
- Creator
- Almutairy, Meznah
- Date
- 2017
- Collection
- Electronic Theses & Dissertations
- Description
- "Searching for local alignments is a critical step in many bioinformatics applications and pipelines. This search process is often sped up by finding shared exact matches of a minimum length. Depending on the application, the shared exact matches are extended to maximal exact matches, and these are often extended further to local alignments by allowing mismatches and/or gaps. In this dissertation, we focus on searching for all maximal exact matches (MEMs) and all highly similar local alignments (HSLAs) between a query sequence and a database of sequences. We focus on finding MEMs and HSLAs over nucleotide sequences. One of the most common ways to search for all MEMs and HSLAs is to use a k-mer index such as BLAST. A major problem with k-mer indexes is the space required to store the lists of all occurrences of all k-mers in the database. One method for reducing the space needed, and also query time, is sampling where only some k-mer occurrences are stored. We classify sampling strategies used to create k-mer indexes in two ways: how they choose k-mers and how many k-mers they choose. The k-mers can be chosen in two ways: fixed sampling and minimizer sampling. A sampling method might select enough k-mers such that the k-mer index reaches full accuracy. We refer to this sampling as hard sampling. Alternatively, a sampling method might select fewer k-mers to reduce the index size even further but the index does not guarantee full accuracy. We refer to this sampling as soft sampling. In the current literature, no systematic study has been done to compare the different sampling methods and their relative benefits/weakness. It is well known that fixed sampling will produce a smaller index, typically by roughly a factor of two, whereas it is generally assumed that minimizer sampling will produce faster query times since query k-mers can also be sampled. However, no direct comparison of fixed and minimizer sampling has been performed to verify these assumptions.
Also, most previous work uses hard sampling, in which all similar sequences are guaranteed to be found. In contrast, we study soft sampling, which further reduces the k-mer index at a cost of decreasing query accuracy. We systematically compare fixed and minimizer sampling to find all MEMs between large genomes such as the human genome and the mouse genome. We also study soft sampling to find all HSLAs using the NCBI BLAST tool with the human genome and human ESTs. We use BLAST, since it is the most widely used tool to search for HSLAs. We compared the sampling methods with respect to index size, query time, and query accuracy. We reach the following conclusions. First, using larger k-mers reduces query time for both fixed sampling and minimizer sampling at a cost of requiring more space. If we use the same k-mer size for both methods, fixed sampling requires typically half as much space whereas minimizer sampling processes queries slightly faster. If we are allowed to use any k-mer size for each method, then we can choose a k-mer size such that fixed sampling both uses less space and processes queries faster than minimizer sampling. When identifying HSLAs, we find that soft sampling significantly reduces both index size and query time with relatively small losses in query accuracy. The results demonstrate that soft sampling is a simple but effective strategy for performing efficient searches for HSLAs. We also provide a new model for sampling with BLAST that predicts empirical retention rates with reasonable accuracy."--Pages ii-iii.
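The two ways of choosing k-mers named in the abstract above can be sketched as follows. This is a minimal Python sketch under assumed, commonly used definitions: fixed sampling keeps the k-mer at every w-th position, and minimizer sampling keeps the lexicographically smallest k-mer in each window of w consecutive k-mers (leftmost on ties). The dissertation's exact parameterization may differ, and the function names `fixed_sample` and `minimizer_sample` are hypothetical.

```python
def fixed_sample(seq, k, w):
    """Start positions of the k-mers kept by fixed sampling with step w."""
    return list(range(0, len(seq) - k + 1, w))


def minimizer_sample(seq, k, w):
    """Start positions of the k-mers kept by minimizer sampling.

    Each window of w consecutive k-mers contributes its lexicographically
    smallest k-mer; ties are broken by taking the leftmost position.
    """
    n_kmers = len(seq) - k + 1
    picked = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (seq[i:i + k], i))
        picked.add(best)
    return sorted(picked)


if __name__ == "__main__":
    seq = "ACGTACGTAC"  # toy sequence, not real data
    print(fixed_sample(seq, k=3, w=3))      # positions kept by fixed sampling
    print(minimizer_sample(seq, k=3, w=3))  # positions kept by minimizers
```

With these definitions, every window of w consecutive k-mers contains at least one stored k-mer under both schemes. Minimizers depend only on window content, so the same rule can also be applied to the query, which is the property the abstract refers to when it says query k-mers can be sampled as well.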
- Title
- Algebraic topology and machine learning for biomolecular modeling
- Creator
- Cang, Zixuan
- Date
- 2018
- Collection
- Electronic Theses & Dissertations
- Description
- Data is expanding at an unprecedented speed in both quantity and size. Topological data analysis provides excellent tools for analyzing high dimensional and highly complex data. Inspired by the ability of topological data analysis to characterize data robustly and at multiple scales, and motivated by the demand for practical predictive tools in computational biology and biomedical research, this dissertation extends the capability of persistent homology toward quantitative and predictive data analysis tools with an emphasis on biomolecular systems. Although persistent homology is almost parameter free, careful treatment is still needed to obtain practically useful prediction models for realistic systems. This dissertation carefully assesses the representability of persistent homology for biomolecular systems and introduces a collection of characterization tools for both macromolecules and small molecules focusing on intra- and inter-molecular interactions, chemical complexities, electrostatics, and geometry. The representations are then coupled with deep learning and machine learning methods for several problems in drug design and biophysical research. In real-world applications, data often come with heterogeneous dimensions and components. For example, in addition to location, atoms of biomolecules can also be labeled with chemical types, partial charges, and atomic radii. While persistent homology is powerful in analyzing the geometry of data, it lacks the ability to handle non-geometric information. Based on cohomology, we introduce a method that attaches the non-geometric information to the topological invariants in persistent homology analysis. This method is not only useful for handling biomolecules but can also be applied to general situations where the data carries both geometric and non-geometric information. In addition to describing biomolecular systems as a static frame, we are often interested in the dynamics of the systems.
An efficient way to study these dynamics is to assign an oscillator to each atom and study the coupled dynamical system induced by atomic interactions. To this end, we propose a persistent homology based method for the analysis of the resulting trajectories from the coupled dynamical system. The methods developed in this dissertation have been applied to several problems, namely, prediction of protein stability change upon mutations, protein-ligand binding affinity prediction, virtual screening, and protein flexibility analysis. The tools have shown top performance in both commonly used validation benchmarks and community-wide blind prediction challenges in drug design.
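To give a concrete sense of the persistent homology mentioned in the abstract above, here is a minimal Python sketch of its simplest case: dimension-0 persistence (connected components) of a point cloud under a distance-based filtration. It is a generic textbook construction using union-find, not the dissertation's method; the function name `h0_deaths` and the toy coordinates are assumptions for illustration only.

```python
import itertools
import math


def h0_deaths(points):
    """Death times of connected components as the distance threshold grows.

    Every point starts as its own component (born at threshold 0). Pairs of
    points are visited in order of increasing distance; whenever a pair
    joins two different components, one component dies at that distance.
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    # n - 1 components die; the last one persists at every threshold.
    return deaths


if __name__ == "__main__":
    atoms = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]  # toy coordinates
    print(h0_deaths(atoms))
```

Long-lived components (large death values) indicate well-separated clusters of points, while short-lived ones reflect close neighbors. Higher-dimensional features such as loops and cavities require a full simplicial complex and are outside the scope of this sketch.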