Search results
(1 - 3 of 3)
 Title
 Algebraic topology and machine learning for biomolecular modeling
 Creator
 Cang, Zixuan
 Date
 2018
 Collection
 Electronic Theses & Dissertations
 Description

Data are expanding at an unprecedented speed in both quantity and size. Topological data analysis provides excellent tools for analyzing high-dimensional and highly complex data. Inspired by the ability of topological data analysis to characterize data robustly and at multiple scales, and motivated by the demand for practical predictive tools in computational biology and biomedical research, this dissertation extends the capability of persistent homology toward quantitative and predictive data analysis tools, with an emphasis on biomolecular systems. Although persistent homology is almost parameter-free, careful treatment is still needed to build practically useful prediction models for realistic systems. This dissertation carefully assesses the representability of persistent homology for biomolecular systems and introduces a collection of characterization tools for both macromolecules and small molecules, focusing on intra- and intermolecular interactions, chemical complexities, electrostatics, and geometry. The representations are then coupled with deep learning and machine learning methods for several problems in drug design and biophysical research. In real-world applications, data often come with heterogeneous dimensions and components. For example, in addition to location, atoms of biomolecules can also be labeled with chemical types, partial charges, and atomic radii. While persistent homology is powerful in analyzing the geometry of data, it lacks the ability to handle non-geometric information. Based on cohomology, we introduce a method that attaches the non-geometric information to the topological invariants in persistent homology analysis. This method is useful not only for handling biomolecules but also in general situations where the data carry both geometric and non-geometric information. In addition to describing biomolecular systems as a static frame, we are often interested in the dynamics of the systems. An efficient approach is to assign an oscillator to each atom and study the coupled dynamical system induced by atomic interactions. To this end, we propose a persistent-homology-based method for analyzing the resulting trajectories of the coupled dynamical system. The methods developed in this dissertation have been applied to several problems, namely prediction of protein stability change upon mutation, protein-ligand binding affinity prediction, virtual screening, and protein flexibility analysis. The tools have shown top performance in both commonly used validation benchmarks and community-wide blind prediction challenges in drug design.
 Title
 Studying the effects of sampling on the efficiency and accuracy of k-mer indexes
 Creator
 Almutairy, Meznah
 Date
 2017
 Collection
 Electronic Theses & Dissertations
 Description

"Searching for local alignments is a critical step in many bioinformatics applications and pipelines. This search process is often sped up by finding shared exact matches of a minimum length. Depending on the application, the shared exact matches are extended to maximal exact matches, and these are often extended further to local alignments by allowing mismatches and/or gaps. In this dissertation, we focus on searching for all maximal exact matches (MEMs) and all highly similar local alignments (HSLAs) between a query sequence and a database of sequences. We focus on finding MEMs and HSLAs over nucleotide sequences. One of the most common ways to search for all MEMs and HSLAs is to use a k-mer index such as BLAST. A major problem with k-mer indexes is the space required to store the lists of all occurrences of all k-mers in the database. One method for reducing the space needed, and also query time, is sampling, where only some k-mer occurrences are stored. We classify sampling strategies used to create k-mer indexes in two ways: how they choose k-mers and how many k-mers they choose. The k-mers can be chosen in two ways: fixed sampling and minimizer sampling. A sampling method might select enough k-mers that the k-mer index reaches full accuracy. We refer to this as hard sampling. Alternatively, a sampling method might select fewer k-mers to reduce the index size even further, in which case the index does not guarantee full accuracy. We refer to this as soft sampling. In the current literature, no systematic study has been done to compare the different sampling methods and their relative benefits and weaknesses. It is well known that fixed sampling will produce a smaller index, typically by roughly a factor of two, whereas it is generally assumed that minimizer sampling will produce faster query times since query k-mers can also be sampled. However, no direct comparison of fixed and minimizer sampling has been performed to verify these assumptions. Also, most previous work uses hard sampling, in which all similar sequences are guaranteed to be found. In contrast, we study soft sampling, which further reduces the k-mer index at the cost of decreased query accuracy. We systematically compare fixed and minimizer sampling to find all MEMs between large genomes such as the human genome and the mouse genome. We also study soft sampling to find all HSLAs using the NCBI BLAST tool with the human genome and human ESTs. We use BLAST since it is the most widely used tool to search for HSLAs. We compared the sampling methods with respect to index size, query time, and query accuracy. We reach the following conclusions. First, using larger k-mers reduces query time for both fixed sampling and minimizer sampling at the cost of requiring more space. If we use the same k-mer size for both methods, fixed sampling typically requires half as much space, whereas minimizer sampling processes queries slightly faster. If we are allowed to use any k-mer size for each method, then we can choose a k-mer size such that fixed sampling both uses less space and processes queries faster than minimizer sampling. When identifying HSLAs, we find that soft sampling significantly reduces both index size and query time with relatively small losses in query accuracy. The results demonstrate that soft sampling is a simple but effective strategy for performing efficient searches for HSLAs. We also provide a new model for sampling with BLAST that predicts empirical retention rates with reasonable accuracy." Pages ii-iii.
 Title
 The integration of computational methods and nonlinear multiphoton multimodal microscopy imaging for the analysis of unstained human and animal tissues
 Creator
 Murashova, Gabrielle Alyse
 Date
 2019
 Collection
 Electronic Theses & Dissertations
 Description

Nonlinear multiphoton multimodal microscopy (NMMM) used in biological imaging is a technique that explores the combinatorial use of different multiphoton signals, or modalities, to achieve contrast in stained and unstained biological tissues. NMMM is a nonlinear laser-matter interaction (LMI), which utilizes multiple photons at once (multiphoton processes, MP). The statistical probability of multiple photons arriving at a focal point at the same time depends on the two-photon absorption (TPA) cross-section of the molecule being studied and is incredibly difficult to satisfy using typical incoherent light, say from a light bulb. Therefore, the stimulated emission of coherent photons by pulsed lasers is used for NMMM applications in biomedical imaging and diagnostics. In this dissertation, I hypothesized that, due to the near-IR wavelength of the ytterbium (Yb)-fiber laser (1070 nm), the four MP processes, two-photon excited fluorescence (2PEF), second harmonic generation (SHG), three-photon excited fluorescence (3PEF), and third harmonic generation (THG), generated by focusing this ultrafast laser, will provide contrast to unstained tissues sufficient for augmenting current histological staining methods used in disease diagnostics. Additionally, I hypothesized that these NMMM images (NMMMIs) can benefit from computational methods to accurately separate their overlapping endogenous MP signals, as well as to train a neural network for image classification to detect neoplastic, inflammatory, and healthy regions in the human oral mucosa. Chapter II of this dissertation explores the use of NMMM to study the effects of storage on donated red blood cells (RBCs) using noninvasive 2PEF and THG without breaching the blood storage bag. Contrary to previous reports of a lack of RBC fluorescence, we show that with two-photon (2P) excitation from an 800 nm source and three-photon (3P) excitation from a 1060 nm source, there was sufficient fluorescent signal from hemoglobin as well as other endogenous fluorophores. Chapter III employs NMMM to establish the endogenous MP signals present in healthy excised and unstained mouse and Cynomolgus monkey retinas using 2PEF, 3PEF, SHG, and THG. We show the first epi-direction detected cross-section and depth-resolved images of unstained isolated retinas obtained using NMMM with an ultrafast fiber laser centered at 1070 nm and a 303038 fs pulse. Two spectrally and temporally distinct regions were shown: one from the nerve fiber layer (NFL) to the inner receptor layer (IRL), and one from the retinal pigmented epithelium (RPE) and choroid. Chapter IV focuses on the use of minimal NMMM signals from a 1070 nm Yb-fiber laser to match and augment H&E-like contrast in human oral squamous cell carcinoma (OSCC) biopsies. In addition to performing depth-resolved (DR) imaging directly from the paraffin block and matching H&E-like contrast, we showed how the combination of characteristic inflammatory 2PEF signals undetectable in H&E-stained tissues and SHG signals from stromal collagen can be used to analytically distinguish healthy, mild and severe inflammatory, and neoplastic regions and to determine neoplastic margins in a three-dimensional (3D) manner. Chapter V focuses on the use of computational methods to solve an inverse problem of the overlapping endogenous fluorescent and harmonic signals within mouse retinas. The least-squares fitting algorithm was most effective at accurately assigning photons from the NMMMIs to their source. This work, unlike commercial software, permits using custom signal source reference spectra from endogenous molecules rather than from fluorescent tags and stains. Finally, Chapter VI explores the use of the OSCC images to train a neural network image classifier to achieve the overall goal of classifying the NMMMIs into three categories: healthy, inflammatory, and neoplastic. This work determined that even with a small dataset (< 215 images), the features present in NMMMIs, in combination with tiling and transfer learning, can train an image classifier to classify healthy, inflammatory, and neoplastic OSCC regions with 70% accuracy. My research successfully shows the potential of using NMMM in tandem with computational methods to augment current diagnostic protocols used by the health care system, with the potential to improve patient outcomes as well as decrease pathology departmental costs. These results should facilitate the continued study and development of NMMM so that in the future, NMMM can be used for clinical applications.