Algebraic topology and machine learning for biomolecular modeling
Cang, Zixuan
Bioinformatics
Algebraic topology
Machine learning
Computational biology
Applied mathematics
Persistent homology analysis
Thesis Ph. D. Michigan State University. Mathematics 2018
Data are growing at an unprecedented rate in both quantity and size. Topological data analysis provides excellent tools for analyzing high-dimensional and highly complex data. Inspired by the ability of topological data analysis to characterize data robustly and at multiple scales, and motivated by the demand for practical predictive tools in computational biology and biomedical research, this dissertation extends persistent homology toward quantitative and predictive data analysis tools, with an emphasis on biomolecular systems. Although persistent homology is almost parameter-free, careful treatment is still needed to build practically useful prediction models for realistic systems. This dissertation carefully assesses the representational power of persistent homology for biomolecular systems and introduces a collection of characterization tools for both macromolecules and small molecules, focusing on intra- and inter-molecular interactions, chemical complexity, electrostatics, and geometry. The representations are then coupled with deep learning and machine learning methods for several problems in drug design and biophysical research. In real-world applications, data often come with heterogeneous dimensions and components. For example, in addition to their locations, atoms in biomolecules can be labeled with chemical types, partial charges, and atomic radii. While persistent homology is powerful for analyzing the geometry of data, it cannot by itself handle non-geometric information. Based on cohomology, we introduce a method that attaches non-geometric information to the topological invariants in persistent homology analysis. This method is useful not only for biomolecules but also in general settings where the data carry both geometric and non-geometric information. Beyond describing biomolecular systems as a static frame, we are often interested in the dynamics of the systems.
An efficient approach is to assign an oscillator to each atom and study the coupled dynamical system induced by atomic interactions. To this end, we propose a persistent homology-based method for analyzing the trajectories produced by the coupled dynamical system. The methods developed in this dissertation have been applied to several problems, namely, prediction of protein stability changes upon mutation, protein-ligand binding affinity prediction, virtual screening, and protein flexibility analysis. The tools have shown top performance on both commonly used validation benchmarks and community-wide blind prediction challenges in drug design.
Includes bibliographical references (pages 181-198).
Online resource; title from PDF title page (ProQuest, viewed on May 14, 2019).
Wei, Guowei
Liu, Di
Munch, Elizabeth
Tong, Yiying
2018
text
Electronic dissertations
Academic theses
application/pdf
1 online resource (xiv, 198 pages) : illustrations
isbn:9780438207493
isbn:0438207491
umi:10842562
local:Cang_grad.msu_0128D_16205
en
In Copyright
Ph.D.
Doctoral
Mathematics - Doctor of Philosophy
Michigan State University