ALGEBRAIC TOPOLOGY AND GRAPH THEORY BASED APPROACHES FOR PROTEIN FLEXIBILITY ANALYSIS AND B FACTOR PREDICTION
Protein fluctuation, measured by B factors, has been shown to correlate strongly with protein flexibility and function. Several methods have been developed to predict protein B factors, as well as for related applications such as docking pose ranking, domain separation, entropy calculation, hinge detection, hot spot detection, and stability analysis. While many B factor methods exist, reliable B factor prediction remains an ongoing challenge and there is much room for improvement.

This work introduces a paradigm-shifting geometric graph based model called the multiscale weighted colored graph (MWCG) model. The MWCG model is a new generation of computational algorithms that significantly improves the current landscape of protein structural fluctuation analysis. The MWCG model treats each protein as a colored graph in which colored nodes correspond to atomic element types and edges are weighted by a generalized centrality metric. Each graph contains multiple subgraphs based on the interaction types between graph nodes, and protein rigidity is represented by the generalized centralities of these subgraphs. MWCGs predict the B factors of protein residues and accurately analyze the flexibility of all atoms in a protein simultaneously. The MWCG model presented in this work captures element specific interactions across multiple scales and is a novel visual tool for identifying various protein secondary structures. This work also demonstrates MWCG protein hinge detection using a variety of proteins.

Cross protein prediction of B factors has previously been an unsolved problem. Since many proteins are difficult to crystallize, and for some it is likely impossible, models that can cross predict protein B factors are absolutely necessary.
By integrating machine learning and the advanced graph theory MWCG method, this work provides a robust cross protein B factor prediction solution using a set of known proteins to predict the B factors of a protein previously unseen to the algorithm. The algorithm connects different proteins using global protein features such as the resolution of the X-ray crystallography data. The combination of global and local features results in successful cross protein B factor prediction. To test and validate these results, this work considers several machine learning approaches such as random forest, gradient boosted trees, and deep convolutional neural networks.

Recently, persistent homology has had tremendous success in biomolecular data analysis. It works by examining the topological relationship or connectivity of a group of atoms in a molecule at a variety of scales, then rendering a family of topological representations of the molecule. However, persistent homology is rarely employed for the analysis of atomic properties, such as biomolecular flexibility analysis or B factor prediction. This work introduces atom specific persistent homology (ASPH) to provide a local atomic level representation of a molecule via a global topological tool. This is achieved through the construction of a pair of conjugated sets of atoms and corresponding conjugated simplicial complexes, as well as conjugated topological spaces. The difference between the topological invariants of the pair of conjugated sets is measured by bottleneck and Wasserstein metrics and leads to an atom specific topological representation of individual atomic properties in a molecule.
Atom specific topological features are integrated with various machine learning algorithms, including gradient boosting trees and convolutional neural networks, for protein thermal fluctuation analysis and blind cross protein B factor prediction.

Extensive numerical testing indicates the proposed methods provide novel and powerful graph theory and algebraic topology based tools for analyzing and predicting atom specific, localized protein flexibility information.
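As a rough illustration of the MWCG idea described in the abstract (colored nodes, kernel-weighted edges, one subgraph per element type and scale), the following minimal sketch computes a centrality-style rigidity feature for a few chosen atoms. It is not the thesis implementation: the exponential kernel, the scale values `eta`, the exponent `kappa`, the element set, the toy coordinates, and the function names `subgraph_centrality` and `mwcg_features` are all assumptions made for illustration.

```python
import numpy as np


def subgraph_centrality(coords, elements, centers, element, eta, kappa=2.0):
    """Sum kernel-weighted edges from each center atom to all atoms of one element.

    Assumption: edges use an exponential kernel exp(-(d / eta) ** kappa);
    the kernel form and its parameters are illustrative choices.
    """
    coords = np.asarray(coords, dtype=float)
    elements = np.asarray(elements)
    targets = coords[elements == element]
    # Pairwise distances, shape (number of centers, number of target atoms).
    diff = coords[centers][:, None, :] - targets[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    weights = np.exp(-(dist / eta) ** kappa)
    # Drop self-interactions (an atom paired with itself has distance 0).
    weights[dist == 0.0] = 0.0
    return weights.sum(axis=1)


def mwcg_features(coords, elements, centers,
                  element_types=("C", "N", "O"), scales=(4.0, 12.0)):
    """Stack one centrality column per (element type, scale) subgraph."""
    columns = [
        subgraph_centrality(coords, elements, centers, element, eta)
        for element in element_types
        for eta in scales
    ]
    return np.column_stack(columns)


# Toy data (hypothetical coordinates in angstroms, not from any real protein).
toy_coords = np.array([
    [0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0],
    [1.5, 1.2, 0.0],
    [3.0, 0.0, 0.0],
    [3.0, 1.3, 0.5],
])
toy_elements = np.array(["C", "N", "O", "C", "N"])
toy_centers = [0, 3]  # indices of the atoms whose rigidity is being described

features = mwcg_features(toy_coords, toy_elements, toy_centers)
print(features.shape)
```

Each row of `features` describes one center atom and each column one element-specific subgraph at one scale. In a B factor setting, such columns could be fed to a regression or machine learning model alongside global features; how the thesis combines them is not specified here.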
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Bramer, David
- Thesis Advisors: Wei, Guowei
- Committee Members: Cheng, Yingda; Tong, Yiying; Chui, Chichia C.
- Date Published: 2019
- Subjects: Mathematics
- Program of Study: Mathematics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 149 pages