Machine learning for pose selection
Scoring functions play an important role in modeling protein-related systems. In general, scoring functions are developed to connect three-dimensional structures with their corresponding stabilities. In protein-folding systems, scoring functions can be used to predict the most stable protein structure; in protein-ligand and protein-protein systems, they can be used to find the best ligand structure, predict binding affinities, and identify the correct binding modes. Potential functions make up an essential part of scoring functions. Each potential function usually represents a different interaction that exists in a protein or protein-ligand system. In many traditional scoring functions, the energies calculated from individual potential functions are simply summed to estimate the stability of the whole structure. However, it is possible that those energies cannot be added together directly: some potential functions might describe more important interactions, whereas others represent insignificant ones. Hence, it is useful to construct a model that emphasizes the important interactions and ignores the insignificant ones.
With the development of machine learning (ML), it has become possible to build a model that accounts for the importance of different interactions. In this work, we combined the random forest (RF) algorithm with different potential function sets to solve the pose selection problem in protein-folding and protein-ligand systems. Chapters 3 and 5 show the results of combining the RF algorithm with knowledge-based potential functions and force field potential functions for protein-folding systems. Chapter 4 shows the results of combining the RF method with knowledge-based potential functions for protein-ligand systems.
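The contrast between simply summing per-term energies and weighting them by importance can be illustrated with a small sketch. Everything here is an assumption made for illustration: the term names, the energy values, and the hand-set weights are invented, and the fixed weights merely stand in for what a trained model would learn. This work uses a random forest for that purpose, not a weighted sum.

```python
# Hypothetical per-term energies for three candidate poses (lower = more stable).
# Term names and values are invented for illustration; they are not thesis data.
poses = {
    "native":  {"hbond": -4.0, "vdw": -3.0, "noise_term": 5.0},
    "decoy_a": {"hbond": -1.0, "vdw": -1.0, "noise_term": -3.0},
    "decoy_b": {"hbond": -2.0, "vdw": -0.5, "noise_term": 1.0},
}


def additive_score(terms):
    """Traditional scheme: every potential-function energy counts equally."""
    return sum(terms.values())


def weighted_score(terms, weights):
    """Importance-aware scheme: each energy is scaled by a per-term weight."""
    return sum(weights[name] * value for name, value in terms.items())


def select_pose(candidates, score):
    """Pose selection: return the name of the lowest-scoring candidate."""
    return min(candidates, key=lambda name: score(candidates[name]))


# Hand-set weights (an assumption) that ignore the uninformative term.
weights = {"hbond": 1.0, "vdw": 1.0, "noise_term": 0.0}

best_additive = select_pose(poses, additive_score)
best_weighted = select_pose(poses, lambda t: weighted_score(t, weights))

print("plain sum selects:   ", best_additive)
print("weighted sum selects:", best_weighted)
```

In this toy data the uninformative term dominates the plain sum, so a decoy is selected; once that term is down-weighted, the native pose scores lowest.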
The results in chapters 3, 4, and 5 show that the RF models based on potential functions outperformed all of the traditional scoring functions in accuracy and native ranking tests. To test the importance of the potential functions, scrambled and uniform artificial potential function sets were generated in chapter 3. The test results suggest that the potential function set is important to the model and that the most useful information in the knowledge-based potential functions is the peak positions. In chapter 5, the importance of both the RF algorithm and the potential functions was tested. The results again suggest that the potential functions are important and that the RF model is also necessary to achieve the best performance.
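As a rough illustration of how accuracy and native ranking tests might be computed, the sketch below assumes two definitions that the abstract does not spell out: the native rank is the 1-based position of the native structure when all poses in a decoy set are sorted by score, and accuracy is the fraction of decoy sets in which the native structure ranks first. The scores and the function names `native_rank` and `top1_accuracy` are hypothetical.

```python
def native_rank(scores, native="native"):
    """1-based rank of the native pose when poses are sorted by ascending score."""
    ordered = sorted(scores, key=scores.get)
    return ordered.index(native) + 1


def top1_accuracy(decoy_sets):
    """Fraction of decoy sets whose lowest-scoring pose is the native one."""
    hits = sum(1 for scores in decoy_sets if native_rank(scores) == 1)
    return hits / len(decoy_sets)


# Invented scores for four decoy sets (lower = predicted more stable).
decoy_sets = [
    {"native": -7.0, "decoy_a": -2.0, "decoy_b": -2.5},
    {"native": -1.0, "decoy_a": -3.0, "decoy_b": 0.5},
    {"native": -4.2, "decoy_a": -4.0, "decoy_b": -1.0},
    {"native": -0.5, "decoy_a": -2.0, "decoy_b": -1.5},
]

ranks = [native_rank(s) for s in decoy_sets]
accuracy = top1_accuracy(decoy_sets)

print("native ranks:", ranks)
print("top-1 accuracy:", accuracy)
```

Under these assumed definitions, a scoring model is better when the native ranks are closer to 1 and the accuracy is closer to 1.0.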
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Thesis Advisors: Merz, Kenneth M.
- Committee Members: Hunt, Katharine C.; Levine, Benjamin G.; Cukier, Robert I.
- Date: 2020
- Program of Study: Chemistry - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xiii, 163 pages
- ISBN: 9781658475181; 1658475186