Application of machine learning to problems in computational chemistry and biology
With ever-increasing amounts of chemical and biological data and continued advances in machine learning algorithms and computational power, machine learning techniques have come to play an increasingly important role in computational chemistry and biology. We have implemented machine learning models for a range of problems, from structure prediction and force field development to the prediction of drug molecule toxicity.

Because protein chemical shift perturbations (CSPs) induced by ligand binding can be used to refine the structure of a protein-ligand complex, I developed a regression model, called HECSP, that computes ligand-induced CSPs of protons in a protein. It yielded correlation coefficients of 0.897 (¹Hα), 0.971 (¹HN) and 0.945 (side-chain ¹H), with root-mean-square errors (RMSEs) of 0.151 ppm (¹Hα), 0.199 ppm (¹HN) and 0.257 ppm (side-chain ¹H), respectively. Based on HECSP, we can further distinguish native ligand poses from decoys and refine protein-ligand complex structures by comparing predicted CSPs with observed values through a scoring function (NMRScore_P).

In addition to HECSP, I developed a regression model (EZAFF) to determine force field parameters for four- to six-coordinate zinc-containing systems. The reliability of the model was tested on 6 metalloproteins and 6 organometallic compounds with different coordination spheres.

Besides regression, classification is another important class of machine learning problems; predicting the toxicity of small molecules is one example. Using the Tox21 dataset, I trained models to predict toxicity with both chemical descriptors and one-dimensional similarities as molecular features. The models include support vector machines (SVM), random forests (RF) and deep neural networks (DNN). The AUC results showed that including similarities benefits both RF and DNN. The highest AUC on the test set, 0.879, was achieved by RF.
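The abstract reports its results as correlation coefficients, RMSEs and AUC values. It also describes ranking ligand poses by how well predicted CSPs agree with observed ones. The following is a minimal plain-Python sketch of how such quantities are commonly computed. It assumes the reported correlation coefficients are Pearson's r. The function `rank_poses` is hypothetical: it simply orders poses by RMSE against the observed CSPs and is not NMRScore_P, whose functional form the abstract does not give. The toy numbers are invented for illustration and are not data from the thesis.

```python
import math
from typing import List, Mapping, Sequence


def pearson_r(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and observed values."""
    if len(pred) != len(obs) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    return cov / (sd_p * sd_o)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the inputs (e.g. ppm)."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("need two non-empty equal-length sequences")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def rank_poses(predicted: Mapping[str, Sequence[float]],
               observed: Sequence[float]) -> List[str]:
    """HYPOTHETICAL pose ranking (not NMRScore_P): sort poses by the RMSE
    between their predicted CSPs and the observed CSPs, best match first."""
    return sorted(predicted, key=lambda pose: rmse(predicted[pose], observed))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values invented for illustration only; they are not data from the thesis.
    observed = [0.10, 0.35, 0.05, 0.60]
    poses = {
        "pose_b": [0.40, 0.05, 0.30, 0.10],
        "pose_a": [0.12, 0.30, 0.08, 0.55],
    }
    print(rank_poses(poses, observed))  # ['pose_a', 'pose_b']
    print(round(roc_auc([1, 0, 1, 0, 1], [0.9, 0.4, 0.35, 0.2, 0.8]), 3))  # 0.833
```

The pairwise AUC above costs O(n_pos × n_neg) comparisons, which is fine for illustration. For a dataset the size of Tox21, a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.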
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Yu, Zhuoqin
- Thesis Advisors: Merz, Kenneth
- Committee Members: Cukier, Robert; Hunt, Katherine; Levine, Ben
- Date Published: 2019
- Subjects: Chemistry
- Program of Study: Chemistry - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 141 pages
- Permalink: https://doi.org/doi:10.25335/2ht0-mg16