Integration of topological data analysis and machine learning for small molecule property predictions
Accurate prediction of small molecule properties is of paramount importance to drug design and discovery. This thesis studies a variety of quantitative properties of small molecules, including solvation free energy, partition coefficient, aqueous solubility, and toxicity endpoints. Its main contribution is an algebraic-topology-based method, called element specific persistent homology (ESPH), for predicting small molecule properties. ESPH describes molecular properties in terms of multiscale and multicomponent topological invariants, in contrast to conventional chemical and physical representations. Based on ESPH and its modified version, element-specific topological descriptors (ESTDs) are constructed. ESTDs are systematic, comprehensive, and scalable with respect to variations in molecular size and composition, and they are readily suitable as input to machine learning methods, giving rise to topological learning algorithms. Because different small molecule properties are inherently correlated, multi-task frameworks are further employed to predict related properties simultaneously. Deep neural networks, along with ensemble methods such as random forest and gradient boosting trees, are used to develop quantitative predictive models. Physics-based molecular descriptors and auxiliary descriptors are also used in addition to ESTDs. As a result, we obtain state-of-the-art results on various benchmark data sets of small molecule properties. We have also developed two online servers for predicting properties of small molecules, TopP-S and TopTox. TopP-S is software for topological learning predictions of partition coefficient and aqueous solubility, and TopTox is software for computing element-specific topological descriptors (ESTDs) for toxicity endpoint predictions. They are available at http://weilab.math.msu.edu/TopP-S/ and http://weilab.math.msu.edu/TopTox/, respectively.
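To make the idea of element-specific topological invariants concrete, the following is a minimal sketch and not the thesis's implementation. It assumes that "element specific" means restricting the atomic point cloud to chosen element subsets, and it computes only 0-dimensional Vietoris-Rips bars, which all start at 0 and end at the edge lengths of a minimum spanning tree. The function names `h0_death_times` and `element_specific_h0` and the toy coordinates are hypothetical and chosen for illustration.

```python
import itertools
import math


def h0_death_times(points):
    """Death times of 0-dimensional Vietoris-Rips bars (all born at 0).

    These equal the edge lengths of a minimum spanning tree, found here
    with Kruskal's algorithm and a union-find structure. The single
    infinite bar (the last surviving component) is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge at this scale, so one bar ends here.
            parent[ri] = rj
            deaths.append(length)
    return deaths


def element_specific_h0(atoms, element_sets):
    """Map each element subset to the H0 death times of its atoms.

    atoms: list of (element_symbol, (x, y, z)) pairs.
    element_sets: iterable of tuples of symbols, e.g. ("C",) or ("C", "O").
    """
    result = {}
    for elements in element_sets:
        coords = [xyz for symbol, xyz in atoms if symbol in elements]
        result[elements] = h0_death_times(coords)
    return result


# Hypothetical toy geometry (not a real molecule); coordinates in angstroms.
toy_atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.5, 0.0, 0.0)),
    ("O", (1.5, 1.2, 0.0)),
    ("H", (-1.0, 0.0, 0.0)),
]
bars = element_specific_h0(toy_atoms, [("C",), ("C", "O"), ("C", "O", "H")])
```

Each entry of `bars` is a list of bar lengths for one element subset. In a learning pipeline, such lists would typically be summarized into fixed-length features (for example counts per length bin) before being passed to a model; that step is also an assumption here and is not taken from the abstract.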
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Wu, Kedi
- Thesis Advisors: Wei, Guo-Wei
- Committee Members: Chiu, Chichia; Tang, Moxun; Tong, Yiying
- Date Published: 2018
- Subjects: Machine learning; Homology theory; Algebraic topology; Algebra, Homological; Molecules; Mathematical models; Structure-activity relationships (Biochemistry); Computer simulation
- Program of Study: Applied Mathematics - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xv, 125 pages
- ISBN: 9780355922820; 0355922827
- Permalink: https://doi.org/doi:10.25335/eray-xm55