Machine learning on drug discovery : algorithms and applications
Drug development is an expensive and time-consuming process in which thousands of chemical compounds are tested and many experiments are conducted to find drugs that are safe and effective. Modern drug development aims to speed up the intermediate steps and reduce cost by leveraging machine learning techniques, typically at the drug discovery and preclinical research stages. Better identification of promising candidates can significantly reduce the load on later stages, e.g., clinical trials, saving substantial resources and time. In this dissertation, we explored and proposed novel machine learning algorithms for drug discovery from the aspects of robustness, knowledge transfer, and molecular generation and optimization. First, labels from high-throughput experiments (e.g., biological profiling and chemical screening) often contain unavoidable noise due to technical and biological variation. We proposed a method that leverages both disagreement and agreement among deep neural networks to mitigate the negative effect of noisy labels and better predict drug responses. Second, graph neural networks (GNNs) have become popular for modeling graph-structured data (e.g., molecules). Graph contrastive learning, which maximizes the mutual information between paired graph augmentations, has been shown to be an effective strategy for pretraining GNNs. However, existing graph contrastive learning methods have intrinsic limitations when adopted for molecular tasks. Therefore, we proposed a method that utilizes domain knowledge at both the local and global levels to assist representation learning. The local-level domain knowledge guides the augmentation process so that variation is introduced without changing graph semantics. The global-level knowledge encodes the similarity information between graphs in the entire dataset and helps to learn representations with richer semantics.
Finally, we proposed a search-based approach for multi-objective molecular generation and optimization. We show that, given proper design and sufficient information, search-based methods can achieve performance comparable to or even better than that of deep learning methods while being computationally efficient. Specifically, the proposed method starts with existing molecules and uses a two-stage search strategy to gradually modify them into new ones, based on transformation rules derived from large compound libraries. We demonstrate all the proposed methods with extensive experiments.
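The graph contrastive learning objective mentioned in the abstract can be illustrated with a generic InfoNCE-style loss. The following is a minimal sketch in plain Python and is not the dissertation's method: the function names `cosine` and `info_nce`, the temperature value, and the toy embeddings are assumptions made for illustration. Minimizing a loss of this form maximizes a lower bound on the mutual information between the two augmented views.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss over a batch.

    view_a[i] and view_b[i] are embeddings of two augmentations of graph i
    (the positive pair); view_b[j] for j != i serve as negatives.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n

# Toy embeddings (hypothetical): two graphs, two-dimensional representations.
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(aligned, aligned)    # positives match: low loss
loss_shuffled = info_nce(aligned, shuffled)  # positives mismatched: high loss
```

In this toy setup the loss is small when the two views of each graph embed close together and large when each graph's embedding is closer to another graph's view than to its own.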
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Thesis Advisors: Zhou, Jiayu; Chen, Bin
- Committee Members: Liu, Kevin J.; Tan, Pang-Ning
- Date Published: 2022
- Program of Study: Computer Science - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xv, 88 pages
- ISBN: 9798438720867
- Permalink: https://doi.org/doi:10.25335/2zzc-qd42