IMPROVING THE FIDELITY AND USABILITY OF MOLECULAR MODELS THROUGH HYBRIDIZATION AND MACHINE LEARNING TECHNIQUES By Mehmet C¸ a˘grı Kaymak A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computer Science—Doctor of Philosophy 2023 ABSTRACT Molecular dynamics (MD) is a powerful computational method used to simulate the motion of atoms and molecules. MD simulations compute the evolution of a system of inter- acting particles by applying Newton’s equations of motion, facilitating the study of a range of physical, chemical, and biological phenomena. While quantum mechanical (QM) simula- tions result in accurate predictions of geometries and energies essential for studying various phenomena, the computational complexity has led to the emergence of new approaches such as classical force fields, reactive force fields, and machine learning potentials (MLPs), each offering unique trade-offs. Classical force fields offer longer simulation times due to assump- tions such as static bonds and charges, which prohibit the study of reactive systems. Reactive force fields, such as ReaxFF, bridge the gap between QM methods and classical force fields by allowing dynamic bonds and charges. The improved flexibility results in a higher compu- tational load and a more complex functional form that is hand-crafted by domain experts. MLPs are a more recent approach that utilize large datasets to eliminate complex functional forms, while also leveraging the vast ecosystem of machine learning frameworks for enhanced computational efficiency and ease of development. As the number of methodologies increases, the landscape of MD methods becomes more complex, with each method bringing unique attributes and challenges in simulating molecular systems. We introduce innovative hybridization techniques aiming to leverage the strengths of multiple modeling approaches, improving predictive capabilities and computational effi- ciency. We introduce a hybrid modeling approach called ReaxFF/AMBER that combines the reactivity and polarization capabilities of ReaxFF with the efficiency of classical force fields, facilitating the simulation of larger reactive regions. Although ReaxFF can offer high fidelity when trained carefully, the existing parameterization tools lack the efficiency and speed essential for creating new ReaxFF parameter sets for different applications of interest. We have proposed a novel parameter optimization approach, JAX-ReaxFF, leveraging the capabilities of a scalable machine learning framework to drastically reduce the training times for ReaxFF, thus enhancing the development of new force fields for various applications. We have also modified JAX-ReaxFF to run end-to-end differentiable simulations on different architectures such as CPUs, GPUs, or TPUs with the help of JAX. JAX is a library known for high-performance numerical computing and it provides features such as automatic dif- ferentiation and optimization of Python functions. This approach also allows for improved integration with existing machine learning software infrastructure, offering enhanced flexi- bility and performance portability. Lastly, we propose and compare various uncertainty quantification (UQ) methods suitable for MLPs. These methods are essential for active learning-based data generation approaches, which are crucial for training data-intensive machine learning models. While our primary focus is on MLPs, the datasets created using active learning methods could also enhance the parameterization efforts for classical and reactive force fields. Copyright by MEHMET C¸ A ˘GRI KAYMAK 2023 This thesis is dedicated to my family. v ACKNOWLEDGEMENTS Reaching this milestone would not have been possible without the support of many. I want to take this moment to express my gratitude to my PhD advisor, Hasan Metin Aktulga. His valuable feedback and guidance have been instrumental throughout my PhD journey. His constant support and understanding, especially during challenging times, were greatly appreciated. Besides my advisor, I want to thank my committee members for their valuable feedback and time. I am also grateful to my fellow graduate students for the collaborative exchange of ideas and camaraderie, which have significantly enhanced and brightened our shared path. Finally, my profound and heartfelt gratitude is due to my parents, Ayhan and Feride Kaymak, for their endless encouragement. I must also thank my brother K¨ur¸sat for being there whenever I need him and adding joy to this journey, even from afar. The published works presented in this work are supported in part by a NIH grant (Award No. GM130641), a NSF CDS&E grant (Award No. 1807622) and a NSF OAC grant (1835144). Computational resources provided by the Institute for Cyber-Enabled Research at Michigan State University were utilized for the computations. vi TABLE OF CONTENTS CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Background and Related Work . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Contributions of This Thesis . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 2 REAXFF/AMBER—A FRAMEWORK FOR HYBRID REACTIVE/NONREACTIVE FORCE FIELD MOLECULAR DYNAMICS SIMULATIONS . . . . . . . . . . . . . . . . . . . . . 2.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 The ReaxFF/AMBER Integration . . . . . . . . . . . . . . . . . . . . . . 2.3 Claisen Rearrangement Simulations with ReaxFF/Amber . . . . . . . . . 2.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 3 JAX-REAXFF: A GRADIENT-BASED FRAMEWORK FOR FAST OPTIMIZATION OF REACTIVE FORCE FIELDS . . . . 3.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Force Field Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 4 END-TO-END DIFFERENTIABLE REACTIVE MOLECULAR DYNAMICS SIMULATIONS USING JAX . . . . . . . . . . . . . 4.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Design and Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 5 UNCERTAINTY QUANTIFICATION METHODS FOR MACHINE LEARNING POTENTIALS . . . . . . . . . . . . . . . 5.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 6 CONCLUSION AND FUTURE WORK . . . . . . . . . . . . . . . BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 10 13 13 14 23 29 31 32 35 45 52 57 59 59 62 70 75 76 77 82 86 88 90 APPENDIX A SUPPLEMENTAL DATA FOR REAXFF/AMBER—A FRAMEWORK FOR HYBRID REACTIVE/NONREACTIVE FORCE FIELD MOLECULAR DYNAMICS SIMULATIONS . . 100 APPENDIX B SUPPLEMENTAL DATA FOR JAX-REAXFF: A GRADIENT-BASED FRAMEWORK FOR FAST OPTIMIZATION OF REACTIVE FORCE FIELDS . . . . . . . 101 vii APPENDIX C SUPPLEMENTAL DATA FOR UNCERTAINTY QUANTIFICATION METHODS FOR MACHINE LEARNING POTENTIALS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 viii CHAPTER 1 INTRODUCTION 1.1 Background and Related Work N-body simulation methods focus on the simulation and analysis of the dynamic inter- actions between multiple discrete entities in a complex physical or abstract system. These methods find applications in a wide range of scientific disciplines, including astrophysics, molecular dynamics, and fluid dynamics, among others. N-body computational techniques involve the numerical approximation of the forces and motions governing these interacting bodies, allowing researchers to gain insights into the behavior and evolution of complex systems, from celestial bodies in space to molecules in a chemical reaction. Simulation tech- niques address the challenge of solving equations for multiple interacting bodies by breaking down time into discrete intervals and integrating the behavior of individual particles over these discrete time steps. In this approach, particles are simplistically considered as point masses and interact with each other according to predefined potential functions. This field plays a pivotal role in understanding and predicting the behavior of systems with multiple interacting components, offering valuable insights into natural phenomena and facilitating advancements in various scientific and engineering domains. Atomistic simulation methods are a suite of computational techniques where individual atoms and molecules are the main entities to be simulated and analyzed. By numerically solving the equations governing atomic interactions, atomistic simulations provide a means to explore various phenomena, such as chemical reactions, phase transitions, and mechanical properties, with high precision and detail. Various atomistic simulation methods are devel- oped, including quantum mechanical methods, classical force fields, reactive force fields, and machine learning potentials, each tailored to address specific research questions and model- ing requirements. Fig. 1.1 compares different simulations methods in terms of computational complexity and accuracy. Coarse-grained simulations are computational models used in molecular and biological 1 studies where groups of atoms are simplified into single entities. This approach allows the exploration of larger systems and longer timescales, making it useful for studying macro- scopic properties and behaviors such as protein folding and polymer dynamics. However, due to its simplified nature, some atomic-level details are omitted, making it essential to com- bine coarse-grained simulations with other detailed modeling techniques for a comprehensive understanding. Figure 1.1 Simulation methods. 1.1.1 Quantum Mechanical Methods Quantum Mechanical (QM) simulations enable the accurate prediction of geometries and energies by solving the Schr¨odinger’s equation. QM methods, often referred to as ab-initio methods, start with first-principles quantum mechanics and minimize approximations when deriving solutions. Consequently, due to their modeling fidelity, ab-initio techniques gener- ally yield outcomes that are more precise and dependable compared to atomistic methods that rely on various approximations as it is discussed in later sections. 2 Coarse GrainingClassical Force FieldsReactive Force FieldsTimescale and System SizeAccuracy and Applicability (reactions)QM/MMQuantum MechanicalMachine Learning PotentialsWith high quality training data One of the earliest ab-initio approach is the Hartree-Fock/self-consistent field (HF/SCF) method. At its core, the Hartree-Fock method dissects the complex multi-electron wave func- tion of a molecular system into a collection of simpler one-electron wave functions known as molecular orbitals. In subsequent years, new advanced quantum mechanical methods have been developed. These include second-order Møller-Plesset perturbation theory (MP2), cou- pled cluster with single, double, and triple perturbative excitations (CCSD(T)), second-order perturbation theory within the complete active space framework (CASPT2), and approaches based on density functional theory (DFT) [24]. 1.1.2 Classical Force Fields It is important to note that ab-initio methods, while highly accurate, can be compu- tationally intensive and are primarily suited for small to moderately sized systems with shorter simulation times due to their computational cost. To address this limitation and extend simulations to larger and more complex systems, researchers often transition to force field-based approaches. These approaches treat the atomic nucleus and its electrons as a unit particle and the interactions between atoms are governed by a force field (FF). This force field is a set of parameterized mathematical equations that capture atomic interactions such as bonds, valence angles, torsion, van der Waals, and Coulomb interactions. These simplifications greatly reduce the overall computational cost, but an important measure of the predictive power of force fields is their fidelity, i.e., how well they can reproduce the re- sults of QM calculations and experimental studies. Development of high fidelity force fields relies heavily on optimization of various force field parameters based on carefully selected quantum-chemical and experimental reference data. With the help of these approximations and careful training, molecular dynamics (MD) methods have proven to be successful in atomistic simulations with billions of degrees of freedom [48]. MD methods are based on Newton’s equations of motion, which describe the motion of a particle in terms of its mass, position, and the forces acting upon it. By repeatedly integrating these equations over successive time steps, MD simulations are able to compute 3 the evolution of a system of interacting particles. While one could argue that MD simula- tions might significantly differ from a real system’s actual trajectory due to simplifications, approximations, and numerical errors, MD methods are grounded in the fundamental prin- ciple of classical statistical mechanics: the ergodic hypothesis. This hypothesis states that all states with a specified energy, volume, and number of particles are equally likely to be explored over time by the system [102]. As a result, although one cannot expect MD sim- ulations to precisely replicate real trajectories, they allow to create ensembles of snapshots representing physically achievable system configurations. Hence, this allows researchers to apply concepts from statistical thermodynamics to investigate time-averaged properties such as density, temperature, and free energies. Classical MD models as implemented in highly popular MD software such as AMBER [15], LAMMPS [103], GROMACS [40] and NAMD [72] are based on the assumption of static chem- ical bonds and, in general, static charges. These simplifications enable longer simulations that have been proven to be essential for applications such as drug discovery, protein fold- ing and function [64]. In addition to force field development, there have been efforts to create custom-designed application-specific integrated circuits (ASICs) to extend simulation timescales beyond what was previously attainable [93, 92]. 1.1.3 Reactive Force Fields Classical force fields are not applicable to study phenomena where chemical reactions and charge polarization effects play a significant role. To address this gap, reactive force fields (e.g., ReaxFF [108], REBO [13], Tersoff [101]) have been developed. These bond order po- tentials allow bonds to form and break throughout the simulation and they can dynamically assign partial charges to atoms using suitable charge models such as the electronegativity equalization method (EEM) [65]. The functional forms for reactive potentials are signifi- cantly more complex than their classical counterparts due to the presence of dynamic bonds and charges. For instance, ReaxFF has a formulation that contains more than 100 parame- ters for a simulation with 3 elements, and is about two orders of magnitude more expensive 4 than a typical Lennard-Jones potential. Consequently, training reactive force fields is an even more difficult task due to the need to capture complex phenomena such as charge distributions and reactions, and due to the large number of parameters involved. The ReaxFF potential is an excellent candidate for modernization and hybridization because it supports dynamic charges and bond breaking, unlike classical force fields. With its complex functional form, ReaxFF can express a wide variety of biological and chemical phenomena when properly parameterized [88]. Since it is an important building block for the further chapters, we provide further details about the ReaxFF potential in the following paragraphs. 1.1.3.1 ReaxFF Overview ReaxFF divides the total potential energy into various parts, including bonded and non- bonded interactions as shown in Eq. (1.1). The model takes atom coordinates and required force field parameters for the set of elements present in the system as input, and calculates all terms constituting the potential energy together with the corresponding forces. The derivative of each potential energy term with respect to atom positions gives forces that are fundamental to the MD simulation. There are a number of ReaxFF implementations with different features and architectural support such as the original Fortran Reax code [108], PuReMD [2, 3, 53], GULP [30] and LAMMPS [73]. Esystem = Ebond + Elone-pair + Eover + Eunder + Eval + Epen + Etors + Econj (1.1) + EHbond + EvdWaals + ECoulomb An important aspect of ReaxFF that separates it from classical MD models are the notions of bond orders and dynamic partial charges (not shown in Eq. (1.1)). The bond order concept is used to determine the bond strength between pairs of atoms given their element types and distances. These pairwise bond orders are then subjected to corrections that take into account the information about all atoms surrounding each atom to obtain the predicted bonding information in a system. The corrected bond order constitutes the main input for 5 common potential energy terms such as bond energy (Ebond), valence angle energy (Eval), and torsion angle energy (Etors). However, in a dynamic bonding model, since atoms may not attain their optimal coordinations, additional terms such as lone pair (Elone-pair), over/under- coordination (Eover, Eunder), three-body penalty (Epen), and four-body conjugation (Econj) energies are needed. For systems with hydrogen bonds, a special energy term (EHbond) is used. The van der Waals energy (EvdWaals), which is based on the Morse potential, and the electrostatic energy term (ECoulomb), which uses shielded and range-limited interactions based on dynamic charges calculated from charge models such as EEM [65], constitute the non-bonded terms in ReaxFF. Typically, bonded interactions are truncated at 5 ˚A, hydrogen bonds are effective up to 7.5 ˚A and non-bonded interactions are range limited to 10-12 ˚A depending on the system. Fig. 1.2 summarizes the calculations performed within a ReaxFF step. Figure 1.2 Flow graph of calculations performed in the Reax force field. 6 Atom PositionsDetermine ChargesDetermine bond orders (BO)System specific termsEvdWaalsEBondEOverECoulombEAngleETorsionDetermine angles and torsionsCorrect (BO) for local overcoordinationEsystemNon-bondingBondingNon-bonded Interaction List2-body Interaction List3-body and 4-body Interaction Lists 1.1.4 Machine Learning Potentials The success of ML techniques in fields such as computer vision and natural language processing has triggered its wide-spread use also in scientific computing. More recently, we started witnessing an increase in the number of scientific applications adopting ML libraries such as Tensorflow [1], PyTorch [70], and JAX [11]. Such tools have enabled fast proto- typing of new ideas as well as hardware portability without sacrificing much computational efficiency. Specifically, in molecular modeling and simulation, a new class of force fields called machine learning potentials (MLP) such as SNAP [104], the Behler/Parrinello potential [9], SchNet [86], OrbNet [76], and NequIP [8] has emerged. While the increased number of parameters and improved learning capacity of MLPs bring the accuracy of MD simulations closer to the QM level, the training data quality becomes critical for the quality of the MLPs. Although the classical and reactive MD models also require high quality training data and careful parameterization, due to the embedded physical bias, they typically require less data than their ML based counterparts. Typically, MLPs encode the local neighborhood information of each atom into a latent vector, compressing and summarizing the local information. Subsequently, the latent vector is mapped to observables, typically energies or charges. The sum of the individual atom energies yields the total molecular energy, as shown in Eq. (1.2) where Ei, atomic represents the local atomic energy for atom i, Epot is the total energy of the molecule with N atoms. The forces ( ⃗Fi) needed to drive the simulation are determined by taking the derivative of the total energy with respect to the atomic positions (thereby guaranteeing energy conservation). Epot = (cid:88) i∈N Ei, atomic ⃗Fi = −∇iEpot (1.2) (1.3) While it is possible to directly predict the total energy, representing it as a sum of individual atomic energies simplifies the training process and also makes computations easily 7 parallelizable, as calculations for each atom are independent of one another. MLPs differ in how they encode local information. Fingerprint-based MLPs, like the Behler/Parrinello potential, rely on handcrafted descriptors, while more recent approaches utilize learnable feature extractors. Message Passing Neural Networks (MPNNs) offer a versatile framework grounded in graph-based deep learning. MPNNs represent molecular structures as graphs, with atoms as nodes and bonds as edges, utilizing message-passing al- gorithms to update node representations by aggregating information from neighboring nodes. This flexibility makes MPNNs well-suited for a range of molecular modeling tasks, includ- ing property prediction and generative modeling. Equivariant Neural Networks, meanwhile, are designed to respect the symmetries inherent in input data, making them particularly valuable for systems with spatial or rotational symmetries. They ensure that network pre- dictions remain invariant or equivariant under specific transformations, such as rotations or translations. A study by Batzner et al. [8] demonstrates that, when carefully trained, they exhibit higher data efficiency, as they inherently account for the data’s symmetries without requiring additional data or training. The ANI potential [96], an early MLP, played a pivotal role in popularizing machine learning for molecular modeling and simulation. It relies on handcrafted local descriptors, building upon the Behler/Parrinello potential. Due to its widespread adoption and availabil- ity as an open-source implementation (TorchANI [33]), ANI serves as an ideal candidate to demonstrate new approaches. In Chapter 5, we showcase and compare various uncertainty quantification methods for MLPs using ANI as an illustrative example. Further details about the ANI architecture are provided in the subsequent section. 1.1.4.1 ANI Overview ANI, unlike traditional force fields, employs artificial neural networks to define potential energy as an explicit function of atomic coordinates, without predefined bond concepts. ANI computes an atomic environmental vector (AEV) for each atom, which is invariant to translation and rotation. Using an atom type specific feed forward neural network, these 8 vectors are then mapped to a scalar value representing the atomic contribution to the total energy. The sum of these energies gives the total molecular energy as shown in Eq. (1.2). AEVs contain radial and angular sections to vectorize the local environment. The local radial environment of atom i is encoded by utilizing Eq. (1.4) where η is used to control the width of the Gaussian distribution and Rs is used to shift the peak. Rij is the distance between atoms i and j. To better capture the radial local environment, multiple η and Rs values are used. In Eq. (1.4), m is used as the index over a set of η and Rs values. Finally, individual GR i,m values are concatenated to create the radial part of the AEV. GR i,m = all atoms (cid:88) j̸=i e−η(Rij −Rs)2 fC (Rij) (1.4) To limit the size of the local region, neighbors beyond a selected cutoff (RC) are ignored. To make the transition around the cutoff region continuous, a piece-wise cutoff function is utilized (fC). fC (Rij) =    0.5 × cos (cid:17) (cid:16) πRij RC 0.0 + 0.5 for Rij ≤ RC for Rij > RC (1.5) For the angular features of atom i, Eq. (1.6) is used where θijk is the angle between atoms i, j and k, centered on atom i. GAmod i,m is basically in the form of a sum over all neighboring atom pairs (j and k). While η and Rs work similar to (1.4), θs allows probing different regions of the angular environment by shifting the angle term. ζ controls the width of the peaks in the angular environment. fC multipliers are used to make the function differentiable around the cutoff region. m is used as the index over a set of η, Rs, ζ and θs values. Finally, individual GAmod i,m values are concatenated to create the angular part of the AEV. (1 + cos (θijk − θs))ζ GAmod i,m =21−ζ all (cid:88) j,k̸=i (cid:34) × exp −η (cid:18) Rij + Rik 2 (cid:19)2(cid:35) − Rs fC (Rij) fC (Rik) (1.6) 9 The AEV of a given atom distinguishes between atom types by incorporating unique radial components for each neighboring atom type and unique angular components for each neighboring atom type pair, ensuring accurate representation of the atomic environment. 1.2 Contributions of This Thesis In this thesis, we introduce new techniques that aim to improve the fidelity and usability of molecular models through hybridization and machine learning techniques. As introduced earlier, there are various modeling approaches with their unique trade- offs. While simulating the whole system with the desired method is straightforward, some applications require different predictive capabilities and considerations such as polarizability and/or reactivity. Combining multiple modeling approaches in a hybrid framework could help researchers leverage the capabilities of multiple approaches. Mixed quantum mechan- ics/molecular mechanic (QM/MM) [25] methods help combining classical force fields such as AMBER with QM methods when the apriori knowledge of the reactions in a system of interest and spatially localized reactivity is available. Although this opens up new possibili- ties, QM approaches are still computationally infeasible when the reactive region is large or when long simulated time scales are necessary for the chemistry of interest. In Chapter 2, we propose a new solution called ReaxFF/AMBER that combines a reactive force field with a classical force field. As ReaxFF supports reactivity and polarization, it enables simulating larger reactive regions when a high fidelity force field is available for the system of interest. The same approach could be easily extended for the newly emerged MLPs. While the AMBER/ReaxFF approach is proven to be successful when ReaxFF model is trained properly, the existing optimization frameworks lack the necessary speed that is essential to develop new force fields in a timely manner for new applications. The previous approaches rely on the existing ReaxFF implementations that are designed to run large sim- ulations for long time windows. We propose a new parameter optimization approach called JAX-ReaxFF that utilizes an efficient and scalable machine learning framework called JAX. By leveraging auto-differentiation, auto-vectorization and just-in-time (JIT) compilation ca- 10 pabilities of JAX, we are able to reduce the training time for ReaxFF drastically. This is especially important when one wants to utilize large datasets that are crafted for more data- hungry MLPs. JAX-ReaxFF could easily leverage these datasets. Similar approach could also be used for classical force fields such as AMBER as demonsrated in [114]. Although JAX-ReaxFF is crafted specifically for parameter optimization tasks, a Python based implementation offers a portable, performant, and easy to maintain software. Also, Python is the main language for ML based MD tools, and this leads more ways to hybridize force fields similar to the ReaxFF/AMBER approach. As the cross-language communication between different libraries is troublesome, having different force fields and MD implemen- tations in Python offers greater flexibility. With the auto-differentiation and performance portability capabilities of JAX, we present a new end-to-end differentiable and portable ReaxFF implementation that could be used for MD simulations, free energy calculations or parameter optimization. The new implementation is a part of a bigger JAX based MD library called JAX-MD, which already supports different ML and force field based simu- lations. This leads to better use of existing software infrastructure. As it is discussed in Chapter 4, the performance on modern GPUs is still on par with the existing software for ReaxFF. This work showcases the practicality of the ML infrastructure for complex force fields such as ReaxFF. While we extend the capabilities of the chosen classical and reactive force fields by utiliz- ing hybridization and the existing ML software infrastructure, the development of the MLPs lead to the need for bigger datasets and methods to create them. Moreover, unlike regu- lar force fields with hand-crafted functional forms, MLPs typically utilize general building blocks for prediction which makes their performance harder to quantify. To remedy some of these issues, we propose and compare different uncertainty quantification (UQ) approaches suitable for MLPs in Chapter 5. As it gets harder to generate rich datasets for molecular modeling, active learning based approaches have gained popularity [74, 91, 97]. Since having a reliable, fast and preferably differentiable UQ metric is essential for active learning based 11 data generation, we evaluate different UQ metrics on various datasets using the ANI model as an example. 12 CHAPTER 2 REAXFF/AMBER—A FRAMEWORK FOR HYBRID REACTIVE/NONREACTIVE FORCE FIELD MOLECULAR DYNAMICS SIMULATIONS The following chapter presents previously published work on adding reactive capabili- ties by utilizing ReaxFF in AMBER [77]. This work is reproduced with the permission of American Chemical Society. As stated in Chapter 1, the AMBER MD software package is a widely used tool to study biological systems such as proteins. However, since it is a classical force field, it only supports static bonds and charges. This limits its abilities to simulate chemical reactions due to the rigid connectivity requirement. QM methods can mitigate these limitations such as formation or breaking of bonds and charge fluctuations due to geometry changes. However, QM models are usually applied to fragments of the regions involved in, say an enzymatic reaction, limiting the ability to explore the influence of environmental effects. Although QM based methods can be very accurate in predicting chemical reaction events, they remain limited to small systems simulated over short time scales. Hybrid quantum mechanics/molecular mechanics (QM/MM) methods were developed to combine the best features of empirical force fields (EFF) and QM models to tackle a range of chemical problems [90]. Reactive force fields represent the intermediate approach explored in this section. A hybrid ReaxFF/AMBER molecular dynamics (MD) tool is presented, which introduces ReaxFF capabilities to capture bond breaking and formation within the AMBER MD software package. The following sections motivate and discuss the results of this work. 2.1 Background and Motivation In QM/MM methods, the total system is divided into two separate QM and MM zones. The QM zone is the chemically active region which is treated by a range of QM methods and the rest of the system is the MM zone which is treated using an EFF. Since the introduction of the QM/MM method, various approaches have been implemented, and this method has 13 found extensive applications to deal with complex systems in realistic environments because of the significant reduction in the computational cost compared to pure QM methods [90, 25, 36, 62, 61]. Different QM/MM simulation tools have long been supported in the AMBER MD package [71, 16]. Some QM methods including semiempirical neglect of diatomic overlap (NDDO)-type and density functional tight binding (DFTB) are built-in (i.e., internal) and are supported natively within AMBER [112, 87]. More advanced QM methods are supported via a file-based integration interface to external QM software packages [35, 43]. Despite highly innovative techniques, algorithmic improvements and fast implementa- tions, the computational cost of the QM region still stands as the rate limiting factor in QM/MM simulations. ReaxFF is based on the bond length/bond order concept that bridges QM and MM methods in terms of functionality and computational costs. Importantly, ReaxFF provides a reasonable approximation of reactive phenomena at computational costs comparable to MM methods. The hybrid ReaxFF/AMBER molecular dynamics (MD) tool enables us to study local reactive events in large systems at a fraction of the computational costs of QM/MM models. Another major challenge with atomistic simulations is that chem- ical reactions through transition states can take place on a time scale that cannot be reached by regular molecular dynamics simulations. Therefore, approaches based on enhanced sam- pling methods are used to locate the transition state for a chemical reaction using QM/MM methods. Umbrella sampling [49] is one of the most well-known enhanced sampling methods and can readily be used in the new ReaxFF/AMBER tool. It is used herein to map out the reaction profile of the Claisen rearrangement as a validation study. 2.2 The ReaxFF/AMBER Integration Similar to QM/MM methods, atoms are split into three categories in the ReaxFF/AMBER method: (i) ReaxFF atoms, which include all atoms in the chemically reactive region and are handled by a ReaxFF implementation, (ii) the ReaxFF/MM transition atoms, which in- clude all atoms within a certain cutoff of the ReaxFF region and is handled by ReaxFF and AMBER collaboratively, and (iii) the MM atoms, which include all remaining atoms and is 14 handled exclusively by AMBER. These categories are illustrated in Fig. 2.1 with sphere-like shapes, but these regions can obviously be of any shape. In what follows, we describe the implementation of the ReaxFF/AMBER method. Figure 2.1 ReaxFF/AMBER regions in the integration implementation. 2.2.1 Implementation AMBER is the simulation driver in the ReaxFF/AMBER MD integration. After AMBER categorizes atoms into their respective groups, it sends all relevant information for ReaxFF and ReaxFF/MM atoms to the ReaxFF program. The ReaxFF/AMBER tool currently uses the external model interface; that is, the necessary data transfers between ReaxFF and AMBER are performed using file-based data exchange. Therefore, after AMBER completes writing the data exchange files, it launches the ReaxFF program as an external binary and waits for its output files. The ReaxFF program then runs a zero-step nonperiodic simulation to calculate the dynamic charges, energies, and forces on the ReaxFF and ReaxFF/MM atoms. Upon completion, it writes this information back into another file which is finally read by AMBER to complete the energy and force computations for the ReaxFF/MM and MM regions. In implementing the interface between the ReaxFF and AMBER programs, we have adopted the following procedure for a successful hybrid model: 15 MM RegionReaxFF/MMRegionReaxFF Region • Dynamic charges on ReaxFF atoms are calculated under the influence of ReaxFF/MM atoms with static charges. • All ReaxFF interactions between ReaxFF-ReaxFF pairs are calculated without any modifications. • Electrostatic interactions between ReaxFF (w/dynamic charge)-ReaxFF/MM (w/static charge) atom pairs are calculated by ReaxFF. • van der Waals interactions between ReaxFF-ReaxFF pairs/MM atom pairs are handled by AMBER (e.g., using a Lennard-Jones potential). • Interactions between MM-ReaxFF/MM and MM-MM pairs are handled by AMBER as usual. We should note that there are some limitations of the current ReaxFF/AMBER tool. Now, only systems with noncovalent bonds between ReaxFF and ReaxFF/MM regions can be studied. Also, only shared memory parallelism can be leveraged for the time-being. Never- theless, as we demonstrate below, the current implementation serves as a proof-of-concept on the feasibility and advantages of this approach. 2.2.2 Dynamic Charges with the Modified Electronegativity Equalization Method (mEEM) Since we allow the statically charged MM and transition region atoms to polarize the ReaxFF atoms, the dynamic charge model used by ReaxFF needs to be modified. As mentioned previously, ReaxFF can use the charge equilibration (QEq) or electronegativ- ity equalization method (EEM) to determine the charges. Our current implementation is based on EEM. Before describing the necessary modifications, we briefly discuss the EEM charge model that is currently used by ReaxFF/AMBER. The EEM charge model [65] relies on the principle that charges should be distributed on atoms to satisfy constraints for both the net system charge and the equalized atom electronegativities. Let atomic charges be q = (q1, q2, . . . , qn) and the positions be R = (r1, r2, . . . , rn), where qi ∈ R and ri ∈ R3. On 16 the basis of Sanderson’s Electronegativity Equalization Principle [83], the electronegativity of all atoms needs to be equalized: ϵ1 = ϵ2 = · · · = ϵn = ¯ϵ (2.1) where ϵi is the electronegativity of atom i and ¯ϵ is the average molecular electronegativity. The other constraint forces the sum of the atomic charges to be equal to the given net system charge: (cid:88) i qi = qnet. (2.2) The constraints and the parameterized inter-atomic interactions can be merged into the following linear equation where the charges q are the solution of:    H 1n 1T n 0       q ¯ϵ     =   −χ qnet    . (2.3) Here, χ is an n by 1 vector of atomic electronegativities, and the Hij values, that is, individual elements of the n by n H matrix, are defined as δijηi+(1 − δij)·Fi,j, where δij is the Kronecker delta operator, ηi is the idempotential, and Fij is defined as Fi,j =    3(cid:113) 1 ij +γ−3 r3 ij , rij ≤ rnonb (2.4) 0, otherwise. In the equations above, rij = ||rj − ri||2 is the distance between atoms i and j, rnonb is the cutoff radius, γij = √ γi · γj is a pairwise shielding term tuned for element types of atoms i and j to avoid unbounded electrostatic energy at short distances, and ¯ϵ is the dielectric constant of the medium. It has been demonstrated that the EEM model reproduces QM calculated Mulliken charges [14]. To account for the polarization effect of the transition region atoms on the core ReaxFF region, we introduce the ReaxFF/AMBER atoms as particles with fixed charges to the EEM 17 solver. This is done by modifying the system of equations as follows. Assuming that there are c ReaxFF atoms and t transition region atoms, then Hcore ∈ Rc×c captures the interactions within the core ReaxFF region and Hcore−trans ∈ Rc×t captures the impact of the transition region atoms on the ReaxFF atoms.       Hcore Hcore−trans 1c Htrans−core Htrans 1T c 1T t 1t 0             qcore qtrans ¯ϵ             = −χcore −χtrans qnet       (2.5) Since a direct solution of the above linear system scales cubically with the number of total atoms, typically iterative solvers are used to obtain approximations to the optimized dynamic charges. In such an iterative scheme, we only evolve the charges on the core atoms as qtrans are fixed values given in the MM force field. As such, the rows corresponding to the transition region atoms can be ignored in the above linear system, and we obtain    Hcore Hcore−trans 1c 1T c 1T t 0          qcore qtrans ¯ϵ          =    . −χcore qnet Since qtrans is fixed, it can be rearranged as    Hcore 1c 1T c 0       qcore ¯ϵ     =   −χ qnet     −   Hcore−trans 1T t (cid:20)    (cid:21) . qtrans (2.6) (2.7) These modifications ensure that the existence of the fixed charges does not slow down the charge equilibration step since the fixed charges and dynamic charges are separated. The modifications above are sufficient to run nonperiodic ReaxFF/MM simulations, as well as periodic boundary simulations without long-range electrostatic interactions. When running periodic boundary simulations with long-range electrostatic interactions AMBER uses the particle mesh Ewald (PME) approach which calculates interactions within a certain cutoff distance directly and approximates the rest over a mesh [20]. To be able to account for 18 the long-range interactions in EEM, we have incorporated the impact of the mesh points on ReaxFF atoms by further extending the EEM matrix, that is, by adding a column and row containing the effects of the mesh points. This scheme does not conserve the energy in NVE simulations with periodic boundaries because ReaxFF uses tapered electrostatic interactions [18] that force the Coulomb effects to slowly decay to 0 at the interaction cutoff radius which is typically set to 10-12 ˚A. Tapering of the Coulomb interactions is a built-in feature of the ReaxFF formulation, and existing ReaxFF parameter sets have been trained with this design principle in mind. We believe that by retraining ReaxFF parameters without tapered Coulomb kernels, periodic boundary simulations with PME can successfully be enabled, but doing so goes beyond the scope of this paper, so we leave it as a topic for further exploration. 2.2.3 Validation For validation of the resulting ReaxFF/AMBER combination, we performed experiments using benzene-in-water systems, one with periodic boundary conditions, and one without (Fig. 2.2). Benzene is not reactive in water, but the goal of these computational experiments was to validate that the ReaxFF/MM method can achieve energy conservation and produce reasonable dynamic charges. Figure 2.2 (Left) ReaxFF Benzene in a TIP3P water droplet with a total of 627 atoms, (right) ReaxFF Benzene in a TIP3P water box with a total of 4398 atoms. For both simulations (nonperiodic and periodic), the systems are first energy-minimized 19 and then heated to 300 K using the Berendsen thermostat in AMBER. Finally, ReaxFF/MM NVE simulations are run using a time step of 0.25 fs to check energy conservation and charges. A relatively short time step was chosen as this is the recommended setting for ReaxFF simulations, especially in the presence of H atoms. For all simulations, the SHAKE algorithm was turned off. Figure 2.3 Simulation of a benzene molecule in water using the TIP3P water model and NVE ensemble. A time step of 0.25 fs was used. (Blue) Nonperiodic boundary condition with an infinite cutoff. (Red) Periodic boundary condition with a QM cutoff of 10 ˚A (qmcut = 10 ˚A) and ReaxFF/PME interaction turned off (qm pme = 0). As shown in Fig. 2.3, the energy is conserved for both simulations. Fig. 2.4 shows the carbon and hydrogen charges for Reax atoms (denoted as Reax-C and Reax-H, respectively), as well as the average across all C and H atoms in a time step (denoted as Reax-C-Avg and Reax-H-Avg, respectively). As can be seen, dynamic charges produced by ReaxFF/MM under the influence of statically charged transition region atoms are in line with the carbon and hydrogen charges given in the AMBER force field for benzene (denoted as AMBER-C and AMBER-H, respectively). 20 Periodic boundary cond. Nonperiodic boundary cond.Time (ps)E(t) - Eavg (kcal/mol)__4.2.0. -2. -4.0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0. Figure 2.4 Dynamic ReaxFF charges and fixed AMBER charges with a ReaxFF/MM simu- lation of a benzene molecule in water with periodic boundary conditions using parameters qmcut = 10 ˚A and qm pme = 0. 2.2.4 Performance Analysis As mentioned before, the motivation for the ReaxFF/MM method is that it can sig- nificantly reduce the computational time that would be needed by a comparable QM/MM simulation. To illustrate this, we benchmarked the computational cost of separate equili- bration simulations of allyl vinyl ether (AVE) which consists of 14 atoms using ReaxFF, PuReMD, SCC-DTFB, PM3, MM, the Hartree-Fock method (HF), and density functional theory (DFT) using the B3LYP functional. The HF and DFT calculations used the 6-31G* basis set. The benchmark simulations were performed using standalone ReaxFF, PuReMD, AMBER, and the QUICK software package [63] in sequential execution mode on an Intel Xeon E5-2670 v2 CPU which runs at 2.50 GHz. The benchmarked time per simulation steps are shown in Table 2.1. In Table 2.1, PuReMD refers to a C language based efficient parallel implementation of the original ReaxFF implementation in Fortran. Time steps of 1 fs for MM, 0.5 fs for SCC-DFTB and PM3, and 0.25 fs for ReaxFF calculations are assumed for these benchmark calculations. These benchmark data show that ReaxFF method can be ∼2-5 times faster 21 Atomic Charges for Benzene in TIP3P Water..Reax-CReax-HReax-C AvgReax-H AvgAMBER-CAMBER-H0.200.150.100.050.00 -0.05 -0.10 -0.15 -0.200.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 Time (ps) Charge (e) Method seconds/iteration (relative wrt MM) time step (fs) ns/day (relative wrt PuReMD) SCC-DFTB 3.088 × 10−3 (39.8) 1.83 × 10−3 (23.6) 6.25 × 10−4 (8.1) 3.63 × 10−4 (4.7) 7.75 × 10−5 (1.0) 1.833 (23651.6) 2.762 (35638.7) PM3 ReaxFF PuReMD MM HF DFT 0.5 0.5 0.25 0.25 1.0 13.99 (0.2) 23.61 (0.4) 34.56 (0.6) 59.50 (1.0) 1114.84 (18.7) Table 2.1 Sequential execution time in seconds per time step and nanoseconds per day for SCC-DFTB, PM3, ReaxFF, PuReMD, MM, HF, and DFT calculations. B3LYP functional and the 6-31G* basis set were used for the HF and DFT geometry optimization calculations. than PM3 or SCC- DFTB methods, 3 orders of magnitude faster than HF or DFT methods, and ∼18 times slower than conventional molecular mechanics methods. Therefore, ReaxFF can offer an alternative for QM/MM calculations to handle larger complex systems at longer time scales. Currently, ReaxFF/AMBER software uses the original Fortran ReaxFF code through a file-based data exchange interface. We analyzed the computational performance of the ReaxFF/AMBER software with the nonperiodic benzene-in-water simulation for 1000 steps using an infinite electrostatic interaction cutoff. The timing breakdown shown in Fig. 2.5 is averaged over the course of this simulation. Please see Section A.1 for specifications of the hardware and software used in these computational experiments. Since the data exchange between ReaxFF and AMBER is file based, it requires the writing and reading of files at every simulation step. There are also the additional costs associated with launching the ReaxFF program as an external binary. Finally, the modifications to the standalone ReaxFF implementation is minimal and for that reason, ReaxFF allocates and initializes various data structures at every time step. Time spent on the I/O and system calls to the external program will be removed by introducing array-based data exchanges between the relevant ReaxFF and AMBER subroutines. Additionally, by keeping the data structures in memory persistently and reusing them at each time step, we could eliminate most of the time spent on the allocation and initialization parts. When all these potential 22 Figure 2.5 Breakdown of the total execution time of ReaxFF/AMBER software for the benzene-in-water simulation into its main components. The total time spent per step is 0.46 sec. updates are considered, the ReaxFF/AMBER software would be accelerated significantly beyond what is shown here because only 10.9% of the overall time is spent on the necessary ReaxFF calculations (see Fig. 2.5). 2.3 Claisen Rearrangement Simulations with ReaxFF/Amber To demonstrate the capabilities of the new ReaxFF/AMBER integration, we carried out ReaxFF/MM modeling of the classic Claisen rearrangement of allyl vinyl ether (AVE) solvated in an explicit TIP3P water model [47] as shown in Fig. 2.6. Since the Claisen rearrangement is a well-studied reaction [31, 32], it was chosen to evaluate the chemical accuracy of our new method. In these simulations, the solvent is treated by AMBER using the TIP3P water model and the reactant AVE is treated with ReaxFF. No covalent bonds cross the ReaxFF/AMBER boundary. 23 Figure 2.6 (left) Solvation of allyl vinyl ether (AVE) in octahedral TIP3P water with total number of 4319 atoms. (right) Molecular structure of AVE. 2.3.1 ReaxFF Parameter Optimization The existing ReaxFF force field [117] was tested for simulating the Claisen rearrangement. The initial tests showed that the system got trapped in a local minimum after reaching the transition state (TS), and therefore a proper Claisen rearrangement could not be observed (Fig. 2.7). To resolve this, ab initio QM data for the Claisen rearrangement of AVE in Figure 2.7 Claisen rearrangement of AVE in vacuum using original and trained ReaxFF force fields. The original force field leads to getting trapped in local minimum. vacuum were generated. The intrinsic reaction coordinate (IRC) method was used to obtain QM data from two different chair-like and boat-like transition states. All the optimization 24 Before FF trainingAfter FF trainingdE (kcal/mol) 40 30 20 10 0 -10 -20 -30 0 5000 1000 15000 20000 Iteration and IRC calculations were done with the Gaussian 16 package [26]. B3LYP functional and the 6-31G* basis set were used for the calculations. Through IRC calculations, a Claisen rearrangement transition of all chair-like and boat-like configurations and the energy changes were recorded. Comparison of these QM data against the initial ReaxFF force field results are shown in the top row of Fig. 2.8. These initial energy differences and the failure of the initial force field in capturing the Claisen rearrangement indicated the requirement of force field training against the generated QM data. The training data used for optimizing a ReaxFF parameter set can include QM data on charges, heat of formations (kcal/mol), energy minimized geometries (in angstrom or degree), lattice cell parameters (in angstrom or degree) and relative energies (kcal/mol). All the training data are added to the training set and the force field is reparametrized to minimize the error function: ei = (cid:18) xi,QM − xi,ReaxF F σi (cid:19)2 (2.8) where xi,QM and xi,ReaxF F are the QM and ReaxFF values of the ith entry of the training set, respectively, and σi are weight parameters that determine the desired accuracy for individual training data items. Force field parameters which are designated to be tuned are defined in a separate input file. The ReaxFF training feature in the standalone Fortran code then uses a line search scheme to optimize each parameter to be tuned one at a time. The bottom row of Fig. 2.8 shows the results of our fitting against QM data for chair- like and boat-like transition states. Evaluation of the new force field after training showed satisfactory behavior in capturing the Claisen rearrangement in vacuum (Fig. 2.7). Therefore, this updated ReaxFF force field was used to perform free energy calculations of the Claisen rearrangement of AVE in the presence of explicit TIP3P water using ReaxFF/AMBER. 2.3.2 Free Energy Calculation with Umbrella Sampling The MD driver in the ReaxFF/AMBER integration scheme is AMBER. Hence, we can use all of AMBER’s advanced sampling techniques. One such feature is the umbrella sampling free energy calculation technique. Using the newly optimized parameter set, AMBER’s um- 25 Figure 2.8 (a) QM (B3LYP/6-31G*) vs original ReaxFF force field results for the Claisen rearrangement of chair-like and boat-like structures. (b) QM (B3LYP/6-31G*) vs trained ReaxFF force field results for the Claisen rearrangement of chair-like and boat-like structures. The energies are with respect to the completely open AVE geometry optimized energy as the reference. brella sampling feature coupled with the weighted histogram analysis method (WHAM) was used to generate the potential of mean force (PMF) of this reaction with ReaxFF/AMBER. The reaction coordinate chosen for the umbrella sampling simulations was the distance be- tween two terminal AVE carbon atoms referred to as the C1 − C5 distance (Fig. 2.9). Figure 2.9 Claisen rearrangement of AVE while C1 − C5 distance is changed from 3.5 to 1.6 ˚A. A series of harmonic potentials were used to constrain the reaction coordinates to the defined windows. The C1 − C5 distance was varied from 3.5 to 1.6 ˚A. For the regions near 26 Chair-likeBoat-likeChair-likeBoat-like 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 C-C distance (Å) 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 C-C distance (Å) 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 C-C distance (Å) 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 C-C distance (Å)4030 20 10dE (kcal/mol) 0-10-204030 20 10dE (kcal/mol) 0-10-204030 20 10dE (kcal/mol) 0-10-20 40 30 20 10dE (kcal/mol) 0-10-20 (a) (b) ReaxFF QM ReaxFF QM ReaxFF QM ReaxFF QM the transition state (1.7-2.4 ˚A), 1000 kcal/mol/˚A2 was used to obtain enough sampling for each window, and for the regions far from the transition state (1.6-1.7 ˚A and 2.4-3.5 ˚A) a weaker restraint (200-700 kcal/mol/˚A2) was used. For the 1.6-3.0 ˚A range, 0.02 ˚A window intervals were defined and for the 3.0-3.5 ˚A range, 0.1 ˚A window intervals were defined. In total, 81 windows were created for sampling, while varying the C1 − C5 distance from 3.5 to 1.6 ˚A. All simulations were performed using the TIP3P rigid three site point charge water model. AVE was solvated in an octahedral box of TIP3P water molecules of 25 ˚A radius. All simulations were performed with periodic boundary conditions. The topology files for AVE were created using the general AMBER force field (GAFF) [113]. AMBER MM simulations using periodic boundary conditions were performed with a 10 ˚A cutoff for the real space nonbonded interactions, and the particle mesh Ewald (PME) algorithm was incorporated to account for long-range electrostatics beyond the cutoff. After 2000 steps minimization in each window, we ran 25 ps NPT (constant number of atoms, constant pressure and constant temperature) equilibration using constant pressure Langevin dynamics with the Berendsen barostat at 300 K and 1 atm. The shake algorithm was enabled for these simulations in the MM region, and the time step of 0.25 fs was chosen for both ReaxFF (no shake) and AMBER regions. Data collection for umbrella sampling was started after the equilibration phase for 5 ps in each window. Umbrella sampling consisted of data collection from separate windows of the reaction coordinate simultaneously. By defining proper harmonic restraint constants in each window, we allowed neighboring windows to overlap and ensured there were enough windows to cover the entire reaction coordinate space. Data sampling were performed every 0.0125 ps of the production stage of umbrella sampling simulations. Finally, the PMF was calculated by combining the data from each window using WHAM. 2.3.3 Results and Discussion To evaluate the potential of mean force (PMF) variations as a function of the reaction coordinate defined for the Claisen rearrangement, umbrella sampling calculations were em- 27 ployed. As mentioned, the simulations in different windows needed to be such that we could observe convergence on sampling to complete the umbrella sampling calculations. To make sure that all the windows overlap properly before performing the WHAM calculations, the histograms of all windows were generated (Fig. 2.10). This plot shows that there was proper overlap and no obvious gaps were observed. Using this data set, we used WHAM to construct the PMF of the Claisen rearrangement. Figure 2.10 Histogram of C1 − C5 distance samplings from the umbrella sampling windows. The C1 − C5 distance was varied from 3.5 to 1.6 ˚A. For the regions far from the transition state (1.5-1.7 ˚A and 2.4-3.5 ˚A) a weaker restraint (200-700 kcal/mol/˚A2) was used and for the regions near the transition state (1.7-2.4 ˚A) a stronger restraint (1000 kcal/mol/˚A2 ) was used. For the 1.6-3.0 ˚A range, 0.02 ˚A interval windows were defined and for 3.0-3.5 ˚A windows interval of 0.1 ˚A were defined. The PMF obtained from umbrella sampling calculations is shown in Fig. 2.11. The calculated PMF can be used to evaluate the Claisen rearrangement barrier height and the transition state configuration. A barrier height of 34.8 kcal/mol was obtained from these cal- culations. We also utilized an implementation of the self-consistent charge density functional tight-binding (SCC-DFTB) method, which is a semiempirical method based on density func- tional theory (DFT) [22], and also semiempirical neglect of diatomic overlap (NDDO) PM3 method [100] as part of the QM/MM support in the AMBER 18 MD program to perform 28 8000600040002000 0 1.5 2.0 2.5 3.0 3.5 FrequencyC1 - C5 Distance (Å) Figure 2.11 PMF of AVE Claisen rearrangement in TIP3P periodic water using combined ReaxFF/AMBER. The energies are with respect to the completely open AVE geometry optimized energy as the reference. calculations on the Claisen rearrangement. Also, another SCC-DFTB QM/MM umbrella sampling PMF calculation using the SPC/E water model was performed to evaluate the impact of the water model on this simulation. The results of all these PMF calculations are shown in Fig. 2.11. The transition configuration of these different methods are shown in Fig. A.1. ReaxFF/AMBER integration successfully captured the Claisen rearrangement reaction. The first kinetic study of thermal rearrangement of AVE in the gas-phase by Schuler and Murphy reported an activation energy of 30.6 kcal/mol [85]. This is consistent with other experimental reports of experimental activation energies for chair transitions [29, 115, 41]. This barrier height has been reported to be lower in water solvent than the value in the gas phase and nonpolar solvents [116]. Transition state bond lengths of 2.2-2.3 ˚A were reported using different simulation techniques [38]. 2.4 Concluding Remarks An AMBER/ReaxFF interface has been developed in this work that offers the capability of capturing chemical reactions as an alternative to AMBER QM/MM methods. The main 29 C1 - C5 Distance (Å)3.4 3.2 3.0 2.8 2.6 2.4 2.2 2.0 1.8 1.6 35 30 25 20 15 10 5 0 -5 -10 -15Energy (kcal/mol) DFTB-TIP3P DFTB-SPC PM3-TIP3P ReaxFF-TIP3P objective of this development is obtaining an approach to add a reactivity feature to the classically nonreactive molecular dynamics simulations with reduced computational costs compared to QM/MM methods. This new interface will be a useful tool for modeling big biomolecular systems with local reactive regions. The initial file-based implementation of this interface was assessed by calculation of the reaction profile of Claisen rearrangement of AVE solvated in the TIP3P water model. We conducted umbrella sampling simulations by generating a series of configurations along the distance between two end carbon atoms in AVE as the reaction coordinate, ran biasing simulations, and extracted the PMF. These results showed the capability of the ReaxFF/AMBER integration in capturing a Claisen rearrangement reaction despite the small inaccuracies in the reaction barrier height which can be enhanced by more thorough training of the ReaxFF force field. The benchmark timing data revealed that a significant portion of the calculation time was spent on I/O and initialization/synchronization in this file-based version of the integra- tion. As a result, we later developed an array-based implementation to eliminate these I/O and initialization/synchronization overheads. To further accelerate the calculations, we in- tegrated PuReMD, an efficient parallel implementation of the original ReaxFF written in C. PuReMD offers substantial performance enhancements over the currently used Fortran-based ReaxFF reference implementations. It supports shared memory, distributed memory, and GPU implementations, enabling us to leverage modern massively parallel architectures. The performance of the array-based AMBER/ReaxFF (PuReMD) is presented in the follow-up work [78]. 30 CHAPTER 3 JAX-REAXFF: A GRADIENT-BASED FRAMEWORK FOR FAST OPTIMIZATION OF REACTIVE FORCE FIELDS This chapter was previously published as [50], and has been reformatted to meet the requirements of this dissertation. This work is reproduced with the permission of American Chemical Society. The ReaxFF model bridges the gap between traditional classical models and quantum mechanical (QM) models by incorporating dynamic bonding and polarizability. As shown in Chapter 2, when combined with classical force fields such as AMBER, ReaxFF could enhance their capabilities to simulate interactions that require dynamic bonds and charges. However, to achieve realistic simulations using ReaxFF, model parameters must be optimized against high fidelity training data which typically come from QM calculations. Existing parameter optimization methods for ReaxFF consist of black-box techniques using genetic algorithms or Monte-Carlo methods. Due to the stochastic behavior of these methods, the optimization process oftentimes requires millions of error evaluations for com- plex parameter fitting tasks, thereby significantly hampering the rapid development of high quality parameter sets. Rapid optimization of the parameters is essential for developing and refining Reax force fields because producing a force field which exhibits empirical accuracy in terms of dynamics typically requires multiple refinements to the training data as well as to the parameters under optimization. In this section, we present JAX-ReaxFF, a novel soft- ware tool that leverages modern machine learning infrastructure to enable fast optimization of ReaxFF parameters. By calculating gradients of the loss function using the JAX library, JAX-ReaxFF utilizes highly effective local optimization methods that are initiated from multiple guesses in the high dimensional optimization space to obtain high quality results. Leveraging the architectural portability of the JAX framework, JAX-ReaxFF can execute efficiently on multi-core CPUs, graphics processing units (GPUs), or even tensor processing units (TPUs). As a result of using the gradient information and modern hardware accel- 31 erators, we are able to decrease ReaxFF parameter optimization time from days to mere minutes. Furthermore, JAX-ReaxFF framework can also serve as a sandbox environment for domain scientists to explore customizing the ReaxFF functional form for more accurate modeling. 3.1 Background and Motivation Before going into the details of JAX-ReaxFF, we provide some background on ReaxFF and existing software for ReaxFF parameter optimizations. 3.1.1 ReaxFF Training ReaxFF parameters are grouped by the number of atoms involved in the interaction (e.g., single-body, two-body, three-body and four-body) in addition to the system-wide global parameters. Based on the distances and angles between atoms and corresponding model parameters, bonded, 3-body, 4-body, H-bond and non-bonded interaction lists are formed dynamically at each time step. For every interaction, corresponding parameters from the parameter set based on the element types of the atoms involved are used to calculate the Esystem. As described in detail in Senftle et al. [88], there exist parameter sets for different kinds of simulations such as combustion, aqueous systems, metals, and biological systems. Even if there already is a parameter set for a simulation, it may require further tuning for a particular application. In some cases, the model needs to be trained from scratch which is a complex task. In general, as the number of elements in a parameter set increases, force field optimization becomes harder due to the increasing number of parameters involved. ReaxFF training procedure requires three different inputs: i) geometries, a set of atom clusters crucial in describing the system of interest (e.g., bond stretching, angle and torsion scans, reaction transition states, crystal structures, etc.), ii) training data, properties of these atom clusters (such as energy minimized structures, relative energies for bond/angle/torsion scans, partial charges and forces) which are calculated using high-fidelity QM models, and iii) model parameters to be optimized along with a fitness function that combines different 32 types of training items as follows: Error(m) = N (cid:88) i=1 (cid:18) xi − yi σi (cid:19)2 . (3.1) In Eq. (3.1), m is the model with a given set of force field parameters, xi is the prediction by model m, yi is the ground truth as calculated by QM, and σ−1 i is the weight assigned to each training item. Table 3.1 summarizes commonly used training data types (charge, energy, geometry, and force matching) and provides examples for each of them. Since ReaxFF dynamically assigns partial charges during a simulation, charge-based items could be included to help ReaxFF charges approximate QM-based atomic charges. An energy-based training data item uses a linear relationship of different geometries (expressed through their identifiers) because relative energies rather than the absolute energies drive the chemical and physical processes. As an illustrative example, the reaction H2 + 1 2 O2 → 2H2O could be expressed as an energy-based training item with prescribed weights as EH2 1 + EO2 2 − EH2O 0.5 = Etarget. For structural items, geometries must be energy minimized as accurate prediction of the low- est energy states is crucial. By including structural training items (bond distances, valence angles, and torsion angles), the ReaxFF model can be guided to accurately predict the struc- tural properties of the target systems. For other training item types, energy minimization is optional but usually recommended. Energy minimizing the geometries before calculating the error of a given set of parameters oftentimes assists by preventing overfitting, since this process serves as data augmentation – atom positions change slightly for each energy min- imization which overall makes it harder to overfit. Ultimately, whether a geometry should be energy minimized or not is up to the user. 33 Type Charge Energy Geometry Force Training Item ID1 1 ID1/1 - ID2/2 - ID3/3 ID1/1 ID3/2 - ID1/3 ID1 1 2 ID2 1 2 3 ID3 1 2 3 4 ID1 1 ID2 Target 0.5 50 -150 30 1.25 120 170 Description Charge for atom 1 in geometry ID1 (in elementary charge) Scaled energy difference between specified geometries (in kcal/mol) Distance between atoms 1 and 2 in geometry ID1 (in ˚A) Valence angle between atoms 1, 2 and 3 in geometry ID2 (in degrees) Torsion angle between atoms 1, 2, 3 and 4 in geometry ID3 (in degrees) 0.5 0.5 0.5 Forces on atom 1 in geometry ID1 (in kcal/mol ˚A) 1.0 Root mean squared gradients in geometry ID2 (in kcal/mol ˚A) Table 3.1 Commonly used training item types in ReaxFF training. Identifiers (e.g., ID1, ID2 and ID3) denote geometries, i.e., atomic structures or molecules. A charge-based training item is declared by specifying the atom number in an input geometry along with the target charge value. Since energy-based training is performed using relative energies, these training items are specified through a simple linear arithmetic expression by using the identifiers of the geometries being compared. More than 2 geometries can make up an energy expression. In geometry-based training, target distance, valence angle or torsion angle values can be specified for particular atoms over the energy-minimized structure of an input geometry. For force matching, target force values between particular atoms for a given geometry are specified. 3.1.2 Related Work Existing force field optimization methods for ReaxFF employ gradient-free black-box optimization methods such as GAs and EAs. These methods perform a global search in a high dimensional space and come with a high computational cost because they do not utilize any gradient information, but rather rely solely on error evaluations at different points in the search space for guiding improvement. The earliest ReaxFF optimization tool is the sequential parabolic parameter interpolation method (SOPPI) [107]. SOPPI uses a one-parameter-at-a-time approach where consecutive single parameter searches are performed until a certain convergence criteria is met. The algorithm is simple, but as the number of parameters increases, the number of one-parameter optimization steps needed for convergence increases drastically. Also, the success of this method is very dependent on the initial guess and the order of the parameters to be optimized. Due to these drawbacks of SOPPI, various global methods such as GAs [21, 46, 56], SA [42, 44], EAs [105], PSO [27], and search methods based on machine learning [19, 39, 68, 89] have been investigated for ReaxFF optimization. For an explanation and evaluation of the most promising of these methods, we refer readers to the prior work by Shchygol et al.[94]. 34 These methods have been proven to be successful for ReaxFF optimization. However, due to the absence of any gradient information, these global search methods require a high number of potential energy evaluations, as such they can be very costly. With the emergence of optimized tools for machine learning to calculate gradients of complex functions automatically, a method called Intelligent-ReaxFF has been proposed to leverage these tools to train Reax force fields [39]. In this work, the TensorFlow library was used to calculate gradients for force field optimization. However, the method does not have the flexibility of the previously mentioned methods in terms of the training data. The force field only can be trained to match the ReaxFF energies to the absolute energies in the reference data; relative energies, charges, or forces cannot be used in the training, essentially limiting its usability. Also since it does not filter out the unnecessary 2-body, 3-body, and 4-body interactions before the optimization step, it is significantly slower than JAX-ReaxFF. 3.2 Proposed Method In the following sub-sections, we discuss how JAX enables efficient and robust gradient calculation and how JAX is coupled with an efficient re-implementation of ReaxFF with these details in mind. We also highlight additional features with provide JAX-ReaxFF with performance portability to several computing architectures, including geometry clustering for batch processing on GPUs. Finally, we discuss how these details influence the local optimizer design. 3.2.1 Overview The JAX library performs auto-differentiation on mathematical functions expressed using Python code. As such, implementation of the ReaxFF energy expressions (see Eq. (1.1)) in Python forms the core of JAX-ReaxFF. Once the individual energy expressions and the training error function are provided, JAX can easily calculate the gradient of the training error function with respect to the ReaxFF parameters under optimization. As mentioned earlier, atomic forces can also be part of the training data set – these can be calculated using the gradients of ReaxFF energy expressions with respect to atom positions, too. 35 Molecular systems used for force field training tend to have a small number of atoms com- pared to regular MD runs. Using a software designed for running simulations with thousands of atoms (such as PuReMD or LAMMPS/ReaxFF) to run several small scale simulations introduces overheads. Optimizations in these software (such as optimized sparse solvers for atomic charges, fast neighbor list generation algorithms, and distributed computation) would actually increase the overall run-time for small systems and result in unnecessarily complex code. Even though vanilla Python code tends to be slower than optimized Fortran or C code, when the auto-diff functionality, the use of small geometries, and the just-in-compiled XLA support (discussed in Section 3.2.3 are considered, the advantages of JAX-ReaxFF outweighs the performance loss from not using Fortran or C. While gradient-based optimization functionality is straight-forward to achieve using JAX as described above, there are a number of important considerations needed to realize an efficient (from a run-time point-of-view) and scalable (from a memory utilization perspective) parameter optimization framework. Fig. 3.1 gives an overview of the task-flow in JAX- ReaxFF. After the neighbor list and interactions lists are calculated for the input geometries (Section 3.2.2, we cluster the inputs based on the size of their interaction lists and align them properly in memory to ensure efficient single instruction multiple data (SIMD) parallelization (Section 3.2.3). After these preparation steps, the main optimization loop is executed until convergence or the maximum number of optimization steps are reached (which typically takes only tens of iterations). During the parameter optimization loop, some molecules might require energy minimization. Hence, the main optimization loop contains a “gradient- based optimization” step followed by a “geometry optimization” step. We discuss each step in more detail in the ensuing subsections. 3.2.2 ReaxFF Model Implementation In ReaxFF implementations for MD simulations, neighbor and interaction lists are cre- ated based on the atom positions and the fixed force field parameters. Due to the dynamic nature of interactions in ReaxFF, accurate and fast calculation of energy terms (especially 36 Figure 3.1 JAX-ReaxFF execution flow graph. Inputs are shown in green (initial geometries, training data, and the list of parameters to be optimized), while the core steps of JAX- ReaxFF are shown in blue. The gray box shows the main optimization loop (Algorithm 3.1). the higher order ones such as valence angle and torsion) is critical. Differently from regu- lar ReaxFF MD simulations, the force field is also dynamic during parameter optimization, hence adding to the challenges of developing an efficient implementation. Pair-wise bonded interactions: We illustrate the challenges using bond order calcu- lations as an example. As shown in Fig. 1.2, all bonded interactions depend on the corrected bond order term. Initially, if the distance between two atoms is less than a given cutoff, typically 5 ˚A, the uncorrected bond orders (BO) are calculated according to Eq. (3.2), where rij is the actual distance between the atom pair i-j, and rσ o , rπ o , and rππ o are the idea bond lengths for σ-σ, σ-π and π-π bonds, respectively. The parameters pbo1, pbo3, and pbo5 are typically small negative values, while pbo2, pbo4, and pbo6 are relatively large positive values. Together, they ensure that the bond order function attains the value 1 at or below the ideal 37 Initial GeometriesNeighbor list and interaction listsGenerate required data structuresClustered and aligned geometries Clustering and aligningGradient based parameter optimizationGeometry optimizationCalculate True ErrorFFcurGeominGeomin & FFcurSave if bestGeoinit GeoinitSelected ReaxFF parameters to be optimized andTraining data(only for the first iteration)FFinit bond distance, and it smoothly decreases to 0 when rij is greater than the ideal bond length. BO′ ij + BOππ ij (cid:19)pbo2 (cid:21) pbo1 = exp ij + BOπ ij = BOσ (cid:18) rij (cid:20) rσ o (cid:18) rij rπ o (cid:18) rij rππ o + exp + exp pbo5 pbo3 (cid:20) (cid:20) (cid:19)pbo4 (cid:21) (cid:19)pbo6 (cid:21) (3.2) Normally, if the uncorrected bond order is greater than a certain threshold, it is added to the initial bond list and subsequently bond order corrections are applied based on the neighborhood of the atoms forming the bond. In the context of parameter optimization though, whether the pair i-j will form a bond above the given threshold also depends on the values of those parameters. Furthermore, if a given molecular structure in the training data set requires geometry optimization (as is needed for most structural properties), atom positions change as well. To speedup the computation, we utilize just-in-time compilation via XLA to create a static computational graph and map it to the desired computational device. Since the shapes of the input arrays are part of the graph by design, when their sizes change, JAX automatically triggers recompilation which is quite expensive considering the complexity of the ReaxFF model. Therefore we create the interaction lists once before the optimization starts and use masks to ignore the unwanted elements throughout the parameter and/or geometry optimization steps. For this purpose, for every unique element pair, the maximum possible distance (rmax) where each atom pair can have a bond order above a given threshold is determined using the equations below: fij(r) = max pbo1-6 , rσ o ,rσ o ,rππ o BO′ ij rmax = max r r subject to fij(r) ≥ threshold. 38 (3.3) (3.4) If some BO related parameters are included in the optimization, the parameter values which maximize the BO term (Eq. (3.3)) are selected from the specified parameter ranges. Then through a distance scan, the maximum possible distance is determined as the cutoff for inclusion of bond orders between that pair of elements (Eq. (3.4)). For geometries that require minimization, the maximum calculated distance is extended by a buffer distance to be able to accommodate potential atom position changes. This enables us to eliminate the expensive recompilation step as the atoms move during geometry optimization and/or force field parameters change during the parameter optimization process. Higher Order Bonded Interactions: Similar logic is applied for other types of inter- actions. In a given molecule with N atoms, when there is no trimming, there will be O(N 3) 3-body and O(N 4) 4-body interactions. Trimming these interaction lists is required to de- crease the computational and memory costs. 3-body and 4-body interaction lists are built using the corrected BO term. Another threshold is applied to the bonds forming the 3-body and 4-body interaction lists. Since the higher order bonded interactions are built using the corrected BO terms, the thresholds are also based on the previously described maximum possible BO terms. Further trimming of the lists is possible by scanning multiple distances and angles, but due to the increased computational complexity, trimming based solely on the BO term is employed. Non-bonded Interactions: It is assumed that there is a non-bonded interaction be- tween every atom in the system since the non-bonded interaction cutoff (which is typically 10 ˚A) is much larger than the size of molecular/crystal structures used for training. There- fore, non-bonded interactions form an N x N matrix. If the system has periodic boundary conditions with box dimensions a, b, and c and with non-bonded interaction cutoff of r, then the size of the tensor for non-bonded interactions will become N 2 · (cid:16) 2 · (cid:109) (cid:108) r a (cid:17) (cid:16) · 2 · + 1 (cid:109) (cid:108) r b (cid:17) (cid:16) · 2 · + 1 (cid:17) . + 1 (cid:109) (cid:108)r c Note that the terms after N 2 accounts for the periodic boundaries. 39 Evaluation of the Potential Energy: Once interaction lists are created as described above, they stay constant throughout the optimization with unwanted interactions sim- ply being masked out. Although masking wastes some computational power, it avoids the expensive reneighboring, interaction list recreation, and recompilation steps as force field pa- rameters evolve. It also leads to a simplified codebase because the interaction list generation part can be separated from the force field optimization process. The interaction list creation is always performed on the CPU using multiprocessing, regardless of whether a hardware accelerator is used for the optimization part or not. To calculate the potential energy, a similar approach to the standalone ReaxFF code is followed with the exception of charge equilibration. The use of EEM for distributing partial charges requires the solution of a system of linear equations (for details see Mortier et al.[65]) which is solved using preconditioned iterative solvers for large systems [69]. However, since the number of atoms is small for the training set structures, a direct LU factorization was easier to implement and auto-differentiate. 3.2.3 Clustering and Alignment for SIMD Parallelization JAX uses XLA, a domain specific compiler for linear algebra, to achieve hardware porta- bility. Using XLA, JAX-ReaxFF can easily run on multi-core CPUs, GPUs, or TPUs without any code changes. JAX offers vectorization (vmap) and parallelization (pmap) support to take full advantage of the underlying architecture. While vmap is aimed at single instruction mul- tiple data (SIMD) parallelism which merges multiple small computations into batches to achieve high device utilization, pmap targets multiple instruction multiple data parallelism. Our target architecture has been GPUs as they provide significant performance advan- tages over multi-core CPUs and have become the mainstream hardware accelerators. How- ever, attaining high performance on GPUs requires some important considerations. Since parameter optimization requires efficient execution of several small atomic structures as opposed to running one big MD simulation in parallel, JAX-ReaxFF leverages the vmap sup- port to accelerate the energy and gradient calculations. The keys for efficient vectorization 40 in JAX-ReaxFF are the pre-calculation of interaction lists that remain static throughout optimization (as described in the previous subsection), the clustering of input geometries with similar computational demands together (explained below), and the alignment of the interaction lists of geometries in the same cluster (by padding as necessary) for efficient memory accesses. As mentioned before, unwanted/unnecessary interactions in these static lists are masked during the energy and gradient calculations so that they do not affect the results. To cluster the input geometries for efficient vectorization, a modified version of the k- means algorithm [59] was used. The distance between geometry gi and cluster Cj is given as Dist (Cj, gi) = Cost (Cj ∪ {gi}) − Cost (Cj) . (3.5) The distance is an indicator of the change in computational load after assigning geometry gi to cluster Cj. The cost function calculates the approximate computational cost of aligning the geometries in a given set as follows: Cost (C) = s · [w1n1 + w2n2 + w3n3 + w4n4 + w5n5n2 1 (cid:3) . (3.6) In Eq. (3.6), s is the number of geometries within cluster C, and n1, n2, n3, n4, and n5 are the max numbers of atoms, 2-body interactions, 3-body interactions, 4-body interactions, and periodic boxes within cluster C, respectively. Since n1−5 are the main components in the cost calculation, collectively they can be thought as the cluster center. The coefficients w1 through w5 are indicators of the relative computational cost for single body, two-body, three-body, four-body, and non-bonded interactions, respectively. They can be determined empirically to accurately represent the computational costs in a given training set for a particular architecture. Since the interaction lists of the geometries within each cluster need to be aligned properly, the cost is calculated by multiplying the cluster size with the weighted sum of the maximum interaction list sizes within the cluster. This enables the clustering 41 algorithm to group geometries that are computationally similar and consequently increase the computational as well as memory efficiencies. After initializing the k cluster centers randomly, each geometry is assigned to these clusters based on the unique distance metric shown in Eq. (3.5). After clusters and their corresponding centers are updated, a new iteration is performed where each geometry is reassigned to the cluster that is closest to it. After every reassignment, cluster centers are updated since the newly assigned geometry might shift a cluster center drastically because we use the maximum interaction sizes within each cluster as the cluster center. If we were to delay the update till the end of the reassignment step, the final clusters would become sub- optimal due to the delay in center updates. Hence, unlike the original k-means algorithm, the order of geometries affects the result. Therefore, input geometries are shuffled after each iteration for randomization. The process continues until the cluster centers do not change anymore. Also, to ensure high performance, the clustering algorithm is executed multiple times starting from different random initial cluster centers and the one where the total wasted computation (which can be determined by the total amount of padding) is minimal is chosen as the final clustering of the geometries. Although the algorithm does not guarantee optimality, empirical results show that the presented clustering algorithm outperforms the original k-means algorithm for this task. Since the results are satisfactory, other clustering methods were not explored. The compilation time of JAX increases drastically with the number of clusters because JAX unrolls the loop that iterates through the clusters during compilation. Also, if the wasted computation does not increase significantly, a smaller number of clusters is more preferable for GPUs since improving SIMD parallelism is easier within clusters. For these reasons, the number of clusters is selected automatically. Unless the computational gain from a higher number of cluster centers is not significant, smaller number of clusters is preferred. The pseudocode for the clustering algorithm is provided in Section B.1. 42 3.2.4 Gradient-Based Local Optimization After the final clusters are formed, parameter optimization is performed using gradient- based local optimizers with multi-start as depicted in Fig. 3.1. Vectorization-based paral- lelism is employed for both energy minimization and parameter optimization steps shown in this figure. For gradient-based optimization to work, JAX traces the error function from Eq. (3.1) and computes the gradients of the parameters. However, since typically many geometries require geometry optimization for energy minimization, tracing the gradients through the optimization step is more error prone due to the complex functional form of the ReaxFF. To remedy this problem, we separate the geometry optimization from the error minimization. The error function in Eq. (3.1) requires geometries to be optimized before the final predic- tions. The error function without the geometry optimization can be thought as a surrogate model since it is a fast way to approximate the true error where the geometry optimization is done as well. Only single step energy calculations are used for surrogate error which de- viates from the true error depending on how much geometries change during the true error calculation. The approach accelerates the training significantly since only cheap single step calculations are done for the local parameter optimization and it does not require tracing the gradients through the geometry optimization step as it is not part of the surrogate error calculation. The optimization algorithm starts from the initial geometries (Geoinit) and the initial force field (FFinit) then the force field is iteratively improved. For each iteration of the training loop shown in Algorithm 3.1, two different local optimizations are performed, one being local geometry optimization using the steepest descent method and the main one being minimization of the fitness error on the energy minimized geometries by updating the force field parameters using various local optimization method such as L-BFGS-B and SLSQP. Both of these methods are classified as quasi-Newton methods where the Hessian matrix is approximated by successive gradient calculations [52, 118]. Error minimization 43 Algorithm 3.1 Gradient-based local optimization. 1: FFcur ← FFinit 2: for iteration = 1, 2, . . . do 3: FFcur ← Locally minimize the error through the selected gradient-based algorithm using Geomin and starting from FFcur. Geomin is fixed. 4: Geomin ← Geometry optimize the structures starting from the initial geometries Geoinit with the current model FFcur Ecur ← Calculate the current error using Geomin and FFcur if Ecur < Ebest then Ebest ← Ecur FFbest ← FFcur end if if |Ecur − Eprev|/Eprev < 0.001 then F F cur ← Add small uniform noise to F F best 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: end for end if Eprev ← Ecur step uses the Geomin and the optimized force field (FFcur) from the last iteration and after applying the selected gradient-based algorithm, FFcur gets updated with the newly trained force field. This step uses the surrogate model where the error is calculated with only the single step calculations. After that the geometry optimization step starts from Geoinit and yields optimized geometries (Geomin) using FFcur. The true error is calculated right after the geometry optimization, if there are any geometries that require it. After each iteration, the true error (Ecur) for FFcur on the training data is calculated. If Ecur is lower than the lowest error so far (Ebest), FFcur is saved as the best force field (FFbest). If the optimizer gets stuck in a local minima, noise is added to FFbest to escape the minima. The error on the surrogate gets closer to the true error as parameters converge because changes in parameters become minimal. One disadvantage of separating the energy minimization from the local optimization is that the error for the geometry items will be be ignored by the local optimization since the atom positions will not change. This introduces a discrepancy between the true error and the surrogate one. However, if the training data has multiple items related to the geometry-based items as a result of potential energy surface scans, the discrepancy could be minimized. As it is demonstrated through numerical experiments, the surrogate 44 approach works well in practice for a variety of training tasks which have geometry-based items. 3.3 Evaluation We evaluate the performance of JAX-ReaxFF, as well as the quality of the resulting force fields using published data sets with different characteristics. 3.3.1 Experimental Setup Training Tasks: We identified three training tasks1 that form a well-rounded test bench with their varying degrees of complexity. These tasks include different system types (cobalt, a metal; silica, an amorphous material; disulfide, a molecular system), different types and numbers of items in the training data set, and different number of parameters with their respective ranges to be optimized. We also note that these training tasks have been used by others for computational studies as well as for comparison of different force field optimization methods. Hence there exist several data points in the literature that serve as a benchmark for our JAX-ReaxFF tests. While structures in the cobalt and silica data sets mostly require energy minimization, those in the disulfide case mostly require single-step energy evaluations. Section 3.3.1 summarizes the specifications of the selected training tasks. JAX-ReaxFF currently does not support training items with simulation cell optimization, as such these were ignored. This only affects the silica data set which has only 5 of them (out of a total of 296 cases). Training Data Npar Nstrc Nminim C G F Cobalt [54] Silica [23] Disulfide [67] 12 67 87 146 302 231 130 221 10 0 5 0 0 26 255 0 0 4401 P 0 6 0 E 144 265 219 Table 3.2 Training data sets where Npar is the number of parameters to optimize, Nstrc is the number of structures in the training data set and Nminim is the number of geometries to be energy minimized. Columns C, G, F , P , and E denote the number of training items used for atomic charges, geometries, atomic forces, cell parameters, and energies, respectively. 1The data sets which are provided in the Supplementary Information of Ref.[94] and can be downloaded from https://ndownloader.figstatic.com/files/18698201. 45 Baseline Results: We compare the performance and training accuracy of JAX-ReaxFF to those of methods by Shchygol et al. [94], namely the Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES), the Monte Carlo Force Field (MCFF) optimizer, and the GA techniques described therein. The MCFF optimizer utilizes the simulated annealing approach to slowly modify the parameters and act based on the change in the error value. The remaining two approaches are population-based and are inspired by the basic principles of biological evolution. In GAs, a population of candidate solutions for a given optimization problem is evolved towards better solutions. Typically, evolution happens through random mutations and cross-overs between selected candidate solutions. In CMA-ES, new solutions are sampled from a multivariate normal distribution. The pairwise dependencies between the parameters are captured by the covariance matrix, and as the search progresses, CMA-ES updates the covariance matrix. All three approaches use ReaxFF model as a black box and find the direction solely from the function evaluations. Shchygol et al. [94] has compared these methods on different training tasks without focusing on tuning them and repeated the experiments multiple times with different starting conditions. Since they have provided an important test bench to compare different optimizers for ReaxFF, we follow the same approach to evaluate JAX-ReaxFF. Since the exact software and hardware from [94] are not accessible, execution times for the baseline methods are approximated on the hardware described in Section S2 of Supporting Information. We have calculated the time per true error evaluation for each training task using OGOLEM with sPuReMD backend and multiplied this by the total number of error evaluations presented in [94]. This approximation is a lower bound for CMA-ES and MCFF since they have a lower level of parallelism unlike genetic algorithms where each evaluation is independent from each other. The initial guess is an important factor which could change the results. It is especially important for gradient-based optimizations because these methods cannot move through the space freely as they need to follow the direction of the gradients. To show the capabilities 46 of JAX-ReaxFF, we experimented with all three initialization methods used by Shchygol et al. [94], namely initial guesses which are random, educated, and based on the literature. For random initial guesses, values are sampled from a uniform distribution-based on the given parameter ranges. To produce educated guesses, prior information from the previous related force fields is used as it is described further in Shchygol et al. [94]. For the literature-based initial guesses, force fields developed previously using the same training data sets are used. To obtain reliable results, each optimization method is repeated ten times using different initial guesses. For the educated- and literature-based initial guesses, a small amount of uniform random noise is added to the parameters without violating the range restrictions. (cid:3) where k is the length of the For each parameter p, the noise value is sampled from (cid:2) −1k 10 , 1k 10 given range for parameter p. By increasing the range of the random noise, the optimizer could be forced to move further away from the starting parameters, but the described noise range is used for the results presented herein. 3.3.2 Run-time and Training Performance Evaluation In JAX-ReaxFF, as mentioned above, two different gradient-based optimization algo- rithms are available, L-BFGS-B and SLSQP. For both L-BFGS-B and SLSQP, the maximum number iterations is set to 100 which is for the step 3 of Algorithm 3.1. For L-BFGS-B, the maximum number of iterations for the line search is set to 20 and the maximum number of variable matrix corrections to approximate the Hessian matrix is set to 20. For the rest of the control parameters, the default values from the SciPy library are used. The iteration count for the main optimization loop of Algorithm 3.1 is set to 20 where the local error minimization and the geometry optimization steps are iteratively repeated this many times. Therefore, for all JAX-ReaxFF experiments, the true error calculation with geometry opti- mization is done 20 times as the local error minimization only uses single step calculations. For experiments reported here, the percentage of the noise used to help escaping the local minima (line 11, Algorithm 3.1) is set to be 0.01%. Increasing the noise can potentially improve the results, but this typically results in higher training times. JAX-ReaxFF was 47 compiled to run in single precision for all experiments except for the disulfide case due to the numerical issues described later. Each training experiment is repeated 10 times and the lowest and median training errors are reported. 3.3.2.1 Cobalt Force Field Optimization Cobalt test case has only energy-based training items targeting the bulk cobalt properties. About 90% of these items require energy minimization, yet the training error does not fluctuate as shown in Fig. B.1. This shows that the surrogate error is close to the true error for this data set. Otherwise, the error would fluctuate between iterations since the surrogate error is used for the error minimization in each iteration. For some of the random runs, SLSQP does not show any progress initially. One possible explanation is that when the initial parameters are from a non-smooth part of the optimization space that cause high gradients, the optimizer fails to escape (Fig. B.1b). Small noise that is added when a stall in progress is detected stimulates progress as expected. In Section 3.3.2.1, we compare the convergence of JAX-ReaxFF against the black box approaches for the cobalt test case. For optimizations with JAX-ReaxFF, we observe that the SLSQP method almost always outperforms L-BFGS-B in terms of both lowest and median errors; this is also true for the silica and disulfide test cases, too. Hence, we compare JAX-ReaxFF to other techniques based on the SLSQP results. Among black box methods, the CMA-ES method gives the best force fields in terms of the lowest errors, but the GA method is the best one when it comes to median errors. Compared to CMA-ES, we observe significantly faster convergence for JAX-ReaxFF in terms of both the number of evaluations required (≈600 vs. 45,000) as well as the time taken (≈25 vs. 880 minutes), while obtaining similar or better force fields as measured by the lowest error criterion. Compared to the GA method, SLSQP is even more efficient computationally (≈25 vs. 3913 minutes), while yielding significantly better results in terms of the median error criterion except for the random initial guess case. These comparisons are CPU times only, but as mentioned above, one advantage of JAX-ReaxFF is that it can be ported to GPUs simply by choosing GPUs 48 Method Initial Guess Lowest Error Median Error L-BFGS-B SLSQP Genetic Algorithm CMA-ES MCFF rand edu lit rand edu lit rand edu lit rand edu lit rand edu lit 1368 1352 1366 1191 1168 1187 1346 1349 1345 1150 1159 1180 1422 1532 1360 2334 1499 1446 2253 1188 1189 1645 1424 1483 1894 1491 2320 2104 2092 1405 Avg. # Single Step Eval. 480 418 450 513 618 637 - - - # True Eval. Avg. CPU Exec. Time (min) Avg. GPU Exec. Time (min) 20 20 23.5 24.8 200k 3913 45k 880 45k 880 1.2 1.3 - - - Table 3.3 Cobalt training results. as the compilation target. On the GPU, JAX-ReaxFF’s execution times decrease by a factor of 20, making it even more advantageous compared to the CPU-only black-box codes. 3.3.2.2 Silica Force Field Optimization The silica training set includes energy-based, charge-based, and geometry-based items to describe the behavior of amorphous and crystalline silica, as well as its reactions. 73% of the items in this training set require energy minimization. As shown in Fig. S2 in SI, unlike the cobalt case, the error fluctuates more between iterations, possibly due to the presence of geometry matching items in the training set, the relatively unstable nature of energy minimization during parameter optimization and the fact that the number of parameters to be optimized are significantly higher in this case than the cobalt case. Across all initial guess schemes and error criteria, the best performing black box method for the silica test case is the GA method. Although the surrogate model based on single point evaluation for JAX-ReaxFF ignores the simulation cell optimization items, it is still able to minimize the training errors comparable to those of the GA method (≈3900 vs. ≈3650 for the lowest error and ≈4500 vs. ≈3750 for median error). The execution times for JAX-ReaxFF (32 49 Method Initial Guess Lowest Error Median Error L-BFGS-B SLSQP Genetic Algorithm CMA-ES MCFF rand edu lit rand edu lit rand edu lit rand edu lit rand edu lit 3901 4143 4315 3870 3977 3857 3577 3705 3593 3739 3747 3793 5059 5632 4885 5214 4467 5068 4498 4540 4534 3738 3817 3721 4753 4122 4298 6584 7127 6126 Avg. # Single Step Eval. 1865 1385 1929 2962 2839 2938 - - - # True Eval. Avg. CPU Exec. Time (min) Avg. GPU Exec. Time (min) 20 20 25.0 31.9 200k 1632 45k 367 45k 367 1.6 2.0 - - - Table 3.4 Silica training results. mins on the CPU and 2 mins on the GPU) are significantly shorter than that of the GA method (1632 mins). The CMA-ES method which gives force fields with errors similar to JAX-ReaxFF’s is still significantly slower (367 mins) than JAX-ReaxFF. 3.3.2.3 Disulfide Force Field Optimization The disulfide training data is significantly different from the previous ones as it mainly relies on force matching for model optimization. In JAX-ReaxFF, forces are calculated by taking the derivative of the potential energy expressions with respect to atom positions Fx = ∂Ep ∂x ∂(Fx − Ft)2 ∂p = ∂( ∂Ep ∂x − Ft)2 ∂p (3.7) where Fx is the 3-dimensional force vector for atom x, Ft is the target force vector from the training data set and p is the model parameter to be optimized. The term ∂(Ft−Fx)2 gives ∂p the gradients for the force matching items in the objective function. We observed that when the optimization is performed in single precision as we did with the previous two cases, the gradients of the force-based items with respect to the parameters from Eq. (3.2) result in 50 Method Initial Guess Lowest Error Median Error L-BFGS-B* SLSQP* Genetic Algorithm CMA-ES MCFF rand edu lit rand edu lit rand edu lit rand edu lit rand edu lit 6513 8162 6943 4744 4774 4970 19285 18054 18524 8052 8727 9284 8507 9608 10605 12906 9649 7578 43712 4913 5836 20384 20150 21206 11371 11105 11120 11893 13393 13625 Avg. # Single Step Eval. 2017 1942 1837 1059 3523 4270 - - - # True Eval. Avg. CPU Exec. Time (min) Avg. GPU Exec. Time (min) 20 20 13.5 22.2 340k 878 45k 116 45k 116 1.2 2.0 - - - Table 3.5 Disulfide training results (using double precision). extremely high values, ∼ 1017. These high gradients prevent the local optimizers from doing any progress as seen in Fig. B.3. Bond order parameters form the core of the dynamic bond concept in ReaxFF and therefore affect all types of bonded interactions. Calculation of the numerically sensitive bond order functionals [28] and the second order gradients necessary for force-matching (Eq. (3.7)) in single precision is likely the culprit for this problem. This issue can be remedied easily by performing the disulfide training in double precision at the expense of roughly doubling the execution time. As shown in Fig. B.4 and Section 3.3.2.3, this significantly improves the results. JAX- ReaxFF can attain training errors much lower than the black-box methods (4744 vs. 8052 for lowest errors and 4913 vs. 11105 for median errors) in much shorter time (22 vs. 116 minutes on the CPU) despite the performance hit from using double precision. On the GPU, the execution time of JAX-ReaxFF with double precision again takes only a couple minutes. This situation shows that gradient-based optimization is prone to numerical failures when higher order gradients are needed, but the use of double precision provides an easy remedy and the resulting execution time penalty does not represent any problem in terms of the total times taken. 51 3.3.3 Discussion of the Results These results show that the training errors of the force fields optimized using JAX- ReaxFF are on par with or better than those from the literature [94], while the training time decreases by one to three orders of magnitude. The relatively good median training errors obtained from multiple runs also indicate that JAX-ReaxFF could produce various force fields with comparable performance. The main benefit is that the overall training times are significantly lower which enables researchers to quickly iterate over ideas, try different weighting of the training items, or use different data for their indented tasks. The training error of optimized parameters can be seen as proxies, but the quality of the resulting force field parameter sets ultimately need to be validated through actual MD simulations and comparisons against experimental and/or QM data. The next section focuses on validating the quality of the force fields trained using JAX-ReaxFF. 3.4 Force Field Validation MD simulations in this work are performed using the Large-scale Atomic/Molecular Mas- sively Parallel Simulator (LAMMPS), a molecular dynamics program from Sandia National Laboratories [103]. A relatively short time step of 0.5 fs was used in all simulations. This is the recommended setting for ReaxFF simulations of systems that do not include light atoms like hydrogen. All N V T ensemble (constant number of atoms, volume and temperature) simulations were performed using the Nose-Hoover thermostat to control the temperature with a temperature damping parameter of 100 fs which determines how rapidly the temper- ature is relaxed. All N P T ensemble (constant number of atoms, pressure and temperature) simulations were performed using the Nose-Hoover thermostat to control the temperature with a temperature damping parameter of 100 fs and the Nose-Hoover barostat to control the pressure with a temperature damping parameter of 1000 fs. 3.4.1 MD Simulations of Pure Cobalt We investigated the crystal lattice constant correlation with cohesive energy in crystals of fcc cobalt for validation. The lattice constant was changed from 3 ˚A to 5 ˚A and the 52 Figure 3.2 Variations in pure cobalt single fcc crystal cohesive energy by variations of the lattice constant. associated lattice cohesive energies were recorded (Fig. 3.2). Simulation results from the force field with the lowest error fitted using JAX-ReaxFF (see Section 3.3.2.1) were compared to two previously published ReaxFF force fields for cobalt [54, 94] and embedded atom method (EAM) force field [75]. To validate the quality of the force field in capturing the dynamics behavior, the annealing loop was generated for a pure cobalt crystal structure and was compared to the available force fields. A cubic simulation box of 5 x 5 x 5 ideal fcc cobalt unit cells was generated for annealing simulations using the N P T ensemble between 1000 K and 3000 K. After the N P T equilibration of the pure cobalt crystal at 1000 K, the system was subjected to N P T ensemble annealing between 1000 K and 3000 K by 10 K/ps heating and cooling rate. A time step of 0.5 fs was used for the simulations. The changes in the system energy during this annealing loop is shown in Fig. 3.3. Three ReaxFF force fields showed similar dynamic evolution behavior for the pure cobalt structure while the EAM force field showed a different dynamic evolution (Fig. 3.3). After completion of the annealing loop, structural evaluation showed that using the ReaxFF force fields resulted a considerable recrystallization in the pure cobalt structure, while recrystallization was not observed when EAM force field was utilized (Fig. 3.4). These results demonstrate that Reax 53 3.003.253.503.754.004.254.504.755.00Lattice constant100908070605040Ecoh (Kcal/mol)EAM2010 FF2019 FFNew FF Figure 3.3 Annealing loop of a 5 x 5 x 5 fcc cobalt crystal including 500 atoms using the newly fitted ReaxFF force field (left) and the EAM force field (right) with heating and cooling rate of 10 K/ps. Figure 3.4 Final configurations of pure Co fcc crystals after the annealing loop in the 1000 K and 3000 K temperature range. force fields yield more physically meaningful simulations, and the fact that the newly fitted force field can produce simulation results similar to those reported in the literature provides validation for the JAX-ReaxFF based optimization method. 3.4.2 MD Simulations of Amorphous Silica To evaluate the quality of the silica force field optimized using JAX-ReaxFF, the amor- phous silica structure introduced in Fogarty et al. [23] was reconstructed. The amorphous silica system included 2000 SiO2 molecules with an initial density of 2.2 g/cm3 (Fig. 3.5); it was energy-minimized first to eliminate initial bad contacts. The system was then annealed twice between 300 K and 4000 K. The first annealing loop was performed using N V T en- semble with heating and cooling rate of 37 K/ps. The second annealing loop was performed 54 1000 1500 2000 2500 3000 Temperature (K) -3.5 -3.6 -3.7 -3.8 -3.9 -4.0 -4.1 -4.2 -4.3 -4.4Energy per atom (eV) Heating CoolingTemperature (K) -3.75 -3.80 -3.85 -3.90 -3.95 -4.00 -4.05 -4.10 -4.15 -4.20 -4.25 -4.301000 1500 2000 2500 3000 Energy per atom (eV) Heating Cooling Figure 3.5 The amorphous silica structure including 2000 SiO2 molecules and total of 6000 atoms. Silicon atoms are shown with yellow color and Oxygen atoms are shown with red color. in N P T ensemble between 300 K and 4000 K using a Nose–Hoover thermostat and barostat at 1.01325 bar pressure. Similar to the N V T annealing, the heating and cooling rate was 37 K/ps. In the production run, the silica system was equilibrated in N P T ensemble using T = 300 K and P = 1.01325 bar for an additional 200 ps. These calculations were performed us- ing the force field with the lowest training error from JAX-ReaxFF, as well as two previously published Reax force fields [23, 94]. The properties of the final configuration of these silica structures are compared in Table 3.6. While the new force field produced by JAX-ReaxFF results in a different parameter set than the two previously published force fields, it attains density and coordination numbers that are almost identical to the experimentally deter- mined values. The radial distribution functions of the final configuration of silica structure equilibrated using our fitted force field are shown in Fig. 3.6). The locations of the peaks and the overall shapes for Si-O, O-O, and Si-Si closely match the expected values too. These results provide further evidence that JAX-ReaxFF can produce high quality force fields. 3.4.3 MD Simulations of Sulfur Bond Containing Molecules To test the validity of the force fields containing Sulfur parameters optimized using JAX-ReaxFF, we performed potential energy scans for single molecules (dimethyl disul- fide (DMDS), dimethyl thioether (DMTE), hydrogen sulfide (H2S), and hydrogen disulfide 55 Property Density (g/cm3) Si coordination O coordination 2010 FF[23] 2.19 3.99 1.99 2019 FF[94] 2.31 3.94 1.97 New FF 2.23 3.97 1.99 Table 3.6 Silica properties calculated using three different force fields. The experimental value reported for silica density is 2.2 g/cm3 [66]. Figure 3.6 Radial distribution function of silicon-oxygen, oxygen-oxygen, and silicon-silicon for the silica structure at the end of annealing and equilibration. (H2S2) with different restraints. The restraints are applied on the S-S bond of DMDS, S-C bond of DMTE, H-S-H angle of H2S, and H-S-S-H torsion angle of H2S2. In Fig. 3.7, results from newly fitted force fields are compared against two previously published ReaxFF force fields [94, 67]. The force field “New FF2” has the lowest training error among all JAX-ReaxFF runs. While the force fields from the literature are able to nicely match the potential energy scans from CASPT2 ab initio data, New FF2 exhibits a discontinuity for the potential energy surface of H2S2, pointing to a potential overfitting problem. JAX-ReaxFF actually seems to be able to find a better force field parameter set for the provided training data (in terms of the error) as it exploits the gradient information. Besides the training data, the validation data is also provided for this task [67]. Therefore, we picked another force field, called “New FF1” from the parameter sets produced by JAX- ReaxFF. New FF1 has the lowest cumulative error on both the training and the validation data sets. As can be seen in the H2S2 plot, New FF1 fixes the overfitting issue, while being 56 02468g(r)0.02.55.07.510.012.515.017.5r(Å)O-OSi-OSi-Si Figure 3.7 Potential energy scans of molecules containing sulfur bonds with different re- straints, calculated using the newly trained and previously reported force fields. New FF1 has the lowest cumulative error and New FF2 has the lowest training error. able to closely match the CASPT2 data overall. 3.5 Concluding Remarks We presented a new software called JAX-ReaxFF that enables fast optimization of Reax force field parameters by leveraging recent advances in machine learning. JAX-ReaxFF uses several innovative techniques for high performance on architectures with GPUs. Clustering similar geometries together to maximize the SIMD parallelism while limiting the padding for alignment yields high parallelism, especially for single step evaluations. As it is described in Algorithm 3.1, by using single step energy evaluation-based approximations to the error function and gradient information about the search space, we are able to decrease the con- vergence time significantly with the help of GPU acceleration. We have empirically showed that even though the local optimizer is not fully aware of the geometry optimization, the overall algorithm converges with changes in parameter space becoming minimal as the algo- rithm progresses. Based on extensive experiments, we demonstrated that even if the starting 57 75100125H-S-H angle (deg)020H2S0100200300H-S-S-H angle (deg)01020H2S2234S-S distance (Å)050DMDS24S-C distance (Å)050DMTEPotential energy (Kcal/mol)2016 FF2019 FFNew FF 1New FF 2CASPT2 initial guess is a poor one, gradient-based local optimizers are able to improve the fitness of the force field drastically. Combined together, these innovations allow users to optimize Reax force fields in mere minutes. This is notable because existing methods require several days for obtaining essentially same quality force fields. Finally, JAX-ReaxFF provides a utility not available in other similar tools – its auto-diff functionality enables the study of the new functional forms for the interactions of the ReaxFF model without explicitly imple- menting the force calculations and the optimizer, since both forces and parameter gradients can automatically be calculated by JAX. 58 CHAPTER 4 END-TO-END DIFFERENTIABLE REACTIVE MOLECULAR DYNAMICS SIMULATIONS USING JAX The contents of this chapter first appeared as [51]. This work is reproduced with the permission of Springer Nature. While JAX-ReaxFF improves the speed of parameter optimization for ReaxFF, it is not suitable for running large scale MD simulations. JAX-ReaxFF precomputes the interaction lists to compute the potential energy and keeps them fixed throughout the parameter opti- mization. Also, since the molecules used for parameterization tend to have small number of atoms, the way the interaction lists are generated is not suitable for bigger molecules. In this section, we introduce a new implementation of ReaxFF using JAX that is suitable for both running large scale simulations and parameter optimization. We leverage the exist- ing JAX-MD library [84] to better support larger molecules without sacrificing performance. The new approach allows running end-to-end differentiable simulations. Moreover, it enables easy integration of ReaxFF with different potentials that already exist within JAX-MD. 4.1 Background and Motivation 4.1.1 Related Work To enable large-scale and long duration simulations, several ReaxFF implementations with different features and architectural support have been developed over the past couple decades. PuReMD has shared and distributed-memory versions for both CPUs and GPUs (CUDA-based), all of which are maintained separately [3, 2, 53], and several of these versions have been integrated into LAMMPS and AMBER [77]. More recently, to ensure hardware portability and simplify code maintenance and performance optimizations, a Kokkos-based implementation of ReaxFF has been developed in LAMMPS [106]. Kokkos is a performance portable programming model and allows the same codebase implemented using its primitives to be compiled for different backends. The current ReaxFF/Kokkos software also supports distributed-memory parallelism. In addition to the above open-source software, SCM pro- 59 vides a commercial software that includes ReaxFF support [81]. As mentioned in Chapter 1, a new class of force fields called machine learning potentials (MLP) such as SNAP [104], the Behler/Parrinello potential [9], SchNet [86], OrbNet [76], and NequIP [8] has emerged. ML libraries such as Tensorflow [1], PyTorch [70], and JAX [11] have gained widespread adoption not only for ML approaches but as a general purpose programming model even when using conventional techniques. This can be attributed to the convenience of advanced tools developed around these programming models and libraries such as auto-differentiation, auto-vectorization, and just-in-time compilation. Such tools have enabled fast prototyping of new ideas as well as hardware portability without sacrificing much computational efficiency. Intelligent-ReaxFF [39] and JAX-ReaxFF [50] implementations both leverage modern machine learning frameworks. However, they are both primarily designed for force field fitting, and as such they are designed to work with molecular systems typically containing tens of atoms, and they cannot scale beyond systems with more than a couple hundred atoms. More importantly, they both lack molecular dynamics capabilities. 4.1.2 JAX and JAX-MD Overview Since the new ReaxFF implementation is developed in JAX-MD, important design and implementation decisions were based on how JAX and JAX-MD work. As such, we first briefly describe these frameworks. JAX [11] is a machine learning framework for transforming numerical functions. It implements the Numpy API using its own primitives and provides high order transformation functions for any Python function written using JAX primitives. The most notable of these transformation functions are automatic differentiation (grad ), vectorization on a single device to leverage SIMD parallelism (vmap), parallelization across multiple devices (pmap), and just-in-time compilation (JIT ). These transformations can be composed together to enable more complex ones. JAX uses XLA, a domain specific compiler for linear algebra, under the hood to achieve hardware portability. This allows any Python code written in terms of 60 JAX primitives to be seamlessly compiled for CPUs, GPUs, or TPUs. Since XLA is also used extensively to accelerate Tensorflow models, XLA is supported for almost all modern processors, including GPUs by Nvidia and AMD. With JIT, XLA could apply performance optimizations targeted specifically for the selected device. The main limitation of JAX is that it expects the input data to the transformed functions to have fixed sizes. This allows XLA to adopt more aggressive performance optimizations during compilation, but when the size of the input data changes, the code needs to be recompiled. JAX-MD [84] is an MD package built in Python using JAX. It is designed for performing differentiable physics simulations with a focus on MD. It supports periodic and non-periodic simulation environments. JAX-MD employs a scalable 3D grid-cell binning based algorithm to construct the neighbor list for atoms in a given system. It includes integrators for vari- ous kinds of ensembles as well as Fast Inertial Relaxation Engine (FIRE) Descent [10] and Gradient Descent based energy minimizers. Various machine learning potentials such as the Behler-Perrinello architecture [9] and graph neural networks including the Neural Equivari- ant Interatomic Potentials (NequIP) [8], based on the GraphNet library [7], are also readily available. When combined with the capabilities of JAX, this rich ecosystem enables re- searchers to easily develop and train hybrid approaches for various chemistry and physics applications. 4.1.3 Our Contribution The aforementioned features of ML cyber-infrastructure are highly attractive from the perspective of MD software, considering the fact that existing force field implementations are mostly written in low-level languages and tuned to the target hardware for high performance. As such, we introduce a portable, performant, and easy-to-maintain ReaxFF implementation in Python built on top of JAX-MD [84]. This new implementation of ReaxFF is • easy-to-maintain because it only requires expressing the functional form of the po- tential energy for different atomic interactions in Python. MD simulations require calculation of forces which are calculated by taking the gradient of the potential en- 61 ergy with respect to atom positions at each time step. This can simply be accomplished with a call to the grad( ) function in JAX, • hardware portable because for its functional transformations, JAX uses XLA (Ac- celerated Linear Algebra) [82], which is a domain specific compiler for vector and matrix operations. Since XLA has high performance implementations across different CPUs (x86 64 and ARM) as well as GPUs (Nvidia and AMD), porting our ReaxFF implementation does not require any additional coding, • performant because we ensure that our underlying ReaxFF interaction lists are suit- able for vectorization, and we leverage just-in-time compilation effectively through a carefully designed update/reallocation scheme, • versatile because we designed our implementation such that the same interaction kernels can be re-used in either a single high performance run (needed for long MD simulations) or multiple small single-step runs (needed for parameter optimization) settings. This allows our implementation to be suitable for force field training as well. Also, it simplifies the study of new functional forms for various interactions in the ReaxFF model. 4.2 Design and Implementation In this section, we describe the overall design considerations and present the final design for our ReaxFF implementation in JAX-MD. To simplify the design and ensure modularity, generation of the interaction lists have been separated from the computation of partial energy terms. For overall efficiency and scalability, special consideration has been given to memory management. 4.2.1 Memory Management To avoid frequent re-compilations, sizes of input to JAX’s transforming functions must be known and fixed. As such, we separate the logic for handling the interaction list generation 62 Algorithm 4.1 General structure of computations in an MD simulation. 1: interLists ← Create the interaction lists using the allocate function 2: for timestep = 1, 2, . . . do Calculate forces 3: Update positions using the calculated forces 4: overflow ← Update the interaction lists 5: if overflow then 6: 7: 8: 9: end for interLists ← Reallocate based on the most recent utilizations end if into allocate and update parts. The allocate function estimates the sizes of all interaction lists (see Fig. 4.1) and allocates the needed memory with some buffer space (default 20%). Due to its dynamic nature, JAX transformations such as vmap and jit cannot be applied to the allocate function. The update function works with the already-allocated interaction lists, and fills them based on atom positions while preserving their sizes. Since the update function works on arrays with static sizes, JAX transformations such as vmap and jit can be and are applied to this function. For effective use of vmap, the update function also applies padding when necessary. Finally, while filling in the interaction lists, it also checks whether the utilization of the space allocated for each list falls below a threshold mark (default 50%) where the utilization is the ratio of the true size to the total size. If it does, a call to the allocate function is triggered to shrink the interaction lists as shown in Algorithm 4.1, which in turn causes JAX to recompile the rest of the code since array dimensions change. Another important aspect of our memory management scheme is the filtering of inter- action lists. In ReaxFF, while bonds are calculated dynamically, not all bonds are strong enough to be chemically meaningful, and therefore they are ignored (a typical bond strength threshold is 0.01). This has ramifications for higher-order interactions such as 3-body, 4- body, and H-bond interactions as well because they are built on top of the dynamically generated bond lists. As we discuss in more detail below, the acceptance criteria for each interaction is different. For 3-body and 4-body interactions, acceptance criteria depend on the strength of bonds among the involved atoms as well as force field parameters specific 63 Figure 4.1 Flow graph describing the generation of the interaction lists. to that group of atoms; for H-bonds, it is a combination of acceptor-donor atom types and bond strengths. However, the steps for filtering all interaction lists are similar and can be implemented as a generic routine with a candidate interaction list and an interaction-specific acceptance criterion. The interactions that require filtering and their relevant input data are shown as yellow nodes in Fig. 4.1. First, the candidate interaction list is populated. Then, candidates get masked based on the predefined acceptance criterion. Finally, the candi- date list is pruned and passed onto its corresponding potential energy computation function. While actually pruning the candidate list might be seen as an overhead, we note that the number of unaccepted 3-body and 4-body interactions are so high that simply ignoring them 64 Non-bondedinteractions(up to 10-12 Å) Bondedinteractions(up to 5 Å) 3-body interaction list generation with filtering4-body interaction list generation with filteringNon-bonded interaction filtering (based on atom type and distance)Bonded interaction filtering (based on atom type and BO)H-bond interaction list generation with filteringReaxFF PotentialAtom PositionsBox Info.Bonded interaction filtering(Based on BO): Grid based neighbor list generation: Filtration based interaction list generation: InputFiltered bonded interactionsFiltered bonded and non-bonded interactions during the potential energy computations introduce a significant computational overhead. Also, the memory required to keep the unfiltered 3-body and 4-body interaction lists would limit the scalability of our implementation for GPUs due to their limited memory resources. The filtering logic discussed above is JAX-friendly because the shapes of the intermediate (candidate) and final (pruned) data structures are fixed. As such, vmap and jit transfor- mations can be applied to the filtering procedure, too. As with un-pruned lists, filtered interaction list generation also keeps track of utilization of the relevant lists and sets the overflow flag, when necessary. 4.2.2 Generation of Interaction Lists Figure 4.2 Illustration of grid-cell neighbor search used to generate neighbor lists. Pair-wise Bonded Interactions: In ReaxFF, bond order (BO) between atom pairs are at the heart of all bonded potential energy computations. The BOs are computed in two steps. First, uncorrected BOs are computed according to Eq. (4.1), where rij is the distance between the atom pair i-j, and rσ o , rπ o , and rππ o are the ideal bond lengths for σ-σ, σ-π and 65 Rcutoff}}Rcutoff π-π bonds, respectively. BO′ ij = BOσ ij + BOπ (cid:20) pbo1 · = exp (cid:20) + exp pbo5 · ij + BOππ ij (cid:19)pbo2 (cid:21) (cid:18) rij rσ o (cid:18) rij rππ o (cid:19)pbo6 (cid:21) + exp (cid:19)pbo4 (cid:21) (cid:20) pbo3 · (cid:18) rij rπ o (4.1) After uncorrected bond orders are computed, the strength of BO′ ij is corrected based on the local neighborhood of atoms i and j. The corrected BO (BOij) represents the coordina- tion number (i.e., number of bonds) between two atoms. Corrected bonds below a certain threshold get discarded as they do not correspond to chemical bonds. Hence, they do not contribute to the total energy. To calculate uncorrected BO, for each atom in a given system, their neighbors are found using a grid-cell binning based neighbor search algorithm (Fig. 4.2). This allows us to generate the bonded neighbor lists in O(N k) where N is the number of atoms and k is the average number neighbors per atom. The side length of the grid cell is set to 5.5 ˚A, as a buffer space of 0.5 ˚A is added to the 5 ˚A actual bonded interaction cutoff to avoid frequent updates to the neighbors list. Since the cell size is almost the same as the bonded interaction cutoff, neighbor search only requires checking the nearby 33 grid cells. Neighbor information is stored in a 2D format where the neighbors of atom i are located on ith row with padding and alignment, as necessary. This format which is very similar to the ELLPACK format [110] is highly amenable for vectorization and memory coalescing on modern GPUs. It also simplifies bond order corrections because the neighbor indices for a given atom are stored consecutively. As will be discussed later, it also helps creating 3-body (for valency) and 4-body (for torsion) interactions since they use BOs as the main input. After creating the 2D neighbor array, BO terms are calculated and pairs with small BOs are filtered out as described above. Higher Order Bonded Interactions: After pruning the bonded interactions, 3-body and 4-body interaction lists are generated (Fig. 4.3). For each atom, every two neighbor 66 Figure 4.3 Atoms and their interactions involved in formation of the 3-body and 4-body interactions. pairs are selected to form the candidate list for 3-body interactions. In a system with N atoms and k neighbors per atom, there will be O (N k2) candidates. Then the candidates are masked and filtered based the involved BO terms to form the final array with shape M × 3 where M is the total number of interactions and columns are atom indices. After that, the finalized 3-body interaction list is used to generate the candidates for the 4-body interactions. For each 3 body interaction i-j-k, neighbors of both j and k are explored to form the 4-body candidate list and then the candidates get filtered based on the 4-body specific mask. When the molecule involves hydrogen bonds, the hydrogen interaction list is built using the filtered bonded and non-bonded interactions. A hydrogen bond can only be present if there are hydrogen donors and acceptors. While the acceptor and the hydrogen are cova- lently bonded (short range), the acceptor bonds to the hydrogen through a dipole-dipole interaction, therefore it is long ranged (up to 7.5 ˚A). Hence, to find all possible hydrogen bonds involving a given hydrogen atom, both its bonded neighbors and non-bonded neigh- bors are scanned. Using the appropriate masking criterion, the final interaction list is formed to be used for potential energy calculations. Figure 4.4 Atoms and their interactions involved in formation of hydrogen bonds. 67 ℓijkAtom iFiltered bonded interactionsjkAtom j (or k)ijkℓ3-body interaction4-body interactionAtom iFiltered bonded interactionsFiltered nonbonded interactionsH (ith atom)DonorAcceptor Non-Bonded Interactions: In ReaxFF, non-bonded interactions are effective up to 10– 12 ˚A , and they are smoothly tapered down to 0 beyond the cutoff. Similar to the pair-wise bonded interactions, the long range neighbor lists are also built using the grid-cell binning approach, this time using a buffer distance of 1 ˚A to avoid frequent neighbor updates. The neighbors are again stored in a 2D array similar to the ELLPACK format. This simplifies accessing the long range neighbors of a given atom while building the Hydrogen bond in- teractions list (as shown in Fig. 4.4). Also, the sparse matrix-vector multiplication kernel (SpMV) required for the dynamic charge calculation becomes simpler and more suitable for GPUs [109]. The non-bonded interaction list is used to compute van der Waals and Coulomb energy terms. While EvdWaals computation is relatively simple as it only involves the summation of the pair-wise interaction energies, ECoulomb requires charges to be dynamically computed based on a suitable charge model such as the charge equilibration (QEq) [80], electronega- tivity equalization (EE) [65], or atom-condensed Kohn-Sham density functional theory ap- proximated to second order (ACKS2) method [111]. Our current JAX-based implementation relies on the EE method. The EE method involves assigning partial charges to individual atoms while satisfying constraints for both the net system charge and the equalized atom electronegativities. The details of the charge equilibration are provided in Section 2.2.2. Since the size of the linear system is (n + 1) × (n + 1), it is prohibitively expensive to solve it with direct methods when n is becomes large (beyond a few hundred). Hence, we employ an iterative sparse linear solver as the matrix is sparse for large systems due the 10-12 ˚A cutoff for the non- bonded interactions. The iterative solvers available in JAX only expect a linear operator as a function pointer that can perform the matrix-vector multiplication. This allows us to define the SpMV operation directly using the non-bonded neighbor lists provided in an ELLPACK- like format described earlier without applying any transformations. Another optimization to accelerate the charge equilibration is to use initial guesses to warm start the iterative solver. 68 Algorithm 4.2 Gradient-based parameter optimization. 1: θ ← Initialize the model parameters 2: training set ← Align the training set by padding with dummy atoms 3: lossFunction ← Create a loss function by utilizing vmap(energyFunction) 4: calculateGradients ← jit(grad(lossFunction)) 5: while stopping criterion not met do 6: 7: 8: 9: 10: end while Xi, Yi ← Sample a minibatch of data from the training set Create the interaction lists for Xi g ← calculateGradients(θ, interLists, Yi) θ ← Update the model parameters using g Since the charges fluctuate smoothly as the simulation progresses, we use the cubic spline extrapolation to produce the initial guesses based on past history [2]. 4.2.3 Force Field Training Predictive capabilities of empirical force fields are arguably more important than their performance. For this, it is crucial for force field parameters to be optimized using high- fidelity quantum mechanical training data. In contrast to MD simulations involving a single system iterated over long durations, this optimization process typically involves executing several (on the order of hundreds to thousands, depending on the model and target systems) small molecular systems for a single step using different parameter sets in a high-throughput fashion. While evolutionary algorithms have traditionally been used for Reax force field optimizations, as JAX-ReaxFF [50] and Intelligent-ReaxFF [39] have recently demonstrated, using gradient-based optimization techniques can accelerate the training process by two to three orders of magnitude. However, the gradient information needed for force field optimization is much more complex than that of MD simulation – one needs to calculate the derivative of the fitness function which is typically formulated as a weighted sum of the difference between predicted and reference quantities over all systems in the training dataset with respect to parameters to be optimized (which is usually on the order of tens of parameters for ReaxFF). While this would be a formidable task using analytical or numerical techniques, the auto-differentiation capabilities of JAX enable us to easily repurpose the 69 above described ReaxFF MD implementation for parameter optimization. By composing different transformations, a simple loss function defined for a single sample can extended to work for a batch of training data as shown in Algorithm 4.2. To fully take advantage of SIMD parallelism, especially on GPUs, we ensure that different molecules in the training dataset are properly divided into small batches. To reduce the number of dummy atoms and the amount of padding within each batch, the training set could be clustered based on how much computation they require. Given the allocate/update mechanism described in Section 4.2.1, the different sizes of interaction lists for different molecular systems in a batch data does not cause additional challenges. 4.3 Experimental Results 4.3.1 Software and Hardware Setup To verify the accuracy of the presented JAX-based ReaxFF implementation, simulations were performed using molecular systems shown in Table 4.1. The Kokkos-based LAMMPS implementation of ReaxFF was chosen for validation and benchmarking comparisons due to its maturity and maintenance. For this purpose, we used the most recent stable release of LAMMPS (git tag stable 23Jun2022 update3), and experimented on both Nvidia and AMD GPUs. LAMMPS was built using GCC v10.3.0, OpenMPI v4.1.1, and CUDA v11.4.2 for the Nvidia GPUs, and with ROCm v5.3.0, aomp v16.0, and OpenMPI v4.1.4 for the AMD GPUs (using device-specific compiler optimization flags for both). For the JAX ex- periments, Python v3.8, JAX v0.4.1, and JAX-MD v0.2.24 were paired with CUDA v11.4.2 for the Nvidia GPUs, and ROCm v5.3.0 for the AMD GPUs. Hardware details are presented in Table 4.2. The compute nodes at the Michigan State University High-Performance Com- puting Center (MSU-HPCC) and the AMD Cloud Platform are used for the experiments. 70 Name Chem. Rep. N Sim. Box (˚A) Force Field Water H2O SiO2 Silica 2400 6000 29.0 × 28.9 × 29.3 36.9 × 50.7 × 52.5 [23] [23] Table 4.1 Molecular systems used in the performance evaluation section, with the third column (N ) indicating the number of atoms, the fourth one denoting the dimensions of the rectangular simulation box, and the last column showing the force field used to simulate the system. GPU CPU Cluster Intel Xeon 8358 (64 cores) MSU-HPCC Intel Xeon Platinum 8260 (48 cores) MSU-HPCC A100 V100 MI210 AMD EPYC 7742 (64 cores) MI100 AMD EPYC 7742 (64 cores) AMD Cloud Platform AMD Cloud Platform Table 4.2 Hardware details of the platforms used for performance experiments. 4.3.2 Validation of MD Capabilities Fig. 4.5 shows that the JAX-based ReaxFF energies almost perfectly match those from LAMMPS in actual MD simulations. The deviation only becomes visible after 2000 MD steps which is inevitable due to machine precision limitations. The relative energy difference is around 10−7 for both the water and silica systems. 4.3.2.1 Performance and Scalability We compare the performance of JAX-based ReaxFF to Kokkos/ReaxFF package in LAMMPS on both Nvidia and AMD GPUs. While Kokkos/ReaxFF supports MPI par- allelism, we use a single GPU for all tests. Kokkos/ReaxFF incurs minimal communication overheads when there is a single MPI process. The performance comparison on AMD GPUs is possible through Kokkos’ ROCm backend support, as well as the availability of JAX/XLA on AMD GPUs. To create systems with varying size, the molecular systems shown in Table 4.1 have been periodically replicated along the x, y, and z dimensions. The number of atoms vary from 2400 to 19200 for the water systems and from 6000 to 24000 for the silica systems. 71 Figure 4.5 Comparison of absolute (top plots) and relative difference (bottom plots) in potential energies for NVE simulations with a time step of 0.2 fs and a CG solver with 1e-6 tolerance for the charge calculation. For each experiment, NVE simulations with a time step of 0.2 fs were run for 5000 steps, and the average time per step in ms was reported. For both the Kokkos and JAX-based implementations, the buffer distance for the non-bonded interactions was set to 1 ˚A. While reneighboring is done every 25 MD steps for Kokkos, the JAX implementation keeps track of how much atoms move since the last neighborhood update and only reneighbors when atoms move more than the buffer distance. As suggested by the Kokkos documentation, the half-neighbor list option is used. While written in Python using JAX primitives, the proposed implementation is faster when the system size is small on all GPUs. As the number of atoms increases, while the time increases linearly for the JAX implementation, the Kokkos one increases sublinearly. The sublinear scaling for Kokkos indicates that it cannot fully utilize the resources when the problem size is small unlike JAX. As the problem size increases, Kokkos starts to utilize the GPU better and yield better performance. The Kokkos implementation achieves up to 3.2x speedup for the largest water systems on AMD GPUs (MI100 and MI210). On Nvidia GPUs 72 208300208200208100kcal/molWater box, 2400 atomsKOKKOSJAX402008.873e5Silica, 6000 atomsKOKKOSJAX010002000300040005000MD step0.20.10.00.10.2kcal/molKOKKOS - JAX010002000300040005000MD step0.20.4KOKKOS - JAXPotential Energy in NVE Simulation (dt = 0.2 fs) (V100 and A100), it is around 2.3x faster for the same water system with 19200 atoms. For the silica systems where there are no hydrogen bonds, Kokkos is around 2x faster on the AMD GPUs and 1.5x on the Nvidia GPUs. On the other hand, when the problem size is small, JAX achieves up to 1.8x speedup on an A100 GPU. Figure 4.6 Average time per MD step (in ms) for the water systems with varying sizes. 4.3.3 Training To demonstrate the training performance of the described implementation, we trained the ReaxFF parameters on the public QM9 dataset of about 134k relaxed organic molecules made up of H, C, N, O, and F atoms, with each molecule containing up to nine non- hydrogen atoms [79]. All systems are calculated at the B3LYP/6-31G(2df,p) level of theory. To simplify the dataset, we removed the molecules that contain F atoms which resulted in around 130k molecules. During optimization, 80% of the data is used for training and the remaining 20% for testing. The training is done using the AdamW optimizer [60] from the Optax library [5] with a batch size of 512 and the learning rate is set to 0.001. The ReaxFF model is typically fit to the training data containing relative energy differ- ences between molecules with the same type of atoms (different conformations and configu- 73 A10001020Time in msV1000204024004800960019200NMI21002550Time in ms24004800960019200NMI10002550Water SystemsJAXKOKKOS Figure 4.7 Average time per MD step (in ms) for the silica systems with varying sizes. rations) and the energies of the individual atoms get canceled out. Since the QM9 dataset only contains the absolute energies, we added a new term to the ReaxFF potential to remedy the energy shifts caused by the self-energies of the individual atoms. Esystem = EReaxFF + Eself-energy Eself-energy = N (cid:88) i=1 si (4.2) In Eq. (4.2), EReaxFF is the original ReaxFF potential designed to capture the interaction related terms and Eself-energy is the newly added parameterized self-energy term to capture the energy shifts, and si is the self energy of atom i solely determined by the atom type. Hence, the new term only contains 4 parameters as there are 4 atom types in the modified QM9 dataset. In total, around 1100 ReaxFF parameters are optimized during the training. The training is performed on an A100 GPU with each epoch taking approximately 8 seconds. Fig. 4.8 shows the mean absolute error (MAE) per epoch. Since the ReaxFF model has a relatively small number of parameters compared to most modern ML methods, the training and test MAE perfectly overlap throughout the training. The final MAE of the model on the test data is 3.6 kcal/mol. While this is higher than the ideal target of 1 kcal/mol error, 74 A100010Time in msV10002060001200024000NMI21002040Time in ms60001200024000NMI10002040Silica SystemsJAXKOKKOS Figure 4.8 Training progress of the ReaxFF model on the QM9 dataset, with the final MAE on the test data being 3.6 kcal/mol. we note that this is a straight optimization without any fine-tuning to demonstrate the capabilities of the new ReaxFF implementation. 4.4 Concluding Remarks With the accelerator landscape changing rapidly and becoming more complex, cross plat- form compilers gain more importance as they enable the same codebase to be used on different architectures. By leveraging modern machine learning cyber-infrastructure, we developed a new JAX-based ReaxFF implementation that is easy-to-maintain, hardware portable, per- formant, and versatile. Using auto-differentiation, forces in MD simulations are computed directly from energy functions implemented in Python without requiring any extra coding. It also allows the same code to be used for both MD simulations and parameter optimiza- tion which are both essential to study any system of interest with ReaxFF. While Kokkos is an another cross-plotform solution, it lacks auto-differentiation and batching optimization capabilities. Although it is more performant for bigger molecules, the JAX implementation is faster for small ones while also providing new functionalities. 75 020406080100Epoch101102MAETraining ReaxFF on QM9 DatasetTraining MAETest MAE CHAPTER 5 UNCERTAINTY QUANTIFICATION METHODS FOR MACHINE LEARNING POTENTIALS As mentioned previously, machine learning potentials could achieve a high level of ac- curacy when trained on comprehensive datasets. Quantum mechanical (QM) methods can be employed to compute target observables such as energies, charges, or forces for specific molecules to form a dataset; however, sampling chemically relevant molecules from the vast chemical space remains a formidable challenge. When training a model to study a single chemistry of interest, the relevant data could be collected by a domain expert. Nevertheless, constructing a dataset for a more general model that could be used to study a diverse variety of systems is not feasible using this approach. Additionally, any sampling method relying solely on QM methods becomes computationally prohibitive due to their significant costs. To address these limitations, active learning based data generation methods have been employed to train machine learning potentials [97, 98, 74, 37]. In active learning, the chemical space is explored by running a sampling method such as molecular dynamics simulations using a machine learning model. When the prediction uncertainty for a given sample exceeds a certain threshold, the target observables for the sample are collected and added to the training data. The model used for sampling is retrained when there is enough new data. The process is repeated until a certain criterion is met. The choice of UQ metric is important for this procedure as it determines where the model fails in the sampled space. In this section, we propose and compare different uncertainty quantification (UQ) methods to be used for active learning. We utilized the previously introduced ANI model as the MLP model for demonstration but the investigated UQ methods could be easily extended to work with other MLPs. Our main motivation is to explore cheaper alternatives to the ensemble based UQ as it is computationally expensive. 76 Figure 5.1 Ensemble training with cross-validation. 5.1 Methods 5.1.1 Ensemble Based Uncertainty Quantification Combining multiple predictors to improve performance is a widely used approach in lit- erature. A popular example is the random forest method, which combines multiple decision trees (weak learners) to make predictions [12]. Another way to train ensembles is by using bootstrapping. In this method, each member of the ensemble is trained on a different set of samples from the original training data, with resampling being performed with replace- ment. However, in cross-validation based ensemble training, the resampling is done without replacement as illustrated in Fig. 5.1. This lowers the amount of data each model receives but increases overall diversity which is preferred for the uncertainty estimation. To estimate the uncertainty, multiple models are trained using cross-validation, and the standard deviation of their predictions for a given sample is used as the uncertainty metric. Despite its simplicity, ensemble-based uncertainty estimation has proven to be effective for various classification and regression tasks [55]. Smith et al. successfully employed this approach to iteratively extend the initial training data using active learning [97]. For a given molecule with N atoms, the ensemble disagreement based uncertainty metric (U Qens) is defined as where σ is the standard deviation (STD) of the predictions of the models in the ensemble. U Qens = σ √ N (5.1) 77 Training DataValidationHoldoutTraining DataValidationHoldoutTraining DataValidationHoldoutTraining DataValidationHoldoutValidationTraining DataHoldout Figure 5.2 Feature and latent space distance. STD is divided by the square root of the number of atoms to normalize the metric so that it is not biased towards larger molecules as further explained in [97]. Despite its simplicity and performance as a UQ metric, it is computationally expensive in both training and inference time since it employs multiple models. The time cost could be hidden by using a separate GPU for each model, but it increases the total amount of resources needed. 5.1.2 Nearest Neighbor Based Uncertainty Quantification An alternative to ensemble based UQ, distance to the training data is proven to be an effective UQ metric [95, 58]. Intuitively, predictions for samples that are dissimilar to the training data are expected to be inaccurate. Hence, it is expected that the distance to the training data is highly correlated with the prediction uncertainty. If there is single descriptive vector (fingerprint) per molecule and a given molecule has descriptor x, the uncertainty of a model M trained on Xtrain can be defined as Dist(x, Xtrain) = arg min x′∈Xtrain d(x, x′) 78 Feature space distanceLatent space distanceInputOutput where d is a distance metric, typically Euclidean distance. To improve its robustness, K- Nearest Neighbors (K-NN) algorithm could be used which calculates the average distance to the closest k elements from the training data. A greater distance suggests that the corresponding chemical space is less represented in the training data. Moreover, since this method requires only a single model to gather latent representations and calculate the K-NN distance, it is computationally less intensive compared to the ensemble-based approach. Unlike handcrafted descriptors, machine learning models typically learn target specific representations from data. Janet et al. proposed using these latent representations to cal- culate distances, as these representations are specifically tailored for the given task [45]. Typically, deep learning models have many layers to improve the learning capacity. The vector before the output layer is shown to be an effective representation to use for distance based UQ [45]. MLPs typically employ distinct neural networks for each atom type, with these networks predicting the corresponding atomic energies. The total energy for a molecule is calculated by summing up the predicted atomic energies. Hence, there is no single latent vector rep- resenting the entire molecule unlike the molecular representations explored in [45]. Our proposed approach involves calculating the latent space distance to the training data at the atomic level. This is accomplished by first gathering type-specific latent vectors from the training data for each atom type. Subsequently, a K-Nearest Neighbors (K-NN) model is constructed for each atom type using the collected data. To compute the distance for a given molecule, each atom in the molecule is processed, and its atom type is identified. Using the corresponding K-NN model, the distance for each atom is calculated. Finally, the UQ met- ric is determined by averaging the distances calculated for all the atoms in the molecule. Algorithm 5.1 provides a concise summary of this approach. If a molecule contains atoms with latent representations that are dissimilar to the training data, it indicates a level of uncertainty which is expected to lead to inaccurate predictions. 79 Algorithm 5.1 Latent space distance using K-NN. 1: Ntype ← Number of available atom types 2: M ← Set of atomic NNs (in total, Ntype models) 3: Xtrain ← Training data 4: kneigh ← Number of neighbors for K-NN 5: for each i = 1, 2, . . . Ntype do 6: 7: 8: end for 9: Vi ← Collect latent vectors for type i from Xtrain using Mi KNNi ← Create K-NN model from Vi ▷ After creating the K-NN models from the training data, the UQ metric could be calculated in the following way: 10: xtest ← Given molecule to calculate the UQ metric for 11: dx ← 0 12: for each a ∈ x do 13: 14: 15: 16: 17: end for 18: dx ← dx/len(x) (Calculate the average distance) t ← Atom type of a va ← Latent vector for atom a using Mt da ← Calculate the distance using KNNt and va dx ← dx + da 5.1.3 Autoencoder Based Uncertainty Quantification As previously explained, the K-NN approach requires storing latent vectors after training to calculate the distance to the training data. During inference, it is necessary to find the nearest neighbors for each atom. However, if the training data is substantial, the latent vectors might not fit into memory, potentially drastically increasing the neighbor search time. To remedy this, we propose an alternative approach using autoencoders to estimate uncertainty. Employing variations of autoencoders is a well-known method for detecting outliers or estimating uncertainty [4, 17]. Typically, autoencoders are trained in an unsupervised man- ner where the labels are not available. The model is trained using reconstruction loss which is typically L2 norm of the difference between the input vector and the reconstructed vector. Reconstruction minimization has also been explored as an auxiliary task to improve general- ization during supervised model training [57]. Le et al. show that trying to reconstruct the input vector from the latent space regularizes the latent space and improves generalization 80 performance [57]. Figure 5.3 Multi-task autoencoder training. As illustrated in Fig. 5.3, instead of reconstructing the input vector, we pass the latent vector through a bottleneck to further compress it and then reconstruct the latent vector from this compressed version. We favored this method since there may not be an explicit input vector for some MLPs such as graph neural networks (GNNs) [34]. While we primarily studied ANI models that create explicit input vectors using predefined functions like Behler and Parrinello symmetry functions [9], our approach is easily adaptable for different MLPs as well. To train a model using reconstruction loss, we employ a loss function that combines both energy and reconstruction loss: L(W ) = (cid:88) i   (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) Ei − (cid:88) ˆEij j (cid:13) 2 (cid:13) (cid:13) (cid:13) (cid:13) (cid:88) + α ∥ ˆxij − xij∥2 j   (5.2) where W is the model weights, i is the molecule index and j is the atom index. For the energy loss part, Ei is the reference energy and ˆEij is the atomic energy prediction. For the reconstruction loss part, xij is the latent vector for atom j and ˆxij is the reconstructed vector. Hyperparameter α controls the weight of the reconstruction loss. For a molecule with N atoms, the equation below provides the molecular reconstruction 81 yInputLatent VectorCompressedLatent VectorReconstructedLatent Vector error to serve as a UQ metric: UQreconst = 1 N (cid:88) j ∥ ˆxj − xj∥2 . (5.3) We should note that just as there is a distinct neural network for each atom type, the autoencoder component is also specific to each type. 5.2 Evaluation 5.2.1 Datasets We have picked 2 different datasets to evaluate and compare the methods described in the earlier section. 5.2.1.1 QM9 dataset The QM9 dataset comprises approximately 130k energy-equilibrated molecules made up of C, H, O, N, and F atoms. Each molecule contains up to 9 heavy atoms. There are 13 properties per molecule calculated at the B3LYP/6-31G(2df,p) level of theory [79]. We only used potential energy during training and evaluation. Although it lacks diversity, it is a well-studied benchmark dataset. 5.2.1.2 ANI-1ccx dataset The ANI-1ccx dataset contains nearly 500k diverse molecular conformations consisting of C, H, O and N atoms. It is an intelligently selected 10% sub-sample of the ANI-1x data set, but recomputed with an accurate coupled cluster (approximately CCSD(T)/CBS) level of theory [99]. 5.2.2 Experiments To compare the UQ metrics, we designed two types of experiments. For the first set of experiments, we split each dataset into train, test and validation sets then compare how well the UQ metrics correlate with the error in the test data. Although comparing these methods in a static way provides valuable insights, in an active learning environment, the training data is iteratively extended based on model uncertainty evaluation. Hence, we designed another experiment that can provide further information about the methods in a dynamic 82 environment where the data is constantly updated. The details are provided in the later sections. We used an ensemble of 5 models for the ensemble-based UQ. For K-NN, we set K to 1. Lastly, we set α to 0.005 for the autoencoder approach. Regarding the training-related hyperparameters, we set the batch size to 256, the learning rate to 0.001, and trained for 400 epochs. Additionally, we utilized a learning rate scheduler that reduces the learning rate by 25% when the validation loss does not improve for 10 consecutive epochs. The details about the model architecture are provided in Fig. C.1 of Appendix C. The reported errors are normalized in the following way: Errornorm = |E − ˆE| √ N where E and ˆE are target and predicted energies respectively, and N is the number of atoms for a given molecule. 5.2.2.1 Static Experiments We split each dataset into the training and testing data at an 80/20 ratio. For single model UQ metrics (K-NN and autoencoder based), 20% of training data is used as validation data and for ensemble training, cross-validation is used. After the training, the UQ metrics are evaluated on the test data. Figure 5.4 RMSE ratios for QM9 and ANI1CCX. The ratios are calculated by dividing the test data into two groups based on the magnitude of the UQ metric, higher is better. 83 707580859095100Selected metric threshold (in percentile)4681012RMSE ratioQM9 RMSE Ratio (p > threshold / p <= threshold)ensemble-5KNN-1Autoencoder707580859095100Selected metric threshold (in percentile)1.501.752.002.252.502.753.003.25RMSE ratioANI-1ccx RMSE Ratio (p > threshold / p <= threshold)ensemble-5KNN-1Autoencoder The Root Mean Square Error (RMSE) ratios displayed in Fig. 5.4 were derived as follows: First, the samples in the test data were sorted by the magnitude of the specified UQ metric. Then, based on a chosen percentile rank threshold, the data was divided into two distinct groups. Samples with uncertainty higher than the threshold were placed in the first group, while the remaining samples were allocated to the second group. The ratio of these two groups provides the values shown in Fig. 5.4. If the UQ metric is highly correlated with the absolute error, it effectively differentiates between high-error and low-error cases, resulting in a higher RMSE ratio. In contrast, a UQ metric that is randomly distributed would lead to an RMSE ratio of approximately 1, as the division between the two groups would be essentially arbitrary, causing the RMSE values within each group to be very similar to each other and to the overall RMSE. As seen in the figure, the ensemble-based UQ delivers the best results for both datasets, followed by K-NN, and lastly, the autoencoder-based UQ. The difference between the ensemble- based and K-NN based UQ metrics is very minimal for both datasets. While all UQ metrics perform more effectively on the QM9 datasets, the performance disparity between the met- rics is less pronounced on the more diverse ANI-1ccx dataset. A possible explanation for this observation could be the extensive use of active learning in the generation of the ANI-1ccx dataset [99]. This heightened diversity might lead the learned models to be more balanced, potentially resulting in a suboptimal performance of the UQ metrics. Method Dataset Normalized RMSE (kcal/mol) QM9 Ensemble 0.2084 K-NN QM9 0.2288 Autoencoder QM9 0.2543 Ensemble ANI-1ccx 0.7243 ANI-1ccx 0.8515 K-NN Autoencoder ANI-1ccx 0.8531 Table 5.1 Normalized RMSE in the test data for different setups. The RMSE values on the evaluated datasets and models are presented in Table 5.1. 84 Since the ensemble approach aggregates predictions from multiple models, its RMSE on the test data is lower than that of the single-model predictions used in the K-NN and autoencoder approaches. Additionally, the RMSE values for the autoencoder approach are slightly higher due to the complexity of learning two different tasks simultaneously, as it employs a loss function detailed in Eq. (5.2). This dual-task learning can introduce a trade- off that potentially impairs the optimization of the RMSE. 5.2.2.2 Dynamic Experiments Figure 5.5 Iterative data refinement. To more effectively compare the UQ metrics, we devised the method depicted in Fig. 5.5. For each dataset, we first split it randomly into three segments. The initial training set is used to start the active learning process, while the initial data pool augments the training data using a chosen UQ metric. The final test set monitors the performance throughout the active learning phase. After training, a segment of the data pool is selected based on assessed uncertainty and added to the training data. In every iteration, uncertainty for all samples in the data pool is evaluated. Subsequently, a subset with the highest uncertainty augments the training data. For a more comprehensive comparison, we also assessed random selection and selection based on normalized absolute error. To keep the RMSE metric more 85 DatasetInitial Training Set(5%)Initial Data Pool(75%)Final Test Set(20%)Training SetData poolModel(s)TrainMove 1% of data pool to the training set based on UQ metricEvaluateActive Learning Loop comparable across the UQ metrics, a single model without the reconstruction loss is trained with the training data available at each cycle for every UQ metric and Fig. 5.6 is created with errors reported from these models. Figure 5.6 Change of the RMSE corresponding to the top 1% of test samples with the highest normalized error for each cycle. As seen in Fig. 5.6, while random selection lags behind the investigated UQ metrics, these UQ metrics perform similarly to the selection based on normalized error. Though the differences are subtle, the ensemble approach performs slightly better, while the K-NN approach yields marginally inferior results. 5.3 Concluding Remarks Since the quality of data is crucial for the generalizability of MLPs, active learning approaches have gained more importance in generating such data. While the existing ap- proaches often utilize ensemble disagreement as the uncertainty metric, we have explored and compared different methods, including K-NN and autoencoder-based uncertainty quantifica- tion. Ensemble disagreement yields superior results but is computationally more demanding as multiple models must be trained and used for inference. We offer more economical al- ternatives and demonstrate their efficacy on the QM9 and ANI-1ccx datasets through two distinct experiments. Although our focus is primarily on the ANI architecture, we plan to conduct similar evaluations for newer MLP architectures in the future since the UQ metrics 86 051015202530Cycle2345678Normalized RMSE of Highest 1% (kcal/mol)QM9RandomTrueEnsemble-5KNN-1Autoencoder051015202530Cycle56789ANI-1ccxRandomTrueEnsemble-5KNN-1Autoencoder we provide are easily adaptable to different MLPs with minimal changes. 87 CHAPTER 6 CONCLUSION AND FUTURE WORK In this thesis, we have sought to enhance the fidelity and usability of molecular mod- els using hybridization and machine learning methods. Despite the multitude of modeling approaches available for atomistic simulations, each with its unique trade-offs, no single ap- proach is universally applicable. We introduced ReaxFF/AMBER, a hybrid method that melds the reactive force field ReaxFF with the classical force field AMBER, thereby broad- ening the capabilities of the latter by adding dynamic bonds and charges. However, due to the complex nature of ReaxFF, the existing parameterization tools lacked the efficiency and speed needed for the expansive scope of biological simulations enabled by ReaxFF/AMBER. This led us to develop JAX-ReaxFF, a gradient-based parameterization tool for ReaxFF, by leveraging the versatile JAX framework. With this new tool, we are able to cut down the training time drastically compared to the existing approaches. While JAX-ReaxFF is designed for fast parameter optimization, it is not suitable for running MD simulations. To leverage the vast machine learning ecosystem for easier development, maintenance, and seamless integration with existing Python libraries, we have integrated an end-to-end differ- entiable implementation of ReaxFF into JAX-MD, which is a JAX based MD library. While offering the mentioned benefits, the performance of the new implementation is on par with the widely used KOKKOS-based ReaxFF implementation on modern NVIDIA and AMD GPUs. The wide success and adaptation of the machine learning techniques and libraries have led to the development of a new class of force fields called machine learning potentials. Un- like their classical and reactive counterparts, MLPs hinge on high quality training data to offer QM level accuracy while being drastically faster. We compared different uncertainty quantification approaches as they are essential for the active learning methods used to gen- erate training data with minimal supervision. Also, UQ metrics help understanding where MLPs fail when running MD simulations as they are hard to examine compared to the clas- 88 sical force fields. Although our preliminary experiments predominantly centered on the ANI architecture, future endeavors will branch out to newer MLP designs, including message passing and equivariant neural networks. 89 BIBLIOGRAPHY [1] Mart´ın Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16), pages 265–283, 2016. [2] [3] [4] [5] [6] [7] [8] Hasan Metin Aktulga, Joseph C Fogarty, Sagar A Pandit, and Ananth Y Grama. Parallel reactive molecular dynamics: Numerical methods and algorithmic techniques. Parallel Computing, 38(4-5):245–259, 2012. Hasan Metin Aktulga, Sagar A Pandit, Adri CT van Duin, and Ananth Y Grama. Reactive molecular dynamics: Numerical methods and algorithmic techniques. SIAM Journal on Scientific Computing, 34(1):C1–C23, 2012. Jinwon An and Sungzoon Cho. Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE, 2(1):1–18, 2015. Igor Babuschkin, Kate Baumli, Alison Bell, Surya Bhupatiraju, Jake Bruce, Peter Buchlovsky, David Budden, Trevor Cai, Aidan Clark, Ivo Danihelka, Claudio Fan- tacci, Jonathan Godwin, Chris Jones, Ross Hemsley, Tom Hennigan, Matteo Hes- sel, Shaobo Hou, Steven Kapturowski, Thomas Keck, Iurii Kemaev, Michael King, Markus Kunesch, Lena Martens, Hamza Merzic, Vladimir Mikulik, Tamara Norman, John Quan, George Papamakarios, Roman Ring, Francisco Ruiz, Alvaro Sanchez, Ros- alia Schneider, Eren Sezener, Stephen Spencer, Srivatsan Srinivasan, Luyu Wang, Wojciech Stokowiec, and Fabio Viola. The DeepMind JAX Ecosystem, 2020. See http://github.com/deepmind/jax. Jonathan T Barron. Continuously differentiable exponential linear units. preprint arXiv:1704.07483, 2017. arXiv Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vini- cius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018. Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E Smidt, and Boris Kozinsky. E (3)- equivariant graph neural networks for data-efficient and accurate interatomic poten- tials. Nature communications, 13(1):1–11, 2022. [9] J¨org Behler and Michele Parrinello. Generalized neural-network representation of high- dimensional potential-energy surfaces. Physical review letters, 98(14):146401, 2007. [10] Erik Bitzek, Pekka Koskinen, Franz G¨ahler, Michael Moseler, and Peter Gumbsch. Structural relaxation made simple. Physical review letters, 97(17):170201, 2006. 90 [11] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman- Milne, et al. Jax: composable transformations of python+ numpy programs. Version 0.2, 5:14–24, 2018. [12] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001. [13] Donald W Brenner, Olga A Shenderova, Judith A Harrison, Steven J Stuart, Boris Ni, and Susan B Sinnott. A second-generation reactive empirical bond order (rebo) potential energy expression for hydrocarbons. Journal of Physics: Condensed Matter, 14(4):783, 2002. [14] Patrick Bultinck, Wilfried Langenaeker, Philippe Lahorte, Frank De Proft, Paul Geer- lings, Michel Waroquier, and JP Tollenaere. The electronegativity equalization method i: Parametrization and validation for atomic charge calculations. The Journal of Phys- ical Chemistry A, 106(34):7887–7894, 2002. [15] David A Case, H Metin Aktulga, Kellon Belfon, IY Ben-Shalom, Scott R Brozell, DS Cerutti, TE Cheatham III, GA Cisneros, VWD Cruzeiro, TA Darden, et al. Amber 2021. University of California Press, 2021. [16] David A Case, Thomas E Cheatham III, Tom Darden, Holger Gohlke, Ray Luo, Ken- neth M Merz Jr, Alexey Onufriev, Carlos Simmerling, Bing Wang, and Robert J Woods. The amber biomolecular simulation programs. Journal of computational chem- istry, 26(16):1668–1688, 2005. [17] Zhaomin Chen, Chai Kiat Yeo, Bu Sung Lee, and Chiew Tong Lau. Autoencoder-based network anomaly detection. In 2018 Wireless telecommunications symposium (WTS), pages 1–5. IEEE, 2018. [18] Kimberly Chenoweth, Adri CT Van Duin, and William A Goddard. Reaxff reactive force field for molecular dynamics simulations of hydrocarbon oxidation. The Journal of Physical Chemistry A, 112(5):1040–1053, 2008. [19] Chaitanya M Daksha, Jejoon Yeon, Sanjib C Chowdhury, and John W Gillespie Jr. Automated reaxff parametrization using machine learning. Computational Materials Science, 187:110107, 2021. [20] Tom Darden, Darrin York, and Lee Pedersen. Particle mesh ewald: An n log (n) method for ewald sums in large systems. The Journal of chemical physics, 98(12):10089–10092, 1993. [21] Mark Dittner, Julian M¨uller, Hasan Metin Aktulga, and Bernd Hartke. Efficient global optimization of reactive force-field parameters. Journal of computational chemistry, 36(20):1550–1561, 2015. [22] Marcus Elstner, Dirk Porezag, G Jungnickel, J Elsner, M Haugk, Th Frauenheim, Sandor Suhai, and Gotthard Seifert. Self-consistent-charge density-functional tight- binding method for simulations of complex materials properties. Physical Review B, 58(11):7260, 1998. 91 [23] Joseph C Fogarty, Hasan Metin Aktulga, Ananth Y Grama, Adri CT Van Duin, and Sagar A Pandit. A reactive molecular dynamics simulation of the silica-water interface. The Journal of chemical physics, 132(17):174704, 2010. [24] Richard A Friesner. Ab initio quantum chemistry: Methodology and applications. Proceedings of the National Academy of Sciences, 102(19):6648–6653, 2005. [25] Richard A Friesner and Victor Guallar. Ab initio quantum chemical and mixed quan- tum mechanics/molecular mechanics (qm/mm) methods for studying enzymatic catal- ysis. Annu. Rev. Phys. Chem., 56:389–427, 2005. [26] MJ ea Frisch, GW Trucks, H Bernhard Schlegel, GE Scuseria, MA Robb, JR Cheese- man, G Scalmani, VPGA Barone, GA Petersson, HJRA Nakatsuji, et al. Gaussian 16, 2016. [27] David Furman, Benny Carmeli, Yehuda Zeiri, and Ronnie Kosloff. Enhanced particle swarm optimization algorithm: Efficient training of reaxff reactive force fields. Journal of chemical theory and computation, 14(6):3100–3112, 2018. [28] David Furman and David J Wales. A well-behaved theoretical framework for reaxff reactive force fields. The Journal of Chemical Physics, 153(2):021102, 2020. [29] Joseph J Gajewski and Neal D Conrad. Variable transition state structure in 3, 3- sigmatropic shifts from. alpha.-secondary deuterium isotope effects. Journal of the American Chemical Society, 101(22):6693–6704, 1979. [30] Julian D Gale, Paolo Raiteri, and Adri CT van Duin. A reactive force field for aqueous- calcium carbonate systems. Physical Chemistry Chemical Physics, 13(37):16666–16679, 2011. [31] Bruce Ganem. The mechanism of the claisen rearrangement: d´ej`a vu all over again. Angewandte Chemie International Edition in English, 35(9):936–945, 1996. [32] Jiali Gao. Combined qm/mm simulation study of the claisen rearrangement of al- Journal of the American Chemical Society, lyl vinyl ether in aqueous solution. 116(4):1563–1564, 1994. [33] Xiang Gao, Farhad Ramezanghorbani, Olexandr Isayev, Justin S Smith, and Adrian E Roitberg. Torchani: A free and open source pytorch-based deep learning implemen- tation of the ani neural network potentials. Journal of chemical information and modeling, 60(7):3408–3415, 2020. [34] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International conference on machine learning, pages 1263–1272. PMLR, 2017. [35] Andreas W G¨otz, Matthew A Clark, and Ross C Walker. An extensible interface for qm/mm molecular dynamics simulations with amber. Journal of computational chemistry, 35(2):95–108, 2014. 92 [36] Victor Guallar and Frank H Wallrapp. Qm/mm methods: Looking inside heme proteins biochemisty. Biophysical chemistry, 149(1-2):1–11, 2010. [37] Konstantin Gubaev, Evgeny V Podryabinkin, and Alexander V Shapeev. Machine learning of molecular properties: Locality and active learning. The Journal of chemical physics, 148(24), 2018. [38] Jonathan´aM Guest, J ´aSimon Craw, Mark´aA Vincent, and Ian´aH Hillier. The effect of water on the claisen rearrangement of allyl vinyl ether: theoretical methods including explicit solvent and electron correlation. Journal of the Chemical Society, Perkin Transactions 2, 1(1):71–74, 1997. [39] Feng Guo, Yu-Shi Wen, Shi-Quan Feng, Xiao-Dong Li, Heng-Shuai Li, Shou-Xin Cui, Zhen-Rong Zhang, Hai-Quan Hu, Gui-Qing Zhang, and Xin-Lu Cheng. Intelligent- reaxff: Evaluating the reactive force field parameters with machine learning. Compu- tational Materials Science, 172:109393, 2020. [40] Berk Hess, Carsten Kutzner, David Van Der Spoel, and Erik Lindahl. Gromacs 4: al- gorithms for highly efficient, load-balanced, and scalable molecular simulation. Journal of chemical theory and computation, 4(3):435–447, 2008. [41] Martin Hiersemann and Udo Nubbemeyer. The Claisen rearrangement: methods and applications. John Wiley & Sons, 2007. [42] Pierre O Hubin, Denis Jacquemin, Laurence Leherte, and Daniel P Vercauteren. Pa- rameterization of the reaxff reactive force field for a proline-catalyzed aldol reaction. Journal of Computational Chemistry, 37(29):2564–2572, 2016. [43] Christine M Isborn, Andreas W Gotz, Matthew A Clark, Ross C Walker, and Todd J Mart´ınez. Electronic absorption spectra from mm and ab initio qm/mm molecular dynamics: environmental effects on the absorption spectrum of photoactive yellow protein. Journal of chemical theory and computation, 8(12):5092–5106, 2012. [44] Eldhose Iype, Markus H¨utter, APJ Jansen, Silvia V Nedea, and CCM Rindt. Pa- rameterization of a reactive force field using a monte carlo algorithm. Journal of computational chemistry, 34(13):1143–1154, 2013. [45] Jon Paul Janet, Chenru Duan, Tzuhsiung Yang, Aditya Nandy, and Heather J Kulik. A quantitative uncertainty metric controls error in neural network-driven chemical discovery. Chemical science, 10(34):7913–7922, 2019. [46] Andres Jaramillo-Botero, Saber Naserifar, and William A Goddard III. General mul- tiobjective force field optimization framework, with application to reactive force fields for silicon carbide. Journal of Chemical Theory and Computation, 10(4):1426–1439, 2014. [47] William L Jorgensen, Jayaraman Chandrasekhar, Jeffry D Madura, Roger W Impey, and Michael L Klein. Comparison of simple potential functions for simulating liquid water. The Journal of chemical physics, 79(2):926–935, 1983. 93 [48] Jaewoon Jung, Wataru Nishima, Marcus Daniels, Gavin Bascom, Chigusa Kobayashi, Adetokunbo Adedoyin, Michael Wall, Anna Lappala, Dominic Phillips, William Fis- cher, et al. Scaling molecular dynamics beyond 100,000 processor cores for large-scale biophysical simulations. Journal of computational chemistry, 40(21):1919–1930, 2019. [49] J K¨astner. Umbrella sampling. wires computational molecular science 1: 932–942, 2011. [50] Mehmet Cagri Kaymak, Ali Rahnamoun, Kurt A. O’Hearn, Adri CT Van Duin, Ken- neth M Merz Jr, and Hasan Metin Aktulga. JAX-ReaxFF: A Gradient-Based Frame- work for Fast Optimization of Reactive Force Fields. Journal of Chemical Theory and Computation, 18(9):5181–5194, 2022. [51] Mehmet Cagri Kaymak, Samuel S Schoenholz, Ekin D Cubuk, Kurt A O’Hearn, Ken- neth M Merz Jr, and Hasan Metin Aktulga. End-to-end differentiable reactive molecu- lar dynamics simulations using jax. In International Conference on High Performance Computing, pages 202–219. Springer, 2023. [52] Dieter Kraft. sequential quadratic programming. Forschungsbericht- Deutsche Forschungs- und Versuchsanstalt fur Luft- und Raum- fahrt, 1988. A software package for [53] Sudhir B Kylasa, Hasan Metin Aktulga, and Ananth Y Grama. Puremd-gpu: A reactive molecular dynamics simulation package for gpus. Journal of Computational Physics, 272:343–359, 2014. [54] Matthew R LaBrosse, J Karl Johnson, and Adri CT van Duin. Development of a transferable reactive force field for cobalt. The Journal of Physical Chemistry A, 114(18):5855–5861, 2010. [55] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scal- able predictive uncertainty estimation using deep ensembles. Advances in neural in- formation processing systems, 30, 2017. [56] Henrik R Larsson, Adri CT van Duin, and Bernd Hartke. Global optimization of pa- rameters in the reactive force field reaxff for sioh. Journal of computational chemistry, 34(25):2178–2189, 2013. [57] Lei Le, Andrew Patterson, and Martha White. Supervised autoencoders: Improving generalization performance with unsupervised regularizers. Advances in neural infor- mation processing systems, 31, 2018. [58] Ruifeng Liu and Anders Wallqvist. Molecular similarity-based domain applicability metric efficiently identifies out-of-domain compounds. Journal of Chemical Information and Modeling, 59(1):181–189, 2018. [59] Stuart Lloyd. Least squares quantization in pcm. IEEE transactions on information theory, 28(2):129–137, 1982. 94 [60] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017. [61] Xiya Lu, Dong Fang, Shingo Ito, Yuko Okamoto, Victor Ovchinnikov, and Qiang Cui. Qm/mm free energy simulations: Recent progress and challenges. Molecular simulation, 42(13):1056–1078, 2016. [62] Rita P Magalh˜aes, Henriques S Fernandes, and S´ergio F Sousa. Modelling enzymatic Israel mechanisms with qm/mm approaches: current status and future challenges. Journal of Chemistry, 60(7):655–666, 2020. [63] Madushanka Manathunga, Yipu Miao, Dawei Mu, Andreas W Gotz, and Kenneth M Merz Jr. Parallel implementation of density functional theory methods in the quantum interaction computational kernel program. Journal of Chemical Theory and Compu- tation, 16(7):4315–4326, 2020. [64] Luca Monticelli and D Peter Tieleman. Force fields for classical molecular dynamics. Biomolecular simulations: Methods and protocols, pages 197–213, 2013. [65] Wilfried J Mortier, Swapan K Ghosh, and S Shankar. Electronegativity-equalization method for the calculation of atomic charges in molecules. Journal of the American Chemical Society, 108(15):4315–4320, 1986. [66] RL Mozzi and n BE Warren. The structure of vitreous silica. Journal of Applied Crystallography, 2(4):164–172, 1969. [67] Julian M¨ulller and Bernd Hartke. Reaxff reactive force field for disulfide mechanochem- istry, fitted to multireference ab initio data. Journal of chemical theory and computa- tion, 12(8):3913–3925, 2016. [68] Hiroya Nakata and Shandan Bai. Development of a new parameter optimization scheme for a reactive force field based on a machine learning approach. Journal of computa- tional chemistry, 40(23):2000–2012, 2019. [69] Kurt A O’Hearn, Abdullah Alperen, and Hasan Metin Aktulga. Fast solvers for charge distribution models on shared memory platforms. SIAM Journal on Scientific Com- puting, 42(1):C1–C22, 2020. [70] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural in- formation processing systems, 32, 2019. [71] David A Pearlman, David A Case, James W Caldwell, Wilson S Ross, Thomas E Cheatham III, Steve DeBolt, David Ferguson, George Seibel, and Peter Kollman. Am- ber, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Computer Physics Communications, 91(1-3):1– 41, 1995. 95 [72] James C Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D Skeel, Laxmikant Kale, and Klaus Schul- ten. Scalable molecular dynamics with namd. Journal of computational chemistry, 26(16):1781–1802, 2005. [73] Steve Plimpton. Fast parallel algorithms for short-range molecular dynamics. Journal of computational physics, 117(1):1–19, 1995. [74] Evgeny V Podryabinkin, Evgeny V Tikhonov, Alexander V Shapeev, and Artem R Oganov. Accelerating crystal structure prediction by machine-learning interatomic potentials with active learning. Physical Review B, 99(6):064114, 2019. [75] GP Purja Pun and Y Mishin. Embedded-atom potential for hcp and fcc cobalt. Physical Review B, 86(13):134116, 2012. [76] Zhuoran Qiao, Matthew Welborn, Animashree Anandkumar, Frederick R Manby, and Thomas F Miller III. OrbNet: Deep learning for quantum chemistry using symmetry- adapted atomic-orbital features. The Journal of chemical physics, 153(12):124111, 2020. [77] Ali Rahnamoun, Mehmet Cagri Kaymak, Madushanka Manathunga, Andreas W G¨otz, Adri CT Van Duin, Kenneth M Merz Jr, and Hasan Metin Aktulga. Reaxff/amber—a framework for hybrid reactive/nonreactive force field molecular dynamics simulations. Journal of chemical theory and computation, 16(12):7645–7654, 2020. [78] Ali Rahnamoun, Kurt A O’Hearn, Mehmet Cagri Kaymak, Zhen Li, Kenneth M Merz Jr, and Hasan Metin Aktulga. A polarizable cationic dummy metal ion model. The Journal of Physical Chemistry Letters, 13(23):5334–5340, 2022. [79] Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole Von Lilien- feld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific data, 1(1):1–7, 2014. [80] Anthony K Rappe and William A Goddard III. Charge equilibration for molecular dynamics simulations. The Journal of Physical Chemistry, 95(8):3358–3363, 1991. [81] SCM ReaxFF. Theoretical chemistry, 2020. [82] Amit Sabne. Xla: Compiling machine learning for peak performance. 2020. [83] RT Sanderson. An interpretation of bond lengths and a classification of bonds. Science, 114(2973):670–672, 1951. [84] Samuel Schoenholz and Ekin Dogus Cubuk. Jax md: a framework for differentiable physics. Advances in Neural Information Processing Systems, 33:11428–11441, 2020. [85] Frederic W Schuler and George W Murphy. The kinetics of the rearrangement of vinyl allyl ether1. Journal of the American Chemical Society, 72(7):3155–3159, 1950. 96 [86] Kristof Sch¨utt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert M¨uller. Schnet: A continuous-filter con- volutional neural network for modeling quantum interactions. Advances in neural information processing systems, 30, 2017. [87] Gustavo de M Seabra, Ross C Walker, Marcus Elstner, David A Case, and Adrian E Roitberg. Implementation of the scc-dftb method for hybrid qm/mm simulations within the amber molecular dynamics package. The Journal of Physical Chemistry A, 111(26):5655–5664, 2007. [88] Thomas P Senftle, Sungwook Hong, Md Mahbubul Islam, Sudhir B Kylasa, Yuanxia Zheng, Yun Kyung Shin, Chad Junkermeier, Roman Engel-Herbert, Michael J Janik, Hasan Metin Aktulga, et al. The reaxff reactive force-field: development, applications and future directions. npj Computational Materials, 2(1):1–14, 2016. [89] Mert Y Sengul, Yao Song, Nadire Nayir, Yawei Gao, Ying Hung, Tirthankar Dasgupta, Indeedopt: a deep learning-based reaxff parameterization and Adri CT van Duin. framework. npj Computational Materials, 7(1):1–9, 2021. [90] Hans Martin Senn and Walter Thiel. Qm/mm methods for biomolecular systems. Angewandte Chemie International Edition, 48(7):1198–1229, 2009. [91] Alexander Shapeev, Konstantin Gubaev, Evgenii Tsymbalov, and Evgeny Podryabinkin. Active learning and uncertainty estimation. Machine Learning Meets Quantum Physics, pages 309–329, 2020. [92] David E Shaw, Peter J Adams, Asaph Azaria, Joseph A Bank, Brannon Batson, Al- istair Bell, Michael Bergdorf, Jhanvi Bhatt, J Adam Butts, Timothy Correia, et al. Anton 3: twenty microseconds of molecular dynamics simulation before lunch. In Proceedings of the International Conference for High Performance Computing, Net- working, Storage and Analysis, pages 1–11, 2021. [93] David E Shaw, Ron O Dror, John K Salmon, JP Grossman, Kenneth M Mackenzie, Joseph A Bank, Cliff Young, Martin M Deneroff, Brannon Batson, Kevin J Bowers, et al. Millisecond-scale molecular dynamics simulations on anton. In Proceedings of the conference on high performance computing networking, storage and analysis, pages 1–11, 2009. [94] Ganna Shchygol, Alexei Yakovlev, Toman Trnka, Adri CT Van Duin, and Toon Ver- straelen. Reaxff parameter optimization with monte-carlo and evolutionary algorithms: Guidelines and insights. Journal of Chemical Theory and Computation, 15(12):6799– 6812, 2019. [95] Robert P Sheridan, Bradley P Feuston, Vladimir N Maiorov, and Simon K Kearsley. Similarity to molecules in the training set is a good discriminator for prediction accu- racy in qsar. Journal of chemical information and computer sciences, 44(6):1912–1928, 2004. 97 [96] Justin S Smith, Olexandr Isayev, and Adrian E Roitberg. Ani-1, a data set of 20 million calculated off-equilibrium conformations for organic molecules. Scientific data, 4(1):1–8, 2017. [97] Justin S Smith, Ben Nebgen, Nicholas Lubbers, Olexandr Isayev, and Adrian E Roit- berg. Less is more: Sampling chemical space with active learning. The Journal of chemical physics, 148(24):241733, 2018. [98] Justin S Smith, Benjamin Nebgen, Nithin Mathew, Jie Chen, Nicholas Lubbers, Leonid Burakovsky, Sergei Tretiak, Hai Ah Nam, Timothy Germann, Saryu Fensin, et al. Au- tomated discovery of a robust interatomic potential for aluminum. Nature communi- cations, 12(1):1257, 2021. [99] Justin S Smith, Roman Zubatyuk, Benjamin Nebgen, Nicholas Lubbers, Kipton Bar- ros, Adrian E Roitberg, Olexandr Isayev, and Sergei Tretiak. The ani-1ccx and ani-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci- entific data, 7(1):1–10, 2020. [100] James JP Stewart. Optimization of parameters for semiempirical methods v: Modifi- cation of nddo approximations and application to 70 elements. Journal of Molecular modeling, 13:1173–1213, 2007. [101] JJPRB Tersoff. Modeling solid-state chemistry: Interatomic potentials for multicom- ponent systems. Physical review B, 39(8):5566, 1989. [102] Jos Thijssen. Computational physics. Cambridge university press, 2007. [103] Aidan P Thompson, H Metin Aktulga, Richard Berger, Dan S Bolintineanu, W Michael Brown, Paul S Crozier, Pieter J in’t Veld, Axel Kohlmeyer, Stan G Moore, Trung Dac Nguyen, et al. Lammps-a flexible simulation tool for particle-based materials model- ing at the atomic, meso, and continuum scales. Computer Physics Communications, 271:108171, 2022. [104] Aidan P Thompson, Laura P Swiler, Christian R Trott, Stephen M Foiles, and Garritt J Tucker. Spectral neighbor analysis method for automated generation of quantum- accurate interatomic potentials. Journal of Computational Physics, 285:316–330, 2015. [105] Tomas Trnka, Igor Tvaroska, and Jaroslav Koca. Automated training of reaxff reac- tive force fields for energetics of enzymatic reactions. Journal of chemical theory and computation, 14(1):291–302, 2018. [106] Christian R Trott, Damien Lebrun-Grandi´e, Daniel Arndt, Jan Ciesko, Vinh Dang, Nathan Ellingwood, Rahulkumar Gayatri, Evan Harvey, Daisy S Hollman, Dan Ibanez, et al. Kokkos 3: Programming model extensions for the exascale era. IEEE Transac- tions on Parallel and Distributed Systems, 33(4):805–817, 2021. [107] Adri CT van Duin, Jan MA Baas, and Bastiaan Van De Graaf. Delft molecular mechanics: a new approach to hydrocarbon force fields. inclusion of a geometry- dependent charge calculation. Journal of the Chemical Society, Faraday Transactions, 90(19):2881–2895, 1994. 98 [108] Adri CT Van Duin, Siddharth Dasgupta, Francois Lorant, and William A Goddard. Reaxff: a reactive force field for hydrocarbons. The Journal of Physical Chemistry A, 105(41):9396–9409, 2001. [109] F Vazquez, Ester M Garz´on, JA Martinez, and JJ Fernandez. The sparse matrix vector product on GPUs. In Proceedings of the 2009 International Conference on Computational and Mathematical Methods in Science and Engineering, volume 2, pages 1081–1092. Computational and Mathematical Methods in Science and Engineering Gij´on, Spain, 2009. [110] Francisco V´azquez, Jos´e-Jes´us Fern´andez, and Ester M Garz´on. A new approach for sparse matrix vector product on nvidia gpus. Concurrency and Computation: Practice and Experience, 23(8):815–826, 2011. [111] Toon Verstraelen, PW Ayers, Veronique Van Speybroeck, and Michel Waroquier. ACKS2: Atom-condensed Kohn-Sham DFT approximated to second order. The Jour- nal of chemical physics, 138(7):074108, 2013. [112] Ross C Walker, Michael F Crowley, and David A Case. The implementation of a fast and accurate qm/mm potential method in amber. Journal of computational chemistry, 29(7):1019–1031, 2008. [113] Junmei Wang, Romain M Wolf, James W Caldwell, Peter A Kollman, and David A Case. Development and testing of a general amber force field. Journal of computational chemistry, 25(9):1157–1174, 2004. [114] Xinyan Wang, Jichen Li, Lan Yang, Feiyang Chen, Yingze Wang, Junhan Chang, Jun- min Chen, Wei Feng, Linfeng Zhang, and Kuang Yu. Dmff: an open-source automatic differentiable platform for molecular force field development and molecular dynamics simulation. Journal of Chemical Theory and Computation, 19(17):5897–5909, 2023. [115] Olaf Wiest, Kersey A Black, and KN Houk. Density functional theory isotope ef- fects and activation energies for the cope and claisen rearrangements. Journal of the American Chemical Society, 116(22):10336–10337, 1994. [116] Jun Zhang, Y Isaac Yang, Lijiang Yang, and Yi Qin Gao. Dynamics and kinetics study of “in-water” chemical reactions by enhanced sampling of reactive trajectories. The Journal of Physical Chemistry B, 119(45):14505–14514, 2015. [117] Weiwei Zhang and Adri CT Van Duin. Improvement of the reaxff description for func- tionalized hydrocarbon/water weak interactions in the condensed phase. The Journal of Physical Chemistry B, 122(14):4083–4092, 2018. [118] Ciyou Zhu, Richard H Byrd, Peihuang Lu, and Jorge Nocedal. Algorithm 778: L-bfgs- b: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans- actions on Mathematical Software (TOMS), 23(4):550–560, 1997. 99 APPENDIX A SUPPLEMENTAL DATA FOR REAXFF/AMBER—A FRAMEWORK FOR HYBRID REACTIVE/NONREACTIVE FORCE FIELD MOLECULAR DYNAMICS SIMULATIONS A.1 Hardware and Software Setup All computations have been performed on Laconia, a cluster with over 400 compute nodes at Michigan State University’s High-Performance Computing Center. Each of the base compute nodes on Laconia has 28 cores, located on two fourteen core Intel Xeon E5- 2680v4 Broadwell 2.4 GHz processors, and has 128 GB DDR3 2133 MHz ECC memory. Each core possesses a 64 KB L1 cache (32 KB instruction, 32 KB data), a 256 KB L2 cache. Between fourteen cores of a single “Broadwell” processor, a 35 MB L3 cache is shared. At the time of the experiments, the Laconia nodes ran CentOS version 7 distribution of GNU/linux for x86 64 architectures. The standalone ReaxFF software was built using the Intel Fortran compiler (IFORT) version 2019.3.199 with the -O3 flag. A.2 QM Claisen Rearrangement Transition Configurations Figure A.1 Transition configuration of AVE during Claisen rearrangement for QM, ReaxFF, SCC-DFTB and PM3 simulations. 100 APPENDIX B SUPPLEMENTAL DATA FOR JAX-REAXFF: A GRADIENT-BASED FRAMEWORK FOR FAST OPTIMIZATION OF REACTIVE FORCE FIELDS B.1 Clustering Algorithms Algorithm B.1 Clustering (G, I, R). Ccur ← Initialize cluster centers by selecting random geometries for i = 1, 2, . . . , I do Cprev ← Ccur Shuffle G for each g ∈ G do Assign g to ci where Dist (g, ci) is minimum Update the cluster centers 1: Cbest ← Keep track of the best so far 2: for r = 1, 2, . . . , R do 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: end for end for if Cost (Ccur) < Cost (Cbest) then end for if Ccur == Cprev then Cbest ← Ccur end if end if break Algorithm B.2 Modified k-Means for geometry clustering. 1: kmax ← Maximum number of clusters 2: R ← Number of repetitions for the clustering algorithm 3: I ← Number of iterations for the clustering algorithm 4: Cselected ← Selected clustering of the geometries 5: for k = 1, 2, . . . , kmax do 6: 7: 8: 9: 10: 11: end for Cselected ← Ck break Costk, Ck ← Clustering (G, I, R) if |Costk − Costk−1| / Costk−1 < tolerance or k == kmax then end if ▷ See Algorithm B.1 101 B.2 Convergence Plots for Training Tasks B.2.1 Cobalt (a) L-BFGS-B (b) SLSQP Figure B.1 Convergence of the local optimizers for the cobalt data set. B.2.2 Silica (a) L-BFGS-B (b) SLSQP Figure B.2 Convergence of the local optimizers for the silica data set. 102 05101520Iteration103104105106107108ErrorCobalt: Random Initial Guess and L-BFGS-B05101520Iteration103104105106107108109ErrorCobalt: Random Initial Guess and SLSQP05101520Iteration104105ErrorSilica: Random Initial Guess and L-BFGS-B05101520Iteration104105ErrorSilica: Random Initial Guess and SLSQP B.2.3 Disulfide (a) L-BFGS-B (b) SLSQP Figure B.3 Convergence of the local optimizers for the disulfide data set with single precision. (a) L-BFGS-B (b) SLSQP Figure B.4 Convergence of the local optimizers for the disulfide data set with double precision. B.3 Hardware and Software Setup All the CPU performance experiments reported were conducted on server with two Intel Xeon E5-2680 v4 @ 2.40GHz processors (2 processors each with 14 cores) and 128 GB 2133 MHz DDR4 RAM. The GPU experiments were conducted on server with a sole Intel Core i7-9700K CPU @ 3.60GHz processor, 16 GB 3000 MHz DDR4 RAM, and single 1080-TI PCI-e card (11 GB GDDR5X memory). For the baseline methods, the OGOLEM version 1.0 software with the PuReMD backend was used. The proposed methods were implemented in Python 3.7 and utilized JAX version 0.1.76, NumPy version 1.16.4 and SciPy version 1.5.1. 103 05101520Iteration105106ErrorDisulfide: Random Initial Guess and L-BFGS-B05101520Iteration105106ErrorDisulfide: Random Initial Guess and SLSQP05101520Iteration104105106ErrorDisulfide: Random Initial Guess and L-BFGS-B05101520Iteration104105106ErrorDisulfide: Random Initial Guess and SLSQP APPENDIX C SUPPLEMENTAL DATA FOR UNCERTAINTY QUANTIFICATION METHODS FOR MACHINE LEARNING POTENTIALS C.1 Model Architecture Figure C.1 Model architectures for each available atom type and the autoencoder part (en- coder and decoder). As illustrated in Fig. C.1, the CELU [6] activation function is employed because it is twice differentiable, a critical property for using this model as a potential energy function. The term Linear refers to a linear transformation layer with learnable weights, with the output dimension provided as an argument. The model architecture determines the input dimension. 104 NameArchitectureHLinear(160)-CELU(0.1)-Linear(128)-CELU(0.1)-Linear(96)-CELU(0.1)-Linear(1)CLinear(144)-CELU(0.1)-Linear(112)-CELU(0.1)-Linear(96)-CELU(0.1)-Linear(1)O, N, FLinear(128)-CELU(0.1)-Linear(112)-CELU(0.1)-Linear(96)-CELU(0.1)-Linear(1)Encoder for all typesLinear(16)-CELU(0.1)Decoder for all typesCELU(0.1)-Linear(96)