MACHINE LEARNING-BASED COARSE-GRAINED MODELS FOR MOLECULAR SYSTEMS By Liyao Lyu A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Computational Mathematics, Science and Engineering—Doctor of Philosophy 2025 ABSTRACT Multiscale modeling poses a formidable challenge in computational mathematics, particularly in integrating microscale interactions into meso- or macro-scale constitutive relations. While reduced-order models allow the simulation of extensive systems, their analytical formulations are generally unclosed. Take coarse-grained molecular dynamics as an example, the Mori-Zwangzig formulism decomposes the dynamics into deterministic, memory, and stochastic terms, but the explicit forms for these three terms are unknown. In this thesis, we present a series of studies to solve these problems. Firstly, the main challenge for the deterministic term comes from the high dimensionality and the presence of energy barriers of the free energy surface (FES). We propose a consensus sampling-based approach that reformulates the FES construction as a minimax problem. This framework simultaneously optimizes the function representation of the FES and the training set used to learn it. In particular, the maximization step establishes a stochastic interacting particle system to achieve the adaptive sampling of the max-residue regime by modulating the exploitation of the Laplace approximation of the current loss function and the exploration of the uncharted phase space; the minimization step updates the FES approximation with the new training set. By iteratively solving the minimax problem, the present method essentially achieves an adversarial learning of the FESs with unified tasks for both phase space exploration and posterior error-enhanced sampling. Besides, memory interactions are also important for predicting the collective transport and diffusion processes. To construct this, we introduce a machine-learning-based coarse-grained molecular dynamics model that captures the dissipative many-body contribution. The neural network representation is carefully designed to preserve the physical symmetries and the thermo-consistency. Unlike the common empirical reduced models, the present model is constructed based on the Mori-Zwanzig formalism and naturally inherits the heterogeneous state-dependent memory term rather than matching the mean-field metrics such as the velocity autocorrelation function. Finally, when applied to non-equilibrium systems, models based on the Mori-Zwanzig formalism face inherent challenges. A key issue lies in the Zwanzig projection, which relies on the marginal distribution of the system. We present a data-driven approach for constructing reduced models that retain certain generalization abilities for non-equilibrium processes. Unlike the conventional CG models based on pre-selected CG variables (e.g., the center of mass), the present CG model seeks a set of auxiliary CG variables based on the time-lagged independent component analysis to minimize the entropy contribution of the unresolved variables. This ensures the distribution of the unresolved variables under a broad range of non-equilibrium conditions approaches the one under equilibrium. Through numerical validation, we demonstrate that our model can accurately predict viscoelastic behavior in various non-equilibrium flow regimes. Copyright by LIYAO LYU 2025 ACKNOWLEDGEMENTS First and foremost, I would like to express my deepest gratitude to my supervisor, Prof. Huan Lei. He is not only a brilliant scholar but also an exceptional mentor. Working with him has been an immensely enriching experience, helping me develop critical thinking and problem-solving skills. I am especially grateful for the intellectual freedom he provided, allowing me to explore and refine my ideas with confidence. I am equally grateful to the members of my thesis committee, Prof. Andrew Christlieb, Prof. Di Liu, and Prof. Yimin Xiao, for their insightful critiques, constructive suggestions, and invaluable support. Prof. Andrew Christlieb is one of the most patient and approachable teachers I have had the privilege to learn from. I have gained a great deal from his expertise in hyperbolic conservation laws and tensor decomposition. I am also deeply appreciative of Prof. Di Liu for his guidance in machine learning, which provided me with valuable insights. Additionally, I sincerely enjoyed my discussions with Prof. Yimin Xiao on the analytical aspects of stochastic differential equations; these exchanges greatly enriched my understanding of the subject. I would like to express my sincere gratitude to the professors whose courses greatly contributed to my academic growth. Prof. Jun Kitagawa’s thorough instruction in partial differential equations helped me develop a careful approach to theoretical problems, while Prof. Hui-Chia Yu’s engaging teaching style sparked my interest in numerical methods for chemical engineering. Prof. Rongrong Wang’s clear explanations of numerical partial differential equations strengthened my understanding, and Prof. Daniel Appelö’s expertise in numerical linear algebra helped me connect theory with practical applications. Their dedication to teaching has been invaluable in shaping my knowledge and perspective. I would also like to sincerely thank my friends—Pei Ge, Zhiyuan She, Shijun Liang, Haishen Dai, Siyu Guo, Yue Zhao and many others—for their support and companionship throughout my PhD journey. This path can often be challenging and isolating, and their presence made it much more enjoyable and fulfilling. I deeply cherish the moments we shared, especially our dinners together, which brought warmth and laughter to this demanding period of my life. v Above all, I owe my deepest gratitude to my family for their unwavering support, encouragement, and sacrifices. Their patience during my countless hours of work, their belief in me during moments of doubt, and their unconditional love have been my foundation. This achievement is as much theirs as it is mine. vi TABLE OF CONTENTS CHAPTER 1 INTRODUCTION TO MULTISCALE MODELING AND MOLECULAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DYNAMICS . . 1 CHAPTER 2 DEVELOPING COARSE-GRAINED MOLECULAR DYNAMICS 5 MODELS WITH NON-MARKOVIAN MEMORY EFFECTS . . . . . . 5 2.1 The Mori-Zwanzig Framework for Coarse-Graining . . . . . . . . . . . . . . . 7 2.2 Mori-Zwanzig Formulaism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Empirical Approaches to Coarse-Graining . . . . . . . . . . . . . . . . . . . 9 . 12 2.4 Incorporating Physical Memory Kernels in Coarse-Grained Models . . . . . . . 13 2.5 Symmetry-preserving neural network representation . . . . . . . . . . . . . . 2.6 Training . . . 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.7 Numerical Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 . 2.8 Conclusion . . . . . . . . . . CHAPTER 3 GENERALIZATION OF REDUCED MODELS FOR NON-EQUILIBRIUM DYNAMICS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 . 21 3.1 Challenges in Modeling Non-Equilibrium Processes . . . . . . . . . . . . . . 3.2 Methodology for Generalizable Coarse-Grained Models . . . . . . . . . . . . . 23 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.3 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 . . 3.4 Conclusion . . . CHAPTER 4 CONSENSUS-BASED ENHANCED SAMPLING FOR HIGH-DIMENSIONAL FREE ENERGY SURFACES . . . . . . . . . . . . . . . . . . . . . . . 44 . 44 . 47 . 53 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.1 Challenges in Constructing High-Dimensional Free Energy Surfaces . . . . . 4.2 Consensus-Based Sampling Methodology . . . . . . . . . . . . . . . . . . . 4.3 Application of Consensus-Based Sampling to Biomolecular Systems . . . . . 4.4 Conclusions . . . . . CHAPTER 5 CONCLUSION AND OUTLOOK . . . . . . . . . . . . . . . . . . . . . 62 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 APPENDIX A SIMULATION DETAIL FOR NONEQUILIBRIUM CASE . . . . . . . 76 APPENDIX B SIMULATION DETAIL AND PROOF FOR CONSENSUS BASED SAMLPING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 vii CHAPTER 1 INTRODUCTION TO MULTISCALE MODELING AND MOLECULAR DYNAMICS Numerical simulation has emerged as an important tool for scientific understanding, complementing experimental and theoretical approaches. As was declared by Dirac back in 1929, the right physical principle for most of what we are interested in is already provided by the principles of quantum mechanics (QM), there is no need to look further. However, the ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe theoriticaly or computationly. This intricate challenge remained largely ignored until Anderson underscored in Anderson (1972), that the constructivist hypothesis falters when faced with the dual challenges of scale and complexity. The vastness of scale leads to the well-known curse of dimensionality. While feeding atomic numbers of all involved atoms into QM can theoretically furnish a complete model, the problem’s dimensionality swells by three with the addition of each atom. Consequently, these first principle models are currently only applied to simple cases and are inefficient for managing expansive system sizes and extended simulation timescales. Model reduction shows promise, as they reduce degrees of freedom by omitting certain details, thereby facilitating larger-scale simulation. However, the reduced models are almost always obtained empirically, by guessing. Making the right guess often requires and represents far-reaching physical insight. In molecular dynamics (MD), the conservative interaction between particles is categorized into three distinct terms: two-particle pairwise, three-particle angular, and four-particle dihedral. On the mesoscale, as seen in Coarse-Grained Molecular Dynamics (CGMD), nonconservative friction forces are typically assumed to be isotropic in the Generalized Langevin Equation (GLE) or pairwise in Dissipative Particle Dynamics (DPD). At the macroscale, the derivation of macroscopic constitutive equations from kinetic equations necessitates the closure of the collision operator. This guessing game can be quite hard and less productive because of the complexity. As we will show in the first project, the inherent complexity is often largely oversimplified by traditionally intuition-driven models, such as the GLE or DPD in coarse-grained molecular dynamics (CGMD) 1 problems. The recent emergence of machine learning, with its prowess in assimilating multimodal, multi-fidelity data and elucidating correlations among interrelated phenomena, offers a unique opportunity to address these challenges. However, simply inputting atomic coordinates into a network to fit interactions can lead to a cumbersome parameter space, complicating the training process. More critically, an exclusive reliance on machine learning risks sidelining the fundamental laws of physics, which may result in ill-posed problems or unphysical solutions. Progress has been achieved by incorporating physical priors, such as physical symmetries, into the network ansatz. This not only enhances accuracy but also bolsters the transparency of the resulting models. A case in point is DeepMD, which constructs an extensive and symmetry-invariant network to capture the many-body nature of the conservative force field. However, accurate prediction of the dynamics in CGMD further relies on faithfully modeling a memory term that represents the energy-dissipation processes arising from the unresolved degrees of freedom (DoF). The importance of the state-dependent non-Markovian friction tensor (force) for dynamic properties remains broadly overlooked, with the conventional state-independent GLE kernel being often used. In our first project Lyu and Lei (2023b), we will show that the state-dependence of memory terms can be critical for constructing a reliable extensive CGMD model that preserves the collective dynamics behavior. The commonly used velocity auto-correlation function (VACF) is a mean-field metric. Reproducing the VACF (as can be sufficient to capture in GLE) is insufficient to capture the crucial collective behavior such as the hydrodynamic mode, collective diffusion (Van-Hove function), et al. In contrast, we propose a physical information-embedded network-based model constructed directly from Zwanzig’s formalism that faithfully retains the heterogeneous energy dissipation among the CG particles and accurately predicts the dynamics on the collective scale. The complexity isn’t confined solely to the dynamic equation; it also extends to the selection of dynamic coordinates Lyu and Lei (2024). A case in point is water models. Coordinates (or sites) ranging from one to six have been proposed to represent water, each aiming to capture distinct behaviors. The judicious selection of coordinates is rooted in physical insight, much like 2 the model-building previously mentioned. As the complexity of the molecular structure escalates, this task becomes progressively challenging. Data-driven coordinate selection seems a potential alternative to the traditional human intelligence-driven approach. In extensive CGMD systems, the center of mass (COM) remains the most prevalent coordinate. However, when modeling non-Newtonian fluids, which exhibit enormously complex flow behavior, capturing the deformation of polymers becomes crucial. This aspect is often oversimplified when relying solely on COM. The ongoing project intends to emphasize the importance of dynamic coordinate selection for faithfully capturing macroscope rheology behavior. Apart from dynamic equations and dynamic coordinates, one particular problem of these data-driven methods is sampling Lyu and Lei (2023a). Generating a good representative data set is in general not an easy task. For example, in molecular dynamics, a representative data set to construct a free energy surface (FES) includes most states of a molecule including ground states, meta-stable states, and some high-energy states (unstable states). In most cases, the molecules are stuck in a local minimum on the energy surface, and it is difficult to jump over the energy barriers. Enhance sampling thus is important for these problems to accelerate ergodic ability. To increase the probability of rare events occurring in MD simulations, the simulation process can be interfered with using various methods, including raising the temperature, replica exchange, and adding bias potential. Traditional boosted sampling methods based on bias potential suffer from the curse of dimensionality. In the third project, we propose the Consensus-Based Enhanced Sampling (CES) by framing the FESs construction as a min-max problem. The minimization aspect focuses on optimizing the network parameters to minimize error at the sampled points, while the maximization aspect focuses on optimizing the distribution of the sampling to maximize the error within the dataset. By iteratively addressing the min-max problem, we are able to construct a high-dimensional energy surface. Integrating deep learning with multiscale modeling holds potential, offering enhancements to models traditionally anchored in human intelligence. However, forging a seamless integration between machine learning and established scientific structural models remains elusive. The accuracy 3 and transparency of data-driven models continue to be viewed with skepticism by a broader community. We aspire that our work will contribute to bolstering the confidence in these approaches. While our primary focus has been on MD and CGMD scales, we also aim to extend our methods to the macro scale in the future. 4 CHAPTER 2 DEVELOPING COARSE-GRAINED MOLECULAR DYNAMICS MODELS WITH NON-MARKOVIAN MEMORY EFFECTS © 2023 American Physical Society. Reproduced with permission from L. Lyu and H. Lei, Phys. Rev. Lett. 131, 177301 (2023). https://doi.org/10.1103/PhysRevLett.131.177301 2.1 The Mori-Zwanzig Framework for Coarse-Graining Accurately predicting the collective behavior of multi-scale physical systems is a long-standing problem that requires the integrated modeling of the molecular-level interactions across multiple scales Anderson (1972). However, for systems without clear scale separation, there often exists no such a set of simple collective variables by which we can formulate the evolution in an analytic and self-determined way. One canonical example is coarse-grained molecular dynamics (CGMD). While the reduced degrees of freedom (DoFs) enable us to achieve a broader range of the spatio-temporal scale, the construction of truly reliable CG models remains highly non-trivial. A significant amount of work Torrie and Valleau (1977a); Rosso et al. (2002); Maragliano and Vanden-Eijnden (2006); Izvekov and Voth (2005); Noid et al. (2008); Rudd and Broughton (1998); Lyubartsev and Laaksonen (1995); Shell (2008); Kumar et al. (1992); Nielsen et al. (2004); Laio and Parrinello (2002); Darve and Pohorille (2001) (see also review Noid (2013)), including recent machine learning (ML)-based approaches Behler and Parrinello (2007); Stecher et al. (2014a); John and Csányi (2017); Lemke and Peter (2017); Chmiela et al. (2017); Zhang et al. (2018a,b), have been devoted to constructing the conservative CG potential for retaining consistent static and thermodynamic properties. However, accurate prediction of the CG dynamics further relies on faithfully modeling a memory term that represents the energy-dissipation processes arising from the unresolved DoFs; the governing equations generally become non-Markovian on the CG scale. Moreover, such non-Markovian term often depends on the resolved variables in a complex way Satija et al. (2017); Luo et al. (2006); Best and Hummer (2010); Plotkin and Wolynes (1998); Straus et al. (1993); Morrone et al. (2012); Daldrop et al. (2017) where the analytic formulation is generally unknown. In particular, for extensive CGMD systems (i.e., the number of CG particles can be proportionally changed according 5 to the simulation size), the memory term often exhibits strong many-body effect and needs to satisfy various physical symmetry constraints among the CG particles. Existing approaches often rely on empirical models such as Brownian motion Einstein (1905), Langevin dynamics Kampen (2007), and dissipative particle dynamics (DPD) Hoogerbrugge and Koelman (1992); Español and Warren (1995). Despite their broad applications, studies Lei et al. (2010); Hijón et al. (2010); Yoshimoto et al. (2013) based on direct construction from full MD show that the empirical (e.g., pairwise additive) forms can be insufficient to capture the state-dependent energy-dissipation processes due to the many-body and non-Markovian effects. Recent efforts Lange and Grubmüller (2006); Wang et al. (2020b); Ceriotti et al. (2009); Baczewski and Bond (2013); Davtyan et al. (2015); Lei et al. (2016); Li et al. (2017); Russo et al. (2019); Jung et al. (2017); Lee et al. (2019); Ma et al. (2019, 2021); Klippenstein and van der Vegt (2021); Vroylandt et al. (2022); She et al. (2023); Xie et al. (2022) model the memory term based on the generalized Langevin equation (GLE) and its variants (see also review Klippenstein et al. (2021)). The velocity auto-correlation function (VACF) is often used as the target quantity for model parameterization. While it may serve as an appropriate measure for certain non-extensive systems Widder et al. (2022); Meyer et al. (2019), the VACF is essentially a metric of the background dissipation under mean-field approximation. For extensive CGMD systems, the homogeneous kernel overlooks the heterogeneity of the energy dissipation among the CG particles stemming from the many-body nature of the marginal probability density function of the CG variables. This limitation imposes a fundamental challenge for accurately modeling the local irreversible responses as well as the transport and diffusion processes on the collective scale. This work aims to fill the gap with a new CG model that faithfully entails the state-dependent non-Markovian memory and the coherent noise for extensive MD systems. The model formulation can be loosely viewed as an extended dynamics of the CG variables joint with a set of non-Markovian features that embodies the many-body nature of the energy dissipation among the CG particles. Specifically, we treat each CG particle as an agent and seek a set of symmetry-preserving neural network (NN) representations that directly map its local environments to the non-Markovian friction interactions, and thereby circumvent the exhausting efforts of fitting the individual memory terms 6 with a unified empirical form. Different from the ML-based potential model Zhang et al. (2018b), the memory terms are represented by NNs in form of second-order tensors that strictly preserve the rotational symmetry and the positive-definite constraint. Coherent noise can be introduced satisfying the second fluctuation-dissipation theorem and retaining consistent invariant distribution. Rather than matching the VACF, the model is trained based on the Mori-Zwanzig (MZ) projection formalism such that the effects of the unresolved interactions can be seamlessly inherited. We emphasize that the construction is not merely for mathematical rigor. Numerical results of a polymer molecule system show that the CG models with empirical memory forms are generally insufficient to capture heterogeneous inter-molecular dissipation that leads to inaccurate cross-correlation functions among the particles. Fortunately, the present model can reproduce both the auto- and cross-correlation functions. More importantly, it accurately predicts the challenging collective dynamics characterized by the hydrodynamic mode correlation and the van Hove function Van Hove (1954) and shows the promise to predict the meso-scale transport and diffusion processes encoded with molecular-level fidelity. 2.2 Mori-Zwanzig Formulaism Let us consider a full MD system consisting of 𝑀 molecules with a total number of 𝑁 atoms. The phase space vector is denoted by z = [q, p], where q, p ∈ R3𝑁 represent the position and momentum vector, respectively. Given z(0) = z0, the evolution follows z(𝑡) = eL𝑡z0, where L is the Liouville operator determined by the Hamiltonian 𝐻 (z). The CG variables are defined by representing each molecule as a CG particle, i.e., 𝜙(z) = (cid:2)𝜙𝑄 (z), 𝜙𝑃 (z)(cid:3), where ϕ𝑄 (z) = [Q1, Q2, · · · , Q𝑀] and ϕ𝑃 (z) = [P1, P2, · · · , P𝑀] represent the center of mass and the total momentum of individual molecules, respectively. Z (𝑡) = [Q(𝑡), P (𝑡)] denote the map 𝜙(z(𝑡)) with z(0) = z0. Using the Koopman operator Koopman (1931), Z (𝑡) can be mapped from the initial values, i.e., Z (𝑡) = eL𝑡Z (0), (2.1) where L is the Liouville operator determined by the full-model Hamiltonian 𝐻 (z). Below we derive the reduced model by choosing CG variables Z as a linear mapping of the full phase-space 7 vector z and we refer to Ref. Hijón et al. (2010); Darve et al. (2009) for discussions of the more general cases. Following Zwanzig’s approach, we define a projection operator as the conditional expectation with a fixed CG vector Z, i.e., PZ 𝑓 (z) := ∫ 𝛿(ϕ(z) − Z) 𝜌0(z) 𝑓 (z) dz/Ω(Z), where 𝜌0(z) ∝ e−𝛽𝐻 (z) represents the equilibrium density function and Ω(Z) = ∫ 𝛿(ϕ(z) −Z) 𝜌0(z) dz. Also, we define an orthogonal operator QZ = I − PZ. Using Eq. (2.1), we have (cid:164)Z (𝑡) = eL𝑡 PZ LZ (0) + eL𝑡 QZ LZ (0). In particular, we choose Z = [Q, P ] = Z (0). Using the Duhamel-Dyson identity, we can write the dynamics of Z (𝑡) as (cid:164)Z (𝑡) = eL𝑡 PZ LZ (0) + ∫ 𝑡 0 d𝑠eL (𝑡−𝑠) PZ LeQZ L𝑠QZ LZ (0) + eQZ L𝑡 QZ LZ (0). (2.2) Let us start with the mean-field term PZ LZ (0). For the present study, the CG variables are linear functions of z. Therefore, we have PZ LQ = LQ = M −1P , i.e., QZ LQ ≡ 0. For PZ LP associated with the 𝑖-th CG particle, we have ∫ ∫ ∫ PZ LP𝑖 = = = 𝛿(ϕ(z) − Z) 𝜌0(z)LP𝑖 dz/Ω(Z) 𝛿(ϕ(z) − Z) 𝜌0(z) (− ∇q𝑖 𝐻 (z)) dz/Ω(Z) ∑︁ 𝑖∈N𝑖 𝛿(ϕ(z) − Z)(𝛽−1 ∑︁ 𝑖∈N𝑖 𝛿(ϕ𝑄 𝜙(q) − Q) 𝜌0(q) dq/ ∫ ∇q𝑖 ) 𝜌0(z) dz/Ω(Z) = 𝛽−1∇Q𝑖 ∫ 𝛿(ϕ𝑄 (q) − Q) 𝜌0(q) dq (2.3) = −∇Q𝑖𝑈 (Q), where N𝑖 represents the index set of the atoms that belongs to the 𝑖-th molecule, and the free energy 𝑈 (Q) = −𝛽−1 ln (cid:2)∫ 𝛿(ϕ𝑄 (q) − Q) 𝜌0(q) dq(cid:3). 8 For the memory term PZ LeQZ L𝑠QZ LP associated with the 𝑖-th CG particle, we have PZ LeQZ L𝑠QZ LP𝑖 = = = ∫ ∫ ∫ 𝜌0(z)𝛿(ϕ(z) − Z)LeQZ L𝑠QZ LP𝑖 dz/Ω(Z) 𝜌0(z)(L𝜙(z) · ∇Z)𝛿(ϕ(z) − Z)eQZ L𝑠QZ LP𝑖 dz/Ω(Z) 𝜌0(z)(QZ LP · ∇P )𝛿(ϕ(z) − Z)eQZ L𝑠QZ LP𝑖 dz/Ω(Z) (b𝑦 QZ LQ ≡ 0) ∫ 𝜌0(z)𝛿(ϕ(z) − Z) (QZ LP ) ⊗ eQZ L𝑠QZ LP𝑖 dz/Ω(Z) = ∇P · (cid:18)∫ = ∇P · (cid:124) 𝜌0(z)𝛿(ϕ(z) − Z) (QZ LP ) ⊗ eQZ L𝑠QZ LP𝑖 dz/Ω(Z) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:123)(cid:122) ˜K𝑖, (Z,𝑠) (cid:19) (cid:125) − ˜K𝑖,(Z, 𝑠) · ∇P (1/Ω(Z)) Ω(Z). (2.4) Furthermore, we take the assumption that the memory kernel only depends on the positions of the CG particles Q, i.e., ∇P · ˜K (Z, 𝑠) ≡ 0. Also, similar to the derivation in Eq. (2.3), we note that ∫ Ω(Z) ∝ 𝛿(ϕ𝑄 (q) − Q) 𝜌0(q)𝛿(ϕ𝑃 (q) − P )𝑒−𝛽P 𝑇 M −1P /2 dz ∝ 𝑒−𝛽P 𝑇 M −1P /2. (2.5) Therefore, Eq. (2.4) can be further simplified as PZ LeQZ L𝑠QZ LP𝑖 = −𝛽 ˜K𝑖,(Q, 𝑠) · M −1P . With Eqs. (2.3) (2.6), we can show that the dynamics of Z = [Q, P ] can be written as (cid:164)Q = M −1P (cid:164)P = −∇𝑈 (Q) − ∫ 𝑡 0 K (Q(𝑡 − 𝑠), 𝑠)V (𝑡 − 𝑠) d𝑠 + R(𝑡), (2.6) (2.7) where K (Q, 𝑠) = 𝛽 ˜K (Q, 𝑠) and R(𝑡) = eQZ L𝑡 QZ LZ (0) is modeled as a random process representing the different initial condition z0 with 𝜙(z0) = Z. 2.3 Empirical Approaches to Coarse-Graining Eq. (2.7) provides the starting point to derive the various CG models. Direct evaluation of K (Q, 𝑡) imposes a challenge as it relies on solving the full-dimensional orthogonal dynamics eQZ 𝑡. Some empirical models have been developed to represent the kernel historically. 9 Figure 2.1 The RDF of the full MD and various CG models with the conservative CG potential 𝑈 (Q) constructed by the DeepCG model. 2.3.1 M-DPD • Formulation: In the M-DPD model, the dynamics of the CG system is governed by (cid:164)Q𝑖 = 𝑚−1 𝑖 P𝑖, (cid:164)P𝑖 = F 𝐶 𝑖 − ∑︁ 𝑗 𝛾𝑖 𝑗 V 𝑗 + R(𝑡). (2.8) The computation of conservative force term F 𝐶 is conducted by DeePCG Zhang et al. (2018b) method for all models considered in this paper. This approach enables the successful reproduction of the structural properties for each model as shown in Fig. 2.1. • Construction: The friction tensor is calculated from random force correlation by constrained dynamics 𝛾𝑖 𝑗 (Q) = 𝛾⊥(𝑄𝑖 𝑗 ) (1 − e𝑖 𝑗 e𝑇 𝑖 𝑗 ) + 𝛾∥ (𝑄𝑖 𝑗 ) (e𝑖 𝑗 e𝑇 𝑖 𝑗 ), 𝛾∥ (𝑄𝑖 𝑗 ) = 𝛾⊥(𝑄𝑖 𝑗 ) = 1 𝑘 𝐵𝑇 1 𝑘 𝐵𝑇 ∫ ∞ 0 ∫ ∞ 0 E[𝛿F ∥ 𝑖 (𝑠)𝑇 · 𝛿F ∥ 𝑗 (0)] d𝑠, E[𝛿F ⊥ 𝑖 (𝑠)𝑇 · 𝛿F ⊥ 𝑗 (0)] d𝑠, (2.9) where e𝑖 𝑗 is the unit vector represents along Q𝑖 𝑗 , the symbol ∥ represents the radial component along Q𝑖 𝑗 and ⊥ for perpendicular component. For further details, see Lei et al. (2010); Hijón 10 010203040x0.00.51.01.52.0RDFFull MDDeepCG et al. (2010). 2.3.2 NM-GLE • Formulation: In the standard GLE model, the dynamics of the CG system are governed by (cid:164)Q = M −1P , ∫ 𝑡 (cid:164)P = F 𝐶 − 0 K (𝑡 − 𝑠)V (𝑠) d𝑠 + R(𝑡). (2.10) where the over-simplified kernel K is only time-independent and homogeneous among the CG particles, as discussed in the main context. • Construction: The memory kernel depends on solving the Volterra equation G(𝑡) = − ∫ 𝑡 0 K (𝑡 − 𝑠)C (𝑠) d𝑠, (2.11) where C (𝑡) = E[V (𝑡)V (0)𝑇 ] and G(𝑡) = E[𝛿F (𝑡)V (0)𝑇 ] are the correlation matrices. In practice, the velocity correlation matrix C (𝑡) is often simplified to VACF c(𝑡) = E[V (𝑡)𝑇 · V (0)] for individual CG particles under the mean-field approximation. As a result, kernel K reduces to a scalar function depending only on time, which overlooks the state dependency. With the obtained kernel in proper form, the system can be cast into an extended dynamics driven by white noise, e.g., see Ceriotti et al. (2009); Lei et al. (2016); Wang et al. (2020b); She et al. (2023). 2.3.3 NM-DPD • Formulation: In the NM-DPD model, the dynamics of the CG system are governed by (cid:164)Q𝑖 = 𝑚−1 𝑖 P𝑖, (cid:164)P𝑖 = F 𝐶 𝑖 − ∫ 𝑡 ∑︁ 𝑗 0 K𝑖 𝑗 (𝑡 − 𝑠)V𝑖 𝑗 (𝑠) d𝑠 + R𝑖 𝑗 (𝑡), ∑︁ 𝑗 (2.12) where the kernel K𝑖 𝑗 further depends on the pair-wise distance 𝑄𝑖 𝑗 . • Construction: The memory kernel depends on solving the Volterra integral equation A𝑖 𝑗 (𝑡) = − ∫ 𝑡 0 K𝑖 𝑗 (𝑡 − 𝑠)B𝑖 𝑗 (𝑠) d𝑠, (2.13) 11 where the correlation functions are computed as two orthogonal parts, i.e., A𝑖 𝑗 (𝑡) = E[δF ∥ 𝑖 𝑗 (𝑡) · V ∥ 𝑖 𝑗 (0)𝑇 ] + E[𝛿F ⊥ 𝑖 𝑗 (𝑡) · V ⊥ 𝑖 𝑗 (0)𝑇 ], B𝑖 𝑗 (𝑡) = E[V ∥ 𝑖 𝑗 (𝑡) · V ∥ 𝑖 𝑗 (0)𝑇 ] + E[V ⊥ 𝑖 𝑗 (𝑡) · V ⊥ 𝑖 𝑗 (0)𝑇 ]. (2.14) As a result, the orthogonal parts of the friction term can be solved separately, and extended dynamics can be also applied for simulation, e.g., see Li et al. (2017). However, as shown below, such empirical forms are limited to capturing the state-dependence that turns out to be crucial for the dynamics on the collective scale, and motivates the present model retaining the many-body nature of K (Q, 𝑡). 2.4 Incorporating Physical Memory Kernels in Coarse-Grained Models To elaborate the essential idea, let us start with the Markovian approximation K (Q, 𝑡) ≈ −𝚪(Q)𝛿(𝑡), where 𝚪(Q) = 𝚵(Q)𝚵(Q)𝑇 is the friction tensor preserving the semi-positive definite condition, and 𝚵(Q) needs to retain the translational, rotational, and permutational symmetry, i.e., 𝚵𝑖 𝑗 (Q1 + b, · · · , Q𝑀 + b) = 𝚵𝑖 𝑗 (Q1, · · · , Q𝑀) 𝚵𝑖 𝑗 (UQ1, · · · , UQ𝑀) = U𝚵𝑖 𝑗 (Q1, · · · , Q𝑀)U𝑇 (2.15) 𝚵𝜎(𝑖)𝜎( 𝑗) (Q𝜎(1), · · · , Q𝜎(𝑀)) = 𝚵𝑖 𝑗 (Q1, · · · , Q𝑀), where 𝚵𝑖 𝑗 ∈ R3×3 represents the friction contribution of 𝑗-th particle on 𝑖-th particle, b ∈ R3 is a translation vector, U is a unitary matrix, and 𝜎(·) is a permutation function. To inherit the many-body interactions, we map the local environment of each CG particle into a set of generalized coordinates, i.e., ˆQ𝑘 𝑖 = Q𝑖 + (cid:205)𝑙∈N𝑖 𝑓 𝑘 (𝑄𝑖𝑙)Q𝑖𝑙, where f : R → R𝐾 is an encoder function to be learned, and N𝑖 = {𝑙 |𝑄𝑖𝑙 < 𝑟𝑐} is the neighboring index set of the 𝑖-th particle within a cut-off distance 𝑟𝑐. Accordingly, ˆQ𝑖 𝑗 ∈ R3×𝐾 represents a set of features that encode the 𝑖 − ˆQ𝑘 𝑗 inter-molecular configurations beyond the pairwise approximation. The 𝑘-th column ˆQ𝑘 𝑖 𝑗 = ˆQ𝑘 preserves the translational and permutational invariance, by which we represent 𝚵𝑖 𝑗 by 𝚵𝑖 𝑗 = 𝐾 ∑︁ 𝑘=1 ℎ𝑘 ( ˆQ𝑇 𝑖 𝑗 ˆQ𝑖 𝑗 ) ˆQ𝑘 𝑖 𝑗 ⊗ ˆQ𝑘 𝑖 𝑗 + ℎ0( ˆQ𝑇 𝑖 𝑗 ˆQ𝑖 𝑗 )I (2.16) where ℎ : R𝐾×𝐾 → R𝐾+1 are encoder functions which will be represented by NNs. For 𝑖 = 𝑗, we have 𝚵𝑖𝑖 = − (cid:205) 𝑗 ∈N𝑖 𝚵𝑖 𝑗 based on the Newton’s third law. 12 Eq. (2.16) entails the state-dependency of the memory term K (Z, 𝑡) under the Markovian approximation. To incorporate the non-Markovian effect, we embed the memory term within an extended Markovian dynamics Ceriotti et al. (2009) (see also Ref. She et al. (2023)). Specifically, we seek a set of non-Markovian features ζ := [ζ1, ζ2, · · · , ζ𝑛], and construct the joint dynamics of [Z, ζ] by imposing the many-body form of the friction tensor between P and ζ, i.e., (cid:164)Q = M −1P (cid:164)P = −∇𝑈 (Q) + 𝚵(Q)ζ (cid:164)ζ = −𝚵(Q)𝑇 V − 𝚲ζ + ξ(𝑡), (2.17) where 𝚵 = (cid:2)𝚵1𝚵2 · · · 𝚵𝑛(cid:3) and each sub-matrix takes the form (2.16) constructed by {f 𝑖 (·), h𝑖 (·)}𝑛 𝑖=1 respectively. 𝚲 = ˆ𝚲 ⊗ I represents the coupling among 𝑛 features, where I ∈ R3𝑁×3𝑁 is the identity 𝑇 matrix and ˆ𝚲 ∈ R𝑛×𝑛needs to satisfy the Lyapunov stability condition ˆ𝚲 + ˆ𝚲 ≥ 0. Therefore, we write ˆ𝚲 = ˆL ˆL𝑇 + ˆL𝑎, where ˆL is a lower triangular matrix and ˆL𝑎 is an anti-symmetry matrix which will be determined later. By choosing the white noise ξ(𝑡) following ⟨ξ(𝑡)ξ(𝑡′)⟩ = 𝛽−1(𝚲 + 𝚲𝑇 )𝛿(𝑡 − 𝑡′), (2.18) Eq. (2.17) retains the consistent invariant distribution 𝜌(Q, P , ξ) ∝ exp[−𝛽(𝑈 (Q) + P 𝑇M −1P /2 + ζ𝑇 ζ/2)]. 2.5 Symmetry-preserving neural network representation Preserving the physical symmetry constraints is crucial for both the accuracy and the generalization ability of the constructed ML-models. Besides the conservative potential 𝑈 (Q), the constructed memory term will need to satisfy the translation- and permutation-invariance, as well as the rotation-symmetries. Let Tb, RU, and P𝜎 denote the translation, rotation, and permutation operator whose actions on a general function F (Q1, · · · , Q𝑀) defined by TbF (Q1, · · · , Q𝑀) := F (Q1 + b, · · · , Q𝑀 + b), RU F (Q1, · · · , Q𝑀) := F (Q1U, · · · , Q𝑀U), (2.19) P𝜎F (Q1, · · · , Q𝑀) := F (Q𝜎(1), · · · , Q𝜎(𝑀)), 13 where b ∈ R3 is a position vector, U ∈ R3×3 is an orthogonal matrix and 𝜎 is an arbitrary permutation of the set of indices. The components of the constructed memory will need to satisfy the symmetry constraints Tb𝚵𝑖 𝑗 (Q1, · · · , Q𝑀) = 𝚵𝑖 𝑗 (Q1, · · · , Q𝑀) RU𝚵𝑖 𝑗 (Q1, · · · , Q𝑀) = U𝚵𝑖 𝑗 (Q1, · · · , Q𝑀)U𝑇 (2.20) P𝜎𝚵𝑖 𝑗 (Q1, · · · , Q𝑀) = 𝚵𝜎(𝑖)𝜎( 𝑗) (Q𝜎(1), · · · , Q𝜎(𝑀)), Proposition 2.5.1. The representation 𝚵𝑖 𝑗 = 𝐾 ∑︁ ℎ𝑘 ( ˆQ𝑇 𝑖 𝑗 ˆQ𝑖 𝑗 ) (cid:16) ˆQ𝑘 𝑖 𝑗 (cid:17) (cid:16) ˆQ𝑘 𝑖 𝑗 (cid:17)𝑇 +ℎ0( ˆQ𝑇 𝑖 𝑗 ˆQ𝑖 𝑗 )I preserves the symmetry conditions (2.20), where ˆQ𝑘 𝑘=1 𝑖 = Q𝑖+ ∑︁ 𝑓 𝑘 (𝑄𝑖𝑙)Q𝑖𝑙 represents the local environment-determined 𝑙∈N𝑖 features (generalized coordinate) for the 𝑖-th particle, f : R → R𝐾 and h : R𝐾×𝐾 → R𝐾+1 are two encoder functions. (cid:13) Proof. We note that TbQ𝑖 𝑗 = TbQ𝑖 − TbQ 𝑗 = Q𝑖 𝑗 , Tb𝑄𝑖 𝑗 = (cid:13) (cid:13) = 𝑄𝑖 𝑗 , RUQ𝑖 𝑗 = UQ𝑖 𝑗 , RU𝑄𝑖 𝑗 = 𝑄𝑖 𝑗 , P𝜎Q𝑖 𝑗 = Q𝜎(𝑖)𝜎( 𝑗), and P𝜎𝑄𝑖 𝑗 = 𝑄𝜎(𝑖)𝜎( 𝑗). Therefore, for arbitrary indices 𝑖 and 𝑘, the feature ˆQ𝑘 𝑖 satisfy the following symmetry conditions (cid:13)TbQ𝑖 − TbQ 𝑗 Tb ˆQ𝑘 𝑖 = TbQ𝑖 + ∑︁ 𝑓 𝑘 (Tb𝑄𝑖𝑙)TbQ𝑖𝑙 = ˆQ𝑘 𝑖 + b RU ˆQ𝑘 𝑖 = RUQ𝑖 + 𝑙∈N𝑖 ∑︁ 𝑓 𝑘 (RU𝑄𝑖𝑙)RUQ𝑖𝑙 = U ˆQ𝑘 𝑖 P𝜎 ˆQ𝑘 𝑖 = P𝜎Q𝑖 + 𝑙∈N𝑖 ∑︁ 𝑓 𝑘 (P𝜎𝑄𝑖𝑙)P𝜎Q𝑖𝑙 = ˆQ𝑘 , 𝜎(𝑖) 𝑙∈N𝜎 (𝑖) where we have used the fact that (cid:205)𝑙 𝑓 (𝑟𝑙)r𝑙 is permutational invariant for the last equation. (2.21) Therefore, we have Tb ˆQ𝑖 𝑗 = Tb ˆQ𝑖 − Tb ˆQ 𝑗 = ˆQ𝑖 𝑗 , Tb ˆ𝑄𝑖 𝑗 = ∥Tb ˆQ𝑖 − Tb ˆQ 𝑗 ∥ = ˆ𝑄𝑖 𝑗 , RU ˆQ𝑖 𝑗 = U ˆQ𝑖 𝑗 , RU ˆ𝑄𝑖 𝑗 = ˆ𝑄𝑖 𝑗 , P𝜎 ˆQ𝑖 𝑗 = ˆQ𝜎(𝑖)𝜎( 𝑗), and P𝜎 ˆ𝑄𝑖 𝑗 = ˆ𝑄𝜎(𝑖)𝜎( 𝑗). Thus, for arbitrary indices 𝑖, 𝑗 and 𝑘, the encoder functions ℎ𝑘 ( ˆQ𝑖 𝑗 ˆQ𝑇 𝑖 𝑗 ) satisfy the following symmetry condition Tbℎ𝑘 ( ˆQ𝑇 𝑖 𝑗 ˆQ𝑖 𝑗 ) = ℎ𝑘 ((Tb ˆQ𝑖 𝑗 )𝑇 Tb ˆQ𝑖 𝑗 ) = ℎ𝑘 ( ˆQ𝑇 𝑖 𝑗 ˆQ𝑖 𝑗 ) RU ℎ𝑘 ( ˆQ𝑇 𝑖 𝑗 ˆQ𝑖 𝑗 ) = ℎ𝑘 ((RU ˆQ𝑖 𝑗 )𝑇 RU ˆQ𝑖 𝑗 ) = ℎ𝑘 ( ˆQ𝑇 𝑖 𝑗 ˆQ𝑖 𝑗 ) P𝜎 ℎ𝑘 ( ˆQ𝑇 𝑖 𝑗 ˆQ𝑖 𝑗 ) = ℎ𝑘 ( ˆQ𝑇 𝜎(𝑖)𝜎( 𝑗) ˆQ𝜎(𝑖)𝜎( 𝑗)). Plugging Eq. (2.22) into the definition of 𝚵𝑖 𝑗 yields (2.20). 14 (2.22) □ 2.6 Training Eq. (2.17) departs from the common CG models by retaining both the heterogeneity and non-Markovianity of the energy dissipation process. Rather than matching the mean-field metrics such as the homogeneous VACF, we learn the embedded memory 𝚵(Q(𝑡))e𝚲(𝑡−𝑠)𝚵(Q(𝑠))𝑇 based on the MZ form. However, directly solving the orthogonal dynamics eQZ 𝐿𝑡 is computationally intractable. Alternatively, we introduce the constrained dynamics ˜z(𝑡) = eR𝑡z(0) following Ref. Hijón et al. (2010). Based on the observation PQ = PR ≡ 0, we sample the MZ form from ˜z(𝑡), i.e., K𝑀 𝑍 (Z, 𝑡) = PZ [(eR𝑡 QZ LP ) (QZ LP )𝑇 ] and the memory of the CG model reduces to K𝐶𝐺 (Z, 𝑡) = 𝚵(Q)e𝚲𝑡𝚵(Q)𝑇 . This enables us to train the CG models in terms of the encoders {f 𝑖 (·), h𝑖 (·)}𝑛 𝑖=1 and matrices ˆL and ˆL𝑎 by minimizing the empirical loss 𝐿 = 𝑁𝑠 ∑︁ 𝑁𝑡 ∑︁ 𝑙=1 𝑗=1 (cid:13) (cid:13)K𝐶𝐺 (Z (𝑙), 𝑡 𝑗 ) − K𝑀 𝑍 (Z (𝑙), 𝑡 𝑗 ) (cid:13) (cid:13) 2 (cid:13) (cid:13) , (2.23) where 𝑙 represents the different CG configurations. 2.7 Numerical Result To demonstrate the accuracy of the present model, we consider a full micro-scale model of a star-shaped polymer melt system similar to Ref. Hijón et al. (2010), where each molecule consists of 73 atoms. The atomistic interactions are modeled by the Weeks-Chandler-Anderse potential and the Hookean bond potential. The full system consists of 486 molecules in a cubic domain 90 × 90 × 90 with periodic boundary conditions. The Nosé-Hoover thermostat Nosé (1984); Hoover (1985) is employed to equilibrate the system with 𝑘 𝐵𝑇 = 4.0 and micro-canonical ensemble simulation is conducted during the production stage. Below we compare different dynamic properties predicted by the full MD and the various CG models. For fair comparisons, we use the same CG potential 𝑈 (Q) constructed by the DeePCG scheme Zhang et al. (2018b) for all the CG models; the differences in dynamic properties solely arise from the different formulations of the memory term. Let us start with the VACF which has been broadly used in CG model parameterization and validation. As shown in Fig. 2.2, the predictions from the present model (NM-MB) show good agreement with the full MD results. In contrast, the CG model with the memory term represented by 15 Figure 2.2 The VACF of the full MD and CG models with various memory formulations in (a) semi-log scale (b) original scale. “M” and “NM” represent Markovian and Non-Markovian; GLE, DPD, and MB represent state-independent, pairwise, and the present (NM-MB) model retaining the many-body effects, respectively. the pairwise decomposition and Markovian approximation (i.e., the standard M-DPD form) yields apparent deviations. The form of the pairwise decomposition with non-Markovian approximation (NM-DPD) shows improvement at a short time scale but exhibits large deviations at an intermediate scale. Such limitations indicate pronounced many-body effects in the energy dissipation among the CG particles. Alternatively, if we set the VACF as the target quantity, we can parameterize the empirical model such as GLE by matching the VACF predicted by the full MD. Indeed, the prediction from the constructed GLE recovers the MD results. However, as shown below, this form over-simplifies the heterogeneity of the memory term and leads to inaccurate predictions on the collective scales. Fig. 2.3 shows the velocity cross-correlation function (VCCF) between two CG particles, i.e., 𝐶𝑥𝑥 (𝑡; 𝑟0) = E[V𝑖 (0) · V 𝑗 (𝑡)|𝑄𝑖 𝑗 (0) = 𝑟0], where 𝑟0 represents the initial distance. Similar to VACF, the present model (NM-MB) yields good agreement with the full MD results. However, the predictions from other empirical models, including the GLE form, show apparent deviations. Such limitations arise from the inconsistent representation of the local energy dissipation and can be understood as following. The VACF represents the energy dissipation on each particle as a 16 1031021011000.10.40.71.0(a)Full MDM-DPDNM-GLENM-DPDNM-MB012340.10.40.71.0(b)VACFTime Figure 2.3 The VCCF 𝐶𝑥𝑥 (𝑡; 𝑟0) predicted by the full MD and different CG models with initial distance (a) 10 < 𝑟0 < 11 and (b) 14 < 𝑟0 < 15. Same line legend as Fig. 2.2. homogeneous background heat bath; it is essentially a mean-field metric and can not characterize the dissipative interactions among the particles. Hence, the reduced models that only recover the VACF could be insufficient to retain the consistent local momentum transport and the correlations among the particles. Figure 2.4 (a) Longitudinal and (b) Transverse hydrodynamic modes predicted by MD and different CG models. Same line legend as Fig. 2.2. 17 0240.0050.0200.035(a)0240.000.030.06(b)TimeVCCF051015200.50.00.51.0Longitudinal(a)051015200.000.250.500.751.00Transverse(b)Time Furthermore, the various empirical models for local energy dissipations can lead to fundamentally different transport processes on the collective scale. Fig. 2.4 shows the normalized correlations of the longitudinal and transverse hydrodynamic modes Hansen and McDonald (1990), i.e., 𝐶𝐿 (𝑡) = ⟨ ˜𝑢1(𝑡) ˜𝑢1(0)⟩ and 𝐶𝑇 (𝑡) = ⟨ ˜𝑢2(𝑡) ˜𝑢2(0)⟩, where ˜u = 1/𝑀 (cid:205)𝑀 𝑗=1 V 𝑗 e𝑖k·Q 𝑗 , k is the wave vector, and the subscripts 1 and 2 represent the direction parallel and perpendicular to k, respectively. Similar to the VCCF, the prediction from the present model (NM-MB) agrees well with the MD results while other models show apparent deviations. In particular, the prediction from the GLE model shows strong over-damping due to the ignorance of the inter-molecule dissipations. Finally, we examine the diffusion process on the collective scale. Fig. 2.5 shows the van Hove function that characterizes the evolution of the inter-particle structural correlation defined by 𝐺 (𝑟, 𝑡) ∝ 1 𝑀 2 (cid:205)𝑀 𝑗≠𝑖 𝛿(∥Q𝑖 (𝑡) − Q 𝑗 (0) ∥ − 𝑟). At 𝑡 = 0, 𝐺 (𝑟, 𝑡) reduces to the standard radial distribution function where all the CG models can recover such initial conditions. However, for 𝑡 > 0, predictions from the models with the pairwise decomposition (NM-DPD) and the GLE form show apparent deviations. Specifically, at an early stage near 𝑡 = 50, the neighboring particles begin to artificially jump into the region near the reference particle, violating the fluid-structure thereafter. In contrast, the present model (NM-MB) shows consistent predictions of the structure evolution over a long period until 𝑡 = 1000, when the initial fluid structure ultimately diffuses into a homogeneous state. 2.8 Conclusion To conclude, this work reveals a caveat in constructing reliable CGMD models that preserve the collective dynamics. Unlike the empirical forms, we developed a CG model that faithfully accounts for the broadly overlooked many-body nature of the non-Markovian memory term for extensive MD systems. While the significance of preserving the many-body nature of the conservative force field on static properties has been gradually recognized, the caveat on the memory term seems to remain under-explored. We show that retaining the heterogeneity and the strong correlation of the local energy dissipation is crucial for accurately predicting the cross-correlation among the CG particles, which, however, can not be fully characterized by the mean-field metrics such as VACF. 18 Figure 2.5 The van Hove function predicted by (a) full MD (b) the present NM-MB model (c) NM-DPD model (d) GLE model. It depicts the time evolution (y-axis) of the radius distribution function (x-axis). 19 0102030200400600(c)1.00.80.80.81.00200400600800(a)0.00.20.40.60.81.01.00102030(d)1.00.80.80.81.0(b)1.00.80.80.61.0Timer More importantly, the memory form representing the inter-molecule energy dissipations may play a profound role in the transport and diffusion processes on the collective scale. In particular, the present model accurately predicts the hydrodynamic mode correlation and the van Hove function where empirical forms show limitations, and therefore, shows the promise to accurately predict the emergent phenomena relevant to hydrodynamic transport and diffusive processes on the collective scale. 20 CHAPTER 3 GENERALIZATION OF REDUCED MODELS FOR NON-EQUILIBRIUM DYNAMICS 3.1 Challenges in Modeling Non-Equilibrium Processes Predictive modeling of the collective behaviors of multi-scale physical systems poses a persistent challenge for both fundamental science advancement and various applications Anderson (1972). While the canonical molecular dynamics enables us to faithfully account for the micro-scale interactions, numerical simulations often show limitations for systems without clear scale separation, where the computational cost could become prohibitive to reach the resolved scale of interest. This motivates the construction of various coarse-grained molecular dynamics (CGMD) models. By picking a set of collective variables (CVs), the CG models seek the reduced dynamics with less degrees of freedom to achieve a broader range of spatio-temporal scales. The constructed CG models are generally governed by a conservative CG potential representing the free energy of the resolved CVs and a memory term (along with a coherent noise) representing the energy dissipation that arises from the coupling with unresolved variables. Extensive research efforts Rosso et al. (2002); Maragliano and Vanden-Eijnden (2006); Izvekov and Voth (2005); Noid et al. (2008); Rudd and Broughton (1998); Lyubartsev and Laaksonen (1995); Shell (2008); Nielsen et al. (2004); Laio and Parrinello (2002); Darve and Pohorille (2001); Soper (1996); Reith et al. (2003); Nielsen et al. (2004); Das and Andersen (2012), including machine learning (ML) based approaches Behler and Parrinello (2007); Stecher et al. (2014a); John and Csányi (2017); Lemke and Peter (2017); Chmiela et al. (2017); Zhang et al. (2018b); Ge et al. (2023); van der Oord et al. (2023), have been dedicated to developing the CG potential to preserve the marginal density distribution of the CG variables and hence the various static properties. To retain the CG dynamics, the memory term needs to be properly introduced. Several approaches have been developed based on the direct approximations Darve et al. (2009); Lei et al. (2010); Hijón et al. (2010); Yoshimoto et al. (2013); Hudson and Li (2020) of the Mori-Zwanzig (MZ) projection formalism Mori (1965); Zwanzig (1973) and data-driven parameterization Lange and Grubmüller (2006); Ceriotti et al. (2009); Coifman et al. (2008); Crosskey and Maggioni (2017); Baczewski and Bond (2013); Ma et al. (2016); Lei et al. 21 (2016); Russo et al. (2019); Jung et al. (2017); Lee et al. (2019); Klippenstein and van der Vegt (2021); Vroylandt et al. (2022); She et al. (2023); Xie et al. (2022); Ge et al. (2024) of empirical forms (e.g., dissipative particle dynamics Hoogerbrugge and Koelman (1992); Español and Warren (1995), the generalized Langevin dynamics Zwanzig (2001)). A recent work Lyu and Lei (2023b) proposed a symmetry-preserving representation of the memory term that accounts for both non-Markovianity and the many-body nature among the CG variables, which proves to be crucial for diffusion and transport processes on a resolved scale. Ideally, by accurately constructing the conservative free energy and the memory term, the CG models will enable us to quantify the propagation of the micro-scale interactions and therefore probe the collective behaviors across multiple scales. However, the validity and generalization ability for practical applications remains under-explored. In particular, for extensive MD systems (i.e., the number of molecules proportional to the system size), the CG variables are often chosen a priori such as the centers of mass (COMs) of individual molecules; the CG model is generally constructed such that certain dynamic properties (e.g, the mean square displacement, velocity correlation functions) under equilibrium can be properly reproduced. On the other hand, the model’s applicability for non-equilibrium processes remains questionable. For instance, if we coarse grain a polymer melt system using the COMs of individual molecules, we should not expect the CG model can capture the visco-elastic responses arising from the molecule deformation under an external flow field. From a model reduction perspective, this limitation arises from the choice of the CG projection operator, which is generally defined with respect to the (marginal) equilibrium density of the full MD system. Accordingly, reduced dynamics can recover the full MD prediction only when the underlying distribution is close to equilibrium. However, this caveat seems to be broadly overlooked in existing CGMD models that use equilibrium dynamic properties as the metric, and therefore poses fundamental challenges for predicting the non-equilibrium processes in real applications. In principle, we may directly construct the reduced model for a non-equilibrium process by defining a projection operator with respect to a specific probability measure. However, the reduced dynamics typically relies on a time-dependent projection operator (e.g., see Ref. Kawasaki and 22 Gunton (1973); Willis and Picard (1974); Baxevani et al. (2023)), which, as a result, generates a non-stationary memory term that may not be transferred to a different scenario. In this work, we present a new approach for constructing reliable CGMD models applicable to non-equilibrium processes. Rather than using pre-selected CG variables such as the COMs, we seek a set of auxiliary CG variables as the generalized coordinates of each molecule for the optimal representation of both the intra- and inter-molecular interactions. The key observation is that by systematically introducing these auxiliary CG variables, the conditional distribution of unresolved variables under various non-equilibrium conditions approaches that under equilibrium conditions. This warrants the generalization ability and enables us to transfer the empirical approximation of the non-equilibrium processes into the construction of the Zwanzig’s projection dynamics with respect to a probability density distribution with augmented conditional variables. To construct the reduced dynamics, we note that both the free energy and the memory term exhibit the many-body nature. We generalize the symmetry-preserving neural network representation of the state-dependent memory developed in our previous work Lyu and Lei (2023b) to model both the intra- and inter-energy dissipation. For each CG variable, we further introduce a number of non-Markovian features, allowing a coherent white noise term to be naturally imposed that satisfies the second fluctuation-dissipation theorem and preserves the invariant distribution. We demonstrate the proposed method by constructing the CGMD models of a polymer melt system. Numerical results show that the CG model based on the pre-selected COMs can recover the dynamic properties near equilibrium but is insufficient to predict the reduced dynamics under external flow conditions. Conversely, the present model guarantees that the unresolved orthogonal dynamics remains near equilibrium, ensuring the applicability of the projection formalism, and therefore, accurately predict non-equilibrium processes under various flow field conditions. 3.2 Methodology for Generalizable Coarse-Grained Models 3.2.1 Preliminaries Let us consider a full MD system consisting of 𝑀 molecules with a total number of 𝑁 atoms. For simplicity, we assume each molecule consists of 𝑁𝑚 atoms, and the mass is set to be unit. The 23 full model is governed by (cid:164)z = S∇𝐻 (z) z(0) = z0, (3.1) where 𝐻 (z) is the Hamiltonian, z = (cid:2)q1, q2, · · · , q𝑀, p1, p2, · · · , p𝑀 (cid:3) is the phase space vector and (cid:104) q𝐼 S is the symplectic matrix. Specifically, q𝐼 = ∈ R𝑁𝑚×3 1 and p𝐼 = , · · · , p𝐼 , · · · , q𝐼 p𝐼 1 𝑁𝑚 𝑁𝑚 (cid:105) (cid:104) (cid:105) represent the position and momentum vectors of the 𝐼-th molecule for 𝐼 = 1, 2, · · · 𝑀. In this work, the superscript indices in capital letter 𝐼 and 𝐽 ranging from 1 to 𝑀 label the molecule index, while the subscript indices 𝑛 and 𝑙 ranging from 1 to 𝑁𝑚 label the individual particles within the molecule. To construct the reduced model, we define the CG variables of each molecule Z𝐼 = (cid:2)Q𝐼, P𝐼 (cid:3) via a map 𝜙 : R𝑁𝑚×6 → R𝑚×6, i.e., Q𝐼 = 𝜙𝑄 (q𝐼) P𝐼 = 𝜙𝑃 (q𝐼, p𝐼), (3.2) where Q𝐼 ∈ R𝑚×3 and P𝐼 ∈ R𝑚×3 represent the generalized coordinates and momenta of the 𝐼-th molecule, respectively. Let Q = [Q1, Q2, · · · , Q𝑚𝑀] and P = [P1, P2, · · · , P𝑚𝑀] denote the CG variables of the full system. Furthermore, we use Q𝐼 = (cid:2)Q(𝐼−1)𝑚+1, · · · , Q𝐼𝑚(cid:3) to denote the position 𝑗 = Q𝜇 and 𝜇 = (𝐼 − 1)𝑚 + 𝑗, similarly for P 𝐼. In this work, vector of the 𝐼-th molecule, where Q𝐼 the Greek letters 𝛼, 𝛽, 𝜇, 𝜈 ranging from 1, · · · , 𝑚𝑀 label the global indices of CG coordinate. In particular, one natural choice is to define Q𝐼 ∈ R3 as the COM and P 𝐼 ∈ R3 the total momentum of the 𝐼-th molecule (e.g., see Refs. Lei et al. (2010); Hijón et al. (2010); Yoshimoto et al. (2013)), which essentially eliminates the intra-molecular DOFs. The dynamic evolution is governed (cid:164)Z (𝑡) = LZ (𝑡), where the Louville operator L𝜙(z) := −((∇𝐻 (z0))𝑇 S∇z0)𝜙(z) depends on the full phase space vector z. To derive the reduced dynamics in terms of Z (𝑡), we define the by Zwanzig’s projection operator as a condition expectation with respect to Z (0) = Z, i.e., PZ 𝑓 := E[ 𝑓 (z)|𝜙(z) = Z] = ∫ 𝛿(ϕ(z) − Z) 𝜌0(z) 𝑓 (z)dz/ ∫ 𝛿(ϕ(z) − Z) 𝜌0(z)dz, (3.3) where 𝜌0(z) ∝ exp [−𝛽𝐻 (z)] represents the equilibrium Boltzmann distribution. Accordingly, we (cid:164)Z (𝑡) = LZ (𝑡) on the sub-space of the CG variables. Using the may project the evolution dynamics Duhamel-Dyson formula, the reduced dynamics takes the form ∫ 𝑡 (cid:164)Z (𝑡) = eL𝑡 PZ LZ (0) + d𝑠eL (𝑡−𝑠) PZ LeQZ L𝑠QZ LZ (0) + eQZ L𝑡 QZ LZ (0), (3.4) 0 24 where QZ = I − PZ. With some further assumptions, the reduced dynamics can be simplified into the following integro-differential equation (cid:164)Q = M −1P (cid:164)P = −∇𝑈 (Q) + ∫ 𝑡 0 K (Q(𝑠), 𝑡 − 𝑠) (cid:164)Q(𝑠) d𝑠 + R(𝑡), (3.5) where M is the mass matrix of the CG variables, 𝑈 (Q) is the conservative free energy, and K (Q, 𝑡) is the memory term assumed to be independent of P and R(𝑡) is the fluctuation force. In principle, with proper construction of the individual modeling terms, Eq. (3.5) provides an accurate CGMD model of the full dynamics (3.1). However, its applicability for practical applications, especially the generalization ability for modeling non-equilibrium processes remains questionable. This caveat is rooted in the definition of the projection operator in Eq. (3.3), where the conditional expectation is defined with respect to the marginal density of the equilibrium distribution 𝜌0(z). Accordingly, the derived reduced dynamics (3.5) assumes that the initial distribution of the full model satisfies 𝜌(z|Z) ∝ 𝛿(𝜙(z) − Z) 𝜌0(z). Hence, Eq. (3.5) is valid only when the underlying distribution of the unresolved variables is close to equilibrium. Unfortunately, this condition is generally not guaranteed when we model non-equilibrium processes (e.g., full model (3.1) in the presence of an external field) using the CG variables like the COMs of individual molecules, which poses a fundamental challenge for the transferability of the reduced model (2.2) and (3.5) in real applications. 3.2.2 Construction of the auxiliary CG variables Following the above discussion, let 𝐻𝑒 (z) and 𝜌𝑒 (z) denote the Hamiltonian and an initial (e.g., the steady-state) distribution under an external field. Theoretically, we may define the Zwanzig’s projection operator with respect to 𝜌𝑒 (z) and derive the reduced dynamics under this external field. However, the constructed reduced dynamics could be valid only for this specific condition but lack the generalization ability for other external field conditions. To address this issue, we transfer the exhausting effort of constructing specific reduced models for individual non-equilibrium processes to pursuing the following question: how to ensure the conditional projection operator defined by Eq. (3.3) remains valid for a range of external conditions? 25 A key observation is that by introducing a set of auxiliary CG variables Z𝐼 = [Z𝐼 1 , Z𝐼 2 , · · · , Z𝐼 𝑚] for the individual molecules 𝐼 = 1, 2, · · · , 𝑀, the conditional probability density function (PDF) under external field approaches that under the equilibrium distribution, i.e., 1 Z𝑒 𝑀 (cid:214) 𝑚 (cid:214) 𝐼=1 𝑗=1 𝛿(𝜙(z𝐼) 𝑗 − Z 𝐼 𝑗 ) 𝜌𝑒 (z) ≈ 1 Z0 𝑀 (cid:214) 𝑚 (cid:214) 𝐼=1 𝑗=1 𝛿(𝜙(z𝐼) 𝑗 − Z 𝐼 𝑗 ) 𝜌0(z). (3.6) Intuitively, this approximation can be understood as follows: As we increase the number of resolved CG variables, the entropy contribution of the unresolved ones decreases. Accordingly, the free energy of different external conditions approaches the equilibrium case, and the reduced model (3.4) and (3.5) in the form of Z 𝐼 can be applied to non-equilibrium processes. This is somewhat similar to the work Lei et al. (2020); Yu et al. (2005) on modeling the constitutive closure of polymer solution, where a set of generalized conformation tensors are introduced to represent the subgrid polymer configurations under various flow conditions. To construct the auxiliary CG variables, we seek a linear map 𝜙(·) in the form of a matrix f𝑊 ∈ R𝑁𝑚×𝑚, i.e., 𝜙𝑄 (q𝐼) 𝑗 = (cid:205)𝑁𝑚 𝑛=1 (cid:205)𝑁𝑚 𝑛=1 𝑤𝑛 𝑗 q𝐼 𝑛 𝑤𝑛 𝑗 , 𝜙𝑃 (p𝐼) 𝑗 = 𝑚 ∑︁ 𝑖=1 𝑀 𝑗𝑖 (cid:205)𝑁𝑚 𝑛=1 (cid:205)𝑁𝑚 𝑛=1 𝑤𝑛𝑖p𝐼 𝑛 𝑤𝑛𝑖 , (3.7) where the mass matrix is defined as 𝑀 𝑗𝑖 = (cid:205)𝑁𝑚 𝑛=1 𝑤𝑛 𝑗 𝑤𝑛𝑖, and the normalization ensures that the transitional and rotational symmetry of the CG variables, i.e., 𝜙(q𝐼 + c, p𝐼) = [𝜙𝑄 (q𝐼 + c), 𝜙𝑃 (p𝐼)] = [𝜙𝑄 (q𝐼) + c, 𝜙𝑃 (p𝐼)] 𝜙(q𝐼U, p𝐼U) = [𝜙𝑄 (q𝐼U), 𝜙𝑃 (p𝐼U)] = [𝜙𝑄 (q𝐼)U, 𝜙𝑃 (p𝐼)U], for any c ∈ R3 and U ∈ S𝑂 (3). As shown below, Eq. (3.7) can be loosely viewed as seeking the principal normal modes of the full dynamics. We note that it is possible to construct 𝜙(·) as a non-linear encoder to seek optimal CG representations of the inter- and intra- molecular interactions. In this work, we focus on developing a general CG framework of introducing auxiliary CG variables to enhance the generalization ability and examining Eq. (3.6) as a valid metric for modeling 26 non-equilibrium processes. We restrict 𝜙(·) to be linear; the model reduction in terms of nonlinear CG variables will be investigated in future work. Furthermore, we impose the following two constraints to the weight matrix W , i.e., 0 < 𝑤𝑛𝑖 < 1, 𝑚 ∑︁ 𝑖=1 𝑤𝑛𝑖 = 1 for all 𝑛 = 1, · · · , 𝑁𝑚. (3.8) We emphasize that these constraints do not impose further restrictions on constructing the CG variables. Specifically, for any mapping defined by a matrix W1 ∈ R𝑁𝑚×(𝑚−1), there exists an equivalent mapping defined by a matrix W2 ∈ R𝑁𝑚×𝑚 which satisfies the above constraints (3.8). We refer to the Appendix for the detailed discussion. Furthermore, we note that the second (cid:205)𝑚 constraint ensures that the COM of the full molecule is consistent with that of the CG variables, i.e., 𝑛=1 q𝐼 of the different CG models on the same metric in Sec. 3.3. Moreover, Newton’s third law can be 𝑛, which enables us to establish a fair comparison 𝑗 = (cid:205)𝑚 𝑗=1 𝑛 = (cid:205)𝑁𝑚 (cid:16)(cid:205)𝑁𝑚 𝑛=1 (cid:205)𝑁𝑚 𝑛=1 𝑤𝑛 𝑗 q𝐼 𝑤𝑛 𝑗 Q𝐼 𝑗=1 (cid:17) naturally imposed in Eq. (3.20). To learn CG mapping 𝜙(·), we aim to find a weight matrix W such that condition (3.6) holds, which, unfortunately, cannot be easily checked a priori. Alternatively, we propose to learn the CG variables with the longest relaxation of the time correlation function. Specifically, we collect training samples under an external field and seek W by solving the optimization problem (cid:205)𝑀,𝑚 (cid:205)𝜏 (cid:205)𝑡𝑐 𝑡=0 (cid:205)𝜏 (cid:205)𝑀,𝑚 𝐼, 𝑗 V𝐼 𝐼, 𝑗 V𝐼 𝑗 (𝜏)V𝐼 𝑗 (𝜏)V𝐼 𝑗 (𝜏 + 𝑡)𝑇 𝑗 (𝜏)𝑇 max W (3.9) under constraint (3.8), where V𝐼 𝑗 = (cid:205)𝑁𝑚 𝑛=1 (cid:205)𝑁𝑚 𝑛=1 𝑤𝑛 𝑗 p𝐼 𝑛 𝑤𝑛 𝑗 , 𝑡𝑐 is a hyperparameter representing the cut-off time when the correlation function decays to 0 and 𝜏 is randomly selected from a long MD trajectory under the steady state. Intuitively, Eq. (3.9) yields the eigenmodes of the largest (negative) eigenvalue of the operator L under the linear approximation Crommelin and Vanden-Eijnden (2011). We note that this is somewhat similar to the time-lagged independent component analysis (TICA) Pérez-Hernández et al. (2013); Molgedey and Schuster (1994); Schwantes and Pande (2013) and dynamic mode decomposition (DMD) Schmid (2010); Chen et al. (2012); Kutz et al. (2016) to identify the dominant dynamics from the simulation data. Furthermore, we emphasize that Eq. (3.9) 27 𝜇 Figure 3.1 Diagram illustrating the neural network architecture for the CG conservative force and the memory kernel. (a) The construction of CG potential function 𝑈 and memory kernel K. Initially, Q is converted into a local environment matrix { ˜Q𝜇}𝑀𝑚 𝜇=1. Sub-networks, illustrated in (b), map ˜Q𝜇 to a local feature D𝜇 and generalized coordinate ˆQ𝜇. Finally, the totel potential is constructed by Eq. (3.10), i.e. 𝑈 = (cid:205)𝑀𝑚 ˜𝑈 (𝐷 𝜇). The total memory kernel K is constructed with the state-dependent component of the memory kernel derived from ˆQ𝜇 using Eq. (2.16) and the time-dependent component 𝚲. (b) The sub-networks map ˜Q𝜇 to a local feature 𝐷 𝜇 and generalized coordinate ˆQ𝜇. The neighbour of the 𝜇-th CG coordinate is denoted by N𝜇 = {𝛼, · · · , 𝜈, · · · , 𝛽}. (b1) The 𝑘-th row of generalized local environment matrix embeds the relative information between 𝜇-th coordinate and its 𝑘-th neighbor (labeled as the 𝜈-th coordinate), including type embedding 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇, and distance information. (b2) The 𝐾1 × 𝐾2 symmetry perserving feature D𝜇 is constructed by (G1,𝜇)𝑇 , ˆQ𝜇, ˆQ𝑇 𝜇 and G2,𝜇. (b3) The generalized coordinate is constructed from the local environment embedding matrix G3,𝜇 and relative position ¯Q𝜇, which is the last three columns of ˜Q𝜇. should not viewed as the equivalent condition of Eq. (3.6). Rather, it is essentially an empirical approach for learning CG mappings such that the metric (3.6) satisfies approximately. 3.2.3 Construction of the symmetry-preserving free energy function With the CG mapping 𝜙, the construction of the CG model proceeds with learning the free energy 𝑈 (Q) and the memory kernel K (Q, 𝑡). To retain the extensive structure, the free energy 𝑈 (Q) is decomposed into the local potential of individual CG coordinates, i.e., 𝑈 (Q) = 𝑀𝑚 ∑︁ 𝜇=1 ˜𝑈 (D( ˜Q𝜇)), (3.10) where ˜𝑈 (·) represents the local potential, ˜Q𝜇 represents the local environment determined by the relative position between the CG coordinates Q𝜇 and its neighbors Q𝜈 in {𝜈 (cid:12) (cid:12)𝜈 ∈ N𝜇 }. The detailed 28 Mem. KernelD state embedded mat.Qcoord. mat.˜Q1⋯˜Qμ⋯˜QMmD1⋯Dμ⋯DMmgener. local env. mat.local featurestotal potential (a) (b)QμG1,μDμ=(G1,μ)T˜Qμ(˜Qμ)TG2,μlocal coord.gener. local env. mat.(b2) ̂Qμ=Qμ⊕(G3,μ)T¯Qμ̂Q1⋯̂Qμ⋯̂QMm̂Ξ1,2⋯̂Ξ1,Mm⋯⋯⋯̂ΞMm,1⋯̂ΞMm,MmKCG(Q,t)=Ξ(Q)eΛtΞ(Q)T embed. mat. G1,μ defined as g1(s(rνμ),aν,aμ,bνμ)G2,μg2(s(rνμ),aν,aμ,bνμ)G3,μg3(s(rνμ),aν,aμ,bνμ)G2,μG3,μ (b3)U(Q)=Mm∑μ=1˜U(D(˜Qμ)) (b1)coord.K3 gener.(b3)(b2)˜Qμ=aαaμbαμs(rαμ)̂xαμ̂yαμ̂zαμ⋯⋯⋯⋯⋯⋯⋯ajaμbνμs(rνμ)̂xνμ̂yνμ̂zνμ⋯⋯⋯⋯⋯⋯⋯aβaμbβμs(rβμ)̂xβμ̂yβμ̂zβμ (b1)=¯Qμ form will be specified following Eq. (3.12). D(·) represents the encoder functions that map ˜Q𝜇 into symmetry-preserving features. N𝜇 denotes the CG coordinate indices 𝜈 in the neighbour of CG coordinate indices 𝜇, such that 𝑟 𝜇𝜈 < 𝑟𝑐. We define 𝑁𝜇 as the cardinality of the set N𝜇. Fig. 3.1 sketches the neural network representation of 𝑈 (Q), where a structure similar to Ref. Zhang et al. (2018c) is used to preserve the translational and rotational symmetry constraints. To impose the permutational symmetry, we note that 𝑈 (Q) remains invariant with the index permutation among the individual molecule, i.e., 𝑈 (Q𝜎(1), Q𝜎(2), · · · , Q𝜎(𝑀)) = 𝑈 (Q1, Q2, · · · , Q𝑀), (3.11) where {𝜎(𝐼)}𝑀 𝐼=1 represents an index permutation among moleculars. However, 𝑈 (Q) does not necessarily remain invariant under an index permutation among the generalized CG coordinates, i.e., 𝑈 (Q1 1 , Q1 2 , · · · , Q2 1 , Q2 2 · · · ) ≠ 𝑈 (Q1 1 , Q2 2 , · · · , Q2 1 , Q1 2 · · · ), 𝑈 (Q1 1 , Q1 2 , · · · , Q2 1 , Q2 2 · · · ) ≠ 𝑈 (Q1 2 , Q1 1 , · · · , Q2 1 , Q1 2 · · · ), (3.12) where Q𝐼 𝑖 = Q(𝐼−1)𝑚+𝑖. This complication can not be remedied by introducing the particle type information as done in Ref. Zhang et al. (2018c). This is because, for a specific CG coordinate, its contribution to 𝑈 (Q) could be either inter- or intra-molecular interaction associated with other CG coordinates. Therefore, we define the local environment of the 𝜇-th coordinate by ˜Q𝜇 = (𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇, 𝑠(𝑟𝜈𝜇), ˆ𝑥𝜈𝜇, ˆ𝑦𝜈𝜇, ˆ𝑧𝜈𝜇) ∈ R𝑁𝜇×7, where relative position r𝜈𝜇 = Q𝜈 −Q𝜇 = (𝑥𝜈𝜇, 𝑦𝜈𝜇, 𝑧𝜈𝜇) denotes the relative position to its neighbor, ˆ𝑥𝜈𝜇 = 𝑠(𝑟𝜈𝜇)𝑥𝜈𝜇/𝑟𝜈𝜇, and similar for ˆ𝑦𝜈𝜇 and ˆ𝑧𝜈𝜇. 𝑠(𝑟) is a preselected function defined by 𝑠(𝑟) = 𝑟 𝑟 2 𝑐 1 𝑟 1 2𝑟 0    if 𝐼 = 𝐽, if 𝑟 < 𝑟𝑐𝑠 and 𝐼 ≠ 𝐽, (cid:104) 1 + cos (cid:16) 𝜋(𝑟−𝑟𝑐𝑠) 𝑟𝑐−𝑟𝑐𝑠 (cid:17)(cid:105) if 𝑟𝑐𝑠 ≤ 𝑟 < 𝑟𝑐 and 𝐼 ≠ 𝐽, e𝑙𝑠𝑒, where 𝐼 and 𝐽 represent the molecule indices of the 𝜇-th and its neighbor coordinate. The motivation of this definition is the interactions between molecules are intended to steadily decline to zero as 29 the distance grows, whereas intra-molecular interactions are expected to intensify with increased distance. In particular, if the two coordinates belong to the same molecule, their intra-molecular interactions will always be accounted for. Otherwise, 𝑠(𝑟) is a smooth differentiable function that decays to 0 beyond 𝑟𝑐. 𝑟𝑐𝑠 is a hyper-parameter and is set to be 𝑟𝑐 − 1 in this study. The 𝑎𝜈, 𝑎𝜇 and 𝑏𝜈𝜇 are parameters to represent the type of atom and the type of interaction. Specifically, 𝑏𝜈𝜇 is set to be 1 if the 𝜇-th coordinate and its neighbor 𝜈-th coordinate are part of the same molecule 𝐼, which means that there exist 𝑖, 𝑗 ∈ {1, · · · , 𝑚} such that 𝜇 = (𝐼 − 1)𝑚 + 𝑖 and 𝜈 = (𝐼 − 1)𝑚 + 𝑗. Otherwise, it is set to be −1. In addition, we also introduce 𝑎𝜈, 𝑎𝜇 to represent the type of the 𝜇-th CG coordinate and the type of the 𝜈-th CG coordinate within the neighborhood. In practice, 𝑎𝜇, 𝑎𝜈 are set to be 𝑖, 𝑗 ∈ {1, · · · , 𝑚} defined in 𝑏𝜈𝜇. With the local environment ˜Q𝜇, we construct the local features D ∈ R𝐾1×𝐾2 by D( ˜Q𝜇) = (cid:0)G1,𝜇(cid:1)𝑇 ˜Q𝜇 (cid:0) ˜Q𝜇(cid:1)𝑇 G2,𝜇 𝑇 (3.13) ∑︁ ∑︁ (cid:169) (cid:173) (cid:171) 𝜈∈N𝜇 𝜈∈N𝜇 := (cid:169) (cid:173) (cid:171) g2(𝑠(𝑟𝜈𝜇), 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇) ˜Q𝜈𝜇(cid:170) (cid:174) (cid:172) g1(𝑠(𝑟𝜈𝜇), 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇) ˜Q𝜈𝜇(cid:170) (cid:174) (cid:172) where G1,𝜇 ∈ R𝑁𝜇×𝐾1, G2,𝜇 ∈ R𝑁𝜇×𝐾2. The 𝑘-th row of G1,𝜇 is represented as a neural network g1(𝑠(𝑟𝜈𝜇), 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇) that embeds the relative information between the 𝜇-th coordinate and its 𝑘-th neighbor (labeled as the 𝜈-th coordinate) in R4 to a feature space in R𝐾1. Similarly, G2,𝜇 = g2(𝑠(𝑟𝜈𝜇), 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇) is represented by a neural network mapping R4 to R𝐾2. , To construct the free energy function 𝑈 (Q), we conduct restrained MD simulations (see Sec. 3.2.4 for details) and train the neural network representations ˜𝑈 (·), g1(·) and g2(·) by minimizing the empirical loss between the MD and CG model, i.e., 𝐿𝑈 = 𝑆 ∑︁ 𝑠=1 (cid:13) (cid:13) (cid:13) ∇𝑈 (Q(𝑠)) + F (Q(𝑠)) (cid:13) 2 (cid:13) (cid:13) , (3.14) where the superscript 𝑠 represents various configurations and the F represents the conservative force term sampled from the full MD simulation; see Sec. 3.2.5 for details. 30 3.2.4 Construction of the symmetry-preserving memory function Besides the free energy function 𝑈 (Q), the reduced dynamics further depends on the memory function K (Q, 𝑡). In particular, recent work Lyu and Lei (2023b) shows that the memory function could exhibit a strong many-body nature. Empirical approximations such as the standard GLE with a homogeneous kernel and the dissipative particle dynamics with a pairwise decomposition generally show limitations in modeling the heterogeneous energy dissipation among the CG particles. To predict the collective dynamics, it is crucial to accurately model both the non-Markovian and many-body nature of the memory function. Following the neural network structure proposed in Ref. Lyu and Lei (2023b), we encode the many-body dissipative interactions among the CG particles by introducing coordinate features 𝐾3𝜇 ] ∈ R𝐾3×3, where the feature (cid:98)Q𝜇 is defined by (cid:98)Q𝜇 = [ (cid:98)Q1 𝜇, · · · , (cid:98)Q 𝜇, (cid:98)Q2 (cid:98)Q𝜇 = Q𝜇 ⊕ (cid:0)G3,𝜇(cid:1)𝑇 ¯Q𝜇 := Q𝜇 ⊕ ∑︁ 𝜈∈N𝜇 g3(𝑠(𝑟𝜈𝜇), 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇) ¯Q𝜈𝜇, (3.15) where G3,𝜇 ∈ R𝑁𝜇×𝐾3, where g3 : R4 → R𝐾3 is a neural network that encodes the dissipative ¯Q𝜇 ∈ R𝑁𝜇×3 denotes the relative position between the 𝜇-th interactions beyond the pairwise form. coordinate and its local neighbor, whose 𝑘-th row (corresponding to 𝜈-th coordinate) is defined as ¯Q𝜈𝜇 := ( ˆ𝑥𝜈𝜇, ˆ𝑦𝜈𝜇, ˆ𝑧𝜈𝜇). Accordingly, we can construct the state-dependent memory in the form of 𝚵(Q(𝑡))e𝚲(𝑡−𝑠)𝚵(Q(𝑠))𝑇 , with 𝚵 = (cid:2)𝚵1𝚵2 · · · 𝚵𝐷 (cid:3) and 𝚵𝑑 𝜇𝜈 = 𝐾3∑︁ 𝑘=1 𝑘 ( ˆQ𝜇𝜈 ˆQ𝑇 ℎ𝑑 𝜇𝜈) ˆQ𝑘 𝜇𝜈 ⊗ ˆQ𝑘 𝜇𝜈 + ℎ𝑑 0 ( ˆQ𝜇𝜈 ˆQ𝑇 𝜇𝜈)I, (3.16) where ℎ : R𝐾3×𝐾3 → R𝐷 (𝐾3+1) and 𝑑 = 1, · · · , 𝐷. 𝚲 takes an extendable form 𝚲 = ˆ𝚲 ⊗ I, where I ∈ R3𝑀×3𝑀 is the identity matrix and ˆ𝚲 ∈ R𝐷×𝐷 specifies the coupling among the features. It takes the form ˆ𝚲 = ˆL ˆL𝑇 + ˆL𝑎, where ˆL is a lower triangular matrix and ˆL𝑎 is an anti-symmetry 𝑇 matrix; it satisfies the Lyapunov stability condition ˆ𝚲 + ˆ𝚲 ≥ 0. We can show that 𝚵 in the form of Eq. (3.16) strictly preserve the translational, rotational, and 31 permutational symmetry constraints, i.e., 𝜇𝜈 (Q1 + b, · · · , Q𝑚𝑀 + b) = 𝚵𝑑 𝚵𝑑 𝜇𝜈 (Q1, · · · , Q𝑚𝑀) 𝜇𝜈 (Q1U, · · · , Q𝑚𝑀U) = U𝚵𝑑 𝚵𝑑 𝜇𝜈 (Q1, · · · , Q𝑚𝑀)U𝑇 (3.17) 𝚵𝑑 𝜇𝜈 (Q𝜎(1), Q𝜎(2), · · · , Q𝜎(𝑀)) = 𝚵𝑑 𝜎(𝜇)𝜎(𝜈) (Q1, Q2, · · · , Q𝑀) for any c ∈ R3, U ∈ S𝑂 (3), and index permutation {𝜎(𝐼)}𝑀 Q𝐼 = (cid:2)Q(𝐼−1)𝑚+1, · · · , Q𝐼𝑚(cid:3). 𝐼=1 between the molecules, where To construct K (Q, 𝑡), we learn the memory embedded in Eq. (3.21) in the form of 𝚵(Q(𝑡))e𝚲(𝑡−𝑠)𝚵(Q(𝑠))𝑇 directly from the Zwanzig’s formalism so that the many-body nature can be naturally inherited. We estimate the memory term K𝑀 𝑍 (Q, 𝑡) for a collection of CG configurations. This enables us to train the memory function in terms of the encoders {g3(·), ℎ} and matrices { ˆL, ˆL𝑎} by minimizing the empirical loss 𝐿𝐾 = 𝑁𝑠 ∑︁ 𝑁𝑡 ∑︁ 𝑠=1 𝑗=1 (cid:13) (cid:13)K𝐶𝐺 (Q(𝑠), 𝑡 𝑗 ) − K𝑀 𝑍 (Q(𝑠), 𝑡 𝑗 ) (cid:13) (cid:13) 2 (cid:13) (cid:13) , (3.18) where 𝑠 represents the different CG configurations. 3.2.5 Traning and evaluation of the CGMD model Following Ref. Hijón et al. (2010), we use the restrained dynamics ˜z(𝑡) = eR𝑡z(0) to approximate the orthogonal dynamics eQZ 𝐿𝑡 based on the observation PQ = PR ≡ 0. Accordingly, the conservative term can be approximated by F (Q) = PZ LP and the memory term of Zwanzig’s (cid:2)(eR𝑡 QZ LP ) (QZ LP )𝑇 (cid:3) and the form (see Eq. (3.4)) can be approximated by K𝑀 𝑍 (Q, 𝑡) = PZ memory of the CG model reduces to K𝐶𝐺 (Q, 𝑡) = 𝚵(Q)e𝚲𝑡𝚵(Q)𝑇 . To collect the training samples, we establish the restrained dynamics following (cid:164)q𝐼 = p𝐼 − W (W 𝑇 W )−1W 𝑇 p𝐼 (cid:164)p𝐼 = f 𝐼 − W (W 𝑇 W )−1W 𝑇 f 𝐼 (3.19) where f 𝐼 = (cid:104) f 𝐼 1 , · · · , f 𝐼 𝑁𝑚 (cid:105) ∈ R𝑁𝑚×3 is the force of the individual atoms belonging to the molecule 𝐼. The total force of each CG variable is defined as F𝐼 𝑗 = (cid:205)𝑁𝑚 𝑛=1 𝑤𝑛 𝑗 f 𝐼 𝑛, which is equivalently denoted by F𝜇 with 𝜇 = (𝐼 − 1)𝑚 + 𝑗 and 𝜇 = 1, · · · , 𝑚𝑀. 32 Under the restrained dynamics (3.19), the orthogonal fluctuation term corresponds to the random force of each CG variable, i.e., QZ LP = [𝛿F1, 𝛿F2, · · · , 𝛿F𝑚𝑀], where 𝛿F𝜇 = F𝜇 − (cid:10)F𝜇(cid:11) represents the random force of the 𝜇-th CG coordinate and ⟨·⟩ represents the condition expectation whose generalization ability will be examined in Sec. 3.3. Due to the constraint (3.8), the Newton’s third law is naturally satisfied, i.e., 𝑁𝑚∑︁ 𝑚𝑀 ∑︁ 𝑀,𝑚 ∑︁ F𝜇 = 𝜇 𝐼,𝑖 𝑛 𝑤𝑛𝑖f 𝐼 𝑛 = 𝑀,𝑚 ∑︁ 𝑁𝑚∑︁ 𝑀,𝑁𝑚∑︁ 𝑤𝑛𝑖 𝐼,𝑖 𝑛 𝐽,𝑙 f 𝐼𝐽 𝑛𝑙 = 𝑀,𝑁𝑚∑︁ 𝑀,𝑁𝑚∑︁ 𝐼,𝑛 𝐽,𝑙 f 𝐼𝐽 𝑛𝑙 ≡ 0, (3.20) 𝑛𝑙 represents the force between the 𝑛-th atom of the 𝐼-th molecule and the 𝑙-th atom of the where f 𝐼𝐽 𝐽-th molecule. Therefore, 𝛿F𝜇 = − (cid:205)𝜈≠𝜇 𝛿F𝜈 and 𝚵𝜇𝜇 = − (cid:205)𝜈≠𝜇 𝚵𝜇𝜈. The constructed reduced model can be simulated by the extended dynamics of [Q, P , ζ] in the form of (cid:164)Q = M −1P (cid:164)P = −∇𝑈 (Q) + 𝚵(Q)ζ (cid:164)ζ = −𝚵(Q)𝑇 (cid:164)Q − 𝚲ζ + ξ(𝑡), (3.21) where we introduce the white noise term following ⟨ξ(𝑡)ξ(𝑡′)⟩ = 𝛽−1(𝚲 + 𝚲𝑇 )𝛿(𝑡 − 𝑡′). With this choice, we can show that Eq. (3.21) satisfies the second fluctuation-dissipation theorem. We refer to Ref. Lyu and Lei (2023b) for the detailed proof. 3.3 Numerical Results 3.3.1 The full atomistic model To validate our method, we consider a full micro-scale model of a polymer melt system. Each polymer molecule consists of 𝑁𝑚 = 106 atoms. The central atom is connected to 3 arms and each arm consists of 5 atoms. At the end of each arm, there are 6 additional arms connected to the end atom and each additional arm consists of 5 atoms. The pairwise atomistic interaction is chosen to be the Weeks-Chandler-Andersen potential 𝑉𝑝 (𝑟) =  𝑉L𝐽 (𝑟) − 𝑉L𝐽 (𝑅𝑐), 𝑟 < 𝑅𝑐   0, 𝑟 ≥ 𝑅𝑐 33 𝑉L𝐽 (𝑟) = 4𝜖 (cid:17) 12 (cid:20)(cid:16) 𝜎 𝑟 (cid:17) 6(cid:21) , − (cid:16) 𝜎 𝑟 (3.22) where 𝜖 = 1.0 is the dispersion energy, 𝑅𝑐 = 21/6𝜎 and 𝜎 = 2.415 is the hardcore distance. The bond interaction is chosen to be the FENE potential 𝑉𝑏 (𝑟) = −0.5𝐾 𝑅2 0 ln 1 − (cid:34) (cid:19) 2(cid:35) , (cid:18) 𝑟 𝑅0 (3.23) where stiffness is set to be 𝐾 = 1 and the distance is to be 𝑅0 = 5. The complete system consists of 𝑀 = 300 molecules in a 90 × 90 × 90 domain in the reduced unit. The periodic boundary condition is imposed along each direction. The system is equilibrated under the Nosé-Hoover thermostat with temperature 𝑘 𝐵𝑇 = 4.0. In this work, the external force field is imposed on the 𝑥-𝑦 plane and the temperature is defined by the velocity in the 𝑧 direction. To collect the training samples, we impose an external force field 𝑓𝑥 along the 𝑥-direction to generate the reverse Poiseuille flow, i.e.,    if 0 < 𝑦 < 45, 𝑓𝑥 (𝑦) = − 𝑓0 else, 𝑓0 (3.24) where 𝑓0 = 0.01 is the force magnitude. The full model is simulated for 1 × 106 steps with a time step d𝑡 = 1 × 10−3 to achieve the steady state, followed by 5 × 105 steps to collect the statistical samples in Eq (3.9), with 𝑡𝑐 set to be 5. To construct the CG models, we learn 𝑚 = 3 and 𝑚 = 4 CVs from the training set. Throughout the rest of this paper, we will denote the reduced model that only uses the standard COMs by CGCOM, and the reduced models with 3 and 4 auxiliary CVs (per each molecule) by CG3 and CG4, respectively. In particular, for the CG3 model, the obtained 3 CG variables are equivalent which uniformly divides the individual molecule into three parts. In contrast, the obtained CG variables are inequivalent for the CG4 model. 3.3.2 Generalization ability of the CG models As discussed in Sec. 3.2.1 and Sec. 3.2.2, the generalization ability of the CG models for the non-equilibrium processes relies on the assumption that the conditional PDF (i.e., ∝ 𝛿(𝜙𝑄 (q) − Q) 𝜌𝑒 (q)) used for defining the CG projection operator (3.3) remains nearly the same, i.e., Eq. (3.6) holds for various external flow conditions. In practice, the direct evaluation of this high-dimensional PDF becomes computationally intractable. Instead, we relax this metric by examining the second moments of the atomistic particle distribution conditional to fixed CG 34 particle position in the orthogonal direction defined by q𝐼 variables under various non-equilibrium conditions. Specifically, we examine the variation of the ⊥,W = q𝐼 − W (W 𝑇 W )−1W 𝑇 q𝐼, where q𝐼 ∈ R𝑁𝑚×3 represents the atomistic positions of the 𝐼-th molecule and W is the weight matrix constructed by Eq. (3.9). The second-moment covariance matrix is defined by C(𝑦) = Eneq (cid:104) ⊥,W q𝐼 q𝐼 ⊥,W 𝑇 | 𝑦𝐼 𝑐 = 𝑦 (cid:105) , where C(𝑦) ∈ R𝑁𝑚×𝑁𝑚 and the conditional expectation Eneq is evaluated over the full molecules 𝐼 = 1, 2, · · · , 𝑀 under the stationary distribution generated by the external force field (3.24). The condition 𝑦𝐼 𝑐 = 𝑦 restricts the COMs of the sampling molecule along the 𝑦-direction. In particular, the external force generates the reverse Poiseuille flow along the 𝑥-direction, where the flow shear rate 𝜕𝑢 𝑥 𝜕𝑦 and the external stress vary along the 𝑦-direction. Specifically, the shear rate is close to 0 at 𝑦 = 𝐿/4 and 𝑦 = 3𝐿/4, and achieves the largest value at 𝑦 = 0, 𝑦 = 𝐿/2, and 𝑦 = 𝐿. Therefore, the variation of C (𝑦) with respect to 𝑦 provides a computationally accessible metric for Eq. (3.6). If the conditional PDF is close to the equilibrium distribution, C (𝑦) should be homogeneous. Conversely, the large variation implies the deviation from the equilibrium distribution. Figure 3.2 shows the difference ∥C (𝑦) − C (𝑦′) ∥𝐹 of the various CG models at different locations, where 𝑦 and 𝑦′ correspond to the horizontal and vertical axis, respectively, and ∥ · ∥𝐹 is the Frobenius norm. It is clear that when the COMs are the only CG variables, the unresolved (i.e., orthogonal) degrees of freedom exhibit heterogeneous second moment along the 𝑦-drection, implying pronounced variation of the conditional PDF under different external fields. On the other hand, the variation decreases significantly when auxiliary CG variables are introduced into the CG3 and CG4 models, showing the promise of certain generalization ability for non-equilibrium processes. The discrepancy in the conditional PDF will further lead to inconsistent reduced modeling terms under various non-equilibrium processes. To probe this effect, we examine the pairwise conservative force between the CG coordinates. We emphasize that the CG force generally exhibits a many-body nature Ge et al. (2023); here we use the projection along the pairwise direction Lei et al. (2010); Hijón et al. (2010) as a metric to quantify the generalization ability of the free energy 35 Figure 3.2 The Frobenius norm of the second-moment difference ∥C (𝑦) − C (𝑦′) ∥𝐹 of the various CG models under the steady reverse Poiseuille flow generated by external force 𝑓0 = 0.01 (upper) and 𝑓0 = 0.005 (lower), where 𝑦 and 𝑦′ are represented by the horizontal and vertical axis, respectively. (a) and (d) CGCOM; (b) and (e) CG3; (c) and (f) CG4. 𝑈 (Q). Specifically, let F 𝐼𝐽 equilibrium state, the pairwise conservative force is defined as 𝐹eq 𝑖 𝑗 denote the force between the CG coordinates Q𝐼 𝑖 𝑗 (𝑄) = Eeq (cid:104) 𝑖 and Q𝐽 (cid:12) 𝑄 𝐼𝐽 · e𝐼𝐽 F 𝐼𝐽 (cid:12) 𝑖 𝑗 𝑖 𝑗 (cid:12) 𝑗 . For the (cid:105) 𝑖 𝑗 = 𝑄 , where e𝐼𝐽 𝑖 𝑗 = Q𝐼𝐽 𝑖 𝑗 and Eeq [·] is the expectation over the molecule index 1 ≤ 𝐼, 𝐽 ≤ 𝑀 under the equilibrium state, and 1 ≤ 𝑖, 𝑗 ≤ 𝑚 is kept to represent the CG coordinate index within 𝑖 𝑗 /𝑄 𝐼𝐽 each molecule. For the non-equilibrium state, the pairwise conservative force is defined as (cid:105) , where the condition is taken that both Q𝐼 𝑖 𝐹neq 𝑖 𝑗 and Q𝐽 | 𝑦𝐼 F 𝐼𝐽 𝑖 𝑗 (𝑄, 𝑦) = Eneq (cid:104) · e𝐼𝐽 𝑖 𝑗 = 𝑄 𝑖 𝑗 𝑗 share a specific value along the 𝑦-direction. 𝑖 𝑗 (𝑄) − 𝐹neq 𝐹eq 𝑖 𝑗 𝑗 = 𝑦, 𝑄 𝐼𝐽 𝑖 = 𝑦𝐽 Figure 3.3 shows the difference (cid:12) (cid:12) (cid:12) for the different CG models. For the standard CGCOM model where Q is represented by the COMs, the two CG force terms agree well at (𝑄, 𝑦) (cid:12) (cid:12) (cid:12) 𝑦 = 𝐿/4 = 22.5 and 𝑦 = 3𝐿/4 = 67.5, where the shear rate is close to zero and the atomistic particle distribution is near equilibrium. In contrast, the two force terms show pronounced differences at 𝑦 = 0, 2/𝐿, and 𝐿 where the shear rate is large and the non-equilibrium distribution deviates from 36 03060900306090(a)0.00.20.40.60.81.0×10303060900306090(b)01530456003060900306090(c)01530456003060900306090(d)0.00.20.40.60.81.0×10303060900306090(e)01530456003060900306090(f)015304560 the equilibrium distribution. This is consistent with the specific pattern of the second-moment difference in Fig. 3.2(a) and (d). The large difference implies the heterogeneity of the conditional PDF 𝛿(𝜙𝑄 (q) − Q) 𝜌𝑒 (Q) under various external conditions. Such limitation has a clear physical interpretation: the intra-molecular interactions may significantly affect the visco-elastic response as well as the collective dynamics, which, however, cannot be captured by the COMs. Fortunately, the inconsistency can be alleviated by introducing auxiliary CG variables into the reduced model. As shown in Fig. 3.3, the conservative force difference |𝐹eq 𝑖 𝑗 (𝑄) − 𝐹neq 𝑖 𝑗 (𝑄, 𝑦)| decreases to 𝑂 (0.1) for the CG3 and CG4 models, showing the improvement in the generalization ability of the free energy term 𝑈 (Q). Finally, we study the generalization ability of the memory term K (Q, 𝑡). Similar to U (Q), we note that the memory term K (Q, 𝑡) generally exhibits a many-body nature Lyu and Lei (2023b); here we examine the variation of the pairwise fluctuation force 𝛿F 𝐼𝐽 𝑖 𝑗 = F 𝐼𝐽 𝑖 𝑗 − 𝐹𝑖 𝑗 (𝑄 𝐼𝐽 𝑖 𝑗 )e𝐼𝐽 𝑖 𝑗 . It can be loosely viewed as a metric of K (Q, 𝑡 = 0) = E [𝛿F ⊗ 𝛿F ]. Specifically, we evaluate the variation of 𝛿F 𝐼𝐽 𝑖 𝑗 under the equilibrium state 𝑖 𝑗 (𝑄) = Eeq (cid:104) 𝐾 eq and the nonequilibrium state 𝛿F 𝐼𝐽 𝑖 𝑗 𝑇 (cid:16) I − e𝐼𝐽 𝑖 𝑗 𝑇 e𝐼𝐽 𝑖 𝑗 (cid:17) 𝛿F 𝐼𝐽 𝑖 𝑗 (cid:12) 𝑄 𝐼𝐽 (cid:12) (cid:12) 𝑖 𝑗 = 𝑄 (cid:105) , 𝐾 neq 𝑖 𝑗 (𝑄, 𝑦) = Eneq (cid:104) 𝛿F 𝐼𝐽 𝑖 𝑗 𝑇 (cid:16) I − e𝐼𝐽 𝑖 𝑗 𝑇 e𝐼𝐽 𝑖 𝑗 (cid:17) 𝛿F 𝐼𝐽 𝑖 𝑗 (cid:12) 𝑄 𝐼𝐽 (cid:12) (cid:12) 𝑖 𝑗 = 𝑄, 𝑦𝐼 𝑖 = 𝑦𝐽 𝑗 = 𝑦 (cid:105) , where the expectation is over the molecule index 1 ≤ 𝐼, 𝐽 ≤ 𝑀. Fig. 3.4 shows the difference |𝐾 n𝑒𝑞 𝑖 𝑗 𝑖 𝑗 (𝑄)| for various CG models. Similar to the conservative CG force, the variation (𝑄, 𝑦) − 𝐾 e𝑞 of the fluctuation force exhibits pronounced heterogeneity for the CGCOM model and achieves the largest discrepancy at the locations of large shear rate (𝑦 = 0, 𝐿/2, 𝐿). In contrast, the discrepancy becomes much smaller for the CG3 and CG4 models with auxiliary CG variables. 3.3.3 Non-equilibrium flows The inconsistent reduced modeling terms shown in Sec. 3.3.2 reveal a caveat of using the standard CGCOM model for non-equilibrium processes. We examine this effect by simulating various non-equilibrium flows. 37 (cid:12) (cid:12) (cid:12) for various CG models, Figure 3.3 The pairwise conservative force difference which loosely quantifies the generalization ability of the CG free energy 𝑈 (Q). (a) CGCOM; (b) CG3 with (𝑖, 𝑗) = (1, 1); (c-f) CG4 with (𝑖, 𝑗) = (1, 1), (2, 2), · · · , (4, 4). (𝑄) − 𝐹eq 𝑖 𝑗 (𝑄, 𝑦) 𝐹neq 𝑖 𝑗 (cid:12) (cid:12) (cid:12) To establish a fair comparison, we construct the many-body form of the free energy 𝑈 (Q) and the memory term K (Q, 𝑡) with the symmetry-preserving neural network representations presented in Sec. 3.2.3 and Sec. 3.2.4 for the CGCOM, CG3 and CG4 models, respectively. To verify 𝑈 (Q), we sample the radial distribution function of the inter-molecular COMs for the CGCOM model and the intra-molecular CG coordinates for the CG3 and CG4 models. To verify K (Q, 𝑡), we sample the velocity auto-correlation function of the CG variables. As shown in the Appendix, the predictions of the three CG models show good agreement with the full MD results, which verify their efficacy for modeling equilibrium processes. Next, we consider the non-equilibrium reverse Poiseuille flow generated by an external force (3.24) with 𝑓0 = 0.01. Fig. 3.5 shows the velocity development 𝑢𝑥 (𝑦, 𝑡). Compared with the full MD results, the prediction of the standard CGCOM overestimates the instantaneous velocity magnitude by three times. Furthermore, it can not capture the development oscillation near 𝑡 = 60. This 38 5254565851015202530(a)02468yQ5254565851015202530(b)0.00.51.01.52.0×10−1yQ5254565851015202530(c)0.00.51.01.52.0×10−1yQ5254565851015202530(d)0.00.51.01.52.0×10−1yQ5254565851015202530(e)0.00.51.01.52.0×10−1yQ5254565851015202530(f)0.00.51.01.52.0×10−1yQ Figure 3.4 The difference of the variation of the fluctuation force |𝐾𝑖 𝑗 eq(𝑄, 𝑦)| for different CG models, which loosely quantifies the generalization ability of the CG memory term K (Q, 𝑡) at 𝑡 = 0. The sub-figure labels are the same as Fig. 3.3. neq(𝑄) − 𝐾𝑖 𝑗 Figure 3.5 The development of the reverse Poiseuille flow under the external force 𝑓0 = 0.01 predicted by the full MD and three CG models. 39 52545658511162126(a)0.00.51.01.52.02.53.0×102yQ52545658515202530(b)0510152025yQ52545658515202530(c)0510152025yQ52545658515202530(d)0510152025yQ52545658515202530(e)0510152025yQ52545658515202530(f)0510152025yQ0306090MDCGCOM010020003060CG3100200CG4−2−1012×10−1yt Figure 3.6 The steady state reverse Poiseuille flow under external force 𝑓0 = 0.01 predicted by the full MD and the CG models. (a) The stationary velocity profile 𝑢𝑥 (𝑦) (b) The shear-rate-dependent viscosity. limitation is rooted in the choice of the COMs as the CG variables, which ignores the intra-molecular interactions and therefore can not capture the complex visco-elastic responses under non-equilibrium processes. The prediction of the CG3 model shows significant improvement to the CGCOM model but overestimates the magnitude of the velocity oscillation during the flow development stage. On the other hand, the prediction of the CG4 model shows a good agreement with the full MD results throughout the development stage. Fig. 3.6 shows the steady state velocity profile 𝑢𝑥 (𝑦) predicted by the different models. Similar to the development stage, the prediction of the CG4 model shows a good agreement with the MD results. However, the prediction of the CGCOM and the CG3 models show apparent discrepancies due to their inefficacy in modeling the intra-molecular interactions. To quantify the difference, we compute the shear-rate-dependent viscosity from the steady velocity profile. The predictions of the CGCOM and the CG3 model are nearly independent of the shear rate, which indicates that they can not capture the visco-elastic responses associated with the molecular conformation change under external shear flow. In contrast, the CG4 model can faithfully reproduce the shear-rate-dependent viscosity of the full MD results. Finally, we examine the generalization ability of the CG models for other flow fields. While 40 10203040y0.000.050.100.150.20VelocityProfileMDCGCOMCG3CG4(a)0.01.53.04.5ShearRate×10−31020304050ShearViscosity(b) Figure 3.7 The velocity field of a 2D Taylor vortex predicted by the full MD and the CG models. (a-b) The contour of the instantaneous velocity 𝑢𝑦 (𝑥, 𝑦, 𝑡) at 𝑡 = 50 (a) and 𝑡 = 100 (b). (c) The development of velocity 𝑢𝑦 (𝑥 = 25, 𝑦, 𝑡) (d) The steady velocity field 𝑢𝑦 (𝑥, 𝑦). 41 0306090MDCGCOM0306003060CG30306090CG4−4−2024×10−2yx(a)0306090MDCGCOM0306003060CG30306090CG4−4−2024×10−2yx(b)0306090MDCGCOM010003060CG30100200CG4−8−4048×10−2yt(c)0306090MDCGCOM0306003060CG30306090CG4−2−1012×10−2yx(d) the training samples of the CG models are collected in the reverse Poiseuille flow, we validate the constructed models with the Taylor vortex generated by the external force field 𝑓𝑥 (𝑥, 𝑦) = − 𝑓0 sin (cid:19) (cid:18) 2𝜋𝑥 𝐿 cos (cid:19) (cid:18) 2𝜋𝑦 𝐿 , 𝑓𝑦 (𝑥, 𝑦) = 𝑓0 cos (cid:19) (cid:18) 2𝜋𝑥 𝐿 sin (cid:19) (cid:18) 2𝜋𝑦 𝐿 , where 𝑓0 = 1 × 10−2 and 𝐿 = 90. Fig. 3.7 shows the prediction of the 2D velocity contour from various models. Similar to the previous example, the CGCOM and CG3 models overestimate the magnitude of the velocity field due to their insufficiency in modeling the intra-molecular interactions arising from the molecular conformation change under the external flow field. In contrast, the prediction of the CG4 model yields show good agreement with the full MD model. Although the CG4 model performs well in the vortex flow, we should not view it as the “correct” (as opposed to the CGCOM) model for all non-equilibrium processes. Our main point is that the conditional PDF associated with the CG projection needs to be consistent to ensure the model’s generalization ability, which, unfortunately, can not be guaranteed for non-equilibrium processes. This issue may severely limit a CG model’s applicability to practical problems. On the other hand, properly introducing auxiliary CG variables may mitigate this inconsistency, and significantly improve the model’s generalization ability. 3.4 Conclusion This work presents a caveat in constructing reliable CG molecular dynamics models for non-equilibrium processes. Specifically, a CG model’s generalization ability relies on its consistency in the conditional PDF of the phase space vector (or equivalently, the CG projector operator) under various non-equilibrium conditions. This criterion is determined by the proper choice of the CG variables a priori and can not be remedied by constructing more accurate reduced modeling (the free energy and the memory) terms. Although the Zwanzig’s projection formalism can in principle provide the rigorous reduced dynamics, it is valid only when the distribution of the CG variables is consistent with the one associated with the CG projection operator. Unfortunately, this metric seems to be broadly overlooked in most existing studies on CGMD modeling. While the previous works (e.g. Refs. Lei et al. (2010); Hijón et al. (2010)) use dynamic properties such as the velocity auto-correlation functions and the mean squared displacement to validate the CG models, these 42 quantities are essentially defined under the marginal density distribution near-equilibrium; their applications to non-equilibrium processes are generally un-warranted. To alleviate the above challenge, this work proposes to learn a set of auxiliary CG variables based on the TICA to minimize the entropy contribution of the unresolved variables so that the conditional PDF of the full phase space vector under various conditions approaches the equilibrium case (i.e., Eq. (3.6)). We verify this generalization metric by examining the distribution of the second moment, the fluctuation force, and its variation under various external conditions. We show that the common CG model that uses the molecule’s COMs as the CG variables is generally insufficient to retain the consistent conditional PDF. In contrast, the present models with auxiliary CG variables show significant improvement. Furthermore, the crucial role of this metric is reflected in modeling the non-equilibrium reverse Poiseuille flow and the Taylor vortex. In particular, the prediction of the standard CGCOM model exhibits large discrepancies from the full MD results due to its inefficacy in modeling the intramolecular interactions under external flows different from the equilibrium conditions. Conversely, the present model can faithfully recover the complex visco-elastic responses and therefore yields consistent predictions with the full MD results. Finally, we note that the learning of the auxiliary CG variables in the present study remains somewhat empirical. They are constructed based on the TICA and hence take a linear form of the full atomistic coordinates. In practice, we may learn the CG variables as nonlinear encoders of the molecular conformation to achieve a more efficient representation of the intra-molecular interactions (e.g., see Ref. Fang et al. (2022)). We leave this for future study. 43 CHAPTER 4 CONSENSUS-BASED ENHANCED SAMPLING FOR HIGH-DIMENSIONAL FREE ENERGY SURFACES 4.1 Challenges in Constructing High-Dimensional Free Energy Surfaces Molecular dynamics (MD) provides an essential tool Frenkel and Smit (2023) to access the micro-scale insights of complex processes in many scientific applications, including chemical engineering, material synthesis, and drug design. To further predict the collective behaviors, it is often desirable to construct the free energy surfaces (FESs) with respect to a set of functional-relevant collective variables (CVs). However, the accurate construction of FES remains a practical challenge due to the high dimensionality and the prevalence of energy barriers. Several ingenious methods based on importance sampling have been developed, such as umbrella sampling Torrie and Valleau (1977b), histogram reweighting Kumar et al. (1992), metadynamics Laio and Parrinello (2002); Barducci et al. (2008); Grafke and Laio (2024), variational enhanced sampling Valsson and Parrinello (2014); Shaffer et al. (2016); Bonati et al. (2019), weighted ensemble and its variants Zhang et al. (2007); Ahn et al. (2017, 2021), hyperdynamics Voter (1997), transition matrix based unbiasing method Wu and Noé (2014), orthogonal space random walk Zheng et al. (2008) and adaptive biasing force Darve and Pohorille (2001); Darve et al. (2008); Lelièvre et al. (2008); Chipot and Lelièvre (2011). Despite the efficacy in enhancing the phase space exploration, these methods are based on iterative estimation of the biased probability density functions (PDFs) (or biased force), where the efficiency generally diminishes as the number of CVs increases. Strategies such as bias-exchange Piana and Laio (2007) and parallel bias metadynamics Pfaendtner and Bonomi (2015); Prakash et al. (2018) have been developed to alleviate this issue. Both methods factorize the biasing potential as a product of several low-dimensional functions and can work effectively when the underlying FESs admit such structures which, however, may not always be the case for high-dimensional problems. Several other methods Sugita and Okamoto (1999); Marinari and Parisi (1992); Gao (2008); Yang and Qin Gao (2009); Kim et al. (2006); Hamelberg et al. (2004); Miao et al. (2015); Lu and Vanden-Eijnden (2013); Yu et al. (2016) achieve enhanced sampling 44 irrespective of pre-defined CVs (see review Yang et al. (2019)). However, the construction of explicit high-dimensional FESs remains a challenging problem. Alternative to the histogram methods, the temperature-accelerated molecular dynamics Maragliano and Vanden-Eijnden (2006) and the adiabatic free energy dynamics Rosso et al. (2002); Abrams and Tuckerman (2008) are developed by introducing an extended dynamics of the CVs with an artificially high temperature to overcome the energy barriers. The methods avoid dealing with the biased potentials and have shown the promise of efficient exploration of the high-dimensional CV space Abrams and Vanden-Eijnden (2010). On the other hand, to further construct the FES, the sampling points are deposited in a greedy way Maragliano and Vanden-Eijnden (2008); the effectiveness for high-dimensional problems remains under-explored. With the recent advancements in machine learning, kernel methods and deep neural networks (DNNs) have demonstrated promising results in representing multi-dimensional FES Stecher et al. (2014b); Mones et al. (2016); Schneider et al. (2017); Bonati et al. (2019); Sidky and Whitmer (2018). However, as also noted in Ref. Cendagorta et al. (2020), one essential problem for constructing high-dimensional FES lies in how to optimize the training set. Ideally, to efficiently approximate a high-dimensional function, the sampling should enable certain adaptivity based on the posterior error so that the training set can be updated according to the particular underlying FES. However, such posterior error-based sampling remains largely open for FES construction since the approximation residual can neither be easily queried nor sampled within the full phase space. In practice, the training samples are often collected either as pre-defined grid points or in a heuristic manner. As the number of CVs increases, such choices may induce pronounced approximation error and suffer from performance degradation, since most phase space regimes essentially become thermodynamically inaccessible. Remarkably, the DeepVes Bonati et al. (2019) enables the efficient sampling of the CV space by jointly constructing the bias potential and the target distribution. On the other hand, the accurate reconstruction of explicit FES further requires the estimation of high-dimensional PDFs from the samples. Recent efforts Noé et al. (2019); Gabrié et al. (2022) seek the direct approximation of the transition operator instead of estimating PDFs using generative models, when representative 45 configurations are known a priori. In addition, the reinforced dynamics Zhang et al. (2018d); Wang et al. (2022) (see also Ref. van der Oord et al. (2023)) proposes an efficient approach to impose adaptivity by using an uncertainty indicator as an indirect measure of the construction error to bias MD simulations, which relies on calculating the standard deviation of the predictions from multiple DNNs trained on the same dataset. In this work, we present a consensus-based enhanced sampling (CES) method to efficiently construct high-dimensional FES for complex MD systems. A unique feature is that the method enables adaptive posterior error-based sampling such that the FES approximation and the sample points can be simultaneously optimized. Also, unlike the reinforced dynamics, the method does not rely on the training of multiple DNN representations. The main idea is to formulate the construction as a min-max problem, where the max-problem seeks a residue-based distribution to establish adaptive sampling in the vicinity of the explored phase space regime, and the min-problem optimizes the DNN parameters for the FES representation. Iteratively optimizing both the training set and the DNN representation achieves an adversarial construction of the FES pertaining to the kinetic transition regime. For the maximization step, it is worth mentioning that the query of the thermodynamically accessible phase space is non-trivial. In particular, existing approaches Tang et al. (2023); Bao et al. (2020) in the spirit of adversarial generative models can not be directly applied, since the global residual error is unknown a prior. Instead, we establish a consensus-based sampling Carrillo et al. (2022) (see also Refs. Pinnau et al. (2017); Carrillo et al. (2018, 2021); Chen et al. (2022) ) in the form of a stochastic interacting particle system governed by a McKean stochastic differential equation. A quadratic potential is adaptively constructed to probe the local maximum error regime by exploiting the Laplace approximation under a low-temperature limit. Meanwhile, a coherent noise term is introduced to efficiently explore the full CV space under a high-temperature limit and yield the target sampling points for the maximization problem, by which the minimization of the DNN residue can be solved jointly. We demonstrate the effectiveness of the proposed method by constructing the FES of several biomolecule systems with the number of CVs up to 30. Fig. 4.1 sketches the workflow of the proposed method. 46 Figure 4.1 The workflow of the present CES-based method. In the minimization step, given a collection of sampling points, the reference force (i.e., the gradient of the underlying FES) can be calculated using the restrained dynamics; a comparison with the force inferred from the DNN approximation yields a posterior error. In the maximization step, the posterior error determines a residue-based distribution with entropy regularization. An interacting particle system following McKean stochastic dynamics is used to achieve an adaptive sampling of the max-residue regime by modulating the exploitation of the Laplace approximation of the current residue-based distribution and the exploration of the uncharted phase space. The FES can be accurately reconstructed after several iterations of the minimization and maximization step. 4.2 Consensus-Based Sampling Methodology 4.2.1 Free energy and mean forces We consider a full model with micro-scale coordinates r ∈ R𝑁 whose dynamics is governed by potential 𝑈 (r) : R𝑁 → R under temperature 𝑇. Suppose we are interested in CVs s(r) : R𝑁 → Γ with Γ ⊂ R𝑀, the FES 𝐴(z) of the CVs is defined by 𝐴(z) = − 1 𝛽 ln 𝜌(z), where 𝛽 = 1/𝑘 𝐵𝑇 is the inverse of the thermal temperature, 𝜌(z) = ∫ 1 𝑍 exp (−𝛽𝑈 (r))𝛿(s(r) − z)dr (4.1) (4.2) is the marginal density for s(r) = z, 𝑍 = ∫ exp (−𝛽𝑈 (r))dr is the partition function, and 𝛿(·) represents the Dirac delta function, which we refer readers to Stoltz et al. (2010) for more detailed 47 MinimizationstageMaximizationstageReferenceFESInferredDNNPosteriorErrorQuadraticApproximationEnhanceSampling explaination. For high-dimensional CVs, the direct estimation of 𝜌(z) often becomes numerically challenging. An alternative approach is to fit the mean force F(z) := −∇𝐴(z) at various sample points, which can be estimated via the restrained dynamics Allen and Tildesley (2017) by introducing a harmonic term into the full potential, i.e., 𝑈𝑘 (r, z) = 𝑈 (r) + 𝑘 2 (s(r) − z)𝑇 (s(r) − z), (4.3) where 𝑘 represents the magnitude of the restrained potential. As shown in Ref Maragliano and Vanden-Eijnden (2006), the mean force can be computed by F(z) = lim𝑘→∞ F𝑘 (z), where F𝑘 (z) is defined by F𝑘 (z) = ∫ 1 𝑍𝑘 (z) 𝑘 (s(r) − z) exp (−𝛽𝑈𝑘 (r, z))dr, (4.4) and can be sampled as the local first-moment estimation. In principle, given a collection of sample points z, 𝐴(z) can be re-constructed (up to a constant) by matching the mean force −∇𝐴(z) at the individual points. However, for complex MD systems, the sampling over the phase space could be highly non-trivial due to the prevalence of local energy minima; the training set is often determined a prior as pre-selected points or in a greedy manner. As the number of CVs increases, the empirical random samples may introduce pronounced discretization error of the loss function over the full phase space. To efficiently construct 𝐴(z) in the thermodynamically accessible regime, it is desirable to simultaneously optimize the training set and the FES approximation through certain adaptive sampling based on the posterior residual error. This motivates the present method illustrated below. 4.2.2 Min-Max formulation Let 𝐴N (z) denote the DNN representation of the FES 𝐴(z), which is parameterized by minimizing the loss function LN (z) = ∥∇z 𝐴N (z) + F(z)) ∥2 (4.5) for z ∈ Γ. To solve the problem, we introduce a sampling distribution 𝑞(z) and define the weighted loss (LN , 𝑞) = ∫ Γ LN (z)𝑞(z)dz. 48 (4.6) A desired distribution intends to maximize the discrepancy in the dataset for a given network 𝐴N (z). Accordingly, we define the maximum problem as 𝐺 [LN ] = max 𝑞 (LN , 𝑞). (4.7) Since (LN , 𝑞) is always non-negative, a good approximation of the original free energy surface 𝐴N (z) (up to a constant) can be obtained by solving the following problems min 𝐴N max 𝑞 (LN , 𝑞). (4.8) Proposition 4.2.1. Assuming that there exists a solution of LN (z) = 0 for z ∈ Γ, then 𝐴∗ is a solution if and only if it solves (4.8). Proof. Suppose 𝐴∗ is the solution for LN (z) = 0, it satisfies (LN ∗, 𝑞) = 0 for any 𝑞 ∈ 𝑊, i.e., 𝐺 [LN ∗] = 0. Therefore, 𝐴∗ is a solution for the minimax problem (4.8). On the other hand, if ˆ𝐴 is the minimizer for problem (4.8) but not the solution for LN (z) = 0, then there exists ˆ𝑞 ∈ 𝑉 such that (L ˆN , ˆ𝑞) > 0. However, (LN ∗, 𝑞) = 0 for all 𝑞, which contradicts the assumption that ˆ𝐴 is the minimizer. □ Proposition 4.2.1 shows that the direct construction of FES 𝐴(z) can be reformulated as an adversarial learning of the optimal solution 𝐴N (z) for the min-max problem (4.8). Accordingly, the training consists of two components: the minimization part optimizes the DNN representation with the current training set; the maximization part explores the regime of the largest residue for the current DNN representation and essentially establishes an adaptive sampling of the training set based on the posterior error. To numerically solve the max-problem, certain regularization needs to be introduced. Otherwise, the max-problem will simply yield a delta measure, i.e., 𝛿(z − z∗), where z∗ = arg max LN (z). Since the sampling needs to simultaneously identify the max-residual regime and explore the uncharted phase space, we introduce the entropy-based regularization Wang et al. (2020a); Gao et al. (2022) (see also Refs. Gulrajani et al. (2017); Miyato et al. (2018); Tang et al. (2023) for gradient-based 49 regularization), and the max-problem is reformulated by ∫ min 𝑞 (−LN (z) + 𝜅−1 ℎ ln 𝑞(z))𝑞(z)dz. (4.9) The problem is convex for a PDF 𝑞 with a unique global minimum at 𝑞∗(z) = exp(−𝜅ℎL− where 𝑍 ∗ = ∫ exp(−𝜅ℎL− N (z))/𝑍 ∗, N (z) = −LN (z). The parameter 𝜅ℎ is a Lagrangian N (z))dz and L− multiplier, balancing the focus between the peak concentration and the scope of exploration, and is somewhat similar to the inverse temperature in statistical mechanics. An elevated 𝜅−1 ℎ induces a distribution closer to a uniform distribution. Conversely, a diminished 𝜅−1 ℎ induces a concentrated distribution near the max-residue point. However the analytical formula of 𝑞∗(z) futher depends on the accurate FES, the direct sampling remains challenging for high-dimensional CVs. Inspired by the consensus-based sampling method Carrillo et al. (2022), we establish a stochastic particle system governed by the McKean stochastic differential equation. By properly constructing the conservative potential and noise terms, the particle distribution provides a good approximation of 𝑞∗ as illustrated below. 4.2.3 Exploitation and exploration in the max-problem To approximate the target distribution 𝑞∗(z), particularly in the vicinity of the max-residual point z∗, we exploit Laplace’s principle in the large deviations theory, i.e., (cid:18)∫ (cid:18) − 1 𝜅 log lim 𝜅→∞ exp (−𝜅 𝑓 (z))d𝜌∗(z) (cid:19)(cid:19) = 𝑓 (z∗) (4.10) holds true for any compactly supported probability measure 𝜌∗, where z∗ ∈ supp(𝜌∗) uniquely minimizes the function 𝑓 . This enables us to identify the max-residual point from a collection of samples (cid:8)z𝑖(cid:9) 𝑁𝑤 𝑖=1 by the first-order momentum m under the weighted density function 𝑝(z), i.e., ∫ m = z𝑝(z)dz ≈ 𝑁𝑤 ∑︁ 𝑖=1 z𝑖 ˆ𝑝(z𝑖) ˆ𝑝(z) = N (z)) exp (−𝜅𝑙 L− 𝑖=1 exp (−𝜅𝑙 L− N (z𝑖)) (cid:205)𝑁𝑤 (4.11) where 𝜅−1 𝑙 represents a low temperature limit. However, the integration is subject to the so-called curse of dimensionality as the number of CVs increases. Instead, we treat sampler z𝑖 as a random walker z𝑖 𝑡 governed by the following McKean stochastic differential equation (cid:164)z𝑖 𝑡 = − 1 𝛾 ∇z𝐺 (z𝑖 𝑡) + √︄ 2 𝜅ℎ𝛾 𝜉𝑖 (𝑡), 50 (4.12) where 𝐺 (z𝑡) = 1 2 (z𝑡 − m𝑡)𝑇𝑉 −1 𝑡 (z𝑡 − m𝑡) denotes an adaptively constructed conservative potential function. The formulations of m𝑡, 𝑉𝑡 are specified in (4.13) with the rationale discussed in the next section. Consequently, 𝐺 (z𝑡) navigates the random walkers (i.e., individual particles) towards m𝑡, which represents the region of large residual error. The second term in Eq. (4.12) represents a stochastic term where 𝛾 represents the friction coefficient and 𝜉 (𝑡) represents the standard Gaussian white noise characterized by zero mean and covariance E[𝜉𝑖 (𝑡)𝜉 𝑗 (𝑡′)] = 𝛿𝑖 𝑗 𝛿(𝑡 − 𝑡′). The coupling of the conservative and stochastic terms maintains a relatively high temperature ℎ , and a large friction coefficient 𝛾 is applied such that the distribution is almost always Gaussian 𝜅−1 during the evolution. As shown in the following section, the distribution of walkers 𝑞𝑡 (z) converges to ∝ exp(−𝜅ℎ𝐺 (z)) characterized by m∞ and 𝑉∞. Accordingly, the balance between exploitation and exploration is controlled using two temperatures 𝜅−1 𝑙 and 𝜅−1 ℎ . As 𝜅−1 𝑙 decreases, the random walker distribution concentrates near the max-residual points of the current model, reflecting the role of exploitation. Conversely, as 𝜅−1 ℎ increases, the distribution smoothens progressively, enhancing the exploration of the uncharted regions. 4.2.4 Adaptive parameter estimation and FES construction In this section, we show that the sampling distribution governed by Eq. (4.12) converges to a steady state as an approximation of 𝑞∗ with the proper choices of m𝑡 and 𝑉𝑡 given by m𝑡 = 𝑁𝑤 ∑︁ 𝑖=1 𝑡 ˆ𝑝(z𝑖 z𝑖 𝑡), 𝑉𝑡 = 𝜅𝑡 𝑁𝑤 ∑︁ 𝑖=1 (z𝑖 𝑡 − m𝑡) (z𝑖 𝑡 − m𝑡)𝑇 ˆ𝑝(z𝑖 𝑡), (4.13) where ˆ𝑝(z) is defined by (4.11). In particular, with 𝜅𝑡 = 𝜅𝑙 + 𝜅ℎ, (4.12) converges to the distribution based on the target residues. Proposition 4.2.2. Suppose L− 𝜇), 𝑞𝑡 → exp (−𝜅ℎL − ∫ exp (−𝜅ℎL − N (z)) N (z))dz as 𝑡 → ∞, given 𝜅𝑡 = 𝜅𝑙 + 𝜅ℎ. N (z) takes a local quadratic approximation in form of 1 2 (z−𝜇)𝑇 Σ−1(z− We refer to SI for the proof. The stochastic dynamics (4.12) and (4.13) differs from the regular Langevin dynamics; it is nonlocal and similar to the one in Carrillo et al. (2022) except that the 51 parameter 𝜅ℎ appears in the target distribution that modulates the exploration of the sampling dynamics. The quadratic assumption of the loss function L− N (z) is due to the fact that we are mainly interested in the regime near the max-residue point. Under the low-temperature limit, the local regime can be well characterized by the first and second moments following Laplace’s principle. While m identifies the extremal point, 𝑉 recognizes the anisotropic nature among the different CVs. In this study, for the sake of computational efficiency, we further simplify 𝑉 by only considering the diagonal entries denoted as v. Moreover, we utilize a moving average to ensure stable estimation m𝑡+1 = 𝛽1m𝑡 + (1 − 𝛽1) 𝑁𝑤 ∑︁ 𝑡 ˆ𝑝(z𝑖 z𝑖 𝑡), v𝑡+1 = 𝛽2v𝑡 + (1 − 𝛽2)𝜅𝑡 𝑖=1 𝑁𝑤 ∑︁ (z𝑖 𝑡 − m) ⊙ (z𝑖 𝑡 − m) ˆ𝑝(z𝑖 𝑡), (4.14) 𝑖=1 where 𝛽1 and 𝛽2 are hyper-parameters. To maintain unbiased estimation, normalization is implemented as follows: m = m𝑡 1−𝛽𝑡 1 and v = v𝑡 1−𝛽𝑡 2 . With the training samples obtained from the aforementioned maximization step, the DNN representation of the FES is optimized using the Adam stochastic gradient descent method Kingma and Ba (2015) for the minimization step. The loss function of the updated DNN representation 𝐴N (z), in turn, navigates the consensus-based adaptive sampling for the updated maximization step. The min-max problem is solved iteratively to achieve comprehensive sampling of the full phase space. So far, the construction process is based on the function approximation of 𝐴(z) over the full regime. However, we note that the kinetic processes of a molecular system are generally characterized by the local minima and saddles points on the FES. On the other hand, the regimes of high free energy are less relevant. To accurately construct these thermodynamically accessible regimes, we modify the loss function L as LN (z) = ∥∇z 𝐴N (z) + F(z) ∥2 ∥F(z) ∥2 + 𝑒 , (4.15) for all the biomolecule systems except the toy example of the 1D Rastrigin function. 𝑒 is a small value to regularize the denominator. We note that a similar formulation is used in Ref. Maragliano and 52 Vanden-Eijnden (2008) to quantify the model accuracy after the FES is constructed. Alternatively, in this study, LN (z) directly involves the construction process for the adaptive sampling of the training set. A detailed algorithm is presented in Algorithm 4.1. Algorithm 4.1 Consensus-based enhance sampling. Require: Initial sampling point z𝑖 0, for 𝑖 = 1, . . . , 𝑁𝑤 Require: Initial DNN parameter 𝜃0 Require: The number of training iterations 𝑁𝑡𝑟𝑎𝑖𝑛 Require: The number of data collected 𝑁𝑑𝑎𝑡𝑎 in each training iteration 𝑗 ← 0, 𝑡 ← 0 𝑇 ← ⌈ 𝑁𝑑𝑎𝑡 𝑎 ⌉ 𝑁𝑤 while 𝑗 < 𝑁𝑡𝑟𝑎𝑖𝑛 do while 𝑡 ≤ 𝑇 do 𝑡) = ∇z 𝐴N (z𝑖 𝑡; 𝜃 𝑗 ) 𝑡 at z𝑖 𝑡 calculate the mean force F𝑖 calculate the predicted force F𝜃 (z𝑖 𝐿𝑖 ← LN (z𝑖 𝑡) 𝑤𝑖 ← exp (𝜅𝑙 𝐿𝑖) (cid:205)𝑖 exp (𝜅𝑙 𝐿𝑖) m𝑡+1 ← 𝛽1m𝑡 + (1 − 𝛽1) (cid:205)𝑖 z𝑖 v𝑡+1 ← 𝛽2v𝑡 + (𝜅𝑙 + 𝜅ℎ)(1 − 𝛽2) (cid:205)𝑖 (z𝑖 m ← m𝑡+1 1−𝛽𝑡 1 v ← v𝑡+1 1−𝛽𝑡 2 z𝑖 ← z𝑖 𝑡+1 𝑡 ← 𝑡 + 1 𝑡 − m) ÷ v + √︃ 2𝛼 𝛾𝜅ℎ 𝑡 − 𝛼 𝛾 (z𝑖 𝑡𝑤𝑖 𝑡 − m𝑡)2𝑤𝑖 𝜂𝑡, 𝜂𝑡 ∼ N (0, 1) end while Save the training dataset D 𝑗 = {z𝑖 Optimize 𝜃 𝑗+1 using the generated training set D𝑙 for 𝑙 = 0, . . . , 𝑗. 𝑗 ← 𝑗 + 1 𝑡 }𝑇 𝑡=0 𝑡, F𝑖 end while 4.3 Application of Consensus-Based Sampling to Biomolecular Systems 4.3.1 One-dimensional Rastrigin function To illustrate the essential idea of the present method, we start with the 1-dimensional Rastrigin function: 𝑓 (𝑥) = 𝑥2 − cos(2𝜋𝑥), 𝑥 ∈ [−3, 3]. (4.16) Instead of a neural network, we simply use a piecewise polynomial function 𝑓𝜃 (𝑥) for this 1D 53 Figure 4.2 Adaptive sampling and construction of the 1D Rastrigin function. Left: The reference function 𝑓 (𝑥) and the constructed approximations 𝑓𝜃𝑖 (𝑥) obtained at different iteration steps. The relative 𝑙2 error is less than 6 × 10−3 after 12 iterations. Right: The residual function | 𝑓 (𝑥) − 𝑓𝜃𝑖 (𝑥)|. The symbols represent the locations identified by each adaptive sampling (i.e., the maximization) step where new points will be added for the next training (i.e., the minimization) step. problem, where 𝜃 represents the fitting parameters. Accordingly, the residual is directly defined as | 𝑓 − 𝑓𝜃𝑖 |. Initially, we set 𝑓𝜃0 (𝑥) ≡ 8 with consistent boundary condition. We use the proposed sampling method with 10 walkers to estimate the first and second moments, yielding 𝑚1 = 1 × 10−4 and 𝑉1 = 0.022 and therefore the first and second derivative 𝑓 ′(𝑚1) = 0 and 𝑓 ′′(𝑚1) = 1 𝑉1 = 40.97. This value is extremely close to the actual minimizer at 0 (i.e., the max-residue point) and the second derivative 𝑓 ′′(𝑚1) = 41.47. Accordingly, we add new data points near 𝑥 = 1 × 10−4 into the training set consisting of the boundary points 𝑓 (−3) = 𝑓 (3) = 8, which enables us to construct an updated approximation 𝑓𝜃1 (𝑥). Similarly, with the approximation 𝑓𝜃𝑖−1 (𝑥), we conduct another sampling process (i.e., the maximization step) and include the obtained samples near 𝑚𝑖 into the training set, yielding an updated approximation 𝑓𝜃𝑖 (𝑥). As shown in Fig. 4.2, for each iteration, the sampling step can pinpoint the regime where the error is most pronounced. Furthermore, as shown in the Supplementary Information (SI), the second momentum near the maximum region can be estimated correctly as well. The underlying function 𝑓 (𝑥) can be fully recovered after 12 iterations. 54 2.50.02.501020302.50.02.5302010001312 4.3.2 Two-dimensional FES We use the alanine dipeptide (Ace-Ala-Nme), referred to as Ala2, as a benchmark problem. The molecule is solvated in 383 TIP3P water molecules similar to Ref. Zhang et al. (2018d). The full MD system is simulated in a canonical ensemble under temperature 300𝐾 using the Amber99-SB force field Hornak et al. (2006) with a time step is 2 fs. We refer to the SI for the simulation details. We choose the CVs as the torsion angles: 𝜙 (C, N, C𝛼, C) and 𝜓 (N, C𝛼, C, N). For comparison, we also construct the FESs using the VES Bonati et al. (2019) and RiD Zhang et al. (2018d) method with the same setup parameters presented therein. For the CES method, we use 10 walkers to explore the configuration space. The initial points of the walkers at the 𝑘th iteration are chosen to be the final 10 points of a biased dynamics with the bias potential −𝐴N (z; 𝜃 𝑘−1). In the sampling stage, the inverse of the low and high temperatures are set to be 𝜅𝑙 = 10 and 𝜅ℎ = 1, respectively. For each sample point, restrained dynamics is conducted with 𝜅 = 500 for 5000 steps to compute the average force. The timestep 𝛼 for the dynamics of the random walkers (4.12) is set to be 0.1. Fig. 4.3 shows the FESs constructed by the three different methods and the reference obtained by the metadynamics Laio and Parrinello (2002) using a long simulation time. As presented in Fig. 4.3e, the CES method yields smaller approximation error and meanwhile requires lower computational cost. The better performance is not unexpected since the sample points can be adaptively optimized based on the construction error. It is worth mentioning that the VES method was initially developed to achieve efficient enhanced sampling in the high-dimensional phase space; it is somewhat unfair to use it for direct FES construction since the estimation of the probability density function from the obtained samples could be numerically challenging for high-dimensional cases. On the other hand, the present CES method enables the explicit construction of FES if the analytical form is needed. Also, we note that the accuracy of the RiD method can be further improved using more iterations at the cost of larger simulation and training overhead. Below, we test the methods in more complex systems. 55 Table 4.1 The accuracy of the constructed 2D FES (the Ala2 molecule) and computational time (in hours, the same below) for the VES, RiD, and CES methods. The 𝑙2 and 𝑙∞ error are computed up to 40 KJ/mol. The reference solution is constructed by the metadynamics. The simulation time of the CES method is multiplied by 10 since 10 walkers are used. Method VES RiD CES Accuracy 𝑙2 error 5.39 3.15 𝑙∞ error Simulation 21.03 11.04 17.98 Time 47.5 Train 1.88 10.68 0.23 × 10 0.22 (GPU) 0.18 (CPU) 0.13 (GPU) Figure 4.3 The 2D FES for the Ala2 molecule on the 𝜙 − 𝜓 plane constructed by (a) Metadynamics (reference) (b) VES (c) RiD (d) CES (the present method). The accuracy and the computational cost are shown in Table 4.1. 4.3.3 Two-dimensional FES 4.3.4 Three-dimensional FES Next, we consider a s-(1)-phenylethyl (s1pe) peptoid in an aqueous environment similar to Refs. Weiser and Santiso (2019); Wang et al. (2020a). The full system consists of one biomolecule and 546 water molecules in a (2.9n𝑚)3 dodecahedron box. The CHARMM general force field (CGenFF) Weiser and Santiso (2019) is used for the biomolecule and the TIP3P model Jorgensen et al. (1983) is used for the water molecules. The system temperature is set to be 298 K with a time step is 2 fs. 56 −202(a)(b)−202−202(c)−202(d)010203040 We refer to the SI for details. The CVs are the three torsion angles 𝜔 spanned by atoms (C𝛼, C, N, C𝛼), 𝜙, and 𝜓, where the latter two are the same as the Ala2 molecule. The FES is constructed by both the RiD and CES methods. The setup of the RiD method is the same as Wang et al. (2020a). For the CES method, we use 20 walkers and set the inverse of the low and high temperatures to be 𝜅𝑙 = 10 and 𝜅ℎ = 2, respectively. The initial points of these walkers at each iteration are chosen in the same method as the two-dimensional problem. The timestep 𝛼 for the dynamics of the random walkers (4.12) is set to be 0.1. For each sample point, restrained dynamics with 𝜅 = 500 is conducted for 10000 steps to compute the mean force. For visualization, we project the constructed FES onto a two-dimensional plane and fix the third variable. Fig. 4.4 shows the projected FES on the 𝜔 − 𝜙 and 𝜙 − 𝜓 plane obtained from the CES and the RiD methods. For each projection, the reference is constructed as a 2D FES using the metadynamics Laio and Parrinello (2002). Similar to the previous 2D case, the present CES method yields higher accuracy with lower computational cost. 4.3.5 Nine-dimensional FES Furthermore, we consider a more complex molecule, the peptoid trimer (s1pe)3, solvated in a (4.2 nm)3 dodecahedron box with 1622 TIP3P water molecules. The force field and other simulation setups are similar to the s1pe molecule. The chosen CVs consist of the 9 torsion angles 𝜔, 𝜙, 𝜓 that are defined in s1pe case, associated with the different C𝛼 atoms and denoted as 𝜔1, 𝜙1, 𝜓1, 𝜔2, 𝜙2, 𝜓2, 𝜔3, 𝜙3, 𝜓3. We use 64 walkers for this case and the initial conditions of the walkers at each iteration are chosen with the same method as the previous cases. The inverse of the low and the high temperature are set to be 𝜅𝑙 = 100 and 𝜅ℎ = 2, respectively. The timestep 𝛼 of the dynamics of the random walkers is set to be 0.1. The FES is constructed by the CSE method using 28 iterations of sampling and training, which requires 225.53 × 64 = 14434.35 CPU hours for simulation and 6.06 GPU hours for training. For comparison, the RiD method uses 17900 CPU hours for simulation and 15.44 GPU hours for training. Similar to the above 3D problem, the constructed 9D FES is projected onto various 2D 57 Figure 4.4 The 3D FES for molecule s1pe projected on the 𝜔 − 𝜙 with 𝜓 = 1.5 (first row) and 𝜙 − 𝜓 planes with 𝜔 = 1.5 (second-row). (a-d) The 2D FES constructed using metadynamics with the third variable restrained (reference); (b-e) projection of the 3D FES constructed by the RiD method; (c-f) projection of the 3D FES constructed by the present CES method. The CES method requires 4.81 × 20 CPU hours for sampling and 0.84 GPU hours for training, whereas the RiD method uses 423.33 CPU hours and 8 GPU hours, respectively. For a more detailed quantitative analysis, we refer to the SI. planes with the remaining variables fixed. For each 2D projection, the reference is constructed as a 2D FES using metadynamics. Fig. 4.5 shows the projection on the 𝜔1 − 𝜙1 and 𝜔1 − 𝜓1 plane (see SI for visualization of other 2D projections). Compared with the Rid method, the present CES method yields higher accuracy with lower computational cost. The different performances could be possibly due to the distinct ways of imposing the sampling adaptivity for the two methods. Specifically, the Rid method trains a replica of DNNs on the same sample set and uses the standard deviation of multiple DNNs’ predictions as an indirect measure of the construction error. The significant sampling error of the mean force (i.e., −∇𝐴(z)) could be overlooked by using the standard deviation as the uncertainty indicator, and therefore may further propagate into the construction error. For instance, a biased sampling of mean force may lead to a consistent biased prediction among multiple DNNs. As a result, even if the standard deviation of multiple DNNs’ predictions for a sample point is small, the construction error could be still 58 202(a)202202(d)(b)202(e)(c)202(f)010203040 Figure 4.5 The 9D FES for molecule (s1pe)3 projected on the 𝜔1 − 𝜙1 (first row) and 𝜔1 − 𝜓1 plane (second row). (a-d) The 2D FES constructed using metadynamics with the remaining variable restrained (reference); (b-e) projection of the 9D FES constructed by the RiD method using 17900 CPU hours for sampling and 15.44 GPU hours for training; (c-f) projection of the 9D FES constructed by the CES method using 14434.35 CPU hours and 6.06 GPU hours, respectively. pronounced in that region. Instead, the CES method directly uses the construction error to impose the adaptivity and therefore achieve enhanced sampling of the regions lacking accuracy. 4.3.6 Thirty-dimensional FES Finally, we consider the polyalanine-15 (Ace-(Ala)15-Nme), denoted as Ala16, solvated in 2258 water molecules. The full system is simulated temperature 300K in a (4.62 nm)3 dodecahedron box using the Amber99-SB force field with a time step is 2 fs. We refer to the SI for the simulation details. The chosen CVs consist of torsion angles 𝜙 and 𝜓, defined in Ala2 case, associated with the different C𝛼s, denoted as {𝜙𝑖, 𝜓𝑖}15 𝑖=1. We use 64 walkers for this case and the initial value of the walkers at each iteration is given by the biased simulation before. The inverse of the low and high temperatures are set to be 𝜅𝑙 = 20 and 𝜅ℎ = 5, respectively. The timestep 𝛼 for dynamics of the random walkers Eq. (4.12) is set to be 0.1. The FES is constructed using 100 iterations of sampling and training. Similar to the previous case, the obtained FES is plotted on a two-dimensional plane 59 202(a)202202(d)(b)202(e)(c)202(f)0102030405060 Figure 4.6 The 30D FES for molecule Ala16 projected on various 2D planes. (a-d) 𝜙1 − 𝜓1 (b-e) 𝜙2 − 𝜙3 (c-f) 𝜓2 − 𝜓3. (a-b-c) 2D FES constructed by metadynamics (reference) (d-e-f) The 30D FES constructed by the present CES method. while the remaining variables are fixed and the reference is constructed as a 2D FES using the metadynamics. Fig. 4.6 shows the projection on the 𝜙1 − 𝜓1, 𝜙2 − 𝜙3 and 𝜓2 − 𝜓3 plane. For all the cases, the projection of the 30-dimensional FES shows good agreement with the 2-dimensional reference solution. We have also examined the projection on other planes; the prediction shows good agreement with the reference solution as well. We refer to the SI for details. 4.4 Conclusions We have presented a consensus-based approach for constructing high-dimensional FESs by reformulating the construction task as a minimax optimization problem. Rather than seeking the direct fitting, this method essentially establishes an adversarial learning of FESs by simultaneously optimizing the target function approximation and the training set. While the common approaches mainly focus on the efficient exploration of the phase space in the presence of local minima, the present method further accounts for the discretization error that has been broadly overlooked in FES construction. Adaptive sampling of the max-residue regime is achieved through the consensus-based sampling of a posteriori residue-induced distribution in the form of a stochastic particle system in 60 202(a)202202(d)(b)202(e)(c)202(f)0102030405060 the CV space. Given the fact the sampling only relies on the first and second-moment estimation, the method could be particularly efficient for high-dimensional problems. While the numerical results of biomolecular systems have demonstrated the effectiveness for FES construction, the present framework of unifying the residual minimization and max-residue enhanced sampling is quite general for model reduction of complex systems, e.g., the stochastic reduced dynamics (e.g., see Ref. She et al. (2023)) with a state-dependent memory for multiscale physical systems. 61 CHAPTER 5 CONCLUSION AND OUTLOOK In this thesis, we aim to address the challenges of mesoscale modeling by tackling the problem from three key perspectives: model representation, collective variable selection, and data collection. By integrating these techniques, we strive to develop a comprehensive framework for constructing faithful models that accurately capture phenomena across different scales. In model representation part, we propose a machine learning-based memory kernel to model the many-body nature of dissipative interactions. This approach is designed to respect physical constraints, such as symmetry, and ensures thermodynamic consistency by strictly satisfying the second fluctuation-dissipation theorem. By embedding these fundamental principles into the model, we aim to achieve accurate and physically meaningful representations of complex systems. In collective variable selection part, we introduce a data-driven method to identify collective variables that capture the principal differences between distributions under various nonequilibrium conditions. By treating these differences as CVs, our approach can generalize across different external force fields, making it adaptable to a wide range of nonequilibrium scenarios. This step is crucial for reducing the dimensionality of the problem while preserving the essential dynamics of the system. In data collection section, to address the challenges of sampling in high-dimensional phase spaces, we reformulate the problem as a minimax optimization. This approach balances exploration of uncharted regions of the phase space with exploitation of areas where the model exhibits high error. By iteratively refining the training data, we ensure that the sampling process is both comprehensive and focused on regions critical for improving model accuracy. By integrating these three perspectives, we aim to develop a unified framework for mesoscale modeling that bridges the gap between microscale interactions and macroscale dynamics. These challenges are not confined to modeling problems between the microscale and mesoscale. We also aim to extend our framework to address modeling challenges between the mesoscale and macroscale, such as non-Fourier heat conduction and the diffusion of active materials. In these 62 applications, we strive to develop models that are not only physically consistent and accurate but also mathematically well-posed. While this is relatively straightforward for ordinary differential equations or stochastic differential equations for models we construct here, it becomes significantly more challenging for partial differential equations or stochastic partial differential equations. We hope to address this by carefully designing the model structure to guarantee existence, uniqueness, and stability of solutions, which are essential for reliable simulations and predictions. Recent advancements in generative models, such as generative adversarial networks (GANs), variational autoencoders (VAEs), and diffusion models, have demonstrated remarkable capabilities in modeling and sampling from high-dimensional distributions. Despite the explicit connection with the sampling problems we presented ahead, a natural and intriguing question arises: Can these generative models be extended to generate high-dimensional stochastic processes? Extending generative models to generate high-dimensional stochastic processes that simultaneously satisfy physical symmetries and thermodynamic consistency is a challenging but highly promising direction. 63 BIBLIOGRAPHY Abrams, C. F. and Vanden-Eijnden, E. (2010). Large-scale conformational sampling of proteins using temperature-accelerated molecular dynamics. Proceedings of the National Academy of Sciences, 107(11):4961–4966. Abrams, J. B. and Tuckerman, M. E. (2008). Efficient and direct generation of multidimensional free energy surfaces via adiabatic dynamics without coordinate transformations. The Journal of Physical Chemistry B, 112(49):15742–15757. Ahn, S.-H., Grate, J. W., and Darve, E. F. (2017). Efficiently sampling conformations and pathways using the concurrent adaptive sampling (cas) algorithm. The Journal of Chemical Physics, 147(7). Ahn, S.-H., Ojha, A. A., Amaro, R. E., and McCammon, J. A. (2021). Gaussian-accelerated molecular dynamics with the weighted ensemble method: A hybrid method improves thermodynamic and kinetic sampling. Journal of Chemical Theory and Computation, 17(12):7938–7951. Allen, M. P. and Tildesley, D. J. (2017). Computer simulation of liquids. Oxford university press. Anderson, P. W. (1972). More is different: Broken symmetry and the nature of the hierarchical structure of science. Science, 177(4047):393–396. Baczewski, A. D. and Bond, S. D. (2013). Numerical integration of the extended variable generalized Langevin equation with a positive Prony representable memory kernel. The Journal of chemical physics, 139(4):044107. Bao, G., Ye, X., Zang, Y., and Zhou, H. (2020). Numerical solution of inverse problems by weak adversarial networks. Inverse Problems, 36(11):115003. Barducci, A., Bussi, G., and Parrinello, M. (2008). Well-tempered metadynamics: a smoothly converging and tunable free-energy method. Physical review letters, 100(2):020603. Baxevani, G., Harmandaris, V., Kalligiannaki, E., and Tsantili, I. (2023). Bottom-up transient time models in coarse-graining molecular systems. Multiscale Modeling & Simulation, 21(4):1746–1774. Behler, J. and Parrinello, M. (2007). Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett., 98:146401. Best, R. B. and Hummer, G. (2010). Coordinate-dependent diffusion in protein folding. Proceedings of the National Academy of Sciences, 107(3):1088–1093. Bonati, L., Zhang, Y.-Y., and Parrinello, M. (2019). Neural networks-based variationally enhanced sampling. Proceedings of the National Academy of Sciences, 116(36):17641–17647. 64 Bussi, G., Donadio, D., and Parrinello, M. (2007). Canonical sampling through velocity rescaling. The Journal of chemical physics, 126(1):014101. Carrillo, J. A., Choi, Y.-P., Totzeck, C., and Tse, O. (2018). An analytical framework for consensus-based global optimization method. Mathematical Models and Methods in Applied Sciences, 28(06):1037–1066. Carrillo, J. A., Hoffmann, F., Stuart, A. M., and Vaes, U. (2022). Consensus-based sampling. Studies in Applied Mathematics, 148(3):1069–1140. Carrillo, J. A., Jin, S., Li, L., and Zhu, Y. (2021). A consensus-based global optimization method for high dimensional machine learning problems. ESAIM: Control, Optimisation and Calculus of Variations, 27:S5. Cendagorta, J. R., Tolpin, J., Schneider, E., Topper, R. Q., and Tuckerman, M. E. (2020). Comparison of the performance of machine learning models in representing high-dimensional free energy surfaces and generating observables. The Journal of Physical Chemistry B, 124(18):3647–3660. Ceriotti, M., Bussi, G., and Parrinello, M. (2009). Langevin equation with colored noise for constant-temperature molecular dynamics simulations. Physical review letters, 102(2):020601. Chen, J., Jin, S., and Lyu, L. (2022). A consensus-based global optimization method with adaptive momentum estimation. Communications in Computational Physics, 31(4):1296–1316. Chen, K. K., Tu, J. H., and Rowley, C. W. (2012). Variants of dynamic mode decomposition: boundary condition, koopman, and fourier analyses. Journal of nonlinear science, 22:887–915. Chipot, C. and Lelièvre, T. (2011). Enhanced sampling of multidimensional free-energy landscapes using adaptive biasing forces. SIAM Journal on Applied Mathematics, 71(5):1673–1695. Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., and Müller, K.-R. (2017). Machine learning of accurate energy-conserving molecular force fields. Science advances, 3(5):e1603015. Coifman, R. R., Kevrekidis, I. G., Lafon, S., Maggioni, M., and Nadler, B. (2008). Diffusion maps, reduction coordinates, and low dimensional representation of stochastic systems. Multiscale Modeling & Simulation, 7(2):842–864. Crommelin, D. and Vanden-Eijnden, E. (2011). Diffusion estimation from multiscale data by operator eigenpairs. Multiscale Modeling & Simulation, 9(4):1588–1623. Crosskey, M. and Maggioni, M. (2017). Atlas: a geometric approach to learning high-dimensional stochastic systems near manifolds. Multiscale Modeling & Simulation, 15(1):110–156. Daldrop, J. O., Kowalik, B. G., and Netz, R. R. (2017). External potential modifies friction of 65 molecular solutes in water. Physical Review X, 7(4):041065. Darve, E. and Pohorille, A. (2001). Calculating free energies using average force. The Journal of Chemical Physics, 115(20):9169–9183. Darve, E., Rodríguez-Gómez, D., and Pohorille, A. (2008). Adaptive biasing force method for scalar and vector free energy calculations. The Journal of Chemical Physics, 128(14). Darve, E., Solomon, J., and Kia, A. (2009). Computing generalized Langevin equations and generalized fokker-planck equations. Proc. Natl. Acad. Sci., 106(27):10884–10889. Das, A. and Andersen, H. C. (2012). The multiscale coarse-graining method. ix. a general method for construction of three body coarse-grained force fields. The Journal of chemical physics, 136(19):194114. Davtyan, A., Dama, J. F., Voth, G. A., and Andersen, H. C. (2015). Dynamic force matching: A method for constructing dynamical coarse-grained models with realistic time dependence. J. Chem. Phys., 142. Einstein, A. (1905). On the movement of small particles suspended in a stationary liquid demanded by the molecular-kinetic theory of heat. Annalen der Physik, 17:549–560. Español, P. and Warren, P. (1995). Statistical mechanics of dissipative particle dynamics. Europhysics Letters, 30(4):191–196. Fang, L., Ge, P., Zhang, L., E, W., and Lei, H. (2022). DeePN2: A deep learning-based non-Newtonian hydrodynamic model. Journal of Machine Learning, 1:114–140. Frenkel, D. and Smit, B. (2023). Understanding molecular simulation: from algorithms to applications. Elsevier. Gabrié, M., Rotskoff, G. M., and Vanden-Eijnden, E. (2022). Adaptive monte carlo augmented with normalizing flows. Proceedings of the National Academy of Sciences, 119(10):e2109420119. Gao, X., Xu, Z. Q., and Zhou, X. Y. (2022). State-dependent temperature control for langevin diffusions. SIAM Journal on Control and Optimization, 60(3):1250–1268. Gao, Y. Q. (2008). An integrate-over-temperature approach for enhanced sampling. The Journal of Chemical Physics, 128(6). Ge, P., Zhang, L., and Lei, H. (2023). Machine learning assisted coarse-grained molecular dynamics modeling of meso-scale interfacial fluids. J. Chem. Phys., 158:064104. Ge, P., Zhang, Z., and Lei, H. (2024). Data-driven learning of the generalized langevin equation with state-dependent memory. Phys. Rev. Lett., 133:077301. 66 Grafke, T. and Laio, A. (2024). Metadynamics for transition paths in irreversible dynamics. Multiscale Modeling & Simulation, 22(1):125–141. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. (2017). Improved training of wasserstein gans. Advances in neural information processing systems, 30. Hamelberg, D., Mongan, J., and McCammon, J. A. (2004). Accelerated molecular dynamics: a promising and efficient simulation method for biomolecules. The Journal of Chemical Physics, 120(24):11919–11929. Hansen, J. P. and McDonald, I. (1990). Theory of Simple Liquids. Academic, London. Hess, B., Bekker, H., Berendsen, H. J., and Fraaije, J. G. (1997). Lincs: A linear constraint solver for molecular simulations. Journal of Computational Chemistry, 18(12):1463–1472. Hijón, C., Español, P., Vanden-Eijnden, E., and Delgado-Buscalioni, R. (2010). Mori–Zwanzig formalism as a practical computational tool. Faraday discuss., 144:301–322. Hoogerbrugge, P. J. and Koelman, J. M. V. A. (1992). Simulating microscopic hydrodynamic phenomena with dissipative particle dynamics. Europhys. Lett., 19(3):155–160. Hoover, W. G. (1985). Canonical dynamics: equilibrium phase-space distributions. Physical Review A, 31(3):1695–1697. Hornak, V., Abel, R., Okur, A., Strockbine, B., Roitberg, A., and Simmerling, C. (2006). Comparison of multiple amber force fields and development of improved protein backbone parameters. Proteins: Structure, Function, and Bioinformatics, 65(3):712–725. Hudson, T. and Li, X. H. (2020). Coarse-graining of overdamped langevin dynamics via the mori–zwanzig formalism. Multiscale Modeling & Simulation, 18(2):1113–1135. Izvekov, S. and Voth, G. A. (2005). A multiscale coarse-graining method for biomolecular systems. The Journal of Physical Chemistry B, 109(7):2469–2473. John, S. T. and Csányi, G. (2017). Many-body coarse-grained interactions using gaussian approximation potentials. The Journal of Physical Chemistry B, 121(48):10934–10949. Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W., and Klein, M. L. (1983). Comparison of simple potential functions for simulating liquid water. The Journal of chemical physics, 79(2):926–935. Jung, G., Hanke, M., and Schmid, F. (2017). Iterative reconstruction of memory kernels. Journal of Chemical Theory and Computation, 13(6):2481–2488. Kampen, N. V. (2007). Stochastic processes in physics and chemistry. North Holland. 67 Kawasaki, K. and Gunton, J. D. (1973). Theory of nonlinear transport processes: Nonlinear shear viscosity and normal stress effects. Physical Review A, 8(4):2048. Kim, J., Straub, J. E., and Keyes, T. (2006). Statistical-temperature monte carlo and molecular dynamics algorithms. Physical review letters, 97(5):050601. Kingma, D. and Ba, J. (2015). Adam: A method for stochastic optimization. International Conference on Learning Representations (ICLR). Klippenstein, V., Tripathy, M., Jung, G., Schmid, F., and van der Vegt, N. F. (2021). Introducing memory in coarse-grained molecular simulations. The Journal of Physical Chemistry B, 125(19):4931–4954. Klippenstein, V. and van der Vegt, N. F. A. (2021). Cross-correlation corrected friction in (generalized) langevin models. The Journal of Chemical Physics, 154(19):191102. Koopman, B. O. (1931). Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences, 17(5):315–318. Kumar, S., Rosenberg, J. M., Bouzida, D., Swendsen, R. H., and Kollman, P. A. (1992). The weighted histogram analysis method for free-energy calculations on biomolecules. i. the method. Journal of Computational Chemistry, 13(8):1011–1021. Kutz, J. N., Brunton, S. L., Brunton, B. W., and Proctor, J. L. (2016). Dynamic mode decomposition: data-driven modeling of complex systems. SIAM. Laio, A. and Parrinello, M. (2002). Escaping free-energy minima. Proceedings of the National Academy of Sciences, 99(20):12562–12566. Lange, O. F. and Grubmüller, H. (2006). Collective Langevin dynamics of conformational motions in proteins. J. Chem. Phys., 124:214903. Lee, H. S., Ahn, S.-H., and Darve, E. F. (2019). The multi-dimensional generalized Langevin equation for conformational motion of proteins. The Journal of Chemical Physics, 150(17):174113. Lei, H., Baker, N. A., and Li, X. (2016). Data-driven parameterization of the generalized Langevin equation. Proc. Natl. Acad. Sci., 113(50):14183–14188. Lei, H., Caswell, B., and Karniadakis, G. E. (2010). Direct construction of mesoscopic models from microscopic simulations. Physical Review E, 81(2). Lei, H., Wu, L., and E, W. (2020). Machine learning based non-newtonian fluid model with molecular fidelity. Physics Review E, 102:043309. Lelièvre, T., Rousset, M., and Stoltz, G. (2008). Long-time convergence of an adaptive biasing force 68 method. Nonlinearity, 21(6):1155. Lemke, T. and Peter, C. (2017). Neural network based prediction of conformational free energies Journal of Chemical Theory and - a new route toward coarse-grained simulation models. Computation, 13(12):6213–6221. Li, Z., Lee, H. S., Darve, E., and Karniadakis, G. E. (2017). Computing the non-Markovian coarse-grained interactions derived from the Mori–Zwanzig formalism in molecular systems: Application to polymer melts. The Journal of Chemical Physics, 146(1):014104. Lindahl, Abraham, Hess, and van der Spoel (2019). Gromacs 2019.2 source code. Lu, J. and Vanden-Eijnden, E. (2013). Infinite swapping replica exchange molecular dynamics leads to a simple simulation patch using mixture potentials. The Journal of Chemical Physics, 138(8). Luo, G., Andricioaei, I., Xie, X. S., and Karplus, M. (2006). Dynamic distance disorder in proteins is caused by trapping. The Journal of Physical Chemistry B, 110(19):9363–9367. Lyu, L. and Lei, H. (2023a). Consensus-based construction of high-dimensional free energy surface. arXiv preprint arXiv:2311.05009. Lyu, L. and Lei, H. (2023b). Construction of coarse-grained molecular dynamics with many-body non-markovian memory. Physical Review Letters, 131(17):177301. Lyu, L. and Lei, H. (2024). On the generalization ability of coarse-grained molecular dynamics models for non-equilibrium processes. arXiv preprint arXiv:2409.11519. Lyubartsev, A. P. and Laaksonen, A. (1995). Calculation of effective interaction potentials from radial distribution functions: A reverse monte carlo approach. Phys. Rev. E, 52:3730–3737. Ma, L., Li, X., and Liu, C. (2016). Derivation and approximation of coarse-grained dynamics from Langevin dynamics. Arxiv preprint, page 1605.04886. Ma, L., Li, X., and Liu, C. (2019). Coarse-graining langevin dynamics using reduced-order techniques. Journal of Computational Physics, 380:170–190. Ma, Z., Wang, S., Kim, M., Liu, K., Chen, C.-L., and Pan, W. (2021). Transfer learning of memory kernels for transferable coarse-graining of polymer dynamics. Soft Matter, 17(24):5864–5877. Maragliano, L. and Vanden-Eijnden, E. (2006). A temperature accelerated method for sampling free energy and determining reaction pathways in rare events simulations. Chemical physics letters, 426(1-3):168–175. Maragliano, L. and Vanden-Eijnden, E. (2008). Single-sweep methods for free energy calculations. The Journal of Chemical Physics, 128(18):184110. 69 Marinari, E. and Parisi, G. (1992). Simulated tempering: a new monte carlo scheme. Europhysics letters, 19(6):451. Meyer, H., Pelagejcev, P., and Schilling, T. (2019). Non-markovian out-of-equilibrium dynamics: A general numerical procedure to construct time-dependent memory kernels for coarse-grained observables. Europhysics Letters, 128(4):40001. Miao, Y., Feher, V. A., and McCammon, J. A. (2015). Gaussian accelerated molecular dynamics: Unconstrained enhanced sampling and free energy calculation. Journal of Chemical Theory and Computation, 11(8):3584–3595. Miyamoto, S. and Kollman, P. A. (1992). Settle: An analytical version of the shake and rattle algorithm for rigid water models. Journal of Computational Chemistry, 13(8):952–962. Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. (2018). Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957. Molgedey, L. and Schuster, H. G. (1994). Separation of a mixture of independent signals using time delayed correlations. Physical review letters, 72(23):3634. Mones, L., Bernstein, N., and Csányi, G. (2016). Exploration, sampling, and reconstruction of free energy surfaces with gaussian process regression. Journal of chemical theory and computation, 12(10):5100–5110. Mori, H. (1965). Transport, collective motion, and Brownian motion. Progress of Theoretical Physics, 33(3):423–455. Morrone, J. A., Li, J., and Berne, B. J. (2012). Interplay between hydrodynamics and the free energy surface in the assembly of nanoscale hydrophobes. The Journal of Physical Chemistry B, 116(1):378–389. Nielsen, S. O., Lopez, C. F., Srinivas, G., and Klein, M. L. (2004). Coarse grain models and the computer simulation of soft materials. Journal of Physics: Condensed Matter, 16(15):R481–R512. Noé, F., Olsson, S., Köhler, J., and Wu, H. (2019). Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. Science, 365(6457):eaaw1147. Noid, W. G. (2013). Perspective: coarse-grained models for biomolecular systems. J. Chem. Phys., 139(9):090901. Noid, W. G., Chu, J.-W., Ayton, G. S., Krishna, V., Izvekov, S., Voth, G. A., Das, A., and Andersen, H. C. (2008). The multiscale coarse-graining method. I. a rigorous bridge between atomistic and coarse-grained models. J. Chem. Phys., 128(24):244114. Nosé, S. (1984). A molecular dynamics method for simulations in the canonical ensemble. Molecular 70 Physics, 52(2):255–268. Parrinello, M. and Rahman, A. (1981). Polymorphic transitions in single crystals: A new molecular dynamics method. Journal of Applied Physics, 52(12):7182–7190. Pérez-Hernández, G., Paul, F., Giorgino, T., De Fabritiis, G., and Noé, F. (2013). Identification of slow molecular order parameters for markov model construction. The Journal of chemical physics, 139(1). Pfaendtner, J. and Bonomi, M. (2015). Efficient sampling of high-dimensional free-energy landscapes with parallel bias metadynamics. Journal of chemical theory and computation, 11(11):5062–5067. Piana, S. and Laio, A. (2007). A bias-exchange approach to protein folding. The journal of physical chemistry B, 111(17):4553–4559. Pinnau, R., Totzeck, C., Tse, O., and Martin, S. (2017). A consensus-based model for global optimization and its mean-field limit. Mathematical Models and Methods in Applied Sciences, 27(01):183–204. Plotkin, S. S. and Wolynes, P. G. (1998). Non-Markovian configurational diffusion and reaction coordinates for protein folding. Phys. Rev. Lett., 80:5015–5018. Prakash, A., Fu, C. D., Bonomi, M., and Pfaendtner, J. (2018). Biasing smarter, not harder, by partitioning collective variables into families in parallel bias metadynamics. Journal of chemical theory and computation, 14(10):4985–4990. Reith, D., Pütz, M., and Müller-Plathe, F. (2003). Deriving effective mesoscale potentials from atomistic simulations. Journal of Computational Chemistry, 24(13):1624–1636. Rosso, L., Mináry, P., Zhu, Z., and Tuckerman, M. E. (2002). On the use of the adiabatic molecular dynamics technique in the calculation of free energy profiles. The Journal of chemical physics, 116(11):4389–4402. Rudd, R. E. and Broughton, J. Q. (1998). Coarse-grained molecular dynamics and the atomic limit of finite elements. Phys. Rev. B, 58:R5893–R5896. Russo, A., Durán-Olivencia, M. A., Kevrekidis, I. G., and Kalliadasis, S. (2019). Deep learning as closure for irreversible processes: A data-driven generalized Langevin equation. arXiv preprint arXiv:1903.09562. Satija, R., Das, A., and Makarov, D. E. (2017). Transition path times reveal memory effects and anomalous diffusion in the dynamics of protein folding. The Journal of Chemical Physics, 147(15):152707. Schmid, P. J. (2010). Dynamic mode decomposition of numerical and experimental data. Journal of 71 fluid mechanics, 656:5–28. Schneider, E., Dai, L., Topper, R. Q., Drechsel-Grau, C., and Tuckerman, M. E. (2017). Stochastic neural network approach for learning high-dimensional free energy surfaces. Physical review letters, 119(15):150601. Schwantes, C. R. and Pande, V. S. (2013). Improvements in markov state model construction reveal many non-native interactions in the folding of ntl9. Journal of chemical theory and computation, 9(4):2000–2009. Shaffer, P., Valsson, O., and Parrinello, M. (2016). Enhanced, targeted sampling of high-dimensional free-energy landscapes using variationally enhanced sampling, with an application to chignolin. Proceedings of the National Academy of Sciences, 113(5):1150–1155. She, Z., Ge, P., and Lei, H. (2023). Data-driven construction of stochastic reduced dynamics encoded with non-markovian features. The Journal of Chemical Physics, 158(3):034102. Shell, M. S. (2008). The relative entropy is fundamental to multiscale and inverse thermodynamic problems. The Journal of Chemical Physics, 129(14). Sidky, H. and Whitmer, J. K. (2018). Learning free energy landscapes using artificial neural networks. The Journal of chemical physics, 148(10):104111. Soper, A. (1996). Empirical potential monte carlo simulation of fluid structure. Chemical Physics, 202(2):295–306. Stecher, T., Bernstein, N., and Csányi, G. (2014a). Free energy surface reconstruction from umbrella samples using gaussian process regression. Journal of Chemical Theory and Computation, 10(9):4079–4097. Stecher, T., Bernstein, N., and Csányi, G. (2014b). Free energy surface reconstruction from umbrella samples using gaussian process regression. Journal of chemical theory and computation, 10(9):4079–4097. Stoltz, G., Rousset, M., et al. (2010). Free energy computations: A mathematical perspective. World Scientific. Straus, J. B., Gomez Llorente, J. M., and Voth, G. A. (1993). Manifestations of spatially dependent friction in classical activated rate processes. The Journal of Chemical Physics, 98(5):4082–4097. Sugita, Y. and Okamoto, Y. (1999). Replica-exchange molecular dynamics method for protein folding. Chemical physics letters, 314(1-2):141–151. Tang, K., Zhai, J., Wan, X., and Yang, C. (2023). Adversarial adaptive sampling: Unify pinn and optimal transport for the approximation of pdes. arXiv preprint arXiv:2305.18702. 72 Torrie, G. and Valleau, J. (1977a). Nonphysical sampling distributions in monte carlo free-energy estimation: Umbrella sampling. Journal of Computational Physics, 23(2):187–199. Torrie, G. M. and Valleau, J. P. (1977b). Nonphysical sampling distributions in monte carlo free-energy estimation: Umbrella sampling. Journal of Computational Physics, 23(2):187–199. Tribello, G. A., Bonomi, M., Branduardi, D., Camilloni, C., and Bussi, G. (2014). Plumed 2: New feathers for an old bird. Computer physics communications, 185(2):604–613. Valsson, O. and Parrinello, M. (2014). Variational approach to enhanced sampling and free energy calculations. Phys. Rev. Lett., 113:090601. van der Oord, C., Sachs, M., Kovács, D. P., Ortner, C., and Csányi, G. (2023). Hyperactive learning for data-driven interatomic potentials. npj Computational Materials, 9(1):168. Van Hove, L. (1954). Correlations in space and time and born approximation scattering in systems of interacting particles. Phys. Rev., 95:249–262. Voter, A. F. (1997). Hyperdynamics: Accelerated molecular dynamics of infrequent events. Physical Review Letters, 78(20):3908. Vroylandt, H., Goudenège, L., Monmarché, P., Pietrucci, F., and Rotenberg, B. (2022). Likelihood-based non-markovian models from molecular dynamics. Proceedings of the National Academy of Sciences, 119(13):e2117586119. Wang, D., Wang, Y., Chang, J., Zhang, L., Wang, H., and E, W. (2022). Efficient sampling of high-dimensional free energy landscapes using adaptive reinforced dynamics. Nature Computational Science, 2(1):20–29. Wang, H., Zariphopoulou, T., and Zhou, X. Y. (2020a). Reinforcement learning in continuous time and space: A stochastic control approach. The Journal of Machine Learning Research, 21(1):8145–8178. Wang, S., Ma, Z., and Pan, W. (2020b). Data-driven coarse-grained modeling of polymers in solution with structural and dynamic properties conserved. Soft Matter, 16(36):8330–8344. Weiser, L. J. and Santiso, E. E. (2019). A cgenff-based force field for simulations of peptoids with both cis and trans peptide bonds. Journal of computational chemistry, 40(22):1946–1956. Widder, C., Koch, F., and Schilling, T. (2022). Generalized langevin dynamics simulation with non-stationary memory kernels: How to make noise. The Journal of Chemical Physics, 157(19):194107. Willis, C. and Picard, R. (1974). Time-dependent projection-operator approach to master equations for coupled systems. Physical Review A, 9(3):1343. 73 Wu, H. and Noé, F. (2014). Optimal estimation of free energies and stationary densities from multiple biased simulations. Multiscale Modeling & Simulation, 12(1):25–54. Xie, P., Car, R., and E, W. (2022). Ab Initio Generalized Langevin Equations. arXiv preprint arXiv:2211.06558. Yang, L. and Qin Gao, Y. (2009). A selective integrated tempering method. The Journal of Chemical Physics, 131(21). Yang, Y. I., Shao, Q., Zhang, J., Yang, L., and Gao, Y. Q. (2019). Enhanced sampling in molecular dynamics. The Journal of Chemical Physics, 151(7). Yoshimoto, Y., Kinefuchi, I., Mima, T., Fukushima, A., Tokumasu, T., and Takagi, S. (2013). Bottom-up construction of interaction models of non-markovian dissipative particle dynamics. Phys. Rev. E, 88:043305. Yu, P., Du, Q., and Liu, C. (2005). From micro to macro dynamics via a new closure approximation to the fene model of polymeric fluids. Multiscale Modeling & Simulation, 3(4):895–917. Yu, T.-Q., Lu, J., Abrams, C. F., and Vanden-Eijnden, E. (2016). Multiscale implementation of infinite-swap replica exchange molecular dynamics. Proceedings of the National Academy of Sciences, 113(42):11744–11749. Zhang, B. W., Jasnow, D., and Zuckerman, D. M. (2007). Efficient and verified simulation of a path ensemble for conformational change in a united-residue model of calmodulin. Proceedings of the National Academy of Sciences, 104(46):18043–18048. Zhang, L., Han, J., Wang, H., Car, R., and E, W. (2018a). Deep potential molecular dynamics: A scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett., 120:143001. Zhang, L., Han, J., Wang, H., Car, R., and E, W. (2018b). Deepcg: Constructing coarse-grained models via deep neural networks. The Journal of Chemical Physics, 149(3):034101. Zhang, L., Han, J., Wang, H., Saidi, W., Car, R., and E, W. (2018c). End-to-end symmetry preserving inter-atomic potential energy model for finite and extended systems. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems 31, pages 4436–4446. Curran Associates, Inc. Zhang, L., Wang, H., and E, W. (2018d). Reinforced dynamics for enhanced sampling in large atomic and molecular systems. The Journal of chemical physics, 148(12):124113. Zheng, L., Chen, M., and Yang, W. (2008). Random walk in orthogonal space to achieve efficient free-energy simulation of complex systems. Proceedings of the National Academy of Sciences, 105(51):20227–20232. 74 Zwanzig, R. (1973). Nonlinear generalized Langevin equations. J. Stat. Phys., 9:215 – 220. Zwanzig, R. (2001). Nonequilibrium Statistical Mechanics. Oxford University Press. 75 APPENDIX A SIMULATION DETAIL FOR NONEQUILIBRIUM CASE A.1 Additional result of CGMD model under equilibrium state To establish a fair comparison of the various CG models, we examine the static and dynamic properties that have been widely used as benchmark problems for model validation. Fig. A.1 shows the radial distribution function between the CG coordinates. For all three CG models, their predictions agree well with the full MD results, which verifies the accuracy of the CG free energy function 𝑈 (Q). Fig. A.2 shows the normalized velocity auto-correlation function (VACF) of the CG variables. Similar to the static properties, the prediction of the CG models agrees well with the full MD results, which verifies the accuracy of the many-body memory term K (Q, 𝑡). Despite of the good agreement, we emphasize that these properties are essentially defined under the marginal density near equilibrium. The applicability to non-equilibrium processes is generally unwarranted, and further relies on the consistency in the conditional probability density function associated with the CG projection operator (i.e., Eq. (3.6)) as discussed in Sec. 3.2. A.2 CG variable training In this section, we show that Eq. 3.8 does not impose further constraints on constructing the CG variables. We examine the equivalence of two CG mapping. For instance, when the centers of mass (COMs) are utilized as the only CG variables, the information they encompass is the same as the COMs plus a constant or the COMs multiplied by a constant. Definition A.2.1. For two CG map 𝜙 and ˇ𝜙, if there exist a map T, such that 𝜙(q, p) = T( ˇ𝜙(q, p)), we say 𝜙 ≤ ˇ𝜙. Also, if 𝜙 ≤ ˇ𝜙 and ˇ𝜙 ≤ 𝜙, we say that the two maps are equivalent. In other words, if 𝜙 ≤ ˇ𝜙, any properties that can be measured from the CG coordinate 𝜙, denote as F (𝜙) , can also be meansured from ˇ𝜙 as F (T( ˇ𝜙)). If 𝜙 ≤ ˇ𝜙 and a CG model d d𝑡 ˇ𝜙 = ˇL ˇ𝜙 is constructed using CG coordinate ˇ𝜙, the CG model for 𝜙 can be constructed by the chain rule d𝑡 𝜙 = 𝛿𝑇 ˇL is deterministic or Itô’s lemma if ˇL includes stochastic term. ˇL ˇ𝜙 if 𝛿 ˇ𝜙 d Proposition A.2.2. For any linear map 𝜙 defined by a matrix W ∈ R𝑁𝑚×(𝑚−1), there exists a map 76 Figure A.1 The distribution of the pairwise distance between CG coordinates in the CGMD model compared with the Full MD model. (a) The radius distribution function 𝑔(𝑟) of the CGCOM model. (b) The distribution of the pairwise distance 𝑔′(𝑟) between two CG coordinates within the same molecular of the CG3 model. (c)-(j) The distribution of distance 𝑔′(𝑟) between two CG coordinate (1-2,1-3,1-4,2-3,2-4,3-4, respectively) within the same molecular of CG4 model. 77 8162432r0.00.40.81.2g(r)(a)MDCGCOM0102030r0.000.040.080.12ρ(r)(r)MDCG35101520r0.000.050.100.15g0(r)(c)MDCG45101520r0.000.050.10g0(r)(d)MDCG45101520r0.000.050.10g0(r)(g)MDCG45101520r0.000.050.10g0(r)(h)MDCG404812r0.000.050.100.150.200.25g0(r)(i)MDCG45101520r0.000.050.10g0(r)(j)MDCG4 Figure A.2 The VACF of CVs in different CG models and their comparison with Full MD. (a) VACF of CVs in CG3 model and the comparison with Full MD. (b)-(g) VACF of CVs (1,2,3,4, respectively) in CG4 model and the comparison with Full MD. ˇ𝜙, defined by a matrix ˇW ∈ R𝑁𝑚×𝑚, subject to the following constraints (A.1) (A.2) 0 < ˇ𝑤𝑛𝑖 < 1, 𝑚 ∑︁ 𝑖=1 ˇ𝑤𝑛𝑖 = 1. Proof. For any W with entry 𝑤𝑖 𝑗 , ˇW can be constructed as   where ˜𝑤 = min𝑛,𝑖 𝑤𝑛𝑖 and ¯𝑤 = max𝑖 (cid:205)𝑚−1 ˇ𝑤𝑛𝑖 =  𝑖 𝑤1 𝑛𝑖 − ˜𝑤1 ¯𝑤 𝑚−1 ∑︁ 1 − , 0 < 𝑖 < 𝑚, ˇ𝑤𝑛𝑖, 𝑖 = 𝑚, 𝑖=1 (𝑤𝑛𝑖 − ˜𝑤). It’s obvious that ˇ𝑤𝑛𝑖 ∈ [0, 1] for 0 < 𝑖 < 𝑚. Since 𝑚−1 ∑︁ 𝑖=1 ˇ𝑤𝑛𝑖 = (cid:205)𝑚−1 𝑖=1 (𝑤𝑛𝑖 − ˜𝑤) ¯𝑤 ∈ [0, 1], we have ˇ𝑤𝑛𝑚 ∈ [0, 1]. Also, (cid:205)𝑖 ˇ𝑤𝑛𝑖 = 1 for every 𝑛, hence ˇ𝑊 satisfies the constraints. Since 𝑚 ∑︁ 𝑁𝑚∑︁ ( 𝑖 𝑛 ˇ𝑤𝑛𝑖) ˇQ𝐼 𝑖 = 𝑚 ∑︁ 𝑁𝑚∑︁ 𝑖 𝑛 ˇ𝑤𝑛𝑖q𝐼 𝑛 = 𝑁𝑚∑︁ 𝑛 q𝐼 𝑛, (A.3) 78 012345Time0.000.250.500.751.00VACF(a)MDCGCOM012345Time0.000.250.500.751.00VACF(b)MDCG301234Time0.000.250.500.751.00VACF(c)MDCG401234Time0.000.250.500.751.00VACF(d)MDCG401234Time0.000.250.500.751.00VACF(g)MDCG401234Time0.000.250.500.751.00VACF(h)MDCG4 we have Q𝐼 𝑖 = = = (cid:205)𝑁𝑚 𝑛 𝑤𝑛𝑖q𝐼,𝑛 (cid:205)𝑁𝑚 𝑛 𝑤𝑛𝑖 ( ¯𝑤 ˇ𝑤𝑛𝑖 + ˜𝑤)q𝐼,𝑛 ( ¯𝑤 ˇ𝑤𝑛𝑖 + ˜𝑤) ˇ𝑤𝑛𝑖) ˇQ𝐼 (cid:205)𝑁𝑚 𝑛 (cid:205)𝑁𝑚 𝑛 (cid:205)𝑁𝑚 𝑛 ¯𝑤((cid:205)𝑁𝑚 𝑛 (cid:16)(cid:205)𝑁𝑚 𝑛 𝑗 𝑖 + ˜𝑤 (cid:205)𝑚 ( ¯𝑤 ˇ𝑤𝑛𝑖 + ˜𝑤) (A.4) ˇ𝑤𝑛 𝑗 (cid:17) ˇQ𝐼 𝑗 . Therefore, Q𝐼 is a linear transformation of ˇQ𝐼 for all 𝐼 = 1, · · · , 𝑀. Similarly, P𝐼 can be shown as a linear transformation of ˇP𝐼. □ 79 APPENDIX B SIMULATION DETAIL AND PROOF FOR CONSENSUS BASED SAMLPING B.1 Proof of Proposition 2 The sampling distribution is governed by the following McKean stochastic differential equation (cid:164)z𝑖 𝑡 = − 1 𝛾 ∇z𝐺 (z𝑖 𝑡) + √︄ 2 𝜅ℎ𝛾 𝜉𝑖 (𝑡), (B.1) (z𝑡 − m𝑡) and 𝛾 is the friction coefficient. We want to show it where 𝐺 (z𝑡) = 1 2 (z𝑡 − m𝑡)𝑇𝑉 −1 𝑡 converges to a steady state as an approximation of 𝑞∗ with the proper choices of m𝑡 and 𝑉𝑡 given by m𝑡 = 𝑁𝑤 ∑︁ 𝑖=1 𝑡 ˆ𝑝(z𝑖 z𝑖 𝑡), 𝑉𝑡 = 𝜅𝑡 𝑁𝑤 ∑︁ 𝑖=1 (z𝑖 𝑡 − m𝑡) (z𝑖 𝑡 − m𝑡)𝑇 ˆ𝑝(z𝑖 𝑡), (B.2) where ˆ𝑝(z) = exp (−𝜅𝑙 L − 𝑖=1 exp (−𝜅𝑙 L − N (z)) N (z𝑖)) (cid:205)𝑁𝑤 and 𝜅𝑡 = 𝜅𝑙 + 𝜅ℎ. Proposition 2. Suppose L− 𝑞𝑡 → exp (−𝜅ℎL − ∫ exp (−𝜅ℎL − N (z)) N (z))dz as 𝑡 → ∞, given 𝜅𝑡 = 𝜅𝑙 + 𝜅ℎ. N (z) takes a local quadratic approximation in form of 1 2 (z− 𝜇)𝑇 Σ−1(z− 𝜇), Proof. Let 𝑞∞(z) denote the invariant distribution of the McKean SDE (B.1). Then 𝑞∞(z) must be the invariant distribution of the following McKean stochastic dynamics (cid:164)z = − 1 𝛾 𝑉 −1 𝜅𝑙,∞(z − m𝜅𝑙,∞) + √︄ 2 𝛾𝜅ℎ 𝜉 (𝑡), (B.3) where m𝜅𝑙,∞ and 𝜅−1 ∝ 𝑞∞(z)𝑒−𝜅𝑙 L − 𝑡 𝑉𝜅𝑙,∞ are the mean and the covariance matrix of the re-weighted density N (z). With the fluctuation-dissipation relation for Eq. (B.3), we can show 𝑞∞(z) follows the Gaussian distribution with mean m𝜅𝑙,∞ and covariance matrix 𝜅−1 ℎ 𝑉𝜅𝑙,∞. Since LN (z) = 1 2 (z − 𝜇)𝑇 Σ−1(z − 𝜇) is quadratic, the re-weighted density of a Gaussian distribution 𝑞(z) ∼ N (m, 𝑉) remains Gaussian, i.e., 𝑞(z)𝑒−𝜅𝑙 L − N (z) ∝ N (m𝜅𝑙 , 𝑉𝜅𝑙 ), 80 where m𝜅𝑙 and 𝑉𝜅𝑙 are defined by m𝜅𝑙 = (𝑉 −1 + 𝜅𝑙Σ−1)−1(𝜅𝑙Σ−1𝜇 + 𝑉 −1m), 𝑉𝜅𝑙 = (𝑉 −1 + 𝜅𝑙Σ−1)−1. (B.4) In particular, we choose m = m𝜅𝑙,∞ and 𝑉 = 𝜅−1 ℎ 𝑉𝜅𝑙,∞, then Eq. (B.4) yields m𝜅𝑙,∞ = (𝜅ℎ𝑉 −1 𝜅𝑙,∞ + 𝜅𝑙Σ−1)−1(𝜅𝑙Σ−1𝜇 + 𝜅ℎ𝑉 −1 𝜅𝑙,∞m𝜅𝑙,∞), 𝑡 𝑉𝜅𝑙,∞ = (𝜅ℎ𝑉 −1 𝜅−1 𝜅𝑙,∞ + 𝜅𝑙Σ−1)−1. It is easy to show that by choosing 𝜅𝑡 = 𝜅𝑙 + 𝜅ℎ, m𝜅𝑙,∞ and 𝑉𝜅𝑙,∞ recovers 𝜇 and Σ, respectively, and the invariant density takes the form 𝑞∞(z) ∼ N (cid:16) 𝜇, 𝜅−1 ℎ Σ (cid:17) . □ B.2 Simulation setup All the MD simulations are performed using the package GROMACS 2019.2 Lindahl et al. (2019) and open-source, community-developed PLUMED library Tribello et al. (2014). The simulation is carried out on Intel(R) Xeon(R) Platinum 8260 CPU. B.2.1 ala2 (the two-dimensional problem) The Ace-Ala-Nme (ala2) molecule is modeled by the Amber99SB force field Hornak et al. (2006). The molecules are dissolved in 383 water molecules in a periodic simulation cell. The cut-off radius of the van der Waals interaction is 0.9 nm. The Coulomb interaction is treated with the smooth particle mesh Ewald method with a real space cutoff of 0.9 nm and a reciprocal space grid spacing of 0.12 nm. The system is integrated with the leap-frog scheme at time step 2 fs. The temperature of the system is set to 300 K by a velocity-rescale thermostat Bussi et al. (2007) with a relaxation time of 0.2 ps. The Parrinello-Rahman barostat Parrinello and Rahman (1981) with a relaxation time scale of 1.5 ps and a compressibility of 4.5 × 10−5 bar−1 is coupled to the system to control the pressure to 1 bar. The hydrogen atom is constrained by the LINCS algorithmHess et al. (1997) and the H–O bond and H–O–H angle of water molecules are constrained by the SETTLE algorithmMiyamoto and Kollman (1992). 81 B.2.2 s1pe (three-dimensional problem) The s-(1)-phenylethyl (s1pe) molecule is modeled by the CHARMM general force field (CGenFF) Weiser and Santiso (2019). The molecule is dissolved in 546 TIP3P water molecules in a (2.69 nm)3 dodecahedron box. The cut-off radius of the van der Waals interaction is 1 nm. The Coulomb interaction is treated with the smooth particle mesh Ewald method with a real space cutoff of 1 nm and a reciprocal space grid spacing of 0.12 nm. The system is integrated with the leap-frog scheme at time step 2 fs. The temperature of the system is set to 298 K by a velocity-rescale thermostat Bussi et al. (2007) with a relaxation time of 0.2 ps. The Parrinello-Rahman barostat pressure couple, the hydrogen atom constraint, the H–O bond constraint, and the H–O–H angle of water molecules constraint are the same as the previous one. B.2.3 (s1pe)3 (the nine-dimensional problem) The (s1pe)3 molecule is modeled by the CHARMM general force field (CGenFF) Weiser and Santiso (2019). The molecule is solvated in a (4.2 nm)3 dodecahedron box with 1622 TIP3P water molecules. Other simulation setups are similar to the s1pe molecule. B.2.4 ala16 (the thirty-dimensional problem) The Ace-(Ala)15-Nme (ala16) molecule is modeled by the Amber99SB force field Hornak et al. (2006). The molecule is solvated in 2258 water molecules in a (4.62 nm)3 dodecahedron box. Other simulation setups are similar to the ala2 molecule. B.3 Traning The training data is collected during the sampling process by restricted dynamics. The initial 10% steps of the restricted dynamics are used as equilibrium and the rest 90% steps are used to calculate the mean force. The FES are parameterized as a fully connected neural network. The depth and width of the NN are shown in Supplementary Table B.1. The NNs are trained by Adam for 100000 steps with a learning rate 1 × 10−3. For each training step, 5000 sampling points are randomly selected from the data set. All the training process is carried out on v100 with 32768 MB memory. 82 Table B.1 The depth and width of the NN used to parameterize the FES of different molecules. molecule ala2 s1pe (s1pe)3 ala16 depth width 3 4 4 4 48 64 512 640 B.4 Additional result B.4.1 Additional result of the 1D Rastrigin function Supplementary Table B.2 presents the estimations of the first and second momentum during the construction of the 1D Rastrigin function. The first momentum, predicted to be the maximum residual points in Proposition 2, is confirmed to be numerically accurate. The second momentum is validated through the comparison between 𝑓 ′′ and 𝑓 ′′ 𝜃𝑖−1 ± 𝑉 −1, with the sign influenced by the residual’s sign. Numerical resuts verify the accurate prediction of the second momentum, i.e., (cid:12) (cid:12) (cid:12) ≈ 𝑉 −1 for 𝑥∗ at the max-residual point. 𝑓 ′′ (𝑥∗) − 𝑓 ′′ 𝜃𝑖 (𝑥∗) (cid:12) (cid:12) (cid:12) Table B.2 The first and second momentum estimation made during the construction of 1D Rastrigin function. The maximum residual point of | 𝑓 (𝑥) − 𝑓𝜃𝑖 (𝑥)| is computed by mesh points with mesh size 1 × 10−4. The first momentum is estimated from the equilibrium distribution of CES dynamics. The second derivative of 𝑓 is computed at the first momentum estimated point. The second derivative estimation is made from 𝑓 ′′ ± 𝑉 −1, where 𝑉 is second momentum estimated from the equilibrium distribution of CES dynamics and sign is determined by the sign of 𝑓 − 𝑓𝜃𝑖 . 𝜃𝑖−1 iteration 𝑖 maximum residual point first momentum estimated second derivative of 𝑓 second derivative estimated 0 0.000 -0.000 41.478 40.974 1 2.000 2.000 41.478 37.442 2 -2.000 -2.000 41.478 36.916 3 0.996 0.996 41.468 38.283 4 -0.996 -0.995 41.462 39.519 5 2.805 2.805 15.273 11.841 6 -2.806 -2.806 15.539 13.380 7 1.500 1.499 -37.478 -36.796 8 -1.502 -1.503 -37.473 -36.271 9 0.503 0.502 -37.474 -37.515 10 -0.501 -0.501 -37.478 -37.285 11 2.416 2.414 -31.901 -31.521 12 -2.413 -2.413 -31.688 -31.189 B.4.2 Additional result of the three-dimensional FES (s1pe) Supplementary Figure B.1-B.3 shows the additional 2D projections of the 3D FES (molecule s1pe) constructed by the Rid and presented CES method. For each case, the reference is constructed as a 2D FES using the metadynamics. The CES method yields a better agreement with the reference. The computational cost and accuracy of CES and RiD is shown in Supplementary Table B.3. It shows that the CES method has better accuray with less computational cost. 83 Table B.3 The accuracy of the constructed 3D FES (the s1pe molecule) and computational time for RiD and CES methods. The 𝑙2 and 𝑙∞ error are calculated up to 40 KJ/mol. For each case, the FES is projected onto a 2D plane with the third variable fixed; the reference solution is constructed as a 2D FES by the metadynamics. The simulation time of the CES method is multiplied by 20 since 20 walkers are used. Method Restraint RiD CES 𝜓 = 1.5 𝜔 = 1.5 𝜓 = 1.5 𝜔 = 1.5 Accuracy 𝑙2 error 5.76 12.04 2.44 3.89 𝑙∞ error 25.72 49.13 11.21 28.80 Time Sampling Train 423.33 8 (GPU) 4.81 × 20 0.84 (GPU) Figure B.1 The 3D FES for the s1pe molecule projected on the 𝜙 − 𝜓 plane with the third variable 𝜔 = −1 (first row) and 𝜔 = −2 (second row). (a) 2D FES by metadynamics (reference) (b) RiD (c) CES. B.4.3 Additional results of the nine-dimensional FES (s1pe)3 The addition 2D projections of the 9D FES for molecule (s1pe)3 are presented in Supplementary Figure B.4. Similar to the 3D case, the FES constructed by the present CES method shows a better agreement with the reference constructed as a 2D FES using metadynamics. B.4.4 Additional results of the thirty-dimensional FES (ala16) The addition 2D projections of the 30D FES for molecule ala16 are presented in Supplementary Figure B.5. We note that the projection on the 𝜙5 − 𝜓5 plane is significantly different from other projections such as the 𝜙1 − 𝜓1 plane in the main context. While the ala16 molecule consists of 15 sequential alanine residues, the FES for individual 𝜙 − 𝜓 projections shows different features. The numerical results of the present CES method show good agreement with the reference. 84 202202(a)202(b)202(c)010203040202202(a)202(b)202(c)010203040 Figure B.2 The 3D FES of the s1pe molecule on the 𝜔 − 𝜓 plane with the third variable 𝜙 = 0 (first row) and 𝜙 = −1 (second row). (a) 2D FES by metadynamics (reference) (b) RiD (c) CES. Figure B.3 The 3D FES of the s1pe molecule on the 𝜔 − 𝜙 plane with the third variable 𝜓 = 0 (first row) and 𝜓 = −1 (second row). (a) 2D FES by metadynamics (reference) (b) RiD (c) CES. 85 202202(a)202(b)202(c)010203040202202(a)202(b)202(c)010203040202202(a)202(b)202(c)010203040202202(a)202(b)202(c)010203040 Figure B.4 The 9D FES for the (s1pe)3 molecule projected on the 𝜔1 − 𝜙1, 𝜔2 − 𝜙2, 𝜔2 − 𝜓2, 𝜙2 − 𝜓2, 𝜔3 − 𝜙3, 𝜔3 − 𝜓3, 𝜙3 − 𝜓3 plane from top to bottom. (a) 2D FES by metadynamics (reference) (b) RiD (c) CES. 86 202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060 Figure B.5 The 30D FES of the ala16 projected on the 𝜙5 − 𝜓5 plane. (a) 2D FES constructed by metadynamics (reference) (b) 30D FES constructed by the CES method. 87 202202(a)202(b)0204060