MACHINE LEARNING-BASED COARSE-GRAINED MODELS FOR MOLECULAR
SYSTEMS

By

Liyao Lyu

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Computational Mathematics, Science and Engineering—Doctor of Philosophy

2025

ABSTRACT

Multiscale modeling poses a formidable challenge in computational mathematics, particularly

in integrating microscale interactions into meso- or macro-scale constitutive relations. While

reduced-order models allow the simulation of extensive systems, their analytical formulations are

generally unclosed. Take coarse-grained molecular dynamics as an example, the Mori-Zwangzig

formulism decomposes the dynamics into deterministic, memory, and stochastic terms, but the

explicit forms for these three terms are unknown. In this thesis, we present a series of studies to

solve these problems. Firstly, the main challenge for the deterministic term comes from the high

dimensionality and the presence of energy barriers of the free energy surface (FES). We propose a

consensus sampling-based approach that reformulates the FES construction as a minimax problem.

This framework simultaneously optimizes the function representation of the FES and the training

set used to learn it. In particular, the maximization step establishes a stochastic interacting particle

system to achieve the adaptive sampling of the max-residue regime by modulating the exploitation

of the Laplace approximation of the current loss function and the exploration of the uncharted

phase space; the minimization step updates the FES approximation with the new training set. By

iteratively solving the minimax problem, the present method essentially achieves an adversarial

learning of the FESs with unified tasks for both phase space exploration and posterior error-enhanced

sampling. Besides, memory interactions are also important for predicting the collective transport

and diffusion processes. To construct this, we introduce a machine-learning-based coarse-grained

molecular dynamics model that captures the dissipative many-body contribution. The neural network

representation is carefully designed to preserve the physical symmetries and the thermo-consistency.

Unlike the common empirical reduced models, the present model is constructed based on the

Mori-Zwanzig formalism and naturally inherits the heterogeneous state-dependent memory term

rather than matching the mean-field metrics such as the velocity autocorrelation function. Finally,

when applied to non-equilibrium systems, models based on the Mori-Zwanzig formalism face

inherent challenges. A key issue lies in the Zwanzig projection, which relies on the marginal

distribution of the system. We present a data-driven approach for constructing reduced models that

retain certain generalization abilities for non-equilibrium processes. Unlike the conventional CG

models based on pre-selected CG variables (e.g., the center of mass), the present CG model seeks a

set of auxiliary CG variables based on the time-lagged independent component analysis to minimize

the entropy contribution of the unresolved variables. This ensures the distribution of the unresolved

variables under a broad range of non-equilibrium conditions approaches the one under equilibrium.

Through numerical validation, we demonstrate that our model can accurately predict viscoelastic

behavior in various non-equilibrium flow regimes.

Copyright by
LIYAO LYU
2025

ACKNOWLEDGEMENTS

First and foremost, I would like to express my deepest gratitude to my supervisor, Prof. Huan Lei.

He is not only a brilliant scholar but also an exceptional mentor. Working with him has been an

immensely enriching experience, helping me develop critical thinking and problem-solving skills. I

am especially grateful for the intellectual freedom he provided, allowing me to explore and refine

my ideas with confidence.

I am equally grateful to the members of my thesis committee, Prof. Andrew Christlieb, Prof. Di

Liu, and Prof. Yimin Xiao, for their insightful critiques, constructive suggestions, and invaluable

support. Prof. Andrew Christlieb is one of the most patient and approachable teachers I have had

the privilege to learn from. I have gained a great deal from his expertise in hyperbolic conservation

laws and tensor decomposition. I am also deeply appreciative of Prof. Di Liu for his guidance in

machine learning, which provided me with valuable insights. Additionally, I sincerely enjoyed my

discussions with Prof. Yimin Xiao on the analytical aspects of stochastic differential equations;

these exchanges greatly enriched my understanding of the subject.

I would like to express my sincere gratitude to the professors whose courses greatly contributed

to my academic growth. Prof. Jun Kitagawa’s thorough instruction in partial differential equations

helped me develop a careful approach to theoretical problems, while Prof. Hui-Chia Yu’s engaging

teaching style sparked my interest in numerical methods for chemical engineering. Prof. Rongrong

Wang’s clear explanations of numerical partial differential equations strengthened my understanding,

and Prof. Daniel Appelö’s expertise in numerical linear algebra helped me connect theory with

practical applications. Their dedication to teaching has been invaluable in shaping my knowledge

and perspective.

I would also like to sincerely thank my friends—Pei Ge, Zhiyuan She, Shijun Liang, Haishen

Dai, Siyu Guo, Yue Zhao and many others—for their support and companionship throughout my

PhD journey. This path can often be challenging and isolating, and their presence made it much

more enjoyable and fulfilling. I deeply cherish the moments we shared, especially our dinners

together, which brought warmth and laughter to this demanding period of my life.

v

Above all, I owe my deepest gratitude to my family for their unwavering support, encouragement,

and sacrifices. Their patience during my countless hours of work, their belief in me during moments

of doubt, and their unconditional love have been my foundation. This achievement is as much theirs

as it is mine.

vi

TABLE OF CONTENTS

CHAPTER 1

INTRODUCTION TO MULTISCALE MODELING AND MOLECULAR
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
DYNAMICS .

.

1

CHAPTER 2

DEVELOPING COARSE-GRAINED MOLECULAR DYNAMICS
5
MODELS WITH NON-MARKOVIAN MEMORY EFFECTS . . . . . .
5
2.1 The Mori-Zwanzig Framework for Coarse-Graining . . . . . . . . . . . . . . .
7
2.2 Mori-Zwanzig Formulaism . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
2.3 Empirical Approaches to Coarse-Graining . . . . . . . . . . . . . . . . . . .
9
. 12
2.4
Incorporating Physical Memory Kernels in Coarse-Grained Models . . . . . .
. 13
2.5 Symmetry-preserving neural network representation . . . . . . . . . . . . . .
2.6 Training . .
. 15
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.7 Numerical Result
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
.
2.8 Conclusion .

. .
.
.

. .

. .

.

CHAPTER 3

GENERALIZATION OF REDUCED MODELS FOR NON-EQUILIBRIUM
DYNAMICS .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. 21
. 21
3.1 Challenges in Modeling Non-Equilibrium Processes . . . . . . . . . . . . . .
3.2 Methodology for Generalizable Coarse-Grained Models . . . . . . . . . . . . . 23
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Numerical Results .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
.
.
3.4 Conclusion .

.

.

CHAPTER 4

CONSENSUS-BASED ENHANCED SAMPLING FOR HIGH-DIMENSIONAL
FREE ENERGY SURFACES . . . . . . . . . . . . . . . . . . . . . . . 44
. 44
. 47
. 53
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

4.1 Challenges in Constructing High-Dimensional Free Energy Surfaces . . . . .
4.2 Consensus-Based Sampling Methodology . . . . . . . . . . . . . . . . . . .
4.3 Application of Consensus-Based Sampling to Biomolecular Systems . . . . .
4.4 Conclusions .

.

.

.

.

CHAPTER 5

CONCLUSION AND OUTLOOK . . . . . . . . . . . . . . . . . . . . . 62

BIBLIOGRAPHY .

.

.

.

.

. .

. .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

APPENDIX A

SIMULATION DETAIL FOR NONEQUILIBRIUM CASE . . . . . .

. 76

APPENDIX B

SIMULATION DETAIL AND PROOF FOR CONSENSUS BASED
SAMLPING .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

vii

CHAPTER 1

INTRODUCTION TO MULTISCALE MODELING AND MOLECULAR DYNAMICS

Numerical simulation has emerged as an important tool for scientific understanding, complementing

experimental and theoretical approaches. As was declared by Dirac back in 1929, the right physical

principle for most of what we are interested in is already provided by the principles of quantum

mechanics (QM), there is no need to look further. However, the ability to reduce everything to

simple fundamental laws does not imply the ability to start from those laws and reconstruct the

universe theoriticaly or computationly. This intricate challenge remained largely ignored until

Anderson underscored in Anderson (1972), that the constructivist hypothesis falters when faced

with the dual challenges of scale and complexity.

The vastness of scale leads to the well-known curse of dimensionality. While feeding atomic

numbers of all involved atoms into QM can theoretically furnish a complete model, the problem’s

dimensionality swells by three with the addition of each atom. Consequently, these first principle

models are currently only applied to simple cases and are inefficient for managing expansive system

sizes and extended simulation timescales.

Model reduction shows promise, as they reduce degrees of freedom by omitting certain details,

thereby facilitating larger-scale simulation. However, the reduced models are almost always obtained

empirically, by guessing. Making the right guess often requires and represents far-reaching physical

insight. In molecular dynamics (MD), the conservative interaction between particles is categorized

into three distinct terms: two-particle pairwise, three-particle angular, and four-particle dihedral.

On the mesoscale, as seen in Coarse-Grained Molecular Dynamics (CGMD), nonconservative

friction forces are typically assumed to be isotropic in the Generalized Langevin Equation (GLE) or

pairwise in Dissipative Particle Dynamics (DPD). At the macroscale, the derivation of macroscopic

constitutive equations from kinetic equations necessitates the closure of the collision operator.

This guessing game can be quite hard and less productive because of the complexity. As we

will show in the first project, the inherent complexity is often largely oversimplified by traditionally

intuition-driven models, such as the GLE or DPD in coarse-grained molecular dynamics (CGMD)

1

problems. The recent emergence of machine learning, with its prowess in assimilating multimodal,

multi-fidelity data and elucidating correlations among interrelated phenomena, offers a unique

opportunity to address these challenges. However, simply inputting atomic coordinates into a

network to fit interactions can lead to a cumbersome parameter space, complicating the training

process. More critically, an exclusive reliance on machine learning risks sidelining the fundamental

laws of physics, which may result in ill-posed problems or unphysical solutions.

Progress has been achieved by incorporating physical priors, such as physical symmetries, into the

network ansatz. This not only enhances accuracy but also bolsters the transparency of the resulting

models. A case in point is DeepMD, which constructs an extensive and symmetry-invariant network

to capture the many-body nature of the conservative force field. However, accurate prediction of

the dynamics in CGMD further relies on faithfully modeling a memory term that represents the

energy-dissipation processes arising from the unresolved degrees of freedom (DoF).

The importance of the state-dependent non-Markovian friction tensor (force) for dynamic

properties remains broadly overlooked, with the conventional state-independent GLE kernel being

often used. In our first project Lyu and Lei (2023b), we will show that the state-dependence of

memory terms can be critical for constructing a reliable extensive CGMD model that preserves the

collective dynamics behavior. The commonly used velocity auto-correlation function (VACF) is a

mean-field metric. Reproducing the VACF (as can be sufficient to capture in GLE) is insufficient

to capture the crucial collective behavior such as the hydrodynamic mode, collective diffusion

(Van-Hove function), et al. In contrast, we propose a physical information-embedded network-based

model constructed directly from Zwanzig’s formalism that faithfully retains the heterogeneous

energy dissipation among the CG particles and accurately predicts the dynamics on the collective

scale.

The complexity isn’t confined solely to the dynamic equation; it also extends to the selection

of dynamic coordinates Lyu and Lei (2024). A case in point is water models. Coordinates (or

sites) ranging from one to six have been proposed to represent water, each aiming to capture

distinct behaviors. The judicious selection of coordinates is rooted in physical insight, much like

2

the model-building previously mentioned. As the complexity of the molecular structure escalates,

this task becomes progressively challenging. Data-driven coordinate selection seems a potential

alternative to the traditional human intelligence-driven approach. In extensive CGMD systems,

the center of mass (COM) remains the most prevalent coordinate. However, when modeling

non-Newtonian fluids, which exhibit enormously complex flow behavior, capturing the deformation

of polymers becomes crucial. This aspect is often oversimplified when relying solely on COM. The

ongoing project intends to emphasize the importance of dynamic coordinate selection for faithfully

capturing macroscope rheology behavior.

Apart from dynamic equations and dynamic coordinates, one particular problem of these

data-driven methods is sampling Lyu and Lei (2023a). Generating a good representative data set

is in general not an easy task. For example, in molecular dynamics, a representative data set to

construct a free energy surface (FES) includes most states of a molecule including ground states,

meta-stable states, and some high-energy states (unstable states). In most cases, the molecules are

stuck in a local minimum on the energy surface, and it is difficult to jump over the energy barriers.

Enhance sampling thus is important for these problems to accelerate ergodic ability.

To increase the probability of rare events occurring in MD simulations, the simulation process

can be interfered with using various methods, including raising the temperature, replica exchange,

and adding bias potential. Traditional boosted sampling methods based on bias potential suffer

from the curse of dimensionality. In the third project, we propose the Consensus-Based Enhanced

Sampling (CES) by framing the FESs construction as a min-max problem. The minimization aspect

focuses on optimizing the network parameters to minimize error at the sampled points, while the

maximization aspect focuses on optimizing the distribution of the sampling to maximize the error

within the dataset. By iteratively addressing the min-max problem, we are able to construct a

high-dimensional energy surface.

Integrating deep learning with multiscale modeling holds potential, offering enhancements

to models traditionally anchored in human intelligence. However, forging a seamless integration

between machine learning and established scientific structural models remains elusive. The accuracy

3

and transparency of data-driven models continue to be viewed with skepticism by a broader

community. We aspire that our work will contribute to bolstering the confidence in these approaches.

While our primary focus has been on MD and CGMD scales, we also aim to extend our methods to

the macro scale in the future.

4

CHAPTER 2

DEVELOPING COARSE-GRAINED MOLECULAR DYNAMICS MODELS WITH
NON-MARKOVIAN MEMORY EFFECTS

© 2023 American Physical Society. Reproduced with permission from L. Lyu and H. Lei, Phys.

Rev. Lett. 131, 177301 (2023). https://doi.org/10.1103/PhysRevLett.131.177301

2.1 The Mori-Zwanzig Framework for Coarse-Graining

Accurately predicting the collective behavior of multi-scale physical systems is a long-standing

problem that requires the integrated modeling of the molecular-level interactions across multiple

scales Anderson (1972). However, for systems without clear scale separation, there often exists no

such a set of simple collective variables by which we can formulate the evolution in an analytic and

self-determined way. One canonical example is coarse-grained molecular dynamics (CGMD). While

the reduced degrees of freedom (DoFs) enable us to achieve a broader range of the spatio-temporal

scale, the construction of truly reliable CG models remains highly non-trivial. A significant amount

of work Torrie and Valleau (1977a); Rosso et al. (2002); Maragliano and Vanden-Eijnden (2006);

Izvekov and Voth (2005); Noid et al. (2008); Rudd and Broughton (1998); Lyubartsev and Laaksonen

(1995); Shell (2008); Kumar et al. (1992); Nielsen et al. (2004); Laio and Parrinello (2002); Darve

and Pohorille (2001) (see also review Noid (2013)), including recent machine learning (ML)-based

approaches Behler and Parrinello (2007); Stecher et al. (2014a); John and Csányi (2017); Lemke and

Peter (2017); Chmiela et al. (2017); Zhang et al. (2018a,b), have been devoted to constructing the

conservative CG potential for retaining consistent static and thermodynamic properties. However,

accurate prediction of the CG dynamics further relies on faithfully modeling a memory term

that represents the energy-dissipation processes arising from the unresolved DoFs; the governing

equations generally become non-Markovian on the CG scale. Moreover, such non-Markovian term

often depends on the resolved variables in a complex way Satija et al. (2017); Luo et al. (2006);

Best and Hummer (2010); Plotkin and Wolynes (1998); Straus et al. (1993); Morrone et al. (2012);

Daldrop et al. (2017) where the analytic formulation is generally unknown.

In particular, for

extensive CGMD systems (i.e., the number of CG particles can be proportionally changed according

5

to the simulation size), the memory term often exhibits strong many-body effect and needs to satisfy

various physical symmetry constraints among the CG particles. Existing approaches often rely on

empirical models such as Brownian motion Einstein (1905), Langevin dynamics Kampen (2007),

and dissipative particle dynamics (DPD) Hoogerbrugge and Koelman (1992); Español and Warren

(1995). Despite their broad applications, studies Lei et al. (2010); Hijón et al. (2010); Yoshimoto

et al. (2013) based on direct construction from full MD show that the empirical (e.g., pairwise

additive) forms can be insufficient to capture the state-dependent energy-dissipation processes due

to the many-body and non-Markovian effects. Recent efforts Lange and Grubmüller (2006); Wang

et al. (2020b); Ceriotti et al. (2009); Baczewski and Bond (2013); Davtyan et al. (2015); Lei et al.

(2016); Li et al. (2017); Russo et al. (2019); Jung et al. (2017); Lee et al. (2019); Ma et al. (2019,

2021); Klippenstein and van der Vegt (2021); Vroylandt et al. (2022); She et al. (2023); Xie et al.

(2022) model the memory term based on the generalized Langevin equation (GLE) and its variants

(see also review Klippenstein et al. (2021)). The velocity auto-correlation function (VACF) is often

used as the target quantity for model parameterization. While it may serve as an appropriate measure

for certain non-extensive systems Widder et al. (2022); Meyer et al. (2019), the VACF is essentially

a metric of the background dissipation under mean-field approximation. For extensive CGMD

systems, the homogeneous kernel overlooks the heterogeneity of the energy dissipation among the

CG particles stemming from the many-body nature of the marginal probability density function

of the CG variables. This limitation imposes a fundamental challenge for accurately modeling the

local irreversible responses as well as the transport and diffusion processes on the collective scale.

This work aims to fill the gap with a new CG model that faithfully entails the state-dependent

non-Markovian memory and the coherent noise for extensive MD systems. The model formulation

can be loosely viewed as an extended dynamics of the CG variables joint with a set of non-Markovian

features that embodies the many-body nature of the energy dissipation among the CG particles.

Specifically, we treat each CG particle as an agent and seek a set of symmetry-preserving neural

network (NN) representations that directly map its local environments to the non-Markovian friction

interactions, and thereby circumvent the exhausting efforts of fitting the individual memory terms

6

with a unified empirical form. Different from the ML-based potential model Zhang et al. (2018b),

the memory terms are represented by NNs in form of second-order tensors that strictly preserve the

rotational symmetry and the positive-definite constraint. Coherent noise can be introduced satisfying

the second fluctuation-dissipation theorem and retaining consistent invariant distribution. Rather

than matching the VACF, the model is trained based on the Mori-Zwanzig (MZ) projection formalism

such that the effects of the unresolved interactions can be seamlessly inherited. We emphasize that

the construction is not merely for mathematical rigor. Numerical results of a polymer molecule

system show that the CG models with empirical memory forms are generally insufficient to capture

heterogeneous inter-molecular dissipation that leads to inaccurate cross-correlation functions among

the particles. Fortunately, the present model can reproduce both the auto- and cross-correlation

functions. More importantly, it accurately predicts the challenging collective dynamics characterized

by the hydrodynamic mode correlation and the van Hove function Van Hove (1954) and shows the

promise to predict the meso-scale transport and diffusion processes encoded with molecular-level

fidelity.

2.2 Mori-Zwanzig Formulaism

Let us consider a full MD system consisting of 𝑀 molecules with a total number of 𝑁 atoms. The

phase space vector is denoted by z = [q, p], where q, p ∈ R3𝑁 represent the position and momentum
vector, respectively. Given z(0) = z0, the evolution follows z(𝑡) = eL𝑡z0, where L is the Liouville
operator determined by the Hamiltonian 𝐻 (z). The CG variables are defined by representing each
molecule as a CG particle, i.e., 𝜙(z) = (cid:2)𝜙𝑄 (z), 𝜙𝑃 (z)(cid:3), where ϕ𝑄 (z) = [Q1, Q2, · · · , Q𝑀] and
ϕ𝑃 (z) = [P1, P2, · · · , P𝑀] represent the center of mass and the total momentum of individual
molecules, respectively. Z (𝑡) = [Q(𝑡), P (𝑡)] denote the map 𝜙(z(𝑡)) with z(0) = z0. Using the

Koopman operator Koopman (1931), Z (𝑡) can be mapped from the initial values, i.e.,

Z (𝑡) = eL𝑡Z (0),

(2.1)

where L is the Liouville operator determined by the full-model Hamiltonian 𝐻 (z). Below we

derive the reduced model by choosing CG variables Z as a linear mapping of the full phase-space

7

vector z and we refer to Ref. Hijón et al. (2010); Darve et al. (2009) for discussions of the more

general cases.

Following Zwanzig’s approach, we define a projection operator as the conditional expectation with
a fixed CG vector Z, i.e., PZ 𝑓 (z) := ∫ 𝛿(ϕ(z) − Z) 𝜌0(z) 𝑓 (z) dz/Ω(Z), where 𝜌0(z) ∝ e−𝛽𝐻 (z)
represents the equilibrium density function and Ω(Z) = ∫ 𝛿(ϕ(z) −Z) 𝜌0(z) dz. Also, we define an
orthogonal operator QZ = I − PZ. Using Eq. (2.1), we have (cid:164)Z (𝑡) = eL𝑡 PZ LZ (0) + eL𝑡 QZ LZ (0).

In particular, we choose Z = [Q, P ] = Z (0). Using the Duhamel-Dyson identity, we can write the

dynamics of Z (𝑡) as

(cid:164)Z (𝑡) = eL𝑡 PZ LZ (0) +

∫ 𝑡

0

d𝑠eL (𝑡−𝑠) PZ LeQZ L𝑠QZ LZ (0) + eQZ L𝑡 QZ LZ (0).

(2.2)

Let us start with the mean-field term PZ LZ (0). For the present study, the CG variables are

linear functions of z. Therefore, we have PZ LQ = LQ = M −1P , i.e., QZ LQ ≡ 0. For PZ LP

associated with the 𝑖-th CG particle, we have

∫

∫

∫

PZ LP𝑖 =

=

=

𝛿(ϕ(z) − Z) 𝜌0(z)LP𝑖 dz/Ω(Z)

𝛿(ϕ(z) − Z) 𝜌0(z) (−

∇q𝑖 𝐻 (z)) dz/Ω(Z)

∑︁

𝑖∈N𝑖

𝛿(ϕ(z) − Z)(𝛽−1 ∑︁
𝑖∈N𝑖
𝛿(ϕ𝑄 𝜙(q) − Q) 𝜌0(q) dq/

∫

∇q𝑖 ) 𝜌0(z) dz/Ω(Z)

= 𝛽−1∇Q𝑖

∫

𝛿(ϕ𝑄 (q) − Q) 𝜌0(q) dq

(2.3)

= −∇Q𝑖𝑈 (Q),

where N𝑖 represents the index set of the atoms that belongs to the 𝑖-th molecule, and the free energy
𝑈 (Q) = −𝛽−1 ln (cid:2)∫ 𝛿(ϕ𝑄 (q) − Q) 𝜌0(q) dq(cid:3).

8

For the memory term PZ LeQZ L𝑠QZ LP associated with the 𝑖-th CG particle, we have

PZ LeQZ L𝑠QZ LP𝑖 =

=

=

∫

∫

∫

𝜌0(z)𝛿(ϕ(z) − Z)LeQZ L𝑠QZ LP𝑖 dz/Ω(Z)

𝜌0(z)(L𝜙(z) · ∇Z)𝛿(ϕ(z) − Z)eQZ L𝑠QZ LP𝑖 dz/Ω(Z)

𝜌0(z)(QZ LP · ∇P )𝛿(ϕ(z) − Z)eQZ L𝑠QZ LP𝑖 dz/Ω(Z) (b𝑦 QZ LQ ≡ 0)
∫

𝜌0(z)𝛿(ϕ(z) − Z) (QZ LP ) ⊗ eQZ L𝑠QZ LP𝑖 dz/Ω(Z)

= ∇P ·

(cid:18)∫

= ∇P ·

(cid:124)

𝜌0(z)𝛿(ϕ(z) − Z) (QZ LP ) ⊗ eQZ L𝑠QZ LP𝑖 dz/Ω(Z)
(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)

(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)

(cid:123)(cid:122)
˜K𝑖, (Z,𝑠)

(cid:19)

(cid:125)

− ˜K𝑖,(Z, 𝑠) · ∇P (1/Ω(Z)) Ω(Z).

(2.4)

Furthermore, we take the assumption that the memory kernel only depends on the positions of the

CG particles Q, i.e., ∇P · ˜K (Z, 𝑠) ≡ 0. Also, similar to the derivation in Eq. (2.3), we note that

∫

Ω(Z) ∝

𝛿(ϕ𝑄 (q) − Q) 𝜌0(q)𝛿(ϕ𝑃 (q) − P )𝑒−𝛽P 𝑇 M −1P /2 dz ∝ 𝑒−𝛽P 𝑇 M −1P /2.

(2.5)

Therefore, Eq. (2.4) can be further simplified as

PZ LeQZ L𝑠QZ LP𝑖 = −𝛽 ˜K𝑖,(Q, 𝑠) · M −1P .

With Eqs. (2.3) (2.6), we can show that the dynamics of Z = [Q, P ] can be written as

(cid:164)Q = M −1P

(cid:164)P = −∇𝑈 (Q) −

∫ 𝑡

0

K (Q(𝑡 − 𝑠), 𝑠)V (𝑡 − 𝑠) d𝑠 + R(𝑡),

(2.6)

(2.7)

where K (Q, 𝑠) = 𝛽 ˜K (Q, 𝑠) and R(𝑡) = eQZ L𝑡 QZ LZ (0) is modeled as a random process

representing the different initial condition z0 with 𝜙(z0) = Z.

2.3 Empirical Approaches to Coarse-Graining

Eq. (2.7) provides the starting point to derive the various CG models. Direct evaluation of

K (Q, 𝑡) imposes a challenge as it relies on solving the full-dimensional orthogonal dynamics eQZ 𝑡.

Some empirical models have been developed to represent the kernel historically.

9

Figure 2.1 The RDF of the full MD and various CG models with the conservative CG potential
𝑈 (Q) constructed by the DeepCG model.

2.3.1 M-DPD

• Formulation: In the M-DPD model, the dynamics of the CG system is governed by

(cid:164)Q𝑖 = 𝑚−1

𝑖 P𝑖,

(cid:164)P𝑖 = F 𝐶

𝑖 −

∑︁

𝑗

𝛾𝑖 𝑗 V 𝑗 + R(𝑡).

(2.8)

The computation of conservative force term F 𝐶 is conducted by DeePCG Zhang et al.

(2018b) method for all models considered in this paper. This approach enables the successful

reproduction of the structural properties for each model as shown in Fig. 2.1.

• Construction: The friction tensor is calculated from random force correlation by constrained

dynamics

𝛾𝑖 𝑗 (Q) = 𝛾⊥(𝑄𝑖 𝑗 ) (1 − e𝑖 𝑗 e𝑇

𝑖 𝑗 ) + 𝛾∥ (𝑄𝑖 𝑗 ) (e𝑖 𝑗 e𝑇

𝑖 𝑗 ),

𝛾∥ (𝑄𝑖 𝑗 ) =

𝛾⊥(𝑄𝑖 𝑗 ) =

1
𝑘 𝐵𝑇
1
𝑘 𝐵𝑇

∫ ∞

0
∫ ∞

0

E[𝛿F ∥

𝑖 (𝑠)𝑇 · 𝛿F ∥

𝑗 (0)] d𝑠,

E[𝛿F ⊥

𝑖 (𝑠)𝑇 · 𝛿F ⊥

𝑗 (0)] d𝑠,

(2.9)

where e𝑖 𝑗 is the unit vector represents along Q𝑖 𝑗 , the symbol ∥ represents the radial component

along Q𝑖 𝑗 and ⊥ for perpendicular component. For further details, see Lei et al. (2010); Hijón

10

010203040x0.00.51.01.52.0RDFFull MDDeepCGet al. (2010).

2.3.2 NM-GLE

• Formulation: In the standard GLE model, the dynamics of the CG system are governed by

(cid:164)Q = M −1P ,
∫ 𝑡

(cid:164)P = F 𝐶 −

0

K (𝑡 − 𝑠)V (𝑠) d𝑠 + R(𝑡).

(2.10)

where the over-simplified kernel K is only time-independent and homogeneous among the

CG particles, as discussed in the main context.

• Construction: The memory kernel depends on solving the Volterra equation

G(𝑡) = −

∫ 𝑡

0

K (𝑡 − 𝑠)C (𝑠) d𝑠,

(2.11)

where C (𝑡) = E[V (𝑡)V (0)𝑇 ] and G(𝑡) = E[𝛿F (𝑡)V (0)𝑇 ] are the correlation matrices. In

practice, the velocity correlation matrix C (𝑡) is often simplified to VACF c(𝑡) = E[V (𝑡)𝑇 ·

V (0)] for individual CG particles under the mean-field approximation. As a result, kernel K

reduces to a scalar function depending only on time, which overlooks the state dependency.

With the obtained kernel in proper form, the system can be cast into an extended dynamics

driven by white noise, e.g., see Ceriotti et al. (2009); Lei et al. (2016); Wang et al. (2020b);

She et al. (2023).

2.3.3 NM-DPD

• Formulation: In the NM-DPD model, the dynamics of the CG system are governed by

(cid:164)Q𝑖 = 𝑚−1

𝑖 P𝑖,

(cid:164)P𝑖 = F 𝐶

𝑖 −

∫ 𝑡

∑︁

𝑗

0

K𝑖 𝑗 (𝑡 − 𝑠)V𝑖 𝑗 (𝑠) d𝑠 +

R𝑖 𝑗 (𝑡),

∑︁

𝑗

(2.12)

where the kernel K𝑖 𝑗 further depends on the pair-wise distance 𝑄𝑖 𝑗 .

• Construction: The memory kernel depends on solving the Volterra integral equation

A𝑖 𝑗 (𝑡) = −

∫ 𝑡

0

K𝑖 𝑗 (𝑡 − 𝑠)B𝑖 𝑗 (𝑠) d𝑠,

(2.13)

11

where the correlation functions are computed as two orthogonal parts, i.e.,

A𝑖 𝑗 (𝑡) = E[δF ∥

𝑖 𝑗 (𝑡) · V ∥

𝑖 𝑗 (0)𝑇 ] + E[𝛿F ⊥

𝑖 𝑗 (𝑡) · V ⊥

𝑖 𝑗 (0)𝑇 ],

B𝑖 𝑗 (𝑡) = E[V ∥

𝑖 𝑗 (𝑡) · V ∥

𝑖 𝑗 (0)𝑇 ] + E[V ⊥

𝑖 𝑗 (𝑡) · V ⊥

𝑖 𝑗 (0)𝑇 ].

(2.14)

As a result, the orthogonal parts of the friction term can be solved separately, and extended

dynamics can be also applied for simulation, e.g., see Li et al. (2017).

However, as shown below, such empirical forms are limited to capturing the state-dependence that

turns out to be crucial for the dynamics on the collective scale, and motivates the present model

retaining the many-body nature of K (Q, 𝑡).

2.4

Incorporating Physical Memory Kernels in Coarse-Grained Models

To elaborate the essential idea, let us start with the Markovian approximation K (Q, 𝑡) ≈

−𝚪(Q)𝛿(𝑡), where 𝚪(Q) = 𝚵(Q)𝚵(Q)𝑇 is the friction tensor preserving the semi-positive definite

condition, and 𝚵(Q) needs to retain the translational, rotational, and permutational symmetry, i.e.,

𝚵𝑖 𝑗 (Q1 + b, · · · , Q𝑀 + b) = 𝚵𝑖 𝑗 (Q1, · · · , Q𝑀)

𝚵𝑖 𝑗 (UQ1, · · · , UQ𝑀) = U𝚵𝑖 𝑗 (Q1, · · · , Q𝑀)U𝑇

(2.15)

𝚵𝜎(𝑖)𝜎( 𝑗) (Q𝜎(1), · · · , Q𝜎(𝑀)) = 𝚵𝑖 𝑗 (Q1, · · · , Q𝑀),

where 𝚵𝑖 𝑗 ∈ R3×3 represents the friction contribution of 𝑗-th particle on 𝑖-th particle, b ∈ R3 is a

translation vector, U is a unitary matrix, and 𝜎(·) is a permutation function.

To inherit the many-body interactions, we map the local environment of each CG particle into a

set of generalized coordinates, i.e., ˆQ𝑘

𝑖 = Q𝑖 + (cid:205)𝑙∈N𝑖

𝑓 𝑘 (𝑄𝑖𝑙)Q𝑖𝑙, where f : R → R𝐾 is an encoder

function to be learned, and N𝑖 = {𝑙 |𝑄𝑖𝑙 < 𝑟𝑐} is the neighboring index set of the 𝑖-th particle

within a cut-off distance 𝑟𝑐. Accordingly, ˆQ𝑖 𝑗 ∈ R3×𝐾 represents a set of features that encode the
𝑖 − ˆQ𝑘
𝑗

inter-molecular configurations beyond the pairwise approximation. The 𝑘-th column ˆQ𝑘

𝑖 𝑗 = ˆQ𝑘

preserves the translational and permutational invariance, by which we represent 𝚵𝑖 𝑗 by

𝚵𝑖 𝑗 =

𝐾
∑︁

𝑘=1

ℎ𝑘 ( ˆQ𝑇

𝑖 𝑗 ˆQ𝑖 𝑗 ) ˆQ𝑘

𝑖 𝑗 ⊗ ˆQ𝑘

𝑖 𝑗 + ℎ0( ˆQ𝑇

𝑖 𝑗 ˆQ𝑖 𝑗 )I

(2.16)

where ℎ : R𝐾×𝐾 → R𝐾+1 are encoder functions which will be represented by NNs. For 𝑖 = 𝑗, we
have 𝚵𝑖𝑖 = − (cid:205) 𝑗 ∈N𝑖 𝚵𝑖 𝑗 based on the Newton’s third law.

12

Eq. (2.16) entails the state-dependency of the memory term K (Z, 𝑡) under the Markovian

approximation. To incorporate the non-Markovian effect, we embed the memory term within an

extended Markovian dynamics Ceriotti et al. (2009) (see also Ref. She et al. (2023)). Specifically,

we seek a set of non-Markovian features ζ := [ζ1, ζ2, · · · , ζ𝑛], and construct the joint dynamics of

[Z, ζ] by imposing the many-body form of the friction tensor between P and ζ, i.e.,

(cid:164)Q = M −1P

(cid:164)P = −∇𝑈 (Q) + 𝚵(Q)ζ

(cid:164)ζ = −𝚵(Q)𝑇 V − 𝚲ζ + ξ(𝑡),

(2.17)

where 𝚵 = (cid:2)𝚵1𝚵2 · · · 𝚵𝑛(cid:3) and each sub-matrix takes the form (2.16) constructed by {f 𝑖 (·), h𝑖 (·)}𝑛
𝑖=1
respectively. 𝚲 = ˆ𝚲 ⊗ I represents the coupling among 𝑛 features, where I ∈ R3𝑁×3𝑁 is the identity

𝑇
matrix and ˆ𝚲 ∈ R𝑛×𝑛needs to satisfy the Lyapunov stability condition ˆ𝚲 + ˆ𝚲

≥ 0. Therefore, we

write ˆ𝚲 = ˆL ˆL𝑇 + ˆL𝑎, where ˆL is a lower triangular matrix and ˆL𝑎 is an anti-symmetry matrix

which will be determined later. By choosing the white noise ξ(𝑡) following

⟨ξ(𝑡)ξ(𝑡′)⟩ = 𝛽−1(𝚲 + 𝚲𝑇 )𝛿(𝑡 − 𝑡′),

(2.18)

Eq. (2.17) retains the consistent invariant distribution

𝜌(Q, P , ξ) ∝ exp[−𝛽(𝑈 (Q) + P 𝑇M −1P /2 + ζ𝑇 ζ/2)].

2.5 Symmetry-preserving neural network representation

Preserving the physical symmetry constraints is crucial for both the accuracy and the generalization

ability of the constructed ML-models. Besides the conservative potential 𝑈 (Q), the constructed

memory term will need to satisfy the translation- and permutation-invariance, as well as the

rotation-symmetries. Let Tb, RU, and P𝜎 denote the translation, rotation, and permutation operator

whose actions on a general function F (Q1, · · · , Q𝑀) defined by

TbF (Q1, · · · , Q𝑀) := F (Q1 + b, · · · , Q𝑀 + b),

RU F (Q1, · · · , Q𝑀) := F (Q1U, · · · , Q𝑀U),

(2.19)

P𝜎F (Q1, · · · , Q𝑀) := F (Q𝜎(1), · · · , Q𝜎(𝑀)),

13

where b ∈ R3 is a position vector, U ∈ R3×3 is an orthogonal matrix and 𝜎 is an arbitrary permutation

of the set of indices. The components of the constructed memory will need to satisfy the symmetry

constraints

Tb𝚵𝑖 𝑗 (Q1, · · · , Q𝑀) = 𝚵𝑖 𝑗 (Q1, · · · , Q𝑀)

RU𝚵𝑖 𝑗 (Q1, · · · , Q𝑀) = U𝚵𝑖 𝑗 (Q1, · · · , Q𝑀)U𝑇

(2.20)

P𝜎𝚵𝑖 𝑗 (Q1, · · · , Q𝑀) = 𝚵𝜎(𝑖)𝜎( 𝑗) (Q𝜎(1), · · · , Q𝜎(𝑀)),

Proposition 2.5.1. The representation 𝚵𝑖 𝑗 =

𝐾
∑︁

ℎ𝑘 ( ˆQ𝑇

𝑖 𝑗 ˆQ𝑖 𝑗 )

(cid:16) ˆQ𝑘

𝑖 𝑗

(cid:17) (cid:16) ˆQ𝑘

𝑖 𝑗

(cid:17)𝑇

+ℎ0( ˆQ𝑇

𝑖 𝑗 ˆQ𝑖 𝑗 )I preserves

the symmetry conditions (2.20), where ˆQ𝑘

𝑘=1
𝑖 = Q𝑖+

∑︁

𝑓 𝑘 (𝑄𝑖𝑙)Q𝑖𝑙 represents the local environment-determined

𝑙∈N𝑖
features (generalized coordinate) for the 𝑖-th particle, f : R → R𝐾 and h : R𝐾×𝐾 → R𝐾+1 are two

encoder functions.

(cid:13)
Proof. We note that TbQ𝑖 𝑗 = TbQ𝑖 − TbQ 𝑗 = Q𝑖 𝑗 , Tb𝑄𝑖 𝑗 = (cid:13)
(cid:13) = 𝑄𝑖 𝑗 , RUQ𝑖 𝑗 = UQ𝑖 𝑗 ,
RU𝑄𝑖 𝑗 = 𝑄𝑖 𝑗 , P𝜎Q𝑖 𝑗 = Q𝜎(𝑖)𝜎( 𝑗), and P𝜎𝑄𝑖 𝑗 = 𝑄𝜎(𝑖)𝜎( 𝑗). Therefore, for arbitrary indices 𝑖 and 𝑘,
the feature ˆQ𝑘

𝑖 satisfy the following symmetry conditions

(cid:13)TbQ𝑖 − TbQ 𝑗

Tb ˆQ𝑘

𝑖 = TbQ𝑖 +

∑︁

𝑓 𝑘 (Tb𝑄𝑖𝑙)TbQ𝑖𝑙 = ˆQ𝑘

𝑖 + b

RU ˆQ𝑘

𝑖 = RUQ𝑖 +

𝑙∈N𝑖
∑︁

𝑓 𝑘 (RU𝑄𝑖𝑙)RUQ𝑖𝑙 = U ˆQ𝑘
𝑖

P𝜎 ˆQ𝑘

𝑖 = P𝜎Q𝑖 +

𝑙∈N𝑖
∑︁

𝑓 𝑘 (P𝜎𝑄𝑖𝑙)P𝜎Q𝑖𝑙 = ˆQ𝑘

,

𝜎(𝑖)

𝑙∈N𝜎 (𝑖)
where we have used the fact that (cid:205)𝑙 𝑓 (𝑟𝑙)r𝑙 is permutational invariant for the last equation.

(2.21)

Therefore, we have Tb ˆQ𝑖 𝑗 = Tb ˆQ𝑖 − Tb ˆQ 𝑗 = ˆQ𝑖 𝑗 , Tb ˆ𝑄𝑖 𝑗 = ∥Tb ˆQ𝑖 − Tb ˆQ 𝑗 ∥ = ˆ𝑄𝑖 𝑗 , RU ˆQ𝑖 𝑗 =

U ˆQ𝑖 𝑗 , RU ˆ𝑄𝑖 𝑗 = ˆ𝑄𝑖 𝑗 , P𝜎 ˆQ𝑖 𝑗 = ˆQ𝜎(𝑖)𝜎( 𝑗), and P𝜎 ˆ𝑄𝑖 𝑗 = ˆ𝑄𝜎(𝑖)𝜎( 𝑗). Thus, for arbitrary indices 𝑖, 𝑗
and 𝑘, the encoder functions ℎ𝑘 ( ˆQ𝑖 𝑗 ˆQ𝑇

𝑖 𝑗 ) satisfy the following symmetry condition

Tbℎ𝑘 ( ˆQ𝑇

𝑖 𝑗 ˆQ𝑖 𝑗 ) = ℎ𝑘 ((Tb ˆQ𝑖 𝑗 )𝑇 Tb ˆQ𝑖 𝑗 ) = ℎ𝑘 ( ˆQ𝑇

𝑖 𝑗 ˆQ𝑖 𝑗 )

RU ℎ𝑘 ( ˆQ𝑇

𝑖 𝑗 ˆQ𝑖 𝑗 ) = ℎ𝑘 ((RU ˆQ𝑖 𝑗 )𝑇 RU ˆQ𝑖 𝑗 ) = ℎ𝑘 ( ˆQ𝑇

𝑖 𝑗 ˆQ𝑖 𝑗 )

P𝜎 ℎ𝑘 ( ˆQ𝑇

𝑖 𝑗 ˆQ𝑖 𝑗 ) = ℎ𝑘 ( ˆQ𝑇

𝜎(𝑖)𝜎( 𝑗)

ˆQ𝜎(𝑖)𝜎( 𝑗)).

Plugging Eq. (2.22) into the definition of 𝚵𝑖 𝑗 yields (2.20).

14

(2.22)

□

2.6 Training

Eq.

(2.17) departs from the common CG models by retaining both the heterogeneity and

non-Markovianity of the energy dissipation process. Rather than matching the mean-field metrics

such as the homogeneous VACF, we learn the embedded memory 𝚵(Q(𝑡))e𝚲(𝑡−𝑠)𝚵(Q(𝑠))𝑇 based

on the MZ form. However, directly solving the orthogonal dynamics eQZ 𝐿𝑡 is computationally

intractable. Alternatively, we introduce the constrained dynamics ˜z(𝑡) = eR𝑡z(0) following Ref.

Hijón et al. (2010). Based on the observation PQ = PR ≡ 0, we sample the MZ form from

˜z(𝑡), i.e., K𝑀 𝑍 (Z, 𝑡) = PZ [(eR𝑡 QZ LP ) (QZ LP )𝑇 ] and the memory of the CG model reduces to
K𝐶𝐺 (Z, 𝑡) = 𝚵(Q)e𝚲𝑡𝚵(Q)𝑇 . This enables us to train the CG models in terms of the encoders

{f 𝑖 (·), h𝑖 (·)}𝑛

𝑖=1 and matrices ˆL and ˆL𝑎 by minimizing the empirical loss

𝐿 =

𝑁𝑠
∑︁

𝑁𝑡
∑︁

𝑙=1

𝑗=1

(cid:13)
(cid:13)K𝐶𝐺 (Z (𝑙), 𝑡 𝑗 ) − K𝑀 𝑍 (Z (𝑙), 𝑡 𝑗 )
(cid:13)

(cid:13)
2
(cid:13)
(cid:13)

,

(2.23)

where 𝑙 represents the different CG configurations.

2.7 Numerical Result

To demonstrate the accuracy of the present model, we consider a full micro-scale model of a

star-shaped polymer melt system similar to Ref. Hijón et al. (2010), where each molecule consists of

73 atoms. The atomistic interactions are modeled by the Weeks-Chandler-Anderse potential and the

Hookean bond potential. The full system consists of 486 molecules in a cubic domain 90 × 90 × 90

with periodic boundary conditions. The Nosé-Hoover thermostat Nosé (1984); Hoover (1985) is

employed to equilibrate the system with 𝑘 𝐵𝑇 = 4.0 and micro-canonical ensemble simulation is

conducted during the production stage. Below we compare different dynamic properties predicted by

the full MD and the various CG models. For fair comparisons, we use the same CG potential 𝑈 (Q)

constructed by the DeePCG scheme Zhang et al. (2018b) for all the CG models; the differences in

dynamic properties solely arise from the different formulations of the memory term.

Let us start with the VACF which has been broadly used in CG model parameterization and

validation. As shown in Fig. 2.2, the predictions from the present model (NM-MB) show good

agreement with the full MD results. In contrast, the CG model with the memory term represented by

15

Figure 2.2 The VACF of the full MD and CG models with various memory formulations in (a)
semi-log scale (b) original scale. “M” and “NM” represent Markovian and Non-Markovian; GLE,
DPD, and MB represent state-independent, pairwise, and the present (NM-MB) model retaining the
many-body effects, respectively.

the pairwise decomposition and Markovian approximation (i.e., the standard M-DPD form) yields

apparent deviations. The form of the pairwise decomposition with non-Markovian approximation

(NM-DPD) shows improvement at a short time scale but exhibits large deviations at an intermediate

scale. Such limitations indicate pronounced many-body effects in the energy dissipation among

the CG particles. Alternatively, if we set the VACF as the target quantity, we can parameterize

the empirical model such as GLE by matching the VACF predicted by the full MD. Indeed, the

prediction from the constructed GLE recovers the MD results. However, as shown below, this form

over-simplifies the heterogeneity of the memory term and leads to inaccurate predictions on the

collective scales.

Fig. 2.3 shows the velocity cross-correlation function (VCCF) between two CG particles, i.e.,
𝐶𝑥𝑥 (𝑡; 𝑟0) = E[V𝑖 (0) · V 𝑗 (𝑡)|𝑄𝑖 𝑗 (0) = 𝑟0], where 𝑟0 represents the initial distance. Similar to

VACF, the present model (NM-MB) yields good agreement with the full MD results. However,

the predictions from other empirical models, including the GLE form, show apparent deviations.

Such limitations arise from the inconsistent representation of the local energy dissipation and can

be understood as following. The VACF represents the energy dissipation on each particle as a

16

1031021011000.10.40.71.0(a)Full MDM-DPDNM-GLENM-DPDNM-MB012340.10.40.71.0(b)VACFTimeFigure 2.3 The VCCF 𝐶𝑥𝑥 (𝑡; 𝑟0) predicted by the full MD and different CG models with initial
distance (a) 10 < 𝑟0 < 11 and (b) 14 < 𝑟0 < 15. Same line legend as Fig. 2.2.

homogeneous background heat bath; it is essentially a mean-field metric and can not characterize

the dissipative interactions among the particles. Hence, the reduced models that only recover the

VACF could be insufficient to retain the consistent local momentum transport and the correlations

among the particles.

Figure 2.4 (a) Longitudinal and (b) Transverse hydrodynamic modes predicted by MD and different
CG models. Same line legend as Fig. 2.2.

17

0240.0050.0200.035(a)0240.000.030.06(b)TimeVCCF051015200.50.00.51.0Longitudinal(a)051015200.000.250.500.751.00Transverse(b)TimeFurthermore, the various empirical models for local energy dissipations can lead to fundamentally

different transport processes on the collective scale. Fig. 2.4 shows the normalized correlations

of the longitudinal and transverse hydrodynamic modes Hansen and McDonald (1990), i.e.,
𝐶𝐿 (𝑡) = ⟨ ˜𝑢1(𝑡) ˜𝑢1(0)⟩ and 𝐶𝑇 (𝑡) = ⟨ ˜𝑢2(𝑡) ˜𝑢2(0)⟩, where ˜u = 1/𝑀 (cid:205)𝑀
𝑗=1 V 𝑗 e𝑖k·Q 𝑗 , k is the wave
vector, and the subscripts 1 and 2 represent the direction parallel and perpendicular to k, respectively.

Similar to the VCCF, the prediction from the present model (NM-MB) agrees well with the MD

results while other models show apparent deviations. In particular, the prediction from the GLE

model shows strong over-damping due to the ignorance of the inter-molecule dissipations.

Finally, we examine the diffusion process on the collective scale. Fig. 2.5 shows the van

Hove function that characterizes the evolution of the inter-particle structural correlation defined

by 𝐺 (𝑟, 𝑡) ∝ 1
𝑀 2

(cid:205)𝑀

𝑗≠𝑖 𝛿(∥Q𝑖 (𝑡) − Q 𝑗 (0) ∥ − 𝑟). At 𝑡 = 0, 𝐺 (𝑟, 𝑡) reduces to the standard radial

distribution function where all the CG models can recover such initial conditions. However, for

𝑡 > 0, predictions from the models with the pairwise decomposition (NM-DPD) and the GLE form

show apparent deviations. Specifically, at an early stage near 𝑡 = 50, the neighboring particles begin

to artificially jump into the region near the reference particle, violating the fluid-structure thereafter.

In contrast, the present model (NM-MB) shows consistent predictions of the structure evolution over

a long period until 𝑡 = 1000, when the initial fluid structure ultimately diffuses into a homogeneous

state.

2.8 Conclusion

To conclude, this work reveals a caveat in constructing reliable CGMD models that preserve the

collective dynamics. Unlike the empirical forms, we developed a CG model that faithfully accounts

for the broadly overlooked many-body nature of the non-Markovian memory term for extensive

MD systems. While the significance of preserving the many-body nature of the conservative force

field on static properties has been gradually recognized, the caveat on the memory term seems to

remain under-explored. We show that retaining the heterogeneity and the strong correlation of the

local energy dissipation is crucial for accurately predicting the cross-correlation among the CG

particles, which, however, can not be fully characterized by the mean-field metrics such as VACF.

18

Figure 2.5 The van Hove function predicted by (a) full MD (b) the present NM-MB model (c)
NM-DPD model (d) GLE model. It depicts the time evolution (y-axis) of the radius distribution
function (x-axis).

19

0102030200400600(c)1.00.80.80.81.00200400600800(a)0.00.20.40.60.81.01.00102030(d)1.00.80.80.81.0(b)1.00.80.80.61.0TimerMore importantly, the memory form representing the inter-molecule energy dissipations may play

a profound role in the transport and diffusion processes on the collective scale. In particular, the

present model accurately predicts the hydrodynamic mode correlation and the van Hove function

where empirical forms show limitations, and therefore, shows the promise to accurately predict the

emergent phenomena relevant to hydrodynamic transport and diffusive processes on the collective

scale.

20

CHAPTER 3

GENERALIZATION OF REDUCED MODELS FOR NON-EQUILIBRIUM DYNAMICS

3.1 Challenges in Modeling Non-Equilibrium Processes

Predictive modeling of the collective behaviors of multi-scale physical systems poses a persistent

challenge for both fundamental science advancement and various applications Anderson (1972).

While the canonical molecular dynamics enables us to faithfully account for the micro-scale

interactions, numerical simulations often show limitations for systems without clear scale separation,

where the computational cost could become prohibitive to reach the resolved scale of interest. This

motivates the construction of various coarse-grained molecular dynamics (CGMD) models. By

picking a set of collective variables (CVs), the CG models seek the reduced dynamics with less

degrees of freedom to achieve a broader range of spatio-temporal scales. The constructed CG

models are generally governed by a conservative CG potential representing the free energy of the

resolved CVs and a memory term (along with a coherent noise) representing the energy dissipation

that arises from the coupling with unresolved variables. Extensive research efforts Rosso et al.

(2002); Maragliano and Vanden-Eijnden (2006); Izvekov and Voth (2005); Noid et al. (2008); Rudd

and Broughton (1998); Lyubartsev and Laaksonen (1995); Shell (2008); Nielsen et al. (2004); Laio

and Parrinello (2002); Darve and Pohorille (2001); Soper (1996); Reith et al. (2003); Nielsen et al.

(2004); Das and Andersen (2012), including machine learning (ML) based approaches Behler and

Parrinello (2007); Stecher et al. (2014a); John and Csányi (2017); Lemke and Peter (2017); Chmiela

et al. (2017); Zhang et al. (2018b); Ge et al. (2023); van der Oord et al. (2023), have been dedicated

to developing the CG potential to preserve the marginal density distribution of the CG variables

and hence the various static properties. To retain the CG dynamics, the memory term needs to be

properly introduced. Several approaches have been developed based on the direct approximations

Darve et al. (2009); Lei et al. (2010); Hijón et al. (2010); Yoshimoto et al. (2013); Hudson and

Li (2020) of the Mori-Zwanzig (MZ) projection formalism Mori (1965); Zwanzig (1973) and

data-driven parameterization Lange and Grubmüller (2006); Ceriotti et al. (2009); Coifman et al.

(2008); Crosskey and Maggioni (2017); Baczewski and Bond (2013); Ma et al. (2016); Lei et al.

21

(2016); Russo et al. (2019); Jung et al. (2017); Lee et al. (2019); Klippenstein and van der Vegt

(2021); Vroylandt et al. (2022); She et al. (2023); Xie et al. (2022); Ge et al. (2024) of empirical forms

(e.g., dissipative particle dynamics Hoogerbrugge and Koelman (1992); Español and Warren (1995),

the generalized Langevin dynamics Zwanzig (2001)). A recent work Lyu and Lei (2023b) proposed

a symmetry-preserving representation of the memory term that accounts for both non-Markovianity

and the many-body nature among the CG variables, which proves to be crucial for diffusion and

transport processes on a resolved scale.

Ideally, by accurately constructing the conservative free energy and the memory term, the CG

models will enable us to quantify the propagation of the micro-scale interactions and therefore probe

the collective behaviors across multiple scales. However, the validity and generalization ability for

practical applications remains under-explored. In particular, for extensive MD systems (i.e., the

number of molecules proportional to the system size), the CG variables are often chosen a priori such

as the centers of mass (COMs) of individual molecules; the CG model is generally constructed such

that certain dynamic properties (e.g, the mean square displacement, velocity correlation functions)

under equilibrium can be properly reproduced. On the other hand, the model’s applicability for

non-equilibrium processes remains questionable. For instance, if we coarse grain a polymer melt

system using the COMs of individual molecules, we should not expect the CG model can capture

the visco-elastic responses arising from the molecule deformation under an external flow field.

From a model reduction perspective, this limitation arises from the choice of the CG projection

operator, which is generally defined with respect to the (marginal) equilibrium density of the full

MD system. Accordingly, reduced dynamics can recover the full MD prediction only when the

underlying distribution is close to equilibrium. However, this caveat seems to be broadly overlooked

in existing CGMD models that use equilibrium dynamic properties as the metric, and therefore

poses fundamental challenges for predicting the non-equilibrium processes in real applications.

In principle, we may directly construct the reduced model for a non-equilibrium process by

defining a projection operator with respect to a specific probability measure. However, the reduced

dynamics typically relies on a time-dependent projection operator (e.g., see Ref. Kawasaki and

22

Gunton (1973); Willis and Picard (1974); Baxevani et al. (2023)), which, as a result, generates

a non-stationary memory term that may not be transferred to a different scenario. In this work,

we present a new approach for constructing reliable CGMD models applicable to non-equilibrium

processes. Rather than using pre-selected CG variables such as the COMs, we seek a set of auxiliary

CG variables as the generalized coordinates of each molecule for the optimal representation of

both the intra- and inter-molecular interactions. The key observation is that by systematically

introducing these auxiliary CG variables, the conditional distribution of unresolved variables under

various non-equilibrium conditions approaches that under equilibrium conditions. This warrants the

generalization ability and enables us to transfer the empirical approximation of the non-equilibrium

processes into the construction of the Zwanzig’s projection dynamics with respect to a probability

density distribution with augmented conditional variables. To construct the reduced dynamics, we

note that both the free energy and the memory term exhibit the many-body nature. We generalize

the symmetry-preserving neural network representation of the state-dependent memory developed

in our previous work Lyu and Lei (2023b) to model both the intra- and inter-energy dissipation. For

each CG variable, we further introduce a number of non-Markovian features, allowing a coherent

white noise term to be naturally imposed that satisfies the second fluctuation-dissipation theorem

and preserves the invariant distribution. We demonstrate the proposed method by constructing the

CGMD models of a polymer melt system. Numerical results show that the CG model based on the

pre-selected COMs can recover the dynamic properties near equilibrium but is insufficient to predict

the reduced dynamics under external flow conditions. Conversely, the present model guarantees

that the unresolved orthogonal dynamics remains near equilibrium, ensuring the applicability of the

projection formalism, and therefore, accurately predict non-equilibrium processes under various

flow field conditions.

3.2 Methodology for Generalizable Coarse-Grained Models

3.2.1 Preliminaries

Let us consider a full MD system consisting of 𝑀 molecules with a total number of 𝑁 atoms.

For simplicity, we assume each molecule consists of 𝑁𝑚 atoms, and the mass is set to be unit. The

23

full model is governed by

(cid:164)z = S∇𝐻 (z) z(0) = z0,

(3.1)

where 𝐻 (z) is the Hamiltonian, z = (cid:2)q1, q2, · · · , q𝑀, p1, p2, · · · , p𝑀 (cid:3) is the phase space vector and
(cid:104)
q𝐼
S is the symplectic matrix. Specifically, q𝐼 =
∈ R𝑁𝑚×3
1

and p𝐼 =

, · · · , p𝐼

, · · · , q𝐼

p𝐼
1

𝑁𝑚

𝑁𝑚

(cid:105)

(cid:104)

(cid:105)

represent the position and momentum vectors of the 𝐼-th molecule for 𝐼 = 1, 2, · · · 𝑀. In this work,

the superscript indices in capital letter 𝐼 and 𝐽 ranging from 1 to 𝑀 label the molecule index, while

the subscript indices 𝑛 and 𝑙 ranging from 1 to 𝑁𝑚 label the individual particles within the molecule.
To construct the reduced model, we define the CG variables of each molecule Z𝐼 = (cid:2)Q𝐼, P𝐼 (cid:3) via a
map 𝜙 : R𝑁𝑚×6 → R𝑚×6, i.e.,

Q𝐼 = 𝜙𝑄 (q𝐼) P𝐼 = 𝜙𝑃 (q𝐼, p𝐼),

(3.2)

where Q𝐼 ∈ R𝑚×3 and P𝐼 ∈ R𝑚×3 represent the generalized coordinates and momenta of the 𝐼-th

molecule, respectively. Let Q = [Q1, Q2, · · · , Q𝑚𝑀] and P = [P1, P2, · · · , P𝑚𝑀] denote the CG
variables of the full system. Furthermore, we use Q𝐼 = (cid:2)Q(𝐼−1)𝑚+1, · · · , Q𝐼𝑚(cid:3) to denote the position
𝑗 = Q𝜇 and 𝜇 = (𝐼 − 1)𝑚 + 𝑗, similarly for P 𝐼. In this work,
vector of the 𝐼-th molecule, where Q𝐼

the Greek letters 𝛼, 𝛽, 𝜇, 𝜈 ranging from 1, · · · , 𝑚𝑀 label the global indices of CG coordinate.

In particular, one natural choice is to define Q𝐼 ∈ R3 as the COM and P 𝐼 ∈ R3 the total

momentum of the 𝐼-th molecule (e.g., see Refs. Lei et al. (2010); Hijón et al. (2010); Yoshimoto et al.

(2013)), which essentially eliminates the intra-molecular DOFs. The dynamic evolution is governed
(cid:164)Z (𝑡) = LZ (𝑡), where the Louville operator L𝜙(z) := −((∇𝐻 (z0))𝑇 S∇z0)𝜙(z) depends on
the full phase space vector z. To derive the reduced dynamics in terms of Z (𝑡), we define the

by

Zwanzig’s projection operator as a condition expectation with respect to Z (0) = Z, i.e.,

PZ 𝑓 := E[ 𝑓 (z)|𝜙(z) = Z] =

∫

𝛿(ϕ(z) − Z) 𝜌0(z) 𝑓 (z)dz/

∫

𝛿(ϕ(z) − Z) 𝜌0(z)dz,

(3.3)

where 𝜌0(z) ∝ exp [−𝛽𝐻 (z)] represents the equilibrium Boltzmann distribution. Accordingly, we
(cid:164)Z (𝑡) = LZ (𝑡) on the sub-space of the CG variables. Using the

may project the evolution dynamics

Duhamel-Dyson formula, the reduced dynamics takes the form
∫ 𝑡

(cid:164)Z (𝑡) = eL𝑡 PZ LZ (0) +

d𝑠eL (𝑡−𝑠) PZ LeQZ L𝑠QZ LZ (0) + eQZ L𝑡 QZ LZ (0),

(3.4)

0

24

where QZ = I − PZ. With some further assumptions, the reduced dynamics can be simplified into

the following integro-differential equation

(cid:164)Q = M −1P

(cid:164)P = −∇𝑈 (Q) +

∫ 𝑡

0

K (Q(𝑠), 𝑡 − 𝑠) (cid:164)Q(𝑠) d𝑠 + R(𝑡),

(3.5)

where M is the mass matrix of the CG variables, 𝑈 (Q) is the conservative free energy, and K (Q, 𝑡)

is the memory term assumed to be independent of P and R(𝑡) is the fluctuation force.

In principle, with proper construction of the individual modeling terms, Eq. (3.5) provides

an accurate CGMD model of the full dynamics (3.1). However, its applicability for practical

applications, especially the generalization ability for modeling non-equilibrium processes remains

questionable. This caveat is rooted in the definition of the projection operator in Eq. (3.3), where the

conditional expectation is defined with respect to the marginal density of the equilibrium distribution

𝜌0(z). Accordingly, the derived reduced dynamics (3.5) assumes that the initial distribution of

the full model satisfies 𝜌(z|Z) ∝ 𝛿(𝜙(z) − Z) 𝜌0(z). Hence, Eq. (3.5) is valid only when the

underlying distribution of the unresolved variables is close to equilibrium. Unfortunately, this

condition is generally not guaranteed when we model non-equilibrium processes (e.g., full model

(3.1) in the presence of an external field) using the CG variables like the COMs of individual

molecules, which poses a fundamental challenge for the transferability of the reduced model (2.2)

and (3.5) in real applications.

3.2.2 Construction of the auxiliary CG variables

Following the above discussion, let 𝐻𝑒 (z) and 𝜌𝑒 (z) denote the Hamiltonian and an initial (e.g.,

the steady-state) distribution under an external field. Theoretically, we may define the Zwanzig’s

projection operator with respect to 𝜌𝑒 (z) and derive the reduced dynamics under this external field.

However, the constructed reduced dynamics could be valid only for this specific condition but lack

the generalization ability for other external field conditions.

To address this issue, we transfer the exhausting effort of constructing specific reduced models

for individual non-equilibrium processes to pursuing the following question: how to ensure the

conditional projection operator defined by Eq. (3.3) remains valid for a range of external conditions?

25

A key observation is that by introducing a set of auxiliary CG variables Z𝐼 = [Z𝐼
1

, Z𝐼
2

, · · · , Z𝐼

𝑚] for

the individual molecules 𝐼 = 1, 2, · · · , 𝑀, the conditional probability density function (PDF) under

external field approaches that under the equilibrium distribution, i.e.,

1
Z𝑒

𝑀
(cid:214)

𝑚
(cid:214)

𝐼=1

𝑗=1

𝛿(𝜙(z𝐼) 𝑗 − Z 𝐼

𝑗 ) 𝜌𝑒 (z) ≈

1
Z0

𝑀
(cid:214)

𝑚
(cid:214)

𝐼=1

𝑗=1

𝛿(𝜙(z𝐼) 𝑗 − Z 𝐼

𝑗 ) 𝜌0(z).

(3.6)

Intuitively, this approximation can be understood as follows: As we increase the number of resolved

CG variables, the entropy contribution of the unresolved ones decreases. Accordingly, the free

energy of different external conditions approaches the equilibrium case, and the reduced model (3.4)

and (3.5) in the form of Z 𝐼 can be applied to non-equilibrium processes. This is somewhat similar

to the work Lei et al. (2020); Yu et al. (2005) on modeling the constitutive closure of polymer

solution, where a set of generalized conformation tensors are introduced to represent the subgrid

polymer configurations under various flow conditions.

To construct the auxiliary CG variables, we seek a linear map 𝜙(·) in the form of a matrix

f𝑊 ∈ R𝑁𝑚×𝑚, i.e.,

𝜙𝑄 (q𝐼) 𝑗 =

(cid:205)𝑁𝑚
𝑛=1
(cid:205)𝑁𝑚
𝑛=1

𝑤𝑛 𝑗 q𝐼
𝑛
𝑤𝑛 𝑗

,

𝜙𝑃 (p𝐼) 𝑗 =

𝑚
∑︁

𝑖=1

𝑀 𝑗𝑖

(cid:205)𝑁𝑚
𝑛=1
(cid:205)𝑁𝑚
𝑛=1

𝑤𝑛𝑖p𝐼
𝑛
𝑤𝑛𝑖

,

(3.7)

where the mass matrix is defined as 𝑀 𝑗𝑖 = (cid:205)𝑁𝑚
𝑛=1

𝑤𝑛 𝑗 𝑤𝑛𝑖, and the normalization ensures that the

transitional and rotational symmetry of the CG variables, i.e.,

𝜙(q𝐼 + c, p𝐼) = [𝜙𝑄 (q𝐼 + c), 𝜙𝑃 (p𝐼)] = [𝜙𝑄 (q𝐼) + c, 𝜙𝑃 (p𝐼)]

𝜙(q𝐼U, p𝐼U) = [𝜙𝑄 (q𝐼U), 𝜙𝑃 (p𝐼U)] = [𝜙𝑄 (q𝐼)U, 𝜙𝑃 (p𝐼)U],

for any c ∈ R3 and U ∈ S𝑂 (3). As shown below, Eq. (3.7) can be loosely viewed as seeking the

principal normal modes of the full dynamics. We note that it is possible to construct 𝜙(·) as a

non-linear encoder to seek optimal CG representations of the inter- and intra- molecular interactions.

In this work, we focus on developing a general CG framework of introducing auxiliary CG variables

to enhance the generalization ability and examining Eq.

(3.6) as a valid metric for modeling

26

non-equilibrium processes. We restrict 𝜙(·) to be linear; the model reduction in terms of nonlinear

CG variables will be investigated in future work.

Furthermore, we impose the following two constraints to the weight matrix W , i.e.,

0 < 𝑤𝑛𝑖 < 1,

𝑚
∑︁

𝑖=1

𝑤𝑛𝑖 = 1

for all 𝑛 = 1, · · · , 𝑁𝑚.

(3.8)

We emphasize that these constraints do not impose further restrictions on constructing the CG
variables. Specifically, for any mapping defined by a matrix W1 ∈ R𝑁𝑚×(𝑚−1), there exists an
equivalent mapping defined by a matrix W2 ∈ R𝑁𝑚×𝑚 which satisfies the above constraints (3.8).

We refer to the Appendix for the detailed discussion. Furthermore, we note that the second

(cid:205)𝑚

constraint ensures that the COM of the full molecule is consistent with that of the CG variables, i.e.,
𝑛=1 q𝐼
of the different CG models on the same metric in Sec. 3.3. Moreover, Newton’s third law can be

𝑛, which enables us to establish a fair comparison

𝑗 = (cid:205)𝑚
𝑗=1

𝑛 = (cid:205)𝑁𝑚

(cid:16)(cid:205)𝑁𝑚
𝑛=1

(cid:205)𝑁𝑚
𝑛=1

𝑤𝑛 𝑗 q𝐼

𝑤𝑛 𝑗

Q𝐼

𝑗=1

(cid:17)

naturally imposed in Eq. (3.20).

To learn CG mapping 𝜙(·), we aim to find a weight matrix W such that condition (3.6) holds,

which, unfortunately, cannot be easily checked a priori. Alternatively, we propose to learn the CG

variables with the longest relaxation of the time correlation function. Specifically, we collect training

samples under an external field and seek W by solving the optimization problem

(cid:205)𝑀,𝑚

(cid:205)𝜏 (cid:205)𝑡𝑐
𝑡=0
(cid:205)𝜏 (cid:205)𝑀,𝑚

𝐼, 𝑗 V𝐼
𝐼, 𝑗 V𝐼

𝑗 (𝜏)V𝐼
𝑗 (𝜏)V𝐼

𝑗 (𝜏 + 𝑡)𝑇
𝑗 (𝜏)𝑇

max
W

(3.9)

under constraint (3.8), where V𝐼

𝑗 =

(cid:205)𝑁𝑚
𝑛=1
(cid:205)𝑁𝑚
𝑛=1

𝑤𝑛 𝑗 p𝐼
𝑛
𝑤𝑛 𝑗

, 𝑡𝑐 is a hyperparameter representing the cut-off time

when the correlation function decays to 0 and 𝜏 is randomly selected from a long MD trajectory under

the steady state. Intuitively, Eq. (3.9) yields the eigenmodes of the largest (negative) eigenvalue

of the operator L under the linear approximation Crommelin and Vanden-Eijnden (2011). We

note that this is somewhat similar to the time-lagged independent component analysis (TICA)

Pérez-Hernández et al. (2013); Molgedey and Schuster (1994); Schwantes and Pande (2013) and

dynamic mode decomposition (DMD) Schmid (2010); Chen et al. (2012); Kutz et al. (2016) to

identify the dominant dynamics from the simulation data. Furthermore, we emphasize that Eq. (3.9)

27

𝜇

Figure 3.1 Diagram illustrating the neural network architecture for the CG conservative force and the
memory kernel. (a) The construction of CG potential function 𝑈 and memory kernel K. Initially, Q
is converted into a local environment matrix { ˜Q𝜇}𝑀𝑚
𝜇=1. Sub-networks, illustrated in (b), map ˜Q𝜇 to a
local feature D𝜇 and generalized coordinate ˆQ𝜇. Finally, the totel potential is constructed by Eq.
(3.10), i.e. 𝑈 = (cid:205)𝑀𝑚
˜𝑈 (𝐷 𝜇). The total memory kernel K is constructed with the state-dependent
component of the memory kernel derived from ˆQ𝜇 using Eq.
(2.16) and the time-dependent
component 𝚲. (b) The sub-networks map ˜Q𝜇 to a local feature 𝐷 𝜇 and generalized coordinate ˆQ𝜇.
The neighbour of the 𝜇-th CG coordinate is denoted by N𝜇 = {𝛼, · · · , 𝜈, · · · , 𝛽}. (b1) The 𝑘-th row
of generalized local environment matrix embeds the relative information between 𝜇-th coordinate
and its 𝑘-th neighbor (labeled as the 𝜈-th coordinate), including type embedding 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇, and
distance information. (b2) The 𝐾1 × 𝐾2 symmetry perserving feature D𝜇 is constructed by (G1,𝜇)𝑇 ,
ˆQ𝜇, ˆQ𝑇
𝜇 and G2,𝜇. (b3) The generalized coordinate is constructed from the local environment
embedding matrix G3,𝜇 and relative position ¯Q𝜇, which is the last three columns of ˜Q𝜇.

should not viewed as the equivalent condition of Eq. (3.6). Rather, it is essentially an empirical

approach for learning CG mappings such that the metric (3.6) satisfies approximately.

3.2.3 Construction of the symmetry-preserving free energy function

With the CG mapping 𝜙, the construction of the CG model proceeds with learning the free

energy 𝑈 (Q) and the memory kernel K (Q, 𝑡). To retain the extensive structure, the free energy

𝑈 (Q) is decomposed into the local potential of individual CG coordinates, i.e.,

𝑈 (Q) =

𝑀𝑚
∑︁

𝜇=1

˜𝑈 (D( ˜Q𝜇)),

(3.10)

where ˜𝑈 (·) represents the local potential, ˜Q𝜇 represents the local environment determined by the
relative position between the CG coordinates Q𝜇 and its neighbors Q𝜈 in {𝜈 (cid:12)

(cid:12)𝜈 ∈ N𝜇 }. The detailed

28

Mem. KernelD state embedded mat.Qcoord. mat.˜Q1⋯˜Qμ⋯˜QMmD1⋯Dμ⋯DMmgener. local env. mat.local featurestotal potential (a) (b)QμG1,μDμ=(G1,μ)T˜Qμ(˜Qμ)TG2,μlocal coord.gener. local env. mat.(b2) ̂Qμ=Qμ⊕(G3,μ)T¯Qμ̂Q1⋯̂Qμ⋯̂QMm̂Ξ1,2⋯̂Ξ1,Mm⋯⋯⋯̂ΞMm,1⋯̂ΞMm,MmKCG(Q,t)=Ξ(Q)eΛtΞ(Q)T embed. mat. G1,μ deﬁned as g1(s(rνμ),aν,aμ,bνμ)G2,μg2(s(rνμ),aν,aμ,bνμ)G3,μg3(s(rνμ),aν,aμ,bνμ)G2,μG3,μ (b3)U(Q)=Mm∑μ=1˜U(D(˜Qμ)) (b1)coord.K3 gener.(b3)(b2)˜Qμ=aαaμbαμs(rαμ)̂xαμ̂yαμ̂zαμ⋯⋯⋯⋯⋯⋯⋯ajaμbνμs(rνμ)̂xνμ̂yνμ̂zνμ⋯⋯⋯⋯⋯⋯⋯aβaμbβμs(rβμ)̂xβμ̂yβμ̂zβμ (b1)=¯Qμform will be specified following Eq. (3.12). D(·) represents the encoder functions that map ˜Q𝜇

into symmetry-preserving features. N𝜇 denotes the CG coordinate indices 𝜈 in the neighbour of CG

coordinate indices 𝜇, such that 𝑟 𝜇𝜈 < 𝑟𝑐. We define 𝑁𝜇 as the cardinality of the set N𝜇.

Fig. 3.1 sketches the neural network representation of 𝑈 (Q), where a structure similar to Ref.

Zhang et al. (2018c) is used to preserve the translational and rotational symmetry constraints. To

impose the permutational symmetry, we note that 𝑈 (Q) remains invariant with the index permutation

among the individual molecule, i.e.,

𝑈 (Q𝜎(1), Q𝜎(2), · · · , Q𝜎(𝑀)) = 𝑈 (Q1, Q2, · · · , Q𝑀),

(3.11)

where {𝜎(𝐼)}𝑀

𝐼=1 represents an index permutation among moleculars. However, 𝑈 (Q) does not

necessarily remain invariant under an index permutation among the generalized CG coordinates, i.e.,

𝑈 (Q1
1

, Q1
2

, · · · , Q2
1

, Q2

2 · · · ) ≠ 𝑈 (Q1
1

, Q2
2

, · · · , Q2
1

, Q1

2 · · · ),

𝑈 (Q1
1

, Q1
2

, · · · , Q2
1

, Q2

2 · · · ) ≠ 𝑈 (Q1
2

, Q1
1

, · · · , Q2
1

, Q1

2 · · · ),

(3.12)

where Q𝐼

𝑖 = Q(𝐼−1)𝑚+𝑖. This complication can not be remedied by introducing the particle type

information as done in Ref. Zhang et al. (2018c). This is because, for a specific CG coordinate,

its contribution to 𝑈 (Q) could be either inter- or intra-molecular interaction associated with other

CG coordinates. Therefore, we define the local environment of the 𝜇-th coordinate by ˜Q𝜇 =

(𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇, 𝑠(𝑟𝜈𝜇), ˆ𝑥𝜈𝜇, ˆ𝑦𝜈𝜇, ˆ𝑧𝜈𝜇) ∈ R𝑁𝜇×7, where relative position r𝜈𝜇 = Q𝜈 −Q𝜇 = (𝑥𝜈𝜇, 𝑦𝜈𝜇, 𝑧𝜈𝜇)

denotes the relative position to its neighbor, ˆ𝑥𝜈𝜇 = 𝑠(𝑟𝜈𝜇)𝑥𝜈𝜇/𝑟𝜈𝜇, and similar for ˆ𝑦𝜈𝜇 and ˆ𝑧𝜈𝜇. 𝑠(𝑟)

is a preselected function defined by

𝑠(𝑟) =

𝑟
𝑟 2
𝑐

1
𝑟

1
2𝑟

0






if 𝐼 = 𝐽,

if 𝑟 < 𝑟𝑐𝑠 and 𝐼 ≠ 𝐽,

(cid:104)

1 + cos

(cid:16) 𝜋(𝑟−𝑟𝑐𝑠)
𝑟𝑐−𝑟𝑐𝑠

(cid:17)(cid:105)

if 𝑟𝑐𝑠 ≤ 𝑟 < 𝑟𝑐 and 𝐼 ≠ 𝐽,

e𝑙𝑠𝑒,

where 𝐼 and 𝐽 represent the molecule indices of the 𝜇-th and its neighbor coordinate. The motivation

of this definition is the interactions between molecules are intended to steadily decline to zero as

29

the distance grows, whereas intra-molecular interactions are expected to intensify with increased

distance. In particular, if the two coordinates belong to the same molecule, their intra-molecular

interactions will always be accounted for. Otherwise, 𝑠(𝑟) is a smooth differentiable function that

decays to 0 beyond 𝑟𝑐. 𝑟𝑐𝑠 is a hyper-parameter and is set to be 𝑟𝑐 − 1 in this study. The 𝑎𝜈, 𝑎𝜇 and

𝑏𝜈𝜇 are parameters to represent the type of atom and the type of interaction. Specifically, 𝑏𝜈𝜇 is

set to be 1 if the 𝜇-th coordinate and its neighbor 𝜈-th coordinate are part of the same molecule

𝐼, which means that there exist 𝑖, 𝑗 ∈ {1, · · · , 𝑚} such that 𝜇 = (𝐼 − 1)𝑚 + 𝑖 and 𝜈 = (𝐼 − 1)𝑚 + 𝑗.

Otherwise, it is set to be −1. In addition, we also introduce 𝑎𝜈, 𝑎𝜇 to represent the type of the 𝜇-th

CG coordinate and the type of the 𝜈-th CG coordinate within the neighborhood. In practice, 𝑎𝜇, 𝑎𝜈

are set to be 𝑖, 𝑗 ∈ {1, · · · , 𝑚} defined in 𝑏𝜈𝜇.

With the local environment ˜Q𝜇, we construct the local features D ∈ R𝐾1×𝐾2 by

D( ˜Q𝜇) = (cid:0)G1,𝜇(cid:1)𝑇 ˜Q𝜇 (cid:0) ˜Q𝜇(cid:1)𝑇

G2,𝜇

𝑇

(3.13)

∑︁

∑︁

(cid:169)
(cid:173)
(cid:171)

𝜈∈N𝜇

𝜈∈N𝜇

:= (cid:169)
(cid:173)
(cid:171)

g2(𝑠(𝑟𝜈𝜇), 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇) ˜Q𝜈𝜇(cid:170)
(cid:174)
(cid:172)

g1(𝑠(𝑟𝜈𝜇), 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇) ˜Q𝜈𝜇(cid:170)
(cid:174)
(cid:172)
where G1,𝜇 ∈ R𝑁𝜇×𝐾1, G2,𝜇 ∈ R𝑁𝜇×𝐾2. The 𝑘-th row of G1,𝜇 is represented as a neural network
g1(𝑠(𝑟𝜈𝜇), 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇) that embeds the relative information between the 𝜇-th coordinate and
its 𝑘-th neighbor (labeled as the 𝜈-th coordinate) in R4 to a feature space in R𝐾1. Similarly,
G2,𝜇 = g2(𝑠(𝑟𝜈𝜇), 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇) is represented by a neural network mapping R4 to R𝐾2.

,

To construct the free energy function 𝑈 (Q), we conduct restrained MD simulations (see Sec.

3.2.4 for details) and train the neural network representations ˜𝑈 (·), g1(·) and g2(·) by minimizing

the empirical loss between the MD and CG model, i.e.,

𝐿𝑈 =

𝑆
∑︁

𝑠=1

(cid:13)
(cid:13)
(cid:13)

∇𝑈 (Q(𝑠)) + F (Q(𝑠))

(cid:13)
2
(cid:13)
(cid:13)

,

(3.14)

where the superscript 𝑠 represents various configurations and the F represents the conservative

force term sampled from the full MD simulation; see Sec. 3.2.5 for details.

30

3.2.4 Construction of the symmetry-preserving memory function

Besides the free energy function 𝑈 (Q), the reduced dynamics further depends on the memory

function K (Q, 𝑡). In particular, recent work Lyu and Lei (2023b) shows that the memory function

could exhibit a strong many-body nature. Empirical approximations such as the standard GLE

with a homogeneous kernel and the dissipative particle dynamics with a pairwise decomposition

generally show limitations in modeling the heterogeneous energy dissipation among the CG particles.

To predict the collective dynamics, it is crucial to accurately model both the non-Markovian and

many-body nature of the memory function.

Following the neural network structure proposed in Ref. Lyu and Lei (2023b), we encode the

many-body dissipative interactions among the CG particles by introducing coordinate features
𝐾3𝜇 ] ∈ R𝐾3×3, where the feature (cid:98)Q𝜇 is defined by

(cid:98)Q𝜇 = [ (cid:98)Q1

𝜇, · · · , (cid:98)Q

𝜇, (cid:98)Q2

(cid:98)Q𝜇 = Q𝜇 ⊕ (cid:0)G3,𝜇(cid:1)𝑇 ¯Q𝜇

:= Q𝜇 ⊕

∑︁

𝜈∈N𝜇

g3(𝑠(𝑟𝜈𝜇), 𝑎𝜈, 𝑎𝜇, 𝑏𝜈𝜇) ¯Q𝜈𝜇,

(3.15)

where G3,𝜇 ∈ R𝑁𝜇×𝐾3, where g3 : R4 → R𝐾3 is a neural network that encodes the dissipative
¯Q𝜇 ∈ R𝑁𝜇×3 denotes the relative position between the 𝜇-th

interactions beyond the pairwise form.

coordinate and its local neighbor, whose 𝑘-th row (corresponding to 𝜈-th coordinate) is defined as

¯Q𝜈𝜇 := ( ˆ𝑥𝜈𝜇, ˆ𝑦𝜈𝜇, ˆ𝑧𝜈𝜇). Accordingly, we can construct the state-dependent memory in the form of
𝚵(Q(𝑡))e𝚲(𝑡−𝑠)𝚵(Q(𝑠))𝑇 , with 𝚵 = (cid:2)𝚵1𝚵2 · · · 𝚵𝐷 (cid:3) and

𝚵𝑑

𝜇𝜈 =

𝐾3∑︁

𝑘=1

𝑘 ( ˆQ𝜇𝜈 ˆQ𝑇
ℎ𝑑

𝜇𝜈) ˆQ𝑘

𝜇𝜈 ⊗ ˆQ𝑘

𝜇𝜈 + ℎ𝑑

0 ( ˆQ𝜇𝜈 ˆQ𝑇

𝜇𝜈)I,

(3.16)

where ℎ : R𝐾3×𝐾3 → R𝐷 (𝐾3+1) and 𝑑 = 1, · · · , 𝐷. 𝚲 takes an extendable form 𝚲 = ˆ𝚲 ⊗ I, where

I ∈ R3𝑀×3𝑀 is the identity matrix and ˆ𝚲 ∈ R𝐷×𝐷 specifies the coupling among the features. It

takes the form ˆ𝚲 = ˆL ˆL𝑇 + ˆL𝑎, where ˆL is a lower triangular matrix and ˆL𝑎 is an anti-symmetry

𝑇
matrix; it satisfies the Lyapunov stability condition ˆ𝚲 + ˆ𝚲

≥ 0.

We can show that 𝚵 in the form of Eq. (3.16) strictly preserve the translational, rotational, and

31

permutational symmetry constraints, i.e.,

𝜇𝜈 (Q1 + b, · · · , Q𝑚𝑀 + b) = 𝚵𝑑
𝚵𝑑

𝜇𝜈 (Q1, · · · , Q𝑚𝑀)

𝜇𝜈 (Q1U, · · · , Q𝑚𝑀U) = U𝚵𝑑
𝚵𝑑

𝜇𝜈 (Q1, · · · , Q𝑚𝑀)U𝑇

(3.17)

𝚵𝑑

𝜇𝜈 (Q𝜎(1), Q𝜎(2), · · · , Q𝜎(𝑀)) = 𝚵𝑑

𝜎(𝜇)𝜎(𝜈) (Q1, Q2, · · · , Q𝑀)

for any c ∈ R3, U ∈ S𝑂 (3), and index permutation {𝜎(𝐼)}𝑀
Q𝐼 = (cid:2)Q(𝐼−1)𝑚+1, · · · , Q𝐼𝑚(cid:3).

𝐼=1 between the molecules, where

To construct K (Q, 𝑡), we learn the memory embedded in Eq. (3.21) in the form of 𝚵(Q(𝑡))e𝚲(𝑡−𝑠)𝚵(Q(𝑠))𝑇

directly from the Zwanzig’s formalism so that the many-body nature can be naturally inherited. We

estimate the memory term K𝑀 𝑍 (Q, 𝑡) for a collection of CG configurations. This enables us to train
the memory function in terms of the encoders {g3(·), ℎ} and matrices { ˆL, ˆL𝑎} by minimizing the

empirical loss

𝐿𝐾 =

𝑁𝑠
∑︁

𝑁𝑡
∑︁

𝑠=1

𝑗=1

(cid:13)
(cid:13)K𝐶𝐺 (Q(𝑠), 𝑡 𝑗 ) − K𝑀 𝑍 (Q(𝑠), 𝑡 𝑗 )
(cid:13)

(cid:13)
2
(cid:13)
(cid:13)

,

(3.18)

where 𝑠 represents the different CG configurations.

3.2.5 Traning and evaluation of the CGMD model

Following Ref. Hijón et al. (2010), we use the restrained dynamics ˜z(𝑡) = eR𝑡z(0) to

approximate the orthogonal dynamics eQZ 𝐿𝑡 based on the observation PQ = PR ≡ 0. Accordingly,

the conservative term can be approximated by F (Q) = PZ LP and the memory term of Zwanzig’s
(cid:2)(eR𝑡 QZ LP ) (QZ LP )𝑇 (cid:3) and the
form (see Eq. (3.4)) can be approximated by K𝑀 𝑍 (Q, 𝑡) = PZ
memory of the CG model reduces to K𝐶𝐺 (Q, 𝑡) = 𝚵(Q)e𝚲𝑡𝚵(Q)𝑇 . To collect the training samples,

we establish the restrained dynamics following

(cid:164)q𝐼 = p𝐼 − W (W 𝑇 W )−1W 𝑇 p𝐼

(cid:164)p𝐼 = f 𝐼 − W (W 𝑇 W )−1W 𝑇 f 𝐼

(3.19)

where f 𝐼 =

(cid:104)

f 𝐼
1

, · · · , f 𝐼
𝑁𝑚

(cid:105)

∈ R𝑁𝑚×3 is the force of the individual atoms belonging to the molecule

𝐼. The total force of each CG variable is defined as F𝐼

𝑗 = (cid:205)𝑁𝑚
𝑛=1

𝑤𝑛 𝑗 f 𝐼

𝑛, which is equivalently denoted

by F𝜇 with 𝜇 = (𝐼 − 1)𝑚 + 𝑗 and 𝜇 = 1, · · · , 𝑚𝑀.

32

Under the restrained dynamics (3.19), the orthogonal fluctuation term corresponds to the
random force of each CG variable, i.e., QZ LP = [𝛿F1, 𝛿F2, · · · , 𝛿F𝑚𝑀], where 𝛿F𝜇 = F𝜇 − (cid:10)F𝜇(cid:11)
represents the random force of the 𝜇-th CG coordinate and ⟨·⟩ represents the condition expectation

whose generalization ability will be examined in Sec. 3.3. Due to the constraint (3.8), the Newton’s

third law is naturally satisfied, i.e.,
𝑁𝑚∑︁
𝑚𝑀
∑︁

𝑀,𝑚
∑︁

F𝜇 =

𝜇

𝐼,𝑖

𝑛

𝑤𝑛𝑖f 𝐼

𝑛 =

𝑀,𝑚
∑︁

𝑁𝑚∑︁

𝑀,𝑁𝑚∑︁

𝑤𝑛𝑖

𝐼,𝑖

𝑛

𝐽,𝑙

f 𝐼𝐽
𝑛𝑙 =

𝑀,𝑁𝑚∑︁

𝑀,𝑁𝑚∑︁

𝐼,𝑛

𝐽,𝑙

f 𝐼𝐽
𝑛𝑙 ≡ 0,

(3.20)

𝑛𝑙 represents the force between the 𝑛-th atom of the 𝐼-th molecule and the 𝑙-th atom of the

where f 𝐼𝐽
𝐽-th molecule. Therefore, 𝛿F𝜇 = − (cid:205)𝜈≠𝜇 𝛿F𝜈 and 𝚵𝜇𝜇 = − (cid:205)𝜈≠𝜇 𝚵𝜇𝜈.

The constructed reduced model can be simulated by the extended dynamics of [Q, P , ζ] in the

form of

(cid:164)Q = M −1P

(cid:164)P = −∇𝑈 (Q) + 𝚵(Q)ζ

(cid:164)ζ = −𝚵(Q)𝑇 (cid:164)Q − 𝚲ζ + ξ(𝑡),

(3.21)

where we introduce the white noise term following ⟨ξ(𝑡)ξ(𝑡′)⟩ = 𝛽−1(𝚲 + 𝚲𝑇 )𝛿(𝑡 − 𝑡′). With this

choice, we can show that Eq. (3.21) satisfies the second fluctuation-dissipation theorem. We refer to

Ref. Lyu and Lei (2023b) for the detailed proof.

3.3 Numerical Results

3.3.1 The full atomistic model

To validate our method, we consider a full micro-scale model of a polymer melt system. Each

polymer molecule consists of 𝑁𝑚 = 106 atoms. The central atom is connected to 3 arms and each

arm consists of 5 atoms. At the end of each arm, there are 6 additional arms connected to the end

atom and each additional arm consists of 5 atoms. The pairwise atomistic interaction is chosen to be

the Weeks-Chandler-Andersen potential

𝑉𝑝 (𝑟) =


𝑉L𝐽 (𝑟) − 𝑉L𝐽 (𝑅𝑐), 𝑟 < 𝑅𝑐



0, 𝑟 ≥ 𝑅𝑐

33

𝑉L𝐽 (𝑟) = 4𝜖

(cid:17) 12

(cid:20)(cid:16) 𝜎
𝑟

(cid:17) 6(cid:21)

,

−

(cid:16) 𝜎
𝑟

(3.22)

where 𝜖 = 1.0 is the dispersion energy, 𝑅𝑐 = 21/6𝜎 and 𝜎 = 2.415 is the hardcore distance. The

bond interaction is chosen to be the FENE potential

𝑉𝑏 (𝑟) = −0.5𝐾 𝑅2

0 ln

1 −

(cid:34)

(cid:19) 2(cid:35)

,

(cid:18) 𝑟
𝑅0

(3.23)

where stiffness is set to be 𝐾 = 1 and the distance is to be 𝑅0 = 5. The complete system consists of

𝑀 = 300 molecules in a 90 × 90 × 90 domain in the reduced unit. The periodic boundary condition

is imposed along each direction. The system is equilibrated under the Nosé-Hoover thermostat with

temperature 𝑘 𝐵𝑇 = 4.0. In this work, the external force field is imposed on the 𝑥-𝑦 plane and the

temperature is defined by the velocity in the 𝑧 direction. To collect the training samples, we impose

an external force field 𝑓𝑥 along the 𝑥-direction to generate the reverse Poiseuille flow, i.e.,




if 0 < 𝑦 < 45,

𝑓𝑥 (𝑦) =

− 𝑓0

else,

𝑓0

(3.24)

where 𝑓0 = 0.01 is the force magnitude. The full model is simulated for 1 × 106 steps with a time

step d𝑡 = 1 × 10−3 to achieve the steady state, followed by 5 × 105 steps to collect the statistical

samples in Eq (3.9), with 𝑡𝑐 set to be 5. To construct the CG models, we learn 𝑚 = 3 and 𝑚 = 4

CVs from the training set. Throughout the rest of this paper, we will denote the reduced model that

only uses the standard COMs by CGCOM, and the reduced models with 3 and 4 auxiliary CVs (per

each molecule) by CG3 and CG4, respectively. In particular, for the CG3 model, the obtained 3

CG variables are equivalent which uniformly divides the individual molecule into three parts. In

contrast, the obtained CG variables are inequivalent for the CG4 model.

3.3.2 Generalization ability of the CG models

As discussed in Sec. 3.2.1 and Sec. 3.2.2, the generalization ability of the CG models

for the non-equilibrium processes relies on the assumption that the conditional PDF (i.e., ∝

𝛿(𝜙𝑄 (q) − Q) 𝜌𝑒 (q)) used for defining the CG projection operator (3.3) remains nearly the same,

i.e., Eq. (3.6) holds for various external flow conditions. In practice, the direct evaluation of

this high-dimensional PDF becomes computationally intractable. Instead, we relax this metric

by examining the second moments of the atomistic particle distribution conditional to fixed CG

34

particle position in the orthogonal direction defined by q𝐼

variables under various non-equilibrium conditions. Specifically, we examine the variation of the
⊥,W = q𝐼 − W (W 𝑇 W )−1W 𝑇 q𝐼, where
q𝐼 ∈ R𝑁𝑚×3 represents the atomistic positions of the 𝐼-th molecule and W is the weight matrix

constructed by Eq. (3.9). The second-moment covariance matrix is defined by

C(𝑦) = Eneq (cid:104)

⊥,W q𝐼
q𝐼

⊥,W

𝑇

| 𝑦𝐼

𝑐 = 𝑦

(cid:105)

,

where C(𝑦) ∈ R𝑁𝑚×𝑁𝑚 and the conditional expectation Eneq is evaluated over the full molecules

𝐼 = 1, 2, · · · , 𝑀 under the stationary distribution generated by the external force field (3.24). The

condition 𝑦𝐼

𝑐 = 𝑦 restricts the COMs of the sampling molecule along the 𝑦-direction.

In particular, the external force generates the reverse Poiseuille flow along the 𝑥-direction,

where the flow shear rate

𝜕𝑢 𝑥
𝜕𝑦 and the external stress vary along the 𝑦-direction. Specifically,
the shear rate is close to 0 at 𝑦 = 𝐿/4 and 𝑦 = 3𝐿/4, and achieves the largest value at 𝑦 = 0,

𝑦 = 𝐿/2, and 𝑦 = 𝐿. Therefore, the variation of C (𝑦) with respect to 𝑦 provides a computationally

accessible metric for Eq. (3.6). If the conditional PDF is close to the equilibrium distribution, C (𝑦)

should be homogeneous. Conversely, the large variation implies the deviation from the equilibrium

distribution.

Figure 3.2 shows the difference ∥C (𝑦) − C (𝑦′) ∥𝐹 of the various CG models at different locations,

where 𝑦 and 𝑦′ correspond to the horizontal and vertical axis, respectively, and ∥ · ∥𝐹 is the Frobenius

norm. It is clear that when the COMs are the only CG variables, the unresolved (i.e., orthogonal)

degrees of freedom exhibit heterogeneous second moment along the 𝑦-drection, implying pronounced

variation of the conditional PDF under different external fields. On the other hand, the variation

decreases significantly when auxiliary CG variables are introduced into the CG3 and CG4 models,

showing the promise of certain generalization ability for non-equilibrium processes.

The discrepancy in the conditional PDF will further lead to inconsistent reduced modeling

terms under various non-equilibrium processes. To probe this effect, we examine the pairwise

conservative force between the CG coordinates. We emphasize that the CG force generally exhibits

a many-body nature Ge et al. (2023); here we use the projection along the pairwise direction Lei

et al. (2010); Hijón et al. (2010) as a metric to quantify the generalization ability of the free energy

35

Figure 3.2 The Frobenius norm of the second-moment difference ∥C (𝑦) − C (𝑦′) ∥𝐹 of the various
CG models under the steady reverse Poiseuille flow generated by external force 𝑓0 = 0.01 (upper) and
𝑓0 = 0.005 (lower), where 𝑦 and 𝑦′ are represented by the horizontal and vertical axis, respectively.
(a) and (d) CGCOM; (b) and (e) CG3; (c) and (f) CG4.

𝑈 (Q). Specifically, let F 𝐼𝐽

equilibrium state, the pairwise conservative force is defined as 𝐹eq

𝑖 𝑗 denote the force between the CG coordinates Q𝐼
𝑖 𝑗 (𝑄) = Eeq (cid:104)

𝑖 and Q𝐽
(cid:12)
𝑄 𝐼𝐽
· e𝐼𝐽
F 𝐼𝐽
(cid:12)
𝑖 𝑗
𝑖 𝑗
(cid:12)

𝑗 . For the
(cid:105)
𝑖 𝑗 = 𝑄

,

where e𝐼𝐽

𝑖 𝑗 = Q𝐼𝐽

𝑖 𝑗 and Eeq [·] is the expectation over the molecule index 1 ≤ 𝐼, 𝐽 ≤ 𝑀 under
the equilibrium state, and 1 ≤ 𝑖, 𝑗 ≤ 𝑚 is kept to represent the CG coordinate index within

𝑖 𝑗 /𝑄 𝐼𝐽

each molecule. For the non-equilibrium state, the pairwise conservative force is defined as

(cid:105)

, where the condition is taken that both Q𝐼
𝑖

𝐹neq
𝑖 𝑗
and Q𝐽

| 𝑦𝐼

F 𝐼𝐽
𝑖 𝑗

(𝑄, 𝑦) = Eneq (cid:104)
· e𝐼𝐽
𝑖 𝑗 = 𝑄
𝑖 𝑗
𝑗 share a specific value along the 𝑦-direction.
𝑖 𝑗 (𝑄) − 𝐹neq
𝐹eq
𝑖 𝑗

𝑗 = 𝑦, 𝑄 𝐼𝐽

𝑖 = 𝑦𝐽

Figure 3.3 shows the difference

(cid:12)
(cid:12)
(cid:12) for the different CG models. For the
standard CGCOM model where Q is represented by the COMs, the two CG force terms agree well at

(𝑄, 𝑦)

(cid:12)
(cid:12)
(cid:12)

𝑦 = 𝐿/4 = 22.5 and 𝑦 = 3𝐿/4 = 67.5, where the shear rate is close to zero and the atomistic particle

distribution is near equilibrium. In contrast, the two force terms show pronounced differences at

𝑦 = 0, 2/𝐿, and 𝐿 where the shear rate is large and the non-equilibrium distribution deviates from

36

03060900306090(a)0.00.20.40.60.81.0×10303060900306090(b)01530456003060900306090(c)01530456003060900306090(d)0.00.20.40.60.81.0×10303060900306090(e)01530456003060900306090(f)015304560the equilibrium distribution. This is consistent with the specific pattern of the second-moment

difference in Fig. 3.2(a) and (d). The large difference implies the heterogeneity of the conditional

PDF 𝛿(𝜙𝑄 (q) − Q) 𝜌𝑒 (Q) under various external conditions. Such limitation has a clear physical

interpretation: the intra-molecular interactions may significantly affect the visco-elastic response as

well as the collective dynamics, which, however, cannot be captured by the COMs. Fortunately, the

inconsistency can be alleviated by introducing auxiliary CG variables into the reduced model. As

shown in Fig. 3.3, the conservative force difference |𝐹eq

𝑖 𝑗 (𝑄) − 𝐹neq
𝑖 𝑗

(𝑄, 𝑦)| decreases to 𝑂 (0.1) for

the CG3 and CG4 models, showing the improvement in the generalization ability of the free energy

term 𝑈 (Q).

Finally, we study the generalization ability of the memory term K (Q, 𝑡). Similar to U (Q), we

note that the memory term K (Q, 𝑡) generally exhibits a many-body nature Lyu and Lei (2023b);

here we examine the variation of the pairwise fluctuation force 𝛿F 𝐼𝐽

𝑖 𝑗 = F 𝐼𝐽

𝑖 𝑗 − 𝐹𝑖 𝑗 (𝑄 𝐼𝐽

𝑖 𝑗 )e𝐼𝐽

𝑖 𝑗 . It can be

loosely viewed as a metric of K (Q, 𝑡 = 0) = E [𝛿F ⊗ 𝛿F ]. Specifically, we evaluate the variation

of 𝛿F 𝐼𝐽

𝑖 𝑗 under the equilibrium state
𝑖 𝑗 (𝑄) = Eeq (cid:104)
𝐾 eq

and the nonequilibrium state

𝛿F 𝐼𝐽
𝑖 𝑗

𝑇 (cid:16)

I − e𝐼𝐽
𝑖 𝑗

𝑇

e𝐼𝐽
𝑖 𝑗

(cid:17)

𝛿F 𝐼𝐽
𝑖 𝑗

(cid:12)
𝑄 𝐼𝐽
(cid:12)
(cid:12)

𝑖 𝑗 = 𝑄

(cid:105)

,

𝐾 neq
𝑖 𝑗

(𝑄, 𝑦) = Eneq (cid:104)

𝛿F 𝐼𝐽
𝑖 𝑗

𝑇 (cid:16)

I − e𝐼𝐽
𝑖 𝑗

𝑇

e𝐼𝐽
𝑖 𝑗

(cid:17)

𝛿F 𝐼𝐽
𝑖 𝑗

(cid:12)
𝑄 𝐼𝐽
(cid:12)
(cid:12)

𝑖 𝑗 = 𝑄, 𝑦𝐼

𝑖 = 𝑦𝐽

𝑗 = 𝑦

(cid:105)

,

where the expectation is over the molecule index 1 ≤ 𝐼, 𝐽 ≤ 𝑀. Fig. 3.4 shows the difference
|𝐾 n𝑒𝑞
𝑖 𝑗

𝑖 𝑗 (𝑄)| for various CG models. Similar to the conservative CG force, the variation

(𝑄, 𝑦) − 𝐾 e𝑞

of the fluctuation force exhibits pronounced heterogeneity for the CGCOM model and achieves the

largest discrepancy at the locations of large shear rate (𝑦 = 0, 𝐿/2, 𝐿). In contrast, the discrepancy

becomes much smaller for the CG3 and CG4 models with auxiliary CG variables.

3.3.3 Non-equilibrium flows

The inconsistent reduced modeling terms shown in Sec. 3.3.2 reveal a caveat of using the

standard CGCOM model for non-equilibrium processes. We examine this effect by simulating

various non-equilibrium flows.

37

(cid:12)
(cid:12)
(cid:12) for various CG models,
Figure 3.3 The pairwise conservative force difference
which loosely quantifies the generalization ability of the CG free energy 𝑈 (Q). (a) CGCOM; (b)
CG3 with (𝑖, 𝑗) = (1, 1); (c-f) CG4 with (𝑖, 𝑗) = (1, 1), (2, 2), · · · , (4, 4).

(𝑄) − 𝐹eq

𝑖 𝑗 (𝑄, 𝑦)

𝐹neq
𝑖 𝑗

(cid:12)
(cid:12)
(cid:12)

To establish a fair comparison, we construct the many-body form of the free energy 𝑈 (Q) and

the memory term K (Q, 𝑡) with the symmetry-preserving neural network representations presented

in Sec. 3.2.3 and Sec. 3.2.4 for the CGCOM, CG3 and CG4 models, respectively. To verify 𝑈 (Q),

we sample the radial distribution function of the inter-molecular COMs for the CGCOM model and

the intra-molecular CG coordinates for the CG3 and CG4 models. To verify K (Q, 𝑡), we sample the

velocity auto-correlation function of the CG variables. As shown in the Appendix, the predictions

of the three CG models show good agreement with the full MD results, which verify their efficacy

for modeling equilibrium processes.

Next, we consider the non-equilibrium reverse Poiseuille flow generated by an external force

(3.24) with 𝑓0 = 0.01. Fig. 3.5 shows the velocity development 𝑢𝑥 (𝑦, 𝑡). Compared with the full MD

results, the prediction of the standard CGCOM overestimates the instantaneous velocity magnitude

by three times. Furthermore, it can not capture the development oscillation near 𝑡 = 60. This

38

5254565851015202530(a)02468yQ5254565851015202530(b)0.00.51.01.52.0×10−1yQ5254565851015202530(c)0.00.51.01.52.0×10−1yQ5254565851015202530(d)0.00.51.01.52.0×10−1yQ5254565851015202530(e)0.00.51.01.52.0×10−1yQ5254565851015202530(f)0.00.51.01.52.0×10−1yQFigure 3.4 The difference of the variation of the fluctuation force |𝐾𝑖 𝑗
eq(𝑄, 𝑦)| for different
CG models, which loosely quantifies the generalization ability of the CG memory term K (Q, 𝑡) at
𝑡 = 0. The sub-figure labels are the same as Fig. 3.3.

neq(𝑄) − 𝐾𝑖 𝑗

Figure 3.5 The development of the reverse Poiseuille flow under the external force 𝑓0 = 0.01
predicted by the full MD and three CG models.

39

52545658511162126(a)0.00.51.01.52.02.53.0×102yQ52545658515202530(b)0510152025yQ52545658515202530(c)0510152025yQ52545658515202530(d)0510152025yQ52545658515202530(e)0510152025yQ52545658515202530(f)0510152025yQ0306090MDCGCOM010020003060CG3100200CG4−2−1012×10−1ytFigure 3.6 The steady state reverse Poiseuille flow under external force 𝑓0 = 0.01 predicted by the
full MD and the CG models. (a) The stationary velocity profile 𝑢𝑥 (𝑦) (b) The shear-rate-dependent
viscosity.

limitation is rooted in the choice of the COMs as the CG variables, which ignores the intra-molecular

interactions and therefore can not capture the complex visco-elastic responses under non-equilibrium

processes. The prediction of the CG3 model shows significant improvement to the CGCOM model

but overestimates the magnitude of the velocity oscillation during the flow development stage. On

the other hand, the prediction of the CG4 model shows a good agreement with the full MD results

throughout the development stage.

Fig. 3.6 shows the steady state velocity profile 𝑢𝑥 (𝑦) predicted by the different models. Similar

to the development stage, the prediction of the CG4 model shows a good agreement with the MD

results. However, the prediction of the CGCOM and the CG3 models show apparent discrepancies

due to their inefficacy in modeling the intra-molecular interactions. To quantify the difference, we

compute the shear-rate-dependent viscosity from the steady velocity profile. The predictions of the

CGCOM and the CG3 model are nearly independent of the shear rate, which indicates that they can

not capture the visco-elastic responses associated with the molecular conformation change under

external shear flow. In contrast, the CG4 model can faithfully reproduce the shear-rate-dependent

viscosity of the full MD results.

Finally, we examine the generalization ability of the CG models for other flow fields. While

40

10203040y0.000.050.100.150.20VelocityProﬁleMDCGCOMCG3CG4(a)0.01.53.04.5ShearRate×10−31020304050ShearViscosity(b)Figure 3.7 The velocity field of a 2D Taylor vortex predicted by the full MD and the CG models.
(a-b) The contour of the instantaneous velocity 𝑢𝑦 (𝑥, 𝑦, 𝑡) at 𝑡 = 50 (a) and 𝑡 = 100 (b). (c) The
development of velocity 𝑢𝑦 (𝑥 = 25, 𝑦, 𝑡) (d) The steady velocity field 𝑢𝑦 (𝑥, 𝑦).

41

0306090MDCGCOM0306003060CG30306090CG4−4−2024×10−2yx(a)0306090MDCGCOM0306003060CG30306090CG4−4−2024×10−2yx(b)0306090MDCGCOM010003060CG30100200CG4−8−4048×10−2yt(c)0306090MDCGCOM0306003060CG30306090CG4−2−1012×10−2yx(d)the training samples of the CG models are collected in the reverse Poiseuille flow, we validate the

constructed models with the Taylor vortex generated by the external force field

𝑓𝑥 (𝑥, 𝑦) = − 𝑓0 sin

(cid:19)

(cid:18) 2𝜋𝑥
𝐿

cos

(cid:19)

(cid:18) 2𝜋𝑦
𝐿

,

𝑓𝑦 (𝑥, 𝑦) = 𝑓0 cos

(cid:19)

(cid:18) 2𝜋𝑥
𝐿

sin

(cid:19)

(cid:18) 2𝜋𝑦
𝐿

,

where 𝑓0 = 1 × 10−2 and 𝐿 = 90. Fig. 3.7 shows the prediction of the 2D velocity contour from

various models. Similar to the previous example, the CGCOM and CG3 models overestimate the

magnitude of the velocity field due to their insufficiency in modeling the intra-molecular interactions

arising from the molecular conformation change under the external flow field. In contrast, the

prediction of the CG4 model yields show good agreement with the full MD model.

Although the CG4 model performs well in the vortex flow, we should not view it as the “correct”

(as opposed to the CGCOM) model for all non-equilibrium processes. Our main point is that the

conditional PDF associated with the CG projection needs to be consistent to ensure the model’s

generalization ability, which, unfortunately, can not be guaranteed for non-equilibrium processes.

This issue may severely limit a CG model’s applicability to practical problems. On the other

hand, properly introducing auxiliary CG variables may mitigate this inconsistency, and significantly

improve the model’s generalization ability.

3.4 Conclusion

This work presents a caveat in constructing reliable CG molecular dynamics models for

non-equilibrium processes. Specifically, a CG model’s generalization ability relies on its consistency

in the conditional PDF of the phase space vector (or equivalently, the CG projector operator) under

various non-equilibrium conditions. This criterion is determined by the proper choice of the CG

variables a priori and can not be remedied by constructing more accurate reduced modeling (the

free energy and the memory) terms. Although the Zwanzig’s projection formalism can in principle

provide the rigorous reduced dynamics, it is valid only when the distribution of the CG variables

is consistent with the one associated with the CG projection operator. Unfortunately, this metric

seems to be broadly overlooked in most existing studies on CGMD modeling. While the previous

works (e.g. Refs. Lei et al. (2010); Hijón et al. (2010)) use dynamic properties such as the velocity

auto-correlation functions and the mean squared displacement to validate the CG models, these

42

quantities are essentially defined under the marginal density distribution near-equilibrium; their

applications to non-equilibrium processes are generally un-warranted.

To alleviate the above challenge, this work proposes to learn a set of auxiliary CG variables based

on the TICA to minimize the entropy contribution of the unresolved variables so that the conditional

PDF of the full phase space vector under various conditions approaches the equilibrium case (i.e.,

Eq. (3.6)). We verify this generalization metric by examining the distribution of the second moment,

the fluctuation force, and its variation under various external conditions. We show that the common

CG model that uses the molecule’s COMs as the CG variables is generally insufficient to retain

the consistent conditional PDF. In contrast, the present models with auxiliary CG variables show

significant improvement. Furthermore, the crucial role of this metric is reflected in modeling the

non-equilibrium reverse Poiseuille flow and the Taylor vortex. In particular, the prediction of the

standard CGCOM model exhibits large discrepancies from the full MD results due to its inefficacy

in modeling the intramolecular interactions under external flows different from the equilibrium

conditions. Conversely, the present model can faithfully recover the complex visco-elastic responses

and therefore yields consistent predictions with the full MD results.

Finally, we note that the learning of the auxiliary CG variables in the present study remains

somewhat empirical. They are constructed based on the TICA and hence take a linear form of the

full atomistic coordinates. In practice, we may learn the CG variables as nonlinear encoders of the

molecular conformation to achieve a more efficient representation of the intra-molecular interactions

(e.g., see Ref. Fang et al. (2022)). We leave this for future study.

43

CHAPTER 4

CONSENSUS-BASED ENHANCED SAMPLING FOR HIGH-DIMENSIONAL FREE
ENERGY SURFACES

4.1 Challenges in Constructing High-Dimensional Free Energy Surfaces

Molecular dynamics (MD) provides an essential tool Frenkel and Smit (2023) to access the

micro-scale insights of complex processes in many scientific applications, including chemical

engineering, material synthesis, and drug design. To further predict the collective behaviors, it is

often desirable to construct the free energy surfaces (FESs) with respect to a set of functional-relevant

collective variables (CVs). However, the accurate construction of FES remains a practical challenge

due to the high dimensionality and the prevalence of energy barriers. Several ingenious methods

based on importance sampling have been developed, such as umbrella sampling Torrie and Valleau

(1977b), histogram reweighting Kumar et al. (1992), metadynamics Laio and Parrinello (2002);

Barducci et al. (2008); Grafke and Laio (2024), variational enhanced sampling Valsson and Parrinello

(2014); Shaffer et al. (2016); Bonati et al. (2019), weighted ensemble and its variants Zhang et al.

(2007); Ahn et al. (2017, 2021), hyperdynamics Voter (1997), transition matrix based unbiasing

method Wu and Noé (2014), orthogonal space random walk Zheng et al. (2008) and adaptive

biasing force Darve and Pohorille (2001); Darve et al. (2008); Lelièvre et al. (2008); Chipot and

Lelièvre (2011). Despite the efficacy in enhancing the phase space exploration, these methods

are based on iterative estimation of the biased probability density functions (PDFs) (or biased

force), where the efficiency generally diminishes as the number of CVs increases. Strategies such

as bias-exchange Piana and Laio (2007) and parallel bias metadynamics Pfaendtner and Bonomi

(2015); Prakash et al. (2018) have been developed to alleviate this issue. Both methods factorize

the biasing potential as a product of several low-dimensional functions and can work effectively

when the underlying FESs admit such structures which, however, may not always be the case for

high-dimensional problems. Several other methods Sugita and Okamoto (1999); Marinari and

Parisi (1992); Gao (2008); Yang and Qin Gao (2009); Kim et al. (2006); Hamelberg et al. (2004);

Miao et al. (2015); Lu and Vanden-Eijnden (2013); Yu et al. (2016) achieve enhanced sampling

44

irrespective of pre-defined CVs (see review Yang et al. (2019)). However, the construction of explicit

high-dimensional FESs remains a challenging problem.

Alternative to the histogram methods, the temperature-accelerated molecular dynamics Maragliano

and Vanden-Eijnden (2006) and the adiabatic free energy dynamics Rosso et al. (2002); Abrams

and Tuckerman (2008) are developed by introducing an extended dynamics of the CVs with an

artificially high temperature to overcome the energy barriers. The methods avoid dealing with

the biased potentials and have shown the promise of efficient exploration of the high-dimensional

CV space Abrams and Vanden-Eijnden (2010). On the other hand, to further construct the FES,

the sampling points are deposited in a greedy way Maragliano and Vanden-Eijnden (2008); the

effectiveness for high-dimensional problems remains under-explored.

With the recent advancements in machine learning, kernel methods and deep neural networks

(DNNs) have demonstrated promising results in representing multi-dimensional FES Stecher et al.

(2014b); Mones et al. (2016); Schneider et al. (2017); Bonati et al. (2019); Sidky and Whitmer (2018).

However, as also noted in Ref. Cendagorta et al. (2020), one essential problem for constructing

high-dimensional FES lies in how to optimize the training set. Ideally, to efficiently approximate a

high-dimensional function, the sampling should enable certain adaptivity based on the posterior error

so that the training set can be updated according to the particular underlying FES. However, such

posterior error-based sampling remains largely open for FES construction since the approximation

residual can neither be easily queried nor sampled within the full phase space. In practice, the

training samples are often collected either as pre-defined grid points or in a heuristic manner. As the

number of CVs increases, such choices may induce pronounced approximation error and suffer from

performance degradation, since most phase space regimes essentially become thermodynamically

inaccessible. Remarkably, the DeepVes Bonati et al. (2019) enables the efficient sampling of the CV

space by jointly constructing the bias potential and the target distribution. On the other hand, the

accurate reconstruction of explicit FES further requires the estimation of high-dimensional PDFs

from the samples. Recent efforts Noé et al. (2019); Gabrié et al. (2022) seek the direct approximation

of the transition operator instead of estimating PDFs using generative models, when representative

45

configurations are known a priori. In addition, the reinforced dynamics Zhang et al. (2018d); Wang

et al. (2022) (see also Ref. van der Oord et al. (2023)) proposes an efficient approach to impose

adaptivity by using an uncertainty indicator as an indirect measure of the construction error to bias

MD simulations, which relies on calculating the standard deviation of the predictions from multiple

DNNs trained on the same dataset.

In this work, we present a consensus-based enhanced sampling (CES) method to efficiently

construct high-dimensional FES for complex MD systems. A unique feature is that the method

enables adaptive posterior error-based sampling such that the FES approximation and the sample

points can be simultaneously optimized. Also, unlike the reinforced dynamics, the method does

not rely on the training of multiple DNN representations. The main idea is to formulate the

construction as a min-max problem, where the max-problem seeks a residue-based distribution to

establish adaptive sampling in the vicinity of the explored phase space regime, and the min-problem

optimizes the DNN parameters for the FES representation. Iteratively optimizing both the training

set and the DNN representation achieves an adversarial construction of the FES pertaining to the

kinetic transition regime. For the maximization step, it is worth mentioning that the query of the

thermodynamically accessible phase space is non-trivial. In particular, existing approaches Tang

et al. (2023); Bao et al. (2020) in the spirit of adversarial generative models can not be directly

applied, since the global residual error is unknown a prior. Instead, we establish a consensus-based

sampling Carrillo et al. (2022) (see also Refs. Pinnau et al. (2017); Carrillo et al. (2018, 2021);

Chen et al. (2022) ) in the form of a stochastic interacting particle system governed by a McKean

stochastic differential equation. A quadratic potential is adaptively constructed to probe the local

maximum error regime by exploiting the Laplace approximation under a low-temperature limit.

Meanwhile, a coherent noise term is introduced to efficiently explore the full CV space under a

high-temperature limit and yield the target sampling points for the maximization problem, by which

the minimization of the DNN residue can be solved jointly. We demonstrate the effectiveness of the

proposed method by constructing the FES of several biomolecule systems with the number of CVs

up to 30. Fig. 4.1 sketches the workflow of the proposed method.

46

Figure 4.1 The workflow of the present CES-based method. In the minimization step, given a
collection of sampling points, the reference force (i.e., the gradient of the underlying FES) can
be calculated using the restrained dynamics; a comparison with the force inferred from the DNN
approximation yields a posterior error. In the maximization step, the posterior error determines a
residue-based distribution with entropy regularization. An interacting particle system following
McKean stochastic dynamics is used to achieve an adaptive sampling of the max-residue regime by
modulating the exploitation of the Laplace approximation of the current residue-based distribution
and the exploration of the uncharted phase space. The FES can be accurately reconstructed after
several iterations of the minimization and maximization step.

4.2 Consensus-Based Sampling Methodology

4.2.1 Free energy and mean forces

We consider a full model with micro-scale coordinates r ∈ R𝑁 whose dynamics is governed by

potential 𝑈 (r) : R𝑁 → R under temperature 𝑇. Suppose we are interested in CVs s(r) : R𝑁 → Γ

with Γ ⊂ R𝑀, the FES 𝐴(z) of the CVs is defined by

𝐴(z) = −

1
𝛽 ln 𝜌(z),

where 𝛽 = 1/𝑘 𝐵𝑇 is the inverse of the thermal temperature,

𝜌(z) =

∫

1
𝑍

exp (−𝛽𝑈 (r))𝛿(s(r) − z)dr

(4.1)

(4.2)

is the marginal density for s(r) = z, 𝑍 = ∫ exp (−𝛽𝑈 (r))dr is the partition function, and 𝛿(·)

represents the Dirac delta function, which we refer readers to Stoltz et al. (2010) for more detailed

47

MinimizationstageMaximizationstageReferenceFESInferredDNNPosteriorErrorQuadraticApproximationEnhanceSamplingexplaination. For high-dimensional CVs, the direct estimation of 𝜌(z) often becomes numerically

challenging. An alternative approach is to fit the mean force F(z) := −∇𝐴(z) at various sample

points, which can be estimated via the restrained dynamics Allen and Tildesley (2017) by introducing

a harmonic term into the full potential, i.e.,

𝑈𝑘 (r, z) = 𝑈 (r) +

𝑘

2

(s(r) − z)𝑇 (s(r) − z),

(4.3)

where 𝑘 represents the magnitude of the restrained potential. As shown in Ref Maragliano and

Vanden-Eijnden (2006), the mean force can be computed by F(z) = lim𝑘→∞ F𝑘 (z), where F𝑘 (z) is

defined by

F𝑘 (z) =

∫

1
𝑍𝑘 (z)

𝑘 (s(r) − z) exp (−𝛽𝑈𝑘 (r, z))dr,

(4.4)

and can be sampled as the local first-moment estimation.

In principle, given a collection of sample points z, 𝐴(z) can be re-constructed (up to a constant)

by matching the mean force −∇𝐴(z) at the individual points. However, for complex MD systems,

the sampling over the phase space could be highly non-trivial due to the prevalence of local

energy minima; the training set is often determined a prior as pre-selected points or in a greedy

manner. As the number of CVs increases, the empirical random samples may introduce pronounced

discretization error of the loss function over the full phase space. To efficiently construct 𝐴(z) in the

thermodynamically accessible regime, it is desirable to simultaneously optimize the training set

and the FES approximation through certain adaptive sampling based on the posterior residual error.

This motivates the present method illustrated below.

4.2.2 Min-Max formulation

Let 𝐴N (z) denote the DNN representation of the FES 𝐴(z), which is parameterized by minimizing

the loss function

LN (z) = ∥∇z 𝐴N (z) + F(z)) ∥2

(4.5)

for z ∈ Γ. To solve the problem, we introduce a sampling distribution 𝑞(z) and define the weighted

loss

(LN , 𝑞) =

∫

Γ

LN (z)𝑞(z)dz.

48

(4.6)

A desired distribution intends to maximize the discrepancy in the dataset for a given network 𝐴N (z).

Accordingly, we define the maximum problem as

𝐺 [LN ] = max

𝑞

(LN , 𝑞).

(4.7)

Since (LN , 𝑞) is always non-negative, a good approximation of the original free energy surface

𝐴N (z) (up to a constant) can be obtained by solving the following problems

min
𝐴N

max
𝑞

(LN , 𝑞).

(4.8)

Proposition 4.2.1. Assuming that there exists a solution of LN (z) = 0 for z ∈ Γ, then 𝐴∗ is a

solution if and only if it solves (4.8).

Proof. Suppose 𝐴∗ is the solution for LN (z) = 0, it satisfies (LN ∗, 𝑞) = 0 for any 𝑞 ∈ 𝑊, i.e.,

𝐺 [LN ∗] = 0. Therefore, 𝐴∗ is a solution for the minimax problem (4.8). On the other hand, if ˆ𝐴 is

the minimizer for problem (4.8) but not the solution for LN (z) = 0, then there exists ˆ𝑞 ∈ 𝑉 such

that (L ˆN

, ˆ𝑞) > 0. However, (LN ∗, 𝑞) = 0 for all 𝑞, which contradicts the assumption that ˆ𝐴 is the

minimizer.

□

Proposition 4.2.1 shows that the direct construction of FES 𝐴(z) can be reformulated as an

adversarial learning of the optimal solution 𝐴N (z) for the min-max problem (4.8). Accordingly,

the training consists of two components: the minimization part optimizes the DNN representation

with the current training set; the maximization part explores the regime of the largest residue for

the current DNN representation and essentially establishes an adaptive sampling of the training set

based on the posterior error.

To numerically solve the max-problem, certain regularization needs to be introduced. Otherwise,

the max-problem will simply yield a delta measure, i.e., 𝛿(z − z∗), where z∗ = arg max LN (z). Since

the sampling needs to simultaneously identify the max-residual regime and explore the uncharted

phase space, we introduce the entropy-based regularization Wang et al. (2020a); Gao et al. (2022)

(see also Refs. Gulrajani et al. (2017); Miyato et al. (2018); Tang et al. (2023) for gradient-based

49

regularization), and the max-problem is reformulated by

∫

min
𝑞

(−LN (z) + 𝜅−1

ℎ ln 𝑞(z))𝑞(z)dz.

(4.9)

The problem is convex for a PDF 𝑞 with a unique global minimum at 𝑞∗(z) = exp(−𝜅ℎL−
where 𝑍 ∗ = ∫ exp(−𝜅ℎL−

N (z))/𝑍 ∗,
N (z) = −LN (z). The parameter 𝜅ℎ is a Lagrangian

N (z))dz and L−

multiplier, balancing the focus between the peak concentration and the scope of exploration, and is

somewhat similar to the inverse temperature in statistical mechanics. An elevated 𝜅−1
ℎ

induces a

distribution closer to a uniform distribution. Conversely, a diminished 𝜅−1
ℎ

induces a concentrated

distribution near the max-residue point.

However the analytical formula of 𝑞∗(z) futher depends on the accurate FES, the direct sampling

remains challenging for high-dimensional CVs. Inspired by the consensus-based sampling method

Carrillo et al. (2022), we establish a stochastic particle system governed by the McKean stochastic

differential equation. By properly constructing the conservative potential and noise terms, the

particle distribution provides a good approximation of 𝑞∗ as illustrated below.

4.2.3 Exploitation and exploration in the max-problem

To approximate the target distribution 𝑞∗(z), particularly in the vicinity of the max-residual

point z∗, we exploit Laplace’s principle in the large deviations theory, i.e.,

(cid:18)∫

(cid:18)

−

1
𝜅 log

lim
𝜅→∞

exp (−𝜅 𝑓 (z))d𝜌∗(z)

(cid:19)(cid:19)

= 𝑓 (z∗)

(4.10)

holds true for any compactly supported probability measure 𝜌∗, where z∗ ∈ supp(𝜌∗) uniquely

minimizes the function 𝑓 . This enables us to identify the max-residual point from a collection of
samples (cid:8)z𝑖(cid:9) 𝑁𝑤

𝑖=1 by the first-order momentum m under the weighted density function 𝑝(z), i.e.,

∫

m =

z𝑝(z)dz ≈

𝑁𝑤
∑︁

𝑖=1

z𝑖 ˆ𝑝(z𝑖)

ˆ𝑝(z) =

N (z))

exp (−𝜅𝑙 L−
𝑖=1 exp (−𝜅𝑙 L−

N (z𝑖))

(cid:205)𝑁𝑤

(4.11)

where 𝜅−1

𝑙

represents a low temperature limit. However, the integration is subject to the so-called

curse of dimensionality as the number of CVs increases. Instead, we treat sampler z𝑖 as a random

walker z𝑖

𝑡 governed by the following McKean stochastic differential equation

(cid:164)z𝑖
𝑡 = −

1
𝛾

∇z𝐺 (z𝑖

𝑡) +

√︄

2
𝜅ℎ𝛾

𝜉𝑖 (𝑡),

50

(4.12)

where 𝐺 (z𝑡) = 1

2 (z𝑡 − m𝑡)𝑇𝑉 −1

𝑡

(z𝑡 − m𝑡) denotes an adaptively constructed conservative potential

function. The formulations of m𝑡, 𝑉𝑡 are specified in (4.13) with the rationale discussed in the next

section. Consequently, 𝐺 (z𝑡) navigates the random walkers (i.e., individual particles) towards m𝑡,

which represents the region of large residual error. The second term in Eq. (4.12) represents a

stochastic term where 𝛾 represents the friction coefficient and 𝜉 (𝑡) represents the standard Gaussian

white noise characterized by zero mean and covariance E[𝜉𝑖 (𝑡)𝜉 𝑗 (𝑡′)] = 𝛿𝑖 𝑗 𝛿(𝑡 − 𝑡′).

The coupling of the conservative and stochastic terms maintains a relatively high temperature

ℎ , and a large friction coefficient 𝛾 is applied such that the distribution is almost always Gaussian
𝜅−1
during the evolution. As shown in the following section, the distribution of walkers 𝑞𝑡 (z) converges

to ∝ exp(−𝜅ℎ𝐺 (z)) characterized by m∞ and 𝑉∞. Accordingly, the balance between exploitation

and exploration is controlled using two temperatures 𝜅−1

𝑙

and 𝜅−1

ℎ . As 𝜅−1

𝑙 decreases, the random

walker distribution concentrates near the max-residual points of the current model, reflecting the role

of exploitation. Conversely, as 𝜅−1
ℎ

increases, the distribution smoothens progressively, enhancing

the exploration of the uncharted regions.

4.2.4 Adaptive parameter estimation and FES construction

In this section, we show that the sampling distribution governed by Eq. (4.12) converges to a

steady state as an approximation of 𝑞∗ with the proper choices of m𝑡 and 𝑉𝑡 given by

m𝑡 =

𝑁𝑤
∑︁

𝑖=1

𝑡 ˆ𝑝(z𝑖
z𝑖

𝑡),

𝑉𝑡 = 𝜅𝑡

𝑁𝑤
∑︁

𝑖=1

(z𝑖

𝑡 − m𝑡) (z𝑖

𝑡 − m𝑡)𝑇 ˆ𝑝(z𝑖

𝑡),

(4.13)

where ˆ𝑝(z) is defined by (4.11). In particular, with 𝜅𝑡 = 𝜅𝑙 + 𝜅ℎ, (4.12) converges to the distribution

based on the target residues.

Proposition 4.2.2. Suppose L−
𝜇), 𝑞𝑡 → exp (−𝜅ℎL −
∫ exp (−𝜅ℎL −

N (z))
N (z))dz

as 𝑡 → ∞, given 𝜅𝑡 = 𝜅𝑙 + 𝜅ℎ.

N (z) takes a local quadratic approximation in form of 1

2 (z−𝜇)𝑇 Σ−1(z−

We refer to SI for the proof. The stochastic dynamics (4.12) and (4.13) differs from the regular

Langevin dynamics; it is nonlocal and similar to the one in Carrillo et al. (2022) except that the

51

parameter 𝜅ℎ appears in the target distribution that modulates the exploration of the sampling

dynamics. The quadratic assumption of the loss function L−

N (z) is due to the fact that we are mainly

interested in the regime near the max-residue point. Under the low-temperature limit, the local

regime can be well characterized by the first and second moments following Laplace’s principle.

While m identifies the extremal point, 𝑉 recognizes the anisotropic nature among the different CVs.

In this study, for the sake of computational efficiency, we further simplify 𝑉 by only considering the

diagonal entries denoted as v. Moreover, we utilize a moving average to ensure stable estimation

m𝑡+1 = 𝛽1m𝑡 + (1 − 𝛽1)

𝑁𝑤
∑︁

𝑡 ˆ𝑝(z𝑖
z𝑖

𝑡),

v𝑡+1 = 𝛽2v𝑡 + (1 − 𝛽2)𝜅𝑡

𝑖=1
𝑁𝑤
∑︁

(z𝑖

𝑡 − m) ⊙ (z𝑖

𝑡 − m) ˆ𝑝(z𝑖

𝑡),

(4.14)

𝑖=1
where 𝛽1 and 𝛽2 are hyper-parameters. To maintain unbiased estimation, normalization is
implemented as follows: m = m𝑡
1−𝛽𝑡
1

and v = v𝑡
1−𝛽𝑡
2

.

With the training samples obtained from the aforementioned maximization step, the DNN

representation of the FES is optimized using the Adam stochastic gradient descent method Kingma

and Ba (2015) for the minimization step. The loss function of the updated DNN representation

𝐴N (z), in turn, navigates the consensus-based adaptive sampling for the updated maximization step.

The min-max problem is solved iteratively to achieve comprehensive sampling of the full phase

space.

So far, the construction process is based on the function approximation of 𝐴(z) over the full

regime. However, we note that the kinetic processes of a molecular system are generally characterized

by the local minima and saddles points on the FES. On the other hand, the regimes of high free

energy are less relevant. To accurately construct these thermodynamically accessible regimes, we

modify the loss function L as

LN (z) =

∥∇z 𝐴N (z) + F(z) ∥2
∥F(z) ∥2 + 𝑒

,

(4.15)

for all the biomolecule systems except the toy example of the 1D Rastrigin function. 𝑒 is a small value

to regularize the denominator. We note that a similar formulation is used in Ref. Maragliano and

52

Vanden-Eijnden (2008) to quantify the model accuracy after the FES is constructed. Alternatively,

in this study, LN (z) directly involves the construction process for the adaptive sampling of the

training set. A detailed algorithm is presented in Algorithm 4.1.

Algorithm 4.1 Consensus-based enhance sampling.
Require: Initial sampling point z𝑖
0, for 𝑖 = 1, . . . , 𝑁𝑤
Require: Initial DNN parameter 𝜃0
Require: The number of training iterations 𝑁𝑡𝑟𝑎𝑖𝑛
Require: The number of data collected 𝑁𝑑𝑎𝑡𝑎 in each training iteration

𝑗 ← 0, 𝑡 ← 0
𝑇 ← ⌈ 𝑁𝑑𝑎𝑡 𝑎
⌉
𝑁𝑤
while 𝑗 < 𝑁𝑡𝑟𝑎𝑖𝑛 do
while 𝑡 ≤ 𝑇 do

𝑡) = ∇z 𝐴N (z𝑖

𝑡; 𝜃 𝑗 )

𝑡 at z𝑖
𝑡

calculate the mean force F𝑖
calculate the predicted force F𝜃 (z𝑖
𝐿𝑖 ← LN (z𝑖
𝑡)
𝑤𝑖 ← exp (𝜅𝑙 𝐿𝑖)
(cid:205)𝑖 exp (𝜅𝑙 𝐿𝑖)
m𝑡+1 ← 𝛽1m𝑡 + (1 − 𝛽1) (cid:205)𝑖 z𝑖
v𝑡+1 ← 𝛽2v𝑡 + (𝜅𝑙 + 𝜅ℎ)(1 − 𝛽2) (cid:205)𝑖 (z𝑖
m ← m𝑡+1
1−𝛽𝑡
1
v ← v𝑡+1
1−𝛽𝑡
2
z𝑖
← z𝑖
𝑡+1
𝑡 ← 𝑡 + 1

𝑡 − m) ÷ v +

√︃ 2𝛼
𝛾𝜅ℎ

𝑡 − 𝛼

𝛾 (z𝑖

𝑡𝑤𝑖

𝑡 − m𝑡)2𝑤𝑖

𝜂𝑡, 𝜂𝑡 ∼ N (0, 1)

end while
Save the training dataset D 𝑗 = {z𝑖
Optimize 𝜃 𝑗+1 using the generated training set D𝑙 for 𝑙 = 0, . . . , 𝑗.
𝑗 ← 𝑗 + 1

𝑡 }𝑇
𝑡=0

𝑡, F𝑖

end while

4.3 Application of Consensus-Based Sampling to Biomolecular Systems

4.3.1 One-dimensional Rastrigin function

To illustrate the essential idea of the present method, we start with the 1-dimensional Rastrigin

function:

𝑓 (𝑥) = 𝑥2 − cos(2𝜋𝑥),

𝑥 ∈ [−3, 3].

(4.16)

Instead of a neural network, we simply use a piecewise polynomial function 𝑓𝜃 (𝑥) for this 1D

53

Figure 4.2 Adaptive sampling and construction of the 1D Rastrigin function. Left: The reference
function 𝑓 (𝑥) and the constructed approximations 𝑓𝜃𝑖 (𝑥) obtained at different iteration steps. The
relative 𝑙2 error is less than 6 × 10−3 after 12 iterations. Right: The residual function | 𝑓 (𝑥) − 𝑓𝜃𝑖 (𝑥)|.
The symbols represent the locations identified by each adaptive sampling (i.e., the maximization)
step where new points will be added for the next training (i.e., the minimization) step.

problem, where 𝜃 represents the fitting parameters. Accordingly, the residual is directly defined

as | 𝑓 − 𝑓𝜃𝑖 |. Initially, we set 𝑓𝜃0 (𝑥) ≡ 8 with consistent boundary condition. We use the proposed
sampling method with 10 walkers to estimate the first and second moments, yielding 𝑚1 = 1 × 10−4

and 𝑉1 = 0.022 and therefore the first and second derivative 𝑓 ′(𝑚1) = 0 and 𝑓 ′′(𝑚1) = 1
𝑉1

= 40.97.

This value is extremely close to the actual minimizer at 0 (i.e., the max-residue point) and the second

derivative 𝑓 ′′(𝑚1) = 41.47. Accordingly, we add new data points near 𝑥 = 1 × 10−4 into the training

set consisting of the boundary points 𝑓 (−3) = 𝑓 (3) = 8, which enables us to construct an updated

approximation 𝑓𝜃1 (𝑥). Similarly, with the approximation 𝑓𝜃𝑖−1 (𝑥), we conduct another sampling
process (i.e., the maximization step) and include the obtained samples near 𝑚𝑖 into the training set,

yielding an updated approximation 𝑓𝜃𝑖 (𝑥). As shown in Fig. 4.2, for each iteration, the sampling

step can pinpoint the regime where the error is most pronounced. Furthermore, as shown in the

Supplementary Information (SI), the second momentum near the maximum region can be estimated

correctly as well. The underlying function 𝑓 (𝑥) can be fully recovered after 12 iterations.

54

2.50.02.501020302.50.02.53020100013124.3.2 Two-dimensional FES

We use the alanine dipeptide (Ace-Ala-Nme), referred to as Ala2, as a benchmark problem. The

molecule is solvated in 383 TIP3P water molecules similar to Ref. Zhang et al. (2018d). The full

MD system is simulated in a canonical ensemble under temperature 300𝐾 using the Amber99-SB

force field Hornak et al. (2006) with a time step is 2 fs. We refer to the SI for the simulation details.

We choose the CVs as the torsion angles: 𝜙 (C, N, C𝛼, C) and 𝜓 (N, C𝛼, C, N). For comparison,

we also construct the FESs using the VES Bonati et al. (2019) and RiD Zhang et al. (2018d)

method with the same setup parameters presented therein. For the CES method, we use 10 walkers

to explore the configuration space. The initial points of the walkers at the 𝑘th iteration are chosen to

be the final 10 points of a biased dynamics with the bias potential −𝐴N (z; 𝜃 𝑘−1). In the sampling

stage, the inverse of the low and high temperatures are set to be 𝜅𝑙 = 10 and 𝜅ℎ = 1, respectively. For

each sample point, restrained dynamics is conducted with 𝜅 = 500 for 5000 steps to compute the

average force. The timestep 𝛼 for the dynamics of the random walkers (4.12) is set to be 0.1. Fig.

4.3 shows the FESs constructed by the three different methods and the reference obtained by the

metadynamics Laio and Parrinello (2002) using a long simulation time. As presented in Fig. 4.3e,

the CES method yields smaller approximation error and meanwhile requires lower computational

cost. The better performance is not unexpected since the sample points can be adaptively optimized

based on the construction error.

It is worth mentioning that the VES method was initially developed to achieve efficient enhanced

sampling in the high-dimensional phase space; it is somewhat unfair to use it for direct FES

construction since the estimation of the probability density function from the obtained samples

could be numerically challenging for high-dimensional cases. On the other hand, the present CES

method enables the explicit construction of FES if the analytical form is needed. Also, we note that

the accuracy of the RiD method can be further improved using more iterations at the cost of larger

simulation and training overhead. Below, we test the methods in more complex systems.

55

Table 4.1 The accuracy of the constructed 2D FES (the Ala2 molecule) and computational time (in
hours, the same below) for the VES, RiD, and CES methods. The 𝑙2 and 𝑙∞ error are computed up
to 40 KJ/mol. The reference solution is constructed by the metadynamics. The simulation time of
the CES method is multiplied by 10 since 10 walkers are used.

Method

VES
RiD

CES

Accuracy

𝑙2 error
5.39
3.15

𝑙∞ error Simulation
21.03
11.04

17.98

Time

47.5

Train

1.88

10.68

0.23 × 10

0.22 (GPU)
0.18 (CPU)
0.13 (GPU)

Figure 4.3 The 2D FES for the Ala2 molecule on the 𝜙 − 𝜓 plane constructed by (a) Metadynamics
(reference) (b) VES (c) RiD (d) CES (the present method). The accuracy and the computational
cost are shown in Table 4.1.

4.3.3 Two-dimensional FES

4.3.4 Three-dimensional FES

Next, we consider a s-(1)-phenylethyl (s1pe) peptoid in an aqueous environment similar to Refs.

Weiser and Santiso (2019); Wang et al. (2020a). The full system consists of one biomolecule and

546 water molecules in a (2.9n𝑚)3 dodecahedron box. The CHARMM general force field (CGenFF)

Weiser and Santiso (2019) is used for the biomolecule and the TIP3P model Jorgensen et al. (1983)

is used for the water molecules. The system temperature is set to be 298 K with a time step is 2 fs.

56

−202(a)(b)−202−202(c)−202(d)010203040We refer to the SI for details.

The CVs are the three torsion angles 𝜔 spanned by atoms (C𝛼, C, N, C𝛼), 𝜙, and 𝜓, where the

latter two are the same as the Ala2 molecule. The FES is constructed by both the RiD and CES

methods. The setup of the RiD method is the same as Wang et al. (2020a). For the CES method,

we use 20 walkers and set the inverse of the low and high temperatures to be 𝜅𝑙 = 10 and 𝜅ℎ = 2,

respectively. The initial points of these walkers at each iteration are chosen in the same method as

the two-dimensional problem. The timestep 𝛼 for the dynamics of the random walkers (4.12) is set

to be 0.1. For each sample point, restrained dynamics with 𝜅 = 500 is conducted for 10000 steps to

compute the mean force.

For visualization, we project the constructed FES onto a two-dimensional plane and fix the

third variable. Fig. 4.4 shows the projected FES on the 𝜔 − 𝜙 and 𝜙 − 𝜓 plane obtained from the

CES and the RiD methods. For each projection, the reference is constructed as a 2D FES using the

metadynamics Laio and Parrinello (2002). Similar to the previous 2D case, the present CES method

yields higher accuracy with lower computational cost.

4.3.5 Nine-dimensional FES

Furthermore, we consider a more complex molecule, the peptoid trimer (s1pe)3, solvated in a

(4.2 nm)3 dodecahedron box with 1622 TIP3P water molecules. The force field and other simulation

setups are similar to the s1pe molecule.

The chosen CVs consist of the 9 torsion angles 𝜔, 𝜙, 𝜓 that are defined in s1pe case, associated

with the different C𝛼 atoms and denoted as 𝜔1, 𝜙1, 𝜓1, 𝜔2, 𝜙2, 𝜓2, 𝜔3, 𝜙3, 𝜓3. We use 64 walkers

for this case and the initial conditions of the walkers at each iteration are chosen with the same

method as the previous cases. The inverse of the low and the high temperature are set to be 𝜅𝑙 = 100

and 𝜅ℎ = 2, respectively. The timestep 𝛼 of the dynamics of the random walkers is set to be 0.1.

The FES is constructed by the CSE method using 28 iterations of sampling and training, which

requires 225.53 × 64 = 14434.35 CPU hours for simulation and 6.06 GPU hours for training. For

comparison, the RiD method uses 17900 CPU hours for simulation and 15.44 GPU hours for

training. Similar to the above 3D problem, the constructed 9D FES is projected onto various 2D

57

Figure 4.4 The 3D FES for molecule s1pe projected on the 𝜔 − 𝜙 with 𝜓 = 1.5 (first row) and 𝜙 − 𝜓
planes with 𝜔 = 1.5 (second-row). (a-d) The 2D FES constructed using metadynamics with the
third variable restrained (reference); (b-e) projection of the 3D FES constructed by the RiD method;
(c-f) projection of the 3D FES constructed by the present CES method. The CES method requires
4.81 × 20 CPU hours for sampling and 0.84 GPU hours for training, whereas the RiD method uses
423.33 CPU hours and 8 GPU hours, respectively. For a more detailed quantitative analysis, we
refer to the SI.

planes with the remaining variables fixed. For each 2D projection, the reference is constructed as

a 2D FES using metadynamics. Fig. 4.5 shows the projection on the 𝜔1 − 𝜙1 and 𝜔1 − 𝜓1 plane

(see SI for visualization of other 2D projections). Compared with the Rid method, the present CES

method yields higher accuracy with lower computational cost.

The different performances could be possibly due to the distinct ways of imposing the sampling

adaptivity for the two methods. Specifically, the Rid method trains a replica of DNNs on the same

sample set and uses the standard deviation of multiple DNNs’ predictions as an indirect measure of

the construction error. The significant sampling error of the mean force (i.e., −∇𝐴(z)) could be

overlooked by using the standard deviation as the uncertainty indicator, and therefore may further

propagate into the construction error. For instance, a biased sampling of mean force may lead to

a consistent biased prediction among multiple DNNs. As a result, even if the standard deviation

of multiple DNNs’ predictions for a sample point is small, the construction error could be still

58

202(a)202202(d)(b)202(e)(c)202(f)010203040Figure 4.5 The 9D FES for molecule (s1pe)3 projected on the 𝜔1 − 𝜙1 (first row) and 𝜔1 − 𝜓1
plane (second row).
(a-d) The 2D FES constructed using metadynamics with the remaining
variable restrained (reference); (b-e) projection of the 9D FES constructed by the RiD method using
17900 CPU hours for sampling and 15.44 GPU hours for training; (c-f) projection of the 9D FES
constructed by the CES method using 14434.35 CPU hours and 6.06 GPU hours, respectively.

pronounced in that region. Instead, the CES method directly uses the construction error to impose

the adaptivity and therefore achieve enhanced sampling of the regions lacking accuracy.

4.3.6 Thirty-dimensional FES

Finally, we consider the polyalanine-15 (Ace-(Ala)15-Nme), denoted as Ala16, solvated in 2258

water molecules. The full system is simulated temperature 300K in a (4.62 nm)3 dodecahedron

box using the Amber99-SB force field with a time step is 2 fs. We refer to the SI for the simulation

details.

The chosen CVs consist of torsion angles 𝜙 and 𝜓, defined in Ala2 case, associated with the

different C𝛼s, denoted as {𝜙𝑖, 𝜓𝑖}15

𝑖=1. We use 64 walkers for this case and the initial value of the

walkers at each iteration is given by the biased simulation before. The inverse of the low and high

temperatures are set to be 𝜅𝑙 = 20 and 𝜅ℎ = 5, respectively. The timestep 𝛼 for dynamics of the

random walkers Eq. (4.12) is set to be 0.1. The FES is constructed using 100 iterations of sampling

and training. Similar to the previous case, the obtained FES is plotted on a two-dimensional plane

59

202(a)202202(d)(b)202(e)(c)202(f)0102030405060Figure 4.6 The 30D FES for molecule Ala16 projected on various 2D planes. (a-d) 𝜙1 − 𝜓1 (b-e)
𝜙2 − 𝜙3 (c-f) 𝜓2 − 𝜓3. (a-b-c) 2D FES constructed by metadynamics (reference) (d-e-f) The 30D
FES constructed by the present CES method.

while the remaining variables are fixed and the reference is constructed as a 2D FES using the

metadynamics. Fig. 4.6 shows the projection on the 𝜙1 − 𝜓1, 𝜙2 − 𝜙3 and 𝜓2 − 𝜓3 plane. For all

the cases, the projection of the 30-dimensional FES shows good agreement with the 2-dimensional

reference solution. We have also examined the projection on other planes; the prediction shows

good agreement with the reference solution as well. We refer to the SI for details.

4.4 Conclusions

We have presented a consensus-based approach for constructing high-dimensional FESs by

reformulating the construction task as a minimax optimization problem. Rather than seeking the

direct fitting, this method essentially establishes an adversarial learning of FESs by simultaneously

optimizing the target function approximation and the training set. While the common approaches

mainly focus on the efficient exploration of the phase space in the presence of local minima, the

present method further accounts for the discretization error that has been broadly overlooked in FES

construction. Adaptive sampling of the max-residue regime is achieved through the consensus-based

sampling of a posteriori residue-induced distribution in the form of a stochastic particle system in

60

202(a)202202(d)(b)202(e)(c)202(f)0102030405060the CV space. Given the fact the sampling only relies on the first and second-moment estimation,

the method could be particularly efficient for high-dimensional problems. While the numerical

results of biomolecular systems have demonstrated the effectiveness for FES construction, the

present framework of unifying the residual minimization and max-residue enhanced sampling is

quite general for model reduction of complex systems, e.g., the stochastic reduced dynamics (e.g.,

see Ref. She et al. (2023)) with a state-dependent memory for multiscale physical systems.

61

CHAPTER 5

CONCLUSION AND OUTLOOK

In this thesis, we aim to address the challenges of mesoscale modeling by tackling the problem

from three key perspectives: model representation, collective variable selection, and data collection.

By integrating these techniques, we strive to develop a comprehensive framework for constructing

faithful models that accurately capture phenomena across different scales.

In model representation part, we propose a machine learning-based memory kernel to model

the many-body nature of dissipative interactions. This approach is designed to respect physical

constraints, such as symmetry, and ensures thermodynamic consistency by strictly satisfying the

second fluctuation-dissipation theorem. By embedding these fundamental principles into the model,

we aim to achieve accurate and physically meaningful representations of complex systems.

In collective variable selection part, we introduce a data-driven method to identify collective

variables that capture the principal differences between distributions under various nonequilibrium

conditions. By treating these differences as CVs, our approach can generalize across different

external force fields, making it adaptable to a wide range of nonequilibrium scenarios. This step is

crucial for reducing the dimensionality of the problem while preserving the essential dynamics of

the system.

In data collection section, to address the challenges of sampling in high-dimensional phase

spaces, we reformulate the problem as a minimax optimization. This approach balances exploration

of uncharted regions of the phase space with exploitation of areas where the model exhibits

high error. By iteratively refining the training data, we ensure that the sampling process is both

comprehensive and focused on regions critical for improving model accuracy. By integrating these

three perspectives, we aim to develop a unified framework for mesoscale modeling that bridges the

gap between microscale interactions and macroscale dynamics.

These challenges are not confined to modeling problems between the microscale and mesoscale.

We also aim to extend our framework to address modeling challenges between the mesoscale and

macroscale, such as non-Fourier heat conduction and the diffusion of active materials. In these

62

applications, we strive to develop models that are not only physically consistent and accurate but

also mathematically well-posed. While this is relatively straightforward for ordinary differential

equations or stochastic differential equations for models we construct here, it becomes significantly

more challenging for partial differential equations or stochastic partial differential equations. We

hope to address this by carefully designing the model structure to guarantee existence, uniqueness,

and stability of solutions, which are essential for reliable simulations and predictions.

Recent advancements in generative models, such as generative adversarial networks (GANs),

variational autoencoders (VAEs), and diffusion models, have demonstrated remarkable capabilities

in modeling and sampling from high-dimensional distributions. Despite the explicit connection

with the sampling problems we presented ahead, a natural and intriguing question arises: Can

these generative models be extended to generate high-dimensional stochastic processes? Extending

generative models to generate high-dimensional stochastic processes that simultaneously satisfy

physical symmetries and thermodynamic consistency is a challenging but highly promising direction.

63

BIBLIOGRAPHY

Abrams, C. F. and Vanden-Eijnden, E. (2010). Large-scale conformational sampling of proteins
using temperature-accelerated molecular dynamics. Proceedings of the National Academy of
Sciences, 107(11):4961–4966.

Abrams, J. B. and Tuckerman, M. E. (2008). Efficient and direct generation of multidimensional
free energy surfaces via adiabatic dynamics without coordinate transformations. The Journal of
Physical Chemistry B, 112(49):15742–15757.

Ahn, S.-H., Grate, J. W., and Darve, E. F. (2017). Efficiently sampling conformations and pathways
using the concurrent adaptive sampling (cas) algorithm. The Journal of Chemical Physics, 147(7).

Ahn, S.-H., Ojha, A. A., Amaro, R. E., and McCammon, J. A. (2021). Gaussian-accelerated molecular
dynamics with the weighted ensemble method: A hybrid method improves thermodynamic and
kinetic sampling. Journal of Chemical Theory and Computation, 17(12):7938–7951.

Allen, M. P. and Tildesley, D. J. (2017). Computer simulation of liquids. Oxford university press.

Anderson, P. W. (1972). More is different: Broken symmetry and the nature of the hierarchical

structure of science. Science, 177(4047):393–396.

Baczewski, A. D. and Bond, S. D. (2013). Numerical integration of the extended variable generalized
Langevin equation with a positive Prony representable memory kernel. The Journal of chemical
physics, 139(4):044107.

Bao, G., Ye, X., Zang, Y., and Zhou, H. (2020). Numerical solution of inverse problems by weak

adversarial networks. Inverse Problems, 36(11):115003.

Barducci, A., Bussi, G., and Parrinello, M. (2008). Well-tempered metadynamics: a smoothly

converging and tunable free-energy method. Physical review letters, 100(2):020603.

Baxevani, G., Harmandaris, V., Kalligiannaki, E., and Tsantili, I. (2023). Bottom-up transient
time models in coarse-graining molecular systems. Multiscale Modeling & Simulation,
21(4):1746–1774.

Behler, J. and Parrinello, M. (2007). Generalized neural-network representation of high-dimensional

potential-energy surfaces. Phys. Rev. Lett., 98:146401.

Best, R. B. and Hummer, G. (2010). Coordinate-dependent diffusion in protein folding. Proceedings

of the National Academy of Sciences, 107(3):1088–1093.

Bonati, L., Zhang, Y.-Y., and Parrinello, M. (2019). Neural networks-based variationally enhanced

sampling. Proceedings of the National Academy of Sciences, 116(36):17641–17647.

64

Bussi, G., Donadio, D., and Parrinello, M. (2007). Canonical sampling through velocity rescaling.

The Journal of chemical physics, 126(1):014101.

Carrillo, J. A., Choi, Y.-P., Totzeck, C., and Tse, O. (2018). An analytical framework for
consensus-based global optimization method. Mathematical Models and Methods in Applied
Sciences, 28(06):1037–1066.

Carrillo, J. A., Hoffmann, F., Stuart, A. M., and Vaes, U. (2022). Consensus-based sampling.

Studies in Applied Mathematics, 148(3):1069–1140.

Carrillo, J. A., Jin, S., Li, L., and Zhu, Y. (2021). A consensus-based global optimization method
for high dimensional machine learning problems. ESAIM: Control, Optimisation and Calculus
of Variations, 27:S5.

Cendagorta, J. R., Tolpin, J., Schneider, E., Topper, R. Q., and Tuckerman, M. E. (2020). Comparison
of the performance of machine learning models in representing high-dimensional free energy
surfaces and generating observables. The Journal of Physical Chemistry B, 124(18):3647–3660.

Ceriotti, M., Bussi, G., and Parrinello, M. (2009). Langevin equation with colored noise for
constant-temperature molecular dynamics simulations. Physical review letters, 102(2):020601.

Chen, J., Jin, S., and Lyu, L. (2022). A consensus-based global optimization method with adaptive

momentum estimation. Communications in Computational Physics, 31(4):1296–1316.

Chen, K. K., Tu, J. H., and Rowley, C. W. (2012). Variants of dynamic mode decomposition:
boundary condition, koopman, and fourier analyses. Journal of nonlinear science, 22:887–915.

Chipot, C. and Lelièvre, T. (2011). Enhanced sampling of multidimensional free-energy landscapes

using adaptive biasing forces. SIAM Journal on Applied Mathematics, 71(5):1673–1695.

Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., and Müller, K.-R.
(2017). Machine learning of accurate energy-conserving molecular force fields. Science advances,
3(5):e1603015.

Coifman, R. R., Kevrekidis, I. G., Lafon, S., Maggioni, M., and Nadler, B. (2008). Diffusion maps,
reduction coordinates, and low dimensional representation of stochastic systems. Multiscale
Modeling & Simulation, 7(2):842–864.

Crommelin, D. and Vanden-Eijnden, E. (2011). Diffusion estimation from multiscale data by

operator eigenpairs. Multiscale Modeling & Simulation, 9(4):1588–1623.

Crosskey, M. and Maggioni, M. (2017). Atlas: a geometric approach to learning high-dimensional

stochastic systems near manifolds. Multiscale Modeling & Simulation, 15(1):110–156.

Daldrop, J. O., Kowalik, B. G., and Netz, R. R. (2017). External potential modifies friction of

65

molecular solutes in water. Physical Review X, 7(4):041065.

Darve, E. and Pohorille, A. (2001). Calculating free energies using average force. The Journal of

Chemical Physics, 115(20):9169–9183.

Darve, E., Rodríguez-Gómez, D., and Pohorille, A. (2008). Adaptive biasing force method for scalar

and vector free energy calculations. The Journal of Chemical Physics, 128(14).

Darve, E., Solomon, J., and Kia, A. (2009). Computing generalized Langevin equations and

generalized fokker-planck equations. Proc. Natl. Acad. Sci., 106(27):10884–10889.

Das, A. and Andersen, H. C. (2012). The multiscale coarse-graining method. ix. a general method
for construction of three body coarse-grained force fields. The Journal of chemical physics,
136(19):194114.

Davtyan, A., Dama, J. F., Voth, G. A., and Andersen, H. C. (2015). Dynamic force matching: A
method for constructing dynamical coarse-grained models with realistic time dependence. J.
Chem. Phys., 142.

Einstein, A. (1905). On the movement of small particles suspended in a stationary liquid demanded

by the molecular-kinetic theory of heat. Annalen der Physik, 17:549–560.

Español, P. and Warren, P. (1995). Statistical mechanics of dissipative particle dynamics. Europhysics

Letters, 30(4):191–196.

Fang, L., Ge, P., Zhang, L., E, W., and Lei, H. (2022). DeePN2: A deep learning-based non-Newtonian

hydrodynamic model. Journal of Machine Learning, 1:114–140.

Frenkel, D. and Smit, B. (2023). Understanding molecular simulation:

from algorithms to

applications. Elsevier.

Gabrié, M., Rotskoff, G. M., and Vanden-Eijnden, E. (2022). Adaptive monte carlo augmented with
normalizing flows. Proceedings of the National Academy of Sciences, 119(10):e2109420119.

Gao, X., Xu, Z. Q., and Zhou, X. Y. (2022). State-dependent temperature control for langevin

diffusions. SIAM Journal on Control and Optimization, 60(3):1250–1268.

Gao, Y. Q. (2008). An integrate-over-temperature approach for enhanced sampling. The Journal of

Chemical Physics, 128(6).

Ge, P., Zhang, L., and Lei, H. (2023). Machine learning assisted coarse-grained molecular dynamics

modeling of meso-scale interfacial fluids. J. Chem. Phys., 158:064104.

Ge, P., Zhang, Z., and Lei, H. (2024). Data-driven learning of the generalized langevin equation

with state-dependent memory. Phys. Rev. Lett., 133:077301.

66

Grafke, T. and Laio, A. (2024). Metadynamics for transition paths in irreversible dynamics.

Multiscale Modeling & Simulation, 22(1):125–141.

Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. (2017). Improved

training of wasserstein gans. Advances in neural information processing systems, 30.

Hamelberg, D., Mongan, J., and McCammon, J. A. (2004). Accelerated molecular dynamics: a
promising and efficient simulation method for biomolecules. The Journal of Chemical Physics,
120(24):11919–11929.

Hansen, J. P. and McDonald, I. (1990). Theory of Simple Liquids. Academic, London.

Hess, B., Bekker, H., Berendsen, H. J., and Fraaije, J. G. (1997). Lincs: A linear constraint solver

for molecular simulations. Journal of Computational Chemistry, 18(12):1463–1472.

Hijón, C., Español, P., Vanden-Eijnden, E., and Delgado-Buscalioni, R. (2010). Mori–Zwanzig

formalism as a practical computational tool. Faraday discuss., 144:301–322.

Hoogerbrugge, P. J. and Koelman, J. M. V. A. (1992). Simulating microscopic hydrodynamic

phenomena with dissipative particle dynamics. Europhys. Lett., 19(3):155–160.

Hoover, W. G. (1985). Canonical dynamics: equilibrium phase-space distributions. Physical Review

A, 31(3):1695–1697.

Hornak, V., Abel, R., Okur, A., Strockbine, B., Roitberg, A., and Simmerling, C. (2006). Comparison
of multiple amber force fields and development of improved protein backbone parameters. Proteins:
Structure, Function, and Bioinformatics, 65(3):712–725.

Hudson, T. and Li, X. H. (2020). Coarse-graining of overdamped langevin dynamics via the

mori–zwanzig formalism. Multiscale Modeling & Simulation, 18(2):1113–1135.

Izvekov, S. and Voth, G. A. (2005). A multiscale coarse-graining method for biomolecular systems.

The Journal of Physical Chemistry B, 109(7):2469–2473.

John, S. T. and Csányi, G. (2017). Many-body coarse-grained interactions using gaussian

approximation potentials. The Journal of Physical Chemistry B, 121(48):10934–10949.

Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W., and Klein, M. L. (1983).
Comparison of simple potential functions for simulating liquid water. The Journal of chemical
physics, 79(2):926–935.

Jung, G., Hanke, M., and Schmid, F. (2017). Iterative reconstruction of memory kernels. Journal of

Chemical Theory and Computation, 13(6):2481–2488.

Kampen, N. V. (2007). Stochastic processes in physics and chemistry. North Holland.

67

Kawasaki, K. and Gunton, J. D. (1973). Theory of nonlinear transport processes: Nonlinear shear

viscosity and normal stress effects. Physical Review A, 8(4):2048.

Kim, J., Straub, J. E., and Keyes, T. (2006). Statistical-temperature monte carlo and molecular

dynamics algorithms. Physical review letters, 97(5):050601.

Kingma, D. and Ba, J. (2015). Adam: A method for stochastic optimization.

International

Conference on Learning Representations (ICLR).

Klippenstein, V., Tripathy, M., Jung, G., Schmid, F., and van der Vegt, N. F. (2021). Introducing
memory in coarse-grained molecular simulations. The Journal of Physical Chemistry B,
125(19):4931–4954.

Klippenstein, V. and van der Vegt, N. F. A. (2021). Cross-correlation corrected friction in

(generalized) langevin models. The Journal of Chemical Physics, 154(19):191102.

Koopman, B. O. (1931). Hamiltonian systems and transformation in Hilbert space. Proceedings of

the National Academy of Sciences, 17(5):315–318.

Kumar, S., Rosenberg, J. M., Bouzida, D., Swendsen, R. H., and Kollman, P. A. (1992). The
weighted histogram analysis method for free-energy calculations on biomolecules. i. the method.
Journal of Computational Chemistry, 13(8):1011–1021.

Kutz, J. N., Brunton, S. L., Brunton, B. W., and Proctor, J. L. (2016). Dynamic mode decomposition:

data-driven modeling of complex systems. SIAM.

Laio, A. and Parrinello, M. (2002). Escaping free-energy minima. Proceedings of the National

Academy of Sciences, 99(20):12562–12566.

Lange, O. F. and Grubmüller, H. (2006). Collective Langevin dynamics of conformational motions

in proteins. J. Chem. Phys., 124:214903.

Lee, H. S., Ahn, S.-H., and Darve, E. F. (2019). The multi-dimensional generalized Langevin equation

for conformational motion of proteins. The Journal of Chemical Physics, 150(17):174113.

Lei, H., Baker, N. A., and Li, X. (2016). Data-driven parameterization of the generalized Langevin

equation. Proc. Natl. Acad. Sci., 113(50):14183–14188.

Lei, H., Caswell, B., and Karniadakis, G. E. (2010). Direct construction of mesoscopic models from

microscopic simulations. Physical Review E, 81(2).

Lei, H., Wu, L., and E, W. (2020). Machine learning based non-newtonian fluid model with

molecular fidelity. Physics Review E, 102:043309.

Lelièvre, T., Rousset, M., and Stoltz, G. (2008). Long-time convergence of an adaptive biasing force

68

method. Nonlinearity, 21(6):1155.

Lemke, T. and Peter, C. (2017). Neural network based prediction of conformational free energies
Journal of Chemical Theory and

- a new route toward coarse-grained simulation models.
Computation, 13(12):6213–6221.

Li, Z., Lee, H. S., Darve, E., and Karniadakis, G. E. (2017). Computing the non-Markovian
coarse-grained interactions derived from the Mori–Zwanzig formalism in molecular systems:
Application to polymer melts. The Journal of Chemical Physics, 146(1):014104.

Lindahl, Abraham, Hess, and van der Spoel (2019). Gromacs 2019.2 source code.

Lu, J. and Vanden-Eijnden, E. (2013). Infinite swapping replica exchange molecular dynamics leads
to a simple simulation patch using mixture potentials. The Journal of Chemical Physics, 138(8).

Luo, G., Andricioaei, I., Xie, X. S., and Karplus, M. (2006). Dynamic distance disorder in proteins

is caused by trapping. The Journal of Physical Chemistry B, 110(19):9363–9367.

Lyu, L. and Lei, H. (2023a). Consensus-based construction of high-dimensional free energy surface.

arXiv preprint arXiv:2311.05009.

Lyu, L. and Lei, H. (2023b). Construction of coarse-grained molecular dynamics with many-body

non-markovian memory. Physical Review Letters, 131(17):177301.

Lyu, L. and Lei, H. (2024). On the generalization ability of coarse-grained molecular dynamics

models for non-equilibrium processes. arXiv preprint arXiv:2409.11519.

Lyubartsev, A. P. and Laaksonen, A. (1995). Calculation of effective interaction potentials from
radial distribution functions: A reverse monte carlo approach. Phys. Rev. E, 52:3730–3737.

Ma, L., Li, X., and Liu, C. (2016). Derivation and approximation of coarse-grained dynamics from

Langevin dynamics. Arxiv preprint, page 1605.04886.

Ma, L., Li, X., and Liu, C. (2019). Coarse-graining langevin dynamics using reduced-order

techniques. Journal of Computational Physics, 380:170–190.

Ma, Z., Wang, S., Kim, M., Liu, K., Chen, C.-L., and Pan, W. (2021). Transfer learning of memory
kernels for transferable coarse-graining of polymer dynamics. Soft Matter, 17(24):5864–5877.

Maragliano, L. and Vanden-Eijnden, E. (2006). A temperature accelerated method for sampling free
energy and determining reaction pathways in rare events simulations. Chemical physics letters,
426(1-3):168–175.

Maragliano, L. and Vanden-Eijnden, E. (2008). Single-sweep methods for free energy calculations.

The Journal of Chemical Physics, 128(18):184110.

69

Marinari, E. and Parisi, G. (1992). Simulated tempering: a new monte carlo scheme. Europhysics

letters, 19(6):451.

Meyer, H., Pelagejcev, P., and Schilling, T. (2019). Non-markovian out-of-equilibrium dynamics:
A general numerical procedure to construct time-dependent memory kernels for coarse-grained
observables. Europhysics Letters, 128(4):40001.

Miao, Y., Feher, V. A., and McCammon, J. A. (2015). Gaussian accelerated molecular dynamics:
Unconstrained enhanced sampling and free energy calculation. Journal of Chemical Theory and
Computation, 11(8):3584–3595.

Miyamoto, S. and Kollman, P. A. (1992). Settle: An analytical version of the shake and rattle

algorithm for rigid water models. Journal of Computational Chemistry, 13(8):952–962.

Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. (2018). Spectral normalization for generative

adversarial networks. arXiv preprint arXiv:1802.05957.

Molgedey, L. and Schuster, H. G. (1994). Separation of a mixture of independent signals using time

delayed correlations. Physical review letters, 72(23):3634.

Mones, L., Bernstein, N., and Csányi, G. (2016). Exploration, sampling, and reconstruction of free
energy surfaces with gaussian process regression. Journal of chemical theory and computation,
12(10):5100–5110.

Mori, H. (1965). Transport, collective motion, and Brownian motion. Progress of Theoretical

Physics, 33(3):423–455.

Morrone, J. A., Li, J., and Berne, B. J. (2012). Interplay between hydrodynamics and the free
energy surface in the assembly of nanoscale hydrophobes. The Journal of Physical Chemistry B,
116(1):378–389.

Nielsen, S. O., Lopez, C. F., Srinivas, G., and Klein, M. L. (2004). Coarse grain models and the
computer simulation of soft materials. Journal of Physics: Condensed Matter, 16(15):R481–R512.

Noé, F., Olsson, S., Köhler, J., and Wu, H. (2019). Boltzmann generators: Sampling equilibrium

states of many-body systems with deep learning. Science, 365(6457):eaaw1147.

Noid, W. G. (2013). Perspective: coarse-grained models for biomolecular systems. J. Chem. Phys.,

139(9):090901.

Noid, W. G., Chu, J.-W., Ayton, G. S., Krishna, V., Izvekov, S., Voth, G. A., Das, A., and Andersen,
H. C. (2008). The multiscale coarse-graining method. I. a rigorous bridge between atomistic and
coarse-grained models. J. Chem. Phys., 128(24):244114.

Nosé, S. (1984). A molecular dynamics method for simulations in the canonical ensemble. Molecular

70

Physics, 52(2):255–268.

Parrinello, M. and Rahman, A. (1981). Polymorphic transitions in single crystals: A new molecular

dynamics method. Journal of Applied Physics, 52(12):7182–7190.

Pérez-Hernández, G., Paul, F., Giorgino, T., De Fabritiis, G., and Noé, F. (2013). Identification
of slow molecular order parameters for markov model construction. The Journal of chemical
physics, 139(1).

Pfaendtner, J. and Bonomi, M. (2015). Efficient sampling of high-dimensional free-energy landscapes
with parallel bias metadynamics. Journal of chemical theory and computation, 11(11):5062–5067.

Piana, S. and Laio, A. (2007). A bias-exchange approach to protein folding. The journal of physical

chemistry B, 111(17):4553–4559.

Pinnau, R., Totzeck, C., Tse, O., and Martin, S. (2017). A consensus-based model for global
optimization and its mean-field limit. Mathematical Models and Methods in Applied Sciences,
27(01):183–204.

Plotkin, S. S. and Wolynes, P. G. (1998). Non-Markovian configurational diffusion and reaction

coordinates for protein folding. Phys. Rev. Lett., 80:5015–5018.

Prakash, A., Fu, C. D., Bonomi, M., and Pfaendtner, J. (2018). Biasing smarter, not harder, by
partitioning collective variables into families in parallel bias metadynamics. Journal of chemical
theory and computation, 14(10):4985–4990.

Reith, D., Pütz, M., and Müller-Plathe, F. (2003). Deriving effective mesoscale potentials from

atomistic simulations. Journal of Computational Chemistry, 24(13):1624–1636.

Rosso, L., Mináry, P., Zhu, Z., and Tuckerman, M. E. (2002). On the use of the adiabatic molecular
dynamics technique in the calculation of free energy profiles. The Journal of chemical physics,
116(11):4389–4402.

Rudd, R. E. and Broughton, J. Q. (1998). Coarse-grained molecular dynamics and the atomic limit

of finite elements. Phys. Rev. B, 58:R5893–R5896.

Russo, A., Durán-Olivencia, M. A., Kevrekidis, I. G., and Kalliadasis, S. (2019). Deep learning as
closure for irreversible processes: A data-driven generalized Langevin equation. arXiv preprint
arXiv:1903.09562.

Satija, R., Das, A., and Makarov, D. E. (2017). Transition path times reveal memory effects
and anomalous diffusion in the dynamics of protein folding. The Journal of Chemical Physics,
147(15):152707.

Schmid, P. J. (2010). Dynamic mode decomposition of numerical and experimental data. Journal of

71

fluid mechanics, 656:5–28.

Schneider, E., Dai, L., Topper, R. Q., Drechsel-Grau, C., and Tuckerman, M. E. (2017). Stochastic
neural network approach for learning high-dimensional free energy surfaces. Physical review
letters, 119(15):150601.

Schwantes, C. R. and Pande, V. S. (2013). Improvements in markov state model construction reveal
many non-native interactions in the folding of ntl9. Journal of chemical theory and computation,
9(4):2000–2009.

Shaffer, P., Valsson, O., and Parrinello, M. (2016). Enhanced, targeted sampling of high-dimensional
free-energy landscapes using variationally enhanced sampling, with an application to chignolin.
Proceedings of the National Academy of Sciences, 113(5):1150–1155.

She, Z., Ge, P., and Lei, H. (2023). Data-driven construction of stochastic reduced dynamics encoded

with non-markovian features. The Journal of Chemical Physics, 158(3):034102.

Shell, M. S. (2008). The relative entropy is fundamental to multiscale and inverse thermodynamic

problems. The Journal of Chemical Physics, 129(14).

Sidky, H. and Whitmer, J. K. (2018). Learning free energy landscapes using artificial neural

networks. The Journal of chemical physics, 148(10):104111.

Soper, A. (1996). Empirical potential monte carlo simulation of fluid structure. Chemical Physics,

202(2):295–306.

Stecher, T., Bernstein, N., and Csányi, G. (2014a). Free energy surface reconstruction from umbrella
samples using gaussian process regression. Journal of Chemical Theory and Computation,
10(9):4079–4097.

Stecher, T., Bernstein, N., and Csányi, G. (2014b). Free energy surface reconstruction from
umbrella samples using gaussian process regression. Journal of chemical theory and computation,
10(9):4079–4097.

Stoltz, G., Rousset, M., et al. (2010). Free energy computations: A mathematical perspective.

World Scientific.

Straus, J. B., Gomez Llorente, J. M., and Voth, G. A. (1993). Manifestations of spatially dependent
friction in classical activated rate processes. The Journal of Chemical Physics, 98(5):4082–4097.

Sugita, Y. and Okamoto, Y. (1999). Replica-exchange molecular dynamics method for protein

folding. Chemical physics letters, 314(1-2):141–151.

Tang, K., Zhai, J., Wan, X., and Yang, C. (2023). Adversarial adaptive sampling: Unify pinn and

optimal transport for the approximation of pdes. arXiv preprint arXiv:2305.18702.

72

Torrie, G. and Valleau, J. (1977a). Nonphysical sampling distributions in monte carlo free-energy

estimation: Umbrella sampling. Journal of Computational Physics, 23(2):187–199.

Torrie, G. M. and Valleau, J. P. (1977b). Nonphysical sampling distributions in monte carlo
free-energy estimation: Umbrella sampling. Journal of Computational Physics, 23(2):187–199.

Tribello, G. A., Bonomi, M., Branduardi, D., Camilloni, C., and Bussi, G. (2014). Plumed 2: New

feathers for an old bird. Computer physics communications, 185(2):604–613.

Valsson, O. and Parrinello, M. (2014). Variational approach to enhanced sampling and free energy

calculations. Phys. Rev. Lett., 113:090601.

van der Oord, C., Sachs, M., Kovács, D. P., Ortner, C., and Csányi, G. (2023). Hyperactive learning

for data-driven interatomic potentials. npj Computational Materials, 9(1):168.

Van Hove, L. (1954). Correlations in space and time and born approximation scattering in systems

of interacting particles. Phys. Rev., 95:249–262.

Voter, A. F. (1997). Hyperdynamics: Accelerated molecular dynamics of infrequent events. Physical

Review Letters, 78(20):3908.

Vroylandt, H., Goudenège, L., Monmarché, P., Pietrucci, F., and Rotenberg, B. (2022).
Likelihood-based non-markovian models from molecular dynamics. Proceedings of the National
Academy of Sciences, 119(13):e2117586119.

Wang, D., Wang, Y., Chang, J., Zhang, L., Wang, H., and E, W. (2022). Efficient sampling
of high-dimensional free energy landscapes using adaptive reinforced dynamics. Nature
Computational Science, 2(1):20–29.

Wang, H., Zariphopoulou, T., and Zhou, X. Y. (2020a). Reinforcement learning in continuous
time and space: A stochastic control approach. The Journal of Machine Learning Research,
21(1):8145–8178.

Wang, S., Ma, Z., and Pan, W. (2020b). Data-driven coarse-grained modeling of polymers in
solution with structural and dynamic properties conserved. Soft Matter, 16(36):8330–8344.

Weiser, L. J. and Santiso, E. E. (2019). A cgenff-based force field for simulations of peptoids with

both cis and trans peptide bonds. Journal of computational chemistry, 40(22):1946–1956.

Widder, C., Koch, F., and Schilling, T. (2022). Generalized langevin dynamics simulation
with non-stationary memory kernels: How to make noise. The Journal of Chemical Physics,
157(19):194107.

Willis, C. and Picard, R. (1974). Time-dependent projection-operator approach to master equations

for coupled systems. Physical Review A, 9(3):1343.

73

Wu, H. and Noé, F. (2014). Optimal estimation of free energies and stationary densities from

multiple biased simulations. Multiscale Modeling & Simulation, 12(1):25–54.

Xie, P., Car, R., and E, W. (2022). Ab Initio Generalized Langevin Equations. arXiv preprint

arXiv:2211.06558.

Yang, L. and Qin Gao, Y. (2009). A selective integrated tempering method. The Journal of Chemical

Physics, 131(21).

Yang, Y. I., Shao, Q., Zhang, J., Yang, L., and Gao, Y. Q. (2019). Enhanced sampling in molecular

dynamics. The Journal of Chemical Physics, 151(7).

Yoshimoto, Y., Kinefuchi, I., Mima, T., Fukushima, A., Tokumasu, T., and Takagi, S. (2013).
Bottom-up construction of interaction models of non-markovian dissipative particle dynamics.
Phys. Rev. E, 88:043305.

Yu, P., Du, Q., and Liu, C. (2005). From micro to macro dynamics via a new closure approximation

to the fene model of polymeric fluids. Multiscale Modeling & Simulation, 3(4):895–917.

Yu, T.-Q., Lu, J., Abrams, C. F., and Vanden-Eijnden, E. (2016). Multiscale implementation of
infinite-swap replica exchange molecular dynamics. Proceedings of the National Academy of
Sciences, 113(42):11744–11749.

Zhang, B. W., Jasnow, D., and Zuckerman, D. M. (2007). Efficient and verified simulation of a path
ensemble for conformational change in a united-residue model of calmodulin. Proceedings of the
National Academy of Sciences, 104(46):18043–18048.

Zhang, L., Han, J., Wang, H., Car, R., and E, W. (2018a). Deep potential molecular dynamics: A

scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett., 120:143001.

Zhang, L., Han, J., Wang, H., Car, R., and E, W. (2018b). Deepcg: Constructing coarse-grained

models via deep neural networks. The Journal of Chemical Physics, 149(3):034101.

Zhang, L., Han, J., Wang, H., Saidi, W., Car, R., and E, W. (2018c). End-to-end symmetry preserving
inter-atomic potential energy model for finite and extended systems. In Bengio, S., Wallach, H.,
Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural
Information Processing Systems 31, pages 4436–4446. Curran Associates, Inc.

Zhang, L., Wang, H., and E, W. (2018d). Reinforced dynamics for enhanced sampling in large

atomic and molecular systems. The Journal of chemical physics, 148(12):124113.

Zheng, L., Chen, M., and Yang, W. (2008). Random walk in orthogonal space to achieve efficient
free-energy simulation of complex systems. Proceedings of the National Academy of Sciences,
105(51):20227–20232.

74

Zwanzig, R. (1973). Nonlinear generalized Langevin equations. J. Stat. Phys., 9:215 – 220.

Zwanzig, R. (2001). Nonequilibrium Statistical Mechanics. Oxford University Press.

75

APPENDIX A

SIMULATION DETAIL FOR NONEQUILIBRIUM CASE

A.1 Additional result of CGMD model under equilibrium state

To establish a fair comparison of the various CG models, we examine the static and dynamic

properties that have been widely used as benchmark problems for model validation. Fig. A.1

shows the radial distribution function between the CG coordinates. For all three CG models, their

predictions agree well with the full MD results, which verifies the accuracy of the CG free energy

function 𝑈 (Q). Fig. A.2 shows the normalized velocity auto-correlation function (VACF) of the

CG variables. Similar to the static properties, the prediction of the CG models agrees well with the

full MD results, which verifies the accuracy of the many-body memory term K (Q, 𝑡).

Despite of the good agreement, we emphasize that these properties are essentially defined under

the marginal density near equilibrium. The applicability to non-equilibrium processes is generally

unwarranted, and further relies on the consistency in the conditional probability density function

associated with the CG projection operator (i.e., Eq. (3.6)) as discussed in Sec. 3.2.

A.2 CG variable training

In this section, we show that Eq. 3.8 does not impose further constraints on constructing the CG

variables. We examine the equivalence of two CG mapping. For instance, when the centers of mass

(COMs) are utilized as the only CG variables, the information they encompass is the same as the

COMs plus a constant or the COMs multiplied by a constant.

Definition A.2.1. For two CG map 𝜙 and ˇ𝜙, if there exist a map T, such that 𝜙(q, p) = T( ˇ𝜙(q, p)),

we say 𝜙 ≤ ˇ𝜙. Also, if 𝜙 ≤ ˇ𝜙 and ˇ𝜙 ≤ 𝜙, we say that the two maps are equivalent.

In other words, if 𝜙 ≤ ˇ𝜙, any properties that can be measured from the CG coordinate 𝜙, denote

as F (𝜙) , can also be meansured from ˇ𝜙 as F (T( ˇ𝜙)). If 𝜙 ≤ ˇ𝜙 and a CG model d

d𝑡 ˇ𝜙 = ˇL ˇ𝜙 is
constructed using CG coordinate ˇ𝜙, the CG model for 𝜙 can be constructed by the chain rule
d𝑡 𝜙 = 𝛿𝑇

ˇL is deterministic or Itô’s lemma if

ˇL includes stochastic term.

ˇL ˇ𝜙 if

𝛿 ˇ𝜙

d

Proposition A.2.2. For any linear map 𝜙 defined by a matrix W ∈ R𝑁𝑚×(𝑚−1), there exists a map

76

Figure A.1 The distribution of the pairwise distance between CG coordinates in the CGMD model
compared with the Full MD model. (a) The radius distribution function 𝑔(𝑟) of the CGCOM model.
(b) The distribution of the pairwise distance 𝑔′(𝑟) between two CG coordinates within the same
molecular of the CG3 model. (c)-(j) The distribution of distance 𝑔′(𝑟) between two CG coordinate
(1-2,1-3,1-4,2-3,2-4,3-4, respectively) within the same molecular of CG4 model.

77

8162432r0.00.40.81.2g(r)(a)MDCGCOM0102030r0.000.040.080.12ρ(r)(r)MDCG35101520r0.000.050.100.15g0(r)(c)MDCG45101520r0.000.050.10g0(r)(d)MDCG45101520r0.000.050.10g0(r)(g)MDCG45101520r0.000.050.10g0(r)(h)MDCG404812r0.000.050.100.150.200.25g0(r)(i)MDCG45101520r0.000.050.10g0(r)(j)MDCG4Figure A.2 The VACF of CVs in different CG models and their comparison with Full MD. (a) VACF
of CVs in CG3 model and the comparison with Full MD. (b)-(g) VACF of CVs (1,2,3,4, respectively)
in CG4 model and the comparison with Full MD.

ˇ𝜙, defined by a matrix ˇW ∈ R𝑁𝑚×𝑚, subject to the following constraints

(A.1)

(A.2)

0 < ˇ𝑤𝑛𝑖 < 1,

𝑚
∑︁

𝑖=1

ˇ𝑤𝑛𝑖 = 1.

Proof. For any W with entry 𝑤𝑖 𝑗 ,

ˇW can be constructed as



where ˜𝑤 = min𝑛,𝑖 𝑤𝑛𝑖 and ¯𝑤 = max𝑖 (cid:205)𝑚−1

ˇ𝑤𝑛𝑖 =



𝑖

𝑤1

𝑛𝑖 − ˜𝑤1
¯𝑤
𝑚−1
∑︁

1 −

,

0 < 𝑖 < 𝑚,

ˇ𝑤𝑛𝑖,

𝑖 = 𝑚,

𝑖=1

(𝑤𝑛𝑖 − ˜𝑤). It’s obvious that ˇ𝑤𝑛𝑖 ∈ [0, 1] for 0 < 𝑖 < 𝑚.

Since

𝑚−1
∑︁

𝑖=1

ˇ𝑤𝑛𝑖 =

(cid:205)𝑚−1

𝑖=1 (𝑤𝑛𝑖 − ˜𝑤)
¯𝑤

∈ [0, 1],

we have ˇ𝑤𝑛𝑚 ∈ [0, 1]. Also, (cid:205)𝑖 ˇ𝑤𝑛𝑖 = 1 for every 𝑛, hence ˇ𝑊 satisfies the constraints. Since

𝑚
∑︁

𝑁𝑚∑︁

(

𝑖

𝑛

ˇ𝑤𝑛𝑖) ˇQ𝐼

𝑖 =

𝑚
∑︁

𝑁𝑚∑︁

𝑖

𝑛

ˇ𝑤𝑛𝑖q𝐼

𝑛 =

𝑁𝑚∑︁

𝑛

q𝐼
𝑛,

(A.3)

78

012345Time0.000.250.500.751.00VACF(a)MDCGCOM012345Time0.000.250.500.751.00VACF(b)MDCG301234Time0.000.250.500.751.00VACF(c)MDCG401234Time0.000.250.500.751.00VACF(d)MDCG401234Time0.000.250.500.751.00VACF(g)MDCG401234Time0.000.250.500.751.00VACF(h)MDCG4we have

Q𝐼

𝑖 =

=

=

(cid:205)𝑁𝑚

𝑛 𝑤𝑛𝑖q𝐼,𝑛
(cid:205)𝑁𝑚
𝑛 𝑤𝑛𝑖
( ¯𝑤 ˇ𝑤𝑛𝑖 + ˜𝑤)q𝐼,𝑛
( ¯𝑤 ˇ𝑤𝑛𝑖 + ˜𝑤)
ˇ𝑤𝑛𝑖) ˇQ𝐼
(cid:205)𝑁𝑚
𝑛

(cid:205)𝑁𝑚
𝑛
(cid:205)𝑁𝑚
𝑛
¯𝑤((cid:205)𝑁𝑚
𝑛

(cid:16)(cid:205)𝑁𝑚
𝑛

𝑗

𝑖 + ˜𝑤 (cid:205)𝑚
( ¯𝑤 ˇ𝑤𝑛𝑖 + ˜𝑤)

(A.4)

ˇ𝑤𝑛 𝑗

(cid:17) ˇQ𝐼

𝑗

.

Therefore, Q𝐼 is a linear transformation of ˇQ𝐼 for all 𝐼 = 1, · · · , 𝑀. Similarly, P𝐼 can be shown as a

linear transformation of ˇP𝐼.

□

79

APPENDIX B

SIMULATION DETAIL AND PROOF FOR CONSENSUS BASED SAMLPING

B.1 Proof of Proposition 2

The sampling distribution is governed by the following McKean stochastic differential equation

(cid:164)z𝑖
𝑡 = −

1
𝛾

∇z𝐺 (z𝑖

𝑡) +

√︄

2
𝜅ℎ𝛾

𝜉𝑖 (𝑡),

(B.1)

(z𝑡 − m𝑡) and 𝛾 is the friction coefficient. We want to show it

where 𝐺 (z𝑡) = 1

2 (z𝑡 − m𝑡)𝑇𝑉 −1

𝑡

converges to a steady state as an approximation of 𝑞∗ with the proper choices of m𝑡 and 𝑉𝑡 given by

m𝑡 =

𝑁𝑤
∑︁

𝑖=1

𝑡 ˆ𝑝(z𝑖
z𝑖

𝑡),

𝑉𝑡 = 𝜅𝑡

𝑁𝑤
∑︁

𝑖=1

(z𝑖

𝑡 − m𝑡) (z𝑖

𝑡 − m𝑡)𝑇 ˆ𝑝(z𝑖

𝑡),

(B.2)

where ˆ𝑝(z) =

exp (−𝜅𝑙 L −
𝑖=1 exp (−𝜅𝑙 L −

N (z))

N (z𝑖))

(cid:205)𝑁𝑤

and 𝜅𝑡 = 𝜅𝑙 + 𝜅ℎ.

Proposition 2. Suppose L−
𝑞𝑡 → exp (−𝜅ℎL −
∫ exp (−𝜅ℎL −

N (z))
N (z))dz

as 𝑡 → ∞, given 𝜅𝑡 = 𝜅𝑙 + 𝜅ℎ.

N (z) takes a local quadratic approximation in form of 1

2 (z− 𝜇)𝑇 Σ−1(z− 𝜇),

Proof. Let 𝑞∞(z) denote the invariant distribution of the McKean SDE (B.1). Then 𝑞∞(z) must be

the invariant distribution of the following McKean stochastic dynamics

(cid:164)z = −

1
𝛾

𝑉 −1
𝜅𝑙,∞(z − m𝜅𝑙,∞) +

√︄

2
𝛾𝜅ℎ

𝜉 (𝑡),

(B.3)

where m𝜅𝑙,∞ and 𝜅−1
∝ 𝑞∞(z)𝑒−𝜅𝑙 L −

𝑡 𝑉𝜅𝑙,∞ are the mean and the covariance matrix of the re-weighted density

N (z). With the fluctuation-dissipation relation for Eq. (B.3), we can show 𝑞∞(z)

follows the Gaussian distribution with mean m𝜅𝑙,∞ and covariance matrix 𝜅−1

ℎ 𝑉𝜅𝑙,∞.

Since LN (z) = 1

2 (z − 𝜇)𝑇 Σ−1(z − 𝜇) is quadratic, the re-weighted density of a Gaussian

distribution 𝑞(z) ∼ N (m, 𝑉) remains Gaussian, i.e.,

𝑞(z)𝑒−𝜅𝑙 L −

N (z) ∝ N (m𝜅𝑙 , 𝑉𝜅𝑙 ),

80

where m𝜅𝑙 and 𝑉𝜅𝑙 are defined by

m𝜅𝑙 = (𝑉 −1 + 𝜅𝑙Σ−1)−1(𝜅𝑙Σ−1𝜇 + 𝑉 −1m),

𝑉𝜅𝑙 = (𝑉 −1 + 𝜅𝑙Σ−1)−1.

(B.4)

In particular, we choose m = m𝜅𝑙,∞ and 𝑉 = 𝜅−1

ℎ 𝑉𝜅𝑙,∞, then Eq. (B.4) yields

m𝜅𝑙,∞ = (𝜅ℎ𝑉 −1

𝜅𝑙,∞ + 𝜅𝑙Σ−1)−1(𝜅𝑙Σ−1𝜇 + 𝜅ℎ𝑉 −1

𝜅𝑙,∞m𝜅𝑙,∞),

𝑡 𝑉𝜅𝑙,∞ = (𝜅ℎ𝑉 −1
𝜅−1

𝜅𝑙,∞ + 𝜅𝑙Σ−1)−1.

It is easy to show that by choosing 𝜅𝑡 = 𝜅𝑙 + 𝜅ℎ, m𝜅𝑙,∞ and 𝑉𝜅𝑙,∞ recovers 𝜇 and Σ, respectively, and

the invariant density takes the form

𝑞∞(z) ∼ N

(cid:16)

𝜇, 𝜅−1

ℎ Σ

(cid:17)

.

□

B.2 Simulation setup

All the MD simulations are performed using the package GROMACS 2019.2 Lindahl et al. (2019)

and open-source, community-developed PLUMED library Tribello et al. (2014). The simulation is

carried out on Intel(R) Xeon(R) Platinum 8260 CPU.

B.2.1

ala2 (the two-dimensional problem)

The Ace-Ala-Nme (ala2) molecule is modeled by the Amber99SB force field Hornak et al.

(2006). The molecules are dissolved in 383 water molecules in a periodic simulation cell. The

cut-off radius of the van der Waals interaction is 0.9 nm. The Coulomb interaction is treated with

the smooth particle mesh Ewald method with a real space cutoff of 0.9 nm and a reciprocal space

grid spacing of 0.12 nm. The system is integrated with the leap-frog scheme at time step 2 fs. The

temperature of the system is set to 300 K by a velocity-rescale thermostat Bussi et al. (2007) with a

relaxation time of 0.2 ps. The Parrinello-Rahman barostat Parrinello and Rahman (1981) with a

relaxation time scale of 1.5 ps and a compressibility of 4.5 × 10−5 bar−1 is coupled to the system to

control the pressure to 1 bar. The hydrogen atom is constrained by the LINCS algorithmHess et al.

(1997) and the H–O bond and H–O–H angle of water molecules are constrained by the SETTLE

algorithmMiyamoto and Kollman (1992).

81

B.2.2

s1pe (three-dimensional problem)

The s-(1)-phenylethyl (s1pe) molecule is modeled by the CHARMM general force field (CGenFF)

Weiser and Santiso (2019). The molecule is dissolved in 546 TIP3P water molecules in a (2.69

nm)3 dodecahedron box. The cut-off radius of the van der Waals interaction is 1 nm. The Coulomb

interaction is treated with the smooth particle mesh Ewald method with a real space cutoff of 1 nm

and a reciprocal space grid spacing of 0.12 nm. The system is integrated with the leap-frog scheme

at time step 2 fs. The temperature of the system is set to 298 K by a velocity-rescale thermostat

Bussi et al. (2007) with a relaxation time of 0.2 ps. The Parrinello-Rahman barostat pressure couple,

the hydrogen atom constraint, the H–O bond constraint, and the H–O–H angle of water molecules

constraint are the same as the previous one.

B.2.3

(s1pe)3 (the nine-dimensional problem)

The (s1pe)3 molecule is modeled by the CHARMM general force field (CGenFF) Weiser and

Santiso (2019). The molecule is solvated in a (4.2 nm)3 dodecahedron box with 1622 TIP3P water

molecules. Other simulation setups are similar to the s1pe molecule.

B.2.4

ala16 (the thirty-dimensional problem)

The Ace-(Ala)15-Nme (ala16) molecule is modeled by the Amber99SB force field Hornak et al.

(2006). The molecule is solvated in 2258 water molecules in a (4.62 nm)3 dodecahedron box. Other

simulation setups are similar to the ala2 molecule.

B.3 Traning

The training data is collected during the sampling process by restricted dynamics. The initial

10% steps of the restricted dynamics are used as equilibrium and the rest 90% steps are used to

calculate the mean force. The FES are parameterized as a fully connected neural network. The

depth and width of the NN are shown in Supplementary Table B.1. The NNs are trained by Adam

for 100000 steps with a learning rate 1 × 10−3. For each training step, 5000 sampling points are

randomly selected from the data set. All the training process is carried out on v100 with 32768 MB

memory.

82

Table B.1 The depth and width of the NN used to parameterize the FES of different molecules.

molecule
ala2
s1pe
(s1pe)3
ala16

depth width

3
4
4
4

48
64
512
640

B.4 Additional result

B.4.1 Additional result of the 1D Rastrigin function

Supplementary Table B.2 presents the estimations of the first and second momentum during

the construction of the 1D Rastrigin function. The first momentum, predicted to be the maximum

residual points in Proposition 2, is confirmed to be numerically accurate. The second momentum

is validated through the comparison between 𝑓 ′′ and 𝑓 ′′

𝜃𝑖−1

± 𝑉 −1, with the sign influenced by the

residual’s sign. Numerical resuts verify the accurate prediction of the second momentum, i.e.,
(cid:12)
(cid:12)
(cid:12)

≈ 𝑉 −1 for 𝑥∗ at the max-residual point.

𝑓 ′′ (𝑥∗) − 𝑓 ′′
𝜃𝑖

(𝑥∗)

(cid:12)
(cid:12)
(cid:12)

Table B.2 The first and second momentum estimation made during the construction of 1D Rastrigin
function. The maximum residual point of | 𝑓 (𝑥) − 𝑓𝜃𝑖 (𝑥)| is computed by mesh points with mesh size
1 × 10−4. The first momentum is estimated from the equilibrium distribution of CES dynamics. The
second derivative of 𝑓 is computed at the first momentum estimated point. The second derivative
estimation is made from 𝑓 ′′
± 𝑉 −1, where 𝑉 is second momentum estimated from the equilibrium
distribution of CES dynamics and sign is determined by the sign of 𝑓 − 𝑓𝜃𝑖 .

𝜃𝑖−1

iteration 𝑖
maximum residual point
first momentum estimated
second derivative of 𝑓
second derivative estimated

0
0.000
-0.000
41.478
40.974

1
2.000
2.000
41.478
37.442

2
-2.000
-2.000
41.478
36.916

3
0.996
0.996
41.468
38.283

4
-0.996
-0.995
41.462
39.519

5
2.805
2.805
15.273
11.841

6
-2.806
-2.806
15.539
13.380

7
1.500
1.499
-37.478
-36.796

8
-1.502
-1.503
-37.473
-36.271

9
0.503
0.502
-37.474
-37.515

10
-0.501
-0.501
-37.478
-37.285

11
2.416
2.414
-31.901
-31.521

12
-2.413
-2.413
-31.688
-31.189

B.4.2 Additional result of the three-dimensional FES (s1pe)

Supplementary Figure B.1-B.3 shows the additional 2D projections of the 3D FES (molecule

s1pe) constructed by the Rid and presented CES method. For each case, the reference is constructed

as a 2D FES using the metadynamics. The CES method yields a better agreement with the reference.

The computational cost and accuracy of CES and RiD is shown in Supplementary Table B.3. It

shows that the CES method has better accuray with less computational cost.

83

Table B.3 The accuracy of the constructed 3D FES (the s1pe molecule) and computational time for
RiD and CES methods. The 𝑙2 and 𝑙∞ error are calculated up to 40 KJ/mol. For each case, the FES
is projected onto a 2D plane with the third variable fixed; the reference solution is constructed as a
2D FES by the metadynamics. The simulation time of the CES method is multiplied by 20 since 20
walkers are used.

Method Restraint

RiD

CES

𝜓 = 1.5
𝜔 = 1.5
𝜓 = 1.5
𝜔 = 1.5

Accuracy

𝑙2 error
5.76
12.04
2.44
3.89

𝑙∞ error
25.72
49.13
11.21
28.80

Time

Sampling

Train

423.33

8 (GPU)

4.81 × 20

0.84 (GPU)

Figure B.1 The 3D FES for the s1pe molecule projected on the 𝜙 − 𝜓 plane with the third variable
𝜔 = −1 (first row) and 𝜔 = −2 (second row). (a) 2D FES by metadynamics (reference) (b) RiD (c)
CES.

B.4.3 Additional results of the nine-dimensional FES (s1pe)3

The addition 2D projections of the 9D FES for molecule (s1pe)3 are presented in Supplementary

Figure B.4. Similar to the 3D case, the FES constructed by the present CES method shows a better

agreement with the reference constructed as a 2D FES using metadynamics.

B.4.4 Additional results of the thirty-dimensional FES (ala16)

The addition 2D projections of the 30D FES for molecule ala16 are presented in Supplementary

Figure B.5. We note that the projection on the 𝜙5 − 𝜓5 plane is significantly different from other

projections such as the 𝜙1 − 𝜓1 plane in the main context. While the ala16 molecule consists of 15

sequential alanine residues, the FES for individual 𝜙 − 𝜓 projections shows different features. The

numerical results of the present CES method show good agreement with the reference.

84

202202(a)202(b)202(c)010203040202202(a)202(b)202(c)010203040Figure B.2 The 3D FES of the s1pe molecule on the 𝜔 − 𝜓 plane with the third variable 𝜙 = 0 (first
row) and 𝜙 = −1 (second row). (a) 2D FES by metadynamics (reference) (b) RiD (c) CES.

Figure B.3 The 3D FES of the s1pe molecule on the 𝜔 − 𝜙 plane with the third variable 𝜓 = 0 (first
row) and 𝜓 = −1 (second row). (a) 2D FES by metadynamics (reference) (b) RiD (c) CES.

85

202202(a)202(b)202(c)010203040202202(a)202(b)202(c)010203040202202(a)202(b)202(c)010203040202202(a)202(b)202(c)010203040Figure B.4 The 9D FES for the (s1pe)3 molecule projected on the 𝜔1 − 𝜙1, 𝜔2 − 𝜙2, 𝜔2 − 𝜓2, 𝜙2 −
𝜓2, 𝜔3 − 𝜙3, 𝜔3 − 𝜓3, 𝜙3 − 𝜓3 plane from top to bottom. (a) 2D FES by metadynamics (reference)
(b) RiD (c) CES.

86

202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060202202(a)202(b)202(c)0102030405060Figure B.5 The 30D FES of the ala16 projected on the 𝜙5 − 𝜓5 plane. (a) 2D FES constructed by
metadynamics (reference) (b) 30D FES constructed by the CES method.

87

202202(a)202(b)0204060