Machine Learning-based Coarse-Grained models for molecular systems

Multiscale modeling poses a formidable challenge in computational mathematics, particularly in integrating microscale interactions into meso- or macro-scale constitutive relations. While reduced-order models allow the simulation of large systems, their analytical formulations are generally unclosed. In coarse-grained molecular dynamics, for example, the Mori-Zwanzig formalism decomposes the dynamics into deterministic, memory, and stochastic terms, but the explicit forms of these three terms are unknown. In this thesis, we present a series of studies that address these problems. First, the main challenge for the deterministic term comes from the high dimensionality of the free energy surface (FES) and the presence of energy barriers on it. We propose a consensus sampling-based approach that reformulates the FES construction as a minimax problem. This framework simultaneously optimizes the function representation of the FES and the training set used to learn it. In particular, the maximization step establishes a stochastic interacting particle system that adaptively samples the max-residue regime by balancing exploitation of the Laplace approximation of the current loss function against exploration of the uncharted phase space; the minimization step updates the FES approximation with the new training set. By iteratively solving the minimax problem, the method essentially achieves adversarial learning of the FES, unifying phase space exploration and posterior error-enhanced sampling. Second, memory interactions are important for predicting collective transport and diffusion processes. To model them, we introduce a machine-learning-based coarse-grained molecular dynamics model that captures the dissipative many-body contribution. The neural network representation is carefully designed to preserve the physical symmetries and thermodynamic consistency. 
Unlike common empirical reduced models, the present model is constructed from the Mori-Zwanzig formalism and naturally inherits the heterogeneous, state-dependent memory term, rather than being fitted to mean-field metrics such as the velocity autocorrelation function. Finally, models based on the Mori-Zwanzig formalism face inherent challenges when applied to non-equilibrium systems. A key issue lies in the Zwanzig projection, which relies on the marginal distribution of the system. We present a data-driven approach for constructing reduced models that retain a degree of generalization ability for non-equilibrium processes. Unlike conventional CG models based on pre-selected CG variables (e.g., the center of mass), the present CG model seeks a set of auxiliary CG variables, based on time-lagged independent component analysis, that minimize the entropy contribution of the unresolved variables. This ensures that the distribution of the unresolved variables under a broad range of non-equilibrium conditions approaches the one under equilibrium. Through numerical validation, we demonstrate that our model can accurately predict viscoelastic behavior in various non-equilibrium flow regimes.
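
For orientation, the three-term decomposition referred to in the abstract is commonly written as a generalized Langevin equation. The form below is a textbook sketch, not the thesis's own formulation: the symbols Q, P (coarse-grained positions and momenta), M (mass matrix), U (free energy), K (memory kernel), and R (fluctuating force) are assumed notation, and the kernel is taken to be state-independent for simplicity, whereas the model described above retains a state-dependent memory term.

```latex
% Generalized Langevin equation (assumed notation, state-independent kernel)
\begin{aligned}
\dot{\mathbf{Q}} &= \mathbf{M}^{-1}\mathbf{P}, \\
\dot{\mathbf{P}} &= \underbrace{-\nabla U(\mathbf{Q})}_{\text{deterministic}}
  \;\underbrace{-\int_0^t \mathbf{K}(t-s)\,\mathbf{M}^{-1}\mathbf{P}(s)\,\mathrm{d}s}_{\text{memory}}
  \;+\; \underbrace{\mathbf{R}(t)}_{\text{stochastic}}, \\
\big\langle \mathbf{R}(t)\,\mathbf{R}(s)^{\top} \big\rangle &= k_B T\,\mathbf{K}(t-s).
\end{aligned}
```

The last line is the second fluctuation-dissipation relation, which ties the stochastic term to the memory kernel at equilibrium; it is this equilibrium assumption that becomes problematic in the non-equilibrium setting discussed above.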
