SOLVING THE QUANTUM MANY-BODY PROBLEM WITH NEURAL-NETWORK QUANTUM STATES

By

Jane Mee Kim

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Physics — Doctor of Philosophy
Computational Mathematics, Science and Engineering — Dual Major

2023

ABSTRACT

Quantum many-body systems provide a rich framework for exploring and understanding the fundamental laws of physics. By studying the collective behavior and emergent phenomena that arise from the intricate microscopic interactions among particles, we can deepen our understanding of quantum mechanics and gain insight into its broader implications for macroscopic observations. However, quantum many-body systems pose significant computational challenges, as the information contained in the many-body wave function grows exponentially with the size of the system. This exponential scaling, coupled with the presence of strong interparticle correlations, makes the accurate description of these systems difficult, if not impossible, for traditional analytical or perturbative techniques.

In this interdisciplinary work, we aim to solve the quantum many-body problem by representing the trial wave function of a variational Monte Carlo calculation with a so-called neural-network quantum state. These states, as their name suggests, are rooted in artificial neural networks and serve as a novel alternative to conventional parameterizations of the trial wave function. In addition to reviewing key concepts in quantum many-body theory and machine learning, we investigate a diverse set of continuous-space systems with varying levels of complexity. Starting from a pedagogical overview of an exactly solvable system of bosons in one dimension, we work our way up to strongly interacting fermionic systems in three dimensions, including dilute neutron matter and ultra-cold Fermi gases. The highly non-perturbative interactions featured in these systems motivate the development of innovative neural-network quantum states, capable of discovering strong correlations while maintaining required symmetries and boundary conditions. We accompany this study with the description of two distinct implementations of neural-network quantum states, each with its own goals and strategies. Our findings indicate that neural-network quantum states provide a powerful and flexible strategy for investigating a wide range of quantum phenomena, without relying on prior assumptions about the underlying physics as in traditional approaches.

This thesis is dedicated to my parents, who have been a constant source of love and support despite their distance, and to Elliot, my life partner and best friend, who fills my days with joy and laughter.

ACKNOWLEDGMENTS

Meeting Morten Hjorth-Jensen during my senior year at MSU was an incredibly fortunate event that profoundly influenced my academic journey. Knowing he would make an excellent mentor was a large part of the reason I decided to stay at MSU to pursue my PhD. His expertise, patience, and dedication have played a pivotal role in fostering my passion and shaping my scientific identity. Morten, I am forever grateful for your kindness, as well as your ability to strike a delicate balance between providing steadfast support and encouraging independent thinking. Please know that you have had an immense impact on my growth as a researcher and a human being.
Scott Pratt, I distinctly remember approaching your office one day after encountering the statement "x = x + 1" and thinking to myself, "Well, that's simply not true!" It seems trivial now, but this moment marked my very first introduction to coding. What's more, I knew nothing about research at the time, but you convinced me to give it a shot. Thank you for taking a chance on me, and thank you for your many words of wisdom.

To my esteemed committee members, Filomena Nunes, Artemis Spyrou, Huey-Wen Lin, Michael Murillo, and Stuart Tessmer (and a special mention to Matthew Hirn, who is no longer at MSU), I am grateful for your support and guidance throughout the years. You have not only fulfilled your roles as committee members but also served as invaluable role models for me. I would also like to express my gratitude to the nuclear theory group, particularly Dean Lee, Scott Bogner, and Heiko Hergert, for the countless stimulating discussions we have had over the years. I hope that our paths will cross often in the future.

Alessandro Lovato and Bryce Fore, it has been an absolute delight collaborating with both of you. I have learned so much during this past year and a half, and I eagerly look forward to our continued collaborations in the future. Gabriel Pescia, Jannes Nys, and Giuseppe Carleo, I am deeply appreciative of the opportunity to work alongside each of you and to have had the unforgettable experience of meeting in person. The time spent together in Lausanne holds a special place in my heart, and I hope to have the chance to visit again in the future.

Throughout my nearly decade-long journey at MSU, my interests and career aspirations have undergone complete transformations. In the face of these changes, I am deeply grateful for the presence of Kim Crosslan in the P&A department. Kim, speaking not only for myself but for all graduate students, you are an incredibly important source of stability, and your impact is immeasurable. Thank you for the many times you have saved me.

My dearest Elliot, I cannot imagine what graduate school would have been like if you had not been there for every second of it. Thank you for being you, and for helping me through stressful times. Umma and Appa, I am excited for you to finally see what I've been working on for the past few years. Thank you for always encouraging me to work through tough obstacles to strive towards my goals; hearing 'hwaiting!' was often exactly what I needed. Therese (Erin), thank you for your supportive words over the years. I miss you all deeply.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1  Introduction
    1.1  Challenges of the Quantum Many-Body Problem
    1.2  Neural-Network Quantum States
    1.3  Algorithm Development Through the Lens of Nuclear Theory
    1.4  Structure of Dissertation
Chapter 2  The Quantum Many-Body Problem
    2.1  Formalism
    2.2  Indistinguishable Particles
    2.3  The Schrödinger Equation
    2.4  The Variational Principle
    2.5  Ab Initio Methods
Chapter 3  Quantum Monte Carlo
    3.1  Monte Carlo Methods
    3.2  Variational Monte Carlo
    3.3  Diffusion Monte Carlo
Chapter 4  Machine Learning
    4.1  The Curse of Dimensionality
    4.2  Cost Functions
    4.3  Supervised Learning
    4.4  Unsupervised Learning
    4.5  Reinforcement Learning
    4.6  Transfer Learning
    4.7  Artificial Neural Networks
Chapter 5  Implementation
    5.1  NeuralAnsatz: C++ Software for Localized Systems
    5.2  Python Software for Periodic Systems
Chapter 6  The Calogero-Sutherland Model
    6.1  Hamiltonian
    6.2  Neural-Network Quantum States
    6.3  Results
    6.4  Conclusions
Chapter 7  Dilute Neutron Matter
    7.1  Introduction
    7.2  Method
    7.3  Results and Discussion
    7.4  Conclusion
Chapter 8  Homogeneous Electron Gas
    8.1  Introduction
    8.2  Exact Backflow Transformations
    8.3  Message-Passing Neural-Network Quantum States
    8.4  Hamiltonian
    8.5  Results
    8.6  Conclusions
Chapter 9  Ultra-cold Fermi Gases
    9.1  Introduction
    9.2  Hamiltonian
    9.3  Neural-Network Quantum States
    9.4  Variational Monte Carlo and Training
    9.5  Diffusion Monte Carlo
    9.6  Results
    9.7  Conclusions and Perspectives
Chapter 10  Conclusions and Perspectives
BIBLIOGRAPHY

LIST OF TABLES

Table 8.1: Total energy per particle in Hartree for the unpolarized system of N = 14 particles. WAPNet and FermiNet are alternative NQS architectures optimized via VMC. We include FCI and DCD results as benchmarks from quantum chemistry.

Table 8.2: Total energy per particle in Hartree for the unpolarized system of N = 54 particles.

Table 9.1: The limiting cases of the overlap between two neural backflow spinors, inspired by spinors on the Bloch sphere.

Table 9.2: Energy per particle for various values of µ and the corresponding values of re. The values with asterisks (*) are extrapolations from the linear fits shown in Fig. 9.4. The parameter v0 = 1 is fixed.

Table 9.3: Energies per particle and interaction parameters for the two-body potential in Eq. (9.2) giving different scattering lengths with the same effective range kF re = 0.2.

Table 9.4: Energies per particle for different numbers of particles, with kF re = 0.2. Values with asterisks (*) indicate translation-invariant calculations.

Table 9.5: Energies per particle for different numbers of particles, with kF re = 0.2. Values with asterisks (*) indicate translation-invariant calculations.

LIST OF FIGURES

Figure 3.1: The cyclic workflow of the variational Monte Carlo algorithm. Starting with a random set of variational parameters, the cycle is repeated until the variational energy converges.

Figure 3.2: The occupancy of single-particle states for bosons (left) and fermions (right) at zero temperature. The two different colors (blue/orange) of fermions represent the two possible spin states (up/down). While bosons can occupy the same single-particle state, fermions are restricted to distinct states due to the Pauli exclusion principle. The energy of the highest occupied single-particle state is commonly referred to as the Fermi energy εF.

Figure 3.3: The effects of a symmetric Jastrow factor (left) and a backflow transformation (right) on the nodes of a fermionic wave function. While both the Jastrow factor and the backflow transformation maintain the antisymmetry of the overall wave function and have the ability to modify the wave function's amplitude, only the backflow transformation is capable of shifting the positions of the nodes.

Figure 3.4: This cartoon depicts trajectories of stochastic gradient descent, with momentum (orange) and without (blue), near a narrow parameter space minimum. Momentum smooths the path by counteracting noise in opposing directions.
Figure 3.5: A visualization of the Stochastic Reconfiguration algorithm, a second-order optimization method that stretches and squeezes the energy landscape such that difficult minima (left) are more isotropic (right). This method typically reduces the required number of optimization steps by an order of magnitude compared to the simple stochastic gradient descent method.

Figure 4.1: Cartoon of the curse of dimensionality; as the dimension of the data increases (left to right), the density of a fixed number of data points decreases exponentially.

Figure 4.2: Comparison of a random data set (left) with no information, a data set with obvious clusters (middle), and a data set with an obvious lower-dimensional manifold (right). The latter two are examples of real-life data sets that contain information, i.e. they have structure.

Figure 4.3: A visualization of the two main types of supervised learning: regression (left) and classification (right). The blue data points are the inputs, and the orange data points are the target outputs. The goal of a supervised machine learning problem is to learn and generalize the mapping (solid black lines).

Figure 4.4: A depiction of principal component analysis (left) and cluster identification (right), two common types of unsupervised learning.

Figure 4.5: A depiction of the variational Monte Carlo algorithm in the context of agent-based modeling, a commonly employed reinforcement learning framework.

Figure 4.6: A general Boltzmann machine in which all visible (blue circles) and hidden (gray circles) nodes are fully connected (solid black lines).

Figure 4.7: A standard restricted Boltzmann machine, where the hidden nodes (gray circles) are binary and the visible nodes (blue circles) can be either binary or Gaussian. There are no connections (solid black lines) between nodes within the same layer.

Figure 4.8: A multivariate Gaussian-binary restricted Boltzmann machine, a step between a standard Gaussian-binary restricted Boltzmann machine and a general Boltzmann machine. In addition to the connections (solid black lines) between the visible nodes (blue circles) and hidden nodes (gray circles), there are connections (dotted black lines) among the visible nodes.

Figure 4.9: The conditional probability of a single binary hidden node hj activating, given the state of the visible nodes v, for a multivariate Gaussian-binary restricted Boltzmann machine. The blue line represents the probability if the hidden nodes are allowed to take values of 0 and 1 (Eq. (4.32)), while the orange line represents the probability if the hidden nodes are allowed to take values of -1 and 1 (Eq. (4.39)). The x-axis is a transformation of the visible nodes given by Eq. (4.28).
Figure 4.10: The conditional probability of a single uniform node hj activating, given the state of the visible nodes v, for a multivariate Gaussian-uniform restricted Boltzmann machine. The blue line represents the probability if the uniform nodes are allowed to take values between 0 and 1 (Eq. (4.43)), while the orange line represents the probability if the hidden nodes are allowed to take values of -1 and 1 (Eq. (4.46)). The x-axis is a transformation of the visible nodes given by Eq. (4.28). We also plot the softplus function (Eq. (4.17)) for comparison, as it exhibits similar character. Probability values greater than 1 are assumed to mean the hidden node hj is always activated.

Figure 4.11: Comparison of the four functions we called f(x) during our derivations of the multivariate Gaussian-binary and Gaussian-uniform restricted Boltzmann machines (Eqs. (4.17), (4.41), (4.44), (4.47)). These functions appear in the log-likelihoods log P(v).

Figure 4.12: A deep feedforward neural network with six inputs (blue circles) and three outputs (orange circles). There are two hidden layers with eight nodes each (gray circles). Adjacent layers are connected by directed connections, as information flows from left to right. Arrowheads for the directed connections (solid black lines) are omitted for visual brevity.

Figure 4.13: An example of a permutation-invariant Jastrow wave function constructed from the single output of a feedforward neural network. To enforce permutation invariance, the input vectors xi can be sorted according to a chosen rule before concatenating them into a single input vector. The large gray triangle represents a standard feedforward neural network with nine inputs and one output.

Figure 4.14: A depiction of a permutation-invariant Jastrow wave function constructed from a Deep Set. First, each of the input vectors xi is passed through identical copies of a standard feedforward neural network ϕ (gray trapezoids). Then the pooling operation (gray rounded box) generates a latent space representation of the set of inputs by destroying the ordering. Finally, the latent space representation is passed through another feedforward neural network ρ (gray triangle) with a single output. The positive-definite Jastrow factor can be obtained by simply exponentiating the single output.

Figure 4.15: One layer of a graph neural network that generates an output graph with the same structure as the original input graph. Three types of transformations can be considered: global (top), edge (middle), and vertex (bottom) transformations. The nodes and edges highlighted in orange show examples of the graph components that contribute to a given transformation.

Figure 4.16: A visualization of the attention scores between a single element of a query vector and all the elements of a key vector. Larger attention scores are depicted with darker, bolder lines.

Figure 5.1: The highest-level structure of the NeuralAnsatz code, an object-oriented C++ software for neural-network quantum states.

Figure 5.2: A specific example of the hierarchical structure provided by abstract base classes and derived classes. The classes that appear lower on this graph inherit properties from the classes above it.

Figure 6.1: The regularized two-body potential for the Calogero-Sutherland model, where x0 is the regularization parameter (Eq. (6.4)). The long-range behavior of the original potential (black, dashed line) is preserved, but the height of the potential decreases with larger values of x0. Here, we have used an interaction parameter value of β = 2.
Figure 6.2: Training curves for feedforward neural networks with different activation functions. In this initial pretraining stage, x0 = 0.5.

Figure 6.3: Training curves for the different variants of Gaussian-binary and Gaussian-uniform restricted Boltzmann machines, with x0 = 0.5.

Figure 6.4: The converged energy as a function of the regularization parameter x0 for feedforward neural networks with different activation functions.

Figure 6.5: The converged energy as a function of the regularization parameter x0 for the various Gaussian-binary and Gaussian-uniform restricted Boltzmann machines.

Figure 6.6: One-body densities at three different points during training: the initial stage, after pretraining with x0 = 0.5, and after training with x0 = 0.01. The black dashed line represents the one-body distribution from the exact ground state wave function.

Figure 7.1: NQS training data in neutron matter at ρ = 0.04 fm−3 (data points) compared with Hartree-Fock (dotted line), conventional VMC (dashed line), constrained-path ADMC (dash-dotted line) and unconstrained-path ADMC results (solid line).

Figure 7.2: Low-density neutron-matter pionless-EFT equation of state as obtained with the hidden-nucleon NQS for the pionless-EFT potential "o" (blue circles) and the pionless-EFT potential "a" (orange squares), compared with interactions which retain pion-exchange terms (green band). We see that the "o" potential is in excellent agreement with the π-full interactions, while the "a" potential has a slightly stiffer equation of state, due primarily to a more repulsive three-body force.

Figure 7.3: Spin-singlet and triplet two-body distribution functions at two different densities: ρ = 0.01 fm−3 (panel a) and ρ = 0.04 fm−3 (panel b). The NQS calculations (solid symbols) are compared with non-interacting Fermi gas results (solid lines).

Figure 8.1: Energy differences between ground-state energies, obtained with other methods and with the MP-NQS, in units of the thermodynamic Hartree-Fock energy, for various densities, polarizations, and system sizes. (Top) N = 14, 19; (bottom) N = 54 particles. Error bars are too small to be visible for most densities. The corresponding numerical data can be found in the Supplemental Material.

Figure 8.2: Spin-averaged radial distribution function for the homogeneous electron gas with N = 128 electrons at low densities (rs = 110, 200). Error bars are smaller than the symbols. The unpolarized fluid is obtained from [113] for rs = 110.

Figure 9.1: A cartoon of the BCS-BEC crossover. Moving from left to right, the attractive interaction between opposite-spin fermions increases. However, in the BEC regime, the attraction binds the pairs so tightly that they behave as weakly repulsive bosons. The region between the weakly attractive Cooper pairs and the weakly repulsive dimers is known as the unitary limit.
Figure 9.2: Schematic representation of a message-passing neural network with T iterations. Dashed lines represent the concatenation operations, while solid lines represent the parameterized transformations (linear transformations and nonlinear feedforward neural networks). Messages, highlighted in pink, mediate the exchange of information between the one- and two-body streams, in blue. A yellow box indicates a single iteration of the network.

Figure 9.3: Ground-state energies per particle as a function of the MPNN depth T for the SJ-PW (blue squares), SJ-BF (orange circles), and PJ-BF (green triangles) ansätze. The interaction parameters are set to v0 = 1 and µ = 5, corresponding to an effective range of re kF = 0.4. The DMC benchmark energies with and without pairing are displayed as solid and dashed lines, respectively.

Figure 9.4: Ground-state energies per particle as a function of the effective range. The DMC-BCS benchmark energies (blue circles) and the Pfaffian-Jastrow with backflow (PJ-BF) energies (orange triangles) are extrapolated to zero effective range using linear fits (dashed lines). The shaded regions are the error bands for the DMC-BCS and PJ-BF energies.

Figure 9.5: Opposite-spin pair densities at small kF r at unitarity (v0 = 1) for µ = 5 (blue squares), µ = 10 (orange circles), and µ = 20 (green triangles).

Figure 9.6: Opposite-spin pair densities in the crossover region for the BCS phase 1/akF = −0.5 (blue squares), unitarity 1/akF = 0 (orange circles), and the BEC phase 1/akF = 0.5 (green triangles). The effective range in all cases is fixed at kF re = 0.2. See Table 9.4 for the corresponding values of v0 and µ.

Figure 9.7: Upper panel: Energy per particle in the BCS-BEC crossover region as a function of the scattering length a for a fixed effective range kF re = 0.2. Lower panel: Difference between Pfaffian-Jastrow with backflow (PJ-BF) and DMC-BCS benchmark energies. See Table 9.4 for the corresponding values of v0 and µ.

1 Introduction

1.1 Challenges of the Quantum Many-Body Problem

One of the fundamental challenges in addressing the quantum many-body problem lies in the extensive amount of information required to fully characterize a quantum state. In contrast to classical mechanics, where the focus is to determine the trajectories of the particles in the system, in quantum mechanics, our objective is to determine the relative probabilities of all potential states occurring for all particles. This distinction arises due to the probabilistic nature of quantum mechanics, where particles can exist in multiple states simultaneously and their behaviors are described by the evolution of wave functions.

All possible states of a quantum system can be represented as vectors in a many-body Hilbert space. As the number of particles increases, the dimension of the Hilbert space grows exponentially, resulting in a staggering number of possible states. This exponential growth poses a formidable computational challenge, as the explicit representation and manipulation of the wave function becomes intractable even for systems of modest size. Consequently, quantum many-body theory is driven by the use of efficient and physically motivated approximations.
Specific strategies vary depending on the type of problem at hand, but ideally, we want to encode as much relevant information about the system as possible in the fewest number of free parameters or degrees of freedom. In continuous space, the exponential scaling problem is magnified by an infinite basis. Nonetheless, resolving the spatial distribution of a quantum system remains of utmost interest, as it is often more intuitive to interpret physical phenomena in terms of position and spatial relationships.

1.2 Neural-Network Quantum States

In this study, we solve continuous-space quantum many-body problems by enhancing an established computational approach, called variational Monte Carlo, through the integration of machine learning—a category of models that learn from data rather than relying on predefined instructions. More specifically, we parameterize the trial wave function with artificial neural networks, computational models inspired by the structure of the human brain, and optimize their parameters by minimizing the energy of the system. Trial wave functions that involve these artificial neural networks are aptly named "neural-network quantum states".

Since their initial application by Carleo and Troyer in 2017 [1], neural-network quantum states have outperformed traditional variational Monte Carlo calculations in fields ranging from condensed matter physics [2] to quantum chemistry [3] and nuclear physics [4]. Remarkably, in many instances [2, 5], they have even surpassed diffusion Monte Carlo, a computational method widely regarded as the gold standard of quantum many-body methods. The rapid growth of this field underscores the tremendous potential demonstrated by neural-network quantum states. As we continue to develop and refine these techniques, new and exciting opportunities emerge for further breakthroughs that could reshape our understanding of complex quantum systems.

1.3 Algorithm Development Through the Lens of Nuclear Theory

Advancements in neural-network quantum states often find inspiration from established techniques in machine learning. While the prospect of adapting unexplored machine learning methods sparks curiosity, it is equally intriguing to explore the advantages that a background in physics can bring to overcoming these challenges. In particular, the distinctive nature of nuclear interactions provides a compelling reason to design neural-network quantum states with broader applicability compared to states tailored for interactions commonly encountered in condensed matter systems.

Nuclear Hamiltonians, which describe the interactions between protons and neutrons in real space, offer profound insights into the intricate mechanisms governing the universe—from the microscopic domain of atomic nuclei to the expansive reaches of neutron stars. With the intention of eventually applying our neural-network quantum states to nuclear systems, we thoughtfully design the wave functions to ensure compatibility with the exchange of spin and isospin. For instance, in our investigation of ultra-cold Fermi gases in Chapter 9, we adopt a generalized approach to the Pfaffian wave function [6, 7], a type of wave function used in the presence of strong pairing correlations. Even though ultra-cold Fermi gases technically fall under the condensed matter category, our Pfaffian neural-network quantum state can be applied immediately to nuclear systems as well. By embracing a more inclusive approach, we discover a remarkably versatile and scalable ansatz.
1.4 Structure of Dissertation

This dissertation begins with a comprehensive overview of the quantum many-body problem in Chapter 2, including the establishment of notation and vocabulary used in later chapters. In Chapter 3, we discuss quantum Monte Carlo methods, serving as a foundational framework to facilitate the incorporation of machine learning techniques for enhanced flexibility. Then in Chapter 4, we provide a broad background on machine learning, focusing on the mathematics of artificial neural networks. Chapter 5 presents two distinct approaches for the implementation of neural-network quantum states: one in C++ and the other in Python. The former allows for a deeper understanding of how the neural-network quantum states are trained, as gradients are computed analytically in terms of the trainable parameters, while the latter affords greater freedom in design, as gradients are computed numerically. Chapters 6-9 provide illustrative examples of neural-network quantum states employed in different quantum systems, with the latter three chapters being a collection of articles. Each of these examples will provide discussion on the power of neural-network quantum states as they are applied in the specific study. Finally, Chapter 10 includes conclusions and perspectives on the field as a whole.

2 The Quantum Many-Body Problem

The collective behavior of a quantum many-body system often displays remarkable deviations from the simple summation of the individual constituents. Strong correlations and intricate interactions between particles lead to emergent phenomena that cannot be adequately described by classical mechanics alone. These phenomena, such as phase transitions and the formation of superfluid states, underscore the importance of quantum mechanics in unraveling the complex behaviors and novel properties of quantum many-body systems.

In this study, our objective is to solve the time-independent Schrödinger equation for a nonrelativistic system of identical particles in continuous space. We will focus on determining the ground state, which plays a pivotal role in understanding the system's stability, symmetries, and equilibrium properties. Furthermore, the ground state serves as a reference point for all excited states, enabling meaningful comparisons to experimental observations.

The purpose of this chapter is to establish the foundation for our investigation. We will cover the essential aspects, significance, and challenges associated with quantum many-body problems, as well as examples of commonly employed ab initio methods. This discussion will help provide a comprehensive and contextualized understanding of the field and motivate our subsequent efforts to enhance quantum Monte Carlo methods with neural networks.

2.1 Formalism

Dirac notation, also known as bra-ket notation, offers a concise and abstract representation of quantum states, operators, and measurements. It is intimately linked to the concept of Hilbert space, a complex inner product space denoted as H. A general quantum state is represented as a ket vector that lives in Hilbert space, |Ψ⟩ ∈ H. For each ket |Ψ⟩, there exists a corresponding linear map from Hilbert space to the field of complex numbers, ⟨Ψ| : H → C, which we call a bra vector. One can interpret a bra vector as a measurement device or probing tool that extracts useful information regarding a particular state. By applying a bra ⟨Ψ| to the left-hand side of another ket |Φ⟩, we define the inner product ⟨Ψ|Φ⟩, a complex scalar that characterizes the overlap between the states |Ψ⟩ and |Φ⟩. This inner product is essential for computing probabilities and expectation values of observables, as it enables us to properly define the notions of distance and orthogonality between states in Hilbert space.
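For readers who want a concrete handle on these definitions, the short sketch below realizes them in a finite-dimensional toy Hilbert space with NumPy. It is purely illustrative and not part of the software described in Chapter 5.

```python
import numpy as np

# A two-dimensional toy Hilbert space: kets are complex column vectors.
psi = np.array([1.0, 1.0j]) / np.sqrt(2)    # |psi>
phi = np.array([1.0, 0.0], dtype=complex)   # |phi>

# The bra <psi| is the conjugate transpose; np.vdot conjugates its
# first argument, so this evaluates the inner product <psi|phi>.
overlap = np.vdot(psi, phi)

# A Hermitian operator (here the Pauli-x matrix) and its expectation
# value <psi|A|psi> / <psi|psi>, which is guaranteed to be real.
A = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)
expval = np.vdot(psi, A @ psi) / np.vdot(psi, psi)

print(overlap)      # (0.7071...+0j), the overlap of the two states
print(expval.real)  # 0.0 up to floating-point error
```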
Operators are linear transformations Â : H → H that act on a state to produce another state. While some operators, like time-evolution or rotation operators, do not correspond directly to measurable quantities, they still have mathematical and physical significance for describing the dynamics and symmetries of quantum states. Observables, such as position, momentum, and spin, are represented by Hermitian operators. Formally, they satisfy ⟨ÂΨ|Φ⟩ = ⟨Ψ|ÂΦ⟩ for any |Ψ⟩, |Φ⟩ ∈ H. Hermitian operators have two important features:

1. Real eigenvalues. All possible measurements are real, implying that all expectation values ⟨Â⟩ ≡ ⟨Ψ|Â|Ψ⟩/⟨Ψ|Ψ⟩ are also real. The denominator ensures that the average is taken over a normalized probability distribution. These real expectation values play a crucial role in making comparisons and predictions with experimental results.

2. Orthogonal eigenvectors. The eigenvectors corresponding to distinct eigenvalues form a set of orthonormal basis vectors {|α⟩} that span the Hilbert space, yielding the completeness relation

\hat{1} = \sum_{\alpha} |\alpha\rangle\langle\alpha| .    (2.1)

Any state can be expanded as a linear combination of the eigenvectors

|\Psi\rangle = \left( \sum_{\alpha} |\alpha\rangle\langle\alpha| \right) |\Psi\rangle = \sum_{\alpha} \langle\alpha|\Psi\rangle\, |\alpha\rangle .    (2.2)

If the states are continuous, we convert the sum into an integral

\hat{1} = \int d\alpha\, |\alpha\rangle\langle\alpha| , \qquad |\Psi\rangle = \int d\alpha\, \langle\alpha|\Psi\rangle\, |\alpha\rangle .    (2.3)

Since we will work with both continuous and discrete states, this distinction is of minor significance. It will be left to the reader to convert sums into integrals or vice versa, whenever they are not explicitly written.

Therefore, one should identify the observables which provide the most convenient set of basis states {|α⟩} to represent the many-body state |Ψ⟩. Determining the coefficients (or functions) ⟨α|Ψ⟩ is equivalent to finding a representation of the abstract state |Ψ⟩. The best choice of basis for a many-body system depends on the specific nature of the particles and the chosen method for solving the problem. In preparation for our later treatment of N particles in d spatial dimensions, let us define our single-particle basis states as

|x_i\rangle = \begin{cases} |r_i\rangle , & \text{for bosons,} \\ |r_i\rangle \otimes |s_i\rangle \equiv |r_i, s_i\rangle , & \text{for fermions,} \end{cases}    (2.4)

where r_i ∈ R^d are the coordinates of the ith particle, s_i ∈ R^s contains the spin-like degrees of freedom (if there are any), and i = 1, 2, ..., N. The spatial parts are eigenstates of the single-particle position operator¹ r̂_i,

\hat{r}_i |r_i\rangle = r_i |r_i\rangle ,    (2.5)

where the inner product between any two eigenstates is a d-dimensional Dirac delta function

\langle r_i | r_i' \rangle = \delta(r_i - r_i') .    (2.6)

The spin parts are written as

|s_i\rangle = \begin{cases} |s_i^z\rangle , & \text{for spin-1/2 fermions,} \\ |s_i^z, t_i^z\rangle = |s_i^z\rangle \otimes |t_i^z\rangle , & \text{for nucleons,} \end{cases}    (2.7)

where |s_i^z⟩ are the eigenstates of the single-particle spin-z operator ŝ_i^z,

\hat{s}_i^z |s_i^z\rangle = s_i^z |s_i^z\rangle , \qquad \langle s_i^z | s_i^{z\prime} \rangle = \delta_{s_i^z s_i^{z\prime}} ,    (2.8)

and |t_i^z⟩ are the eigenstates of the single-particle isospin-z operator t̂_i^z,

\hat{t}_i^z |t_i^z\rangle = t_i^z |t_i^z\rangle , \qquad \langle t_i^z | t_i^{z\prime} \rangle = \delta_{t_i^z t_i^{z\prime}} ,    (2.9)

and δ_ij denotes the Kronecker delta.

¹The position operator for the ith particle is commonly denoted as X̂_i or x̂_i, but we chose r̂_i to stay consistent with our definition of x_i and r_i.
The resulting single-particle Hilbert spaces are

\mathcal{H}_i = \begin{cases} L^2(\mathbb{R}^d) , & \text{for spin-0 bosons,} \\ L^2(\mathbb{R}^d) \otimes \mathbb{C}^2 , & \text{for spin-1/2 fermions,} \\ L^2(\mathbb{R}^d) \otimes \mathbb{C}^2 \otimes \mathbb{C}^2 , & \text{for nucleons,} \end{cases}    (2.10)

and a many-body configuration can be written as the tensor product of the single-particle basis states

|X\rangle = |x_1\rangle \otimes |x_2\rangle \otimes \cdots \otimes |x_N\rangle \equiv |x_1, x_2, \ldots, x_N\rangle \in \mathcal{H} ,    (2.11)

living in a tensor product of single-particle Hilbert spaces

\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2 \otimes \cdots \otimes \mathcal{H}_N .    (2.12)

Assuming the dimensions of all the single-particle Hilbert spaces are the same, it is now clear that the dimension of the many-body Hilbert space scales exponentially with the number of particles N,

\dim(\mathcal{H}) = \prod_{i=1}^{N} \dim(\mathcal{H}_i) = \dim(\mathcal{H}_1)^N .    (2.13)

This is the primary source of difficulty in quantum many-body problems. The exponential scaling of the Hilbert space dimension applies for even the simplest spin systems in which dim(H_1) = 2. Our challenge is exacerbated by the fact that the particles in our systems of interest have continuous spatial degrees of freedom, resulting in an infinite dim(H_1). Accordingly, quantum many-body methods are often built around different ways of mitigating this problem, such as truncating Hilbert space to a finite subspace or exploring only relevant areas of Hilbert space stochastically.
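The scaling in Eq. (2.13) is worth seeing with actual numbers. The snippet below (a toy illustration only) tabulates dim(H) = 2^N when each particle carries nothing but a spin-1/2 degree of freedom, along with an optimistic lower bound on the memory needed to store one wave function explicitly:

```python
# Exponential growth of the many-body Hilbert space, Eq. (2.13),
# for spin-1/2 degrees of freedom alone: dim(H) = 2**N.
for N in (2, 10, 50, 100, 300):
    dim = 2 ** N
    # 8 bytes per real amplitude is already an optimistic lower bound
    # on storing the wave function as an explicit vector.
    print(f"N = {N:3d}: dim(H) = {dim:.3e}, storage > {8 * dim:.3e} bytes")
```

Already at N = 50 the storage estimate exceeds any existing machine, which is why the methods of Chapter 3 sample the wave function stochastically instead of storing it.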
Our many-body state |Ψ⟩ can be expanded in terms of the many-body configurations defined in Eq. (2.11),

|\Psi\rangle = \int dX\, \langle X|\Psi\rangle\, |X\rangle ,    (2.14)

where our goal is to determine the wave function Ψ(X) ≡ ⟨X|Ψ⟩, the probability amplitude of finding the system in a certain configuration |X⟩. Since we may have a mixture of continuous and discrete degrees of freedom, it is sometimes convenient to decompose |X⟩ as

|X\rangle = |R\rangle \otimes |S\rangle ,    (2.15)
|R\rangle = |r_1\rangle \otimes |r_2\rangle \otimes \cdots \otimes |r_N\rangle \equiv |r_1, r_2, \ldots, r_N\rangle ,    (2.16)
|S\rangle = |s_1\rangle \otimes |s_2\rangle \otimes \cdots \otimes |s_N\rangle \equiv |s_1, s_2, \ldots, s_N\rangle ,    (2.17)

instead of how it was originally expressed in Eq. (2.11). Then we can easily convert the integral over X into a discrete sum over the spin degrees of freedom and an integral over the continuous spatial degrees of freedom,

\int dX \;\longrightarrow\; \sum_{S} \int dR = \sum_{s_1} \sum_{s_2} \cdots \sum_{s_N} \int d^d r_1\, d^d r_2 \cdots d^d r_N ,    (2.18)

with the latter omitted in the bosonic case.

The single-particle momentum operator

\hat{p}_i = -i\hbar \nabla_i ,    (2.19)

where ∇_i ≡ ∂/∂r_i is the gradient with respect to the position of the ith particle, acts on the wave function as

\hat{p}_i |\Psi\rangle = \int dX\, \langle X|\hat{p}_i|\Psi\rangle\, |X\rangle = \int dX \left( -i\hbar \nabla_i \langle X|\Psi\rangle \right) |X\rangle .    (2.20)

The single-particle eigenstates are

\hat{p}_i |p_i\rangle = p_i |p_i\rangle , \qquad \langle p_i | p_i' \rangle = \delta(p_i - p_i') ,    (2.21)

with the continuous momenta p_i ∈ R^d becoming discrete in the presence of periodicity. Expressed in position space, the momentum eigenfunctions are plane waves

\varphi_{p_i}(x_i) \equiv \langle x_i | p_i \rangle = \frac{1}{(2\pi\hbar)^{d/2}}\, e^{i p_i \cdot r_i / \hbar} ,    (2.22)

which will become relevant in our studies of infinite matter.

Just as we can expand states in terms of our working basis, we can expand operators as well,

\hat{A} = \left( \int dX\, |X\rangle\langle X| \right) \hat{A} \left( \int dX'\, |X'\rangle\langle X'| \right) = \int dX \int dX'\, \langle X|\hat{A}|X'\rangle\, |X\rangle\langle X'| ,    (2.23)

where the coefficients ⟨X|Â|X′⟩ are called matrix elements of the operator Â, even though our basis contains continuous components. If the operator is diagonal in our basis, we can write

\hat{A} = \int dX \int dX'\, \langle X|\hat{A}|X'\rangle\, |X\rangle\langle X'|
        = \int dX \int dX'\, \langle X|\hat{A}|X\rangle\, \delta(X - X')\, |X\rangle\langle X'|
        = \int dX\, \langle X|\hat{A}|X\rangle\, |X\rangle\langle X|
        = \int dX\, \hat{A}(X)\, |X\rangle\langle X| ,    (2.24)

with δ(X − X′) ≡ δ(R − R′) δ_{S,S′} representing the product of a Dirac delta function over the spatial degrees of freedom and a Kronecker delta over the spin degrees of freedom, if it applies. It is common to leave off the explicit dependence on X in the function Â(X) ≡ ⟨X|Â|X⟩, since this representation of Â is particularly simple.

2.2 Indistinguishable Particles

The behavior of quantum systems exhibits fundamental differences from classical systems, primarily due to the concept of particle indistinguishability. In classical mechanics, particles are always distinguishable; even when they possess identical masses, charges, or other properties, they can be assigned different labels to unambiguously describe and predict their motion. As a result, classical systems feature well-defined trajectories for each particle, albeit often involving highly nonlinear coupled differential equations.

When we established our mathematical framework in the previous section, we implicitly assumed that the particles were distinguishable. This is evident in Eq. (2.11) and Eq. (2.12), for example, where each particle and single-particle Hilbert space was given an index corresponding to their ordering. Now, let us consider what happens if we assume the particles are indistinguishable. More specifically, we assume that the expectation value of any observable Â is invariant under any permutation P̂ ∈ S_N of the N particle indices,

\frac{\langle\Psi|\hat{A}|\Psi\rangle}{\langle\Psi|\Psi\rangle} = \frac{\langle\hat{P}\Psi|\hat{A}|\hat{P}\Psi\rangle}{\langle\hat{P}\Psi|\hat{P}\Psi\rangle} = \frac{\langle\Psi|\hat{P}^{-1}\hat{A}\hat{P}|\Psi\rangle}{\langle\Psi|\Psi\rangle} ,    (2.25)

where we have used that P̂† = P̂⁻¹ in the last equality. The above equation implies that [Â, P̂] = 0. Since any permutation in the symmetric group S_N can be written as a product of transpositions, it is sufficient to investigate the action of the transposition operator P̂_ij, also known as the particle exchange operator, which trades the labeling of the ith and jth particle,

\hat{P}_{ij} |X\rangle = \hat{P}_{ij} |x_1, \ldots, x_i, \ldots, x_j, \ldots, x_N\rangle = |x_1, \ldots, x_j, \ldots, x_i, \ldots, x_N\rangle .    (2.26)

Applying the same particle exchange operator twice restores the original ordering,

\hat{P}_{ij}^2 |X\rangle = \hat{P}_{ij} |x_1, \ldots, x_j, \ldots, x_i, \ldots, x_N\rangle = |x_1, \ldots, x_i, \ldots, x_j, \ldots, x_N\rangle = |X\rangle ,    (2.27)

implying the eigenvalues of P̂_ij are ±1. Thus, the possible effects of exchanging the particle ordering,

\hat{P}_{ij} |\Psi_S\rangle = |\Psi_S\rangle , \quad \text{for bosons,}    (2.28)
\hat{P}_{ij} |\Psi_A\rangle = -|\Psi_A\rangle , \quad \text{for fermions,}    (2.29)

give rise to two fundamentally different classes of identical particles: bosons, which have purely symmetric states and obey Bose-Einstein statistics, and fermions, which have purely antisymmetric states and obey Fermi-Dirac statistics. The subscripts S and A stand for symmetric and antisymmetric, respectively. By the spin-statistics theorem, the difference between the two categories is attributed to an intrinsic property called spin. Bosons, such as photons and gluons, have integer values of spin, while fermions, such as electrons, protons, and neutrons, have half-integer values. We will only consider spin-0 and spin-1/2 particles in this work, as shown already in Eq. (2.10). In Chapter 9, we will discover that the line distinguishing bosons and fermions is sometimes less distinct than it initially appears.

It is straightforward to see how the action of the particle exchange operator can be generalized to an arbitrary permutation P̂ ∈ S_N. The sign, or parity, of the permutation is σ(P) = (−1)^p, where p is the number of transpositions required to decompose the permutation into a product of transpositions². Then the permutation acts as

\hat{P} |\Psi_S\rangle = |\Psi_S\rangle , \quad \text{for bosons,}    (2.30)
\hat{P} |\Psi_A\rangle = \sigma(P) |\Psi_A\rangle , \quad \text{for fermions,}    (2.31)

for the two cases.

²There exists an infinite number of ways to decompose a permutation into a product of transpositions. However, an odd permutation will invariably be decomposed into an odd number of transpositions, and an even permutation will be decomposed into an even number of transpositions.
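The sign σ(P) is easy to compute in practice. One common implementation, sketched below, uses the cycle decomposition: a cycle of length L factors into L − 1 transpositions, so the parities of the cycles multiply together. The function name is our own illustrative choice.

```python
def permutation_sign(perm):
    """Sign sigma(P) = (-1)**p of a permutation of (0, ..., N-1),
    computed from its cycle decomposition: a cycle of length L
    contributes L - 1 transpositions."""
    visited = [False] * len(perm)
    sign = 1
    for start in range(len(perm)):
        if visited[start]:
            continue
        # Trace out the cycle containing `start`.
        length, j = 0, start
        while not visited[j]:
            visited[j] = True
            j = perm[j]
            length += 1
        sign *= (-1) ** (length - 1)
    return sign

assert permutation_sign((1, 0, 2)) == -1     # one transposition: odd
assert permutation_sign((1, 0, 3, 2)) == +1  # two transpositions: even
```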
Our working basis states |X⟩ are not eigenstates themselves, but we can construct symmetric and antisymmetric states from them. In fact, we can define the symmetrization operator

\hat{S} = \frac{1}{N!} \sum_{\hat{P} \in S_N} \hat{P} ,    (2.32)

and the antisymmetrization operator

\hat{A} = \frac{1}{N!} \sum_{\hat{P} \in S_N} \sigma(P)\, \hat{P} ,    (2.33)

that will map any many-body state |Ψ⟩ ∈ H to the equivalent symmetric and antisymmetric states

\hat{S} |\Psi\rangle \equiv |\Psi_S\rangle , \qquad \hat{A} |\Psi\rangle \equiv |\Psi_A\rangle .    (2.34)

Both of these operators are idempotent,

\hat{S}^2 = \hat{S} , \qquad \hat{A}^2 = \hat{A} ,    (2.35)

meaning they project generic states in Hilbert space to symmetric and antisymmetric subspaces, respectively,

|\Psi_S\rangle \in \mathcal{H}_S , \qquad |\Psi_A\rangle \in \mathcal{H}_A .    (2.36)

Applying the symmetrization operator after already applying the antisymmetrization operator destroys any state, and vice versa,

\hat{S}\hat{A} = \hat{A}\hat{S} = 0 ,    (2.37)

implying the subspaces H_S and H_A are entirely distinct. Furthermore, any two states that are related by a permutation,

|\Psi'\rangle = \hat{P}' |\Psi\rangle , \qquad \hat{P}' \in S_N ,    (2.38)

will map to the same symmetric state

|\Psi'_S\rangle = \hat{S} |\Psi'\rangle = \hat{S} |\Psi\rangle = |\Psi_S\rangle ,    (2.39)

and to the same antisymmetric state except for a possible sign flip

|\Psi'_A\rangle = \hat{A} |\Psi'\rangle = \sigma(P')\, \hat{A} |\Psi\rangle = \sigma(P')\, |\Psi_A\rangle .    (2.40)

Therefore, the many-body Hilbert space H that we established in Eq. (2.12) for distinguishable particles is typically much larger than the direct sum of the subspaces for indistinguishable particles,

\mathcal{H}_S \oplus \mathcal{H}_A \subseteq \mathcal{H} .    (2.41)

This is spectacular news! The physically realizable quantum states of our system will either be in H_S or H_A, which is smaller than the entire Hilbert space H by a factor that scales as N!, the order of the symmetric group S_N. While this does not alleviate the dimensionality problem entirely because our Hilbert space is still infinite dimensional, it does help make the many-body problem more tractable. Enforcing other symmetries, such as parity and time-reversal, can similarly help reduce the effective size of our Hilbert space.

Now that we are able to construct both symmetric and antisymmetric states from a generic one, let us consider what happens if two (or more) indistinguishable particles occupy the same exact single-particle state. Then there exists a particle exchange operator P̂_ij for some i ≠ j that generates a symmetry of the many-body state,

\hat{P}_{ij} |\Psi_{ij}\rangle = |\Psi_{ij}\rangle .    (2.42)

Here, we have denoted the state as |Ψ_ij⟩ to specify that it is invariant under i ↔ j. No issues arise for the corresponding bosonic system according to Eq. (2.39), so bosons are fine with being on top of their identical neighbors. However, for the fermionic system, Eq. (2.40) implies a contradiction,

\hat{A} |\Psi_{ij}\rangle = -\hat{A} |\Psi_{ij}\rangle .    (2.43)

The above equation can only be possible if Â|Ψ_ij⟩ = 0, leading to the well-known Pauli exclusion principle. This fundamental principle asserts that no two identical fermions can occupy the same quantum state simultaneously. As a result, fermions exhibit distinct behavior from bosons, particularly at low temperatures.
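The antisymmetrizer of Eq. (2.33) can be applied by brute force to a product of single-particle orbitals, which makes the sign flip of Eq. (2.29) and the Pauli exclusion principle easy to verify numerically. The sketch below is pedagogical (the orbitals are arbitrary illustrative choices, and the N! sum is only viable for very small N):

```python
import math
import itertools
import numpy as np

def perm_sign(perm):
    # Parity via inversion counting: sigma(P) = (-1)**(# of inversions).
    n = len(perm)
    inversions = sum(perm[i] > perm[j]
                     for i in range(n) for j in range(i + 1, n))
    return -1 if inversions % 2 else 1

def antisymmetrize(orbitals, xs):
    """Evaluate (A Psi)(x_1, ..., x_N) for Psi = prod_i phi_i(x_i),
    following Eq. (2.33): a sum over all N! permutations with signs."""
    n = len(orbitals)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        term = perm_sign(perm)
        for i, p in enumerate(perm):
            term *= orbitals[i](xs[p])
        total += term
    return total / math.factorial(n)

# Three distinct one-dimensional orbitals (harmonic-oscillator-like).
orbitals = [lambda x: np.exp(-x**2),
            lambda x: x * np.exp(-x**2),
            lambda x: (2 * x**2 - 1) * np.exp(-x**2)]

psi = antisymmetrize(orbitals, [0.3, -0.7, 1.2])
# Exchanging two particles flips the sign, Eq. (2.29) ...
assert np.isclose(antisymmetrize(orbitals, [-0.7, 0.3, 1.2]), -psi)
# ... and two particles at the same point annihilate the state,
# a real-space statement of the Pauli exclusion principle.
assert np.isclose(antisymmetrize(orbitals, [0.3, 0.3, 1.2]), 0.0)
```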
2.3 The Schrödinger Equation

The behavior of a non-relativistic quantum system is governed by the Schrödinger equation

i\hbar \frac{\partial}{\partial t} |\Psi\rangle = \hat{H} |\Psi\rangle ,    (2.44)

where the Hamiltonian Ĥ is a Hermitian operator that represents the total energy of the system. We will exclusively work with a time-independent Hamiltonian that is diagonal in our working basis |X⟩, meaning

\hat{H} = \int dX\, \hat{H}(X)\, |X\rangle\langle X| ,    (2.45)

where Ĥ(X) ≡ ⟨X|Ĥ|X⟩ as defined in Eq. (2.24). Our Hamiltonian takes the form

\hat{H} = \hat{K} + \hat{V} ,    (2.46)

where the non-relativistic kinetic energy is defined as

\hat{K} = \frac{1}{2m} \sum_{i=1}^{N} \hat{p}_i^2 = -\frac{\hbar^2}{2m} \sum_{i=1}^{N} \nabla_i^2 ,    (2.47)

and the potential energy V̂ = V(X) may include external potentials, two- or three-body interactions, and spin- or isospin-dependence.

While the above Hamiltonian is not dependent on time, the state |Ψ⟩ still could be. We have suppressed this time-dependence thus far, but we will now make it explicit by writing

|\Psi(t)\rangle = \int dX\, \langle X|\Psi(t)\rangle\, |X\rangle ,    (2.48)

where the time-dependent wave function can be decomposed as

\langle X|\Psi(t)\rangle \equiv \Psi(X, t) = \psi(X)\, \phi(t) ,    (2.49)

following the separation of variables method of solving partial differential equations. Then the Schrödinger equation reads

i\hbar \frac{\partial}{\partial t} \Psi(X, t) = \hat{H}(X)\, \Psi(X, t) \;\Longrightarrow\; i\hbar\, \psi(X) \frac{\partial}{\partial t} \phi(t) = \phi(t)\, \hat{H}(X)\, \psi(X) .    (2.50)

Rearranging all of the t-dependent components on the left and the X-dependent components on the right,

\frac{i\hbar}{\phi(t)} \frac{\partial}{\partial t} \phi(t) = \frac{1}{\psi(X)} \hat{H}(X)\, \psi(X) = E ,    (2.51)

we find that each side of the equation must be equal to some constant E. The left-hand side can be solved immediately,

i\hbar \frac{\partial}{\partial t} \phi(t) = E\, \phi(t) \;\Longrightarrow\; \phi(t) = \phi(0)\, e^{-iEt/\hbar} .    (2.52)

The right-hand side, however, is an eigenvalue problem

\hat{H}(X)\, \psi(X) = E\, \psi(X) ,    (2.53)

also known as the time-independent Schrödinger equation. Eigenvalue problems are challenging because they are inherently nonlinear. What's more, the eigenvectors in our situation are actually continuous eigenfunctions rather than finite-dimensional vectors. However, once we identify the solutions of the time-independent Schrödinger equation above, we will automatically obtain the solutions of the full time-dependent Schrödinger equation as well.

Let us index the possible solutions with α, which can either label the discrete energy spectrum for a bound system or the continuous spectrum for an unbound system. Then the so-called stationary states are

\Psi_\alpha(X, t) = \psi_\alpha(X)\, e^{-iE_\alpha t/\hbar} ,    (2.54)

where we omit writing the constant φ(0) from Eq. (2.52) because it can be absorbed into the coefficients c_α of the general solution, a linear combination of stationary states

\Psi(X, t) = \sum_{\alpha} c_\alpha\, \Psi_\alpha(X, t) .    (2.55)

The stationary states are called "stationary" because their associated probability distributions do not depend on time,

|\Psi_\alpha(X, t)|^2 = \psi_\alpha^*(X)\, e^{iE_\alpha t/\hbar}\, \psi_\alpha(X)\, e^{-iE_\alpha t/\hbar} = |\psi_\alpha(X)|^2 .    (2.56)

Given that

|\Psi(X, t)|^2 = \sum_{\alpha} \sum_{\beta} c_\alpha^* c_\beta\, \psi_\alpha^*(X)\, \psi_\beta(X)\, e^{i(E_\alpha - E_\beta)t/\hbar} ,    (2.57)

the same cannot be said about the general solution.

2.4 The Variational Principle

While the solutions of the time-dependent Schrödinger equation are vital for understanding the dynamics of the quantum system, it is a necessary step to first solve the time-independent Schrödinger equation. However, the exact time-independent solutions and their corresponding energies are still difficult, if not impossible, to obtain. One valuable tool for finding approximate solutions is called the variational principle, which states that the expectation value of the Hamiltonian is an upper bound for the true ground state energy of the system E_0,

E[\Psi] \equiv \langle\hat{H}\rangle = \frac{\langle\Psi|\hat{H}|\Psi\rangle}{\langle\Psi|\Psi\rangle} \geq E_0 ,    (2.58)

for any state |Ψ⟩. It is simple to prove. First, we write the eigenstates of the Hamiltonian as |Ψ_α⟩ = |α⟩ and their corresponding energies as E_α. The eigenstates form a complete orthonormal basis with ⟨α|β⟩ = δ_αβ. If we assume |Ψ⟩ is normalized, ⟨Ψ|Ψ⟩ = 1, then the expectation value of the energy can be expanded as

E[\Psi] = \langle\Psi|\hat{H}|\Psi\rangle
        = \sum_{\alpha} \sum_{\beta} \langle\Psi|\alpha\rangle \langle\alpha|\hat{H}|\beta\rangle \langle\beta|\Psi\rangle
        = \sum_{\alpha} \sum_{\beta} \langle\Psi|\alpha\rangle \langle\beta|\Psi\rangle\, E_\alpha\, \delta_{\alpha\beta}
        = \sum_{\alpha} E_\alpha\, |\langle\alpha|\Psi\rangle|^2 .    (2.59)
The ground state energy E_0, by definition, is smaller than all the other energy eigenvalues {E_α}_{α≠0}, leading to the result

E[\Psi] = \sum_{\alpha} E_\alpha\, |\langle\alpha|\Psi\rangle|^2 \geq E_0 ,    (2.60)

where equality can only hold for the ground state |Ψ_0⟩.

Similar variational principles can be developed for the excited states. First, let us order our energy eigenstates such that E_0 ≤ E_1 ≤ E_2 ≤ ..., where E_1 is the energy of the first excited state |Ψ_1⟩, E_2 is the energy of the second excited state |Ψ_2⟩, and so on. Then the variational principle for the nth excited state is given by

E[\Psi] = \frac{\langle\Psi|\hat{H}|\Psi\rangle}{\langle\Psi|\Psi\rangle} \geq E_n ,    (2.61)

if our state |Ψ⟩ is orthogonal to all the lower-lying states,

\langle\Psi_m|\Psi\rangle = 0 \quad \text{for all } m < n .    (2.62)

This statement can be proved by the same method as Eq. (2.59). We have used the Latin indices m and n in place of the Greek indices α and β to emphasize that our eigenstates have been ordered according to their corresponding energies.

Another useful fact is that the variance of the energy is zero if and only if our state is an eigenstate of the Hamiltonian,

\sigma_E^2 = 0 \iff |\Psi\rangle = |\Psi_\alpha\rangle \text{ for some } \alpha .    (2.63)

The backward direction of the statement above is trivial to prove, so we will just prove the forward direction here. Expanding the variance of the energy in terms of the energy eigenstates, we obtain

\sigma_E^2 = \langle(\hat{H} - \langle\hat{H}\rangle)^2\rangle = \langle\hat{H}^2\rangle - \langle\hat{H}\rangle^2
           = \sum_{\alpha} E_\alpha^2\, |\langle\alpha|\Psi\rangle|^2 - \left( \sum_{\alpha} E_\alpha\, |\langle\alpha|\Psi\rangle|^2 \right) \left( \sum_{\beta} E_\beta\, |\langle\beta|\Psi\rangle|^2 \right)
           = \sum_{\alpha} E_\alpha\, |\langle\alpha|\Psi\rangle|^2 \left( E_\alpha - \sum_{\beta} E_\beta\, |\langle\beta|\Psi\rangle|^2 \right) .    (2.64)

There are only two ways that the above can equal zero for all α: either |⟨α|Ψ⟩|² = 0 or

E_\alpha = \sum_{\beta} E_\beta\, |\langle\beta|\Psi\rangle|^2 \;\Longrightarrow\; |\langle\alpha|\Psi\rangle|^2 = 1 .    (2.65)

Therefore, the state |Ψ⟩ must be an eigenstate of the Hamiltonian for some α. A nonzero variance of the energy indicates that the system is not in a well-defined state, but rather, a superposition of eigenstates.

We must address an important caveat regarding the variational principle. Specifically, the variational principle only holds true if the state adheres to the correct symmetries of the system. The many-body Hamiltonian often does not provide information about whether the system consists of bosons or fermions. For example, when describing a system of identical charged particles, the Coulomb interaction only depends on the charge and distances between the pairs. If one employs a state intended for bosons when computing the expectation value of the energy for a fermionic system, it can lead to energies much lower than the true ground state energy of the system. Such erroneous results are not violations of the variational principle; instead, they arise due to incorrect implementation.
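In a finite-dimensional setting, where the exact ground-state energy is available by direct diagonalization, the bound of Eq. (2.58) is simple to check numerically. A minimal sketch, with a random Hermitian matrix standing in for a generic Hamiltonian:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random Hermitian "Hamiltonian" on a 50-dimensional Hilbert space.
M = rng.normal(size=(50, 50)) + 1j * rng.normal(size=(50, 50))
H = (M + M.conj().T) / 2

E0 = np.linalg.eigvalsh(H)[0]  # exact ground-state energy

# The Rayleigh quotient E[Psi] of Eq. (2.58) never drops below E0,
# no matter how the (unnormalized) trial state is chosen.
for _ in range(1000):
    psi = rng.normal(size=50) + 1j * rng.normal(size=50)
    E = (psi.conj() @ H @ psi).real / (psi.conj() @ psi).real
    assert E >= E0 - 1e-9
```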
We begin by expanding the wave function as a linear combination of orthonormal many-body basis 23 states {|α⟩}, X |Ψ⟩ = cα |α⟩, (2.66) α where cα are the coefficients of the expansion and the states |α⟩ are usually taken to be all possible antisymmetrized products of single-particle basis states, also known as Slater determinants, for fermionic systems. By inserting this expansion and the completeness relation into the Schrödinger equation, we obtain Ĥ|Ψ⟩ = E|Ψ⟩, (2.67) X X Ĥcα |α⟩ = E cα |α⟩, (2.68) α α XX X ⟨β|Ĥ|α⟩cα |β⟩ = E cα |α⟩. (2.69) α β α Since the basis states are orthogonal, multiplying on the left by ⟨γ| yields X ⟨γ|Ĥ|α⟩cα = Ecγ , (2.70) α which can be written in a more convenient form by organizing the coefficients cα into a vector and ⟨γ|Ĥ|α⟩ into a matrix       ⟨1|Ĥ|1⟩ ⟨1|Ĥ|2⟩ ··· ⟨1|Ĥ|α⟩ · · ·  c1   c1             ⟨2|Ĥ|1⟩ ⟨2|Ĥ|2⟩ · · · ⟨2|Ĥ|α⟩ · · ·  c2   c2       .. .. .. .. .. = E  ..       . . . . (2.71)  .  .       · · ·          ⟨α|Ĥ|1⟩ ⟨α|Ĥ|2⟩ · · · ⟨α|Ĥ|α⟩ · · · cα  cα             .. .. .. .. ..  .  .. . .. . . . . . 24 Clearly, solving the eigenvalue problem above is only imaginable when the matrix is of finite size. However, even in such cases, the dimensionality of the problem often poses significant challenges and renders it computationally intractable. One alternative approach is to truncate the many-body basis to a finite, manageable subset {|α1 ⟩, |α2 ⟩, ..., |αn ⟩}, leading to the more commonly-employed method called Configuration Interaction. In truncating our many-body basis, it becomes important to consider which states we include in our subset. Typically, the most relevant states are constructed by identifying a suitable reference state, like the Hartree-Fock solution, and constructing excitations or configurations around it. 2.5.2 Hartree-Fock Theory Hartree-Fock theory is one of the simplest approximate methods for solving the many-body Schrödinger equation. It provides a mean-field description of the system, where each particle moves in an effective average field created by all other particles. This approach essentially decouples the two-body interaction, resulting in a tractable computational scheme, but it neglects correlation effects, which can be significant in systems with strong interactions. Let us assume that the Hamiltonian of a fermionic system contains up to two-body interactions Ĥ(X) = Ĥ0 (X) + V̂ (X), (2.72) XN Ĥ0 (X) = ĥ0 (xi ), (2.73) i=1 XN V̂ (X) = v̂(xij ), (2.74) i 0. Our goal is to accurately approximate the target distribution P (X) by the distribution of a finite number of samples T 1X P (X) ≈ δ(X − Xt ), (3.3) T t=1 which is only possible if P (X) is invariant under the action of the transition probability. In 35 other words, P (X) is the stationary distribution for the Markov process characterized by P (X ′ |X) if it satisfies Z P (X ′ ) = dXP (X ′ |X)P (X), (3.4) for all X ′ . In principle, the above stationary condition provides a way to evolve to P (X) starting from any initial distribution P (X0 ). The distribution at a later time t can be obtained by repeatedly applying the transition probability on P (X0 ), Z P (Xt ) = dXt−1 dXt−2 · · · dX0 P (Xt |Xt−1 )P (Xt−1 |Xt−2 ) · · · P (X1 |X0 )P (X0 ). (3.5) For large enough t, the Markov chain must converge to the stationary distribution lim P (Xt ) = P (X), (3.6) t→∞ regardless of the initial state X0 . The algorithm for generating the samples is as follows: 1. Initialize. 
The algorithm for generating the samples is as follows:

1. Initialize. Draw the first sample $X_0$ from any initial probability distribution $P(X_0)$.

2. Iterate. For each iteration $t = 1, \dots, T$, draw the next sample $X_t$ from $P(X_t)$ in Eq. (3.5).

While it is simple to state the algorithm, the second step is somewhat difficult to complete without additional restrictions. In order to treat the samples in a Markov chain as independent samples from the target distribution, the chain should be fully equilibrated and the autocorrelation between samples should be low. Let us define $\tau$ as the number of steps required to sufficiently reach the stationary distribution, also known as the equilibration or burn-in time. In addition, let us define $\gamma$ as the number of steps required to sufficiently reduce the autocorrelation between samples,
$$\langle X_t \cdot X_{t+\gamma} \rangle \approx 0. \qquad (3.7)$$
Then, our target distribution can be better approximated by
$$P(X) \approx \frac{1}{T} \sum_{t=1}^T \delta(X - X_{\tau + \gamma t}), \qquad (3.8)$$
for some large, but finite, value of $T$. This approach allows us to faithfully treat each effective sample $X_{\tau + \gamma t}$ as an independent sample from $P(X)$, at the cost of throwing away $\tau + (\gamma - 1)T$ samples out of the total number of samples $\tau + \gamma T$.

3.1.1.1 Metropolis-Hastings

The Metropolis-Hastings algorithm is a specific type of MCMC sampling algorithm that enforces the detailed balance condition, a stricter restriction than the stationary condition in Eq. (3.4). Namely, it assumes the Markov process is reversible,
$$P(X'|X) P(X) = P(X|X') P(X'), \qquad (3.9)$$
for every pair of states $X, X'$. By integrating both sides of the detailed balance condition,
$$\int dX\, P(X'|X) P(X) = \int dX\, P(X|X') P(X') = P(X') \int dX\, P(X|X') = P(X'), \qquad (3.10)$$
it automatically follows that $P(X)$ is the stationary distribution.

In addition, this approach involves decomposing the transition probability into two components,
$$P(X'|X) = Q(X'|X) A(X'|X), \qquad (3.11)$$
where the proposal probability density $Q(X'|X)$ suggests a candidate state $X'$ based on the current state $X$, and the acceptance probability $A(X'|X)$ determines if the proposed state should be accepted or rejected. Plugging this into the detailed balance equation yields
$$\frac{A(X'|X)}{A(X|X')} = \frac{Q(X|X') P(X')}{Q(X'|X) P(X)}. \qquad (3.12)$$
If $A(X|X') = 1$, then $A(X'|X) = \frac{Q(X|X') P(X')}{Q(X'|X) P(X)}$. Likewise, if $A(X'|X) = 1$, then $A(X|X') = \frac{Q(X'|X) P(X)}{Q(X|X') P(X')}$. Thus the choice of $A(X'|X)$ that maximizes the acceptance probability is
$$A(X'|X) \equiv \min\left(1, \frac{Q(X|X') P(X')}{Q(X'|X) P(X)}\right). \qquad (3.13)$$
In practice, the Metropolis-Hastings algorithm can be realized by separating the transition step into two parts:

1. Initialize. Draw the first sample $X_0$ from an initial probability distribution $P(X_0)$.

2. Iterate. For each iteration $t = 1, \dots, T$:

   (a) Propose. Draw a candidate state $X'$ based on the previous state $X_{t-1}$ according to the proposal probability density $Q(X'|X_{t-1})$.

   (b) Calculate. Calculate the acceptance probability of the transition, $A(X'|X_{t-1})$.

   (c) Accept-or-reject. Draw a uniform random number $r \sim U(0, 1)$ between 0 and 1. If $r < A(X'|X_{t-1})$, accept the transition by setting $X_t = X'$. Otherwise, reject the transition by setting $X_t = X_{t-1}$.

Since our states may involve discrete spin projections $S$ in addition to the continuous spatial coordinates $R$, we will further decompose all the probabilities as
$$P(X) = P(R) P(S), \qquad (3.14)$$
$$Q(X'|X) = Q(R'|R) Q(S'|S), \qquad (3.15)$$
$$A(X'|X) = A(R'|R) A(S'|S), \qquad (3.16)$$
and perform Metropolis steps in $R$ and $S$ separately.
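To make the propose/accept-or-reject loop concrete, here is a minimal Python sketch of the algorithm above for a symmetric random-walk proposal, in which case the $Q$ factors in Eq. (3.13) cancel. The function names, step size, and one-dimensional Gaussian target are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis(log_prob, x0, n_steps, step=0.5):
    """Random-walk Metropolis sketch; the symmetric proposal makes
    the acceptance probability reduce to min(1, P(X')/P(X))."""
    x, lp = x0, log_prob(x0)
    samples = [x0.copy()]
    for _ in range(n_steps):
        x_new = x + step * rng.normal(size=x.shape)   # (a) propose
        lp_new = log_prob(x_new)                      # (b) calculate
        if np.log(rng.uniform()) < lp_new - lp:       # (c) accept-or-reject
            x, lp = x_new, lp_new
        samples.append(x.copy())
    return np.array(samples)

# Illustrative target: a 1D standard normal, log P(x) = -x^2/2 + const
chain = metropolis(lambda x: -0.5 * np.sum(x**2), np.zeros(1), 20000)
print(chain[2000:].mean(), chain[2000:].var())  # ~0 and ~1 after burn-in
```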
For a localized system of particles, one can take the initial distribution of the spatial degrees of freedom to be a Gaussian,
$$P(R_0) \sim \mathcal{N}(0, \sigma_0^2 I), \qquad (3.17)$$
with a variance $\sigma_0^2$ chosen to minimize the equilibration time $\tau$. For an infinite system, one can take the initial distribution to be uniform,
$$P(R_0) \sim \mathcal{U}(-L/2, L/2), \qquad (3.18)$$
spanning the entire $d$-dimensional simulation box of side length $L$. The proposal probability density for the spatial coordinates is commonly chosen to be a Gaussian distribution centered on the current configuration,
$$Q(R'|R) \sim \mathcal{N}(R, \sigma^2 I), \qquad (3.19)$$
with a variance $\sigma^2$ that balances a short autocorrelation time $\gamma$ with a high acceptance probability $A(R'|R)$. Since the proposal probability is symmetric, $Q(R'|R) = Q(R|R')$, the acceptance probability for the move simplifies to
$$A(R'|R) = \min\left(1, \frac{P(R')}{P(R)}\right) = \min\left(1, \frac{|\Psi(R', S)|^2}{|\Psi(R, S)|^2}\right). \qquad (3.20)$$

For the spin degrees of freedom $S$, it is possible to take a similar strategy as above by initializing a random configuration of spins and proposing random spins to flip rather than coordinates to perturb. However, the Hamiltonians we will use always commute with the total spin projection operator on the $z$-axis,
$$\hat{S}^z = \sum_{i=1}^N \hat{s}^z_i, \qquad (3.21)$$
meaning the total spin projection $S^z = \sum_{i=1}^N s^z_i$ is conserved. Similarly, the total isospin projection on the $z$-axis, $T^z$, is also preserved for systems of nucleons, as the total isospin projection operator
$$\hat{T}^z = \sum_{i=1}^N \hat{t}^z_i \qquad (3.22)$$
commutes with the nuclear Hamiltonian. To avoid sampling unphysical spin configurations, we can restrict our Metropolis walk to preserve the total spin (and isospin) projection as follows:

1. Initialize. Generate any spin configuration $S_0$ with the desired total spin projection $S^z$ (and isospin projection $T^z$). Spin-up particles are assigned a spin value of $+1$, while spin-down particles are assigned $-1$.

2. Iterate. For each iteration $t = 1, 2, \dots, T$:

   (a) Propose. Choose a random pair of particles $(i, j)$ to exchange spin degrees of freedom. The proposed spin configuration $S'$ is the same as $S_{t-1}$, except with the spins of particles $i$ and $j$ exchanged,
$$s'_i = s_{j,t-1}, \qquad s'_j = s_{i,t-1}. \qquad (3.23)$$
A different pair can be chosen to exchange the isospin degrees of freedom, if desired.

   (b) Compute. Compute the acceptance probability of the proposed configuration,
$$A(S'|S_{t-1}) = \min\left(1, \frac{P(S')}{P(S_{t-1})}\right) = \min\left(1, \frac{|\Psi(R, S')|^2}{|\Psi(R, S_{t-1})|^2}\right). \qquad (3.24)$$

   (c) Accept-or-reject. Draw a uniform random number $r \sim U(0, 1)$ between 0 and 1. If $r < A(S'|S_{t-1})$, accept the spin exchange by setting $S_t = S'$. Otherwise, reject the exchange by setting $S_t = S_{t-1}$.
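The proposal step (a) is straightforward to implement; the short sketch below (with illustrative names) shows a pair-exchange move in the spirit of Eq. (3.23) and verifies that it conserves $S^z$ by construction. A complete walker would follow each proposal with the accept-or-reject step of Eq. (3.24).

```python
import numpy as np

rng = np.random.default_rng(1)

def propose_spin_exchange(spins):
    """Exchange the spins of a random pair (i, j), cf. Eq. (3.23);
    the total projection S^z is conserved by construction."""
    proposal = spins.copy()
    i, j = rng.choice(len(spins), size=2, replace=False)
    proposal[i], proposal[j] = spins[j], spins[i]
    return proposal

# N = 4 particles with total S^z = 0: two up (+1) and two down (-1)
spins = np.array([+1, +1, -1, -1])
for _ in range(5):
    proposal = propose_spin_exchange(spins)
    assert proposal.sum() == spins.sum()  # S^z unchanged
    # a full sampler would now accept or reject with Eq. (3.24),
    # using the ratio |Psi(R, S')|^2 / |Psi(R, S)|^2
```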
3.1.1.2 Importance Sampling

Importance sampling, a variant of the Metropolis-Hastings algorithm, draws inspiration from the Fokker-Planck equation, a generalization of the diffusion equation. The basic idea of importance sampling is to guide the proposal probability for the spatial degrees of freedom, $Q(R'|R)$, towards regions of higher probability $P(R)$. For a general, time-dependent probability distribution function, the Fokker-Planck equation of a diffusion process reads
$$\frac{\partial}{\partial t} P(R, t) = D \nabla \cdot \left(\nabla - F(R)\right) P(R, t), \qquad (3.25)$$
where $\nabla = (\nabla_1, \nabla_2, \dots, \nabla_N)$ is the combination of all gradient operators for the $N$ particles, $D = \frac{\hbar^2}{2m}$ is a constant diffusion coefficient, and $F(R)$ is the drift velocity due to an external potential. Here, we have also suppressed all dependence on the spins $S$, as they remain constant during these steps. Our goal is to find the drift velocity $F(R)$ such that our probability distribution converges to the stationary distribution determined by our wave function,
$$P(R, t) = P(R) = |\Psi(R)|^2 \implies \frac{\partial}{\partial t} P(R, t) = 0. \qquad (3.26)$$
Under this condition, the Fokker-Planck equation simplifies to
$$\nabla^2 P(R) = \nabla \cdot F(R) P(R) = F(R) \cdot \nabla P(R) + P(R) \nabla \cdot F(R). \qquad (3.27)$$
For the Laplacian to appear on the right-hand side, the drift velocity should have the form
$$F(R) = f(P(R)) \nabla P(R), \qquad (3.28)$$
where $f$ is a scalar function to be determined. Then, Eq. (3.27) can be written as
$$\nabla^2 P(R) = f(P(R)) \nabla P(R) \cdot \nabla P(R) + P(R) f(P(R)) \nabla^2 P(R) + P(R) \nabla f(P(R)) \cdot \nabla P(R). \qquad (3.29)$$
Matching the Laplacian terms, we find that
$$f(P(R)) = \frac{1}{P(R)}, \qquad (3.30)$$
which also eliminates the gradient terms. Therefore, the drift velocity must have the form
$$F(R) = \frac{1}{P(R)} \nabla P(R) = \frac{2}{\Psi(R)} \nabla \Psi(R), \qquad (3.31)$$
where we have assumed $\Psi(R)$ is real-valued in the last equality.

The Fokker-Planck equation can be seen as a deterministic description of the stochastic dynamics captured by the corresponding Langevin equation,
$$\frac{\partial}{\partial t} R(t) = D F(R) + \xi, \qquad (3.32)$$
a stochastic differential equation that describes the dynamics of a particle under the influence of both deterministic forces and random noise. In the above, $\xi \in \mathbb{R}^{Nd}$ is a random perturbation drawn from a Gaussian distribution $\mathcal{N}(0, 2D I)$ and $F(R)$ is the drift velocity evaluated at the initial configuration, defined in Eq. (3.31). Integrating the Langevin equation over a short time interval $\Delta t$, we obtain
$$R(t + \Delta t) = R(t) + D \Delta t\, F(R) + \sqrt{\Delta t}\, \xi. \qquad (3.33)$$
This proposal rule implies that the normalized proposal density is given by
$$Q(R'|R, \Delta t) = \mathcal{N}(R + D \Delta t\, F(R),\, 2D \Delta t\, I) = \frac{1}{(4\pi D \Delta t)^{Nd/2}}\, e^{-\left(R' - R - D \Delta t F(R)\right)^2 / 4D\Delta t}, \qquad (3.34)$$
which is no longer symmetric due to the drift term. Then the acceptance probability becomes
$$A(R'|R, \Delta t) = \min\left(1, \frac{P(R')\, Q(R|R', \Delta t)}{P(R)\, Q(R'|R, \Delta t)}\right) = \min\left(1, \frac{|\Psi(R', S)|^2}{|\Psi(R, S)|^2}\, \frac{e^{-\left(R - R' - D\Delta t F(R')\right)^2 / 4D\Delta t}}{e^{-\left(R' - R - D\Delta t F(R)\right)^2 / 4D\Delta t}}\right). \qquad (3.35)$$
The additional weight appearing next to the original acceptance probability can be simplified by defining the velocity $v(R) \equiv F(R)/2$,
$$\frac{e^{-\left(R - R' - D\Delta t F(R')\right)^2 / 4D\Delta t}}{e^{-\left(R' - R - D\Delta t F(R)\right)^2 / 4D\Delta t}} = e^{\left(v(R) + v(R')\right) \cdot \left[(R - R') + D\Delta t \left(v(R) - v(R')\right)\right]}. \qquad (3.36)$$
This weight corrects for the sampling bias introduced by our proposal distribution exploring more relevant regions of the probability distribution. An equivalent derivation can be achieved by integrating the Fokker-Planck equation itself for a small time step $\Delta t$ and using the Green's function method to solve for the time-dependent probability distribution. The resulting Green's function coincides with the proposal distribution in Eq. (3.34).
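Putting Eqs. (3.31), (3.33), and (3.35) together, a single importance-sampled move can be sketched as follows. This is a minimal illustration, not a production implementation: the helper names are hypothetical, the drift is evaluated by finite differences rather than analytically, and units are chosen so that $\hbar = m = 1$, giving $D = 1/2$; the one-particle Gaussian trial state is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
D, dt = 0.5, 0.05  # hbar = m = 1 gives D = 1/2; dt is the time step

def drift(x, psi, eps=1e-5):
    """F(R) = 2 grad(Psi)/Psi, Eq. (3.31), via central differences."""
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (psi(xp) - psi(xm)) / (2 * eps * psi(x))
    return 2 * g

def importance_step(x, psi):
    """One drift-diffusion proposal, Eq. (3.33), accepted with the
    asymmetric probability of Eq. (3.35)."""
    F = drift(x, psi)
    x_new = x + D * dt * F + np.sqrt(2 * D * dt) * rng.normal(size=x.shape)
    F_new = drift(x_new, psi)
    # log of the Green's-function ratio Q(R|R') / Q(R'|R)
    log_q = (np.sum((x_new - x - D * dt * F) ** 2)
             - np.sum((x - x_new - D * dt * F_new) ** 2)) / (4 * D * dt)
    log_a = 2 * np.log(np.abs(psi(x_new) / psi(x))) + log_q
    return x_new if np.log(rng.uniform()) < log_a else x

# Illustrative trial state: one particle in 1D, Psi(x) = exp(-x^2/2)
psi = lambda x: np.exp(-0.5 * np.sum(x**2))
x, samples = np.zeros(1), []
for t in range(20000):
    x = importance_step(x, psi)
    samples.append(x[0])
print(np.mean(samples), np.var(samples))  # ~0 and ~0.5 for |Psi|^2
```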
3.1.2 Integration

Monte Carlo integration is a numerical technique that uses random sampling to efficiently estimate the value of a high-dimensional integral. Recall that we can expand the expectation value of an operator $\hat{A}$ in our $|X\rangle$ basis as
$$\langle\hat{A}\rangle = \frac{\langle\Psi|\hat{A}|\Psi\rangle}{\langle\Psi|\Psi\rangle} = \frac{\int dX\, \langle\Psi|X\rangle\langle X|\hat{A}|\Psi\rangle}{\int dX\, \langle\Psi|X\rangle\langle X|\Psi\rangle}. \qquad (3.37)$$
Inserting $1 = \langle X|\Psi\rangle / \langle X|\Psi\rangle$ into the numerator, we have
$$\langle\hat{A}\rangle = \frac{\int dX\, |\Psi(X)|^2 A(X)}{\int dX\, |\Psi(X)|^2}, \qquad (3.38)$$
where $A(X)$ is the local quantity corresponding to the operator $\hat{A}$,
$$A(X) \equiv \frac{\langle X|\hat{A}|\Psi\rangle}{\langle X|\Psi\rangle}. \qquad (3.39)$$
Notice that
$$P(X) = \frac{|\Psi(X)|^2}{\int dX\, |\Psi(X)|^2} \qquad (3.40)$$
is none other than the normalized probability distribution given by our wave function from Eq. (3.1). Then the expectation value becomes
$$\langle\hat{A}\rangle = \int dX\, P(X) A(X), \qquad (3.41)$$
where the integral over $X$ can be appropriately decomposed into a sum over $S$ and an integral over $R$ whenever necessary. Any integral of this form can be approximated as a simple average over local quantities,
$$\langle\hat{A}\rangle \approx \frac{1}{T} \sum_{t=1}^T A(X_t) \equiv \langle A(X)\rangle, \qquad (3.42)$$
where the average is taken over $T$ samples from the probability distribution, $X_t \sim P(X)$. By the law of large numbers, the average of the local quantities $\langle A(X)\rangle$ will converge to the expectation value of the operator $\langle\hat{A}\rangle$ as $T$ increases. Notice the notation we have chosen for the expectation value of the operator versus the average over the local quantities, as it differs from other texts. We express both with angled brackets, but the explicit dependence on $X$ for the latter is meant to imply that $X$ is drawn from our probability distribution. We choose this notation $\langle A(X)\rangle$ over the more common $\langle A_L\rangle$ to leave room for other subscripts.

3.2 Variational Monte Carlo

Variational Monte Carlo (VMC) is a direct application of the variational principle from Sec. 2.4 and Monte Carlo integration from Sec. 3.1.2. The core idea is to take some parameterized trial wave function $\Psi_\theta(X)$ and minimize the corresponding expectation value of the energy with respect to the variational parameters $\theta$,
$$E[\Psi_\theta] \equiv \frac{\langle\Psi_\theta|\hat{H}|\Psi_\theta\rangle}{\langle\Psi_\theta|\Psi_\theta\rangle}, \qquad (3.43)$$
$$\min_\theta E[\Psi_\theta] \geq E_0. \qquad (3.44)$$
The variational principle guarantees that the variational energy $E[\Psi_\theta]$ establishes a strict upper bound on the true ground state energy of the system. To find the optimal parameters, we use gradient descent methods, a family of iterative optimization algorithms discussed in more detail in Sec. 3.2.2. For now, we just compute the gradient of the variational energy with respect to the parameters,
$$\begin{aligned}
\nabla_\theta E[\Psi_\theta] &= \frac{1}{\langle\Psi_\theta|\Psi_\theta\rangle}\left(\langle\nabla_\theta\Psi_\theta|\hat{H}|\Psi_\theta\rangle + \langle\Psi_\theta|\hat{H}|\nabla_\theta\Psi_\theta\rangle\right) - \frac{\langle\Psi_\theta|\hat{H}|\Psi_\theta\rangle}{\langle\Psi_\theta|\Psi_\theta\rangle^2}\left(\langle\nabla_\theta\Psi_\theta|\Psi_\theta\rangle + \langle\Psi_\theta|\nabla_\theta\Psi_\theta\rangle\right) \\
&= 2\left(\frac{\langle\Psi_\theta|\hat{H}|\nabla_\theta\Psi_\theta\rangle}{\langle\Psi_\theta|\Psi_\theta\rangle} - E[\Psi_\theta]\, \frac{\langle\Psi_\theta|\nabla_\theta\Psi_\theta\rangle}{\langle\Psi_\theta|\Psi_\theta\rangle}\right), \qquad (3.45)
\end{aligned}$$
where we have assumed our trial wave function is real-valued in the last equality. To approximate all the high-dimensional integrals above, we employ Monte Carlo integration. For the variational energy, we simply write
$$E[\Psi_\theta] \approx \frac{1}{T}\sum_{t=1}^T E_\theta(X_t) \equiv \langle E_\theta(X)\rangle, \qquad (3.46)$$
where the configurations $X_t$ are sampled from $|\Psi_\theta(X)|^2$, $T$ is the number of samples, and the local energy is defined as
$$E_\theta(X) \equiv \frac{\langle X|\hat{H}|\Psi_\theta\rangle}{\langle X|\Psi_\theta\rangle} = \frac{\hat{H}\Psi_\theta(X)}{\Psi_\theta(X)}. \qquad (3.47)$$
For the gradient in Eq. (3.45), we first define the local gradient of the wave function as
$$O_\theta(X) \equiv \frac{\langle X|\nabla_\theta\Psi_\theta\rangle}{\langle X|\Psi_\theta\rangle} = \frac{\nabla_\theta\Psi_\theta(X)}{\Psi_\theta(X)} = \nabla_\theta \log\Psi_\theta(X), \qquad (3.48)$$
which allows us to write
$$\frac{\langle\Psi_\theta|\nabla_\theta\Psi_\theta\rangle}{\langle\Psi_\theta|\Psi_\theta\rangle} = \frac{\int dX\, \langle\Psi_\theta|X\rangle\langle X|\nabla_\theta\Psi_\theta\rangle}{\int dX\, \langle\Psi_\theta|X\rangle\langle X|\Psi_\theta\rangle} = \frac{\int dX\, |\Psi_\theta(X)|^2\, O_\theta(X)}{\int dX\, |\Psi_\theta(X)|^2} \approx \frac{1}{T}\sum_{t=1}^T O_\theta(X_t) \equiv \langle O_\theta(X)\rangle, \qquad (3.49)$$
$$\frac{\langle\Psi_\theta|\hat{H}|\nabla_\theta\Psi_\theta\rangle}{\langle\Psi_\theta|\Psi_\theta\rangle} = \frac{\int dX\, \langle\Psi_\theta|\hat{H}|X\rangle\langle X|\nabla_\theta\Psi_\theta\rangle}{\int dX\, \langle\Psi_\theta|X\rangle\langle X|\Psi_\theta\rangle} = \frac{\int dX\, |\Psi_\theta(X)|^2\, E_\theta(X) O_\theta(X)}{\int dX\, |\Psi_\theta(X)|^2} \approx \frac{1}{T}\sum_{t=1}^T E_\theta(X_t) O_\theta(X_t) \equiv \langle E_\theta(X) O_\theta(X)\rangle, \qquad (3.50)$$
where we have once again assumed the wave function is real. Therefore, the gradient can be approximated as
$$\nabla_\theta E[\Psi_\theta] \approx \nabla_\theta \langle E_\theta(X)\rangle = 2\left(\langle E_\theta(X) O_\theta(X)\rangle - \langle E_\theta(X)\rangle\langle O_\theta(X)\rangle\right). \qquad (3.51)$$

Figure 3.1: The cyclic workflow of the variational Monte Carlo algorithm. Starting with a random set of variational parameters, the cycle is repeated until the variational energy converges.

The variational Monte Carlo algorithm is a simple cyclic procedure, illustrated in Figure 3.1: we sample configurations from the square of the wave function, estimate the variational energy and its gradient, and update the parameters. There are many ways to complete the last step of this cycle, which we will cover in Sec. 3.2.2. More information about the implementation will be discussed in Chapter 5.
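The entire cycle fits in a few lines for a toy problem. The sketch below (illustrative, not from our implementations) optimizes the one-parameter trial state $\psi_a(x) = e^{-a x^2}$ for the one-dimensional harmonic oscillator with $\hbar = m = \omega = 1$, whose exact ground state has $a = 1/2$ and $E_0 = 1/2$. Here the local energy of Eq. (3.47) works out analytically to $E_a(x) = a + (\tfrac{1}{2} - 2a^2)x^2$, and the gradient estimator is exactly Eq. (3.51); because the ansatz is Gaussian, $|\psi_a|^2$ can be sampled directly instead of by Metropolis-Hastings.

```python
import numpy as np

rng = np.random.default_rng(3)

def local_energy(x, a):
    # E_theta(X) = (H psi)/psi, Eq. (3.47), for psi_a(x) = exp(-a x^2)
    return a + (0.5 - 2 * a**2) * x**2

a, lr, T = 1.2, 0.1, 50000
for step in range(200):
    # |psi_a|^2 = exp(-2 a x^2) is a Gaussian, so sample it directly;
    # in general these samples come from a Metropolis-Hastings walk
    x = rng.normal(scale=np.sqrt(1.0 / (4 * a)), size=T)
    E = local_energy(x, a)
    O = -x**2  # O_theta = d log(psi)/da, Eq. (3.48)
    grad = 2 * (np.mean(E * O) - np.mean(E) * np.mean(O))  # Eq. (3.51)
    a -= lr * grad  # plain gradient-descent update
print(a, np.mean(E))  # both approach 1/2: the exact a and E0
```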
3.2.1 Trial Wave Functions

The choice of trial wave function, or ansatz, is the most crucial ingredient of a variational Monte Carlo calculation. If chosen incorrectly, the upper bound on the ground state energy provided by the variational principle is effectively meaningless. For this reason, traditional VMC calculations rely on carefully curating a trial wave function to capture the essential features and correlations of the system under study. This involves considering the symmetries, boundary conditions, and known properties of the system, as well as incorporating relevant physical insights and intuition. Consequently, these trial wave functions are intricately tailored to their specific Hamiltonians, making them ill-suited for application to similar, albeit distinct, Hamiltonians without substantial structural modifications.

In this section, we showcase a diverse range of conventional trial wave functions and analyze their suitability for different types of systems. Our objective is to identify the limitations of these methods while appreciating their inherent strengths. Through this exploration, we aim to develop a deeper understanding of the advantages offered by neural networks and their potential to overcome the limitations commonly associated with traditional strategies.

3.2.1.1 Kato's Cusp Condition

In the presence of singularities in the potential $V(X)$, Kato's cusp condition asserts that the local kinetic energy $K_\theta(X)$ must precisely counterbalance the diverging potential such that the total local energy remains finite as any two particles approach each other. More formally, the cusp condition states
$$\lim_{r_{ij}\to 0} E_\theta(X) = \lim_{r_{ij}\to 0}\left(K_\theta(X) + V(X)\right) < \infty, \qquad (3.52)$$
for all pairs $i, j$, where $r_{ij} = |r_i - r_j|$ is the Euclidean distance between particles $i$ and $j$, and the local kinetic energy is given by
$$K_\theta(X) = -\sum_{i=1}^N \frac{\hbar^2}{2m}\, \frac{\nabla_i^2 \Psi_\theta(X)}{\Psi_\theta(X)}. \qquad (3.53)$$
For strongly-correlated systems, it is especially important to design a trial wave function that upholds the cusp condition throughout training. Even a finite, but hard, interaction potential may warrant a cusp condition to ensure stability throughout the optimization of $\Psi_\theta(X)$. A "hard" potential is one that exhibits abrupt changes in magnitude as the distance between particles varies; such potentials typically have a large scattering length while being short-range in nature.

3.2.1.2 Jastrow Factor

Enforcement of Kato's cusp condition usually comes in the form of a Jastrow factor, also known as a Jastrow wave function. For a system of identical particles, it can be any permutation-invariant function of the particles, as this preserves the symmetry of the overall wave function. The goal of the Jastrow factor is to incorporate the effects of particle-particle interactions into the trial wave function. The form of the Jastrow factor depends on the specific system and the character of the interactions being considered.

For a system of bosons, the ground state wave function must be positive-definite and symmetric with respect to particle exchange. Since we can ignore spin, the generic trial wave function for bosons can be written as
$$\Psi_\theta(R) = \Phi_0(R)\, e^{J(R)}, \qquad (3.54)$$
where the first factor represents the ground state of the non-interacting problem and the second factor is the symmetric Jastrow factor containing the inter-particle correlations.
By writing
$$\Phi_0(R) = \prod_{i=1}^N \xi_0(r_i), \qquad (3.55)$$
the non-interacting problem becomes simple to solve, because it can be separated into $N$ identical and independent parts,
$$\begin{aligned}
\left(\hat{K} + V_1(R)\right)\Phi_0(R) &= E_0^{\text{nonint}}\, \Phi_0(R), \\
\left(-\frac{\hbar^2}{2m}\nabla_i^2 + v_1(r_i)\right)\xi_0(r_i) &= \varepsilon_0\, \xi_0(r_i), \qquad (3.56) \\
E_0^{\text{nonint}} &= \sum_{i=1}^N \varepsilon_0 = N\varepsilon_0,
\end{aligned}$$
which can often be solved analytically. Each of the $N$ bosons occupies the lowest single-particle state with energy $\varepsilon_0$, so the non-interacting energy is simply their sum.

Examples of exactly-solvable non-interacting systems include the free system,
$$\hat{H}^{\text{free}} = \hat{K}, \qquad (3.57)$$
where the single-particle ground state is the trivial plane-wave orbital
$$\psi_0^{\text{free}}(r_i) = 1, \qquad (3.58)$$
with zero energy, $\varepsilon_0^{\text{free}} = 0$, in which case the trial wave function is just the Jastrow factor on its own. Another common example is the non-interacting system of isotropic harmonic oscillators,
$$\hat{H}^{\text{ho}} = \hat{K} + \frac{1}{2} m\omega^2 \sum_{i=1}^N r_i^2, \qquad (3.59)$$
where $\omega$ is the oscillation frequency. The single-particle ground state wave function is a Gaussian,
$$\psi_0^{\text{ho}}(r_i) = e^{-\frac{m\omega}{2\hbar} r_i^2}, \qquad (3.60)$$
with the corresponding energy
$$\varepsilon_0^{\text{ho}} = \frac{d}{2}\hbar\omega. \qquad (3.61)$$
Constructing the Jastrow factor is often much more difficult than constructing the non-interacting ground state. Nonetheless, there are some exactly-solvable interactions that serve as great examples for demonstrating the general strategy. Consider the Calogero-Sutherland model of bosons in a one-dimensional harmonic oscillator trap, interacting with an inverse-squared potential,
$$\hat{H}^{\text{cs}} = \hat{H}^{\text{ho}} + V^{\text{cs}}(R), \qquad V^{\text{cs}}(R) = \frac{\hbar^2}{m} \sum_{i<j}^N \frac{\beta(\beta-1)}{r_{ij}^2}. \qquad (3.62)$$
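The Calogero-Sutherland model is a useful test precisely because its exact ground state has the product form of Eq. (3.54): the harmonic-oscillator Gaussian of Eq. (3.60) multiplied by the well-known pair Jastrow factor $\prod_{i<j} |x_i - x_j|^\beta$. As a numerical sanity check, the sketch below (in natural units $\hbar = m = \omega = 1$, with a finite-difference Laplacian and illustrative parameters) evaluates the local energy of Eq. (3.47) for this state; it comes out constant, equal to the standard result $E_0 = \hbar\omega\left[N/2 + \beta N(N-1)/2\right]$, at every configuration.

```python
import numpy as np

# Natural units hbar = m = omega = 1; illustrative parameters
N, beta, h = 3, 2.0, 1e-3

def psi(x):
    """Trial state of the form (3.54): known Calogero-Sutherland
    Jastrow prod_{i<j} |x_i - x_j|^beta times the Gaussian (3.60)."""
    jastrow = np.prod([abs(x[i] - x[j]) ** beta
                       for i in range(N) for j in range(i + 1, N)])
    return jastrow * np.exp(-0.5 * np.sum(x**2))

def local_energy(x):
    lap = 0.0
    for i in range(N):  # finite-difference Laplacian for Eq. (3.53)
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        lap += (psi(xp) - 2 * psi(x) + psi(xm)) / h**2
    kinetic = -0.5 * lap / psi(x)
    vho = 0.5 * np.sum(x**2)
    vcs = sum(beta * (beta - 1) / (x[i] - x[j]) ** 2
              for i in range(N) for j in range(i + 1, N))
    return kinetic + vho + vcs

# Constant local energy signals an exact eigenstate, cf. Eq. (2.63):
# E0 = N/2 + beta N(N-1)/2 = 7.5 for N = 3, beta = 2
for x in (np.array([-1.2, 0.1, 1.4]), np.array([0.5, 2.0, -0.7])):
    print(local_energy(x))  # ~7.5 at every configuration
```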