MACHINE LEARNING AND COUPLED CLUSTER THEORY APPLIED TO INFINITE MATTER

By

Julie Lynn Butler

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Physics—Doctor of Philosophy

2023

ABSTRACT

Ab initio many-body methods, such as coupled cluster theory, are able to make accurate predictions for systems even if no experimental data for the system exists. This makes them an invaluable tool for studying exotic nuclei or systems, such as infinite matter, where experimental results are sparse. However, the performance and accuracy of a coupled cluster calculation suffer from truncations of the cluster operator and of the basis, both of which are needed to reduce the scale of the problem so that it is computationally feasible. Additionally, when coupled cluster theory is applied to infinite matter systems (the homogeneous electron gas and infinite nuclear matter), the number of particles in the system must also be truncated to a finite number, which introduces another source of error in the calculations. Finally, when modeling any nuclear system, including infinite nuclear matter, the choice of nuclear interaction can greatly affect both the results and the computational run time. Simple nuclear interactions, such as the Minnesota potential, have comparatively small run times but lack the accuracy of computationally more complex interactions derived from effective field theory. The goal of this thesis is to improve the accuracy of coupled cluster calculations applied to infinite matter systems: the homogeneous electron gas and two infinite nuclear matter systems (pure neutron matter and symmetric nuclear matter). Coupled cluster calculations at the doubles and triples levels will be compared to determine whether the increase in computational time is worth the increase in accuracy.
Additionally, two different nuclear interactions will be tested on calculations of pure neutron matter: a toy model called the Minnesota potential, which is computationally simple, and a much more complex set of optimized interactions derived from effective field theory, which increases the accuracy but also the computational run time. Finally, the main part of this thesis is devoted to the development of a simple machine learning algorithm that can accurately extrapolate coupled cluster calculations of infinite systems separately to the complete basis limit and the thermodynamic limit. This algorithm, known as sequential regression extrapolation, combines a Bayesian machine learning algorithm with a unique way of formatting the training data to create a powerful and accurate extrapolator that can be trained on very little data, does not require hyperparameter tuning, and can automatically produce uncertainties on its predictions. With this method, we are able to accurately predict the coupled cluster correlation energies of infinite matter systems in the complete basis and thermodynamic limits while saving months of computational time in the process.

Copyright by JULIE LYNN BUTLER 2023

For Adam, who has never stopped encouraging me to pursue my dreams

ACKNOWLEDGEMENTS

Graduate school has been an amazing journey and, luckily, a journey that I have not had to take alone. All the people listed below have contributed in some way and helped make this dissertation a reality. It has been a dream of mine since I was in high school to get a Ph.D.; there is no way I could have done it without these fantastic people. There is no other way to start this acknowledgments section than by extending a huge thank you to my advisor, Morten Hjorth-Jensen. Morten is the best teacher and mentor I could have asked for. He has influenced my research, teaching, and outlook on the world more than I could have imagined.
Morten, while I may no longer be your student, I will always look to you as an example of what a professor and a research advisor should be. I want to thank my committee members Scott Bogner, Johannes Pollanen, Danny Caballero, and Sean Liddick for their unending patience and assistance throughout this process. I also must thank Heiko Hergert for introducing me to many-body theory during my first summer at MSU and aiding me throughout this journey. I have also been lucky to share this journey as a graduate student with my group members, Yani Udiani, Jane Kim, Danny Jammooa, and Kang Yu. Being able to experience all of the aspects of being a nuclear theory graduate student with you has been a pleasure. I would also like to acknowledge my collaborators at Oak Ridge National Laboratory, Gustav Jansen and Justin Leitz, without whom the coupled cluster results presented in this thesis would not be possible. Not only did you make significant contributions to my research, but I also look up to you as scientists and mentors. Both of you made my time living in Oak Ridge much more manageable and enjoyable. Next, I want to acknowledge Kim Crosslan and Elizabeth Delinski for all of their assistance over the past five years. There is no way I could have navigated all of the different aspects of graduate school without your help. Graduate school would have been much harder without the encouragement and support of my family. I have to thank my parents, Jimmy and Linda, for their support throughout this process. My sister, Bre, and my brother-in-law, Graham, have been an unending source of encouragement when things got hard. I must also acknowledge the family I gained through marriage: Chris, Mary Jane, Aaron, Nana, Papa, Nannie, and Pawpaw. There is no way I could have done all I have over the past five years if you were not there to support me.
I want to thank the friends that have been with me since college (Courtney, Connor, Mikhayla, Will) and the friends I have made in graduate school (Eric, Nick, and Ana) for always encouraging me and believing in me even when I did not believe in myself. Graduate school would have been a much lonelier place without all of the company and good times you provide. I also have to acknowledge the contributions of my dog, Lord Kelvin, to my mental health and sanity throughout all of the virtual classes and work-from-home days I have endured as a graduate student. Finally, I must acknowledge the person this thesis is dedicated to: my husband, Adam. There is no way I can ever pay you back for all of the assistance you have provided and the sacrifices you have made to help me get this degree, but I dedicate this thesis to you as an acknowledgment of everything you have done.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1 INTRODUCTION
CHAPTER 2 MANY-BODY FRAMEWORK
  2.1 Introduction to Many-Body Theory
  2.2 Dirac Bracket Notation for Many-Body Systems
  2.3 Occupation Representation and Second Quantization
  2.4 Normal Ordering
  2.5 Diagrammatic Methods
  2.6 Conclusion
CHAPTER 3 MANY-BODY THEORIES
  3.1 Introduction
  3.2 Hartree-Fock Theory
  3.3 Many-Body Perturbation Theory
  3.4 Coupled Cluster Theory
  3.5 Many-Body Methods Conclusion
CHAPTER 4 INFINITE MATTER SYSTEMS
  4.1 Introduction to Infinite Matter Systems
  4.2 Single Particle Basis for Infinite Matter
  4.3 The Homogeneous Electron Gas
  4.4 Interlude: Nuclear Forces
  4.5 Infinite Nuclear Matter
  4.6 Truncation Errors in Infinite Matter Calculations
  4.7 Conclusion
CHAPTER 5 BAYESIAN MACHINE LEARNING
  5.1 Machine Learning
  5.2 Linear Regression
  5.3 Ridge Regression
  5.4 Kernel Ridge Regression
  5.5 Neural Networks
  5.6 Bayesian Ridge Regression
  5.7 Gaussian Processes
  5.8 Conclusion
CHAPTER 6 SEQUENTIAL REGRESSION EXTRAPOLATION (SRE)
  6.1 Introduction to Methodology
  6.2 Formulation
  6.3 CCD Extrapolations With Respect to Number of Single Particle States
  6.4 CCD Extrapolations With Respect to Number of Particles
  6.5 CCDT Extrapolations With Respect to Number of Single Particle States
  6.6 Conclusion
CHAPTER 7 THE HOMOGENEOUS ELECTRON GAS RESULTS
  7.1 The Thermodynamic Limit
CHAPTER 8 INFINITE NUCLEAR MATTER RESULTS
  8.1 Pure Neutron Matter
  8.2 Symmetric Nuclear Matter
  8.3 Symmetry Energy
  8.4 Final Remarks
CHAPTER 9 CONCLUSION
BIBLIOGRAPHY

LIST OF TABLES

Table 3.1 This table summarizes the approximations for e^T̂ that the various CCSDT-n methods use in the amplitude equations, along with the approximations used for CCSD and CCSDT.

Table 3.2 The predicted computational run times of various coupled cluster calculations as a function of M, the number of single particle states in the calculation.

LIST OF FIGURES

Figure 2.1 Diagrammatic representations for the Fermi vacuum state, the one-particle one-hole excited state, and the two-particle two-hole excited state.
Figure 2.2 Diagrammatic representations for the one-particle one-hole and two-particle two-hole excited states written as annihilation and creation operators applied to the Fermi vacuum state.

Figure 2.3 A generic representation of the one-body operator. The orientation of the lines does not matter, so the diagram can be drawn with both lines entirely vertical or with the lines slanted. Since these lines do not have arrows, they represent generic single-particle states that could be occupied or unoccupied in the Fermi vacuum state.

Figure 2.4 The specific diagrams for the one-body operator, with the top left diagram for two unoccupied states, the top right diagram for two occupied states, and the bottom two diagrams containing one occupied and one unoccupied state.

Figure 2.5 The general diagram for a two-body interaction.

Figure 2.6 The nine specific diagrams for a two-body interaction.

Figure 3.1 Diagrammatic representations for the T̂_1, T̂_2, and T̂_3 operators. Higher order T̂_m operators can be created using the pattern in the operators shown here.

Figure 3.2 The diagrammatic formulation of the CCD amplitude equation.

Figure 3.3 Antisymmetrized Goldstone diagrams representing the T̂_3 contributions to the CCSDT T̂_2 equations. D10b and D10c are relevant to the CCDT-1 approximation defined in the last section.

Figure 3.4 Antisymmetrized Goldstone diagrams representing the CCSDT T̂_3 equations relevant to calculating the CCDT-1 approximation. For a full list of all antisymmetrized Goldstone diagrams for the CCSDT T̂_3 equations, the reader is referred to Reference [80].
Figure 7.1 ΔE_CC per electron plotted against N, with calculations performed at four different values of M. As M increases, the error in the calculation decreases, but the run time drastically increases.

Figure 7.2 The total run time needed, in node seconds, to perform a CCD calculation on the HEG with r_s = 0.5 and at N = 66, 294, and 514. The CCD calculations are performed at many values of M, as shown on the x-axis. A node second is defined as the total run time of the program times the number of nodes used to run the program. Four nodes from Michigan State University's high-performance supercomputer were utilized in this case. These nodes have Intel Xeon processors with a clock speed of 2.4 GHz.

Figure 7.3 The total amount of RAM in gigabytes needed to perform CCD calculations on the HEG with r_s = 0.5 and with 66, 294, or 514 electrons. Calculations were performed at various numbers of single-particle states (x-axis).

Figure 7.4 A plot of ΔE_CC and ΔE_MBPT versus M for a HEG system with N = 342 and r_s = 0.5. The converged correlation energies calculated at M = 6,142 are shown with dashed lines.

Figure 7.5 Figure 7.4 is transformed such that ΔE_CC is plotted as a function of ΔE_MBPT (such that each point has the same M). The resulting plot has a linear relationship that becomes stronger to the left (as M increases).

Figure 7.6 A plot of the ratio of ΔE_CC to ΔE_MBPT as a function of M for N = 114, 342, and 514 and r_s = 0.5. As M increases, the ratio (or the slope of the line in Figure 7.5) becomes increasingly constant. However, it is important to note that the run time increases drastically as M increases.

Figure 7.7 The converged CCD correlation energies (black) calculated at M = 6,142 for an HEG with r_s = 0.5.
Also displayed are the results of predicting the converged correlation energies with a truncation at 20 open shells (yellow), which has a percent error of 6.87%; using a 1/M power law (blue), which has an average percent error of 7.00%; using the CCD to MBPT2 slope calculated at 20 open shells (green), which has an average percent error of 1.16%; and using linear regression to model the graph from Figure 7.5 (red), which has an average percent error of 0.95%.

Figure 7.8 This figure shows the extrapolations' dependence on the hyperparameter α: the results of performing an SRE analysis on the HEG with r_s = 0.5 and at various numbers of electrons. The machine learning algorithm used for the SRE algorithm is ridge regression.

Figure 7.9 The results of performing an SRE analysis on the HEG to predict the converged CCD correlation energies for 14 different numbers of electrons and five different values of r_s. The SRE predictions, made using a ridge regression algorithm with α = 10^-10, are shown with the triangular markers, and the solid lines represent the full correlation energy calculations at M = 6,142. The average percent error between the SRE predictions and the full calculations is 0.45%.

Figure 7.10 The results from performing an SRE extrapolation in terms of M for various numbers of electrons and values of r_s. Results are plotted against ΔE_CC,6142.

Figure 7.11 Here, we have recreated Figure 7.7 but have added the SRE results, which have an average percent error of 0.31%.

Figure 7.12 The results of performing an SRE analysis with Gaussian processes to predict the converged correlation energies for the HEG at different values of N and r_s.
The correlation energies calculated at M = 6,142 are plotted with a solid line and are taken to be the true results, the SRE predictions are plotted with triangular markers, and the shaded region represents the uncertainty on the SRE predictions.

Figure 7.13 The results of performing an SRE analysis with Gaussian processes to predict the converged correlation energies for the HEG at different values of N and r_s using only ten training points. The correlation energies calculated at M = 6,142 are plotted with a solid line and are taken to be the true results, the SRE predictions are plotted with triangular markers, and the shaded region represents the uncertainty on the SRE predictions.

Figure 7.14 The converged CCD correlation energies for the HEG calculated at M = 6,142 (solid) and the TDL predictions for the correlation energy per particle calculated through the SRE method (dashed).

Figure 7.15 The CCD correlation energy per electron for an HEG with r_s = 0.5 and calculated at M = 6,142 (black). The power law prediction for the TDL correlation energy per electron is shown in green, and the SRE predictions for the TDL correlation energy per electron extrapolated from calculated and predicted energies are shown in red and blue, respectively.

Figure 8.1 The CCD correlation energy per neutron as a function of the number of neutrons in the system for pure neutron matter with the Minnesota potential. All calculations are performed at 6,142 single particle states and with periodic boundary conditions. The full calculations, shown with the solid lines, were performed on MSU's ICER supercomputer. The SRE predictions, shown with the triangular data points, were predicted using only ten training data points from 5-14 open shells.
The shaded region shows the Gaussian process uncertainty in predicting the converged correlation energy per neutron. The overall RMSE between the fully calculated and the predicted CCD correlation energies per neutron is 0.037 MeV.

Figure 8.2 An SRE analysis of pure neutron matter with the Minnesota potential using a Gaussian process machine learning algorithm with α = δy_train^(1/4). This increases the accuracy of the predictions but also increases the uncertainty on the predictions. The results are shown with uncertainties (left) and without (right).

Figure 8.3 The run times, in node seconds, for coupled cluster doubles calculations of pure neutron matter as a function of the number of single-particle states in the calculation. Calculations performed at three different densities are shown.

Figure 8.4 The results from predicting the converged ΔE_CC/N for pure neutron matter using the coupled cluster doubles approximation. Figure (a) shows pure neutron matter for a range of densities around nuclear density (0.10 ≤ d ≤ 0.20), and figure (b) shows the results for low-density pure neutron matter (d ≤ 0.05). For both graphs, the solid black line plots the fully converged ΔE_CCD/N, calculated at 1,478 single-particle states; the triangular markers represent the SRE predictions at each density of interest. The dashed red lines represent the training point with the highest number of single-particle states used to train the GP algorithms (M = 186 for figure (a) and M = 502 for figure (b)), showing that the training set has not reached convergence.

Figure 8.5 A comparison of correlation energy calculations for pure neutron matter near nuclear density for 66 neutrons. The red data sets were calculated with the Minnesota potential, and the blue data sets with the chiral potentials.
The dashed lines represent the converged correlation energy per particle from MBPT2, the solid lines represent the converged CCD correlation energies per particle, and the circular markers represent the ML predictions for the converged CCD correlation energies per particle.

Figure 8.6 A comparison between the CCD and CCDT-1 correlation energies for a pure neutron matter system (N = 66 neutrons) with NNLO chiral potentials and for densities around nuclear density. The CCDT-1 approximation consistently gives a lower correlation energy than the CCD approximation at all densities in the graph.

Figure 8.7 The run times needed to complete CCD and CCDT-1 calculations for a PNM system, reported in node hours.

Figure 8.8 The results from performing coupled-cluster calculations with the CCDT-1 approximation on a pure neutron matter system with chiral NNLO potentials, 66 neutrons, and d = 0.16 fm^-3. The full calculations are shown with a solid line, the training data used for the SRE algorithm is shown with the triangular markers, and the SRE prediction for the converged correlation energy is shown with the dashed line. The percent error between the SRE prediction and the fully calculated converged correlation energy is 0.51%, and the time savings for performing the SRE method is 180.68 node hours.

Figure 8.9 The results from predicting the converged ΔE_CCDT-1/N for pure neutron matter at densities around nuclear density. The average percent error across all densities is 0.45%, and the time savings from generating only the training data versus the fully converged ΔE_CCDT-1/N is 917.30 node hours.

Figure 8.10 CCD (red) versus CCDT-1 (black) correlation energies for a pure neutron matter system with 66 neutrons and chiral potentials.
The converged correlation energies from full calculations are shown with solid lines, and the points represent the machine learning predictions. The machine learning predictions are shown with the uncertainties from the Gaussian process algorithm.

Figure 8.11 Correlation energies of PNM calculated with CCD (red), CCDT-1 (green), and CCD(T) (blue) for densities around nuclear density.

Figure 8.12 The run time (in node hours per GHz) to calculate the correlation energies for a PNM system at d = 0.16 fm^-3 calculated using CCD (red), CCDT-1 (green), and CCD(T) (blue) as a function of the number of single-particle states in the calculation.

Figure 8.13 The CCD(T) correlation energies for PNM for densities around nuclear density, calculated at M = 1,478 (solid line) and predicted with the SRE method (triangular markers). The uncertainties from the Gaussian process algorithm are shown with error bars on the markers.

Figure 8.14 The correlation energies for PNM calculated using CCD (red), CCDT-1 (green), and CCD(T) (blue). The fully converged calculations are shown with the solid lines, and the SRE predictions are shown with the triangular markers.

Figure 8.15 The CCD correlation energies for PNM (red) and SNM (blue) for densities around nuclear density and with a chiral NNLO potential.

Figure 8.16 The CCD correlation energies for SNM calculated fully at M = 2,956 (solid line) and predicted using the SRE method (triangular markers). The average percent error of this graph is 0.54%, and the time savings of using SRE over performing the full calculations is 6.48 node days.

Figure 8.17 The run times (in node hours per GHz) for CCD calculations of SNM (red) and CCD(T) calculations of SNM (blue).
The CCD calculations were performed on processors which ran at 2.4 GHz and used 24 MPI nodes, while the CCD(T) calculations were performed on processors that ran at 3.0 GHz and used 64 MPI nodes.

Figure 8.18 The CCD(T) correlation energies for SNM calculated at M = 2,060 (solid line) and predicted with SRE (triangular markers).

Figure 8.19 The correlation energies for SNM calculated with CCD (red) and CCD(T) (blue). The full calculations at M = 2,956 (CCD) and M = 2,060 (CCD(T)) are shown with solid lines, and the SRE predictions are shown with triangular markers.

Figure 8.20 The symmetry energy for infinite nuclear matter calculated using the CCD approximation (red) and the CCD(T) approximation (blue). The symmetry energy calculated from fully converged calculations is shown with the solid lines and the SRE predictions with the triangular points. Error bars represent the uncertainty on the symmetry energy from the SRE predictions from the Gaussian process algorithm.
LIST OF ABBREVIATIONS

A       Number of Particles
BRR     Bayesian Ridge Regression
CC      Coupled Cluster Theory
CCD     Coupled Cluster Doubles
CCDT    Coupled Cluster Doubles and Triples
CCD(T)  Coupled Cluster Doubles with perturbative triples
CCDT-1  Coupled Cluster Doubles and Triples approximation
CCS     Coupled Cluster Singles
CCSD    Coupled Cluster Singles and Doubles
CCSDT   Coupled Cluster Singles, Doubles, and Triples
GP      Gaussian Processes
HEG     Homogeneous Electron Gas
HF      Hartree-Fock Theory
ICER    Institute for Cyber-Enabled Research
INM     Infinite Nuclear Matter
KRR     Kernel Ridge Regression
LR      Linear Regression
M       Number of Single Particle States
MBPT    Many-Body Perturbation Theory
ML      Machine Learning
MSU     Michigan State University
N       Number of Particles
NN      Neural Network
PNM     Pure Neutron Matter
RNN     Recurrent Neural Network
RR      Ridge Regression
SNM     Symmetric Nuclear Matter
TDL     Thermodynamic Limit

CHAPTER 1 INTRODUCTION

Describing a system of many interacting particles is difficult in the context of classical mechanics. It becomes even more complicated when the quantum mechanical properties of the particles are included. The only quantum mechanical many-body problems we can solve exactly are simplified problems, and thus we must develop a framework to approximately solve the many-body problem for large systems of interest, such as atomic nuclei and infinite matter systems [1]. This is the basis of the field of many-body physics, which intersects with nuclear theory, condensed-matter theory, quantum chemistry, and other subfields of chemistry and physics. Specifically, we will look at ab initio, or first-principles, many-body methods, where the calculations start from a given Hamiltonian and attempt to solve the many-body problem without any uncontrolled approximations. These first-principles calculations are essential in nuclear physics and quantum chemistry.
It should be noted that while a proper ab initio calculation in nuclear physics would begin with the quark and gluon degrees of freedom, in this thesis we use ab initio to refer to a group of many-body methods which can be improved systematically given the Hamiltonian of the system and the relevant laws of motion. Though ab initio many-body calculations can be expensive, both in terms of computational time and computational resources, they play an important role in many-body physics for a few reasons. First, performing these calculations at a fine-grained and controllable level allows us to probe new and interesting particle interactions and challenge existing computational methods. Second, these first-principles methods allow theoretical many-body physicists to guide new experiments and interpret the results from these experiments. Furthermore, since these calculations start from first principles, ab initio many-body methods can make predictions on systems where no experimental data exists [2]. In nuclear physics, the results from performing an ab initio calculation on a light nucleus are expected to yield a binding energy with an error of less than a few percent when compared to experimental results. While this is good accuracy considering that these methods start only from the Hamiltonian of the system, the shell model with matrix elements fitted to experimental data would achieve a higher level of accuracy. However, ab initio methods still have a place in nuclear physics, as they allow us to test and expand our knowledge of the various nuclear interaction models. Additionally, they allow us to study nuclei in regions of the nuclear chart where very little data is available for phenomenological models [2]. There are many ab initio many-body methods available for nuclear studies, and they differ in their computational costs and the regions of the nuclear chart where they can be applied.
They also differ in their flexibility and in their ability to handle various Hamiltonians. The ab initio many-body method which will be the focus of this thesis is coupled cluster theory, originally named exp S, which originated in the nuclear physics community over sixty years ago. Coester and Kümmel proposed an exponential ansatz that could be used to describe the correlations within a nucleus. Since its inception in the late 1950s and early 1960s, coupled cluster theory has become an important many-body method, both in its home field of nuclear physics and in quantum chemistry. There are a few reasons why coupled cluster theory is such a popular many-body method. First, coupled cluster theory is fully microscopic and is both size extensive and size consistent [2, 3]. Second, coupled cluster theory can exactly produce the fully correlated many-body wavefunction if all components of its cluster operator are included. However, since this is usually not possible, coupled cluster theory provides a systematic truncation scheme that allows for hierarchical improvements. Third, though coupled cluster theory is not a variational method, its energy behaves in most cases as a variational quantity. Finally, the form of the equations in coupled cluster theory is very amenable to parallel implementation, including with MPI and graphical processing unit (GPU) computations [4]. In the 1960s and 1970s, coupled cluster theory was further developed and applied to the field of quantum chemistry, where it continues to flourish to this day [5–7]. Though this thesis is in nuclear physics, we will take inspiration from quantum chemistry and use some of the advancements made in that field. Reference [7] provides a good review of coupled cluster theory's importance to the field of quantum chemistry. There are several good reviews of coupled cluster theory's history in nuclear physics and developments at that time (see Refs. [2, 8, 9]).
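To make the phrase "exponential ansatz" concrete, its standard form is sketched below; the notation follows common coupled cluster conventions rather than a specific equation in this chapter:

```latex
|\Psi\rangle = e^{\hat{T}} |\Phi_0\rangle, \qquad
\hat{T} = \hat{T}_1 + \hat{T}_2 + \cdots + \hat{T}_A ,
```

where |Φ₀⟩ is a reference Slater determinant and T̂_m generates m-particle m-hole excitations. Truncating T̂ at T̂_2 or T̂_3 yields the doubles and triples approximations compared throughout this thesis.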
However, we will summarize the highlights and mention some recent coupled cluster results in nuclear physics that post-date these reviews. Though coupled cluster theory was initially developed in nuclear physics, it grew slowly in the field and experienced only modest changes and growth during the second half of the 20th century [9]. This was due to a combination of inaccurate models to describe the nuclear interaction and computational limitations, which made describing nuclear systems of interest challenging and inaccurate. Heisenberg and Mihaila were the first to use a high-precision nucleon-nucleon interaction and three-nucleon forces in a coupled cluster calculation of a nucleus, specifically oxygen-16 [10]. Coupled cluster theory experienced a revival in nuclear physics in the early 2000s (see, for example, Reference [4]) and has been successfully and accurately applied to medium mass nuclei and is now being extended to study high mass nuclei. See Ref. [2] for a review of coupled cluster calculations on medium mass nuclei among other systems and Ref. [11] for a recent result concerning coupled cluster studies of high mass nuclei. Though coupled cluster theory in nuclear physics does differ from coupled cluster theory in quantum chemistry, an essential advancement in the revival of coupled cluster theory in nuclear physics in the early 2000s occurred when Dean and Hjorth-Jensen took the standard approach to coupled cluster theory from quantum chemistry and translated it to nuclear physics [4]. Additionally, within nuclear physics, several conceptual and theoretical developments involving nuclear interactions increased coupled cluster accuracy when applied to nuclear problems. These advances include the development of soft interactions through the application of effective field theory (EFT) and the inclusion of three-nucleon interactions.
These changes, along with the development of more realistic nuclear interactions [12–19], made coupled cluster theory into a more feasible method for application to nuclear studies. Finally, increased computational technology has allowed more computational cycles for faster calculations and the option for coupled cluster calculations to be further accelerated by leveraging modern computational methods such as parallel computing and utilizing graphical processing units (GPUs) [2, 20]. Recent advances in coupled cluster theory have allowed calculations on nuclei as large as lead-208 [11]. One important system which can be analyzed with coupled cluster theory is a system of infinite matter, meaning that the system contains an infinite number of particles and spans an infinite volume in space. The most important infinite matter system in nuclear physics is infinite nuclear matter, where all of the particles are protons and neutrons in various fractions. In this thesis, we will investigate two infinite nuclear matter systems: pure neutron matter, where all the particles are neutrons, and symmetric nuclear matter, where the particles are evenly divided between protons and neutrons. Infinite nuclear matter has several essential applications within nuclear physics and astrophysics. Within nuclear physics, calculations of infinite nuclear matter are performed alongside calculations of finite nuclei to support the results (see, for example, [11, 17, 21]). Additionally, calculations of infinite nuclear matter are essential for determining the nuclear equation of state, which governs the behavior of both finite nuclei and matter in astrophysical objects. Finally, some astrophysical objects, such as neutron stars, are challenging to study using traditional observational astronomy. However, we can still infer their behaviors and properties by performing many-body studies on the matter they contain.
The matter that makes up objects such as neutron stars can be modeled by various types of infinite nuclear matter approaches [22–33]. There have been several coupled cluster studies of infinite nuclear matter previously [2, 3, 34–36]. Outside of nuclear systems, this thesis will also perform coupled cluster calculations of the homogeneous electron gas (HEG), an infinite matter system containing only electrons with a uniform positive background charge, so there is no net charge [37]. The electron gas is essential in many fields of chemistry, such as density functional theory, where it is used to benchmark calculations, and it is used as a model for the electrons in superconducting metals. The HEG has been extensively studied with coupled cluster theory for decades [38–43]. For our purposes, the homogeneous electron gas will be a test case for developing methods that can be applied to infinite nuclear matter. The homogeneous electron gas, made entirely of electrons, is governed by the Coulomb force, a much simpler force than the nuclear forces. Thus coupled cluster calculations of the homogeneous electron gas are less time-consuming than calculations of infinite nuclear matter, making it a great sandbox in which to build our methodologies. Modeling an infinite system in a finite computational space is clearly not possible without approximation, and this is a significant hurdle to performing accurate coupled cluster calculations on infinite matter systems. The number of single-particle states and the number of particles in the system must be truncated, introducing basis incompleteness and finite size errors, respectively. The coupled cluster calculation, specifically the coupled cluster correlation operator, must also be truncated due to computational time and resource limitations.
While traditional methods of extrapolation can be used to remove the basis incompleteness and finite size errors with some success [38, 42, 44–50], these methods are usually very specific to the exact system and many-body method being studied. It should also be noted that many of these studies focus on extrapolations to the thermodynamic limit and not the complete basis limit. Additionally, performing the calculations at large values of M (the number of single-particle states) and N (the number of particles) to mitigate these errors is computationally prohibitive, requiring a significant amount of time and resources. Machine learning offers a new and promising avenue for removing these errors. Machine learning is a field at the intersection of artificial intelligence and data science where the algorithms "learn" to perform a task by being given examples of the task they are to perform. Machine learning is becoming a common tool in many-body physics, and there have been many recent studies analyzing its various uses in the field [51–69]. A review is provided in Ref. [70]. The goal of this thesis is to use a class of machine learning algorithms based on Bayesian statistics and a novel method of formatting the data to create an algorithm called sequential regression extrapolation (SRE), which is capable of removing the basis incompleteness and finite size errors from coupled cluster calculations of infinite matter systems, using only complete calculations performed at small values of M and N. Additionally, we wish to develop the SRE method in such a way that it can be applied to remove truncation errors from any system and using any many-body method. Machine learning has been used to perform extrapolations in many-body calculations [51, 54, 71–75], and some of these studies do attempt to use machine learning to remove the basis incompleteness or the finite size errors, but the SRE method is unique among these studies.
We will use the SRE predictions and results calculated at the complete basis limit and the thermodynamic limit to determine the accuracy of the extrapolations. We will also determine the time saved by generating a small training data set at low M and N and using the SRE algorithm to extrapolate versus performing a coupled cluster calculation near the complete basis limit or the thermodynamic limit. The remainder of this thesis contains seven content chapters followed by a conclusions and future work chapter. Chapter 2 will develop the basics of many-body theory, including second quantization and diagrammatic methods. Chapter 3 develops Hartree-Fock theory, many-body perturbation theory, and coupled cluster theory as the many-body methods of interest in this thesis. Chapter 4 expands on the infinite matter systems studied in this work, including the three-dimensional homogeneous electron gas, pure neutron matter, and symmetric nuclear matter. Chapter 5 introduces machine learning, including neural networks, various regression algorithms, and Bayesian machine learning. Chapter 6 develops the machine learning-based extrapolation algorithm, sequential regression extrapolation (SRE), which will be used to remove the basis incompleteness and finite size errors from coupled cluster calculations of infinite matter. Finally, Chapters 7 and 8 are the results chapters, showing the success of the SRE algorithm in removing the basis incompleteness and finite size errors from infinite matter calculations, first for the electron gas (Chapter 7) and then for infinite nuclear matter (Chapter 8). CHAPTER 2 MANY-BODY FRAMEWORK 2.1 Introduction to Many-Body Theory Modeling a system containing many interacting particles can be a complex problem, even in classical mechanics. This problem becomes even more complicated when the quantum mechanical properties of the particles are taken into account, as this drastically increases the system's complexity.
Furthermore, when the particles are protons and neutrons, this problem becomes exceedingly tricky due to the need to include complex nuclear interactions. However, it is still possible to make meaningful studies of nuclear many-body systems using many-body methods [1]. We will start our discussion of many-particle systems by defining the many-body Schrödinger equation used to find the system's energies. The time-dependent Schrödinger equation, shown in Equation (2.1), can be used to find all of the possible wavefunctions of the many-body system. Each wavefunction describes the system in a different configuration, so a many-body system has many wavefunctions that need to be found. This can be represented as: \( i\hbar \frac{\partial \Phi(\vec{r}, t)}{\partial t} = \hat{H} \Phi(\vec{r}, t), \) (2.1) where \( \hbar \) is the reduced Planck constant, Φ is a wavefunction of the system, and \( \hat{H} \) is the many-body Hamiltonian, which is unique to the system being studied. In this thesis we will use the common notation where capital Greek letters (Φ and Ψ) represent many-body wavefunctions while lowercase Greek letters (𝜙 and 𝜓) represent single-particle wavefunctions. In many cases, such as in this thesis, we remove the time dependence from the many-body problem to reduce its complexity. This reduces the Schrödinger equation to its time-independent form: \( \hat{H} \Phi(\vec{r}) = E \Phi(\vec{r}), \) (2.2) where E is the system's energy if it is in the state represented by the wavefunction Φ. Removing the time dependence reduces the Schrödinger equation to an eigenvalue problem where the eigenvalues are the energies of the many-body system. The eigenvectors are the corresponding wavefunctions (or state vectors). Next, we need to define the Hamiltonian of our system. The Hamiltonian for a many-particle system can be represented generally in the form shown in Equation (2.3) [35]: \( \hat{H} = \hat{K} + \hat{V}_1 + \hat{V}_2 + \hat{V}_3 + \dots \)
(2.3) In the above equation, \( \hat{K} \) is the kinetic energy of the system, \( \hat{V}_1 \) is an external one-body potential, \( \hat{V}_2 \) represents the two-body interaction, and \( \hat{V}_3 \) represents the three-body interaction. Note that in nuclear physics there will not be an external one-body potential. For a system containing A particles, there are at most A-body interactions, though, in practice, the number of interactions (which corresponds to the number of matrix elements in the Hamiltonian) is usually limited. It is common to limit the many-body Hamiltonian to the two-body interaction level, and sometimes the three-body interaction level. It is uncommon to include interactions higher than the three-body level, especially for large systems or nuclear systems with complex interactions, since the complexity of the interactions increases drastically with the number of particles involved. Each sum also includes an increasing number of terms at higher interaction levels. For example, the kinetic energy only has A terms, but the two-body interaction sum already has \( \frac{1}{2}A(A-1) \) terms [1]. Once the Hamiltonian has been defined, the next step in the many-body problem is to obtain the eigenvalues and the eigenfunctions of \( \hat{H} \), which is a matrix. The eigenvalues of \( \hat{H} \) are the energies of the system, and the eigenvectors are the wavefunctions of the system and are thus the solutions to the time-independent Schrödinger equation shown in Equation (2.2). However, in practice, Equation (2.2) can only be solved exactly for small, uninteresting systems. Thus we must turn to many-body methods to approximately solve for the energies of a many-body system [1, 76, 77]. In the following two sections, we will develop the notation (Dirac notation) and a way to represent the many-body basis that is more convenient (occupation representation and second quantization). Then we will revisit the many-body Hamiltonian once these tools have been developed before closing the chapter.
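Since the time-independent Schrödinger equation is an eigenvalue problem, the solution procedure can be illustrated numerically. The sketch below diagonalizes a small, toy Hermitian matrix standing in for \( \hat{H} \); the matrix elements are invented for illustration only and do not correspond to any physical interaction.

```python
import numpy as np

# A toy Hermitian matrix standing in for the Hamiltonian in a small
# finite basis. The numerical values are illustrative only.
H = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

# Solving H|Phi> = E|Phi>: the eigenvalues are the energies and the
# eigenvectors (the columns of `states`) are the state vectors.
energies, states = np.linalg.eigh(H)

# eigh returns the energies in ascending order; verify the lowest
# eigenpair actually satisfies the eigenvalue equation.
assert np.allclose(H @ states[:, 0], energies[0] * states[:, 0])
print(energies)
```

For interesting many-body systems the Hamiltonian matrix is far too large for direct diagonalization like this, which is what motivates the approximate many-body methods developed in the next chapter.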
Finally, we will develop other formalisms, including normal ordering and diagrammatic methods. 2.2 Dirac Bracket Notation for Many-Body Systems Before developing the different many-body methods, we must develop a framework for representing specific quantities found in many-body theory. This framework is called Dirac bra-ket notation, and it dramatically simplifies many quantum mechanical expressions. It is commonly used in quantum mechanics and many-body physics because it provides a simple way to represent complex quantum mechanical operations as linear algebra problems. The base quantity of Dirac notation is a ket. Given a column vector, x, which contains n elements, we can denote the vector in ket notation as: \( |\mathbf{x}\rangle = \begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_{n-1} \end{pmatrix}, \) (2.4) where we will begin numbering the elements from 0, leading to the last element having an index of n−1. In general, \( x_i \in \mathbb{C} \). The corresponding bra vector to the ket is simply the Hermitian conjugate of |x⟩: \( \langle \mathbf{x}| = \begin{pmatrix} x_0^* & x_1^* & x_2^* & \dots & x_{n-1}^* \end{pmatrix}. \) (2.5) Note that the bra vector is a row vector instead of a column vector. In quantum mechanics, different kets represent the system's different states. These kets differ from the representations of quantum mechanical states in wave mechanics (where the state is represented as a function) and in matrix mechanics (where the states are represented as basis expansions). However, kets encode the same information as these other representations. In addition, kets allow invariance in the state, meaning that the choice of coordinates and basis can be chosen at any point and quickly changed [78]. For now, the bras and kets will only describe a state with one particle; we will develop a formalism to describe many-particle states as bras and kets later in this section.
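Because bras and kets are simply conjugated row and column vectors, the operations of Dirac notation defined in this section map directly onto standard linear algebra. The sketch below (using NumPy, with arbitrary example vectors of our own choosing) demonstrates the bra as the Hermitian conjugate of the ket, the inner and outer products, and the completeness relation discussed below.

```python
import numpy as np

# Kets as complex column vectors, indexed from 0. The values are arbitrary.
x = np.array([1.0 + 1.0j, 0.5 + 0.0j, 0.0 + 2.0j])
y = np.array([0.0 + 0.0j, 1.0 - 1.0j, 1.0 + 0.0j])

# The bra <x| is the Hermitian conjugate of the ket |x>.
bra_x = x.conj()

# Inner product <y|x>; np.vdot conjugates its first argument.
inner_yx = np.vdot(y, x)

# Conjugate symmetry of the inner product: <y|x> = <x|y>*.
assert np.isclose(inner_yx, np.conj(np.vdot(x, y)))

# Outer product |x><y| is an n x n matrix.
outer_xy = np.outer(x, y.conj())

# Completeness: summing |i><i| over an orthonormal basis gives the identity.
basis = np.eye(3)
resolution = sum(np.outer(basis[:, i], basis[:, i].conj()) for i in range(3))
assert np.allclose(resolution, np.eye(3))
```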
It is also important to note that since the ket |x⟩ is a quantum mechanical state, it belongs to an infinite dimensional Hilbert space, H (|x⟩ ∈ H). The corresponding bra, ⟨x|, also belongs to the same Hilbert space [78]. Hilbert spaces are closed under linear combinations, which gives rise to the superposition of quantum states [78]. Dirac notation also provides an easy way to represent states in a superposition, meaning that the system is simultaneously in two or more states, with a definite state determined only upon measurement. Given two states |x⟩ and |y⟩, we can say that state |z⟩ is in a superposition of |x⟩ and |y⟩ with the following: \( |\mathbf{z}\rangle = \alpha |\mathbf{x}\rangle + \beta |\mathbf{y}\rangle, \) (2.6) where α and β are complex numbers. The bra of the above state, ⟨z|, can be denoted as: \( \langle \mathbf{z}| = \alpha^* \langle \mathbf{x}| + \beta^* \langle \mathbf{y}|. \) (2.7) Note that in Equation (2.7), the coefficients of the bras are α* and β* (i.e., complex conjugates) instead of just α and β. The coefficients α and β have a physical interpretation. When the state |z⟩ is measured and collapses into either |x⟩ or |y⟩, there is an |α|² probability that the state will be found in |x⟩ and a |β|² chance that it will be found in |y⟩. Note that the notation |α|² represents the squared complex modulus of α. Since |α|² and |β|² represent the probabilities of the two possible outcomes of measuring state |z⟩, we must have |α|² + |β|² = 1. Inner Product For kets |x⟩ and |y⟩, the inner product between the two is defined as: \( \langle \mathbf{y}|\mathbf{x}\rangle = y_0^* x_0 + y_1^* x_1 + y_2^* x_2 + \dots + y_{n-1}^* x_{n-1}. \) (2.8) It is important to note that \( \langle \mathbf{y}|\mathbf{x}\rangle = \langle \mathbf{x}|\mathbf{y}\rangle^* \). Also note that ⟨y|x⟩ will sometimes be referred to as the overlap between the states x and y. Many quantum mechanics bases are orthogonal and normalized and are therefore called orthonormal. Two restrictions are placed on most bases in many-body theory. First, inner products are used to ensure that a basis is both orthogonal and normalized.
In terms of inner products, this means that for each state, i, in the basis, the inner product of the state with itself must be one for the basis to be normalized: \( \langle i|i\rangle = 1, \; \forall \, i. \) (2.9) The second restriction is that the basis is orthogonal. This means that the inner product of a state i with another state j must be zero: \( \langle i|j\rangle = 0, \; \forall \, i \neq j. \) (2.10) For a basis that is both normalized and orthogonal, the basis is said to be orthonormal, and the conditions can be represented in compact delta notation as: \( \langle i|j\rangle = \delta_{ij}, \; \forall \, i, j. \) (2.11) Outer Product The outer product between |x⟩ and |y⟩ can be computed using the following formula: \( |\mathbf{x}\rangle\langle \mathbf{y}| = \begin{pmatrix} x_0 y_0^* & x_0 y_1^* & x_0 y_2^* & \dots & x_0 y_{n-1}^* \\ x_1 y_0^* & x_1 y_1^* & x_1 y_2^* & \dots & x_1 y_{n-1}^* \\ x_2 y_0^* & x_2 y_1^* & x_2 y_2^* & \dots & x_2 y_{n-1}^* \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{n-1} y_0^* & x_{n-1} y_1^* & x_{n-1} y_2^* & \dots & x_{n-1} y_{n-1}^* \end{pmatrix}. \) (2.12) Another restriction on a quantum mechanical basis is that it is complete. We can define the completeness relation for a general basis as: \( \sum_i |i\rangle\langle i| = I, \) (2.13) where I is the identity matrix in the Hilbert space H. Tensor Product Tensor products are essential when describing many-particle systems built from one-particle state vectors (or wavefunctions). In wave mechanics, we would describe the many-particle wavefunction with A particles as \( \Psi(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_A) = \psi_{p_1}(\mathbf{x}_1) \psi_{p_2}(\mathbf{x}_2) \dots \psi_{p_A}(\mathbf{x}_A), \) (2.14) where \( \psi_{p_i} \) represent one-particle wavefunctions and \( \mathbf{x}_i \) are the single-particle degrees of freedom (usually position and spin) [78]. These are called product states, forming a complete A-body basis [78]. In Dirac notation, we can also represent a many-particle wavefunction similarly, but in this case, we will use tensor products [78]: \( |\Psi\rangle = |\psi_{p_1} \psi_{p_2} \dots \psi_{p_A}\rangle = |\psi_{p_1}\rangle \otimes |\psi_{p_2}\rangle \otimes \dots \otimes |\psi_{p_A}\rangle. \)
(2.15) In Dirac notation, we can represent the complete A-body basis as: \( |\Psi\rangle = \sum_{p_1, \dots, p_A} d_{p_1 \dots p_A} |\psi_{p_1} \dots \psi_{p_A}\rangle, \) (2.16) where \( d_{p_1 \dots p_A} = \langle \psi_{p_1} \dots \psi_{p_A} | \Psi \rangle \) is the overlap between the states. Note that the order of the single-particle wavefunctions matters in both Equation (2.14) and Equation (2.15). It is possible to exchange the order of the single-particle wavefunctions, which will be discussed in the next section. Operators, Expectation Values, and Matrix Elements In Dirac notation, operators are matrices typically represented using hat notation: \( \hat{o} \). To find the expectation value of this operator in Dirac notation, it is placed between a bra and a ket: \( \langle \psi | \hat{o} | \phi \rangle, \) (2.17) where ψ and 𝜙 can be the same or different wavefunctions depending on the operator. This combination of bra, operator, and ket is also called a matrix element. Several operators will be defined in the next section once more framework exists to support them, but we can define one operator here. The exchange operator, \( \hat{P} \), is defined such that: \( \hat{P} |\psi_1 \psi_2\rangle = \pm |\psi_2 \psi_1\rangle, \) (2.18) where \( \psi_i \) are single-particle wavefunctions [79]. A plus sign on the right side of the equation indicates the particles are bosons, and a minus sign indicates the particles are fermions; in this work, we will only consider particles that are fermions. Now, we can define the following two matrix elements using the exchange operator and the orthonormality rules extended to a many-body basis: \( \langle \psi_1 \psi_2 | \hat{P} | \psi_1 \psi_2 \rangle = -\langle \psi_1 \psi_2 | \psi_2 \psi_1 \rangle = 0 \) (2.19) and \( \langle \psi_1 \psi_2 | \hat{P} | \psi_2 \psi_1 \rangle = -\langle \psi_1 \psi_2 | \psi_1 \psi_2 \rangle = -1. \) (2.20) Note that the exchange operator can act on states of more than two particles. If there are more than two particles in the many-body wavefunction, the particles the exchange operator is acting on are denoted as subscripts: \( \hat{P}_{12} |\psi_1 \psi_2 \psi_3 \dots\rangle = -|\psi_2 \psi_1 \psi_3 \dots\rangle. \)
(2.21) Slater Determinants The wavefunction for an A-particle system that solves the Schrödinger equation exactly can be expressed as an expansion of Slater determinants which span a complete, usually infinite, basis [76]. However, usually, the single-particle space must be truncated to contain only M single-particle states due to computational limitations, for which an N-particle system leads to \( \binom{M}{N} \) determinants [76]. We can represent our many-body wavefunction, Φ, as the following determinant: \( \Phi = \frac{1}{\sqrt{N!}} \begin{vmatrix} \psi_1(\vec{x}_1) & \psi_2(\vec{x}_1) & \dots & \psi_N(\vec{x}_1) \\ \psi_1(\vec{x}_2) & \psi_2(\vec{x}_2) & \dots & \psi_N(\vec{x}_2) \\ \vdots & \vdots & \ddots & \vdots \\ \psi_1(\vec{x}_N) & \psi_2(\vec{x}_N) & \dots & \psi_N(\vec{x}_N) \end{vmatrix}, \) (2.22) where Φ is the many-body wavefunction and \( \psi_n(\vec{x}_m) \) is the n-th single-particle wavefunction evaluated at the spatial and spin coordinates described by \( \vec{x}_m \). Each column represents a different single-particle wavefunction, and each row represents a different set of spatial and spin coordinates. Thus the Slater determinant represents every particle occupying every single-particle state. 2.3 Occupation Representation and Second Quantization Occupation Number Representation Up to this point, we have used \( |\Phi_{p_1 \dots p_A}\rangle \) to represent a many-fermion wavefunction with A fermions. The single-particle basis in which the single-particle wavefunctions exist is not denoted explicitly, but it contains M single-particle states (where M can be finite or infinite). There is another way to represent a many-fermion wavefunction in a way where the number of single-particle states in the basis is more explicitly shown. This is called occupation number representation. Given an ordered single-particle basis, any many-fermion wavefunction can be represented as: \( |\Phi\rangle = |n_1 n_2 n_3 \dots n_M\rangle, \quad n_i = 0, 1. \) (2.23) If the state \( n_i \) is occupied (i.e., contains a particle), then \( n_i \) is one. If \( n_i \) is unoccupied, then it is zero.
As an example, consider the many-fermion wavefunction made up of the single-particle wavefunctions \( \psi_{p_1} \), \( \psi_{p_4} \), \( \psi_{p_7} \), and \( \psi_{p_8} \). The many-fermion wavefunction made from these states can be represented in occupation number representation as: \( |\Phi_{p_1 p_4 p_7 p_8}\rangle = |010010011000\dots\rangle. \) (2.24) In the above equation, the number of trailing zeros can be finite or infinite depending on whether M is finite or infinite. The total number of occupied states (states which are one) must equal the number of particles, N, in the system. So in Equation (2.24), there are four particles in the system, so there must be four ones in the many-body wavefunction. Also, note that, as previously, we start indexing the single-particle states at zero instead of one. We can represent a couple of other important wavefunctions in occupation number representation. A single-particle wavefunction can be represented in occupation number representation, where only the index corresponding to the single-particle state is non-zero. For example, \( \psi_2 \) in occupation number representation is: \( |\psi_2\rangle = |2\rangle = |001000\dots\rangle, \) (2.25) where we have introduced the shorthand \( |\psi_p\rangle = |p\rangle \) [78]. We can also introduce the true vacuum state in occupation number representation, which is simply the state where all M single-particle states are unoccupied (all the indices are zero) [78]: \( |000000\dots\rangle = |0\rangle. \) (2.26) Occupation number representation will be a more convenient way to represent these many-fermion wavefunctions moving forward as we develop the rest of the necessary formulas. Therefore, moving forward, a many-fermion wavefunction is assumed to be in occupation number representation unless explicitly stated otherwise. Creation and Annihilation Operators The annihilation and creation operators allow us to change from one many-body state to another by removing existing particles and adding new ones.
Let \( a_p^\dagger \) represent the fermionic creation operator acting on state p and \( a_p \) represent the fermionic annihilation operator acting on state p (p = 0, 1, ..., M). The creation and annihilation operators are Hermitian adjoints of each other (\( (a_p^\dagger)^\dagger = a_p \) and \( (a_p)^\dagger = a_p^\dagger \)). 2.3.0.1 Annihilation and Creation Operators on Single-Particle States When acting on the single-particle state |p⟩, the annihilation operator \( a_p \) removes the particle in state p, resulting in the true vacuum state. When the creation operator \( a_p^\dagger \) acts on state |p⟩, the result is zero since it cannot create a particle in state p if one already exists. Thus we have: \( a_p |p\rangle = |0\rangle \quad \text{and} \quad a_p^\dagger |p\rangle = 0. \) (2.27) When the creation operator \( a_p^\dagger \) acts on the true vacuum state |0⟩, it creates the single-particle state |p⟩. On the other hand, when the annihilation operator \( a_p \) acts on the true vacuum state, the result is zero since it cannot remove a particle from state p if it is already unoccupied. Then: \( a_p |0\rangle = 0 \quad \text{and} \quad a_p^\dagger |0\rangle = |p\rangle. \) (2.28) Annihilation and Creation Operators on Many-Particle States The annihilation and creation operators are applied to many-fermion states similarly to the single-particle states. For example, when the creation operator \( a_p^\dagger \) is applied to a many-fermion state, it returns the same state with p now occupied (\( n_p = 1 \)) if p was unoccupied, and 0 if state p was already occupied. However, there is now an additional phase factor that needs to be accounted for: \( a_p^\dagger |\dots n_p \dots\rangle = \begin{cases} (-1)^m |\dots 1_p \dots\rangle, & \text{if } n_p = 0 \\ 0, & \text{if } n_p = 1 \end{cases} \) (2.29) In Equation (2.29), m refers to the number of occupied states prior to state p and determines the phase factor. The annihilation operator is applied to a many-fermion state in the same manner: \( a_p |\dots n_p \dots\rangle = \begin{cases} (-1)^m |\dots 0_p \dots\rangle, & \text{if } n_p = 1 \\ 0, & \text{if } n_p = 0 \end{cases} \) (2.30) Annihilation and creation operators can be applied in chains to a given many-body state.
In this case, the rightmost operator acts first, then the next rightmost operator acts on the result, and so on. Below is an example of how chains of operators can act on a given many-body state to create an entirely new state (for readability, the phase factors of Equations (2.29) and (2.30) are suppressed in the intermediate steps): \( a_7^\dagger a_6 a_5^\dagger a_3^\dagger a_3 |1001101010\rangle = a_7^\dagger a_6 a_5^\dagger a_3^\dagger |1000101010\rangle = a_7^\dagger a_6 a_5^\dagger |1001101010\rangle = a_7^\dagger a_6 |1001111010\rangle = a_7^\dagger |1001110010\rangle = |1001110110\rangle \) (2.31) Every many-fermion state can be created from the true vacuum state with a chain of creation operators: \( |\Phi_{p_1 \dots p_A}\rangle = a_{p_1}^\dagger \dots a_{p_A}^\dagger |0\rangle \) (2.32) Using these creation operators, we can create a critical state: the Fermi vacuum state, represented as \( |\Phi_0\rangle \). For an N-particle system, the Fermi vacuum state occupies the N lowest energy states, and the remaining single-particle states are unoccupied. We can create the Fermi vacuum state by applying N creation operators to the true vacuum state: \( |\Phi_0\rangle = a_1^\dagger a_2^\dagger \dots a_N^\dagger |0\rangle. \) (2.33) By defining the Fermi vacuum state, we also define the Fermi level. The Fermi level is the line that separates the highest energy occupied single-particle state from the lowest energy unoccupied single-particle state in the Fermi vacuum state. We also will denote a new way to label single-particle states based on the Fermi vacuum state. States which are occupied in the Fermi vacuum state (i.e., are below the Fermi level) will be labeled with indices i, j, k, l, .... States which are unoccupied in the Fermi vacuum state (i.e., are above the Fermi level) are labeled with the indices a, b, c, d, .... States which are labeled with the indices p, q, r, s, ... can be either above or below the Fermi level. This notation will be vital as we construct the many-body methods in the next chapter. Anticommutation Relations All creation and annihilation operators must obey the following two anticommutation rules. Remember that {A, B} = AB + BA = {B, A}.
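The occupation-number states and the operator rules of Equations (2.29) and (2.30) are straightforward to implement on lists of 0s and 1s. The sketch below uses our own illustrative function names and the common convention that m counts the occupied states with index below p; it reproduces the chain acting on |1001101010⟩ above, including the phase factors (with this convention, the chain carries an overall factor of −1).

```python
# A minimal sketch of fermionic creation/annihilation operators acting on
# occupation-number states, following Equations (2.29) and (2.30).
# States are lists of 0s and 1s; (0, None) represents the result zero.

def phase(state, p):
    """Return (-1)**m, with m the number of occupied states before index p."""
    return (-1) ** sum(state[:p])

def create(state, p):
    """Apply a_p^dagger: (sign, new_state), or (0, None) if p is occupied."""
    if state is None or state[p] == 1:
        return 0, None
    new_state = list(state)
    new_state[p] = 1
    return phase(state, p), new_state

def annihilate(state, p):
    """Apply a_p: (sign, new_state), or (0, None) if p is unoccupied."""
    if state is None or state[p] == 0:
        return 0, None
    new_state = list(state)
    new_state[p] = 0
    return phase(state, p), new_state

# The chain a_7^dag a_6 a_5^dag a_3^dag a_3 |1001101010>, right to left.
state = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]
sign = 1
for op, p in [(annihilate, 3), (create, 3), (create, 5),
              (annihilate, 6), (create, 7)]:
    s, state = op(state, p)
    sign *= s

# With this phase convention the chain picks up an overall factor of -1.
print(sign, ''.join(map(str, state)))
```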
Thus, we have the anticommutation rules: \( \{a_p^\dagger, a_q\} = \delta_{pq}, \) (2.34) and \( \{a_p, a_q\} = \{a_p^\dagger, a_q^\dagger\} = 0. \) (2.35) Operators in Second Quantization Number Operators The first operator we will look at in second quantization is the number operator. The number operator is defined as: \( \hat{N}_p = a_p^\dagger a_p. \) (2.36) When the number operator acts on a many-body state, the result is: \( \hat{N}_p |\dots n_p \dots\rangle = \begin{cases} |\dots n_p \dots\rangle, & \text{if } n_p = 1 \\ 0, & \text{if } n_p = 0 \end{cases} \) (2.37) Since the many-body state is unchanged on both sides of the equation for \( n_p = 1 \), the many-body state is an eigenket of the number operator with eigenvalue one if index p is occupied. Note that \( \hat{N}_p \) conserves particle number (because there are an equal number of creation and annihilation operators) and is Hermitian. The total number operator is defined as the sum of number operators over all single-particle indices: \( \hat{N} = \sum_p \hat{N}_p. \) (2.38) When \( \hat{N} \) acts on a many-body state, it returns the number of fermions in that state. For example, when the total number operator is applied to the A-body state |Ψ⟩: \( \hat{N} |\Psi\rangle = A |\Psi\rangle. \) (2.39) Note that |Ψ⟩ is an eigenket of the total number operator. Hamiltonian and k-Body Operators As in the first section of this chapter, we can define the Hamiltonian for a many-body system as: \( \hat{H} = \hat{K} + \hat{V}_2 + \hat{V}_3 + \dots, \) (2.40) where \( \hat{K} \) is the one-body operator (the kinetic energy), \( \hat{V}_2 \) is the two-body operator, \( \hat{V}_3 \) is the three-body operator, and so on. There are, at most, A operators in the Hamiltonian for an A-body system. Second quantization provides us with two ways to represent these k-body operators: Goldstone and Hugenholtz. In the Goldstone form, the k-body operator can be represented as: \( \hat{V}_k = \left(\frac{1}{k!}\right)^2 \sum_{\substack{p_1 \dots p_k \\ q_1 \dots q_k}} \langle p_1 \dots p_k | \hat{v}_k | q_1 \dots q_k \rangle \, a_{p_1}^\dagger \dots a_{p_k}^\dagger a_{q_k} \dots a_{q_1}, \) (2.41) where the matrix element is defined as: \( \langle p_1 \dots p_k | \hat{v}_k | q_1 \dots q_k \rangle = \int \dots \int \)
\( \psi_{p_1}^*(x_1) \dots \psi_{p_k}^*(x_k) \, \hat{v}_k(x_1, \dots, x_k) \, \psi_{q_1}(x_1) \dots \psi_{q_k}(x_k) \, dx_1 \dots dx_k. \) (2.42) In Equation (2.42), we again use the single-particle wavefunctions to calculate the many-fermion matrix elements. The k-body interaction in the Hugenholtz form is very similar, except the matrix elements are antisymmetrized: \( \hat{V}_k = \left(\frac{1}{k!}\right)^2 \sum_{\substack{p_1 \dots p_k \\ q_1 \dots q_k}} \langle p_1 \dots p_k | \hat{v}_k | q_1 \dots q_k \rangle_A \, a_{p_1}^\dagger \dots a_{p_k}^\dagger a_{q_k} \dots a_{q_1}. \) (2.43) Using the antisymmetrized matrix elements is often more convenient and compact, so we will primarily use the Hugenholtz form in this thesis. Due to computational limitations, the maximum k-body interaction included in the calculations in this thesis is the three-body interaction. Therefore, we can explicitly define the one-body, two-body, and three-body interactions in the Hugenholtz form below: \( \hat{K} = \sum_{pq} \langle p | \hat{k} | q \rangle_A \, a_p^\dagger a_q \) (2.44) \( \hat{V}_2 = \frac{1}{2} \sum_{pqrs} \langle pq | \hat{v}_2 | rs \rangle \, a_p^\dagger a_q^\dagger a_s a_r \) (2.45) \( \hat{V}_3 = \frac{1}{6} \sum_{pqrstu} \langle pqr | \hat{v}_3 | stu \rangle \, a_p^\dagger a_q^\dagger a_r^\dagger a_u a_t a_s \) (2.46) The exact form of \( \hat{v}_2 \) and \( \hat{v}_3 \) will depend on the interaction of the system being studied. However, since the systems are defined in detail in Chapter 4, they will be left in their generic forms until then. 2.4 Normal Ordering We have now seen how many-body states and operators can be written in second quantization. Though second quantization simplifies many-body calculations, some calculations can still become long and complicated, with strings of creation and annihilation operators. Normal ordering simplifies these calculations by arranging chains of creation and annihilation operators such that annihilation operators are all on the right and creation operators are all on the left. We can thus define a normal ordered sequence of operators as: \( n[\hat{o}_1 \dots \hat{o}_m] = (-1)^R \, a_i^\dagger \dots a_j^\dagger a_{j+1} \dots a_m, \) (2.47) where n[...]
defines a normal-ordered sequence of operators, \hat o_i represents either a creation or an annihilation operator, and R handles the sign change from the permutations: each time two operators exchange locations, R increments by one. Normal ordering is important because it makes it easier to determine which expressions and matrix elements will yield a non-zero result. For example, consider the following matrix element:

\langle 00101 |\, \hat a_4 \hat a_1^\dagger \hat a_0 \,| 10001 \rangle. \quad (2.48)

Written in its current form, it is difficult to tell whether this matrix element will be non-zero without evaluating each operator. However, if we normal order the operators, it becomes more apparent:

\langle 00101 |\, \hat a_4 \hat a_1^\dagger \hat a_0 \,| 10001 \rangle = -\langle 00101 |\, \hat a_1^\dagger \hat a_4 \hat a_0 \,| 10001 \rangle = \langle 00101 |\, \hat a_1^\dagger \hat a_0 \hat a_4 \,| 10001 \rangle. \quad (2.49)

Now it is obvious that the matrix element yields zero, since \hat a_1^\dagger acting on the bra returns zero. All operators that have been derived so far, and those that will be derived in the coming chapters, will be written in normal-ordered form, as this drastically simplifies the extensive calculations we will be performing. A closely related concept is Wick's theorem, which allows these complex many-body states and operators to be simplified even further. However, Wick's theorem will not be used in the following derivations, so it will not be derived here; for a thorough explanation and derivation of Wick's theorem, see References [79] and [80].

2.5 Diagrammatic Methods

Diagrammatic notation for representing Slater determinants and operators originated with Richard Feynman in the field of quantum field theory (QFT). The methods were further developed for applications in many-body physics; see, for example, Paldus and Čížek in 1975 [81]. The following is a brief description of these diagrams. For a more thorough explanation, please refer to Chapter 4 of Reference [80] and to Reference [81].
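The fermionic algebra above lends itself to a quick numerical check. The sketch below is illustrative and not from the thesis (the matrix representation and the small basis size M = 3 are arbitrary choices): it builds a†_p and a_p as matrices in the 2^M-dimensional occupation-number basis, with the usual permutation phase, and verifies the anticommutation rules (2.34)–(2.35) and the action of the total number operator (2.39).

```python
# Illustrative check of Eqs. (2.34)-(2.39): represent creation/annihilation
# operators as matrices in the 2^M-dimensional occupation-number (Fock) basis.
import numpy as np
from itertools import product

M = 3  # number of single-particle states (small illustrative choice)
basis = list(product([0, 1], repeat=M))      # all occupation-number states
index = {state: n for n, state in enumerate(basis)}

def creation(p):
    """Matrix of a†_p, with phase (-1)^(occupied states to the left of p)."""
    mat = np.zeros((2**M, 2**M))
    for state in basis:
        if state[p] == 0:                    # a†_p on an occupied state gives 0
            new = list(state)
            new[p] = 1
            phase = (-1) ** sum(state[:p])   # permutation sign
            mat[index[tuple(new)], index[state]] = phase
    return mat

def annihilation(p):
    return creation(p).T                     # a_p is the adjoint (real matrices)

def anticomm(A, B):
    return A @ B + B @ A

# {a†_p, a_q} = δ_pq and {a_p, a_q} = 0  (Eqs. 2.34-2.35)
for p in range(M):
    for q in range(M):
        assert np.allclose(anticomm(creation(p), annihilation(q)),
                           np.eye(2**M) * (p == q))
        assert np.allclose(anticomm(annihilation(p), annihilation(q)), 0)

# Total number operator counts particles: N|110> = 2|110>  (Eqs. 2.36-2.39)
N = sum(creation(p) @ annihilation(p) for p in range(M))
state = np.zeros(2**M)
state[index[(1, 1, 0)]] = 1.0
assert np.allclose(N @ state, 2 * state)
```

The same machinery can be used to evaluate matrix elements such as Equation (2.48) explicitly and confirm that they vanish.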
Diagrammatic methods are practical for studying many-body systems because the equations involved can quickly get out of hand. Diagrammatic methods provide a more compact way to display these equations, which can hint at how specific components of a calculation will cancel or combine with other parts. We begin by representing the primary states in diagrammatic form. The reference state is represented by nothing; for most systems in this thesis, the reference state will be the Fermi vacuum state (|\Phi_0\rangle). A one-particle one-hole excitation, |\Phi_i^a\rangle, is represented by two vertical lines. A line pointing upwards represents a state that is unoccupied in the Fermi vacuum state (i.e., indexed by a, b, c, ...), and a line pointing downwards represents a state that is occupied in the Fermi vacuum state (i.e., indexed by i, j, k, ...). Any line without a directional arrow could represent either type of state (i.e., would be indexed with p, q, r, ...). Note that neither the horizontal arrangement nor the spacing is significant in this representation. A two-particle two-hole excited state, |\Phi_{ij}^{ab}\rangle, is represented by four vertical lines, two pointing down and two pointing up. Following this pattern, an n-particle n-hole excited state is represented by 2n vertical lines, n pointing upwards and n pointing downwards. Figure 2.1 shows the diagrammatic representations of the Fermi vacuum state, the one-particle one-hole excited state, and the two-particle two-hole excited state. It is also possible to represent the excited states as creation and annihilation operators acting on the Fermi vacuum state. Recall that the one-particle one-hole excited state can be written as |\Phi_i^a\rangle = \hat a_a^\dagger \hat a_i |\Phi_0\rangle, where \hat a^\dagger is a fermionic creation operator and \hat a is a fermionic annihilation operator. The two-particle two-hole excited state can likewise be written as |\Phi_{ij}^{ab}\rangle = \hat a_a^\dagger \hat a_b^\dagger \hat a_j \hat a_i |\Phi_0\rangle.
The state \hat a_a^\dagger \hat a_i |\Phi_0\rangle can be represented in the same way as |\Phi_i^a\rangle, but with two horizontal lines at the bottom to represent the Fermi vacuum state.

Figure 2.1 Diagrammatic representations for the Fermi vacuum state, the one-particle one-hole excited state, and the two-particle two-hole excited state

Figure 2.2 Diagrammatic representations for the one-particle one-hole and two-particle two-hole excited states written as annihilation and creation operators applied to the Fermi vacuum state

The bra of this state, \langle\Phi_i^a| = \langle\Phi_0| \hat a_i^\dagger \hat a_a, is represented in the same way, but the horizontal lines representing the Fermi vacuum state are moved to the top. The two-particle two-hole excited state is represented similarly. These diagrams are shown in Figure 2.2. Note that there is some ambiguity in this representation of |\Phi_{ij}^{ab}\rangle: it could also represent |\Phi_{ij}^{ba}\rangle. Now that we have the diagrammatic notation for the various states, we can define a notation for the various operators. Here we will explicitly define the one-body and two-body operators in diagrammatic notation; the higher-order operators follow the same patterns. We can define a generic one-body operator as:

\hat O_1 = \sum_{pq} \langle p | \hat o_1 | q \rangle \{\hat a_p^\dagger \hat a_q\}, \quad (2.50)

where p and q can be occupied or unoccupied states. This one-body operator can be represented with either diagram in Figure 2.3.

Figure 2.3 A generic representation of the one-body operator. The orientation of the lines does not matter, so the diagram can be drawn with both lines entirely vertical or with the lines slanted. Since these lines do not have arrows, they represent generic single-particle states that could be occupied or unoccupied in the Fermi vacuum state

The horizontal orientation of the lines does not matter, so the lines can be purely vertical, as in the left diagram, or slanted, as in the right diagram.
Since the lines in neither diagram of Figure 2.3 have directional arrows, the lines are labeled with the indices p and q. When creating the specific diagrams for the one-body operator, there are four options, as shown in Figure 2.4. There is one diagram where both indices belong to states that are unoccupied in the Fermi vacuum state, which is the top left diagram and is represented in equation form as:

\sum_{ab} \langle a | \hat o_1 | b \rangle \{\hat a_a^\dagger \hat a_b\}. \quad (2.51)

One diagram also results from both states being taken from the states occupied in the Fermi vacuum state. This is the diagram shown in the top right of Figure 2.4 and is represented in equation form as:

\sum_{ij} \langle i | \hat o_1 | j \rangle \{\hat a_i^\dagger \hat a_j\}. \quad (2.52)

Finally, two diagrams can be created where one state is drawn from the occupied states and one from the unoccupied states. These are shown in the bottom row of Figure 2.4. The bottom left diagram is represented in equation form as:

\sum_{ia} \langle a | \hat o_1 | i \rangle \{\hat a_a^\dagger \hat a_i\}, \quad (2.53)

Figure 2.4 The specific diagrams for the one-body operator, with the top left diagram for two unoccupied states, the top right diagram for two occupied states, and the bottom two diagrams containing one occupied and one unoccupied state

and the bottom right diagram in equation form is:

\sum_{ia} \langle i | \hat o_1 | a \rangle \{\hat a_i^\dagger \hat a_a\}. \quad (2.54)

In the diagrams, the summation over the indices is implied, as in Einstein notation. In general, when creating diagrams from an equation, the index in the bra of the matrix element is the line that exits the interaction vertex, and the index in the ket is the line that enters the interaction vertex: \langle out | \hat o_1 | in \rangle. Finally, we can move on to the diagrammatic representation of a generic two-body operator, which we can represent in equation form as:

\hat O_2 = \sum_{pqrs} \langle pq | \hat o_2 | rs \rangle \{\hat a_p^\dagger \hat a_q^\dagger \hat a_s \hat a_r\}. \quad (2.55)

The generic two-body diagram is shown in Figure 2.5, and the nine specific two-body diagrams can be found in Figure 2.6.
Figure 2.5 The general diagram for a two-body interaction

Figure 2.6 The nine specific diagrams for a two-body interaction

From these examples, we can begin to create a set of rules for constructing these diagrams and converting them to algebraic expressions:

1. Lines representing occupied states in the Fermi vacuum are labeled with indices like i, j, k, ... and point downwards. Lines representing unoccupied states in the Fermi vacuum are labeled with indices a, b, c, ... and point upwards. Finally, lines that could represent either type of state are labeled with indices p, q, r, ... and have no directional arrows.

2. The horizontal arrangement and spacing of the diagrams do not matter.

3. Lines that represent the bra of the interaction matrix element leave an interaction vertex and are called external lines.

4. Lines that represent the ket of the interaction matrix element enter an interaction vertex and are called internal lines.

5. Every one-body interaction vertex represents the factor \langle out | \hat o_1 | in \rangle.

6. Every two-body interaction vertex represents the factor \langle left\text{-}out\ right\text{-}out | \hat o_2 | left\text{-}in\ right\text{-}in \rangle.

7. If two lines start and end at the exact same locations, they are considered equivalent. Each set of equivalent lines in a diagram contributes a factor of 1/2.

8. The sign of a diagram is calculated as (-1)^{(n_{occ} + l)}, where n_{occ} is the number of lines representing states that are occupied in the Fermi vacuum state and l is the number of loops in the diagram.

9. Each pair of unique external lines (i.e., the bras of the interactions) that are not connected to the same interaction picks up a permutation factor.

2.6 Conclusion

In this chapter, we have developed second quantization and diagrammatic methods, two frameworks that will allow us to represent and solve problems in many-body physics more efficiently.
However, many problems of interest are still too large and complicated to be solved with the full Schrödinger-equation approach, even with the advancements made in this chapter. Therefore, in addition to these two frameworks, we need to develop a set of many-body methods which provide approximate solutions to the Schrödinger equation but are able to describe much larger systems of interest. In the next chapter, three of these many-body methods will be developed (Hartree-Fock theory, many-body perturbation theory, and coupled cluster theory), with coupled cluster theory being of particular importance for the remainder of this thesis.

CHAPTER 3
MANY-BODY THEORIES

3.1 Introduction

Ab initio many-body methods aim to solve the many-body problem starting only from the Hamiltonian of the system and a set of known approximations [51]. Although a proper ab initio approach in nuclear physics entails dealing with degrees of freedom that include quarks and gluons, we use ab initio here to mean a method that can systematically improve upon the many-body framework devised in the last chapter, starting from a given Hamiltonian and the respective laws of motion. Many ab initio many-body methods are applied in nuclear physics and other fields, but for this thesis, we will limit the scope to just three of them. Hartree-Fock theory (HF) is one of the oldest and simplest many-body methods. However, despite its simplicity and small computational requirements, it is also one of the least accurate. The other two methods investigated in this chapter are post-Hartree-Fock methods, which improve upon the Hartree-Fock result: many-body perturbation theory (MBPT) and coupled cluster theory (CC). This chapter will first explore Hartree-Fock theory as the simplest many-body method and the basis of MBPT and CC.
Then, after a brief introduction to post-Hartree-Fock methods, MBPT will be investigated, followed by a thorough explanation of coupled cluster theory.

3.2 Hartree-Fock Theory

The first many-body method we will explore in this chapter is Hartree-Fock theory (HF) [82, 83]. Hartree-Fock, one of the earliest many-body methods, is an iterative algorithm that can be used to find the ground state of a system given its Hamiltonian [78]. As an independent-particle model, Hartree-Fock theory begins with the Slater determinant, defined in Section 2.2 and rewritten here:

\Phi = \frac{1}{\sqrt{N!}} \begin{vmatrix} \psi_1(\vec x_1) & \psi_2(\vec x_1) & \dots & \psi_N(\vec x_1) \\ \psi_1(\vec x_2) & \psi_2(\vec x_2) & \dots & \psi_N(\vec x_2) \\ \vdots & \vdots & \ddots & \vdots \\ \psi_1(\vec x_N) & \psi_2(\vec x_N) & \dots & \psi_N(\vec x_N) \end{vmatrix}, \quad (3.1)

where \Phi is the many-body wavefunction and \psi_n(\vec x_m) is a single-particle wavefunction evaluated at the spatial and spin coordinates \vec x_m of the m-th particle. Hartree-Fock theory uses an iterative algorithm to vary the single-particle wavefunctions \psi such that the energy of the entire many-body wavefunction, represented by the Slater determinant, is minimized [80]. This minimization can be achieved by solving eigenvalue problems for all single-particle wavefunctions. All of the eigenvalue equations are coupled and have the form:

\hat f \psi_i = \epsilon_i \psi_i, \quad (3.2)

where \epsilon_i is the single-particle energy and \hat f is the single-particle Fock operator. The single-particle Fock operator, which depends on all of the single-particle wavefunctions, has the matrix elements:

\langle p | \hat f | q \rangle = \langle p | \hat t | q \rangle + \sum_i \langle pi | \hat v_2 | qi \rangle_A, \quad (3.3)

where \hat t is the single-particle kinetic energy operator and \hat v_2 is the interaction between two particles [3]. It is important to note that for nuclear problems, interactions beyond the two-body level contribute significantly to the total interaction. We can also define the many-body Fock operator, \hat F, as:

\hat F = \sum_{pq} \langle p | \hat f | q \rangle \hat a_p^\dagger \hat a_q.
(3.4)

For many systems, the solution provided by HF is an excellent initial approximation to the ground state energy and its corresponding many-body wavefunction. HF can recover approximately 99% of the ground state energy and approximately 95% of the corresponding wavefunction for electronic systems [3, 80]. However, since HF is an independent-particle model, the missing energy comes from the interactions between particles, and this energy must be recovered as well. The need to recover the energy arising from the interactions between particles, known as the correlation energy, has led to the development of more advanced many-body methods. This chapter will develop two of these: many-body perturbation theory and coupled cluster theory.

3.3 Many-Body Perturbation Theory

By this point, it has been well established that finding the energy of a many-body system amounts to solving the following eigenvalue problem, where |\Psi\rangle is a many-body wavefunction:

\hat H |\Psi\rangle = E |\Psi\rangle \longrightarrow \langle \Psi | \hat H | \Psi \rangle = E. \quad (3.5)

However, this problem can only be solved fully for a few uninteresting systems. We have therefore developed our first many-body method, HF theory, which is capable of approximating the energy of a many-body system. However, HF is an independent-particle model, and we also wish to include the interactions between the particles in the energy. Thus, we will start by developing many-body perturbation theory (MBPT), a post-Hartree-Fock method, so called because it starts with the Hartree-Fock energy and then adds a correction to account for the interactions between the particles [80, 84–86]. Many-body perturbation theory assumes that the Hamiltonian can be split into two pieces: a non-interacting component, \hat H_0, and an interacting component, \hat H_I. The interacting component is a small perturbation away from the non-interacting component, hence the method's name. Thus, we have:

\hat H = \hat H_0 + \hat H_I.
(3.6)

From this definition of the Hamiltonian, we can split the energy of the many-body system into two components, a reference energy, E_{Ref}, and a correlation energy, \Delta E_{MBPT}, leading to:

E = E_{Ref} + \Delta E_{MBPT}. \quad (3.7)

The reference energy is defined using the total Hamiltonian and the Fermi vacuum state, |\Phi_0\rangle, as:

E_{Ref} = \langle \Phi_0 | \hat H | \Phi_0 \rangle. \quad (3.8)

In practice, the reference energy is, in fact, the Hartree-Fock energy, assuming we are using a Fock basis, which we will be doing for every calculation in this thesis. The correlation energy is thus the term the MBPT energy adds on top of the HF energy. We can now define the MBPT correlation energy as the matrix element formed when the interacting component of the Hamiltonian is placed between the many-body wavefunction as the ket and the Fermi vacuum as the bra, giving:

\Delta E_{MBPT} = \langle \Phi_0 | \hat H_I | \Psi \rangle. \quad (3.9)

However, solving Equation (3.9) is no simpler than solving the original eigenvalue problem. Thus we will rephrase the MBPT correlation energy as:

\Delta E_{MBPT} = \sum_{i=1}^{\infty} \Delta E^{(i)}, \quad (3.10)

where \Delta E^{(i)} is the i-th order correction to the MBPT correlation energy. We can define the form of the first two corrections as follows. The first-order correction to the MBPT energy is:

\Delta E^{(1)} = \langle \Phi_0 | \hat H_I | \Phi_0 \rangle. \quad (3.11)

The second-order correction to the MBPT energy is:

\Delta E^{(2)} = \frac{1}{4} \sum_{ijab} \frac{\langle ij | \hat v_2 | ab \rangle \langle ab | \hat v_2 | ij \rangle}{(\epsilon_i + \epsilon_j) - (\epsilon_a + \epsilon_b)}. \quad (3.12)

Theoretically, there are infinitely many corrections to the MBPT correlation energy. In practice, we cannot compute them all, and we must truncate Equation (3.10) to a finite number of terms. However, the form of the correlation energy provides a convenient truncation scheme.
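As a concrete illustration, the second-order correction of Equation (3.12) can be evaluated in a few lines once the antisymmetrized two-body elements and single-particle energies are available. The sketch below is schematic and not from the thesis: the random tensor stands in for a real interaction, and the basis sizes and energies are illustrative choices.

```python
# Schematic evaluation of the MBPT2 correlation energy, Eq. (3.12), assuming
# real, antisymmetrized two-body elements v[i, j, a, b] = <ij|v|ab>_A.
# The toy numbers below are illustrative only (not a physical interaction).
import numpy as np

rng = np.random.default_rng(1)
n_occ, n_unocc = 4, 6
eps = np.sort(rng.uniform(-2.0, 2.0, n_occ + n_unocc))  # single-particle energies
eps_occ, eps_unocc = eps[:n_occ], eps[n_occ:]           # occupied below unoccupied

v = rng.normal(size=(n_occ, n_occ, n_unocc, n_unocc))
v = v - v.transpose(1, 0, 2, 3)   # antisymmetry under i <-> j
v = v - v.transpose(0, 1, 3, 2)   # antisymmetry under a <-> b

def mbpt2(v, eps_occ, eps_unocc):
    """Delta E^(2) = (1/4) sum_ijab |<ij|v|ab>_A|^2 / ((e_i+e_j) - (e_a+e_b))."""
    # Energy denominators D[i, j, a, b] = (e_i + e_j) - (e_a + e_b), all negative
    # here because every occupied level lies below every unoccupied level.
    D = (eps_occ[:, None, None, None] + eps_occ[None, :, None, None]
         - eps_unocc[None, None, :, None] - eps_unocc[None, None, None, :])
    return 0.25 * np.sum(v * v / D)

print(mbpt2(v, eps_occ, eps_unocc))  # negative: a binding correction
```

Because the numerator is a square and every denominator is negative, the second-order correction is always negative for a gapped reference, which is the expected sign of a correlation energy.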
The MBPT truncation MBPT1 (many-body perturbation theory to first order) uses the approximation \Delta E_{MBPT} \approx \Delta E^{(1)}, the truncation MBPT2 (many-body perturbation theory to second order) uses the approximation \Delta E_{MBPT} \approx \Delta E^{(1)} + \Delta E^{(2)}, and so on. This thesis will primarily focus on MBPT2 as the leading MBPT approximation used in these calculations. Any MBPT results presented here will include only one-body and two-body interactions.

3.4 Coupled Cluster Theory

3.4.1 Introduction to the Method

Coupled cluster theory (CC), initially developed in nuclear physics (for its development, see References [87, 88], and for its resurgence in nuclear physics, see References [2, 4, 8–10]), has long been the gold standard for quantum chemistry calculations [7]. Coupled cluster theory provides a method to systematically include complicated interactions beyond the mean field; it is non-perturbative, size-extensive, and non-variational, and it is widely used for performing calculations on strongly correlated systems [35]. Size-extensivity is important if CC is to be applied to large systems. Computationally, CC scales polynomially with respect to the number of occupied and unoccupied states in the system, making it an efficient many-body method for small to medium-sized systems but relatively slow for large systems. Compared to MBPT, CC is a more accurate method but also incurs longer computational run times. In quantum chemistry and electronic structure, CC is considered the "gold-standard" many-body method, but it can also be too computationally expensive for some applications, especially studies of larger molecules [62]. However, quantum chemists have made great strides in accelerating coupled cluster calculations of electronic systems through various methods, including truncations, which will be discussed in a few sections.
For interesting applications of coupled cluster theory in quantum chemistry, see, for example, References [6, 7, 40, 42, 54, 62, 89–94]. In CC, we represent the N-fermion wavefunction |\Psi\rangle using the so-called exponential ansatz:

|\Psi\rangle = e^{\hat T} |\Phi_0\rangle. \quad (3.13)

In the above equation, |\Phi_0\rangle is the Fermi vacuum state, in which the N particles in the system occupy the N lowest-energy states. \hat T is known as the correlation or cluster operator, and it is the sum of N operators, where N is the number of particles in the system [35, 62]:

\hat T = \sum_{m=1}^{N} \hat T_m. \quad (3.14)

Each \hat T_m operator in Equation (3.14) is the m-particle m-hole excitation operator, which has the form [35, 62]:

\hat T_m = \left(\frac{1}{m!}\right)^2 \sum_{i_1 \dots i_m a_1 \dots a_m} t_{i_1 \dots i_m}^{a_1 \dots a_m}\, \hat a_{a_1}^\dagger \dots \hat a_{a_m}^\dagger \hat a_{i_m} \dots \hat a_{i_1}. \quad (3.15)

Single-particle states with labels i_n correspond to states occupied in the Fermi vacuum state, and single-particle states with labels a_n correspond to states unoccupied in the Fermi vacuum state. The operator \hat a^\dagger is the fermion creation operator and \hat a is the fermion annihilation operator (both are defined in Chapter 2). The coefficients t are called T-amplitudes, and they are determined through a complex set of non-linear equations:

\langle \Phi_{i_1, i_2, \dots, i_k}^{a_1, a_2, \dots, a_k} | e^{-\hat T} \hat H e^{\hat T} | \Phi_0 \rangle = 0, \quad (3.16)

where the index k = 1, 2, ..., N [35]. When k = 1, the equations determine the one-particle one-hole excitation operator (\hat T_1); when k = 2, the two-particle two-hole excitation operator (\hat T_2); and so on. Therefore, N sets of amplitude equations must be solved for untruncated coupled cluster theory to fully determine the cluster operator for an N-body system. If the coupled cluster equations are solved with the untruncated cluster operator, they arrive at the same result as Schrödinger's equation.
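A small numerical aside (not from the thesis; the two-orbital space and the amplitude value 0.7 are illustrative): excitation operators built from the Fermi vacuum are nilpotent, so for a finite system the exponential in Equation (3.13) truncates to a finite polynomial. In the simplest case of a single excitation, T² = 0 and e^T = 1 + T exactly.

```python
# Illustrative two-orbital sketch: T = t * a†_a a_i is nilpotent, so the
# exponential ansatz of Eq. (3.13) truncates exactly.
import numpy as np

# Occupation basis |n_i n_a>: index 0 = |00>, 1 = |10>, 2 = |01>, 3 = |11>.
# a†_a a_i moves the particle from orbital i to orbital a: |10> -> |01>.
T = np.zeros((4, 4))
T[2, 1] = 0.7  # t-amplitude (arbitrary illustrative value)

# T is nilpotent: T @ T = 0, so exp(T) = 1 + T exactly.
assert np.allclose(T @ T, 0)

# Build exp(T) from the Taylor series and confirm it terminates at first order.
expT = np.eye(4)
term = np.eye(4)
for k in range(1, 10):
    term = term @ T / k
    expT = expT + term
assert np.allclose(expT, np.eye(4) + T)

# Acting on the "Fermi vacuum" |10> gives the correlated state |10> + t |01>.
phi0 = np.array([0.0, 1.0, 0.0, 0.0])
psi = expT @ phi0  # amplitudes (0, 1, 0.7, 0)
```

With several excitation operators present, the exponential instead terminates at a higher but still finite order, generating the product terms discussed next.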
It is important to note that in Equation (3.16) we have used a shorthand notation for the bra vector, which is, in fact, a k-particle k-hole excitation of the Fermi vacuum state [35]. Written out fully in second quantization, the k-particle k-hole excitation of the Fermi vacuum state is:

|\Phi_{i_1, i_2, \dots, i_k}^{a_1, a_2, \dots, a_k}\rangle = \hat a_{a_1}^\dagger \dots \hat a_{a_k}^\dagger \hat a_{i_k} \dots \hat a_{i_1} |\Phi_0\rangle, \quad (3.17)

where \hat a^\dagger is the fermion creation operator and \hat a is the fermion annihilation operator. As a note on notation, the T-amplitudes, scalars that are calculated by determining the m-particle m-hole excitation operators, can be represented equivalently in the following two notations:

t_{i_1 \dots i_m}^{a_1 \dots a_m} = \langle a_1 \dots a_m | \hat t | i_1 \dots i_m \rangle. \quad (3.18)

As an aside, starting from the form of the CC wavefunction:

|\Psi\rangle = e^{\hat T} |\Phi_0\rangle, \quad (3.19)

one can use a Taylor expansion of the exponential function to obtain:

|\Psi\rangle = (1 + \hat T + \hat T^2/2! + \hat T^3/3! + \dots) |\Phi_0\rangle. \quad (3.20)

This expansion explains why some of the later CC approximations we will examine contain not only terms such as \hat T_1 and \hat T_2 but also terms such as \hat T_1 \hat T_2 and \hat T_1^3.

3.4.2 Energy and Correlation Energy

Given an N-body many-body wavefunction, its energy can be determined by solving the eigenvalue problem that results from applying the Hamiltonian to the wavefunction (i.e., solving the time-independent Schrödinger equation):

\hat H |\Psi\rangle = E |\Psi\rangle. \quad (3.21)

Equation (3.21) can be rearranged to give just the energy on the right-hand side if the left-hand side is turned into a matrix element, such that the equation now has the form:

\langle \Psi | \hat H | \Psi \rangle = E. \quad (3.22)

Equation (3.22) has an implicit \langle \Psi | \Psi \rangle = 1 on the right-hand side. From here, we can split the Hamiltonian into two pieces, the normal-ordered Hamiltonian and the vacuum expectation value:

\hat H = \hat H_N + E_0, \quad (3.23)

where the vacuum expectation value E_0 is defined as:

E_0 = \langle \Phi_0 | \hat H | \Phi_0 \rangle.
(3.24)

Combining Equations (3.22) and (3.23) yields:

\langle \Psi | \hat H_N + E_0 | \Psi \rangle = E, \quad (3.25)

which can be split into the following two terms:

\langle \Psi | \hat H_N | \Psi \rangle + \langle \Psi | E_0 | \Psi \rangle = E. \quad (3.26)

Since \langle \Psi | E_0 | \Psi \rangle = E_0 \langle \Psi | \Psi \rangle = E_0, we can simplify Equation (3.26) to:

\langle \Psi | \hat H_N | \Psi \rangle + E_0 = E. \quad (3.27)

From here, we can use the coupled cluster exponential ansatz:

e^{\hat T} |\Phi_0\rangle = |\Psi\rangle, \quad (3.28)

which can be inserted into Equation (3.27) to yield:

\langle \Phi_0 | e^{-\hat T} \hat H_N e^{\hat T} | \Phi_0 \rangle + E_0 = E. \quad (3.29)

From here, we can define the similarity-transformed normal-ordered Hamiltonian:

\bar H_N = e^{-\hat T} \hat H_N e^{\hat T}, \quad (3.30)

which yields the final form of the energy equation:

\langle \Phi_0 | \bar H_N | \Phi_0 \rangle + E_0 = E. \quad (3.31)

In Equation (3.31), E_0 is known as the reference energy, and (for this work) it is the Hartree-Fock energy. This makes coupled cluster a post-Hartree-Fock method, similar to MBPT. The term \langle \Phi_0 | \bar H_N | \Phi_0 \rangle is the correlation energy, or the coupled cluster correction to the Hartree-Fock energy. As a final definition for this section, the CC correlation energy will be written as \Delta E_{CC} = \langle \Phi_0 | \bar H_N | \Phi_0 \rangle. Much like MBPT results, coupled cluster calculations will generally be reported in terms of the correlation energy instead of the total energy. Again, this is conventional, as the correlation energy is the most important part of the energy calculation. Also note that for some systems (namely, the infinite matter systems discussed in the next chapter), CC correlation energies are by convention reported as the correlation energy per particle, so that systems with different numbers of particles can be compared. The cluster operator \hat T is left undefined in the above equations. It is obtained by solving the set of coupled cluster amplitude equations set up in the previous section, of which there are N sets for an N-particle system, and the number of equations per set depends on the number of single-particle states in the calculation.
This system of equations is quite large even for relatively small systems, and it requires an iterative procedure to be solved. Determining the cluster operator (and, more specifically, the t-amplitudes) is the most computationally expensive step of a CC energy calculation.

3.4.3 Computational Methods and Approximations

For almost all CC calculations performed computationally, the cluster operator must be truncated [62]. Systems where CC calculations can be performed with the full cluster operator tend to be uninteresting toy models [35]. However, the form of the cluster operator provides a physically motivated method for truncation, in which all m-particle m-hole excitation operators above a certain level are set to zero. This approximation scheme is referred to as the SUBn scheme [35]. There is also a convenient naming scheme for cluster-operator truncations. For example, if \hat T \approx \hat T_1, the approximation is called the coupled cluster singles approximation (CCS); if \hat T \approx \hat T_1 + \hat T_2, we call this the coupled cluster singles and doubles approximation (CCSD); and if \hat T \approx \hat T_1 + \hat T_2 + \hat T_3, this leads to the coupled cluster singles, doubles, and triples approximation (CCSDT) [51]. Due to computational limitations, approximations beyond the triples level are rare but have been performed on a limited set of systems [43]. For the infinite matter systems defined in the next chapter, the \hat T_1 operator gives no contribution to the energy, so the above truncations simplify: CCSD becomes CCD (coupled cluster doubles), where \hat T \approx \hat T_2, and CCSDT becomes CCDT (coupled cluster doubles and triples), where \hat T \approx \hat T_2 + \hat T_3. As the primary goal of this thesis is to use machine learning to extend the range of pre-existing coupled cluster methods, rather than to develop new methods or improve existing ones, the explanations below are kept short and generally free of derivations.
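To make the iterative character of the amplitude equations concrete, here is a purely schematic fixed-point iteration of the form t ← (v + nonlinear(t)) / D. Everything in it is a toy stand-in, not from the thesis: the "interaction" matrix, the energy denominators, and the single quadratic term do not correspond to the actual CC diagrams, but the update-until-self-consistency structure is the same.

```python
# Purely schematic sketch of solving truncated amplitude equations by
# fixed-point iteration: start from a perturbative guess and update the
# amplitudes until they stop changing. Toy quantities throughout.
import numpy as np

rng = np.random.default_rng(0)
n = 8
v = 0.1 * rng.normal(size=(n, n))       # toy "interaction" matrix elements
v = 0.5 * (v + v.T)
# Toy energy denominators D[a, i] = -(a + i), all negative and non-singular.
D = -np.add.outer(np.arange(1.0, n + 1), np.arange(1.0, n + 1))

t = v / D                                # perturbative (MBPT2-like) first guess
for iteration in range(200):
    t_new = (v + 0.25 * t @ v @ t) / D   # toy quadratic term mimicking the
    if np.max(np.abs(t_new - t)) < 1e-10:  # non-linear amplitude equations
        break
    t = t_new

corr_energy = 0.25 * np.sum(v * t)       # schematic energy contraction
print(iteration, corr_energy)
```

In a real CCD or CCSDT solver, the update involves many tensor contractions per iteration (whence the O(M^6)–O(M^8) scalings discussed below), but the convergence loop has exactly this shape.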
There have been many excellent works deriving the various coupled cluster methods in great detail, and we point the reader to References [7, 80, 94], among others, for complete derivations of the methods introduced in this section. As expected, the higher the level of approximation, the more accurate the calculation, but the longer the run time. If having all N terms in the cluster operator gives 100% of the energy of an N-particle system, then the CCSD approximation gives approximately 90% of the total energy, and CCSDT gives almost 99% of the total energy [80]. Each additional m-particle m-hole excitation operator does increase the accuracy of the energy calculation, but the improvements progressively decrease in size. Additionally, as expected, the computational time and resource requirements increase drastically as the order of the CC calculation increases. For example, the expected run time for a CCSD calculation is O(M^6), where M is the number of single-particle states. The expected run time for a CCSDT calculation is O(M^8), and considering that M is likely to be 1,000 or higher for accurate calculations, this extra factor of M^2 represents a significant increase in run time. However, since we would like to achieve the accuracy of CCSDT, recovering 99% of the total energy, we can instead look at approximate versions of the CCSDT method. These methods will not be as accurate as full CCSDT, but they will be more accurate than CCSD because they contain some components of the \hat T_3 operator. Additionally, while these approximate triples methods have longer run times than a CCSD calculation, they have much shorter run times than a complete CCSDT calculation, making the triples approximations a good balance of accuracy and run time.
3.4.4 CCSDT Approximations

There are two types of approximate triples calculations: the non-iterative perturbative triples approximation, CCSD(T), and the iterative CCSDT-n approximations. The methods differ in their complexity and formulations but give similar results, and they are compared below.

3.4.4.1 Non-Iterative Triples Approximations

The perturbative triples approximation, CCSD(T), is a non-iterative triples approximation that is the gold standard for quantum chemistry applications [2]. The CCSD(T) method was developed in the late 1980s in the field of quantum chemistry [95]. The computational cost of CCSD(T) has two components: an iterative component with a cost of O(M^6) (the CCSD equations) and a non-iterative component with a cost of O(M^7), making it faster than a complete CCSDT calculation by a factor of M or greater. Since M can be over 1,000, this can represent a significant decrease in the run time of a calculation. We will start developing the CCSD(T) equations by defining the following two permutation operators:

\hat P(a/bc) = 1 - \hat P_{ab} - \hat P_{ac}, \quad (3.32)

and:

\hat P(a/bc|k/ij) = \hat P(a/bc) \hat P(k/ij), \quad (3.33)

where \hat P_{pq} is the permutation of states p and q. Next, we can define an equation through which the \hat T_3 amplitudes can be obtained from the \hat T_2 amplitudes:

\epsilon_{ijk}^{abc}\, t_{ijk}^{abc} = \hat P(a/bc|k/ij) \sum_d \langle bc | \hat v_2 | dk \rangle\, t_{ij}^{ad} - \hat P(c/ab|i/jk) \sum_l \langle lc | \hat v_2 | jk \rangle\, t_{il}^{ab}, \quad (3.34)

where:

\epsilon_{ijk}^{abc} = (\epsilon_i + \epsilon_j + \epsilon_k) - (\epsilon_a + \epsilon_b + \epsilon_c). \quad (3.35)

Equation (3.34) defines the leading-order (i.e., second-order) terms of the full CCSDT triples equations. This equation will be used here to define the CCSD(T) method and in the next section to define the CCSDT-1a method. Next, we can define E_T^{(4)}, the triples-excitation contribution to MBPT4 (MBPT to fourth order):

E_T^{(4)} = \sum_{i>j>k,\; a>b>c} t_{ijk}^{abc(2)*}\, t_{ijk}^{abc(2)}\, \epsilon_{ijk}^{abc}, \quad (3.36)

where the t_{ijk}^{abc(2)} are the second-order \hat T_3 amplitudes (see Sec. 10.5 of Reference [80]).
However, for CCSD(T) we will denote these amplitudes t_{ijk}^{abc[2]} instead of t_{ijk}^{abc(2)}, because we generate the \hat T_3 amplitudes from Equation (3.34) using the converged \hat T_2 amplitudes (the square brackets indicate that converged \hat T_2 amplitudes are used, and the number in the brackets is a generalized order). From here, we can define the energy that results from a CCSD(T) calculation:

E_{CCSD(T)} = E_{CCSD} + E_T^{[4]} + E_{ST}^{[4]}, \quad (3.37)

where:

E_T^{[4]} = \sum_{i>j>k,\; a>b>c} t_{ijk}^{abc[2]*}\, t_{ijk}^{abc[2]}\, \epsilon_{ijk}^{abc}, \quad (3.38)

and where:

E_{ST}^{[4]} = \sum_{ia} t_i^{a[2]} \sum_{j>k,\; b>c} \langle bc | \hat v_2 | jk \rangle\, t_{ijk}^{abc[2]}. \quad (3.39)

Note that the only difference between CCSD(T) and another non-iterative approximate CCSDT method, CCSD[T] (see Reference [96]), is the inclusion of the E_{ST}^{[4]} term, which is usually relatively small. We should also mention an improvement to CCSD(T), called \Lambda-CCSD(T), which was developed in quantum chemistry and is defined in Reference [89].

3.4.4.2 Iterative Triples Approximations

Another way of approximating the triples contribution is through the iterative CCSDT-n methods, where n = 1a, 1b, 2, 3, 4, or 5. Though computationally more expensive than their non-iterative counterparts, the iterative triples approximations are more accurate [2]. All of the methods will be briefly described here, but we will focus on CCSDT-1a, which will be used for calculations later in this thesis. For more information on CCSDT-1b through CCSDT-4, please refer to References [94] and [96]. The simplest of the CCSDT-n methods is the CCSDT-1a approximation, which is obtained by setting \hat T_1 = \hat T_3 = 0 in the projection against the three-particle three-hole excited state. CCSDT-1a thereby includes some infinite-order terms involving the \hat T_3 operator. This contrasts with CCSD, which includes no terms from \hat T_3, and with CCSDT, which includes all of them.
As shown in Table 3.1, the CCSDT-1a method uses the operator e^{\hat{T}_1+\hat{T}_2+\hat{T}_3} when projected against the singly excited states, the operator e^{\hat{T}_1+\hat{T}_2} + \hat{T}_3 when projected against the doubly excited states, and the operator 1 + \hat{T}_2 when projected against the three-particle three-hole excited states. These projections are used when constructing the amplitude equations, from which the T-amplitudes and the energies are derived, and they contrast with a complete CCSDT calculation, where the operator e^{\hat{T}_1+\hat{T}_2+\hat{T}_3} is used no matter the state [94, 96]. Successive approximations are built on top of CCSDT-1a and will be discussed as the other methods are introduced.

The CCSDT-1b model uses the complete e^{\hat{T}_1+\hat{T}_2+\hat{T}_3} operator for the singly and doubly excited states but, like CCSDT-1a, uses the operator 1 + \hat{T}_2 for the three-particle three-hole excited states. Because the full operator is projected against the doubly excited states, the disconnected \hat{T}_1\hat{T}_3 clusters are included. Additionally, CCSDT-1b has a computational time similar to CCSDT-1a [96].

The next approximation is CCSDT-2, which, like CCSDT-1b, applies the complete e^{\hat{T}_1+\hat{T}_2+\hat{T}_3} operator to the singly and doubly excited states, but changes the operator projected against the triply excited states to e^{\hat{T}_2} [94, 96]. This is an important approximation since it adds the effects of the disconnected \hat{T}_2\hat{T}_2 clusters on the \hat{T}_3 amplitudes. The CCSDT-3 approximation is conceptually the simplest: only \hat{T}_3 is set to zero in the triples projection, such that the full e^{\hat{T}_1+\hat{T}_2+\hat{T}_3} operator is projected onto the singly and doubly excited states while the operator e^{\hat{T}_1+\hat{T}_2} is projected onto the triply excited states [96]. The final approximation, CCSDT-4, projects the full e^{\hat{T}_1+\hat{T}_2+\hat{T}_3} operator onto the singly and doubly excited states but uses the operator e^{\hat{T}_1+\hat{T}_2} + \hat{T}_3 for the triply excited states.
The CCSDT-5 approximation is simply the full CCSDT calculation. The reason to perform these approximations instead of a complete CCSDT calculation is computational run time: a full CCSDT calculation has a run time on the order of O(M^8). The expected run times for the CCSDT-n approximations are given in Table 3.2 [94]. CCSDT-1a, CCSDT-1b, CCSDT-2, and CCSDT-3 have computational run times that are shorter than a complete CCSDT calculation by a factor of M (where typically M is greater than 1,000) but longer than a CCSD calculation by a factor of M. However, they include contributions from \hat{T}_3 and the triply excited states that CCSD does not. CCSDT-4 has the same computational cost as a complete CCSDT calculation, so there are not many cases where it would be preferred over simply performing the full CCSDT calculation. Additionally, about 75% of the extra energy from the triples contribution is captured by the CCSDT-1a approximation, with CCSDT-1b, CCSDT-2, and CCSDT-3 not adding any further contributions to the energy from full summations, though they do add some contributions from partial summations [94]. Thus, as a good compromise between accuracy and computational run time, we will use the CCSDT-1a approximation.
Method      |Φ_i^a⟩                     |Φ_{ij}^{ab}⟩               |Φ_{ijk}^{abc}⟩
CCSD        e^{\hat{T}_1+\hat{T}_2}              e^{\hat{T}_1+\hat{T}_2}              N/A
CCSDT-1a    e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}    e^{\hat{T}_1+\hat{T}_2} + \hat{T}_3  1 + \hat{T}_2
CCSDT-1b    e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}    e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}    1 + \hat{T}_2
CCSDT-2     e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}    e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}    e^{\hat{T}_2}
CCSDT-3     e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}    e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}    e^{\hat{T}_1+\hat{T}_2}
CCSDT-4     e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}    e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}    e^{\hat{T}_1+\hat{T}_2} + \hat{T}_3
CCSDT       e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}    e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}    e^{\hat{T}_1+\hat{T}_2+\hat{T}_3}

Table 3.1 This table summarizes the approximations for e^{\hat{T}} that the various CCSDT-n methods use in the amplitude equations, along with the approximations used for CCSD and CCSDT.

Method      Computational Cost
CCSD        O(M^6)
CCSDT-1a    O(M^7)
CCSDT-1b    O(M^7)
CCSDT-2     O(M^7)
CCSDT-3     O(M^7)
CCSDT-4     O(M^8)
CCSDT       O(M^8)

Table 3.2 The predicted computational run times of various coupled cluster calculations as a function of M, the number of single particle states in the calculation.

If we consider the Hamiltonian with at most a two-body force written in normal product form, we get:

\bar{H} = \bar{H}_0 + \bar{V}_2 = \sum_{pq} \langle p|f|q\rangle \{a^\dagger_p a_q\} + \sum_{pqrs} \langle pq|\hat{v}_2|rs\rangle \{a^\dagger_p a^\dagger_q a_s a_r\}.    (3.40)

We can then write the CCSD equations using this Hamiltonian as (following Reference [94]):

\langle\Phi_0|\bar{H}(\hat{T}_2 + \frac{1}{2}\hat{T}_1^2)|\Phi_0\rangle = \Delta E_{CCSD},    (3.41)

and:

\langle\Phi_i^a|\bar{H}(\hat{T}_1 + \hat{T}_2 + \hat{T}_1\hat{T}_2 + \frac{1}{2}\hat{T}_1^2 + \frac{1}{3!}\hat{T}_1^3)|\Phi_0\rangle = 0,    (3.42)

and finally:

\langle\Phi_{ij}^{ab}|\bar{H}(1 + \hat{T}_1 + \hat{T}_2 + \frac{1}{2}\hat{T}_2^2 + \hat{T}_1\hat{T}_2 + \frac{1}{2}\hat{T}_1^2\hat{T}_2 + \frac{1}{3!}\hat{T}_1^3 + \frac{1}{4!}\hat{T}_1^4)|\Phi_0\rangle = 0.    (3.43)

Here we have the CCSD energy equation in Equation (3.41), the amplitude equation for the singly excited states in Equation (3.42), and the amplitude equation for the doubly excited states in Equation (3.43). For the same system, the CCSDT-1a equations are as follows (also following Reference [94]), with the correlation energy equation being:

\langle\Phi_0|\bar{H}(\hat{T}_2 + \frac{1}{2}\hat{T}_1^2)|\Phi_0\rangle = \Delta E_{CCSDT-1a},    (3.44)

the singly excited amplitude equation being:

\langle\Phi_i^a|\bar{H}(\hat{T}_1 + \hat{T}_2 + \hat{T}_3 + \frac{1}{2}\hat{T}_1^2 + \hat{T}_1\hat{T}_2 + \frac{1}{3!}\hat{T}_1^3)|\Phi_0\rangle = 0,    (3.45)
the doubly excited amplitude equation being:

\langle\Phi_{ij}^{ab}|\bar{H}(1 + \hat{T}_1 + \hat{T}_2 + \hat{T}_3 + \frac{1}{2}\hat{T}_1^2 + \hat{T}_1\hat{T}_2 + \frac{1}{2}\hat{T}_2^2 + \frac{1}{3!}\hat{T}_1^3 + \frac{1}{2}\hat{T}_2\hat{T}_1^2 + \frac{1}{4!}\hat{T}_1^4)|\Phi_0\rangle = 0,    (3.46)

and finally the triply excited amplitude equation being:

\langle\Phi_{ijk}^{abc}|\bar{H}_0\hat{T}_3 + \bar{V}_2\hat{T}_2|\Phi_0\rangle = 0.    (3.47)

We can write the T-amplitudes for the \hat{T}_3 operator that result from a CCSDT-1a calculation as (this is the same as Equation (3.34)):

\epsilon^{abc}_{ijk} t^{abc}_{ijk} = \hat{P}(a/bc|k/ij) \sum_d \langle bc|\hat{v}_2|dk\rangle t^{ad}_{ij} - \hat{P}(c/ab|i/jk) \sum_l \langle lc|\hat{v}_2|jk\rangle t^{ab}_{il}.    (3.48)

For the CCSDT-1a method, we use the unconverged \hat{T}_2 amplitudes to calculate the \hat{T}_3 amplitudes and converge both sets of amplitudes in an iterative scheme that requires 10 to 20 steps on average [80]. Another improvement of the CCSDT-1a model over a complete CCSDT calculation is that it avoids storing the \hat{T}_3 amplitudes, so not only is the computational time reduced, but so is the amount of memory required [80, 94].

Briefly, we can compare CCSD(T) and CCSDT-1a, since we will be using both approximations later in this thesis. Timing-wise, CCSD(T) should provide a better cost-performance ratio, as its run time is an iterative O(M^6) followed by a non-iterative O(M^7), while CCSDT-1a is a fully iterative O(M^7). Additionally, CCSDT-1a and CCSD(T) should produce similar results, but CCSD(T) has fewer \hat{T}_3 terms than CCSDT-1a. However, as both methods have the same number of \hat{T}_3\hat{T}_3 coupling terms, CCSD(T) should avoid the exaggeration of the \hat{T}_3 effects that can plague CCSDT-1a calculations.

Figure 3.1 Diagrammatic representations for the \hat{T}_1, \hat{T}_2, and \hat{T}_3 operators. Higher-order \hat{T}_m operators can be created by following the pattern in the operators shown here.
Finally, for systems where \hat{T}_3 has significant effects, both CCSDT-1a and CCSD(T) provide a reliable method, but for practical applications CCSD(T) is the preferable CCSDT approximation due to run-time considerations and the reduced exaggeration of the effects of the \hat{T}_3 operator [94].

3.4.5 Diagrammatic Representations

Finally, we can briefly discuss the diagrammatic representations of the coupled cluster approximations up to and including the triples approximation. In this chapter, we will only mention some of the crucial diagrams. For a complete derivation of the coupled cluster diagrams up to and beyond the triples level see, for example, Reference [80]. To the diagrammatic rules we developed in Chapter 2 of this thesis, we can add the following rules, which are specific to drawing coupled cluster diagrams:

• Every vertex representing \hat{T}_m picks up a factor of \langle a_1...a_m|\hat{t}|i_1...i_m\rangle (the T-amplitudes).
• Each pair of equivalent \hat{T}_m vertices picks up a factor of 1/2. The vertices are equivalent if they connect to the interaction vertex in the same way.

Representing the \hat{T}_1, \hat{T}_2, and \hat{T}_3 operators diagrammatically is quite simple, and these diagrams are shown in Figure 3.1.

Figure 3.2 The diagrammatic formulation of the CCD amplitude equation.

Figure 3.3 Antisymmetrized Goldstone diagrams representing the \hat{T}_3 contributions to the CCSDT \hat{T}_2 equations. D_{10b} and D_{10c} are relevant to the CCSDT-1a approximation defined in the last section.

Figure 3.4 Antisymmetrized Goldstone diagrams representing the CCSDT \hat{T}_3 equations relevant to calculating the CCSDT-1a approximation. For a full list of all antisymmetrized Goldstone diagrams for the CCSDT \hat{T}_3 equations, the reader is referred to Reference [80].

To show a complete coupled cluster calculation written out in diagrammatic form, we represent the CCD amplitude equation diagrammatically in Figure 3.2.
Though long, this collection of diagrams is significantly shorter than the fully written-out CCD amplitude equation using summations and Dirac notation, which can be found in Reference [35]. Finally, we look at some diagrams from the \hat{T}_3 operator which are important to the CCSD(T) and CCSDT-1a approximations. In fact, the terms T_{1a} and T_{1b} from the \hat{T}_3 amplitude equations form Equation (3.48). The amplitudes obtained from this equation are then used to calculate the diagrams D_{10b} and D_{10c} for the \hat{T}_2 amplitudes [97].

3.5 Many-Body Methods Conclusion

In this chapter, we have developed the three many-body methods we will use throughout this thesis. The most fundamental of these is Hartree-Fock theory, which recovers a reasonable amount of the system's energy. It is an independent-particle model, so it cannot account for the energy from interacting particles. The first post-Hartree-Fock method we developed, so called because it improves on the Hartree-Fock result, was many-body perturbation theory (MBPT), which provides a convenient truncation scheme that allows for systematic improvements to the Hartree-Fock results. The final many-body method, developed in great detail due to its importance later in this thesis, was coupled cluster theory, which is more accurate than MBPT but much more time-consuming. Coupled cluster theory also provides a convenient truncation scheme and, at the triples level, allows us to recover most of the system's energy even though the method is truncated. Now that we have developed our many-body methods, the next chapter will be devoted to developing the systems to which these methods will be applied: infinite matter.

CHAPTER 4 INFINITE MATTER SYSTEMS

4.1 Introduction to Infinite Matter Systems

An infinite matter system is simply a system which contains an infinite number of particles and occupies an infinite volume. We will be looking at two types of infinite matter systems in this chapter.
The first, the homogeneous electron gas, has critical applications in several areas of chemistry and condensed matter physics [9, 37, 38, 40, 43, 54, 76, 98–101]. Since the homogeneous electron gas is a comparatively straightforward infinite matter system, it also serves as a test bed for developing methods that can then be applied to more complex infinite matter systems. The second infinite matter system we investigate in this chapter is infinite nuclear matter [9, 21, 35, 102, 103]. Specifically, we consider pure neutron matter, an infinite nuclear matter system in which all the particles are neutrons ([14, 20, 30, 32, 33, 104–107]), and symmetric nuclear matter, in which half of the particles are protons and half are neutrons ([25, 108]). Studies of infinite nuclear matter play an essential role in nuclear physics and astrophysics [11, 22, 23, 26–31, 57, 60, 67, 77].

Performing coupled cluster calculations on infinite matter is not a new concept. For coupled cluster calculations of the electron gas see, for example, References [3, 38, 40, 78], among many other interesting studies. For infinite nuclear matter, there are References [3, 11, 34, 35, 78], among others. However, this work is unique in the methodology we take to remove some of the errors that result from these calculations, described towards the end of this chapter. It is important to note that for all infinite matter systems investigated in this thesis, the one-particle one-hole excitation operator (\hat{T}_1) in the cluster operator is zero due to symmetry considerations [35]. It is also important to note that these infinite matter systems are momentum conserving and that the total momentum is zero [35].
4.2 Single Particle Basis for Infinite Matter

We can begin to define a single particle basis for infinite matter calculations in three dimensions by defining the single particle wavefunctions as plane waves of the form:

\psi_{k,\sigma}(\vec{r}) = \frac{1}{\sqrt{\Omega}} e^{i\vec{k}\cdot\vec{r}} \xi_\sigma,    (4.1)

where \xi_\sigma has two possible values, corresponding to spin-up particles:

\xi_{\sigma=1/2} = \begin{pmatrix} 1 \\ 0 \end{pmatrix},    (4.2)

or spin-down particles:

\xi_{\sigma=-1/2} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.    (4.3)

Additionally, in Equation (4.1), \Omega = L^3 corresponds to the volume of the infinite matter system. The limit L \to \infty is taken to accurately simulate an infinite matter system after the various expectation values have been calculated [1, 76, 77]. When using plane waves as the single particle wavefunctions and assuming periodic boundary conditions, we can define the single particle momenta to be:

k_i = \frac{2\pi}{L} n_i,    (4.4)

where i = x, y, or z and n_i = 0, ±1, ±2, ... [76, 77]. From the momentum quantum numbers, we can define the kinetic energy of an infinite matter system in second quantization to be:

\hat{T} = \sum_{p\sigma_p} \frac{\hbar^2 k_p^2}{2m} a^\dagger_{\vec{p}\sigma_p} a_{\vec{p}\sigma_p}.    (4.5)

Finally, we can define the energy of each single particle state in terms of its quantum numbers:

\epsilon_{n_x,n_y,n_z} = \frac{\hbar^2}{2m}\left(\frac{2\pi}{L}\right)^2 (n_x^2 + n_y^2 + n_z^2).    (4.6)

We can rewrite Equation (4.6) in terms of the momentum using Equation (4.4), resulting in:

\epsilon_{n_x,n_y,n_z} = \frac{\hbar^2}{2m}(k_x^2 + k_y^2 + k_z^2).    (4.7)

Some of the single-particle states in a system will contain particles; we call these occupied single-particle states. Others will not contain particles and are referred to as unoccupied single-particle states. When the system is arranged in its ground state configuration (i.e., the lowest energy configuration), the occupied single-particle states fill the states with the lowest single-particle energies. Thus, the occupied single-particle states lie inside a so-called Fermi sphere, described by the Fermi momentum, k_F, and the Fermi energy, E_F.
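The plane-wave basis just described is simple enough to construct explicitly. The sketch below builds the states (n_x, n_y, n_z, σ) under the spherical shell cutoff used later in this section and assigns each the energy of Equation (4.6); the function name and unit choices (L = ℏ = m = 1) are illustrative assumptions, not code from this thesis.

```python
import itertools
import numpy as np

def build_basis(n_shells, L=1.0, hbar=1.0, mass=1.0):
    """Plane-wave single-particle states (n_x, n_y, n_z, sigma, energy)
    with the spherical cutoff n_x^2 + n_y^2 + n_z^2 <= n_shells - 1."""
    n_max = int(np.ceil(np.sqrt(n_shells - 1)))
    states = []
    for nx, ny, nz in itertools.product(range(-n_max, n_max + 1), repeat=3):
        if nx**2 + ny**2 + nz**2 <= n_shells - 1:
            # Eq. (4.6): single-particle energy from the quantum numbers.
            energy = (hbar**2 / (2 * mass)) * (2 * np.pi / L)**2 * (nx**2 + ny**2 + nz**2)
            for sigma in (+1, -1):  # two spin projections per momentum state
                states.append((nx, ny, nz, sigma, energy))
    return sorted(states, key=lambda s: s[-1])

basis = build_basis(n_shells=2)
print(len(basis))  # 14 states: 2 in the lowest shell, 12 in the second
```

Sorting by energy places the states that fill the Fermi sphere at the front of the list, which is convenient when splitting the basis into occupied and unoccupied states.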
The Fermi level divides the occupied and unoccupied single-particle states in this ground state configuration. The Fermi momentum and the Fermi energy are related by:

E_F = \frac{\hbar^2 k_F^2}{2m} \longrightarrow k_F = \sqrt{\frac{2mE_F}{\hbar^2}}.    (4.8)

From the Fermi momentum, we can define the average density of the system, \rho_0, to be:

\rho_0 = \frac{k_F^3}{3\pi^2}.    (4.9)

The average density \rho_0 can also be defined as \rho_0 = N/\Omega, where N is the number of particles in the system and, like L, the limit N \to \infty is taken. However, the limits N, L \to \infty are taken such that \rho_0 remains at a fixed, finite value. Finally, N and k_F can be related through:

N = \frac{k_F^3 \Omega}{3\pi^2} \longrightarrow k_F = \left(\frac{3\pi^2 N}{\Omega}\right)^{1/3}.    (4.10)

While in this theoretical model there are an infinite number of single-particle states in the basis, in practice, due to computational limitations, the total number of single-particle states included in the calculation, M, must be truncated to a finite number. In this work, we will assume that the numbers of occupied and unoccupied single-particle states correspond to a closed-shell structure, and the total number of shells will be finite. Here we define a shell as a collection of single-particle states with the same energy. We place a spherical energy cut-off on the quantum numbers such that n_x^2 + n_y^2 + n_z^2 \le N_{shells} - 1, where N_{shells} is the total number of energy shells included in the calculation. It is important to note that N_{shells} \ne N_{max}; rather, N_{max} corresponds to the maximum value of n_x, n_y, or n_z that is included in the calculation. This closed-shell structure is also used for coupled cluster calculations in References [77], [78], and [35], among others. Enforcing a closed-shell structure for the occupied and unoccupied single-particle states and truncating the total number of shells allowed in the system restricts the number of allowed particles and the total number of single-particle states to a finite set of values.
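This finite set of allowed numbers can be generated by direct enumeration: count the lattice points (n_x, n_y, n_z) at each occurring value of n^2 and accumulate, with a factor of two for spin. The function below is a sketch (names and the `species` doubling for symmetric nuclear matter are our own bookkeeping, not code from this thesis).

```python
import itertools

def magic_numbers(max_shells, species=1):
    """Cumulative single-particle state counts for the spherical shell
    cutoff n_x^2 + n_y^2 + n_z^2 <= shells - 1; species=2 doubles every
    count (protons and neutrons in symmetric nuclear matter)."""
    n_max = max_shells  # generous bound on |n_i| for the shells we keep
    counts = {}
    for nx, ny, nz in itertools.product(range(-n_max, n_max + 1), repeat=3):
        n2 = nx**2 + ny**2 + nz**2
        counts[n2] = counts.get(n2, 0) + 1
    magics, total = [], 0
    for n2 in sorted(counts):           # shells in order of increasing energy
        total += 2 * species * counts[n2]  # factor of 2 for spin
        magics.append(total)
    return magics[:max_shells]

print(magic_numbers(6))      # [2, 14, 38, 54, 66, 114]
print(magic_numbers(6, 2))   # [4, 28, 76, 108, 132, 228]
```

The first shell holds one momentum state (n = 0) with two spin projections, the second holds the six unit vectors with twelve spin states, and so on.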
We refer to these allowed numbers as "magic numbers"; they correspond to the total number of single-particle states when a given number of shells is included in a calculation. The first magic number is two, as there are two single-particle states in the first energy shell. The second magic number is 14, as there are 14 single-particle states when two energy shells are included (2 in the first shell and 12 in the second). Continuing in this way gives the following magic numbers: 2, 14, 38, 54, 66, 114, ... . Note that these are the magic numbers for the homogeneous electron gas and pure neutron matter only. For symmetric nuclear matter, all magic numbers are doubled (4, 28, 76, 108, 132, 228, ...). In the context of this thesis, the term shells will refer to the total number of energy shells used in the calculation, M will refer to the total number of single-particle states in the system (both occupied and unoccupied), and open shells will refer to the number of shells above the Fermi level only.

4.3 The Homogeneous Electron Gas

The homogeneous electron gas (HEG) is an infinite matter system containing only electrons, with a uniform positive background charge such that the overall net charge of the system is zero [3]. Though the HEG as a theoretical model exists in various dimensions [3], this thesis will only focus on the three-dimensional electron gas. The HEG in three dimensions is essential in density functional theory, where it is the cornerstone of the local density approximation [3]. It is also a reasonable model for several systems of interest in quantum chemistry and condensed matter physics, including the electrons in semiconductors and alkali metals [3]. Additionally, studies of the HEG can be used to build a framework that can be transferred to studies of infinite nuclear matter, which is essential for studying the equation of state of nuclear matter and for many-body studies of neutron stars.
The HEG is used as a framework for developing analysis tools for other many-body systems because it is a relatively simple infinite matter system, with many published properties (both analytical and numerical) against which results can be compared. Other studies have used the HEG as a system that allows for comparisons between the results of different many-body methods (see, for example, Reference [3]).

The Hamiltonian for the HEG consisting of N electrons can be defined as:

\hat{H} = \hat{K} + \hat{V}_{ee} + \hat{V}_{be} + \hat{V}_{bb},    (4.11)

where \hat{K} is the kinetic energy operator, \hat{V}_{ee} is the interaction between all pairs of electrons, \hat{V}_{be} describes the interaction of all electrons with the positive background charge, and \hat{V}_{bb} describes the contribution of the background charge interacting with itself. Since the HEG comprises a positively charged background and negatively charged electrons, the Coulomb force is the primary force at play in the interactions. The Coulomb force is a long-range force that acts over an infinite distance. However, instead of using the Coulomb interaction directly, we will use the Ewald interaction, which splits the electron-electron interaction into a short-range term and a long-range term while simultaneously dealing with \hat{V}_{be} and \hat{V}_{bb}. We can rewrite the HEG Hamiltonian using the Ewald interaction as:

\hat{H} = -\frac{1}{2}\sum_\alpha \nabla^2_\alpha + \frac{1}{2}\sum_{\alpha\ne\beta} \hat{v}_{\alpha\beta} + \frac{1}{2} N v_M,    (4.12)

where \alpha and \beta are electron indices [76]. The first term is the kinetic energy operator, and the final two terms comprise the Ewald interaction [76]. The term v_M is called the Madelung term, and \hat{v}_{\alpha\beta}, the two-electron operator, is defined as:

\hat{v}_{\alpha\beta} = \frac{1}{V}\sum_{\vec{q}} v_{\vec{q}}\, e^{i\vec{q}\cdot(\vec{r}_\alpha - \vec{r}_\beta)},    (4.13)

where V = L^3 is the finite volume of the HEG, v_{\vec{q}} = 4\pi/\vec{q}^{\,2} if \vec{q} \ne 0, and v_{\vec{q}} = 0 if \vec{q} = 0 [76]. This choice of Hamiltonian leads to the plane wave basis for the single-particle states described in Section 4.2.
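In the plane-wave basis, inserting Equation (4.13) between momentum eigenstates collapses the double integral: the matrix element is nonzero only when momentum is conserved, and it then equals v_q/Ω with q the momentum transfer. The sketch below illustrates this reduction; the function name is hypothetical, and spin deltas are omitted for brevity.

```python
import numpy as np

def ewald_element(kp, kq, kr, ks, L):
    """<pq| v |rs> for plane waves: (1/Omega) * 4*pi/|q|^2 with
    q = kp - kr, nonzero only when kp + kq = kr + ks (from Eq. (4.13)).
    Wave vectors are 3-component arrays; Omega = L^3."""
    kp, kq, kr, ks = map(np.asarray, (kp, kq, kr, ks))
    if not np.allclose(kp + kq, kr + ks):
        return 0.0          # momentum conservation
    q = kp - kr
    q2 = q @ q
    if np.isclose(q2, 0.0):
        return 0.0          # v_q = 0 for q = 0 (absorbed into the Madelung term)
    return 4.0 * np.pi / (L**3 * q2)
```

Because the q = 0 term of the interaction is removed, the divergent Coulomb self-term never appears; its finite remnant is the Madelung contribution in Equation (4.12).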
While the Ewald interaction makes the HEG easier to work with by splitting up the long-range and short-range components of the electron-electron interaction, it cannot correctly describe the exchange-correlation energy [3]. Note that the density of the homogeneous electron gas is typically described by a parameter called the Wigner-Seitz radius, which can be defined as:

r_s = \frac{r_0}{r_B},    (4.14)

where r_B is the Bohr radius and r_0 is related to the size of the box containing the electron gas through:

\frac{4}{3}\pi r_0^3 = \frac{L^3}{N},    (4.15)

where the density of the HEG is d = N/L^3 [78]. A larger value of r_s means the modeled system sits in a larger box and is therefore less dense than a system with a smaller value of r_s and the same number of particles. It is important to note that the HEG behaves differently at different densities. At low densities (high r_s), the electrons in the HEG form a lattice. This process is called Wigner crystallization and results from the long-range repulsive interactions between the electrons [3]. At higher densities (lower values of r_s), the HEG is better described as a liquid than as a gas [3].

The convergence of the correlation energy for the electron gas using plane waves as a basis has yet to be thoroughly investigated, but some work has been done in Reference [76]. One reason the correlation energy of the electron gas is an interesting problem is that it has substantial contributions from the electron-electron cusp. This cusp leads to trouble when making computational truncations, as a finite set of single-particle states cannot accurately describe the behavior in this region [76]. It is important to note that the MBPT2 correlation energies are well-defined for a finite electron gas; however, they diverge in the thermodynamic limit of an infinite electron gas in three dimensions [76]. This happens at high densities due to the dominance of the particle-hole ring diagrams [3].
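Equations (4.14) and (4.15) fix the box size once r_s and the particle number are chosen, which is how a target density is set up in practice. A small sketch in atomic units (r_B = 1; the function name is our own):

```python
import numpy as np

def box_length(r_s, n_electrons, r_bohr=1.0):
    """Box size L from the Wigner-Seitz radius, Eqs. (4.14)-(4.15):
    (4/3) pi r0^3 = L^3 / N with r0 = r_s * r_B (atomic units: r_B = 1)."""
    r0 = r_s * r_bohr
    return (4.0 * np.pi / 3.0 * n_electrons) ** (1.0 / 3.0) * r0

# Doubling r_s at fixed N doubles L, so the density drops by a factor of 8.
L1 = box_length(1.0, 14)
L2 = box_length(2.0, 14)
print(L2 / L1)  # 2.0
```

Inverting the relation gives the density d = N/L^3 = 3/(4π r_s^3) in atomic units, making explicit that r_s alone (not N) controls the density.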
The single particle energies of the HEG when using the Ewald interaction are, for occupied single particle states:

\epsilon_i = \frac{\vec{k}_i^2}{2} - \sum_{j\ne i} \langle ij|\hat{v}_2|ji\rangle - \frac{v_M}{2},    (4.16)

and for unoccupied single particle states:

\epsilon_a = \frac{\vec{k}_a^2}{2} - \sum_j \langle aj|\hat{v}_2|ja\rangle.    (4.17)

Here we have used the common convention for indices, where a, b, c, ... represent states which are unoccupied in the Fermi ground state, i, j, k, ... represent states which are occupied in the Fermi ground state, and p, q, r, ... can represent either [76]. We can also define the two-electron matrix elements in integral notation as:

\langle ij|\hat{v}_2|ab\rangle = \delta_{\sigma_i\sigma_a}\delta_{\sigma_j\sigma_b} \int d\vec{r}_1 \int d\vec{r}_2\, \psi^*_i(\vec{r}_1)\psi^*_j(\vec{r}_2)\,\hat{v}_2(\vec{r}_1,\vec{r}_2)\,\psi_a(\vec{r}_1)\psi_b(\vec{r}_2),    (4.18)

where \hat{v}_2 is defined in Equation (4.13) and the \psi are the plane waves defined in Sec. 4.2. By convention, energies calculated for the HEG will be reported in Hartrees (1 Hartree = 4.36 \times 10^{-18} Joules). However, this differs from the units that will be used for the next system, infinite nuclear matter.

4.4 Interlude: Nuclear Forces

The properties of infinite nuclear matter and finite nuclear systems are determined by the nuclear forces that govern the interactions between the nucleons. These forces arise from two fundamental interactions: the strong nuclear force, which binds the nucleons together, and the electromagnetic force, which causes repulsion between the protons. The strong force dominates over short distances (which causes nuclei to hold together), while the electromagnetic force is a long-range force acting over large distances. The strong force is not yet fully understood, and this can make it challenging to model the nuclear forces in theoretical calculations [18, 19].
Because of this, the first interaction used to perform calculations in this thesis is the "toy" interaction called the Minnesota potential, which is defined in Reference [109] and used to model the nuclear interaction in, for example, Refs. [77, 78, 110]. The Minnesota potential is a local, nucleon-nucleon-only, moderately soft potential that reproduces the nucleon-nucleon effective range parameters. It provides a reasonable approximation for the binding energies of light nuclei, and it is a simple interaction that is computationally inexpensive compared to more realistic nuclear interactions. It should be noted that the Minnesota potential is a phenomenological interaction, meaning that it was constructed by fitting experimental data instead of being derived from first principles. The matrix elements of the Minnesota potential are given by:

V_{ij} = \left[V_R + \frac{1}{2}(1 + P^\sigma_{ij})V_t + \frac{1}{2}(1 - P^\sigma_{ij})V_s\right]\frac{1}{2}(1 + P^r_{ij}).

In the above equation, P^\sigma is the spin exchange operator, P^r is the space exchange operator, and V_R, V_t, and V_s are given by the following equations, where r_{ij} is the distance between two nucleons and the constants are found by fitting to experimental data:

V_R = V_{0R}\, e^{-\kappa_R r_{ij}^2},
V_t = -V_{0t}\, e^{-\kappa_t r_{ij}^2},
V_s = -V_{0s}\, e^{-\kappa_s r_{ij}^2}.

Though the Minnesota potential is computationally inexpensive and captures important aspects of the nuclear interaction, modern advancements have improved our models of nuclear forces [11, 18, 19, 104, 111–123]. The second nuclear force used to model infinite nuclear matter in this thesis, a more modern interaction, is derived from effective field theory (EFT) [77] [2, 12–15, 18, 19, 29, 108, 119, 124–126]. Nuclear forces derived from EFT have an advantage over other nuclear forces in that the two-body and many-body interactions can be derived in a mutually consistent manner [35].
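The three Gaussian radial parts above are cheap to evaluate, which is the point of the Minnesota potential. The sketch below uses the strengths and widths commonly quoted for this potential (treat them as illustrative inputs to be checked against Reference [109]) to show the familiar nuclear-force shape: a repulsive core at short distance and attraction at intermediate distance.

```python
import numpy as np

# Commonly quoted Minnesota parameters: strengths in MeV, widths in fm^-2.
# These values are assumptions for illustration, not fits performed here.
V0R, KAPPA_R = 200.0, 1.487
V0T, KAPPA_T = 178.0, 0.639
V0S, KAPPA_S = 91.85, 0.465

def v_r(r):  # repulsive core
    return V0R * np.exp(-KAPPA_R * r**2)

def v_t(r):  # attractive triplet channel
    return -V0T * np.exp(-KAPPA_T * r**2)

def v_s(r):  # attractive singlet channel
    return -V0S * np.exp(-KAPPA_S * r**2)

r = np.linspace(0.0, 3.0, 7)  # nucleon separations in fm
print(v_r(r) + v_t(r))        # repulsion wins at r = 0, attraction at ~1 fm
```

Because the wider (smaller κ) attractive Gaussians decay more slowly than the repulsive core, the sum changes sign as r grows, giving the short-range repulsion and intermediate-range attraction the prose describes.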
Much recent progress has been made in deriving nuclear forces based on chiral EFT, and the nuclear Hamiltonians for most nucleonic matter calculations now use forces from chiral EFT, including both nucleon-nucleon (NN) and three-nucleon (3N) forces [104, 112, 117, 118, 123, 127, 128]. For the results presented herein, we will use the parametrization NNLO_opt for the NN and local 3N interactions. It should be noted that implementing the 3N forces in the single particle basis used for infinite nuclear matter calculations is much simpler than implementing them in the harmonic oscillator basis commonly used for calculations of nuclei [34]. However, the large number of matrix elements needed to compute the 3N forces is still the computationally limiting factor in these calculations [34].

The specific chiral potentials used in this work are detailed in References [18] and [19]. They are optimized Δ-full interactions calibrated to nuclear matter properties [18]. The Δ-full interaction is detailed in References [120–122, 129–131]. The interaction employs a standard non-local regulator function of the form:

f(p) = e^{-(p/\Lambda)^{2n}},    (4.19)

where p is the relative momentum, n is 4, and \Lambda is 394 MeV. We use these interactions at next-to-next-to-leading order (NNLO). At this order, 17 low-energy coefficients (LECs) parameterize the interaction, and their values are given in Reference [18]. This leads to a form of the Hamiltonian which can be written as:

H(\vec{\alpha}) = h_0 + \sum_{i=1}^{N_{LECs}=17} \alpha_i h_i,    (4.20)

where h_0 = t_{kin} + v_0, t_{kin} is the kinetic energy, v_0 is a constant potential that does not depend on the LECs, and \vec{\alpha} is a vector containing all of the LECs. Of the two interactions used in this thesis to model nuclear forces, the optimized Δ-full interaction is more accurate than the Minnesota potential but has more complex matrix elements, which leads to longer run times.
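The regulator of Equation (4.19), written here with the conventional decaying exponential, is a one-liner; the sketch below (hypothetical function name) shows its behavior: essentially 1 well below the cutoff, e^{-1} at p = Λ, and a sharp suppression of high momenta because of the large exponent 2n = 8.

```python
import numpy as np

def regulator(p, cutoff=394.0, n=4):
    """Non-local regulator f(p) = exp(-(p/Lambda)^(2n)), Eq. (4.19),
    with p and Lambda in MeV (Lambda = 394 MeV, n = 4 in this work)."""
    return np.exp(-(p / cutoff) ** (2 * n))

print(regulator(0.0), regulator(394.0), regulator(800.0))
```

The steepness of the cutoff is controlled by n: larger n makes f(p) flatter below Λ and sharper above it, so the regulator removes high-momentum modes while leaving the low-momentum physics of the interaction nearly untouched.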
4.5 Infinite Nuclear Matter

Infinite nuclear matter is defined as a system containing an infinite number of nucleons (protons and neutrons) that interact only via the nuclear forces [35]. Studies of infinite nuclear matter are essential for understanding the matter within dense astronomical objects such as neutron stars [77]. Neutron stars are exciting because they offer insights into both nuclear processes and astrophysical observables. However, neutron stars also contain matter whose density spans several orders of magnitude and which takes many different compositions [26–31, 77]. Neutron star matter occurs at densities of 0.1 fm^{-3} or greater and consists of various fractions of neutrons, protons, electrons, and muons. These particles exist in beta equilibrium (β-equilibrium) governed by the weak force.

Studies of infinite nuclear matter are also focused on determining the equation of state (EoS). When considering applications to neutron stars, the EoS can help determine the mass range, the relationship between the star's mass and its radius, the thickness of the star's crust, and the rate at which the star will cool down [77]. Determination of the EoS also links neutron stars to the neutron skin in atomic nuclei and to the symmetry energy of nuclear matter. The symmetry energy is crucial because it relates to the difference between proton and neutron radii in nuclei [35]. Solving for the EoS depends on our ability to solve the many-body problem for infinite nuclear matter [35, 77]. Nuclear matter has been of interest in many-body studies since the early days of the field (see Reference [132] for a review of these early studies). These early calculations were performed with Brueckner-Bethe-Goldstone theory ([133, 134]).
However, modern many-body studies of nuclear matter are performed with a variety of methods, including coupled cluster theory (CC) ([2, 3, 34, 35, 77, 78, 119, 125]), Monte Carlo methods ([16, 23, 32, 33, 57, 105, 106, 135, 136]), Green's function methods ([107, 108, 137, 138]), and methods from the renormalization group family ([77, 126, 139]). Coupled cluster theory will be the many-body method of interest for the remainder of this thesis; coupled cluster calculations of nuclear matter date back to the 1970s and 1980s [35].

One property of infinite nuclear matter we are interested in is the proton fraction, defined as:

x_p = \frac{\rho_p}{\rho},    (4.21)

where \rho_p is the density of protons in the matter, \rho_n is the density of neutrons in the matter, and \rho = \rho_p + \rho_n is the total density of the infinite nuclear matter [77]. Different proton fractions define different types of infinite nuclear matter systems. For example, if x_p = 0, the infinite nuclear matter system contains only neutrons; we refer to this system as pure neutron matter (PNM). If x_p = 1/2, the system contains equal numbers of protons and neutrons and is called symmetric nuclear matter (SNM). From the proton fraction, we can define the symmetry energy as the difference between the energy of pure neutron matter and that of symmetric nuclear matter at a given density [77]:

E_{sym}(\rho) = E(\rho, x_p = 0) - E(\rho, x_p = 1/2).    (4.22)

All energies for infinite nuclear matter calculations will, by convention, be reported in units of MeV.

4.6 Truncation Errors in Infinite Matter Calculations

As mentioned throughout this chapter, several truncations and approximations must be made to study these infinite matter systems in a computational framework. Though computational limitations require these truncations and approximations, they introduce errors into the calculations, making them undesirable [38, 76].
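As a quick illustration of Equations (4.21) and (4.22) from the previous section, the two definitions translate directly into code. The numbers used below are rough textbook-scale illustrations (roughly +16 MeV per particle for PNM and -16 MeV for SNM near saturation density), not calculated results from this thesis.

```python
def proton_fraction(rho_p, rho_n):
    """Eq. (4.21): x_p = rho_p / (rho_p + rho_n)."""
    return rho_p / (rho_p + rho_n)

def symmetry_energy(e_pnm, e_snm):
    """Eq. (4.22): E_sym(rho) = E(rho, x_p=0) - E(rho, x_p=1/2),
    both energies evaluated at the same total density rho."""
    return e_pnm - e_snm

# Illustrative inputs only (assumed round numbers, in MeV per particle):
print(proton_fraction(0.0, 0.16))    # 0.0 -> pure neutron matter
print(proton_fraction(0.08, 0.08))   # 0.5 -> symmetric nuclear matter
print(symmetry_energy(16.0, -16.0))  # 32.0
```

The sign convention matters: PNM is unbound (positive energy per particle) while SNM is bound, so their difference, the symmetry energy, is a positive quantity of a few tens of MeV near saturation density.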
Though it is impossible to remove every error arising from the various truncations and approximations, this thesis will focus on mitigating the effects of three truncation errors: the error that results from truncating the number of single-particle states in the basis, the error that results from truncating the number of particles and the volume of the infinite matter system, and the error that results from truncating the coupled cluster correlation operator, \hat{T}.

As discussed in Section 4.2, the number of single-particle states (M) in the calculation is truncated by truncating the total number of energy shells allowed in the system. This introduces an error in the calculation called the basis incompleteness error. The basis incompleteness error can be mitigated by increasing the number of single-particle states (or energy shells) in the calculation. Unfortunately, this also drastically increases the computational time and resources needed. Coupled cluster theory has polynomial time scaling with respect to M, which, while better than other many-body methods, still means that computational times can be prohibitive at large values of M. Additionally, since increasing the number of single-particle states in the system increases the number of matrix elements in the calculations, the computational resources required also increase.

While infinite matter systems theoretically contain an infinite number of particles in an infinite volume, these values must be truncated to finite values computationally [35]. This introduces an error in the calculations called the finite size error. Since the number of particles in the system and the volume of the system are related by a fixed density (\rho_0 = N/\Omega), it is sufficient to increase N towards the infinite limit while keeping the density constant.
However, the same problem occurs as when attempting to increase $M$, since coupled cluster theory also has polynomial time scaling with respect to the number of particles in the system; the power, however, tends to be higher for particles than for single-particle states. Additionally, matrix elements involving occupied states (or particles) are computationally more complex than those involving only unoccupied states. Therefore, adding more particles to the system increases the computational resources needed more than adding more single-particle states does.

As discussed in the previous chapter, the coupled cluster correlation operator is

\[ \hat{T} = \sum_{m=1}^{N} \hat{T}_m, \tag{4.23} \]

where $N$ is the total number of particles in the system and $\hat{T}_m$ represents the $m$-particle $m$-hole excitation operator. We can make this operator specific to infinite matter systems by making two changes. First, infinite matter systems contain infinitely many particles, so $N = \infty$. Second, the 1-particle 1-hole excitation operator, $\hat{T}_1$, will always be zero for infinite matter systems due to symmetry in the momentum. Therefore, we can write the cluster operator for infinite matter systems as

\[ \hat{T} = \sum_{m=2}^{\infty} \hat{T}_m. \tag{4.24} \]

Also, as discussed in the previous chapter, the correlation operator is truncated in practice by setting excitation operators above a certain level to zero. In this thesis, the coupled cluster results will either be calculated using coupled cluster doubles (CCD), where $\hat{T} \approx \hat{T}_2$, or approximations to coupled cluster doubles and triples (CCDT), where $\hat{T} \approx \hat{T}_2 + \hat{T}_3$. Since the full cluster operator is not used in the calculations, this does reduce the accuracy of the calculations. However, in this thesis, we will not be extrapolating to the complete coupled cluster calculation since this would require performing coupled cluster calculations at more truncation levels.
Instead, we will attempt to improve the accuracy of infinite matter calculations by predicting the CCDT result from the CCD result and the CCD result from the MBPT2 result, thus improving the accuracy while keeping the computational time and resources small.

However, in the infinite matter calculations, there are sources of error from approximations and truncations that are not addressed in this thesis but could be the subject of future work. Some of them will be discussed briefly here. First, the Coulomb interaction, the predominant interaction in the homogeneous electron gas, is a long-range interaction that acts over an infinite distance. Truncating the volume of the electron gas to a finite size also truncates the Coulomb interaction. It is important to note that this is a finite size error occurring in the HEG but not in infinite nuclear matter [3]. This means that studies of the HEG at finite sizes will face additional complications compared to studies of infinite nuclear matter [3]. In addition to truncating the coupled cluster correlation operator, not all $n$-body interactions are included in the calculation because the Hamiltonian is also truncated. In this work, calculations are limited to including, at most, the two-body or three-body matrix elements. However, the highest level of matrix elements contributing to the system is $N$-body, where $N$ is the total number of particles. Finally, while the Coulomb interaction, the primary interaction of the electron gas, has a simple closed-form representation that can be calculated precisely, the primary interaction in infinite nuclear matter is the nuclear interaction, which has no simple closed-form representation and cannot be represented exactly. Therefore, all nuclear interactions are approximations with various levels of accuracy compared to the real nuclear interaction.
4.7 Conclusion

In this chapter, we have explored the two systems of interest in this thesis: the homogeneous electron gas and infinite nuclear matter. We have also developed a single-particle basis for these calculations, which provides a convenient method for truncating the number of particles in the system and the number of single-particle states, as required due to computational limitations. We have also investigated two nuclear interaction models and seen some errors that can occur when trying to model an infinite system in a finite computational space. Up until this point, this thesis has focused on the physical framework necessary for understanding this work. We first developed a many-body framework to define our many-body systems. We then developed three many-body methods which can be used to find the energies of our systems, and finally, we explored the two systems of interest. The following two chapters will develop the needed computational framework: machine learning and the basics of the SRE method.

CHAPTER 5
BAYESIAN MACHINE LEARNING

5.1 Machine Learning

Machine learning is the field that sits at the intersection of data science and artificial intelligence; it is the science of programming a computer so that it can learn from a given data set [79]. Machine learning algorithms can solve problems without being programmed with task-specific instructions; machine learning encompasses a set of generic algorithms that can be applied to a variety of problems. Machine learning can be applied to a wide variety of problems, but it is most useful when traditional styles of programming cannot solve the problem or would take a very long time to do so. This includes problems where the traditional solution would require a long list of rules; machine learning can find patterns in the data set without problem-specific programming.
Machine learning also excels in problems where a large amount of complex data is difficult to sort and work with by hand or with traditional programming [79].

Machine learning algorithms can be classified into one of three categories: supervised, unsupervised, and reinforcement learning. In supervised learning, the training data given to the algorithm is labeled, meaning it has both an X and a y component. Supervised learning therefore aims to map every X to its corresponding y correctly. There are two types of supervised learning: classification (when y contains a finite number of possible values) and regression (when y contains an infinite number of possible values). All machine learning algorithms developed in this chapter are supervised learning algorithms. Unsupervised learning algorithms work with unlabeled data, meaning they only receive the X component of the data set. Common unsupervised learning tasks are clustering and dimensionality reduction. Finally, reinforcement learning describes a set of algorithms where an agent learns to solve a problem by maximizing its reward. Supervised and unsupervised learning are common in machine learning applications in physics, while reinforcement learning is less common in physics applications.

Machine learning has recently become a popular tool in physics and is being applied to a wide range of problems across all areas of physics. Looking specifically at neural networks, they have been used in nuclear physics to perform extrapolations (for example, References [52, 66-69, 140-142]) and to directly solve the many-body problem (for example, Ref. [53]). Neural networks can be problematic for ab initio applications due to their typically very small data sets but will be explored in this thesis as a possible machine learning method.
Machine learning can also be used in many-body physics to model the wavefunction of a system and find its variationally lowest energy [57]. More generally, a variety of machine learning algorithms have been used to predict the coupled cluster energies of molecules. The work presented in this thesis, though unique, has been inspired by these previous applications of machine learning in the field of many-body physics.

The remainder of this chapter is structured as follows. The following sections develop four standard supervised learning algorithms: linear regression, its related algorithms ridge and kernel ridge regression, and finally a brief discussion of neural networks and recurrent neural networks. Next, there is a discussion of Bayesian statistics to prepare the reader for the development of the final two machine learning algorithms: Bayesian ridge regression and Gaussian processes.

5.2 Linear Regression

Linear regression is one of the most basic machine learning algorithms. It is often the first algorithm studied in a machine learning class or textbook because it contains all of the essential elements of a supervised machine learning algorithm (a loss function, parameters, and optimization). One of the main reasons linear regression is still studied as a machine learning algorithm is that it has analytical expressions for its parameters. This contrasts with other supervised machine learning algorithms, such as neural networks, which use numerical algorithms to optimize their parameters. Furthermore, due to the analytical expressions for the parameters, linear regression will always return the same output if given the same input, again in contrast to other machine learning algorithms such as neural networks. This reproducibility makes linear regression and its related algorithms attractive for physical applications.
The output of the linear regression algorithm can be written as follows:

\[ \hat{y}_{Linear} = X\theta, \tag{5.1} \]

where $X$ is the input of the algorithm and $\theta$ is a vector of constant parameters, often called the weights. The "machine learning" part of the algorithm is tuning $\theta$ so that

\[ y_{True} = \hat{y}_{Linear}. \tag{5.2} \]

In machine learning, a loss function measures the error of the algorithm's prediction with respect to the real data. The loss function for the linear regression algorithm is the mean-squared error function:

\[ J_{Linear}(\theta) = \frac{1}{n}\sum_{i=0}^{n-1} (y_i - \hat{y}_{Linear,i})^2, \tag{5.3} \]

where $y = y_{True}$. The optimal values of $\theta$ are the ones that make the output of the linear regression algorithm the same as the actual data. Another way of phrasing this is that the optimal values of $\theta$ are the values that minimize the loss function. Thus linear regression becomes a simple minimization problem! Minimizing the loss function with respect to the parameters $\theta$ yields the optimal parameters, the values of $\theta$ that make the output of the linear regression algorithm as close to the actual data as possible:

\[ \theta_{Linear} = (X^T X)^{-1} X^T y. \tag{5.4} \]

These optimized values of $\theta$ can then be multiplied by new data not in the training set to make predictions.

5.3 Ridge Regression

Several regularized linear regression forms exist, including ridge, LASSO, and elastic net. This section focuses on ridge regression, where the base linear regression loss function is regularized by the L2 norm of the weights, $\theta$ [79, 143, 144]. The output of a ridge regression algorithm is the same as for linear regression:

\[ \hat{y}_{Ridge} = X\theta. \tag{5.5} \]

The main difference from linear regression comes with the loss function. The ridge regression loss function, shown in Equation (5.6), consists of the standard mean-squared error as the first term and a second term known as a regularization term. The regularization term is simply the L2 norm of the weights, $\theta$, scaled by a parameter $\lambda$, known as the strength of the regularization.
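Before continuing with ridge regression, the closed-form solution of Eq. (5.4) can be sketched in a few lines of Python; the data here is synthetic and illustrative:

```python
import numpy as np

# Minimal sketch of Eq. (5.4): theta = (X^T X)^{-1} X^T y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
true_theta = np.array([2.0, -1.0])
y = X @ true_theta                 # noiseless data, so theta is recovered exactly

# Solve the normal equations rather than forming the inverse (better numerics).
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)                       # ~ [2.0, -1.0]
```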
Note that in some sources, this scaling factor is shown as $\lambda/2$, or sometimes as $\alpha$:

\[ J_{Ridge}(\theta) = \frac{1}{n}\sum_{i=0}^{n-1} (y_i - \hat{y}_{Ridge,i})^2 + \lambda \sum_{i} \theta_i^2. \tag{5.6} \]

The optimal values of $\theta$ are again found via a simple minimization, which results in optimal values of $\theta$ that are close to $\theta_{Linear}$ but contain an additional term from the regularization:

\[ \theta_{Ridge} = (X^T X + \lambda I)^{-1} X^T y. \tag{5.7} \]

Ridge regression tends to be more accurate than linear regression due to the bias-variance trade-off. The addition of the regularization term to the loss function adds some amount of bias to the algorithm. This, in turn, causes a drop in the variance, leading to ridge regression being better able to generalize to data outside of its training set than linear regression. The regularization term also keeps the model's weights small, which can have computational benefits.

The value chosen for the hyperparameter $\lambda$ can significantly affect the values of the optimized $\theta_{Ridge}$ and the results of future predictions. For example, if $\lambda = 0$, then $\theta_{Ridge} = \theta_{Linear}$ (the ridge regression algorithm becomes linear regression because the regularization term is removed). On the other hand, if $\lambda$ is very large, then the weights will all be forced to be very small, and the resulting predictions will be a straight line through the mean of the data set. Since the results of a ridge regression model depend on the value of $\lambda$ chosen, hyperparameter tuning is used to choose an optimal value. In hyperparameter tuning, a ridge regression model is trained using the training data set and various values of $\lambda$. The algorithm's performance with each new value of $\lambda$ is checked against a validation data set, which can be a subset of the training set, the test set, or a separate data set. For a thorough hyperparameter tuning of $\lambda$, hundreds of different values are typically tested, spanning several orders of magnitude.
The value of $\lambda$ that is kept is the one that produces the lowest error when recreating the validation data set. While finding a value of $\lambda$ that produces an acceptable error in this manner is possible, it poses a few problems. First, there is a theoretical value of $\lambda$ that will minimize the loss function when applied to the validation data set. However, it is unlikely that this exact value will be found in a traditional hyperparameter tuning process, where the supplied values of $\lambda$ are tested by brute force. This leaves open the question of whether the value of $\lambda$ that performed best in the tuning process is the best value overall. Secondly, while the hyperparameter tuning process can be fast (depending on the number of values tested), it is an additional step that must occur with each new data set and thus adds time to the ML analysis. Lastly, hyperparameter tuning requires a validation data set, either taken from the training or test data set, making them smaller, or generated separately. While this may be fine for some machine learning applications, especially those with large data sets or where data is easy to generate, it is a significant drawback for many-body applications, since new data points can represent large computational costs in time and resources.

5.4 Kernel Ridge Regression

A regression algorithm related to ridge regression is kernel ridge regression (KRR) [62, 79, 143]. While it uses the same loss function as ridge regression, the inputs are altered using a kernel function. This is known as the kernel trick, and it adds non-linearity to the model, making it able to generalize to a broader range of data sets. KRR assumes that the output data, $y$, can be approximated by multiplying a set of weights, $\theta$, by values from a kernel function:

\[ \hat{y}_{KRR} = \sum_{i=0}^{n-1} \theta_i k(x, x_i), \tag{5.8} \]

where the points $x_i$ correspond to the training data, and $x$ is the point at which to produce a prediction.
The function $k$ is the kernel function and can take different forms depending on the application. Some common kernel functions are the polynomial kernel,

\[ k(x, y) = (\gamma x^T y + c_0)^d, \tag{5.9} \]

and the sigmoid kernel,

\[ k(x, y) = \tanh(\gamma x^T y + c_0). \tag{5.10} \]

Each of these kernel functions comes with hyperparameters: $\gamma$, $c_0$, and $d$ for the polynomial kernel and $\gamma$ and $c_0$ for the sigmoid kernel, for example. The user must set these hyperparameters before the algorithm is trained. Even the kernel function itself can be considered a hyperparameter, since the user chooses it and it affects the results of the trained algorithm. The loss function for KRR is the same as ridge regression's, a regularized form of the mean-squared error function where the regularization term is the L2 norm of the weights, $\theta$. Therefore, KRR still has the $\lambda$ hyperparameter. Just like for the previous algorithms, the optimized weights are found by minimizing the loss function with respect to the weights, yielding

\[ \theta_{KRR} = (K + \lambda I)^{-1} y, \tag{5.11} \]

where $K$ is known as the kernel matrix and is defined as

\[ K_{i,j} = k(x_i, x_j). \tag{5.12} \]

While KRR is better at generalizing than linear regression and ridge regression due to the inclusion of the kernel function, this does introduce significantly more hyperparameters into the algorithm. Furthermore, this drastically increases the complexity of the hyperparameter tuning process, since the kernel function and its hyperparameters must be tuned in addition to $\lambda$. Therefore, while KRR can produce the most accurate predictions of the three regression algorithms investigated here once trained, the requirements of its hyperparameter tuning are a significant drawback when total run time and data set size are restricted, as is the case for the research presented in this thesis.
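The closed-form solutions of Eqs. (5.7) and (5.11), together with a brute-force tuning of $\lambda$ as described above, can be sketched as follows; the data, kernel choice (the polynomial kernel of Eq. (5.9)), and hyperparameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(3 * X[:, 0])

# Ridge regression, Eq. (5.7): theta = (X^T X + lambda I)^{-1} X^T y.
lam = 0.1
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Kernel ridge regression with the polynomial kernel of Eq. (5.9).
def poly_kernel(A, B, gamma=1.0, c0=1.0, d=3):
    return (gamma * A @ B.T + c0) ** d

theta_krr = np.linalg.solve(poly_kernel(X, X) + lam * np.eye(len(X)), y)  # Eq. (5.11)
y_pred = (poly_kernel(np.array([[0.5]]), X) @ theta_krr)[0]               # Eq. (5.8)
print(y_pred)   # approximates sin(1.5)

# Brute-force tuning of lambda against a validation set, as described above.
X_val = rng.uniform(-1, 1, size=(10, 1))
y_val = np.sin(3 * X_val[:, 0])
best_lam, best_err = None, np.inf
for lam_try in 10.0 ** np.arange(-6, 2):
    alpha = np.linalg.solve(poly_kernel(X, X) + lam_try * np.eye(len(X)), y)
    err = np.mean((poly_kernel(X_val, X) @ alpha - y_val) ** 2)
    if err < best_err:
        best_lam, best_err = lam_try, err
print(best_lam, best_err)
```

Note that each candidate $\lambda$ requires refitting the model and scoring it on the held-out validation points, which is the extra cost the Bayesian methods later in this chapter avoid.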
5.5 Neural Networks

Neural networks are computational systems that can learn to perform tasks by considering examples, generally without being programmed with any task-specific rules. Another way to phrase this is that a neural network is a computational system that learns to match a given input to the correct output. They are a broad category of machine learning algorithms, including popular algorithms such as convolutional neural networks, recurrent neural networks, and deep learning.

Fully Connected Feedforward Neural Network

Though there are many types of neural networks, the phrase "neural network" on its own typically refers to a fully connected feedforward neural network (FFNN). This type of network can also be known as a multilayer perceptron (MLP) if it has at least one hidden layer. FFNNs contain many interconnected layers which transform the given input into an output [51]. The base unit of an FFNN is called a neuron, and these neurons are arranged into columns called layers [51]. Information in an FFNN moves only forward. Additionally, each neuron is connected to every neuron in the next layer, and there are no connections between neurons in the same layer. This means that the input to a layer in the neural network is simply the output of the previous layer. As we will see in a moment, each neuron receives a weighted sum of the outputs of all neurons in the previous layer.

5.5.0.1 Mathematics of Neurons and Layers

Each neuron is a mathematical function involving a column from a weight matrix, a scalar from a bias vector, and an activation function. We can represent the mathematical form of the $i$-th neuron in the first hidden layer as

\[ \hat{y}_i = f\Big(\sum_{j=1}^{M} w_{ij} x_j + b_i\Big), \tag{5.13} \]

where $x$ is the input to the neural network, $w$ is the weight matrix that scales the input of the neuron, and $b$ is the bias vector that ensures the neuron output is non-zero.
Finally, the function $f$ is known as the activation function, which adds nonlinearity to the neuron [51]. The weights and biases are free parameters (meaning they are set for the neural network during training), but the activation function is a hyperparameter [51]. For each neuron $i$ in the first hidden layer of a neural network, we can represent its mathematical form as

\[ \hat{y}_i^1 = f^1\Big(\sum_{j=1}^{M} w_{ij}^1 x_j + b_i^1\Big). \tag{5.14} \]

We can also write out the mathematical form for the entire first hidden layer as

\[ \hat{y}^1 = f^1(W_1 \mathbf{x} + \mathbf{b}_1). \tag{5.15} \]

Similarly, for the second hidden layer, we can represent the mathematical form of each neuron as

\[ y_i^2 = f^2\Big(\sum_{j=1}^{N} w_{ij}^2 y_j^1 + b_i^2\Big). \tag{5.16} \]

Note here that the weight matrix is no longer multiplied by the inputs to the neural network ($\mathbf{x}$) but rather by the output of the first hidden layer. This is because the input to the first hidden layer is the input to the neural network, but the input to the second hidden layer is the output of the first hidden layer. Therefore, we can expand the above equation to be more precise:

\[ y_i^2 = f^2\Big(\sum_{j=1}^{N} w_{ij}^2 f^1\Big(\sum_{k=1}^{M} w_{jk}^1 x_k + b_j^1\Big) + b_i^2\Big). \tag{5.17} \]

We can also write a mathematical form for the entire second hidden layer in matrix-vector form as

\[ \hat{y}^2 = f^2(W_2 \hat{y}^1 + \mathbf{b}_2), \tag{5.18} \]

or more explicitly as

\[ \hat{y}^2 = f^2(W_2 f^1(W_1 \mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2). \tag{5.19} \]

Finally, we can use the pattern we have developed to write down the equation for the output of a neuron on the $l$-th hidden layer of the neural network. The $i$-th neuron on the $l$-th layer can be described mathematically as

\[ y_i^l = f^l\Big(\sum_{j=1}^{N_{l-1}} w_{ij}^l y_j^{l-1} + b_i^l\Big), \tag{5.20} \]

more explicitly as

\[ y_i^l = f^l\Big(\sum_{j=1}^{N_{l-1}} w_{ij}^l f^{l-1}\Big(\sum_{k=1}^{N_{l-2}} w_{jk}^{l-1} y_k^{l-2} + b_j^{l-1}\Big) + b_i^l\Big), \tag{5.21} \]

and finally, fully expanded, as

\[ y_i^l = f^l\Big(\sum_{j=1}^{N_{l-1}} w_{ij}^l f^{l-1}\Big(\sum_{k=1}^{N_{l-2}} w_{jk}^{l-1}\big(\cdots f^1\big(\sum_{n=1}^{M} w_{mn}^1 x_n + b_m^1\big)\cdots\big) + b_j^{l-1}\Big) + b_i^l\Big). \tag{5.22} \]

We can also write a mathematical expression for the output of the entire $l$-th layer in matrix-vector format as

\[ \hat{y}^l = f^l(W_l \hat{y}^{l-1} + \mathbf{b}_l), \tag{5.23} \]

which can be expanded to

\[ \hat{y}^l = f^l(W_l f^{l-1}(W_{l-1} \hat{y}^{l-2} + \mathbf{b}_{l-1}) + \mathbf{b}_l), \tag{5.24} \]

and finally to

\[ \hat{y}^l = f^l(W_l f^{l-1}(W_{l-1}(\cdots f^1(W_1 \mathbf{x} + \mathbf{b}_1)\cdots) + \mathbf{b}_{l-1}) + \mathbf{b}_l). \tag{5.25} \]

It is a complicated expression and can grow very large, but it is a set equation that describes the output of a neural network with $l-1$ hidden layers and an output layer. So it is also possible to rephrase the definition of a neural network as an analytical function that maps a set of inputs to a set of outputs using a set of optimized parameters [51].

Activation Functions

The activation function is nonlinear. Its purpose is to allow the neural network to capture nonlinear patterns in the data set [51]. Without an activation function, a neural network could only produce an output that is a linear combination of the inputs. There are many choices for the activation function of the neurons, but standard activation functions are the sigmoid, the hyperbolic tangent function (tanh), and the rectified linear unit (ReLU) [51]. Some neural network architectures have some layers with no activation function. For example, it is common to have no activation function on the output layer so as not to constrain the range of values the output layer can produce.

Loss Functions and Training Neural Networks

A loss function determines how much the output from a neural network differs from the real/expected result. There is no set loss function used with neural networks, but two common loss functions are the mean-squared error loss function and the mean absolute error loss function.
The mean-squared error (MSE) loss function can be defined as

\[ J_{MSE}(W) = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \]

where $y$ is the true data set, $\hat{y}$ is the neural network prediction, $N$ is the number of data points in the set, and $W$ is the neural network's weights. The loss function depends on the weights of the neural network because changing the weights changes the network's output. The mean absolute error (MAE) loss function has a similar form:

\[ J_{MAE}(W) = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|. \]

A significant part of working with neural networks is a process known as training, where the weights of the neural network are optimized such that the loss function is minimized. This training process has two phases: the forward pass and backpropagation. The forward pass occurs when data is sent through the neural network (from the input layer to the output layer) to produce a predicted output. This predicted output is fed into the loss function with the actual data set to generate the loss value. After the forward pass comes backpropagation, where the error from the loss function is backpropagated through the layers of the neural network, and its weights are adjusted layer by layer so that the next forward pass will result in a reduced loss value. A simple way to optimize the weights of a neural network during backpropagation is through an optimization technique known as gradient descent. The weights of the neural network are adjusted by the derivative of the loss function with respect to the weights, scaled by a hyperparameter known as the learning rate:

\[ W = W - r_l \frac{\partial J(W)}{\partial W}. \]

The learning rate ($r_l$) is a number typically much less than one, and it is also a hyperparameter, so its value must be set before the neural network is run. The process of training a neural network involves many iterations of a forward pass followed by backpropagation.
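The training loop described above can be sketched for a one-hidden-layer network with a tanh activation, the MSE loss, and plain gradient descent; the architecture, data, and hyperparameter values below are illustrative, not those used in this thesis:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(-1, 1, 64).reshape(-1, 1)
y = np.sin(np.pi * X)

# One hidden layer with tanh (Eq. 5.15), linear output layer (Eq. 5.23).
W1 = rng.normal(scale=0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)
lr = 0.03    # the learning rate r_l, a hyperparameter

for step in range(5000):
    # Forward pass: produce a prediction from the current weights.
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2              # no activation on the output layer
    # Backpropagation of the MSE loss J(W) = mean((y - y_hat)^2).
    g = 2.0 * (y_hat - y) / len(X)   # dJ/dy_hat
    dW2 = h.T @ g
    db2 = g.sum(axis=0)
    dh = (g @ W2.T) * (1.0 - h**2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    # Gradient-descent update: W <- W - r_l * dJ/dW.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

loss = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
print(loss)   # far below the initial loss after training
```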
Typically, a training process will continue until a certain number of training iterations has been reached, or until the difference between the current loss value and the value from the previous iteration is below a certain threshold. However, neural networks should not be trained for an overly long time because this leads to overfitting, where the neural network learns to match the data set it is trained on very well (so it will show a small loss value) but loses all generality when given new data (so it will perform poorly on a new data set).

Recurrent Neural Networks

There are many different types of neural networks besides the basic FFNN, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) [51]. RNNs will be the focus of this section, as they are designed to handle ordered data, which may make them better at extrapolations than regular neural networks. Furthermore, recurrent neural networks can be used to analyze time series data; thus, they have typical applications in finance. In feedforward neural networks, data only flows in one direction, from the input layer to the output layer. The architecture of a recurrent neural network looks very similar to that of a feedforward neural network, except each neuron feeds its output back into itself in addition to sending it to the next layer. For every data point the RNN receives, each neuron receives both the input from the previous layer (using the same equations as a feedforward neural network) and its own output from the previous data point. This combination of inputs gives an RNN a memory, making it capable of predicting the future when analyzing time series data or acting as an extrapolator for other data sets. The main problem RNNs face is a vanishing or exploding gradient that can occur during the training process. It is possible to fix this problem using different types of RNN layers.
Common alternative RNN layers are long short-term memory (LSTM) layers and gated recurrent unit (GRU) layers.

A Discussion on Neural Network Hyperparameters

One of the drawbacks of neural networks is that they have many hyperparameters that need to be set by the user before the neural network can be trained. Examples of hyperparameters include the activation function, the learning rate, the use of dropout during the training process, etc. When using RNNs, the type of RNN layer (regular, LSTM, or GRU) adds another hyperparameter. Additionally, the number of layers in the neural network defines its depth, and the number of neurons in the hidden layers defines its width [51]. Both of these are also hyperparameters. A network's ability to find patterns in complex data is governed by the number of hidden layers and neurons the network contains, but too many neurons and layers make for a slow training process [51]. Thus, with the vast number of possible hyperparameter combinations present in neural networks, hyperparameter tuning can be difficult and time-consuming. Additionally, since neural network weights are initialized randomly, and the optimization of the weights is typically not performed with a closed-form optimization algorithm, a neural network will return a different result every time it is trained. This means that to ensure that a neural network has a good set of hyperparameters and gives accurate results reliably, it must be retrained many times and its results averaged to determine its actual performance. It is not uncommon in physical science applications, where producing reliable results is essential, to use a "forest" of 100 identical neural networks, taking the average of their results as the true answer and the standard deviation across all the predictions as the uncertainty of the result [51].
5.6 Bayesian Ridge Regression

The next two machine learning algorithms we will investigate belong to a family of algorithms called Bayesian machine learning, so called because they rely on Bayes' theorem and Bayesian statistics both to make predictions and to determine the uncertainty of those predictions. The main draw of using Bayesian machine learning algorithms over standard machine learning algorithms is that Bayesian algorithms determine their hyperparameters using Bayesian statistics to find the most likely values. They also produce uncertainties on their results, which is essential for machine learning applications in the physical sciences [79, 143, 145]. Therefore, an alternative to performing traditional ridge regression with hyperparameter tuning is to use Bayesian ridge regression, the Bayesian form of ridge regression where the training process sets the value of $\lambda$ [79, 143, 146, 147]. In Bayesian ridge regression, the output data, $y$, is assumed to be Gaussian distributed around $X\theta$:

\[ p(y|X, \theta, \lambda) = N(y|X\theta, \lambda). \tag{5.26} \]

The prior for the weights, $\theta$, is given by a spherical Gaussian, where $\beta$ is the precision (so $\beta^{-1}$ is the variance):

\[ p(\theta|\beta) = N(\theta|0, \beta^{-1} I), \tag{5.27} \]

where $I$ is the identity matrix. The priors over $\lambda$ and $\beta$ are assumed to be gamma distributions, and $\theta$, $\lambda$, and $\beta$ are estimated jointly when the model is fit. A Bayesian ridge regression analysis will result in different weights than a regular ridge regression analysis, but the Bayesian ridge regression algorithm is more robust. Additionally, since the value of $\lambda$ is set when the algorithm is trained, using Bayesian ridge regression eliminates the need for hyperparameter tuning and a validation data set, making it a better choice than ridge regression when time savings are essential and the size of the data sets is minimal.
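A minimal Bayesian ridge regression analysis might look like the following, here using scikit-learn's BayesianRidge as an assumed implementation; the data is synthetic and illustrative:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)

# The regularization strength and the noise level are estimated from the
# data during fitting, so no hyperparameter tuning or validation set is needed.
model = BayesianRidge().fit(X, y)
y_pred, y_std = model.predict(X[:1], return_std=True)
print(model.coef_)            # ~ [1.5, -2.0, 0.5]
print(y_std[0])               # uncertainty on the prediction
```

Note that `predict(..., return_std=True)` returns the prediction together with its uncertainty, which is the property exploited later in the SRE method.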
5.7 Gaussian Processes

Gaussian processes (GPs) are non-parametric models that use Bayesian statistics to provide a machine learning algorithm that can model complex relationships in the data, even if the predictions are based on uncertain information. GPs can be used for classification, but we will only consider regression applications in this thesis, meaning the GPs will learn to model a continuous function. GPs assume that the function to be modeled can be represented as a Gaussian distribution over the function values at every point [79, 143, 145]. Given a set of inputs $x$ and a set of outputs $y$, the GP defines a joint distribution over all outputs such that the following multivariate Gaussian distribution, $N$, is created:

\[ p(y|x) = N(\mu, \Sigma). \]

The vector $\mu$ contains the mean of each element of $y$ ($\mu_i = E[y_i]$), and $\Sigma$ is the covariance matrix with elements $\Sigma_{i,j} = k(y_i, y_j)$, where $k$ is known as the covariance or kernel function. The kernel function used in the GP analysis presented in this work is the rational quadratic (RQ) kernel, defined as

\[ k(x_i, x_j) = \left(1 + \frac{d(x_i, x_j)^2}{2\alpha l^2}\right)^{-\alpha}, \]

where $\alpha$ is a hyperparameter known as the scale mixture parameter, $l$ is a hyperparameter that sets the length scale of the kernel, and the function $d$ calculates the Euclidean distance between $x_i$ and $x_j$. The values of these hyperparameters are set during the training process. The rational quadratic kernel can model functions with a mixture of local and global smoothness. It was chosen as the kernel function for this analysis because it produced the lowest RMSE on the extrapolations compared to other kernel functions. GPs were chosen as a primary machine learning method because they can be thought of as analogous to a Bayesian implementation of kernel ridge regression (KRR). KRR relies on user-chosen hyperparameters, complicating the analysis process.
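A minimal sketch of GP regression with the RQ kernel follows, assuming scikit-learn as the implementation; the data below is an illustrative smooth, converging curve, not actual coupled cluster energies:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RationalQuadratic

X = np.linspace(0, 5, 20).reshape(-1, 1)
y = np.exp(-X).ravel()    # a smooth curve converging toward an asymptote

# alpha and l of the RQ kernel are set during fitting by maximizing the
# marginal likelihood -- there is no separate hyperparameter tuning step.
gp = GaussianProcessRegressor(kernel=RationalQuadratic(), normalize_y=True)
gp.fit(X, y)
y_pred, y_std = gp.predict(np.array([[6.0]]), return_std=True)
print(y_pred[0], y_std[0])    # extrapolated value with its uncertainty
```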
The values of these hyperparameters (one from the loss function, 2-5 from the kernel) are chosen through hyperparameter tuning, where many combinations of hyperparameters are tested to find the best combination. The drawback of this method is that it requires a validation data set for the hyperparameter tuning, in addition to the training data set and the test data set. Additionally, it was found that the data sets used in this work are sensitive to the values of the hyperparameters (a slight change in a hyperparameter value could lead to a significant change in the extrapolated value), so a quite thorough hyperparameter tuning would need to be performed to find the optimal values. In the context of this research, we are interested in time savings in addition to prediction accuracy. Therefore, the goal is to accurately generate the converged correlation energies with as few training points as possible and as fast as possible. Furthermore, eliminating the need to perform hyperparameter tuning saves a significant amount of time in the machine learning analysis and eliminates the need to generate a validation data set.

5.8 Conclusion

In this chapter, we have introduced machine learning and investigated different supervised learning methods. However, as they are commonly formatted, these algorithms do not make good extrapolators for many-body data sets. RNNs, which are built for extrapolations, require too much training data to be helpful in a many-body application (where generating data points is computationally expensive), and the hyperparameter tuning process associated with RNNs is a significant drawback. The other machine learning algorithms we investigated here are not built for extrapolation and thus perform poorly at it.
Thus, in the next chapter, we will create a method of formatting a data set, called sequential regression extrapolation (SRE), that takes a simple regression algorithm, such as ridge regression, and turns it into a powerful extrapolator.

CHAPTER 6 SEQUENTIAL REGRESSION EXTRAPOLATION (SRE)

6.1 Introduction to Methodology

Removing the basis incompleteness and finite size errors is a promising application of machine learning because it is naturally formatted as an extrapolation problem. Furthermore, the extrapolations are known to have asymptotic values for the correlation energies (i.e., the converged values), there is a simple pattern in the data for the machine learning to pick up, and the training data is already near the converged values (but still far enough away that performing the extrapolation increases the accuracy of the results) [51]. Attempting to remove basis incompleteness and finite size errors from many-body calculations with machine learning is not a novel idea. For example, in Ref. [51], the authors use a neural network to perform an extrapolation to remove the basis incompleteness error for a harmonic oscillator basis used in coupled cluster calculations of nuclei. However, the authors encountered the problems that would be expected when using neural networks for this application, including difficulty working with small data sets (leading to the need for interpolation before extrapolation) and a neural network's inability to create exactly reproducible results due to its training process [51]. Another example of using machine learning to remove truncation errors from many-body calculations can be found in Ref. [54]. We should point out that methods other than performing calculations at higher N or extrapolating exist to eliminate the finite size errors.
One of these options is to use twisted boundary conditions instead of periodic boundary conditions, which provide much more accurate estimates of the correlation energies [3].

6.2 Formulation

As discussed in the previous chapter, supervised machine learning algorithms require labeled training data, meaning that both the 𝑥 and 𝑦 components exist. A supervised machine learning algorithm therefore learns to approximate a function, 𝑓, during its training process such that 𝑓(𝑥) = 𝑦 for every value of x in the data set. However, supervised machine learning algorithms tend to fail when asked to make predictions outside their training range (for example, see Reference [51]). Put another way, supervised machine learning algorithms tend to make poor extrapolators and are thus not usually used for extrapolation applications. However, one type of supervised machine learning algorithm performs well at extrapolations; in fact, it was designed to make extrapolations on time series data. This algorithm is the recurrent neural network discussed in the last section. RNNs perform very well on extrapolations because their design inherently gives them some "memory" of previous data that they can use to predict new data. While RNNs may be a good choice for performing time series analysis and extrapolations, they have some significant drawbacks when applied to ab initio data sets. First, RNNs, like all neural networks, require a large amount of training data to make accurate predictions and avoid overfitting. While this is not a problem for many applications of neural networks, it is a significant drawback when applying any neural network to ab initio data sets. Since each new point in an ab initio data set can represent a significant investment of computational time and resources, these data sets are usually relatively small, especially by neural network standards [51].
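The failure of standard regressors outside their training range can be demonstrated with a toy model; the quadratic function and the specific tolerances below are assumptions for illustration only.

```python
# Sketch: a standard regressor does well inside its training range but
# fails outside it, illustrating why f(x)=y training makes a poor
# extrapolator. The quadratic toy function is an assumption.
import numpy as np
from sklearn.linear_model import Ridge

x = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = x.ravel() ** 2
model = Ridge(alpha=1e-3).fit(x, y)

inside = model.predict([[0.5]])[0]   # close to the true value 0.25
outside = model.predict([[3.0]])[0]  # far from the true value 9.0
```

The prediction inside the training range is reasonable, while the prediction at x = 3, well outside the training range, misses badly; this is the behavior the sequential formatting of the next sections is designed to overcome.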
One way around this is to use an interpolation algorithm to artificially increase the size of the data set, but this represents an additional step in the workflow and possibly an additional source of error in the analysis [51]. Thus, in this work, we want to avoid using interpolation as much as possible. Secondly, due to how their weights are initialized, neural networks are inherently random [51]. This means a neural network, including an RNN, will produce slightly different results each time it is trained, even when the same training data is used. This irreproducibility is a significant drawback when using neural networks to predict physical values. Additionally, quantifying the uncertainty of a neural network's prediction is an open research question. Finally, as mentioned in the previous chapter, neural networks generally have a large number of hyperparameters, including the number of hidden layers, the number of neurons per hidden layer, and the activation function, among many others. This leads to complicated and long tuning processes, dramatically increasing the runtime needed to perform an ML analysis. While RNNs are not a good choice of machine learning algorithm for this application, we can take inspiration from them and create a machine learning algorithm that can extrapolate. A standard machine learning algorithm is trained to take a point from the training set, 𝑥𝑖, and match it to its corresponding y value, 𝑦𝑖. Thus, training a standard machine learning algorithm involves learning the following relations:

𝑓𝑀𝐿(𝑥1) = 𝑦1, 𝑓𝑀𝐿(𝑥2) = 𝑦2, .... (6.1)

While this training pattern makes many supervised machine learning algorithms excellent at matching new inputs to the correct output, they typically only perform well in the range of data encompassed by the training data. However, there is a way of training RNNs that avoids this problem. This training pattern is typically used when RNNs perform time-series analysis, which is common in the financial industry.
Instead of learning the relationship between the 𝑥 and 𝑦 components of the data set, a time-series data formatting teaches the RNN to learn the pattern between a sequence of 𝑦 values and the next 𝑦 value in the training data. The training points in this form look as follows:

𝑓𝑅𝑁𝑁(𝑦𝑘−3, 𝑦𝑘−2, 𝑦𝑘−1) = 𝑦𝑘, (6.2)

making the RNN much better at extrapolating because it is trained to predict the next value in a sequence. Note that the input sequence could be of any length, thus adding another hyperparameter. The drawbacks of this form of training are that the data must be sequential (have an order to it) and must be evenly spaced with respect to the independent variable. Additionally, since the RNN only sees the 𝑦 component of the data, it may lose the information encoded in the 𝑥 component. Nevertheless, an RNN trained in this manner is an extremely powerful extrapolator. However, as explained earlier in this section, there are several drawbacks to using RNNs in this application. We can instead combine the training style of an RNN with a simpler machine learning algorithm to make a more straightforward but still powerful extrapolator. Furthermore, since none of the data in this thesis is time-dependent, instead of calling this style of formatting the data a time series, we will call it sequential formatting, as the data must be arranged sequentially. This gives rise to the name of the machine learning algorithm we are building to extrapolate many-body data sets, sequential regression extrapolation (SRE), so named because it combines sequential data formatting with a regression algorithm to create an extrapolator that can make predictions from small data sets. Thus, we will train a regression algorithm using the following format:

𝑓𝑅(𝑦𝑘−3, 𝑦𝑘−2, 𝑦𝑘−1) = 𝑦𝑘. (6.3)

The length of the input, now known as the SRE sequence length, can be any length.
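The sequential formatting of Equation 6.3 can be sketched as a small helper; the function name is illustrative, not from the thesis code.

```python
# Sketch: "sequential formatting" (Eq. 6.3). Only the y values are used;
# each window of seq_len consecutive values becomes an input and the
# next value is the target. The function name is illustrative, not from
# the thesis code.
import numpy as np

def sequential_format(y, seq_len=3):
    """Turn y = [y1, y2, ...] into (windows, targets) training pairs."""
    X, t = [], []
    for k in range(seq_len, len(y)):
        X.append(y[k - seq_len:k])  # (y_{k-3}, y_{k-2}, y_{k-1})
        t.append(y[k])              # y_k
    return np.array(X), np.array(t)

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X, t = sequential_format(y, seq_len=3)
# X has shape (3, 3): six points minus a sequence length of three,
# and t holds the targets [4., 5., 6.].
```

This makes the trade-off concrete: each unit of sequence length costs one training pair, which matters when the data set has only a handful of points.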
However, the larger the sequence length, the fewer total data points remain once the data has been formatted, so there is a balancing act between choosing a sequence length long enough to encode the sequential patterns in the data but not so long that very few points remain after the formatting. Now that we have developed the algorithm generally, we can apply it to remove the basis incompleteness and finite size errors arising from many-body calculations of infinite matter systems. The remainder of this section develops the specific SRE formalism used to remove these errors.

6.3 CCD Extrapolations With Respect to Number of Single Particle States

This section describes the SRE formulation used to remove the basis incompleteness errors from CCD calculations of infinite matter. Both Δ𝐸𝐶𝐶 and Δ𝐸𝑀𝐵𝑃𝑇 converge as M increases; therefore, there must be a large value of M where their ratio becomes a constant:

Δ𝐸𝐶𝐶,𝐿𝑎𝑟𝑔𝑒𝑀 / Δ𝐸𝑀𝐵𝑃𝑇,𝐿𝑎𝑟𝑔𝑒𝑀 −→ 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡. (6.4)

One of the machine learning algorithms described in the previous chapter will be used to find this constant value using only data collected at low values of M, where the ratio has not yet converged. The energies Δ𝐸𝐶𝐶 and Δ𝐸𝑀𝐵𝑃𝑇 are calculated for the training data set at small values of M. The exact values of M depend on the value of N and the system in the calculations; however, the largest value of M used across all training data in this thesis is 2,090 single-particle states. For each data set (constant N and 𝜌0/𝑟𝑠), the training data for the regression algorithm was created by dividing Δ𝐸𝐶𝐶 by Δ𝐸𝑀𝐵𝑃𝑇 at the same value of M:

𝑦 = Δ𝐸𝐶𝐶,𝑀 / Δ𝐸𝑀𝐵𝑃𝑇,𝑀 = (Δ𝐸𝐶𝐶,𝑀1/Δ𝐸𝑀𝐵𝑃𝑇,𝑀1, Δ𝐸𝐶𝐶,𝑀2/Δ𝐸𝑀𝐵𝑃𝑇,𝑀2, Δ𝐸𝐶𝐶,𝑀3/Δ𝐸𝑀𝐵𝑃𝑇,𝑀3, ...). (6.5)

Next, the machine learning algorithm, 𝑓𝑅, is trained on this data set using the SRE formulation developed earlier in this chapter.
The sequence length shown here is three data points, but depending on the analysis being done, it could range from one to three. This adds a hyperparameter to the analysis whose value will have to be set; in general, we want a smaller sequence length when there is less training data:

𝑓𝑅(Δ𝐸𝐶𝐶,𝑘−3/Δ𝐸𝑀𝐵𝑃𝑇,𝑘−3, Δ𝐸𝐶𝐶,𝑘−2/Δ𝐸𝑀𝐵𝑃𝑇,𝑘−2, Δ𝐸𝐶𝐶,𝑘−1/Δ𝐸𝑀𝐵𝑃𝑇,𝑘−1) = Δ𝐸𝐶𝐶,𝑘/Δ𝐸𝑀𝐵𝑃𝑇,𝑘. (6.6)

The SRE algorithm is then used to extrapolate this data set, point by point, until the ratio of correlation energies has converged. This value can then be taken as the slope of the graph created when Δ𝐸𝐶𝐶 is plotted as a function of Δ𝐸𝑀𝐵𝑃𝑇:

lim𝑘→∞ Δ𝐸𝐶𝐶,𝑘/Δ𝐸𝑀𝐵𝑃𝑇,𝑘 = 𝑠𝑙𝑜𝑝𝑒 = 𝑚. (6.7)

Finally, Δ𝐸𝑀𝐵𝑃𝑇,𝐿𝑎𝑟𝑔𝑒𝑀 is generated, a process that takes less than one second (not including the time to generate the matrix elements). This energy is multiplied by the slope, m, to approximate Δ𝐸𝐶𝐶,𝐿𝑎𝑟𝑔𝑒𝑀:

𝑚Δ𝐸𝑀𝐵𝑃𝑇,𝐿𝑎𝑟𝑔𝑒𝑀 ≈ Δ𝐸𝐶𝐶,𝐿𝑎𝑟𝑔𝑒𝑀. (6.8)

Due to convergence, the number of single-particle states used to calculate the large-M data sets varies depending on the system. It is important to note that the data sets used to train a machine learning algorithm in this work consist of 3-16 points each, making them some of the smallest used in physics applications of machine learning. Many machine learning algorithms need more points to be accurately trained, even up to 1-2 orders of magnitude more, as with some neural networks. For comparison, the ML-based process shown in Reference [62] can generate accurate molecular CCD correlation energies using only 12 data points with kernel ridge regression and a training process based on MP2 amplitudes. However, the MP2 amplitudes used in the training data have a very high dimensionality, which can increase the time needed for training.
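The workflow of Equations 6.5-6.8 can be sketched end to end; here Bayesian ridge regression stands in for 𝑓𝑅, and the ratio sequence is a synthetic curve that converges to a constant (the real ratios come from CCD and MBPT2 calculations, so all numbers below are assumptions).

```python
# Sketch of the SRE workflow of Eqs. (6.5)-(6.8), assuming Bayesian
# ridge regression as f_R and a synthetic ratio sequence that converges
# to a constant (real ratios come from CCD and MBPT2 calculations).
import numpy as np
from sklearn.linear_model import BayesianRidge

seq_len = 3
k = np.arange(1, 11)
ratios = 1.2 + 0.5 * np.exp(-0.5 * k)  # toy dE_CC/dE_MBPT, limit 1.2

# Sequential formatting (Eq. 6.6): windows of ratios predict the next.
X = np.array([ratios[i:i + seq_len] for i in range(len(ratios) - seq_len)])
t = ratios[seq_len:]
model = BayesianRidge().fit(X, t)

# Extrapolate the ratio until it converges; the limit is the slope m.
window = list(ratios[-seq_len:])
for _ in range(50):
    window.append(model.predict([window[-seq_len:]])[0])
m = window[-1]

# Eq. (6.8): approximate the converged CCD energy from the cheap
# converged MBPT2 energy (a placeholder value here).
dE_mbpt_large_M = -2.0
dE_cc_large_M = m * dE_mbpt_large_M
```

The extrapolation loop feeds each prediction back into the next input window, which is the mechanism that lets a plain regression algorithm walk the ratio out to its asymptotic value.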
Some other studies designed machine learning algorithms to use small data sets but still needed around 100 data points, if not more (see Reference [63] for example), and these data sets are usually artificially extended with interpolation (for example, see [51]). It is also of note that machine learning is typically only used to make extrapolations in exceptional cases, such as with recurrent neural networks; when asked to make predictions outside their training range, many machine learning algorithms perform poorly. However, it will be shown that SRE can make accurate extrapolations using only a small training set, making it a unique machine learning algorithm. There is, however, a drawback to the SRE method. Since it relies only on the 𝑦 component of the training data set, it assumes that the data is evenly spaced with respect to the 𝑥 variable and can only make predictions at the same spacing. This works well for this application, since "evenly spaced" means a calculation at every closed shell of unoccupied states, and extrapolations are made until the result converges (the exact 𝑥 value where this occurs is not essential for this study). However, this limitation does mean that the SRE method is not suitable for every application.

Error Analysis on Prediction

Since the slope, 𝑚, is calculated with Bayesian methods, the prediction has a measurable uncertainty. This can be transferred to an uncertainty on the predicted converged CCD correlation energy using the following scheme. Given that the converged CCD correlation energy is:

Δ𝐸𝐶𝐶 = 𝑚Δ𝐸𝑀𝐵𝑃𝑇, (6.9)

we can relate the uncertainties of all three quantities using Equation 6.10, where 𝛿𝑥 refers to the uncertainty associated with quantity 𝑥:

𝛿Δ𝐸𝐶𝐶/Δ𝐸𝐶𝐶 = 𝛿Δ𝐸𝑀𝐵𝑃𝑇/Δ𝐸𝑀𝐵𝑃𝑇 + 𝛿𝑚/𝑚. (6.10)

We will also assume that the uncertainty in the calculation of the MBPT2 correlation energy is zero, so Equation 6.10 simplifies to:

𝛿Δ𝐸𝐶𝐶/Δ𝐸𝐶𝐶 = 𝛿𝑚/𝑚. (6.11)

Now, Equation 6.11 is solved for 𝛿Δ𝐸𝐶𝐶, the uncertainty on the predicted converged CCD correlation energy, yielding:

𝛿Δ𝐸𝐶𝐶 = (Δ𝐸𝐶𝐶/𝑚)𝛿𝑚. (6.12)

However, this can be simplified using the following:

Δ𝐸𝐶𝐶 = 𝑚Δ𝐸𝑀𝐵𝑃𝑇 −→ Δ𝐸𝑀𝐵𝑃𝑇 = Δ𝐸𝐶𝐶/𝑚, (6.13)

yielding as a final equation for the CCD correlation energy uncertainty:

𝛿Δ𝐸𝐶𝐶 = Δ𝐸𝑀𝐵𝑃𝑇 𝛿𝑚. (6.14)

Finally, to present the results as the correlation energy per particle, as is typically done with infinite nuclear matter calculations, the uncertainty in the predicted converged CCD correlation energy per particle can be found by dividing both sides of Equation 6.14 by N:

𝛿(Δ𝐸𝐶𝐶/𝑁) = (Δ𝐸𝑀𝐵𝑃𝑇/𝑁)𝛿𝑚. (6.15)

6.4 CCD Extrapolations With Respect to Number of Particles

Removing the finite size error that arises from truncating the number of particles in the system is similar to removing the basis incompleteness error. We start with a data set created by dividing the converged CCD correlation energies by the number of particles in the system for small values of N, resulting in:

𝑦 = Δ𝐸𝐶𝐶𝐷^{𝑁𝑘}/𝑁𝑘 = (Δ𝐸𝐶𝐶𝐷^{𝑁1}/𝑁1, Δ𝐸𝐶𝐶𝐷^{𝑁2}/𝑁2, ...). (6.16)

Then we train a machine learning algorithm, using the sequential formatting developed in the last section, on the data set we have created:

𝑓𝑅(Δ𝐸𝐶𝐶𝐷^{𝑁𝑘−3}/𝑁𝑘−3, Δ𝐸𝐶𝐶𝐷^{𝑁𝑘−2}/𝑁𝑘−2, Δ𝐸𝐶𝐶𝐷^{𝑁𝑘−1}/𝑁𝑘−1) = Δ𝐸𝐶𝐶𝐷^{𝑁𝑘}/𝑁𝑘. (6.17)

The final step is to use the trained machine learning algorithm to extrapolate this ratio until convergence, resulting in the CCD correlation energy per particle in the thermodynamic limit:

lim𝑘→∞ Δ𝐸𝐶𝐶𝐷^{𝑁𝑘}/𝑁𝑘 = Δ𝐸𝐶𝐶𝐷^{∞}, (6.18)

where Δ𝐸𝐶𝐶𝐷^{∞} is the CCD correlation energy per particle in the thermodynamic limit. Additionally, since the prediction the machine learning algorithm makes is directly the result we are looking for, the uncertainty of that prediction does not need to be modified.
6.5 CCDT Extrapolations With Respect to Number of Single Particle States

To remove the basis incompleteness errors from calculations using the approximative triples methods, we will use the same method developed for removing the basis incompleteness errors from CCD calculations. Even though the MBPT2 correlation energies are at a comparatively lower level of approximation than the CCDT approximations, the SRE methods will still extrapolate well because both MBPT2 and CCDT converge as the value of 𝑀 increases, and they converge at roughly the same rate. So, fortunately, CCD and CCDT correlation energies can be predicted with the same methods. However, we will not perform extrapolations to the thermodynamic limit for the approximative triples calculations because we will use system sizes that already reduce the finite size effects. This will be explained in greater detail in Chapter 8.

6.6 Conclusion

In this chapter and the previous one, we have developed the computational framework needed for this thesis. First, we explored several supervised machine learning algorithms, and then we used these algorithms to develop the sequential regression extrapolation (SRE) algorithm. Though we have developed the SRE algorithm to work with any supervised machine learning algorithm, in practice we will restrict ourselves to three in the following two results chapters: ridge regression, Bayesian ridge regression, and Gaussian processes. Several reasons for not using neural networks or recurrent neural networks for this application have been given in this chapter and the previous one. Though kernel ridge regression does perform well as the machine learning algorithm in an SRE analysis, the large number of hyperparameters KRR has makes it unattractive. Many-body calculations are very time-consuming; thus, avoiding the need to generate a validation data set is optimal.
Furthermore, if we wish to develop a machine learning extrapolation algorithm that is both accurate and saves time, then the time costs of generating a validation data set and performing an extensive tuning process over many hyperparameters are unattractive. Thus, neural networks, recurrent neural networks, and kernel ridge regression are eliminated as possible algorithms. The following two chapters will apply the methods we have developed in this chapter to coupled cluster calculations of infinite matter. Chapter 7 will focus on calculations of the homogeneous electron gas, and Chapter 8 will focus on calculations of infinite nuclear matter; both pure neutron matter and symmetric nuclear matter will be analyzed.

CHAPTER 7 THE HOMOGENEOUS ELECTRON GAS RESULTS

The first infinite matter system we will analyze with the SRE method is the homogeneous electron gas (HEG), described in Chapter 4. The HEG, as one of the simpler infinite matter systems, is an excellent sandbox in which to fully develop the SRE method described in the last chapter. Figure 7.1 shows the results of computing the CCD correlation energy per electron for a HEG with 𝑟𝑠 = 0.5 at various numbers of electrons and with four different numbers of single-particle states. We can see that the calculation of Δ𝐸𝐶𝐶 depends heavily on the number of single-particle states in the system, and we can begin to see that the correlation energy converges as the number of single-particle states in the system increases. The calculations performed with only 514 single-particle states (or 15 shells) are the least accurate of the four plots in Figure 7.1 since they are performed with the fewest single-particle states. On the other hand, the calculations performed with 6,142 single-particle states (or 70 shells in total) are the most accurate of the four plots, and in fact, the convergence of Δ𝐸𝐶𝐶 has been confirmed at this point.
However, it is not always feasible to perform the calculations at this high number of single-particle states due to the associated high computational time and resource requirements.

Figure 7.1 Δ𝐸𝐶𝐶 per electron plotted against N with calculations performed at four different values of M. As M increases, the error in the calculation decreases, but the run time drastically increases

Figure 7.2 shows the total run time in node seconds for CCD calculations of the HEG with 𝑟𝑠 = 0.5; N = 66, 294, or 514; and various numbers of single-particle states. Node seconds (and node minutes, hours, and days) will be used to report computational run times in this thesis so that run times generated with different numbers of MPI nodes can be compared. A node second (or minute, hour, or day) is defined as the run time of the coupled cluster program (in seconds, minutes, hours, or days) times the number of MPI nodes used in the calculation. For the HEG, all calculations were performed using nodes from Michigan State University's high-performance computing center. Each node has an Intel Xeon processor with a reported clock speed of 2.4 GHz. Each coupled cluster program was run with four MPI nodes and 28 OpenMP threads. The specifications of the coupled cluster code used to perform the HEG and neutron matter calculations with the Minnesota potential shown in this section are described in Ref. [78]. In Figure 7.2, there is a consistent pattern in the data: more electrons and more single-particle states in a calculation require a longer run time. This is an expected result, and in fact, the CCD run times should scale polynomially with the number of particles and single-particle states allowed in the calculation. For example, a CCD calculation of the HEG should scale as 𝑂(𝑁⁴𝑀⁶), where typically M is much greater than N [38].
If we look at the results for N = 66 and assume that the converged CCD correlation energy occurs at M = 6,142 single-particle states (see Figure 7.1), the total run time for the CCD program is 653.88 node seconds (10.89 node minutes). However, generating the converged CCD correlation energy for N = 514 and M = 6,142 takes 42,791 node seconds (11.88 node hours). Thus, while generating the converged CCD correlation energies at a smaller number of particles results in reasonable computational run times, performing the same calculation at a large number of electrons is not feasible in most studies.

Figure 7.2 The total run time needed, in node seconds, to perform a CCD calculation on the HEG with 𝑟𝑠 = 0.5 and at N = 66, 294, and 514. The CCD calculations are performed at many values of M, as shown on the x-axis. A node second is defined as the total run time of the program times the number of nodes used to run the program. Four nodes from Michigan State University's high-performance supercomputer were utilized in this case. These nodes have Intel Xeon processors with a clock speed of 2.4 GHz

Another factor we can look at is the amount of RAM needed to perform the CCD calculations at various numbers of electrons and single-particle states. The amount of RAM needed should increase as both N and M increase since, naively, the Hamiltonian matrix for the system should be (M choose N) on each side. However, modern advancements limit the number of matrix elements that must be computed and stored. Figure 7.3 shows the total amount of RAM (in gigabytes) needed to perform CCD calculations for the HEG at 𝑟𝑠 = 0.5. There are three different numbers of electrons represented in the figure: N = 66 (red triangles), N = 294 (blue squares), and N = 514 (green circles). These numbers of electrons span the range of interest for this work. The number of single-particle states varied from less than 1,000 to over 6,000.
We see a consistent pattern in the data: the amount of RAM needed for the calculation increases as the number of electrons and single-particle states increases. The converged CCD correlation energy for 514 electrons (the highest number of electrons in this thesis) at 6,142 single-particle states takes a startling 283.79 GB of RAM to run. The converged CCD correlation energy at 66 electrons, a very moderate number of particles to have in a calculation, and 6,142 single-particle states still takes 106.49 GB of RAM. Therefore, not only is computing the converged CCD correlation energies expensive in the amount of computational time required, but the converged correlation energies also take a considerable amount of computational resources, requiring a supercomputer for these calculations to be feasible.

Figure 7.3 The total amount of RAM in gigabytes needed to perform CCD calculations on the HEG with 𝑟𝑠 = 0.5 and with 66, 294, or 514 electrons. Calculations were performed at various numbers of single-particle states (x-axis)

Thus, though increasing the number of single-particle states in a CCD calculation of the HEG increases the accuracy of the CCD correlation energy, it comes with high computational time and resource costs that can make these calculations infeasible. Therefore, we are motivated to develop a method by which the CCD correlation energies at a high number of single-particle states can be recreated using only calculations performed at a lower number of single-particle states, where the computational resource and time requirements are significantly more reasonable. In Figure 7.4, Δ𝐸𝐶𝐶𝐷 and Δ𝐸𝑀𝐵𝑃𝑇2 are plotted as functions of the number of single-particle states in the calculation for a HEG system with 𝑟𝑠 = 0.5 and 342 electrons. The CCD results are plotted in red, and the MBPT2 results are in blue.
The calculations up to 2,090 single-particle states are plotted with solid lines, and the converged correlation energies are plotted with dashed lines. It is much easier in this figure to see that the correlation energies converge as the number of single-particle states increases and that even at around 2,000 single-particle states, the correlation energies are not close to their converged values. However, the ranges of single-particle states shown in this figure with the solid lines are the calculations we wish to use to predict the converged CCD correlation energy, as at these ranges, the computational time and resource requirements are reasonable.

Figure 7.4 A plot of Δ𝐸𝐶𝐶 and Δ𝐸𝑀𝐵𝑃𝑇 versus M for a HEG system with N=342 and r𝑠=0.5. The converged correlation energies calculated at M=6,142 are shown with dashed lines

There are a few important things to note in Figure 7.4. First, there is a significant difference between the converged correlation energies, meaning that even though it is easier to generate Δ𝐸𝑀𝐵𝑃𝑇 than Δ𝐸𝐶𝐶, Δ𝐸𝑀𝐵𝑃𝑇 is not a good approximation to the true correlation energy. Second, though it takes 21,735.10 node seconds (6.03 node hours) to calculate Δ𝐸𝐶𝐶,6142 for N = 342, it only takes 5.12 node seconds to generate Δ𝐸𝑀𝐵𝑃𝑇,6142 for the same system. Thus, generating the converged MBPT2 correlation energies is significantly faster than generating the converged CCD correlation energies; the MBPT2 calculations are faster by four orders of magnitude. Finally, note that though the MBPT2 and CCD calculations converge to different values, they converge at roughly the same rate and display similar patterns in their convergence. This similarity in convergence was the inspiration for the SRE method. Next, we will take Figure 7.4 and reformat it such that Δ𝐸𝐶𝐶 is plotted as a function of Δ𝐸𝑀𝐵𝑃𝑇 (Figure 7.5). Each point is at a constant value of M in this format, and M increases from right to left.
In Figure 7.5, note that the graph is relatively linear, and in fact, it becomes more linear from right to left. Put another way, the graph of Δ𝐸𝐶𝐶 versus Δ𝐸𝑀𝐵𝑃𝑇 becomes more linear as M increases because, as M increases, both correlation energies converge. We can check this using Figure 7.6.

Figure 7.5 Figure 7.4 is transformed such that Δ𝐸𝐶𝐶 is plotted as a function of Δ𝐸𝑀𝐵𝑃𝑇 (such that each point has the same M). The resulting plot has a linear relationship that becomes stronger to the left (as M increases)

Figure 7.6 shows the ratio Δ𝐸𝐶𝐶/Δ𝐸𝑀𝐵𝑃𝑇 as a function of the number of single-particle states in the calculations for three different numbers of electrons for calculations of the HEG at 𝑟𝑠 = 0.5. The ratio Δ𝐸𝐶𝐶/Δ𝐸𝑀𝐵𝑃𝑇 is the slope of the line in Figure 7.5. While the slopes change rapidly at lower values of M, by the end of the graph, all of the slopes are almost constant, though the slopes at higher numbers of electrons converge more slowly than the slopes at lower numbers of electrons. However, this graph does verify that the slopes become constant, and the SRE method should work if we can accurately predict this slope.

Figure 7.6 A plot of the ratio of Δ𝐸𝐶𝐶 to Δ𝐸𝑀𝐵𝑃𝑇 as a function of M for N=114, 342, and 514 and r𝑠=0.5. As M increases, the ratio (or the slope of the line in Figure 7.5) becomes increasingly constant. However, it is important to note that the run time increases drastically as M increases

Traditional Extrapolation Methods

Before we begin to develop in full the SRE method for determining the converged CCD correlation energies with respect to the number of single-particle states in the calculation, let us look at two traditional methods for removing the basis incompleteness error and two methods that may work based on the analysis from the last section. The most common way to deal with a basis incompleteness error is to truncate the number of single-particle states at some high number.
Figure 7.7 displays the results of truncating the calculations at 20 open shells, corresponding to between 922 and 2,090 single-particle states depending on the number of electrons in the calculation for a HEG with 𝑟𝑠 = 0.5. This graph, plotted in orange, is compared to the correlation energy calculation performed at 70 total energy shells (M = 6,142) in black, which represents the fully converged results where the basis incompleteness error is eliminated. Of course, one way to negate the basis incompleteness error is to perform calculations at higher numbers of single-particle states. However, as we saw in the last section, this is not necessarily feasible due to the high computational time and resource requirements; therefore, simply increasing the number of single-particle states in the system is not an attractive option for removing the basis incompleteness error. Another common method is to find the correlation energy in the complete basis limit by using a power law of the form:

Δ𝐸𝐶𝐶𝐷(𝑀) = 𝐸𝐶𝐶𝐷,∞ + 𝐴𝑀^(−𝛼), (7.1)

where 𝐸𝐶𝐶𝐷,∞ (the correlation energy in the complete basis limit) and 𝐴 are fit to data from correlation energy calculations at several different values of M. The value of 𝛼 is typically fixed to 1, so this method is generally referred to as a 1/M power law [38]. The results of performing this power law fit on a HEG system with 𝑟𝑠 = 0.5, using 16 data points per number of electrons (from 5 open shells to 20 open shells), are shown in Figure 7.7 in blue. We can see that this power law fit overestimates the HEG correlation energies by about the same margin that the truncation at 20 open shells underestimates them. This indicates that, for the data available, neither of these traditional methods of removing the basis incompleteness error is an attractive option.
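A minimal sketch of the 1/M power-law fit of Equation 7.1, with 𝛼 fixed to 1 and scipy.optimize.curve_fit doing the fitting; the synthetic energies below are an assumption for illustration, not thesis data.

```python
# Sketch: the 1/M power law of Eq. (7.1) with alpha fixed to 1, fit with
# scipy.optimize.curve_fit on synthetic correlation energies (real fits
# use CCD energies at several values of M).
import numpy as np
from scipy.optimize import curve_fit

def power_law(M, E_inf, A):
    return E_inf + A / M  # Eq. (7.1) with alpha = 1

M = np.array([514.0, 922.0, 1598.0, 2090.0])
E = -1.0 + 50.0 / M       # toy data with complete-basis limit -1.0

params, cov = curve_fit(power_law, M, E, p0=[-1.0, 1.0])
E_inf, A = params         # E_inf estimates the complete-basis limit
```

On real CCD data the fitted E_inf is only as good as the assumed 1/M form, which is the source of the overestimation seen in Figure 7.7.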
The power law does not provide a good fit and overestimates the correlation energies; and while truncating the basis at a large number of single-particle states can provide an accurate result, the computational time and resource requirements make this option infeasible for many systems. Next, we can look at two methods inspired by the analysis performed in the previous section. In Figure 7.5, the graph is reasonably linear if we plot the CCD correlation energies as a function of the MBPT2 correlation energies. The first approach to calculating the converged CCD correlation energies from this plot is to take the CCD and MBPT2 calculations at a relatively high number of single-particle states, create the ratio 𝑚 = Δ𝐸𝐶𝐶𝐷/Δ𝐸𝑀𝐵𝑃𝑇2, and then multiply m by the MBPT2 correlation energy at a very high number of single-particle states to approximate the CCD correlation energy at that same high number of single-particle states (here M = 6,142): Δ𝐸𝐶𝐶𝐷,6142 ≈ 𝑚Δ𝐸𝑀𝐵𝑃𝑇2,6142. For this investigation, we derived the slope from calculations performed at 20 open energy shells (the same truncation level we used earlier) and generated a prediction for the CCD correlation energy at 6,142 single-particle states. The results of this analysis are shown in green in Figure 7.7. We can see that this is a much better approximation of the converged CCD correlation energies than the previous two attempts. However, there are still significant deviations at higher numbers of electrons. Therefore, while this is an improvement, we will seek a better method to accurately predict the converged correlation energies at all numbers of electrons, including the high values. As a final method, we will use linear regression to fit a subset of the data shown in Figure 7.5.
It should be noted that even though linear regression is quite a simple machine learning algorithm, this is our first machine learning approach to removing the basis incompleteness error from these calculations. To train the linear regression algorithm, we will use 16 pairs of MBPT2 and CCD correlation energies drawn from calculations between 5 and 20 open shells (the exact number of single-particle states depends on the number of electrons). The linear regression algorithm is first trained and then asked to predict ΔE_CCD,6142 given ΔE_MBPT2,6142. It is important to remember that generating MBPT2 correlation energies is a fast computation. The results of this analysis are shown in red in Figure 7.7. These results are similar to the previous slope analysis: the correlation energies at lower numbers of electrons are well matched, but there are deviations at higher numbers. We could improve this result by using training data drawn from higher values of M. However, like the other methods shown here, that is an unattractive solution due to the increased computational time and resource requirements. Thus, the two traditional extrapolation methods and the two new methods we have examined all fail to satisfactorily remove the basis incompleteness errors without requiring data drawn from calculations with a much higher number of single-particle states. We are therefore motivated to fully develop the SRE method.

Ridge Regression

Now that we have looked at performing the basis completeness extrapolations with traditional methods, we can develop a machine learning-based approach. A good machine learning algorithm will have a small number of hyperparameters to tune and give reproducible and accurate results for the extrapolation. Ridge regression has only one hyperparameter and a closed-form solution for its optimized weights.
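The linear-regression approach described above can be sketched with Scikit-Learn's `LinearRegression`. The training pairs below are synthetic and constructed to be exactly linear; the real pairs come from truncated-basis calculations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training pairs: MBPT2 and CCD correlation energies from
# truncated bases (synthetic, exactly linear by construction).
dE_mbpt2 = np.linspace(-1.0, -1.3, 16).reshape(-1, 1)
dE_ccd = 0.8 * dE_mbpt2.ravel() + 0.01

model = LinearRegression().fit(dE_mbpt2, dE_ccd)

# Predict the CCD energy from a cheap MBPT2 calculation at M = 6,142
# (the MBPT2 value here, -1.36, is a placeholder).
dE_ccd_6142 = model.predict([[-1.36]])[0]
```

Since the map from MBPT2 to CCD energies is only approximately linear in practice, the fit extrapolates poorly when the target basis size lies far beyond the training range, which is the failure mode discussed above.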
Therefore, it will always train the same way when given the same training data, making it a much more attractive alternative to neural networks, with their many hyperparameters.

Figure 7.7 The converged CCD correlation energies (black) calculated at M = 6,142 for an HEG with r_s = 0.5. Also displayed are the results of predicting the converged correlation energies with a truncation at 20 open shells (yellow), which has a percent error of 6.87%; using a 1/M power law (blue), which has an average percent error of 7.00%; using the CCD-to-MBPT2 slope calculated at 20 open shells (green), which has an average percent error of 1.16%; and using linear regression to model the graph from Figure 7.5 (red), which has an average percent error of 0.95% and unrepeatable results.

The ridge regression algorithm contains one hyperparameter, α, which is the strength of the regularization. Since this is a hyperparameter, the user must set its value before training the ridge regression algorithm. However, its value can significantly impact the results of the ridge regression. For example, Figure 7.8 shows the results of using the SRE algorithm with ridge regression and various α values to predict the converged ΔE_CC energies of the HEG with r_s = 0.5. For Figure 7.8, the training data was chosen to be between 5 and 20 open shells (inclusive), or 114 ≤ M ≤ 2,090 (exact values of M depend on N). The sequence length for the SRE algorithm was set to 3, and the data set was extrapolated until there were 50 total points and the convergence of the slope was confirmed. The value of ΔE_CC we get from this process is considered converged. Therefore, we will compare it to converged calculations of the HEG at M = 6,142, where the correlation energy has converged. In Figure 7.8, the values of ΔE_CC,6142 are plotted with scattered square points. These are considered to be the "true" or most accurate results.
The solid lines show the results of performing the SRE algorithm to converge the correlation energy with respect to the number of single-particle states, using a variety of values for the hyperparameter α.

Figure 7.8 This figure shows the extrapolations' dependence on the hyperparameter α: the results of performing an SRE analysis on the HEG with r_s = 0.5 at various numbers of electrons. The machine learning algorithm used for the SRE algorithm is ridge regression.

Note that several of the values of α produce essentially equivalent results, so there are not six distinct lines. For α = 10^-10 and 10^-5 (which provide nearly identical results), the predictions are very close to the fully calculated converged correlation energies, with an average percent error over all numbers of electrons of 0.49%. On the other hand, α = 1 and α = 100 provide results that visually appear to be suitable matches for the expected pattern in the data but do not match the converged calculations. If the fully converged calculations were not available for comparison, these could easily be mistaken for reasonable predictions. For α = 1, the average percent error over all numbers of electrons is 3.09%, and for α = 100, the average percent error is 3.87%. Not shown in Figure 7.8 are the predictions for α = 0.01, because the average percent error between the SRE algorithm's predictions and the actual data for this α value is 3.45x10^6 %. While this is an undeniably incorrect result even without the actual data available for comparison, it shows that wide variations in predictions can occur as α varies: α values both larger and smaller than 0.01 provide reasonable predictions, but α = 0.01 provides predictions that do not converge. A hyperparameter is typically set through a process known as hyperparameter tuning. In hyperparameter tuning, many different values are tested for the hyperparameters to find the best ones.
First, the algorithm is defined using a given set of hyperparameters and trained using the training data; then, its accuracy is tested using a validation data set. The validation data set can be a subset of the training data, the test data, or a new data set altogether. This define-train-test cycle is repeated with many combinations of hyperparameters, and the hyperparameters that produce the lowest error on the validation data set are kept for future use. For the HEG, we have data at five different values of r_s, so it was decided to choose one value of r_s for hyperparameter tuning. For the chosen value of r_s, not only does the SRE training data need to be generated, but the converged correlation energies at M = 6,142 also need to be generated, which is a time-consuming process. When performing the hyperparameter tuning on α, 1,000 different values were tested, evenly spaced on a log scale between 10^-10 and 100 (inclusive). For each value, a ridge regression algorithm was defined with that value of α and used for an SRE analysis with the same training process described above. The predicted values of ΔE_CC were then compared to the fully converged calculations, and the value of α that produced the lowest average percent error over all numbers of electrons was kept. This process was performed independently with all five r_s data sets to see how the results compare. When the hyperparameter tuning is performed using the data sets with 0.1 ≤ r_s ≤ 0.75, the optimal α value found is 10^-10. However, when the hyperparameter tuning is performed on the data set at r_s = 0.05, the optimal α value is found to be 4.27x10^-4. When α = 10^-10, the average percent error over all r_s values is 0.45%, which is a very acceptable error for a prediction made with unconverged data.
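The grid search described above can be sketched as follows, with synthetic linear data standing in for the SRE ratio sequences; in the real search, each candidate α was scored against the converged M = 6,142 energies rather than a held-out slice of synthetic data.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Sketch of the alpha grid search: 1,000 values evenly spaced on a log
# scale between 1e-10 and 1e2, scored on a validation set. The data is
# synthetic; the validation set here is a subset of the training data,
# one of the allowed choices mentioned in the text.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.5, -0.5, size=(16, 2))
y_train = X_train @ np.array([0.3, 0.5]) + 0.02
X_val, y_val = X_train[:4], y_train[:4]

best_alpha, best_err = None, np.inf
for alpha in np.logspace(-10, 2, 1000):
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    # Average percent error on the validation set.
    err = np.mean(np.abs((model.predict(X_val) - y_val) / y_val)) * 100
    if err < best_err:
        best_alpha, best_err = alpha, err
```

Each of the 1,000 define-train-test cycles is cheap; the expense of the real procedure lies entirely in generating the converged validation energies, which is the drawback the following paragraphs quantify.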
However, when α = 4.27x10^-3, the average percent error over all r_s values is 7.5x10^11 %, because some of the predictions do not converge. This is visually an obviously wrong prediction, but as seen in Figure 7.8, values of α could be chosen that provide seemingly good but incorrect answers. Moving forward with the ridge regression analysis, we will use α = 10^-10. Figure 7.9 shows the results of performing an SRE analysis on the HEG data set using a ridge regression algorithm with α = 10^-10. This value was chosen because it produces the smallest validation error when performing hyperparameter tuning with the r_s = 0.5 data set.

Figure 7.9 The results of performing an SRE analysis on the HEG to predict the converged CCD correlation energies for 14 different numbers of electrons and five different values of r_s. The SRE predictions, made using a ridge regression algorithm with α = 10^-10, are shown with the triangular markers, and the solid lines represent the full correlation energy calculations at M = 6,142. The average percent error between the SRE predictions and the full calculations is 0.45%.

The SRE training procedure was the same as described earlier in this section. The average percent error across all data points in Figure 7.9 is 0.45%, which is quite small. This is an acceptable level of accuracy for the machine learning predictions, especially considering the potential computational time and resource savings. The time needed to generate the training data for the hyperparameter tuning process (i.e., the correlation energies from 5 to 20 open shells for r_s = 0.5) was 27.52 node hours. However, the time needed to generate the validation data set (i.e., the correlation energies at M = 6,142 for r_s = 0.5) was 42.6 node hours, meaning the total time needed to generate the data for the hyperparameter tuning was 70.12 node hours; considerable time is thus spent just finding the optimal value of α.
If we consider performing the SRE analysis on all five HEG data sets, the time needed to generate all training data is 153.28 node hours. Compared to the 219.44 node hours needed to generate all of the ΔE_CC,6142 values, this leads to a time savings for the SRE method of 66.16 node hours, or over half a day of computational time saved over doing the complete calculations, with no loss of relevant accuracy since the average percent error is less than 0.5%. However, when we add the time needed to generate the validation data set to tune α, the time savings drops to only 23.56 node hours, a much less attractive savings. Though the SRE algorithm with ridge regression can accurately recreate the converged ΔE_CC for the electron gas at various high densities, the results also display a significant dependence on the value of the hyperparameter α. Because of this dependence, if the fully calculated converged ΔE_CC were not available for comparison, as they are in this work, it could be easy to choose a value of α that creates a set of converged ΔE_CC which visually look feasible but are not accurate (see Figure 7.8 for examples). Furthermore, the high computational time and resource requirements needed to generate the validation data set for the hyperparameter tuning process are unattractive. Therefore, we wish to develop a method that can provide the accuracy displayed in Figure 7.9 without reliance on hyperparameters. Thus, we will investigate the Bayesian machine learning implementation of ridge regression: Bayesian ridge regression.

Bayesian Ridge Regression

Bayesian ridge regression, the Bayesian machine learning implementation of ridge regression, should be a better choice of machine learning algorithm for this SRE analysis. Since it is a Bayesian algorithm, it uses Bayesian statistics to find a good value for the hyperparameter α.
This means that we do not need to perform hyperparameter tuning or generate a validation data set. Therefore, using Bayesian ridge regression, the only CCD calculations that need to be performed are those for the training data, meaning this method should result in significant time savings between the SRE results and computing the fully converged energies. Additionally, since this is a Bayesian algorithm, it provides an uncertainty on its predictions; hence, we can get an uncertainty on our converged CCD correlation energy prediction. This section attempts to predict the CCD correlation energy at M = 6,142 single-particle states (the converged correlation energy) using only the CCD and MBPT2 correlation energies from 5-20 open shells and the MBPT2 correlation energy at M = 6,142. The exact values of M used in calculating the training data depend on the number of particles in the system but range from 66 to 2,090 single-particle states. In each of these calculations, the number of single-particle states is not enough to converge the CCD correlation energy; thus, all of the training data suffers from basis incompleteness errors. Here we will be looking at calculations performed at five different values of r_s, so these systems correspond to a high-density electron gas. We use the Bayesian ridge regression implementation provided by the Python library Scikit-Learn. The SRE sequence length was two, meaning the Bayesian ridge regression algorithm took two inputs (the previous two ratios of CCD correlation energy to MBPT2 correlation energy). Once the algorithm was trained using the 16 training points, the SRE algorithm was used to extrapolate the ratio until it was converged (resulting in a data set of 50 total points). The final ratio was extracted and multiplied by ΔE_MBPT2,6142 to approximate ΔE_CCD,6142. The results of this analysis on 14 numbers of electrons and five different r_s values are shown in Figure 7.10.
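A minimal sketch of this SRE loop with Scikit-Learn's `BayesianRidge` is shown below, assuming a synthetic, smoothly converging sequence of CCD-to-MBPT2 ratios in place of the real ratios computed from 5-20 open shells.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Hypothetical training ratios: a smoothly converging stand-in for the
# 16 CCD/MBPT2 ratios from truncated-basis calculations.
ratios = 0.85 + 0.3 * np.exp(-0.4 * np.arange(16))

seq_len = 2  # sequence length: predict ratio i from ratios i-2 and i-1
X = np.array([ratios[i:i + seq_len] for i in range(len(ratios) - seq_len)])
y = ratios[seq_len:]

model = BayesianRidge().fit(X, y)

# Recursively extrapolate the ratio sequence until it has 50 points.
series = list(ratios)
while len(series) < 50:
    nxt, std = model.predict([series[-seq_len:]], return_std=True)
    series.append(nxt[0])

# Scale the cheap large-basis MBPT2 energy (a placeholder value here)
# by the converged ratio to estimate the converged CCD energy.
dE_mbpt2_6142 = -1.36
dE_ccd_6142 = series[-1] * dE_mbpt2_6142
```

The `return_std=True` flag is what supplies the Bayesian uncertainty on each extrapolated point; propagating it through the recursion is one way to obtain the shaded uncertainty bands described for Figure 7.10.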
Figure 7.10 plots the converged CCD correlation energies calculated at M = 6,142 with solid lines. Calculations were performed at 14 different numbers of electrons and five values of r_s; each color in the figure represents a different value of r_s. The SRE predictions for each correlation energy are shown with the triangular markers, and the shaded regions represent the uncertainty in the predictions, which comes from the Bayesian ridge regression algorithm. The average percent error for the entire graph is 0.47%. It should be noted that this is essentially the same result we achieved from the ridge regression analysis with the best value of α. Furthermore, the time needed to generate the SRE training data for all predictions was 200.44 node hours, while the total time needed to generate all energies at M = 6,142 is 289.43 node hours. This leads to a time savings of 88.99 node hours, or over 3.5 days of computational time saved with no loss of relevant accuracy. So not only can the SRE method with Bayesian ridge regression accurately predict the converged CCD correlation energies for the HEG across a variety of values of N and r_s, but it can also produce uncertainties on its predictions and result in considerable time savings over performing the complete calculations.

Figure 7.10 The results from performing an SRE extrapolation in terms of M for various numbers of electrons and values of r_s. Results are plotted against ΔE_CC,6142.

Finally, we can compare the performance of the SRE algorithm to the traditional extrapolation methods we analyzed in Figure 7.7. For the HEG at r_s = 0.5, if we add the SRE predictions to the others, we get Figure 7.11. We can see that the SRE predictions are much closer than any other method and provide a reasonable approximation to the converged correlation energies, even at high numbers of electrons. In this section, we have developed the first Bayesian implementation of the SRE method.
It has the accuracy of the ridge regression implementation developed in the previous section but without the ambiguity of choosing the algorithm's hyperparameters. Additionally, Bayesian ridge regression provides uncertainties on its predictions, which are helpful in a scientific context. Overall, the SRE method with Bayesian ridge regression is the most promising of the methods attempted so far to remove the basis incompleteness errors.

Gaussian Process with Rational Quadratic Kernel

As a final study for the HEG, let us use the Gaussian process (GP) algorithm, another Bayesian machine learning algorithm, as the machine learning component in an SRE analysis.

Figure 7.11 Here, we have recreated Figure 7.7 but have added the SRE results, which have an average percent error of 0.31%.

The kernel function will be a modified rational quadratic kernel, consisting of a constant kernel multiplied by a rational quadratic kernel and then added to a white kernel. All kernels and the GP algorithm are implemented using the Python library Scikit-Learn. As a first study, we will use the same training data set as before: 16 training points taken from between five and 20 open shells. We will use an SRE sequence length of three for this training. The only other parameter we set is the α value of the GP algorithm, which we set to δy_train^4, where δy_train is the standard deviation of the y component of the training data set. Using the standard deviation when setting α seems to keep the resulting uncertainties small but has little effect on the prediction results. Figure 7.12 shows the results of this analysis on the same data set used in the Bayesian ridge regression section. The average percent error between the SRE predictions for the converged correlation energies and the fully calculated correlation energies at M = 6,142 is 0.73%.
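The kernel construction described above can be sketched with Scikit-Learn's kernel classes. The training sequence below is synthetic, and tying the GP's `alpha` to a power of the target standard deviation follows the text; the exact exponent is an assumption taken from the description above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (ConstantKernel,
                                              RationalQuadratic, WhiteKernel)

# Synthetic stand-in for the 16 training ratios; sequence length of three
# as described in the text.
y_train = 0.85 + 0.3 * np.exp(-0.4 * np.arange(16))
seq_len = 3
X = np.array([y_train[i:i + seq_len] for i in range(len(y_train) - seq_len)])
y = y_train[seq_len:]

# Constant kernel times rational quadratic kernel, plus a white kernel;
# the kernels' own hyperparameters are optimized during fit().
kernel = ConstantKernel() * RationalQuadratic() + WhiteKernel()
alpha = np.std(y) ** 4  # assumption: delta_y_train**4, as stated in the text
gp = GaussianProcessRegressor(kernel=kernel, alpha=alpha,
                              normalize_y=True).fit(X, y)

# Predict the next ratio in the sequence, with its uncertainty.
pred, std = gp.predict([y_train[-seq_len:]], return_std=True)
```

Unlike ridge regression's α, the kernel hyperparameters here (length scale, scale-mixture parameter, noise level) are optimized internally by maximizing the marginal likelihood during `fit`, which is what removes the manual tuning step.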
While this is larger than the error resulting from the Bayesian ridge regression analysis, it is still quite small and acceptable. Furthermore, it is essential to note that the uncertainties here, represented by the shaded regions, are significantly smaller than those on the Bayesian ridge regression plot. They are so small that in places they are hidden entirely by the lines!

Figure 7.12 The results of performing an SRE analysis with Gaussian processes to predict the converged correlation energies for the HEG at different values of N and r_s. The correlation energies calculated at M = 6,142 are plotted with a solid line and are taken to be the true results, the SRE predictions are plotted with triangular markers, and the shaded region represents the uncertainty on the SRE predictions.

As a final analysis with the SRE method on the HEG, we will again use GP as the machine learning algorithm but will reduce the number of training points from 16 to only ten. Here we will use data collected from five to 14 open shells, thus removing the six training points with the highest numbers of single-particle states (and thus the highest computational costs). We have also reduced the SRE sequence length to 1, as there are now few training points. The results of performing this analysis are shown in Figure 7.13, using the same plotting scheme as Figure 7.12. The lower number of training points does lead to a higher average percent error of 1.16%, but this is still a relatively low error, and it appears, based on the results graph, that most of that error is concentrated in the last two points of the r_s = 0.75 data set. Next, we must point out the time savings we have gained by reducing the number of training points. When using 16 training points, the time savings is 88.99 node hours, which is not an insignificant amount of time saved.
However, with only ten training points, and since the six training points we removed have the most significant computational time, the amount of computational time saved with this analysis is 224.43 node hours. That is equivalent to over a week of computational time saved for an error of only 1.16%! Finally, we justify using Gaussian processes as the machine learning algorithm over Bayesian ridge regression when there is a small amount of training data.

Figure 7.13 The results of performing an SRE analysis with Gaussian processes to predict the converged correlation energies for the HEG at different values of N and r_s and only ten training points. The correlation energies calculated at M = 6,142 are plotted with a solid line and are taken to be the true results, the SRE predictions are plotted with triangular markers, and the shaded region represents the uncertainty on the SRE predictions.

For example, suppose we were to perform this same analysis with Bayesian ridge regression, using ten training points and a sequence length of 1. In that case, the average percent error across all the correlation energies is 6.05%, much higher than the 1.16% we see with GP. Thus, the GP algorithm works better with smaller data sets and will be the primary algorithm we use in the next chapter to analyze infinite nuclear matter, as no training set in that chapter will contain more than ten points.

7.1 The Thermodynamic Limit

As a final look at the HEG, we can attempt to calculate the correlation energy per electron in the thermodynamic limit using the formalism developed in the previous chapter. We have used 14 training points, from N = 2 to N = 502, and an SRE sequence length of three. Again, the machine learning algorithm used to make these predictions is the Bayesian ridge regression algorithm.
We attempted this extrapolation using both the correlation energies that were fully calculated at M = 6,142 and the correlation energies predicted by the SRE method, shown in Figure 7.10. These two extrapolations result in identical correlation energies, with a percent error between the two extrapolated data sets of 0.0%. Thus, there is no reason to use the fully calculated data to predict the thermodynamic limit (TDL) correlation energies when the SRE predictions yield the same answer along with significant time savings.

Figure 7.14 The converged CCD correlation energies for the HEG calculated at M = 6,142 (solid) and the TDL predictions for the correlation energy per particle calculated through the SRE method (dashed).

Figure 7.14 shows the correlation energies per particle for the HEG at a variety of values of N and r_s with solid lines. The horizontal dashed lines show the correlation energy at the TDL as predicted by the SRE algorithm. Even though the correlation energies show significant oscillations with respect to the number of electrons in the system (likely due to the periodic boundary conditions), the SRE predictions seem to provide a suitable match for where the oscillations will converge. While literature values do exist for the HEG in the TDL (for example, see the work by Bishop et al. in [39] and [40]), the results are highly dependent on the many-body method used, the values of r_s studied, and the specifications of the HEG system. Thus, we could not find any currently published literature results to compare our predictions against for the exact specifications we have reported. However, we can use a power law, a standard method of finding the TDL energies (see Ref. [148]).
Therefore, we will use a power law of the form:

ΔE_CCD(N) = ΔE_CCD,∞ + AN^(-α),     (7.2)

where ΔE_CCD,∞, A, and α are parameters fit using data generated at finite numbers of electrons, and ΔE_CCD,∞ is the correlation energy at the thermodynamic limit. Usually, α is fixed to one when calculating the total correlation energy of the system at the TDL. We, however, are interested in calculating the correlation energy in the TDL per electron and thus will leave α unfixed.

Figure 7.15 The CCD correlation energy per electron for an HEG with r_s = 0.5 and calculated at M = 6,142 (black). The power law prediction for the TDL correlation energy per electron is shown in green, and the SRE predictions for the TDL correlation energy per electron extrapolated from calculated and predicted energies are shown in red and blue, respectively.

The results of performing this power law fit are shown in Figure 7.15 for a HEG with r_s = 0.5, plotted alongside the SRE predictions for the correlation energy at the TDL extrapolated from both the complete calculations and the SRE predictions. Also shown are the complete calculations at M = 6,142 for finite numbers of electrons. While it is impossible to say which of these results is the most accurate, the SRE predictions visually provide a better match for the convergence of the correlation energies in the TDL. However, the best method of predicting the correlation energies in the TDL, and the error in those predictions, is still a topic of ongoing research and will require long-running calculations at the TDL to ensure accuracy.

7.1.1 Homogeneous Electron Gas Conclusion

In this chapter, we have looked at several methods for removing the basis incompleteness error from CCD calculations of the HEG without performing calculations at a high number of single-particle states.
Using the traditional methods of truncating the basis at some number of shells or using a 1/M power law fails to give reasonable predictions for the correlation energies in the complete basis limit, meaning complete calculations would still need to be performed at larger basis sizes. Additionally, the two methods we examined that are based on the relationship between the convergence of the MBPT2 and CCD correlation energies gave better predictions for the correlation energies in the complete basis limit but still failed to accurately reproduce them at high numbers of electrons. With the SRE method, we developed a method that could accurately recreate the correlation energies in the complete basis limit for various numbers of electrons and values of r_s with only minimal training data. However, when we implemented the SRE method with ridge regression, we got an average percent error of 0.45% across 70 complete basis limit predictions only if the best value of α was chosen. If the best value of α was not used, the average percent error became effectively unbounded. This highlights the drawback of traditional machine learning methods: the abundance of hyperparameters and the need to perform hyperparameter tuning. Even though ridge regression has only one hyperparameter, α, finding its best value still requires hyperparameter tuning, which means we must create a validation data set already in the complete basis limit; this takes a long time to generate and reduces the time savings of using the SRE method to predict the correlation energies in the complete basis limit. Thus, we were motivated to use Bayesian machine learning, as these algorithms find the optimized values of their hyperparameters themselves, eliminating the need for hyperparameter tuning and a validation data set.
The first Bayesian algorithm we tested was Bayesian ridge regression, which produced all 70 correlation energies in the complete basis limit with an average error of only 0.47%, almost identical to the best ridge regression error. Additionally, since Bayesian ridge regression is a Bayesian algorithm, it produces uncertainties on its predictions, so we have error bars on our results. Since we needed only 16 training points to fully train the algorithm and no validation set, the total time saved using SRE to predict the complete basis limit correlation energies instead of fully calculating them was over 3.5 node days, with an error of only 0.47%. Next, we attempted to reduce the amount of training data needed, and to accomplish this, we had to use a new Bayesian machine learning algorithm: Gaussian processes (GP). With the GP algorithm, we could predict the correlation energies in the complete basis limit using only ten training points. This resulted in an average percent error of 1.16% across all 70 predictions, leading to over one week of computational time savings. Finally, we looked at an initial analysis using the SRE method to predict the CCD correlation energy in the thermodynamic limit. However, this work is still in progress because, when this thesis was written, it was impossible to know the TDL energies for the specific system we are modeling; thus, we cannot calculate the error in our predictions. To conclude, the SRE method with Bayesian machine learning algorithms can accurately produce the correlation energies of the HEG in the complete basis limit while saving many hours of computational time and producing uncertainties in its results. As the electron gas is a test bed for developing methods that will eventually be applied to more complicated infinite matter systems, the analysis performed in this chapter motivates using the SRE method to predict correlation energies in the complete basis limit for other infinite matter systems.
CHAPTER 8
INFINITE NUCLEAR MATTER RESULTS

We have explored and developed the SRE method by applying it in various ways to the homogeneous electron gas. The electron gas, with its shorter computational times and more straightforward interactions, makes an excellent test bed for developing methods that can be applied to other infinite matter systems. This chapter studies one of these more complicated systems: infinite nuclear matter. Therefore, we will be using the SRE methods perfected on the electron gas. For this chapter, all SRE algorithms use Gaussian processes with the modified rational quadratic kernel that was developed in the last section of the previous chapter. We will start our studies in this chapter with pure neutron matter, with both toy and realistic nuclear interactions, and then move on to symmetric nuclear matter with realistic interactions before concluding with a brief study of the symmetry energy of infinite nuclear matter.

8.1 Pure Neutron Matter

8.1.1 Coupled Cluster Doubles and the Minnesota Potential

As a toy nuclear interaction, the Minnesota potential will provide us with approximations to the energies of a pure neutron matter system, but without a high degree of accuracy. However, the Minnesota potential is less computationally taxing than the more realistic chiral potentials and includes only two-body forces. Like the HEG, the Minnesota potential is a valuable test bed for further developing our SRE method before we apply it to the more realistic nuclear matter calculations with the chiral potentials.

Gaussian Processes

The training data for each combination of density and number of neutrons was taken to be the CCD and MBPT2 correlation energies between five and 14 open energy shells (inclusive). This data was formatted using the process described in Chapter 6. The machine learning algorithm used to analyze these data sets was the Gaussian process (GP).
For each data set, the α value of the GP algorithm was set to the standard deviation of the training data set. The kernel for the GP algorithm was chosen to be a constant kernel multiplied by a rational quadratic kernel, added to a white kernel (all kernels are Scikit-Learn's implementations). The hyperparameters of the kernels were set by the GP algorithm during the training process. Once each GP was trained, it was asked to predict 100 data points, leading to a converged ratio of correlation energies. The final point in this data set was taken to be m and was multiplied by ΔE_MBPT2,6142 to give an approximation for ΔE_CC,6142. Additionally, the GP algorithms produced uncertainties on all of their predictions, which are plotted with the results in Figure 8.1.

Figure 8.1 The CCD correlation energy per neutron as a function of the number of neutrons in the system for pure neutron matter with the Minnesota potential. All calculations are performed at 6,142 single-particle states and with periodic boundary conditions. The full calculations, shown with the solid lines, were performed on MSU's ICER supercomputer. The SRE predictions, shown with the triangular data points, were predicted using only ten training data points from 5-14 open shells. The shaded region shows the Gaussian process uncertainty in predicting the converged correlation energy per neutron. The overall RMSE between the fully calculated and the predicted CCD correlation energies per neutron is 0.037 MeV.

When looking at these results, it is also essential to consider the errors associated with other approximation schemes that do not involve machine learning, namely truncating the number of single-particle states in the system at some set number of energy shells. For example, if we limit the calculation to 14 open energy shells (15 to 30 total shells), the RMSE from the CCD correlation energies at 70 shells (M = 6,142) is 0.56 MeV, much larger than the GP result of 0.037 MeV.
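The train-and-extrapolate procedure described above can be sketched as follows. The ratio sequence and the MBPT2 energy are synthetic placeholders, and the sequence length of one is an assumption (the text does not state it for this data set).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (ConstantKernel,
                                              RationalQuadratic, WhiteKernel)

# Synthetic stand-in for the ten training ratios from 5-14 open shells.
ratios = list(1.10 - 0.25 * np.exp(-0.5 * np.arange(10)))
seq_len = 1  # assumed sequence length for this sketch
X = np.array(ratios[:-1]).reshape(-1, 1)
y = np.array(ratios[1:])

# Constant kernel times rational quadratic kernel, plus a white kernel;
# the GP's alpha is the standard deviation of the training targets,
# as described in the text.
kernel = ConstantKernel() * RationalQuadratic() + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, alpha=np.std(y),
                              normalize_y=True).fit(X, y)

# Extrapolate the ratio sequence out to 100 points, then scale the
# cheap MBPT2 energy at M = 6,142 (placeholder value) by the final ratio.
while len(ratios) < 100:
    ratios.append(gp.predict([[ratios[-1]]])[0])

dE_mbpt2_6142 = -9.75  # hypothetical MBPT2 correlation energy per neutron
dE_ccd_6142 = ratios[-1] * dE_mbpt2_6142
```

In the full analysis, this loop runs once per combination of density and neutron number (90 analyses for Figure 8.1), with the GP's `predict(..., return_std=True)` supplying the plotted uncertainty bands.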
Furthermore, we can include more energy shells before truncating. For example, if 40 total shells are included in the calculation, then the RMSE from the results at 70 shells is 0.26 MeV; for truncating at 50 total shells, it is 0.15 MeV, and for truncating at 65 total shells, it is 0.045 MeV. Therefore, even when expending significant computational time and resources to raise the truncation level of the calculations, the GP prediction made with very little training data collected at a low number of energy shells is still the most accurate. Accuracy is not the only metric we must consider; we must also consider the computational time and resources needed to generate the training data and perform the ML analysis versus just performing the complete calculation. The timing data presented in this paragraph was collected on Michigan State University's HPCC supercomputer on Intel Xeon processors with a clock speed of 2.40 GHz and 240 GB of RAM. The coupled cluster code was memory-optimized using the diagonal block structure developed in Ref. [78] and was parallelized using four compute nodes and 28 threads per node. Figure 8.1 contains 90 points, each of which has to be individually calculated or predicted with the machine learning algorithm. Since each machine learning process uses ten training points, 900 training points are needed to produce Figure 8.1. However, their run times are comparatively short since they are taken from 5-14 open energy shells. Generating all 900 training data points takes 136 node hours. It takes about 20 seconds to perform all 90 ML analyses, thus adding a negligible amount of time to the total ML analysis run time, which is dominated by the time to generate the training data. To generate all 90 data points in Figure 8.1 at 70 energy shells each, the total run time is 484 node hours, meaning that by using the ML extrapolation method instead, 348 node hours of computational run time are saved.
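The node-hour bookkeeping used throughout this chapter is simple enough to write down once; the numbers below are those quoted above for Figure 8.1.

```python
def node_hours_saved(full_calc_hours, training_hours):
    # Savings from replacing the full calculations with the SRE pipeline;
    # the ~20 s of total GP analysis time is negligible and omitted
    return full_calc_hours - training_hours

saved = node_hours_saved(484.0, 136.0)   # 348.0 node hours
saved_days = saved / 24.0                # 14.5, i.e. "over 14 node days"
```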
That is equivalent to over 14 node days of computational time saved and a resulting error of only 0.037 MeV! It is possible to achieve better accuracy with this prediction method by changing the α hyperparameter of the Gaussian process algorithm. Instead of using the standard deviation of the training data as the α parameter (as was done to generate Figure 8.1), we will use the fourth root of the standard deviation of the training data (α = δy_train^(1/4)). However, while doing this increases the accuracy (the RMSE drops from 0.037 MeV to 0.025 MeV), the uncertainty drastically increases. The results for performing this extrapolation are shown in Figure 8.2, where the uncertainties are shown in (a) but are removed in (b) for clarity. Figure 8.2 An SRE analysis of pure neutron matter with the Minnesota potential using a Gaussian process machine learning algorithm with α = δy_train^(1/4). This increases the accuracy of the predictions but also increases the uncertainty on the predictions. The results are shown with uncertainties (left) and without (right). 8.1.2 Coupled Cluster Doubles and Realistic Potentials In this section, we will perform CCD correlation energy calculations on a pure neutron matter system, using chiral potentials derived from effective field theory (EFT) to model the nuclear interaction. These potentials model the nuclear interaction more realistically than the Minnesota potential, so these results are expected to be more accurate than the Minnesota potential results from the previous section. Specifically, we use next-to-next-to-leading order (NNLO) chiral potentials in this section, which were described in detail in Chapter 4. We will limit our calculations to a 66-neutron system for the chiral potentials. This is done for two reasons. Refs.
[35] and [34] show that, especially for pure neutron matter, calculations for a system of 66 neutrons are nearly indistinguishable from calculations performed at the infinite matter limit. So for 66 neutrons, the finite-size error in a pure neutron matter calculation is minimized. Therefore, we do not need to perform calculations at higher values of N to perform an N → ∞ extrapolation. Secondly, the CCD correlation energy calculations for pure neutron matter with chiral potentials are very time-intensive, much more so than pure neutron matter calculations with the Minnesota potential. For the Minnesota potential, the average time to generate converged CCD correlation energies across all densities in Figure 8.1 for 66 neutrons is 23.48 node minutes. These calculations were all performed at 70 total energy shells, or 6,142 single-particle states. However, when we perform the same calculations with the realistic chiral potentials from EFT, the average run time across all densities shown in Figure 8.4 is 27.84 node hours per calculation. Moreover, these calculations converge at a much lower number of single-particle states, so they were performed at 1,478 single-particle states, or 27 total energy shells. Note that these calculations were performed with different codes, written in different languages, and performed with different high-performance computing set-ups. Hence, the run times are not directly comparable. However, it is sufficient to say that performing these calculations with realistic chiral potentials is much more computationally challenging than using the Minnesota potential. As a further motivation for applying the SRE method to this system, let us examine the computational time and resources needed to perform calculations on a pure neutron matter system with 66 neutrons, a density of 0.16 fm⁻³, and chiral NNLO potentials.
Figure 8.3 shows the computational time in node seconds needed to perform the CCD correlation energy calculations as the number of single-particle states increases. Figure 8.4 shows CCD calculations and ML predictions for pure neutron matter at 66 neutrons at two different density ranges. Figure 8.4a shows a density range around nuclear density (nuclear density being around 0.16 fm⁻³), and Figure 8.4b shows a low-density range. Densities around nuclear density are essential for studies of nuclei. The outer layers of a neutron star are thought to have densities around nuclear density [77, 34, 2], while low-density nuclear matter is also of interest [57]. The ML specifications for the plots in Figure 8.4 are as follows. For Figure 8.4a, each ML prediction was made using only three training data points, MBPT2 and CCD correlation energy calculations between 114 and 186 single-particle states (or between one and three open shells). We were able to use such a small number of training points here because the correlation energies converge rather quickly. The SRE sequence length was chosen to be one here since there are only three training points. The converged CCD calculations in the figure were calculated at M = 1,478 single-particle states, or 27 total energy shells (22 open energy shells). This number of single-particle states was chosen for the converged values because it is similar to the number of single-particle states used to calculate converged values in similar studies (Refs. [35], [34], and [11]). The same Gaussian process algorithm and kernel as in the Minnesota potential analysis were also used here, but the α parameter of the algorithm was chosen to be the standard deviation of the training data to the fourth power (α = δy_train⁴). Figure 8.3 The run times, in node seconds, for coupled cluster doubles calculations of pure neutron matter as a function of the number of single-particle states in the calculation. Calculations performed at three different densities are shown. For Figure 8.4b, the GP specifications are the same, but for the training data, seven training points were used from 186 to 502 single-particle states (three to nine open energy shells). This change was made because the correlation energy converges more slowly at lower densities. The SRE sequence length was also increased to three since more training data is available. The CCD correlation energy at M = 1,478 was still used as the fully converged value. For Figure 8.4a, the average percent error between the fully converged CCD correlation energies and the ML-extrapolated CCD correlation energies is 0.25%, and for Figure 8.4b it is 1.90%. The slightly higher average percent error for the second figure is easily explained. The convergence of the correlation energy is slower at lower densities. Especially at d = 0.005 fm⁻³ and d = 0.01 fm⁻³, the correlation energies had not sufficiently converged at 1,478 single-particle states. (a) CCD calculations of PNM around nuclear density (b) CCD calculations of PNM at low densities Figure 8.4 The results from predicting the converged ΔE_CC/N for pure neutron matter using the coupled cluster doubles approximation. Figure (a) shows pure neutron matter for a range of densities around nuclear density (0.10 ≤ d ≤ 0.20 fm⁻³), and figure (b) shows the results for low-density pure neutron matter (d ≤ 0.05 fm⁻³). For both graphs, the solid black line plots the fully converged ΔE_CCD/N, calculated at 1,478 single-particle states; the triangular markers represent the SRE predictions at each density of interest. The dashed red lines represent the training point with the highest number of single-particle states (M = 186 for figure (a) and M = 502 for figure (b)) to show that the training set has not reached convergence.
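The average percent errors quoted for Figure 8.4 (and throughout the rest of this chapter) follow the standard definition; a minimal helper (the function name is ours) is:

```python
import numpy as np

def avg_percent_error(full, predicted):
    # Mean of |predicted - full| / |full|, expressed as a percentage
    full = np.asarray(full, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((predicted - full) / full))

# Illustrative correlation-energy values only, not the thesis data
err = avg_percent_error([-2.00, -1.50], [-1.99, -1.51])
```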
This explains why there is a discrepancy between the ML prediction and the complete calculation at d = 0.005 fm⁻³. Additionally, the ML prediction at d = 0.01 fm⁻³ closely matches the complete calculations, but there are relatively large error bars on the prediction when compared to the other predictions. This could also be explained by the slow convergence of the correlation energy at this point. The timing data for Figure 8.4 was collected using Michigan State University's high-performance computing center using Intel Xeon compute nodes running at 2.4 GHz and with 240 GB of RAM each. For these calculations, 12 MPI nodes were used per calculation. For example, for Figure 8.4a, generating the six fully converged CCD correlation energies at M = 1,478 single-particle states took 156 node hours. However, generating the training data for the six GP algorithms took only 3.84 node hours, leading to a time savings of 152 node hours from using the ML extrapolations. Additionally, for Figure 8.4b, it takes 289 node hours to generate all ten fully converged CCD correlation energies at M = 1,478 single-particle states. However, it only takes 69.6 node hours to generate the ten GP training data sets. Figure 8.5 A comparison of correlation energy calculations for pure neutron matter near nuclear density for 66 neutrons. The red data sets were calculated with the Minnesota potential, and the blue data sets with the chiral potentials. The dashed lines represent the converged correlation energy per particle from MBPT2, the solid lines represent the converged CCD correlation energies per particle, and the circular markers represent the ML predictions for the converged CCD correlation energies per particle. Therefore, using the ML extrapolation over doing the complete calculations saves 219 node hours.
That is a total computational time savings of over 15 node days for Figure 8.4. When considering both the accuracy and the time savings, the SRE method applied to CCD calculations of pure neutron matter with chiral potentials appears successful. The CCD correlation energies per particle can be accurately recreated for all but the smallest densities with very little training data and significant time savings. Figure 8.5 compares pure neutron matter converged correlation energies per particle calculated with MBPT2 and CCD and for the Minnesota and chiral potentials. The Minnesota potential results are from calculations performed at 6,142 single-particle states, or 70 total energy shells, and the chiral potential results are from 1,478 single-particle states, or 27 energy shells. However, both the MBPT2 and CCD correlation energies have sufficiently converged. The ML predictions for the converged CCD correlation energy per particle are also shown. The results shown are for calculations involving 66 neutrons and near nuclear density. There are a few critical observations we can make from this graph. First, we know that the Minnesota and chiral potentials will produce different results as they model the nuclear interaction differently and to different levels of accuracy. The results differ significantly, with the chiral potentials yielding lower energies for both MBPT2 and CCD calculations. Additionally, the difference in the results increases as density increases, indicating that there will be quite a significant difference between the two potentials at densities greater than nuclear density. Secondly, we can look at the difference between the MBPT2 and CCD results for the two potentials. For both potentials, the MBPT2 correlation energy per particle is lower than the CCD correlation energy per particle.
However, for the Minnesota potential, the MBPT2 correlation energies are only slightly below the CCD correlation energies, and the gap between the two is roughly consistent over the entire density range shown. There is a more significant discrepancy between the MBPT2 and CCD correlation energies for the chiral potential, and this gap grows as the density decreases. Finally, it is essential to note that for each potential, the ML predictions of the converged CCD correlation energies are in good agreement with the full calculations, so the SRE method is potential independent. 8.1.3 Coupled Cluster Triples Approximations and Realistic Potentials Beyond the CCD approximation, we can also look at the CCDT-1 coupled-cluster approximation, which adds some aspects of the T̂3 excitation operator into the calculations without the full run time needed for a complete CCDT calculation. In the next section, we will also consider the CCD(T) approximation, another coupled-cluster approximation that adds some aspects of the T̂3 operator into the calculations using a different algorithm. First, we should verify that the CCDT-1 approximation with the same NNLO chiral potentials gives a different (and more accurate) approximation than the CCD approximation for pure neutron matter. In Figure 8.6, the CCD correlation energies per particle are plotted with the solid red line as a function of the density of the pure neutron matter system for 66 neutrons and at densities around nuclear density. The CCD calculations shown were performed at 1,478 single-particle states, where the calculations converged. Figure 8.6 A comparison between the CCD and CCDT-1 correlation energies for a pure neutron matter system (N = 66 neutrons) with NNLO chiral potentials and for densities around nuclear density. The CCDT-1 approximation consistently gives a lower correlation energy than the CCD approximation at all densities in the graph.
For the CCDT-1 calculations, shown with the solid black line, the calculations were also performed with the NNLO chiral potentials for pure neutron matter with 66 neutrons and at densities around nuclear density. Notably, the correlation energy calculations using CCDT-1 converge much faster than those for CCD and only need 514 single-particle states to reach convergence. Additionally, since CCDT-1 calculations scale as O(M⁷) compared to O(M⁶) for a CCD calculation, performing CCDT-1 calculations at higher values of M is computationally prohibitive. As shown in Figure 8.6, the CCD and CCDT-1 approximations give significantly different predictions for the correlation energies at each density value shown. We can also compare the run times needed to complete a CCDT-1 calculation over a CCD calculation. Figure 8.7 shows the total run time needed to complete a CCDT-1 calculation on Michigan State University's High-Performance Computing Center using Intel Xeon nodes with a clock speed of 2.4 GHz and 24 MPI nodes per run. The results in this figure were collected by performing CCDT-1 calculations on a pure neutron matter system with chiral NNLO potentials and with 66 neutrons and a density of 0.16 fm⁻³. The results are reported in node hours. Figure 8.7 The run times needed to complete CCD and CCDT-1 calculations for a PNM system, reported in node hours. It is obvious that the run time increases very quickly with the number of single-particle states in the calculation. This rapid increase in run time is even more apparent if we plot the CCD and CCDT-1 run times on the same plot, as in Figure 8.7. Though the CCDT-1 calculation converges at a smaller number of single-particle states, the total run time needed is drastically higher than that needed for the CCD calculations. At 514 single-particle states, the CCDT-1 calculation takes over 15 times longer than the CCD calculation at the same number of single-particle states.
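The gap between the two scalings can be made concrete with a back-of-the-envelope operation count. This is only a scaling argument (prefactors, iteration counts, and implementation details differ between the methods), not a run-time prediction:

```python
# Nominal per-iteration operation counts at the CCDT-1 convergence point
M = 514
ccd_ops = M ** 6      # iterative O(M^6) for CCD
ccdt1_ops = M ** 7    # iterative O(M^7) for CCDT-1
nominal_factor = ccdt1_ops / ccd_ops   # = M: the nominal extra cost factor
# The measured factor (~15x at M = 514) is far smaller, since the two
# methods have very different prefactors and only part of each CCDT-1
# iteration actually scales as M^7
```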
So even though we were motivated to use the SRE method to save time on the CCD calculations with the chiral NNLO potentials due to their run times, there is even more of a need for an accurate extrapolation method for the CCDT-1 calculations, as their run times are an order of magnitude larger. Next, we will look at the feasibility of predicting the CCDT-1 correlation energy from the MBPT2 correlation energy using the same SRE and machine learning setup as the above doubles predictions. We will use the SRE method described in Chapter 6 to predict the CCDT-1 correlation energies from the MBPT2 correlation energies using three training points (from one to three open shells, or 114 to 186 single-particle states) with a sequence length of one and a Gaussian process algorithm with its α value set to the standard deviation of the training data to the fourth power. Figure 8.8 shows the results of predicting the converged CCDT-1 correlation energies using the method described above for a pure neutron matter system with 66 neutrons and d = 0.16 fm⁻³. For the plot, the complete calculations are shown with solid lines, the training data with points, and the SRE-predicted converged correlation energy with the horizontal dashed line. The percent error between the converged ΔE_CCDT-1 that has been fully calculated at 514 single-particle states and the SRE prediction is 0.51%. As a comparison, the percent error between the converged ΔE_CCDT-1 fully calculated at 514 single-particle states and the largest point in the training data (at 186 single-particle states) is 12.19%. This means that in the time needed to generate the training data, which is 14.29 node hours for the three points, the accuracy of the correlation energy can be improved from 12.19% to just 0.51%.
Furthermore, the time needed to calculate the fully converged correlation energy at 514 single-particle states is 194.97 node hours, meaning that just for this one density, the SRE method saves 180.68 node hours while only sacrificing 0.51% in accuracy. Now that we have shown that the SRE method can be feasibly applied to predict converged CCDT-1 correlation energies using only the MBPT2 energies, we can use this method to predict the correlation energies at various densities. Figure 8.9 compares the correlation energies from the last point in the training data at M = 186 (red dashed line), the fully calculated converged ΔE_CCDT-1 at 514 single-particle states (black line), and the SRE predictions for the converged correlation energies at densities around nuclear density. We can see that the training data is far from the converged values, and in fact, the difference between the M = 186 plot and the M = 514 plot is highest at the lower densities shown here. This shows that the correlation energies used to train the SRE algorithm are not converged, and thus, a noticeable improvement is made over just using the training data. For Figure 8.9, the average percent error between the SRE predictions and the complete calculations is 0.45%. Furthermore, the total computational time saved when generating the six data points with the SRE method over fully generating them at 514 single-particle states is 917.30 node hours. Figure 8.8 The results from performing coupled-cluster calculations with the CCDT-1 approximation on a pure neutron matter system with chiral NNLO potentials, 66 neutrons, and d = 0.16 fm⁻³. The full calculations are shown with a solid line, the training data used for the SRE algorithm is shown with the triangular markers, and the SRE prediction for the converged correlation energy is shown with the dashed line.
The percent error between the SRE prediction and the fully calculated converged correlation energy is 0.51%, and the time savings from performing the SRE method is 180.68 node hours. Figure 8.9 The results from predicting the converged ΔE_CCDT-1/N for pure neutron matter at densities around nuclear density. The average percent error across all densities is 0.45%, and the time savings from generating only the training data versus the fully converged ΔE_CCDT-1/N is 917.30 node hours. Figure 8.10 CCD (red) versus CCDT-1 (black) correlation energies for a pure neutron matter system with 66 neutrons and chiral potentials. The converged correlation energies from full calculations are shown with solid lines, and the points represent the machine-learning predictions. The machine learning predictions are shown with the uncertainties from the Gaussian process algorithm. Finally, let us compare the CCDT-1 results to the CCD results for densities around nuclear density with the SRE predictions (Figure 8.10). Not only does the SRE method make good predictions for the CCD calculations in the density range shown, it also performs equally well on the more complex CCDT-1 calculations. As a final set of calculations with pure neutron matter, we can look at performing CC calculations using the CCD(T) approximation. First, let us compare the correlation energies of a PNM system for densities around nuclear density calculated using CCD, CCDT-1, and CCD(T). It has already been shown that CCD and CCDT-1 give significantly different values for the correlation energies. Still, we want to ensure there is a significant difference between these two data sets and the CCD(T) energies. Figure 8.11 shows the fully converged correlation energies for a PNM system calculated with the three CC approximations. The CCD and CCD(T) calculations were performed at M = 1,478, and the CCDT-1 calculations were performed at M = 514.
It does appear that the three methods give significantly different answers, so it will be an exciting test for the SRE method to compare its performance on all three. Now, we can compare the run times of each method when applied to PNM. The expected run time of a CCD calculation is an iterative O(M⁶), and the expected run time of a CCDT-1 calculation is an iterative O(M⁷). In comparison, the expected run time of a CCD(T) calculation falls in between these two, with an iterative O(M⁶) followed by a non-iterative step of O(M⁷). And in fact, that is what we see in Figure 8.12, where CCD(T) is faster than CCDT-1 but slower than CCD. Figure 8.11 Correlation energies of PNM calculated with CCD (red), CCDT-1 (green), and CCD(T) (blue) for densities around nuclear density. Note that the times are reported here in node hours per GHz. The CCD(T) calculations for PNM were performed on Oak Ridge National Laboratory's Andes supercomputer on AMD processors, which have a clock speed of 3.0 GHz, using 64 MPI nodes. Just as we defined a node hour as the total run time of a calculation multiplied by the number of MPI nodes in the calculation to compare run times computed with different numbers of MPI nodes, here we define node hours per GHz (node hours divided by the clock speed of the processors) to compare results which have been performed on different supercomputers. Now we can apply the SRE method to attempt to predict the converged CCD(T) correlation energies for PNM. We have used three training points, as with the other PNM and chiral potential extrapolations. Still, here we have moved the training data to slightly higher numbers of single-particle states (M = 186, 246, and 294) due to the slower convergence of the CCD(T) correlation energies with respect to the number of single-particle states. We have kept the SRE sequence length as one.
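The two normalizations defined above translate directly into code (the function names and the example numbers are ours):

```python
def node_hours(wall_time_hours, mpi_nodes):
    # Total run time multiplied by the number of MPI nodes, used to
    # compare runs performed with different node counts
    return wall_time_hours * mpi_nodes

def node_hours_per_ghz(wall_time_hours, mpi_nodes, clock_ghz):
    # Node hours divided by the processor clock speed, used to compare
    # runs performed on different supercomputers
    return node_hours(wall_time_hours, mpi_nodes) / clock_ghz

# Illustrative comparison: a 2-hour run on 64 Andes nodes at 3.0 GHz
# versus a 5-hour run on 24 MSU nodes at 2.4 GHz
andes = node_hours_per_ghz(2.0, 64, 3.0)
msu = node_hours_per_ghz(5.0, 24, 2.4)
```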
Figure 8.13 shows the results of performing this SRE analysis, with the full CCD(T) correlation energy calculations performed at M = 1,478 shown with the solid line and the SRE predictions with the triangular markers. Figure 8.12 The run time (in node hours per GHz) to calculate the correlation energies for a PNM system at d = 0.16 fm⁻³ using CCD (red), CCDT-1 (green), and CCD(T) (blue) as a function of the number of single-particle states in the calculation. Here the average percent error between the two data sets is 0.54%. Furthermore, performing the six CCD(T) calculations at M = 1,478 shown in Figure 8.13 took 594.15 node hours, but generating the SRE training data took only 32.16 node hours. This leads to a time savings of 561.99 node hours, or 23.41 node days. Finally, we are in a position to compare the SRE predictions for CC calculations of PNM using CCD (red), CCDT-1 (green), and CCD(T) (blue), as seen in Figure 8.14. For all three CC methods, the complete calculations are shown with solid lines. The SRE predictions are shown with triangular markers with error bars, the error bars representing the uncertainties from the Gaussian process algorithm. For all three data sets, the SRE method appears to provide a good match for the correlation energies across all the densities shown. Figure 8.13 The CCD(T) correlation energies for PNM for densities around nuclear density, calculated at M = 1,478 (solid line) and predicted with the SRE method (triangular markers). The uncertainties from the Gaussian process algorithm are shown with error bars on the markers. The average percent error of this figure is 0.42%. Furthermore, the total time saved by using the SRE method to predict the converged correlation energies rather than fully calculating them is 796.89 node hours, or 33.20 node days. Therefore, in calculating just 18 points for the PNM system, we save over a month of computational time using the SRE method.
In this section, we have performed coupled cluster calculations of pure neutron matter using two different models of the nuclear interaction and three different levels of coupled cluster approximation. However, it is essential to note that the main point of this section is not to compare the effects of different coupled cluster calculations and interactions, though we have done this. Instead, the main point of this section is to determine the performance of the SRE method on different types of coupled cluster calculations. As we have seen throughout this section, the SRE method can accurately extrapolate to the converged correlation energy of the pure neutron matter system, regardless of the combination of nuclear interaction and coupled cluster approximation used. Figure 8.14 The correlation energies for PNM calculated using CCD (red), CCDT-1 (green), and CCD(T) (blue). The fully converged calculations are shown with the solid lines, and the SRE predictions are shown with the triangular markers. Furthermore, over a month of computational time is saved in this section by using the SRE method with no loss of relevant accuracy. Finally, it is essential to note that even though we are using machine learning, the training data sets we have used range from three to ten data points. These are some of the smallest training data sets found in the literature for machine learning applications in the physical sciences and emphasize an essential feature of the SRE method: it can make very accurate extrapolations using very little training data. 8.2 Symmetric Nuclear Matter 8.2.1 Coupled Cluster Doubles with Realistic Nuclear Interactions Now, we can move on from analyzing pure neutron matter and instead turn our attention to symmetric nuclear matter (SNM), which comprises an equal number of protons and neutrons.
We will continue to use the chiral NNLO potentials. Figure 8.15 The CCD correlation energies for PNM (red) and SNM (blue) for densities around nuclear density and with a chiral NNLO potential. Here, all SNM systems will be calculated using 132 nucleons (66 protons and 66 neutrons) since these results will be similar to the energies at the TDL [34]. Furthermore, the immense computational cost of SNM calculations also prohibits studies at higher particle numbers. Compared to a PNM calculation, an SNM calculation will need to double the number of single-particle states to accommodate double the number of particles, and it will need approximately three times the computational time and resources. However, it is essential to perform calculations of SNM in addition to PNM, even at the higher computational cost, because the addition of protons into the system causes different correlations in the system, leading to a correlation energy that converges more slowly with respect to the number of single-particle states. Thus SNM should be a more challenging system for the SRE method and will genuinely test its predictive abilities. Additionally, as we think about moving towards calculations of finite nuclei in future works, accurate calculations of both PNM and SNM are essential for these studies. Figure 8.15 compares the CCD correlation energies per nucleon for PNM (red) and SNM (blue). We can see that the SNM correlation energies are significantly larger than the PNM ones. The CCD data set for SNM was generated using a supercomputer available through Michigan State University's Institute for Cyber-Enabled Research (ICER) using Intel Xeon processors, which have a clock speed of 2.4 GHz. Figure 8.16 The CCD correlation energies for SNM calculated fully at M = 2,956 (solid line) and predicted using the SRE method (triangular markers). The average percent error of this graph is 0.54%, and the time savings of using SRE over performing the full calculations is 6.48 node days.
For the MPI parallelization, 24 nodes were used. It takes, on average, 26.07 node hours to generate a converged CCD energy for a PNM system (at M = 1,478). However, since SNM is a more challenging case and the number of single-particle states doubles, it takes, on average, 84.12 node hours to perform a converged CCD correlation energy calculation for SNM at M = 2,956. This means the computational resources needed for an SNM calculation are greater than those needed for a PNM calculation by about a factor of three. The SRE predictions in Figure 8.16 were generated using seven training points from M = 324 to M = 1,004. The SRE sequence length used is three since there are more training data points here compared to the PNM case. In addition, the SNM CCD energy converges more slowly than the PNM energies, leading to the use of more training data. When predicting the CCD correlation energies using the SRE method, the average percent error between the predictions and the complete calculations at M = 2,956 is 0.54%. This is compared to the average percent error between the calculations at M = 2,956 and M = 1,004 (the largest training data point) of 9.52%. Thus the SRE provides a much more accurate method than truncating the calculations at the level of the training data. Furthermore, it takes 504.77 node hours to compute all six data points shown in Figure 8.16 at M = 2,956 but only 349.17 node hours to generate the training data needed by the SRE algorithm to predict all six points. This leads to a computational time savings of 155.61 node hours, or 6.48 node days. Thus the SRE method can accurately recreate the CCD correlation energies of an SNM system with significant time savings. 8.2.2 Coupled Cluster Doubles with Perturbative Triples and Realistic Potentials As a final data set with SRE, we will attempt to predict the converged correlation energies for SNM using the CCD(T) approximation.
The CCD(T) approximation has significantly higher time requirements than the CCD approximation, as we saw with the PNM case. However, this difference in time will be even more drastic for SNM since there is a higher number of single-particle states. This extra factor of M can contribute significantly to the run time since it can be as large as M = 2,956 for the fully converged calculations. For this case, due to the high computational costs, we have truncated M to only 23 energy shells, or M = 2,060. At this level, the correlation energies have converged to the order of 10⁻³ or better, so this should be sufficient for this work, but it is important to note that if the calculations were carried all the way out to M = 2,956, then the run times would be much higher. We can see in Figure 8.17 that the CCD(T) run times (blue) are, in fact, much higher than the CCD run times. The CCD(T) data was collected on Oak Ridge National Laboratory's Andes supercomputer using processors with a clock speed of 3.0 GHz and 64 MPI nodes. Since the CCD and CCD(T) data was collected on different supercomputers and with a different number of MPI nodes, in Figure 8.17 we have standardized the run time by reporting it as node hours per GHz. The average time to perform a CCD calculation of SNM at M = 2,956 is 84.12 node hours. However, the average time to complete a CCD(T) calculation of SNM at only M = 2,060 is 335 node hours. This significant increase in run time is necessary because the CCD(T) approximation adds some aspects of the 𝑇ˆ3 operator into the calculations that are neglected in the CCD approximation. This does make the calculations more accurate (more comparable to an actual SNM system), but we do have to pay a much higher computational cost.
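Taking the "node hours per GHz" standardization at face value (a division of the measured node hours by the machine's clock speed), the conversion is a one-liner. A minimal sketch, using the average times quoted above as example inputs:

```python
def node_hours_per_ghz(node_hours: float, clock_ghz: float) -> float:
    """Standardize run times measured on machines with different clock speeds."""
    return node_hours / clock_ghz

# Average times quoted in the text: CCD ran at 2.4 GHz, CCD(T) at 3.0 GHz.
ccd = node_hours_per_ghz(84.12, 2.4)   # CCD, SNM, M = 2,956
ccdt = node_hours_per_ghz(335.0, 3.0)  # CCD(T), SNM, M = 2,060
print(round(ccd, 2), round(ccdt, 2))   # -> 35.05 111.67
```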
This further motivates the use of the SRE method because here we are taking almost 14 node days per density of interest, making studies of, for example, the equation of state of nuclear matter prohibitively expensive.

Figure 8.17 The run times (in node hours per GHz) for CCD calculations of SNM (red) and CCD(T) calculations of SNM (blue). The CCD calculations were performed on processors which ran at 2.4 GHz and used 24 MPI nodes, while the CCD(T) calculations were performed on processors that ran at 3.0 GHz and used 64 MPI nodes

Figure 8.18 shows the results of performing an SRE analysis on CCD(T) calculations of SNM. The calculations at M = 2,060 are shown with solid lines, and the SRE predictions are shown with triangular markers, with error bars that come from the uncertainty of the Gaussian process algorithm. Here, we have used six training points (from M = 162 to M = 406), and we have set the SRE sequence length to one. This results in an average percent error of 0.65%. Compared to the average percent error of 14.70% between the data at M = 406 (the largest training point) and the converged energies at M = 2,060, the SRE method accurately predicts the converged CCD(T) energies from unconverged energies. Looking at the run times, it takes a total of 2,000 node hours to generate the six data points at M = 2,060, but only 721 node hours to create the training data needed by the SRE algorithm to accurately produce these converged correlation energies. This results in a time savings of 1,278 node hours or 53.28 node days. Finally, we can compare the CCD correlation energies for SNM (red) and the CCD(T) correlation energies for SNM (blue) in Figure 8.19.
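The two bookkeeping quantities used throughout this chapter, the average percent error and the node-day savings, amount to the following. This is a minimal sketch, with toy correlation energies for the error metric and the CCD(T) node-hour totals quoted above as example inputs (the quoted figures are rounded, so the last digit may differ).

```python
def average_percent_error(predicted, reference):
    """Mean of |predicted - reference| / |reference|, expressed in percent."""
    terms = [abs(p - r) / abs(r) * 100.0 for p, r in zip(predicted, reference)]
    return sum(terms) / len(terms)

def node_days_saved(full_node_hours, training_node_hours):
    """Node-hour difference between full and SRE-training runs, in node days."""
    return (full_node_hours - training_node_hours) / 24.0

# Toy correlation energies; node-hour totals taken from the text above.
print(round(average_percent_error([-1.99, -2.02], [-2.0, -2.0]), 2))  # -> 0.75
print(round(node_days_saved(2000.0, 721.0), 2))  # -> 53.29
```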
Figure 8.18 The CCD(T) correlation energies for SNM calculated at M = 2,060 (solid line) and predicted with SRE (triangular markers)

The complete calculations at M = 2,956 for CCD and M = 2,060 for CCD(T) are shown with solid lines, and the SRE predictions are shown with triangular markers. We draw two important conclusions from this figure. First, the CCD and CCD(T) correlation energies differ significantly, which justifies the higher computational cost of the CCD(T) calculations, as the CCD(T) results should more closely model a real system by taking into account some aspects of the 𝑇ˆ3 operator. Second, we can see that the SRE method performs well on both the CCD and CCD(T) data sets, again showing that SRE works well on different coupled cluster data sets with no modifications of the algorithm besides changing the two hyperparameters: the number of training data points and the SRE sequence length.

8.3 Symmetry Energy

While we are limiting our infinite matter calculations to pure neutron matter (x_p = 0.0) and symmetric nuclear matter (x_p = 0.5), we can still calculate a quantity that is important to studies of infinite nuclear matter and the nuclear equation of state.

Figure 8.19 The correlation energies for SNM calculated with CCD (red) and CCD(T) (blue). The full calculations at M = 2,956 (CCD) and M = 2,060 (CCD(T)) are shown with solid lines, and the SRE predictions are shown with triangular markers

This quantity is the symmetry energy, which is defined as

E_sym(ρ) = E(ρ, x_p = 0) − E(ρ, x_p = 1/2). (8.1)

Figure 8.20 shows the symmetry energies calculated for both the CCD approximation (red) and the CCD(T) approximation (blue) for densities around nuclear density. The symmetry energies calculated with fully converged energies at M = 1,478 for PNM and M = 2,956 for SNM are shown with the solid lines, and the symmetry energies calculated with SRE predictions are shown with triangular markers.
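Equation (8.1) is a pointwise difference of the PNM (x_p = 0) and SNM (x_p = 1/2) energies per nucleon at each density. A minimal sketch with hypothetical energy values (not thesis data):

```python
# Eq. (8.1): E_sym(rho) = E(rho, x_p = 0) - E(rho, x_p = 1/2), per density point.
def symmetry_energy(e_pnm, e_snm):
    """Symmetry energy from PNM and SNM energies per nucleon on a shared density grid."""
    return [ep - es for ep, es in zip(e_pnm, e_snm)]

e_pnm = [10.5, 12.1, 14.0]     # hypothetical E/A (MeV) for pure neutron matter
e_snm = [-15.8, -16.0, -15.5]  # hypothetical E/A (MeV) for symmetric nuclear matter
print([round(s, 1) for s in symmetry_energy(e_pnm, e_snm)])  # -> [26.3, 28.1, 29.5]
```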
Additionally, the uncertainty on the SRE predictions is shown with error bars; this uncertainty comes from the Gaussian process algorithm. Note that this plot shows energy per nucleon instead of correlation energy per nucleon, unlike the other plots in this chapter. For the CCD approximation, the average percent error between the two data sets is 0.056%, and for the CCD(T) approximation, it is 0.078%. Note that these percent errors are much smaller than previous results because we are considering the total energy instead of just the correlation energies, so small differences have much less of an impact. We also need to consider time savings as a justification for performing the SRE method over the complete calculations for the symmetry energy.

Figure 8.20 The symmetry energy for infinite nuclear matter calculated using the CCD approximation (red) and the CCD(T) approximation (blue). The symmetry energy calculated from fully converged calculations is shown with solid lines and the SRE predictions with triangular points. Error bars represent the uncertainty on the symmetry energy from the SRE predictions' Gaussian process algorithm

The time needed to generate all 12 points (6 for PNM and 6 for SNM) needed to calculate the symmetry energy at M = 1,478 or M = 2,956, respectively, is 661 node hours for the CCD approximation and 2,608 node hours for the CCD(T) approximation. The time needed to generate all of the training data is 353 node hours for the CCD approximation and 1,472 node hours for the CCD(T) approximation. The training data requires just 55.8% of the computational time it takes to generate the fully converged energies, and most of this time is due to the high cost of the SNM calculations. This leads to a time savings of 20.11 node days for the CCD approximation with an average error of 0.056%. The time savings for the CCD(T) approximation is 47.33 node days with an average percent error of only 0.078%.
Therefore, when calculating the symmetry energy, the SRE method provides an accurate prediction and can save many days of computational time.

8.4 Final Remarks

As pointed out in the pure neutron matter section of this chapter, the main goal of this chapter is not to compare coupled cluster calculations with different levels of approximation, different nuclear interactions, and different infinite nuclear matter systems. Though we have made some of these comparisons, the main point of this chapter is to analyze the accuracy and time savings of the SRE algorithm. As we have seen throughout this chapter, coupled cluster calculations of nuclear systems have very large (and possibly prohibitive) computational times, especially when using realistic nuclear interactions and higher-order coupled cluster approximations. The SRE method, with its ability to make accurate extrapolations with a small amount of training data, can make these studies more feasible by drastically reducing the run times needed to perform the calculations. This is especially important when we realize that studies of the nuclear equation of state (an essential future work for this thesis) will require accurate calculations at different densities and proton fractions. Thus, the development of the SRE method, by reducing the computational time needed to perform accurate calculations, makes large-scale studies of infinite nuclear matter much more feasible, paving the way for novel studies to occur in the future.

CHAPTER 9 CONCLUSION

The sequential regression extrapolation (SRE) method developed here, based on Bayesian regression algorithms, is an accurate and valuable extrapolator for removing basis incompleteness errors from coupled cluster calculations of infinite matter systems. Furthermore, when the infinite matter system needs to be taken to the thermodynamic limit, it is possible to use SRE to perform this task.
Since the SRE algorithm uses training data taken from calculations at small numbers of single-particle states to predict the correlation energy at many single-particle states, the SRE algorithm can offer significant time savings over performing fully converged correlation energy calculations. As shown in this thesis, it is possible to use the SRE algorithm to save over 100 node hours in the calculation of just one correlation energy. This large time savings does come with some loss of accuracy compared to performing the full calculations. However, the average percent error between the SRE prediction and the fully calculated result was typically less than 1%, a small price compared to the large amount of computational time that has been saved. Furthermore, besides developing the SRE method, we also compared different methods of performing coupled cluster calculations of systems of infinite nuclear matter. First, we compared the results from two different interactions: the Minnesota potential, a toy interaction, and chiral NNLO potentials, which are much more realistic. By comparing these two, we learned that they differ quite significantly at densities near nuclear density and that calculations using the NNLO potentials take much longer to compute than the Minnesota potential applied to the same system. Furthermore, we also compared the coupled cluster approximations coupled cluster doubles (CCD), the iterative coupled cluster triples approximation (CCDT-1), and coupled cluster doubles with perturbative triples (CCD(T)) on calculations of infinite matter systems. We found that the triples approximations give significantly different results compared to the CCD approximation, so they are worth performing even though they come with an increase in computational time.
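As a generic illustration of the kind of Bayesian regressor behind these claims (trained on only a handful of points, with no manual hyperparameter tuning, and with uncertainties produced automatically), consider the sketch below. It uses scikit-learn's BayesianRidge on toy data and is not the thesis implementation.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Four training points from a noiseless linear trend, comparable in size to the
# smallest training sets used in this work. Toy data, not thesis calculations.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = 0.5 * X_train.ravel() - 1.0

model = BayesianRidge()  # the prior precisions are fit automatically from the data
model.fit(X_train, y_train)

# Every prediction comes with a standard deviation at no extra cost.
mean, std = model.predict(np.array([[5.0]]), return_std=True)
print(f"prediction: {mean[0]:.3f} +/- {std[0]:.3f}")
```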
Though it has been mentioned throughout this thesis, it is essential to emphasize the size of the training data sets used in this work. This work's most extensive training data set used only 16 training points, and the smallest training set used only 3 points. Some areas of physics have been slow to adopt machine learning because of the vast amount of training data that some machine learning algorithms require. However, this work has shown that accurate machine learning predictions can be made with very few training points, thus encouraging the use of machine learning as a tool in fields with small data sets.

Possible Future Works

A few notable future works stem directly from the work presented here. First, while we showed results from the CCD, CCDT-1, and CCD(T) approximations, we did not have the capabilities to produce coupled cluster correlation energies using a complete CCDT calculation. This is mainly due to the high computational costs of a complete triples calculation (O(M⁸)), but recent computational advancements make this more achievable. Additionally, while we only looked at pure neutron matter and symmetric nuclear matter here, there are other proton fractions of interest. If we want to model the equation of state of nuclear matter thoroughly, then we need to be able to accurately predict the properties of nuclear matter at proton fractions beyond just 0.0 and 0.5. While truncating the number of particles in a calculation is limited to infinite matter and other large systems, truncations occur in every ab initio many-body calculation. Basis truncation is especially common and occurs in almost every calculation except some simple toy models. The last part of this thesis will be dedicated to exploring some possible future applications of the SRE methodology which has been developed. An extension of the work presented in this thesis is to apply the SRE method to remove basis incompleteness errors from coupled cluster calculations of nuclei.
Though nuclei are finite systems and the number of nucleons in the system generally does not need to be truncated, the number of single-particle states is still truncated, leading to a need to extrapolate to an infinite model space [51]. This is especially true for heavy nuclei and nuclei that are weakly bound [51]. There are methods to perform these extrapolations on calculations of nuclei, but when using the harmonic oscillator basis, which mixes the ultraviolet and infrared cutoffs, these extrapolation methods can fail [51]. Machine learning has been used to perform similar extrapolations (see References [51], [64], [65], for example), but these were performed with neural networks and thus incurred all of the problems that were experienced with neural networks in this thesis. Additionally, there is no reason the SRE method should be restricted to predicting CC energies using MBPT2 energies. Extending the SRE method to other many-body methods should also be possible. One of the most accurate yet restricted ab initio many-body methods is full configuration interaction theory (FCI), which uses a variationally optimized linear combination of the full set of Slater determinants. Full configuration interaction is used in nuclear physics and electronic-structure theory, but its complexity limits it to only the smallest of systems. The ground state energy found by FCI is the lowest (variationally) and most accurate that can be achieved. If an infinite single-particle basis is used, FCI produces the solutions to the Schrödinger equation. However, due to computational limitations, FCI calculations must be performed with a finite basis, meaning they will fail to retrieve the exact total energy [76]. While most of this thesis, except for the sections on the electron gas, has been devoted to nuclear physics applications, ab initio many-body methods occur in many other fields besides nuclear physics.
For example, similarity renormalization group theory (SRG), coupled cluster theory, and other many-body methods are prevalent in other fields of physics and quantum chemistry. Therefore, the development of the SRE method should improve calculations outside of the realm in which it was developed.

BIBLIOGRAPHY

[1] W.H. Young, N.H. March, and S. Sampanthar. The Many-Body Problem in Quantum Physics. Dover Publications, 1995.

[2] G. Hagen et al. “Coupled-cluster computations of atomic nuclei”. In: Reports on Progress in Physics 77.9 (Sept. 2014), p. 096302. doi: 10.1088/0034-4885/77/9/096302.

[3] Gustav Baardsen. “Coupled-cluster theory for infinite matter”. Ph.D. thesis. University of Oslo, 2014.

[4] D. J. Dean and M. Hjorth-Jensen. “Coupled-cluster approach to nuclear physics”. In: Physical Review C 69 (5 May 2004), p. 054320. doi: 10.1103/PhysRevC.69.054320.

[5] Jiří Čížek. “On the Correlation Problem in Atomic and Molecular Systems. Calculation of Wavefunction Components in Ursell-Type Expansion Using Quantum-Field Theoretical Methods”. In: The Journal of Chemical Physics 45.11 (May 2004), pp. 4256–4266. doi: 10.1063/1.1727484.

[6] J. Čížek and J. Paldus. “Correlation problems in atomic and molecular systems III. Rederivation of the coupled-pair many-electron theory using the traditional quantum chemical methods”. In: International Journal of Quantum Chemistry 5.4 (1971), pp. 359–379. doi: 10.1002/qua.560050402.

[7] Rodney J. Bartlett and Monika Musiał. “Coupled-cluster theory in quantum chemistry”. In: Reviews of Modern Physics 79 (1 Feb. 2007), pp. 291–352. doi: 10.1103/RevModPhys.79.291.

[8] H. Kümmel, K.H. Lührmann, and J.G. Zabolitzky. “Many-fermion theory in expS- (or coupled cluster) form”. In: Physics Reports 36.1 (1978), pp. 1–63. doi: 10.1016/0370-1573(78)90081-9.

[9] R. F. Bishop. “An overview of coupled cluster theory and its applications in physics”. In: Theoretica Chimica Acta 80 (1991), pp. 95–148.
doi: 10.1007/BF01119617.

[10] Jochen H. Heisenberg and Bogdan Mihaila. “Ground state correlations and mean field in 16O”. In: Physical Review C 59 (3 Mar. 1999), pp. 1440–1448. doi: 10.1103/PhysRevC.59.1440.

[11] Baishan Hu et al. “Ab initio predictions link the neutron skin of 208Pb to nuclear forces”. In: Nature Physics 18.10 (Aug. 2022). doi: 10.1038/s41567-022-01715-8.

[12] A. Ekström et al. “Accurate nuclear radii and binding energies from a chiral interaction”. In: Physical Review C 91 (5 May 2015), p. 051301. doi: 10.1103/PhysRevC.91.051301.

[13] L. Coraggio et al. “Nuclear-matter equation of state with consistent two- and three-body perturbative chiral interactions”. In: Physical Review C 89 (4 Apr. 2014), p. 044321. doi: 10.1103/PhysRevC.89.044321.

[14] F. Sammarruca et al. “Toward order-by-order calculations of the nuclear and neutron matter equations of state in chiral effective field theory”. In: Physical Review C 91 (5 May 2015), p. 054311. doi: 10.1103/PhysRevC.91.054311.

[15] Robert Roth et al. “Medium-Mass Nuclei with Normal-Ordered Chiral NN+3N Interactions”. In: Physical Review Letters 109 (5 July 2012), p. 052501. doi: 10.1103/PhysRevLett.109.052501.

[16] Alessandro Lovato et al. “Comparative study of three-nucleon potentials in nuclear matter”. In: Physical Review C 85 (2 Feb. 2012), p. 024003. doi: 10.1103/PhysRevC.85.024003.

[17] Francesca Sammarruca and Randy Millerson. “Exploring the relationship between nuclear matter and finite nuclei with chiral two- and three-nucleon forces”. In: Physical Review C 102 (3 Sept. 2020), p. 034313. doi: 10.1103/PhysRevC.102.034313.

[18] A. Ekström et al. “Δ isobars and nuclear saturation”. In: Physical Review C 97 (2 Feb. 2018), p. 024332. doi: 10.1103/PhysRevC.97.024332.

[19] W. G. Jiang et al. “Accurate bulk properties of nuclei from A = 2 to ∞ from potentials with Δ isobars”. In: Physical Review C 102 (5 Nov. 2020), p. 054301. doi: 10.1103/PhysRevC.102.054301.

[20] S. Mishev and M. Savova.
“Coupled-cluster method for nucleonic matter using GPU tensor cores”. In: AIP Conference Proceedings 2522.1 (Sept. 2022). 090007. doi: 10.1063/5.0101117.

[21] L. Satpathy. “Infinite nuclear matter based for mass of atomic nuclei”. In: Journal of Physics G: Nuclear Physics 13.6 (June 1987), p. 761. doi: 10.1088/0305-4616/13/6/009.

[22] J.W. Negele and D. Vautherin. “Neutron star matter at sub-nuclear densities”. In: Nuclear Physics A 207.2 (1973), pp. 298–320. doi: 10.1016/0375-9474(73)90349-7.

[23] S. Gandolfi, J. Carlson, and Sanjay Reddy. “Maximum mass and radius of neutron stars, and the nuclear symmetry energy”. In: Physical Review C 85 (3 Mar. 2012), p. 032801. doi: 10.1103/PhysRevC.85.032801.

[24] W. G. Jiang et al. “Emulating ab initio computations of infinite nucleonic matter”. In: (Dec. 2022).

[25] H. Moeini and G.H. Bordbar. “Nuclear matter calculations with the phenomenological three-nucleon interaction”. In: Nuclear Physics A 1017 (2022), p. 122339. doi: 10.1016/j.nuclphysa.2021.122339.

[26] “Neutron Star Models: Masses and Radii”. In: Black Holes, White Dwarfs, and Neutron Stars. John Wiley & Sons, Ltd, 1983. Chap. 9, pp. 241–266. doi: 10.1002/9783527617661.ch9.

[27] J. M. Lattimer and M. Prakash. “Neutron Star Structure and the Equation of State”. In: The Astrophysical Journal 550.1 (Mar. 2001), p. 426. doi: 10.1086/319702.

[28] James M. Lattimer and Madappa Prakash. “Neutron star observations: Prognosis for equation of state constraints”. In: Physics Reports 442.1 (2007). The Hans Bethe Centennial Volume 1906-2006, pp. 109–165. doi: 10.1016/j.physrep.2007.02.003.

[29] Andrew W. Steiner, James M. Lattimer, and Edward F. Brown. “The Equation of State from Observed Masses and Radii of Neutron Stars”. In: The Astrophysical Journal 722.1 (Sept. 2010), p. 33. doi: 10.1088/0004-637X/722/1/33.

[30] James M. Lattimer. “The Nuclear Equation of State and Neutron Star Masses”.
In: Annual Review of Nuclear and Particle Science 62.1 (2012), pp. 485–515. doi: 10.1146/annurev-nucl-102711-095018.

[31] Henning Heiselberg and Morten Hjorth-Jensen. “Phases of dense matter in neutron stars”. In: Physics Reports 328.5 (2000), pp. 237–327. doi: 10.1016/S0370-1573(99)00110-6.

[32] S. Gandolfi et al. “Quantum Monte Carlo calculation of the equation of state of neutron matter”. In: Physical Review C 79 (5 May 2009), p. 054005. doi: 10.1103/PhysRevC.79.054005.

[33] S. Gandolfi et al. “Equation of state of low-density neutron matter, and the ¹S₀ pairing gap”. In: Physical Review C 80 (4 Oct. 2009), p. 045802. doi: 10.1103/PhysRevC.80.045802.

[34] G. Hagen et al. “Coupled-cluster calculations of nucleonic matter”. In: Physical Review C 89 (1 Jan. 2014), p. 014319. doi: 10.1103/PhysRevC.89.014319.

[35] G. Baardsen et al. “Coupled-cluster studies of infinite nuclear matter”. In: Physical Review C 88 (5 Nov. 2013), p. 054312. doi: 10.1103/PhysRevC.88.054312.

[36] Sean Bruce Sangolt Miller. “Quantum Mechanical Studies of Infinite Matter by the use of Coupled-Cluster Calculations, with an Emphasis on Nuclear Matter”. Master's thesis. University of Oslo, 2017.

[37] Murray Gell-Mann and Keith A. Brueckner. “Correlation Energy of an Electron Gas at High Density”. In: Physical Review 106 (2 Apr. 1957), pp. 364–368. doi: 10.1103/PhysRev.106.364.

[38] James J. Shepherd. “Communication: Convergence of many-body wave-function expansions using a plane-wave basis in the thermodynamic limit”. In: The Journal of Chemical Physics 145.3 (2016), p. 031104. doi: 10.1063/1.4958461.

[39] R. F. Bishop and K. H. Lührmann. “Electron correlations. II. Ground-state results at low and metallic densities”. In: Physical Review B 26 (10 Nov. 1982), pp. 5523–5557. doi: 10.1103/PhysRevB.26.5523.

[40] R.F. Bishop and K.H. Lührmann. “Coupled clusters and the electron gas at metallic densities”. In: Physica B+C 108.1 (1981), pp. 873–874.
doi: 10.1016/0378-4363(81)90741-5.

[41] Verena A. Neufeld, Hong-Zhou Ye, and Timothy C. Berkelbach. “Ground-State Properties of Metallic Solids from Ab Initio Coupled-Cluster Theory”. In: The Journal of Physical Chemistry Letters 13.32 (2022). PMID: 35939802, pp. 7497–7503. doi: 10.1021/acs.jpclett.2c01828.

[42] Tina N. Mihm, Bingdi Yang, and James J. Shepherd. “Power Laws Used to Extrapolate the Coupled Cluster Correlation Energy to the Thermodynamic Limit”. In: Journal of Chemical Theory and Computation 17.5 (2021). PMID: 33830754, pp. 2752–2758. doi: 10.1021/acs.jctc.0c01171.

[43] Verena A. Neufeld and Alex J. W. Thom. “A study of the dense uniform electron gas with high orders of coupled cluster”. In: The Journal of Chemical Physics 147.19 (Nov. 2017). 194105. doi: 10.1063/1.5003794.

[44] Andreas Köhn and David P. Tew. “Towards the Hartree–Fock and coupled-cluster singles and doubles basis set limit: A study of various models that employ single excitations into a complementary auxiliary basis set”. In: The Journal of Chemical Physics 132.2 (Jan. 2010). 024101. doi: 10.1063/1.3291040.

[45] Thomas B. Adler and Hans-Joachim Werner. “Local explicitly correlated coupled-cluster methods: Efficient removal of the basis set incompleteness and domain errors”. In: The Journal of Chemical Physics 130.24 (June 2009). 241101. doi: 10.1063/1.3160675.

[46] Ke Liao and Andreas Grüneis. “Communication: Finite size correction in periodic coupled cluster theory calculations of solids”. In: The Journal of Chemical Physics 145.14 (Oct. 2016). 141102. doi: 10.1063/1.4964307.

[47] Tina N. Mihm, Laura Weiler, and James J. Shepherd. “How the Exchange Energy Can Affect the Power Laws Used to Extrapolate the Coupled Cluster Correlation Energy to the Thermodynamic Limit”. In: Journal of Chemical Theory and Computation 19.6 (2023). PMID: 36918372, pp. 1686–1697. doi: 10.1021/acs.jctc.2c00737.

[48] Xin Xing and Lin Lin.
“Origin of inverse volume scaling in periodic coupled cluster calculations towards thermodynamic limit”. In: (Apr. 2023).

[49] Tina N. Mihm, William Z. Van Benschoten, and James J. Shepherd. “Accelerating convergence to the thermodynamic limit with twist angle selection applied to methods beyond many-body perturbation theory”. In: The Journal of Chemical Physics 154.2 (Jan. 2021). 024113. doi: 10.1063/5.0033408.

[50] T.N. Mihm, T. Schäfer, S.K. Ramadugu, et al. “A shortcut to the thermodynamic limit for quantum many-body calculations of metals”. In: Nat Comput Sci 1 (2021), pp. 801–808. doi: 10.1038/s43588-021-00165-1.

[51] W. G. Jiang, G. Hagen, and T. Papenbrock. “Extrapolation of nuclear structure observables with artificial neural networks”. In: Physical Review C 100 (5 Nov. 2019), p. 054326. doi: 10.1103/PhysRevC.100.054326.

[52] R. Utama and J. Piekarewicz. “Validating neural-network refinements of nuclear mass models”. In: Physical Review C 97 (1 Jan. 2018), p. 014306. doi: 10.1103/PhysRevC.97.014306.

[53] Giuseppe Carleo and Matthias Troyer. “Solving the quantum many-body problem with artificial neural networks”. In: Science 355.6325 (2017), pp. 602–606. doi: 10.1126/science.aag2302.

[54] Laura Weiler, Tina N. Mihm, and James J. Shepherd. “Machine learning for a finite size correction in periodic coupled cluster theory calculations”. In: The Journal of Chemical Physics 156.20 (May 2022). 204109. doi: 10.1063/5.0086580.

[55] Justin Smith et al. “Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning”. In: Nature Communications 10 (July 2019). doi: 10.1038/s41467-019-10827-4.

[56] M. R. Mumpower et al. “Physically interpretable machine learning for nuclear masses”. In: Physical Review C 106 (2 Aug. 2022), p. L021301. doi: 10.1103/PhysRevC.106.L021301.

[57] Bryce Fore et al. “Dilute neutron star matter from neural-network quantum states”. In: (Dec. 2022).

[58] Rodrigo A.
Vargas-Hernández et al. “Extrapolating Quantum Observables with Machine Learning: Inferring Multiple Phase Transitions from Properties of a Single Phase”. In: Physical Review Letters 121 (25 Dec. 2018), p. 255702. doi: 10.1103/PhysRevLett.121.255702.

[59] Léo Neufcourt et al. “Bayesian approach to model-based extrapolation of nuclear observables”. In: Physical Review C 98 (3 Sept. 2018), p. 034318. doi: 10.1103/PhysRevC.98.034318.

[60] R. Utama, J. Piekarewicz, and H. B. Prosper. “Nuclear mass predictions for the crustal composition of neutron stars: A Bayesian neural network approach”. In: Physical Review C 93 (1 Jan. 2016), p. 014311. doi: 10.1103/PhysRevC.93.014311.

[61] Marco Knöll et al. “Machine learning for the prediction of converged energies from ab initio nuclear structure calculations”. In: Physics Letters B 839 (2023), p. 137781. doi: 10.1016/j.physletb.2023.137781.

[62] Johannes T. Margraf and Karsten Reuter. “Making the Coupled Cluster Correlation Energy Machine-Learnable”. In: The Journal of Physical Chemistry A 122.30 (2018). PMID: 29985611, pp. 6343–6348. doi: 10.1021/acs.jpca.8b04455.

[63] Ying Zhang and Chen Ling. “A strategy to apply machine learning to small datasets in materials science”. In: npj Computational Materials 4 (May 2018). doi: 10.1038/s41524-018-0081-z.

[64] Gianina Alina Negoita et al. “Deep Learning: A Tool for Computational Nuclear Physics”. In: (Mar. 2018).

[65] Gianina Alina Negoita et al. “Deep learning: Extrapolation tool for ab initio nuclear theory”. In: Physical Review C 99 (5 May 2019), p. 054308. doi: 10.1103/PhysRevC.99.054308.

[66] S. Athanassopoulos et al. “Nuclear mass systematics using neural networks”. In: Nuclear Physics A 743.4 (2004), pp. 222–235. doi: 10.1016/j.nuclphysa.2004.08.006.

[67] R. Utama, J. Piekarewicz, and H. B. Prosper. “Nuclear mass predictions for the crustal composition of neutron stars: A Bayesian neural network approach”.
In: Physical Review C 93 (1 Jan. 2016), p. 014311. doi: 10.1103/PhysRevC.93.014311.

[68] R. Utama, Wei-Chia Chen, and J. Piekarewicz. “Nuclear charge radii: density functional theory meets Bayesian neural networks”. In: Journal of Physics G: Nuclear and Particle Physics 43.11 (Oct. 2016), p. 114002. doi: 10.1088/0954-3899/43/11/114002.

[69] R. Utama and J. Piekarewicz. “Refining mass formulas for astrophysical applications: A Bayesian neural network approach”. In: Physical Review C 96 (4 Oct. 2017), p. 044308. doi: 10.1103/PhysRevC.96.044308.

[70] Amber Boehnlein et al. “Colloquium: Machine learning in nuclear physics”. In: Reviews of Modern Physics 94 (3 Sept. 2022), p. 031003. doi: 10.1103/RevModPhys.94.031003.

[71] Nawar Ismail and Alexandros Gezerlis. “Machine-learning approach to finite-size effects in systems with strongly interacting fermions”. In: Physical Review C 104 (5 Nov. 2021), p. 055802. doi: 10.1103/PhysRevC.104.055802.

[72] Sota Yoshida. “Nonparametric Bayesian approach to extrapolation problems in configuration interaction methods”. In: Physical Review C 102 (2 Aug. 2020), p. 024305. doi: 10.1103/PhysRevC.102.024305.

[73] Bryce Meredig et al. “Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery”. In: Mol. Syst. Des. Eng. 3 (5 2018), pp. 819–825. doi: 10.1039/C8ME00012C.

[74] Ryosuke Jinnouchi, Hirohito Hirata, and Ryoji Asahi. “Extrapolating Energetics on Clusters and Single-Crystal Surfaces to Nanoparticles by Machine-Learning Scheme”. In: The Journal of Physical Chemistry C 121.47 (2017), pp. 26397–26405. doi: 10.1021/acs.jpcc.7b08686.

[75] Rodrigo Navarro Pérez and Nicolas Schunck. “Controlling extrapolations of nuclear properties with feature selection”. In: Physics Letters B 833 (2022), p. 137336. doi: 10.1016/j.physletb.2022.137336.

[76] James J. Shepherd et al.
“Convergence of many-body wave-function expansions using a plane-wave basis: From homogeneous electron gas to solid state systems”. In: Physical Review B 86 (3 July 2012), p. 035111. doi: 10.1103/PhysRevB.86.035111.

[77] Justin G. Lietz et al. “Computational Nuclear Physics and Post Hartree-Fock Methods”. In: An Advanced Course in Computational Nuclear Physics: Bridging the Scales from Quarks to Neutron Stars. Ed. by Morten Hjorth-Jensen, Maria Paola Lombardo, and Ubirajara van Kolck. Cham: Springer International Publishing, 2017, pp. 293–399. doi: 10.1007/978-3-319-53336-0_8.

[78] Justin Gage Lietz. “Computational developments for ab initio many-body theory”. Ph.D. thesis. Michigan State University, 2019.

[79] Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Sebastopol, CA: O'Reilly Media, 2017.

[80] Isaiah Shavitt and Rodney J. Bartlett. Many-Body Methods in Chemistry and Physics: MBPT and Coupled-Cluster Theory. Cambridge Molecular Science. Cambridge University Press, 2009. doi: 10.1017/CBO9780511596834.

[81] J. Paldus and J. Čížek. “Time-Independent Diagrammatic Approach to Perturbation Theory of Fermion Systems”. In: ed. by Per-Olov Löwdin. Vol. 9. Advances in Quantum Chemistry. Academic Press, 1975, pp. 105–197. doi: 10.1016/S0065-3276(08)60040-4.

[82] D. R. Hartree. “The Wave Mechanics of an Atom with a Non-Coulomb Central Field. Part I. Theory and Methods”. In: Mathematical Proceedings of the Cambridge Philosophical Society 24.1 (1928), pp. 89–110. doi: 10.1017/S0305004100011919.

[83] V. Fock. “Näherungsmethode zur Lösung des quantenmechanischen Mehrkörperproblems”. In: Zeitschrift für Physik 61.1-2 (Jan. 1930), pp. 126–148. doi: 10.1007/BF01340294.

[84] Chr. Møller and M. S. Plesset. “Note on an Approximation Treatment for Many-Electron Systems”. In: Physical Review 46 (7 Oct. 1934), pp. 618–622. doi: 10.1103/PhysRev.46.618.

[85] J. Hubbard.
“The description of collective motions in terms of many-body perturbation theory”. In: Proceedings of the Royal Society A 240 (1223 July 1957), pp. 539–560. doi: 10.1098/rspa.1957.0106. [86] N. M. Hugenholtz. “Perturbation theory of large quantum systems”. In: Physica 23.1 (1957), pp. 481–532. doi: https://doi.org/10.1016/S0031-8914(57)92950-6. [87] F. Coester. “Bound states of a many-particle system”. In: Nuclear Physics 7 (1958), pp. 421–424. doi: https://doi.org/10.1016/0029-5582(58)90280-3. [88] F. Coester and H. Kümmel. “Short-range correlations in nuclear wave functions”. In: Nuclear Physics 17 (1960), pp. 477–485. doi: https://doi.org/10.1016/0029-5582(60)90140-1. [89] Andrew G. Taube and Rodney J. Bartlett. “Improving upon CCSD(T): Λ CCSD(T). I. Potential energy surfaces”. In: The Journal of Chemical Physics 128.4 (Jan. 2008). 044110. doi: 10.1063/1.2830236. [90] Stanisław A. Kucharski and Rodney J. Bartlett. “Noniterative energy corrections through fifth-order to the coupled cluster singles and doubles method”. In: The Journal of Chemical Physics 108.13 (Apr. 1998), pp. 5243–5254. doi: 10.1063/1.475961. [91] Rodney J. Bartlett et al. “Non-iterative fifth-order triple and quadruple excitation energy corrections in correlated methods”. In: Chemical Physics Letters 165.6 (1990), pp. 513–522. doi: https://doi.org/10.1016/0009-2614(90)87031-L. [92] Krishnan Raghavachari et al. “A fifth-order perturbation comparison of electron correlation theories”. In: Chemical Physics Letters 157.6 (1989), pp. 479–483. doi: https://doi.org/10.1016/S0009-2614(89)87395-6. [93] W. Kutzelnigg. “Error analysis and improvements of coupled-cluster theory.” In: Theoret. Chim. Acta 80 (May 1991), pp. 349–386. doi: 10.1007/BF01117418. [94] Yuan He, Zhi He, and Dieter Cremer.
“Comparison of CCSDT-n methods with coupled-cluster theory with single and double excitations and coupled-cluster theory with single, double, and triple excitations in terms of many-body perturbation theory - What is the most effective triple-excitation method?” In: Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta) 105 (Jan. 2001), pp. 182–196. doi: 10.1007/s002140000196. [95] Krishnan Raghavachari et al. “A fifth-order perturbation comparison of electron correlation theories (Reprinted from Chemical Physics Letters)”. In: Chemical Physics Letters 589 (Dec. 2013), pp. 37–40. doi: 10.1016/j.cplett.2013.08.064. [96] Miroslav Urban et al. “Towards a full CCSDT model for electron correlation”. In: The Journal of Chemical Physics 83 (Oct. 1985), pp. 4041–4046. doi: 10.1063/1.449067. [97] Audun Skau Hansen. “Coupled cluster studies of infinite systems”. Master's thesis. University of Oslo, 2015. [98] Katuro Sawada. “Correlation Energy of an Electron Gas at High Density”. In: Physical Review 106 (2 Apr. 1957), pp. 372–383. doi: 10.1103/PhysRev.106.372. [99] Yu-Liang Liu. “The intrinsical character of the electronic correlation in an electron gas”. In: Physics Letters A 383.12 (2019), pp. 1336–1340. doi: https://doi.org/10.1016/j.physleta.2019.01.029. [100] Shang-Keng Ma and Keith A. Brueckner. “Correlation Energy of an Electron Gas with a Slowly Varying High Density”. In: Physical Review 165 (1 Jan. 1968), pp. 18–31. doi: 10.1103/PhysRev.165.18. [101] W. J. Carr and A. A. Maradudin. “Ground-State Energy of a High-Density Electron Gas”. In: Physical Review 133 (2A Jan. 1964), A371–A374. doi: 10.1103/PhysRev.133.A371. [102] J. P. Blaizot. “Nuclear compressibilities”. In: Physics Reports 64.4 (1980), pp. 171–248. doi: https://doi.org/10.1016/0370-1573(80)90001-0. [103] L. Satpathy. “Infinite nuclear matter based model for mass of atomic nuclei”. In: Journal of Physics G: Nuclear Physics 13.6 (June 1987), p. 761.
doi: 10.1088/0305-4616/13/6/009. [104] K. Hebeler and A. Schwenk. “Chiral three-nucleon forces and neutron matter”. In: Physical Review C 82 (1 July 2010), p. 014314. doi: 10.1103/PhysRevC.82.014314. [105] J. Carlson et al. “Quantum Monte Carlo calculations of neutron matter”. In: Physical Review C 68 (2 Aug. 2003), p. 025802. doi: 10.1103/PhysRevC.68.025802. [106] Alexandros Gezerlis and J. Carlson. “Low-density neutron matter”. In: Physical Review C 81 (2 Feb. 2010), p. 025803. doi: 10.1103/PhysRevC.81.025803. [107] M. Baldo and G. F. Burgio. “Properties of the nuclear medium”. In: Reports on Progress in Physics 75.2 (Jan. 2012), p. 026301. doi: 10.1088/0034-4885/75/2/026301. [108] Arianna Carbone, Artur Polls, and Arnau Rios. “Symmetric nuclear matter with chiral three-nucleon forces in the self-consistent Green’s functions approach”. In: Physical Review C 88 (4 Oct. 2013), p. 044302. doi: 10.1103/PhysRevC.88.044302. [109] D. R. Thompson, M. Lemere, and Y. C. Tang. “Systematic investigation of scattering problems with the resonating-group method”. In: Nuclear Physics A 286.1 (1977), pp. 53–66. doi: https://doi.org/10.1016/0375-9474(77)90007-0. [110] S. K. Bogner et al. “Testing the density matrix expansion against ab initio calculations of trapped neutron drops”. In: Physical Review C 84 (4 Oct. 2011), p. 044306. doi: 10.1103/PhysRevC.84.044306. [111] B. Friman and A. Schwenk. “Three-Body Interactions in Fermi Systems”. In: From Nuclei to Stars, pp. 141–156. doi: 10.1142/9789814329880_0006. [112] J. W. Holt, N. Kaiser, and W. Weise. “Density-dependent effective nucleon-nucleon interaction from chiral three-nucleon forces”. In: Physical Review C 81 (2 Feb. 2010), p. 024002. doi: 10.1103/PhysRevC.81.024002. [113] K. Hebeler et al. “Improved nuclear matter calculations from chiral low-momentum interactions”. In: Physical Review C 83 (3 Mar. 2011), p. 031301. doi: 10.1103/PhysRevC.83.031301. [114] Hans-Werner Hammer, Andreas Nogga, and Achim Schwenk.
“Colloquium: Three-body forces: From cold atoms to nuclei”. In: Reviews of Modern Physics 85 (1 Jan. 2013), pp. 197–217. doi: 10.1103/RevModPhys.85.197. [115] W. N. Polyzou and W. Glöckle. “Three-body interactions and on-shell equivalent two-body interactions”. In: Few-Body Systems 9 (Feb. 1990), pp. 97–121. doi: 10.1007/BF01091701. [116] Jun-ichi Fujita and Hironari Miyazawa. “Pion Theory of Three-Body Forces”. In: Progress of Theoretical Physics 17.3 (Mar. 1957), pp. 360–365. doi: 10.1143/PTP.17.360. [117] A. Ekström et al. “Optimized Chiral Nucleon-Nucleon Interaction at Next-to-Next-to-Leading Order”. In: Physical Review Letters 110 (19 May 2013), p. 192502. doi: 10.1103/PhysRevLett.110.192502. [118] G. Hagen et al. “Emergent properties of nuclei from ab initio coupled-cluster calculations”. In: Physica Scripta 91.6 (May 2016), p. 063006. doi: 10.1088/0031-8949/91/6/063006. [119] G. Hagen et al. “Emergent properties of nuclei from ab initio coupled-cluster calculations”. In: Physica Scripta 91.6 (May 2016), p. 063006. doi: 10.1088/0031-8949/91/6/063006. [120] Sven Binder et al. “Ab initio path to heavy nuclei”. In: Physics Letters B 736 (2014), pp. 119–123. doi: https://doi.org/10.1016/j.physletb.2014.07.010. [121] R. Navarro Pérez, J. E. Amaro, and E. Ruiz Arriola. “Low-energy chiral two-pion exchange potential with statistical uncertainties”. In: Physical Review C 91 (5 May 2015), p. 054002. doi: 10.1103/PhysRevC.91.054002. [122] R. J. Furnstahl et al. “Quantifying truncation errors in effective field theory”. In: Physical Review C 92 (2 Aug. 2015), p. 024005. doi: 10.1103/PhysRevC.92.024005. [123] Michael I. Haftel and Frank Tabakin. “Nuclear saturation and the smoothness of nucleon-nucleon potentials”. In: Nuclear Physics A 158.1 (1970), pp. 1–42. doi: https://doi.org/10.1016/0375-9474(70)90047-3. [124] Petr Navrátil et al. “Unified ab initio approaches to nuclear structure and reactions”. In: Physica Scripta 91.5 (Apr. 2016), p. 053002.
doi: 10.1088/0031-8949/91/5/053002. [125] Sven Binder et al. “Extension of coupled-cluster theory with a noniterative treatment of connected triply excited clusters to three-body Hamiltonians”. In: Physical Review C 88 (5 Nov. 2013), p. 054319. doi: 10.1103/PhysRevC.88.054319. [126] H. Hergert et al. “The In-Medium Similarity Renormalization Group: A novel ab initio method for nuclei”. In: Physics Reports 621 (2016). Memorial Volume in Honor of Gerald E. Brown, pp. 165–222. doi: https://doi.org/10.1016/j.physrep.2015.12.007. [127] D. R. Entem and R. Machleidt. “Accurate charge-dependent nucleon-nucleon potential at fourth order of chiral perturbation theory”. In: Physical Review C 68 (4 Oct. 2003), p. 041001. doi: 10.1103/PhysRevC.68.041001. [128] G. Hagen et al. “Medium-Mass Nuclei from Chiral Nucleon-Nucleon Interactions”. In: Physical Review Letters 101 (9 Aug. 2008), p. 092502. doi: 10.1103/PhysRevLett.101.092502. [129] P. Navrátil, G. P. Kamuntavičius, and B. R. Barrett. “Few-nucleon systems in a translationally invariant harmonic oscillator basis”. In: Physical Review C 61 (4 Mar. 2000), p. 044001. doi: 10.1103/PhysRevC.61.044001. [130] I. Angeli and K. P. Marinova. “Table of experimental nuclear ground state charge radii: An update”. In: Atomic Data and Nuclear Data Tables 99.1 (2013), pp. 69–95. doi: https://doi.org/10.1016/j.adt.2011.12.006. [131] H. Kümmel, K. H. Lührmann, and J. G. Zabolitzky. “Many-fermion theory in expS- (or coupled cluster) form”. In: Physics Reports 36.1 (1978), pp. 1–63. doi: https://doi.org/10.1016/0370-1573(78)90081-9. [132] B. D. Day. “Elements of the Brueckner-Goldstone Theory of Nuclear Matter”. In: Reviews of Modern Physics 39 (4 Oct. 1967), pp. 719–744. doi: 10.1103/RevModPhys.39.719. [133] K. A. Brueckner. “Many-Body Problem for Strongly Interacting Particles. II. Linked Cluster Expansion”. In: Physical Review 100 (1 Oct. 1955), pp. 36–45. doi: 10.1103/PhysRev.100.36. [134] K. A. Brueckner, C. A. Levinson, and H. M. Mahmoud.
“Two-Body Forces and Nuclear Saturation. I. Central Forces”. In: Physical Review 95 (1 July 1954), pp. 217–228. doi: 10.1103/PhysRev.95.217. [135] A. Gezerlis et al. “Quantum Monte Carlo Calculations with Chiral Effective Field Theory Interactions”. In: Physical Review Letters 111 (3 July 2013), p. 032501. doi: 10.1103/PhysRevLett.111.032501. [136] J. Carlson et al. “Quantum Monte Carlo methods for nuclear physics”. In: Reviews of Modern Physics 87 (3 Sept. 2015), pp. 1067–1118. doi: 10.1103/RevModPhys.87.1067. [137] V. Somà et al. “Chiral two- and three-nucleon forces along medium-mass isotope chains”. In: Physical Review C 89 (6 June 2014), p. 061301. doi: 10.1103/PhysRevC.89.061301. [138] W. H. Dickhoff and C. Barbieri. “Self-consistent Green’s function method for nuclei and nuclear matter”. In: Progress in Particle and Nuclear Physics 52.2 (2004), pp. 377–496. doi: https://doi.org/10.1016/j.ppnp.2004.02.038. [139] T. D. Morris, N. M. Parzuchowski, and S. K. Bogner. “Magnus expansion and in-medium similarity renormalization group”. In: Physical Review C 92 (3 Sept. 2015), p. 034331. doi: 10.1103/PhysRevC.92.034331. [140] J. Clark et al. “Statistical Modeling of Nuclear Systematics”. In: (Oct. 2001). doi: 10.1142/9789812811127_0008. [141] N. J. Costiris et al. “Decoding β-decay systematics: A global statistical model for β− half-lives”. In: Physical Review C 80 (4 Oct. 2009), p. 044332. doi: 10.1103/PhysRevC.80.044332. [142] S. Akkoyun et al. “An artificial neural network application on nuclear charge radii”. In: Journal of Physics G: Nuclear and Particle Physics 40.5 (Mar. 2013), p. 055106. doi: 10.1088/0954-3899/40/5/055106. [143] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press, 2013. [144] Wessel N. van Wieringen. “Lecture notes on ridge regression”. In: arXiv: Methodology (2015). [145] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning.
Adaptive Computation and Machine Learning. MIT Press, 2006, pp. I–XVIII, 1–248. [146] David J. C. MacKay. “Bayesian Interpolation”. In: Neural Computation 4.3 (May 1992), pp. 415–447. doi: 10.1162/neco.1992.4.3.415. [147] Michael E. Tipping. “Sparse Bayesian Learning and the Relevance Vector Machine”. In: J. Mach. Learn. Res. 1 (Sept. 2001), pp. 211–244. doi: 10.1162/15324430152748236. [148] A. J. Kallio, E. Pajanne, and R. F. Bishop. Recent Progress in Many-Body Theories. Recent progress in many-body theories no. 1. Plenum, 1987.