EIGENVECTOR CONTINUATION: CONVERGENCE AND EMULATORS By Avik Sarkar A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Physics – Doctor of Philosophy 2022 ABSTRACT EIGENVECTOR CONTINUATION: CONVERGENCE AND EMULATORS By Avik Sarkar There has been a great interest in the scientific community in using machine learning to build emulators that can accurately predict scientific processes using only a fraction of the time needed for direct calculations. The computational advantage of emulators allows us to study processes that are beyond what is possible with direct calculations. Eigenvector continuation is one such emulation technique that was introduced recently. It is a variational method that finds the extremal eigenvalues and eigenvectors of a Hamiltonian matrix with one or more control parameters. The computational advantage comes from projecting the Hamiltonian onto a much smaller subspace of basis vectors corresponding to eigenvectors at some chosen training values of the control parameters. The method has proven to be very efficient and accurate for interpolating and extrapolating eigenvectors. In this work, we present a study on the error convergence properties of eigenvector continuation. With the insights we gain from learning the convergence properties, we then propose a self-learning algorithm to efficiently select training eigenvectors for eigenvector continuation. Self-learning is an active-learning process that relies on a fast estimate of the emulator error and a greedy local optimization algorithm that becomes more accurate as the emulator approximation improves. We show that self-learning emulators are highly efficient algorithms that offer both high speed and high accuracy, and it can be applied to any emulator that emulates the solution to a system of constraint equations, such as solutions of algebraic or transcendental equations, linear and nonlinear differential equations, and linear and nonlinear eigenvalue problems. ACKNOWLEDGMENTS I am extremely grateful to my academic advisor, Dean Lee, for his guidance and support throughout my time as a graduate student. All this work would not have been possible without him. I am also thankful for the amount of time he spent with his research students and me outside of work and research. I am grateful for all the great experiences I had with him while traveling to Saugatuck beach, visiting the tulip festival in Holland, going apple-picking, and during the long drives to conferences. Through several conversations over lunches and dinners, I had the opportunity to learn various things, including several insights about academic life, which I appreciated a lot. I hope to learn more from him through future research collaborations. I would like to thank my other PhD guidance committee members, William Lynch, Thomas Parker, Jeffery Schenker, and Vladimir Zelevinsky. I have learned a lot from their excellent questions and feedback on my research. I am also grateful for all the faculty I had the opportunity to interact with during my time as a graduate student. I would especially like to thank Scott Pratt, Wade Fisher, Thomas Parker, Jeffery Schenker, Alex Brown, and Vladimir Zelevinsky for helping me learn the basics of theoretical physics and mathematics during the first couple of years of my graduate student life. I would also like to express gratitude to Ning Li, who spent a lot of time with me to help me learn the nuclear lattice Effective Field Theory codes. 
I also appreciated the research discussions I had with several other people in the lab. In particular, I would like to thank Xilin Zhang. I would also like to thank my group members, Caleb Hicks, Jacob Watkins, Gabriel Given, Zhengrong Qian, and Joseph Bonitati, for all the great discussions. iii Finally, I would like to acknowledge the enormous emotional support that I have received from my friends in the lab. I will also miss the company of my roommates Zhen Li and Juan Gu. This work would have been much more difficult without them. The computational resources for this work were provided by the Institute for Cyber- Enabled Research (iCER) at Michigan State University. iv TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Hamiltonian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 The Harmonic Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3 Perturbation Theory and Avoided Level-Crossing . . . . . . . . . . . . . . . 8 1.4 Variational Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Chapter 2 Eigenvector Continuation . . . . . . . . . . . . . . . . . . . . . . . 11 2.1 How to Apply Eigenvector Continuation . . . . . . . . . . . . . . . . . . . . 12 2.2 Variational Principle in Eigenvector Continuation . . . . . . . . . . . . . . . 19 2.3 Analytic Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.4 Interpolation and Emulators using Eigenvector Continuation . . . . . . . . . 26 Chapter 3 Anharmonic Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.1 Anharmonic Oscillator Hamiltonian . . . . . . . . . . . . . . . . . . . . . . . 31 3.2 Eigenvector Continuation Applied to the Anharmonic Oscillator . . . . . . . 32 3.3 Error Bound for General Analytic Continuation . . . . . . . . . . . . . . . . 42 3.4 Error Bound for the Anharmonic Oscillator . . . . . . . . . . . . . . . . . . . 46 Chapter 4 Convergence of Eigenvector Continuation . . . . . . . . . . . . . 54 4.1 Vector Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.2 Asymptotic Convergence of Eigenvector Continuation and Vector Continuation 65 4.3 Convergence of Eigenvector Continuation outside Radius of Convergence of Perturbation Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.4 Multi-parameter Eigenvector Continuation . . . . . . . . . . . . . . . . . . . 81 4.5 Error Dependence in Eigenvector Continuation on Location of Training Points 85 Chapter 5 Self-learning Emulators . . . . . . . . . . . . . . . . . . . . . . . . 89 5.1 Finding Optimal Training Point Locations for Eigenvector Continuation . . . 91 5.2 Self-learning Emulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5.2.1 Constraint equations and error estimates . . . . . . . . . . . . . . . . 101 5.2.2 Natural Cubic Spline Emulator . . . . . . . . . . . . . . . . . . . . . 104 5.2.3 Reduced Basis Emulator . . . . . . . . . . . . . . . . . . . . . . . . . 111 5.3 Self-learning Eigenvector Continuation . . . . . . . . . . . . . . . . . . . . . 116 Chapter 6 Nuclear Lattice EFT and Floating-block Method for Eigenvec- tor Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 6.1 Floating Block Method . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . 128 v 6.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Chapter 7 Conclusions and Outlook . . . . . . . . . . . . . . . . . . . . . . . 133 Chapter 8 Supplemental Materials . . . . . . . . . . . . . . . . . . . . . . . . 136 8.1 Anharmonic Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 8.1.1 Calculation of derivatives of eigenvector through perturbative expansion136 8.1.2 Simplified recursion relation . . . . . . . . . . . . . . . . . . . . . . . 139 8.1.3 Use in Eigenvector Continuation . . . . . . . . . . . . . . . . . . . . . 141 8.2 Natural Cubic Splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 vi LIST OF TABLES Table 3.1: The optimal choice for the number of analytic continuations and the corre- sponding α (ratio of radii while performing analytic continuations) required to minimize approximation error. . . . . . . . . . . . . . . . . . . . . . . . 52 Table 4.1: Eigenvector continuation in anharmonic oscillator. We calculate the exact eigenvectors at location ci and use them as training eigenvectors for EC, to estimate the eigenvector at ct . We see that EC approximation becomes better with more training eigenvectors, and with more spaced apart training points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Table 4.2: Energy Error with eigenvector continuation in the anharmonic oscillator at ct = 1. It converges exponentially as we increase the order of the method (see figure 3.4). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 Table 6.1: Eigenvector continuation calculation for He-4 with Lt = 40. . . . . . . . . 132 Table 6.2: Eigenvector continuation calculation for He-4 with ct = −5.1e−7. . . . . . 132 vii LIST OF FIGURES Figure 2.1: Radius of convergence of power series expansion at any point is limited by the singular points z and z̄. . . . . . . . . . . . . . . . . . . . . . . . 23 Figure 2.2: Analytic continuation allows us to extend standard perturbation theory radius of convergence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 Figure 2.3: Interpolation with eigenvector continuation - we try to find approximate eigenvalue problem solution everywhere between −700 ≤ c ≤ 700 using only six training eigenvectors. . . . . . . . . . . . . . . . . . . . . . . . . 28 Figure 3.1: Convergence of ground state energy with different methods for ct = 0.05 and 0.1. Perturbation theory fails, but eigenvector continuation does not. Notice that perturbation theory works for a few orders, and then diverges eventually. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 Figure 3.2: Eigenvector continuation converges for ct = 5, a point far away from origin. At this point perturbation theory diverges right away. . . . . . . 36 Figure 3.3: Even though we have a singularity z0 in the negative c axis, we can an- alytically continue to any point on the positive c axis with the help of multiple analytic continuations. . . . . . . . . . . . . . . . . . . . . . . . 37 Figure 3.4: Exponential convergence in eigenvector continuation. . . . . . . . . . . . 38 Figure 3.5: Convergence of the eigenvector continuation method with c = 0.1 (top) and 5 (bottom). 
The convergence accelerates as we analytically continue further away from our point of singularity because for each successive analytic continuation we can draw bigger circles where our series will converge. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 viii Figure 3.6: Comparison of error convergence for direct orthogonalization of truncated Hamiltonian with dimension N , and eigenvector continuation with order N . Perturbative EC refers to the case where we take the derivatives of the eigenvector at c = 0 as the training points. The other EC refers to the case where we random select training points at small c. For the top figure, ct = 0.1 and for the bottom figure ct = 0.1. We consistently see eigenvector continuation outperforming direct orthogonalization. . . . . . 41 Figure 3.7: Level crossing for even states along negative real axis of c. . . . . . . . . 42 Figure 3.8: Analytic continuation of a general wave function |ψ(c)⟩ with singularities at z0 , z1 and their complex conjugates. . . . . . . . . . . . . . . . . . . . 43 Figure 3.9: Difference of absolute values of lowest two even parity state eigenvalues. 48 Figure 3.10: Multiple analytic continuation of the wave function |ψ(c)⟩ for the anhar- monic oscillator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 Figure 4.1: Logarithm of the error versus order N for eigenvector continuation (aster- isks), vector continuation (solid lines), and perturbation theory (dashed lines). The three different colors (black, blue and red) correspond with Models 1A, 1B, and 1C respectively. . . . . . . . . . . . . . . . . . . . . 64 Figure 4.2: Comparison of the convergence ratios µVC (ct ), µEC (ct ), and µPT (ct ) for Model 2 with N = 10 and N ′ = 0. The training vectors for all cases are evaluated on the weak-coupling BCS side at c = −0.4695. The unitary limit value corresponds to c = −3.957. . . . . . . . . . . . . . . . . . . . 71 Figure 4.3: The lowest six energies of Model 3 as a function of c. . . . . . . . . . . . 74 Figure 4.4: Comparison of the convergence ratios µVC , µEC , and µPT versus ct for Model 3 with N = 20 and N ′ = 0. . . . . . . . . . . . . . . . . . . . . . 75 Figure 4.5: Plots of the convergence ratios µVC , µEC , and the LO, NLO, N2 LO, and N3 LO approximations to µVC versus ct for Model 3 with N = 20 and N ′ = 0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Figure 4.6: Plots of the convergence ratios µVC , µEC , and the Padé approximations (1,1) and (2,2) to µVC versus ct for Model 3 with N = 20 and N ′ = 0. . 77 ix Figure 4.7: Plots of the convergence ratios µVC , µVC EC EC 3 1 , µ , µ1 , and the N LO ap- proximations to µVC and µVC 1 versus ct for Model 3 with N = 20 and N ′ = 0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 Figure 4.8: Plots of the convergence ratio µVC versus ct for Model 3 with N ′ = 0 and N = 5, 10, 15, 20. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 Figure 4.9: Comparison of the logarithm of the eigenvector continuation error versus ND for D = 1 and D = 2 dimensions. . . . . . . . . . . . . . . . . . . . . 82 Figure 4.10: Comparison of the logarithm of the eigenvector continuation error versus ND for D = 1 and D = 3 dimensions. . . . . . . . . . . . . . . . . . . . . 84 Figure 4.11: We show different choices of path for the eigenvector continuation train- ing vectors. The direct straight line is in green, semicircle is in orange, isosceles right triangle is in blue. . . . . . 
. . . . . . . . . . . . . . . . . 86 Figure 4.12: Comparison of the logarithm of the error versus order N for different paths for the eigenvector continuation training vectors. The number of training points corresponds to N + 1. . . . . . . . . . . . . . . . . . . . . . . . . . 87 Figure 5.1: Eigenvector continuation emulator - we emulate approximate eigenvalue problem solution everywhere between −700 ≤ c ≤ 700 using six training eigenvectors. The training eigenvectors are chosen such that the volume formed by these vectors is maximized. The final volume is less than 0.02. 93 Figure 5.2: Energies for different excited states for the random matrix example given in chapter 2. After an avoided level crossing, the excited eigenvector changes with another eigenvector, and if we want to capture this informa- tion we need training points on both sides of the avoided level crossing. This explains our choice of training points in figure 5.1. . . . . . . . . . . 94 Figure 5.3: Estimates of the logarithm of the error for the cubic spline self-learning emulator, which finds the lowest real root of the fifth order polynomial p(x) in Model 1. We show results after iteration 0, 1, and 2. . . . . . . . 106 Figure 5.4: Logarithm of the actual error and error estimate for the cubic spline self- learning emulator in Model 1 after 10 iterations. . . . . . . . . . . . . . . 107 x Figure 5.5: Plot of the lowest real solution to Eq. (5.7) versus c4 in Model 2. The self-learning emulator needs to take significantly more training points near the discontinuity at c4 ≈ 1.232. . . . . . . . . . . . . . . . . . . . . . . . 108 Figure 5.6: Logarithm of the actual error and error estimate for the cubic spline self- learning emulator in Model 2 after 10 iterations. . . . . . . . . . . . . . . 109 Figure 5.7: Logarithm of the actual error and error estimate for the cubic spline self- learning emulator in Model 2 after 20 iterations. . . . . . . . . . . . . . . 109 Figure 5.8: Natural spline emulator error scaling for Model 1. We plot the logarithm of the error versus the logarithm of the number of iterations. . . . . . . . 111 Figure 5.9: Logarithm of the actual error, error estimate, and corrected error esti- mate for the natural spline emulator with self-learning in Model 3 after 20 iterations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 Figure 5.10: Logarithm of the actual error, error estimate, and corrected error esti- mate for the reduced basis emulator with self-learning in Model 3 after 10 iterations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 Figure 5.11: Natural spline emulator error scaling for Model 2. We plot the logarithm of the error versus the logarithm of the number of iterations. . . . . . . . 115 Figure 5.12: Reduced basis method error scaling for Model 2. We plot the logarithm of the error versus the number of iterations. . . . . . . . . . . . . . . . . 116 Figure 5.13: Logarithm of the error in Model 2 after 40 iterations using self-learning EC. In (a) we show the logarithm of the actual error (red), and in (b) we show the logarithm of the estimated error (blue). . . . . . . . . . . . . . 123 Figure 5.14: Plot of the two-particle short-range correlations in Model 2. ρ13 (red) measures the probability that particles 1 and 3 occupy the same lattice site, and the correlation function ρ23 (blue) measures the probability that particles 2 and 3 occupy the same lattice site. . . . . . . . . . . . . . . . 
124 Figure 5.15: Eigenvector continuation emulator error scaling for Model 3. We plot the logarithm of the error versus the number of iterations. . . . . . . . . . . 125 xi Figure 6.1: We move the auxiliary-fields of different time steps in the numerator so that we have the same auxiliary-field for time steps in numerator and denominator with same coupling. This figure shows the auxiliary-fields are moved around for Lt = 40. The different time steps in numerator and denominator with same color have same auxiliary-field. . . . . . . . . . . 130 xii Chapter 1 Introduction Eigenvector continuation (EC) is a relatively new computational technique [1] that finds the extremal eigenvalues and eigenvectors of a parameter dependent Hamiltonian matrix. It is a variational method where we project the Hamiltonian onto a subspace of basis vectors corresponding to eigenvectors at some chosen training values of the control parameters. Eigenvector continuation was originally designed for extrapolation of quantum many- body wave functions in extremely large vector spaces where general vector manipulations are not possible. It has been used in our research group to extend quantum Monte Carlo methods to problems with strong sign oscillations [2]. It has been used as a fast emulator for quantum many-body systems [3,4], and as a resummation method for perturbation theory [5]. It is well suited for studying the connections between microscopic nuclear forces and nuclear structure, a topic that has generated much recent interest [6–14]. However, we should also mention that eigenvector continuation is a special case of a larger existing formalism called reduced basis methods, which is itself a special case of an even larger class of methods called model order reduction. Although it has been shown that eigenvector continuation gives us accurate approxi- mations to our target extremal eigenvalue and eigenvector, its convergence properties have not been studied before. We know that the error in our approximation improves if we take 1 more training points, but the location of the training points also make a big difference. This dependence of approximation error on the number of training points and their location is very problem specific, and very hard to characterize. In this work, we present a study of convergence properties of eigenvector continuation, which give us additional insight about how the method works. Using this insight and ma- chine learning, we then present a self-learning algorithm to select optimal training points for eigenvector continuation. We show that this active-learning protocol of self-learning can not only be used with eigenvector continuation emulators, but also with any emulator that tries to emulate the solution to a system of constraint equations. With various examples, we try to demonstrate the widespread applicability of self-learning emulators and hope that they will be useful in other scientific fields using emulators. This work is organized as follows. In this chapter we go over our notations, basic definitions, and some introduction and background on basic topics that will be relevant for studying eigenvector continuations. We encourage the reader to directly skip to the next chapter if the reader is already familiar with these concepts. All the chapters after this will make no reference to the materials presented in this chapter. In chapter 2, we introduce the method of eigenvector continuation, and explain all its details. 
We demonstrate with simple examples how to apply it to any sample problem, and illustrate the advantages of using this method. In chapter 3, we consider a practical problem in physics - the anharmonic oscillator. Standard perturbation theory has zero radius of convergence in the case of the anharmonic oscillator, but we show that eigenvector continuation can still be applied to this problem. The method works perfectly well, and we can extrapolate to any region using this technique. In chapter 4, motivated by our results from the anharmonic oscillator, we discuss the 2 error convergence properties of eigenvector continuation. We show how the approximation error in eigenvector continuation depends on the location and number of training points in this problem, and we try to study the asymptotic convergence behaviour as we add more and more training points. This helps us understand more about eigenvector continuation, and we make an important realization about orthogonality among the training eigenvectors. In chapter 5, we continue our discussion of error convergence and selection of optimal training points for eigenvector continuation. We come up with an iterative self-learning algorithm that uses eigenvector continuation emulator itself to learn where the next optimal training point is. We then show that this algorithm is actually very general and can be applied to a whole class of emulators with a constraint equation. We present several examples to demonstrate that self-learning emulators work very well and that it efficiently learns the optimal locations for training points. In chapter 6, we give a brief introduction to our research group’s Nuclear Lattice Ef- fective Field Theory (EFT) calculations of nuclear observables. This is done the help of Auxiliary-Field Quantum Monte Carlo, but in these calculations we are plagued by a prob- lem called "sign problem", which result from the repulsive nature of the Coulomb interaction. This is a signal-to-noise ratio problem, and with strong sign problems, we cannot extract the ground state energy using this formalism. We then present an algorithm to implement eigenvector continuation in these lattice Monte Carlo codes, called the floating-block eigen- vector continuation. We show that eigenvector continuation can help us alleviate some of our sign problems by being able to extrapolate to the target point we are interested in. Finally we end with conclusions and possible future work. We also provide some details of our numerical calculations in a supplemental materials, which is given at the end of this document. 3 We start with defining our notations and basic Hamiltonian mechanics. We will not describe all the details, and the reader, if unfamiliar with the topic, is encouraged to look up any quantum mechanics book [15]. 1.1 Hamiltonian In physics, we are often interested in understanding how particles interact with each other, and how they evolve in time. For large particles, we have Newtonian mechanics, which describe the physics of their interactions very well. For small particles we need quantum mechanics. Here we will first describe quantum mechanics using wave mechanics formalism. Following the usual wave mechanics notations, we use the term wavefunction to represent any function in our coordinate space. If we take the Fourier transform of the wavefunction in coordinate space, we get a wavefunction in momentum space. 
The wavefunction Ψ(x, t) can represent a state of particle(s), and in the coordinate representation, |Ψ(x, t)|² dx is the probability of finding the particle(s) in the region x to x + dx. This wavefunction satisfies the Schrödinger equation, which governs how it evolves in time. In one dimension, it is given by

iℏ ∂Ψ/∂t = −(ℏ²/2m) ∂²Ψ/∂x² + V Ψ    (1.1)

This time-dependent equation is often written in terms of the Hamiltonian operator Ĥ = p̂²/2m + V, where p̂ = (ℏ/i)(∂/∂x). The corresponding equation is then ĤΨ = iℏ(∂Ψ/∂t). In higher dimensions, ∂/∂x becomes the gradient ∇.

The Hamiltonian is an important concept because any given problem and its interactions can be expressed through its Hamiltonian. For example, for the free particle without any interactions, the Hamiltonian is given by Ĥ = p̂²/2m. We solve equation 1.1 by separation of variables, and separating out the time dependence we get

Ĥψ = Eψ    (1.2)

where Ψ = ψ e^(−iEt/ℏ). We call the solutions to equation 1.2 stationary states because the probability of finding the particle at any point, |Ψ|² = |ψ e^(−iEt/ℏ)|² = |ψ|², does not change with time.

Since the Hamiltonian Ĥ is a Hermitian operator, the stationary state solutions to 1.2 form a complete basis. This means that we can represent any given wavefunction as a linear combination of the stationary states. This also allows us to write any operator as a function that takes a given stationary state to some combination of other stationary states. In other words, we can represent any operator as a matrix that acts on a vector written in terms of a basis of stationary states. This is the matrix formalism of quantum mechanics, and using this matrix mechanics we can turn the Schrödinger equation of 1.2 into a matrix equation,

H |ψ⟩ = E |ψ⟩    (1.3)

where H is a matrix and |ψ⟩ is a column vector. We are using the Dirac bra-ket notation, where |ψ⟩ is called a ket and is a column vector, and ⟨ψ| is called a bra and is a row vector. Their inner product is written as ⟨ψ|ψ⟩, and it is just the multiplication of a row vector and a column vector. We also assume that any given state |ψ⟩ is normalized such that ⟨ψ|ψ⟩ = 1, because we can always re-normalize any given ket as |ψ′⟩ = |ψ⟩/√⟨ψ|ψ⟩, so that ⟨ψ′|ψ′⟩ = 1.

In this work, we will be looking at the Hamiltonian as a matrix H, and we are interested in solving the eigenvalue problem given in equation 1.3. The eigenvalue E has a physical interpretation as the energy of the system, and thus we will often call the eigenvalue of the Hamiltonian the eigenenergy or just the energy. We call the smallest eigenvalue the ground state energy and the larger eigenvalues excited state energies. The corresponding eigenvectors are called the ground state eigenvector and excited state eigenvectors, respectively. Since eigenvector continuation is a variational technique that gives its best results for the ground state, we will often be interested only in the ground state. Therefore, in this work we will often refer to the ground state energy and the ground state eigenvector of the Hamiltonian as just the eigenenergy (or simply energy) and eigenvector of the Hamiltonian, respectively.

1.2 The Harmonic Oscillator

In this section we review the quantum harmonic oscillator that is usually covered in a quantum mechanics class. The harmonic oscillator problem is important for us because in chapter 3 we will consider the anharmonic oscillator, which builds on the harmonic oscillator problem.
Here we define the ladder operators and show how we get the eigenenergies for this problem.

The classical harmonic oscillator is a mass m attached to a spring with spring constant k. Its motion is given by Hooke's law,

F = −kx = m d²x/dt²    (1.4)

Its potential is given by

V = (1/2) kx² = (1/2) mω²x²    (1.5)

where ω = √(k/m) is the frequency of oscillation. Accordingly, the quantum harmonic oscillator Hamiltonian is defined by

Ĥ = p̂²/2m + (1/2) mω²x²    (1.6)

To solve the Schrödinger equation with this Hamiltonian, we define the ladder operators as follows,

a± = (1/√(2ℏmω)) (∓ip + mωx)    (1.7)

Here a+ is called the creation operator and a− is called the annihilation operator. If we define the eigenstates of the number operator N = a+a− as |n⟩, then a+|n⟩ = √(n+1) |n+1⟩ and a−|n⟩ = √n |n−1⟩. With the help of the commutation relation [x, p] = iℏ, these operators allow us to rewrite the Hamiltonian in the following way,

H = ℏω (a+a− + 1/2)    (1.8)

Since N|n⟩ = n|n⟩, we see that the eigenenergies of the Hamiltonian are of the form

En = ℏω (n + 1/2)    (1.9)

and the eigenvectors of the Hamiltonian are the same as the eigenvectors of the number operator, |n⟩. In the coordinate representation,

|n⟩ = An (a+)ⁿ ψ0(x),  where  ψ0(x) = (mω/πℏ)^(1/4) e^(−mωx²/2ℏ)    (1.10)

and An is a normalization constant that makes ⟨n|n⟩ = 1.

1.3 Perturbation Theory and Avoided Level-Crossing

In this work, we consider parameter dependent Hamiltonians of the form H(c) = H0 + cH1, where H0 and H1 are Hermitian matrices. The idea here is that H1 can be seen as a perturbation of H0, whose solution we know already. We can think of arriving at the target Hamiltonian H(ct) by setting c = 0 and slowly increasing it until c = ct. If the parameter dependent wavefunction and energy are denoted by ψ(c) and E(c), then for small c we can expand them in power series,

ψ(c) = ψ(0) + c ψ′(0) + (c²/2!) ψ″(0) + ⋯
E(c) = E(0) + c E′(0) + (c²/2!) E″(0) + ⋯    (1.11)

where the prime represents the derivative with respect to the parameter c.

In standard perturbation theory, one would put these expansions into the Schrödinger equation and compare the coefficients of equal powers of c. We do not derive the expressions for the results of perturbation theory here. Instead, we will look at the expansion in equation 1.11 in the complex plane (when we extend c to be a complex parameter), and try to understand why at some point the series expansion breaks down.

If we assume that all eigenvalues of H0 and H1 are non-degenerate, then as we change the parameter in our Hamiltonian H = H0 + cH1, the relative ordering of the eigenvalues and their corresponding eigenvectors changes. As c → ±∞, the eigenvalues and eigenvectors of H are dominated by the eigenvalues and eigenvectors of H1. When c → −∞ the largest eigenvalue of H1 gives the smallest eigenvalue of H, and the smallest eigenvalue of H1 gives the largest eigenvalue of H. But when c → ∞ the largest eigenvalue of H1 now gives the largest eigenvalue of H, and the smallest eigenvalue of H1 gives the smallest eigenvalue of H. As we move from c → −∞ to c → ∞, all the eigenvalues of H invert, and at some point they cross each other. In general, this crossing happens somewhere in the complex plane, and not necessarily on the real axis. If we look only along the real axis, we see that different eigenvalues approach each other, but they do not intersect. Instead, we see avoided level-crossings.
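The following minimal sketch makes this behavior concrete; the two-level Hamiltonian in it is an illustrative choice of my own, not a model used in this work. It sweeps c along the real axis and prints both eigenvalues, showing the levels inverting without ever touching for real c.

```python
import numpy as np

# Illustrative two-level H(c) = H0 + c*H1 (matrices chosen only for this sketch):
# the levels invert as c runs from negative to positive but never cross on the real axis.
H0 = np.array([[0.0, 0.1],
               [0.1, 0.0]])
H1 = np.array([[1.0, 0.0],
               [0.0, -1.0]])

for c in np.linspace(-1.0, 1.0, 9):
    e0, e1 = np.linalg.eigvalsh(H0 + c * H1)      # eigenvalues in ascending order
    print(f"c = {c:+.2f}   E0 = {e0:+.4f}   E1 = {e1:+.4f}   gap = {e1 - e0:.4f}")

# The gap e1 - e0 = 2*sqrt(c**2 + 0.01) is smallest (0.2) at c = 0 but never closes
# for real c; the two levels only meet at the complex couplings c = +/- 0.1i.
```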
Corresponding to these avoided level-crossings along the real-axis, there are associated branch point singularities of the levels when they are continued in the complex plane. These very special points are called exceptional points. They have been well studied [16, 17], and we only mention some brief details here. Consider the avoided level-crossing between two eigenvalues ϵ1 (c) and ϵ2 (c). Suppose these eigenvalues meet at λc in the complex plane. At this point, the two eigenvalues are equal, i.e., ϵ1 (λc ) = ϵ2 (λc ), and their corresponding eigenvectors also converge into a single eigenvector. This makes these points special because at these points there is only one eigenvector corresponding to this repeated eigenvalue, and the dimension of the eigenspace is less than the dimension of the matrix. Therefore, we have a defective matrix, and the Hamiltonian is not fully diagonalizable. The two functions, ϵ1 (c) and ϵ2 (c), are the values of one single analytic function on two Riemann sheets, and the two sheets are connected at 9 the branch point singularity occurring at c = λc . These branch point singularities of eigenvectors and eigenvalues restrict the area of convergence for simple expansions like that in equation 1.11. This means that when we have an exceptional point at λc , standard perturbation theory will fail if we try for a large c, with |c| > |λc |. This concept will be relevant for us when we discuss how eigenvector continuation can extrapolate beyond the region of convergence of perturbation theory using analytic continuation in the complex plane. 1.4 Variational Principle The variational principle says that for any given state |ψ⟩ and Hamiltonian H, the ground state energy Egs will be less than or equal to ⟨ψ|H|ψ⟩, i.e., Egs ≤ ⟨ψ|H|ψ⟩. This is relevant for us because eigenvector continuation is a variational technique, and we will see that the eigenvector continuation energy approximation for the ground state will always be greater than the actual energy. The variational principle also highlights a second important observation about eigen- vector continuation. Variational principle applies only for the ground state energy of the system. If we want to apply variational principle for excited state, then we need to remove the ground state and look in orthogonal directions to apply it. We have a similar situation with eigenvector continuation. Eigenvector continuation works best for estimating ground state energy and eigenvector (i.e. extremal eigenvector and eigenvalue), however, it can be applied to estimate the excited state energy if we include the ground state and excited state eigenvectors in the training data. 10 Chapter 2 Eigenvector Continuation Eigenvector continuation (EC) is a variational method that finds the extremal eigenvalues and eigenvectors of a Hamiltonian matrix that depends on one or more control parameters. It was first introduced in [1, 2], and was originally developed from studying the convergence of the Lanczos algorithm and searching for a set of basis states that would converge more quickly than the Krylov subspace. The main idea in eigenvector continuation is that as we vary the parameters in the Hamiltonian that we have control over, the extremal eigenvector also varies with the con- trol parameter, but this eigenvector moves in a subspace whose dimensionality is much less than the dimension of the Hilbert space that the eigenvector lives in. 
This implies that we will capture the essence of the problem if we project the problem onto the smaller vector space that the extremal eigenvector lives in. The estimate that we can get from this pro- jection is much faster to calculate because of the reduction in dimensionality. This speedup boost makes eigenvector continuation a general technique that can be applied to variety of problems. Typically, we apply eigenvector continuation to a problem by projecting the target Hamiltonian onto a subspace of basis vectors corresponding to eigenvectors at some chosen training values of the control parameters. We then solve the generalized eigenvalue problem 11 to find the extremal eigenvectors. Effectively, this gives a linear combination of the input eigenvectors that minimizes the eigenvalue of the Hamiltonian at the target value of the parameter. This linear combination serves as a good low-dimensional estimate for our exact eigenvector. 2.1 How to Apply Eigenvector Continuation Let us consider a one-parameter family of Hamiltonian matrices H(c) = H0 + cH1 , where both H0 and H1 are finite-dimensional Hermitian matrices, and c is a parameter that we can control. The extension to the multi-parameter case follows naturally, and will be discussed after our discussion of the one-parameter case. For the given Hamiltonian family H(c), we are interested in finding the ground state eigenvector |v(ct )⟩ (=extremal eigenvector) and eigenvalue E(ct ) for some target parameter value c = ct . As noted in Ref. [1], eigenvector continuation can also be extended to excited states by including excited state eigenvectors at the training points. For now we will focus on ground state calculations in this analysis. The first big advantage of eigenvector continuation that we will illustrate is that it allows us to extrapolate to a region where we cannot access by direct calculation. Our next chapter is entirely dedicated to illustrate this point with the help of an example on anharmonic oscillator. For now, suppose that for the given ct that we are interested in, H(ct ) cannot be diagonalized directly with regular computational techniques in a reasonable amount of time. We can only calculate its eigenvalues and eigenvectors for small c. We want to use this information about the eigenvectors for small c to find the eigenvector at large ct . Let us calculate the exact eigenvectors for k small values of c, where we can compute the eigenvector 12 of the Hamiltonian matrix directly. We will call these k points as our training points since we are training eigenvector continuation to give an output based on these training eigenvectors. We denote these points by ci = {c1 , · · · , ck }, and the ground state eigenvector at those corresponding points by |v(c1 )⟩ , · · · , |v(ck )⟩. We are interested in finding |v(ct )⟩ and E(ct ), where H |v(ct )⟩ = E(ct ) |v(ct )⟩. We define the norm matrix N as, Nij = ⟨v(ci )|v(cj )⟩ (2.1) where |v(ci )⟩ and |v(cj )⟩ are both taken from the training eigenvector set |v(c1 )⟩ , · · · , |v(ck )⟩. We project the target Hamiltonian H(ct ) on to the subspace spanned by the training eigenvectors |v(c1 )⟩ , · · · , |v(ck )⟩, and define the projected matrix H̃ as, H̃ij (ct ) = ⟨v(ci )|H(ct )|v(cj )⟩ (2.2) Note that we are using the target Hamiltonian with parameter c = ct , but we are projecting onto eigenvectors with parameter c = ci . After calculating these matrices, we solve the generalized eigenvalue problem for H̃ and N . 
A generalized eigenvalue problem for these two matrices is the problem of finding a (nonzero) vector |w(ck )⟩ that satisfies, H̃ |w(ct )⟩ = ẼN |w(ct )⟩ (2.3) Then, our eigenvector continuation estimate of energy is EEC = Ẽ and EC estimate of 13 the eigenvector is |v⟩EC = TV |w(ct )⟩, where TV is the matrix of training vectors,   TV = |v(c1 )⟩ · · · |v(ck )⟩ (2.4) Here |v(ci )⟩ corresponds to the eigenvector at c = ci , with all elements written in a column. The eigenvector |w(ct )⟩ we get after we solve the generalized eigenvalue problem is the eigenvector we want, but it is in the projected low-dimensional space. We get our eigenvector in our original Hilbert space, we need to multiply by the matrix of training vectors TV . We will address the question of how to choose our training points ci = {c1 , · · · , ck } later. Different choices of ci gives us different results, and the accuracy of our eigenvector continuation approximation depends on our choice of ci , and we will study this dependence when we study the convergence properties of eigenvector continuation. Only after we study the convergence properties, we will be in a position to discuss strategies to optimally choose our training points. In theory, this is all we need to perform eigenvector continuation. However, in practice, we often ortho-normalize our training vectors by Gram-Schmidt algorithm. This makes the norm matrix Nij the identity operator, and equation 2.3 just becomes a normal eigenvalue problem. Note that when we orthogonalize our training vectors, we also need to use a orthonormal set of eigenvectors in the training vector matrix in 2.4. Orthogonalization speeds up our calculation especially for large number of training vectors, and it is critical for our understanding of convergence of eigenvector continuation. We will describe the convergence in more details after two chapters. Before we move to an example, let us summarize the main idea of eigenvector contin- uation. H is a n dimensional matrix, where n can be a very large number, and H̃ is a k 14 dimensional projection of the H matrix (at the target parameter), onto the space of k train- ing points, and k is much usually smaller than n. The training eigenvectors are the exact eigenvectors of the Hamiltonian at different parameter values, and often far away from the target parameter value. By solving for the eigenvalue of H̃ instead of H, we have reduced the dimensionality of the problem, and the diagonalization is much faster. This illustrates the second big advantage of eigenvector continuation - huge computational speed up factor. Later we will see that instead of using eigenvector continuation for extrapolation, we can use EC just for this computational speed boost and use it to build fast and accurate emulators that can interpolate in between training points. For now, we will focus on extrapolation. In the next chapter, we will show how this extrapolation works with a practical example - anharmonic oscillator. In the anharmonic oscillator problem, standard perturbation theory diverges and we cannot get any results beyond a certain value of the parameter. But as we will soon show, eigenvector continuation is able to extrapolate to a region where perturbation theory fails. However, we will first take a look at a simple problem using random matrices to demonstrate how we apply eigenvector continuation to any problem. 
Suppose we have a Hamiltonian H = H0 + cH1 , where H0 and H1 are random 10-dimensional Hermitian matrices, given by, 15    9.87 9.00 10.14 9.86 10.62 9.94 10.01 10.11 9.90 9.04       9.00 10.11 8.93 9.58 9.38 9.15 10.77 9.70 9.71 7.90        10.14 8.93 9.88 9.87 10.85 9.86 10.11 8.78 9.78 10.55      9.86 9.58 9.87 9.93 9.82 10.35 10.06 9.91 9.47 10.81       10.62 9.38 10.85 9.82 10.15 9.74 9.63 11.29 10.03 10.68   H0 =      9.94 9.15 9.86 10.35 9.74 8.08 9.67 9.00 11.16 10.52       10.01 10.77 10.11 10.06 9.63 9.67 9.41 8.54 10.73 10.15       10.11 9.70 8.78 9.91 11.29 9.00 8.54 8.74 10.40 10.04        9.90 9.71 9.78 9.47 10.03 11.16 10.73 10.40 11.20 8.96      9.04 7.90 10.55 10.81 10.68 10.52 10.15 10.04 8.96 12.27   0.15 0.04 0.13 0.10 0.15 0.09 0.07 0.13 0.13 0.08     0.04 0.21 0.10 0.09 0.08 0.14 0.14 0.06 0.10 0.09       0.13 0.10 0.03 0.08 0.06 0.04 0.12 0.08 0.07 0.07     0.10 0.09 0.08 0.11 0.13 0.11 0.06 0.10 0.10 0.12       0.15 0.08 0.06 0.13 0.13 0.02 0.04 0.11 0.11 0.04   H1 =    0.09 0.14 0.04 0.11 0.02 0.10 0.11 0.09 0.10 0.13       0.07 0.14 0.12 0.06 0.04 0.11 0.09 0.13 0.05 0.08       0.13 0.06 0.08 0.10 0.11 0.09 0.13 0.07 0.13 0.13       0.13 0.10 0.07 0.10 0.11 0.10 0.05 0.13 0.05 0.12     0.08 0.09 0.07 0.12 0.04 0.13 0.08 0.13 0.12 0.04 Suppose we have the exact ground state eigenvector |ψ(c)⟩ at c = 50 and c = 100, but we are interested in the eigenvector at ct = 1000. For this simple example, we can calculate the 16 eigenvector directly by othrogonalization of the Hamiltonian, but for demonstration purposes we will show that we can find an estimate of the ground state eigenvector at c = 1000 with eigenvector continuation, using only two training eigenvectors. The ground state eigenvectors at c = 50 and c = 100 are,  T |ψ(c = 50)⟩ = 0.31 0.08 −0.44 0.06 −0.04 −0.46 0.31 −0.46 0.24 0.36  T |ψ(c = 100)⟩ = 0.31 0.06 −0.48 0.07 −0.07 −0.42 0.39 −0.43 0.22 0.32 So our training vector matrix TV is a 10 × 2 matrix, where the two columns are the two eigenvectors above. However, before projecting H(ct ) onto these training vectors, we first orthonormalize them. The orthogonalized TV matrix is,  T 0.31 0.08 −0.44 0.06 −0.04 −0.46 0.31 −0.46 0.24 0.36 TV =   0.01 0.22 0.38 −0.12 0.24 −0.31 −0.70 −0.16 0.20 0.30 Now the norm matrix N is just the 2 × 2 identity matrix, and the projected Hamiltonian matrix H̃ is given by,   † −124.24 27.09  H̃ = TV HTV =   27.09 −9.39  T The eigenvalue of this matrix H̃ is -130.31, and its corresponding eigenvector is 0.98 −0.22 . When projected back to our 10-dimensional space, we get our eigenvector continuation esti- mate of eigenvector. 17  T |ψ(ct )⟩EC = 0.30 0.03 −0.51 0.08 −0.09 −0.38 0.45 −0.41 0.19 0.29 The exact eigenvalue at ct = 1000 is -130.56 and the exact eigenvector is  T |ψ(ct )⟩ = 0.31 0.05 −0.52 0.11 −0.13 −0.39 0.44 −0.39 0.22 0.23 The difference in eigenvalue is 0.25, and the norm of the difference of the two eigenvectors is about 0.08. Thus, with only two training points, eigenvector continuation gave us an eigenvalue estimate with 99.8% accuracy, and an eigenvector estimate with 92% accuracy. With more training points, the estimate becomes better. If we add another training point at c = 150, then with three training points, the eigenvector continuation estimate of eigenvalue is 99.99% accurate, and the EC estimate of eigenvector is 99% accurate. 
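The procedure just illustrated can be written compactly in code. The following is a minimal NumPy sketch of the same recipe (training eigenvectors, Gram-Schmidt orthonormalization via QR, projection, diagonalization); the random matrices, seed, and coupling values are placeholder choices for illustration, not the matrices printed above.

```python
import numpy as np

# Minimal sketch of the eigenvector continuation recipe; the Hermitian pair (H0, H1)
# and the couplings are placeholders, not the matrices used in the text.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(10, 10)), rng.normal(size=(10, 10))
H0, H1 = (A + A.T) / 2, (B + B.T) / 2

def ground_state(c):
    # Exact ground-state eigenvector of H(c) = H0 + c*H1 by direct diagonalization.
    vals, vecs = np.linalg.eigh(H0 + c * H1)
    return vecs[:, 0]

# Training eigenvectors at accessible couplings, orthonormalized via QR (Gram-Schmidt),
# so the norm matrix N is the identity and Eq. (2.3) is an ordinary eigenvalue problem.
c_train = [50.0, 100.0]
TV, _ = np.linalg.qr(np.column_stack([ground_state(c) for c in c_train]))

# Project the target Hamiltonian onto the training subspace and diagonalize.
ct = 1000.0
H_target = H0 + ct * H1
H_tilde = TV.conj().T @ H_target @ TV
w_vals, w_vecs = np.linalg.eigh(H_tilde)
E_ec, v_ec = w_vals[0], TV @ w_vecs[:, 0]        # energy estimate and full-space eigenvector

E_exact = np.linalg.eigvalsh(H_target)[0]
print("EC energy estimate:", E_ec, " exact energy:", E_exact)  # E_ec >= E_exact (variational)
```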
Note that to find the precision, we need to calculate the exact result at the target point and directly compare it with the eigenvector continuation approximation. Calculation of the exact result can be hard or impossible in some problems, and in those cases the precision of our approximation cannot be calculated directly. For now, we discuss problems where we can calculate the exact result and directly compare our approximations. Later, we will address this issue when we discuss self-learning eigenvector continuation.

Eigenvector continuation reduced the problem of diagonalizing a 10-dimensional matrix to diagonalizing a 2-dimensional matrix. For this particular example, we achieved a speed-up factor of more than 7, while losing less than 0.2% accuracy. This shows why eigenvector continuation can be such an efficient tool for making approximations. We can achieve even more accurate estimates with more training points, but it will take a little longer. There is a trade-off between accuracy and computational speed. However, eigenvector continuation will always be faster than direct diagonalization of the target matrix.

Note that in our eigenvector continuation estimate, the accuracy of the eigenvalue is higher than the accuracy of the eigenvector. Furthermore, the eigenvalue estimate is greater than the actual eigenvalue. This is because eigenvector continuation is a variational technique, and it works by trying to minimize the eigenvalue. In the next section, we describe the variational approach of this method, and gain a deeper understanding of how eigenvector continuation works.

2.2 Variational Principle in Eigenvector Continuation

Although it might not seem like it, eigenvector continuation is a variational technique, and as such EC shares several properties with other variational methods. For example, the eigenvector continuation estimate of the ground state energy will always be greater than the actual ground state energy of the problem.

We mentioned before that the EC estimate of the eigenvector is a linear combination of the training eigenvectors. Let us see why this is true, and it will be clear why eigenvector continuation is a variational technique. Consider again the TV, H, and H̃ matrices in equations 2.1, 2.2 and 2.4. This time let us write the eigenvector |w(ct)⟩ as a column vector |w(ct)⟩ = [a1 ⋯ ak]^T. Also, for simplification, let us orthonormalize our training vectors, so that the norm matrix becomes the identity matrix, i.e., N = TV†TV = I. Now we are interested in solving the generalized eigenvalue problem given in equation 2.3,

H̃ [a1 ⋯ ak]^T = Ẽ [a1 ⋯ ak]^T

⇒ [|v1⟩ ⋯ |vk⟩]†(k×n) H(n×n) [|v1⟩ ⋯ |vk⟩](n×k) [a1 ⋯ ak]^T(k×1) = Ẽ [a1 ⋯ ak]^T(k×1)

Multiplying both sides by TV and using the orthonormality of the TV matrix, we have

⇒ H(n×n) [|v1⟩ ⋯ |vk⟩](n×k) [a1 ⋯ ak]^T(k×1) = Ẽ [|v1⟩ ⋯ |vk⟩](n×k) [a1 ⋯ ak]^T(k×1)

⇒ H(n×n) (a1|v1⟩ + ⋯ + ak|vk⟩)(n×1) = Ẽ (a1|v1⟩ + ⋯ + ak|vk⟩)(n×1)    (2.5)

To improve clarity, we have indicated the sizes of the matrices and vectors in parentheses. We are interested in the lowest eigenvalue, and as equation 2.5 shows, the eigenvector continuation algorithm picks a linear combination of the training vectors that minimizes the eigenvalue Ẽ. The eigenvector [a1 ⋯ ak]^T of the projected Hamiltonian H̃ is in fact the set of coefficients of the linear combination of the training eigenvectors.
Now the variational approach of eigenvector continuation is clear, because the method is varying the coefficients [a1 ⋯ ak]^T to minimize the energy. We do not see it directly because we are not explicitly minimizing anything. However, when we are looking for the lowest eigenvalue, the method chooses the coefficients so that it gets the smallest eigenvalue possible. In this way, eigenvector continuation chooses a linear combination of the training eigenvectors such that the eigenvalue is minimized. This variational principle is also the reason why eigenvector continuation works best for the ground state eigenvector and eigenvalue, i.e., the extremal eigenvector and eigenvalue.

Now let us look into why a linear combination of training points can approximate the target eigenvector so well, even when we are extrapolating far away in the parameter space. This brings us to our next topic - analytic continuation.

2.3 Analytic Continuation

We can understand why eigenvector continuation can extrapolate so well by seeing it as an analytic continuation in the complex plane. By looking at series expansions in the complex plane, we can also see how eigenvector continuation compares to perturbation theory. Let us first see how analytic continuation can be used to extend perturbation theory, and then we will discuss why this is relevant to eigenvector continuation. The ideas presented in this section are not directly used while applying eigenvector continuation to a practical problem. Here we only present theoretical ideas that can help explain how eigenvector continuation works.

Suppose we have a parameter dependent Hamiltonian H(c), where H is Hermitian for real c. We are interested in calculating the ground state eigenvector of H(c) for some real c, but we can also look at the dependence of the eigenvector as we vary c in the complex plane. In the end we will be interested in a point on the real axis, but this extension to the complex plane has several benefits, and one that will be obvious soon is that it allows us to visualize the effective range of perturbation theory.

For the typical Hamiltonian form that we will consider, H(c) = H0 + cH1 with Hermitian H0 and H1, one sees several avoided level-crossings that correspond to singular points in the complex plane. These points limit the radius of convergence of any power series expansion of the eigenvector involved in the level-crossing. Suppose we want to expand the ground state eigenvector |ψ(c)⟩ around c = 0. Then we can write

|ψ(c)⟩ = Σ_{n=0}^∞ (c^n / n!) |ψ^(n)(0)⟩    (2.6)

where |ψ^(n)(0)⟩ denotes the nth derivative of the eigenvector with respect to the parameter c, evaluated at c = 0. Note that this derivative is itself a vector in the same space as the eigenvector, and is in general a multidimensional vector.

Now suppose there is a singularity at z and its complex conjugate z̄. The situation is shown in figure 2.1. The singularity will restrict the radius of convergence of equation 2.6 to the circle shown in figure 2.1a. Equation 2.6 is how standard perturbation theory works, and we generally perturb from c = 0. So if we were to use perturbation theory in our problem, we would only be able to get results for c within the radius of convergence. If we move outside the circle shown in figure 2.1a, perturbation theory breaks down.

Now we can also expand in a power series centered at another point on the real axis, say w. This corresponds to doing perturbation theory, but starting our perturbation from c = w.
For this series we can write

|ψ(c)⟩ = Σ_{n=0}^∞ ((c − w)^n / n!) |ψ^(n)(w)⟩    (2.7)

(a) Expansion around c = 0. (b) Expansion around c = w.
Figure 2.1: Radius of convergence of a power series expansion at any point is limited by the singular points z and z̄.

As figure 2.1b shows, the circle within which this series converges is a little different from the previous circle in which perturbation theory from c = 0 converged. We have some overlap with the previous circle, but we have also covered some region outside the circle centered at c = 0. In the region where they overlap, both of our expansions work, and we can write the nth derivative of |ψ(c)⟩ at c = w as a series,

|ψ^(n)(w)⟩ = Σ_{m=0}^∞ (w^m / m!) |ψ^(n+m)(0)⟩    (2.8)

This follows from equation 2.6. Now we can substitute this equation into equation 2.7 and get

|ψ(c)⟩ = Σ_{n=0}^∞ Σ_{m=0}^∞ (w^m (c − w)^n / (n! m!)) |ψ^(n+m)(0)⟩    (2.9)

This equation is now applicable anywhere within the radius of convergence of the series in equation 2.7, which is the circle shown in figure 2.1b. However, it only involves derivatives at c = 0, which can all be calculated at the origin. Thus, using only the information at c = 0, we can calculate the eigenvector at any point in the circle shown in figure 2.1b. This means that we have analytically continued from the circle in figure 2.1a to the circle in figure 2.1b, and gone beyond the standard perturbation theory radius of convergence. This new region that we can access by analytic continuation is shown in figure 2.2.

Figure 2.2: Analytic continuation allows us to extend the standard perturbation theory radius of convergence.

So we have seen that we can extend perturbation theory by analytic continuation. This is possible because of the smoothness of the Hamiltonian in its parameter, and complex analysis tells us that the information about points far away from the origin is actually present near the origin. We can write the eigenvector at a far away point as a linear combination of the derivatives of the eigenvector at the origin. Now, any given training vector can also be written as a sum of derivatives of the eigenvector at the origin, and consequently, we can combine them to write the eigenvector at a far away point as a linear combination of our training eigenvectors. As we have seen in the previous section, eigenvector continuation gives us a linear combination of our input training vectors, and thus it now makes sense why eigenvector continuation works. The important thing is that for any given value of the parameter c, there exists a linear combination of training points that can approximate the eigenvector |ψ(c)⟩ at that point. As long as this linear combination exists, eigenvector continuation can take a variational approach to find the linear combination that best approximates the energy.

In practical applications, when we perform eigenvector continuation, we are dealing with finite dimensional matrices and we have a finite number of training points. So we do not have an infinite number of terms in our series, and as such we are making truncated approximations in equations 2.6, 2.7, 2.8, and 2.9. This makes our linear sum not exactly equal to the actual eigenvector at the target point. However, the main point of this technique is that for numerical purposes we do not need an infinite or even a large number of training points; we get a very good approximation with just a few.

This relation to analytic continuation is the reason why the technique was named eigenvector continuation.
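This existence claim can be checked numerically. The sketch below uses an illustrative random model (not a calculation from this work) and measures how much of the exact eigenvector at a distant coupling lies outside the span of a growing set of training eigenvectors computed at small couplings.

```python
import numpy as np

# Illustrative check: the exact ground state at a distant target coupling is almost
# entirely contained in the span of a few training eigenvectors taken at small c.
rng = np.random.default_rng(2)
A, B = rng.normal(size=(50, 50)), rng.normal(size=(50, 50))
H0, H1 = (A + A.T) / 2, (B + B.T) / 2

def ground_state(c):
    return np.linalg.eigh(H0 + c * H1)[1][:, 0]

target = ground_state(10.0)                       # eigenvector at a far-away coupling
c_train = [0.0, 0.5, 1.0, 1.5, 2.0]
for k in range(1, len(c_train) + 1):
    TV, _ = np.linalg.qr(np.column_stack([ground_state(c) for c in c_train[:k]]))
    residual = np.linalg.norm(target - TV @ (TV.T @ target))   # part outside the span
    print(f"{k} training vectors: residual outside span = {residual:.2e}")
```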
Let us emphasize once again that in a practical calculation, we are not calculating any series expansions. But these ideas of analytic continuation help us explain why eigenvector continuation works so well in extrapolating. We will come back to this in the next chapter, where we study the problem of anharmonic oscillator. In that context, we will demonstrate the power of analytic continuation, and how eigenvector continuation allows us to calculate values that perturbation theory cannot give. 25 2.4 Interpolation and Emulators using Eigenvector Con- tinuation In this section we discuss the second big advantage of using eigenvector continuation - huge computational speedup boost. As we mentioned before, by projecting the full size Hamil- tonian H onto the subspace of training vectors, which is substantially smaller than the dimension of the H, we reduce the dimensionality of the problem by possibly orders of magnitude. This means that even when we can solve a problem using conventional means, eigenvector continuation allows us to get accurate approximations of the result potentially orders of magnitude faster. This brings us to the next point - interpolation. So far we have seen eigenvector contin- uation as an extrapolation technique, especially when we see it as an analytic continuation. However, the reasoning that we used with series expansions before also works when we are inside the base circle where perturbation theory works. Within the circle where the pertur- bation theory works, the eigenvector at target coupling that we are interested in can also be written as a linear combination of training eigenvectors (technically an infinite number of training eigenvectors, but we get an approximation by truncating it). Another way to put it is that we can analytically continue to the circle we are already in - it is the trivial continuation. So, eigenvector continuation can be used at any target coupling, whether we are extrapolating to a far away point in the parameter space, or we are interpolating between training points in the parameter space. We should mention that when eigenvector continuation was introduced as a general technique, the interpolation aspect of the method gained more popularity than the extrapo- lation aspect. The reason is because we can use eigenvector continuation as a emulator, by 26 giving it the exact data at particular training points and estimating (or emulating) the exact result everywhere else. This can be interpolation or extrapolation depending on where our training data is in the parameter space, however the accuracy of our eigenvector continuation emulator is greatly increased if our training points are spread out and we are interpolating points in between them. Let us illustrate this interpolation idea with an example. We will go back to the same example that we used for extrapolation - the one with 10-dimensional random matrices. With the same H0 and H1 , we have H = H0 + cH1 . But this time, instead of being interested in the ground state eigenvector at c = 1000, we want an approximate ground state eigenvector for all c between −700 ≤ c ≤ 700. That is a lot of points. In our case, we discretized the c space into 14000 points, and that means we want the eigenvector at 14000 points. If we perform direct diagonalization, we need to diagonalize 14000 times, which takes some time to perform numerically. 
Instead, we can perform eigenvector continuation everywhere, using as training vectors the exact eigenvectors obtained from direct diagonalization at a fixed set of training values. We only need to calculate the exact result a handful of times, and performing eigenvector continuation over the rest of the points is much faster. Figure 2.3 shows the error of our eigenvector continuation emulator across the whole c parameter space with six training eigenvectors. The error in the eigenvalue is the difference between the EC emulated eigenvalue and the actual eigenvalue, while the error in the eigenvector is the norm of the difference between the EC emulated eigenvector and the actual eigenvector. The training eigenvectors are located at the points where the logarithm of the error drops to -35. Technically the error there should be -∞, but we are limited by numerical precision, and we manually set the error at these points to e^{-35}.

Figure 2.3: Interpolation with eigenvector continuation: we find an approximate solution of the eigenvalue problem everywhere in -700 ≤ c ≤ 700 using only six training eigenvectors.

As figure 2.3 shows, eigenvector continuation does an excellent job of interpolating everywhere across the domain -700 ≤ c ≤ 700. However, this plot also raises several questions: how many training points do we need to get a reasonable approximation, and where should we place them? We will try to answer these questions in chapters 4 and 5. For this particular example we actually chose our training points carefully to get good results with only six of them, and we will explain the details behind choosing training points later. A minimal sketch of the emulator workflow used in this example is given below.
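The following Python sketch shows this workflow for stand-in 10-dimensional random matrices. The actual H0, H1, and the six training values behind figure 2.3 are not reproduced here, so the accuracy will not match the figure; the training couplings below are only an illustrative spread, and chapter 5 returns to how to place them well.

import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for the 10-dimensional random matrices of this example
d = 10
A = rng.normal(size=(d, d)); H0 = (A + A.T) / 2
B = rng.normal(size=(d, d)); H1 = (B + B.T) / 2
H = lambda c: H0 + c * H1

def exact_ground(c):
    E, V = np.linalg.eigh(H(c))
    return E[0], V[:, 0]

# Training data: exact ground-state eigenvectors at six couplings
c_train = [-700.0, -30.0, -5.0, 5.0, 30.0, 700.0]
X = np.column_stack([exact_ground(c)[1] for c in c_train])
Q, _ = np.linalg.qr(X)                 # orthonormal basis of the training subspace

def ec_ground(c):
    """EC emulator: diagonalize H(c) projected onto the training subspace."""
    E, V = np.linalg.eigh(Q.T @ H(c) @ Q)
    return E[0], Q @ V[:, 0]

for c in [-300.0, -15.0, 0.0, 15.0, 300.0]:
    E_ec, _ = ec_ground(c)
    E_ex, _ = exact_ground(c)
    print(f"c = {c:6.0f}   |E_EC - E_exact| = {abs(E_ec - E_ex):.2e}")

The typical pattern, vanishing error at the training couplings and larger error in between, is what figure 2.3 displays and what motivates the question of where to put the training points.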
Chapter 3 Anharmonic Oscillator

In this chapter we present our first results of eigenvector continuation applied to a practical physics problem, the anharmonic oscillator. The anharmonic oscillator has been well studied in physics and has several applications in nuclear and condensed matter physics. We start with the motivation for studying anharmonicity and mention some of the literature in this field. We then discuss how eigenvector continuation can be applied to this system and how it overcomes the zero radius of convergence of perturbation theory. We also begin looking at the error convergence properties of eigenvector continuation and see how they scale with an increasing number of training points. We derive some bounds for the error here, and return to error convergence in more detail in the next chapter.

To start, we have the simple quantum harmonic oscillator, which is studied in a typical quantum mechanics course. The Hamiltonian with a harmonic oscillator potential is

H = \frac{p^2}{2m} + \frac{m\omega^2 x^2}{2} ,    (3.1)

where ω is the fundamental frequency of the harmonic oscillator. We assume the reader is familiar with this problem; a short introduction is given in the first chapter.

However, in studying vibrations in several areas of nuclear and condensed matter physics, one often needs to go beyond the simple harmonic oscillator approximation. For example, anharmonic effects can be important in systems containing light atoms with large vibrational amplitudes, systems with weak bonding such as hydrogen bonding, and systems at high temperatures. The quantum anharmonic oscillator has been studied extensively during the past few decades [18-23]. Several studies involving asymptotic expansions and perturbation theory, using techniques such as the WKB method, have been carried out. In particular, it is known that for the quartic anharmonic oscillator, perturbation theory has zero radius of convergence. In fact, one of the reasons to consider such a system is precisely that it has a well-defined but divergent perturbation series. Furthermore, its singularity structure has been analyzed before [24], and there is a singularity lying near the negative imaginary axis.

Apart from being useful in modeling vibrations, the quartic anharmonic oscillator (henceforth referred to simply as the anharmonic oscillator) is also a simple model field theory in one-dimensional space-time. This field theory is defined by the Hamiltonian

H = \frac{1}{2}\dot{\phi}^2 + \frac{1}{2} m^2 \phi^2 + \lambda \phi^4 ,    (3.2)

with commutation relation [φ, φ̇] = i. Since this field-theory model involves no space dimensions, there are no asymptotic states and no particle scattering. Moreover, being a one-dimensional model, it describes a universe consisting of a single oscillating point. Nonetheless, the model is worth studying, since it may give some idea of the underlying complexity of a more realistic field theory. It too has been well studied [25, 26].

3.1 Anharmonic Oscillator Hamiltonian

If we add a perturbation proportional to x^4 to the Hamiltonian in equation 3.1, we obtain the Hamiltonian of the anharmonic oscillator,

H(c) = \frac{p^2}{2m} + \frac{m\omega^2 x^2}{2} + c x^4 ,    (3.3)

where c is the strength of the perturbation. Since we treat the cx^4 term as a perturbation, we employ the annihilation and creation operators of the standard harmonic oscillator,

a = \sqrt{\frac{m\omega}{2\hbar}} \left( x + \frac{ip}{m\omega} \right) , \qquad a^{\dagger} = \sqrt{\frac{m\omega}{2\hbar}} \left( x - \frac{ip}{m\omega} \right) .    (3.4)

They act on the nth eigenstate |n⟩ as a|n⟩ = \sqrt{n} |n-1⟩ and a^{\dagger}|n⟩ = \sqrt{n+1} |n+1⟩, and from them we can write x = \sqrt{\hbar/2m\omega} \,(a + a^{\dagger}).

We choose the eigenstates of the harmonic oscillator as our basis states. In this basis, the perturbation Hamiltonian h′ = x^4 has matrix elements

⟨i|h′|i⟩ = \left( \frac{\hbar}{2m\omega} \right)^2 (6i^2 + 6i + 3) ,
⟨i-2|h′|i⟩ = \left( \frac{\hbar}{2m\omega} \right)^2 (4i - 2) \sqrt{i(i-1)} ,
⟨i+2|h′|i⟩ = \left( \frac{\hbar}{2m\omega} \right)^2 (4i + 6) \sqrt{(i+1)(i+2)} ,
⟨i-4|h′|i⟩ = \left( \frac{\hbar}{2m\omega} \right)^2 \sqrt{i(i-1)(i-2)(i-3)} ,
⟨i+4|h′|i⟩ = \left( \frac{\hbar}{2m\omega} \right)^2 \sqrt{(i+1)(i+2)(i+3)(i+4)} ,    (3.5)

and the harmonic part is diagonal,

⟨i|h|i⟩ = \hbar\omega \left( i + \frac{1}{2} \right) ,    (3.6)

where h = p^2/2m + m\omega^2 x^2/2 is the harmonic oscillator Hamiltonian. The full Hamiltonian is H = h + ch′, and for simplicity we choose \hbar\omega = 1 and m\omega^2 = 1. All the numerical results in this work use these choices of m and ω.

The x^4 term dominates the potential V = m\omega^2 x^2/2 + c x^4 at large x. Since c is the coefficient of x^4, for c < 0 the potential V → -∞ as x → ±∞ and there can be no bound states, while for c ≥ 0 bound states exist. This means that perturbation theory at c = 0 should diverge: there is a singularity on the negative real c axis reaching all the way to the origin, which prevents any series expansion about c = 0. We will indeed show that perturbation theory diverges even for small values of c, yet even in this situation eigenvector continuation manages to extrapolate correctly.
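Since equations 3.5 and 3.6 fully specify H(c) = h + ch′ in a truncated oscillator basis, the matrix is easy to assemble. The following is a minimal Python sketch (not the production code used for the results in this work), assuming \hbar\omega = m\omega^2 = 1 and a basis of N = 200 oscillator states as stated above; the direct diagonalization at c = 1 should land close to the exact ground-state energy 0.803770651 quoted in the tables of chapter 4.

import numpy as np

def anharmonic_hamiltonian(c, N=200):
    """H = h + c*h' in the truncated oscillator basis |0>, ..., |N-1>,
    using equations 3.5 and 3.6 with hbar*omega = m*omega^2 = 1,
    so that (hbar/(2*m*omega))^2 = 1/4."""
    s = 0.25
    i = np.arange(N)
    H = np.diag(i + 0.5 + c * s * (6.0 * i**2 + 6.0 * i + 3.0))        # diagonal of h + c*h'
    off2 = s * (4.0 * i + 6.0) * np.sqrt((i + 1.0) * (i + 2.0))        # <i+2|x^4|i>
    off4 = s * np.sqrt((i + 1.0) * (i + 2.0) * (i + 3.0) * (i + 4.0))  # <i+4|x^4|i>
    for k, off in [(2, off2), (4, off4)]:
        H += c * (np.diag(off[:N - k], k) + np.diag(off[:N - k], -k))
    return H

# Ground-state energy at c = 1 by direct diagonalization of the truncated matrix
E0 = np.linalg.eigvalsh(anharmonic_hamiltonian(1.0))[0]
print(f"E0(c = 1) = {E0:.9f}")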
3.2 Eigenvector Continuation Applied to the Anharmonic Oscillator

If we think of eigenvector continuation as an analytic continuation, it seems to make no sense to analytically continue when the starting circle has zero radius of convergence, and one might guess that eigenvector continuation should also fail. In theory we do have zero radius of convergence, but the anharmonic oscillator also has an infinite-dimensional basis: the basis states |n⟩ run from n = 0 to n = ∞. In a practical calculation we work with a finite basis, and this truncation pushes the singular point slightly away from the origin. As we include more basis states, the singularity moves closer to the origin, converging to the origin as the matrix dimension N → ∞. In our calculations and results we take N = 200. So eigenvector continuation does have a small circle from which we can analytically continue.

One might think that perturbation theory would then also converge inside that small circle. However, as we go to higher orders in perturbation theory, we need information from further and further away from the origin. To see this, consider a few finite-difference approximations to the derivatives of a function f(c),

f′(x) ≈ \frac{-f(x-h) + f(x+h)}{2h} ,
f″(x) ≈ \frac{-f(x-2h) + 16 f(x-h) - 30 f(x) + 16 f(x+h) - f(x+2h)}{12 h^2} ,
f‴(x) ≈ \frac{f(x-3h) - 8 f(x-2h) + 13 f(x-h) - 13 f(x+h) + 8 f(x+2h) - f(x+3h)}{8 h^3} .

Here we are using central (two-sided) differences. Higher derivatives require points further from the point where the derivative is evaluated: f(x ± 2h) for the second derivative, f(x ± 3h) for the third, and so on. What this means is that inside the small circle, perturbation theory converges for the first few orders, but as we compute higher-order terms the series starts to diverge, because the higher-order terms eventually need information from outside the circle. No matter how we set up perturbation theory at the origin, it eventually diverges as we go to higher order. We show this result in figure 3.1.

Figure 3.1: Convergence of the ground-state energy with different methods for ct = 0.05 and 0.1. Perturbation theory fails, but eigenvector continuation does not. Notice that perturbation theory works for a few orders and then eventually diverges.

Now we explain how the eigenvector continuation results in figure 3.1 were calculated. The basics of eigenvector continuation and how it is applied were described in chapter 2; here we specify how the training points were chosen. In order to make a fair comparison with perturbation theory, we take the same training information as perturbation theory. Perturbation theory expands in a series around the point of perturbation and approximates by truncating the series at some order. We saw this in the previous section in equation 2.6, and we can write a similar expansion for perturbation theory at c = 0, this time truncated at order N,

|ψ(c)⟩_{PT} = \sum_{n=0}^{N} \frac{c^n}{n!} |ψ^{(n)}(0)⟩ ,    (3.7)

where |ψ^{(n)}(0)⟩ denotes the nth derivative of the eigenvector with respect to the parameter c, evaluated at c = 0. So Nth order perturbation theory approximates the exact eigenvector as a linear combination of the derivatives of the ground-state eigenvector {|ψ^{(0)}(0)⟩, · · · , |ψ^{(N)}(0)⟩}, with fixed coefficients for this linear sum. If we perform eigenvector continuation with the same set of derivatives {|ψ^{(0)}(0)⟩, · · · , |ψ^{(N)}(0)⟩}, then eigenvector continuation also estimates the exact target eigenvector as a linear combination of those same N derivatives of the ground state eigenvector at c = 0.
This time however, the coeffi- cients are not fixed, and eigenvector continuation tries to vary these coefficients to get the best approximation. From this perspective, it seems natural that eigenvector continuation will outperform the standard perturbation theory. This is exactly how we do for our eigenvector continuation calculation. For N th or- der EC calculation, we use the N derivatives of the ground state eigenvector at c = 0, {|ψ (0) (0)⟩ , · · · , |ψ (N ) (0)⟩} as our N training eigenvectors. As we mentioned before, higher order derivatives of the eigenvector at the origin have information of the eigenvector near the origin, with higher the order of derivative, the further away information we have. So our choice of training vectors {|ψ (0) (c)⟩ , · · · , |ψ (N ) (c)⟩} is equivalent to taking N training 35 vectors close to the origin. However, the derivatives are a systematic way to adding more training vectors, rather than randomly choosing training points near origin, and we stick to choosing higher order derivatives as our training vectors for eigenvector continuation. This idea of choosing training points will be important later when we discuss the convergence of eigenvector continuation in next chapter. We have established through figure 3.1 that eigenvector continuation works and is able to converge to the correct result, whereas perturbation theory fails. We show in figure 3.2 that eigenvector continuation is also able to extrapolate to points far away from the origin. This might be surprising since our original circle where we can analytically continue from is abysmally small. However the idea of single analytic continuation can be extended to several analytic continuations, drawing bigger and bigger circles each time. This idea of multiple analytic continuations is illustrated in figure 3.3. Figure 3.2: Eigenvector continuation converges for ct = 5, a point far away from origin. At this point perturbation theory diverges right away. To keep this section shorter, and since it is not directly related to the results we are 36 Figure 3.3: Even though we have a singularity z0 in the negative c axis, we can analytically continue to any point on the positive c axis with the help of multiple analytic continuations. presenting, we describe in the supplemental materials, how we computationally calculate the derivatives of the ground state eigenvector. Although figures 3.1 and 3.2 show how the eigenvector continuation method converges as we add more training points, we learn more details when we look at it in a log graph. The first thing we observe is that for the first few orders there is an exponential convergence (which is the best type of convergence we want). This is shown in figure 3.4. However, as we go higher in order of eigenvector continuation, we notice that the convergence accelerates after some time (see figure 3.5). This can be understood intuitively by the fact that as we perform analytic continuation and go further away from our singularities, we can perform the next analytic continuation with larger radii, which accelerates the convergence rate. We can compare the error convergence of eigenvector continuation to other simple meth- ods, like direct diagonalization of the Hamiltonian matrix. Now direct diagonalization of a large matrix can be difficult, however we can employ a truncation trick to simply the process. 37 Figure 3.4: Exponential convergence in eigenvector continuation. 
Our Hamiltonian matrix might be a huge n-dimensional matrix, but we simply truncate the matrix to the lowest N states, and diagonalize a N × N Hamiltonian matrix, where N can be a small number. As we increase N and it approaches the dimension of the original Hamil- tonian matrix, the eigenvalue solution of the truncated matrix will converge exponentially to the eigenvalue solution of the actual matrix. However, the numerical complexity grows higher as we increase the dimension. Furthermore, if our original unperturbed Hamiltonian is diagonal and we add a non-diagonal perturbation matrix, the numerical complexity in- creases with strength of the perturbation c. So, this direct method may not work with high dimensions of matrices and high perturbation strength. However, for small enough dimen- sions and perturbation strength, we can compare how the diagonalization method works versus the eigenvector continuation. As we mentioned earlier, our current full dimension of Hamiltonian matrix is 200. We can now compare how the convergence of direct orthogonalization of truncated 38 Figure 3.5: Convergence of the eigenvector continuation method with c = 0.1 (top) and 5 (bottom). The convergence accelerates as we analytically continue further away from our point of singularity because for each successive analytic continuation we can draw bigger circles where our series will converge. Hamiltonian depends on truncation size N to the convergence of eigenvector continuation as a function of number of training points N . Figure (3.6) shows the result. Here we have also included the case where we perform eigenvector continuation with training vectors randomly 39 chosen at small c. These eigenvectors contain more information of the subspace than the case where we choose training vectors from perturbative expansion, because they span a larger space along the c axis. This is why we see this kind of eigenvector continuation (with training points near origin) being better than perturbative EC (with training vectors as derivatives of eigenvector at c = 0). We should also clarify a small detail in choosing training eigenvectors for eigenvector continuation, when we use eigenvectors of the Hamiltonian at small perturbation strengths as training eigenvectors. As we mentioned before, the eigenvector continuation method needs to "learn" the low-dimensional subspace through the different training eigenvectors that we choose for the method. When we choose our training eigenvectors to be the exact eigenvectors at K small parameter values, say {c1 , · · · , cK }, we obtain a basis {|ψ(c1 )⟩ , · · · , |ψ(cK )⟩}, and the method projects the Hamiltonian onto the subspace spanned by this basis to get an approximate result. To get better results, we perform a Gram-Schmidt orthogonalization on that set of K eigenvectors {|ψ(c1 )⟩ , · · · , |ψ(cK )⟩}, and remove (if any) vectors that are pointing in almost the same direction. The removal is justified by the fact that such vectors do not give much additional information about the subspace. This is often the case if we choose our ci very close together and the total Hamiltonian and its eigenvectors do not change much within the small variation of ci . We will now try to mathematically estimate the asymptotic error convergence with eigenvector continuation and analytic continuation. 
But before we begin, we remind ourselves that the singularity that causes perturbation theory to diverge is due to the avoided level crossings of the eigenvectors corresponding to different eigenvalues. These level crossings for the even parity states of anharmonic oscillator is shown in figure 3.7. As we can see, several different states cross with each other. Although it looks like the different lines intersect with 40 Figure 3.6: Comparison of error convergence for direct orthogonalization of truncated Hamil- tonian with dimension N , and eigenvector continuation with order N . Perturbative EC refers to the case where we take the derivatives of the eigenvector at c = 0 as the training points. The other EC refers to the case where we random select training points at small c. For the top figure, ct = 0.1 and for the bottom figure ct = 0.1. We consistently see eigenvector continuation outperforming direct orthogonalization. each other, actually all the crossings are the avoided-level crossing, and they do not intersect. 41 Figure 3.7: Level crossing for even states along negative real axis of c. 3.3 Error Bound for General Analytic Continuation Consider a general case where z0 and z̄0 are the nearest singularities to the origin. Let w1 be a real number such that |w1 | < |z0 |, and let z1 and z̄1 be the nearest singularities to the point w1 . Let c be a real number such that |c − w1 | < |z1 − w1 |. We start with the convergent series ∞ (c − w1 )n1 |ψ (n1 ) (w1 )⟩ X |ψ(c)⟩ = . (3.8) n1 ! n1 =0 Since z1 and z̄1 are the nearest singularities to the point w1 , they sit at the convergence radius for this series expansion. We therefore know that for any positive ρ1 such ρ1 < |z1 − w1 |, there exists a finite positive number B1 (ρ1 ) that gives an upper bound, n ρ1 1 |ψ (n1 ) (w1 )⟩ < B1 (ρ1 ), (3.9) n1 ! 42 Figure 3.8: Analytic continuation of a general wave function |ψ(c)⟩ with singularities at z0 , z1 and their complex conjugates. for all n1 . Let the remainder |R1 (c, w1 )⟩N1 be defined as |R1 (c, w1 )⟩N1 = |ψ(c)⟩ − [|ψ(c, w1 )⟩]N , (3.10) 1 where N1 (c − w1 )n1 |ψ (n1 ) (w1 )⟩ X [|ψ(c, w1 )⟩]N = . (3.11) 1 n1 ! n1 =0 We then have ∞ X (n ) (c − w1 )n1 |R1 (c, w1 )⟩N1 = |ψ 1 (w1 )⟩ , (3.12) n1 ! n1 =N1 +1 43 which satisfies the bound ∞ |c − w1 |n1 |ψ (n1 ) (w1 )⟩ X |R1 (c, w1 )⟩N1 < n1 ! n1 =N1 +1 ∞ |c − w1 | n1 X   < B1 (ρ1 ) ρ1 n1 =N1 +1 |c − w1 | N1 +1   ρ1 < B1 (ρ1 ) , (3.13) ρ1 − |c − w1 | ρ1 for any positive ρ1 with ρ1 < |z1 − w1 |. We now consider the finite sum, N1 (c − w1 )n1 |ψ (n1 ) (w1 )⟩ X [|ψ(c, w1 )⟩]N = . (3.14) 1 n1 ! n1 =0 We can write |ψ (n1 ) (w1 )⟩ as the convergent series ∞ n w1 0 |ψ (n1 ) (w (n +n ) X 1 )⟩ = |ψ 0 1 (0)⟩ . (3.15) n0 ! n0 =0 Since z0 and z̄0 are the nearest singularities to the origin, they sit at the radius of convergence for this series expansion. Therefore we know that for any positive ρ0 with ρ0 < |z0 |, there exists a finite positive number B0 (ρ0 ) such that n ρ0 0 |ψ (n0 +n1 ) (0)⟩ < B0 (ρ0 ) (3.16) n0 ! for all n0 and for all n1 from 0 to N1 . There also exists a finite positive number B̃0 (ρ0 ) such 44 that N1 n ρ0 0 |c − w1 |n1 |ψ (n0 +n1 ) (0)⟩ X < B̃0 (ρ0 ). (3.17) n0 !n1 ! n1 =0 Let the remainder |R0 (c, w0 , w1 )⟩N0 ,N1 be defined as |R0 (c, w0 , w1 )⟩N0 ,N1 = [|ψ(c, w1 )⟩]N − [|ψ(c, w0 , w1 )⟩]N ,N , (3.18) 1 0 1 where N1 N0 n X X (n +n ) w1 0 (c − w1 )n1 [|ψ(c, w0 , w1 )⟩]N ,N = |ψ 0 1 (0)⟩ . (3.19) 0 1 n0 ! n1 ! 
n1 =0 n0 =0 The remainder |R0 (c, w0 , w1 )⟩N0 ,N1 is then N1 ∞ n X X (n +n ) w1 0 (c − w1 )n1 |R0 (c, w0 , w1 )⟩N0 ,N1 = |ψ 0 1 (0)⟩ , (3.20) n0 ! n1 ! n1 =0 n0 =N0 +1 and this satisfies the bound N1 ∞ |w1 |n0 |c − w1 |n1 |ψ (n0 +n1 ) (0)⟩ X X | R0 (c, w0 , w1 )⟩N0 ,N1 < , n0 ! n1 ! n1 =0 n0 =N0 +1 ∞ |w1 | n0 X   < B̃0 (ρ0 ) ρ0 n0 =N0 +1 |w1 | N0 +1   ρ0 < B̃0 (ρ0 ) . (3.21) ρ0 − |w1 | ρ0 Putting everything together, we have the result |w1 | N0 +1   ρ0 |ψ(c)⟩ − [|ψ(c, w0 , w1 )⟩]N ,N < B̃0 (ρ0 ) 0 1 ρ0 − |w1 | ρ0 |c − w1 | N1 +1   ρ1 +B1 (ρ1 ) , (3.22) ρ1 − |c − w1 | ρ1 45 where N1 N0 n X X (n +n ) w1 0 (c − w1 )n1 [|ψ(c, w0 , w1 )⟩]N ,N = |ψ 0 1 (0)⟩ . (3.23) 0 1 n0 ! n1 ! n1 =0 n0 =0 If we now choose ρ0 = |z0 | − ϵ0 for small positive ϵ0 and choose ρ1 = |z1 − w1 | − ϵ1 for small positive ϵ1 , then we have the asymptotic bounds N +1 N +1 0 1   |w1 | |c − w1 | |ψ(c)⟩ − [|ψ(c, w0 , w1 )⟩]N ,N = O +O , 0 1 |z0 | − ϵ0 |z1 − w1 | − ϵ1 (3.24) in the limits N1 → ∞ followed by N0 → ∞. If we were to analytically continue two times (at center w1 and w2 ), then the asymptotic error bound would be given by, N +1 0  |w1 | |ψ(c)⟩ − [|ψ(c, w0 , w1 )⟩]N ,N = O (3.25) 0 1 |z0 | − ϵ0 N +1 N +1 1 2   |w2 − w1 | |c − w2 | +O +O (3.26) |z1 − w1 | − ϵ1 |z2 − w2 | − ϵ1 Similarly, we could write an equation for n analytic continuations. 3.4 Error Bound for the Anharmonic Oscillator Now, let us consider the case of anharmonic oscillator. p2 mω 2 x2 H= + + cx4 (3.27) 2m 2 If we truncate our matrix dimension at some value Nmax , then this would correspond to analyzing the anharmonic oscillator up to a certain maximum distance rmax from the 46 √ origin, which would go as rmax ∼ Nmax . Since x4 dominates x2 , if we go large enough x, the potential becomes negative for negative c and no bound state can exist (since V → −∞ as x → ∞). Thus, we will have a singularity when c is such that, mω 2 rmax 2 4 + crmax ≈0 2 Thus, the dependence of the point of singularity on Nmax goes like c ∼ −1/Nmax . When we truncate our Hilbert space at Nmax , the matrix is finite-dimensional and Hermitian, so the eigenvector will be analytic on the real axis (real part of c). So, the singularity must appear on the complex plane, with non-zero Im(c). It was shown in [24] that the singularity lies close to the negative imaginary axis. We have tried to verify this by looking at the level crossing of the lowest two even parity states. This is shown in the figure 3.9, where the difference between the two lowest even parity state eigenvalues is shown over the complex plane near origin. We can see some nice avoided level crossings that we mentioned in figure 3.7. Let z0 be the real part of point of singularity, which would be situated along the negative real axis. Let it be at a distance of r0 from the origin, which is related to the dimension of our Hamiltonian matrix Nmax , by r0 ∼ 1/Nmax . Then, we can perform the analytic continuation as described earlier. To extend the reach of the process, we can perform multiple analytic continuation. Although the actual point of singularity is at some point further than |z0 |, we will perform the following analysis assuming that the singularity is at z0 . This choice actually limits our analytic continuation, as our radius of convergence would be a little larger than what we treat here, but we will show that even with moving the singularity a little closer to the origin, there still exists analytic continuation to any point on the real axis. 
47 Figure 3.9: Difference of absolute values of lowest two even parity state eigenvalues. For the first continuation, let us choose w1 = αr0 − r0 with α > 1, and so w1 lies along positive real axis (see figure 3.10). Then, we can analytically continue to a circle of radius r1 = αr0 . Now, we can perform the analytic continuation again, but this time the expansion 48 Figure 3.10: Multiple analytic continuation of the wave function |ψ(c)⟩ for the anharmonic oscillator. would be around w1 , instead of the origin. Let us choose w2 = αr1 − r0 (from the origin and along the real axis), so the radius of analytic continuation would be r2 = αr1 = (α)2 r0 . Analytically continuing again, by taking the center at w3 = αr2 − r0 (expanding around w2 ), this time we get the radius of convergence to be r3 = (α)3 r0 . Repeating this process, we find that when we do the analytic continuation for the nth time, we can analytically continue to a circle of radius rn = (α)n r0 . The center of the circle is at c = αrn − r0 on the real axis, and the rightmost point on the real axis during the nth iteration is 2rn − r0 . Each time the next center of circle is (α − 1)rn away from the previous center of the circle. In the case of anharmonic oscillator we have z0 = z1 , and thus using equation (3.24), 49 we can find the asymptotic error for one analytic expansion, N +1 N +1 0 1   |w1 | |c − w1 | |ψ(c)⟩ − [|ψ(c, w0 , w1 )⟩]N ,N = O +O 0 1 |z0 | − ϵ0 |z0 − w1 | − ϵ1 N +1 0 (α − 1)r1 N1 +1    (α − 1)r0 =O +O r0 − ϵ0 r1 − ϵ1 ∼ O (α − 1)N0 +1 + O (α)N1 +1 (3.28) Now, for two analytic continuations, then the asymptotic error would be,  N0 +1  N1 +1 |w1 | |w2 − w1 | |ψ(c)⟩ − [|ψ(c, w0 , w1 , w2 )⟩]N0 ,N1 ,N2 = O +O |z0 | − ϵ0 |z0 − w1 | − ϵ1  N2 +1 |c − w2 | +O |z0 − w2 | − ϵ1  N0 +1  N1 +1 (α − 1)r0 (α − 1)r1 ⇒ |ψ(c)⟩ − [|ψ(c, w0 , w1 , w2 )⟩]N0 ,N1 ,N2 = O +O r0 − ϵ0 r1 − ϵ1  N1 +1 (α − 1)r2 +O r2 − ϵ1 ∼ O (α − 1)N0 +1 + O (α − 1)N1 +1 + O (α − 1)N2 +1 (3.29) Continuing in this fashion, we find the asymptotic error for n analytic continuations is, |ψ(c)⟩ − [|ψ(c, w0 , w1 , ..., wn )⟩]N ,...,Nn ∼ O (α − 1)N0 +1 + O (α − 1)N1 +1 0 + ... + O (α − 1)Nn +1 (3.30) From equation (3.30), it is clear that as long as 1 < α < 2, we let our order of eigenvector 50 continuation go to infinity and we choose N0 , N1 , ..., Nn → ∞, the error will go to zero (the sum of Ni is the order of EC). Thus, this analytic continuation method always converges, even though perturbation theory diverges. However, note that infinite N0 , N1 , ..., Nn would correspond to infinite order of eigenvec- tor continuation. Let us see what we can do in finite order of EC. Suppose we are interested in reaching near the point c = 1 on the real axis and ask ourselves the question how many analytic continuations do we need to reach that point. We find that the number of iterations n should be such that, rn+1 − r0 ∼ 1 ⇒ (α)n+1 r0 − r0 ∼ 1 (3.31) Substituting the relation between r0 and Nmax (recall that Nmax was our dimension of Hamiltonian matrix), we have (α)n+1 − 1 ∼ Nmax (3.32) and thus, ln(Nmax + 1) n∼ −1 (3.33) ln(α) So, n should be an integer larger than the right hand side of equation (3.33). Now suppose that we are interested in analytically continuing till c = 1 with a given order of eigenvector continuation, say M . We still need at least n separate analytic continuations, where n is given by equation (3.33). If each of those expansion series (like equation (3.8)) Pn have Ni terms, then we have i=0 Ni = M . 
With this constraint, we ask the question: what is the optimal choice of α that minimizes the error for given M, Nmax, and c = 1? With equation (3.30) in mind, we try to minimize

\sum_{i=1}^{n} (\alpha - 1)^{N_i + 1} ,    (3.34)

subject to the constraint

\sum_{i=1}^{n} N_i = M ,    (3.35)

where n is the number of analytic continuations, given by equation (3.33). This minimization does not entirely make sense, since equation (3.30) is defined only up to order of magnitude and we are missing constant factors in each term of the sum, which in principle could be anything. However, the solution of this minimization does give us an idea of what the ideal α looks like. Using the symmetry among the individual terms, the minimum occurs when all the N_i are equal, N_i = M/n, and thus we minimize n(\alpha - 1)^{M/n}, with n given by (3.33). We solve this problem numerically and list the results for different M and Nmax in table 3.1; a small sketch of this kind of scan is given at the end of this section.

Nmax    M     Number of analytic continuations (n)    Optimum α
 10      5      2                                       1.8171
 10     10      4                                       1.6154
 50     10      4                                       1.6154
 50     20      6                                       1.5449
 50     30      8                                       1.4646
 50     40      9                                       1.4497
 50     50     10                                       1.4297
100     10      8                                       1.6699
100     25     10                                       1.5213
100     50     12                                       1.4262
100     75     13                                       1.3905
100    100     13                                       1.3905

Table 3.1: The optimal choice for the number of analytic continuations and the corresponding α (the ratio of radii between successive analytic continuations) required to minimize the approximation error.

The results suggest that α ≈ 1.5 is a good ratio of radii between iterations. To find an exact value of the optimal α, we would need to define equation (3.30) more concretely, but we do not have accurate enough knowledge of the wave functions to do so. Our main goal in this derivation is to show that eigenvector continuation at a given order makes sense, and that the results converge to the true values as we increase the order of EC, which we have indeed shown.

Note that the choice of optimal α has no effect on our numerical eigenvector continuation calculations; we present this discussion only to better understand the convergence of eigenvector continuation. The eigenvector continuation method does not have α as a free parameter, and we have no direct control over it. For a given order of EC, the only choice we have is the selection of training eigenvectors, and it is unclear how that choice indirectly selects some α (and therefore n) for us.
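As a rough illustration of the scan behind table 3.1, the Python sketch below replays the geometric construction of figure 3.10 (starting radius r0 ~ 1/Nmax, each continuation multiplying the radius by α, rightmost real point 2r_n - r0) to count the continuations needed to reach c = 1, and then scans α over (1, 2) to minimize the order-of-magnitude cost model n(α - 1)^{M/n}. Because the cost model omits all constant factors, as noted above, the numbers it produces are only indicative and need not reproduce table 3.1 exactly.

import numpy as np

def n_continuations(alpha, Nmax, c_target=1.0):
    """Replay the construction of figure 3.10: start from a circle of radius
    r0 = 1/Nmax about the origin and expand the reach by a factor alpha per step,
    counting steps until the rightmost real point 2*r - r0 covers c_target."""
    r0 = 1.0 / Nmax
    r, n = r0, 0
    while 2 * r - r0 < c_target:
        r *= alpha
        n += 1
    return n

def best_alpha(M, Nmax, alphas=np.linspace(1.01, 1.99, 981)):
    """Grid scan of the order-of-magnitude error model n*(alpha-1)^(M/n)."""
    best = None
    for a in alphas:
        n = n_continuations(a, Nmax)
        cost = n * (a - 1.0) ** (M / n)
        if best is None or cost < best[0]:
            best = (cost, a, n)
    return best

for Nmax, M in [(10, 5), (50, 20), (100, 50)]:
    cost, a, n = best_alpha(M, Nmax)
    print(f"Nmax = {Nmax:4d}  M = {M:3d}  ->  n = {n:2d},  alpha ~ {a:.3f}")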
Chapter 4 Convergence of Eigenvector Continuation

So far we have seen that eigenvector continuation works brilliantly to simplify our calculations, allowing us to extrapolate and interpolate in the parameters of the Hamiltonian. A natural next question is how to optimize a given eigenvector continuation calculation. There is a trade-off between the accuracy of the emulator and the computational time it takes to emulate, and we want the best results for a given number of training points. So we want to know where we should place our training points to get the best performance, and at what order eigenvector continuation should be performed, i.e., how many training vectors we need. To answer these questions, we first need to understand the convergence properties of eigenvector continuation as we add more training points and as we vary their locations. In this chapter we discuss how eigenvector continuation converges as a function of the number of training points, gaining additional insight into how eigenvector continuation works. In the next chapter we return to where the training points should be placed for best results.

The answers we are seeking are intuitively somewhat clear. The eigenvector continuation method learns the low-dimensional subspace in which the target eigenvector lives through the training eigenvectors that we choose for the method. With a fixed number of training points, we want the training points to be spread further apart so that we have more information about the eigenspace and eigenvector continuation can learn more of the manifold. It is also clear that the more training eigenvectors we choose, the better our EC approximation will be.

No. of terms    Values of ci                ct     EC ground state energy    Exact energy
 3              0.01, 0.02, 0.03            0.4    0.672645280               0.668772604
 3              0.02, 0.04, 0.06            0.4    0.670458737               0.668772604
 3              0.03, 0.06, 0.09            0.4    0.669562096               0.668772604
 4              0.01, 0.02, 0.03, 0.04      0.4    0.669503683               0.668772604
 4              0.02, 0.04, 0.06, 0.08      0.4    0.668956488               0.668772604
 5              0.01, 0.02, ..., 0.05       0.4    0.668898346               0.668772604
 5              0.02, 0.04, ..., 0.10       0.4    0.668788797               0.668772604
10              0.01, 0.02, ..., 0.10       0.4    0.668772607               0.668772604
 3              0.01, 0.02, 0.03            1      0.842860074               0.803770651
 3              0.02, 0.04, 0.06            1      0.827563722               0.803770651
 3              0.03, 0.06, 0.09            1      0.819258718               0.803770651
 4              0.01, 0.02, 0.03, 0.04      1      0.817399692               0.803770651
 4              0.02, 0.04, 0.06, 0.08      1      0.809913282               0.803770651
 5              0.01, 0.02, ..., 0.05       1      0.808359378               0.803770651
 5              0.02, 0.04, ..., 0.10       1      0.805211425               0.803770651
10              0.01, 0.02, ..., 0.10       1      0.803778746               0.803770651

Table 4.1: Eigenvector continuation in the anharmonic oscillator. We calculate the exact eigenvectors at the locations ci and use them as training eigenvectors for EC to estimate the eigenvector at ct. The EC approximation improves with more training eigenvectors and with training points that are more spread out.

However, the error dependence of eigenvector continuation on the location of the training points is non-trivial and extremely problem specific. It is hard to make general statements about the convergence behavior as the training points are moved around, so it is very hard to say, for a general Hamiltonian, where the training points should be chosen. As for how many training points we need, it depends on how complicated the problem is and how much accuracy we want. Let us look at this point through an example. Table 4.1 shows how eigenvector continuation performs as we vary the location and number of training points for the anharmonic oscillator. We have also seen for the anharmonic oscillator that as we add more training points, the error convergence looks exponential (see figure 3.4); table 4.2 shows the data for that graph, and a short sketch of the workflow behind these tables follows below.

No. of terms    Values of ci               EC energy      Exact energy    Difference from exact
 2              0.01, 0.02                 0.917673992    0.803770651     0.113903341
 3              0.01, 0.02, 0.03           0.842860074    0.803770651     0.039089423
 4              0.01, 0.02, ..., 0.04      0.817399692    0.803770651     0.013629041
 5              0.01, 0.02, ..., 0.05      0.808359378    0.803770651     0.004588727
 6              0.01, 0.02, ..., 0.06      0.805234149    0.803770651     0.001463498
 7              0.01, 0.02, ..., 0.07      0.804209584    0.803770651     0.000438933
 8              0.01, 0.02, ..., 0.08      0.803894175    0.803770651     0.000123524
 9              0.01, 0.02, ..., 0.09      0.803803266    0.803770651     3.26149E-05
10              0.01, 0.02, ..., 0.10      0.803778746    0.803770651     8.09511E-06
12              0.01, 0.02, ..., 0.12      0.803771231    0.803770651     5.7948E-07

Table 4.2: Energy error of eigenvector continuation in the anharmonic oscillator at ct = 1. The error converges exponentially as we increase the order of the method (see figure 3.4).
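The workflow behind tables 4.1 and 4.2 is straightforward to sketch: build H(c) in the truncated oscillator basis (equations 3.5 and 3.6), take the exact ground-state eigenvectors at the training couplings ci, and diagonalize H(ct) projected onto their span. The Python sketch below does this for ct = 1. It is a sketch rather than the production code, so the printed energies should land close to the corresponding table entries, up to small numerical differences in the projection step.

import numpy as np

def anharmonic_H(c, N=200):
    # H = h + c*h' in the oscillator basis with hbar*omega = m*omega^2 = 1 (eqs. 3.5-3.6)
    i = np.arange(N); s = 0.25
    H = np.diag(i + 0.5 + c * s * (6.0 * i**2 + 6.0 * i + 3.0))
    for k, v in [(2, s * (4.0 * i + 6.0) * np.sqrt((i + 1.0) * (i + 2.0))),
                 (4, s * np.sqrt((i + 1.0) * (i + 2.0) * (i + 3.0) * (i + 4.0)))]:
        H += c * (np.diag(v[:N - k], k) + np.diag(v[:N - k], -k))
    return H

def ground(c):
    E, V = np.linalg.eigh(anharmonic_H(c))
    return E[0], V[:, 0]

def ec_energy(c_train, ct):
    """EC estimate of E0(ct) from exact eigenvectors at the couplings c_train."""
    X = np.column_stack([ground(c)[1] for c in c_train])
    Q, _ = np.linalg.qr(X)                       # orthonormalized training basis
    return np.linalg.eigvalsh(Q.T @ anharmonic_H(ct) @ Q)[0]

ct = 1.0
print(f"exact E0(ct)     : {ground(ct)[0]:.9f}")
for k in (3, 5, 10):
    c_train = [0.01 * (j + 1) for j in range(k)]     # 0.01, 0.02, ..., as in the tables
    print(f"{k:2d} training pts  : {ec_energy(c_train, ct):.9f}")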
So the eigenvector continuation error depends on both the number of training points and their locations. As we mentioned at the beginning of this chapter, there are two questions: how many training points do we need, and where should we choose them? It is hard to answer them together, because the error convergence depends on both in a complicated way. We therefore simplify the problem and study them separately: we study the error convergence with respect to the number of training points, and separately consider how to choose the training points when their number is fixed. We take up the latter question in the next chapter. It is hard to say where to place N training points for an arbitrary problem, but we can devise an algorithm that iteratively chooses the best location for the next training point; we can start from one or two random training points and repeat the algorithm as many times as we like.

In this chapter we consider the former question, the error dependence on the number of training points. Since the error varies so much as we move a training point around, it makes little sense to study the error convergence "as we add another training point," because the answer depends on where the next training point is added. If we want to study the error convergence with respect to the number of training points, we need a systematic way of adding the next training point. We have already seen one systematic way in the anharmonic oscillator problem of the previous chapter: taking the derivatives of the eigenvector at c = 0 as our training eigenvectors. Higher-order derivatives of the eigenvector at the origin are well defined, so it makes sense to ask how the eigenvector continuation approximation improves as we add one more higher-order derivative to the training set. This systematic choice also extends naturally to a large number of training points, which is what we need in order to study the asymptotic error convergence of the method.

To study this idea of adding higher-order derivatives to the training set mathematically, we introduce a more general form of eigenvector continuation, which we call vector continuation (VC). We generalize from being interested in the eigenvector of a particular n-dimensional Hamiltonian with a variable parameter c to being interested in a vector living in an n-dimensional vector space, parameterized by c. The n-dimensional vector could be the eigenvector of an n-dimensional Hamiltonian, in which case we recover eigenvector continuation from vector continuation. Vector continuation allows us to pose the problem in a more purely mathematical form, and this generalization helps us understand the convergence properties of eigenvector continuation in a general sense. After deriving our results, we will show that vector continuation, when applied to our usual problem of finding eigenvectors of a parameterized Hamiltonian, gives the same result as eigenvector continuation.

4.1 Vector Continuation

Consider an n-dimensional vector space V parameterized by c. We first take c to be a one-dimensional parameter; the ideas presented here can be extended to a vector space parameterized by a multi-dimensional c. Let us denote a vector in this space by |v(c)⟩. We assume that this vector varies smoothly as we vary c, so we can apply all our tools from analysis.
Suppose we are interested at knowing the vector |v(ct )⟩ at a particular value of the parameter ct , but we only have the information of the vector at specified training points ci = {c1 , · · · , ck }. Let us denote the training vectors as |v(c0 )⟩ , · · · , |v(ck )⟩. Like eigenvector continuation, in vector continuation we approximate |v(ct )⟩ as a linear combination of vectors |v(c0 )⟩ , · · · , |v(cN )⟩ at training points c = c0 , · · · , cN . The difference is that in vector continuation we construct the best approximation by projecting |v(ct )⟩ onto the subspace spanned by the training point vectors. While this sounds similar to eigenvector continuation, the way we implement it in our code is different. Vector continuation is a simpler process than the variational calculation used in eigenvector continuation. However, we should emphasize that since it requires knowledge of the target eigenvector, vector con- tinuation should be viewed as a diagnostic tool to study error convergence, rather than a practical method for determining |v(ct )⟩. We will show that eigenvector continuation and vector continuation have identical convergence properties, and so it suffices to understand the convergence properties of vector continuation. As the name suggests, vector continuation can also be generalized to any smooth vector path |v(c)⟩ without reference to Hamiltonian matrices or eigenvectors. We want to understand the asymptotic convergence of eigenvector and vector contin- 58 uation at large orders. As we mentioned earlier, we need a systematic way to add more training points. To do this we consider a sequence of training points c0 , · · · , cN with a well-defined limit point clim for large N that is some distance away from the target point ct . Without loss of generality we can redefine clim so that our limit point corresponds to c = 0. In the limit where the training points accumulate around c = 0, we can replace our training vectors |v(c0 )⟩ , · · · , |v(cN )⟩ with the derivatives of |v(c)⟩ at c = 0, which we write as |v (0) (0)⟩ , · · · , |v (N ) (0)⟩. These derivative vectors approximately span the same N + 1- dimensional subspace as our original training vectors. We explained the reasoning behind this in the previous chapter. And there is an important detail here that we should mention. Although we are taking the limit of large N , the values of N we take are always vastly smaller than the number of dimensions of our linear space. Our choice of training points provides a good definition for the convergence properties at large orders. It can be viewed as the worst possible choice of training points for best convergence since all of our training points are clustered in one area, and we need to extrap- olate to a target point located somewhere else. When the training points are spread apart and not clustered, the convergence is generally much faster, especially if the training points surround the target point from all sides. We have seen this in the example of anharmonic oscillator. However, as we mentioned before, the convergence for this more general case is highly dependent on the positions of the training points, and it is difficult to make a clean definition of asymptotic convergence. Therefore, for now, with our choice of training points, we try to learn the worst possible error convergence for large N . 
We have already seen the convergence is faster in the one parameter case of the anharmonic oscillator when the train- ing points are spread out, and later we will show that convergence for the multi-parameter case, it is much faster if there exists a smooth curve connecting some subset of the training 59 points to the target point. We will now compare the error convergence of vector continuation, eigenvector continua- tion, and perturbation theory. First we define what are our three different approximations to the target eigenvector |v(ct )⟩ in the three different methods are. For the given target eigen- vector |v(ct )⟩, we denote the order-N eigenvector continuation approximation as |v(ct )⟩EC N , the order-N vector continuation approximation as |v(ct )⟩VC N , and the N th order perturba- tive theory approximation as |v(ct )⟩PT N . For eigenvector continuation to make sense, in the following we assume that the vector |v(c)⟩ is actually the eigenvector of some Hamiltonian H(c). As we will soon see, vector continuation does not care about H(c) and works with only |v(ct )⟩ (assuming it is given somehow), whereas eigenvector continuation works with the Hamiltonian H(c). All three methods work by approximating the target eigenvector as a linear combination of the training eigenvectors. Therefore, to make a fair comparison, we use the same set of training vectors {|v (0) (0)⟩ , · · · , |v (N ) (0)⟩} for eigenvector continuation and vector continua- tion. So we have our set of training derivative vectors |v (0) (0)⟩ , · · · , |v (N ) (0)⟩, and we per- form Gram-Schmidt orthogonalization to get a set of orthonormal vectors, which we call |w(0) (0)⟩ , · · · , |w(N ) (0)⟩. Here we are assuming that the derivative vectors are linearly in- dependent, and therefore our orthonormal set also has size N . This assumption is generally true for almost all of the practical problems we encounter in physics. With this orthonormal basis |w(0) (0)⟩ , · · · , |w(N ) (0)⟩, we define our vector continuation approximation |v(ct )⟩VC N as N |v(ct )⟩VC ⟨w(n) (0)|v(ct )⟩ |w(n) (0)⟩ . X N = (4.1) n=0 60 Notice that we have just projected the target eigenvector onto the individual (orthonormal) training eigenvectors. Therefore, to perform this we need the information about the target eigenvector |v(ct )⟩. As a reminder, vector continuation is not a practical method to extrap- olate to a target location. It is only a diagnostic tool to compare and understand error convergence. In the same orthonormal basis of training vectors, we can also write the eigenvector continuation approximation |v(ct )⟩EC N as N |v(ct )⟩EC a(ct , n, N ) |w(n) (0)⟩ , X N = (4.2) n=0 where the coefficients a(ct , n, N ) are found by minimizing the expectation value of H(ct ). And we can consider perturbation theory (PT) around the point c = 0. If the nearest branch point to c = 0 is z0 , then the series expansion ∞ cn |v (n) (0)⟩ t X |v(ct )⟩ = (4.3) n! n=0 will converge for |ct | < |z0 | and diverge for |ct | > |z0 |. We define the perturbation theory approximation |v(ct )⟩PTN as the partial series truncated at order n = N , N cn |v(ct )⟩PT |v (n) (0)⟩ t . X N = (4.4) n! n=0 In this analysis we have assumed that the radius of convergence is greater than zero. This follows from the fact that H(c) is a finite-dimensional Hermitian matrix. And we have made similar observation with the case of anharmonic oscillator. 
For the following discussion, we assume ct is within our radius of convergence of perturbation theory, and thus equation 4.4 61 is well defined and converges. We define the error in these three approximations to |v(ct )⟩ by computing the norm of the residual vector, which is the difference of exact vector and its approximation. When we look at the norm of the residual vector as a function N , we can determine whether the approximations are converging or not. Now we will compare the three different approximations in equations 4.1, 4.2 and 4.4 through some examples. The first example that we consider, which we call Model 1, is a one parameter family of eigenvectors of a Hamiltonian matrix of the form H(c) = H0 + cH1 , where H0 and H1 are Hermitian matrices. We choose H0 to be a diagonal matrix, while H1 has both diagonal and off-diagonal elements. We consider three different situations with different H0 and H1 , and we call them Models 1A, 1B, and 1C. In all three cases, the dimension of H matrix is 800. In Model 1A, we take H0 to be the matrix with elements H0 (n, n) = n for n = 1, · · · , 800. We choose H1 to be H1 (n, n) = n for n = 1, · · · , 800 and H1 (n + 2, n) = H1 (n, n + 2) = 1, (4.5) for n = 1, · · · , 798. As we mentioned before, avoided level crossing in the complex plane is the cause of singularities, which limit the radius of convergence of perturbation theory. Since we want to look at a target point which is within the radius of convergence of perturbation theory, it is important to know where the singularity for this Hamiltonian is in the complex plane. For this model, the nearest branch point to the origin is located at c = −0.559±0.497i. In Model 1B, we take H0 to be the matrix with elements H0 (n, n) = 2n for n = 62 1, · · · , 800. We choose H1 to be H1 (n, n) = n for n = 1, · · · , 800 and H1 (n + 2, n) = H1 (n, n + 2) = 1, (4.6) for n = 1, · · · , 798. This time the nearest branch point to the origin is located at c = −1.422 ± 0.503i. In Model 1C, we take H0 to be the matrix with elements H0 (n, n) = 100n for n = 1, · · · , 800. We choose H1 to be H1 (1, 1) = −100, H1 (2, 2) = −200, and take H1 (n, n) = −75n for n = 3, · · · , 800. For the off-diagonal entries we set H1 (n + 1, n) = H1 (n, n + 1) = 1, (4.7) for n = 1, · · · , 799. In this model, the nearest branch point to the origin is located at c = 0.907 ± 0.255i. With these three different sets of matrices defining three different problems, we try to approximate the eigenvector |v(ct )⟩ with our three approximation techniques in each of the problems. For all three model calculations, the target value of the parameter is ct = 0.2, which is within the radius of convergence of perturbation theory for all three cases. In Fig. 4.1 we plot the logarithm of the error for all three cases of Model 1. From figure 4.1 we can make two important observations. The first observation is that eigenvector and vector continuation converge faster than perturbation theory. The second point is that eigenvector and vector continuation have nearly identical errors at each order. We should also mention that there is nothing special about these matrix models, and we find similar results to these in all other matrix examples where perturbation theory is convergent. 63 Figure 4.1: Logarithm of the error versus order N for eigenvector continuation (asterisks), vector continuation (solid lines), and perturbation theory (dashed lines). The three different colors (black, blue and red) correspond with Models 1A, 1B, and 1C respectively. 
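As a quick numerical cross-check of this agreement, the Python sketch below sets up Model 1A as specified above and compares vector continuation (orthogonal projection of the exact target eigenvector onto the training subspace) with eigenvector continuation (the variational calculation in the same subspace). For simplicity it uses ground-state eigenvectors clustered near c = 0 as a stand-in for the derivative vectors |v^{(n)}(0)⟩, which span nearly the same subspace, so the error values will not coincide with figure 4.1, but the two methods should track each other closely.

import numpy as np

# Model 1A: H(c) = H0 + c*H1 with the 800-dimensional matrices defined above.
d = 800
idx = np.arange(1, d + 1, dtype=float)
H0 = np.diag(idx)
H1 = np.diag(idx) + np.diag(np.ones(d - 2), 2) + np.diag(np.ones(d - 2), -2)
H = lambda c: H0 + c * H1

def ground(c):
    E, V = np.linalg.eigh(H(c))
    return E[0], V[:, 0]

ct = 0.2
_, vt = ground(ct)

for N in (2, 4, 6, 8):
    # Training vectors clustered near c = 0, a stand-in for |v^(0)(0)>, ..., |v^(N)(0)>
    cs = np.linspace(0.0, 0.01, N + 1)
    X = np.column_stack([ground(c)[1] for c in cs])
    Q, _ = np.linalg.qr(X)                       # orthonormal training basis

    v_vc = Q @ (Q.T @ vt)                        # vector continuation (projection, eq. 4.1)
    err_vc = np.linalg.norm(vt - v_vc)

    E, W = np.linalg.eigh(Q.T @ H(ct) @ Q)       # eigenvector continuation (variational)
    v_ec = Q @ W[:, 0]
    v_ec *= np.sign(v_ec @ vt)                   # fix the arbitrary overall sign
    err_ec = np.linalg.norm(vt - v_ec)

    print(f"order {N}:  VC error = {err_vc:.2e}   EC error = {err_ec:.2e}")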
So this shows that eigenvector continuation performs better than perturbation theory even where perturbation theory works, and we have much less error at same order of calcu- lation. The second point is even more interesting - even though we calculated eigenvector continuation and vector continuation approximations very differently (see equations 4.1 and 4.2), they have the same convergence properties. This tell us more about how eigenvector continuation works as a method and how well it converges. Let us try to understand why we get similar convergence results with eigenvector continuation and vector continuation. 64 4.2 Asymptotic Convergence of Eigenvector Continua- tion and Vector Continuation Let us now try to show that eigenvector continuation and vector continuation have iden- tical convergence properties. Consider vector continuation at order N . Let V N (0) be the subspace spanned by |w(0) (0)⟩ , · · · |w(N ) (0)⟩, and let V⊥N (0) be the orthogonal complement to V N (0). Now equation 4.1 tells us that there is no error at all in the coefficients of |w(0) (0)⟩ , · · · |w(N ) (0)⟩, and the residual vector for |v(ct )⟩VC N N lies entirely in V⊥ (0). Next we consider eigenvector continuation at order N . In this case we project H(ct ) onto V N (0) and find the resulting ground state by minimizing the eigenvalue. In this way we have effectively turned off all matrix elements of H(ct ) that involve vectors in V⊥N (0). If we now turn on these matrix elements as a perturbation, then we get a first-order correction to the wave function from transition matrix elements connecting V N (0) with V⊥N (0). However, this correction to the wave function that lies in V⊥N (0), and our wave function in V N (0) is not affected. The corrections to the coefficients of |w(0) (0)⟩ , · · · |w(N ) (0)⟩ will appear at second order in perturbation theory, since this involves transitions from V N (0) to V⊥N (0) and then returning back from V⊥N (0) to V N (0). If the norm of the residual vector for eigenvector continuation is O(ϵ), then eigenvector continuation and vector continuation will differ at O(ϵ2 ). This proves that eigenvector continuation and vector continuation have identical convergence properties in the limit of large N . Now let us study the asymptotic convergence properties by looking at the last term in our approximations in equations 4.1, 4.2 and 4.4. The idea here is that we can understand how quickly these series converge by looking at last term in the series, and observing how quickly that is converging in the limit of large N . The norm of the last term of all the 65 corresponding series approximations are, LVC N (ct ) = ⟨w (N ) (0)|v(c )⟩ , t (4.8) LEC N (ct ) = |a(ct , N, N )| , (4.9) N LPT (c ) = |v (N ) (0)⟩ ct . (4.10) N t N! Again in equation 4.8, we need to knowledge of the exact eigenvector |v(ct )⟩. If we don’t have that information, and we are within the radius of convergence of perturbation theory, then we can expand |v(ct )⟩ perturbatively, similar to equation 4.3. Then we have, ∞ cn LVC ⟨w(N ) (0)|v (n) (0)⟩ t . X N (ct ) = n! (4.11) n=0 However, by definition |w(N ) (0)⟩ is orthogonal to {|v (0) (0)⟩ , · · · , |v (N −1) (0)⟩} and thus we can write, ∞ cn LVC ⟨w(N ) (0)|v (n) (0)⟩ t . X N (ct ) = n! (4.12) n=N So even though we do not know |v(ct )⟩, we can find LVC N (ct ) perturbatively, order by order. In this series expression, we will call the first term in the series for n = N as the leading order (LO) approximation. 
We will call the partial series up to n = N +1 the next-to-leading order (NLO) approximation, and so on. The Nk LO approximation is similarly defined by, N +k VC,Nk LO cn ⟨w(N ) (0)|v (n) (0)⟩ t , X LN (ct ) = (4.13) n! n=N and at leading order we have 66 VC,LO cN LN (ct ) = ⟨w(N ) (0)|v (N ) (0)⟩ t . (4.14) N! Now if we look at equations 4.14 and 4.10, then we notice that they are similar, except for ⟨w(N ) (0)|v (N ) (0)⟩ instead of |v (N ) (0)⟩. However, notice that ⟨w(N ) (0)|v (N ) (0)⟩ is smaller than the norm of |v (N ) (0)⟩ because, in general, |v (N ) (0)⟩ is not orthogonal to the other derivative vectors {|v (0) (0)⟩ , · · · , |v (N −1) (0)⟩}. This explains why vector continuation is converging more rapidly than perturbation theory. Perturbation theory must deal with constructive and destructive interference between non-orthogonal vectors at different orders of perturbation theory, a phenomenon that we call differential folding. Depending on the problem, differen- tial folding can be a very large effect, and it is the reason why perturbation theory converges more slowly than vector and eigenvector continuation. Another way to put it is that we know that a Taylor expansion like ∞ xn f (n) (0) X f (x) = (4.15) n! n=0 converges as we take the limit to infinity. However, for an approximation with a fixed number of terms in the series (like in equation 4.4), N xn f (n) (0) X f˜(x) = (4.16) n! n=0 the choice of Taylor series coefficients xn /n! for the derivatives is not optimal. Our intuition of Fourier analysis tells us that we get a much better approximation when we make a sum of orthogonal terms, and that is indeed the case. Vector continuation uses this orthogonality, 67 and thus is better than perturbation theory. Now let us use the last terms in the three series to study the asymptotic convergence. We look at the convergence ratio obtained by taking the ratio of the last term in the series at two widely separated orders N ′ and N , with N > N ′ . We define the convergence ratios as follows, 1/(N −N ′ ) µVC (ct ) = LVC VC N (ct )/LN ′ (ct ) , (4.17) 1/(N −N ′ ) µEC (ct ) = LEC EC N (ct )/LN ′ (ct ) , (4.18) 1/(N −N ′ ) µPT (c t) = PT PT LN (ct )/LN ′ (ct ) . (4.19) These definitions are motivated from the ratio test of convergence for a series. Intuitively, µ is ratio at which consecutive terms in the series converge (or diverge) asymptotically. However, note that our definitions with finite orders will have a small dependence on N ′ and N . However, for notational convenience, we omit writing the explicit dependence on N and N ′ . Also, when it is clear from the context, we omit writing the dependence on ct as well. We also note that these convergence ratio functions will not be very smooth. Occasionally when the numerator vanishes, we will have cusps and it will blow up whenever the denominator vanishes. Fortunately these special points occur at only a few isolated values of ct , and we eliminate cusps or divergences at any particular value of ct by changing N or N ′ . Overall the functions in Eq. (4.17), (4.18), and (4.19) provide a useful measure of the convergence properties of the three methods. Now let us look at how these three convergence ratios compare with an example. We look at a new practical problem, and we call this example Model 2. In this model, we consider a system of spin-half fermions with attractive zero-range interactions in three dimensions. 68 As we vary the interaction strength between the fermions, we observe some very interesting physics. 
At weak coupling this many-body system acts like a Bardeen-Cooper-Schrieffer (BCS) superfluid, but at strong coupling it behaves like a Bose-Einstein condensate (BEC) [27, 28]. In between the BCS and BEC regimes, there is a smooth crossover region that contains a scale-invariant point called the unitary limit. At this point the scattering length diverges, and the two-body system has a zero energy resonance. There have been various experimental studies of BCS-BEC crossover and the unitary limit using trapped ultra-cold Fermi gases of alkali atoms [29–32]. Here we use eigenvector continuation to study the crossover transition for two spin-up and two spin-down fermions in an L = 4 periodic cubic lattice as detailed in Ref. [33]. The corresponding Hamiltonian has 49 = 262144 dimensions, and we use projection operators to remove all unphysical states without the proper antisymmetrization. With 4 particles, L = 4, and 3 dimensions, one would expect 412 possible configurations, but we measure all the particles with respect to one particle, i.e., we have 3 particles that can move around with respect to the 4th particle. This reduces the configuration space to 49 = 262144 dimen- sional. Without eigenvector continuation, we will have to diagonalize a 262144-dimensional Hamiltonian matrix at the target coupling, which takes a long time, and even more time as our coupling becomes larger. eigenvector continuation gives us a huge speed boost here. Let n denote the spatial lattice points on a three dimensional L3 periodic lattice. We use lattice units where all the physical quantities are multiplied by powers of the spatial lattice spacing to make the combination dimensionless. Let the lattice annihilation operators for the spin-up and spin-down fermions be a↑ (n) and a↓ (n) respectively. Our free non-relativistic lattice Hamiltonian is given by, 69 3 X X † Hfree = ai (n)ai (n) m i=↑,↓ n 1 X X X † h ˆ ˆ i − ai (n) ai (n + l) + ai (n − l) . (4.20) 2m n l=1,2,3 i=↑,↓ We add to the free Hamiltonian an on-site contact interaction, X Hfree + C ρ↑ (n)ρ↓ (n), (4.21) n where † ρ↑ (n) = a↑ (n)a↑ (n), (4.22) † ρ↓ (n) = a↓ (n)a↓ (n). (4.23) So the different particles feel an attraction (or repulsion depending on sign of C) only when they are on the same site. If the attraction is strong enough, then it becomes a condensate. Our control parameter c corresponds to the product of the particle mass m and in- teraction coupling C. Negative coupling indicates attractive interaction, whereas positive coupling would mean repulsive interaction. In terms of our control parameter c, the value c = −3.957 corresponds with the unitary limit. The larger negative values of c correspond to the strong-coupling BEC phase, and smaller negative values of c correspond to the weak- coupling BCS phase. The point c = 0 corresponds to a non-interacting system with special symmetries and degeneracies. We choose the training vectors at a more general point on the weak-coupling BCS side at c = −0.4695. Now if we compare the convergence of the three series in this model, then we see similar results as shown in figure 4.1. So let us now look at how the convergence ratios for the three 70 methods compare for this model. In Fig. 4.2 we show the convergence ratios µVC , µEC , and µPT (as defined in equations 4.17, 4.18 and 4.19) versus ct for N = 10 and N ′ = 0. Note that ct is negative, meaning we are looking in the attractive coupling regime. 
Smaller µ corresponds to faster convergence, since by definition the convergence ratio µ measures how small the next term in the series is becoming compared to the previous term. We see from the graph that µ^VC and µ^EC remain well below µ^PT, indicating the faster convergence of vector and eigenvector continuation compared to perturbation theory.

Figure 4.2: Comparison of the convergence ratios µ^VC(c_t), µ^EC(c_t), and µ^PT(c_t) for Model 2 with N = 10 and N′ = 0. The training vectors for all cases are evaluated on the weak-coupling BCS side at c = −0.4695. The unitary limit corresponds to c = −3.957.

As we cross into the strong-coupling BEC side, perturbation theory diverges, as indicated by the convergence ratio µ^PT exceeding 1. However, vector and eigenvector continuation both converge even at very strong coupling far on the BEC side, as indicated by µ^VC and µ^EC both remaining well below 1. Furthermore, the vector and eigenvector continuation results are in close agreement with each other, with only a slight difference where the convergence is slower. We also plot the LO, NLO, N²LO, N³LO, and N⁴LO approximations to µ^VC as defined in Eq. (4.13). We observe that this expansion of µ^VC converges for c_t within the radius of convergence of perturbation theory. Our results also show that the entire BEC-BCS crossover region can be estimated very well by variational methods like eigenvector continuation. We also note that figure 4.2 shows the convergence of the three series as a function of the parameter c. Smaller µ corresponds to faster convergence, and we see that the convergence near the origin is faster than the convergence at points far away on the c axis.

To summarize our results so far, we have seen that within the radius of convergence of perturbation theory, eigenvector continuation and vector continuation have similar convergence properties, and they converge faster than standard perturbation theory. The main reason is the orthonormal expansion that vector continuation employs; eigenvector continuation also effectively learns the orthogonal space and chooses linear combinations that make the best use of the orthonormal expansion. The perturbative expansion, on the other hand, is sub-optimal because the non-orthogonal components at different orders of perturbation theory interfere constructively and destructively. Now let us look at what the convergence looks like beyond the radius of convergence of perturbation theory. We have already seen this a little bit in the strong-coupling BEC regime in figure 4.2, but let us now look at a general case in detail.

4.3 Convergence of Eigenvector Continuation outside Radius of Convergence of Perturbation Theory

Outside the radius of convergence of perturbation theory, we can estimate the convergence ratio using extrapolation methods. If there are no branch points nearby, then the convergence ratio function can be extrapolated using standard methods such as Padé approximants [5] or conformal mapping [34, 35]. To illustrate this, we consider another example, which we call Model 3. In this model, we again consider a Hamiltonian of the form H = H0 + cH1, and we take H0 to be a 500×500 diagonal matrix with entries H0(n, n) = 100n for n = 1, · · · , 500.
H1 is a 500 × 500 matrix with nonzero entries as follows:

$$H_1(1,1) = 40, \qquad (4.24)$$
$$H_1(2,2) = -80, \qquad (4.25)$$
$$H_1(3,3) = -180, \qquad (4.26)$$
$$H_1(4,4) = -260, \qquad (4.27)$$
$$H_1(5,5) = -320, \qquad (4.28)$$
$$H_1(6,6) = -335, \qquad (4.29)$$

for n = 1, · · · , 499:

$$H_1(n+1,n) = H_1(n,n+1) = 2, \qquad (4.30)$$

for n = 1, · · · , 498:

$$H_1(n+2,n) = H_1(n,n+2) = 5, \qquad (4.31)$$

for n = 1, · · · , 497:

$$H_1(n+3,n) = H_1(n,n+3) = 5, \qquad (4.32)$$

for n = 7, · · · , 500:

$$H_1(n,n) = 50n. \qquad (4.33)$$

Figure 4.3: The lowest six energies of Model 3 as a function of c.

Model 3 was chosen in such a way that perturbation theory breaks down due to several sharp avoided level crossings in the complex plane. In Fig. 4.3 we show the lowest six energies as a function of the control parameter c. The branch point closest to c = 0 occurs very near the real axis at about c = 0.84, and the first avoided level crossing near c = 0.84 can be seen in the figure.

For the three different methods, we can look at how the convergence of the three series and the convergence ratios compare in Model 3. When we compare the convergence of the three series, we again find results similar to figure 4.1. A comparison of the convergence ratios µ^VC, µ^EC, and µ^PT is shown in figure 4.4, as a function of c_t for N = 20 and N′ = 0. In the limit |N − N′| → ∞, the convergence ratio µ^PT will cross the value 1 for c_t near 0.84, indicating the divergence of perturbation theory. In our calculation we do not see this because |N − N′| is finite, but we have verified that as we increase |N − N′| further, the convergence ratio µ^PT moves closer to 1 near c_t = 0.84. On the other hand, we observe that µ^VC and µ^EC are in close agreement with each other and remain below 1 near c_t = 0.84, indicating that they still converge even though perturbation theory does not.

Figure 4.4: Comparison of the convergence ratios µ^VC, µ^EC, and µ^PT versus c_t for Model 3 with N = 20 and N′ = 0.

In Fig. 4.5 we plot µ^VC and the LO, NLO, N²LO, and N³LO approximations to µ^VC for N = 20 and N′ = 0. Once again we see that the series expansion in equation 4.12 converges within the radius of convergence of perturbation theory, which for this example corresponds to c = 0.84.

Figure 4.5: Plots of the convergence ratios µ^VC, µ^EC, and the LO, NLO, N²LO, and N³LO approximations to µ^VC versus c_t for Model 3 with N = 20 and N′ = 0.

In Model 3 we can easily perform eigenvector and vector continuation on the negative c axis because there are no branch points there. We show this extrapolation in figure 4.6, where we plot µ^VC and µ^EC for N = 20 and N′ = 0, going to parameter values beyond the range of perturbation theory (which is |c| < 0.84). We also show the (1,1) and (2,2) Padé approximations to µ^VC. We see that the Padé approximations describe the shape of µ^VC quite well, since there are no nearby branch points.

Figure 4.6: Plots of the convergence ratios µ^VC, µ^EC, and the Padé approximations (1,1) and (2,2) to µ^VC versus c_t for Model 3 with N = 20 and N′ = 0.

If we want to extrapolate along the positive c axis, then we have to deal with the branch point near c = 0.84. Near these singular points, the slope of the convergence ratio function will be steeper than that predicted by Padé approximants or conformal mapping, because the Riemann surface of the ground state eigenvector is entwined with the Riemann surface of the first excited state eigenvector.
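For reference, a short sketch of the Model 3 matrices defined in Eqs. (4.24)–(4.33) is given below; the indices are 0-based in the code while the text counts n from 1, and the helper names are ours.

```python
import numpy as np

dim = 500
H0 = np.diag(100.0 * np.arange(1, dim + 1))                 # H0(n, n) = 100 n

H1 = np.zeros((dim, dim))
H1[np.arange(6), np.arange(6)] = [40, -80, -180, -260, -320, -335]      # n = 1..6
H1[np.arange(6, dim), np.arange(6, dim)] = 50.0 * np.arange(7, dim + 1) # n = 7..500
for offset, value in [(1, 2.0), (2, 5.0), (3, 5.0)]:                    # off-diagonal bands
    idx = np.arange(dim - offset)
    H1[idx, idx + offset] = value
    H1[idx + offset, idx] = value

def lowest_energies(c, k=6):
    """Lowest k eigenvalues of H(c) = H0 + c * H1, as plotted in Fig. 4.3."""
    return np.linalg.eigvalsh(H0 + c * H1)[:k]

print(lowest_energies(0.84))
```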
If the branch point is very close to the real axis, then we have an avoided level crossing, or Landau-Zener transition, where the wave functions of the ground state and first excited state interchange as we pass by the branch point. What this means is that as we pass by the branch point, the ground state becomes the excited state, and the excited state becomes the ground state. If we keep looking only at the ground state, its properties change quickly as we cross the branch point. However, if we add in the information from the first excited state, then we have a way to study the ground state convergence beyond the level crossing: we look at the convergence of the first excited state and extrapolate it across the level crossing. Once we pass the avoided level crossing, the excited state has become the ground state, and its convergence properties are the same as those of the ground state. We therefore have a way to predict the convergence ratio of the ground state beyond the branch point by looking at the convergence ratio of the excited state.

We define the convergence ratios for the first excited state, µ₁^VC and µ₁^EC, in the same manner as µ^VC and µ^EC, except that we replace the target ground state |v(c_t)⟩ with the first excited state |v₁(c_t)⟩. Note, however, that we are still using the same orthonormal basis states |w^(n)(0)⟩ associated with the ground state at c = 0. To calculate the eigenvector continuation approximation of the first excited state, |v₁(c_t)⟩_N^EC, we want to use a subspace that includes the derivatives of the ground state |v^(0)(0)⟩, · · · , |v^(N)(0)⟩ and also the derivatives of the first excited state |v₁^(0)(0)⟩, · · · , |v₁^(N)(0)⟩. Therefore, at each training point location we take both the ground state and the excited state eigenvectors as training eigenvectors. If we have n training point locations, then we have 2n training vectors for eigenvector continuation. This increases the numerical complexity and the error in the estimate, but eigenvector continuation can be done for excited states in this way.

In Fig. 4.7 we show µ^VC, µ₁^VC, µ^EC, µ₁^EC, and the N³LO approximations to µ^VC and µ₁^VC for N = 20 and N′ = 0. Notice the approximate vertical and horizontal reflection symmetries near the branch point. For c_t < 0.84 the increase in the ground-state convergence ratio mirrors the decrease in the excited-state convergence ratio. Also, the increase in the ground-state convergence ratio for c_t > 0.84 is reflected in the decrease in the excited-state convergence ratio for c_t < 0.84.

To summarize, we can learn a great deal about the convergence of vector and eigenvector continuation just from series expansions around c = 0. Near the branch point, we know that µ^VC (and µ^EC) for the ground state and the excited state cross near the midpoint between the N^kLO approximations to µ^VC and µ₁^VC for any k. And the N^kLO approximations to µ^VC and µ₁^VC are calculated entirely from perturbation theory at c = 0.

Figure 4.7: Plots of the convergence ratios µ^VC, µ₁^VC, µ^EC, µ₁^EC, and the N³LO approximations to µ^VC and µ₁^VC versus c_t for Model 3 with N = 20 and N′ = 0.

Also, the location of the nearby branch point can be deduced from the convergence radius of the series expansions. So all the information about error convergence, even beyond the branch point, is present at the origin.
While there are limits to how far we can go in c_t with these convergence ratio predictions, it is clear that we can predict the convergence ratios inside, and to some extent outside, the radius of convergence from the derivatives of the eigenvectors near c = 0. It is quite fascinating how far we can extrapolate beyond the radius of convergence of perturbation theory, given that we are using the same derivatives that perturbation theory utilizes. It is also remarkable that these derivatives of the eigenvectors near c = 0 can be used to predict the convergence of vector and eigenvector continuation, which are completely non-perturbative calculations.

Figure 4.8: Plots of the convergence ratio µ^VC versus c_t for Model 3 with N′ = 0 and N = 5, 10, 15, 20.

Before we move on to the next topic, let us quickly mention the dependence of our convergence ratio results on N′ and N. So far all our calculations were done with N = 20 and N′ = 0. The convergence ratio µ^VC provides information about the rate of convergence of vector continuation in the limit of many training vectors, so ideally we want a large N − N′. In Fig. 4.8 we plot µ^VC for Model 3 for N′ = 0 and N = 5, 10, 15, 20. We see that the convergence ratio approaches a common value in the limit of large N.

4.4 Multi-parameter Eigenvector Continuation

Now let us extend our discussion of eigenvector continuation to the case where we have D > 1 parameters in the Hamiltonian. If we again want to take the derivatives of the eigenvector at some point, the multi-parameter case is exactly equivalent to the one-parameter case if we change our one-dimensional derivatives to directional derivatives (c_t − c)·∇ acting upon |v(c)⟩. Here c is the limit point of the training data, which we can redefine to be at the origin, and c_t is the target point where we want to find the eigenvector. In other words, we can simply redefine our parameters so that we work with only one effective parameter.

But if we now consider the more difficult problem of convergence at all target points at some fixed distance from c, then we cannot reduce the problem to one dimension. For this problem, if we want the eigenvector continuation estimate of the target eigenvector, then we need to include all (k + D − 1)!/[k!(D − 1)!] partial derivatives at order k. But for the training eigenvectors we also need to include all derivatives of order less than k, and therefore we conclude that a D-parameter calculation with N_D training vectors is equivalent to a one-parameter calculation with N₁ + 1 training vectors, with

$$N_D = \frac{(N_1 + D)!}{N_1!\, D!}. \qquad (4.34)$$

We will now test this idea for multi-parameter eigenvector continuation with a numerical example. Let H0 be a diagonal matrix with elements H0(n, n) = n^{0.1} for n = 1, · · · , 500. Let H1, H2, and H3 be three different random 500 × 500 matrices with each matrix element sampled from a normal distribution with zero mean and standard deviation 10⁻³. We will call this random-matrix example Model 4.

Figure 4.9: Comparison of the logarithm of the eigenvector continuation error versus N_D for D = 1 and D = 2 dimensions.

First we consider the convergence of eigenvector continuation for the two-parameter Hamiltonian given by

$$H(c_1, c_2) = H_0 + c_1 H_1 + c_2 H_2. \qquad (4.35)$$

We take the training points to be in the neighborhood of the origin c = 0. For the target point we pick a random point c_t = (−1.636, 1.150). Our convergence results are shown in figure 4.9.
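As a quick check of the counting in Eq. (4.34), the snippet below tabulates the equivalent orders used in the comparisons that follow; the quantities quoted in Eqs. (4.36) and (4.38) are this count minus one, and the function name is ours.

```python
from math import comb

def equivalent_order(N1, D):
    """Order N_D equivalent to a one-parameter calculation of order N1:
    the training-vector count (N1 + D)! / (N1! D!) of Eq. (4.34), minus one."""
    return comb(N1 + D, D) - 1

print([equivalent_order(N1, 2) for N1 in range(6)])  # [0, 2, 5, 9, 14, 20]
print([equivalent_order(N1, 3) for N1 in range(4)])  # [0, 3, 9, 19]
```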
We first consider the one-parameter case, labelled as 1D Derivative EC. This corresponds to taking directional derivatives [c_t · ∇]^k acting on |v(c)⟩ at c = 0 for k = 0, · · · , N₁, for a total of N₁ + 1 training vectors. We plot the logarithm of the error versus N₁.

We now compare this with the eigenvector continuation results using N₂ + 1 training vectors for the two-dimensional case. These training vectors correspond to the N₂ + 1 lowest-order partial derivatives ∇₁^{k₁}∇₂^{k₂} acting on |v(c)⟩ at c = 0. This convergence data is labelled as 2D Derivative EC in figure 4.9. From the error equivalence formula N₂ = (N₁ + 2)!/(N₁! 2!) − 1, we have

$$N_1 = 0 \to N_2 = 0,\quad N_1 = 1 \to N_2 = 2,\quad N_1 = 2 \to N_2 = 5,\quad N_1 = 3 \to N_2 = 9,\quad N_1 = 4 \to N_2 = 14,\quad N_1 = 5 \to N_2 = 20. \qquad (4.36)$$

As we see in Fig. 4.9, these predictions work quite well. The errors for N₁ are approximately equal to the errors for the corresponding value of N₂. We can compare these results with the results we obtain when, instead of using partial derivatives, we simply take training vectors at N₂ + 1 random points in the neighborhood of the origin. This data is labelled as 2D Random EC. As we can see, this data agrees well with the 2D Derivative EC data. This is exactly what we expected, since the partial derivatives are well approximated by finite differences of |v(c)⟩ at points in the neighborhood of the origin.

Continuing with Model 4, we now consider the convergence of eigenvector continuation for the three-parameter Hamiltonian

$$H(c_1, c_2, c_3) = H_0 + c_1 H_1 + c_2 H_2 + c_3 H_3, \qquad (4.37)$$

with random target point c_t = (−1.034, −1.065, 1.341). In this case we have the error equivalence formula N₃ = (N₁ + 3)!/(N₁! 3!) − 1 and therefore

$$N_1 = 0 \to N_3 = 0,\quad N_1 = 1 \to N_3 = 3,\quad N_1 = 2 \to N_3 = 9,\quad N_1 = 3 \to N_3 = 19. \qquad (4.38)$$

Figure 4.10: Comparison of the logarithm of the eigenvector continuation error versus N_D for D = 1 and D = 3 dimensions.

Our error convergence results for this case are shown in figure 4.10. As we see in the figure, these predictions also work quite well. The errors for N₁, labelled as 1D Derivative EC, are approximately equal to the errors for the corresponding value of N₃, labelled as 3D Derivative EC. We also show the results we obtain when we take training vectors at N₃ + 1 random points in the neighborhood of the origin. This data is labelled as 3D Random EC, and the errors match quite well with the 3D Derivative EC results.

4.5 Error Dependence in Eigenvector Continuation on Location of Training Points

As we have mentioned before, the error in eigenvector continuation depends not only on the order of the calculation, but also on the location of the training points. Since this dependence is very problem specific, it is hard to say in general which training point locations are optimal for a given problem. Nevertheless, in this section we discuss some ideas about this dependence and about how to choose training points more efficiently in the multi-parameter case.

As we already noted, the multi-parameter case reduces to the one-parameter case if we select training points along a straight line passing through the target point. This is probably the best strategy for choosing training point locations if we are interested in only one point, or in points along one line. If the training points lie along a smooth curved path that passes through the target point, then we expect convergence faster than the general case but slower than the straight-line example, due to the curvature of the path and the increased path length.
We illustrate this with the two-parameter example for Model 4,

$$H(c_1, c_2) = H_0 + c_1 H_1 + c_2 H_2. \qquad (4.39)$$

Figure 4.11: Different choices of path for the eigenvector continuation training vectors. The direct straight line is in green, the semicircle in orange, and the isosceles right triangle in blue.

Suppose that our target point, C, is located at c_t = (2, 0). We first consider training points evenly spaced along a straight line from the origin, which we label as point A, to the target point C. This is shown in Fig. 4.11. In Fig. 4.12 we plot the logarithm of the error versus N, where the number of training points is given by N + 1, and this data is labelled as 'Straight Line EC'.

Now we consider training points along a semicircular arc ABC, as shown in Fig. 4.11. The convergence for this case is displayed in Fig. 4.12, and this data is labelled as 'Circle EC'. The convergence is slower than for the straight-line example but faster than the general two-parameter convergence that we saw in Fig. 4.9. This is in agreement with what we predicted.

Next we consider points along a right isosceles triangle ABC, as shown in Fig. 4.11. In this example the path has a discontinuity at B. The convergence for this case is displayed in Fig. 4.12, and this data is labelled as 'Triangle EC'.

Figure 4.12: Comparison of the logarithm of the error versus order N for different paths for the eigenvector continuation training vectors. The number of training points corresponds to N + 1.

As one might expect, there is almost no reduction in error as we take training points along the line segment AB, which does not pass through the target point. After we reach B and turn the corner toward C, we see that the convergence rate is similar to the one-parameter case again. While we are taking training points along the line segment AB, it is as if eigenvector continuation does not get to see the other dimension in the parameter space; once we turn the corner, the dimension of the space spanned by the training vectors suddenly increases, and the estimate becomes much better. This shows that the existence of a smooth path connecting the training points and the target is very important for the convergence rate of eigenvector continuation.

It is also worth mentioning how our example with equally spaced points on the straight line compares with our analysis using derivative vectors. If the training points are not too far apart, then the convergence is similar to that obtained by replacing the training vectors with N + 1 derivative vectors evaluated at the centroid of the training data. If the training points are spaced far apart, however, then the training points far from the target point are mostly unhelpful, and the convergence is dominated by the training points closest to the target point. In this situation, although we still call this an Nth order eigenvector continuation calculation, only a few of the training points actually contribute to the calculation. This is another aspect of the intricate balance between the location and the number of training points: a few training points near the target point carry much more information than many training points far from the target point. The difficulty in optimizing eigenvector continuation calculations with widely spaced training points is that, for any given target point, we often do not know the optimal region in which to take training points.
In the next chapter, we try to answer this question of choosing optimal locations for training points by using the emulator itself to learn where it should place its training points. We develop the self-learning emulator algorithm, an iterative algorithm that chooses the best location for the next training point.

Chapter 5

Self-learning Emulators

As we mentioned before, eigenvector continuation builds on the idea that as we vary the parameter(s) in our Hamiltonian, the ground state eigenvector varies within a subspace that has much lower dimensionality than the full dimension of the linear space in which the Hamiltonian lives. We can learn the subspace in which the target eigenvector lives through training eigenvectors taken at certain training points in the parameter space. In this chapter we continue our discussion of how to choose optimal locations for our training points so that we can efficiently learn the subspace in which the target eigenvector lives.

We also discuss in this chapter the second advantage of eigenvector continuation that we mentioned in the second chapter: the computational advantage of EC and its use as an emulator. The dimensionality reduction that eigenvector continuation offers greatly accelerates the numerical calculation and provides a huge computational speed-up. This speed-up can be several orders of magnitude, and we can use eigenvector continuation to quickly obtain an estimate at any point in the parameter space where conventional methods would take a long time to compute the exact result. This is justified by the fact that we can get more than 99.99% accuracy in a fraction of the time needed to calculate the exact result. This is the main idea of an emulator: we can estimate (emulate) the exact result at any point very quickly, rather than performing a long exact calculation.

The observation that eigenvector continuation can function as an accurate emulator for quantum systems is quite recent [3], and it was followed by a number of new developments and applications [4, 36–41]. The interest in emulators across many scientific disciplines is long-standing. There has been great interest in using machine learning tools to build efficient emulators that predict scientific processes beyond what is possible with direct calculations [42–45]. A common problem in all these cases is that the emulator must be trained using a large amount of training data, which is often hard to collect since the required computations are difficult and expensive.

The same is true for the eigenvector continuation emulator. We need training eigenvectors for eigenvector continuation calculations, and the more training points we have, the better our EC approximation. However, we need to perform exact calculations at these training points, which can be computationally difficult. Ideally, we want the best results with as few training points as possible. This brings us back to the problem of optimizing the number of training points and their locations. In this chapter, we build on our discussion in the previous chapter and first show how to construct a simple algorithm for choosing the locations of our training points based on the orthogonality of the training eigenvectors.
We then refine this idea to formulate another algorithm that selects training points for eigenvector continuation without any prior information about the exact eigenvector at any point. The key idea is that we can use the eigenvector continuation emulator itself to find the next optimal training point. It turns out that this self-learning algorithm has general applicability, and it can be applied to a whole class of emulators that satisfy some constraint equation. In the following sections, we will discuss and illustrate this self-learning emulator with several examples.

5.1 Finding Optimal Training Point Locations for Eigenvector Continuation

In the previous chapter, the orthonormal expansions of vector continuation gave us a key insight into the convergence of eigenvector continuation. The eigenvector continuation method works by projecting the target Hamiltonian onto the subspace spanned by the training eigenvectors, and the higher the effective dimensionality of this subspace, the better our eigenvector continuation estimate will be. The effective dimensionality of this subspace is determined by how orthogonal the training vectors are to each other. If all the training eigenvectors point in nearly the same direction, the information content of the subspace they span is low, because the eigenvectors after the first one provide little additional information about the subspace in which the target eigenvector lives. On the other hand, if all the training eigenvectors are orthonormal to each other, then the subspace they span has the highest possible information content about the subspace in which the target eigenvector lives.

With that in mind, we can construct an iterative algorithm for selecting the location of the next training point in eigenvector continuation. We start with one or two training eigenvectors at random values of the parameter, beginning at c_0, and then iteratively add more training eigenvectors. At each iteration, we scan through all the eigenvectors |ψ(c)⟩ and choose the next training location such that the training eigenvectors are as orthogonal as possible. This can be done by selecting the next eigenvector |ψ(c_i)⟩ at location c_i in the ith iteration such that the multi-dimensional volume formed by the vectors {|ψ(c_0)⟩, · · · , |ψ(c_i)⟩} is maximized. If we arrange all the training vectors as the columns of a matrix,

$$V = \Big(\,|v(c_1)\rangle \;\cdots\; |v(c_k)\rangle\,\Big), \qquad (5.1)$$

then the k-dimensional volume formed by these vectors is given by

$$\text{volume}^2 = \det\{V^T V\}. \qquad (5.2)$$

Since our eigenvectors are normalized to 1, this determinant equals 1 if all the eigenvectors are orthonormal. If they are not orthogonal, it is less than 1, and we can use this volume as a measure of how orthogonal our training eigenvectors are.

If at any iteration, after scanning through all eigenvectors |ψ(c)⟩, even the best new eigenvector reduces the volume of the training set significantly, then we know that the new eigenvector is not giving us much new information about the subspace in which the target eigenvector lives. This means we already have enough training points, and additional training points will not increase the accuracy of the eigenvector continuation estimate by much.
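The greedy volume-maximization selection can be sketched in a few lines. The 10-dimensional H0 and H1 below are generic random stand-ins (not the chapter 2 matrices), the candidate grid and starting point are ours, and the scan computes exact ground states at every candidate purely for illustration, which is exactly the expense the self-learning algorithm of Section 5.2 later removes.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 10
H0 = rng.normal(size=(dim, dim)); H0 = (H0 + H0.T) / 2
H1 = rng.normal(size=(dim, dim)); H1 = (H1 + H1.T) / 2

def ground_state(c):
    _, vecs = np.linalg.eigh(H0 + c * H1)
    return vecs[:, 0]                          # normalized ground-state eigenvector

def squared_volume(vectors):
    V = np.column_stack(vectors)               # Eq. (5.1)
    return np.linalg.det(V.T @ V)              # Eq. (5.2): squared parallelepiped volume

candidates = np.linspace(-700, 700, 201)
training_cs = [-693.4]                         # one starting training point
training_vecs = [ground_state(training_cs[0])]

while True:
    best_c = max(candidates,
                 key=lambda c: squared_volume(training_vecs + [ground_state(c)]))
    if squared_volume(training_vecs + [ground_state(best_c)]) < 0.02:
        break                                  # new vectors add too little volume: stop
    training_cs.append(best_c)
    training_vecs.append(ground_state(best_c))

print(sorted(training_cs))
```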
So, in this algorithm, we start with one or two random training points and keep adding training points iteratively until the volume becomes less than a specified amount. Note that we are not removing any training points selected earlier, but in a practical application this can be done if we want to further optimize our choice of training points. Let us now show with an example what the results look like. In fact, we already showed a result from this algorithm in chapter 2, when we presented an eigenvector continuation emulator with 10-dimensional random matrices. We reproduce the plot in figure 5.1 for convenience.

Figure 5.1: Eigenvector continuation emulator: we emulate the approximate solution of the eigenvalue problem everywhere in −700 ≤ c ≤ 700 using six training eigenvectors. The training eigenvectors are chosen such that the volume formed by these vectors is maximized. The final volume is less than 0.02.

In this example, we consider a Hamiltonian of the form H = H0 + cH1, with H0 and H1 being the random Hermitian matrices given in chapter 2. We start with two random training points at c = −693.4 and c = −115, and add more training points according to the algorithm above. We stop when the volume is less than 0.02.

Now, we may ask why the training points are chosen in curious clusters of two. We would naively want training points far apart in order to maximize the volume formed by the training eigenvectors, yet this algorithm picks nearby training points. This is explained once we look at the plot of avoided level crossings in figure 5.2. As we can see, there are several avoided level crossings near c = −120 and c = 0.

Figure 5.2: Energies of different excited states for the random matrix example given in chapter 2. After an avoided level crossing, the excited eigenvector exchanges with another eigenvector, and if we want to capture this information we need training points on both sides of the avoided level crossing. This explains our choice of training points in figure 5.1.

As we go across these level crossings, eigenvectors corresponding to different eigenvalues exchange with each other. Eigenvector continuation cannot capture this information with training points on only one side of the level crossing, because after the level crossing the eigenvector has changed completely. So we need training points on both sides of the level crossing, and if there are several nearby avoided level crossings, then we need several training points there.

The key idea here is that we want our training eigenvectors to be spaced far apart from each other so that they point in different directions within the subspace in which the target eigenvector lives. While we might be tempted to randomly choose training points far away from each other, this notion of 'far away' is defined by the avoided level crossings of our problem, and if the crossings are close together in the parameter space, we need to take training points that are close to each other. This might look counterintuitive, but we are actually taking eigenvectors that differ greatly from each other.

So, for an ideal eigenvector continuation calculation, we want our training points spaced out such that we capture all the information of these avoided level crossings. The problem lies in the fact that for a large Hamiltonian matrix, it is not easy to compute the locations of all these avoided level crossings. Thus, we have no a priori information about where we should select our training points.
But our algorithm of optimizing the volume formed by the training eigenvectors does lead us to the correct locations for an efficient eigenvector continuation calculation. We can improve this algorithm further by removing some training points when the volume becomes too low, and then adding new training points until the volume becomes small again. We can find the training eigenvector whose removal increases the volume the most, and remove that vector. We can repeat this process until we reach an equilibrium point. This removes the dependence of the algorithm on the random initial training point.

Although this algorithm works very well in finding optimal locations for our training points, it has one major flaw: we need to compute the exact eigenvector everywhere to be able to compare candidates and choose the eigenvector that maximizes the volume of our set of training eigenvectors. The whole idea behind using an emulator is that direct calculations are computationally expensive and the emulator gives a quick approximation. If we have to find the exact result everywhere in order to find the optimal location of our next training point, then it defeats the purpose of using an emulator. Nonetheless, the idea of using the volume formed by the training eigenvectors as a measure of the orthogonality of the vectors is still very useful. In any eigenvector continuation calculation, we can use this volume measure to know when we have added too many training points, and to remove some of the training eigenvectors that give the least information about our target subspace. The volume of the eigenvectors helps us determine how many training points we need and gives us a sense of the required dimensionality of the projected space. It is also an important concept for dealing with eigenvector continuation using noisy matrices, work done by Caleb Hicks in our research group. We will not describe the noise correction work here.

We now have a clear goal. We want training eigenvectors that are as orthogonal to each other as possible, and we again want to add training points iteratively, learning more about the target subspace with each additional training eigenvector. But now we want to find where to take our next training point without actually calculating the exact eigenvector anywhere. We can make use of the fact that eigenvector continuation gives us an approximate solution to the eigenvalue problem of the target Hamiltonian H(c_t). If at the target parameter c_t the exact energy and exact eigenvector are denoted by E(c_t) and |ψ(c_t)⟩ respectively, then they must satisfy

$$H(c_t)\,|\psi(c_t)\rangle = E(c_t)\,|\psi(c_t)\rangle. \qquad (5.3)$$

If eigenvector continuation gives us an approximate energy Ẽ(c_t) and an approximate eigenvector |ψ(c_t)⟩_EC, then we can immediately test how good these approximations are by substituting Ẽ(c_t) and |ψ(c_t)⟩_EC into equation 5.3 and seeing how well the equation is satisfied. We have to define a way to measure how "well" the constraint equation is satisfied, and this can be done in many ways. For instance, we can use the residual H(c_t)|ψ(c_t)⟩_EC − Ẽ(c_t)|ψ(c_t)⟩_EC as a measure of how well the eigenvalue equation is satisfied. If the eigenvector continuation approximation is very close to the exact result, then it should satisfy the eigenvalue equation very well, and we should have a very small residual. However, if the EC approximation at some point is bad, then we will have a large residual.
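A minimal sketch of this residual check is given below, using the projected-Hamiltonian form of eigenvector continuation from chapter 2 on small random matrices; the matrices, couplings, and function names here are ours and serve only as an illustration.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
dim = 200
H0 = rng.normal(size=(dim, dim)); H0 = (H0 + H0.T) / 2
H1 = rng.normal(size=(dim, dim)); H1 = (H1 + H1.T) / 2
H = lambda c: H0 + c * H1

def ec_estimate(H_target, training_vecs):
    B = np.column_stack(training_vecs)        # training eigenvectors as columns
    H_proj = B.T @ H_target @ B               # projected Hamiltonian
    N_proj = B.T @ B                          # norm (overlap) matrix
    energies, coeffs = eigh(H_proj, N_proj)   # small generalized eigenvalue problem
    psi = B @ coeffs[:, 0]
    return energies[0], psi / np.linalg.norm(psi)

def residual_norm(H_target, energy, psi):
    """Norm of H|psi> - E|psi>, Eq. (5.3); small when the EC estimate is accurate."""
    return np.linalg.norm(H_target @ psi - energy * psi)

training = [np.linalg.eigh(H(c))[1][:, 0] for c in (0.0, 0.5, 1.0)]
E_ec, psi_ec = ec_estimate(H(2.0), training)
print(residual_norm(H(2.0), E_ec, psi_ec))
```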
Note that with a given set of training eigenvectors, we do not need to calculate the exact eigenvector anywhere else in order to evaluate the eigenvector continuation approximation over the entire parameter space. As we mentioned earlier, calculating the exact result can be computationally expensive, and we only have to do it a few times, namely when we compute the exact eigenvector at a training point. In each iteration, after we use eigenvector continuation to emulate over the whole parameter space, we find the point where the residual (as defined above) is highest, and we calculate the exact eigenvector once at that point. We include that eigenvector in our set of training eigenvectors for the next iteration. That is all we need to do to find our next optimal training point.

The key idea in this self-learning eigenvector continuation algorithm is that we use the constraint equation, namely the eigenvalue equation, to quickly determine where our approximation error is largest. The eigenvector at this point has the lowest overlap with the linear span of our current set of eigenvectors, and we choose it as our next training eigenvector. The reason behind this choice is that, among all possible eigenvectors, this eigenvector has the largest component along the direction orthogonal to the subspace spanned by our current set of eigenvectors, and thus makes the volume formed by the new set of eigenvectors the largest.

Notice that even if we were not solving an eigenvalue problem but some other constraint equation, we could still use the constraint equation to judge how good any given approximation to the exact solution is. Thus this idea of self-learning can be used with any emulator, as long as we have a constraint equation that the exact solution satisfies. In the next section, we describe this self-learning emulator algorithm in detail for the general case. We then show its applicability in a variety of examples.

5.2 Self-learning Emulator

Although we described the self-learning algorithm with the example of the eigenvector continuation emulator, in this section we begin again from scratch and describe the detailed algorithm for general use. We begin with an introduction to emulators and the properties we want in a good emulator. We state the training problem that emulators face, and then we explain the self-learning algorithm and show how it helps emulators overcome this problem. The problem of choosing optimal training data is at heart a machine learning problem, and along the way we will review some machine learning literature as well.

An emulator is a form of data modelling in which we build fundamental relationships between a collection of observed input variables and some desired output variables. It is a fast-to-evaluate statistical approximation of a detailed mathematical model. It is useful because the detailed calculation can be slow or relatively expensive, while emulators can be orders of magnitude faster to evaluate. Examples of the use of emulators include searching for exotic particles in high-energy physics [46], assessing the likelihood of disease progression in a patient [47], and emulating eigenvectors of a Hamiltonian in nuclear physics. Here we are interested in the last problem, but different kinds of emulators are in use in several different disciplines of science.
A good emulator should be detailed enough to capture the salient aspects of the mathematical problem at hand, but a highly detailed model can become computationally expensive, hindering its usability. An emulator is designed using some training points where the exact data is known, and the emulator should faithfully reproduce the exact solution at those training points. Everywhere else, the emulator provides an approximation to the exact result, and this approximation error decreases with the number of training points, either as a power law for piecewise continuous emulators or exponentially fast for smooth function emulators. However, more training points means more computational time to emulate, and we want the best possible emulator with the fewest training points. This means we want to optimize the locations of a fixed number of training points to get the best performance. This can be quite a challenge, since finding the exact result anywhere is computationally expensive, and each time we want to evaluate the performance of the emulator with a given set of training points, we need to solve for the exact solution at those training points. We cannot exhaustively search over all sets of training points to find the optimal choice.

The problem at hand is to optimally select, from a large set of candidates, the subset we want to use as training points. If we use machine learning to solve this problem, then it becomes an active-learning problem. Active learning (also called "query learning," or sometimes "optimal experimental design" in the statistics literature) is a subfield of machine learning that deals with choosing an optimal subset from a large amount of data. The key hypothesis is that, if the learning algorithm is allowed to choose the data from which it learns, it will perform better with less training. It has been used in a variety of fields [48–50] and is often combined with other machine learning methods like neural networks.

We now introduce an active learning protocol called self-learning emulation that relies on a fast estimate of the emulator error. It is a greedy local optimization algorithm, and it becomes progressively more accurate as the emulator improves. It provides a potential solution to the problem of selecting optimal training data for emulators when the objective of the emulator is to solve a system of constraint equations over some domain of control parameters. As we will show, self-learning emulators are highly efficient algorithms that speed up our calculations by several orders of magnitude or more. The gain in computational speed is achieved by using the emulator itself to estimate the error.

We note that the self-learning emulators we discuss here are qualitatively different from other machine learning algorithms that model the solutions using gradient descent optimization of some chosen loss function. Although these gradient descent optimization methods are highly parallelizable and very fast, they usually suffer from critical slowing down with respect to error convergence and cannot achieve arbitrarily high accuracy in polynomial computing time.

We can summarize the self-learning algorithm as follows. It is an iterative algorithm for choosing training points, and we start with a few random training points. We use the constraint equation and the emulator to obtain, over the whole parameter space, a fast estimate of the emulation error. We then find the point where the estimated error is largest and choose our next training point there.
We iterate this procedure to find and add more training points. We will now describe in detail how to obtain this fast error estimate. We should also mention that this idea of using the emulator to find additional training points has been applied before. In [51], along with the usual estimate of the actual result, the emulator also gave an estimate of the emulation error, and the active learning algorithm picked the next training point where the emulated error was largest. While this idea is similar to ours, our method of calculating the estimated error is different, and we achieve much better results with self-learning emulation.

5.2.1 Constraint equations and error estimates

Suppose we have a set of simultaneous constraint equations G_i(x, c) = 0 in variables x = {x_j} and control parameters c = {c_k}, which vary over some domain D. Let us denote the exact solutions as x(c). We assume that we have an emulator which uses the exact solutions for some set of training points {c^(i)} and constructs an approximate solution x̃(c) for all c ∈ D. We define the error or loss function as the norm ∥∆x(c)∥ of the residual vector ∆x(c) = x(c) − x̃(c). We want to choose a set of training points for our emulator such that the peak value of the error function over the domain D is minimized, and we will choose this set of training points iteratively such that the peak error goes down with every iteration.

We want to choose the point where the error function ∆x(c) is largest as our next training point. However, we cannot calculate the error function ∆x(c) without calculating the exact solution x(c) there, and, as we have discussed, calculating the exact solution is slow and computationally expensive, so we cannot find it for all c ∈ D. Thus, in each iteration, we want to find an estimate of the error function that is much faster to calculate than the exact solution. We will call this the fast error estimate, or simply the fast estimate.

Since the error function varies over many orders of magnitude, it is more convenient to work with the natural logarithm of the error function, log∥∆x(c)∥. The emulator reproduces the exact solution at the training points {c^(i)}, where the logarithm of the error function goes to negative infinity. Therefore the logarithm of the error function becomes a rapidly varying function of c as we include more training points.

Let us consider the case where ∆x(c) is small enough that we can accurately expand the constraint equations as

$$G_i(\tilde x(c), c) + \Delta x(c)\cdot\nabla_x G_i(\tilde x(c), c) \approx 0. \qquad (5.4)$$

If the number of degrees of freedom is small, we can solve this linear inversion problem for ∆x(c), and the solution will serve as a fast estimate of the error. This estimate is just what we would expect from the multivariate form of the Newton-Raphson method. However, for most cases of interest there will be many degrees of freedom, and the matrix inversion required to solve for ∆x(c) will be too slow or computationally impossible for our self-learning emulator training process. We therefore choose another non-negative functional F[{G_i(x̃(c), c)}] as a substitute for ∥∆x(c)∥. The only essential requirement we impose on F[{G_i(x̃(c), c)}] is that it is linearly proportional to ∥∆x(c)∥ in the limit ∥∆x(c)∥ → 0. This allows us to write the logarithm of the error as

$$\log\|\Delta x(c)\| = \log F[\{G_i(\tilde x(c), c)\}] + A + B(c), \qquad (5.5)$$

where A is a constant and the average of B(c) over the domain D is zero.
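The control flow of the greedy training loop can be summarized in a short schematic sketch; `exact_solve`, `train_emulator`, and `fast_error_estimate` are placeholder callables standing in for whatever solver, emulator, and functional F[{G_i}] are being used, and only the loop structure, not the names, should be taken from the text.

```python
import numpy as np

def self_learning(candidates, initial_points, exact_solve,
                  train_emulator, fast_error_estimate, n_iterations):
    # Start from a few exact solutions at the initial training points.
    training = {c: exact_solve(c) for c in initial_points}
    for _ in range(n_iterations):
        emulator = train_emulator(training)
        # Fast error estimate (log F of Eq. (5.5)) over the whole candidate grid ...
        log_errors = [fast_error_estimate(emulator, c) for c in candidates]
        # ... and one expensive exact solve only at its current maximum.
        c_next = candidates[int(np.argmax(log_errors))]
        training[c_next] = exact_solve(c_next)
    return train_emulator(training), sorted(training)
```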
Since F[{G_i(x̃(c), c)}] is linearly proportional to ∥∆x(c)∥ in the limit ∥∆x(c)∥ → 0, the function log F[{G_i(x̃(c), c)}] will have the same steep hills and valleys as the function log∥∆x(c)∥ as we include more training points. In the limit of a large number of training points, we can neglect the much smaller variation of B(c) over the domain D. We can therefore approximate the logarithm of the error as log F[{G_i(x̃(c), c)}] + A. The unknown constant A is irrelevant if we are only comparing the logarithm of the error at different points c. Nevertheless, we can also quickly estimate A simply by taking several random samples of c and computing the average value of the difference between log∥∆x(c)∥ and log F[{G_i(x̃(c), c)}]. Later we will show that we can refine this estimate even further by using machine learning to approximate the function B(c).

A greedy algorithm is an approach to solving a problem that selects the best option available at the moment, without worrying whether the current best choice will lead to the overall optimal result, and without ever reversing an earlier decision even if that choice turns out to be wrong. It works in a top-down manner. The self-learning emulator training program is a greedy algorithm: in each iteration, we search over the parameter space to find the point c where the estimated logarithm of the error is currently greatest, add this point to the training set, and repeat the whole process.

In this manner we construct a fast emulator that becomes more and more accurate as more training points are added and that provides a reliable estimate of the emulator error. It should be emphasized that self-learning emulation is just an algorithm for learning the best training points; it does not change the process of emulation itself. Thus it can be used with any emulator that satisfies a constraint equation and reproduces the exact solution at the training points. This could be a simple method such as polynomial interpolation or a Gaussian process, or a more involved method such as a neural network or eigenvector continuation. We retain all the beneficial properties of the emulator, such as its computational speed advantage, parallelizability, and ease of application. It can be applied to any system of constraints, such as solutions of algebraic or transcendental equations, linear and nonlinear differential equations, and linear and nonlinear eigenvalue problems. We now show how this algorithm works in several examples.

5.2.2 Natural Cubic Spline Emulator

In mathematics, a spline is a special function defined piecewise by polynomials. Given a set of data points (x_0, y_0), · · · , (x_n, y_n), a spline interpolator fills in the data between the given data points with piecewise polynomial functions. The data points where the different piecewise polynomial functions meet from different sides are called knots. A common spline is the natural cubic spline, which uses piecewise cubic polynomials to interpolate between the data points. The polynomials are joined C² smoothly; in the one-dimensional case, this simply means that the first and second derivatives at the knots are equal from both sides. The word 'natural' refers to the additional boundary condition that the second derivative vanishes at the two endpoints. We describe in the supplemental materials how, from a given data set (x_0, y_0), · · · , (x_n, y_n), we can form a set of n cubic polynomials that approximate the data in between the data points.

Let us now consider some natural cubic spline emulators and apply the self-learning algorithm to them.
In the first example we consider, Model 1, we are interested in finding the smallest root of an nth order polynomial of the form p(x) = c_n x^n + · · · + c_1 x + c_0, where n is odd and the coefficients are real. Since n is odd, the polynomial crosses zero at least once if c_n ≠ 0, and so we can uniquely define the lowest real root. We consider the case where all of the coefficients are fixed and only the coefficient c_{n−1} is varied. Let x be the exact lowest real root; this is a function of c_{n−1}. We are interested in finding x(c_{n−1}) as c_{n−1} varies over some domain D, but instead of finding the polynomial roots at every point, we want to find the exact root at only a few points and use that data to build a cubic spline emulator that approximates the smallest root everywhere else. If the order n of the polynomial is greater than 4, there is no general algebraic solution, and the roots must be found numerically. A direct numerical root calculation takes a long time, and we want a quick approximation using splines, which is faster.

Our goal with the spline emulator is to achieve the lowest peak emulation error. We want to optimize the locations of the training points that the emulator will use, but we do not know the emulation error anywhere without calculating the exact root at that point. So we use the self-learning algorithm to find the best set of training points. Let the cubic spline approximation be x̃(c_{n−1}) for any given c_{n−1}. The logarithm of the error function is then log |∆x(c_{n−1})|, where ∆x(c_{n−1}) = x(c_{n−1}) − x̃(c_{n−1}). For our fast error estimate we need some function that is linearly proportional to the actual error |∆x(c_{n−1})| in the limit |∆x(c_{n−1})| → 0. There are many good choices one can make, and here we choose

$$|\Delta x(c_{n-1})| \approx \frac{|p(\tilde x(c_{n-1}))|}{\sqrt{|p'(\tilde x(c_{n-1}))|^2 + \epsilon^2}}, \qquad (5.6)$$

where we have included a small regulator ϵ to avoid divergences when the derivative p′ vanishes. This estimate of |∆x(c_{n−1})| is again motivated by the Newton-Raphson method. We use the right-hand side of equation 5.6 as our error estimate.

For our example, we take a fifth-order polynomial, n = 5, with coefficients c_5 = 5, c_3 = 3, c_2 = 2, c_1 = 1, c_0 = 1. We vary c_4 in the region 0 ≤ c_4 ≤ 100, and we set ϵ = 1 in equation 5.6. We start with four training points for c_{n−1}, two on the boundary and two at random points in between. In Fig. 5.3 we show results for the estimate of the logarithm of the error and how the training process proceeds. At our training points, we manually set the error to exp(−25). In each iteration, the self-learning algorithm chooses a new training point at the location of the maximum of the estimated error function from the previous iteration.

Figure 5.3: Estimates of the logarithm of the error for the cubic spline self-learning emulator, which finds the lowest real root of the fifth order polynomial p(x) in Model 1. We show results after iterations 0, 1, and 2.

In Fig. 5.4 we show that there is excellent agreement between the logarithm of the actual error and the error estimate after 10 iterations. We note that the error has dropped significantly for all values of c_4 from the original error estimate in Fig. 5.3, indicating that the self-learning emulator is functioning as intended.
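A compact sketch of this training loop for Model 1 is given below, assuming scipy's natural cubic spline and numpy's polynomial root finder. The two interior starting points are fixed here (rather than drawn at random) for reproducibility, and all variable names are ours.

```python
import numpy as np
from scipy.interpolate import CubicSpline

eps = 1.0

def poly(c4, x):      # p(x) = 5 x^5 + c4 x^4 + 3 x^3 + 2 x^2 + x + 1
    return 5.0 * x**5 + c4 * x**4 + 3.0 * x**3 + 2.0 * x**2 + x + 1.0

def dpoly(c4, x):     # p'(x)
    return 25.0 * x**4 + 4.0 * c4 * x**3 + 9.0 * x**2 + 4.0 * x + 1.0

def lowest_real_root(c4):
    roots = np.roots([5.0, c4, 3.0, 2.0, 1.0, 1.0])
    return roots[np.abs(roots.imag) < 1e-8].real.min()

def error_estimate(c4, x_tilde):
    # Newton-Raphson-style fast estimate, Eq. (5.6), with regulator eps.
    return np.abs(poly(c4, x_tilde)) / np.sqrt(dpoly(c4, x_tilde)**2 + eps**2)

grid = np.linspace(0.0, 100.0, 1001)
train_c = [0.0, 23.0, 71.0, 100.0]             # two boundary + two interior points

for _ in range(10):                             # self-learning iterations
    spline = CubicSpline(train_c, [lowest_real_root(c) for c in train_c],
                         bc_type='natural')
    errs = error_estimate(grid, spline(grid))   # fast estimate over the whole domain
    c_next = float(grid[np.argmax(errs)])
    if c_next not in train_c:                   # one exact root calculation per iteration
        train_c = sorted(train_c + [c_next])

print(train_c)
```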
The fact that more training points are needed for smaller values of c_4 shows that the training process is not simply adding training points at random, but is instead uniformly improving the emulator performance across the entire domain. We also note that in this case the error estimate matches the actual error well; therefore both A and B(c), as defined in equation 5.5, are negligible for this example.

Figure 5.4: Logarithm of the actual error and error estimate for the cubic spline self-learning emulator in Model 1 after 10 iterations.

For Model 1 it is relatively easy to numerically compute the roots of a fifth-order polynomial, but the spline emulator is still faster. We find that our self-learning protocol provides a factor of about 200 times greater computational speed over tuning the cubic spline emulator using direct calculations of the error.

Let us now consider a slightly more challenging problem to solve numerically; after that we will discuss and define the speed-up factor for our emulator more precisely. We again use a natural cubic spline emulator. In this example, Model 2, we want to find the lowest real solution of a transcendental equation given by

$$p(x) = c_5 x^5 + c_4 x^4 \sin(10x) + c_3 x^3 + c_2 x^2 + c_1 x + c_0 = 0, \qquad (5.7)$$

where all the coefficients c_i are real. We fix the coefficients c_5 = c_3 = c_2 = c_1 = c_0 = 1, and we vary the coefficient c_4 in the region −1 ≤ c_4 ≤ 2. We are interested in the smallest real x that satisfies equation 5.7. For this choice of coefficients a real solution for x always exists for real c_4; however, the dependence of the solution on c_4 is not trivial and has discontinuities with respect to the parameter c_4.

We apply the self-learning algorithm to train the natural cubic spline emulator for this problem. We start with three training points for c_4, two on the boundary and one in the interior. We denote the cubic spline approximation by x̃(c_4) for all values of c_4. The logarithm of the error function is then log |∆x(c_4)|, where ∆x(c_4) = x(c_4) − x̃(c_4). We can again estimate |∆x(c_4)| using the Newton-Raphson method, and we use the right-hand side of equation 5.6 as our error estimate with ϵ = 1.

Figure 5.5: Plot of the lowest real solution to Eq. (5.7) versus c_4 in Model 2. The self-learning emulator needs to take significantly more training points near the discontinuity at c_4 ≈ 1.232.

The smallest solution to the equation is shown in Fig. 5.5, along with how well our spline emulator approximates the function everywhere. We notice that the dependence of the solution on c_4 is non-trivial, and there is a discontinuity at c_4 ≈ 1.232.

Figure 5.6: Logarithm of the actual error and error estimate for the cubic spline self-learning emulator in Model 2 after 10 iterations.

Figure 5.7: Logarithm of the actual error and error estimate for the cubic spline self-learning emulator in Model 2 after 20 iterations.

Figures 5.6 and 5.7 show the logarithm of the error estimate and the actual error after 10 and 20 iterations of the self-learning algorithm, respectively. The fact that more training points are needed near c_4 ≈ 1.2 shows that the training process is not simply adding training points at random, but is instead choosing the training points that add the most information to the emulation process. The self-learning emulator took significantly more training points near the discontinuity to emulate it accurately.
Also notice that the peak error has dropped from figure 5.6 to figure 5.7 as we add another 10 training points. This shows that if we keep going in this manner, we can reach arbitrary precision with our emulators. Again, our error estimates match the actual error well, and thus both A and B(c) as defined in equation 5.5 are negligible for Model 2.

Now let us comment on the error scaling of spline emulators and on the speed-up factor we get from using this spline emulator. In the limit of a large number of training points N, the error of spline interpolation for a smooth function scales as O(N^{-4}) [52]. This is because the error of cubic interpolation scales as the fourth power of the interval between training points. However, this is only true when the function is smooth and in the limit that N is large. For Model 2, the exact solution has a jump discontinuity, and so the power-law scaling is slower. Numerically, we find that the error is approximately O(N^{-2.2}). We see this in figure 5.8, where the slope of the graph is −2.2.

Using an Intel i7-9750H processor, evaluating the exact solution using standard root finding methods for one value of c_4 requires about 10^{-1} s of computational time. In comparison, it takes about 10^{-6} s to estimate the solution for any c_4 using spline interpolation with 23 training points. The raw emulator speedup factor is therefore s_raw ∼ 10^5. Let M be the number of evaluations needed, i.e., the number of points in our domain other than the training points. Let N_ϵ be the number of emulator training points needed to achieve error tolerance ϵ. The overall computational speedup factor for the self-learning emulator can then be estimated as the minimum of M/N_ϵ and the raw emulator speed-up factor. If the fast error estimate were not used, then N_ϵ would be replaced by the number of evaluations needed to train the emulator to the desired error tolerance ϵ, which is generally much larger than N_ϵ.

Figure 5.8: Natural spline emulator error scaling for Model 2. We plot the logarithm of the error versus the logarithm of the number of iterations.

5.2.3 Reduced Basis Emulator

We now turn our attention to solving differential equations with emulators. In our next example, Model 3, we emulate the solution of an ordinary differential equation with one variable z and one control parameter c. We consider a family of differential equations Lx(z) = 0, where

$$L = \frac{1}{(1+2z)^2}\frac{d^2}{dz^2} - \frac{2}{(1+2z)^3}\frac{d}{dz} + c^2 e^{2c}, \qquad (5.8)$$

and c is a real parameter. Our boundary conditions are x(z = 0, c) = 0 and ∂_z x(z = 0, c) = 1 for all c. We consider the region 0 ≤ z ≤ 1 and 0 ≤ c ≤ 1. The exact solution is given by x(z, c) = \frac{1}{ce^c}\sin[ce^c(z + z^2)].

For this problem, we consider two different emulators. The first is the natural spline emulator, which we have already seen; we can use splines to perform interpolations and extrapolations in c for each value of z. The second is a reduced basis emulator, which uses high-fidelity solutions of the differential equation at several training values of c and solves the constraint equations approximately using subspace projection. Reduced basis (RB) emulators have proven useful for solving computationally intensive parameterized partial differential equations [53–57]. We also tried to emulate the solution of this differential equation using neural networks and Gaussian process (GP) emulators, but they did not perform very well, and we omit their results.
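A rough sketch of a reduced-basis emulator for Model 3 is given below: high-fidelity solutions x(z, c_i) at a few training couplings serve as basis functions, and the emulated solution at a new c is the linear combination that minimizes the finite-difference residual of Lx = 0, Eq. (5.8), subject to x′(0) = 1. The constrained least-squares projection used here is one simple choice and may differ in detail from the thesis' RB construction; all names and grid sizes are ours.

```python
import numpy as np

n_z = 401
z = np.linspace(0.0, 1.0, n_z)
dz = z[1] - z[0]

def exact_solution(c):
    # x(z, c) = sin[c e^c (z + z^2)] / (c e^c), the closed-form solution.
    return np.sin(c * np.exp(c) * (z + z**2)) / (c * np.exp(c))

def L_operator(c):
    """Finite-difference matrix for L of Eq. (5.8), kept on interior points only."""
    D1 = (np.diag(np.ones(n_z - 1), 1) - np.diag(np.ones(n_z - 1), -1)) / (2 * dz)
    D2 = (np.diag(np.ones(n_z - 1), 1) - 2 * np.eye(n_z)
          + np.diag(np.ones(n_z - 1), -1)) / dz**2
    L = (np.diag(1.0 / (1 + 2 * z)**2) @ D2
         - np.diag(2.0 / (1 + 2 * z)**3) @ D1
         + c**2 * np.exp(2 * c) * np.eye(n_z))
    return L[1:-1, :]

def rb_emulate(c, training_cs):
    X = np.column_stack([exact_solution(ci) for ci in training_cs])  # snapshot basis
    A = L_operator(c) @ X                         # residual of each basis function
    M = A.T @ A + 1e-12 * np.eye(len(training_cs))
    a = np.linalg.solve(M, np.ones(len(training_cs)))
    a /= a.sum()          # each snapshot has x'(0) = 1, so sum(a) = 1 keeps the BC
    return X @ a

training_cs = np.linspace(0.2, 2.0, 10)
c_target = 1.05
print(np.max(np.abs(rb_emulate(c_target, training_cs) - exact_solution(c_target))))
```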
We again want a fast error estimate F[x̃(z, c), c] that is linearly proportional to the actual error ∥∆x(z, c)∥ in the limit ∥∆x(z, c)∥ → 0. There are many ways to choose this, but we again choose an expression that we can relate to equation 5.4. We define our error estimate as

F[\tilde{x}(z,c), c] = \left\lVert \frac{L\tilde{x}(z,c)}{\sqrt{\left[\frac{d}{dz}\,L\tilde{x}(z,c)\right]^2 + \epsilon^2}} \right\rVert_1,   (5.9)

where we have again included a small regulator ϵ to avoid divergences, and ∥·∥_1 denotes the L1 norm, i.e., the integral over z of the absolute value. Note that this fast error estimate can be used by both the spline emulator and the reduced basis emulator; in fact, for this differential equation it can be used with any emulator.

We initialize with two training points at the boundaries and one in the interior. Figure 5.9 shows the actual error and estimated error after 20 iterations of the self-learning spline emulator.

Figure 5.9: Logarithm of the actual error, error estimate, and corrected error estimate for the natural spline emulator with self-learning in Model 3 after 20 iterations.

We found that the reduced basis emulator performs very well for this problem, and therefore, in order to show more details of its performance before reaching the limits of machine precision, we extend the domain to the wider interval 0 ≤ c ≤ 2 for the RB emulator. Figure 5.10 shows the actual error and estimated error after 10 iterations of the self-learning RB algorithm.

Figure 5.10: Logarithm of the actual error, error estimate, and corrected error estimate for the reduced basis emulator with self-learning in Model 3 after 10 iterations.

In both cases the difference between the actual error and the estimated error is a slowly-varying function of c, as predicted. We also note that the exact solution x(z, c) = \frac{1}{ce^{c}}\sin[ce^{c}(z+z^2)] oscillates more rapidly with increasing c, and the emulators therefore need more training points for larger c. This again shows that the self-learning algorithm selects training points such that the emulator performance improves uniformly across the entire domain.

In this example there is a difference between the error estimate and the actual error, indicating that A + B(c) in equation 5.5 is non-zero, so we cannot determine the actual error from the error estimate alone. However, in these situations we can estimate the difference between the error estimate and the actual error by constructing a Gaussian process (GP) emulator for the difference function A + B(c). We train the GP by computing A + B(c) at the midpoints between the emulator training points. We have performed this correction for both the spline and RB emulators, and the results are shown in Figs. 5.9 and 5.10. We see that the corrected error estimate is in excellent agreement with the actual error.
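The GP correction just described takes only a few lines to implement. The sketch below uses scikit-learn's GaussianProcessRegressor with an RBF kernel; the kernel choice, length scale, and function names are assumptions for illustration, not the settings used to produce Figs. 5.9 and 5.10.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def corrected_log_error_estimate(train_c, log_fast_estimate, log_actual_error, c_eval):
    """Correct a fast error estimate with a GP trained only at midpoints.

    train_c           : sorted array of current emulator training points
    log_fast_estimate : callable, log of the fast error estimate at a point c
    log_actual_error  : callable, log of the actual error (called only at midpoints)
    c_eval            : points where the corrected estimate is wanted
    """
    midpoints = 0.5 * (train_c[:-1] + train_c[1:])
    # Sample the difference function A + B(c) = log(actual) - log(estimate) at midpoints.
    diff = np.array([log_actual_error(c) - log_fast_estimate(c) for c in midpoints])
    gp = GaussianProcessRegressor(kernel=ConstantKernel(1.0) * RBF(length_scale=0.3),
                                  normalize_y=True)
    gp.fit(midpoints.reshape(-1, 1), diff)
    correction = gp.predict(np.asarray(c_eval).reshape(-1, 1))
    fast = np.array([log_fast_estimate(c) for c in c_eval])
    return fast + correction
```

The exact solution is still required at the midpoints, but the number of midpoints grows only with the number of training points, so the extra cost is modest compared with tuning the emulator by direct error calculations everywhere.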
In Model 3 the solution is a smoothly varying function, so we should expect O(N^{-4}) error scaling for the spline emulator. However, it appears that we have not yet reached the asymptotic large-N scaling regime, and the error scaling is approximately O(N^{-1.88}); this can be seen in figure 5.11. In contrast, the reduced basis emulator has exponentially fast error scaling. This is because the reduced basis emulator is itself a smooth function, and we can view the addition of training points as matching more derivatives of the smooth emulator to derivatives of the smooth exact solution. The error scaling is therefore similar to the error scaling of a convergent power series.

Figure 5.11: Natural spline emulator error scaling for Model 3. We plot the logarithm of the error versus the logarithm of the number of iterations.

Figure 5.12 shows the error scaling for the reduced basis emulator for Model 3. We see that the error scaling is O(e^{-2.66 N}) for N above 10 training points.

Figure 5.12: Reduced basis emulator error scaling for Model 3. We plot the logarithm of the error versus the number of iterations.

On a single Intel i7-9750H processor, numerically solving the differential equation for one value of c takes about 7 × 10^{-2} s. In contrast, the spline emulator requires about 1.7 × 10^{-3} s with 23 training points, and the RB emulator takes about 5.5 × 10^{-4} s with 13 training points. Therefore the spline emulator has a raw speed-up factor of s_raw ∼ 40, while the RB emulator has a raw speed-up factor of s_raw ∼ 130. Given the comparable values of s_raw and the exponential scaling of the error for the RB emulator, we conclude that the RB emulator significantly outperforms the spline emulator for this example.

5.3 Self-learning Eigenvector Continuation

We now return to our discussion of eigenvector continuation and show that self-learning eigenvector continuation works very well in selecting optimal training points. Suppose we are interested in finding the ground state eigenvector |v(c)⟩ and eigenenergy E(c) of the Hamiltonian H(c). Let the eigenvector continuation estimate of the eigenvector be |ṽ(c)⟩ and the estimate of the eigenenergy be Ẽ(c).

The logarithm of the error is log∥|∆v(c)⟩∥, where |∆v(c)⟩ = |v(c)⟩ − |ṽ(c)⟩. Computing the error directly would be computationally too expensive for large systems, so we instead work with a fast error estimate F[ṽ(c), H(c)]. Earlier we mentioned that the residual norm ∥H(c)|ṽ(c)⟩ − Ẽ(c)|ṽ(c)⟩∥ could serve as a fast error estimate. Here we show that we can construct a better error estimate,

F[\tilde{v}(c), H(c)] = \sqrt{\frac{\langle \tilde{v}(c)|[H(c) - \tilde{E}(c)]^2|\tilde{v}(c)\rangle}{\langle \tilde{v}(c)|[H(c)]^2|\tilde{v}(c)\rangle}}.   (5.10)

This F[ṽ(c), H(c)] is proportional to the square root of the variance of the Hamiltonian, and it becomes linearly proportional to ∥|∆v(c)⟩∥ in the limit ∥|∆v(c)⟩∥ → 0. Thus log F[ṽ(c), H(c)] can be used as a fast error estimate.
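For a Hamiltonian small enough to store as a dense matrix, both the eigenvector continuation step and the variance-based estimate of equation 5.10 take only a few lines. The sketch below is illustrative only: the random two-matrix family H(c) = H0 + c H1 is a stand-in, not one of the models considered in this work.

```python
import numpy as np
from scipy.linalg import eigh

def ec_ground_state(H, train_vecs):
    """Eigenvector continuation: project H onto span(train_vecs) and return
    the lowest generalized eigenpair, lifted back to the full space."""
    B = np.column_stack(train_vecs)        # training eigenvectors as columns
    h = B.conj().T @ H @ B                 # projected Hamiltonian
    n = B.conj().T @ B                     # norm (overlap) matrix
    evals, evecs = eigh(h, n)              # generalized eigenvalue problem
    v = B @ evecs[:, 0]
    return evals[0], v / np.linalg.norm(v)

def variance_error_estimate(H, E_tilde, v_tilde):
    """Fast error estimate of Eq. (5.10) for a normalized v_tilde."""
    Hv = H @ v_tilde
    num = (Hv.conj() @ Hv - 2.0 * E_tilde * (v_tilde.conj() @ Hv) + E_tilde**2).real
    den = (Hv.conj() @ Hv).real
    return np.sqrt(num / den)

# Illustrative one-parameter family built from random symmetric matrices.
rng = np.random.default_rng(0)
H0 = rng.normal(size=(200, 200)); H0 = 0.5 * (H0 + H0.T)
H1 = rng.normal(size=(200, 200)); H1 = 0.5 * (H1 + H1.T)
train_vecs = [np.linalg.eigh(H0 + c * H1)[1][:, 0] for c in (0.1, 0.5, 0.9)]
H_target = H0 + 0.3 * H1
E_ec, v_ec = ec_ground_state(H_target, train_vecs)
print(variance_error_estimate(H_target, E_ec, v_ec))
```

The key point is that the estimate needs only matrix-vector products with H(c) applied to the emulator state, never the exact eigenvector itself.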
We now present a geometrical picture of the eigenvector continuation error, as well as some additional insight into why we choose the error estimate given in equation 5.10. Suppose we know the eigenvectors at M different training points, {c^{(1)}, ..., c^{(M)}}. We label the set of M training eigenvectors as S_M = {|v(c^{(1)})⟩, ..., |v(c^{(M)})⟩}. Let us define the norm matrix N(S_M) as

N(S_M) = \begin{pmatrix} \langle v(c^{(1)})|v(c^{(1)})\rangle & \cdots & \langle v(c^{(1)})|v(c^{(M)})\rangle \\ \vdots & \ddots & \vdots \\ \langle v(c^{(M)})|v(c^{(1)})\rangle & \cdots & \langle v(c^{(M)})|v(c^{(M)})\rangle \end{pmatrix},   (5.11)

and let Ω²(S_M) be the determinant of N(S_M). Then Ω²(S_M) is the square of the volume of the M-dimensional parallelepiped defined by the vectors in the set S_M. If all the eigenvectors are normalized, the maximum possible volume is 1, which is attained when all the eigenvectors are orthogonal.

Let us now consider selecting the next training point, c^{(M+1)}. Let P be the projection operator onto the linear span of S_M, and let Q = 1 − P be the projection onto the orthogonal complement. Suppose we now expand our training set S_M by adding another training vector |v(c)⟩ to form S_{M+1}. Let us define the perpendicular projection vector |v_⊥(c)⟩ as

|v_\perp(c)\rangle = Q\,|v(c)\rangle.   (5.12)

Since Ω²(S_M) is the squared volume of the parallelepiped defined by the vectors in S_M and Ω²(S_{M+1}) is the squared volume of the parallelepiped defined by the vectors in S_{M+1}, it follows that the ratio of Ω²(S_{M+1}) to Ω²(S_M) is given by the squared norm of |v_⊥(c)⟩,

\frac{\Omega^2(S_{M+1})}{\Omega^2(S_M)} = \langle v_\perp(c)|v_\perp(c)\rangle.   (5.13)

Let us define the projections of H onto the P and Q subspaces as

H^P(c) = P H(c) P, \qquad H^Q(c) = Q H(c) Q.   (5.14)

The eigenvector continuation approximation is just the approximation of |v(c)⟩ by some eigenvector of H^P(c), which we denote |v^P(c)⟩. This is because eigenvector continuation gives us the eigenvector of the Hamiltonian projected onto the space of training eigenvectors, which is the P subspace. Let the corresponding energy be labelled E^P(c), so that

H^P(c)\,|v^P(c)\rangle = E^P(c)\,|v^P(c)\rangle.   (5.15)

We also label the eigenvectors of H^Q(c) contained in the orthogonal complement Q as

H^Q(c)\,|v_j^Q(c)\rangle = E_j^Q(c)\,|v_j^Q(c)\rangle.   (5.16)

When the difference between the exact eigenvector and the eigenvector continuation estimate of the exact eigenvector is small, we can use first-order perturbation theory to write

|v(c)\rangle \approx |v^P(c)\rangle + \sum_j \frac{\langle v_j^Q(c)|H(c)|v^P(c)\rangle}{E^P(c) - E_j^Q(c)}\,|v_j^Q(c)\rangle.   (5.17)

To first order in perturbation theory, the residual vector is just |v_⊥(c)⟩ ≈ |v(c)⟩ − |v^P(c)⟩. We therefore have

|v_\perp(c)\rangle \approx \sum_j \frac{\langle v_j^Q(c)|H(c)|v^P(c)\rangle}{E^P(c) - E_j^Q(c)}\,|v_j^Q(c)\rangle.   (5.18)

If we now combine this with Eq. (5.13), we get

\frac{\Omega^2(S_{M+1})}{\Omega^2(S_M)} = \lVert |v_\perp(c)\rangle \rVert^2 = \sum_j \frac{\langle v^P(c)|H(c)|v_j^Q(c)\rangle \langle v_j^Q(c)|H(c)|v^P(c)\rangle}{[E^P(c) - E_j^Q(c)]^2}.   (5.19)

We can now connect this result with the fast error estimate function in equation 5.10. The second equality gives an expression for the error term ∥|v_⊥(c)⟩∥ using first-order perturbation theory, while the first equality is a geometric interpretation of the error term as the ratio of the squared volumes Ω²(S_{M+1}) and Ω²(S_M). Taking the logarithm of the square root of equation 5.19, we get

\log \lVert |v_\perp(c)\rangle \rVert = \frac{1}{2}\log \sum_j \frac{\langle v^P(c)|H(c)|v_j^Q(c)\rangle \langle v_j^Q(c)|H(c)|v^P(c)\rangle}{[E^P(c) - E_j^Q(c)]^2}.   (5.20)

The terms in the numerator,

\langle v^P(c)|H(c)|v_j^Q(c)\rangle \langle v_j^Q(c)|H(c)|v^P(c)\rangle,   (5.21)

go to zero at each of the training points, causing large variations in the logarithm of the error as we add more and more training points. In contrast, the terms in the denominator, [E^P(c) − E_j^Q(c)]², are smooth functions of c. Similarly, ⟨v^P(c)|[H(c)]²|v^P(c)⟩ is also a smooth function of c. We can therefore write

\frac{1}{2}\log \sum_j \frac{\langle v^P(c)|H(c)|v_j^Q(c)\rangle \langle v_j^Q(c)|H(c)|v^P(c)\rangle}{[E^P(c) - E_j^Q(c)]^2} = \frac{1}{2}\log \sum_j \frac{\langle v^P(c)|H(c)|v_j^Q(c)\rangle \langle v_j^Q(c)|H(c)|v^P(c)\rangle}{\langle v^P(c)|[H(c)]^2|v^P(c)\rangle} + A + B(c),   (5.22)

where A is a constant and B(c) averages to zero over the entire domain of c. While the function B(c) is unknown, it will be dominated by the large variations in the logarithm of the error as more and more training points are added. We note that

\sum_j \frac{\langle v^P(c)|H(c)|v_j^Q(c)\rangle \langle v_j^Q(c)|H(c)|v^P(c)\rangle}{\langle v^P(c)|[H(c)]^2|v^P(c)\rangle} = \frac{\langle v^P(c)|H(c)(1-P)(1-P)H(c)|v^P(c)\rangle}{\langle v^P(c)|[H(c)]^2|v^P(c)\rangle} = \frac{\langle v^P(c)|[H(c) - H^P(c)]^2|v^P(c)\rangle}{\langle v^P(c)|[H(c)]^2|v^P(c)\rangle} = \frac{\langle v^P(c)|[H(c) - E^P(c)]^2|v^P(c)\rangle}{\langle v^P(c)|[H(c)]^2|v^P(c)\rangle}.   (5.23)

We therefore arrive at the variance error estimate used in equation 5.10,

\log \lVert |v_\perp(c)\rangle \rVert = \frac{1}{2}\log \frac{\langle v^P(c)|[H(c) - E^P(c)]^2|v^P(c)\rangle}{\langle v^P(c)|[H(c)]^2|v^P(c)\rangle} + A + B(c).   (5.24)
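The geometric statement in equation 5.13 above, that adding a training vector multiplies the squared Gram volume by the squared norm of its orthogonal-complement component, is easy to verify numerically. The few lines below do so for random vectors; the dimensions and vectors are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, M = 50, 4
S_M = [rng.normal(size=dim) for _ in range(M)]    # current set of training vectors
v_new = rng.normal(size=dim)                      # candidate training vector |v(c)>

def gram_det(vectors):
    # Omega^2: determinant of the norm matrix, i.e. the squared volume
    # of the parallelepiped spanned by the vectors.
    B = np.column_stack(vectors)
    return np.linalg.det(B.T @ B)

# |v_perp> = Q |v(c)>, with P the projector onto span(S_M).
B = np.column_stack(S_M)
P = B @ np.linalg.solve(B.T @ B, B.T)
v_perp = v_new - P @ v_new

ratio = gram_det(S_M + [v_new]) / gram_det(S_M)   # Omega^2(S_{M+1}) / Omega^2(S_M)
print(ratio, v_perp @ v_perp)                     # equal up to round-off, cf. Eq. (5.13)
```

In practice this means the variance estimate tracks how much new volume, and hence how much new information, a candidate training point would add to the eigenvector continuation subspace.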
Thus, with the fast error estimate of equation 5.10 in hand, we can apply self-learning to the eigenvector continuation emulator. We consider the ground state of a system of four distinguishable particles with equal masses on a three-dimensional lattice with zero-range interactions. We call this example Model 4. We use lattice units, where physical quantities are multiplied by powers of the spatial lattice spacing to make the combinations dimensionless. Furthermore, we set the particle masses equal to 1 in lattice units. We label the particles as 1, 2, 3, 4 and take the control parameters to be the six possible pairwise interactions, c_{ij}, with i < j. The lattice volume is a periodic cube of size L³ = 4³, and the corresponding Hamiltonian acts on a linear space with 262,144 dimensions. This model can be viewed as a generalization of the system of four two-component fermions with zero-range interactions that we considered in chapter 4 and in Refs. [33, 39], or of the Bose-Hubbard model considered in Ref. [1].

Let n denote the spatial lattice points on our three-dimensional L³ periodic lattice. Let the lattice annihilation and creation operators for particle i be written as a_i(n) and a_i†(n), respectively. The free non-relativistic lattice Hamiltonian has the form

H_{\rm free} = \frac{3}{m}\sum_{i=1,2,3,4}\sum_{n} a_i^\dagger(n)\,a_i(n) - \frac{1}{2m}\sum_{i=1,2,3,4}\sum_{\hat{l}=\hat{1},\hat{2},\hat{3}}\sum_{n} a_i^\dagger(n)\left[a_i(n+\hat{l}) + a_i(n-\hat{l})\right],   (5.25)

which is the same as the free Hamiltonian we used in chapter 4. However, instead of fermions, the particles here are distinguishable, and all of them can occupy the same lattice site. We add to the free Hamiltonian single-site contact interactions, and the resulting Hamiltonian has the form

H = H_{\rm free} + \sum_{i<j}\sum_{n} c_{ij}\,\rho_i(n)\,\rho_j(n),   (5.26)

where ρ_i(n) = a_i†(n)a_i(n) is the density operator for particle i.

We take c_1^{(0)} = 1 and c_{A_i}^{(0)} = 0 for all i. Furthermore, since the normalization is arbitrary, we also take c_1^{(n)} = 0 for all n ≥ 1. In row 1 at order n we have

\sum_i \langle 1|H|A_i\rangle\, c_{A_i}^{(n)} = \sum_{1\le n'\le n} \lambda_1^{(n')}\, c_1^{(n-n')}.   (8.3)

Since c_1^{(0)} = 1 and c_1^{(n)} = 0 for all n ≥ 1, we can rewrite this as

\lambda_1^{(n)} = \sum_i \langle 1|H|A_i\rangle\, c_{A_i}^{(n)}.   (8.4)

In row j > 1 at order n we have

\lambda_j\, c_j^{(n)} + \sum_i \langle j|H|A_i\rangle\, c_{A_i}^{(n)} = \sum_{1\le n'\le n} \lambda_1^{(n')}\, c_j^{(n-n')}.   (8.5)

In row A_i at order n we have

\lambda_{A_i}\, c_{A_i}^{(n+1)} + \sum_j \langle A_i|H|j\rangle\, c_j^{(n)} + \sum_{k\ne i} \langle A_i|H|A_k\rangle\, c_{A_k}^{(n)} = \sum_{1\le n'\le n} \lambda_1^{(n')}\, c_{A_i}^{(n-n')}.   (8.6)

We solve these equations recursively. From Eq. (8.6) we have

c_{A_i}^{(n)} = -\lambda_{A_i}^{-1}\sum_j \langle A_i|H|j\rangle\, c_j^{(n-1)} - \lambda_{A_i}^{-1}\sum_{k\ne i} \langle A_i|H|A_k\rangle\, c_{A_k}^{(n-1)} + \lambda_{A_i}^{-1}\sum_{1\le n'\le n-1} \lambda_1^{(n')}\, c_{A_i}^{(n-1-n')}.   (8.7)

From Eq. (8.4) we get

\lambda_1^{(n)} = \sum_i \langle 1|H|A_i\rangle\, c_{A_i}^{(n)},   (8.8)

and from Eq. (8.5) we have, for j > 1,

c_j^{(n)} = -\lambda_j^{-1}\sum_i \langle j|H|A_i\rangle\, c_{A_i}^{(n)} + \lambda_j^{-1}\sum_{1\le n'\le n} \lambda_1^{(n')}\, c_j^{(n-n')}.   (8.9)

For applications where the initial subspace is very large, it is impractical to fully diagonalize the subspace. In that case we include the lowest-energy subspace-projected eigenvectors |j⟩ with the same quantum numbers as |1⟩ and extrapolate to the limit where all subspace-projected eigenvectors |j⟩ are included.

8.1.2 Simplified recursion relation

We let

L = -\sum_i \sum_j |A_i\rangle\, \lambda_{A_i}^{-1}\, \langle A_i|H|j\rangle\, \langle j| - \sum_i \sum_{k\ne i} |A_i\rangle\, \lambda_{A_i}^{-1}\, \langle A_i|H|A_k\rangle\, \langle A_k|,   (8.10)

S = -\sum_{j\ne 1} \sum_i |j\rangle\, \lambda_j^{-1}\, \langle j|H|A_i\rangle\, \langle A_i|,   (8.11)

D = \sum_{j\ne 1} |j\rangle\, \lambda_j^{-1}\, \langle j| + \sum_i |A_i\rangle\, \lambda_{A_i}^{-1}\, \langle A_i|,   (8.12)

\langle r_L| = \sum_i \langle 1|H|A_i\rangle\, \langle A_i|.   (8.13)

We also define

|\psi_S^{(n)}\rangle = \sum_j c_j^{(n)}\, |j\rangle,   (8.14)

|\psi_L^{(n)}\rangle = \sum_i c_{A_i}^{(n)}\, |A_i\rangle.   (8.15)
(8.15) i i We can then write the recurrence relation as (n′ ) (n−1−n′ ) E E E  (n) (n−1) (n−1) X ψL =L ψL +L ψS +D λ1 ψL , (8.16) 1≤n′ ≤n−1 D E (n) (n) λ1 = rL ψL , (8.17) (n′ ) (n−n′ ) E E  (n) (n−1) X ψS =S ψL +D λ1 ψS . (8.18) 1≤n′ ≤n At first order we have E E E (1) (0) (0) ψL = L ψL + L ψS , (8.19) D E (1) (0) λ1 = rL L ψL , (8.20) E E D E E (1) (0) (0) (0) ψS =S ψL + rL L ψL D ψS . (8.21) 140 At second order we have E E E E (2) (1) (1) (1) (1) ψL =L ψL + L ψS + λ1 D ψL , E E E (0) (0) (1) =LL ψL + (LL + LS) ψL + LS ψS D Eh E E Ei (0) (0) (0) (0) + rL L ψL LD ψS + DL ψL + DL ψS , (8.22) D E E (2) (2) (2) λ1 = rL L ψL ψL D E D E D E (0) (0) (0) = rL LLL ψL + rL (LLL + LLS) ψL + rL LLS ψS D E hD E D Ei (0) (0) (0) + rL L ψL rL LLD ψS + rL LDL ψL D ED E (0) (0) + rL L ψL rL LDL ψS , (8.23) E E E E (2) (1) (1) (1) (2) (0) ψS = S ψL + λ1 D ψS + λ1 D ψS . (8.24) 8.1.3 Use in Eigenvector Continuation We now use the perturbative corrections at each order n as our basis vectors for eigenvector continuation. The norm matrix elements are X (n)∗ (n′ ) X (n)∗ (n′ ) Nn,n′ = cj cj + cA cA . (8.25) i i j i 141 while the Hamiltonian matrix elements are X (n)∗ (n′ ) X (n)∗ (n′ ) Hn,n′ (E) = cj λ j cj + E c A λ Ai c A i i j̸=1 i X X (n)∗ (n′ ) X X (n)∗ (n′ ) + cj ⟨j| H |Ai ⟩ cA + cA ⟨Ai | H |j⟩ cj i i j i i j X X (n)∗ (n′ ) + cA ⟨Ai | H |Ak ⟩ cA . (8.26) i k i k̸=i For many applications of interest, the number of basis vectors |Ai ⟩ will be extremely (n) large, so large that standard vector operators are not possible and storage of the entries cA i require more computer memory than available. In that case we cannot use the recursion relations directly. Instead, we need to completely unroll the recursion relations into their individual terms. The individual expressions can be evaluated by one of two options. The first option is exact calculation. This however becomes impractical rather quickly even with massively parallel computing. The more practical approach is to use stochastic sampling of the summation terms as suggested in the original stochastic error correction paper [62]. For our case of the anharmonic oscillator, we have a single parameter λ which we set to 0 after shifting the Hamiltonian by λI. The elements ⟨n|H|Ai ⟩ and ⟨Ai |H|n⟩ are particularly (0) (n) simple as n = 1 only. We have only one c1 as well, which is set to c1 = 1 and c1 = 0 for n > 1. 8.2 Natural Cubic Splines In this section, we show the algorithm for numerically computing Natural Cubic Splines. This can also be found in any book on splines [63], and even on Wikipedia. 142 Our input is a set of data C containing n + 1 points. Our output is a set of splines, which is composed of n 5-tuples. Algorithm: 1. Create new array a of size n + 1 and for i = 0, . . . , n set ai = yi . 2. Create new arrays b and d each of size n. 3. Create new array h of size n and for i = 0, . . . , n − 1 set hi = xi+1 − xi . 4. Create new array α of size n and for i = 1, . . . , n − 1 set 3 3 αi = (ai+1 − ai ) − (a − ai−1 ). hi hi−1 i 5. Create new arrays c, l, µ, and z each of size n + 1 . 6. Set l0 = 1, µ0 = z0 = 0 . 7. For i = 1, . . . , n − 1 (a) Set li = 2 (xi+1 − xi−1 ) − hi−1 µi−1 . h (b) Set µi = i . li α − hi−1 zi−1 (c) Set zi = i . li 8. Set ln = 1; zn = cn = 0. 9. For j = n − 1, n − 2, . . . , 0. (a) Set cj = zj − µj cj+1 .  aj+1 − aj hj cj+1 + 2cj (b) Set bj = − . hj 3 cj+1 − cj (c) Set dj = . 3hj 143 10. Create new set Splines and populate it with n splines S 11. For i = 0, . . . , n − 1 (a) Set Si,a = ai . (b) Set Si,b = bi . 
BIBLIOGRAPHY

[1] D. Frame, R. He, I. Ipsen, D. Lee, D. Lee, and E. Rrapaj. Eigenvector continuation with subspace learning. Phys. Rev. Lett., 121(3):032501, 2018.
[2] D. K. Frame. Ab Initio Simulations of Light Nuclear Systems Using Eigenvector Continuation and Auxiliary Field Monte Carlo. PhD thesis, Michigan State University, 2019.
[3] S. König, A. Ekström, K. Hebeler, D. Lee, and A. Schwenk. Eigenvector Continuation as an Efficient and Accurate Emulator for Uncertainty Quantification. 2019.
[4] A. Ekström and G. Hagen. Global sensitivity analysis of bulk properties of an atomic nucleus. Phys. Rev. Lett., 123(25):252501, 2019.
[5] P. Demol, T. Duguet, A. Ekström, M. Frosini, K. Hebeler, S. König, D. Lee, A. Schwenk, V. Somà, and A. Tichai. Improved many-body expansions from eigenvector continuation. Phys. Rev., C101:041302(R), 2020.
[6] A. Ekström, G. R. Jansen, K. A. Wendt, G. Hagen, T. Papenbrock, B. D. Carlsson, C. Forssén, M. Hjorth-Jensen, P. Navrátil, and W. Nazarewicz. Accurate nuclear radii and binding energies from a chiral interaction. Phys. Rev., C91(5):051301, 2015.
[7] S. Elhatisari et al. Nuclear binding near a quantum phase transition. Phys. Rev. Lett., 117(13):132501, 2016.
[8] V. Lapoux, V. Somà, C. Barbieri, H. Hergert, J. D. Holt, and S. R. Stroberg. Radii and Binding Energies in Oxygen Isotopes: A Challenge for Nuclear Forces. Phys. Rev. Lett., 117(5):052501, 2016.
[9] M. Piarulli et al. Light-nuclei spectra from chiral dynamics. Phys. Rev. Lett., 120(5):052503, 2018.
[10] B.-N. Lu, N. Li, S. Elhatisari, D. Lee, E. Epelbaum, and U.-G. Meißner. Essential elements for nuclear binding. Phys. Lett., B797:134863, 2019.
[11] S. Binder et al. Few-nucleon and many-nucleon systems with semilocal coordinate-space regularized chiral nucleon-nucleon forces. Phys. Rev., C98(1):014002, 2018.
[12] V. Somà, P. Navrátil, F. Raimondi, C. Barbieri, and T. Duguet. Novel chiral Hamiltonian and observables in light and medium-mass nuclei. Phys. Rev., C101(1):014318, 2020.
[13] S. Gandolfi, D. Lonardoni, A. Lovato, and M. Piarulli. Atomic nuclei from quantum Monte Carlo calculations with chiral EFT interactions. 2020.
[14] I. Tews, Z. Davoudi, A. Ekström, J. D. Holt, and J. E. Lynn. New Ideas in Constraining Nuclear Forces. 2020.
[15] J. J. Sakurai and J. Napolitano. Modern Quantum Mechanics. Cambridge University Press, 2021.
[16] W. D. Heiss and A. L. Sannino. Avoided level crossing and exceptional points. Journal of Physics A: Mathematical and General, 23(7):1167–1178, Apr 1990.
[17] T. Kato. Perturbation Theory for Linear Operators. Springer, 1995.
[18] E. J. Weniger. Very accurate summation for the infinite coupling limit of the perturbation series expansions of anharmonic oscillators. Phys. Lett. A, 156:169–174, Jun 1991.
[19] E. J. Weniger. The summation of the ordinary and renormalized perturbation series for the ground state energy of the quartic, sextic, and octic anharmonic oscillators using nonlinear sequence transformations. J. Math. Phys., 34:571, 1993.
[20] E. J. Weniger. A convergent renormalized strong coupling perturbation expansion for the ground state energy of the quartic, sextic, and octic anharmonic oscillator. Ann. Phys., 246:133–165, Feb 1996.
[21] G. Bozzolo and A. Plastino. Generalized anharmonic oscillator: A simple variational approach. Phys. Rev. D, 24, Dec 1981.
[22] N. Bessis and G. Bessis. Open perturbation and the Riccati equation: Algebraic determination of the quartic anharmonic oscillator energies and eigenfunctions. J. Math. Phys., 38, Jul 1997.
[23] F. M. Fernández, Q. Ma, and R. H. Tipping. Tight upper and lower bounds for energy eigenvalues of the Schrödinger equation. Phys. Rev. A, 39, Feb 1989.
[24] C. M. Bender and T. T. Wu. Anharmonic oscillator. Phys. Rev., 184:1231, Aug 1969.
[25] C. M. Bender and T. T. Wu. Large-order behavior of perturbation theory. Phys. Rev. Lett., 27:461, Aug 1971.
[26] C. M. Bender and T. T. Wu. Anharmonic oscillator. II. A study of perturbation theory in large order. Phys. Rev. D, 7:1620, Mar 1973.
[27] A. J. Leggett. Modern Trends in the Theory of Condensed Matter. Proceedings of the XVIth Karpacz Winter School of Theoretical Physics, Karpacz, Poland, 1980. Springer-Verlag, Berlin, 1980.
[28] P. Nozieres and S. Schmitt-Rink. Bose condensation in an attractive fermion gas: from weak to strong coupling superconductivity. J. Low Temp. Phys., 59:195–211, 1985.
[29] K. M. O'Hara, S. L. Hemmer, M. E. Gehm, S. R. Granade, and J. E. Thomas. Observation of a strongly interacting degenerate Fermi gas of atoms. Science, 298:2179–2182, 2002.
[30] S. Gupta, Z. Hadzibabic, M. W. Zwierlein, C. A. Stan, K. Dieckmann, C. H. Schunck, E. G. M. van Kempen, B. J. Verhaar, and W. Ketterle. Radio-frequency spectroscopy of ultracold fermions. Science, 300:1723–1726, 2003.
[31] C. A. Regal and D. S. Jin. Measurement of positive and negative scattering lengths in a Fermi gas of atoms. Phys. Rev. Lett., 90:230404, 2003.
[32] M. J. H. Ku, A. T. Sommer, L. W. Cheuk, and M. W. Zwierlein. Revealing the Superfluid Lambda Transition in the Universal Thermodynamics of a Unitary Fermi Gas. Science, 335:563, Feb 2012.
[33] S. Bour, X. Li, D. Lee, U.-G. Meißner, and L. Mitas. Precision benchmark calculations for four particles at unitarity. Phys. Rev., A83:063619, 2011.
[34] M. Kompaniets. Prediction of the higher-order terms based on Borel resummation with conformal mapping. J. Phys. Conf. Ser., 762(1):012075, 2016.
[35] K. Van Houcke, F. Werner, and R. Rossi. High-precision numerical solution of the Fermi polaron problem and large-order behavior of its diagrammatic series. Phys. Rev., 101(4):045134, Jan 2020.
[36] R. J. Furnstahl, A. J. Garcia, P. J. Millican, and X. Zhang. Efficient emulators for scattering using eigenvector continuation. Phys. Lett. B, 809:135719, 2020.
[37] D. Bai and Z. Ren. Generalizing the calculable R-matrix theory and eigenvector continuation to the incoming wave boundary condition. Phys. Rev. C, 103(1):014612, 2021.
[38] S. Wesolowski, I. Svensson, A. Ekström, C. Forssén, R. J. Furnstahl, J. A. Melendez, and D. R. Phillips. Fast & rigorous constraints on chiral three-nucleon forces from few-body observables. 4 2021.
[39] A. Sarkar and D. Lee. Convergence of eigenvector continuation. Phys. Rev. Lett., 126:032501, Jan 2021.
[40] S. Yoshida and N. Shimizu. A new workflow of shell-model calculations with the emulator and preprocessing using eigenvector continuation, and shell-model code ShellModel.jl. 5 2021.
[41] J. A. Melendez, C. Drischler, A. J. Garcia, R. J. Furnstahl, and X. Zhang. Fast & accurate emulation of two-body scattering observables without wave functions. 6 2021.
[42] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová. Machine learning and the physical sciences. Rev. Mod. Phys., 91:045002, Dec 2019.
[43] J. J. Thiagarajan, B. Venkatesh, R. Anirudh, P.-T. Bremer, J. Gaffney, G. Anderson, and B. Spears. Designing accurate emulators for scientific processes using calibration-driven deep models. Nature Communications, 11:5622, Nov 2020.
[44] M. F. Kasim, D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, M. Jarvis, S. Khatiwala, J. Korenaga, J. Topp-Mugglestone, E. Viezzer, and S. M. Vinko. Building high accuracy emulators for scientific simulations with deep neural architecture search. arXiv e-prints, arXiv:2001.08055, Jan 2020.
[45] P. Bedaque et al. Report from the A.I. For Nuclear Physics Workshop. Eur. Phys. J. A, 57(3):100, 2021.
[46] P. Baldi, P. Sadowski, and D. Whiteson. Searching for exotic particles in high-energy physics with deep learning. Nat. Commun., Jul 2014.
[47] A. Esteva, A. Robicquet, B. Ramsundar, V. Kuleshov, M. DePristo, K. Chou, C. Cui, G. Corrado, S. Thrun, and J. Dean. A guide to deep learning in healthcare. Nat. Med., Jan 2019.
[48] B. Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.
[49] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129–145, 1996.
[50] D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994.
[51] D. J. C. MacKay. Information-based objective functions for active data selection. Neural Computation, 4, 1992.
[52] J. H. Ahlberg, E. N. Nilson, and J. L. Walsh. The Theory of Splines and Their Applications. Academic Press, 1967.
[53] E. Bonilla, P. Giuliani, K. Godbey, and D. Lee. Training and Projecting: A Reduced Basis Method Emulator for Many-Body Physics. 3 2022.
[54] J. A. Melendez, C. Drischler, R. J. Furnstahl, A. J. Garcia, and X. Zhang. Model reduction methods for nuclear emulators. 3 2022.
[55] A. Quarteroni, A. Manzoni, and F. Negri. Reduced Basis Methods for Partial Differential Equations. Springer, 2016.
[56] A. Quarteroni, G. Rozza, and A. Manzoni. Certified reduced basis approximation for parametrized partial differential equations and applications. J. Math. Industry, 1, Jun 2011.
[57] S. E. Field, C. R. Galley, F. Herrmann, J. S. Hesthaven, E. Ochsner, and M. Tiglio. Reduced basis catalogs for gravitational wave templates. Phys. Rev. Lett., 106:221102, Jun 2011.
[58] S. Elhatisari, E. Epelbaum, H. Krebs, T. A. Lähde, D. Lee, N. Li, B.-N. Lu, U.-G. Meißner, and G. Rupak. Ab initio Calculations of the Isotopic Dependence of Nuclear Clustering. Phys. Rev. Lett., 119(22):222505, 2017.
[59] M. Freer, H. Horiuchi, Y. Kanada-En'yo, D. Lee, and U.-G. Meißner. Microscopic Clustering in Light Nuclei. Rev. Mod. Phys., 90(3):035004, 2018.
[60] M. Pincus. Letter to the editor: A Monte Carlo method for the approximate solution of certain types of constrained optimization problems. Operations Research, 18(6):1225–1228, 1970.
[61] T. A. Lähde and U.-G. Meißner. Nuclear Lattice Effective Field Theory - An Introduction. Springer, 2019.
[62] D. Lee, N. Salwen, and M. Windoloski. Introduction to stochastic error correction methods. Phys. Lett. B, 502:329–337, 2001.
[63] J. H. Ahlberg, E. N. Nilson, and J. L. Walsh. The Theory of Splines and Their Applications. Academic Press, 1967.