NEW PATHS IN LATTICE GAUGE THEORIES

By Giovanni Pederiva

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Physics — Doctor of Philosophy

2022

ABSTRACT

The theory of Quantum Chromodynamics (QCD) is a crucial element in our current understanding of the laws that govern the universe. It describes the Strong Interaction, the force that governs the behavior of quarks and gluons and confines them at low energies into hadrons, such as protons, neutrons and pions. Lattice QCD (LQCD) is a numerical approach that allows non-perturbative studies of the strong interaction at hadronic energy scales, where other theoretical methods fail. It works by discretizing space-time on a lattice of points, fixing the fermion fields to the lattice sites and representing the gluon field as the links between them. Our current numerical approach to Lattice QCD is based upon Markov Chain Monte Carlo (MCMC) methods, with which statistical ensembles of lattice gauge field configurations are generated using a discrete version of the QCD action when evaluating the path integrals. These calculations require the world's largest supercomputing facilities, and possible improvements and new ideas are actively being researched. In this dissertation we present three such attempts.

First, we propose a new strategy for employing the recent advances in Machine Learning in LQCD. In particular, we present a method that uses Neural Networks to accelerate the calculation of hadron two-point correlation functions and discuss its applicability.

The second effort we present is part of a new collaborative undertaking by the OPEN LATtice Initiative (OPENLAT) to generate new state-of-the-art LQCD ensembles using the recently proposed Stabilized Wilson Fermions (SWF) package, which also includes a modified lattice action. The first results for the light hadron spectrum, obtained by Bayesian model averaging, are presented, indicating the excellent scaling behavior of SWF towards the continuum.

Finally, the tentative new approach of Quantum Computing for Lattice Field Theories is introduced as a possible radical solution to the sign problem. We compare three algorithms for quantum state preparation in the case of the Schwinger model, a toy model for QCD, and discuss their applicability to Noisy Intermediate-Scale Quantum (NISQ) systems.

In loving memory of my mother. You are missed beyond measure.

ACKNOWLEDGEMENTS

This thesis marks the end of my time in Michigan and of an incredibly intense part of my life. So many people have helped me along the way, in the good and the bad moments. First, I want to thank my supervisor Andrea Shindler for the opportunities he has given me, the constant support and the knowledge he shared with me. Working with you all these years has been fantastic. Then, I want to thank the members of my Ph.D. committee, professors A. Bazavov, D. Lee, C. Piermarocchi and J. Singh, for the guidance they have provided me. A special thanks goes to Matt Rizik, Jack Dragos, Jangho Kim, Leon Hostetler and Johannes Weber for all our interesting discussions at MSU. I also want to thank all my other collaborators, especially the members of the OPENLAT initiative, for accepting me on their team. I also want to acknowledge all the people at FRIB, particularly those in the Nuclear Theory Group. A big thank you goes to Morten Hjorth-Jensen, for first giving me the opportunity to come to MSU six years ago and then for always supporting me.
I also want to thank all the people I met here in Lansing, who helped me along the way and made my years in Michigan memorable. In particular, I want to thank Giordano, Samuel, Alessandro, Yannis, Nikos, Pely, Artemis, Jordan, David, Johan, Giovanni, Anne and Alida. I will keep the memories of our time together forever with me. I also must thank all the friends that I left back in Europe, who are there for me whenever I come to visit; you make it worth every time. I want to mention the amazing group of Gianmarco, Emanuele, Sebastiano, Giuliano, Stefano, Andrea and Giovanni. But also, Annalisa and Clarissa. A very special thank you goes to Alessio, for all the fun reunions through the years. There are no words to express the gratitude I feel for Hannah. Thank you for being a true friend and always being there for me; I will miss our time together so much. Thanks also go to the members of my family, for the unconditional support they give me, regardless of the distance; to the most recent ones, Federico and little Jack, for all the fun times spent together; and to my father, who inspired me in my career and keeps supporting me.

And finally, to you two. Life has taken us far away from one another, but no matter where we are, you mean everything to me. Whenever I look at the stars, I think of you, wishing it was a Summer night in Trento. Whatever happens in the future, I know we will be there for each other.

Giovanni Pederiva
East Lansing, October 7, 2022

TABLE OF CONTENTS

LIST OF ABBREVIATIONS

CHAPTER 1: INTRODUCTION TO QUANTUM CHROMODYNAMICS AND LATTICE FIELD THEORIES
1.1 Quantum Chromodynamics in the Continuum
1.2 Quantum Chromodynamics on a Lattice
1.3 Numerical Evaluation of Path Integrals in Lattice QCD
1.4 Modern Lattice QCD
BIBLIOGRAPHY
APPENDIX 1A: TOOLS FOR STATISTICAL ANALYSIS IN LATTICE QCD
APPENDIX 1B: OVERVIEW OF TOPOLOGICAL EFFECTS IN LATTICE QCD
APPENDIX 1C: THE GRADIENT FLOW AND SCALE SETTING

CHAPTER 2: MACHINE LEARNING METHODS FOR HADRON CORRELATORS FROM LATTICE QCD
2.1 Residue analysis of the BiCGSTAB Linear Solver
2.2 Machine Learning Regression for Two-point Functions
2.3 Accelerating Correlator Calculations with ML
2.4 Numerical Results
2.5 Outlook
BIBLIOGRAPHY
APPENDIX 2A: LINEAR ALGEBRA CONVENTIONS AND DEFINITIONS
APPENDIX 2B: THE CONJUGATE GRADIENT AND BICGSTAB ALGORITHMS
APPENDIX 2C: NEURAL NETWORK PARAMETERS IN DETAIL

CHAPTER 3: FIRST RESULTS OF HADRON SPECTROSCOPY WITH STABILIZED WILSON FERMIONS
3.1 The OPEN LATtice initiative
3.2 Stabilized Wilson Fermion Package
3.3 Simulating Dynamical QCD with 2 + 1 Quark Flavors and SWF
3.4 First SWF Ensembles and Light Hadron Spectroscopy
3.5 Outlook for Studies with SWF
BIBLIOGRAPHY
APPENDIX

CHAPTER 4: QUANTUM STATE PREPARATION ALGORITHMS FOR THE SCHWINGER MODEL
4.1 A Quick Tour of Quantum Computing
4.2 The Schwinger Model with a θ-term
4.3 Adiabatic State Preparation
4.4 Quantum Approximate Optimization Algorithm
4.5 The Rodeo Algorithm
4.6 Comparison of Algorithms and Conclusions
BIBLIOGRAPHY
APPENDIX

CHAPTER 5: SUMMARY AND OUTLOOK
LIST OF ABBREVIATIONS

AE  Algorithmic Efficiency
AIC  Akaike Information Criterion
ASP  Adiabatic State Preparation
BiCGSTAB  BiConjugate Gradient Stabilized (Algorithm)
BSM  Beyond Standard Model
CG  Conjugate Gradient (Algorithm)
CP  Charge Parity (Symmetry)
FLAG  Flavour Lattice Averaging Group
FP  Fully Precise
GBF  Gaussian Bayes Factor
GNN  Global Neural Network (Model)
GS  Ground State
HMC  Hybrid Monte Carlo (Algorithm)
HP  High Precision
HPC  High-Performance Computing
ILDG  International Lattice Data Grid
KS  Krylov Subspace
LFT  Lattice Field Theory
LP  Low Precision
LQCD  Lattice Quantum Chromodynamics
MC  Monte Carlo
MCMC  Markov Chain Monte Carlo
MD  Molecular Dynamics
ML  Machine Learning
MLP  Multi-Layer Perceptron
nEDM  Neutron Electric Dipole Moment
NISQ  Noisy Intermediate Scale Quantum
NN  Neural Network
OPENLAT  Open Lattice Initiative
PDF  Probability Distribution Function
PDG  Particle Data Group
pQCD  Perturbative Quantum Chromodynamics
PS  Point-Sink Smeared-Source
QAOA  Quantum Approximate Optimization Algorithm
QC  Quantum Computing
QCD  Quantum Chromodynamics
QED  Quantum Electrodynamics
QFT  Quantum Field Theory
RA  Rodeo Algorithm
ReLU  Rectified Linear Unit
SGD  Stochastic Gradient Descent
SM  Standard Model
SMD  Stochastic Molecular Dynamics (Algorithm)
SPNN  Single Point Neural Network (Model)
SS  Smeared-Sink Smeared-Source
SW  Sheikholeslami-Wohlert
SWF  Stabilized Wilson Fermions
TSM  Truncated Solver Method
WC  Wilson Clover
WCF  Wilson Clover Fermions

CHAPTER 1: INTRODUCTION TO QUANTUM CHROMODYNAMICS AND LATTICE FIELD THEORIES

Quantum Field Theories (QFTs) are the framework of theoretical physics, developed over the 20th century, in which Special Relativity and Quantum Mechanics are combined. Our current best knowledge of the microscopic world, at the subatomic scale, is expressed in terms of QFTs. The general approach is to consider particles as quantized localized excitations of more fundamental objects, the quantum fields. The laws governing the behavior of these fields are described using the Lagrangian formalism. The Standard Model (SM) of Particle Physics is the QFT that describes the interactions of the currently known fundamental particles, quarks and leptons, under three fundamental forces (electromagnetism, the weak force and the strong force) and the Higgs field. A summary of the particles included in the SM is found in Figure 1.1.

Figure 1.1: Summary of the particles in the Standard Model [1].

From a more technical point of view, the SM is a field theory based on three distinct local gauge symmetry groups, each associated with a fundamental force:

\mathrm{SU}(3)_C \times \mathrm{SU}(2)_L \times \mathrm{U}(1)_Y .    (1.1)
Through the Higgs mechanism, the SU(2)_L × U(1)_Y symmetries are broken into the SU(2)_W group associated with the weak force and the U(1)_Q group associated with the electromagnetic force. However, the piece of the SM that we are interested in for this thesis is the SU(3) group, the one associated with the strong force.

In this introductory chapter, we will discuss the general properties of the theory of Quantum Chromodynamics (QCD) and its prevalent tool for numerical simulations: Lattice Quantum Chromodynamics (LQCD). Due to the non-linear nature of the strong interaction at low energies, an effect of its coupling constant αS being large in that regime, the perturbative approaches that are common for the other forces, or for the high-energy limit of the strong force, cannot be applied. Lattice QCD is instead a non-perturbative approach that has been widely adopted to study QCD effects at the hadronic scale. It is based on formulating the theory on a finite lattice of points in spacetime with spacing usually denoted a, with the fermions residing on the lattice sites while the gluons, the mediators of the strong force, are discretized as the links connecting neighboring sites. The discretization naturally introduces a regulator for the theory; in fact, there is a momentum cut-off equal to π/a. A simplified depiction is found in Figure 1.2.

Over the years, since its first formulation by Kenneth Wilson in 1974 [2], many successful numerical calculations have been performed, with results covering various research areas such as hadron spectra [3], quark-gluon plasma physics [4], parton distribution functions [5], physics of light nuclei [6] or hadronic matrix elements that can be used for searches of Beyond Standard Model (BSM) physics, for example: CP-violating operators for searches of new physics and the neutron electric dipole moment (nEDM) [7]; g − 2, the muon magnetic dipole moment [8]; nucleon double-β decay [9]. In order to obtain results from LQCD, Monte Carlo (MC) methods are employed to generate statistical ensembles of lattice gauge field configurations from which observables are computed.

Figure 1.2: Simplified representation of the discretization of QCD on a lattice.

This procedure, however, requires a considerable amount of computational resources; in fact, LQCD calculations routinely run on the world's largest supercomputers. Therefore, there is a strong motivation to research possible numerical and algorithmic improvements for LQCD simulations. In this work, we will study some possible paths for improving the state of current lattice QCD calculations.

This thesis is structured as follows: in the rest of this chapter, we will lay the theoretical foundations of QCD and its discretization on the lattice, highlighting the numerical challenges that arise. Then in Chapter 2 we will propose a new method to accelerate the calculation of hadronic two-point functions using Machine Learning (ML). In Chapter 3 a new collaborative effort to generate state-of-the-art lattice gauge field configurations using a set of novel improvements will be introduced. These new ensembles, based on the so-called Stabilized Wilson Fermions (SWF), utilize a new lattice discretization called the Exponential Clover Action. The results of the calculation of the light hadron spectrum on these ensembles will be shown as a first demonstration of their scaling properties.
Finally, in Chapter 4 we will present some work related to the broader ongoing effort to formulate Lattice Field Theories (LFTs), and lattice QCD in particular, on Quantum Computers. It is known that QC can circumvent the sign problem affecting QCD's MC simulations by dealing with Hamiltonian dynamics directly. However, a formulation of LQCD that is suitable for quantum hardware is still unknown, and work is being done on simpler models. In particular, we will explore and compare three algorithms for quantum state preparation in the case of the lattice Schwinger Model, i.e., quantum electrodynamics in one spatial dimension.

1.1 Quantum Chromodynamics in the Continuum

The theory of Quantum Chromodynamics (QCD) describes the Strong Interaction, also sometimes called the Strong Nuclear Force. It is the theory that governs the motion of quarks and gluons, the subatomic particles that make up the hadrons, like protons and neutrons. QCD is contained in the Standard Model and is arguably its most complicated part. The reason for this is the phenomenon of confinement. In this section, we will briefly present some of the basic principles and concepts of the theory of QCD.

The origins of QCD lie in the puzzle stemming from the large number of new particles, called hadrons, discovered with bubble chambers starting from the 1950s. Since it was thought that not all of them could be fundamental particles, a model for some constituents (that were later identified with quarks) was needed. First, the eightfold way was proposed by M. Gell-Mann [10] and Y. Ne'eman [11]. Then, after the proposal of a new quantum number for the quarks, called color charge, by O. W. Greenberg [12] and M.Y. Han and Y. Nambu [13], in 1973 H. Fritzsch, M. Gell-Mann and H. Leutwyler [14] formalized the full QCD Lagrangian.

Quantum Chromodynamics is a non-Abelian gauge theory based upon the SU(3) symmetry group. This means that all fermions in the theory come as three-component vectors and have a quantum number associated with the SU(3), commonly referred to as "color charge." As opposed to electromagnetism, where the two charges are identified with positive and negative, in QCD the three resulting charges are identified with the three colors red (r), green (g) and blue (b).

The theory of QCD has been thoroughly tested experimentally; first, deep inelastic scattering experiments showed evidence of particles inside protons and neutrons, then in e+e− colliders the first direct evidence of gluons was found. In modern high-energy physics experiments, QCD is a fundamental component in understanding the phenomenology. For example, in Figure 1.3 we see the perturbative QCD (pQCD) ratio of the e+e− → hadrons cross section to the e+e− → µ+µ− one, in which the steps, for example at 4 GeV, can be understood by considering how many active quark flavors are allowed at a given energy (the mass of the bottom quark is ≈ 4 GeV) and contribute to the background cross section.

Figure 1.3: Ratio of the experimental cross sections of e+e− → hadrons and e+e− → µ+µ− as collected by the Particle Data Group [15]. The naive quark model is the dashed green line and perturbative QCD (pQCD) results are also shown as a solid red line. pQCD appears to agree well with experiment in non-resonant regions of the energy spectrum.

1.1.1 The QCD Lagrangian

The main equation that characterizes a QFT is its Lagrangian density.
From it, all the dynamics of the theory can be derived, that is, the equations of motion of the free particles and their interactions. The Lagrangian of QCD is the so-called Yang-Mills Lagrangian with SU(3) as the gauge group (the notation in this section is taken from [16]; refer to that source for a more complete and formal treatment of the matter). For N_f flavors of quarks it reads:

\mathcal{L}_{\mathrm{QCD}} = \sum_{f=1}^{N_f} \bar{\psi}_f \left( i \gamma^\mu D_\mu - m_f \right) \psi_f - \frac{1}{4} \left( G^a_{\mu\nu} \right)^2 .    (1.2)

We will briefly describe what all the symbols in this equation mean, starting with G^a_{µν}, the Gluon Field Strength Tensor. This is defined as

G^a_{\mu\nu} = \frac{i}{g_0} \left[ D_\mu , D_\nu \right]^a = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu + g_0 f^{abc} A^b_\mu A^c_\nu ,    (1.3)

where µ and ν are Lorentz indices while a is the index that runs over the generators of the SU(3) symmetry gauge group. Here A^a_µ represents the gluon field, which carries a Lorentz index and a group generator index a; it is a spin-1 boson field. Going further, g_0 represents the "bare coupling" of the strong force. The f^{abc} are the structure constants of the group SU(3); they satisfy the conditions

\left[ t^a , t^b \right] = i f^{abc} t^c ,    (1.4)

where the t^a are the generators of the associated Lie algebra su(3). The covariant derivative D_µ is given by:

D_\mu = \partial_\mu - i g_0 t^a A^a_\mu .    (1.5)

Quarks are the fermions that enter the Lagrangian as the complex-valued Grassmann fields ψ_f and ψ̄_f with mass m_f. They are formally spin-1/2 Dirac spinors that carry color charge. They come in 6 flavors in the SM, denoted here with the index f, and these are u (up), d (down), s (strange), c (charm), b (bottom) and t (top).

Gauge invariance is a crucial concept in QFT and it plays a fundamental role in the process of discretization of QCD that we will see in Section 1.2. We can start understanding it by considering the quark field, which is, in fact, a triplet of fermion fields:

\psi(x) = \begin{pmatrix} \psi_r(x) \\ \psi_b(x) \\ \psi_g(x) \end{pmatrix} .    (1.6)

A local gauge transformation applied to ψ(x) can be defined as:

\psi(x) \to \psi'(x) = \Omega(x)\, \psi(x) = \exp\left[ i \alpha^a(x)\, t^a \right] \psi(x), \quad \text{with } \Omega(x) \in \mathrm{SU}(3).    (1.7)

The gluon field is introduced in the Lagrangian in such a way that the covariant derivative term is invariant under the same gauge transformation, so:

D_\mu \psi(x) \to \left( D_\mu \psi(x) \right)' = \left( \partial_\mu - i g_0 A'_\mu(x) \right) \Omega(x)\, \psi(x) = \Omega(x) \left( D_\mu \psi(x) \right),    (1.8)

from which the transformation for A_µ(x) is fixed to:

A_\mu(x) \to A'_\mu(x) = \Omega(x) \left( A_\mu(x) + \frac{i}{g_0}\, \partial_\mu \right) \Omega^\dagger(x).    (1.9)

With this procedure, the whole Lagrangian of QCD is invariant under local gauge symmetry. From the Lagrangian, we can infer some qualitative features of QCD, for example, the self-interaction of gluons, through the three- and four-gluon vertices that come from the G^a_{µν} G^{µν}_a term. We can also notice that QCD is flavor diagonal, meaning that the interactions do not change the flavor of the quarks.

1.1.2 General Properties of QCD

The study of Quantum Chromodynamics and its properties is a very active field of research. Some key features of the theory are shared with other non-Abelian field theories, for example, the self-interaction of the gauge bosons. Other prominent features instead arise when trying to renormalize the theory, in particular when the running of the coupling is considered. At low energies, the theory shows signs of confinement, while at high energies, it is asymptotically free.

Running Coupling. When renormalizing QCD, one has to consider the Renormalization Group Equation (RGE) for the bare parameters of the theory, i.e. the coupling g_bare and the quark masses m_f,bare.
These equations determine the rate at which the renormalized coupling g and the masses vary as a function of some renormalization scale µ. The resulting differential equation in the case of the coupling is called the beta function β(g):

\beta(g) = \frac{d\, g(\mu)}{d \log(\mu)} .    (1.10)

The result is the same for any non-Abelian theory SU(N). One can expand β(g) in powers of the coupling as:

\beta(g) = b_0\, g^3 + b_1\, g^5 + b_2\, g^7 + \dots    (1.11)

and then integrate up to an arbitrary order. This results in a specific expression for the dependence of the coupling on the renormalization scale. The different coefficients in Equation (1.11) are obtained by calculating the contributions of higher-order diagrams and are, in general, renormalization-scheme dependent, starting from b_0, which can be derived from the 1-loop corrections:

b_0 = -\frac{1}{(4\pi)^2} \left( \frac{11}{3} N - \frac{2}{3} N_f \right).    (1.12)

Here N is the size of the symmetry group SU(N) and N_f is the number of fermion species in the theory. By inspecting the sign of Equation (1.12), one finds that any non-Abelian theory where the number of fermions is less than (11/2)N (in the actual case of SU(3) this is N_f ≤ 16) has vanishing coupling in the high-energy limit, so we say it is "asymptotically free."

If one performs the integration of Equation (1.10) analytically at fixed order in g, by substituting Equation (1.12) into Equation (1.10) and integrating by parts, one obtains the following (expressed here in terms of α_S = g²/4π, the analog of the famous fine-structure constant α_EM of electromagnetism):

\alpha_S(\mu) = \frac{\alpha_S^0}{1 + \frac{b_0\, \alpha_S^0}{4\pi} \log\left( \mu^2 / M^2 \right)} = \frac{4\pi}{b_0 \log\left( \mu^2 / \Lambda^2_{\mathrm{QCD}} \right)} ,    (1.13)

where α_S^0 is the value of the coupling at some fixed energy M and stems from the integration. A typical choice for M is the mass of the Z boson, because we can obtain very precise experimental data about it. It is also a suitable choice as it is a sufficiently large scale where perturbation theory for QCD can be safely applied. The constant Λ_QCD represents the typical energy at which the interactions become non-perturbative: as µ → Λ_QCD the coupling diverges. It can be computed from Lattice QCD [17] and estimated experimentally; in the MS-bar scheme, experiments suggest that its value is ≈ 200–300 MeV.

Asymptotic Freedom. From Equation (1.13) it is clear that the coupling α_S(µ) decreases at high energies. This implies that as the energy of quarks increases, they are less coupled to the gluon field and the QCD contributions to interaction processes become small. One of the main arguments in support of the search for Grand Unification Theories (GUT) [15] is the fact that at some energy scale, the coupling of the strong interaction becomes comparable to that of the electroweak interaction.

Figure 1.4: The strong coupling as a function of the energy scale. The data points are experimental results; the solid black line and yellow bands represent the QCD prediction using the reported value of the coupling at the mass of the Z boson (α_S^0 in our notation). Image taken from [18].

At the time when the theory of QCD had just been proposed, the discovery of asymptotic freedom, made by Gross, Wilczek and Politzer [19, 20] (who also won the Nobel Prize in 2004 for this study), was a stepping stone in affirming that QCD was indeed the correct theory for strong interactions. Asymptotic freedom permits perturbative expansions in powers of the coupling for sufficiently large energies. It is crucial, for example, at the scales of particle colliders to describe events before quarks and gluons hadronize.
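As a concrete illustration of Equation (1.13), the short sketch below evaluates the one-loop running coupling numerically. It uses the common convention β₀ = 11 − 2N_f/3 (positive for QCD, equal to −(4π)² b₀ in our notation) with the reference value α_S(M_Z) ≈ 0.118; the function name and default parameters are ours, chosen purely for illustration.

```python
import numpy as np

def alpha_s_one_loop(mu, alpha_ref=0.118, mu_ref=91.19, n_f=5):
    """One-loop running coupling in the spirit of Equation (1.13).

    alpha_ref is the coupling at the reference scale mu_ref (in GeV);
    the defaults are roughly alpha_S(M_Z) and the Z boson mass.
    """
    beta0 = 11.0 - 2.0 * n_f / 3.0  # positive for QCD, since n_f < 33/2
    return alpha_ref / (1.0 + alpha_ref * beta0 / (4.0 * np.pi)
                        * np.log(mu**2 / mu_ref**2))

# asymptotic freedom: the coupling shrinks logarithmically at high energy
for mu in [2.0, 10.0, 91.19, 1000.0]:
    print(f"alpha_s({mu:7.2f} GeV) = {alpha_s_one_loop(mu):.4f}")
```

Running it also shows the other side of Equation (1.13): pushing µ down towards 1 GeV inflates the coupling, the numerical echo of the divergence at Λ_QCD.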
Confinement. The running of the coupling towards low energies, as seen in Equation (1.13), implies that α_S ∼ 1 for energies on the order of Λ_QCD. In this regime, known as confinement, quarks and gluons cannot appear as free particles and instead form color-neutral bound states. These color singlets are predominantly of two types: mesons (quark/anti-quark pairs) and baryons (triplets of quarks), but more exotic states, such as tetraquarks, pentaquarks and glueballs, are all in principle allowed by QCD and are being studied both theoretically and experimentally.

Quarks are said to be confined because at low energies they can only be found in bound states. A single quark would, within a very small time scale, hadronize: it spawns new quark/anti-quark pairs from the vacuum to balance the color charge and quickly re-form colorless bound states. A typical example is to consider two quarks inside a meson and try to pull them apart. Since the coupling increases with distance, more and more energy would be stored in the vacuum between them as they are pulled apart, eventually spawning a new quark/anti-quark pair. This is pictorially shown in Figure 1.5.

The behavior of the strong force at low energies also explains the large mass difference between quarks and hadrons. In these bound states, most of the mass is, in fact, energy coming from the dynamics of the gluon field, not from the constituent quarks. For example, the proton mass is 938 MeV, while its valence quarks only account for 10 MeV.

Figure 1.5: Pictorial representation of confinement. As two color sources are pulled apart, the energy stored in the color field between the sources increases so much that a q q̄ pair is formed from the vacuum energy.

Confinement is what fundamentally distinguishes the strong interaction from the other two forces of the SM. It also prevents perturbative methods from being applied in the study of hadrons. This is the reason why non-perturbative methods, such as Lattice QCD, are required to fully understand hadronic processes.

1.2 Quantum Chromodynamics on a Lattice

The idea of LQCD was introduced by K.G. Wilson in 1974 [2] as a method to study confinement. The core idea is to directly evaluate path integrals of the QCD action, in Euclidean space, by discretizing spacetime on a lattice. Over time, LQCD has been successfully used to perform calculations and is now considered the most systematic approach for non-perturbative studies. It should be noted that LFTs are not only used for studies of QCD, but also for other QFTs. In this section, we will briefly describe how the discretization procedure works, the fundamental tools that are needed and some general results that show how successful this approach can be. In Section 1.4, we will instead look at some of the numerical details and challenges that are open in the field.

1.2.1 Discretization of the QCD Lagrangian

Observables in Lattice QCD are computed using Feynman's path integrals, not in Minkowski space, but rather in Euclidean space by performing a Wick rotation.
This is done to remove the oscillatory nature of the exponential of the action and instead get a real "Boltzmann-like factor." In practice, for the expectation value of an observable O, one needs to evaluate (the formal treatment of Lattice QCD and the notation are primarily based upon [21]):

\langle O \rangle = \frac{1}{Z} \int \mathcal{D}[\psi]\, \mathcal{D}[\bar\psi]\, \mathcal{D}[A]\; O\, e^{-S_{\mathrm{QCD}}} ,    (1.14)

where Z is the QCD partition function, Z = \int \mathcal{D}[\psi]\, \mathcal{D}[\bar\psi]\, \mathcal{D}[A]\, e^{-S_{\mathrm{QCD}}}, and S_QCD is the Euclidean version of the QCD action, or more simply S_{\mathrm{QCD}} = \int d^4x\, \mathcal{L}_{\mathrm{QCD}}; note that here also the Lagrangian is Wick-rotated to Euclidean space.

Evaluating path integrals exactly is an impossible task, as they include a sum over all possible variations of a classical path. However, one can fix the number of degrees of freedom by introducing a finite lattice of points at which to evaluate the integral, effectively reducing the path integral to a product of integrals over the field's values at the lattice sites. Since we are dealing with four-dimensional Euclidean space, the most straightforward choice is to fix a hyper-cubic lattice Λ:

\Lambda = \{ n = (n_x, n_y, n_z, n_t) \} \quad \text{where } n_x, n_y, n_z = 1, 2, \dots, N; \;\; n_t = 1, 2, \dots, N_T .    (1.15)

Here N is the number of points in the spatial dimensions and N_T is the number of points in the temporal one (we will see later why these two are set differently, typically with N_T > N); hence the volume is set as N³ × N_T. A crucial parameter in simulations is the lattice spacing, denoted simply as a.

The discretization procedure has two main benefits: it makes numerical evaluations of path integrals possible and introduces a momentum cutoff as a regulator for the theory. To connect the lattice results to the physical world, one has to perform the continuum limit a → 0 and the infinite-volume limit aN, aN_T → ∞. In order to choose proper values for the volume and the lattice spacing, one needs to consider both a fine enough value of a, such that it resolves the hadronic scale (so a⁻¹ ≫ Λ_QCD), and a total volume larger than the typical scale of the lightest particles active in the interactions; for hadronic processes one needs to consider, for example, the Compton wavelength of the pion, m_π⁻¹.

The Wilson Gauge Action. To start the formal discretization procedure of QCD on the lattice, we begin with the fermionic part of the action. Since fermions can only be evaluated at fixed sites, we must use finite differences to perform derivatives; in particular we consider ∂_µ ψ(x) → [ψ(n+µ̂) − ψ(n−µ̂)]/2a. With this, the fermion action becomes:

S_F[\psi, \bar\psi] = \int d^4x\, \bar\psi(x) \left( \gamma_\mu \partial^\mu + m \right) \psi(x) \;\to\; a^4 \sum_{n \in \Lambda} \bar\psi(n) \left[ \sum_{\mu=1}^{4} \gamma_\mu \frac{\psi(n+\hat\mu) - \psi(n-\hat\mu)}{2a} + m\, \psi(n) \right] .    (1.16)

In order to preserve local gauge invariance, as in the continuum case, an extra degree of freedom must be introduced. In fact, it is easy to see that under a local gauge transformation Ω(n) the hopping term transforms as

\bar\psi(n)\, \psi(n \pm \hat\mu) \to \bar\psi(n)\, \Omega^\dagger(n)\, \Omega(n \pm \hat\mu)\, \psi(n \pm \hat\mu),    (1.17)

which is not gauge-invariant. The problem can be solved by introducing a field U(n) that cancels the term Ω†(n) Ω(n ± µ̂). This is what we identify as the lattice version of the gauge fields; by convention it is written as U(n, n + µ̂) = U_µ(n) and transforms as:

U_\mu(n) \to U'_\mu(n) = \Omega(n)\, U_\mu(n)\, \Omega^\dagger(n + \hat\mu)    (1.18)
U_{-\mu}(n) \to U'_{-\mu}(n) = \Omega(n)\, U_{-\mu}(n)\, \Omega^\dagger(n - \hat\mu).

The field U_µ(n) is a collection of "links" from one lattice site to the neighboring ones and has the fundamental property that it is orientable: U(n, n − µ̂) ≡ U_{−µ}(n) = U_µ(n − µ̂)†. Its values are elements of SU(3).
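To make the nature of the link field concrete, the short sketch below builds a naively stored set of random SU(3) link matrices for a tiny lattice and checks the group properties. This is an illustration only, not how production LQCD codes organize gauge fields; all names and the storage layout are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_su3():
    """Random SU(3) matrix: QR-decompose a complex Gaussian matrix to get
    a unitary one, then remove the overall phase so the determinant is 1."""
    z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    q, r = np.linalg.qr(z)
    q = q * (np.diag(r) / np.abs(np.diag(r)))   # fix the phase ambiguity of QR
    return q / np.linalg.det(q) ** (1.0 / 3.0)  # project U(3) onto SU(3)

# one link U_mu(n) per direction mu and site n of a small 4^4 lattice
L = 4
U = np.array([random_su3() for _ in range(4 * L**4)])
U = U.reshape(4, L, L, L, L, 3, 3)              # index layout chosen arbitrarily

u = U[0, 0, 0, 0, 0]
print(np.allclose(u @ u.conj().T, np.eye(3)))   # unitary: True
print(np.isclose(np.linalg.det(u), 1.0))        # determinant 1: True
```

Note that the orientability property U_{−µ}(n) = U_µ(n − µ̂)† means backward links never need separate storage: they are obtained by daggering the forward ones.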
A pictorial representation is given in Figure 1.6.

Figure 1.6: Schematic representation of the link variables U_µ(n) and U_{−µ}(n) on the lattice.

With this new insight, the gauge-invariant lattice fermion action is:

S_F[\psi, \bar\psi] = a^4 \sum_{n \in \Lambda} \bar\psi(n) \left[ \sum_{\mu=1}^{4} \gamma_\mu \frac{U_\mu(n)\, \psi(n+\hat\mu) - U_{-\mu}(n)\, \psi(n-\hat\mu)}{2a} + m\, \psi(n) \right] .    (1.19)

The link field U_µ is related to the path-ordered product (P) of the gauge field A_µ(x) along some curve C:

G(x, y) = \mathcal{P} \exp\left( i g_0 \int_{\mathcal{C}} A_\mu(x')\, dx'^\mu \right) .    (1.20)

This quantity is the so-called "gauge transporter" [16] in the continuum. It has the property of connecting two points in spacetime in a gauge-covariant manner; hence the analogy with the lattice link variables is clear. From Equation (1.20) it is easy to see that for small lattice spacing a one can expand:

G(n, n+\hat\mu) \equiv U_\mu(n) = \exp\left( i a A_\mu(n) \right) = 1 + i a A_\mu(n) + O(a^2)    (1.21)
G(n, n-\hat\mu) \equiv U_{-\mu}(n) = \exp\left( -i a A_\mu(n-\hat\mu) \right) = 1 - i a A_\mu(n-\hat\mu) + O(a^2).

The gauge transporter is naturally gauge-invariant when evaluated on closed loops, and this property also applies to the discretized version of the theory. In particular, one can use the minimal loop, the so-called "Plaquette" (see Figure 1.7), as a building block for a discretized version of the gluonic part of the QCD action. One can show, using Equation (1.21), that:

P_{\mu\nu}(n) = U_\mu(n)\, U_\nu(n+\hat\mu)\, U_{-\mu}(n+\hat\mu+\hat\nu)\, U_{-\nu}(n+\hat\nu) = U_\mu(n)\, U_\nu(n+\hat\mu)\, U^\dagger_\mu(n+\hat\nu)\, U^\dagger_\nu(n) \approx \exp\left[ i a^2 G_{\mu\nu}(n) + O(a^3) \right] .    (1.22)

Figure 1.7: The Plaquette, as defined on the lattice, is the oriented product of the link variables of a minimal square in the plane µν.

It is easy then to use the result of Equation (1.22) to construct the gauge part of the action:

S_G = \frac{1}{4g^2} \int d^4x\, G_{\mu\nu}(x)^2 \;\to\; \frac{a^4}{2g^2} \sum_{n \in \Lambda} \sum_{\mu\nu} \mathrm{Tr}\left( G_{\mu\nu}(n)^2 \right) = \frac{\beta}{3} \sum_{n \in \Lambda} \sum_{\mu < \nu} \mathrm{Re} \left[ \mathrm{Tr} \left( \mathbb{1} - P_{\mu\nu}(n) \right) \right] .    (1.23)

Equation (1.23) is the so-called Wilson Gauge Action and is the simplest possible discretization of the Yang-Mills action on a lattice. The parameter β = 6/g² is the "inverse coupling" and is what is used in practice for simulations.

1.2.2 Lattice Fermions

The discretization of the fermion action given by Equation (1.19) is incomplete, as it contains unphysical modes. To better understand this, one needs to look at the Fourier transform of the propagator. Introducing the lattice Dirac matrix D_xy[U] between two lattice sites x and y:

S_F[\psi, \bar\psi] = a^4 \sum_{x, y \in \Lambda} \bar\psi(x)\, D_{xy}[U]\, \psi(y),    (1.24)

where

D_{xy}[U] = \sum_{\mu=1}^{4} \gamma_\mu \frac{U_\mu(x)\, \delta_{x, y-\hat\mu} - U_{-\mu}(x)\, \delta_{x, y+\hat\mu}}{2a} + m\, \delta_{x,y} .    (1.25)

Its Fourier transform is:

\tilde D_{pq} = \delta(p - q)\, \tilde D(p) \quad \text{where} \quad \tilde D(p) = m\, \mathbb{1} + \frac{i}{a} \sum_{\mu=1}^{4} \gamma_\mu \sin(p_\mu a).    (1.26)

The inverse of D̃(p) is related to the Dirac propagator and results in:

\tilde D(p)^{-1} = \frac{ m\, \mathbb{1} - i a^{-1} \sum_\mu \gamma_\mu \sin(p_\mu a) }{ m^2 + a^{-2} \sum_\mu \sin^2(p_\mu a) } .    (1.27)

Going back to position space, via an inverse Fourier transform, we obtain an expression for the quark propagator:

D^{-1}(x; y) = \frac{1}{|\Lambda|} \sum_{p \in \tilde\Lambda} \tilde D(p)^{-1}\, e^{i p \cdot (x-y) a} .    (1.28)

Equation (1.27) has the expected pole in the limit of m → 0, but also 15 other unphysical poles residing at the corners of the first Brillouin Zone (any value of p_µ where all of its components are either 0 or π/a). This is referred to as the "fermion doublers" problem, and a solution was already proposed by Wilson in his first paper on discretizing QCD on a lattice [2]. First, it should be noted that the problem is not present in the continuum, when a = 0, so this is purely an effect of the discretization procedure.
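The counting of doubler modes is easy to verify numerically. The sketch below scans all lattice momenta of an 8⁴ lattice (setting a = 1 and m = 0) and counts where the naive operator of Equation (1.26) vanishes; the 2⁴ = 16 zeros are the pole of the physical mode plus the 15 doublers. A toy check under those assumptions, nothing more.

```python
import numpy as np

# Zeros of the naive massless momentum-space Dirac operator, Eq. (1.26),
# on an L^4 lattice with a = 1. Since the gamma matrices anticommute,
# |D(p)|^2 is proportional to sum_mu sin^2(p_mu), which vanishes only
# when every component p_mu is 0 or pi: the corners of the Brillouin zone.
L = 8
p = 2.0 * np.pi * np.arange(L) / L                      # allowed momenta
P = np.stack(np.meshgrid(p, p, p, p, indexing="ij"), axis=-1)

d2 = np.sum(np.sin(P) ** 2, axis=-1)
print("zero modes:", np.count_nonzero(np.isclose(d2, 0.0)))  # prints 16
```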
The solution is then to add additional terms to the fermion action in such a way that they cancel the doublers without changing the theory in the continuum. The simplest such term is the Wilson term, which modifies D̃(p) as:

\tilde D^W(p) = m\, \mathbb{1} + \frac{i}{a} \sum_{\mu=1}^{4} \gamma_\mu \sin(p_\mu a) + \frac{1}{a}\, \mathbb{1} \sum_{\mu=1}^{4} \left( 1 - \cos(p_\mu a) \right).    (1.29)

This leaves the physical pole at m = 0 intact, but it is clear that the additional cosine piece vanishes in the continuum. In coordinate space, this becomes:

D^W_{xy}[U] = -\frac{1}{2a} \sum_{\mu=\pm 1}^{\pm 4} \left( \mathbb{1} - \gamma_\mu \right) U_\mu(x)\, \delta_{x, y-\hat\mu} + \left( m + \frac{4}{a} \right) \delta_{x,y} ,    (1.30)

with the handy notation of γ_{−µ} = −γ_µ. With this definition, we can then write the final form of the Wilson fermion action:

S^W[\psi, \bar\psi] = a^4 \sum_{x, y \in \Lambda} \bar\psi(x)\, D^W_{xy}[U]\, \psi(y).    (1.31)

The Wilson action, by construction, is correct up to terms of O(a). One can, in principle, design other terms that systematically improve this discretization by adding higher-dimensional terms that cancel exactly the O(a) errors. This is known as the Symanzik improvement program [22], and for the Wilson action the first-order correcting term is the so-called "Clover Term," computed first by B. Sheikholeslami and R. Wohlert [23]:

S^{SW} = S^W - c_{SW}\, \frac{a^5}{2} \sum_{n \in \Lambda} \bar\psi(n)\, \sigma_{\mu\nu} G_{\mu\nu}(n)\, \psi(n).    (1.32)

The Clover Term is called this way because of the way the gluon field tensor is discretized, which resembles a clover leaf, see Figure 1.8. More precisely, the O(a²)-improved definition of G_µν on the lattice is:

G_{\mu\nu} = \frac{1}{8} \left[ Q_{\mu\nu} - Q_{\nu\mu} \right]    (1.33)

with

Q_{\mu\nu} = P_{\mu\nu}(n) + P_{\nu,-\mu}(n) + P_{-\nu,\mu}(n) + P_{-\mu,-\nu}(n).    (1.34)

Figure 1.8: Schematic representation of the symmetric definition of the clover term Q_µν(n) on the lattice. The field strength tensor for a plane µν at a lattice site is given by the average of all four plaquettes that start from the site n, all taken with the same orientation.

In Chapter 3 we will see a modified version of the clover term, part of the Stabilized Wilson Fermions (SWF) package.

1.2.3 Different Types of Lattice Actions

The Wilson-Clover action seen in Equation (1.32) is only one of the possible choices for a lattice fermion action. As discussed in the previous section, one is free to add any term to the Dirac matrix as long as it vanishes in the continuum. Over the years, there has been active research in designing different actions, each with its own benefits and downsides. Here we will briefly mention the most important classes.

• Wilson-Clover Fermions: these are the ones we already discussed. This is the most straightforward way to remove the fermion doublers problem; however, it explicitly breaks chiral symmetry. Still, clover fermions have been extensively used in LQCD calculations over the years. Some of the field's most remarkable results, such as the light hadron spectrum, which is in complete agreement with experimental data, have been obtained with this discretization, see Figure 1.9.

• Ginsparg-Wilson Fermions: this discretization is obtained by imposing the requirement of preserving chiral symmetry exactly on the lattice through the Ginsparg-Wilson equation [24]. However, this is done at the expense of the locality of the action. In practice, the two main types of actions of this class, domain-wall [25–28] and overlap fermions [29, 30], have a much higher numerical cost compared to other discretizations, but are still extensively used due to their properties.

• Staggered Fermions: these don't remove the fermion doublers explicitly, but instead distribute their values on multiple lattice sites, leading to 4 "tastes" (an unphysical analog of flavor) that have to be removed afterwards. They were first introduced by J. Kogut and L. Susskind [31]. Staggered fermions are widely used because of their relatively cheap computational cost and practicality, especially in finite-temperature QCD [32]. In Chapter 4 we will apply this discretization to the Schwinger Model.

• Twisted-Mass Fermions: these are very similar to Wilson fermions, but have an additional
• Staggered Fermions: these don’t remove the fermion doublers explicitly, but instead dis- tribute their values on multiple lattice sites, leading to 4 “tastes” (an unphysical analogous of flavor) that have to be removed afterwards. Introduced first by J. Kogut and L. Susskind [31]. Staggered fermions are widely used because of their relatively cheap computational cost and practicality, especially in finite-temperature QCD [32]. In Chapter 4 we will apply this discretization to the Schwinger Model. • Twisted-Mass Fermions: these are very similar to Wilson fermions, but have an additional 17 rotation in flavor space (the twist) that partially preserves chiral symmetry [33–35]. However, this procedure breaks isospin symmetry instead. Figure 1.9: Light Hadron Masses as calculated from the Budapest-Marseille-Wuppertal (BMW) collaboration [3] using Wilson-Clover fermions. The gray lines and bands represent the experi- mental data, the blue circles are the input used to fix the scale, while the LQCD calculation results are the red points. 1.3 Numerical Evaluation of Path Integrals on the Lattice QCD When performing numerical calculations of QCD on a lattice, as shown in the previous section, for a given observable O, one numerically has to evaluate the path integral: DψDψ̄DU O e−SE [ψ,ψ̄,U ] R ⟨O⟩ = R . (1.35) DψDψ̄DU e−SE [ψ,ψ̄,U ] This is done by first selecting a discretized lattice action: SE [ψ, ψ̄, U ] = SG [U ] + SF [ψ, ψ̄, U ] = SG [U ] + ψ̄M [U ]ψ. (1.36) In the path integral, at every point, there are three fields to consider, U , ψ, ψ̄. However, the fermion fields can be integrated exactly using the properties of Grassmann numbers, in particular, 18 for any given matrix K one has: Z ! ! YZ YZ −Θ∗ KΘ ∗ ∗ P ∗ DΘ DΘe = dθi dθi e−θi Kij θj = ∗ dθi∗ dθi e− i θi ki θi (1.37) i i Y = ki = det K. i This simplifies the form of the denominator of Equation (1.35), the partition function, which becomes: Z Z −S[ψ,ψ̄,U ] Z= DψDψ̄DU e = DU e−SG [U ] det M [U ]. (1.38) For the numerator, if O does not depend on the fermion fields, we can apply the same result, but if it depends on a fermion bilinear, a more general form of Equation (1.37) is needed: Z ! ! ∗ YZ ∗ YZ P ∗ DΘ∗ DΘθa∗ θb e−Θ KΘ = dθi∗ dθi θa∗ θb e−θi Kij θj = dθi∗ dθi θa∗ θb e− i θi ki θi i i (1.39) = (det K)(K −1 )ab , which results in: Z 1 ⟨O[ψ(n), ψ̄(m), U ]⟩ = DU O[M [U ]−1 , U ]e−SG [U ] (det M [U ]). (1.40) Z If multiple fermions appear in the observable, those need to be contracted using Wick’s theorem. However, their dependence still only enters as the inverse of the Dirac matrix M [U ], which in turn only depends on the gauge field. The form of Equation (1.40) cannot be simplified analytically any further. However, it is very suitable for numerical integration using Monte Carlo methods. In fact one can identify e−SG [U ] det M the term P [U ] = Z as a Probability Distribution Function (PDF), notice that it only depends on the gauge field U , and draw random samples from it in what we call an ensemble U = {U1 , U2 , . . . , UN }. The expectation value of the observable then becomes the ensemble average of O computed on the gauge field configurations of U: Z 1 X ⟨O⟩ = DU P [U ]O[ψ, ψ̄, U ] ≈ O(ψ, ψ̄, Ui ). (1.41) N U ∈U i 19 The problem is now reduced to finding a suitable procedure for drawing gauge field configurations from the PDF. The most widely used method is via Markov Chain Monte Carlo methods (MCMC) like the Metropolis-Hastings, Heat Bath or Overrelaxation algorithms. 
In practice, today, the Hybrid Monte Carlo algorithm, which we will describe in Section 1.3.1, is the most widely used one. Recently, more novel approaches have been proposed, for example, using flow-based machine-learning algorithms [36, 37].

1.3.1 The Hybrid Monte Carlo Algorithm

The Hybrid Monte Carlo (HMC) algorithm was first proposed for Lattice QCD calculations in 1987 by S. Duane et al. [38], and it is now utilized in broader contexts, such as machine learning, under the different name, but fortunately the same acronym, of Hamiltonian Monte Carlo. The key idea is to combine microcanonical updates, performed with Hamiltonian dynamics, and a Metropolis accept-reject condition to maintain ergodicity.

The microcanonical part of the algorithm, often called the Molecular Dynamics (MD) update for the analogy with other Hamiltonian dynamics simulations, is based on the observation that for a given bosonic field Q one can introduce a real field P that is its conjugate momentum. An observable that depends on Q can then be calculated as:

\langle O \rangle_Q = \frac{ \int \mathcal{D}Q\; O[Q]\, e^{-S[Q]} }{ \int \mathcal{D}Q\; e^{-S[Q]} } = \frac{ \int \mathcal{D}Q\, \mathcal{D}P\; O[Q]\, e^{-\frac{1}{2} P^2 - S[Q]} }{ \int \mathcal{D}Q\, \mathcal{D}P\; e^{-\frac{1}{2} P^2 - S[Q]} } = \langle O \rangle_{Q,P} .    (1.42)

The expectation value does not change, as the Gaussian integrals for P cancel. This formulation represents a non-relativistic microcanonical ensemble whose Hamiltonian is H[Q, P] = ½ P² + S[Q], and its equations of motion are:

\dot P = -\frac{\partial H}{\partial Q} = -\frac{\partial S}{\partial Q}, \qquad \dot Q = \frac{\partial H}{\partial P} = P.    (1.43)

One can utilize these equations to move from one lattice gauge configuration to another along the Markov Chain. To perform the Hamiltonian evolution, one uses regular algorithms developed for MD, such as the Leapfrog algorithm, which is widely used for its reversibility properties. Since the MD update carries errors coming from the numerical integration, usually of O(ϵ²) where ϵ is the time-step size, to enforce ergodicity one performs a Metropolis accept-reject check. So the transition probability between two configurations before and after the MD update is:

T(P', Q' \,|\, P, Q) = \min\left( 1, \frac{ \exp(-H[P', Q']) }{ \exp(-H[P, Q]) } \right) = \min\left( 1, \exp(-\Delta H) \right).    (1.44)

This can be shown to satisfy detailed balance, hence making it suitable for MCMC calculations.

1.3.2 Hadron Correlators on the Lattice

In order to study the properties of hadrons on the lattice, like their mass, one needs to find a suitable way to construct operators out of quark fields that match the desired hadron's quantum numbers. In practice, the calculation of hadron correlators is what is performed on the lattice. These are two-point functions that are usually interpreted via spectral decomposition:

C(t) = a^3 \sum_{\mathbf{x}} \langle O(t, \mathbf{x})\, \bar O(0, \mathbf{0}) \rangle = \sum_k \langle 0 |\, \hat O\, | k \rangle \langle k |\, \hat O^\dagger\, | 0 \rangle\, e^{-t E_k} ,    (1.45)

where O(t) is what is called an interpolating operator. It is a combination of quark fields with the same quantum numbers as the hadron under consideration, but also as all its excited states. These cannot be distinguished on the lattice, so all of them propagate at once. For this reason, one needs to consider the time dependence of the two-point function: because of the e^{−tE_k} factor, heavy excited states decay very quickly in signal on the lattice, while the stable ground state is easily recovered in the t → ∞ limit. We will see more of the details later in this section.
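A toy version of Equation (1.45) makes the last point concrete. Below we build a correlator with a ground state and a single excited state (energies and amplitudes in lattice units, chosen arbitrarily for illustration) and watch the log-ratio of neighboring time slices relax to the ground-state energy.

```python
import numpy as np

# Two-state toy correlator in the spirit of Eq. (1.45):
# C(t) = A0 exp(-E0 t) + A1 exp(-E1 t), illustrative values only.
E0, E1 = 0.25, 0.60
A0, A1 = 1.0, 0.8
t = np.arange(32)
C = A0 * np.exp(-E0 * t) + A1 * np.exp(-E1 * t)

# The excited state dies off exponentially faster, so the log-ratio of
# neighboring time slices approaches E0 at large t.
log_ratio = np.log(C[:-1] / C[1:])
print(log_ratio[1], log_ratio[20])   # ~0.36 at early t, ~0.25 on the plateau
```

This log-ratio is exactly the "effective mass" that will be defined in Equation (1.50).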
In this work, we will extensively use these interpolating operators:

\text{Pion:} \quad O_{\pi^+}(x) = \bar d(x)_{\alpha,c}\, (\gamma_5)_{\alpha\beta}\, u(x)_{\beta,c}
\text{Kaon:} \quad O_{K^+}(x) = \bar s(x)_{\alpha,c}\, (\gamma_5)_{\alpha\beta}\, u(x)_{\beta,c}
\text{Proton:} \quad O_P(x) = \epsilon_{abc}\, u(x)_{\alpha,a} \left[ u(x)^T_{\beta,b}\, \left( C \gamma_5 \right)_{\beta\gamma}\, d(x)_{\gamma,c} \right]
\text{Omega:} \quad O_\Omega(x) = \epsilon_{abc}\, s(x)_{\alpha,a} \left[ s(x)^T_{\beta,b}\, \left( C \gamma_5 \right)_{\beta\gamma}\, s(x)_{\gamma,c} \right] .    (1.46)

The correlator is then the contraction between O(x) and Ō(y). For pions, for example, it is:

\langle O_{\pi^+}(x)\, \bar O_{\pi^+}(y) \rangle = \langle \bar d(x)_{\alpha_1, c_1}\, (\gamma_5)_{\alpha_1 \beta_1}\, u(x)_{\beta_1, c_1}\; \bar u(y)_{\alpha_2, c_2}\, (\gamma_5)_{\alpha_2 \beta_2}\, d(y)_{\beta_2, c_2} \rangle
= -\langle (\gamma_5)_{\alpha_1 \beta_1}\, \bar d(x)_{\alpha_1, c_1}\, d(y)_{\beta_2, c_2}\, (\gamma_5)_{\alpha_2 \beta_2}\, \bar u(y)_{\alpha_2, c_2}\, u(x)_{\beta_1, c_1} \rangle
= -\left\langle \mathrm{Tr}\left[ \gamma_5\, D_u^{-1}(y, x)\, \gamma_5\, D_d^{-1}(x, y) \right] \right\rangle_G .    (1.47)

For the proton and the Ω baryon, the contraction is slightly more involved, as there are more possible contractions of the quarks of identical flavor. In this case, Wick's theorem is used to compute all terms that contribute to the correlator while keeping track of the relative signs coming from the anti-commutation relations; the result is diagrammatically shown in Figure 1.10.

Figure 1.10: Schematic representation of the possible quark contractions for a proton interpolating operator.

The term, for a generic quark flavor q, that appears in Equations (1.46) and (1.47) is, when contracted, the equivalent of the inverse Dirac operator between the two points where the quark fields are located:

\langle q(x)_{\alpha, a}\, \bar q(y)_{\beta, b} \rangle = D^{-1}(y, x)^{ab}_{\alpha\beta} .    (1.48)

From Equation (1.47) we see that in order to calculate the expectation value of the quark correlator, one needs the expectation value, computed over a gauge ensemble, of the trace of the appropriate (depending on the hadron) quark propagators D^{-1}(y, x). In principle, this requires the inversion of the whole Dirac matrix, but this is not feasible numerically due to the large size of the matrix. In practice, the problem of determining the quark propagator is reformulated in terms of a linear system by fixing a point source:

\eta_{\beta, b}(x) = \delta_{b, c_1}\, \delta_{\beta, \alpha_1}\, \delta_{x, (\mathbf{y}, 0)} .    (1.49)

This source is a simple Kronecker delta in space, Dirac and color indices. The problem is then reformulated as D(0, x)^{ab}_{\alpha\beta}\, q(x)_{\alpha, a} = \eta_{\beta, b}(0). This is equivalent to computing only one column of the inverse matrix. Since we are still interested in all possible combinations of Dirac and color indices, in practice a total of 3 × 4 = 12 different source vectors are to be solved.

The exact form of the Dirac operator matrix depends on the lattice action; in the case of the Wilson-Clover action, it is a very sparse matrix, because fermions at one lattice site are only linked to their nearest neighbors. This kind of linear system is usually solved using iterative methods. One of the simplest ones is the Conjugate Gradient, but many variations are used. For example, BiCGSTAB [39] is a common choice for non-hermitian operators. These iterative solvers are terminated at convergence, i.e. when, for a small parameter ϵ, ||A x_n − b|| < ϵ. Recently, there has been much research on faster and more efficient algorithms, such as the SAP [40] and Multi-Grid [41] algorithms. In Chapter 2 we will discuss more details of the quark propagator solvers, as we will try to find a machine learning method to speed up the calculations.

1.3.3 Correlation Functions and Effective Masses

One can extract properties of hadrons propagating on the lattice from the correlation function of Equation (1.45). An obvious first choice would be that of the mass. In Figure 1.11 we can see the typical behavior of C(t) for the pion and the nucleon.
As expected, there is an exponential decay in time, both for positive and negative values around the point t = 0 where the source is located; this behavior is given by the periodic boundary conditions. The asymmetric shape of the baryon correlators is due to the mass difference of the positive and negative parity states, which both propagate from the same interpolating operator. The degradation of the signal of the nucleon correlator is due to the Signal-to-Noise problem, which we will discuss more in Section 1.4.2.

Figure 1.11: Example of pion (left) and proton (right) correlators from an ensemble of the PACS-CS collaboration [42]. The lattice size is 32³ × 64, the lattice spacing is a = 0.0907(13) fm, while the pion mass is mπ = 567.6(3) MeV; on the x-axis the time is reported in lattice units, as is standard practice.

The simplest possible way to extract hadron masses is to look at the "effective mass." It is easily obtained from taking the logarithm of Equation (1.45):

m_{\mathrm{eff}}(t) = \ln\left( \frac{C(t)}{C(t+1)} \right).    (1.50)

In the limit of t → ∞ the function m_eff(t) has a plateau whose value is the lowest energy of the spectrum of the hadron interpolating operator. One can then fit a constant to this plateau to get an estimate of the mass (see Figure 1.12). In Chapter 2 we will use this procedure, while in Chapter 3 a more involved analysis for the determination of hadron masses will be presented.

Figure 1.12: Example of pion (left) and proton (right) effective masses, computed following Equation (1.50) for the correlators in Figure 1.11 from an ensemble of the PACS-CS collaboration [42]. The lattice size is 32³ × 64, the lattice spacing is a = 0.0907(13) fm, while the pion mass is mπ = 567.6(3) MeV.

1.3.4 Smearing Techniques

A common technique for improving the correlators' signal is to use gauge link smearing. Since, most of the time, one is interested only in the ground state of the hadron, one is free to modify the interpolating operator to increase its overlap with the ground state. Many different smearing procedures have been proposed, for example APE [43], HYP [44] or Stout [45] smearing. In this work, we employ the Gaussian gauge-invariant smearing technique introduced by S. Güsken [46]. The starting ansatz is that the hadron operator can be constructed from the fermion field ψ(x, t) smeared with an appropriate wave function F(x, x′):

\tilde\psi(x, t) = \sum_{x'} F(x, x')\, \psi(x', t) = \sum_{x'} \left[ \delta_{x, x'} + \alpha\, H(x, x') \right] \psi(x', t),    (1.51)

where H(x, x′) is the hopping matrix:

H(x, x') = \sum_{\mu=1}^{3} \left[ U_\mu(x)\, \delta_{x', x+\hat\mu} + U^\dagger_\mu(x - \hat\mu)\, \delta_{x', x-\hat\mu} \right],    (1.52)

and the parameter α represents the strength of the coupling between neighboring links, hence the smoothness of the smearing. Equation (1.51) is one step of the smearing procedure; in practice, there are N_smear applications of that same equation. The scheme then is characterized by two parameters, (N_smear, α). When applying this procedure to quark propagators, the algorithm is as follows:

1. Create a smeared source at z with N_smear applications of F: Q(z) = F^{N_smear}(z, 0).

2. Solve the modified linear system:

\sum_{y, t'} M(z, t; y, t')\, (\tilde q)^{-1}(y, t'; 0) = Q(z)\, \delta_{t, 0},    (1.53)

with

(\tilde q)^{-1}(y, t'; 0) = \sum_z q^{-1}(y, t'; z, 0)\, F(z, 0).    (1.54)

3.
Finally, multiply the smeared propagator with the wave function F again:

q^{-1}(y, t'; 0) = \sum_{z} \left( F^{N_{\mathrm{smear}}} \right)^\dagger\!(y, z)\; (\tilde q)^{-1}(z, t'; 0).    (1.55)

One thing to note is that the last step is optional, in which case the "sink" is point-like and not smeared. The propagator computed this way is not positive definite, but can be useful for methods such as the variational method [47, 48] or matrix-Prony methods [49, 50]. Throughout this work, we will label the possible configurations of point or smeared sinks with the obvious notation of PS (point sink, smeared source) and SS (smeared sink, smeared source).

1.4 Modern Lattice QCD

Lattice QCD is now a mature field with well-established practices regarding the generation of configurations, the measurement of observables and their analysis. In this section, we will first outline how simulations in LQCD are performed and later highlight some of the numerical challenges that are currently affecting them.

1.4.1 Structure of LQCD Calculations

Current LQCD simulations are primarily performed with dynamical fermions, meaning that the PDF used for generating the Markov Chain for the gauge field ensemble includes the fermion action (as opposed to "quenched simulations"). The light quarks are usually considered to be degenerate, while all other flavors have to be simulated separately. The usual labeling nowadays is that of 2 + 1 for simulations with degenerate u, d quarks and the s quark, and 2 + 1 + 1 for simulations that also add the c quark.

For numerical reasons, and in order to correctly estimate the systematic uncertainties, LQCD calculations are not performed on a single ensemble of gauge field configurations. In fact, we have already seen, in Section 1.2.2, that the lattice action is only correct up to a fixed order in the lattice spacing a and contains artifacts that need to be removed to recover the continuum theory. So one needs several ensembles at different values of a to perform the continuum extrapolation (a → 0). Another extrapolation that is usually performed is that of the infinite volume (L → ∞), to make sure that the systematics introduced by the finite size of the simulation are accounted for.

In Section 1.3 we have also noticed that the smallest eigenvalue of the Dirac matrix approaches zero for smaller values of the quark mass, making the matrix increasingly ill-conditioned. This makes calculations at the physical value numerically expensive. A way to circumvent this problem is to measure observables at different values of the pion mass mπ and later extrapolate to the physical point. This is known as the "chiral extrapolation." Recently, some calculations with physical mπ have become accessible due to improvements in computing hardware and algorithms.

Since generating gauge field configurations is computationally very expensive, it is standard practice to save them to disk, effectively separating the running of the MCMC algorithm from the measuring of observables into two distinct tasks. Here is a rough sketch of how the calculation is then performed:

1. Generate with MCMC methods a set of gauge field ensembles with different physical characteristics (lattice spacing, volume, quark masses), sampling from a PDF determined by the choice of the lattice action. As already mentioned, this is numerically challenging and requires large High Performance Computing (HPC) facilities, often with several computing nodes at a time, because of the high cost of the MD updates.
1.4 Modern Lattice QCD

Lattice QCD is now a mature field with well-established practices regarding the generation of configurations, the measurement of observables and their analysis. In this section, we first outline how LQCD simulations are performed and then highlight some of the numerical challenges currently affecting them.

1.4.1 Structure of LQCD Calculations

Current LQCD simulations are primarily performed with dynamical fermions, meaning that the PDF used to generate the Markov chain for the gauge field ensemble includes the fermion action (as opposed to "quenched" simulations). The light quarks are usually taken to be degenerate, while all other flavors are simulated separately. The standard labeling nowadays is 2 + 1 for simulations with degenerate u, d quarks plus the s quark, and 2 + 1 + 1 for simulations that also add the c quark.

For numerical reasons, and in order to correctly estimate the systematic uncertainties, LQCD calculations are not performed on a single ensemble of gauge field configurations. We have already seen, in Section 1.2.2, that the lattice action is only correct up to a fixed order in the lattice spacing a and contains artifacts that must be removed to recover the continuum theory. One therefore needs several ensembles at different values of a in order to perform the continuum extrapolation (a → 0). Another extrapolation usually performed is that to infinite volume (L → ∞), to ensure that the systematics introduced by the finite size of the simulation are accounted for.

In Section 1.3 we also noticed that the condition number of the Dirac matrix diverges as the quark mass decreases, since its lowest eigenvalue approaches zero. This makes calculations at the physical quark masses numerically expensive. A way to circumvent this problem is to measure observables at several values of the pion mass mπ and later extrapolate to the physical point; this is known as the "chiral extrapolation." Recently, some calculations directly at physical mπ have become feasible thanks to improvements in computing hardware and algorithms.

Since generating gauge field configurations is computationally very expensive, it is standard practice to save them to disk, effectively separating the running of the MCMC algorithm from the measurement of observables into two distinct tasks. A rough sketch of how the calculation then proceeds:

1. Generate with MCMC methods a set of gauge field ensembles with different physical characteristics (lattice spacing, volume, quark masses), sampling from a PDF determined by the choice of lattice action. As already mentioned, this is numerically challenging and requires large High Performance Computing (HPC) facilities, often with several computing nodes at a time, because of the high cost of the MD updates. For this reason, several collaborations generate ensembles and share the configurations among their members or make them publicly available. In Chapter 3, we will see a new effort of this type, the Open Lattice Initiative (OPENLAT).

2. Since most observables require computing quark propagators, performing measurements is also a relatively expensive numerical task. Solving the linear systems is challenging because of the sheer size of the Dirac matrix and the poor convergence properties of linear solvers, especially at light quark masses. The measurement of observables, however, can be performed on different hardware than that used for the MCMC: today these measurements mostly run on GPU nodes at HPC facilities, usually with only a few nodes per job, with the parallelization done at the configuration level (running many small jobs, one per configuration, at once).

3. Once the observables are computed over a set of configurations, expectation values are evaluated, often using resampling methods such as bootstrap [51] or jackknife (see Appendix 1A.2 for more details), and the extrapolations to infinite volume, the continuum and physical quark masses are performed. This entire procedure serves to obtain reliable estimates of the statistical and systematic uncertainties; the autocorrelation of the MCMC also needs to be taken into account, see Appendix 1A.1.

4. In order to compare LQCD results to experimental values, one needs to determine a physical scale for the lattice. All quantities computed on the lattice are expressed in terms of the lattice spacing a, so a physical quantity has to be used to fix its value in proper units. A similar procedure has to be carried out for the quark masses; in general, one needs 1 + Nf physical quantities to fix the scale. Scale setting is a research field of its own and many possibilities have been explored, such as the static quark potential (from which the Sommer scale r0 is obtained) [52], heavy baryons like the Ω [53], or the gradient flow [54]; the latter is used throughout this work, and more details can be found in Appendix 1C.

1.4.2 Current Numerical Challenges in LQCD

The approach presented in the previous section has been very successful; there are, however, limitations that prevent, for example, large-scale simulations at the physical point with small lattice spacing. In this section, we briefly present some of them.

Critical Slowing Down

An important quantity in MCMC algorithms is the autocorrelation time, a measure of how strongly the samples of a given chain are correlated with each other. Ideally one would want all samples to be independent. In LQCD calculations, since the typical ensembles are rather small, it is critical to ensure the independence of the gauge field configurations, or at least to account for their autocorrelation as a systematic error; for a more detailed description, see Appendix 1A.1.

When approaching the critical point of a theory the autocorrelation time increases. In the case of LQCD, this happens as the lattice spacing approaches zero. Calculations at small a are already more expensive, since the total volume in physical units is usually kept constant, so the lattices have more points towards the continuum limit. Due to the phenomenon of "critical slowing down," the autocorrelation time also increases, further driving up the cost of these simulations.
Lattice QCD has, as opposed to the continuum theory, a non-fixed topology. The topological charge of the configurations in an ensemble fluctuates, and its distribution is expected to be Gaussian. When approaching the continuum, different topological sectors become increasingly separated and the MCMC algorithms, such as the HMC, struggle to sample all sectors properly. This is the phenomenological explanation for critical slowing down. For a more detailed discussion of topological effects in LQCD, see Appendix 1B.

Dirac Matrix Condition Number

As discussed in Section 1.3, the linear system for quark propagators is solved iteratively, using Krylov-subspace linear solvers. This means that if after n iterations the solver satisfies the inequality

\| r_n \| < \epsilon \,\| \eta \|, (1.56)

then the quark propagator is said to be obtained with accuracy ε and the algorithm is terminated. In general, ε is chosen small enough that any effect of the solver truncation is below the stochastic uncertainties of the ensemble averages of observables. Rewriting the above relation as

\| D S - D \psi_n \| < \epsilon \,\| D S \|, (1.57)

and using standard bounds from the matrix norm inherited from the L2 vector norm, one deduces that the deviation between the solution ψn after n solver iterations and the true solution S,

\| S - \psi_n \| < \epsilon\, \kappa(D)\, \| S \|, (1.58)

is controlled by ε and by the condition number of the matrix D,

\kappa(D) = \| D \| \cdot \| D^{-1} \| = \sqrt{\frac{\lambda_{\max}(D^\dagger D)}{\lambda_{\min}(D^\dagger D)}}, (1.59)

where we have used the matrix norm induced by the L2 vector norm, given by the square root of the spectral radius of D†D.

Figure 1.13: Contractions of the square of a proton interpolating operator.

In lattice QCD, the lowest eigenvalue of D†D is proportional to the quark mass squared, while the largest eigenvalue is proportional to the inverse lattice spacing squared. The implication is that as the quark mass is decreased, the linear system becomes increasingly ill-conditioned, leading to longer iteration counts and possible failures of the solver algorithm.

Signal to Noise

When calculating baryonic observables in LQCD, an insurmountable problem is that of the signal-to-noise ratio. The variance of an observable is defined as σ²(O) = ⟨O²⟩ − ⟨O⟩². In Figure 1.10 we showed the contractions for a single proton; the possible contractions for the square of the operator are shown in Figure 1.13. As shown pictorially, there are several possible contractions: one is the expected two-baryon system, while another is that of three propagating pions. Because on the lattice, in the large-t limit, the lightest state dominates, this picture is problematic: the variance will be dominated by the three-pion state rather than by the two-nucleon one. In detail, the signal-to-noise ratio is given by:

\mathrm{StN} = \frac{\langle O \rangle}{\sigma_O} \propto \frac{e^{-t m_p}}{e^{-\frac{3}{2} t m_\pi}} = \exp\left[-t\left(m_p - \frac{3}{2} m_\pi\right)\right]. (1.60)

This poses a challenge in extracting the ground state of such hadrons: for small t the correlator is contaminated by excited states, but for large t the signal is lost to noise. The effect becomes stronger as the mass gap between the nucleon and the pion increases, i.e. when approaching the physical point. Strong evidence of this problem can be seen in Figure 1.12, where the pion correlator and effective mass show a very smooth behavior and constant signal-to-noise, while, in contrast, the proton data clearly shows degradation of the signal towards the center of the lattice.
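The exponential degradation of Equation (1.60) is easy to visualize numerically. In the sketch below the masses are rough lattice-unit values chosen only for illustration; they are not measured values from any ensemble used in this work.

```python
import numpy as np

# Illustration of Eq. (1.60): the proton signal-to-noise ratio decays as
# exp(-t (m_p - 3/2 m_pi)). The masses are invented lattice-unit numbers.
m_p, m_pi = 0.64, 0.26
t = np.arange(0, 32)
stn = np.exp(-t * (m_p - 1.5 * m_pi))
print(stn[::8])   # drops by roughly an order of magnitude every 8 slices
```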
To summarize, Lattice QCD is a mature field of research that is now a crucial component of our understanding of the SM. Thanks to advances in computing power and numerical methods, it can now be used for precision calculations of non-perturbative phenomena. There are, however, challenges that need to be addressed, either with incremental improvements to traditional approaches, such as new lattice discretizations or better algorithms, or with entirely novel methods such as Machine Learning or Quantum Computing, among others.

BIBLIOGRAPHY

[1] D. Galbraith and C. Burgard. "Standard model of the standard model." (Nov. 12, 2013), [Online]. Available: https://texample.net/tikz/examples/model-physics/.
[2] K. G. Wilson, "Confinement of quarks," Physical Review D, vol. 10, no. 8, pp. 2445–2459, Oct. 15, 1974, issn: 0556-2821. doi: 10.1103/PhysRevD.10.2445.
[3] S. Dürr et al., "Ab initio determination of light hadron masses," Science, vol. 322, no. 5905, pp. 1224–1227, Nov. 21, 2008, issn: 0036-8075, 1095-9203. doi: 10.1126/science.1163233.
[4] A. Bazavov, "An overview of (selected) recent results in finite-temperature lattice QCD," Journal of Physics: Conference Series, vol. 446, p. 012011, Sep. 2013, issn: 1742-6596. doi: 10.1088/1742-6596/446/1/012011.
[5] H.-W. Lin et al., "Parton distributions and lattice QCD calculations: A community white paper," Progress in Particle and Nuclear Physics, vol. 100, pp. 107–160, May 1, 2018, issn: 0146-6410. doi: 10.1016/j.ppnp.2018.01.007.
[6] S. R. Beane, W. Detmold, K. Orginos, and M. J. Savage, "Nuclear physics from lattice QCD," Progress in Particle and Nuclear Physics, vol. 66, no. 1, pp. 1–40, Jan. 1, 2011, issn: 0146-6410. doi: 10.1016/j.ppnp.2010.08.002.
[7] J. Dragos, T. Luu, A. Shindler, J. de Vries, and A. Yousif, "Confirming the existence of the strong CP problem in lattice QCD with the gradient flow," Physical Review C, vol. 103, no. 1, p. 015202, Jan. 19, 2021. doi: 10.1103/PhysRevC.103.015202.
[8] S. Borsanyi et al., "Leading hadronic contribution to the muon magnetic moment from lattice QCD," Nature, vol. 593, no. 7857, pp. 51–55, May 2021, issn: 1476-4687. doi: 10.1038/s41586-021-03418-1.
[9] V. Cirigliano, W. Detmold, A. Nicholson, and P. Shanahan, "Lattice QCD inputs for nuclear double beta decay," Progress in Particle and Nuclear Physics, vol. 112, p. 103771, May 1, 2020, issn: 0146-6410. doi: 10.1016/j.ppnp.2020.103771.
[10] M. Gell-Mann, "The eightfold way: A theory of strong interaction symmetry," Tech. Rep. TID-12608, CTSL-20, Mar. 15, 1961. doi: 10.2172/4008239.
[11] Y. Ne'eman, "Derivation of strong interactions from a gauge invariance," Nuclear Physics, vol. 26, no. 2, pp. 222–229, Aug. 1961, issn: 0029-5582. doi: 10.1016/0029-5582(61)90134-1.
[12] O. W. Greenberg, "Spin and unitary-spin independence in a paraquark model of baryons and mesons," Physical Review Letters, vol. 13, no. 20, pp. 598–602, Nov. 16, 1964, issn: 0031-9007. doi: 10.1103/PhysRevLett.13.598.
[13] M. Y. Han and Y. Nambu, "Three-triplet model with double SU(3) symmetry," Physical Review, vol. 139, no. 4, pp. B1006–B1010, Aug. 23, 1965, issn: 0031-899X. doi: 10.1103/PhysRev.139.B1006.
[14] H. Fritzsch, M. Gell-Mann, and H. Leutwyler, "Advantages of the color octet gluon picture," Physics Letters B, vol. 47, no. 4, pp. 365–368, Nov. 1973, issn: 0370-2693. doi: 10.1016/0370-2693(73)90625-4.
[15] Particle Data Group et al., "Review of particle physics," Progress of Theoretical and Experimental Physics, vol. 2022, no. 8, p. 083C01, Aug. 8, 2022, issn: 2050-3911. doi: 10.1093/ptep/ptac097.
[16] M. E. Peskin, An Introduction to Quantum Field Theory. Boca Raton: CRC Press, Jan. 30, 2018, 866 pp., isbn: 978-0-429-50355-9. doi: 10.1201/9780429503559.
[17] Y. Aoki et al., "FLAG review 2021," The European Physical Journal C, vol. 82, no. 10, p. 869, Oct. 4, 2022, issn: 1434-6052. doi: 10.1140/epjc/s10052-022-10536-1.
[18] S. Bethke, "Experimental tests of asymptotic freedom," Progress in Particle and Nuclear Physics, vol. 58, no. 2, pp. 351–386, Apr. 2007, issn: 0146-6410. doi: 10.1016/j.ppnp.2006.06.001.
[19] D. J. Gross and F. Wilczek, "Ultraviolet behavior of non-abelian gauge theories," Physical Review Letters, vol. 30, no. 26, pp. 1343–1346, Jun. 25, 1973, issn: 0031-9007. doi: 10.1103/PhysRevLett.30.1343.
[20] H. D. Politzer, "Reliable perturbative results for strong interactions?" Physical Review Letters, vol. 30, no. 26, pp. 1346–1349, Jun. 25, 1973, issn: 0031-9007. doi: 10.1103/PhysRevLett.30.1346.
[21] C. Gattringer and C. B. Lang, Quantum Chromodynamics on the Lattice (Lecture Notes in Physics). Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, vol. 788, isbn: 978-3-642-01849-7. doi: 10.1007/978-3-642-01850-3.
[22] K. Symanzik, "Continuum limit and improved action in lattice theories," Nuclear Physics B, vol. 226, no. 1, pp. 187–204, Sep. 1983, issn: 0550-3213. doi: 10.1016/0550-3213(83)90468-6.
[23] B. Sheikholeslami and R. Wohlert, "Improved continuum limit lattice action for QCD with Wilson fermions," Nuclear Physics B, vol. 259, no. 4, pp. 572–596, Sep. 30, 1985, issn: 0550-3213. doi: 10.1016/0550-3213(85)90002-1.
[24] P. H. Ginsparg and K. G. Wilson, "A remnant of chiral symmetry on the lattice," Physical Review D, vol. 25, no. 10, pp. 2649–2657, May 15, 1982. doi: 10.1103/PhysRevD.25.2649.
[25] D. B. Kaplan, "A method for simulating chiral fermions on the lattice," Physics Letters B, vol. 288, no. 3, pp. 342–347, Aug. 27, 1992, issn: 0370-2693. doi: 10.1016/0370-2693(92)91112-M.
[26] D. B. Kaplan, "Chiral fermions on the lattice," Nuclear Physics B - Proceedings Supplements, vol. 30, pp. 597–600, Mar. 1, 1993, issn: 0920-5632. doi: 10.1016/0920-5632(93)90282-B.
[27] Y. Shamir, "Chiral fermions from lattice boundaries," Nuclear Physics B, vol. 406, no. 1, pp. 90–106, Sep. 27, 1993, issn: 0550-3213. doi: 10.1016/0550-3213(93)90162-I.
[28] V. Furman and Y. Shamir, "Axial symmetries in lattice QCD with Kaplan fermions," Nuclear Physics B, vol. 439, no. 1, pp. 54–78, Apr. 10, 1995, issn: 0550-3213. doi: 10.1016/0550-3213(95)00031-M.
[29] R. Narayanan and H. Neuberger, "A construction of lattice chiral gauge theories," Nuclear Physics B, vol. 443, no. 1, pp. 305–385, Jun. 5, 1995, issn: 0550-3213. doi: 10.1016/0550-3213(95)00111-5.
[30] H. Neuberger, "Exactly massless quarks on the lattice," Physics Letters B, vol. 417, no. 1, pp. 141–144, Jan. 15, 1998, issn: 0370-2693. doi: 10.1016/S0370-2693(97)01368-3.
[31] J. Kogut and L. Susskind, "Hamiltonian formulation of Wilson's lattice gauge theories," Physical Review D, vol. 11, no. 2, pp. 395–408, Jan. 15, 1975. doi: 10.1103/PhysRevD.11.395.
[32] A. Bazavov et al., "Equation of state in (2+1)-flavor QCD," Physical Review D, vol. 90, no. 9, p. 094503, Nov. 4, 2014, issn: 1550-7998, 1550-2368. doi: 10.1103/PhysRevD.90.094503.
[33] S. Aoki, "New phase structure for lattice QCD with Wilson fermions," Physical Review D, vol. 30, no. 12, pp. 2653–2663, Dec. 15, 1984. doi: 10.1103/PhysRevD.30.2653.
[34] R. Frezzotti, P. A. Grassi, S. Sint, and P. Weisz, "Lattice QCD with a chirally twisted mass term," Journal of High Energy Physics, vol. 2001, no. 8, p. 058, Aug. 29, 2001, issn: 1029-8479. doi: 10.1088/1126-6708/2001/08/058.
[35] A. Shindler, "Twisted mass lattice QCD," Physics Reports, vol. 461, no. 2, pp. 37–110, May 2008, issn: 0370-1573. doi: 10.1016/j.physrep.2008.03.001.
[36] R. Abbott et al., "Sampling QCD field configurations with gauge-equivariant flow models," arXiv preprint, 2022. doi: 10.48550/ARXIV.2208.03832.
[37] S. Foreman, T. Izubuchi, L. Jin, X.-Y. Jin, J. C. Osborn, and A. Tomiya, "HMC with normalizing flows," arXiv preprint, 2021. doi: 10.48550/ARXIV.2112.01586.
[38] S. Duane, A. Kennedy, B. J. Pendleton, and D. Roweth, "Hybrid Monte Carlo," Physics Letters B, vol. 195, no. 2, pp. 216–222, Sep. 1987, issn: 0370-2693. doi: 10.1016/0370-2693(87)91197-X.
[39] H. A. van der Vorst, "Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems," SIAM Journal on Scientific and Statistical Computing, vol. 13, no. 2, pp. 631–644, Mar. 1992, issn: 0196-5204, 2168-3417. doi: 10.1137/0913035.
[40] M. Lüscher, "Solution of the Dirac equation in lattice QCD using a domain decomposition method," Computer Physics Communications, vol. 156, no. 3, pp. 209–220, Jan. 15, 2004, issn: 0010-4655. doi: 10.1016/S0010-4655(03)00486-7.
[41] J. Brannick, R. C. Brower, M. A. Clark, J. C. Osborn, and C. Rebbi, "Adaptive multigrid algorithm for lattice QCD," Physical Review Letters, vol. 100, no. 4, p. 041601, Jan. 28, 2008, issn: 0031-9007, 1079-7114. doi: 10.1103/PhysRevLett.100.041601.
[42] S. Aoki et al., "Non-perturbative renormalization of quark mass in Nf = 2+1 QCD with the Schrödinger functional scheme," Journal of High Energy Physics, vol. 2010, no. 8, p. 101, Aug. 23, 2010, issn: 1029-8479. doi: 10.1007/JHEP08(2010)101.
[43] M. Albanese et al., "Glueball masses and string tension in lattice QCD," Physics Letters B, vol. 192, no. 1, pp. 163–169, Jun. 25, 1987, issn: 0370-2693. doi: 10.1016/0370-2693(87)91160-9.
[44] A. Hasenfratz and F. Knechtli, "Flavor symmetry and the static potential with hypercubic blocking," Physical Review D, vol. 64, no. 3, p. 034504, Jul. 2, 2001, issn: 0556-2821, 1089-4918. doi: 10.1103/PhysRevD.64.034504.
[45] C. Morningstar and M. Peardon, "Analytic smearing of SU(3) link variables in lattice QCD," Physical Review D, vol. 69, no. 5, p. 054501, Mar. 23, 2004, issn: 1550-7998, 1550-2368. doi: 10.1103/PhysRevD.69.054501.
[46] S. Güsken, "A study of smearing techniques for hadron correlation functions," Nuclear Physics B - Proceedings Supplements, vol. 17, pp. 361–364, Sep. 1, 1990, issn: 0920-5632. doi: 10.1016/0920-5632(90)90273-W.
[47] C. Michael, "Adjoint sources in lattice gauge theory," Nuclear Physics B, vol. 259, no. 1, pp. 58–76, Sep. 16, 1985, issn: 0550-3213. doi: 10.1016/0550-3213(85)90297-4.
[48] M. Lüscher and U. Wolff, "How to calculate the elastic scattering matrix in two-dimensional quantum field theories by numerical simulation," Nuclear Physics B, vol. 339, no. 1, pp. 222–252, Jul. 23, 1990, issn: 0550-3213. doi: 10.1016/0550-3213(90)90540-T.
[49] G. T. Fleming, S. D. Cohen, H.-W. Lin, and V. Pereyra, "Excited-state effective masses in lattice QCD," Physical Review D, vol. 80, no. 7, p. 074506, Oct. 13, 2009, issn: 1550-7998, 1550-2368. doi: 10.1103/PhysRevD.80.074506.
[50] G. Plonka and M. Tasche, "Prony methods for recovery of structured functions," GAMM-Mitteilungen, vol. 37, no. 2, pp. 239–258, 2014, issn: 1522-2608. doi: 10.1002/gamm.201410011.
[51] B. Efron, "Bootstrap methods: Another look at the jackknife," The Annals of Statistics, vol. 7, no. 1, Jan. 1, 1979, issn: 0090-5364. doi: 10.1214/aos/1176344552.
[52] R. Sommer, "A new way to set the energy scale in lattice gauge theories and its application to the static force and coupling constant in SU(2) Yang-Mills theory," Nuclear Physics B, vol. 411, no. 2, pp. 839–854, Jan. 1994, issn: 0550-3213. doi: 10.1016/0550-3213(94)90473-1.
[53] S. Capitani, M. Della Morte, G. von Hippel, B. Knippschild, and H. Wittig, "Scale setting via the omega baryon mass," arXiv preprint, 2011. doi: 10.48550/ARXIV.1110.6365.
[54] M. Lüscher, "Properties and uses of the Wilson flow in lattice QCD," Journal of High Energy Physics, vol. 2010, no. 8, p. 71, Aug. 2010, issn: 1029-8479. doi: 10.1007/JHEP08(2010)071.
[55] A. Sokal, "Monte Carlo methods in statistical mechanics: Foundations and new algorithms," in Functional Integration, C. DeWitt-Morette, P. Cartier, and A. Folacci, Eds., vol. 361, NATO ASI Series, Boston, MA: Springer US, 1997, pp. 131–192, isbn: 978-1-4899-0321-1. doi: 10.1007/978-1-4899-0319-8_6.
[56] H. Flyvbjerg and H. G. Petersen, "Error estimates on averages of correlated data," The Journal of Chemical Physics, vol. 91, no. 1, pp. 461–466, Jul. 1989, issn: 0021-9606, 1089-7690. doi: 10.1063/1.457480.
[57] E. Witten, "Current algebra theorems for the U(1) 'Goldstone boson'," Nuclear Physics B, vol. 156, no. 2, pp. 269–283, Sep. 1979, issn: 0550-3213. doi: 10.1016/0550-3213(79)90031-2.
[58] A. Di Giacomo, "Topology in lattice QCD," arXiv preprint, 1997. doi: 10.48550/ARXIV.HEP-LAT/9711034.
[59] T. Schäfer and E. V. Shuryak, "Instantons in QCD," Reviews of Modern Physics, vol. 70, no. 2, pp. 323–425, Apr. 1, 1998, issn: 0034-6861, 1539-0756. doi: 10.1103/RevModPhys.70.323.
[60] M. Lüscher and S. Schaefer, "Lattice QCD without topology barriers," Journal of High Energy Physics, vol. 2011, no. 7, p. 36, Jul. 7, 2011, issn: 1029-8479. doi: 10.1007/JHEP07(2011)036.
[61] M. F. Atiyah and I. M. Singer, "The index of elliptic operators on compact manifolds," Bulletin of the American Mathematical Society, vol. 69, no. 3, pp. 422–433, 1963, issn: 0002-9904, 1936-881X. doi: 10.1090/S0002-9904-1963-10957-X.
[62] H. Leutwyler and A. Smilga, "Spectrum of Dirac operator and role of winding number in QCD," Physical Review D, vol. 46, no. 12, pp. 5607–5632, Dec. 15, 1992, issn: 0556-2821. doi: 10.1103/PhysRevD.46.5607.
[63] C. Alexandrou et al., "Comparison of topological charge definitions in lattice QCD," arXiv preprint, 2017. doi: 10.48550/ARXIV.1708.00696.
[64] D. J. Gross, R. D. Pisarski, and L. G. Yaffe, "QCD and instantons at finite temperature," Reviews of Modern Physics, vol. 53, no. 1, pp. 43–80, Jan. 1, 1981, issn: 0034-6861. doi: 10.1103/RevModPhys.53.43.
[65] M. Lüscher, "Trivializing maps, the Wilson flow and the HMC algorithm," Communications in Mathematical Physics, vol. 293, no. 3, pp. 899–919, Feb. 2010, issn: 0010-3616, 1432-0916. doi: 10.1007/s00220-009-0953-7.
[66] M. Lüscher and P. Weisz, "Perturbative analysis of the gradient flow in non-abelian gauge theories," Journal of High Energy Physics, vol. 2011, no. 2, p. 051, Feb. 2011, issn: 1029-8479. doi: 10.1007/JHEP02(2011)051.
[67] H. Munthe-Kaas, "Lie-Butcher theory for Runge-Kutta methods," BIT Numerical Mathematics, vol. 35, no. 4, pp. 572–587, Dec. 1, 1995, issn: 1572-9125. doi: 10.1007/BF01739828.
[68] H. Munthe-Kaas, "Runge-Kutta methods on Lie groups," BIT Numerical Mathematics, vol. 38, no. 1, pp. 92–111, Mar. 1, 1998, issn: 1572-9125. doi: 10.1007/BF02510919.
[69] H. Childs et al., "VisIt: An end-user tool for visualizing and analyzing very large data," Nov. 2012.
[70] M. Bruno, T. Korzec, and S. Schaefer, "Setting the scale for the CLS 2+1 flavor ensembles," Physical Review D, vol. 95, no. 7, p. 074504, Apr. 12, 2017, issn: 2470-0010, 2470-0029. doi: 10.1103/PhysRevD.95.074504.

APPENDIX 1A: TOOLS FOR STATISTICAL ANALYSIS IN LATTICE QCD

1A.1 Autocorrelation of Data

One inherent characteristic of Markov Chain Monte Carlo methods is the correlation between samples along the chain. This effect, called autocorrelation, is rather intuitive, since the algorithms work by modifying the latest element of the chain by some random amount, which naturally implies that the new element retains some relation to the previous one if the change is not sufficiently large. This phenomenon leads to an effective decrease of the sample size, as not all configurations are statistically independent; it is thus crucial to account for autocorrelation effects in MCMC calculations.

A measure of autocorrelation is given by the integrated autocorrelation time τ_int which, given a set of ordered variables {x1, x2, ..., xN} indexed by a time coordinate t, is defined as:

\tau_{\mathrm{int}} = \frac{1}{2} \sum_{t=-\infty}^{+\infty} \frac{\Gamma(t)}{\Gamma(0)} = \frac{1}{2} + \sum_{t=1}^{\infty} \rho(t), (1A.1)

where Γ(t) is the autocorrelation function of points at distance t along the chain:

\Gamma(t) = \Gamma(-t) = \langle (x_i - \bar{x})(x_{i+t} - \bar{x}) \rangle \approx \frac{1}{N-t} \sum_{i=1}^{N-t} (x_i - \bar{x})(x_{i+t} - \bar{x}). (1A.2)

With this definition, in the case of large N, the true variance of the data is

\tilde{\sigma}^2 = 2\,\tau_{\mathrm{int}}\,\sigma^2; (1A.3)

this factor can be estimated numerically for a given Markov chain and used to correct the results a posteriori. It is easy to see that if τ_int = 1/2 then the variance is unchanged, meaning there is no autocorrelation effect, and this can happen only if Γ(t) = 0 for all t ≠ 0, i.e. if all samples are uncorrelated.

Unfortunately, on a finite sample of size N the ratio ρ(t) becomes noisy at large t, making τ_int hard to determine. However, the noise must sum to zero in the large-N limit, so one can truncate the summation in Equation (1A.1) to a suitable window of size W. Following the procedure in [55], the square of the fluctuations of ρ(t) can be computed approximately as:

\langle \delta\rho(t)^2 \rangle \approx \frac{1}{N} \sum_{k=1}^{\infty} \left[ \rho(k+t) + \rho(k-t) - 2\,\rho(k)\,\rho(t) \right]^2. (1A.4)

The terms in this equation all vanish for sufficiently large k, so one can introduce a cut-off Λ, usually of order O(10-100), and only sum for k = 1, ..., Λ. One can then define the width W of the summation window for τ_int as the first value of t where ρ(t) is smaller than its fluctuations, i.e. where ρ(t) < sqrt(⟨δρ(t)²⟩). One can also give an approximate error formula for the integrated autocorrelation time, which reads:

\sigma^2(\tau_{\mathrm{int}}) \approx \frac{2(2W+1)}{N}\, \tau_{\mathrm{int}}^2. (1A.5)

Throughout this work, the autocorrelation of data computed on lattice gauge field configurations has been estimated using this procedure.
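A minimal sketch of this windowing procedure is given below, with a cruder stopping rule than Equation (1A.4): the sum over ρ(t) is truncated at the first t where ρ(t) drops below a simple 2/√N noise estimate. The AR(1) test chain has a known τ_int ≈ 9.5, so the output can be checked.

```python
import numpy as np

def tau_int(x):
    """Integrated autocorrelation time, Eqs. (1A.1)-(1A.2), with a
    crude automatic window: stop summing when rho(t) ~ noise level."""
    x = np.asarray(x, float)
    n = len(x)
    xb = x.mean()
    gamma0 = np.mean((x - xb) ** 2)
    tau = 0.5
    for t in range(1, n // 2):
        rho = np.mean((x[:-t] - xb) * (x[t:] - xb)) / gamma0
        if rho < 2.0 / np.sqrt(n):
            break
        tau += rho
    return tau

# AR(1) toy chain with known tau_int = 1/2 + a/(1-a) = 9.5 for a = 0.9
rng = np.random.default_rng(0)
a, x = 0.9, [0.0]
for _ in range(100_000):
    x.append(a * x[-1] + rng.normal())
print(tau_int(x))    # should come out close to 9.5
```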
1A.1.1 Statistical Blocking

A straightforward way to handle autocorrelated data is the method of blocking [56]. It works by averaging blocks of B consecutive points along the Markov chain, creating a new chain of length N/B. The block averages are then used as the new variables of the analysis, the new data points being less autocorrelated than the unblocked ones. To estimate the optimal block size B, one computes expectation values and their variances for several increasing values of B until the variance reaches a plateau. It can be shown that the variance tends to the same value as the one corrected with the integrated autocorrelation time in Equation (1A.3); in fact, blocking can be used as a cheap method to estimate τ_int.

1A.2 The Bootstrap Resampling Method

Resampling methods in statistics can be beneficial for correctly estimating the population mean and its uncertainty by utilizing subsets of a given sample of data. In the case of LQCD, since lattice ensembles are usually small, resampling helps improve the precision of the calculations. A first estimate of the expectation value and its uncertainty for an observable O, given a set of N uncorrelated measurements O_i, is the mean and the standard error of the mean:

\hat{O} = \frac{1}{N} \sum_{i=1}^{N} O_i, \qquad
\sigma_{\hat{O}} = \sqrt{ \langle (\hat{O} - \langle O \rangle)^2 \rangle } = \sqrt{ \frac{1}{N^2} \sum_{i,j=1}^{N} \langle (O_i - \langle O \rangle)(O_j - \langle O \rangle) \rangle } = \frac{\sigma_O}{\sqrt{N}}. (1A.6)

The bootstrap method [51] builds a set of N_b bootstrap samples by choosing N points at random, with repetition, from the original data O_i, and then uses the sample means to estimate the population mean. Given the mean of a bootstrap sample,

\hat{O}_b = \frac{1}{N} \sum_{i=1}^{N} O_{r(i)}, (1A.7)

where r(i) denotes a random integer drawn between 1 and N, independently for each i, the population mean and its variance are estimated as:

\tilde{O} = \frac{1}{N_b} \sum_{b=1}^{N_b} \hat{O}_b, \qquad
\tilde{\sigma}^2_{\tilde{O}} = \frac{1}{N_b} \sum_{b=1}^{N_b} \left( \hat{O}_b - \tilde{O} \right)^2. (1A.8)

The expectation value of O is then given by ⟨O⟩ = Õ, with associated uncertainty σ̃_Õ. It should be noted that this estimator is not unbiased: in general Õ ≠ Ô, and their difference gives the order of the possible deviation of Õ from the true value ⟨O⟩. The bootstrap method is numerically inexpensive and can also be used for derived quantities of the observable O. Furthermore, in the case of LQCD, if one computes multiple observables on the same ensemble, the bootstrap method can be applied to estimate the population means of composite quantities, as long as the random samples are drawn simultaneously, thus preserving the correlation of the data.
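A minimal sketch of the bootstrap estimator of Equations (1A.7)-(1A.8); the toy Gaussian "measurements" stand in for per-configuration observables, and the estimated error can be checked against σ_O/√N from Equation (1A.6).

```python
import numpy as np

def bootstrap(data, n_boot=1000, seed=0):
    """Bootstrap mean and uncertainty, Eqs. (1A.7)-(1A.8)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    means = np.array([data[rng.integers(0, n, n)].mean()
                      for _ in range(n_boot)])
    return means.mean(), means.std()

obs = np.random.default_rng(1).normal(1.0, 0.5, size=400)  # toy data
mean, err = bootstrap(obs)
print(mean, err)    # err should be close to 0.5/sqrt(400) = 0.025
```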
APPENDIX 1B: OVERVIEW OF TOPOLOGICAL EFFECTS IN LATTICE QCD

The QCD vacuum exhibits particular topological properties that are related to important physical phenomena [57, 58]. One important property is the topological charge, which can intuitively be understood as the winding number of the quantum field. In the continuum theory, fields with different topological charges are separated, as there is no unitary transformation connecting them; we refer to these as topological sectors. When QCD is discretized on a lattice, instead, the MCMC simulations have access to all sectors, thanks to non-zero instanton modes [59]; in fact, observables are only well defined if all sectors are sampled correctly. A side effect of applying periodic boundary conditions to the gauge field, however, is that when approaching the continuum limit the topological sectors become increasingly separated, i.e. the MCMC algorithms have lower probabilities of tunneling from one sector to another. In practice, this means that proper sampling of an observable requires longer runs the closer one gets to the continuum [60].

On the lattice, there are different ways to define the topological charge. One is fermionic: it is based on the Atiyah-Singer index theorem [61] and relates the charge to the spectrum of the Dirac operator [62]. The other definition, which is the one we discuss here, involves only the gauge fields. The two are identical in the continuum, and purely topological observables, such as the topological susceptibility, should be independent of the definition used [63, 64].

The topological charge can be written as the integral over all spacetime of the topological charge density:

Q = \int d^4x\; q(x), (1B.1)

where

q(x) = \frac{1}{32\pi^2} \mathrm{Tr}\left( G_{\mu\nu}(x)\, \tilde{G}_{\mu\nu}(x) \right). (1B.2)

The term G̃_μν is the dual of the gauge field tensor:

\tilde{G}_{\mu\nu}(x) = \frac{1}{2} \epsilon_{\mu\nu\rho\sigma} G_{\rho\sigma}(x), (1B.3)

with ε_μνρσ the totally antisymmetric Levi-Civita symbol. On the lattice, the topological charge of Equation (1B.1) can be computed using a suitable discretized definition of the gauge tensor. As we have seen in Section 1.2.2, this can be done using the clover term, so the topological charge becomes:

Q[U] = \frac{1}{64\pi^2} \sum_{n\in\Lambda} \sum_{\mu<\nu} \epsilon^{\mu\nu\rho\sigma}\, \mathrm{Tr}\left[ G^{(\mathrm{clover})}_{\mu\nu}(n)\, G^{(\mathrm{clover})}_{\rho\sigma}(n) \right]. (1B.4)

This definition is susceptible to short-distance fluctuations of the gauge field and is therefore very noisy; in fact, it is divergent in the continuum. A tool to improve its determination is the gradient flow, discussed in more detail in Appendix 1C: it dampens the high-frequency modes, leaving only the long-distance signal, without the need for any additional renormalization, since the topological charge is a purely gluonic observable.

A quantity of interest is the topological susceptibility χ, related to the second cumulant of the distribution of the topological charge and defined as:

\chi = \frac{1}{a^4 |\Lambda|} \langle Q^2 \rangle. (1B.5)

This quantity is crucial for understanding essential properties of instantons on the lattice and, through the Witten-Veneziano formula [57], it can be shown to be proportional to the mass of the η′ meson.
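As a small numerical illustration of Equation (1B.5), the sketch below computes χ from a toy set of topological charges; the Gaussian width, lattice spacing and volume are invented numbers, not results from any ensemble.

```python
import numpy as np

# Topological susceptibility from an ensemble of charges, Eq. (1B.5).
# Q is a toy sample of near-integer charges; a and n_sites are invented.
rng = np.random.default_rng(2)
Q = np.round(rng.normal(0.0, 4.0, size=500))
a = 0.09                              # fm, illustrative
n_sites = 32**3 * 64                  # |Lambda|, illustrative
V = a**4 * n_sites                    # physical 4-volume in fm^4
chi = (Q**2).mean() / V
print(chi, chi**0.25 * 197.327)       # chi in fm^-4; chi^(1/4) in MeV
```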
APPENDIX 1C: THE GRADIENT FLOW AND SCALE SETTING

The Gradient Flow, or Wilson Flow, is a method for studying non-perturbative effects of QCD, first proposed by M. Lüscher in the context of trivializing maps [65] and later explored in more depth in [54, 66] and subsequent papers. It works by modifying the fields according to a gauge-covariant, diffusion-like flow equation that evolves them towards the stationary points of the action. For SU(3) this equation reads:

\partial_{t_f} B_\mu = D_\nu G_{\nu\mu}, (1C.1)

where t_f is the so-called flow time, the parameter controlling the evolution, and B_μ(t_f, x) is the flowed version of the original gauge field, with the initial condition at zero flow time B_μ|_{t_f=0} = A_μ. The other terms in the equation are the generalizations to the B_μ field of the field tensor and the covariant derivative:

G_{\mu\nu} = \partial_\mu B_\nu - \partial_\nu B_\mu + [B_\mu, B_\nu], (1C.2)
D_\mu = \partial_\mu + [B_\mu, \,\cdot\,]. (1C.3)

From Equation (1C.1) we see that the right-hand side is proportional to the gradient of the action, hence the name. Its effect is, intuitively, to move the field B_μ along the steepest-descent path of the action. This, in turn, corresponds to a Gaussian-like smearing of the gauge fields with a root-mean-square smearing radius of √(8 t_f).

On the lattice, one can rewrite the flow equation in terms of the field V_{t_f}(x, μ), the flow generalization of the link field U(x, μ):

\partial_{t_f} V_{t_f}(x,\mu) = -g_0^2 \left[ \partial_{x,\mu} S_G(V_{t_f}) \right] V_{t_f}(x,\mu), (1C.4)

where S_G is the Wilson plaquette action, see Equation (1.23), computed with the V_{t_f}(x, μ). This equation can be integrated numerically on the lattice using Runge-Kutta Munthe-Kaas methods [67, 68]. One crucial fact about the gradient flow is that the flowed gauge fields do not need any renormalization; therefore, correlation functions containing them have a well-defined continuum limit [66]. For example, in the case of the topological charge presented in Appendix 1B, the gradient flow dampens the high-frequency modes, removing noise while leaving the topological charge unaltered. It is therefore an advantageous method for accurately determining the topological charge on the lattice. In Figure 1C.1 we show an example of the smearing effect of the gradient flow on the topological charge density: as t_f increases, the rapidly fluctuating values of the density are removed, leaving only the long-distance structures that truly contribute to the topological charge.

Figure 1C.1: Topological charge density computed at fixed Euclidean time on a pure gauge lattice of size 32³ × 64 with spacing a = 0.06793 fm. The different panels correspond to different values of the flow time, with radii √(8 t_f) = 0, 0.14, 0.30, 0.43, 0.52, 0.60 fm. The visualization has been performed with the software "LatViz," developed together with H.M. Vege as an interface based on the open-source visualization tool VisIt by LLNL [69].

The topological charge density at √(8 t_f) = 0 fm (the unflowed field configuration) is dominated by short-distance fluctuations over the lattice sites. The effect of the flow equation is to remove these fluctuations, which at √(8 t_f) = 0.60 fm are almost entirely gone. For larger smearing radii, the emergence of well-separated topological structures becomes visible.

1C.1 Scale Fixing with the Gradient Flow

In one of the first articles on the gradient flow, it was suggested that the method could be used for scale setting in Lattice QCD [54]. This is done by considering the dimensionless quantity t_f² ⟨E⟩ in the region where √(8 t_f) is at a hadronic scale. There the quantity depends only weakly on the lattice spacing and, given also that it can be computed with high numerical precision on the lattice, it is well suited for scale setting. It was proposed to use as scale the value t0, defined as the flow time where

t_0^2 \langle E \rangle = 0.3. (1C.5)

In practice, this means that all lattice observables are expressed in terms of t0/a², while the value of t0 must be fixed using a physical observable, as done for example in [70], where the value √t0 = 0.1464(18) fm is found. Conversely, given an ensemble and the physical value of t0, one only needs to find the value of t_f/a² where Equation (1C.5) is satisfied to obtain the lattice spacing a; see Figure 3.10 for an example.
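A minimal sketch of this scale-setting step: given measurements of t_f²⟨E⟩ on a grid of flow times (replaced here by invented, smoothly rising toy data), one interpolates to the reference value 0.3 to obtain t0/a² and, using the physical value of √t0 from [70], the lattice spacing.

```python
import numpy as np

# Scale setting via Eq. (1C.5). The toy data t2E stand in for measured
# values of t_f^2 <E>; on real data one would interpolate between the
# flow times actually stored.
tf = np.linspace(0.5, 5.0, 50)         # flow times in units of a^2
t2E = 0.12 * tf                        # toy, monotonically rising
t0_over_a2 = np.interp(0.3, t2E, tf)   # where t_f^2 <E> = 0.3
sqrt_t0_fm = 0.1464                    # physical sqrt(t0) from [70]
a = sqrt_t0_fm / np.sqrt(t0_over_a2)   # lattice spacing in fm
print(t0_over_a2, a)
```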
CHAPTER 2: MACHINE LEARNING METHODS FOR HADRON CORRELATORS FROM LATTICE QCD

Despite great hardware, software and algorithmic developments, realistic lattice QCD calculations are still among the most demanding numerical computations performed regularly at High-Performance Computing (HPC) facilities around the world [1-3]. As an example, four out of five of the largest awards of the 24th Call of the Partnership for Advanced Computing in Europe in 2021 were for LQCD [4].

The majority of the time spent in a lattice QCD calculation goes into the computation of the quark propagator, which not only provides the building block for the determination of any fermionic correlation function, but is also needed repeatedly during the Markov Chain Monte Carlo (MCMC) used to generate the gauge ensemble with dynamical quarks, as discussed in Section 1.3. The calculation of the quark propagator amounts to inverting a very large sparse matrix representing the lattice Dirac operator, see Section 1.4. The problem becomes a sparse linear system once we fix the location of the source, but it can still contain O(10⁹) unknowns, because the quark propagator is effectively a vector whose size is the number of lattice sites times a factor 12 from the color and Dirac indices of Wilson-type fermions. Such linear systems are usually solved with Krylov Subspace (KS) methods, such as the Conjugate Gradient (CG) or variations like the Biconjugate Gradient Stabilized (BiCGSTAB) method [5]. Thanks to the computational advances of recent years, calculations have started to approach the continuum limit and the physical value of the light quark masses, but this leads to a tremendous increase in the solver iteration count [6]. Several preconditioners can be used to speed up these calculations, such as even-odd preconditioning, domain decomposition [7] and deflation [8]. A successful way to precondition KS solvers is based on several variants of so-called multigrid algorithms, such as inexact deflation [9, 10], the Multigrid Generalized Conjugate Residual (MG-GCR) [11] and adaptive-aggregation multigrid approaches such as DD-αAMG [12].

Recently, several studies of how Machine Learning (ML) algorithms could be used to speed up lattice calculations have appeared. Since these methods have matured considerably over the past decade, there is much interest in understanding if and how they can be employed in LQCD calculations. ML algorithms have already been applied in [13] to reconstruct the dependence of more complicated observables on simpler ones, and, for example, in [14]. Other studies have focused on so-called normalizing flows for generating configurations by direct sampling [15, 16], attempting a full or partial replacement of the HMC with generative ML models.

At the same time, there have been studies in Lattice QCD of noise-reduction methods [17], mainly in the context of disconnected diagrams, such as the hopping parameter expansion technique, the truncated solver method and truncated eigenmode acceleration. One of these, the truncated solver method (TSM), relies on the rapid convergence of solvers towards the solution within very few iterations, with most of the time spent afterwards gaining only a little additional precision. More precisely, the logarithm of the residue of the quark propagator linear system depends roughly linearly on the number of iterations, as we will see in Section 2.1. This well-known fact can be used to reduce the variance: one computes a set of N propagators with stochastic sources on a gauge field configuration at very low precision, and on a subset M ≪ N a high-precision propagator is computed as well. The high-precision data is then used to estimate the possible bias of the low-precision estimates.
In this chapter, we propose a different strategy, in a similar spirit to the TSM: taking advantage of recent developments in Machine Learning, in particular in the field of supervised learning, we apply them to the problem of speeding up the calculation of hadron correlation functions. While the ultimate goal is to accelerate the determination of the quark propagator itself, in this work we focus on derived quantities, hadron correlation functions and effective masses, to reduce the dimensionality of the ML problem. As a first step in using ML to determine hadron correlators, we propose to combine the information contained in correlation functions determined from quark propagators calculated with different precisions of the linear system solver. The analysis presented in this work shows that quark propagators calculated with sloppier precision than the standard one contain enough information for the ML algorithm to reconstruct the hadron correlators calculated with more precise quark propagators. To train the ML algorithm, one makes use of the ensemble generated with the MCMC and of subsets of it.

This chapter is organized as follows: in Section 2.1 we introduce the problem in detail, with some numerical lattice specifics, and study the dependence on the solver precision and its impact on two-point functions. In Section 2.2 we introduce a few supervised ML algorithms and discuss their application to our calculation. In Section 2.3 we study the correlation of the data from our ensembles and propose a tentative algorithm, while in Section 2.4 we present the results of the ML procedure. Finally, in Section 2.5 we summarize our conclusions.

2.1 Residue Analysis of the BiCGSTAB Linear Solver

The convergence properties of the BiCGSTAB algorithm applied to a quark propagator problem are non-trivial. In particular, there is no guarantee that the residue of the linear system decreases monotonically with the iteration count and, in fact, it usually does not. A description of the algorithm can be found in Appendix 2B. In this section, we investigate the convergence of the algorithm for quark propagator inversions on the LQCD ensembles we will use for our ML procedure, from the traditional perspective.

In this work, we consider QCD with Nf = 2 + 1 dynamical quark flavors. The theory is regulated on a Euclidean lattice of spacing a with an Iwasaki gauge action [18] and a non-perturbatively O(a)-improved fermion action [19, 20] (clover fermions). The details of the ensembles we have used can be found in [21-23]; for completeness, we summarize them in Table 2.1. The ensembles span a range of masses mπ = 400-700 MeV at fixed lattice spacing (ensembles M1, M2, M3), and three different lattice spacings at approximately fixed total volume and pion mass (ensembles A1, A2, M1). It is conceivable that, in the range of masses and lattice spacings investigated in this work, the results we obtain are largely independent of the fermion lattice action used. It would still be very interesting to test the algorithm we propose with different types of lattice actions, such as Ginsparg-Wilson fermions [24].
Ensemble  β     κl       κs      L   T   a [fm]      mπ [MeV]   N
M1        1.90  0.13700  0.1364  32  64  0.0907(13)  699.0(3)   399
M2        1.90  0.13727  0.1364  32  64  0.0907(13)  567.6(3)   400
M3        1.90  0.13754  0.1364  32  64  0.0907(13)  409.7(7)   450
A1        1.83  0.13825  0.1371  16  32  0.1095(25)  710(1)     800
A2        1.90  0.13700  0.1364  20  40  0.0936(33)  676.3(7)   790

Table 2.1: Ensembles from the PACS-CS collaboration [22], with clover fermion action. Physical quantities (lattice spacing a and pion masses mπ) were calculated for another work [23]. Here β = 6/g² is the inverse coupling; κl and κs are the light (up and down) and strange inverse-mass (hopping) parameters, κ = 1/(2(m + 4)); L and T are the numbers of spatial and temporal lattice sites; N is the number of configurations in each ensemble.

2.1.1 Krylov Space Solvers

To fix the notation, we briefly recall some well-known facts about the determination of quark propagators with Krylov space solvers. For completeness, a few additional details can be found in Appendix 2A, while an exhaustive treatment of iterative methods for sparse linear systems can be found in [25]. The quark propagator q(x) is obtained by solving the equation

D q(x) = \eta(x), (2.1)

where D represents the lattice Dirac operator of choice at non-zero mass and η(x) is the source field, as explained in Section 1.4. In our case, the lattice operator D is the non-perturbatively improved Wilson fermion operator. Equation (2.1) is a very large, sparse linear system that can be solved by generating a sequence of solutions q_n of increasing precision. The accuracy of the solution after n iterations is measured by the norm of the residue

r_n = \eta - D q_n. (2.2)

In the following, when we refer generically to the norm, the L2 normed vector space and the corresponding induced norm on the space of matrices are intended. Definitions and selected properties of the norms used in this work can be found in Appendix 2A. As presented in Section 1.4.2, the iterations of the KS algorithms are terminated when the inequality

\| r_n \| < \epsilon \,\| \eta \|, (2.3)

is satisfied, so that the quark propagator is obtained with accuracy ε. We will also refer to this parameter as the stopping criterion; it is a crucial element in the discussion throughout this chapter.
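For illustration, the stopping criterion of Equation (2.3) can be reproduced with SciPy's BiCGSTAB on a small sparse stand-in matrix (a shifted 1D Laplacian, not a Dirac operator). In recent SciPy versions the relative tolerance keyword is rtol (older releases used tol), and it plays exactly the role of ε here.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import bicgstab

# Toy stand-in for Eq. (2.1): a diagonally dominant tridiagonal matrix
# and a point-like source. Iteration stops once ||r_n|| < eps * ||eta||.
n = 1000
D = diags([-1.0, 2.5, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
eta = np.zeros(n)
eta[0] = 1.0

for eps in (1e-1, 1e-3, 1e-8):
    q, info = bicgstab(D, eta, rtol=eps, atol=0.0)
    r = eta - D @ q                      # residue of Eq. (2.2)
    print(eps, info, np.linalg.norm(r) / np.linalg.norm(eta))
```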
2.1.2 BiCGSTAB Solver History

As an example of the behavior of a Krylov space solver at different accuracies, in the left plot of Figure 2.1 we show the average time needed to compute the quark propagator as a function of the stopping criterion for all the ensembles listed in Table 2.1. Calculations are usually run with very small values of ε, to ensure that the solver precision is always below the statistical uncertainty of the ensemble averages. However, we note that reducing the precision of the calculation from ε = 10⁻⁸ to the range ε = 10⁻¹-10⁻³ would imply a significant reduction of the overall computational cost, which is what we aim to exploit with ML.

Figure 2.1: Left plot: average time to compute the quark propagator as a function of the stopping criterion for all the ensembles (see Table 2.1). To make the comparison fair, all timings have been obtained with the same computational setup, i.e. the same number of nodes, cores per node and geometry. The error bar represents the standard deviation of the timings over the configurations of the ensemble. Right plot: residuum as a function of the number of iterations for one sample configuration from each ensemble. Results in both plots are obtained with the BiCGSTAB algorithm.

For completeness, in the right plot we show the iterated residuum as a function of the number of iterations for a representative gauge field configuration of each ensemble in Table 2.1. It is clear that the trajectory of the solution is not monotonic in the iteration count, meaning that different local minima are explored by the solver; nevertheless, there is a general linear trend in the logarithm of the precision. The main idea of this work is to reduce the computational cost of the linear system for the quark propagator by using information about the solver history on a subset of the gauge ensemble, and then using supervised ML methods to speed up the computations on the rest of the ensemble. As a first attempt, we use numerical data at different stopping parameters ε as training and prediction data sets: for example, a precise measurement of the propagator (ε = 10⁻⁸) on a subset of the ensemble and a less precise one (ε = 10⁻¹, 10⁻², 10⁻³) on the whole ensemble. This would dramatically reduce the time required to compute observables.

2.2 Machine Learning Regression for Two-point Functions

The field of supervised Machine Learning is based on the idea that, for an arbitrary class of functions F = {f(x|θ) : θ ∈ R^l}, with f(x|θ) : R^n → R^m parameterized by θ, and given a finite set of pairs T = {x_i, y_i}, i = 1, ..., N, one can find the function f(x|θ) in the class that best approximates the probability distribution that generated the pairs in T. This may remind the reader of ordinary fitting, and indeed supervised learning is a superset of fitting procedures. The generalization comes from the possibility of using very generic function classes F, which is useful when the function F(x) : R^n → R^m governing a process of interest is unknown and perhaps hard, or even impossible, to define rigorously. In such cases, if a sample of input-output data from the function or process is available (the training set), supervised ML can at least provide an approximation of F(x) [26].

The procedure to obtain a valid approximation of F(x) has two parts: the choice of the class of functions F, or model (these are what are commonly known as ML methods or algorithms), and the optimization of the parameters θ, known in the jargon as weights. The optimization is usually done with gradient descent or derived methods, mainly Stochastic Gradient Descent (SGD) [26] and related algorithms such as Adam [27] or AdaGrad [28]. All these optimization methods require a loss function to estimate the distance of the current candidate f(x|θ*) from the known values of the training set. A simple choice is the squared-error loss, or L2 norm,

L_2 = \sum_{i=1}^{N} \left( y_i - f(x_i|\theta^*) \right)^2,

but other loss functions can be defined. While the choice of the minimization algorithm, its input and its parameters can affect the end result, the most important choice impacting the model is the class of functions F. For example, one can choose the set of linear functions from R^n to R^m, with an appropriate number of coefficients collected in θ; if the L2 norm is chosen for the optimization, the whole procedure reduces to the familiar case of linear regression. Other types of functions can be used, and there is a very active field of research in approximation theory studying the convergence properties of different function classes.
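To make the connection to fitting explicit, the sketch below "learns" a linear model with a squared-error loss by plain gradient descent, i.e. linear regression recovered as the simplest instance of supervised learning; the data-generating process and learning rate are invented for the example.

```python
import numpy as np

# Model class f(x|theta) = theta0 + theta1 * x, L2 loss, gradient descent.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 200)
y = 2.0 * x + 0.5 + 0.1 * rng.normal(size=200)   # toy "process" to learn

theta = np.zeros(2)
lr = 0.1
for _ in range(500):
    f = theta[0] + theta[1] * x
    grad = np.array([np.mean(f - y), np.mean((f - y) * x)])  # dL/dtheta
    theta -= lr * grad
print(theta)    # -> approximately [0.5, 2.0]
```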
2.2.1 Artificial Neural Networks

An Artificial Neural Network (ANN, or simply NN) is a function, or model, composed of a series of linear transformations alternated with non-linear operations. It has a set of unknown parameters, the weights, that schematically link sets of nodes organized into what are called layers. A very common subclass is that of feedforward NNs, in which the layers are connected only in order, the output of each layer being the input of the next. Among these, the most popular is the Multi-Layer Perceptron (MLP), a NN made of a series of fully connected layers, see Figure 2.2. It has been proven that MLPs are universal approximators [29, 30], meaning that any function can be approximated arbitrarily well by such a NN; however, the number of layers and their sizes are not determined uniquely, so only the existence of an optimal approximant is known, not how to find it. The inner layers are called hidden because they do not represent any meaningful state on their own and are therefore difficult to access and investigate.

Figure 2.2: Example structure of a Neural Network for regression. The sizes of the input and output layers are fixed by the function that needs to be fit. The black lines represent the weights connecting the different nodes.

Each node in a layer is based on the perceptron model, schematically drawn in Figure 2.3. Each node takes N input values {x1, ..., xN} and models the target y through an N-sized vector of weights w and a bias b:

y = \sigma\left( b + \sum_{i=1}^{N} w_i x_i \right), (2.4)

Figure 2.3: Neural Network unit used in this work. First a linear combination of the N inputs x_i with weights w_i is summed to a bias b; the result t is then passed to a non-linear activation function, a sigmoid in this case.

where σ is the activation function, a non-linear function; simple examples are the sigmoid,

\sigma(x) = \frac{1}{1 + e^{-x}}, (2.5)

or the Rectified Linear Unit (ReLU),

\sigma(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \geq 0. \end{cases} (2.6)

A layer is composed of a set of such perceptrons, and for a single layer the problem can be formulated in matrix notation:

y = \sigma(b + W \cdot x). (2.7)

This generalizes to more layers. For example, a 3-layer MLP would be:

f(x) = \sigma_3\big(b_3 + W_{23} \cdot \sigma_2(b_2 + W_{12} \cdot \sigma_1(b_1 + W_{01} \cdot x))\big), (2.8)

which is clearly a series of alternating linear and non-linear transformations of the input vector x. Often, and in particular for regression NNs, the outermost function σ is taken to be the identity. The number of layers and their sizes are not fixed and have to be chosen wisely: these are examples of what are called hyperparameters. Other empirical choices are, for example, the type of activation function for each layer, the initial values of the weights and biases before the minimization is run, and the minimization algorithm with its parameters, see Appendix 2C.
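The sketch below implements the forward pass of Equation (2.8) directly: alternating affine maps and sigmoids, with an identity activation on the output layer as used for regression. The layer sizes (3 → 8 → 8 → 1) and the random, untrained weights are illustrative only.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Randomly initialized weight matrices W and biases b of Eq. (2.8);
# sizes are illustrative.
rng = np.random.default_rng(4)
sizes = [3, 8, 8, 1]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

def mlp(x):
    """Alternating affine and non-linear maps; identity on the output."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        t = b + W @ x
        x = t if i == len(weights) - 1 else sigmoid(t)
    return x

print(mlp(np.array([0.1, -0.2, 0.3])))   # one (untrained) regression output
```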
To find the optimal weights and biases of our NN we use the backpropagation algorithm, nowadays the workhorse for fast learning in NNs. The backpropagation algorithm became popular after the article by Rumelhart et al. [31], but it has its origin in the Master's thesis results of S. Linnainmaa [32]. The backpropagation algorithm is the crucial element of all gradient descent methods, as it allows the calculation of the partial derivatives of the loss function with respect to all the weights and biases without having to store any data or to have an explicit functional form for the derivatives. It is one of the most important examples of dynamic programming.

In general, the optimization of the parameters of a NN is a highly non-trivial, non-convex optimization task. One of the main issues is the presence of several local minima of the loss function. In practice, one can only test on some sample data whether the fitted parameters give a good representation of the data or not. This is made worse by the risk of over-fitting the model to the training data, exacerbated by the usually very large number of parameters. To mitigate this problem, a subset of the training data is often withheld from the fit and used for validation or bias correction. Another way to mitigate over-fitting is to impose constraints on the values of the parameters, for example by modifying the loss function, see Appendix 2C.

2.2.2 Partitioning of the Data

As previously stated, in this work we want to train a NN to model the trajectory of the BiCGSTAB solver from one or more low precisions to a high one. To do so, we compute the hadron two-point functions C(t) on all N configurations of each ensemble in Table 2.1. These correlation functions are obtained from propagators determined with four different stopping criteria: ε = 10⁻¹, 10⁻², 10⁻³, 10⁻⁸.

Out of the N configurations, we choose a fraction f = N_L/N to be used for training our ML model. We call this partition the labeled fraction, because for each of its configurations we know both the low-precision value of the correlation function and the target value at ε = 10⁻⁸ that we want to reproduce during training. The test set, from which the ML estimator for the correlator is computed, consists of the remaining (1 − f)N configurations of the ensemble. The data for the precise correlator (ε = 10⁻⁸) on this set is never used in the ML process; it serves only to compare the ML predictions with a full-length traditional calculation and thus to test the efficiency of the model discussed. The labeled configurations are further partitioned into two sets of sizes N_T and N_B: the first is used for training the NN model, while the second is used for bias correction, as done for example in [13] and shown in more detail in Section 2.3.1. Figure 2.4 shows a simple diagram of the data partitioning for clarity.
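A minimal sketch of this partitioning; N, the labeled fraction f and the training/bias split are illustrative numbers, not the choices studied later in this chapter.

```python
import numpy as np

# Split N configurations into: labeled fraction f (further split into
# training and bias-correction sets) and the unlabeled test set.
rng = np.random.default_rng(5)
N, f, frac_bias = 400, 0.25, 0.4          # illustrative choices
idx = rng.permutation(N)
n_labeled = int(f * N)
n_bias = int(frac_bias * n_labeled)
labeled, test = idx[:n_labeled], idx[n_labeled:]
bias_set, train_set = labeled[:n_bias], labeled[n_bias:]
print(len(train_set), len(bias_set), len(test))   # N_T, N_B, (1 - f) N
```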
2.2.3 Single Point Correlator Models

The ML models considered in this study all focus on finding relations between a hadronic correlator C(t) at low precision and the precise one. In what we call Single Point Neural Network (SPNN) models, the relation is found independently for every source-sink separation time; this effectively corresponds to training T different ML models, one for each value of t, for every hadronic correlator. There is much freedom in the choice of model, as one can choose different input variables while keeping the target variable fixed; however, one has to consider that increasing the dimensionality D of the problem can affect the convergence. The single-point models used in this work are the following:

Figure 2.4: Simple diagram of how the data is partitioned. On the whole ensemble the "sloppy" correlator C^S(t) is computed from propagators with stopping criterion ε = 10⁻¹, 10⁻², 10⁻³. On just N_L configurations the "precise" correlator C^P(t) is computed with ε = 10⁻⁸.

Figure 2.5: Schematic representation of a NN for the model with multiple ε, SPNN2. It is important to note that all layers in this model are rather small.

• SPNN2: a Neural Network using ε = 10⁻¹, 10⁻² as input and ε = 10⁻⁸ as target. This is done independently for every source-sink separation time t, i.e. a neural network is created and trained for every t.

• SPNN3: same as SPNN2 but using ε = 10⁻¹, 10⁻², 10⁻³ as input and ε = 10⁻⁸ as target.

• SPNN2t1/SPNN3t1: a Neural Network using ε = 10⁻¹, 10⁻² (and 10⁻³) at times t and t ± 1 as input and ε = 10⁻⁸ as target. The difference with SPNN2/SPNN3 is that the input given to the NN also contains data from neighboring source-sink separation times.

• SPNN2t2/SPNN3t2: a Neural Network using ε = 10⁻¹, 10⁻² (and 10⁻³) at times t, t ± 1 and t ± 2 as input and ε = 10⁻⁸ as target.

The dimensionality of each model space is then D = (2n + 1)N_ε, where N_ε is the number of stopping parameters used as input and n is the number of neighboring time points used. In Figure 2.5 we show the example structure of the NN corresponding to the SPNN3 model. To fix the size of the hidden layers, we followed a semi-empirical method that gradually reduces the size of the network until its performance decays; more details are given in Appendix 2C.

2.2.4 Global Correlator Models

A more radical approach is to build a model that directly relates a low-precision correlator, considered as a vector of size T (or T × N_ε, if data at multiple stopping parameters is used as input at once), to a high-precision one. Clearly, the dimensionality of the problem is much larger than in the previous section, as the ML model now represents a function f : R^{T×N_ε} → R^T. Depending on the maximum ε used, we name these models GNN2 and GNN3, in analogy with the single-point models. In Figure 2.6 we show the example of a NN corresponding to the GNN2 model. The sizes of the hidden layers have been set following the procedure in Appendix 2C.

Figure 2.6: Schematic representation of a NN for the global correlator model. This particular NN corresponds to the GNN2 model. The input layer is of size T × N_ε, while the output layer is of size T. The size and number of the intermediate hidden layers are parameters of the model and differ from the ones in the picture.
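For concreteness, the following sketch builds and trains a small SPNN2-like regression network in PyTorch (assumed available; this is not the exact architecture or training setup used in this work), mapping two sloppy correlator values at fixed t to the precise one. The training data here are synthetic stand-ins.

```python
import torch
from torch import nn

# SPNN2-like model: two sloppy inputs -> one precise output at fixed t.
model = nn.Sequential(
    nn.Linear(2, 16), nn.Sigmoid(),
    nn.Linear(16, 16), nn.Sigmoid(),
    nn.Linear(16, 1),                 # identity output for regression
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                # squared-error (L2) loss

x = torch.randn(300, 2)               # synthetic stand-in for sloppy data
y = 0.7 * x[:, :1] + 0.3 * x[:, 1:]   # synthetic "precise" target
for _ in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                   # backpropagation
    opt.step()
print(loss.item())                    # should be small after training
```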
2.3 Accelerating Correlator Calculations with ML

The first consideration to ensure the feasibility of our ML procedure is to check whether there is some degree of correlation between the hadron correlators at different precisions. For notation's sake, given a gauge configuration U_i and a propagator evaluated with accuracy ϵ = 10^{-n}, we label the correlator as C_n(t; U_i) = C_n^{(i)}(t). The average over the ensemble is denoted simply as

\bar{C}_n(t) = \frac{1}{N} \sum_{i=1}^{N} C_n^{(i)}(t).

To determine the correlation among the 2-point functions evaluated at different precisions, we define the usual sample correlation coefficient

r_{mn} = \frac{\sum_{i=1}^{N} \left( C_m^{(i)}(t) - \bar{C}_m(t) \right) \left( C_n^{(i)}(t) - \bar{C}_n(t) \right)}{\sqrt{\sum_{i=1}^{N} \left( C_m^{(i)}(t) - \bar{C}_m(t) \right)^2} \sqrt{\sum_{i=1}^{N} \left( C_n^{(i)}(t) - \bar{C}_n(t) \right)^2}}.  (2.9)

In the top plot of Figure 2.7 we show a heat map of r_{28} for the ensemble M2 as a function of the Euclidean time separation t/a between the two interpolating operators, for selected hadrons. In the bottom plots, we plot the actual value of C_2(t; U_i) against C_8(t; U_i) for two different values of t for the pion: a small value, where the excited-state contributions should still be large (bottom left), and an asymptotic one, where the ground state should be dominant (bottom right). We clearly observe a high level of correlation for most of the 2-point functions. The correlation, though, decreases towards the center of the lattice, where the signal-to-noise ratio decreases. One could phenomenologically conclude that one starts to lose the signal when the accurate solution for the quark propagator becomes statistically uncorrelated with the quark propagator obtained during the first steps of the iterations. Considering the iterative nature of the Krylov-space solvers, the loss of correlation further away from the source location is expected: the higher the accuracy requested, the higher the number of iterations and the larger the distance from the source where the solution is accurate. Nevertheless, the clear correlation, which to first approximation is linear even in the large-t case, can be exploited to extract information about the precise correlator starting from the sloppy one.

Figure 2.7: Top: the correlation coefficient between the correlator calculated at ϵ = 10^{-2} and ϵ = 10^{-8} for various hadrons as a function of the source-sink separation time. Bottom: the proton correlator at two precisions (ϵ = 10^{-2}, 10^{-8}) plotted against each other for two different values of t.

In Figure 2.8 we show all the data that is used as input for our calculations, both in the form of the correlator and as the effective mass plot, where central values and error bars are calculated using the information of all configurations of the ensemble by means of the bootstrap method, see Appendix 1A.2. Our ML procedure intuitively has the aim of finding a mapping from the lower-ϵ data to the high precision one (the red data in the plots) at the configuration level, exploiting the correlation between different precisions. One thing to notice is that the data for ϵ = 10^{-3} overlaps very closely with the precise results at ϵ = 10^{-8}. This is the reason why we decided to consider the pairs of models SPNN2, SPNN3 and GNN2, GNN3: to check whether the results are significantly different when the ϵ = 10^{-3} data is included. One has to consider, however, that there is a computational cost difference between ϵ = 10^{-3} and ϵ = 10^{-2}, so this, as we will see in Section 2.4.1, needs to be accounted for when comparing the results.
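The coefficient in Eq. (2.9) is straightforward to evaluate; as a minimal sketch, assuming two arrays holding the per-configuration correlators at a fixed t:

```python
import numpy as np

def correlation_coefficient(c_m, c_n):
    """Sample correlation r_mn of Eq. (2.9) between per-configuration
    correlators at two precisions, each an array of shape (N,) at fixed t."""
    dm = c_m - c_m.mean()
    dn = c_n - c_n.mean()
    return np.sum(dm * dn) / np.sqrt(np.sum(dm**2) * np.sum(dn**2))

# Equivalently: np.corrcoef(c_m, c_n)[0, 1]
```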
Figure 2.8: Top row: two-point correlator for the pion (left) and proton (right). Bottom row: effective mass for the pion (left) and proton (right). The different data series represent the varying stopping criterion ϵ. All data are for the ensemble M2.

2.3.1 Bias Correcting Procedure

Throughout this analysis, we employed a technique to remove the bias of the high precision ML estimates, as outlined in [33], by considering the chance that the low-precision data for an observable, O^{LP}, and the high-precision one, O^{HP}, could have different expectation values. Calling O^{*HP} the high precision estimate from the ML model using O^{LP}, one then considers the expectation value of O^{HP}:

\langle O^{HP} \rangle = \langle O^{*HP} \rangle + \langle O^{HP} - O^{LP} \rangle.  (2.10)

The second term is what we call the bias correcting term. It can be estimated with high precision from a small sample of data when the fluctuations of O^{HP} and O^{LP} are highly correlated, which we assume is the case for the problem at hand, because the underlying gauge dynamics affects the correlator results at different precisions in a similar way. The unbiased estimator for the observable then becomes

\bar{O}^{HP} = \frac{1}{N - N_L} \sum_{i=1}^{N - N_L} O_i^{*HP} + \frac{1}{N_B} \sum_{j=1}^{N_B} \left( O_j^{HP} - O_j^{LP} \right).  (2.11)

It is important to note that the two sums run over different partitions of the ensemble data, as shown in Figure 2.4. The first sum runs over the N − N_L unlabeled configurations, on which only the low precision data is known and the value of O^{*HP} is calculated using the ML model. The second sum runs over the bias correction subset of the labeled fraction of the ensemble, for which both O^{HP} and O^{LP} are known. This bias correcting procedure is the same as in the TSM method of [17], but here it is done at the ensemble level and not for a single configuration.
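As a minimal sketch of Eq. (2.11), assuming hypothetical arrays of per-configuration values on the two partitions of Figure 2.4:

```python
import numpy as np

def unbiased_estimate(o_pred_unlabeled, o_hp_bias, o_lp_bias):
    """Bias-corrected estimator of Eq. (2.11).
    o_pred_unlabeled: ML predictions O*^HP on the N - N_L unlabeled configs.
    o_hp_bias, o_lp_bias: precise and sloppy values O^HP, O^LP on the N_B
    bias-correction configurations."""
    return o_pred_unlabeled.mean() + (o_hp_bias - o_lp_bias).mean()
```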
2.3.2 Error determination

A popular method to improve the statistical analysis of lattice QCD data is the bootstrap resampling method [34]. Since, in this work, we present both results coming from a "traditional," or "fully precise," calculation and results coming from an ML-aided method, it is important to specify how the bootstrapping of the data is performed in the two cases. The data of the traditional calculation come from bootstrapping the results for the correlator C(t) from each configuration. As usual, in order to preserve the correlations between the data, one needs to apply the same sampling to all values of t. Throughout this work, we used N_boot = 1000 samples of size N (the number of configurations in the ensemble) for this case. The values for the variance of all the expectation values presented come from the variance of the bootstrap sample averages. More details are found in Appendix 1A.2.

The determination of the uncertainty coming from the ML procedure is more involved. Crucially, one must not mix the labeled data with the unlabeled data. Our first step was then to create this partition, even though in an actual application of this method this would not be necessary, as the precise calculation data would not be available for all configurations. Each subset of the data is then bootstrapped independently, but within the same partition all the data for the different precisions, including the precise one, are sampled using the same random draws for the bootstrap, which are also maintained across different values of t, to preserve the correlations between the data. The bootstrap samples of the labeled data are then used to train N_boot neural networks (all with the same structure) and to calculate the bias correcting term. To each of these NN a bootstrap sample of unlabeled data is assigned and the ML estimator for each NN is computed, so that the sample expectation value of an observable is the average of the bias-corrected NN predictions on the sampled variables. Finally, as is standard for the bootstrap method, the final estimator and its variance are the average of the sample expectation values and their variance, respectively.

2.4 Numerical Results

We tested all of our models on the ensembles from the PACS-CS collaboration presented in Table 2.1 to understand the feasibility of our ML procedure for hadron correlators. Each model was trained using the bootstrap, as presented in Appendix 1A.2, so that the results presented below are all bootstrap averages over different NN.

2.4.1 Performance Metrics

To get a quantitative estimate of the performance of the algorithms, we define a computational-time gain factor Γ_ML for every model. It depends on the fraction f of the ensemble used for training the NN and on the minimum ϵ used as input for the model. If we denote by t_P the total time that one would need to determine the quark propagator with the highest accuracy, ϵ = 10^{-8} (the traditional calculation), the relative time gain factor is

\Gamma_{ML}(f) = \frac{t_{\epsilon_{min}}}{t_P} (1 - f) + f.  (2.12)

This is the sum of the relative total time required to obtain the correlator with the highest low precision ϵ_min, either 10^{-2} or 10^{-3} depending on the model, on the N − N_L configurations where only the low precision is needed, (t_{ϵ_min}/t_P)(1 − f), and the training fraction f itself. Γ_ML is not a time but rather a ratio of times representing a gain; it is bounded by 1, reached when the time t_ϵ to compute the low precision correlator equals the precise time t_P or when f = 1, but since in general t_ϵ < t_P, we have Γ_ML < 1. The smaller Γ_ML, the larger the computational gain. In computing Γ_ML(f) we neglected the time spent training the NN, which is negligible when compared to the propagator calculations: it is of the order of ≈ 10 minutes on a regular laptop.

In order to assess the quality of the results, we computed three quantities for each model:

• the ratio of the variances, σ^2_{ML}/σ^2_E, between the ML results and the high precision ones;

• the compatibility of the results, |⟨O⟩_{ML} − ⟨O⟩_E| / \sqrt{σ^2_{ML} + σ^2_E};

• the algorithmic efficiency, which we define as

AE(f) = \frac{1}{\Gamma_{ML}(f)} \frac{\sigma_E^2}{\sigma_{ML}^2}.  (2.13)

The last quantity combines the time gain factor and the loss of precision in the variance. The higher it is, the more efficient it is to use the ML algorithm as opposed to performing a traditional calculation.
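Both metrics are trivial to evaluate; a minimal sketch of Eqs. (2.12) and (2.13), with all arguments as plain numbers:

```python
def gamma_ml(f, t_eps_min, t_p):
    """Relative time-gain factor of Eq. (2.12): cost of sloppy solves on the
    unlabeled fraction plus precise solves on the training fraction f,
    relative to precise solves on the whole ensemble."""
    return (t_eps_min / t_p) * (1.0 - f) + f

def algorithmic_efficiency(f, t_eps_min, t_p, var_e, var_ml):
    """Algorithmic efficiency of Eq. (2.13); values above 1 favor the
    ML-assisted calculation over the fully precise one."""
    return (var_e / var_ml) / gamma_ml(f, t_eps_min, t_p)

# Example: sloppy solves 10x cheaper, f = 0.2, ML variance 1.5x larger.
print(algorithmic_efficiency(0.2, 1.0, 10.0, 1.0, 1.5))  # ~2.38
```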
These quantities have been computed for multiple observables, namely the correlator C(t_1) at a fixed small Euclidean time t_1 ≈ T/10, the correlator C(t_2) at a fixed asymptotic time t_2 ≈ T/3, and the effective mass, calculated using a cosh fit in the case of mesons and a constant fit for baryons.

2.4.2 Single Point Models

In the following pages, we show the results for the variance ratio, the compatibility of the results and the algorithmic efficiency for all the ML models we presented, for both the pion and proton data on the ensemble M2. There are three plots for each quantity, as it is measured on t_1, t_2 and on the effective mass. For clarity, by "fully precise" (FP) results we mean the observables computed over all configurations of the ensemble with a high precision propagator.

The first observation to make is that all single-point models behave very similarly. The discrepancy between the SPNN3 class of models and the SPNN2 class is more evident; increasing the number of time points in the model, i.e. going from SPNN2 to SPNN2t1 and SPNN2t2 (or the equivalents for SPNN3), does improve the results, but not as much as increasing the number of precisions used in the model. This is very clear at large t and for the effective mass. There appears to be a larger gain in adding time points to the SPNN2 model than to the SPNN3 model.

One thing that is apparent from Figure 2.9 is that the ML results are always compatible with the FP results, with compatibility consistently less than 0.8, so the central value obtained from ML is always well within 1σ of the precise one. A more interesting observation comes from looking at the variance ratio, Figure 2.10: we observe the variance for single points being always very close to the FP one for sufficiently large f, but much higher for the effective mass. The fact that for low f the variance ratio is large is indicative of the inability of the ML models to fully capture the relation between the data. At large f, one would expect the variance ratio to rise, as less information can be extracted from the data and the effect of increasing f only results in reducing the sample size. We observe this behavior clearly for the pion data at t_1, where in fact we expect the convergence of the solver to be extremely fast, as it is close to the source. At the same time, for other quantities and for the proton, the effect is small, but the variance ratio seems to plateau at a fixed value.
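For reference, the effective mass that enters these comparisons can be extracted point-by-point from a periodic correlator by solving a cosh ratio. The sketch below is a simplified stand-in for the cosh and constant fits actually used; the root-finding bracket is chosen purely for illustration and assumes reasonably clean data.

```python
import numpy as np
from scipy.optimize import brentq

def effective_mass_cosh(c, t, T):
    """Solve C(t)/C(t+1) = cosh(m (t - T/2)) / cosh(m (t+1 - T/2)) for the
    effective mass m, given a periodic correlator c of temporal extent T."""
    ratio = c[t] / c[t + 1]
    f = lambda m: np.cosh(m * (t - T / 2)) / np.cosh(m * (t + 1 - T / 2)) - ratio
    return brentq(f, 1e-8, 10.0)  # bracket is illustrative; data permitting
```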
Figure 2.9: Compatibility between the ML results, using SPNN methods, and the fully precise results for the ensemble M2 as a function of the training fraction. The top row is the correlator at time t_1 = T/10, the middle row is the correlator at time t_2 = T/3 and the lower row is the result of a constant fit for the effective mass.

Figure 2.10: Variance ratio between the ML results, using SPNN methods, and the fully precise results for the ensemble M2 as a function of the training fraction. The top row is the correlator at time t_1 = T/10, the middle row is the correlator at time t_2 = T/3 and the lower row is the result of a constant fit for the effective mass.

Figure 2.11: Algorithmic efficiency of the ML results, using SPNN methods, with respect to the fully precise results for the ensemble M2 as a function of the training fraction. The top row is the correlator at time t_1 = T/10, the middle row is the correlator at time t_2 = T/3 and the lower row is the result of a constant fit for the effective mass.

When looking at the AE in Figure 2.11, however, it is clear that the SPNN2 models lead to better efficiency, as their time gain factor Γ_ML is much smaller, because they only use propagators up to ϵ = 10^{-2}, which are cheaper. The AE is also an effective measure to discern the optimal training fraction: larger training fractions do improve the variance slightly, but come at a higher computational cost. From our data, one can infer that for the pion the optimal range for f is somewhere in the interval 0.10 − 0.25, while for the proton a larger training fraction is needed, in the range 0.15 − 0.30. It is also interesting to see that for t_2 and for the effective mass of the proton the AE is higher for SPNN3 than for SPNN2, and that the AE is smaller than 1 for some combinations of model and f. This could indicate that in the region where the signal-to-noise ratio starts to degrade, one is forced to use higher-precision solvers.
In general, we find that there are wide ranges of f where the algorithmic efficiency is larger than 1, meaning that the ML method is an effective way to speed up the calculation of hadron two-point correlators and the effective mass, the best case being the pion, for which a computational gain of more than a factor of 2 can be achieved consistently.

There are, however, some critical challenges to this ML approach. The first is the wild fluctuation of the results shown in the figures above. The ML training procedure works well on average, but it seems to fail catastrophically for some setups: for example, one can look at the proton variance ratio at f = 0.10 for t_2 in Figure 2.10. For all SPNN2 and SPNN3 models there is a sharp spike, also moderately present at t_1 but not for the effective mass. This could be due to the sample of data used for training, since it is present irrespective of the model chosen. A hint about the source of the issue comes from the compatibility, which for the same point is strikingly low. This suggests that the ML model is capturing the relation between the low and high precision data, but failing to reproduce its correct distribution; hence this could be a case of over-fitting. However, given the small sample size and the small size of the ensemble in general, this can be a hard problem to overcome.

The second issue comes from the difference in the variance ratio between the single points of the correlator, C(t_1) and C(t_2), and the effective mass. For the SPNN models, the variance is always much larger for the effective mass than for the correlator at sufficiently large f, where the ML models have converged. This potentially indicates a loss of correlation between the data at different t in the final ML results, which is problematic when, for example, performing fits to the data, as in the case of the effective mass. This is a natural consequence of training independent models for each value of t: while the input data is prepared in such a way that correlations between the data are preserved (the bootstrap samples used for training at different values of t all resample the data with the same random sequence), the output data depends on the training and the inner workings of the NN. As a result, the values of the correlator predicted by the ML model on a single configuration might be decorrelated at different t. We can see, however, that this effect is mitigated by adding the data for more values of t as input to the NN, which is the reason why SPNN2t1 and SPNN2t2 perform better than SPNN2, and the same for the SPNN3 class. Adding the additional time slices increases the information about the time correlation in the NN, but from our results we can conclude that this is not enough to estimate, for example, m_eff correctly.

In Figure 2.12 we show the results on different ensembles for the SPNN3t2 model, which is the one that uses the most data and, in general, performs best, even though it has a lower time gain factor than the SPNN2 models. We observe no significant difference in behavior as masses, volumes and lattice spacings are changed, which is a good indication that this ML procedure can be applied safely over a wide range of physical parameters of the lattice gauge field ensembles. In general, the SPNN models are an effective method for reconstructing the hadron correlators and their distribution from low precision data.
However, there are concerns that should be addressed for their practical use in calculations of secondary quantities like the effective mass.

Figure 2.12: Variance ratio (top), compatibility (middle) and algorithmic efficiency (bottom) of the ML effective mass results, using the SPNN3t2 method, with respect to the fully precise results for the ensembles of Table 2.1 as a function of the training fraction.

2.4.3 Global Network Models

The global network models are clearly much more complicated to train than the single-point networks. For a start, their input and output layers are of the order of the number of lattice sites in the temporal dimension, but for ML models this is not an unprecedented size. What is potentially problematic is the nature of the data itself, in particular the exponential decay of the correlator as t increases: the data spans several orders of magnitude and can, in exceptional cases, be negative, which prevents a logarithmic treatment. As explained in Appendix 2C, the way to solve this issue is to scale all the data at a given time slice to have unit variance and zero mean. While this procedure has no consequence in the SPNN case, in GNN models it effectively means that the network is only left to learn the correlations between the data and not the general behavior of the function.

In the following figures, we show the results for the effective mass from GNN3 on all ensembles; the results for GNN2 are very similar, so they have been omitted for clarity of the figures. The most striking difference between SPNN and GNN models is the inversion of the roles of the variance and of the compatibility. The results have, in fact, a very small variance ratio when compared to the fully precise calculation, but end up being a few σ away from the true results. The low variance is a clear sign of the network's inability to fully capture the distribution of the correlator at every point t, while the poor compatibility of the results implies that the central value of the distribution is also incorrect, at least in the region where the effective mass fit is performed. This also points to another problem coming from the scaling of the data: GNN models have no way to distinguish, in the case of the proton, points where the signal-to-noise ratio is large (small t) from points where it is small (t ≈ T/2), so the effectiveness of the training is significantly reduced. This is, however, not an exhaustive explanation of the effect, as the pion is also affected.

An apparent limitation of the GNN models is the extreme fluctuation of the results, especially of the compatibility. From the bottom two panels of Figure 2.13, we observe, for example, that the compatibility of m_eff for the pion on the A2 ensemble oscillates from ≈ 7 to almost 0 when changing f from 0.1 to 0.2.
This is a clear indication of the instability of the method. One last consideration to make is that the GNN does give a small variance for the effective mass, as opposed to the SPNN models, implying that the correlations between different t are indeed preserved, although around the wrong central value and with an artificially small variance. Overall, GNN models do not seem suited to the problem at hand, as they fail to reconstruct the hadron correlator distribution properly.

Figure 2.13: Variance ratio (top) and compatibility (bottom) of the ML effective mass results, using the GNN3 method, with respect to the fully precise results for the ensembles of Table 2.1 as a function of the training fraction.

2.5 Outlook

Currently, the feasibility of this ML procedure is still being studied. Some preliminary results seem encouraging; however, the question remains of how reliable the proposed algorithm is. We believe that there is certainly some information to be extracted from the solver history in the determination of quark propagators, and that this can be used to speed up the calculation. What we have found is that applying ML to derived quantities like the two-point correlation function or the effective mass is not sufficient to reach a degree of confidence in the results that is satisfactory for us. This is especially true as, in practice, there is always the possibility of performing the fully precise calculation, which is "only" more expensive numerically, but certainly more stable and reliable. If we want to adopt this ML routine, we would have to be confident that the computational gain comes at no expense of the final precision of our results.

One possible way forward would be to test the limits of validity of this ML procedure further, i.e. going to larger volumes or to smaller pion masses. In those ranges, the numerical cost increases dramatically, which might keep the algorithmic efficiency consistently large. This would effectively mean that the time saved by calculating only low precision propagators could be spent on sampling more sources on the same gauge field. This last point also suggests another possible improvement of our derived-quantities approach, which is to use multiple stochastic sources per configuration and perform the ML training on their average, or use them to effectively enlarge the training set size. However, this would restrict the usefulness of the algorithm to relatively simple derived quantities, as most of the information on the single linear system solve would be averaged out. This approach would then effectively be a variation of the TSM.

A more interesting development would probably come from accessing the propagator directly during the iterative solve and training a model on the full quark propagator as opposed to the correlator. However, this is very hard to implement in practice, for two reasons. The first is that the current LQCD software that computes propagators is highly optimized (mostly GPU) code, which makes it hard to modify efficiently.
The second hurdle to overcome would be the large size of the network needed for training, as the quark propagator is a very large object in terms of numerical size (proportional to the lattice size), which would make the training of such a model very expensive and challenging. One last attempt would be to apply different ML strategies, like analytic continuation or unsupervised learning, but those would have to be radically different approaches from what we have presented. Nevertheless, utilizing the solver trajectory to gain information on the quark propagator and speed up its calculation is something that should be explored further, and not just for lattice QCD applications. Beyond the computation of hadronic observables, this procedure could possibly be adapted to speed up the calculation of pseudo-fermions in the Markov Chain Monte Carlo algorithms and, more generally, it could enable calculations on large lattices, where the Dirac operator is prohibitively large for current computing architectures.

BIBLIOGRAPHY

[1] B. Joó et al., "Status and future perspectives for lattice gauge theory calculations to the exascale and beyond," The European Physical Journal A, vol. 55, no. 11, p. 199, Nov. 14, 2019, issn: 1434-601X. doi: 10.1140/epja/i2019-12919-7.

[2] R. Brower, N. Christ, C. DeTar, R. Edwards, and P. Mackenzie, "Lattice QCD application development within the US DOE exascale computing project," EPJ Web of Conferences, vol. 175, M. Della Morte, P. Fritzsch, E. Gámiz Sánchez, and C. Pena Ruano, Eds., p. 09010, 2018, issn: 2100-014X. doi: 10.1051/epjconf/201817509010.

[3] C. Drischler et al., "Towards grounding nuclear physics in QCD," Progress in Particle and Nuclear Physics, vol. 121, p. 103888, Nov. 1, 2021, issn: 0146-6410. doi: 10.1016/j.ppnp.2021.103888.

[4] S. Erotokritou, "PRACE 24th call for proposals supports high-gain scientific research and industrial innovations," PRACE, May 3, 2022. [Online]. Available: https://prace-ri.eu/prace-24th-call-for-proposals-supports-high-gain-scientific-research-and-industrial-innovations/.

[5] H. A. van der Vorst, "Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems," SIAM Journal on Scientific and Statistical Computing, vol. 13, no. 2, pp. 631–644, Mar. 1992, issn: 0196-5204, 2168-3417. doi: 10.1137/0913035.

[6] J. Brannick, R. C. Brower, M. A. Clark, J. C. Osborn, and C. Rebbi, "Adaptive multigrid algorithm for lattice QCD," Physical Review Letters, vol. 100, no. 4, p. 041601, Jan. 28, 2008, issn: 0031-9007, 1079-7114. doi: 10.1103/PhysRevLett.100.041601.

[7] M. Lüscher, "Solution of the Dirac equation in lattice QCD using a domain decomposition method," Computer Physics Communications, vol. 156, no. 3, pp. 209–220, Jan. 15, 2004, issn: 0010-4655. doi: 10.1016/S0010-4655(03)00486-7.

[8] M. Lüscher, "Local coherence and deflation of the low quark modes in lattice QCD," Journal of High Energy Physics, vol. 2007, no. 7, pp. 081–081, Jul. 30, 2007, issn: 1029-8479. doi: 10.1088/1126-6708/2007/07/081.

[9] J. M. Tang, R. Nabben, C. Vuik, and Y. A. Erlangga, "Comparison of two-level preconditioners derived from deflation, domain decomposition and multigrid methods," Journal of Scientific Computing, vol. 39, no. 3, pp. 340–370, Jun. 1, 2009, issn: 1573-7691. doi: 10.1007/s10915-009-9272-6.

[10] E. Romero, A. Stathopoulos, and K. Orginos, "Multigrid deflation for lattice QCD," Journal of Computational Physics, vol. 409, p. 109356, May 15, 2020, issn: 0021-9991. doi: 10.1016/j.jcp.2020.109356.
[11] R. Babich et al., "Adaptive multigrid algorithm for the lattice Wilson-Dirac operator," Physical Review Letters, vol. 105, no. 20, p. 201602, Nov. 11, 2010. doi: 10.1103/PhysRevLett.105.201602.

[12] A. Frommer, K. Kahl, S. Krieg, B. Leder, and M. Rottmann, "Adaptive aggregation-based domain decomposition multigrid for the lattice Wilson–Dirac operator," SIAM Journal on Scientific Computing, vol. 36, no. 4, pp. A1581–A1608, Jan. 2014, issn: 1064-8275, 1095-7197. doi: 10.1137/130919507.

[13] B. Yoon, "A machine learning approach for efficient multi-dimensional integration," Scientific Reports, vol. 11, no. 1, p. 18965, Sep. 23, 2021, issn: 2045-2322. doi: 10.1038/s41598-021-98392-z.

[14] T. Lechien and D. Dudal, "Neural network approach to reconstructing spectral functions and complex poles of confined particles," 2022, arXiv. doi: 10.48550/ARXIV.2203.03293.

[15] R. Abbott et al., "Sampling QCD field configurations with gauge-equivariant flow models," 2022, arXiv. doi: 10.48550/ARXIV.2208.03832.

[16] S. Foreman, T. Izubuchi, L. Jin, X.-Y. Jin, J. C. Osborn, and A. Tomiya, "HMC with normalizing flows," 2021, arXiv. doi: 10.48550/ARXIV.2112.01586.

[17] G. S. Bali, S. Collins, and A. Schäfer, "Effective noise reduction techniques for disconnected loops in lattice QCD," Computer Physics Communications, vol. 181, no. 9, pp. 1570–1583, Sep. 1, 2010, issn: 0010-4655. doi: 10.1016/j.cpc.2010.05.008.

[18] Y. Iwasaki, "Renormalization group analysis of lattice theories and improved lattice action: Two-dimensional non-linear O(N) sigma model," Nuclear Physics B, vol. 258, pp. 141–156, Jan. 1, 1985, issn: 0550-3213. doi: 10.1016/0550-3213(85)90606-6.

[19] B. Sheikholeslami and R. Wohlert, "Improved continuum limit lattice action for QCD with Wilson fermions," Nuclear Physics B, vol. 259, no. 4, pp. 572–596, Sep. 30, 1985, issn: 0550-3213. doi: 10.1016/0550-3213(85)90002-1.

[20] M. Lüscher, S. Sint, R. Sommer, P. Weisz, and U. Wolff, "Non-perturbative O(a) improvement of lattice QCD," Nuclear Physics B, vol. 491, no. 1, pp. 323–343, Apr. 28, 1997, issn: 0550-3213. doi: 10.1016/S0550-3213(97)00080-1.

[21] R. Horsley et al., "The electric dipole moment of the nucleon from simulations at imaginary vacuum angle theta," 2008, arXiv. doi: 10.48550/ARXIV.0808.1428.

[22] S. Aoki et al., "Non-perturbative renormalization of quark mass in Nf = 2 + 1 QCD with the Schrödinger functional scheme," Journal of High Energy Physics, vol. 2010, no. 8, p. 101, Aug. 23, 2010, issn: 1029-8479. doi: 10.1007/JHEP08(2010)101.

[23] J. Dragos, T. Luu, A. Shindler, J. de Vries, and A. Yousif, "Confirming the existence of the strong CP problem in lattice QCD with the gradient flow," Physical Review C, vol. 103, no. 1, p. 015202, Jan. 19, 2021. doi: 10.1103/PhysRevC.103.015202.

[24] P. H. Ginsparg and K. G. Wilson, "A remnant of chiral symmetry on the lattice," Physical Review D, vol. 25, no. 10, pp. 2649–2657, May 15, 1982. doi: 10.1103/PhysRevD.25.2649.

[25] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed. Society for Industrial and Applied Mathematics, Jan. 2003, isbn: 978-0-89871-534-7. doi: 10.1137/1.9780898718003.

[26] T. Hastie, J. Friedman, and R. Tibshirani, The Elements of Statistical Learning (Springer Series in Statistics). New York, NY: Springer, 2001, isbn: 978-0-387-21606-5. doi: 10.1007/978-0-387-21606-5.
[27] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv. doi: 10.48550/ARXIV.1412.6980.

[28] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, vol. 12, no. 7, 2011.

[29] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Mathematics of Control, Signals, and Systems, vol. 2, no. 4, pp. 303–314, Dec. 1989, issn: 0932-4194, 1435-568X. doi: 10.1007/BF02551274.

[30] V. Maiorov and A. Pinkus, "Lower bounds for approximation by MLP neural networks," Neurocomputing, vol. 25, no. 1, pp. 81–91, Apr. 1999, issn: 0925-2312. doi: 10.1016/S0925-2312(98)00111-8.

[31] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533–536, Oct. 1986, issn: 0028-0836, 1476-4687. doi: 10.1038/323533a0.

[32] S. Linnainmaa, "Analysis of some known methods of improving the accuracy of floating-point sums," BIT Numerical Mathematics, vol. 14, no. 2, pp. 167–202, Jun. 1, 1974, issn: 1572-9125. doi: 10.1007/BF01932946.

[33] B. Yoon, T. Bhattacharya, and R. Gupta, "Machine learning estimators for lattice QCD observables," Physical Review D, vol. 100, no. 1, p. 014504, Jul. 9, 2019, issn: 2470-0010, 2470-0029. doi: 10.1103/PhysRevD.100.014504.

[34] B. Efron, "Bootstrap methods: Another look at the jackknife," The Annals of Statistics, vol. 7, no. 1, Jan. 1, 1979, issn: 0090-5364. doi: 10.1214/aos/1176344552.

[35] M. H. Gutknecht, "Variants of BICGSTAB for matrices with complex spectrum," SIAM Journal on Scientific Computing, vol. 14, no. 5, pp. 1020–1033, Sep. 1993, issn: 1064-8275, 1095-7197. doi: 10.1137/0914062.

[36] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[37] P. Virtanen et al., "SciPy 1.0: Fundamental algorithms for scientific computing in Python," Nature Methods, vol. 17, no. 3, pp. 261–272, Mar. 2, 2020, issn: 1548-7091, 1548-7105. doi: 10.1038/s41592-019-0686-2.

[38] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," Journal of Machine Learning Research - Proceedings Track, vol. 9, pp. 249–256, Jan. 2010.

APPENDIX 2A: LINEAR ALGEBRA CONVENTIONS AND DEFINITIONS

Most of the conventions regarding linear algebra are taken from [25]. Here we report them for clarity and to fix the notation.

2A.1 Conventions on Norms

We consider a vector space V over the complex field C endowed with an inner product (·, ·) satisfying the properties of positivity, linearity and Hermitian symmetry. A norm on V is a map from the vector space to the real field R satisfying the properties of positivity, homogeneity and the triangle inequality. Given v ∈ V, examples of norms on C^n are

\|v\|_\infty = \max_{i \in [1:n]} |v_i|,  (2A.1)

\|v\|_p = \left[ \sum_{i=1}^{n} |v_i|^p \right]^{1/p}, \quad p \geq 1.  (2A.2)

The corresponding vector spaces are denoted respectively as L^n_\infty and L^n_p. In the p = 2 case the vector space L^n_2 is endowed with an inner product and the Euclidean norm is defined to be

\|v\|_2 = \sqrt{(v, v)}.  (2A.3)

We now consider a vector space M_n of n × n matrices. Also in this case a matrix norm can be defined satisfying the usual properties and, additionally, the property of submultiplicativity, i.e. \|AB\| \leq \|A\| \|B\| for all A, B ∈ M_n.
A matrix norm can be induced by a vector norm defined on C^n as follows:

\|A\| \equiv \max_{\|v\|=1} \|Av\|,  (2A.4)

and it should be clear from context whether we refer to a matrix norm or a vector norm. The matrix norm induced by the norm on L^n_2 is related to the spectral radius

\rho(A) \equiv \max \{ |\lambda| : \lambda \ \text{eigenvalue of} \ A \}  (2A.5)

as follows:

\|A\|_2 = \sqrt{\rho(A^\dagger A)}.  (2A.6)

2A.2 Preconditioned Krylov space solvers

Krylov-space iterative solvers are based on the idea of constructing more and more accurate solutions from the linear space defined by a Krylov sub-space. A Krylov space of dimension n is defined by

K_n(D, \eta) \equiv \mathrm{span}\{\eta, D\eta, \ldots, D^{n-1}\eta\}.  (2A.7)

Many algorithms use the Krylov sub-space to generate the proper affine vector space in which to search for the solution of the iterative problem. For simplicity, we consider here the operator Q^2 = D^\dagger D, real symmetric and positive definite, since this is the form of the operator most often used in LQCD calculations. We assume here that, in a finite volume V and at finite lattice spacing a, the lattice Dirac operator is a large sparse N × N matrix. We define the Q^2-norm as

\|v\|_{Q^2} = \sqrt{(v, Q^2 v)}.  (2A.8)

The CG algorithm is an iterative procedure that aims to find an approximate solution ψ_n ∈ K_n(D, η) after n iterations, minimizing the error

\|e_n\|_{Q^2} = \|S - \psi_n\|_{Q^2} = \|DS - D\psi_n\|_2  (2A.9)

at step n. The rate of convergence of the CG depends on the spectrum of the lattice Dirac operator. If we assume that all eigenvalues λ are well separated from the origin, and we only know the condition number κ(D),

\kappa(D) = \|D\| \cdot \|D^{-1}\| = \sqrt{\frac{\lambda_{max}(D^\dagger D)}{\lambda_{min}(D^\dagger D)}},  (2A.10)

then the Q^2-norm of the error after n iterations satisfies

\frac{\|e_n\|_{Q^2}}{\|e_0\|_{Q^2}} \leq 2 \left( \frac{\sqrt{\kappa(D)} - 1}{\sqrt{\kappa(D)} + 1} \right)^n.  (2A.11)

This result guarantees that, barring too large values of κ(D), the CG converges to a given accuracy ϵ within O(\sqrt{\kappa(D)}) iterations.

APPENDIX 2B: THE CONJUGATE GRADIENT AND BICGSTAB ALGORITHMS

Since the analysis presented in this chapter focused on applications to the BiCGSTAB algorithm, we present the pseudo-code for both the simpler Conjugate Gradient algorithm and for BiCGSTAB.

2B.1 The Conjugate Gradient Method

Given a linear system Ax = b, an initial guess vector x_0 and a stopping parameter ϵ, the algorithm proceeds as follows [25]:

Algorithm 1 Conjugate Gradient algorithm
  r_0 ← b − A x_0
  p_0 ← r_0
  k ← 0
  while ‖r_k‖ > ϵ do
    α_k ← (r_k^T r_k) / (p_k^T A p_k)
    x_{k+1} ← x_k + α_k p_k
    r_{k+1} ← r_k − α_k A p_k
    β_k ← (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
    p_{k+1} ← r_{k+1} + β_k p_k
    k ← k + 1
  end while

The CG algorithm only requires the direct application of the linear operator A and two temporary vectors, r_k and p_k. By analysis, one can deduce that the r_k and the p_k span the same Krylov subspace and form two orthogonal bases, one with respect to the inner product, the other with respect to the inner product induced by A. The vectors x_k represent the projection of the real solution x onto the Krylov subspace.
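As a minimal, dense-matrix transcription of Algorithm 1 (assuming a real symmetric positive definite A; in LQCD the operator would instead be applied matrix-free):

```python
import numpy as np

def conjugate_gradient(A, b, x0, eps):
    """Algorithm 1 for a real symmetric positive definite matrix A,
    iterating until the residue norm drops below eps."""
    x = x0.astype(float)
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    while np.sqrt(rr) > eps:
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

# Quick check on a small SPD system:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2), 1e-10))  # ~ [0.0909, 0.6364]
```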
2B.2 The BiCGSTAB Algorithm

The BiConjugate Gradient Stabilized method was first introduced by H. A. van der Vorst [5] in 1992 and, as its predecessor BiCG, relies on two vectors, instead of one as in the CG, to construct the Krylov subspace, which results in smoother and faster convergence. The generalization to the case of a complex linear operator, which is what is needed for the Dirac operator in LQCD, was first published by M. Gutknecht [35] and reads:

Algorithm 2 BiConjugate Gradient Stabilized algorithm
  r_0 ← b − A x_0
  s_0 ← r_0
  y_0 ← r_0
  δ_0 ← y_0^† r_0
  ϕ_0 ← (y_0^† A s_0) / δ_0
  k ← 0
  while ‖r_k‖ > ϵ do
    ω_k ← 1 / ϕ_k
    w_{k+1} ← r_k − ω_k A s_k
    χ_k ← ((A w_{k+1})^† w_{k+1}) / ‖A w_{k+1}‖^2
    r_{k+1} ← w_{k+1} − χ_k A w_{k+1}
    x_{k+1} ← x_k + ω_k s_k + χ_k w_{k+1}
    δ_{k+1} ← y_0^† r_{k+1}
    ψ_{k+1} ← −(ω_k δ_{k+1}) / (χ_k δ_k)
    s_{k+1} ← r_{k+1} − ψ_{k+1} (s_k − χ_k A s_k)
    ϕ_{k+1} ← (y_0^† A s_{k+1}) / δ_{k+1}
    k ← k + 1
  end while

In this algorithm, three auxiliary vectors are needed for the iterations (w_k, r_k and s_k), plus the stabilizing vector y_0. Also this algorithm only requires the direct application of the linear operator.

APPENDIX 2C: NEURAL NETWORK PARAMETERS IN DETAIL

When dealing with ML problems, and NN in particular, there are many non-physical parameters and choices involved in the training. We briefly mentioned some of them in Section 2.2; here we present the chosen values of the hyper-parameters for reproducibility. We also mention that all models have been trained using the scikit-learn [36] package (built on scipy [37]) in Python, which is the de facto standard language for ML, using the Multi-Layer Perceptron Regressor method.

Starting Values
When training a NN one can choose a starting point for the optimization algorithm. In practice, this is equivalent to setting starting values for all the weights and biases of the network. We have chosen to set them with a random uniform distribution of unit variance, following [38]. This is the most common choice and the default behavior in scikit-learn.

Over-fitting and Cross Validation
In order to prevent over-fitting, a standard L2-norm regulator has been used. This has the effect of penalizing networks with too large weights or biases, which can result from over-fitting, by adding to the loss function a term given by the L2-norm of all the weights and biases. Cross-validation is a common technique in ML training to lessen the bias of the model with respect to the training data sample. It acts by reserving a subsample of the training dataset on which the model is not fit, but against which it is instead tested during training. This is very effective at reducing over-fitting, especially when the sample dataset size is small. The cross-validation sample was set to 10% of the training data.

Choice of the Optimizer
There are several possible optimizers for training an ML model that are readily available in most frameworks. The main differences between them are the speed and the ability to distinguish local minima from the global minimum of the parameter space. The optimizer we used in this work is the Stochastic Gradient Descent algorithm (SGD).

Choice of the Activation Function
The choice of the activation function, as for the optimizer, can be easily made in most ML frameworks like scikit-learn, as they are readily available. The sigmoid is a very well-studied activation function, but it is rather expensive to compute as it requires exponentiation. A simpler function is the Rectified Linear Unit (ReLU), see Equation (2.6), which requires just an if statement, so it is much cheaper than the sigmoid. We chose the ReLU both for performance and because it more easily captures the chance of two nodes being completely uncorrelated, i.e. the weight between them being 0, while the sigmoid always gives a non-zero contribution.

Input Scaling
To prevent numerical issues in the SGD minimizer, all the data that is used in the NN should have roughly the same magnitude and variance. We scaled all the data at every source-sink separation time to have zero mean and unit variance. This is a rather standard procedure and guarantees that all weights, biases and input data have the same magnitude. This is especially important in our case, as the correlator data decreases exponentially with increasing t, meaning that the input to the NN would otherwise be extremely small compared to the biases and weights, which would make training almost impossible.
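For concreteness, this per-time-slice standardization can be done with scikit-learn's StandardScaler, which operates column-wise; the array below is a hypothetical stand-in for the correlator data, and the same fitted scaler would be used to map NN outputs back to the original scale.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical correlator data: rows are configurations, columns time slices.
corr = np.random.rand(160, 64)

scaler = StandardScaler()                 # zero mean, unit variance per column
corr_scaled = scaler.fit_transform(corr)  # feed this to the NN
# ... and map predictions back with scaler.inverse_transform(predictions)
```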
Number and Sizes of Hidden Layers
One of the most important choices to make when dealing with NN models is the structure of the NN itself. In order to fix the appropriate sizes of the layers of the network, we performed a semi-quantitative analysis of the results for varying layer sizes. We fixed the maximum number of layers to 3 and the sizes of the layers in the range from 1000 down to 10. We then started from the largest network and used the SPNN3t2 model for the single point networks and GNN3 for the global one (they have the largest input layer size; all the models in the two classes have the same output layer size). By decreasing the sizes of the layers gradually, we found the point where results started to diverge significantly, meaning that the network became unable to represent the relation between the data properly. The final layer sizes were fixed to be at least 1.5 times larger than this threshold. Note that the sizes of the layers were allowed to differ within the same network; in particular, the last hidden layer was kept smaller than the first one or two. The final sizes of the hidden layers of the NN that we used are (150, 150, 100) for the single point models and (600, 600, 200) for the global network models. It should be noted that during training some of the NN did not converge, i.e. their loss function never fell below the predefined tolerance, which we kept at the default value of 10^{-4}. This happened only for values of t ≈ T/2, where the signal is weaker. However, we did not see any impact on the final results, thanks to the bootstrap procedure we used.

CHAPTER 3: FIRST RESULTS OF HADRON SPECTROSCOPY WITH STABILIZED WILSON FERMIONS

Modern Lattice QCD calculations require ever-improving procedures and techniques for the generation of gauge field ensembles. In this chapter, we present a new effort in this direction started by the OPEN LATtice Initiative. As was briefly discussed in Section 1.4, Wilson-Clover fermions (WCF) [1] are a very popular choice of discretization in LQCD because of their straightforward interpretation and the very limited restrictions they pose on possible observables. This also makes them among the simplest actions to implement numerically and to analyze. The discretization errors of regular Wilson fermions are of order O(a), but this can be rigorously improved to higher order, for example through the clover term, if the coefficient c_SW is determined non-perturbatively. Thus, over the years, WCF have enabled several precision calculations for a variety of physical observables, for example several of the quantities that enter the biennial Flavor Lattice Averaging Group (FLAG) report [2], which summarizes the most important results from LQCD.

There are, however, some limitations on the use of Wilson-Clover fermions. One of the most striking is the breaking of chiral symmetry, which in the context of gauge field generation allows the Dirac operator to develop arbitrarily small eigenvalues, especially in the case of small quark masses and coarse lattice spacings, an effect that grows worse with larger lattice volumes.
This, in turn, means that to generate ensembles at low pion masses one needs very fine lattice spacings; but, as we discussed in Section 1.4, small lattice spacings lead to topological freezing and critical slowing down in the MCMC. This effectively poses bounds on the parameter space, in terms of lattice spacing, volume and quark mass, where simulations can be performed successfully. One should, however, note that these limits are very similar for other lattice discretizations.

In recent years there has been a growing interest in master-field simulations [3], which by definition require very large volumes. This has led to an effort to improve on the volume limitations of Wilson-Clover fermions by introducing a set of techniques for numerical stabilization and a new definition of the fermion action [4]. This "package" of both algorithmic and analytical improvements is what we call Stabilized Wilson Fermions (SWF). The SWF package is of interest on its own, not just in the context of master-field simulations. Even at traditional volumes, SWF could have great benefits for single-baryon and light nuclear physics applications, as they enable coarse lattice spacings at quark masses closer to the physical point, where WCF suffer from stability issues [5]. For this reason, a new research collaboration has been formed to study and generate ensembles with SWF at the state-of-the-art level.

3.1 The OPEN LATtice initiative

The study of Stabilized Wilson Fermions only began in 2018, and many questions with regard to their applicability and properties across the parameter space remain to be answered. In the context of master-field simulations there are encouraging results from [6, 7], but more evidence is still needed, especially in the case of coarse lattice spacings. The very first study [4], in which the non-perturbative tuning of the Sheikholeslami-Wohlert coefficient c_SW was begun, showed early indications of a very good scaling behavior towards the continuum. To enable further investigations of SWF, the OPEN LATtice initiative was founded [8]. This is, at the time of this writing, composed of seven principal investigators from the USA, Europe and Asia and one doctoral researcher, the author of this thesis. The goal of the collaboration is to generate state-of-the-art QCD ensembles using SWF and share them, according to the principles of the open science philosophy, with all of the LQCD community. Most of the results presented in this chapter have been produced as part of this effort.

3.1.1 Open science policy

Since generating QCD gauge field ensembles is an extremely costly procedure from the numerical point of view, it became apparent in the early stages of Lattice QCD that storing and sharing the ensembles was of particular importance. One key aspect is to enable research groups with limited access to HPC facilities to participate in the scientific effort without having to start from scratch, by using community-generated ensembles. Some remarkable community-wide efforts were made, for example the establishment in 2002 of the International Lattice Data Grid (ILDG) [9, 10] and of the previous, smaller regional Lattice Data Grids. The ILDG proposed a standard format for the metadata and the gauge field configuration data and provided an interface where collaborations could upload and share ensembles. Over the years, however, its use and relevance decreased. Only in the past two years has an effort to revise, improve and revive the ILDG been set in motion.
The OPENLAT collaboration strongly believes in the values of open science as a means to facilitate the advance of Lattice QCD. The main points that we stand by are summarized here:

• Define and uphold quality: we propose to define quality standards for the ensembles we generate. In particular, some control observables are routinely checked to assess the quality of the produced gauge configurations and the stability of the Markov Chain. These quality standards are themselves continuously researched and improved upon.

• Share and maintain a repository: a crucial aspect when generating gauge field ensembles is to maintain an accessible long-term storage space for all the data that is produced. This requires data-management facilities in HPC centers, so it is also important to provide strategies for easy access to the data once it is generated and to monitor the integrity of the data itself. Once the ILDG becomes active again, we are strongly in favor of complying with the upcoming standards for formats and sharing procedures.

• Community boosting: the collaboration is open to the possibility of new members or interested parties joining our efforts with new resources to expand our sets of ensembles.

• Grant and enable access: we will ensure that when the ensembles are completed, which is a three-stage process (see the following section), all the gauge field configurations will be made open access. This means that no restriction will be imposed by the collaboration on which research groups or projects can use the ensembles.

3.1.2 SWF Ensembles from the OPENLAT Initiative

The goal of the OPENLAT initiative is to generate a set of lattice gauge field ensembles using SWF that span a multitude of parameters. This is done to allow taking all the limits required to handle the systematic uncertainties properly for rigorous precision results. In particular, we span three parameters: the lattice spacing a for the continuum extrapolation, the pion mass m_π for chiral extrapolations¹, and the total lattice size L to control finite volume effects. It should be noted that the coarsest lattice spacings, especially combined with small pion masses (useful, for example, for nuclear physics studies), are of particular interest, since they represent the regime where SWF are expected to outperform regular WCF.

Figure 3.1: Planned combinations of pion mass m_π and lattice spacing a for the OPENLAT ensembles.

The target is to generate 500 independent configurations per ensemble, to be later extended to 1000. This is to be done in three stages:

• Stage I: ensembles at the SU(3) flavor symmetric point, i.e. m_π = m_K = 412 MeV

• Stage II: ensembles with m_π = 300 MeV and m_π = 200 MeV

• Stage III: ensembles at the physical point, m_π = 135 MeV

¹ Chiral extrapolations are extremely useful even when ensembles at the physical point are available, as they can help constrain fits in cases where a functional form for the m_π dependence is known, for example from chiral perturbation theory (χPT), improving the signal at the physical pion mass by means of statistical fitting.

Figure 3.2: Overview of the OPENLAT ensembles presented here, parametrized by the lattice spacing a and m_π L. The height of the bars in the histograms represents the number of independent configurations that have been generated.
Shaded boxes are ensembles in the tuning stage.

At the end of each stage, a publication presenting basic observables computed on the ensembles will be released as a reference, together with the public release of all the gauge field configurations. The chosen lattice spacings are a = 0.064, 0.077, 0.094 and 0.12 fm, and all these ensembles have standard periodic boundary conditions. An additional set of ensembles with a = 0.055 fm will be generated with open boundary conditions, to further mitigate critical slowing down and circumvent topological freezing. In Figure 3.1 the selected combinations of m_π and a are presented. At the time of this writing, the OPENLAT Initiative is approaching the completion of Stage I; Figure 3.2 shows the most recent status of the ensemble generation. The shaded squares represent ensembles in the "tuning" stage, where the MC algorithm for the generation is adjusted and the Markov chain is thermalized. In this chapter, we will present some of the procedures used for computing the reference observables for the upcoming Stage I publication.

3.2 Stabilized Wilson Fermion Package

The SWF package comprises a set of tools aimed at improving the stability of Wilson fermion simulations. While some of them had already been established previously, others are entirely new and were proposed in [11]. We can categorize the improvements into two sets: one aimed at better stability of the generation algorithms, the other at improving the fermion discretization. The simulations are performed using the openQCD2.0+ software package [12]. As stated in the previous section, to generate the state-of-the-art ensembles a plethora of well-established techniques that improve the performance of the MCMC is also used, such as mass-preconditioning through Hasenbusch chains [13, 14], twisted mass reweighting for the light quarks [15, 16] and improved linear solvers such as the SAP [17, 18].

3.2.1 Algorithmic Stability Improvements

There are three components of the SWF package in this category. The first is the Stochastic Molecular Dynamics (SMD) algorithm [19–22]. It is a modification of the traditional HMC algorithm aimed at preventing the problems arising from large jumps in the phase space trajectory. These can, for example, be due to an accumulation of errors coming from the numerical integration of the MD trajectory. The issue is that when such a jump occurs, the algorithm needs to spend several cycles re-thermalizing towards the target distribution. The key difference in the SMD is that the momenta are not chosen randomly at every step but are instead randomly rotated from one update to the other. The algorithm is schematically presented in Scheme 3.1:

1. Refresh π(x, µ) and ϕ(x) by a random field rotation:
   π → c_1 π + c_2 v
   ϕ → c_1 ϕ + c_2 D^† η    (v and η normally distributed)
   with c_1^2 + c_2^2 = 1, c_1 = e^{−ϵγ}, where ϵ is the MD integration time and γ is the friction parameter.
2. Short MD evolution.
3. Accept/Reject step (exact algorithm).
4. Repeat. ⟲

Scheme 3.1: Short description of the Stochastic Molecular Dynamics algorithm.

Here π(x, µ) are the momenta associated with the gauge field and ϕ(x) are the pseudo-fermion fields. With a little work, it can be shown that the SMD and the HMC are equivalent in the case of fixed ϵ and large γ. In Figure 3.3, we show a simple schematic comparison of the SMD and the HMC algorithms. Some interesting features of the SMD are its proven ergodicity and convergence to the exact stationary solution [23], the significant reduction of unbounded energy violations |∆H| ≫ 1 and the decreased autocorrelation at fixed cost. The effect is due to the algorithm being "slower" than the HMC in terms of updates, but sampling the phase space in a more efficient way that more than compensates for the difference. Here ∆H is the difference in the energy before and after a full microcanonical update, which enters the accept/reject step of the algorithm in the same manner as in the HMC, see Equation (1.44). One final feature, which only appears when using the SMD in conjunction with deflation algorithms for the pseudo-fermions, is that the smoother updates of the gauge field and its associated momenta lead to improved updates of the deflation subspace as well.

Figure 3.3: Pictorial representation of the HMC and SMD algorithms for the generation of gauge field ensembles. Image courtesy of A. Francis.
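The partial momentum refresh of step 1 is easy to visualize in isolation; as a toy sketch for a single set of scalar momenta (the values of ϵ and γ below are illustrative only):

```python
import numpy as np

# Toy SMD refresh (step 1 of Scheme 3.1): instead of redrawing the momenta
# from scratch as in HMC, rotate them partially towards fresh Gaussian noise.
eps, gamma = 0.2, 0.3                  # illustrative values
c1 = np.exp(-eps * gamma)
c2 = np.sqrt(1.0 - c1**2)              # enforces c1^2 + c2^2 = 1

pi = np.random.randn(10)               # current momenta
v = np.random.randn(10)                # fresh Gaussian noise
pi = c1 * pi + c2 * v                  # partially rotated momenta
```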
1. Refresh π(x, µ) and ϕ(x) by a random field rotation:
\[ \pi \to c_1\,\pi + c_2\,v , \qquad \phi \to c_1\,\phi + c_2\,D^\dagger\eta \qquad (v \text{ and } \eta \text{ normally distributed}), \]
with \( c_1^2 + c_2^2 = 1 \) and \( c_1 = e^{-\epsilon\gamma} \), where ϵ is the MD integration time and γ the friction parameter.
2. Perform a short MD evolution.
3. Accept/reject step (exact algorithm).
4. Repeat.

Scheme 3.1: Short description of the Stochastic Molecular Dynamics algorithm.

Here ∆H is the difference in the energy before and after a full microcanonical update, which enters the accept/reject step of the algorithm in the same manner as in the HMC, see Equation (1.44). One final feature, which only appears when using the SMD in conjunction with deflation algorithms for the pseudo-fermions, is that the smoother updates of the gauge field and its associated momenta π(x, µ) lead to improved updates of the deflation subspace as well.

Figure 3.3: Pictorial representation of the HMC and SMD algorithms for the generation of gauge field ensembles. Image courtesy of A. Francis.

The next tool for improving the algorithmic stability in SWF is the use of a volume-independent norm as stopping criterion in the linear solves for the pseudo-fermions. As shown in Section 1.4, it is common practice to stop the solver when
\[ \|r_n\| = \|\eta - D q_n\| < \epsilon\,\|\eta\| . \qquad (3.1) \]
However, this expression hides the fact that one is free to choose any vector norm. Most commonly the L² norm is chosen, but this makes the criterion volume dependent. Hence, in the SWF package, the supremum norm is used instead; this norm protects against local effects that could lead to a loss of precision. The difference between the two norms is summarized in Scheme 3.2:
\[ L^2 \text{ norm:} \quad \|\eta\|_2 = \Big[\sum_x \big(\eta(x),\eta(x)\big)\Big]^{1/2} \;\propto\; \sqrt{V} , \qquad \text{supremum norm:} \quad \|\eta\|_\infty = \sup_x \|\eta(x)\|_2 \;\to\; V\text{-independent}. \]

Scheme 3.2: Schematic representation of the differences between the L² norm, which is regularly used in most LQCD calculations, and the supremum norm.

The last algorithmic stability improvement is the modification of the accept/reject global sum implementation, in openQCD2.0 in our case, to quadruple precision, in order to prevent the accumulation of rounding errors, especially in the case of very large volumes. This is relevant primarily for master-field simulations, but it is nonetheless part of the SWF package.

3.2.2 The Exponential Clover Action

Probably the most striking feature of SWF is the use of a new discretization for the fermion action. This makes SWF fundamentally different from WCF, even though, as we will see, the new discretization is very similar to the original one by Sheikholeslami and Wohlert [1]. To start, we recall from the discussion in Section 1.2 the full form of the WCF operator:
\[ D^{SW}(x,y) = \frac{1}{2a}\sum_{\mu=\pm1}^{\pm4}(1-\gamma_\mu)\,U_\mu(x)\,\delta_{x,y-\hat\mu} + \Big(m+\frac{4}{a}\Big)\,\delta_{x,y} + c_{SW}\,\frac{i}{4}\,\sigma_{\mu\nu}G_{\mu\nu}(x)\,\delta_{x,y} . \qquad (3.2) \]
When applying even-odd preconditioning, as is standard practice, since the field at a lattice site is connected through the derivative only to its nearest neighbors, which all have the opposite “parity,” the operator takes the blocked form:
\[ D = \begin{pmatrix} D_{ee} & D_{eo} \\ D_{oe} & D_{oo} \end{pmatrix} . \qquad (3.3) \]
The hopping term appears only in the even-odd off-diagonal blocks, while the diagonal blocks are themselves diagonal matrices. Their sum is the diagonal component of the original D^{SW} operator:
\[ D_{ee} + D_{oo} = M + c_{SW}\,\frac{i}{4}\,\sigma_{\mu\nu}G_{\mu\nu} , \qquad (3.4) \]
where we have used the shorthand notation M = 4/a + m. By inspecting this diagonal term, one can conclude that the SW Dirac operator is not protected from having arbitrarily small eigenvalues coming from the sum of M with the clover term. Furthermore, in this formulation, the positive and negative eigenvalues are distributed equally.
The probability of D_oo having a very small eigenvalue scales with the number of eigenvalues itself, which is determined by the volume, making large-volume simulations pathological. To fix this problem, when Stabilized Wilson Fermions were proposed in [11], the WCF discretization was replaced with an exponentiated version of the clover term:
\[ D_{ee} + D_{oo} = M + c_{SW}\,\frac{i}{4}\,\sigma_{\mu\nu}\hat F_{\mu\nu} \;\longrightarrow\; M\,\exp\Big[\frac{c'_{SW}}{M}\,\frac{i}{4}\,\sigma_{\mu\nu}\hat F_{\mu\nu}\Big] . \qquad (3.5) \]
It is easy to see that the first-order expansion of the exponentiated operator is identical to the traditional Wilson clover one. The new variant is, by construction, positive definite and easily invertible. These two features make it suitable for even-odd preconditioning, as D_oo is now fully protected from small eigenvalues. It should also be noted that the exponential clover term is a valid Symanzik improvement term, which is crucial for renormalizability. One should keep in mind that the coefficient c′_SW of the exponential version differs from the one of the ordinary clover action, so it needs to be re-tuned non-perturbatively. Exceptionally small eigenvalues can still occur, as in the traditional WCF, but such exceptional configurations would no longer originate from the clover term, only from other sources. SWF therefore do not entirely solve this problem, but they mitigate it in a parameter region that is inaccessible to regular WCF simulations.

3.2.3 A Quenched Example of the Exponential Clover Action

To show the difference between the WCF action and the exponential clover one, we report the test done in [24]: a quenched calculation of the pion correlator. By quenched, we mean that the fermion action used for generating the ensemble is not the same action used when calculating the quark propagator for the fermion contractions. In particular, this test was performed on 1000 configurations at gauge coupling β = 6.0 (a = 0.093 fm) and volume 48⁴. The coefficients c_SW of the two actions were tuned separately, and the quark masses were fixed to give the same valence pion mass mπ^val = 320 MeV.

Figure 3.4: Left: The pion correlator determined using both the standard and exponentiated clover terms. Right: The Monte Carlo history of the pion correlator at fixed Euclidean time t/a = 20, towards the center of the lattice. Plots courtesy of A. Francis.

The results, which can be seen in Figure 3.4, show no spikes in the correlator MC history (right plot) for the exponential clover term, implying much stronger stability. In turn, its correlator is well behaved for all Euclidean separation times t (left plot), while it is clear that around t/a = 20 the standard clover results are dominated by the noise coming from exceptional configurations.

3.3 Simulating Dynamical QCD with 2 + 1 Quark Flavors and SWF

The goal of the OPENLAT Initiative is to generate QCD ensembles with 2 + 1 dynamical fermions, that is, a mass-degenerate isospin doublet of light quarks and a heavier strange quark.

Figure 3.5: Non-perturbative tuning of c_SW using the Schrödinger functional for 2 + 1 dynamical fermions. Left plot: comparison of the WCF and exponentiated clover fermions as presented in [11].
Right plot: more recent results for the tuning of c_SW (blue line and points), including an additional value of the coupling at β = 3.685, as done in [24]. The red line and points are the results from [11].

In this section, we first briefly describe the tuning procedure for c_SW (Section 3.3.1), then explain how the quark masses are fixed (Section 3.3.2) and, finally, give some details on the ensemble generation process (Section 3.3.3).

3.3.1 Non-perturbative Tuning of c_SW

Following the procedure of Lüscher et al. [25], the tuning of c_SW was performed in [11] for the range β ≥ 3.8 with three quark flavors. Later, in [24], results were presented that extend down to β = 3.685 to enable even coarser lattice spacings. The tuning is performed using Schrödinger functional boundary conditions [26], determining c_SW by imposing O(a)-improvement on small volumes, where L = 8 and T = 16; for the β = 3.685 point, a value of L = 16 was used instead. In Figure 3.5, left, we report a comparison of the tuned values of c_SW for WCF and our new exponential clover action; on the right there is a comparison of the fits between [11] and [24]. The results of this procedure lead to the following interpolation for the Sheikholeslami-Wohlert coefficient of the exponential clover action:
\[ c_{SW}(g_0^2) = \frac{1 - 0.325022\,g_0^2 - 0.0167274\,g_0^4}{1 - 0.489157\,g_0^2} . \qquad (3.6) \]

3.3.2 Setting the Chiral Trajectory

In order to simplify the treatment of the continuum limit, the OPENLAT Initiative decided to adopt the approach previously used by the QCDSF [27] and CLS [28] collaborations for the tuning of the quark mass parameters. The key idea is to link the masses of the light and strange quarks by tuning them at the SU(3) flavor-symmetric point, where all quark masses are equal to each other, and to fix the trace of the quark mass matrix when changing the coupling: Tr[M] = const. This is done to reduce the mass-dependent cutoff effects. The starting point of the procedure is the relation, coming from chiral perturbation theory, for the meson mass combination mπK, which is known to be proportional to the trace of the quark mass matrix. This information can then be used as a tuning condition: one changes the quark mass values until the physical value is obtained,
\[ m_{\pi K}^2 = \frac{2}{3}\Big(m_K^2 + \frac{1}{2}\,m_\pi^2\Big) \equiv m_{\pi K}^2\big|_{\rm phys} . \qquad (3.7) \]
This mass tuning needs to be done at every value of the coupling, i.e. for every lattice spacing, with some minor perturbative corrections for isospin-breaking and electromagnetic effects, see [29]. When a different pion/kaon mass is desired, one changes the quark mass matrix accordingly, keeping Tr[M] fixed. This means that as the pion mass approaches the physical point, so does the kaon mass.

3.3.3 Generation of SWF Ensembles

At this point, we have presented all the ingredients needed by the OPENLAT initiative to generate the SWF ensembles. As discussed in Section 1.4, in order to fix the scale of the simulation, one needs one physical input for every parameter of the simulation. We chose to parametrize everything in terms of the pion mass mπ, the kaon mass mK and the gradient-flow time scale t₀ [30], following the procedures of the CLS collaboration [28, 31]. In particular, the two quantities ϕ₂ and ϕ₄ are kept fixed while approaching the continuum:
\[ \phi_2 = 8 t_0\, m_\pi^2 , \qquad \phi_4 = 8 t_0 \Big( m_K^2 + \frac{1}{2}\, m_\pi^2 \Big) . \qquad (3.8) \]
This criterion, together with the non-perturbative tuning of c_SW and the fixing of the trace of the quark mass matrix, sets the conditions for our simulations.
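Since ϕ₂ and ϕ₄ are dimensionless, they can be checked directly from quantities measured in lattice units. The following minimal sketch, using the gvar package also employed for the correlator fits later in this chapter, shows the arithmetic for the a094m400 ensemble, with t₀/a² and amπ taken from Tables 3.2 and 3.4; the comparison value 1.115 is the Stage I tuning target quoted among the quality checks below.

```python
import gvar as gv

# t0/a^2 and a*m_pi for the a094m400 ensemble, from Tables 3.2 and 3.4
t0_a2 = gv.gvar('2.43979(89)')
am_pi = gv.gvar('0.19470(17)')
am_K = am_pi                      # flavor-symmetric point: m_K = m_pi

# phi_2 and phi_4 are dimensionless: 8 t0 m^2 = 8 (t0/a^2) (a m)^2
phi2 = 8 * t0_a2 * am_pi ** 2
phi4 = 8 * t0_a2 * (am_K ** 2 + 0.5 * am_pi ** 2)
print('phi2 =', phi2)
print('phi4 =', phi4)             # to be compared with the target 1.115
```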
At the time of this writing, the available ensembles from the OPENLAT Initiative at the flavor-symmetric point are the following:

β      Size       a [fm]   mπ [MeV]     L [fm]   mπL   Ncfg
3.685  24³ × 96   0.12     411.54(46)   2.9      6.0   1200
3.8    24³ × 96   0.094    410.58(84)   2.3      4.5   1200
3.8    32³ × 96   0.094    409.57(46)   3.0      6.0   1300
3.9    48³ × 96   0.077    410.39(22)   3.6      7.5   300
4.0    48³ × 96   0.064    408.93(25)   3.0      6.4   600

Table 3.1: Current Stage I ensembles of the OPENLAT Initiative. Here β is the inverse coupling β = 6/g², L is the spatial extent of the lattice and Ncfg is the number of configurations in each ensemble.

Modern lattice simulations use complicated algorithms, and a plethora of control observables needs to be monitored while generating configurations via MCMC. Here we present a non-exhaustive list of the checks that our collaboration performs while generating ensembles, with selected results for the β = 3.8, V = 32³ × 96 ensemble shown as examples:

• Lack of thermalization effects in the plaquette and in the energy density. If there is a drift in the average value of one of these quantities, then the algorithm is not sampling the PDF properly. In particular, a jump in either observable would imply a transition from one local minimum of the action to another, which is undesirable.

Figure 3.6: Monte Carlo history of the plaquette during the a094m400 ensemble Markov chain. As expected, since the system is properly thermalized, the value is constant along the chain within the bounds of small stochastic fluctuations.

• Check of the fluctuations, distribution and autocorrelation of the topological charge Q computed via the gradient flow. This is crucial in order to ensure correct sampling from all topological sectors, see Appendix 1B. The autocorrelation of this observable is analyzed in more detail in Section 3.4.1.

Figure 3.7: Monte Carlo history of the topological charge computed with the gradient flow during the a094m400 ensemble Markov chain.

Figure 3.8: Distribution of the topological charge computed with the gradient flow for the ensemble a094m400 (β = 3.8, mπ = 410 MeV, V = 32³ × 96, √(8t) = 0.45 fm). The orange line is a Gaussian fit to the histogram. The data is well described by a Gaussian, meaning that the algorithm is sampling all topological sectors effectively.

• We ensure that the distribution of the error in the Hamiltonian at the end of a single MD update, ∆H, matches the acceptance rate of the SMD algorithm (a minimal numerical sketch of this check is given after this list). A more detailed discussion of this criterion can be found in [32]; in general, large fluctuations and non-Gaussian behavior in the distribution of ∆H would imply that the MD step size δτ is too large, leading to suboptimal sampling.

Figure 3.9: Distribution of the error in the Hamiltonian ∆H of the MD updates during the a094m400 ensemble Markov chain. The solid lines represent the expected distribution from the acceptance rate (blue) and the fitted distribution (red). The discrepancy is acceptable as long as the fitted Gaussian is narrower and no apparent deviations from a normal distribution are present.

• For the chiral trajectory, we make sure that the tuning quantity ϕ₄ = 8t₀(mK² + mπ²/2) is within 0.5% of the target value of 1.115, with a deviation of at most 1σ. This quantity is a hadronic observable; we discuss it in more detail in Section 3.4.3.
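The ∆H check mentioned above can be scripted in a few lines. The sketch below assumes the Monte Carlo history of ∆H has been parsed into a numpy array (the file name is hypothetical); it uses two relations that hold for an exact algorithm, the identity ⟨exp(−∆H)⟩ = 1 and, for Gaussian-distributed ∆H, the expected acceptance rate erfc(√⟨∆H⟩/2) discussed in [32].

```python
import numpy as np
from scipy.special import erfc

# hypothetical file containing the MC history of Delta H, one value per update
dH = np.loadtxt('deltaH_a094m400.dat')

# Creutz equality: <exp(-Delta H)> = 1 must hold for an exact algorithm
print('<exp(-dH)>       :', np.exp(-dH).mean())

# For Gaussian-distributed Delta H, the expected acceptance is erfc(sqrt(<dH>)/2)
print('acc. (Gaussian)  :', erfc(np.sqrt(dH.mean()) / 2))
print('acc. (from data) :', np.minimum(1.0, np.exp(-dH)).mean())

# An exact, Gaussian Delta H also satisfies var(dH) = 2 <dH>
print('var(dH) / 2<dH>  :', dH.var() / (2 * dH.mean()))
```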
3.4 First SWF Ensembles and Light Hadron Spectroscopy

Now that we have presented how the SWF ensembles are generated, we turn our attention to their physical properties. As mentioned in the presentation of the OPENLAT Initiative, one of the aims of the collaboration is to provide potential users with all the relevant metadata. In particular, we also want to provide reference values for the autocorrelation of the configurations and for the light hadron masses. These quantities are not meaningful when computed on a single configuration; they are only useful as ensemble averages. It is therefore desirable to provide these results in the most unbiased and reproducible form possible.

In this section we first present some results for pure gauge observables and the integrated autocorrelation times of the topological charge for the ensembles under consideration; then, in Section 3.4.2, we present a method for the Bayesian estimation of the hadron spectrum; finally, in Section 3.4.3, we show the preliminary results. This analysis is restricted to a subset of the configurations of Table 3.1, listed in Table 3.2, due to the ongoing generation of these ensembles. For simplicity, we assign a readable name to each ensemble, which we use to refer to them in the rest of this section.

Ensemble    t₀/a²         a [fm]   Ncfg   Nsrcs
a12m400     1.48548(64)   0.12     600    100
a094m400S   2.4482(50)    0.094    500    100
a094m400    2.43979(89)   0.094    1200   100
a077m400    3.6198(30)    0.077    900    100
a064m400    5.2588(46)    0.064    1000   100

Table 3.2: SWF ensembles used for the analysis. The ensemble a094m400S is labeled differently as it has a smaller volume than the others. Ncfg is the number of gauge field configurations analyzed, though not all of them are to be considered independent. Nsrcs is the number of stochastic sources used to compute the averaged hadron correlators.

3.4.1 Gradient Flow Scale Determination and Autocorrelation Analysis

The first analysis performed on the ensembles was the determination of the gradient-flow scale t₀, used to fix the lattice spacing in physical units. We used the value √t₀ = 0.1464(18) fm from [28] to convert to fermi. The determination of t₀/a² was carried out following the procedure outlined in Appendix 1C, and the results are summarized in Figure 3.10 for the five ensembles at hand.

Another quantity of interest, which can serve as a check of the quality of the ensembles but can only be determined at the ensemble level, is the integrated autocorrelation time τint of selected observables. In particular, it is known that the topological charge (see Appendix 1B for a detailed discussion) has the longest correlation length along the Markov chain [33]. Its value is crucial to understand whether all topological sectors are being sampled correctly and whether the configurations being produced are statistically independent. To this end, we employed the windowing procedure outlined in Appendix 1A.1 on data for the topological charge Q at non-zero flow time, √(8tf) ≈ 0.3 fm.

Figure 3.10: Scale setting example on the Stabilized Wilson Fermions ensembles.
To determine the lattice spacing a in physical units, one finds the flow time tf/a² at which tf²⟨E⟩ = 0.3; the fitted values of t₀/a² for each ensemble are those reported in Table 3.2.

The integrated autocorrelation time has also been computed for the hadronic two-point correlators of the pion, the proton and the Ω baryon, as these are the quantities we focus on afterward. For the correlators, we chose a source-sink separation of t = 20 to compute τπ, τp and τΩ; this is an arbitrary choice, but there is no reason to expect the autocorrelation, especially at large t, to depend on it. The results are reported in Table 3.3.

Ensemble     τQ         τπ         τp         τΩ
a12m400      0.53(4)    1.56(26)   0.53(4)    0.50(4)
a094m400S    0.66(13)   0.51(6)    0.54(6)    0.51(6)
a094m400     0.64(6)    0.55(6)    0.51(4)    0.44(6)
a077m400     0.54(10)   0.63(8)    0.72(10)   0.75(13)
a064m400₁    2.25(83)   0.81(15)   1.02(21)   1.03(20)
a064m400₂    1.52(42)   0.50(6)    0.57(11)   0.66(11)

Table 3.3: Integrated autocorrelation times of the topological charge Q and of the pion, proton and Ω two-point correlators on the SWF flavor-symmetric ensembles.

One important remark is that the ensemble a064m400 has two entries in the table: they correspond to two different MD update spacings between saves of the gauge field configurations during the execution of the SMD algorithm. The save interval was first set to a small value, but a preliminary analysis suggested it was too small, as there was a large autocorrelation between the configurations, τQ = 2.25. This stresses the importance of this analysis. From the results for τπ, τp and τΩ we observe that a064m400₁ and a12m400 have values of the integrated autocorrelation time for at least one hadron that are considerably larger than the optimal value of 0.5, as explained in Appendix 1A.1. The solution we adopted was to apply statistical blocking to these ensembles, see Appendix 1A.1.1, with block size 2, before analyzing the data, in order to decorrelate the samples.

3.4.2 A Bayesian Analysis Framework for Hadron Masses

In Chapter 2 we extensively presented results for hadron correlators and effective masses. To the modern lattice practitioner, however, the analysis used in the previous chapter might have appeared too simplistic. In particular, we relied on constant fits to the effective masses. While this is still a widely used approach, an increasing number of precision studies now use more sophisticated techniques, even for basic hadron spectroscopy. One of the main issues with the constant-fit approach is the large bias introduced when selecting the fitting range from the effective mass. While this might not seem like a large problem in the case of mesons, where the signal does not degrade towards large source-sink separation times, in the case of baryons, where the signal-to-noise problem arises, this bias can be non-negligible. However, the systematic error introduced by the choice of the fitting range is not always investigated, nor is there a clear framework for how to estimate it.

One apparently obvious improvement over constant fits of the effective mass is a better approximating functional form. In particular, since we know that correlators are sums of decaying exponentials, as shown in Section 1.4, one could try fitting the correlator directly in this way. Such an approach, usually called “multi-state fitting” in jargon, is valid and gives good results; however, the fitting algorithms tend to be unstable for such functional forms due to their non-linearity.
Furthermore, the higher excited states decay much faster due to their larger exponents, which leaves almost no signal in the correlator. These two problems lead to standard fitting procedures being somewhat unreliable and biased in the case of multi-state fits.

Bayesian Fitting with Constraints

A solution to the stability problem of multi-state fits for hadron correlators is the introduction of educated initial guesses for the fit parameters. These initial values, or priors, introduce a bias in the final results of the fit, which needs to be accounted for. Luckily, Bayesian statistics can deal with this issue in a systematic and mathematically rigorous way. The approach of Bayesian curve fitting in the presence of prior constraints, applied to the specific case of lattice correlator data, was first introduced by Lepage et al. in [34]. The starting point is to consider the exact form of the correlator from theory:
\[ C_{\rm th}(t, A_n, E_n) = \sum_{n=1}^{\infty} A_n\, e^{-E_n t} , \qquad (3.9) \]
where the A_n are the amplitudes of the wavefunctions corresponding to the energy levels E_n. A traditional fit would proceed by truncating the sum at a fixed number of states N_s and considering all the A_n and E_n as parameters. One would then minimize the χ² function defined as
\[ \chi^2(A_n, E_n) = \sum_{t,t'} \frac{\Delta C(t)\,\Delta C(t')}{\sigma^2_{t,t'}} , \qquad (3.10) \]
where ∆C(t) = C̄(t) − C_th(t, A_n, E_n), with C̄(t) being the ensemble-average data for the correlator at time t, and σ²_{t,t'} = ⟨C(t)C(t')⟩ − ⟨C(t)⟩⟨C(t')⟩ is the correlation matrix.

The first consideration to make is that the contributions of high-energy states are suppressed at large t, which means that there exists a time t_min above which only a finite number of states contribute, hence truncating the infinite sum in the theoretical formula. Choosing too small a value of t_min is not beneficial for fitting purposes and biases the values of A_n and E_n away from their true values, while too large a value leads to an effective loss of information and hence increased statistical errors. Without a rigorous way of determining the systematic uncertainties, this choice of t_min is biased and necessarily ad hoc, in the same way the choice of a fitting window in the constant-fit case is. A good procedure is then to test all possible values of t_min over a range of maximum numbers of states N_s.

To constrain the fits and improve stability, a choice of priors Ã_n, Ẽ_n for the amplitudes and energies of the states is made. The χ² is then redefined as
\[ \chi^2 \to \chi^2_{\rm aug} = \chi^2 + \chi^2_{\rm prior} , \qquad \chi^2_{\rm prior} = \sum_{n=1}^{N_s} \frac{(A_n - \tilde A_n)^2}{\sigma^2_{\tilde A_n}} + \sum_{n=1}^{N_s} \frac{(E_n - \tilde E_n)^2}{\sigma^2_{\tilde E_n}} . \qquad (3.11) \]
We should stress that the priors Ã_n, Ẽ_n and their uncertainties σ_{Ãn}, σ_{Ẽn} are inputs to the fit. The augmented χ² function χ²_aug favors fits whose final parameters lie close to the priors. It is therefore crucial to set the Gaussian uncertainties of the priors to reasonable values that are not too constraining. With these definitions, the constrained fitting procedure is to fit the data for all possible values of t_min while N_s is increased until the parameter of interest converges. It is interesting to notice that the final value of χ²_aug is not of particular interest in the case of constrained fits. What really matters is the stability and the convergence of the fits as t_min changes and N_s is increased.
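The following sketch illustrates this machinery on synthetic two-state data, using the gvar and lsqfit packages that are also used for the production fits in Section 3.4.3. All prior values are illustrative, and the weighting in the last lines anticipates the AIC of Equations (3.12) and (3.13) in the next subsection (a uniform model prior is assumed).

```python
import numpy as np
import gvar as gv
import lsqfit

# synthetic two-state "correlator" standing in for real lattice data
t_all = np.arange(1, 30)
mean = 0.010 * np.exp(-0.55 * t_all) + 0.006 * np.exp(-1.0 * t_all)
C = gv.gvar(mean, 0.02 * mean)               # 2% errors, uncorrelated (toy)

def fitfcn(t, p):
    E = np.cumsum(p['dE'])                   # E_n built from positive splittings
    return sum(p['A'][n] * np.exp(-E[n] * t) for n in range(len(p['A'])))

def make_prior(Ns):
    prior = gv.BufferDict()
    prior['A'] = gv.gvar(Ns * ['0.010(20)'])
    # a log-prior keeps every dE positive, i.e. the energies ordered
    prior['log(dE)'] = gv.log(gv.gvar(['0.55(10)'] + (Ns - 1) * ['0.5(5)']))
    return prior

results = {}
for Ns in range(1, 4):
    for tmin in range(0, 12):
        fit = lsqfit.nonlinear_fit(data=(t_all[tmin:], C[tmin:]),
                                   fcn=fitfcn, prior=make_prior(Ns))
        k, n_cut = 2 * Ns, tmin              # parameters / excluded points
        aic = fit.chi2 + 2 * k + 2 * n_cut   # cf. Eq. (3.12)
        results[Ns, tmin] = (fit.p['dE'][0], aic)

# AIC weights and weighted average of the ground-state energy, cf. Eq. (3.13)
aic_min = min(aic for _, aic in results.values())
w = {key: np.exp(aic_min - aic) for key, (_, aic) in results.items()}
norm = sum(w.values())
E0 = sum(w[key] * results[key][0] for key in results) / norm
print('model-averaged E0 =', E0)   # the spread term of Eq. (3.14) would add
                                   # the systematic from the model choice
```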
Bayesian Model Averaging

To further determine an unbiased value for the hadron masses, we use a Bayesian model averaging procedure based on the Akaike Information Criterion (AIC), recently proposed in [35] for the case of lattice correlator data with data-window selection. This procedure eliminates the need to “manually” check for convergence of the constrained fitting procedure as the number of states grows. Instead, it defines a weight for every fit result, parametrized by the pair (t_min, N_s), which is used to compute a weighted average of the fits. Given a fit result with parameters M = {A_1, ..., A_n, E_1, ..., E_n}, the AIC is
\[ {\rm AIC}_M = -2\log({\rm pr}(M)) + \chi^2_{\rm aug} + 2k + 2n , \qquad (3.12) \]
where pr(M) is the likelihood function of the model, k is the number of fit parameters, including priors, and n is the number of points excluded from the fit, i.e. it depends on changes of t_min. The minimum value of the AIC among all models is used to determine the relative likelihood of a given model M_i given some data D as
\[ {\rm pr}(M_i|D) = \frac{\exp({\rm AIC}_{\min} - {\rm AIC}_i)}{\sum_j \exp({\rm AIC}_{\min} - {\rm AIC}_j)} , \qquad (3.13) \]
where the denominator simply fixes the normalization to allow a probabilistic interpretation of the quantity. The relative likelihood is then used as the weight when computing the weighted average of a fit parameter ρ ∈ M. The unbiased estimator of the parameter is its weighted average, and its uncertainty is given by
\[ \sigma_\rho^2 = \sum_i \sigma_{\rho,i}^2\, {\rm pr}(M_i|D) + \sum_i \langle\rho\rangle_i^2\, {\rm pr}(M_i|D) - \Big( \sum_i \langle\rho\rangle_i\, {\rm pr}(M_i|D) \Big)^2 . \qquad (3.14) \]

There are a few considerations to make on this procedure. The first and most important is that it requires essentially no input: the whole fitting procedure now only requires making an educated guess for the priors and attaching to their values a reasonably large uncertainty, such that they do not become dominant in the fit. No further input is required. Another interesting feature, which can be immediately inferred from Equation (3.12), is that this method favors simpler models, i.e. fit functions with fewer states. This built-in Occam's razor is very useful to gain a posteriori intuition on how many states can be determined from a given set of data. At the same time, Equation (3.12) tells us that the procedure favors models with larger fitting ranges, hence more data, which is certainly a feature that we implicitly desire.

3.4.3 Light Hadron Spectrum and Scaling from SWF Ensembles

In order to compute the light hadron spectrum on the SWF ensembles, we implemented the exponential clover action in the Chroma software package [36], for both the QDP++ (traditional CPU) and QDP-JIT (just-in-time compiled, architecture-independent) backends. The propagators were computed using the BiCGSTAB solver from the highly optimized QUDA GPU package. In addition, we utilized the lalibe software [37], which builds on top of Chroma, to calculate the fermion contractions.

For this study, we decided to use gauge-invariant Gaussian smearing [38], see Section 1.3.4, and in order to choose suitable parameters, a quick study of the smearing results was conducted on 200 configurations with a = 0.094 fm. The results can be seen in Figure 3.11. When dealing with smearing, one needs to choose the parameters so as to suppress the higher excited states and reach the plateau earlier. However, if the smearing is excessive, the signal starts to degrade and the statistical errors increase.
In our case we opted for Nsmear = 32 and σ = 3.86, where σ, following the notation used in Section 1.3.4, is √(4αNsmear) and represents a dimensionless smearing radius.

Figure 3.11: Proton effective mass computed with different Gaussian smearing parameters, (Nsmear, σ) = (30, 3.0), (45, 3.5), (32, 3.86) and (64, 5.465). The data is from the ensemble a094m400, 200 configurations, 10 sources per configuration, with smeared-smeared propagators.

A common technique in lattice QCD calculations is the so-called “variational method” [39, 40]. In this context, it means that one computes a given correlation function with different combinations of interpolating operators at the source and the sink. A matrix is then constructed with all the results, and a Generalized Eigenvalue Problem (GEVP) is solved to greatly improve the signal of the lowest-lying states. This procedure, however, requires computing several propagators for every source, as different interpolating operators lead to different sources and hence different linear systems. Following the analysis conducted, for example, in [41], we used a numerically cheaper, though less powerful, method: computing only the point-smeared (PS) and smeared-smeared (SS) correlators for every hadron. This can be done with a single propagator solve, as the two correlators share the source.

The fitting of the amplitudes and energy levels of the two correlators is performed simultaneously, utilizing the Bayesian fitting procedure highlighted above. The correlated fits have been performed with the gvar and lsqfit python packages, which have been developed starting from the work done in [34]. The ansätze for the SS and PS multi-state correlators with N_s states read
\[ C_{SS}(t) = \sum_{n=0}^{N_s} Z_{S,n}^2\, e^{-E_n t} , \qquad C_{PS}(t) = \sum_{n=0}^{N_s} Z_{S,n}\, Z_{P,n}\, e^{-E_n t} , \qquad (3.15) \]
where the factors Z_{P,n} and Z_{S,n} represent the amplitudes at the source and the sink for a given energy level E_n.

As highlighted previously, our fitting procedure only requires a choice of the priors. For the hadron spectroscopy case, we had to choose values for all the parameters E_n, Z_{P,n} and Z_{S,n}, together with their uncertainties, to be used in χ²_prior, see Equation (3.11). The priors for the amplitudes Z̃_{P,n} have been chosen to be all equal to the ground-state one, which is fixed from the data, as we will see below. For the Z̃_{S,n} priors, the values from the second excited state up to N_s have been taken as half the ground-state one. This is motivated by the expectation that smearing should reduce the overlap with higher excited states. The uncertainties of the amplitude priors have been set from the data for the ground state, and to twice the ground-state uncertainty for the excited states. To summarize:
\[ \tilde Z_{P,0} \to \text{fixed from data} , \qquad \tilde Z_{P,i} = \tilde Z_{P,0} \ \ \text{for } i = 1 \ldots N_s , \qquad (3.16) \]
\[ \tilde Z_{S,0} \to \text{fixed from data} , \qquad \tilde Z_{S,1} = \tilde Z_{S,0} , \quad \tilde Z_{S,i} = \tilde Z_{S,0}/2 \ \ \text{for } i = 2 \ldots N_s , \]
\[ \sigma_{\tilde Z_{P,0}} \to \text{fixed from data} , \qquad \sigma_{\tilde Z_{P,i}} = 2\,\sigma_{\tilde Z_{P,0}} \ \ \text{for } i = 1 \ldots N_s , \]
\[ \sigma_{\tilde Z_{S,0}} \to \text{fixed from data} , \qquad \sigma_{\tilde Z_{S,1}} = 2\,\sigma_{\tilde Z_{S,0}} , \quad \sigma_{\tilde Z_{S,i}} = \sigma_{\tilde Z_{S,0}} \ \ \text{for } i = 2 \ldots N_s . \]
To fix the ground-state values, we plot the ground-state amplitude computed from the effective mass, for both the SS and PS smearings:
\[ \tilde Z_{S/P,0} = e^{m_{\rm eff}(t)\, t}\, C_{S/P}(t) , \qquad m_{\rm eff}(t) = \ln\frac{C_{S/P}(t)}{C_{S/P}(t+1)} , \qquad (3.17) \]
where C_{S/P}(t) is the short-hand notation for the two-point correlator computed with SS or PS smearing, respectively.
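A minimal sketch of the prior-fixing step of Equation (3.17), using a synthetic single-state correlator standing in for the real data (all numbers are illustrative):

```python
import numpy as np
import gvar as gv

# synthetic single-state correlator standing in for C_SS(t)
t = np.arange(30)
C = gv.gvar(0.010 * np.exp(-0.55 * t), 0.0002 * np.exp(-0.50 * t))

meff = np.log(C[:-1] / C[1:])          # m_eff(t) = ln[C(t)/C(t+1)]
Z = np.exp(meff * t[:-1]) * C[:-1]     # Z(t) = exp[m_eff(t) t] C(t)

# read the prior off the plateau and inflate its width by ~10x (cf. Fig. 3.12)
Z_prior = gv.gvar(gv.mean(Z[10]), 10 * gv.sdev(Z[10]))
print(Z_prior)
```

On a plateau the effective amplitude is constant, and inflating its uncertainty by an order of magnitude yields the deliberately loose prior discussed below.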
In Figure 3.12 we show an example of the prior-fixing procedure for Z̃_{P,0} and Z̃_{S,0} from the data. The gray bands in the plots represent the ±σ_{Z̃_{S/P,0}} band, which is taken to be approximately 10 times larger than the expected uncertainty of the real amplitude. This is to stress that the bias introduced by this prior choice is minimal, as it would be extremely hard to argue that the true value of the amplitude does not lie in the shaded region. From the values of Z̃_{S/P,0} and the corresponding σ_{Z̃_{S/P,0}}, the values of all the other priors are set as explained above.

Figure 3.12: Amplitudes for the point-smeared (left) and smeared-smeared (right) correlators and the respective prior choices. The data is from the a094m400 ensemble, with 100 stochastic sources per configuration.

The choice of the energy priors is made in a similar manner, with the difference that a logarithmic prior is used on the differences between the energy levels, to enforce the ordering of the energy states and improve the stability of the fit. In practice, we fix the prior Ẽ₀ and its associated σ_{Ẽ₀} by looking at the effective mass plot, again following the rule of thumb of 10 times the expected final uncertainty for σ_{Ẽ₀}. All the other energy priors are set to be ordered in steps of 2mπ above the ground state, with equal σ_{Ẽn}. In other terms:
\[ \tilde E_0 \to \text{fixed from data} , \qquad \ln(\tilde E_i - \tilde E_{i-1}) = \ln(2 m_\pi) \ \ \text{for } i = 1 \ldots N_s , \qquad (3.18) \]
\[ \sigma_{\tilde E_0} \to \text{fixed from data} , \qquad \sigma_{\tilde E_i} = \sigma_{\tilde E_0} \ \ \text{for } i = 2 \ldots N_s . \]
The value of mπ used in fixing the priors does not need to be very precise, because of the very large size of σ_{Ẽn}, so it can be determined with a rough constant fit to the effective mass, without fear of impacting the final results. In Figure 3.13 we show the procedure for fixing the ground-state energy priors and widths from the pion and proton effective masses. Again, the shaded region is the width of the prior.

Figure 3.13: Effective mass for the pion (left) and proton (right) correlators and the respective prior choice for the ground-state energy E₀. The data is from the a094m400 ensemble, with 100 stochastic sources per configuration.

With this information, we have all the required inputs for our fits. As outlined in the previous section, we perform a scan over a range of t_min, while the maximum time is fixed to the largest available value for the mesons and, for the baryons, to a value beyond which it does not affect the fit results due to the low signal-to-noise ratio. We also perform fits in the range N_s = 1, ..., 6. All the results are then combined using model averaging. In Figures 3.14 to 3.16 we show our results for the a094m400 ensemble for the pion, the proton and the Ω baryon, as these are the only possible hadrons at the flavor-symmetric point.

Figure 3.14: Fit results for the pion mass for the a094m400 ensemble.

Figure 3.15: Fit results for the proton mass for the a094m400 ensemble.
Figure 3.16: Fit results for the Ω baryon mass for the a094m400 ensemble.

The first feature to notice is that all the fits converge to the same value as N_s increases. This is the expected behavior, as first discussed in [34]. The low-N_s fits only agree with the other results when t_min is large enough that effectively no information on the higher states is present. This can also be inferred by looking at the final fitted parameters of the excited states for large t_min, which end up almost unchanged from their prior values, meaning that the data and the fitting procedure pose no constraint on those parameters. It should be noted that the height of the top panel in Figures 3.14 to 3.16 is the width of the prior of the ground-state energy. The gray band in all plots shows the model-average result and error. We can see that it is a good representation of the data and is indeed compatible with all the converged fits; its uncertainty is not significantly smaller, but of the same order, which perhaps dispels the myth that Bayesian fitting artificially decreases the final uncertainty.

The second panel of each figure shows the Q factor, or p-value, of each fit, which is the probability that the χ² of the fit could have been larger, assuming the best-fit model is correct and the data are Gaussian and consistent with each other. In general, good fits have Q > 0.1. The third panel shows the normalized Gaussian Bayes Factor (GBF) w_{Ns} of the fits at fixed t_min as N_s is changed. The GBF, or pr(D|M), is the probability of obtaining the fit data by randomly sampling the parameter model used in the fit. When comparing fit models at fixed t_min, the ones with higher w_{Ns} are more likely to represent the data. From this data, we can infer that some fits have very poor quality, as expected, especially at low numbers of states and low t_min; this is apparent when looking at the Q factor. The information in w_{Ns}, on the contrary, can help determine whether a certain number of states is too large. This happens when all fits with more than n states, for a given value of t_min, have similar w_{Ns}, which implies that adding further states to the fit function does not affect the fit. However, what in the end impacts our final estimator is pr(M|D), shown in the bottom panel of the figures, which, as explained in the previous section, is used as the weight when averaging the models. It is interesting to notice that the models with significant weight have N_s = 1, 2. This is expected since, from Equation (3.12), fits with large N_s are exponentially suppressed in weight.

With this procedure, repeated for all ensembles, we performed a first study of the scaling of hadron masses towards the continuum for the SWF ensembles at the flavor-symmetric point; all the results of the individual fits can be found in this chapter's appendix. First, in Figure 3.17 we computed ϕ₄ according to Equation (3.8), to check the mass tuning of the ensembles. According to the quality criterion that was imposed, the value of ϕ₄ should be within the gray band at the 1σ level, which is indeed the case, so the tuning has been performed correctly. In Figure 3.18 we show the continuum-scaling results, from which we can indeed see that SWF show almost no sign of cutoff effects for hadronic observables; the effects are, in fact, at the level of 1%.
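The continuum extrapolation itself is a simple correlated fit, linear in a² for an O(a)-improved action. A minimal sketch with lsqfit, using placeholder mass values rather than the actual OPENLAT results, reads:

```python
import numpy as np
import gvar as gv
import lsqfit

a2 = np.array([0.12, 0.094, 0.077, 0.064]) ** 2                # fm^2
# placeholder proton masses in MeV, not the actual OPENLAT results
mp = gv.gvar(['1183(12)', '1178(9)', '1176(8)', '1174(7)'])

def fcn(a2, p):
    return p['m0'] + p['c2'] * a2      # leading O(a^2) Symanzik scaling

prior = {'m0': gv.gvar('1200(400)'), 'c2': gv.gvar('0(1000)')}
fit = lsqfit.nonlinear_fit(data=(a2, mp), fcn=fcn, prior=prior)
print(fit.p['m0'], ' chi2/dof =', fit.chi2 / fit.dof)
```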
The continuum-limit extrapolations for the masses of the proton and the Ω are mp = 1175.0 ± 6.4 MeV and mΩ = 1454.7 ± 16.8 MeV, whose uncertainties of ≈ 0.5% and ≈ 1.1%, respectively, are on par with other determinations [41].

Figure 3.17: Values of ϕ₄ = 8t₀(mK² + mπ²/2) for SWF at the SU(3) flavor-symmetric point, as a function of a².

Figure 3.18: Continuum limit extrapolation for the proton (left, χ²/dof = 0.05) and Ω (right, χ²/dof = 0.47) masses for SWF at the SU(3) flavor-symmetric point.

3.5 Outlook for Studies with SWF

The results for the scaling properties of the hadron spectrum are very encouraging and show the validity and quality of the new SWF ensembles that have been generated by the OPENLAT initiative. At the most recent International Symposium on Lattice Field Theory, several works were presented that use SWF, with topics in spectroscopy [4, 24], finite temperature and master-field simulations [6]. Another ongoing study, which we should mention, is the calculation of CP-violating operators for searches of the neutron electric dipole moment [42], which should see its first results in the upcoming year. Some of these studies are still at a preliminary stage and consist of calculations that have previously been performed on other ensembles with different actions. It is therefore interesting to repeat them using SWF configurations and compare the results to assess their quality. All these results give growing evidence of reduced cutoff effects when compared to other discretizations. At the same time, more ensembles with different physical parameters, most importantly different pion masses, are continuously being tuned and generated in an ongoing effort by the OPENLAT Initiative on several HPC facilities worldwide; when they are ready and made publicly available, we expect to see wide adoption of the SWF ensembles for studies of Lattice QCD.

BIBLIOGRAPHY

[1] B. Sheikholeslami and R. Wohlert, “Improved continuum limit lattice action for QCD with wilson fermions,” Nuclear Physics B, vol. 259, no. 4, pp. 572–596, Sep. 30, 1985, issn: 0550-3213. doi: 10.1016/0550-3213(85)90002-1.

[2] Y. Aoki et al., “FLAG review 2021,” The European Physical Journal C, vol. 82, no. 10, p. 869, Oct. 4, 2022, issn: 1434-6052. doi: 10.1140/epjc/s10052-022-10536-1.

[3] M. Lüscher, “Stochastic locality and master-field simulations of very large lattices,” EPJ Web of Conferences, vol. 175, M. Della Morte, P. Fritzsch, E. Gámiz Sánchez, and C. Pena Ruano, Eds., p. 01 002, 2018, issn: 2100-014X. doi: 10.1051/epjconf/201817501002.

[4] A. Francis, J. R. Green, P. M. Junnarkar, C. Miao, T. D. Rae, and H. Wittig, “Lattice QCD study of the h dibaryon using hexaquark and two-baryon interpolators,” Physical Review D, vol. 99, no. 7, p. 074 505, Apr. 12, 2019, issn: 2470-0010, 2470-0029. doi: 10.1103/PhysRevD.99.074505.

[5] D. Mohler and S. Schaefer, “Remarks on strange-quark simulations with wilson fermions,” Physical Review D, vol. 102, no. 7, p. 074 506, Oct. 27, 2020, issn: 2470-0010, 2470-0029. doi: 10.1103/PhysRevD.102.074506.

[6] P. Fritzsch, J. Bulava, M. Cè, A. Francis, M.
Lüscher, and A. Rago, “Master-field simulations of QCD,” in Proceedings of The 38th International Symposium on Lattice Field Theory — PoS(LATTICE2021), May 16, 2022, p. 465. doi: 10.22323/1.396.0465. [7] M. Cè et al., “Approaching the master-field: Hadronic observables in large volumes,” in Proceedings of The 38th International Symposium on Lattice Field Theory — PoS(LATTICE2021), May 16, 2022, p. 383. doi: 10.22323/1.396.0383. [8] F. Cuteri et al. “OPEN LATtice initiative,” OPEN LATtice initiative. (2021), [Online]. Available: https://openlat1.gitlab.io/. [9] C. Davies, A. Irving, R. Kenway, and C. Maynard, “International lattice data grid,” Nuclear Physics B - Proceedings Supplements, vol. 119, pp. 225–226, May 2003, issn: 09205632. doi: 10.1016/S0920-5632(03)01509-3. [10] M. G. Beckett et al., “Building the international lattice data grid,” Computer Physics Commu- nications, vol. 182, no. 6, pp. 1208–1214, Jun. 2011, issn: 00104655. doi: 10.1016/j.cpc.20 11.01.027. [11] A. Francis, P. Fritzsch, M. Lüscher, and A. Rago, “Master-field simulations of o( a )-improved lattice QCD: Algorithms, stability and exactness,” Computer Physics Communications, vol. 255, p. 107 355, Oct. 2020, issn: 00104655. doi: 10.1016/j.cpc.2020.107355. 116 [12] M. Lüscher, “Code available at https://luscher.web.cern.ch/luscher/openQCD/,” [13] M. Hasenbusch, “Speeding up finite step-size updating of full QCD on the lattice,” Physical Review D, vol. 59, no. 5, p. 054 505, Feb. 3, 1999. doi: 10.1103/PhysRevD.59.054505. [14] M. Hasenbusch, “Speeding up the hybrid monte carlo algorithm for dynamical fermions,” Physics Letters B, vol. 519, no. 1, pp. 177–182, Oct. 25, 2001, issn: 0370-2693. doi: 10.1016/ S0370-2693(01)01102-9. [15] C. Urbach, K. Jansen, A. Shindler, and U. Wenger, “HMC algorithm with multiple time scale integration and mass preconditioning,” Computer Physics Communications, vol. 174, no. 2, pp. 87–98, Jan. 15, 2006, issn: 0010-4655. doi: 10.1016/j.cpc.2005.08.006. [16] M. Lüscher and S. Schaefer, “Lattice QCD with open boundary conditions and twisted-mass reweighting,” Computer Physics Communications, vol. 184, no. 3, pp. 519–528, Mar. 1, 2013, issn: 0010-4655. doi: 10.1016/j.cpc.2012.10.003. [17] M. Lüscher, “Lattice QCD and the schwarz alternating procedure,” Journal of High Energy Physics, vol. 2003, no. 5, pp. 052–052, May 20, 2003, issn: 1029-8479. doi: 10.1088/1126-6 708/2003/05/052. [18] M. Lüscher, “Schwarz-preconditioned HMC algorithm for two-flavor lattice QCD,” Computer Physics Communications, vol. 165, no. 3, pp. 199–220, Feb. 1, 2005, issn: 0010-4655. doi: 10.1016/j.cpc.2004.10.004. [19] A. Horowitz, “Stochastic quantization in phase space,” Physics Letters B, vol. 156, no. 1, pp. 89–92, Jun. 1985, issn: 03702693. doi: 10.1016/0370-2693(85)91360-7. [20] A. M. Horowitz, “The second order langevin equation and numerical simulations,” Nucl. Phys. B, vol. 280, p. 510, 1987. doi: 10.1016/0550-3213(87)90159-3. [21] A. M. Horowitz, “A generalized guided monte carlo algorithm,” Physics Letters B, vol. 268, no. 2, pp. 247–252, Oct. 1991, issn: 03702693. doi: 10.1016/0370-2693(91)90812-5. [22] K. Jansen and C. Liu, “Kramers equation algorithm for simulations of QCD with two flavors of wilson fermions and gauge group SU (2 ),” Nuclear Physics B, vol. 453, no. 1, pp. 375–392, Oct. 1995, issn: 05503213. doi: 10.1016/0550-3213(95)00427-T. [23] M. Lüscher, “Ergodicity of the SMD algorithm in lattice QCD, unpublished notes (2017), http://luscher.web.cern.ch/luscher/notes/smd-ergodicity.pdf,” [24] A. S. 
Francis et al., “Properties, ensembles and hadron spectra with stabilised wilson fermions,” in Proceedings of The 38th International Symposium on Lattice Field Theory — PoS(LATTICE2021), May 16, 2022, p. 118. doi: 10.22323/1.396.0118. 117 [25] M. Lüscher, S. Sint, R. Sommer, P. Weisz, and U. Wolff, “Non-perturbative o(a) improvement of lattice QCD,” Nuclear Physics B, vol. 491, no. 1, pp. 323–343, Apr. 28, 1997, issn: 0550-3213. doi: 10.1016/S0550-3213(97)00080-1. [26] J. Bulava and S. Schaefer, “Improvement of n f = 3 lattice QCD with wilson fermions and tree-level improved gauge action,” Nuclear Physics B, vol. 874, no. 1, pp. 188–197, Sep. 2013, issn: 05503213. doi: 10.1016/j.nuclphysb.2013.05.019. [27] W. Bietenholz et al., “Tuning the strange quark mass in lattice simulations,” Physics Letters B, vol. 690, no. 4, pp. 436–441, Jun. 2010, issn: 03702693. doi: 10.1016/j.physletb.2010.05 .067. [28] M. Bruno, T. Korzec, and S. Schaefer, “Setting the scale for the CLS 2 + 1 flavor ensembles,” Physical Review D, vol. 95, no. 7, p. 074 504, Apr. 12, 2017, issn: 2470-0010, 2470-0029. doi: 10.1103/PhysRevD.95.074504. [29] S. Aoki et al., “Review of lattice results concerning low-energy particle physics: Flavour lattice averaging group (FLAG),” The European Physical Journal C, vol. 77, no. 2, p. 112, Feb. 2017, issn: 1434-6044, 1434-6052. doi: 10.1140/epjc/s10052-016-4509-7. [30] M. Lüscher, “Properties and uses of the wilson flow in lattice QCD,” Journal of High Energy Physics, vol. 2010, no. 8, p. 71, Aug. 2010, issn: 1029-8479. doi: 10.1007/JHEP08(2010)071. [31] M. Bruno et al., “Simulation of QCD with n f = 2 + 1 flavors of non-perturbatively improved wilson fermions,” Journal of High Energy Physics, vol. 2015, no. 2, p. 43, Feb. 2015, issn: 1029-8479. doi: 10.1007/JHEP02(2015)043. [32] T. Takaishi and P. de Forcrand, “Testing and tuning symplectic integrators for the hybrid monte carlo algorithm in lattice QCD,” Physical Review E, vol. 73, no. 3, p. 036 706, Mar. 17, 2006, issn: 1539-3755, 1550-2376. doi: 10.1103/PhysRevE.73.036706. [33] M. Lüscher and S. Schaefer, “Lattice QCD without topology barriers,” Journal of High Energy Physics, vol. 2011, no. 7, p. 36, Jul. 7, 2011, issn: 1029-8479. doi: 10.1007/JHEP07(2011)0 36. [34] G. P. Lepage et al., “Constrained curve fitting,” Nuclear Physics B - Proceedings Supplements, vol. 106-107, pp. 12–20, 2002, issn: 0920-5632. doi: https://doi.org/10.1016/S0920-563 2(01)01638-3. [35] W. I. Jay and E. T. Neil, “Bayesian model averaging for analysis of lattice field theory results,” Physical Review D, vol. 103, no. 11, p. 114 502, Jun. 9, 2021, issn: 2470-0010, 2470-0029. doi: 10.1103/PhysRevD.103.114502. [36] R. G. Edwards and B. Joó, “The chroma software system for lattice QCD,” Nuclear Physics B 118 - Proceedings Supplements, vol. 140, pp. 832–834, Mar. 2005, issn: 09205632. doi: 10.1016/j. nuclphysbps.2004.11.254. [37] A. Gambhir et al., “Code available at https://github.com/callat-qcd/lalibe,” [38] S. Güsken, “A study of smearing techniques for hadron correlation functions,” Nuclear Physics B - Proceedings Supplements, vol. 17, pp. 361–364, Sep. 1, 1990, issn: 0920-5632. doi: 10.1016/0920-5632(90)90273-W. [39] B. J. Owen et al., “Variational approach to the calculation of g a,” Physics Letters B, vol. 723, no. 1, pp. 217–223, Jun. 2013, issn: 03702693. doi: 10.1016/j.physletb.2013.04.063. [40] J. Dragos et al., “Nucleon matrix elements using the variational method in lattice QCD,” Physical Review D, vol. 94, no. 7, p. 074 505, Oct. 
25, 2016, issn: 2470-0010, 2470-0029. doi: 10.1103/PhysRevD.94.074505.

[41] N. Miller et al., “Scale setting the möbius domain wall fermion on gradient-flowed HISQ action using the omega baryon mass and the gradient-flow scales t0 and w0,” Physical Review D, vol. 103, no. 5, p. 054 511, Mar. 24, 2021, issn: 2470-0010, 2470-0029. doi: 10.1103/PhysRevD.103.054511.

[42] J. Dragos, T. Luu, A. Shindler, J. de Vries, and A. Yousif, “Confirming the existence of the strong CP problem in lattice QCD with the gradient flow,” Physical Review C, vol. 103, no. 1, p. 015 202, Jan. 19, 2021. doi: 10.1103/PhysRevC.103.015202.

APPENDIX: BAYESIAN FIT RESULTS FOR SWF HADRON MASSES

In the table below we report the fit results for the SWF ensembles, from which Figures 3.17 and 3.18 are obtained. All results were obtained using the Bayesian analysis framework presented in Section 3.4.2.

Ensemble    t₀/a²         amπ            amp          amΩ
a12m400     1.48548(64)   0.25075(28)    0.7096(35)   0.8659(73)
a094m400S   2.4482(50)    0.19483(40)    0.5586(47)   0.6917(68)
a094m400    2.43979(89)   0.19470(17)    0.5541(19)   0.6791(63)
a077m400    3.6198(30)    0.159903(83)   0.4568(20)   0.5658(34)
a064m400    5.2588(46)    0.132932(84)   0.3793(17)   0.4659(55)

Table 3.4: Final results for the hadron masses from the SWF ensembles.

We also include, in Figures 3.19 to 3.22, the stability plots and weights for the ensembles not shown in Section 3.4. It is important to notice the stability of the results of the fitting procedure: they are always compatible with the vast majority of the fits and do not underestimate the uncertainties.

Figure 3.19: Fit results for the pion, proton and Ω masses for the a12m400 ensemble.

Figure 3.20: Fit results for the pion, proton and Ω masses for the a094m400S ensemble.

Figure 3.21: Fit results for the pion, proton and Ω masses for the a077m400 ensemble.

Figure 3.22: Fit results for the pion, proton and Ω masses for the a064m400 ensemble.

CHAPTER 4: QUANTUM STATE PREPARATION ALGORITHMS FOR THE SCHWINGER MODEL

Despite the ever-increasing computational capacities of modern HPC facilities, one issue with LFT calculations seems impossible to solve: the sign problem. This arises when the PDF sampled in the MCMC is not strictly positive. Clearly, this implies that the PDF is not actually a probability distribution, and sampling it is only an approximation.
Since the PDF in LQCD comes from the exponential of the Euclidean action and the fermion determinants, one must ensure their positivity, but this is not possible in the presence of chemical potentials [1], topological terms [2, 3] or real-time evolution [4]. A possible solution would be to switch to the Hamiltonian formalism, where the sign problem is absent, but this leads to vector spaces that grow exponentially with the number of lattice sites, which are intractable even on modern supercomputers. One way to deal with this formalism would be to utilize a computing machine that can handle such bases naturally. Luckily, such machines do exist: quantum computers.

The field of Quantum Computing (QC) has made giant leaps forward in the last decade, mostly thanks to the experimental improvements that led to the creation of the first hardware. The idea of quantum computers was first proposed in the 1970s, with the development of the field of quantum information [5–7]. Later they were suggested as a means to simulate quantum problems directly [8–10]. The main advantage they have over classical computers, in the context of physics applications, is that they offer a natural way to implement the antisymmetry requirements of fermionic systems, which will become important in Section 4.2.1. In more general terms, quantum computers use qubits as their basic information unit instead of bits, and operations on qubits are performed using quantum gates instead of binary logic gates.

Currently, we are in the so-called Noisy Intermediate-Scale Quantum (NISQ) era of hardware, meaning that current state-of-the-art machines have between tens and a few hundred qubits. This is not enough for fault-tolerant algorithms (hence “noisy”), nor for “quantum supremacy” (that is, solving a problem faster than any classical computer could). Current hardware designs are varied [11]; some are based on superconducting circuits [12], trapped ions [13], optical lattices [14] or solid-state systems [15]. The model of computation that we employ in this work is the quantum gate array, in which an algorithm is represented by a chain of quantum gate operations applied to the qubits of the system. In this model, a physics simulation can be performed by first preparing a state of interest, then letting it evolve in real time with the Hamiltonian and finally measuring an observable. Each of the three phases of the computation presents its own challenges and is actively being studied. In this work, we focus on three algorithms for state preparation: Adiabatic State Preparation (ASP), the Quantum Approximate Optimization Algorithm (QAOA) and the Rodeo Algorithm.

Dealing with QCD on a quantum computer is currently beyond our capabilities, though there is active research into possible ways to represent LFTs on qubits [16–20] or to employ tensor networks instead [21]. Furthermore, with the current hardware limitations, there is no chance of reaching the scale of the simulations currently being done in LQCD, especially when considering the limitations due to noise. There is nevertheless a strong push to investigate the feasibility of quantum computations of lattice field theories and a search for new algorithms and approaches. In this work, we therefore shift our focus to the Schwinger model, which is a toy model for QCD, as it shares some interesting properties like confinement.
This chapter is structured as follows: first, in Section 4.1, we give a brief introduction to quantum computing; then, in Section 4.2, the Schwinger model is introduced and a possible way to discretize its Hamiltonian on a quantum computer is presented. Finally, in Sections 4.3 to 4.5, we discuss three algorithms for state preparation and compare their results.

4.1 A Quick Tour of Quantum Computing

In computer science, the basic block of information is the bit, a binary state that can either be 0 or 1. A combination of n bits, a bit-string, can represent the 2ⁿ numbers from 0 to 2ⁿ − 1. In quantum computing, one deals with qubits instead. The fundamental states are usually represented as
\[ |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} , \qquad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} . \qquad (4.1) \]
Qubits, however, as opposed to bits, can be in any superposition of these two states:
\[ |q\rangle = a\,|0\rangle + b\,|1\rangle , \qquad a, b \in \mathbb{C} , \quad |a|^2 + |b|^2 = 1 . \qquad (4.2) \]
Qubit states are normalized to one and their amplitudes can be complex. A convenient representation of one qubit is given by the Bloch sphere, with a parametrization based on two angles θ and ϕ:
\[ |q\rangle = \cos\frac{\theta}{2}\,|0\rangle + \sin\frac{\theta}{2}\, e^{i\phi}\,|1\rangle . \qquad (4.3) \]
Quantum states can then be represented as points on the surface of the sphere, as shown pictorially in Figure 4.1.

Figure 4.1: Graphic representation of the Bloch sphere. A quantum state corresponds to a point on the surface of the sphere, parametrized by two angles θ and ϕ.

States composed of multiple qubits are obtained by taking the tensor product of single-qubit states:
\[ |q_1 q_2\rangle = |q_1\rangle \otimes |q_2\rangle = \begin{pmatrix} a_1 \\ b_1 \end{pmatrix} \otimes \begin{pmatrix} a_2 \\ b_2 \end{pmatrix} = \begin{pmatrix} a_1 a_2 \\ a_1 b_2 \\ b_1 a_2 \\ b_1 b_2 \end{pmatrix} . \qquad (4.4) \]
With this basic understanding of how qubits work, we can start looking at the operations one can apply to them.

4.1.1 Quantum Gate Operations

Even though most computer programmers do not think about the bit-level operations happening in their CPUs, they are fundamentally designing algorithms that are chains of logic gates, such as NOT, AND, OR and XOR and their combinations. These gates are grouped together into the opcodes of the CPU, operations that can be done in a single clock cycle, like addition, multiplication or more complex ones, which take bits from a register, run them through the relevant logic-gate path and write the result to another register location. Quantum computers today do not yet have this last level of abstraction, and algorithms are still written in terms of single gate operations. However, the fundamental gates are not the familiar binary ones; we now summarize their main features.

All qubit gate operations are required to be unitary transformations on the qubits they act upon. This is imposed by the constraint of preserving the normalization of the wave function. There is, in principle, no limit on the number of qubits that a single gate operation can act upon, but due to hardware constraints, NISQ machines only implement 1- and 2-qubit operations. In general, gates are represented with diagrams where horizontal lines stand for single qubits and boxes and vertical lines stand for gate operations. The flow is read from left to right.
4.1.1 Quantum Gate Operations

Even though most computer programmers do not think about the bit-level operations happening in their CPUs, they are fundamentally designing algorithms that are chains of logic gates, such as NOT, AND, OR and XOR and their combinations. These gates are grouped together into the opcodes of the CPU: operations that can be executed in a single clock cycle, like addition, multiplication or more complex ones, which take bits contained in a register, run them through the relevant logic-gate path and write the result to another register location. Quantum computers today do not yet have this last level of abstraction, and algorithms are still written in terms of single gate operations. However, the fundamental gates are not the familiar binary ones. We now summarize the main features of such gates.

All qubit gate operations are required to be unitary transformations on the qubits they act upon. This is imposed by the constraint of preserving the normalization of the wave function. There is, in principle, no limit on the number of qubits that a single gate operation can act upon, but due to hardware constraints, NISQ machines only implement one- and two-qubit operations. Gates are generally represented with circuit diagrams in which horizontal lines stand for single qubits, while boxes and vertical lines stand for gate operations; the flow is read from left to right. For a generic 1-qubit gate, the diagram is a single horizontal line entering a box labeled U: the qubit starts in the initial state |q⟩ and exits in the final state U|q⟩.

All 1-qubit gates can be represented by a generic SU(2) transformation, which requires three parameters, usually identified with three angles:
\[
U(\theta, \phi, \lambda) = \begin{pmatrix} \cos\frac{\theta}{2} & -e^{i\lambda}\sin\frac{\theta}{2} \\ e^{i\phi}\sin\frac{\theta}{2} & e^{i(\phi+\lambda)}\cos\frac{\theta}{2} \end{pmatrix}. \tag{4.5}
\]
This general form is, however, not very practical, so some special cases have their own symbols and diagrams.

Pauli Gates. The first class of gates that we introduce are the single-qubit gates that represent the Pauli matrices. These gates inherit the properties of the Pauli matrices: for example, they anticommute and are involutory (they are their own inverse), so their square is the identity matrix. For completeness, we list them below:
\[
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
X = \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
Y = \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
Z = \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
\]
It is interesting to notice that the X gate maps |0⟩ → |1⟩ and |1⟩ → |0⟩, so it is the quantum analogue of the NOT gate.

Rotation Gates. Another class of gates that will be very important in this work is that of the rotation gates, which represent rotations around one of the three axes X, Y, Z. These form a complete basis for SU(2), alternative to the Pauli basis, meaning that any gate can, in principle, be decomposed into rotations. They are defined as:
\[
R_X(\theta) = e^{-i\frac{\theta}{2}X} = \begin{pmatrix} \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2} \\ -i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}, \quad
R_Y(\theta) = e^{-i\frac{\theta}{2}Y} = \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}, \quad
R_Z(\theta) = e^{-i\frac{\theta}{2}Z} = \begin{pmatrix} e^{-i\frac{\theta}{2}} & 0 \\ 0 & e^{i\frac{\theta}{2}} \end{pmatrix}.
\]

Phase Shift Gates. One useful quantum gate is the phase shift gate. It is parametrized by one angle ϕ and is defined as:
\[
P(\phi) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\phi} \end{pmatrix}.
\]
This gate corresponds to a rotation about the z-axis of the Bloch sphere. It maps the basis states as |0⟩ → |0⟩ and |1⟩ → e^{iϕ}|1⟩. It should be noted that the phase shift gate is not Hermitian; instead, the relation P†(ϕ) = P(−ϕ) holds.

Hadamard Gate. A crucial element in quantum computing is the possibility of creating states that are superpositions of the basis states. In practice, this is done using the Hadamard gate:
\[
H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.
\]
In particular, it maps the basis states to states that have an equal probability of being measured in either basis state:
\[
H|0\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad H|1\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}}. \tag{4.6}
\]
These states, when measured, have a 50% probability of returning either 0 or 1.

Controlled Two-Qubit Gates. The class of controlled two-qubit gates comprises operations acting on two qubits where one of the two, the control, is kept constant and determines the transformation applied to the other qubit, the target. For a general 1-qubit operator U, its controlled version is given by:
\[
CU = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & u_{00} & u_{01} \\ 0 & 0 & u_{10} & u_{11} \end{pmatrix}.
\]
A particular case is that of the controlled-X gate or, as it is most often called, the CNOT gate:
\[
\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.
\]
The CNOT gate flips the state of the target qubit based on the control qubit: if the control is |0⟩, the target qubit is left unchanged; if the control is |1⟩, the target is flipped. A key use of the CNOT gate is to entangle or disentangle two qubits, the typical example being the preparation of a Bell state with just a Hadamard and a CNOT: starting from |00⟩, a Hadamard on the first qubit followed by a CNOT yields
\[
|\psi\rangle = \frac{1}{\sqrt{2}}\left( |00\rangle + |11\rangle \right),
\]
which is a maximally entangled state.
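Since all of these gates are just small unitary matrices, their action can be checked directly in linear algebra. The following minimal NumPy sketch reproduces the Bell-state construction just described, using the same basis ordering as Equation (4.4):

```python
import numpy as np

# Single-qubit gates from the list above
X = np.array([[0, 1], [1, 0]], dtype=complex)          # quantum NOT
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I2 = np.eye(2, dtype=complex)

# CNOT in the |q1 q2> ordering used above (first qubit = control)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

ket0 = np.array([1, 0], dtype=complex)
assert np.allclose(X @ ket0, [0, 1])  # X maps |0> to |1>

# Bell-state preparation: H on the control qubit, then CNOT
ket00 = np.kron(ket0, ket0)
bell = CNOT @ np.kron(H, I2) @ ket00
print(bell.round(3))  # [0.707, 0, 0, 0.707] -> (|00> + |11>)/sqrt(2)
```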
Another gate that we will employ is the controlled phase gate, which is simply given by:
\[
CP(\phi) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & e^{i\phi} \end{pmatrix}.
\]

Ising Gates. The last class of gates that we present are the so-called Ising gates. These have special relevance for physics problems, as they are often used in real-time evolution algorithms. They are two-qubit gates that represent the exponential of pairs of X, Y, Z gates acting on both qubits. Their definitions are as follows:
\[
R_{XX}(\theta) = e^{-i\frac{\theta}{2}X_1 \otimes X_2} = \begin{pmatrix} \cos\frac{\theta}{2} & 0 & 0 & -i\sin\frac{\theta}{2} \\ 0 & \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2} & 0 \\ 0 & -i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} & 0 \\ -i\sin\frac{\theta}{2} & 0 & 0 & \cos\frac{\theta}{2} \end{pmatrix},
\]
\[
R_{YY}(\theta) = e^{-i\frac{\theta}{2}Y_1 \otimes Y_2} = \begin{pmatrix} \cos\frac{\theta}{2} & 0 & 0 & i\sin\frac{\theta}{2} \\ 0 & \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2} & 0 \\ 0 & -i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} & 0 \\ i\sin\frac{\theta}{2} & 0 & 0 & \cos\frac{\theta}{2} \end{pmatrix},
\]
\[
R_{ZZ}(\theta) = e^{-i\frac{\theta}{2}Z_1 \otimes Z_2} = \begin{pmatrix} e^{-i\frac{\theta}{2}} & 0 & 0 & 0 \\ 0 & e^{i\frac{\theta}{2}} & 0 & 0 \\ 0 & 0 & e^{i\frac{\theta}{2}} & 0 \\ 0 & 0 & 0 & e^{-i\frac{\theta}{2}} \end{pmatrix}.
\]
These gates are difficult to implement directly in hardware, so, in practice, a decomposition into CNOTs and single-qubit rotations is often used:
\[
R_{ZZ}(\theta) = \mathrm{CNOT}\,\big(I \otimes R_Z(\theta)\big)\,\mathrm{CNOT}, \qquad
R_{XX}(\theta) = \mathrm{CNOT}\,\big(R_X(\theta) \otimes I\big)\,\mathrm{CNOT},
\]
\[
R_{YY}(\theta) = \big(R_Z(-\tfrac{\pi}{2}) \otimes R_Z(-\tfrac{\pi}{2})\big)\, R_{XX}(\theta)\, \big(R_Z(\tfrac{\pi}{2}) \otimes R_Z(\tfrac{\pi}{2})\big).
\]

4.1.2 Measurement of Observables

In quantum computing algorithms the wave function is propagated through all the gates, effectively applying all the corresponding operators. Once the algorithm is finished, the final wave function is in a superposition of states in a Hilbert space whose dimension is set by the number of qubits used in the algorithm. When a measurement is performed, each qubit can only return the value 0 or 1, depending on the state onto which the wave function collapses. The readout is therefore a binary string whose length is the number of qubits measured simultaneously. The probability of a given readout is given by the amplitudes of the different states in the wave function. If two or more qubits are entangled, then the measurement of one will affect the state of the others. A measurement on a given qubit is usually denoted in circuit diagrams by a meter symbol, which takes the quantum state |q⟩ = a|0⟩ + b|1⟩ to the classical outcome 0 or 1.

As an example, if the wave function of a 2-qubit system after a series of quantum gates is in the state:
\[
|\psi\rangle = \sqrt{\frac{1}{3}}\,|00\rangle - \sqrt{\frac{1}{2}}\,|01\rangle - i\sqrt{\frac{1}{6}}\,|10\rangle, \tag{4.7}
\]
then a measurement on the two qubits would return 00 with probability p(00) = 1/3, 01 with p(01) = 1/2 and 10 with p(10) = 1/6. The outcome 11 would have zero probability for the above quantum state.

One important thing to consider is that measurements in quantum computing are destructive, meaning that the quantum state is lost, due to wave-function collapse, after measuring. This has the practical implication that in order to perform any measurement, one needs to repeat the whole algorithm several times (or perform many "shots" in the jargon) to estimate the relative probabilities of the final states. This presents an interesting dichotomy: all of the quantum evolution, through unitary gates such as the ones we have presented in this section, is fully deterministic, while the measurement, a non-unitary operation, is a stochastic process.
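Both points above, the CNOT-based decomposition of the Ising gates and the finite-shot estimation of probabilities, can be verified numerically. The following is a minimal NumPy sketch; the test angle and the random seed are arbitrary choices:

```python
import numpy as np

def rz(theta):
    # R_Z(theta) as defined above
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

I2 = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

theta = 0.83  # arbitrary test angle

# RZZ(theta) from its matrix definition above (Z(x)Z has eigenvalues +1,-1,-1,+1)
rzz = np.diag(np.exp(-1j * theta / 2 * np.array([1, -1, -1, 1])))

# Decomposition check: CNOT, then RZ(theta) on the target qubit, then CNOT
assert np.allclose(CNOT @ np.kron(I2, rz(theta)) @ CNOT, rzz)

# Finite-shot sampling of the example state in Eq. (4.7)
psi = np.array([np.sqrt(1 / 3), -np.sqrt(1 / 2), -1j * np.sqrt(1 / 6), 0])
rng = np.random.default_rng(seed=1)
shots = rng.choice(["00", "01", "10", "11"], size=10_000, p=np.abs(psi) ** 2)
# Empirical frequencies approach 1/3, 1/2, 1/6 and 0 as the shot count grows
print({s: (shots == s).mean() for s in ["00", "01", "10"]})
```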
4.1.3 Practical Considerations

The current state of quantum simulations for theoretical physics is, as stated previously, in the phase of developing new tentative algorithms to solve problems. In order to test them, one has two choices: using a quantum simulator or utilizing physical hardware.

Quantum simulators are programs that translate an algorithm, described in terms of qubit lines and gates, into a linear algebra problem, where the size of the basis is 2^N, with N the number of qubits. Simulators offer a lot of flexibility, as one can inspect the wave-function state in its entirety at any point of the algorithm, something that cannot be done on physical hardware. Furthermore, one is free to design custom gates or operations. The drawback is obviously the high computational cost of the simulation and the memory limitations that effectively bound the number of qubits one can simulate; this should not surprise the reader, as the exponential growth in the number of basis states is the very reason to use quantum computers instead of classical ones in the first place.

The most widely used quantum simulator is probably Qiskit [22], developed and maintained by IBM. It provides a set of tools to effectively write algorithms and analyze their results. As with most simulators, it can both give access to the wave-function state and simulate measurement processes by randomly sampling the state vector according to its amplitudes. Another feature worth mentioning is the unified framework that Qiskit offers for executing algorithms on the quantum hardware by IBM that is made publicly available [23].

While quantum simulators provide a powerful tool for testing algorithms, it is still tempting for many to try to execute them directly on physical hardware. Given the current state of quantum devices, this is purely an exercise: very little, if anything, needs to be changed in an algorithm to allow it to be executed on quantum hardware, while most systems that are being simulated are small enough to be fully analyzed either classically or with a simulator. This, however, is about to change due to the rapid increase in the qubit count of machines.

Noise in Quantum Computing. The main difference between quantum simulators and real hardware is the effect of noise. Physical qubits are susceptible to perturbations coming from the external environment. There are multiple factors that add noise to the results of an algorithm. The first is the coherence time of the qubits themselves, i.e. the time a qubit can retain its information without random state changes due to the environment. The second factor is the "efficiency" of the quantum gates, that is, the level of precision of the unitary transformations applied to the qubits. In particular, two-qubit gates are hard to construct, which makes them the noisiest. In fact, a metric currently used to describe the length of a quantum algorithm is the count of its CNOT gates, as they impact the total precision of the algorithm orders of magnitude more than single-qubit gates. The last source of noise in quantum devices is the measurement process, which has some level of experimental uncertainty that can lead to false reads of the state of the qubits. For context, the publicly available IBM machines of the Falcon family are quantum systems of 5 or 7 superconducting transmon qubits. Each qubit has a relaxation time of ≈ 100 µs, while the CNOT gate error is of the order of 1% and the readout error is at the level of 3%. These figures are taken directly from [23].
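To give a flavor of how such hardware imperfections are modeled on a simulator, here is a minimal sketch using Qiskit's Aer backend, assuming the qiskit-aer package is installed. The noise parameters are toy values loosely inspired by the Falcon figures quoted above, not a calibrated device model:

```python
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator
from qiskit_aer.noise import NoiseModel, ReadoutError, depolarizing_error

# Toy noise model: ~1% depolarizing error on CNOTs, ~3% readout error
noise = NoiseModel()
noise.add_all_qubit_quantum_error(depolarizing_error(0.01, 2), ["cx"])
noise.add_all_qubit_readout_error(ReadoutError([[0.97, 0.03], [0.03, 0.97]]))

# Bell-state circuit with measurements
qc = QuantumCircuit(2, 2)
qc.h(0)
qc.cx(0, 1)
qc.measure([0, 1], [0, 1])

sim = AerSimulator(noise_model=noise)
counts = sim.run(transpile(qc, sim), shots=4096).result().get_counts()
print(counts)  # mostly '00' and '11', plus noise-induced '01' and '10' reads
```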
Noise is such a significant issue in current quantum hardware that there is a strong push to find ways to perform "error mitigation" algorithmically, for example through redundant encoding or CNOT extrapolations [24, 25], while the Holy Grail of the field would be self-correcting logical qubits [26, 27], which can be achieved by combining hundreds of noisy physical qubits into one logical unit. In quantum simulators like Qiskit, there are options to numerically simulate the sources of noise of the hardware, which allows for better studies of error mitigation mechanisms.

4.1.4 Structure of Quantum Simulations

In an ideal setup, once quantum computers become large enough and noise is under control, a quantum Hamiltonian simulation would, in general, be structured into three parts: state preparation, real-time evolution and measurement. One would first choose a system of interest, for which a given observable needs to be measured but whose spectrum of states is not known beforehand. This is the case for large problems, where exact diagonalization is not feasible classically, or for states coming from interactions.

The first step is that of state preparation: starting from qubits that are all aligned to the |0⟩ state (the most common convention on hardware), one performs the right transformations to put them in the desired initial state of the system under consideration. This operation is not trivial, especially when no analytical or numerical data is available for the given state. In the case where the amplitudes of the wave function (projected onto the binary basis) are known, the controlled-rotation [28], Grover-oracle [29] or quantum-random-access-memory [30, 31] methods, among others, can be applied. When the amplitudes are not known, as in the case of many physics applications, and just the Hamiltonian is given, one has to rely on some dynamical algorithm for state preparation. It is important to note that a good algorithm for state preparation should not just be precise, in the sense of efficiently constructing the desired state, but also short, so as not to introduce noise into the rest of the simulation.

Once a state is prepared, quantum simulations can let it evolve in real time, possibly applying an interaction Hamiltonian different from the one used to prepare the initial state. These calculations are incredibly challenging on classical computers, as the sign problem prevents their effective numerical simulation.

The final but crucial component of a quantum simulation is obviously the measurement of observables. This operation is also not trivial to perform in QC, as measurements can only be done on single qubits, which collapses the information of the whole wave function. Some of the most widely used algorithms are the so-called hybrid quantum-classical algorithms, and the Variational Quantum Eigensolver in particular [32, 33], which rely on the variational principle to construct parametrized quantum states that are optimized with classical algorithms to obtain measurements. An important topic of research in measurements is that of finding optimal groupings of qubits to measure, which depend on the level of entanglement of the system and on the observable operator itself [34].

In Sections 4.3 to 4.5 we will present three state-preparation methods applied to the case of the Schwinger model: the Adiabatic State Preparation (ASP) algorithm, the Quantum Approximate Optimization Algorithm (QAOA) and the Rodeo Algorithm (RA).
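To make the three-phase structure concrete, the following is a schematic Qiskit sketch. The specific gates are placeholders chosen for brevity, not the algorithms of Sections 4.3 to 4.5, and the evolution step assumes a simple Z⊗Z interaction:

```python
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

qc = QuantumCircuit(2)

# (1) State preparation: some unitary that builds the initial state from |00>
qc.h(0)
qc.cx(0, 1)

# (2) Real-time evolution: one Trotter step of exp(-i t Z(x)Z) via RZZ;
#     Qiskit's rzz(theta) implements exp(-i theta/2 Z(x)Z), hence theta = 2t
t = 0.5
qc.rzz(2 * t, 0, 1)

# (3) Measurement: on hardware this is repeated over many shots;
#     on a simulator we can sample the final state vector directly
counts = Statevector.from_instruction(qc).sample_counts(shots=2048)
print(counts)
```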
We will compare these methods only in terms of their algorithm lengths and state-preparation efficiency, and not on secondary observables computed with quantum algorithms, because of the overhead the latter introduce.

4.2 The Schwinger Model with a θ-term

The Schwinger model is a theory of Quantum Electrodynamics (QED) in 1+1 dimensions; its symmetry group is U(1). First analyzed by Julian Schwinger in a series of papers in the early 1950s [35, 36], it was soon utilized as a toy model for understanding properties of more complicated theories, such as the relation between gauge invariance and the mass of the gauge bosons [37, 38]. Later it was utilized by Coleman to better understand confinement [39, 40]: in one spatial dimension the electric potential depends linearly on the distance between charged particles, as opposed to the inverse dependence found in three spatial dimensions. This leads to charged particles being confined into bound states, as is the case for quarks in QCD. The massless Schwinger model can be solved analytically in the case of periodic boundary conditions, which makes it a suitable tool for theoretical investigations [41, 42]. For massive fermions this is no longer possible, but some results can be obtained through mass perturbation theory [43–45]. One last peculiar feature of the Schwinger model is the set of topological effects generated by the coupling of the non-integrable phase to the chiral condensate, which leads to the well-known θ-vacuum [41, 42]. Some work has also been done on the massive Schwinger model with the addition of a θ-term in the Lagrangian, to understand the vacuum structure of the theory. In the presence of a topological θ-term its Lagrangian is typically written as:
\[
\mathcal{L} = -\frac{1}{4} F_{\mu\nu}F^{\mu\nu} + \frac{g\theta}{4\pi}\,\epsilon_{\mu\nu}F^{\mu\nu} + i\bar{\psi}\gamma^\mu(\partial_\mu + igA_\mu)\psi - m\bar{\psi}\psi, \tag{4.8}
\]
where F_{μν} = ∂_μA_ν − ∂_νA_μ is the field tensor, the A_μ are U(1) gauge fields, and ϵ_{μν} is the totally antisymmetric tensor. In 1+1 dimensions, the gamma matrices are γ⁰ = σ^z, γ¹ = iσ^y, and γ⁵ = γ⁰γ¹. The theory has three parameters: the gauge coupling g, the fermion mass m, and the θ angle.

Following prior works [46], we perform a chiral transformation of the fields, ψ → e^{iθγ⁵/2}ψ, ψ̄ → ψ̄e^{iθγ⁵/2}, and of the path-integral measure [47, 48] to arrive at an equivalent Lagrangian:
\[
\mathcal{L} = -\frac{1}{4} F_{\mu\nu}F^{\mu\nu} + i\bar{\psi}\gamma^\mu(\partial_\mu + igA_\mu)\psi - m\bar{\psi}e^{i\theta\gamma^5}\psi. \tag{4.9}
\]
Note that we have traded the explicit θ-term, with its direct relation to the gauge field, for a θ dependence in the fermion mass term. We choose the temporal gauge A⁰ = 0, and a standard Legendre transform then yields the Hamiltonian:
\[
H = \int dx \left[ -i\bar{\psi}\gamma^1(\partial_1 + igA_1)\psi + m\bar{\psi}e^{i\theta\gamma^5}\psi + \frac{1}{2}E^2 \right], \tag{4.10}
\]
where in one spatial dimension the electric field E = F^{10} = −Ȧ¹ has only one component and there is no magnetic field. More details about the Legendre transform can be found in the Appendix of this chapter. To satisfy gauge invariance, additional local constraints governing the interaction between matter and gauge fields must be imposed. These constraints, provided by Gauss's law, in the temporal gauge become:
\[
\partial_1 E(x) = g\,\bar{\psi}(x)\gamma^0\psi(x).
\]
This equation will become important when constructing a spin Hamiltonian for use on quantum computers.

4.2.1 Discretization on a Lattice

The continuum Hamiltonian of Equation (4.10) needs to be discretized in order to perform numerical simulations. To do this, we employ Kogut-Susskind staggered fermions [49, 50], following previous works found in [46, 51–53].
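Before carrying out the discretization, it is useful to expand the rotated mass term of Equation (4.9) once explicitly; using (γ⁵)² = 1, one finds
\[
e^{i\theta\gamma^5} = \cos\theta + i\gamma^5\sin\theta
\quad\Longrightarrow\quad
m\,\bar{\psi}\,e^{i\theta\gamma^5}\psi = m\cos\theta\,\bar{\psi}\psi + i\,m\sin\theta\,\bar{\psi}\gamma^5\psi,
\]
which is the origin of the separate cos θ (mass-like) and sin θ (hopping-like) terms appearing in the lattice Hamiltonian below.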
The fermion discretization that we use splits the upper ψ_u(x) and lower ψ_d(x) components of the Dirac fermions and places them on distinct sites of a lattice with n = 2x sites, where x counts the points in the spatial direction before the splitting; see Figure 4.2.

Figure 4.2: Schematic representation of the staggered discretization of the fermions at two neighboring lattice sites. A sub-lattice with half the spacing is introduced, and the upper and lower components of the field are split.

The field ψ(x) is then mapped onto a one-component field χ_n according to:
\[
\frac{\chi_n}{\sqrt{a}} \;\leftrightarrow\;
\begin{cases}
\psi_u(x) & \text{for } n \text{ even} \\
\psi_d(x) & \text{for } n \text{ odd}
\end{cases}. \tag{4.11}
\]
We also scale the gauge field by the coupling, relabeling agA¹(x) → ϕ_n and the electric field E(x) = −Ȧ¹(x) → gL_n, to be consistent with the notation of other works. The resulting Hamiltonian is then:
\[
H = -i \sum_{n=1}^{N-1} \left[ \frac{1}{2a} - (-1)^n \frac{m \sin\theta}{2} \right] \left( \chi_n^\dagger e^{i\phi_n} \chi_{n+1} - \text{h.c.} \right) + m\cos\theta \sum_{n=1}^{N} (-1)^n \chi_n^\dagger \chi_n + \frac{g^2 a}{2} \sum_{n=1}^{N-1} L_n^2. \tag{4.12}
\]
The field ϕ_n, the electric potential, lives on the lattice site n, while the field L_n lives on the link between sites, just like the U field when we discretized QCD. On the lattice, Gauss's law is not trivially derived by discretizing the continuum version ∂₁E(x) = gψ̄(x)γ⁰ψ(x), but one can define it in the strong-coupling limit by looking at the ground state of the theory, which corresponds to a "filled Dirac sea" condition. A more detailed discussion is found in [54]. The relevant result for Gauss's law is:
\[
L_n - L_{n-1} = \chi_n^\dagger \chi_n - \frac{1 - (-1)^n}{2}. \tag{4.13}
\]
For a given matter configuration, the gauge fields are now completely determined if one fixes open boundary conditions, i.e. fixes the field to 0 at the boundaries. The L_n field can therefore be reabsorbed into the definition of the fermion fields, as we will see later. It is important to note that this is only possible in one spatial dimension and cannot be generalized to higher ones. In practice, this will enable us to deal only with fermionic degrees of freedom when constructing quantum algorithms; but if the aim is to eventually simulate QCD on a quantum device, then the problem of finding an appropriate formulation for the gauge degrees of freedom needs to be addressed. Crucially, since the gauge field is bosonic, representing it on a fermionic spin chain is a non-trivial task; this is in sharp contrast with what happens in classical simulations, where bosons are simpler to deal with than fermions.

In deriving this Hamiltonian, some care has been put into expanding terms to higher orders in a and shifting some terms by one site, utilizing the properties of staggered fermions so as to only have terms that contain the field χ either at a single site or at neighboring sites; more details are found in the Appendix. These "hopping terms," which are symmetric with respect to the two fermion species, allow us to treat the upper and lower components of the original ψ field identically. This is done because we want to map the Schwinger model onto an Ising-like spin Hamiltonian with nearest-neighbor interactions, as we will see in the next section.
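The elimination of the gauge links via Equation (4.13) is simple enough to spell out in code. The following Python sketch uses a hypothetical helper name, links_from_matter, and assumes the open-boundary convention L_0 = 0 at the left edge:

```python
# For a given configuration of occupation numbers f_n = chi_n^dag chi_n,
# Gauss's law, Eq. (4.13), with L_0 = 0 fixes every link field recursively:
#     L_n = L_{n-1} + f_n - (1 - (-1)^n) / 2
def links_from_matter(occupations):
    L, links = 0, []
    for n, f in enumerate(occupations, start=1):
        L += f - (1 - (-1) ** n) // 2
        links.append(L)
    return links

# Example: the "filled Dirac sea" condition (all odd sites occupied)
# gives a vanishing electric field everywhere; for any charge-neutral
# configuration the field returns to zero at the right boundary.
vacuum = [1 if n % 2 == 1 else 0 for n in range(1, 9)]  # sites n = 1..8
print(links_from_matter(vacuum))  # [0, 0, 0, 0, 0, 0, 0, 0]
```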
4.2.2 The Spin Hamiltonian of the Schwinger Model

All Hamiltonians that need to be simulated on quantum computers eventually have to be mapped onto a spin chain, because of the fundamental structure of quantum devices. Luckily, there exists a very simple procedure to map a Hamiltonian formulated in second quantization, that is, in terms of creation and annihilation operators, onto a spin system while preserving all the necessary anti-symmetry properties: the Jordan-Wigner transformation [55]. This takes the operators of a fermion field, χ_n in our case, and maps them onto operators acting on a chain of spins, one per lattice site:
\[
\chi_n = \left( \prod_{l<n} -iZ_l \right) \frac{X_n - iY_n}{2}, \qquad
\chi_n^\dagger = \left( \prod_{l<n} iZ_l \right) \frac{X_n + iY_n}{2}. \tag{4.14}
\]
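That the transformed operators indeed satisfy the canonical anticommutation relations, {χ_n, χ_m†} = δ_nm, can be checked by brute force for a small chain. The following is a minimal NumPy sketch, assuming the conventions of Equation (4.14); chi and kron_all are hypothetical helper names:

```python
import numpy as np
from functools import reduce

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def kron_all(ops):
    return reduce(np.kron, ops)

def chi(n, N):
    """Jordan-Wigner chi_n on an N-site chain, Eq. (4.14): a string of
    (-i Z) on every site l < n, times (X_n - i Y_n)/2 on site n."""
    ops = [-1j * Z] * n + [(X - 1j * Y) / 2] + [I2] * (N - n - 1)
    return kron_all(ops)

N = 4  # small chain, 16-dimensional Hilbert space
for n in range(N):
    for m in range(N):
        anti = chi(n, N) @ chi(m, N).conj().T + chi(m, N).conj().T @ chi(n, N)
        expected = np.eye(2 ** N) if n == m else np.zeros((2 ** N, 2 ** N))
        assert np.allclose(anti, expected)  # {chi_n, chi_m^dag} = delta_nm
```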