MEASUREMENT OF ATMOSPHERIC MUON NEUTRINO DISAPPEARANCE WITH ICECUBE USING CONVOLUTIONAL NEURAL NETWORK RECONSTRUCTIONS

By

Jessie Micallef

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Physics – Doctor of Philosophy
Computational Mathematics, Science, and Engineering – Dual Degree

2022

ABSTRACT

MEASUREMENT OF ATMOSPHERIC MUON NEUTRINO DISAPPEARANCE WITH ICECUBE USING CONVOLUTIONAL NEURAL NETWORK RECONSTRUCTIONS

By

Jessie Micallef

Neutrinos are neutral, fundamental particles that oscillate, or change flavor, while they travel. This phenomenon means that neutrinos deviate from the Standard Model prediction that they are massless, creating an exciting window into physics exploration beyond the Standard Model. Understanding neutrino oscillation and constraining its parameters is crucial to furthering the understanding of these particles and how they fit, or do not fit, into the Standard Model. The IceCube Neutrino Observatory has been detecting neutrinos for more than 10 years, yielding a large sample of atmospheric neutrino data available for studying neutrino oscillations. Reconstructing the properties of the neutrino interactions, such as the neutrino's energy and direction, is key to constraining neutrino oscillation. A fast and robust reconstruction method was developed using convolutional neural networks (CNNs) and optimized to recover the parameters necessary both to reconstruct events and to isolate a pure atmospheric neutrino sample with the IceCube detector. This work compares the performance of this reconstruction to the likelihood-based reconstruction currently used in IceCube. An analysis of muon neutrino disappearance is then pursued using 9.28 years of neutrino data. The analysis shows competitive projected sensitivity, the ability to account for numerous systematics, and robust recovery of the physics parameters under statistical and systematic variations. While the 9.28 year sample is still under collaboration review, one year of the total data is used to perform a confirmatory study on the CNN reconstruction and sample. The oscillation parameter constraints from this one year analysis agree with past IceCube analyses and other neutrino experiments within one sigma. This opens the pathway to using the CNN reconstruction for future analyses studying low energy neutrinos in IceCube.

To all those who inspired me,
And to all those whom I hope to inspire

ACKNOWLEDGMENTS

For Professor Tyce DeYoung, who encouraged and supported my pursuit of a unique path in physics and computation. For Professors Claudio Kopper, Kendall Mahn, and Kirsten Tollefson, who always had their doors open and supported me and my work. For Dr. Brian Clark, who taught me what it means to be a postdoc, a great scientist, and a mentor. And who reminded me to breathe. For Dr. Shiqi Yu, who joined me on this adventure that is oscillation analysis and answered my many neutrino questions. For both ICER and the CMSE department, who maintained the high performance computing cluster and encouraged student access to it. This was one of the greatest unexpected perks of my graduate career and has shaped my research trajectory. For Dr. Devyn Cantu; it was the luckiest part of my graduate school experience to find such a kindred soul in the same research group. Thanks for showing me around my first collaboration meeting, graduating first, and generally showing me how grad school is done. For Dr.
Alison Peisker, who was my partner in grad school from the qualifier to TAing to research to pandemic walks. For Dr. Joao Pedro Athayde Marcondes de Andre and Dr. Joshua Hignight, who answered my many questions as I entered grad school research and were wonderfully supportive postdocs. For my IceCube collaborators, who provided an evolving community for me to grow into as a researcher. For Professor Myron Campbell and the KOTO research group, who shaped me into a curious, competent researcher ready to tackle the challenges of grad school research. And for inspiring me to climb mountains. For Laura Wood, my friend in so many adventures from comic cons to walking the Mackinac Bridge, who reminded me that life is what you make of it. For Claire Kopenhafer, who took on a whole conference with me and reminded me of the joys of random, epic hobbies. For my graduate peers and cohort, who answered so many of my questions outside of the classroom and created a supportive community of physicists. For the Grand Canonical Ensemble, i.e. physics choir, and all those in it, who brightened my days with music and friendship. For Strange Matter, Biggby, Koala Bakery and Cafe, and the Lansing Lugnuts for providing the best thesis-writing environments. For Allison Drechny, who listened to me rant about these strange little neutrino particles and the roller coaster ride that is research. And who never failed to make me laugh. For my parents, who support me in every way possible. I don't even know how to acknowledge all you do for me. For my sister and brother-in-law, who celebrated with me through my struggles and triumphs. Thanks for throwing me the biggest mid-thesis-writing party (i.e. your wedding). For my cat, who can solve any problem by crawling into my arms. For my family, who have always supported me, no matter what, no matter when. I love you all.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 NEUTRINO PROPERTIES AND OSCILLATION
1.1 Introduction
1.2 Neutrinos in the Standard Model
1.2.1 Neutrino Mass Hierarchy
1.2.2 Neutrino Oscillation in Vacuum
1.2.3 Neutrino Oscillation in Matter
1.3 Neutrino Interactions
1.3.1 Elastic and Quasi-Elastic Scattering
1.3.2 Resonant Scattering
1.3.3 Deep Inelastic Scattering
1.4 Atmospheric Neutrino Sources
1.4.1 Neutrino Flux
1.5 Oscillation Physics with Atmospheric Neutrinos
1.5.1 Expected Oscillation Signature
1.5.2 Status of Global Oscillation Measurements
1.5.3 Status of IceCube Oscillation Measurements
CHAPTER 2 NEUTRINO DETECTION TECHNIQUES
2.1 IceCube Neutrino Observatory
2.1.1 Geometry of the Main Detector
2.1.2 Digital Optical Module
2.1.3 DOM Efficiency
2.2 Neutrino Detection in Ice
2.2.1 Cherenkov Radiation
2.2.2 Detection Signatures in IceCube
2.2.2.1 Topology of Cascade-Like Events
2.2.2.2 Topology of Track-Like Events
2.3 Optical Properties of the Ice
2.3.1 Bulk Ice Properties
2.3.1.1 Depth Dependence
2.3.1.2 Azimuthal Dependence
2.3.2 Hole Ice Properties
CHAPTER 3 LOW ENERGY SIMULATION AND DATA SAMPLE PROCESSING
3.1 Monte Carlo Samples
3.1.1 Neural Network Training Samples
3.1.2 Analysis Sample
3.2 Monte Carlo Generation
3.2.1 Neutrino Events
3.2.2 Muon Events
3.2.3 Noise Hits
3.2.4 Noise Events
3.3 Propagating MC Photons and Detector Response
3.4 Data Processing
3.4.1 Level 2
3.4.2 Level 3
3.4.3 Level 4
3.4.4 Level 5
3.4.5 Level 6
CHAPTER 4 FAST LOW ENERGY RECONSTRUCTION USING CONVOLUTIONAL NEURAL NETWORKS
4.1 Architecture
4.2 Preparing the Training Samples
4.2.1 Data Agreement
4.2.2 Preparing Input Features
4.2.3 Sample Cleaning Differences between Training Sets
4.2.3.1 Neutrino vs. Noise BDT
4.2.4 Number of Pulses and DOMs Hit
4.2.4.1 Containment
4.2.5 Training Each CNN
4.2.5.1 Energy
4.2.5.2 Zenith Angle
4.2.5.3 Vertex
4.2.5.4 Track Classifier
4.2.5.5 Muon Background Classifier
4.3 Performance
4.3.1 Reconstruction Runtime
4.3.2 Energy
4.3.3 Zenith
4.3.4 Vertex
4.3.5 Track Classifier
4.3.6 Atmospheric Muon Classifier
4.4 Stability Against Detector Systematics
4.4.1 Reconstructing Different DOM Efficiencies
4.4.2 Reconstructing Different Hole Ice Parameterizations
4.4.3 Reconstructing Different Bulk Ice Properties
4.4.4 Conclusion of Robustness to Systematics
CHAPTER 5 FLERCNN ANALYSIS SAMPLE SELECTION
5.1 Resolution Motivated Cuts
5.1.1 Starting Vertex
5.1.2 Number of hit DOMs
5.1.3 Energy Range
5.2 Data/MC Agreement Motivated Cuts
5.2.1 Identifying Atmospheric Muons
5.2.1.1 Straight Veto Cut for Muons
5.2.1.2 Muon BDT
5.2.2 Direct Pulses
5.2.3 Cosine Zenith Range
5.3 Application of Cuts
5.4 Analysis Distribution Template
5.4.1 PID
5.4.2 Binning
5.4.3 Muon Sample KDE Template
5.4.4 Masking
CHAPTER 6 ANALYSIS OF MUON NEUTRINO DISAPPEARANCE WITH 9.28 YEARS OF DATA
6.1 Overview of Analysis Procedure
6.2 Test Statistic
6.3 Verifying Data Consistency
6.3.1 Data Quality Check Between Years
6.3.2 Data/MC Agreement
6.4 Systematic Uncertainties
6.4.1 Atmospheric Flux
6.4.1.1 Cosmic Ray Uncertainty
6.4.1.2 Uncertainty on Meson Production
6.4.1.3 Normalization Terms
6.4.2 Oscillation Modeling and Parameters
6.4.3 Cross Sections
6.4.4 Detector Systematics
6.4.4.1 Hypersurfaces
6.4.5 Assessment of Systematic Parameters' Impact
6.4.6 Summary of All Systematic Parameters
6.5 Verifying the Analysis Pipeline
6.5.1 Asimov Inject-Recover Test
6.5.2 Projected Sensitivity To Oscillation Parameters
6.5.3 Parameter Ensemble Test
6.5.4 Ensemble Test
6.6 Blind Fit
6.6.1 Blind Fit Procedure
6.6.2 Blind Fit Result
6.6.3 Potential Contributions to the Blind Fit Result
CHAPTER 7 ANALYSIS OF MUON NEUTRINO DISAPPEARANCE WITH 0.92 YEARS OF DATA
7.1 One Year Sample
7.2 Changes in Analysis with One Year Sample
7.2.1 Adjusting the Test Statistic
7.2.2 Summary of Changes
7.3 Adjusted Expectations with One Year of Livetime
7.3.1 Sensitivity Projection for One Year
7.3.2 Impact of Systematics on One Year Analysis
7.3.3 Pre-Fit Data/MC Agreement for One Year
7.4 Blind Fit
7.4.1 Data/MC Agreement using Best Fit Values
7.5 Best Fit Result
7.5.1 Best Fit Values and Contours
CHAPTER 8 CONCLUSIONS
8.1 Potential for Future Work on FLERCNN
8.1.1 Optimize Current Architecture
8.1.2 Expand FLERCNN Reconstruction Output
8.1.3 Broaden FLERCNN Applications
8.2 Potential for Future Work on IceCube Neutrino Oscillations
BIBLIOGRAPHY
LIST OF TABLES

Table 3.1: Simulation settings used to generate neutrinos for the analysis sample.
Table 4.1: Describes the loss function and activation used for each network.
Table 4.2: Describes the transformations that the input features undergo during preprocessing.
Table 4.3: Describes the additional training sample cuts. For ease of viewing, cells highlighted in green indicate that the cut in that column was applied to the training sample for the network in that row.
Table 4.4: Describes the training samples for each network, with more details expanded in the following sections.
Table 4.5: Describes the number of files used for training and the model chosen in terms of the raw training pass and epoch.
Table 4.6: Describes the training samples' learning rates and their changes.
Table 4.7: FLERCNN and Retro Reconstruction sample cut comparison after Level 5. All analysis cuts that apply to the FLERCNN sample are described in more detail in Sections 5.1 and 5.2.
Table 4.8: Runtime and implications of applying the different reconstructions to the MC sample.
Table 5.1: Describes cuts on the neutrino MC analysis sample and their individual impact when added to the final sample.
Table 5.2: Describes the final rates, including re-weighting with interpolated hypersurfaces and muon KDE smoothing described in Sections 6.4.4.1 and 5.4.3, predicted by the MC simulation.
Table 5.3: Describes the analysis binning using the CNN reconstructed and classified variables.
Table 6.1: Settings for the systematic sets used to generate the hypersurfaces, with each row describing a new set. Cells highlighted in green indicate settings that vary from their baseline values (row 1). Rows highlighted in gray indicate systematic sets that are currently under study with potential mismatched simulation (Section 7.2).
Table 6.2: Systematic parameters, including those fixed during the fit, along with the priors and bounds on the parameters that are free during the fit. Parameters that are fixed have their row highlighted in gray.
Table 6.3: Stop conditions for the blind fit plan.
Table 7.1: Total counts with 0.92 years of livetime predicted by the updated MC simulation.
Table 7.2: Changes made for the one year analysis to handle updates and reduced statistics.
Table 7.3: Agreement between the data and post-fit MC using the best fit values for the physics and systematic parameters, quantified by the 𝑝-value pre- vs. post-fit.
Table 7.4: Best fit values for the physics and systematic parameters using one year of livetime with the FLERCNN sample. The pull describes the difference in 𝜎 of the parameters with Gaussian priors.

LIST OF FIGURES

Figure 1.1: Standard Model of particle physics, including the particles and their mediators.
The lepton sector (green) and the bosons (red) are the critical regions for this work.
Figure 1.2: The distances and energies necessary to probe the two mass squared differences (Δ𝑚²₂₁ in the dashed line and Δ𝑚²₃₂ in the dotted-dashed line). Plot from [1].
Figure 1.3: Feynman diagrams for the charged current (left) and neutral current (right) interactions. The ℓ represents the leptons (𝑒, 𝜇, or 𝜏). This particular example is of the deep inelastic scattering mechanism (Section 1.3.3).
Figure 1.4: Contributions of each of the interaction cross sections as a function of energy (plots from [2]). Neutrinos are in the left plot and anti-neutrinos in the right plot. The general trend is the same for both neutrino and anti-neutrino, with QE dominating at the lowest energies (< 2 GeV), resonance at a few GeV, and DIS at the 10 GeV-scales.
Figure 1.5: Description of atmospheric neutrino sources from cosmic ray interactions.
Figure 1.6: The neutrino flux (Φ𝜈) with all directions averaged as a function of energy (left plot), ratio of the flux for the different seasons as a function of energy (middle plot), and the neutrino flux as a function of cos 𝜃𝑧 (right plot) (plots from [3]).
Figure 1.7: Simulated probability of 𝜈𝑒, 𝜈𝜇, 𝜈𝜏 (left to right) → 𝜈𝜇 at various distances (in cosine zenith) and energies assuming normal ordering. Notice that the matter effects are visible around cos 𝜃𝑧 = −0.8 in all three plots, in the region of 2-16 GeV for the first plot and 4-8 GeV in the middle and right plots.
Figure 1.8: Result contours for global experimental results, including the IceCube 7.5 year result and the likelihood-based 9.28 year analysis' sensitivity projection [4, 5, 6, 7, 8, 9].
Figure 2.1: Top and side view of the IceCube detector (modified from [10]). The left side plot shows the variation in the scattering and absorption as a function of depth (Section 2.3.1). Two footprints are highlighted in the top view: the DeepCore array circled in a dotted blue line and the strings used to train the CNN reconstruction (Chapter 4) circled in purple.
Figure 2.2: Schematic of the digital optical module (DOM).
Figure 2.3: Different types of interactions, their secondaries, and the expected signatures for incident neutrino energies < 100 GeV (modified from [11]).
Figure 2.4: Scattering and absorption coefficients (left scale in blue line) and lengths (right scale in red) as a function of depth. Smaller values of the coefficients indicate cleaner ice [12].
Figure 2.5: The charge difference between MC and data measured as a function of the azimuth angle, showing the impact of the tilt and flow anisotropy [13].
Figure 3.1: Flow chart describing the processing for simulation (left) and data (right).
Figure 4.1: Top view of the IceCube detector, with DeepCore strings in red, IceCube strings in gray, and a blue highlight on all strings used for the low-energy CNN.
Figure 4.2: Detailed network architecture for FLERCNN with the legend for colored layers on the right.
The notation at the top indicates that for each event, there are 8 DeepCore strings with 60 DOMs per string and 5 summary variables per DOM fed into one branch of the network. The other branch of the network sees the 19 IceCube strings with their 60 DOMs per string and 5 summary variables per DOM.
Figure 4.3: Shows the rate of the default discriminator in blue (from the SPE templates used to generate FLERCNN data and MC) compared to data as a function of charge. Agreement is seen above 0.25 PE [14].
Figure 4.4: Energy training sample and loss progression.
Figure 4.5: Zenith training sample and loss progression.
Figure 4.6: X position training sample and Y position training sample.
Figure 4.7: Z training sample and vertex loss progression.
Figure 4.8: PID training sample and loss progression.
Figure 4.9: True cosine zenith and energy distribution used to train the muon classifier. The true energy for the muon simulation is the energy of the muon at generation, which is why it is different than the energy of the neutrino, which is shown at the interaction point. The 𝜈𝜇 and 𝜈𝑒 histograms are stacked, in comparison with the muon histogram (not stacked).
Figure 4.10: Loss vs. epoch progression as the muon classification network trained, using the binary crossentropy loss function.
Figure 4.11: Testing distributions for the 𝜈𝜇 CC and 𝜈𝑒 CC events. The energy and zenith cuts are not applied in this case, since they will be applied based on the individual reconstructions' outputs during testing. All other cuts applied are the CNN Cuts described in Table 5.1.
Figure 4.12: Resolution shown for true energy for the 𝜈𝜇 CC and 𝜈𝑒 CC events. With the invisible energy carried off by the outgoing neutrinos from NC interactions and some 𝜈𝜏 CC interactions, the resolution is shown vs. deposited energy for these selections.
Figure 4.13: Resolution shown for reconstructed cosine zenith for the 𝜈𝜇 CC tracks and remaining cascade events.
Figure 4.14: Resolution shown for reconstructed radius from the center string (string 36) of the starting vertex for the 𝜈𝜇 CC tracks and remaining cascade events.
Figure 4.15: Resolution shown for reconstructed z position of the starting vertex for the 𝜈𝜇 CC tracks and remaining cascade events.
Figure 4.16: Confusion matrix for vertex reconstruction, exploring efficiency of cutting at 200 m, proposed for the oscillation analysis.
Figure 4.17: Performance of Track classifier with binary classification histogram on the left and the ROC on the right.
Figure 4.18: Performance of muon classifier with binary classification histogram on the left and the ROC on the right.
Figure 4.19: Bias and spread for the FLERCNN performance on various detector systematic sets for DOM efficiency.
The FLERCNN reconstruction is shown in the solid lines and the likelihood-based reconstruction (Retro) is shown in the dotted lines.
Figure 4.20: Bias and spread for the FLERCNN performance on various detector systematic sets for hole ice. The FLERCNN reconstruction is shown in the solid lines and the likelihood-based reconstruction (Retro) is shown in the dotted lines.
Figure 4.21: Bias and spread for the FLERCNN performance on various detector systematic sets for bulk ice. The FLERCNN reconstruction is shown in the solid lines and the likelihood-based reconstruction (Retro) is shown in the dotted lines.
Figure 5.1: The data/MC disagreement for the number of hits in the top 15 layers of IceCube DOMs. The left plot shows before the cut (keeping events with < 0.5 hits) and the right plot shows the adjusted agreement after these events are removed.
Figure 5.2: The data/MC disagreement for the number of hits in the outermost IceCube strings. The left plot shows before the cut (keeping events with < 7.5 hits) and the right plot shows the adjusted agreement after these events are removed.
Figure 5.3: Performance of the Muon BDT, with the BDT predicted neutrino probability on the left and the rate with various proposed threshold cuts on the right (plots created by Dr. Shiqi Yu).
Figure 5.4: Number of DOMs hit with direct pulses, identified using the SANTA algorithm. The left plot shows before the cut (keeping events with < 3 direct hits) and the right plot shows the adjusted agreement after these events are removed. The MC scaling to the data means that removing the mismatched bin improves the overall normalization for the data/MC agreement.
Figure 5.5: Rates of the different selections compared from the earlier levels to the final level FLERCNN sample after analysis cuts, including re-weighting with interpolated hypersurfaces and muon KDE smoothing described in Sections 6.4.4.1 and 5.4.3.
Figure 5.6: Result of the CNN track-like classifier on the nominal MC, binned more finely in the top plot and binned at the proposed 3-bin boundaries for analysis in the bottom plot. Note that the neutrino (and muon) flavors are all stacked.
Figure 5.7: Percentages for each flavor's fraction of events located in that PID bin as a function of energy. For example, the 𝜈𝜇 CC sample shows that 80% of the 𝜈𝜇 CC events > 60 GeV are classified as "tracks" with the chosen PID binning.
Figure 5.8: Composition of each PID bin as a function of true energy, with the relative percent of each flavor's contribution to the events in each PID bin.
Figure 5.9: Number of expected events in each bin over the given 9.28 years of livetime, with neutrino weights adjusted by interpolated hypersurfaces to account for detector systematics and the expected muon distribution smoothed by KDE.
Figure 5.10: Muon distribution in the analysis binning before the KDE smoothing (left column), with sharp features due to low statistics, and after KDE smoothing is applied (middle column), along with the pull in each bin (right column). The gray bins are where 𝜎hist = 0.
Figure 5.11: Percentage of muons expected in the sample per analysis bin after muon KDE smoothing.
Figure 5.12: Analysis binning after masking, with the masked bins shown in white.
Figure 6.1: Energy distribution compared between years of data.
Figure 6.2: Zenith angle distribution compared between years of data.
Figure 6.3: PID distribution compared between years of data.
Figure 6.4: Muon BDT distribution compared between years of data.
Figure 6.5: Rates compared between years of data, split up by month.
Figure 6.6: Data/MC agreement for reconstructed energy and zenith angle.
Figure 6.7: Data/MC agreement for PID and muon BDT classifiers. Note that the muon BDT classifier has a cut applied at 0.8, which accounts for the behavior from 0.4-0.8.
Figure 6.8: Data/MC agreement for starting vertex z and 𝜌36 reconstructed variables (radial position relative to the center IceCube string #36).
Figure 6.9: Barr parameter labels, chart from [15].
Figure 6.10: Example of a hypersurface fit in one analysis bin projected on the DOM efficiency axis, using the discrete sets to create a continuous fit (plot from [8]). The lighter data points indicate systematic sets that have additional detector parameters pulled in addition to the DOM efficiency. The darker data points are systematic sets with only DOM efficiency changed.
Figure 6.11: Relative effect of each systematic parameter on the analysis, determined by pulling the parameter from nominal to generate pseudo-data but fixing it to nominal when fitting. Bars in orange are systematic parameters that are free for the proposed analysis fit, while the bars in purple are proposed to be fixed.
Figure 6.12: Scan over physics parameters showing the injected values for the pseudo-data in the orange dot, the fitted value in the purple cross, and the fitter starting point in the red plus (the latter is the same for all scans). All fitted physics parameters recover their injected truth parameters.
Figure 6.13: Sensitivity of the FLERCNN and LLH-based analyses at the verification sample best fit point, compared to global results.
Figure 6.14: Quantifying the ability of the fitter to recover systematic parameters injected at various off-nominal values, with a perfect recovery on the diagonal 1:1 gray line and the results for various fits shown in the dots. The color of the dots corresponds to the fit's total −LLH result, where a smaller value (purple) of −LLH indicates a better fit than a larger value (yellow). This test is run assuming normal ordering of the neutrino masses.
Figure 6.15: Fit results for all systematic and physics parameters after 500 ensemble test trials, assuming normal ordering. Shown is the distribution of fitted values for each parameter, including the distribution's median value, which should ideally align with the nominal value, and the distribution's 1𝜎, which should be within the range of the prior.
Figure 6.16: Fit results for all systematic and physics parameters after 500 ensemble test trials, assuming inverted ordering. Shown is the distribution of fitted values for each parameter, including the distribution's median value, which should ideally align with the nominal value, and the distribution's 1𝜎, which should be within the range of the prior.
Figure 6.17: Systematic pulls for the nine year FLERCNN sample blind fit shown in terms of 𝜎 for parameters with a Gaussian prior. None of the parameters hit their stop condition, with all Gaussian pulls under 1𝜎.
Figure 6.18: Pulls in 𝜎 for each of the analysis bins between the data and the MC at the fitted physics and systematic parameters for the nine year FLERCNN sample. None of the bins hit the stop conditions for the pull, with all bins pulling less than 3𝜎.
Figure 6.19: Test statistic result for the nine years of data fit shown in the purple line, with a distribution of ensemble test TS results in yellow showing the expected TS spread for 499 trials at the data best fit point. The data TS falls within the ensemble distribution of TS results with a p-value of 0.6%, which hits the stop condition requiring it to be > 5%.
Figure 7.1: Analysis binning after masking, with the masked bins shown in white, for the one year of livetime.
Figure 7.2: Projected sensitivity for the one year analysis (red) vs. the nine year analysis (black), where the expected contour is wider with less statistics.
Figure 7.3: The new systematic mismodeling plot is on the left for the one year sample, with comparison to the right plot (from Figure 6.11) for the nine year sample.
Figure 7.4: Data/MC pre-fit agreement for one year of livetime with the reconstructed energy (left) and reconstructed zenith angle (right).
Figure 7.5: Data/MC pre-fit agreement for one year of livetime with the CNN classified PID (left) and BDT Muon classification (right).
Figure 7.6: Data/MC pre-fit agreement for one year of livetime with the starting vertex reconstructed 𝜌36 (left) and reconstructed z positions (right).
Figure 7.7: Systematic pulls for the one year blind fit shown in terms of 𝜎 for parameters with a Gaussian prior.
Figure 7.8: Pulls in 𝜎 for each of the analysis bins between the data and the MC at the fitted physics and systematic parameters.
Figure 7.9: Test statistic result for the data fit shown in the purple line, with a distribution of ensemble test TS results in yellow showing the expected TS spread for 499 trials at the data best fit point. The data TS falls within the ensemble distribution of TS results with a p-value of 31.7%.
Figure 7.10: Post-fit data/MC agreement for the energy (left) and zenith angle (right).
Figure 7.11: Post-fit data/MC agreement for the PID score (left) and background muon BDT score (right).
Figure 7.12: Post-fit data/MC agreement for the 𝜌36 (left) and 𝑧 (right).
Figure 7.13: Result of the FLERCNN one year fit shown on the explored range (range of the number line plot), Gaussian prior if relevant (blue shaded region), nominal value (green dashed line), stop condition boundary (orange lines), and fitted value (red cross). For historical reasons, the analysis software fits for Δ𝑚²₃₁, solving for Δ𝑚²₃₂ = Δ𝑚²₃₁ − Δ𝑚²₂₁.
Figure 7.14: Result of the one year analysis, with the 90% contour in red, the 1𝜎 contour in blue, and the best fit point at the black X. The 1D projections for the physics parameters are shown on top of and to the right of the main plot, with the 𝜒²mod value shown for each physics parameter.
Figure 7.15: Result FLERCNN one year contours compared to other IceCube 𝜈𝜇 disappearance analyses (left) [16, 17, 8] and global experimental results (right) [4, 5, 6, 7].
Figure 7.16: Projected sensitivity of the one year analysis using the one year result BFP.

CHAPTER 1

NEUTRINO PROPERTIES AND OSCILLATION

1.1 Introduction

Particle physics uses the smallest, most fundamental building blocks to answer the universe's biggest questions. While the Standard Model of particle physics lays the foundation for fundamental interactions, several discrepancies in the neutrino sector, such as the existence of neutrino masses and flavor mixing, leave open critical questions. Thus, neutrinos provide a rich opportunity to explore the Standard Model and to probe many of the open physics questions. Constraining the neutrino oscillation parameters gives us a better understanding of the remaining neutrino mysteries. The IceCube Neutrino Observatory offers a large atmospheric neutrino sample with which to perform an analysis measuring the neutrino oscillation phenomenon, which will be explored in this work.

As data sets in neutrino physics get larger, there is a need for computational methods that can handle increased sample complexity and quickly process the results. Machine learning has been at the forefront of computational advances in the 21st century and has shown promising results when applied to other neutrino and particle physics experiments [18, 19, 20]. IceCube's data and MC samples already exceed 100 million events, making fast and accurate computational solutions vital. This work will describe a series of convolutional neural networks developed expressly for reconstruction of 10 GeV-scale atmospheric neutrinos in the IceCube detector. This new reconstruction is then applied to a muon neutrino disappearance analysis to constrain the oscillation parameters and place these results in the broader context of other neutrino oscillation experiments.

1.2 Neutrinos in the Standard Model

Neutrinos are fundamental, spin-1/2 particles that are part of the lepton sector in the Standard Model (Figure 1.1). The neutrinos are neutral, but each corresponds to a charged lepton partner, so that there are three known neutrino flavors: electron neutrino (𝜈𝑒), muon neutrino (𝜈𝜇), and tau neutrino (𝜈𝜏). The Standard Model predicts that neutrinos are massless, but because neutrinos have been observed to oscillate [21, 22], at least two of the neutrino states must have non-zero mass.

Figure 1.1: Standard Model of particle physics, including the particles and their mediators.
The lepton sector (green) and the bosons (red) are the critical regions for this work.

1.2.1 Neutrino Mass Hierarchy

Each of the neutrino flavor states (𝜈𝛼, 𝛼 = 𝑒, 𝜇, 𝜏) is described as a linearly independent superposition of the mass states (𝜈ᵢ, 𝑖 = 1, 2, 3). The absolute masses are still the subject of experimental measurement, with the sum of the three masses estimated to be < 1 eV from measurements of beta decay, supernova, and atmospheric sources [23, 24, 25]. While the sign of the difference between the 𝜈₁ and 𝜈₂ mass states is known (𝑚𝜈₂ > 𝑚𝜈₁), the sign of the mass difference involving 𝜈₃ is not yet known [23]. The case where 𝑚𝜈₃ > 𝑚𝜈₂ > 𝑚𝜈₁ is called the normal ordering. The case where 𝑚𝜈₂ > 𝑚𝜈₁ > 𝑚𝜈₃ is called the inverted ordering.

Neutrinos are generated and interact in their flavor states, but propagate in their mass states, which can lead to neutrino oscillation, or flavor changing, between the neutrino's generation and interaction stages. Sections 1.2.2 and 1.2.3 discuss the factors that influence the probability of oscillations. Equation 1.1 shows the mathematical representation of the flavor states written in terms of the mass eigenstates.

|\nu_\alpha\rangle = \sum_i U_{\alpha i} |\nu_i\rangle    (1.1)

\begin{pmatrix} \nu_e \\ \nu_\mu \\ \nu_\tau \end{pmatrix} = U_{\mathrm{PMNS}} \begin{pmatrix} \nu_1 \\ \nu_2 \\ \nu_3 \end{pmatrix}    (1.2)

\begin{pmatrix} \nu_e \\ \nu_\mu \\ \nu_\tau \end{pmatrix} = \begin{pmatrix} U_{e1} & U_{e2} & U_{e3} \\ U_{\mu 1} & U_{\mu 2} & U_{\mu 3} \\ U_{\tau 1} & U_{\tau 2} & U_{\tau 3} \end{pmatrix} \begin{pmatrix} \nu_1 \\ \nu_2 \\ \nu_3 \end{pmatrix}    (1.3)

The unitary matrix U is called the Pontecorvo-Maki-Nakagawa-Sakata (PMNS) mixing matrix and describes the mixing between the neutrino mass and flavor states [26, 27].

U = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_{23} & \sin\theta_{23} \\ 0 & -\sin\theta_{23} & \cos\theta_{23} \end{pmatrix} \begin{pmatrix} \cos\theta_{13} & 0 & e^{-i\delta_{\mathrm{CP}}}\sin\theta_{13} \\ 0 & 1 & 0 \\ -e^{i\delta_{\mathrm{CP}}}\sin\theta_{13} & 0 & \cos\theta_{13} \end{pmatrix} \begin{pmatrix} \cos\theta_{12} & \sin\theta_{12} & 0 \\ -\sin\theta_{12} & \cos\theta_{12} & 0 \\ 0 & 0 & 1 \end{pmatrix}    (1.4)

where the 𝜃 terms are the mixing angles and 𝛿CP is the Charge-Parity (CP) violating phase. Each matrix in Equation 1.4 is typically probed experimentally using neutrinos from a different source: atmospheric for 𝜃₂₃, reactor for 𝜃₁₃, and solar for 𝜃₁₂, respectively. This is due to the values of the PMNS matrix and the mass differences between the neutrinos, which nature has fortuitously given us such that the oscillations between the 𝜈₂ and 𝜈₃ states essentially mix fully first (on shorter distance scales) before engaging in mixing with the 𝜈₁ state. This is not necessarily motivated by any theory, but measurements have indicated that this is the case.
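To make the structure of Equation 1.4 concrete, a minimal numerical sketch is shown below. This is illustrative Python, not part of the analysis software; the function name and the example angle values are assumptions for illustration, with the angles set near global-fit-scale values in radians.

```python
import numpy as np

def pmns_matrix(theta12, theta13, theta23, delta_cp):
    """Build the PMNS matrix as the product of the three rotations in Equation 1.4."""
    c12, s12 = np.cos(theta12), np.sin(theta12)
    c13, s13 = np.cos(theta13), np.sin(theta13)
    c23, s23 = np.cos(theta23), np.sin(theta23)
    # Atmospheric (2-3) sector
    r23 = np.array([[1, 0, 0], [0, c23, s23], [0, -s23, c23]], dtype=complex)
    # Reactor (1-3) sector, carrying the CP-violating phase
    r13 = np.array([[c13, 0, s13 * np.exp(-1j * delta_cp)],
                    [0, 1, 0],
                    [-s13 * np.exp(1j * delta_cp), 0, c13]], dtype=complex)
    # Solar (1-2) sector
    r12 = np.array([[c12, s12, 0], [-s12, c12, 0], [0, 0, 1]], dtype=complex)
    return r23 @ r13 @ r12

# Example angles (radians), near global-fit-scale values
U = pmns_matrix(theta12=0.59, theta13=0.15, theta23=0.78, delta_cp=0.0)
assert np.allclose(U @ U.conj().T, np.eye(3))  # unitarity: U U^dagger = identity
```

Because U is built as a product of unitary rotations, the unitarity check passes by construction; experimentally, each rotation's angle is constrained by the source classes listed above.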
For the analysis in this work, energies from 5-100 GeV are considered, and the distances atmospheric neutrinos travel range from 20-12,500 km, depending on whether the neutrino is generated in the atmosphere directly above the detector at the South Pole or generated at the North Pole and travels through the Earth's core to the detector. Figure 1.2 shows the regions of energy (𝐸) and distance (𝐿) that different neutrino experiments probe, and therefore which mass squared differences (such as Δ𝑚²₃₂ = 𝑚²₃ − 𝑚²₂) they are sensitive to. IceCube's blue shaded region intersects with the dotted-dashed Δ𝑚²₃₂ line in Figure 1.2, indicating that it is sensitive to measuring the atmospheric Δ𝑚²₃₂ term. How the neutrino's energy, distance travelled, and mass squared difference factor into the probability of oscillation is explored in more detail in Section 1.2.2.

Figure 1.2: The distances and energies necessary to probe the two mass squared differences (Δ𝑚²₂₁ in the dashed line and Δ𝑚²₃₂ in the dotted-dashed line). Plot from [1].

This work studies atmospheric neutrino oscillation using the IceCube detector, and thus will focus on the 𝜃₂₃ mixing angle. Another experiment that probes the 𝜃₂₃ mixing angle using atmospheric neutrinos is Super-Kamiokande (or Super-K), shown at the bottom of the green region in Figure 1.2 [6]. Accelerator experiments at certain energies and distances are also sensitive to 𝜃₂₃, so that experiments like T2K, NOvA, and MINOS+ will be used for comparison (Section 1.5.2); these are also denoted in the green region in Figure 1.2, at smaller energies and distances than IceCube probes [5, 4, 7].

1.2.2 Neutrino Oscillation in Vacuum

As the neutrino propagates, its evolution is described using the mass eigenstates:

|\nu(t)\rangle = \sum_i U_{\alpha i} e^{-iE_i t} |\nu_i\rangle    (1.5)

where the energy is E_i = \sqrt{\vec{p}^{\,2} + m_i^2}, with \vec{p} the momentum and m_i the mass. We then look at the probability of observing the neutrino in each of the flavor states:

P_{\nu_\alpha \to \nu_\beta}(t) = \left| \langle \nu_\beta | \nu_\alpha(t) \rangle \right|^2    (1.6)
= \left| \sum_i U_{\beta i} U^*_{\alpha i} e^{-iE_i t} \right|^2    (1.7)
= \sum_{i,j} U_{\beta i} U^*_{\alpha i} U^*_{\beta j} U_{\alpha j} \, e^{-i(E_i - E_j)t}    (1.8)

Some approximations can be made because we know the neutrino masses are small and we are in the relativistic regime, so that E \gg m_i and t \approx L, where L is the distance the neutrino has travelled [28]:

E_i \approx E + \frac{m_i^2}{2E}    (1.9)

Therefore, to find E_i − E_j, we can substitute in this approximation so that

E_i - E_j \approx E + \frac{m_i^2}{2E} - E - \frac{m_j^2}{2E}    (1.10)
\approx \frac{m_i^2 - m_j^2}{2E}    (1.11)
\approx \frac{\Delta m_{ij}^2}{2E}    (1.12)

where Δ𝑚²ᵢⱼ is referred to as the mass squared splitting. Putting this back into the probability:

P_{\nu_\alpha \to \nu_\beta}(E, L) = \sum_{i,j} U_{\beta i} U^*_{\alpha i} U^*_{\beta j} U_{\alpha j} \, e^{-i \Delta m_{ij}^2 L / 2E}    (1.13)

Euler's formula can be used to expand the exponential in Equation 1.13, along with the unitary nature of U, so that:

P_{\nu_\alpha \to \nu_\beta}(E, L) = \delta_{\alpha\beta} - 4 \sum_{i<j} \mathrm{Re}\left( U_{\beta i} U^*_{\alpha i} U^*_{\beta j} U_{\alpha j} \right) \sin^2\left( \frac{\Delta m_{ij}^2 L}{4E} \right) + 2 \sum_{i<j} \mathrm{Im}\left( U_{\beta i} U^*_{\alpha i} U^*_{\beta j} U_{\alpha j} \right) \sin\left( \frac{\Delta m_{ij}^2 L}{2E} \right)    (1.14)

Note that this is specific to neutrinos, with the imaginary term subtracted for the antineutrino case. The oscillation probability relies on the fundamental properties of the mixing angles from the PMNS matrix elements, which determine the amplitude of the oscillations, and the mass squared differences of the mass states, which determine the frequency of the oscillations. The probability is dependent on the energy of the neutrino (𝐸) and its distance traveled (𝐿), which can be measured experimentally. In total, there are seven free terms that govern neutrino oscillations: 𝜃₂₃, 𝜃₁₃, 𝜃₁₂, Δ𝑚²₃₁, Δ𝑚²₃₂, Δ𝑚²₂₁, and 𝛿CP.

A simplification of this is to consider the two neutrino approximation, where there are only two flavors (𝛼 = 𝑒, 𝜇) and mass states (𝑖 = 1, 2). In this case, Δ𝑚² accounts for the only two mass states (𝑚²₁ − 𝑚²₂) and U becomes the 2x2 matrix described as

U = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}    (1.15)

Thus, the two flavor approximation for the oscillation probability simplifies to

P_{\nu_\alpha \to \nu_\beta}(E, L) = \sin^2(2\theta) \sin^2\left( \frac{\Delta m^2 L}{4E} \right)    (1.16)

From this simplified view in Equation 1.16 it is clearer to see that to measure the mixing angle (𝜃) and the mass squared difference (Δ𝑚²), one needs to experimentally determine the neutrino's flavor (both 𝛼 and 𝛽), energy (𝐸), and distance travelled from generation to interaction (𝐿). For atmospheric neutrino oscillation measurements, this means understanding the expected neutrino flux per neutrino flavor to predict 𝛼 (Section 1.4). For 𝛽, 𝐸, and 𝐿, these will be determined in the detector itself using a given reconstruction method (Chapter 4).
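As a self-contained worked example of Equations 1.13 and 1.16, the sketch below (illustrative Python, not the analysis code) evaluates the general amplitude-sum probability and cross-checks it against the two flavor formula. The factor 1.267 is the standard unit conversion turning Δ𝑚² in eV², 𝐿 in km, and 𝐸 in GeV into the dimensionless phase Δ𝑚²𝐿/4𝐸; the example parameter values are illustrative.

```python
import numpy as np

KM_EV2_PER_GEV = 1.267  # Dm2 [eV^2] * L [km] / E [GeV] -> phase Dm2*L/(4E)

def osc_probability(alpha, beta, U, m2_ev2, L_km, E_GeV):
    """General vacuum oscillation probability (Equation 1.13), computed via the
    amplitude sum over mass states; works for any number of flavors."""
    # Each mass state accumulates the phase m_i^2 L / (2E)
    phases = 2.0 * KM_EV2_PER_GEV * np.asarray(m2_ev2) * L_km / E_GeV
    amp = np.sum(U[beta, :] * np.conj(U[alpha, :]) * np.exp(-1j * phases))
    return np.abs(amp) ** 2

def two_flavor_transition(theta, dm2_ev2, L_km, E_GeV):
    """Two flavor approximation (Equation 1.16)."""
    return np.sin(2 * theta) ** 2 * np.sin(KM_EV2_PER_GEV * dm2_ev2 * L_km / E_GeV) ** 2

# Cross-check against the 2x2 mixing matrix of Equation 1.15
theta, dm2 = 0.78, 2.5e-3            # example values: radians, eV^2
U2 = np.array([[np.cos(theta), np.sin(theta)],
               [-np.sin(theta), np.cos(theta)]], dtype=complex)
L, E = 12700.0, 25.0                 # an up-going atmospheric neutrino: km, GeV
assert np.isclose(osc_probability(0, 1, U2, [0.0, dm2], L, E),
                  two_flavor_transition(theta, dm2, L, E))
```

At these example values the transition probability is near its first maximum, which is exactly the kind of 𝜈𝜇 disappearance region the analysis targets.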
In addition, using Equation 1.16, one can note that the mass squared difference determines the frequency of the oscillation probability and the mixing angle controls the amplitude of the oscillation probability. Due to the relative amplitude of the 𝜃₂₃ term and frequency of the Δ𝑚²₃₂ term at the distances (𝐿) and energies (𝐸) that IceCube probes with atmospheric neutrinos, the two flavor simplification is a good, first-order approximation of the expected oscillation features (Section 1.4). The 𝜈𝜇 → 𝜈𝑒 oscillation features are sub-dominant and not resolvable for the IceCube atmospheric 𝜈𝜇 disappearance analyses (left plot of Figure 1.7), thus the oscillation between 𝜈𝜇 → 𝜈𝜏 is the main contribution to the oscillation pattern observed (right plot of Figure 1.7). Even so, the full three flavor oscillation picture is considered computationally in the analysis in this work, along with the matter effects described in the following section (Section 1.2.3).

1.2.3 Neutrino Oscillation in Matter

As neutrinos pass through matter, they have the potential to interact with leptons and nucleons, with a strength that is dependent on the density of the matter (Section 1.3). These interactions add an additional potential to the Hamiltonian for the neutrino oscillation:

H = H_0 + V    (1.17)

where H₀ is the Hamiltonian for the kinetic energy, as defined in a vacuum (Section 1.2.2), and V is the matter interaction potential. For neutrinos traveling through the Earth, the only surviving term that impacts the potential from lepton and nucleon interactions comes from charged current interactions with electrons (Section 1.3), so that V = diag(V_e, 0, 0) [29],

V_e = \pm\sqrt{2}\, G_F\, n_e(x)    (1.18)

where G_F is the Fermi coupling constant and n_e(x) is the electron number density in the medium [29, 30]. The electron number density is not constant in the Earth and must be properly accounted for in the analysis (Section 6.4.2). This adds a density-dependent term to the oscillation calculation that is specific to the path of the neutrino. This is accounted for in the computational modeling with estimates of the electron density of the Earth, so it remains that an oscillation experiment must measure the same parameters (final flavor, energy, and path of the neutrino). Accounting for the matter effects means that the modeling gets more complicated, since one must account for the electron density estimation in the Earth when considering systematic parameters that could affect the experiment. Note that for normal ordering, the matter effects are seen in the neutrino channel (visible in Figure 1.7). For inverted ordering, the matter effects are seen in the antineutrino channel.
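To get a feel for the size of the matter term, here is a small numeric sketch of Equation 1.18 (illustrative only; the constants are standard values, and the analysis itself uses a full Earth density profile rather than the single density and electron fraction assumed here).

```python
import numpy as np

G_F = 1.166e-5             # Fermi coupling constant [GeV^-2]
HBARC_GEV_CM = 1.9733e-14  # hbar*c [GeV*cm], converts cm^-3 to GeV^3
N_A = 6.022e23             # Avogadro's number [1/mol]

def matter_potential_eV(rho_g_cm3, Y_e):
    """Charged current potential V_e = sqrt(2) G_F n_e (Equation 1.18) for
    matter of density rho [g/cm^3] and electron fraction Y_e."""
    n_e = rho_g_cm3 * Y_e * N_A * HBARC_GEV_CM**3  # electron density [GeV^3]
    return np.sqrt(2.0) * G_F * n_e * 1e9          # GeV -> eV

# Earth's core (~13 g/cm^3) vs. the vacuum oscillation scale Dm2/(2E)
v_core = matter_potential_eV(13.0, Y_e=0.47)   # ~5e-13 eV
vacuum_scale = 2.5e-3 / (2 * 25e9)             # Dm2 [eV^2] / (2E [eV]), ~5e-14 eV
```

For core-crossing trajectories the potential is comparable to, or larger than, the vacuum term Δ𝑚²/2𝐸 across the analysis energy range, which is why the matter effects are visible around cos 𝜃𝑧 = −0.8 in Figure 1.7.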
1.3 Neutrino Interactions

Neutrinos interact only via the weak force, which has two types of mediators: the neutral 𝑍⁰ boson and the charged 𝑊± boson. The Feynman diagrams for the interactions with these mediators are shown in Figure 1.3. Charged current (CC) interactions are mediated by the 𝑊± boson, with an incident neutrino resulting in a charged lepton associated with the incident neutrino's flavor (i.e. 𝜈𝑒 → 𝑒). Neutral current (NC) interactions are mediated by the 𝑍⁰ boson, with an incident neutrino resulting in an output neutrino of the same flavor (𝜈𝑒 → 𝜈𝑒).

Figure 1.3: Feynman diagrams for the charged current (left) and neutral current (right) interactions. The ℓ represents the leptons (𝑒, 𝜇, or 𝜏). This particular example is of the deep inelastic scattering mechanism (Section 1.3.3).

At energies from 1-200 GeV, which encompasses the energy range relevant for the 𝜈𝜇 disappearance analysis in Chapter 6, there are three mechanisms for neutrino scattering. Figure 1.4 shows the cross section as a function of energy, where the expected scattering is either quasi-elastic (QE), resonant (RES), or deep inelastic scattering (DIS) as energy increases. Note that the antineutrino cross section has a smaller total magnitude (right plot in Figure 1.4), with roughly the same overall shape as the neutrino cross section, so that the ratio between them is about 2 across all energies. The following sections explain the process for each of the three aforementioned scattering mechanisms.

Figure 1.4: Contributions of each of the interaction cross sections as a function of energy (plots from [2]). Neutrinos are in the left plot and anti-neutrinos in the right plot. The general trend is the same for both neutrino and anti-neutrino, with QE dominating at the lowest energies (< 2 GeV), resonance at a few GeV, and DIS at the 10 GeV-scales.

1.3.1 Elastic and Quasi-Elastic Scattering

This interaction occurs when a neutrino scatters off of an entire nucleon and frees a nucleon or multiple nucleons. Elastic scattering can occur for neutral current interactions and quasi-elastic (QE) scattering for charged current interactions. The cross section for this mechanism dominates at energies < 2 GeV [2]. The analysis in this work will focus on reconstructed energies > 5 GeV, so this process should only be relevant for a small fraction of the total sample. For QE neutrino CC interactions, the neutron is converted to a proton. For QE antineutrino CC interactions, the proton is converted to a neutron. For QE NC interactions (regardless of neutrino or antineutrino), there is no conversion, such that the proton will remain a proton after scattering. Calculating the cross section depends on some constants that have been constrained experimentally to high precision, the neutrino energy and four-momentum transfer, along with an axial mass term (𝑀_A) [2]. This term has had a number of experimental measurements [2]. For the purposes of this analysis, the GENIE simulation framework is used since it is optimized to model low energy neutrino interactions, and it defines 𝑀_A = 0.99 GeV [31]. 𝑀_A,QE is also included as a systematic parameter (Sections 3.2 and 6.4.3).

1.3.2 Resonant Scattering

Resonant scattering describes when the neutrino knocks the nucleon into an excited state, creating a nucleon and typically one or more pions, with photons and kaons also possible but less likely [2]. The cross section calculation typically uses the Rein-Sehgal model, which is also employed by the GENIE simulation [2, 32]. Note that there is also an axial mass 𝑀_A term in the resonant scattering cross section, but it is set to 𝑀_A = 1.12 GeV in GENIE [31]. There is a separate 𝑀_A,res systematic parameter included in the analysis to account for this (Section 6.4.3). The energy range used for this analysis, starting at reconstructed energies of 5 GeV, means that resonance will also be a subdominant mechanism that affects a small fraction of the events. The majority of the analysis sample will be composed of events that have undergone deep inelastic scattering.
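In the standard dipole parameterization used for quasi-elastic scattering (and, in similar form, in the Rein-Sehgal resonant model), the axial coupling falls off with the four-momentum transfer at a rate set by 𝑀_A. The sketch below is a generic illustration of that dependence, not GENIE's internal implementation, showing why shifting 𝑀_A is a meaningful systematic handle (Section 6.4.3).

```python
def dipole_axial_form_factor(q2_gev2, m_a_gev, g_a=-1.267):
    """Standard dipole parameterization F_A(Q^2) = g_A / (1 + Q^2/M_A^2)^2.
    M_A sets how quickly the axial coupling falls off with momentum transfer."""
    return g_a / (1.0 + q2_gev2 / m_a_gev**2) ** 2

# Relative change in F_A at Q^2 = 1 GeV^2 for a +10% shift of M_A,QE
nominal = dipole_axial_form_factor(1.0, m_a_gev=0.99)   # GENIE default M_A,QE
shifted = dipole_axial_form_factor(1.0, m_a_gev=1.089)
ratio = shifted / nominal  # ~1.2: F_A grows by ~20% at this Q^2
```

Since the cross section scales with the square of the form factors, even modest shifts in 𝑀_A noticeably reweight the high-𝑄² part of the QE and resonant samples.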
1.3.3 Deep Inelastic Scattering

Deep inelastic scattering (DIS) occurs when the neutrino has enough energy to scatter off of a quark in the nucleon. This mechanism leads to a hadronic shower along with the final state lepton, depicted by the Feynman diagram examples for CC and NC in Figure 1.3. DIS is expected to dominate at energies > 10 GeV, as seen in Figure 1.4, and its cross section increases approximately linearly with energy in this region [2]. Thus, this will be the main mechanism for events in the analysis sample outlined in Chapter 5, due to the events being selected so that their reconstructed energy is between 5-100 GeV.

The DIS cross section has multiple theoretical models that have been calculated with a focus on different energy ranges. The main neutrino interaction simulation framework used in this work, GENIE, is optimized for < 100 GeV energies [31, 33]. While this is mostly aligned with the proposed energy range of the analysis in this work, there is a noted mismatch with some more current models of the DIS cross section, such as CSMS [34]. The CSMS calculation is known to not produce reliable results < 100 GeV, but does model the cross section well at higher energies. The mismatch in the cross section between these two models will be accounted for in the systematics of the analysis (Section 6.4).

1.4 Atmospheric Neutrino Sources

One prevalent, natural source of neutrinos is cosmic ray interactions in the atmosphere. High energy nuclei and protons from outer space, or cosmic rays, interact with air molecules in the upper atmosphere, creating a hadronic shower (hadrons are particles made of two or more quarks). The cosmic ray shower is depicted in Figure 1.5a, showing the pions and kaons produced in the shower. The decay of these hadrons is the primary source of the atmospheric neutrinos [35]. Figure 1.5b shows the fraction of neutrinos expected from the pions vs. kaons, which is both energy dependent (x-axis) and incident zenith angle dependent (the solid line indicates 𝜃𝑧 = 0° and the dotted line 𝜃𝑧 = 60°) [35].

Figure 1.5: Description of atmospheric neutrino sources from cosmic ray interactions. (a) Diagram of the particle shower from a cosmic ray interaction (from [36]). (b) Fraction of muons and muon neutrinos expected from the pion and kaon decay; the solid line indicates 𝜃𝑧 = 0° (vertical) and the dotted line 𝜃𝑧 = 60° (plot from [35]).

This means that the neutrino flux is dependent on the meson production modeling, which includes factors such as the cosmic ray flux, hadronic interaction modeling of the shower, atmospheric density and temperature, and the Earth's magnetic field [3]. Some of these will be considered as systematic parameters during the analysis, with more details in Section 6.4.

1.4.1 Neutrino Flux

Some of the factors that influence the meson production modeling, and in turn the neutrino flux, have been mentioned. Numerous models have attempted to quantify the neutrino flux, with the Honda model [37] used as a reference for this work. To look closer at the behavior of the neutrino flux, Figure 1.6 depicts the flux at the South Pole Lab (SPL) across various dependencies.

Figure 1.6: The neutrino flux (Φ𝜈) with all directions averaged as a function of energy (left plot), ratio of the flux for the different seasons as a function of energy (middle plot), and the neutrino flux as a function of cos 𝜃𝑧 (right plot) (plots from [3]).

The left plot shows the direction averaged flux as a function of neutrino energy. The gray dashed lines indicate the maximum region of interest for the 𝜈𝜇 disappearance analysis (Chapter 6), denoting 1-200 GeV. The colored lines show that the largest expected flux at all energies is 𝜈𝜇.
The ν_μ flux is more consistent than the ν_e flux in this range, but broadly it can be said that the ν_μ flux is at least 2 but up to 100 times larger than the ν_e flux. For both the electron and muon neutrinos, the flux is typically twice as large as the expected flux of the respective antineutrino.

The middle plot shows the seasonal dependence of the flux, where the y-axis is the ratio of the flux to the one-year average. Note that there are four plots stacked on top of each other, showing a range of ±5% deviation from the average. There is a clear seasonal difference in the flux, with Dec-Feb having an above-average flux and Jun-Aug below average. This is due to the seasonal change in the altitude of cosmic ray interactions [3]. Note that this difference is also energy dependent, with less seasonal variation at low energy compared to even the 200 GeV point marked with the dashed gray line. This contributes to the seasonal variations discussed in Section 4.2.1.

The right plot in Figure 1.6 shows the flux dependence on the zenith angle at a single energy (3.2 GeV). The seasonal variation (shown between the dashed and solid lines) is more severe for neutrinos that are vertically downgoing at the South Pole (cos θ_z = 1, explained further in Section 1.5). It is proposed that this is due to the seasonal variation of the muon energy loss in the atmosphere [3]. Overall, the shape of the flux is also influenced by the geomagnetic field bending the paths of the charged leptons [3].

What we learn from these plots is that the neutrino flux has many factors that can cause significant changes to the expected neutrino yield. Among these factors are influences that have energy, directional, seasonal, and flavor dependence. Section 6.4.1 discusses how these factors are accounted for in the oscillation analysis, along with parameters that are used as systematics during the fit to account for potential deviations from the estimated models.

1.5 Oscillation Physics with Atmospheric Neutrinos

Atmospheric neutrinos can be detected at various baseline distances (L) and energies (E) to provide a measure of the oscillation over the 2D parameter space. The distance is inferred from the angle at which the neutrino travels through the detector, relative to the surface above the detector. Thus, a neutrino that is generated in the atmosphere directly above the surface of the IceCube detector at the South Pole and travels down through the detector is called "downgoing" and is denoted with θ_z = 0° or cos θ_z = 1. An "upgoing" neutrino is created on the other side of the Earth, at the North Pole, and travels through the length of the Earth to reach IceCube (θ_z = 180° or cos θ_z = −1). The distance L can then be calculated using the angle to find the distance through the Earth that the neutrino has to travel to reach the detector. Since the zenith angle is the parameter measured using the detector, results will be reported such that cos θ_z ∼ L.

1.5.1 Expected Oscillation Signature

Figure 1.7 shows the probability of oscillation to ν_μ for each of the neutrino flavors. The ν_μ disappearance (a dip in the ν_μ → ν_μ survival probability) is expected at cos(θ_z) < 0.1 and E < 100 GeV, where the probability of a muon neutrino remaining a muon neutrino is near zero.
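The location of this disappearance region can be estimated from the two-flavor survival probability, P(ν_μ → ν_μ) ≈ 1 − sin²(2θ₂₃) sin²(1.27 Δm²₃₂ L/E), with L in km and E in GeV. A minimal sketch, simplified to vacuum oscillation with a spherical Earth, an assumed 20 km production altitude, and illustrative oscillation parameter values:

import numpy as np

R_EARTH = 6371.0   # km
H_PROD = 20.0      # km, rough neutrino production altitude (assumption of this sketch)

def baseline_km(cos_zenith):
    """Chord length from the production point in the atmosphere to the detector."""
    r = R_EARTH + H_PROD
    return np.sqrt(r**2 - (R_EARTH * np.sin(np.arccos(cos_zenith)))**2) \
           - R_EARTH * cos_zenith

def p_survival(energy_gev, cos_zenith, dm2=2.4e-3, sin2_2theta=0.99):
    """Two-flavor vacuum nu_mu survival probability (illustrative parameters)."""
    L = baseline_km(cos_zenith)
    return 1.0 - sin2_2theta * np.sin(1.27 * dm2 * L / energy_gev)**2

# Vertically upgoing (cos(theta_z) = -1): the survival probability dips
# near zero around ~25 GeV, matching the signature region described above.
for e in (10.0, 25.0, 50.0, 100.0):
    print(f"E = {e:5.1f} GeV: P(nu_mu -> nu_mu) = {p_survival(e, -1.0):.2f}")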
This is the region that experiments must be able to resolve to measure the disappearance signature. The value of the mass squared difference determines where the oscillation bands are located relative to L and E, acting as the frequency of the oscillations. The mixing angle controls the depth of the oscillation signal shown in the color bar of Figure 1.7, acting as the amplitude of the probability. Note that the dominant contribution is ν_μ → ν_τ, which has a much stronger amplitude for the oscillation signal than ν_μ → ν_e. This is why ν_μ disappearance can be approximated with the two-flavor neutrino picture, since the oscillation into ν_e is a sub-dominant effect. The analysis in this work is still performed computationally including the full three-flavor oscillation paradigm, along with accounting for the matter effects.

Figure 1.7: Simulated probability of ν_e, ν_μ, ν_τ (left to right) → ν_μ at various distances (in cosine zenith) and energies assuming normal ordering. Notice that the matter effects are visible around cos θ_z = −0.8 in all three plots, in the region of 2-16 GeV for the first plot and 4-8 GeV in the middle and right plots.

1.5.2 Status of Global Oscillation Measurements

For Δm²₃₂ and sin²(θ₂₃), numerous experiments have recently probed these parameters using atmospheric neutrinos, as in IceCube and Super-K ([17, 16, 38]), and accelerator-based neutrinos, as in NOvA, MINOS+, and T2K [4, 7, 5]. Figure 1.8 shows the current results for all of the aforementioned experiments compared to IceCube's 7.5 year golden neutrino sample (Section 1.5.3). The advantage of IceCube for measuring neutrino oscillation is the high statistics of atmospheric neutrino data available, due to the detector's large volume and the rate of the atmospheric flux. IceCube has yet to publish a result taking advantage of the 10 years of livetime available with an analysis method capable of utilizing the full available sample. The goal of this work is to develop a reconstruction method to quickly process the high statistics sample and to analyze the sample using the IceCube ν_μ disappearance analysis procedure.

Figure 1.8: Result contours for global experimental results, including the IceCube 7.5 year result and the likelihood-based 9.28 year analysis' sensitivity projection [4, 5, 6, 7, 8, 9].

1.5.3 Status of IceCube Oscillation Measurements

The most recent IceCube ν_μ disappearance result uses 7.5 years of data. Due to the method used for that analysis, it only uses "golden" neutrino events, selected with strict criteria [10] such that the events must have at least 5 hits that come from direct light (i.e. unscattered light, see Section 2.3.1). Even with this tight requirement, the ν_μ disappearance result with the golden sample provides improved contours compared to the previous three year livetime IceCube results. The contours and best fit result for sin²(θ₂₃) and Δm²₃₂ will be used as a comparison for this analysis. In addition to the 9.28 year livetime sample discussed in this work, there is another 9.28 year livetime sample that is reconstructed and optimized using a likelihood-based reconstruction [10]. Since the likelihood-based reconstructed sample analysis is not yet completed, it is only used in this work for comparisons of reconstruction resolution and projected sensitivity to the oscillation parameters.

CHAPTER 2
NEUTRINO DETECTION TECHNIQUES

2.1 IceCube Neutrino Observatory

The IceCube Neutrino Observatory instruments a cubic kilometer of natural glacial ice at the South Pole.
Deployed in columns of melted and then refrozen ice, IceCube comprises 5,160 Digital Optical Modules (DOMs) attached to 86 strings of cabling. The cables provide both power and communication capabilities to send readouts from the DOMs to the surface of the ice, where they are then processed and recorded by computers in the IceCube Laboratory. Since the full deployment in the ice in 2011, the survival percentage of DOMs has exceeded 99% [39]. This stability has helped IceCube record more than 10 years of livetime of data.

2.1.1 Geometry of the Main Detector

Of the 86 strings, 78 are called "IceCube" strings. These strings are placed in a hexagonal array, with 125 m between them in xy position. Each IceCube string has 60 DOMs distributed between 1450 m and 2450 m in depth, with 17 m spacing between the modules [39]. The IceCube strings are shown in green in Figure 2.1.

The DeepCore strings are the center 8 strings shown in red in Figure 2.1. These are less regularly spaced in the xy plane, ranging from 41 m to 105 m apart, all of which are closer together than the IceCube string spacing. The DeepCore strings have 60 total DOMs, but the top 10 are used for vetoes and span depths between 1750 m and 1850 m (red highlight in the side view of Figure 2.1), while the bottom 50 DOMs are between 2100 m and 2450 m in depth (light green highlight in the side view of Figure 2.1) [39]. The veto DOMs are spaced every 10 m vertically, whereas the core 50 DOMs on the DeepCore strings are 7 m apart in z position. The majority of the DOMs on the DeepCore strings also have photomultiplier tubes (PMTs) with a higher quantum efficiency (∼35%) [39]. This aids in making the DeepCore DOMs more sensitive to low light, low energy events.

2.1.2 Digital Optical Module

The fundamental detection unit of IceCube, the DOM, contains a 10" PMT pointing downward into the ice (Figure 2.2). Along with the PMT, each DOM houses an embedded high-voltage generator, an LED flasher board for calibration, and a main board that processes the analogue and digital signals from the PMT. The components of the DOM are protected by a 13" glass pressure sphere that withstands the high pressure deployment in the ice.

The PMT in the DOM captures photons from the Cherenkov light produced by charged particles traveling faster than the speed of light in ice (Section 2.2.1). The PMT amplifies the signal of electrons ejected from the photocathode due to the photon capture, creating a voltage drop at the anode that creates a pulse. The pulse's raw waveform is digitized and read out if the signal surpasses the threshold of 0.25 photoelectrons (p.e.) [39].

2.1.3 DOM Efficiency

The DOM efficiency describes the scaling of the expected PMT efficiency in the DOMs. The PMT's photon detection efficiency was measured before deployment into the ice for 13 DOMs, which informs the calibrated "expected" efficiency used for all DOMs. The DOM efficiency is a systematic parameter that allows the scaling of the calibrated efficiency to be adjusted. This is one of the systematic parameters for the analysis outlined in Section 6.4.

2.2 Neutrino Detection in Ice

Neutrinos are detected through indirect methods: because they only interact weakly, one must measure the light produced by secondary particles and their successive interactions. This information is then used to reconstruct, or estimate, the energy, direction, and interaction vertex of the incident neutrino.
Due to the different secondaries expected from the CC and NC neutrino interactions for the different flavors, there are two main classes of signatures expected.

Figure 2.1: Top and side view of the IceCube detector (modified from [10]). The left side plot shows the variation in the scattering and absorption as a function of depth (Section 2.3.1). Two footprints are highlighted in the top view: the DeepCore array circled in a dotted blue line and the strings used to train the CNN reconstruction (Chapter 4) circled in purple.

Figure 2.2: Schematic of the digital optical module (DOM).

Section 2.2.1 will first discuss when and how the secondary particles produce light. Section 2.2.2 will compare the different signatures visible in the IceCube detector and their interaction sources.

2.2.1 Cherenkov Radiation

Cherenkov light is radiation emitted when a charged particle travels faster than the speed of light in a medium, primarily in the blue spectrum or at wavelengths of about 400 nm [40]. For a material's refractive index (n), a charged particle's velocity in the medium (v), and the speed of light in vacuum (c), if v > c/n then a cone of light will be emitted. The cone of light is such that the opening angle θ_C relative to the path of the charged particle is

cos θ_C = c / (n v).    (2.1)

As v → c, and using n = 1.32 in ice [23], θ_C ≈ 41°. Using v = c/n, we can find the energy threshold for Cherenkov radiation emission. The relativistic energy must be used in this case [24]:

E_C ≥ m c² γ = m c² / √(1 − (v/c)²)    (2.2)
E_C ≥ n m c² / √(n² − 1)    (2.3)

which is dependent on the particle mass (m). Thus, the minimum energy an electron needs to create Cherenkov radiation in the ice is 780 keV, and a muon needs at least 160 MeV. Cherenkov radiation occurs for any charged particle that has reached the threshold energy in the ice, and thus light is expected from both the charged leptonic (e, μ, τ) and hadronic (made up of two or more quarks, including the proton, neutron, pion, and kaon) secondaries. While they create similar cones of light, there are some differences in the signatures that these leave in the detector.
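As a quick numerical cross-check of the angle and thresholds quoted above, a short sketch using n = 1.32 and standard particle masses:

import math

N_ICE = 1.32  # refractive index of ice, as used above

# Rest masses (m c^2) in MeV; standard values, not specific to this work
MASSES_MEV = {"electron": 0.511, "muon": 105.7}

# Cherenkov angle in the v -> c limit: cos(theta_C) = 1/n
theta_c = math.degrees(math.acos(1.0 / N_ICE))
print(f"Cherenkov angle in ice: {theta_c:.0f} deg")  # ~41 deg

# Threshold energy from Eq. 2.3: E_C = n * m c^2 / sqrt(n^2 - 1)
for particle, m in MASSES_MEV.items():
    e_c = N_ICE * m / math.sqrt(N_ICE**2 - 1.0)
    print(f"{particle} threshold: {e_c:.3g} MeV")  # ~0.78 MeV (e), ~160 MeV (mu)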
2.2.2 Detection Signatures in IceCube

There are three main components of the neutrino interaction that create Cherenkov light. There is the lepton itself, which is the secondary that occurs in charged current interactions (Section 1.3). There are also electromagnetic and hadronic showers. Electromagnetic showers are created by electrons, which do not travel far before scattering in interactions with other particles in the ice, creating an abundance of localized light from these interactions. Hadronic showers are created by hadrons, such as the neutrons and protons that are disrupted during DIS interactions, and the daughter particles emitted from the nucleons (Section 1.3).

Figure 2.3 shows the different expected Cherenkov light contributors from the various flavors and interaction types. In the detector, these different interaction outputs for < 100 GeV neutrinos break down into two categories: cascade or track topology.

Figure 2.3: Different types of interactions, their secondaries, and the expected signatures for incident neutrino energies < 100 GeV (modified from [11]).

2.2.2.1 Topology of Cascade-Like Events

Cascade-like events are signatures in the detector that are localized or have a spherical structure relative to the detector spacing. This structure occurs due to the hadronic and electromagnetic showers creating a cascade of charged particles in all directions, with the starting vertex at the center. This makes it difficult to resolve the direction the incident neutrino is traveling at these energies, due to the small length scale of the shower compared to the detector spacing.

Events that have a cascade signature are all ν flavor NC events, ν_e CC events, and most ν_τ CC events at these energies (Figure 2.3). The NC events have an invisible outgoing neutrino, so their only signature in the detector is the hadronic shower from the disruption of the nucleus. The ν_e CC events have an outgoing electron, which creates an electromagnetic shower, giving the event the cascade-like signature.

For the ν_τ CC events, there are a few possibilities for the expected signatures. This is due to the fact that the tau lepton is heavy and thus has a shorter lifetime than both the muon and electron, so it is expected to decay quickly in the detector. Thus, we must examine the secondaries that the tau lepton decay will emit, with three main decay modes expected. The largest fraction of the branching ratio is where the tau decays into various hadrons to create a hadronic shower, with a branching ratio of 65% [23]. The next most dominant decay is where the τ decay produces an electron (τ → e ν_τ ν_e), with a branching ratio of 18% [23]. This would result in an electromagnetic shower after the tau decay. At energies < 100 GeV, the tau is expected to only survive on the order of millimeters. Thus, the potential double bang signature of the two cascades (one from the original ν_τ interaction and one from the τ decay) is not resolved in the detector, since the vertices of these signatures would overlap and be indistinguishable from a single cascade. The last fraction of the tau decay branching ratio is the tau decaying into a muon secondary (τ → μ ν_τ ν_μ), with a branching ratio of 17% [23]. This would lead to a track-like signature (Section 2.2.2.2), but with the expected ν_τ CC component comprising < 5% of the total analysis sample (Table 5.2), this is not expected to have a large effect in this work. Thus, the majority of the ν_τ CC events will have a cascade topology and will be treated as such for this analysis.

2.2.2.2 Topology of Track-Like Events

Track-like signatures in the IceCube detector are events that have a deposit of light whose source travels across the detector. The primary example of this is the outgoing muon from the ν_μ CC interaction (Figure 2.3). The energy of the outgoing muon affects both the number of photons emitted and the distance the muon travels before decaying. For an initial neutrino energy of 100 GeV, the outgoing muon is expected to travel less than 400 m in the ice. Thus, at the lowest energies, a track-like event can appear indistinguishable from a cascade-like event. As the track length increases, the direction of travel becomes easier to reconstruct, since there is a longer signature across the detector.
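A back-of-the-envelope check on that track length, assuming the muon is close to minimum-ionizing (a standard approximation of roughly 0.2 GeV/m energy loss in ice, not a value specific to this work) and carries most of the neutrino energy:

# Rough muon range in ice, assuming a constant ionization loss of ~0.2 GeV/m
# (approximate minimum-ionizing value; radiative losses, which matter above
# a few hundred GeV, are ignored here).
DEDX_GEV_PER_M = 0.2

def muon_range_m(muon_energy_gev):
    return muon_energy_gev / DEDX_GEV_PER_M

# If a 100 GeV nu_mu gives ~70-80% of its energy to the muon, the track is
# a few hundred meters, consistent with the < 400 m quoted above.
print(f"~75 GeV muon travels roughly {muon_range_m(75.0):.0f} m")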
2.3 Optical Properties of the Ice

IceCube takes advantage of the large volume of natural glacier ice at the South Pole. This is what allows IceCube to instrument such a large volume, but it also creates challenges in handling the natural irregularities. The key optical properties that IceCube needs to understand are the expected scattering and absorption lengths in the ice, so the simulated photons can be accurately propagated in the ice. When a photon scatters, it changes direction, which makes it harder to trace the photon back to its origin. When a photon is absorbed, it interacts in the ice, so it is no longer detectable by the DOMs. A better understanding of the ice is achieved by using the LED flashers on the DOMs to perform calibration [39, 41], along with using laser dust loggers to record back-scattering in the ice [42].

2.3.1 Bulk Ice Properties

The bulk ice refers to the unaltered ice used as the detector volume for IceCube. It is used in contrast to the refrozen columns of ice, the hole ice (Section 2.3.2), where the detector strings were inserted. The optical properties of the bulk ice are not uniform across the detector, but their variations have been explored and their dependencies accounted for in the simulation's ice model. The ice model is used when propagating the simulated photons in the ice, a process described in Section 3.3. The ice model used for this work is called SPICE version 3.2.1, with more details in [11]. The following sections describe important features of the ice model. While the model is used during the simulation, the parameters for the bulk ice's scattering and absorption coefficients are also systematic parameters that are free during the analysis (Section 6.4). These apply a multiplicative factor to increase/decrease the amount of absorption or scattering in the bulk ice, which in turn affects the number of photons expected to be detected by the DOMs.

2.3.1.1 Depth Dependence

Figure 2.4 shows the variation of the scattering and absorption properties of the ice as a function of depth. The locations of the DOMs on the IceCube strings are highlighted in pink, starting at 1450 m and extending down to overlap with the DeepCore DOM depths. One reason that the detector starts at depths greater than 1450 m is to utilize regions of more pure ice. DeepCore's main volume instruments even deeper ice, to avoid the dust layer. The dust layer is the region of increased scattering and absorption of photons found at depths of 2000 m to 2100 m (Figure 2.4). There are also other, more subtle variations of the ice that are xy dependent.

Figure 2.4: Scattering and absorption coefficients (left scale, blue line) and lengths (right scale, red line) as a function of depth. Smaller values of the coefficients indicate cleaner ice [12].

2.3.1.2 Azimuthal Dependence

The depth dependence is not perfectly uniform at each z position, due to the structure of the bedrock. Rather, there is a tilt to the horizontal layer structure, where some of the ice properties are shared up to ±56 m apart [41]. This creates an azimuthal dependence, where the position relative to the angle in the xy plane has an impact on the ice properties. There is also an additional azimuthal dependence seen as a preferential direction for the scattering in the ice, due to the glacial ice flow [13]. Figure 2.5 shows the relative charge differences between a simulation that does not account for these differences and the data, as a function of the azimuth angle.

Figure 2.5: The charge difference between MC and data measured as a function of the azimuth angle, showing the impact of the tilt and flow anisotropy [13].
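To make concrete how scattering and absorption coefficients enter photon propagation, here is a toy random-walk sketch. It is a drastic simplification of the actual simulation (Section 3.3), with illustrative coefficient values; recall that length = 1/coefficient:

import math
import random

def propagate_photon(scattering_len_m=25.0, absorption_len_m=100.0, max_dist_m=500.0):
    """Toy photon walk: sample exponential distances to the next scatter and
    absorb the photon with the appropriate probability along each step.
    Returns the total path length traveled before absorption (or max_dist_m).
    Direction changes are not tracked in this simplified budget."""
    traveled = 0.0
    while traveled < max_dist_m:
        step = random.expovariate(1.0 / scattering_len_m)  # distance to next scatter
        if random.random() > math.exp(-step / absorption_len_m):
            return traveled + step  # absorbed during this step
        traveled += step            # survived; photon scatters and continues
    return max_dist_m

paths = [propagate_photon() for _ in range(10000)]
print(f"mean path before absorption: {sum(paths) / len(paths):.0f} m")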
2.3.2 Hole Ice Properties

The hole ice refers to the refrozen cylindrical volume around the strings, which has different optical properties than the undisturbed bulk ice. Even though the same melted ice was used to refreeze in the drill holes, there is a difference observed in the region around the DOMs [43]. One potential reason for the different behavior is that the hole ice contains more air bubbles than the bulk ice [43]. During the refreezing process, gases were released, and without the years of compression that the bulk ice experienced, a column of bubbles has been observed in the refrozen ice using in-ice cameras [43]. This creates a local scattering effect that is independent from the observed bulk ice effects.

The hole ice is accounted for in the simulation using a function that adjusts the angular acceptance of the photons on the DOM. It is parameterized using two variables derived from a principal component analysis that describe the angular acceptance expected in the hole ice, p₀ and p₁. These two variables are varied, and multiple sets of simulation are generated with different values of p₀ and p₁ to observe their effects as systematic parameters that are accounted for during the analysis (Section 6.4).

CHAPTER 3
LOW ENERGY SIMULATION AND DATA SAMPLE PROCESSING

3.1 Monte Carlo Samples

The Monte Carlo (MC) samples are generated to simulate a wide range of particles and interaction types. There are three main categories of simulation used in this analysis: events triggered by neutrinos, atmospheric muons, and noise. Multiple samples, or sets, were generated for the neutrino and atmospheric muon simulations, with each set having different settings for the detector systematics. For this work, there are two main purposes for the generated neutrino samples: analysis and neural network training. The different uses and goals for these samples are described below, followed by Section 3.2, which describes how each type of MC is generated. The last section describes the shared processing for both MC and data to create samples that are pure enough to have the final reconstruction applied (Chapter 4). A flow chart of the processing pipeline for the MC and data is in Figure 3.1.

Figure 3.1: Flow chart describing the processing for simulation (left) and data (right).

3.1.1 Neural Network Training Samples

Two sets of neutrino samples are independently generated and specifically optimized for creating neural network training samples. Unlike the typical goal of analysis MC sets, which is to simulate the expected distributions of the data, these MC sets are generated with the same physics and interaction modeling but with final distributions intended to be flat with respect to a single variable. The aim is to create samples that are unbiased with respect to the specific variable the network is being trained to reconstruct, and that ideally extend past the target reconstruction region to avoid artificial bias at the edges of the sample.

The two special samples generated for training are intended to be uniformly distributed in energy (per GeV) for the first sample and uniformly distributed in zenith angle (per radian) for the second sample. The zenith network is trained on the zenith angle, not the cosine of the angle, due to training complications at the boundary of the sample, further explained in Section 4.2.5.2. The uniform energy training sample is also used to make the optimized training samples for the track classifier (Section 4.2.5.4) and muon classifier (Section 4.2.5.5). The vertex training sample uses an unoptimized sample, also derived from the energy training sample (Section 4.2.5.3). These samples undergo the same processing as the analysis sample up until Level 6 (Section 3.4.5). After Level 6, specific training optimization cuts are made to each training sample (Chapter 4). These samples are then used to train the networks, but are not used for any further purposes in the analysis.
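One simple way to approximate such a flat training distribution from an overabundant sample is per-bin downsampling; a minimal sketch with hypothetical helper names (the actual flat samples are instead produced with tuned generation settings, as described in Section 3.2):

import numpy as np

def flatten_in_variable(values, bin_edges, n_per_bin, rng):
    """Indices of a subsample approximately uniform in `values` (e.g. energy),
    keeping at most n_per_bin events per bin. Bins with fewer events keep all
    of them, so truly flat samples require targeted generation."""
    keep = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = np.flatnonzero((values >= lo) & (values < hi))
        rng.shuffle(in_bin)
        keep.extend(in_bin[:n_per_bin])
    return np.asarray(keep)

rng = np.random.default_rng(0)
u = rng.uniform(size=1_000_000)
energies = 1.0 / (1.0 - u * (1.0 - 1.0 / 500.0))  # toy E^-2 spectrum on [1, 500] GeV
idx = flatten_in_variable(energies, np.arange(1.0, 501.0), n_per_bin=1000, rng=rng)
print(f"kept {idx.size} of {energies.size} events, approximately flat per 1 GeV bin")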
3.1.2 Analysis Sample

The goal of the analysis sample is to generate an MC sample that is as close to the expected data distribution as possible. While some importance sampling is implemented to minimize the computational burden when simulating these events, there is close accounting of the sampling during generation, so that weighting schemes can be applied to match the analysis distribution to the expected flux. This sample, referred to hereafter as the analysis sample, will be used to evaluate the robustness of the analysis pipeline, project the sensitivity of our measurement of the oscillation parameters, assess the impact of the systematic parameters, and assess the goodness of fit (Chapter 6). The analysis sample is also used as the testing sample for the neural network reconstructions, to gauge their performance on a data-like distribution that is completely independent of the training samples (Section 4.3).

3.2 Monte Carlo Generation

3.2.1 Neutrino Events

The neutrino MC samples are generated using GENIE (version 2.12.8), a neutrino MC generator with an emphasis on neutrino interaction physics in the few-GeV energy range, including the QE, RES, and DIS interactions (Section 1.3) [32]. The cross sections for all flavors of neutrinos and antineutrinos are pre-calculated by GENIE from 0-10 TeV for the analysis sample. The neutrinos are generated using a given generation spectrum, which does not necessarily have to match the expected physical flux, since it will be weighted using a given model.

The neutrino simulation is run in parallel batches split by energy to help distribute the computational load of the simulation. The neutrinos are uniformly injected in a manually-defined cylindrical volume centered on DeepCore with a defined spectral index. This is optimized such that the generation volume is adjusted with the neutrino energy range of each batch, so that events are concentrated in the DeepCore region. This also improves the chances that the events make it to the final level of processing. The ratio of generated neutrinos to antineutrinos is defined as 70% to 30%, and kept constant between both the analysis sample and the training samples. For the analysis sample, the optimized settings are shown in Table 3.1 [44].

Flavor        Energy (GeV)   Radius (m)   Length (m)   Events/file   Files
ν_e + ν̄_e     1-4            250          500          450k          650
              4-12           250          500          100k
              12-100         350          600          100k
              100-10000      550          1000         57.5k
ν_μ + ν̄_μ     1-5            250          500          408k          1550
              5-80           400          900          440k
              80-1000        450          1500         57.5k
              1000-10000     550          1500         6.7k
ν_τ + ν̄_τ     1-4            250          500          1,500k        350
              4-10           400          900          300k
              10-50          350          600          375k
              50-1000        450          800          200k
              1000-10000     550          1500         26k

Table 3.1: Simulation settings used to generate neutrinos for the analysis sample. The Files column gives the total number of files generated per flavor.

The analysis sample uses a constant spectral index of E^−2.0 for all flavors and energy ranges. The events are then weighted to match the expected model flux, so that the computational optimization does not jeopardize the physics result.
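Schematically, this weighting is an importance-sampling correction from the generated spectrum to the model flux. A minimal per-event sketch with hypothetical names (the real weighting also accounts for interaction probability, generation volume, and livetime):

import numpy as np

def flux_weight(energy_gev, n_generated, e_min, e_max, model_flux):
    """Weight events generated on an E^-2 spectrum to a target model flux.
    model_flux(E) is the expected flux, e.g. an interpolation of Honda tables."""
    norm = 1.0 / e_min - 1.0 / e_max          # normalization of the E^-2 pdf
    gen_pdf = energy_gev**-2.0 / norm         # generation pdf on [e_min, e_max]
    return model_flux(energy_gev) / (gen_pdf * n_generated)

# Toy stand-in for a model flux with an atmospheric-like spectral shape (~E^-2.7):
toy_flux = lambda e: 1.0e4 * e**-2.7
w = flux_weight(np.array([5.0, 50.0]), n_generated=1_000_000,
                e_min=1.0, e_max=500.0, model_flux=toy_flux)
print(w)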
The training samples for the reconstruction training specifically avoid matching the expected physical model distribution, since the goal of these samples is to create a flat distribution, and weighting is not used when training. Thus, the ν_μ and ν_e events generated for the training samples use various energy ranges and cylinder volumes. The correlation between the energy and cylinder volume remains similar to the analysis sample (Table 3.1), but the number of events and files generated, and even the spectral index, vary significantly to create the artificially flat sample.

Both the analysis sample and the training samples are generated with the nominal simulation settings, with detector systematics set to their calibrated defaults. The analysis sample is also generated as various systematics sets that are simulated with variations in five of the detector systematic settings (relating to DOM efficiency, hole ice, and bulk ice). More details about these settings and how they are integrated into the analysis are given in Section 6.4.4.

3.2.2 Muon Events

The muon MC samples are generated with MuonGun, an IceCube software package that simulates atmospheric muon interactions [45]. The muons also use a generation spectrum that is optimized for computational efficiency and later weighted to an expected physical flux. The muons are generated on a defined cylindrical surface to determine their starting position, such that the generation surface is slightly larger than the IceCube detector volume. MuonGun then propagates the muons inward toward the detector. There is also an inner cylinder defined that the muons must intersect. In the case of the oscillation sample, this inner surface is defined as the DeepCore array, to ensure that only muons that pass through the DeepCore detector are simulated.

The unweighted muon flux used for generation is tuned to match a target distribution, to further improve the efficiency of generating time-intensive simulations that will be relevant to the final analysis. For the oscillation sample, it is optimized to generate muon events that pass cuts up to Level 5 (Section 3.4.4), and a scaling is applied to compensate for events that rarely make it through the optimization.

Three categories of muon simulation were generated. The first was produced without the target volume and/or filtering applied, to tune the early (Level 3-5) cuts. The second category is the muon sample with the baseline detector settings, optimized to pass the cuts up to the Level 5 processing, so that computational time is saved when generating a nominal sample to account for the expected atmospheric muon contamination (Section 5.4.3). The last category is a set of muon samples that have various shifts in the detector systematics (Section 6.1), to explore the effect of detector systematics on the muon distribution.

3.2.3 Noise Hits

Noise hits, or dark noise, which describe DOMs triggered by something other than a Cherenkov photon from an external source, are expected in every event. The noise can originate from various sources that cause an electron emission at the photocathode, including thermal emission, electronic noise, or radioactive decays in the DOM components [39]. Noise hits are not a separate type of event; rather, they are hits added into the pulse series of the neutrino and muon events to model the expected data pulse series more realistically. These simulated noise hits are generated with Vuvuzela, an IceCube software module written to simulate the expected noise rate of the DOMs [46].
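A toy version of adding uncorrelated dark noise to an event's readout window, assuming a flat per-DOM noise rate of roughly 500 Hz purely for illustration (the actual Vuvuzela model is more detailed than an independent Poisson process per DOM):

import numpy as np

def inject_noise_hits(n_doms=5160, rate_hz=500.0, window_s=10e-6, rng=None):
    """Toy dark-noise injection: for each DOM, draw a Poisson number of noise
    hits in the readout window and give each a uniform random time (in ns)."""
    rng = rng or np.random.default_rng()
    hits = []
    for dom in range(n_doms):
        n = rng.poisson(rate_hz * window_s)
        for t in rng.uniform(0.0, window_s * 1e9, size=n):
            hits.append((dom, t))
    return hits

noise = inject_noise_hits(rng=np.random.default_rng(1))
print(f"{len(noise)} noise hits across the detector in a 10 microsecond window")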
3.2.4 Noise Events

Pure noise events occur in the detector when the event triggers from noise hits alone, not from a neutrino interaction or atmospheric muon. These events are simulated using the Vuvuzela software and require a continuous, time-intensive simulation of all detector components until a random noise coincidence triggers the DeepCore filter [11]. These events are very rare, though they are more common in the low energy IceCube samples because the event-triggering threshold is kept low to target low energy events. Various cuts and filters are applied to specifically target these events, such that their final presence in the analysis sample is negligible (Table 5.2).

3.3 Propagating MC Photons and Detector Response

Once generated, the process of propagating the different particles and accounting for the detector response is similar between the different simulation types. The interactions of the primary particles are simulated so that secondary particles and their properties are recorded. The Cherenkov photons, from the initial interaction and the secondary particles' interactions, are then propagated through the ice and recorded if they intersect a DOM. The propagation depends on the ice model to determine local scattering and absorption properties (Section 2.3.1). The DOM response is taken into account by converting photons from the simulation into the hits that the DOM would record. This step depends on the PMT efficiency in the DOM, the thermal noise, and the hole ice (Section 2.3.2). Finally, the online data filter algorithms are applied to the simulation to determine which of the filters the event would pass.

3.4 Data Processing

At this stage, both data and MC undergo the same processing. The first processing level (Level 2) applies the DeepCore filter, cleans the pulse series, and performs the first fast cuts to select events in DeepCore. Level 3 focuses on atmospheric muon and noise event removal to improve data/MC agreement. Level 4 uses machine learning methods to further reduce atmospheric muon and noise events. Level 5 targets coincident and corridor events using new algorithms and tighter machine learning cuts. Level 6 calculates variables needed for the final analysis cuts. After Level 6, the samples are ready to be prepared for the reconstruction neural network training or to have the reconstruction applied to the analysis sample. The following sections outline the processing pipeline that has been developed by various other researchers in the IceCube collaboration [44].

3.4.1 Level 2

For the purpose of the oscillation analysis, the DeepCore filter trigger is the first requirement, to ensure that events happen in the DeepCore region of the detector. The DeepCore filter requires at least 3 HLC hits (defined below) in the DeepCore array within a sliding time window of 2.5 μs. Level 2 cuts any events that do not pass the DeepCore filter.

Level 2 also applies an initial cleaning algorithm to the pulse series, to remove noise pulses that are not correlated with the neutrino interaction. This algorithm uses the two categories of hits: Soft Local Coincidence (SLC) and Hard Local Coincidence (HLC). When a DOM records a hit, the PMT waveform is digitized and read out if the signal surpasses the threshold. An HLC hit is one with a nearby coincidence on a neighboring hit DOM on the string (within 2 DOMs above or below) within 1 μs. In contrast, an isolated hit is called an SLC hit, a naming convention used for historical reasons. The SLC hits are useful for reconstructing low energy events, which produce less light for the detector to record, but they are also more likely to be dark noise.
Thus, since SLC hits are included in low energy events, additional algorithms are necessary to clean the pulse series. The Seeded-RT (Radius and ΔTime) cleaning algorithm uses the HLC hits as the starting list to seed the radius and time cleaning procedure. It loops through all of the HLC hits and searches for nearby SLC hits, checking if they occur within a certain radius and time window relative to the HLC hit. For the DeepCore array, an SLC hit must be within 75 m and 500 ns, and for the IceCube array, a looser 150 m and 1000 ns is used to account for the larger distance between DOMs. The SLC hits within the radial and time window are added to the kept series, and the procedure is re-run to see if any remaining pulses are now near enough to the new series. This is repeated until no new pulses are added to the series, and all remaining pulses are left out of the cleaned pulse series.

In addition to the Seeded-RT algorithm, a static time window cleaning is applied to the pulse series. This removes all pulses that are more than 4 μs before or 6 μs after the DeepCore trigger. These two algorithms, the Seeded-RT and time window cleaning, define the cleaned pulse series that is used as input for many Level 3-6 algorithms as well as the neural network reconstruction (Chapter 4).
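The iterative seeded cleaning described above maps naturally onto a fixed-point loop. A simplified sketch, using a single radius/time window rather than the separate DeepCore and IceCube values:

import math

def seeded_rt_clean(hlc_hits, slc_hits, radius_m=150.0, dt_ns=1000.0):
    """Simplified Seeded-RT: start from the HLC hits and repeatedly absorb SLC
    hits that fall within `radius_m` and `dt_ns` of any already-kept hit.
    Hits are (x, y, z, t) tuples; returns the kept (cleaned) hits."""
    kept = list(hlc_hits)
    remaining = list(slc_hits)
    changed = True
    while changed:                      # iterate until no new pulses are added
        changed = False
        for hit in remaining[:]:
            for seed in kept:
                close = math.dist(hit[:3], seed[:3]) <= radius_m
                in_time = abs(hit[3] - seed[3]) <= dt_ns
                if close and in_time:
                    kept.append(hit)    # newly kept hits can seed later passes
                    remaining.remove(hit)
                    changed = True
                    break
    return kept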
Level 2 is also used to run another fast event selection algorithm that calculates the center of gravity (COG) using hits in the DeepCore fiducial region of the Seeded-RT cleaned pulse series. The COG calculation is then used to examine any remaining hits in the veto region and remove events if the derived speed of a hit relative to the COG indicates the hit could be causally connected to a muon crossing the detector.

3.4.2 Level 3

The goal of Level 3 is to apply fast algorithms that remove background muons, pure noise, and event types that are not simulated in the MC, such as muon bundles and coincident events. Bundles are multiple muons that trigger the detector in the same time window. MuonGun injects single muons, not multiples (i.e. bundles), so these are not accounted for in the simulation sample. Muon bundles are easy to remove using veto cuts, so they are not a concern in the final sample once these cuts are applied here. Coincident events are events that have two independent particles interacting in the detector within the same triggered time window, such as a neutrino and an atmospheric muon. Since these particles are simulated separately, they are also not accounted for in the MC sample.

To remove noise, events are cut if they do not meet the minimum number of hit DOMs (< 6 cleaned hit DOMs), the minimum number of hits in a sliding time window (to check for local coincidence), or the minimum number of hits in the fiducial region (< 2 hits). To target muons, the cuts focus on the position of the hit DOMs. This includes cutting events if there are too many hits in the veto regions (> 7 hits in the veto region, or an overall ratio of veto hits to fiducial hits above 1.5), cutting on a quick estimate of the starting vertex z position to loosely constrain it within DeepCore (< -120 m), which indicates particle production at the neutrino interaction vertex, and demanding that most of the hits are concentrated in the first 600 ns of the event (cut if the ratio is < 0.37). To remove coincident events, the time between the first and last pulse is calculated and constrained. This is applied both to the uncleaned (< 13000 ns) and cleaned (< 5000 ns) pulse series, with a tighter constraint applied to the cleaned pulses. These cuts remove 95% of the background atmospheric muon events and 99% of the pure noise events from the Level 2 sample [44].

3.4.3 Level 4

Now that the obvious background events are removed, machine learning algorithms can be applied to further identify and remove background events. There are two classifiers at this level: the first has been trained to classify pure noise events and the second to classify atmospheric muon events.

A noise classifier is trained using Level 4 MC to distinguish pure noise events from neutrino events. It uses a Boosted Decision Tree (BDT) that takes in variables calculated at Levels 3 and 4 to score the probability that an event is neutrino-like rather than noise. The cut that is applied on the BDT score (> 0.7) reduces the noise rate by two orders of magnitude (Figure 5.5) while keeping 96% of neutrino events [44] between Level 3 and 4.

The muon classifier also uses a BDT, which was trained to distinguish atmospheric muons from neutrino events. The training sample used for the atmospheric muons is a subset of real detector data processed up to Level 4, since the Level 4 sample is expected to be composed of 99% atmospheric muon events. The GENIE MC is still used as the training sample for the neutrino events, also processed up to Level 4. The chosen cut on the BDT score (> 0.65) removes 94% of atmospheric muon events and keeps 87% of neutrino events between Level 3 and 4.

3.4.4 Level 5

Level 5 focuses on removing the atmospheric muon events that have eluded the previous cuts. The signatures of the remaining muons are distinct from the clear muon-like signatures already cut. Level 5 applies two categories of cuts to remove the remaining muon events: starting containment and corridor cuts.

Starting containment is a check to ensure the event begins inside of the DeepCore fiducial volume. At Level 5, three fast estimates of the vertex are used to apply a cut on the radial (< 150 m) and vertical (-490 m < z < -225 m) containment.

Corridor muons are events that sneak through the gaps in the hexagonal structure without hitting any strings until the DeepCore region. An algorithm is run at Level 5 that searches for DOMs hit along the corridor regions, using a COG calculation to determine the closest string associated with the event. It then searches for DOMs along that corridor using a cylindrical spatial and time window to determine if light was detected in the region associated with a corridor event. If two or more DOMs are found to have light associated with the corridor region, the event is removed. The last cuts applied at Level 5 use the Level 4 classifiers again, further tightening their cuts (noise > 0.85 and muon > 0.9).

3.4.5 Level 6

For the purposes of this analysis, Level 6 includes some fast algorithms that are run to determine variables that are later used in final analysis cuts to improve data/MC agreement. No cuts are applied at this level, only preparation for the analysis cuts after the reconstruction has been applied. This includes the variables calculated to further cut coincident and corridor events. Sections 5.2.1.1 and 5.2.2 contain more information about which variables are used and their effect on data/MC agreement. This is where my work begins, creating selections for each of the training samples and analysis samples after Level 6.
CHAPTER 4
FAST LOW ENERGY RECONSTRUCTION USING CONVOLUTIONAL NEURAL NETWORKS

Previous low energy reconstruction methods for IceCube use likelihood methods to compare the observed pulse distributions in the DOMs to pre-calculated tables containing the number of expected photons. One such algorithm, Retro Reconstruction [10], compares the observed and expected distributions in the DOMs for various scenarios of neutrino energy, interaction vertex, and direction. The algorithm then calculates the likelihood of each hypothesis and finds the best match. This method takes 40 seconds on average to reconstruct a low energy neutrino event.

IceCube has a large data sample of atmospheric neutrinos to reconstruct, with about 100,000 events expected per year, even after undergoing the processing cuts up to Level 6. The low energy reconstruction must handle an even greater sample of Monte Carlo (MC) generated events to explore the effects of detector systematics and backgrounds, reconstructing O(10⁸) MC events. The Retro Reconstruction method takes months to accomplish this task, even while leveraging a computing cluster. A solution was needed that would speed up the reconstruction time without losing accuracy in the difficult-to-reconstruct low energy regime. This is the motivation behind the Fast Low Energy Reconstruction using Convolutional Neural Networks (FLERCNN).

FLERCNN is written with the intention of reconstructing neutrino events for a ν_μ disappearance oscillation analysis, which informs the choice of variables that FLERCNN is optimized to reconstruct. To complete the oscillation analysis, one needs the L/E and flavor of the incoming neutrino. FLERCNN reconstructs the zenith angle (cos(θ_zenith) ∼ L), the energy (E), and a score for how confident it is that an event is track-like (predominantly ν_μ CC at these energies) (Section 1.2.2). In addition, previous oscillation analyses on IceCube have benefited from cuts that ensure the neutrino events start in the DeepCore region (since its instrumentation is optimized for lower light events) and a cut to further remove background atmospheric muons [17, 47]. FLERCNN is trained with the capability to apply these two purity cuts in addition to reconstructing the necessary analysis variables. The following sections will discuss how FLERCNN was specifically optimized to handle the low light events at the 10-GeV scale, the training procedure for each of the 5 FLERCNN regression and classification networks, and their performances. The networks are trained and run using the Keras software with a TensorFlow backend [48, 49].

4.1 Architecture

Convolutional Neural Networks (CNNs) are known for their capabilities in image recognition [50, 51, 52]. They function by using a kernel, or window, that spans nearby pixel inputs and has them share their weights while training. CNNs have translational equivariance, which means they can identify objects regardless of their overall location in the image [53, 50]. This is useful for identifying similar events in different parts of the detector. The IceCube detector is a fairly regular array of optical modules, equating the DOMs to the pixels in an image. The input from the DOMs equates to the RGB (red, green, blue) input of typical images. This has already been proven successful, as IceCube has used CNNs to reconstruct the energy and direction of TeV to PeV events [20]. That high energy CNN loses resolution at the 100s-of-GeV scale, though, and resolution down to the 10-GeV scale is needed to resolve oscillation signatures.
The architecture of the high energy CNN was used as a starting point to motivate the design of FLERCNN's architecture and input features. To focus the CNN on the well-resolved low energy events, FLERCNN only uses strings in and around the DeepCore array. The CNN is given input from all 8 DeepCore strings and the 19 center-most IceCube strings (Figure 4.1).

Figure 4.1: Top view of the IceCube detector, with DeepCore strings in red, IceCube strings in gray, and a blue highlight on all strings used for the low-energy CNN.

This accounts for the first dimension in the input array, which corresponds to an unordered string index among either the 8 DeepCore strings or the 19 IceCube strings. The second dimension in the input array is the relative DOM index, ordered by position on the string. The last dimension is five variables chosen to summarize the information of all photons that hit a given DOM in the event time window:

• Sum of the charge
• Time of the first hit
• Time of the last hit
• Charge-weighted mean of the hit times
• Charge-weighted standard deviation of the hit times

More details on these input features, how they are calculated, and their re-scaling are in Section 4.2.2.

The inputs from the DeepCore and IceCube strings are recorded in separate arrays and given separately to the CNN, since they have different z-spacing between DOMs, resulting in a two-branched network architecture for the convolutional layers (Figure 4.2). The CNN kernel spans only the vertical, or z-depth, of the strings. Since the DeepCore strings are deployed in an irregular array, no convolution is applied in the xy plane. The same convolutional kernel applied to the IceCube strings gave satisfactory results, and thus both branches of the CNN use the same kernel applied in z-depth only.

Figure 4.2: Detailed network architecture for FLERCNN with the legend for colored layers on the right. The notation at the top indicates that for each event, there are 8 DeepCore strings with 60 DOMs per string and 5 summary variables per DOM fed into one branch of the network. The other branch of the network sees the 19 IceCube strings with their 60 DOMs per string and 5 summary variables per DOM.

The network structure in Figure 4.2 shows the 8 convolutional layers in green, with their kernel sizes specified for each layer. They vary between 1-7 DOMs included in the kernel, always using an odd number so that the same number of DOMs above and below the target DOM is taken into account. Every layer also includes a batch normalization and a dropout layer. The batch normalization is used to improve the optimization and stability of the training [54]. The dropout layer randomly masks 20% of the nodes so that they do not contribute to the training during that forward pass. The nodes that are masked change every iteration through the training, to help make the network more stable and more generalized. Two max pooling layers are included at layers 1 and 4, since max pooling is known to improve CNN recognition if the object is skewed or rotated slightly within the image [53]. After the 8th convolutional layer, both branches of the network are flattened and concatenated together into a single, fully connected dense layer. The batch normalization and dropout are applied once more before the network gives its prediction of the output.

The network is trained by feeding the events in the training sample into the network. The forward pass occurs by taking the event's inputs and running them through the existing network's configuration of nodes and weights. After the last layer, an activation function is used to transform the output. Then, a chosen loss function is used to score how accurate the CNN's reconstruction is. The backpropagation step then occurs, updating the weights using the gradient of the loss.
The next event is then fed into the network for another forward pass, and this process continues until there is a full pass through the training data, or an epoch. Note that this process can also be done in batches; all FLERCNN architectures use a batch size of 256 events for training.

The chosen loss for each network is listed in Table 4.1. The energy FLERCNN network uses the mean absolute percentage error loss,

Loss = (100%/n) Σ_{i=1}^{n} |(t_i − r_i)/t_i|    (4.1)

where n is the number of events, t_i is the true value, and r_i is the value reconstructed (or estimated) by the CNN. This was chosen carefully so that the energy network gives equal importance to reconstructing the low energy events (at a few GeV) and the high energy events (at 100 GeV). Since the O(10 GeV) events are in the region where oscillation is expected, the percentage error ensures that the 10 GeV events are well-reconstructed without over-emphasis on the O(100 GeV) events.

The vertex and zenith FLERCNN networks use a mean squared error loss:

Loss = (1/n) Σ_{i=1}^{n} (t_i − r_i)²    (4.2)

This uses the difference between the true (t) and reconstructed (r) values, taking the square so that larger differences are emphasized more in the loss. Lastly, the classification networks use a binary crossentropy loss function for the output score:

Loss = −(t ln(r) + (1 − t) ln(1 − r))    (4.3)

where t is still the true value, but now either 1 or 0, and r is the classifier score.

CNN      Loss Function                          Activation
Energy   Mean Absolute Percentage Error         Linear
Zenith   Mean Squared Error                     Linear
Vertex   Sum(Mean Squared Error) for X, Y, Z    Linear
Muon     Binary Crossentropy                    Sigmoid
Track    Binary Crossentropy                    Sigmoid

Table 4.1: The loss function and activation used for each network.
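To make the two-branch layout concrete, here is a minimal Keras sketch of the structure described in this section. The layer count, filter numbers, and kernel sizes are illustrative placeholders rather than the exact FLERCNN configuration (the real network has 8 convolutional layers per branch and max pooling at layers 1 and 4):

import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_branch(inputs):
    # Convolve along the z (DOM) axis only: kernels are (1, k), so there is
    # no mixing across the unordered string dimension. Sizes are illustrative.
    x = inputs
    for k in (7, 5, 3):
        x = layers.Conv2D(100, kernel_size=(1, k), padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(0.2)(x)   # randomly masks 20% of nodes per pass
    x = layers.MaxPooling2D(pool_size=(1, 2))(x)
    return layers.Flatten()(x)

# One branch per array: (strings, DOMs per string, summary variables per DOM)
deepcore_in = layers.Input(shape=(8, 60, 5))
icecube_in = layers.Input(shape=(19, 60, 5))

merged = layers.Concatenate()([conv_branch(deepcore_in), conv_branch(icecube_in)])
merged = layers.BatchNormalization()(merged)
merged = layers.Dropout(0.2)(merged)
merged = layers.Dense(300, activation="relu")(merged)
output = layers.Dense(1, activation="linear")(merged)  # e.g. an energy regression head

model = Model(inputs=[deepcore_in, icecube_in], outputs=output)
model.compile(optimizer="adam", loss="mean_absolute_percentage_error")
model.summary()

The separate branches allow the DeepCore and IceCube arrays, with their different DOM spacings, to learn independent filters before the concatenated dense layer combines them.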
4.2 Preparing the Training Samples

The training samples account for the biggest difference between these networks. Each was specially generated MC whose distribution was optimized for the network's purpose. The generation of the optimized MC training samples is outlined in Section 3.1.1. These samples undergo the typical MC processing described in Section 3.4, but additional cuts and optimizations are used to prepare the samples for training.

4.2.1 Data Agreement

There are a few data agreement measures that must be addressed at the level of the charges. The first potential disagreement is that some of the IceCube data seasons only record charge quantized into 0.05 PE (photoelectron) steps. To make all data and MC appear uniform, all pulse charges are quantized (in 0.05 PE steps) before being used to calculate the variables for the CNN's input features.

A study performed by IceCube's Spencer Axani showed that data/MC disagreement is evident in charges recorded below 0.25 PE [14]. A plot from that technical report is included here to show the expected mismatch between data and MC below 0.25 PE (Figure 4.3). The data and MC input into FLERCNN use the "default discriminator" settings. Thus, all pulses below the 0.25 PE threshold are cut from both training and testing samples for FLERCNN, to remove potential data/MC disagreement.

Figure 4.3: The rate of the default discriminator in blue (from the SPE templates used to generate FLERCNN data and MC) compared to data as a function of charge. Agreement is seen above 0.25 PE [14].

4.2.2 Preparing Input Features

In addition to using the cleaned pulse series (Section 3.4.1), all input features undergo a further selection criterion unique to the FLERCNN processing. This selection helps to ensure that the pulses given to the CNN have good data/MC agreement and that the sample has minimal extraneous noise hits.

The first step in this process is that all pulse times are shifted to be relative to the DeepCore trigger, so that the trigger time is at 0 ns. Any events with no DeepCore trigger or more than one DeepCore trigger are then removed, since it is unclear how to define 0 (trigger time) for the event in these cases. After this shift is applied, only the pulses in the [-500, 4000] ns window around the trigger are used to train the CNN. This corresponds to about 8 scattering lengths, and thus it is assumed that most pulses outside this window are noise. If a pulse time does not fall into the time window, that pulse is not used for any of the five summary variables, including the sum of the charge for that DOM.

Some re-scaling is applied to the input variables to help the CNN train faster and to make sure it treats all inputs proportionally, using a uniform range [55]. Thus, the input variables are divided by static, near-maximum numbers, determined early on from histograms of the raw input, so that the input features all fall near the [-1, 1] range. Each of the five summary variables is divided by the factor listed in the third column of Table 4.2.

Input Variable                Shift                    Division Factor   Requirement
Sum Charge                    None                     25                > 0.25 PE; rounded to nearest 0.05 PE step
Time First Pulse              By trigger time          4000              Uses times in [-500, 4000] ns
Time Last Pulse               By trigger time          4000              Uses times in [-500, 4000] ns
Charge Weighted Mean          Uses times shifted       4000              Uses times in [-500, 4000] ns;
                              by trigger time                            pulses > 0.25 PE in 0.05 PE steps
Charge Weighted Standard      Uses times shifted       2000              Uses times in [-500, 4000] ns;
Deviation                     by trigger time                            pulses > 0.25 PE in 0.05 PE steps

Table 4.2: The transformations that the input features undergo during preprocessing.
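A sketch of computing the five per-DOM summary variables from a cleaned, trigger-shifted pulse list, following the window, threshold, and scaling in Table 4.2 (a simplified, hypothetical helper, not the production code):

import numpy as np

def dom_summary(times_ns, charges_pe):
    """Five per-DOM inputs from pulses already shifted so trigger time = 0 ns.
    Applies the time window, charge quantization/threshold, and the division
    factors listed in Table 4.2."""
    t = np.asarray(times_ns, dtype=float)
    q = np.round(np.asarray(charges_pe, dtype=float) / 0.05) * 0.05  # 0.05 PE steps
    keep = (t >= -500.0) & (t <= 4000.0) & (q > 0.25)
    t, q = t[keep], q[keep]
    if t.size == 0:
        return np.zeros(5)
    mean = np.average(t, weights=q)
    std = np.sqrt(np.average((t - mean) ** 2, weights=q))
    return np.array([q.sum() / 25.0,    # summed charge
                     t.min() / 4000.0,  # time of first pulse
                     t.max() / 4000.0,  # time of last pulse
                     mean / 4000.0,     # charge-weighted mean time
                     std / 2000.0])     # charge-weighted standard deviation

# The 5000 ns pulse falls outside the window and is dropped:
print(dom_summary([100.0, 900.0, 5000.0], [0.3, 1.2, 0.8]))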
4.2.3 Sample Cleaning Differences between Training Sets

As these networks were developed over the course of three years, along with developing unique samples optimized for specific training variables, there are some differences in the pre-processing for the FLERCNN networks. Table 4.3 summarizes which networks use the pre-processing methods described below. The cuts in Table 4.3 are only applied to the described training samples. The cuts used for the testing and analysis samples are described separately in Tables 4.7 and 5.1.

CNN      ProbNu > 0.95   # Pulses >= 8   nDOM CNN strings cut   CNN Containment
Energy   No              No              >= 7                   No
Zenith   No              No              No                     Yes
Vertex   Yes             Yes             No                     No
Track    Yes             Yes             No                     No
Muon     Yes             Yes             >= 4                   No

Table 4.3: The additional training sample cuts. A "Yes" (or a threshold value) indicates that the cut in that column was applied to the training sample for the network in that row.

4.2.3.1 Neutrino vs. Noise BDT

To ensure that the events appear as "neutrino-like" as possible, a cut is applied on the BDT trained at Level 4 to distinguish noise events (Section 3.4.3). A tight cut is applied here at ProbNu > 0.95, to make sure the neutrino events used for training look very neutrino-like. Even with this tight cut, typically only a few percent of neutrino events per sample are removed, as earlier cuts up through Level 6 have already removed any suspicious-looking events. It should be noted that none of the CNN training samples include pure noise events, but all do include pulses in the neutrino (or muon) event that originate from noise, as described in Sections 3.4.1, 4.2.1, and 4.2.2. Note that this tighter noise BDT cut is only applied to the training sample, and is not applied during testing or analysis (see Table 4.7 for cuts used during testing and Table 5.1 for cuts applied to the analysis sample).

4.2.4 Number of Pulses and DOMs Hit

In accordance with previous low energy IceCube reconstruction methods, a cut is applied to some CNN training samples to check that there are at least 8 hits in the pulse series. Newer models of the CNN optimized this cut by focusing on the CNN input itself, and thus a cut on the number of DOMs hit in the CNN strings (8 DeepCore and center-most 19 IceCube strings) was introduced for some training samples. Some training samples apply one or both of these cuts, though it is suggested that future training samples for the CNN use just the number of DOMs hit in the CNN strings. Note that the number of pulses cut is only applied to the training sample and is not applied during testing. The number of DOMs hit is applied to all samples: training, testing, and the final analysis sample (see Table 4.7 for cuts used during testing and Table 5.1 for cuts applied to the analysis sample).

4.2.4.1 Containment

Lastly, most training samples relied on early processing (Levels 2-5, described in Section 3.4) to eliminate events with starting vertices outside of the DeepCore region. Training the zenith network, however, is particularly sensitive to seeing examples of full track signatures. Thus, a cut was applied so that all events start and end in the CNN strings, using the true vertex, direction, and track length to determine this. This cut is called "containment" in Table 4.3. Note that the containment cut is only applied to the training sample, and is not applied during testing or analysis (see Table 4.7 for cuts used during testing and Table 5.1 for cuts applied to the analysis sample).

4.2.5 Training Each CNN

Each network has its own training sample, after applying the unique combination of cuts described in Section 4.2.3. Table 4.4 describes the content of each network's training sample, including the particles, ν interactions, number of events, and energy range they cover. To confirm the network does not overfit to the given training sample, 20% of each training sample is set aside to validate the network's progress after each training pass. At the end of each pass, or evaluation of a single training file, the network is evaluated on both the training and validation samples using a specified loss function, which is defined by the user and chosen per network. Note that all FLERCNN variables were trained using multiple files containing random subsets of the training sample, so the loss function was evaluated at the end of each of these training files on a subset of the validation sample. The model is chosen at an epoch, or full pass through all the training files, that shows a near-minimum result in the loss along with no signs of overtraining.
Table 4.5 contains the information on the chosen epoch for each model, where epoch = raw training pass / number of files.

The learning rate, or the step size used when updating the network weights along the gradient during training, is updated throughout the training process. The learning rate is decreased as the loss decreases so that the network can settle at a minimum loss. The equation used to update the learning rate is

$$R_n = 0.001 \cdot D^{\lfloor n/E \rfloor}$$

where $n$ is the number of raw training passes through the files, $E$ is the number of passes between drops (LR Drop), and $D$ is the drop factor (LR Drop Factor). The floor function rounds down to the nearest integer, so that the rate only changes every $E$ passes. Table 4.6 has the settings used for the learning rate progression for each network, which varies $E$ (LR Drop) and $D$ (LR Drop Factor) specific to each network. The table also lists the final learning rate at the chosen model, determined by setting $n$ to the raw training pass at the chosen epoch.

The differences between the CNN training sample sizes, compositions, and learning rate progressions are summarized in Tables 4.4, 4.5, and 4.6, with more depth in the following sections.

CNN      Particle(s)    ν Interactions   # Training Events   Energy Range (GeV)
Energy   ν_μ            CC               9 million           1-500
Zenith   ν_μ            CC               5 million           1-300
Vertex   ν_μ            CC               5 million           1-500
Muon     ν_μ, ν_e, μ    CC, NC           7 million           ν 5-200, μ 150-5000
Track    ν_μ, ν_e       CC, NC           5 million           5-200

Table 4.4: Describes the training samples for each network, with more details expanded in the following sections.

CNN      # Files   Raw Training Pass   Epoch
Energy   18        594                 33
Zenith   10        700                 70
Vertex   8         232                 29
Track    12        192                 16
Muon     12        240                 20

Table 4.5: Describes the number of files used for training and the chosen model, in terms of the raw training pass and the epoch.

CNN      LR Start   LR Drop   LR Drop Factor   LR at chosen model
Energy   0.001      200       0.5              2.5×10⁻⁴
Zenith   0.001      80        0.8              1.7×10⁻⁴
Vertex   0.001      50        0.1              1×10⁻⁷
Track    0.001      50        0.1              1×10⁻⁶
Muon     0.001      -         -                0.001

Table 4.6: Describes the learning rate settings and progression for each network.
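Written out in code, this is a simple step-decay schedule; the short sketch below (plain Python, hypothetical function name) cross-checks the exactly reproducible entries of the "LR at chosen model" column of Table 4.6.

```python
import math

def learning_rate(n, E, D, start=0.001):
    """Step-decay schedule R_n = start * D**floor(n / E)."""
    return start * D ** math.floor(n / E)

# Energy network: E=200, D=0.5, chosen at pass n=594 -> 0.001 * 0.5**2
assert abs(learning_rate(594, 200, 0.5) - 2.5e-4) < 1e-12
# Vertex network: E=50, D=0.1, chosen at pass n=232 -> 0.001 * 0.1**4
assert abs(learning_rate(232, 50, 0.1) - 1e-7) < 1e-12
```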
4.2.5.1 Energy

A sample of more than 9 million unweighted ν_μ CC events is generated to train FLERCNN for energy. The goal is to have a flat, unbiased energy distribution for training, between 1-500 GeV, since the target reconstruction region is 5-100 GeV for FLERCNN to use for the ν_μ disappearance analysis. Most energy bins (1 GeV per bin) have 20,000 events per bin, creating the "flat" sample distribution seen in Figure 4.4a. Events with a low number of hit DOMs are especially difficult for the network to resolve, and thus these events are removed both when training and when evaluating the network (Tables 4.3 and 5.1). The cut requires that the CNN has at least seven DOMs with input information, i.e. that there are seven DOMs hit between the 8 DeepCore strings and the 19 inner-most IceCube strings for the event to be reconstructed with FLERCNN. The total sample size is 9,026,280 events, with 80% of the sample used for training and 20% for validation. To cope with the large file size that comes with storing the necessary pulse series information for training, the sample is split into 18 files and each file is loaded into the CNN in succession during training. The progression of the loss function with respect to each pass through the files is shown in Figure 4.4b. Since the training and validation are split into 18 files, there are 18 evaluations of the loss over the course of each epoch.

The learning rate schedule uses E = 200 and D = 0.5 for the energy network. This combination of settings produces a fairly stable loss decrease over the training progression (Figure 4.4b). Since E = 200 and each training pass scans only one of the 18 files, the learning rate drops every 200 passes, equivalent to 200/18 = 11.11 epochs, or passes through the full data set. The model used was taken after 594 training passes, or 33 epochs. Figure 4.4b shows that the validation loss (in cyan) and the training loss (in blue) do not diverge at epoch 33, so no overtraining is evident. The performance of the energy CNN will be evaluated in Section 4.3.2.

Figure 4.4: Energy training sample and loss progression. (a) Part of the energy training sample, unbiased in energy due to the nearly flat distribution per GeV. (b) Loss vs. epoch progression as the energy network trained.

4.2.5.2 Zenith Angle

The direction that the neutrino is traveling is used to determine the distance (L) that it has traveled after originating from cosmic ray interactions in the atmosphere. The zenith angle, defined relative to the vertical z-axis of the IceCube detector, ranges from neutrinos traveling from the South Pole atmosphere down into the Earth (θ_z = 0°) to neutrinos traveling up through the Earth's core toward the atmosphere above the detector (θ_z = 180°). Determining the exact angle of travel at GeV scales, where only a few DOMs may be hit, is challenging for all reconstructions due to the limited information. Furthermore, track-like events, with their hits distributed across multiple DOMs, are typically easier for reconstructions to resolve. Thus, the training sample for the zenith CNN is a ν_μ CC only sample with about 5 million events.

Optimizing the training sample is particularly tricky for zenith, as the sample cannot be generated with extended bounds in the same way as for the energy network. Even though the training bounds cannot be extended in zenith angle, the energy range is still extended past the target analysis region, to 1-300 GeV. The distribution is flattened in zenith angle (in radians) to train for the angle while minimizing bias (Figure 4.5a). This flat zenith training distribution was generated and tested by Dr. Shiqi Yu. A containment cut is applied to both the starting and ending locations of the muon from the ν_μ CC interaction, requiring it to be within -505 m < z < 500 m and ρ36 < 260 m, where ρ36 is the radius from the center IceCube string (#36). After these cuts, there are a total of 5,024,876 events, with 80% of the sample used for training and 20% for validation. To cope with the large file size that comes with storing the necessary pulse series information for training, the sample was split into 10 files. The learning rate schedule uses E = 80 and D = 0.8 for the zenith network (Figure 4.5b). Since each training pass covers only 1/10 of the data, the drop occurs at the equivalent of 8 epochs. The model used was trained for 700 passes, or 70 epochs. The performance of the zenith CNN will be evaluated in Section 4.3.3.
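Flattening a steeply falling distribution like these amounts to per-bin downsampling; a minimal sketch under that assumption is below. It is illustrative only, since the real samples were flattened during generation and selection rather than by this exact code.

```python
import numpy as np

def flatten_sample(values, bin_edges, n_per_bin, seed=0):
    """Return indices of a subsample with at most n_per_bin events per bin."""
    rng = np.random.default_rng(seed)
    keep = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        idx = np.where((values >= lo) & (values < hi))[0]
        if idx.size:
            keep.append(rng.choice(idx, size=min(idx.size, n_per_bin), replace=False))
    return np.concatenate(keep) if keep else np.array([], dtype=int)

# Toy example: flatten a falling energy spectrum into 1 GeV bins over 1-500 GeV.
energies = np.random.default_rng(1).exponential(50.0, size=1_000_000) + 1.0
selected = flatten_sample(energies, np.arange(1, 501), n_per_bin=2000)
```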
Figure 4.5: Zenith training sample and loss progression. (a) Zenith training sample, unbiased in zenith angle due to the nearly flat distribution per radian. (b) Loss vs. epoch progression as the zenith network trained.

4.2.5.3 Vertex

The vertex CNN was trained by undergraduate researcher Julia Willison using an early version of a training sample initially created to train the energy CNN. As with the final energy training sample, it includes only ν_μ CC events and is flat in energy. Instead of the cut on the number of DOMs in CNN strings, it uses the number of pulses cut (requiring at least 8 hits). This training sample was not specifically optimized for vertex reconstruction, but it was found to give passable resolution, and thus no further optimizations were applied (Figures 4.6a, 4.6b, 4.7a). The total sample size is 4,982,335 events, with 80% of the sample used for training and 20% for validation. To cope with the large file size that comes with storing the necessary pulse series information for training, the sample was split into 8 files. The learning rate schedule uses E = 50 and D = 0.1 for the vertex network (Figure 4.7b). Since each training pass covers only 1/8 of the data, the drop occurs at the equivalent of 6.25 epochs. The model used was trained for 232 passes, or 29 epochs. The performance of the vertex CNN will be evaluated in Section 4.3.4.

Figure 4.6: (a) x and (b) y distributions for part of the vertex CNN training sample, optimized to be unbiased in energy but repurposed to train the vertex network.

Figure 4.7: (a) z distribution for part of the vertex CNN training sample, optimized to be unbiased in energy but repurposed to train the vertex network. (b) Loss vs. epoch progression as the vertex network trained, where the loss is the sum of the mean squared error for x, y, and z.

4.2.5.4 Track Classifier

The track classifier aims to distinguish the track topology from the cascade topology (Section 2.2.2). It is sometimes referred to as the PID (particle identification) classifier, since distinguishing the topology leads to the separation of the ν_μ CC events, thus identifying one of the particles. The training sample was created from the flat-energy generated training sample, cut to have a 50:50 distribution of track:cascade-like events. It uses ν_μ CC events for track-like signatures and ν_μ NC and ν_e CC & NC events for cascade-like signatures. This balance was enforced per energy bin, so that across the energy range (5-200 GeV) there are ∼15,000 track events and ∼15,000 cascade events per 1 GeV bin. In total, the sample contained 5,263,514 events, with 80% used for training and 20% for validation. To cope with the large file size that comes with storing the necessary pulse series information for training, the sample was split into 12 files. The learning rate schedule uses E = 50 and D = 0.1 for the track classifier. Since each training pass covers only 1/12 of the data, the drop occurs at the equivalent of 4.17 epochs. The model used was trained for 192 passes, or 16 epochs. The performance of the track classifier will be evaluated in Section 4.3.5.
Figure 4.8: PID training sample and loss progression. (a) Energy distribution for the track classifier training sample, illustrating that the sample was balanced in track/cascade composition as well as in energy. (b) Loss vs. epoch progression as the track classification network trained, using the binary cross-entropy loss function.

4.2.5.5 Muon Background Classifier

The muon classifier aims to distinguish background atmospheric muons, which start outside of the detector, from the neutrino signatures that interact inside of the detector. The neutrino portion of the track classifier training sample was re-purposed to train the muon classifier, with the addition of the existing oscNext Level 4 muon MC. The neutrino sample was cut down in size to match the muon sample, giving a 20:40:40 ratio between ν_e:ν_μ:μ. The ν_e and ν_μ events include both CC and NC interactions, with true neutrino energies up to 200 GeV and all zenith angles (Figure 4.9). The Level 4 muon simulation sample is used rather than the Level 6 muon sample due to the lack of statistics at Level 6, which can be seen from the muon rate in Figure 5.5. The muon energy in Figure 4.9 is the energy at generation, rather than the neutrino energy, which is shown at the interaction point; this is why it is always much greater than the neutrino energies. This training sample also has a cut applied requiring that the number of DOMs hit in the CNN strings is ≥ 4. The total sample size is 6,923,001 events, with about 2.8 million atmospheric μ, 2.8 million ν_μ events, and 1.8 million ν_e events. Again, 80% of the sample was used for training and 20% for validation. To cope with the large file size that comes with storing the necessary pulse series information for training, the sample was split into 12 files.

To train, the learning rate started at 0.001 and was not changed for the duration of training. The model used was trained for 240 training passes, or 20 epochs, since there are 12 files total in the training sample. The performance of the background muon classifier will be evaluated in Section 4.3.6.

Figure 4.9: True cosine zenith and energy distribution used to train the muon classifier. The true energy for the muon simulation is the energy of the muon at generation, which is why it differs from the energy of the neutrinos, shown at the interaction point. The ν_μ and ν_e histograms are stacked, in comparison with the muon histogram (not stacked).

Figure 4.10: Loss vs. epoch progression as the muon classification network trained, using the binary cross-entropy loss function.

4.3 Performance

Testing samples are independently generated neutrino MC: the analysis sample described in Section 3.1.2. These distributions are weighted with atmospheric flux and oscillation weights, making them look as close as possible to the expected data distribution. The analysis cuts are therefore applied to this sample, and they are independent of the training sample cuts. The motivation for these cuts is described in Chapter 5, but a table summarizing the cuts is included here (Table 4.7). Similar cuts are made on the likelihood-reconstructed sample, using its own reconstructed variables, along with other cuts optimized for a ν_μ disappearance analysis using that method.
Cut Name                             CNN Cut                            High Stats Cut
Reconstructed radius                 ρ_CNN < 200 m                      ρ_Retro < 300 m
Reconstructed z                      -495 m < z_CNN < -225 m            -500 m < z_Retro < -200 m
Reconstructed energy                 5 GeV <= E_CNN <= 100 GeV          5 GeV <= E_Retro <= 300 GeV
Reconstructed cosine zenith          cos θ_CNN <= 0.3                   cos θ_Retro <= 0.3
PID using reconstructed input        PID_CNN boundary: [0.25, 0.55]     PID_Retro BDT boundary: [0.5, 0.85]
Muon BDT using reconstructed input   BDT_CNN >= 0.8                     BDT_Retro >= 0.8
# DeepCore Triggers                  only 1                             N/A
# CNN DOMs                           ≥ 7                                N/A
# Pulses                             N/A                                ≥ 8
Reconstructed time                   N/A                                t_Retro < 14500 ns
Passed Retro                         N/A                                True
# Retro Iterations                   N/A                                > 10
DOMs with direct pulses              > 2.5                              > 2.5
# hits in top 15 DOMs                < 0.5                              < 0.5
# hits in outer strings              < 7.5                              < 7.5

Table 4.7: FLERCNN and Retro reconstruction sample cut comparison after Level 5. All analysis cuts that apply to the FLERCNN sample are described in more detail in Sections 5.1 and 5.2.

The differences between the training samples and the testing sample are helpful for evaluating the adaptability of the CNN and for making sure it is not overtrained, or over-optimized, to perform well only on distributions that look like the training sample. Figure 4.11 shows the distribution of energy and zenith for the testing sample, with the CNN analysis cuts applied (described in Table 5.1). The shape of the distribution is similar for the likelihood-based sample's testing distribution, with its respective cuts applied.

Figure 4.11: Testing distributions for the ν_μ CC and ν_e CC events. The energy and zenith cuts are not applied in this case, since they will be applied based on the individual reconstructions' outputs during testing. All other cuts applied are the CNN Cuts described in Table 5.1.

4.3.1 Reconstruction Runtime

One of the main motivating factors for pursuing the CNN reconstruction method is its evaluation speed. Table 4.8 shows the times for running all 5 FLERCNN reconstructions and classifications (currently done in serial) vs. the likelihood-based reconstruction (which returns 8 variables, not including the PID and muon classification scores).

Reconstruction Method     Avg Time per Event   Events per day (single core)   Time for 10⁸ events (1000 CPU cores or GPU nodes)
5 CNNs on GPU             0.0011 s             80,000,000                     2 min
5 CNNs on CPU             0.29 s               300,000                        8 hours
Likelihood-based method   40 s [10]            2,000                          46 days

Table 4.8: Runtime and implications of applying the different reconstructions to the MC sample.

The GPU timing for the CNN is determined using an NVIDIA V100 GPU node. The CPU timing for the CNN is benchmarked using a single core of a 2.40 GHz Intel Xeon E5-2680v4, though TensorFlow does have the capability to use multithreaded operations if multiple CPU cores are available. The likelihood-based method was also benchmarked using a single core [10] and only has CPU capability. There is a difference in the number of output variables for the two methods. FLERCNN uses five networks run in succession to produce seven output values: neutrino energy, zenith direction, (x, y, z) starting vertex, PID classification score, and muon classification score. In contrast, the likelihood-based method is evaluated once per event to output eight variables: track energy, cascade energy, (x, y, z, t) starting vertex, zenith direction, and azimuth direction. The benchmark numbers are thus the total times for the CNN to output all seven of its variables and for the likelihood-based method to output all eight of its variables. Note that the final column of the table indicates the differences with some generous assumptions.
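The throughput columns follow from simple arithmetic on the per-event times; a small script to reproduce them (per-event times taken from Table 4.8):

```python
# Reproduce the throughput columns of Table 4.8 from the per-event times.
for method, t_per_event in [("5 CNNs on GPU", 0.0011),
                            ("5 CNNs on CPU", 0.29),
                            ("Likelihood-based", 40.0)]:
    per_day = 86400 / t_per_event                  # events/day on one core or node
    hours_1e8 = 1e8 * t_per_event / 1000 / 3600    # 10^8 events on 1000 cores/nodes
    print(f"{method}: {per_day:,.0f} events/day, {hours_1e8:.2f} h for 1e8 events")
```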
While IceCube has access to multiple computing clusters, completely reserving 1000 cores for the low energy reconstruction alone is not typically feasible, especially if they are reserved for days. There are also more CPU cores available than GPUs, so the number of reasonably expected cores would be smaller for the GPUs, though they would only be required for a shorter duration due to the faster runtime. These numbers also do not factor in the read and write time for the events, which are stored as a special IceCube-specific file type in O(10³) event batches; this increases the runtime for both the CNN and the likelihood-based reconstruction in practice. FLERCNN has successfully reconstructed the full MC and data samples within a week. The likelihood-based reconstruction takes months to reconstruct the full MC and data samples.

4.3.2 Energy

Figure 4.12 shows the fractional energy resolution as a function of energy. The resolution is evaluated separately for four samples. The first two are ν_μ CC and ν_e CC events, which have their deposited energy in the detector approximately equal to the true energy of the incoming neutrino. The ν_τ CC and all-flavor NC events, in contrast, have secondary neutrinos from their interactions, and thus part of their energy is not deposited in the detector (carried away by the outgoing neutrino). For the latter case, the resolutions are plotted against the deposited energy, which is a more accurate representation of what the reconstructions can hope to estimate. This remains a reasonable representation because the analysis is performed in the reconstructed energy space, so the deposited energy is a good reflection of how these flavors will behave in the analysis.

Comparing to the likelihood-based reconstruction, the CNN performs well at low energies for ν_μ events, where the CNN's median is near the zero line (indicating good agreement) while the likelihood-based reconstruction overestimates the energies in this region. The CNN was specifically trained and optimized on ν_μ CC events, with the intention of using it for the ν_μ disappearance analysis; this is why the CNN's behavior is best for that flavor. The CNN still has comparable resolution for the other flavors relative to the likelihood-based method. The spread for the CNN continues to be smaller than that of the likelihood-based reconstruction at low energies. There is some offset in the median, but since the same reconstruction is applied for all flavors and our detection methods are unable to distinguish beyond the two PID topologies (track vs. cascade), adjustments cannot be made for each flavor. Overall, the CNN energy reconstruction behaves similarly to the likelihood-based reconstruction, including its trends between flavors, and the resolution is comparable. Note that for all of these energy resolution plots, the energy cut is not applied, but all other final level analysis cuts are (described in Chapter 5); applying the energy cut would cause an artificial bias at the edges of the true energy range.

4.3.3 Zenith

Figure 4.13 shows the absolute resolution for the CNN and likelihood-based reconstructions as a function of energy for track events (left plot) and cascade events (right plot). The high energy tracks are the best resolved events for both reconstructions due to the long lever arm that the reconstructions have to work with.
The slight difference in the 68% region for the CNN at high energies is likely due to the track leaving the CNN string region (19 IceCube strings + 8 DeepCore strings). Future improvements to the resolution could consider adding additional strings, or training a network to apply an ending-position cut on the track to make sure it is contained in the CNN input region; however, this would only offer a slight improvement in a region that is already better resolved than (for example) the cascade zenith angle resolution at any energy. Cascades prove more challenging for both reconstructions, which have similar resolutions across all energies. Even though the CNN is trained on ν_μ CC events only, it manages to reconstruct reasonable zenith values on average for the sample, with the median resolution near the zero line and the 68% spread comparable to the likelihood-based method. Note that the artifact in both reconstruction methods at low energy (seen in both the track and cascade plots) is likely due to the events being weighted with the expected flux and oscillation weights; the oscillation weights are the likely cause of this artifact.

Figure 4.12: Resolution shown vs. true energy for the ν_μ CC and ν_e CC events. With the invisible energy carried off by the outgoing neutrinos from NC interactions and some ν_τ CC interactions, the resolution is shown vs. deposited energy for those selections.

Figure 4.13: Resolution shown for reconstructed cosine zenith for the ν_μ CC tracks and the remaining cascade events.

4.3.4 Vertex

The CNN resolution is fairly constant across all energies for both the track and cascade radial reconstructions (Figure 4.14). The median is comparable with the likelihood-based reconstruction, but unlike the likelihood-based reconstruction, the 68% spread does not improve with increased energy. This could be due to the fact that the CNN was trained using only the center-most strings, so that as events go up in energy, some of their hits can fall outside the region the CNN uses for input. Note that the difference between the two 68% ranges is at most 10 m. This is much smaller than the distance between strings, and since the intention of this reconstruction is to apply a containment cut making sure the events start in the DeepCore region, improving the resolution further is superfluous.

The results for the z reconstruction in Figure 4.15 are similar to the radial reconstruction in that the CNN has a larger 68% spread, on the order of a 10 m maximum difference. The median reconstructed z per energy bin is very comparable to the likelihood-based reconstruction. Overall, the spread for the z position is smaller than for the radial reconstruction, evident for both reconstruction methods, which is not surprising since the DOM z-spacing is much finer than the xy-spacing (Section 2.1.1). The performance of using this reconstruction as a classifier to separate events inside the DeepCore volume from those outside the contained volume is shown in Figure 4.16. Note that it keeps most of the events inside the desired detector region (95.65%) and successfully eliminates 52.03% of the events outside the region.

For the vertex reconstruction, there is a known artifact in the CNN reconstruction where the string positions are favored in the x and y reconstructions (contributing to the radial reconstruction) and the DOM positions are favored in z.
Since this reconstruction is used for a containment cut, a precise vertex position is not necessary to make a reasonable cut. The favoring of string positions is evident in Figure 4.14.

Figure 4.14: Resolution shown for the reconstructed radius of the starting vertex from the center string (string 36) for the ν_μ CC tracks and the remaining cascade events.

Figure 4.15: Resolution shown for the reconstructed z position of the starting vertex for the ν_μ CC tracks and the remaining cascade events.

Figure 4.16: Confusion matrix for the vertex reconstruction, exploring the efficiency of cutting at 200 m, as proposed for the oscillation analysis.

Future work optimizing the vertex network could be to generate a training sample that is uniform in the (x, y, z) starting vertex position, or at least a sample that has more events outside of the desired detector region. CNNs are known for their ability to interpolate, not extrapolate [56], so it is expected that the CNN performs better on classifying the event types it has seen (inside of the desired cut volume) rather than events that were more rare in the training sample (outside of the desired cut volume). As is, this cut still has the necessary effect of improving the sample resolution by ensuring the sample predominantly contains events starting inside of DeepCore; it successfully removes 52.0% of events outside of DeepCore while keeping 95.7% of the events inside of DeepCore.

4.3.5 Track Classifier

Different metrics are used to assess the performance of binary classifiers. One way is to plot the classifier's score (0-1) for events from each of the two true classes. The left plot in Figure 4.17 shows this representation, with the true track events (ν_μ CC) in red and the true cascades (all other neutrino flavors) in blue. Ideally, there would be a peak in the CNN track score at 1.0 for the true tracks and another peak at 0 for the true cascades.

Figure 4.17: Performance of the track classifier, with the binary classification histogram on the left and the ROC curve on the right.

The FLERCNN result in Figure 4.17 shows that the true track distribution has a peak near 1.0, corresponding to the higher energy tracks that the CNN is more confident in classifying (note the logarithmic y-axis). There is not a large true cascade bump near a score of zero, which would be ideal, though there are more true cascade events than true track events classified in the region < 0.4.

Another metric for binary classification is the Receiver Operating Characteristic (ROC) curve, which compares the "true positive rate" (true track, classified as track) to the "false positive rate" (true cascade, classified as track). A completely random classifier would get the binary classification result correct 50% of the time, which is represented by the diagonal 1:1 line on the ROC plot, where the true positive and false positive rates are equal. To reduce this plot to a single number, the Area Under the Curve (AUC) can be calculated. The closer the AUC is to 1.0, i.e. the further away from a perfectly random result (at 0.5), the better the classification power.
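As an illustration, a minimal sketch of computing the ROC curve and AUC from classifier scores, assuming scikit-learn; the toy labels and scores stand in for the true track/cascade labels and the CNN track scores of the testing sample.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=10_000)            # 1 = true track, 0 = true cascade
scores = np.clip(0.5 * labels + rng.random(10_000) * 0.7, 0.0, 1.0)

fpr, tpr, thresholds = roc_curve(labels, scores)    # one point per cut threshold
auc = roc_auc_score(labels, scores)                 # 0.5 = random, 1.0 = perfect
print(f"AUC = {auc:.3f}")
```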
The right plot in Figure 4.17 compares the ROC curves for the FLERCNN classifier and the BDT trained on the likelihood-based reconstruction variables used in the traditional IceCube low energy analysis. The AUCs for both are comparable, greater than 0.5 but less than 1.0. This is expected due to the classifiers' difficulty distinguishing low energy track events. The GeV-scale events often have only a handful of hit DOMs, making it difficult to distinguish the track-like structure from the cascade due to the short track lengths.

4.3.6 Atmospheric Muon Classifier

To assess the atmospheric muon classifier, the same histogram and ROC methods are used as above. In the classification histogram (left plot in Figure 4.18), the true neutrino events (all flavors) have a peak at the 1.0 classification score, indicating the CNN's capability to distinguish neutrinos in the sample. The true muon sample, however, appears mostly flat. Recall that the muon classifier is trained on Level 4 muon events (Section 4.2.5.5); the true muon histogram has a peak near 0 when tested on Level 4 muons, but this is not the target sample. Even so, the ROC (right plot in Figure 4.18) does show the CNN muon classifier's ability to distinguish the neutrinos. The CNN's AUC is slightly improved compared to the BDT trained on the likelihood-based reconstruction variables. A harsh cut could be used to remove most of the true muons and keep the true neutrinos, since the neutrinos are extremely peaked at CNN scores near 1.0. Instead, the output of this CNN is fed into a BDT that is used as a final analysis cut to remove atmospheric muons, which will be discussed further in Section 5.2.1.2.

Figure 4.18: Performance of the muon classifier, with the binary classification histogram on the left and the ROC curve on the right.

4.4 Stability Against Detector Systematics

The neutrino simulation sets used for the tests in the previous sections are generated with the nominal analysis parameters for the physics and systematic expectations (Table 6.2). There also exist neutrino simulation sets that have their detector systematic parameters purposely set to off-nominal values (generally 5-10% past expectations). Some of these sets are used here to test the response of the five FLERCNN networks to the changes in the detector settings. Note that changes are expected, since changes in these parameters lead to more or fewer photons being detected by the DOMs. Thus, this check is to confirm that the overall trend of these changes aligns with expectations and does not significantly deviate from the baseline results.

The variables explored in this section are the energy, zenith, and PID, which contribute directly to the analysis measurement (Chapter 6). Figures 4.19, 4.20, and 4.21 focus only on ν_μ CC track-like events. The test was also performed on ν_e CC cascades and found to have similar results. The top two plots in each figure show the FLERCNN energy reconstruction as a function of true neutrino energy. The bottom two plots in each figure show the FLERCNN zenith angle reconstruction as a function of true cosine zenith. The bias plots look at the overall shift of the median of the resolution distribution (fractional resolution for energy and absolute resolution for zenith). The 68% spread plots give the width of the central 68% of the resolution. In essence, this reduces the plots in Figures 4.12 and 4.13 to two numbers per bin: the position of the CNN median line and the range of the CNN 68% shaded region. The PID will be discussed as a fractional change between the rates for the track, mixed, and cascade classifications (Section 5.4.1): the rate of the systematic set over the rate of the baseline set.
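A minimal sketch of reducing one bin's resolution distribution to these two numbers (median bias and central 68% width), assuming NumPy arrays of reconstructed and true values:

```python
import numpy as np

def bias_and_spread(reco, true, fractional=True):
    """Return (median bias, 68% spread) of the resolution for one bin."""
    res = (reco - true) / true if fractional else reco - true
    p16, p50, p84 = np.percentile(res, [16, 50, 84])  # central 68% = 16th-84th
    return p50, p84 - p16

reco = np.random.default_rng(0).normal(loc=21.0, scale=3.0, size=5000)
true = np.full(5000, 20.0)
print(bias_and_spread(reco, true))   # ~0.05 fractional bias, ~0.3 spread
```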
4.4.1 Reconstructing Different DOM Efficiencies

Figure 4.19: Bias and spread for the FLERCNN performance on various detector systematic sets for DOM efficiency. The FLERCNN reconstruction is shown in solid lines and the likelihood-based reconstruction (Retro) in dotted lines.

The energy reconstruction scales with increased DOM efficiency; this is as expected, since increasing the DOM efficiency increases the number of photons that the PMTs detect. The shift in the bias also mirrors the change in the DOM efficiency, with the energy reconstruction bias difference between the baseline and the +10% DOM efficiency set changing by a factor of 0.10. The changed DOM efficiency should not have an effect on the angular reconstruction, which is found to be true in Figure 4.19, as the FLERCNN zenith reconstruction shows a very consistent bias and 68% spread across these sets.

The comparison to the likelihood-based reconstruction (Retro, shown in the dotted lines) shows behavior similar to the CNN. The energy effects are of similar magnitude, with a slightly larger impact for the CNN in bias and spread. The likelihood-based reconstruction also follows the trend that the spread continues to decrease as the energy increases, whereas the CNN spread levels off. These differences are small, but could be due to the fact that the CNN only uses the centermost IceCube strings, so that as events reach higher energies, there is a larger chance of underestimating the energy due to hits falling outside of the CNN input region.

The zenith bias and spread show no effect for either reconstruction. Note that the likelihood-based reconstruction bias is near 0 at horizontal angles (cos θ_z = 0), better than the CNN, which has a more linear trend. In contrast, the CNN shows a smaller spread than the likelihood-based reconstruction across all angles. Overall, the differences between these two reconstructions are small and the trends show similar behavior.

The PID rates do change with the DOM efficiency, with the cascade bin most sensitive (changes in the rate of 1.00±0.15 corresponding to the ±10% sets). This is due to the lowest energy events being classified in the cascade bin (Section 5.4.1), so that increasing the DOM efficiency means that fewer events are cut from the sample (creating an increase in rate) and decreasing the DOM efficiency means that more events are cut from the sample, as compared to the baseline set. Most other changes to the PID are within 1.00±0.05.

4.4.2 Reconstructing Different Hole Ice Parameterizations

The different hole ice sets vary the p0 and p1 hole ice parameterization of the DOM's angular acceptance (Section 2.3.2). The numbers in the legend correspond to changes in the parameterization, and their magnitude of change from the baseline settings, rather than the exact values, is the most telling difference. In this case, the energy FLERCNN results are less affected by changes in the hole ice, as seen in both the bias and 68% spread plots. The angular FLERCNN results do have a slight bias with the different hole ice sets, which is expected because the hole ice properties adjust the angular acceptance of the DOM. Thus, the CNN has likely learned the baseline angular acceptance of the DOM, and changing the acceptance can alter the angular reconstruction.

Figure 4.20: Bias and spread for the FLERCNN performance on various detector systematic sets for hole ice. The FLERCNN reconstruction is shown in solid lines and the likelihood-based reconstruction (Retro) in dotted lines.
Note that the gray line has the most severe hole ice changes compared to the baseline, and the zenith CNN responds accordingly. The comparison to the likelihood-based reconstruction (Retro, shown in the dotted lines) shows behavior similar to the CNN. The energy bias and spread show little change across the various hole ice sets. The zenith effects are of similar magnitude, with a slightly larger impact for the CNN in bias and for the likelihood-based reconstruction in spread. Overall, the direction of impact from the sets and the trends are very similar between the two reconstructions.

The PID rates across all hole ice sets are very stable, changing by less than 1.00±0.05, except for a slightly larger effect on the severely pulled hole ice sample (the set shown in gray in Figure 4.20). The change to the mixed and cascade bin rates is near 0.90, which could be due to the bias seen in the zenith CNN reconstruction causing more events in this sample to fail the zenith cut (Table 5.1).

4.4.3 Reconstructing Different Bulk Ice Properties

Figure 4.21: Bias and spread for the FLERCNN performance on various detector systematic sets for bulk ice. The FLERCNN reconstruction is shown in solid lines and the likelihood-based reconstruction (Retro) in dotted lines.

The scattering and absorption systematic sets change the scattering and absorption coefficients uniformly for the entire bulk ice (recall that these coefficients vary with depth and are modeled by the ice model used during the simulation's photon propagation; see Section 2.3.1). A scattering value > 1 indicates more photon scattering in the ice, with an influence that depends on the orientation between the PMT face and the photon: if the photon is head-on to the DOM, more scattering would likely divert the photon away from being detected, while for a photon approaching from the back of the PMT, scattering can improve the probability of detection by redirecting it into a position with better angular acceptance. An absorption value > 1 means a higher chance that the photon is absorbed in the ice, and thus is not detected by the PMT.

As with the DOM efficiency, the energy CNN's bias scales toward higher predicted energies when the systematic set allows more photons to reach the DOMs, with the inverse also being true. Thus, the trend of the change is reasonable, and its magnitude does not exceed that of the changed DOM efficiency. The zenith CNN reconstruction does not show much sensitivity to the changes in the bulk ice, with only a slight change that scales with the scattering differences. With a change in scattering, the angular reconstruction becomes difficult, since the path of the photon has been altered (whether more or less than the nominal that the CNN expects). These changes are minimal, only changing the bias by ∼0.1, but the trend is expected.

The comparison to the likelihood-based reconstruction (Retro, shown in the dotted lines) shows behavior similar to the CNN. The energy effects are of similar magnitude, with a slightly larger impact for the CNN in bias and spread. The zenith bias and spread show little change, with much the same behavior as seen before. Overall, the CNN seems to have a slightly larger sensitivity in bias across the various systematic sets, but the overall magnitude is very close to the changes seen in the likelihood-based reconstruction.
The trends are also consistent with the changes in the systematic sets themselves, so the reconstructions behave as expected when these detector settings are altered. The PID rates across all sets are very stable, within 1.00±0.05. This is reasonable since the bulk ice systematic sets produce less of a bias shift in energy than the DOM efficiency sets, so the change in energy is not enough to cause a rate change similar to that seen for the ±10% DOM efficiency sets.

4.4.4 Conclusion of Robustness to Systematics

Overall, the energy and zenith CNNs perform well on the various systematic sets with altered detector settings. The changes in the bias scale with the expected behavior given the changes in the number of expected photons. The notable variation in bias indicates that FLERCNN is sensitive to the detector systematics, and therefore the analysis will be able to account for these when fitting the detector systematic parameters. The sensitivity is well behaved, such that the trends match expectations and the shifts are not extreme, so the reconstruction should respond well during the fit. The 68% spread is fairly consistent across all systematic sets, without concerning trends. The PID rates are very stable between the systematic sets and the baseline sets, with minor fluctuations likely due to events being added to or removed from the sample by the cuts. It is important to note that the cuts applied to the analysis sample were optimized with the nominal systematic set in mind, so this behavior is not surprising and only occurs with systematic sets at the boundary of, or even more severe than, the expected shifts in these parameters.

CHAPTER 5

FLERCNN ANALYSIS SAMPLE SELECTION

After reconstruction, a series of cuts must be applied to remove the remaining backgrounds, such as atmospheric muon and noise events. The data distribution is then separated into the analysis binning, by energy, zenith angle, and PID.

5.1 Resolution Motivated Cuts

The first set of cuts is motivated by improving the angular and energy resolution of the remaining events. These include ensuring that the events start in the DeepCore array, have a minimum number of DOMs hit for the reconstruction to perform well, and fall into the energy range where the FLERCNN energy reconstruction performs best.

5.1.1 Starting Vertex

Low energy events benefit from the improved detector density that DeepCore provides. Thus, we want to make sure events are in the DeepCore array, so that they can be reconstructed with the best possible resolution. Cuts are applied using the FLERCNN reconstructed radius relative to the origin at the center string 36 (ρ36) and the z position of the starting vertex. For the radius, a cut is applied so that all events must have ρ36 < 200 m. For the z position, we use the depth of the DeepCore strings, so that events with a z vertex position of -495 m < z < -225 m are kept.

5.1.2 Number of Hit DOMs

During the FLERCNN studies, we found that cutting on nDOM gives better energy resolution in the low energy region. Thus, a cut is applied requiring that at least 7 DOMs are hit in the 19 IceCube strings and 8 DeepCore strings that the CNN uses to reconstruct the events for training/testing. This reduces the total neutrino sample by only 1.05%, after all other analysis cuts are applied.
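The starting vertex and nDOM requirements amount to boolean masks on the reconstructed quantities; a minimal sketch is below, with hypothetical array names standing in for the FLERCNN outputs.

```python
import numpy as np

def resolution_cuts(rho36_m, z_m, n_cnn_doms):
    """Boolean mask for the starting-vertex containment and nDOM cuts."""
    in_deepcore = (rho36_m < 200.0) & (z_m > -495.0) & (z_m < -225.0)
    return in_deepcore & (n_cnn_doms >= 7)

rng = np.random.default_rng(0)
keep = resolution_cuts(rng.uniform(0, 300, 1000),       # reconstructed rho36 [m]
                       rng.uniform(-550, -150, 1000),   # reconstructed z [m]
                       rng.integers(4, 20, 1000))       # DOMs hit in CNN strings
```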
5.1.3 Energy Range

The ν_μ disappearance oscillation bands are expected in the region < 100 GeV (Figure 1.7). The FLERCNN energy network was therefore optimized to reconstruct this region, and thus a cut is applied on the reconstructed energy so that only events with E_ν,reco < 100 GeV are included. Due to the very small number of pulses recorded by DeepCore below 5 GeV, training the network much lower than this proved difficult. To avoid poor resolution for the lowest energy events, a cut is made to ensure the reconstructed energy of the event is E_ν,reco > 5 GeV.

5.2 Data/MC Agreement Motivated Cuts

The second category of cuts is motivated by data/MC agreement, ensuring the shape agreement between the 1D data distribution and the MC prediction for each variable. Only the shape matters, because neutrino and muon normalization terms are included in the systematic parameters (Section 6.4.1.3), so their final values will be determined during the analysis. These cuts target variables that limit the noise events, the atmospheric muon background events that sneak into corridors between the IceCube strings, and other events that are not well simulated or accounted for in the MC.

5.2.1 Identifying Atmospheric Muons

After Level 6 (Section 3.4.5), the muon rates are comparable with the neutrinos. Most of the remaining muons are ones that have not been identified as clearly "muon-like" by previous cuts. One category of these is coincident muon events, which occur when a muon and a neutrino interact in the detector within the same triggered time window. These are not simulated due to technical limits in the MC, though they do occur in real data, and thus must be removed through cuts to ensure data/MC agreement. A few cuts are applied at the analysis level to remove any remaining muons, likely those with signatures like coincident events.

5.2.1.1 Straight Veto Cut for Muons

There are two counting variables used to apply veto cuts to remove muons at the final level. The first variable is the number of hits in the top 15 layers of IceCube DOMs, with the cut at < 0.5. The second variable is the number of hits in the outermost IceCube strings, cut at < 7.5 hits. These cuts remove only a small portion of the final sample (< 0.05%). Figures 5.1 and 5.2 show the distributions of these variables with all analysis cuts applied except for the one in question.

Figure 5.1: The data/MC disagreement for the number of hits in the top 15 layers of IceCube DOMs. The left plot shows before the cut (keeping events with < 0.5 hits) and the right plot shows the adjusted agreement after the failing events are removed.

Figure 5.2: The data/MC disagreement for the number of hits in the outermost IceCube strings. The left plot shows before the cut (keeping events with < 7.5 hits) and the right plot shows the adjusted agreement after the failing events are removed.

5.2.1.2 Muon BDT

A more sophisticated cut is also employed to remove the remaining atmospheric muons at the analysis level using machine learning. Rather than using the FLERCNN muon classifier directly, a BDT was trained using the FLERCNN muon classifier score as one of five training inputs. This BDT was optimized by Dr. Shiqi Yu to improve the performance of the muon identification. The training and performance of the muon BDT are described briefly below, as it relies heavily on FLERCNN variables. The Muon BDT uses five variables as inputs:
• FLERCNN vertex z - the z (depth) position of the interaction vertex, reconstructed by the CNN
• FLERCNN vertex ρ36 - the radial position of the interaction vertex, reconstructed by the CNN
• FLERCNN Muon Classification - the score of how muon-like the event is, according to the CNN classifier
• Probability Neutrino from Data BDT - the probability that the event is a neutrino, according to a BDT trained at Level 4 on real data to distinguish muons (Section 3.4.3)
• Corridor Cut's z Minimum - the lowest z position recorded that matches the Corridor Cut algorithm's hypothesis (Section 3.4.4)

The reconstructed vertex z and ρ36 are used because atmospheric muons are generated outside of the detector, whereas the neutrinos we want to detect have muon signatures starting inside of the detector. The muon from a ν_μ CC event should have its starting position at the neutrino interaction vertex, so providing the reconstructed vertex to the BDT should help it distinguish background muons coming from outside of the detector from the desired signature of a track starting inside of the detector.

The training sample for the BDT uses the MC analysis sample at Level 6 with the FLERCNN reconstruction applied (Section 3.1.2). There is only one additional cut applied, requiring cos(θ) < 0.3. This is applied to both the training and testing samples, where 50% of the neutrino analysis sample is used for training (50% for testing) and 70% of the muon analysis sample is used for training (30% for testing), excluding the 10% of the muon sample used for training the FLERCNN muon classifier. This prevents any bias from being introduced into the training or testing of the BDT by the CNN already being optimized on this sub-sample of events.

The left plot in Figure 5.3 shows clear distinguishing power between the true atmospheric muons, identified by a peak near a neutrino-like score of 0, and the true neutrinos, strongly peaked near a neutrino-like score of 1.0. Comparing directly with the FLERCNN muon classifier's capability alone, the Muon BDT has an AUC of 0.961 from the ROC plot (not shown here), improved from the FLERCNN muon classifier's AUC of 0.947. The right plot in Figure 5.3 illustrates the rates remaining for the neutrino and muon samples at various proposed cut thresholds using the muon BDT. The cut used for this analysis is at a 0.8 neutrino-like score, which strongly reduces the muon events, down to a rate of O(10⁻²) mHz. This cut does remove 23% of the neutrinos when added to the final sample, but the background muon reduction is deemed worth this loss. In total, the neutrino rate remains at acceptable levels of O(0.1) mHz, and the muons are reduced to a fraction of a percent of the sample (Section 5.3).

Figure 5.3: Performance of the Muon BDT, with the BDT-predicted neutrino probability on the left and the rate at various proposed threshold cuts on the right (plots created by Dr. Shiqi Yu).

Both the training and testing samples show similar behavior in both plots in Figure 5.3, which indicates the BDT is not overtrained and the training sample can be used in the final analysis. Thus, 100% of the muon analysis sample is used in the final analysis.
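A minimal sketch of training such a BDT on the five inputs, assuming scikit-learn's GradientBoostingClassifier; the feature matrix and labels below are random stand-in data, and the actual BDT's implementation and hyperparameters are not specified here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

features = ["cnn_vertex_z", "cnn_vertex_rho36", "cnn_muon_score",
            "probnu_data_bdt", "corridor_z_min"]          # the five inputs above
rng = np.random.default_rng(0)
X = rng.random((5000, len(features)))                     # stand-in feature matrix
y = rng.integers(0, 2, 5000)                              # 1 = neutrino, 0 = muon

bdt = GradientBoostingClassifier().fit(X, y)
nu_score = bdt.predict_proba(X)[:, 1]                     # neutrino-like score [0, 1]
keep = nu_score >= 0.8                                    # analysis cut threshold
```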
5.2.2 Direct Pulses

There are some remaining coincident noise events in the detector that are not captured by the coincident cut described in Section 5.2.1.1. These can be identified by an algorithm that identifies "direct" pulses, from photons that do not undergo scattering. The SANTA algorithm accomplishes this by searching for photons that are causally connected to the reconstructed vertex [10]. The SANTA-determined number of hit DOMs with direct pulses, shown in Figure 5.4, is used to apply a cut requiring at least 3 DOMs hit with direct pulses. This cut results in a total neutrino sample reduction of 10.50%, after all other analysis cuts are applied (Table 5.1). Figure 5.4 shows this distribution with all cuts applied except for the direct pulse cut, showing a large data/MC disagreement below 3 directly hit DOMs. Note that the MC is scaled to the data, so once the events below 3 directly hit DOMs are removed, the remaining data/MC ratio shifts to hover around the ideal line at 1.0.

Figure 5.4: Number of DOMs hit with direct pulses, identified using the SANTA algorithm. The left plot shows the distribution before the cut and the right plot shows the adjusted agreement after events with fewer than 3 directly hit DOMs are removed. Because the MC is scaled to the data, removing the mismatched bins improves the overall normalization of the data/MC agreement.

5.2.3 Cosine Zenith Range

The zenith angle range has a cut applied at one end of its boundary, to avoid the atmospheric self-veto effect. The self-veto effect occurs when a muon and a neutrino from a single air shower both interact in the detector; such an event would likely be removed by the selection process. Since the muon and neutrino simulations are generated independently, this loss of neutrinos would not be accounted for in the MC sample. Thus, applying the cosine zenith cut, requiring events to have a reconstructed cos(θ) < 0.3, helps to remove the events where the self-veto effect is most likely. In addition, this cut helps to remove any downgoing muons that sneak through the other muon-tuned cuts.

5.3 Application of Cuts

Table 5.1 summarizes the impact of the described cuts on an individual basis for the neutrino sample. The percentage in the last column is the effect of the specified cut when all other cuts are already applied to the Level 5 sample, showing the impact that this last cut alone has on the final analysis sample. The energy and zenith cuts are separated in the table to note that these cuts are part of the analysis sample selection, so some fraction of the events they remove are cut because they are not in the analysis region (rather than to remove backgrounds). Thus, the larger percentages for the energy and zenith cuts are not a concern, since these cuts aim to constrain the sample to the necessary analysis region. Note that even with the individual contribution percentages, only 34.83% of the neutrinos are removed in total between the Level 5 cuts and the analysis cuts; these cuts are thus not mutually exclusive. The muon reduction is much larger, with the rate reduced by nearly a factor of 200 between Level 5 and the final FLERCNN analysis cuts (Figure 5.5).
Cut Name               Value                  % Removed
# CNN DOMs             ≥ 7                    1.05%
CNN Radius             < 200 m                0.09%
CNN Z                  -495 m < z < -225 m    5.48%
CNN Vertex             R and Z cut            6.12%
DOMs w/ direct pulse   > 2.5                  10.50%
NTop15                 < 0.5                  0.03%
NOuter                 < 7.5                  0.001%
Muon BDT               >= 0.8                 23.90%
CNN Energy             5 GeV < E < 100 GeV    20.70%
CNN Cosine Zenith      < 0.3                  19.66%
Total                  all above cuts         34.83%

Table 5.1: Describes the cuts on the neutrino MC analysis sample and their individual impact when added to the final sample.

The final event rates after the FLERCNN cuts are listed in Table 5.2. Figure 5.5 shows the reduction of the noise and atmospheric muon backgrounds relative to the neutrino samples. Rates for Levels 2-5 are extracted from previous work done by IceCube collaborators who optimized the early level processing [44], contrasted with the final FLERCNN analysis sample cuts at the last data points. At the final level, the atmospheric muon rate is an order of magnitude smaller than the lowest neutrino rate (ν_τ), and the noise simulation sample has been completely eliminated. The noise rate in Table 5.2 is an upper limit, given the limited statistics simulated. The FLERCNN selection rates do take into account the re-weighting from the neutrino hypersurfaces and the smoothing from the muon KDE, which are described in Sections 6.4.4.1 and 5.4.3.

Figure 5.5: Rates of the different selections, comparing the earlier levels to the final level FLERCNN sample after analysis cuts, including re-weighting with interpolated hypersurfaces and muon KDE smoothing described in Sections 6.4.4.1 and 5.4.3.

Selection   Rate (mHz)   # of Counts (9.3 yr)   % of Sample
ν_e CC      0.175        51,096 ± 77            23.2%
ν_μ CC      0.453        132,674 ± 132          60.3%
ν_τ CC      0.037        10,747 ± 23            4.9%
ν NC        0.082        24,053 ± 53            10.9%
μ_atm       0.005        1,392 ± 95             0.6%
noise       < 0.0006     ∼0                     < 0.08%
total       0.750        219,964 ± 190          -

Table 5.2: Describes the final rates predicted by the MC simulation, including re-weighting with interpolated hypersurfaces and muon KDE smoothing described in Sections 6.4.4.1 and 5.4.3.

5.4 Analysis Distribution Template

Chapter 4 and Section 5.3 have established that the reconstruction is completed for the necessary analysis parameters (energy, zenith, and PID) and that the background has been reduced. The sample is now ready to be prepared for the ν_μ disappearance analysis by choosing the binning that defines the analysis sample. This separation into the (energy, zenith, PID) bins will be referred to as the "analysis binning," where the event count in each bin will be used for comparison in the final analysis (Section 6.1).

5.4.1 PID

Following the precedent set by the IceCube 9.28 year analysis using the likelihood-based reconstruction, a three-bin approach is taken for the PID. Two of the three bins are cut so that they create the purest possible track-like and cascade-like bins after classification, with similar statistics in each. The remaining events are put into the "mixed" PID bin category, with a score somewhere between the purest track-like and cascade-like events. Figure 5.6 shows the classifier score for the various true neutrino (and muon) flavors, with finer binning in the top plot and separated into the three analysis bins in the bottom plot. The cut for the track bin is at ≥ 0.55, to create the largest sample of high purity track-like events. For the cascade bin, it is impossible to completely avoid ν_μ CC events, since low energy events are often mis-classified due to their very short muon tracks (see Figure 5.8).
The cascade cut is applied at ≤ 0.25 to avoid the peak of the ν_μ CC events near 0.35, and to keep statistics similar to the pure track-like bin.

Figure 5.6: Result of the CNN track-like classifier on the nominal MC, binned more finely in the top plot and binned at the proposed 3-bin boundaries for the analysis in the bottom plot. Note that the neutrino (and muon) flavors are all stacked.

After the cut has been applied, we explore the relative effectiveness of the resulting separation. Figure 5.7 shows that the high energy ν_μ CC events are correctly classified as tracks, whereas the low energy ν_μ CC events are typically classified in the mixed PID bin. Figure 5.8 shows that the track PID bin is dominated by ν_μ CC events and that the mixed PID bin is still mostly tracks, but much less pure. This shows that the chosen separation accomplishes the intended goal of a high purity track-like bin.

Figure 5.7: Percentage of each flavor's events located in each PID bin as a function of energy. For example, the ν_μ CC sample shows that 80% of the ν_μ CC events > 60 GeV are classified as "tracks" with the chosen PID binning.

Figure 5.8: Composition of each PID bin as a function of true energy, with the relative percentage of each flavor's contribution to the events in each PID bin.

5.4.2 Binning

To see the oscillation signatures, the template is binned in reconstructed cos(θ_z), energy, and PID. The PID binning is described in Section 5.4.1; the chosen ranges for the energy and zenith are described in Sections 5.1.3 and 5.2.3. The energy axis uses logarithmic binning to account for the spectral index favoring low energy neutrino flux and to focus on the expected oscillation signature region. The zenith axis is binned linearly in the cosine of the angle. The number of bins in energy and zenith is chosen to balance the ideal finest binning against ensuring there are enough expected statistics per bin in 9.28 years of livetime. A few bins in the cascade PID bin do end up with extremely low expected counts, and Section 5.4.4 discusses how these are dealt with in the analysis. The binning is summarized in Table 5.3 and the resulting analysis template distribution is in Figure 5.9.

Variable   Number of Bins   Bounds                   Step
PID        3                [0, 0.25, 0.55, 1.00]    linear
Energy     12               [5, 100]                 log
Zenith     10               [-1.0, 0.3]              linear

Table 5.3: Describes the analysis binning using the CNN reconstructed and classified variables.

Figure 5.9: Number of expected events in each bin over the given 9.28 years of livetime, with neutrino weights adjusted by interpolated hypersurfaces to account for detector systematics and the expected muon distribution smoothed by KDE.

5.4.3 Muon Sample KDE Template

The muon sample is reduced to ∼1000 events after the analysis selection. Thus, there are not enough statistics after applying the analysis binning to produce hypersurfaces for the muon distribution. Due to the low statistics of the muons that survive to the final analysis sample, the distribution has artificially sharp features, with large differences between neighboring bins (see the first column of Figure 5.10). Thus, a Kernel Density Estimator (KDE) is applied to the distribution to smooth the muon template. The sample is smoothed without the analysis boundaries on energy and zenith angle applied, so that muon events on the edges of the analysis region can influence the generated KDE distribution.
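A minimal sketch of this kind of 2D template smoothing, assuming SciPy's gaussian_kde; the stand-in events below are uniform random, whereas the actual analysis smooths the weighted muon distribution (gaussian_kde also supports per-event weights via its `weights` argument).

```python
import numpy as np
from scipy.stats import gaussian_kde

# Stand-in muon events in log10(energy) and cos(zenith).
rng = np.random.default_rng(0)
log_e = rng.uniform(np.log10(5), np.log10(100), size=1000)
cos_z = rng.uniform(-1.0, 0.3, size=1000)
kde = gaussian_kde(np.vstack([log_e, cos_z]))

# Evaluate the smoothed density at the centers of the 12 x 10 analysis grid.
e_edges = np.linspace(np.log10(5), np.log10(100), 13)
z_edges = np.linspace(-1.0, 0.3, 11)
e_centers = 0.5 * (e_edges[:-1] + e_edges[1:])
z_centers = 0.5 * (z_edges[:-1] + z_edges[1:])
grid = np.array(np.meshgrid(e_centers, z_centers)).reshape(2, -1)
smoothed = kde(grid).reshape(len(z_centers), len(e_centers))
```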
The result of the muon distribution after the KDE smoothing is shown in the center column of Figure 5.10, with the final column showing the relative pulls between the original template and the new smoothed version. Note that most bins are under a 5σ pull, and in particular the signal region in the track PID bin is minimally affected.

Figure 5.10: Muon distribution in the analysis binning before the KDE smoothing (left column), with sharp features due to low statistics, and after KDE smoothing is applied (middle column), along with the pull in each bin (right column). The gray bins are where σ_hist = 0.

To explore the effect of the remaining muon sample after the KDE smoothing, we look at the composition of the muons per analysis bin. Figure 5.11 shows the percentage of muons expected in the sample per analysis bin. Most bins contain less than 3% muons, except for the extremely low statistics bins in the cascade PID. This reinforces the need to mask out the analysis bins with low event counts, such as those in the upper left corner of the cascade PID histogram.

Figure 5.11: Percentage of muons expected in the sample per analysis bin after muon KDE smoothing.

5.4.4 Masking

The bins with very low statistics can lead to large influences from background in those select analysis bins, as seen in the muon KDE. In addition, bins with low statistics can lead to large uncertainties in the hypersurface fits (Section 6.4.4.1). Furthermore, these bins can be the source of unwanted behavior in the oscillation fit: a few data events in a bin with < 1 event predicted create a large pull from a small statistical change. To avoid complications from these bins, they are masked out during the analysis (Figure 5.12). In total, 10 bins are masked in the analysis, all in the cascade PID bin. These bins each have fewer than 15 events predicted in the 9.3 year livetime, so masking them does not cause any appreciable change in the total expected rate for any of the flavors (as seen in Table 5.2).

Figure 5.12: Analysis binning after masking, with the masked bins shown in white.

CHAPTER 6
ANALYSIS OF MUON NEUTRINO DISAPPEARANCE WITH 9.28 YEARS OF DATA

6.1 Overview of Analysis Procedure

Now that the analysis sample has been established, we can perform the ν_μ disappearance analysis considering the full 3-flavor neutrino oscillation and matter effects. The analysis framework used is the PISA software, which was written by IceCube collaborators for use in high statistics neutrino oscillation experiments and is optimized to analyze small signals [57, 58]. The goal is to use PISA to constrain the mass splitting Δm²₃₂ and the mixing angle sin²θ₂₃. To do this, the expected and observed counts per bin are compared (Section 5.4.2). The data undergo the processing up to Level 6 (Section 3.4), the FLERCNN reconstruction (Chapter 4) and muon BDT (Section 5.2.1.2), and the application of the analysis cuts (Chapter 5). The data then provide the observed counts. The MC analysis sample outlined in Chapter 5 provides the expected counts per bin (Figure 5.12), weighted to match the data livetime. The systematic uncertainties and physics parameters of the MC are then varied to create a new realization of the expected counts per bin, and a test statistic (Section 6.2) is calculated to quantify the comparison of the new bin-by-bin prediction to the data. The test statistic is minimized, finding the optimal fitted values for each of the free systematic parameters (Section 6.4) and the two physics parameters (Δm²₃₂ and sin²θ₂₃).
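The minimization itself is handled by PISA, with Migrad from the Minuit2 library as the underlying minimizer (Section 6.5.1). As a standalone schematic of the same loop, the sketch below fits two physics parameters to a toy binned template using the iminuit Python package; the template, bins, and two-flavor oscillation form are illustrative assumptions, not the analysis implementation.

```python
import numpy as np
from iminuit import Minuit

rng = np.random.default_rng(0)

# Toy analysis bins: a baseline template scaled by a disappearance factor.
baseline = rng.uniform(50.0, 500.0, size=120)    # hypothetical per-bin MC counts
l_over_e = rng.uniform(500.0, 2500.0, size=120)  # hypothetical L/E per bin (km/GeV)

def expected(s2_2th, dm2):
    # Two-flavor-style survival probability applied to the baseline template.
    return baseline * (1.0 - s2_2th * np.sin(1.27 * dm2 * l_over_e) ** 2)

observed = rng.poisson(expected(0.95, 2.4e-3))   # pseudo-data at a chosen truth

def neg_llh(s2_2th, dm2):
    n = expected(s2_2th, dm2)
    return np.sum(n - observed * np.log(n))      # Poisson -lnL up to constants

m = Minuit(neg_llh, s2_2th=0.9, dm2=2.5e-3)      # seeds near a prior best fit
m.errordef = Minuit.LIKELIHOOD                   # 0.5, appropriate for -lnL
m.limits = [(0.5, 0.999), (1.0e-3, 4.0e-3)]
m.migrad()
print(m.values["s2_2th"], m.values["dm2"])
```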
Chapter 6 outlines the analysis using 9.28 years of livetime. This analysis is done in conjunction with Dr. Shiqi Yu and Finn Mayhew, and is still undergoing final checks in review by the IceCube collaboration. Due to ongoing studies being performed for the 9.28 year analysis, a proof of concept analysis is run using the same procedure (with minor modifications) on 0.92 years of livetime (described in Chapter 7). This uses 10% of the full data sample to confirm the analysis procedure without biasing the analyzers to the result of the full sample, since the smaller sample will be dominated by statistical uncertainties.

6.2 Test Statistic

The test statistic (TS) is the function that is calculated at each realization of the systematic and physics parameters. The scan continues until the test statistic converges on a minimum. The function used for this analysis is a binned Poisson likelihood, evaluating the likelihood of observing the data count (d) in each bin (i) given the expected number of events (n):

\mathcal{L}(d \mid n(\vec{\eta}, \vec{s})) = \prod_{i \in \mathrm{bins}} \frac{n_i^{d_i} e^{-n_i}}{d_i!}    (6.1)

\ln \mathcal{L}(d \mid n(\vec{\eta}, \vec{s})) = \sum_{i \in \mathrm{bins}} \ln\left( \frac{n_i^{d_i} e^{-n_i}}{d_i!} \right)    (6.2)

The expected number of events depends on η⃗, which represents the physics parameters, and s⃗, which represents the systematic parameters. These systematic parameters are described in detail in Section 6.4. The natural logarithm of the likelihood (abbreviated LLH) is taken, and the negative LLH is the value that is minimized. To account for the systematic parameters, a penalty term is added so that the priors on the systematics influence the final LLH metric:

\ln \mathcal{L}(d \mid n(\vec{\eta}, \vec{s})) = \sum_{i \in \mathrm{bins}} \ln\left( \frac{n_i^{d_i} e^{-n_i}}{d_i!} \right) + \sum_{j \in \mathrm{syst}} \frac{(s_j - \hat{s}_j)^2}{\sigma_{s_j}^2}    (6.3)

where j runs over all 16 systematic parameters, with the fitted value (s_j) compared to the nominal value (ŝ_j). This process of minimizing the −LLH over the physics and systematic parameters is called the "fit." "Pre-fit" means the MC distribution using the nominal systematic parameter values and the physics parameters set to the previous best-fit point from the IceCube ν_μ disappearance analysis using the 7.5 year livetime "golden event" sample (values in Table 6.2). "Post-fit" indicates the systematic and physics parameters at the best-fit values for the FLERCNN analysis sample.
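A minimal numerical sketch of Equations 6.1-6.3 is given below; the arrays and prior widths are hypothetical, and the actual minimization is handled by PISA.

```python
import numpy as np
from scipy.special import gammaln

def neg_llh(n, d, s, s_nominal, s_sigma):
    """Negative penalized log-likelihood of Eq. 6.3 (sign flipped for minimization).

    n: expected counts per bin, d: observed counts per bin,
    s: fitted systematic values, s_nominal: nominal values, s_sigma: prior widths.
    """
    # Poisson term: -ln prod_i n_i^d_i e^{-n_i} / d_i!, using gammaln(d+1) = ln(d!).
    poisson = np.sum(n - d * np.log(n) + gammaln(d + 1.0))
    # Gaussian prior penalty on the systematic parameters.
    penalty = np.sum((s - s_nominal) ** 2 / s_sigma ** 2)
    return poisson + penalty

# Hypothetical three-bin example with one systematic pulled from nominal.
expected = np.array([120.0, 80.0, 15.0])
observed = np.array([130, 75, 12])
print(neg_llh(expected, observed, s=np.array([0.1]),
              s_nominal=np.array([0.0]), s_sigma=np.array([1.0])))
```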
6.3 Verifying Data Consistency

Data from the IceCube seasons 2012-2021 are used for this analysis, for a total livetime of 9.28 years. IceCube data-taking seasons typically start in April or May of the year and do not necessarily correspond to one year of livetime. Note that only part of the 2021 season was complete at the time of this analysis, so only the 2021 season up to January 2022 is used for this study.

6.3.1 Data Quality Check Between Years

The data quality is checked to verify that the analysis variables are consistent between different years of IceCube data. All data are used for the quality check, with a Kolmogorov-Smirnov test (KS-test) performed on various 1D distributions to assess their similarity. The KS-test measures the probability that the samples are drawn from the same underlying distribution, such that a higher p-value means they are more similar. The null hypothesis is that all years of data are uniform, so a low p-value indicates a higher probability that the distributions are distinguishable from each other.

The reconstructed energy distributions show good agreement between all years of the data (Figure 6.1). The left histogram shows mostly large p-values, with no concerning patterns in the distribution. The right plot in Figure 6.1 compares the actual p-values observed to the expected distribution of p-values, to assess whether there are a reasonable number of low p-value bins, since some are expected due to minor fluctuations over the 10 years. The zenith angle test also shows good agreement between all years of data, with mostly large p-values spread across the 2D histogram (Figure 6.2). The two-sided p-value is < 5% in the right plot, but it pulls in the positive direction, where fewer low p-values are actually seen than predicted. Thus, there is no concern with the quality of the zenith angle data agreement between the years. The PID classifier data quality check shows good agreement between all years (Figure 6.3). The two-sided p-value is also above 5%, favoring higher consistency with fewer actual low p-values than expected.

Figure 6.1: Energy distribution compared between years of data.

Figure 6.2: Zenith angle distribution compared between years of data.

Figure 6.3: PID distribution compared between years of data.

The muon BDT data quality check shows a trend of slightly lower p-values for 2019 and 2020 (Figure 6.4). Overall, these differences are not severe (generally above 5%) and the two-sided p-value (right plot) is well above the 5% threshold of concern. The likelihood-based 9.28 year analysis sample also saw similar behavior in its final level muon classifier's data quality check, and traced the behavior back to the Level 4 muon BDT (Section 3.4.3) that its final level BDT uses as input. That study revealed that one of the Level 4 muon BDT inputs, an estimate of ρ₃₆, noticeably changes due to a slight change in the IceCube calibration file used for the data in the 2017 season and beyond, where a rounding error shifts the position of some DOMs by less than a millimeter. They theorize that the Level 4 muon BDT distinguishes this change, creating the different behavior after 2017. The same Level 4 muon BDT is used as input for the FLERCNN final level muon BDT (Section 5.2.1.2). Fixing this would require adjusting the calibration files and re-training the Level 4 BDT. This may not be the only cause, as the final level muon BDT shows larger changes in the 2019-2020 years (and reverts back in 2021). But as the effect is small and the p-values for these years do not exceed the 5% threshold set for concern, it was judged acceptable to move forward without re-optimizing for the small variation in the muon veto efficiency in those years.

Figure 6.4: Muon BDT distribution compared between years of data.

Lastly, there is an expected variation between the months in each year due to seasonal atmospheric changes [37]. Nonetheless, the months are expected to be consistent between years, to a degree. Figure 6.5 shows the variation of the rate over each month, with the years split by color. The results show consistency across the years, with most years well within each other's error bars.

Figure 6.5: Rates compared between years of data, split up by month.
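The year-to-year comparisons in this subsection can be sketched with SciPy's two-sample KS test; the arrays below are hypothetical stand-ins for per-year reconstructed energy values.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)

# Hypothetical reconstructed-energy samples for two data seasons (GeV).
energy_2012 = rng.exponential(scale=20.0, size=5000) + 5.0
energy_2013 = rng.exponential(scale=20.0, size=5200) + 5.0

# Two-sample KS test: a high p-value means no evidence the years differ.
stat, p_value = ks_2samp(energy_2012, energy_2013)
print(f"KS statistic = {stat:.4f}, p-value = {p_value:.3f}")
```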
6.3.2 Data/MC Agreement

Data/MC agreement is evaluated by comparing the shapes of the 1D distributions of the final level sample. The pre-fit agreement uses the analysis template with all physics and nuisance parameters set to their nominal values (Table 6.2), with interpolated hypersurfaces applied. Before the fit is performed, we expect some disagreement between the data and the nominal MC. Preliminary checks are still conducted to look for large discrepancies. The total MC rate is scaled to match the data rate, so this is a shape-only comparison.

Figure 6.6 has the FLERCNN energy distribution on the left and the FLERCNN zenith angle distribution on the right. The top histogram in each figure stacks the separate interaction types in the MC, indicated by the colored regions, scaled to compare with the data in the black points. The bottom part of each figure has the data/MC ratio per bin, bounded to focus on the region around the 1.0 line representing perfect agreement.

The energy does well across most of the range, with improvement expected once the MC uses the data's best-fit parameters. The zenith angle agreement shows a sharper disagreement, but to some extent this is expected, since the zenith distribution is highly dependent on the flux parameters, which are not yet fit to match the data. Neither of these plots shows concerning pre-fit disagreement, and their trends are expected to be resolved once the fitted parameters are applied to the MC (this is resolved post-fit in the one year analysis; see Section 7.4.1).

Figure 6.6: Data/MC agreement for reconstructed energy and zenith angle.

Figure 6.7 shows the data/MC agreement for the PID (left) and the atmospheric muon classifier (right). Both plots show good agreement across their included ranges (0-1 for the PID and 0.8-1.0 for the muon classifier). The vertex z MC prediction in Figure 6.8 (left) shows good agreement with data across the included z region from -495 to -225 m. For the radial reconstruction, there is a known artifact in the CNN reconstruction where the string positions are favored in the x and y reconstruction, contributing to the two-peak shape seen in the ρ₃₆ histogram (right plot in Figure 6.8). Since this reconstruction is used for the containment cut, a precise reconstructed position is not necessary to make a reasonable containment cut.

Figure 6.7: Data/MC agreement for PID and muon BDT classifiers. Note that the muon BDT classifier has a cut applied at 0.8, which accounts for the behavior from 0.4-0.8.

Figure 6.8: Data/MC agreement for the starting vertex z and ρ₃₆ reconstructed variables (radial position relative to the center IceCube string #36).

6.4 Systematic Uncertainties

Various systematic influences are assessed for this analysis. There are 16 systematic parameters which are free during the analysis fit, while others have been determined not to have a large effect on the data and are thus fixed. Selecting only the systematics that impact the analysis is important to help the minimizer converge, which becomes increasingly complex as additional parameters are added. Broadly, the systematics considered fall into four categories: atmospheric flux (Section 6.4.1), oscillation modeling (Section 6.4.2), cross sections (Section 6.4.3), and detector systematics (Section 6.4.4). Section 6.4.5 shows how the impact of the systematic parameters is determined, to justify fixing the other parameters to their nominal values.

6.4.1 Atmospheric Flux

The atmospheric flux model has a significant influence on the expected MC counts. The uncertainties of the flux models vary over energy and zenith angle, so they must be accounted for carefully. The flux depends on the choice of cosmic ray model, meson production yield, atmospheric density model, and hadronic interaction model.
The FLERCNN analysis uses the MCEq toolkit to calculate the atmospheric flux [59]. Instead of forcing the oscillation analysis to rely on the result of a single flux model, the introduction of the systematic parameters below allows the expected neutrino flux to vary and span the possible models and their uncertainties. Because of the MCEq framework, the flux can be updated iteratively during the analysis, allowing the counts to be re-weighted to reflect the current flux estimate.

6.4.1.1 Cosmic Ray Uncertainty

Shifts in the spectral index of the neutrino flux (Δγ_ν) around an energy pivot account for the cosmic ray flux uncertainties, such that

\Delta\Phi = \left( \frac{E}{E_{\mathrm{pivot}}} \right)^{\Delta\gamma_\nu}    (6.4)

The energy pivot is fixed during the fit at 24.1 GeV. The neutrino flux spectral index is free, and Figure 6.11 indicates that this systematic ranks as one of the leading impacts on the analysis, along with the detector systematics.

6.4.1.2 Uncertainty on Meson Production

As discussed in Section 1.4, atmospheric neutrinos are produced through pion and kaon decay in the atmosphere. Thus, the uncertainty on the neutrino production from the meson flux is a factor that can affect the neutrino expectation. The meson flux uncertainties are energy and zenith angle dependent, as outlined by Barr et al. in [15], hence the naming convention in Table 6.2. Letters a-i refer to the pion uncertainty and w-z to the kaon uncertainty, with the letter indicating the energy and x_lab region that the Barr parameter accounts for [15]. The MCEq framework allows each systematic parameter to vary individually within its own region corresponding to the Barr term.

Figure 6.9: Barr parameter labels, chart from [15].

The Barr parameters mostly show a weak influence on the fit (Figure 6.11), with only one parameter having a ΔLLH > 0.1σ. To keep consistency with the IceCube likelihood-based 9.28 year analysis, a total of five of the Barr parameters are chosen to remain free during the fit, even though their expected impact is low for the FLERCNN analysis [9].

6.4.1.3 Normalization Terms

The normalization terms (N_x in Table 6.2) uniformly scale the rate for the particle specified in the subscript across the entire energy range. The only normalization terms that are free during the fit are the neutrino normalization (N_ν) and the muon normalization (N_μ). The neutrino normalization scaling helps to account for effects such as neutrinos removed during the veto cuts, for example coincident events. The muon normalization can account for effects such as count variations due to the detector systematics, which are not directly factored into the estimated muon counts (Section 6.4.4 describes how these are handled for the neutrinos). Thus, both of these normalization terms are free during the fit to allow additional flexibility to account for possible uniform scaling of the expected counts.

6.4.2 Oscillation Modeling and Parameters

The oscillation modeling must take into account parameters such as the Earth density, the depth of the detector, and the height above the Earth at which the neutrino is generated in the atmosphere (prop height). The Earth density affects the matter effects; this analysis uses the 12-layer estimate of the preliminary reference Earth model [60] to account for the different densities from the core to the crust. There are three terms that account for the electron density in the inner core (YeI), outer core (YeO), and mantle (YeM). These values are fixed at the inputs used for the oscillation modeling computation (using Prob3++ [61]), along with the detector depth and neutrino generation height.

The ν_μ disappearance analysis aims to fit Δm²₃₁ and sin²θ₂₃, but since we are using the 3-flavor paradigm, the oscillation probability also depends on the other mixing angles, the other mass squared differences, and the δ_CP violating term. This analysis is not sensitive to these variables, since they are subdominant to the ν_μ → ν_τ oscillation (Section 1), so their values are fixed during the fit.
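For orientation, the dominant behavior being constrained is well approximated by the familiar two-flavor vacuum survival probability (the analysis itself uses the full 3-flavor matter calculation via Prob3++):

P(\nu_\mu \rightarrow \nu_\mu) \approx 1 - \sin^2(2\theta_{23})\,\sin^2\!\left( \frac{1.27\,\Delta m^2_{32}\,[\mathrm{eV}^2]\; L\,[\mathrm{km}]}{E\,[\mathrm{GeV}]} \right)

With Δm²₃₂ ≈ 2.4 × 10⁻³ eV² and a vertically up-going baseline of L ≈ 1.27 × 10⁴ km, the first disappearance maximum sits near E ≈ 25 GeV, inside the 5-100 GeV analysis range.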
6.4.3 Cross Sections

The uncertainties on the cross sections for DIS, resonant, and QE scattering must be considered as systematic parameters. Contributions to all three cross sections are free during the fit. The DIS systematic parameter in this analysis, referred to simply as "DIS" in Table 6.2, is a term that accounts for how "CSMS-like" the cross section is vs. "GENIE-like" (Section 1.3.3). This is done by finding the ratio of the GENIE and CSMS cross sections and adjusting this ratio as a constant scale. With its current implementation, the DIS term is not applied with any energy dependence, but scales the total DIS cross section with an energy independent interpolation.

As described in Sections 1.3.2 and 1.3.1, the cross sections for resonant and quasi-elastic interactions are determined by the form factor, which depends on the axial mass (M_A) that has been determined from experimental measurements. The axial mass terms are separated for QE scattering (M_A,QE) and resonant scattering (M_A,res), as they are implemented separately in GENIE [31]. Both of these axial mass terms are free systematic parameters that are fitted during the analysis. Note that a higher M_A implies a higher cross section and thus a higher neutrino yield, so including these terms in the systematics is critical to account for potential differences in the observed neutrino yield.

6.4.4 Detector Systematics

The detector systematic parameters accounted for in this analysis are the DOM efficiency, two variables used to parameterize the hole ice effects (p₀ and p₁), and the scattering and absorption of the bulk ice. Various MC systematic sets are generated with altered settings for these five detector systematics. These simulations provide discrete MC distributions that include the effect of the changed parameters, which are both energy and zenith dependent.

Table 6.1 describes the systematic sets' detector settings in detail, highlighting the changes from the nominal baseline values (first row of the table). For each systematic set (corresponding to a row), 1-3 parameters are changed from their nominal settings. The DOM efficiency indicates the PMT's sensitivity to photons, with a value of 1.0 indicating an efficiency aligned with calibration measurements (Sections 2.1.3 and 4.4.1); values < 1 scale the efficiency so that less light per event is detected. The hole ice systematics describe the parameterization of the angular acceptance, where the magnitude of the changes indicates shifts in the upgoing (p₀) and downgoing (p₁) photon acceptances (Sections 2.3.2 and 4.4.2). The scattering and absorption values indicate scaling of the bulk ice scattering and absorption coefficients (Section 2.3.1).
Thus, a scattering systematic parameter > 1 indicates more photon scattering in the ice, which can decrease the detection rate of photons initially headed toward the PMT face but increase the rate of more downgoing photons being scattered toward it (Section 4.4.3). An absorption systematic parameter > 1 indicates more absorption in the ice and a lower likelihood of a photon being detected. In total, 30 systematic sets are used to generate hypersurfaces that interpolate between their discrete settings.

6.4.4.1 Hypersurfaces

The hypersurfaces are used to interpolate between the discrete systematic sets and return a continuous, smooth space for the five detector systematics in each analysis bin (interaction type, reconstructed energy, reconstructed zenith). The sample is first split by PID, then separated into three true categories according to their topology (Section 2.2.2): ν_μ CC (including ν̄_μ CC), ν_τ CC (including ν̄_τ CC), and ν_e CC plus all NC flavors (and their antineutrinos). Then, for each category, the hypersurfaces are calculated per analysis bin using a linear function where a slope and intercept are fit for each of the five detector systematic parameters.

Figure 6.10 illustrates the hypersurface fit projected on the DOM efficiency axis for the ν_μ CC (including ν̄_μ CC) sample. Each hypersurface is fit in one [PID, energy, cosine zenith] bin, with the example in Figure 6.10 on the [track, 28.7-38.6 GeV, -1.0 - -0.87] analysis bin using only the ν_μ CC sample. The uncertainty on the hypersurface fit (shown in the red shading around the line) is less than the statistical uncertainty on the counts in the bin in the baseline simulation, so the hypersurfaces are also used to re-weight the expected nominal simulated count in each MC bin (y-axis of Figure 6.10). Notice that the re-weighting factor for this example is offset to ∼1.1 at the nominal (1.0) DOM efficiency.

DOM Efficiency | Hole Ice p₀ | Hole Ice p₁ | Scattering | Absorption
1.00           | 0.101569    | -0.049344   | 1.00       | 1.00
0.90           | 0.101569    | -0.049344   | 1.00       | 1.00
0.95           | 0.101569    | -0.049344   | 1.00       | 1.00
1.05           | 0.101569    | -0.049344   | 1.00       | 1.00
1.10           | 0.101569    | -0.049344   | 1.00       | 1.00
1.20           | 0.101569    | -0.049344   | 1.00       | 1.00
1.00           | -0.064805   | -0.108849   | 1.00       | 1.00
1.00           | -0.483907   | -0.01712    | 1.00       | 1.00
1.00           | 0.280321    | -0.075404   | 1.00       | 1.00
1.00           | 0.112231    | 0.003502    | 1.00       | 1.00
0.88           | -0.049899   | -0.054374   | 1.00       | 1.00
0.93           | -0.372924   | 0.034933    | 1.00       | 1.00
0.97           | 0.296566    | -0.036394   | 1.00       | 1.00
1.03           | 0.124492    | -0.11328    | 1.00       | 1.00
1.12           | -0.314209   | -0.077367   | 1.00       | 1.00
1.00           | -0.5        | -0.049344   | 1.00       | 1.00
1.00           | 0.1         | -0.049344   | 1.00       | 1.00
1.00           | 0.3         | -0.049344   | 1.00       | 1.00
1.00           | 0.101569    | -0.1        | 1.00       | 1.00
1.00           | 0.101569    | 0           | 1.00       | 1.00
1.00           | 0.101569    | 0.05        | 1.00       | 1.00
1.00           | 0.101569    | 0.10        | 1.00       | 1.00
1.00           | -0.2        | -0.049344   | 1.00       | 1.00
1.00           | 0.101569    | -0.049344   | 1.05       | 1.05
1.00           | 0.101569    | -0.049344   | 1.05       | 0.95
1.00           | 0.101569    | -0.049344   | 0.95       | 1.05
1.00           | 0.101569    | -0.049344   | 0.95       | 0.95
1.00           | 0.101569    | -0.049344   | 1.00       | 1.10
1.00           | 0.101569    | -0.049344   | 1.00       | 0.90
1.00           | 0.101569    | -0.049344   | 1.10       | 1.00
1.00           | 0.101569    | -0.049344   | 0.90       | 1.00

Table 6.1: Settings for the systematic sets used to generate the hypersurfaces, with each row describing a new set. Cells highlighted in green indicate settings that vary from their baseline values (row 1). Rows highlighted in gray indicate systematic sets that are currently under study due to potentially mismatched simulation (Section 7.2).

Figure 6.10: Example of the hypersurface fit in one analysis bin projected on the DOM efficiency axis, using the discrete sets to create a continuous fit (plot from [8]). The lighter data points indicate systematic sets that have additional detector parameters pulled in addition to the DOM efficiency; the darker data points are systematic sets with only the DOM efficiency changed.
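A per-bin hypersurface of this kind amounts to a multi-linear least-squares fit; the sketch below, with hypothetical set counts, shows the idea for a single analysis bin.

```python
import numpy as np

# Detector settings for each systematic set, written as offsets from nominal:
# columns = (DOM eff, hole ice p0, hole ice p1, scattering, absorption).
settings = np.array([
    [0.00,  0.000,  0.000,  0.00,  0.00],   # nominal
    [-0.10, 0.000,  0.000,  0.00,  0.00],
    [0.05,  0.000,  0.000,  0.00,  0.00],
    [0.00, -0.585,  0.032,  0.00,  0.00],
    [0.00,  0.000,  0.000,  0.05, -0.05],
    [0.00,  0.000,  0.000, -0.10,  0.00],
])
# Hypothetical bin counts from each systematic set, relative to nominal.
rel_counts = np.array([1.00, 0.82, 1.08, 0.97, 1.01, 1.05])

# Linear hypersurface: count_ratio = intercept + sum_k slope_k * offset_k.
design = np.hstack([np.ones((len(settings), 1)), settings])
coeffs, *_ = np.linalg.lstsq(design, rel_counts, rcond=None)
intercept, slopes = coeffs[0], coeffs[1:]

# Evaluate the continuous re-weighting factor at an arbitrary setting.
query = np.array([0.03, -0.1, 0.0, 0.02, 0.0])
print(intercept + slopes @ query)
```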
Since the hypersurfaces are calculated using binned simulation, they are subject to dependence on non-detector parameters. To reduce this bias, the hypersurface fit for the five systematics is then interpolated over the oscillation and flux parameters, since these are the most significant non-detector parameters for this analysis. Thus, the fitted hypersurfaces for the detector systematics are also interpolated in the Δm²₃₂, θ₂₃, and neutrino flux spectral index (Δγ_ν) parameter space.

6.4.5 Assessment of Systematic Parameters' Impact

The systematic impact test aims to quantify the impact on the measured physics parameters if a given systematic parameter is fixed at a nominal value, but the nominal is the incorrect value. Overall, this test should help determine which systematic parameters can safely be fixed during the analysis fit. The degree of mismodeling is calculated, which represents the potential that the true physics parameters are rejected due to a specific systematic parameter being wrongly fixed and affecting the underlying data distribution. Note that no statistical fluctuations are applied in this test (referred to as an "Asimov" test). The procedure for the test is as follows:

1. Take one systematic parameter and pull it by 1σ (if a Gaussian prior) or half way to its upper bound (if a uniform prior).
2. Generate pseudo-data with this setting and a choice for the true physics parameters.
3. Return the parameter to its nominal value and fix it there for the fit.
4. Fit the pseudo-data for the physics and systematic parameters, without fitting the fixed systematic parameter in question.
5. Calculate the difference in the TS between the fit with these settings and the point representing the true physics parameters.
6. Repeat for different true values of the physics parameters, scanning 9 different combinations of the physics parameter settings.
7. Take the largest difference in TS for Figure 6.11.
8. Repeat for each systematic parameter in question.

To determine the relative σ values for the −2ΔLLH, Wilks' theorem is used such that the LLH is assumed to be χ²-distributed. The σ values were used as a metric to determine the systematic parameters that should be included in the fit, due to the mismodeling potential they have if they are fixed to an incorrect assumption. Thus, all parameters with a −2ΔLLH > 0.1σ are left free during the analysis (indicated by the orange color bars).

In addition, some variables are left free even though their expected mismodeling impact is low. The N_μ normalization is free during the fit so that there is a parameter to account for the scaling of the expected muon count. This is the only systematic parameter included to account for differences in the muon distribution, so it is not fixed in this analysis. The M_A,QE falls just below the 0.1σ threshold, but because M_A,QE and M_A,res are implemented separately in the GENIE simulation, both of these variables are left free for the fit.

Figure 6.11: Relative effect of each systematic parameter on the analysis, determined by pulling the parameter from nominal to generate pseudo-data but fixing it to nominal when fitting. Bars in orange are systematic parameters that are free for the proposed analysis fit, while the bars in purple are proposed to be fixed.
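The conversion from a −2ΔLLH value to the σ scale used in Figure 6.11 can be sketched as below; the degrees of freedom and the example value are illustrative assumptions.

```python
from scipy.stats import chi2, norm

ts = 0.8   # hypothetical -2*Delta(LLH) from one impact test
dof = 2    # two physics parameters in the scan (assumed here)

# Wilks' theorem: the TS is chi^2-distributed under the null hypothesis.
p_value = chi2.sf(ts, dof)

# Express the p-value as a two-sided Gaussian significance.
sigma = norm.isf(p_value / 2.0)
print(f"p = {p_value:.3f} -> {sigma:.2f} sigma")
```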
Lastly, four additional Barr parameters (y_K⁺, h_π⁺, w_K⁺, g_π⁺) are left free for comparison purposes with the other IceCube 9.28 year analysis using the likelihood-based reconstruction.

6.4.6 Summary of All Systematic Parameters

In total, there are 16 systematic parameters that are left free during the fit. These parameters are all seeded at their nominal values in Table 6.2, based on the expected values for each systematic. The other systematic parameters are fixed at their nominal values.

Table 6.2: Systematic parameters, including those fixed during the fit, along with the priors and bounds on the parameters that are free during the fit. Parameters that are fixed have their rows highlighted in gray.

6.5 Verifying the Analysis Pipeline

The aspects of the analysis verified in this section are the minimizer's ability to recover true parameters, the projected sensitivity, and the reproducibility of fits under statistical fluctuations. These tests use pseudo-data generated with varied settings for the systematic and physics parameters, depending on their purpose. Some of these pseudo-data sets have no statistical fluctuations (Asimov) and some have Poisson statistical fluctuations applied according to the expected count in each bin. Each of the tests and results is described in the following sections.

6.5.1 Asimov Inject-Recover Test

The Asimov inject/recover test uses MC samples without any statistical fluctuations. This test is run to check that the minimizer and analysis pipeline can recover various injected values of the physics parameters, along with the nominal systematic parameters. The procedure for the test is as follows:

1. Generate pseudo-data with certain combinations of mixing angle (θ₂₃) and mass splitting (Δm²₃₁) values.
2. Run the fit on each set of pseudo-data to attempt to recover the physics and systematic parameters, with the physics parameters seeded at the best-fit point from the 7.5 year IceCube ν_μ disappearance analysis.
3. Compare the recovered (i.e. fitted) physics parameters with the injected values.

With no statistical fluctuations, the physics parameters are expected to be recovered almost perfectly. The systematic parameters are also checked to ensure reasonable recovery. The recovered physics parameters are compared to the injected values at 25 different combinations of mass splitting and mixing angle. Figure 6.12 shows that all injected combinations of the physics parameters were successfully recovered by the minimizer. The systematic parameters are also recovered with similar precision. The basis of the minimizer is Migrad from the Minuit2 public library [62, 63, 64].

Figure 6.12: Scan over the physics parameters showing the injected values for the pseudo-data in the orange dot, the fitted value in the purple cross, and the fitter seed in the red plus (the latter is the same for all scans). All fitted physics parameters recover their injected truth values.

6.5.2 Projected Sensitivity to Oscillation Parameters

The sensitivity test evaluates the likelihood of multiple combinations of sin²(θ₂₃) and Δm²₃₂. The physics parameter space is scanned, and the LLH at each point is compared to the LLH at the best fit. The contour for the 90% confidence level is then drawn based on a χ² distribution, assuming Wilks' theorem. The procedure for the sensitivity is as follows:

1. Weight the MC to the nominal physics parameter values and use it as pseudo-data for the fit (with no statistical fluctuations).
2. Get the LLH value at the best fit point by performing a fit to the pseudo-data.
3. Fix the physics parameters to a selected off-nominal value and perform a new fit to the pseudo-data.
4. Calculate the difference in LLH between the two fits (off-nominal physics parameters vs. the free fit to the best fit point).
5. Repeat steps 3 and 4 for various combinations of the physics parameters to map the likelihood space.
6. Use Wilks' theorem to determine the test statistic value that represents the desired (1σ/90%) confidence level.
7. Compare the differences in LLH to draw the contour where the difference is greater than the critical value of the test statistic.

The result in Figure 6.13 shows the FLERCNN ν_μ disappearance analysis sensitivity in the red contour. The best fit point from the IceCube 7.5 year ν_μ disappearance result was assumed to be true (contour in yellow). There is a direct comparison with the 9.28 year IceCube analysis using the likelihood-based reconstruction method (in black), showing that the FLERCNN sample has comparable sensitivity. These contours are contrasted with results from other neutrino oscillation experiments, indicating that the FLERCNN projected sensitivity aligns with the current global experiments [4, 5, 6, 7, 8, 9].

Figure 6.13: Sensitivity of the FLERCNN and LLH-based analyses at the verification sample best fit point, compared to global results.

6.5.3 Parameter Ensemble Test

The parameter ensemble test looks at the ability of the fitter to recover the injected parameters, particularly for large, off-nominal values of the physics and systematic parameters. The procedure for the parameter test is as follows:

1. Use the Gaussian priors and uniform ranges to select values for the physics and systematic parameters.
2. Generate Asimov pseudo-data using the chosen values for the physics and systematic parameters.
3. Perform the fit, using the same initial seed for each parameter over the multiple trials.
4. Repeat with different values selected for the physics and systematic parameters.

Figure 6.14 shows the fitter's ability to recover the injected physics and systematic parameters. Even with the same seed used for each parameter, the fit is typically able to recover the true value used to generate the pseudo-data. A perfect result would lie on the diagonal 1:1 line, and parameters like Δm²₃₁, hole ice, scattering, absorption, and DOM efficiency show the best recovery capability (their trials are nearest to the 1:1 line). Some of the parameters with the weakest recovery capability are the Barr parameters, which is expected from the results in Section 6.4.5 showing that the systematic impact of the Barr parameters is minimal. There is also an interesting feature in the θ₂₃ histogram where the octant flip is visible (fitting 55° rather than the true 35°), but this is unique to fitting the angle and only happens in a small fraction of the trials. For nearly all parameters, there are some fits with a large fitted error, but these typically correlate with a large negative LLH (green to yellow dots), indicating that their fit failed. This is where the blind fit checks (Section 6.6.1) are useful: the goodness of fit will be evaluated by comparing the fit's LLH to the LLH of statistically fluctuated trial fits around the best fit point, to ensure the final fit is not one of these outliers (step 3 in Table 6.3).
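Step 1 of this test, drawing parameter values from their priors, might look like the following sketch; the parameter names, widths, and ranges are illustrative stand-ins for Table 6.2.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical priors: Gaussian (nominal, width) or uniform (low, high).
gaussian_priors = {"dom_eff": (1.0, 0.1), "delta_gamma_nu": (0.0, 0.1)}
uniform_ranges = {"theta23_deg": (35.0, 55.0), "dm231": (2.0e-3, 3.0e-3)}

injected = {}
for name, (nominal, width) in gaussian_priors.items():
    injected[name] = rng.normal(nominal, width)
for name, (low, high) in uniform_ranges.items():
    injected[name] = rng.uniform(low, high)

print(injected)  # values used to generate one Asimov pseudo-data trial
```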
Figure 6.14: Quantifying the ability of the fitter to recover systematic parameters injected at various off-nominal values, with a perfect recovery on the diagonal 1:1 gray line and the results for various fits shown in the dots. The color of each dot corresponds to the fit's total −LLH result, where a smaller value (purple) of −LLH indicates a better fit than a larger value (yellow). This test is run assuming normal ordering of the neutrino masses.

6.5.4 Ensemble Test

The ensemble test explores the effect of statistical fluctuations on the analysis, performing the fit on MC pseudo-data with added Poisson fluctuations over many trials. The procedure for the ensemble test is as follows:

1. Generate pseudo-data by adding Poisson fluctuations to the nominal MC template.
2. Fit the physics and systematic parameters for each pseudo-data trial.
3. Explore the performance over n trials for the fitted physics and systematic parameters.

The results of the ensemble test for 500 trials are in Figures 6.15 and 6.16, for the normal and inverted orderings respectively. The histogram of the results for each parameter shows the fit values centered at the nominal parameter value. The 90% bounds of all histograms are within the prior 1σ width, for the parameters with Gaussian priors. This means that the systematic and physics parameters all fit values within the expected range and centered at their expected values.

Figure 6.15: Fit results for all systematic and physics parameters after 500 ensemble test trials, assuming normal ordering. Shown is the distribution of fitted values for each parameter, including the distribution's median value, which should ideally be aligned with the nominal value, and the distribution's 1σ, which should be within the range of the prior.

Figure 6.16: Fit results for all systematic and physics parameters after 500 ensemble test trials, assuming inverted ordering. Shown is the distribution of fitted values for each parameter, including the distribution's median value, which should ideally be aligned with the nominal value, and the distribution's 1σ, which should be within the range of the prior.
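Generating a fluctuated trial from the nominal template is a one-line numpy operation; the sketch below, with a hypothetical template, shows the trial loop structure (the full PISA fit would run on each trial in place of the summary statistic used here).

```python
import numpy as np

rng = np.random.default_rng(123)

# Hypothetical nominal template: (PID, energy, zenith) expected counts.
template = rng.uniform(5.0, 400.0, size=(3, 12, 10))

totals = []
for trial in range(500):
    # Step 1: Poisson-fluctuate every bin of the nominal expectation.
    pseudo_data = rng.poisson(template)
    # Step 2 would fit physics + systematics to pseudo_data; as a
    # stand-in, track a simple summary statistic per trial.
    totals.append(pseudo_data.sum())

# The spread across trials is consistent with sqrt(N) Poisson statistics.
print(np.mean(totals), np.std(totals), np.sqrt(template.sum()))
```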
6.6 Blind Fit

To reinforce objectivity, the fit is run on the data and several checks are performed without looking directly at the best fit point result. These checks include ensuring that each systematic parameter is fit at a value that is not outside its expected range, that each analysis bin is not pulled far from its expectation, and that there is a reasonable goodness of fit. The blind fit procedure is described in Section 6.6.1 and the outcome in Section 6.6.2.

6.6.1 Blind Fit Procedure

The following is the blind fit procedure to evaluate the behavior of the fit without looking directly at the result. The conditions under which the analysis will be paused to assess concerns (called stop conditions), along with the items that are reported, are in Table 6.3. These terms were set by the IceCube collaboration's oscillations working group, in alignment with previous analyses and their precedents.

1. Fit both the normal and inverted hierarchy. IceCube is not expected to be sensitive to the mass ordering in this analysis, but this is still done to ensure blindness. The one with the better test statistic (TS) is used for the rest of the blind fit tests, but it is not reported to or known by the analyzer.
2. Check the pulls on all systematic parameters. Allow one (out of 16 systematic parameters) to have a larger pull: stop if one parameter is pulled > 3σ or more than one parameter hits the listed stop conditions (Table 6.3).
3. Check the pulls in each analysis bin between the data and the MC prediction for the best fit point (BFP) and stop if > 5σ. Run an ensemble test at the blind fit BFP to get the expected TS distribution using 500 trials of pseudo-data. Check the goodness of fit: stop if the TS from data is above the 95th percentile (5% p-value) of the pseudo-data trials.
4. Check the 1D data/MC distributions for energy, zenith, and PID to make sure they agree with p-value > 5%.

Test # | Test                               | Stop Condition                                                                  | Report
2      | Parameter pulls: Non-Detector      | One parameter > 3σ; two or more parameters > 2σ or at boundary for non-Gaussian prior | Distance to boundary and absolute pull in σ
2      | Parameter pulls: DOM efficiency    | Fit outside of [-10%, +10%]                                                     | Distance to boundary and absolute pull in σ
2      | Parameter pulls: Hole Ice          | p₀ fit outside of [-0.60, +0.40]; p₁ fit outside of [-0.13, +0.10]              | Distance to boundary and absolute pull in σ
2      | Parameter pulls: Bulk Ice          | Scattering or absorption fit outside of [-10%, +10%]                            | Distance to boundary and absolute pull in σ
3      | Goodness of fit: Test Statistic    | One-sided p-value < 5%                                                          | Plot of p-values
3      | Goodness of fit: bin-wise pulls    | Any bin > 5σ                                                                    | Bin-wise plot of pulls
4      | 1D Data/MC agreement               | One-sided p-value < 5%                                                          | 1D data/MC plots

Table 6.3: Stop conditions for the blind fit plan.

6.6.2 Blind Fit Result

The fit was successfully run for the normal and inverted hierarchies, and a preference was chosen blindly based on the better test statistic result. All systematic parameter pulls passed their stop conditions, fitting within 1σ for the Gaussian priors and not at the boundary for the uniform priors (Figure 6.17). All fit counts per analysis bin pulled less than 4σ from the expected distribution (Figure 6.18). This passes tests 1, 2, and part of 3.

The second part of blind fit test 3 is checking the p-value of the test statistic. This required running the ensemble test at the best fit point and comparing the data fit's test statistic with the pseudo-data trials' test statistics to check the goodness of fit. This is the blind fit test that the fit failed, with a one-sided p-value of 0.6%, less than the 5% stop condition (Figure 6.19).

Figure 6.17: Systematic pulls for the nine year FLERCNN sample blind fit, shown in terms of σ for parameters with a Gaussian prior. None of the parameters hit their stop condition, with all Gaussian pulls under 1σ.

Figure 6.18: Pulls in σ for each of the analysis bins between the data and the MC at the fitted physics and systematic parameters for the nine year FLERCNN sample. None of the bins hit the stop conditions for the pull, with all bins pulling less than 3σ.

Figure 6.19: Test statistic result for the nine years of data fit, shown as the purple line, with the distribution of ensemble test TS results in yellow showing the expected TS spread for 499 trials at the data best fit point. The data TS falls within the ensemble distribution of TS results with a p-value of 0.6%, which hits the stop condition requiring it to be > 5%.

6.6.3 Potential Contributions to the Blind Fit Result

The goodness of fit result of 0.6% means that the data fit is not outside the prediction of the pseudo-data trials, but does have a higher LLH than 99.4% of the trials. This does not necessarily indicate a major problem with the fit, but there could be underlying issues leading to the weak goodness of fit.
Thus, the analysis has been paused while a few areas of potential contribution to this result are explored. The post-fit 1D data/MC agreement was checked for some of the reconstructed and cut variables, in particular for variables unique to the FLERCNN analysis, such as the FLERCNN reconstructed variables and the muon BDT classifier. The energy and zenith distributions were not revealed, so as to keep the analysis blind to the variables that directly influence the measurement of the oscillation parameters. There were no anomalous disagreements among the explored post-fit data/MC distributions.

Due to the small sample size, no muon statistical errors or systematic changes (beyond the constant-scaling muon normalization parameter) were factored into the analysis. The justification for this was that the muons contribute only 0.6% of the total sample (Table 5.2), but if there are underestimations in the muon flux, this fraction could be larger. The muon normalization term among the systematic parameters is intended to account for this, but as Figure 6.11 indicates, the analysis is not very sensitive to this systematic parameter. While this simultaneously indicates that the mismodeling impact of N_μ should not have a large effect on the resulting LLH, it could have enough of an effect to cause the slightly lower than expected goodness of fit. Furthermore, N_μ only accounts for a constant scaling across all the analysis bins, so energy or zenith dependent muon count fluctuations, such as those caused by detector systematics, could also be contributing to the pulls.

A first attempt was made to account for the error on the muon counts by adding a bootstrap error estimation on the KDE-smoothed muon distribution. It was also noted that the muon KDE smoothing was implemented over linear binning in energy, while the analysis binning uses log binning in energy. This was updated so that the KDE smooths over the log-binned energy parameter for the muons. These changes did result in a slight increase in the goodness of fit p-value, up to 2%. This is likely due to better modeling of the muon distribution, along with the bootstrap error producing a larger variance for the pseudo-data trials, more in line with the muon statistical error. This did not increase the test statistic p-value above the stop condition threshold, but it indicates that there could be a contribution from the lack of muon statistics in the sample and from properly accounting for their error, including the variation due to the detector systematics.

Another potential source of disagreement is the ice model, which is used to propagate photons in the simulation. A new, more detailed ice model has recently been developed for IceCube that accounts for the birefringence of polycrystals in the ice [12]. The birefringence leads to increased diffusion of photons along the flow axis and a small deflection toward the flow axis. Accounting for this in the ice model has shown improved modeling of the photon arrival time at a basic level, and its implementation in high energy IceCube reconstructions shows improved results at a higher level [65]. Using the new ice model would require regenerating all of the MC simulation, since the ice model is used during the early, time intensive photon propagation step (Section 3.3). It would take months to re-simulate the 10⁸ events used in this analysis, including the necessary suite of systematic sets to create a robust fit for the neutrino hypersurfaces (Section 6.4.4.1).
Since FLERCNN was trained on simulation using a different ice model, it would also need to be verified that it can estimate the reconstructed and classification variables well with the new ice model simulation. If FLERCNN needs to be re-trained using the new ice model, then another independently generated ∼10⁷ MC events would need to be simulated for the FLERCNN training samples. Given the robustness of FLERCNN to other ice variations (Section 4.4), it seems likely that it would not require re-training. In that case, the pre-processing and reconstruction should only take a week from this point, which is much improved over the months it takes the likelihood-based reconstruction.

It should be noted that the other IceCube 9.28 year ν_μ disappearance analysis, using the likelihood-based reconstruction, sees similar issues, with an even smaller goodness of fit. Since these analyses use the same base data and MC up to Level 6 (diverging at the reconstruction and final analysis cuts) and the same analysis software, these are the areas of focus in the search for potential issues. This was reinforced after the post-fit data/MC distributions for the FLERCNN sample did not show any concerning results, indicating that potential issues could lie in the underlying assumptions, such as the muon background or the ice model.

The 9.28 year ν_μ disappearance analysis using the likelihood-based reconstruction saw its strongest pulls in the DIS cross section scaling and one of the Barr parameters (Section 6.4.1). As mentioned in Section 6.4.3, the DIS cross section adjustment is energy independent and may benefit from a treatment that allows flexibility scaling with energy. Studies on adding this implementation are currently being pursued. Similarly, work is being done to apply more advanced handling of the Barr terms for the atmospheric flux. Unlike the likelihood-based reconstruction analysis sample, the FLERCNN analysis did not see a pull > 2σ in any systematic parameter. The DIS and Barr parameters could still be contributing to the weaker goodness of fit, but there is no direct indication of any single systematic parameter pulling strongly in the FLERCNN analysis.

As these studies are time-consuming and potentially open-ended, a small subset of the data will instead be used as a proof of concept for the FLERCNN analysis without unblinding the full result. Only 10% of the analysis sample will be used, so that the statistical uncertainties on the measured physics parameters will be large enough to preserve blindness for the 9.28 year analysis. Using a smaller analysis sample also requires less precise agreement of the data and MC. This will help to validate the analysis pipeline and show a result using the FLERCNN sample. Due to the low statistics, the analysis on 10% of the data will not produce competitive bounds on the oscillation parameters, but it should still fit in a region that aligns with other experiments.

CHAPTER 7
ANALYSIS OF MUON NEUTRINO DISAPPEARANCE WITH 0.92 YEARS OF DATA

To demonstrate that the new FLERCNN reconstruction and analysis sample yield reasonable results while the 9.28 year FLERCNN analysis remains under collaboration review, the analysis is run with one year of livetime. The analysis with only one year of livetime is not expected to have competitive oscillation constraints due to the limited statistics, but it will determine whether the analysis is in agreement with other results and provide a proof of concept for the FLERCNN sample.
7.1 One Year Sample

Only one year of data is used henceforth, from the 2013 data-taking season, which ran from April 2013 to May 2014. The total livetime for the 2013 season is 0.92 years, which is 10% of the full sample livetime. This sample will be referred to as the "one year sample," as compared to the FLERCNN "nine year sample." The one year sample is used to show a proof of concept that the new FLERCNN reconstruction and analysis procedure produce results that are in line with previous expectations. It is used to run the analysis fit and create the ν_μ disappearance oscillation parameter contours.

7.2 Changes in Analysis with One Year Sample

The same MC sample is used to predict the expected analysis distribution (Chapter 5), with the number of expected counts re-weighted to the new livetime (Table 7.1). There is a minor update to the muon KDE for the one year analysis, which now has the smoothing applied with log binning in energy so that it better matches the analysis binning. The one year sample also updated the systematic set list, as some systematic sets were identified with potential uniformity concerns (rows highlighted in gray in Table 6.1). These sets are the ones with more severe pulls in the systematic parameters (one for DOM efficiency and eight for hole ice), so the detector systematic behavior is still well-defined within the expected systematic ranges. This did require making a new version of the hypersurfaces, using 21 of the originally identified 30 systematic sets, which results in a new re-weighting for the nominal analysis sample. This re-weighting only changed the bin count by ±1 event per bin, for a total count difference of O(10) for the entire sample. All of these changes are minor, as indicated by the fact that there is no change in the total rate or in the fractions expected from each of the flavors (Table 7.1).

Selection | Rate (mHz) | # of Counts (0.92 yr) | % of Sample
ν_e CC    | 0.175      | 5,070 ± 7             | 23.2%
ν_μ CC    | 0.453      | 13,166 ± 13           | 60.3%
ν_τ CC    | 0.037      | 1,066 ± 2             | 4.9%
ν NC      | 0.082      | 2,387 ± 5             | 10.9%
μ_atm     | 0.005      | 138 ± 6               | 0.6%
noise     | < 0.0006   | ∼0                    | < 0.08%
total     | 0.750      | 21,829 ± 17           | -

Table 7.1: Total counts with 0.92 years of livetime predicted by the updated MC simulation.

Due to the decrease in statistics, some of the analysis bins have near-zero expected counts. As described in Section 5.4.4, these low statistics bins should be masked during the analysis. Thus, the mask is extended for the one year sample to cover any bin with fewer than eight events. This occurs in the low energy cascade and track PID bins for the one year sample (Figure 7.1).

Figure 7.1: Analysis binning after masking, with the masked bins shown in white, for one year of livetime.

7.2.1 Adjusting the Test Statistic

Part of the exploration of the nine year sample's goodness of fit looked at the muon error, which now includes an estimate of the muon uncertainty using the newly implemented bootstrap method. To make sure this is incorporated directly into the test statistic, the one year analysis uses the χ²_mod that was used as the test statistic by the IceCube 7.5 year golden sample analysis. This χ²_mod contains an additional term in the denominator, (σ_i^n)², that directly accounts for the MC error. Thus, the new test statistic for the one year analysis is:

\chi^2_{\mathrm{mod}} = \sum_{i \in \mathrm{bins}} \frac{(n_i - d_i)^2}{n_i + (\sigma_i^n)^2} + \sum_{j \in \mathrm{syst}} \frac{(s_j - \hat{s}_j)^2}{\sigma_{s_j}^2}    (7.1)

The expected counts per bin (n_i) are determined by the sum of the MC weights per bin. The statistical uncertainty on the simulation is included as (σ_i^n)², the sum of the squared MC weights. The second term of the χ²_mod is a penalty term on the systematics, where j runs over the systematic parameters, with the fitted value (s_j) compared to the nominal value (ŝ_j). This is the same penalty term included in the LLH test statistic (Section 6.2).
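A direct numpy transcription of Equation 7.1 might look like this sketch; the arrays are hypothetical.

```python
import numpy as np

def chi2_mod(n, d, sigma_n, s, s_nominal, s_sigma):
    """Modified chi^2 of Eq. 7.1: the MC statistical error enters the denominator."""
    bin_term = np.sum((n - d) ** 2 / (n + sigma_n ** 2))
    prior_term = np.sum((s - s_nominal) ** 2 / s_sigma ** 2)
    return bin_term + prior_term

# Hypothetical three-bin example: (sigma_i^n)^2 is the sum of squared MC weights.
weights_sq_sum = np.array([4.0, 2.5, 1.0])
print(chi2_mod(n=np.array([120.0, 80.0, 15.0]),
               d=np.array([130, 75, 12]),
               sigma_n=np.sqrt(weights_sq_sum),
               s=np.array([0.1]), s_nominal=np.array([0.0]),
               s_sigma=np.array([1.0])))
```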
7.2.2 Summary of Changes

Table 7.2 summarizes the changes for the one year analysis. Note that the last three changes were also tested on the nine year sample, but the goodness of fit remains < 5% with these changes. All other aspects of the one year sample remain the same as the proposed nine year analysis, since the one year sample is being used as a proof of concept. The changes described are not expected to affect most of the analysis checks previously performed, which were re-run on the one year sample to confirm as much. The only analysis checks whose outcomes change are the projected sensitivity and the systematic impact test, as expected due to the change in statistics. Thus, these checks on the one year sample are discussed in Sections 7.3.1 and 7.3.2.

Change in Analysis       | One Year Sample    | Nine Year Sample
Livetime (yrs)           | 0.92               | 9.28
Bins masked              | 58                 | 10
Test statistic           | χ²_mod             | LLH
Muon error               | bootstrap estimate | none
Muon KDE energy          | log bins           | linear bins
Detector systematic sets | 21                 | 30

Table 7.2: Changes made for the one year analysis to handle the updates and the reduction in statistics.

7.3 Adjusted Expectations with One Year of Livetime

7.3.1 Sensitivity Projection for One Year

The sensitivity projection is run using the new livetime of 0.92 years. Figure 7.2 shows the change in sensitivity between the one year (red) and full nine year (black) FLERCNN analysis samples. The main difference is due to the changed sample size, as the analysis pipeline has not otherwise changed significantly. The results from other IceCube samples and other experiments are included for comparison. It is expected that only one year of IceCube livetime will not be competitive with previous results; this is necessary to avoid biasing the result of the nine year analysis sample.

Figure 7.2: Projected sensitivity for the one year analysis (red) vs. the nine year analysis (black), where the expected contour is wider with less statistics.

7.3.2 Impact of Systematics on One Year Analysis

The other expected difference is the mismodeling impact of the systematic parameters on the one year analysis. It is now dominated by the statistical uncertainty, so each of the systematic parameters generally has less mismodeling potential. Figure 7.3 shows the comparison between the mismodeling for one year (left) and the full nine year FLERCNN sample (right). Note that the test statistic changed between these two tests (Table 7.2), but both TS are assumed to follow χ² distributions, so their σ critical values remain the same.

As expected, the magnitude of the mismodeling for each systematic parameter is smaller for the one year sample than for the nine year sample. The one year sample has a maximum mismodeling impact of < 3σ for the DOM efficiency, whereas the nine year sample's DOM efficiency mismodeling potential surpasses 5σ. The only exception to this trend in the difference of magnitude is N_ν, which has a slightly larger overall impact (> 0.5σ) in the one year sample than this same parameter in the nine year sample (> 0.3σ).
The difference is minor, and could be due to the fact that the nine year sample is sensitive to more systematics, so that other systematic parameters can adjust to compensate for a mismodeled neutrino normalization. The same 16 systematic parameters are kept free for the one year fit despite the decrease in the magnitude of the mismodeling impact. This is motivated by the fact that the one year analysis is a proof of concept for the nine year analysis, so it uses the same systematics list. Thus, Table 6.2 still accurately describes the settings for the systematic parameters used in the one year analysis.

Figure 7.3: The new systematic mismodeling plot is on the left for the one year sample, compared with the right plot (from Figure 6.11) for the nine year sample.

7.3.3 Pre-Fit Data/MC Agreement for One Year

With the decrease in statistics, the error bars on the data/MC agreement comparison are larger. The same trends are evident here that were seen in the nine year analysis data/MC agreement plots, discussed in Section 6.3.2. This is expected, since Section 6.3.1 shows consistency between most variables over the total ten years. Figures 7.4, 7.5, and 7.6 are included for completeness.

Figure 7.4: Data/MC pre-fit agreement for one year of livetime with the reconstructed energy (left) and reconstructed zenith angle (right).

Figure 7.5: Data/MC pre-fit agreement for one year of livetime with the CNN classified PID (left) and BDT muon classification (right).

Figure 7.6: Data/MC pre-fit agreement for one year of livetime with the starting vertex reconstructed ρ₃₆ (left) and reconstructed z position (right).

7.4 Blind Fit

The blind fit procedure and stop conditions remain the same for the one year analysis (Section 6.6.1). The one year analysis passes the blind fit checks:

✓ Successful fit of normal and inverted hierarchy
✓ Systematic pulls fall within the stop conditions (Figure 7.7 for Gaussian pulls, Figure 7.13 for a complete picture)
✓ Pulls in the analysis bins fall within the stop conditions (Figure 7.8)
✓ Goodness of fit TS p-value > 5% (Figure 7.9)
✓ Data/MC distributions for energy, zenith, and PID agree with p-value > 5% (Section 7.4.1)

Figure 7.7: Systematic pulls for the one year blind fit shown in terms of σ for parameters with a Gaussian prior.

7.4.1 Data/MC Agreement using Best Fit Values

The data/MC agreement is compared after the fit in Figures 7.10, 7.11, and 7.12. Note that the comparison now relies on the fitted N_ν and N_μ systematic parameters to scale the MC to the data, rather than the artificial scaling applied during plotting with the pre-fit distributions.
First, we will focus on the three FLERCNN variables that contribute directly to the analysis: energy, zenith, and PID. The energy has good data/MC agreement across the 5–100 GeV range, with no bias or trend evident. The p-value comparing the data and MC distributions is 0.141, which is greater than the required 5% threshold. The zenith distribution bias also no longer exists after the fit, with a large improvement in the p-value up to 0.773 post-fit. The PID continues to have good agreement after the fit as well, with a p-value remaining near 1.0.

All post-fit p-values for the core reconstructed and classified variables using FLERCNN (or FLERCNN as input for the BDT) show good post-fit agreement via this metric. The post-fit p-values show either large improvements or minimal change for the data/MC comparison (Table 7.3). Notably, the zenith distribution's p-value improved by five orders of magnitude, and the muon BDT score distribution's p-value more than doubled. Variables that saw a decreased p-value only changed by < 0.05, and none of them drop below the 5% threshold.

Figure 7.10: Post-fit data/MC agreement for the energy (left) and zenith angle (right).

Figure 7.11: Post-fit data/MC agreement for the PID score (left) and background muon BDT score (right).

Figure 7.12: Post-fit data/MC agreement for the ρ36 (left) and z (right).

Reconstructed       Post-Fit p-value    Pre-Fit p-value
energy              0.135               0.174
cos θz              0.773               1.6 × 10⁻⁶
PID score           0.949               0.965
Muon BDT score      0.231               0.102
ρ36                 0.090               0.113
z                   0.985               0.993

Table 7.3: Agreement between the data and post-fit MC using the best fit values for the physics and systematic parameters, quantified by the p-value pre- vs. post-fit.

7.5 Best Fit Result

The result for the best fit is sin²(θ23) = 0.634 (+0.049/−0.300) and Δm²32 = 2.41 (+0.22/−0.21) × 10⁻³ eV². Table 7.4 contains the best fit results for all the physics and systematic parameters after the FLERCNN 0.92 year fit, including the pulls from their priors when applicable. Figure 7.13 gives the full picture of this table, including the boundaries for the stop conditions where applicable. The notable result is that the mixing angle is fit off-maximal, closer to 52° rather than 45°, but is still consistent with the fit at maximal at the 1σ level. Since this analysis is performed with low statistics, the uncertainty and resulting contours are quite large, as intended so as not to bias the results of the nine year FLERCNN analysis. None of the physics or systematic parameters hit concerning boundaries, as confirmed during the blind fit checks, indicating that they are consistent with expectations (Figure 7.13).

Parameter         Fitted Value    Pull (σ)    Nominal Value
sin²(θ23)         0.626           -           0.506
Δm²32             0.00241         -           0.00241
DOM eff           1.0314          0.314       1.0
Hole ice p0       -0.446          -           0.101569
Hole ice p1       -0.0294         -           -0.049344
Ice scattering    1.07            0.231       1.05
Ice absorption    0.977           -0.465      1.0
Δγν               0.00567         0.0567      0
M_A,QE            0.0810          0.0810      0
M_A,res           -0.139          -0.139      0
DIS               0.0748          0.0748      0
Nν                0.862           -           1.0
Nμ                1.15            0.372       1.0
Barr, g_π+        -0.0209         -0.0695     0
Barr, h_π+        -0.0360         -0.240      0
Barr, i_π+        -0.0837         -0.137      0
Barr, w_K+        0.0601          0.150       0
Barr, y_K+        0.0277          0.0924      0

Table 7.4: Best fit values for the physics and systematic parameters using one year of livetime with the FLERCNN sample. The pull describes the difference in σ for the parameters with Gaussian priors.

7.5.1 Best Fit Values and Contours

The contour is determined by scanning values of Δm²32 and sin²(θ23) and performing the fit again at each point. The χ²_mod at the best fit point is subtracted from the χ²_mod at each scan point, and the difference is compared to the χ² distribution's critical values to determine the 90% and 1σ confidence intervals.
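Assuming the Δχ² for the two physics parameters follows a χ² distribution with two degrees of freedom (Wilks' theorem, the standard choice for a joint two-parameter contour), the critical values and the resulting confidence region can be sketched as follows; whether the analysis additionally profiles each parameter for the 1D projections is not shown here.

    import numpy as np
    from scipy.stats import chi2

    # Critical values for a two-parameter confidence region under Wilks' theorem.
    crit_90 = chi2.ppf(0.90, df=2)        # ~4.61
    crit_1sigma = chi2.ppf(0.6827, df=2)  # ~2.30

    def confidence_region(chi2_scan, chi2_best, crit):
        # chi2_scan: chi^2_mod evaluated on the (sin^2 theta_23, dm^2_32) grid;
        # chi2_best: chi^2_mod at the best fit point. Grid points whose
        # difference falls below the critical value lie inside the region.
        return (chi2_scan - chi2_best) <= crit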
Figure 7.14 focuses on the result of the one year sample, including its contours and best fit point. Figure 7.15 shows the FLERCNN 0.92 year sample in red, compared to previous IceCube results (left plot) and the most recent results from other neutrino experiments (right plot). The best fit is indicated by the cross, and the comparison to IceCube results shows that not only does the best fit align closely with the other 90% contours, but it also agrees very well with the Δm²32 result from the IceCube 7.5 year golden neutrino sample. The off-maximal fit for sin²(θ23) is not unique, with the IceCube 2019 result seeing a similar sin²(θ23) fit, but the 1σ contours still do not eliminate the possibility of maximal mixing. It is important to note that the contours for the one year fit are wide, as expected, and the best fit point carries a large uncertainty due to the low statistics. This comparison is reassuring that the FLERCNN sample produces results that agree with past IceCube and other neutrino experiment results, and it should yield much more precise conclusions once the 9.28 year sample is unblinded.

Since the one year sensitivity projection was performed using the IceCube 7.5 year best fit point at maximal mixing, there is a shape mismatch between the sensitivity projection (Figure 7.2) and the actual contour around the off-maximal best fit. To confirm that the result contour is aligned with the analysis expectation, the sensitivity projection was re-run using the FLERCNN result's best fit point. Figure 7.16 confirms the sensitivity and result are consistent when using an off-maximal best fit.

Figure 7.13: Result of the FLERCNN one year fit, shown with the explored range (range of the number line plot), Gaussian prior where relevant (blue shaded region), nominal value (green dashed line), stop condition boundary (orange lines), and fitted value (red cross). For historical reasons, the analysis software fits for Δm²31, solving for Δm²32 = Δm²31 − Δm²21.

Figure 7.14: Result of the one year analysis, with the 90% contour in red, the 1σ contour in blue, and the best fit point marked by the black X. The 1D projections for the physics parameters are shown on top of and to the right of the main plot, with the χ²_mod value shown for each physics parameter.

Figure 7.15: Result of the FLERCNN one year contours compared to other IceCube νμ disappearance analyses (left) [16, 17, 8] and global experimental results (right) [4, 5, 6, 7].

Figure 7.16: Projected sensitivity of the one year analysis using the one year result best fit point.

CHAPTER 8

CONCLUSIONS

This work has introduced the Fast Low Energy Reconstruction using Convolutional Neural Networks (FLERCNN) as an alternative, fast reconstruction method for low energy neutrino events in the IceCube detector. The FLERCNN method provides five reconstructed variables (neutrino energy, zenith angle, and x, y, z positions) and the scores for the track-like and muon-like classifications for each event. The performance of these networks is comparable to the likelihood-based reconstruction method (Section 4.3), and notably improves the reconstruction speed by a factor of 6000. This permits IceCube's large atmospheric neutrino MC and data samples to be fully reconstructed in days rather than months. This speedup opens the potential for faster analysis pipelines and for expanding the MC systematic sets for even more sophisticated handling of systematic parameters, where the simulation generation, rather than the reconstruction itself, will be the bottleneck.

The 9.28 year FLERCNN analysis is a collaborative effort by the IceCube group, led by the author and Dr. Shiqi Yu. Extensive checks have been performed to prove the viability of using an analysis sample defined by the FLERCNN reconstruction and cuts.
The FLERCNN sample selection has shown reasonable pre-fit data/MC agreement, and the reconstructed parameters in the data samples have similar distributions between years (Section 6.3). Numerous systematic parameters are accounted for and their potential impact on the analysis is quantified, including parameters affecting the neutrino flux, the cross sections, and the detector (Section 6.4). The analysis pipeline itself is confirmed using multiple tests that explore the recovery potential of the fit for both the physics and systematic parameters in pseudodata (Section 3.1.2). While all of these tests pass the minimum requirements agreed upon by the IceCube collaborators, there are some small anomalies that were judged not to have a significant impact on the final results.

The blind fit performed on the 9.28 year sample resulted in small pulls per systematic parameter and per bin, when compared with the MC expectations, that were below the predetermined thresholds. The goodness of fit, however, yielded a p-value of 2%, failing the > 5% threshold predetermined to pause the analysis. While several potential sources of mismodeling have been put forward to explain the goodness of fit result (Section 6.6.3), comprehensive evaluation of these issues will be time consuming. Thus, 10% of the total sample was taken to use as a confirmatory proof of concept for a νμ disappearance analysis using the FLERCNN sample.

The results of the one year analysis show good agreement with both previous IceCube analyses and global experimental measurements (Section 7.5.1). The small statistics of the sample do not lead to constraining contours, as intended, but the predicted sensitivity for the analysis aligns with the results. This indicates that the predicted sensitivity for the 9.28 year sample should hold as expected, and thus be comparable to the global constraints from accelerator oscillation experiments. This is particularly exciting since IceCube probes a different source of neutrinos over multiple baselines compared to single-baseline accelerator experiments. This will lead to similarly constraining contours on the oscillation parameters using multiple sources, energies, and baselines. Constraining these parameters leads us one step closer to understanding the many open questions about neutrinos and their behavior in the Standard Model and beyond.

8.1 Potential for Future Work on FLERCNN

The FLERCNN framework was developed over three years with multiple contributions from undergraduate students, graduate students, and one postdoctoral researcher. The author developed the framework and documentation, along with training other collaborators on the tools. While FLERCNN has reached a stage enabling it to be used as a robust, fast reconstruction for an IceCube oscillation analysis, there are future developments that could further improve the existing and future capability of the tools. Some potential suggestions are outlined in the following sections.

8.1.1 Optimize Current Architecture

To improve the current FLERCNN regression and classification networks, metaparameter optimization could be explored in future studies. Some initial studies of the neural network's number of layers and nodes have been performed by undergraduate Brandon Pries, under the guidance of the author. These did not reveal any clear improvements on a first pass, but this could be revisited now that the training samples have been optimized.
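As an illustration of what such a study could look like, the sketch below scans a small grid of layer and filter counts with Keras [48]. The architecture, input shape, output, and grid values are placeholders for illustration only, not the FLERCNN configuration.

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_cnn(n_blocks, n_filters, input_shape=(8, 60, 5)):
        # Hypothetical CNN builder for a metaparameter scan; the input shape
        # and single regression output are placeholders.
        model = keras.Sequential([keras.Input(shape=input_shape)])
        for _ in range(n_blocks):
            model.add(layers.Conv2D(n_filters, kernel_size=3,
                                    padding="same", activation="relu"))
            model.add(layers.BatchNormalization())
        model.add(layers.Flatten())
        model.add(layers.Dense(1))  # e.g. one reconstructed quantity
        model.compile(optimizer="adam", loss="mae")
        return model

    # Each configuration in the grid would be trained and compared on the
    # same validation sample.
    for n_blocks in (2, 3, 4):
        for n_filters in (32, 64, 128):
            model = build_cnn(n_blocks, n_filters)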
In addition, there are other metaparameters that have received some limited study (such as the loss functions and learning rate schedules); these studies led to the currently chosen settings, but they could be conducted more systematically for each individual FLERCNN network. One theory for the jagged verification loss plots (e.g., the cyan line in Figure 4.4b) is that they are due to the use of batch normalization: although batch normalization, which normalizes the layers over each subset of data (or batch) used during training, is typically known to lead to smoother, more stable behavior of the loss gradients [54], it may behave differently here. Part of the architecture optimization could be to explore other options, such as instance normalization [66], to see whether this normalization handles the sparse nature of the input data better and leads to more stable training. One could also explore optimizations specific to sparse input arrays. Other neutrino physics experiments have even developed their own sparse CNNs [67], which could potentially be adapted to IceCube.
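A minimal sketch of such a normalization swap in Keras is shown below; GroupNormalization with groups=-1 is equivalent to instance normalization (available in TensorFlow ≥ 2.11), and whether it actually stabilizes training on these sparse inputs would need to be tested. The block structure itself is a hypothetical example, not the FLERCNN architecture.

    from tensorflow.keras import layers

    def conv_block(x, n_filters, use_instance_norm=False):
        # Hypothetical convolution block where the normalization layer can be
        # swapped out for comparison studies.
        x = layers.Conv2D(n_filters, kernel_size=3, padding="same")(x)
        if use_instance_norm:
            # groups=-1 normalizes each channel of each sample independently,
            # i.e. instance normalization [66] (TensorFlow >= 2.11).
            x = layers.GroupNormalization(groups=-1)(x)
        else:
            x = layers.BatchNormalization()(x)
        return layers.Activation("relu")(x)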
Another addition to the current FLERCNN setup would be to add input from the outer IceCube strings. While the low energy samples themselves are tuned to only save events with interactions in the DeepCore array, the inclusion of summarized information from the outer rings of IceCube strings has shown some promise in improving resolutions, as seen in a study by undergraduate student Emma Hettinger. This summary input allows the network to use information from the outer strings without overemphasizing them and without excessively increasing the burden of the training file sizes, which are already quite large (Table 4.5). This could lead to improved resolution for some of the FLERCNN reconstructions, with the zenith and vertex reconstructions particularly benefiting from additional information from DOMs outside the currently included region.

8.1.2 Expand FLERCNN Reconstruction Output

The FLERCNN architecture and training samples could be used to train for additional variables, such as the ending position of the muon track from νμ CC interactions or a prediction of the network's own reconstruction uncertainty. Both of these variables could be used to improve the purity of the sample by cutting lower quality events. Ensuring that the ending position of the track is within DeepCore is a cut that has been used for past IceCube oscillation analyses [17] and also showed improvements in training the zenith angle CNN (Section 4.5a), so training for this variable would help to apply a cut keeping only well-reconstructed events. Simultaneously training the networks to predict their own uncertainty has shown promise on other IceCube projects [20], and this could also be used as a purity selection for the sample if implemented. Adding the uncertainty prediction to the energy FLERCNN reconstruction is currently being pursued by graduate student Finn Mayhew.

8.1.3 Broaden FLERCNN Applications

FLERCNN opens the potential for future analyses on the current IceCube geometry. FLERCNN could be used in its current state to explore other oscillation studies, such as ντ appearance [16]. Additional training would be necessary for FLERCNN to be adapted to other studies, such as a GeV dark matter search where the azimuth angle would also need to be reconstructed [68], but these are also potential applications for the networks.

FLERCNN could also be developed to be versatile for other detector geometries. The future IceCube Upgrade intends to deploy seven densely instrumented strings near the center of DeepCore to improve low energy neutrino detection [69]. Another branch in the FLERCNN architecture could be added and developed to reconstruct the next generation of IceCube low energy atmospheric neutrino data. This network would also need to be adapted to the new DOM designs, which employ multi-PMT modules as opposed to the single, downward facing PMT design of the IceCube and DeepCore DOMs. This work is being pursued by graduate student Finn Mayhew.

8.2 Potential for Future Work on IceCube Neutrino Oscillations

As discussed previously, the 9.28 year FLERCNN analysis is pursuing follow-up studies to explore potential contributions to the goodness of fit result. Both this analysis and the 9.28 year νμ disappearance analysis using the likelihood-based reconstruction are working to pursue the potential issues under study in Section 6.6.3. Looking beyond the current νμ disappearance study, FLERCNN provides a fast reconstruction solution that is adaptable to optimizing samples for various neutrino oscillation analyses and beyond. With the improved reconstruction speed, even higher statistics of simulation could be generated. This could further help to parameterize the detector systematics, improving the hypersurfaces used in the oscillation studies (Section 6.4.4.1). As mentioned in Section 8.1.3, other studies that have been performed with IceCube's previous low energy samples include ντ appearance [16], the neutrino mass ordering [70], non-standard interactions [71], and sterile neutrino searches [72]. The FLERCNN analysis cuts could be tuned to optimize samples for these specific explorations.

BIBLIOGRAPHY

[1] M. G. Aartsen et al., “PINGU: a vision for neutrino and particle physics at the South Pole”, Journal of Physics G: Nuclear and Particle Physics vol. 44 (5) p. 054006 (2017), doi:10.1088/1361-6471/44/5/054006, URL https://doi.org/10.1088%2F1361-6471%2F44%2F5%2F054006

[2] J. A. Formaggio and G. P. Zeller, “From eV to EeV: Neutrino cross sections across energy scales”, Reviews of Modern Physics vol. 84 (3) pp. 1307–1341 (2012), doi:10.1103/revmodphys.84.1307, URL https://doi.org/10.1103%2Frevmodphys.84.1307

[3] M. Honda et al., “Atmospheric neutrino flux calculation using the NRLMSISE-00 atmospheric model”, Physical Review D vol. 92 (2) (2015), doi:10.1103/physrevd.92.023004, URL https://doi.org/10.1103%2Fphysrevd.92.023004

[4] M. A. Acero et al., “An Improved Measurement of Neutrino Oscillation Parameters by the NOvA Experiment”, (2021), doi:10.48550/ARXIV.2108.08219, URL https://arxiv.org/abs/2108.08219

[5] K. Abe et al., “Improved constraints on neutrino mixing from the T2K experiment with 3.3 × 10²¹ protons on target”, Physical Review D vol. 103 (11) (2021), doi:10.1103/physrevd.103.112008, URL https://doi.org/10.1103%2Fphysrevd.103.112008

[6] Volodymyr Takhistov, “Atmospheric Neutrino Oscillations with Super-Kamiokande”, URL: https://indico.cern.ch/event/868940/contributions/3817174/attachments/2082629/3498300/ICHEP2020_SK_Takhistov.pdf (2020)

[7] P. Adamson et al., “Precision Constraints for Three-Flavor Neutrino Oscillations from the Full MINOS+ and MINOS Dataset”, Physical Review Letters vol. 125 (13) (2020), doi:10.1103/physrevlett.125.131802, URL https://doi.org/10.1103%2Fphysrevlett.125.131802

[8] Summer Blot et al., “Oscnext Verification Sample Analysis (internal)”, Tech. rep.
(2022)

[9] Étienne Bourbeau et al., “oscNext High Stats Sample (internal)”, Tech. rep. (2022)

[10] R. Abbasi et al. (IceCube), “Low Energy Event Reconstruction in IceCube DeepCore”, (2022), doi:10.48550/ARXIV.2203.02303, URL https://arxiv.org/abs/2203.02303

[11] Martin Leuermann, Testing the Neutrino Mass Ordering with IceCube DeepCore, Ph.D. thesis, RWTH Aachen U. (2018), doi:10.18154/RWTH-2018-231554

[12] Dmitry Chirkin and Martin Rongen (ICECUBE), “Light diffusion in birefringent polycrystals and the IceCube ice anisotropy”, PoS vol. ICRC2019 p. 854 (2019), doi:10.22323/1.358.0854

[13] Dmitry Chirkin (ICECUBE), “Evidence of optical anisotropy of the South Pole ice”, in “33rd International Cosmic Ray Conference”, p. 0580 (2013)

[14] Spencer N. Axani, “SPE Templates V2 Technote (internal)”, Tech. rep. (2020)

[15] G. D. Barr, S. Robbins, T. K. Gaisser, and T. Stanev, “Uncertainties in atmospheric neutrino fluxes”, Physical Review D vol. 74 (9) (2006), doi:10.1103/physrevd.74.094009, URL https://doi.org/10.1103%2Fphysrevd.74.094009

[16] M. G. Aartsen et al., “Measurement of atmospheric tau neutrino appearance with IceCube DeepCore”, Physical Review D vol. 99 (3) (2019), doi:10.1103/physrevd.99.032007, URL https://doi.org/10.1103%2Fphysrevd.99.032007

[17] M. G. Aartsen et al. (IceCube), “Measurement of Atmospheric Neutrino Oscillations at 6–56 GeV with IceCube DeepCore”, Phys. Rev. Lett. vol. 120 p. 071801 (2018), doi:10.1103/PhysRevLett.120.071801, URL https://link.aps.org/doi/10.1103/PhysRevLett.120.071801

[18] H. Watanabe et al., “First study of neutron tagging with a water Cherenkov detector”, Astroparticle Physics vol. 31 (4) pp. 320–328 (2009), doi:https://doi.org/10.1016/j.astropartphys.2009.03.002, URL https://www.sciencedirect.com/science/article/pii/S0927650509000401

[19] Alexander Radovic et al., “Machine learning at the energy and intensity frontiers of particle physics”, Nature vol. 560 (7716) pp. 41–48 (2018), doi:10.1038/s41586-018-0361-2, URL https://doi.org/10.1038/s41586-018-0361-2

[20] R. Abbasi et al., “A convolutional neural network based cascade reconstruction for the IceCube Neutrino Observatory”, Journal of Instrumentation vol. 16 (07) p. P07041 (2021), doi:10.1088/1748-0221/16/07/p07041, URL https://doi.org/10.1088/1748-0221/16/07/p07041

[21] Y. Fukuda et al. (Super-Kamiokande Collaboration), “Evidence for Oscillation of Atmospheric Neutrinos”, Phys. Rev. Lett. vol. 81 pp. 1562–1567 (1998), doi:10.1103/PhysRevLett.81.1562, URL https://link.aps.org/doi/10.1103/PhysRevLett.81.1562

[22] Q. R. Ahmad et al. (SNO Collaboration), “Direct Evidence for Neutrino Flavor Transformation from Neutral-Current Interactions in the Sudbury Neutrino Observatory”, Phys. Rev. Lett. vol. 89 p. 011301 (2002), doi:10.1103/PhysRevLett.89.011301, URL https://link.aps.org/doi/10.1103/PhysRevLett.89.011301

[23] P. A. Zyla et al. (Particle Data Group), “Review of Particle Physics”, PTEP vol. 2020 (8) p. 083C01 (2020), doi:10.1093/ptep/ptaa104

[24] D. Griffiths, Introduction to Elementary Particles, Wiley-VCH, second, revised edn. (2008)

[25] K. Hirata et al., “Observation of a neutrino burst from the supernova SN1987A”, Phys. Rev. Lett. vol. 58 (14) pp. 1490–1493 (1987), doi:10.1103/PhysRevLett.58.1490, URL https://ui.adsabs.harvard.edu/abs/1987PhRvL..58.1490H

[26] B. Pontecorvo, “Inverse beta processes and nonconservation of lepton charge”, Zh. Eksp. Teor. Fiz. vol. 34 p.
247 (1957)

[27] Ziro Maki, Masami Nakagawa, and Shoichi Sakata, “Remarks on the Unified Model of Elementary Particles”, Progress of Theoretical Physics vol. 28 (5) pp. 870–880 (1962), doi:10.1143/PTP.28.870, https://academic.oup.com/ptp/article-pdf/28/5/870/5258750/28-5-870.pdf, URL https://doi.org/10.1143/PTP.28.870

[28] Mark Thomson, Neutrinos and Neutrino Oscillations, Cambridge University Press (2013)

[29] M. C. Gonzalez-Garcia and Michele Maltoni, “Phenomenology with massive neutrinos”, Physics Reports vol. 460 (1-3) pp. 1–129 (2008), doi:10.1016/j.physrep.2007.12.004, URL https://doi.org/10.1016%2Fj.physrep.2007.12.004

[30] Kaoru Hagiwara, Naotoshi Okamura, and Ken-ichi Senda, “The earth matter effects in neutrino oscillation experiments from Tokai to Kamioka and Korea”, Journal of High Energy Physics vol. 2011 (9) (2011), doi:10.1007/jhep09(2011)082, URL https://doi.org/10.1007%2Fjhep09%282011%29082

[31] Costas Andreopoulos et al., “The GENIE Neutrino Monte Carlo Generator: Physics and User Manual”, (2015), doi:10.48550/ARXIV.1510.05494, URL https://arxiv.org/abs/1510.05494

[32] C. Andreopoulos et al., “The GENIE neutrino Monte Carlo generator”, Nucl. Instrum. Methods Phys. Res. A vol. 614 (1) pp. 87–104 (2010), doi:https://doi.org/10.1016/j.nima.2009.12.009, URL https://www.sciencedirect.com/science/article/pii/S0168900209023043

[33] Júlia Tena-Vidal et al. (GENIE Collaboration), “Neutrino-nucleon cross-section model tuning in GENIE v3”, Phys. Rev. D vol. 104 p. 072009 (2021), doi:10.1103/PhysRevD.104.072009, URL https://link.aps.org/doi/10.1103/PhysRevD.104.072009

[34] Amanda Cooper-Sarkar, Philipp Mertsch, and Subir Sarkar, “The high energy neutrino cross-section in the Standard Model and its uncertainty”, Journal of High Energy Physics vol. 2011 (8) (2011), doi:10.1007/jhep08(2011)042, URL https://doi.org/10.1007%2Fjhep08%282011%29042

[35] T. K. Gaisser and M. Honda, “Flux of Atmospheric Neutrinos”, Annual Review of Nuclear and Particle Science vol. 52 (1) pp. 153–199 (2002), doi:10.1146/annurev.nucl.52.050102.090645, URL https://doi.org/10.1146%2Fannurev.nucl.52.050102.090645

[36] Sebastian Euler, Observation of oscillations of atmospheric neutrinos with the IceCube Neutrino Observatory, Ph.D. thesis, RWTH Aachen Univ. (Germany) (2013)

[37] M. Honda, T. Kajita, K. Kasahara, and S. Midorikawa, “New calculation of the atmospheric neutrino flux in a three-dimensional scheme”, Phys. Rev. D vol. 70 p. 043008 (2004), doi:10.1103/PhysRevD.70.043008, URL https://link.aps.org/doi/10.1103/PhysRevD.70.043008

[38] K. Abe et al., “Atmospheric neutrino oscillation analysis with external constraints in Super-Kamiokande I-IV”, Physical Review D vol. 97 (7) (2018), doi:10.1103/physrevd.97.072001, URL https://doi.org/10.1103%2Fphysrevd.97.072001

[39] M. G. Aartsen et al. (IceCube), “The IceCube Neutrino Observatory: Instrumentation and Online Systems”, JINST vol. 12 (03) p. P03012 (2017), doi:10.1088/1748-0221/12/03/P03012, 1612.05093

[40] P. A. Čerenkov, “Visible Radiation Produced by Electrons Moving in a Medium with Velocities Exceeding that of Light”, Phys. Rev. vol. 52 pp. 378–379 (1937), doi:10.1103/PhysRev.52.378, URL https://link.aps.org/doi/10.1103/PhysRev.52.378

[41] M. G. Aartsen et al., “Measurement of South Pole ice transparency with the IceCube LED calibration system”, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment vol. 711 pp.
73–89 (2013), doi:10.1016/j.nima.2013.01.054, URL https://doi.org/10.1016%2Fj.nima.2013.01.054

[42] M. Rongen, R. C. Bay, and S. Blot, “Observation of an optical anisotropy in the deep glacial ice at the geographic South Pole using a laser dust logger”, The Cryosphere vol. 14 pp. 2537–2543 (2020), doi:10.5194/tc-14-2537-2020

[43] Martin Rongen, “Measuring the optical properties of IceCube drill holes”, EPJ Web of Conferences vol. 116 p. 06011 (2016), doi:10.1051/epjconf/201611606011, URL https://doi.org/10.1051/epjconf/201611606011

[44] Summer Blot et al., “oscNext (internal)”, Tech. rep. (2022)

[45] Yee Lam Elim Cheung, Neutrino Interactions in IceCube above 1 TeV: Constraints on Atmospheric Charmed-Meson Production and Investigation of the Astrophysical Neutrino Flux with 2 Years of IceCube Data taken 2010–2012, Ph.D. thesis, University of Wisconsin-Madison (2014), (https://docushare.icecube.wisc.edu/dsweb/Get/Document-72536/thesis.pdf, page 130)

[46] Michael J. Larson, Simulation and identification of non-Poissonian noise triggers in the IceCube neutrino detector, Ph.D. thesis, University of Alabama (2014), (https://ir.ua.edu/handle/123456789/1927)

[47] Yee Lam Elim Cheung, Measurement of Atmospheric Neutrino Oscillation Parameters Using Three Years of IceCube-DeepCore Data, Ph.D. thesis, University of Maryland (2018), (https://drum.lib.umd.edu/bitstream/handle/1903/21383/Cheung_umd_0117E_19177.pdf)

[48] François Chollet et al., “Keras”, https://keras.io (2015)

[49] Martín Abadi et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems”, (2015), software available from tensorflow.org, URL https://www.tensorflow.org/

[50] Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, Commun. ACM vol. 60 p. 84 (2017), doi:10.1145/3065386

[51] Christian Szegedy et al., “Going Deeper with Convolutions”, CoRR vol. abs/1409.4842 (2014), 1409.4842, URL http://arxiv.org/abs/1409.4842

[52] Yann LeCun et al., “Handwritten Digit Recognition with a Back-Propagation Network”, in D. Touretzky (Ed.), “Advances in Neural Information Processing Systems”, vol. 2, Morgan-Kaufmann (1989), URL https://proceedings.neurips.cc/paper/1989/file/53c3bce66e43be4f209556518c2fcb54-Paper.pdf

[53] Arohan Ajit, Koustav Acharya, and Abhishek Samanta, “A Review of Convolutional Neural Networks”, in “2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE)”, pp. 1–5 (2020), doi:10.1109/ic-ETITE47903.2020.049

[54] Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry, “How Does Batch Normalization Help Optimization?”, in S. Bengio et al. (Eds.), “Advances in Neural Information Processing Systems”, vol. 31, Curran Associates, Inc. (2018), URL https://proceedings.neurips.cc/paper/2018/file/905056c1ac1dad141560467e0a99e1cf-Paper.pdf

[55] Baijayanta Roy, “All about Feature Scaling”, (2020), URL https://towardsdatascience.com/all-about-feature-scaling-bcc0ad75cb35

[56] Jiefeng Chen et al., “Robust Out-of-distribution Detection for Neural Networks”, (2020), doi:10.48550/ARXIV.2003.09711, URL https://arxiv.org/abs/2003.09711

[57] IceCube collaboration, “PISA”, https://user-web.icecube.wisc.edu/~peller/pisa_docs/index.html (2021)

[58] M. G. Aartsen et al.
(IceCube), “Computational Techniques for the Analysis of Small Signals in High-Statistics Neutrino Oscillation Experiments”, (2019), 1803.05390

[59] Anatoli Fedynitch, “mceq”, https://mceq.readthedocs.io/en/latest/ (2022)

[60] Adam M. Dziewonski and Don L. Anderson, “Preliminary reference Earth model”, Physics of the Earth and Planetary Interiors vol. 25 (4) pp. 297–356 (1981), doi:https://doi.org/10.1016/0031-9201(81)90046-7, URL https://www.sciencedirect.com/science/article/pii/0031920181900467

[61] Super-Kamiokande collaboration, “Prob3++”, https://github.com/rogerwendell/Prob3plusplus (2018-2022)

[62] Piti Ongmongkolkul and the iminuit team, “iminuit”, https://iminuit.readthedocs.io/en/stable/index.html (2020)

[63] F. James and M. Roos, “Minuit - a system for function minimization and analysis of the parameter errors and correlations”, Computer Physics Communications vol. 10 (6) pp. 343–367 (1975), doi:https://doi.org/10.1016/0010-4655(75)90039-9, URL https://www.sciencedirect.com/science/article/pii/0010465575900399

[64] M. Winkler and F. James, “Minuit2”, https://root.cern.ch/doc/master/md_math_minuit2_doc_Minuit2.html (2022)

[65] Rasha Abbasi et al., “A novel microstructure based model to explain the IceCube ice anisotropy”, in “Proceedings of 37th International Cosmic Ray Conference — PoS(ICRC2021)”, vol. 395, p. 1119 (2021), doi:10.22323/1.395.1119

[66] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky, “Instance Normalization: The Missing Ingredient for Fast Stylization”, (2016), doi:10.48550/ARXIV.1607.08022, URL https://arxiv.org/abs/1607.08022

[67] P. Abratenko et al., “Semantic segmentation with a sparse convolutional neural network for event reconstruction in MicroBooNE”, Physical Review D vol. 103 (5) (2021), doi:10.1103/physrevd.103.052012, URL https://doi.org/10.1103%2Fphysrevd.103.052012

[68] R. Abbasi et al. (IceCube Collaboration), “Search for GeV-scale dark matter annihilation in the Sun with IceCube DeepCore”, Phys. Rev. D vol. 105 p. 062004 (2022), doi:10.1103/PhysRevD.105.062004, URL https://link.aps.org/doi/10.1103/PhysRevD.105.062004

[69] Aya Ishihara (IceCube Collaboration), “The IceCube Upgrade – Design and Science Goals”, (2019), doi:10.48550/ARXIV.1908.09441, URL https://arxiv.org/abs/1908.09441

[70] M. G. Aartsen et al., “Development of an analysis to probe the neutrino mass ordering with atmospheric neutrinos using three years of IceCube DeepCore data”, The European Physical Journal C vol. 80 (1) (2020), doi:10.1140/epjc/s10052-019-7555-0, URL https://doi.org/10.1140%2Fepjc%2Fs10052-019-7555-0

[71] R. Abbasi et al., “All-flavor constraints on nonstandard neutrino interactions and generalized matter potential with three years of IceCube DeepCore data”, Physical Review D vol. 104 (7) (2021), doi:10.1103/physrevd.104.072006, URL https://doi.org/10.1103%2Fphysrevd.104.072006

[72] M. G. Aartsen et al., “Search for sterile neutrino mixing using three years of IceCube DeepCore data”, Physical Review D vol. 95 (11) (2017), doi:10.1103/physrevd.95.112002, URL https://doi.org/10.1103%2Fphysrevd.95.112002