SPARSE GRID DISCONTINUOUS GALERKIN METHODS FOR NONLINEAR OPTICS AND MATHEMATICAL MODELING OF ASYNCHRONOUS DATA FLOW IN PARALLEL COMPUTERS By Kai Huang A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Applied Mathematics – Doctor of Philosophy 2022 ABSTRACT SPARSE GRID DISCONTINUOUS GALERKIN METHODS FOR NONLINEAR OPTICS AND MATHEMATICAL MODELING OF ASYNCHRONOUS DATA FLOW IN PARALLEL COMPUTERS By Kai Huang This thesis consists of two parts: the first part discusses the Sparse Grid Discontinuous Galerkin (SGDG) method and its adaptive version, and their applications in Maxwell equations in nonlinear optical media [1, 2]; the second part discusses a Hamilton-Jacobi model of asynchronous data flow in parallel computers, and corresponding numerical simulation using Weighted Essentially Non-Oscillatory (WENO) method. SGDG method and its adaptive version were developed in recent years [3, 4, 5, 6, 7], to numerically solve different linear or nonlinear PDE problems, reducing degrees of free- dom and computational cost. Compared to the previous works, this thesis mainly focuses on fully implicit SGDG method of nonlinear equations, which broadens the applications of the method. To achieve this goal, the existing SGDG package [8] is coupled with nonlin- ear solvers in PETSc [9]. Numerical simulations of several model equations and physical relevant problems are presented to demonstrate accuracy and robustness of the method. Presented in second part of the thesis are models of data flow on processors in a high performance computing framework involving computations necessitating inter-processor communications [10]. First comes an ordinary differential model, and its asymptotic limit results in a model which treats the computer as a continuum of processors and data flow as an Eulerian fluid governed by a conservation law. We derive a Hamilton-Jacobi equa- tion associated with this conservation law for which the existence and uniqueness of so- lutions can be proved. High order WENO interpolation [11, 12], together with strong stability preserving (SSP) method for time discretization, is applied to simulation of the Hamilton-Jacobi model. We then present the results of numerical experiments for both discrete and continuum models; these show a qualitative agreement between the two and the effect of variations in the computing environment’s processing capabilities on the progress of the modeled computation. Copyright by KAI HUANG 2022 ACKNOWLEDGEMENTS Over the five years study in Michigan State University, I have received tremendous help from a lot of people. First and foremost, I offer my deepest appreciation to my thesis advisor, Professor Yingda Cheng. Not only does she provide me guidance on my the- sis, but also she gives me valuable suggestions and generous helps throughout my PhD study. Without her patience and support in every aspect, this thesis would not have been possible. I would also thank all of my committee members, Professor Daniel Appelo, Professor Andrew Christlieb, and Professor Jianliang Qian, for their kindness and help. I am also indebted to my undergraduate advisor, Professor Mengping Zhang, for introducing me to the research of numerical analysis. Support from colleagues is always crucial in large research projects. I would thank my colleagues, Dr. Juntao Huang, Professor Zhanjing Tao, Professor Yan Jiang, Professor Wei Guo, Professor Yuan Liu, for their help and advice in several different parts of the thesis and my research, especially to the adaptive SGDG package and WENO method. I would also thank Dr. Richard C. Bernard, Dr. Cory Hauck, Dr. Xianzhu Tang, Dr. Qi Tang, William Sands, for their help in many aspects of my research career. Besides, I would appreciate my friends, Anqi Chen, Andres Felipe Galindo-Olarte, Eric Cheuk-Wai Yau, Jing Huang, Rui Wang, Yao Li, Runze Su, Jialin Qu, Jian Song, Zhichao Peng, etc. for their companion and kind help in the many years. Last but not the least, I am eternally grateful to my parents, Haizhuang Huang and Xianhua Liu. Their persistent support and unconditional love encourage me to pursue higher goals in my life, and make me who I am today. v TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix LIST OF ALGORITHMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Discontinuous Galerkin (DG) Method . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Sparse Grid Discontinuous Galerkin (SGDG) Method . . . . . . . . . . . . . 3 1.3 Weighted Essentially Non-Oscillatory (WENO) Method . . . . . . . . . . . . 5 1.4 Organization of Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 CHAPTER 2 APPLICATION OF SGDG METHOD TO MAXWELL’S EQUA- TION IN NONLINEAR OPTICS . . . . . . . . . . . . . . . . . . . . . 11 2.1 Maxwell’s Equation in Nonlinear Optics . . . . . . . . . . . . . . . . . . . . . 11 2.2 Formulation of Energy Stable Adaptive SGDG Method in One Dimension . 15 2.2.1 SGDG Method and Alpert’s Multiwavelets . . . . . . . . . . . . . . . 16 2.2.2 Semi-Discrete DG Method for the Maxwell’s Equation . . . . . . . . . 18 2.2.3 Energy-Stable DG Method of the Maxwell Equation . . . . . . . . . . 21 2.2.4 Interpolatory Multiwavelets . . . . . . . . . . . . . . . . . . . . . . . . 24 2.2.5 SGDG Method with Multiresolution Interpolation in One Dimension 27 2.3 Formulation of Energy Stable Adaptive SGDG Method in Higher Dimension 28 2.3.1 Semi-Discrete and Fully-Discrete Energy Stable DG Method . . . . . 28 2.3.2 High Dimensional Sparse Grid DG Method and Interpolatory Mul- tiwavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.3.3 Adaptive SGDG Method . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.4 Adaptive SGDG Package and PETSc Implementation . . . . . . . . . . . . . 40 2.4.1 Overview of Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.4.2 Hash Key of SGDG Elements . . . . . . . . . . . . . . . . . . . . . . . 41 2.4.3 SGDG Spatial Discretization . . . . . . . . . . . . . . . . . . . . . . . . 42 2.4.4 Time Integration and PETSc Package . . . . . . . . . . . . . . . . . . . 47 2.5 Numerical Simulation in One Dimensional Case . . . . . . . . . . . . . . . . 52 2.5.1 Kink Shape Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 2.5.2 Soliton Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 2.6 Numerical Simulation in Higher Dimensional Case . . . . . . . . . . . . . . 59 2.6.1 Accuracy and Convergence Test of Linear Equation . . . . . . . . . . 59 2.6.2 Accuracy and Convergence Test for nonlinear Maxwell Equation . . 62 2.6.3 Spatial Optical Soliton Propagation . . . . . . . . . . . . . . . . . . . . 66 CHAPTER 3 APPLICATION OF WENO METHOD IN MATHEMATICAL MODEL OF ASYNCHRONOUS DATA FLOW IN PARALLEL COMPUT- ERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 vi 3.1 Models of Extreme Scale Computers . . . . . . . . . . . . . . . . . . . . . . . 73 3.2 The Discrete Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 3.3 The Continuum Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 3.4 Numerical Methods for Simulations . . . . . . . . . . . . . . . . . . . . . . . 82 3.4.1 ODE Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 3.4.2 Hamilton-Jacobi Implementation . . . . . . . . . . . . . . . . . . . . . 83 3.4.3 WENO Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 3.5 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 CHAPTER 4 CONCLUSIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 APPENDIX A NUMERICAL SIMULATION OF 1D SOLITON PROPAGA- TION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 APPENDIX B NUMERICAL EXPERIMENTS OF ASYNCHRONOUS DATA FLOW MODELS . . . . . . . . . . . . . . . . . . . . . . . . . . 107 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 vii LIST OF TABLES Table 2.1: CFL constant for fully implicit method . . . . . . . . . . . . . . . . . . . . 53 Table 2.2: Errors and Orders of Accuracy of E. k = 1, T = 1/v. . . . . . . . . . . . . . 54 Table 2.3: Errors and Orders of Accuracy of J. k = 1, T = 1/v. . . . . . . . . . . . . . 54 Table 2.4: Errors and Orders of Accuracy of E. k = 2, T = 1/v. . . . . . . . . . . . . . 54 Table 2.5: Errors and Orders of Accuracy of J. k = 2, T = 1/v. . . . . . . . . . . . . . 54 Table 2.6: Errors and Orders of Accuracy of E. k = 3, T = 1/v. . . . . . . . . . . . . . 55 Table 2.7: Errors and Orders of Accuracy of J. k = 3, T = 1/v. . . . . . . . . . . . . . 55 Table 2.8: Adaptive Result of P 2 case, using alternating flux . . . . . . . . . . . . . . 56 Table 2.9: Adaptive Result of P 2 case, using upwind flux . . . . . . . . . . . . . . . . 56 Table 2.10: Adaptive Result of P 3 case, using alternating flux . . . . . . . . . . . . . . 56 Table 2.11: Adaptive Result of P 3 case, using upwind flux . . . . . . . . . . . . . . . . 56 Table 2.12: CFL numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Table 2.13: Errors and orders of accuracy of u, for k = 1, at T = 1 . . . . . . . . . . . . 60 Table 2.14: Errors and orders of accuracy of u, for k = 1, at T = 0 . . . . . . . . . . . . 60 Table 2.15: Errors and orders of accuracy of u, for k = 2, at T = 1 . . . . . . . . . . . . 60 Table 2.16: Errors and orders of accuracy of u, for k = 2, at T = 0 . . . . . . . . . . . . 60 Table 2.17: Errors and orders of accuracy of u, for k = 3, at T = 1 . . . . . . . . . . . . 60 Table 2.18: Errors and orders of accuracy of u, for k = 3, at T = 0 . . . . . . . . . . . . 61 Table 2.19: L2 error and order for adaptive scheme, for T = 1 . . . . . . . . . . . . . . 62 Table 2.20: L2 and L∞ errors of Hz , Ex , Ey for Alternating Flux I case . . . . . . . . . . 65 viii LIST OF FIGURES Figure 2.1: Function E and J at t = 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Figure 2.2: Accurate solution, and active elements at T = 1 for ε = 10−5 ; red circles are center of active elements. . . . . . . . . . . . . . . . . . . . . . 63 Figure 2.3: Magnitude |u1 (ζ, ξ)| of Fundamental Soliton and |u2 (ζ, ξ)| of Second Order Soliton, of Solutions of NLSE (2.70). . . . . . . . . . . . . . . . . . 68 Figure 2.4: Comparison for fundamental soliton between absolute value |Hz | of magnetic field (left column) and center of active elements (right col- umn) at different time. First row: T = 2; Second row: T = 4; Third row: T = 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 Figure 2.5: Comparison for second order soliton between absolute value |Hz | of magnetic field (left column) and center of active elements (right col- umn) at different time. First row: T = 1.976; Second row: T = 3.024; Third row: T = 3.459. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Figure 3.1: Schematic of network of processors. Dashed lines denote inter-processor communications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Figure 3.2: Flux Φ(0) defined in (3.30) . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 Figure 3.3: Profiles of the processor speed α used in the numerical experiments. The non-standard orientation of the graphs is set to match the axes in the numerical results that follow. . . . . . . . . . . . . . . . . . . . . . . . 87 Figure 3.4: Comparison of the discrete model with (imax , k max ) = (2500, 500) and the continuum model for η = 0.2 at time t = 0.5. As expected, the dis- crete model shows better agreement with the continuum model than the previous version with only (imax , k max ) = (1000, 200) processors and stages; cf. fig. B.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Figure 3.5: The effect of a highly localized slowdown on ρ . . . . . . . . . . . . . . . 90 Figure 3.6: The effect of small variation in processor speed on ρ. After sufficient time, a profile emerges with the periodicity of α. . . . . . . . . . . . . . . 91 ix Figure A.1: Transient fundamental (M = 1) temporal soliton propagation with the adaptive Sparse Grid DG fully implicit scheme, in Subsection 2.5.2. Maximum mesh level N = 11. Refinement threshold ε = 10−5 , and coarsen threshold is η = 10−6 . First column: piecewise polyno- mial degree k = 1; second column: k = 2; third column: k = 3. First row: alternating flux I; second row: alternating flux II; third row: up- wind flux. Solutions at time T = 40 and T = 80 are plotted on each case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 Figure A.2: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.1. . . 96 Figure A.3: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.1. . . 97 Figure A.4: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.1. . . . . . 98 Figure A.5: Transient fundamental (M = 1) temporal soliton propagation with the adaptive Sparse Grid DG fully implicit scheme, in Subsection 2.5.2. Maximum mesh level N = 11. Refinement threshold ε = 10−3 , and coarsen threshold is η = 10−4 . First column: piecewise polyno- mial degree k = 1; second column: k = 2; third column: k = 3. First row: alternating flux I; second row: alternating flux II; third row: up- wind flux. Solutions at time T = 40 and T = 80 are plotted on each case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 Figure A.6: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.5. . . 100 Figure A.7: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.5. . . 101 Figure A.8: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.5. . . . . . 102 Figure A.9: Transient second order (M = 2) temporal soliton propagation with the adaptive Sparse Grid DG fully implicit scheme, in Subsection 2.5.2. Maximum mesh level N = 11. Refinement threshold ε = 10−5 , and coarsen threshold is η = 10−6 . First column: piecewise polyno- mial degree k = 1; second column: k = 2; third column: k = 3. First row: alternating flux I; second row: alternating flux II; third row: up- wind flux. Solutions at time T = 40 and T = 80 are plotted on each case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 x Figure A.10:Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.9. . . 104 Figure A.11:Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.9. . . 105 Figure A.12:Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.9. . . . . . 106 Figure B.1: Discrete solution r and continuum solution ρ of η = 0.2, in Example 3.5.1. From left to right, columns correspond to solutions at t = 0.1, t = 0.25, and t = 0.5. Discrete solution is computed with (imax , k max ) = (1000, 200). Continuum solution is computed on a 103 × 103 mesh. . . . . 107 Figure B.2: Discrete solution r and continuum solution ρ of η = 1 case, in Ex- ample 3.5.1. From left to right, columns correspond to solutions at t = 0.1, t = 0.25, and t = 0.5. Discrete solution is computed with (imax , k max ) = (500, 500). Continuum solution is computed on a 103 × 103 mesh. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 Figure B.3: Discrete solution r and continuum solution ρ of η = 5 case, in Ex- ample 3.5.1. From left to right, columns correspond to solutions at t = 0.1, t = 0.25, and t = 0.5. Discrete solution is computed with (imax , k max ) = (200, 1000). Continuum solution is computed on a 103 × 103 mesh. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 Figure B.4: The effects on ρ due to variations in η, in Example 3.5.2. As η in- creases, the throttling effect of local slowdown spreads more quickly, and data is not processed as quickly. . . . . . . . . . . . . . . . . . . . . . 110 Figure B.5: The effects on ρ due to variations in β, in Example 3.5.3. Larger values of β lead to more throttling. . . . . . . . . . . . . . . . . . . . . . . . . . . 111 xi LIST OF ALGORITHMS Algorithm 2.1: DGSolution Class, in DGSolution.h . . . . . . . . . . . . . . . . . . . 43 Algorithm 2.2: Element Class, in Element.h . . . . . . . . . . . . . . . . . . . . . . . 44 Algorithm 2.3: DGAdapt class, in DGAdapt.h . . . . . . . . . . . . . . . . . . . . . . 46 Algorithm 2.4: Example of Linear Equations, in Linear2D.cpp . . . . . . . . . . . . . 49 Algorithm 2.5: While Loop to Find Numerical Solution in Algorithm 2.4 . . . . . . . 50 Algorithm 2.6: Function FormFunction for Nonlinear Solver in Algorithm 2.4 . . . . 51 Algorithm 2.7: Command to Run Executable Linear2D . . . . . . . . . . . . . . . . . 51 xii CHAPTER 1 INTRODUCTION In this chapter, we briefly introduce several numerical methods, which are the focus of our research projects: Discontinuous Galerkin (DG) Method, its variant: Sparse Grid Discon- tinuous Galerkin (SGDG) Method, and Weighted Essentially Non-Oscillatory (WENO) Method. 1.1 Discontinuous Galerkin (DG) Method Discontinuous Galerkin (DG) Method is a specific kind of finite element methods. Com- paring to traditional finite element method, basis functions of DG Methods are chosen to be discontinuous on element interfaces, such as piecewise polynomials, which leads to the introduction of numerical flux. By properly choosing numerical flux, the resulting numerical method is stable and convergent. The DG Method for transport equation was first introduced by Reed and Hill in 1973, and discussed in a report [13] of Los Alamos National Laboratory. The method was to solve the neuron transport equations, which are time-independent linear hyperbolic equations. In late 1980s and early 90s, a major development, namely Runge-Kutta DG Methods, was established by Cockburn and Shu in a series of paper [14, 15, 16, 17, 18], which is a framework to solve nonlinear time-dependent hyperbolic equations, e.g. Eu- ler equations of compressible gas dynamics [19]. Under this framework, the DG dis- cretization is used only for spatial variables, while explicit, nonlinearly stable high order Runge-Kutta methods [20, 21] are used to discretize the time variable. To avoid oscillation of numerical solutions when strong shock occurs, exact or approximate Riemann solvers as interface fluxes and total variation bounded (TVB) nonlinear limiters [22, 23] are in- troduced from finite volume methods. DG methodology was also generalized to treat viscous terms as well, and thus the DG methods were designed to solve Navier-Stokes 1 equations [24, 25]. Comparing to classical finite volume and finite difference methods, DG Methods have certain advantages [26]. By choosing polynomials of high degree, the corresponding DG Methods can achieve high order of accuracy; the mass matrix is sparse so the DG Meth- ods are parallelizable. In addition, DG Methods can handle h-p adaptivity; due to the discontinuous basis functions, the continuity restrictions for conforming finite element methods need not be considered, and grid refinement or unrefinement is easy to handle. Adaptivity is of particular interest for hyperbolic systems and can potentially advance computational efficiency [27, 28]. In 1970s, independent research of Galerkin Methods applying to elliptic and parabolic equations emerged, as discussed in [29], which are now called interior penalty (IP) meth- ods. The idea of penalty formulation could be traced back to 1960s, e.g. [30]. In 1971, Nitsche [31] developed the first penalty method, where a penalty term inversely propor- tional to mesh size was introduced to guarantee stability and optimal order of conver- gence for smooth exact solution. Instead, Babuska’s approach [32] used a more flexible penalty term, with consistency error and sub-optimal convergence rate. The method was further analyzed and generalized to nonlinear elliptic and parabolic problems by Arnold [33], which were summarized in [34]. After all, a unified framework is proposed [29, 35], for both primal formulation inspired by original interior penalty method, and methods inspired by finite volume methods of hyperbolic problems with proper numerical flux. We also refer to textbooks like [36] for further reference. Since the DG Methods were introduced, they have found rapid applications in such diverse areas as aeroacoustics, electro-magnetism, gas dynamics, granular flows, magne- tohydrodynamics, meteorology, modeling of shallow water, oceanography, oil recovery simulation, semiconductor device simulation, transport of contaminant in porous me- dia, turbomachinery, turbulent flows, viscoelastic flows and weather forecasting, among many others. For further discussions of this, we refer to review papers e.g. [26, 27, 37, 28]. 2 For earlier works on DG methods, we refer to the survey paper [26], and other papers in the same Springer volume. The lecture notes [38] is a good reference for many de- tails, as well as the extensive review paper [39]. There are several special journal issues devoted to the DG method [40, 41, 42, 43, 44], which contain many interesting papers on DG method in all aspects including algorithm design, analysis, implementation and applications. There are also a few books and lecture notes [45, 46, 47, 36, 48, 49] on DG methods. DG methods have also grown to be broadly adopted for electromagnetic simulations in the past two decades. They have been developed and analyzed for time dependent linear models, including Maxwell’s equations in free space (e.g., [50, 51, 52]), dispersive media (e.g., [53, 54, 55, 56]), as well as meta-materials (e.g., [57, 58, 59, 60]). However, there exists only limited study for DG methods for nonlinear Maxwell models. For ex- ample, in [61, 62], Kerr nonlinearity was investigated, where the entire Maxwell PDE- ODE system was cast as a nonlinear hyperbolic conservation law, for which DG methods have long been known for their success. A relaxed version of the Kerr model, called the Kerr–Debye model, was examined in [63], where a second-order asymptotic-preserving and positivity-preserving DG scheme is designed and analyzed; [64] also devised and analyzed asymptotic-preserving and positivity-preserving methods for the Kerr-Debye- Lorentz model. 1.2 Sparse Grid Discontinuous Galerkin (SGDG) Method As discussed above, DG methods have overall good performance for different kinds of problems like hyperbolic or elliptic equations, and they admit flexibility in choosing the discretization meshes and the approximation spaces. However, the methods are often considered too costly due to the large degrees of freedom of the approximation space, especially for problems on high dimension, due to curse of dimensionality [65]. Such drawback is more prominent when dealing with high-dimensional equations arising from 3 real-world applications, such as kinetic simulations, stochastic analysis, and mathemati- cal modeling in finance or statistics. The sparse grid techniques [66, 67] was introduced by Zenger [68], and later becomes a major tool to break the curse of dimensionality of grid-based approaches. The idea re- lies on a tensor product hierarchical basis representation, which can reduce the degrees of freedom without compromising too much on accuracy. The fundamentals of sparse grid techniques can further date back to Smolyak [69] for numerical integration, and they are closely related to hyperbolic cross [70, 71], boolean method [72], discrete blending method [73], and splitting extrapolation method [74]. The construction of the scheme is to seek a proper truncation of the tensor product hierarchical basis, which can be formally derived by solving an optimization problem of cost-benefit ratios [75]. Sparse grid techniques have been incorporated in collocation methods for high-dimensional stochastic differen- tial equations, Galerkin finite element methods, finite difference methods, finite volume methods, and spectral methods for high-dimensional PDEs [3]. Employing the similar sparse grid techniques as above, Sparse Grid Discontinuous Galerkin (SGDG) Methods are a specific class of DG Methods. Instead of choosing piece- wise polynomials for basis functions as in typical DG Methods, as introduced in [3, 4, 5], SGDG uses basis functions derived from Alpert’s multiwavlet basis functions [76, 77] and multiresolution analysis (MRA) [78]. In one dimension, SGDG Methods would be an equivalent reformulation of traditional DG Methods; however for higher dimension case, to break the curse of dimensionality, we use a subset of tensor product approxi- mation space for DG approximation, based on the hierarchial structure of multiwavelet basis functions. SGDG Methods efficiently reduce the number of degrees of freedom (DoF) of the unknowns from O(h−d ) to O(h−1 | log2 h|d−1 ) for d-dimensional problems [3], where h is the uniform mesh size in each dimension. Stability and conservation can be maintained, while error is only slightly deteriorated for sufficiently smooth solutions [4]. SGDG Methods have been applied to elliptic equations [3], transport equations [4], 4 reaction-diffusion equations [79], Helmholtz equation [80], Vlasov–Maxwell equations [81], radioactive transfer equation [82], etc. Success of SGDG Methods is encouraging, but several improvement comes afterwards. The main bottleneck is that the previous method can only treat some kind of linear equa- tions with given variable coefficients, or coefficients with specified dependence to un- knowns. To compute of nonlinear terms in nonlinear equations, interpolatory multi- wavelets for MRA quadrature and sparse grid collocation method [7, 83] were developed, where new multiwavelets called interpolatory multiwavelets were associated to interpo- lation on the hierarchial grid. Besides, since the Alpert’s and interpolatory multiwavelet basis functions are global, it is essential to find efficient implementation to make sure computational cost comparable to traditional DG Methods, so a fast algorithm regarding matrix-vector multiplication based on [84, 85] was also introduced into SGDG method. In addition, if the exact solution is not smooth enough, the standard SGDG Methods might not be a good choice, since it requires fine mesh to capture the non-smooth solution. In order to overcome this, a family of adaptive SGDG Methods for linear transport equa- tions were developed [5], by using fully tensorized basis functions, with hierarchial sur- plus as refinement or coarsening indicator automatically capturing the local structures. The adaptive method has also been applied to kinetic equations [5], hyperbolic conser- vation laws [6], wave equations [83, 86], Hamilton-Jacobi equation [87], and nonlinear Schrodinger equation [88]. 1.3 Weighted Essentially Non-Oscillatory (WENO) Method WENO (Weighted Essentially Non-Oscillatory) Methods, together with ENO (Essentially Non-Oscillatory) Methods, are high order accurate finite difference or finite volume schemes designed for solving hyperbolic and convection-diffusion equations with possibly dis- continuous solutions or solutions with sharp gradient regions. The key idea of ENO and WENO Methods is an nonlinear adaptive approximation procedure that achieves high 5 order accuracy on smooth region of function, while resolving shock or other discontinu- ities sharply without substantial oscillations. Because the main idea of the methods is not necessarily related to PDEs, ENO or WENO Methods have several non-PDE applications, too [89]. High order finite difference and finite volume methods are based on interpolations of discrete data, mostly by using algebraic polynomials. Approximation theory guarantees that a wider interpolation stencil yields higher order accuracy for smooth function. Fixed stencils are usually applied in these methods, and perform well in smooth problems; how- ever, fixed stencil interpolation causes unavoidable oscillation called Gibbs phenomena near discontinuity, which cannot be eliminated by mesh refinement, leading to numerical instability in nonlinear problems involving discontinuities [90]. In applications such as hyperbolic conservation laws, Hamilton-Jacobi Equations, or convection-diffusion equa- tions, etc. the exact solution might contain discontinuities on either the solution itself or its derivative, regardless of smoothness of initial or boundary conditions [89, 11], and fixed stencil interpolation is a less appealing choice in these cases. ENO schemes were first introduced by Harten, Engquist, Osher and Chakravarthy in 1987 [91] in the finite volume framework, which in the first time successfully attempting to obtain a self-similar, uniformly high order accurate, and essentially non-oscillatory in- terpolation. In this paper, they used Newton divided differences to determine the local smoothness of the function to be approximated, which indicates the relatively smooth stencil to be chosen among candidate stencils for interpolation. Later, ENO methods in fi- nite difference framework, together with TVD Runge-Kutta time integration, were shown save significant amount of computational cost in higher dimension [92, 93]. Based on ENO, WENO methods were developed using convex combination of all can- didates stencils instead of only one stencil as in ENO. The crucial ingredient is choice of the combination coefficients, also called nonlinear weights, to fulfill the following [89]: (1) when the solution is sufficiently smooth, the nonlinear weight should be close to so-called 6 "linear weight", which guarantees high order accuracy in combined stencil; (2) when some candidate stencils contain discontinuity while others don’t, the stencils containing discon- tinuities should be associated with smaller nonlinear weight. The first WENO Method [94] was the one dimension case for hyperbolic conserva- tion laws, under finite volume framework, while [95] provided the multidimensional version, and in addition, [96, 97] improved accuracy. [98] offered 1D finite volume for- mulation based on a staggered grid and Lax-Friedrichs formulation. [99] developed the multidimensional finite difference formulation with improved accuracy, and a general framework in forming a (2k + 1)-order WENO approximation from a k-th order ENO stencil was established. The fifth order WENO in [99] becomes the most commonly used WENO method. Higher order ((2k+1)-th order, with k = 3, 4, 5, ...) WENO reconstruction procedures were developed in [100]. WENO improves upon ENO in robustness, better smoothness of fluxes, better steady state convergence, better provable convergence prop- erties, and more efficiency. We refer to lecture notes and review papers [11, 101, 89, 12] for further details. Among the applications of WENO methods, we consider Hamilton-Jacobi equations ϕt + H(ϕx1 , ..., ϕxd ) = 0, ϕ(x, 0) = ϕ0 (x), where H is usually nonlinear but at least Lipschitz continuous. It is widely known that global C 1 solution does not exist for this equation, regardless of smoothness of initial condition. This is a direct implication, at least for one dimension case, from the fact that if we take u = ϕx , the Hamilton-Jacobi equation is equivalent to hyperbolic conservation law of function u. Singularities of u are discontinuities, so u is bounded with bounded variation, and so is derivative of ϕ. The idea of viscosity solution, a specific kind of weak solution, is then introduced to theory of Hamilton-Jacobi equation, to describe the unique physical relevant solution. More details are discussed in e.g. [102, 103, 104]. Because of the relation between Hamilton-Jacobi equation and hyperbolic conserva- tion law, ENO and WENO method applied in conservation law would be similarly ap- 7 plied to Hamilton-Jacobi equation. High order ENO schemes for solving Hamilton-Jacobi equations were developed in [105] for the second order case and in [106] for the more gen- eral cases, based on ENO schemes for solving conservation laws [91, 92, 93]. High order WENO schemes for solving Hamilton-Jacobi equations were developed in [107], based on WENO schemes for solving conservation laws [94, 99]. The framework of WENO schemes for solving Hamilton-Jacobi equations is again similar to that of ENO schemes described in the previous section. The key ingredient in designing a nonlinear weight, as discussed in [107], is similar to that in [99] for conservation laws, namely the smoothness indicator is a scaled sum of the squares of the L2 norms of the second and higher deriva- tives of the interpolation polynomial on the target interval. We refer to the lecture notes of Shu [12] for more details of ENO and WENO schemes. To couple with ENO and WENO spatially semi-discrete methods above, a popular time discretization method to choose is the class of strong stability preserving (SSP), which is also referred to as total variation diminishing (TVD) or high order Runge-Kutta time discretizations; see [92, 108, 109, 110]. 1.4 Organization of Thesis In Chapter 2, we consider the Maxwell equations with nonlinear optical media, where the nonlinearity comes from cubic Kerr effect and Raman scattering [111, 112]. The nu- merical simulation of such equations is considered to be computationally intensive but substantially more robust and physically relevant compared with Nonlinear Schrodinger Equation (NLSE) models. A semi-discrete energy-stable DG method was applied to such equations in one dimension [1] or higher dimension [2], while fully implicit method or trapezoidal method was used for time discretization. We construct the method under adaptive SGDG framework [5, 6, 83], with L2 indicator norm on each element for coars- ening and refinement. Sparse grid collocation method with interpolatory multiwavelet basis functions are required to deal with nonlinear problems, and its adaptive version 8 reduces computational cost [7]. Fast wavelet transform is applied to transform between point values or derivatives of numerical solution at interpolation points, and coefficients of hierarchial multiwavelet basis functions; the fast transform computes results dimen- sion by dimension after proper decomposition of one dimension matrix operator into upper and lower triangular part, so computational cost is further reduced [6, 7]. A C++ package is developed by a group of colleagues, for general numerical simulation using SGDG framework [8], incorporating the standard SGDG implementation [3] with Alpert’s multiwavelet basis [76], Lagrangian and Hermite interpolatory multiwavelet [7] for non- linear problems, adaptivity [5], fast matrix-vector multiplication [6], etc. Several time integration method, including IMEX-RK [6] was implemented in the package, and data structures, linear solvers, and multi-thread parallelism in Eigen package [113] of numer- ical linear algebra is integrated to the SGDG package. In our work, we further apply PETSc package [114] to implement the adaptive SGDG method. PETSc offers a whole set- ting of advanced data structures, which are designed for multi-core parallel computing; its rich library of nonlinear solvers [115], including a group of matrix-free method e.g. Jacobian-free Newton–Krylov methods [116], is vital for general simulations for nonlin- ear problems when Jacobian is not attainable. Several examples are made to test accuracy and to show efficiency of the implementation of the presented algorithm. In Chapter 3, we present a simplified ordinary differential model of data flow on pro- cessors in a high performance computing framework, which involves computations ne- cessitating inter-processor communications. The asymptotic limit of this model treats the computer as a continuum of processors and data flow as an Eulerian fluid governed by a conservation law, which implies a Hamilton-Jacobi equation [103, 104] for which the existence and uniqueness of solutions can be proven. We choose WENO method for nu- merical simulation of the continuum Hamilton-Jacobi model; WENO interpolation uses a convex combination of candidate stencils, and by assigning each stencil a nonlinear weight based on local smoothness of numerical solution on the stencil, WENO interpola- 9 tion has high order accuracy [11]. Comparing to other numerical methods like artificial viscosity or TVD method, there is no problem-dependent parameter in WENO method, and no accuracy degeneration occurs in smooth region of numerical solution [12]. Thus numerical experiments for the continuum model are calculated using a spatially fifth or- der WENO interpolation coupling with optimal third order SSP Runge-Kutta time inte- gration [12, 11], and compared with results from discrete model. A qualitative agreement is shown between the simulations on discrete and continuum models, and we investi- gate the effect of variations in the computing environment’s processing capabilities on the progress of the modeled computation. The major contents of this chapter was already published in [10]. 10 CHAPTER 2 APPLICATION OF SGDG METHOD TO MAXWELL’S EQUATION IN NONLINEAR OPTICS 2.1 Maxwell’s Equation in Nonlinear Optics The propagation of electromagnetic waves in general media is modeled by the time- dependent Maxwell’s partial differential equations (PDEs), coupled with constitutive laws that describe the response of the media. Particularly, in nonlinear media, the material response depends nonlinearly on the optical field, and many interesting physical phe- nomena, such as frequency mixing and second/third-harmonic generation have been ob- served and harnessed for practical applications. We refer to classical textbooks [117, 118, 119] for a more detailed review of the field of nonlinear optics. When Maxwell’s equations are considered to model the electromagnetic waves propa- gating through a nonlinear optical medium, the medium response is described by consti- tutive laws that relate the electric field E and the electric flux density D through the polar- ization P of the medium. Here we focus on a macroscopic phenomenological description of the polarization, which comprises both linear and nonlinear responses. Specifically, the linear response is modeled by a single resonance Lorentz dispersion, while the non- linear response is cubic and incorporates the instantaneous Kerr effect and the delayed nonlinear Lorentz dispersion called Raman scattering. Within this description, we will follow the auxiliary differential equation (ADE) approach, where the linear and nonlin- ear Lorentz dispersion is represented through a set of ODEs, describing the time evolution of P (hence of D) forced by E, appended to Maxwell’s equations. An alternative repre- sentation is via a recursive convolution method, where D is computed from E through a time convolution integral [120, 1]. We begin with the Maxwell’s equations, which govern the time evolution of the elec- 11 tric field E and magnetic field H in a non-magnetic nonlinear optical medium, on time domain (0, T ) and spatial domain Ω: ∂t B + ∇ × E = 0 (2.1) ∂t D + Js − ∇ × H = 0 (2.2) ∇·B = 0 (2.3) ∇·D = ρ (2.4) along with initial and boundary data in the domain Ω ⊂ Rd , d = 1, 2, 3. The variable Js is the source current density, and ρ is the charge density. The electric flux density D and the magnetic induction B are related to the electric and magnetic field, respectively, via the constitutive laws D = ϵ0 (ϵ∞ E + P) (2.5) B = µ0 H (2.6) where P is the polarization. The dielectric parameter is ϵ0 , the electric permittivity of free space, ϵ∞ , the relative electric permittivity in the limit of the infinite frequency, and µ0 , the magnetic permeability of free space. We will assume here that all model parameters are constant, and the material is isotropic. To model the linear and nonlinear dispersion in the material we use the auxiliary dif- ferential equation (ADE) approach as presented in [121, 120]. A thorough discussion of the modeling of Raman and Kerr effects in optical (silica) fibers can be found in [111]. The linear (L) delayed or retarded response of the material to the electromagnetic field is captured in the polarization, P, via a linear single resonance Lorentz response, which, in the form of a second order ODE, is provided as, ∂ 2 PLDelay 1 ∂PLDelay 2 + + ω02 PLDelay = ωp2 E, ∂t τ ∂t which could be split into first order system ∂PLDelay ∂J 1 = J, = − J − ω02 PLDelay + ωp2 E. ∂t ∂t τ 12 Here ω0 and ωp are the resonance and plasma frequencies of the medium, respectively, and τ −1 is a damping constant. In addition, ωp2 = (ϵs −ϵ0 )ω02 , with ϵs as the relative permittivity at zero frequency. For pulse widths that are sufficiently short (for e.g., shorter than 1 pico-second (ps) for Silica) [122], the nonlinear response has an instantaneous as well as a delayed component. For the nonlinear (NL) response of the medium, we will consider a cubic Kerr-type instan- taneous response, and a retarded Raman molecular vibrational response called Raman scattering. The Kerr effect is a phenomenon in which the refractive index of a material changes proportionally to the square of the applied electric field. Raman scattering arises from the electric field induced changes in the internal nuclear vibrations on time scales about 1 to 100 femto-seconds (fs) [112], and is modeled by a nonlinear single resonance Lorentz delayed response. The two nonlinear responses are given as PN L Kerr = a(1 − θ)E|E| , 2 and PN L Delay = aθQE, while the total nonlinear response is PN L = PN L NL Kerr +PDelay . Here a is a third order coupling constant, θ parameterizes the relative strength of the instantaneous electronic Kerr and retarded Raman molecular vibrational responses, and Q describes the natural molecular vibrations within the dielectric material that has frequency many orders of magnitude less than the optical wave frequency, responding to the field intensity. The time evolution of Q is given by the following ODE, ∂ 2Q 1 ∂Q 2 + + ωv2 Q = ωv2 |E|2 , ∂t τv ∂t where ωv is the resonance frequency of the vibration, and τv−1 a damping constant. This is essentially a model for a simple linear oscillator, but coupled to the nonlinear field intensity |E|2 . 13 At the end we obtain the following system, on (0, T ) × Ω, µ0 ∂t H = −∇ × E (2.7a) ∂t D = ∇ × H (2.7b) ∂t P = J (2.7c) 1 ∂t J = − J − ω02 P + ωp2 E (2.7d) τ ∂t Q = σ (2.7e) 1 ∂t σ = − σ − ωv2 Q + ωv2 |E|2 (2.7f) τv with constitutive law D = ϵ0 (ϵ∞ E + P + a(1 − θ)|E|2 E + aθQE). (2.8) Note that here P is essentially PLDelay , the linear delayed response. As demonstrated in [1, 2], under the assumption of periodic boundary conditions, the energy of the system is µ0 ϵ0 ϵ∞ 2 ϵ0 ϵ0 ω02 2 ϵ0 aθ 2 ϵ0 aθ Z E(t) = |H|2 + |E| + 2 |J|2 + 2 |P| + 2 σ + Q|E|2 Ω 2 2 2ω p 2ω p 4ω v 2 3ϵ0 a(1 − θ) 4 ϵ0 aθ 2 + |E| + Q dΩ, (2.9) 4 4 satisfying d ϵ0 ϵ0 aθ Z Z 2 E =− 2 |J| dx − 2 σ 2 dx ≤ 0. (2.10) dt ωp τ Ω 2ωv τv Ω Besides, E ≥ 0 if θ ∈ [0, 3/4]. The one dimensional case is as following: µ0 ∂t H = ∂x E (2.11a) ∂t D = ∂x H (2.11b) ∂t P = J (2.11c) 1 ∂t J = − J − ω02 P + ωp2 E (2.11d) τ ∂t Q = σ (2.11e) 1 ∂t σ = − σ − ωv2 Q + ωv2 E 2 (2.11f) τv D = ϵ0 (ϵ∞ E + P + a(1 − θ)E 3 + aθQE) (2.11g) 14 Note that here we assume uniformity of all the vector fields in the y and z directions. Thus, all derivatives with respect to y and z in the curl and divergence operators are set to zero. All field quantities are represented by a single scalar component. The scalar mag- netic field H (hence B) represents the 2nd (or the 3rd) component of the vector magnetic field H, and the scalar electric flux density D (hence E) represents the 3rd (or the 2nd) component of D (hence E). Gauss’s laws (2.3) (2.4) only involve the x derivatives of the first components of B and D, and therefore they are decoupled from the one-dimensional model and become irrelevant. Numerical simulation later depends on dimensionless ver- sion [1] ∂t H = ∂x E (2.12a) ∂t D = ∂x H (2.12b) ∂t P = J (2.12c) 1 ∂t J = − J − ω02 P + ωp2 E (2.12d) τ ∂t Q = σ (2.12e) 1 ∂t σ = − σ − ωv2 Q + ωv2 E 2 (2.12f) τv D = ϵ∞ E + P + a(1 − θ)E 3 + aθQE (2.12g) 2.2 Formulation of Energy Stable Adaptive SGDG Method in One Di- mension In this section, we describe the energy-stable adaptive SGDG method in one dimension. We start from general formulation of SGDG method, and how it is applied to the nonlin- ear Maxwell equation as we specified above. Adaptive scheme based on SGDG formula- tion will also be discussed in the end of the section. 15 2.2.1 SGDG Method and Alpert’s Multiwavelets We first define the one-dimension spatial domain and its SGDG basis functions [3]. Con- sider Ω = [0, 1] as our spatial domain, and define a set of nested grids, where the n-th level grid Ωn consists of 2n uniform cells Inj = (2−n j, 2−n (j + 1)], j = 0, 1, . . . , 2n − 1 for any n ≥ 0. For notation convenience, we also denote I−1 = [0, 1]. Define Vnk = {v : v ∈ P k (Inj ), j = 0, 1, . . . , 2n − 1}, the piecewise polynomial space, of degree at most k, on n-th level grid Ωn , and we have the nested structure V0k ⊂ V1k ⊂ V2k ⊂ . . . . Define Wnk the orthogonal complement of Vn−1 k in Vnk , with respect to L2 inner product in [0, 1], i.e. M k Vn−1 Wnk = Vnk , k Vn−1 ⊥ Wnk . Again for notation convenience, let W0k = V0k ; then for any natural number N , M VNk = Wnk . 0≤n≤N As discussed in [76], we define a set of orthonormal basis functions for each Wnk , n = 0, . . . , N and they form a basis of VNk . The case of grid level n = 0 is trivial; the normalized shifted Legendre polynomials in [0, 1] will be proper choice for such purpose. 0 Denote these Legendre polynomials by vi,0 for i = 0, . . . , k. When n > 0, we consider the orthonormal basis of W1k ; basis of general Wnk , n ≥ 1 will turn out to be a scaling version of basis of W1k . In particular, orthonormal basis of W1k is defined as √ hi (x) = 2fi (2x − 1), i = 0, 1, . . . , k, where for fixed integer k ≥ 0, f0 , f1 , . . . , fk are a sequence of functions supported on [−1, 1], satisfying 16 1. fi on (0, 1) is a polynomial with degree k; 2. each fi extends evenly or oddly to (−1, 0), as in the following formula fi (x) = (−1)i+k+1 fi (−x), (2.13) for any x ∈ (−1, 0); 3. f0 , f1 , . . . , fk are orthonormal i.e. Z 1 fi (x)fj (x)dx = δij −1 for all 0 ≤ i, j ≤ k, where δij equals to 1 if i = j and 0 otherwise; 4. fj has vanishing moments Z 1 fj (x)xi dx = 0, −1 with i = 0, ..., j + k, for each 0 ≤ j ≤ k. It was shown [76] that we can compute all fi through a Gram-Schmidt procedure. The particular form of fi up to k = 4 are provided in Table 1 of [76], and for completeness we offer these results here. For simplicity, we only provide definition of fi on (0, 1), and its values on (−1, 0) are obtained by proper extension as described in equation (2.13). • k=0 r 1 f0 (x) = ; 2 • k=1 r r 3 1 f0 (x) = (−1 + 2x), f1 (x) = (−2 + 3x); 2 2 • k=2 r r 1 1 2 1 3 f0 (x) = (1 − 24x + 30x ), f1 (x) = (3 − 16x + 15x2 ), 3 2 2 2 r 1 5 f2 (x) = (4 − 15x + 12x2 ); 3 2 17 • k = 3: r r 15 1 f0 (x) = (1 + 4x − 30x2 + 28x3 ), f1 (x) = (−4 + 105x − 300x2 + 210x3 ), 34 42 r r 1 35 1 5 f2 (x) = (−5+48x−105x2 +64x3 ), f3 (x) = (−16+105x−192x2 +105x3 ); 2 34 2 42 • k = 4: r 1 f0 (x) = (1 + 30x + 210x2 − 840x3 + 630x4 ), 186 r 1 1 f1 (x) = (−5 − 144x + 1155x2 − 2240x3 + 1260x4 ), 2 38 r 35 f2 (x) = (22 − 735x + 3504x2 − 5460x3 + 2700x4 ), 14694 r 1 21 f3 (x) = (35 − 512x + 1890x2 − 2560x3 + 1155x4 ), 8 38 r 1 7 f4 (x) = (32 − 315x + 960x2 − 1155x3 + 480x4 ). 2 158 These multiwavelet functions retain orthonormal properties of wavelet bases for different hierarchical levels. More precisely, Wnk with n ≥ 1 has basis j vi,n (x) = 2(n−1)/2 hi (2n−1 x − j) = 2n/2 fi (2n x − 2j − 1), i = 0, ..., k, j = 0, ..., 2n−1 − 1. j j Each vi,n is supported on In−1 , with discontinuity on x = 2−n w, w = 2j, 2j + 1, 2j + 2. Note j n−1 −1 S that {vi,n }0≤j≤2 0≤i≤k,n≥1 0 {vi,0 }0≤i≤k is an orthonormal set in L2 ([0, 1]), i.e. Z 1 j ′ vi,n (x)vij′ ,n′ (x)dx = δii′ δjj ′ δnn′ , 0 j n−1 −1 }0≤j≤2 S 0 while {vi,n 0≤i≤k, 1≤n≤N {vi,0 }0≤i≤k is an orthonormal basis of VNk . 2.2.2 Semi-Discrete DG Method for the Maxwell’s Equation To formulate semi-discrete DG method [1, 27] for system (2.11), we need some more no- tations. Recall that we use N to represent the finest level of nested grid. Define i xi+1/2 = , i = 0, 1, . . . , 2N , 2N 18 the discrete points on Ω = [0, 1] containing discontinuities for functions in VNk . As defined above, INj = (xj−1/2 , xj+1/2 ] for j = 1, 2, . . . , 2N ; h = 2−N is mesh size. We also denote v + , v − the right and left limit on discontinuities, and [v] = v + − v − the jump, {v} = (v + + v − )/2 the average. They can be evaluated at each cell boundary xj±1/2 , denoted by e.g (v + )j±1/2 . Then on each cell INj , we can formulate the standard semi-discrete DG method of system (2.11) following [27, 1]: find functions Hh (t, ·), Dh (t, ·), Eh (t, ·), Ph (t, ·), Jh (t, ·), Qh (t, ·), σh (t, ·), on space VNk , such that for each j = 0, 1, . . . , 2N − 1 and each cell INj , Z Z µ0 ∂t Hh ϕdx + Eh ∂x ϕdx − (E ch ϕ− )j+1/2 + (E ch ϕ+ )j−1/2 = 0, ∀ϕ ∈ V k (2.14a) j j N IN IN Z Z ∂t Dh ϕdx + Hh ∂x ϕdx − (H fh ϕ− )j+1/2 + (Hfh ϕ+ )j−1/2 = 0, ∀ϕ ∈ VNk (2.14b) j j IN IN ∂ t Ph = J h (2.14c) 1 ∂t Jh = − Jh − ω02 Ph + ωp2 Eh (2.14d) τ ∂t Qh = σh (2.14e)   1 Z Z ∂t σh ϕdx = − σh − ωv2 Qh + ωv2 Eh2 ϕdx, ∀ϕ ∈ VNk (2.14f) j IN INj τv with constitutive law Z Z Dh ϕdx = ϵ0 (ϵ∞ Eh + Ph + a(1 − θ)Eh3 + aθQh Eh )ϕdx, ∀ϕ ∈ VNk . (2.15) j j IN IN Both terms E ch and Hfh are numerical fluxes. In our numerical experiments, we pick one of the following as our numerical flux [1]: central flux Ech = {Eh }, Hfh = {Hh }; (2.16) alternating flux either alternating flux I ch = E + , H E fh = H − (2.17a) h h or alternating flux II ch = E − , H E fh = H + (2.17b) h h 19 dissipative flux, or upwind flux, inspired by the upwind flux for the Maxwell system without Kerr, linear Lorentz and Raman effects r 1 µ0 1 ϵ0 ϵ∞ r Eh = {Eh } + c [Hh ], Hh = {Hh } + f [Eh ]. (2.18) 2 ϵ0 ϵ∞ 2 µ0 Recall that under the assumption of periodic boundary condition, energy E = E(t) of system (2.11) is µ0 2 ϵ0 ϵ∞ 2 ϵ0 ϵ0 ω02 2 Z E = H + E + 2 J2 + P (2.19a) Ω 2 2 2ωp 2ωp2 ! ϵ0 aθ 2 ϵ0 aθ 2 ϵ0 aθ 3ϵ0 a(1 − θ) 4 + 2 σ + Q + QE 2 + E dx (2.19b) 4ωv 4 2 4 and its derivative d ϵ0 ϵ0 aθ Z Z 2 E =− 2 J dx − 2 σ 2 dx ≤ 0, (2.20) dt ωp τ Ω 2ωv τv Ω indicates that energy E is non-increasing. Besides, energy E is non-negative if θ ∈ [0, 3/4]. Similar results could be obtained for semi-discrete DG method: Theorem 2.2.1 (Semi-Discrete Stability [1]) Under periodic boundary condition and the three fluxes (2.16)(2.17)(2.18) above, the semi-discrete numerical method (2.14) and the discrete energy d Eh satisfies Eh ≤ 0. In addition, Eh ≥ 0 if θ ∈ [0, 3/4]. The discrete energy Eh is dt µ0 2 ϵ0 ϵ∞ 2 ϵ0 ϵ0 ω02 2 Z Eh = Hh + Eh + 2 Jh2 + P (2.21a) Ω 2 2 2ωp 2ωp2 h ! ϵ0 aθ 2 ϵ0 aθ 2 ϵ0 aθ 3ϵ0 a(1 − θ) + 2 σh + Qh + Qh Eh2 + Eh4 dx (2.21b) 4ωv 4 2 4 With proper assumption of regularity of exact solution, using argument of certain L2 pro- jection or Gauss-Ladau projection, it was proved that the semi-discrete method converges to exact solution, and optimal convergence rate is obtained for alternating and upwind flux. Theorem 2.2.2 (Error estimates of semi-discrete method [1]) Under periodic boundary con- dition, assume the following regularity of exact solution of system (2.11) E, H, P, Q, J, σ ∈ W 1,∞ ([0, T ]; H k+1 (Ω)), 20 E ∈ W 1,∞ ([0, T ]; W 1,∞ (Ω)), Q ∈ W 1,∞ ([0, T ]; L∞ (Ω)), with sufficiently small a and θ, for numerical scheme (2.14), we have ||u − uh || ≤ Chr , u = E, H, P, Q, J, σ, (2.22) where r = k for central flux (2.16), and r = k + 1 for alternating (2.17) or upwind flux (2.18). 2.2.3 Energy-Stable DG Method of the Maxwell Equation On temporal discretization, a main focus is provable energy stability of fully discrete method. This turns out to be a nontrivial task for the nonlinear Maxwell equation. Com- mon choices, such as the second order leap-frog or implicit trapezoidal method, may not yield provable stability results as for the linear models [1], while the main difficulties arise from the nonlinear Kerr and Raman terms. Here we introduce two fully discrete meth- ods: one can be understood as novel modifications of leap-frog or implicit trapezoidal method, and the other is a fully implicit method. These temporal discretizations are of formal second order accuracy. The leap-frog style method [1] is as following: given functions unh ∈ VNk at time t = tn , with u = H, D, E, P, J, Q, σ, find un+1h ∈ VNk of all u at tn+1 = tn + ∆t, so that for any j and 21 each cell INj , and any ϕ ∈ VNk , n+1/2 Hh − Hhn Z Z µ0 ϕdx + Ehn ∂x ϕdx − (E cn ϕ− )j+1/2 + (E cn ϕ+ )j−1/2 = 0 (2.23a) h h j IN ∆t/2 INj Dhn+1 − Dhn Z Z n+1/2 ^n+1/2 − ^n+1/2 + ϕdx + Hh ∂x ϕdx − (Hh ϕ )j+1/2 + (Hh ϕ )j−1/2 = 0 (2.23b) j IN ∆t INj Z Z n+1 Dh ϕdx = ϵ0 (ϵ∞ Ehn+1 + Phn+1 + a(1 − θ)Yhn+1 + aθQn+1 h Eh n+1 )ϕdx (2.23c) j j IN IN   3 Z Z Yhn+1 ϕdx = n n+1 2 n 2 Yh + [(Eh ) + (Eh ) ](Eh − Eh ) ϕdx n+1 n (2.23d) j IN j IN 2 Phn+1 − Phn Jhn+1 + Jhn = (2.23e) ∆t 2 Jhn+1 − Jhn 1 Jhn+1 + Jhn P n+1 + Phn E n+1 + Ehn =− − ω02 h + ωp2 h (2.23f) ∆t τ 2 2 2 n+1 n n+1 n Qh − Qh σ + σh = h (2.23g) Z ∆tn+1 2 σh − σhn 1 σhn+1 + σhn n+1 + Qnh Z   2 Qh 2 n n+1 ϕdx = − − ωv + ωv Eh Eh ϕdx (2.23h) j IN ∆t j IN τv 2 2 n+1/2 Hhn+1 − Hh Z Z [ n+1 − [ µ0 ϕdx + Ehn+1 ∂x ϕdx − (E [ h ϕ )j+1/2 + (E [ n+1 + h ϕ )j−1/2 = 0 (2.23i) j IN ∆t/2 j IN [ n+1 is similar to E cn , or the same as E n+1 , for central and alternat- Be aware that the flux E [ [ ing fluxes; for upwind flux, we have r [ n+1 = {E n+1 } + 1 µ0 n+1/2 E[ h [Hh ]. 2 ϵ0 ϵ∞ The fully implicit method [1] is as following, with the same set-up as the leap-frog 22 method above: Hhn+1 − Hhn Ehn+1 + Ehn Z Z µ0 ϕdx + ∂x ϕdx j IN ∆t j IN 2 Ehn+1\ + Ehn − E n+1 \ + Ehn + −( ϕ )j+1/2 + ( h ϕ )j−1/2 = 0 (2.24a) 2 2 Dhn+1 − Dhn Hhn+1 + Hhn Z Z ϕdx + ∂x ϕdx j IN ∆t INj 2 Hhn+1^ + Hhn − H n+1 ^ + Hhn + −( ϕ )j+1/2 + ( h ϕ )j−1/2 = 0 (2.24b) Z Z 2 2 Dhn+1 ϕdx = ϵ0 (ϵ∞ Ehn+1 + Phn+1 + a(1 − θ)Yhn+1 + aθQn+1 h Eh n+1 )ϕdx (2.24c) j j IN IN   3 Z Z n+1 2 Yhn+1 ϕdx = n n 2 Yh + [(Eh ) + (Eh ) ](Eh − Eh ) ϕdx n+1 n (2.24d) j IN j IN 2 Phn+1 − Phn Jhn+1 + Jhn = (2.24e) ∆t 2 n+1 n+1 Jh − Jh n 1J + Jhn P n+1 + Phn E n+1 + Ehn =− h − ω02 h + ωp2 h (2.24f) ∆t τ 2 2 2 n+1 Qn+1 − Q n h σ + σ n h h = h (2.24g) ∆t 2 σhn+1 − σhn 1 σhn+1 + σhn n+1 + Qnh Z   2 Qh Z 2 n n+1 ϕdx = − − ωv + ωv Eh Eh ϕdx (2.24h) j IN ∆t j IN τv 2 2 We can establish the energy stability for the resulting fully discrete methods as following. Theorem 2.2.3 (Fully Discrete Stability [1]) Under periodic boundary condition and any of the three numerical fluxes, the discrete energy is non-increasing, i.e. Ehn+1 ≤ Ehn . In addition, Ehn ≥ 0, if θ ∈ [0, 3/4] and CFL condition ∆t ≤ Ch are satisfied. Here C is a constant depending on constants µ0 , ϵ0 , ϵ∞ of the Maxwell equation, polynomial degree k, and choice of numerical flux. The discrete energy Eh at t = tn is µ0 n 2 ϵ0 ϵ∞ 2 ϵ0 Z Ehn = (Hh ) + Eh + 2 Jh2 Ω 2 2 2ωp ! ϵ0 ω02 2 ϵ0 aθ 2 ϵ0 aθ 2 ϵ0 aθ 3ϵ0 a(1 − θ) + 2 Ph + 2 σh + Qh + Qh Eh2 + Eh4 dx (2.25) 2ωp 4ωv 4 2 4 23 for fully implicit method, and µ0 n+1/2 n−1/2 ϵ0 ϵ∞ 2 ϵ0 Z Ehn = Hh Hh + Eh + 2 Jh2 Ω 2 2 2ωp ! ϵ0 ω02 2 ϵ0 aθ 2 ϵ0 aθ 2 ϵ0 aθ 3ϵ0 a(1 − θ) 4 + 2 Ph + 2 σh + Qh + Qh Eh2 + Eh dx (2.26) 2ωp 4ωv 4 2 4 for leap-frog method. 2.2.4 Interpolatory Multiwavelets In addition to Alpert’s basis for SGDG space VNk , we need interpolatory multiwavelets to obtain a second set of basis, to deal with nonlinear terms, e.g. f (uh ) when uh ∈ VNk and f is nonlinear [7, 5, 6]. Alpert’s multiwavelets and the space Wnk are constructed, corresponding to the dif- ference of the L2 projection on adjacent levels. Instead, the sparse grid collocation basis proposed in [7] corresponds to interpolation on nested grids. Denote the set of interpola- tion points in the interval I = [0, 1] at mesh level 0 by X0 = {xi }Pi=0 ⊂ I, where the number of interpolation points in X0 is P + 1. Then the interpolation points at mesh level n ≥ 1, Xn , can be obtained in the following manner: Xn = {xji,n := 2−n (xi + j), i = 0, . . . , P, j = 0, . . . , 2n − 1}. In order to save computational cost, the points on different level of Xn should be nested, i.e., X0 ⊂ X1 ⊂ X2 ⊂ X3 ⊂ · · ·, which can be achieved by requiring X0 ⊂ X1 . Given the interpolation points, we define the basis functions on the 0-th level grid as Lagrange (K = 0) or Hermite (K ≥ 1) interpolation polynomials of degree less than or equal to M := (P + 1)(K + 1) − 1 which satisfy the property (l′ ) ϕi,l (xi′ ) = δii′ δll′ . 24 for i, i′ = 0, ..., P and l, l′ = 0, ..., K. It is easy to see that span{ϕi,l : i = 0, . . . , P, l = 0, . . . , K} = V0M . The constants P, K, M will be specified later, depending on the Alpert’s multiwavelets that the interpolatory multiwavelets relate to and the problem to solve. With the basis function at mesh level 0, we can define basis function at mesh level n ≥ 1: ϕji,l,n := 2−nl ϕi,l (2n x − j), i = 0, . . . , P, l = 0, . . . , K, j = 0, . . . , 2n − 1, which is a complete basis set for VnM . Now we introduce the hierarchical representations based on the nested structure of interpolation points. Define X̃0 := X0 and X̃n := Xn \ Xn−1 for n ≥ 1; then we have the decomposition [n Xn = X̃i . i=0 Denote the points in X̃1 by X̃1 = {x̃i }Pi=0 . Then the points in X̃n for n ≥ 1 can be repre- sented by X̃n = {x̃ji,n := 2−(n−1) (x̃i + j) : i = 0, . . . , P, j = 0, . . . , 2n − 1}. For notational convenience, we let W f0M = V0M . The increment function space W fnM for n ≥ 1 is introduced as a function space that satisfies M VnM = Vn−1 M fM , W n and is defined through the multiwavelets ψi,l ∈ V1M that satisfy (l′ ) (l′ ) ψi,l (xi′ ) = 0, ψi,l (x̃i′ ) = δii′ δll′ for i, i′ = 0, ..., P and l, l′ = 0, ..., K. Here the superscript l′ denotes the l′ -th order deriva- tive. Then W f M is given by n fnM = span{ψ j , W i = 0, ..., P, l = 0, ..., K, j = 0, ..., 2n−1 − 1}, i,l,n where j ψi,l,n (x) := 2−(n−1)l ψi,l (2n−1 x − j). 25 For completeness, we list the basis functions used in this thesis in the following, and we focus on Hermite interpolation here [6]; for more possible choices of basis, see [6, 7]. The basis functions in W f1 are piecewise polynomials on Il = (0, 1/2) and Ir = (1/2, 1). Note that the functions may be discontinuous at the interface x = 1/2; thus Il and Ir are both defined to be open intervals. The basis functions in W f1 here are all supported on one half interval Il or Ir and vanish on the other half. For simplicity, we will only declare the function on its support. For example, ψ0 (x)|Ir gives the definition of ψ0 on Ir while vanishing on Il . The interpolation points are put at the cell interface  +  − e0 = {0+ , 1− }, 1 1 X X1 = { e , } 2 2 Here and below, we use superscripts +, − to emphasize the left and right limits of a func- tion at that point, which is a feature of the discontinuous piecewise polynomial space. The basis functions in W f03 and W f13 are, when P = 1, K = 1, ϕ0,0 (x) = (x − 1)2 (2x + 1), ϕ1,0 (x) = −x2 (2x − 3), ϕ0,1 (x) = x(x − 1)2 , ϕ1,1 (x) = x2 (x − 1), and ψ0,0 (x)|Il = −4x2 (4x − 3), ψ1,0 (x)|Ir = 4(x − 1)2 (4x − 1), ψ0,1 (x)|Il = 2x2 (2x − 1), ψ1,1 (x)|Ir = 2(x − 1)2 (2x − 1). When P = 1, K = 2, the basis functions in W f 5 and W f 5 are 0 1 ϕ0,0 (x) = −(x − 1)3 (6x2 + 3x + 1), ϕ0,1 (x) = −x(x − 1)3 (3x + 1), 1 ϕ0,2 (x) = − x2 (x − 1)3 , ϕ1,0 (x) = x3 (6x2 − 15x + 10), 2 1 ϕ1,1 (x) = −x3 (x − 1)(3x − 4), ϕ1,2 (x) = x3 (x − 1)2 , 2 and ψ0,0 (x)|Il = 16x3 (12x2 − 15x + 5), ψ1,0 (x)|Ir = −16(x − 1)3 (12x2 − 9x + 2), 26 ψ0,1 (x)|Il = −8x3 (2x − 1)(3x − 2), ψ1,1 (x)|Ir = −8(x − 1)3 (2x − 1)(3x − 1), ψ0,2 (x)|Il = x3 (2x − 1)2 , ψ1,2 (x)|Ir = −(x − 1)3 (2x − 1)2 . The construction above has close connection with interpolation operators. For a given function f (x) ∈ C K+1 (I), we define INP,K [f ] as the standard Hermite interpolation on VNM , and we have the representation n−1 X N max(2X−1,0) X K X P INP,K [f ](x) = bji,l,n ψi,l,n j (x). n=0 j=0 l=0 i=0 Clearly, (INP,K − INP,K fM −1 )[f ](x) is in WN . The algorithm converting between the point values and the derivatives {f (l) (xji,n )} to hierarchical coefficients {bji,l,n } is given in [7], and by a standard argument in fast wavelet transform, can be performed in O(M 2n ) flops. 2.2.5 SGDG Method with Multiresolution Interpolation in One Dimension As discussed in section (2.2.3), we derive the energy stable DG method. To corporate it with multiresolution interpolation [6], we need to decide how to numerically calculate terms Z 1 f (Uhn+1 , Uhn )ϕdx, ∀ϕ ∈ VNk , (2.27) 0 in fully discrete numerical scheme (2.23) and (2.24), where U = (H, D, E, P, J, Q, σ) and f (Uhn+1 , Uhn ) takes one of the following up to some constants: [(Ehn+1 )2 + (Ehn )2 ](Ehn+1 − Ehn ), Ehn Ehn+1 , Qn+1 h Eh n+1 . Notice that nonlinear terms of the Maxwell equation discussed in the thesis occur as part of integrand of integrals over spatial domain. The fully discrete SGDG schemes with interpolation are to replace nonlinear term in form of (2.27) to Z 1 I f (Uhn+1 , Uhn ) ϕdx   (2.28) 0 27 where I[·] is a multiresolution interpolation operator INP,K as described in previous sec- tion, onto finite element space VNM with the same multiresolution structure as VNk , but of polynomial degree M = (P + 1)(K + 1) − 1, as we discuss above. To elaborate the implementation of such algorithm regarding treatment to nonlinear terms, assume we know both Uhn+1 and Uhn for an implicit scheme. We first read the (derivative) values of Uhn+1 and Uhn , each component of which is a linear combination of Alpert’s basis functions at the chosen interpolation points. We then calculate the (deriva- tive) values of f at interpolation points of interpolatory multiwavelet basis. Last, we transform the (derivative) values back to coefficients of (Alpert’s or interpolatory) multi- wavelet basis, by using the algorithm introduced in [7]. Such numerical algorithm can be performed through a fast matrix-vector product as in [6, 84]. As stated in [6], we remark that the computational cost does not increase too much compared to the multiresolution DG schemes for linear equations [5]. The cost of the transformation from the (derivative) values to hierarchical coefficients is only linearly dependent on the dimension d [7]. 2.3 Formulation of Energy Stable Adaptive SGDG Method in Higher Dimension 2.3.1 Semi-Discrete and Fully-Discrete Energy Stable DG Method As mentioned before in (2.7) and (2.8), we consider the following Maxwell system for vector-valued H, D, E, P, J and scalar Q, σ, on (0, T ) × Ω, µ0 ∂t H = −∇ × E (2.29a) ∂t D = ∇ × H (2.29b) ∂t P = J (2.29c) ∂t J = −γJ − ω02 P + ωp2 E (2.29d) ∂t Q = σ (2.29e) ∂t σ = −γv σ − ωv2 Q + ωv2 |E|2 (2.29f) 28 with constitutive law D = ϵ0 (ϵ∞ E + P + a(1 − θ)|E|2 E + aθQE). (2.30) Here 1 1 γ= , γv = . τ τv For simplicity, we consider the method only for the two-dimension transverse electric (TE) mode. Extension to the full three-dimension model is straightforward [2]. Thus, we consider the two-dimension system of equations µ0 ∂t Hz = ∂y Ex − ∂x Ey (2.31a) ∂t Dx = ∂y Hz (2.31b) ∂t Dy = −∂x Hz (2.31c) D = ϵ0 (ϵ∞ E + P + a(1 − θ)|E|2 E + aθQE) (2.31d) ∂t P = J (2.31e) ∂t J = −γJ − ω02 P + ωp2 E (2.31f) ∂t Q = σ (2.31g) ∂t σ = −γv σ − ωv2 Q + ωv2 |E|2 (2.31h) Here, all vector fields have two components polarized in the x-y plane, e.g. D = (Dx , Dy ), E = (Ex , Ey ), P = (Px , Py ), and J = (Jx , Jy ), except H = Hz . Q and σ are scalar compo- nents. Assume computational domain, now on x-y plane, is Ω = [0, 1] × [0, 1], and consider the uniform mesh for Sparse Grid DG purpose. Define i j xi+ 1 = , yj+ 1 = , i, j = 0, 1, . . . , 2N . 2 2N 2 2N Then {xi+1/2 } and {yj+1/2 } divide [0, 1] on x and y direction uniformly into 2N many cells. Let Ii = [xi− 1 , xi+ 1 ], Jj = [yj− 1 , yj+ 1 ], Kij = Ii × Jj , ∀i, j, 2 2 2 2 29 and {Kij }i,j is a partition or mesh of Ω. Each cell Kij has cell center xi− 1 + xi+ 1 yj− 1 + yj+ 1 i − 1/2 j − 1/2 (xi , yj ) = ( 2 2 , 2 2 )=( , ), 2 2 2N 2N and grid size is ∆x = ∆y = 2−N . Regarding the mesh, we define Vhk = v ∈ L2 (Ω) : v|Kij ∈ Qk (Kij ), ∀i, j ,  where Qk (Kij ) consists of polynomials with degree up to k in each variable on Kij . With- out confusion, Vhk is also used to represent its vector version in the following. For any function v ∈ Vhk , we write v(x±i+ 1 , y) = lim± v(xi+ 1 + ϵ, y), ± v(x, yj+ 1 ) = lim v(x, yj+ 1 + ϵ), ± 2 ϵ→0 2 2 ϵ→0 2 which are left or right limit on cell boundary of x or y direction. Average and jump on cell interface x = xi+1/2 is 1 + −  {v}xi+ 1 = v(xi+ 1 , y) + v(xi+ 1 , y) , [v]xi+ 1 = v(x+ i+ 1 , y) − v(x− i+ 1 , y), 2 2 2 2 2 2 2 and on cell interface y = yj+1/2 is 1 + −  + − {v}yj+ 1 = v(x, yj+ 1 ) + v(x, y ) , [v]yj+ 1 = v(x, yj+ 1 ) − v(x, y ). 2 2 2 j+ 21 2 2 j+ 12 A typical semi-discrete DG method is as following [2]: find each component of Hzh , Dh , Eh , Ph , Jh , Qh and σh in Vhk , under periodic boundary condition, such that, µ0 (∂t Hzh , ϕ) + BhE (Exh , Eyh , ϕ) = 0, ∀ϕ ∈ Vhk (2.32a) H (∂t Dxh , ϕ) + Bxh (Hzh , ϕ) = 0, ∀ϕ ∈ Vhk , (2.32b) H (∂t Dyh , ϕ) + Byh (Hzh , ϕ) = 0, ∀ϕ ∈ Vhk , (2.32c)     Dh = ϵ0 ϵ∞ Eh + Ph + a(1 − θ)Ih |Eh |2 Eh + aθIh (Qh Eh ) , (2.32d) ∂ t P h = Jh (2.32e) ∂t Jh = −γJh − ω02 Ph + ωp2 Eh (2.32f) ∂t Qh = σh (2.32g) ∂t σh = −γv σh − ωv2 Qh + ωv2 Ih (|Eh |2 ) (2.32h) 30 where Ih is interpolation on Gauss-Legendre points, projecting nonlinear terms to DG space, (·, ·) the inner product on Ω = [0, 1]2 , and 2N X Z 1 BhE (Exh , Eyh , ϕ) = −(Eyh , ∂x ϕ) + (Exh , ∂y ϕ) − E dyh (xi+ 1 , y)[ϕ]xi+ 1 dy 2 2 i=1 0 2 N X Z 1 + E xh (x, yj+ 1 )[ϕ]yj+ 1 dx, (2.33a) d d 2 2 j=1 0 2 N X Z 1 H Bxh (Hzh , ϕ) = H zh (x, yj+ 1 )[ϕ]yj+ 1 dx + (Hzh , ∂y ϕ), (2.33b) g g 2 2 j=1 0 2 N X Z 1 H Byh (Hzh , ϕ) = − Hg zh (xi+ 1 , y)[ϕ]xi+ 1 dy − (Hzh , ∂x ϕ). (2.33c) 2 2 i=1 0 We choose the fluxes to be central flux E yh (xi+ 1 , y) = {Eyh }xi+ 1 , Exh (x, yj+ 1 ) = {Exh }yj+ 1 , (2.34a) d d d 2 2 2 2 H zh (xi+ 1 , y) = {Hzh }xi+ 1 , Hzh (x, yj+ 1 ) = {Hzh }yj+ 1 , (2.34b) g g g 2 2 2 2 or alternating flux † ♮ E yh (xi+ 1 , y) = Eyh (xi+ 1 , y), Exh (x, yj+ 1 ) = Exh (x, yj+ 1 ), (2.35a) d d d 2 2 2 2 ‡ ♯ H zh (xi+ 1 , y) = Hzh (xi+ 1 , y), Hzh (x, yj+ 1 ) = Hzh (x, yj+ 1 ), (2.35b) g g g 2 2 2 2 where (†, ‡), (♮, ♯) = (+, −) or (−, +). For the semi-discrete in space methods with the central or alternating numerical fluxes, one can establish an energy relation similar as for the continuous model. Additionally, error estimates can be proved and they are optimal with respect to the approximation property of the discrete space Vhk when the numerical fluxes are alternating. Theorem 2.3.1 (Semi-discrete in space energy stability [2]) Under the assumption of peri- odic boundary conditions, the semi-discrete in space methods (2.32), with either central flux (2.34) or alternating flux (2.35), satisfy dEh (t) ϵ0 γ ϵ0 aθγv Z Z 2 =− 2 |Jh | dΩ − σh2 dΩ ≤ 0, (2.36) dt ωp Ω 2ωv2 Ω 31 with the discrete energy defined as µ0 2 ϵ0 ϵ∞ ϵ0 ϵ0 ω02 ϵ0 aθ 2 Z Eh = Hzh + |Eh |2 + 2 |Jh |2 + |Ph |2 + σ Ω 2 2 2ωp 2ωp 2 4ωv2 h ! ϵ0 aθ 3ϵ 0 a(1 − θ) ϵ0 aθ Ih Qh |Eh |2 + Ih |Eh |4 + Q2h dΩ.   + (2.37) 2 4 4 Meanwhile, when θ ∈ [0, 3/4], Eh ≥ 0. Theorem 2.3.2 (Semi-discrete in space error estimates [2]) Let T > 0 be given. Let κerr , ρerr ∈ (0, 1) be arbitrary constants, then under periodic boundary conditions and   1 1. θ ∈ 0, , 1 + 3(1 − ρerr )−2 2. aθCk ∥Q∥∞ ≤ ϵ∞ (1 − κerr ), and Ck2   2 2 3. a 12(1 − θ)Ck ∥E∥∞ ∥∂t E∥∞ + (12 − 11θ) ∥∂t E∥∞ + 2θCk ∥∂t Q∥∞ ≤ ϵ∞ κerr , ρerr as well as the exact solution being sufficiently smooth, the numerical solution uh given by 2.32 with suitable initialization admits the following error estimate at time T ∥u − uh ∥L2 (Ω) ≤ CC(κerr , ρerr )hr , u = Hz , E, P, J, σ, Q, (2.38) where r = k for central flux (2.34), and r = k + 1 for alternating flux (2.35). Here C is a generic constant independent of mesh size h, but may depend on k, the mesh parameter δ, the model parameters, and some Sobolev norm of the exact solutions up to time T . Following the semi-discrete method, the fully-discrete leap-frog DG scheme is, given n Hzh , Enh , Dnh , Jnh , Pnh , σhn and Qhn in Vhk at time tn , we find Hzh n+1 , En+1 n+1 n+1 n+1 h , Dh , Jh , P h , 32 σhn+1 and Qhn+1 ∈ Vhk at time tn+1 = tn + ∆t, satisfying ! n+1/2 n Hzh − Hzh µ0 , ϕ + BhE (Exh n n , Eyh , ϕ) = 0, ∀ϕ ∈ Vhk , (2.39a) ∆t/2  n+1 n Dxh − Dxh  H n+1/2 , ϕ + Bxh (Hzh , ϕ) = 0, ∀ϕ ∈ Vhk , (2.39b) ∆t n+1 ! n Dyh − Dyh H n+1/2 , ϕ + Byh (Hzh , ϕ) = 0, ∀ϕ ∈ Vhk , (2.39c) ∆t Dn+1 = ϵ0 ϵ∞ En+1 + Pn+1 + a(1 − θ)Yhn+1 + aθIh Qn+1 n+1  h h h h Eh , (2.39d) Yhn+1 = Yhn + Ih |En+1 n+1 2 n 2 n  n+1 n  h | + |E h | − E h ·E h E h − E h 1 + Ih (En+1 + Enh )·(En+1 − Enh )(En+1 + Enh ) ,  h h h (2.39e) 2 Pn+1 h − P n h Jn+1 + Jnh = h , (2.39f) ∆t 2 Jn+1 − Jnh Jn+1 + Jnh Pn+1 + Pnh En+1 + Enh h +γ h + ω02 h = ωp2 h , (2.39g) ∆t 2 2 2 Qn+1 h − Qnh σ n+1 + σhn = h , (2.39h) ∆t 2 σhn+1 − σhn σ n+1 + σhn Qn+1 + Qnh + γv h + ωv2 h = ωv2 Ih (En+1 n h ·Eh ), (2.39i) ∆t 2 ! 2 n+1 n+1/2 Hzh − Hzh n+1 n+1 µ0 , ϕ + BhE (Exh , Eyh , ϕ) = 0, ∀ϕ ∈ Vhk (2.39j) ∆t/2 The terms of BhE , Bxh H , and Byh H are defined in (2.33), with either the central fluxes (2.34) or alternating fluxes in (2.35). Energy stability result for fully discrete scheme follows: Theorem 2.3.3 (Fully-discrete energy stability [2]) Under the assumption of periodic bound- ary conditions, the fully-discrete leap-frog nodal DG schemes (2.39) satisfy ϵ0 γ∆t ϵ0 aθγv ∆t Z Z Ehn+1 − Ehn =− |Jn+1 h + Jnh |2 dΩ − (σhn+1 + σhn )2 dΩ ≤ 0, (2.40) 4ωp2 Ω 8ωv2 Ω with the discrete energy Ehn defined as µ0 n+1/2 n−1/2 ϵ0 ϵ∞ n 2 ϵ0 ϵ0 ω02 n 2 ϵ0 aθ n 2 Z  Ehn = Hzh Hzh + |Eh | + 2 |Jnh |2 + |Ph | + (σ ) Ω 2 2 2ωp 2ωp2 4ωv2 h ϵ0 aθ n n 2  3ϵ0 a(1 − θ) n 4  ϵ0 aθ n 2  + Ih Qh |Eh | + Ih |Eh | + (Qh ) dΩ. (2.41) 2 4 4 In addition, Ehn ≥ 0 if θ ∈ [0, 3/4] and ∆t ≤ Ch for some constant C. 33 Similarly, we can formulate fully implicit scheme as in (2.39); however, update of Hzh at intermediate time step n + 1/2 is unnecessary, so in fully implicit scheme, (2.39j) should be removed, while (2.39a),(2.39b),and (2.39c) should be replaced by n+1 n E n+1 + E n  n+1 n Hzh − Hzh  E Exh + Exh yh yh µ0 , ϕ + Bh ( , , ϕ) = 0, ∀ϕ ∈ Vhk (2.42a) ∆t 2 2  n+1 n n+1 n Dxh − Dxh  H Hzh + Hzh , ϕ + Bxh ( , ϕ) = 0, ∀ϕ ∈ Vhk , (2.42b) ∆t 2 n+1 ! n Dyh − Dyh n+1 H Hzh + Hzh n , ϕ + Byh ( , ϕ) = 0, ∀ϕ ∈ Vhk , (2.42c) ∆t 2 All simulations of two dimension in Section 2.6 use fully implicit scheme (2.39d)-(2.39i) and (2.42). 2.3.2 High Dimensional Sparse Grid DG Method and Interpolatory Multiwavelets Based on one dimensional Sparse Grid DG Method we discussed before, we are ready to define it in higher dimension d > 1. First we recall some basic notations about multi- indexes. For a multi-index α = (α1 , . . . , αd ) ∈ Nd0 , where N0 denotes the set of nonnegative integers, the l1 and l∞ norms are defined as X d |α|1 = αi , |α|∞ = max αi . 1≤i≤d i=1 The component-wise arithmetic operations are α · β = (α1 β1 , . . . , αd βd ), c · α = (cα1 , . . . , cαd ), 2α = (2α1 , . . . , 2αd ), while and relational operations are defined as α ≤ β ⇐⇒ αi ≤ βi ∀i; α = β ⇐⇒ α ≤ β and β ≤ α; α < β ⇐⇒ α ≤ β and β ≤ α. By making use of the multi-index notation, we denote by l = (l1 , . . . , ld ) ∈ Nd0 , the mesh level in a multivariate sense. Define the tensor-product mesh grid O O O Ωl = Ωl1 Ωl2 ... Ωld , 34 as tensor product of one dimensional mesh, and the corresponding mesh size hl = (hl1 , . . . , hld ). For convenience, we also write hli as hi , for i = 1, 2, . . . , d. Based on grid Ωl , we denote by Ilj = (h1 j1 , h1 (j1 + 1)) × (h2 j2 , h2 (j2 + 1)) × . . . × (hd jd , hd (jd + 1)) an elementary cell, and Vlk = {v : v(x) ∈ Qk (Il )j , 0 ≤ j ≤ 2l − 1} = Vl1k,x1 × Vl2k,x2 × . . . × Vldk,xd the tensor-product piecewise polynomial space, where Qk (Ilj ) denotes the collection of polynomials of degree up to k in each dimension on cell Ilj , and Vlik,xi the piecewise k- th order polynomial space of variable xi , with x = (x1 , . . . , xi , . . . , xd ). If we use equal mesh refinement of size hN = 2−N in each coordinate direction, the grid and space will be k denoted by ΩN and VN , respectively. Based on a tensor-product construction, the multi-dimensional increment space can be defined as Wlk = Wlk1 ,x1 × Wlk2 ,x2 × . . . × Wlkd ,xd Therefore, the standard tensor-product piecewise polynomial space on N can be written as M k VN = Wlk , |l|∞ ≤N ; l∈Nd0 while the sparse grid approximation space as discussed in e.g. [3] is M Vbk = N Wlk , (2.43) |l|1 ≤N ; l∈Nd0 k b k scales as O((k + 1)d 2N N d−1 ) [3], which is signif- a subset of VN . The dimension of V N k icantly less than that of VN with exponential dependence on N d. The approximation b k are discussed in e.g. [3], which has a stronger smoothness requirement results for V N k than the traditional space VN . However, for adaptive scheme we will discuss later, we 35 will not require the numerical solution to be in V b k , but rather in Vk , and we can choose N N k basis functions from VN to solve the PDE problem in each time step adatpively. Finally, we define the polynomial basis functions of degree k in multi-dimensions as Y d j vi,l (x) = vijmm,lm (xm ), m=1 and for all m, lm ∈ N0 , 0 ≤ im ≤ k, 0 ≤ jm ≤ max{0, 2lm −1 − 1}. The orthonormality of the basis is established in sections before. Furthermore, the support j of vi,l of all i are Ilj′ , where l′ = (l1′ , . . . , ld′ ) and lm ′ = max{lm − 1, 0}. Following the same manner, we can construct high-dimensional interpolatory multi- wavelets. Let fM = W W fM × W fM × . . . × W fM , l l1 ,x1 l2 ,x2 ld ,xd where W f M , m = 1, . . . , d are one-dimensional interpolatory multiwavelet spaces of lm ,xm mesh level lm and m-th dimension variable xm , and M here is degree of polynomials of interpolatory multiwavelets. Recall that we might choose M as a larger integer than k. Then M M fM, VN = W l |l|∞ ≤N ; l∈Nd0 while the sparse grid approximation space is M eM = V fM. W N l |l|1 ≤N ; l∈Nd0 2.3.3 Adaptive SGDG Method In this subsection, we formulate an adaptive multiresolution projection algorithm as dis- cussed in [123] and [5, 6]. The last two references provide full details of adaptive SGDG scheme. Define an accuracy threshold or error threshold ϵ > 0; we will use this threshold throughout this subsection when we demonstrate the adaptive scheme, in both refine- ment and coarsening procedures. In practice, we choose smaller accuracy threshold for 36 coarsening than refinement, usually δ = ε/10, where ε and δ are error threshold for re- finement and coarsening, respectively. The backbone of the algorithm is the fact that each hierarchical basis, of space VNk in k one dimension or of space VN in higher dimension, represents some details of a solution or a function on a specific mesh scale, which naturally provides an error indicator for the design of adaptive algorithms. In one dimension, consider u(x) has L2 projection ũ(x) on VNk supported on Ω = [0, 1], as linear combination of Alpert’s multiwavelet basis functions: n−1 −1} X k X N max{0,2 X ũ(x) = uji,n vi,n j (x), (2.44) i=0 n=0 j=0 where Z 1 uji,n = u(x)vi,nj (x)dx. 0 It was shown in [4] that n−1 −1} Xk max{0,2 X |uji,n |2 ≤ C2−(q+1)n |u|Hq+1 (Ω) , i=0 j=0 if u ∈ Hp+1 (Ω), where seminorm Hp+1 (Ω) is dp+1 u |u|Hp+1 (Ω) = , dxp+1 L2 (Ω) and q = min{p, k}. The hierarchial coefficients uji,n here (also called hierarchial surplus) naturally serves as an indicator for local smoothness of u(x). The main idea of the adap- tive algorithm is to choose only those basis functions or elements whose coefficients above a prescribed threshold value ϵ. Here we consider using L2 norm of u as indicator, and j since basis functions vi,n are orthnormal, this is equivalent to k !1/2 X j 2 |ui,n | i=0 for each fixed n and fixed 0 ≤ j ≤ max{0, 2n−1 − 1}. In summary, we would flag an element Vnj if k !1/2 X |uji,n |2 ≥ ϵ. i=0 37 k In higher dimension, projection ũ of function u on VN is X X X j ũ(x) = uji,l vi,l (x), (2.45) 0≤i≤k 0≤l≤N 0≤j≤max{0,2l−1 −1} where Z uji,l = u(x)vi,l j (x)dx, (2.46) Ω with spatial domain Ω = [0, 1]d and dimension d. Here multi-index of constant is simple duplicate of the constant in d times, where d is dimension of spatial domain, e.g. 1 = (1, . . . , 1), N = (N, . . . , N ), and max function here is component-wise. It is also demonstrated in [4, 5] that  1/2 X X  |uji,l |2  ≤ C2−(q+1)l |u|Hq+1 (Ω) , ∀u ∈ Hp+1 (Ω), 0≤i≤k 0≤j≤max{0,2l−1 −1} similar to the one dimension case. So in higher dimension, an element Vlj will be flagged as important if !1/2 X |uji,l |2 ≥ ϵ. (2.47) 0≤i≤k The adaptive scheme is implemented by hash table as the underlying data structure, which stores active elements of the adaptive scheme, and the numerical solution on these active elements. The implementation details will be discussed in next section. We now introduce the concepts of child, parent and leaf elements [6], for better understanding of ′ the adaptive procedure. Assume Vlj and Vlj′ are two elements, satisfying |l|∞ , |l′ |∞ ≤ N . Then element Vlj ais called child element of Vlj , if and only if for some 1 ≤ m ≤ d, l′m = lm + 1, where lm and l′m are m-th component of multi-index l and l′ , respectively. ′ Accordingly, we call Vlj a father element of Vlj′ . If an element does not have any of its child elements in the hash table, then we call it a leaf element. To describe the time evolution of adaptive scheme [6], consider now the numerical solution uh on active elements at time step tn is stored in a hash table H. Information of 38 leaf elements is stored in a separate leaf table. The time evolution from time step tn to tn+1 = tn + ∆t consists of four steps. The first step is the prediction step, which predicts the location on spatial domain Ω where the details of numerical solution uh become sig- nificant at the next time step tn+1 , to determine in next step where we should add more elements in order to capture the fine structures. We solve from tn to tn+1 using a cheap (p) solver, e.g. the forward Euler method in time, to obtain predicted solution uh . For spa- tial discretization in this step, if any nonlinear terms involve, the interpolation operator I applied here has the same multiresolution structure as determined by the hash table H corresponding to the numerical solution uh at current time step tn . (p) The second step is refinement step based on predicted solution uh . An active element in hash table, satisfying (2.47), is considered as important element. All child elements Vlj of an important element will be added to the hash table, to obtain new hash table H (p) , and coefficients uji,l for 0 ≤ i ≤ k associated to newly added multiwavelet basis functions, as in (2.46), are set to be zeros. Leaf table is also updated accordingly. Now in the third step, with numerical solution uh at time step tn , and a new hash table H (p) , we can solve numerical solution ũn+1h for the next step tn+1 before the final coarsening step. Again, the interpolation operator is now determined by the new hash table H (p) . The last step is coarsening step, to remove leaf elements that are not important. We traverse elements in the leaf table, and if a leaf element does not satisfy (2.47), it will be removed from both the leaf table and hash table. Note that the leaf table has to be dynam- ically updated in this procedure, since the removal of any leaf element may potentially create one or more new leaf elements. For the same reason, coarsening is done in recur- sive way, until no more leaf elements are removed and created. Now we obtain the new hash table and leaf table, and numerical solution un+1 h at time step n + 1. The last component of the algorithm is an efficient implementation of the hash table. As suggested in [124, 75] and further discussed in [4], the hash table approach is easy to implement, requires little storage overhead, and allows one to conveniently deal with hi- 39 erarchical multi-index (l, j) in the implementation. Specifically, by a prescribed hash func- tion that will be discussed in details in 2.4.2, the hierarchical multi-index (l, j) is mapped to a hash key (an integer), which serves as an address in the hash table. Then, given a hierarchical index, the associated data can be easily stored and retrieved by computing the hash key. 2.4 Adaptive SGDG Package and PETSc Implementation 2.4.1 Overview of Package Before we take into account full details of the package, we briefly introduce the struc- ture of the SGDG package [8] here, which is developed using C++ language. The SGDG package is developed under the same logic as development of SGDG methods and their adaptive version, together with necessary time integration methods, to solve several dif- ferent PDEs. Therefore in general, the SGDG pacakge consists significantly of two parts, regarding spatial and time discretization, together with other supporting classes and files. The most important part is SGDG spatial discretization, including Alpert’s multi- wavelet basis, interpolatory multiwavelet basis, formulation of bilinear form, represen- tation of numerical solutions as linear combination of basis functions, and fast algorithms for matrix multiplication. Several other features are implemented to support these pur- poses, including hash table to refer to basis functions, quadrature functions, and trans- formation of numerical solution under different multiwavelet basis, etc. Besides, several time integration methods are implemented to corporate with spatial SGDG discretiza- tion, including several Runge Kutta method or SSP methods [92, 108, 109, 110], and IMEX method [125, 6], with help of data structures and linear solvers in Eigen package [113]; af- ter re-implementing several key functions regarding PETSc [9] data structures, generally fully implicit method with nonlinear terms becomes available in the package. At the end, several examples regarding different equations are written for further research and users in future; several unit tests are developed to verify correctness of certain significant and 40 complicated C++ classes and functions; makefile is developed to handle compiling and linking procedure to create executable effectively; doxygen [126] is introduced to generate documentation for source codes of the package. 2.4.2 Hash Key of SGDG Elements For implementation of SGDG method, hash key is a key ingredient. After spatial dis- cretization, each element or cell in SGDG formulation is provided a unique integer as the hash key, and this element could be identified in the later calculations by the hash key. To describe the details of hash key, we first notice that each element is uniquely identi- fied by its levels and support indexes on each dimension. Consider the SGDG formulation of d dimension and an element Vlj , then on dimension i, the level of the element is li and the support is identified as ji , 1 ≤ i ≤ d, which are i-th component of multi-index l and ′ j, respectively. Assume there are two elements Vlj and Vlj′ , then element Vlj has smaller ′ hash key than Vlj′ , if and only if 1. |l|1 < |l′ |1 , i.e. Xd X d li < li′ ; i=1 i=1 or 2. |l|1 = |l′ |1 , and li = li′ for i = p + 1, p + 2, . . . , d, but lp < lp′ for some 1 ≤ p ≤ d; or 3. li = li′ for all 1 ≤ i ≤ d, and ji < ji′ for i = 1, 2, . . . , p − 1, but jp < jp′ for some 1 ≤ p ≤ d. It is clear that under such procedure, two elements, whose level vectors have the same l1 norm, or who have the same level vectors, will have relatively close hash key numbers; elements with smaller l1 norm of level vector will have smaller hash key numbers. These requirements are suitable for SGDG scheme, since sparse grid approximation space (2.43) collects elements with l1 norm of level vector bounded by the maximum mesh level N . 41 2.4.3 SGDG Spatial Discretization A DGSolution class is defined, in general, to represent the SGDG solutions as linear com- bination of (Alpert’s or interpolatory) multiwavelet basis functions. The class stores sev- eral important information, including maximum mesh level N = NMAX, dimension d = DIM, number of components of solutions VEC_NUM, and most importantly, the hash ta- ble implemented in a C++ standard container map dg. Certain member functions are im- plemented, which includes initialization by L2 projection of initial functions, copy func- tion to copy SGDG solutions among realizations, evaluation functions to evaluate SGDG solutions at different spatial points regarding different basis functions, functions to com- pute errors, treatments regarding artificial viscosity, etc. Active elements of SGDG method, together with the numerical solution, as discussed in last section, is stored in hash table, implemented by a standard C++ container map called dg. The key values of the map dg are hash key discussed in 2.4.2; the hash key is mapped to the object of class Element, each of which represents element Vlj , and nu- j merical solution on the Alpert’s multiwavelet basis {vi,l }0≤i≤k . On each element or ob- ject of Element class, the associated coefficients of numerical solution for Alpert’s basis functions are stored in ucoe_alpt, while the associated coefficients for other interpolatory multiwavelet basis are stored in ucoe_intp. Other necessary data, including fucoe_intp for coefficients of f (u) for interpolatory multiwavelet basis with given nonlinear function f and numerical solution u, are also declared and stored in Element class. Some basic information of the element, including the support and discontinuous points of the ba- sis functions, are also stored explicitly for further use in volume integral etc. Besides, to compute the volume integral and cell boundary integral more efficiently, we need to determine whether two basis functions (or two elements) are orthogonal to each other; several functions in the class play such role. For the purpose of adaptive method, we need to add or remove elements dynamically twice on each time step, so we need to store the pointer to children element in hash_ptr_chd and to parent element in hash_ptr_par. 42 class DGSolution { public: // constructor and destructor of the class static int DIM; // dimension of spatial domain static int VEC_NUM; // number of unknowns ... // find pointers to elements that are related to current element void find_ptr_vol_alpt(); void find_ptr_flx_alpt(); void find_ptr_vol_intp(); void find_ptr_flx_intp(); // functions regarding error calculations and copying data between vectors // key of the map is hash key // basis functions and associated data store in object of class Element std::unordered_map dg; // calculate value of DG solution at a given point std::vector val(const std::vector & x, \ const std::vector & derivative) const; // functions for initialization of DG solution protected: const bool sparse; const int NMAX; /// maximum mesh level Hash hash; AllBasis all_bas; AllBasis all_bas_Lag; AllBasis all_bas_Her; ... }; Algorithm 2.1: DGSolution Class, in DGSolution.h There are also several classes defined for basis functions, all of which are derived from the base class Basis. Common features of basis functions are defined here, including level of mesh, support of basis and discontinuous points, index indicating which multiwavelet function the current basis function is scaled from, and several functions to compute vol- ume or edge integrals. The derived classes, including AlptBasis, LagrBasis, HermBa- sis, represent the corresponding Alpert’s multiwavelet basis functions and Lagrangian or Hermite interpolatory multiwavelet basis functions. Expression of basis functions and their derivatives are implemented for further usage in volume and edge integral calcula- 43 class Element { public: Element(const std::vector & level_, const std::vector & suppt_, \ AllBasis & all_bas, Hash & hash); ~Element() {}; static int DIM; ///< dimension of spatial domain static int VEC_NUM; ///< number of unknowns static std::vector> is_intp; ///< specify which components need interpolation const std::vector level; ///< mesh level in each dimension const std::vector suppt; ///< support index in each dimension /// left and right end point of support in multi-dimension std::vector xl; std::vector xr; std::vector> dis_point; ///< discontinuous points std::vector> supp_interv; ///< interval of support std::vector order_elem; ///< order for this element in each dimension int hash_key; ///< hash key for this element static int PMAX_alpt; //< polynomial degree for Alpert’s basis static int PMAX_intp; //< polynomial degree for interpolatory basis // order of Alpert’s or interpolatory multiwavelet basis in this element, e.g std::vector> order_alpt_basis_in_dg; ... // coefficients of Alpert’s basis, and in fast multiplication e.g. std::vector> ucoe_alpt; ... // coefficients of interpolatory basis for numerical solution u, e.g. std::vector< VecMultiD > ucoe_intp; ... // coefficients for interpolatory basis of f(u), e.g. std::vector< std::vector< VecMultiD > > fucoe_intp; ... // coefficients of numerical solution on next time step, for fully implicit method std::vector< VecMultiD > up_intp_next; std::vector< VecMultiD > ucoe_alpt_next; // number of total children and parent elements; used in adaptive scheme int num_total_chd, num_total_par; // maps that store existing children and parents elements; used in adaptive scheme std::unordered_map hash_ptr_chd; std::unordered_map hash_ptr_par; ... }; Algorithm 2.2: Element Class, in Element.h 44 tions. Previous paragraphs discuss the top-down structure from DGSolution class to basis functions, and these classes are core classes for regular SGDG methods. However, to im- plement adaptive methods properly, a derived DGAdapt class from DGSolution class is necessary. The main feature added to DGAdapt is functions refine() and coarsen(), which conduct the refinement and coarseness procedure in adaptive method. The basic logic of these two algorithms are to search among elements stored in dg map of DGSolution class, or through leaf table called leaf and leaf_zero_child that store all leaf elements or leaf elements with no active child elements, to determine if an element is important, as discussed in details in Subsection 2.3.3. Note that an element’s importance and whether an element should be removed are determined by the element and its active children, so the determination process is local. The data structures defined above provide a clear picture of how solution is repre- sented in the SGDG algorithm, but we still need certain data structures or classes of ma- trix to represent related transformations when time integration is applied. Recall that in Basis class, we define a general class for all basis, together with operations on basis func- tions, especially certain inner products between different basis functions or their deriva- tives. We then define, accordingly, a class template AllBasis, which collects all basis functions in one dimension, where T represents one of AlptBasis, LagrBasis, HermBasis, specifying which type of basis is collected in the AllBasis; we employ C++ template here to provide a generic class among all kinds of basis functions. On top of the AllBasis class, we define class OperatorMatrix1D, which provides one dimensional transformation matrix from coefficients of numerical solution. Using the information provided in OperatorMatrix1D, we are able to assemble and store matrixes regarding SGDG spatial discretization in class BilinearForm. BilinearForm supports two sets of data structure: for thread parallelization we could use Eigen package [113] and its matrixes and vectors; for implicit scheme using nonlinear solver in time stepping, we use 45 class DGAdapt : public DGSolution { public: // constructor and destructor of class DGAdapt // refine based on reference solution generated by Euler forward void refine(); // coarsen void coarsen(); ... protected: const double eps; // refinement threshold const double eta; // coarsening threshold // leaf element that does not have its all children std::unordered_map leaf; // leaf element with no child std::unordered_map leaf_zero_child; // after adding or deleting, check no holes in solution void check_hole(); // update leaf table DGAdapt::leaf void update_leaf(); // add a given element into DG solution void add_elem(Element & elem); // delete a given element into DG solution void del_elem(Element & elem); ... private: // coarsen based on leaf elements with no child void coarsen_no_leaf(); // update leaf table DGAdapt::leaf_zero_child void update_leaf_zero_child(); ... }; Algorithm 2.3: DGAdapt class, in DGAdapt.h PETSc package [9] and its data structure and solvers. 46 2.4.4 Time Integration and PETSc Package The package uses ODESolver class to deal with time integration, coupling with data struc- ture from Eigen package. ODESolver class stores matrix and solution vectors together at the same time, and provide several functions to operate matrix and vector operations. Several specific time integration methods are defined as derived class of ODESolver, in- cluding explicit Strong Stability Preserving (SSP) methods, multistep methods, and IMEX method. Implementations of some of these methods require solving a linear system, in which case linear solver of Eigen package will be used to solve linear system when evolv- ing the solution to next time step. This kind of approach made certain implicit method available in the package, but a fully implicit method involving nonlinearity would not fit into such framework. To deal with this, we introduce PETSc package [9] for nonlinear solvers involved. In order to illustrate how SGDG package couples with PETSc package, we provide the following linear example. Consider linear PDE Ut − (AU )x − (BU )y = 0, (t, x, y) ∈ [0, T ] × [0, 1] × [0, 1] with periodic boundary condition on both x and y direction, where U (t, x, y) = (U0 , U1 , U2 )T is vector-valued with three components, and      0 0 −1 0 1 0     A=  0 0 0 ,  B=  1 0 0.      −1 0 0 0 0 0 An accurate solution of this system is √ 5t+x+2y)) √ U (t, x, y) = ecos(2π( ( 5, 2, −1). The semi-discrete SGDG scheme of this equation is, to find vector-valued solution U = 47 U (t, x, y), for any vector-valued test function V , such that 2N X Z 1 C01 U (x+ , y) + C00 U (x− , y) · [V ]   (Ut , V ) + (AU, Vx ) + dy 0 x=xj+1/2 j=1 2N X Z 1 C11 U (x, y + ) + C10 U (x, y − ) · [V ]   + (BU, Vy ) + dx = 0 (2.49) 0 y=yj+1/2 j=1 where (·, ·) is inner product on spatial region [0, 1]2 of (x, y). For alternating flux c0 = U0− , U c1 = U1+ , U c2 = U2+ , U we have          0 0 0 0 0 −1 0 0 0 0 1 0         C00 = 0 0 0 , C01 =0 0 0  , C10 = 1 0 0 , C11 = 0 0 0 .               −1 0 0 0 0 0 0 0 0 0 0 0 We can rewrite (2.49) as (Ut , V ) + B(U, V ) = 0 where B(·, ·) is a bilinear form. For fully implicit method in time, consider numerical solutions at time step tn and tn+1 = tn + ∆t are U n and U n+1 respectively, then ∆t F (U n+1 ; U n , V ) := (U n+1 − U n , V ) + B(U n+1 + U n , V ) = 0. (2.50) 2 In Algorithm 2.4, we provide an example of solving (2.48), after removing some tech- nical details, to reveal the main structure of the SGDG package. Maximum mesh level NMAX and polynomial degree PMAX of Alpert’s multiwavelet basis are provided by command line arguments. Static variables, hash key in class Hash, collection of all Alpert’s multiwavelet basis functions in class AllBasis, one dimension matrix for lin- ear operations in class OperatorMarix1D, and numerical solution and hash table in class DGSolution, are defined in proper order. Functions with name format assemble_matrix_*_system assemble the matrix corresponding to bilinear form B(·, ·), using coefficients matrix A, B, and C00 , C01 , C10 , C11 , and store the assembled ma- trix in PETSc matrix mat_petsc of class HyperbolicAlpt, a derived class from class Bilin- earForm. 48 // include header files typedef struct AppCtx AppCtx; struct AppCtx { HyperbolicAlpt* app_HyperbolicAlpt; DGSolution* app_DGSolution; double dt; }; int main(int argc, char** argv) { PetscErrorCode ierr; ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);if (ierr) return ierr; int NMAX=6,PMAX=1; ierr=PetscOptionsGetInt(NULL,NULL,"-NMAX",&NMAX,NULL);CHKERRQ(ierr); ierr=PetscOptionsGetInt(NULL,NULL,"-PMAX",&PMAX,NULL);CHKERRQ(ierr); // assign values to static variables AlptBasis::PMAX = PMAX; Element::DIM = 2; Element::VEC_NUM = 3; ... Hash hash; // hash key AllBasis all_bas_alpt(NMAX); const std::string boundary_type = "period"; OperatorMatrix1D \ oper_matx_alpt(all_bas_alpt, all_bas_alpt, boundary_type); DGSolution dg_solu(true, NMAX, NMAX, all_bas_alpt, hash); // initialize the DGSolution with an initial function HyperbolicAlpt linear(dg_solu, oper_matx_alpt); linear.assemble_matrix_vol_system(0,A); linear.assemble_matrix_vol_system(1,B); linear.assemble_matrix_flx_system(0,1,C01); linear.assemble_matrix_flx_system(0,-1,C00); linear.assemble_matrix_flx_system(1,1,C11); linear.assemble_matrix_flx_system(1,-1,C10); double final_time = 1./std::sqrt(5.); double cfl = 0.9; AppCtx app_sol; app_sol.app_HyperbolicAlpt = &linear; app_sol.app_DGSolution = &dg_solu; double current_time = 0.; // while loop to find numerical solution; see below // calculate errors; print solutions // Destroy PETSc objects return ierr; } Algorithm 2.4: Example of Linear Equations, in Linear2D.cpp 49 In the while loop of finding numerical solutions of each time step, as shown in Al- gorithm 2.5, a Scalable Nonlinear Equations Solvers (SNES) object of PETSc is created for solving general linear and nonlinear system. For linear problem, scalable linear equa- tions solvers (KSP) of PETSc is a better choice, and setting of KSP object is similar to SNES object. while (current_time < final_time) { double dt = cfl / std::pow(2., NMAX); dt = std::min( dt , finaltime - curr_time ); app_sol.dt = dt; // Initialize vectors for residual and next step solution Vec residual, next_step_solution; ... SNES snes; ierr = SNESCreate(PETSC_COMM_WORLD,&snes); ierr = SNESSetFunction(snes,residual,FormFunction,&app_sol); ierr = SNESSetFromOptions(snes); ierr = SNESSolve(snes,NULL,next_step_solution); // Copy next step solution back to DGSolution class curr_time += dt; // Destroy PETSc objects } Algorithm 2.5: While Loop to Find Numerical Solution in Algorithm 2.4 As in while loop of Algorithm 2.5, the key functions of SNES nonlinear solver are SNESSetFunction and SNESSolve. SNESSetFunction provides to the nonlinear solver a FormFunction, which compute F (x) if the nonlinear problem to be solved is F (x) = 0. In our example, as in Algorithm 2.6, FormFunction computes F (U n+1 ; U n , V ) with input of numerical solution of next time step U n+1 , and numerical solution of current time step U n and other related information are provided through an application context of class AppCtx. As in Algorithm 2.4, an AppCtx object stores time step dt= ∆t, and the pointer of DGSolution object of SGDG solution, and HyperbolicAlpt object of bilinear form of the system. Nevertheless, SNESSolve uses iterative method to find solution of nonlinear prob- 50 PetscErrorCode FormFunction(SNES snes,Vec next_u,Vec f,void* ctx) { AppCtx *user = (AppCtx*)ctx; PetscErrorCode ierr; Vec now_u; // Copy current step solution in user->app_DGSolution to now_u ... Vec sum_, diff_; // sum_ = 0.5 * (next_u + now_u) * user->dt ... // diff_ = next_u - now_u ... // f = diff_ + mat_petsc * sum_ ierr = MatMultAdd(user->app_HyperbolicAlpt->mat_petsc, sum_, diff_, f); return ierr; } Algorithm 2.6: Function FormFunction for Nonlinear Solver in Algorithm 2.4 lem, and SNESSetFromOptions sets various SNES parameters, either from command line arguments or by default. A typical set of options provided through command line argu- ments is like Linear2D -NMAX 6 -PMAX 3 -snes_mf \ -snes_rtol 1e-12 -snes_atol 1e-12 -snes_stol 1e-12 Algorithm 2.7: Command to Run Executable Linear2D which set maximum mesh level 6 and polynomial degree 3, choose matrix-free method through option -snes_mf, and set tolerance of nonlinear solvers by options –snes_rtol, –snes_atol, and –snes_stol. Before the end of this section, we remark that although the example provided here is linear, the nonlinear problem can fit into this structure too. In case a FormFunction is provided, matrix-free method can be applied to solve the numerical solution. 51 2.5 Numerical Simulation in One Dimensional Case 2.5.1 Kink Shape Solutions The first numerical test we consider, as discussed in [1], where a traveling wave solution was constructed for the instantaneous intensity-dependent Kerr response neglecting the influence of damping, corresponding to the case of θ = 0 and τ = ∞ in (2.11). This yields a simplified system from Equations (2.11) as the following: ∂t H = ∂x E (2.51a) ∂t D = ∂x H (2.51b) ∂t P = 6J (2.51c) ∂t J = 6(−ω02 P + ωp2 E) (2.51d) D = ϵ∞ E + P + aE 3 (2.51e) where the only nonlinear term comes from constitutive law of D, representing the cubic Kerr effect. We can find a traveling wave function E(x, t) = E(ξ) with ξ = 6(x−vt), where E(ξ) is comprised of a kink and anti-kink wave, and is solved based on the following ODE of E(ξ) and Φ(ξ): dE = Φ (2.52a) dξ dΦ 6av 2 EΦ2 + (ϵ∞ ω02 + ωp2 − ω02 /v 2 )E + aω02 E 3 = (2.52b) dξ 1 − ϵ∞ v 2 − 3av 2 E 2 Here the parameters are √ 0.6545 ϵ∞ = 2.25, a = 0.75, ω0 = 93.627179982222216, ωp = 3ω0 , v = √ , ϵ∞ with condition E(0) = 0, Φ(0) = 0.24919666777865812. These conditions are carefully chosen, so that the resulting solution of E, Φ are 6-periodic. The overall solution for all variables, based on E, Φ obtained from the ODE, again with 52 ξ = 6(x − vt), is the following: 1 1 H(x, t) = − E(ξ), D(x, t) = 2 E(ξ) (2.53) v v 1 P (x, t) = ( 2 − ϵ∞ )E(ξ) − aE(ξ)3 (2.54) v 1 J(x, t) = (ϵ∞ v − )Φ(ξ) + 3avE(ξ)2 Φ(ξ) (2.55) v Note that all H, D, E, P, J have period 1. The function E and J at t = 0 are given in Figure 2.1. Initial E Initial J 0.4 0.4 E J 0.2 0.2 0 0 -0.2 -0.2 -0.4 -0.4 0 2 4 6 0 2 4 6 Figure 2.1: Function E and J at t = 0 We perform an accuracy test for fully implicit method. We choose the CFL condition to be ∆t = CF L × h(k+1)/2 where h = 1/2N is mesh size, and k is degree of piecewise polynomial. CFL constant is set as in Table 2.1. L1 , L2 and L∞ errors with order, regarding function E and J, are shown, k 1 2 3 CFL 1 8 63 Table 2.1: CFL constant for fully implicit method respectively, in Table 2.2 and 2.3 for k = 1, in Table 2.4 and 2.5 for k = 2, and in Table 2.6 and 2.7 for k = 3. All errors are calculated at final time T = 1/v. It is clear that SGDG 53 scheme with P k Alpert’s multiwavelet basis has accuracy order k + 1. This is expected since SGDG is the same as traditional DG in one dimension. N L1 Error Order L2 Error Order L∞ Error Order 6 7.915e-04 1.506e-03 5.295e-03 7 2.217e-04 1.836 4.421e-04 1.769 1.708e-03 1.632 8 5.606e-05 1.983 1.143e-04 1.951 4.638e-04 1.881 9 1.399e-05 2.002 2.865e-05 1.996 1.180e-04 1.975 10 3.494e-06 2.001 7.160e-06 2.001 2.948e-05 2.001 Table 2.2: Errors and Orders of Accuracy of E. k = 1, T = 1/v. N L1 Error Order L2 Error Order L∞ Error Order 6 8.435e-03 1.623e-02 5.124e-02 7 2.604e-03 1.696 5.537e-03 1.551 1.982e-02 1.370 8 6.826e-04 1.931 1.514e-03 1.871 5.793e-03 1.775 9 1.717e-04 1.991 3.836e-04 1.980 1.486e-03 1.963 10 4.292e-05 2.000 9.603e-05 1.998 3.729e-04 1.994 Table 2.3: Errors and Orders of Accuracy of J. k = 1, T = 1/v. N L1 Error Order L2 Error Order L∞ Error Order 6 8.902e-04 1.718e-03 5.946e-03 7 1.161e-04 2.939 2.359e-04 2.864 9.532e-04 2.641 8 1.452e-05 3.000 2.967e-05 2.991 1.226e-04 2.959 9 1.816e-06 2.999 3.715e-06 2.998 1.541e-05 2.992 10 2.270e-07 3.000 4.643e-07 3.000 1.926e-06 3.000 Table 2.4: Errors and Orders of Accuracy of E. k = 2, T = 1/v. N L1 Error Order L2 Error Order L∞ Error Order 6 1.078e-02 2.051e-02 6.959e-02 7 1.426e-03 2.919 3.144e-03 2.706 1.200e-02 2.536 8 1.787e-04 2.996 3.992e-04 2.977 1.487e-03 3.013 9 2.236e-05 2.998 5.002e-05 2.997 1.844e-04 3.011 10 2.796e-06 3.000 6.253e-06 3.000 2.305e-05 3.000 Table 2.5: Errors and Orders of Accuracy of J. k = 2, T = 1/v. 54 N L1 Error Order L2 Error Order L∞ Error Order 6 8.642e-04 1.665e-03 5.768e-03 7 5.565e-05 3.957 1.134e-04 3.877 4.617e-04 3.643 8 3.473e-06 4.002 7.104e-06 3.996 2.926e-05 3.980 9 2.171e-07 4.000 4.440e-07 4.000 1.830e-06 3.999 10 1.357e-08 4.000 2.775e-08 4.000 1.144e-07 4.000 Table 2.6: Errors and Orders of Accuracy of E. k = 3, T = 1/v. N L1 Error Order L2 Error Order L∞ Error Order 6 1.058e-02 2.006e-02 6.870e-02 7 6.831e-04 3.953 1.521e-03 3.721 5.726e-03 3.585 8 4.279e-05 3.997 9.565e-05 3.991 3.510e-04 4.028 9 2.674e-06 4.000 5.980e-06 4.000 2.190e-05 4.003 10 1.688e-07 3.986 3.738e-07 4.000 1.371e-06 3.998 Table 2.7: Errors and Orders of Accuracy of J. k = 3, T = 1/v. Besides, we also use this example to test our adaptive method, and we calculate the convergence rate with respect to the error threshold Rε and with respect to degree of freedoms RDOF as following log(el−1 /el ) log(el−1 /el ) RDOF = , Rε = . log(DOFl /DOFl−1 ) log(εl /εl−1 ) where el is the standard L2 error, of either E or J component, with refinement parameter εl , and DOFl is the associated number of active degrees of freedom at final time after last coarsening step. 2.5.2 Soliton Propagation In this example, we will consider the soliton propagation similar to the setup in [1]. The coefficients in this example (2.12) are chosen as 1 1 ϵ∞ = 2.25, ϵs = 5.25, β1 = 3, = 1.168 × 10−5 , = 29.2/32, τ τv p a = 0.07, θ = 0.3, Ω0 = 12.57, ω0 = 5.84, ωv = 1.28, ωp = ω0 β1 . 55 ε DOF E error RDOF Rε J error RDOF Rε 5.00e-05 48 5.57e-05 3.46e-03 2.00e-05 58 3.10e-05 3.09 0.64 2.02e-03 2.86 0.59 1.00e-05 74 1.69e-05 2.50 0.88 1.26e-03 1.92 0.67 5.00e-06 88 9.14e-06 3.54 0.89 7.03e-04 3.38 0.85 2.50e-06 92 5.42e-06 11.73 0.75 3.72e-04 14.31 0.92 Table 2.8: Adaptive Result of P 2 case, using alternating flux ε DOF E error RDOF Rε J error RDOF Rε 5.00e-05 36 7.84e-05 2.46e-03 2.00e-05 40 3.42e-05 7.89 0.91 1.15e-03 7.23 0.83 1.00e-05 62 2.32e-05 0.88 0.56 8.14e-04 0.79 0.50 5.00e-06 76 8.76e-06 4.78 1.41 3.40e-04 4.28 1.26 2.50e-06 82 6.21e-06 4.54 0.50 1.79e-04 8.44 0.93 Table 2.9: Adaptive Result of P 2 case, using upwind flux ε DOF E error RDOF Rε J error RDOF Rε 5.00e-05 24 2.77e-05 1.71e-03 2.00e-05 32 1.64e-05 1.82 0.57 9.12e-04 2.18 0.69 1.00e-05 40 1.05e-05 2.01 0.65 5.65e-04 2.15 0.69 5.00e-06 52 6.01e-06 2.12 0.80 3.82e-04 1.49 0.56 2.50e-06 58 3.98e-06 3.76 0.59 2.75e-04 3.01 0.47 Table 2.10: Adaptive Result of P 3 case, using alternating flux ε DOF E error RDOF Rε J error RDOF Rε 5.00e-05 22 3.24e-05 1.25e-03 2.00e-05 26 1.76e-05 3.63 0.66 4.17e-04 6.55 1.19 1.00e-05 32 8.81e-06 3.35 1.00 2.33e-04 2.80 0.84 5.00e-06 44 6.58e-06 0.92 0.42 2.23e-04 0.14 0.06 2.50e-06 48 4.28e-06 4.94 0.62 1.32e-04 6.00 0.75 Table 2.11: Adaptive Result of P 3 case, using upwind flux Initially, all fields are zero. The left boundary is injected with an incoming solitary wave, for which the electric field is prescribed as E(x = 0, t) = f (t) cos(Ω0 t), where f (t) = M sech(t − 20). M is a constant to be specified later. As in [1], the boundary condition of H can be approximated from the linearized dispersion relation. Assuming a space-time harmonic variation ei(wt−kx) of all fields, the exact dispersion relation associ- 56 ated with the linear parts of the system implies that s √ ωp2 /ϵ∞ k = ω ϵ∞ 1 − 2 . ω − iω/τ − ω02 The approximate value of H is given by ( 8 ) Z ∞ X (−i)m  1 (m) iωt (m) H(x = 0, t) = H(ω)e b dω ≃ ℜ f (t) exp(iω0 t) . −∞ m=0 m! Z ω=Ω0 where ℜ is real part of complex number, f (m) (t) is the m-th derivative of f (t), and ( Z1 )(m) is the m-th derivative of Z = −ω/k with respect to ω, where k is considered as a function of ω as described above. Besides, we consider the absorbing right boundary derived from the linearized system, which is √ b = 3 E − − √1 H − , e = 3H− − ϵ∞ − E H E , 4 4 ϵ∞ 4 4 for central flux and alternating flux, and √ b = 1 E − − √1 H − , e = 1H− − ϵ∞ − E H E , 2 2 ϵ∞ 2 2 for upwind flux; these settings guarantee the energy stability for semi-discrete schemes as desired and discussed in [1]. We choose maximum mesh level to be N = 11, and the time step ∆t = CF L × ∆x = CF L/2N . The CFL number is chosen to be CF L = 0.3 for both alternating I/II fluxes and upwind flux, to eliminate the error from time stepping and to ensure the convergence to the expected solution. Figure A.1 and A.9 show the adaptive scheme solutions at two different time T = 40 or T = 80, for different fluxes and different degrees of Alpert’s multiwavelet basis of SGDG spaces, while the previous figure is for transient fundamental temporal soliton, and the later for transient second order temporal soliton. We choose refinement error threshold ε = 10−5 , and coarsening error threshold δ = ε/10 = 10−6 . As discussed in [121, 122, 1], a daughter pulse occurs other than the soliton-like pulse, and travels faster than the soliton-like pulse. Among all cases, the daughter pulse has 57 smaller magnitude and higher frequency, comparing to the soliton-like pulse. The daugh- ter pulse shows up in almost all simulations except in case of upwind flux with polyno- mial degree 1, since numerical dissipation of upwind flux is stronger than other flux, damping the magnitude of the pulse significantly. We also plot at time T = 40 and T = 80, the numerical solutions and active elements of adaptive scheme, for different fluxes and polynomial degree, in Figure A.2, A.3, A.4 for transient fundamental case, and in Figure A.10, A.11, A.12 for transient second order case. Among all cases, the active elements of numerical solution sweep from left bound- ary to right boundary, following the position of the soliton-like pulse. For any fixed flux, high order method uses fewer active elements on simulation, when the high order and low order method share the same refinement and coarsening accuracy threshold. It is also clear that fewer active elements are in the hash table for adaptive scheme with up- wind flux, especially at time T = 80 comparing to the other alternating fluxes; again this demonstrates upwind flux is numerically more dissipative than other fluxes, so that it significantly eliminates any oscillations. To demonstrate the function of refinement threshold ε, we also compute the transient fundamental temporal soliton with refinement threshold ε = 10−3 and coarsening thresh- old δ = 10−4 , in Figure A.5, A.6, A.7, and A.8. Spurious oscillations occur on simulations of alternating flux on the left side of soliton-like pulse; besides, oscillation on alternating flux II case has larger magnitude than that of alternating I case. There is no oscillation on upwind flux case, again due to numerical dissipation of upwind flux. Simulations with upwind flux use fewer active elements than alternating fluxes, which is a similar result as in ε = 10−5 case. Comparing to ε = 10−5 case, when polynomial degree and flux are the same, simulations of ε = 10−3 use fewer elements, especially on the right side of soliton-like pulse, and for this reason, daughter pulse in ε = 10−5 case does not show up in the ε = 10−3 case. Larger thresholds of refinement and coarsening allow larger error on numerical solution, so the numerical scheme becomes less sensitive on detecting the 58 daughter pulse of small magnitude. Similar to ε = 10−5 case, support of elements sweep from the left to the right following transport of soliton-like pulse. 2.6 Numerical Simulation in Higher Dimensional Case 2.6.1 Accuracy and Convergence Test of Linear Equation The following equation of unknown u = u(t, x, y), with (t, x, y) ∈ [0, T ] × Ω and Ω = [0, 1] × [0, 1], is considered in this test ut + ux + uy = 0, (2.56) with periodic boundary condition in both x and y direction. The initial condition is 1 u(0, x, y) = u0 (x, y) = , (2.57) 2 + sin(2π(x − y)) and the accurate solution is u(t, x, y) = u0 (x − t, y − t). (2.58) Spatially we use Sparse Grid DG method with upwind flux, and trapezoidal discretiza- tion in time. Since trapezoidal rule is second order in time, we use the following CFL condition: CF L ∆t ≤ (2.59) 1 1 + ∆x(k+1)/2 ∆y (k+1)/2 if spatial discretization is k + 1 order. The CFL number is chosen as the following: k 1 2 3 CFL 0.9 4.9 9.9 Table 2.12: CFL numbers We obtain the following tables of numerical error and order, for end time T = 1. Here 1 N represent the maximum mesh level on both x, y direction, i.e. ∆x = ∆y = N . We also 2 put together the table at T = 0, i.e. the error of projection of initialization for comparison, and similar order of accuracy is observed. 59 N L1 error order L2 error order L∞ error order 5 4.57312e-02 5.30394e-02 1.15945e-01 6 1.51710e-02 1.592 1.98859e-02 1.415 5.50358e-02 1.075 7 5.14680e-03 1.560 7.43171e-03 1.420 2.41551e-02 1.188 8 1.49080e-03 1.788 2.31736e-03 1.681 9.06915e-03 1.413 9 2.73017e-04 2.449 4.83023e-04 2.262 2.87813e-03 1.656 Table 2.13: Errors and orders of accuracy of u, for k = 1, at T = 1 N L1 error order L2 error order L∞ error order 5 7.89276e-03 1.24346e-02 5.85053e-02 6 2.32947e-03 1.761 3.63022e-03 1.776 2.05560e-02 1.509 7 8.50218e-04 1.454 1.49729e-03 1.278 1.17706e-02 0.804 8 2.33762e-04 1.863 4.21631e-04 1.828 3.97204e-03 1.567 9 6.76643e-05 1.789 1.24259e-04 1.763 1.27171e-03 1.643 Table 2.14: Errors and orders of accuracy of u, for k = 1, at T = 0 N L1 error order L2 error order L∞ error order 5 5.07673e-03 6.88257e-03 2.69062e-02 6 4.97155e-04 3.352 7.57930e-04 3.183 5.93592e-03 2.180 7 1.81276e-04 1.456 2.87770e-04 1.397 2.56533e-03 1.210 8 1.89313e-05 3.259 3.28919e-05 3.129 4.44195e-04 2.530 9 2.79344e-06 2.761 5.14717e-06 2.676 7.21485e-05 2.622 Table 2.15: Errors and orders of accuracy of u, for k = 2, at T = 1 N L1 error order L2 error order L∞ error order 5 1.26846e-03 2.26795e-03 1.55856e-02 6 1.53309e-04 3.049 2.74032e-04 3.049 2.37428e-03 2.715 7 4.71975e-05 1.700 9.17983e-05 1.578 1.04093e-03 1.190 8 6.57529e-06 2.844 1.27438e-05 2.849 1.99676e-04 2.382 9 9.84804e-07 2.739 2.00984e-06 2.665 3.13352e-05 2.672 Table 2.16: Errors and orders of accuracy of u, for k = 2, at T = 0 N L1 error order L2 error order L∞ error order 4 4.45318e-03 5.69848e-03 2.10659e-02 5 1.11115e-03 2.003 1.46891e-03 1.956 7.11292e-03 1.566 6 1.10792e-04 3.326 1.47087e-04 3.320 8.31930e-04 3.096 7 8.05073e-06 3.783 1.70609e-05 3.108 1.51009e-04 2.462 8 1.20156e-06 2.744 1.80771e-06 3.238 1.11994e-05 3.753 Table 2.17: Errors and orders of accuracy of u, for k = 3, at T = 1 60 N L1 error order L2 error order L∞ error order 4 1.04154e-03 1.48986e-03 4.86638e-03 5 2.18472e-04 2.253 4.12664e-04 1.852 1.88407e-03 1.369 6 1.81491e-05 3.589 3.27455e-05 3.656 3.26242e-04 2.530 7 2.18507e-06 3.054 5.21414e-06 2.651 7.02091e-05 2.216 8 1.71983e-07 3.667 3.57593e-07 3.866 6.91854e-06 3.343 Table 2.18: Errors and orders of accuracy of u, for k = 3, at T = 0 For the same equation, initial condition, and accurate solution as described above, we also consider the adaptive scheme. In such case we use CFL condition CF L ∆t ≤ , 1 1 + ∆x ∆y with CFL= 0.9. We simulate the case of maximum mesh level N = 7 or N = 8, at final time T = 1, and different refinement parameter ε; coarsen parameter η is set to be ε/10. We also compute convergence rate, Rε with respect to refinement parameter or error threshold, and RDOF with respect to active degree of freedom at final time, with following formula: log(el−1 /el ) RDOF = (2.60) log(DOFl /DOFl−1 ) log(el−1 /el ) Rε = (2.61) log(εl−1 /εl ) where el is L2 error, εl the corresponding refinement parameter ε, and DOFl the active degrees of freedom at final time step. These results are shown in Table 2.19. In the following Figure 2.2, we plot the accurate solution at T = 1, i.e. 1 u(t = 1, x, y) = , 2 + sin(2π(x − y)) and the distribution of active elements on computational domain, by showing the center of active elements, at final time T = 1, for different cases as given in Table 2.19, i.e k = 1 for N = 8 and k = 2, 3 for N = 7, with ε = 10−5 . The figures show that more active elements are needed in the region where the numerical solution has larger magnitude, which is expected for adaptive scheme. 61 ε DOF L2 error RDOF Rε 1e-03 1216 3.472e-03 5e-04 1848 2.671e-03 0.627 0.378 1e-04 3232 9.350e-04 1.878 0.652 k = 1, N = 8 5e-05 5456 3.508e-04 1.872 1.414 1e-05 10976 1.628e-04 1.099 0.477 5e-06 13980 1.286e-04 0.972 0.339 1e-03 900 1.367e-03 5e-04 1080 5.501e-04 4.993 1.313 1e-04 2268 2.874e-04 0.875 0.403 k = 2, N = 7 5e-05 2880 1.808e-04 1.941 0.669 1e-05 5400 4.064e-05 2.374 0.927 5e-06 6930 2.996e-05 1.222 0.440 1e-03 704 5.249e-04 5e-04 1120 4.092e-04 0.536 0.359 1e-04 1856 8.728e-05 3.059 0.960 k = 3, N = 7 5e-05 2112 4.854e-05 4.540 0.846 1e-05 3968 1.858e-05 1.523 0.597 5e-06 4800 6.058e-06 5.888 1.617 Table 2.19: L2 error and order for adaptive scheme, for T = 1 2.6.2 Accuracy and Convergence Test for nonlinear Maxwell Equation Here we consider the nondimensionlized form of system (2.31), i.e. ∂t Hz = ∂y Ex − ∂x Ey (2.62a) ∂t Dx = ∂y Hz (2.62b) ∂t Dy = −∂x Hz (2.62c) D = ϵ∞ E + P + a(1 − θ)|E|2 E + aθQE (2.62d) ∂t P = J (2.62e) ∂t J = −γJ − ω02 P + ωp2 E (2.62f) ∂t Q = σ (2.62g) ∂t σ = −γv σ − ωv2 Q + ωv2 |E|2 (2.62h) on spatial domain Ω = [0, 1]2 and time domain [0, T ], and manufactured solution with pe- riodic boundary condition as specified below, to demonstrate accuracy and convergence 62 Accurate Solution Center of Elements for N=8, k=1 1 1 0.8 0.8 0.6 0.6 y 0.4 0.4 1 0.2 1 0.5 0.5 0 y 0 0 0 0.5 1 x x Center of Elements for N=7, k=2 Center of Elements for N=7, k=3 1 1 0.8 0.8 0.6 0.6 y y 0.4 0.4 0.2 0.2 0 0 0 0.5 1 0 0.5 1 x x Figure 2.2: Accurate solution, and active elements at T = 1 for ε = 10−5 ; red circles are center of active elements. of SGDG method. We use following functions as manufactured solution: Hz (t, x, y) = exp(cos(w(t + αx + βy))/q), (2.63) and P = D = E = (β, −α)Hz , J = ∂t P, Q = Hz , σ = ∂t Hz . (2.64) We also define parameters as ϵ∞ = 1, a = 0.01, θ = 0.125, γ0 = γv = 0.05, ω0 = ωp = ωv = 1, (2.65) 63 so equations (2.62d), (2.62f), and (2.62h) needs to add extra source for balance of equa- tions. However, the PDE part of system (2.62) is satisfied, whenever α2 + β 2 = 1. In our example here, to impose periodic boundary condition, we choose √ 1 2 w = 2π 5, α= √ , β=√ , q = 15. (2.66) 5 5 To match the high order accuracy of spatial and time discretization, we first set  (k+1)/2  1  1 ∆t = CFL ×    1  , ∆x = ∆y = , (2.67) 1  2N 2 + ∆x ∆y where N is the maximum of mesh level, k is degree of Alpert’s multiwavelet functions, and we pick the CFL number as following:  0.3 k = 1      CFL = 1 k=2     2  k=3 We compute the errors as in Table 2.20 at final time T = 0.25, using Alternating Flux I. We can observe reduction of order of convergence from k + 1 of regular DG method to k + 1/2 of SGDG method by 1/2, given the property of SGDG method as discussed in e.g. [4], Theorem 3.4. This verifies the accuracy of the program. 64 N Hz Ex Ey ∞ ∞ L2 L L2 L L2 L∞ 1 P 6 8.08e-04 2.89e-03 1.41e-03 3.97e-03 1.61e-03 5.05e-03 7 2.71e-04 1.58 1.13e-03 1.35 5.19e-04 1.44 2.16e-03 0.88 6.11e-04 1.40 2.42e-03 1.06 8 5.99e-05 2.18 3.64e-04 1.63 2.10e-04 1.30 9.14e-04 1.24 2.33e-04 1.39 1.10e-03 1.14 9 2.16e-05 1.47 1.03e-04 1.82 7.72e-05 1.44 3.55e-04 1.36 8.19e-05 1.51 4.08e-04 1.43 10 6.01e-06 1.85 3.67e-05 1.49 2.78e-05 1.47 1.49e-04 1.25 2.88e-05 1.51 1.63e-04 1.33 P2 N Hz Ex Ey ∞ ∞ L2 L L2 L L2 L∞ 5 5.95e-04 1.27e-03 4.34e-04 1.09e-03 2.56e-04 8.53e-04 6 8.23e-05 2.85 2.16e-04 2.55 7.09e-05 2.61 2.92e-04 1.90 5.13e-05 2.32 2.16e-04 1.98 7 1.07e-05 2.94 2.81e-05 2.94 1.09e-05 2.70 4.50e-05 2.70 8.34e-06 2.62 3.84e-05 2.49 8 1.34e-06 3.00 4.83e-06 2.54 1.69e-06 2.69 7.98e-06 2.50 1.45e-06 2.52 8.08e-06 2.25 9 1.69e-07 2.99 5.10e-07 3.24 2.84e-07 2.57 1.48e-06 2.43 2.59e-07 2.49 1.39e-06 2.54 P3 N Hz Ex Ey ∞ ∞ L2 L L2 L L2 L∞ 5 8.97e-04 1.39e-03 5.98e-04 9.08e-04 3.00e-04 5.14e-04 6 5.67e-05 3.98 9.02e-05 3.94 3.78e-05 3.98 6.23e-05 3.86 1.90e-05 3.98 4.43e-05 3.54 7 3.65e-06 3.96 5.92e-06 3.93 2.44e-06 3.96 4.52e-06 3.79 1.23e-06 3.95 3.68e-06 3.59 8 2.28e-07 4.00 3.98e-07 3.90 1.53e-07 3.99 3.63e-07 3.64 7.79e-08 3.98 3.24e-07 3.50 9 1.43e-08 3.99 2.56e-08 3.96 9.64e-09 3.99 3.38e-08 3.43 5.06e-09 3.94 2.80e-08 3.53 Table 2.20: L2 and L∞ errors of Hz , Ex , Ey for Alternating Flux I case 65 2.6.3 Spatial Optical Soliton Propagation In this subsection, we present the result of the adaptive SGDG schemes to simulate phys- ically relevant problem of spatial optical soliton propagation. We start from the setup of the example in the dimensional form, and show the non-dimensionalized form after- wards based on which the actual simulation is conducted. We first consider the spatial optical soliton propagation in realistic glasses, character- ized by a three-pole Sellmeier linear dispersion, an instantaneous Kerr nonlinearity and a dispersive Raman nonlinearity as discussed in [2, 127]. The non-dimensionalized form of the equations is the following: µ0 ∂t Hz = ∂y Ex − ∂x Ey (2.68a) ∂t Dx = ∂y Hz (2.68b) ∂t Dy = −∂x Hz (2.68c) " 3 # X D = ϵ0 ϵ∞ E + b Ps + a(1 − θ)|E|2 E + aθQE (2.68d) s=1 ∂ t P s = Js , s = 1, 2, 3, (2.68e) 2 2 ∂t Js = −γJs − ω0s Ps + ωps E, s = 1, 2, 3, (2.68f) ∂t Q = σ (2.68g) ∂t σ = −γv σ − ωv2 Q + ωv2 |E|2 (2.68h) with parameters ω01 = 2.7537 × 1016 rad/s, ω02 = 1.6205 × 1016 rad/s, ω03 = 1.9034 × 1014 rad/s, s 2 τ12 + τ22 a = 1.89 × 10−22 m2 /V 2 , γv = , ωv = , τ1 = 12.2f s, τ2 = 32.2f s, τ2 τ12 τ22 and p β1 = 0.69617, β2 = 0.40794, β3 = 0.89748, ωps = βps ω0s , γs = 0, s = 1, 2, 3, ϵ∞ = 1, b = 1, θ = 0.3. 66 The physical domain is [0, 38] for x and [−3, 3] for y, in unit of µm; however, to reduce the numerical oscillations from domain boundary, we set the computational domain to be larger, i.e. [0, 60] × [−4, 4] instead. On the left boundary x = 0, time-dependent hard source is injected to the magnetic field Hz , namely, Hz (x = 0, y, t) = H0 sin(ωc t)sech(y/w), (2.69) where, ωc = 4.35 × 1015 rad/s is the carrier frequency, and w = 667 nm, H0 = 4.77 × 107 A/m are the width and the magnitude of the incident wave, respectively. To better understand expected behaviour of our numerical tests, recall that in uniform glasses, the Maxwell’s equations (2.68) with the nonlinear constitutive laws reduce to the nonlinear Schrödinger equation (NLSE) under paraxial assumption [2]. The pulse pro- vided by the hard source above has predictable propagation, as indicated by the accurate solution of NLSE. In fact, the normalized NLSE of u = u(ζ, ξ), i.e. ∂u 1 ∂ 2u i = + |u|2 u, ζ ∈ (0, +∞), ξ ∈ (−∞, +∞), (2.70) ∂ζ 2 ∂ 2ξ with initial condition u(0, ξ) = g(ξ), ξ ∈ (−∞, +∞) admits bright soliton solutions. Specifically, regarding our numerical tests, we considered g(ξ) = g1 (ξ) := ηsech (η(ξ − ξ0 )) exp(−iΛξ − iϕ) as initial condition, and the solution is given by Λ2 − η 2 u1 (ζ, ξ) = ηsech (η(ξ − ξ0 − Λζ)) exp(−iΛξ − iϕ + i · ), 2 which is called fundamental soliton; η, Λ, ϕ, ξ0 are parameters. When Λ = 0, |u1 (ζ, ξ)| remains constant for all ζ, and one can see that the fundamental soliton propagates in the dispersive and weakly nonlinear medium without changing its amplitude, width or shape. To compare with following case, consider ξ0 = 0 and η = 1. Instead, when g(ξ) = g2 (ξ) := 2sech(ξ), 67 the solution becomes cosh(3ξ) + 3 exp(−4iζ) cosh(ξ) u2 (ζ, ξ) := 4 exp(−iζ/2) . cosh(4ξ) + 4 cosh(2ξ) + 3 cos(4ζ) This solution is called second-order soliton, as the result of the interactions between two fundamental solitons. We present the magnitude of u1 and u2 in the following Figure 2.3. 3 1 3 0.9 3.5 2 2 0.8 3 1 0.7 1 2.5 0.6 0 0 2 0.5 1.5 -1 0.4 -1 0.3 1 -2 -2 0.2 0.5 -3 0.1 -3 0 0.5 1 1.5 0 0.5 1 1.5 Figure 2.3: Magnitude |u1 (ζ, ξ)| of Fundamental Soliton and |u2 (ζ, ξ)| of Second Order Soliton, of Solutions of NLSE (2.70). Before we simulate the fundamental and second order solitons, we need a final step to non-dimensionalize the Maxwell equation (2.68). Following the procedure desribed in [1], we choose reference space scale x0 = 8µm, reference time scale t0 = x0 /c = 2.6685×10−14 s with c the light speed, and reference value of Hz is H0 = 4.77 × 107 A/m. If reference value of electric field is E0 , then we have rescaled fields and constants defined as follows: p (H/E0 ) µ0 /ϵ0 → H, D/(ϵ0 E0 ) → D, P/E0 → P, (J/E0 )t0 → J, E/E0 → E, Q/E02 → Q, (σ/E02 )t0 → σ, aE02 → a, ω∗ t0 → ω∗ , ∗ = 0s, ps, v, s = 1, 2, 3, γv t0 → γv . Note that here we use same notations for scaled and original variables. From the scaling p of H = Hz , we have H0 = E0 ϵ0 /µ0 = E0 ϵ0 c, so E0 = 1.797 × 1010 N/(A · s). 68 Spatial computational domain is [0, 60] × [−4, 4] in unit of µm, or [xa , xb ] × [ya , yb ] = [0, 7.5] × [0, 1], after nondimensionalization on both x, y direction and shifting on y direc- tion by 0.5, on x-y plane, and computational domain is large enough in both x and y di- rection to avoid non-physical oscillation. Initially, all the fields are set to be zero. To make the best use of boundary condition provided as Hz (x = 0, y, t) in (2.69), we consider the adaptive SGDG schemes with alternating numerical fluxes (2.35). Note that in these two cases, we do not need information of E(x = 0, y, t), since the flux regarding E on x = 0 along x direction is chosen to be right limit on x = 0, whose value is evaluated within the computational domain. We refer the interested readers to [1] for numerical bound- ary treatments suitable for other alternating fluxes. For simplicity we will only present the simulation results by the Alternating I numerical flux. And except for the injected boundary on x = 0, the other three edges of boundary are absorbing boundaries based on linearized system on characteristic direction, after neglecting the nonlinear effects of Kerr type and Raman scattering and the linear delayed response. The dimensionless form of these numerical flux follows as √ √ 1. at right boundary x = xb of x direction, (Hz + ϵ∞ Ey )+ = (Hz + ϵ∞ Ey )− and √ (Hz − ϵ∞ Ey )+ = 0; √ √ 2. at left boundary y = ya of y direction, (Hz + ϵ∞ Ex )+ = (Hz + ϵ∞ Ex )− and (Hz − √ ϵ∞ Ex )− = 0; √ √ 3. at left boundary y = yb of y direction, (Hz − ϵ∞ Ex )+ = (Hz − ϵ∞ Ex )− and (Hz + √ ϵ∞ Ex )+ = 0. In our simulation, we use P 2 Alpert’s multiwavlet as basis of numerical solution, cou- pling with P 3 interpolatory multiwavelet to deal with nonlinearity. The interpolatory multiwavelet basis is chosen to be one degree higher than Alpert’s, in order to eliminate numerical error of interpolation. Maximum mesh level is 11, i.e. the most refined ∆x 69 and ∆y might be 1/211 . Error thresholds for refinement and coarsening are ε = 10−3 and η = 10−4 , respectively. The relative tolerance of nonlinear solver of PETSc is set to be 10−9 . We consider first the case of fundamental soliton, where the non-dimensionalized form of initial condition is Hz (x = 0, y, t) = sin(ωc t)sech(y/w), with rescaled ωc = 116.08 and w = 0.083375. We plot the solutions at different time, and the centers of corresponding active elements for adaptive scheme in Figure 2.4. Note that in this case, the soliton during the propagation maintains magnitude and width as in first picture of Figure 2.3, while the error threshold allows error of specific magnitude to enter into the soliton solution. The distribution of active elements also provide important information of the soliton: more active elements accumulate around y = 0.5, where the magnitude of the magnetic field reaches its maximum, and fewer active elements support on region away from y = 0.5, since the soliton solution almost vanishes when y is close to its boundary. Along x direction, active elements and their support move along x direction as the soliton propagates. The adaptive scheme generates active elements that properly follow the propagation of the solution. Next, we present several simulation results of non-dimensionalized initial condition Hz (x = 0, y, t) = 2 sin(ωc t)sech(y/w) with same ωc and w as in fundamental soliton case; see Figure 2.5. We focus on the region when x is smaller than 1/3. This solution has changing magnitude and width, which is different from fundamental soliton. As predicted in NLSE (2.70) and Figure 2.3, this is the case when two fundamental solitons produce the second order soliton, with focusing and defocusing effect. Again the support of active elements match the region with larger |Hz | magnitude, while at the region close to boundary of y direction or of large x, the solution almost vanishes and fewer elements are needed for good resolution of the solution. 70 1 1 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 y 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 x 1 1 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 y 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 x 1 1 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 y 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 x Figure 2.4: Comparison for fundamental soliton between absolute value |Hz | of magnetic field (left column) and center of active elements (right column) at different time. First row: T = 2; Second row: T = 4; Third row: T = 6. 71 1 3.5 0.9 3 0.8 0.7 2.5 0.6 2 y 0.5 1.5 0.4 0.3 1 0.2 0.5 0.1 0 0 0.05 0.1 0.15 0.2 0.25 0.3 x 1 3.5 0.9 3 0.8 0.7 2.5 0.6 2 y 0.5 0.4 1.5 0.3 1 0.2 0.5 0.1 0 0 0.05 0.1 0.15 0.2 0.25 0.3 x 1 4 0.9 3.5 0.8 3 0.7 2.5 0.6 y 0.5 2 0.4 1.5 0.3 1 0.2 0.5 0.1 0 0 0.05 0.1 0.15 0.2 0.25 0.3 x Figure 2.5: Comparison for second order soliton between absolute value |Hz | of magnetic field (left column) and center of active elements (right column) at different time. First row: T = 1.976; Second row: T = 3.024; Third row: T = 3.459. 72 CHAPTER 3 APPLICATION OF WENO METHOD IN MATHEMATICAL MODEL OF ASYNCHRONOUS DATA FLOW IN PARALLEL COMPUTERS 3.1 Models of Extreme Scale Computers It has been well-established that current and future generations of extreme scale com- puters have achieved and, for the foreseeable future, are expected to achieve increases in performance via greater levels of parallelism at multiple levels — e.g., within the proces- sors as well as increasing the number of processors and nodes —as opposed to increases in clock speeds, which are expected to remain relatively flat. Additionally, extremely con- current codes, involving dynamic parallelism and greater degrees of asynchronous paral- lel executions, are increasingly needed to leverage this large scale parallellism [128, 129]. As machine improvements depend on increasingly complex architectures and as ad- ditional constraints on system development and planning (such as power consumption [129]) arise, a need for predictive, quantitative models of computational performance will grow greater. Previously developed modeling tools such as LogP [130] result in easily evaluated models which can prove difficult to extend and modify. Alternatively, PRAM models have been used as abstractions of codes; these however have scalability issues due to the complexity of simulating them [131]. Other modern tools [132, 133] are sim- ilarly still limited to fine-grain simulations of at most a few dozen nodes, again due to their computational complexity during simulations. Core counts are now in the hundreds of thousands and millions on machines in the TOP500 list of supercomputers; node counts consistently are in the thousands. Such num- bers mean that fine-grained simulation tools (such as those listed above) are incapable of describing large-scale phenomena. Alternative approaches have been proposed to ad- dress these issues: miniapp codes can mimic key features of the performance of exascale 73 codes with a much smaller codebase [134]. Aspen, a framework for performance model- ing [135, 136], uses a domain specific language which encodes both abstracted features of machines hardware and specific software applications to provide coarse-grained simula- tions. However, these suffer from the need to develop specialized simulation codes which can be problem dependent, resulting in possibly labor-intensive tools. A workflow mod- eling apporach, Pegasus, has been developed to model workflows using a graph-theoretic perspective to detect and manage anamolies in the computing environment [137]. We propose developing a macroscopic model of extreme scale computers which views such computing environments in a continuum framework. Such a model has several po- tential benefits: in addition to being computationally tractable, it will open up the pos- sibility of using the theoretical tools of partial differential equations to understand and control the performance of high-performance computing systems. Specifically, our goal is to derive a fluid-limit model of data flow — which can be described by a partial differ- ential equation — from a simplified deterministic model of data processing and flow in an extreme scale computer with interprocessor communications and asynchronous exe- cutions. Fluid models, beyond their obvious utility in physical systems, have been used to model flows in networks, such as vehicular traffic flows [138], supply chains [139], and gas networks [140, 141]. In particular, as discussed in [142] and [139], such fluid models lie at the end of a hierarchy of models which begin with microscopic or discrete mod- els. That is, similar to the derivation of physical fluid laws from many body physics, one may derive continuum-level flow equations from discrete-level models of the dynamics of agent interactions. With such a model, standard numerical simulation tools and analyt- ical methods may be brought to bear for studying large-scale phenomena in extreme-scale computing. We begin in section 3.2 with a microscopic model of a network of processors perform- ing a multi-stage computational task which necessitates inter-processor communications. In section 3.3, we derive a formal asymptotic limit of this agent-based model as the scale 74 of the system increases, resulting in an Eulerian fluid flow model. Along with the re- sulting nonlinear conservation law, we present a related Hamilton-Jacobi equation and establish the existence of solutions [103, 104] in section 3.3. In section 3.4, we illustrate the numerical methods we use to simulate discrete and continuum model, especially WENO interpolation for solving Hamilton-Jacobi equation [12, 11, 107, 99]. Finally in section 3.5, we present the results of numerical experiments, to show agreement between the micro- scopic and fluid models and then illustrate the behavior of solutions under heterogeneous computing layouts. This chapter is based on published paper [10]. 3.2 The Discrete Model In this section, we introduce the microscopic model, which is based on a highly simplified, deterministic, semi-discrete ordinary differential equation (ODE). Imagine the computer max as a network of processors {Pi }ii=1 that are arranged in a one-dimensional, periodic lat- tice. The computer is assigned a computational job involving a sequence of k max tasks which are identical in the sense that each one takes the same effort to complete. This computational job is divided by distributing data within the many processors. Denote by qi,k (t) the amount of data in Pi that sits in stage k at time t. The dynamics of qi,k are given by a conservation law of the form q̇i,k (t) = Fi,k−1 (t) − Fi,k (t), k = 1, . . . , k max , i = 1, . . . imax , (3.1) where Fi,k (i = 1, . . . , imax , k = 1, . . . , k max −1) is the rate of data moving in processor i from stage k to k + 1, referred to as the throughput. At the first stage k = 1, Fi,0 (i = 1, . . . , imax ) is the rate of data being loaded into processor i to be processed, referred to as the inflow, and at the final stage, Fi,kmax (i = 1, . . . , imax ) is the rate of data completing the final stage of the job, referred to as the outflow. Equation (3.1) implies that the data in each processor is neither created or destroyed, only moved in and out of the processor or in between 75 stages; that is, kXmax ! d qi,k = Fi,0 − Fi,kmax . (3.2) dt k=1 Another fundamental quantity of interest in the discrete model is Qi,k (t), which is defined as the amount of data at time t that has gone through the first k − 1 stages of Pi . For each t ≥ 0, max  kX  Z t Qi,k (t) = qi,j (t) + Fi,kmax (s)ds. (3.3) j=k 0 To determine the form of Fi,k , consider the case of with or without throttling. Without throttling, assume data can move in one processor Pi between tasks in the rate of a given maximum throughput ai ≥ 0, which is independent of the task number k. If we consider throttling, Fi,k (t) does not reach the maximum ai for two reasons considered here: self- throttling and neighbour-throttling. 1. Self-throttling: Given an amount of data qi,k to be processed at stage k in processor .. .. .. . . . qi−1,k+1 qi,k+1 qi+1,k+1 k+1 qi−1,k qi,k qi+1,k k qi−1,k−1 qi,k−1 qi+1,k−1 k−1 .. .. .. . . . Pi−1 Pi Pi+1 Figure 3.1: Schematic of network of processors. Dashed lines denote inter-processor com- munications 76 Pi , we define the self-throttling function    qi,k v1 (qi,k ; q∗ ) = max 0, min 1, , (3.4) q∗ in the form of roof-line model. If no data is available to be processed, then Fi,k = 0; if the amount of data to be processed drops below a certain threshold q∗ > 0, then Pi cannot maintain the throughput ai and the throughput is reduced. 2. Neighbor throttling: As the computational task is not entirely parallel across pro- cessors, Pi requires sufficient information from its neighbors to perform task k at full throughput. The neighbor throttling function v2 models this dependence. It gives the amount of available data on Pi at stage k   1 1 v2 (qi,k , ∆i+1,k , ∆i−1,k ; β) = min qi,k , max{∆i+1,k , 0}, max{∆i−1,k , 0} . (3.5) β β Here ∆i±1,k denotes the data on the right/left neighbor which is available to be used by Pi to process qi,k . The parameter β ∈ (0, 1] allows for the possibility that compu- tations do not rely in a one-to-one fashion upon the availability of data from neigh- bors. If ∆i±1,k = 0 the processing of data stops due to the absence of a necessary component of the computational task and so Fi,k = 0. Alternatively, if both ∆i+1,k and ∆i−1,k exceed βqi,k , then Pi has sufficient data from its neighbors to process qi,k and no throttling occurs. The data from the left/right neighbor which is available for processing at stage k is given by ∆i±1,k = Qi±1,k − Qi,k+1 = Qi±1,k − (Qi,k − qi,k ). (3.6) The data on each neighbor must have completed the same stage for it to be available; additionally, this data is not reused on Pi for the same stage. This means that the data available to be used from the neighbors can be written as above and so the amount of data available to be processed on Pi at stage k is given by  v2 qi,k , Qi+1,k − Qi,k + qi,k , Qi−1,k − Qi,k + qi,k ; β . (3.7) 77 The throughput Fi,k is a composition of the throttling functions v1 and v2 :    Fi,k = ai v1 v2 qi,k , Qi+1,k − Qi,k + qi,k , Qi−1,k − Qi,k + qi,k ; β ; q∗ . (3.8) At first glance, this definition of Fi,k appears circular since it depends on Qi,k , which in turn depends on Fi,kmax . However, as a consequence of the conservation law (3.2), kXmax kXmax Z t Z t Fi,kmax (s)ds = Fi,0 (s)ds + qi,j (0) − qi,j (t) (3.9) 0 0 j=1 j=1 Thus to complete the model, we need only prescribe initial data qi,k (0) and the inflow Fi,0 . To prescribe the inflow, we specify qi,0 and then let Fi,0 be evaluated according to (3.8). Theorem 3.2.1 ([10], Proposition 2.1) The system (3.1) with (i) throughput Fi,k defined in (3.8) for i = 1, . . . , imax and k = 1, . . . , k max ; (ii) prescribed initial data qi,k (0) for i = 1, . . . , imax and k = 1, . . . , k max ; and (iii) prescribed inflow data Fi,0 for i = 1, . . . , imax and t ≥ 0 has a unique solution for all t ≥ 0. Moreover, if qi,k (0) ≥ 0 for all i = 1, . . . , imax and k = 1, . . . , k max , then qi,k (t) ≥ 0 for all t ≥ 0 and i = 1, . . . , imax and k = 1, . . . , k max . 3.3 The Continuum Model In this section, we consider a formally accurate continuum model in the limit as number of processors imax and number of tasks k max tend to infinity. So far to our knowledge, there is no immediate conclusion of existence and uniqueness for such model, so a Hamilton- Jacobi equation is demonstrated at the end of this section. The extensive theory of vis- cosity solutions of Hamilton-Jacobi equation leads to promising result of existence and uniqueness of the model. Higher dimension model can be derived in the similar manner; for details, see [10] Section 3.3. For given imax , k max , we define the quantities: ε k max δ := (k max )−1 , ε := (imax )−1 , η := = max . (3.10) δ i 78 Here, δ is the fraction of the work done in each stage and ε is the average amount of data in a processor. In following paragraphs, we provide a brief discussion of derivation of the continuum model, by imax and k max tending to infinity. We assume, in taking this limit, that the job performed by the computer is fixed – that is, the total amount of work does not change. To derive such a continuum model, we first express the ODE (3.1) in terms of the following O(1) quantities: q∗ qi,k 1 ± Ri±1,k − Ri,k ai r∗ := , ri,k := , Ri,k := Qi,k , Di,k := ± , αi := . (3.11) εδ εδ ε ε ε Define the rescaled throttling functions    r w1 (r, r∗ ) = max 0, min 1, . (3.12a) r∗   − +  1 + 1 − w2 r, D , D ; η, β = min r, max{ηD + r, 0}, max{ηD + r, 0} (3.12b) β β and the composite function w r, D− , D+ ; r∗ , α, η, β := αw1 w2 (r, D− , D+ ; η, β); r∗ .   (3.13) The dynamics in (3.8) can now be re-expressed in terms of the O(1) quantities in (3.11), thereby obtaining an evolution formula for ri,k : fi,k−1 (t) − fi,k (t) ṙi,k (t) = (3.14) δ − +  fi,k (t) = w ri,k (t), −Di,k (t), Di,k (t); r∗ , αi , η, β (3.15) for i = 1, . . . , imax , k = 0, . . . , k max , and ri,0 is prescribed for i = 1, . . . , imax . The next step is to interpret (3.14) as a conservative finite-difference formula for a sufficiently smooth function ρ = ρ(x, y, t), defined on [0, 1) × [0, 1] × [0, ∞), such that ρ(xi , zk , t) = ri,k (t), (3.16) on grid points xi = (i − 0.5) ε and zk = kδ, (3.17) 79 for i = 1, . . . , imax and k = 0, . . . , k max . We also let α = α(x) be a continuous function such that α(xi ) = αi . Let the function ϕ = ϕ(x, z, t) interpolate the fluxes on the same grid points: ϕ(xi , zk , t) = fi,k (t), (3.18) for i = 1, . . . , imax , k = 1, . . . , k max , t ≥ 0, and Z 1 Z t P (x, z, t) = ρ(x, ξ, t)dξ + ϕ(x, 1, s)ds, (3.19) z 0 and we find that both Φ(1) (ρ(xi , zk , t), ∂x P (xi , zk , t), ∂x2 P (xi , zk , t); r∗ , a, η, β) (3.20) and Φ(0) (ρ(xi , zk , t), ∂x P (xi , zk , t); r∗ , a, η, β) (3.21) approximate − + w(ri,k (t), Di,k (t), Di,k (t); r∗ , αi , η, β), (3.22) when 0 ≪ ε, δ ≪ 1, where Φ(0) (ρ, ∂x P ; r∗ , α, η, β) = w (ρ, −∂x P, ∂x P ; r∗ , α, η, β) (3.23a)  ε ε  Φ(1) ρ, ∂x P, ∂x2 P ; r∗ , α, η, β = w ρ, −∂x P + ∂x2 P, ∂x P + ∂x2 P ; r∗ , α, η, β .  (3.23b) 2 2 Thus for 0 ≪ ε, δ ≪ 1, with η ∈ (0, ∞) fixed, the weak form of (3.14) is formally consistent with the continuum model ∂t ρ + ∂z Φ(ℓ) (ρ, ∂x P, ∂x2 P ; r∗ , a, η, β) = 0, (x, z, t) ∈ T1 × (0, 1) × (0, ∞), (3.24a) ρ(x, 0, t) = ρbc (x, t), (x, t) ∈ T1 × (0, ∞), (3.24b) ρ(x, z, 0) = ρ0 (x, z), (x, z) ∈ T1 × (0, 1) (3.24c) where Z 1 Z t P (x, z, t) = ρ(x, ξ, t)dξ + ϕ(ℓ) (x, 1, s)ds, (3.25a) z 0 ϕ(ℓ) (x, z, t) = Φ(ℓ) ρ(x, z, t), ∂x P (x, z, t), ∂x2 P (x, z, t); r∗ , α, η, β ,  (3.25b) 80 and Φ(ℓ) , ℓ ∈ {0, 1}, is given in (3.23). For the sake of compactness, we have slightly abused notation in (3.24a), as the definition of Φ(0) is independent of ∂x2 P . Additionally, we have identified [0, 1) with the one-dimensional torus T1 in order to reflect the periodic layout of the processors. As in the discrete case, it may appear that the model in (3.24) is circular due to the definition of P in (3.25a). However, as with F in (3.9), Φ(ℓ) can be unwrapped, this time using the conservation law (3.24a); that is Z t Z t Z 1 Z 1 (ℓ) ϕ (x, 1, s)ds = ϕ(x, 0, s)ds + ρ0 (x, ξ)dξ − ρ(x, ξ, t)dξ (3.26) 0 0 0 0 Thus the continuum model is complete once initial condition ρ0 and inflow condition ϕbc := ϕ(·, 0, ·) are specified. In practice, ρbc is prescribed and then ϕbc is evaluated using (3.25b) and (3.23). Furthermore, integrating (3.24a) with respect to z gives Z 1 ∂t ρ(x, ξ, t)dξ + Φ(ℓ) (x, 1, t) − Φ(ℓ) (x, z, t) = 0 (3.27) z Meanwhile, differentiating (3.25a) gives Z 1 ∂t P (x, z, t) = ∂t ρ(x, ξ, t)dξ + Φ(ℓ) (x, 1, t) (3.28) z Combining (3.27) and (3.28) and using the fact that ρ = −∂z P gives a closed Hamilton- Jacobi equation for P with initial and boundary conditions that are derived by applying the definition of P in (3.25a) to (3.24c) and (3.24b), respectively. The complete model is, for some T > 0, ∂t P − Φ(ℓ) (−∂z P, ∂x P, ∂x2 P ; r∗ , α, η, β) = 0, (x, z, t) ∈ T1 × (0, 1) × (0, T ), (3.29a) Z 1 Z t P (x, 0, t) − ρ0 (x, ξ)dξ − ϕbc (x, s)ds = 0, (x, t) ∈ T1 × (0, T ), (3.29b) 0 0 Z 1 P (x, z, 0) − ρ0 (x, ξ)dξ = 0, (x, z) ∈ T1 × (0, 1), (3.29c) z where (3.29b) is derived by integrating (3.24b) over z ∈ (0, 1) and applying (3.26). The existence and uniqueness of viscosity solution follows: 81 −ηD = r − βr∗ r ηD = r − βr∗ −ηD = r ηD = r Ω1 Ω1 Ω3 Ω4 r = r∗ Ω3 Ω2 Ω2 Ω4 −ηD ηD 1−β =r 1−β =r D Figure 3.2: Flux Φ(0) defined in (3.30) Theorem 3.3.1 ([10], Theorem 3.1) Assume that α and ρ0 are (i) non-negative, (ii) uniformly Lipschitz in their arguments, and (iii) periodic in x (that is, α(0) = α(1) and ρbc (0, t) = ρbc (1, t) RT for all t ∈ [0, T ]). Further, assume that there is an M where 0 ϕbc (x, s)ds ≤ M for all x ∈ T1 . Then there exists a unique, continuous, viscosity solution (in the sense of [103]) to (3.29). We use the flux function Φ(0) for all of the numerical simulations in Section 3.5. This function is a piecewise constant that can be expressed in the following form:      α (r, D) ∈ Ω1 ,    αr (r, D) ∈ Ω2 ,   r  ∗ Φ(0) (r, D; r∗ , α, β) = α(r + ηD) (3.30)   (r, D) ∈ Ω3 , βr∗     α(r − ηD)   (r, D) ∈ Ω4 ,    βr∗ where the subdomains Ωi are depicted in Figure 3.2. 3.4 Numerical Methods for Simulations In this section, we perform numerical simulations of the one dimensional processor sys- tem in order to (i) test the ability of the macroscopic model to approximate the discrete model when ε and δ are small and (ii) explore how model parameters affect the model out- put. Problem data is specified in terms of continuum models quantities. These quantities are translated back to discrete model quantities in order to implement ODE simulations. 82 3.4.1 ODE Implementation The explicit two-step Adams-Bashforth (Section III of [143]) is used to simulate the dis- crete model formed by (3.1), (3.3), and (3.8). Given η, values imax and k max are chosen so that k max /imax = η as in (3.10). We then compute a solution to the discrete model as follows. Using (3.11) and (3.16), we convert r∗ , a, ρ0 , ρbc to their discrete counterparts: q∗ = εδr∗ , qi,k (0) = εδρ0 (xi , zk ), qi,0 (t) = ρbc (xi , t), ai = εα(xi ). (3.31) This discrete model data is used to set the time step: q∗ ∆t = √ . (3.32) 2 maxi ai imax k max The outflow at Fi,kmax is tracked and accumulated over time in order to compute Qi,k from (3.3). At the final time T , the result of the explicit time stepping is converted back, via the formula in (3.11), i.e., ri,k (T ) = (εδ)−1 qi,k (T ). In order to compare this against solutions to the continuum model (see below), we use these point-wise values to generate a piecewise constant function r over the cells Ci,k = (xi − .5ε, xi + .5ε) × (zk , zk + δ): X r(x, z) = ri,k χCi,k (x, z). (3.33) i,k 3.4.2 Hamilton-Jacobi Implementation The Hamilton Jacobi equation (3.29) is solved numerically using a fifth-order WENO in- terpolation in x and z and the optimal third-order SSP Runge-Kutta method for time integration. Details of these algorithms can be found in Sections 3.2 and 6, respectively, of [11]. Once a numerical solution for P is computed, we again use WENO interpolation to approximate ρ via the relation ρ(x, z, t) = −∂z P (x, z, t). To condense the notation, let σ = ∂x P , τ = ∂z P and υ = ∂xx P . Then for fixed r∗ , α, η, β, and ℓ, let H(σ, τ, υ) = −Φ(ℓ) (−τ, σ, υ; r∗ , a, η, β). The numerical solution for P is computed 83 on a grid {xn , zm } where xn = n∆x, n = 1, 2, . . . , N, ∆x = N −1 , (3.34) zm = m∆z, m = 1, 2, . . . , M, ∆z = M −1 . (3.35) The semi-discrete method for the grid function Pn,m (t) ≈ P (xn , zm , t) is d − + − + Pn,m (t) = −Ĥ(σn,m , σn,m , τn,m , τn,m ; υn,m ), (3.36) dt ± ± where the numerical approximations σn,m ± ≈ σ(x± n , zm ) and τn,m ≈ τ (xn , zm ) are obtained via WENO interpolation and υn,m ≈ υ(xn , zm ) is computed by central difference. The numerical flux function Ĥ, based on the global Lax-Friedrichs flux: σ− + σ+ τ − + τ +   − − 1 1 + + Ĥ(σ , σ , τ , τ ; υ) = H , , υ − λx (σ + − σ − ) − λz (τ + − τ − ), (3.37) 2 2 2 2 where αη α λx = max |Hσ | = , λz = max |Hτ | = . (3.38) σ,τ βr∗ σ,τ βr∗ The time step for the SSP integrator is given by λx λz   ∆t + ≤ 0.6. (3.39) ∆x ∆z 3.4.3 WENO Interpolation Here we demonstrate how to compute σ ± and τ ± using fifth order WENO interpolation ± [11]. For simplicity, we only consider σi,j for fixed i, j, which approximate ∂x P (x, zj ) from − the left or right limit of x = xi , and we consider σi,j first. For fixed i, j, each of stencils X0 = {xi−3 , xi−2 , xi−1 , xi }, X1 = {xi−2 , xi−1 , xi , xi+1 }, X2 = {xi−1 , xi , xi+1 , xi+2 }, and function values of P (·, zj ) evaluated on the stencil, we can find a Lagrangian interpo- lating polynomial Qk of degree 3, k = 0, 1, 2, i.e. Qk (xi−3+k+r ) = P (xi−3+k+r , zj ), r = 0, 1, 2, 3. 84 Then −,k d σi,j := Qk (x) dx x=xi −,k approximate ∂x P (x− i , zj ); note that σi,j , k = 0, 1, 2 are the three possible interpolations of third order ENO procedure. Let ∆+x Pi,j := Pi+1,j − Pi,j = P (xi+1 , zj ) − P (xi , zj ) denote the standard forward difference operator; here we omit variable t of function P for convenience. Direct calculations imply −,0 1 ∆+x Pi−3,j 7 ∆+ + x Pi−2,j ∆x Pi−1,j σi,j = − (3.40) 3 ∆x 6 ∆x ∆x −,1 −1 ∆+ P x i−2,j 5 ∆ + P x i−1,j 1 ∆+ x Pi,j σi,j = + + (3.41) 6 ∆x 6 ∆x 3 ∆x + + + −,2 1 ∆x Pi−1,j 5 ∆x Pi,j 1 ∆x Pi+1,j σi,j = + − . (3.42) 3 ∆x 6 ∆x 6 ∆x −,k WENO procedure uses a convex combination of σi,j , k = 0, 1, 2 as final approximation − of σi,j , i.e. − −,0 −,1 −,2 σi,j = w0 σi,j + w1 σi,j + w2 σi,j , (3.43) where w0 , w1 , w2 ≥ 0 are nonlinear weights whose sum is 1. The key ingredient of WENO interpolation is to choose these nonlinear weights in the following way: 1. In smooth regions, w0 , w1 , w2 should be very close to the optimal linear weights: w0 = 0.1 + O(∆x2 ), w1 = 0.6 + O(∆x2 ), w2 = 0.3 + O(∆x2 ), − which makes σi,j defined by (3.43) fifth order accurate on stencil {xi−3 , xi−2 , . . . , xi+2 } in approximating ∂x P (xi , zj ) in smooth regions of P (·, zj ); 2. When stencil Xk contains discontinuity in the x derivative of P , the corresponding weight wk should be close to zero, so that stencil on non-smooth region has little − contribution to σi,j . The choice of weight in [107] satisfies wk = O(∆x4 ) in this case. 85 To find such nonlinear weight, a smoothness indicator is introduced to measure how smooth the function being interpolated is inside the interpolation stencil. As illustrated in [107], the smoothness indicator is a scaled sum of the squares of the L2 norms of the second and higher derivatives of the interpolation polynomial on the target interval, i.e. IS0 = 13(a − b)2 + 3(a − 3b)2 , (3.44) IS1 = 13(b − c)2 + 3(b + c)2 , (3.45) IS2 = 13(c − d)2 + 3(3c − d)2 , (3.46) where ∆2x Pi−2,j ∆2x Pi−1,j ∆2x Pi,j ∆2x Pi+1,j a= , b= , c= , d= , (3.47) ∆x2 ∆x2 ∆x2 ∆x2 and ∆2x represents the central difference, i.e. ∆2x Pi,j = Pi+1,j − 2Pi,j + Pi−1,j . Then the nonlinear weights are w ek wk = , k = 0, 1, 2, w e0 + we1 + w e2 where we usually choose ε = 10−6 , and 1 6 3 w e0 = , w e1 = , we2 = . (ε + IS0 )2 (ε + IS1 )2 (ε + IS2 )2 We obtain the fifth order WENO approximation as ∆+ ∆+ ∆+ ∆+   − x Pi−2,j x Pi−1,j x Pi,j x Pi+1,j σi,j = 12 − +7 +7 − − ΦW EN O (a, b, c, d), ∆x ∆x ∆x ∆x with   W EN O 1 1 1 Φ (a, b, c, d) := w0 (a − 2b + c) + w2 − (b − 2c + d), 3 6 2 and symmetrically, the approximation to the right derivative is ∆+ ∆+ ∆+ ∆+   + x Pi−2,j x Pi−1,j x Pi,j x Pi+1,j σi,j = 12 − +7 +7 − − ΦW EN O (e, d, c, b), ∆x ∆x ∆x ∆x with a, b, c, d in (3.47) and ∆2x Pi+2,j e= . ∆x2 86 3.5 Numerical Experiments We perform a sequence of exploratory experiments below, modifying the parameters η and β, as well as the throughput function α. In all cases, α, ρ0 , and ρbc are periodic with respect to x and the parameter r∗ = 1. Results are presented as two-dimensional color maps or line-outs in the z direction. In all figures, the horizontal axis corresponds to the z-axis. Profiles of α for each experiment are depicted in fig. 3.3. 1 1 1 1 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 x x x x 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0 0 0 0 0.6 0.7 0.8 0.9 1 1.1 0.6 0.7 0.8 0.9 1 1.1 0.6 0.7 0.8 0.9 1 1.1 0.6 0.7 0.8 0.9 1 1.1 α α α α (a) Example 1 (b) Examples 2-3 (c) Example 4 (d) Example 5 Figure 3.3: Profiles of the processor speed α used in the numerical experiments. The non- standard orientation of the graphs is set to match the axes in the numerical results that follow. Example 3.5.1 (Agreement between models) The purpose of this example is to demonstrate that the macroscopic model approximates the microscopic model when ε and δ are sufficiently small. We set β = 1 and consider η ∈ {0.2, 1, 5}. The initial condition, boundary condition, and processor speed are given by ρ0 (x, z) = 1.5 (sin(2πz))6 χ[0,0.5] (z), ρbc (x, t) = 0, α(x) = 1 − 0.4(sin(πx))2 , (3.48) respectively. Both models are simulated up to a final time t = 0.5. For this example, the Hamilton-Jacobi simulation is performed with a 1000 × 1000 mesh and a time step chosen according to (3.39) in order to generate a highly resolved numerical solution of the macroscopic model. For the microscopic model, we use imax = 1000 and k max = 200 when η = 0.2, imax = k max = 500 when η = 1, and imax = 200 and k max = 1000 when η = 5. These solutions to the microscopic model are then used to obtain the piecewise-constant function r on the 1000 × 1000 mesh from the Hamilton-Jacobi simulation. 87 Numerical results for η = 0.2, η = 1.0, and η = 5.0 are shown in fig. B.1, fig. B.2, and fig. B.3, respectively. While the results demonstrate general qualitative agreement between the models, discrepancies develop over time, especially for smaller values of η; see figs. B.1i and B.1l. For the worst case scenario (η = 0.2),we note that the k max is relatively small, suggesting that the asymptotic analysis is less relevant. Indeed, in this case the microscopic model displays a diffusive behavior similar to that encountered with under-resolved advective numerical schemes. In response to this error, we increase the size of the discrete model by a factor of 2.5 ( giving imax = 2500 and k max = 500), at which point the discrepancy between models decreases noticeably; see the first row of fig. 3.4. It is possible that an increase in the order of the approximation of the model with respect to z-advection (similar to that used to obtain Φ(0) ) may be needed to avoid such cases for low η models with relatively low k max . Such a modification remains open. 1.5 r 1 1.5 ρ 1 0.8 0.8 0.2 1 1 0.6 0.6 x x 0 0.4 0.4 0.5 0.5 0.2 0.2 −0.2 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z 0 0 0.2 0.4 0.6 0.8 1 (a) r at t = 0.5 (b) ρ − r at t = 0.5 z (c) ρ,r at (x, t) = (0.3, 0.5) Figure 3.4: Comparison of the discrete model with (imax , k max ) = (2500, 500) and the con- tinuum model for η = 0.2 at time t = 0.5. As expected, the discrete model shows better agreement with the continuum model than the previous version with only (imax , k max ) = (1000, 200) processors and stages; cf. fig. B.1 For the remaining examples, the Hamilton-Jacobi simulations are performed on a coarser mesh of 100 × 100. Example 3.5.2 (Variations in η) In this example, we examine the effect of η on solutions to the macroscopic model while β = 1.0 is fixed. The initial condition, boundary condition, and processor 88 speed are given by ρ0 (x, z) = 1.5χz≤0.2 (x, z) ρbc (x, t) = 0, α(x) = 1 − 0.4(sin(πx))6 , (3.49) respectively. It is expected that the slower processor speed around x = 0.5 will slow down neigh- boring processors due to neighbor-based throttling, encoded in the definition of w2 in (3.12b). Moreover, the effect should become more global in x as η increases, since larger values of η cor- respond to a larger number of stages per processor. Indeed as the stages increase, interactions between neighbors begin to have a cumulative global effect. This trend can be observed by compar- ing results across the first three rows of fig. B.4 and in the line-outs in the final row. Example 3.5.3 (Variations in β) In this example, we examine the effect of β on solutions to the macroscopic model, while holding η = 1.0 fixed. The initial condition, boundary condition, and processor speed are again given by (3.49). Based on the definition of the function w2 in (3.12b), the expectation is that smaller values of β will lead to reduced throttling effects. Such behavior is confirmed by the numerical results in fig. B.5. Example 3.5.4 (Highly localized slowdown) In this example, we explore the effects of a highly localized slowdown in processor speed when η = β = 1. The initial and boundary conditions are given in (3.49), while the processor speed is given by α(x) = 1 − 0.4c(x), where       0 |x − .5| > .05     40x − 18  x ∈ [.45, .475] c(x) = . (3.50)      −40x + 22 x ∈ [.525, .55]     1  |x − .5| < .025 In particular, α ̸= 1 only on the interval (0.45, 0.55). Simulation results from this example are shown in fig. 3.5. At early times, slower processors in the center of the x domain prohibit neigh- boring processors from moving data to later stages of the calculation (i.e. along the z-direction). 89 The result is a buildup of data in the neighboring processors. As time progresses, the build-up of data spreads as throttled processors near the initial slowdown around x = 0.5 begin to effect neighbors further away. Eventually these buildups dissipate as the slower processors begin catch up with their throttled neighbors. 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (a) ρ at t = 0.1 (b) ρ at t = 0.25 (c) ρ at t = 0.5 Figure 3.5: The effect of a highly localized slowdown on ρ Example 3.5.5 (Long-term behavior) In previous examples, we have observed that under some conditions, solutions eventually resemble a traveling profile of the form ρ∗ (x, z, t) = χ[ζ0 (x),ζ1 (x)] (z − st), (3.51) where s is a positive constant and the profiles ζ0 and ζ1 are constant in time and satisfy ζ0 (x) < ζ1 (x) for all x ∈ [0, 1). Our intuition is that for a wide range of conditions, traveling profiles of this type will arise after sufficiently long times, if the z domain is extended to (0, ∞). Moreover the shape of ζ1 and ζ2 is closely related to the initial data and the shape of α. A more systematic study of such profiles in special case can be found in [144]. Rather than make a precise conjecture at this point, we instead provide an example which further demonstrates our intuition. Initial and boundary conditions are given in (3.49). Because the domain in z is limited, we introduce relatively small variations in α, which allow the system to settle faster: α(x) = 1 + 0.1 cos(4πx). (3.52) Simulation results for this example are presented in fig. 3.6. When t = 0.5, the solution has nearly settled to a profile of the form (3.51), with cusps that appear where the waves caused by 90 throttling meet, at x = 0.5 and at the periodic boundary. We see then at t = 0.75 that this profile is maintained, with cusps at the same location, and again at t = 1 (after extending the z domain). In particular the solution has the periodicity of α. 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (a) ρ at t = 0.1 (b) ρ at t = 0.25 (c) ρ at t = 0.5 1 1.5 1 1.5 0.8 0.8 1 1 0.6 0.6 x x 0.4 0.4 0.5 0.5 0.2 0.2 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0.6 0.8 1 1.2 1.4 z z (d) ρ at t = 0.75 (e) ρ at t = 1 Figure 3.6: The effect of small variation in processor speed on ρ. After sufficient time, a profile emerges with the periodicity of α. 91 CHAPTER 4 CONCLUSIONS In the thesis, we discuss several numerical methods, their applications, and implementa- tions. For Sparse Grid Discontinuous Galerkin (SGDG) method, we focus on its adaptive version [3, 4, 5, 6, 7, 8], with Alpert’s multiwavelet as basis for DG space, interpolatory multiwavelet and collocation method to treat nonlinear terms, fast wavelet transform di- mension by dimension to reduce computational cost, and nonlinear solver in PETSc [9] for implicit time stepping. We apply this method on Maxwell equation in one or two dimension in nonlinear optical media [120, 121, 111, 122]. An energy stable Discontin- uous Galerkin method [1, 2] is employed for this problem, and in our work, coupling with adaptive SGDG scheme package [8] using PETSc [9]. Numerical experiments of model problems show expected accuracy and convergence rate, and adaptive scheme with proper choice of error threshold can choose a set of active elements in hash table to represent numerical solutions, without significant degeneration of numerical accuracy or generating non-physical numerical solutions or oscillations. In future work, data struc- ture of parallel computation in PETSc, e.g. DMPlex to deal with unstructured mesh, and IS and PetscSection for indexing, can provide a solution to compute numerical solution of SGDG method in core parallelism. The usage of PETSc also provides a solution to SGDG method with fully implicit time stepping, broadening the availability of SGDG method. We can also explore different standard for refinement and coarsening step of adaptive method, to better specify active elements in adaptive scheme. For Weighted Essentially Non-Oscillatory (WENO) method, we consider the fifth or- der WENO interpolation, which maintains high order accuracy on smooth region, while any singularity does not significantly affect the numerical interpolation [107, 12]. WENO interpolation and strong stability preserving (SSP) time stepping provide promising fully discrete numerical method to solve Hamilton-Jacobi equation [11]. This method is ap- 92 plied to solve the Hamilton-Jacobi continuum model of extreme scale parallel computers, which is derived from a discrete model by taking limits on number of processors and tasks. Numerical experiments of both models show that this continuum model can cap- ture the asymptotic behaviour of the discrete model. Additionally, these experiments provide an initial understanding of solutions’ dependence on parameters associated with the parallelism of the modelled computation as well as the effects heterogeneities in pro- cessing capacity [10]. In future work, we can explore control strategies for α that can alleviate bottlenecks caused by local slowdowns in the processor speed. We can also ex- tend the model to allow for more complicated interactions, including stochastic effects, and explore strategies for optimal communication. Finally, the parameters of the model can be taken from processor components of a real supercomputer, and we can determine if the macroscopic model predicts the real global behaviour of the supercomputer. 93 APPENDICES 94 APPENDIX A NUMERICAL SIMULATION OF 1D SOLITON PROPAGATION 1 1 1 T=40 T=40 T=40 0.8 T=80 0.8 T=80 0.8 T=80 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 E 0 E 0 E 0 -0.2 -0.2 -0.2 -0.4 -0.4 -0.4 -0.6 -0.6 -0.6 -0.8 -0.8 -0.8 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x 1 1 1 T=40 T=40 T=40 0.8 T=80 0.8 T=80 0.8 T=80 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 E 0 E 0 E 0 -0.2 -0.2 -0.2 -0.4 -0.4 -0.4 -0.6 -0.6 -0.6 -0.8 -0.8 -0.8 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x 1 1 1 T=40 T=40 T=40 0.8 T=80 0.8 T=80 0.8 T=80 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 E 0 E 0 E 0 -0.2 -0.2 -0.2 -0.4 -0.4 -0.4 -0.6 -0.6 -0.6 -0.8 -0.8 -0.8 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x Figure A.1: Transient fundamental (M = 1) temporal soliton propagation with the adap- tive Sparse Grid DG fully implicit scheme, in Subsection 2.5.2. Maximum mesh level N = 11. Refinement threshold ε = 10−5 , and coarsen threshold is η = 10−6 . First column: piecewise polynomial degree k = 1; second column: k = 2; third column: k = 3. First row: alternating flux I; second row: alternating flux II; third row: upwind flux. Solutions at time T = 40 and T = 80 are plotted on each case. 95 12 12 12 12 9 9 9 9 6 6 6 6 3 3 3 3 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 1 1 T = 40.0 T = 80.0 T = 40.0 T = 80.0 0.5 0.5 0.5 0.5 E 0 E 0 E 0 E 0 -0.5 -0.5 -0.5 -0.5 -1 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x x (a) k = 1, Alternating flux I (b) k = 2, Alternating flux I 12 12 9 9 6 6 3 3 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 T = 40.0 T = 80.0 0.5 0.5 E 0 E 0 -0.5 -0.5 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x (c) k = 3, Alternating flux I Figure A.2: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.1. 96 12 12 12 12 9 9 9 9 6 6 6 6 3 3 3 3 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 1 1 T = 40.0 T = 80.0 T = 40.0 T = 80.0 0.5 0.5 0.5 0.5 E 0 E 0 E 0 E 0 -0.5 -0.5 -0.5 -0.5 -1 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x x (a) k = 1, Alternating flux II (b) k = 2, Alternating flux II 12 12 9 9 6 6 3 3 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 T = 40.0 T = 80.0 0.5 0.5 E 0 E 0 -0.5 -0.5 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x (c) k = 3, Alternating flux II Figure A.3: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.1. 97 12 12 12 12 9 9 9 9 6 6 6 6 3 3 3 3 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 1 1 T = 40.0 T = 80.0 T = 40.0 T = 80.0 0.5 0.5 0.5 0.5 E 0 E 0 E 0 E 0 -0.5 -0.5 -0.5 -0.5 -1 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x x (a) k = 1, Upwind flux (b) k = 2, Upwind flux 12 12 9 9 6 6 3 3 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 T = 40.0 T = 80.0 0.5 0.5 E 0 E 0 -0.5 -0.5 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x (c) k = 3, Upwind flux Figure A.4: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.1. 98 1 1 1 T=40 T=40 T=40 0.8 T=80 0.8 T=80 0.8 T=80 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 E 0 E 0 E 0 -0.2 -0.2 -0.2 -0.4 -0.4 -0.4 -0.6 -0.6 -0.6 -0.8 -0.8 -0.8 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x 1 1 1 T=40 T=40 T=40 0.8 T=80 0.8 T=80 0.8 T=80 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 E 0 E 0 E 0 -0.2 -0.2 -0.2 -0.4 -0.4 -0.4 -0.6 -0.6 -0.6 -0.8 -0.8 -0.8 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x 1 1 1 T=40 T=40 T=40 0.8 T=80 0.8 T=80 0.8 T=80 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 E 0 E 0 E 0 -0.2 -0.2 -0.2 -0.4 -0.4 -0.4 -0.6 -0.6 -0.6 -0.8 -0.8 -0.8 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x Figure A.5: Transient fundamental (M = 1) temporal soliton propagation with the adap- tive Sparse Grid DG fully implicit scheme, in Subsection 2.5.2. Maximum mesh level N = 11. Refinement threshold ε = 10−3 , and coarsen threshold is η = 10−4 . First column: piecewise polynomial degree k = 1; second column: k = 2; third column: k = 3. First row: alternating flux I; second row: alternating flux II; third row: upwind flux. Solutions at time T = 40 and T = 80 are plotted on each case. 99 12 12 12 12 9 9 9 9 6 6 6 6 3 3 3 3 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 1 1 T = 40.0 T = 80.0 T = 40.0 T = 80.0 0.5 0.5 0.5 0.5 E 0 E 0 E 0 E 0 -0.5 -0.5 -0.5 -0.5 -1 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x x (a) k = 1, Alternating flux I (b) k = 2, Alternating flux I 12 12 9 9 6 6 3 3 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 T = 40.0 T = 80.0 0.5 0.5 E 0 E 0 -0.5 -0.5 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x (c) k = 3, Alternating flux I Figure A.6: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.5. 100 12 12 12 12 9 9 9 9 6 6 6 6 3 3 3 3 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 1 1 T = 40.0 T = 80.0 T = 40.0 T = 80.0 0.5 0.5 0.5 0.5 E 0 E 0 E 0 E 0 -0.5 -0.5 -0.5 -0.5 -1 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x x (a) k = 1, Alternating flux II (b) k = 2, Alternating flux II 12 12 9 9 6 6 3 3 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 T = 40.0 T = 80.0 0.5 0.5 E 0 E 0 -0.5 -0.5 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x (c) k = 3, Alternating flux II Figure A.7: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.5. 101 12 12 12 12 9 9 9 9 6 6 6 6 3 3 3 3 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 1 1 T = 40.0 T = 80.0 T = 40.0 T = 80.0 0.5 0.5 0.5 0.5 E 0 E 0 E 0 E 0 -0.5 -0.5 -0.5 -0.5 -1 -1 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x x (a) k = 1, Upwind flux (b) k = 2, Upwind flux 12 12 9 9 6 6 3 3 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 1 1 T = 40.0 T = 80.0 0.5 0.5 E 0 E 0 -0.5 -0.5 -1 -1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x (c) k = 3, Upwind flux Figure A.8: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.5. 102 2 2 2 T=40 T=40 T=40 1.6 T=80 1.6 T=80 1.6 T=80 1.2 1.2 1.2 0.8 0.8 0.8 0.4 0.4 0.4 E 0 E 0 E 0 -0.4 -0.4 -0.4 -0.8 -0.8 -0.8 -1.2 -1.2 -1.2 -1.6 -1.6 -1.6 -2 -2 -2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x 2 2 2 T=40 T=40 T=40 1.6 T=80 1.6 T=80 1.6 T=80 1.2 1.2 1.2 0.8 0.8 0.8 0.4 0.4 0.4 E 0 E 0 E 0 -0.4 -0.4 -0.4 -0.8 -0.8 -0.8 -1.2 -1.2 -1.2 -1.6 -1.6 -1.6 -2 -2 -2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x 2 2 2 T=40 T=40 T=40 1.6 T=80 1.6 T=80 1.6 T=80 1.2 1.2 1.2 0.8 0.8 0.8 0.4 0.4 0.4 E 0 E 0 E 0 -0.4 -0.4 -0.4 -0.8 -0.8 -0.8 -1.2 -1.2 -1.2 -1.6 -1.6 -1.6 -2 -2 -2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x Figure A.9: Transient second order (M = 2) temporal soliton propagation with the adap- tive Sparse Grid DG fully implicit scheme, in Subsection 2.5.2. Maximum mesh level N = 11. Refinement threshold ε = 10−5 , and coarsen threshold is η = 10−6 . First column: piecewise polynomial degree k = 1; second column: k = 2; third column: k = 3. First row: alternating flux I; second row: alternating flux II; third row: upwind flux. Solutions at time T = 40 and T = 80 are plotted on each case. 103 12 12 12 12 9 9 9 9 6 6 6 6 3 3 3 3 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 2 2 2 2 T = 40.0 T = 80.0 T = 40.0 T = 80.0 1 1 1 1 E 0 E 0 E 0 E 0 -1 -1 -1 -1 -2 -2 -2 -2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x x (a) k = 1, Alternating flux I (b) k = 2, Alternating flux I 12 12 9 9 6 6 3 3 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 2 2 T = 40.0 T = 80.0 1 1 E 0 E 0 -1 -1 -2 -2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x (c) k = 3, Alternating flux I Figure A.10: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.9. 104 12 12 12 12 9 9 9 9 6 6 6 6 3 3 3 3 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 2 2 2 2 T = 40.0 T = 80.0 T = 40.0 T = 80.0 1 1 1 1 E 0 E 0 E 0 E 0 -1 -1 -1 -1 -2 -2 -2 -2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x x (a) k = 1, Alternating flux II (b) k = 2, Alternating flux II 12 12 9 9 6 6 3 3 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 2 2 T = 40.0 T = 80.0 1 1 E 0 E 0 -1 -1 -2 -2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x (c) k = 3, Alternating flux II Figure A.11: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.9. 105 12 12 12 12 9 9 9 9 6 6 6 6 3 3 3 3 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 2 2 2 2 T = 40.0 T = 80.0 T = 40.0 T = 80.0 1 1 1 1 E 0 E 0 E 0 E 0 -1 -1 -1 -1 -2 -2 -2 -2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x x x (a) k = 1, Upwind flux (b) k = 2, Upwind flux 12 12 9 9 6 6 3 3 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 2 2 T = 40.0 T = 80.0 1 1 E 0 E 0 -1 -1 -2 -2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 x x (c) k = 3, Upwind flux Figure A.12: Comparison of active elements and electric field solutions, at T = 40 and T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.9. 106 APPENDIX B NUMERICAL EXPERIMENTS OF ASYNCHRONOUS DATA FLOW MODELS 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (a) r at t = 0.1 (b) r at t = 0.25 (c) r at t = 0.5 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (d) ρ at t = 0.1 (e) ρ at t = 0.25 (f) ρ at t = 0.5 1 1 1 0.8 0.2 0.8 0.2 0.8 0.2 0.6 0.6 0.6 x 0 x 0 x 0 0.4 0.4 0.4 0.2 −0.2 0.2 −0.2 0.2 −0.2 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (g) ρ − r at t = 0.1 (h) ρ − r at t = 0.25 (i) ρ − r at t = 0.5 1.5 1.5 1.5 r r r ρ ρ ρ 1 1 1 0.5 0.5 0.5 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (j) ρ,r at (x, t) = (0.3, 0.1) (k) ρ,r at (x, t) = (0.3, 0.25) (l) ρ,r at (x, t) = (0.3, 0.5) Figure B.1: Discrete solution r and continuum solution ρ of η = 0.2, in Example 3.5.1. From left to right, columns correspond to solutions at t = 0.1, t = 0.25, and t = 0.5. Dis- crete solution is computed with (imax , k max ) = (1000, 200). Continuum solution is com- puted on a 103 × 103 mesh. 107 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (a) r at t = 0.1 (b) r at t = 0.25 (c) r at t = 0.5 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (d) ρ at t = 0.1 (e) ρ at t = 0.25 (f) ρ at t = 0.5 1 1 1 0.8 0.2 0.8 0.2 0.8 0.2 0.6 0.6 0.6 x 0 x 0 x 0 0.4 0.4 0.4 0.2 −0.2 0.2 −0.2 0.2 −0.2 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (g) ρ − r at t = 0.1 (h) ρ − r at t = 0.25 (i) ρ − r at t = 0.5 1.5 1.5 1.5 r r r ρ ρ ρ 1 1 1 0.5 0.5 0.5 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (j) ρ,r at (x, t) = (0.3, 0.1) (k) ρ,r at (x, t) = (0.3, 0.25) (l) ρ,r at (x, t) = (0.3, 0.5) Figure B.2: Discrete solution r and continuum solution ρ of η = 1 case, in Example 3.5.1. From left to right, columns correspond to solutions at t = 0.1, t = 0.25, and t = 0.5. Discrete solution is computed with (imax , k max ) = (500, 500). Continuum solution is com- puted on a 103 × 103 mesh. 108 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (a) r at t = 0.1 (b) r at t = 0.25 (c) r at t = 0.5 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (d) ρ at t = 0.1 (e) ρ at t = 0.25 (f) ρ at t = 0.5 1 1 1 0.8 0.2 0.8 0.2 0.8 0.2 0.6 0.6 0.6 x 0 x 0 x 0 0.4 0.4 0.4 0.2 −0.2 0.2 −0.2 0.2 −0.2 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (g) ρ − r at t = 0.1 (h) ρ − r at t = 0.25 (i) (ρ − r) at t = 0.50 1.5 1.5 1.5 r r r ρ ρ ρ 1 1 1 0.5 0.5 0.5 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (j) ρ,r at (x, t) = (0.3, 0.1) (k) ρ,r at (x, t) = (0.3, 0.25) (l) ρ,r at (x, t) = (0.3, 0.5) Figure B.3: Discrete solution r and continuum solution ρ of η = 5 case, in Example 3.5.1. From left to right, columns correspond to solutions at t = 0.1, t = 0.25, and t = 0.5. Dis- crete solution is computed with (imax , k max ) = (200, 1000). Continuum solution is com- puted on a 103 × 103 mesh. 109 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (a) ρ at t = 0.1, η = 0.2 (b) ρ at t = 0.25, η = 0.2 (c) ρ at t = 0.5, η = 0.2 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (d) ρ at t = 0.1, η = 1 (e) ρ at t = 0.25, η = 1 (f) ρ at t = 0.5, η = 1 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (g) ρ at t = 0.1, η = 5 (h) ρ at t = 0.25, η = 5 (i) ρ at t = 0.5, η = 5 1.5 1.5 1.5 η = 0.2 η = 0.2 η = 0.2 η=1 η=1 η=1 η=5 η=5 η=5 1 1 1 0.5 0.5 0.5 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (j) ρ(0.3, z, 0.1) (k) ρ(0.3, z, 0.25) (l) ρ(0.3, z, 0.5) Figure B.4: The effects on ρ due to variations in η, in Example 3.5.2. As η increases, the throttling effect of local slowdown spreads more quickly, and data is not processed as quickly. 110 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (a) ρ at t = 0.1, β = 0.1 (b) ρ at t = 0.25, β = 0.1 (c) ρ at t = 0.5, β = 0.1 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (d) ρ at t = 0.1, β = 0.5 (e) ρ at t = 0.25, β = 0.5 (f) ρ at t = 0.5, β = 0.5 1 1.5 1 1.5 1 1.5 0.8 0.8 0.8 1 1 1 0.6 0.6 0.6 x x x 0.4 0.4 0.4 0.5 0.5 0.5 0.2 0.2 0.2 0 0 0 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (g) ρ at t = 0.1, β = 1 (h) ρ at t = 0.25, β = 1 (i) ρ at t = 0.5, β = 1 1.5 1.5 1.5 β = 0.1 β = 0.1 β = 0.1 β = 0.5 β = 0.5 β = 0.5 β=1 β=1 β=1 1 1 1 0.5 0.5 0.5 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 z z z (j) ρ(0.3, z, 0.1) (k) ρ(0.3, z, 0.25) (l) ρ(0.3, z, 0.5) Figure B.5: The effects on ρ due to variations in β, in Example 3.5.3. Larger values of β lead to more throttling. 111 BIBLIOGRAPHY 112 BIBLIOGRAPHY [1] Vrushali A Bokil, Yingda Cheng, Yan Jiang, and Fengyan Li. Energy stable dis- continuous Galerkin methods for Maxwell’s equations in nonlinear optical media. Journal of Computational Physics, 350:420–452, 2017. [2] Maohui Lyu, Vrushali A Bokil, Yingda Cheng, and Fengyan Li. Energy stable nodal discontinuous Galerkin methods for nonlinear Maxwell’s equations in multi- dimensions. Journal of Scientific Computing, 89(2):1–42, 2021. [3] Zixuan Wang, Qi Tang, Wei Guo, and Yingda Cheng. Sparse grid discontinuous Galerkin methods for high-dimensional elliptic equations. Journal of Computational Physics, 314:244–263, 2016. [4] Wei Guo and Yingda Cheng. A sparse grid discontinuous Galerkin method for high-dimensional transport equations and its application to kinetic simulations. SIAM Journal on Scientific Computing, 38(6):A3381–A3409, 2016. [5] Wei Guo and Yingda Cheng. An adaptive multiresolution discontinuous Galerkin method for time-dependent transport equations in multidimensions. SIAM Journal on Scientific Computing, 39(6):A2962–A2992, 2017. [6] Juntao Huang and Yingda Cheng. An adaptive multiresolution discontinuous Galerkin method with artificial viscosity for scalar hyperbolic conservation laws in multidimensions. SIAM Journal on Scientific Computing, 42(5):A2943–A2973, 2020. [7] Zhanjing Tao, Yan Jiang, and Yingda Cheng. An adaptive high-order piecewise polynomial based sparse grid collocation method with applications. Journal of Com- putational Physics, 433:109770, 2021. [8] Yingda Cheng, Wei Guo, Juntao Huang, Kai Huang, Yuan Liu, and Zhanjing Tao. Adaptive multiresolution discontinuous Galerkin (DG) C++ package for solving partial differential equations in high dimensions. https://github.com/JuntaoHua ng/adaptive-multiresolution-DG, 2019-2022. [9] Satish Balay, Shrirang Abhyankar, Mark F. Adams, Steven Benson, Jed Brown, Peter Brune, Kris Buschelman, Emil M. Constantinescu, Lisandro Dalcin, Alp Dener, Vic- tor Eijkhout, William D. Gropp, Václav Hapla, Tobin Isaac, Pierre Jolivet, Dmitry Karpeev, Dinesh Kaushik, Matthew G. Knepley, Fande Kong, Scott Kruger, Dave A. May, Lois Curfman McInnes, Richard Tran Mills, Lawrence Mitchell, Todd Mun- son, Jose E. Roman, Karl Rupp, Patrick Sanan, Jason Sarich, Barry F. Smith, Ste- fano Zampini, Hong Zhang, Hong Zhang, and Junchao Zhang. PETSc Web page. https://petsc.org/, 2022. [10] Richard C Barnard, Kai Huang, and Cory Hauck. A mathematical model of asyn- chronous data flow in parallel computers. IMA Journal of Applied Mathematics, 85(6):865–891, 2020. 113 [11] Chi-Wang Shu. High order numerical methods for time dependent Hamilton-Jacobi equations. In Mathematics and computation in imaging science and information process- ing, pages 47–91. World Scientific, 2007. [12] Chi-Wang Shu. Essentially non-oscillatory and weighted essentially non-oscillatory schemes for hyperbolic conservation laws. Advanced numerical approximation of non- linear hyperbolic equations, pages 325–432, 1998. [13] William H Reed and Thomas R Hill. Triangular mesh methods for the neutron transport equation. Technical report, Los Alamos Scientific Lab., N. Mex.(USA), 1973. [14] Bernardo Cockburn and Chi-Wang Shu. The Runge-Kutta local projection- discontinuous-Galerkin finite element method for scalar conservation laws. ESAIM: Mathematical Modelling and Numerical Analysis, 25(3):337–361, 1991. [15] Bernardo Cockburn and Chi-Wang Shu. TVB Runge-Kutta local projection discon- tinuous Galerkin finite element method for conservation laws. II. general frame- work. Mathematics of computation, 52(186):411–435, 1989. [16] Bernardo Cockburn, San-Yih Lin, and Chi-Wang Shu. TVB Runge-Kutta local pro- jection discontinuous Galerkin finite element method for conservation laws III: one- dimensional systems. Journal of computational Physics, 84(1):90–113, 1989. [17] Bernardo Cockburn, Suchung Hou, and Chi-Wang Shu. The Runge-Kutta local projection discontinuous Galerkin finite element method for conservation laws. IV. the multidimensional case. Mathematics of Computation, 54(190):545–581, 1990. [18] Bernardo Cockburn and Chi-Wang Shu. The Runge-Kutta discontinuous Galerkin method for conservation laws V: multidimensional systems. Journal of Computa- tional Physics, 141(2):199–224, 1998. [19] Bernardo Cockburn and Chi-Wang Shu. The P 1 -RKDG method for two-dimensional Euler equations of gas dynamics, volume 91-9. NASA Langley Research Center. Insti- tute for Computer Applications in Science and Engineering (ICASE), 1991. [20] Chi-Wang Shu and Stanley Osher. Efficient implementation of essentially non- oscillatory shock-capturing schemes. Journal of computational physics, 77(2):439–471, 1988. [21] Sigal Gottlieb, David I Ketcheson, and Chi-Wang Shu. Strong stability preserving Runge-Kutta and multistep time discretizations. World Scientific, 2011. [22] Chi-Wang Shu. TVB uniformly high-order schemes for conservation laws. Mathe- matics of Computation, 49(179):105–121, 1987. [23] Jianxian Qiu and Qiang Zhang. Stability, error estimate and limiters of discontinu- ous Galerkin methods. In Handbook of Numerical Analysis, volume 17, pages 147–171. Elsevier, 2016. 114 [24] Francesco Bassi and Stefano Rebay. A high-order accurate discontinuous finite el- ement method for the numerical solution of the compressible Navier–Stokes equa- tions. Journal of computational physics, 131(2):267–279, 1997. [25] Bernardo Cockburn and Chi-Wang Shu. The local discontinuous Galerkin method for time-dependent convection-diffusion systems. SIAM journal on numerical analy- sis, 35(6):2440–2463, 1998. [26] Bernardo Cockburn, George E Karniadakis, and Chi-Wang Shu. The development of discontinuous Galerkin methods. In Discontinuous Galerkin Methods, pages 3–50. Springer, 2000. [27] Chi-Wang Shu. Discontinuous Galerkin methods: general approach and stability. Numerical solutions of partial differential equations, 201, 2009. [28] Chi-Wang Shu. Discontinuous Galerkin method for time-dependent problems: sur- vey and recent developments. Recent developments in discontinuous Galerkin finite element methods for partial differential equations, pages 25–62, 2014. [29] Douglas N Arnold, Franco Brezzi, Bernardo Cockburn, and Donatella Marini. Dis- continuous Galerkin methods for elliptic problems. In Discontinuous Galerkin Meth- ods, pages 89–101. Springer, 2000. [30] Jacques Louis Lions. Problemes aux Limites non Homogenes a Donnees Irregulieres, pages 283–292. Springer Berlin Heidelberg, Berlin, Heidelberg, 1968. [31] Joachim Nitsche. Über ein Variationsprinzip zur Lösung von Dirichlet-Problemen bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind. In Abhandlungen aus dem mathematischen Seminar der Universität Hamburg, vol- ume 36, pages 9–15. Springer, 1971. [32] Ivo Babuška. The finite element method with penalty. Mathematics of computation, 27(122):221–228, 1973. [33] Douglas N. Arnold. An Interior Penalty Finite Element Method with Discontinuous Elements. PhD thesis, The University of Chicago, 1979. [34] Douglas N Arnold. An interior penalty finite element method with discontinuous elements. SIAM journal on numerical analysis, 19(4):742–760, 1982. [35] Douglas N Arnold, Franco Brezzi, Bernardo Cockburn, and L Donatella Marini. Unified analysis of discontinuous Galerkin methods for elliptic problems. SIAM journal on numerical analysis, 39(5):1749–1779, 2002. [36] Béatrice Rivière. Discontinuous Galerkin methods for solving elliptic and parabolic equa- tions: theory and implementation. SIAM, 2008. [37] Chi-Wang Shu. A brief survey on discontinuous Galerkin methods in computa- tional fluid dynamics. Advances in mechanics, 43(6):541–553, 2013. 115 [38] Bernardo Cockburn. Discontinuous Galerkin methods for convection-dominated problems. In High-order methods for computational physics, pages 69–224. Springer, 1999. [39] Bernardo Cockburn and Chi-Wang Shu. Runge-Kutta discontinuous Galerkin methods for convection-dominated problems. Journal of scientific computing, 16(3):173–261, 2001. [40] Douglas N. Arnold etc. Foreword for the special issue on discontinuous Galerkin method. J. Sci. Comput., 22–23(1–3), 2005. [41] Bernardo Cockburn and Chi-Wang Shu. Foreword. J. Sci. Comput., 40(1–3):1–3, Jul 2009. [42] Clint Dawson. Foreword. Computer Methods in Applied Mechanics and Engineering, 195(25):3183, 2006. Discontinuous Galerkin Methods. [43] Yanlai Chen, Bo Dong, and Chi-Wang Shu. A foreword to the special issue in honor of Professor Bernardo Cockburn on his 60th birthday: A life time of discontinuous schemings. J. Sci. Comput., 77(3):1303–1309, Dec. 2018. [44] Jan Hesthaven, Jennifer Ryan, Chi-Wang Shu, Jaap Vegt, Yan Xu, Qiang Zhang, and Zhimin Zhang. Preface to focused issue on discontinuous galerkin methods. Communications on Applied Mathematics and Computation, pages 1–2, 10 2021. [45] Jan S Hesthaven and Tim Warburton. Nodal discontinuous Galerkin methods: algo- rithms, analysis, and applications. Springer Science & Business Media, 2007. [46] Guido Kanschat. Discontinuous Galerkin methods for viscous incompressible flow. Springer, 2008. [47] B.Q. Li. Discontinuous Finite Elements in Fluid Dynamics and Heat Transfer. Compu- tational Fluid and Solid Mechanics. Springer London, 2005. [48] Daniele Antonio Di Pietro and Alexandre Ern. Mathematical aspects of discontinuous Galerkin methods, volume 69. Springer Science & Business Media, 2011. [49] V. Dolejší and M. Feistauer. Discontinuous Galerkin Method: Analysis and Applica- tions to Compressible Flow. Springer Series in Computational Mathematics. Springer International Publishing, 2015. [50] Eric T Chung, Patrick Ciarlet Jr, and Tang Fei Yu. Convergence and supercon- vergence of staggered discontinuous Galerkin methods for the three-dimensional Maxwell’s equations on Cartesian grids. Journal of Computational Physics, 235:14–31, 2013. [51] Bernardo Cockburn, Fengyan Li, and Chi-Wang Shu. Locally divergence-free dis- continuous Galerkin methods for the Maxwell equations. Journal of Computational Physics, 194(2):588–610, 2004. 116 [52] Jan S Hesthaven and Tim Warburton. Nodal high-order methods on unstructured grids: I. time-domain solution of Maxwell’s equations. Journal of Computational Physics, 181(1):186–221, 2002. [53] Stephen D Gedney, John C Young, Tyler C Kramer, and J Alan Roden. A discontin- uous Galerkin finite element time-domain method modeling of dispersive media. IEEE transactions on antennas and propagation, 60(4):1969–1977, 2012. [54] Yunqing Huang, Jichun Li, and Wei Yang. Interior penalty DG methods for Maxwell’s equations in dispersive media. Journal of Computational Physics, 230(12):4559–4570, 2011. [55] Stéphane Lanteri and Claire Scheid. Convergence of a discontinuous Galerkin scheme for the mixed time-domain Maxwell’s equations in dispersive media. IMA Journal of Numerical Analysis, 33(2):432–459, 2013. [56] Tiao Lu, Pingwen Zhang, and Wei Cai. Discontinuous Galerkin methods for dis- persive and lossy Maxwell’s equations and PML boundary conditions. Journal of Computational Physics, 200(2):549–580, 2004. [57] Eric T Chung and Patrick Ciarlet Jr. A staggered discontinuous Galerkin method for wave propagation in media with dielectrics and meta-materials. Journal of Com- putational and Applied Mathematics, 239:189–207, 2013. [58] Jichun Li. Development of discontinuous Galerkin methods for Maxwell’s equa- tions in metamaterials and perfectly matched layers. Journal of Computational and Applied Mathematics, 236(5):950–961, 2011. [59] Jichun Li and Jan S Hesthaven. Analysis and application of the nodal discontinuous Galerkin method for wave propagation in metamaterials. Journal of Computational Physics, 258:915–930, 2014. [60] Jichun Li, Cengke Shi, and Chi-Wang Shu. Optimal non-dissipative discontinuous Galerkin methods for Maxwell’s equations in Drude metamaterials. Computers & Mathematics with Applications, 73(8):1760–1780, 2017. [61] Elisabeth Blank. The Discontinuous Galerkin Method for Maxwell’s Equations: Applica- tion to Bodies of Revolution and Kerr-Nonlinearities. PhD thesis, Karlsruhe, Karlsruher Institut für Technologie (KIT), Diss., 2013, 2013. [62] Loula Fezoui and Stéphane Lanteri. Discontinuous Galerkin methods for the numerical solution of the nonlinear Maxwell equations in 1d. PhD thesis, Inria, 2015. [63] Juntao Huang and Chi-Wang Shu. A second-order asymptotic-preserving and positivity-preserving discontinuous Galerkin scheme for the Kerr–Debye model. Mathematical Models and Methods in Applied Sciences, 27(03):549–579, 2017. [64] Zhichao Peng, Vrushali A Bokil, Yingda Cheng, and Fengyan Li. Asymptotic and positivity preserving methods for Kerr-Debye model with Lorentz dispersion in one dimension. Journal of Computational Physics, 402:109101, 2020. 117 [65] Rand Corporation and Richard Bellman. Adoptive control processes: A guided tour. Princeton University Press, 1961. [66] Hans-Joachim Bungartz and Michael Griebel. Sparse grids. Acta numerica, 13:147– 269, 2004. [67] Jochen Garcke and Michael Griebel. Sparse grids and applications, volume 88. Springer Science & Business Media, 2012. [68] C Zenger. Sparse Grids, Parallel Algorithms for Partial Differential Equations: Pro- ceedings of the Sixth GAMM-Seminar, Kiel, January 1990, Notes on Numerical Fluid Mechanics (Vol. 31), W. Hackbusch, eds., Vieweg, Braunschweig, 1991. [69] Sergei Abramovich Smolyak. Quadrature and interpolation formulas for tensor products of certain classes of functions. In Doklady Akademii Nauk, volume 148, pages 1042–1045. Russian Academy of Sciences, 1963. [70] Konstantin Ivanovich Babenko. Approximation of periodic functions of many vari- ables by trigonometric polynomials. In Doklady Akademii Nauk, volume 132, pages 247–250. Russian Academy of Sciences, 1960. [71] Vladimir Nikolaevich Temlyakov. Approximations of functions with bounded mixed derivative. Trudy Matematicheskogo Instituta imeni VA Steklova, 178:3–113, 1986. [72] F-J Delvos. d-variate boolean interpolation. Journal of Approximation Theory, 34(2):99–114, 1982. [73] G Baszenski, F-J Delvos, and S Jester. Blending approximations with sine functions. In Numerical Methods in Approximation Theory, Vol. 9, pages 1–19. Springer, 1992. [74] Chin Bo Liem and Tsimin Shih. Splitting Extrapolation Method, the: A New Technique In Numerical Solution Of Multidimensional Prob, volume 7. World Scientific, 1995. [75] Michael Griebel. Sparse grids and related approximation schemes for higher dimensional problems. SFB 611, 2005. [76] Bradley K Alpert. A class of bases in L2 for the sparse representation of integral operators. SIAM journal on Mathematical Analysis, 24(1):246–262, 1993. [77] Beylkin Alpert, Gregory Beylkin, David Gines, and Lev Vozovoi. Adaptive solu- tion of partial differential equations in multiwavelet bases. Journal of Computational Physics, 182(1):149–190, 2002. [78] Stephane Mallat. A wavelet tour of signal processing: An approximation tour, 1999. [79] Yuan Liu, Yingda Cheng, Shanqin Chen, and Yong-Tao Zhang. Krylov implicit in- tegration factor discontinuous Galerkin methods on sparse grids for high dimen- sional reaction-diffusion equations. Journal of Computational Physics, 388:90–102, 2019. 118 [80] Wei Guo. A sparse grid discontinuous Galerkin method for the high-dimensional Helmholtz equation with variable coefficients. arXiv preprint arXiv:1905.08917, 2019. [81] Zhanjing Tao, Wei Guo, and Yingda Cheng. Sparse grid discontinuous Galerkin methods for the Vlasov-Maxwell system. Journal of Computational Physics: X, 3:100022, 2019. [82] Jianguo Huang and Yue Yu. A sparse grid discrete ordinate discontinuous Galerkin method for the radiative transfer equation. arXiv preprint arXiv:2101.09070, 2021. [83] Juntao Huang, Yuan Liu, Wei Guo, Zhanjing Tao, and Yingda Cheng. An adaptive multiresolution interior penalty discontinuous Galerkin method for wave equa- tions in second order form. Journal of Scientific Computing, 85(1):1–31, 2020. [84] Jie Shen and Haijun Yu. Efficient spectral sparse grid methods and applica- tions to high-dimensional elliptic problems. SIAM Journal on Scientific Computing, 32(6):3228–3250, 2010. [85] Andreas Zeiser. Fast matrix-vector multiplication in the sparse-grid Galerkin method. Journal of Scientific Computing, 47(3):328–346, 2011. [86] Juntao Huang, Yong Liu, Yuan Liu, Zhanjing Tao, and Yingda Cheng. A class of adaptive multiresolution ultra-weak discontinuous Galerkin methods for some nonlinear dispersive wave equations. SIAM Journal on Scientific Computing, 44(2):A745–A769, 2022. [87] Wei Guo, Juntao Huang, Zhanjing Tao, and Yingda Cheng. An adaptive sparse grid local discontinuous Galerkin method for Hamilton-Jacobi equations in high dimensions. Journal of Computational Physics, 436:110294, 2021. [88] Zhanjing Tao, Juntao Huang, Yuan Liu, Wei Guo, and Yingda Cheng. An adaptive multiresolution ultra-weak discontinuous Galerkin method for nonlinear Schrödinger equations. Communications on Applied Mathematics and Computation, 4(1):60–83, 2022. [89] Chi-Wang Shu. Essentially non-oscillatory and weighted essentially non-oscillatory schemes. Acta Numerica, 29:701–762, 2020. [90] Chi-Wang Shu. High order ENO and WENO schemes for computational fluid dy- namics. In High-order methods for computational physics, pages 439–582. Springer, 1999. [91] A. Harten, B. Engquist, S. Osher, and S.R. Chakravarthy. Uniformly high order accurate essentially non-oscillatory schemes. III. Journal of Computational Physics (Print), 71(2):231–303, 1987. [92] Chi-Wang Shu and Stanley Osher. Efficient implementation of essentially non- oscillatory shock-capturing schemes. Journal of computational physics, 77(2):439–471, 1988. 119 [93] Chi-Wang Shu and Stanley Osher. Efficient implementation of essentially non- oscillatory shock-capturing schemes, II. Journal of Computational Physics, 83(1):32– 78, 1989. [94] Xu-Dong Liu, Stanley Osher, and Tony Chan. Weighted essentially non-oscillatory schemes. Journal of computational physics, 115(1):200–212, 1994. [95] Oliver Friedrich. Weighted essentially non-oscillatory schemes for the interpolation of mean values on unstructured grids. Journal of computational physics, 144(1):194– 212, 1998. [96] Changqing Hu and Chi-Wang Shu. High order weighted ENO schemes for unstruc- tured meshes - preliminary results. Computational fluid dynamics’98, pages 356–362, 1998. [97] Changqing Hu and Chi-Wang Shu. Weighted essentially non-oscillatory schemes on triangular meshes. Journal of Computational Physics, 150(1):97–127, 1999. [98] Doron Levy, Gabriella Puppo, and Giovanni Russo. Central WENO schemes for hyperbolic systems of conservation laws. ESAIM: Mathematical Modelling and Nu- merical Analysis, 33(3):547–571, 1999. [99] Guang-Shan Jiang and Chi-Wang Shu. Efficient implementation of weighted eno schemes. Journal of computational physics, 126(1):202–228, 1996. [100] Dinshaw S Balsara and Chi-Wang Shu. Monotonicity preserving weighted essen- tially non-oscillatory schemes with increasingly high order of accuracy. Journal of Computational Physics, 160(2):405–452, 2000. [101] Chi-Wang Shu. High order weighted essentially nonoscillatory schemes for con- vection dominated problems. SIAM review, 51(1):82–126, 2009. [102] Michael G Crandall and Pierre-Louis Lions. Viscosity solutions of Hamilton-Jacobi equations. Transactions of the American mathematical society, 277(1):1–42, 1983. [103] Michael G Crandall, Hitoshi Ishii, and Pierre-Louis Lions. User’s guide to viscos- ity solutions of second order partial differential equations. Bulletin of the American mathematical society, 27(1):1–67, 1992. [104] Guy Barles. An introduction to the theory of viscosity solutions for first-order Hamilton–Jacobi equations and applications. In Hamilton-Jacobi equations: approx- imations, numerical analysis and applications, pages 49–109. Springer, 2013. [105] Stanley Osher and James A Sethian. Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of computational physics, 79(1):12–49, 1988. [106] Stanley Osher and Chi-Wang Shu. High-order essentially nonoscillatory schemes for Hamilton–Jacobi equations. SIAM Journal on numerical analysis, 28(4):907–922, 1991. 120 [107] Guang-Shan Jiang and Danping Peng. Weighted ENO schemes for Hamilton–Jacobi equations. SIAM Journal on Scientific computing, 21(6):2126–2143, 2000. [108] Chi-Wang Shu. Total-variation-diminishing time discretizations. SIAM Journal on Scientific and Statistical Computing, 9(6):1073–1084, 1988. [109] Sigal Gottlieb and Chi-Wang Shu. Total variation diminishing Runge-Kutta schemes. Mathematics of computation, 67(221):73–85, 1998. [110] Sigal Gottlieb, Chi-Wang Shu, and Eitan Tadmor. Strong stability-preserving high- order time discretization methods. SIAM review, 43(1):89–112, 2001. [111] G.P. Agrawal. Nonlinear Fiber Optics. Electronics & Electrical. Elsevier Science, 2007. [112] Peter M Goorjian, Allen Taflove, Rose M Joseph, and Susan C Hagness. Compu- tational modeling of femtosecond optical solitons from Maxwell’s equations. IEEE journal of quantum electronics, 28(10):2416–2422, 1992. [113] Gaël Guennebaud, Benoît Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010. [114] Shrirang Abhyankar, Jed Brown, Emil M Constantinescu, Debojyoti Ghosh, Barry F Smith, and Hong Zhang. PETSc/TS: A modern scalable ODE/DAE solver library. arXiv preprint arXiv:1806.01437, 2018. [115] Peter R Brune, Matthew G Knepley, Barry F Smith, and Xuemin Tu. Composing scalable nonlinear algebraic solvers. SIAM Review, 57(4):535–565, 2015. [116] Dana A Knoll and David E Keyes. Jacobian-free Newton–Krylov methods: a survey of approaches and applications. Journal of Computational Physics, 193(2):357–397, 2004. [117] N. Bloembergen. Nonlinear Optics. World Scientific, 1996. [118] R.W. Boyd. Nonlinear Optics. Elsevier Science, 2003. [119] G. New. Introduction to Nonlinear Optics. Cambridge University Press, 2011. [120] M.N.O. Sadiku. Computational Electromagnetics with MATLAB, Fourth Edition. CRC Press, 2018. [121] L Gilles, SC Hagness, and L Vázquez. Comparison between staggered and un- staggered finite-difference time-domain grids for few-cycle temporal optical soliton propagation. Journal of Computational Physics, 161(2):379–400, 2000. [122] Cheryl V Hile and William L Kath. Numerical solutions of Maxwell’s equations for nonlinear-optical pulse propagation. JOSA B, 13(6):1135–1145, 1996. [123] Olivier Bokanowski, Jochen Garcke, Michael Griebel, and Irene Klompmaker. An adaptive sparse grid semi-Lagrangian scheme for first order Hamilton-Jacobi Bell- man equations. Journal of Scientific Computing, 55(3):575–605, 2013. 121 [124] Michael Griebel. Adaptive sparse grid multilevel methods for elliptic PDEs based on finite differences. Computing, 61(2):151–179, 1998. [125] W. Hundsdorfer and J.G. Verwer. Numerical Solution of Time-Dependent Advection- Diffusion-Reaction Equations. Springer Series in Computational Mathematics. Springer Berlin Heidelberg, 2013. [126] Dimitri van Heesch. Doxygen web page. https://www.doxygen.nl/index.html, 1997-2019. [127] Jethro H Greene and Allen Taflove. General vector auxiliary differential equa- tion finite-difference time-domain method for nonlinear optics. Optics express, 14(18):8305–8310, 2006. [128] Scientific Grand Challenges. Architectures and technology for extreme scale com- puting. In US Department of Energy Workshop Report, 2009. [129] Jack Dongarra, Jeffrey Hittinger, John Bell, Luis Chacón, Robert Falgout, Michael Heroux, Paul Hovland, Esmond Ng, Clayton Webster, and Stefan Wild. Applied mathematics research for exascale computing. Technical report, U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research Program, 2014. [130] David E. Culler, Richard M. Karp, David A. Patterson, Abhijit Sahay, Eunice E. Santos, Klaus E. Schauser, Ramesh Subramonian, and Thorsten von Eicken. LogP: a practical model of parallel computation. Commun. ACM, 39:78–85, 1996. [131] A. Pietracaprina and G. Pucci. The complexity of deterministic PRAM simulation on distributed memory machines. Theory of Computing Systems, 30(3):231–247, 1997. [132] Alberto Nunez, Javier Fernandez, Rosa Filguiera, Felix Garcia, and Jesus Carretero. SIMCAN: A flexible, scalable and expandable simulation platform for modelling and simulating distributed architectures and applications. Simulation Modelling Practice and Theory, 20(1):12–32, 2012. [133] Julian M. Kunkel. Simulating parallel programs on application and system level. Computer Science - Research and Development, 28(2):167–174, 2013. [134] S.S. Dosanjh, R.F. Barrett, D.W. Doerfler, S.D. Hammond, K.S. Hemmert, M.A. Her- oux, P.T. Lin, K.T. Pedretti, A.F. Rodrigues, T.G. Trucano, and J.P. Luitjens. Exascale design space exploration and co-design. Future Generation Computer Systems, 30:46 – 58, 2014. [135] Kyle L. Spafford and Jeffrey S. Vetter. Aspen: A domain specific language for per- formance modeling. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’12, pages 84:1–84:11, Los Alamitos, CA, USA, 2012. IEEE Computer Society Press. 122 [136] Jeffrey S. Vetter and Jeremy S. Meredith. Synthetic program analysis with Aspen. In Proceedings of the 3rd International Conference on Exascale Applications and Software, Edinburgh, Scotland, UK, 04/2015 2015. University of Edinburgh, University of Edinburgh. [137] Ewa Deelman, Karan Vahi, Gideon Juve, Mats Rynge, Scott Callaghan, Philip J. Maechling, Rajiv Mayani, Weiwei Chen, Rafael Ferreira da Silva, Miron Livny, and Kent Wenger. Pegasus, a workflow management system for science automation. Future Generation Computer Systems, 46:17 – 35, 2015. [138] A. Aw and M. Rascle. Resurrection of "second order" models of traffic flow. SIAM Journal on Applied Mathematics, 60(3):916–938, 2000. [139] D. Armbruster, D. Marthaler, and C. Ringhofer. Kinetic and fluid model hierarchies for supply chains. Multiscale Modeling & Simulation, 2(1):43–61, 2003. [140] Mapundi K. Banda, Michael Herty, and Axel Klar. Gas flow in pipeline networks. Networks and Heterogeneous Media, 1(1):41–56, 2006. [141] Jens Brouwer, Ingenuin Gasser, and Michael Herty. Gas pipeline models revisited: Model hierarchies, nonisothermal models, and simulations of networks. Multiscale Modeling & Simulation, 9(2):601–623, 2011. [142] Axel Klar and Raimund Wegener. A hierarchy of models for multilane vehicular traffic. I. Modeling. SIAM J. Appl. Math., 59(3):983–1001 (electronic), 1999. [143] Ernst Hairer, Syvert P Nørsett, and Gerhard Wanner. Solving Ordinary Differential Equations I: Nonstiff problems. Springer, 1987. [144] Cory D Hauck, Michael Herty, and Giuseppe Visconti. Qualitative properties of mathematical model for data flow. Networks & Heterogeneous Media, 16(4), 2021. 123