SPARSE GRID DISCONTINUOUS GALERKIN METHODS FOR NONLINEAR OPTICS
   AND MATHEMATICAL MODELING OF ASYNCHRONOUS DATA FLOW IN
                         PARALLEL COMPUTERS
                                       By
                                   Kai Huang
                              A DISSERTATION
                                  Submitted to
                          Michigan State University
                  in partial fulfillment of the requirements
                               for the degree of
                Applied Mathematics – Doctor of Philosophy
                                      2022


                                        ABSTRACT
 SPARSE GRID DISCONTINUOUS GALERKIN METHODS FOR NONLINEAR OPTICS
     AND MATHEMATICAL MODELING OF ASYNCHRONOUS DATA FLOW IN
                                PARALLEL COMPUTERS
                                             By
                                         Kai Huang
This thesis consists of two parts: the first part discusses the Sparse Grid Discontinuous
Galerkin (SGDG) method and its adaptive version, and their applications in Maxwell
equations in nonlinear optical media [1, 2]; the second part discusses a Hamilton-Jacobi
model of asynchronous data flow in parallel computers, and corresponding numerical
simulation using Weighted Essentially Non-Oscillatory (WENO) method.
    SGDG method and its adaptive version were developed in recent years [3, 4, 5, 6, 7], to
numerically solve different linear or nonlinear PDE problems, reducing degrees of free-
dom and computational cost. Compared to the previous works, this thesis mainly focuses
on fully implicit SGDG method of nonlinear equations, which broadens the applications
of the method. To achieve this goal, the existing SGDG package [8] is coupled with nonlin-
ear solvers in PETSc [9]. Numerical simulations of several model equations and physical
relevant problems are presented to demonstrate accuracy and robustness of the method.
    Presented in second part of the thesis are models of data flow on processors in a high
performance computing framework involving computations necessitating inter-processor
communications [10]. First comes an ordinary differential model, and its asymptotic limit
results in a model which treats the computer as a continuum of processors and data flow
as an Eulerian fluid governed by a conservation law. We derive a Hamilton-Jacobi equa-
tion associated with this conservation law for which the existence and uniqueness of so-
lutions can be proved. High order WENO interpolation [11, 12], together with strong
stability preserving (SSP) method for time discretization, is applied to simulation of the
Hamilton-Jacobi model. We then present the results of numerical experiments for both


discrete and continuum models; these show a qualitative agreement between the two
and the effect of variations in the computing environment’s processing capabilities on the
progress of the modeled computation.


Copyright by
KAI HUANG
       2022


                                ACKNOWLEDGEMENTS
Over the five years study in Michigan State University, I have received tremendous help
from a lot of people. First and foremost, I offer my deepest appreciation to my thesis
advisor, Professor Yingda Cheng. Not only does she provide me guidance on my the-
sis, but also she gives me valuable suggestions and generous helps throughout my PhD
study. Without her patience and support in every aspect, this thesis would not have been
possible.
    I would also thank all of my committee members, Professor Daniel Appelo, Professor
Andrew Christlieb, and Professor Jianliang Qian, for their kindness and help. I am also
indebted to my undergraduate advisor, Professor Mengping Zhang, for introducing me
to the research of numerical analysis.
    Support from colleagues is always crucial in large research projects. I would thank
my colleagues, Dr. Juntao Huang, Professor Zhanjing Tao, Professor Yan Jiang, Professor
Wei Guo, Professor Yuan Liu, for their help and advice in several different parts of the
thesis and my research, especially to the adaptive SGDG package and WENO method.
I would also thank Dr. Richard C. Bernard, Dr. Cory Hauck, Dr. Xianzhu Tang, Dr. Qi
Tang, William Sands, for their help in many aspects of my research career.
    Besides, I would appreciate my friends, Anqi Chen, Andres Felipe Galindo-Olarte,
Eric Cheuk-Wai Yau, Jing Huang, Rui Wang, Yao Li, Runze Su, Jialin Qu, Jian Song,
Zhichao Peng, etc. for their companion and kind help in the many years.
    Last but not the least, I am eternally grateful to my parents, Haizhuang Huang and
Xianhua Liu. Their persistent support and unconditional love encourage me to pursue
higher goals in my life, and make me who I am today.
                                             v


                               TABLE OF CONTENTS
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
LIST OF ALGORITHMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . .            . . . . . . . . . . . .  1
   1.1 Discontinuous Galerkin (DG) Method . . . . . . . . . .         . . . . . . . . . . . .  1
   1.2 Sparse Grid Discontinuous Galerkin (SGDG) Method .             . . . . . . . . . . . .  3
   1.3 Weighted Essentially Non-Oscillatory (WENO) Method             . . . . . . . . . . . .  5
   1.4 Organization of Thesis . . . . . . . . . . . . . . . . . . .   . . . . . . . . . . . .  8
CHAPTER 2     APPLICATION OF SGDG METHOD TO MAXWELL’S EQUA-
              TION IN NONLINEAR OPTICS . . . . . . . . . . . . . . . . . . . . .              11
   2.1 Maxwell’s Equation in Nonlinear Optics . . . . . . . . . . . . . . . . . . . . .       11
   2.2 Formulation of Energy Stable Adaptive SGDG Method in One Dimension .                   15
       2.2.1 SGDG Method and Alpert’s Multiwavelets . . . . . . . . . . . . . . .             16
       2.2.2 Semi-Discrete DG Method for the Maxwell’s Equation . . . . . . . . .             18
       2.2.3 Energy-Stable DG Method of the Maxwell Equation . . . . . . . . . .              21
       2.2.4 Interpolatory Multiwavelets . . . . . . . . . . . . . . . . . . . . . . . .      24
       2.2.5 SGDG Method with Multiresolution Interpolation in One Dimension                  27
   2.3 Formulation of Energy Stable Adaptive SGDG Method in Higher Dimension                  28
       2.3.1 Semi-Discrete and Fully-Discrete Energy Stable DG Method . . . . .               28
       2.3.2 High Dimensional Sparse Grid DG Method and Interpolatory Mul-
              tiwavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  34
       2.3.3 Adaptive SGDG Method . . . . . . . . . . . . . . . . . . . . . . . . . .         36
   2.4 Adaptive SGDG Package and PETSc Implementation . . . . . . . . . . . . .               40
       2.4.1 Overview of Package . . . . . . . . . . . . . . . . . . . . . . . . . . . .      40
       2.4.2 Hash Key of SGDG Elements . . . . . . . . . . . . . . . . . . . . . . .          41
       2.4.3 SGDG Spatial Discretization . . . . . . . . . . . . . . . . . . . . . . . .      42
       2.4.4 Time Integration and PETSc Package . . . . . . . . . . . . . . . . . . .         47
   2.5 Numerical Simulation in One Dimensional Case . . . . . . . . . . . . . . . .           52
       2.5.1 Kink Shape Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . .     52
       2.5.2 Soliton Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    55
   2.6 Numerical Simulation in Higher Dimensional Case . . . . . . . . . . . . . .            59
       2.6.1 Accuracy and Convergence Test of Linear Equation . . . . . . . . . .             59
       2.6.2 Accuracy and Convergence Test for nonlinear Maxwell Equation . .                 62
       2.6.3 Spatial Optical Soliton Propagation . . . . . . . . . . . . . . . . . . . .      66
CHAPTER 3     APPLICATION OF WENO METHOD IN MATHEMATICAL MODEL
              OF ASYNCHRONOUS DATA FLOW IN PARALLEL COMPUT-
              ERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
                                            vi


   3.1 Models of Extreme Scale Computers . .       . . . . . . . . . . . . . . . . . . . . . 73
   3.2 The Discrete Model . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . . . . . 75
   3.3 The Continuum Model . . . . . . . . . .     . . . . . . . . . . . . . . . . . . . . . 78
   3.4 Numerical Methods for Simulations . .       . . . . . . . . . . . . . . . . . . . . . 82
       3.4.1 ODE Implementation . . . . . . .      . . . . . . . . . . . . . . . . . . . . . 83
       3.4.2 Hamilton-Jacobi Implementation        . . . . . . . . . . . . . . . . . . . . . 83
       3.4.3 WENO Interpolation . . . . . . .      . . . . . . . . . . . . . . . . . . . . . 84
   3.5 Numerical Experiments . . . . . . . . .     . . . . . . . . . . . . . . . . . . . . . 87
CHAPTER 4    CONCLUSIONS         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
   APPENDIX A       NUMERICAL SIMULATION OF 1D SOLITON PROPAGA-
                    TION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
   APPENDIX B       NUMERICAL EXPERIMENTS OF ASYNCHRONOUS DATA
                    FLOW MODELS . . . . . . . . . . . . . . . . . . . . . . . . . . 107
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
                                           vii


                                     LIST OF TABLES
Table 2.1: CFL constant for fully implicit method . . . . . . . . . . . . . . . . . . . . 53
Table 2.2: Errors and Orders of Accuracy of E. k = 1, T = 1/v. . . . . . . . . . . . . . 54
Table 2.3: Errors and Orders of Accuracy of J. k = 1, T = 1/v. . . . . . . . . . . . . . 54
Table 2.4: Errors and Orders of Accuracy of E. k = 2, T = 1/v. . . . . . . . . . . . . . 54
Table 2.5: Errors and Orders of Accuracy of J. k = 2, T = 1/v. . . . . . . . . . . . . . 54
Table 2.6: Errors and Orders of Accuracy of E. k = 3, T = 1/v. . . . . . . . . . . . . . 55
Table 2.7: Errors and Orders of Accuracy of J. k = 3, T = 1/v. . . . . . . . . . . . . . 55
Table 2.8: Adaptive Result of P 2 case, using alternating flux . . . . . . . . . . . . . . 56
Table 2.9: Adaptive Result of P 2 case, using upwind flux . . . . . . . . . . . . . . . . 56
Table 2.10: Adaptive Result of P 3 case, using alternating flux . . . . . . . . . . . . . . 56
Table 2.11: Adaptive Result of P 3 case, using upwind flux . . . . . . . . . . . . . . . . 56
Table 2.12: CFL numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Table 2.13: Errors and orders of accuracy of u, for k = 1, at T = 1 . . . . . . . . . . . . 60
Table 2.14: Errors and orders of accuracy of u, for k = 1, at T = 0 . . . . . . . . . . . . 60
Table 2.15: Errors and orders of accuracy of u, for k = 2, at T = 1 . . . . . . . . . . . . 60
Table 2.16: Errors and orders of accuracy of u, for k = 2, at T = 0 . . . . . . . . . . . . 60
Table 2.17: Errors and orders of accuracy of u, for k = 3, at T = 1 . . . . . . . . . . . . 60
Table 2.18: Errors and orders of accuracy of u, for k = 3, at T = 0 . . . . . . . . . . . . 61
Table 2.19: L2 error and order for adaptive scheme, for T = 1 . . . . . . . . . . . . . . 62
Table 2.20: L2 and L∞ errors of Hz , Ex , Ey for Alternating Flux I case . . . . . . . . . . 65
                                              viii


                                     LIST OF FIGURES
Figure 2.1: Function E and J at t = 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Figure 2.2: Accurate solution, and active elements at T = 1 for ε = 10−5 ; red
            circles are center of active elements. . . . . . . . . . . . . . . . . . . . . . 63
Figure 2.3: Magnitude |u1 (ζ, ξ)| of Fundamental Soliton and |u2 (ζ, ξ)| of Second
            Order Soliton, of Solutions of NLSE (2.70). . . . . . . . . . . . . . . . . . 68
Figure 2.4: Comparison for fundamental soliton between absolute value |Hz | of
            magnetic field (left column) and center of active elements (right col-
            umn) at different time. First row: T = 2; Second row: T = 4; Third
            row: T = 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Figure 2.5: Comparison for second order soliton between absolute value |Hz | of
            magnetic field (left column) and center of active elements (right col-
            umn) at different time. First row: T = 1.976; Second row: T = 3.024;
            Third row: T = 3.459. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Figure 3.1: Schematic of network of processors. Dashed lines denote inter-processor
            communications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Figure 3.2: Flux Φ(0) defined in (3.30) . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Figure 3.3: Profiles of the processor speed α used in the numerical experiments.
            The non-standard orientation of the graphs is set to match the axes in
            the numerical results that follow. . . . . . . . . . . . . . . . . . . . . . . . 87
Figure 3.4: Comparison of the discrete model with (imax , k max ) = (2500, 500) and
            the continuum model for η = 0.2 at time t = 0.5. As expected, the dis-
            crete model shows better agreement with the continuum model than
            the previous version with only (imax , k max ) = (1000, 200) processors
            and stages; cf. fig. B.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Figure 3.5: The effect of a highly localized slowdown on ρ . . . . . . . . . . . . . . . 90
Figure 3.6: The effect of small variation in processor speed on ρ. After sufficient
            time, a profile emerges with the periodicity of α. . . . . . . . . . . . . . . 91
                                                ix


Figure A.1: Transient fundamental (M = 1) temporal soliton propagation with
            the adaptive Sparse Grid DG fully implicit scheme, in Subsection
            2.5.2. Maximum mesh level N = 11. Refinement threshold ε = 10−5 ,
            and coarsen threshold is η = 10−6 . First column: piecewise polyno-
            mial degree k = 1; second column: k = 2; third column: k = 3. First
            row: alternating flux I; second row: alternating flux II; third row: up-
            wind flux. Solutions at time T = 40 and T = 80 are plotted on each
            case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Figure A.2: Comparison of active elements and electric field solutions, at T = 40
            and T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.1. . . 96
Figure A.3: Comparison of active elements and electric field solutions, at T = 40
            and T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.1. . . 97
Figure A.4: Comparison of active elements and electric field solutions, at T = 40
            and T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.1. . . . . . 98
Figure A.5: Transient fundamental (M = 1) temporal soliton propagation with
            the adaptive Sparse Grid DG fully implicit scheme, in Subsection
            2.5.2. Maximum mesh level N = 11. Refinement threshold ε = 10−3 ,
            and coarsen threshold is η = 10−4 . First column: piecewise polyno-
            mial degree k = 1; second column: k = 2; third column: k = 3. First
            row: alternating flux I; second row: alternating flux II; third row: up-
            wind flux. Solutions at time T = 40 and T = 80 are plotted on each
            case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Figure A.6: Comparison of active elements and electric field solutions, at T = 40
            and T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.5. . . 100
Figure A.7: Comparison of active elements and electric field solutions, at T = 40
            and T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.5. . . 101
Figure A.8: Comparison of active elements and electric field solutions, at T = 40
            and T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.5. . . . . . 102
Figure A.9: Transient second order (M = 2) temporal soliton propagation with
            the adaptive Sparse Grid DG fully implicit scheme, in Subsection
            2.5.2. Maximum mesh level N = 11. Refinement threshold ε = 10−5 ,
            and coarsen threshold is η = 10−6 . First column: piecewise polyno-
            mial degree k = 1; second column: k = 2; third column: k = 3. First
            row: alternating flux I; second row: alternating flux II; third row: up-
            wind flux. Solutions at time T = 40 and T = 80 are plotted on each
            case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
                                                 x


Figure A.10:Comparison of active elements and electric field solutions, at T = 40
            and T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.9. . . 104
Figure A.11:Comparison of active elements and electric field solutions, at T = 40
            and T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.9. . . 105
Figure A.12:Comparison of active elements and electric field solutions, at T = 40
            and T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.9. . . . . . 106
Figure B.1: Discrete solution r and continuum solution ρ of η = 0.2, in Example
            3.5.1. From left to right, columns correspond to solutions at t = 0.1,
            t = 0.25, and t = 0.5. Discrete solution is computed with (imax , k max ) =
            (1000, 200). Continuum solution is computed on a 103 × 103 mesh. . . . . 107
Figure B.2: Discrete solution r and continuum solution ρ of η = 1 case, in Ex-
            ample 3.5.1. From left to right, columns correspond to solutions at
            t = 0.1, t = 0.25, and t = 0.5. Discrete solution is computed with
            (imax , k max ) = (500, 500). Continuum solution is computed on a 103 ×
            103 mesh. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Figure B.3: Discrete solution r and continuum solution ρ of η = 5 case, in Ex-
            ample 3.5.1. From left to right, columns correspond to solutions at
            t = 0.1, t = 0.25, and t = 0.5. Discrete solution is computed with
            (imax , k max ) = (200, 1000). Continuum solution is computed on a
            103 × 103 mesh. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Figure B.4: The effects on ρ due to variations in η, in Example 3.5.2. As η in-
            creases, the throttling effect of local slowdown spreads more quickly,
            and data is not processed as quickly. . . . . . . . . . . . . . . . . . . . . . 110
Figure B.5: The effects on ρ due to variations in β, in Example 3.5.3. Larger values
            of β lead to more throttling. . . . . . . . . . . . . . . . . . . . . . . . . . . 111
                                                xi


                              LIST OF ALGORITHMS
Algorithm 2.1: DGSolution Class, in DGSolution.h . . . . . . . . . . . . . . . . . . . 43
Algorithm 2.2: Element Class, in Element.h . . . . . . . . . . . . . . . . . . . . . . . 44
Algorithm 2.3: DGAdapt class, in DGAdapt.h . . . . . . . . . . . . . . . . . . . . . . 46
Algorithm 2.4: Example of Linear Equations, in Linear2D.cpp . . . . . . . . . . . . . 49
Algorithm 2.5: While Loop to Find Numerical Solution in Algorithm 2.4 . . . . . . . 50
Algorithm 2.6: Function FormFunction for Nonlinear Solver in Algorithm 2.4 . . . . 51
Algorithm 2.7: Command to Run Executable Linear2D . . . . . . . . . . . . . . . . . 51
                                         xii


                                        CHAPTER 1
                                     INTRODUCTION
In this chapter, we briefly introduce several numerical methods, which are the focus of our
research projects: Discontinuous Galerkin (DG) Method, its variant: Sparse Grid Discon-
tinuous Galerkin (SGDG) Method, and Weighted Essentially Non-Oscillatory (WENO)
Method.
1.1    Discontinuous Galerkin (DG) Method
Discontinuous Galerkin (DG) Method is a specific kind of finite element methods. Com-
paring to traditional finite element method, basis functions of DG Methods are chosen
to be discontinuous on element interfaces, such as piecewise polynomials, which leads
to the introduction of numerical flux. By properly choosing numerical flux, the resulting
numerical method is stable and convergent.
    The DG Method for transport equation was first introduced by Reed and Hill in 1973,
and discussed in a report [13] of Los Alamos National Laboratory. The method was
to solve the neuron transport equations, which are time-independent linear hyperbolic
equations. In late 1980s and early 90s, a major development, namely Runge-Kutta DG
Methods, was established by Cockburn and Shu in a series of paper [14, 15, 16, 17, 18],
which is a framework to solve nonlinear time-dependent hyperbolic equations, e.g. Eu-
ler equations of compressible gas dynamics [19]. Under this framework, the DG dis-
cretization is used only for spatial variables, while explicit, nonlinearly stable high order
Runge-Kutta methods [20, 21] are used to discretize the time variable. To avoid oscillation
of numerical solutions when strong shock occurs, exact or approximate Riemann solvers
as interface fluxes and total variation bounded (TVB) nonlinear limiters [22, 23] are in-
troduced from finite volume methods. DG methodology was also generalized to treat
viscous terms as well, and thus the DG methods were designed to solve Navier-Stokes
                                               1


equations [24, 25].
    Comparing to classical finite volume and finite difference methods, DG Methods have
certain advantages [26]. By choosing polynomials of high degree, the corresponding DG
Methods can achieve high order of accuracy; the mass matrix is sparse so the DG Meth-
ods are parallelizable. In addition, DG Methods can handle h-p adaptivity; due to the
discontinuous basis functions, the continuity restrictions for conforming finite element
methods need not be considered, and grid refinement or unrefinement is easy to handle.
Adaptivity is of particular interest for hyperbolic systems and can potentially advance
computational efficiency [27, 28].
    In 1970s, independent research of Galerkin Methods applying to elliptic and parabolic
equations emerged, as discussed in [29], which are now called interior penalty (IP) meth-
ods. The idea of penalty formulation could be traced back to 1960s, e.g. [30]. In 1971,
Nitsche [31] developed the first penalty method, where a penalty term inversely propor-
tional to mesh size was introduced to guarantee stability and optimal order of conver-
gence for smooth exact solution. Instead, Babuska’s approach [32] used a more flexible
penalty term, with consistency error and sub-optimal convergence rate. The method was
further analyzed and generalized to nonlinear elliptic and parabolic problems by Arnold
[33], which were summarized in [34]. After all, a unified framework is proposed [29, 35],
for both primal formulation inspired by original interior penalty method, and methods
inspired by finite volume methods of hyperbolic problems with proper numerical flux.
We also refer to textbooks like [36] for further reference.
    Since the DG Methods were introduced, they have found rapid applications in such
diverse areas as aeroacoustics, electro-magnetism, gas dynamics, granular flows, magne-
tohydrodynamics, meteorology, modeling of shallow water, oceanography, oil recovery
simulation, semiconductor device simulation, transport of contaminant in porous me-
dia, turbomachinery, turbulent flows, viscoelastic flows and weather forecasting, among
many others. For further discussions of this, we refer to review papers e.g. [26, 27, 37, 28].
                                              2


For earlier works on DG methods, we refer to the survey paper [26], and other papers
in the same Springer volume. The lecture notes [38] is a good reference for many de-
tails, as well as the extensive review paper [39]. There are several special journal issues
devoted to the DG method [40, 41, 42, 43, 44], which contain many interesting papers
on DG method in all aspects including algorithm design, analysis, implementation and
applications. There are also a few books and lecture notes [45, 46, 47, 36, 48, 49] on DG
methods.
    DG methods have also grown to be broadly adopted for electromagnetic simulations
in the past two decades. They have been developed and analyzed for time dependent
linear models, including Maxwell’s equations in free space (e.g., [50, 51, 52]), dispersive
media (e.g., [53, 54, 55, 56]), as well as meta-materials (e.g., [57, 58, 59, 60]). However,
there exists only limited study for DG methods for nonlinear Maxwell models. For ex-
ample, in [61, 62], Kerr nonlinearity was investigated, where the entire Maxwell PDE-
ODE system was cast as a nonlinear hyperbolic conservation law, for which DG methods
have long been known for their success. A relaxed version of the Kerr model, called the
Kerr–Debye model, was examined in [63], where a second-order asymptotic-preserving
and positivity-preserving DG scheme is designed and analyzed; [64] also devised and
analyzed asymptotic-preserving and positivity-preserving methods for the Kerr-Debye-
Lorentz model.
1.2     Sparse Grid Discontinuous Galerkin (SGDG) Method
As discussed above, DG methods have overall good performance for different kinds of
problems like hyperbolic or elliptic equations, and they admit flexibility in choosing the
discretization meshes and the approximation spaces. However, the methods are often
considered too costly due to the large degrees of freedom of the approximation space,
especially for problems on high dimension, due to curse of dimensionality [65]. Such
drawback is more prominent when dealing with high-dimensional equations arising from
                                             3


real-world applications, such as kinetic simulations, stochastic analysis, and mathemati-
cal modeling in finance or statistics.
    The sparse grid techniques [66, 67] was introduced by Zenger [68], and later becomes
a major tool to break the curse of dimensionality of grid-based approaches. The idea re-
lies on a tensor product hierarchical basis representation, which can reduce the degrees of
freedom without compromising too much on accuracy. The fundamentals of sparse grid
techniques can further date back to Smolyak [69] for numerical integration, and they are
closely related to hyperbolic cross [70, 71], boolean method [72], discrete blending method
[73], and splitting extrapolation method [74]. The construction of the scheme is to seek a
proper truncation of the tensor product hierarchical basis, which can be formally derived
by solving an optimization problem of cost-benefit ratios [75]. Sparse grid techniques
have been incorporated in collocation methods for high-dimensional stochastic differen-
tial equations, Galerkin finite element methods, finite difference methods, finite volume
methods, and spectral methods for high-dimensional PDEs [3].
    Employing the similar sparse grid techniques as above, Sparse Grid Discontinuous
Galerkin (SGDG) Methods are a specific class of DG Methods. Instead of choosing piece-
wise polynomials for basis functions as in typical DG Methods, as introduced in [3, 4, 5],
SGDG uses basis functions derived from Alpert’s multiwavlet basis functions [76, 77]
and multiresolution analysis (MRA) [78]. In one dimension, SGDG Methods would be
an equivalent reformulation of traditional DG Methods; however for higher dimension
case, to break the curse of dimensionality, we use a subset of tensor product approxi-
mation space for DG approximation, based on the hierarchial structure of multiwavelet
basis functions. SGDG Methods efficiently reduce the number of degrees of freedom
(DoF) of the unknowns from O(h−d ) to O(h−1 | log2 h|d−1 ) for d-dimensional problems [3],
where h is the uniform mesh size in each dimension. Stability and conservation can
be maintained, while error is only slightly deteriorated for sufficiently smooth solutions
[4]. SGDG Methods have been applied to elliptic equations [3], transport equations [4],
                                               4


reaction-diffusion equations [79], Helmholtz equation [80], Vlasov–Maxwell equations
[81], radioactive transfer equation [82], etc.
    Success of SGDG Methods is encouraging, but several improvement comes afterwards.
The main bottleneck is that the previous method can only treat some kind of linear equa-
tions with given variable coefficients, or coefficients with specified dependence to un-
knowns. To compute of nonlinear terms in nonlinear equations, interpolatory multi-
wavelets for MRA quadrature and sparse grid collocation method [7, 83] were developed,
where new multiwavelets called interpolatory multiwavelets were associated to interpo-
lation on the hierarchial grid. Besides, since the Alpert’s and interpolatory multiwavelet
basis functions are global, it is essential to find efficient implementation to make sure
computational cost comparable to traditional DG Methods, so a fast algorithm regarding
matrix-vector multiplication based on [84, 85] was also introduced into SGDG method.
    In addition, if the exact solution is not smooth enough, the standard SGDG Methods
might not be a good choice, since it requires fine mesh to capture the non-smooth solution.
In order to overcome this, a family of adaptive SGDG Methods for linear transport equa-
tions were developed [5], by using fully tensorized basis functions, with hierarchial sur-
plus as refinement or coarsening indicator automatically capturing the local structures.
The adaptive method has also been applied to kinetic equations [5], hyperbolic conser-
vation laws [6], wave equations [83, 86], Hamilton-Jacobi equation [87], and nonlinear
Schrodinger equation [88].
1.3    Weighted Essentially Non-Oscillatory (WENO) Method
WENO (Weighted Essentially Non-Oscillatory) Methods, together with ENO (Essentially
Non-Oscillatory) Methods, are high order accurate finite difference or finite volume schemes
designed for solving hyperbolic and convection-diffusion equations with possibly dis-
continuous solutions or solutions with sharp gradient regions. The key idea of ENO and
WENO Methods is an nonlinear adaptive approximation procedure that achieves high
                                               5


order accuracy on smooth region of function, while resolving shock or other discontinu-
ities sharply without substantial oscillations. Because the main idea of the methods is not
necessarily related to PDEs, ENO or WENO Methods have several non-PDE applications,
too [89].
    High order finite difference and finite volume methods are based on interpolations of
discrete data, mostly by using algebraic polynomials. Approximation theory guarantees
that a wider interpolation stencil yields higher order accuracy for smooth function. Fixed
stencils are usually applied in these methods, and perform well in smooth problems; how-
ever, fixed stencil interpolation causes unavoidable oscillation called Gibbs phenomena
near discontinuity, which cannot be eliminated by mesh refinement, leading to numerical
instability in nonlinear problems involving discontinuities [90]. In applications such as
hyperbolic conservation laws, Hamilton-Jacobi Equations, or convection-diffusion equa-
tions, etc. the exact solution might contain discontinuities on either the solution itself
or its derivative, regardless of smoothness of initial or boundary conditions [89, 11], and
fixed stencil interpolation is a less appealing choice in these cases.
    ENO schemes were first introduced by Harten, Engquist, Osher and Chakravarthy in
1987 [91] in the finite volume framework, which in the first time successfully attempting
to obtain a self-similar, uniformly high order accurate, and essentially non-oscillatory in-
terpolation. In this paper, they used Newton divided differences to determine the local
smoothness of the function to be approximated, which indicates the relatively smooth
stencil to be chosen among candidate stencils for interpolation. Later, ENO methods in fi-
nite difference framework, together with TVD Runge-Kutta time integration, were shown
save significant amount of computational cost in higher dimension [92, 93].
    Based on ENO, WENO methods were developed using convex combination of all can-
didates stencils instead of only one stencil as in ENO. The crucial ingredient is choice of
the combination coefficients, also called nonlinear weights, to fulfill the following [89]: (1)
when the solution is sufficiently smooth, the nonlinear weight should be close to so-called
                                               6


"linear weight", which guarantees high order accuracy in combined stencil; (2) when some
candidate stencils contain discontinuity while others don’t, the stencils containing discon-
tinuities should be associated with smaller nonlinear weight.
    The first WENO Method [94] was the one dimension case for hyperbolic conserva-
tion laws, under finite volume framework, while [95] provided the multidimensional
version, and in addition, [96, 97] improved accuracy. [98] offered 1D finite volume for-
mulation based on a staggered grid and Lax-Friedrichs formulation. [99] developed the
multidimensional finite difference formulation with improved accuracy, and a general
framework in forming a (2k + 1)-order WENO approximation from a k-th order ENO
stencil was established. The fifth order WENO in [99] becomes the most commonly used
WENO method. Higher order ((2k+1)-th order, with k = 3, 4, 5, ...) WENO reconstruction
procedures were developed in [100]. WENO improves upon ENO in robustness, better
smoothness of fluxes, better steady state convergence, better provable convergence prop-
erties, and more efficiency. We refer to lecture notes and review papers [11, 101, 89, 12]
for further details.
    Among the applications of WENO methods, we consider Hamilton-Jacobi equations
                         ϕt + H(ϕx1 , ..., ϕxd ) = 0, ϕ(x, 0) = ϕ0 (x),
where H is usually nonlinear but at least Lipschitz continuous. It is widely known that
global C 1 solution does not exist for this equation, regardless of smoothness of initial
condition. This is a direct implication, at least for one dimension case, from the fact that
if we take u = ϕx , the Hamilton-Jacobi equation is equivalent to hyperbolic conservation
law of function u. Singularities of u are discontinuities, so u is bounded with bounded
variation, and so is derivative of ϕ. The idea of viscosity solution, a specific kind of weak
solution, is then introduced to theory of Hamilton-Jacobi equation, to describe the unique
physical relevant solution. More details are discussed in e.g. [102, 103, 104].
    Because of the relation between Hamilton-Jacobi equation and hyperbolic conserva-
tion law, ENO and WENO method applied in conservation law would be similarly ap-
                                                 7


plied to Hamilton-Jacobi equation. High order ENO schemes for solving Hamilton-Jacobi
equations were developed in [105] for the second order case and in [106] for the more gen-
eral cases, based on ENO schemes for solving conservation laws [91, 92, 93]. High order
WENO schemes for solving Hamilton-Jacobi equations were developed in [107], based
on WENO schemes for solving conservation laws [94, 99]. The framework of WENO
schemes for solving Hamilton-Jacobi equations is again similar to that of ENO schemes
described in the previous section. The key ingredient in designing a nonlinear weight, as
discussed in [107], is similar to that in [99] for conservation laws, namely the smoothness
indicator is a scaled sum of the squares of the L2 norms of the second and higher deriva-
tives of the interpolation polynomial on the target interval. We refer to the lecture notes
of Shu [12] for more details of ENO and WENO schemes.
    To couple with ENO and WENO spatially semi-discrete methods above, a popular
time discretization method to choose is the class of strong stability preserving (SSP),
which is also referred to as total variation diminishing (TVD) or high order Runge-Kutta
time discretizations; see [92, 108, 109, 110].
1.4    Organization of Thesis
In Chapter 2, we consider the Maxwell equations with nonlinear optical media, where
the nonlinearity comes from cubic Kerr effect and Raman scattering [111, 112]. The nu-
merical simulation of such equations is considered to be computationally intensive but
substantially more robust and physically relevant compared with Nonlinear Schrodinger
Equation (NLSE) models. A semi-discrete energy-stable DG method was applied to such
equations in one dimension [1] or higher dimension [2], while fully implicit method or
trapezoidal method was used for time discretization. We construct the method under
adaptive SGDG framework [5, 6, 83], with L2 indicator norm on each element for coars-
ening and refinement. Sparse grid collocation method with interpolatory multiwavelet
basis functions are required to deal with nonlinear problems, and its adaptive version
                                                8


reduces computational cost [7]. Fast wavelet transform is applied to transform between
point values or derivatives of numerical solution at interpolation points, and coefficients
of hierarchial multiwavelet basis functions; the fast transform computes results dimen-
sion by dimension after proper decomposition of one dimension matrix operator into
upper and lower triangular part, so computational cost is further reduced [6, 7]. A C++
package is developed by a group of colleagues, for general numerical simulation using
SGDG framework [8], incorporating the standard SGDG implementation [3] with Alpert’s
multiwavelet basis [76], Lagrangian and Hermite interpolatory multiwavelet [7] for non-
linear problems, adaptivity [5], fast matrix-vector multiplication [6], etc. Several time
integration method, including IMEX-RK [6] was implemented in the package, and data
structures, linear solvers, and multi-thread parallelism in Eigen package [113] of numer-
ical linear algebra is integrated to the SGDG package. In our work, we further apply
PETSc package [114] to implement the adaptive SGDG method. PETSc offers a whole set-
ting of advanced data structures, which are designed for multi-core parallel computing;
its rich library of nonlinear solvers [115], including a group of matrix-free method e.g.
Jacobian-free Newton–Krylov methods [116], is vital for general simulations for nonlin-
ear problems when Jacobian is not attainable. Several examples are made to test accuracy
and to show efficiency of the implementation of the presented algorithm.
    In Chapter 3, we present a simplified ordinary differential model of data flow on pro-
cessors in a high performance computing framework, which involves computations ne-
cessitating inter-processor communications. The asymptotic limit of this model treats the
computer as a continuum of processors and data flow as an Eulerian fluid governed by
a conservation law, which implies a Hamilton-Jacobi equation [103, 104] for which the
existence and uniqueness of solutions can be proven. We choose WENO method for nu-
merical simulation of the continuum Hamilton-Jacobi model; WENO interpolation uses
a convex combination of candidate stencils, and by assigning each stencil a nonlinear
weight based on local smoothness of numerical solution on the stencil, WENO interpola-
                                              9


tion has high order accuracy [11]. Comparing to other numerical methods like artificial
viscosity or TVD method, there is no problem-dependent parameter in WENO method,
and no accuracy degeneration occurs in smooth region of numerical solution [12]. Thus
numerical experiments for the continuum model are calculated using a spatially fifth or-
der WENO interpolation coupling with optimal third order SSP Runge-Kutta time inte-
gration [12, 11], and compared with results from discrete model. A qualitative agreement
is shown between the simulations on discrete and continuum models, and we investi-
gate the effect of variations in the computing environment’s processing capabilities on
the progress of the modeled computation. The major contents of this chapter was already
published in [10].
                                            10


                                           CHAPTER 2
        APPLICATION OF SGDG METHOD TO MAXWELL’S EQUATION IN
                                   NONLINEAR OPTICS
2.1    Maxwell’s Equation in Nonlinear Optics
The propagation of electromagnetic waves in general media is modeled by the time-
dependent Maxwell’s partial differential equations (PDEs), coupled with constitutive laws
that describe the response of the media. Particularly, in nonlinear media, the material
response depends nonlinearly on the optical field, and many interesting physical phe-
nomena, such as frequency mixing and second/third-harmonic generation have been ob-
served and harnessed for practical applications. We refer to classical textbooks [117, 118,
119] for a more detailed review of the field of nonlinear optics.
    When Maxwell’s equations are considered to model the electromagnetic waves propa-
gating through a nonlinear optical medium, the medium response is described by consti-
tutive laws that relate the electric field E and the electric flux density D through the polar-
ization P of the medium. Here we focus on a macroscopic phenomenological description
of the polarization, which comprises both linear and nonlinear responses. Specifically,
the linear response is modeled by a single resonance Lorentz dispersion, while the non-
linear response is cubic and incorporates the instantaneous Kerr effect and the delayed
nonlinear Lorentz dispersion called Raman scattering. Within this description, we will
follow the auxiliary differential equation (ADE) approach, where the linear and nonlin-
ear Lorentz dispersion is represented through a set of ODEs, describing the time evolution
of P (hence of D) forced by E, appended to Maxwell’s equations. An alternative repre-
sentation is via a recursive convolution method, where D is computed from E through a
time convolution integral [120, 1].
    We begin with the Maxwell’s equations, which govern the time evolution of the elec-
                                               11


tric field E and magnetic field H in a non-magnetic nonlinear optical medium, on time
domain (0, T ) and spatial domain Ω:
                                           ∂t B + ∇ × E = 0                               (2.1)
                                     ∂t D + Js − ∇ × H = 0                                (2.2)
                                                   ∇·B = 0                                (2.3)
                                                  ∇·D = ρ                                 (2.4)
along with initial and boundary data in the domain Ω ⊂ Rd , d = 1, 2, 3. The variable Js is
the source current density, and ρ is the charge density. The electric flux density D and the
magnetic induction B are related to the electric and magnetic field, respectively, via the
constitutive laws
                                        D = ϵ0 (ϵ∞ E + P)                                 (2.5)
                                        B = µ0 H                                          (2.6)
where P is the polarization. The dielectric parameter is ϵ0 , the electric permittivity of free
space, ϵ∞ , the relative electric permittivity in the limit of the infinite frequency, and µ0 ,
the magnetic permeability of free space. We will assume here that all model parameters
are constant, and the material is isotropic.
    To model the linear and nonlinear dispersion in the material we use the auxiliary dif-
ferential equation (ADE) approach as presented in [121, 120]. A thorough discussion of
the modeling of Raman and Kerr effects in optical (silica) fibers can be found in [111].
The linear (L) delayed or retarded response of the material to the electromagnetic field is
captured in the polarization, P, via a linear single resonance Lorentz response, which, in
the form of a second order ODE, is provided as,
                           ∂ 2 PLDelay 1 ∂PLDelay
                                  2
                                       +            + ω02 PLDelay = ωp2 E,
                               ∂t         τ ∂t
which could be split into first order system
                        ∂PLDelay           ∂J       1
                                    = J,       = − J − ω02 PLDelay + ωp2 E.
                           ∂t              ∂t      τ
                                                 12


Here ω0 and ωp are the resonance and plasma frequencies of the medium, respectively, and
τ −1 is a damping constant. In addition, ωp2 = (ϵs −ϵ0 )ω02 , with ϵs as the relative permittivity
at zero frequency.
    For pulse widths that are sufficiently short (for e.g., shorter than 1 pico-second (ps) for
Silica) [122], the nonlinear response has an instantaneous as well as a delayed component.
For the nonlinear (NL) response of the medium, we will consider a cubic Kerr-type instan-
taneous response, and a retarded Raman molecular vibrational response called Raman
scattering. The Kerr effect is a phenomenon in which the refractive index of a material
changes proportionally to the square of the applied electric field. Raman scattering arises
from the electric field induced changes in the internal nuclear vibrations on time scales
about 1 to 100 femto-seconds (fs) [112], and is modeled by a nonlinear single resonance
Lorentz delayed response. The two nonlinear responses are given as
                                     PN L
                                      Kerr = a(1 − θ)E|E| ,
                                                             2
and
                                        PN  L
                                           Delay = aθQE,
while the total nonlinear response is PN L = PN     L      NL
                                                   Kerr +PDelay . Here a is a third order coupling
constant, θ parameterizes the relative strength of the instantaneous electronic Kerr and
retarded Raman molecular vibrational responses, and Q describes the natural molecular
vibrations within the dielectric material that has frequency many orders of magnitude
less than the optical wave frequency, responding to the field intensity. The time evolution
of Q is given by the following ODE,
                                ∂ 2Q     1 ∂Q
                                    2
                                      +          + ωv2 Q = ωv2 |E|2 ,
                                 ∂t      τv ∂t
where ωv is the resonance frequency of the vibration, and τv−1 a damping constant. This
is essentially a model for a simple linear oscillator, but coupled to the nonlinear field
intensity |E|2 .
                                                 13


    At the end we obtain the following system, on (0, T ) × Ω,
                                µ0 ∂t H = −∇ × E                                                 (2.7a)
                                   ∂t D = ∇ × H                                                  (2.7b)
                                   ∂t P = J                                                       (2.7c)
                                                  1
                                    ∂t J = − J − ω02 P + ωp2 E                                   (2.7d)
                                                  τ
                                   ∂t Q = σ                                                      (2.7e)
                                                   1
                                    ∂t σ = −          σ − ωv2 Q + ωv2 |E|2                        (2.7f)
                                                  τv
with constitutive law
                          D = ϵ0 (ϵ∞ E + P + a(1 − θ)|E|2 E + aθQE).                               (2.8)
Note that here P is essentially PLDelay , the linear delayed response. As demonstrated in
[1, 2], under the assumption of periodic boundary conditions, the energy of the system is
                     µ0          ϵ0 ϵ∞ 2          ϵ0           ϵ0 ω02 2 ϵ0 aθ 2 ϵ0 aθ
                  Z
         E(t) =         |H|2 +          |E| + 2 |J|2 +             2
                                                                      |P| +      2
                                                                                   σ +   Q|E|2
                   Ω  2             2            2ω  p         2ω  p          4ω v     2
                    3ϵ0 a(1 − θ) 4 ϵ0 aθ 2
                  +               |E| +          Q dΩ,                                             (2.9)
                          4                   4
satisfying
                          d           ϵ0                   ϵ0 aθ
                                           Z                       Z
                                                  2
                             E =− 2            |J| dx − 2              σ 2 dx ≤ 0.               (2.10)
                         dt          ωp τ    Ω            2ωv τv     Ω
Besides, E ≥ 0 if θ ∈ [0, 3/4].
    The one dimensional case is as following:
                        µ0 ∂t H = ∂x E                                                         (2.11a)
                           ∂t D = ∂x H                                                         (2.11b)
                           ∂t P = J                                                             (2.11c)
                                          1
                           ∂t J = − J − ω02 P + ωp2 E                                          (2.11d)
                                          τ
                           ∂t Q = σ                                                            (2.11e)
                                          1
                            ∂t σ = −         σ − ωv2 Q + ωv2 E 2                                (2.11f)
                                          τv
                              D = ϵ0 (ϵ∞ E + P + a(1 − θ)E 3 + aθQE)                           (2.11g)
                                                     14


Note that here we assume uniformity of all the vector fields in the y and z directions.
Thus, all derivatives with respect to y and z in the curl and divergence operators are set
to zero. All field quantities are represented by a single scalar component. The scalar mag-
netic field H (hence B) represents the 2nd (or the 3rd) component of the vector magnetic
field H, and the scalar electric flux density D (hence E) represents the 3rd (or the 2nd)
component of D (hence E). Gauss’s laws (2.3) (2.4) only involve the x derivatives of the
first components of B and D, and therefore they are decoupled from the one-dimensional
model and become irrelevant. Numerical simulation later depends on dimensionless ver-
sion [1]
                           ∂t H = ∂x E                                               (2.12a)
                           ∂t D = ∂x H                                               (2.12b)
                           ∂t P = J                                                   (2.12c)
                                       1
                           ∂t J = − J − ω02 P + ωp2 E                                (2.12d)
                                       τ
                           ∂t Q = σ                                                  (2.12e)
                                       1
                            ∂t σ = −      σ − ωv2 Q + ωv2 E 2                         (2.12f)
                                       τv
                              D = ϵ∞ E + P + a(1 − θ)E 3 + aθQE                      (2.12g)
2.2     Formulation of Energy Stable Adaptive SGDG Method in One Di-
        mension
In this section, we describe the energy-stable adaptive SGDG method in one dimension.
We start from general formulation of SGDG method, and how it is applied to the nonlin-
ear Maxwell equation as we specified above. Adaptive scheme based on SGDG formula-
tion will also be discussed in the end of the section.
                                               15


2.2.1     SGDG Method and Alpert’s Multiwavelets
We first define the one-dimension spatial domain and its SGDG basis functions [3]. Con-
sider Ω = [0, 1] as our spatial domain, and define a set of nested grids, where the n-th
level grid Ωn consists of 2n uniform cells
                           Inj = (2−n j, 2−n (j + 1)], j = 0, 1, . . . , 2n − 1
for any n ≥ 0. For notation convenience, we also denote I−1 = [0, 1]. Define
                           Vnk = {v : v ∈ P k (Inj ), j = 0, 1, . . . , 2n − 1},
the piecewise polynomial space, of degree at most k, on n-th level grid Ωn , and we have
the nested structure
                                       V0k ⊂ V1k ⊂ V2k ⊂ . . . .
Define Wnk the orthogonal complement of Vn−1        k
                                                        in Vnk , with respect to L2 inner product in
[0, 1], i.e.
                                      M
                                   k
                                Vn−1       Wnk = Vnk ,      k
                                                          Vn−1    ⊥ Wnk .
Again for notation convenience, let W0k = V0k ; then for any natural number N ,
                                                   M
                                          VNk =          Wnk .
                                                  0≤n≤N
     As discussed in [76], we define a set of orthonormal basis functions for each Wnk ,
n = 0, . . . , N and they form a basis of VNk . The case of grid level n = 0 is trivial; the
normalized shifted Legendre polynomials in [0, 1] will be proper choice for such purpose.
                                                0
Denote these Legendre polynomials by vi,0          for i = 0, . . . , k. When n > 0, we consider the
orthonormal basis of W1k ; basis of general Wnk , n ≥ 1 will turn out to be a scaling version
of basis of W1k . In particular, orthonormal basis of W1k is defined as
                                        √
                               hi (x) =   2fi (2x − 1), i = 0, 1, . . . , k,
where for fixed integer k ≥ 0, f0 , f1 , . . . , fk are a sequence of functions supported on
[−1, 1], satisfying
                                                   16


   1. fi on (0, 1) is a polynomial with degree k;
   2. each fi extends evenly or oddly to (−1, 0), as in the following formula
                                               fi (x) = (−1)i+k+1 fi (−x),                        (2.13)
      for any x ∈ (−1, 0);
   3. f0 , f1 , . . . , fk are orthonormal i.e.
                                                 Z  1
                                                      fi (x)fj (x)dx = δij
                                                   −1
      for all 0 ≤ i, j ≤ k, where δij equals to 1 if i = j and 0 otherwise;
   4. fj has vanishing moments
                                                    Z 1
                                                         fj (x)xi dx = 0,
                                                     −1
      with i = 0, ..., j + k, for each 0 ≤ j ≤ k.
It was shown [76] that we can compute all fi through a Gram-Schmidt procedure. The
particular form of fi up to k = 4 are provided in Table 1 of [76], and for completeness we
offer these results here. For simplicity, we only provide definition of fi on (0, 1), and its
values on (−1, 0) are obtained by proper extension as described in equation (2.13).
   • k=0                                                          r
                                                                    1
                                                       f0 (x) =       ;
                                                                    2
   • k=1                                    r                           r
                                               3                           1
                                  f0 (x) =       (−1 + 2x), f1 (x) =         (−2 + 3x);
                                               2                           2
   • k=2                           r                                       r
                                 1    1                    2              1 3
                        f0 (x) =        (1 − 24x + 30x ), f1 (x) =             (3 − 16x + 15x2 ),
                                 3    2                                   2 2
                                                        r
                                                      1 5
                                           f2 (x) =          (4 − 15x + 12x2 );
                                                      3 2
                                                          17


    • k = 3:
                     r                                                   r
                        15                                                     1
          f0 (x) =         (1 + 4x − 30x2 + 28x3 ),            f1 (x) =          (−4 + 105x − 300x2 + 210x3 ),
                        34                                                    42
                     r                                                         r
                   1    35                                                   1 5
        f2 (x) =           (−5+48x−105x2 +64x3 ),               f3 (x) =             (−16+105x−192x2 +105x3 );
                   2    34                                                   2 42
    • k = 4:
                                     r
                                          1
                        f0 (x) =              (1 + 30x + 210x2 − 840x3 + 630x4 ),
                                         186
                                       r
                                     1 1
                        f1 (x) =               (−5 − 144x + 1155x2 − 2240x3 + 1260x4 ),
                                     2 38
                                     r
                                          35
                        f2 (x) =                 (22 − 735x + 3504x2 − 5460x3 + 2700x4 ),
                                         14694
                                       r
                                     1 21
                        f3 (x) =               (35 − 512x + 1890x2 − 2560x3 + 1155x4 ),
                                     8 38
                                       r
                                     1      7
                        f4 (x) =                (32 − 315x + 960x2 − 1155x3 + 480x4 ).
                                     2 158
These multiwavelet functions retain orthonormal properties of wavelet bases for different
hierarchical levels. More precisely, Wnk with n ≥ 1 has basis
    j
   vi,n (x) = 2(n−1)/2 hi (2n−1 x − j) = 2n/2 fi (2n x − 2j − 1),               i = 0, ..., k, j = 0, ..., 2n−1 − 1.
         j                         j
Each vi,n    is supported on In−1     , with discontinuity on x = 2−n w, w = 2j, 2j + 1, 2j + 2. Note
         j          n−1 −1 S
that {vi,n  }0≤j≤2
              0≤i≤k,n≥1
                               0
                             {vi,0 }0≤i≤k is an orthonormal set in L2 ([0, 1]), i.e.
                                     Z 1
                                            j         ′
                                          vi,n (x)vij′ ,n′ (x)dx = δii′ δjj ′ δnn′ ,
                                       0
            j         n−1 −1
               }0≤j≤2
                               S 0
while {vi,n      0≤i≤k, 1≤n≤N    {vi,0 }0≤i≤k is an orthonormal basis of VNk .
2.2.2    Semi-Discrete DG Method for the Maxwell’s Equation
To formulate semi-discrete DG method [1, 27] for system (2.11), we need some more no-
tations. Recall that we use N to represent the finest level of nested grid. Define
                                                    i
                                      xi+1/2 =          ,    i = 0, 1, . . . , 2N ,
                                                   2N
                                                            18


the discrete points on Ω = [0, 1] containing discontinuities for functions in VNk . As defined
above, INj = (xj−1/2 , xj+1/2 ] for j = 1, 2, . . . , 2N ; h = 2−N is mesh size. We also denote
v + , v − the right and left limit on discontinuities, and [v] = v + − v − the jump, {v} =
(v + + v − )/2 the average. They can be evaluated at each cell boundary xj±1/2 , denoted
by e.g (v + )j±1/2 . Then on each cell INj , we can formulate the standard semi-discrete DG
method of system (2.11) following [27, 1]: find functions Hh (t, ·), Dh (t, ·), Eh (t, ·), Ph (t, ·),
Jh (t, ·), Qh (t, ·), σh (t, ·), on space VNk , such that for each j = 0, 1, . . . , 2N − 1 and each cell
INj ,
             Z                  Z
        µ0        ∂t Hh ϕdx +         Eh ∂x ϕdx − (E  ch ϕ− )j+1/2 + (E
                                                                      ch ϕ+ )j−1/2 = 0, ∀ϕ ∈ V k  (2.14a)
               j                    j
                                                                                               N
              IN                  IN
         Z                  Z
               ∂t Dh ϕdx +        Hh ∂x ϕdx − (H   fh ϕ− )j+1/2 + (Hfh ϕ+ )j−1/2 = 0, ∀ϕ ∈ VNk    (2.14b)
            j                  j
           IN                 IN
        ∂ t Ph = J h                                                                               (2.14c)
                      1
        ∂t Jh = − Jh − ω02 Ph + ωp2 Eh                                                            (2.14d)
                      τ
        ∂t Qh = σh                                                                                (2.14e)
                                                                
                                         1
         Z                     Z
               ∂t σh ϕdx = −                σh − ωv2 Qh + ωv2 Eh2 ϕdx, ∀ϕ ∈ VNk                    (2.14f)
            j
           IN                    INj     τv
with constitutive law
            Z                Z
                  Dh ϕdx =          ϵ0 (ϵ∞ Eh + Ph + a(1 − θ)Eh3 + aθQh Eh )ϕdx, ∀ϕ ∈ VNk .         (2.15)
               j                j
              IN               IN
Both terms E       ch and Hfh are numerical fluxes. In our numerical experiments, we pick one
of the following as our numerical flux [1]:
central flux
                                                Ech = {Eh },     Hfh = {Hh };                       (2.16)
alternating flux either alternating flux I
                                                     ch = E + , H
                                                     E           fh = H −                         (2.17a)
                                                             h          h
         or alternating flux II
                                                     ch = E − , H
                                                     E           fh = H +                         (2.17b)
                                                             h          h
                                                           19


dissipative flux, or upwind flux, inspired by the upwind flux for the Maxwell system
      without Kerr, linear Lorentz and Raman effects
                                           r
                                         1    µ0                               1 ϵ0 ϵ∞
                                                                                 r
                    Eh = {Eh } +
                    c                             [Hh ], Hh = {Hh } +
                                                            f                            [Eh ].  (2.18)
                                         2 ϵ0 ϵ∞                               2     µ0
    Recall that under the assumption of periodic boundary condition, energy E = E(t) of
system (2.11) is
                          µ0 2 ϵ0 ϵ∞ 2            ϵ0         ϵ0 ω02 2
                    Z
             E =            H +            E + 2 J2 +              P                            (2.19a)
                      Ω   2            2         2ωp          2ωp2
                                                                                           !
                                 ϵ0 aθ 2 ϵ0 aθ 2 ϵ0 aθ                    3ϵ0 a(1 − θ) 4
                               +     2
                                        σ +        Q +          QE 2 +                  E dx    (2.19b)
                                  4ωv          4            2                    4
and its derivative
                           d            ϵ0               ϵ0 aθ
                                            Z                    Z
                                                2
                              E =− 2           J dx − 2              σ 2 dx ≤ 0,                 (2.20)
                           dt         ωp τ   Ω          2ωv τv     Ω
indicates that energy E is non-increasing. Besides, energy E is non-negative if θ ∈ [0, 3/4].
Similar results could be obtained for semi-discrete DG method:
Theorem 2.2.1 (Semi-Discrete Stability [1]) Under periodic boundary condition and the three
fluxes (2.16)(2.17)(2.18) above, the semi-discrete numerical method (2.14) and the discrete energy
             d
Eh satisfies Eh ≤ 0. In addition, Eh ≥ 0 if θ ∈ [0, 3/4]. The discrete energy Eh is
             dt
                         µ0 2 ϵ0 ϵ∞ 2             ϵ0         ϵ0 ω02 2
                    Z
            Eh =            Hh +           Eh + 2 Jh2 +            P                            (2.21a)
                     Ω    2            2         2ωp         2ωp2 h
                                                                                            !
                                 ϵ0 aθ 2 ϵ0 aθ 2 ϵ0 aθ                      3ϵ0 a(1 − θ)
                               + 2 σh +            Qh +         Qh Eh2 +                 Eh4 dx (2.21b)
                                 4ωv           4            2                     4
With proper assumption of regularity of exact solution, using argument of certain L2 pro-
jection or Gauss-Ladau projection, it was proved that the semi-discrete method converges
to exact solution, and optimal convergence rate is obtained for alternating and upwind
flux.
Theorem 2.2.2 (Error estimates of semi-discrete method [1]) Under periodic boundary con-
dition, assume the following regularity of exact solution of system (2.11)
                            E, H, P, Q, J, σ ∈ W 1,∞ ([0, T ]; H k+1 (Ω)),
                                                   20


                    E ∈ W 1,∞ ([0, T ]; W 1,∞ (Ω)), Q ∈ W 1,∞ ([0, T ]; L∞ (Ω)),
with sufficiently small a and θ, for numerical scheme (2.14), we have
                            ||u − uh || ≤ Chr ,   u = E, H, P, Q, J, σ,                     (2.22)
where r = k for central flux (2.16), and r = k + 1 for alternating (2.17) or upwind flux (2.18).
2.2.3  Energy-Stable DG Method of the Maxwell Equation
On temporal discretization, a main focus is provable energy stability of fully discrete
method. This turns out to be a nontrivial task for the nonlinear Maxwell equation. Com-
mon choices, such as the second order leap-frog or implicit trapezoidal method, may not
yield provable stability results as for the linear models [1], while the main difficulties arise
from the nonlinear Kerr and Raman terms. Here we introduce two fully discrete meth-
ods: one can be understood as novel modifications of leap-frog or implicit trapezoidal
method, and the other is a fully implicit method. These temporal discretizations are of
formal second order accuracy.
    The leap-frog style method [1] is as following: given functions unh ∈ VNk at time t = tn ,
with u = H, D, E, P, J, Q, σ, find un+1h  ∈ VNk of all u at tn+1 = tn + ∆t, so that for any j and
                                                 21


each cell INj , and any ϕ ∈ VNk ,
             n+1/2
           Hh      − Hhn
     Z                                  Z
  µ0                        ϕdx +             Ehn ∂x ϕdx − (E  cn ϕ− )j+1/2 + (E  cn ϕ+ )j−1/2 = 0           (2.23a)
                                                                 h                 h
        j
       IN      ∆t/2                      INj
        Dhn+1 − Dhn
  Z                             Z
                                          n+1/2                 ^n+1/2 −                ^n+1/2 +
                     ϕdx +             Hh        ∂x ϕdx − (Hh           ϕ )j+1/2 + (Hh         ϕ )j−1/2 = 0 (2.23b)
     j
    IN      ∆t                    INj
  Z                   Z
          n+1
       Dh ϕdx =              ϵ0 (ϵ∞ Ehn+1 + Phn+1 + a(1 − θ)Yhn+1 + aθQn+1            h Eh
                                                                                            n+1
                                                                                                )ϕdx          (2.23c)
     j                   j
    IN                  IN
                                                                                  
                                         3
  Z                  Z
       Yhn+1 ϕdx   =              n             n+1 2          n 2
                               Yh + [(Eh ) + (Eh ) ](Eh − Eh ) ϕdx     n+1      n
                                                                                                             (2.23d)
     j
    IN                   j
                        IN               2
  Phn+1   − Phn    Jhn+1 +      Jhn
                =                                                                                            (2.23e)
       ∆t                  2
  Jhn+1 − Jhn        1 Jhn+1 + Jhn               P n+1 + Phn            E n+1 + Ehn
                =−                        − ω02 h                + ωp2 h                                      (2.23f)
       ∆t            τ         2                         2                   2
    n+1       n      n+1         n
  Qh − Qh          σ        + σh
                = h                                                                                          (2.23g)
  Z ∆tn+1                  2
        σh − σhn                             1 σhn+1 + σhn             n+1
                                                                            + Qnh
                                  Z                                                              
                                                                   2 Qh                 2 n n+1
                   ϕdx = −                                   − ωv                 + ωv Eh Eh        ϕdx      (2.23h)
     j
    IN      ∆t                        j
                                     IN      τv       2                    2
                       n+1/2
           Hhn+1 − Hh
     Z                                     Z
                                                                     [ n+1 −              [
  µ0                            ϕdx +           Ehn+1 ∂x ϕdx − (E    [
                                                                       h   ϕ )j+1/2 + (E  [ n+1 +
                                                                                            h   ϕ )j−1/2 = 0  (2.23i)
        j
       IN       ∆t/2                          j
                                             IN
                               [ n+1 is similar to E       cn , or the same as E     n+1 , for central and alternat-
Be aware that the flux E       [                                                   [
ing fluxes; for upwind flux, we have
                                                                   r
                                       [  n+1 = {E n+1 } +
                                                                 1      µ0     n+1/2
                                      E[
                                                       h                    [Hh      ].
                                                                 2    ϵ0 ϵ∞
    The fully implicit method [1] is as following, with the same set-up as the leap-frog
                                                              22


method above:
            Hhn+1 − Hhn                      Ehn+1 + Ehn
      Z                               Z
   µ0                     ϕdx +                            ∂x ϕdx
         j
        IN       ∆t                      j
                                        IN          2
                                                Ehn+1\   + Ehn −              E n+1
                                                                                 \  + Ehn +
                                            −(                  ϕ )j+1/2 + ( h            ϕ )j−1/2 = 0   (2.24a)
                                                       2                           2
         Dhn+1 − Dhn                     Hhn+1 + Hhn
   Z                             Z
                       ϕdx +                             ∂x ϕdx
      j
     IN       ∆t                   INj          2
                                                Hhn+1^   + Hhn −               H n+1
                                                                                  ^  + Hhn +
                                            −(                  ϕ )j+1/2 + ( h             ϕ )j−1/2 = 0  (2.24b)
   Z                    Z                               2                           2
        Dhn+1 ϕdx =           ϵ0 (ϵ∞ Ehn+1 + Phn+1 + a(1 − θ)Yhn+1 + aθQn+1          h Eh
                                                                                          n+1
                                                                                              )ϕdx        (2.24c)
      j                   j
     IN                  IN
                                                                                 
                                           3
   Z                   Z
                                                 n+1 2
        Yhn+1 ϕdx  =               n                            n 2
                                Yh + [(Eh ) + (Eh ) ](Eh − Eh ) ϕdx    n+1      n
                                                                                                         (2.24d)
      j
     IN                   j
                         IN                2
   Phn+1   − Phn     Jhn+1 +     Jhn
                 =                                                                                       (2.24e)
        ∆t                  2
     n+1                   n+1
   Jh − Jh    n
                       1J         + Jhn           P n+1 + Phn          E n+1 + Ehn
                 =− h                      − ω02 h                + ωp2 h                                 (2.24f)
        ∆t             τ        2                        2                   2
                       n+1
   Qn+1    − Q n
               h     σ       +  σ n
                                  h
     h
                 = h                                                                                     (2.24g)
        ∆t                  2
         σhn+1 − σhn                          1 σhn+1 + σhn            n+1
                                                                            + Qnh
                                   Z                                                           
                                                                    2 Qh
   Z
                                                                                      2 n n+1
                     ϕdx = −                                   − ωv               + ωv Eh Eh      ϕdx    (2.24h)
      j
     IN      ∆t                        j
                                      IN     τv        2                   2
We can establish the energy stability for the resulting fully discrete methods as following.
Theorem 2.2.3 (Fully Discrete Stability [1]) Under periodic boundary condition and any of
the three numerical fluxes, the discrete energy is non-increasing, i.e. Ehn+1 ≤ Ehn . In addition,
Ehn ≥ 0, if θ ∈ [0, 3/4] and CFL condition ∆t ≤ Ch are satisfied. Here C is a constant depending
on constants µ0 , ϵ0 , ϵ∞ of the Maxwell equation, polynomial degree k, and choice of numerical
flux. The discrete energy Eh at t = tn is
                          µ0 n 2 ϵ0 ϵ∞ 2                     ϵ0
                  Z
         Ehn   =               (Hh ) +              Eh + 2 Jh2
                     Ω     2                   2           2ωp
                                                                                                      !
                             ϵ0 ω02 2 ϵ0 aθ 2 ϵ0 aθ 2 ϵ0 aθ                           3ϵ0 a(1 − θ)
                          + 2 Ph +                2
                                                     σh +        Qh +        Qh Eh2 +              Eh4 dx (2.25)
                              2ωp             4ωv            4           2                  4
                                                              23


for fully implicit method, and
                       µ0 n+1/2 n−1/2 ϵ0 ϵ∞ 2                ϵ0
                  Z
         Ehn  =           Hh      Hh      +          Eh + 2 Jh2
                   Ω    2                        2          2ωp
                                                                                               !
                         ϵ0 ω02 2   ϵ0 aθ 2 ϵ0 aθ 2 ϵ0 aθ                       3ϵ0 a(1 − θ) 4
                       + 2 Ph     +     2
                                          σh +         Qh +          Qh Eh2 +               Eh dx (2.26)
                          2ωp       4ωv            4               2                  4
for leap-frog method.
2.2.4   Interpolatory Multiwavelets
In addition to Alpert’s basis for SGDG space VNk , we need interpolatory multiwavelets to
obtain a second set of basis, to deal with nonlinear terms, e.g. f (uh ) when uh ∈ VNk and f
is nonlinear [7, 5, 6].
     Alpert’s multiwavelets and the space Wnk are constructed, corresponding to the dif-
ference of the L2 projection on adjacent levels. Instead, the sparse grid collocation basis
proposed in [7] corresponds to interpolation on nested grids. Denote the set of interpola-
tion points in the interval I = [0, 1] at mesh level 0 by X0 = {xi }Pi=0 ⊂ I, where the number
of interpolation points in X0 is P + 1. Then the interpolation points at mesh level n ≥ 1,
Xn , can be obtained in the following manner:
                    Xn = {xji,n := 2−n (xi + j), i = 0, . . . , P, j = 0, . . . , 2n − 1}.
In order to save computational cost, the points on different level of Xn should be nested,
i.e.,
                                    X0 ⊂ X1 ⊂ X2 ⊂ X3 ⊂ · · ·,
which can be achieved by requiring X0 ⊂ X1 .
     Given the interpolation points, we define the basis functions on the 0-th level grid as
Lagrange (K = 0) or Hermite (K ≥ 1) interpolation polynomials of degree less than or
equal to M := (P + 1)(K + 1) − 1 which satisfy the property
                                           (l′ )
                                          ϕi,l (xi′ ) = δii′ δll′ .
                                                    24


for i, i′ = 0, ..., P and l, l′ = 0, ..., K. It is easy to see that span{ϕi,l : i = 0, . . . , P, l =
0, . . . , K} = V0M . The constants P, K, M will be specified later, depending on the Alpert’s
multiwavelets that the interpolatory multiwavelets relate to and the problem to solve.
With the basis function at mesh level 0, we can define basis function at mesh level n ≥ 1:
             ϕji,l,n := 2−nl ϕi,l (2n x − j),         i = 0, . . . , P, l = 0, . . . , K, j = 0, . . . , 2n − 1,
which is a complete basis set for VnM .
     Now we introduce the hierarchical representations based on the nested structure of
interpolation points. Define X̃0 := X0 and X̃n := Xn \ Xn−1 for n ≥ 1; then we have the
decomposition
                                                                  [n
                                                          Xn =         X̃i .
                                                                  i=0
Denote the points in X̃1 by X̃1 =                  {x̃i }Pi=0 . Then the points in X̃n for n ≥ 1 can be repre-
sented by
                    X̃n = {x̃ji,n := 2−(n−1) (x̃i + j) : i = 0, . . . , P, j = 0, . . . , 2n − 1}.
For notational convenience, we let W                 f0M = V0M . The increment function space W                  fnM for
n ≥ 1 is introduced as a function space that satisfies
                                                                    M
                                                   VnM = Vn−1   M         fM ,
                                                                          W  n
and is defined through the multiwavelets ψi,l ∈ V1M that satisfy
                                           (l′ )                   (l′ )
                                        ψi,l (xi′ ) = 0,         ψi,l (x̃i′ ) = δii′ δll′
for i, i′ = 0, ..., P and l, l′ = 0, ..., K. Here the superscript l′ denotes the l′ -th order deriva-
tive. Then W   f M is given by
                   n
                   fnM = span{ψ j ,
                   W                              i = 0, ..., P, l = 0, ..., K, j = 0, ..., 2n−1 − 1},
                                     i,l,n
where
                                         j
                                       ψi,l,n    (x) := 2−(n−1)l ψi,l (2n−1 x − j).
                                                                25


    For completeness, we list the basis functions used in this thesis in the following, and
we focus on Hermite interpolation here [6]; for more possible choices of basis, see [6, 7].
The basis functions in W     f1 are piecewise polynomials on Il = (0, 1/2) and Ir = (1/2, 1).
Note that the functions may be discontinuous at the interface x = 1/2; thus Il and Ir are
both defined to be open intervals. The basis functions in W             f1 here are all supported on
one half interval Il or Ir and vanish on the other half. For simplicity, we will only declare
the function on its support. For example, ψ0 (x)|Ir gives the definition of ψ0 on Ir while
vanishing on Il . The interpolation points are put at the cell interface
                                                             +  −
                                e0 = {0+ , 1− },              1       1
                               X                  X1 = {
                                                   e               ,       }
                                                              2       2
Here and below, we use superscripts +, − to emphasize the left and right limits of a func-
tion at that point, which is a feature of the discontinuous piecewise polynomial space.
The basis functions in W    f03 and W f13 are, when P = 1, K = 1,
                         ϕ0,0 (x) = (x − 1)2 (2x + 1), ϕ1,0 (x) = −x2 (2x − 3),
                               ϕ0,1 (x) = x(x − 1)2 , ϕ1,1 (x) = x2 (x − 1),
and
                     ψ0,0 (x)|Il = −4x2 (4x − 3), ψ1,0 (x)|Ir = 4(x − 1)2 (4x − 1),
                      ψ0,1 (x)|Il = 2x2 (2x − 1), ψ1,1 (x)|Ir = 2(x − 1)2 (2x − 1).
When P = 1, K = 2, the basis functions in W        f 5 and W  f 5 are
                                                      0         1
               ϕ0,0 (x) = −(x − 1)3 (6x2 + 3x + 1), ϕ0,1 (x) = −x(x − 1)3 (3x + 1),
                                     1
                      ϕ0,2 (x) = − x2 (x − 1)3 , ϕ1,0 (x) = x3 (6x2 − 15x + 10),
                                     2
                                                                      1
                       ϕ1,1 (x) = −x3 (x − 1)(3x − 4), ϕ1,2 (x) = x3 (x − 1)2 ,
                                                                      2
and
         ψ0,0 (x)|Il = 16x3 (12x2 − 15x + 5), ψ1,0 (x)|Ir = −16(x − 1)3 (12x2 − 9x + 2),
                                                   26


         ψ0,1 (x)|Il = −8x3 (2x − 1)(3x − 2), ψ1,1 (x)|Ir = −8(x − 1)3 (2x − 1)(3x − 1),
                      ψ0,2 (x)|Il = x3 (2x − 1)2 , ψ1,2 (x)|Ir = −(x − 1)3 (2x − 1)2 .
    The construction above has close connection with interpolation operators. For a given
function f (x) ∈ C K+1 (I), we define INP,K [f ] as the standard Hermite interpolation on VNM ,
and we have the representation
                                                        n−1
                                             X N max(2X−1,0) X    K X P
                           INP,K [f ](x)  =                              bji,l,n ψi,l,n
                                                                                  j
                                                                                        (x).
                                             n=0       j=0       l=0 i=0
Clearly, (INP,K − INP,K                     fM
                       −1 )[f ](x) is in WN . The algorithm converting between the point values
and the derivatives {f (l) (xji,n )} to hierarchical coefficients {bji,l,n } is given in [7], and by a
standard argument in fast wavelet transform, can be performed in O(M 2n ) flops.
2.2.5   SGDG Method with Multiresolution Interpolation in One Dimension
As discussed in section (2.2.3), we derive the energy stable DG method. To corporate it
with multiresolution interpolation [6], we need to decide how to numerically calculate
terms
                                      Z   1
                                            f (Uhn+1 , Uhn )ϕdx,    ∀ϕ ∈ VNk ,                   (2.27)
                                        0
in fully discrete numerical scheme (2.23) and (2.24), where
                                            U = (H, D, E, P, J, Q, σ)
and f (Uhn+1 , Uhn ) takes one of the following up to some constants:
                      [(Ehn+1 )2 + (Ehn )2 ](Ehn+1 − Ehn ),       Ehn Ehn+1 ,     Qn+1
                                                                                     h Eh
                                                                                           n+1
                                                                                               .
Notice that nonlinear terms of the Maxwell equation discussed in the thesis occur as part
of integrand of integrals over spatial domain. The fully discrete SGDG schemes with
interpolation are to replace nonlinear term in form of (2.27) to
                                            Z   1
                                                  I f (Uhn+1 , Uhn ) ϕdx
                                                                   
                                                                                                 (2.28)
                                              0
                                                          27


where I[·] is a multiresolution interpolation operator INP,K as described in previous sec-
tion, onto finite element space VNM with the same multiresolution structure as VNk , but of
polynomial degree M = (P + 1)(K + 1) − 1, as we discuss above.
    To elaborate the implementation of such algorithm regarding treatment to nonlinear
terms, assume we know both Uhn+1 and Uhn for an implicit scheme. We first read the
(derivative) values of Uhn+1 and Uhn , each component of which is a linear combination of
Alpert’s basis functions at the chosen interpolation points. We then calculate the (deriva-
tive) values of f at interpolation points of interpolatory multiwavelet basis. Last, we
transform the (derivative) values back to coefficients of (Alpert’s or interpolatory) multi-
wavelet basis, by using the algorithm introduced in [7]. Such numerical algorithm can be
performed through a fast matrix-vector product as in [6, 84]. As stated in [6], we remark
that the computational cost does not increase too much compared to the multiresolution
DG schemes for linear equations [5]. The cost of the transformation from the (derivative)
values to hierarchical coefficients is only linearly dependent on the dimension d [7].
2.3    Formulation of Energy Stable Adaptive SGDG Method in Higher
       Dimension
2.3.1  Semi-Discrete and Fully-Discrete Energy Stable DG Method
As mentioned before in (2.7) and (2.8), we consider the following Maxwell system for
vector-valued H, D, E, P, J and scalar Q, σ, on (0, T ) × Ω,
                              µ0 ∂t H = −∇ × E                                       (2.29a)
                                 ∂t D = ∇ × H                                        (2.29b)
                                 ∂t P = J                                             (2.29c)
                                  ∂t J = −γJ − ω02 P + ωp2 E                         (2.29d)
                                 ∂t Q = σ                                            (2.29e)
                                  ∂t σ = −γv σ − ωv2 Q + ωv2 |E|2                     (2.29f)
                                               28


with constitutive law
                          D = ϵ0 (ϵ∞ E + P + a(1 − θ)|E|2 E + aθQE).                             (2.30)
Here
                                                  1              1
                                            γ=      ,    γv =      .
                                                  τ             τv
For simplicity, we consider the method only for the two-dimension transverse electric
(TE) mode. Extension to the full three-dimension model is straightforward [2]. Thus, we
consider the two-dimension system of equations
                      µ0 ∂t Hz = ∂y Ex − ∂x Ey                                                 (2.31a)
                        ∂t Dx = ∂y Hz                                                          (2.31b)
                         ∂t Dy = −∂x Hz                                                         (2.31c)
                             D = ϵ0 (ϵ∞ E + P + a(1 − θ)|E|2 E + aθQE)                         (2.31d)
                          ∂t P = J                                                             (2.31e)
                           ∂t J = −γJ − ω02 P + ωp2 E                                           (2.31f)
                          ∂t Q = σ                                                             (2.31g)
                          ∂t σ = −γv σ − ωv2 Q + ωv2 |E|2                                      (2.31h)
Here, all vector fields have two components polarized in the x-y plane, e.g. D = (Dx , Dy ),
E = (Ex , Ey ), P = (Px , Py ), and J = (Jx , Jy ), except H = Hz . Q and σ are scalar compo-
nents.
    Assume computational domain, now on x-y plane, is Ω = [0, 1] × [0, 1], and consider
the uniform mesh for Sparse Grid DG purpose. Define
                                      i                 j
                         xi+ 1 =        ,   yj+ 1 =       ,   i, j = 0, 1, . . . , 2N .
                              2     2N          2      2N
Then {xi+1/2 } and {yj+1/2 } divide [0, 1] on x and y direction uniformly into 2N many cells.
Let
                 Ii = [xi− 1 , xi+ 1 ],   Jj = [yj− 1 , yj+ 1 ],     Kij = Ii × Jj ,    ∀i, j,
                           2       2                  2     2
                                                      29


and {Kij }i,j is a partition or mesh of Ω. Each cell Kij has cell center
                                    xi− 1 + xi+ 1 yj− 1 + yj+ 1               i − 1/2 j − 1/2
                     (xi , yj ) = (      2         2
                                                     ,      2         2
                                                                        )=(            ,           ),
                                             2                 2                2N           2N
and grid size is ∆x = ∆y = 2−N .
    Regarding the mesh, we define
                                Vhk = v ∈ L2 (Ω) : v|Kij ∈ Qk (Kij ), ∀i, j ,
                                        
where Qk (Kij ) consists of polynomials with degree up to k in each variable on Kij . With-
out confusion, Vhk is also used to represent its vector version in the following. For any
function v ∈ Vhk , we write
                v(x±i+ 1
                          , y) = lim± v(xi+ 1 + ϵ, y),                  ±
                                                                 v(x, yj+ 1 ) = lim v(x, yj+ 1 + ϵ),
                                                                                     ±
                        2        ϵ→0            2                         2      ϵ→0             2
which are left or right limit on cell boundary of x or y direction. Average and jump on
cell interface x = xi+1/2 is
                            1 +                     −
                                                               
           {v}xi+ 1 =          v(xi+ 1 , y) + v(xi+ 1 , y) ,        [v]xi+ 1 = v(x+ i+ 1
                                                                                         , y) − v(x− i+ 1
                                                                                                          , y),
                   2        2        2                  2                  2           2                2
and on cell interface y = yj+1/2 is
                           1        +                  −
                                                               
                                                                                       +                −
           {v}yj+ 1 =          v(x, yj+  1 ) +  v(x,  y      )  ,   [v]yj+ 1 = v(x, yj+    1 ) − v(x, y       ).
                   2       2             2
                                                       j+ 21               2               2
                                                                                                        j+ 12
A typical semi-discrete DG method is as following [2]: find each component of Hzh , Dh ,
Eh , Ph , Jh , Qh and σh in Vhk , under periodic boundary condition, such that,
                      µ0 (∂t Hzh , ϕ) + BhE (Exh , Eyh , ϕ) = 0,          ∀ϕ ∈ Vhk                               (2.32a)
                                          H
                      (∂t Dxh , ϕ) + Bxh      (Hzh , ϕ) = 0,       ∀ϕ ∈ Vhk ,                                    (2.32b)
                                          H
                      (∂t Dyh , ϕ) + Byh     (Hzh , ϕ) = 0, ∀ϕ ∈ Vhk ,                                            (2.32c)
                                                                                                   
                      Dh = ϵ0 ϵ∞ Eh + Ph + a(1 − θ)Ih |Eh |2 Eh + aθIh (Qh Eh ) ,                                (2.32d)
                      ∂ t P h = Jh                                                                               (2.32e)
                      ∂t Jh = −γJh − ω02 Ph + ωp2 Eh                                                              (2.32f)
                      ∂t Qh = σh                                                                                 (2.32g)
                      ∂t σh = −γv σh − ωv2 Qh + ωv2 Ih (|Eh |2 )                                                 (2.32h)
                                                             30


where Ih is interpolation on Gauss-Legendre points, projecting nonlinear terms to DG
space, (·, ·) the inner product on Ω = [0, 1]2 , and
                                                                                2N
                                                                              X      Z  1
        BhE (Exh , Eyh , ϕ)    = −(Eyh , ∂x ϕ) + (Exh , ∂y ϕ) −                           E
                                                                                          dyh (xi+ 1 , y)[ϕ]xi+ 1 dy
                                                                                                   2            2
                                                                               i=1    0
                                         2 N
                                        X      Z   1
                                    +                E  xh (x, yj+ 1 )[ϕ]yj+ 1 dx,                                   (2.33a)
                                                      d
                                                      d
                                                                     2        2
                                        j=1      0
                                     2 N
                                    X      Z   1
               H
             Bxh (Hzh , ϕ)     =                 H   zh (x, yj+ 1 )[ϕ]yj+ 1 dx + (Hzh , ∂y ϕ),                       (2.33b)
                                                 g
                                                 g
                                                                   2       2
                                    j=1      0
                                         2 N
                                        X      Z   1
               H
             Byh (Hzh , ϕ)     = −                   Hg zh (xi+ 1 , y)[ϕ]xi+ 1 dy − (Hzh , ∂x ϕ).                    (2.33c)
                                                                  2           2
                                         i=1     0
We choose the fluxes to be central flux
                          E yh (xi+ 1 , y) = {Eyh }xi+ 1 , Exh (x, yj+ 1 ) = {Exh }yj+ 1 ,                           (2.34a)
                          d                                          d
                                                                     d
                                     2                          2                 2                  2
                          H  zh (xi+ 1 , y) = {Hzh }xi+ 1 , Hzh (x, yj+ 1 ) = {Hzh }yj+ 1 ,                          (2.34b)
                          g                                          g
                                                                     g
                                     2                           2                 2                  2
or alternating flux
                                                        †                                          ♮
                      E  yh (xi+ 1 , y) = Eyh (xi+ 1 , y), Exh (x, yj+ 1 ) = Exh (x, yj+ 1 ),                        (2.35a)
                      d                                              d
                                                                     d
                                  2                        2                      2                     2
                                                         ‡                                          ♯
                      H  zh (xi+ 1 , y) = Hzh (xi+ 1 , y), Hzh (x, yj+ 1 ) = Hzh (x, yj+ 1 ),                        (2.35b)
                      g                                              g
                                                                     g
                                  2                         2                      2                     2
where
                                        (†, ‡), (♮, ♯) = (+, −) or (−, +).
    For the semi-discrete in space methods with the central or alternating numerical fluxes,
one can establish an energy relation similar as for the continuous model. Additionally,
error estimates can be proved and they are optimal with respect to the approximation
property of the discrete space Vhk when the numerical fluxes are alternating.
Theorem 2.3.1 (Semi-discrete in space energy stability [2]) Under the assumption of peri-
odic boundary conditions, the semi-discrete in space methods (2.32), with either central flux (2.34)
or alternating flux (2.35), satisfy
                          dEh (t)           ϵ0 γ                       ϵ0 aθγv
                                                   Z                             Z
                                                              2
                                    =− 2                |Jh | dΩ −                   σh2 dΩ ≤ 0,                      (2.36)
                              dt             ωp      Ω                   2ωv2      Ω
                                                                31


with the discrete energy defined as
                                   µ0 2     ϵ0 ϵ∞             ϵ0          ϵ0 ω02           ϵ0 aθ 2
                            Z
                Eh =                  Hzh +        |Eh |2 + 2 |Jh |2 +           |Ph |2 +        σ
                              Ω     2         2              2ωp          2ωp 2            4ωv2 h
                                                                                              !
                               ϵ0 aθ                  3ϵ 0 a(1 −  θ)                 ϵ0 aθ
                                     Ih Qh |Eh |2 +                  Ih |Eh |4 +           Q2h dΩ.
                                                                              
                            +                                                                           (2.37)
                                 2                           4                         4
Meanwhile, when θ ∈ [0, 3/4], Eh ≥ 0.
Theorem 2.3.2 (Semi-discrete in space error estimates [2]) Let T > 0 be given. Let κerr , ρerr ∈
(0, 1) be arbitrary constants, then under periodic boundary conditions and
                                       
                              1
   1. θ ∈ 0,                             ,
                  1 + 3(1 − ρerr )−2
   2. aθCk ∥Q∥∞ ≤ ϵ∞ (1 − κerr ), and
                                                              Ck2
                                                                                              
                            2                                           2
   3. a 12(1 − θ)Ck ∥E∥∞ ∥∂t E∥∞ + (12 − 11θ)                     ∥∂t E∥∞ + 2θCk ∥∂t Q∥∞ ≤ ϵ∞ κerr ,
                                                             ρerr
as well as the exact solution being sufficiently smooth, the numerical solution uh given by 2.32
with suitable initialization admits the following error estimate at time T
                         ∥u − uh ∥L2 (Ω) ≤ CC(κerr , ρerr )hr , u = Hz , E, P, J, σ, Q,                 (2.38)
where r = k for central flux (2.34), and r = k + 1 for alternating flux (2.35). Here C is a
generic constant independent of mesh size h, but may depend on k, the mesh parameter δ, the
model parameters, and some Sobolev norm of the exact solutions up to time T .
    Following the semi-discrete method, the fully-discrete leap-frog DG scheme is, given
  n
Hzh  , Enh , Dnh , Jnh , Pnh , σhn and Qhn in Vhk at time tn , we find Hzh     n+1
                                                                                   , En+1       n+1 n+1   n+1
                                                                                        h , Dh , Jh , P h ,
                                                         32


σhn+1 and Qhn+1 ∈ Vhk at time tn+1 = tn + ∆t, satisfying
                                                 !
                           n+1/2         n
                        Hzh       − Hzh
                 µ0                         , ϕ + BhE (Exh     n      n
                                                                  , Eyh , ϕ) = 0, ∀ϕ ∈ Vhk ,                      (2.39a)
                             ∆t/2
                  n+1            n
                   Dxh − Dxh
                                         
                                                   H      n+1/2
                                     , ϕ + Bxh       (Hzh , ϕ) = 0, ∀ϕ ∈ Vhk ,                                    (2.39b)
                          ∆t
                       n+1
                                         !
                                  n
                    Dyh − Dyh                      H      n+1/2
                                     , ϕ + Byh       (Hzh , ϕ) = 0, ∀ϕ ∈ Vhk ,                                    (2.39c)
                          ∆t
                 Dn+1    = ϵ0 ϵ∞ En+1         + Pn+1      + a(1 − θ)Yhn+1 + aθIh Qn+1                n+1
                                                                                                         
                    h                   h           h                                         h Eh          ,    (2.39d)
                 Yhn+1 = Yhn + Ih |En+1                                   n+1
                                                       2       n 2               n
                                                                                    n+1          n
                                                                                                     
                                                 h   |   +  |E h |  −   E h   ·E h    E  h    − E h
                                 1
                             + Ih (En+1            + Enh )·(En+1      − Enh )(En+1     + Enh ) ,
                                                                                              
                                             h                  h                h                                (2.39e)
                                 2
                 Pn+1
                    h    −  P  n
                               h      Jn+1 + Jnh
                                  = h                ,                                                             (2.39f)
                       ∆t                   2
                 Jn+1    − Jnh        Jn+1 + Jnh              Pn+1 + Pnh             En+1 + Enh
                   h
                                 +γ h                  + ω02 h                = ωp2 h              ,              (2.39g)
                      ∆t                     2                       2                      2
                 Qn+1
                    h    − Qnh        σ n+1 + σhn
                                  = h                ,                                                            (2.39h)
                       ∆t                    2
                 σhn+1 − σhn            σ n+1 + σhn            Qn+1 + Qnh
                                 + γv h                  + ωv2 h               = ωv2 Ih (En+1       n
                                                                                             h ·Eh ),              (2.39i)
                      ∆t                      2 !                     2
                           n+1         n+1/2
                        Hzh     − Hzh                            n+1      n+1
                 µ0                            , ϕ + BhE (Exh         , Eyh   , ϕ) = 0, ∀ϕ ∈ Vhk                   (2.39j)
                              ∆t/2
The terms of BhE , Bxh  H
                          , and Byh H
                                         are defined in (2.33), with either the central fluxes (2.34) or
alternating fluxes in (2.35). Energy stability result for fully discrete scheme follows:
Theorem 2.3.3 (Fully-discrete energy stability [2]) Under the assumption of periodic bound-
ary conditions, the fully-discrete leap-frog nodal DG schemes (2.39) satisfy
                         ϵ0 γ∆t                                   ϵ0 aθγv ∆t
                                  Z                                            Z
      Ehn+1 − Ehn  =−                  |Jn+1
                                          h     +  Jnh |2 dΩ  −                    (σhn+1 + σhn )2 dΩ ≤ 0,          (2.40)
                           4ωp2     Ω                                 8ωv2       Ω
with the discrete energy Ehn defined as
                      µ0 n+1/2 n−1/2 ϵ0 ϵ∞ n 2                         ϵ0             ϵ0 ω02 n 2 ϵ0 aθ n 2
               Z 
      Ehn  =             Hzh Hzh               +           |Eh | + 2 |Jnh |2 +               |Ph | +          (σ )
                 Ω    2                              2                2ωp              2ωp2             4ωv2 h
                  ϵ0 aθ         n    n 2
                                          3ϵ0 a(1 − θ)                  n 4
                                                                              ϵ0 aθ n 2 
               +        Ih Qh |Eh | +                           Ih |Eh | +               (Qh ) dΩ.                  (2.41)
                    2                                   4                           4
In addition, Ehn ≥ 0 if θ ∈ [0, 3/4] and ∆t ≤ Ch for some constant C.
                                                             33


    Similarly, we can formulate fully implicit scheme as in (2.39); however, update of Hzh
at intermediate time step n + 1/2 is unnecessary, so in fully implicit scheme, (2.39j) should
be removed, while (2.39a),(2.39b),and (2.39c) should be replaced by
                                             n+1        n E n+1 + E n
           n+1        n
            Hzh − Hzh
                              
                                        E Exh + Exh           yh        yh
       µ0                , ϕ + Bh (                        ,                , ϕ) = 0, ∀ϕ ∈ Vhk             (2.42a)
                 ∆t                               2               2
         n+1       n                       n+1       n
          Dxh − Dxh
                         
                                     H Hzh + Hzh
                      , ϕ + Bxh        (                 , ϕ) = 0, ∀ϕ ∈ Vhk ,                              (2.42b)
               ∆t                               2
            n+1
                         !
                    n
          Dyh   − Dyh                       n+1
                                     H Hzh + Hzh
                                                      n
                      , ϕ + Byh        (                 , ϕ) = 0, ∀ϕ ∈ Vhk ,                              (2.42c)
               ∆t                               2
All simulations of two dimension in Section 2.6 use fully implicit scheme (2.39d)-(2.39i)
and (2.42).
2.3.2   High Dimensional Sparse Grid DG Method and Interpolatory Multiwavelets
Based on one dimensional Sparse Grid DG Method we discussed before, we are ready
to define it in higher dimension d > 1. First we recall some basic notations about multi-
indexes. For a multi-index α = (α1 , . . . , αd ) ∈ Nd0 , where N0 denotes the set of nonnegative
integers, the l1 and l∞ norms are defined as
                                              X d
                                     |α|1 =        αi ,    |α|∞ = max αi .
                                                                     1≤i≤d
                                              i=1
The component-wise arithmetic operations are
           α · β = (α1 β1 , . . . , αd βd ),   c · α = (cα1 , . . . , cαd ),    2α = (2α1 , . . . , 2αd ),
while and relational operations are defined as
                    α ≤ β ⇐⇒ αi ≤ βi ∀i;                α = β ⇐⇒ α ≤ β and β ≤ α;
                                       α < β ⇐⇒ α ≤ β and β ≤ α.
    By making use of the multi-index notation, we denote by l = (l1 , . . . , ld ) ∈ Nd0 , the
mesh level in a multivariate sense. Define the tensor-product mesh grid
                                                  O         O       O
                                      Ωl = Ωl1         Ωl2      ...      Ωld ,
                                                         34


as tensor product of one dimensional mesh, and the corresponding mesh size
                                           hl = (hl1 , . . . , hld ).
For convenience, we also write hli as hi , for i = 1, 2, . . . , d. Based on grid Ωl , we denote by
             Ilj = (h1 j1 , h1 (j1 + 1)) × (h2 j2 , h2 (j2 + 1)) × . . . × (hd jd , hd (jd + 1))
an elementary cell, and
          Vlk = {v : v(x) ∈ Qk (Il )j , 0 ≤ j ≤ 2l − 1} = Vl1k,x1 × Vl2k,x2 × . . . × Vldk,xd
the tensor-product piecewise polynomial space, where Qk (Ilj ) denotes the collection of
polynomials of degree up to k in each dimension on cell Ilj , and Vlik,xi the piecewise k-
th order polynomial space of variable xi , with x = (x1 , . . . , xi , . . . , xd ). If we use equal
mesh refinement of size hN = 2−N in each coordinate direction, the grid and space will be
                          k
denoted by ΩN and VN         , respectively.
    Based on a tensor-product construction, the multi-dimensional increment space can
be defined as
                                  Wlk = Wlk1 ,x1 × Wlk2 ,x2 × . . . × Wlkd ,xd
Therefore, the standard tensor-product piecewise polynomial space on N can be written
as
                                                        M
                                           k
                                         VN   =                   Wlk ,
                                                 |l|∞ ≤N ; l∈Nd0
while the sparse grid approximation space as discussed in e.g. [3] is
                                                        M
                                         Vbk =
                                           N                      Wlk ,                          (2.43)
                                                  |l|1 ≤N ; l∈Nd0
                k                          b k scales as O((k + 1)d 2N N d−1 ) [3], which is signif-
a subset of VN    . The dimension of V        N
                                  k
icantly less than that of VN         with exponential dependence on N d. The approximation
            b k are discussed in e.g. [3], which has a stronger smoothness requirement
results for V N
                                    k
than the traditional space VN         . However, for adaptive scheme we will discuss later, we
                                                       35


will not require the numerical solution to be in V                       b k , but rather in Vk , and we can choose
                                                                           N                  N
                              k
basis functions from VN         to solve the PDE problem in each time step adatpively.
    Finally, we define the polynomial basis functions of degree k in multi-dimensions as
                                                              Y d
                                              j
                                           vi,l  (x)     =          vijmm,lm (xm ),
                                                              m=1
and for all m,
                        lm ∈ N0 ,    0 ≤ im ≤ k,                0 ≤ jm ≤ max{0, 2lm −1 − 1}.
The orthonormality of the basis is established in sections before. Furthermore, the support
    j
of vi,l of all i are Ilj′ , where l′ = (l1′ , . . . , ld′ ) and lm    ′
                                                                          = max{lm − 1, 0}.
    Following the same manner, we can construct high-dimensional interpolatory multi-
wavelets. Let
                                  fM = W
                                  W          fM × W            fM × . . . × W       fM ,
                                     l           l1 ,x1           l2 ,x2             ld ,xd
where W  f M , m = 1, . . . , d are one-dimensional interpolatory multiwavelet spaces of
            lm ,xm
mesh level lm and m-th dimension variable xm , and M here is degree of polynomials of
interpolatory multiwavelets. Recall that we might choose M as a larger integer than k.
Then
                                                                 M
                                                M                            fM,
                                            VN       =                      W    l
                                                          |l|∞ ≤N ; l∈Nd0
while the sparse grid approximation space is
                                                                 M
                                             eM =
                                             V                              fM.
                                                                            W
                                                N                               l
                                                           |l|1 ≤N ; l∈Nd0
2.3.3    Adaptive SGDG Method
In this subsection, we formulate an adaptive multiresolution projection algorithm as dis-
cussed in [123] and [5, 6]. The last two references provide full details of adaptive SGDG
scheme. Define an accuracy threshold or error threshold ϵ > 0; we will use this threshold
throughout this subsection when we demonstrate the adaptive scheme, in both refine-
ment and coarsening procedures. In practice, we choose smaller accuracy threshold for
                                                                 36


coarsening than refinement, usually δ = ε/10, where ε and δ are error threshold for re-
finement and coarsening, respectively.
    The backbone of the algorithm is the fact that each hierarchical basis, of space VNk in
                                 k
one dimension or of space VN        in higher dimension, represents some details of a solution
or a function on a specific mesh scale, which naturally provides an error indicator for
the design of adaptive algorithms. In one dimension, consider u(x) has L2 projection
ũ(x) on VNk supported on Ω = [0, 1], as linear combination of Alpert’s multiwavelet basis
functions:
                                                              n−1 −1}
                                        X k X  N max{0,2  X
                              ũ(x) =                                 uji,n vi,n
                                                                             j
                                                                                 (x),    (2.44)
                                        i=0 n=0            j=0
where
                                                 Z   1
                                       uji,n =          u(x)vi,nj
                                                                   (x)dx.
                                                   0
It was shown in [4] that
                                      n−1 −1}
                          Xk max{0,2
                                   X
                                                |uji,n |2 ≤ C2−(q+1)n |u|Hq+1 (Ω) ,
                          i=0       j=0
if u ∈ Hp+1 (Ω), where seminorm Hp+1 (Ω) is
                                                           dp+1 u
                                     |u|Hp+1 (Ω) =                           ,
                                                           dxp+1     L2 (Ω)
and q = min{p, k}. The hierarchial coefficients uji,n here (also called hierarchial surplus)
naturally serves as an indicator for local smoothness of u(x). The main idea of the adap-
tive algorithm is to choose only those basis functions or elements whose coefficients above
a prescribed threshold value ϵ. Here we consider using L2 norm of u as indicator, and
                        j
since basis functions vi,n  are orthnormal, this is equivalent to
                                                  k
                                                               !1/2
                                                X        j 2
                                                       |ui,n |
                                                 i=0
for each fixed n and fixed 0 ≤ j ≤ max{0, 2n−1 − 1}. In summary, we would flag an
element Vnj if
                                              k
                                                            !1/2
                                             X
                                                  |uji,n |2        ≥ ϵ.
                                             i=0
                                                        37


                                                                    k
In higher dimension, projection ũ of function u on VN                is
                                   X X                       X                  j
                        ũ(x) =                                          uji,l vi,l (x),              (2.45)
                                  0≤i≤k 0≤l≤N 0≤j≤max{0,2l−1 −1}
where
                                                 Z
                                      uji,l   =     u(x)vi,l   j
                                                                 (x)dx,                               (2.46)
                                                  Ω
with spatial domain Ω = [0, 1]d and dimension d. Here multi-index of constant is simple
duplicate of the constant in d times, where d is dimension of spatial domain, e.g.
                               1 = (1, . . . , 1),        N = (N, . . . , N ),
and max function here is component-wise. It is also demonstrated in [4, 5] that
                                              1/2
             X          X
                                    |uji,l |2     ≤ C2−(q+1)l |u|Hq+1 (Ω) ,          ∀u ∈ Hp+1 (Ω),
            0≤i≤k 0≤j≤max{0,2l−1 −1}
similar to the one dimension case. So in higher dimension, an element Vlj will be flagged
as important if
                                                             !1/2
                                              X
                                                   |uji,l |2       ≥ ϵ.                               (2.47)
                                             0≤i≤k
    The adaptive scheme is implemented by hash table as the underlying data structure,
which stores active elements of the adaptive scheme, and the numerical solution on these
active elements. The implementation details will be discussed in next section. We now
introduce the concepts of child, parent and leaf elements [6], for better understanding of
                                                        ′
the adaptive procedure. Assume Vlj and Vlj′ are two elements, satisfying |l|∞ , |l′ |∞ ≤ N .
Then element Vlj ais called child element of Vlj , if and only if for some 1 ≤ m ≤ d,
l′m = lm + 1, where lm and l′m are m-th component of multi-index l and l′ , respectively.
                                                              ′
Accordingly, we call Vlj a father element of Vlj′ . If an element does not have any of its
child elements in the hash table, then we call it a leaf element.
    To describe the time evolution of adaptive scheme [6], consider now the numerical
solution uh on active elements at time step tn is stored in a hash table H. Information of
                                                      38


leaf elements is stored in a separate leaf table. The time evolution from time step tn to
tn+1 = tn + ∆t consists of four steps. The first step is the prediction step, which predicts
the location on spatial domain Ω where the details of numerical solution uh become sig-
nificant at the next time step tn+1 , to determine in next step where we should add more
elements in order to capture the fine structures. We solve from tn to tn+1 using a cheap
                                                                                    (p)
solver, e.g. the forward Euler method in time, to obtain predicted solution uh . For spa-
tial discretization in this step, if any nonlinear terms involve, the interpolation operator
I applied here has the same multiresolution structure as determined by the hash table H
corresponding to the numerical solution uh at current time step tn .
                                                                           (p)
    The second step is refinement step based on predicted solution uh . An active element
in hash table, satisfying (2.47), is considered as important element. All child elements Vlj
of an important element will be added to the hash table, to obtain new hash table H (p) ,
and coefficients uji,l for 0 ≤ i ≤ k associated to newly added multiwavelet basis functions,
as in (2.46), are set to be zeros. Leaf table is also updated accordingly. Now in the third
step, with numerical solution uh at time step tn , and a new hash table H (p) , we can solve
numerical solution ũn+1h    for the next step tn+1 before the final coarsening step. Again, the
interpolation operator is now determined by the new hash table H (p) .
    The last step is coarsening step, to remove leaf elements that are not important. We
traverse elements in the leaf table, and if a leaf element does not satisfy (2.47), it will be
removed from both the leaf table and hash table. Note that the leaf table has to be dynam-
ically updated in this procedure, since the removal of any leaf element may potentially
create one or more new leaf elements. For the same reason, coarsening is done in recur-
sive way, until no more leaf elements are removed and created. Now we obtain the new
hash table and leaf table, and numerical solution un+1  h    at time step n + 1.
    The last component of the algorithm is an efficient implementation of the hash table.
As suggested in [124, 75] and further discussed in [4], the hash table approach is easy to
implement, requires little storage overhead, and allows one to conveniently deal with hi-
                                                 39


erarchical multi-index (l, j) in the implementation. Specifically, by a prescribed hash func-
tion that will be discussed in details in 2.4.2, the hierarchical multi-index (l, j) is mapped
to a hash key (an integer), which serves as an address in the hash table. Then, given a
hierarchical index, the associated data can be easily stored and retrieved by computing
the hash key.
2.4    Adaptive SGDG Package and PETSc Implementation
2.4.1   Overview of Package
Before we take into account full details of the package, we briefly introduce the struc-
ture of the SGDG package [8] here, which is developed using C++ language. The SGDG
package is developed under the same logic as development of SGDG methods and their
adaptive version, together with necessary time integration methods, to solve several dif-
ferent PDEs. Therefore in general, the SGDG pacakge consists significantly of two parts,
regarding spatial and time discretization, together with other supporting classes and files.
    The most important part is SGDG spatial discretization, including Alpert’s multi-
wavelet basis, interpolatory multiwavelet basis, formulation of bilinear form, represen-
tation of numerical solutions as linear combination of basis functions, and fast algorithms
for matrix multiplication. Several other features are implemented to support these pur-
poses, including hash table to refer to basis functions, quadrature functions, and trans-
formation of numerical solution under different multiwavelet basis, etc. Besides, several
time integration methods are implemented to corporate with spatial SGDG discretiza-
tion, including several Runge Kutta method or SSP methods [92, 108, 109, 110], and IMEX
method [125, 6], with help of data structures and linear solvers in Eigen package [113]; af-
ter re-implementing several key functions regarding PETSc [9] data structures, generally
fully implicit method with nonlinear terms becomes available in the package. At the end,
several examples regarding different equations are written for further research and users
in future; several unit tests are developed to verify correctness of certain significant and
                                              40


complicated C++ classes and functions; makefile is developed to handle compiling and
linking procedure to create executable effectively; doxygen [126] is introduced to generate
documentation for source codes of the package.
2.4.2    Hash Key of SGDG Elements
For implementation of SGDG method, hash key is a key ingredient. After spatial dis-
cretization, each element or cell in SGDG formulation is provided a unique integer as the
hash key, and this element could be identified in the later calculations by the hash key.
     To describe the details of hash key, we first notice that each element is uniquely identi-
fied by its levels and support indexes on each dimension. Consider the SGDG formulation
of d dimension and an element Vlj , then on dimension i, the level of the element is li and
the support is identified as ji , 1 ≤ i ≤ d, which are i-th component of multi-index l and
                                                                            ′
j, respectively. Assume there are two elements Vlj and Vlj′ , then element Vlj has smaller
                         ′
hash key than Vlj′ , if and only if
    1. |l|1 < |l′ |1 , i.e.
                                                    Xd         X  d
                                                        li <         li′ ;
                                                    i=1         i=1
       or
    2. |l|1 = |l′ |1 , and li = li′ for i = p + 1, p + 2, . . . , d, but lp < lp′ for some 1 ≤ p ≤ d; or
    3. li = li′ for all 1 ≤ i ≤ d, and ji < ji′ for i = 1, 2, . . . , p − 1, but jp < jp′ for some
       1 ≤ p ≤ d.
It is clear that under such procedure, two elements, whose level vectors have the same l1
norm, or who have the same level vectors, will have relatively close hash key numbers;
elements with smaller l1 norm of level vector will have smaller hash key numbers. These
requirements are suitable for SGDG scheme, since sparse grid approximation space (2.43)
collects elements with l1 norm of level vector bounded by the maximum mesh level N .
                                                       41


2.4.3   SGDG Spatial Discretization
A DGSolution class is defined, in general, to represent the SGDG solutions as linear com-
bination of (Alpert’s or interpolatory) multiwavelet basis functions. The class stores sev-
eral important information, including maximum mesh level N = NMAX, dimension d =
DIM, number of components of solutions VEC_NUM, and most importantly, the hash ta-
ble implemented in a C++ standard container map dg. Certain member functions are im-
plemented, which includes initialization by L2 projection of initial functions, copy func-
tion to copy SGDG solutions among realizations, evaluation functions to evaluate SGDG
solutions at different spatial points regarding different basis functions, functions to com-
pute errors, treatments regarding artificial viscosity, etc.
    Active elements of SGDG method, together with the numerical solution, as discussed
in last section, is stored in hash table, implemented by a standard C++ container map
called dg. The key values of the map dg are hash key discussed in 2.4.2; the hash key
is mapped to the object of class Element, each of which represents element Vlj , and nu-
                                                           j
merical solution on the Alpert’s multiwavelet basis {vi,l    }0≤i≤k . On each element or ob-
ject of Element class, the associated coefficients of numerical solution for Alpert’s basis
functions are stored in ucoe_alpt, while the associated coefficients for other interpolatory
multiwavelet basis are stored in ucoe_intp. Other necessary data, including fucoe_intp
for coefficients of f (u) for interpolatory multiwavelet basis with given nonlinear function
f and numerical solution u, are also declared and stored in Element class. Some basic
information of the element, including the support and discontinuous points of the ba-
sis functions, are also stored explicitly for further use in volume integral etc. Besides,
to compute the volume integral and cell boundary integral more efficiently, we need to
determine whether two basis functions (or two elements) are orthogonal to each other;
several functions in the class play such role. For the purpose of adaptive method, we
need to add or remove elements dynamically twice on each time step, so we need to store
the pointer to children element in hash_ptr_chd and to parent element in hash_ptr_par.
                                              42


class DGSolution
{
public:
     // constructor and destructor of the class
     static int DIM;           // dimension of spatial domain
     static int VEC_NUM;       // number of unknowns
     ...
     // find pointers to elements that are related to current element
     void find_ptr_vol_alpt();
     void find_ptr_flx_alpt();
     void find_ptr_vol_intp();
     void find_ptr_flx_intp();
     // functions regarding error calculations and copying data between vectors
     // key of the map is hash key
     // basis functions and associated data store in object of class Element
     std::unordered_map<int, Element> dg;
     // calculate value of DG solution at a given point
     std::vector<double> val(const std::vector<double> & x, \
                                   const std::vector<int> & derivative) const;
     // functions for initialization of DG solution
protected:
     const bool sparse;
     const int NMAX;           /// maximum mesh level
     Hash hash;
     AllBasis<AlptBasis> all_bas;
     AllBasis<LagrBasis> all_bas_Lag;
     AllBasis<HermBasis> all_bas_Her;
     ...
};
                     Algorithm 2.1: DGSolution Class, in DGSolution.h
    There are also several classes defined for basis functions, all of which are derived from
the base class Basis. Common features of basis functions are defined here, including level
of mesh, support of basis and discontinuous points, index indicating which multiwavelet
function the current basis function is scaled from, and several functions to compute vol-
ume or edge integrals. The derived classes, including AlptBasis, LagrBasis, HermBa-
sis, represent the corresponding Alpert’s multiwavelet basis functions and Lagrangian or
Hermite interpolatory multiwavelet basis functions. Expression of basis functions and
their derivatives are implemented for further usage in volume and edge integral calcula-
                                              43


class Element
{
public:
    Element(const std::vector<int> & level_, const std::vector<int> & suppt_, \
                                         AllBasis<AlptBasis> & all_bas, Hash & hash);
    ~Element() {};
    static int DIM;         ///< dimension of spatial domain
    static int VEC_NUM;     ///< number of unknowns
    static std::vector<std::vector<bool>> is_intp;
                            ///< specify which components need interpolation
    const std::vector<int> level;        ///< mesh level in each dimension
    const std::vector<int> suppt;        ///< support index in each dimension
    /// left and right end point of support in multi-dimension
    std::vector<double> xl;
    std::vector<double> xr;
    std::vector<std::vector<double>> dis_point;      ///< discontinuous points
    std::vector<std::vector<double>> supp_interv;    ///< interval of support
    std::vector<int> order_elem;         ///< order for this element in each dimension
    int hash_key;                        ///< hash key for this element
    static int PMAX_alpt;        //< polynomial degree for Alpert’s basis
    static int PMAX_intp;        //< polynomial degree for interpolatory basis
    // order of Alpert’s or interpolatory multiwavelet basis in this element, e.g
    std::vector<VecMultiD<int>> order_alpt_basis_in_dg;
    ...
    // coefficients of Alpert’s basis, and in fast multiplication e.g.
    std::vector<VecMultiD<double>> ucoe_alpt;
    ...
    // coefficients of interpolatory basis for numerical solution u, e.g.
    std::vector< VecMultiD<double> > ucoe_intp;
    ...
    // coefficients for interpolatory basis of f(u), e.g.
    std::vector< std::vector< VecMultiD<double> > > fucoe_intp;
    ...
    // coefficients of numerical solution on next time step, for fully implicit method
    std::vector< VecMultiD<double> > up_intp_next;
    std::vector< VecMultiD<double> > ucoe_alpt_next;
    // number of total children and parent elements; used in adaptive scheme
    int num_total_chd, num_total_par;
    // maps that store existing children and parents elements; used in adaptive scheme
    std::unordered_map<int,Element*> hash_ptr_chd;
    std::unordered_map<int,Element*> hash_ptr_par;
    ...
};
                      Algorithm 2.2: Element Class, in Element.h
                                           44


tions.
    Previous paragraphs discuss the top-down structure from DGSolution class to basis
functions, and these classes are core classes for regular SGDG methods. However, to im-
plement adaptive methods properly, a derived DGAdapt class from DGSolution class is
necessary. The main feature added to DGAdapt is functions refine() and coarsen(), which
conduct the refinement and coarseness procedure in adaptive method. The basic logic
of these two algorithms are to search among elements stored in dg map of DGSolution
class, or through leaf table called leaf and leaf_zero_child that store all leaf elements or
leaf elements with no active child elements, to determine if an element is important, as
discussed in details in Subsection 2.3.3. Note that an element’s importance and whether
an element should be removed are determined by the element and its active children, so
the determination process is local.
    The data structures defined above provide a clear picture of how solution is repre-
sented in the SGDG algorithm, but we still need certain data structures or classes of ma-
trix to represent related transformations when time integration is applied. Recall that in
Basis class, we define a general class for all basis, together with operations on basis func-
tions, especially certain inner products between different basis functions or their deriva-
tives. We then define, accordingly, a class template AllBasis<T>, which collects all basis
functions in one dimension, where T represents one of AlptBasis, LagrBasis, HermBasis,
specifying which type of basis is collected in the AllBasis<T>; we employ C++ template
here to provide a generic class among all kinds of basis functions.
    On top of the AllBasis class, we define class OperatorMatrix1D, which provides one
dimensional transformation matrix from coefficients of numerical solution. Using the
information provided in OperatorMatrix1D, we are able to assemble and store matrixes
regarding SGDG spatial discretization in class BilinearForm. BilinearForm supports two
sets of data structure: for thread parallelization we could use Eigen package [113] and its
matrixes and vectors; for implicit scheme using nonlinear solver in time stepping, we use
                                               45


class DGAdapt :
    public DGSolution
{
public:
    // constructor and destructor of class DGAdapt
    // refine based on reference solution generated by Euler forward
    void refine();
    // coarsen
    void coarsen();
    ...
protected:
    const double eps;       // refinement threshold
    const double eta;       // coarsening threshold
    // leaf element that does not have its all children
    std::unordered_map<int, Element*> leaf;
    // leaf element with no child
    std::unordered_map<int, Element*> leaf_zero_child;
    // after adding or deleting, check no holes in solution
    void check_hole();
    // update leaf table DGAdapt::leaf
    void update_leaf();
    // add a given element into DG solution
    void add_elem(Element & elem);
    // delete a given element into DG solution
    void del_elem(Element & elem);
    ...
private:
    // coarsen based on leaf elements with no child
    void coarsen_no_leaf();
    // update leaf table DGAdapt::leaf_zero_child
    void update_leaf_zero_child();
    ...
};
                      Algorithm 2.3: DGAdapt class, in DGAdapt.h
PETSc package [9] and its data structure and solvers.
                                           46


2.4.4   Time Integration and PETSc Package
The package uses ODESolver class to deal with time integration, coupling with data struc-
ture from Eigen package. ODESolver class stores matrix and solution vectors together at
the same time, and provide several functions to operate matrix and vector operations.
Several specific time integration methods are defined as derived class of ODESolver, in-
cluding explicit Strong Stability Preserving (SSP) methods, multistep methods, and IMEX
method. Implementations of some of these methods require solving a linear system, in
which case linear solver of Eigen package will be used to solve linear system when evolv-
ing the solution to next time step. This kind of approach made certain implicit method
available in the package, but a fully implicit method involving nonlinearity would not fit
into such framework. To deal with this, we introduce PETSc package [9] for nonlinear
solvers involved.
    In order to illustrate how SGDG package couples with PETSc package, we provide the
following linear example. Consider linear PDE
                   Ut − (AU )x − (BU )y = 0, (t, x, y) ∈ [0, T ] × [0, 1] × [0, 1]
with periodic boundary condition on both x and y direction, where U (t, x, y) = (U0 , U1 , U2 )T
is vector-valued with three components, and
                                                                    
                                   0 0 −1                   0 1 0
                                                                    
                            A=      0  0    0  ,
                                                      B=     1 0   0.
                                                                       
                                                                    
                                    −1 0 0                      0 0 0
An accurate solution of this system is
                                                   √
                                                    5t+x+2y))
                                                               √
                            U (t, x, y) = ecos(2π(            ( 5, 2, −1).
The semi-discrete SGDG scheme of this equation is, to find vector-valued solution U =
                                                   47


U (t, x, y), for any vector-valued test function V , such that
                                           2N
                                           X   Z   1
                                                      C01 U (x+ , y) + C00 U (x− , y) · [V ]
                                                                                     
               (Ut , V ) + (AU, Vx ) +                                                                  dy
                                                 0                                             x=xj+1/2
                                           j=1
                              2N
                             X     Z   1
                                          C11 U (x, y + ) + C10 U (x, y − ) · [V ]
                                                                          
           + (BU, Vy ) +                                                                    dx = 0         (2.49)
                                     0                                             y=yj+1/2
                              j=1
where (·, ·) is inner product on spatial region [0, 1]2 of (x, y). For alternating flux
                                     c0 = U0− ,
                                     U               c1 = U1+ ,
                                                     U             c2 = U2+ ,
                                                                   U
we have
                                                                                                 
                 0 0 0                   0 0 −1                  0 0 0                 0 1 0
                                                                                                 
        C00   = 0 0 0 , C01           =0 0 0  , C10 = 1 0 0 , C11 = 0 0 0 .
                                                                                                   
                                                                                                 
                  −1 0 0                     0 0 0                     0 0 0                  0 0 0
We can rewrite (2.49) as (Ut , V ) + B(U, V ) = 0 where B(·, ·) is a bilinear form. For fully
implicit method in time, consider numerical solutions at time step tn and tn+1 = tn + ∆t
are U n and U n+1 respectively, then
                                                                  ∆t
                  F (U n+1 ; U n , V ) := (U n+1 − U n , V ) +       B(U n+1 + U n , V ) = 0.              (2.50)
                                                                   2
    In Algorithm 2.4, we provide an example of solving (2.48), after removing some tech-
nical details, to reveal the main structure of the SGDG package. Maximum mesh level
NMAX and polynomial degree PMAX of Alpert’s multiwavelet basis are provided by
command line arguments. Static variables, hash key in class Hash, collection of all Alpert’s
multiwavelet basis functions in class AllBasis<AlptBasis>, one dimension matrix for lin-
ear operations in class OperatorMarix1D<AlptBasis,AlptBasis>, and numerical solution
and hash table in class DGSolution, are defined in proper order. Functions with name
format assemble_matrix_*_system assemble the matrix corresponding to bilinear form
B(·, ·), using coefficients matrix A, B, and C00 , C01 , C10 , C11 , and store the assembled ma-
trix in PETSc matrix mat_petsc of class HyperbolicAlpt, a derived class from class Bilin-
earForm.
                                                          48


// include header files
typedef struct AppCtx AppCtx;
struct AppCtx
{
    HyperbolicAlpt* app_HyperbolicAlpt;
    DGSolution*     app_DGSolution;
    double          dt;
};
int main(int argc, char** argv)
{
    PetscErrorCode ierr;
    ierr = PetscInitialize(&argc,&argv,(char*)0,NULL);if (ierr) return ierr;
    int NMAX=6,PMAX=1;
    ierr=PetscOptionsGetInt(NULL,NULL,"-NMAX",&NMAX,NULL);CHKERRQ(ierr);
    ierr=PetscOptionsGetInt(NULL,NULL,"-PMAX",&PMAX,NULL);CHKERRQ(ierr);
    // assign values to static variables
    AlptBasis::PMAX = PMAX;
    Element::DIM = 2;
    Element::VEC_NUM = 3;
    ...
    Hash hash; // hash key
    AllBasis<AlptBasis> all_bas_alpt(NMAX);
    const std::string boundary_type = "period";
    OperatorMatrix1D<AlptBasis, AlptBasis> \
                         oper_matx_alpt(all_bas_alpt, all_bas_alpt, boundary_type);
    DGSolution dg_solu(true, NMAX, NMAX, all_bas_alpt, hash);
    // initialize the DGSolution with an initial function
    HyperbolicAlpt linear(dg_solu, oper_matx_alpt);
    linear.assemble_matrix_vol_system(0,A);
    linear.assemble_matrix_vol_system(1,B);
    linear.assemble_matrix_flx_system(0,1,C01);
    linear.assemble_matrix_flx_system(0,-1,C00);
    linear.assemble_matrix_flx_system(1,1,C11);
    linear.assemble_matrix_flx_system(1,-1,C10);
    double final_time = 1./std::sqrt(5.);
    double cfl = 0.9;
    AppCtx app_sol;
    app_sol.app_HyperbolicAlpt = &linear;
    app_sol.app_DGSolution      = &dg_solu;
    double current_time = 0.;
    // while loop to find numerical solution; see below
    // calculate errors; print solutions
    // Destroy PETSc objects
    return ierr;
}
              Algorithm 2.4: Example of Linear Equations, in Linear2D.cpp
                                            49


    In the while loop of finding numerical solutions of each time step, as shown in Al-
gorithm 2.5, a Scalable Nonlinear Equations Solvers (SNES) object of PETSc is created
for solving general linear and nonlinear system. For linear problem, scalable linear equa-
tions solvers (KSP) of PETSc is a better choice, and setting of KSP object is similar to SNES
object.
while (current_time < final_time)
{
     double dt = cfl / std::pow(2., NMAX);
     dt = std::min( dt , finaltime - curr_time );
     app_sol.dt = dt;
     // Initialize vectors for residual and next step solution
     Vec residual, next_step_solution;
     ...
     SNES snes;
     ierr = SNESCreate(PETSC_COMM_WORLD,&snes);
     ierr = SNESSetFunction(snes,residual,FormFunction,&app_sol);
     ierr = SNESSetFromOptions(snes);
     ierr = SNESSolve(snes,NULL,next_step_solution);
     // Copy next step solution back to DGSolution class
     curr_time += dt;
     // Destroy PETSc objects
}
          Algorithm 2.5: While Loop to Find Numerical Solution in Algorithm 2.4
    As in while loop of Algorithm 2.5, the key functions of SNES nonlinear solver are
SNESSetFunction and SNESSolve. SNESSetFunction provides to the nonlinear solver a
FormFunction, which compute F (x) if the nonlinear problem to be solved is F (x) = 0.
In our example, as in Algorithm 2.6, FormFunction computes F (U n+1 ; U n , V ) with input
of numerical solution of next time step U n+1 , and numerical solution of current time step
U n and other related information are provided through an application context of class
AppCtx. As in Algorithm 2.4, an AppCtx object stores time step dt= ∆t, and the pointer
of DGSolution object of SGDG solution, and HyperbolicAlpt object of bilinear form of the
system.
    Nevertheless, SNESSolve uses iterative method to find solution of nonlinear prob-
                                             50


PetscErrorCode FormFunction(SNES snes,Vec next_u,Vec f,void* ctx)
{
      AppCtx *user = (AppCtx*)ctx;
      PetscErrorCode ierr;
      Vec now_u;
      // Copy current step solution in user->app_DGSolution to now_u
      ...
      Vec sum_, diff_;
      // sum_ = 0.5 * (next_u + now_u) * user->dt
      ...
      // diff_ = next_u - now_u
      ...
      // f = diff_ + mat_petsc * sum_
      ierr = MatMultAdd(user->app_HyperbolicAlpt->mat_petsc, sum_, diff_, f);
      return ierr;
}
        Algorithm 2.6: Function FormFunction for Nonlinear Solver in Algorithm 2.4
lem, and SNESSetFromOptions sets various SNES parameters, either from command line
arguments or by default. A typical set of options provided through command line argu-
ments is like
Linear2D -NMAX 6 -PMAX 3 -snes_mf \
          -snes_rtol 1e-12 -snes_atol 1e-12 -snes_stol 1e-12
                   Algorithm 2.7: Command to Run Executable Linear2D
which set maximum mesh level 6 and polynomial degree 3, choose matrix-free method
through option -snes_mf, and set tolerance of nonlinear solvers by options –snes_rtol,
–snes_atol, and –snes_stol.
     Before the end of this section, we remark that although the example provided here
is linear, the nonlinear problem can fit into this structure too. In case a FormFunction is
provided, matrix-free method can be applied to solve the numerical solution.
                                             51


2.5   Numerical Simulation in One Dimensional Case
2.5.1  Kink Shape Solutions
The first numerical test we consider, as discussed in [1], where a traveling wave solution
was constructed for the instantaneous intensity-dependent Kerr response neglecting the
influence of damping, corresponding to the case of θ = 0 and τ = ∞ in (2.11). This yields
a simplified system from Equations (2.11) as the following:
                                   ∂t H = ∂x E                                            (2.51a)
                                   ∂t D = ∂x H                                            (2.51b)
                                    ∂t P = 6J                                              (2.51c)
                                    ∂t J = 6(−ω02 P + ωp2 E)                              (2.51d)
                                      D = ϵ∞ E + P + aE 3                                 (2.51e)
where the only nonlinear term comes from constitutive law of D, representing the cubic
Kerr effect. We can find a traveling wave function E(x, t) = E(ξ) with ξ = 6(x−vt), where
E(ξ) is comprised of a kink and anti-kink wave, and is solved based on the following ODE
of E(ξ) and Φ(ξ):
                   dE
                         = Φ                                                              (2.52a)
                    dξ
                    dΦ        6av 2 EΦ2 + (ϵ∞ ω02 + ωp2 − ω02 /v 2 )E + aω02 E 3
                         =                                                                (2.52b)
                    dξ                     1 − ϵ∞ v 2 − 3av 2 E 2
Here the parameters are
                                                                      √          0.6545
          ϵ∞ = 2.25, a = 0.75, ω0 = 93.627179982222216, ωp =           3ω0 , v = √      ,
                                                                                    ϵ∞
with condition
                         E(0) = 0,      Φ(0) = 0.24919666777865812.
These conditions are carefully chosen, so that the resulting solution of E, Φ are 6-periodic.
The overall solution for all variables, based on E, Φ obtained from the ODE, again with
                                               52


ξ = 6(x − vt), is the following:
                                          1                  1
                          H(x, t) = − E(ξ), D(x, t) = 2 E(ξ)                            (2.53)
                                          v                 v
                                          1
                          P (x, t) = ( 2 − ϵ∞ )E(ξ) − aE(ξ)3                            (2.54)
                                         v
                                                1
                          J(x, t) = (ϵ∞ v − )Φ(ξ) + 3avE(ξ)2 Φ(ξ)                       (2.55)
                                                v
Note that all H, D, E, P, J have period 1. The function E and J at t = 0 are given in Figure
2.1.
                          Initial E                                 Initial J
         0.4                                         0.4
                                            E                                      J
         0.2                                         0.2
           0                                           0
        -0.2                                        -0.2
        -0.4                                        -0.4
             0          2           4         6          0       2            4      6
                              Figure 2.1: Function E and J at t = 0
    We perform an accuracy test for fully implicit method. We choose the CFL condition
to be
                                      ∆t = CF L × h(k+1)/2
where h = 1/2N is mesh size, and k is degree of piecewise polynomial. CFL constant is set
as in Table 2.1. L1 , L2 and L∞ errors with order, regarding function E and J, are shown,
                                           k    1  2      3
                                         CFL    1  8     63
                       Table 2.1: CFL constant for fully implicit method
respectively, in Table 2.2 and 2.3 for k = 1, in Table 2.4 and 2.5 for k = 2, and in Table 2.6
and 2.7 for k = 3. All errors are calculated at final time T = 1/v. It is clear that SGDG
                                                53


scheme with P k Alpert’s multiwavelet basis has accuracy order k + 1. This is expected
since SGDG is the same as traditional DG in one dimension.
            N     L1 Error   Order    L2 Error  Order   L∞ Error    Order
             6   7.915e-04            1.506e-03         5.295e-03
             7   2.217e-04    1.836   4.421e-04  1.769  1.708e-03    1.632
             8   5.606e-05    1.983   1.143e-04  1.951  4.638e-04    1.881
             9   1.399e-05    2.002   2.865e-05  1.996  1.180e-04    1.975
            10   3.494e-06    2.001   7.160e-06  2.001  2.948e-05    2.001
            Table 2.2: Errors and Orders of Accuracy of E. k = 1, T = 1/v.
            N     L1 Error   Order    L2 Error  Order   L∞ Error    Order
             6   8.435e-03            1.623e-02         5.124e-02
             7   2.604e-03    1.696   5.537e-03  1.551  1.982e-02    1.370
             8   6.826e-04    1.931   1.514e-03  1.871  5.793e-03    1.775
             9   1.717e-04    1.991   3.836e-04  1.980  1.486e-03    1.963
            10   4.292e-05    2.000   9.603e-05  1.998  3.729e-04    1.994
            Table 2.3: Errors and Orders of Accuracy of J. k = 1, T = 1/v.
            N     L1 Error   Order    L2 Error  Order   L∞ Error    Order
             6   8.902e-04            1.718e-03         5.946e-03
             7   1.161e-04    2.939   2.359e-04  2.864  9.532e-04    2.641
             8   1.452e-05    3.000   2.967e-05  2.991  1.226e-04    2.959
             9   1.816e-06    2.999   3.715e-06  2.998  1.541e-05    2.992
            10   2.270e-07    3.000   4.643e-07  3.000  1.926e-06    3.000
            Table 2.4: Errors and Orders of Accuracy of E. k = 2, T = 1/v.
            N     L1 Error   Order    L2 Error  Order   L∞ Error    Order
             6   1.078e-02            2.051e-02         6.959e-02
             7   1.426e-03    2.919   3.144e-03  2.706  1.200e-02    2.536
             8   1.787e-04    2.996   3.992e-04  2.977  1.487e-03    3.013
             9   2.236e-05    2.998   5.002e-05  2.997  1.844e-04    3.011
            10   2.796e-06    3.000   6.253e-06  3.000  2.305e-05    3.000
            Table 2.5: Errors and Orders of Accuracy of J. k = 2, T = 1/v.
                                           54


               N     L1 Error    Order     L2 Error      Order   L∞ Error        Order
                6    8.642e-04             1.665e-03             5.768e-03
                7    5.565e-05    3.957    1.134e-04     3.877   4.617e-04       3.643
                8    3.473e-06    4.002    7.104e-06     3.996   2.926e-05       3.980
                9    2.171e-07    4.000    4.440e-07     4.000   1.830e-06       3.999
               10    1.357e-08    4.000    2.775e-08     4.000   1.144e-07       4.000
               Table 2.6: Errors and Orders of Accuracy of E. k = 3, T = 1/v.
               N     L1 Error    Order     L2 Error      Order   L∞ Error        Order
                6    1.058e-02             2.006e-02             6.870e-02
                7    6.831e-04    3.953    1.521e-03     3.721   5.726e-03       3.585
                8    4.279e-05    3.997    9.565e-05     3.991   3.510e-04       4.028
                9    2.674e-06    4.000    5.980e-06     4.000   2.190e-05       4.003
               10    1.688e-07    3.986    3.738e-07     4.000   1.371e-06       3.998
               Table 2.7: Errors and Orders of Accuracy of J. k = 3, T = 1/v.
     Besides, we also use this example to test our adaptive method, and we calculate the
convergence rate with respect to the error threshold Rε and with respect to degree of
freedoms RDOF as following
                                     log(el−1 /el )             log(el−1 /el )
                       RDOF =                          ,  Rε =                 .
                                log(DOFl /DOFl−1 )              log(εl /εl−1 )
where el is the standard L2 error, of either E or J component, with refinement parameter
εl , and DOFl is the associated number of active degrees of freedom at final time after last
coarsening step.
2.5.2   Soliton Propagation
In this example, we will consider the soliton propagation similar to the setup in [1]. The
coefficients in this example (2.12) are chosen as
                                                 1                  1
                  ϵ∞ = 2.25, ϵs = 5.25, β1 = 3,     = 1.168 × 10−5 , = 29.2/32,
                                                 τ                  τv
                                                                                 p
               a = 0.07, θ = 0.3, Ω0 = 12.57, ω0 = 5.84, ωv = 1.28, ωp = ω0 β1 .
                                                  55


                    ε      DOF     E error   RDOF      Rε     J error  RDOF      Rε
               5.00e-05      48   5.57e-05                   3.46e-03
               2.00e-05      58   3.10e-05    3.09    0.64   2.02e-03   2.86    0.59
               1.00e-05      74   1.69e-05    2.50    0.88   1.26e-03   1.92    0.67
               5.00e-06      88   9.14e-06    3.54    0.89   7.03e-04   3.38    0.85
               2.50e-06      92   5.42e-06   11.73    0.75   3.72e-04  14.31    0.92
                   Table 2.8: Adaptive Result of P 2 case, using alternating flux
                    ε       DOF    E error   RDOF       Rε    J error  RDOF      Rε
                5.00e-05     36    7.84e-05                  2.46e-03
                2.00e-05     40    3.42e-05   7.89     0.91  1.15e-03   7.23    0.83
                1.00e-05     62    2.32e-05   0.88     0.56  8.14e-04   0.79    0.50
                5.00e-06     76    8.76e-06   4.78     1.41  3.40e-04   4.28    1.26
                2.50e-06     82    6.21e-06   4.54     0.50  1.79e-04   8.44    0.93
                     Table 2.9: Adaptive Result of P 2 case, using upwind flux
                    ε       DOF    E error   RDOF       Rε    J error  RDOF      Rε
                5.00e-05     24    2.77e-05                  1.71e-03
                2.00e-05     32    1.64e-05   1.82     0.57  9.12e-04   2.18    0.69
                1.00e-05     40    1.05e-05   2.01     0.65  5.65e-04   2.15    0.69
                5.00e-06     52    6.01e-06   2.12     0.80  3.82e-04   1.49    0.56
                2.50e-06     58    3.98e-06   3.76     0.59  2.75e-04   3.01    0.47
                  Table 2.10: Adaptive Result of P 3 case, using alternating flux
                    ε       DOF    E error   RDOF       Rε    J error  RDOF      Rε
                5.00e-05     22    3.24e-05                  1.25e-03
                2.00e-05     26    1.76e-05   3.63     0.66  4.17e-04   6.55    1.19
                1.00e-05     32    8.81e-06   3.35     1.00  2.33e-04   2.80    0.84
                5.00e-06     44    6.58e-06   0.92     0.42  2.23e-04   0.14    0.06
                2.50e-06     48    4.28e-06   4.94     0.62  1.32e-04   6.00    0.75
                    Table 2.11: Adaptive Result of P 3 case, using upwind flux
Initially, all fields are zero. The left boundary is injected with an incoming solitary wave,
for which the electric field is prescribed as
                                    E(x = 0, t) = f (t) cos(Ω0 t),
where f (t) = M sech(t − 20). M is a constant to be specified later. As in [1], the boundary
condition of H can be approximated from the linearized dispersion relation. Assuming
a space-time harmonic variation ei(wt−kx) of all fields, the exact dispersion relation associ-
                                                 56


ated with the linear parts of the system implies that
                                         s
                                     √             ωp2 /ϵ∞
                              k = ω ϵ∞ 1 − 2                  .
                                               ω − iω/τ − ω02
The approximate value of H is given by
                                          ( 8                                            )
                    Z ∞                    X (−i)m  1 (m)
                               iωt                                     (m)
     H(x = 0, t) =       H(ω)e
                         b         dω ≃ ℜ                            f     (t) exp(iω0 t) .
                     −∞                    m=0
                                                 m!      Z
                                                                ω=Ω0
where ℜ is real part of complex number, f (m) (t) is the m-th derivative of f (t), and ( Z1 )(m)
is the m-th derivative of Z = −ω/k with respect to ω, where k is considered as a function
of ω as described above. Besides, we consider the absorbing right boundary derived from
the linearized system, which is
                                                             √
                       b = 3 E − − √1 H − ,      e = 3H− −      ϵ∞ −
                       E                        H                  E ,
                            4        4 ϵ∞             4         4
for central flux and alternating flux, and
                                                             √
                       b = 1 E − − √1 H − ,      e = 1H− −      ϵ∞ −
                       E                        H                  E ,
                            2        2 ϵ∞             2         2
for upwind flux; these settings guarantee the energy stability for semi-discrete schemes
as desired and discussed in [1].
    We choose maximum mesh level to be N = 11, and the time step ∆t = CF L × ∆x =
CF L/2N . The CFL number is chosen to be CF L = 0.3 for both alternating I/II fluxes and
upwind flux, to eliminate the error from time stepping and to ensure the convergence to
the expected solution. Figure A.1 and A.9 show the adaptive scheme solutions at two
different time T = 40 or T = 80, for different fluxes and different degrees of Alpert’s
multiwavelet basis of SGDG spaces, while the previous figure is for transient fundamental
temporal soliton, and the later for transient second order temporal soliton. We choose
refinement error threshold ε = 10−5 , and coarsening error threshold δ = ε/10 = 10−6 .
    As discussed in [121, 122, 1], a daughter pulse occurs other than the soliton-like pulse,
and travels faster than the soliton-like pulse. Among all cases, the daughter pulse has
                                             57


smaller magnitude and higher frequency, comparing to the soliton-like pulse. The daugh-
ter pulse shows up in almost all simulations except in case of upwind flux with polyno-
mial degree 1, since numerical dissipation of upwind flux is stronger than other flux,
damping the magnitude of the pulse significantly.
    We also plot at time T = 40 and T = 80, the numerical solutions and active elements
of adaptive scheme, for different fluxes and polynomial degree, in Figure A.2, A.3, A.4
for transient fundamental case, and in Figure A.10, A.11, A.12 for transient second order
case. Among all cases, the active elements of numerical solution sweep from left bound-
ary to right boundary, following the position of the soliton-like pulse. For any fixed flux,
high order method uses fewer active elements on simulation, when the high order and
low order method share the same refinement and coarsening accuracy threshold. It is
also clear that fewer active elements are in the hash table for adaptive scheme with up-
wind flux, especially at time T = 80 comparing to the other alternating fluxes; again this
demonstrates upwind flux is numerically more dissipative than other fluxes, so that it
significantly eliminates any oscillations.
    To demonstrate the function of refinement threshold ε, we also compute the transient
fundamental temporal soliton with refinement threshold ε = 10−3 and coarsening thresh-
old δ = 10−4 , in Figure A.5, A.6, A.7, and A.8. Spurious oscillations occur on simulations
of alternating flux on the left side of soliton-like pulse; besides, oscillation on alternating
flux II case has larger magnitude than that of alternating I case. There is no oscillation
on upwind flux case, again due to numerical dissipation of upwind flux. Simulations
with upwind flux use fewer active elements than alternating fluxes, which is a similar
result as in ε = 10−5 case. Comparing to ε = 10−5 case, when polynomial degree and flux
are the same, simulations of ε = 10−3 use fewer elements, especially on the right side of
soliton-like pulse, and for this reason, daughter pulse in ε = 10−5 case does not show up
in the ε = 10−3 case. Larger thresholds of refinement and coarsening allow larger error
on numerical solution, so the numerical scheme becomes less sensitive on detecting the
                                               58


daughter pulse of small magnitude. Similar to ε = 10−5 case, support of elements sweep
from the left to the right following transport of soliton-like pulse.
2.6     Numerical Simulation in Higher Dimensional Case
2.6.1   Accuracy and Convergence Test of Linear Equation
The following equation of unknown u = u(t, x, y), with (t, x, y) ∈ [0, T ] × Ω and Ω =
[0, 1] × [0, 1], is considered in this test
                                            ut + ux + uy = 0,                            (2.56)
with periodic boundary condition in both x and y direction. The initial condition is
                                                                  1
                             u(0, x, y) = u0 (x, y) =                      ,             (2.57)
                                                        2 + sin(2π(x − y))
and the accurate solution is
                                      u(t, x, y) = u0 (x − t, y − t).                    (2.58)
Spatially we use Sparse Grid DG method with upwind flux, and trapezoidal discretiza-
tion in time. Since trapezoidal rule is second order in time, we use the following CFL
condition:
                                                      CF L
                                     ∆t ≤                                                (2.59)
                                                 1              1
                                                        +
                                             ∆x(k+1)/2 ∆y (k+1)/2
if spatial discretization is k + 1 order. The CFL number is chosen as the following:
                                            k      1     2     3
                                          CFL     0.9   4.9   9.9
                                       Table 2.12: CFL numbers
     We obtain the following tables of numerical error and order, for end time T = 1. Here
                                                                                      1
N represent the maximum mesh level on both x, y direction, i.e. ∆x = ∆y = N . We also
                                                                                     2
put together the table at T = 0, i.e. the error of projection of initialization for comparison,
and similar order of accuracy is observed.
                                                     59


N     L1 error    order     L2 error    order      L∞ error     order
5   4.57312e-02           5.30394e-02            1.15945e-01
6   1.51710e-02    1.592  1.98859e-02   1.415    5.50358e-02    1.075
7   5.14680e-03    1.560  7.43171e-03   1.420    2.41551e-02    1.188
8   1.49080e-03    1.788  2.31736e-03   1.681    9.06915e-03    1.413
9   2.73017e-04    2.449  4.83023e-04   2.262    2.87813e-03    1.656
 Table 2.13: Errors and orders of accuracy of u, for k = 1, at T = 1
N     L1 error    order     L2 error    order      L∞ error     order
5   7.89276e-03           1.24346e-02            5.85053e-02
6   2.32947e-03    1.761  3.63022e-03   1.776    2.05560e-02    1.509
7   8.50218e-04    1.454  1.49729e-03   1.278    1.17706e-02    0.804
8   2.33762e-04    1.863  4.21631e-04   1.828    3.97204e-03    1.567
9   6.76643e-05    1.789  1.24259e-04   1.763    1.27171e-03    1.643
 Table 2.14: Errors and orders of accuracy of u, for k = 1, at T = 0
N     L1 error    order     L2 error    order      L∞ error     order
5   5.07673e-03           6.88257e-03            2.69062e-02
6   4.97155e-04    3.352  7.57930e-04   3.183    5.93592e-03    2.180
7   1.81276e-04    1.456  2.87770e-04   1.397    2.56533e-03    1.210
8   1.89313e-05    3.259  3.28919e-05   3.129    4.44195e-04    2.530
9   2.79344e-06    2.761  5.14717e-06   2.676    7.21485e-05    2.622
 Table 2.15: Errors and orders of accuracy of u, for k = 2, at T = 1
N     L1 error    order     L2 error    order      L∞ error     order
5   1.26846e-03           2.26795e-03            1.55856e-02
6   1.53309e-04    3.049  2.74032e-04   3.049    2.37428e-03    2.715
7   4.71975e-05    1.700  9.17983e-05   1.578    1.04093e-03    1.190
8   6.57529e-06    2.844  1.27438e-05   2.849    1.99676e-04    2.382
9   9.84804e-07    2.739  2.00984e-06   2.665    3.13352e-05    2.672
 Table 2.16: Errors and orders of accuracy of u, for k = 2, at T = 0
N     L1 error    order     L2 error    order      L∞ error     order
4   4.45318e-03           5.69848e-03            2.10659e-02
5   1.11115e-03    2.003  1.46891e-03   1.956    7.11292e-03    1.566
6   1.10792e-04    3.326  1.47087e-04   3.320    8.31930e-04    3.096
7   8.05073e-06    3.783  1.70609e-05   3.108    1.51009e-04    2.462
8   1.20156e-06    2.744  1.80771e-06   3.238    1.11994e-05    3.753
 Table 2.17: Errors and orders of accuracy of u, for k = 3, at T = 1
                                 60


            N      L1 error    order       L2 error      order   L∞ error    order
             4   1.04154e-03            1.48986e-03             4.86638e-03
             5   2.18472e-04    2.253   4.12664e-04      1.852  1.88407e-03  1.369
             6   1.81491e-05    3.589   3.27455e-05      3.656  3.26242e-04  2.530
             7   2.18507e-06    3.054   5.21414e-06      2.651  7.02091e-05  2.216
             8   1.71983e-07    3.667   3.57593e-07      3.866  6.91854e-06  3.343
              Table 2.18: Errors and orders of accuracy of u, for k = 3, at T = 0
    For the same equation, initial condition, and accurate solution as described above, we
also consider the adaptive scheme. In such case we use CFL condition
                                                  CF L
                                        ∆t ≤               ,
                                                1      1
                                                   +
                                               ∆x ∆y
with CFL= 0.9. We simulate the case of maximum mesh level N = 7 or N = 8, at
final time T = 1, and different refinement parameter ε; coarsen parameter η is set to be
ε/10. We also compute convergence rate, Rε with respect to refinement parameter or error
threshold, and RDOF with respect to active degree of freedom at final time, with following
formula:
                                                log(el−1 /el )
                                 RDOF =                                                  (2.60)
                                           log(DOFl /DOFl−1 )
                                              log(el−1 /el )
                                       Rε =                                              (2.61)
                                              log(εl−1 /εl )
where el is L2 error, εl the corresponding refinement parameter ε, and DOFl the active
degrees of freedom at final time step. These results are shown in Table 2.19.
    In the following Figure 2.2, we plot the accurate solution at T = 1, i.e.
                                                          1
                              u(t = 1, x, y) =                     ,
                                                2 + sin(2π(x − y))
and the distribution of active elements on computational domain, by showing the center
of active elements, at final time T = 1, for different cases as given in Table 2.19, i.e k = 1
for N = 8 and k = 2, 3 for N = 7, with ε = 10−5 . The figures show that more active
elements are needed in the region where the numerical solution has larger magnitude,
which is expected for adaptive scheme.
                                                61


                                   ε    DOF      L2 error   RDOF     Rε
                                 1e-03   1216   3.472e-03
                                 5e-04   1848   2.671e-03   0.627  0.378
                                 1e-04   3232   9.350e-04   1.878  0.652
                k = 1, N = 8
                                 5e-05   5456   3.508e-04   1.872  1.414
                                 1e-05  10976   1.628e-04   1.099  0.477
                                 5e-06  13980   1.286e-04   0.972  0.339
                                 1e-03   900    1.367e-03
                                 5e-04   1080   5.501e-04   4.993  1.313
                                 1e-04   2268   2.874e-04   0.875  0.403
                k = 2, N = 7
                                 5e-05   2880   1.808e-04   1.941  0.669
                                 1e-05   5400   4.064e-05   2.374  0.927
                                 5e-06   6930   2.996e-05   1.222  0.440
                                 1e-03   704    5.249e-04
                                 5e-04   1120   4.092e-04   0.536  0.359
                                 1e-04   1856   8.728e-05   3.059  0.960
                k = 3, N = 7
                                 5e-05   2112   4.854e-05   4.540  0.846
                                 1e-05   3968   1.858e-05   1.523  0.597
                                 5e-06   4800   6.058e-06   5.888  1.617
              Table 2.19: L2 error and order for adaptive scheme, for T = 1
2.6.2  Accuracy and Convergence Test for nonlinear Maxwell Equation
Here we consider the nondimensionlized form of system (2.31), i.e.
                       ∂t Hz = ∂y Ex − ∂x Ey                                      (2.62a)
                       ∂t Dx = ∂y Hz                                              (2.62b)
                       ∂t Dy = −∂x Hz                                              (2.62c)
                          D = ϵ∞ E + P + a(1 − θ)|E|2 E + aθQE                    (2.62d)
                        ∂t P = J                                                  (2.62e)
                         ∂t J = −γJ − ω02 P + ωp2 E                                (2.62f)
                        ∂t Q = σ                                                  (2.62g)
                         ∂t σ = −γv σ − ωv2 Q + ωv2 |E|2                          (2.62h)
on spatial domain Ω = [0, 1]2 and time domain [0, T ], and manufactured solution with pe-
riodic boundary condition as specified below, to demonstrate accuracy and convergence
                                            62


                           Accurate Solution                                         Center of Elements for N=8, k=1
                                                                            1
          1                                                                0.8
     0.8
                                                                           0.6
     0.6                                                               y
                                                                           0.4
     0.4
          1                                                                0.2
                                                      1
                0.5
                                               0.5                          0
                y           0     0                                              0                0.5                      1
                                                x
                                                                                                   x
                    Center of Elements for N=7, k=2                                  Center of Elements for N=7, k=3
      1                                                                     1
    0.8                                                                    0.8
    0.6                                                                    0.6
y                                                                      y
    0.4                                                                    0.4
    0.2                                                                    0.2
      0                                                                     0
          0                      0.5                  1                          0                0.5                      1
                                  x                                                                x
              Figure 2.2: Accurate solution, and active elements at T = 1 for ε = 10−5 ; red
                                  circles are center of active elements.
of SGDG method. We use following functions as manufactured solution:
                                        Hz (t, x, y) = exp(cos(w(t + αx + βy))/q),                                     (2.63)
and
                          P = D = E = (β, −α)Hz ,          J = ∂t P,       Q = Hz , σ = ∂t Hz .                        (2.64)
We also define parameters as
                       ϵ∞ = 1, a = 0.01, θ = 0.125, γ0 = γv = 0.05, ω0 = ωp = ωv = 1,                                  (2.65)
                                                           63


so equations (2.62d), (2.62f), and (2.62h) needs to add extra source for balance of equa-
tions. However, the PDE part of system (2.62) is satisfied, whenever α2 + β 2 = 1. In our
example here, to impose periodic boundary condition, we choose
                              √           1          2
                     w = 2π 5,       α= √ ,     β=√ ,      q = 15.                  (2.66)
                                           5          5
To match the high order accuracy of spatial and time discretization, we first set
                                               (k+1)/2
                                      1                              1
                ∆t = CFL ×     1                    , ∆x = ∆y =     ,          (2.67)
                                            1                        2N
                                 2      +
                                    ∆x ∆y
where N is the maximum of mesh level, k is degree of Alpert’s multiwavelet functions,
and we pick the CFL number as following:
                                           
                                              0.3 k = 1
                                           
                                           
                                           
                                           
                                           
                                   CFL =      1    k=2
                                           
                                           
                                           
                                            2
                                           
                                                   k=3
We compute the errors as in Table 2.20 at final time T = 0.25, using Alternating Flux I.
We can observe reduction of order of convergence from k + 1 of regular DG method to
k + 1/2 of SGDG method by 1/2, given the property of SGDG method as discussed in e.g.
[4], Theorem 3.4. This verifies the accuracy of the program.
                                             64


N    Hz                              Ex                                    Ey
                       ∞                                   ∞
     L2              L               L2                 L                  L2              L∞
   1
P
 6   8.08e-04        2.89e-03        1.41e-03           3.97e-03           1.61e-03        5.05e-03
 7   2.71e-04 1.58   1.13e-03   1.35 5.19e-04    1.44   2.16e-03    0.88   6.11e-04   1.40 2.42e-03 1.06
 8   5.99e-05 2.18   3.64e-04   1.63 2.10e-04    1.30   9.14e-04    1.24   2.33e-04   1.39 1.10e-03 1.14
 9   2.16e-05 1.47   1.03e-04   1.82 7.72e-05    1.44   3.55e-04    1.36   8.19e-05   1.51 4.08e-04 1.43
10   6.01e-06 1.85   3.67e-05   1.49 2.78e-05    1.47   1.49e-04    1.25   2.88e-05   1.51 1.63e-04 1.33
P2
N    Hz                              Ex                                    Ey
                       ∞                                   ∞
     L2              L               L2                 L                  L2              L∞
 5   5.95e-04        1.27e-03        4.34e-04           1.09e-03           2.56e-04        8.53e-04
 6   8.23e-05 2.85   2.16e-04   2.55 7.09e-05    2.61   2.92e-04    1.90   5.13e-05   2.32 2.16e-04 1.98
 7   1.07e-05 2.94   2.81e-05   2.94 1.09e-05    2.70   4.50e-05    2.70   8.34e-06   2.62 3.84e-05 2.49
 8   1.34e-06 3.00   4.83e-06   2.54 1.69e-06    2.69   7.98e-06    2.50   1.45e-06   2.52 8.08e-06 2.25
 9   1.69e-07 2.99   5.10e-07   3.24 2.84e-07    2.57   1.48e-06    2.43   2.59e-07   2.49 1.39e-06 2.54
P3
N Hz                                 Ex                                    Ey
                       ∞                                   ∞
     L2              L               L2                 L                  L2              L∞
 5 8.97e-04          1.39e-03        5.98e-04           9.08e-04           3.00e-04        5.14e-04
 6 5.67e-05   3.98   9.02e-05   3.94 3.78e-05    3.98   6.23e-05    3.86   1.90e-05   3.98 4.43e-05 3.54
 7 3.65e-06   3.96   5.92e-06   3.93 2.44e-06    3.96   4.52e-06    3.79   1.23e-06   3.95 3.68e-06 3.59
 8 2.28e-07   4.00   3.98e-07   3.90 1.53e-07    3.99   3.63e-07    3.64   7.79e-08   3.98 3.24e-07 3.50
 9 1.43e-08   3.99   2.56e-08   3.96 9.64e-09    3.99   3.38e-08    3.43   5.06e-09   3.94 2.80e-08 3.53
                 Table 2.20: L2 and L∞ errors of Hz , Ex , Ey for Alternating Flux I case
                                                   65


2.6.3   Spatial Optical Soliton Propagation
In this subsection, we present the result of the adaptive SGDG schemes to simulate phys-
ically relevant problem of spatial optical soliton propagation. We start from the setup
of the example in the dimensional form, and show the non-dimensionalized form after-
wards based on which the actual simulation is conducted.
    We first consider the spatial optical soliton propagation in realistic glasses, character-
ized by a three-pole Sellmeier linear dispersion, an instantaneous Kerr nonlinearity and
a dispersive Raman nonlinearity as discussed in [2, 127]. The non-dimensionalized form
of the equations is the following:
                 µ0 ∂t Hz = ∂y Ex − ∂x Ey                                                    (2.68a)
                   ∂t Dx = ∂y Hz                                                             (2.68b)
                    ∂t Dy = −∂x Hz                                                            (2.68c)
                                 "           3
                                                                                 #
                                           X
                        D = ϵ0 ϵ∞ E + b          Ps + a(1 − θ)|E|2 E + aθQE                  (2.68d)
                                           s=1
                    ∂ t P s = Js ,  s = 1, 2, 3,                                             (2.68e)
                                        2          2
                     ∂t Js = −γJs − ω0s   Ps + ωps   E,     s = 1, 2, 3,                      (2.68f)
                     ∂t Q = σ                                                                (2.68g)
                      ∂t σ = −γv σ − ωv2 Q + ωv2 |E|2                                        (2.68h)
with parameters
      ω01 = 2.7537 × 1016 rad/s, ω02 = 1.6205 × 1016 rad/s, ω03 = 1.9034 × 1014 rad/s,
                                                   s
                                       2              τ12 + τ22
      a = 1.89 × 10−22 m2 /V 2 , γv = , ωv =                    , τ1 = 12.2f s, τ2 = 32.2f s,
                                       τ2               τ12 τ22
and
                                                                p
      β1 = 0.69617, β2 = 0.40794, β3 = 0.89748, ωps =            βps ω0s , γs = 0, s = 1, 2, 3,
                                   ϵ∞ = 1, b = 1, θ = 0.3.
                                               66


The physical domain is [0, 38] for x and [−3, 3] for y, in unit of µm; however, to reduce
the numerical oscillations from domain boundary, we set the computational domain to
be larger, i.e. [0, 60] × [−4, 4] instead. On the left boundary x = 0, time-dependent hard
source is injected to the magnetic field Hz , namely,
                              Hz (x = 0, y, t) = H0 sin(ωc t)sech(y/w),              (2.69)
where, ωc = 4.35 × 1015 rad/s is the carrier frequency, and w = 667 nm, H0 = 4.77 ×
107 A/m are the width and the magnitude of the incident wave, respectively.
    To better understand expected behaviour of our numerical tests, recall that in uniform
glasses, the Maxwell’s equations (2.68) with the nonlinear constitutive laws reduce to the
nonlinear Schrödinger equation (NLSE) under paraxial assumption [2]. The pulse pro-
vided by the hard source above has predictable propagation, as indicated by the accurate
solution of NLSE. In fact, the normalized NLSE of u = u(ζ, ξ), i.e.
                        ∂u    1 ∂ 2u
                      i    =         + |u|2 u,  ζ ∈ (0, +∞), ξ ∈ (−∞, +∞),           (2.70)
                        ∂ζ    2 ∂ 2ξ
with initial condition u(0, ξ) = g(ξ), ξ ∈ (−∞, +∞) admits bright soliton solutions.
Specifically, regarding our numerical tests, we considered
                         g(ξ) = g1 (ξ) := ηsech (η(ξ − ξ0 )) exp(−iΛξ − iϕ)
as initial condition, and the solution is given by
                                                                         Λ2 − η 2
                u1 (ζ, ξ) = ηsech (η(ξ − ξ0 − Λζ)) exp(−iΛξ − iϕ + i ·            ),
                                                                            2
which is called fundamental soliton; η, Λ, ϕ, ξ0 are parameters. When Λ = 0, |u1 (ζ, ξ)|
remains constant for all ζ, and one can see that the fundamental soliton propagates in
the dispersive and weakly nonlinear medium without changing its amplitude, width or
shape. To compare with following case, consider ξ0 = 0 and η = 1. Instead, when
                                      g(ξ) = g2 (ξ) := 2sech(ξ),
                                                  67


the solution becomes
                                                cosh(3ξ) + 3 exp(−4iζ) cosh(ξ)
                    u2 (ζ, ξ) := 4 exp(−iζ/2)                                      .
                                               cosh(4ξ) + 4 cosh(2ξ) + 3 cos(4ζ)
This solution is called second-order soliton, as the result of the interactions between two
fundamental solitons. We present the magnitude of u1 and u2 in the following Figure 2.3.
     3                                           1      3
                                                 0.9                                     3.5
     2                                                  2
                                                 0.8
                                                                                         3
     1                                           0.7    1
                                                                                         2.5
                                                 0.6
     0                                                  0                                2
                                                 0.5
                                                                                         1.5
    -1                                           0.4   -1
                                                 0.3                                     1
    -2                                                 -2
                                                 0.2                                     0.5
    -3                                           0.1   -3
       0         0.5           1           1.5            0        0.5          1    1.5
Figure 2.3: Magnitude |u1 (ζ, ξ)| of Fundamental Soliton and |u2 (ζ, ξ)| of Second Order
Soliton, of Solutions of NLSE (2.70).
    Before we simulate the fundamental and second order solitons, we need a final step to
non-dimensionalize the Maxwell equation (2.68). Following the procedure desribed in [1],
we choose reference space scale x0 = 8µm, reference time scale t0 = x0 /c = 2.6685×10−14 s
with c the light speed, and reference value of Hz is H0 = 4.77 × 107 A/m. If reference value
of electric field is E0 , then we have rescaled fields and constants defined as follows:
                                 p
                        (H/E0 )     µ0 /ϵ0 → H, D/(ϵ0 E0 ) → D, P/E0 → P,
                      (J/E0 )t0 → J, E/E0 → E, Q/E02 → Q, (σ/E02 )t0 → σ,
                     aE02 → a, ω∗ t0 → ω∗ , ∗ = 0s, ps, v, s = 1, 2, 3, γv t0 → γv .
Note that here we use same notations for scaled and original variables. From the scaling
                                   p
of H = Hz , we have H0 = E0 ϵ0 /µ0 = E0 ϵ0 c, so E0 = 1.797 × 1010 N/(A · s).
                                                    68


     Spatial computational domain is [0, 60] × [−4, 4] in unit of µm, or [xa , xb ] × [ya , yb ] =
[0, 7.5] × [0, 1], after nondimensionalization on both x, y direction and shifting on y direc-
tion by 0.5, on x-y plane, and computational domain is large enough in both x and y di-
rection to avoid non-physical oscillation. Initially, all the fields are set to be zero. To make
the best use of boundary condition provided as Hz (x = 0, y, t) in (2.69), we consider the
adaptive SGDG schemes with alternating numerical fluxes (2.35). Note that in these two
cases, we do not need information of E(x = 0, y, t), since the flux regarding E on x = 0
along x direction is chosen to be right limit on x = 0, whose value is evaluated within
the computational domain. We refer the interested readers to [1] for numerical bound-
ary treatments suitable for other alternating fluxes. For simplicity we will only present
the simulation results by the Alternating I numerical flux. And except for the injected
boundary on x = 0, the other three edges of boundary are absorbing boundaries based on
linearized system on characteristic direction, after neglecting the nonlinear effects of Kerr
type and Raman scattering and the linear delayed response. The dimensionless form of
these numerical flux follows as
                                                            √                      √
    1. at right boundary x = xb of x direction, (Hz +         ϵ∞ Ey )+ = (Hz +       ϵ∞ Ey )− and
               √
       (Hz − ϵ∞ Ey )+ = 0;
                                                      √                     √
    2. at left boundary y = ya of y direction, (Hz +    ϵ∞ Ex )+ = (Hz +      ϵ∞ Ex )− and (Hz −
       √
         ϵ∞ Ex )− = 0;
                                                      √                     √
    3. at left boundary y = yb of y direction, (Hz −    ϵ∞ Ex )+ = (Hz −      ϵ∞ Ex )− and (Hz +
       √
         ϵ∞ Ex )+ = 0.
     In our simulation, we use P 2 Alpert’s multiwavlet as basis of numerical solution, cou-
pling with P 3 interpolatory multiwavelet to deal with nonlinearity. The interpolatory
multiwavelet basis is chosen to be one degree higher than Alpert’s, in order to eliminate
numerical error of interpolation. Maximum mesh level is 11, i.e. the most refined ∆x
                                               69


and ∆y might be 1/211 . Error thresholds for refinement and coarsening are ε = 10−3 and
η = 10−4 , respectively. The relative tolerance of nonlinear solver of PETSc is set to be 10−9 .
    We consider first the case of fundamental soliton, where the non-dimensionalized
form of initial condition is
                              Hz (x = 0, y, t) = sin(ωc t)sech(y/w),
with rescaled ωc = 116.08 and w = 0.083375. We plot the solutions at different time,
and the centers of corresponding active elements for adaptive scheme in Figure 2.4. Note
that in this case, the soliton during the propagation maintains magnitude and width as in
first picture of Figure 2.3, while the error threshold allows error of specific magnitude to
enter into the soliton solution. The distribution of active elements also provide important
information of the soliton: more active elements accumulate around y = 0.5, where the
magnitude of the magnetic field reaches its maximum, and fewer active elements support
on region away from y = 0.5, since the soliton solution almost vanishes when y is close to
its boundary. Along x direction, active elements and their support move along x direction
as the soliton propagates. The adaptive scheme generates active elements that properly
follow the propagation of the solution.
    Next, we present several simulation results of non-dimensionalized initial condition
                              Hz (x = 0, y, t) = 2 sin(ωc t)sech(y/w)
with same ωc and w as in fundamental soliton case; see Figure 2.5. We focus on the region
when x is smaller than 1/3. This solution has changing magnitude and width, which is
different from fundamental soliton. As predicted in NLSE (2.70) and Figure 2.3, this is the
case when two fundamental solitons produce the second order soliton, with focusing and
defocusing effect. Again the support of active elements match the region with larger |Hz |
magnitude, while at the region close to boundary of y direction or of large x, the solution
almost vanishes and fewer elements are needed for good resolution of the solution.
                                                 70


            1
                                                                               1
           0.9
                                                                               0.9
           0.8
                                                                               0.8
           0.7
                                                                               0.7
           0.6
                                                                               0.6
       y   0.5
                                                                               0.5
           0.4
                                                                               0.4
           0.3
                                                                               0.3
           0.2
                                                                               0.2
           0.1                                                                 0.1
            0
                 0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1
                                             x
            1                                                                  1
           0.9                                                                 0.9
           0.8                                                                 0.8
           0.7                                                                 0.7
           0.6                                                                 0.6
       y   0.5                                                                 0.5
           0.4                                                                 0.4
           0.3                                                                 0.3
           0.2                                                                 0.2
           0.1                                                                 0.1
            0
                 0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1
                                             x
            1
                                                                               1
           0.9                                                                 0.9
           0.8                                                                 0.8
           0.7                                                                 0.7
           0.6                                                                 0.6
       y   0.5                                                                 0.5
           0.4                                                                 0.4
           0.3                                                                 0.3
           0.2                                                                 0.2
           0.1                                                                 0.1
            0
                 0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1
                                             x
Figure 2.4: Comparison for fundamental soliton between absolute value |Hz | of magnetic
field (left column) and center of active elements (right column) at different time. First
row: T = 2; Second row: T = 4; Third row: T = 6.
                                                                                     71


            1
                                                                3.5
           0.9
                                                                3
           0.8
           0.7                                                  2.5
           0.6
                                                                2
       y   0.5
                                                                1.5
           0.4
           0.3
                                                                1
           0.2
                                                                0.5
           0.1
            0
                 0   0.05   0.1   0.15       0.2   0.25   0.3
                                         x
            1
                                                                3.5
           0.9
                                                                3
           0.8
           0.7
                                                                2.5
           0.6
                                                                2
       y   0.5
           0.4                                                  1.5
           0.3
                                                                1
           0.2
                                                                0.5
           0.1
            0
                 0   0.05   0.1   0.15       0.2   0.25   0.3
                                         x
            1
                                                                4
           0.9
                                                                3.5
           0.8
                                                                3
           0.7
                                                                2.5
           0.6
       y   0.5                                                  2
           0.4
                                                                1.5
           0.3
                                                                1
           0.2
                                                                0.5
           0.1
            0
                 0   0.05   0.1   0.15       0.2   0.25   0.3
                                         x
Figure 2.5: Comparison for second order soliton between absolute value |Hz | of magnetic
field (left column) and center of active elements (right column) at different time. First
row: T = 1.976; Second row: T = 3.024; Third row: T = 3.459.
                                                                      72


                                        CHAPTER 3
       APPLICATION OF WENO METHOD IN MATHEMATICAL MODEL OF
              ASYNCHRONOUS DATA FLOW IN PARALLEL COMPUTERS
3.1    Models of Extreme Scale Computers
It has been well-established that current and future generations of extreme scale com-
puters have achieved and, for the foreseeable future, are expected to achieve increases in
performance via greater levels of parallelism at multiple levels — e.g., within the proces-
sors as well as increasing the number of processors and nodes —as opposed to increases
in clock speeds, which are expected to remain relatively flat. Additionally, extremely con-
current codes, involving dynamic parallelism and greater degrees of asynchronous paral-
lel executions, are increasingly needed to leverage this large scale parallellism [128, 129].
    As machine improvements depend on increasingly complex architectures and as ad-
ditional constraints on system development and planning (such as power consumption
[129]) arise, a need for predictive, quantitative models of computational performance will
grow greater. Previously developed modeling tools such as LogP [130] result in easily
evaluated models which can prove difficult to extend and modify. Alternatively, PRAM
models have been used as abstractions of codes; these however have scalability issues
due to the complexity of simulating them [131]. Other modern tools [132, 133] are sim-
ilarly still limited to fine-grain simulations of at most a few dozen nodes, again due to
their computational complexity during simulations.
    Core counts are now in the hundreds of thousands and millions on machines in the
TOP500 list of supercomputers; node counts consistently are in the thousands. Such num-
bers mean that fine-grained simulation tools (such as those listed above) are incapable of
describing large-scale phenomena. Alternative approaches have been proposed to ad-
dress these issues: miniapp codes can mimic key features of the performance of exascale
                                               73


codes with a much smaller codebase [134]. Aspen, a framework for performance model-
ing [135, 136], uses a domain specific language which encodes both abstracted features of
machines hardware and specific software applications to provide coarse-grained simula-
tions. However, these suffer from the need to develop specialized simulation codes which
can be problem dependent, resulting in possibly labor-intensive tools. A workflow mod-
eling apporach, Pegasus, has been developed to model workflows using a graph-theoretic
perspective to detect and manage anamolies in the computing environment [137].
    We propose developing a macroscopic model of extreme scale computers which views
such computing environments in a continuum framework. Such a model has several po-
tential benefits: in addition to being computationally tractable, it will open up the pos-
sibility of using the theoretical tools of partial differential equations to understand and
control the performance of high-performance computing systems. Specifically, our goal
is to derive a fluid-limit model of data flow — which can be described by a partial differ-
ential equation — from a simplified deterministic model of data processing and flow in
an extreme scale computer with interprocessor communications and asynchronous exe-
cutions. Fluid models, beyond their obvious utility in physical systems, have been used
to model flows in networks, such as vehicular traffic flows [138], supply chains [139], and
gas networks [140, 141]. In particular, as discussed in [142] and [139], such fluid models
lie at the end of a hierarchy of models which begin with microscopic or discrete mod-
els. That is, similar to the derivation of physical fluid laws from many body physics, one
may derive continuum-level flow equations from discrete-level models of the dynamics
of agent interactions. With such a model, standard numerical simulation tools and analyt-
ical methods may be brought to bear for studying large-scale phenomena in extreme-scale
computing.
    We begin in section 3.2 with a microscopic model of a network of processors perform-
ing a multi-stage computational task which necessitates inter-processor communications.
In section 3.3, we derive a formal asymptotic limit of this agent-based model as the scale
                                              74


of the system increases, resulting in an Eulerian fluid flow model. Along with the re-
sulting nonlinear conservation law, we present a related Hamilton-Jacobi equation and
establish the existence of solutions [103, 104] in section 3.3. In section 3.4, we illustrate the
numerical methods we use to simulate discrete and continuum model, especially WENO
interpolation for solving Hamilton-Jacobi equation [12, 11, 107, 99]. Finally in section 3.5,
we present the results of numerical experiments, to show agreement between the micro-
scopic and fluid models and then illustrate the behavior of solutions under heterogeneous
computing layouts.
     This chapter is based on published paper [10].
3.2      The Discrete Model
In this section, we introduce the microscopic model, which is based on a highly simplified,
deterministic, semi-discrete ordinary differential equation (ODE). Imagine the computer
                                         max
as a network of processors {Pi }ii=1 that are arranged in a one-dimensional, periodic lat-
tice. The computer is assigned a computational job involving a sequence of k max tasks
which are identical in the sense that each one takes the same effort to complete. This
computational job is divided by distributing data within the many processors. Denote by
qi,k (t) the amount of data in Pi that sits in stage k at time t.
     The dynamics of qi,k are given by a conservation law of the form
                 q̇i,k (t) = Fi,k−1 (t) − Fi,k (t),     k = 1, . . . , k max , i = 1, . . . imax ,    (3.1)
where Fi,k (i = 1, . . . , imax , k = 1, . . . , k max −1) is the rate of data moving in processor i from
stage k to k + 1, referred to as the throughput. At the first stage k = 1, Fi,0 (i = 1, . . . , imax )
is the rate of data being loaded into processor i to be processed, referred to as the inflow,
and at the final stage, Fi,kmax (i = 1, . . . , imax ) is the rate of data completing the final stage
of the job, referred to as the outflow. Equation (3.1) implies that the data in each processor
is neither created or destroyed, only moved in and out of the processor or in between
                                                         75


stages; that is,
                                       kXmax        !
                                    d
                                             qi,k      = Fi,0 − Fi,kmax .                (3.2)
                                    dt  k=1
Another fundamental quantity of interest in the discrete model is Qi,k (t), which is defined
as the amount of data at time t that has gone through the first k − 1 stages of Pi . For each
t ≥ 0,
                                           max
                                        kX              Z t
                           Qi,k (t) =          qi,j (t) +           Fi,kmax (s)ds.       (3.3)
                                          j=k                 0
    To determine the form of Fi,k , consider the case of with or without throttling. Without
throttling, assume data can move in one processor Pi between tasks in the rate of a given
maximum throughput ai ≥ 0, which is independent of the task number k. If we consider
throttling, Fi,k (t) does not reach the maximum ai for two reasons considered here: self-
throttling and neighbour-throttling.
   1. Self-throttling: Given an amount of data qi,k to be processed at stage k in processor
                                ..              ..              ..
                                 .               .               .
                            qi−1,k+1        qi,k+1          qi+1,k+1          k+1
                             qi−1,k          qi,k            qi+1,k             k
                            qi−1,k−1        qi,k−1          qi+1,k−1          k−1
                                 ..              ..              ..
                                  .               .               .
                               Pi−1            Pi           Pi+1
Figure 3.1: Schematic of network of processors. Dashed lines denote inter-processor com-
munications
                                                      76


   Pi , we define the self-throttling function
                                                                         
                                                                      qi,k
                                    v1 (qi,k ; q∗ ) = max 0, min 1,           ,           (3.4)
                                                                       q∗
   in the form of roof-line model. If no data is available to be processed, then Fi,k = 0;
   if the amount of data to be processed drops below a certain threshold q∗ > 0, then
   Pi cannot maintain the throughput ai and the throughput is reduced.
2. Neighbor throttling: As the computational task is not entirely parallel across pro-
   cessors, Pi requires sufficient information from its neighbors to perform task k at full
   throughput. The neighbor throttling function v2 models this dependence. It gives
   the amount of available data on Pi at stage k
                                                                                       
                                                        1                   1
        v2 (qi,k , ∆i+1,k , ∆i−1,k ; β) = min qi,k , max{∆i+1,k , 0}, max{∆i−1,k , 0} .   (3.5)
                                                        β                   β
   Here ∆i±1,k denotes the data on the right/left neighbor which is available to be used
   by Pi to process qi,k . The parameter β ∈ (0, 1] allows for the possibility that compu-
   tations do not rely in a one-to-one fashion upon the availability of data from neigh-
   bors. If ∆i±1,k = 0 the processing of data stops due to the absence of a necessary
   component of the computational task and so Fi,k = 0. Alternatively, if both ∆i+1,k
   and ∆i−1,k exceed βqi,k , then Pi has sufficient data from its neighbors to process qi,k
   and no throttling occurs.
   The data from the left/right neighbor which is available for processing at stage k is
   given by
                             ∆i±1,k = Qi±1,k − Qi,k+1 = Qi±1,k − (Qi,k − qi,k ).          (3.6)
   The data on each neighbor must have completed the same stage for it to be available;
   additionally, this data is not reused on Pi for the same stage. This means that the
   data available to be used from the neighbors can be written as above and so the
   amount of data available to be processed on Pi at stage k is given by
                                                                                     
                            v2 qi,k , Qi+1,k − Qi,k + qi,k , Qi−1,k − Qi,k + qi,k ; β .   (3.7)
                                                       77


     The throughput Fi,k is a composition of the throttling functions v1 and v2 :
                                                                                                
                  Fi,k = ai v1 v2 qi,k , Qi+1,k − Qi,k + qi,k , Qi−1,k − Qi,k + qi,k ; β ; q∗ .                 (3.8)
At first glance, this definition of Fi,k appears circular since it depends on Qi,k , which in
turn depends on Fi,kmax . However, as a consequence of the conservation law (3.2),
                                                                  kXmax            kXmax
                         Z  t                   Z  t
                              Fi,kmax (s)ds =        Fi,0 (s)ds +       qi,j (0) −       qi,j (t)               (3.9)
                          0                      0                 j=1              j=1
Thus to complete the model, we need only prescribe initial data qi,k (0) and the inflow Fi,0 .
To prescribe the inflow, we specify qi,0 and then let Fi,0 be evaluated according to (3.8).
Theorem 3.2.1 ([10], Proposition 2.1) The system (3.1) with (i) throughput Fi,k defined in (3.8)
for i = 1, . . . , imax and k = 1, . . . , k max ; (ii) prescribed initial data qi,k (0) for i = 1, . . . , imax and
k = 1, . . . , k max ; and (iii) prescribed inflow data Fi,0 for i = 1, . . . , imax and t ≥ 0 has a unique
solution for all t ≥ 0. Moreover, if qi,k (0) ≥ 0 for all i = 1, . . . , imax and k = 1, . . . , k max , then
qi,k (t) ≥ 0 for all t ≥ 0 and i = 1, . . . , imax and k = 1, . . . , k max .
3.3      The Continuum Model
In this section, we consider a formally accurate continuum model in the limit as number
of processors imax and number of tasks k max tend to infinity. So far to our knowledge, there
is no immediate conclusion of existence and uniqueness for such model, so a Hamilton-
Jacobi equation is demonstrated at the end of this section. The extensive theory of vis-
cosity solutions of Hamilton-Jacobi equation leads to promising result of existence and
uniqueness of the model. Higher dimension model can be derived in the similar manner;
for details, see [10] Section 3.3.
     For given imax , k max , we define the quantities:
                                                                               ε    k max
                              δ := (k max )−1 ,    ε := (imax )−1 ,    η :=      = max .                      (3.10)
                                                                               δ    i
                                                           78


Here, δ is the fraction of the work done in each stage and ε is the average amount of data
in a processor. In following paragraphs, we provide a brief discussion of derivation of the
continuum model, by imax and k max tending to infinity. We assume, in taking this limit,
that the job performed by the computer is fixed – that is, the total amount of work does
not change.
    To derive such a continuum model, we first express the ODE (3.1) in terms of the
following O(1) quantities:
           q∗                  qi,k                  1               ±         Ri±1,k − Ri,k                ai
    r∗ :=        ,    ri,k :=       ,    Ri,k := Qi,k ,           Di,k  := ±                       ,  αi :=    .  (3.11)
           εδ                  εδ                    ε                                  ε                   ε
Define the rescaled throttling functions
                                                                  
                                                                 r
                        w1 (r, r∗ ) = max 0, min 1,                      .                                       (3.12a)
                                                                 r∗
                                                                                                         
                    −     +
                                                   1               +            1               −
       w2 r, D , D ; η, β = min r, max{ηD + r, 0}, max{ηD + r, 0}                                                (3.12b)
                                                    β                           β
and the composite function
                       w r, D− , D+ ; r∗ , α, η, β := αw1 w2 (r, D− , D+ ; η, β); r∗ .
                                                                                                    
                                                                                                                  (3.13)
The dynamics in (3.8) can now be re-expressed in terms of the O(1) quantities in (3.11),
thereby obtaining an evolution formula for ri,k :
                                                           fi,k−1 (t) − fi,k (t)
                                             ṙi,k (t) =                                                          (3.14)
                                                                      δ
                                                                −          +
                                                                                                 
                              fi,k (t) = w ri,k (t), −Di,k        (t), Di,k  (t); r∗ , αi , η, β                  (3.15)
for i = 1, . . . , imax , k = 0, . . . , k max , and ri,0 is prescribed for i = 1, . . . , imax .
    The next step is to interpret (3.14) as a conservative finite-difference formula for a
sufficiently smooth function ρ = ρ(x, y, t), defined on [0, 1) × [0, 1] × [0, ∞), such that
                                                   ρ(xi , zk , t) = ri,k (t),                                     (3.16)
on grid points
                                         xi = (i − 0.5) ε and zk = kδ,                                            (3.17)
                                                               79


for i = 1, . . . , imax and k = 0, . . . , k max . We also let α = α(x) be a continuous function
such that α(xi ) = αi . Let the function ϕ = ϕ(x, z, t) interpolate the fluxes on the same grid
points:
                                                   ϕ(xi , zk , t) = fi,k (t),                               (3.18)
for i = 1, . . . , imax , k = 1, . . . , k max , t ≥ 0, and
                                                  Z 1                      Z t
                                 P (x, z, t) =        ρ(x, ξ, t)dξ +            ϕ(x, 1, s)ds,               (3.19)
                                                    z                       0
and we find that both
                           Φ(1) (ρ(xi , zk , t), ∂x P (xi , zk , t), ∂x2 P (xi , zk , t); r∗ , a, η, β)     (3.20)
and
                                    Φ(0) (ρ(xi , zk , t), ∂x P (xi , zk , t); r∗ , a, η, β)                 (3.21)
approximate
                                                      −           +
                                      w(ri,k (t), Di,k    (t), Di,k  (t); r∗ , αi , η, β),                  (3.22)
when 0 ≪ ε, δ ≪ 1, where
           Φ(0) (ρ, ∂x P ; r∗ , α, η, β) = w (ρ, −∂x P, ∂x P ; r∗ , α, η, β)                               (3.23a)
                                                                     ε                   ε              
     Φ(1)  ρ, ∂x P, ∂x2 P ; r∗ , α, η, β = w ρ, −∂x P + ∂x2 P, ∂x P + ∂x2 P ; r∗ , α, η, β .
                                           
                                                                                                           (3.23b)
                                                                      2                   2
Thus for 0 ≪ ε, δ ≪ 1, with η ∈ (0, ∞) fixed, the weak form of (3.14) is formally consistent
with the continuum model
      ∂t ρ + ∂z Φ(ℓ) (ρ, ∂x P, ∂x2 P ; r∗ , a, η, β) = 0,             (x, z, t) ∈ T1 × (0, 1) × (0, ∞),    (3.24a)
      ρ(x, 0, t) = ρbc (x, t),                                        (x, t) ∈ T1 × (0, ∞),                (3.24b)
      ρ(x, z, 0) = ρ0 (x, z),                                         (x, z) ∈ T1 × (0, 1)                 (3.24c)
where
                                 Z  1                     Z  t
                P (x, z, t) =         ρ(x, ξ, t)dξ +           ϕ(ℓ) (x, 1, s)ds,                           (3.25a)
                                  z                        0
            ϕ(ℓ) (x, z, t) = Φ(ℓ) ρ(x, z, t), ∂x P (x, z, t), ∂x2 P (x, z, t); r∗ , α, η, β ,
                                                                                                       
                                                                                                           (3.25b)
                                                               80


and Φ(ℓ) , ℓ ∈ {0, 1}, is given in (3.23). For the sake of compactness, we have slightly
abused notation in (3.24a), as the definition of Φ(0) is independent of ∂x2 P . Additionally,
we have identified [0, 1) with the one-dimensional torus T1 in order to reflect the periodic
layout of the processors.
    As in the discrete case, it may appear that the model in (3.24) is circular due to the
definition of P in (3.25a). However, as with F in (3.9), Φ(ℓ) can be unwrapped, this time
using the conservation law (3.24a); that is
               Z t                        Z t                   Z   1                Z   1
                     (ℓ)
                   ϕ (x, 1, s)ds =            ϕ(x, 0, s)ds +          ρ0 (x, ξ)dξ −        ρ(x, ξ, t)dξ   (3.26)
                0                          0                      0                    0
Thus the continuum model is complete once initial condition ρ0 and inflow condition
ϕbc := ϕ(·, 0, ·) are specified. In practice, ρbc is prescribed and then ϕbc is evaluated using
(3.25b) and (3.23).
    Furthermore, integrating (3.24a) with respect to z gives
                               Z 1
                            ∂t      ρ(x, ξ, t)dξ + Φ(ℓ) (x, 1, t) − Φ(ℓ) (x, z, t) = 0                    (3.27)
                                 z
Meanwhile, differentiating (3.25a) gives
                                                    Z   1
                               ∂t P (x, z, t) = ∂t        ρ(x, ξ, t)dξ + Φ(ℓ) (x, 1, t)                   (3.28)
                                                      z
Combining (3.27) and (3.28) and using the fact that ρ = −∂z P gives a closed Hamilton-
Jacobi equation for P with initial and boundary conditions that are derived by applying
the definition of P in (3.25a) to (3.24c) and (3.24b), respectively. The complete model is,
for some T > 0,
        ∂t P − Φ(ℓ) (−∂z P, ∂x P, ∂x2 P ; r∗ , α, η, β) = 0,          (x, z, t) ∈ T1 × (0, 1) × (0, T ), (3.29a)
                 Z 1                    Z t
   P (x, 0, t) −      ρ0 (x, ξ)dξ −         ϕbc (x, s)ds = 0,                     (x, t) ∈ T1 × (0, T ), (3.29b)
                   0                     0
                                         Z 1
                         P (x, z, 0) −        ρ0 (x, ξ)dξ = 0,                    (x, z) ∈ T1 × (0, 1),  (3.29c)
                                          z
where (3.29b) is derived by integrating (3.24b) over z ∈ (0, 1) and applying (3.26).
    The existence and uniqueness of viscosity solution follows:
                                                           81


                            −ηD = r − βr∗                r     ηD = r − βr∗
                     −ηD = r                                                ηD = r
                                                   Ω1      Ω1
                                        Ω3                       Ω4
                                                                          r = r∗
                                                 Ω3 Ω2 Ω2 Ω4
                         −ηD                                             ηD
                          1−β
                               =r                                       1−β
                                                                             =r
                                                                        D
                                Figure 3.2: Flux Φ(0) defined in (3.30)
Theorem 3.3.1 ([10], Theorem 3.1) Assume that α and ρ0 are (i) non-negative, (ii) uniformly
Lipschitz in their arguments, and (iii) periodic in x (that is, α(0) = α(1) and ρbc (0, t) = ρbc (1, t)
                                                                 RT
for all t ∈ [0, T ]). Further, assume that there is an M where 0 ϕbc (x, s)ds ≤ M for all x ∈ T1 .
Then there exists a unique, continuous, viscosity solution (in the sense of [103]) to (3.29).
    We use the flux function Φ(0) for all of the numerical simulations in Section 3.5. This
function is a piecewise constant that can be expressed in the following form:
                                                   
                                                   
                                                   
                                                   
                                                    α           (r, D) ∈ Ω1 ,
                                                   
                                                   
                                                     αr
                                                                 (r, D) ∈ Ω2 ,
                                                   
                                                   
                                                   r
                                                   
                                                       ∗
                          Φ(0) (r, D; r∗ , α, β) = α(r + ηD)                                   (3.30)
                                                   
                                                                (r, D) ∈ Ω3 ,
                                                          βr∗
                                                   
                                                   
                                                   
                                                   
                                                      α(r − ηD)
                                                   
                                                   
                                                                 (r, D) ∈ Ω4 ,
                                                   
                                                   
                                                   
                                                          βr∗
where the subdomains Ωi are depicted in Figure 3.2.
3.4     Numerical Methods for Simulations
In this section, we perform numerical simulations of the one dimensional processor sys-
tem in order to (i) test the ability of the macroscopic model to approximate the discrete
model when ε and δ are small and (ii) explore how model parameters affect the model out-
put. Problem data is specified in terms of continuum models quantities. These quantities
are translated back to discrete model quantities in order to implement ODE simulations.
                                                     82


3.4.1   ODE Implementation
The explicit two-step Adams-Bashforth (Section III of [143]) is used to simulate the dis-
crete model formed by (3.1), (3.3), and (3.8). Given η, values imax and k max are chosen
so that k max /imax = η as in (3.10). We then compute a solution to the discrete model as
follows. Using (3.11) and (3.16), we convert r∗ , a, ρ0 , ρbc to their discrete counterparts:
       q∗ = εδr∗ ,     qi,k (0) = εδρ0 (xi , zk ),        qi,0 (t) = ρbc (xi , t), ai = εα(xi ). (3.31)
This discrete model data is used to set the time step:
                                                          q∗
                                    ∆t =                   √              .                     (3.32)
                                            2 maxi ai           imax k max
The outflow at Fi,kmax is tracked and accumulated over time in order to compute Qi,k from
(3.3). At the final time T , the result of the explicit time stepping is converted back, via the
formula in (3.11), i.e., ri,k (T ) = (εδ)−1 qi,k (T ). In order to compare this against solutions to
the continuum model (see below), we use these point-wise values to generate a piecewise
constant function r over the cells Ci,k = (xi − .5ε, xi + .5ε) × (zk , zk + δ):
                                                  X
                                     r(x, z) =          ri,k χCi,k (x, z).                       (3.33)
                                                   i,k
3.4.2   Hamilton-Jacobi Implementation
The Hamilton Jacobi equation (3.29) is solved numerically using a fifth-order WENO in-
terpolation in x and z and the optimal third-order SSP Runge-Kutta method for time
integration. Details of these algorithms can be found in Sections 3.2 and 6, respectively,
of [11]. Once a numerical solution for P is computed, we again use WENO interpolation
to approximate ρ via the relation ρ(x, z, t) = −∂z P (x, z, t).
    To condense the notation, let σ = ∂x P , τ = ∂z P and υ = ∂xx P . Then for fixed r∗ , α, η, β,
and ℓ, let H(σ, τ, υ) = −Φ(ℓ) (−τ, σ, υ; r∗ , a, η, β). The numerical solution for P is computed
                                                       83


on a grid {xn , zm } where
              xn = n∆x,                    n = 1, 2, . . . , N,                       ∆x = N −1 ,        (3.34)
              zm = m∆z,                    m = 1, 2, . . . , M,                       ∆z = M −1 .        (3.35)
The semi-discrete method for the grid function Pn,m (t) ≈ P (xn , zm , t) is
                            d                     −       +      −       +
                              Pn,m (t) = −Ĥ(σn,m     , σn,m  , τn,m  , τn,m ; υn,m ),                   (3.36)
                           dt
                                                                                ±           ±
where the numerical approximations σn,m        ±
                                                    ≈ σ(x±   n , zm ) and τn,m ≈ τ (xn , zm ) are obtained
via WENO interpolation and υn,m ≈ υ(xn , zm ) is computed by central difference. The
numerical flux function Ĥ, based on the global Lax-Friedrichs flux:
                                  σ− + σ+ τ − + τ +
                                                           
      −        −                                                  1                      1
          +        +
 Ĥ(σ , σ , τ , τ ; υ) = H                  ,           , υ − λx (σ + − σ − ) − λz (τ + − τ − ), (3.37)
                                       2         2                2                      2
where
                                               αη                                  α
                          λx = max |Hσ | =         ,     λz = max |Hτ | =              .                 (3.38)
                                  σ,τ          βr∗                σ,τ             βr∗
The time step for the SSP integrator is given by
                                              λx       λz
                                                           
                                         ∆t       +            ≤ 0.6.                                    (3.39)
                                              ∆x ∆z
3.4.3   WENO Interpolation
Here we demonstrate how to compute σ ± and τ ± using fifth order WENO interpolation
                                              ±
[11]. For simplicity, we only consider σi,j      for fixed i, j, which approximate ∂x P (x, zj ) from
                                                              −
the left or right limit of x = xi , and we consider σi,j         first. For fixed i, j, each of stencils
      X0 = {xi−3 , xi−2 , xi−1 , xi }, X1 = {xi−2 , xi−1 , xi , xi+1 }, X2 = {xi−1 , xi , xi+1 , xi+2 },
and function values of P (·, zj ) evaluated on the stencil, we can find a Lagrangian interpo-
lating polynomial Qk of degree 3, k = 0, 1, 2, i.e.
                          Qk (xi−3+k+r ) = P (xi−3+k+r , zj ),         r = 0, 1, 2, 3.
                                                     84


Then
                                           −,k       d
                                          σi,j  :=     Qk (x)
                                                    dx          x=xi
                                             −,k
approximate ∂x P (x−  i , zj ); note that σi,j , k = 0, 1, 2 are the three possible interpolations of
third order ENO procedure. Let
                          ∆+x Pi,j := Pi+1,j − Pi,j = P (xi+1 , zj ) − P (xi , zj )
denote the standard forward difference operator; here we omit variable t of function P
for convenience. Direct calculations imply
                           −,0        1 ∆+x Pi−3,j     7 ∆+            +
                                                           x Pi−2,j ∆x Pi−1,j
                         σi,j    =                  −                                              (3.40)
                                      3 ∆x             6 ∆x             ∆x
                           −,1        −1 ∆+    P
                                             x i−2,j     5 ∆  +
                                                                P
                                                              x i−1,j      1 ∆+
                                                                              x Pi,j
                         σi,j    =                    +                +                           (3.41)
                                        6      ∆x        6 ∆x              3 ∆x
                                          +                +              +
                           −,2        1 ∆x Pi−1,j 5 ∆x Pi,j           1 ∆x Pi+1,j
                         σi,j    =                  +             −               .                (3.42)
                                      3 ∆x             6 ∆x           6 ∆x
                                                                −,k
WENO procedure uses a convex combination of σi,j                    , k = 0, 1, 2 as final approximation
    −
of σi,j , i.e.
                                    −           −,0        −,1          −,2
                                   σi,j = w0 σi,j   + w1 σi,j   + w2 σi,j   ,                      (3.43)
where w0 , w1 , w2 ≥ 0 are nonlinear weights whose sum is 1. The key ingredient of WENO
interpolation is to choose these nonlinear weights in the following way:
   1. In smooth regions, w0 , w1 , w2 should be very close to the optimal linear weights:
                   w0 = 0.1 + O(∆x2 ), w1 = 0.6 + O(∆x2 ), w2 = 0.3 + O(∆x2 ),
                        −
        which makes σi,j   defined by (3.43) fifth order accurate on stencil {xi−3 , xi−2 , . . . , xi+2 }
        in approximating ∂x P (xi , zj ) in smooth regions of P (·, zj );
   2. When stencil Xk contains discontinuity in the x derivative of P , the corresponding
        weight wk should be close to zero, so that stencil on non-smooth region has little
                           −
        contribution to σi,j  . The choice of weight in [107] satisfies wk = O(∆x4 ) in this case.
                                                      85


To find such nonlinear weight, a smoothness indicator is introduced to measure how
smooth the function being interpolated is inside the interpolation stencil. As illustrated
in [107], the smoothness indicator is a scaled sum of the squares of the L2 norms of the
second and higher derivatives of the interpolation polynomial on the target interval, i.e.
                                     IS0 = 13(a − b)2 + 3(a − 3b)2 ,                                       (3.44)
                                     IS1 = 13(b − c)2 + 3(b + c)2 ,                                        (3.45)
                                     IS2 = 13(c − d)2 + 3(3c − d)2 ,                                       (3.46)
where
                       ∆2x Pi−2,j             ∆2x Pi−1,j           ∆2x Pi,j         ∆2x Pi+1,j
                  a=               ,   b=                ,   c=             , d=               ,           (3.47)
                          ∆x2                   ∆x2                 ∆x2                ∆x2
and ∆2x represents the central difference, i.e.
                                      ∆2x Pi,j = Pi+1,j − 2Pi,j + Pi−1,j .
Then the nonlinear weights are
                                                   w
                                                   ek
                                    wk =                      ,   k = 0, 1, 2,
                                             w
                                             e0 + we1 + w  e2
where we usually choose ε = 10−6 , and
                                   1                        6                        3
                     w
                     e0 =                  ,   w
                                               e1 =                 ,   we2 =              .
                            (ε + IS0 )2               (ε + IS1 )2              (ε + IS2 )2
We obtain the fifth order WENO approximation as
                      ∆+               ∆+                ∆+           ∆+
                                                                                
       −                x Pi−2,j          x Pi−1,j         x Pi,j       x Pi+1,j
      σi,j = 12 −                 +7                +7            −                − ΦW EN O (a, b, c, d),
                         ∆x                 ∆x             ∆x            ∆x
with
                                                                                
                  W EN O                   1                        1         1
                Φ        (a, b, c, d) := w0 (a − 2b + c) +              w2 −       (b − 2c + d),
                                           3                        6         2
and symmetrically, the approximation to the right derivative is
                      ∆+               ∆+                ∆+           ∆+
                                                                                
       +                x Pi−2,j          x Pi−1,j         x Pi,j       x Pi+1,j
      σi,j = 12 −                 +7                +7             −               − ΦW EN O (e, d, c, b),
                         ∆x                 ∆x             ∆x            ∆x
with a, b, c, d in (3.47) and
                                                      ∆2x Pi+2,j
                                                 e=               .
                                                          ∆x2
                                                         86


3.5              Numerical Experiments
We perform a sequence of exploratory experiments below, modifying the parameters η
and β, as well as the throughput function α. In all cases, α, ρ0 , and ρbc are periodic with
respect to x and the parameter r∗ = 1. Results are presented as two-dimensional color
maps or line-outs in the z direction. In all figures, the horizontal axis corresponds to the
z-axis. Profiles of α for each experiment are depicted in fig. 3.3.
     1                                                1                                                 1                                               1
    0.8                                              0.8                                               0.8                                             0.8
    0.6                                              0.6                                               0.6                                             0.6
x                                                x                                                 x                                               x
    0.4                                              0.4                                               0.4                                             0.4
    0.2                                              0.2                                               0.2                                             0.2
     0                                                0                                                 0                                               0
           0.6   0.7   0.8       0.9   1   1.1             0.6   0.7   0.8       0.9   1    1.1              0.6   0.7   0.8       0.9   1   1.1             0.6   0.7   0.8       0.9   1   1.1
                             α                                               α                                                 α                                               α
           (a) Example 1                               (b) Examples 2-3                                      (c) Example 4                                   (d) Example 5
Figure 3.3: Profiles of the processor speed α used in the numerical experiments. The non-
standard orientation of the graphs is set to match the axes in the numerical results that
follow.
Example 3.5.1 (Agreement between models) The purpose of this example is to demonstrate
that the macroscopic model approximates the microscopic model when ε and δ are sufficiently
small. We set β = 1 and consider η ∈ {0.2, 1, 5}. The initial condition, boundary condition, and
processor speed are given by
          ρ0 (x, z) = 1.5 (sin(2πz))6 χ[0,0.5] (z),                                        ρbc (x, t) = 0,               α(x) = 1 − 0.4(sin(πx))2 ,                                      (3.48)
respectively. Both models are simulated up to a final time t = 0.5.
          For this example, the Hamilton-Jacobi simulation is performed with a 1000 × 1000 mesh and
a time step chosen according to (3.39) in order to generate a highly resolved numerical solution
of the macroscopic model. For the microscopic model, we use imax = 1000 and k max = 200 when
η = 0.2, imax = k max = 500 when η = 1, and imax = 200 and k max = 1000 when η = 5. These
solutions to the microscopic model are then used to obtain the piecewise-constant function r on the
1000 × 1000 mesh from the Hamilton-Jacobi simulation.
                                                                                                  87


          Numerical results for η = 0.2, η = 1.0, and η = 5.0 are shown in fig. B.1, fig. B.2, and
fig. B.3, respectively. While the results demonstrate general qualitative agreement between the
models, discrepancies develop over time, especially for smaller values of η; see figs. B.1i and B.1l.
For the worst case scenario (η = 0.2),we note that the k max is relatively small, suggesting that the
asymptotic analysis is less relevant. Indeed, in this case the microscopic model displays a diffusive
behavior similar to that encountered with under-resolved advective numerical schemes. In response
to this error, we increase the size of the discrete model by a factor of 2.5 ( giving imax = 2500 and
k max = 500), at which point the discrepancy between models decreases noticeably; see the first row
of fig. 3.4. It is possible that an increase in the order of the approximation of the model with respect
to z-advection (similar to that used to obtain Φ(0) ) may be needed to avoid such cases for low η
models with relatively low k max . Such a modification remains open.
                                                                                                          1.5
                                                                                                                                                r
     1                                        1.5                                                                                               ρ
                                                         1
    0.8                                                 0.8                                        0.2
                                                                                                           1
                                              1
    0.6                                                 0.6
x                                                   x                                              0
    0.4                                                 0.4
                                              0.5                                                         0.5
    0.2                                                 0.2                                        −0.2
     0                                        0          0
          0   0.2   0.4       0.6   0.8   1                   0    0.2   0.4       0.6   0.8   1
                          z                                                    z                           0
                                                                                                                0   0.2   0.4       0.6   0.8   1
              (a) r at t = 0.5                                    (b) ρ − r at t = 0.5                                          z
                                                                                                            (c) ρ,r at (x, t) = (0.3, 0.5)
Figure 3.4: Comparison of the discrete model with (imax , k max ) = (2500, 500) and the con-
tinuum model for η = 0.2 at time t = 0.5. As expected, the discrete model shows better
agreement with the continuum model than the previous version with only (imax , k max ) =
(1000, 200) processors and stages; cf. fig. B.1
          For the remaining examples, the Hamilton-Jacobi simulations are performed on a
coarser mesh of 100 × 100.
Example 3.5.2 (Variations in η) In this example, we examine the effect of η on solutions to the
macroscopic model while β = 1.0 is fixed. The initial condition, boundary condition, and processor
                                                                               88


speed are given by
              ρ0 (x, z) = 1.5χz≤0.2 (x, z) ρbc (x, t) = 0,    α(x) = 1 − 0.4(sin(πx))6 ,       (3.49)
respectively. It is expected that the slower processor speed around x = 0.5 will slow down neigh-
boring processors due to neighbor-based throttling, encoded in the definition of w2 in (3.12b).
Moreover, the effect should become more global in x as η increases, since larger values of η cor-
respond to a larger number of stages per processor. Indeed as the stages increase, interactions
between neighbors begin to have a cumulative global effect. This trend can be observed by compar-
ing results across the first three rows of fig. B.4 and in the line-outs in the final row.
Example 3.5.3 (Variations in β) In this example, we examine the effect of β on solutions to the
macroscopic model, while holding η = 1.0 fixed. The initial condition, boundary condition, and
processor speed are again given by (3.49).
     Based on the definition of the function w2 in (3.12b), the expectation is that smaller values of
β will lead to reduced throttling effects. Such behavior is confirmed by the numerical results in
fig. B.5.
Example 3.5.4 (Highly localized slowdown) In this example, we explore the effects of a highly
localized slowdown in processor speed when η = β = 1. The initial and boundary conditions are
given in (3.49), while the processor speed is given by α(x) = 1 − 0.4c(x), where
                                       
                                       
                                       
                                       
                                       
                                         0             |x − .5| > .05
                                       
                                       
                                       
                                       
                                       40x − 18
                                                       x ∈ [.45, .475]
                               c(x) =                                     .                    (3.50)
                                       
                                       
                                       
                                       
                                         −40x + 22 x ∈ [.525, .55]
                                       
                                       
                                       
                                       
                                       1
                                                       |x − .5| < .025
In particular, α ̸= 1 only on the interval (0.45, 0.55). Simulation results from this example are
shown in fig. 3.5. At early times, slower processors in the center of the x domain prohibit neigh-
boring processors from moving data to later stages of the calculation (i.e. along the z-direction).
                                                    89


The result is a buildup of data in the neighboring processors. As time progresses, the build-up
of data spreads as throttled processors near the initial slowdown around x = 0.5 begin to effect
neighbors further away. Eventually these buildups dissipate as the slower processors begin catch
up with their throttled neighbors.
      1                                        1.5         1                                        1.5        1                                          1.5
     0.8                                                  0.8                                                 0.8
                                               1                                                    1                                                     1
     0.6                                                  0.6                                                 0.6
 x                                                    x                                                   x
     0.4                                                  0.4                                                 0.4
                                               0.5                                                  0.5                                                   0.5
     0.2                                                  0.2                                                 0.2
      0                                        0           0                                        0          0                                          0
           0   0.2   0.4       0.6   0.8   1                    0   0.2   0.4       0.6   0.8   1                   0   0.2   0.4       0.6   0.8   1
                           z                                                    z                                                   z
                (a) ρ at t = 0.1                                    (b) ρ at t = 0.25                                    (c) ρ at t = 0.5
                               Figure 3.5: The effect of a highly localized slowdown on ρ
Example 3.5.5 (Long-term behavior) In previous examples, we have observed that under some
conditions, solutions eventually resemble a traveling profile of the form
                                                     ρ∗ (x, z, t) = χ[ζ0 (x),ζ1 (x)] (z − st),                                                          (3.51)
where s is a positive constant and the profiles ζ0 and ζ1 are constant in time and satisfy ζ0 (x) <
ζ1 (x) for all x ∈ [0, 1). Our intuition is that for a wide range of conditions, traveling profiles of
this type will arise after sufficiently long times, if the z domain is extended to (0, ∞). Moreover
the shape of ζ1 and ζ2 is closely related to the initial data and the shape of α. A more systematic
study of such profiles in special case can be found in [144]. Rather than make a precise conjecture
at this point, we instead provide an example which further demonstrates our intuition. Initial
and boundary conditions are given in (3.49). Because the domain in z is limited, we introduce
relatively small variations in α, which allow the system to settle faster:
                                                           α(x) = 1 + 0.1 cos(4πx).                                                                     (3.52)
Simulation results for this example are presented in fig. 3.6. When t = 0.5, the solution has
nearly settled to a profile of the form (3.51), with cusps that appear where the waves caused by
                                                                                90


throttling meet, at x = 0.5 and at the periodic boundary. We see then at t = 0.75 that this profile
is maintained, with cusps at the same location, and again at t = 1 (after extending the z domain).
In particular the solution has the periodicity of α.
      1                                             1.5            1                                                 1.5        1                                         1.5
     0.8                                                         0.8                                                           0.8
                                                    1                                                                1                                                    1
     0.6                                                         0.6                                                           0.6
 x                                                           x                                                             x
     0.4                                                         0.4                                                           0.4
                                                    0.5                                                              0.5                                                  0.5
     0.2                                                         0.2                                                           0.2
      0                                             0              0                                                 0          0                                         0
           0   0.2   0.4       0.6   0.8      1                        0   0.2   0.4       0.6   0.8       1                         0    0.2   0.4       0.6   0.8   1
                           z                                                           z                                                              z
               (a) ρ at t = 0.1                                            (b) ρ at t = 0.25                                               (c) ρ at t = 0.5
                                     1                                            1.5            1                                               1.5
                                   0.8                                                         0.8
                                                                                  1                                                              1
                                   0.6                                                         0.6
                               x                                                           x
                                   0.4                                                         0.4
                                                                                  0.5                                                            0.5
                                   0.2                                                         0.2
                                     0                                            0              0                                               0
                                         0   0.2   0.4       0.6   0.8      1                        0.6       0.8         1   1.2       1.4
                                                         z                                                                 z
                                             (d) ρ at t = 0.75                                             (e) ρ at t = 1
Figure 3.6: The effect of small variation in processor speed on ρ. After sufficient time, a
profile emerges with the periodicity of α.
                                                                                       91


                                         CHAPTER 4
                                      CONCLUSIONS
In the thesis, we discuss several numerical methods, their applications, and implementa-
tions. For Sparse Grid Discontinuous Galerkin (SGDG) method, we focus on its adaptive
version [3, 4, 5, 6, 7, 8], with Alpert’s multiwavelet as basis for DG space, interpolatory
multiwavelet and collocation method to treat nonlinear terms, fast wavelet transform di-
mension by dimension to reduce computational cost, and nonlinear solver in PETSc [9]
for implicit time stepping. We apply this method on Maxwell equation in one or two
dimension in nonlinear optical media [120, 121, 111, 122]. An energy stable Discontin-
uous Galerkin method [1, 2] is employed for this problem, and in our work, coupling
with adaptive SGDG scheme package [8] using PETSc [9]. Numerical experiments of
model problems show expected accuracy and convergence rate, and adaptive scheme
with proper choice of error threshold can choose a set of active elements in hash table to
represent numerical solutions, without significant degeneration of numerical accuracy or
generating non-physical numerical solutions or oscillations. In future work, data struc-
ture of parallel computation in PETSc, e.g. DMPlex to deal with unstructured mesh, and
IS and PetscSection for indexing, can provide a solution to compute numerical solution of
SGDG method in core parallelism. The usage of PETSc also provides a solution to SGDG
method with fully implicit time stepping, broadening the availability of SGDG method.
We can also explore different standard for refinement and coarsening step of adaptive
method, to better specify active elements in adaptive scheme.
    For Weighted Essentially Non-Oscillatory (WENO) method, we consider the fifth or-
der WENO interpolation, which maintains high order accuracy on smooth region, while
any singularity does not significantly affect the numerical interpolation [107, 12]. WENO
interpolation and strong stability preserving (SSP) time stepping provide promising fully
discrete numerical method to solve Hamilton-Jacobi equation [11]. This method is ap-
                                              92


plied to solve the Hamilton-Jacobi continuum model of extreme scale parallel computers,
which is derived from a discrete model by taking limits on number of processors and
tasks. Numerical experiments of both models show that this continuum model can cap-
ture the asymptotic behaviour of the discrete model. Additionally, these experiments
provide an initial understanding of solutions’ dependence on parameters associated with
the parallelism of the modelled computation as well as the effects heterogeneities in pro-
cessing capacity [10]. In future work, we can explore control strategies for α that can
alleviate bottlenecks caused by local slowdowns in the processor speed. We can also ex-
tend the model to allow for more complicated interactions, including stochastic effects,
and explore strategies for optimal communication. Finally, the parameters of the model
can be taken from processor components of a real supercomputer, and we can determine
if the macroscopic model predicts the real global behaviour of the supercomputer.
                                            93


APPENDICES
    94


                                                                         APPENDIX A
                   NUMERICAL SIMULATION OF 1D SOLITON PROPAGATION
          1                                                     1                                                     1
                                               T=40                                                  T=40                                                  T=40
        0.8                                    T=80           0.8                                    T=80           0.8                                    T=80
        0.6                                                   0.6                                                   0.6
        0.4                                                   0.4                                                   0.4
        0.2                                                   0.2                                                   0.2
    E     0                                               E     0                                               E     0
        -0.2                                                  -0.2                                                  -0.2
        -0.4                                                  -0.4                                                  -0.4
        -0.6                                                  -0.6                                                  -0.6
        -0.8                                                  -0.8                                                  -0.8
         -1                                                    -1                                                    -1
               0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1
                               x                                                     x                                                     x
          1                                                     1                                                     1
                                               T=40                                                  T=40                                                  T=40
        0.8                                    T=80           0.8                                    T=80           0.8                                    T=80
        0.6                                                   0.6                                                   0.6
        0.4                                                   0.4                                                   0.4
        0.2                                                   0.2                                                   0.2
    E     0                                               E     0                                               E     0
        -0.2                                                  -0.2                                                  -0.2
        -0.4                                                  -0.4                                                  -0.4
        -0.6                                                  -0.6                                                  -0.6
        -0.8                                                  -0.8                                                  -0.8
         -1                                                    -1                                                    -1
               0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1
                               x                                                     x                                                     x
          1                                                     1                                                     1
                                               T=40                                                  T=40                                                  T=40
        0.8                                    T=80           0.8                                    T=80           0.8                                    T=80
        0.6                                                   0.6                                                   0.6
        0.4                                                   0.4                                                   0.4
        0.2                                                   0.2                                                   0.2
    E     0                                               E     0                                               E     0
        -0.2                                                  -0.2                                                  -0.2
        -0.4                                                  -0.4                                                  -0.4
        -0.6                                                  -0.6                                                  -0.6
        -0.8                                                  -0.8                                                  -0.8
         -1                                                    -1                                                    -1
               0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1
                               x                                                     x                                                     x
Figure A.1: Transient fundamental (M = 1) temporal soliton propagation with the adap-
tive Sparse Grid DG fully implicit scheme, in Subsection 2.5.2. Maximum mesh level
N = 11. Refinement threshold ε = 10−5 , and coarsen threshold is η = 10−6 . First column:
piecewise polynomial degree k = 1; second column: k = 2; third column: k = 3. First
row: alternating flux I; second row: alternating flux II; third row: upwind flux. Solutions
at time T = 40 and T = 80 are plotted on each case.
                                                                                 95


         12                                                        12                                                                                                                  12                                                                                 12
          9                                                         9                                                                                                                      9                                                                               9
          6                                                         6                                                                                                                      6                                                                               6
          3                                                         3                                                                                                                      3                                                                               3
          0                                                         0                                                                                                                      0                                                                               0
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                      0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
          1                                                         1                                                                                                                      1                                                                               1
                                               T = 40.0                                                                      T = 80.0                                                                                                                 T = 40.0                                                  T = 80.0
        0.5                                                       0.5                                                                                                                 0.5                                                                                0.5
    E     0                                                   E     0                                                                                                            E         0                                                                         E     0
        -0.5                                                      -0.5                                                                                                               -0.5                                                                                -0.5
         -1                                                        -1                                                                                                                  -1                                                                                 -1
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                      0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
                               x                                                                      x                                                                                                                    x                                                                    x
                    (a) k = 1, Alternating flux I                                                                                                                                                         (b) k = 2, Alternating flux I
                                                                                   12                                                                                  12
                                                                                       9                                                                                9
                                                                                       6                                                                                6
                                                                                       3                                                                                3
                                                                                       0                                                                                0
                                                                                           0         0.2         0.4         0.6            0.8              1               0       0.2           0.4         0.6         0.8              1
                                                                                       1                                                                                1
                                                                                                                                                  T = 40.0                                                                       T = 80.0
                                                                                  0.5                                                                                 0.5
                                                                             E         0                                                                          E     0
                                                                                 -0.5                                                                                 -0.5
                                                                                   -1                                                                                  -1
                                                                                           0         0.2         0.4         0.6            0.8              1               0       0.2           0.4         0.6         0.8              1
                                                                                                                       x                                                                                  x
                                                                                                      (c) k = 3, Alternating flux I
Figure A.2: Comparison of active elements and electric field solutions, at T = 40 and
T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.1.
                                                                                                                                                                 96


         12                                                        12                                                                                                                  12                                                                                 12
          9                                                         9                                                                                                                      9                                                                               9
          6                                                         6                                                                                                                      6                                                                               6
          3                                                         3                                                                                                                      3                                                                               3
          0                                                         0                                                                                                                      0                                                                               0
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                      0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
          1                                                         1                                                                                                                      1                                                                               1
                                               T = 40.0                                                                      T = 80.0                                                                                                                 T = 40.0                                                  T = 80.0
        0.5                                                       0.5                                                                                                                 0.5                                                                                0.5
    E     0                                                   E     0                                                                                                            E         0                                                                         E     0
        -0.5                                                      -0.5                                                                                                               -0.5                                                                                -0.5
         -1                                                        -1                                                                                                                  -1                                                                                 -1
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                      0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
                               x                                                                      x                                                                                                                    x                                                                    x
                   (a) k = 1, Alternating flux II                                                                                                                                                        (b) k = 2, Alternating flux II
                                                                                   12                                                                                  12
                                                                                       9                                                                                9
                                                                                       6                                                                                6
                                                                                       3                                                                                3
                                                                                       0                                                                                0
                                                                                           0         0.2         0.4         0.6            0.8              1               0       0.2           0.4         0.6         0.8              1
                                                                                       1                                                                                1
                                                                                                                                                  T = 40.0                                                                       T = 80.0
                                                                                  0.5                                                                                 0.5
                                                                             E         0                                                                          E     0
                                                                                 -0.5                                                                                 -0.5
                                                                                   -1                                                                                  -1
                                                                                           0         0.2         0.4         0.6            0.8              1               0       0.2           0.4         0.6         0.8              1
                                                                                                                       x                                                                                  x
                                                                                                     (c) k = 3, Alternating flux II
Figure A.3: Comparison of active elements and electric field solutions, at T = 40 and
T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.1.
                                                                                                                                                                 97


         12                                                        12                                                                                                                  12                                                                                 12
          9                                                         9                                                                                                                      9                                                                               9
          6                                                         6                                                                                                                      6                                                                               6
          3                                                         3                                                                                                                      3                                                                               3
          0                                                         0                                                                                                                      0                                                                               0
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                      0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
          1                                                         1                                                                                                                      1                                                                               1
                                               T = 40.0                                                                      T = 80.0                                                                                                                 T = 40.0                                                  T = 80.0
        0.5                                                       0.5                                                                                                                 0.5                                                                                0.5
    E     0                                                   E     0                                                                                                            E         0                                                                         E     0
        -0.5                                                      -0.5                                                                                                               -0.5                                                                                -0.5
         -1                                                        -1                                                                                                                  -1                                                                                 -1
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                      0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
                               x                                                                      x                                                                                                                    x                                                                    x
                         (a) k = 1, Upwind flux                                                                                                                                                                      (b) k = 2, Upwind flux
                                                                                   12                                                                                  12
                                                                                       9                                                                                9
                                                                                       6                                                                                6
                                                                                       3                                                                                3
                                                                                       0                                                                                0
                                                                                           0         0.2         0.4         0.6            0.8              1               0       0.2           0.4         0.6         0.8              1
                                                                                       1                                                                                1
                                                                                                                                                  T = 40.0                                                                       T = 80.0
                                                                                  0.5                                                                                 0.5
                                                                             E         0                                                                          E     0
                                                                                 -0.5                                                                                 -0.5
                                                                                   -1                                                                                  -1
                                                                                           0         0.2         0.4         0.6            0.8              1               0       0.2           0.4         0.6         0.8              1
                                                                                                                       x                                                                                  x
                                                                                                                 (c) k = 3, Upwind flux
Figure A.4: Comparison of active elements and electric field solutions, at T = 40 and
T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.1.
                                                                                                                                                                 98


          1                                                     1                                                     1
                                               T=40                                                  T=40                                                  T=40
        0.8                                    T=80           0.8                                    T=80           0.8                                    T=80
        0.6                                                   0.6                                                   0.6
        0.4                                                   0.4                                                   0.4
        0.2                                                   0.2                                                   0.2
    E     0                                               E     0                                               E     0
        -0.2                                                  -0.2                                                  -0.2
        -0.4                                                  -0.4                                                  -0.4
        -0.6                                                  -0.6                                                  -0.6
        -0.8                                                  -0.8                                                  -0.8
         -1                                                    -1                                                    -1
               0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1
                               x                                                     x                                                     x
          1                                                     1                                                     1
                                               T=40                                                  T=40                                                  T=40
        0.8                                    T=80           0.8                                    T=80           0.8                                    T=80
        0.6                                                   0.6                                                   0.6
        0.4                                                   0.4                                                   0.4
        0.2                                                   0.2                                                   0.2
    E     0                                               E     0                                               E     0
        -0.2                                                  -0.2                                                  -0.2
        -0.4                                                  -0.4                                                  -0.4
        -0.6                                                  -0.6                                                  -0.6
        -0.8                                                  -0.8                                                  -0.8
         -1                                                    -1                                                    -1
               0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1
                               x                                                     x                                                     x
          1                                                     1                                                     1
                                               T=40                                                  T=40                                                  T=40
        0.8                                    T=80           0.8                                    T=80           0.8                                    T=80
        0.6                                                   0.6                                                   0.6
        0.4                                                   0.4                                                   0.4
        0.2                                                   0.2                                                   0.2
    E     0                                               E     0                                               E     0
        -0.2                                                  -0.2                                                  -0.2
        -0.4                                                  -0.4                                                  -0.4
        -0.6                                                  -0.6                                                  -0.6
        -0.8                                                  -0.8                                                  -0.8
         -1                                                    -1                                                    -1
               0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1
                               x                                                     x                                                     x
Figure A.5: Transient fundamental (M = 1) temporal soliton propagation with the adap-
tive Sparse Grid DG fully implicit scheme, in Subsection 2.5.2. Maximum mesh level
N = 11. Refinement threshold ε = 10−3 , and coarsen threshold is η = 10−4 . First column:
piecewise polynomial degree k = 1; second column: k = 2; third column: k = 3. First
row: alternating flux I; second row: alternating flux II; third row: upwind flux. Solutions
at time T = 40 and T = 80 are plotted on each case.
                                                                                 99


         12                                                        12                                                                                                                 12                                                                                 12
          9                                                         9                                                                                                                     9                                                                               9
          6                                                         6                                                                                                                     6                                                                               6
          3                                                         3                                                                                                                     3                                                                               3
          0                                                         0                                                                                                                     0                                                                               0
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                     0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
          1                                                         1                                                                                                                     1                                                                               1
                                               T = 40.0                                                                      T = 80.0                                                                                                                T = 40.0                                                  T = 80.0
        0.5                                                       0.5                                                                                                                0.5                                                                                0.5
    E     0                                                   E     0                                                                                                           E         0                                                                         E     0
        -0.5                                                      -0.5                                                                                                              -0.5                                                                                -0.5
         -1                                                        -1                                                                                                                 -1                                                                                 -1
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                     0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
                               x                                                                      x                                                                                                                   x                                                                    x
                    (a) k = 1, Alternating flux I                                                                                                                                                        (b) k = 2, Alternating flux I
                                                                                   12                                                                                 12
                                                                                       9                                                                               9
                                                                                       6                                                                               6
                                                                                       3                                                                               3
                                                                                       0                                                                               0
                                                                                           0         0.2         0.4         0.6            0.8              1              0       0.2           0.4         0.6         0.8              1
                                                                                       1                                                                               1
                                                                                                                                                  T = 40.0                                                                      T = 80.0
                                                                                  0.5                                                                                0.5
                                                                             E         0                                                                         E     0
                                                                                 -0.5                                                                                -0.5
                                                                                   -1                                                                                 -1
                                                                                           0         0.2         0.4         0.6            0.8              1              0       0.2           0.4         0.6         0.8              1
                                                                                                                       x                                                                                 x
                                                                                                      (c) k = 3, Alternating flux I
Figure A.6: Comparison of active elements and electric field solutions, at T = 40 and
T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.5.
                                                                                                                                                             100


         12                                                        12                                                                                                                 12                                                                                 12
          9                                                         9                                                                                                                     9                                                                               9
          6                                                         6                                                                                                                     6                                                                               6
          3                                                         3                                                                                                                     3                                                                               3
          0                                                         0                                                                                                                     0                                                                               0
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                     0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
          1                                                         1                                                                                                                     1                                                                               1
                                               T = 40.0                                                                      T = 80.0                                                                                                                T = 40.0                                                  T = 80.0
        0.5                                                       0.5                                                                                                                0.5                                                                                0.5
    E     0                                                   E     0                                                                                                           E         0                                                                         E     0
        -0.5                                                      -0.5                                                                                                              -0.5                                                                                -0.5
         -1                                                        -1                                                                                                                 -1                                                                                 -1
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                     0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
                               x                                                                      x                                                                                                                   x                                                                    x
                   (a) k = 1, Alternating flux II                                                                                                                                                       (b) k = 2, Alternating flux II
                                                                                   12                                                                                 12
                                                                                       9                                                                               9
                                                                                       6                                                                               6
                                                                                       3                                                                               3
                                                                                       0                                                                               0
                                                                                           0         0.2         0.4         0.6            0.8              1              0       0.2           0.4         0.6         0.8              1
                                                                                       1                                                                               1
                                                                                                                                                  T = 40.0                                                                      T = 80.0
                                                                                  0.5                                                                                0.5
                                                                             E         0                                                                         E     0
                                                                                 -0.5                                                                                -0.5
                                                                                   -1                                                                                 -1
                                                                                           0         0.2         0.4         0.6            0.8              1              0       0.2           0.4         0.6         0.8              1
                                                                                                                       x                                                                                 x
                                                                                                     (c) k = 3, Alternating flux II
Figure A.7: Comparison of active elements and electric field solutions, at T = 40 and
T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.5.
                                                                                                                                                             101


         12                                                        12                                                                                                                 12                                                                                 12
          9                                                         9                                                                                                                     9                                                                               9
          6                                                         6                                                                                                                     6                                                                               6
          3                                                         3                                                                                                                     3                                                                               3
          0                                                         0                                                                                                                     0                                                                               0
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                     0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
          1                                                         1                                                                                                                     1                                                                               1
                                               T = 40.0                                                                      T = 80.0                                                                                                                T = 40.0                                                  T = 80.0
        0.5                                                       0.5                                                                                                                0.5                                                                                0.5
    E     0                                                   E     0                                                                                                           E         0                                                                         E     0
        -0.5                                                      -0.5                                                                                                              -0.5                                                                                -0.5
         -1                                                        -1                                                                                                                 -1                                                                                 -1
               0   0.2   0.4       0.6   0.8              1              0       0.2           0.4         0.6         0.8              1                                                     0         0.2         0.4         0.6            0.8              1              0   0.2   0.4       0.6   0.8              1
                               x                                                                      x                                                                                                                   x                                                                    x
                         (a) k = 1, Upwind flux                                                                                                                                                                     (b) k = 2, Upwind flux
                                                                                   12                                                                                 12
                                                                                       9                                                                               9
                                                                                       6                                                                               6
                                                                                       3                                                                               3
                                                                                       0                                                                               0
                                                                                           0         0.2         0.4         0.6            0.8              1              0       0.2           0.4         0.6         0.8              1
                                                                                       1                                                                               1
                                                                                                                                                  T = 40.0                                                                      T = 80.0
                                                                                  0.5                                                                                0.5
                                                                             E         0                                                                         E     0
                                                                                 -0.5                                                                                -0.5
                                                                                   -1                                                                                 -1
                                                                                           0         0.2         0.4         0.6            0.8              1              0       0.2           0.4         0.6         0.8              1
                                                                                                                       x                                                                                 x
                                                                                                                 (c) k = 3, Upwind flux
Figure A.8: Comparison of active elements and electric field solutions, at T = 40 and
T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.5.
                                                                                                                                                             102


          2                                                     2                                                     2
                                               T=40                                                  T=40                                                  T=40
        1.6                                    T=80           1.6                                    T=80           1.6                                    T=80
        1.2                                                   1.2                                                   1.2
        0.8                                                   0.8                                                   0.8
        0.4                                                   0.4                                                   0.4
    E     0                                               E     0                                               E     0
        -0.4                                                  -0.4                                                  -0.4
        -0.8                                                  -0.8                                                  -0.8
        -1.2                                                  -1.2                                                  -1.2
        -1.6                                                  -1.6                                                  -1.6
         -2                                                    -2                                                    -2
               0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1
                               x                                                     x                                                     x
          2                                                     2                                                     2
                                               T=40                                                  T=40                                                  T=40
        1.6                                    T=80           1.6                                    T=80           1.6                                    T=80
        1.2                                                   1.2                                                   1.2
        0.8                                                   0.8                                                   0.8
        0.4                                                   0.4                                                   0.4
    E     0                                               E     0                                               E     0
        -0.4                                                  -0.4                                                  -0.4
        -0.8                                                  -0.8                                                  -0.8
        -1.2                                                  -1.2                                                  -1.2
        -1.6                                                  -1.6                                                  -1.6
         -2                                                    -2                                                    -2
               0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1
                               x                                                     x                                                     x
          2                                                     2                                                     2
                                               T=40                                                  T=40                                                  T=40
        1.6                                    T=80           1.6                                    T=80           1.6                                    T=80
        1.2                                                   1.2                                                   1.2
        0.8                                                   0.8                                                   0.8
        0.4                                                   0.4                                                   0.4
    E     0                                               E     0                                               E     0
        -0.4                                                  -0.4                                                  -0.4
        -0.8                                                  -0.8                                                  -0.8
        -1.2                                                  -1.2                                                  -1.2
        -1.6                                                  -1.6                                                  -1.6
         -2                                                    -2                                                    -2
               0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1              0   0.2   0.4       0.6   0.8          1
                               x                                                     x                                                     x
Figure A.9: Transient second order (M = 2) temporal soliton propagation with the adap-
tive Sparse Grid DG fully implicit scheme, in Subsection 2.5.2. Maximum mesh level
N = 11. Refinement threshold ε = 10−5 , and coarsen threshold is η = 10−6 . First column:
piecewise polynomial degree k = 1; second column: k = 2; third column: k = 3. First
row: alternating flux I; second row: alternating flux II; third row: upwind flux. Solutions
at time T = 40 and T = 80 are plotted on each case.
                                                                               103


         12                                                      12                                                                                                        12                                                                              12
         9                                                       9                                                                                                           9                                                                             9
         6                                                       6                                                                                                           6                                                                             6
         3                                                       3                                                                                                           3                                                                             3
         0                                                       0                                                                                                           0                                                                             0
              0   0.2   0.4       0.6   0.8              1            0   0.2           0.4         0.6         0.8              1                                               0         0.2         0.4         0.6            0.8              1            0   0.2   0.4       0.6   0.8              1
         2                                                       2                                                                                                           2                                                                             2
                                              T = 40.0                                                                T = 80.0                                                                                                          T = 40.0                                                T = 80.0
         1                                                       1                                                                                                           1                                                                             1
     E   0                                                   E   0                                                                                                     E     0                                                                         E   0
         -1                                                      -1                                                                                                        -1                                                                              -1
         -2                                                      -2                                                                                                        -2                                                                              -2
              0   0.2   0.4       0.6   0.8              1            0   0.2           0.4         0.6         0.8              1                                               0         0.2         0.4         0.6            0.8              1            0   0.2   0.4       0.6   0.8              1
                              x                                                                x                                                                                                             x                                                                  x
                   (a) k = 1, Alternating flux I                                                                                                                                            (b) k = 2, Alternating flux I
                                                                              12                                                                              12
                                                                                9                                                                             9
                                                                                6                                                                             6
                                                                                3                                                                             3
                                                                                0                                                                             0
                                                                                    0         0.2         0.4         0.6            0.8              1            0   0.2           0.4         0.6         0.8              1
                                                                                2                                                                             2
                                                                                                                                           T = 40.0                                                                T = 80.0
                                                                                1                                                                             1
                                                                          E     0                                                                         E   0
                                                                              -1                                                                              -1
                                                                              -2                                                                              -2
                                                                                    0         0.2         0.4         0.6            0.8              1            0   0.2           0.4         0.6         0.8              1
                                                                                                                x                                                                           x
                                                                                               (c) k = 3, Alternating flux I
Figure A.10: Comparison of active elements and electric field solutions, at T = 40 and
T = 80, for k = 1, 2, 3, of alternating flux I case, as in Figure A.9.
                                                                                                                                                      104


         12                                                      12                                                                                                        12                                                                              12
         9                                                       9                                                                                                           9                                                                             9
         6                                                       6                                                                                                           6                                                                             6
         3                                                       3                                                                                                           3                                                                             3
         0                                                       0                                                                                                           0                                                                             0
              0   0.2   0.4       0.6   0.8              1            0   0.2           0.4         0.6         0.8              1                                               0         0.2         0.4         0.6            0.8              1            0   0.2   0.4       0.6   0.8              1
         2                                                       2                                                                                                           2                                                                             2
                                              T = 40.0                                                                T = 80.0                                                                                                          T = 40.0                                                T = 80.0
         1                                                       1                                                                                                           1                                                                             1
     E   0                                                   E   0                                                                                                     E     0                                                                         E   0
         -1                                                      -1                                                                                                        -1                                                                              -1
         -2                                                      -2                                                                                                        -2                                                                              -2
              0   0.2   0.4       0.6   0.8              1            0   0.2           0.4         0.6         0.8              1                                               0         0.2         0.4         0.6            0.8              1            0   0.2   0.4       0.6   0.8              1
                              x                                                                x                                                                                                             x                                                                  x
                  (a) k = 1, Alternating flux II                                                                                                                                           (b) k = 2, Alternating flux II
                                                                              12                                                                              12
                                                                                9                                                                             9
                                                                                6                                                                             6
                                                                                3                                                                             3
                                                                                0                                                                             0
                                                                                    0         0.2         0.4         0.6            0.8              1            0   0.2           0.4         0.6         0.8              1
                                                                                2                                                                             2
                                                                                                                                           T = 40.0                                                                T = 80.0
                                                                                1                                                                             1
                                                                          E     0                                                                         E   0
                                                                              -1                                                                              -1
                                                                              -2                                                                              -2
                                                                                    0         0.2         0.4         0.6            0.8              1            0   0.2           0.4         0.6         0.8              1
                                                                                                                x                                                                           x
                                                                                              (c) k = 3, Alternating flux II
Figure A.11: Comparison of active elements and electric field solutions, at T = 40 and
T = 80, for k = 1, 2, 3, of alternating flux II case, as in Figure A.9.
                                                                                                                                                      105


         12                                                      12                                                                                                        12                                                                              12
         9                                                       9                                                                                                           9                                                                             9
         6                                                       6                                                                                                           6                                                                             6
         3                                                       3                                                                                                           3                                                                             3
         0                                                       0                                                                                                           0                                                                             0
              0   0.2   0.4       0.6   0.8              1            0   0.2           0.4         0.6         0.8              1                                               0         0.2         0.4         0.6            0.8              1            0   0.2   0.4       0.6   0.8              1
         2                                                       2                                                                                                           2                                                                             2
                                              T = 40.0                                                                T = 80.0                                                                                                          T = 40.0                                                T = 80.0
         1                                                       1                                                                                                           1                                                                             1
     E   0                                                   E   0                                                                                                     E     0                                                                         E   0
         -1                                                      -1                                                                                                        -1                                                                              -1
         -2                                                      -2                                                                                                        -2                                                                              -2
              0   0.2   0.4       0.6   0.8              1            0   0.2           0.4         0.6         0.8              1                                               0         0.2         0.4         0.6            0.8              1            0   0.2   0.4       0.6   0.8              1
                              x                                                                x                                                                                                             x                                                                  x
                        (a) k = 1, Upwind flux                                                                                                                                                         (b) k = 2, Upwind flux
                                                                              12                                                                              12
                                                                                9                                                                             9
                                                                                6                                                                             6
                                                                                3                                                                             3
                                                                                0                                                                             0
                                                                                    0         0.2         0.4         0.6            0.8              1            0   0.2           0.4         0.6         0.8              1
                                                                                2                                                                             2
                                                                                                                                           T = 40.0                                                                T = 80.0
                                                                                1                                                                             1
                                                                          E     0                                                                         E   0
                                                                              -1                                                                              -1
                                                                              -2                                                                              -2
                                                                                    0         0.2         0.4         0.6            0.8              1            0   0.2           0.4         0.6         0.8              1
                                                                                                                x                                                                           x
                                                                                                          (c) k = 3, Upwind flux
Figure A.12: Comparison of active elements and electric field solutions, at T = 40 and
T = 80, for k = 1, 2, 3, of upwind flux case, as in Figure A.9.
                                                                                                                                                      106


                                                                                             APPENDIX B
            NUMERICAL EXPERIMENTS OF ASYNCHRONOUS DATA FLOW MODELS
        1                                                          1.5         1                                                           1.5         1                                                                1.5
    0.8                                                                      0.8                                                                     0.8
                                                                   1                                                                       1                                                                            1
    0.6                                                                      0.6                                                                     0.6
x                                                                        x                                                                       x
    0.4                                                                      0.4                                                                     0.4
                                                                   0.5                                                                     0.5                                                                          0.5
    0.2                                                                      0.2                                                                     0.2
        0                                                          0           0                                                           0           0                                                                0
            0    0.2    0.4       0.6          0.8         1                       0        0.2     0.4         0.6     0.8        1                        0        0.2         0.4         0.6      0.8       1
                              z                                                                            z                                                                           z
                  (a) r at t = 0.1                                                           (b) r at t = 0.25                                                        (c) r at t = 0.5
        1                                                          1.5         1                                                           1.5         1                                                                1.5
    0.8                                                                      0.8                                                                     0.8
                                                                   1                                                                       1                                                                            1
    0.6                                                                      0.6                                                                     0.6
x                                                                        x                                                                       x
    0.4                                                                      0.4                                                                     0.4
                                                                   0.5                                                                     0.5                                                                          0.5
    0.2                                                                      0.2                                                                     0.2
        0                                                          0           0                                                           0           0                                                                0
            0    0.2    0.4       0.6          0.8         1                       0        0.2     0.4         0.6     0.8        1                        0        0.2         0.4         0.6      0.8       1
                              z                                                                            z                                                                           z
                  (d) ρ at t = 0.1                                                           (e) ρ at t = 0.25                                                        (f) ρ at t = 0.5
        1                                                                      1                                                                      1
    0.8                                                        0.2           0.8                                                       0.2           0.8                                                            0.2
    0.6                                                                      0.6                                                                     0.6
x                                                              0         x                                                             0         x                                                                  0
    0.4                                                                      0.4                                                                     0.4
    0.2                                                        −0.2          0.2                                                       −0.2          0.2                                                            −0.2
        0                                                                      0                                                                      0
            0    0.2    0.4       0.6         0.8      1                           0        0.2     0.4        0.6     0.8     1                           0        0.2      0.4           0.6       0.8    1
                              z                                                                           z                                                                            z
                (g) ρ − r at t = 0.1                                                       (h) ρ − r at t = 0.25                                                    (i) ρ − r at t = 0.5
1.5                                                                          1.5                                                                      1.5
                                                               r                                                                       r                                                                                r
                                                               ρ                                                                       ρ                                                                                ρ
    1                                                                         1                                                                        1
0.5                                                                          0.5                                                                      0.5
    0                                                                         0                                                                        0
            0     0.2     0.4           0.6          0.8       1                       0      0.2         0.4         0.6     0.8          1                    0          0.2             0.4       0.6    0.8          1
                                  z                                                                              z                                                                               z
    (j) ρ,r at (x, t) = (0.3, 0.1)                                            (k) ρ,r at (x, t) = (0.3, 0.25)                                              (l) ρ,r at (x, t) = (0.3, 0.5)
Figure B.1: Discrete solution r and continuum solution ρ of η = 0.2, in Example 3.5.1.
From left to right, columns correspond to solutions at t = 0.1, t = 0.25, and t = 0.5. Dis-
crete solution is computed with (imax , k max ) = (1000, 200). Continuum solution is com-
puted on a 103 × 103 mesh.
                                                                                                          107


        1                                                         1.5             1                                                    1.5         1                                                             1.5
    0.8                                                                      0.8                                                                 0.8
                                                                  1                                                                    1                                                                         1
    0.6                                                                      0.6                                                                 0.6
x                                                                       x                                                                    x
    0.4                                                                      0.4                                                                 0.4
                                                                  0.5                                                                  0.5                                                                       0.5
    0.2                                                                      0.2                                                                 0.2
        0                                                         0               0                                                    0           0                                                             0
            0    0.2    0.4       0.6         0.8         1                           0     0.2    0.4         0.6     0.8     1                        0        0.2         0.4       0.6       0.8   1
                              z                                                                           z                                                                        z
                  (a) r at t = 0.1                                                          (b) r at t = 0.25                                                     (c) r at t = 0.5
        1                                                         1.5             1                                                    1.5         1                                                             1.5
    0.8                                                                      0.8                                                                 0.8
                                                                  1                                                                    1                                                                         1
    0.6                                                                      0.6                                                                 0.6
x                                                                       x                                                                    x
    0.4                                                                      0.4                                                                 0.4
                                                                  0.5                                                                  0.5                                                                       0.5
    0.2                                                                      0.2                                                                 0.2
        0                                                         0               0                                                    0           0                                                             0
            0    0.2    0.4       0.6         0.8         1                           0     0.2    0.4         0.6     0.8     1                        0        0.2         0.4       0.6       0.8   1
                              z                                                                           z                                                                        z
                  (d) ρ at t = 0.1                                                          (e) ρ at t = 0.25                                                     (f) ρ at t = 0.5
        1                                                                     1                                                                   1
    0.8                                                       0.2           0.8                                                    0.2           0.8                                                         0.2
    0.6                                                                     0.6                                                                  0.6
x                                                             0         x                                                          0         x                                                               0
    0.4                                                                     0.4                                                                  0.4
    0.2                                                       −0.2          0.2                                                    −0.2          0.2                                                         −0.2
        0                                                                     0                                                                   0
            0    0.2    0.4       0.6         0.8     1                           0        0.2     0.4        0.6     0.8     1                        0        0.2      0.4           0.6       0.8   1
                              z                                                                          z                                                                     z
                (g) ρ − r at t = 0.1                                                      (h) ρ − r at t = 0.25                                                 (i) ρ − r at t = 0.5
1.5                                                                         1.5                                                                   1.5
                                                              r                                                                    r                                                                             r
                                                              ρ                                                                    ρ                                                                             ρ
    1                                                                        1                                                                     1
0.5                                                                         0.5                                                                   0.5
    0                                                                        0                                                                     0
            0     0.2     0.4           0.6         0.8       1                       0      0.2         0.4         0.6     0.8       1                    0          0.2         0.4           0.6   0.8           1
                                  z                                                                             z                                                                            z
    (j) ρ,r at (x, t) = (0.3, 0.1)                                           (k) ρ,r at (x, t) = (0.3, 0.25)                                           (l) ρ,r at (x, t) = (0.3, 0.5)
Figure B.2: Discrete solution r and continuum solution ρ of η = 1 case, in Example 3.5.1.
From left to right, columns correspond to solutions at t = 0.1, t = 0.25, and t = 0.5.
Discrete solution is computed with (imax , k max ) = (500, 500). Continuum solution is com-
puted on a 103 × 103 mesh.
                                                                                                         108


        1                                                          1.5         1                                                           1.5         1                                                               1.5
    0.8                                                                      0.8                                                                     0.8
                                                                   1                                                                       1                                                                           1
    0.6                                                                      0.6                                                                     0.6
x                                                                        x                                                                       x
    0.4                                                                      0.4                                                                     0.4
                                                                   0.5                                                                     0.5                                                                         0.5
    0.2                                                                      0.2                                                                     0.2
        0                                                          0           0                                                           0           0                                                               0
            0    0.2    0.4       0.6          0.8         1                       0        0.2     0.4         0.6     0.8        1                        0       0.2         0.4         0.6      0.8       1
                              z                                                                            z                                                                          z
                  (a) r at t = 0.1                                                           (b) r at t = 0.25                                                       (c) r at t = 0.5
        1                                                          1.5         1                                                           1.5         1                                                               1.5
    0.8                                                                      0.8                                                                     0.8
                                                                   1                                                                       1                                                                           1
    0.6                                                                      0.6                                                                     0.6
x                                                                        x                                                                       x
    0.4                                                                      0.4                                                                     0.4
                                                                   0.5                                                                     0.5                                                                         0.5
    0.2                                                                      0.2                                                                     0.2
        0                                                          0           0                                                           0           0                                                               0
            0    0.2    0.4       0.6          0.8         1                       0        0.2     0.4         0.6     0.8        1                        0       0.2         0.4         0.6      0.8       1
                              z                                                                            z                                                                          z
                  (d) ρ at t = 0.1                                                           (e) ρ at t = 0.25                                                        (f) ρ at t = 0.5
        1                                                                      1                                                                      1
    0.8                                                        0.2           0.8                                                       0.2           0.8                                                           0.2
    0.6                                                                      0.6                                                                     0.6
x                                                              0         x                                                             0         x                                                                 0
    0.4                                                                      0.4                                                                     0.4
    0.2                                                        −0.2          0.2                                                       −0.2          0.2                                                           −0.2
        0                                                                      0                                                                      0
            0    0.2    0.4       0.6         0.8      1                           0        0.2     0.4        0.6     0.8     1                           0        0.2     0.4           0.6       0.8    1
                              z                                                                           z                                                                           z
                (g) ρ − r at t = 0.1                                                       (h) ρ − r at t = 0.25                                                (i) (ρ − r) at t = 0.50
1.5                                                                          1.5                                                                      1.5
                                                               r                                                                       r                                                                               r
                                                               ρ                                                                       ρ                                                                               ρ
    1                                                                         1                                                                        1
0.5                                                                          0.5                                                                      0.5
    0                                                                         0                                                                        0
            0     0.2     0.4           0.6          0.8       1                       0      0.2         0.4         0.6     0.8          1                    0         0.2             0.4       0.6    0.8          1
                                  z                                                                              z                                                                              z
    (j) ρ,r at (x, t) = (0.3, 0.1)                                            (k) ρ,r at (x, t) = (0.3, 0.25)                                              (l) ρ,r at (x, t) = (0.3, 0.5)
Figure B.3: Discrete solution r and continuum solution ρ of η = 5 case, in Example 3.5.1.
From left to right, columns correspond to solutions at t = 0.1, t = 0.25, and t = 0.5. Dis-
crete solution is computed with (imax , k max ) = (200, 1000). Continuum solution is com-
puted on a 103 × 103 mesh.
                                                                                                          109


         1                                               1.5           1                                             1.5           1                                               1.5
     0.8                                                           0.8                                                         0.8
                                                         1                                                           1                                                             1
     0.6                                                           0.6                                                         0.6
 x                                                             x                                                           x
     0.4                                                           0.4                                                         0.4
                                                         0.5                                                         0.5                                                           0.5
     0.2                                                           0.2                                                         0.2
         0                                               0             0                                             0             0                                               0
             0     0.2    0.4       0.6     0.8    1                       0   0.2    0.4       0.6     0.8    1                       0     0.2    0.4       0.6     0.8    1
                                z                                                           z                                                             z
             (a) ρ at t = 0.1, η = 0.2                                     (b) ρ at t = 0.25, η = 0.2                                  (c) ρ at t = 0.5, η = 0.2
         1                                               1.5           1                                             1.5           1                                               1.5
     0.8                                                           0.8                                                         0.8
                                                         1                                                           1                                                             1
     0.6                                                           0.6                                                         0.6
 x                                                             x                                                           x
     0.4                                                           0.4                                                         0.4
                                                         0.5                                                         0.5                                                           0.5
     0.2                                                           0.2                                                         0.2
         0                                               0             0                                             0             0                                               0
             0     0.2    0.4       0.6     0.8    1                       0   0.2    0.4       0.6     0.8    1                       0     0.2    0.4       0.6     0.8    1
                                z                                                           z                                                             z
                 (d) ρ at t = 0.1, η = 1                                    (e) ρ at t = 0.25, η = 1                                       (f) ρ at t = 0.5, η = 1
         1                                               1.5           1                                             1.5           1                                               1.5
     0.8                                                           0.8                                                         0.8
                                                         1                                                           1                                                             1
     0.6                                                           0.6                                                         0.6
 x                                                             x                                                           x
     0.4                                                           0.4                                                         0.4
                                                         0.5                                                         0.5                                                           0.5
     0.2                                                           0.2                                                         0.2
         0                                               0             0                                             0             0                                               0
             0     0.2    0.4       0.6     0.8    1                       0   0.2    0.4       0.6     0.8    1                       0     0.2    0.4       0.6     0.8    1
                                z                                                           z                                                             z
                 (g) ρ at t = 0.1, η = 5                                   (h) ρ at t = 0.25, η = 5                                        (i) ρ at t = 0.5, η = 5
 1.5                                                           1.5                                                         1.5
                                                    η = 0.2                                                     η = 0.2                                                       η = 0.2
                                                     η=1                                                         η=1                                                           η=1
                                                     η=5                                                         η=5                                                           η=5
     1                                                             1                                                           1
 0.5                                                           0.5                                                         0.5
     0                                                             0                                                           0
             0      0.2     0.4           0.6     0.8     1                0    0.2     0.4           0.6     0.8     1                0      0.2     0.4           0.6     0.8     1
                                     z                                                           z                                                             z
                    (j) ρ(0.3, z, 0.1)                                         (k) ρ(0.3, z, 0.25)                                            (l) ρ(0.3, z, 0.5)
Figure B.4: The effects on ρ due to variations in η, in Example 3.5.2. As η increases, the
throttling effect of local slowdown spreads more quickly, and data is not processed as
quickly.
                                                                                        110


         1                                               1.5           1                                             1.5           1                                               1.5
     0.8                                                           0.8                                                         0.8
                                                         1                                                           1                                                             1
     0.6                                                           0.6                                                         0.6
 x                                                             x                                                           x
     0.4                                                           0.4                                                         0.4
                                                         0.5                                                         0.5                                                           0.5
     0.2                                                           0.2                                                         0.2
         0                                               0             0                                             0             0                                               0
             0     0.2    0.4       0.6     0.8    1                       0   0.2    0.4       0.6     0.8    1                       0     0.2    0.4       0.6     0.8    1
                                z                                                           z                                                             z
             (a) ρ at t = 0.1, β = 0.1                                 (b) ρ at t = 0.25, β = 0.1                                      (c) ρ at t = 0.5, β = 0.1
         1                                               1.5           1                                             1.5           1                                               1.5
     0.8                                                           0.8                                                         0.8
                                                         1                                                           1                                                             1
     0.6                                                           0.6                                                         0.6
 x                                                             x                                                           x
     0.4                                                           0.4                                                         0.4
                                                         0.5                                                         0.5                                                           0.5
     0.2                                                           0.2                                                         0.2
         0                                               0             0                                             0             0                                               0
             0     0.2    0.4       0.6     0.8    1                       0   0.2    0.4       0.6     0.8    1                       0     0.2    0.4       0.6     0.8    1
                                z                                                           z                                                             z
             (d) ρ at t = 0.1, β = 0.5                                     (e) ρ at t = 0.25, β = 0.5                                  (f) ρ at t = 0.5, β = 0.5
         1                                               1.5           1                                             1.5           1                                               1.5
     0.8                                                           0.8                                                         0.8
                                                         1                                                           1                                                             1
     0.6                                                           0.6                                                         0.6
 x                                                             x                                                           x
     0.4                                                           0.4                                                         0.4
                                                         0.5                                                         0.5                                                           0.5
     0.2                                                           0.2                                                         0.2
         0                                               0             0                                             0             0                                               0
             0     0.2    0.4       0.6     0.8    1                       0   0.2    0.4       0.6     0.8    1                       0     0.2    0.4       0.6     0.8    1
                                z                                                           z                                                             z
                 (g) ρ at t = 0.1, β = 1                                   (h) ρ at t = 0.25, β = 1                                        (i) ρ at t = 0.5, β = 1
 1.5                                                           1.5                                                         1.5
                                                    β = 0.1                                                     β = 0.1                                                       β = 0.1
                                                    β = 0.5                                                     β = 0.5                                                       β = 0.5
                                                     β=1                                                         β=1                                                           β=1
     1                                                             1                                                           1
 0.5                                                           0.5                                                         0.5
     0                                                             0                                                           0
             0      0.2     0.4           0.6     0.8     1                0    0.2     0.4           0.6     0.8     1                0      0.2     0.4           0.6     0.8     1
                                     z                                                           z                                                             z
                    (j) ρ(0.3, z, 0.1)                                         (k) ρ(0.3, z, 0.25)                                            (l) ρ(0.3, z, 0.5)
Figure B.5: The effects on ρ due to variations in β, in Example 3.5.3. Larger values of β
lead to more throttling.
                                                                                        111


BIBLIOGRAPHY
      112


                                    BIBLIOGRAPHY
[1]  Vrushali A Bokil, Yingda Cheng, Yan Jiang, and Fengyan Li. Energy stable dis-
     continuous Galerkin methods for Maxwell’s equations in nonlinear optical media.
     Journal of Computational Physics, 350:420–452, 2017.
[2]  Maohui Lyu, Vrushali A Bokil, Yingda Cheng, and Fengyan Li. Energy stable
     nodal discontinuous Galerkin methods for nonlinear Maxwell’s equations in multi-
     dimensions. Journal of Scientific Computing, 89(2):1–42, 2021.
[3]  Zixuan Wang, Qi Tang, Wei Guo, and Yingda Cheng. Sparse grid discontinuous
     Galerkin methods for high-dimensional elliptic equations. Journal of Computational
     Physics, 314:244–263, 2016.
[4]  Wei Guo and Yingda Cheng. A sparse grid discontinuous Galerkin method for
     high-dimensional transport equations and its application to kinetic simulations.
     SIAM Journal on Scientific Computing, 38(6):A3381–A3409, 2016.
[5]  Wei Guo and Yingda Cheng. An adaptive multiresolution discontinuous Galerkin
     method for time-dependent transport equations in multidimensions. SIAM Journal
     on Scientific Computing, 39(6):A2962–A2992, 2017.
[6]  Juntao Huang and Yingda Cheng. An adaptive multiresolution discontinuous
     Galerkin method with artificial viscosity for scalar hyperbolic conservation laws in
     multidimensions. SIAM Journal on Scientific Computing, 42(5):A2943–A2973, 2020.
[7]  Zhanjing Tao, Yan Jiang, and Yingda Cheng. An adaptive high-order piecewise
     polynomial based sparse grid collocation method with applications. Journal of Com-
     putational Physics, 433:109770, 2021.
[8]  Yingda Cheng, Wei Guo, Juntao Huang, Kai Huang, Yuan Liu, and Zhanjing Tao.
     Adaptive multiresolution discontinuous Galerkin (DG) C++ package for solving
     partial differential equations in high dimensions. https://github.com/JuntaoHua
     ng/adaptive-multiresolution-DG, 2019-2022.
[9]  Satish Balay, Shrirang Abhyankar, Mark F. Adams, Steven Benson, Jed Brown, Peter
     Brune, Kris Buschelman, Emil M. Constantinescu, Lisandro Dalcin, Alp Dener, Vic-
     tor Eijkhout, William D. Gropp, Václav Hapla, Tobin Isaac, Pierre Jolivet, Dmitry
     Karpeev, Dinesh Kaushik, Matthew G. Knepley, Fande Kong, Scott Kruger, Dave A.
     May, Lois Curfman McInnes, Richard Tran Mills, Lawrence Mitchell, Todd Mun-
     son, Jose E. Roman, Karl Rupp, Patrick Sanan, Jason Sarich, Barry F. Smith, Ste-
     fano Zampini, Hong Zhang, Hong Zhang, and Junchao Zhang. PETSc Web page.
     https://petsc.org/, 2022.
[10] Richard C Barnard, Kai Huang, and Cory Hauck. A mathematical model of asyn-
     chronous data flow in parallel computers. IMA Journal of Applied Mathematics,
     85(6):865–891, 2020.
                                            113


[11] Chi-Wang Shu. High order numerical methods for time dependent Hamilton-Jacobi
     equations. In Mathematics and computation in imaging science and information process-
     ing, pages 47–91. World Scientific, 2007.
[12] Chi-Wang Shu. Essentially non-oscillatory and weighted essentially non-oscillatory
     schemes for hyperbolic conservation laws. Advanced numerical approximation of non-
     linear hyperbolic equations, pages 325–432, 1998.
[13] William H Reed and Thomas R Hill. Triangular mesh methods for the neutron
     transport equation. Technical report, Los Alamos Scientific Lab., N. Mex.(USA),
     1973.
[14] Bernardo Cockburn and Chi-Wang Shu. The Runge-Kutta local projection-
     discontinuous-Galerkin finite element method for scalar conservation laws. ESAIM:
     Mathematical Modelling and Numerical Analysis, 25(3):337–361, 1991.
[15] Bernardo Cockburn and Chi-Wang Shu. TVB Runge-Kutta local projection discon-
     tinuous Galerkin finite element method for conservation laws. II. general frame-
     work. Mathematics of computation, 52(186):411–435, 1989.
[16] Bernardo Cockburn, San-Yih Lin, and Chi-Wang Shu. TVB Runge-Kutta local pro-
     jection discontinuous Galerkin finite element method for conservation laws III: one-
     dimensional systems. Journal of computational Physics, 84(1):90–113, 1989.
[17] Bernardo Cockburn, Suchung Hou, and Chi-Wang Shu. The Runge-Kutta local
     projection discontinuous Galerkin finite element method for conservation laws. IV.
     the multidimensional case. Mathematics of Computation, 54(190):545–581, 1990.
[18] Bernardo Cockburn and Chi-Wang Shu. The Runge-Kutta discontinuous Galerkin
     method for conservation laws V: multidimensional systems. Journal of Computa-
     tional Physics, 141(2):199–224, 1998.
[19] Bernardo Cockburn and Chi-Wang Shu. The P 1 -RKDG method for two-dimensional
     Euler equations of gas dynamics, volume 91-9. NASA Langley Research Center. Insti-
     tute for Computer Applications in Science and Engineering (ICASE), 1991.
[20] Chi-Wang Shu and Stanley Osher. Efficient implementation of essentially non-
     oscillatory shock-capturing schemes. Journal of computational physics, 77(2):439–471,
     1988.
[21] Sigal Gottlieb, David I Ketcheson, and Chi-Wang Shu. Strong stability preserving
     Runge-Kutta and multistep time discretizations. World Scientific, 2011.
[22] Chi-Wang Shu. TVB uniformly high-order schemes for conservation laws. Mathe-
     matics of Computation, 49(179):105–121, 1987.
[23] Jianxian Qiu and Qiang Zhang. Stability, error estimate and limiters of discontinu-
     ous Galerkin methods. In Handbook of Numerical Analysis, volume 17, pages 147–171.
     Elsevier, 2016.
                                            114


[24] Francesco Bassi and Stefano Rebay. A high-order accurate discontinuous finite el-
     ement method for the numerical solution of the compressible Navier–Stokes equa-
     tions. Journal of computational physics, 131(2):267–279, 1997.
[25] Bernardo Cockburn and Chi-Wang Shu. The local discontinuous Galerkin method
     for time-dependent convection-diffusion systems. SIAM journal on numerical analy-
     sis, 35(6):2440–2463, 1998.
[26] Bernardo Cockburn, George E Karniadakis, and Chi-Wang Shu. The development
     of discontinuous Galerkin methods. In Discontinuous Galerkin Methods, pages 3–50.
     Springer, 2000.
[27] Chi-Wang Shu. Discontinuous Galerkin methods: general approach and stability.
     Numerical solutions of partial differential equations, 201, 2009.
[28] Chi-Wang Shu. Discontinuous Galerkin method for time-dependent problems: sur-
     vey and recent developments. Recent developments in discontinuous Galerkin finite
     element methods for partial differential equations, pages 25–62, 2014.
[29] Douglas N Arnold, Franco Brezzi, Bernardo Cockburn, and Donatella Marini. Dis-
     continuous Galerkin methods for elliptic problems. In Discontinuous Galerkin Meth-
     ods, pages 89–101. Springer, 2000.
[30] Jacques Louis Lions. Problemes aux Limites non Homogenes a Donnees Irregulieres,
     pages 283–292. Springer Berlin Heidelberg, Berlin, Heidelberg, 1968.
[31] Joachim Nitsche. Über ein Variationsprinzip zur Lösung von Dirichlet-Problemen
     bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind.
     In Abhandlungen aus dem mathematischen Seminar der Universität Hamburg, vol-
     ume 36, pages 9–15. Springer, 1971.
[32] Ivo Babuška. The finite element method with penalty. Mathematics of computation,
     27(122):221–228, 1973.
[33] Douglas N. Arnold. An Interior Penalty Finite Element Method with Discontinuous
     Elements. PhD thesis, The University of Chicago, 1979.
[34] Douglas N Arnold. An interior penalty finite element method with discontinuous
     elements. SIAM journal on numerical analysis, 19(4):742–760, 1982.
[35] Douglas N Arnold, Franco Brezzi, Bernardo Cockburn, and L Donatella Marini.
     Unified analysis of discontinuous Galerkin methods for elliptic problems. SIAM
     journal on numerical analysis, 39(5):1749–1779, 2002.
[36] Béatrice Rivière. Discontinuous Galerkin methods for solving elliptic and parabolic equa-
     tions: theory and implementation. SIAM, 2008.
[37] Chi-Wang Shu. A brief survey on discontinuous Galerkin methods in computa-
     tional fluid dynamics. Advances in mechanics, 43(6):541–553, 2013.
                                               115


[38] Bernardo Cockburn. Discontinuous Galerkin methods for convection-dominated
     problems. In High-order methods for computational physics, pages 69–224. Springer,
     1999.
[39] Bernardo Cockburn and Chi-Wang Shu. Runge-Kutta discontinuous Galerkin
     methods for convection-dominated problems. Journal of scientific computing,
     16(3):173–261, 2001.
[40] Douglas N. Arnold etc. Foreword for the special issue on discontinuous Galerkin
     method. J. Sci. Comput., 22–23(1–3), 2005.
[41] Bernardo Cockburn and Chi-Wang Shu. Foreword. J. Sci. Comput., 40(1–3):1–3, Jul
     2009.
[42] Clint Dawson. Foreword. Computer Methods in Applied Mechanics and Engineering,
     195(25):3183, 2006. Discontinuous Galerkin Methods.
[43] Yanlai Chen, Bo Dong, and Chi-Wang Shu. A foreword to the special issue in honor
     of Professor Bernardo Cockburn on his 60th birthday: A life time of discontinuous
     schemings. J. Sci. Comput., 77(3):1303–1309, Dec. 2018.
[44] Jan Hesthaven, Jennifer Ryan, Chi-Wang Shu, Jaap Vegt, Yan Xu, Qiang Zhang,
     and Zhimin Zhang. Preface to focused issue on discontinuous galerkin methods.
     Communications on Applied Mathematics and Computation, pages 1–2, 10 2021.
[45] Jan S Hesthaven and Tim Warburton. Nodal discontinuous Galerkin methods: algo-
     rithms, analysis, and applications. Springer Science & Business Media, 2007.
[46] Guido Kanschat.       Discontinuous Galerkin methods for viscous incompressible flow.
     Springer, 2008.
[47] B.Q. Li. Discontinuous Finite Elements in Fluid Dynamics and Heat Transfer. Compu-
     tational Fluid and Solid Mechanics. Springer London, 2005.
[48] Daniele Antonio Di Pietro and Alexandre Ern. Mathematical aspects of discontinuous
     Galerkin methods, volume 69. Springer Science & Business Media, 2011.
[49] V. Dolejší and M. Feistauer. Discontinuous Galerkin Method: Analysis and Applica-
     tions to Compressible Flow. Springer Series in Computational Mathematics. Springer
     International Publishing, 2015.
[50] Eric T Chung, Patrick Ciarlet Jr, and Tang Fei Yu. Convergence and supercon-
     vergence of staggered discontinuous Galerkin methods for the three-dimensional
     Maxwell’s equations on Cartesian grids. Journal of Computational Physics, 235:14–31,
     2013.
[51] Bernardo Cockburn, Fengyan Li, and Chi-Wang Shu. Locally divergence-free dis-
     continuous Galerkin methods for the Maxwell equations. Journal of Computational
     Physics, 194(2):588–610, 2004.
                                             116


[52] Jan S Hesthaven and Tim Warburton. Nodal high-order methods on unstructured
     grids: I. time-domain solution of Maxwell’s equations. Journal of Computational
     Physics, 181(1):186–221, 2002.
[53] Stephen D Gedney, John C Young, Tyler C Kramer, and J Alan Roden. A discontin-
     uous Galerkin finite element time-domain method modeling of dispersive media.
     IEEE transactions on antennas and propagation, 60(4):1969–1977, 2012.
[54] Yunqing Huang, Jichun Li, and Wei Yang.                 Interior penalty DG methods
     for Maxwell’s equations in dispersive media.           Journal of Computational Physics,
     230(12):4559–4570, 2011.
[55] Stéphane Lanteri and Claire Scheid. Convergence of a discontinuous Galerkin
     scheme for the mixed time-domain Maxwell’s equations in dispersive media. IMA
     Journal of Numerical Analysis, 33(2):432–459, 2013.
[56] Tiao Lu, Pingwen Zhang, and Wei Cai. Discontinuous Galerkin methods for dis-
     persive and lossy Maxwell’s equations and PML boundary conditions. Journal of
     Computational Physics, 200(2):549–580, 2004.
[57] Eric T Chung and Patrick Ciarlet Jr. A staggered discontinuous Galerkin method
     for wave propagation in media with dielectrics and meta-materials. Journal of Com-
     putational and Applied Mathematics, 239:189–207, 2013.
[58] Jichun Li. Development of discontinuous Galerkin methods for Maxwell’s equa-
     tions in metamaterials and perfectly matched layers. Journal of Computational and
     Applied Mathematics, 236(5):950–961, 2011.
[59] Jichun Li and Jan S Hesthaven. Analysis and application of the nodal discontinuous
     Galerkin method for wave propagation in metamaterials. Journal of Computational
     Physics, 258:915–930, 2014.
[60] Jichun Li, Cengke Shi, and Chi-Wang Shu. Optimal non-dissipative discontinuous
     Galerkin methods for Maxwell’s equations in Drude metamaterials. Computers &
     Mathematics with Applications, 73(8):1760–1780, 2017.
[61] Elisabeth Blank. The Discontinuous Galerkin Method for Maxwell’s Equations: Applica-
     tion to Bodies of Revolution and Kerr-Nonlinearities. PhD thesis, Karlsruhe, Karlsruher
     Institut für Technologie (KIT), Diss., 2013, 2013.
[62] Loula Fezoui and Stéphane Lanteri. Discontinuous Galerkin methods for the numerical
     solution of the nonlinear Maxwell equations in 1d. PhD thesis, Inria, 2015.
[63] Juntao Huang and Chi-Wang Shu. A second-order asymptotic-preserving and
     positivity-preserving discontinuous Galerkin scheme for the Kerr–Debye model.
     Mathematical Models and Methods in Applied Sciences, 27(03):549–579, 2017.
[64] Zhichao Peng, Vrushali A Bokil, Yingda Cheng, and Fengyan Li. Asymptotic and
     positivity preserving methods for Kerr-Debye model with Lorentz dispersion in
     one dimension. Journal of Computational Physics, 402:109101, 2020.
                                            117


[65] Rand Corporation and Richard Bellman. Adoptive control processes: A guided tour.
     Princeton University Press, 1961.
[66] Hans-Joachim Bungartz and Michael Griebel. Sparse grids. Acta numerica, 13:147–
     269, 2004.
[67] Jochen Garcke and Michael Griebel. Sparse grids and applications, volume 88.
     Springer Science & Business Media, 2012.
[68] C Zenger. Sparse Grids, Parallel Algorithms for Partial Differential Equations: Pro-
     ceedings of the Sixth GAMM-Seminar, Kiel, January 1990, Notes on Numerical
     Fluid Mechanics (Vol. 31), W. Hackbusch, eds., Vieweg, Braunschweig, 1991.
[69] Sergei Abramovich Smolyak. Quadrature and interpolation formulas for tensor
     products of certain classes of functions. In Doklady Akademii Nauk, volume 148,
     pages 1042–1045. Russian Academy of Sciences, 1963.
[70] Konstantin Ivanovich Babenko. Approximation of periodic functions of many vari-
     ables by trigonometric polynomials. In Doklady Akademii Nauk, volume 132, pages
     247–250. Russian Academy of Sciences, 1960.
[71] Vladimir Nikolaevich Temlyakov. Approximations of functions with bounded
     mixed derivative. Trudy Matematicheskogo Instituta imeni VA Steklova, 178:3–113,
     1986.
[72] F-J Delvos. d-variate boolean interpolation.       Journal of Approximation Theory,
     34(2):99–114, 1982.
[73] G Baszenski, F-J Delvos, and S Jester. Blending approximations with sine functions.
     In Numerical Methods in Approximation Theory, Vol. 9, pages 1–19. Springer, 1992.
[74] Chin Bo Liem and Tsimin Shih. Splitting Extrapolation Method, the: A New Technique
     In Numerical Solution Of Multidimensional Prob, volume 7. World Scientific, 1995.
[75] Michael Griebel. Sparse grids and related approximation schemes for higher dimensional
     problems. SFB 611, 2005.
[76] Bradley K Alpert. A class of bases in L2 for the sparse representation of integral
     operators. SIAM journal on Mathematical Analysis, 24(1):246–262, 1993.
[77] Beylkin Alpert, Gregory Beylkin, David Gines, and Lev Vozovoi. Adaptive solu-
     tion of partial differential equations in multiwavelet bases. Journal of Computational
     Physics, 182(1):149–190, 2002.
[78] Stephane Mallat. A wavelet tour of signal processing: An approximation tour, 1999.
[79] Yuan Liu, Yingda Cheng, Shanqin Chen, and Yong-Tao Zhang. Krylov implicit in-
     tegration factor discontinuous Galerkin methods on sparse grids for high dimen-
     sional reaction-diffusion equations. Journal of Computational Physics, 388:90–102,
     2019.
                                             118


[80] Wei Guo. A sparse grid discontinuous Galerkin method for the high-dimensional
     Helmholtz equation with variable coefficients. arXiv preprint arXiv:1905.08917, 2019.
[81] Zhanjing Tao, Wei Guo, and Yingda Cheng. Sparse grid discontinuous Galerkin
     methods for the Vlasov-Maxwell system. Journal of Computational Physics: X,
     3:100022, 2019.
[82] Jianguo Huang and Yue Yu. A sparse grid discrete ordinate discontinuous Galerkin
     method for the radiative transfer equation. arXiv preprint arXiv:2101.09070, 2021.
[83] Juntao Huang, Yuan Liu, Wei Guo, Zhanjing Tao, and Yingda Cheng. An adaptive
     multiresolution interior penalty discontinuous Galerkin method for wave equa-
     tions in second order form. Journal of Scientific Computing, 85(1):1–31, 2020.
[84] Jie Shen and Haijun Yu. Efficient spectral sparse grid methods and applica-
     tions to high-dimensional elliptic problems. SIAM Journal on Scientific Computing,
     32(6):3228–3250, 2010.
[85] Andreas Zeiser. Fast matrix-vector multiplication in the sparse-grid Galerkin
     method. Journal of Scientific Computing, 47(3):328–346, 2011.
[86] Juntao Huang, Yong Liu, Yuan Liu, Zhanjing Tao, and Yingda Cheng. A
     class of adaptive multiresolution ultra-weak discontinuous Galerkin methods for
     some nonlinear dispersive wave equations. SIAM Journal on Scientific Computing,
     44(2):A745–A769, 2022.
[87] Wei Guo, Juntao Huang, Zhanjing Tao, and Yingda Cheng. An adaptive sparse
     grid local discontinuous Galerkin method for Hamilton-Jacobi equations in high
     dimensions. Journal of Computational Physics, 436:110294, 2021.
[88] Zhanjing Tao, Juntao Huang, Yuan Liu, Wei Guo, and Yingda Cheng. An
     adaptive multiresolution ultra-weak discontinuous Galerkin method for nonlinear
     Schrödinger equations. Communications on Applied Mathematics and Computation,
     4(1):60–83, 2022.
[89] Chi-Wang Shu. Essentially non-oscillatory and weighted essentially non-oscillatory
     schemes. Acta Numerica, 29:701–762, 2020.
[90] Chi-Wang Shu. High order ENO and WENO schemes for computational fluid dy-
     namics. In High-order methods for computational physics, pages 439–582. Springer,
     1999.
[91] A. Harten, B. Engquist, S. Osher, and S.R. Chakravarthy. Uniformly high order
     accurate essentially non-oscillatory schemes. III. Journal of Computational Physics
     (Print), 71(2):231–303, 1987.
[92] Chi-Wang Shu and Stanley Osher. Efficient implementation of essentially non-
     oscillatory shock-capturing schemes. Journal of computational physics, 77(2):439–471,
     1988.
                                           119


[93] Chi-Wang Shu and Stanley Osher. Efficient implementation of essentially non-
      oscillatory shock-capturing schemes, II. Journal of Computational Physics, 83(1):32–
      78, 1989.
[94] Xu-Dong Liu, Stanley Osher, and Tony Chan. Weighted essentially non-oscillatory
      schemes. Journal of computational physics, 115(1):200–212, 1994.
[95] Oliver Friedrich. Weighted essentially non-oscillatory schemes for the interpolation
      of mean values on unstructured grids. Journal of computational physics, 144(1):194–
      212, 1998.
[96] Changqing Hu and Chi-Wang Shu. High order weighted ENO schemes for unstruc-
      tured meshes - preliminary results. Computational fluid dynamics’98, pages 356–362,
      1998.
[97] Changqing Hu and Chi-Wang Shu. Weighted essentially non-oscillatory schemes
      on triangular meshes. Journal of Computational Physics, 150(1):97–127, 1999.
[98] Doron Levy, Gabriella Puppo, and Giovanni Russo. Central WENO schemes for
      hyperbolic systems of conservation laws. ESAIM: Mathematical Modelling and Nu-
      merical Analysis, 33(3):547–571, 1999.
[99] Guang-Shan Jiang and Chi-Wang Shu. Efficient implementation of weighted eno
      schemes. Journal of computational physics, 126(1):202–228, 1996.
[100] Dinshaw S Balsara and Chi-Wang Shu. Monotonicity preserving weighted essen-
      tially non-oscillatory schemes with increasingly high order of accuracy. Journal of
      Computational Physics, 160(2):405–452, 2000.
[101] Chi-Wang Shu. High order weighted essentially nonoscillatory schemes for con-
      vection dominated problems. SIAM review, 51(1):82–126, 2009.
[102] Michael G Crandall and Pierre-Louis Lions. Viscosity solutions of Hamilton-Jacobi
      equations. Transactions of the American mathematical society, 277(1):1–42, 1983.
[103] Michael G Crandall, Hitoshi Ishii, and Pierre-Louis Lions. User’s guide to viscos-
      ity solutions of second order partial differential equations. Bulletin of the American
      mathematical society, 27(1):1–67, 1992.
[104] Guy Barles. An introduction to the theory of viscosity solutions for first-order
      Hamilton–Jacobi equations and applications. In Hamilton-Jacobi equations: approx-
      imations, numerical analysis and applications, pages 49–109. Springer, 2013.
[105] Stanley Osher and James A Sethian. Fronts propagating with curvature-dependent
      speed: Algorithms based on Hamilton-Jacobi formulations. Journal of computational
      physics, 79(1):12–49, 1988.
[106] Stanley Osher and Chi-Wang Shu. High-order essentially nonoscillatory schemes
      for Hamilton–Jacobi equations. SIAM Journal on numerical analysis, 28(4):907–922,
      1991.
                                              120


[107] Guang-Shan Jiang and Danping Peng. Weighted ENO schemes for Hamilton–Jacobi
      equations. SIAM Journal on Scientific computing, 21(6):2126–2143, 2000.
[108] Chi-Wang Shu. Total-variation-diminishing time discretizations. SIAM Journal on
      Scientific and Statistical Computing, 9(6):1073–1084, 1988.
[109] Sigal Gottlieb and Chi-Wang Shu. Total variation diminishing Runge-Kutta
      schemes. Mathematics of computation, 67(221):73–85, 1998.
[110] Sigal Gottlieb, Chi-Wang Shu, and Eitan Tadmor. Strong stability-preserving high-
      order time discretization methods. SIAM review, 43(1):89–112, 2001.
[111] G.P. Agrawal. Nonlinear Fiber Optics. Electronics & Electrical. Elsevier Science, 2007.
[112] Peter M Goorjian, Allen Taflove, Rose M Joseph, and Susan C Hagness. Compu-
      tational modeling of femtosecond optical solitons from Maxwell’s equations. IEEE
      journal of quantum electronics, 28(10):2416–2422, 1992.
[113] Gaël Guennebaud, Benoît Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010.
[114] Shrirang Abhyankar, Jed Brown, Emil M Constantinescu, Debojyoti Ghosh, Barry F
      Smith, and Hong Zhang. PETSc/TS: A modern scalable ODE/DAE solver library.
      arXiv preprint arXiv:1806.01437, 2018.
[115] Peter R Brune, Matthew G Knepley, Barry F Smith, and Xuemin Tu. Composing
      scalable nonlinear algebraic solvers. SIAM Review, 57(4):535–565, 2015.
[116] Dana A Knoll and David E Keyes. Jacobian-free Newton–Krylov methods: a survey
      of approaches and applications. Journal of Computational Physics, 193(2):357–397,
      2004.
[117] N. Bloembergen. Nonlinear Optics. World Scientific, 1996.
[118] R.W. Boyd. Nonlinear Optics. Elsevier Science, 2003.
[119] G. New. Introduction to Nonlinear Optics. Cambridge University Press, 2011.
[120] M.N.O. Sadiku. Computational Electromagnetics with MATLAB, Fourth Edition. CRC
      Press, 2018.
[121] L Gilles, SC Hagness, and L Vázquez. Comparison between staggered and un-
      staggered finite-difference time-domain grids for few-cycle temporal optical soliton
      propagation. Journal of Computational Physics, 161(2):379–400, 2000.
[122] Cheryl V Hile and William L Kath. Numerical solutions of Maxwell’s equations for
      nonlinear-optical pulse propagation. JOSA B, 13(6):1135–1145, 1996.
[123] Olivier Bokanowski, Jochen Garcke, Michael Griebel, and Irene Klompmaker. An
      adaptive sparse grid semi-Lagrangian scheme for first order Hamilton-Jacobi Bell-
      man equations. Journal of Scientific Computing, 55(3):575–605, 2013.
                                              121


[124] Michael Griebel. Adaptive sparse grid multilevel methods for elliptic PDEs based
      on finite differences. Computing, 61(2):151–179, 1998.
[125] W. Hundsdorfer and J.G. Verwer. Numerical Solution of Time-Dependent Advection-
      Diffusion-Reaction Equations.      Springer Series in Computational Mathematics.
      Springer Berlin Heidelberg, 2013.
[126] Dimitri van Heesch. Doxygen web page. https://www.doxygen.nl/index.html,
      1997-2019.
[127] Jethro H Greene and Allen Taflove. General vector auxiliary differential equa-
      tion finite-difference time-domain method for nonlinear optics. Optics express,
      14(18):8305–8310, 2006.
[128] Scientific Grand Challenges. Architectures and technology for extreme scale com-
      puting. In US Department of Energy Workshop Report, 2009.
[129] Jack Dongarra, Jeffrey Hittinger, John Bell, Luis Chacón, Robert Falgout, Michael
      Heroux, Paul Hovland, Esmond Ng, Clayton Webster, and Stefan Wild. Applied
      mathematics research for exascale computing. Technical report, U.S. Department of
      Energy, Office of Science, Advanced Scientific Computing Research Program, 2014.
[130] David E. Culler, Richard M. Karp, David A. Patterson, Abhijit Sahay, Eunice E.
      Santos, Klaus E. Schauser, Ramesh Subramonian, and Thorsten von Eicken. LogP:
      a practical model of parallel computation. Commun. ACM, 39:78–85, 1996.
[131] A. Pietracaprina and G. Pucci. The complexity of deterministic PRAM simulation
      on distributed memory machines. Theory of Computing Systems, 30(3):231–247, 1997.
[132] Alberto Nunez, Javier Fernandez, Rosa Filguiera, Felix Garcia, and Jesus Carretero.
      SIMCAN: A flexible, scalable and expandable simulation platform for modelling
      and simulating distributed architectures and applications. Simulation Modelling
      Practice and Theory, 20(1):12–32, 2012.
[133] Julian M. Kunkel. Simulating parallel programs on application and system level.
      Computer Science - Research and Development, 28(2):167–174, 2013.
[134] S.S. Dosanjh, R.F. Barrett, D.W. Doerfler, S.D. Hammond, K.S. Hemmert, M.A. Her-
      oux, P.T. Lin, K.T. Pedretti, A.F. Rodrigues, T.G. Trucano, and J.P. Luitjens. Exascale
      design space exploration and co-design. Future Generation Computer Systems, 30:46
      – 58, 2014.
[135] Kyle L. Spafford and Jeffrey S. Vetter. Aspen: A domain specific language for per-
      formance modeling. In Proceedings of the International Conference on High Performance
      Computing, Networking, Storage and Analysis, SC ’12, pages 84:1–84:11, Los Alamitos,
      CA, USA, 2012. IEEE Computer Society Press.
                                             122


[136] Jeffrey S. Vetter and Jeremy S. Meredith. Synthetic program analysis with Aspen.
      In Proceedings of the 3rd International Conference on Exascale Applications and Software,
      Edinburgh, Scotland, UK, 04/2015 2015. University of Edinburgh, University of
      Edinburgh.
[137] Ewa Deelman, Karan Vahi, Gideon Juve, Mats Rynge, Scott Callaghan, Philip J.
      Maechling, Rajiv Mayani, Weiwei Chen, Rafael Ferreira da Silva, Miron Livny, and
      Kent Wenger. Pegasus, a workflow management system for science automation.
      Future Generation Computer Systems, 46:17 – 35, 2015.
[138] A. Aw and M. Rascle. Resurrection of "second order" models of traffic flow. SIAM
      Journal on Applied Mathematics, 60(3):916–938, 2000.
[139] D. Armbruster, D. Marthaler, and C. Ringhofer. Kinetic and fluid model hierarchies
      for supply chains. Multiscale Modeling & Simulation, 2(1):43–61, 2003.
[140] Mapundi K. Banda, Michael Herty, and Axel Klar. Gas flow in pipeline networks.
      Networks and Heterogeneous Media, 1(1):41–56, 2006.
[141] Jens Brouwer, Ingenuin Gasser, and Michael Herty. Gas pipeline models revisited:
      Model hierarchies, nonisothermal models, and simulations of networks. Multiscale
      Modeling & Simulation, 9(2):601–623, 2011.
[142] Axel Klar and Raimund Wegener. A hierarchy of models for multilane vehicular
      traffic. I. Modeling. SIAM J. Appl. Math., 59(3):983–1001 (electronic), 1999.
[143] Ernst Hairer, Syvert P Nørsett, and Gerhard Wanner. Solving Ordinary Differential
      Equations I: Nonstiff problems. Springer, 1987.
[144] Cory D Hauck, Michael Herty, and Giuseppe Visconti. Qualitative properties of
      mathematical model for data flow. Networks & Heterogeneous Media, 16(4), 2021.
                                              123