A-STABLE IMPLICIT RAPID SCHEME AND SOFTWARE SOLUTION FOR ELECTROMAGNETIC WAVE PROPAGATION By Mathialakan Thavappiragasam A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Electrical Engineering - Doctor of Philosophy Computational Mathematics, Science and Engineering - Dual Major 2019 ABSTRACT A-STABLE IMPLICIT RAPID SCHEME AND SOFTWARE SOLUTION FOR ELECTROMAGNETIC WAVE PROPAGATION By Mathialakan Thavappiragasam A robust and rapid scheme to solve electromagnetics (EM) is an important requirement in the scientific computing environment in which there are several useful methods used to solve tasks in EM. Our research study is motivated by this need and is targeted to develop a fast A-stable implicit numerical scheme and scalable software solution for EM wave propagation. Our scheme is based on the Method Of Lines Transpose (MOLT) approach which discretizes time first and then solves boundary value problems. By applying the free-space Green’s function, the solution is derived by decomposing particular and homogeneous solutions. The compact Simpson’s quadrature based, O(N) fast convolution, a recursive algorithm, is used to solve the particular solution for N number of grid points. The homogeneous solution is obtained using a particular solution at the boundary points and the applied boundary conditions. The multi-dimensional scheme is developed using the ADI splitting approach and an arbitrary order accuracy in time is achieved by switching the time derivation to a spatial derivation using the Lax-Wendroff approach. The focus of the work in this thesis has been to overcome the limitations in Neumann and outflow boundary conditions to get high-order accuracy by using special treatments that deal with a choice of the interpolation, finite difference stencil, and the initial conditions. In addition, we have extended these ideas to construct perfectly electrically conducting boundary conditions in 2D for the MOLT. In addition to introducing higher-order boundary conditions, an embedded boundary method is employed to deal with complex geometries. As the method is A-stable, it does not suffer from small-time step limitations that are found in explicit finite difference time domain methods when using either embedded boundary or cut cell methods to capture geometry. Further, we are developing an open source code MOLTN (Method Of Lines Transpose, Nth order) which is intended to be a hardware-independent, scalable software tool, using multi- node MPI, multi-core OpenMP, and GPU CUDA implementation. As a test case of the method, we implement and study the A6 magnetron with our embedded boundary method using point sources inside of the domain. The eventual goal is to combine this method with a novel particle method for the simu- lations of plasma. The particle method would treat particles as point particles that generate fields that are tracked on the mesh. No density or current will be mapped to the mesh. The consistency and performance of the scheme are evaluated for EM wave propagation and scattering using different shaped objects including curved boundaries and the introduction of true point sources that demonstrate how we handle particles. Stable solutions result for a wide range of mesh sizes and potential to leverage novel computing architectures, such as GPU, have been demonstrated. Copyright by MATHIALAKAN THAVAPPIRAGASAM 2019 For my beloved late parents (Thavappiragasam and Pushpavathy) who motivate me beyond, to move toward my dream v ACKNOWLEDGEMENTS My first and deepest gratitude goes to my advisor Professor Andrew Christlieb for giving me point to point guidance and support to achieve the target successfully; thank god for giving me such a mentor for my Ph.D. studies. Dr. John Luginsland is the next person to bring me through these studies. Thank you so much for your valuable guidance sir. I will never forget your help, Professor Shanker Balasubramaniam; you established a successful path to move forward, thank you so much! I would also like to thank Professor John Verboncoeur, Professor Edward J. Rothwell, Professor Brian O’Shea for their guidance and thoughts over my Ph.D. studies. Without your support, I couldn’t imagine success. I would like to thank Dr. Matthew F. Causley, Dr. Aditya Viswanathan, and Dr. Pierson Guthrey for their support and collaborative works to accomplish every milestone, and also I am happy to thank Dr. Eric Wolf for his valuable and timely supports. The Christlieb- group, the bleeding edge research team with active researchers, is the team I have been working throughout my Ph.D. studies. Thank you colleagues, William Sands, Dr. Hana Cho, Dr. Michel Crockatt, Dr. Xiao Feng, Dr. Bosu Choi, Dr. Yan Jiang, Dr. Wei Guo, Dr. Bankim Mandal, Dr. Rouchun Zhang for your tips and tricks. I would also like to thank Dr. Bernard G. Pope, a professor emeritus of physics at the Michigan State University, for thesis proofreading and editing. My thanks also go to professors who taught me courses at Michigan State University, and the faculty and staff that are working in the Department of Electrical and Computer Engineering and Computational Mathematics, Science, and Engineering. Finally, I am proud to thank my family for their support and patience, especially my loving wife Kukapalini for her supports and inspirational words of encouragement to carry my studies and our family concurrently. I also want to thank friends and relatives for their motivation. vi TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi LIST OF ALGORITHMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi CHAPTER 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Context and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 High Power Magnetrons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Research Goal and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4 Electromagnetic Wave Propagation . . . . . . . . . . . . . . . . . . . . . . . 1.4.1 Electromagnetic Fields . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4.2 Boundary Conditions for Electromagnetic Fields . . . . . . . . . . . . 1.5 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 2: Mathematical Model for Wave Equation Solver . . . . . . . 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 One Dimensional Implicit Wave Equation Solver . . . . . . . . . . . . . . . . 2.2.1 Semi-discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.2 Fast Convolution Algorithm . . . . . . . . . . . . . . . . . . . . . . . 2.2.3 Boundary Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.3.1 Dirichlet Boundary Condition . . . . . . . . . . . . . . . . . 2.2.3.2 Neumann Boundary Condition . . . . . . . . . . . . . . . . 2.2.3.3 Periodic Boundary Condition . . . . . . . . . . . . . . . . . 2.2.3.4 Outflow Boundary Condition . . . . . . . . . . . . . . . . . 2.3 Multi-Dimensional Implicit Wave Equation Solver using ADI Scheme . . . . 2.4 Arbitrary Temporal Order Accuracy . . . . . . . . . . . . . . . . . . . . . . 2.4.1 Arbitrary Order One Dimensional Scheme . . . . . . . . . . . . . . . 2.4.2 Extension to Higher Dimensions for Arbitrary Order Accuracy . . . . 2.5 Treatment for Variable Speed Wave Propagation . . . . . . . . . . . . . . . . 2.6 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6.1.1 Higher Order Wave Solvers in Two Dimension . . . . . . . . 2.6.1.2 Higher Order Wave Solvers in Three Dimensions . . . . . . 2.6.2 Three Dimensional Waves . . . . . . . . . . . . . . . . . . . . . . . . 2.6.3 Waves with Variable Speeds . . . . . . . . . . . . . . . . . . . . . . . 2.6.3.1 One Dimensional Case . . . . . . . . . . . . . . . . . . . . . 2.6.3.2 Two Dimensional Case . . . . . . . . . . . . . . . . . . . . . 2.6.3.3 Three Dimensional Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6.4.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . 2.6.4 Waveguide in a Photonic Crystal 2.6.1 Convergence Studies 1 1 3 6 7 7 10 11 13 13 16 16 23 24 25 26 27 27 31 35 37 41 44 46 46 46 48 48 50 50 53 55 56 56 vii 2.6.4.2 Geometry of the Problem . . . . . . . . . . . . . . . . . . . 2.6.4.3 Result and Discussion . . . . . . . . . . . . . . . . . . . . . 2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CHAPTER 3: Higher Order Non-reflecting Boundary Condition . . . . . . 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Extension to Higher Order in Accuracy . . . . . . . . . . . . . . . . . . . . . 3.3 Appropriate Initial Condition . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 Convergence Test for High-Order Outflow . . . . . . . . . . . . . . . 3.4.2 Rotating Gaussian Pulse . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.2.1 Time Evaluation of 50o angled Gaussian pulse . . . . . . . 3.4.3 Outflow Boundary Condition Along Curve Boundaries . . . . . . . . 3.4.4 Convergence Studies Using Analytical Solution . . . . . . . . . . . . . 3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 58 61 62 62 63 67 68 68 70 72 73 74 76 CHAPTER 4: Higher Order Embedded Neumann Boundary Method . . . Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 4.2 Derivation of The Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 One Dimensional Scheme . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.2 Two Dimensional Scheme . . . . . . . . . . . . . . . . . . . . . . . . 4.2.3 Three Dimensional Scheme . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Treatment for Complex Geometries . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 Pre-computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.2 Geometry for A6 MDO in 2D . . . . . . . . . . . . . . . . . . . . . . 4.4 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 77 79 79 81 85 88 90 92 93 95 4.4.1.1 Convergence studies 96 Spherical Scattering 98 4.4.2.1 Convergence Studies . . . . . . . . . . . . . . . . . . . . . . 100 4.4.3 Electron Beam in Magnetrons . . . . . . . . . . . . . . . . . . . . . . 101 4.4.3.1 A6 Magnetron with Diffraction Output (MDO) in 2D . . . . 101 4.4.3.2 A6 Magnetron in 3D . . . . . . . . . . . . . . . . . . . . . . 103 4.4.3.3 Convergence Studies . . . . . . . . . . . . . . . . . . . . . . 104 4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 4.4.1 Cylindrical Scattering 4.4.2 CHAPTER 5: Second-Order PEC Boundary Condition . . . . . . . . . . . . 107 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 5.1 2D Electric Scalar and Magnetic Vector Potential . . . . . . . . . . . . . . . 108 5.2 5.3 Perfect Electric Conductor Boundary . . . . . . . . . . . . . . . . . . . . . . 109 5.4 Numerical Approximation for PEC Boundary Conditions . . . . . . . . . . . 111 5.4.1 Using Dirichlet Boundary Conditions for A to Capture PEC Boundary 112 5.4.2 Embedded Boundary Method, An effective Neumann to Dirichlet . . . . . . . . . . . . . . . . . . . . . 115 5.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 Map for Creating Ghost Points viii 5.5.1 Square Cavity Rotated Through Different Angles . . . . . . . . . . . 120 5.5.1.1 Convergence Studies and Error Analysis . . . . . . . . . . . 123 5.5.2 Square Cavity with a leak (diffraction Q) . . . . . . . . . . . . . . . 126 5.5.3 A6 Magnetron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 5.5.4 Rising Sun Magnetrons . . . . . . . . . . . . . . . . . . . . . . . . . . 129 5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 CHAPTER 6: Software Solution and Code Acceleration . . . . . . . . . . . 134 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 6.2 Multi-core OpenMP, Multi-node MPI version . . . . . . . . . . . . . . . . . 134 6.2.1 Data Structures and Data Flows . . . . . . . . . . . . . . . . . . . . . 135 6.2.2 Domain Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . 136 6.2.3 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 6.3 GPGPU CUDA version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 6.3.1 Task Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 6.3.2 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 CHAPTER 7: Summary and Potential Future Directions . . . . . . . . . . . 145 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 7.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 7.3 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 7.3.1 Code Acceleration for MOLTN . . . . . . . . . . . . . . . . . . . . . 146 7.3.2 Complex Geometry in MOLTN . . . . . . . . . . . . . . . . . . . . . 147 7.3.3 Further HPM tube simulations . . . . . . . . . . . . . . . . . . . . . 147 Incorporate with Particles . . . . . . . . . . . . . . . . . . . . . . . . 148 7.3.4 7.3.4.1 Derive the scheme for 3D point source . . . . . . . . . . . . 149 7.3.4.2 Numerical Example . . . . . . . . . . . . . . . . . . . . . . 151 7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 ix LIST OF TABLES Table 2.1: The maximum values βmax for which the P th order scheme remains A-stable [14]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Table 5.1: Frequency (in GHz) obtained at the point (3.36cm, 3.36cm) using the ping test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Table 5.2: Frequency (in GHz) obtained at the point (3.36cm, 3.36cm) using the ping test for CFL 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Table 5.3: Numerical error in the frequency computation using the ping test at the point (3.36cm, 3.36cm). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 x LIST OF FIGURES Figure 1.1: 3D Relativistic A6 magnetron with diffraction output (MDO) with a . . . . . . . . . . . . . . . . . . . cylindrical cathode in the center [61] Figure 1.2: 2D view of A6 magnetron with a cathode of radius rc and the anode with inner radius ra, vane radius rv, vane angle α1, and cavity angle α2 . . . . Figure 1.3: Boundary surface between two regions with electric fields E1 and E2, magnetic field H1 and H2, electric flux densities D1 and D2, and magnetic flux densities B1 and B2 respectively. Here, Js is the surface current, ρs is the surface charge density, and n is the normal vector pointed out from the region two. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 2.1: Fast convolution line integration which decomposes left I L[u](x) and right I R[u](x)) oriented integrals between the domain a and b with the spatial step size, ∆xj(= xj − xj−1). . . . . . . . . . . . . . . . . . . . . . . . . Figure 2.2: (a) x-, (b) y- and (c) xy lines with exact circular boundary points on the mesh lines that are used for the ADI x (a) and y (b) sweeps. . . . . . . Figure 2.3: Fourth order time convergence of the 2D wave solver using Dirichlet (a) and periodic (b) boundary conditions with ∆x = ∆y = 6.25 × 10−3. This is a self-refinement study which measures L∞ norm of the error at time T = 2.0 on a square domain Ω = [−1, 1]2 with a point source cos(ωt), ω = 1 at the center of the box, (0, 0). The CFL (= c∆t ∆x ) value changes proportional to the time step size ∆t with fixed spatial step size. Figure 2.4: Fourth order convergence of 3D wave solver using Dirichlet boundary conditions with ∆x = ∆y = ∆z = 1.25 × 10−2 ω = 1. This is a self- refinement study which measures L∞ norm of the error at time T = 2.0 on a cubic domain Ω = [−1, 1]3 with a point source cos(ωt) at the center of the cube, (0, 0, 0) for (a) ω = 0.1 and (b) ω = 1. The CFL (= c∆t ∆x ) value changes proportional to the time step size ∆t with fixed spatial step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . size. Figure 2.5: Time evolution of a point source field cos(ωt), ω = 1 within a cubic do- main (Ω = [−1, 1]3) by imposing Dirichlet boundary condition along the surface with the spatial step size, ∆x = ∆y = ∆z = 0.0625 ω = 1 and the time step size, ∆t = 0.0313. . . . . . . . . . . . . . . . . . . . . . . 5 6 10 23 33 47 49 50 xi Figure 2.6: Time evolution of a point source field cos(ωt), ω = 1 within a cubic (Ω = [−1, 1]3) by imposing periodic boundary condition along the surface with the spatial step size, ∆x = ∆y = ∆z = 0.0625 ω = 1 and the time step size, ∆t = 0.0313. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 2.7: Time evolution of a point source field cos(ωt), ω = 1 within a cubic (Ω = [−1, 1]3) by imposing outflow boundary condition along the surface with the spatial step size, ∆x = ∆y = ∆z = 0.0625 ω = 1 and the time step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . size, ∆t = 0.0313. Figure 2.8: 1D wave travelling over the two domains Ω1 and Ω2 with piecewise con- stant speed c1 = 1.0 and c2 = 2.0. For this experiment, we choose spatial step ∆x = 0.002, and time step ∆t = 0.005. . . . . . . . . . . . . . . . Figure 2.9: 1D wave travelling over the layered media with eight domains and the wave travels with the same speed in every bi-domain ( c1 = 1.0 and c2 = 2.0). For this experiment, we choose spatial step ∆x = 0.002, and time step ∆t = 0.005 along layered media with piecewise constant wave speed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 2.10: Geometrical view of 0.5 × 0.5 (a) one and (b) four square patches in the square domain Ω = [−1, 1]2 with c2(= 0.1) and c1(= 1.0) are the wave speeds in the patches and the remaining area. . . . . . . . . . . . . . . . Figure 2.11: Time evolution of a Gaussian field e−25(x2+y2) in a square domain Ω = [−1, 1]2 with a 0.52 square patch as shown in Figure 2.10-(a). Here, the spatial step size, ∆x = ∆y = 0.02 and the time step size, ∆t = 0.005. . . Figure 2.12: Time evolution of a Gaussian field e−25(x2+y2) in a square domain Ω = [−1, 1]2 with four 0.52 square patches as shown in Figure 2.10-(b). Here, the spatial step size, ∆x = ∆y = 0.02 and the time step size, ∆t = 0.005. Figure 2.13: Geometrical view of 0.5 × 0.5 × 0.5 (a) one and (b) four cubic patches in the cubical domain Ω = [−1, 1]3 with c2(= 0.1) and c1(= 1.0) are the wave speeds in the patches and the remaining area. . . . . . . . . . . . . Figure 2.14: Time evolution of a Gaussian field e−25(x2+y2+z2) in a cubical domain Ω = [−1, 1]3 with a 0.53 cubic patch as shown in Figure 2.13-(a). Here, the spatial step size, ∆x = ∆y = ∆z = 0.0625 and the time step size, . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ∆t = 0.0313. Figure 2.15: Time evolution of a Gaussian field e−25(x2+y2+z2) in a cubical domain Ω = [−1, 1]3 with four 0.53 cubic patches as shown in Figure 2.13-(b). Here, the spatial step size, ∆x = ∆y = ∆z = 0.0625 and the time step size, ∆t = 0.0313. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 50 52 53 54 54 54 55 56 56 xii Figure 2.16: Schematic illustration of a line defect (in red) within a set of cylindrical rods (in green) on a photonic crystal. . . . . . . . . . . . . . . . . . . . . Figure 2.17: 2D Lattice of cylindrical rods which are periodic along x and y axes with . . . . . . . . . . . . . . . . . . lattice constant a on a photonic crystal. Figure 2.18: Square lattice of square rods of side length 0.38a with lattice constant a and dielectric constant  = 8.9 in 3D (a) and 2D (b), a linear defect is . . . . . . . . . . . . . . . . . . . formed by removing a column of rods. Figure 2.19: The wave generated by the point source, e(−iωt), where ω = cky, ky = ,  = 8.9, and µ = 1 travelling through the photonic 0.88(2a), c = 1√ waveguide using the model shown in Figure 2.18 . . . . . . . . . . . . . (µ) Figure 2.20: Square lattice of cylindrical rods of side length 0.38a with lattice constant a and dielectric constant  = 8.9 in 3D (a) and 2D (b), a linear defect is . . . . . . . . . . . . . . . . . . . formed by removing a column of rods. Figure 2.21: The wave generated by the point source, e(−iωt), where ω = cky, ky = ,  = 8.9, and µ = 1 travelling through the photonic 0.88(2a), c = 1√ waveguide using the model shown in Figure 2.20 . . . . . . . . . . . . . (µ) Figure 2.22: The wave, e(−25(x2+y2)) traveling through the photonic waveguide using . . . . . . . . . . . . . . . . . . . . . . the model shown in Figure 2.20. Figure 3.1: Fourth order convergence of (a) 2D and (b) 3D wave solver using out- flow boundary conditions with the spatial step size h = 1.25 × 10−2. This is a self-refinement study which measures L∞ norm of the error at time T = 2.0 on a (a) square and (b) cubic domain with a point source cos(ωt), ω = 1 at the center of the domain. The CFL (= c∆t h ) value changes proportional to the time step size ∆t with fixed spatial step size, h. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 3.2: Fourth order convergence of the oval shape Gaussian pulse (however it reduces to the above third order near to θ = Π 4 due to the ADI splitting error) given by the Equation 3.8 rotated through Π 2 on square do- main Ω = [−1, 1]2 by imposing Outflow boundary conditions along the boundaries. This self-refinement study measures L∞ norm of the error at time T = 2.0 with h = 0.02. . . . . . . . . . . . . . . . . . . . . . . . . 18 to Π xiii 57 58 59 59 60 60 60 70 71 Figure 3.3: Time evolution of a Gaussian pulse given by the Equation 3.8 with angle θ = 50o on square domain Ω = [−1, 1]2 by imposing outflow boundary conditions along the boundaries. Here the time step size ∆t = 3.3× 10−3 . . . and CFL value is 0.5. The wave is leaving completely as expected. Figure 3.4: Evolution of a Gaussian field given by the Equation 3.8 at time t = 0.6864, rotated through different angles (a) θ = 30o, (b) θ = 50o, (c) θ = 70o, and (d) θ = 90o on square domain Ω = [−1, 1]2 by imposing outflow boundary conditions along the boundaries, Here the time step size ∆t = 3.3 × 10−3 and CFL value is 0.5 at time t = 0.1287 using outflow boundary conditions. We observe the same behaviour of the wave for rotated Gaussian pulse through different angles. The waves leave completely through the curved boundary. . . . . . . . . . . . . . . Figure 3.5: (a) Geometry and (b) graph representation of the 2D object constructed by using an oval of width 2a and height 2b and a rectangle of 2rw × (rh1 + . rh2). Here rh1 = b and v1, v2, v3, and v4 are the vertices of the object. Figure 3.6: Time evolution of the point source field cos ωt with ω = 0.5 at (0, 100∆x) in the object as shown in Figure 3.5 by imposing outflow boundary con- ditions along the curved boundary and homogeneous Dirichlet along the straight boundaries. Here the spatial step size, ∆x = ∆y = 0.039 and the time step size, ∆t = 0.0195 . . . . . . . . . . . . . . . . . . . . . . . Figure 3.7: (a) Snapshot of the point source field at time T = 5.0 and (b) Fourth order convergence of 2D wave solver using Outflow boundary conditions along the curve boundary with the spatial grid width ∆x = ∆y = 0.078. This is a self-refinement study which measures L∞ norm of the error at time T = 5.0 on the object shown in Figure 3.5 with a point source . . . . . . . . . . . cos(ωt), ω = 0.5 at the center of the box, (0, 100∆x) Figure 3.8: Fourth order convergence of 3D wave solver using outflow boundary con- ditions with the spatial step size h = 2.5× 10−2. This measures L∞ norm of the error which compares with analytical solution using a point source sin(4πt) placed at the center of the domain. . . . . . . . . . . . . . . . . Figure 4.1: Geometry of 1D embedded Neumann boundary domain Ω ≡ [xa, xb] with left boundary xB, ghost point xG, and interpolation points xI and xII, where the distance ξI = |xI − xa|, ξII = |xII − xa|, and ξG = |xG − xa| . Figure 4.2: (a)Geometry of 2D embedded Neumann boundary method with a bound- ary point (xB, yB), ghost point (xG, yG), and interpolation points (xI, yI), (xII, yII), (xIII, yIII) and (xIV , yIV ) and (b) symmetric 8-points stencil to interpolate the solution at the point (xI, yI). . . . . . . . . . . . . . . . 72 72 73 74 75 76 80 82 xiv Figure 4.3: Geometry of 3D embedded Neumann boundary method with a bound- ary point (xB, yB, zB), ghost point (xG, yG, zG), and interpolation points (xI, yI, zI), (xII, yII, zII), (xIII, yIII, zIII) and (xIV , yIV , zIV ) . . . . . . . Figure 4.4: 2D graph with 3 vertices (v1(x1, y1), v2(x2, y2), and v3(x3, y3), 2 straight edges es1(≡ (v1, v2)), es2(≡ (v3, v1)) and 1 circular arch (a) or elliptical arch (b) edge ea1(≡ (v2, v3, Oa1)) centered at Oa1(xa1, ya1). . . . . . . . Figure 4.5: Flow diagram of pre-computation for 3D problems with complex geome- tries; it cuts a 3D object into 2D slices Sxy, Sxz, makes 2D graphs Gxy, Gxz, and identifies the properties of the line segments Px, Py, and Pz. . Figure 4.6: A6 MDO (a) Axial view with a solid cathode of radius rc in the center, the anode with inner radius ra, vane radius rv, vane angle α1, cavity angle α2, and a coupling horn of radius rh (b) the angle of kth cavity which is represented by the angles θk . . . . . . . . . . . . 1 and θk 2 , θk 2 = θk 1 + α1. Figure 4.7: Geometrical structure of A6 MDO with key points; intersection, bound- ary, ghost, and interpolation points required to obtain (a) x− and (b) y− . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . sweeps. Figure 4.8: 2D model for a single cylindrical scatter at the (a) center and two (b) symmetric and (c) asymmetric cylindrical scatters in the corners. Here the red mark indicates the point source. . . . . . . . . . . . . . . . . . . 86 89 90 93 94 95 Figure 4.9: Time evolution of a point source field cos(ωt), ω = 10 with a circular scatter of radius r = 0.5 in the center of the square domain Ω = [−1, 1]2 with a spatial step size ∆x = ∆y = 0.0156 and time step size ∆t = 0.0078. 96 Figure 4.10: Time evolution of a point source field cos(ωt), ω = 10 with two symmetric circular scatters of radii r1 = r2 = 0.3 in the bottom-left and top-right corners of the square domain Ω = [−2, 2]2 with a spatial step size ∆x = . . . . . . . . . . . . . . . ∆y = 0.0156 and time step size ∆t = 0.0078. Figure 4.11: Time evolution of a point source field cos(ωt), ω = 10 with two asymmet- rical circular scatters of radii r1 = 0.3 and r2 = 0.15 in the bottom-left and top-right corners of the square domain Ω = [−2, 2]2 with a spatial step size ∆x = ∆y = 0.0156 and time step size ∆t = 0.0078. . . . . . . . Figure 4.12: Fourth order convergence of 2D wave solver using a point source cos(ωt), ω = 1 placed at (0,−1 + 3∆x), ∆x = 0.0156 on a square domain Ω = [−1, 1]2 with a circular scatter of radius r = 0.5 at the center by imposing Neumann and outflow along the circular and square boundaries. This is a self-refinement study which measures L∞ norm of the error at the final time T = 2.0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 97 97 xv Figure 4.13: Convergence plots for a) space using entire solution and b) space using boundary derivative on a square domain Ω = [−1, 1]2 with a circular scatter of radius r = 0.5 at the center by imposing Neumann and outflow along the circular and square boundaries. This is a self-refinement study which measures L2 norm of the error at the final time T = 2.0. . . . . . Figure 4.14: 3D model for a single spherical scatter at the center of a cube, two b) . . . . . symmetric and c) asymmetric spherical scatters in the corners. 98 99 Figure 4.15: Time evolution of a point source field cos(ωt), ω = 10 with a spherical scatter of radii r = 0.5 in the center of the cubic domain Ω = [−1, 1]3 with a spatial step size ∆x = ∆y = ∆z = 0.0156 and time step size ∆t = 0.0078. 99 Figure 4.16: Time evolution of a point source field cos(ωt), ω = 10 with two symmetric spherical scatters of radii r1 = r2 = 0.3 in the bottom-left and top- right corners of the cubic domain Ω = [−1, 1]3 with a spatial step size ∆x = ∆y = ∆z = 0.0156 and time step size ∆t = 0.0078. . . . . . . . . 100 Figure 4.17: Time evolution of a point source field cos(ωt), ω = 10 with two asymmet- ric spherical scatters of radii r1 = 0.3 and r2 = 0.15 in the bottom-left and top-right corners of the cubic domain Ω = [−1, 1]3 with a spatial step size ∆x = ∆y = ∆z = 0.0156 and time step size ∆t = 0.0078. . . . . . . 100 Figure 4.18: Fourth order (a) time and (b) space convergence plots using a spherical scatter of radius r = 0.5 at the center of a cubic domain Ω = [−1, 1]3 by imposing Neumann and outflow along the surface of the sphere and cubic boundaries. This is a self-refinement study which measures (a) L∞ and (b) L2 norm of the error at the final time T = 2.0. . . . . . . . . . . . . 101 Figure 4.19: Time evolution of a circular source field cos(ωt)(|x2+y2−r2 c| < 2∆x), ω = 10 in the model of A6RM with a solid cathode of radius rc = 1.58 and anode of inner radius ra = 2.11, vane radius rv = 4.11 and the coupling horn radius rh = 4.9 and the angular width of the vane, α1 = π 6 , and the cavity, α2 = π 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Figure 4.20: Time evolution of fields provided by six point sources in the model of A6M with a transparent cathode of radius rc = 1.58 and anode of inner radius ra = 2.11, vane radius rv = 4.11 and the coupling horn radius 6 , and the cavity, α2 = π rh = 4.9 and the angular width of the vane, α1 = π 6 .102 xvi Figure 4.21: Fourth order convergence of fields provided by a point source cos(ωt), ω = 1 at the center (0, 0), of A6MDO with the anode of radii rh = 4.9 rv = 4.11, and ra = 2.11 and angular width of the vane, α1 = π 6 and the cavity, α2 = π 6 . This is a self-refinement study which measures L∞ norm of the error at time T = 2.0 for fixed spatial step size ∆x = ∆y = 3.13× 10−2 103 Figure 4.22: Key points used for the simulation of 3D A6 magnetron; intersection, boundary, ghost, and interpolation points required to obtain (a) x−, (b) y−, and (c) z− sweeps. . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 Figure 4.23: Time evolution of fields provided by six point sources in the model of 3D A6 magnetron with a transparent cathode of radius rc = 1.58 and anode of inner radius ra = 2.11, vane radius rv = 4.11 and the coupling horn radius rh = 4.9, the angular width of the vane, α1 = π 9 , cavity angle, α2 = 2π 9 , and thickness h = 6. . . . . . . . . . . . . . . . . . . . . . . . . 105 Figure 4.24: Fourth order (a) time convergence and (b) space convergence of fields provided by six point sources in the model of 3D A6 magnetron with a transparent cathode and the anode of radii rh = 4.9 rv = 4.11, and ra = 2.11 and angular width of the vane, α1 = π 6 and the cavity, α2 = π 6 . This is a self-refinement study which measures L∞ norm of the error at time T = 2.0 for a fixed spatial step size of ∆x = ∆y = 3.13 × 10−2 . . . 105 Figure 5.1: Boundary surface between two regions with electric fields E1 and E2, magnetic field H1 and H2, electric flux densities D1 and D2, and magnetic flux densities B1 and B2 respectively. Here, Js is the surface current, ρs is the surface charge density, and n is the normal vector pointed out from the region two (Perfect Electric Conductor). . . . . . . . . . . . . . . . . 110 Figure 5.2: A PEC square cavity rotated by an angle θ with normal vectors nL, nD, nR, and nU along the left, down, right, and up boundaries. . . . . . . . . 112 Figure 5.3: Geometry of 2D embedded PEC boundary method with a boundary point (xB, yB), ghost point (xG, yG), and interpolation points (xI, yI), (xII, yII), and (xIII, yIII). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Figure 5.4: Frequency distribution for 31.42o rotated 21cm×21cm square cavity com- puted using the measured vector potential at the point (3.36cm, 3.36cm) for the impulse response, h = 0.084 cm . . . . . . . . . . . . . . . . . . . 121 Figure 5.5: A strong fundamental mode for 31.42o rotated 21cm×21cm square cavity computed for different resolutions. . . . . . . . . . . . . . . . . . . . . . 121 xvii Figure 5.6: Convergence plots for the natural modes (a) 1-3, (b) 3-3, (c) 1-5, (d) 3-5, and (e) 5-5 with analytically computed frequencies (a) f = 2.2588GHz, (b) f = 3.0305GHz, (c) f = 3.6422GHz, (d) f = 4.1650GHz, and (e) f = 5.0508GHz for 31.42o rotated 21cm × 21cm square cavity computed for different resolutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Figure 5.7: Time evolution of a point source field sin(2πf t) with f = 1GHz placed at the center of a PEC square box of grid size 100 × 100 rotated by the angles 0o, 31.42o, 45o, time step size ∆t = 7ps. . . . . . . . . . . . . . . 124 Figure 5.8: Convergence plots for a) space using entire solution and b) space using boundary derivative on a PEC square domain [0cm, 21cm]2. This study measures L2 norm of the error at time T = 1.0ns compared with the analytical solution, for the cases of mesh aligned and nonaligned (rotated by 45o). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Figure 5.9: Convergence plots for (a) time and (b) space on a PEC a square domain [0cm, 21cm]2. This study measures L∞ norm of the error at time T = 1.0ns compared with the analytical solution, for the cases of mesh aligned and nonaligned (rotated by 45o). . . . . . . . . . . . . . . . . . . . . . 126 Figure 5.10: Time evaluation of A in (a) mesh aligned and (b) θ = 31.420 rotated square cavities with a leak imposed by outflow boundary condition and diffraction Q which is obtained for an oscillating point source (sin(2πf t), f = 1GHz) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 Figure 5.11: Frequency spectrum of 2D A6M with vane resonators rv = 4.11cm, anode radius ra = 2.11cm, cathode radius rc = 1.58cm, angular width of vane, 20o, and cavity angle, 40o, the grid size 128×128, time step size ∆t = 39ps, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. . . . . . 128 Figure 5.12: Evolution of the transparent cathode A6M with vane resonators rv = 4.11cm, anode radius ra = 2.11cm, cathode radius rc = 1.58cm, angular width of vane, 20o, and cavity angle, 40o, the grid size 128×128, time step size ∆t = 39ps, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 Figure 5.13: 2D view of the 18 cavity rising sun magnetron (AX9) with a cathode of radius rc and anode of inner radius ra, short and long cavity radii rvh and rvl. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 xviii Figure 5.14: Evolution of the transparent cathode 12 cavity rising sun magnetron with vane resonators rvs = 4.11cm and rvl = 4.91cm, anode radius ra = 2.11cm, cathode radius rc = 1.58cm, angular width of vane, 12o, and cavity angle, 18o, the grid size 128 × 128, time step size ∆t = 39ps, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. . . . . . 131 Figure 5.15: Evolution of the transparent cathode 18 cavity rising sun magnetron (AX9) with vane resonators rvs = 4.11cm and rvl = 4.91cm, anode radius ra = 2.11cm, cathode radius rc = 1.58cm, angular width of vane, 8o, and cavity angle, 12o, the grid size 128 × 128, time step size ∆t = 39ps, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. . . . . . 132 Figure 6.1: Domain Decomposition of the domain [a, b] to three 1D sub domains j ) and right [c0, c1], [c1, c2],and [c2, c3], each evolves own internal left (I L (I R j ) sweeps over the domain of length ∆cj. . . . . . . . . . . . . . . . . 137 Figure 6.2: Log-log plot for wall time using 28 core Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz processor, for the grid size N 2, N = 64, 128, 256, 1024, 2048.139 Figure 6.3: Log-log plot for speedup in using 28 core Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz processor for the grid size N 2, N = 64, 128, 256, 512, 1024, 2048. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 Figure 6.4: Log-log plot for efficiency using 28 core Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz processor for the grid size N 2, N = 64, 128, 256, 512, 1024, 2048. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 Figure 6.5: Task parallelism for the computation of un+1 based on Cxy and parallel stages executing through CUDA streams. . . . . . . . . . . . . . . . . . 142 Figure 6.6: Log-log plot for wall time using Intel(R) Xeon(R) CPU, Tesla K20, and K80 GPUs for the grid sizeN 2, N = 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 Figure 6.7: Intel(R) Xeon(R) CPU vs Tesla K20 and K80 GPU speedup (log-log plot) for the grid size N 2, N = 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192. . 143 Figure 7.1: Magnetron with eight circular hole cavity anode (this figure is from En- cyclopedia Britannica, Inc.) . . . . . . . . . . . . . . . . . . . . . . . . 148 Figure 7.2: Two-cavity klystron (this figure is from Reference [30]) . . . . . . . . . . 148 Figure 7.3: Evolution of the two 3D point sources sin(2πf t) with the same frequency f = 1 at the corners of the cubical domain ([−1, 1]3), spatial step size ∆x = ∆y = ∆z = 0.013, and time step size ∆t = 0.0067. . . . . . . . . 152 xix Figure 7.4: Time (a) and space (b) convergence studies using 3D point source sin(2πf t) with frequencies f = 1 at the center of the cubical domain ([−1, 1]3). . . 152 Figure 7.5: Evolution of the two 3D point sources sin(2πf t) with frequencies 1 and 10 at the corners of the cubical domain ([−1, 1]3), spatial step size ∆x = ∆y = ∆z = 0.013, and time step size ∆t = 0.0067. . . . . . . . . . . . . 153 xx LIST OF ALGORITHMS Algorithm 1 Fast Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Algorithm 2 Two Dimensional Solver . . . . . . . . . . . . . . . . . . . . . . . . . Algorithm 3 Algorithm 4 compute u : Compute u at time tn+1 . . . . . . . . . . . . . . . . . . linv : L−1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Algorithm 5 Compute u at time tn+1 with high-order accuracy . . . . . . . . . . . Algorithm 6 poly highorder : Compute the coefficients for high-order accuracy . . Algorithm 7 Algorithm 8 Algorithm 9 Algorithm 10 compute u high :Compute u in 2D with high-order accuracy . . . . . compute c : Compute operator C in 2D; Cxy = L−1 compute d :Compute operator D in 2D; Dxy = 1 − L−1 compute cd : Compute operators C and D in 2D; Dxy and Cxy y Dx + L−1 y L−1 x Dy . . . . . . . . . . . . . . x Algorithm 11 Compute u at time tn+1 by imposing outflow boundary conditions . linv out: Compute L−1 for outflow boundary condition . . . . . . . . Algorithm 12 Algorithm 13 apply bc outf low: Apply outflow boundary conditions . . . . . . . . . Algorithm 14 gamma : Compute the γ coefficients for a given order of accuracy . . 25 35 36 36 39 40 43 44 45 45 67 68 69 69 70 81 84 85 88 Algorithm 15 E : Compute the coefficients E based on lemma 2.1 . . . . . . . . . Algorithm 16 Compute L−1 for 1D scheme . . . . . . . . . . . . . . . . . . . . . . . Algorithm 17 Compute wx(= L−1 Algorithm 18 Compute wyx(= L−1 Algorithm 19 Compute wzyx(= L−1 Algorithm 20 Compute wx(= L−1 Algorithm 21 Compute wyx(= L−1 x [u]) . . . . . . . . . . . . . . . . . . . . . . . . . . y [wx]) . . . . . . . . . . . . . . . . . . . . . . . . z [wyx]) . . . . . . . . . . . . . . . . . . . . . . . x [A]) y [wx]) . . . . . . . . . . . . . . . . . . . . . . . . 119 . . . . . . . . . . . . . . . . . . . . . . . . . 118 xxi CHAPTER 1: Introduction 1.1 Context and Motivation Scientists have been researching Higher Power Microwave (HPM) tubes to improve their performance in new technologies, as well as identifying applications in different areas, such as medicine, waste decomposition, radar systems, plasma screens, linear accelerators, etc. Beyond working in the Electromagnetic (EM) experimental labs, researchers also have been working on the development of HPM simulation tools using different numerical schemes. Simulations can provide insight into real experiments. Moreover, they are very flexible to implement and can test new ideas using easy, fast, and cost-efficient design cycles. In order to perform EM simulations, first, we have to solve Maxwell’s equations. Several approaches have been introduced to solve Maxwell’s equations. They basically fall in the broad category of Finite Difference (FD) Methods, Finite Element Methods (FEM), Method of Moments (MOM), or hybrid techniques which combine two or more methods ( e.g., Finite Element Time Domain (FETD) with Finite Different Time Domain (FDTD) or FETD with MOM). Each of the aforementioned methods has pros and cons, For example, the FDTD-based explicit Yee scheme [71] has been used as a benchmark method for more than 60 years because it preserves certain physical properties very well namely, Gauss’s law and the continuity condition. The drawback of the method is that it fails to handle curved boundaries in Cartesian grids due to staircasing issues. Further, explicit FD schemes can become unstable when the surface bounds a medium [52]. In contrast, the FETD method is a very prominent approach for complex geometries but this requires the solution of large systems of linear equations. Such methods are expensive and hard to scale due to the bottleneck of matrix inversions. Further, FETD schemes based on the Newmark beta method are unconditionally 1 stable but exhibit only second-order accuracy [29]. In the same spirit as FEM, the MOM was developed to address complex problems in EM. The method transforms the boundary- value problem into a system of linear equations [42]. Consequently, it suffers from the same performance issues as FEM, since these matrices need to be inverted. The MOLT based scheme developed by Causley et.al. [15; 13] was designed to address the issues faced by FDTD and FETD methods. It is an unconditionally stable, fast, and implicit scheme, capable of arbitrary order accuracy in time for Dirichlet and periodic boundaries on Cartesian grids. The method reduces to second order for outflow boundary condition and Neumann boundary condition on curved boundaries. My work extended the order of accuracy for these problems to arbitrary order, which was verified through refinement studies. We show fourth-order spatiotemporal accuracy for complex geometry problems including curved boundaries. Further, a general framework was defined for problems with complex geometry using an embedded boundary approach. Simulation tools for HPM tubes, such as A6 magnetron, 12 and 18 cavity rising suns were developed using a PEC boundary condition. The scheme was derived for the electromagnetic vector potential using the Lorenz gauge which imposed the PEC boundary condition in 2D. I evaluated the simulation of A6 magnetron using a ping test and obtained six-strong resonance modes. This is identified as a challenging task because the existing tools may fail to give an accurate or a robust solution [39; 40; 51; 67], or the methods are limited to second-order accuracy [32]. The Discontinuous Galerkin PIC- based approach fails to obtain the exact six resonance modes for the simulation of an A6 magnetron due to the diffusivity of the scheme [39; 40]. The multiphysics simulation software tool VSim which is based on conformal Finite Difference Time Domain (FDTD) methods and conformal particle boundary conditions, fails to give robust solutions in the simulation of A6 magnetrons for every resolution. The user-configurable code, MAGIC which is an EM FDTD-PIC code, is limited to the second-order accuracy[32]. We will review the needs and challenges for the simulation of magnetrons, specifically A6 magnetron in the next section. 2 1.2 High Power Magnetrons The Magnetron is the most tunable Higher Power Microwave (HPM) source with 30% tunable range [62]. It was invented by Arthur Hull in 1913, and magnetrons were built on his original principles in the 1920s and 1930s with power levels of about 100W. In the 1940s, the magnetrons developed by Boot and Randell achieved 10kW on their initial development. In August 1940, the cavity magnetron based new weapons development for the US defense systems during World War-II was led by Sir Henry Tizard. The magnetron driven radar had a great impact on the war and the subsequent technical development of the magnetrons was mainly carried out at the MIT radiation laboratory. The postwar development of magnetron and related technologies has been enhanced in multi-directions. Such technology is strapping, a mode selection technology which shifts the frequency spectrum by connecting alternate resonators and prevents mode hopping. The rising suns, designed by alternating between two resonators’ shape to separate the mode in frequency. A new generation of high-power frequency-agile magnetrons was introduced for electronic warfare applications. The coaxial magnetron, introduced by Feinstein and Collier, allows stable tuning by use of a cavity surrounding the magnetron resonator. The Magnetron Injection Gun was introduced by French and the relativistic magnetron (high current extension of the conventional magnetron) was developed by Bekefi and Orzechowski in 1976 [5; 23; 32]. Nowadays, magnetrons, high output power (GW-class) microwave tubes, are used in nu- merous applications such as communication systems, radar, warfare, medical X-ray sources, and microwave ovens. To study and enhance the performance of the magnetrons, magnetron developers need a reliable and accurate simulation method. The research studies on the sim- ulation of magnetrons have been carried out in academic research labs, national research labs as well as electrical-electronic-computer engineering industries. For this simulation, we need to derive a time-dependent solution for Maxwell’s equations that is applicable for complex geometries and should have the ability to incorporate with particles. In recent years, the conventional FDTD particle-in-cell (PIC) method has been widely used to study magnetrons 3 as well as other microwave tubes. Even though the FDTD based Yee scheme is a well-agreed method for Maxwell’s equations and preserves divergence-free quantities, because of an ex- plicit scheme it suffers from CFL restrictions and is not suitable for curved surfaces due to the cut-cell staircasing approximation. Another approach, the conformal finite difference time domain (CFDTD) PIC method, treats cut-cells differently in order to maintain second-order accuracy for curved surfaces [51]. The CFDTD Dey-Mittra algorithm adjusts the area and lengths of cut cells in the equation of Faraday’s law in Maxwell’s equation, but its stability is limited by the CFL condition and it wouldn’t be scalable. The ADI-FDTD is one of the most powerful schemes to solve Maxwell’s equations because it relies on simple, one dimen- sional, tridiagonal system solvers in contrast to a single large system solver as is required by the Crank-Nicholson implicit method. However, the ADI-FDTD method is mostly used to solve non-complicated domains such as rectangular domain and is not applicable for complex geometries because of showing only first-order accuracy for stair-stepped curved boundaries. ADI-FDTD method combined with the Dey-Mittra embedded boundary method can model the curved domains associated with complex structures and time step sizes beyond the CFL limit [67]. The efficiency of this method depends on the one-dimensional tridiagonal solvers which are used underneath and that will cause a major bottle-neck which effects the scala- bility of the scheme. Further, the order of accuracy is limited to second-order. Based on these studies, in order to simulate the HPM tube, we need a robust implicit scheme that applies to complex geometries. Hence, the MOLT based A-stable implicit scheme introduced in [14] is chosen to simulate HPM magnetrons and imposed embedded boundary methods to deal with complex geometries associated with these kinds of accelerator struc- tures. We derived a time-dependent solution to Maxwell’s equations in the vector potential form under the Lorenz gauge and imposed a perfect electric conductor (PEC) embedded boundary method. The scheme is unconditionally stable, shows fourth-order accuracy be- yond the CFL restriction, and is freely scalable with high performance (requires O(n) cost for n grids). It doesn’t depend on scalability restricting matrix operations but instead uses 4 free-space Green’s function based fast convolution. We performed a cold test using our sim- ulation study. The cold test is an experiment of classical electromagnetic/microwave studies without including charged particles and it can be done in either the frequency or the time domain. We determined the dispersion relation of the interacting structures or slow-wave structures and obtained the fundamental frequency mode using ping tests. It would be possible to carry out a follow up hot test which will include particles. Figure 1.1 shows the 3D view of a relativistic A6 magnetron with diffraction output (MDO) [61]. It has a cylindrical cathode in the center, an anode with six vanes around the circle symmetrically, and a coupling horn which is connected with a cavity. We simulate the open cavity A6 MDO using our MOLT based scheme by imposing embedded Neumann (Chapter 4)/PEC (Chapter 5) and outflow boundary conditions for the scheme evaluation. Further, our simulation study includes a symmetrical A6 magnetron using PEC boundary conditions in 2D and Neumann boundary conditions in 3D. Further, we evaluate the A6 magnetron for the impulse response and resonance mode frequencies of it. A 2D view of an A6 magnetron without diffraction output is shown in Figure 1.2. This has a cathode of radius rc and an anode with inner radius ra, vane radius rv, vane angle α1, and cavity angle α2. Figure 1.1: 3D Relativistic A6 magnetron with diffraction output (MDO) with a cylindrical cathode in the center [61] 5 Figure 1.2: 2D view of A6 magnetron with a cathode of radius rc and the anode with inner radius ra, vane radius rv, vane angle α1, and cavity angle α2 . 1.3 Research Goal and Objectives We have been working towards developing a linear-time, fast higher-order, multi-dimensional, A-stable implicit solver that can be applicable for electromagnetic (EM) problems, specif- ically targeting plasma science. This approach uses MOLT formulation combined with an ADI scheme. In this way, algebraic approximations of original complex multi-physics prob- lems can be mapped to utilize a computer’s core competencies. We aim to develop several classes for leadership distributed multi-core computing platforms based on GP-GPU that is an O(N ) direct implicit solver, and thereby enable exploration of a range of grand-challenge problems. We are motivated by the following objectives: • The development of high-order outflow and Neumann boundary conditions in 3D and implementing a general geometric framework. • The development of a high-order EM solver with no CFL constraint by developing 6 suitable boundary conditions for the vector-potential form of Maxwells equations under the Lorenz gauge. Develop a scheme for perfectly electrically conducting layers (PECs) in 2D. • The development of high-order multi-core 2D and 3D implicit solvers on GPGPU for general geometries. • The development of a scalable software solution using multi-core openMP and multi- node MPI. We choose classic EM problems with complex geometries to assess our solver. These include photonic crystal waveguides, scattering of laser light on spherical objects, and sim- ulating electron beams in an A6 relativistic magnetron. 1.4 Electromagnetic Wave Propagation 1.4.1 Electromagnetic Fields The mathematical relationship of electromagnetic fields is governed by Maxwell’s equa- tions. The differential form of the macroscopic Maxwell’s equations can be expressed as: = 0 ∇ × E + ∂B ∂t ∇ × H − ∂D ∂t = J ∇ · B = 0 ∇ · D = ρ (1.1) (1.2) (1.3) (1.4) Where Equations 1.1 and 1.2 represent Faraday’s law and Ampere’s law respectively, and Equations 1.3 and 1.4 represent Gauss’s law. Here B is the magnetic flux density (Wbm−2), D is the electric flux density (Cm−2), E is the electric field intensity (Vm−1), H is the magnetic field intensity (Am−1), J is the electric current density (Am−2), and ρ is the 7 electric charge density (Cm−3). In linear isotropic media, the electric flux density D and the magnetic flux density B have the constitutive relations with electric field intensity E and magnetic field intensity H as follows: D = E = 0rE B = µH = µ0µrH (1.5) (1.6) where the dielectric constant  is the permittivity (Fm−1), 0 is the free space permittivity and r is the relative permittivity. Similarly, µ is the permeability, µ0 is the free space permeability (Hm−1), and µr is the relative permeability of the media. Upon applying the approximations (1.5 and 1.6), we get ∇ · H = 0 ∇ · E = ρ  Now consider the case where  and µ do not vary with time, thus ∇ × E + µ ∇ × H −  ∂H ∂t ∂E ∂t = 0 = J (1.7) (1.8) (1.9) (1.10) Since we have solvers for second-order wave equations, we convert the first order Maxwell system to second-order form. We know the following relationship with the curl and divergence operations, ∇ × (∇ × v) = ∇(∇ · v) − ∇2v (1.11) Applying Equation 1.11 to E, we find the wave equation for the electric field in the 8 presence of source, ∇(∇ · E) − ∇2E = ∇ × (∇ × E) (cid:17) − ∇2E = −µ(∇ × ∂H ∇(cid:16) ρ  ) ∂t ∂(∇ × H) = −µ = −µ = −µ ∂J ∂t ∂t ∂( ∂E ∂t + J) ∂t + ∇(cid:16) ρ (cid:17) ∂t2 − µ ∂2E ∂J ∂t  ∇2E − 1 c2 ∂2E ∂t2 = µ (1.12) where the speed c = 1√ µ. Suppose there are no free charges, ρ = 0, and J = 0 then we have source-free wave equation for the electric field, ∇2E − 1 c2 ∂2E ∂t2 = 0 (1.13) In the same way, upon applying Equation 1.11 to H, we obtain ∇(∇ · H) − ∇2H = ∇ × (∇ × H) (cid:18) (cid:19) −∇2H = ∇ ×  =  ∂t + J + ∇ × J ∂E ∂t ∂(∇ × E) ∂(−µ ∂H ∂t ) + ∇ × J ∂t2 + ∇ × J ∂t ∂2H =  = −µ ∇2H − 1 c2 ∂2H ∂t2 = −∇ × J (1.14) If J = 0 then we have source-free wave equation for the magnetic field, 9 ∇2H − 1 c2 ∂2H ∂t2 = 0 (1.15) We have derived a wave equation for the electric field (1.12) and the magnetic field (1.14) previously. However, since our targeting application domain is plasma science, we need to obtain electric scalar and magnetic vector potential formulation from the mixed potential form of Maxwell’s equation which can easily deal with charged particles. We give a detailed derivation for the vector potential formulation in Chapter 5 1.4.2 Boundary Conditions for Electromagnetic Fields The electromagnetic fields and flux densities are subject to boundary conditions; they can be derived from the integral form of Maxwell’s equations. Let’s consider a current and charge-free boundary s that separates the regions 1 and 2 as shown in Figure 1.3. Figure 1.3: Boundary surface between two regions with electric fields E1 and E2, magnetic field H1 and H2, electric flux densities D1 and D2, and magnetic flux densities B1 and B2 respectively. Here, Js is the surface current, ρs is the surface charge density, and n is the normal vector pointed out from the region two. Faraday’s and Ampere’s laws predict electric and magnetic fields are continuous along the boundary, so the tangential boundary conditions can be expressed as, ˆn × (E1 − E2) = 0 ˆn × (H1 − H2) = 0 (1.16) Gauss’s law predicts the electric and magnetic fluxes are continuous along the normal to 10 the boundary surface, so the normal boundary conditions can be expressed as, ˆn · (D1 − D2) = 0 ˆn · (B1 − B2) = 0 (1.17) Suppose the surface s has the induced surface electric current density Js and the induced surface electric charge density ρs, the tangential and normal boundary conditions can be predicted from Faraday’s, Ampere’s, and Gauss’s laws as follows, ˆn × (E1 − E2) = 0 ˆn · (D1 − D2) = ρs ˆn × (H1 − H2) = Js ˆn · (B1 − B2) = 0 (1.18) (1.19) 1.5 Overview We give a general overview of this dissertation and draw attention to the original contri- butions of the work. Basically, this work begins with MOLT based A-stable implicit solvers introduced by [15] and later developed to achieve higher-order schemes [14] and adopt em- bedded boundary methods [13]. We provide high-order outflow and Neumann boundary con- ditions in 3D while targeting 3D multi-core high-order implicit EM solvers on the GP-GPU platform. This work contributes to the development of high-order outflow and Neumann boundary conditions in 3D for general geometries and 3D high-order EM solver using the vector-potential form of Maxwells equations under the Lorenz gauge. Further, this work implements these solvers on a multi-core computing platform, GP-GPU in order to release a full-throttle multi-scale solver to enhance spatiotemporal performance. We now describe the structure of the dissertation section by section. We start with an introduction of our work in Chapter 1 including motivation (in Section 1.1) and objectives (Section 1.3). In Chapter 2, we describe the available MOLT based implicit scheme (in Section 2.2.1) and explain how we extend the technique to a 3D solver 11 using an ADI scheme ( in Section 2.3). We present numerical results for newly developed 3D solvers including variable speed test cases. Waveguides on a photonic crystal are used to assess our 2D variable speed solver. High-order temporal schemes are explained in Chapter 3 including a novel scheme for high-order outflow boundary conditions described in Section 3.2. Further, this chapter in- cludes numerical results for the proposed high-order 3D solver. In Chapter 4, we begin with a description of the embedded method presented by [13] and extend it to higher-order using the method given in [15] for 2D and 3D problems. Further, we develop a general approach to deal with complex boundary problems in Section 4.3. We give an explanation of our high-order 3D scheme in Section 4.2.3 and present numerical results for complex boundary problems, electromagnetic wave scattering by PEC cylinders (4.4.1), electron beams in A6 Relativistic Magnetron in 2D (4.4.3), and scattering of Laser light on spherical objects in 3D (4.4.2) which are implemented using a fourth-order embedded boundary scheme. In Chapter 5, we explain our versatile approach for electromagnetic vector potential and the implementation of PEC boundary conditions. We provide several test cases including frequency mode analysis for A6 magnetron in 2D. Chapter 6 explains nuts and bolts of a software solution which can be scalable using high-performance computing systems. Finally in Chapter 7 we summarize our work that was carried out throughout this study and give a list of possible directions that can be used for further advancement. 12 CHAPTER 2: Mathematical Model for Wave Equation Solver 2.1 Introduction In this chapter, we review the novel impact method which we will further develop in later chapters. The new work in this chapter is the extension to 3D. We also discuss the advantages of this approach over other methods. Maxwell’s equations in free-space result in a hyperbolic wave equation with finite speed of propagation of a field quantity. We develop a numerical scheme in 3D to solve the hyperbolic wave equation by imposing different boundary conditions, specifically outflow and embedded Neumann and PEC boundaries. We are mainly focused on the development of an EM solver for problems with complex geometries, which can be easily accepted on multi-core technologies facilitated by GP-GPU computing platforms. Our main target is to derive a time-dependent solution for Maxwell’s equations with higher-order accuracy in time and space. In order to obtain the solution, we choose an implicit unconditionally stable Method Of Lines Transpose (MOLT - also known as the transverse method of lines or Rothes method) [4; 17; 38; 41; 54; 50; 31] based numerical scheme coupled with an embedded boundary approach to deal with complex geometry problems [15; 14; 13]. This was implemented in 2D and we extend the scheme to 3D by successful implementations for different applications. This approach uses a MOLT formulation combined with an Alternating Direction Implicit (ADI) scheme. The method avoids the use of matrices that typically result from the discretization of the spatial parts of a problem and thus eliminates the main bottleneck in scaling implicit methods. In this scheme, a PDE is first discretized in time, and then the resulting boundary- value problems are solved using a Green’s function method. In particular, the inverse of the 13 resulting modified Helmholtz operator is analytically constructed and evaluated efficiently using an O(N ) recursive fast convolution algorithm. Extension to multi-dimensions is formed using an ADI scheme, and each line is solved independently. Next, we describe why this scheme is an optimal solution. When we use this scheme to solve the hyperbolic wave equation, the spatial information along a line in the simulation is connecting the boundaries to all points in space along that line to advance a computational solution in time. Along this line, every point in space knows about the other points due to the nature of the Green’s function used in the solution. The approximation of spatial convolution depends on the quadrature size we choose, and the accuracy of the time derivation can be extended by using the higher-order scheme which performs a set of spatial convolutions/sweeps in multiple computing levels. The hyperbolic wave equation form of Maxwell’s equation is preserved by the time discretization and the spatial convolution. Further, we proved the scheme is strictly dispersing ( no diffusive for- mulation) [53; 15], and satisfies the Gauss’s divergence-free conditions. When we extend the scheme to multi-D using ADI splitting, the spatial points talk to each other by moving in each direction, and they communicate along the lines in that direction. We reduce the splitting error by doing a couple of sweeps along each direction and taking the average in multiple computing levels. Historically, time-dependent solutions of the wave equation are most commonly con- structed with the method-of-lines (MOL) approach [47; 48; 72; 59]. The MOL schemes consist of either collocation (nodal) methods or Galerkin (modal) methods [7; 12; 20]. The collocation methods (e.g.finite-difference, finite-volume, and finite-element methods) are used to find a solution at a particular point, but in Galerkin methods, the solution is projected onto a set of basis functions and the time-dependent coefficients are computed. The ap- proach which has influenced most numerical methods in computational electromagnetics is Finite-difference time-domain (FDTD) methods [46; 65; 19], which use an explicit scheme, so these schemes are restricted by Courant-Friedrichs-Lewy (CFL) condition (concerning the 14 ratio of the time step to the spatial step) that limits the time step size. In 1966, Kane S. Yee derived an FDTD based time-dependent solution for Maxwell’s equations [71]. The FDTD Yee scheme has been used for a large number of problems in computational electromagnetics. However, the scheme is limited to Cartesian grids and suffers from the staircase effect on curved boundaries. Further, it cannot be used to deal with discontinuities across material interfaces while maintaining a high-order of accuracy. Hence, the scheme is not suitable for curved objects or media with material interfaces [57; 19]. There have been many attempts apart from FDTD methods [46; 65; 19] to complex geometry and raise the order of these methods above two [21; 19; 11; 10; 18]. However, these methods have not made it into the mainstream because they are not robust. In addition to stability issues, these methods typically suffer from very small CFL [21; 19; 11; 18] conditions because of small cells at the boundaries. In contrast, implicit schemes break the CFL restriction. Hence, the implicit schemes are most preferable to the problems with tightly coupled scales such as problems in plasma science. Alternating direction implicit FDTD (ADI-FDTD) is an implicit scheme that is used to develop an A-stable implicit Maxwell solver in several plasma-related problems [73; 58; 26; 27], but they are all second order in time. Further, the implicit Finite Element Time Domain (FETD) method based on the Newmark beta approach is unconditionally stable, but it gives second-order accuracy. Further, the implicit and frequency domain schemes require cost-expensive matrix inversion. Therefore, the MOLT based scheme is chosen to be an appropriate scheme for this simulation study in order to obtain high-order accuracy rapidly. In Section 2.2, we give the implicit semi-discrete solution for the 1D wave equation, in Section 2.3 we explain how we can extend this framework to multi-dimensions. In Sec- tion 2.2.3, we derive equations for several applicable boundary conditions including outflow boundary conditions, and Section 2.4 explains details of the higher-order scheme, and finally Section 2.6 reports a set of test cases. The subsections of section 2.4 are laid out as follows. 15 In Section 2.4.1, we derive a family of schemes of order 2P for the one-dimensional case, and in Section 2.4.2, we generalize the higher-order scheme for higher spatial dimensions, producing ADI methods of order 2P which will be A-stable. Here P refers to the number of terms taken in the MOLT expansion. 2.2 One Dimensional Implicit Wave Equation Solver Based on the initial boundary value problem with consistent boundary conditions, the wave equation for one spatial dimension can be written as follows, 1 c2 ∂ttu − ∂xxu = S(x, t), x ∈ [a, b], t < 0, (2.1) with initial conditions u(x, 0) = f (x), ∂tu(x, 0) = g(x), x ∈ [a, b], x ∈ [a, b], where c is the propagation speed and S(x, t) denotes a source. This is well-posed once consistent boundary conditions are appended, such as: 1. Dirichlet boundary condition: u(a, t) = UL(t) and u(b, t) = UR(t). 2. Neumann boundary condition: ∂xu(a, t) = VL(t) and ∂xu(b, t) = VR(t). 3. Periodic boundary condition: u(a, t) = u(b, t) and ∂xu(a, t) = ∂xu(b, t). 4. Outflow boundary condition: ∂tu(a, t) = c∂xu(a, t) and ∂tu(b, t) = −c∂xu(b, t). 2.2.1 Semi-discretization The semi-discrete form of the wave equation can be obtained as follows: We start by discretizing utt using the second order time centered finite difference approxi- 16 mation, ∂ttun = un+1 − 2un + un−1 ∆t2 − ∆t2 12 ∂ttttu(x, η), η ∈ [tn−1, tn+1]. (2.2) The Laplacian has to be evaluated at time tn+1 to obtain an implicit scheme. To perform this, we can apply a symmetric 3-point averaging for the Laplacian because we choose a centered finite difference stencil. ∂xxun = ∂xx where β > 0. un + un+1 − 2un + un−1 β2 (cid:17) − ∆t2 β2 ∂ttxxu(x, η). (2.3) Using (2.2) and (2.3), we obtain the second order semi-discrete equation, (cid:16) ∂xx − β2 (c∆t)2 (cid:17)(cid:16) un + (cid:17) un+1 − 2un + un−1 β2 = − β2 (c∆t)2 − S(x, tn) + O(∆t2). (2.4) (cid:16) (cid:16) The differential operator in Equation (2.4) can be interpreted as a modified Helmholtz operator with, Lx[u] := (cid:17) 1 − 1 α2 ∂xx u(x), α = β c∆t , x ∈ [a, b]. (2.5) On rearranging equation (2.4), we form the modified Helmholtz equation which can be written as, Lx[un+1 − (2 − β2)un + un−1] = β2(cid:16) un + α2 Sn(cid:17) 1 . (2.6) We solve this modified Helmholtz equation (2.6) by inverting the Helmholtz operator L using the free space Greens function. In this way, we start with the solution subject to the boundary [−∞,∞] and then map it to the actual boundary [a, b] by making the boundary adjustment terms. Next, we describe the process in detail beginning with the derivation of 17 Green’s formula. Equation (2.4) can be written as, (cid:16) 1 − 1 (cid:17) where ψ(x) = un+1−(2−β2)un+un−1 and s(x) = β2(cid:16) ψ(x) = s(x), α2 ∂xx x ∈ [a, b], α2 Sn(cid:17) un+ 1 , subject to the homogeneous (2.7) Dirichlet boundary condition ψ(a) = 0, ψ(b) = 0, (2.8) The semi-discrete form of Equation (2.6) approximates the hyperbolic wave equation with the second-order of accuracy in time using centered finite difference, and the symmetric formulation removes the numerical dissipation. The scheme was proved as purely dispersive [53], so it is more favorable for long-time simulations and overcomes the spurious effect of numerical diffusion. Consistency of the scheme was proved for free space and with boundary conditions. Further, the scheme was proved as an unconditionally stable scheme using Von- Neumann analysis (see [53] for proof). Based on those characteristics/proofs of the scheme, (cid:16) 1− 1 α2 ∂xx (cid:17) the hyperbolic nature of the equation is preserved. Upon multiplying the expression ψ(x) with a test function P (x) and integrate (cid:16) (cid:12)(cid:12)(cid:12)b a P (x)∂xψ(x) (cid:17) (cid:12)(cid:12)(cid:12)b a − ψ(x)∂xP (x) (2.9) it from a to b, b(cid:90) a = (cid:16) b(cid:90) 1 − 1 α2 ∂xx ψ(x)P (x)dx (cid:17) (cid:16) (cid:17) ψ(x) 1 − 1 α2 ∂xx P (x)dx − 1 α2 a Now we have the Green’s formula, 18 (cid:104) b(cid:90) a (cid:16) (cid:17) P (x) −(cid:16) (cid:17) 1 − 1 α2 ∂xx 1 − 1 α2 ∂xx (cid:105) ψ(x)P (x) dx = (cid:16) 1 α2 (cid:12)(cid:12)(cid:12)b a P (x)∂xψ(x) ψ(x) Suppose the function P is a free space Green’s function (cid:16) (cid:17) 1 − 1 α2 ∂xx P (x, x(cid:48)) = δ(x − x(cid:48)) (cid:17) (cid:12)(cid:12)(cid:12)b a − ψ(x)∂xP (x) (2.10) (2.11) b(cid:90) a ψ(x)δ(x − x(cid:48))dx − ψ(x(cid:48)) − b(cid:90) b(cid:90) a a s(x)P (x, x(cid:48))dx = s(x)P (x, x(cid:48))dx = (cid:16) (cid:16) 1 α2 1 α2 Hence, P (x, x(cid:48))∂xψ(x, x(cid:48)) − ψ(x)∂xP (x, x(cid:48)) P (x, x(cid:48))∂xψ(x) − ψ(x)∂xP (x, x(cid:48)) (2.12) (cid:17) (cid:12)(cid:12)(cid:12)b (cid:17) a (cid:12)(cid:12)(cid:12)b a (cid:12)(cid:12)(cid:12)b a (cid:12)(cid:12)(cid:12)b a b(cid:90) a ψ(x(cid:48)) = s(x)P (x, x(cid:48))dx + (cid:16) 1 α2 (cid:12)(cid:12)(cid:12)b a P (x, x(cid:48))∂xψ(x) − ψ(x)∂xP (x, x(cid:48)) (cid:17) (cid:12)(cid:12)(cid:12)b a (2.13) This ends up with the particular solution and boundary correction which can be obtained by the boundary condition we chose (Dirichlet, Neumann, periodic, outflow etc.). Here we chose to use the free space Greens function and enforce the boundary conditions on u(a, t) and u(b, t) through expressions we derive for the unknown terms. Let’s apply the homogeneous Dirichlet boundary condition (2.8), b(cid:90) a ψ(x(cid:48)) = s(x)P (x, x(cid:48))dx + (cid:16) 1 α2 (cid:12)(cid:12)(cid:12)x=b P (b, x(cid:48))∂xψ(x) − P (a, x(cid:48))∂xψ(x) (cid:17) (cid:12)(cid:12)(cid:12)x=a (2.14) 19 Now we have two unknowns ∂xψ(x) and they can be obtained by solving the systems of linear equations that are derived by the limits on Equation (2.14) to x(cid:48) → a and x(cid:48) → b, and ∂xψ(x) (cid:12)(cid:12)(cid:12)x=b (cid:12)(cid:12)(cid:12)x=a (cid:16) (cid:16) 1 α2 1 α2 (cid:12)(cid:12)(cid:12)x=b (cid:12)(cid:12)(cid:12)x=b − P (a, a)∂xψ(x) − P (a, b)∂xψ(x) (cid:17) (cid:17) (2.15) (2.16) (cid:12)(cid:12)(cid:12)x=a (cid:12)(cid:12)(cid:12)x=a b(cid:90) b(cid:90) a (cid:12)(cid:12)(cid:12)x=a (cid:12)(cid:12)(cid:12)x=b ψ(a) = s(x)P (x, a)dx + P (b, a)∂xψ(x) ψ(b) = s(x)P (x, b)dx + P (b, b)∂xψ(x) a Since here, we are imposing homogeneous boundary condition, ψ(a) = 0 and ψ(b) = 0 b(cid:90) (cid:16) − P (a, b) b(cid:90) (cid:16) − P (b, b) a b(cid:90) b(cid:90) a (cid:17) (cid:17) (2.17) (2.18) ∂xψ(x) = wα s(x)P (x, a)dx + P (a, a) s(x)P (x, b)dx ∂xψ(x) = wα s(x)P (x, a)dx + P (b, a) s(x)P (x, b)dx where a a wα = α2 P (a, a)P (b, b) − P (a, b)P (b, a) Now we compute the exact free space Green’s function to substitute the test function P , (cid:16) 1 − 1 α2 ∂xx (cid:17) G(x, x(cid:48)) = δ(x − x(cid:48)) (2.19) subject to G(x, x(cid:48)) = 0 when x → −∞ and x → ∞. We can show that the Green’s x(cid:48)− = −α2. function satisfies the continuity condition G Consider the homogeneous solution φ, Lφ = 0 and decompose it to two solutions φ1 and φ2 for x > x(cid:48) and x < x(cid:48). So they also satisfy Lφ1 = 0 and Lφ2 = 0 and φ1 and φ2 can be x(cid:48)− = 0 and jump conditions ∂xG (cid:12)(cid:12)(cid:12)x(cid:48)+ (cid:12)(cid:12)(cid:12)x(cid:48)+ 20 written in the form of, φ1 = c11sinh(αx) + c12cosh(αx) φ2 = c21sinh(αx) + c22cosh(αx) (2.20) (2.21) Upon applying the continuity and jump conditions , c11sinh(αx(cid:48)) + c12cosh(αx(cid:48)) − c21sinh(αx) − c22cosh(αx(cid:48)) = 0 c11cosh(αx(cid:48)) − c12sinh(αx(cid:48)) + −c21cosh(αx) + c22sinh(αx(cid:48)) = −α (2.22) (2.23) Let’s consider x(cid:48) = 0, then we get, c12 = c22 and c21 = c11 + α. Now we write the solutions φ1 and φ2 in the exponential form, (cid:16)eαx − e−αx (cid:17) (cid:16) eαx − e−αx 2 + c12 (cid:16)eαx + e−αx (cid:17) (cid:17) (cid:16)eαx + e−αx 2 + c12 2 φ2 = (c11 + α) 2 φ1 = c11 (cid:17) (2.24) (2.25) For x > x(cid:48), limx(cid:48)→∞ φ1 = 0, we choose c11 = −c12 such that eαx vanishes, so, φ1 = −c11e−αx Similarly, for x < x(cid:48), limx(cid:48)→−∞ φ2 = 0, we choose c11 = −α/2 such that e−αx vanishes, so, Hence, the Green’s function G(x, x(cid:48)) = α φ2 = eαx α 2 2 e−α|x−x(cid:48)|, and therefore the inverted modified Helmholtz operator can be written as, L−1 x [u] = (cid:124)(cid:123)(cid:122)(cid:125) Ix[u] Particular solution (cid:124) + Ae−α(x−a) + Be−α(b−x) . (2.26) (cid:123)(cid:122) (cid:125) Homogeneous solution 21 where, b(cid:90) a α 2 Ix[u] := e−α|x−y|u(y)dy, x ∈ [a, b], and (cid:12)(cid:12)(cid:12)x=a ∂xψ(x) A = −ψ(a) 2 − 1 2α , B = −ψ(b) 2 + 1 2α (cid:12)(cid:12)(cid:12)x=b ∂xψ(x) (2.27) (2.28) A and B also can be written in integral form as follows, a(cid:90) −∞ A = α 2 e−α(a−y)u(y)dy, B = α 2 ∞(cid:90) b e−α(y−b)u(y)dy. (2.29) The coefficients A and B of the homogeneous solution are determined by applying bound- ary conditions. Now, equation (2.6) can be written using the inverted Helmholtz operator, L−1 as follows, un+1 − 2un + un−1 = −β2un + β2L−1 x [un] + β2L−1 x (2.30) α2 Sn(cid:105) (cid:104) 1 . α2 Sn(cid:105) (cid:104) 1 The definition can additionally be modified to include boundary corrections on a finite domain (see [15]). We also introduce a new operator D related to equation (2.26) that we are going to use frequently in higher-order and higher dimensional solutions. un+1 − 2un + un−1 = −β2Dx[un] + β2L−1 x , Dx[u] := u − L−1 x [u]. (2.31) The computational cost of evaluating Ix[u] in equation (2.26) is typically O(N 2); however it can be computed efficiently with second order accuracy at O(N ) cost using the fast convo- lution method detailed next. 22 2.2.2 Fast Convolution Algorithm We are going to see how we can compute the particular solution using a fast convolution algorithm. The integration for the particular solution, I[u](x) can be performed by decom- posing it into left (I L[u](x)) and right (I R[u](x))) oriented integrals as shown in the Figure 2.1 Figure 2.1: Fast convolution line integration which decomposes left I L[u](x) and right I R[u](x)) oriented integrals between the domain a and b with the spatial step size, ∆xj(= xj − xj−1). We begin by splitting the integration at y = x, and integrating from a to x and x to b; i.e., I[u](x) = I L[u](x) + I R[u](x) (2.32) x(cid:90) a α 2 I L[u](x) = u(y)e−α(x−y)dy, I R[u](x) = b(cid:90) x α 2 u(y)e−α(y−x)dy, so that both integrands decay exponentially away from x. Additionally, they satisfy expo- nential recurrence relations, which means that I L[u](xj) = e−α∆xj I L[u](xj−1) + J L[u](xj) j = 1 to N J L[u](xj) = α 2 e−αyu(x − y)dy, (2.33) ∆xj(cid:90) 0 23 I R[u](xj) = e−α∆xj I R[u](xj+1) + J R[u](x), j = N − 1 to 0 ∆xj(cid:90) J R[u](xj) = α 2 e−αyu(x + y)dy. (2.34) 0 where the ∆xj = xj − xj−1 which is the spatial grid size. The recursive representation and the localization reduces the computational complexity. In particular, using a second order polynomial interpolant in place of u in the integral allows us to compute I[u](x) in O(N ) time. The local integrals ((2.33) and (2.34)) may be evaluated using quadrature or analytically. For example, we can derive a second order quadrature using compact Simpson’s rule and Lemma 2.1, j ≈ P uj + Quj−1 + R (uj+1 − 2uj + uj−1) , J L j ≈ P uj + Quj+1 + R (uj+1 − 2uj + uj−1) , J R (2.35) with quadrature weights, P = 1 − 1 − d v , Q = −d + 1 − d v , R = 1 − d v2 − 1 + d 2v , where v = α∆x, d = e−v, and uj denotes the solution at grid point xj This method can be summarized as follows (Algorithm 1), 2.2.3 Boundary Condition We can determine the Homogeneous coefficients (A, B) from the imposed boundary con- ditions and the end/boundary point values of the particular solution (Ix(xstart) and Ix(xend) where xstart, and xend are two boundary points). Let us work with several common boundary conditions in 1D first. Then these can be expandable to higher dimension. 24 Algorithm 1 Fast Convolution 1: un - array of u values at time tn, 2: ν - array of weighted nodes along the specific direction, 3: expweight - array of exponentially weighted nodes along the specific direction, 4: ps - order of accuracy in space 5: for j = 1 to n do 6: Compute JL(j + 1) and JR(j) via quadrature (see equation (2.35) for second order quadrature) or analytical integration using expweight and ps IL(j + 1) = d(j)IL(j) + JL(j + 1) IR(n − j + 1) = d(n − j + 2)IR(n − j + 2) + JL(n − j + 1) 7: end for 8: d = e−ν 9: for j = 1 to n do 10: 11: 12: end for 13: for j = 1 to n do I = 1 14: 2 15: end for (cid:0)IL(j) + IR(j)(cid:1) 2.2.3.1 Dirichlet Boundary Condition Let us begin with Dirichlet boundary conditions. Evaluating the semi-discrete solution defined in equation (2.30) at x = a and b, we get. UL(tn+1) − 2UL(tn) + UL(tn−1) = −β2UL(tn) + β2 UR(tn+1)−2UR(tn)+UR(tn−1) = −β2UR(tn)+β2(cid:16) (cid:20) (cid:18) (cid:104) (cid:21) 1 α2 Sn α2 Sn(cid:105) 1 I un+1 + (a) + A + Be−α(b−a) I un+1+ (b)+Ae−α(b−a)+B , (2.37) (cid:19) , (2.36) (cid:17) We can rearrange the linear system by unknown and known values, A + µB = −wD a , µA + B = −wD b . Solving the linear system for the unknowns A and B gives, (wD a − µwD b ) (µ2 − 1) A = , B = (wD b − µwD a ) (µ2 − 1) , (2.38) 25 where, (cid:104) (cid:104) wD a = I un + wD b = I un + 1 α2 Sn(cid:105) α2 Sn(cid:105) 1 (a) − 1 β2 (a) − 1 β2 (cid:16) (cid:16) (cid:17) (cid:17) , (2.39) (2.40) UL(tn+1) + (β2 − 2)UL(tn) + UL(tn−1) UR(tn+1) + (β2 − 2)UR(tn) + UR(tn−1) , and µ = e−α(b−a). 2.2.3.2 Neumann Boundary Condition For Neumann boundary condition, we obtain the identities, I (b) = −αI(b) using the characteristics of the integral solution (equation (2.27)); specifically, all (a) = αI(a), I (cid:48) (cid:48) dependence on x is on the Greens function which is a simple exponential function. Differ- entiating the semi-discrete solution (2.30), and applying the Neumann boundary conditions at x = a and b yields, VL(tn+1)−2VL(tn)+VL(tn−1) = −β2VL(tn)+β2(cid:16) I (cid:104) un+1+ and VR(tn+1)−2VR(tn)+VR(tn−1) = −β2VR(tn)+β2(cid:16) (cid:104) I un+1+ 1 α2 Sn(cid:105) α2 Sn(cid:105) 1 (a)+A+Be−α(b−a)(cid:17) , (2.41) (cid:17) . (2.42) (b)+Ae−α(b−a)+B We can rearrange the linear system by unknown and known values, A − µB = wN a , −µA + B = wN b . Solving the linear system for the unknowns A and B gives, A = (wN a + µwN b ) (µ2 − 1) , B = (wN b + µwN a ) (µ2 − 1) , (2.43) 26 where, (cid:104) (cid:104) wN a = I un + wN b = I un + 1 α2 Sn(cid:105) α2 Sn(cid:105) 1 (a) − 1 β2 (a) − 1 β2 (cid:16) (cid:16) 2.2.3.3 Periodic Boundary Condition VL(tn+1) + (β2 − 2)VL(tn) + VL(tn−1) (2.44) VR(tn+1) + (β2 − 2)VR(tn) + VR(tn−1) . (2.45) (cid:17) (cid:17) , By applying the periodic boundary conditions on the semi-discrete solution (2.30), we obtain, I (cid:104) α2 Sn(cid:105) 1 (cid:104) un + (a) + A + µB = I un + (a) + µA + B, and α (cid:104) (cid:16) I un + α2 Sn(cid:105) 1 (cid:17) (cid:16) − I = α (a) − A + µBα (cid:104) un + (cid:17) . (a) − µA + Bα (2.46) (2.47) 1 1 α2 Sn(cid:105) α2 Sn(cid:105) (cid:104) (cid:104) We can rearrange the linear system to obtain, (µ − 1)A + (1 − µ)B = I (1 − µ)A + (1 − µ)B = I un + un + (a) − I un + (a) + I un + (cid:104) (cid:104) 1 α2 Sn(cid:105) α2 Sn(cid:105) 1 1 α2 Sn(cid:105) α2 Sn(cid:105) 1 (b), (b). Upon solving these linear equations we get, (cid:104) α2 Sn(cid:105) (cid:104) α2 Sn(cid:105) I A = un + 1 (1 − µ) (b) I , B = un + 1 (a) (1 − µ) . (2.48) 2.2.3.4 Outflow Boundary Condition We can extend the definition of the outflow boundary conditions to the domains exterior to [a, b]. Assuming that our initial condition has compact support in [a, b], after time t = tn, the domain of dependence of un(x) extends to [a − ctn, b + ctn], since the propagation speed 27 is c. Now the solution of L−1 ((2.26)) can be written as. L−1[un(x)] = α 2 where e−α|x−y|un(y)dy = I[u(x)] + Ane−α(x−a) + Bne−α(b−x). b+ctn(cid:90) a−ctn An := α 2 a(cid:90) b+ctn(cid:90) a−ctn e−α(a−y)un(y)dy, Bn := α 2 e−α(y−b)un(y)dy. (2.49) (2.50) b Now we turn these spatial integrals into time integrals which exist at precisely the end points x = a and b. First we consider outflow along the right boundary, x > b and assume this region contains only right traveling waves, u(x, t) = u(x − ct). By tracing backward along a characteristic ray we find, u(b + y, t) = u(b, t − y/c), y > 0. Hence, ctn(cid:90) e−α(y)un(b + y, tn)dy = 0 tn(cid:90) 0 αc 2 Bn = α 2 e−α(cs)un(b, tn − s)ds. Now, we can impose outflow boundary conditions using the behavior of u at x = b. A temporal recurrence relation due to the exponential will be, tn−1(cid:90) 0 e−αcsu(b, tn − s)ds + e−αc∆t e−αcsu(b, tn−1 − s)ds, e−βzu(b, tn − z∆t)dz + e−βBn−1. (2.51) ∆t(cid:90) 1(cid:90) 0 Bn = αc 2 = β 2 0 28 where β = αc∆t. Therefore, the boundary coefficient Bn can be computed locally in both time and space. We choose a quadratic interpolant in terms of u to have second order accuracy. u(b, tn − z∆t) = un(b) − z 2 (cid:16) (cid:17) un+1(b) − un−1(b) (cid:16) + z2 2 un+1(b) − 2un(b) − un−1(b) (cid:17) . (2.52) Using the Equation (2.52) would give the second order approximation for outflow. To compute the values that come from this approximation, we will make use of the following lemma which comes from integration by parts (see [15] for proof), Lemma 2.1. For integers m ≥ 0 and real v > 0, 1(cid:90) Em := v e−vzdz = zm m! 1 vm (cid:16) (cid:17) 1 − e−vPm(v) m(cid:88) l=0 vl l! where Pm(v) = 0 is the Taylor series expansion of order m of ev. Using this lemma and integrating the expression (2.52) analytically, we can get, Bn = e−βBn−1 + γ0un+1(b) + γ1un(b) + γ2un−1(b), (2.53) where, γ0 = E2(β) − 1 E1(β), 2 γ1 = −2E2(β) + E0(β), γ2 = E2(β) + 1 2 E1(β). and (2.54) There are two unknowns in this equation (2.53), Bn and un+1, so we need one more equation, 29 that can be derived by evaluating equation (2.30) at x = b, un+1(b) − 2un(b) + un−1(b) = −β2(cid:16) un(b) −(cid:0)I(b) + µAn + Bn(cid:1)(cid:17) . (2.55) Upon solving these two linear equations (2.53) and (2.55) we can obtain, −Γ0µAn + (1 − Γ0)Bn = e−βBn−1 + Γ0I(b) + Γ1un(b) + Γ2un−1(b), (2.56) where Γ0 = β2 2 γ0, Γ1 = γ1 − γ0(β2 − 2), Γ2 = γ2 − γ0. Similarly, by considering outflow at the other end x < a, we get (1 − Γ0)An − Γ0µBn = e−βAn−1 + Γ0I(a) + Γ1un(a) + Γ2un−1(a). (2.57) Solving the linear system of equations (2.56) and (2.57) gives, (1 − Γ0)wOut a + µΓ0wOut b (1 − Γ0)2 − (µΓ0)2 , Bn = An = where (1 − Γ0)wOut b + µΓ0wOut a (1 − Γ0)2 − (µΓ0)2 , (2.58) a = e−βAn−1 + Γ0I(a) + Γ1un(a) + Γ2un−1(a), wOut b = e−βBn−1 + Γ0I(b) + Γ1un(b) + Γ2un−1(b) wOut According to the solution of the coefficients An and Bn, we realize that we need to know their values at the previous time step(An−1 and Bn−1). For utilization purposes, we set A0 = B0 = 0 . 30 2.3 Multi-Dimensional Implicit Wave Equation Solver using ADI Scheme The extension to multi-dimension is formed using an alternating direction implicit (ADI) scheme, and each line is solved independently. The multi-dimensional wave equation can be represented using the initial boundary value problem, 1 c2 ∂2u(x, t) ∂t2 − ∇2u(x, t) = S(x, t), x ∈ Ω, t > 0, (2.59) u(x, 0) = f (x), x ∈ Ω, ∂tu(x, 0) = g(x), x ∈ Ω. with consistent boundary conditions, u(x, t) = h(x, t), x ∈ ∂Ω, t > 0, source term S(x, t), and wave speed c The multi-dimensional implicit scheme uses an ADI scheme [15], and each ADI line is solved independently. Consider a scheme for three spatial dimensions, 1 − 1 = + (cid:16) 1 α4 ∂2 ∂z2 ∂2 ∂y2 (cid:17)(cid:16) ∂x2 + ∂2 ∂x2 (cid:17) (cid:17)(cid:16) ∂2 ∂y2 + 1 − 1 α2 (cid:16) ∂2 α2∇2 = 1 − 1 α2 (cid:17) − 1 (cid:16) ∂4 1 − 1 α2 α2∇2 = LxLyLz + O(cid:0)(c∆t)4(cid:1) ∂x2∂y2 + ∂x2∂z2 + 1 − 1 α2 ∂2 ∂z2 ∂y2∂z2 α6 ∂4 ∂4 (cid:17) ∂6 ∂x2∂y2∂z2 (2.60) Hence, 1 − 1 where Lx, Ly, and Lz are one dimensional univariate modified Helmholtz operators [15] 31 applied in the indicated spatial variable. Now the semi-discrete equation, (cid:104) β2un + un+1 − 2un + un−1(cid:105) LxLyLz = β2un + β2 α2 Sn(x, t) (2.61) Upon inverting the modified Helmholtz operators Lx, Ly, and Lz, we have a semi-discrete solution, un+1 − 2un + un−1 = −β2Dxyz[un] + β2L−1 z L−1 y L−1 x α2 Sn(cid:105) (cid:104) 1 (x, y, z) (2.62) where the three dimensional operator is, Dxyz[u] := u − L−1 z L−1 y L−1 x [u], while in two dimensions it is un+1−2un+un−1 = −β2Dxy[un]+β2L−1 y L−1 x α2 Sn(cid:105) (cid:104) 1 (x, y), Dxy[u] := u−L−1 y L−1 x [u]. (2.63) Inversion of the operator L−1 is performed sequentially leading to the usual x, y and z “sweeps” of the ADI algorithm that is represented by a combination of the operator L−1 γ . γ Because the inverse operators are defined along lines, it is quite natural to discretize 2D and 3D regions along Cartesian lines. However, the endpoints of these lines are not restricted to residing at mesh points and can always be chosen to lie on the boundary ∂Ω. For example, the x, y, boundary points for the x-sweep and y-sweep for a circle are shown in Figure 2.2. We first perform x-sweeps along with x-lines with actual boundary ends and obtain the intermediate solution and then apply y-sweeps over the intermediate solution, along with y- lines with actual boundary ends. The xy-sweeps form a complete circular boundary (Figure 2.2 - (c)). Similarly, we can perform y-sweeps first then x-sweeps next and take an average to obtain the solution The general approach for implementation of the 3D ADI scheme follows the steps shown below: Assume the number of x, y, and z lines are nx, ny, and nz respectively. At each time step, 32 (a) x-lines (b) y-lines Figure 2.2: (a) x-, (b) y- and (c) xy lines with exact circular boundary points on the mesh lines that are used for the ADI x (a) and y (b) sweeps. (c) xy-lines 1. Perform the x-sweep Operate L−1 on u(x, y, z) along x, and store the result into a temporary variable wx(x, yi, zk) for 1 ≤ k ≤ nz and 1 ≤ i ≤ ny. The boundary conditions are imposed at x = xa(ik) and xb(ik). 2. Perform the y-sweep Operate L−1 on wx(x, y, z) along y, and store the result into a temporary variable wyx(xj, y, zk) for 1 ≤ k ≤ nz and 1 ≤ j ≤ nx. The boundary conditions are imposed 33 at y = ya(jk) and yb(jk). 3. Perform the z-sweep For 1 ≤ i ≤ ny and 1 ≤ j ≤ nx using wyx(x, y, z), solve for the equation un+1 = 2un + un−1 − β2Dxyz, where Dxyz = un − L−1 z [wyx]. The boundary conditions are now applied at z = za(ij) and zb(ij). In order to improve the accuracy of the ADI solve, the inversion of the x, y and z Helmholtz operators is symmetrized, by averaging the results of x—y —z, y —z —x, and z —x —y solves. That means we need to apply the operators Dxyz, Dyzx, and Dzyx and then take the average. We give the detailed algorithm for the two-dimensional implicit wave equation solver in Algorithm 2 and relevant supporting functions are given in Algorithms 3 and 4. Algorithm 3 computes solution u at time tn+1 using our two dimensional implicit solver that calls the function LINV given in Algorithm 4 to apply the operator L−1 to perform x and y sweeps. LINV may be written in two functions for x and y sweeps separately. The function applyBC called in Algorithm 4 is used to find the homogeneous coefficients An and Bn using appropriate boundary conditions. Those are retrieved as the first and second elements of the vector H respectively. Since we have to use the value of the coefficients at the previous time step (tn−1), An−1 and Bn−1 to compute An and Bn for outflow boundary condition, these algorithms, however, need to be modified with minor changes by passing reference type parameters to the function applyBC. The parameter should be defined within the main Algorithm 2 and be called via computeU . Further, although Algorithm 2 would typically be used to deal with problems with rectangular boundaries, it can be extended to adopt problems with randomly placed boundary points in each grid line such as when dealing with a circular boundary using the embedded boundary method. Here, we need to use matrices x and y instead of vectors and define additional vectors to keep boundary points of each horizontal and vertical grid line. 34 Algorithm 2 Two Dimensional Solver 1: xA, xB and yA , yB - boundary points along x and y respectively, 2: nx, ny - number of grid points along x and y respectively, 3: ∆t - time step size, 4: T - total time, 5: c - wave speed, 6: β - averaging parameter, 7: ps -order of accuracy in space. 8: x = grid vector(xA, xB, nx) 9: y = grid vector(yA, yB, ny) 10: set initialvalue(un−1, t0) 11: set initialvalue(un, t1) 12: α = β c∆t 13: nt = T ∆t 14: µx = e−α(x(n)−x(0)) 15: µy = e−α(y(n)−y(0)) 16: expAx = e−α(x−x(0)) 17: expBx = e−α(x(n)−x) 18: expAy = e−α(y−y(0)) 19: expBy = e−α(y(n)−y) 20: νx = α(x(2 : nx) − x(1 : nx − 1)) 21: νy = α(y(2 : ny) − y(1 : ny − 1)) 22: expweightx = exp weights(νx, p) 23: expweighty = exp weights(νy, p 24: for tn = 1 to nt do 25: 26: 27: 28: end for compute u(un−1, un, nx, ny, β, µx, µy, xA, xB, yA, yB, expAx, expBx, = un+1 expAy, expBy, νx, νy, expweightx, expweighty, ps) un−1 = un un = un+1 2.4 Arbitrary Temporal Order Accuracy So far, we have derived multi-dimensional semi-discretizations of the wave equation with second-order accuracy in time. In order to take large time steps in problems, we need higher- order accuracy in time. While there is no stability restriction placed on our A-stable scheme, considerations of accuracy present themselves when the CFL number becomes large. Indeed, when large time steps are taken, the anisotropies introduced by the dimensional splitting are very pronounced. 35 Algorithm 3 compute u : Compute u at time tn+1 1: un−1, un - array of u values at time tn−1 and tn respectively, 2: nx, ny - number of grid points along x and y respectively, 3: β - averaging parameter, 4: µx = e−α(x(n)−x(0)), 5: µy = e−α(y(n)−y(0)), 6: xA, xB and yA , yB - boundary points along x and y respectively, 7: expAx, expBx - exponential vector of grid distance from left boundary and right boundary respectively along x, 8: expAy, expBy - exponential vector of grid distance from bottom boundary and up bound- ary respectively along y, 9: νx, νy - array of weighted nodes along x and y respectively, 10: expweightx, expweighty - array of exponentially weighted nodes along x and y respec- tively, 11: ps -order of accuracy in space 12: linv x = linv(u1, ny, β, µx, xA, xB, expAx, expBx, νx, expweightx, ps, 0) 13: linv yx = linv(Linvx, nx, β, µy, yA, yB, expAy, expBy, νy, expweighty, p, 1) 14: Dxy = un − linv yx 15: un+1 = 2un + un−1 − β2Dxy Algorithm 4 linv : L−1 1: un - array of u values at time tn, 2: n - number of lines that need to be sweep, 3: bdryA, bdryB - boundary points along the specific line, 4: expA expB- exponential vector of grid distance from left boundary and right boundary respectively along the line, I = f astconvolution(u(i, :), ν, expweight, ps) H = apply bc(u(bdryA), u(bdryB), I(bdryA), I(bdryB), β, bdryA, bdryB, µ) Linv(i, :) = I + H(1)expA + H(2)expB 5: ν - array of weighted nodes along the line, 6: expweight - array of exponentially weighted nodes along the line, 7: p -order of accuracy in space, 8: dir - determine which sweep is going to do, along x or y determined by 0 or 1 respectively 9: if dir == 0 then 10: 11: 12: 13: 14: 15: else 16: 17: 18: 19: 20: 21: end if I = f astconvolution(u(:, j), ν, expweight, ps) H = apply bc(bdryA), u(bdryB), I(bdryA), I(bdryB), β, bdryA, bdryB, µ) Linv(:, j) = I + H(1)expA + H(2)expB end for for i = 1 to n do end for for j = 1 to n do 36 In this section, we discuss A-stable schemes of arbitrary order using a MOLT formula- tion of the wave equation, and by implicitly including higher-order derivatives. However, we construct the derivatives using a novel approach: recursive convolution. The cornerstone of our method lies in the fact that by using recursive applications of the convolution oper- ators introduced in [15], the inversions of higher derivatives can be performed analytically. So the resulting scheme is made explicit, even at the semi-discrete level. In constructing the analytical convolution operators, we incorporate the boundary conditions directly. In this way, without additional complexity Dirichlet and periodic boundary conditions can be implemented to higher-order. Since dealing with outflow boundary conditions depend on previous time step values at the boundaries, we need to do a little more work to implement it in higher-order. We use the recursive convolution algorithm which consumes O(N ) oper- ations. So, the schemes we obtain in d = 1, 2 and 3 dimensions will achieve accuracy 2P in O(P dN ) operations per time step. 2.4.1 Arbitrary Order One Dimensional Scheme A higher-order scheme can be achieved by including more spatial derivatives in the nu- merical scheme. We choose a Lax-Wendroff approach to increase the temporal accuracy of our second order semi-discrete solution. This approach requires us to exchange time deriva- tives with spatial derivatives in the Taylor expansion. Let’s start with the semi-discrete wave equation and apply the Lax-Wendroff procedure. We have un+1 − 2un + un−1 = 2 (∂tt)m un (2.64) Hence, by using the fact, (∂tt)m u =(cid:0)c2∂xx ∆t2m (2m)! ∞(cid:88) (cid:1)m u, m ≥ 1. m=1 37 un+1 − 2un + un−1 = 2 = 2 ∞(cid:88) ∞(cid:88) m=1 m=1 (c∆t)2m (2m)! (∂xx)m un (cid:18)∂xx (cid:19)m β2m (2m)! α2 un. (2.65) Differential operators (∂xx)m have to be approximated properly. To do this, we can consider the convolution operator D from (2.31). We begin by considering the equation, (cid:20)(cid:18)∂xx (cid:19)m(cid:21) α2 F (cid:18) k (cid:19)2m α = (−1)m where F denotes the Fourier transform. Now, we need to find (k/α)2; it can be formed using D as follows, ˆD := F [D] = 1 − F(cid:2)L−1(cid:3) = 1 − 1 +(cid:0) k 1 α (cid:1)2 = (cid:0) k (cid:1)2 1 +(cid:0) k α α (cid:1)2 . Solving for the quantity (k/α)2 in terms of ˆD, gives an expression for all even derivatives as follows, (cid:18) k (cid:18) k α α (cid:19)2 (cid:19)2m = = ˆD 1 − ˆD (cid:32) ˆD 1 − ˆD = ∞(cid:88) (cid:33)m p=1 ˆDp = and (cid:19) (cid:18) p − 1 m − 1 ∞(cid:88) p=m ˆDp, By inserting these into the Taylor expansion (2.65), we obtain ∞(cid:88) m=1 (cid:19) (cid:18) p − 1 m − 1 ∞(cid:88) p=m Dp[un]. un+1 − 2un + un−1 = (−1)m 2β2m (2m)! Upon reversing the order of summation, we have un+1 − 2un + un−1 = Ap(β)Dp[un], (2.66) ∞(cid:88) p=1 38 where Ap(β) is a polynomial of coefficients in β2, Ap(β) = 2 (−1)m β2m (2m)! p(cid:88) m=1 (cid:19) (cid:18) p − 1 m − 1 . (2.67) From this scheme (2.4.1), 1D second (identical to the original second order scheme (2.31)) and fourth order equations are un+1 − 2un + un−1 = −β2D[un] un+1 − 2un + un−1 = −β2D[un] − (cid:18) β2 − β4 12 (cid:19) D2[un] (2.68) (2.69) Algorithm 3 can be modified as follows to be able to compute the solution with higher- order accuracy in 1D. Algorithm 5 Compute u at time tn+1 with high-order accuracy 1: un−1, un - array of u values at time tn−1 and tn respectively, 2: n - number of points along the line, 3: β - averaging parameter, 4: µ = e−α(x(n)−x(0)), 5: bdryA, bdryB - boundary points along the specific line 6: expA, expB - exponential vector of grid distance from left boundary and right boundary respectively along the line, 7: ν - array of weighted nodes along the line, 8: expweight - array of exponentially weighted nodes along the line, 9: ps -order of accuracy in space, 10: pt -order of accuracy in time 11: P = pt 2 12: Dterms = 0 13: for k = 1 to P do 14: D = u − linv(u, n, β, µ, bdryA, bdryB, expA, expB, ν, expweight, ps) 15: Dterms = Dterms + poly highorder(β, P )D u = D 16: 17: end for 18: un+1 = 2un − un−1 + Dterms Temporal cost for a scheme of order P is O(P N ) per time step for N spatial points. Since the local truncation errors are O((c∆t/β)2P +2), the error constant will decrease with 39 Algorithm 6 poly highorder : Compute the coefficients for high-order accuracy 1: β - averaging parameter, 2: k - order of accuracy at current stage 3: coef = 0; 4: for m = 1 to k do 5: 6: end for 7: coef = 2coef coef = coef + (−1)m β2m (cid:1) (cid:0) k−1 m−1 (2m)! increasing β. However, it should be limited with an upper value to maintain stability. The stability of this scheme was proved in [15] using Von-Neumann analysis and they obtained an upper limit of β for several even orders of temporal accuracy which is summarized in Table 2.1. We can notice that βmax decreases with increasing P P Order βmax 1 2 2 2 4 3 6 4 8 5 10 1.4840 1.2345 1.0795 0.9715 Table 2.1: The maximum values βmax for which the P th order scheme remains A-stable [14]. We need to set appropriate initial conditions to achieve the expected order. Suppose we would like to get P th order accuracy, the initial value should be computed to O(∆t2P ). For our 3-step scheme (2.66), u0 and u1 have to be initialized. They may be computed analytically. However, in general, we know the exact value of u0 (= f (x)), and we need to approximate the value for u1 = u(x, ∆t). To do so, we can apply the Lax-Wendroff procedure in Taylor expansion, we obtain 40 m=0 ∞(cid:88) ∞(cid:88) ∞(cid:88) ∞(cid:88) m=0 m=0 u1 = = = = (2m)! t u0 ∂m ∆tm m! (cid:18) ∆t2m (cid:18)(c∆t)2m (cid:18)(c∆t)2m (2m)! (2m)! m=0 (cid:19) ∆t2m+1 (2m + 1)! ∂2m+1 t u0 ∂2m t u0 + ∂2m x u0 + (c∆t)2m+1 (2m + 1)! ∂2m x 1 c ∂tu0 ∂2m x f (x) + (c∆t)2m+1 (2m + 1)! ∂2m x 1 c (cid:19) (cid:19) g(x) . (2.70) where g(x) = ut(x, 0). This expansion can now be truncated at O(∆t2P ) to have P th order of accuracy. 2.4.2 Extension to Higher Dimensions for Arbitrary Order Accuracy We derive higher-order accurate solutions for the wave equation in higher dimensions using an ADI scheme. We begin with the expansion of Equation (2.64) by introducing the Laplacian in the Lax-Wendroff procedure. un+1 − 2un + un−1 = 2 ∆t2m (2m)! (∂m tt ) un = 2 ∞(cid:88) m=1 ∞(cid:88) m=1 β2m (2m)! (cid:18)∇2 (cid:19)m α2 un. Before approximating higher-order powers of the Laplacian operator, consider univariate modified Helmholtz operators, and their corresponding D operators as given below, Lγ := 1 − ∂2 γ α2 , Dγ := 1 − L−1 γ , γ = x, y, z. Since these operators satisfy identities, LγDγ[u] = L[u] − u = − ∂γγ α2 u, the Laplacian in 3D can be written by, −∇2 α2 = LxDx + LyDy + LzDz = LxLyLz[Cxyz], (2.71) 41 where Cxyz : L−1 y L−1 z Dx + L−1 z L−1 x Dy + L−1 x L−1 y Dz Consider the D operator in three dimensions. Dxyz := 1 − L−1 x L−1 y L−1 z . (2.72) (2.73) Upon rearranging and inverting (2.73), we get an identity LxLyLz = (1 − Dxyz) −1 , Now, the Laplacian becomes, −∇2 α2 = (1 − Dxyz) −1 Cxyz. By expanding (1−D)−m as a power series, we can construct even orders of the Laplacian as follows, (cid:18)∇2 (cid:19)m α2 (cid:19) (cid:18) p − 1 m − 1 ∞(cid:88) p=m = (−1)mCm xyz Dp−m xyz . (2.74) Upon omitting the subscripts, the semi-discrete scheme for 2D and 3D is un+1 − 2un + un−1 = = = (cid:18)∇2 (cid:19)m 2β2m α2 (2m)! (−1)m 2β2m (2m)! Cm m=1 ∞(cid:88) ∞(cid:88) ∞(cid:88) m=1 p(cid:88) (−1)m 2β2m (2m)! p=1 m=1 (cid:19) un (cid:18) p − 1 ∞(cid:88) (cid:18) p − 1 (cid:19) m − 1 p=m m − 1 Dp−m[un] xyzDp−m Cm xyz [un]. We give second and fourth-order schemes here, un+1 − 2un + un−1 = −β2Cxyz[un] un+1 − 2un + un−1 = −β2Cxyz[un] − (cid:18) β2Dxyz − β4 12 (cid:19) Cxyz Cxyz[un] (2.75) (2.76) (2.77) As in the 1D case, these schemes will be unconditionally stable for all ∆t, and the same 42 range for β as shown in Table 2.1. The algorithm for computing u in two dimensions using ADI splitting with a given order of accuracy in time is designed as below in Algorithm 7. Using this algorithm, we can retrieve the solution with accuracy O (pt) in time. Algorithm 7 compute u high :Compute u in 2D with high-order accuracy 1: un−1, un - array of u values at time tn−1 and tn respectively, 2: nx, ny - number of points along x and y respectively, 3: β - averaging parameter, 4: µx = e−α(x(n)−x(0)),µy = e−α(y(n)−y(0)), 5: xA, xB, and yA, yB - boundary points in x and y directions respectively, 6: expAx, expBx - exponential vector of grid distance from left boundary and right boundary respectively along x, 7: expAy, expBy - exponential vector of grid distance from bottom boundary and up bound- ary respectively along y, 8: νx, νy - array of weighted nodes along x and y respectively, 9: expweightx, expweighty - array of exponentially weighted nodes along x and y respec- tively, expAy, expBy, νx, νy, expweightx, expweighty, ps) 10: ps and pt - order of accuracy in space and time respectively 11: P = pt 2 12: struct outarray(0) 13: cd terms = −β2structoutarray(0) 14: for k = 2 to P do 15: 16: = compute c(u, nx, ny, β, µx, µy, xA, xB, yA, yB, expAx, expBx, struct inarray = struct outarray struct outarray(1) = compute cd(struct inarray(0), nx, ny, β, µx, µy, xA, xB, yA, yB, expAx, expBx, expAy, expBy, νx, νy, expweightx, expweighty, ps) for i = 2 to k − 1 do struct outarray(i) = compute d(struct inarray(i), nx, ny, β, µx, µy, xA, xB, yA, yB, expAx, expBx, expAy, expBy, νx, νy, xpweightx, expweighty, ps) 17: 18: cd terms = cd terms + ploy highorder(β, i, k)struct outarray(i − 1) end for for i = k to 1 do 19: 20: 21: 22: 23: end for 24: un+1 = 2un − un−1 + cd terms end for Two unique functions were designed to compute solutions for the convolution operators D (Algorithm 9) and C (Algorithm 8) by leveraging the fast convolution algorithm (Algorithm 1). Even though these two algorithms can be used together to compute the operators C and 43 D by applying them on the same vector of values at any stage, we need a separate function as given in Algorithm 10 in order to avoid redundant calculations and follow up issues with computing certain homogeneous boundary coefficients. For this algorithm, we expand the equation Cxy = L−1 y Dx + L−1 x Dy as follows using Dγ = 1 − L−1 , γ Cxy = (L−1 y + L−1 x ) − (L−1 y L−1 x + L−1 x L−1 y ) (2.78) The Algorithm 6 computes the coefficients of the equation (2.75) for higher dimension, higher-order accuracy in terms of β using (2.81). Algorithm 8 compute c : Compute operator C in 2D; Cxy = L−1 1: u - array of u values at time t, 2: nx, ny - number of points along x and y respectively, 3: β - averaging parameter, 4: µx = e−α(x(n)−x(0)), 5: µy = e−α(y(n)−y(0)), 6: xA, xB, and yA, yB - boundary points along x and y respectively, 7: expAx, expBx - exponential vector of grid distance from left boundary and right boundary y Dx + L−1 x Dy respectively along x, 8: expAy, expBy - exponential vector of grid distance from bottom boundary and up bound- ary respectively along y, 9: νx, νy - array of weighted nodes along x and y respectively, 10: expweightx, expweighty - array of exponentially weighted nodes along x and y respec- tively, 11: ps -order of accuracy in space 12: Dx = u − Linv(u, ny, β, µx, xA, xB, expAx, expBx, νx, expweightx, ps, 0) 13: linv yDx = linv(Dx, nx, β, µy, yA, yB, expAy, expBy, νy, expweighty, ps, 1) 14: Dy = u − linv(u, nx, β, µy, yA, yB, expAy, expBy, νy, expweighty, ps, 1) 15: linv xDy = linv(Dy, ny, β, µx, xA, xB, expAx, expBx, νx, expweightx, ps, 0) 16: Cxy = linv yDx + linv xDy 2.5 Treatment for Variable Speed Wave Propagation This procedure of exchanging time derivatives with spatial derivatives in the semi-discrete equation using the Lax-Wendroff procedure will also work in the case of variable wave speeds. 44 Algorithm 9 compute d :Compute operator D in 2D; Dxy = 1 − L−1 1: u - array of u values at time t, nx, ny - number of points along x and y respectively, 2: β - averaging parameter, 3: µx = e−α(x(n)−x(0)), 4: µy = e−α(y(n)−y(0)), 5: xA, xB, and yA, yB - boundary points along x and y directions respectively, 6: expAx, expBx - exponential vector of grid distance from left boundary and right boundary y L−1 x respectively along x, 7: expAy, expBy - exponential vector of grid distance from bottom boundary and up bound- ary respectively along y, 8: νx, νy - array of weighted nodes along x and y respectively, 9: expweightx, expweighty - array of exponentially weighted nodes along x and y respec- tively, 10: ps -order of accuracy in space 11: linv x = linv(u, ny, β, µx, xA, xB, expAx, expBx, νx, expweightx, ps, 0) 12: linv yx = linv(linv x, nx, β, µy, yA, yB, expAy, expBy, νy, expweighty, ps, 1) 13: Dxy = u − linv yx Algorithm 10 compute cd : Compute operators C and D in 2D; Dxy and Cxy 1: u - array of u values at time t, nx, ny - number of points along x and y respectively, 2: β - averaging parameter, 3: µx = e−α(x(n)−x(0)), 4: µy = e−α(y(n)−y(0)), 5: xA, xB, and yA, yB - boundary points along x and y respectively, 6: expAx, expBx - exponential vector of grid distance from left boundary and right boundary respectively along x, 7: expAy, expBy - exponential vector of grid distance from bottom boundary and up bound- ary respectively along y, 8: νx, νy - array of weighted nodes along x and y respectively, 9: expweightx, expweighty - array of exponentially weighted nodes along x and y respec- tively, 10: ps -order of accuracy in space 11: linv x = linv(u, ny, β, µx, xA, xB, expAx, expBx, νx, expweightx, ps, 0) 12: linv y = linv(u, nx, β, µy, yA, yB, expAy, expBy, νy, expweighty, ps, 1) 13: linv yx = linv(linv x, nx, β, µy, yA, yB, expAy, expBy, νy, expweighty, ps, 1) 14: linv xy = linv(linv y, ny, β, µx, xA, xB, expAx, expBx, νx, expweightx, ps, 0) 15: Dxy = u − linv yx 16: Cxy = linv x + linv y − (linv yx + linv xy) The wave equation with variable wave speed can be written as, utt = [c(x)]2uxx (2.79) 45 Assume that the speed is finite and bounded as c1 ≤ c(x) ≤ c2. It can be normalized by ¯c(x) = c(x)/c2. We choose α = β β > 0. Thus, c2∆t, un+1 − 2un + un−1 = 2 m=1 m=1 ∞(cid:88) ∞(cid:88) ∞(cid:88) ∞(cid:88) ∞(cid:88) m=1 m=1 p=1 = 2 = 2 = = p(cid:88) m=1 ∆t2m (2m)! ∆t2m (2m)! β2m (2m)! (∂tt)mun (cid:0)c2(x)∂xx (cid:18) ¯c2(x) ∂xx α2 (−1)m 2β2m (2m)! ¯c2m(x) Ap(β)Dp[un], . (cid:1)m un (cid:19)m ∞(cid:88) p=m un (cid:18) p − 1 (cid:19) m − 1 Dp[un] (2.80) . (2.81) (cid:19) (cid:18) p − 1 m − 1 (−1)m (β¯c(x))2m (2m)! and the polynomial of coefficients will be, Ap(β) = 2 2.6 Numerical Results We present a set of test cases that evaluates our framework; specifically, we consider problems with variable/piecewise constant wave speed in one, two, and three dimensions, using a Gaussian pulse as an initial source or a point source. For 2D, we choose a classic EM problem - a transverse magnetic (TM) mode photonic crystal waveguide. 2.6.1 Convergence Studies 2.6.1.1 Higher Order Wave Solvers in Two Dimension In this section, we consider a two-dimensional fourth-order temporally accurate solver. A square domain (Ω = [−1, 1]× [−1, 1]) is chosen to assess the two dimensional higher-order solver. We place a point source cos(ωt) at the center of the domain, (0, 0) and apply different 46 boundary conditions. (a) Dirichlet boundary condition (b) Periodic boundary conditions Figure 2.3: Fourth order time convergence of the 2D wave solver using Dirichlet (a) and periodic (b) boundary conditions with ∆x = ∆y = 6.25 × 10−3. This is a self- refinement study which measures L∞ norm of the error at time T = 2.0 on a square domain Ω = [−1, 1]2 with a point source cos(ωt), ω = 1 at the center of the box, (0, 0). The CFL (= c∆t ∆x ) value changes proportional to the time step size ∆t with fixed spatial step size. 47 Figure 2.3 shows fourth order time convergence obtained for Dirichlet and periodic bound- ary conditions by performing a refinement study on a square domain Ω = [−1, 1] × [−1, 1] with a point source cos(ωt), ω = 1 at the center of the box, (0, 0). The CFL (= c∆t ∆x ) value changes proportional to the time step size ∆t with fixed spatial step size, ∆x = ∆y = 0.00625. We consider a final time T = 2.0, with a fixed spatial resolution of 320× 320 spatial points. The discrete L∞ norm of the error is constructed at each time step and the maximum over all time steps is used to generate the graph in Figure 2.3. 2.6.1.2 Higher Order Wave Solvers in Three Dimensions We first show fourth order convergence for Dirichlet boundary condition by performing a time refinement study on a cubic domain Ω = [−1, 1] × [−1, 1] × [−1, 1] with a point source cos(ωt) at center of the domain, (0, 0, 0). This runs up to time T = 2.0, with a fixed spatial resolution of 160× 160× 160 spatial points. The discrete L∞ norm of the error is constructed at each time step and the maximum error over all time steps is used to graph Figure 2.4 for ∆t = 625e−2 , k = 1 to 5, with Dirichlet boundary conditions. 2k 2.6.2 Three Dimensional Waves In this section, we validate our three dimensional solver using a cubic domain (Ω = [−1, 1] × [−1, 1] × [−1, 1]). We set the spatial grid size as ∆x = ∆y = ∆z = 0.0625 (32 × 32 × 32 grid points) and the time step size to ∆t = 0.0313. We successfully tested the solver for a point source by applying Dirichlet, periodic and outflow boundary conditions along the cubic faces. Figures 2.5, 2.6, and 2.7 show the field of a point source cos(ωt), ω = 1 that is placed at (0, −1 + 3∆x, 0) using Dirichlet, periodic, and outflow boundary conditions along the six faces respectively. In each case, a three dimensional wave generated by a point source (Figure 2.5, 2.6, and 2.7) smoothly propagates in a cube and obeys applied boundary conditions along the surfaces of the cube. 48 (a) ω = 0.1 (b) ω = 1 Figure 2.4: Fourth order convergence of 3D wave solver using Dirichlet boundary conditions with ∆x = ∆y = ∆z = 1.25 × 10−2 ω = 1. This is a self-refinement study which measures L∞ norm of the error at time T = 2.0 on a cubic domain Ω = [−1, 1]3 with a point source cos(ωt) at the center of the cube, (0, 0, 0) for (a) ω = 0.1 and (b) ω = 1. The CFL (= c∆t ∆x ) value changes proportional to the time step size ∆t with fixed spatial step size. 49 Figure 2.5: Time evolution of a point source field cos(ωt), ω = 1 within a cubic domain (Ω = [−1, 1]3) by imposing Dirichlet boundary condition along the surface with the spatial step size, ∆x = ∆y = ∆z = 0.0625 ω = 1 and the time step size, ∆t = 0.0313. Figure 2.6: Time evolution of a point source field cos(ωt), ω = 1 within a cubic (Ω = [−1, 1]3) by imposing periodic boundary condition along the surface with the spatial step size, ∆x = ∆y = ∆z = 0.0625 ω = 1 and the time step size, ∆t = 0.0313. Figure 2.7: Time evolution of a point source field cos(ωt), ω = 1 within a cubic (Ω = [−1, 1]3) by imposing outflow boundary condition along the surface with the spatial step size, ∆x = ∆y = ∆z = 0.0625 ω = 1 and the time step size, ∆t = 0.0313. 2.6.3 Waves with Variable Speeds 2.6.3.1 One Dimensional Case A one dimensional wave travels through two different consecutive domains, Ω1 ∈ [−1, 0] and Ω2 ∈ [0, 1] with piecewise constant speed c1 = 1.0 and c2 = 2.0 respectively. The solution 50 until the wave reaches the location of discontinuity, x = 0 is, u(x, t) = aie−η(x−c1(t−t0))2 (2.82) where ai is the initial amplitude, η is the Gaussian shape parameter, and t0 is a time offset which delays the Gaussian pulse. A portion of the traveling wave in domain one, Ω1 will be reflected at the location of interface, x = 0, while the remaining waves are transmitted to domain two, Ω2. Thus, we need to consider the combination of transmitted, reflected, and incident waves from the location of the interface, x = 0 and onward. The left domain (Ω1) will have components of the incident and reflected waves, and the right domain (Ω2) will have the transmitted component of the original wave. In this way, the solution obeys the following equations, u1(t) = aieη(x−c1(t−t0))2 + areη(x+c1(t−t0))2 x < 0 u2(t) = ate¯η(x−c2(t−t0))2 x ≥ 0, where, ar and at are amplitudes of the reflected and transmitted wave respectively and ¯η is the shape parameter of the transmitted Gaussian pulse. Because of the zero transverse displacement at the interface, u1(0, t) = u2(0, t), that gives us, (ai + ar)eηc2 1(t−t0)2 = ate¯ηc2 2(t−t0)2. By equating the coefficients we get, ai + ar = at ηc2 1 = ¯ηc2 2. 51 (2.83) (2.84) From energy conservation, we obtain a2 i√ η = a2 r√ η + a2 t√ ¯η . (2.85) Upon solving equations (2.84), and (2.85), we obtain , 2aic2 at = (c1 + c2) ar = ai − at. , and ar = 1 − at. Further, For our first numerical example, we choose ai = 1, so at = 2c2 c1+c2 we chose c1 = 1.0 and c2 = 2.0, and apply our solver along the domain Ω1 ∪ Ω2 with 1024 uniform grid points of size ∆x = 0.002, while maintaining a CFL of 2.5 by choosing time step ∆t = 0.005. Figure 2.8 shows time snapshots of the moving wave using outflow boundary conditions at the left and right boundaries. The numerical results agree closely with the theoretical solutions as can be seen in the figure. Figure 2.8: 1D wave travelling over the two domains Ω1 and Ω2 with piecewise constant speed c1 = 1.0 and c2 = 2.0. For this experiment, we choose spatial step ∆x = 0.002, and time step ∆t = 0.005. A problem with multiple domains was selected as our second numerical test case with the same spatial step size, ∆x = 0.002, and time step size ∆t = 0.005, hence CFL remains 2.5. The entire domain has eight equal width sub-domains of size 0.25, and the wave travels at the same speed in every bi-domain. The first sub-domain is between -1 and -0.75, has waves travelling with speed c1 (=1.0); the wave speed is c2 (= 2.0) in the even-numbered sub-domains. 52 Figure 2.9 shows time snapshots of the moving wave using outflow boundary conditions at the external left and right boundaries. Figure 2.9: 1D wave travelling over the layered media with eight domains and the wave travels with the same speed in every bi-domain ( c1 = 1.0 and c2 = 2.0). For this experiment, we choose spatial step ∆x = 0.002, and time step ∆t = 0.005 along layered media with piecewise constant wave speed. 2.6.3.2 Two Dimensional Case In this section, we consider the two-dimensional variable speed solver. A square domain with square patches is chosen to assess the two-dimensional variable speed solver. Due to the material property of patches, waves travel with different speeds through these patches compared with the remaining area. We choose a Gaussian pulse e−25(x2+y2) as an initial solution and apply outflow boundary condition along the border of the square domain (Ω = [−1, 1] × [−1, 1]). Since we use spatial step size ∆x = ∆y = 0.02 and time step size ∆t = 0.005, applicable CFL is 0.25. We demonstrate two test cases: One and Four patches are placed for case-i and case-ii respectively as shown in Figure 2.10 Case-i - Single patch A 0.5 × 0.5 square patch is placed at the left top corner centered at, (-0.5, 0.5). The wave speed is set to be c2(= 0.1) in the patch, and c1(= 1.0) in the remaining area. Figure 2.11 shows time snapshots of the solution. Case-ii - Four patches There are four 0.5 x 0.5 square patches placed at the four corners; i.e., centered at (-0.5, 0.5), (-0.5, -0.5), (0.5, 0.5), and (0.5, -0.5). The wave speed is set to be c2(= 0.1) in every patch, and c1(= 1.0) in the remaining area. Figure 2.12 shows time snapshots of the wave 53 (a) single patch (b) 4 patches Figure 2.10: Geometrical view of 0.5× 0.5 (a) one and (b) four square patches in the square domain Ω = [−1, 1]2 with c2(= 0.1) and c1(= 1.0) are the wave speeds in the patches and the remaining area. Figure 2.11: Time evolution of a Gaussian field e−25(x2+y2) in a square domain Ω = [−1, 1]2 with a 0.52 square patch as shown in Figure 2.10-(a). Here, the spatial step size, ∆x = ∆y = 0.02 and the time step size, ∆t = 0.005. at different time instants. Figure 2.12: Time evolution of a Gaussian field e−25(x2+y2) in a square domain Ω = [−1, 1]2 with four 0.52 square patches as shown in Figure 2.10-(b). Here, the spatial step size, ∆x = ∆y = 0.02 and the time step size, ∆t = 0.005. 54 2.6.3.3 Three Dimensional Cases In this section, we discuss the three-dimensional variable speed wave-solver. A cubic domain with cubic patches is chosen to assess the three-dimensional variable speed solver. Due to the material property of patches, waves travel with different speeds through these patches compared with the remaining area. We place a Gaussian pulse e−36(x2+y2+z2) as an initial solution and apply outflow boundary conditions along the border of the cube (Ω = [−1, 1] × [−1, 1] × [−1, 1]). Since we use a spatial grid size ∆x = ∆y = ∆z = 0.0625 (32 × 32 × 32 grid points) and time step size ∆t = 0.0313, applicable CFL is 0.5. We demonstrate two test cases: one, and four patches for case-i and case-ii respectively as shown in Figure 2.13 (a) single patch (b) 4 patches Figure 2.13: Geometrical view of 0.5 × 0.5 × 0.5 (a) one and (b) four cubic patches in the cubical domain Ω = [−1, 1]3 with c2(= 0.1) and c1(= 1.0) are the wave speeds in the patches and the remaining area. Case-i - Single patch A 0.5 × 0.5 × 0.5 cubic patch is placed at the left top corner centered at, (-0.5, 0, 0.5). The wave speed is set to be c2(= 0.1) in the patch, and c1(= 1.0) in the remaining area. Figure 2.14 shows snapshots of the wave at different time instants. Case-ii - Four patches There are four 0.5× 0.5× 0.5 cubic patches placed at the four corners; i.e., centered at (-0.5, 55 Figure 2.14: Time evolution of a Gaussian field e−25(x2+y2+z2) in a cubical domain Ω = [−1, 1]3 with a 0.53 cubic patch as shown in Figure 2.13-(a). Here, the spa- tial step size, ∆x = ∆y = ∆z = 0.0625 and the time step size, ∆t = 0.0313. 0, 0.5), (-0.5, 0, -0.5), (0.5, 0, 0.5), and (0.5, 0, -0.5). The wave speed is set to be c2(= 0.1) in every patch, and c1(= 1.0) in the remaining area. Figure 2.15 shows snapshots of the wave at different time instants. Figure 2.15: Time evolution of a Gaussian field e−25(x2+y2+z2) in a cubical domain Ω = [−1, 1]3 with four 0.53 cubic patches as shown in Figure 2.13-(b). Here, the spatial step size, ∆x = ∆y = ∆z = 0.0625 and the time step size, ∆t = 0.0313. The 3D Gaussian pulse smoothly travels through the different material interfaces provided by patches (one in Figure 2.14 and four in Figure 2.15) and the rest of the area. 2.6.4 Waveguide in a Photonic Crystal 2.6.4.1 Problem Definition We study the propagation of light in a photonic crystal, a low-loss periodic dielectric medium, in which the atoms or molecules are replaced by macroscopic media with different 56 dielectric constants, and the periodic potential is replaced by periodic dielectric functions [43]. These are very useful in applications such as the design of fiber-optic cables, laser engineering, high-speed computing, and spectroscopy. In order to study this propagation, we first cast Maxwell equations written in terms of the electric field, E and magnetic field H in the form of a second-order wave equation. 2.6.4.2 Geometry of the Problem A two-dimensional photonic crystal which is periodic in the x, y directions and homo- geneous in the z direction is chosen as the test domain. The crystal has photonic band gaps in the xy plane and it can prevent light from propagating in any direction within the plane. This system has discrete translational symmetry in the xy plane, and this symmetry property can be used to characterize its electromagnetic modes. Any modes that propagate in the xy plane are invariant under reflections through the plane. The Transverse Magnetic (TM) modes have the H field in the plane and, the E field normal to the plane. As shown in Figure 2.16, a line defect is introduced along the y direction by removing a line of rods in a set of alumina rods. This can be represented using a square lattice of cylindrical rods in the air (Figure 2.17). Figure 2.16: Schematic illustration of a line defect (in red) within a set of cylindrical rods (in green) on a photonic crystal. 57 In our model, we chose an m-by-n lattice that has dielectric rods which are periodic along x (n rods) and y (m rods) axes with the lattice constant a. The cylindrical rod with radius r and dielectric constant  is homogeneous along the z direction (we imagine the cylinders are infinitely tall). A column of rods is removed to produce a cavity, so it has reflecting walls on both sides along the y direction. We consider the waveguide in TM mode using the following boundary conditions: periodic in the x direction and outflow in the y direction. We assume that there are no free charges, so only the dielectric constants are changing in space due to the rods. Figure 2.17: 2D Lattice of cylindrical rods which are periodic along x and y axes with lattice constant a on a photonic crystal. 2.6.4.3 Result and Discussion We performed numerical simulations of a waveguide on the photonic crystal using square and cylindrical rods. A rectangular (3.9 × 2.1) lattice of rods was chosen, the rods were placed in x, y directions with distance a = 0.3. The dielectric constant of rods was chosen to be  = 8.9. For the ADI scheme, we chose nx = ny = 100, with time step t = 0.0105. Using square rods The set of square rods of size r = 0.38a is modeled as shown in Figure 2.18, and we obtained the result as shown in Figure 2.19 using the boundary conditions: periodic in the direction 58 x and outflow in the direction y for a point source e(−iωt), where ω = cky, ky = 0.88(2a), c = 1√ ,  = 8.9, and µ = 1. (µ) (a) 3D (b) 2D Figure 2.18: Square lattice of square rods of side length 0.38a with lattice constant a and dielectric constant  = 8.9 in 3D (a) and 2D (b), a linear defect is formed by removing a column of rods. Figure 2.19: The wave generated by the point source, e(−iωt), where ω = cky, ky = 0.88(2a), ,  = 8.9, and µ = 1 travelling through the photonic waveguide using c = 1√ the model shown in Figure 2.18 (µ) Using cylindrical rods We replaced the square rods with cylindrical rods and used the same parameter choices as in case i (Figure 2.20). We obtained the result as shown in Figure 2.21 using the following boundary conditions: periodic in x and outflow in y Further, we tested the model for a source-free wave equation with initial solution e(−25(x2+y2)) using the same model (Figure 2.20) and obtained the result as shown in Figure 2.22. It is 59 (a) 3D (b) 2D Figure 2.20: Square lattice of cylindrical rods of side length 0.38a with lattice constant a and dielectric constant  = 8.9 in 3D (a) and 2D (b), a linear defect is formed by removing a column of rods. Figure 2.21: The wave generated by the point source, e(−iωt), where ω = cky, ky = 0.88(2a), ,  = 8.9, and µ = 1 travelling through the photonic waveguide using c = 1√ the model shown in Figure 2.20 (µ) important to know that with the embedded boundary method, such as we are pursuing here, it is no ”harder” to introduce an embedded boundary method for a square then it is for a circle. Figure 2.22: The wave, e(−25(x2+y2)) traveling through the photonic waveguide using the model shown in Figure 2.20. 60 2.7 Summary We derived the implicit semi-discrete solution for the multi-dimension wave equation sub- ject to several applicable boundary conditions including Dirichlet, periodic, Neumann, and outflow boundary conditions. We extended the second-order scheme to arbitrary higher- order using Dirichlet and periodic boundaries on Cartesian grids. The P th order scheme consumes O(P N ) time and O(P 2N ) space for the domain with N grid points, it can be reduced to ideally linear time and space complexity O(N ) for considerably large problems (N >>). The three-dimensional high-order scheme is implemented for Dirichlet and pe- riodic boundary conditions and evaluated using different applications including a variable speed wave. In the next chapter, we describe the derivation of high-order outflow boundary conditions. 61 CHAPTER 3: Higher Order Non-reflecting Boundary Condition 3.1 Introduction In this chapter, we develop a high-order non-reflecting boundary condition that can be applicable for curved boundaries based on the scheme described in Chapter 2. A way to define a finite computational domain in the infinite physical domain is by using an artificial boundary which is called non-reflecting or open boundary condition. Since we develop simulation of A6 magnetron with diffraction output (A6 MDO), we need a scheme that successfully handles non-reflecting boundary conditions, even for curved boundaries. The approach described in Chapter 2 gives second-order accuracy for the outflow boundary condition. Basically, this approach extends the spatial domain for a specific time instance and switches spatial integration into time integration at the boundary ends. We use higher- order interpolation polynomial based on a higher-order finite-difference stencil for the time integration approximation and apply a suitable initial condition to reach high-order accuracy. Historically, the open boundary condition was developed for outgoing wave propagation from the Sommerfeld radiation condition [66] by Zienkiewicz and Newton [74]. The wave- based interpretation of this boundary condition is more applicable for EM wave propagation [25; 60; 63]. Later, Higdon developed a sequence of increasing-order boundary conditions at various incidence angles [35]. The absorbing layer-based method, Perfectly Matched Layer (PML), is found to be the most popular method. It was developed by Berenger [6] for the 2-D Maxwell equations, and has been extended to 3D. PML imposes both losses of electric and magnetic fields, identifying the correct loss term for PML so that the outgoing waves not reflected in the domain can be problem-specific. However, it is difficult to choose an optimal conductive parameter(σ) for the best performance of the PML. Another approach, 62 the sponge layer method [16], requires extra grid points to make padding which implements the functions to prevent the reflection. Kirchhoff’s formula-based method developed by Ting and Miksis [69] also requires more computation. Recently, Bradley [1] introduced a high-order nonreflecting boundary condition which uses a compressed nonreflecting boundary kernel to achieve high-order for 2D and 3D cases. This boundary kernel approach is very complicated and consumes O(M N logN ) work in 2D and O(M N log2N ) work in 3D, for M time steps and N spatial grid points. Without using the complicated boundary kernel approach, our higher-order scheme for outflow boundary condition works well for multi-dimensional problems with consuming O(P M N ) works, for the temporal order of accuracy 2P [14]. For arbitrarily big problems, M N >> P , so the complexity will be reduced to O(M N ). In this chapter, we discuss 4-th order derivation for outflow boundary conditions. In Sec- tion 3.2, we derive the high-order outflow boundary condition and give a concrete definition. Section 3.3 gives the suitable initial condition for this approach, and finally we report a set of test cases in Section 3.4. 3.2 Extension to Higher Order in Accuracy We need to apply boundary conditions appropriately to obtain high order accuracy. Dirichlet, Neumann, and periodic boundary conditions can be implemented straightforwardly ( see [15] for details) using the approach shown in [14]. Since we have to know the behavior of the wave equation outside of the domain x ≤ a and x ≥ b, the outflow boundary condition requires little additional computation as detailed in the following. In order to obtain higher-order outflow boundary conditions, we consider more terms in the Taylor series. Note, equation (2.52) uses three terms for second-order accuracy. We use the same approach as explained previously in section 2.2.3.4. Besides, we perform some iterations along the boundary stencil to ensure convergence. For this procedure, we need to know the solution at time steps, tn+1, tn+2, and so on, if we choose the centered fi- 63 nite difference stencil to derive equations with higher-order accuracy (as was used to derive second-order outflow boundary conditions). A problem with this approach is that we need data at a future time, tn+2, which we don’t have direct access to. Instead, we prefer to use one-sided backward finite difference stencils to obtain higher-order accuracy and obtain boundary coefficients explicitly. Let us derive outflow boundary conditions with fourth order accuracy. First, we construct a time interpolant at the right boundary (x > b) using a Taylor series expression of the form (Here z = s/∆t as introduced in 2.52 ) u(b, tn − z∆t) ≈ u(b, tn) − z∆tut(b, tn) + z2∆t2 2 utt(b, tn) − z3∆t3 6 uttt(b, tn) + z4∆t4 24 utttt(b, tn) (3.1) By truncating higher order error terms, we only need to approximate the first time derivative to fourth order accuracy, second time derivative to third order accuracy, third time derivative to second order accuracy, and fourth time derivative to first order accuracy. To perform this approximation, we use a five point backward finite difference stencil to approximate ut, utt, uttt, and utttt to the desired order of accuracy. we obtain, u(b, tn − z∆t) ≈ un(b) − z (cid:16)25 12 (cid:16)35 (cid:16)5 (cid:16) un−1(b) + un(b) − 4un−1(b) + 3un−2(b) − 4 3 un−3(b) + un(b) − 26 3 12 un(b) − 9un−1(b) + 12un−2(b) − 7un−3(b) + 2 un(b) − 4un−1(b) + 6un−2(b) − 4un−3(b) + un−4(b) un−3(b) + 11 12 un−4(b) un−2(b) − 14 3 19 2 3 2 (cid:17) z2 + 2 − z3 6 z4 24 + un−4(b) 1 4 un−4(b) (cid:17) (cid:17) (cid:17) (3.2) Integrating this expression analytically using Lemma 2.1, we arrive at Bn = e−βBn−1 + γ0un(b) + γ1un−1(b) + γ2un−2(b) + γ3un−3(b) + γ4un−4(b) (3.3) 64 where, E3(β) + E4(β), 35 12 E2(β) − 5 2 γ0 = E0(β) − 25 E1(β) + 12 E2(β) + 9E3(β) − 4E4(β), γ1 = 4E1(β) − 26 3 E2(β) − 12E3(β) + 6E4(β), γ2 = −3E1(β) + 19 2 E2(β) + 7E3(β) − 4E4(β), 14 3 E2(β) − 3 11 2 12 4 γ3 = 3 γ4 = −1 4 E3(β) + E4(β). E1(β) + E1(β) + (3.4) Likewise, by considering the left boundary x < a, we get An = e−βAn−1 + γ0un(a) + γ1un−1(a) + γ2un−2(a) + γ3un−3(a) + γ4un−4(a). (3.5) We can also compute the coefficients for second-order accuracy using the explicit approach in this form, where, An = e−βAn−1 + γ0un(a) + γ1un−1(a) + γ2un−2(a). Bn = e−βBn−1 + γ0un(b) + γ1un−1(b) + γ2un−2(b). E1(β) + E2(β), γ0 = E0(β) − 3 2 γ1 = 2E1(β) − 2E2(β), γ2 = −1 2 E1(β) + E2(β). (3.6) (3.7) Now we have equations to compute the homogeneous boundary coefficients An and Bn to second and fourth-order accuracy. Note that the boundary constants corresponding to operator D[u] (denoted by An 1 , Bn 1 ) are independent of the boundary constants corresponding 65 to operator D2[u] (denoted by An 2 , Bn 2 ). Therefore, our fourth-order wave solution can be constructed on two levels. • Level 1 Compute D[u] using u; An 1 and Bn 1 are obtained by second order solution implicitly (2.58) or explicitly (3.6). • Level 2 Compute D2[u] using D[u]; An 2 and Bn 2 are obtained by fourth order solution (3.3) and (3.5) explicitly Now, we provide algorithms for fourth-order outflow boundary conditions in one dimen- sion. We first modify Algorithm 5 to work for outflow boundary conditions with fourth-order accuracy in time. This Algorithm 11 computes γ coefficients (see 3.7 and 3.4 ) using second- order centered, and fourth-order backward finite different stencils in Level 1 and Level 2 computations respectively. Initially, outflow boundary coefficients (vector H) are set to zero, meaning boundary coefficients at time step t0 are zero, and previous solutions at boundaries are maintained (ubppr) for the computation. During the computation of L−1 using outflow (Algorithm 12), we use an iterative method to ensure the solution converges. In each iteration, it computes boundary coefficients and updates the solutions within the boundary stencil. We perform the iteration until it satisfies a criterion |u − utemp|∞ < tol where tol is a tolerance which may be chosen to be quite small. We developed an Algorithm 13 to compute outflow boundary coefficients using the previously explained approach. A general procedure to compute the γ coefficients for any order based on a given finite dif- ference stencil is designed in Algorithm 14. This algorithm utilizes a function f dcoef f F (k, ¯x, x) published by [49] to compute coefficients for finite difference approximations of the derivative of order k at ¯x based on grid values at points in the boundary stencil x. In Algorithm 15, the function E represents the expression defined in lemma 2.1 66 Algorithm 11 Compute u at time tn+1 by imposing outflow boundary conditions 1: un, un−1, and un−2 - array of u values at time tn, tn−1, and tn−2 respectively, 2: bdryA, bdryB - boundary points, 3: expA expB- exponential vector of grid distance from left boundary and right boundary respectively along the line, 4: ν - array of weighted nodes along the line, 5: expweight - array of exponentially weighted nodes along the line, 6: ps, pt -order of accuracy in space and time respectively, 7: xA, xB - left and right boundary points respectively. 8: stimp2 = 1 : −1 : 1 9: stexp2 = 0 : −1 : −4 10: g(1, :) = gamma(β, 2, stimp2) 11: g(2, :) = gamma(β, 4, stexp4) 12: P = 2 13: Dterms = 0 14: for k = 1 to P do 15: D = u − linv out(un, β, bdryA, bdryB, expA, expB, ν, expweight, ps, 2k, H, ubppr, g, maxit, tol) 16: 17: Dterms = Dterms + poly high(β, P )D u = D 18: 19: end for 20: un+1 = 2un − un−1 + Dterms 3.3 Appropriate Initial Condition Our MOLT based implicit numerical scheme computes solution un+1 at time tn+1 using the solutions un and un−1 at previous time steps tn, and tn−1. Therefore our initial condition should be handled carefully in order to avoid order reduction of the scheme. In Causley’s higher order paper [14] he introduced a technique to approximate u1 at time t1 using the initial solution at time t0, u0. However, since our high-order outflow scheme requires solutions un, un−1, un−2, un−3, and un−4 at the boundary points for the time levels tn, tn−1, tn−2, tn−3, and tn−4 to compute homogeneous coefficients, the above approximation is not enough to go with high-order outflow scheme. Therefore, we require the initial solution u1, u2, u3, and u4 to use our higher order scheme. In order to obtain those values, we first use the second order scheme with the initial approximation of u1 and obtain u2, u3, and u4, then afterwords we use our 67 Algorithm 12 linv out: Compute L−1 for outflow boundary condition 1: u - array of u values at time tn, 2: β - averaging parameter, 3: bdryA, bdryB - boundary points, 4: expA expB- exponential vector of grid distance from left boundary and right boundary respectively along the line, 5: ν - array of weighted nodes along the line, 6: expweight - array of exponentially weighted nodes along the line, 7: ps -order of accuracy in space, 8: pt - order of accuracy in time. 9: H - boundary coefficients at time step tn, 10: ubppr - solution at the boundary points at previous time steps, 11: g - a vector of γ coefficients for a given order of accuracy, 12: maxit - maximum number of iterations, 13: tol - tolerance number for the convergence. 14: Compute particular solution 15: I = f astconvolution(u, ν, expweight, ps) 16: Boundary correction 17: utemp = u 18: it = 0 19: repeat 20: H = apply bc outf low(uA, uB, bdryA, bdryB, β, H, ubppr, g, pt) 21: 22: utemp = update stencil(utemp, I, expA, expB) it = it + 1 23: until (|u − utemp|∞ < tol) OR (it < maxit) 24: Operate L−1 25: k = pt 2 26: linv = I + H(k, 1)expA + H(k, 2)expB fourth order scheme to compute the solution u5. 3.4 Numerical Results In this section, several test cases are reported for higher-order accuracy in 2D and 3D. 3.4.1 Convergence Test for High-Order Outflow We first show fourth order convergence of outflow boundary conditions by performing a refinement study on a rectangular domain Ω = [−1, 1] × [−1, 1] and a cubic domain Ω = [−1, 1] × [−1, 1] × [−1, 1] for 2D and 3D respectively. In each case, we use a point 68 Algorithm 13 apply bc outf low: Apply outflow boundary conditions 1: IA, IB - solution at boundary points, 2: bdryA, bdryB - boundary points, 3: H - boundary coefficients at time step tn, 4: ubppr - solution at the boundary points at previous time steps, 5: g - a vector of γ coefficients for a given order of accuracy, 6: β - averaging parameter, 7: p - order of accuracy in time. 8: p = 2 ∗ k 9: H(k, 1) = H(k, 1)e−β + g(k, 1)IA 10: H(k, 2) = H(k, 2)e−β + g(k, 1)IB 11: for j = 2 to p +1 do for bp = 0 to 1 do 12: 13: 14: 15: end for 16: for j = 1 to p do 17: 18: end for 19: ubppr(p + 1, 0) = uxA 20: ubppr(p + 1, 1) = uxB H(k, bp) = H(k, bp) + g(k, j)ubppr(p + 2 − j, bp) ubppr(j, :) = ubppr(j + 1, :) end for Algorithm 14 gamma : Compute the γ coefficients for a given order of accuracy cf s(:, j) = f dcoef f F (j, 0, stencil) 1: β - averaging parameter, 2: p - order of accuracy in time, 3: stencil - array of uniform grid points 4: for j = 1 to p do 5: 6: end for 7: for k = 1 to p+1 do for j = 1 to p do 8: 9: 10: 11: end for 12: g(1) = g(1) + E(0, β) g(k) = g(k) + (−1)jE(j, β) end for source cos(ωt), ω = 1 placed at center of the domain ( (0, 0) in 2D and (0, 0, 0) in 3D), and run the evaluation up to dimensionless time T = 2.0, with a fixed spatial resolution of 160 × 160 spatial points in 2D and 160 × 160 × 160 in 3D. The discrete L∞ norm of the error is constructed at each time step and maximum error over all time steps is used to graph in Figure 3.1 for ∆t = 125×10−3 , k = 1 to 5, with outflow boundary conditions. The 2k 69 Algorithm 15 E : Compute the coefficients E based on lemma 2.1 1: m and v - integers, 2: m ≥ 0 and v > 0 3: Pmv = 0 4: for l = 0 to m do Pmv = Pmv + vl 5: l! 6: end for 7: Emv = 1 vm (1 − e−v)Pmv CFL (= c∆t ∆x ) value changes proportional to the time step size ∆t with fixed spatial step size, ∆x = ∆y = ∆z. (a) in 2D (b) in 3D Figure 3.1: Fourth order convergence of (a) 2D and (b) 3D wave solver using outflow bound- ary conditions with the spatial step size h = 1.25×10−2. This is a self-refinement study which measures L∞ norm of the error at time T = 2.0 on a (a) square and (b) cubic domain with a point source cos(ωt), ω = 1 at the center of the domain. The CFL (= c∆t h ) value changes proportional to the time step size ∆t with fixed spatial step size, h. 3.4.2 Rotating Gaussian Pulse In this section, we examine the ability of our outflow scheme to handle the effect of a rotating Gaussian pulse. First, we perform refinement studies on a square domain Ω = [−1, 1]×[−1, 1] for a rotating Gaussian pulse through different angles. We chose the Gaussian 70 pulse as an initial solution, (cid:16) −36 e ( cos(θ)x+sin(θ)y r1 )2 +( sin(θ)x+cos(θ)y r2 )2(cid:17) (3.8) with r1 = 2 and r2 = 1 and rotate the angle θ from Π 18 to Π 2 for this refinement studies. We run the evaluation up to the dimensionless time T = 2.0, with a fixed spatial resolution of 100 × 100 spatial points. The discrete L∞ norm of the error is constructed at each time step and maximum error over all time step is used to graph in Figure 3.2 for ∆t = 0.2 2k , k = 0 to 4, with our fourth order outflow boundary conditions. Figure 3.2: Fourth order convergence of the oval shape Gaussian pulse (however it reduces to 4 due to the ADI splitting error) given by the 2 on square domain Ω = [−1, 1]2 by imposing the above third order near to θ = Π Equation 3.8 rotated through Π 18 to Π Outflow boundary conditions along the boundaries. This self-refinement study measures L∞ norm of the error at time T = 2.0 with h = 0.02. The convergence plot shows that the order of the accuracy reduces from fourth-order to above third-order while the angle goes to Π 4 . This is acceptable behavior because the highest splitting error should be obtained for the angle Π 4 due to the ADI splitting. 71 3.4.2.1 Time Evaluation of 50o angled Gaussian pulse In this section, we present a time evolution of the same Gaussian pulse defined in equa- tion 3.8 with r1 = 2, r2 = 1 and angle θ = 50o using outflow boundary conditions. We chose a rectangular domain Ω = [−1, 1]× [−1, 1] with spatial resolution of 300× 300 spatial points, and set the time step size ∆t = 3.3 × 10−3 to maintain CFL value 0.5. Figure 3.3 shows snapshots of the wave at different time instants (a) t = 0.3597 (b) t = 0.6864 (c) t = 1.0131 (d) t = 1.3695 Figure 3.3: Time evolution of a Gaussian pulse given by the Equation 3.8 with angle θ = 50o on square domain Ω = [−1, 1]2 by imposing outflow boundary conditions along the boundaries. Here the time step size ∆t = 3.3 × 10−3 and CFL value is 0.5. The wave is leaving completely as expected. (a) θ = 30o (b) θ = 50o (c) θ = 70o (d) θ = 90o Figure 3.4: Evolution of a Gaussian field given by the Equation 3.8 at time t = 0.6864, rotated through different angles (a) θ = 30o, (b) θ = 50o, (c) θ = 70o, and (d) θ = 90o on square domain Ω = [−1, 1]2 by imposing outflow boundary conditions along the boundaries, Here the time step size ∆t = 3.3 × 10−3 and CFL value is 0.5 at time t = 0.1287 using outflow boundary conditions. We observe the same behaviour of the wave for rotated Gaussian pulse through different angles. The waves leave completely through the curved boundary. 72 3.4.3 Outflow Boundary Condition Along Curve Boundaries We performed an experiment to examine the ability of our outflow boundary treatment along the curved boundaries. For this experiment, we built a 2D geometry with a curved boundary as shown in Figure 3.5 (a). The curve is a portion of an ellipse which has axes a and b and centred at the origin (0, 0). rw = ab √ b2 + a2 tan2 θ rh2 = rw tan θ rh1 = b This 2D object can be represented as a 2D graph (Figure 3.5), G(V, E(ES, EA)) with a set of vertices V = (v1, v2, v3, v4), straight edges ES = [es1(≡ (v2, v3)), es2(≡ (v3, v4)), es3(≡ (v3, v4)), es4(≡ (v4, v1))], and an arch edge EA = [ea1(≡ (v1, v2, O))]. (a) Geometry (b) Graph Figure 3.5: (a) Geometry and (b) graph representation of the 2D object constructed by using an oval of width 2a and height 2b and a rectangle of 2rw×(rh1+rh2). Here rh1 = b and v1, v2, v3, and v4 are the vertices of the object. 73 We made an instant of this object with a = 1.6, b = 4.8, and θ = 50o, place a point source cos ωt with ω = 0.5 at O(0, 100∆x), apply outflow boundary condition along the elliptical curved edge and Dirichlet along the remaining straight edges. Time evaluation of the source field for the spatial grid width ∆x = ∆y = 0.039 and time step size ∆t = 0.0195 is shown in Figure 3.6 (a) t = 1.8135s (b) t = 11.778s (c) t = 12.6945s (d) t = 13.7085s Figure 3.6: Time evolution of the point source field cos ωt with ω = 0.5 at (0, 100∆x) in the object as shown in Figure 3.5 by imposing outflow boundary conditions along the curved boundary and homogeneous Dirichlet along the straight boundaries. Here the spatial step size, ∆x = ∆y = 0.039 and the time step size, ∆t = 0.0195 Further, we performed refinement studies for the same structure and obtained the fourth- order accuracy in time by computing the L∞ error for the final time T = 5.0 as shown in the plot 3.7. 3.4.4 Convergence Studies Using Analytical Solution We perform this experiment to confirm the order of accuracy of our scheme using an analytical solution. We use the fourth order scheme (see [14] for the proof) as given in Equation 3.9 which includes the source term s, un+1 − 2un + un−1 = −β2Cxyz[un] − β2Dxyz − β4 12 (cid:0)sn+1 + 10sn + sn−1(cid:1) + + β2 12α2 (cid:19) Cxyz Cxyz[un] β2 12α4Cxyz[sn] (3.9) (cid:18) 74 (a) Field at t = 5.0 (b) Convergence plot Figure 3.7: (a) Snapshot of the point source field at time T = 5.0 and (b) Fourth order convergence of 2D wave solver using Outflow boundary conditions along the curve boundary with the spatial grid width ∆x = ∆y = 0.078. This is a self- refinement study which measures L∞ norm of the error at time T = 5.0 on the object shown in Figure 3.5 with a point source cos(ωt), ω = 0.5 at the center of the box, (0, 100∆x) Here we use three dimensional integration operator Cxyz acting on the source term s for a high-order correction. We derive an analytical solution for a point source sin(4πt) placed at (x0, y0, z0) as follows, (cid:112)(x − x0)2 + (y − y0)2 + (z − z0)2 + δ2 e−36(t/15−1).2sin(4πt) 1 4π (3.10) where δ ∈ R+, 0 < δ2 < ∆x We choose a cubic domain Ω = [−1, 1]× [−1, 1]× [−1, 1], place a point source sin(4πt) at the center (0, 0, 0), impose outflow boundary conditions along the all six boundary surfaces, and run the evaluation up to dimensionless time T = 3.7, with a fixed spatial resolution of 80 × 80 × 80. The discrete L∞ norm of the error is constructed at each time step and maximum error over all time steps is used to graph in Figure 3.8 for ∆t = 2×10−1 , k = 0 to 2k 3. 75 Figure 3.8: Fourth order convergence of 3D wave solver using outflow boundary conditions with the spatial step size h = 2.5 × 10−2. This measures L∞ norm of the error which compares with analytical solution using a point source sin(4πt) placed at the center of the domain. 3.5 Summary We derived a solution for high-order outflow boundary conditions using our MOLT based scheme which uses the extended backward finite difference time-stencil along the boundaries and applies a suitable initial condition. The computation begins with the second-order scheme followed by the 4th-order scheme after three time-steps. We evaluated the scheme using several test cases including curved boundaries. In the next chapter, we describe the derivation of the high-order embedded Neumann boundary method for multi-dimensions. 76 CHAPTER 4: Higher Order Embedded Neumann Boundary Method 4.1 Introduction In this chapter, we review the second-order embedded Neumann boundary method for the scheme described in Chapter 2 and extend the order of spatiotemporal accuracy for this scheme to arbitrary high-order. Further, we develop a general framework for complex geometries in 3D. We choose an embedded boundary method to deal with complicated geometries. As an initial test for the simulation of magnetrons, we impose the embedded Neumann boundary condition (later, in Chapter 5 we introduce our embedded PEC boundary condition). The MOLT based scheme described in Chapter 2 gives second-order accuracy for embedded Neumann boundary condition, therefore we want to improve the accuracy for arbitrary higher-order in time and space. In this chapter, we describe a high-order 3D embedded Neumann boundary method and generalize it for complex geometries in 3D. The high-order accuracy in time is achieved using the Lax-Wendroff approach as explained previously in Chapter 2. Our treatment for embedded Neumann boundary condition includes the ghost points and boundary corrections. First, the solution at the ghost points are computed using Hermite-Birkhoff interpolation, and then interior points were updated using a boundary correction iteration along with boundary stencils. Hence, we extend the interpolation stencils to achieve high-order accuracy in space. The embedded boundary method can deal with curved boundaries and material inter- faces by superimposing boundaries on Cartesian grids and it also can handle off-grid points. Thus, the embedded boundary method is the most suitable approach for complicated geome- tries in different application areas such as fluid dynamics in aerospace engineering [55; 44], 77 electromagnetics [3], and acoustics [45; 2]. Further, several material interface problems are dealt with embedded boundary methods such as dielectric materials. Here, the question is how to handle wave propagation along with material/domain interfaces. Researchers propose appropriate jump conditions [11; 3; 46; 22] which were derived from the domain’s material properties and apply the suitable boundary conditions along the interfaces using interpolation techniques. In [46], the embedded method was used for numerical modeling of electromagnetic scattering of an incident plane wave by a dielectric circular cylinder. They dealt with discontinuous wave propagation speed using appropriate jump conditions, but the scheme is in second-order temporal and spatial accuracy for 2D problems. In [11] and [18] second and fourth-order accuracy was achieved using an upwinding embedded boundary method for Maxwell’s equations based on Runge-Kutta type time discretization. However, the approach is very cost expensive and is not as fast as our MOLT based scheme. Another method proposed in [3] uses interior boundary points instead of exterior ghost points to apply Neumann or Dirichlet boundary conditions with the embedded boundary method and gives fourth-order accuracy for the wave equation in 2D. The method is based on a compact Pade-type discretization of spatial derivatives together with the Taylor series method in time and it can remove small-cell stiffness problems for both Neumann or Dirichlet boundary con- ditions. However, it also deals with cost expensive and slow matrix operations to obtain the solution. An approach similar to our scheme has been used in [9; 8]. They provide an implicit ADI based numerical scheme for solving the heat and wave equations in a general embedded- boundary domain with high-order spatial accuracy using Fourier continuation spatial approx- imations and second-order temporal accuracy. Thus, the big difference from our approach is that they use a Fourier-based method to invert the semi-discrete differential operator instead of constructing and inverting the modified Helmholtz operator as we do. Besides, the Fourier continuation method allows for minimal dispersion errors for wave propagation problems. However, the schemes are only second-order accurate in time and higher-order 78 accuracy requires resorting to Richardson extrapolations. In the following sections, we describe the higher-order Neumann boundary conditions in Section 4.2, detailing its implementation in 1D, 2D, and 3D. In Section 4.3, we describe the generalized embedded method suitable for any complex geometry and finally provide numerical results. 4.2 Derivation of The Scheme Neumann boundary conditions for boundary geometries along grid lines can be directly imposed using a two-point boundary correction by utilizing surface grids in 2D and volu- metric grids in 3D. The method can be applied to polygonal domains by the use of multiple overset grids, each aligned with a boundary segment, which communicates with the interior grid through interpolation on a ghost cell region. We chose the embedded boundary methods explained in [13] as the basis to impose Neumann boundary conditions for curved boundary surfaces in 3D. Here we determine the corresponding Dirichlet boundary values at the end- points of each x−, y− and z− sweep lines that result in the approximate satisfaction of the Neumann boundary condition. The Neumann boundary condition is unconditionally unstable even when space is set continuously [46; 22]. This means all embedded boundary methods need a touch of diffusion to stabilize it. In our earlier work, we developed a O(∆t2) with one higher-order diffusion term to stabilize it. Here we extend the diffusion term to the fourth-order accuracy. 4.2.1 One Dimensional Scheme Let us consider 1D domain Ω ≡ [xa, xb] with a left boundary xB, and we choose a ghost point xG, which is the exterior-point to the simulation domain (it should have at least one interior neighbor). We give the formulation of the left domain in detail next, As shown in Figure 4.1, xI and xII are two interior points, their distance from boundary 79 Figure 4.1: Geometry of 1D embedded Neumann boundary domain Ω ≡ [xa, xb] with left boundary xB, ghost point xG, and interpolation points xI and xII, where the distance ξI = |xI − xa|, ξII = |xII − xa|, and ξG = |xG − xa| point are ξI = |xI − xa| and ξII = |xII − xa| along the normal. We define ξG as the distance between ghost and boundary point (ξG = |xG− xa|). We apply Hermite-Birkhoff interpolate, P (ξ) through the points, xG, xa, xI, and xII by imposing the conditions P (cid:48)(0) = 0, P (ξI) = uI and P (ξII) = uII, and explicitly solving for uG gives: (cid:19) (cid:18) ξ2 II − ξ2 II − ξ2 ξ2 I G (cid:18) ξ2 G − ξ2 II − ξ2 ξ2 I I (cid:19) ul II + O(∆x2), ul+1 G = ul I + We have introduced additional notation of l + 1 and l, which designate how we will set up a fixed point iteration using the Hermite-Birkhoff interpolate. The iteration itself is a simple first order fixed point method. Let w(l) and w(l+1) be the solutions at the lth and (l + 1)th iterations. We choose a tolerance tol, and maximum number of allowed iterations mit, and define a stopping criterion as |w(l+1) − w(l)|∞ < tol OR nit > mit where nit is the current iteration number. The detailed steps for this iterative approach are given in Algorithm 16, As detailed in [13] we need to include an artificial dissipation term into the solution to maintain stability for such an embedded boundary method. Thus the second-order solution un+1(x) at the next time step looks like, un+1(x) − 2un(x) + un−1(x) = −β2D(1) x [un] − D(2) x [un−1]. (4.1) where  is an artificial dissipation coefficient, that satisfies 0 <  < 1. The Value of D(2) at the previous time step can be used as D(2) go with one more computing-level to obtain D(2) x [un] x [un−1] at the current time step, and we need to x [un]. According to [13], β has to be reduced 80 Algorithm 16 Compute L−1 for 1D scheme 1: Compute particular solution I n+1 = I[un] 2: Initial guess wn+1(0) ≈ 3un − 3un−1 + un−2 3: Initialize the iteration counter nit = 0 4: repeat 5: and wn+1(l) I II G = G I I 6: 7: (cid:17) (cid:17) wn+1(l) II−ξ2 II−ξ2 ξ2 (cid:16) ξ2 (cid:16) ξ2 , at the interpolation points xI, and xII and as well as at Compute wn+1(l) the interpolation points along the right boundary Compute solution at the ghost points using the following Hermite-Birkhoff interpolant wn+1(l) Compute homogeneous coefficients a1 and b1 using the fact (Here, xaG and xbG are ghost points ), wn+1(l) = I n+1 + a1e−α(x−xaG) + b1e−α(xbG−x). Compute solution and update boundary stencil wn+1(l+1) = I n+1 + a1e−α(x−xaG) + b1e−α(xbG−x) nit = nit + 1 10: until (|wn+1(l+1) − wn+1(l)| < tol OR nit > mit) G−ξ2 II−ξ2 ξ2 wn+1(l) II 8: 9: + I I by a small amount from the case published in [14]. 4.2.2 Two Dimensional Scheme When extending the embedded Neumann boundary method to 2D cases, we need to consider two important changes: every spatial point will be treated in x and y Cartesian coordinates, and the ghost point approximation will be made by using bi-linear interpolation. To achieve high-order accuracy, we need more computing-levels (perform P computing-levels for 2P -th order of accuracy in time ) and extend the interpolation stencils by using 4-points along the normal for the Hermite-Birkhoff interpolation and an eight-point stencil for the bi- linear interpolation. We can derive second-order and fourth-order 2D solution with artificial dissipation ([13]) as given below, un+1 − 2un + un−1 = −β2C(1) (cid:18) (cid:19) xy [un] + C(2) xy [un−1], un+1 − 2un + un−1 = −β2C(1) xy [un] − β2D(2) xy − β4 12 C(2) xy C(1) xy [un] + C(3) xy [un−1]. (4.2) (4.3) We now consider the situation of a two-dimensional domain with a curved boundary Γ 81 displayed in Figure 4.2. In the 2D case, we define a ghost point (xG, yG) which is an exterior point (xi, yj), but at least one of the neighboring points (xi±1, yj) or (xi, yj±1) should be an interior point. In order to compute the value of unknown uG at the ghost point location (xaG, yaG), we construct a quadratic Hermite-Birkhoff boundary interpolant P (ξ) along the direction normal to the boundary, which intersects the boundary curve Γ at location (xa, ya). We choose four interior points selected along the normal of the surface with distances, ξI = |(xI, yI)− (xa, ya)| = ∆sI, ξII = |(xII, yII)− (xa, ya)| = 2∆sI, ξIII = |(xIII, yIII)− (xa, ya)| = 3∆sI and ξIV = |(xIV , yIV )−(xa, ya)| = 4∆sI, where we will typically take ∆sI = 2∆x, and √ compute values uI, uII, uIII, and uIV at points (xI, yI), (xII, yII), (xIII, yIII), and (xIV , yIV ) respectively by interpolating from interior grid points, and use these interpolation values uI, uII, uIII, and uIV to perform the Hermite-Birkhoff interpolant P (ξ). Further, the distance from the boundary to the ghost point is defined as ξG = |(xG, yG) − (xa, ya)|. (a) Geometry (b) 8-points stencil Figure 4.2: (a)Geometry of 2D embedded Neumann boundary method with a boundary point (xB, yB), ghost point (xG, yG), and interpolation points (xI, yI), (xII, yII), (xIII, yIII) and (xIV , yIV ) and (b) symmetric 8-points stencil to interpolate the solution at the point (xI, yI). The solution at the ghost point, uG can be computed using the Hermite-Birkhoff inter- polant P (ξ) by imposing the conditions P (cid:48)(0) = 0, P (ξI) = uI, P (ξII) = uII, P (ξIII) = uIII, 82 and P (ξIV ) = uIV given by, ul+1 G = ul I + (cid:19) (cid:18) ξ2 III − ξ2 III − ξ2 ξ2 I G (cid:18) ξ2 IV − ξ2 IV − ξ2 ξ2 G II (cid:19) ul II + (cid:18) ξ2 G − ξ2 III − ξ2 ξ2 I I (cid:19) ul III + (cid:18) ξ2 G − ξ2 IV − ξ2 ξ2 II II (cid:19) IV + O(∆x2), ul We obtain uI and uII by the approximations using the symmetric 8-point stencil inter- polation. Suppose the interpolation point uI lies in a cell with corners (xi, yj), (xi+1, yj), (xi+1, yj+1), and (xi, yj+1), we have the following approximation for uI uI = w1ui,j + w2ui+1,j + w3ui+1,j+1 + w4ui,j+1 + w5ui−1,j−1 + w6ui+2,j−1 + w7ui+2,j+2 + w8ui−1,j+2 (4.4) where w1 = w3 = w5 = w7 = 4∆x∆y (xi+2 − xI)(yj+2 − yI) (xI − xi−1)(yI − yj−1) (xi+1 − xI)(yj+1 − yI) (xI − xi)(yI − yj) 4∆x∆y 4∆x∆y 4∆x∆y w2 = w4 = w6 = w8 = 4∆x∆y (xI − xi−1)(yj+2 − yI) (xi+2 − xI)(yI − yj−1) (xI − xi)(yj+1 − yI) (xi+1 − xI)(yI − yj) 4∆x∆y 4∆x∆y 4∆x∆y As described in 1D (4.2.1), we use an iterative scheme to improve the accuracy of the solution. For each term in the operator on the right hand side of the second order solution given by equation 4.2, or fourth order in time solution given by equation 4.3, we must identify x [un] +L−1 the correct ghost point that allows that term to guarantee that the normal derivative is zero along the boundary. Taking the C(1)[un] operator and expanding the operator out we have C(1)[un] = L−1 the ghost point for wx = L−1 y [wx] such that when solving for the boundary correction terms the operator satisfies ∂nwx|∂Ω = 0, ∂nwy|∂Ω = 0, ∂nwxy|∂Ω = 0, ∂nwyx|∂Ω = 0. y [un]. The fixed point iteration identifies x [wy], and wyx = L−1 y [un], wxy = L−1 x [un], wy = L−1 y [un]−L−1 x [un]−L−1 x L−1 y L−1 83 Let us consider wx and wyx = L−1 y [wx], knowing that wy and wxy are similar. The iterative process for wx starts by making an initial guess at the direct boundary values wx|∂Ω, using an extrapolate in time wx n−2 at each boundary n+1,(0) ≈ 3wx n−1 + wx n − 3wx point (Algorithm 17). Algorithm 17 Compute wx(= L−1 x [u]) 1: Compute the interior sweep (particular solution) I n+1 n − 3wx 2: Initial guess wx 3: Initialize the iteration counter nit = 0 4: repeat 5: 6: n+1(0) ≈ 3wx for k = 1 to ny do n−1 + wx x = Ix[un] n−2 Compute wxI(k), and wxII(k), at the interpolation points (xI(k), y(k)), and (xII(k), y(k)) and as well as at the interpolation points along the right boundary using bilinear interpolation Compute solution at the ghost points (xaG(k), y(k)) and (xbG(k), y(k)) using the Hermite-Birkhoff interpolant Compute homogeneous coefficients a1(k) and b1(k) using the fact wx Compute solution and update boundary stencil wx n+1(l) = I n+1 + ax(k)e−α(x−xaG(k)) + bx(k)e−α(xbG(k)−x) n+1(l+1) = I n+1 + ax(k)e−α(x−xaG(k)) + bx(k)e−α(xbG(k)−x) 7: 8: 9: 10: 11: end for nit = nit + 1 12: until (|wx n+1(l+1) − wx n+1(l)| < tol OR nit > mit) Given the update on wx, the next step is to consider wyx = L−1 y [wx]. The process starts with the initial guess wx n+1,(0) ≈ 3un − 3un−1 + un−2 at the boundary (Algorithm 18). The same process is done for wy and wxy. The Hermite-Birkhoff interpolate enforces that the normal derivative is zero for wx, wy, wxy and wyx such that C(1)[un] satisfies the normal derivative condition on ∂Ω to within tolerance. For the fourth order formulation, we compute the boundary conditions for C(1)[un] and then we repeat the process for C(2) and D(2) acting on C(1)[un]. Next, we will eamine the higher order embedded Neumann boundary method for 3D in detail, 84 y [wx]) Algorithm 18 Compute wyx(= L−1 1: Compute the interior sweep (particular solution) I n+1 2: Initial guess wyx 3: Initialize the iteration counter nit = 0 4: repeat 5: 6: n+1(0) ≈ 3un − 3un−1 + un−2 for k = 1 to nx do y = Iy[wx n] Compute wyxI(k), and wyxII(k), at the interpolation points (x(k), yI(k), and (x(k), yII(k)) and as well as at the interpolation points along the right boundary using bilinear interpolation Compute solution at the ghost points (x(k), yaG(k)) and (x(k), ybG(k)) using the Hermite-Birkhoff interpolant Compute homogeneous coefficients a1(k) and b1(k) using the fact wyx Compute solution and update boundary stencil wyx n+1(l) = I n+1 + ay(k)e−α(y−yaG(k)) + by(k)e−α(ybG(k)−y) n+1(l+1) = I n+1 + ay(k)e−α(y−yaG(k)) + by(k)e−α(ybG(k)−y) 7: 8: 9: 10: 11: end for nit = nit + 1 12: until (|wyx n+1(l+1) − wyx n+1(l)| < tol OR nit > mit) 4.2.3 Three Dimensional Scheme We now consider the situation of a three-dimensional domain with a curved-surface boundary Γ displayed in Figure 4.3. In the 3D case, we define a ghost point (xaG, yaG, zaG) which is an exterior point (xi, yj, zk), but at least one of the neighbouring points (xi±1, yj, zk) or (xi, yj±1, zk) or (xi, yj, zk ± 1) of it should be an interior point. As described in the one and two dimensional cases, to compute uG, we construct a quadratic Hermite-Birkhoff boundary interpolant P (ξ) along the direction normal to the boundary surface, which intersects the boundary curved-surface Γ at location (xa, ya, za). We choose two interior points selected along the normal of the surface with distances, ξI = |(xI, yI, zI) − (xa, ya, za) = ∆sI| and ξII = |(xII, yII, zII) − (xa, ya, za) = 2∆sI|, where we will typically take ∆sI = √ 3∆x, and compute values uI and uII, at points (xI, yI, zI) and (xII, yII, zII) respectively by interpo- lating from interior grid points, and use these interpolation values uI and uII to perform the Hermite-Birkhoff interpolant P (ξ). Further, the distance from the boundary to the ghost point is defined as ξG = |((xaG, yaG, zaG)) − (xa, ya, za)|. The solution at ghost point, uG can be computed using the Hermite-Birkhoff interpolant 85 Figure 4.3: Geometry of 3D embedded Neumann boundary method with a boundary point (xB, yB, zB), ghost point (xG, yG, zG), and interpolation points (xI, yI, zI), (xII, yII, zII), (xIII, yIII, zIII) and (xIV , yIV , zIV ) P (ξ) by imposing the conditions P (cid:48)(0) = 0, P (ξI) = uI, and P (ξII) = uII, given by, (cid:19) (cid:18)ξ2 II − ξ2 II − ξ2 ξ2 I G (cid:19) (cid:18) ξ2 G − ξ2 II − ξ2 ξ2 I I ul+1 G = ul I + ul II + O(∆x2), We obtain uI and uII by the approximations using the tri-linear interpolation. Sup- pose the interpolation point uI lies in a cell of rectangular prism with corners (xi, yj, zk), (xi+1, yj, zk), (xi+1, yj+1, zk) , (xi, yj+1, zk), (xi, yj, zk+1), (xi+1, yj, zk+1), (xi+1, yj+1, zk+1), and (xi, yj+1, zk+1), we have the following approximation for uI uI = w1ui,j,k + w2ui+1,j,k + w3ui+1,j+1,k + w4ui,j+1,k + w5ui,j,k+1 + w6ui+1,j,k+1 + w7ui+1,j+1,k+1 + w8ui,j+1,k+1 (4.5) 86 where w1 = w3 = w5 = w7 = ∆x∆y∆z (xi+1 − xI)(yj+1 − yI)(zk+1 − zI) (xI − xi)(yI − yj)(zk+1 − zI) (xi+1 − xI)(yj+1 − yI)(zI − zk) (xI − xi)(yI − yj)(zI − zk) ∆x∆y∆z ∆x∆y∆z ∆x∆y∆z ∆x∆y∆z (xI − xi)(yj+1 − yI)(zk+1 − zI) (xi+1 − xI)(yI − yj)(zk+1 − zI) (xI − xi)(yj+1 − yI)(zI − zk) (xi+1 − xI)(yI − yj)(zI − zk) ∆x∆y∆z ∆x∆y∆z w2 = w4 = w6 = w8 = ∆x∆y∆z Three dimensional 2-nd and 4-th order schemes can be defined as follows, un+1 − 2un + un−1 = −β2C(1) (cid:18) xyz[un−1], xyz[un] + C(2) (cid:19) un+1 − 2un + un−1 = −β2C(1) xyz[un] − β2D(2) xyz − β4 12 C(2) xyz C(1) xyz[un] + C(3) xyz[un−1]. (4.6) (4.7) y x and L−1 in the z− to perform z− sweeps. Taking the C(1)[un] operator x [un]− z [un]. The fixed point iteration identifies the We use the operators C and D in three dimensions, and need to operate L−1 direction in addition to L−1 and expanding the operator out we have C(1)[un] = L−1 L−1 x L−1 z L−1 ghost point for wx = L−1 wxz = L−1 operators satisfies ∂n(.)|∂Ω = 0 when solving for the boundary correction terms. z [un], wyx = L−1 x [wzy], and wyxz = L−1 y [wx], wzy = L−1 y [wxz], and each of these z L−1 y [un] − L−1 x [un], wy = L−1 z [wyx], wxzy = L−1 x L−1 y L−1 y [un], wz = L−1 x [wz], wzyx = L−1 x [un] − L−1 z [un] +L−1 y [un] +L−1 y L−1 y L−1 x L−1 z L−1 z [wy], z Let us consider wx, wyx = L−1 y [wx] and wzyx = L−1 z [wyx], knowing that computing of (wy, wzy, wxzy), and (wz, wxz, wyxz) are similar. The iterative process for wx starts by making an initial guess at the direct boundary values wx|∂Ω, using an extrapolation in n−2 at each boundary point. Given the update on n+1,(0) ≈ 3wx n−1 + wx n − 3wx time wx wx, the next step is to consider wyx = L−1 n+1,(0) ≈ 3wyx n− 3wyx wyx the following step is to consider wzyx = L−1 n−1 + wyx y [wx]. The process starts with the initial guess n−2 at the boundary, and then using the update on wyx, z [wyx]. The process starts with the initial guess 87 in Algorithm 19. z z [wyx]) x , L−1 y , and L−1 n+1,(0) ≈ 3un− 3un−1 + un−2 at the boundary. We use three separate algorithms for each wzyx sweep L−1 z , and each algorithm processes three major tasks: computation of a particular solution, a boundary correction, and then obtaining the value of actual L−1. We give the detailed steps for L−1 Algorithm 19 Compute wzyx(= L−1 1: Compute the interior sweep (particular solution) I n+1 2: Initial guess wzyx 3: Initialize the iteration counter nit = 0 4: repeat 5: 6: 7: n+1(0) ≈ 3un − 3un−1 + un−2 for j = 1 to nx do for i = 1 to ny do z = Iz[wyx n] Compute wzyxI(i,j), and wzyxII(i,j), at the interpolation points (x(j), y(i), zI(i,j)), and (x(j), y(i), (zII(i,j)) and as well as at the interpolation points along the right boundary using bi-linear interpolation Compute solution at the ghost points (x(j), y(i), zaG(i,j)) and (x(j), y(i), zbG(i,j)) using the Hermite-Birkhoff interpolant Compute homogeneous coefficients az(i,j) and bz(i,j) using the fact wzyx Compute solution and update boundary stencil wzyx end for n+1(l) = I n+1 + az(i,j)e−α(z−zaG(i,j)) + bz(i,j)e−α(zbG(i,j)−z) n+1(l+1) = I n+1 + az(i,j)e−α(z−zaG(i,j)) + bz(i,j)e−α(zbG(i,j)−z) 8: 9: 10: 11: 12: 13: end for nit = nit + 1 14: until (|wzyx n+1(l+1) − wzyx n+1(l)| < tol OR nit > mit) Next, we are going to develop a generalized higher dimensional, higher-order scheme for problems with complex geometries. 4.3 Treatment for Complex Geometries We now consider domains with complex geometries such as a model for A6 magnetron that has a set of arch areas joined together in 2D. The A6 magnetron has a complex ge- ometry (Figure 1.2), and so it serves as a nice numerical test case for this boundary con- dition as opposed to a physical accurate boundary condition in a magnetron. We chose an embedded boundary method to solve such complex problems. The complex geometries in 88 higher-dimensional problems may be even harder. In our approach, however, since the higher dimensional problems are solved with ADI schemes, we need to worry about one-dimensional lines. Thus, we need to know all the relevant properties of each line to solve it. However, these lines are going to be broken into several segments due to complex boundaries. First, we compute boundary intersection points ( which may be in off-grids), where the grid lines intersect with boundaries. We give a detailed explanation for 3D complex geometries in this section. In our approach, first, we decompose higher-dimensional geometry to 2D slices and then represent it using 2D graphs with a set of vertices and edges. The edges can be classified as straight-line edges and curves or arches. The graph G can be given as, G(V, E(ES, EA)), ES ≡ (vsi, vsj), and EA ≡ (Oij, vai, vaj) (4.8) where vsi, vsj, vai, and vaj are the vertices and Oij is center of the arch (vai, vaj) which may be a circular arch (4.4 (a)) or an elliptical arch (4.4 (b)). In Figure 4.4, each object has three vertices (v1, v2, v3), two straight edges es1(≡ (v1, v2)), es2(≡ (v3, v1)), and one arch edge ea1(≡ (v2, v3, Oa1)) that can be represented as a graph G([v1, v2, v3], [[es1, es2], [ea1)]]) (a) with circular arch (b) with elliptical arc Figure 4.4: 2D graph with 3 vertices (v1(x1, y1), v2(x2, y2), and v3(x3, y3), 2 straight edges es1(≡ (v1, v2)), es2(≡ (v3, v1)) and 1 circular arch (a) or elliptical arch (b) edge ea1(≡ (v2, v3, Oa1)) centered at Oa1(xa1, ya1). 89 4.3.1 Pre-computing Before beginning the PDE evolution, we need to perform a pre-computation to identify key characteristics of the geometry such as boundary and relevant parameter values along the grid lines in each direction. The flow diagram in Figure 4.5 shows major tasks performed in pre-computation and results obtained at the end of each task. Figure 4.5: Flow diagram of pre-computation for 3D problems with complex geometries; it cuts a 3D object into 2D slices Sxy, Sxz, makes 2D graphs Gxy, Gxz, and identifies the properties of the line segments Px, Py, and Pz. 1. Slicing This task decomposes a given 3D object into a set of 2D objects by slicing along the grid lines. We can form three sets of 2D slices Syz, Sxz, and Sxy in each direction x, y, and z respectively, but two sets are sufficient for our computation (if the object is uniform along any direction, we need to process slicing in that direction only - yielding 90 a set of slices). Suppose we apply the slicing along z and y directions, we will have a set of xy slices, Sxy placed along z with distance ∆z apart and a set of xz slices, Sxz placed along y with distance ∆y. 2. Graphing This task defines the geometry of each 2D slice using 2D graphs as expressed in equation 4.8. We have two sets of 2D slices, Sxy of size nz and Sxz of size ny where nz and ny are the number of spatial grid points along the z and y directions respectively. Each slice should have at least one 2D object. Suppose we assume a 3D object has only convex surfaces along with primary directions x, y, and z, then 2D objects in the slices will be defined by Gxy(k), k = 1, 2, ...nz Gxz(j), j = 1, 2, ...ny Therefore, we will obtain, nz + ny graphs. 3. Segmentation This task generates line segments along each grid line in each direction x, y, and z using intersections between the grid lines and the surface of the object. The segments along x and y can be computed using the graph Gxy and along x and z can be computed using the graph Gxz, but we should avoid unneeded duplicate computations for x in order to reduce computing cost. In the final stage of the pre-computation all relevant parameters/properties Px of size nx × nsx, Py of size ny × nsy and Pz of size nz × nsz for each line segment will be computed. Here nsx, nsy and nsz are the number of line segments along x, y, and z respectively. Since, however, each line in the same direction does not need to have the same amount of segments, nsx, nsy and nsz are not single variables, they are vectors/arrays to keep track of the number of segments in each line. 91 Now, the sizes of Px, Py, and Pz are nx(cid:88) i=1 nsx(i), ny(cid:88) j=1 nsy(j) , and nz(cid:88) k=1 nsz(k) respectively. After we are done with the pre-computation, we will have three sets of line segments with their properties (Px, Py, and Pz) for each direction. These segments will be utilized by our higher-order solver to obtain a solution with higher-order accuracy. Actually, during each sweep, each line segment is treated individually with its own piecewise constant wave speed, and then the 3D higher-order equation 2.77 is applied to compute the solution un+1 at the next time step tn+1. 4.3.2 Geometry for A6 MDO in 2D A model of A6M is a useful device to study high power microwave sources. This device has a solid cathode and a solid anode with six cavities as shown in Figure 4.6-(a)). In our modeling, we consider both categories solid and transparent cathodes ( [28; 56]) and design the coupling horn by removing the top portion of a rectangular pyramid and fixing it with a cavity of the anode. If we install the whole pyramid in the same place, the top will be at the center of the anode. In order to model the A6 magnetron with diffraction output (MDO) in 2D, we assume that the cathode and anode are infinitely long in the z-direction. Figure 4.6-(a) shows the axial view of the MDO in 2D. Suppose α1 and α2 are the angular widths of the vane and cavity respectively, we can compute the angles θ1 and θ2 for each cavity of arch in terms of α1 and α2 (Figure 4.6-(b)). The angle of the kth arch is given by, 1 = θk−1 θk 2 = θk−1 θk 1 + (k − 1)(α1 + α2) 1 + kα1 + (k − 1)α2 1 + α1 + α2 = θ1 1 + 2α1 + α2 = θ1 (4.9) (4.10) where, since this is a symmetric uniform model α1 + α2 = π 3 . The model has two closed objects, a six wings object of anode with an object of circle inside which represents the cathode. The geometry of a six wings object can be represented with a graph which has 24 vertices, 12 straight line edges and 12 arches [68]. 92 During the pre-computation, for each horizontal and vertical lines, we find the intersection (a) Axial view (b) Cavity angle Figure 4.6: A6 MDO (a) Axial view with a solid cathode of radius rc in the center, the anode with inner radius ra, vane radius rv, vane angle α1, cavity angle α2, and a coupling horn of radius rh (b) the angle of kth cavity which is represented by the angles θk 1 and θk 2 , θk 2 = θk 1 + α1. points with edges of graphs and compute ghost, boundary, and interior points( (xI, yI) and (xII, yII)) Figure 4.7 and relevant parameters which are needed to apply bi-linear interpo- lation. The pre-computation task forms two sets of line segments for each direction which have to be treated during x and y sweeps with our numerical scheme. 4.4 Numerical Results In order to evaluate our numerical scheme, we consider several test cases in 2D and 3D using classic EM problems such as electromagnetic scattering of a plane wave by metal circular cylinders ( which can be extended to dielectric circular cylinders and applicable to photonic crystal applications), scattering of laser light by spherical objects, and electron beams in A6M. 93 Figure 4.7: Geometrical structure of A6 MDO with key points; intersection, boundary, ghost, and interpolation points required to obtain (a) x− and (b) y− sweeps. 94 4.4.1 Cylindrical Scattering We begin with electromagnetic wave scattering of a plane wave by a metal cylinder of radius r which is homogeneous along the z direction. By assuming the cylinder is infinitely tall, we can represent this case in a 2D model. We choose a rectangular domain Ω = [−1, 1]×[−1, 1] with a circle of radius r = 0.5 with center at (0, 0), and analyse a point source field (cos(ωt), ω = 10) placed at (−1 + 3∆x, 0), ∆x = 0.0156 (Figure 4.8 - (a)) by applying Neumann, outflow and Dirichlet boundary conditions along the circular boundary, vertical rectangular borders (in y direction), and horizontal rectangular borders (in x direction) respectively. We set the grid size to be 128 × 128, time step size ∆t = 0.0078, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. Figure 4.9 shows snapshots of the scattered wave at different time instants. (a) A cylinder at the center (b) Two cylinders symmetrical (c) Two cylinders asymmetrical Figure 4.8: 2D model for a single cylindrical scatter at the (a) center and two (b) symmet- ric and (c) asymmetric cylindrical scatters in the corners. Here the red mark indicates the point source. For the following two experiments, we use two circular scatters instead of one and consider symmetric and asymmetric geometry with a rectangular domain Ω = [−2, 2] × [−2, 2]. We move the point source to the bottom right corner and place another circular scatter with the same radius r1 = 0.3 for the symmetric case (Figure 4.8 - (c)) and with radius r2 = 0.15 for the asymmetric case (Figure 4.8 - (d)) at the top right corner (1.5, 1.5). We apply Neumann and outflow boundary conditions along the circular and the four rectangular boundaries respectively and retain the same values for parameters ∆x, ∆t, β , and . Figures 4.10 and 95 (a) t = 1.0608 (b) t = 2.0436 (c) t = 3.1668 (d) t = 4.329 Figure 4.9: Time evolution of a point source field cos(ωt), ω = 10 with a circular scatter of radius r = 0.5 in the center of the square domain Ω = [−1, 1]2 with a spatial step size ∆x = ∆y = 0.0156 and time step size ∆t = 0.0078. 4.11 show snapshots of the wave at different time instants for the symmetric and asymmetric cases respectively. (a) t = 1.2714 (b) t = 1.9032 (c) t = 2.886 (d) t = 4.29 Figure 4.10: Time evolution of a point source field cos(ωt), ω = 10 with two symmetric circular scatters of radii r1 = r2 = 0.3 in the bottom-left and top-right corners of the square domain Ω = [−2, 2]2 with a spatial step size ∆x = ∆y = 0.0156 and time step size ∆t = 0.0078. 4.4.1.1 Convergence studies We demonstrate fourth order convergence by performing a refinement study on a 2D square domain Ω = [−1, 1] × [−1, 1] with a circular scatter of radius r = 0.5 at the center, and a point source cos(ωt), ω = 1 placed at (0,−1 + 3∆x), ∆x = 0.0156. This runs up to the final time T = 2.0, with a fixed spatial resolution of 160 × 160 spatial points. The discrete L∞ norm of the error is constructed at each time step and maximum error over all time steps is used to graph 4.12 for ∆t = 6.25 2k , k = 1 to 5, with Neumann and outflow boundary 96 (a) t = 1.2714 (b) t = 1.9032 (c) t = 2.886 (d) t = 4.29 Figure 4.11: Time evolution of a point source field cos(ωt), ω = 10 with two asymmetrical circular scatters of radii r1 = 0.3 and r2 = 0.15 in the bottom-left and top-right corners of the square domain Ω = [−2, 2]2 with a spatial step size ∆x = ∆y = 0.0156 and time step size ∆t = 0.0078. conditions along the circular and rectangular boundaries respectively. Figure 4.12: Fourth order convergence of 2D wave solver using a point source cos(ωt), ω = 1 placed at (0,−1 + 3∆x), ∆x = 0.0156 on a square domain Ω = [−1, 1]2 with a circular scatter of radius r = 0.5 at the center by imposing Neumann and outflow along the circular and square boundaries. This is a self-refinement study which measures L∞ norm of the error at the final time T = 2.0. For the space convergence study, a point source cos(ωt), ω = 1 is placed at (−1+3∆x, 0), and runs up to time T = 2.0, with fixed CFL values 2 and 1, where the wave speed c is 97 chosen to be 1. Our L∞ norm error plots of the solution at time T = 2.0 with varying resolutions show fourth order accuracy (Figure 4.13 - (a)). We also note that the normal derivative along the boundary converges with fourth order accuracy (Figure 4.13 - (b)). (a) Space convergence of solution (b) Space convergence of boundary derivative Figure 4.13: Convergence plots for a) space using entire solution and b) space using boundary derivative on a square domain Ω = [−1, 1]2 with a circular scatter of radius r = 0.5 at the center by imposing Neumann and outflow along the circular and square boundaries. This is a self-refinement study which measures L2 norm of the error at the final time T = 2.0. 4.4.2 Spherical Scattering In this section, we evaluate our 3D scheme using the scattering of light by a spherical object. For the first test case, we chose a cubic domain (Ω = [−1, 1] × [−1, 1] × [−1, 1]) with a spherical scatter of radius r = 0.5 at the center of the cube (0, 0, 0) (Figure 4.14, (a)). We place a source field (cos(ωt), ω = 10) placed 3∆z above the center of the bottom surface( xy plane), at (0, 0,−1 + 3∆z), ∆z = 0.0156 by applying Neumann and outflow boundary conditions along the surface of the sphere and six surfaces of the cube respectively. We set the grid size to be 128× 128× 128, time step size ∆t = 0.0078, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. Figure 4.15 shows snapshots of the wave at different time 98 instants. (a) Single sphere (b) Two symmetric spheres (c) Two asymmetrical spheres Figure 4.14: 3D model for a single spherical scatter at the center of a cube, two b) symmetric and c) asymmetric spherical scatters in the corners. Figure 4.15: Time evolution of a point source field cos(ωt), ω = 10 with a spherical scatter of radii r = 0.5 in the center of the cubic domain Ω = [−1, 1]3 with a spatial step size ∆x = ∆y = ∆z = 0.0156 and time step size ∆t = 0.0078. For the next two experiments we placed two equal sized spheres of radii r1 = r2 = 0.3 for the first case and two spheres with different radii r1 = 0.3 and r2 = 0.15 for the second case at the corners of the cubical domain as shown in Figure 4.14 - (b), (c). We move the point source to (0, 0,−1 + 3∆z), ∆z = 0.0156 and simulate the wave propagation by apply- ing Neumann and outflow boundary conditions along the surface of the two spheres and six surfaces of the cube respectively. We retain the same values of parameters ∆x, ∆t, β, and  as before. Figure 4.16 and 4.17 show snapshots of the wave at different time instants for the symmetric and asymmetric cases respectively. 99 Figure 4.16: Time evolution of a point source field cos(ωt), ω = 10 with two symmetric spherical scatters of radii r1 = r2 = 0.3 in the bottom-left and top-right corners of the cubic domain Ω = [−1, 1]3 with a spatial step size ∆x = ∆y = ∆z = 0.0156 and time step size ∆t = 0.0078. Figure 4.17: Time evolution of a point source field cos(ωt), ω = 10 with two asymmetric spherical scatters of radii r1 = 0.3 and r2 = 0.15 in the bottom-left and top- right corners of the cubic domain Ω = [−1, 1]3 with a spatial step size ∆x = ∆y = ∆z = 0.0156 and time step size ∆t = 0.0078. 4.4.2.1 Convergence Studies We demonstrate fourth order convergence in time and space by performing self-refinement studies on a rectangular domain Ω = [−1, 1]3 with a spherical scatter of radius r = 0.5 at the center. For the time convergence study, a point source cos(ωt), ω = 1 is placed at (0, 0,−1 + 3∆z), ∆z = 0.0156, and runs up to dimensionless time T = 2.0, with a fixed spatial resolution of 160×160×160 spatial points. The discrete L∞ norm of the error is constructed at each time step and maximum error over all time steps is used to graph Figure 4.18 -a) for ∆t = 0.025 2k , k = 1 to 5, with Neumann and outflow boundary conditions along the sphere surface and cube surfaces respectively. For the space convergence study, a point source cos(ωt), ω = 1 is placed at (0, 0,−1 + 3∆z), and evolves to dimensionless time T = 2.0, with fixed CFL values 2, 1, and 0.5 where 100 (a) Time convergence (b) Space convergence Figure 4.18: Fourth order (a) time and (b) space convergence plots using a spherical scatter of radius r = 0.5 at the center of a cubic domain Ω = [−1, 1]3 by imposing Neumann and outflow along the surface of the sphere and cubic boundaries. This is a self-refinement study which measures (a) L∞ and (b) L2 norm of the error at the final time T = 2.0. the wave speed c is chosen to be 1. Our L2 norm error plots of the solution at dimensionless time T = 2.0 with varying resolutions show fourth order accuracy (Figure 4.18 -b)). 4.4.3 Electron Beam in Magnetrons 4.4.3.1 A6 Magnetron with Diffraction Output (MDO) in 2D In this section, we test our solver using a modified model for the A6M in 2D. We use a point source instead of voltage source and made two experiments using solid and transparent cathodes with a coupling horn of radius rh = 4.9, radius of the vane resonators rv = 4.11, the anode radius ra = 2.11, the cathode radius rc = 1.58, the angular width of the vane, 6 , and the cavity, α2 = π α1 = π circular source, cos(ωt)(|x2 + y2 − r2 6 . For the first test case, we simulate the scattered wave of a c| < 2∆x), ω = 10 placed along with the cathode. 4.19 shows snapshots of the wave captured at different time instants. In the second test case, we use six point-sources that are placed around the circle of the 101 cathode each nearby a cavity (we maintain 15o from left edge-radius of each cavity) and simulate the waves propagated by these sources. 4.20 shows snapshots of the wave captured at different time instants. We applied Neumann boundary conditions along with the model for both cases. Figure 4.19: Time evolution of a circular source field cos(ωt)(|x2 +y2−r2 c| < 2∆x), ω = 10 in the model of A6RM with a solid cathode of radius rc = 1.58 and anode of inner radius ra = 2.11, vane radius rv = 4.11 and the coupling horn radius rh = 4.9 and the angular width of the vane, α1 = π 6 , and the cavity, α2 = π 6 . Figure 4.20: Time evolution of fields provided by six point sources in the model of A6M with a transparent cathode of radius rc = 1.58 and anode of inner radius ra = 2.11, vane radius rv = 4.11 and the coupling horn radius rh = 4.9 and the angular width of the vane, α1 = π 6 , and the cavity, α2 = π 6 . Further, we perform a convergence test by a refinement study on a model for A6M without cathodes. We set the radii for coupling horn, vane resonators, and anode as rh = 4.9 rv = 4.11, and ra = 2.11 respectively. The angular width of the vane, α1 = π 6 and the cavity angle, α2 = π 6 . We place a point source cos(ωt), ω = 1 at the center (0, 0), and evolve to the final dimensionless time T = 2.0, with a fixed spatial resolution of 160 × 160 spatial points. The discrete L∞ norm of the error is constructed at each time step and maximum error 102 over all time steps is used to graph Figure 4.21 for ∆t = 0.3125 2k , k = 1 to 6, with Neumann boundary conditions along the model. It shows fourth order convergence. (a) A6M Figure 4.21: Fourth order convergence of fields provided by a point source cos(ωt), ω = 1 at the center (0, 0), of A6MDO with the anode of radii rh = 4.9 rv = 4.11, and ra = 2.11 and angular width of the vane, α1 = π 6 . This is a self-refinement study which measures L∞ norm of the error at time T = 2.0 for fixed spatial step size ∆x = ∆y = 3.13 × 10−2 6 and the cavity, α2 = π 4.4.3.2 A6 Magnetron in 3D For this experiment, a 3D domain of Ω = [−5, 5]3 is chosen with radius of the vane resonators rv = 4.11, anode radius ra = 2.11, cathode radius rc = 1.58, angular width of vane 20o, cavity angle 40o, and thick h = 6 along z. We set the grid size to be 1283, time step size ∆t = 0.039, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. In this simulation of A6 magnetron with a transparent cathode which is mimicked by using six emitters/line sources, we calculated the time evolution by imposing embedded Neumann boundary conditions over the boundary stencil during the x−, y−, and z− sweeps (Figure 103 4.23). Figure 4.22 shows the geometrical setup of the 3D A6 magnetron, indicating key on/off mesh grid points used by the scheme. (a) on x−sweep (b) on y−sweep (c) on z−sweep Figure 4.22: Key points used for the simulation of 3D A6 magnetron; intersection, boundary, ghost, and interpolation points required to obtain (a) x−, (b) y−, and (c) z− sweeps. 4.4.3.3 Convergence Studies Further, we perform self-refinement studies to evaluate the consistency of our scheme for A6 magnetron with a transparent cathode. We retain the same values of geometrical parameters (rh, rv, ra, rc, α1, α2, and h) and emitters. The simulation evolves to the final 104 (a) t = 1.5625 (b) t = 4.6875 (c) t = 6.25 (d) t = 11.7187 Figure 4.23: Time evolution of fields provided by six point sources in the model of 3D A6 magnetron with a transparent cathode of radius rc = 1.58 and anode of inner radius ra = 2.11, vane radius rv = 4.11 and the coupling horn radius rh = 4.9, the angular width of the vane, α1 = π 9 , and thickness h = 6. 9 , cavity angle, α2 = 2π time T = 2.0 by imposing Neumann boundary conditions along the model. For the time- convergence test, we set a fixed spatial resolution of 1283 spatial points, the discrete L∞ norm of the error is constructed at each time step and maximum error over all time steps is used to graph Figure 4.24 -a) for ∆t = 0.3125 2k , k = 1 to 4. It shows fourth order convergence. (a) time convergence (b) space convergence Figure 4.24: Fourth order (a) time convergence and (b) space convergence of fields provided by six point sources in the model of 3D A6 magnetron with a transparent cathode and the anode of radii rh = 4.9 rv = 4.11, and ra = 2.11 and angular width of the vane, α1 = π 6 . This is a self-refinement study which measures L∞ norm of the error at time T = 2.0 for a fixed spatial step size of ∆x = ∆y = 3.13 × 10−2 6 and the cavity, α2 = π 105 For space-convergence test, we use fixed CFL values 2, 1, and 0.5 where the wave speed c is chosen to be 1. Our L∞ norm error plots of the solution at dimensionless time T = 2.0 with varying resolutions show fourth order accuracy (Figure 4.24 -b)). 4.5 Summary We proposed a fourth-order 3D embedded boundary method and developed a generalized framework for complex geometry domains in 3D. We use the extended stencil for both com- putations of the solution at the interpolation points (8-point stencil) and the ghost points (4- point stencil) to achieve 4th-order convergence in space. And we use the same Lax-Wendroff approach (exchange the time derivative to spatial derivative) to obtain higher-order accuracy in time. We gave a detailed description of the high-order Neumann boundary conditions and its implementation in 1D, 2D, and 3D. We evaluated our scheme for several problems with complex geometries including A6 magnetron in 3D. Since the magnetrons originally designed using metal, our best choice to represent its boundaries should be PEC. Therefore, we discuss the PEC boundary condition in the next chapter. We give descriptions of the derivation of PEC boundary conditions for our MOLT based scheme and its evaluation. 106 CHAPTER 5: Second-Order PEC Boundary Condition 5.1 Introduction In this chapter, we develop an embedded PEC boundary condition for the scheme de- scribed in Chapter 2 and develop an exact simulation for magnetrons by solving the vector potential form of Maxwell’s equations. Since magnetrons are physically designed by metal boundaries and have complex ge- ometries, we need PEC boundary conditions to represent the exact metal boundaries and the embedded boundary method to deal with complex geometries. Hence, we derive an embedded PEC boundary condition based on the electromagnetic vector potential for the simulation of magnetrons in this chapter. We introduce a fast and geometrically flexible approach to calculate the solution to Maxwell’s equations in vector potential form under the Lorenz gauge. As presented, the method is 4th order in time and second-order in space, but the A-stable formulation could be extended to higher order in both time and space. While there is no conceptual limitation to develop this in 3D, our initial work has centered on 2D. The eventual goal is to combine this method with particle methods for the simulations of plasma. In the current work, the scheme is evaluated for EM wave propagation within an object that is bounded by PEC. Vector potential formulations of electromagnetism are widely used in classical and quan- tum physics. Maxwell’s equations describe the time-evolution of four fields: the magnetic flux density (B), the electric field intensity (E), the electric flux density (D), and the magnetic field intensity (H). It is often convenient to employ a formulation based on the magnetic vector potential A and electric scalar potential φ. Maxwell’s equations then reduce to un- coupled wave equations for the vector potential A and scalar potential φ under the Lorenz 107 gauge condition [37]. We use our MOLT based implicit A-stable scheme to solve these wave equations. A Perfect Electric Conductor (PEC) boundary condition is derived based on the conti- nuity of the magnetic flux normal to the boundary and is applied to the magnetic vector potential. Our formulation avoids the additional compatibility conditions used in [34]. This embedded PEC boundary condition is a slightly modified embedded Neumann boundary condition. The fundamental difference is that one of the partial spatial derivatives in the PDE is negative. As presented, the current embedded boundary formulation is second-order in space but could be extended to higher-order. In this chapter, we first derive wave equations for vector and scalar potentials A and φ under the Lorenz gauge condition discussed in section 5.2. Then we describe our two- dimensional high-order implicit scheme using ADI splitting and the higher-order embedded PEC boundary conditions in section 5.3, and finally we give numerical results for several test cases in Section 5.5. 5.2 2D Electric Scalar and Magnetic Vector Potential The macroscopic Maxwell’s equations are: ∂tB = −∇ × E, ∂tD = ∇ × H − J, ∇ · B = 0, ∇ · D = ρ. (5.1) (5.2) (5.3) (5.4) where J is the electric current density, and ρ is the electric charge density. In a linear isotropic medium, D = E and B = µH. The dielectric constant  = 0r with 0 and r being the free space and relative permittivity respectively, and the permeability µ = µ0µr with µ0 and µr being the free space and relative permeability respectively. 108 The electric scalar potential φ and magnetic vector potential A are related to E and B by E = −∇φ − ∂tA, B = ∇ × A. For free space, Ampere’s law (Equation 5.2) is, 1 c2 ∂tE = ∇ × B − µ0J. (5.5) (5.6) (5.7) where the free space phase velocity c = 1√ (µ00). Substituting E and B using equations 5.5 and 5.6 and imposing the Lorenz gauge con- dition 1 c2 ∂tφ = −∇A. a wave equation in terms of the magnetic vector potential A results: t A − ∇2A = µ0J. 1 c2 ∂2 (5.8) Similarly by imposing the Lorenz gauge condition, a wave equation for the scalar potential φ in free space is obtained: t φ − ∇2φ = 1 c2 ∂2 ρ 0 . (5.9) 5.3 Perfect Electric Conductor Boundary The electric field (E) is continuous along the boundary and the magnetic flux (B) is continuous along the normal to the boundary. Since the Perfectly Conducting Boundary (PEC) has infinite electrical conductivity (σ = ∞), there will be no interior electric field (E2 = 0) in the perfect conductor. It also follows that there is no magnetic field (H2 = 0). 109 Figure 5.1: Boundary surface between two regions with electric fields E1 and E2, magnetic field H1 and H2, electric flux densities D1 and D2, and magnetic flux densities B1 and B2 respectively. Here, Js is the surface current, ρs is the surface charge density, and n is the normal vector pointed out from the region two (Perfect Electric Conductor). The boundary conditions for the PEC become: n × E1 = 0, n × H1 = Js, n · B1 = 0, n · D1 = ρs. (5.10) (5.11) (5.12) (5.13) where n is the unit normal vector to the boundary and Js and ρs are the surface current density and charge density respectively. Consider a two-dimensional vector/scalar potential (we are solving 2D problems in this section). If we choose to restrict B to the x-y plane, the vector potential A only has a z component, A = Az(x, y)z. (5.14) Using equation 5.6 and the boundary condition represented by equation 5.12 we get n · (∂yAz − ∂xAz) = 0. (5.15) 110 5.4 Numerical Approximation for PEC Boundary Conditions In this section we discuss enforcing boundary conditions with the above method. For PEC boundary conditions, we extend our previous work on Neumann boundary conditions to PEC (Chapter 4, [13]). Before we move with boundary conditions, let us express the second and fourth order two dimensional schemes, An+1 − 2An + An−1 = −β2C(1) An+1 − 2An + An−1 = −β2C(1) xy [An]. xy [An] − (cid:18) β2D(2) xy − β4 12 (cid:19) C(2) xy C(1) xy [An]. (5.16) (5.17) where superscripts on the operators Cxy and Dxy denote level numbers. On a domain Ω = [xa, xb]× [ya, yb] outflow boundary and PEC conditions can be defined by, 1. PEC: ∂yAz(xa, t) = 0, −∂yAz(xb, t) = 0, −∂xAz(ya, t) = 0, ∂xAz(yb, t) = 0. 2. Outflow boundary condition: ∂tA(xa, t) = c∂xA(xa, t) , ∂tA(xb, t) = −c∂xA(xb, t), ∂tA(ya, t) = c∂yA(ya, t) , ∂tA(yb, t) = −c∂yA(yb, t). Unlike Neumann boundary conditions, the challenge here is that PEC needs y derivatives during the x− sweeps and x derivatives during the y− sweeps. This is challenging for an ADI method because of the separation of directional information. For a rotated square, the PEC boundary condition can use a similar approach as used in the original method for Neumann boundaries in [13] because the directions are coupled at the boundary. Rotating the domain through any angle, say θ, as shown in figure 5.2, we have, 111 1. Left: cos(θ)∂yAz − sin(θ)∂xAAz = 0, 2. Right: −cos(θ)∂yAz + sin(θ)∂xAz = 0, 3. Up: sin(θ)∂yAz − cos(θ)∂xAz = 0, 4. Down: −sin(θ)∂yAz + cos(θ)∂xAz = 0. Figure 5.2: A PEC square cavity rotated by an angle θ with normal vectors nL, nD, nR, and nU along the left, down, right, and up boundaries. In the next sections, we demonstrate how to construct a solution for A given the Dirichlet boundaries, followed by an extension of the fixed point map for Neumann boundaries in [13] to the case of PEC to generate the Dirichlet boundary conditions (including an adaptation for the mesh aligned case). 5.4.1 Using Dirichlet Boundary Conditions for A to Capture PEC Boundary In our embedded boundary method for Neumann, we constructed an efficient conver- gent Neumann to Dirichlet Map, converting a boundary condition on the outward normal derivative of a function into a condition that constructs ghost points outside the domain. The method is designed such that the generated ghost points enforce the desired Neumann 112 condition on the boundary. In the next two sections, we adapt this idea to PEC, which is a condition on the tangential derivative. We now show how to carry out the sweeps with given values at the end of each line of the 4th order ADI method. In our 2D scenario, to construct the operator L−1 to build C(1), C(2) and D(2), each line is treated as a 1D problem, where the lines are evaluated with x− sweeps and y− sweeps respectively [13]. We note that the operators that make up D(k) and C(k) have their own boundary conditions to enforce at level k. x and L−1 y The objective is given the boundary condition, solve for each ax and bx along the x− direction and ay and by along the y− direction. We will denote A(ti, xa) = AL(ti), A(ti, xb) = AR(ti), A(ti, ya) = AD(ti) and A(ti, yb) = AU (ti). Here, [xa, xb] and [ya, yb] are horizontal and vertical lines that we make sweeps along. These permit the boundary conditions to be incorporated into the method. As we did for wave solvers in our early work, [15], taking AL(ti) and AR(ti) as fixed, for component of the second order term C(1) along the line ∂Ω = [xa, xb], we arrive at the L−1 x two equations in two unknowns: AL(tn+1) + (β2 − 2)AL(tn) + AL(tn−1) I AL(tn) + (xa) + ax − Mxbx , AR(tn+1)(β2 − 2)AR(tn) + AR(tn−1) I AR(tn) + (xb) + Mxax + bx . = β2(cid:16) = β2(cid:16) (cid:104) (cid:104) µ0 α2 Jn(cid:105) α2 Jn(cid:105) µ0 (cid:17) (cid:17) where Mx = e−α(xb−xa). We can rearrange the linear system into unknown and known values, ax + Mxbx = wP a , Mxax + bx = wP b . 113 Solving the linear system for the unknowns ax and bx gives ax = (wP a − MxwP b ) (1 − M 2 x ) , bx = (wP b − MxwP a ) (1 − M 2 x ) , (5.18) where (cid:16) (cid:16) wP a = wP b = 1 β2 1 β2 AL(tn+1) + (β2 − 2)AL(tn) + AL(tn−1) AR(tn+1) + (β2 − 2)AR(tn) + AR(tn−1) (cid:104) (cid:17) − I (cid:17) − I (cid:104) µ0 α2 Jn(cid:105) α2 Jn(cid:105) µ0 AL(tn) + (xa), AR(tn) + (xb). Taking AD(ti) and AU (ti) as fixed for L−1 y of the second order term C(1) along the line ∂Ω = [ya, yb], we arrive at two equations in two unknowns for ay and by: AD(tn+1)(β2 − 2)AD(tn) + AD(tn−1) = β2(cid:16) AU (tn+1)(β2 − 2)AU (tn) + AU (tn−1) = β2(cid:16) (cid:104) (cid:104) µ0 α2 Jn(cid:105) α2 Jn(cid:105) µ0 I AD + (ya) + ay + Myby I AU + (yb) + Myay + by (cid:17) (cid:17) . , (5.19) (5.20) where My = e−α(yb−ya). Solving the linear system for the unknowns ay and by gives ay = (wP a − MywP b ) (1 − M 2 y ) , by = (wP b − MywP a ) (1 − M 2 y ) , where wP a = wP b = 1 β2 1 β2 (cid:16) (cid:16) AD(tn+1) + (β2 − 2)AD(tn) + AD(tn−1) AU (tn+1) + (β2 − 2)AU (tn) + AU (tn−1) (cid:17) − I (cid:104) (cid:17) − I (cid:104) AD(tn) + AU (tn) + µ0 α2 Jn(cid:105) α2 Jn(cid:105) µ0 (ya) (yb). For the operator C(2) + D(2) acting on C(1)[An], which is the higher order correction, one can obtain similar equations, except that the values of AL(ti), AR(ti), AD(ti) and AU (ti) at the boundaries are explicitly zero. This is due to linearity and the leading order operator satisfies the boundary condition exactly. 114 5.4.2 Embedded Boundary Method, An effective Neumann to Dirichlet Map for Creating Ghost Points Given that we know how to solve for the Dirichlet boundary conditions for equations 5.16 and 5.17, as in [14], we would like to extend this process to the PEC case. As discussed in [13], the reason to employ an embedded boundary approach for a differential boundary condition has to do with stability. The difference here is we need to develop an iterative map to find the homogeneous coefficients a and b for each piece of the operator instead of directly computing these coefficients. Our embedded boundary method is based on our initial work in [13]. In this paper, we developed a second-order embedded boundary method for Neumann boundary conditions which was second-order accurate in time and space. Using the same ideas, we extended this method to PEC boundary conditions for the vector potential in 2D. However, in [13], direct application through the ADI of the operators is more direct. Extending this work to fourth-order involves developing an approach to solving for the boundary conditions for the operators C(1), C(2), and D(2). We use an iterative method to obtain accurate values at the boundaries using an initial approximation and correct it by imposing our PEC boundary condition through a fixed point iteration. The fixed point iteration is a multi-step process that converts the derivative condition along the boundary to a Dirichlet boundary condition, to be set at the ghost point. These Dirichlet boundary values force the solution to satisfy the PEC condition at the boundary to within the tolerance of iteration of the fixed point method. It should be noted that the iteration is local at the boundary and does not involve re- computing the interior sweeps, making this update cost-effective. To understand the method, we start by considering a collection of uniform points with a boundary passing through, see figure 5.3. We define the ghost points to be the collection of points that are greater than one half of a grid spacing away, but less than two and a half grid spacings away, from the boundary. As in our work in [13], starting from the 115 Figure 5.3: Geometry of 2D embedded PEC boundary method with a boundary point (xB, yB), ghost point (xG, yG), and interpolation points (xI, yI), (xII, yII), and (xIII, yIII). ghost point we define a normal to the surface, along which we will enforce the boundary condition. We define the fixed point method along this normal. In figure 5.3, the ghost point being considered is the red grid point labeled (xG, yG), the blue point labeled (xB, yB) is the boundary point and there are three interior points needed that are related to the normal, at (xI, yI), (xII, yII), and (xIII, yIII). Their distances from the boundary point 2∆sI , ξII = |(xII, yII) − (xB, yB)| = (xB, yB) are ξI = |(xI, yI) − (xB, yB)| = and ξIII = |(xIII, yIII) − (xB, yB)| = 2∆sI, where we will typically take ∆sI = √ 2∆sI, √ 2∆x. √ The fixed point method starts by assuming we know AI, AII, AIII, and ∂T AB, which is taken as a solution at {(xI, yI), (xII, yII) and (xIII, yIII)} and the tangential derivative of a solution at (xB, yB). Here we enforce ∂T AB = 0 by making ∂T P|(xB,yB) = 0 one of the conditions we use to solve for the coefficients of the Hermite-Birkhoff interpolating polynomial, P (x, y) = c0 +c1x+c2y +c3xy. Enforcing that the Hermite-Birkhoff interpolates these points and explicitly solving for AG gives: Al+1 G = ΓIAl I + ΓIIAl II + ΓIIIAl III + O(∆x2), (5.21) 116 where, where, ΓI = 1 + γ3(γ4 + γ5) − γ6(γ1 + γ2) (γ4γ2 − γ5γ1) , ΓII = ΓIII = , (γ6γ2 + γ3γ5) (γ4γ2 − γ5γ1) (γ6γ1 + γ3γ4) (γ4γ2 − γ5γ1) , γ1 = xII − xI + yII − yI, γ2 = xI − xIII + yI − yIII, γ3 = xG − xI + yG − yI, γ4 = (xII − xI)(xB − yB) + xIyI, γ5 = (xI − xIII)(xB − yB) + xIIIyIII, γ6 = (xB − yB)(xG − x1) + xG(yG − y1). (5.22) (5.23) We have introduced additional notation of l + 1 and l, which designate how we will set up a fixed point iteration using the Hermite-Birkhoff interpolant. The iteration itself is a simple first order fixed point method. Let w(l) and w(l+1) be the solutions at the lth and (l + 1)th iterations. We choose a tolerance tol, and maximum number of allowed iterations mit, and define a stopping criterion as |w(l+1) − w(l)|∞ < tol OR nit > mit where nit is the current iteration number. We will now discuss the quasi local process for obtaining w(l+1) given the rest of the information. For each term in the operator on the right hand side of the second order solution given by equation 5.16, or fourth order in time solution given by equation 5.17, we must identify the correct ghost point that allows that term to guarantee that the tangential derivative is zero along the boundary. Taking the C(1)[An] operator and expanding the operator out we 117 x [An] + L−1 have C(1)[An] = L−1 y [An] − L−1 identifies the ghost point for wx = L−1 L−1 y [wx] such that when solving for the boundary correction terms the operator satisfies ∂T wx|∂Ω = 0, ∂T wy|∂Ω = 0, ∂T wxy|∂Ω = 0, ∂T wyx|∂Ω = 0. x L−1 y [An], wxy = L−1 y L−1 x [An] − L−1 x [An], wy = L−1 y [An]. The fixed point iteration x [wy], and wyx = Let us consider wx and wyx = L−1 y [wx], knowing that wy and wxy are similar. The iterative process for wx starts by making an initial guess at the direct boundary values wx|∂Ω, using an extrapolate in time wx n−2 at each boundary n+1,(0) ≈ 3wx n−1 + wx n − 3wx point (Algorithm 20). Algorithm 20 Compute wx(= L−1 x [A]) 1: Compute the interior sweep (particular solution) I n+1 n − 3wx 2: Initial guess wx 3: Initialize the iteration counter nit = 0 4: repeat 5: 6: n+1(0) ≈ 3wx for k = 1 to ny do n−1 + wx n−2 x = Ix[An] Compute wxI(k), and wxII(k), at the interpolation points (xI(k), y(k)), and (xII(k), y(k)) and as well as at the interpolation points along the right boundary using bilinear interpolation Compute solution at the ghost points (xaG(k), y(k)) and (xbG(k), y(k)) using the Hermite-Birkhoff interpolant Compute homogeneous coefficients a1(k) and b1(k) using the fact wx Compute solution and update boundary stencil wx n+1(l) = I n+1 + ax(k)e−α(x−xaG(k)) + bx(k)e−α(xbG(k)−x) n+1(l+1) = I n+1 + ax(k)e−α(x−xaG(k)) + bx(k)e−α(xbG(k)−x) 7: 8: 9: 10: 11: end for nit = nit + 1 12: until (|wx n+1(l+1) − wx n+1(l)| < tol OR nit > mit) Given the update on wx, the next step is to consider wyx = L−1 y [wx]. The process starts with the initial guess wx n+1,(0) ≈ 3An − 3An−1 + An−2 at the boundary (Algorithm 21). The same process is done for wy and wxy. The Hermite-Birkhoff interpolate enforces that the tangential derivative is zero for wx, wy, wxy and wyx such that C(1)[An] satisfies the tangential derivative condition on ∂Ω to within tolerance. For the fourth order formulation, we compute the boundary conditions for C(1)[An] and then we repeat the process for C(2) and D(2) acting on C(1)[An]. 118 y [wx]) Algorithm 21 Compute wyx(= L−1 1: Compute the interior sweep (particular solution) I n+1 2: Initial guess wyx 3: Initialize the iteration counter nit = 0 4: repeat 5: 6: n+1(0) ≈ 3An − 3An−1 + An−2 for k = 1 to nx do y = Iy[wx n] Compute wyxI(k), and wyxII(k), at the interpolation points (x(k), yI(k), and (x(k), yII(k)) and as well as at the interpolation points along the right boundary using bilinear interpolation Compute solution at the ghost points (x(k), yaG(k)) and (x(k), ybG(k)) using the Hermite-Birkhoff interpolant Compute homogeneous coefficients a1(k) and b1(k) using the fact wyx Compute solution and update boundary stencil wyx n+1(l) = I n+1 + ay(k)e−α(y−yaG(k)) + by(k)e−α(ybG(k)−y) n+1(l+1) = I n+1 + ay(k)e−α(y−yaG(k)) + by(k)e−α(ybG(k)−y) 7: 8: 9: 10: 11: end for nit = nit + 1 12: until (|wyx n+1(l+1) − wyx n+1(l)| < tol OR nit > mit) Further as detailed in [13] we need to include an artificial dissipation term to maintain stability for these embedded boundary domains. Thus we have (cid:18) β2D(2) xy − β4 12 C(2) xy (cid:19) An+1 − 2An + An−1 = −β2C(1) xy [An] − C(1) xy [An] + C(3) xy [An−1]. where  is an artificial dissipation coefficient that satisfies 0 <  < 1. The value of C(3) at the previous time step can be used in place of C(3) we need to go to one more computing level to obtain C(3) xy [An] xy [An−1] at the current time step, and xy [An] [68]. We obtain AI, AII, and AIII approximately, using the bilinear interpolation. Suppose the interpolation point AI lies in a rectangular cell with corners (xi, yj), (xi+1, yj), (xi+1, yj+1), and (xi, yj+1). Then we have the following approximation for AI AI =w1Ai,j + w2Ai+1,j + w3Ai+1,j+1 + w4Ai,j+1, (5.24) 119 where w1 = w3 = (xi+1 − xI)(yj+1 − yI) (xI − xi)(yI − yj) ∆x∆y ∆x∆y , , w2 = w4 = (xI − xi)(yj+1 − yI) (xi+1 − xI)(yI − yj) ∆x∆y ∆x∆y , . (5.25) 5.5 Experimental Results 5.5.1 Square Cavity Rotated Through Different Angles For this experiment, we chose a square cavity bounded by a PEC, placed a point source at the center of the cavity, and performed a ping test and mode analysis. First, we computed the vector potential A at a point inside the box using our scheme, then computed the derived frequency by taking FFT over the time history of the measured A, and finally analyzed the frequency modes for varying CFL value, step size, and rotation angle. Further, we have transformed into physical units here in contrast to the normalized units we have used up to this point. We chose a square box 21cm × 21cm with PEC boundaries in a square domain (Ω = [−21cm, 21cm]2), placed a point source 1 at the center of the box,(0, 0), and turned it on for a single time step t = ∆tns. The vector potential is measured at the point (3.36cm, 3.36cm) for the time period t = [∆tns, T ns]. The derived frequencies for different CFL values (0.5, 1.0, 2.0), rotation angles (0o, 31.42o, 45o), and resolutions are summarized in tables 5.5.1 and 5.5.1. Table 5.5.1 consists of frequencies obtained for the CFL value 1. Here, we set the wave speed c = 30 Gcms−1, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. Figure 5.4 shows the frequency distribution for θ = 31.42o which consists the 1-1 strong mode fundamental frequency 1GHz as expected and Figure 5.5 shows a frequency mode computed for different resolutions with CFL value 1. We see clear convergence to the analytically computed 1-1 mode fundamental frequency 1GHz. For the fixed CFL and wave speed, time step size ∆t reduces with reducing spatial step size ∆x. Hence, the amplitude 120 is reducing by the decreasing spatial step size. Figure 5.4: Frequency distribution for 31.42o rotated 21cm × 21cm square cavity computed using the measured vector potential at the point (3.36cm, 3.36cm) for the impulse response, h = 0.084 cm Figure 5.5: A strong fundamental mode for 31.42o rotated 21cm × 21cm square cavity com- puted for different resolutions. 121 (a) 1-3 mode, f = 2.2588GHz (b) 3-3 mode, f = 3.0305GHz (c) 1-5 mode, f = 3.6422GHz (d) 3-5 mode, f = 4.1650GHz (e) 5-5 mode, f = 5.0508GHz Figure 5.6: Convergence plots for the natural modes (a) 1-3, (b) 3-3, (c) 1-5, (d) 3-5, and (e) 5-5 with analytically computed frequencies (a) f = 2.2588GHz, (b) f = 3.0305GHz, (c) f = 3.6422GHz, (d) f = 4.1650GHz, and (e) f = 5.0508GHz for 31.42o rotated 21cm× 21cm square cavity computed for different resolutions. 122 Rotated angle, θ Grid size, N CF L = 0.5, T = 50ns CF L = 2, T = 200ns 45o 31.42o 31.42o 45o 0o 0o 502 1002 1502 2002 2502 0.98 1.00 1.00 1.00 1.00 0.99 1.00 1.00 1.00 1.00 0.96 0.98 0.98 1.00 1.00 0.98 1.00 0.99 1.00 1.00 0.98 0.99 1.00 1.00 1.00 0.96 0.98 0.98 0.99 1.00 Table 5.1: Frequency (in GHz) obtained at the point (3.36cm, 3.36cm) using the ping test. Rotated angle, θ Grid size, N 0o 31.42o 45o 502 1002 1502 2002 2502 9.599802E-1 9.899800E-1 9.899800E-1 1.000015E+0 1.000008E+0 9.899800E-1 9.999800E-1 9.999800E-1 1.000115E+0 1.000108E+0 9.899800E-1 9.999800E-1 9.999800E-1 1.000015E+0 1.000008E+0 Table 5.2: Frequency (in GHz) obtained at the point (3.36cm, 3.36cm) using the ping test for CFL 1. Further, we evaluated the frequencies of the normal modes 1-3, 3-3, 1-5, 3-5, and 5-5. Our convergence plots (Figure 5.6) show the clear convergence to the analytically computed frequencies 2.2588GHz, 3.0305GHz, 3.6422GHz, 4.1650GHz, and 5.0508GHz for the modes 1-3, 3-3, 1-5, 3-5, and 5-5 respectively. Figure 5.7 shows the time evolution of the potential A generated by a point source sin(2πf t) with f = 1GHz placed at the center of the box. We chose the same domain as used in the previous experiment and set the grid size to be 100×100, time step size ∆t = 7ps. 5.5.1.1 Convergence Studies and Error Analysis We evaluate the consistency of our scheme via (1) the ping test and (2) space-time convergence studies for the magnetic vector potential A in the PEC square cavity. First, we perform a convergence study given the initial condition sin(cid:0) mπx (cid:1) sin(cid:0) nπy (cid:1) with L H frequency mode 3-2 (m = 3, n = 2) over a square domain [0cm, 21cm]2 and compare our 123 (a) θ = 0o, t = 0.637ns (b) θ = 31.42o, t = 0.637ns (c) θ = 45o, t = 0.637ns (d) θ = 0o, t = 1.176ns (e) θ = 31.42o, t = 1.176ns (f) θ = 45o, t = 1.176ns Figure 5.7: Time evolution of a point source field sin(2πf t) with f = 1GHz placed at the center of a PEC square box of grid size 100×100 rotated by the angles 0o, 31.42o, 45o, time step size ∆t = 7ps. numerical solution to the exact solution given by, A(t, x, y) = cos c ( mπ L )2 + ( nπ H (cid:18) (cid:114) (cid:19) )2t sin (cid:16)mπx (cid:17) L (cid:16)nπy (cid:17) H sin , where the wave speed c is chosen to be 30Gcms−1. Our L2 norm error plots of the solution at time T = 1.0ns with varying resolutions and a fixed CFL( = 1) for the cases of mesh aligned and nonaligned (rotated by 45o) square cavities show second order accuracy (Figure 5.8 - (a)). We also note that the tangential derivative along the boundary converges with 124 third order accuracy (Figure 5.8 - (b)). (a) Space convergence of solution (b) Space convergence of boundary derivative Figure 5.8: Convergence plots for a) space using entire solution and b) space using boundary derivative on a PEC square domain [0cm, 21cm]2. This study measures L2 norm of the error at time T = 1.0ns compared with the analytical solution, for the cases of mesh aligned and nonaligned (rotated by 45o). For the second evaluation, we compute the error with our fundamental frequency com- putation using the ping test explained above and summarize it in Table 5.5.1.1. The error is the difference between the frequencies (1-1 mode) obtained analytically (= 1GHz) and numerically using our scheme. Rotated angle θ Grid size, N CF L = 0.5, T = 50ns 31.42o 0o CF L = 2, T = 200ns 31.42o 0o 502 1002 1502 2002 2502 1.001980E-2 1.999960E-5 1.999960E-5 1.999960E-5 8.0000064E-6 4.501910E-2 2.001960E-2 2.001960E-2 1.500023E-5 8.000064E-6 1.501970E-2 5.019900E-3 1.999960E-3 5.015075E-4 5.0080040E-5 4.501910E-2 2.001960E-2 1.501970E-2 4.985075E-4 8.000064E-6 Table 5.3: Numerical error in the frequency computation using the ping test at the point (3.36cm, 3.36cm). Finally, we perform a self-refinement study for time and space convergence using a point source. Figure 5.9 shows the time and space convergence plots for the cases of mesh aligned and nonaligned (rotated by 45o) square cavities with the same configuration as used in 125 the above experiment done for the time evolution of a point source field. For the time convergence test, we maintain the resolution at 320 × 320 and reduce the time step size ∆t = 10.9ps to 0.3418ps. For the space convergence test, we maintain the CFL value as 1 and reduce the spatial step size ∆x = 1.313cm to 0.082cm. We obtained fourth-order convergence in time and second-order in space. (a) Time convergence (b) Space convergence Figure 5.9: Convergence plots for (a) time and (b) space on a PEC a square domain [0cm, 21cm]2. This study measures L∞ norm of the error at time T = 1.0ns compared with the analytical solution, for the cases of mesh aligned and non- aligned (rotated by 45o). 5.5.2 Square Cavity with a leak (diffraction Q) The Q factor (quality factor) of a resonator is a measure of the damping, with a lower Q indicating higher damping [33]. It measures the harmonic losses of the oscillations. The diffraction Q factor is the external Q factor which characterized the energy loss due to radiation. The value of Q is infinite for a closed cavity and finite for an open cavity and it can be used to determine the amount of energy loss. The diffraction Q can be expressed as, Q = −πf (t2 − t1) ln( A1t1 A2t2 ) where A1 and A2 are the vector potentials at time t1 and t2. 126 In the second experiment, an open boundary is placed along the center of the right edge of the square cavity, and the diffraction quality factor Q is obtained for an oscillating point source (sin(2πf t), f = 1GHz) placed at the center. Every configuration was the same as above, except for imposing an outflow boundary condition along the open boundary and keeping the Gaussian pulse running for the entire time period [∆tns, T ns]. The vector potential A is measured at the point (3.36cm, 3.36cm) and the computation of Q is done for aligned and nonaligned mesh cases. For non aligned meshes, we rotate the square by an angle θ = 45o. Analytic calculation of the quality factor treating an open cavity like a transmission line with a load simulating free space (377Ω) gives a value of Q=24 [36]. The Q values obtained from our scheme (Q = 26.5768 for θ = 0o and Q = 25.7875 for θ = 31.420) are very close to the analytically obtained value. Figure 5.10 shows the time evaluation of vector potential A at the point (3.36cm, 3.36cm) for both cases. (a) θ = 0o, Q = 26.5768 (b) θ = 31.420, Q = 25.7875 Figure 5.10: Time evaluation of A in (a) mesh aligned and (b) θ = 31.420 rotated square cavities with a leak imposed by outflow boundary condition and diffraction Q which is obtained for an oscillating point source (sin(2πf t), f = 1GHz) 127 5.5.3 A6 Magnetron We chose a 2D A6 magnetron to test the applicability of our PEC scheme to complicated geometry. For this experiment, a 2D domain of Ω = [0cm, 5cm]2 is chosen with radius of the vane resonators rv = 4.11cm, anode radius ra = 2.11cm, cathode radius rc = 1.58cm, angular width of vane, 20o, and cavity angle, 40o. We set the grid size to be 128 × 128, time step size ∆t = 39ps, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. The embedded PEC boundary condition is imposed over the boundary stencil during the x− and y− sweeps. This experiment was conducted for frequency mode analysis. A point source was placed in the center of the A6M and the frequency distribution examined using a ping test at the point (1.4063cm, 0.8203cm). We obtained six strong frequency modes as shown in figure 5.11, associated with the first two passbands, clearly showing the effects of all six symmetric resonances. Figure 5.11: Frequency spectrum of 2D A6M with vane resonators rv = 4.11cm, anode radius ra = 2.11cm, cathode radius rc = 1.58cm, angular width of vane, 20o, and cavity angle, 40o, the grid size 128 × 128, time step size ∆t = 39ps, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. We further simulated the A6 magnetron with a transparent cathode (5.12) which is 128 mimicked by using six emitters/point sources and calculated the time evolution for the same configuration as above. The formulation maintains symmetry and high accuracy for relatively coarse-grained solutions. For example, in the A6 magnetron results, there are only 12 points across the neck of the magnetron vanes. (a) t = 1.23ns (b) t = 3.01ns (c) t = 5.68ns (d) t = 8.81ns (e) t = 11.95ns (f) t = 17.87ns Figure 5.12: Evolution of the transparent cathode A6M with vane resonators rv = 4.11cm, anode radius ra = 2.11cm, cathode radius rc = 1.58cm, angular width of vane, 20o, and cavity angle, 40o, the grid size 128 × 128, time step size ∆t = 39ps, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. 5.5.4 Rising Sun Magnetrons We chose Rising Sun magnetrons with 12 and 18 cavities for further evaluation of our scheme to complicated geometry. Figure shows the 2D view of the 18 cavity rising sun 129 magnetron (AX9) with the cathode of radius rc and the anode with inner radius ra, short vane radius rvs and long vane radius rvl. For this experiment, a 2D domain of Ω = [0cm, 5cm]2 is chosen with radius of the wider and shorter vane resonators rvs = 4.11cm, and rvl = 4.91cm respectively, anode radius ra = 2.11cm, cathode radius rc = 1.58cm. For the 12 cavity rising sun magnetron , we chose the angular width of vane, 12o, and cavity angle, 18o. For the 18 cavity rising sun magnetron , we chose the angular width of vane, 12o, and cavity angle, 8o. We set the grid size to be 128 × 128, time step size ∆t = 39ps, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. The embedded PEC boundary condition is imposed over the boundary stencil during the x− and y− sweeps. Figure 5.13: 2D view of the 18 cavity rising sun magnetron (AX9) with a cathode of radius rc and anode of inner radius ra, short and long cavity radii rvh and rvl. We simulated the rising sun magnetrons with transparent cathodes ( Figures 5.14 and 5.15 ) which are mimicked by using 12 and 18 emitters/point sources for 12 and 18 cavity rising sun magnetrons respectively and calculated the time evolution for the same configuration as above. The formulation maintains symmetry and high accuracy for relatively coarse-grained solutions. 130 (a) t = 0.72s (b) t = 0.72s (c) t = 5.68s (d) t = 11.86s (e) t = 5.68s (f) t = 11.86s Figure 5.14: Evolution of the transparent cathode 12 cavity rising sun magnetron with vane resonators rvs = 4.11cm and rvl = 4.91cm, anode radius ra = 2.11cm, cathode radius rc = 1.58cm, angular width of vane, 12o, and cavity angle, 18o, the grid size 128 × 128, time step size ∆t = 39ps, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. 131 (a) t = 0.81s (b) t = 0.81s (c) t = 2.05s (d) t = 5.72s (e) t = 5.72s (f) t = 5.72s Figure 5.15: Evolution of the transparent cathode 18 cavity rising sun magnetron (AX9) with vane resonators rvs = 4.11cm and rvl = 4.91cm, anode radius ra = 2.11cm, cathode radius rc = 1.58cm, angular width of vane, 8o, and cavity angle, 12o, the grid size 128 × 128, time step size ∆t = 39ps, averaging parameter β = 1.4 and dissipation coefficient  = 0.1. 5.6 Summary We derived wave equations for vector and scalar potentials A and φ under the Lorenz gauge condition and obtained the solution for the vector potentials based wave equation subject to PEC boundary condition. We explained the embedded PEC boundary conditions in 2D and the consistency and performance of the scheme are confirmed using the ping test and frequency mode analysis for rotated square cavities. We then demonstrate the 132 diffraction Q value test and the use of this method for simulating an A6 magnetron. We obtained strong six resonance modes for the impulse response of A6 magnetron as expected. In the next chapter, we describe our scalable software solution MOLTN in detail. 133 CHAPTER 6: Software Solution and Code Acceleration 6.1 Introduction In this chapter, we describe our new open-source code development work and give scala- bility studies for the OpenMP and GP-GPU CUDA versions. The exciting EM software solutions in plasma science are developed based on the FDTD- PIC method [51; 67; 32; 70] and they have limitations as described previously in section 1.1. We have proved that our MOLT based scheme can overcome those limitations. Hence, we are motivated to develop a software solution to bring our scheme to the scientific commu- nity. We are developing an open source code MOLTN (Method Of Line Transpose based Nth arbitrary order scheme) which is intended to be an architecture-independent, scalable software tool, using MPI, OpenMP, and as well as GPU CUDA implementation. We are working with the templated C++ library Kokkos [24] at the lowest levels of the code to achieve this goal. In this chapter, we give the design of MOLTN software and scalability studies for the OpenMP version in section 6.2 and the parallel design and scalability studies for the GP-GPU CUDA version in section 6.3. 6.2 Multi-core OpenMP, Multi-node MPI version As an initial work, we developed the code in C++ using multi-core, shared memory OpenMP and multi-node, distributed memory MPI. For this implementation, we first de- compose the domain as a set of subdomains and assign each sub-domain per node. We use OpenMP pragmas for the data parallelization within the nodes and each node communicates 134 with each other using the MPI library. In our scheme, internal sweeps are carried out on each node and boundary information is passed between the adjacent nodes during the left and right oriented sweeps for the fast convolution integration (as described in the section 2.2.2). We give an overview of the structure of the MOLTN in the following section, 6.2.1 Data Structures and Data Flows As we have seen, we know the kernel of this scheme is the O(n) one-dimensional recursive fast convolution to obtain the particular solution and the homogeneous solution ( section 2.2). Thus, we defined the MOLT kernel using a C++ object to implement this algorithm with subroutines to build quadrature weights, perform left and right oriented sweeps, and apply boundary conditions. We defined another major object, MOLT mesh, to represent the domain with every relevant parameter. The MOLT mesh acts as the main storage place of our implementation, which consists of a set of one-dimensional C++ pointer arrays. The controller object defined in MOLTN is the MOLT time-step which receives the user- defined object and parameters, builds MOLT mesh and performs needed computation and communication. We use node-core hybrid parallelism using MPI and OpenMP, thus initially, the whole domain has to be partitioned and distributed over the HPC cluster nodes. Then computing tasks have to be performed and the resultant relevant data must be transferred between the nodes. The MOLT time-step performs this task by invoking appropriate routines defined in the MOLTN library API that includes MOLT kernel, MOLT mesh, and MOLT time-step. So the application class can include these library files to carry out its work. The application of these libraries can be extended beyond the wave solver such as advection-diffusion, magnetohydrodynamic (MHD), and fluid dynamics. 135 6.2.2 Domain Decomposition Here we present our domain decomposition strategy ([64]). Consider the L−1(u) operator: M (u) = u(y)e−α|x−y|dy + A0e−α(x−a) + B0e−α(b−x). a We seek to partition this expression for subdomains, so we define ci such that c0 < ... < ci < ... < cN for some N , where c0 = a and cN = b, and thus for ci < x < ci+1, Mi[u](x) = u(y)e−α|x−y|dy + Aie−α(x−ci) + Bie−α(ci+1−x). This expression can be rewritten in terms of the left and right sweep operators ci+1(cid:90) Mi[u](x) =e−α(x−ci) u(y)eα(y−ci)dy + e−α(ci+1−x) u(y)eα(ci+1−y)dy ci x + Aie−α(x−ci) + Bie−α(ci+1−x). We evaluate this expression at x = ci: Mi[u](ci) =e−α(ci+1−ci) u(y)eα(ci+1−y)dy + Ai + Bie−α(ci+1−ci), (6.1) (6.2) b(cid:90) ci+1(cid:90) ci x(cid:90) This is computed in the following way: we define ci+1(cid:90) ci x(cid:90) b(cid:90) a I L[u](x) = I R[u](x) = x(cid:90) b(cid:90) a u(y)e−α|x−y|dy = e−αx u(y)eαydy u(y)e−α|x−y|dy = eαx u(y)e−αydy x x 136 So then M (u) = I L[u](x) + I R[u](x) + Ae−α(x−a) + Be−α(b−x) Suppose we decompose the domain [a, b] to three sub domains 1,2, and 3 with pseudo boundaries c0, c1, c2, and c3 as shown in Figure 6.1, we can now apply the above expression for the each sub domain, Figure 6.1: Domain Decomposition of the domain [a, b] to three 1D sub domains [c0, c1], j ) sweeps over [c1, c2],and [c2, c3], each evolves own internal left (I L the domain of length ∆cj. j ) and right (I R A1e−α(x−a) + B1e−α(c1−x) − A0e−α(x−a) − B0e−α(b−x) = e−α(c1−x)I R 2 [u](c1) + e−α(c2−x)I R 3 [u](c2) A2e−α(x−c1) + B2e−α(c2−x) − A0e−α(x−a) − B0e−α(b−x) = e−α(x−c1)I L 2 [u](c1) + e−α(c2−x)I R 3 [u](c2) A3e−α(x−c2) + B3e−α(b−x) − A0e−α(x−a) − B0e−α(b−x) = e−α(x−c1)I L 1 [u](c1) + e−α(x−c2)I L 2 [u](c2) 137 We define ∆c1 = a − c1, ∆c2 = c2 − c1, and ∆c3 = b − c2. We now write: A1e−α(x−a) + B1e−α(c1−x) − A0e−α(x−a) − B0e−α(c1−x)e−α∆c2e−α∆c3 = e−α(c1−x)I R 2 [u](c1) + e−α(c1−x)e−α∆c2I R 3 [u](c2) A2e−α(x−c1) + B2e−α(c2−x) − A0e−α∆c1e−α(x−c1) − B0e−α∆c3e−α(c2−x) = e−α(x−c1)I L 2 [u](c1) + e−α(c2−x)I R 3 [u](c2) A3e−α(x−c2) + B3e−α(b−x) − A0e−α∆c1e−α∆c2e−α(x−c2) − B0e−α(b−x) = e−α(x−c2)e−α∆c2I L 1 [u](c1) + e−α(x−c2)I L 2 [u](c2) From the equations 6.3, 6.4, and 6.5 we can discern (6.3) (6.4) (6.5) 2 [u](c1) + e−α∆c2I R 3 [u](c2) A1 = A0 B1 = B0e−α∆c2e−α∆c3 + I R A2 = A0e−α∆c1 + I L B2 = B0e−α∆c3 + I R A3 = A0e−α∆c1e−α∆c2 + e−α∆c2I L 3 [u](c2) 2 [u](c1) 1 [u](c1) + I L 2 [u](c2) B3 = B0 We choose ∆t such that e−α∆ci = 0 for i = 1...N , hence, A1 = A0, B1 = I R 2 [u](c1), A2 = I L 2 [u](c1), B2 = I R 3 [u](c2), A3 = I L 2 [u](c2), B3 = B0. 138 6.2.3 Performance Analysis As an initial evaluation of MOLTN, we tested it using a single node multi-core platform. We chose a node with 28 cores which facilitated with Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz processor and 492 GB memory to examine our OpenMP parallel MOLTN code with- out using any accelerators. This strong scaling study shows nearly linear speedup (in Figure 6.3) and approximately 85% efficiency ( in Figure 6.4). We observe less speedup/efficiency for a small amount of work due to the leading effect of communication cost including com- munication overhead. Speedup and efficiency reduce with increasing core counts due to memory bounds.Hence, it shows positive signs for the scalability of our scheme, but we need to prove the scalability of the hybrid version and also have to work with the performance enhancement using the accelerators such as NVIDIA GPU and Intel SIMD vectorization. We believe that designing an algorithm based on the architecture is the best practice to achieve high performance, and focus to work deeply with lower-level code optimization technologies, NVIDIA warp primitive-level programming and SIMD vectorization intrinsics. Figure 6.2: Log-log plot for wall time using 28 core Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz processor, for the grid size N 2, N = 64, 128, 256, 1024, 2048. 139 Figure 6.3: Log-log plot for speedup in using 28 core Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz processor for the grid size N 2, N = 64, 128, 256, 512, 1024, 2048. Figure 6.4: Log-log plot for efficiency using 28 core Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz processor for the grid size N 2, N = 64, 128, 256, 512, 1024, 2048. 140 6.3 GPGPU CUDA version In this section, we describe our initial work of GPGPU CUDA implementation of our scheme and performance evaluation. 6.3.1 Task Organization For this implementation, we chose the two-dimensional second-order scheme, un+1 − 2un + un−1 = −β2C(1) xy [un] where, Cxy = (L−1 y + L−1 x ) − (L−1 y L−1 x + L−1 x L−1 y ) (6.6) Based on the structure of the scheme, the computation of the solution un+1 at the time step tn+1 can be obtained through four parallel stages using task paralleling, 1. Perform the first set of x- and y- sweeps: Compute wx = L−1 x [u] and wy = L−1 y [u] concurrently, 2. Perform the second set of x- and y- sweeps: Compute wyx = L−1 x [wx] and wxy = L−1 y [wy] concurrently, 3. Obtain Cxy: Compute Cxy1 = wx − wyx and Cxy2 = wy − wxy concurrently, 4. Obtain un+1: Compute un+1 1 = 2un − β2Cxy1 and un+1 2 = −un−1 − β2Cxy2 concurrently, and finally obtain un+1 = un+1 1 + un+1 2 . We choose two CUDA streams to go through these stages, Figure 6.5 shows the processing flow of the two streams. 141 Figure 6.5: Task parallelism for the computation of un+1 based on Cxy and parallel stages executing through CUDA streams. 6.3.2 Performance Analysis We evaluated the scalability and performance of the scheme using NVIDIA Tesla K20 and K80 GPU accelerators. Tesla K20 (2880 CUDA cores) and K80 (4992 CUDA cores) GPUs were compared with a CPU, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz processor. This study shows the nearly 45X and 35X speedup for Tesla K80 and K20 respectively with full utilized CUDA cores(in Figure 6.7) and linear time complexity for both CPU and GPU architectures (Figure 6.6). Even though the evaluation result shows reasonable improve- ment in time and cost, we have to include the MPI library to communicate between nodes for the distributed multi-node system and also consider the lower-level code optimization technologies, NVIDIA warp primitive-level programming. 142 Figure 6.6: Log-log plot for wall time using Intel(R) Xeon(R) CPU, Tesla K20, and K80 GPUs for the grid sizeN 2, N = 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192. Figure 6.7: Intel(R) Xeon(R) CPU vs Tesla K20 and K80 GPU speedup (log-log plot) for the grid size N 2, N = 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192. 143 6.4 Summary We have been developing an architecture-independent, scalable software tool MOLTN using MPI, OpenMP, and as well as GPU CUDA implementation. In this chapter, we have described the overview of the design of MOLTN, parallel implementation of GP-GPU CUDA version, and scalability studies for both OpenMP and CUDA versions. In the next chapter, we summarize the work carried out through these studies and give useful ideas for future work. 144 CHAPTER 7: Summary and Potential Future Directions 7.1 Introduction In this chapter, I conclude the work carried out through this study, emphasize my con- tributions to this research work, and identify potential future directions. 7.2 Conclusion Based on my literature review, I identified the MOLT based scheme developed by Christlieb team [15] as an unconditionally stable, fast, and implicit scheme and arbitrary order accuracy for Dirichlet and periodic boundaries on a Cartesian grid. Hence, I chose the scheme for my simulation study and overcame the identified limitations on it. The major limitation is that the scheme was second order for outflow and Neumann boundary conditions. So, I worked to extend the order of accuracy for Neumann and outflow for arbitrary order and achieved a scheme with fourth-order spatiotemporal accuracy for complex geometry problems including curved boundaries. Further, I defined a general framework for complex geometry problems and simulated such problems using an embedded boundary method. Specifically, I developed a simulation tool for HPM tubes, such as A6 magnetron, 12 and 18 cavity rising suns us- ing PEC boundary condition. For this simulation, I derived the scheme for electromagnetic vector potential using the Lorenz gauge and imposed PEC boundary condition in 2D. I eval- uated the simulation of A6 magnetron using a ping test and obtained six strong resonance modes. This is one of the challenging tasks that I accomplished. I extended the implementation of the scheme to 3D and obtained multiple applications such as 3D variable speed waves, spherical scattering, 3D point source, and A6 magnetron in 145 3D. The simulation of three-dimensional A6 magnetron is an additional important milestone. I developed codes in C++ for these simulations and improved those performances by code optimization and acceleration using parallel technologies in different architectures including NVIDIA GPU. Further, I have been working with the scalable open source MOLTN devel- opment and contributed to the development of multi-core shared memory OpenMP version and wave kernel development using kokkos. 7.3 Future Directions 7.3.1 Code Acceleration for MOLTN As described in the Chapter 6, we are developing an open source scalable software to make visible our MOLT based scheme to the scientific community. We are on the initial stage of this development and will have to optimize the code in different levels; i) low-level architec- ture based ii) compiler related iii) high-level code based optimizations. To obtain accelerated code in Intel architectures, we can use the SIMD vectorization using Intel intrinsic instruc- tions, which are C style functions that provide access to many Intel instructions, including Intel SSE (128 bits Streaming SIMD Extension), AVX (256 bits Advanced Vector Extension), and AVX512 (512 bits Advanced Vector Extension) etc. without the need to write assembly code. The SIMD vectorization techniques can increase processor throughput by perform- ing multiple computations in a single instruction. We can use Arithmetic and Elementary Math functions such as the function m256d mm256 add pd( m256d, m256d) to add packed double-precision (64-bit) floating-point elements using AVX. We can also use the compiler flags for the auto-vectorization (e.g. −O2 or higher in intel). Another well-known accelerator is NVIDIA GPUs, which execute warps of 32 parallel threads using SIMT (Sin- gle Instruction Multiple Threads - multiple threads issue common instructions to arbitrary data) technologies. The explicit warp-level programming using warp-level primitives which were introduced in CUDA 9.0, is safe, effective and will give high-performance. We can use 146 NVIDIA’s intrinsic functions to make the code even faster with reduced accuracy, but they can be used in device codes only. 7.3.2 Complex Geometry in MOLTN We have already implemented simple rectangular-shaped boundary problems on the Cartesian grids for MOLTN, so we have to define a general framework for complex ge- ometry problems on MOLTN in the future. To obtain this, i) convert MOLT kernel to able to operate on line-segments instead of lines because line-segments are unit elements for complex geometry cases. ii) we have to deal with the work load-balancing and data locality carefully in order to make sure better performance is obtained. I can suggest a max-fit policy which takes care of load balancing and locality at the same time. 7.3.3 Further HPM tube simulations Even though the concept of the magnetrons is the same, every magnetron is not designed by the same structure. They may differ in the number of cavities (12 or 18 cavity rising suns), the shape of the cavities (triangular or circular cavity magnetrons), and radius of the cathode/anode, etc. In these studies, my simulations are developed based on triangular cavities. In the future, we can consider the circular cavity-anode structure as shown in Figure 7.3.3 which is a typical magnetron for microwave ovens in cutaway view. The cylindrical anode structure contains a number of equally spaced cavity resonators with the anode surface adjacent to the cylindrical cathode. Permanent magnets are used to provide the necessary magnetic field, which is perpendicular to the electric field between the cathode and the anode. The power output is coupled through an antenna that runs from one of the cavities to a waveguide that channels the microwave radiation to the cooking chamber. Further, beyond the magnetrons, we can develop simulations for other kinds of HPM tubes such as the accelerator klystron. Figure 7.3.3 describes the two-cavity klystron [30] 147 Figure 7.1: Magnetron with eight circular hole cavity anode (this figure is from Encyclopedia Britannica, Inc.) Figure 7.2: Two-cavity klystron (this figure is from Reference [30]) 7.3.4 Incorporate with Particles We obtained a cold test (in Chapter 5) using the electromagnetic waves without including particles successfully. Now, we are planning to perform hot tests by including particles in 148 the simulation. For this purpose, we tested for a point source as a delta function in three dimensions. Next we will see how to derive the scheme which includes the point source. 7.3.4.1 Derive the scheme for 3D point source Let’s rewrite the Helmholtz operator in 1D, (cid:16) (cid:17) (.) Lx(.) = 1 − 1 α2 ∂xx We use the free space Green’s function to solve the L−1 as previously described. The one dimensional Green’s function can be given by, Upon using the Green’s function, G(x|x0) = e−α|x−x0|2 |x − x0|2 1 4π (cid:90) G(x|x0)(.)dx0 G(x|x0)(.)dx0 + (7.1) (7.2) (cid:90) L−1 x (.) = Ωx0 Ωx0 Now, we break the solution into two pieces, the particular solution and the homogeneous solutions: where, b(cid:90) a Ix(.) = α 2 L−1 x (.) := Ix(.) + Hx (7.3) e−α|x−y|(.)dy, Hx = axe−α(x−xa) + bxe−α(xb−x). For three dimensional inverse operator L−1(.) for the ADI splitting is performed one line 149 at a time. Recall, where, b(cid:90) a Ik(.) = α 2 L−1 k (.) := Ik(.) + Hk (7.4) e−α|k−k0|(.)dk0, Hk = ake−α(k−ka) + bke−α(kb−k). So the three dimensional operator L−1(.) can be expressed as, L−1(.) = L−1 x (L−1 y (L−1 z (.))) + O(∆t2) or or L−1(.) = Ix,y,z(.) + Hx,y,z(.) + O(∆t2) L−1(.) = Ix(Iy(Iz(.) + Hz) + Hy) + Hx + O(∆t2) By plugging into the second order scheme, un+1 = (2 − β2)un − un−1 + β2 ¯L−1 un+1 = (2 − β2)un − un−1 + β2L−1 x 3D [un] (cid:2)L−1 y (cid:2)L−1 z [un](cid:3)(cid:3) . (7.5) (7.6) and including a point source S, un+1 = (2 − β2)un − un−1 + β2L−1 x (cid:2)L−1 y (cid:2)L−1 z [un](cid:3)(cid:3) + β2 α2 Sn.Gδ 3D. (7.7) where, Gδ 3D = 1 4π √ (cid:112)(x − x0)2 + (y − y0)2 + (z − z0)2 + δ2 (x−x0)2+(y−y0)2+(z−z0)2 e−α 150 Gδ 3D(x/x0) = (cid:112)(x − x0)2 + (y − y0)2 + (z − z0)2 + δ2 e−α||x−x0||2 1 4π where δ ∈ R+, 0 < δ2 < ∆x 7.3.4.2 Numerical Example For the next two experiments we placed two point sources sin(2πf t) with equal fre- quencies (f = 1) for the first case and two point sources sin(2πf t) with different fre- quencies f = 1 and f = 10 for the second case at the corners (−0.613,−0.613,−0.613) and (1.613, 1.613, 1.613) of the cubical domain ([−1, 1]3). We simulate the wave propaga- tion by applying outflow and Dirichlet boundary conditions along the right surface of the cube and the other five surfaces of the cube respectively. We set the values of parameters ∆x = ∆y = ∆z = 0.013, ∆t = 0.0067, and β = 2. Figure 7.3 and 7.5 show snapshots of the wave at different time instants for the symmetric and asymmetric cases. Finally, we perform a self-refinement study for time and space convergence using a point source sin(2πf t) with frequencies f = 1 at the center of the cubical domain ([−1, 1]3). Figure 7.4 shows the time and space convergence plots for the time evolution of a point source field. For the time convergence test, we maintain the resolution at 40 × 40 × 40 and reduce the time step size ∆t = 0.1 to 0.0125. For the space convergence test, we maintain the CFL value as 1.3 and reduce the spatial step size ∆x = 1.313 to 0.082. We obtained second-order convergence in time and space This shows a positive result to deal with particles properly, and the approach can be extended to high-order by using the Taylor series expansion in terms of the source function. 151 (a) t = 1.23ns (b) t = 3.01ns (c) t = 5.68ns (d) t = 8.81ns Figure 7.3: Evolution of the two 3D point sources sin(2πf t) with the same frequency f = 1 at the corners of the cubical domain ([−1, 1]3), spatial step size ∆x = ∆y = ∆z = 0.013, and time step size ∆t = 0.0067. (a) time convergent (b) space convergent Figure 7.4: Time (a) and space (b) convergence studies using 3D point source sin(2πf t) with frequencies f = 1 at the center of the cubical domain ([−1, 1]3). 152 (a) t = 1.23ns (b) t = 3.01ns (c) t = 5.68ns (d) t = 8.81ns Figure 7.5: Evolution of the two 3D point sources sin(2πf t) with frequencies 1 and 10 at the corners of the cubical domain ([−1, 1]3), spatial step size ∆x = ∆y = ∆z = 0.013, and time step size ∆t = 0.0067. 7.4 Summary I gave the summary and contributions of this research work and gave potential future directions in this chapter. I strengthen the MOLT based scheme by including high-order Neumann, outflow and PEC boundary conditions, general geometry framework, 3D im- plementations, and accelerated codes and wave kernel for MOLTN development. We are proposing to work with particles, several HPM tube simulations, and optimization and de- velopment of MOLTN in the future. 153 BIBLIOGRAPHY 154 BIBLIOGRAPHY [1] Alpert, B., L. Greengard, and T. Hagstrom (2000), Rapid evaluation of nonreflect- ing boundary kernels for time-domain wave propagation, SIAM Journal on Numerical Analysis, 37 (4), 1138–1164, doi:10.1137/S0036142998336916. [2] AlSalem, H., P. Petrov, G. Newman, and J. Rector (2018), Embedded boundary meth- ods for modeling 3d finite-difference laplace-fourier domain acoustic-wave equation with free-surface topography, GEOPHYSICS, 83 (5), T291–T300, doi:10.1190/geo2017- 0828.1. [3] Appelo, D., and N. A. Petersson (2012), A Fourth-Order Accurate Embedded Boundary Method for the Wave Equation, SIAM J. Sci. Comput., 34 (6), A2982–A3008. [4] Ascher, U., R. Mattheij, and R. Russell (1995), Numerical Solution of Boundary Value Problems for Ordinary Differential Equations, Society for Industrial and Applied Math- ematics, doi:10.1137/1.9781611971231. [5] Benford, J. (2010), History and future of the relativistic magnetron, in 2010 Interna- tional Conference on the Origins and Evolution of the Cavity Magnetron, pp. 40–45, doi:10.1109/CAVMAG.2010.5565566. [6] Berenger, J.-P. (1994), A perfectly matched layer for the absorption of elec- tromagnetic waves, Journal of Computational Physics, 114 (2), 185 – 200, doi: https://doi.org/10.1006/jcph.1994.1159. [7] Bernacki, M., S. Lanteri, and S. Piperno (2006), Time-Domain Parallel Simulation of Heterogeneous Wave Propagation on Unstructured Grids Using Explicit, Nondiffusive, Discontinuous Galerkin Methods, Journal of Computational Acoustics, 14 (01), 57–81, doi:10.1142/S0218396X06002937. [8] Bruno, O. P., and M. Lyon (2010), High-order unconditionally stable FC-AD solvers for general smooth domains I . Basic elements, Journal of Computational Physics, 229 (6), 2009–2033. [9] Bruno, O. P., and M. Lyon (2010), High-order unconditionally stable FC-AD solvers for general smooth domains I. Basic elements , Journal of Computational Physics, 229 (6), 2009 – 2033, doi:http://dx.doi.org/10.1016/j.jcp.2009.11.020. [10] Cai, W. (2004), Numerical methods for maxwell’s equations in inhomogeneous media with material interfaces, Journal of Computational Mathematics, 22 (2), 156–167. [11] Cai, W., and S. Deng (2003), An upwinding embedded boundary method for Maxwells equations in media with material interfaces: 2D case, Journal of Computational Physics, 190 (1), 159 – 183, doi:http://dx.doi.org/10.1016/S0021-9991(03)00269-9. 155 [12] Catella, A., V. Dolean, and S. Lanteri (2008), An Unconditionally Stable Discontin- uous Galerkin Method for Solving the 2-D Time-Domain Maxwell Equations on Un- structured Triangular Meshes, IEEE Transactions on Magnetics, 44 (6), 1250–1253, doi:10.1109/TMAG.2007.916578. [13] Causley, M., A. Christlieb, and E. Wolf (2017), Method of Lines Transpose: An Efficient Unconditionally Stable Solver for Wave Propagation, Journal of Scientific Computing, 70 (2), 896–921, doi:10.1007/s10915-016-0268-8. [14] Causley, M. F., and A. J. Christlieb (2014), Higher Order A-Stable Schemes for the Wave Equation Using a Successive Convolution Approach, SIAM Journal on Numerical Analysis, 52 (1), 220–235, doi:10.1137/130932685. [15] Causley, M. F., A. Christlieb, Y. G¨u¸cl¨u, and E. Wolf (2013), Method of Lines Transpose: A Fast Implicit Wave Propagator, Mathematics of Computation, submitted. [16] Chini, G. P., and S. Leibovich (2003), An analysis of the klemp and dur- internal waves, doi:10.1175/1520- ran radiation boundary condition as applied to dissipative of Physical Oceanography, Journal 0485(2003)033¡2394:AAOTKA¿2.0.CO;2. 33 (11), 2394–2407, [17] Coifman, R., V. Rokhlin, and S. Wandzura (1993), The fast multipole method for the wave equation: a pedestrian prescription, IEEE Antennas and Propagation Magazine, 35 (3), 7–12, doi:10.1109/74.250128. [18] Deng, S., and W. Cai (2006), A Fourth-Order Upwinding Embedded Boundary Method (UEBM) for Maxwells Equations in Media with Material Interfaces: Part I. [19] Ditkowski, A., K. Dridi, and J. Hesthaven (2001), Convergent Cartesian Grid Methods for Maxwell’s Equations in Complex Geometries, Journal of Computational Physics, 170 (1), 39 – 80, doi:http://dx.doi.org/10.1006/jcph.2001.6719. [20] Dolean, V., H. Fahs, L. Fezoui, and S. Lanteri (2010), Locally implicit discontinuous Galerkin method for time domain electromagnetics, Journal of Computational Physics, 229 (2), 512 – 526, doi:http://dx.doi.org/10.1016/j.jcp.2009.09.038. [21] Dridi, K. H., J. S. Hesthaven, and A. Ditkowski (2001), Staircase-free finite-difference time-domain formulation for general materials in complex geometries, IEEE Transac- tions on Antennas and Propagation, 49 (5), 749–756, doi:10.1109/8.929629. [22] Duru, K., and G. Kreiss (2012), A Well-Posed and Discretely Stable Perfectly Matched Layer for Elastic Wave Equations in Second Order Formulation, Communications in Computational Physics, pp. 1–35. [23] Dwivedi, S., and P. K. Jain (2013), Magnetically Insulated Line Oscillator (MILO) Performance Study and its Parameter Optimization, IEEE Transactions on Plasma Science, 41 (9), 2532–2538, doi:10.1109/TPS.2013.2277859. 156 [24] Edwards, H. C., C. R. Trott, En- abling manycore performance portability through polymorphic memory access pat- terns, Journal of Parallel and Distributed Computing, 74 (12), 3202 – 3216, doi: https://doi.org/10.1016/j.jpdc.2014.07.003, domain-Specific Languages and High-Level Frameworks for High-Performance Computing. and D. Sunderland (2014), Kokkos: [25] Engquist, B., and A. Majda (1977), Absorbing boundary conditions for the nu- merical simulation of waves, Mathematics of Computation, 31 (139), 629–651, doi: 10.1090/S0025-5718-1977-0436612-4. [26] Fornberg, B. (2001), A short proof of the unconditional stability of the ADI-FDTD scheme, Tech. rep., University of Colorado. [27] Fornberg, B., J. Zuev, and J. Lee (2007), Stability and accuracy of time-extrapolated ADI-FDTD methods for solving wave equations, Journal of Computational and Applied Mathematics, 200 (1), 178 – 192. [28] Fuks, M., and E. Schamiloglu (2005), Rapid start of oscillations in a mag- 205,101, doi: netron with a ”transparent” cathode, Phys. Rev. Lett., 10.1103/PhysRevLett.95.205101. 95, [29] Gedney, S. D., and U. Navsariwala (1995), An unconditionally stable finite element time-domain solution of the vector wave equation, IEEE Microwave and Guided Wave Letters, 5 (10), 332–334, doi:10.1109/75.465046. [30] Gewartowski, H. A. W., J. W. (1965), Principles of electron tubes, D. Van Nostrand Company, Inc. [31] Gimbutas, Z., and V. Rokhlin (2003), A Generalized Fast Multipole Method for Nonoscillatory Kernels, SIAM Journal on Scientific Computing, 24 (3), 796–817, doi: 10.1137/S1064827500381148. [32] Goplen, B., L. Ludeking, D. Smith, and G. Warren (1995), User-configurable MAGIC for electromagnetic PIC calculations, Computer Physics Communications, 87 (1), 54 – 86, doi:https://doi.org/10.1016/0010-4655(95)00010-D, particle Simulation Methods. [33] Harlow, J. (2003), Electric Power Transformer Engineering, The Electric Power Engi- neering Hbk, Second Edition, CRC Press. [34] Henshaw, W. D. (2006), A High-Order Accurate Parallel Solver for Maxwell’s Equations on Overlapping Grids, SIAM J. Scientific Computing, 28, 1730–1765. [35] Higdon, R. L. (1987), Numerical absorbing boundary conditions for the wave equation, Mathematics of Computation, 49 (179), 65–90. [36] Huang, Y. J., L. H. Yeh, and K. R. Chu (2014), An analytical study on the diffraction quality factor of open cavities, Physics of Plasmas, 21 (10), 103,112, doi: 10.1063/1.4900415. 157 [37] Jackson, J. D. (1999), Classical Electrodynamics, 3rd ed. ed., Wiley, New York, NY. [38] Jackson, S. D., and P. H. Muir (2001), Theory and numerical simulation of nth- order cascaded Raman fiber lasers, J. Opt. Soc. Am. B, 18 (9), 1297–1306, doi: 10.1364/JOSAB.18.001297. [39] Jacobs, G., and J. Hesthaven (2009), Implicit explicit time integration of a high-order particle-in-cell method with hyperbolic divergence cleaning, Computer Physics Commu- nications, 180 (10), 1760 – 1767, doi:https://doi.org/10.1016/j.cpc.2009.05.020. [40] Jacobs, G. B., J. S. Hesthaven, and G. Lapenta (2006), Simulations of the Weibel instability with a high-order discontinuous galerkin particle-in-cell solver, Collection of Technical Papers - 44th AIAA Aerospace Sciences Meeting, 19, 14,189–14,199. [41] Jia, J., and J. Huang (2008), Krylov deferred correction accelerated method of lines transpose for parabolic problems, Journal of Computational Physics, 227 (3), 1739– 1753. [42] Jin, J. (2010), Theory and Computation of Electromagnetic Fields, John Wiley & Sons, Inc, doi:10.1002/9780470874257. [43] Joannopoulos, J., S. Johnson, J. Winn, and R. Meade (2011), Photonic Crystals: Mold- ing the Flow of Light (Second Edition), Princeton University Press. [44] Johansen, H., and P. Colella (1998), A Cartesian Grid Embedded Boundary Method for Poisson’s Equation on Irregular Domains, J. Computational. Phys., 147 (1), 60–85, doi:10.1006/jcph.1998.5965. [45] Kreiss, H., and N. Petersson (2006), An embedded boundary method for the wave equation with discontinuous coefficients, SIAM Journal on Scientific Computing, 28 (6), 2054–2074, doi:10.1137/050641399. [46] Kreiss, H., and N. A. Petersson (2006), An Embedded Boundary Method for the Wave Equation with Discontinuous Coefficients, SIAM Journal on Scientific Comput- ing, 28 (6), 2054–2074, doi:10.1137/050641399. [47] Lambers, J. V. (2016), Solution of time-dependent PDE through component-wise ap- proximation of matrix functions, IAENG Journal of Applied Mathematics, 41 (1), 1–26. [48] Lapidus, L., and G. F. Pinder (1999), Numerical Solution of Partial Differential Equa- tions in Science and Engineering, 677 pp., John Wiley and Sons Inc. [49] LeVeque, R. (2007), Finite Difference Methods for Ordinary and Partial Differential Equations: Steady-State and Time-Dependent Problems, Society for Industrial and Ap- plied Mathematics. [50] Li, P., H. Johnston, and R. Krasny (2009), A Cartesian treecode for screened Coulomb interactions , Journal of Computational Physics, 228 (10), 3858 – 3868, doi: http://dx.doi.org/10.1016/j.jcp.2009.02.022. 158 [51] M. C. Lin, C. Nieter, P. H. Stoltz, D. N. Smithe (2010), Accurately and Ef- ficiently Studying the RF Structures Using a Conformal Finite-Difference Time- Domain Particle-in-Cell Method, The Open Plasma Physics Journal, 3, 48–52, doi: 10.2174/1876534301003010048. [52] Marfurt, K. J. (1984), Accuracy of finitedifference and finiteelement modeling of the scalar and elastic wave equations, GEOPHYSICS, 49 (5), 533–549, doi: 10.1190/1.1441689. [53] Matthew Causley, B. O., Andrew Christlieb, and L. V. Groningen (2014), Method of lines transpose: An implicit solution to the wave equation, Mathematics of Computation, 83, 2763–2786, doi:https://doi.org/10.1090/S0025-5718-2014-02834-2. [54] Mazzia, A., and F. Mazzia (1997), High-order transverse schemes for the numerical solution of PDEs, Journal of Computational and Applied Mathematics, 82 (1), 299 – 311, doi:http://dx.doi.org/10.1016/S0377-0427(97)00090-3. [55] McCorquodale, P., P. Colella, and H. Johansen (2001), A Cartesian grid embedded boundary method for the heat equation on irregular domains. [56] Mendonca, C., S. Prasad, E. Schamiloglu, and T. Fleming (2011), 3d icepic simulations of a pulsed relativistic magnetron with transparent cathode: A comparative study, in 2011 IEEE Pulsed Power Conference, pp. 823–828, doi:10.1109/PPC.2011.6191521. [57] Monk, P., and E. Suli (1994), Error estimates for Yee’s method on non-uniform grids, IEEE Transactions on Magnetics, 30 (5), 3200–3203, doi:10.1109/20.312618. [58] Namiki, T. (1999), A new FDTD algorithm based on alternating-direction implicit method, IEEE Transactions on Microwave Theory and Techniques, 47 (10), 2003–2007, doi:10.1109/22.795075. [59] Newmark, N. (1959), A Method of Computation for Structural Dynamics, no. nos. 179- 181 in A Method of Computation for Structural Dynamics, American Society of Civil Engineers. [60] Orlanski, I. (1976), A simple boundary condition for unbounded hyperbolic flows, Journal of Computational Physics, 21 (3), 251 – 269, doi:https://doi.org/10.1016/0021- 9991(76)90023-1. [61] Palevsky, A. (1980), Generation of Intense Microwave Radiation by the Relativistic E- beam Magnetron (experiment and Numerical Simulation), Massachusetts Institute of Technology, Department of Physics. [62] Palevsky, A., and G. Bekefi (1979), Microwave emission from pulsed, relativistic ebeam diodes. II. The multiresonator magnetron, The Physics of Fluids, 22 (5), 986–996, doi: 10.1063/1.862663. 159 [63] Pearson, R. A. (1974), Consistent boundary conditions for numerical models of systems that admit dispersive waves, Journal of the Atmospheric Sciences, 31 (6), 1481–1489, doi:10.1175/1520-0469(1974)031¡1481:CBCFNM¿2.0.CO;2. [64] Pierson Guthrey, W. S. A. C., Thavappiragasam Mathialakan (2019), Domain Decom- position Strategy for Method of Line Transpose Based Scheme. [65] Ricci, P., G. Lapenta, (2002), A Simplified Implicit Maxwell Solver, Journal of Computational Physics, 183 (1), 117 – 141, doi: http://dx.doi.org/10.1006/jcph.2002.7170. and J. Brackbill [66] Sommerfeld, A. (1949), p. iv, doi:https://doi.org/10.1016/B978-0-12-654658-3.50002-1. [67] T. Austin, D. N. S., J. R. Cary, and C. Nieter (2010), Alternating Direction Implicit Methods for FDTD Using the Dey-Mittra Embedded Boundary Method, The Open Plasma Physics Journal, 3, 29–35, doi:10.2174/1876534301003010029. [68] Thavappiragasam, M., A. Viswanathan, and A. Christlieb (2017), MOLT based fast high-order three dimensional A-stable scheme for wave propagation, Journal of Coupled Systems and Multiscale Dynamics, 5 (2), 151–163, doi:doi:10.1166/jcsmd.2017.1137. [69] Ting, L., and M. J. Miksis (1986), Exact boundary conditions for scattering prob- lems, The Journal of the Acoustical Society of America, 80 (6), 1825–1827, doi: 10.1121/1.394297. [70] Wang, Y., and S. Langdon (2010), Design of wave ports in fdtd and its application to microwave circuits and antennas, in 2010 IEEE Antennas and Propagation Society International Symposium, pp. 1–4, doi:10.1109/APS.2010.5562023. [71] Yee, K. (1966), Numerical solution of initial boundary value problems involving Maxwell’s equations in isotropic media, IEEE Transactions on Antennas and Propa- gation, 14 (3), 302–307, doi:10.1109/TAP.1966.1138693. [72] Zafarullah, A. (1970), Application of the Method of Lines to Parabolic Par- tial Differential Equations With Error Estimates, J. ACM, 17 (2), 294–302, doi: 10.1145/321574.321583. [73] Zheng, F., Z. Chen, and J. Zhang (1999), A finite-difference time-domain method with- out the Courant stability conditions, IEEE Microwave and Guided Wave Letters, 9 (11), 441–443, doi:10.1109/75.808026. [74] Zienkiewicz, O., and R. Newton (1969), Coupled Vibrations of a Structure Submerged in a Compressible Fluid. 160