DEVELOPMENT OF A FAST AND ACCURATE TIME STEPPING SCHEME FOR THE FUNCTIONALIZED CAHN-HILLIARD EQUATION AND APPLICATION TO A GRAPHICS PROCESSING UNIT By Jaylan Stuart Jones A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Applied Mathematics and Physics - Doctor of Philosophy 2013 ABSTRACT DEVELOPMENT OF A FAST AND ACCURATE TIME STEPPING SCHEME FOR THE FUNCTIONALIZED CAHN-HILLIARD EQUATION AND APPLICATION TO A GRAPHICS PROCESSING UNIT By Jaylan Stuart Jones This dissertation explores and develops time-stepping schemes for computing solutions to the Functionalized Cahn-Hilliard (FCH) model. It is important to find a scheme that is both fast enough to compute evolution to the long-time states and to give enough accuracy to capture important geometric events. The FCH model is relatively new, and very little work has been done to develop efficient numerical schemes for its simulation, so much of this work is based on the extensive work done on the Cahn-Hilliard (CH) model. For each of the methods, the spatial approximation is computed with a Fourier spectral method. All of the schemes are adapted to be computed on a graphics processing unit (GPU) which gives significant improvements in the speed of the simulation. First, an implicit-explicit (IMEX) method will be introduced that is based on a convex splitting of the right hand side of the equation. This splitting guarantees that the solutions will decay in energy for any size time step, which gives numerical stability for very large time steps. With this splitting, a novel iterative method for solving the implicit portion greatly improves the numerical efficiency. Second is the development of a fully implicit method that attains high accuracy. The method uses a conjugate gradient method to solve the Netwon’s method iterations, and is preconditioned using a physics based approximation to the operator that is easy to invert and numerically efficient. Lastly, exponential time differencing (ETD) methods are derived for the Cahn-Hilliard and Functionalized Cahn-Hilliard Equations. The ETD methods are all explicit which affords computation speed, and higher order versions are natural extensions giving accurate time stepping. Finally, numerical experiments for the three types of methods compare the accuracy and speed. These simulations are performed for both fixed time-step simulations as well as adaptive time steps. This gives a clear picture of the strengths and weaknesses, and it gives enough information to determine which time-stepping method will work best for approximating solutions to the FCH equation. TABLE OF CONTENTS LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Chapter 1 Introduction . . . . . . . . . . . . . . . . 1.1 Order Parameters and Phase Field Models . . . . 1.2 The Cahn-Hilliard Model . . . . . . . . . . . . . . 1.3 Sharp Interfaces and the Canham-Helfrich Model 1.4 The Functionalized Cahn-Hilliard Model . . . . . 1.5 Proposed Experimental Nanomorphologies . . . . 1.6 Functionalized Cahn-Hilliard Equation . . . . . . 1.7 Numerical Challenges for the CH and FCH . . . . 1.8 Numerical Methods for the CH Equation . . . . . 1.8.1 Spatial Schemes . . . . . . . . . . . . . . . 1.8.2 Temporal Schemes . . . . . . . . . . . . . 1.9 Spectral Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 2 6 7 10 12 13 15 15 17 20 Chapter 2 Convex Splitting for the FCH Equation 2.1 Gradient Splitting Method . . . . . . . . . . . . . . 2.2 Gradient Splitting Example . . . . . . . . . . . . . 2.3 Splitting the FCH Equation . . . . . . . . . . . . . 2.4 Stability and Solvability . . . . . . . . . . . . . . . 2.5 Fixed-Point Iteration . . . . . . . . . . . . . . . . . 2.6 CPU version of code . . . . . . . . . . . . . . . . . 2.7 Adaptive Time-Step . . . . . . . . . . . . . . . . . 2.8 FCH on Graphics Processing Units . . . . . . . . . 2.9 GPU Speed-Up . . . . . . . . . . . . . . . . . . . . 2.10 Comparison of results to experimental data . . . . . 2.11 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 22 24 26 28 32 36 37 39 42 43 44 Chapter 3 Fully Implicit Method . . . . . . . . . 3.1 Introduction . . . . . . . . . . . . . . . . . . . . 3.2 Models . . . . . . . . . . . . . . . . . . . . . . . 3.3 Basic numerical approach and results . . . . . . 3.3.1 Spectral discretization in space . . . . . 3.3.2 Adaptive, implicit discretization in time 3.3.3 Solution of the implicit system . . . . . . 3.3.4 Basic numerical results . . . . . . . . . . 3.4 Investigation of the preconditioned system . . . 3.4.1 Preliminaries and numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 48 52 54 54 55 57 59 67 67 iv . . . . . . . . . . . . . . . . . . . . 3.4.2 Formal asymptotics . . . . . . . . . . . . . . . . . . . . . . . . 3.4.2.1 Allen Cahn . . . . . . . . . . . . . . . . . . . . . . . 3.4.2.2 Cahn-Hilliard . . . . . . . . . . . . . . . . . . . . . . 3.5 Performance on a variety of models . . . . . . . . . . . . . . . . . . . 3.5.1 2D Cahn-Hillard . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.2 Sixth order model . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.3 2D vector model . . . . . . . . . . . . . . . . . . . . . . . . . 3.6 GPU implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6.1 GPU Speedup . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7 Higher order time stepping . . . . . . . . . . . . . . . . . . . . . . . . 3.7.1 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . 3.8 A Pair of Fourth-Order Accurate Methods . . . . . . . . . . . . . . . 3.9 Investigation of splitting methods . . . . . . . . . . . . . . . . . . . . 3.9.1 Preliminaries and numerical results . . . . . . . . . . . . . . . 3.9.2 Asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.9.2.1 Accuracy in Allen-Cahn equations . . . . . . . . . . 3.9.2.2 Condition number of solver for Eyre’s method for Hilliard problems . . . . . . . . . . . . . . . . . . . . 3.10 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chapter 4 Exponential Integrator . . . . . . . . . 4.1 General Exponential Integrator . . . . . . . . . 4.2 Stability and Linearization of the CH Equation 4.3 Linearization of the FCH Equation . . . . . . . 4.4 Computing Exponential Terms . . . . . . . . . . 4.5 Runge-Kutta Methods . . . . . . . . . . . . . . 4.6 Taylor Methods . . . . . . . . . . . . . . . . . . 4.7 Convergence of ETD Methods . . . . . . . . . . 4.8 Stability Analysis . . . . . . . . . . . . . . . . . 4.8.1 Predictor/Corrector Analysis . . . . . . 4.9 Implementation of ETD-RK2 . . . . . . . . . . 4.9.1 Application to a GPU . . . . . . . . . . 4.9.1.1 GPU Speedup . . . . . . . . . . 4.9.2 Adaptive Time-Stepping . . . . . . . . . 4.10 Numerical Examples for FCH . . . . . . . . . . 4.10.1 Medusa Head in 2D . . . . . . . . . . . . 4.10.2 Punctured Hollow Spheres . . . . . . . . 4.11 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 104 107 109 111 113 115 118 119 121 123 126 128 129 131 131 132 134 Chapter 5 Comparison of Methods for the FCH 5.1 Description of Compared Schemes . . . . . . . . 5.2 Complete Simulation . . . . . . . . . . . . . . . 5.3 Fixed Time-Step Simulation . . . . . . . . . . . 5.4 Time Step Size Restrictions . . . . . . . . . . . 5.4.1 Solution Freeze Out for Convex Splitting Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 137 138 140 141 141 v . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 . . . . 70 . . . . 73 . . . . 76 . . . . 77 . . . . 77 . . . . 77 . . . . 82 . . . . 84 . . . . 87 . . . . 89 . . . . 89 . . . . 93 . . . . 93 . . . . 98 . . . . 98 Cahn. . . . 101 . . . . 102 5.5 5.6 5.4.2 Solution Freeze Out for Exponential Time Differencing 5.4.3 Implicit Time-step Size Restriction . . . . . . . . . . . Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . APPENDICES . . . . . . . . . . . . . . . . . . Appendix A: Convex Splitting Code in CUDA Appendix B: Fully Implicit Code in CUDA . . Appendix C: ETD-RK2 Code in CUDA . . . BIBLIOGRAPHY . . . . . . . . . . . . . . . . . vi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 144 146 149 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 152 174 205 . . . . . . . . . . . . . . . . . 227 LIST OF TABLES Table 3.1 Error estimates, Eδt , for fixed time step computations of the 1D Cahn-Hilliard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Table 3.2 Error estimates, EN , for computations of the 1D Cahn-Hilliard . . . 61 Table 3.3 Performance of adaptive time stepping through a ripening event of the 1D Cahn-Hilliard model . . . . . . . . . . . . . . . . . . . . . . 62 Dependence of the smallest eigenvalue of the preconditioned Jacobian matrix for the Cahn Hilliard problem . . . . . . . . . . . . . . . . . 69 Performance of each adaptive time stepping method through a ripening event of the 1D Cahn-Hilliard model . . . . . . . . . . . . . . . . 94 Performance of adaptive time stepping through a ripening event of the 1D Cahn-Hilliard model with Eyre’s splitting . . . . . . . . . . . 98 Table 3.4 Table 3.5 Table 3.6 Table 4.1 Comparison of linearization choices for the CH equation. . . . . . . 109 Table 5.1 Comparison of the convex splitting, fully implicit, and exponential integrator methods for a full simulation . . . . . . . . . . . . . . . . 139 Table 5.2 Comparison of error and simulation time for fixed time-step simulations of the FCH equation . . . . . . . . . . . . . . . . . . . . . . . 141 Table 5.3 Order of convergence at given time step sizes. vii . . . . . . . . . . . . 143 LIST OF FIGURES Figure 1.1 Potential energy function . . . . . . . . . . . . . . . . . . . . . . . . 3 Figure 1.2 Tilted potential energy function . . . . . . . . . . . . . . . . . . . . 9 Figure 1.3 Proposed Nafion nanomorphologies based on experimental results Figure 2.1 Energy trace of solution with unstable equilibrium initial condition. 39 Figure 2.2 An unstable steady state solution for the FCH equation . . . . . . . 40 Figure 2.3 Speedup of GPU over parallel CPU for a short FCH simulation computed using single precision (top) and double precision (bottom). . 45 Two dimensional FCH simulation results compared against images from a diblock copolymer experiment . . . . . . . . . . . . . . . . . 46 Figure 2.5 Geometries that minimize the FCH energy . . . . . . . . . . . . . . 46 Figure 2.6 Results for a diblock copolymer pore showing pearling instability . . 47 Figure 3.1 Solution of the 1D Cahn-Hilliard Equation . . . . . . . . . . . . . . 60 Figure 3.2 The solution of the 1D Cahn-Hilliard equation during the final stages of the ripening process . . . . . . . . . . . . . . . . . . . . . . . . . 63 Figure 3.3 The energy history for the adaptive time approximation . . . . . . . 64 Figure 3.4 The time step history for the adaptive time approximation . . . . . 65 Figure 3.5 The PCG iteration count history for the adaptive time approximation 66 Figure 3.6 Eigenvalues of the preconditioned Jacobian matrix for the model situation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 The eigenfunction of the preconditioned Jacobian matrix with the smallest eigenvalue for the model situation . . . . . . . . . . . . . . 69 2D Cahn-Hilliard example computation. . . . . . . . . . . . . . . . . 79 Figure 2.4 Figure 3.7 Figure 3.8 viii . 11 Figure 3.9 2D sixth order model example computation. . . . . . . . . . . . . . 80 Figure 3.10 2D vector model example computation . . . . . . . . . . . . . . . . 81 Figure 3.11 Solution of the 3D Cahn-Hilliard computation used as a timing test for the GPU implementation . . . . . . . . . . . . . . . . . . . . . . 85 Figure 3.12 Time step size for the 3D Cahn-Hilliard computation . . . . . . . . 85 Figure 3.13 PCG count per time step for the 3D Cahn-Hilliard computation . . 86 Figure 3.14 The time step and PCG iteration count history for the adaptive time approximation of the 1D Cahn-Hilliard using BE-FE, BDF2-AB2 and BDF3-AB3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 Figure 4.1 Timestep size refinement study for ETD-RK methods . . . . . . . . 119 Figure 4.2 Timestep size refinement study for ETD-Taylor methods Figure 4.3 Timestep size refinement study for the ETD-T2 method with different numbers of iterations . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Figure 4.4 Timestep size refinement study for the ETD-T3 method with different numbers of iterations . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Figure 4.5 Comparison of the error for ETD-RK methods versus the ETD-Taylor methods with one and twenty iterations. . . . . . . . . . . . . . . . 123 Figure 4.6 Stability regions for exponential time differencing methods up to third order in time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 Figure 4.7 Stability regions for ETD-T2 and ETD-T3 methods with 100 iterations of the second step . . . . . . . . . . . . . . . . . . . . . . . . . 125 Figure 4.8 Contours giving limits on the region of stability for ETD-T2 and ETD-T3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Figure 4.9 Diagram showing how to implement a spectral method on a GPU to avoid synchronization, minimize memory use, and fully utilize the parallelism of the GPU. . . . . . . . . . . . . . . . . . . . . . . . . 127 ix . . . . . . 120 Figure 4.10 Number of times faster the GPU computation is over an eight-core, fully-parallelized, CPU computation. . . . . . . . . . . . . . . . . . 129 Figure 4.11 Time evolution of an unstable steady state solution to the FCH equation in two dimensions . . . . . . . . . . . . . . . . . . . . . . . . . 133 Figure 4.12 Time evolution of a punctured vesicle. Figure 5.1 Final states of the complete simulation comparing the convex splitting method to the other two methods . . . . . . . . . . . . . . . . . . . 139 Figure 5.2 Time evolution showing freeze out of solution with larger time step sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Figure 5.3 Freeze out of operators for the Cahn-Hilliard equation showing the amount of evolution when doubling the time step size . . . . . . . . 145 Figure 5.4 The energy spectrum of solutions to the Cahn-Hilliard equation is dominated by low wave numbers. . . . . . . . . . . . . . . . . . . . . 146 Figure 5.5 Freeze out of operators for the Functionalized Cahn-Hilliard equation showing the amount of evolution when doubling the time step size . 147 x . . . . . . . . . . . . . . . . 135 Chapter 1 Introduction 1.1 Order Parameters and Phase Field Models In 1937, Lev Landau published two papers on the nature of phase transitions [1, 2], wherein he used the idea of an order parameter to describe the nature of the material at different points in space. This parameter was a function of space and could be used, for example, to understand the average direction of spin at each point in a magnetic material. The order parameter was later used in the Ginzburg-Landau theory of superconductivity [3]. This order parameter function was the foundation for phase field models many years later. The phase field model was introduced in 1983 by George J. Fix to model first order liquid to solid phase transitions [4]. Later, J.S. Langer compared the model to similar models in solidification theory and described the value of such a model when describing the physics of phase transitions [5]. Since then, phase field models have been used to describe the physics of many different systems including grain growth and coarsening, microstructure evolution in thin films, surface-stress-induced pattern formation, crack propagation, crystal growth in the presence of strain, multiphase fluid flow, stress and electromagnetic driven void migration, tumor growth, vesicle dynamics, and multicomponent interdiffusion (see [6] and [7] for a list of various references). In general, phase field models have become a valuable tool in analyzing and modeling systems where multiple phases of a material separate due to high interaction energies. 1 Phase field models are powerful in describing microstructural evolution because they eliminate the need to track the evolving fronts that describe phase boundaries. In one dimension, an interface-tracking model can easily describe the sharp interfaces and calculate their motions, but when describing complex phase separations in two or three dimensions, it is no longer practical to track the interface directly. Instead, a phase function, u (t, x) is introduced into the model to delineate the amount of each specific phase that exists at that place and time. The evolution of the phase function is governed by a set of equations which balance diffusion against driving forces. The primary advantage of a phase function model is that the boundaries between phases have finite thickness and are therefore easier to analyze mathematically. Phase field formulations can also simplify the numerical simulations of such systems because the derivatives of the phase function remain finite across the interfaces. Phase field models are typically used to track the motion of interfaces, and the thermodynamic and kinetic coefficients are chosen to match the coefficients in a corresponding sharp-interface model. The phase function evolves in time, and when a sharp-interface representation is desired, the solution of the phase function can be projected into the sharpinterface model. This research will be working with a phase field model similar in nature to the well known Cahn-Hilliard model. 1.2 The Cahn-Hilliard Model What is now commonly termed the Cahn-Hilliard model was originally developed by van der Waals in 1893 [8, 9]. The model was essentially forgotten until 1958 when, without knowledge of van der Waals’ work, John Cahn and John Hilliard rederived the model to describe the separation of liquid metal alloys as they coarsened due to cooling [10]. The 2 0.5 0.4 0.3 0.2 0.1 0 −0.1 −0.2 −1.5 −1 −0.5 0 0.5 1 1.5 Figure 1.1 Potential energy function W (u). For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. model is expressed in the energy form ECH (u) = where ε2 | u|2 + W (u) dx, 2 Ω (1.1) u is the density gradient of one of the phases and W (u) is a function that represents the potential energy of the mixed states of the phases. A symmetric, double-well potential is typically used for W (u), which defines local energy minima for pure states with u = ±1 and an endothermic energy for the mixed states (Figure 1.1). chemical potential [11] µ= δECH . δu 3 In 1989, Pego defined the We calculate the variational derivative δECH δu using a formula similar to the Euler-Lagrange equation. If F [ρ (r)] = f (r, ρ (r) , ρ (r)) dr then µ= ∂f δF [ρ] = − δρ ∂ρ · ∂f . ∂ ( ρ) (1.2) Applying formula (1.2) to (1.1) and substituting into Pego’s equation for the chemical potential we obtain µ = −ε2 ∆u + W (u) , so that the diffusion equation becomes ut = ∆µ = ∆ −ε2 ∆u + W (u) from Fick’s second law of diffusion [12]. When written this way, it appears in the form of a conservation law ut = · j (x) where j (x) = µ. The conserved quantity is the total mass of each of the phases in the system. This makes the evolution of the Cahn-Hilliard model a mass conserving H −1 gradient flow for the energy (1.1). The resulting differential equation is commonly called the Cahn-Hilliard equation. ut = −∆ ε2 ∆u − W (u) . (1.3) Due to the importance of the H −1 gradient flow and inner product to this work, it is useful to recall some definitions. First, Lp is the Lebesgue function space defined such that a function f is in Lp if |f |p dx < ∞ on the domain of interest (which could be unbounded). Further H s , for non-negative integers s, is the Sobolev space of functions that contains all functions f such that f is in L2 and all of its first s weak derivatives are in L2 . H −1 is not a Sobolev space, nor is it even a space of functions, but rather the space of distributions (an 4 extension of functions) that acts as a dual space to H 1 . We introduce the operator ∆−1 : u ∈ L2 | Ω udx = 0 → H 2 (Ω) denoted f := ∆−1 u with the property that ∆f, v L2 = u, v L2 for any v ∈ L2 (Ω). This mapping requires boundary conditions to make it unique (e.g. ∆−1 : 2 L2 (Ω) + u|∂Ω = 0 → Hperiodic (Ω)). Further, we recall the following equivalent formulations of the H −1 inner product when u, v ∈ L2 (Ω) satisfy periodic boundary conditions, u, v H −1 = ∆−1 u, v 2 = u, ∆−1 v 2 L L (1.4) The global existence of solutions to (1.3) was shown by Elliot and Songmu in 1986 [13]. Solutions to the Cahn-Hilliard equation undergo a rapid spinodal decomposition into domains of the two pure phases (u = ±1) separated by a boundary layer with thickness of O (ε). The evolution of the separated phase domains was studied first by Pego, who showed that the motion of the interfaces is a Mullins-Sekerka type flow [11]. Later, asymptotic analysis by Alikakos et al. derived the rigerous convergence of the Cahn-Hilliard equation to the Mullins-Sekerka flow [14]. Further analysis was performed by Modica and Sternberg showing that solutions of the Cahn-Hilliard equation minimize the area of the interface surfaces [15, 16]. Work by de Mottoni and Schatzman, showed that the motion of the developed phase interfaces is driven by the mean curvature of the surface [17]. 5 1.3 Sharp Interfaces and the Canham-Helfrich Model The limit ε → 0 corresponds to sharp-interface boundaries between phase domains. A common assumption is that the potential energy stored in such interfaces depends exclusively on the area of the surface and curvature of the interface. For a surface in three dimensions and a point, p, on that surface, consider all the curves on the surface passing through p. 1 Each curve will have a curvature at p defined by κ = R , where R is the radius of the circle defining that curvature. Take the minimal and maximal curvatures over that set of curves to obtain κ1 and κ2 respectively. We can then define the mean curvature and Gaussian curvature as H= 1 (κ + κ2 ) 2 1 K = κ1 κ2 . The Canham-Helfrich free energy is a generic model for the bending of thin elastic films which truncates the dependence upon curvature at quadratic order. Canham derived the model when studying the biconcave shape of human red blood cells [18], and separately, Helfrich developed it to describe the minimum energy geometries of lipid bilayers and vesicles [19]. In three space dimensions, the Canham-Helfrich energy is EC (Γ) = Γ a1 + a2 (H − a3 )2 + a4 K dS, (1.5) where a1 is a constant that denotes the energy density per unit of surface area, a2 and a4 are the energy densities attributed to the respective curvatures, and a3 specifies the intrinsic, or zero-energy, value of the mean curvature. Due to the fact that phase separation is dominated by interfacial energies, the CanhamHelfrich model must be taken into consideration. Du et al. show that the Canham-Helfrich 6 model is sufficient to generate many important geometrical structures [20], and the CanhamHelfrich energy is considered to be generic [21]. However, the model brings with it some challenges. As a sharp interface model, Canham-Helfrich models do not have the ability to account for topological changes, nor can it predict the dependence of its parameters on the interfacial structure. To address some of these issues, Gurtin and Jabbour proposed a diffuse interface model that accounts for the smoothing of sharp corners found in grain boundaries of crystalline materials by limiting the energy dependence on curvature to a constitutive framework[22]. 1.4 The Functionalized Cahn-Hilliard Model The Functionalized Cahn-Hilliard (FCH) model was developed by Promislow to describe nanostructure morphology changes in functionalized polymer chains that have been hydrated with a polar solvent [23]. Hydrocarbon backbones of long polymers are hydrophobic, and when they are made into membranes, they exclude any polar solvent such as water. To make such a membrane useful in applications such as Polymer Electrolyte Membrane (PEM) fuel cells, there must be a porous network of thin water channels to allow conduction of protons while excluding electrons [24]. The immiscibility of the water and polymer can be reversed by a functionalization process whereby acid terminated side-chains are added to the polymer backbone. This functionalization embeds latent energy into the membrane that can be released when water is introduced. This energy reduction occurs when the polymer and water phases separate, and a polymer-solvent interface is formed. At the interface, the acid groups can minimize energy by releasing their protons and solvating the bound negative ions. Most phase field models are obtained from a perturbation about a spatially uniform 7 density. The FCH model is fundamentally different because it is obtained by perturbing a model which stabilizes preferred geometries. The Functionalized Cahn-Hilliard energy is E (u) = 2 1 2 ε ∆u − Wτ (u) − ε Ω2 ε2 η1 | u|2 + η2 Wτ (u) dx, 2 (1.6) where the first term in the integral comes from the variational derivative of the Cahn-Hilliard energy (1.1) which stabilizes bilayer, pore, and micelle structures. The second term gives the perturbation which promotes the growth of interface and competition between these geometries. As with the Cahn-Hilliard energy, the function u takes values from −1 to 1 and defines the volume fraction of the polymer and water phases (1.1). The parameter ε defines the thickness of the boundary layer separating phase domains. η1 and η2 are O (1) or O (ε) constant parameters that govern the nature of the energetic interations. η1 can be compared to the electrostatic energy of solvating the side chains, and η2 describes pressure associated with differing mixtures of the two phases. Wτ (u) is a double-well potential energy function as before, but it is no longer symmetric due to the difference in self-energies of polymer and water (Figure 1.2). Typically Wτ (u) = 1 2 1 (u + 1)2 2 (u − 1)2 − τ (u − 2) , where the positive parameter τ defines the amount that 3 the well is tilted. This asymmetry, along with η1 and η2 , gives rise to various geometries that can each minimize the energy. This behavior is not observed in the CH model with a tilted well because the addition of a linear tilt is eliminated by the H −1 gradient flow. In the FCH energy, the slope of the introduced linear term remains hidden in the positive squared term. Solutions of the FCH equation evolve on time scales ranging from O ε2 through O ε−2 8 0.5 0.4 0.3 0.2 0.1 0 −0.1 −0.2 −1.5 −1 −0.5 0 0.5 1 1.5 Figure 1.2 Tilted potential energy function W (u) [25]. At the fastest time-scales, O ε2 and O ε1 , initial data relaxes to form the proper front profiles between domains. When time is O (1), domains form according to a nonlinear diffusion equation which has stable equilibria at the zeros of W (u), specifically u (x) = −1. At longer O ε−1 time-scales, geometric structures evolve with a quenched mean curvature flow for the normal velocity of the domain fronts. Finally, for the longest time evolution, O ε−2 , domains that are separated in space compete for the minor phase, u = +1. The aim of this work is to accurately capture all of these disparate time-scales. Work by Gompper and Schick used small angle x-ray scattering data (SAXS) to motivate an energy model similar to the FCH model that describes amphiphilic systems such as two immiscible fluids mixed with a surfactant [26, 27]. In their study, they show a strong 9 connection between the Ginzburg-Landau energy model, the Canham-Helfrich surface energy, and experimental data. This connection can be combined with asymptotic analysis by Gavish et al. that shows convergence of the FCH energy to the Canham-Helfrich energy in the limit as ε → 0 [28]. Taken together, the analysis and experimental results suggest that the FCH model can be useful in probing the physically relevant geometries created by functionalized polymer/solvent systems. 1.5 Proposed Experimental Nanomorphologies Recently Gavish et al. described the nature of geometric minimizers to the FCH model [28]. In three dimensions, energy minimizing geometries include cylinders, inverted micelles, and bilayers. Experimental and numerical studies of a functionalized polymer, Nafion, have yielded a multitude of predicted geometric structures, some of which are shown in Figure 1.3. Nafion is a membrane made from sulfonated tetrafluoroethylene that was discovered in the 1960s by Walther Grot [33]. It is composed of long, hydrophobic, polymer chains that have been functionalized by attaching short, sulfonate-tipped, side-chains. Hydrolysis of these sulfonate groups can release latent energy, so small domains of water form inside the membrane to lower the free energy. Any continuum model for Nafion must be inherently binary due to the rarity of water embedded in the membrane. Water and low hydration ions form a common phase, so the value of u (x) represents the density of ions in water at each point in the domain. One of the first characterizations of Nafion was performed by Hsu and Gierke who predicted cluster chain pores [29]. A study by Ioselevich et al. attempted to combine experimental data from many experiments, and they believe that the Nafion backbones group to 10 Cylindrical pore morphology predicted by Rubatat et al. [31] Cluster chain morphology from work by Hsu and Gierke [29] Nafion backbone groupings suggested by Ioselevich [30] Morphology fit to SAXS data by SchmidtRohr and Chen [32] Figure 1.3 Proposed Nafion nanomorphologies based on experimental results provide a higher number of solvated acid groups per unit length [30]. Rubatat et al. used small angle x-ray scattering (SAXS) and small angle neutron scattering (SANS) to probe the structure of Nafion, and they concluded that the water domains were cylindrical in shape [31]. Schmidt-Rohr and Chen predicted parallel cylindrical pores and then numerically adjusted the two dimensional cross-section until it matched SAXS data [32]. From essentially the same experimental data, many different geometries have been predicted. Through simulations on the Functional Cahn-Hilliard equation, we intend to investigate the range of possible nanomorphologies that minimize the FCH energy and provide tools to better describe the balance of curvature and surface energies. 11 1.6 Functionalized Cahn-Hilliard Equation Before running simulations on the FCH model, we must convert it into a differential equation. The FCH energy (1.6) can be formally derived from the CH energy (1.1) by assigning a negative value to the interfacial energy via the Cahn-Hilliard energy, ECH , then balancing it against the square of its own variational derivative. Thus the FCH energy can be written as E (u) = 1 Ω2 δECH 2 dx − ηECH , δu (1.7) where we have simplified to the case η = εη1 = εη2 . Physically this describes electrostatic energy competition. The square of δECH δu describes phase separation that lowers energy by separating the polar solvent and non-polar polymer. Opposed to this is the negative CahnHilliard term, −ηECH , which lowers the energy by solvating the ionic polymer side-chains. Again, we introduce the chemical potential using formula (1.2) and note that the variational derivative of (1.6) can be written in terms of µ= δE = δu δECH δu δ 2 ECH −η δu2 to obtain δECH δu = − ε2 ∆ − W (u) − η ε2 ∆u − W (u) . (1.8) If we reintroduce the different values for η1 and η2 we obtain µ = − ε2 ∆ − W (u) − εη1 ε2 ∆u − W (u) − ε (η1 − η2 ) W (u) . 12 (1.9) As before, we take the H −1 gradient flow on the chemical potential. This gives the working form of the FCH equation, ut = −∆µ = ∆ ε2 ∆ − W (u) − εη1 ε2 ∆u − W (u) − ε (η1 − η2 ) W (u) . (1.10) A similar derivation has been shown by Gavish et al. for the Allen-Cahn like version of the equation [28]. That version of the model does not conserve mass, so it must be explicitly accounted for with a zero-mass projection at each step. This is accomplished by subtracting off the average change in solution from every point in the domain. With the FCH equation we need to specify initial and boundary conditions. The initial condition is simply u (x, 0) = u0 (x), where u0 is a function that identifies the composition of the material at each point in space and varies between −1 (pure polymer) and 1 (pure solvent) over the entire domain, Ω. The boundary conditions can take many forms depending on the set up of the experiment, but for now we use periodic boundary conditions since we are simulating a small domain inside bulk material of the same composition. 1.7 Numerical Challenges for the CH and FCH The Cahn-Hilliard and Functionalized Cahn-Hilliard equations both have some significant numerical challenges when simulating solutions. When the CH equation (1.3) is written out 2 1 with the standard double well potential, W (u) = 4 u2 − 1 , we have the equation: ut = −∆ ε2 ∆u − u3 + u . 13 This equation is notoriously difficult to approximate numerically due to the small time step size required to maintain stability of the solution. The behavior is often referred to as stiffness in the numerical approximation, and for this equation, it is due to both the harmonic operators and the nonlinear operator [34]. As an example, if the forward Euler method is used and ε = 0.01, the time steps must be on the order of 10−7 for the method to be stable. What makes the problem worse is that the fastest evolution in the solution is O ε2 ∼ 10−4 and Ostwald ripening (growth of large phase domains at the expense of smaller domains) is on a much slower O (1) time scale [35]. This means that an explicit method requires far too many time steps to get any reasonable results from a simulation. On the other hand, if backward Euler is used with the hope of obtaining a larger time step, the nonlinearity causes implicit solvers to fail due to large time steps moving the solution out of the basin of attraction (as in Newton’s method). The implicit time step restriction is not quite as dramatic as the restriction for an explicit method, but it is still unreasonably inefficient. When attempting to simulate solutions to the FCH equation (1.10), the set of numerical challenges includes the difficulties from the CH equation, but it also has other challenges as well. The equation includes a third laplacian operator and the tilted potential well increases the complexity of the polynomial terms (Figure 1.2), but the biggest difference is not numerical but physical. In the CH model, the complexity of the solution geometry decreases in time because a decrease in phase interactions leads to a decrease in energy. In the FCH model the solution can also decrease energy by increasing the amount interface between phases, so solutions to the FCH equation can become more complex as time evolves. These extra challenges require a researcher to think carefully about the types of methods to use when working on the FCH equation, but methods that have worked for the CH equation are 14 the best place to start looking for useful ideas. 1.8 Numerical Methods for the CH Equation Since the FCH model is new, numerical methods to simulate solutions to it have not yet been developed. Development of a time-stepping method for the FCH equation is the purpose of this research. Due to the strong connections between the CH and FCH models, a review of methods for solving the Cahn-Hilliard equation is necessary. A literature search returns far too many papers to discuss here, so this review will only touch on a few papers from each of the most relevant numerical methods. I divide the review into two parts: spatial schemes and time-stepping schemes. Any good numerical method for simulating the CH or FCH equation will require both a spatial and temporal component, and the connections between the two could be very important to the performance of the method. 1.8.1 Spatial Schemes The most common numerical method for solving the Cahn-Hilliard equation (1.3) has been the method of finite elements. Elliot and French were active in this area of research in the late 1980s. In their research, they developed several different finite element methods, each with slightly different properties. In the first paper, they present a method for solving the CH equation in one dimension and discuss the severity of the time step restriction due to stability [36]. Two years later with Milner, they propose a method that is second order in space and only requires the elements to be continuous [37]. In a second paper in 1989, Elliot and French present a non-conforming finite element approach and give proofs for the accuracy and convergence of the method [38]. 15 In 1991, Du and Nicolaides presented a semi-discrete finite element method that has the advantage of a Lyapunov function [39]. The existence of a Lyapunov function for their method makes it possible to prove convergence of the approximate solutions without any conditions beyond the existence and uniqueness of solutions to the original differential equation. Later, Barret et al. used a finite element method to model solutions to the CH equation with non-constant surface diffusion [40]. In the last decade, Feng and Prohl developed and analyzed a mixed finite element scheme, and they proved some error bounds that were O ε−1 , rather than O exp 1 ε which was the previous result [41]. A common thread through several of the methods, including the finite element schemes, was the use of multigrid techniques. In 2006, Kay and Welford gave a multigrid finite element scheme with mesh independent convergence rates [42]. It is interesting to note that this paper also discusses the difficulty of time step restrictions. An earlier paper by Kim et al. uses a conservative multigrid method to solve the CH equation. The solution is then coupled to a projection method to solve fluid flow with the Navier-Stokes equation for a two fluid system [43]. In the last few years, several papers have been published by Wise and his colleagues using non-linear multigrid finite difference schemes on the Phase Field Crystal model [7, 44, 45]. The Phase Field Crystal model is an extension of the Cahn-Hilliard model that has strong anisotropy so that it can effectively model crystal growth and formation. Their schemes are unconditionally stable with respect to time step size because they use a convex splitting technique developed by Eyre [46, 34]. The technique will be used for the method developed in Chapter 2, and it will be discussed extensively there. Previous to the Wise papers, Furihata published a finite difference scheme that was second order in time and space but suffered from the severe time step conditions on stability [47]. 16 Spectral methods have also played an important role in simulating solutions to the CH equation. In 1999, Zhu et al. applied a semi-implicit spectral method to the CH equation to study the long time behavior in a two dimensional domain [48]. In their scheme, they treated the principle elliptic operator implicitly and all the other terms explicitly. Later, Vollmayr1 Lee and Rutenburg used a spectral method to identify an O t 3 time step size which can control accuracy for a certain class of convex splitting schemes [49]. They came to the conclusion based on Eyre’s Theorem [46] and von Neumann analysis. In 2005, Ye and Cheng published a paper discussing the inheritance of energy dissipation and mass conservation in spectral methods [50]. Recently, Shen and Yang published the results of error estimates for several methods used to simulate the CH equation, and in their conclusion, they recommend spectral methods because of their effectiveness in capturing the interface fronts in the solution [51]. Lastly, discontinuous Galerkin methods have been applied to the CH equation with some success. In 2006, Wells et al. discussed a discontinuous Galerkin scheme and compared the results and convergence to a standard finite element method [52]. A year later, Xia et al. presented a local discontinuous Galerkin scheme that comes with a proof of energy stability [53]. The paper includes numerical accuracy results and also some results on a ternary system which could be of interest to our group in future research. 1.8.2 Temporal Schemes Chapters 2, 3, and 4 will discuss in depth three temporal schemes for the Functionalized Cahn-Hilliard equation: convex splitting, fully implicit, and exponential time differencing. Before delving into the work of this dissertation, I will review these three temporal methods for the Cahn-Hilliard equation and other partial differential equations. 17 In 1998, Eyre proposed a gradient splitting method for stabilizing the numerical computation of equations that have the form ut = F (u) = ∆ δE . Eyre applied this method to δu the Cahn-Hilliard equation (1.3) to take advantage of the convexity of the associated CahnHilliard energy (1.1). By appealing to convexity, the implicit portion of the discretized equation discussed in Section 1.7 will have a unique solution. This small adjustment in the method completely removes the time step restriction. The method is discussed in two papers, but only one of them was actually published [34, 46]. Several other authors have used convex splitting of the CH energy or similar models to improve performance of their numerical simulations. These methods are valuable because the convex splittings derived guarantee a unique solution at each time step and provide for methods that decay in energy at each step. In 2009, Wise, Wang, and Lowengrub presented an energy stable scheme for the phase field crystal equation [54]. Following this in 2010, Wise used the same scheme to approximate solutions to the Cahn-Hilliard-Hele-Shaw system of equations in a nonlinear multigrid framework [45]. Later, Gomez and Hughes developed a convex splitting based on the fourth derivative of W (u) that attains second order convergence in time [55]. A method that does not fall into the convex splitting category but retains gradient stability is the implicit-explicit stabilized scheme. Stabilization comes from adding and subtracting a term that stabilizes inversion of the linear portion of the operator. Papers in 2010 by Shen and Yang develop a first order version of the method [56] and review several other energy stable methods [51]. A later paper in 2012 by Shen, Wang, and Wang extends their analysis to a gradient stable scheme that is second order accurate in time. Before the development of gradient stable schemes, simulation of the CH equation was primarily performed with implicit schemes since explicit computations suffer from severe 18 time step restrictions. An early paper on computing the CH equation with implicit time stepping in one dimension comes from Elliot and French [36]. Further work with Milner provided a second order in space calculation [37], and French alone published a paper on an implicit scheme for the CH equation in two dimensions [57]. Lastly, a paper by Du and Nicolaides established a scheme with better stability properties [39]. More recently in 2008, Kronbichler and Kreiss published a paper on two phase flows where they used implicit time stepping for the Cahn-Hilliard portion [58]. In 2011, Willoughby completed his doctoral dissertation with Brian Wetton on implicit time stepping methods for the Allen-Cahn and Cahn-Hilliard equations [59]. The third method I implement in this dissertation is the exponential time differencing (ETD) method. ETD methods for ordinary differential equations originated as early as 1960 in work by Certaine [60] and Pope [61]. In the nineteen-sixties and seventies, A great many papers followed that refined the idea, but it eventually fell out of favor because it was infeasible on the current computing hardware. An extensive review of ETD and other similar methods was published by Minchev and Wright [62]. With recent advances in computing hardware and parallelism, ETD methods have finally become effective for partial differential equations. In 1994, Hou, Lowengrub, and Shelley used an exponential linearization to remove the stiffness from the computation of interfacial flows with surface tension [63]. This lead to a rash of papers being published within the last fifteen years on the topic of ETD methods for PDEs. In 1998, Beylkin, Keiser, and Vozovoi develloped discretization schemes for nonlinear PDEs [64]. Four years later, Cox and Matthews expanded the schemes to include higher order Runge-Kutta methods [65]. In 2004 and 2005, Du and Zhu published a pair of papers analyzing the stability of ETD-RungeKutta (ETD-RK) methods and proposing a complex integration for stabilizing higher order 19 terms [66, 67]. Kassam completed his doctoral dissertation on high order time-stepping for semilinear PDEs and published a paper with Trefethen [68, 69]. In 2005, Hochbruck and Ostermann published a pair of papers on ETD-RK methods for parabolic problems [70, 71]. They later published a review paper on “the construction, analysis, implementation and application of exponential integrators” [72], where they focus on the two dominant types of stiffness; equations where the Jacobian has eigenvalues with large negative real parts, or highly oscillatory problems with large, purely-imaginary eigenvalues. Tokman has published a series of papers focused on efficient computation of the exponential coefficients that arise in ETD schemes, particularly how to compute the coefficients for large scale computing [73, 74, 75]. Most recently, Koskela and Ostermann published a paper on extending exponential time differencing to higher orders using Taylor expansion [76]. 1.9 Spectral Method To effectively study temporal schemes we employ a Fourier spectral method in space. We consider a standard Fourier pseudo-spectral discretization of (1.3). The function u is approximated by the vector U defined on a discrete grid of N points (xj = jh, j = 1, . . . N ) equally spaced with spacing h = 2π/N . The discrete Fourier transform of U is denoted by ˆ U: 1 ˆ Un = N N e−i(j−1)(n−1)/N Uj j=1 which we can write in vector form as ˆ U = FU 20 where F is the matrix representation of the discrete Fourier transform. In the pseudo-spectral setting, U can also be considered to approximate u in the sense N ˆ Un (t)ei2παn x u(x, t) ≈ n=1 where αn are the appropriately aliased n values. We then use the usual spectral approximations for derivatives on grid points uxx ≈ F −1 Λα F U := ∆h U 2 where Λα is the diagonal matrix with entries −αn . The inverse discrete Fourier matrix F −1 = N F ∗ . Note that the matrix ∆h can be formed explicitly [77] but we only need the property that it is possible to multiply by it efficiently using the FFT. With these approximations, we can write a method of lines (MOL) discretization (a semi-discretization in space but keeping time continuous) of (1.3): dU = − 2 ∆h ∆h U + ∆h W (U). dt (1.11) where by W (U) we mean the vector with entries W (Uj ). Throughout this document we will focus on the approximation to the time-stepping and treat the analysis in a semi-discretized form. All of the computer codes are discretized in space using this Fourier spectral decomposition, and the operators are built explicitly in the frequency domain to obtain efficiency. 21 Chapter 2 Convex Splitting for the FCH Equation 2.1 Gradient Splitting Method We begin by giving a detailed description of Eyre’s convex splitting method. If an energy, E (u), is strictly convex, then its H −1 gradient is contractive. By definition, a functional, F (u), is weakly contractive in the H −1 Sobolev space if it satisfies F (u) − F (v) , u − v H −1 ≤ 0, for all u, v ∈ H 2 (Ω). Using integration by parts on the H −1 inner product with periodic boundary conditions, this is equivalent to (F (u) − F (v)) , (u − v) L2 ≤ 0, (2.1) Such a function gives energy decay in the approximation to differential equation, i.e. U n+1 − U n = F U n+1 ⇒ E U n+1 < E (U n ) , k 22 (2.2) where U n+1 and U n are the numerical solutions and times tn+1 and tn respectively, and k is the size of the time step. On the other hand, if the negated energy, −E (u), is strictly convex, then F (u) is expansive, we have F (u) − F (v) , u − v H −1 ≥ 0, which gives U n+1 − U n = −F (U n ) ⇒ E U n+1 < E (U n ) . k (2.3) Neither the Cahn-Hilliard energy (1.1) nor the Funcionalized Cahn-Hilliard energy (1.6) are convex, however, they can be split into convex and concave terms. After the separation of energy terms, we obtain the splitting of the gradient function into F (u) = Fc (u) − Fe (u) . (2.4) Fc (u) and Fe (u) are the respective contractive and expansive parts of the function F (u). With the gradient function split, Eyre proved that the scheme U n+1 − U n = k Fc U n+1 − Fe (U n ) (2.5) is consistent, gradient stable for any k > 0, and possesses a unique solution for each time step. The ability to use any size of time step is a massive improvement over the previous restrictions, but the method is only first order in time. We believe that we could improve the method by applying a deferred correction step to the time updates [78]. The essence of the method is that we take an explicit step on the expansive terms, and it comes with a large amount of error, but the implicit step on the contractive terms makes up for the instability of the explicit portion and keeps the solution decaying in the energy landscape. 23 2.2 Gradient Splitting Example To better understand the gradient splitting method, take the Cahn-Hilliard equation in one dimension, du = −ε2 uxxxx + u3 − uxx . dt xx 2 The first terms is contractive since it come from a convex term in the energy, ε2 | u|2 . and respectively. Using (2.1), we can show that it is contractive as follows, 5 − ε2 ∂x (uxxxx − vxxxx ) , ∂x (u − v) L2 = −ε2 ∂x (u − v) , ∂x (u − v) 2 L 2 3 3 3 = −ε2 ∂x (u − v) , ∂x (u − v) 2 = −ε2 ∂x (u − v) ≤ 0 (2.6) L 2 1 The second term also comes from a convex term in the energy, namely 4 u4 , and can be shown to be contractive in a similar fashion. ∂x u3 xx − v3 xx , ∂x (u − v) L2 = ∂x u3 − v 3 xx , ∂x (u − v) L2 3 4 = ∂x u3 − v 3 , ∂x (u − v) 2 = − u3 − v 3 , ∂x (u − v) 2 L L 4 4 = − (u − v) u2 + uv + v 2 , ∂x (u − v) 2 ≤ − u − v, ∂x (u − v) 2 L L 2 2 = − ∂x (u − v) , ∂x (u − v) 2 2 = − ∂x (u − v) ≤ 0 2 L 2 24 The third term is the backward heat operator and is expansive since it comes from − 1 u2 . 2 We can show that it is expansive by again using (2.1), 3 − ∂x (uxx − vxx ) , ∂x (u − v) L2 = − ∂x (u − v) , ∂x (u − v) 2 L 2 2 = ∂x (u − v) , ∂x (u − v) 2 2 = ∂x (u − v) ≥ 0 (2.7) 2 2 L Thus we can split the right hand side into (2.4), where the contractive and expansive functions are Fc (u) = −ε2 uxxxx + u3 xx Fe (u) = uxx . Using (2.5), the gradient splitting scheme is 2 2 U n+1 − U n = k∂x −ε2 ∂x U n+1 + U n+1 3 − Un (2.8) To see the nonlinear Newton solve needed, rewrite the equation as 2 2 U n+1 − k∂x −ε2 ∂x U n+1 + U n+1 3 2 = 1 − k∂x U n . For each time step, this equation can be solved by calculating the right hand side from the solution at the previous time step, then using Newton’s method to iteratively solve the nonlinear implicit portion, and thereby obtaining the solution at the next time step. Due to the contractive nature of the implicit terms, Newton’s method is now guaranteed to converge for any time step size, k > 0, as discussed in Sections 1.7 and 2.1. 25 2.3 Splitting the FCH Equation The gradient splitting method depends on the choice of mixing potential function W (u), since a different choice would give a different collection of convex and concave terms. The tilted, double-well potential used in this research is W (u) = 1 (u + 1)2 2 1 τ (u − 1)2 + (u − 2) 2 3 (2.9) and is shown in Figure 1.2. The parameter τ governs the amount of tilting in the double-well and has a significant impact on the possible minimal energy geometries. With the potential substituted into the FCH equation (1.10), we obtain the full PDE: 1 ut = −ε4 ∆3 u+ε2 ∆2 u3 + ε2 τ ∆2 u2 −ε2 (2 + εη1 ) ∆2 u+3ε2 ∆ u2 ∆u +ε2 τ ∆ (u∆u) 2 5 1 2 1 τ − ∆ 3u5 + τ u4 + τ − 4 − εη2 u3 − (6 + εη2 ) u2 + 1 + εη2 − τ 2 u (2.10) 2 2 2 2 Considering this equation with a first order approximation of the time derivative, most of the terms on the right hand side of the equation can be classified as contractive or expansive gradients of their corresponding contribution in the energy functional (Equation 1.6). One serious exception is the term 2 ∆ ∆u3 + 3u2 ∆u , which is from the Euler-Lagragian variation of the energy component Ω − 2 u3 ∆udΩ. This energy term is neither convex nor concave. Note that 3u2 ∆u + ∆u3 = −6u | u|2 + 6 · u2 u . To guarantee gradient stability for the scheme, we must adopt the semi-implicit terms −6un+1 | un |2 + 6 · un+1 2 un+1 while classifying the other terms as previously discussed. This gives the unconditionally 26 gradient stable nonlinear semi-implicit scheme un+1 − un =∆ k 4 + 2 η (un )3 − 2 2 + 4 η ∆un − 4 ∆2 un+1 − 6 2 un+1 | un |2 +6 2 · un+1 2 un+1 − 3 un+1 5 + 1 + 2 η un+1 . (2.11) Unfortunately, the mixed implicit-explicit terms make the numerical scheme more complicated and significantly decrease its speed and efficiency. We therefore chose to treat the difficult term fully implicitly and include it in Fc (u). The additive splitting can then be completed term by term and substituted into Equation 2.5. This gives the nearly contractive function Fc (u) = −∆ ε2 ∆ ε2 ∆u − u3 − ε2 3u2 + τ u ∆u + 3u5 τ 1 − (6 + εη2 ) u2 + 1 + εη2 − τ 2 u , (2.12) 2 2 and the corresponding expansive function 1 Fe (u) = ∆ ε2 ∆ (2 + εη1 ) u − τ u2 2 5 + τ u4 + 2 1 2 τ − 4 − εη2 u3 . 2 (2.13) Even though we cannot prove that this splitting is guaranteed to give energy decay for every time step size, every simulation we have performed with this method has demonstrated a decrease in energy. This slight deviation from Eyre’s original method seems to have little impact on the numerical stability of the scheme. 27 2.4 Stability and Solvability First, we will prove that the semi-discretized scheme is indeed unconditionally gradient stable. The proof is algebraic, relying simply on the integration by parts and Young’s inequality. We will prove stability for the simpler version of 2.11 which is the L2 flow on the energy, un+1 − un = 4 + 2 η (un )3 − 2 2 + 4 η ∆un − 4 ∆2 un+1 − 6 2 un+1 | un |2 k +6 2 · un+1 2 un+1 − 3 un+1 5 + 1 + 2 η un+1 . (2.14) Theorem 2.4.1. The semi-implicit scheme (2.14) is unconditionally gradient stable for any time step size k, i.e. F (un+1 ) ≤ F (un ) Proof. Choose the test function φ = − un+1 − un , multiply it to both sides of (2.14), 1 and integrate over the domain. On the left side, we have − k un+1 − un 2 ≤ 0 which approximates the theoretical decay rate −k ut 2 . Here and after, we use || · || for standard L2 norm in the inner product space. On the right side, we will recover the form F un+1 − F (un ) + R with R ≥ 0. This is computed term by term beginning with the linear terms. • − 4 ∆2 un+1 : we have 4 2 4 ∆un+1 , ∆ ∆un+1 • − 1 + 2 η un+1 : we have 1 + 2η 2 2 un+1 − un , which equals 2 − ∆un 2 + ∆un+1 − ∆un . (2.15) 1 + 2 η un+1 , un+1 − un , which is un+1 2 2 − un 2 + un+1 − un . 28 (2.16) • − 2 2 + 4 η ∆un : after integrating by parts, it reads 2 2 + 4η un , un − un+1 , and therefore can be written as 2 + 1 4 η 2 un 2 − 5 • −3 un+1 : it reads 3 un+1 3 un+1 5 5 un+1 2 + un − un+1 2 . (2.17) , un+1 − un . We shall show that , un+1 − un ≥ 1 2 1, un+1 6 − (un )6 . (2.18) Define a new function f un+1 = 3 un+1 5 un+1 − un − 1 2 un+1 6 − (un )6 . We show that f un+1 ≥ 0 by first noting that f (un ) = 0. Further, f 15 un+1 4 un+1 = un+1 − un which is non-negative for un+1 > un , non-positive for un+1 < un . Thus (2.18) holds. • 4 + 2 η (un )3 : a simple calculation gives a result similar to (2.18), i.e. (un )3 un − un+1 ≥ 1 4 (un )4 − (un+1 )4 . Thus 4 + 2 η (un )3 , un − un+1 ≥ • −6un+1 | un |2 + 6 · un+1 2 1+ 1 2 η 4 1, (un )4 − un+1 4 . un+1 : note that for the first term we have 6un+1 | un |2 un+1 − un ≥ 3 un+1 29 2 | un |2 − 3 (un )2 | un |2 , (2.19) therefore 1, 6un+1 | un |2 un+1 − un ≥ 1, 3 un+1 2 | un |2 − 3 (un )2 | un |2 . (2.20) The second term gives 6 un+1 2 , un+1 un+1 − un 1, 3 un+1 ≥ 2 un+1 2 − 3 un+1 2 | un |2 . (2.21) Adding (2.21) to (2.20), we obtain 2 −6un+1 | un |2 + 6 2 · 1, 3 un+1 un+1 2 2 un+1 un+1 , − un+1 − un 2 ≥ − 2 1, 3 (un )2 | un |2 . (2.22) From (2.15 - 2.22), we conclude that F un+1 ≤ F (un ), with F (u) defined by (1.6). Next, we will show that there is at most one solution to the nonlinear equation (2.14) for any time step k, therefore, there shall be no restriction from the solvability of (2.14) Theorem 2.4.2. The nonlinear equation (2.14) has only one solution in H 2 (Ω) for any time step k. Proof. Assuming un+1 = w, v are distinct solutions, by subtraction it follows that w−v = − 4 ∆2 (w − v) − 2 6w | un |2 − 6v | un |2 + 6 2 k −6 2 · w2 w · v 2 v − 3w5 + 1 + 2 η w − 3v 5 − 1 + 2 η v . 30 (2.23) Multiplying both sides of (2.23) by w − v and integrating over the domain, the left hand side of (2.23) becomes 2 1 k , (w − v) ≥ 0. It suffices to show that the right hand side of (2.23) is non-positive, thus w = v in H 2 (Ω). The right hand side gives − 4 ∆ (w − v) 2 − 6 2 | un |2 , (w − v)2 − 1 + 2 η − 3 w5 − v 5 , (w − v) + 6 2 · (w)2 (w) − 6 2 · (v)2 w−v 2 (v) , w − v . It suffices to show that g (w, v) = − 4 ∆ (w − v) 2 − 3 w5 − v 5 , w − v + 62 · w2 w − 6 2 · v 2 v, w − v ≤ 0. (2.24) 2 It is easy to check that 3 w5 − v 5 (w − v) ≥ w3 − v 3 , therefore 2 g (w, v) ≤ − 4 ∆ (w − v) 2 − w3 − v 3 + 6 2 · w2 w − 6 2 · v 2 v, w − v . (2.25) Integration by parts gives 62 · w2 w − 6 2 · v 2 v, w − v = − 2 6w2 w − 6v 2 v, (w − v) = 2 w3 − v 3 , 2 ∆ (w − v) . Using Young’s inequality, it follows that 62 · w2 w − 6 2 2 · v 2 v, w − v ≤ 4 ∆ (w − v) 2 + w3 − v 3 . 31 (2.26) Adding (2.26) to (2.25), we have g (w, v) ≤ 0, therefore w − v 2 = 0, i.e., w = v. So far, after assuming existence and regularity of the solution to (2.14), we have shown that the semi-discrete scheme (2.14) is unconditionally gradient stable and admits at most one solution. We would like to comment that 2 Remark 1. When the finite element method with solution base Hh (Ω) is used, the above analysis holds with suitable boundary conditions. However, Theorem 2.4.1 and 2.4.2 may not be true for other spatial discretization. The motivation of designing such a scheme is due to the fact that one of the major difficulties for the numerical simulation of (1.10) is an accurate time stepping strategy. Remark 2. The generalization of this proof to the conservative H −1 flow (2.11) in the semi-discrete framework is straightforward by using the test function φ = ∆−1 un+1 − un . 2.5 Fixed-Point Iteration To solve the implicit portion of equation (2.5), we apply Newton’s method G (U r ) U r+1 − U r = −G (U r ) , (2.27) where r is the iteration number, and we define G U n+1 = U n+1 − U n − kFc U n+1 + kFe (U n ) . 32 (2.28) We desire to iteratively solve for U r such that G (U r ) < with a desired tolerance. First we calculate the Fr´chet derivative e G (U r ) v = v − kFc (U r ) v, (2.29) with peturbation v and Fc defined as Fc (U r ) v = −∆ ε2 ∆ ε2 ∆v − 3 (U r )2 v − ε2 (6U r v + τ v) ∆U r − ε2 3 (U r )2 + τ U r ∆v 1 +15 (U r )4 v − τ (6 + εη2 ) U r v + 1 + εη2 − τ 2 v . (2.30) 2 The Fr´chet derivative is necessary in this use of Newton’s method because G (u) is a funce tional on H 6 (Ω). The Fr´chet derivative of a function f (u) is defined as the linear operator e A (u) that satisfies the following limit, f (u + v) − f (u) − A (u) v = 0. v v →0 lim It can be easily calculated by computing f (u + v) − f (u) and then keeping only the terms that are linear in v. This is precisely how we obtained (2.30) from (2.12). It is important to note in equation (2.29) that G (U r ) is a complicated function of U r . Computing this operator for every step of the iterative solve is computationally infeasible. Instead, we approximate G with the operator 1 ˜ G = 1 + k∆ ε4 ∆2 − c2 ε2 ∆ + c1 + 1 + εη2 − τ 2 2 (2.31) which is independent of the phase-field solution U r , and be computed entirely spectrally 33 which is a very quick calculation. c1 and c2 are O (1) empirical constants that approximate the maximum values of the terms depending on U r . With this approximation, the new fixed-point method loses the second order convergence of Newton’s method, but the total computational cost is much lower for each iterative process. We do not yet have a proof that this splitting applied to the FCH equation (1.10) is a contraction mapping, so we will present a proof that is applied to the Cahn-Hilliard equation (1.3) [79]. The proof is in a finite element framework, even though our method uses a Fourier method. The approximate Newton iteration for the CH equation is 1 + kε2 ∆2 − kC∆ ur+1 = k∆ (ur )3 + un − k∆un . (2.32) We will prove that (2.32) is a contraction mapping for small enough k. We first change (2.32) into a system of equations,    U − kC∆U − k∆P = v , (2.33)   P + ε2 ∆U = u3 − v where v = un . For the partition kh 1 in the finite element space σh = Hh (Ω), we look for a pair (U, P ) ∈ σh such that    U, φ + kC   P, ψ − ε2 U, φ + k P, φ = v, φ (2.34) U, ψ = u3 − v, ψ holds for any φ, ψ ∈ σh . We define a mapping Πv : σh → σh for any v ∈ σh , so that U = Πv (u). Lemma 2.5.1. For given v with v 2 = α, there exists a constant β > 0 such that Πv 34 maps the ball S = {u ∈ σh : u 2 ≤ 2α} into itself if k ≤ βε2 hδ . hδ is the mesh size related parameter. Proof. Letting φ = ε2 U and ψ = kP in (2.34), we have   2  ε U 2 + kCε2 U 2 + kε2 P, U = ε2 v, U 2 2   k P 2 − kε2 U, P = k u3 − v, P 2 . (2.35) Summing the two equations gives ε2 U 2 + kCε2 2 U 2 + k P 2 = ε2 v, U + k u3 − v, P , 2 2 (2.36) and applying the identity a2 − ab = 1 a2 − 1 b2 + 1 (a − b)2 leaves us with the inequality 2 2 2 U 2 + 2kCε2 2 2k 2k U 2 + 2 P 2 ≤ v 2 + 2 u3 − v, P . 2 2 2 ε ε (2.37) Using Young’s inequality, it follows that U 2 + 2kCε2 2 k U 2 ≤ v 2 + 2 u3 − v 2 . 2 2 2 ε (2.38) ˜ By inverse estimates in the finite element space, there exists C > 0 independent of hδ and 1 ˜ ˜ − α such that u 2 ≤ Cαhδ 2 for u 2 ≤ 2α, where hδ only depends on mesh size. C only ∞ depends on the degree of the underlying polynomial, so we can choose C = 35 1 ˜ −2 Cαhδ 2 or more practically u 2 so that ∞ U 2 + 2kCε2 2 2kC 2 2k U 2≤ v 2+ 2 u 2+ 2 v 2 2 2 2 2 ε ε 8kC 2 α2 2kα2 ≤ α2 + + 2 . ε2 ε Thus it suffices to require that 8kC 2 + 2k ≤ 3ε2 or equivalently k ≤ 3ε2 8C 2 +2 (2.39) ≤ βε2 hδ . Similarly, we can show that Πv is a contractive mapping if k ≤ βε2 hδ for some β > 0, but the proof is nearly identical so we will not repeat it here. 2.6 CPU version of code We found solutions to the Functionalized Cahn-Hilliard equation by applying a spectral method to (2.5) with Fc and Fe given by (2.12) and (2.13) respectively. We wrote the computer code in C++, but any number of languages could have been used. There are two primary reasons we chose a spectral method; thin interfaces between phases and simplicity of taking numerical derivatives. First, spectral methods offer the highest possible spatial resolution. Due to the physics of the materials modeled, solutions to the FCH equation develop O (1) changes over O (ε) intervals in space, so high spatial resolution is critical. We find that we can obtain reasonable solutions when we have as few as three grid points over the length of ε. For a cubic domain with 128 grid points along each edge, this resolution allows us to effectively capture up to forty-two changes in phase from polymer to water or visa versa, which is far more than we need in any reasonable simulation. Second, the properties of a spectral method simplify derivatives in higher dimensions. The 36 number of Laplacians needed to calculate the solution at each time step becomes difficult with other methods, but in the frequency domain, each Laplacian is a simple elementwise multiplication of the Laplacian operator array against the target array. The spectral method brings with it some other challenges. First and foremost is the computational cost of transforming large arrays back and forth between the frequency and spatial domains. Fortunately, the FFTW package provides the tools to do Fourier transforms quickly and can easily be adapted to parallel computing on multiple CPUs. Another restriction that comes with spectral methods is limitations on the domain shape and boundary conditions. Domain shape is limited to rectangular prisms since grid points must be spaced regularly in every direction. Further, the number of grid points in every direction should be a power of two to use the full speed of FFTW. When computing solutions spectrally, periodic boundary conditions come automatically, but any other boundary conditions become difficult or impossible to compute. Fortunately for us, periodic boundary conditions are reasonable for our simulations, but in future work, we may need to apply Dirichlet, Neumann, or mixed boundary conditions. At that time it will probably be necessary to switch methods. 2.7 Adaptive Time-Step In addition to the advantages of a spectral method, we can speed up the simulation by adapting the time step to the evolution of the solution. The balance between Cahn-Hilliard like coarsening and surface interface generation causes periods of rapid change in the solution interspersed between slow evolution. Due to the unconditionally stable nature of the numerical scheme, we are free to choose a time step as large as we would like at the cost of 37 numerical error. When choosing a time step size, we estimate the evolution with a completely explicit calculation based on the previous time step size. The residual is Rn = k [Fc (U n ) − Fe (U n )] ≈ U n+1 − U n . (2.40) Then we determine the new time step knew = ct , max (Rn ) (2.41) x∈Ω where ct is a constant that allows us to control the time step sizes globally and is typically a value near 1. In future work, we intend to replace this crude estimate with a control on the error rather than the change in the solution. A simulation that captures the importance of adaptive time stepping for the FCH is that of a sphere with an O ε2 perturbation. A sphere that has too large of a radius is an unstable steady state of the equation, so with a small perturbation the solution has very little change for a long time period. Eventually the evolution moves the solution away from the unstable steady state and collapses into a lower energy geometric solution. A graph of the energy that shows the necessity of adaptive time stepping is in Figure 2.1. It is easy to see the value of a time step that adapts to the solution. For evolution before T = 600, large time steps can be taken because there is very little change, but as the solution begins to change rapidly very small time steps must be taken in order to preserve the integrity of the approximate solution. Figure 2.2 gives still shots of the solution at important time intervals. Comparing to the energy, we see that the solution changes very little between T = 1 and T = 600 and it would be a severe waste of computing time if we 38 Figure 2.1 Energy trace of solution with unstable equilibrium initial condition. used uniform time steps that were small enough to capture the changes at T = 630 and later. After T = 700 the evolution of the solution slows, and it is reasonable to take larger time steps again. 2.8 FCH on Graphics Processing Units A recent development in scientific computing is General-Purpose computing on Graphics Processing Units (GPGPU). In the 1990s, Graphics Processing Units (GPUs) were developed to accelerate the building of 2D and 3D images for output to a display. Since then, the stream processing capabilities of GPUs have been turned to scientific computing, and the 39 T =1 T = 600 T = 630 T = 640 T = 650 T = 10000 Figure 2.2 An unstable steady state solution for the FCH equation application to scientific computing has lead to commercial lines of GPUs that have hundreds of computation cores but no display output hardware. GPU computation can be much faster if a problem or computation lends itself well to many lightweight computation threads. We used Nvidia GPU cards and the CUDA programming language which is C++ with extra functions to control the device (GPU) from the host (CPU). When simulating the FCH equation spectrally with the gradient splitting detailed in section 2.3, there are three main types of mathematical operations; Fourier transforms (FFT), inverse Fourier transforms (IFFT), and multiply/add calculations. All three of these operations can be adapted very well to GPU processing. Three dimensional Fourier transforms can be computed extremely fast on a GPU by breaking the domain into lines of grid points, 40 then handing each line of grid points to a separate processor for calculation of the one dimensional FFT. After each processor is done computing the frequency representation of the data, the GPU can recompile the data back together. Inverse Fourier transforms can be easily computed for the same reason. The cuFFT package has already been developed to implement FFT calculations on the GPU, and it’s easy to implement because the syntax and structure matches almost exactly with the FFTW library for C++. Once the data can be quickly converted into either the spatial or frequency domain, the entire equation is simply element-wise multiplication or addition in the appropriate domain. This plays directly into the strength of the GPU, millions of lightweight computation threads. There are however limitations to GPU computing that cannot be overlooked. Unlike a CPU that has access to any memory location, GPUs multiprocessors can only access the data that is stored in the GPU card’s memory. This leads to two complications; limitations on array size and movement of data. The GPU card we used had only 4GB of memory on the card, so for the number of double-precision, complex-valued arrays we used in the simulation, the size of the domain was limited to 223 total grid points. For a cubic domain, this limit corresponds to 128 grid points in each dimension. It is important to realize that this is a rigid memory limit because it is defined by the GPU card’s hardware. The second complication is moving data to the memory on the GPU. The movement of data on and off the card is the only way to access the solutions to the PDE, and data movement is much slower than computation speed. Moving arrays onto the card for every calculation would erase all of the speed-up obtained from using the hundreds of processors. Our answer to this limitation was to put the initial condition array on the card at the beginning and not take the array off the card except when the simulation ended or we needed to record the data. Without this adjustment when porting the code from C++ to 41 CUDA, the change would have been pointless. The final limitation that needs mentioning is the limit on double precision computations. There are some GPUs that are not capable of performing double precision calculations, but on the GPU we used, there was one double precision core on each multiprocessor, as opposed to eight single precision cores. This made the simulation much slower when we used double precision, but recent advances in GPU technology have rectified this. The newest GPUs for scientific computation have only double precision cores, which gives them up to sixteen times more double precision capability [80]. 2.9 GPU Speed-Up We conducted a speed test to compare simulation code written for parallel CPUs against the code written for the GPU. The CPU code was written in C++ and fully parallelized using OpenMP. The parallel version of FFTW was used for the Fourier transforms. The code was run on two Intel E5530 processors with four cores per processor. The processors have a clock speed of 2.4 GHz. The GPU used was a Tesla M1060 with 240 multiprocessor cores with a clock speed of 1.3 GHz, and it had 4GB of global memory. The two hardware configurations were both about two years old at the time of the test, so the speed result are comparable. The test was performed on initial conditions of random initial data for time up to T = 0.001, which required approximately 20 time steps. The solutions given by the two codes were exact up to implicit iteration precision. The simulation length was relatively short compared to our typical FCH simulations that compute solutions up to T = 10, 000 or even T = 100, 000. This was to show the impact of slow transfer of data between the host and GPU device. 42 Figure 2.3 gives the speed-up comparison for single and double precision calculations. For both the single and double precision experiments, the speed-up for a single time step was much higher than the total time speed-up due to the slow transfer of initial data onto the GPU card at the beginning of the simulation and transfer of the final solution off the card at the end of the simulation. As the simulation time grows longer, the total time speed-up converges to the speed-up per time step value since the data transfer to and from the GPU will become insignificant. The jagged nature of the single precision, per-time-step curve is due to the error in measuring extremely small computation times. If the same speed-up experiment was run on a state of the art GPU and parallel CPUs, the results are expected to be similar, however the improvement in number of double precision cores per multiprocessor on the Fermi line of GPUs leads us to expect results more like the ∼ 25x speed-up of the single precision experiment [80]. 2.10 Comparison of results to experimental data A mathematical model is only as good as its ability to describe real materials. Unfortunately, three dimensional images of Nafion are not available, but there are systems composed of amphiphilic diblock copolymer systems that obey similar surface energy dominated physics. Figure 2.4 shows a comparison between a previous 2D simulation on the FCH equation [28], and images from an amphiphilic diblock copolymer system experiment [81]. Three dimensional simulations also show pore network solutions that self-assemble from random initial data. Figure 2.5 shows a three dimensional pore network along with two other solutions that all had identical random initial data. The difference between the geometries is small changes in the parameters η1 and η2 in equation (1.10). 43 Another geometry that has been seen extensively in experiments is the pearled pore. Figure 2.6 shows an image from an experiment performed by Bendejacq et al. on a diblock copolymer system [82]. Next to the experimental image is a simulation of pearling from the FCH equation. The numerical simulation had initial conditions of a straight pore with radially symmetric O (ε) noise. The periodic boundary conditions prevent the pore from extending or contracting, and thereby force the solution into a frustrated quasi-steady state. We have not yet been able to find steady state pearling from random initial data in three dimensions, but we hope to find pearled pores and other interesting structures with a future parameter scan. 2.11 Conclusions We extended Eyre’s convex splitting scheme to the Functionalized Cahn-Hilliard equation and obtained a numerical scheme. We proved that the scheme had only one solution and that the energy could not increase for any size time-step. This shows that the method is unconditionally gradient stable with respect to the size of the time-step taken. The right hand side of the equation could not be split additively as desired, but by moving the mixed term into the implicit portion, we were able to obtain an effectively gradient stable method. A novel iterative technique significantly increased the calculation speed by eliminating the need to rebuild the inverted operator for each iteration. The operator is built once at the beginning of the computation saving computation time. Using this temporal scheme with a Fourier spectral method for the spatial discretization made a scheme that could be implemented on a graphics processing unit. The GPU code is extremely efficient, and the speed up allowed us to study large, three-dimensional domains over long evolution times. 44 35 Times Faster than CPU 30 25 20 15 Per timestep 10 Total time 5 16 17 18 19 20 21 22 Number of Points in Domain (Log 2P) 23 11 10 Times Faster than CPU 9 8 7 6 5 4 3 Per timestep 2 Total time 1 0 16 17 18 19 20 21 22 Number of Points in Domain (Log 2P) 23 Figure 2.3 Speedup of GPU over parallel CPU for a short FCH simulation computed using single precision (top) and double precision (bottom). 45 Figure 2.4 Two dimensional FCH simulation results (left) compared against images from a diblock copolymer experiment [81] (right) Pore network Lamella network Inverted micelles Figure 2.5 Geometries that minimize the FCH energy for slightly different values of η1 and η2 . The images show a level set near u = 0 with blue on the water side of the level set and green on the polymer side 46 TEM experimental image Experimental diagram Figure 2.6 Experimental results for a diblock copolymer pore showing pearling instability [82] (left) compared against result of a numerical solution of the FCH equation (right) 47 Chapter 3 Fully Implicit Method 3.1 Introduction Many material science problems require an understanding of the microstructure that develops in a mixture of two of more materials or phases over time as it phase separates during a casting or annealing process. One such equation is the well-studied Cahn-Hilliard [10] equation, written below in equation (3.2) in a one-dimensional (1D) setting, that describes a binary alloy during annealing. The parameter in the model describes the width of the layers between the regions. Such regions form in O(1) time in a spinodal evolution. Subsequently, they merge in a ripening process. Ripening happens on longer time scales, generically O(eC/ ) for 1D Cahn-Hilliard [83] and O(1/ ) in higher dimensions [11]. We extend the use of the terms “spinodal” and “ripening” to describe similar regimes in the evolution described by other equations. Phase regions undergoing Cahn-Hilliard evolution increase in size over time in a coarsening process. The statistics of this coarsening process are of interest [84]. The Cahn-Hilliard model is a sub-class of phase field models. A review of the use of such models in material science applications can be found in [6]. It can be shown rigorously that as → 0, solutions of Cahn-Hilliard equations have layers that tend to interfaces that move with a nonlocal geometric motion known as the Mullins-Sekerka flow [11]. Other phase field models also limit to geometric motion of other kinds. Understanding the limiting process 48 and studying it directly is of interest. In addition, Cahn-Hilliard equations and variants can be used in computational approximation of moving interfaces in so-called diffuse interface methods [85, 86] in which the problem for u is coupled to other variables describing other physics. While the computational approach developed in this chapter might be useful to some diffuse interface computations, we are motivated by a general class of pure (uncoupled to other physics) energy gradient phase field problems described below. There are several interesting generalizations of the Cahn-Hilliard equation. A lower order version (two instead of four spatial derivatives), the Allen-Cahn equation (3.1) [87] is also of interest in materials science, describing the evolution of crystal grains of the same material during annealing. This equation can also be called a Ginzberg-Landau equation. The aim of this chapter is to develop a numerical approach that can be applied to a wide range of phase field problems that can easily be adapted to new terms, higher order problems, and extension to vector solutions. It should be made clear we do not attempt to outperform well-developed codes with space and time adaptivity with fast, multi-grid solvers that have been developed for particular problems. Rather, we develop a reasonably fast time-adaptive technique with general applicability. Since many questions of interest in materials science are about the microstructure of a bulk material far from boundaries, it is reasonable to consider problems in periodic domains. We use a Fourier spectral discretization which is a natural choice in this setting. Although this does rule out spatial adaptivity, it does admit a fast implementation on Graphical Processing Units (GPU) in the computational framework we develop. We discretize in time using Backward Differentiation Formula (BDF) methods [88] of low order, which have good stability properties. Temporal error estimation is done with Adams-Bashforth (AB) [89] predictors. Newton’s method is used to solve the resulting nonlinear problems. The Jacobian 49 matrix in the solve for the Newton update is symmetric since it is the second variational derivative of an energy functional. It is also positive definite for time steps small enough (this is discussed in more detail below). Although the Jacobian is dense for spectral discretizations, multiplication by the matrix can be done quickly using the Fast Fourier Transform (FFT). This motivates our use of the conjugate gradient method [90] to solve the Newton updates. Such an approach used on high order problems requires an efficient preconditioner. We use a constant coefficient version of the problem that is a linearization at pure phase states, which will dominate the solution during ripening. This idea is similar to that used in [91] in fixed point iterations for time stepping for Cahn-Hilliard with operator splitting. Efficient performance is seen with our approach for a wide selection of scalar and vector problems from second to sixth order. Mild increase is seen in preconditioned conjugate gradient (PCG) iteration counts per time step as the time step is increased and is decreased. Exploration of the performance of the method specifically for the 1D Cahn-Hilliard problem, which has a well understood structure during ripening, shows that the number of PCG iterations per solve scales as O( δt/ ) for large time steps δt and small , independent of the spatial discretization. There have been many contributions to the numerical solution of the Cahn-Hilliard and related equations. Our work is novel in four ways: we exploit the symmetry of the Jacobian matrix for the fully implicit time stepping problem in a CG method; we propose and analyze the preconditioner for this Jacobian solve and show that it is effective for a number of problems; we implement the method in modern GPU architecture and get fast performance; we demonstrate that the Jacobian matrix is not singular for large time steps during ripening and that fully implicit time stepping leads to accurate solutions with these large time steps with energy decrease. 50 We discuss further the issue of time-stepping. For arbitrary discrete initial data, the fully implicit time discretization problem is known to have a unique solution only when the time step is small enough [39]. As the spatial grid size h = ∆x is reduced and the initial data on the refined grid is again allowed to be arbitrary, the time step that guarantees unique solutions to the implicit time discretization problem is reduced. In addition, it is not possible in general to show that a fully implicit time step leads to a decrease in the underlying energy. Guarantees of solvability for any time step δt and energy decrease are possible for some models with an operator splitting approach due to Eyre [46]. Although never published, this work has been very influential and some of the results are summarized in [49]. Several of the computational approaches cited in Chapter 1 use variants of this splitting approach. Some of the splitting techniques lead to nonlinear problems and we show that our ideas lead to efficient solution of these problems (with preconditioned CG iteration counts independent of δt and ). Guarantees of energy decay and unique solutions for any chosen time step δt are very attractive and so splitting techniques have dominated the thinking in numerical methods for these problems for many years. Our use of fully implicit time stepping may be seen as controversial. We first considered this approach because of some motivating high order, vector models for which the process of splitting the gradient terms into convex and concave parts was not straightforward. When implemented, we discovered that fully implicit time stepping did not suffer from severe time step restrictions as the literature predicted. That previous analysis was really for a “worst case” scenario and the solution structure through ripening processes allowed for large time steps to be taken. We also discovered computationally that Eyre’s splitting can lead to disproportionately large temporal errors during ripening and prohibits the use of large time steps appropriate to the dynamics there if time-accurate 51 solutions are required. This poor behavior can be understood with formal asymptotics on a simple problem shown below. Our fully implicit time stepping is able to take appropriate large time steps and maintain accuracy. While not the original intention of our work, our results suggest that fully implicit time stepping strategies are more efficient than popular splitting techniques when time accuracy is desired, at least for the selection of models we consider. We outline the following sections of the paper. In Section 3.2 we present the equations for the various models we consider and how they arise from an energy gradient flow. In Section 3.3 we give a basic description of the approach in a 1D setting using simple backward Euler time stepping. This is done for clarity of exposition, but it should be made clear that the approach has wide applicability, shown in Section 3.5 for a number of models in 2D and in Section 3.6 where a GPU implementation to Cahn-Hilliard in 3D is described. Higher order time stepping methods are discussed in Section 3.7. The performance of the preconditioner is examined numerically and with formal asymptotics in a simple 1D setting in section 3.4. A similar approach is taken in section 3.9 to investigate time stepping with operator splitting. 3.2 Models We consider first the very basic model for a scalar function u(x, t) in a 1D setting of the Allen-Cahn [87] equation ut = 2 uxx − W (u) (3.1) ut = − 2 uxxxx + (W (u))xx (3.2) and Cahn-Hilliard [10] equation 52 where W (u) = 1 (u2 − 1)2 . This is simply the one-dimensional version of 1.3. We consider 4 x ∈ [0, 2π] and u to be periodic in space. Qualitatively, the reaction terms W (u) drive the solution to the two pure states u = ±1 and the other terms smooth the interface between regions of different phases over a width of O( ). These equations can be written with the in different places corresponding to different time scalings. It is important to note that the results discussed below on the condition number of the preconditioned conjugate gradient method and its dependence on time step δt and are with respect to the scaling shown in equations (3.1) and (3.2) above. Higher dimensional versions of these equations are obtained by replacing ∂ 2 /∂x2 by the Laplacian ∆. The models above are gradient flows on the Cahn-Hilliard energy defined in 1.1. The fact that the evolution is a gradient flow leads to a symmetric Jacobian matrix for the implicit time step of the discretization, allowing the use of the conjugate gradient method for its solution. Note that the Allen-Cahn model is a gradient flow in the standard L2 inner product: 2π (u, v) := uvdx. (3.3) 0 However, the Cahn-Hilliard model is a gradient flow with respect to the H−1 inner product: (u, v)H−1 := (u, ∆−1 v). (3.4) In addition, we consider the scalar sixth order problem 1.10 where η is a given positive constant (note that with this sign, this term promotes the formation of phase interface). This is also a gradient flow of a certain energy in the H−1 inner product. Models of this form are of current interest in the study of pore formation in functionalized polymers [92]. 53 We also consider the following vector model for u = (u, v): ut = − 2 ∆∆u + ∆ u W (u) (3.5) where here 3 |u − ui |2 W (u) = (3.6) i=1 and ui are the points in the (u, v) plane that correspond to the cube roots of unity. This is a volume preserving model that forms symmetric triple junctions between three phases. It can be seen as the higher order mass preserving version of the Ginzberg-Landau equation presented in [93]. The extension of our computational method to higher dimensions and more complex (higher order, vector) models is relatively straight-forward. We consider this the main strength of our approach. 3.3 3.3.1 Basic numerical approach and results Spectral discretization in space In Section 1.9, we discretized the Cahn-Hilliard equation in space using a Fourier spectral method. The sixth order and vector models are discretized in a similar manner. It is a direct computation to show that the MOL discretizations above guarantee a decrease in a discrete energy:   N  2h  d |F −1 Ωα F U| + h W (Uj ) ≤ 0  dt  2 j=1 54 where in the left hand term |·| is the Euclidean norm and Ωα is a diagonal matrix with entries iαn . The left hand term in braces is the discrete analogue of the energy (1.1). Since the MOL discretization has this property, we can expect time discretizations to have the property also, at least up to the order of the truncation error of the method. This is observed in the variety of computational examples shown in the rest of this work. In fact, with the adaptive time stepping strategy we use, an increase in this discrete energy is never observed in any accepted time step including very large time steps used during ripening events. We continue with a description of the time discretization of these models below. 3.3.2 Adaptive, implicit discretization in time We consider the fully discrete approximation of u m u(jh, tm ) ≈ Uj , j = 1, . . . N and m = 0, . . . M with t0 = 0 where initial conditions are given and time steps δtm = tm − tm−1 are chosen adaptively as shown below. A basic, first order approach is shown here. Higher order methods are presented in section 3.7. They provide some efficiency gains but are not overwhelmingly superior for modest accuracy. Consider the semi-discrete Allen-Cahn equation (1.11). Starting at tm−1 we form the explicit, forward Euler predictor U∗ for the solution at time tm : U∗ = Um−1 + δtm 2 ∆ Um−1 h − W (Um−1 ) . (3.7) We use this as an initial step in a Newton iteration for the solution to an implicit, backward 55 Euler time step G(Um ) := Um − δtm 2 ∆ Um h + W (Um ) − Um−1 = 0 (3.8) We describe the solution procedure for this problem in the subsection below. Expanding the predictor and corrector steps in Taylor series, it is easy to show that the exact local truncation error ηe for this step can be approximated by ηe ≈ η := 1 ∗ U − Um 2 as is well known. We use maximum norms for all error calculations in this work. In timeadaptive computations below, we specify a given tolerance σ > 0 for the local truncation error. If a time step fails the accuracy check (η > σ) then we fail the step and repeat with the time step reduced by a factor of 1.3. This particular value is a somewhat arbitrary factor but there were very few failed steps in the computations shown in this work and changing this value does not change performance. A time step is also failed if the Newton iterations do not converge or if the step leads to an increase in energy (although as noted above, this failure was never observed in any of the computations described in this chapter). After a successful step, the next step is taken with time step δtm+1 = δtm max 0.8 σ , 1.3 η where the local truncation error is assumed to be dominated by its leading order quadratic term and 0.8 and 1.3 are “safety factors”. 56 3.3.3 Solution of the implicit system Consider (3.8), the implicit problem to be solved at each time step for the discretization of the Allen-Cahn equation (3.1). Let U(r) denote the r’th Newton iterate approximation of Um . The next iterate requires a solve with the Jacobian coefficient matrix J = I − δtm 2 ∆h + δtm Λ2 (3.9) where Λ2 is the diagonal matrix with entries (r) (r) W (Uj ) = 3[Uj ]2 − 1. The matrix J is dense but symmetric and multiplication by J can be done efficiently using the fast fourier transform and diagonal multiplication. This suggests the use of the conjugate gradient method as a solution technique. However, the condition number of J is large and the equivalent matrices for problems with higher order derivatives have even larger condition number. Efficient preconditioning is clearly required. We propose the preconditioner Q−1 where Q is the discretization of a constant coefficient problem Q = I − δtm 2 ∆h + 2δtm (3.10) which can be inverted efficiently. This is motivated heuristically by the observation that during ripening, the solution will have values approximately ±1 at most grid points. The behavior of the preconditioned gradient solver is examined in computational studies below and analytically in section 3.4. For the discretization of the Allen-Cahn equation, it is shown 57 that the condition number is O(δt) for large δt independent of during ripening. For the discretization of the Cahn-Hilliard problem (3.2) we have the following Jacobian matrix JCH and use the preconditioner QCH JCH = I + δtm 2 ∆h ∆h − δtm ∆h Λu (3.11) QCH = I + δtm 2 ∆h ∆h − 2δtm ∆h . (3.12) Note that for this problem and the ones below, the matrices are symmetric in the discrete version of the H−1 inner product (3.4). In the preconditioned conjugate gradient (PCG) algorithm [90] these inner products are used. The preconditioned Jacobian matrix is shown to have condition number O(δt/ ) in ripening states in this case. For the discretization of the sixth order problem (1.10) we have J6 = I − δtm 4 ∆h ∆h (∆h + ηI) + δtm 2 ∆h ∆h Λ2 (3.13) +δtm 2 ∆h (ΛL Λ3 + Λ2 ∆h + ηΛ2 ) −δtm ∆h (Λ2 + Λ1 Λ3 ) 2 Q6 = I − δtm 4 ∆h ∆h ∆h − δtm (η 4 − 4 2 )∆h ∆h (3.14) −δtm (4 − 2 2 η)∆h where here Λi , i = 1, 2, 3 is the diagonal matrix with entries di W (r) (U ) dui j and ΛL is the diagonal matrix with entries ∆h U (r) . Here, the preconditioner is derived with 58 the same heuristic reasoning as above, that at pure states u = ±1, Λ1 = 0, Λ2 = 2I, Λ3 = 6I and ΛL = 0. For the vector problem (3.5) we consider the solution components (Uj , Vj ) at a grid point as a block and have JV = I + δtm 2 ∆h ∆h − δtm ∆h ΛV QV = I + δtm 2 ∆h ∆h − 18δtm ∆h . where ΛV is a block diagonal matrix with 2 × 2 blocks    ∂2W ∂uv ∂2W ∂v 2 ∂2W ∂u2 ∂2W ∂uv    where the vector potential (3.6) is considered above and the partial derivatives are evaluated (r) (r) at the corresponding grid values (Uj , Vj ). A straight-forward calculation shows that for values of u in any of the three symmetric potential wells, this block is diagonal with diagonal entries 18. Hence, the preconditioner follows the same reasoning as above. 3.3.4 Basic numerical results We consider the 1D Cahn-Hilliard equation as the model system in this section. Starting from initial data u0 (x) = cos(2x) + and 1 cos(x+1/10) e 100 (3.15) = 0.18, the system moves to an intermediate state at t = 300 with two intervals with u ≈ −1 as shown in Figure 3.1. This time is roughly where the slowest ripening evolution 59 Figure 3.1 This figure corresponds to the solution of the 1D Cahn-Hilliard (3.2) with initial conditions (3.15) for = 0.18 at time t = 300. This is the type of solution at which the preconditioner performance is examined in more detail in Section 3.4. occurs, marked by the largest time steps taken by the method as shown in Figure 3.5. The second term on the right above is a small perturbation so that these two intervals are not symmetric, so that we will see generic behavior. At much longer times, these two intervals will slowly evolve and merge [11, 83] as shown below. The fixed time step δt performance of the method is examined at a short time t = 0.2 in Tables 3.1 and 3.2 . First order convergence in the time step and spectral convergence in N is seen as expected. Also observed is that N = O(1/ ) is needed to spatially resolve the interfaces of width . 60 δt Eδt 2e-4 1.32e-5 1e-4 6.6e-6 5e-5 3.3e-6 Table 3.1 Error estimates Eδt = Uδt − Uδt/2 for fixed time step δt computations of the 1D Cahn-Hilliard model with = 0.18 to short time t = 0.2. Spatial discretization is fixed at N = 128. N EN for = 0.18 EN 32 2.0e-3 64 9.3e-7 9.0e-13 128 for = 0.09 0.139 4.4e-3 1.3e-6 Table 3.2 Error estimates EN = UN − U2N for computations of the 1D Cahn-Hilliard model with = 0.18 and = 0.09 to short time t = 0.2. Fixed time steps of δt = 1e − 4 are used. The performance of the adaptive time stepping method through a ripening event is shown in Table 3.3. In these runs the Newton solve has residual tolerance 10−8 and the PCG solve at each step has tolerance 10−9 . Starting from initial data (3.15) the solution contains four transition layers at short time as shown in Figure 3.1. Over a very long time, the middle transitions move closer together and merge as shown in Figure 3.2. The final state with two transition layers is steady. The energy, time step size and PCG iteration history for the are shown in Figure 3.5 for the tolerance σ = 10−4 . One or two Newton iterations are taken at each time step. Note the sharp transitions in the energy E at early times (the spinodal evolution) and at the ripening event at which the time steps are small. The ripening time estimates shown in Table 3.3 correspond to the time t at which the midpoint value u(π, t) changes from positive to negative. This ripening event happens at a very fast time scale after the long, slow transient. The results in Table 3.3 show that the method can accurately capture the time that such events occur. Since local truncation error is O(δt2 ) it is expected that as the local tolerance σ is reduced by a factor of 10, the number of total time steps 61 σ time steps 1e-4 848 1e-5 2580 (3.04) 1e-6 8072 (3.13) 1e-7 25446 (3.15) ripening time 8180 8273 8304 8314 total PCG iterations 19,105 39,942 (2.09) 87,563 (2.19) 227,799 (2.60) Table 3.3 Performance of the adaptive time stepping through a ripening event of the 1D Cahn-Hilliard model with = 0.18 and N = 128. The numbers in brackets are ratios to quantities in the previous row. should increase by a factor of √ 10 ≈ 3.16 as is observed computationally. This validates our simple error estimation strategy. Note that the number of PCG steps increases by a smaller factor, indicating that the condition number of the implicit system decreases as δt → 0. Note also that the performance of the solver is independent of N (for fixed ). It is well known [94, 83] that ripening is exponentially slow in in 1D Allen-Cahn and Cahn-Hillard models. With error tolerance σ = 10−4 and N = 128 we compute ripening at time approximately 34,200 using 948 time steps and 22,950 PCG iterations for = 0.16 and ripening time 218,000 using 1081 time steps and 28,417 PCG iterations for = 0.14. This demonstrates that our approach behaves well when computing through ripening over very long time scales. It is known that to resolve the dynamics for increasing small , higher precision arithmetic is needed to resolve the exponentially small interactions of the layers in this 1D setting [83]. The method performs similarly well on the lower order Allen-Cahn equation (3.1). Spinodal evolution leads to similar layers to those shown in Figure 3.1. In this case the equations do not preserve mass and so ripening involves the separate collapse of the intervals of state u ≈ −1 leading to a steady state of u ≡ 1. 62 Figure 3.2 This figure corresponds to the solution of the 1D Cahn-Hilliard equation (3.2) with initial conditions (3.15) for = 0.18 during the final stages of the ripening process at large time. 63 Energy 0.13 0.12 0.11 0.1 0.09 0.08 0.07 0.06 0.05 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 t Figure 3.3 The energy history for the adaptive time approximation with tolerance σ = 10−4 of the 1D Cahn-Hilliard equation (3.2) with initial conditions (3.15) for = 0.18. 64 Time step size k 180 160 140 120 100 80 60 40 20 0 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 t Figure 3.4 The time step history for the adaptive time approximation with tolerance σ = 10−4 of the 1D Cahn-Hilliard equation (3.2) with initial conditions (3.15) for = 0.18. 65 Conjugate gradient iterations 40 35 30 25 20 15 10 5 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 t Figure 3.5 The PCG iteration count history for the adaptive time approximation with tolerance σ = 10−4 of the 1D Cahn-Hilliard equation (3.2) with initial conditions (3.15) for = 0.18. 66 3.4 3.4.1 Investigation of the preconditioned system Preliminaries and numerical results At a ripening state u of the 1D Cahn-Hilliard model evolving from the initial data (3.15), we examine the structure of the preconditioned Jacobian matrix A = Q−1 JCH CH as presented above. We can consider the operator A in the continuum limit. Since JCH is a low order perturbation of the elliptic operator QCH , the limit operator A is a relatively compact perturbation of the identity (see [95], chapter 6). Thus we can expect the condition number of A to be relatively insensitive to the spatial discretization level N for N large enough to resolve the problem. In the discrete 1D setting, A and its eigenvalues can be computed explicitly using built-in MATLAB commands. The eigenvalues for the = 0.18, δt = 10 at the solution shown in Figure 3.1 are shown in Figure 3.6. Note the clustering of eigenvalues near 1 as expected from the preconditioning. At this slowly evolving state, there are small eigenvalues as → 0 and δt → ∞ that determine the condition number of A and limit the performance of the CG iterations. The eigenfunction corresponding to the smallest eigenvalue for parameters = 0.18 and δt = 10 is shown in Figure 3.7. Notice that it is composed of pulses at the locations of the transition layers in the solution in Figure 3.1. For this state u there are M = 4 transition layers. It is shown using formal asymptotics in section 3.4.2 that it is expected that there will be M − 1 = 3 small eigenvalues as observed computationally. We consider the dependence of the smallest eigenvalue on and δt in Table 3.4. Note that these results give evidence that the smallest eigenvalue scales like /δt, 67 Eigenvalues of the preconditioned jacobian matrix 1 0.8 0.6 0.4 0.2 0 −0.2 clustering of eigenvalues at 1 3 small positive eigenvalues −0.4 −0.6 −0.8 −1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Figure 3.6 Eigenvalues of the preconditioned Jacobian matrix for the model situation described in Section 3.4 for parameters = 0.18 and δt = 10. giving a condition number κ of A that scales like δt/ . This is confirmed in the formal asymptotics below. A similar study with the preconditioned Jacobian for the Allen-Cahn equations reveals M small eigenvalues of identical magnitude that are ndependent of and have values ap- proximately C/δt with C ≈ 0.41 for large δt. This behavior is confirmed in the formal asymptotics below. Remark 3. Since the condition number κ scales linearly with δt and the number of CG 68 Eigenfunction for smallest eigenvalue for preconditioned matrix 0.2 0.15 0.1 0.05 0 −0.05 −0.1 −0.15 −0.2 0 1 2 3 4 5 6 7 x Figure 3.7 The eigenfunction of the preconditioned Jacobian matrix with the smallest eigenvalue for the model situation described in Section 3.4 for parameters = 0.18 and δt = 10. δt↓ T → 25 50 100 200 400 = 0.18 300 3.36e-3 1.68e-3 8.43e-4 4.22e-4 2.11e-4 = 0.16 2000 3.14e-3 1.58e-3 7.89e-4 3.95e-4 1.97e-4 = 0.14 10000 2.83e-3 1.45e-3 7.26e-4 3.63e-4 1.82e-4 Table 3.4 Dependence of the smallest eigenvalue of the preconditioned Jacobian matrix for the Cahn Hilliard problem on δt and . The time T at which eigenvalues are evaluated is roughly where the slowest evolution occurs. The values are not that sensitive to the value of T . 69 iterations to reach a fixed solution accuracy increases as √ κ [90], taking larger time steps in ripening regimes will lead to higher numerical efficiency. Thus, we advocate taking as large time steps δt as accuracy will allow. This is a further motivation for our consideration of higher accurate time stepping techniques below. 3.4.2 Formal asymptotics 3.4.2.1 Allen Cahn Consider first the Allen-Cahn equation (3.1). In an infinite spatial domain this problem has a translationally invariant steady solution u0 = tanh x √ 2 . In the ripening phase, solutions take the form [94] M u0 [(−1)j (x − xj )] u≈ (3.16) j=1 where xj (t) are transition layer positions. This is the structure seen in Figure 3.1 with M = 4 transition layers (M must be even in our periodic setting). The approximation above is valid up to exponentially small terms in . We denote such an approximation by =e in what follows. Linearizing (3.1) leads to vt = Lv := vxx − (3u2 − 1)v 70 (3.17) where v is the linear disturbance to u(x, t). The spectrum of L at ripening states (3.16) is well understood: Theorem 3.4.1 (Carr and Pego 1989 [94]). L has M exponentially small (in ) eigenvalues λj . The rest are negative and bounded away from zero. The eigenfunctions of the small eigenvalues φj (x) are given by φj =e d 1 u0 (x − xj ) = √ sech2 dx 2 which in words are spikes of width x − xj √ 2 (3.18) centred at the interface location. The eigenvalues λj govern the exponentially slow motion of the fronts xj (t). The theorem stated in [94] allows for more general potentials W (u) and the results below can be extended to these cases. Consider now the spectrum of the preconditioned Jacobian for the Allen-Cahn problem at ripening states: Aψ = σψ where A = Q−1 J where Q and J are given by (3.9) and (3.10) respectively. This can be rewritten as (I − δtL)ψ = σ[I − δt(L − 3(u2 − 1))]ψ (3.19) where I is the identity operator. Recall that we are interested in the small eigenvalues σ that determine the condition number of the PCG iterations for this problem and that there is computational evidence that these σ are O(1/δt). Thus we make the ansatz σ = β/δt and 71 consider (3.19) formally in powers of δt: O(δt) : O(1) : Lψ = 0 (3.20) ψ = β(L − 3(u2 − 1))ψ. (3.21) The first equation above forces ψ to be in V = span{φj } at highest order where the φj are the eigenfunctions corresponding to small eigenvalues of Theorem 3.4.1. With ψ ∈ V, (3.25) is satisfied to exponentially small terms in . Now (3.21) becomes ψ = 3β(1 − u2 )ψ We take ψ = φj ∈ V and take the L2 inner product of the equation above with φj leading to 1 β= 3 φ2 j φ2 (1 − u2 ) j (3.22) Considering the ripening form (3.16) and the local nature of φj (3.18), the value of β only depends on the local layer structure up to exponentially small terms and we thus obtain M copies of β =e 1 3 sech4 x ≈ 0.4167 sech4 (1 − tanh2 x) (3.23) which agrees closely with the computational results in the previous section. Note that so far (3.21) is satisfied only in the projection on V. Correction terms of order O(1/δt) in ψ in V ⊥ can be derived that give a complete picture of the formal analysis. 72 3.4.2.2 Cahn-Hilliard The analysis of the small eigenvalues for the preconditioned Jacobian matrix for the CahnHilliard problem can be done similarly, with a few additional technical difficulties. Our results here depend on a conjecture on the rank of a modified square distance matrix. Here the relevant eigenvalue problem is (I + δtDL)ψ = σ[I + δtD(L − 3(u2 − 1))]ψ (3.24) where D is the second derivative operator. We use D here rather than ∆ to be clear that the analysis that follows is only applicable to the 1D case. The extra difficulty here arises essentially because D is not invariant on V (the span of the eigenfunctions φj ) or V ⊥ . We make the ansatz σ = β/δt as before and consider (3.24) formally in powers of δt: O(δt) : O(1) : DLψ = 0 (3.25) ψ = βD(L − 3(u2 − 1))ψ. (3.26) As before, (3.25) forces ψ ∈ V to highest order which we make explicit here M ψ= cj φj . (3.27) j=1 More carefully said, Lψ must be a constant to satisfy (3.25) but this constant enters as a O(1/δt) term. Turning to (3.26) we first notice that ψ is a second derivative of a periodic 73 function and so must have average zero which gives the discrete condition M cj =e 0. (3.28) j=1 Let P be the projection onto the set of functions with average value zero: 2π 1 φ(x)dx. 2π 0 Pφ = φ − Let φ∗ = P φj . Because of (3.28), (3.27) can be written equivalently as j M cj φ ∗ . j ψ= j=1 We can apply D−1 (taking the result to have zero average value) to (3.26), obtaining M M cj D−1 φ∗ j = 3βP (u2 − 1) cj φj . j=1 j=1 Taking the inner product with an arbitrary element of V with zero mass, then normalizing with φ2 = K/ (consider the form (3.18) to see that this gives an absolute constant K) we j obtain − βAC K P BP c = βc (3.29) where c is the vector of M values cj , P is here the discrete projection onto the subspace of vectors that sum to zero, βAC is the constant from the Allen-Cahn eigenvalue problem 74 (3.23), and B is the M × M symmetric matrix with entries bij = φ∗ D−1 φ∗ . j i (3.30) As above for the simpler Allen-Cahn case, (3.26) is so far only satisfied in the massless subspace of V. Full matching can be done with O(1/δt) terms in V ⊥ plus constants. Considering (3.29), the expected behavior of the small preconditioned Jacobian eigenvalues of size /δt will be observed as long as P BP is full rank M −1 and has O(1) eigenvalues. Consider (3.30) in the limit as → 0. The integral φj =e 2 remains fixed in this limit and so φj is seen to be an approximate delta function at xj with weight 2. Thus, D−1 φ∗ is approximately a j quadratic with second derivative −1/π except in a region of width D−1 φ∗ (x) ≈ d − j around xj : d2 4π 2 − 2π + 2π 3 where d is the distance between x and xj on the periodic interval [0, 2π]. The integral (3.30) can then be approximated by bij ≈ 2dij − d2 /π ij (3.31) where dij is the distance between points xi and xj on the periodic interval and we have adjusted the entries of B by a constant as allowed by the expression (3.29) to simplify the expression. Note that the expression above is only accurate to polynomial terms in . We have strong computational evidence that the matrix P BP with limiting entries (3.31) has rank M − 1 whenever the layer positions xi are distinct. Thus, we conjecture Conjecture 1. If {xi }, i = 1, . . . M are distinct points in [0, 1] and the entries bij of the 75 M × M matrix B are given by bij = dij − d2 ij where dij are the distances between points xi and xj , taken either as absolute values or on the periodic interval, then the matrix P BP has M − 1 strictly negative eigenvalues. Here P is projection onto the subspace orthogonal to constant vectors. In the statement above we have scaled the interval to [0, 1] for convenience. It should be noted that without the first term in bij , linear in distance with scaling proportional to the interval size, the square distance matrix P BP has rank at most 3 [96]. 3.5 Performance on a variety of models One of the main advantages of our approach is that it is easily extensible to higher order and vector models. The Jacobian matrix and preconditioner are straight forward to generate for these more complicated models. There is no need to determine how to split the potential into convex and concave pieces, as is needed for the popular class of splitting methods based on Eyre’s method [46] which we discuss in section 3.9 below. While we do not claim that our approach is the most computationally efficient methods for every (or any) problem, they are reasonably efficient and can be tried on any new problem with little effort. Here, we apply them to the Cahn-Hilliard problem (3.2), the sixth order model problem (1.10) and the vector model problem (3.5), all in 2D. We conduct these numerical tests with first order time stepping for simplicity. In the examples below, convergence in ripening times and the behavior of the number of time steps and total PCG iterations with tolerance σ is observed as in the preliminary 1D Cahn-Hilliard example shown in section 3.3.4 . 76 3.5.1 2D Cahn-Hillard We begin with initial conditions u0 (x, y) = 2esin x+sin y−2 + 2.2e− sin x−sin y−2 − 1 (3.32) which after some initial ripening of u = 1 states (mass fraction 0.4556) leads to two circular states, one of which captures the other in a long time frame. The results for the = 0.08 model, using N = 128 and σ = 10−4 are shown in Figure 3.8. This simulation took 3,305 time steps and a total of 84,138 PCG iterations. The similar computation for = 0.16 using N = 64 and σ = 10−4 past ripening took 1,471 time steps and a total of 30,051 PCG iterations. The MATLAB code used for this example will be available on the publisher’s web site for this article. 3.5.2 Sixth order model We begin with the same initial conditions (3.32) shown in Figure 3.8 upper left. The results for = 0.18, η = 1 computed with N = 128 and σ = 10−4 are shown in Figure 3.9. The influence of the interface promoting term is evident. For larger the final steady state is a regular array. The final pattern shown here is not at steady state. This simulation was done in 2,229 time steps with a total of 234,582 PCG iterations. 3.5.3 2D vector model We begin with initial conditions (3.32) for u and v0 (x, y) = sin y. After some initial ripening, regions of the three states form, separated by interfaces which meet at triple junctions. Unlike grain growth [93] this model preserves the area of each of the three phases. The results for 77 the = 0.32 (the value compared to other models is larger due to the larger magnitude of the reaction term for this model), using N = 128 and σ = 10−4 are shown in Figure 3.10. This simulation took 2,229 time steps and 51,985 total CG iterations. We believe that the final result at t = 100 is an approximation of the steady state of the problem. 78 2D Cahn-Hilliard initial conditions u (x, 0) 2D Cahn-Hilliard solution u (x, 0.5) 2D Cahn-Hilliard solution u (x, 150) 2D Cahn-Hilliard solution u (x, 300) Figure 3.8 2D Cahn-Hilliard example computation. 79 Sixth order model solution u (x, 2) Sixth order model solution u (x, 25) Sixth order model solution u (x, 500) Sixth order model solution u (x, 2200) Figure 3.9 2D sixth order model example computation. 80 2D vector model initial conditions u (x, 0) 2D vector model solution u (x, 0.5) 2D vector model solution u (x, 15) 2D vector model solution u (x, 100) Figure 3.10 2D vector model example computation. Contours of cos(arg u + iv) are plotted. Two of the phases have value cos(2π/3) = −1/2 (light blue in the plots) and are separated by dark blue lines. 81 3.6 GPU implementation A recent development in scientific computing is General-Purpose computing on Graphics Processing Units (GPGPU). In the 1990s, Graphics Processing Units (GPUs) were developed to accelerate the building of 2D and 3D images for output to a display. Since then, the stream processing capabilities of GPUs have been turned to scientific computing, and the application to scientific computing has led to commercial lines of GPUs that have hundreds of double precision computation cores with error checking. GPU computation can be much faster if a problem or computation lends itself well to many lightweight computation threads. We used Nvidia GPU cards and the CUDA programming language, which is C++ with extra functions to control the device (GPU) from the host (CPU). When simulating the Cahn-Hilliard equation implicitly with the spectral method detailed in Section 3.3, there are three main types of mathematical operations: Fourier transforms, element-wise multiplication or addition, and array reductions (the inner products in the PCG method for example). The first two of these operations adapt very well to GPU processing, but array reductions pose some challenges. Three-dimensional Fourier transforms can be computed extremely fast on a GPU by breaking the domain into lines of grid points, then handing each line of grid points to a separate multi-processor for calculation of the one dimensional FFT. After each processor is done computing the frequency representation of the data, the GPU recompiles the data into its three-dimensional representation. The widely available cuFFT package implements FFT calculations on the GPU. With the solution available in both spatial and frequency domains, multiplication by the Jacobian matrix and the preconditioner can be performed by element-wise multiplication or addition in the appropriate domain. This plays directly into 82 the strength of the GPU which has millions of lightweight computation threads. The third type of operation performed when computing solutions to the Cahn-Hilliard equation is array reductions. Reductions must be used when computing the inner products as well as maximum residual values for the conjugate gradient and Newton method iterations. In a CPU calculation, array reductions are performed serially so they are simple to implement and run quickly compared to other parts of the calculation. However, array reductions are particularly troublesome for GPUs. Besides having hundreds of processors, GPUs are fast because they hide latency by scheduling calculations in a first come first served manner. Thus it is impossible to know before computation time in which order the 221 evaluations will take place. The solution to this problem is to synchronize all threads after each set of evaluations, even though setting such a thread block increases computation time. An example of the proper way to approach array reduction on GPUs is given in the CUDA SDK provided by Nvidia. There are limitations to GPU computing that cannot be overlooked. Unlike a CPU that has access to any memory location, GPUs multiprocessors can only access the data that is stored in the GPU card’s memory. This leads to two complications: limitations on array size and movement of data. The GPU card we used had 6GB of memory on the card, so for the number of double-precision, complex-valued arrays we used in the simulation, the size of the domain was limited to 222 total grid points. For a cubic domain, this limit corresponds to 128 grid points in each dimension. It is important to realize that this is a rigid memory limit because it is defined by the GPU card’s hardware. It is certainly possible to increase the domain size by using multiple GPUs for a computation, but this would require significant effort to reduce the data movement between domains that would limit the computational speedup. The second complication is movement of the data into the memory on the GPU. 83 The movement of data on and off the card is the only way to access the solutions to the PDE, and data movement is much slower than computation speed. Because of this, the number of times the solution array is moved from the card must be kept at a minimum. This places some limitations on the post-processing of solutions. 3.6.1 GPU Speedup We conducted a speed test to compare simulation code written for parallel CPUs against the code written for the GPU. The CPU code was written in C++ and aggressively parallelized using OpenMP. The parallel version of FFTW was used for the Fourier transforms. The code was run on two Quad-core Intel Xeon E5620 processors (eight total cores). The processors have a clock speed of 2.4 GHz. The GPU used was a Tesla C2070 with 448 multiprocessor cores with a clock speed of 1.15 GHz, and it had 6GB of global memory. The two hardware configurations were about one year old at the time of the test, so the speed results are comparable. The test was performed on initial conditions of 3D random initial data on cubic grids with 128 grid points per dimension. Computing to time T = 40, the parallelized C++ calculation took nine hours and seventeen minutes to complete compared to the Cuda GPU code that took one hour and twenty-five minutes. The solutions given by the two codes were identical. We realized a speedup factor of about 6.5 with the GPU implementation over the eight core CPU calculation in this computation in which data transfer on and off the GPU card was negligible. This speedup factor was confirmed in timing tests of individual time steps. The computed solutions are shown in Figure 3.11 and the time step and PCG profiles are shown in Figure 3.12 and Figure 3.13, respectively. The sequential decreases in time step size correspond to ripening events. 84 Figure 3.11 Solution at times T = 5, 10, and 40 of the 3D Cahn-Hilliard computation used as a timing test for the GPU implementation. Shown are the zero level sets of the solutions, with blue indicating increasing and green decreasing solutions at the interface. 0.08 0.07 Time step size (k) 0.06 0.05 0.04 0.03 0.02 0.01 0 0 5 10 15 20 25 30 35 40 Time t Figure 3.12 Time step size δt for the 3D Cahn-Hilliard computation used as a timing test for the GPU implementation. 85 PCG iterations per time step 30 25 20 15 10 5 0 0 5 10 15 20 25 30 35 40 Time t Figure 3.13 PCG count per time step for the 3D Cahn-Hilliard computation used as a timing test for the GPU implementation. 86 3.7 Higher order time stepping We now consider second and third order adaptive time stepping methods. We use the multistep methods Adams-Bashforth (AB) and Backward Difference Formula (BDF) as our explicit predictor and implicit corrector respectively. With the same notation for the fully discrete approximation of u given in section 3.3.2, the second and third order versions of AB for the predicted U∗ solution at time tm are [89]: δtm 6f (Um−1 ) − f (Um−2 ) , 4 δtm (AB3): U∗ = Um−1 + 23f (Um−1 ) − 16f (Um−2 ) + 5f (Um−3 ) , 12 (AB2): U∗ = Um−1 + where for example for the lower order Allen-Cahn equation, f (u) = 2 ∆h u − W (u). Again we use these as an initial state in a Newton iteration for the solution to the implicit BDF time step. The second and third order versions of BDF lead to the nonlinear systems G(Um ) for the solution at time tm given below [88]: 1 3 m U − δtm f (U m ) − 2U m−1 + U m−2 = 0, 2 2 11 m 3 1 (BDF3): G(U m ) := U − δtm f (U m ) − 3U m−1 + U m−2 − U m−3 = 0. 16 2 3 (BDF2): G(U m ) := The solution procedure for these systems is the same as described in section 3.3.3. The combination of AB predictor and BDF corrector has several advantages. BDF methods up to order 6 have good stability properties for stiff problems and have a high ratio of accuracy 87 to implicit solves [88]. Like forward and backward Euler methods, AB and BDF methods of the same order p share the same form of dominant local truncation error, proportional to the order p + 1 time derivative of u. Because of this, the exact local truncation error ηe for the second and third order steps can be easily approximated by ηe ≈ η := C||U ∗ − U m || (3.33) where C is 4/9 and 2/5 for order 2 and 3 methods, respectively. These higher order methods require p previous values for order p methods. Standard AB and BDF methods require these previous values to be equally spaced in time. This is the major drawback of standard multi-step methods. Initially and after every time step change, we compute the necessary additional previous values using minimal stage L-stable Singly Diagonal Implicit Runge Kutta (SDIRK) methods of second or third order [97]. In our time-adaptive computations below, we again specify a given tolerance σ > 0 for the local truncation error η. Because changing the time step requires additional computational computational cost we only increase the time step if the local error is sufficiently below tolerance, 1 η < σ γ where γ > 1 is user defined. We chose γ = 3 based on computational evidence [98]. Total time step counts are relatively insensitive to γ around this value. If the condition above is met, we allow a time step increase with multiplier ξ = (0.8γ)1/(p+1) . 88 where again 0.8 is a safety factor. If a time step fails the accuracy check (η < σ) then we fail the step and reduce the time step by a factor of 1 and restart the computation. ξ Although SDIRK steps are more accurate than BDF steps for the same time step size, the errors are different and so the error estimator (3.33) cannot be used directly after a restart. However, the error mismatch decays with a fixed number of steps in a numerical initial layer effect. Because of this, we only check the error after a preset number S of time steps following a restart for both an increase and decrease in the time step. The values of S used are 4 and 8 for BDF2-AB2 and BDF3-AB3 respectively (the numerical initial layers for BDF3 have a slower decay than for BDF2). 3.7.1 Numerical Results We consider the same 1D Cahn-Hilliard model problem as in section 3.3.4, where the initial data u0 (x) is given by (21) and = 0.18. In these runs the Newton solve has residual tolerance 10−9 and the PCG solve at each step has tolerance 10−10 . The time step size and PCG iteration history are shown in Figure 3.14. Note that the time step history for the first order method (FE/BE) is different in Figure 3.14 than in Figure 3.5 because we apply the same accuracy threshold to changing time steps here that we do for the higher order methods. 3.8 A Pair of Fourth-Order Accurate Methods In recent work with Dr. David Seal, we discovered and implemented a new family of highorder time-stepping Lax-Wendroff schemes. The fourth-order implicit method is A-stable and effectively avoids the Dahlquist barrier by using the derivatives of the right hand side of 89 Evolution of timestep 70 BE−FE BDF2−AB2 BDF3−AB3 60 50 k 40 30 20 10 0 0 2000 4000 6000 8000 10000 t Conjugate Gradient Iterations 300 BE−FE BDF2−AB2 BDF3−AB3 250 200 150 100 50 0 0 2000 4000 6000 8000 10000 t Figure 3.14 The time step and PCG iteration count history for the adaptive time approximation of the 1D Cahn-Hilliard using BE-FE, BDF2-AB2 and BDF3-AB3 with initial conditions (21) for = 0.18, σ = 10−7 . 90 the equation. The fourth-order explicit method has the same error term as the corresponding implicit scheme allowing an adaptive time-stepping procedure. For the general PDE, ut = f (u) the two-stage explicit scheme is given by 2 δt n ) + δt J f (un ) = + f (u 2 8 δt2 un+1 = un + δtf (un ) + (J f (un ) + 2J f (u∗ )) , 6 u∗ un (3.34) and the implicit scheme is un+1 = un + δt f (un ) + f un+1 2 + δt2 J f (un ) − J f un+1 12 , (3.35) where J is the Jacobian of the right-hand side of the differential equation, f . For the CH equation, f = −ε2 ∆2 u + ∆ u3 − u , so the Jacobian acting on some object v is J v = −ε2 ∆2 v + ∆ 3u2 − 1 v . We compute the explicit solution and then solve for the implicit solution as above. The adaptive time-step is computed in the same manner also, except σ δtnew = δtold · min 0.8 · 5 , 1.3 . E (3.36) The preconditioner we use is derived in the same way as the first order preconditioner. For the fourth order method applied to the CH equation, G (ur ) v = v − δt δt2 J f (v) − Hf (v) , 2 12 91 (3.37) ∂u tt with Hf = ∂u computed by hand. Assuming u = ±1 and ∆u = 0 for the majority of the domain, we obtain the physics based preconditioner, Q=1− δt2 δt −ε2 ∆2 + 2∆ + −ε2 ∆2 + 2 2 12 −ε2 ∆2 + 2∆ . (3.38) For the Cahn-Hilliard equation, we observe fourth order convergence for this pair of methods. The physics based preconditioner made a significant difference by reducing the required number of CG steps by a factor of one hundred throughout most of the computation. During the spinodal phase separation exhibited by solutions of the CH equation (computed from random initial data), time steps for the fourth order method were ten to a thousand times larger than the first order implicit method (Section 3.3.2). When the solution became smooth (or if smooth initial conditions were used), the first and fourth order methods maintained approximately the same size time steps when controlling the error. This comes from the fact that when using a fixed time step, the first order method was numerically stable for a time step size slightly larger than the fourth order method. The Newton and conjugate gradient solves required nearly the same number of iterations per time step, but as expected, the fourth order method required more computation per iteration. The results in Table 3.5 shows the performance of the methods on the test problem for various values of local tolerance σ. Ripening times can be computed very accurately with the higher order methods. The PCG count is an accurate measure of computational cost and includes all iterations from SDIRK restart and failed steps. Efficiency gains are obtained with the higher order methods, with the biggest gain moving from first to second order. As in section 3.3.4 the first order method shows the correct asymptotic behavior in the number √ of time steps (ratios of 10 ≈ 3.16 of times steps as σ is decreased by 10). The second order 92 √ method results are also consistent with the correct asymptotic behavior ratio 3 10 ≈ 2.15. The third order results have not reached the asymptotic regime. Considering the accuracy of the ripening times in this case, it appears that the error estimation is over-estimating the local error. Remark 4. Experiments with other higher order strategies were also done, for example using SDIRK2 stepping with SDIRK3 for error estimation. However, for tolerances σ leading to accuracy of practical interest, this error estimation was also not asymptotically valid. It is known that constructing good error estimators for IRK methods applied to very stiff problems is difficult [99]. We conjecture that the relatively large higher derivative terms pollute the error estimation in this case. The problem of accurate error estimation for higher order time stepping for this class of problems is an interesting open question. With such error estimation, arbitrary precision could be achieved to solutions of these problems: spectral accuracy in space and high accuracy in time using spectral deferred correction methods [100] or high order Radau methods [88]. 3.9 3.9.1 Investigation of splitting methods Preliminaries and numerical results We present a splitting approach of Eyre [46] in the framework of the implicit approximation of the 1D Cahn-Hillard model. The standard backward Euler time stepping approximation of this model leads to the problem Um + δt∆h 2 ∆ Um h − (Um )<3> + Um − Um−1 = 0 93 (3.39) Method σ 1e-04 BE-FE 1e-05 1e-06 1e-07 1e-04 BDF2-AB2 1e-05 1e-06 1e-07 1e-06 BDF3-AB3 1e-07 1e-08 time steps 792 2114 (2.67) 6371 (3.01) 20216 (3.17) 191 660 (3.455) 1484 (2.248) 3279 (2.210) 374 981 (2.623) 2041 (2.081) ripening time 8206.11 8272.16 8303.81 8314.15 8301.34 8316.19 8317.97 8318.47 8318.43 8318.61 8318.63 total PCG iterations 41475 69166 (1.67) 125879 (1.82) 277079 (2.20) 22663 36535 (1.612) 44987 (1.231) 70580 (1.569) 25443 37582 (1.477) 53079 (1.412) Table 3.5 Performance of each adaptive time stepping method through a ripening event of the 1D Cahn-Hilliard model with = 0.18 and N = 128. The numbers in brackets are ratios to quantities in the previous row. where by (Um )<3> we mean the pointwise values cubed and δt is the time step as before. This is the system solved with the methods in this chapter. A slight modification to this system, taking explicitly the linear term that otherwise makes the system non-convex, Eyre [46] considered Um + δt∆h 2 ∆ Um h − (Um )<3> − Um−1 + δt∆h Um−1 = 0. (3.40) Note that the implicit solve is still nonlinear but now convex. There are two attractive theoretical properties to this approach: A: The discrete problem (3.40) has a unique solution for every δt and every U m−1 . B: The unique solution above is guaranteed to reduce the energy. The nonlinear system can be solved with the same Newton, PCG approach as the unsplit discretization we have considered up to this point and we can examine the condition number of the preconditioned Jacobian matrix in the same framework as the preconditioned unsplit 94 method in section 3.4. Here, the matrix of interest is AS = Q−1 JS S where QS = I + δt 2 ∆h ∆h − 3δt∆h and JS = I + δt 2 ∆h ∆h − δt∆h ΛS (r) where ΛS is the diagonal matrix with positive entries 3[Uj ]2 where U(r) denotes the r’th iterate for Um as before. Note that JS is positive definite for every δt and U(r) (a consequence of the convexity of the implicit problem), unlike JCH (3.11) from the unsplit discretization. The eigenvalues of AS like those of A in section 3.4 also cluster around 1 and have a minimum value that determines the condition number of AS . Here, the minimum eigenvalue for large δt at ripening states is approximately 0.290 independent of δt and . This matches the value predicted by formal asymptotics shown in section 3.9.2 below. It is possible to set up a linear fixed point iteration and avoid the machinery of newton iterations with PCG solvers. Consider the iterative scheme with iterates U(r) for Um to solve (3.40): U(r) + δt∆h 2 ∆ U(r) h − 3U(r) = δt∆h (U(r−1) )<3> − 3U(r−1) +Um−1 − δt∆h Um−1 This approach is taken in [91]. Note that the linearization of the fixed point iteration has matrix I − AS , so the small eigenvalues of AS will determine the convergence rate of the iterations. The convergence factors will be approximately 1-0.290 = 0.710 for the 1D C-H 95 ripening solutions considered in this section, independent of δt and . Eyre also considered other operator splitting possibilities, including one that only requires a linear implicit solve Um + δt∆h 2 ∆ Um h − 4Um − Um−1 + δt∆h (4Um−1 − (Um−1 )3 ) = 0 (3.41) which retains properties A and B above under the assumption that U ∞ remains bounded by 1. From the point of view of solver efficiency, the splitting methods are ideal. However, the accuracy of time stepping with the operator splitting can be much lower than that of the conventional (non-split) solvers considered in this chapter. We repeat the 1D Cahn-Hilliard ripening time computations of section 3.3.4. The results are shown in Table 3.6 for the split step method (3.40) using the PCG solver for the time stepping. Comparing to Table 3.3, it is clear that the splitting method is much less efficient than the pure implicit time stepping proposed in this chapter. It is clear from Table 3.6 that this is not just an artifact of the error estimation, since even with the large number of time steps, ripening times are not found accurately. The linear splitting (3.41) performed similarly poorly, taking 193,673 time steps for σ = 10−4 in the same computation listed in Tables 3.3 and 3.6, obtaining a (very inaccurate) ripening time of 15301. Although the solve at each time step is more efficient and the method has the same order of accuracy, the errors from the operator splitting are much larger than those from pure implicit time stepping for large time steps. Poor performance of splitting methods is observed in 2D computations and in the other models considered in section 3.5. For example, using Eyre’s method (3.40) on the 2D Cahn-Hilliard computation described in section 3.5 leads to 83,862 time steps and 1,069,416 PCG iterations for the = 0.08 run (compared to 3,305 and 84,138 respectively for our fully implicit approach) 96 and 18,891 time steps and 207,252 PCG iterations for the = 0.16 case (compared to 1,471 and 30,051 for our approach). The accuracy limitations of splitting methods has previously been observed [55, 91]. Using higher order implicit-explicit (IMEX) time stepping methods [101, 102, 103, 104] (which retain property A above but not necessarily B) does not allow significantly larger time steps through ripening, although they achieve the specified order of accuracy as δt → 0 of course. The accuracy loss can be explained in the 1D Allen-Cahn case using simple asymptotics below. Remark 5. The recent method in [55] is different from Eyre’s splitting ideas. Their technique ensures property B above but not A. The implicit problem they solve at every time step (with incomplete LU preconditioned GMRES) has the same structure as the fully implicit time stepping considered in the current work, although it is a modification of the implicit midpoint rule which does not have strong stability properties. We believe our PCG solution strategy could be applied to their discretization and should be more efficient. Remark 6. It should be made clear that these 1D problems with exponentially slow ripening events are extreme cases. It is not clear whether Eyre’s type splitting methods might be made more efficient in some 2D and 3D problems where motion is generically only polynomial slow in . If one relaxes the strict requirement of time accuracy and considers a computational time step as just an efficient step down the energy landscape[91], the comparison is even less clear. See also [105, 106] for time stepping strategies that preserve coarsening statistics but not pointwise accuracy. A hybrid strategy might also be considered, where efficient time steps based on operator splitting are taken during the spinodal phase or when other physics limits the time step size and our fully implicit time stepping strategy is used only during long ripening events. We consider the comparison of the efficiency of the approaches an open 97 σ time steps 1e-4 70,517 1e-5 202,549 (2.87) 1e-6 618,431(3.06) ripening time 13147 9582 8695 total PCG iterations 1,039,676 2,368,051 (2.27) 5,205,739 (2.19) Table 3.6 Performance of adaptive time stepping through a ripening event of the 1D CahnHilliard model with = 0.18 and N = 128 with Eyre’s splitting (3.40). The numbers in brackets are ratios to quantities in the previous row. Compare to values in Table 3.3 for the fully implicit time discretization. problem of interest. This is a question of accurate time stepping with lengthly solves versus time step size limited computations with fast solves. However, we remind the reader that the main advantage of the numerical framework described in this chapter is the generality of problems to which it can be easily applied rather than its potential as an efficient approach to any particular problem. 3.9.2 Asymptotics We extend the asymptotic framework developed in section 3.4.2 to explore the solver efficiency and accuracy limitations of Eyre’s splitting method. 3.9.2.1 Accuracy in Allen-Cahn equations We highlight the accuracy limitations of Eyre’s splitting in the simple, low order setting of the 1D Allen-Cahn problem (3.1). Consider the linear problem (3.17) in ripening states at which Theorem 3.4.1 applies. Applying a fully implicit BE time stepping method to this problem leads to the discrete problem V m = V m−1 + δtLV m 98 (3.42) Consider now taking large time steps. Components of V in V ⊥ are strongly reduced in a single time step. Components of V in V change slowly and represent the dynamics in the problem that need to be captured accurately during ripening. This characterizes the 1D Allen-Cahn equations as an extremely stiff problem, in the sense that it is even more stiff than the second order parabolic part acting on V ⊥ . For that reason, we strongly advocate the use of L-stable [88] time stepping schemes on this problem rather than weakly stable schemes such as trapezoidal rule or implicit midpoint rule. To consider the accuracy of the method, we make the standard ansatz V m = Gm φj (x) (3.43) where G is a constant to be determined and recall that φj is an eigenfunction of L with small eigenvalue λj . Inserting (3.43) into (3.42) leads to the expected G= 1 ≈ 1 + δtλj + δt2 λ2 + · · · j 1 − δtλj (3.44) where we have expanded the expression in terms of small δt. Comparing to the exact G=e δtλj 1 we see the term above has an error of 2 δt2 λ2 , again as expected. We presented j the standard analysis above to be able to compare it to the result from Eyre’s method. Linearizing Eyre’s method in this case leads to V m = V m−1 + δt(L − I)V m + δtV m−1 . 99 Here, it is seen why the Allen-Cahn equation is used for this discussion, since Eyre’s splitting preserves the eigen-structure of V in this case. With the same ansatz (3.43) we obtain G= 1 + δt ≈ 1 + δtλj + δt2 (λ2 − λj ) + · · · j 1 + δt − δtλj (3.45) in this case. Here the dominant error is λj δt2 , which is exponentially larger than the error in the fully implicit case. Another way to view this accuracy loss is that for the fully implicit method, if δtλj is small, the scheme is “accurate” but for the Eyre’s case, δt must be small when (3.45) is considered. Thus, exponentially smaller time steps must be taken with Eyre’s scheme than with a fully implicit method. In this context, higher order splitting methods of Eyre’s type based on standard implicit-explicit (IMEX) splitting do not solve this accuracy issue. While they are formally higher order accurate, they require time steps unreasonably small before they begin to be accurate, just as the lowest order scheme described above. We conjecture that this is true for all such IMEX approaches. For example, the “G” value for SBDF2 [102] is 1 δt3 (3λ3 − 2λ2 ) + · · · 1 + δtλj + δt2 λ2 + j j j 2 6 and for implicit-explicit midpoint rule [103] 1 2 2 δt3 1 + δtλj + δt λj + (3λ3 /2 − 3λ2 /2) + · · · j j 2 6 100 3.9.2.2 Condition number of solver for Eyre’s method for Cahn-Hilliard problems Here the relevant eigenvalue problem is (I + δtD(L − I))ψ = σ[I + δtD(L − I + 3(u2 − 1))]ψ (3.46) where I is the identity operator and D is the second derivative operator as before. The fundamental difference to the analysis in section 3.4.2.2 is that the operator multiplied by δt on the left is now L − I (which does not have small eigenvalues) rather than L (which does). This indicates there are not O(1/δt) eigenvalues to this preconditioned system as δt → ∞ but rather they remain O(1) in this limit as observed computationally above. Formally considering (3.46) as δt → ∞ leads to (L − I)ψ = σ(L − I + 3(u2 − 1))ψ (3.47) where it can be justified dropping the integration constant for σ = 1 at leading order due to the structure of u2 − 1 at ripening states. The operators on either side above are negative definite and symmetric, so we obtain upper bounds for the smallest eigenvalues when we take ψ = φj (the eigenfunctions of L with small eigenvalues so Lψj =e 0) in the expression above and then take the inner product with φj . This gives asymptotically σmin ≤ 1 ≈ 0.294 1 + 1/βAC 101 where βAC is the value (3.23) found previously for the relevant integral. We have not given convincing evidence that this should be the minimum eigenvalue. Considering (3.47) heuristically, the eigenfunction ψ for the minimum eigenvalue σ should minimize the size of (L − I)ψ and maximize the product with 1 − u2 , which elements of V do. In addition, the value does correspond to that observed computationally. 3.10 Conclusions We have presented a new approach to the computational approximation of energy gradient flows from material science models such as Allen-Cahn, Cahn-Hilliard and higher order and vector variants. The approach allows accurate time stepping with large time steps when the evolution is slow during ripening events. Some evidence is given that approaches based on Eyre’s operator splitting require much smaller time steps to maintain accuracy in these settings. Fully implicit time stepping with matrix-free Newton steps solved with the CG method are proposed in this work. An efficient preconditioner is identified. Computational evidence and formal asymptotics show that the solver has mild computational increase as is decreased and time step δt is increased. The approach is shown to work well on a number of problems in a general class and allows for an efficient GPU implementation. It is possible that the computational approach may also be useful for other models such as those from the study of liquid crystals or epitaxial growth [107, 108] that share the same structure. There are several avenues of study for theoreticians opened in this work. Arguments made using formal asymptotics backed by computational evidence in 1D could be made rigorous and extended to higher dimensions. The authors believe this is more than a technical exercise, but that real insight into the efficiency of different computational approaches can be obtained 102 from such a study. In addition, there is the conjecture in section 3.4.2.2 on the rank of a modified distance matrix. There have been many computational approaches to the Cahn-Hilliard equation, which is used widely in materials science studies and as a computational approximation to moving interface problems. In this well-developed computational field, a test suite of benchmark problems is clearly needed. A review of the methods in the literature, including a comparison on the benchmark problems, would be a valuable contribution. 103 Chapter 4 Exponential Integrator 4.1 General Exponential Integrator A significant challenge in simulating the Functionalized Cahn-Hilliard equation is finding the proper balance between time accuracy and reasonably fast computation times. Eyre’s method uses a convex splitting of the equation to guarantee energy decay for any size timestep [46, 34], but the time accuracy suffers [109]. Alternatively, a fully implicit preconditioned conjugate gradient method gives the proper time evolution and time accuracy, but the Newton solve is severely restricted [109], resulting in a method that is too slow to reasonably capture the long-time dynamics of the solution. Our goal is to develop a method that is both time accurate and is computationally efficient enough to describe fully relaxed solutions to the FCH equation. The exponential integrator method can provide the performance we desire under certain limitations, and it is also computationally efficient. The limitations are much less restrictive than the fully implicit method, and has much greater time accuracy than the convex splitting method. A review of the literature for exponential integrator methods is in Section 1.8.2. The work in this chapter follows the the papers published for ODEs by Du and Zhu under the name exponential time differencing (ETD) [66, 67], and by Cox and Matthews for stiff systems [65]. In this chapter we will present the ETD method for the CH and FCH equations, along with some variations in the time stepping to obtain greater stability and increased time 104 accuracy. We desire to solve the general PDE defined by ut = F (u) . We first separate F (u) into linear and non-linear parts, ut = Lu + N (u) . (4.1) By moving the linear portion to the left hand side and multiplying the equation by e−tL , we can obtain a total time derivative. ut − Lu = N (u) e−tL ut − e−tL Lu = e−tL N (u) e−tL u t = e−tL N (u) . Integration in time gives the general exponential integrator formula, u (tn+1 ) = eδtL u (tn ) + eδtL δt e−sL N (u (s + tn )) ds (4.2) 0 Up to this point, the derivation is exact, but we must now make approximations because we do not know enough about N (u (t)) to compute the integral. The first approach is to obtain an explicit scheme that is first order in time by taking N (u (t)) ≈ N (u (tn )) over 105 the interval of integration. We then compute the exponential integral by hand to obtain un+1 = eδtL un + eδtL − 1 N (un ) , L (4.3) where the notation un = u (tn ) is used here and in subsequent sections. Schemes that are higher order in time will be created by improving upon the approximation to N (u (t)). We use a Fourier spectral method to compute solutions to the CH and FCH equations (1.3 and 1.10). The spectral method comes with both advantages and disadvantages. The primary advantage is that the operators can be computed quickly and efficiently by hopping between the spectral and physical domains. This reduces nearly all of the calculations to element-wise operations on the arrays which can be computed extremely efficiently in parallel on a graphics processing unit (GPU). Further, solutions to the FCH equation develop layers with slope on the order of ε−1 [28], and the spectral method only needs a few grid points to capture such layers. The primary disadvantage of using a spectral method is that we are limited to periodic boundary conditions on rectangular domains. We expect to employ a finite element or discontinuous galerkin method when we need to change the domain and boundary conditions. 106 4.2 Stability and Linearization of the CH Equation The most straight forward way to split the CH equation into the proper form (4.1) is to define Lu = −ε2 ∆2 u − ∆u N (u) = ∆ u3 . Unfortunately, this splitting is not stable with respect to inversion of L. This can be seen by looking at the one-dimensional form of the operator in the frequency representation, where k is the wavenumber, L = −ε2 −k 2 2 − −k 2 = −ε2 k 4 + k 2 . (4.4) Ideally, the eigenvalues of the operator would be negative for all values of k or at least 1 nonzero, but here we have zeros at k = ε− 2 . There are a couple of different ways to rectify this problem. One way to stabilize the linear operator is to add and subtract a term in the equation that makes the operator non-positive in the frequency representation [56]. This gives, ut = −ε2 ∆2 u + ∆ u3 − u + β∆u − β∆u, 107 (4.5) with the splitting, Lu = −ε2 ∆2 u + β∆u N (u) = ∆ u3 − (1 + β) u . In the frequency representation, L = −ε2 k 4 − βk 2 , (4.6) so any β ≥ 1 gives the desired property. Through numerical simulations, we concur with Shen and Yang’s suggestion that β = 2 is the ideal choice. Unfortunately, the addition of this term introduces additional error which is undesirable, but like the method described in Chapter 2, it comes with a guarantee that the energy will never increase. Thus, this method is ideal for simulations where dynamics are less important that the final geometry or solution. This decrease in accuracy is evident in Table 4.1, and it becomes more significant for the Functionalized Cahn-Hilliard equation. A more accurate way to obtain stability under inversion is to linearize the problem using information from the physics of the solution. Solutions to the CH equation rapidly form domains where u (x) = ±1, so we choose to linearize about the background state u (x) = −1. We take v = u + 1 and substitute into equation 1.3 which gives vt = −ε2 ∆2 v + ∆ 2v − 3v 2 + v 3 . 108 (4.7) δt Simple Shen Physics-based ETD 1e-3 1.3741 9.7420e-2 7.2249e-2 2.5333e-3 1e-4 4.8206e-4 4.5247e-3 1e-5 6.8251e-5 3.3740e-4 1.3448e-4 6.5339e-6 1e-6 1.0790e-5 2.3092e-5 Table 4.1 Comparison of linearization choices for the CH equation. Error of the final solution is given for a simulation with fixed time step δt. For the simple linearization with δt = 10−3 , the solution became unstable and developed spurious oscillations. Taking the linear and nonlinear parts gives the operators Lv = −ε2 ∆2 v + 2∆v N (u) = ∆ v 3 − 3v 2 , where the frequency representation of L is strictly negative for all values of k except k = 0, as desired. With this formulation of the CH equation, we now apply the first order exponential integrator (4.3). We note that with this linearization, the first order method is unconditionally stable with respect to the time step size. Unfortunately, it suffers from a loss of convergence order for time steps larger than δt 4.3 10−4 . Linearization of the FCH Equation To effectively linearize the FCH equation, we use information from the physics of the solution. Solutions to the FCH equation rapidly form domains where u (x) = −1 is the dominant, background phase. We choose to linearize about the background state, u (x) = b = −1, by 109 taking v = u − b and substituting into equation 1.10 which gives vt = ∆ =∆ ε2 ∆ − W (v + b) + εη1 ε2 ∆ − W (b) − W ε2 ∆v − W (v + b) + ε (η1 − η2 ) W (v + b) (b) v − Q2 (v) + εη1 + ε (η1 − η2 ) W (b) + W (b) v + Q1 (v) ε2 ∆v − W (b) − W (b) v − Q1 (v) , where Q1 (v) and Q2 (v) contain the terms that are quadratic and higher order in v. For 1 our typical potential well, Wτ (u) = 2 (u + 1)2 1 (u − 1)2 − τ (u − 2) , we have Q1 (v) = 2 3 1 2 (6b + τ ) v 2 + v 3 and Q2 (v) = 3v 2 . We note that W (b) = 0, and separate the linear and nonlinear parts to obtain the operators Lv = ∆ ε2 ∆ − W (b) + εη1 ε2 ∆ − W (b) + ε (η1 − η2 ) W (b) v N (u) = ∆ − ε2 ∆ − W (b) + εη1 Q1 (v) + −W (b) − Q2 (v) ε2 ∆v − W (b) v − Q1 (v) + ε (η1 − η2 ) Q1 (v)] . This linearization is ideal because the frequency representation of L is strictly negative for all values of k except k = 0, making the operator stable under inversion for η1 ≥ η2 . Numerically, we observe stability as long as η1 and η2 are of the same order. With this formulation of the FCH equation, we now apply the first order exponential integrator (4.3). We note that with this linearization, the first order method is unconditionally stable with respect to the time step size, and now extend this linearization to higher-order in time methods for solving the FCH equation. 110 4.4 Computing Exponential Terms One way to improve time accuracy and obtain higher-order time-stepping is to expand N (v (t)) as a Taylor series in time. Over the integral of integration, we take the expansion about tn and truncate at the order of accuracy that we desire, N (v (t)) = N (vn ) + (t − tn ) JN dv dt + O δt2 tn = N (vn ) + (t − tn ) JN (Lvn + N (vn )) + O δt2 . d dt N (v) = JN dv dt (4.8) is calculated by hand, noting that all the time dependence is in v (x, t). We derive the second order method by first substituting the approximation into equation 4.2, vn+1 = eδtL vn + eδtL δt 0 e−sL [N (vn ) + sJN (Lvn + N (vn ))] ds. After pulling out the portions that are constant in time, vn+1 = eδtL vn + eδtL N (vn ) δt 0 e−sL ds + JN (Lvn + N (vn )) δt se−sL ds , 0 we integrate the exponentials by hand to obtain a second-order in time method, vn+1 = eδtL vn + eδtL − 1 eδtL − 1 − δtL JN (Lvn + N (vn )) . N (vn ) + L L2 (4.9) The exponential coefficients are universal for high order ETD methods, and they are particularly difficult to calculate numerically with reasonable accuracy. The exponential 111 coefficients are defined as: φ0 (z) = ez φ1 (z) = ez − 1 z φ2 (z) = ez − 1 − z z2 ... φl+1 (z) = 1 φl (z) − l! . z (4.10) In our implementation of ETD methods, z = δtL, and we always compute these terms in the frequency representation. Using this notation, equations 4.3 and 4.9 can be written as, vn+1 = φ0 (δtL) vn + δt φ1 (δtL) N (vn ) vn+1 = φ0 (δtL) vn + δt φ1 (δtL) N (vn ) + δt2 φ2 (δtL) JN (Lvn + N (vn )) , respectively. In Sections 4.5 and 4.6, we introduce methods that are higher order in time which use the higher order φ-functions. Unfortunately, computing φ1 (z) directly encounters significant errors when z is small, i.e. both δt and k are small. This comes from cancellation errors when rounding off values at machine precision, and can lead to errors that are orders of magnitude larger than the actual value [62, 110, 69]. These errors become increasingly problematic for φl (x) with larger values of l. To eliminate these errors, we use the Taylor expansion of φ1 (z) about z = 0, and include as many terms as necessary to obtain a desired tolerance. We find that direct computation of 1 φl (z) is sufficient if z > η l+1 , where η is machine precision (typically 10−16 ). In the spectral setting, L is small for wave numbers, k, near zero, but is typically O (1) or larger. Thus, small z depends on the size of δt which can be as small as 10−12 in our three dimensional calculations. When computing φ1 (z) and δt ≤ 10−7 , we divide the φ1 (z) operator into two portions: large k values are computed directly, and small k values are computed using a Taylor expansion out to a fixed number of terms that have been estimated a priori. In three 112 √ η dimensional calculations, the cutoff for which the expansion is necessary is given by L < δt , where the array L was computed once at the beginning of the simulation. The calculation is performed similarly when approximating the higher order φ-functions. An alternate method for stabilizing the calculation of φ-functions was presented by Kassam and Trefethen [69]. It involves calculating the value of an integral in the complex plane. When L is real, as the CH and FCH equations, the integration can be performed around any contour Γ that encloses the eigenvalues of the operator that are small in modulus. Kassam uses circles and 32 or 64 evaluation points with the trapezoidal rule to obtain stability and accuracy. Similar work by Du and Zhu suggests that only two points are necessary for up to the fourth-order Runge-Kutta scheme [67]. It seems reasonable that only a few points are necessary to provide stability, but we find it difficult to believe that two points is sufficient to see higher order accuracy in time, and the paper gave no results higher than second order. 4.5 Runge-Kutta Methods One way to improve the time accuracy of the exponential integrator is through the use of Runge-Kutta (RK) type methods. This is accomplished by computing approximations to N (v (t)) using multiple stages over the interval [tn , tn+1 ]. For the second order RK method, we take an explicit exponential integrator step an = eδtL vn + eδtL − 1 N (vn ) , L 113 then make the approximation N (v (t)) = N (vn ) + (t − tn ) (N (an ) − N (vn )) /δt + O δt2 . This approximation gives the ETDRK2 method described in [65, 66]. Denoting the numerical approximation to v (tn ) as vn , the second order RK method is eδtL − 1 N (vn ) L eδtL − 1 − δtL (N (an ) − N (vn )) . vn+1 = an + δtL2 an = eδtL vn + (4.11) A similar construction gives a third-order exponential integrator RK method, δt e2L−1 + N (vn ) an = L eδtL − 1 bn = eδtL vn + (2N (an ) − N (vn )) L δt e 2 L vn vn+1 = eδtL vn + δt−2 L−3 −4 − δtL + eδtL 4 − 3δtL + δt2 L2 N (vn ) + 4 2 + δtL + eδtL (−2 + δtL) N (an ) + −4 − 3δtL − δt2 L2 + eδtL (4 − δtL) N (bn ) } . (4.12) The standard fourth order Runge-Kutta method only attains third-order accuracy in time, but [65] gives a method with different coefficients that is fourth-order accurate for 114 exponential integrator schemes, δt an = δt e 2 L vn e2L−1 + N (vn ) L bn = δt e 2 L vn e2L−1 + N (an ) L cn = δt e 2 L an e2L−1 (2N (bn ) − N (vn )) + L δt δt vn+1 = eδtL vn + δt−2 L−3 −4 − δtL + eδtL 4 − 3δtL + δt2 L2 N (vn ) + 2 2 + δtL + eδtL (−2 + δtL) (N (an ) + N (bn )) + −4 − 3δtL − δt2 L2 + eδtL (4 − δtL) N (cn ) } . (4.13) Since we are only using explicit methods, we require that these methods have sufficient numerical stability. Stability for the RK and other methods will be discussed in Section 4.8. 4.6 Taylor Methods In Section 4.4, we derived a second-order, explicit ETD method based on a Taylor expansion of N (v) centered at tn . Unfortunately, numerical simulations showed that this method (4.9) is unstable for large time-step sizes and attains second order convergence for only very small time-step sizes. This is due to approximating the function on the left hand side of the integration interval where the exponential multiplier is smallest (remember that L ≤ 0 for each k in the frequency representation). To solve this dilemma, we instead approximate 115 N (v (t)) with an expansion about tn+1 , N (v (t)) = N (vn+1 ) + (tn+1 − t) JN dv dt + O δt2 tn+1 = N (vn+1 ) + (tn+1 − t) JN (Lvn+1 + N (vn+1 )) + O δt2 . (4.14) Again substituting into equation 4.2 and extracting the multiplying constants from the integral, we have vn+1 = eδtL vn + eδtL δt 0 e−sL [N (vn+1 ) + (δt − s) JN (Lvn+1 + N (vn+1 ))] ds = eδtL vn + eδtL {N (vn+1 ) + δtJN (Lvn+1 + N (vn+1 ))} δt − JN (Lvn+1 + N (vn+1 )) δt e−sL ds 0 se−sL ds . 0 Computing the integrals by hand leaves an implicit numerical method, vn+1 = eδtL vn + eδtL − 1 {N (vn+1 ) + δtJN (Lvn+1 + N (vn+1 ))} L eδtL − 1 − δtL JN (Lvn+1 + N (vn+1 )) . − L2 Rather than solve this implicit method, we add an explicit stage to approximate vn+1 , eδtL − 1 N (vn ) L eδtL − 1 vn+1 = eδtL vn + {N (an ) − δtJN (Lan + N (an ))} L eδtL − 1 − δtL + JN (Lan + N (an )) . L2 an = eδtL vn + 116 (4.15) This method based on the Taylor expansion is not unconditionally stable with respect to time-step size, but it allows a much larger time step size than (4.9) and shows second order convergence (see Section 4.7). Further, since the second stage of equation 4.15 accepts an approximation for vn+1 and returns a better approximation for it, we can iterate on the second stage of the method to reduce the error of our solution. As the number of iterations increases, the error in the solution approaches the error of a fully-implicit secondorder exponential integrator. In this way, the method behaves like a predictor-corrector type method. By taking more terms in our initial Taylor expansion, a similar third order method is an = eδtL vn + eδtL − 1 N (vn ) L ˙ N (an ) = JN (Lan + N (an )) ¨ N (an ) = HN [Lan + N (an )]2 + JN [L (Lan + N (an )) + JN (Lan + N (an ))] vn+1 = eδtL vn + eδtL − 1 L ˙ N (an ) − δtN (an ) + eδtL − 1 − δtL ˙ ¨ N (an ) − δtN (an ) L2 eδtL − 1 − δtL − 1 δt2 L2 ¨ 2 + N (an ) . L3 δt2 ¨ N (an ) 2 + (4.16) Again, each iteration of the second stage improves the solution by decreasing the error. Since the predictor stage is only first order, we require at least two iterations of the second stage to obtain a third order method. Further iterations drive the solution to the expected error of a fully-implicit third-order exponential integrator. 117 4.7 Convergence of ETD Methods To test the order of convergence in time step size, and to compare the methods discussed in this chapter, we completed a refinement study. The initial condition was a solution to the CH equation that was far past the spinodal phase but still far from the steady state solution. Time evolution was computed for fixed δt = 10−4 then halved and recomputed. We estimated the error for δt = 10−4 as the distance between the solutions measured in the maximum norm. Using the solution with δt = 5 · 10−5 , we again halved the time step to calculate the corresponding error. Figures 4.1 through 4.5 give the results of the refinement studies for each of the methods described above. Figure 4.1 shows the results for the higher order ETD-Runge-Kutta methods (equations 4.11, 4.12, and 4.13) together with the general ETD (4.3) for comparison. Dashed lines with the proper order are included for reference. We note that for small enough time step size, each of the methods reaches its expected order of convergence. However, as the size of the time step approaches δt = 10−4 the order of convergence begins to plateau. The reason for this behavior is discussed in Section 5.4.1. Figure 4.2 gives the corresponding plot for the ETD-Taylor methods (equations 4.15 and 4.16). The methods reach their expected order of convergence for larger size time steps than the ETD-RK methods, but again they plateau near δt = 10−4 . Figures 4.3 and 4.4 show the effect of iterating on the second stage of the method for the ETD-T2 and ETD-T3 methods, respectively. These iterations reduce the error by as much as three orders of magnitude in the ETD-T2 method and five orders of magnitude in the ETD-T3 method. Lastly, Figure 4.5 compares the error between the Runge-Kutta and Taylor methods for both the second and third order schemes. For the simple implementation, the Taylor 118 0 10 −2 10 −4 Error 10 −6 10 ETD ETD−RK2 ETD−RK3 ETD−RK4 −8 10 −10 10 −6 10 −5 10 δt −4 10 Figure 4.1 Timestep size refinement study for the ETD-RK methods given in Section 4.5. Dashed lines are included for comparison and have slope of two, three, and four respectively. methods have slightly more error than the corresponding ETD-RK methods, but including several iterations makes the Taylor methods much more valuable. 4.8 Stability Analysis In this section I compare the stability regions of the first through third order ETD methods described above (4.3, 4.11, 4.12, 4.15, 4.16). The stability region is the parameter region such that the magnitude of the amplification factor is less than or equal to one when it is applied to the model equation ut = ξu + λu, 119 (4.17) 0 10 −2 Error 10 −4 10 −6 10 ETD ETD−T2 ETD−T3 −8 10 −6 10 −5 10 δt −4 10 Figure 4.2 Timestep size refinement study for the ETD-Taylor methods given in Section 4.6. Dashed lines are included for comparison and have slope of two and three respectively. where Lu = ξu, and N (u) = λu. We assume ξ is a negative, real-valued constant and λ is complex. In Figures 4.6, we plot the regions for λ in the complex plane where un+1 ≤ 1 for one step with an initial condition un = 1. Similar plots and analysis for the ETD-RK methods can be found in [67] and [65]. As discussed in Section 4.6, the Taylor methods decrease error when the second stage is repeated, but unfortunately iterating on the corrector stage does not significantly increase the size of the stability region as shown in Figure 4.7. 120 0 10 −2 Error 10 −4 10 1 2 5 10 20 −6 10 −8 10 −6 10 −5 −4 10 δt 10 Figure 4.3 Timestep size refinement study for the ETD-T2 method with different numbers of iterations. The dashed line has slope two and is included for comparison. 4.8.1 Predictor/Corrector Analysis To understand the limits on the stability region for the ETD-Taylor methods, we take the limit as δtξ → 0 and look at the ode, ut = λu. (4.18) The backward Taylor series for this problem yields a single time step of the form un+1 = un + z − z2 z3 z4 + − + ··· 2! 3! 4! un+1 , where we can truncate the expansion at the order we desire for the method. 121 (4.19) 0 10 −2 10 −4 Error 10 −6 10 1 2 5 10 20 −8 10 −10 10 −6 −5 10 10 δt −4 10 Figure 4.4 Timestep size refinement study for the ETD-T3 method with different numbers of iterations. The dashed line has slope three and is included for comparison. Taken together with a predictor step, we have: Predictor: u[0] = p0 (z) · un (4.20a) Corrector: u[k] = un + p(z) · u[k−1] , (4.20b) with k as the counter for number of corrector iterations. To implement the second order ETD-T2 scheme (4.15) for the ode of interest (4.18), we use p0 (z) = 1 + z in (4.20a) which yields the first order predictor, and set p(z) = z − z 2 /2 in (4.20b) giving the second order corrector. Each iteration of (4.20b) will increase the order of the method, up to the order of accuracy of p(z). Thus the second step of ETD-T3 must be performed at least twice to achieve solutions that are third order in time. 122 0 10 −2 10 ETD ETD−T2 ETD−RK2 ETD−T2 (20) ETD−T3 ETD−RK3 ETD−T3 (20) −4 Error 10 −6 10 −8 10 −10 10 −6 10 −5 −4 10 δt 10 Figure 4.5 Comparison of the error for ETD-RK methods versus the ETD-Taylor methods with one and twenty iterations. The exact solution to the linear difference equation (4.20a)-(4.20b) is given by, u[k] = un 1 + p(z)k p0 (z) − 1 − p(z) 1 − p(z) un . (4.21) Provided |p(z)| < 1, the error converges to the same as a fully implicit scheme, but unfortunately this has a much smaller region of stability. For second and third order corrector steps, Figure 4.8 shows the regions where |p(z)| < 1 which match with the results of Figure 4.7 for small δtξ. 4.9 Implementation of ETD-RK2 To obtain larger-scale, three-dimensional solutions to the CH and FCH equations, we implemented the ETD-RK2 method for time stepping combined with a spectral solution in space. 123 4 4 ETD ETD−RK2 ETD−RK3 3 3 2 2 1 1 0 0 −1 −1 −2 −2 −3 ETD ETD−T2 ETD−T3 −3 −4 −4 −2 0 2 4 −4 −4 (a) Runge-Kutta δtξ = −0.01 −2 0 2 4 (b) Taylor δtξ = −0.01 15 15 ETD ETD−RK2 ETD−RK3 10 10 5 5 0 0 −5 −5 −10 ETD ETD−T2 ETD−T3 −10 −15 −15 −10 −5 0 5 10 15 (c) Runge-Kutta δtξ = −10 −15 −15 −10 −5 0 5 10 15 (d) Taylor δtξ = −10 Figure 4.6 Stability regions for exponential time differencing methods up to third order in time. Regions are plotted in the δtλ complex plane. 124 4 3 15 ETD ETD−T2 ETD−T3 10 2 1 5 0 ETD ETD−T2 ETD−T3 0 −1 −5 −2 −10 −3 −4 −4 −2 0 2 4 −15 −15 −10 (a) δtξ = −0.01 −5 0 5 10 15 (b) δtξ = −10 Figure 4.7 Stability regions for ETD-T2 and ETD-T3 methods with 100 iterations of the second step. Regions are plotted in the δtλ complex plane. 4 3 2nd Order 3rd Order 2 1 0 −1 −2 −3 −4 −4 −2 0 2 4 Figure 4.8 Contours where p (z) = 1, giving limits on the region of stability for ETD-T2 and ETD-T3. The regions are plotted in the δtλ complex plane. Compare with Figure 4.7 125 Two properties make the spectral method ideal for this calculation: high spatial resolution captures solution fronts, and computing derivatives is fast and easy. Details for the general spectral method are given in Section 3.3. We first wrote the code in C++ and parallelized it using OpenMP. After we were satisfied with the stability and validity of the CPU version of the code, we wrote a corresponding code for a GPU using C++/CUDA. The Fourier transforms were computed using cuFFT, a package similar to FFTW adapted for GPUs. We used an adaptive time-stepping scheme based on error control, and a cubic domain with 128 grid points in each direction. The cubic domain is limited in the number of Fourier modes in each dimension by the size of the GPU’s on-card memory. We performed our calculation on an Nvidia M1060 which has 4 gigabytes of memory leading to the 128 mode limitation. 4.9.1 Application to a GPU When writing algorithms for a graphics processing unit there are three important hardware properties; massive parallelism, fixed memory limit, and asynchronous scheduling. Algorithms that are effective within these limitations can gain impressive speed as shown in the previous examples (Sections 2.9 and 3.6.1). Modern GPUs typically have thousands of streaming multiprocessors that can handle thirty-two threads at a time. To be computationally effective, this impressive parallelism requires problems of sufficient size. If the computational requirements of the algorithm are too small, multiprocessors will sit idle and lead to a decrease efficiency. On the other hand, each GPU has limited on-card memory, so an effective algorithm will not require large amounts of storage. This on-card memory is limited to about 6GB of storage for current GPUs, and any memory transfer onto and off of the card is much slower than the speed of calculation, so it is effectively a hard limit on the amount of usable memory. 126 Multiply/Add FFT IFFT u u ∆ ∆u u u2 + u ∆u u2 + u ∆u Figure 4.9 Diagram showing how to implement a spectral method on a GPU to avoid synchronization, minimize memory use, and fully utilize the parallelism of the GPU. Finally, graphics processing units gain speed by hiding latency. This is done by scheduling with a first-come-first-served method for each of the streaming multiprocessors. Every time an algorithm requires a synchronization of the threads, the scheduler must clear out and restart, which slows down the computation. This is particularly relevant when computing with a spectral method because synchronization must occur when switching between regions of element-wise calculations and regions of Fourier transforms. Figure 4.9 gives the process for computing u2 + u ∆u. This term does not appear in either the CH or FCH equations, but it gives a simple working example to better explain how the more complicated algorithms work. Beginning with two arrays, u in the spatial domain and ∆ in the frequency domain, we alternate between FFT/IFFTs and elementwise operations. Note that by computing u2 + u at the last level we are able to perform the calculation requiring storage of only one additional array. Further, when the domain is relatively large, every streaming multiprocessor will be required giving full usage of the massive parallelism. 127 4.9.1.1 GPU Speedup For the ETD-RK2 method, we conducted a speed test similar to Section 2.9. The test consisted of a very short simulation (approximately thirty time steps) computed with a single graphics processing unit compared against a fully parallelized code running on eight CPU processors. The hardware for the two codes was purchased at roughly the same time by the High Performance Computing Center at Michigan State University, so the comparison is fair. The same random initial data was used for each simulation, and the parameters were: ε = 0.03, η1 = 5.0, η2 = 3.0, and τ = 0.6. We adjusted the number of grid points in the domain by powers of two in each dimension keeping the shape as close to cubic as possible. The CPU simulations were run on two quad-core Intel Xeon E5620 processors, with each of the eight cores having 2.4 GHz of processing speed. The code was written in C++ and aggressively parallelized with OpenMP using the FFTW library for the Fourier transforms. The GPU simulation was on an Nvidia Tesla C2075 which has 448 cores with 1.15 GHz processing speed. The code was written in C++ and CUDA using the cuFFT library for the Fourier transforms. Figure 4.10 shows the number of times faster that the GPU calculation ran compared to the CPU calculation. As discussed before, the total computation speedup is lower than the speed up per time-step due to the slow transfer of data to and from the GPU card. As computation length increases the total computation speed up will approach the higher value. Further, the size of the problem is also important. As the number of grid points increases, so does the amount of computation between thread synchronizations on the GPU leading to greater efficiency. The number of grid points for this computation cannot be increased above 102 2 due to the 6GB memory limit on the card. 128 Times Faster Than CPU 20 15 10 5 Per Timestep Total Time 0 15 16 17 18 19 20 21 22 Number of Points in Domain (log 2P) Figure 4.10 Number of times faster the GPU computation is over an eight-core, fullyparallelized, CPU computation. An aspect of supercomputing centers that is gaining notoriety is the issue of power consumption [111]. For this speed comparison, the CPU calculation used eighty watts of energy per processor for a total energy consumption of 160 W, while the GPU used 225 Watts. Although the CPU used less power per unit time, the total power consumption was two to nine times less for the GPU than the CPUs, because the GPU completed the calculation many times faster. Energy efficiency is an additional advantage of GPU computing. 4.9.2 Adaptive Time-Stepping In order to capture fast transient behaviors in the Cahn-Hilliard and Functionalized CahnHilliard simulations, we implemented a second order adaptive time stepping scheme with the ETD-RK2 method. Unlike the pairs of methods described in Chapter 3, we do not have a corresponding implicit scheme with matching error terms. Instead we use the trapezoidal 129 method to compare against, un+1 = un + δt (F (un+1 ) + F (un )) . 2 (4.22) It would be pointless to solve the second order implicit equation, so we take the solution from ETD-RK2 at the new time, uETD-RK2 , substitute it into equation 4.22, and explicitly compute an approximation to the solution from the trapezoidal method, utrap = un + δt (F (uETD-RK2 ) + F (un )) . 2 (4.23) Taking the difference of these two solutions at time tn+1 , we obtain an estimate, η, of the exact single-step error, ηe , ηe ≈ η := 1 u − utrap . 2 ETD-RK2 (4.24) This estimate is used to choose the size of the next time step, δtn+1 , with δtn+1 = δtn max 0.8 3 σ , 1.3 , η where σ is the desired tolerance, and as before, 0.8 and 1.3 are safety factors. The adaptive time-stepping is primarily important for the beginning stages of a simulation where the solution rapidly relaxes onto the slow manifolds of evolution. It also becomes important later in simulations where domains grow close together and rapid topological changes occur. At all times, we restrict our time-step to δt ≤ 10−4 , for the reasons discussed in Sections 4.7 and 5.4.1. 130 4.10 Numerical Examples for FCH As with the other methods, we we used the high-performance solver to investigate important geometries and structural evolutions that had connections to the analysis of the models. Using the medusa head problem in two dimensions, we show that the ETD-RK2 method gives proper convergence in time for important topological events. Second, we applied the threedimensional GPU solver to a geometry with a hollow sphere interacting with hoop shaped pores. Third, we will show evolution of a punctured hollow sphere that has applications to vesicle membranes in biology. 4.10.1 Medusa Head in 2D The medusa head problem forced us to make some important changes to our early choice of time-stepping method for the FCH equation. In Section 2.7, we initially presented the medusa head in three dimensions where we used the convex splitting method to compute time evolution. When we went back to show convergence in time for the precipitous drop in energy, we discovered that when we cut the time step in half, the event happened in approximately half the time. If this effect was due to compounding error that drove the system away from the unstable equilibrium, we would expect that halving the time step would lengthen the time before the event. We chose this geometry as a check against the undesirable behavior shown by the convex splitting method. In Figure 4.11, we ran the simulation with the same initial condition and fixed time-steps of 2 · 10−4 , 10−4 , and 5 · 10−5 . The event did happen sooner for smaller time step sizes, but difference was very minor. This suggests that our time evolution is precise, and future work will address accuracy in time evolution compared against the asymptotic 131 analysis of the model. The calculation was done in MATLAB. 4.10.2 Punctured Hollow Spheres An area of research in cell biological where the FCH model could prove useful is in the study of lipid membranes. Cells use vesicles (hollow spheres) composed of lipid bilayer membranes to transport nutrients and perform other vital functions. Recently, molecular dynamics simulations have been used to study formation of these membranes [112, 113]. Using our phase field model, we studied the time evolution of a vesicle that had been punctured by π π removing the section with azimuthal angle, φ, less than π , 16 , and 32 . 8 132 Figure 4.11 Time evolution of an unstable steady state solution to the FCH equation in two dimensions. Energy versus time is shown for three identical simulations with different fixed time step sizes. 133 Figure 4.12 shows the time evolution of the punctured vesicle with φ ≤ π removed. 8 Initially, the opening receded and reduced the line-energy by thickening the edge. After, the opening had stabilized, the hole slowly closed. The parameters for simulation were: ε = 0.03, η1 = 5, η2 = 10, τ = 0.2, and the domain was [− π , π ] cubed. In non-dimensionalized 2 2 time, the closing times for the three punctured spheres was T = 10.9934, T = 0.9658, and T = 0.03217 respectively, as measured by the sign change of the second derivative of the energy decay from negative to positive. 4.11 Conclusions The exponential time differencing schemes provide a good balance of accuracy and speed when computing solutions to the Functionalized Cahn-Hilliard equation (1.10). The general ETD scheme can be extended to higher-order time-stepping schemes by approximating the nonlinear portion of the equation using both Runge-Kutta or Taylor expansions. Finding a good linearization of the right hand side of the equation provides dividends in both stability and accuracy of the calculation. The best linearization that we found was based on the physics of the model by computing the linearization about the dominant phase u = −1. We studied the stability and accuracy for the higher-order ETD-RK and ETD-Taylor methods and found that in general the Runge-Kutta methods had better stability properties and accuracy for the simple schemes. However, when we treated the ETD-Taylor methods like predictor corrector schemes, it only took a few iterations to obtain a scheme hundreds of times more accurate that either the ETD-RK or simple ETD-Taylor methods. Higher-order ETD schemes rely on precise calculation of the φ-function to be stable and accurate. 134 T =0 T = 0.5 T = 10.0 Figure 4.12 Time evolution of a punctured vesicle. 135 T = 15.0 Adaptive time-stepping based on error estimates improved the speed of the calculations for reasonable accuracy. We were able to use this combination of speed and accuracy to capture accurate evolution of physically and biologically relevant structures. Certainly, exponential time differencing scheme are adequate for simulation of the FCH equation. 136 Chapter 5 Comparison of Methods for the FCH Equation To quantify the differences between the methods proposed in Chapters 2, 3, and 4, we performed an experiment on the punctured hollow sphere geometry from Section 4.10.2 using the three-dimensional GPU codes given in Appendix ??. The initial condition used π was a sphere with angle φ ≤ 16 removed and evolved numerically until a stage after the hole had closed, but the sphere had not yet become smooth. This gives a smooth geometry on the manifold of evolution that will evolve quickly without any changes in topology. The parameters for the test were: ε = 0.03, η1 = 5ε = 0.15, η2 = 10ε = 0.30, and τ = 0.20. We compare the accuracy, computation speed, and time step restrictions for the three methods computed on the GPU. 5.1 Description of Compared Schemes The implementation of the convex splitting method applied the scheme from equation 2.5 with the splitting given in equations 2.12 and 2.13. The fixed-point iteration uses the operator defined by equation 2.31 with c1 = c2 = 5.0. The error from this method is expected to be O (δt), and the adaptive time-stepping constant is set at ct = 0.16 (see 2.41). The implementation of the implicit method used the scheme described in Section 3.3.3 137 with the Jacobian matrix and associated preconditioned matrix given in equations 3.13 and 3.14. The error from this method is O (δt), and the adaptive time stepping tolerance was set to σ = 10−4 . The implementation of the exponential time differencing used the ETD-RK2 scheme (4.11), with the linearization from Section 4.3. The error from this method is O δt2 . We compute adaptive time stepping based on the difference between the solutions after both stages at time t = tn+1 . This gives a fast first-order adaptive time stepping that does not require any extra computation. 5.2 Complete Simulation For the first comparison, we ran a simulation from T = 0 to T = 2 with each of the three methods. We ran the simulations using the fully parallelized and optimized GPU code, and we allowed completely adaptive time stepping. Results from the simulations are given in Table 5.1. To obtain error estimates, we compared against a “true” solution computed using the ETD-RK2 method with a fixed time step size of δt = 10−6 . We also computed the solution with a fixed time step size δt = 10−5 so that we could be confident that this “true” solution was accurate enough (i.e. the error was at least an order of magnitude smaller than the comparison solutions). The first major result was that the solution from the convex splitting method did not evolve enough to close the puncture as shown in Figure 5.1. This not only gave significant error, but it did not allow us to compare event times, and the calculated energy of the final solution was significantly different from the other two methods. Further, without the rapid changes in geometry that accompany the event, the adaptive time stepping scheme did not 138 Convex Splitting Fully Implicit Exponential Integrator Error 29.66 8.6836e-3 1.8772e-1 Maximum δt 1.8e-2 7.7e-4 1.0e-4 194 43172 4540 Computation Time (s) Number of Time Steps 190 2997 21109 none 0.9654 0.9658 Event Time Energy Drop 169.604 176.337 176.335 l2 Table 5.1 Comparison of the convex splitting, fully implicit, and exponential integrator methods for a full simulation up to T = 2 with adaptive time stepping. Convex Splitting Implicit and ETD-RK2 Figure 5.1 Final states of the complete simulation comparing the convex splitting method to the other two methods. The images show a two-dimensional slice of the three-dimensional solutions slow down for that portion of the simulation and the computation time was absurdly fast. The error from each simulation was calculated with the l2 -norm, and it shows that the implicit scheme was approximately twenty times more accurate than the ETD scheme. The implicit scheme also took time steps up to seven times larger than the ETD scheme. The big winner for the ETD method is that it was roughly ten times faster than the implicit scheme, and it accurately predicted the geometric event and ended with a computed energy very close to that of the implicit scheme. The speed of the exponential time differencing was much faster than the other two methods. On average, the ETD scheme took 0.215 seconds per time step compared to 1.021 139 second and 14.405 seconds for the convex splitting and implicit schemes, respectively. This explains why the ETD scheme can successfully take far more small, accurate time steps. This speed difference is particularly important for the three-dimensional simulations. With the ETD scheme, we can compute important geometries out to T = 300 or less in one week. The same simulation with the convex splitting would likely miss the important dynamics, and the implicit scheme would require more than two months. 5.3 Fixed Time-Step Simulation For a second comparison of the three methods, we conducted fixed time-step simulations for δt = 4 · 10−4 , 2 · 10−4 , and 1 · 10−4 . For initial conditions, the simulation used the “true” solution from above at time T = 1, and we computed the solution up to T = 2. This initial geometry was shortly after the closing of the puncture and it evolved to resemble a hollow sphere. The results are collected in Table 5.2, and they give further evidence that the convex splitting method has particularly bad evolution and thus poor accuracy. The implicit method is roughly two orders of magnitude more accurate than the ETD method for the time-step sizes used. The convex splitting scheme and the exponential integrator compute time steps at roughly the same speed, with the implicit scheme taking five to ten times longer. It is particularly interesting that when the step size halves (number of steps double), the amount of computation time does not double. Some of this can be attributed to the overhead of slow data transfer onto and off of the GPU card. However, the implicit scheme actually becomes more efficient for each time-step when the steps are small. This is because the initial guess for the Newton solve is much closer when the time-step is small, and it therefore uses fewer 140 Convex Splitting Fully Implicit δt Error Time Error Time 4e-4 5.3619 603 1.2657e-3 6968 2e-4 4.1915 1016 6.1597e-4 8096 1941 2.9146e-4 11973 1e-4 2.6570 Exponential Integrator Error Time 6.1249e-1 751 2.9415e-1 1126 1.0623e-1 2159 Table 5.2 Comparison of error and simulation time for fixed time-step simulations of the FCH equation. Error calculations use the l2 norm, and time is given in seconds. Newton iterations and far fewer preconditioned conjugate gradient iterations. 5.4 Time Step Size Restrictions Although we have focused on time stepping methods that balance large time steps with accurate solutions and reasonable computation speeds, we have discovered that there are soft limits on the size of time steps for each of the three types of methods. Both the convex splitting method and the exponential time differencing give incorrect evolution for time steps that are too large. The implicit scheme is limited by the ability to complete the Newton convergence and the error control. In this section, we discuss the causes for these limitations and how to adjust for them. 5.4.1 Solution Freeze Out for Convex Splitting In our work with Eyre’s convex splitting method, we discovered that some numerical methods for the Cahn-Hilliard equation suffer from a behavior we call solution freeze out. This behavior occurs when doubling the time step size does not give twice the evolution of the solution, and the movement of mass with respect to time vanishes. This can happen when the higher frequency modes in the solution are over damped, causing the mass in the phases to have a limit on the distance evolved in a given time step. 141 T = 2 · 10−2 T = 4 · 10−2 T = 6 · 10−2 T = 8 · 10−2 T = 10−1 Figure 5.2 Time evolution showing freeze out of solution with larger time step sizes. For rows one through three δt = 10−2 , 10−3 , and 10−4 respectively. We first noticed this behavior when performing a refinement study on an unstable equilibrium initial condition. When we increased the time step size by a factor of ten, the critical event happened nearly ten times later in the evolution. This was surprising because we expected any increase in error to drive the geometry away from the equilibrium point more quickly. Figure 5.2 shows this type of evolution for time step sizes δt = 10−2 , 10−3 , and10−4 in the coarsening stage of evolution for the CH equation. The coarsening of the solution should occur near T = 2 · 10−2 , but is not captured for the largest time step size. Further, when δt = 10−3 , the freeze out leads to incorrect early topological changes which give a different solution at the final time. This freeze out behavior was also observed by Gomez and Hughes [55]. 142 5.4.2 Solution Freeze Out for Exponential Time Differencing The first order exponential integrator also experiences freeze out. This causes the decrease in accuracy leading to loss of convergence order which can be seen in Figures 4.1 and 4.2 as δt approaches 10−4 . The question becomes, at which time step size do we stop trusting the evolution of the solution? The obvious answer is to look at the order of convergence. Table 5.3 shows the time refinement study that gives the order of convergence for a given fixed δt used in the first order ETD scheme (4.3). For any δt 10−4 the method ceases to be first order in time, and it will give incorrect evolution. The study was performed on a solution of Cahn-Hilliard equation that was far beyond the spinodal stage, but this behavior is consistent with long-time simulations of the FCH equation. δt Order 1.00E-3 -0.1754 5.00E-4 0.4640 2.50E-4 0.8845 1.25E-4 1.0263 6.25E-5 1.0433 3.12E-5 1.0354 1.56E-5 1.0266 Table 5.3 Order of convergence at given time step sizes. More than just a refinement study tells us why δt ≈ 10−4 is the appropriate cutoff. If we look back at equation 4.3, we notice that all of the time evolution is embedded in the exponential operators. Freeze out happens when doubling the time step size results in less than twice the evolution. In Figure 5.3, we plot the amount of evolution versus time step size and wave number (as in 4.4). We define amount of evolution as the fractional change in 143 the solution divided by the fractional change in time. For the first order ETD method, amount of evolution = φ1 (2dtL) −1 φ1 (dtL) 2dt − 1 dt + 1. As timestep size grows, note that the freeze out begins initially with high-frequency wave numbers. This behavior brings stability to the method, but when δt gets too large it starts damping out physically relevant wave numbers. We identify physically relevant wave numbers by the energy spectrum of the solution (Figure 5.4). For the geometry and length scales we compute, a typical simulation must capture |k| requires δt 40, and from Figure 5.3, we see that this 10−4 . Switching now to the FCH equation, Figure 5.5 shows that the change of the linear operator increases the freeze out behavior, particularly for the convex splitting method. In two dimensions, the FCH equation requires accuracy for essentially the same wave numbers as the CH equation due to geometric similarities. Thus for the ETD schemes, we limit time steps to a maximum of dt = 10−4 any time that the adaptive time stepping scheme estimates a time step larger than that. For the convex splitting scheme, the same limitation would require time steps no larger than δt = 10−7 which greatly reduces its feasibility. 5.4.3 Implicit Time-step Size Restriction The implicit scheme has a more standard time-step size restriction. Due to the fourthorder nonlinearity in the Functionalized Cahn-Hilliard equation, Newton’s method will not converge if the initial guess is not close enough. We have two reasonable choices for the initial guess; the solution at the last computed time step, or the solution of a forward Euler step. Preconditioning the conjugate gradient iteration has no effect on the time-step size 144 (a) Exponential Integrator (b) Convex Splitting Figure 5.3 Freeze out of operators for the two methods applied to the Cahn-Hilliard equation showing the amount of evolution when doubling the time step size. Note that both methods suffer from similar freeze out behavior. 145 (a) Typical Solution to the CH Equation (b) Energy Spectrum Figure 5.4 The energy spectrum of solutions to the Cahn-Hilliard equation is dominated by low wave numbers. restriction. When using the last solution as the initial guess, the larger the step in time, the farther away the initial guess is from solution at the next time. This gives a time-step restriction on the order of 10−6 for the FCH equation with ε = 0.03. On the other hand, using a forward Euler step to initialize the implicit solve also has a limitation. The Euler step should capture the evolution and make the initial guess closer, but the explicit scheme is unstable for large time-steps and introduces spurious oscillations. This drives the forward Euler guess away from being a good initializer. It turns out that the forward Euler step is better in most cases and provides a time-step restriction on the order of 10−4 when ε = 0.03. 5.5 Conclusions In conclusion, the exponential time differencing methods give the best combination of speed and accuracy, both of which are important in computing long time evolution for solutions 146 (a) Exponential Integrator (b) Convex Splitting Figure 5.5 Freeze out of operators for the two methods applied to the Functionalized CahnHilliard equation showing the amount of evolution when doubling the time step size. Note that the convex splitting method suffers from much stronger freeze out behavior. 147 of the Functionalized Cahn-Hilliard equation in three dimensions. In combination with a spectral solution in space, application of the ETD method to a graphics processing unit greatly enhances the speed of the simulation and allows us to compute up to T = 300 in a week or less which is a great improvement over the fully implicit scheme. Although the ETD methods suffer from freeze out effects, the restriction is much less stringent than the convex splitting method. This gives time accuracy when the code enforces a hard limit on the size of the time-step. The ETD method captures relevant geometric evolution better than convex splitting and much faster than a fully implicit scheme. The Functionalized Cahn-Hilliard equation can be effectively split using the convex splitting method proposed by Eyre [34, 46]. The scheme is very efficient computationally and performs its implicit iterations on a timescale much like an explicit method. A convex splitting scheme could be very effective in any calculation where the solution at final time is important, but the evolution that arrives at that solution is unneeded. The primary drawback is the lack of accuracy when time-steps are restrictively small. The convex splitting is also very difficult to adapt to new potential wells or vector phase problems which will be necessary when connecting with ab initio simulations and experimental work. Lastly, the fully implicit scheme is the most accurate by at least an order of magnitude. If very precise evolution is required an implicit scheme is the proper choice. The cost of using the implicit scheme is its slow computation time. Preconditioning the Newton solve with a physics-based preconditioner does not alleviate the small time-step sizes required for the iterative solve to converge, but it does reduce the number of conjugate gradient iterations by up to a factor of one hundred and cuts the computation time similarly. 148 5.6 Future Work As with any interesting research, there are a wide variety of directions we could take in the future. An interesting extension would be to improve the model to more than two phases. Some of the analysis has already been performed for three phases, and the new energy formulation comparable to equation 1.6 is  E (u) = 1 2 ε ∆U − W Ω2 2 −u2  U + εH0   u1 − ε η1 ε2 2 U 2 + η2 W U dΩ. (5.1) U = (u1 , u2 )T defines the fraction of polymer phases one and two, and the fraction of solvent is us = 1−u1 −u2 so that the total amount is identically one. Further, the εH0 term allows us to introduce intrinsic curvature and have greater control when modeling important biological systems such as endocytosis and exocytosis. An important aspect of future work could be the development of better ties to the physics through a continuum mechanics description. Currently, there is evidence from experiments and some asymptotic analysis that suggests that the model is reasonable, but it would vastly improve the value of the FCH model if we could identify what types of physical systems connect well with the model’s results. With this development, we would be able to identify the physical constants that each of the four coefficients in the FCH energy, namely ε, η1 , η2 , and τ . We would also need to obtain an accurate potential well, W (u) for each specific system. A more physics based application of the model is comparison with small angle scattering either with x-rays (SAXS) or neutrons (SANS). We have put some effort into this connection already, but the extension of the numerical to SAXS comparison could lead to an inverse 149 problem to find the four coefficients in the FCH energy. With some good fortune, this inverse problem could assist engineers and chemists in creating better membranes for PEM fuel cells, solid state Dye-Sensitized solar cells (DSSC), or polymer gel electrolyte batteries. Finally, one of the widest areas of future research would be the generation of better numerical methods for handling different boundary conditions and larger domains. With the ability to compute solutions on larger domains with any type of boundary condition, we could simulate the entire pore network formation of Nafion or other materials from lab-like conditions. There would no longer be the restrictions of simulating inside bulk material with uniform water concentrations. The likely candidate for these changes is high order discontinuous Galerkin methods with global domain decomposition. Although not exhaustive, this list gives a view of the opportunities for numerical computation that exist with the Functionalized Cahn-Hilliard model. The methods and implementations described deliver significant results, and valuable knowledge will come from the continuation of this research. 150 APPENDICES 151 Appendix A: Convex Splitting Code in CUDA 1 2 3 4 // // spe6NL3Dcuda6 . cu // // This code was w r i t t e n by J a y l a n Jones t o approximate s o l u t i o n s to the 5 // F u n c t i o n a l i z e d Cahn−H i l l i a r d Equation u s i n g th e convex splitting 6 // scheme p a t t e r n e d a f t e r t he work o f Eyre . 7 // 8 9 #i n c l u d e 10 #i n c l u d e 11 #i n c l u d e 12 #i n c l u d e 13 #i n c l u d e < c u f f t . h> 14 #i n c l u d e