DEVELOPMENT OF A FAST AND ACCURATE TIME STEPPING SCHEME FOR
THE FUNCTIONALIZED CAHN-HILLIARD EQUATION AND APPLICATION TO A
GRAPHICS PROCESSING UNIT
By
Jaylan Stuart Jones

A DISSERTATION
Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of
Applied Mathematics and Physics - Doctor of Philosophy
2013

ABSTRACT
DEVELOPMENT OF A FAST AND ACCURATE TIME STEPPING
SCHEME FOR THE FUNCTIONALIZED CAHN-HILLIARD EQUATION
AND APPLICATION TO A GRAPHICS PROCESSING UNIT
By
Jaylan Stuart Jones
This dissertation explores and develops time-stepping schemes for computing solutions
to the Functionalized Cahn-Hilliard (FCH) model. It is important to ﬁnd a scheme that is
both fast enough to compute evolution to the long-time states and to give enough accuracy
to capture important geometric events. The FCH model is relatively new, and very little
work has been done to develop eﬃcient numerical schemes for its simulation, so much of this
work is based on the extensive work done on the Cahn-Hilliard (CH) model. For each of
the methods, the spatial approximation is computed with a Fourier spectral method. All of
the schemes are adapted to be computed on a graphics processing unit (GPU) which gives
signiﬁcant improvements in the speed of the simulation.
First, an implicit-explicit (IMEX) method will be introduced that is based on a convex
splitting of the right hand side of the equation. This splitting guarantees that the solutions
will decay in energy for any size time step, which gives numerical stability for very large time
steps. With this splitting, a novel iterative method for solving the implicit portion greatly
improves the numerical eﬃciency.
Second is the development of a fully implicit method that attains high accuracy. The
method uses a conjugate gradient method to solve the Netwon’s method iterations, and is
preconditioned using a physics based approximation to the operator that is easy to invert
and numerically eﬃcient.

Lastly, exponential time diﬀerencing (ETD) methods are derived for the Cahn-Hilliard
and Functionalized Cahn-Hilliard Equations. The ETD methods are all explicit which aﬀords
computation speed, and higher order versions are natural extensions giving accurate time
stepping.
Finally, numerical experiments for the three types of methods compare the accuracy
and speed. These simulations are performed for both ﬁxed time-step simulations as well
as adaptive time steps. This gives a clear picture of the strengths and weaknesses, and
it gives enough information to determine which time-stepping method will work best for
approximating solutions to the FCH equation.

TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

vii

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Chapter 1 Introduction . . . . . . . . . . . . . . . .
1.1 Order Parameters and Phase Field Models . . . .
1.2 The Cahn-Hilliard Model . . . . . . . . . . . . . .
1.3 Sharp Interfaces and the Canham-Helfrich Model
1.4 The Functionalized Cahn-Hilliard Model . . . . .
1.5 Proposed Experimental Nanomorphologies . . . .
1.6 Functionalized Cahn-Hilliard Equation . . . . . .
1.7 Numerical Challenges for the CH and FCH . . . .
1.8 Numerical Methods for the CH Equation . . . . .
1.8.1 Spatial Schemes . . . . . . . . . . . . . . .
1.8.2 Temporal Schemes . . . . . . . . . . . . .
1.9 Spectral Method . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

1
1
2
6
7
10
12
13
15
15
17
20

Chapter 2 Convex Splitting for the FCH Equation
2.1 Gradient Splitting Method . . . . . . . . . . . . . .
2.2 Gradient Splitting Example . . . . . . . . . . . . .
2.3 Splitting the FCH Equation . . . . . . . . . . . . .
2.4 Stability and Solvability . . . . . . . . . . . . . . .
2.5 Fixed-Point Iteration . . . . . . . . . . . . . . . . .
2.6 CPU version of code . . . . . . . . . . . . . . . . .
2.7 Adaptive Time-Step . . . . . . . . . . . . . . . . .
2.8 FCH on Graphics Processing Units . . . . . . . . .
2.9 GPU Speed-Up . . . . . . . . . . . . . . . . . . . .
2.10 Comparison of results to experimental data . . . . .
2.11 Conclusions . . . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.

22
22
24
26
28
32
36
37
39
42
43
44

Chapter 3 Fully Implicit Method . . . . . . . . .
3.1 Introduction . . . . . . . . . . . . . . . . . . . .
3.2 Models . . . . . . . . . . . . . . . . . . . . . . .
3.3 Basic numerical approach and results . . . . . .
3.3.1 Spectral discretization in space . . . . .
3.3.2 Adaptive, implicit discretization in time
3.3.3 Solution of the implicit system . . . . . .
3.3.4 Basic numerical results . . . . . . . . . .
3.4 Investigation of the preconditioned system . . .
3.4.1 Preliminaries and numerical results . . .

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

48
48
52
54
54
55
57
59
67
67

iv

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

3.4.2

Formal asymptotics . . . . . . . . . . . . . . . . . . . . . . . .
3.4.2.1 Allen Cahn . . . . . . . . . . . . . . . . . . . . . . .
3.4.2.2 Cahn-Hilliard . . . . . . . . . . . . . . . . . . . . . .
3.5 Performance on a variety of models . . . . . . . . . . . . . . . . . . .
3.5.1 2D Cahn-Hillard . . . . . . . . . . . . . . . . . . . . . . . . .
3.5.2 Sixth order model . . . . . . . . . . . . . . . . . . . . . . . . .
3.5.3 2D vector model . . . . . . . . . . . . . . . . . . . . . . . . .
3.6 GPU implementation . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.6.1 GPU Speedup . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.7 Higher order time stepping . . . . . . . . . . . . . . . . . . . . . . . .
3.7.1 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . .
3.8 A Pair of Fourth-Order Accurate Methods . . . . . . . . . . . . . . .
3.9 Investigation of splitting methods . . . . . . . . . . . . . . . . . . . .
3.9.1 Preliminaries and numerical results . . . . . . . . . . . . . . .
3.9.2 Asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.9.2.1 Accuracy in Allen-Cahn equations . . . . . . . . . .
3.9.2.2 Condition number of solver for Eyre’s method for
Hilliard problems . . . . . . . . . . . . . . . . . . . .
3.10 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Chapter 4 Exponential Integrator . . . . . . . . .
4.1 General Exponential Integrator . . . . . . . . .
4.2 Stability and Linearization of the CH Equation
4.3 Linearization of the FCH Equation . . . . . . .
4.4 Computing Exponential Terms . . . . . . . . . .
4.5 Runge-Kutta Methods . . . . . . . . . . . . . .
4.6 Taylor Methods . . . . . . . . . . . . . . . . . .
4.7 Convergence of ETD Methods . . . . . . . . . .
4.8 Stability Analysis . . . . . . . . . . . . . . . . .
4.8.1 Predictor/Corrector Analysis . . . . . .
4.9 Implementation of ETD-RK2 . . . . . . . . . .
4.9.1 Application to a GPU . . . . . . . . . .
4.9.1.1 GPU Speedup . . . . . . . . . .
4.9.2 Adaptive Time-Stepping . . . . . . . . .
4.10 Numerical Examples for FCH . . . . . . . . . .
4.10.1 Medusa Head in 2D . . . . . . . . . . . .
4.10.2 Punctured Hollow Spheres . . . . . . . .
4.11 Conclusions . . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

104
104
107
109
111
113
115
118
119
121
123
126
128
129
131
131
132
134

Chapter 5 Comparison of Methods for the FCH
5.1 Description of Compared Schemes . . . . . . . .
5.2 Complete Simulation . . . . . . . . . . . . . . .
5.3 Fixed Time-Step Simulation . . . . . . . . . . .
5.4 Time Step Size Restrictions . . . . . . . . . . .
5.4.1 Solution Freeze Out for Convex Splitting

Equation
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

.
.
.
.
.
.

137
137
138
140
141
141

v

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

. . . . 70
. . . . 70
. . . . 73
. . . . 76
. . . . 77
. . . . 77
. . . . 77
. . . . 82
. . . . 84
. . . . 87
. . . . 89
. . . . 89
. . . . 93
. . . . 93
. . . . 98
. . . . 98
Cahn. . . . 101
. . . . 102

5.5
5.6

5.4.2 Solution Freeze Out for Exponential Time Diﬀerencing
5.4.3 Implicit Time-step Size Restriction . . . . . . . . . . .
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . .

APPENDICES . . . . . . . . . . . . . . . . . .
Appendix A: Convex Splitting Code in CUDA
Appendix B: Fully Implicit Code in CUDA . .
Appendix C: ETD-RK2 Code in CUDA . . .
BIBLIOGRAPHY

.
.
.
.

. . . . . . . . . . . . .

vi

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

143
144
146
149

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

152
152
174
205

. . . . . . . . . . . . . . . . . 227

LIST OF TABLES

Table 3.1

Error estimates, Eδt , for ﬁxed time step computations of the 1D
Cahn-Hilliard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

61

Table 3.2

Error estimates, EN , for computations of the 1D Cahn-Hilliard . . .

61

Table 3.3

Performance of adaptive time stepping through a ripening event of
the 1D Cahn-Hilliard model . . . . . . . . . . . . . . . . . . . . . .

62

Dependence of the smallest eigenvalue of the preconditioned Jacobian
matrix for the Cahn Hilliard problem . . . . . . . . . . . . . . . . .

69

Performance of each adaptive time stepping method through a ripening event of the 1D Cahn-Hilliard model . . . . . . . . . . . . . . . .

94

Performance of adaptive time stepping through a ripening event of
the 1D Cahn-Hilliard model with Eyre’s splitting . . . . . . . . . . .

98

Table 3.4

Table 3.5

Table 3.6

Table 4.1

Comparison of linearization choices for the CH equation. . . . . . . 109

Table 5.1

Comparison of the convex splitting, fully implicit, and exponential
integrator methods for a full simulation . . . . . . . . . . . . . . . . 139

Table 5.2

Comparison of error and simulation time for ﬁxed time-step simulations of the FCH equation . . . . . . . . . . . . . . . . . . . . . . . 141

Table 5.3

Order of convergence at given time step sizes.

vii

. . . . . . . . . . . . 143

LIST OF FIGURES

Figure 1.1

Potential energy function . . . . . . . . . . . . . . . . . . . . . . . .

3

Figure 1.2

Tilted potential energy function . . . . . . . . . . . . . . . . . . . .

9

Figure 1.3

Proposed Naﬁon nanomorphologies based on experimental results

Figure 2.1

Energy trace of solution with unstable equilibrium initial condition.

39

Figure 2.2

An unstable steady state solution for the FCH equation . . . . . . .

40

Figure 2.3

Speedup of GPU over parallel CPU for a short FCH simulation computed using single precision (top) and double precision (bottom). .

45

Two dimensional FCH simulation results compared against images
from a diblock copolymer experiment . . . . . . . . . . . . . . . . .

46

Figure 2.5

Geometries that minimize the FCH energy . . . . . . . . . . . . . .

46

Figure 2.6

Results for a diblock copolymer pore showing pearling instability . .

47

Figure 3.1

Solution of the 1D Cahn-Hilliard Equation . . . . . . . . . . . . . .

60

Figure 3.2

The solution of the 1D Cahn-Hilliard equation during the ﬁnal stages
of the ripening process . . . . . . . . . . . . . . . . . . . . . . . . .

63

Figure 3.3

The energy history for the adaptive time approximation . . . . . . .

64

Figure 3.4

The time step history for the adaptive time approximation . . . . .

65

Figure 3.5

The PCG iteration count history for the adaptive time approximation 66

Figure 3.6

Eigenvalues of the preconditioned Jacobian matrix for the model situation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

68

The eigenfunction of the preconditioned Jacobian matrix with the
smallest eigenvalue for the model situation . . . . . . . . . . . . . .

69

2D Cahn-Hilliard example computation. . . . . . . . . . . . . . . . .

79

Figure 2.4

Figure 3.7

Figure 3.8

viii

.

11

Figure 3.9

2D sixth order model example computation. . . . . . . . . . . . . .

80

Figure 3.10

2D vector model example computation . . . . . . . . . . . . . . . .

81

Figure 3.11

Solution of the 3D Cahn-Hilliard computation used as a timing test
for the GPU implementation . . . . . . . . . . . . . . . . . . . . . .

85

Figure 3.12

Time step size for the 3D Cahn-Hilliard computation . . . . . . . .

85

Figure 3.13

PCG count per time step for the 3D Cahn-Hilliard computation . .

86

Figure 3.14

The time step and PCG iteration count history for the adaptive time
approximation of the 1D Cahn-Hilliard using BE-FE, BDF2-AB2 and
BDF3-AB3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

90

Figure 4.1

Timestep size reﬁnement study for ETD-RK methods . . . . . . . . 119

Figure 4.2

Timestep size reﬁnement study for ETD-Taylor methods

Figure 4.3

Timestep size reﬁnement study for the ETD-T2 method with diﬀerent
numbers of iterations . . . . . . . . . . . . . . . . . . . . . . . . . . 121

Figure 4.4

Timestep size reﬁnement study for the ETD-T3 method with diﬀerent
numbers of iterations . . . . . . . . . . . . . . . . . . . . . . . . . . 122

Figure 4.5

Comparison of the error for ETD-RK methods versus the ETD-Taylor
methods with one and twenty iterations. . . . . . . . . . . . . . . . 123

Figure 4.6

Stability regions for exponential time diﬀerencing methods up to third
order in time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

Figure 4.7

Stability regions for ETD-T2 and ETD-T3 methods with 100 iterations of the second step . . . . . . . . . . . . . . . . . . . . . . . . . 125

Figure 4.8

Contours giving limits on the region of stability for ETD-T2 and
ETD-T3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

Figure 4.9

Diagram showing how to implement a spectral method on a GPU
to avoid synchronization, minimize memory use, and fully utilize the
parallelism of the GPU. . . . . . . . . . . . . . . . . . . . . . . . . 127

ix

. . . . . . 120

Figure 4.10

Number of times faster the GPU computation is over an eight-core,
fully-parallelized, CPU computation. . . . . . . . . . . . . . . . . . 129

Figure 4.11

Time evolution of an unstable steady state solution to the FCH equation in two dimensions . . . . . . . . . . . . . . . . . . . . . . . . . 133

Figure 4.12

Time evolution of a punctured vesicle.

Figure 5.1

Final states of the complete simulation comparing the convex splitting
method to the other two methods . . . . . . . . . . . . . . . . . . . 139

Figure 5.2

Time evolution showing freeze out of solution with larger time step
sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

Figure 5.3

Freeze out of operators for the Cahn-Hilliard equation showing the
amount of evolution when doubling the time step size . . . . . . . . 145

Figure 5.4

The energy spectrum of solutions to the Cahn-Hilliard equation is
dominated by low wave numbers. . . . . . . . . . . . . . . . . . . . . 146

Figure 5.5

Freeze out of operators for the Functionalized Cahn-Hilliard equation
showing the amount of evolution when doubling the time step size . 147

x

. . . . . . . . . . . . . . . . 135

Chapter 1
Introduction

1.1

Order Parameters and Phase Field Models

In 1937, Lev Landau published two papers on the nature of phase transitions [1, 2], wherein
he used the idea of an order parameter to describe the nature of the material at diﬀerent
points in space. This parameter was a function of space and could be used, for example, to
understand the average direction of spin at each point in a magnetic material. The order
parameter was later used in the Ginzburg-Landau theory of superconductivity [3]. This
order parameter function was the foundation for phase ﬁeld models many years later.
The phase ﬁeld model was introduced in 1983 by George J. Fix to model ﬁrst order liquid
to solid phase transitions [4]. Later, J.S. Langer compared the model to similar models in
solidiﬁcation theory and described the value of such a model when describing the physics of
phase transitions [5]. Since then, phase ﬁeld models have been used to describe the physics
of many diﬀerent systems including grain growth and coarsening, microstructure evolution in
thin ﬁlms, surface-stress-induced pattern formation, crack propagation, crystal growth in the
presence of strain, multiphase ﬂuid ﬂow, stress and electromagnetic driven void migration,
tumor growth, vesicle dynamics, and multicomponent interdiﬀusion (see [6] and [7] for a
list of various references). In general, phase ﬁeld models have become a valuable tool in
analyzing and modeling systems where multiple phases of a material separate due to high
interaction energies.
1

Phase ﬁeld models are powerful in describing microstructural evolution because they
eliminate the need to track the evolving fronts that describe phase boundaries. In one
dimension, an interface-tracking model can easily describe the sharp interfaces and calculate
their motions, but when describing complex phase separations in two or three dimensions,
it is no longer practical to track the interface directly. Instead, a phase function, u (t, x) is
introduced into the model to delineate the amount of each speciﬁc phase that exists at that
place and time. The evolution of the phase function is governed by a set of equations which
balance diﬀusion against driving forces. The primary advantage of a phase function model is
that the boundaries between phases have ﬁnite thickness and are therefore easier to analyze
mathematically. Phase ﬁeld formulations can also simplify the numerical simulations of such
systems because the derivatives of the phase function remain ﬁnite across the interfaces.
Phase ﬁeld models are typically used to track the motion of interfaces, and the thermodynamic and kinetic coeﬃcients are chosen to match the coeﬃcients in a corresponding
sharp-interface model. The phase function evolves in time, and when a sharp-interface
representation is desired, the solution of the phase function can be projected into the sharpinterface model. This research will be working with a phase ﬁeld model similar in nature to
the well known Cahn-Hilliard model.

1.2

The Cahn-Hilliard Model

What is now commonly termed the Cahn-Hilliard model was originally developed by van
der Waals in 1893 [8, 9]. The model was essentially forgotten until 1958 when, without
knowledge of van der Waals’ work, John Cahn and John Hilliard rederived the model to
describe the separation of liquid metal alloys as they coarsened due to cooling [10]. The

2

0.5
0.4
0.3
0.2
0.1
0
−0.1
−0.2
−1.5

−1

−0.5

0

0.5

1

1.5

Figure 1.1 Potential energy function W (u). For interpretation of the references to color in
this and all other ﬁgures, the reader is referred to the electronic version of this dissertation.
model is expressed in the energy form

ECH (u) =

where

ε2
| u|2 + W (u) dx,
2
Ω

(1.1)

u is the density gradient of one of the phases and W (u) is a function that represents

the potential energy of the mixed states of the phases. A symmetric, double-well potential
is typically used for W (u), which deﬁnes local energy minima for pure states with u = ±1
and an endothermic energy for the mixed states (Figure 1.1).
chemical potential [11]
µ=

δECH
.
δu
3

In 1989, Pego deﬁned the

We calculate the variational derivative

δECH
δu

using a formula similar to the Euler-Lagrange

equation. If
F [ρ (r)] =

f (r, ρ (r) , ρ (r)) dr

then
µ=

∂f
δF [ρ]
=
−
δρ
∂ρ

·

∂f
.
∂ ( ρ)

(1.2)

Applying formula (1.2) to (1.1) and substituting into Pego’s equation for the chemical potential we obtain
µ = −ε2 ∆u + W (u) ,
so that the diﬀusion equation becomes ut = ∆µ = ∆ −ε2 ∆u + W (u) from Fick’s second
law of diﬀusion [12]. When written this way, it appears in the form of a conservation law
ut =

· j (x) where j (x) =

µ. The conserved quantity is the total mass of each of

the phases in the system. This makes the evolution of the Cahn-Hilliard model a mass
conserving H −1 gradient ﬂow for the energy (1.1). The resulting diﬀerential equation is
commonly called the Cahn-Hilliard equation.

ut = −∆ ε2 ∆u − W (u) .

(1.3)

Due to the importance of the H −1 gradient ﬂow and inner product to this work, it is
useful to recall some deﬁnitions. First, Lp is the Lebesgue function space deﬁned such that a
function f is in Lp if

|f |p dx < ∞ on the domain of interest (which could be unbounded).

Further H s , for non-negative integers s, is the Sobolev space of functions that contains all
functions f such that f is in L2 and all of its ﬁrst s weak derivatives are in L2 . H −1 is not
a Sobolev space, nor is it even a space of functions, but rather the space of distributions (an
4

extension of functions) that acts as a dual space to H 1 .
We introduce the operator ∆−1 : u ∈ L2 | Ω udx = 0 → H 2 (Ω) denoted

f := ∆−1 u

with the property that
∆f, v L2 = u, v L2
for any v ∈ L2 (Ω). This mapping requires boundary conditions to make it unique (e.g. ∆−1 :
2
L2 (Ω) + u|∂Ω = 0 → Hperiodic (Ω)). Further, we recall the following equivalent formulations

of the H −1 inner product when u, v ∈ L2 (Ω) satisfy periodic boundary conditions,

u, v H −1 = ∆−1 u, v 2 = u, ∆−1 v 2
L
L

(1.4)

The global existence of solutions to (1.3) was shown by Elliot and Songmu in 1986
[13]. Solutions to the Cahn-Hilliard equation undergo a rapid spinodal decomposition into
domains of the two pure phases (u = ±1) separated by a boundary layer with thickness of
O (ε). The evolution of the separated phase domains was studied ﬁrst by Pego, who showed
that the motion of the interfaces is a Mullins-Sekerka type ﬂow [11]. Later, asymptotic
analysis by Alikakos et al. derived the rigerous convergence of the Cahn-Hilliard equation
to the Mullins-Sekerka ﬂow [14]. Further analysis was performed by Modica and Sternberg
showing that solutions of the Cahn-Hilliard equation minimize the area of the interface
surfaces [15, 16]. Work by de Mottoni and Schatzman, showed that the motion of the
developed phase interfaces is driven by the mean curvature of the surface [17].

5

1.3

Sharp Interfaces and the Canham-Helfrich Model

The limit ε → 0 corresponds to sharp-interface boundaries between phase domains. A
common assumption is that the potential energy stored in such interfaces depends exclusively
on the area of the surface and curvature of the interface. For a surface in three dimensions
and a point, p, on that surface, consider all the curves on the surface passing through p.
1
Each curve will have a curvature at p deﬁned by κ = R , where R is the radius of the circle

deﬁning that curvature. Take the minimal and maximal curvatures over that set of curves
to obtain κ1 and κ2 respectively. We can then deﬁne the mean curvature and Gaussian
curvature as
H=

1
(κ + κ2 )
2 1

K = κ1 κ2 .

The Canham-Helfrich free energy is a generic model for the bending of thin elastic ﬁlms
which truncates the dependence upon curvature at quadratic order. Canham derived the
model when studying the biconcave shape of human red blood cells [18], and separately,
Helfrich developed it to describe the minimum energy geometries of lipid bilayers and vesicles
[19]. In three space dimensions, the Canham-Helfrich energy is

EC (Γ) =

Γ

a1 + a2 (H − a3 )2 + a4 K dS,

(1.5)

where a1 is a constant that denotes the energy density per unit of surface area, a2 and a4 are
the energy densities attributed to the respective curvatures, and a3 speciﬁes the intrinsic, or
zero-energy, value of the mean curvature.
Due to the fact that phase separation is dominated by interfacial energies, the CanhamHelfrich model must be taken into consideration. Du et al. show that the Canham-Helfrich

6

model is suﬃcient to generate many important geometrical structures [20], and the CanhamHelfrich energy is considered to be generic [21]. However, the model brings with it some
challenges. As a sharp interface model, Canham-Helfrich models do not have the ability to
account for topological changes, nor can it predict the dependence of its parameters on the
interfacial structure.
To address some of these issues, Gurtin and Jabbour proposed a diﬀuse interface model
that accounts for the smoothing of sharp corners found in grain boundaries of crystalline
materials by limiting the energy dependence on curvature to a constitutive framework[22].

1.4

The Functionalized Cahn-Hilliard Model

The Functionalized Cahn-Hilliard (FCH) model was developed by Promislow to describe
nanostructure morphology changes in functionalized polymer chains that have been hydrated
with a polar solvent [23]. Hydrocarbon backbones of long polymers are hydrophobic, and
when they are made into membranes, they exclude any polar solvent such as water. To make
such a membrane useful in applications such as Polymer Electrolyte Membrane (PEM) fuel
cells, there must be a porous network of thin water channels to allow conduction of protons
while excluding electrons [24]. The immiscibility of the water and polymer can be reversed by
a functionalization process whereby acid terminated side-chains are added to the polymer
backbone. This functionalization embeds latent energy into the membrane that can be
released when water is introduced. This energy reduction occurs when the polymer and
water phases separate, and a polymer-solvent interface is formed. At the interface, the acid
groups can minimize energy by releasing their protons and solvating the bound negative ions.
Most phase ﬁeld models are obtained from a perturbation about a spatially uniform

7

density. The FCH model is fundamentally diﬀerent because it is obtained by perturbing a
model which stabilizes preferred geometries. The Functionalized Cahn-Hilliard energy is

E (u) =

2
1 2
ε ∆u − Wτ (u) − ε
Ω2

ε2 η1
| u|2 + η2 Wτ (u) dx,
2

(1.6)

where the ﬁrst term in the integral comes from the variational derivative of the Cahn-Hilliard
energy (1.1) which stabilizes bilayer, pore, and micelle structures. The second term gives
the perturbation which promotes the growth of interface and competition between these
geometries.
As with the Cahn-Hilliard energy, the function u takes values from −1 to 1 and deﬁnes
the volume fraction of the polymer and water phases (1.1). The parameter ε deﬁnes the
thickness of the boundary layer separating phase domains. η1 and η2 are O (1) or O (ε)
constant parameters that govern the nature of the energetic interations. η1 can be compared
to the electrostatic energy of solvating the side chains, and η2 describes pressure associated
with diﬀering mixtures of the two phases.
Wτ (u) is a double-well potential energy function as before, but it is no longer symmetric
due to the diﬀerence in self-energies of polymer and water (Figure 1.2). Typically Wτ (u) =
1
2

1
(u + 1)2 2 (u − 1)2 − τ (u − 2) , where the positive parameter τ deﬁnes the amount that
3

the well is tilted.

This asymmetry, along with η1 and η2 , gives rise to various geometries

that can each minimize the energy. This behavior is not observed in the CH model with a
tilted well because the addition of a linear tilt is eliminated by the H −1 gradient ﬂow. In the
FCH energy, the slope of the introduced linear term remains hidden in the positive squared
term.
Solutions of the FCH equation evolve on time scales ranging from O ε2 through O ε−2

8

0.5
0.4
0.3
0.2
0.1
0
−0.1
−0.2
−1.5

−1

−0.5

0

0.5

1

1.5

Figure 1.2 Tilted potential energy function W (u)
[25]. At the fastest time-scales, O ε2 and O ε1 , initial data relaxes to form the proper
front proﬁles between domains. When time is O (1), domains form according to a nonlinear
diﬀusion equation which has stable equilibria at the zeros of W (u), speciﬁcally u (x) = −1.
At longer O ε−1 time-scales, geometric structures evolve with a quenched mean curvature
ﬂow for the normal velocity of the domain fronts. Finally, for the longest time evolution,
O ε−2 , domains that are separated in space compete for the minor phase, u = +1. The
aim of this work is to accurately capture all of these disparate time-scales.
Work by Gompper and Schick used small angle x-ray scattering data (SAXS) to motivate
an energy model similar to the FCH model that describes amphiphilic systems such as
two immiscible ﬂuids mixed with a surfactant [26, 27]. In their study, they show a strong

9

connection between the Ginzburg-Landau energy model, the Canham-Helfrich surface energy,
and experimental data. This connection can be combined with asymptotic analysis by Gavish
et al. that shows convergence of the FCH energy to the Canham-Helfrich energy in the limit
as ε → 0 [28]. Taken together, the analysis and experimental results suggest that the FCH
model can be useful in probing the physically relevant geometries created by functionalized
polymer/solvent systems.

1.5

Proposed Experimental Nanomorphologies

Recently Gavish et al. described the nature of geometric minimizers to the FCH model
[28]. In three dimensions, energy minimizing geometries include cylinders, inverted micelles,
and bilayers. Experimental and numerical studies of a functionalized polymer, Naﬁon, have
yielded a multitude of predicted geometric structures, some of which are shown in Figure
1.3.

Naﬁon is a membrane made from sulfonated tetraﬂuoroethylene that was discovered

in the 1960s by Walther Grot [33]. It is composed of long, hydrophobic, polymer chains
that have been functionalized by attaching short, sulfonate-tipped, side-chains. Hydrolysis
of these sulfonate groups can release latent energy, so small domains of water form inside
the membrane to lower the free energy. Any continuum model for Naﬁon must be inherently
binary due to the rarity of water embedded in the membrane. Water and low hydration ions
form a common phase, so the value of u (x) represents the density of ions in water at each
point in the domain.
One of the ﬁrst characterizations of Naﬁon was performed by Hsu and Gierke who predicted cluster chain pores [29]. A study by Ioselevich et al. attempted to combine experimental data from many experiments, and they believe that the Naﬁon backbones group to

10

Cylindrical pore morphology predicted by Rubatat et al. [31]

Cluster chain morphology from work by Hsu
and Gierke [29]

Naﬁon backbone groupings suggested by Ioselevich [30]

Morphology ﬁt to SAXS data by SchmidtRohr and Chen [32]

Figure 1.3 Proposed Naﬁon nanomorphologies based on experimental results
provide a higher number of solvated acid groups per unit length [30]. Rubatat et al. used
small angle x-ray scattering (SAXS) and small angle neutron scattering (SANS) to probe the
structure of Naﬁon, and they concluded that the water domains were cylindrical in shape [31].
Schmidt-Rohr and Chen predicted parallel cylindrical pores and then numerically adjusted
the two dimensional cross-section until it matched SAXS data [32]. From essentially the
same experimental data, many diﬀerent geometries have been predicted. Through simulations on the Functional Cahn-Hilliard equation, we intend to investigate the range of possible
nanomorphologies that minimize the FCH energy and provide tools to better describe the
balance of curvature and surface energies.

11

1.6

Functionalized Cahn-Hilliard Equation

Before running simulations on the FCH model, we must convert it into a diﬀerential equation.
The FCH energy (1.6) can be formally derived from the CH energy (1.1) by assigning a
negative value to the interfacial energy via the Cahn-Hilliard energy, ECH , then balancing it
against the square of its own variational derivative. Thus the FCH energy can be written as

E (u) =

1
Ω2

δECH 2
dx − ηECH ,
δu

(1.7)

where we have simpliﬁed to the case η = εη1 = εη2 . Physically this describes electrostatic
energy competition. The square of

δECH
δu

describes phase separation that lowers energy by

separating the polar solvent and non-polar polymer. Opposed to this is the negative CahnHilliard term, −ηECH , which lowers the energy by solvating the ionic polymer side-chains.
Again, we introduce the chemical potential using formula (1.2) and note that the variational
derivative of (1.6) can be written in terms of

µ=

δE
=
δu

δECH
δu

δ 2 ECH
−η
δu2

to obtain

δECH
δu

= − ε2 ∆ − W (u) − η

ε2 ∆u − W (u) .

(1.8)

If we reintroduce the diﬀerent values for η1 and η2 we obtain

µ = − ε2 ∆ − W (u) − εη1

ε2 ∆u − W (u) − ε (η1 − η2 ) W (u) .

12

(1.9)

As before, we take the H −1 gradient ﬂow on the chemical potential. This gives the working
form of the FCH equation,

ut = −∆µ = ∆

ε2 ∆ − W (u) − εη1

ε2 ∆u − W (u) − ε (η1 − η2 ) W (u) .

(1.10)

A similar derivation has been shown by Gavish et al. for the Allen-Cahn like version of the
equation [28]. That version of the model does not conserve mass, so it must be explicitly
accounted for with a zero-mass projection at each step. This is accomplished by subtracting
oﬀ the average change in solution from every point in the domain.
With the FCH equation we need to specify initial and boundary conditions. The initial
condition is simply u (x, 0) = u0 (x), where u0 is a function that identiﬁes the composition
of the material at each point in space and varies between −1 (pure polymer) and 1 (pure
solvent) over the entire domain, Ω. The boundary conditions can take many forms depending
on the set up of the experiment, but for now we use periodic boundary conditions since we
are simulating a small domain inside bulk material of the same composition.

1.7

Numerical Challenges for the CH and FCH

The Cahn-Hilliard and Functionalized Cahn-Hilliard equations both have some signiﬁcant
numerical challenges when simulating solutions. When the CH equation (1.3) is written out
2
1
with the standard double well potential, W (u) = 4 u2 − 1 , we have the equation:

ut = −∆ ε2 ∆u − u3 + u .

13

This equation is notoriously diﬃcult to approximate numerically due to the small time
step size required to maintain stability of the solution. The behavior is often referred to as
stiﬀness in the numerical approximation, and for this equation, it is due to both the harmonic
operators and the nonlinear operator [34].
As an example, if the forward Euler method is used and ε = 0.01, the time steps must
be on the order of 10−7 for the method to be stable. What makes the problem worse is
that the fastest evolution in the solution is O ε2 ∼ 10−4 and Ostwald ripening (growth
of large phase domains at the expense of smaller domains) is on a much slower O (1) time
scale [35]. This means that an explicit method requires far too many time steps to get any
reasonable results from a simulation. On the other hand, if backward Euler is used with the
hope of obtaining a larger time step, the nonlinearity causes implicit solvers to fail due to
large time steps moving the solution out of the basin of attraction (as in Newton’s method).
The implicit time step restriction is not quite as dramatic as the restriction for an explicit
method, but it is still unreasonably ineﬃcient.
When attempting to simulate solutions to the FCH equation (1.10), the set of numerical
challenges includes the diﬃculties from the CH equation, but it also has other challenges as
well. The equation includes a third laplacian operator and the tilted potential well increases
the complexity of the polynomial terms (Figure 1.2), but the biggest diﬀerence is not numerical but physical. In the CH model, the complexity of the solution geometry decreases
in time because a decrease in phase interactions leads to a decrease in energy. In the FCH
model the solution can also decrease energy by increasing the amount interface between
phases, so solutions to the FCH equation can become more complex as time evolves. These
extra challenges require a researcher to think carefully about the types of methods to use
when working on the FCH equation, but methods that have worked for the CH equation are
14

the best place to start looking for useful ideas.

1.8

Numerical Methods for the CH Equation

Since the FCH model is new, numerical methods to simulate solutions to it have not yet been
developed. Development of a time-stepping method for the FCH equation is the purpose of
this research. Due to the strong connections between the CH and FCH models, a review of
methods for solving the Cahn-Hilliard equation is necessary. A literature search returns far
too many papers to discuss here, so this review will only touch on a few papers from each
of the most relevant numerical methods. I divide the review into two parts: spatial schemes
and time-stepping schemes. Any good numerical method for simulating the CH or FCH
equation will require both a spatial and temporal component, and the connections between
the two could be very important to the performance of the method.

1.8.1

Spatial Schemes

The most common numerical method for solving the Cahn-Hilliard equation (1.3) has been
the method of ﬁnite elements. Elliot and French were active in this area of research in the
late 1980s. In their research, they developed several diﬀerent ﬁnite element methods, each
with slightly diﬀerent properties. In the ﬁrst paper, they present a method for solving the
CH equation in one dimension and discuss the severity of the time step restriction due to
stability [36]. Two years later with Milner, they propose a method that is second order
in space and only requires the elements to be continuous [37]. In a second paper in 1989,
Elliot and French present a non-conforming ﬁnite element approach and give proofs for the
accuracy and convergence of the method [38].

15

In 1991, Du and Nicolaides presented a semi-discrete ﬁnite element method that has
the advantage of a Lyapunov function [39]. The existence of a Lyapunov function for their
method makes it possible to prove convergence of the approximate solutions without any conditions beyond the existence and uniqueness of solutions to the original diﬀerential equation.
Later, Barret et al. used a ﬁnite element method to model solutions to the CH equation with
non-constant surface diﬀusion [40]. In the last decade, Feng and Prohl developed and analyzed a mixed ﬁnite element scheme, and they proved some error bounds that were O ε−1 ,
rather than O exp 1
ε

which was the previous result [41].

A common thread through several of the methods, including the ﬁnite element schemes,
was the use of multigrid techniques. In 2006, Kay and Welford gave a multigrid ﬁnite element
scheme with mesh independent convergence rates [42]. It is interesting to note that this paper
also discusses the diﬃculty of time step restrictions. An earlier paper by Kim et al. uses a
conservative multigrid method to solve the CH equation. The solution is then coupled to a
projection method to solve ﬂuid ﬂow with the Navier-Stokes equation for a two ﬂuid system
[43].
In the last few years, several papers have been published by Wise and his colleagues using
non-linear multigrid ﬁnite diﬀerence schemes on the Phase Field Crystal model [7, 44, 45].
The Phase Field Crystal model is an extension of the Cahn-Hilliard model that has strong
anisotropy so that it can eﬀectively model crystal growth and formation. Their schemes
are unconditionally stable with respect to time step size because they use a convex splitting
technique developed by Eyre [46, 34]. The technique will be used for the method developed in
Chapter 2, and it will be discussed extensively there. Previous to the Wise papers, Furihata
published a ﬁnite diﬀerence scheme that was second order in time and space but suﬀered
from the severe time step conditions on stability [47].
16

Spectral methods have also played an important role in simulating solutions to the CH
equation. In 1999, Zhu et al. applied a semi-implicit spectral method to the CH equation to
study the long time behavior in a two dimensional domain [48]. In their scheme, they treated
the principle elliptic operator implicitly and all the other terms explicitly. Later, Vollmayr1

Lee and Rutenburg used a spectral method to identify an O t 3

time step size which

can control accuracy for a certain class of convex splitting schemes [49]. They came to the
conclusion based on Eyre’s Theorem [46] and von Neumann analysis. In 2005, Ye and Cheng
published a paper discussing the inheritance of energy dissipation and mass conservation in
spectral methods [50]. Recently, Shen and Yang published the results of error estimates for
several methods used to simulate the CH equation, and in their conclusion, they recommend
spectral methods because of their eﬀectiveness in capturing the interface fronts in the solution
[51].
Lastly, discontinuous Galerkin methods have been applied to the CH equation with some
success. In 2006, Wells et al. discussed a discontinuous Galerkin scheme and compared the
results and convergence to a standard ﬁnite element method [52]. A year later, Xia et al.
presented a local discontinuous Galerkin scheme that comes with a proof of energy stability
[53]. The paper includes numerical accuracy results and also some results on a ternary
system which could be of interest to our group in future research.

1.8.2

Temporal Schemes

Chapters 2, 3, and 4 will discuss in depth three temporal schemes for the Functionalized
Cahn-Hilliard equation: convex splitting, fully implicit, and exponential time diﬀerencing.
Before delving into the work of this dissertation, I will review these three temporal methods
for the Cahn-Hilliard equation and other partial diﬀerential equations.
17

In 1998, Eyre proposed a gradient splitting method for stabilizing the numerical computation of equations that have the form ut = F (u) = ∆ δE . Eyre applied this method to
δu
the Cahn-Hilliard equation (1.3) to take advantage of the convexity of the associated CahnHilliard energy (1.1). By appealing to convexity, the implicit portion of the discretized
equation discussed in Section 1.7 will have a unique solution. This small adjustment in
the method completely removes the time step restriction. The method is discussed in two
papers, but only one of them was actually published [34, 46].
Several other authors have used convex splitting of the CH energy or similar models to
improve performance of their numerical simulations. These methods are valuable because
the convex splittings derived guarantee a unique solution at each time step and provide for
methods that decay in energy at each step. In 2009, Wise, Wang, and Lowengrub presented
an energy stable scheme for the phase ﬁeld crystal equation [54]. Following this in 2010, Wise
used the same scheme to approximate solutions to the Cahn-Hilliard-Hele-Shaw system of
equations in a nonlinear multigrid framework [45]. Later, Gomez and Hughes developed a
convex splitting based on the fourth derivative of W (u) that attains second order convergence
in time [55].
A method that does not fall into the convex splitting category but retains gradient
stability is the implicit-explicit stabilized scheme. Stabilization comes from adding and
subtracting a term that stabilizes inversion of the linear portion of the operator. Papers in
2010 by Shen and Yang develop a ﬁrst order version of the method [56] and review several
other energy stable methods [51]. A later paper in 2012 by Shen, Wang, and Wang extends
their analysis to a gradient stable scheme that is second order accurate in time.
Before the development of gradient stable schemes, simulation of the CH equation was
primarily performed with implicit schemes since explicit computations suﬀer from severe
18

time step restrictions. An early paper on computing the CH equation with implicit time
stepping in one dimension comes from Elliot and French [36]. Further work with Milner
provided a second order in space calculation [37], and French alone published a paper on
an implicit scheme for the CH equation in two dimensions [57]. Lastly, a paper by Du and
Nicolaides established a scheme with better stability properties [39].
More recently in 2008, Kronbichler and Kreiss published a paper on two phase ﬂows where
they used implicit time stepping for the Cahn-Hilliard portion [58]. In 2011, Willoughby
completed his doctoral dissertation with Brian Wetton on implicit time stepping methods
for the Allen-Cahn and Cahn-Hilliard equations [59].
The third method I implement in this dissertation is the exponential time diﬀerencing
(ETD) method. ETD methods for ordinary diﬀerential equations originated as early as
1960 in work by Certaine [60] and Pope [61]. In the nineteen-sixties and seventies, A great
many papers followed that reﬁned the idea, but it eventually fell out of favor because it
was infeasible on the current computing hardware. An extensive review of ETD and other
similar methods was published by Minchev and Wright [62].
With recent advances in computing hardware and parallelism, ETD methods have ﬁnally
become eﬀective for partial diﬀerential equations. In 1994, Hou, Lowengrub, and Shelley
used an exponential linearization to remove the stiﬀness from the computation of interfacial
ﬂows with surface tension [63]. This lead to a rash of papers being published within the
last ﬁfteen years on the topic of ETD methods for PDEs. In 1998, Beylkin, Keiser, and
Vozovoi develloped discretization schemes for nonlinear PDEs [64]. Four years later, Cox
and Matthews expanded the schemes to include higher order Runge-Kutta methods [65]. In
2004 and 2005, Du and Zhu published a pair of papers analyzing the stability of ETD-RungeKutta (ETD-RK) methods and proposing a complex integration for stabilizing higher order
19

terms [66, 67]. Kassam completed his doctoral dissertation on high order time-stepping for
semilinear PDEs and published a paper with Trefethen [68, 69]. In 2005, Hochbruck and
Ostermann published a pair of papers on ETD-RK methods for parabolic problems [70, 71].
They later published a review paper on “the construction, analysis, implementation and
application of exponential integrators” [72], where they focus on the two dominant types
of stiﬀness; equations where the Jacobian has eigenvalues with large negative real parts, or
highly oscillatory problems with large, purely-imaginary eigenvalues. Tokman has published
a series of papers focused on eﬃcient computation of the exponential coeﬃcients that arise
in ETD schemes, particularly how to compute the coeﬃcients for large scale computing [73,
74, 75]. Most recently, Koskela and Ostermann published a paper on extending exponential
time diﬀerencing to higher orders using Taylor expansion [76].

1.9

Spectral Method

To eﬀectively study temporal schemes we employ a Fourier spectral method in space. We
consider a standard Fourier pseudo-spectral discretization of (1.3). The function u is approximated by the vector U deﬁned on a discrete grid of N points (xj = jh, j = 1, . . . N )
equally spaced with spacing h = 2π/N . The discrete Fourier transform of U is denoted by
ˆ
U:
1
ˆ
Un =
N

N

e−i(j−1)(n−1)/N Uj
j=1

which we can write in vector form as

ˆ
U = FU

20

where F is the matrix representation of the discrete Fourier transform. In the pseudo-spectral
setting, U can also be considered to approximate u in the sense
N

ˆ
Un (t)ei2παn x

u(x, t) ≈
n=1

where αn are the appropriately aliased n values. We then use the usual spectral approximations for derivatives on grid points

uxx ≈ F −1 Λα F U := ∆h U

2
where Λα is the diagonal matrix with entries −αn . The inverse discrete Fourier matrix

F −1 = N F ∗ . Note that the matrix ∆h can be formed explicitly [77] but we only need the
property that it is possible to multiply by it eﬃciently using the FFT.
With these approximations, we can write a method of lines (MOL) discretization (a
semi-discretization in space but keeping time continuous) of (1.3):

dU
= − 2 ∆h ∆h U + ∆h W (U).
dt

(1.11)

where by W (U) we mean the vector with entries W (Uj ).
Throughout this document we will focus on the approximation to the time-stepping and
treat the analysis in a semi-discretized form. All of the computer codes are discretized in
space using this Fourier spectral decomposition, and the operators are built explicitly in the
frequency domain to obtain eﬃciency.

21

Chapter 2
Convex Splitting for the FCH
Equation

2.1

Gradient Splitting Method

We begin by giving a detailed description of Eyre’s convex splitting method. If an energy,
E (u), is strictly convex, then its H −1 gradient is contractive. By deﬁnition, a functional,
F (u), is weakly contractive in the H −1 Sobolev space if it satisﬁes

F (u) − F (v) , u − v H −1 ≤ 0,

for all u, v ∈ H 2 (Ω). Using integration by parts on the H −1 inner product with periodic
boundary conditions, this is equivalent to

(F (u) − F (v)) ,

(u − v) L2 ≤ 0,

(2.1)

Such a function gives energy decay in the approximation to diﬀerential equation, i.e.
U n+1 − U n
= F U n+1 ⇒ E U n+1 < E (U n ) ,
k

22

(2.2)

where U n+1 and U n are the numerical solutions and times tn+1 and tn respectively, and k
is the size of the time step. On the other hand, if the negated energy, −E (u), is strictly
convex, then F (u) is expansive, we have F (u) − F (v) , u − v H −1 ≥ 0, which gives
U n+1 − U n
= −F (U n ) ⇒ E U n+1 < E (U n ) .
k

(2.3)

Neither the Cahn-Hilliard energy (1.1) nor the Funcionalized Cahn-Hilliard energy (1.6) are
convex, however, they can be split into convex and concave terms. After the separation of
energy terms, we obtain the splitting of the gradient function into

F (u) = Fc (u) − Fe (u) .

(2.4)

Fc (u) and Fe (u) are the respective contractive and expansive parts of the function F (u).
With the gradient function split, Eyre proved that the scheme

U n+1 − U n = k Fc U n+1 − Fe (U n )

(2.5)

is consistent, gradient stable for any k > 0, and possesses a unique solution for each time
step.
The ability to use any size of time step is a massive improvement over the previous
restrictions, but the method is only ﬁrst order in time. We believe that we could improve
the method by applying a deferred correction step to the time updates [78]. The essence of
the method is that we take an explicit step on the expansive terms, and it comes with a large
amount of error, but the implicit step on the contractive terms makes up for the instability
of the explicit portion and keeps the solution decaying in the energy landscape.
23

2.2

Gradient Splitting Example

To better understand the gradient splitting method, take the Cahn-Hilliard equation in one
dimension,
du
= −ε2 uxxxx + u3
− uxx .
dt
xx
2

The ﬁrst terms is contractive since it come from a convex term in the energy, ε2 | u|2 . and
respectively. Using (2.1), we can show that it is contractive as follows,

5
− ε2 ∂x (uxxxx − vxxxx ) , ∂x (u − v) L2 = −ε2 ∂x (u − v) , ∂x (u − v) 2
L
2

3
3
3
= −ε2 ∂x (u − v) , ∂x (u − v) 2 = −ε2 ∂x (u − v) ≤ 0 (2.6)
L
2

1
The second term also comes from a convex term in the energy, namely 4 u4 , and can be

shown to be contractive in a similar fashion.

∂x

u3

xx

− v3

xx

, ∂x (u − v)

L2

= ∂x

u3 − v 3

xx

, ∂x (u − v)

L2

3
4
= ∂x u3 − v 3 , ∂x (u − v) 2 = − u3 − v 3 , ∂x (u − v) 2
L
L
4
4
= − (u − v) u2 + uv + v 2 , ∂x (u − v) 2 ≤ − u − v, ∂x (u − v) 2
L
L
2
2
= − ∂x (u − v) , ∂x (u − v)

2

2
= − ∂x (u − v) ≤ 0
2
L
2

24

The third term is the backward heat operator and is expansive since it comes from − 1 u2 .
2
We can show that it is expansive by again using (2.1),

3
− ∂x (uxx − vxx ) , ∂x (u − v) L2 = − ∂x (u − v) , ∂x (u − v) 2
L
2
2
= ∂x (u − v) , ∂x (u − v)

2

2
= ∂x (u − v) ≥ 0 (2.7)
2
2
L

Thus we can split the right hand side into (2.4), where the contractive and expansive
functions are

Fc (u) = −ε2 uxxxx + u3

xx

Fe (u) = uxx .

Using (2.5), the gradient splitting scheme is

2
2
U n+1 − U n = k∂x −ε2 ∂x U n+1 + U n+1

3

− Un

(2.8)

To see the nonlinear Newton solve needed, rewrite the equation as

2
2
U n+1 − k∂x −ε2 ∂x U n+1 + U n+1

3

2
= 1 − k∂x U n .

For each time step, this equation can be solved by calculating the right hand side from
the solution at the previous time step, then using Newton’s method to iteratively solve the
nonlinear implicit portion, and thereby obtaining the solution at the next time step. Due to
the contractive nature of the implicit terms, Newton’s method is now guaranteed to converge
for any time step size, k > 0, as discussed in Sections 1.7 and 2.1.
25

2.3

Splitting the FCH Equation

The gradient splitting method depends on the choice of mixing potential function W (u),
since a diﬀerent choice would give a diﬀerent collection of convex and concave terms. The
tilted, double-well potential used in this research is

W (u) =

1
(u + 1)2
2

1
τ
(u − 1)2 + (u − 2)
2
3

(2.9)

and is shown in Figure 1.2. The parameter τ governs the amount of tilting in the double-well
and has a signiﬁcant impact on the possible minimal energy geometries.
With the potential substituted into the FCH equation (1.10), we obtain the full PDE:

1
ut = −ε4 ∆3 u+ε2 ∆2 u3 + ε2 τ ∆2 u2 −ε2 (2 + εη1 ) ∆2 u+3ε2 ∆ u2 ∆u +ε2 τ ∆ (u∆u)
2
5
1 2
1
τ
− ∆ 3u5 + τ u4 +
τ − 4 − εη2 u3 − (6 + εη2 ) u2 + 1 + εη2 − τ 2 u (2.10)
2
2
2
2

Considering this equation with a ﬁrst order approximation of the time derivative, most of
the terms on the right hand side of the equation can be classiﬁed as contractive or expansive
gradients of their corresponding contribution in the energy functional (Equation 1.6). One serious exception is the term 2 ∆ ∆u3 + 3u2 ∆u , which is from the Euler-Lagragian variation
of the energy component Ω − 2 u3 ∆udΩ. This energy term is neither convex nor concave.
Note that 3u2 ∆u + ∆u3 = −6u | u|2 + 6

· u2 u . To guarantee gradient stability for

the scheme, we must adopt the semi-implicit terms −6un+1 | un |2 + 6 ·

un+1

2

un+1

while classifying the other terms as previously discussed. This gives the unconditionally

26

gradient stable nonlinear semi-implicit scheme
un+1 − un
=∆
k

4 + 2 η (un )3 − 2 2 + 4 η ∆un − 4 ∆2 un+1 − 6 2 un+1 | un |2
+6 2

·

un+1

2

un+1 − 3 un+1

5

+ 1 + 2 η un+1

.
(2.11)

Unfortunately, the mixed implicit-explicit terms make the numerical scheme more complicated and signiﬁcantly decrease its speed and eﬃciency. We therefore chose to treat the
diﬃcult term fully implicitly and include it in Fc (u). The additive splitting can then be completed term by term and substituted into Equation 2.5. This gives the nearly contractive
function

Fc (u) = −∆ ε2 ∆ ε2 ∆u − u3 − ε2 3u2 + τ u ∆u + 3u5
τ
1
− (6 + εη2 ) u2 + 1 + εη2 − τ 2 u , (2.12)
2
2

and the corresponding expansive function

1
Fe (u) = ∆ ε2 ∆ (2 + εη1 ) u − τ u2
2

5
+ τ u4 +
2

1 2
τ − 4 − εη2 u3 .
2

(2.13)

Even though we cannot prove that this splitting is guaranteed to give energy decay for every
time step size, every simulation we have performed with this method has demonstrated a
decrease in energy. This slight deviation from Eyre’s original method seems to have little
impact on the numerical stability of the scheme.

27

2.4

Stability and Solvability

First, we will prove that the semi-discretized scheme is indeed unconditionally gradient
stable. The proof is algebraic, relying simply on the integration by parts and Young’s
inequality. We will prove stability for the simpler version of 2.11 which is the L2 ﬂow on the
energy,
un+1 − un
= 4 + 2 η (un )3 − 2 2 + 4 η ∆un − 4 ∆2 un+1 − 6 2 un+1 | un |2
k
+6 2

·

un+1

2

un+1 − 3 un+1

5

+ 1 + 2 η un+1 .

(2.14)

Theorem 2.4.1. The semi-implicit scheme (2.14) is unconditionally gradient stable for any
time step size k, i.e. F (un+1 ) ≤ F (un )
Proof. Choose the test function φ = − un+1 − un , multiply it to both sides of (2.14),
1
and integrate over the domain. On the left side, we have − k un+1 − un

2

≤ 0 which

approximates the theoretical decay rate −k ut 2 . Here and after, we use || · || for standard
L2 norm in the inner product space. On the right side, we will recover the form F un+1 −
F (un ) + R with R ≥ 0. This is computed term by term beginning with the linear terms.
• − 4 ∆2 un+1 : we have
4

2

4 ∆un+1 , ∆

∆un+1

• − 1 + 2 η un+1 : we have
1 + 2η
2

2

un+1 − un , which equals

2
− ∆un 2 + ∆un+1 − ∆un
.

(2.15)

1 + 2 η un+1 , un+1 − un , which is

un+1

2

2
− un 2 + un+1 − un
.

28

(2.16)

• − 2 2 + 4 η ∆un : after integrating by parts, it reads

2 2 + 4η

un ,

un − un+1 ,

and therefore can be written as

2

+

1 4
η
2

un 2 −

5

• −3 un+1 : it reads 3 un+1

3 un+1

5

5

un+1

2

+

un − un+1

2

.

(2.17)

, un+1 − un . We shall show that

, un+1 − un

≥

1
2

1, un+1

6

− (un )6 .

(2.18)

Deﬁne a new function

f un+1 = 3 un+1

5

un+1 − un −

1
2

un+1

6

− (un )6 .

We show that f un+1 ≥ 0 by ﬁrst noting that f (un ) = 0. Further, f
15 un+1

4

un+1 =

un+1 − un which is non-negative for un+1 > un , non-positive for un+1 <

un . Thus (2.18) holds.
• 4 + 2 η (un )3 : a simple calculation gives a result similar to (2.18), i.e. (un )3 un − un+1 ≥
1
4

(un )4 − (un+1 )4 . Thus

4 + 2 η (un )3 , un − un+1 ≥

• −6un+1 | un |2 + 6

·

un+1

2

1+

1 2
η
4

1, (un )4 − un+1

4

.

un+1 : note that for the ﬁrst term we have

6un+1 | un |2 un+1 − un ≥ 3 un+1

29

2

| un |2 − 3 (un )2 | un |2 ,

(2.19)

therefore

1, 6un+1 | un |2 un+1 − un

≥

1, 3 un+1

2

| un |2 − 3 (un )2 | un |2 . (2.20)

The second term gives

6 un+1

2

, un+1

un+1 −

un

1, 3 un+1

≥

2

un+1

2

− 3 un+1

2

| un |2 .

(2.21)

Adding (2.21) to (2.20), we obtain

2

−6un+1 | un |2 + 6
2

·

1, 3 un+1

un+1
2

2

un+1

un+1 , − un+1 − un
2

≥

− 2 1, 3 (un )2 | un |2 .

(2.22)

From (2.15 - 2.22), we conclude that F un+1 ≤ F (un ), with F (u) deﬁned by (1.6).
Next, we will show that there is at most one solution to the nonlinear equation (2.14) for
any time step k, therefore, there shall be no restriction from the solvability of (2.14)
Theorem 2.4.2. The nonlinear equation (2.14) has only one solution in H 2 (Ω) for any
time step k.
Proof. Assuming un+1 = w, v are distinct solutions, by subtraction it follows that

w−v
= − 4 ∆2 (w − v) − 2 6w | un |2 − 6v | un |2 + 6 2
k
−6 2

· w2 w

· v 2 v − 3w5 + 1 + 2 η w − 3v 5 − 1 + 2 η v .

30

(2.23)

Multiplying both sides of (2.23) by w − v and integrating over the domain, the left hand side
of (2.23) becomes

2
1
k , (w − v)

≥ 0. It suﬃces to show that the right hand side of (2.23)

is non-positive, thus w = v in H 2 (Ω). The right hand side gives

− 4 ∆ (w − v) 2 − 6 2 | un |2 , (w − v)2 − 1 + 2 η
− 3 w5 − v 5 , (w − v) + 6 2

· (w)2

(w) − 6 2

· (v)2

w−v 2

(v) , w − v .

It suﬃces to show that

g (w, v) = − 4 ∆ (w − v) 2 − 3 w5 − v 5 , w − v
+ 62

· w2 w − 6 2

· v 2 v, w − v ≤ 0.

(2.24)

2

It is easy to check that 3 w5 − v 5 (w − v) ≥ w3 − v 3 , therefore

2
g (w, v) ≤ − 4 ∆ (w − v) 2 − w3 − v 3 + 6 2

· w2 w − 6 2

· v 2 v, w − v . (2.25)

Integration by parts gives

62

· w2 w − 6 2

· v 2 v, w − v = − 2 6w2 w − 6v 2 v,

(w − v)

= 2 w3 − v 3 , 2 ∆ (w − v) .

Using Young’s inequality, it follows that

62

· w2 w − 6 2

2
· v 2 v, w − v ≤ 4 ∆ (w − v) 2 + w3 − v 3 .

31

(2.26)

Adding (2.26) to (2.25), we have g (w, v) ≤ 0, therefore w − v 2 = 0, i.e., w = v.
So far, after assuming existence and regularity of the solution to (2.14), we have shown
that the semi-discrete scheme (2.14) is unconditionally gradient stable and admits at most
one solution. We would like to comment that
2
Remark 1. When the ﬁnite element method with solution base Hh (Ω) is used, the above

analysis holds with suitable boundary conditions. However, Theorem 2.4.1 and 2.4.2 may
not be true for other spatial discretization. The motivation of designing such a scheme is
due to the fact that one of the major diﬃculties for the numerical simulation of (1.10) is an
accurate time stepping strategy.
Remark 2. The generalization of this proof to the conservative H −1 ﬂow (2.11) in the
semi-discrete framework is straightforward by using the test function φ = ∆−1 un+1 − un .

2.5

Fixed-Point Iteration

To solve the implicit portion of equation (2.5), we apply Newton’s method

G (U r ) U r+1 − U r = −G (U r ) ,

(2.27)

where r is the iteration number, and we deﬁne

G U n+1 = U n+1 − U n − kFc U n+1 + kFe (U n ) .

32

(2.28)

We desire to iteratively solve for U r such that G (U r ) <

with

a desired tolerance. First

we calculate the Fr´chet derivative
e

G (U r ) v = v − kFc (U r ) v,

(2.29)

with peturbation v and Fc deﬁned as

Fc (U r ) v = −∆ ε2 ∆ ε2 ∆v − 3 (U r )2 v − ε2 (6U r v + τ v) ∆U r − ε2 3 (U r )2 + τ U r ∆v
1
+15 (U r )4 v − τ (6 + εη2 ) U r v + 1 + εη2 − τ 2 v . (2.30)
2

The Fr´chet derivative is necessary in this use of Newton’s method because G (u) is a funce
tional on H 6 (Ω). The Fr´chet derivative of a function f (u) is deﬁned as the linear operator
e
A (u) that satisﬁes the following limit,

f (u + v) − f (u) − A (u) v
= 0.
v
v →0
lim

It can be easily calculated by computing f (u + v) − f (u) and then keeping only the terms
that are linear in v. This is precisely how we obtained (2.30) from (2.12).
It is important to note in equation (2.29) that G (U r ) is a complicated function of U r .
Computing this operator for every step of the iterative solve is computationally infeasible.
Instead, we approximate G with the operator

1
˜
G = 1 + k∆ ε4 ∆2 − c2 ε2 ∆ + c1 + 1 + εη2 − τ 2
2

(2.31)

which is independent of the phase-ﬁeld solution U r , and be computed entirely spectrally
33

which is a very quick calculation. c1 and c2 are O (1) empirical constants that approximate
the maximum values of the terms depending on U r . With this approximation, the new
ﬁxed-point method loses the second order convergence of Newton’s method, but the total
computational cost is much lower for each iterative process.
We do not yet have a proof that this splitting applied to the FCH equation (1.10) is a
contraction mapping, so we will present a proof that is applied to the Cahn-Hilliard equation
(1.3) [79]. The proof is in a ﬁnite element framework, even though our method uses a Fourier
method. The approximate Newton iteration for the CH equation is

1 + kε2 ∆2 − kC∆ ur+1 = k∆ (ur )3 + un − k∆un .

(2.32)

We will prove that (2.32) is a contraction mapping for small enough k. We ﬁrst change
(2.32) into a system of equations,


 U − kC∆U − k∆P = v

,

(2.33)


 P + ε2 ∆U = u3 − v
where v = un . For the partition

kh

1
in the ﬁnite element space σh = Hh (Ω), we look for a

pair (U, P ) ∈ σh such that


 U, φ + kC

 P, ψ − ε2

U, φ + k

P,

φ = v, φ
(2.34)

U, ψ =

u3

− v, ψ

holds for any φ, ψ ∈ σh . We deﬁne a mapping Πv : σh → σh for any v ∈ σh , so that
U = Πv (u).
Lemma 2.5.1. For given v with v 2 = α, there exists a constant β > 0 such that Πv
34

maps the ball S = {u ∈ σh : u 2 ≤ 2α} into itself if k ≤ βε2 hδ . hδ is the mesh size related
parameter.
Proof. Letting φ = ε2 U and ψ = kP in (2.34), we have

 2
 ε U 2 + kCε2 U 2 + kε2 P, U = ε2 v, U
2
2

 k P 2 − kε2 U, P = k u3 − v, P
2

.

(2.35)

Summing the two equations gives

ε2 U 2 + kCε2
2

U 2 + k P 2 = ε2 v, U + k u3 − v, P ,
2
2

(2.36)

and applying the identity a2 − ab = 1 a2 − 1 b2 + 1 (a − b)2 leaves us with the inequality
2
2
2

U 2 + 2kCε2
2

2k
2k
U 2 + 2 P 2 ≤ v 2 + 2 u3 − v, P .
2
2
2
ε
ε

(2.37)

Using Young’s inequality, it follows that

U 2 + 2kCε2
2

k
U 2 ≤ v 2 + 2 u3 − v 2 .
2
2
2
ε

(2.38)

˜
By inverse estimates in the ﬁnite element space, there exists C > 0 independent of hδ and
1

˜
˜ −
α such that u 2 ≤ Cαhδ 2 for u 2 ≤ 2α, where hδ only depends on mesh size. C only
∞
depends on the degree of the underlying polynomial, so we can choose C =

35

1
˜ −2
Cαhδ

2

or

more practically u 2 so that
∞

U 2 + 2kCε2
2

2kC 2
2k
U 2≤ v 2+ 2 u 2+ 2 v 2
2
2
2
2
ε
ε
8kC 2 α2 2kα2
≤ α2 +
+ 2 .
ε2
ε

Thus it suﬃces to require that 8kC 2 + 2k ≤ 3ε2 or equivalently k ≤

3ε2
8C 2 +2

(2.39)

≤ βε2 hδ .

Similarly, we can show that Πv is a contractive mapping if k ≤ βε2 hδ for some β > 0,
but the proof is nearly identical so we will not repeat it here.

2.6

CPU version of code

We found solutions to the Functionalized Cahn-Hilliard equation by applying a spectral
method to (2.5) with Fc and Fe given by (2.12) and (2.13) respectively. We wrote the
computer code in C++, but any number of languages could have been used. There are two
primary reasons we chose a spectral method; thin interfaces between phases and simplicity
of taking numerical derivatives.
First, spectral methods oﬀer the highest possible spatial resolution. Due to the physics
of the materials modeled, solutions to the FCH equation develop O (1) changes over O (ε)
intervals in space, so high spatial resolution is critical. We ﬁnd that we can obtain reasonable
solutions when we have as few as three grid points over the length of ε. For a cubic domain
with 128 grid points along each edge, this resolution allows us to eﬀectively capture up to
forty-two changes in phase from polymer to water or visa versa, which is far more than we
need in any reasonable simulation.
Second, the properties of a spectral method simplify derivatives in higher dimensions. The

36

number of Laplacians needed to calculate the solution at each time step becomes diﬃcult
with other methods, but in the frequency domain, each Laplacian is a simple elementwise multiplication of the Laplacian operator array against the target array. The spectral
method brings with it some other challenges. First and foremost is the computational cost
of transforming large arrays back and forth between the frequency and spatial domains.
Fortunately, the FFTW package provides the tools to do Fourier transforms quickly and can
easily be adapted to parallel computing on multiple CPUs.
Another restriction that comes with spectral methods is limitations on the domain shape
and boundary conditions. Domain shape is limited to rectangular prisms since grid points
must be spaced regularly in every direction. Further, the number of grid points in every
direction should be a power of two to use the full speed of FFTW. When computing solutions spectrally, periodic boundary conditions come automatically, but any other boundary
conditions become diﬃcult or impossible to compute. Fortunately for us, periodic boundary
conditions are reasonable for our simulations, but in future work, we may need to apply
Dirichlet, Neumann, or mixed boundary conditions. At that time it will probably be necessary to switch methods.

2.7

Adaptive Time-Step

In addition to the advantages of a spectral method, we can speed up the simulation by
adapting the time step to the evolution of the solution. The balance between Cahn-Hilliard
like coarsening and surface interface generation causes periods of rapid change in the solution interspersed between slow evolution. Due to the unconditionally stable nature of the
numerical scheme, we are free to choose a time step as large as we would like at the cost of

37

numerical error. When choosing a time step size, we estimate the evolution with a completely
explicit calculation based on the previous time step size. The residual is

Rn = k [Fc (U n ) − Fe (U n )] ≈ U n+1 − U n .

(2.40)

Then we determine the new time step

knew =

ct
,
max (Rn )

(2.41)

x∈Ω

where ct is a constant that allows us to control the time step sizes globally and is typically
a value near 1. In future work, we intend to replace this crude estimate with a control on
the error rather than the change in the solution.
A simulation that captures the importance of adaptive time stepping for the FCH is
that of a sphere with an O ε2 perturbation. A sphere that has too large of a radius is an
unstable steady state of the equation, so with a small perturbation the solution has very
little change for a long time period. Eventually the evolution moves the solution away from
the unstable steady state and collapses into a lower energy geometric solution. A graph of
the energy that shows the necessity of adaptive time stepping is in Figure 2.1.
It is easy to see the value of a time step that adapts to the solution. For evolution
before T = 600, large time steps can be taken because there is very little change, but as the
solution begins to change rapidly very small time steps must be taken in order to preserve
the integrity of the approximate solution. Figure 2.2 gives still shots of the solution at
important time intervals. Comparing to the energy, we see that the solution changes very
little between T = 1 and T = 600 and it would be a severe waste of computing time if we

38

Figure 2.1 Energy trace of solution with unstable equilibrium initial condition.
used uniform time steps that were small enough to capture the changes at T = 630 and
later. After T = 700 the evolution of the solution slows, and it is reasonable to take larger
time steps again.

2.8

FCH on Graphics Processing Units

A recent development in scientiﬁc computing is General-Purpose computing on Graphics
Processing Units (GPGPU). In the 1990s, Graphics Processing Units (GPUs) were developed
to accelerate the building of 2D and 3D images for output to a display. Since then, the
stream processing capabilities of GPUs have been turned to scientiﬁc computing, and the
39

T =1

T = 600

T = 630

T = 640

T = 650

T = 10000

Figure 2.2 An unstable steady state solution for the FCH equation
application to scientiﬁc computing has lead to commercial lines of GPUs that have hundreds
of computation cores but no display output hardware. GPU computation can be much faster
if a problem or computation lends itself well to many lightweight computation threads. We
used Nvidia GPU cards and the CUDA programming language which is C++ with extra
functions to control the device (GPU) from the host (CPU).
When simulating the FCH equation spectrally with the gradient splitting detailed in section 2.3, there are three main types of mathematical operations; Fourier transforms (FFT),
inverse Fourier transforms (IFFT), and multiply/add calculations. All three of these operations can be adapted very well to GPU processing. Three dimensional Fourier transforms
can be computed extremely fast on a GPU by breaking the domain into lines of grid points,

40

then handing each line of grid points to a separate processor for calculation of the one dimensional FFT. After each processor is done computing the frequency representation of the
data, the GPU can recompile the data back together. Inverse Fourier transforms can be
easily computed for the same reason. The cuFFT package has already been developed to
implement FFT calculations on the GPU, and it’s easy to implement because the syntax
and structure matches almost exactly with the FFTW library for C++.
Once the data can be quickly converted into either the spatial or frequency domain, the
entire equation is simply element-wise multiplication or addition in the appropriate domain.
This plays directly into the strength of the GPU, millions of lightweight computation threads.
There are however limitations to GPU computing that cannot be overlooked. Unlike a CPU
that has access to any memory location, GPUs multiprocessors can only access the data
that is stored in the GPU card’s memory. This leads to two complications; limitations on
array size and movement of data. The GPU card we used had only 4GB of memory on the
card, so for the number of double-precision, complex-valued arrays we used in the simulation,
the size of the domain was limited to 223 total grid points. For a cubic domain, this limit
corresponds to 128 grid points in each dimension. It is important to realize that this is a
rigid memory limit because it is deﬁned by the GPU card’s hardware.
The second complication is moving data to the memory on the GPU. The movement
of data on and oﬀ the card is the only way to access the solutions to the PDE, and data
movement is much slower than computation speed. Moving arrays onto the card for every
calculation would erase all of the speed-up obtained from using the hundreds of processors.
Our answer to this limitation was to put the initial condition array on the card at the
beginning and not take the array oﬀ the card except when the simulation ended or we
needed to record the data. Without this adjustment when porting the code from C++ to
41

CUDA, the change would have been pointless.
The ﬁnal limitation that needs mentioning is the limit on double precision computations.
There are some GPUs that are not capable of performing double precision calculations, but
on the GPU we used, there was one double precision core on each multiprocessor, as opposed
to eight single precision cores. This made the simulation much slower when we used double
precision, but recent advances in GPU technology have rectiﬁed this. The newest GPUs
for scientiﬁc computation have only double precision cores, which gives them up to sixteen
times more double precision capability [80].

2.9

GPU Speed-Up

We conducted a speed test to compare simulation code written for parallel CPUs against
the code written for the GPU. The CPU code was written in C++ and fully parallelized
using OpenMP. The parallel version of FFTW was used for the Fourier transforms. The
code was run on two Intel E5530 processors with four cores per processor. The processors
have a clock speed of 2.4 GHz. The GPU used was a Tesla M1060 with 240 multiprocessor
cores with a clock speed of 1.3 GHz, and it had 4GB of global memory. The two hardware
conﬁgurations were both about two years old at the time of the test, so the speed result are
comparable. The test was performed on initial conditions of random initial data for time
up to T = 0.001, which required approximately 20 time steps. The solutions given by the
two codes were exact up to implicit iteration precision. The simulation length was relatively
short compared to our typical FCH simulations that compute solutions up to T = 10, 000 or
even T = 100, 000. This was to show the impact of slow transfer of data between the host
and GPU device.

42

Figure 2.3 gives the speed-up comparison for single and double precision calculations.
For both the single and double precision experiments, the speed-up for a single time step
was much higher than the total time speed-up due to the slow transfer of initial data onto the
GPU card at the beginning of the simulation and transfer of the ﬁnal solution oﬀ the card
at the end of the simulation. As the simulation time grows longer, the total time speed-up
converges to the speed-up per time step value since the data transfer to and from the GPU
will become insigniﬁcant. The jagged nature of the single precision, per-time-step curve
is due to the error in measuring extremely small computation times. If the same speed-up
experiment was run on a state of the art GPU and parallel CPUs, the results are expected to
be similar, however the improvement in number of double precision cores per multiprocessor
on the Fermi line of GPUs leads us to expect results more like the ∼ 25x speed-up of the
single precision experiment [80].

2.10

Comparison of results to experimental data

A mathematical model is only as good as its ability to describe real materials. Unfortunately,
three dimensional images of Naﬁon are not available, but there are systems composed of
amphiphilic diblock copolymer systems that obey similar surface energy dominated physics.
Figure 2.4 shows a comparison between a previous 2D simulation on the FCH equation
[28], and images from an amphiphilic diblock copolymer system experiment [81].

Three

dimensional simulations also show pore network solutions that self-assemble from random
initial data. Figure 2.5 shows a three dimensional pore network along with two other solutions
that all had identical random initial data. The diﬀerence between the geometries is small
changes in the parameters η1 and η2 in equation (1.10).

43

Another geometry that has been seen extensively in experiments is the pearled pore.
Figure 2.6 shows an image from an experiment performed by Bendejacq et al. on a diblock
copolymer system [82].

Next to the experimental image is a simulation of pearling from

the FCH equation. The numerical simulation had initial conditions of a straight pore with
radially symmetric O (ε) noise. The periodic boundary conditions prevent the pore from
extending or contracting, and thereby force the solution into a frustrated quasi-steady state.
We have not yet been able to ﬁnd steady state pearling from random initial data in three
dimensions, but we hope to ﬁnd pearled pores and other interesting structures with a future
parameter scan.

2.11

Conclusions

We extended Eyre’s convex splitting scheme to the Functionalized Cahn-Hilliard equation
and obtained a numerical scheme. We proved that the scheme had only one solution and
that the energy could not increase for any size time-step. This shows that the method is
unconditionally gradient stable with respect to the size of the time-step taken. The right
hand side of the equation could not be split additively as desired, but by moving the mixed
term into the implicit portion, we were able to obtain an eﬀectively gradient stable method.
A novel iterative technique signiﬁcantly increased the calculation speed by eliminating
the need to rebuild the inverted operator for each iteration. The operator is built once at
the beginning of the computation saving computation time. Using this temporal scheme
with a Fourier spectral method for the spatial discretization made a scheme that could be
implemented on a graphics processing unit. The GPU code is extremely eﬃcient, and the
speed up allowed us to study large, three-dimensional domains over long evolution times.

44

35

Times Faster than CPU

30

25

20

15
Per timestep

10

Total time
5
16

17

18
19
20
21
22
Number of Points in Domain (Log 2P)

23

11
10

Times Faster than CPU

9
8
7
6
5
4
3
Per timestep

2

Total time

1
0
16

17

18
19
20
21
22
Number of Points in Domain (Log 2P)

23

Figure 2.3 Speedup of GPU over parallel CPU for a short FCH simulation computed using
single precision (top) and double precision (bottom).

45

Figure 2.4 Two dimensional FCH simulation results (left) compared against images from a
diblock copolymer experiment [81] (right)

Pore network

Lamella network

Inverted micelles

Figure 2.5 Geometries that minimize the FCH energy for slightly diﬀerent values of η1 and
η2 . The images show a level set near u = 0 with blue on the water side of the level set and
green on the polymer side

46

TEM experimental image

Experimental diagram
Figure 2.6 Experimental results for a diblock copolymer pore showing pearling instability
[82] (left) compared against result of a numerical solution of the FCH equation (right)

47

Chapter 3
Fully Implicit Method

3.1

Introduction

Many material science problems require an understanding of the microstructure that develops
in a mixture of two of more materials or phases over time as it phase separates during
a casting or annealing process. One such equation is the well-studied Cahn-Hilliard [10]
equation, written below in equation (3.2) in a one-dimensional (1D) setting, that describes
a binary alloy during annealing. The parameter

in the model describes the width of

the layers between the regions. Such regions form in O(1) time in a spinodal evolution.
Subsequently, they merge in a ripening process. Ripening happens on longer time scales,
generically O(eC/ ) for 1D Cahn-Hilliard [83] and O(1/ ) in higher dimensions [11]. We
extend the use of the terms “spinodal” and “ripening” to describe similar regimes in the
evolution described by other equations. Phase regions undergoing Cahn-Hilliard evolution
increase in size over time in a coarsening process. The statistics of this coarsening process
are of interest [84].
The Cahn-Hilliard model is a sub-class of phase ﬁeld models. A review of the use of such
models in material science applications can be found in [6]. It can be shown rigorously that
as → 0, solutions of Cahn-Hilliard equations have layers that tend to interfaces that move
with a nonlocal geometric motion known as the Mullins-Sekerka ﬂow [11]. Other phase ﬁeld
models also limit to geometric motion of other kinds. Understanding the limiting process
48

and studying it directly is of interest. In addition, Cahn-Hilliard equations and variants can
be used in computational approximation of moving interfaces in so-called diﬀuse interface
methods [85, 86] in which the problem for u is coupled to other variables describing other
physics. While the computational approach developed in this chapter might be useful to
some diﬀuse interface computations, we are motivated by a general class of pure (uncoupled
to other physics) energy gradient phase ﬁeld problems described below.
There are several interesting generalizations of the Cahn-Hilliard equation. A lower order
version (two instead of four spatial derivatives), the Allen-Cahn equation (3.1) [87] is also of
interest in materials science, describing the evolution of crystal grains of the same material
during annealing. This equation can also be called a Ginzberg-Landau equation.
The aim of this chapter is to develop a numerical approach that can be applied to a wide
range of phase ﬁeld problems that can easily be adapted to new terms, higher order problems,
and extension to vector solutions. It should be made clear we do not attempt to outperform
well-developed codes with space and time adaptivity with fast, multi-grid solvers that have
been developed for particular problems. Rather, we develop a reasonably fast time-adaptive
technique with general applicability.
Since many questions of interest in materials science are about the microstructure of a
bulk material far from boundaries, it is reasonable to consider problems in periodic domains.
We use a Fourier spectral discretization which is a natural choice in this setting. Although
this does rule out spatial adaptivity, it does admit a fast implementation on Graphical
Processing Units (GPU) in the computational framework we develop. We discretize in time
using Backward Diﬀerentiation Formula (BDF) methods [88] of low order, which have good
stability properties. Temporal error estimation is done with Adams-Bashforth (AB) [89]
predictors. Newton’s method is used to solve the resulting nonlinear problems. The Jacobian
49

matrix in the solve for the Newton update is symmetric since it is the second variational
derivative of an energy functional. It is also positive deﬁnite for time steps small enough (this
is discussed in more detail below). Although the Jacobian is dense for spectral discretizations,
multiplication by the matrix can be done quickly using the Fast Fourier Transform (FFT).
This motivates our use of the conjugate gradient method [90] to solve the Newton updates.
Such an approach used on high order problems requires an eﬃcient preconditioner. We use
a constant coeﬃcient version of the problem that is a linearization at pure phase states,
which will dominate the solution during ripening. This idea is similar to that used in [91] in
ﬁxed point iterations for time stepping for Cahn-Hilliard with operator splitting. Eﬃcient
performance is seen with our approach for a wide selection of scalar and vector problems
from second to sixth order. Mild increase is seen in preconditioned conjugate gradient (PCG)
iteration counts per time step as the time step is increased and

is decreased. Exploration

of the performance of the method speciﬁcally for the 1D Cahn-Hilliard problem, which has
a well understood structure during ripening, shows that the number of PCG iterations per
solve scales as O(

δt/ ) for large time steps δt and small , independent of the spatial

discretization.
There have been many contributions to the numerical solution of the Cahn-Hilliard and
related equations. Our work is novel in four ways: we exploit the symmetry of the Jacobian
matrix for the fully implicit time stepping problem in a CG method; we propose and analyze
the preconditioner for this Jacobian solve and show that it is eﬀective for a number of
problems; we implement the method in modern GPU architecture and get fast performance;
we demonstrate that the Jacobian matrix is not singular for large time steps during ripening
and that fully implicit time stepping leads to accurate solutions with these large time steps
with energy decrease.
50

We discuss further the issue of time-stepping. For arbitrary discrete initial data, the
fully implicit time discretization problem is known to have a unique solution only when the
time step is small enough [39]. As the spatial grid size h = ∆x is reduced and the initial
data on the reﬁned grid is again allowed to be arbitrary, the time step that guarantees
unique solutions to the implicit time discretization problem is reduced. In addition, it is
not possible in general to show that a fully implicit time step leads to a decrease in the
underlying energy. Guarantees of solvability for any time step δt and energy decrease are
possible for some models with an operator splitting approach due to Eyre [46]. Although
never published, this work has been very inﬂuential and some of the results are summarized
in [49]. Several of the computational approaches cited in Chapter 1 use variants of this
splitting approach. Some of the splitting techniques lead to nonlinear problems and we show
that our ideas lead to eﬃcient solution of these problems (with preconditioned CG iteration
counts independent of δt and ).
Guarantees of energy decay and unique solutions for any chosen time step δt are very
attractive and so splitting techniques have dominated the thinking in numerical methods
for these problems for many years. Our use of fully implicit time stepping may be seen
as controversial. We ﬁrst considered this approach because of some motivating high order,
vector models for which the process of splitting the gradient terms into convex and concave
parts was not straightforward. When implemented, we discovered that fully implicit time
stepping did not suﬀer from severe time step restrictions as the literature predicted. That
previous analysis was really for a “worst case” scenario and the solution structure through
ripening processes allowed for large time steps to be taken. We also discovered computationally that Eyre’s splitting can lead to disproportionately large temporal errors during ripening
and prohibits the use of large time steps appropriate to the dynamics there if time-accurate
51

solutions are required. This poor behavior can be understood with formal asymptotics on
a simple problem shown below. Our fully implicit time stepping is able to take appropriate
large time steps and maintain accuracy. While not the original intention of our work, our
results suggest that fully implicit time stepping strategies are more eﬃcient than popular
splitting techniques when time accuracy is desired, at least for the selection of models we
consider.
We outline the following sections of the paper. In Section 3.2 we present the equations
for the various models we consider and how they arise from an energy gradient ﬂow. In
Section 3.3 we give a basic description of the approach in a 1D setting using simple backward
Euler time stepping. This is done for clarity of exposition, but it should be made clear that
the approach has wide applicability, shown in Section 3.5 for a number of models in 2D and in
Section 3.6 where a GPU implementation to Cahn-Hilliard in 3D is described. Higher order
time stepping methods are discussed in Section 3.7. The performance of the preconditioner
is examined numerically and with formal asymptotics in a simple 1D setting in section 3.4. A
similar approach is taken in section 3.9 to investigate time stepping with operator splitting.

3.2

Models

We consider ﬁrst the very basic model for a scalar function u(x, t) in a 1D setting of the
Allen-Cahn [87] equation
ut = 2 uxx − W (u)

(3.1)

ut = − 2 uxxxx + (W (u))xx

(3.2)

and Cahn-Hilliard [10] equation

52

where W (u) = 1 (u2 − 1)2 . This is simply the one-dimensional version of 1.3. We consider
4
x ∈ [0, 2π] and u to be periodic in space.
Qualitatively, the reaction terms W (u) drive the solution to the two pure states u = ±1
and the other terms smooth the interface between regions of diﬀerent phases over a width
of O( ). These equations can be written with the

in diﬀerent places corresponding to

diﬀerent time scalings. It is important to note that the results discussed below on the
condition number of the preconditioned conjugate gradient method and its dependence on
time step δt and

are with respect to the scaling shown in equations (3.1) and (3.2) above.

Higher dimensional versions of these equations are obtained by replacing ∂ 2 /∂x2 by the
Laplacian ∆.
The models above are gradient ﬂows on the Cahn-Hilliard energy deﬁned in 1.1. The
fact that the evolution is a gradient ﬂow leads to a symmetric Jacobian matrix for the
implicit time step of the discretization, allowing the use of the conjugate gradient method
for its solution. Note that the Allen-Cahn model is a gradient ﬂow in the standard L2 inner
product:
2π

(u, v) :=

uvdx.

(3.3)

0

However, the Cahn-Hilliard model is a gradient ﬂow with respect to the H−1 inner product:

(u, v)H−1 := (u, ∆−1 v).

(3.4)

In addition, we consider the scalar sixth order problem 1.10 where η is a given positive
constant (note that with this sign, this term promotes the formation of phase interface).
This is also a gradient ﬂow of a certain energy in the H−1 inner product. Models of this
form are of current interest in the study of pore formation in functionalized polymers [92].
53

We also consider the following vector model for u = (u, v):

ut = − 2 ∆∆u + ∆ u W (u)

(3.5)

where here
3

|u − ui |2

W (u) =

(3.6)

i=1

and ui are the points in the (u, v) plane that correspond to the cube roots of unity. This is
a volume preserving model that forms symmetric triple junctions between three phases. It
can be seen as the higher order mass preserving version of the Ginzberg-Landau equation
presented in [93].
The extension of our computational method to higher dimensions and more complex
(higher order, vector) models is relatively straight-forward. We consider this the main
strength of our approach.

3.3
3.3.1

Basic numerical approach and results
Spectral discretization in space

In Section 1.9, we discretized the Cahn-Hilliard equation in space using a Fourier spectral
method. The sixth order and vector models are discretized in a similar manner. It is a direct
computation to show that the MOL discretizations above guarantee a decrease in a discrete
energy:


N
 2h

d
|F −1 Ωα F U| + h
W (Uj ) ≤ 0

dt  2
j=1

54

where in the left hand term |·| is the Euclidean norm and Ωα is a diagonal matrix with entries
iαn . The left hand term in braces is the discrete analogue of the energy (1.1). Since the
MOL discretization has this property, we can expect time discretizations to have the property
also, at least up to the order of the truncation error of the method. This is observed in the
variety of computational examples shown in the rest of this work. In fact, with the adaptive
time stepping strategy we use, an increase in this discrete energy is never observed in any
accepted time step including very large time steps used during ripening events. We continue
with a description of the time discretization of these models below.

3.3.2

Adaptive, implicit discretization in time

We consider the fully discrete approximation of u

m
u(jh, tm ) ≈ Uj , j = 1, . . . N and m = 0, . . . M

with t0 = 0 where initial conditions are given and time steps δtm = tm − tm−1 are chosen
adaptively as shown below. A basic, ﬁrst order approach is shown here. Higher order methods
are presented in section 3.7. They provide some eﬃciency gains but are not overwhelmingly
superior for modest accuracy.
Consider the semi-discrete Allen-Cahn equation (1.11). Starting at tm−1 we form the
explicit, forward Euler predictor U∗ for the solution at time tm :

U∗ = Um−1 + δtm

2 ∆ Um−1
h

− W (Um−1 ) .

(3.7)

We use this as an initial step in a Newton iteration for the solution to an implicit, backward

55

Euler time step

G(Um ) := Um − δtm

2 ∆ Um
h

+ W (Um ) − Um−1 = 0

(3.8)

We describe the solution procedure for this problem in the subsection below.
Expanding the predictor and corrector steps in Taylor series, it is easy to show that the
exact local truncation error ηe for this step can be approximated by

ηe ≈ η :=

1 ∗
U − Um
2

as is well known. We use maximum norms for all error calculations in this work. In timeadaptive computations below, we specify a given tolerance σ > 0 for the local truncation
error. If a time step fails the accuracy check (η > σ) then we fail the step and repeat with
the time step reduced by a factor of 1.3. This particular value is a somewhat arbitrary factor
but there were very few failed steps in the computations shown in this work and changing
this value does not change performance. A time step is also failed if the Newton iterations
do not converge or if the step leads to an increase in energy (although as noted above, this
failure was never observed in any of the computations described in this chapter). After a
successful step, the next step is taken with time step

δtm+1 = δtm max 0.8

σ
, 1.3
η

where the local truncation error is assumed to be dominated by its leading order quadratic
term and 0.8 and 1.3 are “safety factors”.

56

3.3.3

Solution of the implicit system

Consider (3.8), the implicit problem to be solved at each time step for the discretization of
the Allen-Cahn equation (3.1). Let U(r) denote the r’th Newton iterate approximation of
Um . The next iterate requires a solve with the Jacobian coeﬃcient matrix

J = I − δtm 2 ∆h + δtm Λ2

(3.9)

where Λ2 is the diagonal matrix with entries

(r)

(r)

W (Uj ) = 3[Uj ]2 − 1.

The matrix J is dense but symmetric and multiplication by J can be done eﬃciently using
the fast fourier transform and diagonal multiplication. This suggests the use of the conjugate
gradient method as a solution technique. However, the condition number of J is large and
the equivalent matrices for problems with higher order derivatives have even larger condition
number. Eﬃcient preconditioning is clearly required. We propose the preconditioner Q−1
where Q is the discretization of a constant coeﬃcient problem

Q = I − δtm 2 ∆h + 2δtm

(3.10)

which can be inverted eﬃciently. This is motivated heuristically by the observation that
during ripening, the solution will have values approximately ±1 at most grid points. The
behavior of the preconditioned gradient solver is examined in computational studies below
and analytically in section 3.4. For the discretization of the Allen-Cahn equation, it is shown

57

that the condition number is O(δt) for large δt independent of

during ripening.

For the discretization of the Cahn-Hilliard problem (3.2) we have the following Jacobian
matrix JCH and use the preconditioner QCH

JCH = I + δtm 2 ∆h ∆h − δtm ∆h Λu

(3.11)

QCH = I + δtm 2 ∆h ∆h − 2δtm ∆h .

(3.12)

Note that for this problem and the ones below, the matrices are symmetric in the discrete
version of the H−1 inner product (3.4). In the preconditioned conjugate gradient (PCG)
algorithm [90] these inner products are used. The preconditioned Jacobian matrix is shown
to have condition number O(δt/ ) in ripening states in this case.
For the discretization of the sixth order problem (1.10) we have

J6 = I − δtm 4 ∆h ∆h (∆h + ηI) + δtm 2 ∆h ∆h Λ2

(3.13)

+δtm 2 ∆h (ΛL Λ3 + Λ2 ∆h + ηΛ2 )
−δtm ∆h (Λ2 + Λ1 Λ3 )
2
Q6 = I − δtm 4 ∆h ∆h ∆h − δtm (η 4 − 4 2 )∆h ∆h

(3.14)

−δtm (4 − 2 2 η)∆h

where here Λi , i = 1, 2, 3 is the diagonal matrix with entries
di W (r)
(U )
dui j
and ΛL is the diagonal matrix with entries ∆h U (r) . Here, the preconditioner is derived with

58

the same heuristic reasoning as above, that at pure states u = ±1, Λ1 = 0, Λ2 = 2I, Λ3 = 6I
and ΛL = 0.
For the vector problem (3.5) we consider the solution components (Uj , Vj ) at a grid point
as a block and have

JV

= I + δtm 2 ∆h ∆h − δtm ∆h ΛV

QV

= I + δtm 2 ∆h ∆h − 18δtm ∆h .

where ΛV is a block diagonal matrix with 2 × 2 blocks




∂2W
∂uv
∂2W
∂v 2

∂2W
∂u2
∂2W
∂uv





where the vector potential (3.6) is considered above and the partial derivatives are evaluated
(r)

(r)

at the corresponding grid values (Uj , Vj ). A straight-forward calculation shows that for
values of u in any of the three symmetric potential wells, this block is diagonal with diagonal
entries 18. Hence, the preconditioner follows the same reasoning as above.

3.3.4

Basic numerical results

We consider the 1D Cahn-Hilliard equation as the model system in this section. Starting
from initial data
u0 (x) = cos(2x) +
and

1 cos(x+1/10)
e
100

(3.15)

= 0.18, the system moves to an intermediate state at t = 300 with two intervals with

u ≈ −1 as shown in Figure 3.1. This time is roughly where the slowest ripening evolution

59

Figure 3.1 This ﬁgure corresponds to the solution of the 1D Cahn-Hilliard (3.2) with initial
conditions (3.15) for = 0.18 at time t = 300. This is the type of solution at which the
preconditioner performance is examined in more detail in Section 3.4.
occurs, marked by the largest time steps taken by the method as shown in Figure 3.5. The
second term on the right above is a small perturbation so that these two intervals are not
symmetric, so that we will see generic behavior. At much longer times, these two intervals
will slowly evolve and merge [11, 83] as shown below. The ﬁxed time step δt performance of
the method is examined at a short time t = 0.2 in Tables 3.1 and 3.2 . First order convergence
in the time step and spectral convergence in N is seen as expected. Also observed is that
N = O(1/ ) is needed to spatially resolve the interfaces of width .
60

δt
Eδt
2e-4 1.32e-5
1e-4 6.6e-6
5e-5 3.3e-6
Table 3.1 Error estimates Eδt = Uδt − Uδt/2 for ﬁxed time step δt computations of the
1D Cahn-Hilliard model with = 0.18 to short time t = 0.2. Spatial discretization is ﬁxed
at N = 128.
N EN for = 0.18 EN
32
2.0e-3
64
9.3e-7
9.0e-13
128

for = 0.09
0.139
4.4e-3
1.3e-6

Table 3.2 Error estimates EN = UN − U2N for computations of the 1D Cahn-Hilliard
model with = 0.18 and = 0.09 to short time t = 0.2. Fixed time steps of δt = 1e − 4 are
used.
The performance of the adaptive time stepping method through a ripening event is shown
in Table 3.3. In these runs the Newton solve has residual tolerance 10−8 and the PCG solve
at each step has tolerance 10−9 . Starting from initial data (3.15) the solution contains four
transition layers at short time as shown in Figure 3.1. Over a very long time, the middle
transitions move closer together and merge as shown in Figure 3.2. The ﬁnal state with two
transition layers is steady. The energy, time step size and PCG iteration history for the are
shown in Figure 3.5 for the tolerance σ = 10−4 . One or two Newton iterations are taken
at each time step. Note the sharp transitions in the energy E at early times (the spinodal
evolution) and at the ripening event at which the time steps are small. The ripening time
estimates shown in Table 3.3 correspond to the time t at which the midpoint value u(π, t)
changes from positive to negative. This ripening event happens at a very fast time scale
after the long, slow transient. The results in Table 3.3 show that the method can accurately
capture the time that such events occur. Since local truncation error is O(δt2 ) it is expected
that as the local tolerance σ is reduced by a factor of 10, the number of total time steps

61

σ
time steps
1e-4
848
1e-5 2580 (3.04)
1e-6 8072 (3.13)
1e-7 25446 (3.15)

ripening time
8180
8273
8304
8314

total PCG iterations
19,105
39,942 (2.09)
87,563 (2.19)
227,799 (2.60)

Table 3.3 Performance of the adaptive time stepping through a ripening event of the 1D
Cahn-Hilliard model with = 0.18 and N = 128. The numbers in brackets are ratios to
quantities in the previous row.
should increase by a factor of

√

10 ≈ 3.16 as is observed computationally. This validates our

simple error estimation strategy. Note that the number of PCG steps increases by a smaller
factor, indicating that the condition number of the implicit system decreases as δt → 0.
Note also that the performance of the solver is independent of N (for ﬁxed ).
It is well known [94, 83] that ripening is exponentially slow in

in 1D Allen-Cahn and

Cahn-Hillard models. With error tolerance σ = 10−4 and N = 128 we compute ripening
at time approximately 34,200 using 948 time steps and 22,950 PCG iterations for

= 0.16

and ripening time 218,000 using 1081 time steps and 28,417 PCG iterations for

= 0.14.

This demonstrates that our approach behaves well when computing through ripening over
very long time scales. It is known that to resolve the dynamics for increasing small , higher
precision arithmetic is needed to resolve the exponentially small interactions of the layers in
this 1D setting [83].
The method performs similarly well on the lower order Allen-Cahn equation (3.1). Spinodal evolution leads to similar layers to those shown in Figure 3.1. In this case the equations
do not preserve mass and so ripening involves the separate collapse of the intervals of state
u ≈ −1 leading to a steady state of u ≡ 1.

62

Figure 3.2 This ﬁgure corresponds to the solution of the 1D Cahn-Hilliard equation (3.2)
with initial conditions (3.15) for = 0.18 during the ﬁnal stages of the ripening process at
large time.

63

Energy
0.13
0.12
0.11
0.1
0.09
0.08
0.07
0.06
0.05

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

t
Figure 3.3 The energy history for the adaptive time approximation with tolerance σ = 10−4
of the 1D Cahn-Hilliard equation (3.2) with initial conditions (3.15) for = 0.18.

64

Time step size k
180
160
140
120
100
80
60
40
20
0

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

t
Figure 3.4 The time step history for the adaptive time approximation with tolerance σ =
10−4 of the 1D Cahn-Hilliard equation (3.2) with initial conditions (3.15) for = 0.18.

65

Conjugate gradient iterations
40

35

30

25

20

15

10

5

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

t
Figure 3.5 The PCG iteration count history for the adaptive time approximation with
tolerance σ = 10−4 of the 1D Cahn-Hilliard equation (3.2) with initial conditions (3.15) for
= 0.18.

66

3.4
3.4.1

Investigation of the preconditioned system
Preliminaries and numerical results

At a ripening state u of the 1D Cahn-Hilliard model evolving from the initial data (3.15),
we examine the structure of the preconditioned Jacobian matrix

A = Q−1 JCH
CH

as presented above. We can consider the operator A in the continuum limit. Since JCH is
a low order perturbation of the elliptic operator QCH , the limit operator A is a relatively
compact perturbation of the identity (see [95], chapter 6). Thus we can expect the condition
number of A to be relatively insensitive to the spatial discretization level N for N large
enough to resolve the problem. In the discrete 1D setting, A and its eigenvalues can be
computed explicitly using built-in MATLAB commands. The eigenvalues for the

= 0.18,

δt = 10 at the solution shown in Figure 3.1 are shown in Figure 3.6. Note the clustering
of eigenvalues near 1 as expected from the preconditioning. At this slowly evolving state,
there are small eigenvalues as

→ 0 and δt → ∞ that determine the condition number of

A and limit the performance of the CG iterations. The eigenfunction corresponding to the
smallest eigenvalue for parameters

= 0.18 and δt = 10 is shown in Figure 3.7. Notice that

it is composed of pulses at the locations of the transition layers in the solution in Figure 3.1.
For this state u there are M = 4 transition layers. It is shown using formal asymptotics in
section 3.4.2 that it is expected that there will be M − 1 = 3 small eigenvalues as observed
computationally. We consider the dependence of the smallest eigenvalue on

and δt in

Table 3.4. Note that these results give evidence that the smallest eigenvalue scales like /δt,

67

Eigenvalues of the preconditioned jacobian matrix
1
0.8
0.6
0.4
0.2
0
−0.2

clustering of
eigenvalues at 1

3 small positive
eigenvalues

−0.4
−0.6
−0.8
−1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Figure 3.6 Eigenvalues of the preconditioned Jacobian matrix for the model situation described in Section 3.4 for parameters = 0.18 and δt = 10.
giving a condition number κ of A that scales like δt/ . This is conﬁrmed in the formal
asymptotics below.
A similar study with the preconditioned Jacobian for the Allen-Cahn equations reveals
M small eigenvalues of identical magnitude that are ndependent of

and have values ap-

proximately C/δt with C ≈ 0.41 for large δt. This behavior is conﬁrmed in the formal
asymptotics below.
Remark 3. Since the condition number κ scales linearly with δt and the number of CG
68

Eigenfunction for smallest eigenvalue for
preconditioned matrix
0.2
0.15
0.1
0.05
0
−0.05
−0.1
−0.15
−0.2

0

1

2

3

4

5

6

7

x
Figure 3.7 The eigenfunction of the preconditioned Jacobian matrix with the smallest eigenvalue for the model situation described in Section 3.4 for parameters = 0.18 and δt = 10.

δt↓ T →
25
50
100
200
400

= 0.18
300
3.36e-3
1.68e-3
8.43e-4
4.22e-4
2.11e-4

= 0.16
2000
3.14e-3
1.58e-3
7.89e-4
3.95e-4
1.97e-4

= 0.14
10000
2.83e-3
1.45e-3
7.26e-4
3.63e-4
1.82e-4

Table 3.4 Dependence of the smallest eigenvalue of the preconditioned Jacobian matrix for
the Cahn Hilliard problem on δt and . The time T at which eigenvalues are evaluated is
roughly where the slowest evolution occurs. The values are not that sensitive to the value
of T .

69

iterations to reach a ﬁxed solution accuracy increases as

√

κ [90], taking larger time steps in

ripening regimes will lead to higher numerical eﬃciency. Thus, we advocate taking as large
time steps δt as accuracy will allow. This is a further motivation for our consideration of
higher accurate time stepping techniques below.

3.4.2

Formal asymptotics

3.4.2.1

Allen Cahn

Consider ﬁrst the Allen-Cahn equation (3.1). In an inﬁnite spatial domain this problem has
a translationally invariant steady solution

u0 = tanh

x
√
2

.

In the ripening phase, solutions take the form [94]
M

u0 [(−1)j (x − xj )]

u≈

(3.16)

j=1

where xj (t) are transition layer positions. This is the structure seen in Figure 3.1 with
M = 4 transition layers (M must be even in our periodic setting). The approximation above
is valid up to exponentially small terms in . We denote such an approximation by =e in
what follows. Linearizing (3.1) leads to

vt = Lv := vxx − (3u2 − 1)v

70

(3.17)

where v is the linear disturbance to u(x, t). The spectrum of L at ripening states (3.16) is
well understood:
Theorem 3.4.1 (Carr and Pego 1989 [94]). L has M exponentially small (in ) eigenvalues
λj . The rest are negative and bounded away from zero. The eigenfunctions of the small
eigenvalues φj (x) are given by

φj =e

d
1
u0 (x − xj ) = √ sech2
dx
2

which in words are spikes of width

x − xj
√
2

(3.18)

centred at the interface location.

The eigenvalues λj govern the exponentially slow motion of the fronts xj (t). The theorem
stated in [94] allows for more general potentials W (u) and the results below can be extended
to these cases. Consider now the spectrum of the preconditioned Jacobian for the Allen-Cahn
problem at ripening states:
Aψ = σψ
where A = Q−1 J where Q and J are given by (3.9) and (3.10) respectively. This can be
rewritten as
(I − δtL)ψ = σ[I − δt(L − 3(u2 − 1))]ψ

(3.19)

where I is the identity operator. Recall that we are interested in the small eigenvalues σ
that determine the condition number of the PCG iterations for this problem and that there
is computational evidence that these σ are O(1/δt). Thus we make the ansatz σ = β/δt and

71

consider (3.19) formally in powers of δt:

O(δt) :
O(1) :

Lψ = 0

(3.20)

ψ = β(L − 3(u2 − 1))ψ.

(3.21)

The ﬁrst equation above forces ψ to be in

V = span{φj }

at highest order where the φj are the eigenfunctions corresponding to small eigenvalues of
Theorem 3.4.1. With ψ ∈ V, (3.25) is satisﬁed to exponentially small terms in . Now (3.21)
becomes
ψ = 3β(1 − u2 )ψ
We take ψ = φj ∈ V and take the L2 inner product of the equation above with φj leading
to
1
β=
3

φ2
j
φ2 (1 − u2 )
j

(3.22)

Considering the ripening form (3.16) and the local nature of φj (3.18), the value of β only
depends on the local layer structure up to exponentially small terms and we thus obtain M
copies of
β =e

1
3

sech4 x
≈ 0.4167
sech4 (1 − tanh2 x)

(3.23)

which agrees closely with the computational results in the previous section. Note that so far
(3.21) is satisﬁed only in the projection on V. Correction terms of order O(1/δt) in ψ in V ⊥
can be derived that give a complete picture of the formal analysis.

72

3.4.2.2

Cahn-Hilliard

The analysis of the small eigenvalues for the preconditioned Jacobian matrix for the CahnHilliard problem can be done similarly, with a few additional technical diﬃculties. Our
results here depend on a conjecture on the rank of a modiﬁed square distance matrix. Here
the relevant eigenvalue problem is

(I + δtDL)ψ = σ[I + δtD(L − 3(u2 − 1))]ψ

(3.24)

where D is the second derivative operator. We use D here rather than ∆ to be clear that
the analysis that follows is only applicable to the 1D case. The extra diﬃculty here arises
essentially because D is not invariant on V (the span of the eigenfunctions φj ) or V ⊥ . We
make the ansatz σ = β/δt as before and consider (3.24) formally in powers of δt:

O(δt) :
O(1) :

DLψ = 0

(3.25)

ψ = βD(L − 3(u2 − 1))ψ.

(3.26)

As before, (3.25) forces ψ ∈ V to highest order which we make explicit here
M

ψ=

cj φj .

(3.27)

j=1

More carefully said, Lψ must be a constant to satisfy (3.25) but this constant enters as a
O(1/δt) term. Turning to (3.26) we ﬁrst notice that ψ is a second derivative of a periodic

73

function and so must have average zero which gives the discrete condition
M

cj =e 0.

(3.28)

j=1

Let P be the projection onto the set of functions with average value zero:
2π
1
φ(x)dx.
2π 0

Pφ = φ −

Let φ∗ = P φj . Because of (3.28), (3.27) can be written equivalently as
j
M

cj φ ∗ .
j

ψ=
j=1

We can apply D−1 (taking the result to have zero average value) to (3.26), obtaining
M

M

cj D−1 φ∗
j

=

3βP (u2

− 1)

cj φj .
j=1

j=1

Taking the inner product with an arbitrary element of V with zero mass, then normalizing
with

φ2 = K/ (consider the form (3.18) to see that this gives an absolute constant K) we
j

obtain
− βAC

K

P BP c = βc

(3.29)

where c is the vector of M values cj , P is here the discrete projection onto the subspace
of vectors that sum to zero, βAC is the constant from the Allen-Cahn eigenvalue problem

74

(3.23), and B is the M × M symmetric matrix with entries

bij =

φ∗ D−1 φ∗ .
j
i

(3.30)

As above for the simpler Allen-Cahn case, (3.26) is so far only satisﬁed in the massless subspace of V. Full matching can be done with O(1/δt) terms in V ⊥ plus constants. Considering
(3.29), the expected behavior of the small preconditioned Jacobian eigenvalues of size /δt
will be observed as long as P BP is full rank M −1 and has O(1) eigenvalues. Consider (3.30)
in the limit as

→ 0. The integral

φj =e 2 remains ﬁxed in this limit and so φj is seen

to be an approximate delta function at xj with weight 2. Thus, D−1 φ∗ is approximately a
j
quadratic with second derivative −1/π except in a region of width

D−1 φ∗ (x) ≈ d −
j

around xj :

d2
4π 2
− 2π +
2π
3

where d is the distance between x and xj on the periodic interval [0, 2π]. The integral (3.30)
can then be approximated by
bij ≈ 2dij − d2 /π
ij

(3.31)

where dij is the distance between points xi and xj on the periodic interval and we have
adjusted the entries of B by a constant as allowed by the expression (3.29) to simplify the
expression. Note that the expression above is only accurate to polynomial terms in . We
have strong computational evidence that the matrix P BP with limiting entries (3.31) has
rank M − 1 whenever the layer positions xi are distinct. Thus, we conjecture
Conjecture 1. If {xi }, i = 1, . . . M are distinct points in [0, 1] and the entries bij of the

75

M × M matrix B are given by
bij = dij − d2
ij
where dij are the distances between points xi and xj , taken either as absolute values or on
the periodic interval, then the matrix P BP has M − 1 strictly negative eigenvalues. Here P
is projection onto the subspace orthogonal to constant vectors.
In the statement above we have scaled the interval to [0, 1] for convenience. It should be
noted that without the ﬁrst term in bij , linear in distance with scaling proportional to the
interval size, the square distance matrix P BP has rank at most 3 [96].

3.5

Performance on a variety of models

One of the main advantages of our approach is that it is easily extensible to higher order and
vector models. The Jacobian matrix and preconditioner are straight forward to generate for
these more complicated models. There is no need to determine how to split the potential
into convex and concave pieces, as is needed for the popular class of splitting methods based
on Eyre’s method [46] which we discuss in section 3.9 below. While we do not claim that
our approach is the most computationally eﬃcient methods for every (or any) problem, they
are reasonably eﬃcient and can be tried on any new problem with little eﬀort. Here, we
apply them to the Cahn-Hilliard problem (3.2), the sixth order model problem (1.10) and
the vector model problem (3.5), all in 2D. We conduct these numerical tests with ﬁrst order
time stepping for simplicity.
In the examples below, convergence in ripening times and the behavior of the number of
time steps and total PCG iterations with tolerance σ is observed as in the preliminary 1D
Cahn-Hilliard example shown in section 3.3.4 .
76

3.5.1

2D Cahn-Hillard

We begin with initial conditions

u0 (x, y) = 2esin x+sin y−2 + 2.2e− sin x−sin y−2 − 1

(3.32)

which after some initial ripening of u = 1 states (mass fraction 0.4556) leads to two circular
states, one of which captures the other in a long time frame. The results for the

= 0.08

model, using N = 128 and σ = 10−4 are shown in Figure 3.8. This simulation took 3,305
time steps and a total of 84,138 PCG iterations. The similar computation for

= 0.16

using N = 64 and σ = 10−4 past ripening took 1,471 time steps and a total of 30,051 PCG
iterations. The MATLAB code used for this example will be available on the publisher’s
web site for this article.

3.5.2

Sixth order model

We begin with the same initial conditions (3.32) shown in Figure 3.8 upper left. The results
for

= 0.18, η = 1 computed with N = 128 and σ = 10−4 are shown in Figure 3.9. The

inﬂuence of the interface promoting term is evident. For larger

the ﬁnal steady state is a

regular array. The ﬁnal pattern shown here is not at steady state. This simulation was done
in 2,229 time steps with a total of 234,582 PCG iterations.

3.5.3

2D vector model

We begin with initial conditions (3.32) for u and v0 (x, y) = sin y. After some initial ripening,
regions of the three states form, separated by interfaces which meet at triple junctions. Unlike
grain growth [93] this model preserves the area of each of the three phases. The results for
77

the = 0.32 (the

value compared to other models is larger due to the larger magnitude of

the reaction term for this model), using N = 128 and σ = 10−4 are shown in Figure 3.10.
This simulation took 2,229 time steps and 51,985 total CG iterations. We believe that the
ﬁnal result at t = 100 is an approximation of the steady state of the problem.

78

2D Cahn-Hilliard initial conditions u (x, 0)

2D Cahn-Hilliard solution u (x, 0.5)

2D Cahn-Hilliard solution u (x, 150)

2D Cahn-Hilliard solution u (x, 300)

Figure 3.8 2D Cahn-Hilliard example computation.
79

Sixth order model solution u (x, 2)

Sixth order model solution u (x, 25)

Sixth order model solution u (x, 500)

Sixth order model solution u (x, 2200)

Figure 3.9 2D sixth order model example computation.
80

2D vector model initial conditions u (x, 0)

2D vector model solution u (x, 0.5)

2D vector model solution u (x, 15)

2D vector model solution u (x, 100)

Figure 3.10 2D vector model example computation. Contours of cos(arg u + iv) are plotted. Two of the phases have value
cos(2π/3) = −1/2 (light blue in the plots) and are separated by dark blue lines.
81

3.6

GPU implementation

A recent development in scientiﬁc computing is General-Purpose computing on Graphics
Processing Units (GPGPU). In the 1990s, Graphics Processing Units (GPUs) were developed
to accelerate the building of 2D and 3D images for output to a display. Since then, the stream
processing capabilities of GPUs have been turned to scientiﬁc computing, and the application
to scientiﬁc computing has led to commercial lines of GPUs that have hundreds of double
precision computation cores with error checking. GPU computation can be much faster if
a problem or computation lends itself well to many lightweight computation threads. We
used Nvidia GPU cards and the CUDA programming language, which is C++ with extra
functions to control the device (GPU) from the host (CPU).
When simulating the Cahn-Hilliard equation implicitly with the spectral method detailed
in Section 3.3, there are three main types of mathematical operations: Fourier transforms,
element-wise multiplication or addition, and array reductions (the inner products in the PCG
method for example). The ﬁrst two of these operations adapt very well to GPU processing,
but array reductions pose some challenges.
Three-dimensional Fourier transforms can be computed extremely fast on a GPU by
breaking the domain into lines of grid points, then handing each line of grid points to a
separate multi-processor for calculation of the one dimensional FFT. After each processor
is done computing the frequency representation of the data, the GPU recompiles the data
into its three-dimensional representation. The widely available cuFFT package implements
FFT calculations on the GPU. With the solution available in both spatial and frequency
domains, multiplication by the Jacobian matrix and the preconditioner can be performed by
element-wise multiplication or addition in the appropriate domain. This plays directly into

82

the strength of the GPU which has millions of lightweight computation threads.
The third type of operation performed when computing solutions to the Cahn-Hilliard
equation is array reductions. Reductions must be used when computing the inner products as
well as maximum residual values for the conjugate gradient and Newton method iterations.
In a CPU calculation, array reductions are performed serially so they are simple to implement
and run quickly compared to other parts of the calculation. However, array reductions are
particularly troublesome for GPUs. Besides having hundreds of processors, GPUs are fast
because they hide latency by scheduling calculations in a ﬁrst come ﬁrst served manner.
Thus it is impossible to know before computation time in which order the 221 evaluations
will take place. The solution to this problem is to synchronize all threads after each set
of evaluations, even though setting such a thread block increases computation time. An
example of the proper way to approach array reduction on GPUs is given in the CUDA SDK
provided by Nvidia.
There are limitations to GPU computing that cannot be overlooked. Unlike a CPU that
has access to any memory location, GPUs multiprocessors can only access the data that is
stored in the GPU card’s memory. This leads to two complications: limitations on array size
and movement of data. The GPU card we used had 6GB of memory on the card, so for the
number of double-precision, complex-valued arrays we used in the simulation, the size of the
domain was limited to 222 total grid points. For a cubic domain, this limit corresponds to
128 grid points in each dimension. It is important to realize that this is a rigid memory limit
because it is deﬁned by the GPU card’s hardware. It is certainly possible to increase the
domain size by using multiple GPUs for a computation, but this would require signiﬁcant
eﬀort to reduce the data movement between domains that would limit the computational
speedup. The second complication is movement of the data into the memory on the GPU.
83

The movement of data on and oﬀ the card is the only way to access the solutions to the PDE,
and data movement is much slower than computation speed. Because of this, the number
of times the solution array is moved from the card must be kept at a minimum. This places
some limitations on the post-processing of solutions.

3.6.1

GPU Speedup

We conducted a speed test to compare simulation code written for parallel CPUs against the
code written for the GPU. The CPU code was written in C++ and aggressively parallelized
using OpenMP. The parallel version of FFTW was used for the Fourier transforms. The code
was run on two Quad-core Intel Xeon E5620 processors (eight total cores). The processors
have a clock speed of 2.4 GHz. The GPU used was a Tesla C2070 with 448 multiprocessor
cores with a clock speed of 1.15 GHz, and it had 6GB of global memory. The two hardware
conﬁgurations were about one year old at the time of the test, so the speed results are
comparable. The test was performed on initial conditions of 3D random initial data on
cubic grids with 128 grid points per dimension. Computing to time T = 40, the parallelized
C++ calculation took nine hours and seventeen minutes to complete compared to the Cuda
GPU code that took one hour and twenty-ﬁve minutes. The solutions given by the two codes
were identical. We realized a speedup factor of about 6.5 with the GPU implementation over
the eight core CPU calculation in this computation in which data transfer on and oﬀ the
GPU card was negligible. This speedup factor was conﬁrmed in timing tests of individual
time steps. The computed solutions are shown in Figure 3.11 and the time step and PCG
proﬁles are shown in Figure 3.12 and Figure 3.13, respectively. The sequential decreases in
time step size correspond to ripening events.

84

Figure 3.11 Solution at times T = 5, 10, and 40 of the 3D Cahn-Hilliard computation used
as a timing test for the GPU implementation. Shown are the zero level sets of the solutions,
with blue indicating increasing and green decreasing solutions at the interface.

0.08
0.07

Time step size (k)

0.06
0.05
0.04
0.03
0.02
0.01
0
0

5

10

15

20

25

30

35

40

Time t
Figure 3.12 Time step size δt for the 3D Cahn-Hilliard computation used as a timing test
for the GPU implementation.

85

PCG iterations per time step

30

25

20

15

10

5

0
0

5

10

15

20

25

30

35

40

Time t
Figure 3.13 PCG count per time step for the 3D Cahn-Hilliard computation used as a timing
test for the GPU implementation.

86

3.7

Higher order time stepping

We now consider second and third order adaptive time stepping methods. We use the
multistep methods Adams-Bashforth (AB) and Backward Diﬀerence Formula (BDF) as our
explicit predictor and implicit corrector respectively. With the same notation for the fully
discrete approximation of u given in section 3.3.2, the second and third order versions of AB
for the predicted U∗ solution at time tm are [89]:

δtm
6f (Um−1 ) − f (Um−2 ) ,
4
δtm
(AB3): U∗ = Um−1 +
23f (Um−1 ) − 16f (Um−2 ) + 5f (Um−3 ) ,
12

(AB2): U∗ = Um−1 +

where for example for the lower order Allen-Cahn equation,

f (u) = 2 ∆h u − W (u).

Again we use these as an initial state in a Newton iteration for the solution to the implicit
BDF time step. The second and third order versions of BDF lead to the nonlinear systems
G(Um ) for the solution at time tm given below [88]:

1
3 m
U − δtm f (U m ) − 2U m−1 + U m−2 = 0,
2
2
11 m
3
1
(BDF3): G(U m ) :=
U − δtm f (U m ) − 3U m−1 + U m−2 − U m−3 = 0.
16
2
3
(BDF2): G(U m ) :=

The solution procedure for these systems is the same as described in section 3.3.3. The
combination of AB predictor and BDF corrector has several advantages. BDF methods up
to order 6 have good stability properties for stiﬀ problems and have a high ratio of accuracy

87

to implicit solves [88]. Like forward and backward Euler methods, AB and BDF methods
of the same order p share the same form of dominant local truncation error, proportional to
the order p + 1 time derivative of u. Because of this, the exact local truncation error ηe for
the second and third order steps can be easily approximated by

ηe ≈ η := C||U ∗ − U m ||

(3.33)

where C is 4/9 and 2/5 for order 2 and 3 methods, respectively. These higher order methods
require p previous values for order p methods. Standard AB and BDF methods require
these previous values to be equally spaced in time. This is the major drawback of standard
multi-step methods. Initially and after every time step change, we compute the necessary
additional previous values using minimal stage L-stable Singly Diagonal Implicit Runge
Kutta (SDIRK) methods of second or third order [97].
In our time-adaptive computations below, we again specify a given tolerance σ > 0 for the
local truncation error η. Because changing the time step requires additional computational
computational cost we only increase the time step if the local error is suﬃciently below
tolerance,
1
η
<
σ
γ
where γ > 1 is user deﬁned. We chose γ = 3 based on computational evidence [98]. Total
time step counts are relatively insensitive to γ around this value. If the condition above is
met, we allow a time step increase with multiplier

ξ = (0.8γ)1/(p+1) .

88

where again 0.8 is a safety factor. If a time step fails the accuracy check (η < σ) then we
fail the step and reduce the time step by a factor of 1 and restart the computation.
ξ
Although SDIRK steps are more accurate than BDF steps for the same time step size,
the errors are diﬀerent and so the error estimator (3.33) cannot be used directly after a
restart. However, the error mismatch decays with a ﬁxed number of steps in a numerical
initial layer eﬀect. Because of this, we only check the error after a preset number S of time
steps following a restart for both an increase and decrease in the time step. The values of
S used are 4 and 8 for BDF2-AB2 and BDF3-AB3 respectively (the numerical initial layers
for BDF3 have a slower decay than for BDF2).

3.7.1

Numerical Results

We consider the same 1D Cahn-Hilliard model problem as in section 3.3.4, where the initial
data u0 (x) is given by (21) and

= 0.18. In these runs the Newton solve has residual

tolerance 10−9 and the PCG solve at each step has tolerance 10−10 . The time step size and
PCG iteration history are shown in Figure 3.14. Note that the time step history for the
ﬁrst order method (FE/BE) is diﬀerent in Figure 3.14 than in Figure 3.5 because we apply
the same accuracy threshold to changing time steps here that we do for the higher order
methods.

3.8

A Pair of Fourth-Order Accurate Methods

In recent work with Dr. David Seal, we discovered and implemented a new family of highorder time-stepping Lax-Wendroﬀ schemes. The fourth-order implicit method is A-stable
and eﬀectively avoids the Dahlquist barrier by using the derivatives of the right hand side of

89

Evolution of timestep
70
BE−FE
BDF2−AB2
BDF3−AB3

60
50

k

40
30
20
10
0
0

2000

4000

6000

8000

10000

t
Conjugate Gradient Iterations
300
BE−FE
BDF2−AB2
BDF3−AB3

250
200
150
100
50
0
0

2000

4000

6000

8000

10000

t

Figure 3.14 The time step and PCG iteration count history for the adaptive time approximation of the 1D Cahn-Hilliard using BE-FE, BDF2-AB2 and BDF3-AB3 with initial conditions
(21) for = 0.18, σ = 10−7 .

90

the equation. The fourth-order explicit method has the same error term as the corresponding
implicit scheme allowing an adaptive time-stepping procedure.
For the general PDE,
ut = f (u)
the two-stage explicit scheme is given by
2
δt
n ) + δt J f (un )
=
+ f (u
2
8
δt2
un+1 = un + δtf (un ) +
(J f (un ) + 2J f (u∗ )) ,
6

u∗

un

(3.34)

and the implicit scheme is

un+1 = un +

δt
f (un ) + f un+1
2

+

δt2
J f (un ) − J f un+1
12

,

(3.35)

where J is the Jacobian of the right-hand side of the diﬀerential equation, f .
For the CH equation, f = −ε2 ∆2 u + ∆ u3 − u , so the Jacobian acting on some object
v is J v = −ε2 ∆2 v + ∆

3u2 − 1 v . We compute the explicit solution and then solve for

the implicit solution as above. The adaptive time-step is computed in the same manner also,
except
σ
δtnew = δtold · min 0.8 · 5 , 1.3 .
E

(3.36)

The preconditioner we use is derived in the same way as the ﬁrst order preconditioner.
For the fourth order method applied to the CH equation,

G (ur ) v = v −

δt
δt2
J f (v) −
Hf (v) ,
2
12

91

(3.37)

∂u

tt
with Hf = ∂u computed by hand. Assuming u = ±1 and ∆u = 0 for the majority of the

domain, we obtain the physics based preconditioner,

Q=1−

δt2
δt
−ε2 ∆2 + 2∆ +
−ε2 ∆2 + 2
2
12

−ε2 ∆2 + 2∆ .

(3.38)

For the Cahn-Hilliard equation, we observe fourth order convergence for this pair of
methods. The physics based preconditioner made a signiﬁcant diﬀerence by reducing the required number of CG steps by a factor of one hundred throughout most of the computation.
During the spinodal phase separation exhibited by solutions of the CH equation (computed
from random initial data), time steps for the fourth order method were ten to a thousand
times larger than the ﬁrst order implicit method (Section 3.3.2). When the solution became
smooth (or if smooth initial conditions were used), the ﬁrst and fourth order methods maintained approximately the same size time steps when controlling the error. This comes from
the fact that when using a ﬁxed time step, the ﬁrst order method was numerically stable for
a time step size slightly larger than the fourth order method. The Newton and conjugate
gradient solves required nearly the same number of iterations per time step, but as expected,
the fourth order method required more computation per iteration.
The results in Table 3.5 shows the performance of the methods on the test problem for
various values of local tolerance σ. Ripening times can be computed very accurately with
the higher order methods. The PCG count is an accurate measure of computational cost and
includes all iterations from SDIRK restart and failed steps. Eﬃciency gains are obtained
with the higher order methods, with the biggest gain moving from ﬁrst to second order. As
in section 3.3.4 the ﬁrst order method shows the correct asymptotic behavior in the number
√
of time steps (ratios of 10 ≈ 3.16 of times steps as σ is decreased by 10). The second order

92

√
method results are also consistent with the correct asymptotic behavior ratio 3 10 ≈ 2.15.
The third order results have not reached the asymptotic regime. Considering the accuracy
of the ripening times in this case, it appears that the error estimation is over-estimating the
local error.
Remark 4. Experiments with other higher order strategies were also done, for example using
SDIRK2 stepping with SDIRK3 for error estimation. However, for tolerances σ leading to
accuracy of practical interest, this error estimation was also not asymptotically valid. It is
known that constructing good error estimators for IRK methods applied to very stiﬀ problems
is diﬃcult [99]. We conjecture that the relatively large higher derivative terms pollute the
error estimation in this case. The problem of accurate error estimation for higher order
time stepping for this class of problems is an interesting open question. With such error
estimation, arbitrary precision could be achieved to solutions of these problems: spectral
accuracy in space and high accuracy in time using spectral deferred correction methods [100]
or high order Radau methods [88].

3.9
3.9.1

Investigation of splitting methods
Preliminaries and numerical results

We present a splitting approach of Eyre [46] in the framework of the implicit approximation
of the 1D Cahn-Hillard model. The standard backward Euler time stepping approximation
of this model leads to the problem

Um + δt∆h

2 ∆ Um
h

− (Um )<3> + Um − Um−1 = 0

93

(3.39)

Method

σ
1e-04
BE-FE
1e-05
1e-06
1e-07
1e-04
BDF2-AB2 1e-05
1e-06
1e-07
1e-06
BDF3-AB3 1e-07
1e-08

time steps
792
2114 (2.67)
6371 (3.01)
20216 (3.17)
191
660 (3.455)
1484 (2.248)
3279 (2.210)
374
981 (2.623)
2041 (2.081)

ripening time
8206.11
8272.16
8303.81
8314.15
8301.34
8316.19
8317.97
8318.47
8318.43
8318.61
8318.63

total PCG iterations
41475
69166 (1.67)
125879 (1.82)
277079 (2.20)
22663
36535 (1.612)
44987 (1.231)
70580 (1.569)
25443
37582 (1.477)
53079 (1.412)

Table 3.5 Performance of each adaptive time stepping method through a ripening event of
the 1D Cahn-Hilliard model with = 0.18 and N = 128. The numbers in brackets are ratios
to quantities in the previous row.
where by (Um )<3> we mean the pointwise values cubed and δt is the time step as before.
This is the system solved with the methods in this chapter. A slight modiﬁcation to this
system, taking explicitly the linear term that otherwise makes the system non-convex, Eyre
[46] considered

Um + δt∆h

2 ∆ Um
h

− (Um )<3> − Um−1 + δt∆h Um−1 = 0.

(3.40)

Note that the implicit solve is still nonlinear but now convex. There are two attractive
theoretical properties to this approach:
A: The discrete problem (3.40) has a unique solution for every δt and every U m−1 .
B: The unique solution above is guaranteed to reduce the energy.
The nonlinear system can be solved with the same Newton, PCG approach as the unsplit
discretization we have considered up to this point and we can examine the condition number
of the preconditioned Jacobian matrix in the same framework as the preconditioned unsplit
94

method in section 3.4. Here, the matrix of interest is

AS = Q−1 JS
S

where QS = I + δt 2 ∆h ∆h − 3δt∆h and

JS = I + δt 2 ∆h ∆h − δt∆h ΛS

(r)

where ΛS is the diagonal matrix with positive entries 3[Uj ]2 where U(r) denotes the r’th iterate for Um as before. Note that JS is positive deﬁnite for every δt and U(r) (a consequence
of the convexity of the implicit problem), unlike JCH (3.11) from the unsplit discretization.
The eigenvalues of AS like those of A in section 3.4 also cluster around 1 and have a minimum value that determines the condition number of AS . Here, the minimum eigenvalue for
large δt at ripening states is approximately 0.290 independent of δt and . This matches the
value predicted by formal asymptotics shown in section 3.9.2 below.
It is possible to set up a linear ﬁxed point iteration and avoid the machinery of newton
iterations with PCG solvers. Consider the iterative scheme with iterates U(r) for Um to
solve (3.40):

U(r) + δt∆h

2 ∆ U(r)
h

− 3U(r)

= δt∆h (U(r−1) )<3> − 3U(r−1)
+Um−1 − δt∆h Um−1

This approach is taken in [91]. Note that the linearization of the ﬁxed point iteration has
matrix I − AS , so the small eigenvalues of AS will determine the convergence rate of the
iterations. The convergence factors will be approximately 1-0.290 = 0.710 for the 1D C-H
95

ripening solutions considered in this section, independent of δt and . Eyre also considered
other operator splitting possibilities, including one that only requires a linear implicit solve

Um + δt∆h

2 ∆ Um
h

− 4Um − Um−1 + δt∆h (4Um−1 − (Um−1 )3 ) = 0

(3.41)

which retains properties A and B above under the assumption that U ∞ remains bounded
by 1.
From the point of view of solver eﬃciency, the splitting methods are ideal. However, the
accuracy of time stepping with the operator splitting can be much lower than that of the
conventional (non-split) solvers considered in this chapter. We repeat the 1D Cahn-Hilliard
ripening time computations of section 3.3.4. The results are shown in Table 3.6 for the split
step method (3.40) using the PCG solver for the time stepping. Comparing to Table 3.3, it
is clear that the splitting method is much less eﬃcient than the pure implicit time stepping
proposed in this chapter. It is clear from Table 3.6 that this is not just an artifact of the
error estimation, since even with the large number of time steps, ripening times are not
found accurately. The linear splitting (3.41) performed similarly poorly, taking 193,673 time
steps for σ = 10−4 in the same computation listed in Tables 3.3 and 3.6, obtaining a (very
inaccurate) ripening time of 15301. Although the solve at each time step is more eﬃcient and
the method has the same order of accuracy, the errors from the operator splitting are much
larger than those from pure implicit time stepping for large time steps. Poor performance
of splitting methods is observed in 2D computations and in the other models considered in
section 3.5. For example, using Eyre’s method (3.40) on the 2D Cahn-Hilliard computation
described in section 3.5 leads to 83,862 time steps and 1,069,416 PCG iterations for the
= 0.08 run (compared to 3,305 and 84,138 respectively for our fully implicit approach)

96

and 18,891 time steps and 207,252 PCG iterations for the = 0.16 case (compared to 1,471
and 30,051 for our approach). The accuracy limitations of splitting methods has previously
been observed [55, 91]. Using higher order implicit-explicit (IMEX) time stepping methods
[101, 102, 103, 104] (which retain property A above but not necessarily B) does not allow
signiﬁcantly larger time steps through ripening, although they achieve the speciﬁed order of
accuracy as δt → 0 of course. The accuracy loss can be explained in the 1D Allen-Cahn case
using simple asymptotics below.
Remark 5. The recent method in [55] is diﬀerent from Eyre’s splitting ideas. Their technique
ensures property B above but not A. The implicit problem they solve at every time step (with
incomplete LU preconditioned GMRES) has the same structure as the fully implicit time
stepping considered in the current work, although it is a modiﬁcation of the implicit midpoint
rule which does not have strong stability properties. We believe our PCG solution strategy
could be applied to their discretization and should be more eﬃcient.
Remark 6. It should be made clear that these 1D problems with exponentially slow ripening
events are extreme cases. It is not clear whether Eyre’s type splitting methods might be made
more eﬃcient in some 2D and 3D problems where motion is generically only polynomial slow
in . If one relaxes the strict requirement of time accuracy and considers a computational
time step as just an eﬃcient step down the energy landscape[91], the comparison is even
less clear. See also [105, 106] for time stepping strategies that preserve coarsening statistics
but not pointwise accuracy. A hybrid strategy might also be considered, where eﬃcient time
steps based on operator splitting are taken during the spinodal phase or when other physics
limits the time step size and our fully implicit time stepping strategy is used only during long
ripening events. We consider the comparison of the eﬃciency of the approaches an open

97

σ
time steps
1e-4
70,517
1e-5 202,549 (2.87)
1e-6 618,431(3.06)

ripening time
13147
9582
8695

total PCG iterations
1,039,676
2,368,051 (2.27)
5,205,739 (2.19)

Table 3.6 Performance of adaptive time stepping through a ripening event of the 1D CahnHilliard model with = 0.18 and N = 128 with Eyre’s splitting (3.40). The numbers in
brackets are ratios to quantities in the previous row. Compare to values in Table 3.3 for the
fully implicit time discretization.
problem of interest. This is a question of accurate time stepping with lengthly solves versus
time step size limited computations with fast solves. However, we remind the reader that
the main advantage of the numerical framework described in this chapter is the generality of
problems to which it can be easily applied rather than its potential as an eﬃcient approach
to any particular problem.

3.9.2

Asymptotics

We extend the asymptotic framework developed in section 3.4.2 to explore the solver eﬃciency and accuracy limitations of Eyre’s splitting method.

3.9.2.1

Accuracy in Allen-Cahn equations

We highlight the accuracy limitations of Eyre’s splitting in the simple, low order setting of
the 1D Allen-Cahn problem (3.1). Consider the linear problem (3.17) in ripening states at
which Theorem 3.4.1 applies. Applying a fully implicit BE time stepping method to this
problem leads to the discrete problem

V m = V m−1 + δtLV m

98

(3.42)

Consider now taking large time steps. Components of V in V ⊥ are strongly reduced in a
single time step. Components of V in V change slowly and represent the dynamics in the
problem that need to be captured accurately during ripening. This characterizes the 1D
Allen-Cahn equations as an extremely stiﬀ problem, in the sense that it is even more stiﬀ
than the second order parabolic part acting on V ⊥ . For that reason, we strongly advocate
the use of L-stable [88] time stepping schemes on this problem rather than weakly stable
schemes such as trapezoidal rule or implicit midpoint rule. To consider the accuracy of the
method, we make the standard ansatz

V m = Gm φj (x)

(3.43)

where G is a constant to be determined and recall that φj is an eigenfunction of L with
small eigenvalue λj . Inserting (3.43) into (3.42) leads to the expected

G=

1
≈ 1 + δtλj + δt2 λ2 + · · ·
j
1 − δtλj

(3.44)

where we have expanded the expression in terms of small δt. Comparing to the exact
G=e

δtλj

1
we see the term above has an error of 2 δt2 λ2 , again as expected. We presented
j

the standard analysis above to be able to compare it to the result from Eyre’s method.
Linearizing Eyre’s method in this case leads to

V m = V m−1 + δt(L − I)V m + δtV m−1 .

99

Here, it is seen why the Allen-Cahn equation is used for this discussion, since Eyre’s splitting
preserves the eigen-structure of V in this case. With the same ansatz (3.43) we obtain

G=

1 + δt
≈ 1 + δtλj + δt2 (λ2 − λj ) + · · ·
j
1 + δt − δtλj

(3.45)

in this case. Here the dominant error is λj δt2 , which is exponentially larger than the error in
the fully implicit case. Another way to view this accuracy loss is that for the fully implicit
method, if δtλj is small, the scheme is “accurate” but for the Eyre’s case, δt must be small
when (3.45) is considered. Thus, exponentially smaller time steps must be taken with Eyre’s
scheme than with a fully implicit method. In this context, higher order splitting methods of
Eyre’s type based on standard implicit-explicit (IMEX) splitting do not solve this accuracy
issue. While they are formally higher order accurate, they require time steps unreasonably
small before they begin to be accurate, just as the lowest order scheme described above. We
conjecture that this is true for all such IMEX approaches. For example, the “G” value for
SBDF2 [102] is
1
δt3
(3λ3 − 2λ2 ) + · · ·
1 + δtλj + δt2 λ2 +
j
j
j
2
6
and for implicit-explicit midpoint rule [103]
1 2 2 δt3
1 + δtλj + δt λj +
(3λ3 /2 − 3λ2 /2) + · · ·
j
j
2
6

100

3.9.2.2

Condition number of solver for Eyre’s method for Cahn-Hilliard problems

Here the relevant eigenvalue problem is

(I + δtD(L − I))ψ = σ[I + δtD(L − I + 3(u2 − 1))]ψ

(3.46)

where I is the identity operator and D is the second derivative operator as before. The
fundamental diﬀerence to the analysis in section 3.4.2.2 is that the operator multiplied by δt
on the left is now L − I (which does not have small eigenvalues) rather than L (which does).
This indicates there are not O(1/δt) eigenvalues to this preconditioned system as δt → ∞
but rather they remain O(1) in this limit as observed computationally above. Formally
considering (3.46) as δt → ∞ leads to

(L − I)ψ = σ(L − I + 3(u2 − 1))ψ

(3.47)

where it can be justiﬁed dropping the integration constant for σ = 1 at leading order due to
the structure of u2 − 1 at ripening states. The operators on either side above are negative
deﬁnite and symmetric, so we obtain upper bounds for the smallest eigenvalues when we
take ψ = φj (the eigenfunctions of L with small eigenvalues so Lψj =e 0) in the expression
above and then take the inner product with φj . This gives asymptotically

σmin ≤

1
≈ 0.294
1 + 1/βAC

101

where βAC is the value (3.23) found previously for the relevant integral. We have not
given convincing evidence that this should be the minimum eigenvalue. Considering (3.47)
heuristically, the eigenfunction ψ for the minimum eigenvalue σ should minimize the size of
(L − I)ψ and maximize the product with 1 − u2 , which elements of V do. In addition, the
value does correspond to that observed computationally.

3.10

Conclusions

We have presented a new approach to the computational approximation of energy gradient
ﬂows from material science models such as Allen-Cahn, Cahn-Hilliard and higher order and
vector variants. The approach allows accurate time stepping with large time steps when
the evolution is slow during ripening events. Some evidence is given that approaches based
on Eyre’s operator splitting require much smaller time steps to maintain accuracy in these
settings. Fully implicit time stepping with matrix-free Newton steps solved with the CG
method are proposed in this work. An eﬃcient preconditioner is identiﬁed. Computational
evidence and formal asymptotics show that the solver has mild computational increase as
is decreased and time step δt is increased. The approach is shown to work well on a number
of problems in a general class and allows for an eﬃcient GPU implementation. It is possible
that the computational approach may also be useful for other models such as those from the
study of liquid crystals or epitaxial growth [107, 108] that share the same structure.
There are several avenues of study for theoreticians opened in this work. Arguments made
using formal asymptotics backed by computational evidence in 1D could be made rigorous
and extended to higher dimensions. The authors believe this is more than a technical exercise,
but that real insight into the eﬃciency of diﬀerent computational approaches can be obtained

102

from such a study. In addition, there is the conjecture in section 3.4.2.2 on the rank of a
modiﬁed distance matrix.
There have been many computational approaches to the Cahn-Hilliard equation, which
is used widely in materials science studies and as a computational approximation to moving
interface problems. In this well-developed computational ﬁeld, a test suite of benchmark
problems is clearly needed. A review of the methods in the literature, including a comparison
on the benchmark problems, would be a valuable contribution.

103

Chapter 4
Exponential Integrator

4.1

General Exponential Integrator

A signiﬁcant challenge in simulating the Functionalized Cahn-Hilliard equation is ﬁnding
the proper balance between time accuracy and reasonably fast computation times. Eyre’s
method uses a convex splitting of the equation to guarantee energy decay for any size timestep
[46, 34], but the time accuracy suﬀers [109]. Alternatively, a fully implicit preconditioned
conjugate gradient method gives the proper time evolution and time accuracy, but the Newton solve is severely restricted [109], resulting in a method that is too slow to reasonably
capture the long-time dynamics of the solution. Our goal is to develop a method that is both
time accurate and is computationally eﬃcient enough to describe fully relaxed solutions to
the FCH equation.
The exponential integrator method can provide the performance we desire under certain
limitations, and it is also computationally eﬃcient. The limitations are much less restrictive
than the fully implicit method, and has much greater time accuracy than the convex splitting
method. A review of the literature for exponential integrator methods is in Section 1.8.2.
The work in this chapter follows the the papers published for ODEs by Du and Zhu under
the name exponential time diﬀerencing (ETD) [66, 67], and by Cox and Matthews for stiﬀ
systems [65]. In this chapter we will present the ETD method for the CH and FCH equations,
along with some variations in the time stepping to obtain greater stability and increased time
104

accuracy.
We desire to solve the general PDE deﬁned by

ut = F (u) .

We ﬁrst separate F (u) into linear and non-linear parts,

ut = Lu + N (u) .

(4.1)

By moving the linear portion to the left hand side and multiplying the equation by e−tL , we
can obtain a total time derivative.

ut − Lu = N (u)
e−tL ut − e−tL Lu = e−tL N (u)
e−tL u

t

= e−tL N (u) .

Integration in time gives the general exponential integrator formula,

u (tn+1 ) = eδtL u (tn ) + eδtL

δt

e−sL N (u (s + tn )) ds

(4.2)

0

Up to this point, the derivation is exact, but we must now make approximations because
we do not know enough about N (u (t)) to compute the integral. The ﬁrst approach is to
obtain an explicit scheme that is ﬁrst order in time by taking N (u (t)) ≈ N (u (tn )) over

105

the interval of integration. We then compute the exponential integral by hand to obtain

un+1 = eδtL un +

eδtL − 1
N (un ) ,
L

(4.3)

where the notation un = u (tn ) is used here and in subsequent sections. Schemes that are
higher order in time will be created by improving upon the approximation to N (u (t)).
We use a Fourier spectral method to compute solutions to the CH and FCH equations
(1.3 and 1.10). The spectral method comes with both advantages and disadvantages. The
primary advantage is that the operators can be computed quickly and eﬃciently by hopping
between the spectral and physical domains. This reduces nearly all of the calculations to
element-wise operations on the arrays which can be computed extremely eﬃciently in parallel
on a graphics processing unit (GPU). Further, solutions to the FCH equation develop layers
with slope on the order of ε−1 [28], and the spectral method only needs a few grid points
to capture such layers. The primary disadvantage of using a spectral method is that we are
limited to periodic boundary conditions on rectangular domains. We expect to employ a
ﬁnite element or discontinuous galerkin method when we need to change the domain and
boundary conditions.

106

4.2

Stability and Linearization of the CH Equation

The most straight forward way to split the CH equation into the proper form (4.1) is to
deﬁne

Lu = −ε2 ∆2 u − ∆u
N (u) = ∆ u3 .

Unfortunately, this splitting is not stable with respect to inversion of L. This can be seen by
looking at the one-dimensional form of the operator in the frequency representation, where
k is the wavenumber,

L = −ε2 −k 2

2

− −k 2 = −ε2 k 4 + k 2 .

(4.4)

Ideally, the eigenvalues of the operator would be negative for all values of k or at least
1

nonzero, but here we have zeros at k = ε− 2 . There are a couple of diﬀerent ways to rectify
this problem.
One way to stabilize the linear operator is to add and subtract a term in the equation
that makes the operator non-positive in the frequency representation [56]. This gives,

ut = −ε2 ∆2 u + ∆ u3 − u + β∆u − β∆u,

107

(4.5)

with the splitting,

Lu = −ε2 ∆2 u + β∆u
N (u) = ∆ u3 − (1 + β) u .

In the frequency representation,
L = −ε2 k 4 − βk 2 ,

(4.6)

so any β ≥ 1 gives the desired property. Through numerical simulations, we concur with
Shen and Yang’s suggestion that β = 2 is the ideal choice. Unfortunately, the addition of
this term introduces additional error which is undesirable, but like the method described
in Chapter 2, it comes with a guarantee that the energy will never increase. Thus, this
method is ideal for simulations where dynamics are less important that the ﬁnal geometry or
solution. This decrease in accuracy is evident in Table 4.1, and it becomes more signiﬁcant
for the Functionalized Cahn-Hilliard equation.
A more accurate way to obtain stability under inversion is to linearize the problem using
information from the physics of the solution. Solutions to the CH equation rapidly form
domains where u (x) = ±1, so we choose to linearize about the background state u (x) = −1.
We take v = u + 1 and substitute into equation 1.3 which gives

vt = −ε2 ∆2 v + ∆ 2v − 3v 2 + v 3 .

108

(4.7)

δt
Simple
Shen
Physics-based ETD
1e-3
1.3741
9.7420e-2
7.2249e-2
2.5333e-3
1e-4 4.8206e-4 4.5247e-3
1e-5 6.8251e-5 3.3740e-4
1.3448e-4
6.5339e-6
1e-6 1.0790e-5 2.3092e-5
Table 4.1 Comparison of linearization choices for the CH equation. Error of the ﬁnal solution
is given for a simulation with ﬁxed time step δt. For the simple linearization with δt = 10−3 ,
the solution became unstable and developed spurious oscillations.
Taking the linear and nonlinear parts gives the operators

Lv = −ε2 ∆2 v + 2∆v
N (u) = ∆ v 3 − 3v 2 ,

where the frequency representation of L is strictly negative for all values of k except k = 0,
as desired.
With this formulation of the CH equation, we now apply the ﬁrst order exponential integrator (4.3). We note that with this linearization, the ﬁrst order method is unconditionally
stable with respect to the time step size. Unfortunately, it suﬀers from a loss of convergence
order for time steps larger than δt

4.3

10−4 .

Linearization of the FCH Equation

To eﬀectively linearize the FCH equation, we use information from the physics of the solution.
Solutions to the FCH equation rapidly form domains where u (x) = −1 is the dominant,
background phase. We choose to linearize about the background state, u (x) = b = −1, by

109

taking v = u − b and substituting into equation 1.10 which gives

vt = ∆
=∆

ε2 ∆ − W (v + b) + εη1
ε2 ∆ − W (b) − W

ε2 ∆v − W (v + b) + ε (η1 − η2 ) W (v + b)

(b) v − Q2 (v) + εη1

+ ε (η1 − η2 ) W (b) + W (b) v + Q1 (v)

ε2 ∆v − W (b) − W (b) v − Q1 (v)
,

where Q1 (v) and Q2 (v) contain the terms that are quadratic and higher order in v. For
1
our typical potential well, Wτ (u) = 2 (u + 1)2 1 (u − 1)2 − τ (u − 2) , we have Q1 (v) =
2
3
1
2

(6b + τ ) v 2 + v 3 and Q2 (v) = 3v 2 . We note that W (b) = 0, and separate the linear and

nonlinear parts to obtain the operators

Lv = ∆

ε2 ∆ − W (b) + εη1

ε2 ∆ − W (b) + ε (η1 − η2 ) W (b) v

N (u) = ∆ − ε2 ∆ − W (b) + εη1 Q1 (v) + −W

(b) − Q2 (v)

ε2 ∆v − W (b) v − Q1 (v)

+ ε (η1 − η2 ) Q1 (v)] .

This linearization is ideal because the frequency representation of L is strictly negative for
all values of k except k = 0, making the operator stable under inversion for η1 ≥ η2 .
Numerically, we observe stability as long as η1 and η2 are of the same order.
With this formulation of the FCH equation, we now apply the ﬁrst order exponential integrator (4.3). We note that with this linearization, the ﬁrst order method is unconditionally
stable with respect to the time step size, and now extend this linearization to higher-order
in time methods for solving the FCH equation.

110

4.4

Computing Exponential Terms

One way to improve time accuracy and obtain higher-order time-stepping is to expand
N (v (t)) as a Taylor series in time. Over the integral of integration, we take the expansion about tn and truncate at the order of accuracy that we desire,

N (v (t)) = N (vn ) + (t − tn ) JN

dv
dt

+ O δt2
tn

= N (vn ) + (t − tn ) JN (Lvn + N (vn )) + O δt2 .

d
dt N

(v) = JN

dv
dt

(4.8)

is calculated by hand, noting that all the time dependence is in v (x, t).

We derive the second order method by ﬁrst substituting the approximation into equation
4.2,
vn+1 = eδtL vn + eδtL

δt
0

e−sL [N (vn ) + sJN (Lvn + N (vn ))] ds.

After pulling out the portions that are constant in time,

vn+1 = eδtL vn + eδtL N (vn )

δt
0

e−sL ds + JN (Lvn + N (vn ))

δt

se−sL ds ,

0

we integrate the exponentials by hand to obtain a second-order in time method,

vn+1 = eδtL vn +

eδtL − 1
eδtL − 1 − δtL
JN (Lvn + N (vn )) .
N (vn ) +
L
L2

(4.9)

The exponential coeﬃcients are universal for high order ETD methods, and they are
particularly diﬃcult to calculate numerically with reasonable accuracy. The exponential

111

coeﬃcients are deﬁned as:

φ0 (z) = ez

φ1 (z) =

ez − 1
z

φ2 (z) =

ez − 1 − z
z2

...

φl+1 (z) =

1
φl (z) − l!
.
z

(4.10)

In our implementation of ETD methods, z = δtL, and we always compute these terms in
the frequency representation. Using this notation, equations 4.3 and 4.9 can be written as,

vn+1 = φ0 (δtL) vn + δt φ1 (δtL) N (vn )
vn+1 = φ0 (δtL) vn + δt φ1 (δtL) N (vn ) + δt2 φ2 (δtL) JN (Lvn + N (vn )) ,

respectively. In Sections 4.5 and 4.6, we introduce methods that are higher order in time
which use the higher order φ-functions. Unfortunately, computing φ1 (z) directly encounters
signiﬁcant errors when z is small, i.e. both δt and k are small. This comes from cancellation
errors when rounding oﬀ values at machine precision, and can lead to errors that are orders
of magnitude larger than the actual value [62, 110, 69]. These errors become increasingly
problematic for φl (x) with larger values of l.
To eliminate these errors, we use the Taylor expansion of φ1 (z) about z = 0, and include
as many terms as necessary to obtain a desired tolerance. We ﬁnd that direct computation of
1

φl (z) is suﬃcient if z > η l+1 , where η is machine precision (typically 10−16 ). In the spectral
setting, L is small for wave numbers, k, near zero, but is typically O (1) or larger. Thus,
small z depends on the size of δt which can be as small as 10−12 in our three dimensional
calculations. When computing φ1 (z) and δt ≤ 10−7 , we divide the φ1 (z) operator into two
portions: large k values are computed directly, and small k values are computed using a
Taylor expansion out to a ﬁxed number of terms that have been estimated a priori. In three

112

√

η

dimensional calculations, the cutoﬀ for which the expansion is necessary is given by L < δt ,
where the array L was computed once at the beginning of the simulation. The calculation
is performed similarly when approximating the higher order φ-functions.
An alternate method for stabilizing the calculation of φ-functions was presented by Kassam and Trefethen [69]. It involves calculating the value of an integral in the complex plane.
When L is real, as the CH and FCH equations, the integration can be performed around any
contour Γ that encloses the eigenvalues of the operator that are small in modulus. Kassam
uses circles and 32 or 64 evaluation points with the trapezoidal rule to obtain stability and
accuracy.
Similar work by Du and Zhu suggests that only two points are necessary for up to
the fourth-order Runge-Kutta scheme [67]. It seems reasonable that only a few points are
necessary to provide stability, but we ﬁnd it diﬃcult to believe that two points is suﬃcient
to see higher order accuracy in time, and the paper gave no results higher than second order.

4.5

Runge-Kutta Methods

One way to improve the time accuracy of the exponential integrator is through the use of
Runge-Kutta (RK) type methods. This is accomplished by computing approximations to
N (v (t)) using multiple stages over the interval [tn , tn+1 ]. For the second order RK method,
we take an explicit exponential integrator step

an = eδtL vn +

eδtL − 1
N (vn ) ,
L

113

then make the approximation

N (v (t)) = N (vn ) + (t − tn ) (N (an ) − N (vn )) /δt + O δt2 .

This approximation gives the ETDRK2 method described in [65, 66].
Denoting the numerical approximation to v (tn ) as vn , the second order RK method is
eδtL − 1
N (vn )
L
eδtL − 1 − δtL
(N (an ) − N (vn )) .
vn+1 = an +
δtL2
an = eδtL vn +

(4.11)

A similar construction gives a third-order exponential integrator RK method,
δt

e2L−1
+
N (vn )
an =
L
eδtL − 1
bn = eδtL vn +
(2N (an ) − N (vn ))
L
δt
e 2 L vn

vn+1 = eδtL vn + δt−2 L−3

−4 − δtL + eδtL 4 − 3δtL + δt2 L2

N (vn )

+ 4 2 + δtL + eδtL (−2 + δtL) N (an )
+ −4 − 3δtL − δt2 L2 + eδtL (4 − δtL) N (bn ) } .

(4.12)

The standard fourth order Runge-Kutta method only attains third-order accuracy in
time, but [65] gives a method with diﬀerent coeﬃcients that is fourth-order accurate for

114

exponential integrator schemes,
δt

an =

δt
e 2 L vn

e2L−1
+
N (vn )
L

bn =

δt
e 2 L vn

e2L−1
+
N (an )
L

cn =

δt
e 2 L an

e2L−1
(2N (bn ) − N (vn ))
+
L

δt

δt

vn+1 = eδtL vn + δt−2 L−3

−4 − δtL + eδtL 4 − 3δtL + δt2 L2

N (vn )

+ 2 2 + δtL + eδtL (−2 + δtL) (N (an ) + N (bn ))
+ −4 − 3δtL − δt2 L2 + eδtL (4 − δtL) N (cn ) } .

(4.13)

Since we are only using explicit methods, we require that these methods have suﬃcient
numerical stability. Stability for the RK and other methods will be discussed in Section 4.8.

4.6

Taylor Methods

In Section 4.4, we derived a second-order, explicit ETD method based on a Taylor expansion
of N (v) centered at tn . Unfortunately, numerical simulations showed that this method
(4.9) is unstable for large time-step sizes and attains second order convergence for only very
small time-step sizes. This is due to approximating the function on the left hand side of
the integration interval where the exponential multiplier is smallest (remember that L ≤ 0
for each k in the frequency representation). To solve this dilemma, we instead approximate

115

N (v (t)) with an expansion about tn+1 ,

N (v (t)) = N (vn+1 ) + (tn+1 − t) JN

dv
dt

+ O δt2
tn+1

= N (vn+1 ) + (tn+1 − t) JN (Lvn+1 + N (vn+1 )) + O δt2 .

(4.14)

Again substituting into equation 4.2 and extracting the multiplying constants from the integral, we have

vn+1 = eδtL vn + eδtL

δt
0

e−sL [N (vn+1 ) + (δt − s) JN (Lvn+1 + N (vn+1 ))] ds

= eδtL vn + eδtL {N (vn+1 ) + δtJN (Lvn+1 + N (vn+1 ))}
δt

− JN (Lvn+1 + N (vn+1 ))

δt

e−sL ds

0

se−sL ds .

0

Computing the integrals by hand leaves an implicit numerical method,

vn+1 = eδtL vn +

eδtL − 1
{N (vn+1 ) + δtJN (Lvn+1 + N (vn+1 ))}
L
eδtL − 1 − δtL
JN (Lvn+1 + N (vn+1 )) .
−
L2

Rather than solve this implicit method, we add an explicit stage to approximate vn+1 ,
eδtL − 1
N (vn )
L
eδtL − 1
vn+1 = eδtL vn +
{N (an ) − δtJN (Lan + N (an ))}
L
eδtL − 1 − δtL
+
JN (Lan + N (an )) .
L2
an = eδtL vn +

116

(4.15)

This method based on the Taylor expansion is not unconditionally stable with respect
to time-step size, but it allows a much larger time step size than (4.9) and shows second
order convergence (see Section 4.7). Further, since the second stage of equation 4.15 accepts
an approximation for vn+1 and returns a better approximation for it, we can iterate on
the second stage of the method to reduce the error of our solution. As the number of
iterations increases, the error in the solution approaches the error of a fully-implicit secondorder exponential integrator. In this way, the method behaves like a predictor-corrector type
method.
By taking more terms in our initial Taylor expansion, a similar third order method is

an = eδtL vn +

eδtL − 1
N (vn )
L

˙
N (an ) = JN (Lan + N (an ))
¨
N (an ) = HN [Lan + N (an )]2 + JN [L (Lan + N (an )) + JN (Lan + N (an ))]
vn+1 = eδtL vn +

eδtL − 1
L

˙
N (an ) − δtN (an ) +

eδtL − 1 − δtL ˙
¨
N (an ) − δtN (an )
L2
eδtL − 1 − δtL − 1 δt2 L2 ¨
2
+
N (an ) .
L3

δt2 ¨
N (an )
2

+

(4.16)

Again, each iteration of the second stage improves the solution by decreasing the error. Since
the predictor stage is only ﬁrst order, we require at least two iterations of the second stage
to obtain a third order method. Further iterations drive the solution to the expected error
of a fully-implicit third-order exponential integrator.

117

4.7

Convergence of ETD Methods

To test the order of convergence in time step size, and to compare the methods discussed
in this chapter, we completed a reﬁnement study. The initial condition was a solution to
the CH equation that was far past the spinodal phase but still far from the steady state
solution. Time evolution was computed for ﬁxed δt = 10−4 then halved and recomputed.
We estimated the error for δt = 10−4 as the distance between the solutions measured in the
maximum norm. Using the solution with δt = 5 · 10−5 , we again halved the time step to
calculate the corresponding error. Figures 4.1 through 4.5 give the results of the reﬁnement
studies for each of the methods described above.
Figure 4.1 shows the results for the higher order ETD-Runge-Kutta methods (equations
4.11, 4.12, and 4.13) together with the general ETD (4.3) for comparison. Dashed lines with
the proper order are included for reference. We note that for small enough time step size,
each of the methods reaches its expected order of convergence. However, as the size of the
time step approaches δt = 10−4 the order of convergence begins to plateau. The reason for
this behavior is discussed in Section 5.4.1.
Figure 4.2 gives the corresponding plot for the ETD-Taylor methods (equations 4.15 and
4.16). The methods reach their expected order of convergence for larger size time steps than
the ETD-RK methods, but again they plateau near δt = 10−4 . Figures 4.3 and 4.4 show the
eﬀect of iterating on the second stage of the method for the ETD-T2 and ETD-T3 methods,
respectively. These iterations reduce the error by as much as three orders of magnitude in
the ETD-T2 method and ﬁve orders of magnitude in the ETD-T3 method.
Lastly, Figure 4.5 compares the error between the Runge-Kutta and Taylor methods
for both the second and third order schemes. For the simple implementation, the Taylor

118

0

10

−2

10

−4

Error

10

−6

10

ETD
ETD−RK2
ETD−RK3
ETD−RK4

−8

10

−10

10

−6

10

−5

10
δt

−4

10

Figure 4.1 Timestep size reﬁnement study for the ETD-RK methods given in Section 4.5.
Dashed lines are included for comparison and have slope of two, three, and four respectively.

methods have slightly more error than the corresponding ETD-RK methods, but including
several iterations makes the Taylor methods much more valuable.

4.8

Stability Analysis

In this section I compare the stability regions of the ﬁrst through third order ETD methods
described above (4.3, 4.11, 4.12, 4.15, 4.16). The stability region is the parameter region
such that the magnitude of the ampliﬁcation factor is less than or equal to one when it is
applied to the model equation
ut = ξu + λu,

119

(4.17)

0

10

−2

Error

10

−4

10

−6

10

ETD
ETD−T2
ETD−T3

−8

10 −6
10

−5

10
δt

−4

10

Figure 4.2 Timestep size reﬁnement study for the ETD-Taylor methods given in Section 4.6.
Dashed lines are included for comparison and have slope of two and three respectively.
where Lu = ξu, and N (u) = λu. We assume ξ is a negative, real-valued constant and λ
is complex. In Figures 4.6, we plot the regions for λ in the complex plane where un+1 ≤ 1
for one step with an initial condition un = 1. Similar plots and analysis for the ETD-RK
methods can be found in [67] and [65].
As discussed in Section 4.6, the Taylor methods decrease error when the second stage is
repeated, but unfortunately iterating on the corrector stage does not signiﬁcantly increase
the size of the stability region as shown in Figure 4.7.

120

0

10

−2

Error

10

−4

10

1
2
5
10
20

−6

10

−8

10 −6
10

−5

−4

10
δt

10

Figure 4.3 Timestep size reﬁnement study for the ETD-T2 method with diﬀerent numbers
of iterations. The dashed line has slope two and is included for comparison.

4.8.1

Predictor/Corrector Analysis

To understand the limits on the stability region for the ETD-Taylor methods, we take the
limit as δtξ → 0 and look at the ode,

ut = λu.

(4.18)

The backward Taylor series for this problem yields a single time step of the form

un+1 = un + z −

z2 z3 z4
+
−
+ ···
2!
3!
4!

un+1 ,

where we can truncate the expansion at the order we desire for the method.

121

(4.19)

0

10

−2

10

−4

Error

10

−6

10

1
2
5
10
20

−8

10

−10

10

−6

−5

10

10
δt

−4

10

Figure 4.4 Timestep size reﬁnement study for the ETD-T3 method with diﬀerent numbers
of iterations. The dashed line has slope three and is included for comparison.
Taken together with a predictor step, we have:

Predictor:

u[0] = p0 (z) · un

(4.20a)

Corrector:

u[k] = un + p(z) · u[k−1] ,

(4.20b)

with k as the counter for number of corrector iterations. To implement the second order
ETD-T2 scheme (4.15) for the ode of interest (4.18), we use p0 (z) = 1 + z in (4.20a) which
yields the ﬁrst order predictor, and set p(z) = z − z 2 /2 in (4.20b) giving the second order
corrector. Each iteration of (4.20b) will increase the order of the method, up to the order
of accuracy of p(z). Thus the second step of ETD-T3 must be performed at least twice to
achieve solutions that are third order in time.

122

0

10

−2

10

ETD
ETD−T2
ETD−RK2
ETD−T2 (20)
ETD−T3
ETD−RK3
ETD−T3 (20)

−4

Error

10

−6

10

−8

10

−10

10

−6

10

−5

−4

10
δt

10

Figure 4.5 Comparison of the error for ETD-RK methods versus the ETD-Taylor methods
with one and twenty iterations.
The exact solution to the linear diﬀerence equation (4.20a)-(4.20b) is given by,

u[k] =

un
1
+ p(z)k p0 (z) −
1 − p(z)
1 − p(z)

un .

(4.21)

Provided |p(z)| < 1, the error converges to the same as a fully implicit scheme, but unfortunately this has a much smaller region of stability. For second and third order corrector
steps, Figure 4.8 shows the regions where |p(z)| < 1 which match with the results of Figure
4.7 for small δtξ.

4.9

Implementation of ETD-RK2

To obtain larger-scale, three-dimensional solutions to the CH and FCH equations, we implemented the ETD-RK2 method for time stepping combined with a spectral solution in space.

123

4

4
ETD
ETD−RK2
ETD−RK3

3

3

2

2

1

1

0

0

−1

−1

−2

−2

−3

ETD
ETD−T2
ETD−T3

−3

−4
−4

−2

0

2

4

−4
−4

(a) Runge-Kutta δtξ = −0.01

−2

0

2

4

(b) Taylor δtξ = −0.01

15

15
ETD
ETD−RK2
ETD−RK3

10

10

5

5

0

0

−5

−5

−10

ETD
ETD−T2
ETD−T3

−10

−15
−15

−10

−5

0

5

10

15

(c) Runge-Kutta δtξ = −10

−15
−15

−10

−5

0

5

10

15

(d) Taylor δtξ = −10

Figure 4.6 Stability regions for exponential time diﬀerencing methods up to third order in
time. Regions are plotted in the δtλ complex plane.

124

4
3

15

ETD
ETD−T2
ETD−T3

10

2
1

5

0

ETD
ETD−T2
ETD−T3

0

−1
−5

−2
−10

−3
−4
−4

−2

0

2

4

−15
−15

−10

(a) δtξ = −0.01

−5

0

5

10

15

(b) δtξ = −10

Figure 4.7 Stability regions for ETD-T2 and ETD-T3 methods with 100 iterations of the
second step. Regions are plotted in the δtλ complex plane.

4
3

2nd Order
3rd Order

2
1
0
−1
−2
−3
−4
−4

−2

0

2

4

Figure 4.8 Contours where p (z) = 1, giving limits on the region of stability for ETD-T2 and
ETD-T3. The regions are plotted in the δtλ complex plane. Compare with Figure 4.7

125

Two properties make the spectral method ideal for this calculation: high spatial resolution
captures solution fronts, and computing derivatives is fast and easy. Details for the general
spectral method are given in Section 3.3. We ﬁrst wrote the code in C++ and parallelized
it using OpenMP. After we were satisﬁed with the stability and validity of the CPU version of the code, we wrote a corresponding code for a GPU using C++/CUDA. The Fourier
transforms were computed using cuFFT, a package similar to FFTW adapted for GPUs. We
used an adaptive time-stepping scheme based on error control, and a cubic domain with 128
grid points in each direction. The cubic domain is limited in the number of Fourier modes
in each dimension by the size of the GPU’s on-card memory. We performed our calculation
on an Nvidia M1060 which has 4 gigabytes of memory leading to the 128 mode limitation.

4.9.1

Application to a GPU

When writing algorithms for a graphics processing unit there are three important hardware
properties; massive parallelism, ﬁxed memory limit, and asynchronous scheduling. Algorithms that are eﬀective within these limitations can gain impressive speed as shown in the
previous examples (Sections 2.9 and 3.6.1).
Modern GPUs typically have thousands of streaming multiprocessors that can handle
thirty-two threads at a time. To be computationally eﬀective, this impressive parallelism
requires problems of suﬃcient size. If the computational requirements of the algorithm
are too small, multiprocessors will sit idle and lead to a decrease eﬃciency. On the other
hand, each GPU has limited on-card memory, so an eﬀective algorithm will not require large
amounts of storage. This on-card memory is limited to about 6GB of storage for current
GPUs, and any memory transfer onto and oﬀ of the card is much slower than the speed of
calculation, so it is eﬀectively a hard limit on the amount of usable memory.
126

Multiply/Add
FFT
IFFT

u
u

∆
∆u

u

u2 + u

∆u

u2 + u ∆u
Figure 4.9 Diagram showing how to implement a spectral method on a GPU to avoid synchronization, minimize memory use, and fully utilize the parallelism of the GPU.
Finally, graphics processing units gain speed by hiding latency. This is done by scheduling
with a ﬁrst-come-ﬁrst-served method for each of the streaming multiprocessors. Every time
an algorithm requires a synchronization of the threads, the scheduler must clear out and
restart, which slows down the computation. This is particularly relevant when computing
with a spectral method because synchronization must occur when switching between regions
of element-wise calculations and regions of Fourier transforms.
Figure 4.9 gives the process for computing u2 + u ∆u. This term does not appear in
either the CH or FCH equations, but it gives a simple working example to better explain
how the more complicated algorithms work. Beginning with two arrays, u in the spatial
domain and ∆ in the frequency domain, we alternate between FFT/IFFTs and elementwise operations. Note that by computing u2 + u at the last level we are able to perform
the calculation requiring storage of only one additional array. Further, when the domain
is relatively large, every streaming multiprocessor will be required giving full usage of the
massive parallelism.

127

4.9.1.1

GPU Speedup

For the ETD-RK2 method, we conducted a speed test similar to Section 2.9. The test
consisted of a very short simulation (approximately thirty time steps) computed with a
single graphics processing unit compared against a fully parallelized code running on eight
CPU processors. The hardware for the two codes was purchased at roughly the same time by
the High Performance Computing Center at Michigan State University, so the comparison is
fair. The same random initial data was used for each simulation, and the parameters were:
ε = 0.03, η1 = 5.0, η2 = 3.0, and τ = 0.6. We adjusted the number of grid points in the
domain by powers of two in each dimension keeping the shape as close to cubic as possible.
The CPU simulations were run on two quad-core Intel Xeon E5620 processors, with each
of the eight cores having 2.4 GHz of processing speed. The code was written in C++ and
aggressively parallelized with OpenMP using the FFTW library for the Fourier transforms.
The GPU simulation was on an Nvidia Tesla C2075 which has 448 cores with 1.15 GHz
processing speed. The code was written in C++ and CUDA using the cuFFT library for the
Fourier transforms.
Figure 4.10 shows the number of times faster that the GPU calculation ran compared
to the CPU calculation. As discussed before, the total computation speedup is lower than
the speed up per time-step due to the slow transfer of data to and from the GPU card. As
computation length increases the total computation speed up will approach the higher value.
Further, the size of the problem is also important. As the number of grid points increases,
so does the amount of computation between thread synchronizations on the GPU leading
to greater eﬃciency. The number of grid points for this computation cannot be increased
above 102 2 due to the 6GB memory limit on the card.

128

Times Faster Than CPU

20

15

10

5
Per Timestep
Total Time
0
15

16

17

18

19

20

21

22

Number of Points in Domain (log 2P)
Figure 4.10 Number of times faster the GPU computation is over an eight-core, fullyparallelized, CPU computation.
An aspect of supercomputing centers that is gaining notoriety is the issue of power
consumption [111]. For this speed comparison, the CPU calculation used eighty watts of
energy per processor for a total energy consumption of 160 W, while the GPU used 225 Watts.
Although the CPU used less power per unit time, the total power consumption was two to
nine times less for the GPU than the CPUs, because the GPU completed the calculation
many times faster. Energy eﬃciency is an additional advantage of GPU computing.

4.9.2

Adaptive Time-Stepping

In order to capture fast transient behaviors in the Cahn-Hilliard and Functionalized CahnHilliard simulations, we implemented a second order adaptive time stepping scheme with
the ETD-RK2 method. Unlike the pairs of methods described in Chapter 3, we do not have
a corresponding implicit scheme with matching error terms. Instead we use the trapezoidal

129

method to compare against,

un+1 = un +

δt
(F (un+1 ) + F (un )) .
2

(4.22)

It would be pointless to solve the second order implicit equation, so we take the solution
from ETD-RK2 at the new time, uETD-RK2 , substitute it into equation 4.22, and explicitly
compute an approximation to the solution from the trapezoidal method,

utrap = un +

δt
(F (uETD-RK2 ) + F (un )) .
2

(4.23)

Taking the diﬀerence of these two solutions at time tn+1 , we obtain an estimate, η, of
the exact single-step error, ηe ,

ηe ≈ η :=

1
u
− utrap .
2 ETD-RK2

(4.24)

This estimate is used to choose the size of the next time step, δtn+1 , with

δtn+1 = δtn max 0.8 3

σ
, 1.3 ,
η

where σ is the desired tolerance, and as before, 0.8 and 1.3 are safety factors.
The adaptive time-stepping is primarily important for the beginning stages of a simulation
where the solution rapidly relaxes onto the slow manifolds of evolution. It also becomes
important later in simulations where domains grow close together and rapid topological
changes occur. At all times, we restrict our time-step to δt ≤ 10−4 , for the reasons discussed
in Sections 4.7 and 5.4.1.

130

4.10

Numerical Examples for FCH

As with the other methods, we we used the high-performance solver to investigate important
geometries and structural evolutions that had connections to the analysis of the models.
Using the medusa head problem in two dimensions, we show that the ETD-RK2 method gives
proper convergence in time for important topological events. Second, we applied the threedimensional GPU solver to a geometry with a hollow sphere interacting with hoop shaped
pores. Third, we will show evolution of a punctured hollow sphere that has applications to
vesicle membranes in biology.

4.10.1

Medusa Head in 2D

The medusa head problem forced us to make some important changes to our early choice
of time-stepping method for the FCH equation. In Section 2.7, we initially presented the
medusa head in three dimensions where we used the convex splitting method to compute
time evolution. When we went back to show convergence in time for the precipitous drop
in energy, we discovered that when we cut the time step in half, the event happened in
approximately half the time. If this eﬀect was due to compounding error that drove the
system away from the unstable equilibrium, we would expect that halving the time step
would lengthen the time before the event.
We chose this geometry as a check against the undesirable behavior shown by the convex
splitting method. In Figure 4.11, we ran the simulation with the same initial condition and
ﬁxed time-steps of 2 · 10−4 , 10−4 , and 5 · 10−5 . The event did happen sooner for smaller time
step sizes, but diﬀerence was very minor. This suggests that our time evolution is precise,
and future work will address accuracy in time evolution compared against the asymptotic

131

analysis of the model. The calculation was done in MATLAB.

4.10.2

Punctured Hollow Spheres

An area of research in cell biological where the FCH model could prove useful is in the study
of lipid membranes. Cells use vesicles (hollow spheres) composed of lipid bilayer membranes
to transport nutrients and perform other vital functions. Recently, molecular dynamics
simulations have been used to study formation of these membranes [112, 113]. Using our
phase ﬁeld model, we studied the time evolution of a vesicle that had been punctured by
π
π
removing the section with azimuthal angle, φ, less than π , 16 , and 32 .
8

132

Figure 4.11 Time evolution of an unstable steady state solution to the FCH equation in two dimensions. Energy versus time is
shown for three identical simulations with diﬀerent ﬁxed time step sizes.

133

Figure 4.12 shows the time evolution of the punctured vesicle with φ ≤ π removed.
8
Initially, the opening receded and reduced the line-energy by thickening the edge. After, the
opening had stabilized, the hole slowly closed. The parameters for simulation were: ε = 0.03,
η1 = 5, η2 = 10, τ = 0.2, and the domain was [− π , π ] cubed. In non-dimensionalized
2 2
time, the closing times for the three punctured spheres was T = 10.9934, T = 0.9658, and
T = 0.03217 respectively, as measured by the sign change of the second derivative of the
energy decay from negative to positive.

4.11

Conclusions

The exponential time diﬀerencing schemes provide a good balance of accuracy and speed
when computing solutions to the Functionalized Cahn-Hilliard equation (1.10). The general
ETD scheme can be extended to higher-order time-stepping schemes by approximating the
nonlinear portion of the equation using both Runge-Kutta or Taylor expansions. Finding a
good linearization of the right hand side of the equation provides dividends in both stability
and accuracy of the calculation. The best linearization that we found was based on the
physics of the model by computing the linearization about the dominant phase u = −1.
We studied the stability and accuracy for the higher-order ETD-RK and ETD-Taylor
methods and found that in general the Runge-Kutta methods had better stability properties
and accuracy for the simple schemes. However, when we treated the ETD-Taylor methods
like predictor corrector schemes, it only took a few iterations to obtain a scheme hundreds of
times more accurate that either the ETD-RK or simple ETD-Taylor methods. Higher-order
ETD schemes rely on precise calculation of the φ-function to be stable and accurate.

134

T =0

T = 0.5

T = 10.0

Figure 4.12 Time evolution of a punctured vesicle.

135

T = 15.0

Adaptive time-stepping based on error estimates improved the speed of the calculations
for reasonable accuracy. We were able to use this combination of speed and accuracy to
capture accurate evolution of physically and biologically relevant structures. Certainly, exponential time diﬀerencing scheme are adequate for simulation of the FCH equation.

136

Chapter 5
Comparison of Methods for the FCH
Equation
To quantify the diﬀerences between the methods proposed in Chapters 2, 3, and 4, we
performed an experiment on the punctured hollow sphere geometry from Section 4.10.2
using the three-dimensional GPU codes given in Appendix ??. The initial condition used
π
was a sphere with angle φ ≤ 16 removed and evolved numerically until a stage after the

hole had closed, but the sphere had not yet become smooth. This gives a smooth geometry
on the manifold of evolution that will evolve quickly without any changes in topology. The
parameters for the test were: ε = 0.03, η1 = 5ε = 0.15, η2 = 10ε = 0.30, and τ = 0.20. We
compare the accuracy, computation speed, and time step restrictions for the three methods
computed on the GPU.

5.1

Description of Compared Schemes

The implementation of the convex splitting method applied the scheme from equation 2.5
with the splitting given in equations 2.12 and 2.13. The ﬁxed-point iteration uses the operator
deﬁned by equation 2.31 with c1 = c2 = 5.0. The error from this method is expected to be
O (δt), and the adaptive time-stepping constant is set at ct = 0.16 (see 2.41).
The implementation of the implicit method used the scheme described in Section 3.3.3
137

with the Jacobian matrix and associated preconditioned matrix given in equations 3.13 and
3.14. The error from this method is O (δt), and the adaptive time stepping tolerance was
set to σ = 10−4 .
The implementation of the exponential time diﬀerencing used the ETD-RK2 scheme
(4.11), with the linearization from Section 4.3. The error from this method is O δt2 . We
compute adaptive time stepping based on the diﬀerence between the solutions after both
stages at time t = tn+1 . This gives a fast ﬁrst-order adaptive time stepping that does not
require any extra computation.

5.2

Complete Simulation

For the ﬁrst comparison, we ran a simulation from T = 0 to T = 2 with each of the three
methods. We ran the simulations using the fully parallelized and optimized GPU code, and
we allowed completely adaptive time stepping. Results from the simulations are given in
Table 5.1. To obtain error estimates, we compared against a “true” solution computed using
the ETD-RK2 method with a ﬁxed time step size of δt = 10−6 . We also computed the
solution with a ﬁxed time step size δt = 10−5 so that we could be conﬁdent that this “true”
solution was accurate enough (i.e. the error was at least an order of magnitude smaller than
the comparison solutions).
The ﬁrst major result was that the solution from the convex splitting method did not
evolve enough to close the puncture as shown in Figure 5.1. This not only gave signiﬁcant
error, but it did not allow us to compare event times, and the calculated energy of the ﬁnal
solution was signiﬁcantly diﬀerent from the other two methods. Further, without the rapid
changes in geometry that accompany the event, the adaptive time stepping scheme did not

138

Convex Splitting Fully Implicit Exponential Integrator
Error
29.66
8.6836e-3
1.8772e-1
Maximum δt
1.8e-2
7.7e-4
1.0e-4
194
43172
4540
Computation Time (s)
Number of Time Steps
190
2997
21109
none
0.9654
0.9658
Event Time
Energy Drop
169.604
176.337
176.335
l2

Table 5.1 Comparison of the convex splitting, fully implicit, and exponential integrator
methods for a full simulation up to T = 2 with adaptive time stepping.
Convex Splitting

Implicit and ETD-RK2

Figure 5.1 Final states of the complete simulation comparing the convex splitting method to
the other two methods. The images show a two-dimensional slice of the three-dimensional
solutions
slow down for that portion of the simulation and the computation time was absurdly fast.
The error from each simulation was calculated with the l2 -norm, and it shows that the
implicit scheme was approximately twenty times more accurate than the ETD scheme. The
implicit scheme also took time steps up to seven times larger than the ETD scheme. The
big winner for the ETD method is that it was roughly ten times faster than the implicit
scheme, and it accurately predicted the geometric event and ended with a computed energy
very close to that of the implicit scheme.
The speed of the exponential time diﬀerencing was much faster than the other two methods. On average, the ETD scheme took 0.215 seconds per time step compared to 1.021

139

second and 14.405 seconds for the convex splitting and implicit schemes, respectively. This
explains why the ETD scheme can successfully take far more small, accurate time steps.
This speed diﬀerence is particularly important for the three-dimensional simulations. With
the ETD scheme, we can compute important geometries out to T = 300 or less in one week.
The same simulation with the convex splitting would likely miss the important dynamics,
and the implicit scheme would require more than two months.

5.3

Fixed Time-Step Simulation

For a second comparison of the three methods, we conducted ﬁxed time-step simulations for
δt = 4 · 10−4 , 2 · 10−4 , and 1 · 10−4 . For initial conditions, the simulation used the “true”
solution from above at time T = 1, and we computed the solution up to T = 2. This initial
geometry was shortly after the closing of the puncture and it evolved to resemble a hollow
sphere.
The results are collected in Table 5.2, and they give further evidence that the convex
splitting method has particularly bad evolution and thus poor accuracy. The implicit method
is roughly two orders of magnitude more accurate than the ETD method for the time-step
sizes used. The convex splitting scheme and the exponential integrator compute time steps
at roughly the same speed, with the implicit scheme taking ﬁve to ten times longer. It is
particularly interesting that when the step size halves (number of steps double), the amount
of computation time does not double. Some of this can be attributed to the overhead of slow
data transfer onto and oﬀ of the GPU card. However, the implicit scheme actually becomes
more eﬃcient for each time-step when the steps are small. This is because the initial guess
for the Newton solve is much closer when the time-step is small, and it therefore uses fewer

140

Convex Splitting
Fully Implicit
δt
Error
Time
Error
Time
4e-4 5.3619
603
1.2657e-3 6968
2e-4 4.1915
1016
6.1597e-4 8096
1941
2.9146e-4 11973
1e-4 2.6570

Exponential Integrator
Error
Time
6.1249e-1
751
2.9415e-1
1126
1.0623e-1
2159

Table 5.2 Comparison of error and simulation time for ﬁxed time-step simulations of the
FCH equation. Error calculations use the l2 norm, and time is given in seconds.
Newton iterations and far fewer preconditioned conjugate gradient iterations.

5.4

Time Step Size Restrictions

Although we have focused on time stepping methods that balance large time steps with
accurate solutions and reasonable computation speeds, we have discovered that there are
soft limits on the size of time steps for each of the three types of methods. Both the convex
splitting method and the exponential time diﬀerencing give incorrect evolution for time steps
that are too large. The implicit scheme is limited by the ability to complete the Newton
convergence and the error control. In this section, we discuss the causes for these limitations
and how to adjust for them.

5.4.1

Solution Freeze Out for Convex Splitting

In our work with Eyre’s convex splitting method, we discovered that some numerical methods
for the Cahn-Hilliard equation suﬀer from a behavior we call solution freeze out. This
behavior occurs when doubling the time step size does not give twice the evolution of the
solution, and the movement of mass with respect to time vanishes. This can happen when
the higher frequency modes in the solution are over damped, causing the mass in the phases
to have a limit on the distance evolved in a given time step.

141

T = 2 · 10−2

T = 4 · 10−2

T = 6 · 10−2

T = 8 · 10−2

T = 10−1

Figure 5.2 Time evolution showing freeze out of solution with larger time step sizes. For
rows one through three δt = 10−2 , 10−3 , and 10−4 respectively.
We ﬁrst noticed this behavior when performing a reﬁnement study on an unstable equilibrium initial condition. When we increased the time step size by a factor of ten, the critical
event happened nearly ten times later in the evolution. This was surprising because we expected any increase in error to drive the geometry away from the equilibrium point more
quickly. Figure 5.2 shows this type of evolution for time step sizes δt = 10−2 , 10−3 , and10−4
in the coarsening stage of evolution for the CH equation. The coarsening of the solution
should occur near T = 2 · 10−2 , but is not captured for the largest time step size. Further,
when δt = 10−3 , the freeze out leads to incorrect early topological changes which give a
diﬀerent solution at the ﬁnal time. This freeze out behavior was also observed by Gomez
and Hughes [55].

142

5.4.2

Solution Freeze Out for Exponential Time Diﬀerencing

The ﬁrst order exponential integrator also experiences freeze out. This causes the decrease
in accuracy leading to loss of convergence order which can be seen in Figures 4.1 and 4.2
as δt approaches 10−4 . The question becomes, at which time step size do we stop trusting
the evolution of the solution? The obvious answer is to look at the order of convergence.
Table 5.3 shows the time reﬁnement study that gives the order of convergence for a given
ﬁxed δt used in the ﬁrst order ETD scheme (4.3). For any δt

10−4 the method ceases to

be ﬁrst order in time, and it will give incorrect evolution. The study was performed on a
solution of Cahn-Hilliard equation that was far beyond the spinodal stage, but this behavior
is consistent with long-time simulations of the FCH equation.
δt
Order
1.00E-3 -0.1754
5.00E-4 0.4640
2.50E-4 0.8845
1.25E-4 1.0263
6.25E-5 1.0433
3.12E-5 1.0354
1.56E-5 1.0266
Table 5.3 Order of convergence at given time step sizes.
More than just a reﬁnement study tells us why δt ≈ 10−4 is the appropriate cutoﬀ. If
we look back at equation 4.3, we notice that all of the time evolution is embedded in the
exponential operators. Freeze out happens when doubling the time step size results in less
than twice the evolution. In Figure 5.3, we plot the amount of evolution versus time step
size and wave number (as in 4.4). We deﬁne amount of evolution as the fractional change in

143

the solution divided by the fractional change in time. For the ﬁrst order ETD method,

amount of evolution =

φ1 (2dtL)
−1
φ1 (dtL)
2dt − 1
dt

+ 1.

As timestep size grows, note that the freeze out begins initially with high-frequency wave
numbers. This behavior brings stability to the method, but when δt gets too large it starts
damping out physically relevant wave numbers. We identify physically relevant wave numbers
by the energy spectrum of the solution (Figure 5.4). For the geometry and length scales we
compute, a typical simulation must capture |k|
requires δt

40, and from Figure 5.3, we see that this

10−4 .

Switching now to the FCH equation, Figure 5.5 shows that the change of the linear
operator increases the freeze out behavior, particularly for the convex splitting method. In
two dimensions, the FCH equation requires accuracy for essentially the same wave numbers
as the CH equation due to geometric similarities. Thus for the ETD schemes, we limit time
steps to a maximum of dt = 10−4 any time that the adaptive time stepping scheme estimates
a time step larger than that. For the convex splitting scheme, the same limitation would
require time steps no larger than δt = 10−7 which greatly reduces its feasibility.

5.4.3

Implicit Time-step Size Restriction

The implicit scheme has a more standard time-step size restriction. Due to the fourthorder nonlinearity in the Functionalized Cahn-Hilliard equation, Newton’s method will not
converge if the initial guess is not close enough. We have two reasonable choices for the
initial guess; the solution at the last computed time step, or the solution of a forward Euler
step. Preconditioning the conjugate gradient iteration has no eﬀect on the time-step size
144

(a) Exponential Integrator

(b) Convex Splitting
Figure 5.3 Freeze out of operators for the two methods applied to the Cahn-Hilliard equation
showing the amount of evolution when doubling the time step size. Note that both methods
suﬀer from similar freeze out behavior.

145

(a) Typical Solution to the CH Equation

(b) Energy Spectrum

Figure 5.4 The energy spectrum of solutions to the Cahn-Hilliard equation is dominated by
low wave numbers.
restriction.
When using the last solution as the initial guess, the larger the step in time, the farther
away the initial guess is from solution at the next time. This gives a time-step restriction on
the order of 10−6 for the FCH equation with ε = 0.03. On the other hand, using a forward
Euler step to initialize the implicit solve also has a limitation. The Euler step should capture
the evolution and make the initial guess closer, but the explicit scheme is unstable for large
time-steps and introduces spurious oscillations. This drives the forward Euler guess away
from being a good initializer. It turns out that the forward Euler step is better in most cases
and provides a time-step restriction on the order of 10−4 when ε = 0.03.

5.5

Conclusions

In conclusion, the exponential time diﬀerencing methods give the best combination of speed
and accuracy, both of which are important in computing long time evolution for solutions
146

(a) Exponential Integrator

(b) Convex Splitting
Figure 5.5 Freeze out of operators for the two methods applied to the Functionalized CahnHilliard equation showing the amount of evolution when doubling the time step size. Note
that the convex splitting method suﬀers from much stronger freeze out behavior.

147

of the Functionalized Cahn-Hilliard equation in three dimensions. In combination with a
spectral solution in space, application of the ETD method to a graphics processing unit
greatly enhances the speed of the simulation and allows us to compute up to T = 300 in
a week or less which is a great improvement over the fully implicit scheme. Although the
ETD methods suﬀer from freeze out eﬀects, the restriction is much less stringent than the
convex splitting method. This gives time accuracy when the code enforces a hard limit on
the size of the time-step. The ETD method captures relevant geometric evolution better
than convex splitting and much faster than a fully implicit scheme.
The Functionalized Cahn-Hilliard equation can be eﬀectively split using the convex splitting method proposed by Eyre [34, 46]. The scheme is very eﬃcient computationally and
performs its implicit iterations on a timescale much like an explicit method. A convex splitting scheme could be very eﬀective in any calculation where the solution at ﬁnal time is
important, but the evolution that arrives at that solution is unneeded. The primary drawback is the lack of accuracy when time-steps are restrictively small. The convex splitting is
also very diﬃcult to adapt to new potential wells or vector phase problems which will be
necessary when connecting with ab initio simulations and experimental work.
Lastly, the fully implicit scheme is the most accurate by at least an order of magnitude.
If very precise evolution is required an implicit scheme is the proper choice. The cost of using
the implicit scheme is its slow computation time. Preconditioning the Newton solve with
a physics-based preconditioner does not alleviate the small time-step sizes required for the
iterative solve to converge, but it does reduce the number of conjugate gradient iterations
by up to a factor of one hundred and cuts the computation time similarly.

148

5.6

Future Work

As with any interesting research, there are a wide variety of directions we could take in the
future. An interesting extension would be to improve the model to more than two phases.
Some of the analysis has already been performed for three phases, and the new energy
formulation comparable to equation 1.6 is

E (u) =

1 2
ε ∆U − W
Ω2

2

−u2 
U + εH0 

u1

− ε η1

ε2
2

U

2

+ η2 W U

dΩ. (5.1)

U = (u1 , u2 )T deﬁnes the fraction of polymer phases one and two, and the fraction of solvent
is us = 1−u1 −u2 so that the total amount is identically one. Further, the εH0 term allows us
to introduce intrinsic curvature and have greater control when modeling important biological
systems such as endocytosis and exocytosis.
An important aspect of future work could be the development of better ties to the physics
through a continuum mechanics description. Currently, there is evidence from experiments
and some asymptotic analysis that suggests that the model is reasonable, but it would vastly
improve the value of the FCH model if we could identify what types of physical systems
connect well with the model’s results. With this development, we would be able to identify
the physical constants that each of the four coeﬃcients in the FCH energy, namely ε, η1 ,
η2 , and τ . We would also need to obtain an accurate potential well, W (u) for each speciﬁc
system.
A more physics based application of the model is comparison with small angle scattering
either with x-rays (SAXS) or neutrons (SANS). We have put some eﬀort into this connection
already, but the extension of the numerical to SAXS comparison could lead to an inverse

149

problem to ﬁnd the four coeﬃcients in the FCH energy. With some good fortune, this inverse
problem could assist engineers and chemists in creating better membranes for PEM fuel cells,
solid state Dye-Sensitized solar cells (DSSC), or polymer gel electrolyte batteries.
Finally, one of the widest areas of future research would be the generation of better
numerical methods for handling diﬀerent boundary conditions and larger domains. With
the ability to compute solutions on larger domains with any type of boundary condition, we
could simulate the entire pore network formation of Naﬁon or other materials from lab-like
conditions. There would no longer be the restrictions of simulating inside bulk material
with uniform water concentrations. The likely candidate for these changes is high order
discontinuous Galerkin methods with global domain decomposition.
Although not exhaustive, this list gives a view of the opportunities for numerical computation that exist with the Functionalized Cahn-Hilliard model. The methods and implementations described deliver signiﬁcant results, and valuable knowledge will come from the
continuation of this research.

150

APPENDICES

151

Appendix A: Convex Splitting Code
in CUDA
1
2
3
4

//
// spe6NL3Dcuda6 . cu
//
// This code was w r i t t e n by J a y l a n Jones t o approximate s o l u t i o n s
to the
5 // F u n c t i o n a l i z e d Cahn−H i l l i a r d Equation u s i n g th e convex
splitting
6 // scheme p a t t e r n e d a f t e r t he work o f Eyre .
7 //
8
9 #i n c l u d e <i o s t r e a m >
10 #i n c l u d e <f s t r e a m >
11 #i n c l u d e <sstream>
12 #i n c l u d e <cuda . h>
13 #i n c l u d e < c u f f t . h>
14 #i n c l u d e <time . h>
15
16 #d e f i n e PI 3 . 1 4 1 5 9 2 6 5 3 5 8 9
17 #d e f i n e REAL double
18 #d e f i n e COMPLEX cufftDoubleComplex
19 #d e f i n e TYPE CUFFT Z2Z
20 #d e f i n e EXEC c u f f t E x e c Z 2 Z
21
22 u s i n g namespace s t d ;
23
24 //
25 // This s e r i e s o f k e r n e l s a r e a l l c a l l e d i n th e main f u n c t i o n , and
they
26 // break up th e c o m p u t a t i o n a l work i n t o p i e c e s t h a t have no data
27 // dependence on each o t h e r .
28 //
29
30 //
152

31 // b u i l d s h p c o n s t r u c t s t h e approximate i n v e r s e t o th e F r e c h e t
derivative
32 // used i n t he i t e r a t i v e p r o c e s s t h a t s o l v e s t h e i m p l i c i t p o r t i o n
of
33 // t h e s p l i t t i n g .
34 //
35
36
37
global
v o i d b u i l d s h p (REAL ∗ shp , REAL ∗Lap , i n t n , REAL alpha ,
REAL eta2 , REAL taub , REAL c1 , REAL c2 )
38 {
39
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
40
i f ( idx < n)
41
shp [ i d x ] = −Lap [ i d x ] ∗ ( alpha ∗ alpha ∗Lap [ i d x ] ∗ ( alpha ∗ alpha ∗Lap [
i d x ] − c2 ) + c1 + 1 − 0 . 5 ∗ taub ∗ taub + alpha ∗ e t a 2 ) ;
42 }
43
44 //
45 // A l l t h e numbered k e r n e l s a r e c a l l e d i n main
46 //
47
48
global
v o i d kern0 ( i n t n , COMPLEX ∗U, COMPLEX ∗R, REAL evap ,
REAL dt )
49 {
50
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
51
i f ( idx < n)
52
{
53
U[ i d x ] . x = R[ i d x ] . x − dt ∗ evap ∗ (R[ i d x ] . x + 1 ) / 2 ;
54
U[ i d x ] . y = 0 . 0 ;
55
}
56 }
57
58
global
v o i d kern1 ( i n t n , REAL alpha , REAL eta1 , REAL eta2 ,
REAL taub , REAL taus , COMPLEX ∗U, COMPLEX ∗F1U , COMPLEX ∗F2U ,
COMPLEX ∗F3R , COMPLEX ∗F4R , COMPLEX ∗F5R)
59 {
60
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
61
i f ( idx < n)
62
{
63
F1U [ i d x ] . x = 2 . 5 ∗ taub ∗pow (U[ i d x ] . x , 4 ) + ( 0 . 5 ∗ taub ∗ taub − 4 −
a lp h a ∗ e t a 2 ) ∗pow (U[ i d x ] . x , 3 ) ;
64
F1U [ i d x ] . y = 0 . 0 ;
65
66
F2U [ i d x ] . x = ( 2 + alpha ∗ e t a 1 − 0 . 5 ∗ taub ∗U[ i d x ] . x ) ∗U[ i d x ] . x ;
67
F2U [ i d x ] . y = 0 . 0 ;
153

68
69
70
71
72
73
74
75

F3R [ i d x ] . x = pow (U[ i d x ] . x , 3 ) ;
F3R [ i d x ] . y = 0 . 0 ;
F4R [ i d x ] . x = (3∗U[ i d x ] . x + taub ) ∗U[ i d x ] . x ;
F4R [ i d x ] . y = 0 . 0 ;
F5R [ i d x ] . x = 3∗pow (U[ i d x ] . x , 5 ) − 0 . 5 ∗ ( taub ∗6 + alpha ∗ e t a 2 ∗ t a u s
) ∗pow (U[ i d x ] . x , 2 ) ;
F5R [ i d x ] . y = 0 . 0 ;

76
77
}
78 }
79
80
global
v o i d kern2 ( i n t n , COMPLEX ∗U, COMPLEX ∗LapR , REAL ∗Lap )
81 {
82
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
83
i f ( idx < n)
84
{
85
LapR [ i d x ] . x = Lap [ i d x ] ∗U[ i d x ] . x ;
86
LapR [ i d x ] . y = Lap [ i d x ] ∗U[ i d x ] . y ;
87
}
88 }
89
90
global
v o i d kernDel ( i n t n , COMPLEX ∗U, COMPLEX ∗DelxR , REAL ∗
Delx , COMPLEX ∗DelyR , REAL ∗ Dely , COMPLEX ∗DelzR , REAL ∗ Delz )
91 {
92
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
93
i f ( idx < n)
94
{
95
DelxR [ i d x ] . x = −Delx [ i d x ] ∗U[ i d x ] . y ;
96
DelxR [ i d x ] . y = Delx [ i d x ] ∗U[ i d x ] . x ;
97
98
DelyR [ i d x ] . x = −Dely [ i d x ] ∗U[ i d x ] . y ;
99
DelyR [ i d x ] . y = Dely [ i d x ] ∗U[ i d x ] . x ;
100
101
DelzR [ i d x ] . x = −Delz [ i d x ] ∗U[ i d x ] . y ;
102
DelzR [ i d x ] . y = Delz [ i d x ] ∗U[ i d x ] . x ;
103
}
104 }
105
106
global
v o i d kern3 ( i n t n , REAL ∗Lap , REAL alpha , COMPLEX ∗expR ,
COMPLEX ∗F1U , COMPLEX ∗F2U , COMPLEX ∗F4RLapR , COMPLEX ∗F4R ,
COMPLEX ∗LapR )
107 {
108
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
154

109
110
111
112

i f ( idx < n)
{
expR [ i d x ] . x = −Lap [ i d x ] ∗ ( alpha ∗ alpha ∗F2U [ i d x ] . x∗Lap [ i d x ] + F1U
[ idx ] . x) ;
expR [ i d x ] . y = −Lap [ i d x ] ∗ ( alpha ∗ alpha ∗F2U [ i d x ] . y∗Lap [ i d x ] + F1U
[ idx ] . y) ;

113
114
F4RLapR [ i d x ] . x = F4R [ i d x ] . x∗LapR [ i d x ] . x/n ;
115
F4RLapR [ i d x ] . y = 0 . 0 ;
116
}
117 }
118
119 //
120 // kernE computes t he FCH e n e r g y from th e s o l u t i o n a t t he c u r r e n t
time
121 //
122
123
global
v o i d kernE ( i n t n , REAL alpha , REAL eta1 , REAL eta2 ,
REAL taub , REAL taus , COMPLEX ∗LapR , COMPLEX ∗R, COMPLEX ∗DelxR
, COMPLEX ∗DelyR , COMPLEX ∗DelzR , REAL ∗ Energy )
124 {
125
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
126
i f ( idx < n)
127
{
128
Energy [ i d x ] = 0 . 5 ∗ ( alpha ∗ alpha ∗LapR [ i d x ] . x/n − (R[ i d x ] . x∗R[ i d x
] . x−1) ∗(R[ i d x ] . x+taub / 2) ) ∗( alpha ∗ alpha ∗LapR [ i d x ] . x/n − (R[ i d x ] .
x∗R[ i d x ] . x−1) ∗(R[ i d x ] . x+taub / 2) ) − alpha ∗ 0 . 5 ∗ ( alpha ∗ alpha ∗ e t a 1
∗( DelxR [ i d x ] . x/n∗DelxR [ i d x ] . x/n + DelyR [ i d x ] . x/n∗DelyR [ i d x ] . x/n
+ DelzR [ i d x ] . x/n∗DelzR [ i d x ] . x/n ) + e t a 2 ∗(R[ i d x ] . x+1) ∗(R[ i d x ] . x
+1) ∗ ( 0 . 5 ∗ (R[ i d x ] . x−1) ∗(R[ i d x ] . x−1)+t a u s /3∗(R[ i d x ] . x−2) ) ) ;
129
}
130 }
131
132
global
v o i d kern4 ( i n t n , COMPLEX ∗F4RLapR , COMPLEX ∗F4R ,
COMPLEX ∗LapR )
133 {
134
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
135
i f ( idx < n)
136
{
137
F4RLapR [ i d x ] . x = F4R [ i d x ] . x∗LapR [ i d x ] . x/n ;
138
F4RLapR [ i d x ] . y = 0 . 0 ;
139
140
}
141 }
142
155

143

global
v o i d kern5 ( i n t n , REAL ∗Lap , REAL alpha , REAL eta2 ,
REAL taub , COMPLEX ∗ hr , COMPLEX ∗expR , COMPLEX ∗F3R , COMPLEX ∗
F4RLapR , COMPLEX ∗U, COMPLEX ∗F5R)

144 {
145
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
146
i f ( idx < n)
147
{
148
hr [ i d x ] . x = expR [ i d x ] . x − Lap [ i d x ] ∗ ( alpha ∗ alpha ∗Lap [ i d x ] ∗ (
a lp h a ∗ alpha ∗Lap [ i d x ] ∗U[ i d x ] . x−F3R [ i d x ] . x ) − alpha ∗ alpha ∗F4RLapR
[ i d x ] . x + (1 −0.5∗ taub ∗ taub+alpha ∗ e t a 2 ) ∗U[ i d x ] . x + F5R [ i d x ] . x ) ;
149
hr [ i d x ] . y = expR [ i d x ] . y − Lap [ i d x ] ∗ ( alpha ∗ alpha ∗Lap [ i d x ] ∗ (
a lp h a ∗ alpha ∗Lap [ i d x ] ∗U[ i d x ] . y−F3R [ i d x ] . y ) − alpha ∗ alpha ∗F4RLapR
[ i d x ] . y + (1 −0.5∗ taub ∗ taub+alpha ∗ e t a 2 ) ∗U[ i d x ] . y + F5R [ i d x ] . y ) ;
150
}
151 }
152
153 //
154 // warpReduce , reduce , warpReduceSum , and reduceSum p a r a l l e l i z e
t h e r e d u c t i o n s n e c e s s a r y t o c a l c u l a t e e n e r g y and change i n
solution
155 //
156
157
device
v o i d warpReduce ( v o l a t i l e REAL ∗ sdata , u n s i g n e d i n t t i d ,
int blockSize )
158 {
159
i f ( b l o c k S i z e >= 6 4) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 3 2 ] ) ) ;
160
i f ( b l o c k S i z e >= 3 2) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 1 6 ] ) ) ;
161
i f ( b l o c k S i z e >= 1 6) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 8 ] ) ) ;
162
i f ( b l o c k S i z e >= 8 ) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 4 ] ) ) ;
163
i f ( b l o c k S i z e >= 4 ) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 2 ] ) ) ;
164
i f ( b l o c k S i z e >= 2 ) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 1 ] ) ) ;
165 }
166
167
global
v o i d r e d u c e (COMPLEX ∗ g i d a t a , COMPLEX ∗ g odata , i n t n ,
int blockSize )
168 {
169
extern
shared
REAL s d a t a [ ] ;
170
unsigned i n t t i d = threadIdx . x ;
171
u n s i g n e d i n t i = b l o c k I d x . x ∗( b l o c k S i z e ∗2) + t i d ;
156

172
173
174
175
176
177
178
179
180

u n s i g n e d i n t g r i d S i z e = b l o c k S i z e ∗2∗ gridDim . x ;
sdata [ t i d ] = 0;
w h i l e ( i < n ) { s d a t a [ t i d ] = max( f a b s ( g i d a t a [ i ] . x ) , f a b s ( g i d a t a [
i+b l o c k S i z e ] . x ) ) ; i += g r i d S i z e ; }
syncthreads () ;
i f ( blockSize
sdata [ t i d ] ) ,
i f ( blockSize
sdata [ t i d ] ) ,
i f ( blockSize
sdata [ t i d ] ) ,
i f ( t i d < 3 2)
i f ( t i d == 0 )

>= 5 12) { i f ( t i d < 256) { s d a t a [ t i d ] = max( f a b s (
fabs ( sdata [ t i d + 256]) ) ; }
syncthreads () ; }
>= 2 56) { i f ( t i d < 128) { s d a t a [ t i d ] = max( f a b s (
fabs ( sdata [ t i d + 128]) ) ; }
syncthreads () ; }
>= 1 28) { i f ( t i d < 6 4) { s d a t a [ t i d ] = max( f a b s (
fabs ( sdata [ t i d + 6 4 ] ) ) ; }
syncthreads () ; }
warpReduce ( sdata , t i d , b l o c k S i z e ) ;
g odata [ blockIdx . x ] . x = sdata [ 0 ] ;

181
182
183 }
184
185
device
v o i d warpReduceSum ( v o l a t i l e REAL ∗ sdata , u n s i g n e d i n t
tid , int blockSize )
186 {
187
i f ( b l o c k S i z e >= 6 4) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 3 2 ] ;
188
i f ( b l o c k S i z e >= 3 2) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 1 6 ] ;
189
i f ( b l o c k S i z e >= 1 6) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 8 ] ;
190
i f ( b l o c k S i z e >= 8 ) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 4 ] ;
191
i f ( b l o c k S i z e >= 4 ) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 2 ] ;
192
i f ( b l o c k S i z e >= 2 ) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 1 ] ;
193 }
194
195
global
v o i d reduceSum (REAL ∗ g i d a t a , REAL ∗ g odata , i n t n , i n t
blockSize )
196 {
197
extern
shared
REAL s d a t a [ ] ;
198
unsigned i n t t i d = threadIdx . x ;
199
u n s i g n e d i n t i = b l o c k I d x . x ∗( b l o c k S i z e ∗2) + t i d ;
200
u n s i g n e d i n t g r i d S i z e = b l o c k S i z e ∗2∗ gridDim . x ;
201
sdata [ t i d ] = 0;
202
203
w h i l e ( i < n ) { s d a t a [ t i d ] = g i d a t a [ i ] + g i d a t a [ i+b l o c k S i z e ] ; i
+= g r i d S i z e ; }
204
syncthreads () ;
205
206
i f ( b l o c k S i z e >= 5 12) { i f ( t i d < 256) { s d a t a [ t i d ] = s d a t a [ t i d ]
+ sdata [ t i d + 2 5 6 ] ; }
syncthreads () ; }
207
i f ( b l o c k S i z e >= 2 56) { i f ( t i d < 128) { s d a t a [ t i d ] = s d a t a [ t i d ]
+ sdata [ t i d + 1 2 8 ] ; }
syncthreads () ; }
157

208

if
+
if
if

( b l o c k S i z e >= 1 28) { i f ( t i d < 6 4) { s d a t a [ t i d ] = s d a t a [ t i d ]
sdata [ t i d + 6 4 ] ; }
syncthreads () ; }
( t i d < 3 2) warpReduceSum ( sdata , t i d , b l o c k S i z e ) ;
( t i d == 0 ) g o d a t a [ b l o c k I d x . x ] = s d a t a [ 0 ] ;

209
210
211 }
212
213
global
v o i d kern6 ( i n t n , REAL dt , COMPLEX ∗hp , REAL ∗ shp ,
COMPLEX ∗expR , COMPLEX ∗U)
214 {
215
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
216
i f ( idx < n)
217
{
218
hp [ i d x ] . x = 1/ dt + shp [ i d x ] ;
219
220
expR [ i d x ] . x = −U[ i d x ] . x/ dt + expR [ i d x ] . x ;
221
expR [ i d x ] . y = −U[ i d x ] . y/ dt + expR [ i d x ] . y ;
222
}
223 }
224
225
global
v o i d kern7 ( i n t n , REAL alpha , REAL eta2 , REAL taub ,
REAL taus , COMPLEX ∗R, COMPLEX ∗F3R , COMPLEX ∗F4R , COMPLEX ∗F5R
)
226 {
227
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
228
i f ( idx < n)
229
{
230
F3R [ i d x ] . x = pow (R[ i d x ] . x , 3 ) ;
231
F3R [ i d x ] . y = 0 . 0 ;
232
233
F4R [ i d x ] . x = (3∗R[ i d x ] . x + taub ) ∗R[ i d x ] . x ;
234
F4R [ i d x ] . y = 0 . 0 ;
235
236
F5R [ i d x ] . x = 3∗pow (R[ i d x ] . x , 5 ) − 0 . 5 ∗ ( taub ∗6 + alpha ∗ e t a 2 ∗ t a u s
) ∗pow (R[ i d x ] . x , 2 ) ;
237
F5R [ i d x ] . y = 0 . 0 ;
238
}
239 }
240
241
global
v o i d kern8 ( i n t n , REAL dt , REAL alpha , REAL eta2 , REAL
taub , REAL ∗Lap , COMPLEX ∗ change , COMPLEX ∗expR , COMPLEX ∗R,
COMPLEX ∗F3R , COMPLEX ∗F4RLapR , COMPLEX ∗F5R , COMPLEX ∗hp )
242 {
243
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
244
i f ( idx < n)
245
{
158

change [ i d x ] . x = expR [ i d x ] . x + R[ i d x ] . x/ dt − Lap [ i d x ] ∗ ( alpha ∗
a lp h a ∗Lap [ i d x ] ∗ ( alpha ∗ alpha ∗Lap [ i d x ] ∗R[ i d x ] . x−F3R [ i d x ] . x ) −
a lp h a ∗ alpha ∗F4RLapR [ i d x ] . x + (1 −0.5∗ taub ∗ taub+alpha ∗ e t a 2 ) ∗R[ i d x
] . x + F5R [ i d x ] . x ) ;
change [ i d x ] . y = expR [ i d x ] . y + R[ i d x ] . y/ dt − Lap [ i d x ] ∗ ( alpha ∗
a lp h a ∗Lap [ i d x ] ∗ ( alpha ∗ alpha ∗Lap [ i d x ] ∗R[ i d x ] . y−F3R [ i d x ] . y ) −
a lp h a ∗ alpha ∗F4RLapR [ i d x ] . y + (1 −0.5∗ taub ∗ taub+alpha ∗ e t a 2 ) ∗R[ i d x
] . y + F5R [ i d x ] . y ) ;

246

247

248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265

change [ i d x ] . x = −change [ i d x ] . x/hp [ i d x ] . x ;
change [ i d x ] . y = −change [ i d x ] . y/hp [ i d x ] . x ;
}
}
global

v o i d kern9 ( i n t n , COMPLEX ∗R, COMPLEX ∗ change )

{
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
i f ( idx < n)
{
R[ i d x ] . x = (R[ i d x ] . x + change [ i d x ] . x ) /n ;
R[ i d x ] . y = 0 . 0 ;
}
}

//
// The node s t r u c t u r e b u i l d s a l i n k e d l i s t t h a t c o l l e c t s e n e r g y
and time data
266 //
267
268 s t r u c t node
269 {
270
REAL time ;
271
REAL e n e r g y ;
272
s t r u c t node ∗ next ;
273 } ;
274
275 v o i d addNode ( s t r u c t node∗& t a i l , REAL time )
276 {
277
s t r u c t node ∗ newNode = new node ;
278
newNode−>time = time ;
279
newNode−>e n e r g y = 0 . 0 ;
280
newNode−>next = NULL;
281
t a i l −>next = newNode ;
282
t a i l = newNode ;
283 }
159

284
285 //
286 // This i s where t h e main f u n c t i o n b e g i n s . To make s e n s e o f t h e
code , b e g i n h e r e .
287 //
288
289 i n t main ( i n t argc , ch ar ∗ c o n s t argv [ ] ) {
290
i f ( a r g c <=1 )
291
{
292
c o u t << ” Usage : ” << argv [ 0 ] << ” <f i l e n a m e > <output o f f /
i n c r e a s i n g / even ( 0 / 1 / 2 )>” << e n d l ;
293
exit (1) ;
294
}
295
296
c l o c k t tic , toc ;
297
tic = clock () ;
298
//
299
// I n i t i a l i z e n e c e s s a r y c o n s t a n t s
300
//
301
cout . p r e c i s i o n (10) ;
302
int i , j , k ;
303
fstream myfile , myfile2 , myfile3 , myfile4 ;
304
s t r i n g s t r e a m sstm ;
305
char datain [ 5 0 ] ;
306
307
REAL alpha , eta1 , eta2 , time , tmax , dt , t o l , Lx , Ly , Lz , c1 , c2 ,
ct , taub , taus , evap , t p i c s t e p ;
308
i n t m, n , Nx , Ny , Nz , Mx, My, Mz, b l o c k s i z e ;
309 COMPLEX ∗max , ∗max h , ∗ r e s ;
310
311
max h = new COMPLEX;
312
r e s = new COMPLEX;
313
314
s t r u c t node ∗ tEhead = NULL;
315
s t r u c t node ∗ t E t a i l = NULL;
316
317
tEhead = new node ;
318
tEhead−>time = 0 . 0 ;
319
tEhead−>e n e r g y = 0 . 0 ;
320
t E t a i l = tEhead ;
321
322
string f i l e i n , fileUout , filetEout ;
323
324
//

160

325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366

// Read i n a l l t h e n e c e s s a r y data from th e f i l e g i v e n by th e
user
//
m y f i l e 3 . open ( argv [ 1 ] , i o s : : i n ) ;
i f ( myfile3 . is open () )
{
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
f i l e i n = datain ;
c o u t << ” I n i t cond f i l e = ” << f i l e i n << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
fileUout = datain ;
c o u t << ” Write out s o l u t i o n f i l e = ” << f i l e U o u t << ” . dat ” <<
endl ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
f i l e t E o u t = datain ;
c o u t << ” Write out time and e n e r g y f i l e = ” << f i l e t E o u t << ” .
dat ” << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
a l pha = a t o f ( d a t a i n ) ;
c o u t << ” alpha = ” << alpha << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
taub = a t o f ( d a t a i n ) ;
c o u t << ” taub = ” << taub << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
taus = atof ( datain ) ;
c o u t << ” t a u s = ” << t a u s << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
eta1 = atof ( datain ) ;
c o u t << ” e t a 1 = ” << e t a 1 << e n d l ;

161

367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411

m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
eta2 = atof ( datain ) ;
c o u t << ” e t a 2 = ” << e t a 2 << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
evap = a t o f ( d a t a i n ) ;
c o u t << ” evap = ” << evap << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
tmax = a t o f ( d a t a i n ) ;
c o u t << ”tmax = ” << tmax << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
dt = a t o f ( d a t a i n ) ;
c o u t << ” dt = ” << dt << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
tpicstep = atof ( datain ) ;
c o u t << ” t p i c s t e p = ” << t p i c s t e p << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Nx = a t o i ( d a t a i n ) ;
c o u t << ”Nx = ” << Nx << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Ny = a t o i ( d a t a i n ) ;
c o u t << ”Ny = ” << Ny << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Nz = a t o i ( d a t a i n ) ;
c o u t << ”Nz = ” << Nz << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Lx = a t o f ( d a t a i n ) ;
c o u t << ”Lx = ” << Lx << e n d l ;

162

412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456

m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Ly = a t o f ( d a t a i n ) ;
c o u t << ”Ly = ” << Ly << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Lz = a t o f ( d a t a i n ) ;
c o u t << ”Lz = ” << Lz << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
c1 = a t o f ( d a t a i n ) ;
c o u t << ” c1 = ” << c1 << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
c2 = a t o f ( d a t a i n ) ;
c o u t << ” c2 = ” << c2 << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
ct = atof ( datain ) ;
c o u t << ” c t = ” << c t << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
t o l = atof ( datain ) ;
c o u t << ” t o l = ” << t o l << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
b l o c k s i z e = atoi ( datain ) ;
c o u t << ” Block s i z e = ” << b l o c k s i z e << e n d l ;
c o u t << ” Grid s i z e = ” << Nx∗Ny∗Nz/ b l o c k s i z e << e n d l ;
}
e l s e { cout << ” F i l e c o u l d not be re ad . ” << e n d l ; }
time = 0 . 0 ;
REAL t P i c = 1 . 0 ;
i f (∗ argv [ 2 ] == ’ 2 ’ ) { t P i c = t p i c s t e p ; }
Mx = Nx / 2 ;
My = Ny / 2 ;
163

457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500

Mz = Nz / 2 ;
n = Nx∗Ny∗Nz ;
dim3 dimBlock ( b l o c k s i z e ) ;
dim3 dimGrid ( n/ dimBlock . x ) ;
//
// B u i l d t he l a p l a c i a n array , and d y n a m i c a l l y a l l o c a t e o t h e r
needed a r r a y s
//
REAL ∗ xkvec , ∗ ykvec , ∗ zkvec ;
xkvec = new REAL[ Nx ] ;
ykvec = new REAL[ Ny ] ;
z k ve c = new REAL[ Nz ] ;
f o r ( i =0; i <Mx; i ++)
{
xkvec [ i ] = i ∗PI /Lx ;
xkvec [ i+Mx] = ( i −Mx) ∗PI /Lx ;
}
f o r ( i =0; i <My; i ++)
{
ykvec [ i ] = i ∗PI /Ly ;
ykvec [ i+My] = ( i −My) ∗PI /Ly ;
}
f o r ( i =0; i <Mz ; i ++)
{
z k vec [ i ] = i ∗PI /Lz ;
z k vec [ i+Mz ] = ( i −Mz) ∗PI /Lz ;
}
REAL ∗ Lap h , ∗Lap , ∗ shp , ∗ Energy , ∗ Energy h ;
REAL ∗ Delx h , ∗ Dely h , ∗ Delz h , ∗ Delx , ∗ Dely , ∗ Delz ;
Energy h = new REAL;
Lap h = new REAL[ n ] ;
Delx h = new REAL[ n ] ;
Dely h = new REAL[ n ] ;
D e l z h = new REAL[ n ] ;
f o r ( i =0; i <Nx ; i ++) {
f o r ( j =0; j <Ny ; j ++) {
f o r ( k=0; k<Nz ; k++)
{
164

Lap h [ k+Nz∗( j+Ny∗ i ) ] = −(xkvec [ i ] ∗ xkvec [ i ]+ ykvec [ j ] ∗ ykvec [
j ]+ z k vec [ k ] ∗ zkvec [ k ] ) ;
Delx h [ k+Nz∗( j+Ny∗ i ) ] = xkvec [ i ] ;
Dely h [ k+Nz∗( j+Ny∗ i ) ] = ykvec [ j ] ;
D e l z h [ k+Nz ∗( j+Ny∗ i ) ] = zkvec [ k ] ;
}
}

501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542

}
c o u t << ”Lap b u i l t ” << e n d l ;
i n t nBytes = s i z e o f (COMPLEX) ∗n ;
c o u t << ” nBytes i s = ” << nBytes << e n d l ;
COMPLEX ∗U h , ∗U, ∗R, ∗F1U , ∗F2U , ∗F3R , ∗F4R , ∗F5R , ∗LapR , ∗
DelxR , ∗DelyR , ∗DelzR , ∗expR , ∗F4RLapR , ∗ hr , ∗hp , ∗ change ;
U h = new COMPLEX[ nBytes ] ;
//
// Read i n i n i t i a l U data from f i l e i n
//
m y f i l e . open ( f i l e i n . c s t r ( ) , i o s : : i n ) ;
i f ( myfile . is open () )
{
f o r ( i =0; i <n ; i ++)
{
m y f i l e . g e t l i n e ( da t ai n , 50 ) ;
U h [ i ] . x = atof ( datain ) ;
U h[ i ]. y = 0.0;
}
c o u t << ” I n i t i a l data read i n s u c c e s s f u l l y ” << e n d l ;
} e l s e { cout << ” I n i t i a l data not r ead ” << e n d l ; }
myfile . close () ;
//
// A l l o c a t e memory on th e GPU and s e t up s p a c e f o r t he F o u r i e r
transforms
//
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d

∗∗)&Lap ,
∗∗)&Delx ,
∗∗)&Dely ,
∗∗)&Delz ,

s i z e o f (REAL) ∗n ) ;
s i z e o f (REAL) ∗n ) ;
s i z e o f (REAL) ∗n ) ;
s i z e o f (REAL) ∗n ) ;
165

543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583

cudaMalloc ( ( v o i d ∗∗)&max , s i z e o f (COMPLEX) ∗ dimGrid . x /2 ) ;
cudaMalloc ( ( v o i d ∗∗)&shp , s i z e o f (REAL) ∗n ) ;
cudaMalloc ( ( v o i d ∗∗)&Energy , s i z e o f (REAL) ∗n ) ;
cudaMalloc ( ( v o i d ∗∗)&U, nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&R, nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&F1U , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&F2U , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&F3R , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&F4R , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&F5R , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&LapR , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&DelxR , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&DelyR , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&DelzR , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&expR , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&F4RLapR , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&hr , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&hp , nBytes ) ;
cudaMalloc ( ( v o i d ∗∗)&change , nBytes ) ;
c o u t << ” Device memory a l l o c a t e d ” << e n d l ;
cudaMemcpy ( Lap , Lap h , s i z e o f (REAL) ∗n , cudaMemcpyHostToDevice ) ;
cudaMemcpy ( Delx , Delx h , s i z e o f (REAL) ∗n , cudaMemcpyHostToDevice )
;
cudaMemcpy ( Dely , Dely h , s i z e o f (REAL) ∗n , cudaMemcpyHostToDevice )
;
cudaMemcpy ( Delz , Delz h , s i z e o f (REAL) ∗n , cudaMemcpyHostToDevice )
;
cudaMemcpy (R, U h , nBytes , cudaMemcpyHostToDevice ) ;
c o u t << ” I n i t i a l data c o p i e d ” << e n d l ;
c u f f t H a n d l e p lan ;
c u f f t P l a n 3 d (&plan , Nx , Ny , Nz , TYPE) ;
b u i l d s h p <<< dimGrid , dimBlock>>>(shp , Lap , n , alpha , eta2 , taub
, c1 , c2 ) ;
kern0<<< dimGrid , dimBlock>>>(n , U, R, 0 , 0 ) ;
sstm << f i l e t E o u t << ” . dat ” ;
m y f i l e 4 . open ( sstm . s t r ( ) . c s t r ( ) , i o s : : out ) ;
myfile4 . precision (14) ;
sstm . s t r ( ”” ) ;

166

584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620

//
// w h i l e l o o p g o v e r n i n g time s t e p p i n g
//
w h i l e ( time < tmax )
{
//
// Compute th e e x p l i c i t p o r t i o n o f t he r i g h t hand s i d e
//
res [ 0 ] . x = 1000.0;
kern1<<< dimGrid , dimBlock>>>(n , alpha , eta1 , eta2 , taub , taus
, R, F1U , F2U , F3R , F4R , F5R) ;
EXEC( plan , U, U, CUFFT FORWARD) ;
kern2<<< dimGrid , dimBlock>>>(n , U, LapR , Lap ) ;
kernDel<<< dimGrid , dimBlock>>>(n , U, DelxR , Delx , DelyR , Dely
, DelzR , Delz ) ;
EXEC( plan
EXEC( plan
EXEC( plan
EXEC( plan
EXEC( plan
EXEC( plan

,
,
,
,
,
,

F1U , F1U , CUFFT FORWARD) ;
F2U , F2U , CUFFT FORWARD) ;
LapR , LapR , CUFFT INVERSE) ;
DelxR , DelxR , CUFFT INVERSE) ;
DelyR , DelyR , CUFFT INVERSE) ;
DelzR , DelzR , CUFFT INVERSE) ;

kern3<<< dimGrid , dimBlock>>>(n , Lap , alpha , expR , F1U , F2U ,
F4RLapR , F4R , LapR) ;
//
// Compute th e e n e r g y o f t he s o l u t i o n from t h e p r e v i o u s time
step
//
kernE<<< dimGrid , dimBlock>>>(n , alpha , eta1 , eta2 , taub , taus
, LapR , R, DelxR , DelyR , DelzR , Energy ) ;
cudaMemcpy ( Energy h , Energy , s i z e o f (REAL) ,
cudaMemcpyDeviceToHost ) ;
reduceSum<<< dimGrid , dimBlock , b l o c k s i z e ∗ s i z e o f (REAL)>>>(
Energy , Energy , n , b l o c k s i z e ) ;
cudaMemcpy ( Energy h , Energy , s i z e o f (REAL) ,
cudaMemcpyDeviceToHost ) ;
m = n / ( b l o c k s i z e ∗2) ;
167

621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659

w h i l e (m > 1 )
{
reduceSum<<< dimGrid , dimBlock , b l o c k s i z e ∗ s i z e o f (REAL)>>>(
Energy , Energy , m, b l o c k s i z e ) ;
m = m/ ( b l o c k s i z e ∗2) ;
}
cudaMemcpy ( Energy h , Energy , s i z e o f (REAL) ,
cudaMemcpyDeviceToHost ) ;
t E t a i l −>e n e r g y = ∗ Energy h ;
m y f i l e 4 << t E t a i l −>time << ”\ t ” << t E t a i l −>e n e r g y << e n d l ;
//
// Compute th e d i f f e r e n c e between th e e x p l i c i t s t e p and th e
p r e v i o u s time s t e p t o approximate t he a d a p t i v e time s t e p p i n g .
//
EXEC( plan , F3R , F3R , CUFFT FORWARD) ;
EXEC( plan , F4RLapR , F4RLapR , CUFFT FORWARD) ;
EXEC( plan , F5R , F5R , CUFFT FORWARD) ;
kern5<<< dimGrid , dimBlock>>>(n , Lap , alpha , eta2 , taub , hr ,
expR , F3R , F4RLapR , U, F5R) ;
EXEC( plan , hr , hr , CUFFT INVERSE) ;
reduce <<< dimGrid , dimBlock , b l o c k s i z e ∗ s i z e o f (REAL)>>>(hr , max
, n, blocksize ) ;
m = n / ( b l o c k s i z e ∗2) ;
w h i l e (m > 1 )
{
reduce <<< dimGrid , dimBlock , b l o c k s i z e ∗ s i z e o f (REAL)>>>(max ,
max , m, b l o c k s i z e ) ;
m = m/ ( b l o c k s i z e ∗2) ;
}
cudaMemcpy ( max h , max , s i z e o f (REAL) , cudaMemcpyDeviceToHost ) ;
max h [ 0 ] . x = max h [ 0 ] . x / n ;
dt = c t /max h [ 0 ] . x ;
//
// Adjust th e t i m e s t e p i f n e c e s s a r y
//
i f ( dt < 0 . 0 0 0 0 0 1 )
{
dt = 0 . 0 0 0 0 0 1 ;
168

660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700

}
i f ( dt > ( tmax − time ) )
{
dt = tmax − time ;
}
i f ( dt > ( t P i c − time ) && ∗ argv [ 2 ] != ’ 0 ’ )
{
dt = t P i c − time ;
}
kern6<<< dimGrid , dimBlock>>>(n , dt , hp , shp , expR , U) ;
//
// This l o o p i t e r a t e s u n t i l t he n o n l i n e a r p a r t i s w i t h i n
t o l e r a n c e of the s o l u t i o n
//
while ( res [ 0 ] . x > t o l )
{
kern7<<< dimGrid , dimBlock>>>(n , alpha , eta2 , taub , taus , R,
F3R , F4R , F5R) ;
EXEC( plan , R, R, CUFFT FORWARD) ;
kern2<<< dimGrid , dimBlock>>>(n , R, LapR , Lap ) ;
EXEC( plan , LapR , LapR , CUFFT INVERSE) ;
kern4<<< dimGrid , dimBlock>>>(n , F4RLapR , F4R , LapR) ;
EXEC( plan , F3R , F3R , CUFFT FORWARD) ;
EXEC( plan , F4RLapR , F4RLapR , CUFFT FORWARD) ;
EXEC( plan , F5R , F5R , CUFFT FORWARD) ;
kern8<<< dimGrid , dimBlock>>>(n , dt , alpha , eta2 , taub , Lap ,
change , expR , R, F3R , F4RLapR , F5R , hp ) ;
EXEC( plan , change , change , CUFFT INVERSE) ;
reduce <<< dimGrid , dimBlock , b l o c k s i z e ∗ s i z e o f (REAL)>>>(
change , max , n , b l o c k s i z e ) ;
m = n / ( b l o c k s i z e ∗2) ;
w h i l e (m > 1 )
{
169

701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743

reduce <<< dimGrid , dimBlock , b l o c k s i z e ∗ s i z e o f (REAL)>>>(max
, max , m, b l o c k s i z e ) ;
m = m/ ( b l o c k s i z e ∗2) ;
}
cudaMemcpy ( r e s , max , s i z e o f (REAL) , cudaMemcpyDeviceToHost ) ;
r e s [ 0 ] . x = r e s [ 0 ] . x / (REAL) n ;
EXEC( plan , R, R, CUFFT INVERSE) ;
kern9<<< dimGrid , dimBlock>>>(n , R, change ) ;
}
//
// Update time and update U t o be th e s o l u t i o n a t th e next
time s t e p
// Write out s o l u t i o n t o hard d i s k i f r e q u i r e d
//
kern0<<< dimGrid , dimBlock>>>(n , U, R, evap , dt ) ;
time = time + dt ;
i f ( time == t P i c && ∗ argv [ 2 ] != ’ 0 ’ )
{
cudaMemcpy ( U h , U, nBytes , cudaMemcpyDeviceToHost ) ;
sstm << f i l e U o u t << ( i n t ) t P i c << ” . dat ” ;
m y f i l e 2 . open ( sstm . s t r ( ) . c s t r ( ) , i o s : : out ) ;
myfile2 . precision (14) ;
i f ( myfile2 . is open () )
{
f o r ( i =0; i <n ; i ++)
{
m y f i l e 2 << U h [ i ] . x << e n d l ;
}
}
c out << ” Writing out ” << sstm . s t r ( ) << e n d l ;
myfile2 . close () ;
sstm . s t r ( ”” ) ;
i f (∗ argv [ 2 ] == ’ 1 ’ )
{
t P i c += pow ( (REAL) 1 0 . 0 , (REAL) c e i l ( l o g 1 0 ( t P i c +1) ) −1) ;
}
170

744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783

else
{
t P i c += t p i c s t e p ;
}
}
c o u t << time << ”\ t ” << dt << ”\ t ” << t E t a i l −>e n e r g y << e n d l ;
addNode ( t E t a i l , time ) ;
}
//
// Write f i n a l s o l u t i o n t o hard d i s k and compute i t s e n e r g y
//
cudaMemcpy ( U h , U, nBytes , cudaMemcpyDeviceToHost ) ;
EXEC( plan , R, R, CUFFT FORWARD) ;
kern2<<< dimGrid , dimBlock>>>(n , R, LapR , Lap ) ;
kernDel<<< dimGrid , dimBlock>>>(n , R, DelxR , Delx , DelyR , Dely ,
DelzR , Delz ) ;
EXEC( plan
EXEC( plan
EXEC( plan
EXEC( plan

,
,
,
,

LapR , LapR , CUFFT INVERSE) ;
DelxR , DelxR , CUFFT INVERSE) ;
DelyR , DelyR , CUFFT INVERSE) ;
DelzR , DelzR , CUFFT INVERSE) ;

kernE<<< dimGrid , dimBlock>>>(n , alpha , eta1 , eta2 , taub , taus ,
LapR , U, DelxR , DelyR , DelzR , Energy ) ;
reduceSum<<< dimGrid , dimBlock , b l o c k s i z e ∗ s i z e o f (REAL)>>>(Energy
, Energy , n , b l o c k s i z e ) ;
m = n / ( b l o c k s i z e ∗2) ;
w h i l e (m > 1 )
{
reduceSum<<< dimGrid , dimBlock , b l o c k s i z e ∗ s i z e o f (REAL)>>>(
Energy , Energy , m, b l o c k s i z e ) ;
m = m/ ( b l o c k s i z e ∗2) ;
}
cudaMemcpy ( Energy h , Energy , s i z e o f (REAL) ,
cudaMemcpyDeviceToHost ) ;
t E t a i l −>e n e r g y = ∗ Energy h ;
m y f i l e 4 << t E t a i l −>time << ”\ t ” << t E t a i l −>e n e r g y << e n d l ;
171

784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828

myfile4 . close () ;
sstm << f i l e U o u t << ( i n t ) tmax << ” . dat ” ;
m y f i l e 2 . open ( sstm . s t r ( ) . c s t r ( ) , i o s : : out ) ;
myfile2 . precision (14) ;
i f ( myfile2 . is open () )
{
f o r ( i =0; i <n ; i ++)
{
m y f i l e 2 << U h [ i ] . x << e n d l ;
}
}
c o u t << ” Writing out ” << sstm . s t r ( ) << e n d l ;
myfile2 . close () ;
sstm . s t r ( ”” ) ;
w h i l e ( tEhead != NULL)
{
t E t a i l = tEhead ;
tEhead = tEhead−>next ;
delete tEtail ;
}
//
// Perform c l e a n up o f memory used
//
free (U h) ;
f r e e ( max h ) ;
f r e e ( Energy h ) ;
free ( res ) ;
cudaFree ( Lap ) ;
cudaFree ( Delx ) ;
cudaFree ( Dely ) ;
cudaFree ( Delz ) ;
cudaFree (max) ;
cudaFree ( Energy ) ;
cudaFree ( shp ) ;
cudaFree (U) ;
cudaFree (R) ;
cudaFree (F1U) ;
cudaFree (F2U) ;
172

829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854 }

cudaFree (F3R) ;
cudaFree (F4R) ;
cudaFree (F5R) ;
cudaFree (LapR ) ;
cudaFree ( DelxR ) ;
cudaFree ( DelyR ) ;
cudaFree ( DelzR ) ;
cudaFree ( expR ) ;
cudaFree (F4RLapR) ;
cudaFree ( hr ) ;
cudaFree ( hp ) ;
cudaFree ( change ) ;
delete
delete
delete
delete
delete
delete
delete

[]
[]
[]
[]
[]
[]
[]

xkvec ;
ykvec ;
zkvec ;
Lap h ;
Delx h ;
Dely h ;
Delz h ;

toc = clock () − t i c ;
c o u t << ”\n” << t o c / ( ( double ) CLOCKS PER SEC) << e n d l ;
return 0;
./code/spe6NL3Dcuda6.cu

173

Appendix B: Fully Implicit Code in
CUDA
1
2
3
4

//
// FCHImp . cu
//
// This code was w r i t t e n by J a y l a n Jones t o approximate s o l u t i o n s
to the
5 // F u n c t i o n a l i z e d Cahn−H i l l i a r d Equation u s i n g th e f u l l y i m p l i c i t
scheme .
6 //
7
8 #i n c l u d e <i o s t r e a m >
9 #i n c l u d e <f s t r e a m >
10 #i n c l u d e <sstream>
11 #i n c l u d e <cuda . h>
12 #i n c l u d e < c u f f t . h>
13 #i n c l u d e <time . h>
14
15 #d e f i n e PI 3 . 1 4 1 5 9 2 6 5 3 5 8 9
16 #d e f i n e REAL double
17 #d e f i n e COMPLEX cufftDoubleComplex
18 #d e f i n e TYPE CUFFT Z2Z
19 #d e f i n e EXEC c u f f t E x e c Z 2 Z
20
21 u s i n g namespace s t d ;
22
23 //
24 // B u i l d data s t r u c t u r e s n e c e s s a r y t o p a s s i n t o k e r n e l s
25 //
26
27 s t r u c t dataBin {
28
i n t n , l , m, mtotal , CGsteps , Nsteps , b l o c k s i z e ;
29
REAL dt , e p s i l o n , eta1 , eta2 , tau , t o l , maxR, ∗ Energy h ;
30
dim3 dimBlock , dimGrid ;
31 COMPLEX ∗max h ;
174

32
33
34
35
36
37
38
39
40
41

};
s t r u c t operBin {
REAL ∗Lap , ∗Q;
};
int i ;

//
// This s e r i e s o f k e r n e l s c o m p a r t m e n t a l i z e s th e c a l c u l a t i o n i n t o
p e i c e s t h a t do not depend s e r i a l l y on data .
42 // The k e r n e l s a r e c a l l e d i n t t he main f u n c t i o n below .
43 //
44
45 //
46 // kernE1 and kernE2 a r e c a l l e d i n t he FCHenergy f u n c t i o n t h a t
c a l c u l a t e s th e FCH e n e r g y o f t he s o l u t i o n
47 //
48
49
global
v o i d kernE1 ( i n t n , COMPLEX ∗U, REAL ∗Lap , REAL ∗ Delx ,
REAL ∗ Dely , REAL ∗ Delz , COMPLEX ∗LapU , COMPLEX ∗DelxU , COMPLEX
∗DelyU , COMPLEX ∗DelzU )
50 {
51
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
52
i f ( idx < n)
53
{
54
U[ i d x ] . x = U[ i d x ] . x/n ;
55
U[ i d x ] . y = U[ i d x ] . y/n ;
56
57
LapU [ i d x ] . x = Lap [ i d x ] ∗U[ i d x ] . x ;
58
LapU [ i d x ] . y = Lap [ i d x ] ∗U[ i d x ] . y ;
59
60
DelxU [ i d x ] . x = −Delx [ i d x ] ∗U[ i d x ] . y ;
61
DelxU [ i d x ] . y = Delx [ i d x ] ∗U[ i d x ] . x ;
62
63
DelyU [ i d x ] . x = −Dely [ i d x ] ∗U[ i d x ] . y ;
64
DelyU [ i d x ] . y = Dely [ i d x ] ∗U[ i d x ] . x ;
65
66
DelzU [ i d x ] . x = −Delz [ i d x ] ∗U[ i d x ] . y ;
67
DelzU [ i d x ] . y = Delz [ i d x ] ∗U[ i d x ] . x ;
68
69
}
70 }
71

175

72

global
v o i d kernE2 ( i n t n , REAL e p s i l o n , REAL eta1 , REAL eta2 ,
REAL tau , REAL ∗ Energy , COMPLEX ∗U, COMPLEX ∗LapU , COMPLEX ∗
DelxU , COMPLEX ∗DelyU , COMPLEX ∗DelzU )

73 {
74
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
75
i f ( idx < n)
76
{
77
Energy [ i d x ] = 0 . 5 ∗ ( e p s i l o n ∗ e p s i l o n ∗LapU [ i d x ] . x−(U[ i d x ] . x∗U[ i d x
] . x−1) ∗(U[ i d x ] . x+tau / 2) ) ∗( e p s i l o n ∗ e p s i l o n ∗LapU [ i d x ] . x−(U[ i d x ] . x
∗U[ i d x ] . x−1) ∗(U[ i d x ] . x+tau / 2) )−e p s i l o n ∗( e t a 1 ∗ e p s i l o n ∗ e p s i l o n
/2∗( DelxU [ i d x ] . x∗DelxU [ i d x ] . x + DelyU [ i d x ] . x∗DelyU [ i d x ] . x +
DelzU [ i d x ] . x∗DelzU [ i d x ] . x ) + e t a 2 ∗ 0 . 5 ∗ (U[ i d x ] . x+1) ∗(U[ i d x ] . x+1)
∗ ( 0 . 5 ∗ (U[ i d x ] . x−1) ∗(U[ i d x ] . x−1)+tau /3∗(U[ i d x ] . x−2) ) ) ;
78
}
79 }
80
81 //
82 // kernRes1 , kernRes2 , kernRes3 , and kernRes4 a r e c a l l e d i n t he
r e s i d u a l f u n c t i o n t h a t c a l c u l a t e s t he r e s i d u a l f o r t he
conjugate gradient i t e r a t i o n s
83 //
84
85
global
v o i d kernRes1 ( i n t n , COMPLEX ∗W, COMPLEX ∗T1 , REAL ∗Lap
)
86 {
87
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
88
i f ( idx < n)
89
{
90
W[ i d x ] . x = W[ i d x ] . x/n ;
91
W[ i d x ] . y = W[ i d x ] . y/n ;
92
93
T1 [ i d x ] . x = Lap [ i d x ] ∗W[ i d x ] . x ;
94
T1 [ i d x ] . y = Lap [ i d x ] ∗W[ i d x ] . y ;
95
}
96 }
97
98
global
v o i d kernRes2 ( i n t n , REAL dt , REAL e p s i l o n , REAL eta1 ,
REAL eta2 , REAL tau , COMPLEX ∗W, COMPLEX ∗R, COMPLEX ∗T1)
99 {
100
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
101
i f ( idx < n)
102
{
103
T1 [ i d x ] . x = e p s i l o n ∗ e p s i l o n ∗T1 [ i d x ] . x − (W[ i d x ] . x∗W[ i d x ] . x−1)
∗(W[ i d x ] . x+tau / 2) ;
104
T1 [ i d x ] . y = 0 . 0 ;
176

105
106

R[ i d x ] . x = −(3∗W[ i d x ] . x∗W[ i d x ] . x + tau ∗W[ i d x ] . x − 1 − e p s i l o n ∗
e t a 1 ) ∗T1 [ i d x ] . x + e p s i l o n ∗( eta1−e t a 2 ) ∗(W[ i d x ] . x∗W[ i d x ] . x−1) ∗(W[
i d x ] . x+tau /2 ) ;
R[ i d x ] . y = 0 . 0 ;

107
108
}
109 }
110
111
global
v o i d kernRes3 ( i n t n , REAL dt , REAL e p s i l o n , COMPLEX ∗R,
COMPLEX ∗T1 , REAL ∗Lap )
112 {
113
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
114
i f ( idx < n)
115
{
116
R[ i d x ] . x = dt ∗Lap [ i d x ] ∗ ( e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] ∗ T1 [ i d x ] . x + R
[ idx ] . x) ;
117
R[ i d x ] . y = dt ∗Lap [ i d x ] ∗ ( e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] ∗ T1 [ i d x ] . y + R
[ idx ] . y) ;
118
}
119 }
120
121
global
v o i d kernRes4 ( i n t n , COMPLEX ∗W, COMPLEX ∗U, COMPLEX ∗R
)
122 {
123
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
124
i f ( idx < n)
125
{
126
R[ i d x ] . x = − W[ i d x ] . x + U[ i d x ] . x + R[ i d x ] . x/n ;
127
R[ i d x ] . y = 0 . 0 ;
128
}
129 }
130
131 //
132 // kernHm1 and kernHm2 a r e used t o compute th e Hˆ{−1} i n n e r
pr oduct i n t h e Hm1InnerProd f u n c t i o n
133 //
134
135
global
v o i d kernHm1 ( i n t n , REAL ∗Lap , COMPLEX ∗Vhat , COMPLEX ∗
innerProd )
136 {
137
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
138
i f ( i d x < n && i d x != 0 )
139
{
140
i n n e r P r o d [ i d x ] . x = Vhat [ i d x ] . x/Lap [ i d x ] ;
141
i n n e r P r o d [ i d x ] . y = Vhat [ i d x ] . y/Lap [ i d x ] ;
177

142
}
143
e l s e i f ( i d x == 0 )
144
{
145
innerProd [ idx ] . x = 0 . 0 ;
146
innerProd [ idx ] . y = 0 . 0 ;
147
}
148 }
149
150
global
v o i d kernHm2 ( i n t n , REAL ∗ Energy , COMPLEX ∗U, COMPLEX ∗
innerProd )
151 {
152
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
153
i f ( idx < n)
154
{
155
Energy [ i d x ] = U[ i d x ] . x∗ i n n e r P r o d [ i d x ] . x/n ;
156
}
157 }
158
159 //
160 // kernCG1 through kernCG7 a r e c a l l e d i n t he conjGrad f u n c t i o n
161 //
162
163
global
v o i d kernCG1 ( i n t n , REAL dt , REAL ∗Q, COMPLEX ∗Zhat ,
COMPLEX ∗R, COMPLEX ∗Phat , COMPLEX ∗Vhat )
164 {
165
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
166
i f ( idx < n)
167
{
168
Zhat [ i d x ] . x = R[ i d x ] . x/(1+ dt ∗Q[ i d x ] ) ;
169
Zhat [ i d x ] . y = R[ i d x ] . y/(1+ dt ∗Q[ i d x ] ) ;
170
171
Phat [ i d x ] . x = Zhat [ i d x ] . x ;
172
Phat [ i d x ] . y = Zhat [ i d x ] . y ;
173
174
Vhat [ i d x ] . x = 0 . 0 ;
175
Vhat [ i d x ] . y = 0 . 0 ;
176
177
R[ i d x ] . x = R[ i d x ] . x/n ;
178
R[ i d x ] . y = R[ i d x ] . y/n ;
179
}
180 }
181
182
global
v o i d kernCG2 ( i n t n , COMPLEX ∗Phat , COMPLEX ∗T1 , COMPLEX
∗W, COMPLEX ∗LapU , REAL ∗Lap )
183 {
178

184
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
185
i f ( idx < n)
186
{
187
Phat [ i d x ] . x = Phat [ i d x ] . x/n ;
188
Phat [ i d x ] . y = Phat [ i d x ] . y/n ;
189
190
T1 [ i d x ] . x = Lap [ i d x ] ∗ Phat [ i d x ] . x ;
191
T1 [ i d x ] . y = Lap [ i d x ] ∗ Phat [ i d x ] . y ;
192
193
W[ i d x ] . x = W[ i d x ] . x/n ;
194
W[ i d x ] . y = W[ i d x ] . y/n ;
195
196
LapU [ i d x ] . x = Lap [ i d x ] ∗W[ i d x ] . x ;
197
LapU [ i d x ] . y = Lap [ i d x ] ∗W[ i d x ] . y ;
198
}
199 }
200
201
global
v o i d kernCG3 ( i n t n , REAL dt , REAL e p s i l o n , REAL eta1 ,
REAL eta2 , REAL tau , REAL ∗Lap , COMPLEX ∗T1 , COMPLEX ∗W,
COMPLEX ∗Phat , COMPLEX ∗JP , COMPLEX ∗LapU)
202 {
203
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
204
i f ( idx < n)
205
{
206
T1 [ i d x ] . x = e p s i l o n ∗ e p s i l o n ∗T1 [ i d x ] . x − (3∗W[ i d x ] . x∗W[ i d x ] . x +
tau ∗W[ i d x ] . x − 1 ) ∗Phat [ i d x ] . x ;
207
T1 [ i d x ] . y = 0 . 0 ;
208
209
JP [ i d x ] . x = −(6∗W[ i d x ] . x + tau ) ∗Phat [ i d x ] . x ∗( e p s i l o n ∗ e p s i l o n ∗
LapU [ i d x ] . x − (W[ i d x ] . x∗W[ i d x ] . x−1) ∗(W[ i d x ] . x+tau / 2) ) − (3∗W[
i d x ] . x∗W[ i d x ] . x + tau ∗W[ i d x ] . x − 1 − e p s i l o n ∗ e t a 1 ) ∗T1 [ i d x ] . x +
e p s i l o n ∗( eta1−e t a 2 ) ∗(3∗W[ i d x ] . x∗W[ i d x ] . x + tau ∗W[ i d x ] . x − 1 ) ∗
Phat [ i d x ] . x ;
210
JP [ i d x ] . y = 0 . 0 ;
211
}
212 }
213
214
global
v o i d kernCG4 ( i n t n , REAL dt , REAL e p s i l o n , COMPLEX ∗JP ,
COMPLEX ∗Phat , COMPLEX ∗T1 , REAL ∗Lap )
215 {
216
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
217
i f ( idx < n)
218
{
219
JP [ i d x ] . x = ( Phat [ i d x ] . x − dt ∗Lap [ i d x ] ∗ ( e p s i l o n ∗ e p s i l o n ∗Lap [
i d x ] ∗ T1 [ i d x ] . x + JP [ i d x ] . x ) ) /n ;
179

220

JP [ i d x ] . y = ( Phat [ i d x ] . y − dt ∗Lap [ i d x ] ∗ ( e p s i l o n ∗ e p s i l o n ∗Lap [
i d x ] ∗ T1 [ i d x ] . y + JP [ i d x ] . y ) ) /n ;

221
}
222 }
223
224
global
v o i d kernCG5 ( i n t n , REAL alpha , COMPLEX ∗Vhat , COMPLEX
∗Phat , COMPLEX ∗R, COMPLEX ∗JP )
225 {
226
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
227
i f ( idx < n)
228
{
229
Vhat [ i d x ] . x = Vhat [ i d x ] . x + alpha ∗Phat [ i d x ] . x ;
230
Vhat [ i d x ] . y = Vhat [ i d x ] . y + alpha ∗Phat [ i d x ] . y ;
231
232
R[ i d x ] . x = R[ i d x ] . x − alpha ∗JP [ i d x ] . x ;
233
R[ i d x ] . y = 0 . 0 ;
234
}
235 }
236
237
global
v o i d kernCG6 ( i n t n , REAL dt , REAL ∗Q, COMPLEX ∗Zhat ,
COMPLEX ∗R)
238 {
239
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
240
i f ( idx < n)
241
{
242
Zhat [ i d x ] . x = R[ i d x ] . x/(1+ dt ∗Q[ i d x ] ) ;
243
Zhat [ i d x ] . y = R[ i d x ] . y/(1+ dt ∗Q[ i d x ] ) ;
244
245
R[ i d x ] . x = R[ i d x ] . x/n ;
246
R[ i d x ] . y = R[ i d x ] . y/n ;
247
}
248 }
249
250
global
v o i d kernCG7 ( i n t n , REAL beta1 , REAL beta2 , COMPLEX ∗
Phat , COMPLEX ∗ Zhat )
251 {
252
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
253
i f ( idx < n)
254
{
255
Phat [ i d x ] . x = Zhat [ i d x ] . x + beta2 / beta1 ∗Phat [ i d x ] . x ;
256
Phat [ i d x ] . y = Zhat [ i d x ] . y + beta2 / beta1 ∗Phat [ i d x ] . y ;
257
}
258 }
259
260 //
180

261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276

// kernN1 i s c a l l e d i n th e Newton f u n c t i o n
//
global

v o i d kernN1 ( i n t n , COMPLEX ∗Vhat , COMPLEX ∗W)

{
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
i f ( idx < n)
{
Vhat [ i d x ] . x = Vhat [ i d x ] . x/n ;
W[ i d x ] . x += Vhat [ i d x ] . x ;
W[ i d x ] . y = 0 . 0 ;
}
}
global
v o i d kernLapQ ( i n t n , REAL e p s i l o n , REAL c1 , REAL c3 ,
REAL ∗Lap , REAL ∗Q, REAL ∗ Delx , REAL ∗ Dely , REAL ∗ Delz )

277 {
278
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
279
i f ( idx < n)
280
{
281
Lap [ i d x ] = −(Delx [ i d x ] ∗ Delx [ i d x ]+ Dely [ i d x ] ∗ Dely [ i d x ]+ Delz [ i d x
] ∗ Delz [ i d x ] ) ;
282
Q[ i d x ] = −Lap [ i d x ] ∗ ( e p s i l o n ∗ e p s i l o n ∗ e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] ∗
Lap [ i d x ] − c1 ∗ e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] − c3 ) ;
283
}
284 }
285
286 //
287 // kern1 through kern5 a r e c a l l e d i n t he main f u n c t i o n
288 //
289
290
global
v o i d kern1 ( i n t n , REAL ∗Lap , COMPLEX ∗U, COMPLEX ∗LapU)
291 {
292
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
293
i f ( idx < n)
294
{
295
U[ i d x ] . x = U[ i d x ] . x/n ;
296
U[ i d x ] . y = U[ i d x ] . y/n ;
297
298
LapU [ i d x ] . x = Lap [ i d x ] ∗U[ i d x ] . x ;
299
LapU [ i d x ] . y = Lap [ i d x ] ∗U[ i d x ] . y ;
300
}
301 }
302
181

303

global
v o i d kern2 ( i n t n , REAL e p s i l o n , REAL eta1 , REAL eta2 ,
REAL tau , COMPLEX ∗LapU , COMPLEX ∗T1 , COMPLEX ∗U, COMPLEX ∗Wexp
)

304 {
305
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
306
i f ( idx < n)
307
{
308
T1 [ i d x ] . x = e p s i l o n ∗ e p s i l o n ∗LapU [ i d x ] . x − (U[ i d x ] . x∗U[ i d x ] . x −
1 ) ∗(U[ i d x ] . x + tau / 2) ;
309
T1 [ i d x ] . y = 0 . 0 ;
310
311
Wexp [ i d x ] . x = −(3∗U[ i d x ] . x∗U[ i d x ] . x + tau ∗U[ i d x ] . x − 1 −
e p s i l o n ∗ e t a 1 ) ∗T1 [ i d x ] . x + e p s i l o n ∗( eta1−e t a 2 ) ∗(U[ i d x ] . x∗U[ i d x ] .
x − 1 ) ∗(U[ i d x ] . x + tau / 2) ;
312
Wexp [ i d x ] . y = 0 . 0 ;
313
}
314 }
315
316
global
v o i d kern3 ( i n t n , REAL dt , REAL e p s i l o n , REAL ∗Lap ,
COMPLEX ∗Wexp , COMPLEX ∗T1)
317 {
318
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
319
i f ( idx < n)
320
{
321
Wexp [ i d x ] . x = dt ∗Lap [ i d x ] ∗ ( e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] ∗ T1 [ i d x ] . x
+ Wexp [ i d x ] . x ) ;
322
Wexp [ i d x ] . y = dt ∗Lap [ i d x ] ∗ ( e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] ∗ T1 [ i d x ] . y
+ Wexp [ i d x ] . y ) ;
323
}
324 }
325
326
global
v o i d kern4 ( i n t n , COMPLEX ∗Wexp , COMPLEX ∗U, COMPLEX ∗W
)
327 {
328
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
329
i f ( idx < n)
330
{
331
Wexp [ i d x ] . x = U[ i d x ] . x + Wexp [ i d x ] . x/n ;
332
Wexp [ i d x ] . y = U[ i d x ] . y + Wexp [ i d x ] . y/n ;
333
334
W[ i d x ] . x = Wexp [ i d x ] . x ;
335
W[ i d x ] . y = Wexp [ i d x ] . y ;
336
}
337 }
338
182

339 //
340 // k e r n E r r o r i s t o compute t he e r r o r f o r t he a d a p t i v e time
s t e p p i n g scheme
341 //
342
343
global
v o i d k e r n E r r o r ( i n t n , COMPLEX ∗R, COMPLEX ∗Wexp ,
COMPLEX ∗W)
344 {
345
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
346
i f ( idx < n)
347
{
348
R[ i d x ] . x = 0 . 5 ∗ f a b s (Wexp [ i d x ] . x − W[ i d x ] . x ) ;
349
}
350 }
351
352
global
v o i d kern5 ( i n t n , COMPLEX ∗U, COMPLEX ∗W)
353 {
354
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
355
i f ( idx < n)
356
{
357
U[ i d x ] . x = W[ i d x ] . x ;
358
U[ i d x ] . y = 0 . 0 ; //W[ i d x ] . y ;
359
}
360 }
361
362 //
363 // warpReduce , reduce , warpReduceSum , and reduceSum p a r a l l e l i z e
t h e r e d u c t i o n s n e c e s s a r y t o c a l c u l a t e e n e r g y and e r r o r
364 //
365
366
device
v o i d warpReduce ( v o l a t i l e REAL ∗ sdata , u n s i g n e d i n t t i d ,
int blockSize )
367 {
368
i f ( b l o c k S i z e >= 6 4) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 3 2 ] ) ) ;
369
i f ( b l o c k S i z e >= 3 2) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 1 6 ] ) ) ;
370
i f ( b l o c k S i z e >= 1 6) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 8 ] ) ) ;
371
i f ( b l o c k S i z e >= 8 ) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 4 ] ) ) ;
372
i f ( b l o c k S i z e >= 4 ) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 2 ] ) ) ;
373
i f ( b l o c k S i z e >= 2 ) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 1 ] ) ) ;
183

374 }
375
376
global
v o i d r e d u c e (COMPLEX ∗ g i d a t a , COMPLEX ∗ g odata , i n t n ,
int blockSize )
377 {
378
extern
shared
REAL s d a t a [ ] ;
379
unsigned i n t t i d = threadIdx . x ;
380
u n s i g n e d i n t i = b l o c k I d x . x ∗( b l o c k S i z e ∗2) + t i d ;
381
u n s i g n e d i n t g r i d S i z e = b l o c k S i z e ∗2∗ gridDim . x ;
382
sdata [ t i d ] = 0;
383
384
w h i l e ( i < n ) { s d a t a [ t i d ] = max( f a b s ( g i d a t a [ i ] . x ) , f a b s ( g i d a t a [
i+b l o c k S i z e ] . x ) ) ; i += g r i d S i z e ; }
385
syncthreads () ;
386
387
i f ( b l o c k S i z e >= 5 12) { i f ( t i d < 256) { s d a t a [ t i d ] = max( f a b s (
sdata [ t i d ] ) , fabs ( sdata [ t i d + 256]) ) ; }
syncthreads () ; }
388
i f ( b l o c k S i z e >= 2 56) { i f ( t i d < 128) { s d a t a [ t i d ] = max( f a b s (
sdata [ t i d ] ) , fabs ( sdata [ t i d + 128]) ) ; }
syncthreads () ; }
389
i f ( b l o c k S i z e >= 1 28) { i f ( t i d < 6 4) { s d a t a [ t i d ] = max( f a b s (
sdata [ t i d ] ) , fabs ( sdata [ t i d + 6 4 ] ) ) ; }
syncthreads () ; }
390
i f ( t i d < 3 2) warpReduce ( sdata , t i d , b l o c k S i z e ) ;
391
i f ( t i d == 0 ) g o d a t a [ b l o c k I d x . x ] . x = s d a t a [ 0 ] ;
392 }
393
394
device
v o i d warpReduceSum ( v o l a t i l e REAL ∗ sdata , u n s i g n e d i n t
tid , int blockSize )
395 {
396
i f ( b l o c k S i z e >= 6 4) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 3 2 ] ;
397
i f ( b l o c k S i z e >= 3 2) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 1 6 ] ;
398
i f ( b l o c k S i z e >= 1 6) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 8 ] ;
399
i f ( b l o c k S i z e >= 8 ) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 4 ] ;
400
i f ( b l o c k S i z e >= 4 ) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 2 ] ;
401
i f ( b l o c k S i z e >= 2 ) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 1 ] ;
402 }
403
404
global
v o i d reduceSum (REAL ∗ g i d a t a , REAL ∗ g odata , i n t n , i n t
blockSize )
405 {
406
extern
shared
REAL s d a t a [ ] ;
407
unsigned i n t t i d = threadIdx . x ;
408
u n s i g n e d i n t i = b l o c k I d x . x ∗( b l o c k S i z e ∗2) + t i d ;
409
u n s i g n e d i n t g r i d S i z e = b l o c k S i z e ∗2∗ gridDim . x ;
410
sdata [ t i d ] = 0;
411
184

w h i l e ( i < n ) { s d a t a [ t i d ] = g i d a t a [ i ] + g i d a t a [ i+b l o c k S i z e ] ; i
+= g r i d S i z e ; }
syncthreads () ;

412
413
414
415

i f ( b l o c k S i z e >= 5 12) { i f ( t i d < 256) { s d a t a [ t i d ] = s d a t a [ t i d ]
+ sdata [ t i d + 2 5 6 ] ; }
syncthreads () ; }
i f ( b l o c k S i z e >= 2 56) { i f ( t i d < 128) { s d a t a [ t i d ] = s d a t a [ t i d ]
+ sdata [ t i d + 1 2 8 ] ; }
syncthreads () ; }
i f ( b l o c k S i z e >= 1 28) { i f ( t i d < 6 4) { s d a t a [ t i d ] = s d a t a [ t i d ]
+ sdata [ t i d + 6 4 ] ; }
syncthreads () ; }
i f ( t i d < 3 2) warpReduceSum ( sdata , t i d , b l o c k S i z e ) ;
i f ( t i d == 0 ) g o d a t a [ b l o c k I d x . x ] = s d a t a [ 0 ] ;

416
417
418
419
420
421
422
423
424
425
426

427
428
429
430
431
432
433
434
435
436
437
438

439
440
441
442
443
444

}
//
// FCHenergy computes t he e n e r g y o f th e s o l u t i o n U
//
REAL FCHenergy ( dataBin &data , operBin oper , COMPLEX ∗U, REAL ∗
Energy , REAL ∗ Delx , REAL ∗ Dely , REAL ∗ Delz , COMPLEX ∗LapU ,
COMPLEX ∗DelxU , COMPLEX ∗DelyU , COMPLEX ∗DelzU , c u f f t H a n d l e
plan ) {
EXEC ( plan , U, U, CUFFT FORWARD) ;
kernE1<<< data . dimGrid , data . dimBlock>>>(data . n , U, oper . Lap ,
Delx , Dely , Delz , LapU , DelxU , DelyU , DelzU ) ;
EXEC( plan
EXEC( plan
EXEC( plan
EXEC( plan
EXEC( plan

,
,
,
,
,

LapU , LapU , CUFFT INVERSE) ;
DelxU , DelxU , CUFFT INVERSE) ;
DelyU , DelyU , CUFFT INVERSE) ;
DelzU , DelzU , CUFFT INVERSE) ;
U, U, CUFFT INVERSE) ;

kernE2<<< data . dimGrid , data . dimBlock>>>(data . n , data . e p s i l o n ,
data . eta1 , data . eta2 , data . tau , Energy , U, LapU , DelxU , DelyU ,
DelzU ) ;
reduceSum<<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗ s i z e o f (
REAL)>>>(Energy , Energy , data . n , data . b l o c k s i z e ) ;
i n t nRed = data . n / ( data . b l o c k s i z e ∗2) ;
w h i l e ( nRed > 1 )
{

185

reduceSum<<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗
s i z e o f (REAL)>>>(Energy , Energy , nRed , data . b l o c k s i z e ) ;
nRed = nRed / ( data . b l o c k s i z e ∗2) ;

445
446
447
448
449

}

cudaMemcpy ( data . Energy h , Energy , s i z e o f (REAL) ,
cudaMemcpyDeviceToHost ) ;
450 // t E t a i l −>e n e r g y = ∗ data . Energy h ;
451
452
r e t u r n (∗ data . Energy h ) ;
453 }
454
455 //
456 // r e s i d u a l computes th e r e s i d u a l needed i n th e s t a n d a r d c o n j u g a t e
g r a d i e n t a l g o r i t h m used below
457 //
458
459 v o i d r e s i d u a l ( dataBin &data , operBin oper , COMPLEX ∗U, COMPLEX ∗W
, COMPLEX ∗R, COMPLEX ∗T1 , c u f f t H a n d l e plan ) {
460
461
EXEC( plan , W, W, CUFFT FORWARD) ;
462
463
kernRes1<<< data . dimGrid , data . dimBlock>>>(data . n , W, T1 , oper .
Lap ) ;
464
465
EXEC( plan , T1 , T1 , CUFFT INVERSE) ;
466
EXEC( plan , W, W, CUFFT INVERSE) ;
467
468
kernRes2<<< data . dimGrid , data . dimBlock>>>(data . n , data . dt , data
. e p s i l o n , data . eta1 , data . eta2 , data . tau , W, R, T1) ;
469
470
EXEC( plan , T1 , T1 , CUFFT FORWARD) ;
471
EXEC( plan , R, R, CUFFT FORWARD) ;
472
473
kernRes3<<< data . dimGrid , data . dimBlock>>>(data . n , data . dt , data
. e p s i l o n , R, T1 , op er . Lap ) ;
474
475
EXEC( plan , R, R, CUFFT INVERSE) ;
476
477
kernRes4<<< data . dimGrid , data . dimBlock>>>(data . n , W, U, R) ;
478 }
479
480 //
481 // Hm1InnerProd computes t he i n n e r product i n t he Hˆ{−1} norm
between U and V
186

482 //
483
484 REAL Hm1InnerProd ( dataBin &data , operBin oper , COMPLEX ∗U, COMPLEX
∗Vhat , COMPLEX ∗ innerProd , REAL ∗ Energy , c u f f t H a n d l e plan ) {
485
486
kernHm1<<< data . dimGrid , data . dimBlock>>>(data . n , oper . Lap , Vhat
, innerProd ) ;
487
488
EXEC( plan , innerProd , innerProd , CUFFT INVERSE) ;
489
490
kernHm2<<< data . dimGrid , data . dimBlock>>>(data . n , Energy , U,
innerProd ) ;
491
492
reduceSum<<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗ s i z e o f (
REAL)>>>(Energy , Energy , data . n , data . b l o c k s i z e ) ;
493
494
i n t nRed = data . n / ( data . b l o c k s i z e ∗2) ;
495
w h i l e ( nRed > 1 )
496
{
497
reduceSum<<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗
s i z e o f (REAL)>>>(Energy , Energy , nRed , data . b l o c k s i z e ) ;
498
nRed = nRed / ( data . b l o c k s i z e ∗2) ;
499
}
500
501
cudaMemcpy ( data . Energy h , Energy , s i z e o f (REAL) ,
cudaMemcpyDeviceToHost ) ;
502 // t E t a i l −>e n e r g y = ∗ data . Energy h ;
503
504
r e t u r n (∗ data . Energy h ) ;
505 }
506
507 //
508 // conjGrad e x e c u t e s t he s t a n d a r d c o n j u g a t e g r a d i e n t scheme
509 //
510
511 v o i d conjGrad ( dataBin &data , operBin oper , COMPLEX ∗U, COMPLEX ∗W,
COMPLEX ∗R, COMPLEX ∗T1 , COMPLEX ∗LapU , COMPLEX ∗Zhat , COMPLEX
∗Phat , COMPLEX ∗Vhat , COMPLEX ∗JP , COMPLEX ∗ innerProd , REAL ∗
Energy , COMPLEX ∗max , c u f f t H a n d l e plan ) {
512
513
REAL alpha , beta1 , beta2 ;
514
515
r e s i d u a l ( data , oper , U, W, R, T1 , p lan ) ;
516
517
EXEC( plan , R, R, CUFFT FORWARD) ;
187

518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537

538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554

kernCG1<<< data . dimGrid , data . dimBlock>>>(data . n , data . dt , oper .
Q, Zhat , R, Phat , Vhat ) ;
EXEC( plan , R, R, CUFFT INVERSE) ;
data .m = 0 ;
b e ta 1 = Hm1InnerProd ( data , oper , R, Zhat , innerProd , Energy ,
plan ) ;
w h i l e ( data .m <= data . CGsteps ) {
EXEC( plan , W, W, CUFFT FORWARD) ;
kernCG2<<< data . dimGrid , data . dimBlock>>>(data . n , Phat , T1 , W,
LapU , oper . Lap ) ;
EXEC( plan
EXEC( plan
EXEC( plan
EXEC( plan

,
,
,
,

Phat , Phat , CUFFT INVERSE) ;
T1 , T1 , CUFFT INVERSE) ;
W, W, CUFFT INVERSE) ;
LapU , LapU , CUFFT INVERSE) ;

kernCG3<<< data . dimGrid , data . dimBlock>>>(data . n , data . dt ,
data . e p s i l o n , data . eta1 , data . eta2 , data . tau , oper . Lap , T1 , W,
Phat , JP , LapU) ;
EXEC( plan , Phat , Phat , CUFFT FORWARD) ;
EXEC( plan , JP , JP , CUFFT FORWARD) ;
EXEC( plan , T1 , T1 , CUFFT FORWARD) ;
kernCG4<<< data . dimGrid , data . dimBlock>>>(data . n , data . dt ,
data . e p s i l o n , JP , Phat , T1 , oper . Lap ) ;
EXEC( plan , JP , JP , CUFFT INVERSE) ;
a l pha = Hm1InnerProd ( data , oper , JP , Phat , innerProd , Energy ,
plan ) ;
a l pha = beta1 / alpha ;
kernCG5<<< data . dimGrid , data . dimBlock>>>(data . n , alpha , Vhat ,
Phat , R, JP ) ;
data .m++;

188

555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579

reduce <<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗ s i z e o f (
REAL)>>>(R, max , data . n , data . b l o c k s i z e ) ;
i n t nRed = data . n / ( data . b l o c k s i z e ∗2) ;
w h i l e ( nRed > 1 )
{
reduce <<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗ s i z e o f
(REAL)>>>(max , max , nRed , data . b l o c k s i z e ) ;
nRed = nRed / ( data . b l o c k s i z e ∗2) ;
}
cudaMemcpy ( data . max h , max , s i z e o f (REAL) ,
cudaMemcpyDeviceToHost ) ;
data . maxR = data . max h−>x ;
i f ( data . maxR < data . t o l /1 0) {
break ;
}
EXEC( plan , R, R, CUFFT FORWARD) ;
kernCG6<<< data . dimGrid , data . dimBlock>>>(data . n , data . dt ,
o p e r . Q, Zhat , R) ;
EXEC( plan , R, R, CUFFT INVERSE) ;
b e ta2 = Hm1InnerProd ( data , oper , R, Zhat , innerProd , Energy ,
plan ) ;
kernCG7<<< data . dimGrid , data . dimBlock>>>(data . n , beta1 , beta2
, Phat , Zhat ) ;

580
581
b e ta1 = beta2 ;
582
}
583
584 }
585
586 //
587 // newton e x e c u t e s Newton ’ s method f o r f i n d i n g z e r o s o f th e
f u n c t i o n and depends on th e conjGrad f u n c t i o n above
588 //
589
590 v o i d newton ( dataBin &data , operBin oper , COMPLEX ∗U, COMPLEX ∗W,
COMPLEX ∗Vhat , COMPLEX ∗R, COMPLEX ∗T1 , COMPLEX ∗LapU , COMPLEX
∗Zhat , COMPLEX ∗Phat , COMPLEX ∗JP , COMPLEX ∗ innerProd , REAL ∗
189

591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612

Energy , COMPLEX ∗max , c u f f t H a n d l e plan ) {
w h i l e ( data . l <= data . Nsteps ) {
conjGrad ( data , oper , U, W, R, T1 , LapU , Zhat , Phat , Vhat , JP ,
innerProd , Energy , max , plan ) ;
data . mtotal +=data .m;
i f ( data .m >= data . CGsteps ) { break ; }
data . l ++;
EXEC( plan , Vhat , Vhat , CUFFT INVERSE) ;
kernN1<<< data . dimGrid , data . dimBlock>>>(data . n , Vhat , W) ;
r e s i d u a l ( data , oper , U, W, R, T1 , p lan ) ;
reduce <<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗ s i z e o f (
REAL)>>>(R, max , data . n , data . b l o c k s i z e ) ;
i n t nRed = data . n / ( data . b l o c k s i z e ∗2) ;
w h i l e ( nRed > 1 )
{
reduce <<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗ s i z e o f
(REAL)>>>(max , max , nRed , data . b l o c k s i z e ) ;
nRed = nRed / ( data . b l o c k s i z e ∗2) ;
}
cudaMemcpy ( data . max h , max , s i z e o f (REAL) ,
cudaMemcpyDeviceToHost ) ;

613
614
data . maxR = data . max h−>x ;
615
i f ( data . maxR < data . t o l ) { break ; }
616
}
617 }
618
619 //
620 // The node s t r u c t u r e b u i l d s a l i n k e d l i s t t h a t c o l l e c t s e n e r g y
and time data
621 //
622
623 s t r u c t node {
624
REAL ctime ;
625
REAL e n e r g y ;
626
i n t m total ;
627
s t r u c t node ∗ next ;
628 } ;
629
190

630 v o i d addNode ( s t r u c t node∗& t a i l , REAL ctime ) {
631
s t r u c t node ∗ newNode = new node ;
632
newNode−>ctime = ctime ;
633
newNode−>e n e r g y = 0 . 0 ;
634
newNode−>mtotal = 0 ;
635
newNode−>next = NULL;
636
t a i l −>next = newNode ;
637
t a i l = newNode ;
638
}
639
640 //
641 // This i s where t h e main f u n c t i o n b e g i n s . To make s e n s e o f t h e
code , b e g i n h e r e .
642 //
643
644 i n t main ( i n t argc , ch ar ∗ c o n s t argv [ ] ) {
645
i f ( a r g c <=1 )
646
{
647
c o u t << ” Usage : ” << argv [ 0 ] << ” <f i l e n a m e > <output o f f /
i n c r e a s i n g / even ( 0 / 1 / 2 )>” << e n d l ;
648
exit (1) ;
649
}
650
651
time t tic , toc ;
652
time t tic2 , toc2 ;
653
time (& t i c ) ;
654
//
655
// I n i t i a l i z e n e c e s s a r y c o n s t a n t s
656
//
657
c o u t << s c i e n t i f i c ;
658
cout . p r e c i s i o n (10) ;
659
int i , j , k ;
660
661
fstream myfile , myfile2 , myfile3 , myfile4 ;
662
s t r i n g s t r e a m sstm ;
663
char datain [ 5 0 ] ;
664
665
REAL ctime , Tmax, Tpicstep , Lx , Ly , Lz , c1 , c2 , c3 , Ttol ,
Tfactor , Error ;
666
i n t Nx , Ny , Nz , Mx, My, Mz ;
667
b o o l CGflag = t r u e ;
668
669
REAL t P i c = 1 . 0 ;
670
s t r u c t node ∗ tEhead = NULL;
671
s t r u c t node ∗ t E t a i l = NULL;
191

672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713

tEhead = new node ;
tEhead−>ctime = 0 . 0 ;
tEhead−>e n e r g y = 0 . 0 ;
t E t a i l = tEhead ;
string f i l e i n , fileUout , filetEout ;
dataBin data ;
operBin oper ;
//
// Read i n a l l t h e n e c e s s a r y data from th e f i l e g i v e n by th e
user
//
m y f i l e 3 . open ( argv [ 1 ] , i o s : : i n ) ;
i f ( myfile3 . is open () )
{
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
f i l e i n = datain ;
c o u t << ” I n i t cond f i l e = ” << f i l e i n << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
fileUout = datain ;
c o u t << ” Write out s o l u t i o n f i l e = ” << f i l e U o u t << ” . dat ” <<
endl ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
f i l e t E o u t = datain ;
c o u t << ” Write out time and e n e r g y f i l e = ” << f i l e t E o u t << ” .
dat ” << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . e p s i l o n = a t o f ( d a t a i n ) ;
c o u t << ” e p s i l o n = ” << data . e p s i l o n << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . e t a 1 = a t o f ( d a t a i n ) ;
192

714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758

c o u t << ” e t a 1 = ” << data . e t a 1 << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . e t a 2 = a t o f ( d a t a i n ) ;
c o u t << ” e t a 2 = ” << data . e t a 2 << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . tau = a t o f ( d a t a i n ) ;
c o u t << ” tau = ” << data . tau << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . dt = a t o f ( d a t a i n ) ;
c o u t << ” dt = ” << data . dt << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Tmax = a t o f ( d a t a i n ) ;
c o u t << ”Tmax = ” << Tmax << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Tpicstep = atof ( datain ) ;
c o u t << ” T p i c s t e p = ” << T p i c s t e p << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Nx = a t o i ( d a t a i n ) ;
c o u t << ”Nx = ” << Nx << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Ny = a t o i ( d a t a i n ) ;
c o u t << ”Ny = ” << Ny << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Nz = a t o i ( d a t a i n ) ;
c o u t << ”Nz = ” << Nz << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Lx = a t o f ( d a t a i n ) ;
193

759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803

c o u t << ”Lx = ” << Lx << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Ly = a t o f ( d a t a i n ) ;
c o u t << ”Ly = ” << Ly << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Lz = a t o f ( d a t a i n ) ;
c o u t << ”Lz = ” << Lz << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
c1 = a t o f ( d a t a i n ) ;
c o u t << ” c1 = ” << c1 << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
c2 = a t o f ( d a t a i n ) ;
c o u t << ” c2 = ” << c2 << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
c3 = a t o f ( d a t a i n ) ;
c o u t << ” c3 = ” << c3 << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . CGsteps = a t o f ( d a t a i n ) ;
c o u t << ” CGsteps = ” << data . CGsteps << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . Nsteps = a t o f ( d a t a i n ) ;
c o u t << ” Nsteps = ” << data . Nsteps << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . t o l = a t o f ( d a t a i n ) ;
c o u t << ” t o l = ” << data . t o l << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Ttol = a t o f ( datain ) ;
194

804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847

c o u t << ” Ttol = ” << Ttol << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Tfactor = atof ( datain ) ;
c o u t << ” T f a c t o r = ” << T f a c t o r << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . b l o c k s i z e = a t o i ( d a t a i n ) ;
c o u t << ” Block s i z e = ” << data . b l o c k s i z e << e n d l ;
c o u t << ” Grid s i z e = ” << Nx∗Ny∗Nz/ data . b l o c k s i z e << e n d l ;
}
e l s e { cout << ” F i l e c o u l d not be re ad . ” << e n d l ; }
c t im e = 0 . 0 ;
i f (∗ argv [ 2 ] == ’ 2 ’ )
{
tPic = Tpicstep ;
}
Mx = Nx / 2 ;
My = Ny / 2 ;
Mz = Nz / 2 ;
data . n = Nx∗Ny∗Nz ;
data . dimBlock = data . b l o c k s i z e ;
data . dimGrid = data . n/ data . dimBlock . x ;

//
// B u i l d t he l a p l a c i a n array , and d y n a m i c a l l y a l l o c a t e o t h e r
needed a r r a y s
//
REAL ∗ xkvec , ∗ ykvec , ∗ zkvec ;
xkvec = new REAL[ Nx ] ;
ykvec = new REAL[ Ny ] ;
z k ve c = new REAL[ Nz ] ;
f o r ( i =0; i <Mx; i ++)
{
xkvec [ i ] = i ∗PI /Lx ;
xkvec [ i+Mx] = ( i −Mx) ∗PI /Lx ;
}
195

848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890

f o r ( i =0; i <My; i ++)
{
ykvec [ i ] = i ∗PI /Ly ;
ykvec [ i+My] = ( i −My) ∗PI /Ly ;
}
f o r ( i =0; i <Mz ; i ++)
{
z k vec [ i ] = i ∗PI /Lz ;
z k vec [ i+Mz ] = ( i −Mz) ∗PI /Lx ;
}
REAL ∗ Delx h , ∗ Dely h , ∗ Delz h , ∗ Delx , ∗ Dely , ∗ Delz , ∗ Energy ;
data . Energy h = new REAL;
Delx h = new REAL[ data . n ] ;
Dely h = new REAL[ data . n ] ;
D e l z h = new REAL[ data . n ] ;
f o r ( i =0; i <Nx ; i ++) {
f o r ( j =0; j <Ny ; j ++) {
f o r ( k=0; k<Nz ; k++)
{
Delx h [ k+Nz∗( j+Ny∗ i ) ] = xkvec [ i ] ;
Dely h [ k+Nz∗( j+Ny∗ i ) ] = ykvec [ j ] ;
D e l z h [ k+Nz ∗( j+Ny∗ i ) ] = zkvec [ k ] ;
}
}
}
c o u t << ” Del o p e r a t o r s b u i l t ” << e n d l ;
//
// I n i t i a l i z e a l l o f t he a r r a y s t h a t w i l l be used i n f o u r i e r
t r a n s f o r m s , and b u i l d th e p l a n s n e c e s s a r y
//
i n t nBytes = s i z e o f (COMPLEX) ∗ data . n ;
c o u t << ” nBytes i s = ” << nBytes << e n d l ;
COMPLEX ∗U h , ∗U, ∗W, ∗Wexp , ∗Vhat , ∗R, ∗T1 , ∗LapU , ∗Zhat , ∗Phat
, ∗JP , ∗DelxU , ∗DelyU , ∗DelzU , ∗ innerProd , ∗max ;
//
// Read i n i n i t a l U data from f i l e i n
//
196

891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934

U h = new COMPLEX[ nBytes ] ;
data . max h = new COMPLEX;
m y f i l e . open ( f i l e i n . c s t r ( ) , i o s : : i n ) ;
i f ( myfile . is open () )
{
f o r ( i =0; i <data . n ; i ++)
{
m y f i l e . g e t l i n e ( da t ai n , 50 ) ;
U h [ i ] . x = atof ( datain ) ;
U h[ i ]. y = 0.0;
}
c o u t << ” I n i t i a l data read i n s u c c e s s f u l l y ” << e n d l ;
} e l s e { cout << ” I n i t i a l data not r ead ” << e n d l ; }
myfile . close () ;
//
// A l l o c a t e memory on th e GPU and s e t up s p a c e f o r t he F o u r i e r
transforms
//
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d

∗∗)&U, nBytes ) ;
∗∗)&
W, nBytes ) ;
∗∗)&Wexp , nBytes ) ;
∗∗)&Vhat , nBytes ) ;
∗∗)&R, nBytes ) ;
∗∗)&T1 , nBytes ) ;
∗∗)&LapU , nBytes ) ;
∗∗)&Zhat , nBytes ) ;
∗∗)&Phat , nBytes ) ;
∗∗)&JP , nBytes ) ;
∗∗)&DelxU , nBytes ) ;
∗∗)&DelyU , nBytes ) ;
∗∗)&DelzU , nBytes ) ;
∗∗)&innerProd , nBytes ) ;
∗∗)&Energy , s i z e o f (REAL) ∗ data . n ) ;

cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d

∗∗)&op er . Lap , s i z e o f (REAL) ∗ data . n ) ;
∗∗)&op er .Q, s i z e o f (REAL) ∗ data . n ) ;
∗∗)&Delx , s i z e o f (REAL) ∗ data . n ) ;
∗∗)&Dely , s i z e o f (REAL) ∗ data . n ) ;
∗∗)&Delz , s i z e o f (REAL) ∗ data . n ) ;
197

935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972

cudaMalloc ( ( v o i d ∗∗)&max , s i z e o f (COMPLEX) ∗ data . dimGrid . x /2 ) ;
c u f f t H a n d l e p lan ;
c u f f t P l a n 3 d (&plan , Nx , Ny , Nz , TYPE) ;

cudaMemcpy ( Delx , Delx h , s i z e o f (REAL) ∗ data . n ,
cudaMemcpyHostToDevice ) ;
cudaMemcpy ( Dely , Dely h , s i z e o f (REAL) ∗ data . n ,
cudaMemcpyHostToDevice ) ;
cudaMemcpy ( Delz , Delz h , s i z e o f (REAL) ∗ data . n ,
cudaMemcpyHostToDevice ) ;
cudaMemcpy (U, U h , nBytes , cudaMemcpyHostToDevice ) ;
c o u t << ” I n i t i a l data c o p i e d ” << e n d l ;
//
// C a l c u l a t e th e c o n s t a n t a r r a y s Lap and Q t h a t w i l l be used
below
//
kernLapQ<<< data . dimGrid , data . dimBlock>>>(data . n , data . e p s i l o n ,
c1 , c3 , oper . Lap , oper .Q, Delx , Dely , Delz ) ;
c o u t << ”Lap and Q i n i t i a l i z e d ” << e n d l ;
//
// w h i l e l o o p g o v e r n i n g time s t e p p i n g
//
w h i l e ( ctime < Tmax)
{
time(& t i c 2 ) ;
//
// Prepare th e a r r a y s needed t o use CG based Newton ’ s method
//
EXEC( plan , U, U, CUFFT FORWARD) ;
kern1<<< data . dimGrid , data . dimBlock>>>(data . n , oper . Lap , U,
LapU) ;
EXEC( plan , LapU , LapU , CUFFT INVERSE) ;
EXEC( plan , U, U, CUFFT INVERSE) ;
t E t a i l −>e n e r g y = FCHenergy ( data , oper , U, Energy , Delx , Dely ,
Delz , LapU , DelxU , DelyU , DelzU , p lan ) ;
198

973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008

kern2<<< data . dimGrid , data . dimBlock>>>(data . n , data . e p s i l o n ,
data . eta1 , data . eta2 , data . tau , LapU , T1 , U, Wexp) ;
EXEC( plan , Wexp , Wexp , CUFFT FORWARD) ;
EXEC( plan , T1 , T1 , CUFFT FORWARD) ;
kern3<<< data . dimGrid , data . dimBlock>>>(data . n , data . dt , data .
e p s i l o n , oper . Lap , Wexp , T1) ;
EXEC( plan , Wexp , Wexp , CUFFT INVERSE) ;
kern4<<< data . dimGrid , data . dimBlock>>>(data . n , Wexp , U, W) ;
data . l = 0 ;
data .m = 0 ;
data . mtotal = 0 ;
//
// Apply Newton ’ s method
//
newton ( data , oper , U, W, Vhat , R, T1 , LapU , Zhat , Phat , JP ,
innerProd , Energy , max , plan ) ;
i f ( data .m >= data . CGsteps ) { cout << ”No Conjugate Gradient
c o n v e r g e n c e a t time t = ” << t E t a i l −> ctime << e n d l ; }
i f ( data . l >= data . Nsteps ) { cout << ”No Newton c o n v e r g e n c e a t
time t = ” << t E t a i l −> ctime << e n d l ; }
k e r n E r r o r <<< data . dimGrid , data . dimBlock>>>(data . n , R, Wexp , W
);
reduce <<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗ s i z e o f (
REAL)>>>(R, max , data . n , data . b l o c k s i z e ) ;
i n t nRed = data . n / ( data . b l o c k s i z e ∗2) ;
w h i l e ( nRed > 1 )
{
reduce <<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗ s i z e o f
(REAL)>>>(max , max , nRed , data . b l o c k s i z e ) ;
nRed = nRed / ( data . b l o c k s i z e ∗2) ;
}

199

1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032

cudaMemcpy ( data . max h , max , s i z e o f (REAL) ,
cudaMemcpyDeviceToHost ) ;
E r r o r = data . max h−>x ;
//
// Check t o s e e i f th e i t e r a t i o n s c o n v e r g e d then a d j u s t t he
time s t e p a c c o r d i n g l y
//
i f ( E r r o r > Ttol | | data .m >= data . CGsteps | | data . l >= data .
Nsteps ) {
c o ut << ” T o l e r a n c e f a i l ! Recomputing ” << E r r o r << ”\ t l = ”
<< data . l << ”\ t m = ” << data .m << e n d l ;
data . dt = data . dt / 1 . 3 ;
}
else {
c t ime = ctime + data . dt ;
t E t a i l −>mtotal = data . mtotal ;
//
// Update time and update U t o be t h e s o l u t i o n a t th e next
time s t e p
//
kern5<<< data . dimGrid , data . dimBlock>>>(data . n , U, W) ;
time(& t o c 2 ) ;
c o ut << ctime << ”\ t ” << data . dt << ”\ t ” << t E t a i l −>e n e r g y
<< ”\ t ” << data . l << ”\ t ” << data .m << ”\ t ” << data . mtotal << ”
\ t ” << d i f f t i m e ( toc2 , t i c 2 ) << e n d l ;

1033
1034

i f ( data . mtotal < data . CGsteps /10 && data . l < data . Nsteps /3 )
{
i f ( CGflag == t r u e ) {
data . dt = data . dt ∗min ( T f a c t o r ∗ s q r t ( Ttol / E r r o r ) , (REAL)

1035
1036
1.3) ;
1037
1038
1039
1040
1041
1042
1043

}
else {
data . dt = data . dt ∗min ( T f a c t o r ∗ s q r t ( Ttol / E r r o r ) , (REAL)
( 1 . 0 + 0 . 0 0 3 ∗ ( data . CGsteps/10− data . mtotal ) ) ) ;
}
}
else
{
200

data . dt = data . dt ∗min ( T f a c t o r ∗ s q r t ( Ttol / E r r o r ) , (REAL)

1044
1.0) ;
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087

CGflag = f a l s e ;
}
//
// Write out s o l u t i o n t o hard d i s k i f r e q u i r e d
//
i f ( abs ( ctime − t P i c ) < data . t o l /100 && ∗ argv [ 2 ] != ’ 0 ’ )
{
cudaMemcpy ( U h , U, nBytes , cudaMemcpyDeviceToHost ) ;
sstm << f i l e U o u t << ( i n t ) t P i c << ” . dat ” ;
m y f i l e 2 . open ( sstm . s t r ( ) . c s t r ( ) , i o s : : out ) ;
i f ( myfile2 . is open () )
{
f o r ( i =0; i <data . n ; i ++)
{
m y f i l e 2 << U h [ i ] . x << e n d l ;
}
}
cout << ” Writing out ” << sstm . s t r ( ) << e n d l ;
myfile2 . close () ;
sstm . s t r ( ”” ) ;
i f (∗ argv [ 2 ] == ’ 1 ’ )
{
t P i c += pow ( (REAL) 1 0 . 0 , (REAL) c e i l ( l o g 1 0 ( t P i c +1) ) −1) ;
}
else
{
t P i c += T p i c s t e p ;
}
}
//
// Adjust th e t i m e s t e p i f n e c e s s a r y
//
i f ( data . dt > (Tmax − ctime ) )
{
data . dt = Tmax − ctime ;
}

201

1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130

i f ( data . dt > ( t P i c − ctime ) && ∗ argv [ 2 ] != ’ 0 ’ )
{
data . dt = t P i c − ctime ;
}
addNode ( t E t a i l , ctime ) ;
}
}
t E t a i l −>e n e r g y = FCHenergy ( data , oper , W, Energy , Delx , Dely ,
Delz , LapU , DelxU , DelyU , DelzU , p lan ) ;
//
// Write f i n a l s o l u t i o n t o hard d i s k and compute i t s e n e r g y
//
sstm << f i l e U o u t << ( i n t ) Tmax << ” . dat ” ;
m y f i l e 2 . open ( sstm . s t r ( ) . c s t r ( ) , i o s : : out ) ;
i f ( myfile2 . is open () )
{
cudaMemcpy ( U h , U, nBytes , cudaMemcpyDeviceToHost ) ;
myfile2 . precision (15) ;
f o r ( i =0; i <data . n ; i ++)
{
m y f i l e 2 << U h [ i ] . x << e n d l ;
}
}
c o u t << ” Writing out ” << sstm . s t r ( ) << e n d l ;
myfile2 . close () ;
sstm . s t r ( ”” ) ;
sstm << f i l e t E o u t << ” . dat ” ;
m y f i l e 4 . open ( sstm . s t r ( ) . c s t r ( ) , i o s : : out ) ;
i f ( myfile4 . is open () )
{
w h i l e ( tEhead != NULL)
{
m y f i l e 4 << tEhead−>ctime << ”\ t ” << tEhead−>e n e r g y << ”\ t ”
<< tEhead−>mtotal << e n d l ;
t E t a i l = tEhead ;
tEhead = tEhead−>next ;
delete tEtail ;
}
202

1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175 }

}
myfile4 . close () ;
d e l e t e tEhead ;
//
// Perform c l e a n up o f memory used
//
cudaFree (U) ;
cudaFree ( U h ) ;
cudaFree (W) ;
cudaFree (Wexp) ;
cudaFree ( Vhat ) ;
cudaFree (R) ;
cudaFree (T1) ;
cudaFree (LapU) ;
cudaFree ( Zhat ) ;
cudaFree ( Phat ) ;
cudaFree ( JP ) ;
cudaFree ( DelxU ) ;
cudaFree ( DelyU ) ;
cudaFree ( DelzU ) ;
cudaFree ( i n n e r P r o d ) ;
cudaFree ( ope r . Lap ) ;
cudaFree ( ope r .Q) ;
cudaFree (max) ;
c u f f t D e s t r o y ( plan ) ;
delete
delete
delete
delete
delete
delete
delete
delete
delete

data . Energy h ;
data . max h ;
[] U h;
[ ] xkvec ;
[ ] ykvec ;
[ ] zkvec ;
[ ] Delx h ;
[ ] Dely h ;
[ ] Delz h ;

time (& t o c ) ;
c o u t << ”\n” << d i f f t i m e ( toc , t i c ) << e n d l ;
return 0;

203

./code/FCHImp.cu

204

Appendix C: ETD-RK2 Code in
CUDA
1
2
3
4

//
// FCHExInt . cu
//
// This code was w r i t t e n by J a y l a n Jones t o approximate s o l u t i o n s
to the
5 // F u n c t i o n a l i z e d Cahn−H i l l i a r d Equation u s i n g th e E x p o n e n t i a l
Time
6 // D i f f e r e n c i n g Runge−Kutta scheme t h a t i s second o r d e r a c c u r a t e
i n time .
7 //
8
9 #i n c l u d e <i o s t r e a m >
10 #i n c l u d e <f s t r e a m >
11 #i n c l u d e <sstream>
12 #i n c l u d e <cuda . h>
13 #i n c l u d e < c u f f t . h>
14 #i n c l u d e <time . h>
15 #i n c l u d e <math . h>
16
17 #d e f i n e PI 3 . 1 4 1 5 9 2 6 5 3 5 8 9
18 #d e f i n e REAL double
19 #d e f i n e COMPLEX cufftDoubleComplex
20 #d e f i n e TYPE CUFFT Z2Z
21 #d e f i n e EXEC c u f f t E x e c Z 2 Z
22
23 u s i n g namespace s t d ;
24
25 //
26 // B u i l d data s t r u c t u r e s n e c e s s a r y t o p a s s i n t o k e r n e l s
27 //
28
29 s t r u c t dataBin {
30
int n , blocksize ;
205

31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49

REAL dt , e p s i l o n , eta1 , eta2 , tau , mu, maxR, ∗ Energy h ;
dim3 dimBlock , dimGrid ;
COMPLEX ∗max h , ∗ z f ;
};
s t r u c t operBin {
REAL ∗Lap , ∗L ;
};
s t r u c t node {
REAL ctime ;
REAL e n e r g y ;
s t r u c t node ∗ next ;
};
int i ;

//
// This s e r i e s o f k e r n e l s c o m p a r t m e n t a l i z e s th e c a l c u l a t i o n i n t o
p e i c e s t h a t do not depend s e r i a l l y on data .
50 // The k e r n e l s a r e c a l l e d i n t t he main f u n c t i o n below .
51 //
52
53 //
54 // kernE1 and kernE2 a r e c a l l e d i n t he FCHenergy f u n c t i o n t h a t
c a l c u l a t e s th e FCH e n e r g y o f t he s o l u t i o n
55 //
56
57
global
v o i d kernE1 ( i n t n , COMPLEX ∗U, REAL ∗Lap , REAL ∗ Delx ,
REAL ∗ Dely , REAL ∗ Delz , COMPLEX ∗LapU , COMPLEX ∗DelxU , COMPLEX
∗DelyU , COMPLEX ∗DelzU )
58 {
59
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
60
i f ( idx < n)
61
{
62
U[ i d x ] . x = U[ i d x ] . x/n ;
63
U[ i d x ] . y = U[ i d x ] . y/n ;
64
65
LapU [ i d x ] . x = Lap [ i d x ] ∗U[ i d x ] . x ;
66
LapU [ i d x ] . y = Lap [ i d x ] ∗U[ i d x ] . y ;
67
68
DelxU [ i d x ] . x = −Delx [ i d x ] ∗U[ i d x ] . y ;
69
DelxU [ i d x ] . y = Delx [ i d x ] ∗U[ i d x ] . x ;
70
71
DelyU [ i d x ] . x = −Dely [ i d x ] ∗U[ i d x ] . y ;
206

72
DelyU [ i d x ] . y = Dely [ i d x ] ∗U[ i d x ] . x ;
73
74
DelzU [ i d x ] . x = −Delz [ i d x ] ∗U[ i d x ] . y ;
75
DelzU [ i d x ] . y = Delz [ i d x ] ∗U[ i d x ] . x ;
76
77
}
78 }
79
80
global
v o i d kernE2 ( i n t n , REAL e p s i l o n , REAL eta1 , REAL eta2 ,
REAL tau , REAL ∗ Energy , COMPLEX ∗U, COMPLEX ∗LapU , COMPLEX ∗
DelxU , COMPLEX ∗DelyU , COMPLEX ∗DelzU )
81 {
82
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
83
i f ( idx < n)
84
{
85
Energy [ i d x ] = 0 . 5 ∗ ( e p s i l o n ∗ e p s i l o n ∗LapU [ i d x ] . x−(U[ i d x ] . x∗U[ i d x
] . x−1) ∗(U[ i d x ] . x+tau / 2) ) ∗( e p s i l o n ∗ e p s i l o n ∗LapU [ i d x ] . x−(U[ i d x ] . x
∗U[ i d x ] . x−1) ∗(U[ i d x ] . x+tau / 2) )−e p s i l o n ∗( e t a 1 ∗ e p s i l o n ∗ e p s i l o n
/2∗( DelxU [ i d x ] . x∗DelxU [ i d x ] . x + DelyU [ i d x ] . x∗DelyU [ i d x ] . x +
DelzU [ i d x ] . x∗DelzU [ i d x ] . x ) + e t a 2 ∗ 0 . 5 ∗ (U[ i d x ] . x+1) ∗(U[ i d x ] . x+1)
∗ ( 0 . 5 ∗ (U[ i d x ] . x−1) ∗(U[ i d x ] . x−1)+tau /3∗(U[ i d x ] . x−2) ) ) ;
86
}
87 }
88
89 //
90 // kernLapL b u i l d s t h e L a p l a c i a n and L a r r a y s t o be used
throughout t he c a l c u l a t i o n
91 //
92
93
global
v o i d kernLapL ( i n t n , REAL e p s i l o n , REAL eta1 , REAL mu,
REAL ∗Lap , REAL ∗L , REAL ∗ Delx , REAL ∗ Dely , REAL ∗ Delz )
94 {
95
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
96
i f ( idx < n)
97
{
98
Lap [ i d x ] = −(Delx [ i d x ] ∗ Delx [ i d x ]+ Dely [ i d x ] ∗ Dely [ i d x ]+ Delz [ i d x
] ∗ Delz [ i d x ] ) ;
99
L [ i d x ] = Lap [ i d x ] ∗ ( ( e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] − mu + e p s i l o n ∗
e t a 1 ) ∗( e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] − mu) ) ;
100
}
101 }
102
103 //
104 // kernS1 , kernS2 , kernS3 , and kernS4 a c c e l e r a t e p e i c e s o f th e
f u n c t i o n sub
207

105 //
106
107
global
v o i d kernS1 ( i n t n , REAL tau , COMPLEX ∗V, COMPLEX ∗Q1)
108 {
109
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
110
i f ( idx < n)
111
{
112
Q1 [ i d x ] . x = ( 0 . 5 ∗ ( tau − 6 ) + V[ i d x ] . x ) ∗V[ i d x ] . x∗V[ i d x ] . x ;
113
Q1 [ i d x ] . y = ( 0 . 5 ∗ ( tau − 6 ) + V[ i d x ] . y ) ∗V[ i d x ] . y∗V[ i d x ] . y ;
114
}
115 }
116
117
global
v o i d kernS2 ( i n t n , REAL e p s i l o n , REAL mu, REAL ∗Lap ,
COMPLEX ∗V, COMPLEX ∗Q1 , COMPLEX ∗R)
118 {
119
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
120
i f ( idx < n)
121
{
122
V[ i d x ] . x = V[ i d x ] . x/n ;
123
V[ i d x ] . y = V[ i d x ] . y/n ;
124
125
R[ i d x ] . x = ( e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] − mu) ∗V[ i d x ] . x − Q1 [ i d x ] .
x/n ;
126
R[ i d x ] . y = ( e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] − mu) ∗V[ i d x ] . y − Q1 [ i d x ] .
y/n ;
127
}
128 }
129
130
global
v o i d kernS3 ( i n t n , REAL tau , COMPLEX ∗V, COMPLEX ∗R)
131 {
132
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
133
i f ( idx < n)
134
{
135
R[ i d x ] . x = ( 6 − tau − 3∗V[ i d x ] . x ) ∗V[ i d x ] . x∗R[ i d x ] . x ;
136
R[ i d x ] . y = ( 6 − tau − 3∗V[ i d x ] . y ) ∗V[ i d x ] . y∗R[ i d x ] . y ;
137
}
138 }
139
140
global
v o i d kernS4 ( i n t n , REAL e p s i l o n , REAL eta1 , REAL eta2 ,
REAL mu, REAL ∗Lap , COMPLEX ∗V, COMPLEX ∗Q1 , COMPLEX ∗R)
141 {
142
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
143
i f ( idx < n)
144
{

208

145

146

R[ i d x ] . x = Lap [ i d x ]∗( −( e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] − mu + e p s i l o n
∗ e t a 1 ) ∗Q1 [ i d x ] . x + R[ i d x ] . x + e p s i l o n ∗( e t a 1 − e t a 2 ) ∗(mu∗V[ i d x ] .
x + Q1 [ i d x ] . x ) ) ;
R[ i d x ] . y = Lap [ i d x ]∗( −( e p s i l o n ∗ e p s i l o n ∗Lap [ i d x ] − mu + e p s i l o n
∗ e t a 1 ) ∗Q1 [ i d x ] . y + R[ i d x ] . y + e p s i l o n ∗( e t a 1 − e t a 2 ) ∗(mu∗V[ i d x ] .
y + Q1 [ i d x ] . y ) ) ;
}

147
148 }
149
150 //
151 // kern0 , kern1 , and kern2 a c c e l e r a t e p o r t i o n s o f t h e main
function .
152 //
153
154
global
v o i d kern0 ( i n t n , COMPLEX ∗U, COMPLEX ∗V, i n t d i r e c t )
155 {
156
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
157
i f ( idx < n)
158
{
159
U[ i d x ] . x = V[ i d x ] . x − d i r e c t ;
160
U[ i d x ] . y = 0 . 0 ;
161
}
162 }
163
164
global
v o i d kern1 ( i n t n , REAL dt , REAL ∗L , COMPLEX ∗A, COMPLEX
∗V, COMPLEX ∗R)
165 {
166
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
167
i f ( idx < n)
168
{
169
A[ i d x ] . x = ( exp ( dt ∗L [ i d x ] ) ∗V[ i d x ] . x + ( exp ( dt ∗L [ i d x ] ) − 1 ) /L [
i d x ] ∗R[ i d x ] . x ) /n ;
170
A[ i d x ] . y = ( exp ( dt ∗L [ i d x ] ) ∗V[ i d x ] . y + ( exp ( dt ∗L [ i d x ] ) − 1 ) /L [
i d x ] ∗R[ i d x ] . y ) /n ;
171
}
172 }
173
174
global
v o i d kern2 ( i n t n , REAL dt , REAL ∗L , COMPLEX ∗A, COMPLEX
∗V, COMPLEX ∗R, COMPLEX ∗R2)
175 {
176
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
177
i f ( idx < n)
178
{
179
A[ i d x ] . x = A[ i d x ] . x/n ;
180
A[ i d x ] . y = A[ i d x ] . y/n ;
209

181
182
183

V[ i d x ] . x = A[ i d x ] . x + ( exp ( dt ∗L [ i d x ] ) − 1 − dt ∗L [ i d x ] ) / ( dt ∗L [
i d x ] ∗ L [ i d x ] ∗ n ) ∗(R2 [ i d x ] . x − R[ i d x ] . x ) ;
V[ i d x ] . y = A[ i d x ] . y + ( exp ( dt ∗L [ i d x ] ) − 1 − dt ∗L [ i d x ] ) / ( dt ∗L [
i d x ] ∗ L [ i d x ] ∗ n ) ∗(R2 [ i d x ] . y − R[ i d x ] . y ) ;

184
}
185 }
186
187 //
188 // k e r n E r r o r computes t he d i f f e r e n c e between s t a g e s t h a t i s used
i n t h e a d a p t i v e time s t e p p i n g
189 //
190
191
global
v o i d k e r n E r r o r ( i n t n , COMPLEX ∗R, COMPLEX ∗V, COMPLEX ∗
A)
192 {
193
i n t i d x = b l o c k I d x . x∗ blockDim . x+t h r e a d I d x . x ;
194
i f ( idx < n)
195
{
196
R[ i d x ] . x = f a b s (V[ i d x ] . x − A[ i d x ] . x ) ;
197
}
198 }
199
200 //
201 // warpReduce , reduce , warpReduceSum , and reduceSum p a r a l l e l i z e
t h e r e d u c t i o n s n e c e s s a r y t o c a l c u l a t e e n e r g y and e r r o r
202 //
203
204
device
v o i d warpReduce ( v o l a t i l e REAL ∗ sdata , u n s i g n e d i n t t i d ,
int blockSize )
205 {
206
i f ( b l o c k S i z e >= 6 4) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 3 2 ] ) ) ;
207
i f ( b l o c k S i z e >= 3 2) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 1 6 ] ) ) ;
208
i f ( b l o c k S i z e >= 1 6) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 8 ] ) ) ;
209
i f ( b l o c k S i z e >= 8 ) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 4 ] ) ) ;
210
i f ( b l o c k S i z e >= 4 ) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 2 ] ) ) ;
211
i f ( b l o c k S i z e >= 2 ) s d a t a [ t i d ] = max( f a b s ( s d a t a [ t i d ] ) , f a b s (
sdata [ t i d + 1 ] ) ) ;
212 }
213
210

214

global
v o i d r e d u c e (COMPLEX ∗ g i d a t a , COMPLEX ∗ g odata , i n t n ,
int blockSize )

215 {
216
extern
shared
REAL s d a t a [ ] ;
217
unsigned i n t t i d = threadIdx . x ;
218
u n s i g n e d i n t i = b l o c k I d x . x ∗( b l o c k S i z e ∗2) + t i d ;
219
u n s i g n e d i n t g r i d S i z e = b l o c k S i z e ∗2∗ gridDim . x ;
220
sdata [ t i d ] = 0;
221
222
w h i l e ( i < n ) { s d a t a [ t i d ] = max( f a b s ( g i d a t a [ i ] . x ) , f a b s ( g i d a t a [
i+b l o c k S i z e ] . x ) ) ; i += g r i d S i z e ; }
223
syncthreads () ;
224
225
i f ( b l o c k S i z e >= 5 12) { i f ( t i d < 256) { s d a t a [ t i d ] = max( f a b s (
sdata [ t i d ] ) , fabs ( sdata [ t i d + 256]) ) ; }
syncthreads () ; }
226
i f ( b l o c k S i z e >= 2 56) { i f ( t i d < 128) { s d a t a [ t i d ] = max( f a b s (
sdata [ t i d ] ) , fabs ( sdata [ t i d + 128]) ) ; }
syncthreads () ; }
227
i f ( b l o c k S i z e >= 1 28) { i f ( t i d < 6 4) { s d a t a [ t i d ] = max( f a b s (
sdata [ t i d ] ) , fabs ( sdata [ t i d + 6 4 ] ) ) ; }
syncthreads () ; }
228
i f ( t i d < 3 2) warpReduce ( sdata , t i d , b l o c k S i z e ) ;
229
i f ( t i d == 0 ) g o d a t a [ b l o c k I d x . x ] . x = s d a t a [ 0 ] ;
230 }
231
232
device
v o i d warpReduceSum ( v o l a t i l e REAL ∗ sdata , u n s i g n e d i n t
tid , int blockSize )
233 {
234
i f ( b l o c k S i z e >= 6 4) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 3 2 ] ;
235
i f ( b l o c k S i z e >= 3 2) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 1 6 ] ;
236
i f ( b l o c k S i z e >= 1 6) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 8 ] ;
237
i f ( b l o c k S i z e >= 8 ) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 4 ] ;
238
i f ( b l o c k S i z e >= 4 ) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 2 ] ;
239
i f ( b l o c k S i z e >= 2 ) s d a t a [ t i d ] = s d a t a [ t i d ] + s d a t a [ t i d + 1 ] ;
240 }
241
242
global
v o i d reduceSum (REAL ∗ g i d a t a , REAL ∗ g odata , i n t n , i n t
blockSize )
243 {
244
extern
shared
REAL s d a t a [ ] ;
245
unsigned i n t t i d = threadIdx . x ;
246
u n s i g n e d i n t i = b l o c k I d x . x ∗( b l o c k S i z e ∗2) + t i d ;
247
u n s i g n e d i n t g r i d S i z e = b l o c k S i z e ∗2∗ gridDim . x ;
248
sdata [ t i d ] = 0;
249
250
w h i l e ( i < n ) { s d a t a [ t i d ] = g i d a t a [ i ] + g i d a t a [ i+b l o c k S i z e ] ; i
+= g r i d S i z e ; }
211

251
252
253
254
255

syncthreads () ;
i f ( b l o c k S i z e >= 5 12) { i f ( t i d < 256) { s d a t a [ t i d ] = s d a t a [ t i d ]
+ sdata [ t i d + 2 5 6 ] ; }
syncthreads () ; }
i f ( b l o c k S i z e >= 2 56) { i f ( t i d < 128) { s d a t a [ t i d ] = s d a t a [ t i d ]
+ sdata [ t i d + 1 2 8 ] ; }
syncthreads () ; }
i f ( b l o c k S i z e >= 1 28) { i f ( t i d < 6 4) { s d a t a [ t i d ] = s d a t a [ t i d ]
+ sdata [ t i d + 6 4 ] ; }
syncthreads () ; }
i f ( t i d < 3 2) warpReduceSum ( sdata , t i d , b l o c k S i z e ) ;
i f ( t i d == 0 ) g o d a t a [ b l o c k I d x . x ] = s d a t a [ 0 ] ;

256
257
258 }
259
260 //
261 // FCHenergy computes t he e n e r g y o f th e s o l u t i o n U a t each
timestep
262 //
263
264 REAL FCHenergy ( dataBin &data , operBin oper , COMPLEX ∗U, REAL ∗
Energy , REAL ∗ Delx , REAL ∗ Dely , REAL ∗ Delz , COMPLEX ∗LapU ,
COMPLEX ∗DelxU , COMPLEX ∗DelyU , COMPLEX ∗DelzU , c u f f t H a n d l e
plan ) {
265
266
EXEC ( plan , U, U, CUFFT FORWARD) ;
267
268
kernE1<<< data . dimGrid , data . dimBlock>>>(data . n , U, oper . Lap ,
Delx , Dely , Delz , LapU , DelxU , DelyU , DelzU ) ;
269
270
EXEC( plan , LapU , LapU , CUFFT INVERSE) ;
271
EXEC( plan , DelxU , DelxU , CUFFT INVERSE) ;
272
EXEC( plan , DelyU , DelyU , CUFFT INVERSE) ;
273
EXEC( plan , DelzU , DelzU , CUFFT INVERSE) ;
274
EXEC( plan , U, U, CUFFT INVERSE) ;
275
276
kernE2<<< data . dimGrid , data . dimBlock>>>(data . n , data . e p s i l o n ,
data . eta1 , data . eta2 , data . tau , Energy , U, LapU , DelxU , DelyU ,
DelzU ) ;
277
278
reduceSum<<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗ s i z e o f (
REAL)>>>(Energy , Energy , data . n , data . b l o c k s i z e ) ;
279
280
i n t nRed = data . n / ( data . b l o c k s i z e ∗2) ;
281
w h i l e ( nRed > 1 )
282
{
283
reduceSum<<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗
s i z e o f (REAL)>>>(Energy , Energy , nRed , data . b l o c k s i z e ) ;
212

nRed = nRed / ( data . b l o c k s i z e ∗2) ;

284
285
286
287
288
289
290
291
292
293
294
295
296

}
cudaMemcpy ( data . Energy h , Energy , s i z e o f (REAL) ,
cudaMemcpyDeviceToHost ) ;
r e t u r n (∗ data . Energy h ) ;
}
//
// sub i s used t o compute both s t a g e s o f th e ETD
−RK2 scheme
//
v o i d sub ( dataBin &data , operBin oper , COMPLEX ∗V, COMPLEX ∗Q1 ,
COMPLEX ∗R, c u f f t H a n d l e plan ) {

297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318

kernS1<<< data . dimGrid , data . dimBlock>>>(data . n , data . tau , V, Q1
);
EXEC( plan , V, V, CUFFT FORWARD) ;
EXEC( plan , Q1 , Q1 , CUFFT FORWARD) ;
cudaMemcpy ( data . z f , V, s i z e o f (COMPLEX) , cudaMemcpyDeviceToHost ) ;
data . z f −>x = data . z f −>x / data . n ;
data . z f −>y = data . z f −>y / data . n ;
kernS2<<< data . dimGrid , data . dimBlock>>>(data . n , data . e p s i l o n ,
data . mu, ope r . Lap , V, Q1 , R) ;
EXEC( plan , V, V, CUFFT INVERSE) ;
EXEC( plan , R, R, CUFFT INVERSE) ;
kernS3<<< data . dimGrid , data . dimBlock>>>(data . n , data . tau , V, R)
;
EXEC( plan , V, V, CUFFT FORWARD) ;
EXEC( plan , R, R, CUFFT FORWARD) ;
kernS4<<< data . dimGrid , data . dimBlock>>>(data . n , data . e p s i l o n ,
data . eta1 , data . eta2 , data . mu, oper . Lap , V, Q1 , R) ;

319
320 }
321
322 //
213

323 // addNode adds a node t o t he l i n k e d l i s t t h a t c o l l e c t s t h e e n e r g y
and time s t e p s i z e s
324 //
325
326 v o i d addNode ( s t r u c t node∗& t a i l , REAL ctime ) {
327
s t r u c t node ∗ newNode = new node ;
328
newNode−>ctime = ctime ;
329
newNode−>e n e r g y = 0 . 0 ;
330
newNode−>next = NULL;
331
t a i l −>next = newNode ;
332
t a i l = newNode ;
333
}
334
335 //
336 // This i s where t h e main f u n c t i o n b e g i n s . To make s e n s e o f t h e
code , b e g i n h e r e .
337 //
338
339 i n t main ( i n t argc , ch ar ∗ c o n s t argv [ ] ) {
340
i f ( a r g c <=1 )
341
{
342
c o u t << ” Usage : ” << argv [ 0 ] << ” <f i l e n a m e > <output o f f /
i n c r e a s i n g / even ( 0 / 1 / 2 )>” << e n d l ;
343
exit (1) ;
344
}
345
346
time t tic , toc ;
347
time t tic2 , toc2 ;
348
time (& t i c ) ;
349
//
350
// I n i t i a l i z e n e c e s s a r y c o n s t a n t s
351
//
352
c o u t << s c i e n t i f i c ;
353
cout . p r e c i s i o n (10) ;
354
int i , j , k ;
355
356
fstream myfile , myfile2 , myfile3 , myfile4 ;
357
s t r i n g s t r e a m sstm ;
358
char datain [ 5 0 ] ;
359
360
REAL ctime , dtmax , Tmax, Tpicstep , TpicFactor , Lx , Ly , Lz , Ttol ,
Tfactor , Error ;
361
i n t Nx , Ny , Nz , Mx, My, Mz ;
362
363
REAL t P i c = 1 . 0 ;
214

364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405

s t r u c t node ∗ tEhead = NULL;
s t r u c t node ∗ t E t a i l = NULL;
tEhead = new node ;
tEhead−>ctime = 0 . 0 ;
tEhead−>e n e r g y = 0 . 0 ;
t E t a i l = tEhead ;
string f i l e i n , fileUout , filetEout ;
dataBin data ;
operBin oper ;
//
// Read i n a l l t h e n e c e s s a r y data from th e f i l e g i v e n by th e
user
//
m y f i l e 3 . open ( argv [ 1 ] , i o s : : i n ) ;
i f ( myfile3 . is open () )
{
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
f i l e i n = datain ;
c o u t << ” I n i t cond f i l e = ” << f i l e i n << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
fileUout = datain ;
c o u t << ” Write out s o l u t i o n f i l e = ” << f i l e U o u t << ” . dat ” <<
endl ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
f i l e t E o u t = datain ;
c o u t << ” Write out time and e n e r g y f i l e = ” << f i l e t E o u t << ” .
dat ” << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . e p s i l o n = a t o f ( d a t a i n ) ;
c o u t << ” e p s i l o n = ” << data . e p s i l o n << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
215

406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450

m y f i l e 3 >> d a t a i n ;
data . tau = a t o f ( d a t a i n ) ;
data .mu = 2 − data . tau ;
c o u t << ” tau = ” << data . tau << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . e t a 1 = a t o f ( d a t a i n ) ;
c o u t << ” e t a 1 = ” << data . e t a 1 << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . e t a 2 = a t o f ( d a t a i n ) ;
c o u t << ” e t a 2 = ” << data . e t a 2 << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . dt = a t o f ( d a t a i n ) ;
c o u t << ” dt = ” << data . dt << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
dtmax = a t o f ( d a t a i n ) ;
c o u t << ”dtmax = ” << dtmax << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Tmax = a t o f ( d a t a i n ) ;
c o u t << ”Tmax = ” << Tmax << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Tpicstep = atof ( datain ) ;
c o u t << ” T p i c s t e p = ” << T p i c s t e p << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
T p i cFa ctor = a t o f ( d a t a i n ) ;
c o u t << ” Tpi cFactor = ” << Tpic Factor << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Ttol = a t o f ( datain ) ;
c o u t << ” Ttol = ” << Ttol << e n d l ;

216

451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495

m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Tfactor = atof ( datain ) ;
c o u t << ” T f a c t o r = ” << T f a c t o r << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Nx = a t o i ( d a t a i n ) ;
c o u t << ”Nx = ” << Nx << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Ny = a t o i ( d a t a i n ) ;
c o u t << ”Ny = ” << Ny << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Nz = a t o i ( d a t a i n ) ;
c o u t << ”Nz = ” << Nz << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Lx = a t o f ( d a t a i n ) ;
c o u t << ”Lx = ” << Lx << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Ly = a t o f ( d a t a i n ) ;
c o u t << ”Ly = ” << Ly << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
Lz = a t o f ( d a t a i n ) ;
c o u t << ”Lz = ” << Lz << e n d l ;
m y f i l e 3 . i g n o r e ( 5 1 2 , ’= ’ ) ;
m y f i l e 3 >> d a t a i n ;
data . b l o c k s i z e = a t o i ( d a t a i n ) ;
c o u t << ” Block s i z e = ” << data . b l o c k s i z e << e n d l ;
c o u t << ” Grid s i z e = ” << Nx∗Ny∗Nz/ data . b l o c k s i z e << e n d l ;
}
e l s e { cout << ” F i l e c o u l d not be re ad . ” << e n d l ; }
c t im e = 0 . 0 ;

217

496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539

t P i c = T picFact or ;
i f (∗ argv [ 2 ] == ’ 2 ’ )
{
tPic = Tpicstep ;
}
Mx = Nx / 2 ;
My = Ny / 2 ;
Mz = Nz / 2 ;
data . n = Nx∗Ny∗Nz ;
data .mu = 2 − data . tau ;
data . dimBlock = data . b l o c k s i z e ;
data . dimGrid = data . n/ data . dimBlock . x ;

//
// B u i l d t he l a p l a c i a n array , and d y n a m i c a l l y a l l o c a t e o t h e r
needed a r r a y s
//
REAL ∗ xkvec , ∗ ykvec , ∗ zkvec ;
xkvec = new REAL[ Nx ] ;
ykvec = new REAL[ Ny ] ;
z k ve c = new REAL[ Nz ] ;
f o r ( i =0; i <Mx; i ++)
{
xkvec [ i ] = i ∗2∗ PI /Lx ;
xkvec [ i+Mx] = ( i −Mx) ∗2∗ PI /Lx ;
}
f o r ( i =0; i <My; i ++)
{
ykvec [ i ] = i ∗2∗ PI /Ly ;
ykvec [ i+My] = ( i −My) ∗2∗ PI /Ly ;
}
f o r ( i =0; i <Mz ; i ++)
{
z k vec [ i ] = i ∗2∗ PI /Lz ;
z k vec [ i+Mz ] = ( i −Mz) ∗2∗ PI /Lx ;
}
REAL ∗ Delx h , ∗ Dely h , ∗ Delz h , ∗ Delx , ∗ Dely , ∗ Delz , ∗ Energy ;
218

540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583

data . Energy h = new REAL;
Delx h = new REAL[ data . n ] ;
Dely h = new REAL[ data . n ] ;
D e l z h = new REAL[ data . n ] ;
f o r ( i =0; i <Nx ; i ++) {
f o r ( j =0; j <Ny ; j ++) {
f o r ( k=0; k<Nz ; k++)
{
Delx h [ k+Nz∗( j+Ny∗ i ) ] = xkvec [ i ] ;
Dely h [ k+Nz∗( j+Ny∗ i ) ] = ykvec [ j ] ;
D e l z h [ k+Nz ∗( j+Ny∗ i ) ] = zkvec [ k ] ;
}
}
}
c o u t << ” Del o p e r a t o r s b u i l t ” << e n d l ;
//
// I n i t i a l i z e a l l o f t he a r r a y s t h a t w i l l be used i n f o u r i e r
t r a n s f o r m s , and b u i l d th e p l a n s n e c e s s a r y
//
i n t nBytes = s i z e o f (COMPLEX) ∗ data . n ;
c o u t << ” nBytes i s = ” << nBytes << e n d l ;
COMPLEX ∗U h , ∗U, ∗V, ∗A, ∗Q1 , ∗R, ∗R2 , ∗LapU , ∗max ;
//
// Read i n i n i t a l U data from f i l e i n
//
U h = new COMPLEX[ nBytes ] ;
data . max h = new COMPLEX;
data . z f = new COMPLEX;
m y f i l e . open ( f i l e i n . c s t r ( ) , i o s : : i n ) ;
i f ( myfile . is open () )
{
f o r ( i =0; i <data . n ; i ++)
{
m y f i l e . g e t l i n e ( da t ai n , 50 ) ;
U h [ i ] . x = atof ( datain ) ;
U h[ i ]. y = 0.0;
}
219

584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623

c o u t << ” I n i t i a l data read i n s u c c e s s f u l l y ” << e n d l ;
} e l s e { cout << ” I n i t i a l data not r ead ” << e n d l ; }
myfile . close () ;
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d

∗∗)&U, nBytes ) ;
∗∗)&V, nBytes ) ;
∗∗)&A, nBytes ) ;
∗∗)&Q1 , nBytes ) ;
∗∗)&R, nBytes ) ;
∗∗)&R2 , nBytes ) ;
∗∗)&LapU , nBytes ) ;
∗∗)&Energy , s i z e o f (REAL) ∗ data . n ) ;

cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d
cudaMalloc ( ( v o i d

∗∗)&op er . Lap , s i z e o f (REAL) ∗ data . n ) ;
∗∗)&op er . L , s i z e o f (REAL) ∗ data . n ) ;
∗∗)&Delx , s i z e o f (REAL) ∗ data . n ) ;
∗∗)&Dely , s i z e o f (REAL) ∗ data . n ) ;
∗∗)&Delz , s i z e o f (REAL) ∗ data . n ) ;
∗∗)&max , s i z e o f (COMPLEX) ∗ data . dimGrid . x /2 ) ;

c u f f t H a n d l e p lan ;
c u f f t P l a n 3 d (&plan , Nx , Ny , Nz , TYPE) ;

cudaMemcpy ( Delx , Delx h , s i z e o f (REAL) ∗ data . n ,
cudaMemcpyHostToDevice ) ;
cudaMemcpy ( Dely , Dely h , s i z e o f (REAL) ∗ data . n ,
cudaMemcpyHostToDevice ) ;
cudaMemcpy ( Delz , Delz h , s i z e o f (REAL) ∗ data . n ,
cudaMemcpyHostToDevice ) ;
cudaMemcpy (U, U h , nBytes , cudaMemcpyHostToDevice ) ;
c o u t << ” I n i t i a l data c o p i e d ” << e n d l ;
//
// C a l c u l a t e th e c o n s t a n t a r r a y s Lap and L t h a t w i l l be used
below
//
kernLapL<<< data . dimGrid , data . dimBlock>>>(data . n , data . e p s i l o n ,
data . eta1 , data . mu, oper . Lap , oper . L , Delx , Dely , Delz ) ;
c o u t << ”Lap and L i n i t i a l i z e d ” << e n d l ;
sstm << f i l e t E o u t << ” . dat ” ;
m y f i l e 4 . open ( sstm . s t r ( ) . c s t r ( ) , i o s : : out ) ;
myfile4 . precision (15) ;
220

624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660

t E t a i l −>e n e r g y = FCHenergy ( data , oper , U, Energy , Delx , Dely ,
Delz , LapU , Q1 , R, R2 , plan ) ;
m y f i l e 4 << ctime << ”\ t ” << tEhead−>e n e r g y << e n d l ;
c o u t << time << ”\ t ” << 0 . 0 << ”\ t ” << t E t a i l −>e n e r g y << ”\ t ” <<
(REAL) 0 . 0 << e n d l ;
kern0<<< data . dimGrid , data . dimBlock>>>(data . n , V, U, (REAL) −1)
;
//
// w h i l e l o o p g o v e r n i n g time s t e p p i n g
//
w h i l e ( ctime < Tmax)
{
time(& t i c 2 ) ;
//
// C a l c u l a t e A and V
//
sub ( data , oper , V, Q1 , R, plan ) ;
kern1<<< data . dimGrid , data . dimBlock>>>(data . n , data . dt , oper .
L , A, V, R) ;
cudaMemcpy (A, data . z f , s i z e o f (COMPLEX) , cudaMemcpyHostToDevice
);
EXEC( plan , A, A, CUFFT INVERSE) ;
sub ( data , oper , A, Q1 , R2 , plan ) ;
kern2<<< data . dimGrid , data . dimBlock>>>(data . n , data . dt , oper .
L , A, V, R, R2) ;
cudaMemcpy (V, data . z f , s i z e o f (COMPLEX) , cudaMemcpyHostToDevice
);
EXEC( plan , V, V, CUFFT INVERSE) ;
EXEC( plan , A, A, CUFFT INVERSE) ;
k e r n E r r o r <<< data . dimGrid , data . dimBlock>>>(data . n , R, V, A) ;

221

661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697

reduce <<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗ s i z e o f (
REAL)>>>(R, max , data . n , data . b l o c k s i z e ) ;
i n t nRed = data . n / ( data . b l o c k s i z e ∗2) ;
w h i l e ( nRed > 1 )
{
reduce <<< data . dimGrid , data . dimBlock , data . b l o c k s i z e ∗ s i z e o f
(REAL)>>>(max , max , nRed , data . b l o c k s i z e ) ;
nRed = nRed / ( data . b l o c k s i z e ∗2) ;
}
cudaMemcpy ( data . max h , max , s i z e o f (REAL) ,
cudaMemcpyDeviceToHost ) ;
E r r o r = data . max h−>x ;
//
// Update time and update U t o be th e s o l u t i o n a t th e next
time s t e p
//
c t im e = ctime + data . dt ;
kern0<<< data . dimGrid , data . dimBlock>>>(data . n , U, V, 1 ) ;
t E t a i l −>e n e r g y = FCHenergy ( data , oper , U, Energy , Delx , Dely ,
Delz , LapU , Q1 , R, R2 , plan ) ;
time(& t o c 2 ) ;
m y f i l e 4 << ctime << ”\ t ” << t E t a i l −>e n e r g y << e n d l ;
c o u t << ctime << ”\ t ” << data . dt << ”\ t ” << t E t a i l −>e n e r g y <<
”\ t ” << d i f f t i m e ( toc2 , t i c 2 ) << e n d l ;
//
// C a l c u l a t e th e a d a p t i v e time s t e p
//
data . dt = min ( data . dt ∗min ( T f a c t o r ∗ s q r t ( Ttol / E r r o r ) , (REAL)
1 . 3 ) , dtmax ) ;
i f ( abs ( ctime − t P i c ) < 0 . 0 0 0 0 0 0 0 0 0 1 && ∗ argv [ 2 ] != ’ 0 ’ )
{
cudaMemcpy ( U h , U, nBytes , cudaMemcpyDeviceToHost ) ;

222

698

sstm << f i l e U o u t << ( i n t ) f l o o r ( t P i c / Tpic Factor + 0 . 5 ) << ” .
dat ” ;
m y f i l e 2 . open ( sstm . s t r ( ) . c s t r ( ) , i o s : : out ) ;
i f ( myfile2 . is open () )
{

699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739

myfile2 . precision (15) ;
f o r ( i =0; i <data . n ; i ++)
{
m y f i l e 2 << U h [ i ] . x << e n d l ;
}
}
c out << ” Writing out ” << sstm . s t r ( ) << e n d l ;
myfile2 . close () ;
sstm . s t r ( ”” ) ;
i f (∗ argv [ 2 ] == ’ 1 ’ )
{
t P i c += pow ( (REAL) 1 0 . 0 , (REAL) c e i l ( l o g 1 0 ( t P i c+TpicFa ctor )
) −1) ;
}
else
{
t P i c += T p i c s t e p ;
}
}
//
// Adjust th e t i m e s t e p i f n e c e s s a r y
//
i f ( data . dt > (Tmax − ctime ) )
{
data . dt = Tmax − ctime ;
}
i f ( data . dt > ( t P i c − ctime ) && ∗ argv [ 2 ] != ’ 0 ’ )
{
data . dt = t P i c − ctime ;
}
addNode ( t E t a i l , ctime ) ;
}

223

740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782

t E t a i l −>e n e r g y = FCHenergy ( data , oper , U, Energy , Delx , Dely ,
Delz , LapU , Q1 , R, R2 , plan ) ;
//
// Write out f i n a l s o l u t i o n data
//
sstm << f i l e U o u t << ( i n t ) f l o o r (Tmax/ TpicFa ctor + 0 . 5 ) << ” . dat ”
;
m y f i l e 2 . open ( sstm . s t r ( ) . c s t r ( ) , i o s : : out ) ;
i f ( myfile2 . is open () )
{
cudaMemcpy ( U h , U, nBytes , cudaMemcpyDeviceToHost ) ;
myfile2 . precision (15) ;
f o r ( i =0; i <data . n ; i ++)
{
m y f i l e 2 << U h [ i ] . x << e n d l ;
}
}
c o u t << ” Writing out ” << sstm . s t r ( ) << e n d l ;
myfile2 . close () ;
sstm . s t r ( ”” ) ;
w h i l e ( tEhead != NULL)
{
t E t a i l = tEhead ;
tEhead = tEhead−>next ;
delete tEtail ;
}
m y f i l e 4 << ctime << ”\ t ” << tEhead−>e n e r g y << e n d l ;
myfile4 . close () ;
d e l e t e tEhead ;
//
// Clean up a l l t h e memory used on t h e GPU and CPU
//
cudaFree (U) ;
cudaFree ( U h ) ;
cudaFree (V) ;
cudaFree (A) ;
cudaFree (Q1) ;
cudaFree (R) ;
224

783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805 }

cudaFree (R2) ;
cudaFree (LapU) ;
cudaFree ( ope r . Lap ) ;
cudaFree ( ope r . L) ;
cudaFree (max) ;
c u f f t D e s t r o y ( plan ) ;
delete
delete
delete
delete
delete
delete
delete
delete
delete

data . Energy h ;
data . max h ;
[] U h;
[ ] xkvec ;
[ ] ykvec ;
[ ] zkvec ;
[ ] Delx h ;
[ ] Dely h ;
[ ] Delz h ;

time (& t o c ) ;
c o u t << ”\n” << d i f f t i m e ( toc , t i c ) << e n d l ;
return 0;
./code/FCHExInt.cu

225

BIBLIOGRAPHY

226

BIBLIOGRAPHY
[1] L. D. Landau. On the theory of phase transitions, part ii. Sov. Phys. JETP, 11:627,
1937.
[2] L. D. Landau. On the theory of phase transitions, part i. Sov. Phys. JETP, 7:19ﬀ,
1937.
[3] V. L. Ginzburg and L. D. Landau. Concerning the theory of superconductivity. Soviet
Physics IETP, 20:1064–1082, 1950.
[4] G. J. Fix. Phase ﬁeld models for free boundary problems. Free boundary problems:
Theory and applications, 2:580–589, 1983.
[5] J. S. Langer. Models of pattern formation in ﬁrst-order phase transitions. Dir. in
Cond. Matter Phys., World Sci. Pub., pages 164–186, 1986.
[6] L. Q. Chen. Phase-ﬁeld models for microstructure evolution. Annu. Rev. Mat. Res.,
32(1):113–140, 2002.
[7] S. Wise, J. Kim, and J. Lowengrub. Solving the regularized, strongly anisotropic
cahn-hilliard equation by an adaptive nonlinear multigrid method. J. Comp. Phys.,
226(1):414–446, 2007.
[8] J. S. Rowlinson. Translation of jd van der waals’the thermodynamik theory of capillarity under the hypothesis of a continuous variation of density. J. Stat. Phys., 20(2):197–
200, 1979.
[9] J. D. Van der Waals. The thermodynamic theory of capillarity under the hypothesis
of a continuous variation of density. J. Stat. Phys., 20(2):200–244, 1979.
[10] J. W. Cahn and J. E. Hilliard. Free energy of a nonuniform system. i. interfacial free
energy. J. Chem. Phys., 28:258, 1958.
[11] R. L. Pego. Front migration in the nonlinear cahn-hilliard equation. Proc. R. Soc. A:
Math. Phys. and Eng. Sci., 422(1863):261, 1989.

227

[12] J. Philibert. One and a half century of diﬀusion: Fick, einstein, before and beyond.
Diﬀusion Fundamentals, 2(1):1–10, 2005.
[13] C. M. Elliott and Z. Songmu. On the cahn-hilliard equation. Arch. Ration. Mech.
Anal., 96(4):339–357, 1986.
[14] N. D. Alikakos, P. W. Bates, and X. Chen. Convergence of the cahn-hilliard equation
to the hele-shaw model. Arch. Ration. Mech. Anal., 128(2):165–205, 1994.
[15] L. Modica. The gradient theory of phase transitions and the minimal interface criterion.
Arch. Ration. Mech. Anal., 98(2):123–142, 1987.
[16] P. Sternberg. The eﬀect of a singular perturbation on nonconvex variational problems.
Arch. Ration. Mech. Anal., 101(3):209–260, 1988.
[17] P. De Mottoni and M. Schatzman. Geometrical evolution of developed interfaces.
Amer. Math. Soc., 347(5), 1995.
[18] P. B. Canham. The minimum energy of bending as a possible explanation of the
biconcave shape of the human red blood cell*. J. Theor. Bio., 26(1):61–81, 1970.
[19] W. Helfrich. Elastic properties of lipid bilayers: theory and possible experiments. Z.
Naturforsch, 28(11):693–703, 1973.
[20] Q. Du, C. Liu, and X. Wang. Simulating the deformation of vesicle membranes under
elastic bending energy in three dimensions. J. Comp. Phys., 212(2):757–777, 2006.
[21] R. D. Kamien. The geometry of soft materials: a primer. Rev. Mod. Phys., 74:953,
2002.
[22] M. E. Gurtin and M. E. Jabbour. Interface evolution in three dimensions with
curvature-dependent energy and surface diﬀusion: Interface-controlled evolution, phase
transitions, epitaxial growth of elastic ﬁlms. Arch. Ration. Mech. Anal., 163(3):171–
208, 2002.
[23] K. Promislow and B. Wetton. Pem fuel cells: a mathematical overview. SIAM J. Appl.
Math, 70(2):369–409, 2009.
[24] S. J. Paddison. Device and materials modeling in PEM fuel cells, volume 113. Springer
Verlag, 2009.

228

[25] S. Dai and K. Promislow. Geometric evolution of bi-layers under the functionalized
cahn-hilliard equation. Proc. R. Soc. A, 2013.
[26] G. Gompper, M. Schick, C. Domb, and J. L. Lebowitz. Phase Transitions and Critical
Phenomena: Self-assembling Amphiphilic Systems. Academic Press, 1994.
[27] G. Gompper and M. Schick. Correlation between structural and interfacial properties
of amphiphilic systems. Phys. Rev. Letters, 65(9):1116–1119, 1990.
[28] N. Gavish, G. Hayrapetyan, K. Promislow Promislow, and L. Bronsard. Curvature
driven ﬂow of bi-layer interfaces. Physica D: Nonlinear Phenomena, 240(7):675 – 693,
2011.
[29] W. Y. Hsu and T. D. Gierke. Ion transport and clustering in naﬁon perﬂuorinated
membranes. J. Memb. Sci., 13(3):307–326, 1983.
[30] A. S. Ioselevich, A. A. Kornyshev, and J. H. G. Steinke. Fine morphology of protonconducting ionomers. J. Phys. Chem. B, 108(32):11953–11963, 2004.
[31] L. Rubatat, G. Gebel, and O. Diat. Fibrillar structure of naﬁon: Matching fourier and
real space studies of corresponding ﬁlms and solutions. Macromolecules, 37(20):7772–
7783, 2004.
[32] K. Schmidt-Rohr and Q. Chen. Parallel cylindrical water nanochannels in naﬁon fuelcell membranes. Nature Materials, 7(1):75–83, 2007.
[33] W. G. F. Grot, G. E. Munn, and P. N. Walmsley. Perﬂuorinated ion exchange membranes. In 141th National Meeting, The Electrochemical Society, Houston, Texas, 1972.
[34] D. J. Eyre. Unconditionally gradient stable time marching the cahn-hilliard equation.
In Computational and mathematical models of microstructural evolution, volume 529 of
Materials Research Society Symposia Proceedings, pages 39–64, Warrendale, PA, 1998.
[35] T. Kupper and N. Masbaum. Simulation of particle growth and ostwald ripening via
the cahn-hilliard equation. Acta metallurgica et materialia, 42(6):1847–1858, 1994.
[36] C. M. Elliott and D. A. French. Numerical studies of the cahn-hilliard equation for
phase separation. IMA J. Appl. Math., 38(2):97, 1987.
[37] C. M. Elliott, D. A. French, and F. A. Milner. A second order splitting method for
the cahn-hilliard equation. Numerische Mathematik, 54(5):575–590, 1989.
229

[38] C. M. Elliott and D. A. French. A nonconforming ﬁnite-element method for the twodimensional cahn-hilliard equation. SIAM J. Num. Anal., pages 884–903, 1989.
[39] Q. Du and R. A. Nicolaides. Numerical analysis of a continuum model of phase transition. SIAM J. Num. Anal., 28(5):1310–1322, 1991.
[40] J. W. Barrett, J. F. Blowey, and H. Garcke. Finite element approximation of the
cahn-hilliard equation with degenerate mobility. SIAM J. Num. Anal., 37(1):286–318,
1999.
[41] X. Feng and A. Prohl. Error analysis of a mixed ﬁnite element method for the cahnhilliard equation. Numerische Mathematik, 99(1):47–84, 2004.
[42] D. Kay and R. Welford. A multigrid ﬁnite element solver for the cahn-hilliard equation.
J. Comp. Phys., 212(1):288–304, 2006.
[43] J. Kim, K. Kang, and J. Lowengrub. Conservative multigrid methods for cahn-hilliard
ﬂuids. J. Comp. Phys., 193(2):511–543, 2004.
[44] Z. Hu, S. M. Wise, C. Wang, and J. S. Lowengrub. Stable and eﬃcient ﬁnite-diﬀerence
nonlinear-multigrid schemes for the phase ﬁeld crystal equation. J. Comp. Phys.,
228(15):5323–5339, 2009.
[45] S. M. Wise. Unconditionally stable ﬁnite diﬀerence, nonlinear multigrid simulation of
the cahn-hilliard-hele-shaw system of equations. J. Sci. Comp., 44(1):38–68, 2010.
[46] D.J. Eyre. An unconditionally stable one-step scheme for gradient systems. Unpublished
article, 1998.
[47] D. Furihata. A stable and conservative ﬁnite diﬀerence scheme for the cahn-hilliard
equation. Numerische Mathematik, 87(4):675–699, 2001.
[48] J. Zhu, L. Q. Chen, J. Shen, and V. Tikare. Coarsening kinetics from a variablemobility cahn-hilliard equation: Application of a semi-implicit fourier spectral method.
Phys. Rev. E, 60(4):3564, 1999.
[49] B. P. Vollmayr-Lee and A. D. Rutenberg. Fast and accurate coarsening simulation
with an unconditionally stable time step. Phys. Rev. E, 68(6):066703, 2003.
[50] X. Ye and X. Cheng. The fourier spectral method for the cahn-hilliard equation. Appl.
Math. Comp., 171(1):345–357, 2005.
230

[51] J. Shen and X. Yang. Numerical approximations of allen-cahn and cahn-hilliard equations. Discr. and Cont. Dynam. Sys., 28(4):1669–1691, 2010.
[52] G. N. Wells, E. Kuhl, and K. Garikipati. A discontinuous galerkin method for the
cahn-hilliard equation. J. Comp. Phys., 218(2):860–877, 2006.
[53] Y. Xia, Y. Xu, and C. W. Shu. Local discontinuous galerkin methods for the cahnhilliard type equations. J. Comp. Phys., 227(1):472–491, 2007.
[54] S. M. Wise, C. Wang, and J. S. Lowengrub. An energy-stable and convergent ﬁnitediﬀerence scheme for the phase ﬁeld crystal equation. SIAM J. Num. Anal., 47:2269,
2009.
[55] H. Gomez and T. J. R. Hughes. Provably unconditionally stable, second-order
time-accurate, mixed variational methods for phase-ﬁeld models. J. Comp. Phys.,
230(13):5310–5327, 2011.
[56] J. Shen and X. Yang. Energy stable schemes for cahn-hilliard phase-ﬁeld model of twophase incompressible ﬂows. Chinese Annals of Mathematics, Series B, 31(5):743–758,
2010.
[57] D. A. French. Computations on the cahn-hilliard model of solidiﬁcation. Appl. Math.
Comp., 40(1):55–76, 1990.
[58] M. Kronbichler and G. Kreiss. A hybrid level-set-cahn-hilliard model for two-phase
ﬂow. In 1st European Conference on Microﬂuidics, Bologna, Italy, 2008.
[59] Mark Ryerson Willoughby. High-Order Time-Adaptive Numerical Methods For The
Allen-Cahn and Cahn-Hilliard Equations. PhD thesis, University of British Columbia,
2011.
[60] J Certaine. The solution of ordinary diﬀerential equations with large time constants.
Mathematical methods for digital computers, pages 128–132, 1960.
[61] D. A. Pope. An exponential method of numerical integration of ordinary diﬀerential
equations. Comm. of the ACM, 6(8):491–493, 1963.
[62] B. V. Minchev, W. M. Wright, et al. A review of exponential integrators for ﬁrst order
semi-linear problems. Model. Ident. Cont., 27(4):201, 2006.

231

[63] T. Y. Hou, J. S. Lowengrub, and M. J. Shelley. Removing the stiﬀness from interfacial
ﬂows with surface tension. J. Comp. Phys., 114(2):312–338, 1994.
[64] G. Beylkin, J. M. Keiser, and L. Vozovoi. A new class of time discretization schemes
for the solution of nonlinear pdes. J. Comp. Phys., 147(2):362–387, 1998.
[65] S. M. Cox and P. C. Matthews. Exponential time diﬀerencing for stiﬀ systems. J.
Comp. Phys., 176(2):430–455, 2002.
[66] Q. Du and W. Zhu. Stability analysis and applications of the exponential time diﬀerencing schemes. J. Comp. Math., 22(2):200–209, 2004.
[67] Q. Du and W. Zhu. Analysis and applications of the exponential time diﬀerencing
schemes and their contour integration modiﬁcations. BIT Num. Math., 45(2):307–328,
2005.
[68] A. Kassam. High order timestepping for stiﬀ semilinear partial diﬀerential equations.
PhD thesis, University of Oxford, 2004.
[69] A. Kassam and L. N. Trefethen. Fourth-order time-stepping for stiﬀ pdes. SIAM J.
Sci. Comp., 26(4):1214–1233, 2005.
[70] M. Hochbruck and A. Ostermann. Exponential runge–kutta methods for parabolic
problems. Appl. Numer. Math., 53(2):323–339, 2005.
[71] M. Hochbruck and A. Ostermann. Explicit exponential runge–kutta methods for semilinear parabolic problems. SIAM J. Num. Anal., 43(3):1069–1090, 2005.
[72] M. Hochbruck and A. Ostermann. Exponential integrators. Acta Numerica, 19(1):209–
286, 2010.
[73] M. Tokman. Eﬃcient integration of large stiﬀ systems of odes with exponential propagation iterative (epi) methods. J. Comp. Phys., 213(2):748–776, 2006.
[74] M. Tokman and J. Loﬀeld. Eﬃcient design of exponential-krylov integrators for large
scale computing. Procedia Computer Science, 1(1):229–237, 2010.
[75] M. Tokman. A new class of exponential propagation iterative methods of runge–kutta
type (epirk). J. Comp. Phys., 230(24):8762–8778, 2011.

232

[76] A. Koskela and A. Ostermann. Exponential taylor methods: Analysis and implementation. Comp. Math. Applic., 2012.
[77] L. Trefethen. Spectral Methods in MATLAB. Society for Indus- trial and Applied
Mathematics, 2000.
[78] A. Christlieb and B. Ong. Implicit parallel time integrators. J. Sci. Comp., pages
1–13, 2010.
[79] Z. Xu, A. Christlieb, and K. Promislow. On the unconditionally gradient stable scheme
for the cahn-hilliard equation and its implementation with fourier method. Comm.
Math. Sci., 11(2):345, 2013. This paper will be submitted for publication within the
next six months.
[80] Nvidia’s next generation cuda compute architecture: Fermi. Technical report, Nvidia,
http://www.nvidia.com/object/fermi architecture.html, 2011.
[81] S. Jain and F. S. Bates. Consequences of nonergodicity in aqueous binary peo-pb
micellar dispersions. Macromolecules, 37(4):1511–1523, 2004.
[82] D. Bendejacq, M. Joanicot, and V. Ponsinet. Pearling instabilities in water-dispersed
copolymer cylinders with charged brushes. Euro. Phys. J. E: Soft Matter and Bio.
Phys., 17(1):83–92, 2005.
[83] L. G. Reyna and M. J. Ward. Metastable internal layer dynamics for the viscous
cahn-hilliard equation. Meth. Appl. Anal., 2:285–306, 1995.
[84] A. J. Bray. Theory of phase- ordering kinetics. Adv. Phys., 43:357, 1994.
[85] P. Yue, J. Feng, C. Liu, and J. Shen. A diﬀuse-interface method for simulating twophase ﬂows of complex ﬂuids. J. Fluid. Mech., 515:293–317, 2004.
[86] D. M. Anderson, G. B. McFadden, and A. A. Wheeler. Diﬀuse-interface methods in
ﬂuid mechanics. Annu. Rev. Fluid. Mech, 30:139–165, 1998.
[87] S. M. Allen and J. W. Cahn. A microscopic theory for antiphase boundary motion and
its application to antiphase domain coarsening. Acta Metall. Mater., 27:1085–1095,
1979.
[88] E. Hairer, S.P. Nørsett, and G. Wanner. Solving Ordinary Diﬀerential Equations II Stiﬀ and Diﬀerential-Algebraic Problems. Springer-Verlag, Berlin Heidelberg, 1991.
233

[89] E. Hairer, S.P. Nørsett, and G. Wanner. Solving Ordinary Diﬀerential Equations I Nonstiﬀ problems. Springer-Verlag Berlin Heidelberg, 2 edition, 1993.
[90] Y. Saad. Iterative methods for sparse linear systems. Society for Industrial and Applied
Mathematics, Philadelphia, Pa, 2nd edition, 2003.
[91] A. Christlieb, K. Promislow, and Z. Xu. On the unconditionally gradient stable scheme
for the cahn-hilliard equation and its implementation with fourier method. Comm.
Math. Sci, 2013.
[92] N. Gavish, J. Jones, Z. Xu, A. Christlieb, and K. Promislow. Variational models
of network formation and ion transport: applications to perﬂuorosulfonate ionomer
membranes. Polymers, 4:630–655, 2012.
[93] L. Bronsard and F. Reitich. On three-phase boundary motion and the singular limit of
a vector-valued ginzburg–landau equation. Arch. Ration. Mech. Anal., 124(355–379),
1993.
[94] J. Carr and R. Pego. Metastable patterns in solutions of ut = 2 uxx − f (u). Comm.
Pure Appl. Math, 42:523–576, 1989.
[95] L.C. Evans. Partial Diﬀerential Equations. American Mathematical Society, 2 edition,
2010.
[96] J. C. Gower. Properties of euclidean and non-euclidean distance matrices. Lin. Alg.
Applic., 67:81–97, 1985.
[97] R. Alexander. Diagonally implicit runge-kutta methods for stiﬀ odes. SIAM J. Num.
Anal., 14:1006–1021, 1977.
[98] M. R. Willoughby. High-order time adaptive numerical methods for the allen-cahn and
cahn-hilliard equations. Msc thesis, Institute of Applied Mathematics, University of
British Columbia, 2011.
[99] J. J. B. de Swart and G. S¨derland. On the construction of error estimators for implicit
o
runge-kutta methods. J. Comp. Appl. Math., 86:347–358, 1997.
[100] A. Dutt, L. Greengard, and V. Rokhlin. Spectral deferred correction methods for
ordinary diﬀerential equations. BIT, 40:241–266, 2000.

234

[101] S. Boscarino. On an accurate third order implicit-explicit runge-kutta method for stiﬀ
problems. Appl. Numer. Math, 59:1515–1528, 2009.
[102] U. M. Ascher, S. J. Ruuth, and B. R. Wetton. Implicit-explicit methods for timedependent partial diﬀerential equations. SIAM J. Num. Anal., 32:797–823, 1995.
[103] U. M. Ascher, S. J. Ruuth, and R. J. Spiteri. Implicit-explicit runge-kutta methods for
time-dependent partial diﬀerential equations. Appl. Numer. Math, 25:151–167, 1997.
[104] M.L. Minion. Semi-implicit spectrally deferred correction methods for ordinary diﬀerential equations. Comm. Math. Sci, 1:471–500, 2003.
[105] M. Cheng and A. D. Rutenberg. Maximally fast coarsening algorithms. Phys. Rev. E,
72:055701(R), 2005.
[106] M. Cheng and J. A. Warren. Controlling the accuracy of unconditionally stable algorithms in the cahn-hilliard equation. Phys. Rev. E, 75:017702, 2007.
[107] C. Wang, X. Wang, and S. M. Wise. Unconditionally stable schemes for equations of
thin ﬁlm epitaxy. Discrete Cont. Dyn. Sys. Ser. A, 28:405–423, 2010.
[108] J. Shen, C. Wang, X. Wang, and S. M. Wise. Second-order convex splitting schemes
for gradient ﬂows with ehrlich-schwoebel-type energy: Application to thin ﬁlm epitaxy.
SIAM J. Num. Anal., 50:105–125, 2012.
[109] A. Christlieb, J. Jones, K. Promislow, B. Wetton, and M. Willoughby. High accuracy
solutions to energy gradient ﬂows from material science models. J. Comp. Phys., 2013.
[110] N. J. Higham. Accuracy and Stability of Numberical Algorithms. Number 48. Siam,
1996.
[111] W. Feng and K. W. Cameron. The green500 list: Encouraging sustainable supercomputing. Computer, 40(12):50–55, 2007.
[112] W. Shinoda, T. Nakamura, and S. O. Nielsen. Free energy analysis of vesicle-to-bicelle
transformation. Soft Matter, 7(19):9012–9020, 2011.
[113] S. Yamamoto, Y. Maruyama, and S. Hyodo. Dissipative particle dynamics study of
spontaneous vesicle formation of amphiphilic molecules. J. Chem. Phys., 116:5842,
2002.

235