. u. .
hﬁwedmi

£3. w.

 

:1

. 3Q: It;
.. .
”my;

 

 

'A. .
RE..." 4 v .5. As“
5.5,: s; 3...
Vxﬁmuwnm .Wm...uuh as... ,
..

 

.
It}!

 

 

 

 

ﬂ. ..
:.
.{Pvﬂ

 

 

I3... .114...
J .5 it! .I.
.a I. .511...

Lang: 4 .nQv.i.é§..~G=.y.uuwmx.: . a V
:84“.ha tan. 4% WWVIHWlehmJaM,..wa . $173 an...”

twat.“

 

 

 

1mm Michigan-State
3 _ University
2000!

This is to certify that the
dissertation entitled

FAST COMPUTATIONAL TECHNIQUES FOR MULTISCALE
ELECTROMAGNETIC SIMULATIONS

presented by
Vikram Melapudi

has been accepted towards fulﬁllment
of the requirements for the

Ph.D. degree in Electrical ErgineerinL

 

 

{Ami/é

 

"\ Major Professor’s Signature
MM Vi, 200?

Date

 

MSU is an Afﬁnnative Action/Equal Opportunity Employer

4 —.-n-I-I-I-O-O-O-O-O-l-l-l-I-O-I-l-O--o—I0-'-—--—-—-—-—-—-—-—-— — —- -

 

 

 

PLACE IN RETURN BOX to remove this checkout from your record.
To AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5/08 K:/Prolecc&Pres/ClRC/DateDue indd

FAST COMPUTATIONAL TECHNIQUES FOR MULTISCALE
ELECTROMAGNETIC SIMULATIONS

By
Vikram Melapudi

A DISSERTATION
Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY

Electrical Engineering

2009

ABSTRACT

FAST COMPUTATIONAL TECHNIQUES FOR MULTISCALE
ELECTROMAGNETIC SIMULATIONS

By
Vikram Melapudi

Multiscale electromagnetic simulations contain features with multiple length or
frequency scales or both. Multiscale features are characteristic of realitic simulations
as large degrees of freedom (N) are required to capture the minute physical details.
Though integral equation (IE) approaches are well-suited for electromagnetic simu-
lations, they require repeated evaluation of pair-wise potentials - also referred to as
N -body problems. It is well known that the direct computation of these potentials
scales as 0(N2) both in terms of computer memory and time. Even with the rapid
advancements in computer technology, this places severe limitation on the size of the
problem (N) that can be analyzed in a realistic time frame. Further, multiscale sim-
ulations produce badly-conditioned systems of equations that require large number
of iterations when using Krylov-subspace solvers. The main goal of this thesis is to
develop a suite of computational techniques that enables multiscale electromagnetic
simulations in a fast, efﬁcient and stable fashion. In this work, the accelerated Carte-
sian expansion (ACE) algorithm is used to overcome the quadratic cost-scaling of
N -body problems. ACE was intially developed for the fast evaluation of polynomial
potentials and here it is extended to the fast computation of retarded and Helmholtz
potentials. These algorithms are shown to be stable and efﬁcient for computation of
electromagnetic potentials at sub-wavelength or low—frequency scales Hybrid com-
bination of these algorithms with existing fast methods leads to the development of
multiscale electromagnetic solvers that are stable and efﬁcient across length and fre-
quency scales. Since the fast algorithms only reduce the time spent in each iteration, a

new integral equation formulation is developed that yields better conditioned systems

of equations. This is achieved by reformulating the augmented ﬁeld integral equations
such that the resulting Operators are bounded and compact. Further, the widespread
availability of parallel distributed or cluster computers combined with the memory
and speed restriction of single processor computers necessitates the development of
efﬁcient parallel implementation of the sophisticated fast algorithms. The parallel
algorithms developed in this work are provably scalabale and enables simulation of
problems with several millions of unknowns on large scale clusters, with hundreds
of processors and beyond. In this thesis, ACE algorithm is also extended to rapid

computation of time domain diffusion potentials.

ACKNOWLEDGMENT

During the past six years of my Ph.D. life at Michigan State University, I have been
inﬂuenced by many people. I wish to acknowledge the select few who have left an
everlasting impression on me.

I start by thanking my advisor Prof. Shanker Balasubramaniam. More than an
advisor, he has been a good friend and his contagious enthusiasm and rigor in research
continues to amaze me. It is difﬁcult to encompass my gratitude and appreciation
towards him in few words.

I thank my committee members Prof. Lalita Udpa, Prof. Satish Udpa and Prof.
Neil Wright for all the support, advice and time they have provided me. Special
regards to Prof. Lalita Udpa, my ﬁrst advisor, for showing the support and conﬁdence
in me to proceed with Ph.D. The frank discussions with both Prof. Udpa(s) have
helped be at home in MSU. I also thank professors E. Rothwell, A. Tamburrino and
C. Y. Wang (Math Dept.) for their inﬂuential research perspectives and exciting
courses.

I carry with me special memories of all the colleagues and friends in NDEL and
BSer. I thank them for granting me a fun-ﬁlled research life. I am indebted to specially
mention Kavitha, Naveen, Sridhar, Gokul and Yiming. I add my heart ﬁlled gratitude
toward my friends Kavitha and Sanketh for their support and affection.

Finally, I would like to thank my family Mr. Karunakara Reddy (father), Mrs.
Bhargavi (mother) and Mr. Vinod (brother). Their unconditional love, moral support
and appreciation of my passion continues to help me lead a satisfactory life, while

maintaining my unconventional perspectives.

iv

TABLE OF CONTENTS

List of Tables ................................. viii
List of Figures . . . ., ............................ x
Introduction 1
1.1 Background ................................ 1
1.2 Hierarchical Computation Scheme .................... 3
1.2.1 Preliminaries ........................... 4

1.3 Static fast multipole method ....................... 7
1.3.1 Single level scheme ........................ 8
Spherical harmonics ....................... 8

1.3.2 Multilevel FMM algorithm .................... 10

1.3.3 Diagonalized translation Operators ............... 13

1.4 FMM for Helmholtz equations ...................... 15
Single Level FMM ........................ 17

Multilevel FMM .......................... 20

1.4.1 Other FMMs ........................... 25

1.4.2 Wideband FMM ......................... 27
Scaled expansions ......................... 27

Spectral representation based plane wave expansions ..... 28

1.5 Applications ................................ 30
1.6 Thesis Objectives and Outline ...................... 32
Accelerated Cartesian Expansions (ACE) 39
2.1 Introduction ................................ 39
2.2 Cartesian Tensors ............................. 41
2.3 ACE: Deﬁnitions and Theorems ..................... 41
2.4 Multi-level Tree Computational Framework ............... 45
2.5 Algorithmic Improvements ........................ 48
2.5.1 Reduced Interaction List ..................... 48
2.5.2 Compressed Oct-tree ....................... 49
Fast Evaluation of Time Domain Retarded Potential in Sub-Wavelength
Structures 51
3.1 Introduction ................................ 52
3.2 Problem Statement ............................ 54
3.3 Single Time Step Geometries ....................... 56
3.4 Multiple time step interaction ...................... 58

V

3.5 Results and Discussion .......................... 61

Wideband FMM and Multiscale Electromagnetic Solver in Frequency

Domain 73
4.1 Introduction ................................ 74
4.2 Integral Equation and FMM for Helmholtz
Equations ................................. 76
4.2.1 Sub—wavelength breakdown of Helmholtz FMM ........ 78
4.3 ACE translation operator for Helmholtz potential ........... 79
4.4 Hybrid algorithm for multiscale problems ................ 81
4.4.1 Combining ACE and FMM ................... 82
4.4.2 Implementation details ...................... 84
4.5 Results ................................... 85
4.5.1 Helmholtz potential evaluation ................. 85
4.5.2 Multiscale scattering problems .................. 87
A Well Conditioned Formulation of Augmented Electric Field Inte-
gral Equation (AEFIE) 96
5.1 Introduction ................................ 97
5.2 Preliminaries ............................... 99
5.2.1 Interior Resonance and Augmented IE ............. 100
5.2.2 Operator and Eigenvalue Analysis ................ 101
5.3 Well-conditioned Formulation for AEFIE ................ 105
5.3.1 Eigenvalue analysis in 2D .................... 109
5.3.2 3D Problems ........................... 110
5.4 Results ................................... 112
Algorithms for Implementation of Hierarchical Computations on
Parallel, Distributed Computers 121
6.1 Introduction ................................ 122
6.2 Preliminaries ............................... 124
6.2.1 The Fast Multipole Method ................... 126
6.2.2 Accelerated Cartesian Expansion (ACE) ............ 127
6.2.3 Hybrid algorithm for multiscale problems ............ 129
6.3 Parallel Algorithm for FMM ....................... 130
6.3.1 Parallel Construction of the Oct-tree .............. 131
6.3.2 Distribution of ACE and FMM harmonic data ......... 133
6.3.3 Construction of Interaction Lists ................ 134
6.3.4 Multipole and local expansion computation ........... 135
6.3.5 Translation Operation ...................... 136
6.3.6 Evaluation of Potential ...................... 137
6.3.7 Parallel Electromagnetic (EM) Solver .............. 138
6.4 Results ................................... 138
6.4.1 Kernel Evaluation ......................... 139
6.4.2 EM Simulations .......................... 141

vi

7 Summary and Future Work 152

7.1 Summary ................................. 152
7.2 Future Work ................................ 154
7.2.1 Well conditioned formulation for EM solver .......... 154
7.2.2 Fast algorithms .......................... 155
7.2.3 Numerical solution procedures .................. 156

A A combined accelerated Cartesian expansion (ACE) and fast Fourier
transform (FFT) acceleration scheme for rapid evaluation of diffu-

sion potentials 157
A.1 Introduction ................................ 158
A2 Mathematical Preliminaries ....................... 160
A3 Acceleration Schemes ........................... 162
A31 ACE for Spatial Acceleration .................. 162
A32 FFT based Temporal Acceleration ................ 164
A.4 Results ................................... 168

B Comprehensive Exam Problem: Integral Equation Based Eddy Cur-
rent Model for Defects in Layered Media 174
B.l Introduction ................................ 175
B2 Integral Equation Formulation ...................... 176
B.2.1 Formulation ............................ 176
3.2.2 Planar Media Green’s Function ................. 177
B3 Numerical Implementation ........................ 180
B.3.1 Evaluation of Green’s function .................. 181
B.3.2 Solution to integral equations .................. 183
BA Results and Discussion .......................... 184
C Curriculam Vitae 190
Bibliography ..................................................... 194

vii

3.1

3.2

3.3

3.4

3.5

3.6

3.7

3.8

3.9

4.1

4.2

4.3

4.4

LIST OF TABLES

Comparison between Old and New (reduced) scheme for interaction
list for different distribution sizes (N3) and {P, K} pairs. .......

Exact multipole to multipole and local to local operators of ACE

Error far in single time step interaction case (C3 = 0.5), for various
N3 and {P, K} pairs ............................

Comparison of run-time in single time step interaction case (Cs = 0.5).

Error far in multiple time step interaction case, for various combina-
tion of N 3, N and {P, K} pairs. N is the number of distinct time steps
involved ...................................

Comparison of run-time in multiple time step interaction case

Error far for non-uniform geometry conﬁguration 1 ...........
Comparison of run-time for non-uniform geometry conﬁguration 1. . .
Comparison of run-time for non-uniform geometry conﬁguration 2. . .

Error convergence of ACE algorithm with random points within a A / 2
size domain ................................

Error in FMM multipoles computed from ACE multipoles using Tmap
in (4.23) ..................................

Time for hybrid algorithm as applied to uniform and non-uniform ge-
ometries ..................................

Multiscale problem 1 : Cone—sphere geometry .............

viii

67

68

70

70

71

71

71

72

72

90

93

94

94

4.5

4.6

5.1

5.2

6.1

6.2

A.1

A2

A3

A.4

Multiscale problem 2: Almond ......................
Multiscale problem 3: Toy-aircraft ....................
3D AEFIE condition numbers with loop—star decomposition .....

3D AEFIE condition numbers with orthogonal, quasi-Helmholtz de-
composition ................................

Average time spent by an individual processor at different stages of
hierarchical tree computation for N =40 million. ............

Average time spent by an individual processor at different stages of
hierarchical tree computation for N =20 million points uniformly dis-
tribution within a cube of side-length 20A. ...............

Exact translation operator in ACE algorithm, P denotes the number
of ACE harmonics. ............................

Error convergence for different number of ACE harmonics (P) and
different source/ observer conﬁguration (N3, (1) .............

Time for different problem size (N3) within a cube of sidelength d =
0.5m. In all cases Nt = 256, P = 3 (6 = 0(1E — 5)) ..........

Tfast for different Nt size. In all cases N5 = 8,000, d = 0.5, P = 3
(e = 0(1E — 5) ..............................

95

95

115

142

143

172

172

173

173

1.1

1.2

1.3

1.4

1.5

1.6

2.1

3.1

3.2

3.3

3.4

3.5

LIST OF FIGURES

Hierarchical decomposition of a 2D computational geometry .....

Representation of 2D computational geometry using quad-trees. Boxes
at different levels and corresponding nodes in tree are represented using
binary keys. ................................

Illustration of interaction list; dark boxes are contained in the interac-
tion list of source box. ..........................

Illustration of computational load in single- and multi-level FMMs.
Dark nodes correspond to actual sources while light shaded nodes rep—
resent centers of multipole and local expansions .............

Various operators involved in a multilevel FMM ............

Re—grouped boxes in original interaction list, in ﬁgure 1.3, for applica-
tion of diagonal translation operator (1.15) ..............

An example of compressed-quadtree with binary key representation
used to label the tree nodes. .......................

Example of antenna feed geometry with low- and high-frequency regimes

denoted by (2 L F and 0 H F respectively. Smallest wavelength of inci-
dent pulse is also shown for reference ...................

Deﬁnition for domains interacting over multiple time steps ......
Map of N in equation 3.11 for an example single level interaction. . .

Non-uniform geometry conﬁguration 1, resembling interconnect in elec-
tronic chips (Ns=12000) ..........................

Non-uniform geometry conﬁguration 2 (N329600) ............

X

36

36

37

37

37

38

50

66

66

67

68

69

3.6

3.7

4.1

4.2

4.3

4.4

4.5

4.6

5.1

5.2

5.3

5.4

5.5

5.6

log(N3) vs. log(Tfa,.) for single interaction case and uniform geometry 69

log(N3) vs. log(Tfa,.) for single interaction case and non-uniform ge-
ometry ...................................

An example non-uniform (a) point distribution and (b) its tree repre-
sentation. .................................

Time vs. no. of unknown in log-log plot when hybrid scheme is applied
to uniform and non-uniform (ﬁg 4.1) geometries .............

Bi—static RCS of cone-sphere geometry, corresponding to Run 3 in table
4.4. Inset ﬁgure shows the incident excitation and magnitude Of surface
current. ..................................

Bi-static RCS of cone-sphere geometry, corresponding to Run 2 in table
4.4. Inset ﬁgure shows the incident excitation and magnitude Of surface
current. ..................................

Bi-static RCS of NASA fat almond (multiscale geometry 2) correspond-
ing to Run 2 in table 4.5. Inset ﬁgure shows the'incident excitation
and magnitude of surface current ....................
Bi-static RCS of Toy-aircraft geometry (multiscale geometry 3) corre-

sponding to Run 1 in table 4.6. Inset ﬁgure shows the incident excita-
tion and magnitude of surface current ..................

r
Eigenvalue spectrum of J,’,(na)Hn(2)(na) corresponding to 2D—tEFIE
operator (5.20). ..............................

I
Eigenvalue spectrum of Jn(rsa)Hn(2)(rsa) corresponding to 2D—tMFIE
operator (5.24). ..............................

Eigenvalue spectrum of Jn(r<.a)Hn(na) corresponding to some of the
AEFIE Operators ..............................

Surface current on 2D circular cylinder computed using 2D AEFIE and
analytical solution. ............................

Singular values Of 2D-AEFIE formulation before and after deﬂation. .
2D AEFIE condition number vs. frequency for circular cylinder. . . .

xi

70

90

91

91

92

92

93

116

116

117

117

118

118

5.7

5.8

6.1

6.2

6.3

6.4

6.5

6.6

6.7

6.8

6.9

2D AEFIE condition number vs. frequency for elliptical cylinders with
different aspect ratios. ..........................

(a) Shows the 2D AEFIE condition number vs. frequency for a 3D
multiscale geometry and (b) shows the geometry ............

An example compressed tree used in ACE+FMM hybrid approach.

The Z—space ﬁlling curves or Morton ordering formed by the sorting
the nodes Of the tree at a particular level. ...............

Illustration of the tree partitioning scheme proposed in this work. The
subsequent distribution of nodes and duplicate nodes in each processor
is also shown. ...............................

Efﬁciency Of the parallel-ACE algorithm for computation of Helmholtz
potential between N uniformly distributed random point within a cu-
bical volume. ...............................

Computational complexity of the parallel-ACE algorithm for the case
of uniformly distributed random points in a cubical volume. The slope
of linear line ﬁts, shown by dotted lines, are close to unity and indicates
the linear complexity of parallel-ACE algorithm .............

Time spent by individual processors of a 128 processor set at different
steps of tree computation for N =40 million using ACE expansions.

Efﬁciency Of the parallel-ACE algorithm for computation of Helmholtz
potential between N uniformly distributed random points placed on
the surface of a sphere ...........................

Time spent on the Helmholtz FMM translation operation by individual
processors of a 128 processor set with and without adaptive direction
partition strategy ..............................

Time spent by individual processors Of a 128 processor set at different
steps of the tree computation for N =40 million using the Helmholtz
FMM expansions ..............................

6.10 Efﬁciency of the parallel-FMM algorithm for evaluation of Helmholtz

potential between N uniformly distributed random points within a
cubical volume. ..............................

119

120

143

143

144

144

145

145

146

146

147

6.11 Computational complexity of parallel-FMM algorithm as a function of
number of unknown N for different processor sets P. The slope of
linear line ﬁts are shown by dotted lines. ................ 148

6.12 Efﬁciency of the parallel-FMM algorithm for evaluation of Helmholtz
potential between uniformly distributed random points on the surface
of a sphere. ................................ 148

6.13 Comparison of ROS due to plane wave scattering from a 64A PEC
sphere computed using the parallel EM solver and Mie series solution.
Only a portion of the RC8 is shown for clarity .............. 149

6.14 Comparison of RCS due to plane wave scattering from 3 128A PEC
sphere, discretized with 14 million unknowns computed using the par-
allel EM solver and Mie series solution .................. 150

6.15 Efﬁciency of the parallel EM solver for the scattering from PEC sphere
with different number of unknowns N and as number Of processors was

varied from 64 to 512. .......................... 150

6.16 Induced surface currents on multiscale geometries (a) toy-aircraft with

sharp edges and (b) tetrahedron shaped arrow .............. 151
A.1 Illustration of the Block-Toeplitz computational scheme ........ 172
A2 log Tfast vs. log N, from Table A3, slope of linear ﬁt = 1.1 ..... 173
B.1 Different domains in eddy current simulation .............. 185
B2 DCIM path from existing literature for zero conductivity ....... 185

B.3 Comparison between existing and PrOposed DCIM path with ﬁnite

conductivity ................................. 186
B4 Tfansmitted ﬁeld at various depth .................... 186
B5 Tfansmitted ﬁeld with varying frequency ................ 187
B6 Z vs. coil lift-Off .............................. 187
B7 Z vs. excitation frequency ........................ 188

B8 Experiment to validate the integral equation model. Measurement of
impedance as the coil is scanned across the defect ............ 188

xiii

B.9 Absolute value of coil impedance from the IE model. ......... 189

B.10 Phase value Of coil impedance from the IE model. ........... 189

 

Chapter 1

Introduction

This chapter provides a comprehensive introduction to the development of fast mul-
tipole methods (F MM) within the context of electromagnetics. Section 1.1 gives a
brief account of the various developments in FMM that are elucidated in greater de-
tail in rest of the chapter. Section 6.2 provides a general introduction to hierarchical
algorithms, which forms the backbone of this thesis work. Section 1.3 and 1.4 details
the development of various versions Of FMM for Laplace and Helmholtz equation,
respectively. Section 1.6 gives a preview of the developments made in this thesis work

along with the outline of the thesis.

1. 1 Background

The numerical solution of Maxwell’s equations has typically proceeded along two
different paths. The ﬁrst, and perhaps the more popular, is the direct discretization
of Maxwell’s equations [1, 2]. Finite difference and ﬁnite element methods belong to
this class of solvers. Their popularity stems from two salient features; (i) they are
typically simpler to program and (ii) their memory and CPU cost scales as 0(N),

where N denotes the number of degrees of freedom. The second methodology for

1

 

solving Maxwell’s equations are based on developing integral equations (IE) derived by
evoking the Green’s identity/ equivalence theorems. While the latter was introduced
in electromagnetics more than four decades ago [3], they were not a popular option
for electromagnetic analysis. The bottlenecks to their adOption was due to both the
memory and CPU complexity, both of which scale as 0(N2). This is despite some
of the inherent advantages Of integral equations for analyzing open region problems,
viz., better condition numbers, possibility of using surface integral equations and

incorporation of the radiation boundary condition in the Green’s function.

The introduction of the fast multipole methods (and tree codes) signiﬁcantly al-
tered the landscape. Both these methods were developed in response to accelerating
pairwise potential evaluations in N -bOdy problems in ﬁelds ranging from biophysics
to computational chemistry to astrophysics, etc. Here, it is necessary to compute
long-range Coulombic potentials repeatedly between N randomly distributed parti-
cles. The tree methods [4, 5] and the fast multipole method (FMM) [6, 7, 8, 9] reduced
the computational complexity of computing these pairwise potentials from 0(N2) to
0(N). FMM and tree codes are based on a hierarchical decomposition of the com-
putational domain, and using multipole/ local expansions to compute the inﬂuence
between sub-domains that are sufﬁciently separated. The FMM, as introduced in [7],
exploits the representation of the potential in terms of spherical harmonics. As we
shall see, this is a consequence of using addition theorems to represent the potential
as a series wherein each term is a product of two functions. These functions depend
either on the coordinates of the source or the observer only. The separation between
source and Observer is crucial to creating a fast scheme. At about the same time,
an algorithm that achieves the same reduction on complexity, albeit using Cartesian
tensors was introduced [10]. This derivation relies on using Taylor expansion Of the
potential function to provide the necessary addition theorems [11]. Cartesian expan-

sions have been used extensively in tree codes. More recently, FMM codes based on

2

Cartesian expansions have used recurrence relations to avoid derivatives [12]. Typi-
cally, FMMs derived using the Cartesian expansion are more expensive as spherical
harmonics are optimal in representing Coulombic potentials. However, it was recently
shown that it is possible to develop a FMM using Maxwell-Cartesian harmonics that
are as optimal as using spherical harmonics with one singular advantage; it avoids
the need for special functions [13]. Both FMM and tree codes have revolutionized
analysis in various application domains ranging from molecular dynamics [14], elas-
tostatics [15, 16], elastic wave equations [17], ﬂow problems [18], capacitance [19] and
impedance [20] extraction in micro-electronic circuits, evaluation of splines [21] and
spherical harmonics [22, 23]. The FMM framework has also been extended to the
solution of potentials resulting from parabolic equations [24, 25, 26].

However, direct extension Of FMM to the solution of potentials arising from hy-
perbolic equations is not as straightforward. The ﬁrst solution to this problem was
presented in two dimensions [27, 28], and soon extended to three dimensions [29, 30].
The crux to developing these algorithms was the derivation of a diagonalized form
of the translation operator [30, 31, 32]. Since then, there has been a virtual ex—
plosion in research in application Of these methods to various problems in electro—
magnetics; see [33, 34, 35] and references therein. The state of art is such that
problems Of the order of tens million spatial degrees Of freedom have been solved
[36, 37, 38, 39, 40, 41]. However, the development of FMM based method continues
on many fronts [42, 43, 44, 45, 46, 47, 48, 49, 50]. This paper reviews progress in

FMM technology since its inception and details current trends in FMM research.

1.2 Hierarchical Computation Scheme

The purpose Of this section is to outline the structure of fast multipole methods and

introduce notation that will be used in the rest of the paper.

3

h
.‘r
K‘s“:
.‘ ' ‘I "
\\

\‘L

..,
\x.
«A.

An»

v'-
F.
“AA.

59"
-)

F;

1.2. 1 Preliminaries

Consider a source distribution q(r) such that supp {q(r)} = 52 C R3. Likewise, it is
assumed that the Observers are also distributed in $2. With no loss of generality, it is
assumed that q(r) = Sill qz-J (r — r,), where N is the number of degrees of freedom.

The ﬁeld due to this source constellation at any point r E R3 is given by

k

45(1’) = 9(Irl) * (10‘) = :90]? - 130% (M)
i=1

where g(|r|) is the appropriate Green’s ftmction, and * denotes a spatial convolution.
It is apparent from this expression that the ﬁeld evaluation scales as 0(N2) for N
observation points. Ideas introduced by [4] to ameliorate this cost for static problems
relies on exploiting the fact that the ﬁeld at a point due to a cluster of sources is
rank deﬁcient, where the rank depends on the distance between the point and the
center of the cluster. In other words, for a given accuracy, potential at an Obser-
vation point sufﬁciently separated from a cluster Of sources can be computed with
few multipole expansions. Similarly, for given accuracy, few local expansions can be
used to compute potential at a cluster Of Observation point due to a well-separated
source. These ideas were cast in a more formal framework as tree-codes [5] and FMM
[6]. At this point, we note that there is rampant confusion in terminology; both
FMM and tree codes are used interchangeably. While the two methods are closely
related, there are subtle but signiﬁcant differences between the two [51]. Tree codes
compute interactions between source pairs using one of three methods: (i) directly,
(ii) evaluating ﬁelds at each Observation point using multipole expansion due to a
cluster of sources, or (iii) using local expansion at Observation clusters to ﬁnd ﬁelds.
The decision on the Operation used depends on which one is computationally eﬂicient.
On the other hand, the algorithmic structure of FMM enables the computation of

potentials in an Optimal manner [51]. Two additional operations that permit this are

4

..,‘....

is»

9.7.-
.II
“I...

‘r
L“

“
k.

aggregation and disaggregation functions. These permit the computation Of informa-
tion at coarser (or ﬁner) levels using information at ﬁner (or coarser) levels. Thus,
FMM relies on a hierarchical decomposition of the computational domain. This is
achieved using the following strategy [8]; the computational domain 52 is embedded
in a ﬁctitious cube that is then divided into eight sub-cubes, and so on. This prO-
cess continues recursively until the desired level of reﬁnement is reached; an Nl-level
scheme implies N1 — 1 recursive divisions of the domain, see ﬁgure 1.1. At any level,
the (sub)domain that is being partitioned is called the parent of all the eight children
that it is being partitioned into. At the lowest level, all source/Observers are mapped
onto the smallest boxes. This hierarchical partitioning of the domain is referred to as
a regular oct-tree data structure. Regular oct-tree representations are optimal only
for geometries with uniform distribution [52]; non-uniform distributions can be repre-
sented using compressed oct-trees [51, 39]. In compressed oct-trees, sub-division- of a
domain is stopped when number Of source/Observer in that domain becomes less than
a pre-ﬁxed limit. While many algorithm exist for constructing a tree, the one that
we have found to be efﬁcient is the use of key data-structure to represent the nodes
Of a tree. In this approach the root box enclosing the entire geometry is represented
with integer value 1; each of the eight (four) children of a parent is identiﬁed with a
three (two) bit code which is appended to the parent box key to Obtain their global
unique key. Figure 1.2 shows an example compressed oct—tree where each box is rep-
resented using key-codes. This representation has several advantages: the nodes of
tree at each level automatically follow Morton ordering and it plays an important role
when partitioning the boxes among processors in parallel algorithm, all antecedents
of a box and essential information like size Of box, center position, level etc. can be
readily recovered from its key-code using bit manipulations [53, 38, 54]. Mapping the
computational domain onto a tree facilitates partition/ classiﬁcation of interactions

as being either in the near or farﬁeld. This is done using the following rule: at any

level in the tree, all boxes/sub—domains are classiﬁed as being either in the near or
far ﬁeld of each other using the following dictum: two sub-domains are classiﬁed as
being in the farﬁeld of each other if the distance between the centers is at least twice
the side length of the domain, and their parents are in the near ﬁeld of each other;
see ﬁgure 1.3 for an illustration of these classiﬁcation. Once, the interaction list have
been built for all levels, the computation proceeds as follows; at the lowest level,
interaction between the elements of boxes that are in the near-ﬁeld of each other is
computed directly, i.e., using (1.1). All other interactions are computed using a three
stage algorithm: (i) compute multipoles of sources that reside in each box; (ii) convert
these to local expansion at all boxes that are in its far ﬁeld; (iii) from the local ex-
pansion, compute the ﬁeld at each Observer. This simple three stage scheme is called
a one-level scheme, and necessitates the development of theorems for (i) computation
Of multipoles at leaf (or smallest boxes), (ii) translate multipole expansion to local
expansion and (iii) ﬁnally, aggregate the local expansions in a box to compute the
ﬁeld at all the Observers. It is apparent that one can derive a more efﬁcient compu-
tational scheme by embedding this scheme within itself as shown in 1.4. That is, if
two sets of sub-domains that interact with each Other are suﬂ'iciently far away, then
these clusters may be combined to form large clusters that then interact with each
other at a higher level and so on; this is referred to here as a multilevel scheme. This
implies that it is necessary to develop additional theorems that enable (i) shifting the
origins of multipole so that effects of small clusters can be grouped together to form
larger clusters and (ii) move the origin of local expansion so that expansions at the
origin of the parent may be disaggregated to those of its children. In concert, these
theorems enable one to traverse up and down the tree, and are presented next. This
said the various steps involved in the hierarchical computing are shown in Algorithm

1.

Note that in single level algorithm the upward and downward tree traversal (steps

6

 

Algorithm 1 Hierarchical tree computation

1: Construct the tree representation for the given geometry (distribution Of discrete
points).

2: Build interaction list using the above deﬁnition, for all boxes in the tree and the
near-ﬁeld list for leaﬂess boxes.

3: NF: Use direct method for computation of nearﬁeld potential at observation
points in each leaﬂess box from sources contained in its near-ﬁeld boxes.

4: $2M: compute multipole expansions for each leaﬂess boxes from sources con-
tained within it.

5: M2M (upward traversal): for all parent boxes compute the multipole expansion
by combining the multipole expansions at their children boxes.

6: M2L (translation): for all boxes in the tree convert the multipole expansions to
local expansions about centers of boxes in their interaction list.

7: L2L (downward traversal): update the local expansion information at a child box
using the local expansion of their parent box.

8: L20: use the local expansions about each leaﬂess box to compute the farﬁeld
potential at its Observation points.

 

 

5 and 7) are absent. Next, we will detail these Operations for diﬂerent FMMs. Starting
with well known static FMM to those for Helmholtz and ﬁnally to those for Wideband
FMM. Details are presented for the ﬁrst two despite the fact that they are well known.
The rationale for doing so is two fold (i) it is important to understand when FMM
for Helmholtz fails and (ii) techniques developed for static FMM and some of the new

FMM approaches ﬁnd their way into the development of Wideband FMM.

1.3 Static fast multipole method

This section provides the appropriate theorems for fast evaluation of potential deﬁned
in terms of g(|r|) = 1/|r|. Such potentials are commonly used in study of plasma
dynamics, magnetostatic problems, eddy currents etc. While on ﬁrst glance, one
might be inclined to exclude methods developed for rapid evaluation of the Coulomb
potential but these play an important role in developing fast methods for Wideband

problems.

1.3.1 Single level scheme

Consider two domains (23 E R3 and {20 E R3 that comprises Of randomly located
source and Observer points, respectively. With no loss of generality, it is assumed
that the number of sources and observers are k, these domains can be embedded in
spheres of radius a. The centers Of 93 and 90 are denoted by rs and r0, respectively.
It is assumed that (23 C Q; and (20 C {-20 and (2305—20 2 Q), and the domains of 93 and
(20 are sufﬁciently separated. In what follows, the domains Os and {—20 will be called
parents Of {23 and 90, respectively. The parent domains can be embedded in a sphere
Of radius 2a, and their center are denoted by rg and r5, respectively. Next, we will
present a single level FMM constructed using two methods; (i) spherical harmonics

and (ii) Cartesian tensors.

Spherical harmonics

The. theorems for a single and multilevel FMMs using spherical coordinates were
introduced in a series of papers [7, 8], and have found extensive application in various
disciplines; a sampling of these can be found in [6, 8, 55, 19, 20, 56, 57]. The genesis

of the method is the well known generating function for Legendre polynomials [58],

1 1

 

 

i ——’"'"

= = P (cos'y) (1.2)
+1 n

R r\/1 — 2% cosy + ("—I)2 n=0 Tn

7'

with

cos'y = cos (9 cos 0' + sindsin 0' cos(¢ — (1),) (1.3)

where Pn(u) represents Legendre polynomial of degree n, r’ = (r’, 0’, 45') and r =

(r, 0, d2). Legendre polynomials in (1.2) can be represented in terms of spherical

8

harmonics Ynm(9, ¢) using the addition theorem [59],

nc(o)sry 2:: Yn(9m m(9’ ,gb) (1.4)

m=-n

where the superscript * represents complex conjugate. Using (1.4) in (1.2) results in

complete separation of source and Observation quantities,

_ _ -2 2 ”WM 02¢ ,)K.’:_T[__+,1¢) (1-5)

n=0 m=—n

These expressions enable the derivation of the following theorems necessary for steps

4, 6 and 8 in Algorithm 1.

Theorem 1.3.1 (Multipole Expansion ($2M): spherical). Let 19 charges of
strengths {q,-,i = 1,...,k} be located at r,- E 93 with Ir,- — r3] < a. Then for any

1‘ 6 Do, the potential ¢ is given by,

00 n

=2 2 Mr,—’"—3’:f.f..":’. m»

n=0 m=—n

where

k
= Z QiIri — rsI”YI:"(0.-, 9152') (1.7)

i=1
where the parameters {9;,¢,-} and {9, <15} are spherical coordinates of r,- and r w.r.t

the origin at r3.

In Theorem 1.3.1, MI," is the multipole expansion at r3 constructed from the
source quantities q;(r,-). Proofs for the error bounds in the above expressions can be

obtained from [8, 9]. Next, these multipoles are translated from r3 to to.

Theorem 1.3.2 (Multipole to Local 'D'anslation Operator (M2L): spherical).

Given a multipole expansion 07'? about r3, it can be mapped to local expansion L59 at

9

n;
.
I

Thec

its;

 

r0 using
" orHit-ml-'ml-Ik'ArAtliT;k(6.i)

Left 2

 

 

n m-k i+n+1 (1'8)
n=0m=-n (_1) Ai+n Irs _ rOI
_ n
where {9, 45} are the spherical coordinates of the r3 w. r.t r0, and AI," — ( 1)

_ \/(n—m)!(n+m)!'

Finally, the local expansions at any leaf node may be mapped onto the observers

using the theorem presented next.

Theorem 1.3.3 (Local expansions to Observer (L20): spherical). The potential

at a point r E 90 due to local expansion Lg about origin is given by,

¢<r>=2 Z LT'I’Ir-rolnYJ"(9,¢) (1.9)

n=0 m=-n

As before, the parameters {9, (1)} are the spherical coordinates of r with respect to the

origin at r0.

The above theorems, in a one level setting, permit the rapid computation of
potentials at all points in 90 due to sources in {23. It is evident that this scheme can
be embedded within itself to create a multilevel scheme. But prior to doing so, it is

instructive to reexamine the fundamentals of FMM from a Cartesian perspective.

1.3.2 Multilevel FMM algorithm

It is apparent that the 0(N4/3) cost of single level algorithm can be further reduced by
embedding this scheme within itself, as is evident from ﬁgure 1.4. To implement such
a scheme it is necessary to develop methods that enable one to construct multipole
expansions at a parent level from those at their children. These are effected using the

following theorems.

Theorem 1.3.4 (Multipole to Multipole (M2M): spherical). A multipole ex-

10

pension 0;,” about r3 can be mapped onto one that exists around r3 using

" 0i“: .1"( )lkl—ImI-I’c-m'ArAm(swerve)

 

=2 2 m)
k (

n=0 mz—n Ai

where rs =|r3 2 rs — ﬂ; and {9, (15} are the polar coordinates of rs w.r. t. r3.

Theorem 1.3.5 (Local to Local (L2L): spherical). Given a local expansion 0;?

about 1'5, it can be mapped to one around r0 using

11 n 0,7,"(—j)ImI-IkI‘Im-klAxgkAfyﬂgkw, (1)) (rgC)n—i

 

= Z Z . (1.11)
n=i m=—n (— 1)ﬂ+‘lA1T{l
'whererpc =|rgc|- — Iro— r5], and {9, rt} are the polar coordinates of re w.r. t. r5.

The equivalent theorems for Cartesian expansion likewise follow.

Theorem 1.3.6 (Multipole to Multipole (M2M): Cartesian). A traceless mul-

tipole tensor Oém) at r3 is related to Mlm) that is centered at r? via

 

m m ("llnvn 11376 n m—n
M] )=Z_:O n! (2n(—1]!!OI ) (1'12)

wherergc=r§—r3.

Theorem 1.3.7 (Local to Local (L2L): Cartesian). Given a local expansion 0]”)
that exist in the domain (20 centered around 110’, it can be shifted to the domain (20

centered at r0 using

m m m+n m Tl. n
L] l: E o] + ).(m).(r2p)t (1.13)

1120 m

where r3” = r0 — r?

11

._-
t)!
(I‘

‘1"
8.
,4
11!:

J
C
r1)

__7'

(

real.

03‘.

-I

r».
Ge):

9.17

These theorems, in concert, permit traversing up and down the oct-tree, see ﬁgure
1.5. While these theorems are the bare-bones presentation of the steps required, there
have been several attempts to make these more efﬁcient [7, 8, 14, 60]. As both methods
are based on Taylor expansions the upperbounds in using these approaches can be
readily derived. Such a derivation is presented in. [8, 13]. Alternatively, another
interesting algorithm was introduced in [13] that permits exact evaluation of the
multipole expansion at the parent given the multipole expansion at the children—this
has been shown both analytically and numerically for different potential functions.
However, in order to get this exact expression, one has to abandon the use of traceless
tensors. It follows that the cost Of using exact multipole to multipole translations is
higher. But in our experience, we have found that we need a smaller number of
multipoles for the same precision, and this can signiﬁcantly affect the total cost,
especially for large data sets [13]. Abandoning the use of traceless Operators has
three salient beneﬁts; (1) the algorithms can be used for any potential function whose
Taylor’s series converges rapidly, (ii) it does not depend on special functions and (iii)
only the translation Operator depends on the potential function which implies that

multiple potentials may be easily combined [61].

In all the above expressions, it was assumed that the number Of multipoles used
was inﬁnite. The analytical estimates regarding truncation of this sum for both the
spherical and Cartesian form can be found in [8, 13]. The cost analysis for multilevel
approach is as follows: the total number of boxes in the tree is 0(N/s) and the cost
for 82M and L2O operations remains the same; the cost of applying M2L translation
Operation across levels scales as 0(P4N/s). In addition the cost of applying M2M and
L2L Operations for all boxes scales as 0(N/sP4). Thus, the overall computational cost
associated with both schemes scales as 0(P4N). This cost is largely dominated by the
time for multipole to local translation (M2L) and considerable research effort has been

expended on reducing this cost. A closer examination of the M2L Operation reveals

12

r...
.
uu.

 

V
(1..
. U.

 

 

It)? :

 

that (i) the number of translations per box is 189 and (ii) the cost per translation
scales as 0(P4). The latter is due to the fact that this operation is not diagonal.
Greengard et. al. [9] remedied this deﬁciency by introducing a novel algorithm
that diagonalizes the translation Operator. Additional modiﬁcations to the overall
algorithm introduced there [42, 62] further reduces the number of translations, making
the “revamped” FMM extremely efﬁcient. Ideas behind this diagonalization can be
exploited by either both varieties of FMM; spherical and Cartesian. It also plays
a key role in FMMs for low-frequency, and consequently, will be presented in some
detail next. An FFT based implementation of above un-diagonalized form results in
a overall cost that scale as (9(NP2 log P) [60], but will not be dwelt here.

1.3.3 Diagonalized translation Operators

A diagonal translation Operator may be derived using a spectral representation of the

Green’s function [9], viz.,

1 1 00 _A 21r _, ,
_ = _ d/\e z/ dae JA(xcos(a)+ysm(a) 1.14
R 21f 0 0 ( )

for z > 0. It is apparent that the inner integral is in fact a zeroth order Bessel function.
The computation Of potentials using the above expressions hinge on the existence Of
an integration rule that is efﬁcient to a given precision and scale invariant if this
formula is to be used at different levels in the FMM tree. Given the existence of such
a rule [63], the potential at any point can be written as [9]
8(6) M (k)
¢(r) = Z Z W(k,i)e—)‘kze_3’\k($cosai+y31nail + 0(6) (1.15)
k=l i=1
where the coefﬁcients W(k, i) are a combination of the charges q,- and integration

weights wk, 8(6) and M (1:) denotes the number of integration points for 6 accuracy.

13

Evidently, in above discrete representation, the number of integration points M (k)
for evaluating a integral depends on k to account for the varying bandwidth, Ak,
of its integrand. The advantages of above scheme are immediately apparent in that
it readily permits translation of the origin; translation Of the origin is quite simply
a shift in the exponentials. The similarity between (1.15) and those in Theorems
(1.3.1), and (2.3.3) are readily apparent. The mapping from spherical harmonic

multipole coefﬁcients MI," onto exponential expansions W(k, j) is given as [9],

 

 

wa, i) = ﬁll)“ §(j)ImIe-jmai ”gill \/ n _ 71:14)?” + ml}; (1.16)

and given W(k, i) coefﬁcients the spherical harmonic local expansion L? can be

computed with,

 

 

(j)|m| 8(6) M(k) .
I"? = —A " W k,i e-‘Jmai ,
x/(n-m)!(n+m)!kE=:1( k) Z31 ( l (117)

The multipole to local translation Operation, with diagonalized translation forms, can
be computed as a three stage process: multipole coefﬁcients are mapped to W(k, i),
translate W(lc, i), and then map the translated coefﬁcients back to local expansions,
and then proceed as usual. It is evident that cost of all operators involving exponen-
tial expansions scale as 0(P2). Various symmetry considerations in implementation
reduces the number of total translation count from 189 to 40. Additionally, one can
exploit symmetry in the expressions involved to further reduce the overall cost, if
not the asymptotic complexity [56]. Thus, properly modifying and augmenting ei-
ther spherical or Cartesian multipole based algorithms with plane wave translation
Operators can considerably ameliorate the cost. However, a couple of issues must be
noted; (i) the plane wave expression is valid for z > 0, this implies that the interaction

list must be modiﬁed [9]; (ii) additional Operators must be introduced to rotate the

14

multipole Operators along the required axis; (iii) the operator developed should be
scale invariant for the scheme to be efﬁcient. In implementation the spherical har-
monic multipole coefﬁcient is converted into six plane wave expansions corresponding
to each face of the cube and the interaction list deﬁnition is changed accordingly. For
example, exponential expansions corresponding to +2 cube face is valid only for boxes
present above X — Y plane, as illustrated in ﬁgure 1.6. Boxes in original interaction
list are divided into six new sets termed as up-list,dovm-list,north-list,south-list,east-
list and west-list corresponding to +2, —z, +y, —y, +x and —x cube faces respectively
[9]. Overall, the diagonalized version of the translation operator reduces both the to—
tal number of translation operation and per translation cost leading to a much faster
algorithm. This approach is very similar to spectral approaches developed for al-
ternative derivation of Helmholtz FMM [42, 64] and is the crux of many methods
developed for Wideband FMM.

1.4 FMM for Helmholtz equations

Thus far, we have seen that cascaded Taylor expansions can be used to develop static
FMM. While these ideas are readily extended to the solution of parabolic equations as
well [24], they are not readily extendable to Helmholtz equation kernels, especially at
high frequencies. Furthermore, as was evident from last section, the scheme developed
should be diagonal. Consider a problem setting that is identical to what was described
in Section 6.2. We shall seek development of methods to accelerate the evaluation
of the potential integral in (1.1) with g(|r|) = exp[—jrs:|r|]/|r|. One expression that
readily suggests itself is the Gegenbauer addition theorem [59, 65, 31],

_- oo

(27%? = —jn§%(—1)l(2l+1)jl(nd)h[2)(nX)Pl(d-i) (1.18)

15

 

 

'7'
l.”

‘4‘

{re ,
4‘“),

W511?

‘1'

If?) r,

‘-

 

where X and d are position vectors such that r = X+d and |X| > |d|, j; and him are
lth order spherical Bessel and Hankel function of second kind, X = IX] and d = [d].
Augmenting this theorem with another addition theorem for Legendre polynomials
in (1.4) completes the separation between the source and Observer coordinates.
e —jK]X+d]— (2 )
l—-——x + dl =-JI€Z ZI:(- (21+1)jz(ﬁd)hl (EX)Yzm(9X.</)X)Yzm(9d.¢d)
l: 0 m=—l

(1.19)
where L is the number Of terms used in the summation, {9 X: 43 X} and {9d, ¢dl are the
polar coordinates of K and d respectively. It is evident that one may use a sequence
of addition theorems to create hierarchical computational methodology. However, the

principal bottleneck to such a scheme is the fact that the Operators involved are not

diagonal. However, diagonal Operators are easily developed by recognizing that
.1. .. r_ 2“—jk-d ~ A
47r(-J) 31(nd)P)(d-X) —— [d ke Pl(k-X) (1.20)

where dzk = sin 9d9d¢ and k = nk. The relation (1.20) can be derived from well
known orthogonality relation among spherical harmonics and expansion for plane

waves given as.

.. . - 47r

/ deY,;n(k)1/,7m,(k) = mama”, (1.21)
00

6-3-2.... = Emamcosv) (1.22)

Substituting (1.20) in (1.18), interchanging the summation and the integral, and

truncating the summation over I yields the ﬁnal diagonalized form,

e‘j"'x+d' _—__jn _/ 2 Rd he)
———= dke -1 Z(— 1)( (M1) (nX)P(k X) (1.23)
[X+d] 47r 1:0

16

Several derivation that result in above diagonalized form exist and are based on
different set of starting formulas [30, 31, 66, 67, 32]. First scheme for diagonalizing
(1.18) was presented in [30] with the use of forward and inverse far ﬁeld transform

deﬁned as,

f(6,¢) = 2 Et— 2")me (45mm (1.24)
l=0m=—l
fl... = [an -j)-’Y.r<k k)f(6 a (125)

Above deﬁnition is a simple spherical harmonic transform from k to {l, m} basis
with direct analogy to Fourier transform. A simpler version of derivation in [30]
is presented in [67, 35]. In [31], the expansion in (1.18) is represented as matrix
vector multiplication which reveals a convolution relation in indices {l, m} . Such
convolutions in {l, m} can be computed as one-tO—one multiplication in 16 domain using
the far ﬁeld transform [31]. A detailed discussion on deriving the above diagonalized
forms from the convolution representation Of original multipole expansion for both
Laplace and Helmholtz equation is presented in [31]. An alternate derivation based
on similarity transform and their relation to group theory is presented in [32] to yield

the same expansion in (1.23).

Single Level FMM

As before, assume that (23 and 90 denote the source and observation domain, and it
is necessary to ﬁnd the ﬁelds Vr 6 $20. It is further assumed that the domains are
cubes, in keeping with the data structure of oct-tree and that each domain can be
embedded in a sphere Of radius a. Furthermore, the clusters are assumed to be well
separated. The separation distance is closely related to error bounds [30, 65], and

will be dealt with in later part of the paper. Given these conditions, traversal up and

17

 

 

l: .'

Th7

 

 

”5

down the tree is effected using the following set Of theorems:

Theorem 1.4.1 (Farﬁeld signature). The far ﬁeld signature due a set of source

q,- fori= 1,--- ,1: located atr; E {23 isgiven by

k
Mmm=ZMusnws
i=1 (1.26)

k
= 2611 6X1) l-jk' (1‘s - rill
i=1

Theorem 1.4.2 (’Ii'anslation operator). If a farﬁeld signature exists at a point r3
such that it is valid for all points outside the domain {23, then the translation operator
that maps this farﬁeld to the local expansion that is centered around r0 and valid in

the domain {20 is given by

00

m. r...) = Z<—1)’<2z + 1>h§21<nlrosi>atﬁ - a.) (1.27)
l=0

Where r03 = r0 — r3

Finally, the potential at any point r E 00 can be constructed using

«mag ﬂMﬂ4wﬂWWmWWM (me

While these equations are readily derived from (1.23). More insight into the derivation
of these equations can be Obtained by realizing that the farﬁeld (and local expansions)
can be represented in terms of spherical harmonics. In turn, this interpretation leads
to expressions that reveal convergence rates Of these and error bounds as a function
radius a and the separation distance. More importantly, this insight leads to the type

of quadrature rules that must be used to implement these schemes numerically. In

18

 

of.

 

other words, the continuous integral is evaluated using

. L P
"".7
= T Z” 22pm“; ,—kpq, r0 — r) T(kpq, r03)M(r3, kpq) (1.29)

where L is the order Of the Gauss Legendre rule, mm are the integration weights, p

and q are the integration points in 9 and (b axis,

 

27rq
7‘1 " 2L + 1
9,, is the (p + 1)th zero Of PL+1 (cos 9)
4r (1 — cos2 9p) (1‘30)
"1m:

 

(2L + 1) [(L + 1)PL(cos 9,,)]2

km 2 xsin9pcos¢q +ysin9psin¢q + 2cos9p

As is apparent form the above equations, uniform sampling is used to evaluate the
integral along ()5. Other applicable rules may be found in [68]. We have yet to elaborate
the underlying factors that decide the order of Gauss-Legendre rule that is used along
9. A number Of formulae exist for choosing the number Of Gauss-Legendre quadrature
point [30, 65, 69]. However, examination Of (1.23) yields interesting insight. If only the
exponential terms are considered in this integral, it is apparent that these expressions
can be represented using L= 0(rid)= 0(2na) harmonics. This, in turn, implies
that the summation is also truncated using L terms. Though the reasoning here is
based on economical means to discretize the integral a deeper reason, arriving at same
conclusion, exists for choice of L based on original multipole expansion [65]. Choice
of L should be large enough for the series (1.19) to converge, but not too large to
cause numerical instability due to the asymptotic behavior of spherical Bessel and
Hankel functions. Given that only a ﬁnite number of terms are being used, one
can explicitly derive error bounds that, in turn, depend on the translation distance

also [30]. Deriving rigorous error bounds has been a focus Of considerable work

19

[43, 69, 70, 71, 72], and the behavior of error is well understood [73, 74] as are the
means to overcome these. A simple choice for truncation limit L applicable to most

practical problems is,

L = rid + Clog(nd + 7r) (1.31)

where C is a number that depends on the desired accuracy 6; typically the choice
of C is {3,5,10} for an accuracy c = {10-3,10-5, 10-14}, respectively [65, 75]. This
estimate is semi-empirical and assumes that the two boxes are well separated if they
are one box apart. Other estimates [76, 73, 69] based on approximation of Bessel
and Hankel function exists both in two- and three-dimensions and can account for
multiple box separation between interacting boxes [72, 74]. Cost of this scheme can
be computed in the same manner as in the static with P = L and the diagonalized
form of translation operator implies 0(P2) cost per Operation. However choice of L
depends on size of box kd, which in turn dictates the number of unknowns per box 3
(assuming uniform discretization). It can be show that the Optimal cost of the above

scheme scales as 0(N3/2) for surface problems.

Multilevel FMM

While the above exposition details the necessary mathematics for implementing a
single level scheme, nesting these in a hierarchical setting is the next logical extention.
The ﬁrst robust attempts to do so are [7 7, 78, 79]. Extension to multilevel is different
from that encountered for the Laplace FMM; there, the number of multipoles at all
level of the tree was constant. But as is evident from ( 1.31) and (1.29) as the size of
the source/ receiver boxes increases, the bandwidth increase increases by a factor of
two, which implies that the number of directions increase by a factor of four. This
then creates a need for developing robust methods for going up and down the tree

for the stages of aggregation and disaggregation. These Operators can be thought of

20

as ﬁlters. But before we proceed into intricate details of the methods to implement

these, the theorems that help achieve these are as follows:

Theorem 1.4.3 (Translation of farﬁeld signatures). If the farﬁeld signature
M(r3,k) around the point rs E 93 is known, then the farﬁeld signature M (r§,k)
around the point r? E 5212 is given by
M(r§, k) = M(rs, k)e_jkl(rg_rs) (1.32)
An identical theorem for can be derived for translating local expansion at the
parent level to that Of its child. Numerical implementation of these theorems is not
as simple as it seems. To maintain uniform accuracy across levels, employing (1.31),
the L for parent is approximately twice that of its child. This implies that the num-
ber of direction for parent box is approximately four times that of its child; thus
the multipole expansions for the child and parent box are deﬁned on different grids.
This process of computing a higher bandwidth representation from lower bandwidth
farﬁeld signature is referred to as interpolation and anterpolation is its inverse ana-
logue applied during downward tree traversal. Implementing the above theorems calls
for efﬁcient methods to interpolate (or anterpolate). Several methods that exist have
been elaborated upon in [33] and summarized as well in [69]. An eﬂicient and exact
algorithm can be devised using the forward and inverse farﬁeld transform for both
interpolation and anterpolation [79, 23, 80, 35]. This algorithm relies on the fact that
at any level the farﬁeld signature can be represented in terms of spherical harmonics,
VlZ.,

OO

M(-.k)=Z Z aannm(6.¢) (1.33)

n=0 m=-n
As is well known, the farﬁeld signature of a source constellation is bandlimited to

0(na) harmonics. This implies that the above expression can be truncated. Further-

21

more, since an Lth order rule is chosen to evaluate the spectral integral in (1.23), it
follows that the upper limit in the summation over n can be chosen to be L. This said,
direct computation of anm is expensive. Alternate methods both exact and approx-
imate have been discussed in [23, 81]. Consider the computation of anm from child
farﬁeld signature M (r3, kkp) represented using (2L2+ 1) coefficients, i.e. p = 1, - - - , L
andq= 1,--- ,(2L+ 1),

“ma = [d2"M(r31 k)Y;m(61 45)

L (2L+1) .
= 2: prnm (COS 6p) 2: M(r81km)ejn¢q (1.34)
1):] q=l
L
= 2 prnm (COS 6p)am (6p)
p=1

where wp are numerical quadrature weights. Since, the integration along (b is per-
formed using uniform sampling, fast Fourier transform (FFT) can be used for summa-
tion inside the brackets. These coefﬁcients are then used to compute samples along

new polar coordinates (91,421) with p = 1, - -- ,L and q = 1, - -- ,(2L + 1) as,

~ ~

L ~ L
M (r3,kpq) = Z e‘jm‘bq Z anman(cos 919) (1.35)

m=—L "=1

Again, FFT can be used to evaluate the outer summation. In interpolation, L > L
to accommodate for the increase in bandwidth and km represents the discrete direc—
tions of the farﬁeld signature corresponding to the parent. The required multipole

coefﬁcients about parent origin r3 can be obtained using a simple shifting Operation,
Mag, 1%) = M(r,,12pq)e*jkm-(”s’—rs) (1.36)

An inverse procedure is performed when translating local expansions from parent to

22

child where anterpolation is used in place of interpolation. First, the parent local
expansion about 1‘3 is shifted about child origin r0; then in anterpolation, the forward
and inverse farﬁeld transform are performed to reduce the bandwidth in an exact
manner as described above but with L < L, where L represents the number of
harmonics in parent domain. Above procedure for interpolation/anterpolation can be
further accelerated with the use Of fast Legendre transform [23] where the coefﬁcients
anm are not computed explicitly. Though this approach scales favorably the break-
even point is large and not suitable for most practical applications [35]. This can
be overcome to some extent using the 1D FMM for fast Legendre transforms [81].
Cost of Interpolation / anterpolation using this approach scales as O (Q log Q), where Q
denotes the number of directions in farﬁeld signature. This said it can be shown that
overall cost of the multilevel algorithm scales as 0(N log2 N) [35]. Other methods
used for interpolation and anterpolation have been presented in detail in [78, 33,
69]. These include the use of polynomials and approximate prolate spheroidal wave
functions. The singular advantage of these methods is their cost scales linearly with
the number of samples, thus the overall cost scales as 0(N log N). However, while
interpolation is sufﬁciently accurate, one has to be more careful when anterpolating
functions as it is necessary to remove higher order harmonics. While we have not
digressed into implementation Of these schemes for vector electromagnetic problems,
we must caution that it is not a trivial extension. It is important to realize that the
farﬁeld component represented in terms of polar components in not bandlimited [82],
whereas they are bandlimited when represented in terms of Cartesian components.
This means that one either uses a fast scheme based on vector spherical harmonics [82]
or converts these to Cartesian before interpolation / anterpolation. Another intriguing
method for interpolation and anterpolation was introduced by Sarvas [48], wherein he

introduced modiﬁcations that enabled the use of FFTs. In other words, bandlimited

23

farﬁeld signatures can be represented in terms of Fourier basis as

 

P Q—l
Made 23 :3 access”) 637)
' p=-Pq=-Q
where,
P Q-l '
atp.q)=DFT{M(k>}= Z Z e"<m9+"¢’M(kpq)
m=—Pn=—Q
_ p2rr (1.38)
_2P+1
=1?
4’ Q

where DFT() represents forward discrete Fourier transform, 2M and 2N are number
of samples or basis function in 9 and 6) axis respectively. Then the integral over the

surface Of sphere can be written as,

1 113 0/ d9sin9U(6. 3) =1 d2 _[ d9|sin9|U(9, 6) (1.39)

Note that the above modiﬁcation changes the limit on 9 integral to [—r, 7r], thus
it can also be evaluated in fast manner using FFT. In single level implementation,
the integrand in (1.19) are ﬁrst represented in terms of Fourier basis using (1.37)
and than (1.39) is used for fast evaluation of integrals. In multilevel implementation
the interpolation and anterpolation, for varying bandwidth of multipole and local
expansion, can be achieved by zero padding and truncating the Fourier coeﬂicients
respectively. In anterpolation the Fourier coefﬁcients of parent local expansions are
symmetrically truncated before inverse Fourier transform, to Obtain the local expan-
sion about child domain with the desired bandwidth. Thus all operations, including

the evaluation of integral, can be evaluated using FFT. Reader is referred to [48]

24

for related theorems, proofs and numerical results. Finally, the numerical implemen-
tation Of multilevel FMM has been scrutinized in terms of different errors and to
ensure stability. This includes discussion on the relation between truncation and in-
tegration error in (1.29) [71], and interpolation/anterpolation error using Lagrange
interpolation [7 3] and spherical transform [69]. In addition, errors due to round-Off
and evaluation of special-function have been considered along with stability criterion
[74]. Numerical experiments show that truncation error in (1.29) is lower bounded
[43, 73]; thus for applications that routinely demand very high accuracies it is prefer-
able to increase the distance between well-separated boxes. Evidently this amounts

to an increase in number of boxes in near-ﬁeld interaction.

1.4.1 Other FMMs

The above exposition presented FMMs that are apt for analyzing very general prob-
lems. However, for certain problems it is possible to develop FMM schemes that take
advantage of topological features Of scatterer to reduce the asymptotic complexity.
The ﬁrst of such algorithm was the fast steepest descent path algorithm [83] that
exploited spectral representation of the Green’s function. The next incarnation of
this was the steepest descent FMM. It was developed following realization that when
analyzing scattering from objects whose height is considerably lesser that its lateral
dimension, it is not particularly useful to expand the ﬁelds using the complete spec-
trum. In other words, SDFMM can be interpreted to be a windowed FMM, and
results in a method whose complexity scales as 0(N). In SDFMM, it is achieved
naturally using the Sommerfeld integral representation of the Green’s function and

evaluating this integral using a combination Of two-dimension FMM and steepest

25

descent. More speciﬁcally,

e—ij
R

 

= :21/00 cugze-jkz(Z-Z’)H(2) (NpIP _ P’I)
—00

—j Nsd (n) (2) - I (1.40)

= 7 Z wnnp H0 (”pIP “ PII) [ﬂew—z)

n=1

where N 3d is quadrature rule along the integration path, wn is the integration weight,
K5,”) = resin an and 19,, = ncos an, and a is deﬁned along steepest descent path. It is
immediately apparent that the summation over Hankel functions can be accelerated
using a generalization of the two-dimensional FMM, and as before, this algorithm can
be cast within a multilevel framework. Another algorithm along these lines was the
fast inhomogeneous plane wave algorithm (FIPWA) [62, 47]. This algorithm follows
directly from Weyl’s identity
e—st _j 21r

_— de casinos-31°11 (1.41)
R 2 o SIP

 

The path of integration yields contributions from both homogeneous and inhomo-
geneous plane waves. As written, the above integral is slowly converging, but the
contour can be deformed along the steepest descent path. This integral is evaluated
numerically. However, values of the radiation pattern for complex 9 is Obtained us-
ing interpolation / extrapolation. Manipulation Of the requisite equations results in a
diagonal translation Operator. This method has been extended for analysis of scat-
tering from Objects above a layered medium [45, 47]. Additionally, they have been
modiﬁed for developing stable algorithms for broadband applications [84]. However,
we shall describe these algorithms and others [42] for rapidly computing potentials

for Wideband applications in the next section.

Finally, other variants of FMM exist that exploit the fact that between well sep-

arated boxes, one may construct windowed translation operators to lower the cost.

26

One such method is the ray propagation FMM (RPFMM) [85, 66]. Other windowed
translation operators have been used in two-dimensions for the analysis of scattering
from bianisotropic objects [86]. However, it follows from complexity analysis that
these methods will be fruitful only when the objects are sufﬁciently far away from
each other. This implies that the algorithm is most useful when used in a one-level

setting and may not be effective with a multilevel implementation.

1.4.2 Wideband FMM

In above discussion, a signiﬁcant highlight is the restrictive choice of L used to trun-
cate the expansions. This choice, based on the asymptotic behavior of Bessel and
Hankel function, reveals the behavior of above expansions when applied to low fre-
quency problems where n is very small. It is well known that Hankel function is
singular at origin and as n —+ 0 the expansion in (1.23), though valid, becomes nu—
merically unstable. This breakdown is referred to as low- frequency breakdown [43, 42].
Consequently for ﬁxed K. the size of source domain, which also deﬁnes the transla-
tion distance, cannot be made arbitrarily small. This issue becomes signiﬁcant when
the geometry is densely discretized, much more than the conventional A/ 10 criterion,

mostly to represent intricate structural details.

Scaled expansions

At low frequencies the numerical instability can be averted by using a normalized
form of the original expansion (1.23) [44, 46]. This approach is motivated by the
asymptotic behaviour of spherical Bessel and Hankel function for small argument.

Let t be a normalization constant such that t = 0(lcd) then the multipole expansions

27

in (1.23) can be written as,

thé’k—jnlxmn )=—jnZ 2(— )(2l+1) I11 711(Kd)1/lm(9d,¢d) 2+1
l= 0m=-l

hl2)(KX)Ylm(9X,¢X)

(1.42)

In above expression, terms inside the square brackets are the new normalized mul-
tipole coefﬁcients. As It -—> 0, using small argument approximation for spherical
functions and with t = n, it is a straightforward exercise to show that the normalized
expansions reduces to the expansions (1.2) used in static case. While the normalized
forms ensures numerical stability, the low-frequency nature Of the problem implies
that one can choose the number of multipoles to be same at every level. This in turn
implies that the multilevel version of this approach scales as 0(N) [46]. A constant
normalization factor is sufﬁcient when the geometry is uniformly discretized. How-
ever to accommodate wide variation in domain sizes and maintain the stability Of
expansion different normalization factor should be chosen in different parts [33]. This
approach has been successfully used in integral equation solution for scattering from

sub-wavelength structures [46, 87].

Spectral representation based plane wave expansions

An alternate approach, inspired by the diagonalized form for static FMM, was in-
troduced in [42] and later implemented in [84, 49, 50]. It is based on the following

well-known spectral representation of solution to Helmholtz equation [88],

 

—jnR
e R _2_1_7r0 8-1923122]: e—jA(xcosa+ysina)—___n dad). (1.43)

this relation is valid for z > 0. Further it is straight-forward to identify the purely

propagating part of spectrum as 0 S A S K. and the evanescent part as n: < A S 00;

28

with simple change of variables, above expression can be written as [42],

 

 

 

 

e—st ( e-st) ( e—st)
= + (1.44)
R R evanescent R propagating
where,
e—jrtR 1 oo 2 2 2r . .
___ _/ e—V A —rc z/ e—JA(xcosa+ysma)
R evanescent 27f K 0
A
dadA

x/A2 — 52
= i [00 e—oz f2“ e—jvo§+n§(xcosa+ysina)dado.
271' 0 0

_- R 2
3 JR = i "e_ [\2_,€2z)‘/ We—jA(xcosa+ysina)
R propagating 2W 0 0

A
\/A§——Is2

_ ij 1r/2
— 271’ 0

 

dadA

2
e—jncos9/ We-jrtsin9(xcosa+ysina)dad6
0

Notice that with It -—> 0 the propagating part vanishes and the evanescent part reduces
to the diagonalized form (1.14) used in static FMM. Now it remains to discretize the
above integrals for numerical evaluation and generalized Gaussian quadratures can
be employed for this. However, unlike in static case, the integrand cannot be ren-
dered scale independent and this means quadrature points and weights should be
pre-computed for all possible translation distances at all levels. It is worthwhile to
recount that the multipole and local expansions are computed and stored as they
appear in original spherical harmonics expansion (1.23); they are converted to ex-
ponential expansions back and forth during multipole to local translation only and
these relations can be found in [50]. This approach avoids the ﬂoating point over-
ﬂow as all the computed quantities and operations are regular and numerically sta-

ble. Other approaches based on above spectral representation have been presented

29

[84, 49, 89, 90] and they differ signiﬁcantly in their numerical implementation and
structure. In all these methods the multipole and local expansion are represented
directly in terms of exponential expansion coefﬁcients; hence they require new inter—
polation/anterpolation Operators for multilevel implementation. In [84], an extension
of FIPWA as introduced for multi-layered structures, the integrand is sampled along
the steepest descent path (SDP) and extrapolation techniques to estimate the evanes-
cent portion Of the spectrum from samples of the propagating portion. However, one
has to treat “shallow” evanescent waves differently from “deep” evanescent waves.
In [49], the evanescent integrand is sampled along the traditional Sommerfeld inte-
gral path (SIP) and singular value decomposition (SVD) of the integrand is used to
obtain expressions for multipole coefﬁcient and multilevel translation operators. An
interpolation matrix approach is presented in [90] to relate exponential expansions at
different levels. Using sample points in child and parent domain an overdetermined
system Of equation is formed and solved for the interpolation matrix entries in a least
square sense. The advantage of latter approaches is that they avoid the spherical

harmonic to exponential expansion and reverse mapping Operations.

1.5 Applications

This section provides an overview on application of above discussed algorithms in
different contexts. As mentioned in introduction, FMM and other fast methods, e.g.
FFT and tree code based, were develOped primarily to accelerate the evaluation of
potential or ﬁeld in N body problems. Integral equation solutions, a common choice
in simulation Of many electromagnetic applications, sought through iterative solvers
requires repeated evaluation of potential or ﬁeld at source points itself. Thus fast
algorithms play a signiﬁcant role in solving real world problems within realistic time

duration. The literature referenced here is only selective and not exhaustive as the

30

use of these algorithms have become more common during recent years. Also, only
topics related to electromagnetics are listed here; for applications in other research

ﬁeld refer to introduction.

First, electromagnetic application of static or Laplace FMM was evaluation of
electrostatic potential in 2D [6, 91]. The extension to 3D has seen lot Of applications,
particularly, in plasma dynamics [8, 92]. FMM based FastCap and FastHenry are
widely popular tools for extraction of equivalent capacitance and impedance among
multi-connects in micro-electronic components [19, 20]. Static FMM is also used in
integral equation solution of magnetostatic problems predominantly for analysis and
design of electric machines [93]. Simulations with non-linear materials have beneﬁted
much as they demand multiple solution before attaining stability [94, 95, 56]. It
has also been applied to quasi-static case especially in simulation Of eddy-current

phenomena [96, 97] and micro-magnetics is another area Of practical interest [98, 99].

The recently published book on fast methods in electromagnetics is a virtual
treasure house Of FMM methods and their applications to various problems in high
ﬁ'equency electromagnetics [33]. As is to be expected, Helmholtz FMM has been
applied to accelerate iterative solution of surface and volume integral equations. The
means to modify Helmholtz equation such that they are applicable to vector elec-
tromagnetics problems was ﬁrst presented in [7 8]. More detailed description can be
found [100, 75, 33]. Since their introduction, they have been applied extensively to
scattering and radiation problems of different ﬂavors; for instance, scattering from
perfect electrically conducting surfaces [28, 65, 78, 100, 69, 101, 102, 103, 104, 105],
scattering from dielectric/ composite bodies [106, 107, 108, 109, 110, 111], volume
integral equations [112, 70, 113, 114], anisotropic Objects [115, 116], scattering from
rough surfaces [117, 118, 119], application to microstrips [120], EMC/EMI analy-
sis [121, 122, 123], antennas [124, 125, 126]. Efficient implementation of FMM in

solvers with higher order geometry and basis function representations have led to the

31

development of fast and accurate solvers [107, 127, 128]. [129, 130]

Multipole accelerated algorithms have also been employed in various hybrid meth-
ods where solution is Obtained with use of moment method combined with one or more
of following techniques: to impose global radiation boundary conditions in ﬁnite e1-
ement solvers [131, 132, 133], ray tracing and diffraction methods [134], multi-grid
methods [135] and physical optics [136, 137]. These techniques are primarily used in
applications with multi-scale scatterers like antenna interactions [138] and ﬁeld pre-
dictions for urban mobile communications [139]. Implementation of FMM was also
modiﬁed to accommodate perfectly matched layer (PML) assisted integral equation
methods used in simulation of monolithic microwave integrated circuit (MMIC) and
photonic crystals [140, 141, 142]. Fast inhomogeneous plane wave (FIPWA) method
and other forms of FMM have been used to accelerate solution of scattering simu-
lations involving layered media structures with applications in design of microstrip
antennas [129, 130, 143, 144, 145, 146, 147, 148] and geophysical investigations for
sub-surface scatterers [64, 149, 150, 151, 152, 153, 154, 155, 156, 157]. A combined
FMM-FFT algorithm [158, 159] and SDFMM have been used in electromagnetic
analysis Of general quasi-planar structures with applications to rough surface scat-
tering, grating structure design in quantum devices and radiation from microstrip
patch antenna [118, 160, 161, 162]. The principle of FMM has also been extended
to accelerate potential employed in time domain integral equations. Plane wave time
domain (PWTD) is the time domain analogue of Helmholtz—FMM that has been used
to accelerate time domain IE (TDIE) [163, 82, 164].

1.6 Thesis Objectives and Outline

As mentioned in the preceding exposition, the primary downside of conventional

FMM is that they are speciﬁc to the form of Green’s function. In other words, one is

32

required to develop new set of FMM formulaes for each form of the Green’s function.
The Helmholtz FMM, as detailed in Section 1.4.2, suffers from numerical instability
for low excitation frequency. Analogously, plane wave time domain (PWTD), the time
domain counterpart Of Helmholtz FMM, also suffers from a similar breakdown when
large number of unknowns are concentrated in small size domains. Consequently,
the existing state of the art fast methods face severe limitations when applied to
multiscale problems. These are realistic problems where certain regions are very
densely discretized to accurately capture the physical details. The main goal of this
thesis is develop mathematical techniques to overcome the limitations of the existing
fast methods for electromagnetic simulation of multiscale problems. This is achieved
with the aid of accelerated Cartesian expansion (ACE) algorithm. ACE is a recently
developed, hierarchical tree computation algorithm in the vein Of FMM. Unlike FMM,
ACE algorithm relies on Cartesian harmonics and Taylor series expansion to derive
FMM like algorithm for arbitrary, non-oscillatory potentials. In this thesis, diﬂerent
aspects of ACE algorithm are exploited to develop fast algorithms to overcome the

low-frequency breakdown in both time and frequency domain.

Parallelization of FMM can be classiﬁed as a fairly recent work, with most of
the literature concentrated in the last decade. The hierarchical framework of FMM
qualiﬁes it as one of the diﬂicult algorithms to parallelize. Most of the existing
algorithms are based on heuristics. Such algorithms, though successful, provide only
modest scalability with maximum at 64 processors. This is a severe limitation of
the FMM algorithm when considering the ever growing size of cluster computers. In
this work, a parallel version of FMM algorithm is introduced that is scalable up to
hundreds of processors and beyond. The prOposed algorithm is provably scalable and
hence allows for large scale parallelization. This work leads to the development of the

state of art parallel multiscale electromagnetic solver for very large scale simulations.

The rest of the thesis is organized as follows:

33

Chapter 2, provides a detailed review of ACE algorithm. Here the deﬁnitions
and theorems of ACE algorithm are presented in a kernel independent fashion for
easy reference in later chapters. Also presented here is a detailed overview of the

multi-level tree computation scheme along with the proposed modiﬁcations.

Chapter 3, addresses the sub-wavelength breakdown of PWTD method, the time
domain counterpart of Helmholtz FMM. The smallest domain size used in PWTD is
restricted for reasons Of numerical stability. Thus, when large number of unknowns
are concentrated within a sub-wavelength structure, the computational advantage
Offered by the PWTD algorithm is overshadowed by the direct computation cost.
Here, the almost kernel independent framework of ACE algorithm is exploited to
develop an algorithm that is stable and efﬁcient for evaluation Of retarded potentials

within sub-wavelegnth structures.

Chapter 4, addresses the low-frequency breakdown of Helmholtz FMM. Here the
FMM algorithm is presented in sufﬁcient detail to identify the root cause of the
breakdown. ACE expansions of the Helmholtz kernel is developed and the stability
and convergence of these expansions at low-frequency limits is shown in a rigorous
manner. This leads to the development Of a Wideband FMM algorithm, Obtained by
seamlessly combining the ACE and FMM algorithms. This hybrid algorithm, that
is stable and efﬁcient across length and frequency scales, is then augmented with an

existing electromagnetic solver for simulation of multiscale geometries.

Chapter 5, take a slight detour from fast methods and explores the possibility of
developing a new integral equation formulation that yields well conditioned systems of
equations for multiscale simulation. Here the augmented electric ﬁeld integral equa-
tions (AEFIE), an existing IE formulation, is considered for modiﬁcation. Included
here is a succinct review of the operator theory analysis of EM integral equations.
These tools are used to establish the behaviour of the new formulation when applied

to low—frequency and multiscale problems. The new formulation is ﬁrst developed for

34

2D problems and then extended, with appropriate modiﬁcations, to 3D problems.

Chapter 6, presents the development of parallel implementation of above hierar-
chical tree computation algorithms on distributed computers using message passing
interface (MPI). Here the emphasis is laid on developing a parallel framework that is
provably scalable on large number, in orders of thousands, Of processors. The novel
schemes developed here results in a implicitly load balanced parallel algorithm. The
resulting framework can be viewed as a seamless combination of different schemes
already in existence. Detailed description on development of a parallel electromag-
netics solver is also provided and its high efﬁciency is demonstrated on hundreds Of
processors and beyond.

Chapter 7, summarizes the various contributions Of this thesis work in a succinct
manner. Several possible future works are also mentioned here.

Appendix A details the development Of a ACE based algorithm for rapid compu-
tation of time domain diffusion potentials and Appendix B provides a quick review of
the comprehensive exam problem “Integral equation methods to model eddy current

inspection of plates ”.

35

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

/ W/w

/ 11o // m
/ 10° // 101 /

/ t00107
L_/'°°°'

 

 

 

 

 

 

 

mom Level 4
$010101
/ . . / ° / Legend:
/ :7? . // ' . . [Leaf Boxl I Parent Box]

 

 

Figure 1.2: Representation of 2D computational geometry using quad-trees. Boxes
at different levels and corresponding nodes in tree are represented using binary keys.

36

 

Source ' Near-field Interaction list
I Box Box I Box

Figure 1.3: Illustration of interaction list; dark boxes are contained in the interaction
list of source box. '

 

Figure 1.4: Illustration of computational load in single- and multi-level FMMs. Dark
nodes correspond to actual sources while light shaded nodes represent centers of
multipole and local expansions.

M2L
A p
's" r° L2L

r, l'o L20
, 02M 09

as

DI
O

 

 

 

 

 

 

 

 

 

 

 

 

Figure 1.5: Various operators involved in a multilevel FMM

37

 

South-list
North-Ii

 

Down-list

Figure 1.6: Re-grouped boxes in original interaction list, in ﬁgure 1.3, for application
of diagonal translation operator (1.15)

38

Chapter 2

Accelerated Cartesian Expansions

(ACE)

This Chapter, provides a detailed treatment of the accelerated Cartesian expansion
(ACE) algorithm and a general framework for hierarchical computations. Though
ACE is not the primary development of this thesis work, it is lays the foundations for
the advancements made in this thesis. Section 2.1, provides a brief overview of the
ACE algorithm. Section 2.2, presents the requisite introduction to deﬁnitions and
notations of Cartesian tensors used in rest of this thesis. In Section 2.3, the deﬁni-
tions and theorems of ACE algorithm are stated. Section 2.4, provides the details Of
the different procedures involved in a hierarchical tree computation algorithm. Sec-
tion 2.5 describes some of the algorithmic developements introduced to reduce the

computational cost by half.

2.1 Introduction

Accelerated Cartesian Expansion (ACE) is a fast computational technique, in the

vein of FMM, in the sense that it employs tree structure for hierarchical computation

39

and derives rigorous error and cost estimates. A common feature of these hierarchical
computational scheme is the use of divide and conquer strategy to offer the computa-
tional advantage with a prescribed loss in accuracy. This loss Of precision is justiﬁed
by the fact that the numerical simulation are constrained to ﬁnite precision by other
factors; and evaluation Of potentials beyond this limit does not Offer any advantage.
Further, these schemes accelerate the computation of far-ﬁeld potentials only. The
dominant contribution to the total potentials arise from the near-ﬁeld interactions

that are evaluated exactly using direct computation.

ACE is the mathematical engine behind the fast method discussed in this thesis.
It employs Taylor’s series expansion to derive addition theorem for arbitrary, non-
oscillatory functions. It is worth noting that the use Of Taylor expansion for fast
computation have been developed earlier also [10, 11, 12]. Typically, these FMMs
derived using the Cartesian expansions were more expensive as spherical harmonics
are optimal in representing Coulombic potentials. However, it was recently shown
that it is possible to develop a FMM using Maxwell-Cartesian harmonics that are as
optimal as using spherical harmonics with one singular advantage; it avoids the need
for special functions [13]. Here, the entire algorithm is cast within the framework Of
Cartesian tensors and exploits the fact that these tensors are totally symmetric to
provide an optimal representation of Cartesian harmonics. Another salient feature of
ACE algorithm is that it derives exact forrnulaes for traversing up and down the tree,
which in turn implies lesser source of error. The use Of Taylor’s expansion implies
that the potential or its modiﬁed form be non-oscillatory for rapid convergence of
these expansions. This technique, as presented here, was introduced for kernels of
the form R”. ACE has been extended to several other forms Of potentials, some
Of them as part of this thesis work, for e.g. Helmholtz potential [165], Yukawa (or
shielded Coulomb) potentials, retarded potential, diffusion potentials, solutions to

Klein-Gordon and lossy wave equations.

40

2.2 Cartesian Tensors

Tensor analysis is an integral tool used in development of ACE algorithm. A Cartesian
tensor Of rank n is denoted by A(") or in component notation by Ag;l..an, and is an
array of 3" components, for points in R3. A totally symmetric tensor is one that is
independent of the permutation Of indices (11 - - - an and in compressed form it contains
(n + 1)(n + 2) / 2 independent components. Alternatively, they can be represented in
compressed form as A(nl(n1, n2, 72.3) where n1 + n2 + n3 = n, and n,- is the number of
times the index i is repeated. An n-fold contraction between two tensors A("+ml and

B01) is represented using C(m) = A("+m) . n . BI"). The contraction of two totally

symmetric Cartesian tensors can be written using the compressed notation as

17.!
CW) (m1, m2,m3) = Z WA(n+m)(n1 + m1,712 + m2,713 + m3)
"1,712,713 1' ' 3' (2.1)

3(n)(n1,n2,n3)

An extensive exposition of theorems and formulae pertinent to the properties of com-

pressed tensors, their application to the ACE algorithm, can be found in [166].

2.3 ACE: Deﬁnitions and Theorems

In this section the theorems and deﬁnitions that permit the fast evaluation of functions
are outlined brieﬂy. To this end, assume that domains as and (20 are sufﬁciently
separated, and comprise Of sources and observers, respectively. Also, 93 C 913, 90 C
03 and (2’3 0 913 = (l). The centers of the domains 03, 90, at: and 93 are denoted by
r3, r0, r? and r3, respectively. Further, denote the potential function that maps the
effects of these sources on the observation points as 1,0(R), where R = ||r — r’ I] and 7:
sources exist in (23. Here, the function ¢(R) can stand for any interpolation function

T(t) convolved with the retarded potential and Observed at time t = 0. An addition

41

theorem for this function may be Obtained using Taylor’s expansion.

Theorem 2.3.1 (Taylor Expansion). The function 11) (r — r’) can be expressed

about the origin using

w(r — r') = Z -(_71!)—n-rm . n . Vntb(r) (2.2)

n=0

mer>H

This theorem gives rise to the following corollary.

Corollary 2.3.2. The function 112(r — r’) takes the form

00— M(n) . I n
1/1 (r _ r') = 211.0 n V w(r) for r > r’ (2.3)
Egozor'n.n.L("l forr’>r

where M("l and L(") are the multipole and local expansions. These theorems may

be used in concert to derive/ prove the following ﬁve theorems that form the crux of

ACE [166].

Theorem 2.3.3 (Multipole Expansion). The total potential at any point r E 00

due to k sources q;, i = 1, - - - ,1: located at points r,- E 93 is given as
oo
¢(r) = 2 MW -n - war)
":0 (2.4)
Mini = 24236.- - e)"

where M(") is the multipole tensor.

Theorem 2.3.4 (Multipole to Multipole Expansion). Given a multipole expan-

42

sion of k sources about rs

0(")=Z(—1)"q—'(r,—r,)" (2.5a)
i=1

then the multipole expansion about the point rg can be expressed in terms of ( 2. 5a ) as

M‘"’= 21- 1)"1;<r.-— rt)“= Z Z 361-)” "’01” (26>)

m=0 P(m, n)

It is evident that one can repeatedly use this theorem to translate the multipole

expansion from rs to rg. This expression is exact [166].

Theorem 2.3.5 (Multipole to Local Translation). Assume that the domains 93
and 910’ are suﬂiciently separated, and the distance between their centers r53 = |r53| =
Irg — rgl is greater that diam {913’} and diam {123 } If a multipole expansion M01)
is located at rg, then another expansion Ll") that produces the same ﬁeld Vr 6 523 is

given by

w(=r) 2p" ..n L(")

no=00 (2.6)
L(n) = Z a34071-71) , (m — n) . Vmwrgs)

where p = r — r3 and V is the derivative with respect to rig.

Theorem 2.3.6 (Local to Local Expansion). A local expansion 0(") that exists
in the domain of; centered around r’o7 can be shifted to the domain {to centered at r0

using

00

L("') = Z

m=n m—n

m

0(m) . (m — n) . (r?)”"” (2.7)

It can be shown that this expression is exact as well. Finally, the ﬁelds at a set of

43

observation points can be computed using the following theorem.

46) = Z Li") -n- (pm-r (2.8)

n=0

Proofs for these theorems for MR) = R” can be found in [166] and may be trivially
extended to functions of the form ’l/J(R) = span{R"”} for u = —1,0, 1, ~ -- ,K, or
any other non-oscillatory function. Note, that when ¢(R) = R"", evaluating the
multipole to local expansion using Theorem 2.3.5 implies the computation Of VnR'V

which can be efﬁciently eﬂected through

11311 13221 11311

n n. n 1 _2 _ "'2
6.12-26.3 (17) = <-1>"R " " Z Z 2 H)":
m1 =0 m2 =0 m3=0 m1 m2
(2.9)
”3 X Rme (V, Tl _ m _ 1)xn1-2m1yn2-2m22n3—2m3
m3

where R2 = x2 + y2 + 22. As was pointed out in [166], a computation scheme based

on these theorems have the following characteristics:

1. The multipoles are independent of the function being translated. Only the
translation operator depends on V. This fact will be of use in developing fast

methods for evaluating the retarded potential.

2. The multipole to multipole expansion (or the local tO local expansion) is exact.

This implies that the errors obtained do not depend on the height of the tree.

3. The formulation in terms of totally symmetric tensors permits the realization

of CPU cost savings of a factor of 1 / 720 over a simplistic implementation.

4. Finally, since only the translation function depends on the potential function

being used, it follows that the proposed methodology can be readily altered,

44

with very little change in the overall algorithm, for other potential functions.

2.4 Multi-level Tree Computational Framework

These theorems permit rapid evaluation of potential using either a standard or com-
pressed oct-tree decomposition of the domain. A standard oct-tree is constructed by
ﬁrst embedding the entire domain in a ﬁctitious cube that is then divided into eight
sub-cubes, and so on. This process continues recursively until the desired level of
reﬁnement is reached; an Nl-level scheme implies N1 — 1 recursive divisions of the do-
main. At any level, the domain that is being partitioned is called the parent of all the
eight children that it is being partitioned into. At the lowest level, all source / observers
are mapped onto the smallest boxes, leaf boxes. This hierarchical partitioning of the
domain is referred to as a regular oct-tree data structure. At any level in the tree,
all boxes/ domains are classiﬁed as being either in the near or far ﬁeld of each other
using the following dictum: two subdomains are classiﬁed as being in the far ﬁeld of
each other if the distance between the centers is at least twice the sidelength of the
domain, and their parents are in the near ﬁeld of each other. This deﬁnition will be
used unless it is specially stated that an alternate deﬁnition is necessary.

The interactions between all source and Observation points are now computed
using traversal up and down the tree structure in the following manner. First the
multipole expansions are computed at the lowest level for leaf boxes. Parent box
multipoles, at all levels, are computed from its children multipoles using multipole-
to-multipole translation Operator, theorem 2.3.4. This process is called upward tree
traversal. Second, local expansions are computed at every box from multipole expan-
sion of the boxes in its far-ﬁeld using multipole-to—local translation Operator, theorem
2.3.6. Next, the local expansions of all child boxes are updated with the local ex-

pansion of its parent using local-tO-local translation Operator, theorem 2.3.6. This

45

procedure is referred to as downward tree traversal. Finally the potential at observer
points are computed from the local expansion of leaf boxes using theorem 2.8. This
the far-ﬁeld potentials that accounts for contribution from all sources except from
the sources in near-ﬁeld region of the corresponding leaf box. The total or complete
potentials is Obtained by accounting for contributions from near-ﬁeld sources for leaf

boxes only through direct evaluation.

Cost of this scheme can be computed in the following manner. The cost associated
with each Operation will be denoted by Cop where op E {N F, 02M, M 2M, M 2L, L2L}
that stand for (i) near ﬁeld (ii) charge to multipole (iii) multipole to multipole (iv)
multipole to local (v) local to local and (vi) local to Observer. In the following analysis
the total number of interaction pairs is denoted by N, number of harmonics by P,
total number Of levels in the tree by N]. Let NM denote the number of boxes at
each level and assume that the number of unknowns in each leaf boxes, on average,
is s. It follows that NM = N/s, N,,,,_1 = 8N“ and 2:1, Nb, o< N/s. With these

preliminaries the cost of each Operation can be computed in the following manner.

1. Near ﬁeld evaluation, C N F1 This computation is carried out only at the lowest

 

level I = 1, at leaf boxes. The cost of direct evaluation between two leaf boxes
scales as s2 and for each leaf boxes can atmost have 27 boxes in its near-ﬁeld.

The total cost of this operation can be written as

CNF OC N/s X 2752
(2.10)

or 27Ns

2. Multipole expansion, 002M: In this Operation multipoles expansion in form

 

of totally symmetric Cartesian tensors are computed from s charges per leaf

box. The distinct elements in a totally symmetric Cartesian tensor of rank p is

46

+ + . e cost 0 eva uatm mu t1 0 e tensors u to ran IS,
(19 1)(p 2)/2 Th f 1 'g 1°p1 p Pth k'

P
062M 1%.. s x Ze+1>tp+2v2
p=o (2.11)
_ NP3
- T

. Multipole to multipole expansion translation, C M2 M3 The multipoles of parent

 

boxes, at any level, is computed from its eight children multipoles. The number

of Operations to translate all P + 1 multipole from a child to its parent is
6

H (P + i) / i. Since the total number of boxes in the tree scales as N / s the cost

i=1
for this Operation can be expressed as

(P+i)

 

N 6
CM2M = —S- x H (2.12)
i=1

. Multipole to local translation, C M2 L3 This operation is performed on all boxes

 

of the tree. For any box the maximum number of far-ﬁeld boxes is 189. The
cost of translation between two tensors is same as in previous case.
(P + i)

N 6
CM2L=189:-H i (2.13)

 

i=1

. Local to local expansion translation, C L2 L1 As mentioned before this operation

 

is exactly the same as multipole-to—multipole translation operation i.e. C L2 L =

CM2M-

. Local to observer, 01,20: cost Of this Operation is exactly the same as that for

 

mapping charges to multipole expansion i.e. 0120 = 002M-

47

The total cost of the scheme is the sum Of all individual cost,

Ctotal = CNF + 002M + CM2M + CM2L + CL2L + CL20

NP6 NP3
= 7 __ __
2Ns+1913720+ 3

(2.14)

It is readily apparent that optimal number of unknowns per box is s or P3/10.

2.5 Algorithmic Improvements

The above discussion pertains to the classical multi-level computational framework
introduced in [167]. Since its introduction several modiﬁcations and additions have
been suggested to reduce the computational time and extend its applicability to
general geometric distributions. Following are some of the developed in this research

work for Optimal implementation Of ACE algorithm.

2.5.1 Reduced Interaction List

From the cost estimate of ACE algorithm it is evident that C M2 L, cost of multipole-to-
local translation Operation, is the dominant part. Both the per-translation evaluation
cost and number of interaction pairs (typically 189) are very high. This observation is
common to all FMM like methods [168, 169]. In FMM based on expansions in terms of
spherical harmonics both the factors can be reduced with the use of plane wave basis
representation and exploiting the resulting symmetry. Alternatively, in this work a
new deﬁnition is introduced to classify far-ﬁeld pairs: if box a (at level 1 + 1) interacts
with all the children of box b ( at level I) and box a, box b are in far-ﬁeld of each other
then box a interacts with box b. Interaction between boxes at two consecutive levels
is easily effected using Cartesian tensors. In fully populated oct-tree this results in

a reduction in the number of translation operations by half with minimal increase in

48

error. Numerical results that support this claim is presented in chapter 3.

2.5.2 Compressed Oct-tree

Classical multi—level FMM loses its 0(N3) scaling when applied to geometries with
non-uniformly distributed sources / observers. The root cause of this breakdown is the
use Of uniform or regular oct-trees where all branches of tree is grown till the lowest
level. This implies that the number of source / Observer per leaf box varies drastically
between regions Of low and high source / Observer concentrations. For leaf boxes with
very low number of unknowns, evaluation of potential at its far-ﬁeld boxes through
{CZM, M 2L, L20} Operaion can be costlier than direct evaluation. To overcome this
shortcoming an adaptive version of the multi-level computational scheme was pre-
sented in [169] with compressed oct-tree representation for non-uniform geometries.
In compressed oct-tree representation only boxes, at any level, with source/Observer
pairs greater than a predetermined number is sub-divided into child boxes. Further,
in adaptive version always the optimal form of FMM is used based on the number Of
points in a leaf box. The implementation Of ACE algorithm closely follows the work
in [169], the main deviation is that the smallest box is used to enclose some pre-ﬁxed
number of points per box, 3. While this approach is not signiﬁcantly diﬂerent in terms
of cost when compared with [169], it does provide the possibility of improving error
with certain geometries as the error in multipole evaluation is reduced. On downside
this method may produce large number of single child parent which in turn increases
the number of tree traversal Operations, however this can be remedied by eliminating
these redundant parent boxes. With the elimination Of single child parent nodes, the
resulting oct-tree would have the same structure as in [169] except the leaf box size

would be smaller here as shown in Figure 2.1.

49

 

 

 

 

 

 

 

 

101101\1/ ch

1010110 L101“),
/ . - /__/’ ° /
/ " - / / 5—7/ /
/ 7- ' / W

 

Figure 2.1: An example Of compressed-quadtree with binary key representation used
to label the tree nodes.

50

Chapter 3

Fast Evaluation of Time Domain
Retarded Potential in

Sub-Wavelength Structures

In this chapter, a computational scheme is presented for fast evaluation of time do-
main retarded potentials in sub-wavelength structure, whose principle dimension is
less than or only a few orders of the maximum wavelength. Section 3.1 provides a
brief review of the existing fast algorithms for evaluation of retarded potentials and
their limiations when applied to sub-wavelength structures. Section 3.2 describes the
problem of computing retarded potentials. Here these computation are reduced to
evaluation of polynomial potentials Of different orders. Section 3.3 details a fast algo—
rithm when principal dimension of the domain is less than the maximum wavelength
and Section 3.4 generalizes this to arbitrary size domains. Section 3.5 presents re-
sults and discussion of the proposed method when applied to arbitrary uniform and

non-uniform geometry distributions.

51

3. 1 Introduction

Time domain solutions to scattering problems is preferred over frequency domain
methods when the analysis spans a wide range of frequencies. Examples of such anal-
ysis include characterization of Wideband antennas and analysis radar signatures. In—
tegral equation based methods for scattering from electrically large objects in time do-
main has been made possible via the development of acceleration techniques like plane
wave time domain (PWTD) and time domain adaptive integral method (TDAIM).
These methods ameliorate the computational cost when the size of overall Object is
several wavelengths long and the smallest feature scale is a fraction of the wavelength.
However, analysis Of structures that contain a mix of feature scales, poses problems
for acceleration techniques in both frequency and time domain. Here, it is the geo-
metric constraint that dictate the computational complexity. For instance, to model
ﬁne features, it is necessary to discretize that domain at a considerably higher rate
than that is dictated by the smallest wavelength to capture the geometric details.
These features occur in the analysis of practical problems in applied electromagnet-
ics, ranging from EMI/EMC applications to antenna topologies to feed structures to
signal integrity analysis in high speed interconnects, etc. The solution to this problem
is typically sought by devising a methodology that works at sub-wavelength scales,
and developing a transition to higher frequencies so that it can be integrated with

existing acceleration methodologies.

The problem encountered herein is not very different from those addressed in
the frequency domain fast multipole method (FMM). The PWTD algorithm is a
time domain analogue of FMM, with one signiﬁcant difference; the ﬁeld due to a
quasi-time limited and bandlimited source can be reconstructed to arbitrary accuracy
using a discrete set of propagating plane waves provided certain separation conditions

between the source and observers are met [170]. The separation criterion ensures that

52

time gating can be employed to yield causal results. Unlike in the frequency domain,
the cause of breakdown is not the expansions used in the algorithm; all functions
used in the expansion are regular at zero. The breakdown occurs because domains
that interact with each other via the PWTD algorithm are determined indirectly by
the time step size. As the time step depends only on the maximum frequency of
excitation and not on the smallest discretization, it implies that PWTD breaks down
as an acceleration tool because most Of the interactions would fall under the “near”
ﬁeld classiﬁcation. However, these arguments suggest an approach for overcoming this
hurdle; develop an acceleration procedure using adaptive time stepping. The main
advantage Of this procedure is the seamless manner in which it can be integrated with
the classical PWTD scheme for high frequencies, resulting in an acceleration scheme
that is valid at all length scales [171, 172]. Alternatively, one can modify existing
frequency domain low frequency algorithm to construct time domain information
[173]. This implies that one needs to develop the mechanism to transition from
frequency to time domain and vice versa such that the resulting system can still be
cast within the framework that permits transient analysis within a marching on in
time framework. It has been shown that the latter approach is considerably faster
than the former [173].

This work presents an alternate method to arrive at the same objective and is
founded on using Taylor expansions in a Cartesian framework, detailed in previous
chapter. More speciﬁcally, the methodology presented herein will rely on the recently
develOped fast kernels for evaluating potential of the form R” for l/ E R, and is very
competitive in terms of speed for a given accuracy with the other two methods that
exist [171, 172, 173]. integrated with PWTD. Thus, the main contribution in this

work are

0 Development of an acceleration technique to compute retarded potentials in

the sub-wavelength regime. The method presented relies on representing the

53

retarded potential as a function of potentials of the form R” , and then acceler-
ating this function. The presented method can be extended to other functional

representations as well.

0 Development of the requisite algorithmic structure to seamlessly extend this
(with very little cost overhead) to multiple time steps. Extension to multiple

time steps is done with the sole aim Of integrating with the PWTD algorithm.

0 Application to sub-wave legnth problems with non-uniform geometric distribu-

tions

3.2 Problem Statement

Consider a set of N3 sources that are randomly distributed in a domain 9. The
location of these sources will be denoted using rn and their time signatures by fn(rn, t)
for n = 1, - -- ,Ns. It is assumed that these functions are bandlimited to an angular
frequency 62m and all sources are approximately quiescent for t < 0. As in all time
domain solvers, the source functions fn(rn, t) are known only at evenly spaced time
steps tk = kAt for k = 1, - - - ,Nt where At = rr/(xwmax), MA), is the total simulation
time and x is an oversampling factor. Typically, x > 1 and chosen between 5 to 20
to accurately reconstruct functions fn (rn, t) from its samples. The ﬁeld at any point
r due to these sources is given by
”8 6(t - 124/c)

<I>(r, 1) = Z T * fn(1‘n, 1) (3-1)
n=1

where c is the speed Of light, * denotes convolution in time and Rn = ||r — rnll. It
is apparent that the cost of computing (3.1) scales as 0(NtN3). Finally, in keeping
with the deﬁnition of sub-wavelength regime, the size of domain is diam(Q) = 0(cAt).

Given the size of the domain, it is apparent that the PWTD scheme cannot be readily

54

used; it has to be substantially modiﬁed in order to evaluate these potentials efﬁciently

[172).

In developing this scheme, it is necessary that the source signatures in (3.1) be
known so as to facilitate the integration of the proposed algorithm with existing
marching-on-in—time solvers for time domain integral equations. The starting point
of the proposed method arises from the representation of the source signal. Assume
that the source function can be represented in terms of fn(rn, t) = 2k Inka(t),
where Tk(t) = T(t — tk) is a time basis function and I): are the samples of the
function at the discrete time step tk. It follows from this representation that

0% 00

(rt) = Z 2: 111,71: Tk(t - RnR’n/C) (32)

n=lk=0

 

This implies that to realize a fast algorithm, one needs to rapidly compute functions of
the form T), (t - Rn/c). To illustrate the development of a fast algorithm the temporal
basis functions are chosen to be backward Lagrange polynomials. Note, however, that
the methodology presented herein is not restricted to polynomials. To this end, the

K th order basis functions is deﬁned as

hk(tng—k(t) for (k — 1)At S t S kAt; k = 0, - - - ,K

 

 

T(t) = (3.3a)
0 otherwise
where,
1 k = 0
Mt) = k t _ m, (3.3b)
, k 75 0
i=1 ‘1At
and K k
__ _ t + lAt
arr—11(1) — t—H1 mi (336)

It follows, from the above equation that T(t) = 0 for t 9? (—At, K At), T (0) = 1 and

55

T(t) = 0 for t 2 —At, At, 2At, - - - , (K— 1)At. Using the functions in (3.2), and point
testing in time, results in the potential function that is a polynomial of R". It means

that one can directly exploit acceleration methods developed for kernels Of the form

R" [166].

3.3 Single Time Step Geometries

The ﬁeld at any observation point r c Q, at time instance iAt, due to sources at

rn E 9 for n = 1, - -- ,Ns can be obtained from (3.1)

 

<I>(iAt, r): giftéh 72an/6) fn(rn, iAt — r)dr (3.4)

n=1 0
where Rn = “r — rn|| and fn(rn, t) is the transient source strength at the nth spatial
point. The limits [0, At], on above time integral is possible because Rn/c E [0, At].
Employing time-domain basis function from (3.2) and evaluating the time integral in

(3.4) results in,

 

 

<I>(iAt,r) = i Z 1",,- [nu-”2; Rn/c) (3.5)
n=1j=i— —K
Ab If
.___ ZZIni-jT(jAtRnEI/C) (36)
n=1j=0

where, K is the order of temporal basis function T(t). Since T(t) is chosen to be a

backward Lagrange polynomial, (3.6) can be expressed in terms of powers of Rn/c as

NsKK

<I>(iAt,r) 2:21,, ,_Joz (h ,j )R,,_1 (3.7)

n=1 j=0 h: 0

56

In (3.7), a(h, j) is the coefﬁcient corresponding to the polynomial of degree (h —
1) for the basis function at (i — j)-th time step, they also depend on At and c.
Evaluating these polynomials of form R" can be performed at 0(N3) cost using the
ACE algorithm. Thus, the overall cost of this scheme scales as 0 (K N3). Error
bounds for using ACE to evaluate (3.7) can be obtained from the bounds derived in
[166] and it can be proven that the upper bound of the error is determined by that for
R‘l. Note, that the above derivation is not speciﬁc to using polynomials as temporal
basis functions. Other basis functions may be dealt with in one of two ways; either
by ﬁnding the appropriate translation functions, or by mapping these onto a space of
polynomials. Using polynomials is fairly trivial as the framework for the R” kernel is
readily available [166].

The 0(KN3) reduction in cost, speciﬁed above, is for brute force implementa-
tion of the ACE algorithm. It is important to recognize that the above formula-
tion demands evaluation of the kernel 12'” for different V’s. However, most of the
steps in the proposed algorithm are kernel independent. In that, Theorems 2.3.3 and
2.3.4 (multipole expansion and multipole-to—multipole translation) do not depend on
the kernel. Similar observation holds for local-to-local translation and evaluation of
potential from local expansion, Theorem 2.3.6 and equation (2.8). Thus, only the
multipole-to—local translation, Theorem 2.3.5, depends on the kernel and requires the
evaluation of VnR" for different V values. Therefore, evaluation of polynomials of
form 2,, cuR" involves (almost) one tree traversal (up and down) irrespective of the
kernel, only the multipole-to—local translations need to be done separately for each
kernel or polynomials of different degrees. Thus, a careful implementation of the ACE

algorithm rasults in an adaptable and signiﬁcantly lower cost algorithm. Applying

57

the multipole to local translation (Theorem 2.3.5) in (3.7),

N3 K K
(P(iAtJ) = ZZZIn,i—ja(hvj)R’ri-l

n=1j=0h=0
N3 K K P

= Z z Z In,i—ja(haj) Z VpRgh—l) 'P ' R’n(p)
n=1j=0h=0 p=0
P K K N3

= Z: Za(h,j)VpR£h‘l) ~p- ZIni—JRW)
p=0j=0 h=0 n=1
K P

= 2273“”) 'P-M?) (3.8)
J'=0p=0

where R0 = ||r — roll, Rf, = Hro — rnll and r0 is the center of sphere enclosing

all sources. 73.0)) and M?) are the optimal tensor representation of multipoles and
translation Operation of the ACE algorithm. Equation (3.8) implies that upward
tree traversal, i.e., multipole-to-multipole translation and multipole-to—local transla-
tion should be performed K times. This is to preserve the transient information,
In,,-_j associated with each basis function for every source. However, downward tree-
traversal which include local—to-local translation and potential evaluation needs to be

performed only once.

3.4 Multiple time step interaction

The above exposition was geared towards developing a scheme for computing inter-
actions when diamm) < cAt. Next, the generic case of diamQ > CA; is approached
through modiﬁcations to above methodology. Consider two domains 91 and {22 such
that, ‘v’ r1 6 $21 and r2 6 {22 satisﬁes (N —1)At g ||r1 — r2||/c S NAt, where N

is any positive integer. Then, the ﬁeld at any point 1'] at i-th time step, (P(iAt, r1),

58

due to Nsﬁz sources at rn E (22 can be written as

 

 

N3 92 NA
. ’ 6 'r — Rn c .
<I>(iAt,r1)= ”21 f ( Rn / )fn(zAt —T,rn)drdr (3.9)
_ (N—1)At
where Rn = ||r1 — rnll. Repeating the derivation presented for single time step
interaction,
N3 K .
. T J + N — 1 At — Rn C
‘I’(2At,l‘1) = Z ZIn,i—j—(N—l) (( R: / ) (3°10)
n=1 j=0

When N = 1, (3.10) reduces to the case for interaction within one time step (3.6). It
is important to preserve R/c argument of the basis function in (3.10), as a polynomial
representation is necessary for acceleration using the ACE algorithm. Thus, the key
in multiple time step interaction is to identify groups (21 and {22, and it can be done

using the following argument;

ﬁnd dmin Z NAt and dmag: S (N + ”At (3.11)

where, dmax and dmz-n are the maximum and minimum distance between any two
points in $21 and {22, see Figure 3.2. For example, consider spherical domains of
radii 1'1 and r2 whose centers are separated by R0; then dmax = R0 + T1 + T2 and
dmin = R0 — r1 — r2. From (3.10) and (3.8) it can be inferred that the number
of upward tree traversals (multipole-to-multipole and multipole-to—local translations)
equals NWK, where Nmacht is the diameter of the sphere encompassing the entire
low-frequency region Q. These constraints mandate a new deﬁnition be used when

developing the interaction list in the oct-tree as follows:

Deﬁnition Interaction list rule: Consider two child boxes whose parent boxes are in

near-ﬁeld. They are in each other’s far-ﬁeld if the distance between their centers is

59

at least twice the sidelength of the domain and they satisfy (3.11). Otherwise, they

are in each other’s near-ﬁeld.

Some boxes may be well-separated in space and still not satisfy the temporal
constraint in (3.11). For example, consider two spheres of radius r1 = r2 = cAt / 8
whose centers are separated by R0 = N cAt, now dmaz = c(N + 1/2)At and dmin =
c(N — 1/4)At which do not satisfy (3.11). In such cases one can choose either of the
following options: (i) sub-divide the domains and perform interaction at next level
(with smaller domain size); (ii) consider the domains to be in near-ﬁeld of each other
and use direct evaluation. Sub-dividing the domain without limit has two disadvan-
tages. First, the number of unknowns per smallest box, with increasing levels, can fall
below the limit for optimal computational cost. Second, sub—division into smaller size
boxes does not always ensure compliance with constraint in (3.11); it can be shown
that boxes who’s centers are separated by multiples of CA1; (N cAt), without regard to
their size, will not follow the temporal constraint (3.11) and interaction between such
boxes should be evaluated using direct methods. Further, using the second Option on
short trees can increase the total number of near-ﬁeld interactions and dominate the
overall computational cost. In this work, an optimal implementation is obtained by
combining both, i.e., sub-dividing up-to a certain level and beyond this level domains
violating (3.11) are placed in near-ﬁeld interaction of each other. It is essential to
note that the number of levels up-to which sub-division is used can be geometry de-
pendent. In essence, this procedure overcomes the multiple time step interaction with
a slight cost overhead that should be optimized. Further observations are presented

in next section.

60

3.5 Results and Discussion

In this section results presented will substantiate the above claims and demonstrate
the efﬁcacy of the algorithm presented herein. As in all illustration of FMM meth-
ods, the goal is to demonstrate considerable speed—up with predetermined accuracy.
Consequently, the results presented will demonstrate convergence as well as 0(N3)
per time step CPU cost scaling. In all numerical experiments, the source/ observer
locations are randomly distributed. The corresponding standard / compressed oct-tree
data structures (including interaction lists) are generated using the algorithmic pro-
cedure outlined in the Appendix. The accuracy of the prOposed algorithm is validated
against analytical data for all casae where the unknown count is numerically small.

The relative error at nth observer is evaluated as

_ llq)fast,far(nat) - (panalytical,far(nit)ll2
)

Error (n — (3.12)
f” H<I>....:,,..-c..z,f..<n,t>nz

 

where, “”2 represents L2-norm, <I> fast, far (t) and (panalytical,far(t) represent the time
history of the ﬁelds produced by the sources evaluated using proposed algorithm and
analytical procedure, respectively. The error reported in this work is the average error
over all observers [172] when the number of observers N 3 < 32, 000. For larger number
of unknowns, the analytical data (and hence the error) is computed for randomly
distributed unknowns (approximately 150). Hence, the reported data is an estimate
of the expected error. These value are denoted using a 1‘. Finally, as is usually done
for all fast algorithms, analytical data is computed only for the source/ observation
pairs that are in the far-ﬁeld of each other, and is consequently representative of an
upper bound or worst-case error. The CPU timings (in seconds) are those taken for
evaluating the ﬁeld at a single time step using a 2.3 GHz Intel Pentium processor

with 2GB RAM running Linux OS. In all experiments that follow, the time signature

61

that is associated with the nth source is given by (A.16)
—(t—t )2/202
fn(t, rn) = Fine 1) (3.13)

where Kn is the magnitude of the source randomly chosen between [0,1], 0 = 6.366 x
10’8 s and tp = 60 s. The effective highest frequency and minimum wavelength
associated with these signal parameters are fmax = 3/7ro = 15MHz and Am,” 2:
20m, respectively. As prescribed in MOT solvers, the time step is chosen as At =
1 / (20 fmax) = 3.334 ns and is independent of geometric feature size and only a func-
tion of fmax. The above parameters are chosen such that cAt = 1m, thus, all
geometric features smaller than 1 m would fall in the sub-wavelength category. In
rest of the section, P denotes the number of ACE harmonics used and K denotes the

order of the time basis function.

The ﬁrst set of numerical simulation is performed to demonstrate the validity
of the improvements made in the kernel that reduce the number of translations by
approximately a factor of two without signiﬁcantly affecting the order of the error (see
Appendix 2.5.1 for details). The numerical experiment performed is as follows; source
points were randomly distributed within a cube of side-length 0.5 m, i.e., all points
interact within one time step. The number of source / observation points is varied (as is
the height of the tree), the number of unknowns at a leaf box is approximately 64, and
error is obtained for the “Old” and “New” schemes. The results presented in Table
3.1 indicate what is expected, viz., the computational cost is reduced approximately
by a factor of two while the increase in error is almost always marginal (the order of

magnitude of the error is unchanged).

Next, set of results demonstrate that the multipole-to-multipole and local-to—local
operations are exact. An important ramiﬁcation of this is that the error is independent

of the height of the tree. This experiment is effected as follows: consider two cubical

62

domains {21 = (0, 1/4) x(0,1/4)x(0, 1/4) In3 and 522 = (1/2, 3/4) x (0, 1/4) x (0, 1/4)
m3. Each domain contains 4000 randomly distributed source and observation points.
In constructing interaction lists, it is ensured that only sources / observers in (21 and
(22 interact, all others are ignored. Thus, as the number of levels in the tree are
increased, the change in the error norm can be attributed solely to the multipole-to-
multipole and local-to-local operations. Table A.1 shows error computed for different
{P, K} pairs and different levels in tree, where dxo is the size (in meters) of smallest
box. It is evident from Table A.1 that, for a given {P, K} pair, the variation in error
obtained from using different levels in the tree is accurate to double precision. This is
a consequence of the fact that Theorems 2.3.4 and 2.3.6 are exact, i.e., they produce
the multipole (or local) expansion had the box size at that level been the leaf box.
Consequently, the error bounds are much tighter. Details and proofs can be found in

[166].

Next, results are presented for distribution wherein all source/ observation pairs
are distributed within a domain (I < cAt and distribution sizes ranging from 8000
to 4,000,000 points. The number of unknowns per leaf box, on average, is chosen to
lie between 60 and 70. From Table 3.1, it can be inferred that number of harmonics
and order of time basis function are closely coupled, i.e., for a given K, arbitrarily
increasing P does not improve the error and vice-versa. This is true because the
two sources for error (A.17) reported here are (a) approximation of a time signal
with polynomial basis function of order K, and (b) error in evaluating a polynomial
through ACE (limited P) due to far-ﬁeld approximation. Hence, the results for time
comparison are presented only for the optimal pairs {P, K}. For example, {4, 2}
indicates simulation run with 4th order harmonic in ACE and 2nd order temporal
basis functions. In general ﬁrst, second and third order temporal basis function
can provide up to 0(10'4), 0(10-5) and 0(10’7) accuracies respectively, for the

given source signal parameters (A.16). Table 3.3 shows the relative error for different

63

{P, K} pairs and distribution sizes, N3. It can be seen that for increasing {P, K}
combination the error decreases consistently. Table 3.4 presents the per-time—step
computation time involved in both direct and proposed algorithm, the order of error

corresponding to different {P, K} pairs can be inferred from Table 3.3.

Similar results are presented for multiple time step interaction in Tables 3.5 and
3.6, where N denotes the number of distinct time step interactions and Cs denotes the
sidelength of cube enclosing all sources / observers in meters. In Tables 3 to 6 empty
entries, pertaining to large N, and {P, K} values, are due to insuﬂicient computer
memory on the chosen computer platform. Figure 3.6 shows N3 vs. Tfar graph in log
scale for data in Table 3.4. The lines plotted in the graph corresponds to a least square
error linear ﬁt for different {P, K} pairs. Slope of these line for different {P, K} values
was approximately 1.06, thus, validating the 0(N3) scaling of algorithm presented

here.

The evident mismatch between timings in Tables 3.6 and 3.4 is explained as fol-
lows. In the case of single-tirne—step interaction, the size of smallest box was chosen
to accommodate 60 to 70 unknowns per box on average. However the largest box, at
top of the tree (level 1), is within cAt dimensions; therefore, the height of the tree
increases as distribution size is increased. In case of multiple time-step interactions
one can keep the leaf box size constant and increase the level-1 box size for higher
distribution size, to achieve % 64 unknowns per leaf box. However, this does not
imply a direct increase in tree height because the interactions at larger boxes also
need to obey (3.11). For example, for two spheres of radius r, to interact, the limit-
ing condition based on (3.11), is r, S cAt / 4. Boxes greater than this size interact
only through their child. This is the only limitation of the algorithm presented here,
however, in practice the algorithm can be strictly used to compute ﬁeld interacting

in few time steps only and PWTD will interface with this method when [R/ (cAt)J

is beyond a certain number of time steps. Thus, an ideal algorithm should switch

64

between the proposed algorithm and PWTD seamlessly.

Finally, results for an adaptive version of the algorithm introduced here is shown
on two types of non-uniformly distributed geometries. The ﬁrst closely resembles
interconnects in electronic chips as shown in Figure 3.4. The distribution of points
between top and bottom planes and two interconnects were approximately the same.
In applying the adaptive version, the number of unknowns per leaf node was ap
proximately 64, was tested for source / observer distributions ranging from 8,000 to
1,000,000. Table 3.7 presents the error obtained using the proposed algorithm, and
was generated for different combinations of ACE harmonics (P) and order of time
basis function (K). The rate of error convergence exhibited here is fast in compar-
ison to those in Tables 3.3 and 3.5. This outcome is primarily attributed to the
consideration of smallest box enclosure and stricter enforcement of error criteria in
building the interaction list; see previous chapter. The timing result for this geometry
conﬁguration is presented in Table 3.8. As explained above in uniform distribution,
the timing results are presented only for certain combinations of {P, K}, each pair
corresponding to different orders of accuracy given in Table 3.7. Figure 3.7 shows
N3 vs. Tfa, graph in log scale. The slope of the linear ﬁt was approximately 1.06
for different pairs of {P, K}, exhibiting the 0(N3) scaling produced by the adaptive
version of the algorithm. The second geometry conﬁguration considered is made of
three circles with points non-unifome distributed in each of them as shown in Figure
3.5. Each circle is 0.15 m in radius and the points were distributed so that density of
points is inversely proportional to the radius. The adaptive version is applied on ﬁve
different distribution sizes varying from 9,600 to 1,000,000 and the results are shown
in Table 3.9. As before, it can be veriﬁed that the time scaling is 0(N3).

65

 

 

Figure 3.1: Example of antenna feed geometry with low- and high-frequency regimes
denoted by Q L F and 9 H F respectively. Smallest wavelength of incident pulse is also
shown for reference.

 

 

 

 

dmin
92

 

 

 

 

Figure 3.2: Deﬁnition for domains interacting over multiple time steps

66

 

N

|:I 3

2

- 1

- Direct

     

AT

Source box
Single interaction case

Figure 3.3: Map of N in equation 3.11 for an example single level interaction.

Table 3.1: Comparison between Old and New (reduced) scheme for interaction list
and P

for different distribution sizes

 

67

3':
I . .
I. i.
0'..- - I...-
II - n 'l
‘0 I :l I ‘f. . .
I’

.‘ :‘1 .. A . I

- '. .'. 1.: I

I‘.; '.: :. e u. : ‘8
I I'. 0‘ . ...I :' .
:‘. I * E I y.
" I ~I I. 0 I.
.~ . I a '

I .0. .M. . . ‘ a}'
3' I I -

U.
I ..' I I
"t... -' ~ ..0 .ﬁ'Iu'". 0
I '7

.. I. .. .: O. :t

'0'.‘ .0 ~' ‘? mg}- \ E...

‘ . r .§:3-«’i-r’,__‘ag'._,g_..
:4.

II
. .o .e
.' I ~I ."
e I

. J... . :.:."-5;'§'L‘:'-“~‘".

. I...
:. I -. o -I ’ I
"' I If. .3” . :. I..
' ‘ - ' - ' -' '. Z"
I a s I I a. "
,..: ." .f‘ ‘ ' I' '5
z. s' o .2 -- -
‘ I 0‘. 'ﬁo
I
. 42.; , ‘.-. .
._II n. . ~' -I. ’4'.
e . . \- ‘ ' .’
I... :0 ’. I
I . e . .. I.
' :::.v'-. ~ ' -
5 a. _
. ..‘ 1.1: .‘A .
I .004..'. I. I 4 . .-
’ . ~54" \ . -
I . I ..n . II
I. a. ' I... a." g ._
I... ..' I . i
O. ' 3‘. ‘
I : .
.. II ~-
I O . .I' '
I ' .3
s \
.5:- ..'
.

 

Figure 3.4: N on-uniform geometry conﬁguration 1, resembling interconnect in elec-
tronic chips (N3=12000).

Table 3.2: Exact multipole to multipole and local to local operators of ACE
ditto {P, K}
A=0.0625 Levels {1,1} {4,2}
A 4 1.8800972191556 69E—2 7.03463843261 4828E-4
A/ 2 5 1.8800972191556 66E—2 7.03463843261 3739E-4
A/ 8 7 1.8800972191556 70E—2 7.03463843261 3831E-4
A/32 9 1.8800972191556 70E—2 7.03463843261 3819E—4

 

 

 

 

 

 

 

 

 

 

 

 

68

 

'? n . .-n...o' .. . . . n . .. .‘QE. .. 0. ':;?":‘-.'....‘-. f
T ' .1. ‘ﬁ ‘0..- }" I a .I... ‘, ‘.' F . I ‘99.” 'e ‘vl ’5
- h. - . o' a ..:--v--r--.. -=" - -
.: a. .I s It. a ..n '. ' O .0 .0. e a N
..I "‘ I ‘.:...‘l ' . '- ;.O‘I. .
0" .'II . . . ... - . :t'.:{."} 5 I: "i:
u I o u i - I ~ 0 "
g‘ :. . ‘ ~ . f 'I
o o ‘ ~- .0 . ‘I I ".
O . ’. ‘I {7. ' I
‘ .a. '. . :’- . ‘
I... .z. a T :
o‘- I:
.- I I

Figure 3.5: Non-uniform geometry conﬁguration 2 (N3=9600).

 

25 ‘ O {1 ,1}

2: um}
15- .{42}

  
    
  

——Llnoer Fit

 

 

 

9
Cl
l

4.6 q

-2 . . . . . .
3.7 4.2 4.7 6.2 5.7 6.2 6.7

L00 (NI)

 

 

Figure 3.6: log(N3) vs. log(Tfa,.) for single interaction case and uniform geometry

69

  

 

   

 

 

 

 

 

1.1
0.6 -
E 0.1 -
§0A, onn
09 4 - {2.1}
A {4.3}
.14 .
-—LineIr Fit
'19 I l I I
3.7 4.2 4.7 5.2 6.7
L09 (N!)

Figure 3.7: log(N3) vs. log(Tfa,.) for single interaction case and non-uniform geome-
try

Table 3.3: Error far in single time step interaction case (Cs 2 0.5), for various N 3
and {P, K} pairs.

 

N3

8000

12000

32000

640001

soooooi

 

levels

4

 

4

 

4

 

5

 

6

 

{P,K}

Error jar

 

{1,1}

0.00581

0.00995

0.00474

0.00808

0.0015

 

{0&1}

0.000938

0.00155

0.000637

0.000944

0.00176

 

 

{ELI}

0.000341

0.000596

0.000355

0.000638

0.00118

 

'982}

7.87E—05

0.000143

6.54E—05

0.000174

0.000406

 

'012}

2.09E—05

4.87E—05

2.59E—05

4.01E—05

2.48E—05

 

{95$}

4.97E—06

4.52E—06

2.31E—06

8E—06

8.34E—06

 

 

 

{13,3}

8.85E-07

 

1.62E—06

 

1.03E—06

 

2.43E—06

 

 

 

Table 3.4: Comparison of run-time in single time step interaction case (C's = 0.5).

 

 

 

 

 

 

 

 

 

 

fl:fasta {P, K}

N3 TDirect {131} {2’1} {4’2} {953? {13:3}
8000 4.47 1.40E—2 3.18E—2 0.14 2.17 10.96
12000 11.02 2.27E—2 4.6lE-2 0.23 3.60 18.50
32000 97.59 8.87E—2 0.18 0.91 14.42 85.2
64000 - 0.20 0.444 1.90 27.95 173.38

500000 - 1.94 3.82 15.67 245.37 -
1000000 - 3. 78 7.19 30.98 498.03 -
2000000 - 7.71 13.33 60.18 742.21 -
4000000 - 16.06 27.46 121.72 1940.15 -

 

 

 

 

 

 

 

 

 

70

Table 3.5: Error far in multiple time step interaction case, for various combination
of N3, N and {P, K} pairs. N is the number of distinct time steps involved.

 

8000

12000

32000

32000

128000T

 

4

4

4

5

6

 

1

1

1

2

2

 

2

 

2

 

2

 

3

 

3

 

ETTOTfar

 

2.92E—03

2.97E—03

3.50E—03

2.11E—03

3.80E-03

 

5.79E-04

5.42E—04

7.06E—04

4.07E—04

3.33E—04

 

3.15E—04

3.07E-04

4.53E-04

3.22E—04

3.30E-04

 

 

6.27E—05

5.67E—05

8.14E—05

5.03E—05

7.19E—05

 

2.67E-05

2.45E—05

2.99E—05

1.54E—05

1.62E-05

 

 

 

 

1.97E-06

1.90E—06

 

3.32E—06

 

 

9.15E—07

 

Table 3.6: Comparison of run-time in multiple time step interaction case

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

TFast: {P, K}

N8 03 N TDirect {1,1} {2’1} {4’2} {933}
8000 1.0 2 2.02 0.03 0.06 0.31 4.85
12000 1.0 2 4.69 0.04 0.10 0.49 7.68
32000 1.0 2 61.93 0.19 0.44 2.52 43.97
32000 2.0 3 34.31 0.17 0.32 1.67 28.49
64000 2.0 3 - 0.67 1.52 8.76 165.45

500000 2.0 4 - 27.89 55.47 294.06 -
1000000 1.0 2 - 45.52 83.37 433.34 -

 

 

 

Table 3.7: Error far for non-uniform geometry conﬁguration 1.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

N3 8000 12000 32000
levels 4 5 6
{P, K} Error far
{1,1} 3.81E—3 3.58E-3 3.23E—3
{2,1} 9.46E—4 8.62E—3 6.75E—4
{3,2} 1.83E-4 1.64E-4 1.14E—4
{4,2} 3.78E-5 3.43E—5 2.69E—5
{5,2} 1.62E—5 1.51E—5 1.32E—5
{6,3} 3.06E-6 2.59E—6 2.02E—6
{8,3} 1.18E-6 1.10E—7 9.46E—7
{9,3} 7.52E—7 7.27E-7 6.93E—7

 

 

 

71

 

Table 3.8: Comparison of run-time for non-uniform geometry conﬁguration 1.

 

 

 

 

 

 

 

 

 

 

TFast’ {P, K}

Na {1,1} {2,1} {4,2} {63}
8000 0.03 0.05 0.14 0.49
16000 0.05 0.09 0.32 1.03
32000 0.11 0.20 0.63 2.28
64000 0.25 0.43 1.42 4.8
250000 1.19 1.75 5.33 18.86
500000 2.61 3.58 10.86 40.08

1000000 6.11 7.8 23.25 79.63

 

 

 

 

 

Table 3.9: Comparison of run-time for non-uniform geometry conﬁguration 2.

 

 

 

 

 

 

 

 

TFasta {Pa K}
N3 {1,1} {2,1} {4,2} {63}
9600 0.03 0.05 0.19 0.51
38400 0.13 0.23 0.79 2.88
105000 0.4 0.72 2.35 8.42
450000 2.16 3.18 10.52 37.34
1000000 5.17 7.07 24.33 82.46

 

 

 

 

 

72

 

 

Chapter 4

Wideband FMM and Multiscale
Electromagnetic Solver in

Frequency Domain

This chapter addresses the development of a fast algorithm for electromagnetic sim-
ulation of multiscale structures in frequency domain. Section 4.1 provides a brief
review of the multiscale problem in electromagnetics and the limitation of the ex-
isting fast algorithms. Section 4.2 presents a general problem setting followed by a
brief exposition on the sub-wavelength breakdown of FMM algorithm. In Section
4.3, ACE algorithm is employed for fast evaluation of Helmholtz potential in sub-
wavelength scenarios; rigorous proofs are provided to establish the stability of these
expansions. Section 4.4 describes the details of the hybrid algorithm, combining ACE
and FMM, that is applicable to multiscale problems. Section 4.5 presents results on
error and timing of the proposed schemes to demonstrate their numerical stability and
efﬁciency. The hybrid algorithm was also integrated with an EM solver to analyze

scattering from electrically large structures.

73

4. 1 Introduction

Integral equation based methods are used extensively in scattering analysis. However,
it is well known that they produce dense matrices that increase the cost and limit the
size of the problem that can be solved. In past two decades, signiﬁcant research effort
has been dedicated to the development of efﬁcient and accurate techniques to amor-
tize this cost. These advances have had a widespread impact in variety of applications
ranging from scattering and radiation analysis to micro-electronic packaging; an in-
complete compendium of applications is presented in [88, 174]. The increased power
and availability of computational resources and acceleration schemes have enabled so-
lution of problems with very large number of unknowns, varying from few thousands
to few millions [37, 40]. Another class of problems arise when analyzing structures
which require a high local density of unknowns to capture geometric features. This
class of problems, hereafter, referred to as multiscale problems exhibit multiple scales
in frequency or length or both. For example, small length scale discretizations are
required to capture sharp geometric features that are embedded within large and
smooth geometries discretized at a coarser length scale. Similarly, multiple frequency
scales is vital to analysis and design of ultra Wideband (UWB) antennas embedded
in structures [17 5]. In general the characteristics of a multiscale problem is the con-
centration of large number of unknowns in electrically small domains. Akin to the
breakdown of time domain fast mathods for wave equation, existing techniques for
Helmholtz equations also face limitations when applied to multiscale problems as their
cost scaling is poor [46, 42] and mixed discretization also lead to badly-conditioned
matrix systems [176, 177, 178, 179, 175}. The development of a fast algorithm that is
stable and efﬁcient for multiscale structures is addressed in this chapter. The latter
problem of badly-conditioned system is remedied with the use of an alternate integral

equation formulation, whose development is detailed in Appendix A.

74

919 I:

app:
PIC“!
less
see.
he

CU;

As mentioned in Chapter 1, FMM has become an indispensable for large scale
electromagnetic analysis [174, 34]. However, FMM becomes numerically unstable
and inefﬁcient when applied to multiscale problems [42]. This is a consequence of the
fact that Helmholtz FMM does not smoothly transition to Laplace FMM as frequency
tends to zero. This was ﬁrst remedied by introducing a suitable scaling factor [46] that
ensures the computed quantities are stable and the transition is smooth. However
this approach is not suitable for problems with multiple length scales. An alternative
approach based on spectral representation of free space Green’s function in terms of
propagating and evanescent plane waves was proposed in [42] . This approach seam—
lessly transitions from high to low frequency kernel for both spatial and frequency
scaling [50], but it requires the evaluation of an inﬁnite integral in k—space; general-
ized Gaussian quadratures [50] and other approaches based on contour integration in

complex plane [84, 49, 180] have been explored for this purpose.

The main contributions of this chapter are,

a development of a low frequency fast method based on Accelerated Cartesian

Expansion (ACE) algorithm

0 development of a hybrid scheme by combining ACE with FMM for multiscale

problems
0 derivation of convergence proofs and bounds

o integrating the hybrid scheme with integral (equation solvers and demonstrate

its application to practical problems.

Though the overall structure of the hybrid algorithm developed here bears some

similarity with [50], it should be emphasized that they are two different algorithms.

75

4.2

surfa
l. I
cum

lllSlC

(CF

be!
1hr

4.2 Integral Equation and FMM for Helmholtz

Equations

Let S denote the surface of a closed PEC object that resides in free-space. This
surface is excited by a plane wave characterized by {E2 (r), Hi (r)} with wavelength
A. The scattered ﬁelds are denoted by {E3 (r), H3 (r)} and are radiated by equivalent
currents J (r) on the surface S. Let 8‘ denote a surface that is conformal and just
inside 5 and let Et(r) = E3(r) + E’(r) and Ht(r) = H3(r) + Hi(r) denote the
total electric and magnetic ﬁelds, respectively. The combined ﬁeld integral equation
(CFIE) formulation for solution of J (r) is,

as x a x Et(r) + (1 — a)ﬁ x Ht(r) = 0 Vr e 5* (4.1)

where fl is the outward pointing normal and a is an arbitrary scalar constant chosen
between 0 and 1. The scattered electric and magnetic ﬁelds are related to J (r) through

the dyadic Green’s function,

" x n X E3(r) = £e{J(r)} (4.2)
= ﬁxﬁx/SdsGdnr) J(r)
n x H8 (r) = ICm{J(r)} (4.3)
. 1 = I I
= an—n; Sdst [Gn(r,r)-J(r)]
ﬁnk, r') = —-jmy (T + Z—ZV) g(r, 1") (4.4)
e— 'nlr—r’]
g(nr') = Vii—If] (4.5)

In above relations It is the wavenumber, 77 is the characteristic impedance of free space

and I: is. the identity dyad. The CFIE formulation is chosen to eliminate spurious

76

4‘

solutions

current 1

 

l'sing Ga

where.

is is
term:

he re

ah
the
(lit
TEX
1h
i0
tc

tl

solutions corresponding to interior resonance problem. As is normally done, the
current J (r) is represented using a space of local vector basis functions fm(r) [181].

Using Galerkin testing results in a system of equations that may be expressed as

ZI = V (4.6)
where,
an = (010'), —a£e{fm(r)} + (1 - a)’Cm{fm(r)}> (4-7)
(as), seams)» = —jm<fn<r).gfm<r)> + €<v - fm(r),gv - rum) (4.8)
<fn(r), ’Cm{fm(r)}) = (fn(r) X 1‘1, V9 >< fm(r)> (4-9)

12,, = (1,,(r), an x a x Ei(r) + (1 — a)ﬁ x Him) (4.10)

As is evident from these equations, the evaluation of each element may be recast in
terms of evaluation of scalar potentials. Thus, to better analyze the problem, it can
be reposed as follows. Find the potential 111(r) due to a set of N sources

N e—jKIr—ril

W) = 2

i=1

[1' _ ril wnw, (4.11)
where 111,, and w,- represent the appropriate testing and source strengths, respectively,
that include numerical quadrature weights and other constants. It is evident that a
direct evaluation of potential at N observation points yields an 0(N2) method. FMM
reduces this cost to 0(N log N) by utilizing spherical harmonic expansion [30, 65] of
the scalar Green’s function. Very brieﬂy, the classical FMM algorithm proceeds as
follows: the computational domain is embedded in a ﬁctitious cube that is then used
to construct an oct-tree. At the lowest level, interaction between the elements of boxes

that are in the nearﬁeld of each other is computed directly. All other interactions

77

are mmpd

 

that read]

,
operator 1
the ﬁeld 1
a single].
\llile 111
bar um}

that are

“he:

Elma

are computed using a three stage algorithm: (i) compute multipoles M from sources
that reside in each box; (ii) convert these to local expansion L, using the translation
operator T , at all boxes that are in its far ﬁeld; (iii) from the local expansions compute
the ﬁeld at observer points within the box. This simple three stage scheme is called
a single-level algorithm and sufﬁces to discuss the limitation of these expansions.
While multilevel variants of this scheme exist [35, 69], the limitations of FMM are
best understood by examining (6.8). Consider K closely spaced sources located at r,-

that are well-separated from the testing point r,

W) = if?” d2RM(r3,k)T(ro—r3,k)wne—jk'(r°_r) (4.123)
= '73? [dzkwne—jk'(r0‘r)£(ro,k) (4.12b)
K .

M(r3,k) = Zwie—Jk'(r3-ri) (4.120)
i=1

T(r, k) = Z(—1)’(21 + 1)hl(2)(n|r|)Pl(k . f) (4.12d)
l=0

£(r0,k) = M(rs,k)T(ro—r3,k) (4.12e)

where [r0 — r3] > 2d, |r — r0] < d, k = 5k, r3 (r0) is the center of multipole (local)
expansion for source (observation) cluster, ’1' is the translation operator, hp and P]
denotes an order I spherical Hankel function of second kind and Legendre polynomial,

respectively.

4.2.1 Sub-wavelength breakdown of Helmholtz FMM

In FMM the interaction between source and observation clusters is evaluated using
the translation operator T in (6.8d) that contains a spherical Hankel function. The
singular behaviour of spherical Hankel function implies restrictions on the size of its

argument nlrl. For numerical stability, neither the translation distance [r] nor the

78

wavenuzz.
has beer;
submar-
or order

high den]

 

the owl

problem:

4.3

111.10

form (

w}

wavenumber K. can be arbitrarily small [73, 76]. Limitations on these parameters
has been reported in detail in [74]. As a result, FMM is inefficient when applied to
sub-wavelength problems where the principal dimension of the domain is less than
or order of a wavelength only. In such problems, some of the leaf boxes have a very
high density of unknowns and the overall computational complexity is dominated by
the nearﬁeld cost. Thus FMM algorithm is inefﬁcient when applied to multiscale

problems where the discretization rate is either non-uniform or uniformly dense.

4.3 ACE translation operator for Helmholtz po-

tential

In ACE, the translation operator is the only kernel dependent quantity. The analytical

form of the translation function in case of Helmholtz potential can be written as,

 

n n
e_,.,,R 1111 1121 12331
Vn R (n1,n2,n3) = Z (_1)n+mR2m—2n—1
m1=0m2=0m3=
n1 n2 n3 (4-13)
X
m1 m2 m3

$n1—2m1yn2-2m22n3—2m3g(n _ m, KR)
where

g(n,,.11) = Whom)(Humanism)

n n!

= 2mm!(n — 2m)!

 

m

In above expressions, n = n1 + n2 + n3, m = m1 + mg + m3, Kn(-) represents the

modiﬁed Hankel function of order n, R = |r| and [-j is the floor operation. It is well

79

1.110111
size 0:
traml

for m

known that above Taylor’s series expansion is convergent when either the domain
size or frequency is small. Consider the limiting case of low frequencies or small
translation distance i.e. KR —> 0. Using the small argument asymptotic expansion
for modiﬁed Hankel function [59] and comparing with the translation operator for

1 / R potential given in [13], (6.14) can be written as,

P 0.5 n+0.5
gwm z ¢2/«<1«R><“+°5>(—”;’—l (TIER)

for 0<x<2n+1

 

z (271. + 1)!! (4.14)
lim Vie—M? — V"1 (4 15)
AIR—)0 R _ R .

where (n)!! denotes a double factorial. Above relation shows that at low frequen-
cies ACE translation operator for Helmholtz kernel tends to that of Laplace kernel
[13]. Next, the relation between spherical and Cartesian expansions for low-frequency
Helmholtz kernel is derived. Consider the spherical expansion of Helmholtz potential

[65]
cjn|X+d|—

IX +dl -J'~Z(- 1)( (21+1)1'1(red)h[2 )(KX)P)(d X) (4.16)

=0

where d and X are the location of source and observation points respectively, d = |d|,
= IX] and d << X. The following relation between Legendre polynomial and

Cartesian tensors is well known [182, 13]

A

A 1 . A
Pn(X . d) = a(100.11.an (4.17)
where D; is the detracer Operator, using this (4.16) can be written as,

n|X+d|—
f———l;+=d' —jnZ$‘—l—1)(2z+1)j(nd)h(zl(nX) (11(1) 1. 19,14) (4.18)

80

Replacini

above 113'.

 

 

Above 11
ately 3c:

and low

1161‘
to 1

I63

Replacing spherical Bessel function 3'; with its small argument approximation reduces

above relation to a form similar to that prescribed in ACE (6.14) ,

:1,“

j)(a:) % —(2l+1)!! for smalls: ' (4.19)
1),)? = (—1)‘X’+1V’x—1 (4.20)
e—jan+dl °° —1l —1’+1j 2 _
W m: Tao) .1. W11] )(nX)(nX)l+1VlX 1 (4.21)
(:0 . ..

Above relation shows that the translation operator of ACE algorithm is an appr0pri-
ately scaled version of FMM’s such that it is stable for small translation distances

and low frequencies.

4.4 Hybrid algorithm for multiscale problems

Though ACE algorithm is efficient for sub-wavelength problems, Cartesian expan-
sions in (6.14) breakdown when applied to problems with high-frequency or domains
spanning multiple wavelengths. Numerically it was observed that ACE algorithm is
efﬁcient when domain size is conﬁned to 2/\. Thus, the features of ACE algorithm are
complementary to that of FMM, i.e. , ACE algorithm is stable for sub-wavelength
(low-frequency) problems where FMM breakdown, whereas FMM algorithm is stable
and efficient for large-wavelength problems where ACE algorithm breakdown. In mul-
tiscale problems both sub- and large-wavelength problems exists simultaneously and
neither of the algorithms will be efﬁcient individually. Consequently, it follows that
to eﬁciently analyze these structures, it is necessary to hybridize both algorithms to

reap their respective advantages while not inheriting their disadvantages.

81

4.4.1 Combining ACE and FMM

The transition from ACE to FMM is readily realized by using the Taylor series ex-

pansion of the FMM multipoles. Examination of (6.8a) leads to

N 00
M(rs, k) = Zwi 2(1); — r2.)(p) .p . Vpe_jk'(r3’rAl
i=1 p=0
= Z MWO‘A) ~19 - Tg’lﬁk, rs — rA) (4.22)

12:0

where r A denotes the center of ACE multipole expansion MU’) and the mapping

operator Tmap is given as,

1.1.2.4, r)(n1, n2, m.) = 331332633841...
_ (4.23)
= (_j)nngl K32 KZBe-Jk-r
In above expressions n = n1 + n2 + n3 and k = nzi + my}? + 1222. A similar derivation
follows for computation of ACE local expansion from FMM’s. Consider the evaluation

of potential using FMM local expansion (6.8b),

111(1) -_- ano-A — 1)(?) . p . L(p)(rA) (4.24)
p=0
where
Loom) = $417? [1121211344, r0 — r A)£(k, r0) (4.25)

Notice that the same mapping operator Tmap is used for ACE-to-FMM multipole
and FMM-to—ACE local expansion translations. In addition the translation of local

expansions requires the evaluation of spectral integral.

Truncation of Taylor series expansion in (6.15) and (6.16) to P terms introduces

mapping error. Let Re{y} denote the real part of y, the error in real part of FMM

82

multipoles computed from Pth order ACE harmonics can be written as,

 

 

 

P
1 .
5%,”, = Re{M} — Re 2 7(rA — r)(p) -p - Vpe_3k'(r3—rA) (4.26)
19:0 p'
00 1 (P) p
= Z —'(rA—r) -p-V cos(—k-(rs—rA))
p=P+1p'
°° 1
5 Z —' [(rA — r)(p) -p - Vpcos(—k- (r3 — rA))]
p=P+1p'
00
< — an 7’
p2;;1Iﬂ( )
where a = ma2:( [r A — r|). For P 2 2 above error can be written as
00 —2
R 9 3p 1»
Emap S 2 -—.—.———(arc) (4.27)
p=P+1 23 4 5...p
00
32 3 p
< — — P .
_ Z 3 (4) (an) (4 28)
p=P+1

Above geometric series converges when 3am / 4 < 1 and the mapping error can be

P+1
5R .3“_" _L. (4.29)
m“? - 4 1 -— 3414/4

written as,

Above bound shows that the mapping error €71,210], decreases with increasing number Of

ACE harmonics P or with decreasing size of ACE domain d. Now consider the error

in imaginary part of FMM multipoles computed from Pth order ACE harmonics,

 

P
44,... = Im{M}—Im 2%(44—1100)-p-vpe—jk'<rs-'A> (4.30)
p=0p'
oo
1
s 2 —,[4.4—oi”)-p-vpsin(—k~(rs—r4))l
p=P+lp'
= 5%mo

83

Since both real and imaginary parts satisfy same error bound, the Tmap Operator
preserves both the amplitude and phase of FMM multipoles to desired precision. The
error bound for FMM-to—ACE local expansion translation operation is identical to
ACE-to-FMM multipole translation as they use the same mapping operator Tmap.
Furthermore, let d be the side length of the cube where ACE harmonics are deﬁned.
Then in (4.29), a = \/3d and for convergent error d S /\ / 2.65. This limit is within the
range [0.2, 2.0]A where both ACE and FMM expansions are stable, hence the error in

transition from ACE to FMM and vice versa can be controlled to arbitrary accuracy.

4.4.2 Implementation details

As shown in ﬁgure 4.1 the multiscale geometry is mapped onto a non-uniform oct-
tree. This ensures that the number of unknowns per leaf-level box is approximately
same [50, 61]. In ﬁgure 4.1(b), the dark and light nodes indicate ACE and FMM
computational domains respectively. This classiﬁcation of tree-nodes is based on the
size of the domain they represent and introduces a transition level such that nodes at
and above this level are of FMM type and nodes below this level are of ACE type. The
hierarchical tree code computation starts with the evaluation of appropriate multipole
expansion at leaf boxes. During upward tree traversal the parent multipoles are
computed from their children multipoles using the multipole-to-multipole translation
operator. At transition level alone the parent box FMM multipoles are computed
from their children ACE multipoles using the mapping operator in (4.23). Next the
appropriate multipole-to—local expansion translation operation is performed. Then
the children local expansions are updated with their parent local expansion using
the local-to-local expansion translation operator. Again at the transition level, the
child box ACE expansion is updated with its parent box FMM local expansion using
the mapping operation in (6.16). Finally the local expansion coefficients at leaf-

level boxes are used to compute the farﬁeld potential at their respective observation

84

 

points. .-

contrihzd

4.5

 

 

 

This sec]
hybrid 5’
to erahr
SCheme

algor'rth

4.51

is r
poi
cor
Op
rel

oi

points. As in all tree algorithms the complete potential is obtained after the near-ﬁeld

contributions are accounted through direct evaluation.

4.5 Results

This section presents plethora of results that exhibit the accuracy and efficiency of the
hybrid scheme when applied to multiscale problems. First few set of results pertain
to evaluation of Helmholtz potential given a set of random points. Later the hybrid
scheme is integrated with an existing solver and it effectiveness, over FMM-only

algorithm, is shown for several problems.

4.5.1 Helmholtz potential evaluation

First, the accuracy and stability of of ACE-only algorithm when applied to sub-
wavelength problems is demonstrated. Consider the evaluation of Helmholtz potential
at N source/ observation pairs that are randomly distributed within a domain of size
A/ 2. The error incurred in computing only the far-ﬁeld potentials using ACE are
listed in Table 4.1. As is evident, the error decreases uniformly with increase in the
number of harmonics. Note, that the error presented here does not include nearﬁeld
contributions; in general the total error including the nearﬁeld contribution, that are
computed exactly, is less by two orders of magnitude.

Next, the convergence of the mapping operators ACE to FMM (FMM to ACE
is reciprocal) prescribed in (4.23) is demonstrated. Given some arbitrary number of
points conﬁned within a domain of size A, the FMM multipoles for a given box size are
computed both directly and from their children’s ACE harmonics using the mapping
operator Tmap. Let 2d be the side-length of the FMM box, Table 4.2 shows the
relative error in computation of FMM multipoles for various values of d and number

of Cartesian harmonics P used in mapping. As expected the mapping error uniformly

85

 

decrees
To d
hiring .
random
8.000.001
‘2) to 1‘3
stnrcted
oi doma
points is
cordigur
the cen'
average
ACE 211
This we
domly g
of fan“)
Pentlur
Li M .11
dlSirlbr
figure
dislrlbr
UllilOm
Scheme
terill0n
dislribu
the loo

to him 1

decreases to double precision as P increases or as d decreases.

To demonstrate the efﬁciency of the proposed algorithm, the hybrid scheme com-
bining ACE and FMM is applied to both uniformly and non-uniformly distributed
random points. In both cases the number of unknowns N is varied from 64,000 to
8,000,000. In case of uniform distributions the size of the domain is increased from
2) to 12A as the number of unknowns is increased. The non-uniform geometry is con-
structed of three overlapping thin disks, as shown in ﬁgure 4.1(a), and the overall size
of domain was ﬁxed at 12A. In each disk the points were distributed so that density of
points is inversely proportional to radius and linear in z—axis. Note that this geometry
conﬁguration closely resembles a multiscale scenario as the discretization rate, near
the centers of disk, can be as high as x\/ 1000. In all cases it was ensured that the
average number of points per leaf-level box is approximately 64 and the number of
ACE and FMM harmonics were chosen so as to maintain an accuracy of 0(10‘4).
This was veriﬁed by performing direct computation on few, typically 50 to 100, ran-
domly selected points. Table 4.3 shows the time taken, in seconds, for computation
of far-ﬁeld potential using the hybrid algorithm on a desktop computer with 2.3 GHz
Pentium IV processor 4GB RAM running Linux. In uniform distribution LACE and
L F M M denotes the number of ACE and FMM levels respectively. In non-uniform
distribution LACE and L F M M were constant at 5 and 3 respectively for all cases.
Figure 4.2 shows the Log-Log plot of time vs. N for both uniform and non-uniform
distributions. The linear line ﬁt with slope one, indicated inside the ﬁgure, for both
uniform and non-uniform case shows that the cost scaling of the proposed hybrid
scheme is irrespective of how the points are distributed. It is important to draw at-
tention to the overlapping linear line ﬁts corresponding to uniform and non-uniform
distribution of points. This, in particular, highlights the fact that the time taken by
the hybrid algorithm depends purely on the number of unknowns N without regard

to how the points are distributed.

86

4.5.2 Multiscale scattering problems

Next, the performance of integral equation solver augmented with ACE+FMM with
that augmented with only FMM. The solvers employs RWG basis ftmctions for surface
currents J(r). GMRES iterative solver with a restart value of 30 was used with

tolerance and maximum number of iteration ﬁxed at 1E—3 and 1000, respectively.

In these numerical experiments geometries with different overall size and number
of discretizations were considered along with different excitation frequencies. In rest
of the results FMM harmonics were used when the box size is greater than or equal
to A / 4 and ACE harmonics for rest of the domains. For each conﬁguration, of chosen
geometry and frequency, the CFIE solver was executed in two modes (i) FMM-only:
where the leaf-level box size is ﬁxed at A/ 4 to ensure that only FMM harmonics are
utilized (ii) ACE+FMM: the non-uniform tree is constructed such that the average
number of unknowns per leaf-level box was approximately 10 to 20. Note that in
ACE+FMM runs the smallest domain size can be as small as /\/40. In all cases
FMM harmonics were used when box size was equal to or greater than /\ / 4 and ACE
harmonics for rest of the domains - this deﬁnes the transition level in the hybrid algo-
rithm. The ACE and FMM harmonics were chosen such that they yield an accuracy
of 0(1E — 3). The maximum run time for each simulation was limited to 6 days and
any unﬁnished data is denoted by *. The following values are reported in table for
each simulation: near-time and solve-time are the time, in seconds, for computation
of sparse near-ﬁeld matrix and iterative solution respectively, speed-up is the ratio of
total time spent by the solver using FMM-only and ACE+FMM algorithms, Avg/boa:
and M an: denotes the average and maximum number of source / observer pairs per leaf-
level box respectively. The ratio of maximum to minimum edge length serves as good
measure of the multiscale nature of the problem as it is close to one for uniformly

discretized geometries and high for discretization with multiple length scales.

87

First multiscale problem considered here is the cone-sphere geometry. Here the
cone’s tip is densely discretized in comparison to the smooth sphere part of the
structure. Table 4.4 shows the results of solver using FMM-only and ACE+FMM
algorithm when applied to three different cases. The geometry ﬁts within a cuboid
with aspect ratio 1 : 1 : 5 and the maximum dimension for each case is given in
terms of the incident wavelength in table 4.4. Let 2 be the axis of rotation of cone-
sphere. The propagation direction of incident plane wave E = ie‘j’df'r was k = g
for ﬁrst two runs and k = 2 for Run 3, as shown in ﬁgures 4.4 and 4.3. As is evident
the solver with hybrid scheme offers speed-up as high as 7 times over that using
FMM-only algorithm. Essentially this speed-up is achieved by reducing the number
of near-ﬁeld interactions as indicated by the near-time in table. This is due to the
fact that ACE+FMM algorithm allows domain size to be as small as A/ 40 which in
turn reduces the average number of unknowns per box considerably in comparision to
FMM-only case. As expected, the speed-up offered by ACE+FMM algorithm reduces
as the problem size increases as most interactions fall under FMM; only few number
of interactions exist in sub-wavelength domains and ACE algorithm does not offer
much advantage over their direct computation. Figure 4.3 and 4.4 shows the bi-static
RCS corresponding to Run 3 and Run 2 in table 4.4. The RCS computed using both

solvers exhibit excellent match to given order of accuracy.

Table 4.5 shows results from second multiscale problem - NASA almond. The
entire structure ﬁts within a cuboid with aspect ratio 1 : 6 : 4 and the maximum
dimension is given in terms of incident wavelength for each case in the table 4.5. In
all cases the direction of incident plane polarized along 2?: was k = 2 as shown in ﬁgure
4.5. Three different meshes were considered with the number of unknowns varying
from 62,000 to 250,000, the increasing multiscale nature of the problem is indicated
by the max/ min edge length. Here again the hybrid scheme offers speed-up as high as
7 times over FMM-only approach. Notice that in Run 3 the solver with ACE+FMM

88

has completed its run while the large number of near-ﬁeld interactions in FMM-only
consumes almost the entire computational time. Figure 4.5 shows the bi-static RCS
computed using both the solvers for Run 2 in table and they agree with each other.

Table 4.6 shows results from third multiscale problem which is a toy-aircraft ge-
ometry with many sharp features. The structure ﬁts within a cuboid with aspect ratio
3 : 1.5 : 1 and the maximum dimension in terms of incident wavelength is reported in
table. In all cases the direction of incident plane wave polarized along :5 was it = 2
as shown in ﬁgure 4.6. The number of miknowns was varied from few thousands to
millions as the maximum dimension was increased from 1.5 to 20 A. The maximum
to minimum edge length ratio for this geometry was apprOximately 20 in all cases,
indicative of a uniformly dense discretization. The solver with ACE+FMM exhibits
speed-up as high as 14 times over the solver with FMM-only algorithm. Notice that
in Run 3 with 1.7 million unknowns, the large number of unknowns per box, indi-
cated by average and maximum unknowns per box in table, in FMM-only case results
in large number of near-ﬁeld interactions which consumes the entire computational
time. In comparision, with smaller domains in ACE+FMM algorithm the number of
unknowns per box is considerably smaller and entire computation is completed within
the limited time. Figure 4.6 shows the bi-static RCS corresponding to Run 2 in table
4.6.

89

 

 

(b)

Figure 4.1: An example non-uniform (a) point distribution and (b) its tree represen-
tation.

Table 4.1: Error convergence of ACE algorithm with random points within a A / 2 size

dome

Lin

 

P

1

3

5

7

9

l2

 

Error

 

 

1.172E—2

 

4.804E—4

 

3.936E—5

 

1.236E—5

 

4.380E—6

 

7.029E—7

 

90

 

st.T (Error0(1.053))

       

 

 

3 ' 121.
0 Uniform; y- 1.0011 -4.22
2.5: I Non-Unifonn; y- 1.001: -4.18 1°“
w
2 ‘ 811
E 1 5
O
_J
1 1
0.5 ‘
2A
0 . . . . . .
4 4.5 5 5.5 6 6.5 7 7.5
Log (N)

Figure 4.2: Time vs. no. of unknown in log-log plot when hybrid scheme is applied
to uniform and non-uniform (ﬁg 4.1) geometries.

 

 

_FMM+ACE ‘
.___FMM-only
o .
g -5 > Ex
.5 . ~-
8 \A
M.

 

1

60

80 .100 120 140 160 180
Thetamdeg.

 

Figure 4.3: Bi—static RCS of cone-sphere geometry, corresponding to Run 3 in table
4.4. Inset ﬁgure shows the incident excitation and magnitude of surface current.

91

 

RCS in (if)

F i811MB
to RM
Surfaﬂr

 

 

20-

   

 

 

m
u
.5 1o .
(I)
0
m I
— ACE+FMM
— - FMM-only
~10 - q
.200 -100 100 200

0
Theta in dog

Figure 4.4: Bi-static RCS of cone-sphere geometry, corresponding to Run 2 in table
4.4. Inset ﬁgure shows the incident excitation and magnitude of surface current.

 

20 — ACE+FMM

 

 

— — — FMM-only

 

RCSindB

 

 

 

-150 -100 -50 0_ 50 100 150
Theta 111 deg.

Figure 4.5: Bi-static RCS of NASA fat almond (multiscale geometry 2) corresponding
to Run 2 in table 4.5. Inset ﬁgure shows the incident excitation and magnitude of
surface current

92

 

 

 

 

 

“ACE+FMM
---FMM-only
m-
“ ’
5 I
. 10»- ’
at? "..‘s I
.E I
(I) _, ..
O 1
a:
I
I
1200 450 400 -so 0 so 100 150 200

Theta in dog.

Figure 4.6: Bi-static RCS of Toy-aircraft geometry (multiscale geometry 3) corre-
sponding to Run 1 in table 4.6. Inset ﬁgure shows the incident excitation and mag-
nitude of surface current

Table 4.2: Error in FMM multipoles computed from ACE multipoles using Tmap in
(4.23)

 

 

 

 

 

 

 

 

 

 

ACE harmonics
d P=3 P=6 P=9 P=12
0.5 2.13 5.58E-3 9.62E—6 5.90E—09
0.25 2.58E—2 8.04E-6 1.51E—9 1.27E—13
0.125 3.49E—4 1.30E—8 1.55E—13 2.24E—15
0.0625 1.04E—5 5.34E—11 1.41E—15 1.41E—15

 

 

 

93

11th 4
tries

 

 

 

 

Table 4.3: Time for hybrid algorithm as applied to uniform and non-uniform geome-
tries

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Uniform Non-uniform

N Size(A) Time LACE L F M M Time
64,000 2 3.77 3 5 7.48
125,000 2 6.29 3 5 8.13
250,000 2 14.39 3 5 13.93
500,000 4 34.57 3 6 35.71
1,000,000 4 68.7 3 6 79.53

2,000,000 4 125.26 3 6 135.43
4,000,000 8 310.22 3 7 263.7

8,000,000 10 588.94 3 7 484.02

 

 

 

 

 

 

 

Table 4.4: Multiscale problem 1 : Cone-sphere geometry

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Near-Time Solve—Time Speed-up Avg/ box Max

Run 1 800 MHZ, Size ft: 2A with 19,000 basis
ACE+F MM 832.47 449.87 6.21 14 2,288
FMM 7843.59 117.87 1,329 3,424

Run 2 76 MHz, Size n 7A with 19,000 basis
ACE+FMM 593.56 344.96 1.39 17 1,666
FMM 1028 277.13 100 1,926

Run 3 10 GHz, Size z 21A with 72,000 basis
ACE+FMM 1100.28 1107.1 0.79 3 2,084
FMM 1440.7 300.28 41 2,224

 

 

 

 

 

 

 

 

94

Table 4.5: Multiscale problem 2: Almond

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Near-Time Solve-Time Speed-up Avg/ box Max
Run 1 1 GHz, Size = 5A with 62,550 basis
Max/min edge len.= 160.34
ACE+FMM 1063.88 795.31 7.71 5 270
FMM-only 13532 800 4 242 6,118
Run 2 1.5 GHz , Size = 8A with 107,400 basis
Max/min edge len.= 193.42
ACE+FMM 1569.43 6713.38 2.66 4 256
FMM-only 20548.77 1475.01 175 5,464
Run 3 2 GHz , Size = 10.6A with 269,100 basis
Max/min edge len.= 474.61
ACE+FMM 22247.25 26428.42 2: 11 2180
FMM-only 96297.28 * 265 13,110
Table 4.6: Multiscale problem 3: Toy-aircraft
N ear-Time Solve-Time Speed-up Avg/ box Max
Run 1 76.2 MHz , Size = 1.53A with 9,727 unknowns
ACE+FMM 196.15 286.07 14.29 7 36
FMM-only 6491.35 400.66 2,784 5,178
Run 2 300 MHz, Size = 6.06A with 26,145 unknowns
ACE+FMM 62.54 1343.4 6.4 2 12
FMM-only 7800.79 1203.73 697 2,646
Run 3 1 GHz, Size = 20.27A with 1,754,814 unknowns
ACE+FMM 237767.37 58500.89 * 98 1,174
FMM-only * * 691 4,508

 

 

 

 

 

 

 

95

 

 

Ch

 

A);
BC.

This
lead
sub:
tion
dev
tears
a C1
cor.

0ft

Chapter 5

A Well Conditioned Formulation of
Augmented Electric Field Integral
Equation (AEFIE)

This chapter addresses the develOpment of integral equation (IE) formulations that
lead to well conditioned systems of equations. Typically iterative solvers, like Krylov-
subspace solvers, are used for solution of large systems of equation and well condi-
tioned systems of equation require fewer number iterations for solution. Thus the
developments presented here are complementary to the discussions in previous chap-
ters where the focus was on reducing the cost of a single iteration. Section 5.1 provides
a concise account of the recent research work on the theory and development of well
conditioned formulation for electromagnetic simulations. Section 5.2 introduces some
of the analysis tools and the insights they provide in understanding the IE operators
of EM. Section 5.3 introduces a new formulation of the augmented electric ﬁeld IE
(AEFIE) that leads to both better conditioned systems of equation and unique so-
lutions at all frequencies. The new formulation is ﬁrst developed for 2D and then

extended to 3D case with appropriate modiﬁcations. Section 5.4 presents plethora of

96

result:

5.1

Comp).
oomerl

he prir

 

of com
louse;
have 8
Int
‘llatio;
electr.
detail
linear
that 1
depe)
trie ﬂ
and (
and 1
[184.
f017m
r€480
the e:
HlﬁCa
11011-5,

Only 5

results that exhibit the well-conditioned nature of the new formulation.

5. 1 Introduction

Computational electromagnetic (CEM) is the ﬁeld of research that concerns with
numerical solution of Maxwell’s equation. The rapid development in this ﬁeld can
be primarily attributed to the simultaneous development in power and availability
of computers along with the advancements in mathematical research. This chapter
focuses on the latter aspect, the mathematical developments in the past decade that
have considerably altered the landscape of CEM research.

Integral equations (IE) is one of the widely adopted numerical techniques for sim-
ulation of electromagnetic problems [183]. The distinct advantage of IE approach for
electromagnetic simulations over their differential equation counterparts have been
detailed in the previous chapters. Typically, the IE formulations result in a set of
linear system Of equation, which are solved using an iterative solver. It is well known
that the convergence rate of an iterative solver, the number of iterations for solution,
depends directly on the condition number of the numerical system of equation. Elec—
tric ﬁeld IE (EFIE) is one the most widely used formulation as it is valid for both Open
and closed problems. It is well known that EFIE is an integral equation of ﬁrst kind
and the condition number of these numerical system is not assured to be bounded
[184, 185]. Further, EFIE also suffers from the low-frequency breakdown where the
formulation is inherently ill-conditioned for low excitation frequencies. The physical
reasons for this breakdown of EFIE at low-frequencies is well known [17 6]. Consider
the electric ﬁeld produced by an arbitrary electric current source, there exists a sig-
niﬁcant disparity in the magnitude of electric ﬁeld produced by the solenoidal and
non-solenoidal part of the current source. Thus, using the electric ﬁeld equations,

only some parts of the source can be computed in a stable and robust fashion. It

97

 

 

0V8!

quir
err.
118
ite
co
tic
to
£11
f0
31

can be analytically shown that the disparity increases as the frequency tends to zero
and this is known as the low-frequency breakdown of EFIE. Several computational
strategies, based on quasi-Helmholtz decomposition of surface electric currents, have
been proposed to overcome this limitation of EFIE. Loop-star, loop-tree and tree/co-
tree are different forms of the same type of solution approach [179, 178, 87, 186].
Apart from this, recent mathematical analysis of boundary integral equations in EM
have lead to the development of analytical preconditioners that modiﬁes the EFIE
into a well conditioned, second kind integral equations [187, 188, 189]. However, the
resulting formulation suffers from the interior resonance problem and hence produce
non-unique solution at resonance frequencies. Conventional techniques, like combined
ﬁeld IE (CFIE), also fail when applied to these new formulations. Further, the nu-
merical implementation of these modiﬁed formulation demands careful considerations

that has been the focus of several recent research works [190, 191, 192, 193].

This work explores the development of a new IE formulation for electromagnetic
simulation that is both well-conditioned and resonance free. This formulation is based
on the augmented ﬁeld integral equations (AFIE), which were initially proposed to
overcome the interior resonance problem [194]. AFIE, as originally proposed, re-
quires the solution of an over-determined system of equations using a least-square-
error approach. In this work, both the electric charges and currents are considered
as unknowns and the resulting new AFIE formulation is amenable to conventional
iterative solvers. However, this requires that the imposition of continuity and charge
conservation conditions separately. Based on operator theory analysis, these addi-
tional constraints are imposed in a manner such that the operators in the resulting
formulation are bounded and compact; leading to well—conditioned systems of equa-
tions. The validity of these formulation is shown both analytically and numerically
for 2D problems. Since some of the observations in 2D case does not hold good for

3D problems, the relevant modiﬁcations are also discussed here.

98

5.2

 

let S 1:
surface
it. The
current

{or sold

let S

lntegr

Ina

elec‘

In

Sp.

5.2 Preliminaries

Let S denote the surface of a closed PEC object that resides in free-space. This
surface is excited by a plane wave characterized by {Ez (r), Hi(r)} with wavelength
A. The scattered ﬁelds are denoted by {E3 (r), H3(r)} and are radiated by equivalent
currents J (r) on the surface S. The electric ﬁeld integral equation (EFIE) formulation

for solution of J (r) is,
f1 X f1 x (Ez(r) + E3(r)) = 0 Vr E S (5.1)

Let S — denote a surface that is conformal and just inside S, then the magnetic ﬁeld

integral equation (MFIE) is written as
f1 x (H3(r) + Him) = 0 ‘v’r e s- (5.2)

In above equations, fl is the outward pointing normal on surface S. The scattered

electric and magnetic ﬁelds are related to J (r) through the dyadic Green’s fimction,

f1 x ii x E3 (r) = £t{J(r)} (5.3)
= ﬁxﬁxfgdan(r,r)-J(r) (5.4)
—f1 x H3(r) = ICt{J(r)} (5.5)
. 1 = I I

= nxj-TT7 Sdst [Gn(r,r)-J(r)]

E1403 r’ = —jm] (I + %) g(r,r’) (5.6)
I e—jn|r-r’|

g(r,r) = m (5.7)

In above relations K. = w/c is the wavenumber, w is the angular frequency, c is the

speed of light in free space, 77 is the characteristic impedance of free space and I is the

99

identity dyad. Since the above formulations are based on the boundary conditions
on tangential ﬁelds, the IE operators [It and IQ are referred to as tEFIE and tMFIE,
respectively. Typically, in numerical solutions, the current J (r) is represented using a
space of local vector basis functions fm (r) [181]. Galerkin testing results in a system

of equations that may be expressed as

21 = v (5.8)

For example, in case of tEFIE, the elements of Z and V can be written as

an = <fn(r),-£t{fm(r)}> (5.9)
(fn(r),£1{fm(r)}> = —jnn<fn(r),gfm(r)>+%<V-fm(r),gV-fn(r)> (5-10)
12,, = (fn(r),ﬁxﬁxE’(r)) (5.11)

5.2.1 Interior Resonance and Augmented IE

It is well known that, for closed geometries, both tEFIE and tMFIE Operators have
non-empty null space at excitation frequencies corresponding to interior resonance.
Thus, at these frequencies the solution of the equations (51,52) is not unique [195].
Combined ﬁeld IE (CFIE) is a popular alternative, where both EFIE and MFIE
are solved simultaneously. Other proven approaches to overcome interior resonance
problems are combined source IE (CSIE) [196] and dual surface IE [197]. All these
alternatives demand additional computation in one form or the other, for e.g. CFIE
requires computation of magnetic ﬁeld, and CSIE doubles the number of unknowns.
Augmented ﬁeld integral equations (AFIE), proposed by Yaghjian [194], is an alter-
native approach to overcome the interior resonance problem with the computation
of either electric or magnetic ﬁeld only. In AFIE, a unique solution to currents J (r)

is obtained by simultaneously satisfying both the tangential and normal boundary

100

 

 

 

condit]

 

 

Simllzt

 

lo the
IESpor
boom
All?
it is .
Syste
appr
tint-r
AFli

iSne

5.2

In ti
to st

elect

condition of electric or magnetic ﬁeld. Consider the. augmented EFIE (AEFIE)

mi} = —ﬁ x a x 132(1) Vr e s (5.12)

5,,{J} = 50?- — a - Ei(r) ‘v’r e s (5.13)

5,,{1} = a . / 1135,01, r’) -J(r') (5.14)
S

Similarly, the augmented MFIE (AMFIE) can be written as,

ICt{J} = J — a x Hi(r) Vr e s (5.15)

1c..{J} = —ﬁ - Hi(r) Vr e s (5.16)
. 1 = I I

1cm} = n - 1'77 SdsV x [0,,(r,r ) -J(r )] (5.17)

In the above equations Ln and [Cu are the nEFIE and nMFIE operators that cor-
respond to the boundary condition on electric and magnetic ﬁelds normal to the
boundary surface S, respectively. It has been rigorously shown that both AEFIE and
AMFIE produce unique solutions for any closed geometry except spheres. Moreover,
it is evident that discretization of both the formulations lead to an overdetermined
system of equations, which can be solved only in a least squares sense. Hence this
approach has been relatively less popular when compared to CFIE or CSIE. A dis-
tinct feature of AFIE, also noted in the seminal work of Yaghjian [194], is that the
AFIE formulation is similar to a second kind IE. However, exploiting this advantage

is not trivial and forms the main focus of the work presented here.

5.2.2 Operator and Eigenvalue Analysis

In the past decade, rigorous mathematical analysis techniques have been employed
to study the different boundary integral equation formulations used in computational

electromagnetics. These theoretical analysis of boundary IE operators depend on the

101

form and size of the computational geometry and they are restricted to the study of
canonical geometries such as circular cylinder and sphere in 2D and 3D, respectively.
Though analytical solution exist for these geometries, the theoretical analysis have
provided valuable insights that led to a better understanding of the behaviour of these
IE Operators. It is well known that an IE Operator with ﬁnite and bounded spectrum,
when discretized, yields a well-conditioned system of equations [185, 191]. In formal
terms, a formulation is well-conditioned if all its operators take the form (I + M) ,
where I is the identity operator and .M is a compact operator. Intuitively, compact
Operators have bounded spectrum and the presence of identity operator ensures that
the spectrum of overall operator is offset from origin . Hence, the spectral radius of
these operators are ﬁnite and bounded, see [184, 185] for rigorous treatment on these
topics. In rest of this section, for the sake of completion and clarity, these recent
develOpments are presented in requisite detail for the 2D case. Similar observations
hold for 3D case also, however with more involved derivations beyond the scope of

this thesis, and these are just stated with ample references.

Consider the 2D problem of traverse electric (TEz) scattering from a PEC circular
cylinder Of radius a, with axis of rotation aligned along the Z-axis. The tEFIE

formulation (5.1) for this problem can be written as,
27r VV
-jl€7)f‘ X /ad¢, (7 + it?) H32)(KIR) - J¢(¢I) = —f‘ X Ez(¢) (5.18)
0

where H (2) is the Hankel function of second kind, R = a [r—r’ | , r = a cos ¢i+a sin 33)
0
and r’ = a cos (15’ a”:+a sin (1’37. For this canonical case, it is well known that {43(3an n =

0,1, . . .} forms a complete set of eigenfunctions for the surface current J [183]. Then

102

 

the or.

 

where

eigenf‘.

 

The;
8155

0. lli(

Th1

wjd

for

the unknown surface current J ¢ can be represented as sum of these eigenfunctions,

00
J¢ = Z Inasejnd’ (5.19)
n=0
where In are the unknown coefficient to be solved for. Using the orthogonality of the

eigenfunctions, the eigenvalues of Operator Lt are given as
1 I
AgE’tEFIE = 5(n7ma)J,’,(I~za)Hn(2)(Ica) (5.20)

The plot of few of these eigenvalues for different orders 77. and size of the object
a is shown in ﬁgure 5.1. Evidently, these eigenvalues are zero whenever J;,(I~za) =
0, indicating the non-trivial, ﬁnite dimension null space of the TE—tEFIE Operator.
These are also the frequencies corresponding to the interior resonance. Further, from
ﬁgure 5.1, it is seen that the spread between eigenvalues is large , especially, for small

values of 15a. Consider the asymptotic limits when KG. —> 0,

TE,tEFIE ~ .nna
AZEJEFIE z 39%] —» 00 ;n 55 o (5.22)

This suggests that the tEFIE Operator is unstable for electrically small scatterer. The
widely spread eigenvalue spectrum also suggests that tEFIE Operator is an unbounded
operator. This is particularly a consequence of the double derivatives in (5.18) that

leads to hyper-singular terms.

The same eigenvalue analysis can be extended to tMFIE operators. The tMFIE

for 2D TEz scattering from a PEC cylinder of radius a can be written as,

1 . 2“ I" I I (2)
zctrJ¢<411=J¢+Er x [0 cake 4(4 )J¢(¢ 1H. (.11) (5.23)

103

 

Sam; .
Here.
tions

null s]

narree

 

 

analxd

Thus

3CC'UI

whet

3580c

As in the case of tEFIE, {43ejnd’} form a complete set of eigenfunctions and the

corresponding eigenvalues are given as,
1 I
ZEWFIE = §(j7ma)Jn(Ica)Hn(2)(na) (5.24)

Samples of these eigenvalues are plotted in ﬁgure 5.2 for different values of n and 14a.
Here, the eigenvalues are zero whenever Jn(1ca) = 0 and the corresponding eigenfimc-
tions form the ﬁnite, non-trivial null space of the tMFIE Operator. Notice that the
null space of tEFIE and tMFIE are not the same, in other words, the interior reso-
nance for both operators occur at different frequencies. Performing the asymptotic

analysis for I430. —> 0,

Ag’EJMFIE z 1 (5.25)

AgEttMF’E z § ;n ,4 o (5.26)

Thus the spectrum of tMFIE operator is bounded and for n —+ 00 the eigenvalues

accumulate at (0.5+j0.0).

Similar analysis can be carried out in 3D for the canonical problem of scattering
from a PEC sphere of radius a. Here, the vector tesseral harmonics Xnm and Unm
form the complete set of eigenfunctions for representation of the vector current ﬁelds
on surface of the sphere. These are also known as the surface Helmholtz decomposition

[187] on sphere and given by,

x5484) = i- x VYJ"(6,¢) (5.27)

Unm(9,¢) = 1" x xnm(o,¢) (5.28)

where 14:" (0, (1)) = P,’,"(cos 0)ejm¢ is the spherical harmonics and Pi," denotes the

associate Legendre function of order {n, m}. The eigenvalues of tEFIE and tMFIE

104

opera

 

 

unho{
about

261063

hueri
probl
these
QXlQl
ﬁne
tion
mre
The
our
ati

H10

In t

001]

Operator in 3D can be derived as,

xnm 2 .n (1) Unm

4. = (8a) 1 (841117184) (5.29)
Um, (Ica)2j;,(na)hn( )(Ica)Xnm
Xnm —' 2 " hi.” Unm

1C1 = 1(5a)1n(na)’1 (.5) (530)
Um 1(na)27n(na)hn( ltna>xnm

Similar to 2D case, the spectrum of tEFIE operator is widespread, indicating an
unbounded Operator and the spectrum of tMFIE Operator is bounded and accumulate
about (0.5 + 30.0). Here again, the interior resonance frequencies, corresponding to
zero eigenvalues, are not same for tEFIE and tMFIE operators.

The above eigenvalue analysis offers more insight than just understanding the
interior resonance problem and analytic nature of the IE operators. Since practical
problems cannot be approximated as above canonical problems, discretized version of
these IE operators are employed. In such cases, the above eigenvalue analysis can be
extended as follows: assuming a uniform discretization of the geometry, increasingly
ﬁne discretization size corresponds to better representation of higher order eigenfunc-
tions. Thus, employing a dense discretization with tEFIE Operator leads to larger
spread of eigenvalues and hence results in a badly-conditioned system of equations.
The same geometry, when considered with tMFIE operator would lead to a well con-
ditioned system of equation as their eigenvalues are bounded at all frequencies, except
at interior resonance. These insights are used in develOpment and investigation of the

modiﬁed AEFIE formulation presented in the next section.

5.3 Well-conditioned Formulation for AEFIE

In this section, the AEF IE (5.12) is posed in a manner such that it results in a well

conditioned system of equations; that is amenable for use with conventional iterative

105

solvers. The particular case of AEFIE was chosen so that the formulation, in future,
can be extended to Open geometries as well. Similar to the discussion in previous
section, an eigenvalue analysis of the proposed formulation is presented in detail for

the 2D problem. Extension to 3D is not trivial and requires careful consideration.

As mentioned before, if N basis function are used to represent the current, AEFIE
requires the solution to satisfy 2N contraint equations of tEFIE and nEFIE. In this
work, electric charges are also considered as unknowns so that there are 2N unknowns
to be solved with 2N equations. Charge unknowns have been previously employed
in MOM formulations to overcome the low-frequency breakdown of EFIE. Here we

consider this choice speciﬁcally to reformulate AEFIE as

A3 A], J 11in
= — , (5.31)
A} A]; p ﬁ-E’
At, = ﬁx / dr'g(r,r’)J(r’) (5.32)
S
A3 = a. / dr'g(r,r')J(r’) (5.33)
S
A; = -ﬁx deHVg(r,r’)p3(r’) (5.34)
A2 = —£—ﬁ-/dr'Vg(r,r’)ps(r’) (5.35)
60 3

Another motivation for employing charge unknowns is that the scalar basis function
used to represent charges forms a suitable set of testing function for nEFIE. In nu-
merical implementation, the vector basis function for currents and scalar charge basis
function are used to test the tangential and normal electric ﬁeld boundary conditions,
respectively. However, considering both currents and charges as unknowns demands

development of methodologies to impose the continuity and total-charge conservation

106

 

 

60nd;

Here
fune'
molt
liere
thee:
This
the r]
Thus

 

fine
appr
to ir
lead
is tl
Thu
knot
tiOr)
el‘Te:
SfStt

the i

conditions,

V-J

’ijs (5.36)

2,0,, = 0 (5.37)

Here, the continuity condition is imposed as an external constraint using the penalty
function method. In this approach, the discretized form of the differential equation is
multiplied by a scalar factor a and added to the tangential ﬁeld boundary condition.
Here, a is a predetermined constant chosen to be as large as possible, as per the
theory of penalty functions, but within the range of available numerical precision.
This causes a obvious numerical imbalance between the two equations of (5.31), hence
the normal ﬁeld boundary condition is also scaled by a to ameliorate this disparity.
Thus, the AEFIE formulation satisfying the continuity condition is given as,

A3 + (IV A}, + ozjw J¢ = _ 1‘1 x Ez (5.38)

01.743 01.743 p3 an - Ez

Finally, the charge conservation can be ensured either through a penalty function
approach or through the deﬂation procedure. Since penalty function is already used
to impose the continuity condition, employing it to impose charge conservation can
lead to numerical overﬂow. Further, note that the constant current and charge vector
is the only non-trivial element in the null space of the AEFIE operator in (5.31).
This observation favors the use of deﬂation technique as it requires an approximate
knowledge of the null space. Deﬂation is a well known procedure used in the solu-
tion of badly conditioned numerical systems. Consider an arbitrary matrix M with
eigenvectors {en} and corresponding eigenvalues {An}. M is a badly conditioned
system of equation if one of the eigenvalues , say A0, is very small. However, with

the knowledge of corresponding eigenvector, one can consider a modiﬁed system of

107

equations M’ that is a rank one update of the original system
M’ = M + (q — 10150453" (5.39)

where q is the average eigenvalue of the original system. Notice that q becomes the

new eigenvalue of eigenvector e0 and all other eigenvalues are unchanged.

M'eo = (M+ (q-A0)eoeo)eo (540)
= qeo (5.41)
M'en = MennyéO (5.42)
= Anen (5.43)

Thus the defective eigenvalue is deflated from the original system using the rank one
update. Multiple deﬂations or rank one updates can be used to improve the condition
number of an arbitrary system with more than one defective eigenvalues. However,
the deﬂation procedure requires the knowledge of eigenvectors corresponding to these
defective eigenvalues. Also, additional evaluations need to be performed on solution
of modiﬁed system to remove the effects of deﬂation and obtain the correct solution.
Since the null space of the AEFIE operator (5.31) is known, the deﬂation technique
is employed to impose the charge conservation condition. At the limiting case of
w —» 0, the above AEFIE equations reduces to solving the currents and charges
using the normal electric ﬁeld boundary condition while imposing the zero divergence
constraint on currents. It is well known that nEFIE is equivalent to tMFIE and hence
is well conditioned at low-frequencies also. Also, the zero divergence is the required
and physically correct behaviour of currents at low-frequencies. Hence the proposed
AEFIE formulation does not suffer from the low-frequency breakdown and is expected

to be well conditioned across the frequency range.

108

5.3.1 Eigenvalue analysis in 2D

An immediate observation, looking at (5.31), is that none of the AEF IE operators
contain double derivatives, which is the reason for hypersingularity that leads to
unbounded operators. The following theoretical analysis is performed to investigate
the analytic nature of each of the AEFIE Operators. Consider the evaluation of
electric ﬁeld, using the above operators, produced by the current and charge sources
residing on the surface of a circular cylinder of radius a. As mentioned in previous
section, {die-j 11¢} form a complete set of eigenfunctions for vector surface currents J ,5.
Similarly, {ejnib} forms a complete set of eigenfunctions for the scalar surface charges
p3. Thus the mixed set {(iejnd’, ejnib} is a complete set of basis functions for the
AEFIE operator (5.31). It is a fairly straight-forward exercise to obtain the following

result,

 

5in _ z Z “cine
_ . = wéma 11 12 45. (5.44)
mi - Ez 221 222 e3"¢
1 2 2 '71.
Zn = 594418421415.htm)—Jn-1(8a>H.‘._’1(m>>—a’;<s.45)
212 = -:—aJn(15a)H,(,2)(Ica)—ja15 (5.46)

221 = a(2ij(Jn+1(Ka)H,(,2,ll(na)+Jn—1(na)H,(,21(na))) (5471

Z22 = a(%Jn(na) ;(2)(na)+31z) (5.48)

Note that the eigenvalues of the above operator can be obtained through numerical
solution only. Instead, the behaviour of the four individual Operators can be studied
separately. Comparing with (5.24), it can be concluded that the spectrum of the
operator A; is similar to that of tMFIE and hence it is bounded and compact.
Figure 5.3 shows the evaluation of function Jn(na) Hn(1ca) for different values of no

and n. It is evident, that the spread of eigenvalues is bounded and accumulates at

109

 

 

 

 

 

tria

origin for large values of n. Thus the remaining three AEFIE operators - A3, 3, .142,
are also bounded. Notice that the addition of continuity condition as penalty term,
to At , shifts the eigenvalues away from origin and hence acts as a preconditioner.
Thus, this analysis shows that the operators forming the diagonal sub-matrices of
the new AEFIE formulation are bounded and compact. Which in turn implies that
the resulting numerical system of equations from the new AEFIE formulation is well

conditioned.

5.3.2 3D Problems

The AEFIE formulation, developed above for 2D, can be extended to 3D problems
also. One of the important observations in the 2D case is that the constant vector
current is the only null space of the divergence D operator. This observation is spe-
ciﬁc for 2D closed geometries and is crucial to application of the deﬂation technique
discussed above. In general, all solenoidal currents are elements of null space of the
divergence operator D and there is no restriction on number of these solenoidal cur-
rents in 3D. It is well known that the surface vector ﬁeld on a triangular discretization
of a closed 3D surface, with N nodes and M triangles, can be represented with 3M / 2
vector basis functions. Further, these basis functions can be separated into N purely
solenoidal and 3M / 2 — N non-solenoidal vector basis functions. Hence, in the case of
triangular discretizations, the dimension of the null space of the discretized operator
D is N. This implies that one needs to perform multiple deﬂations, which can be
costly and tedious. Instead, a domain decomposition framework is adopted in the
3D case. This approach relies on the fact that, given a set of basis flmction, the
solenoidal and non-solenoidal contribution to the total surface current J can be read-
ily identiﬁed. This decomposition of surface currents is well known in literature as
loop-star, tree-co-tree, loop—tree decomposition and is not unique for a given surface

triangulation. Testing functions are also decomposed in this manner. The resulting

110

discretized AEFIE formulation can be written as

 

 

Zt A 2 J 2
LL 1 L = L (5.49)
A21 A22 J EL
- -T
L -
- -T
_ t
A21 — ZTL Z312 (5.51)
Zt —01V- Zt —aI
A22 = W T” - (5.52)
_ 0 3T 02310
~' . u T
E1 = {135, 057)} (5.53)
~ T

where the subscripts J L and J T represent the solenoidal (loop or co-tree) and non-
solenoidal (tree or star) parts of the total current J, ZLmn and Z3", is the respective
discretized form of the tangential and normal ﬁeld IE operator representing the contri-
bution from source type. n = {L, T, p} and testing function of type m = {L, T, p}. As
expected, the differential operator D preconditions the matrix elements corresponding
to non-solenoidal source and testing basis functions only. Hence, in the above equa-
tion, [the sub-matrix A22 is expected to be well conditioned with bounded spectrum
and Z L L is only bounded. Schur complement is a well known method to solve such
decoupled system of equations. Here the Schur complement, inverse of the smaller
sub-matrix, is computed and used in the iterative solution. In AEFIE, assume that
the inverse of Z L L is explicitly computed, then the Schur complement solution can
be written as

J = [A22 - .42le1111112] _1 (I‘SL — A21ZZILJE‘L) (5.55)

111

Here it cannot be guaranteed by any means that Z L L will be a considerably small
matrix. Since Z L L is guaranteed to be well conditioned, the inverse in above equation
can be replaced with an iterative solution. Thus the Schur complement solution
offers a reasonable means of exploiting the well conditioned nature of the diagonal

sub-matrices.

5.4 Results

In this section, results are presented to validate the well-conditioned nature of the
proposed AEFIE formulation in both 2D and 3D. In both 2D and 3D, different ge-
ometrical conﬁguration were considered and condition number was computed for a
wide range of incident frequencies.

The 2D AEFIE formulation with deﬂation is ﬁrst validated by comparing the nu-
merical solution for surface currents on a circular cylinder with the analytical solution,
as shown in ﬁgure 5.4. There is an excellent match between the two solutions and
hence validates the numerical implementation of the proposed AEFIE formulation.
Figure 5.5 shows the singular values of AEFIE system before and after application of
deﬂation. As discussed in previous section, the deﬂation techniques successfully elim-
inates the defective singular valuas to improve the condition number of AEFIE formu-
lation. The 2D AEFIE formulation was applied to cylindrical objects of various shapes
as the frequency was varied from 3Hz to 30 GHz for each geometry. In each case, the
chosen discretization rate corresponds to A/ 10 at 30 GHz and the same discretization
was used across the frequency range. Figure 5.6 shows the condition number of 2D
AEFIE formulation and that of a conventional tEFIE approach (5.1) for circular ge-
ometry. As expected, the condition number of the tEFIE formulation monotonously
increases as the frequency reduces. The condition number of AEFIE stays constant

across the entire frequency range, complementing the theoretical arguments in above

112

section. Next, the condition number of AEFIE formulation is computed for elliptical
cylinders of different aspect ratio. As expected, the condition number of AEFIE for-
mulation remain constant across the wide range of frequencies, while that of tEFIE
increases monotonously as frequency reduces. The elliptical surface were formed by
nodes located {13", yn} = {A sin(nA¢)+cos(nA¢), A cos(nA¢)+sin(nA¢)} where A4,
controls the discretization rate and A denotes the aspect ratio. Thus the variation in
discretization size is large for ellipse with larger aspect ratio. As seen from the ﬁgure
5.7, though the condition number of AEFIE increases with aspect ratio the variation
is negligible and are within the same order of magnitude. With regard to tEFIE
formulation, the behaviour of condition number with frequency was almost the same
for different aspect ratios. Figure 5.8, presents the condition number of a singular
geometry formed by an intersection of triangle and half-circle. Again, the discretiza-
tion points were generated at constant angles and this geometry is representative of a
multiscale structure in 2D. The condition number of AEFIE formulation remains con-
stant and relatively low across the range of frequencies, while the condition number

of tEFIE formulation increases at lower frequencies.

Next, the AEFIE formulation was applied to three different 3D geometries: sphere,
NASA almond and cone-sphere geometry. Here the sphere geometry was discretized
at a uniform rate. NASA almond and cone-sphere geometry are representative of
multiscale structures, as they contain dense discretizations to capture sharp details.
In each case, the condition number of the operators Z L L and A22 involved in Schur
complement solution (5.55) and the conventional tEFIE formulation were computed
at three different frequencies - 30 Hz, 3 KHz, 3MHz. Table 5.1 shows the condition
number for these numerical experiments with a simple loop-star decomposition [186]
to represent the solenoidal and non-solenoidal parts of the current. As expected, the
condition the number of tEFIE formulation increases as the frequency is reduced. The

condition number of diagonal sub-matrices of AEFIE formulation remains constant

113

across the frequency range. Though the condition number behaviour is favorable with
respect to frequency, these are quite high for use in practice. As an alternative, the
solenoidal and non-solenoidal current decomposition were numerically orthogonalized.
The resulting condition numbers of AEFIE Operators Z L L and A22 are shown in Ta-
ble 5.2. For comparision, the modiﬁed tEFIE operators with the same orthogonal
basis function is also included in Table 5.2. In comparision to Table 5.1, the use of
orthogonal basis functions improves the condition number of AEFIE operators con—
siderably. Further, the condition numbers of AEFIE formulation is either comparable
or better than the the modiﬁed-tEFIE formulation. However, it must be noted that
the construction of orthogonal basis functions is a computationally intensive process
as the cost scales quadratically with the number of unknowns. Eﬂicient means of
obtaining these basis function is a current and widely investigated t0pic of research

and is part of the future work.

114

 

 

 

 

Table

pOSlllt

 

Table 5.1: 3D AEFIE condition numbers with loop-star decomposition

Table 5.2: 3D AEFIE condition numbers with orthogonal, quasi-Helmholtz decom-

position

 

Freq.

Z

A11

[ A22

 

Sphere

 

30 MHz

9.18E+02

2.59E+02

2.21E+03

 

30 Khz

9.20E+08

2.60E+02

2.03E+03

 

30 Hz

2.20E+15

 

2.60E+02

 

2.03E+03

 

Cone— Sphere

 

30 MHz

2.17E+05

2.10E+03

1.20E+06

 

30 Khz

2.38E+08

2.07E+03

4.48E+04

 

30 Hz

1.23E+18

 

2.00E+03

 

1.94E+05

 

Thin-Almond

 

30 MHz

3.19E+05

5.50E+05

1.80E+07

 

30 Khz

3.20E+11

5.50E+05

1 .80E+07

 

 

30 Hz

 

1.38E+15

 

5.50E+05

 

4.43E+07 .

 

 

Freq.

A11

1

A22

I Ortho—LS

 

Sphere

 

30MHz

2.80E+01

1.00E+02

359

 

30KHz

2.60E+01

2.46E+02

360

 

30 Hz

2.60E+01

2.50E+02

360

 

Thin-Almond

 

30MHz

8.50E+02

2.40E+05

1.20E+05

 

30KHz

8.50E+02

5.88E+05

5.80E+05

 

30 Hz

8.50E+02

2.40E+05

 

Cone-Sphere

 

30MHz

1.40E+02

5.54E+02

8.08E+04

 

30 KHz

1 .39E+02

9.79E+03

 

 

30 Hz

 

1.39E+02

 

1.24E+03

8.08E+04

 

 

115

 

 

 

 

F101"

:9 ..
Grail

o ‘1'
Open

 

 

 

 

 

 

 

.23”!
m3. "

'4'§ --e--02
,5}? +2

2 "8°20
3.: "0..

a 200
Jr;

 

o 0.1 02 0'3 0.4 015 0.6
Real

I .
Figure 5.1: Eigenvalue spectrum of J;,(na)Hn(2)(na) corresponding to 2D-tEFIE op-
erator (5.20).

 

 

 

 

 

 

 

o.
-0.1
«02
6,03-
“ __k_°_
E .04 -e—o.2
* 2.0
~05 D 2.0
O 20
4.6 U 200
-o. . - - -
3.4 -0.2 0 0.2 0.4

I
Figure 5.2: Eigenvalue spectrum of Jn(na)Hn(2)(na) corresponding to 2D-tMFIE
operator (5.24).

116

Fié’Ur
anal}...

 

ka

 

 

 

 

 

 

D 200.0
a 0 20.0
D D 20
n O 0.2
u
n
:1
U a
El
:1
~ 0
U l l j A
1.6 2 2.5 3
Real

Figure 5.3: Eigenvalue spectrum of Jn(rca)Hn(na) corresponding to some of the AE-
FIE operators.

 

 

:x—_
_ Real EFIE

   
     

0 Real New
- - - lmag EFIE

 

 

 

1- .
:1 .. c1 lmagNow I. ’
I:
. l
t
3 0.5-
'U
8
3
U
E
D o-
a
E
E
.
I: -0.5

 

 

 

40 60
Anglo Index

Figure 5.4: Surface current on 2D circular cylinder computed using 2D AEFIE and
analytical solution.

117

 

 

 

 

 

 

14 s

 

 

 

 

 

 

 

 

 

 

 

121 v ,r _
8
a 10- _
E —Wlthout deﬂation
a - - -Altor dollallon
'5 3. ..
O
E
a)
'5‘ 0- .
‘6'
'5
O
-' 4- -
2. l
10 2O 30 4O 50 60 70 80 90 100
Index

Figure 5.5: Singular values of 2D-AEFIE formulation before and after deﬂation.

 

 

 

 

 

 

 

 

 

18 T I l I l I I
16 5‘ _Now (1E12) _
~.:l\* -.n-ens
14- §‘~.~ 0 New (1E8) -
\ '~.
ss *-~.. ---- No J dollatlon
12 \ "~,~ ..
h- ~~ ~
A ‘ ~,‘.
3 ‘s F"
1° ’- ‘ ~L~N~ ‘
8 ‘\ L~o
H \ Q. ~ .5
O 8 “ ~ ~ - _ 4|
1- " ‘
a \
o ‘x
_l S
8 r '5‘ 1
‘s
‘s
4 l- ‘ —l
\\
h~~‘;
o l I n I 1 n J
0 1 6 7 8

3 4 5
L0910 ( lroq )

Figure 5.6: 2D AEFIE condition number vs. frequency for circular cylinder.

118

18-

 

 

 

 

 

 

 

16 - — - EFIE (All aspect ratios)
"a.” -+.. AEFIE (1 :8)
14 _ “s. ———AEFIE (1 :16)
“a. -a-AEFIE (1:4)
\
A 12 - X
3 “~
8 10 - ‘s‘.
\
2 8 “\
6 \s‘
\
4 - x
:3. '..‘.'.':.*:':.':ﬁ:':.: *2: 1.7:: e.g.
2 ~ ”'53-
o 1 1 1 1 1 1 1 1
0 1 2 5 6 7 8

3 4
Log10 (Freq)

Figure 5.7: 2D AEFIE condition number vs. frequency for elliptical cylinders with
different aspect ratios.

119

 

 

Fig
H111

 

Iog10(Cond.)

Figure 5.8: (a) Shows the 2D AEFIE condition number vs. frequency for a 3D

 

 

 

 

 

 

 

 

 

..l
_AEFIE
1“ ' -o-EFIE
12- a\~‘
10 - X
‘\1\ J
8 ~ 31‘
g
0 - \e
S“
4 . xx .
S‘
2 #31:}
00 2 4 6 8
log10(Freq.)
(a)
(b)

multiscale geometry and (b) shows the geometry

120

Chapter 6

Algorithms for Implementation of
Hierarchical Computations on

Parallel, Distributed Computers

In this Chapter, parallel algorithms are developed for efﬁcient implementation of al-
gorithms described in previous chapters on distributed parallel or cluster computers.
Section 6.1 providas a brief introduction to the existing approaches for parallel imple-
mentation of FMM. Section 6.2 provides a succinct summary of the integral equation
solver for electromagnetic simulation (EM) and the Wideband FMM, developed in the
Chapter 4, for multiscale EM simulations. Section 6.3 is devoted to the details of the
parallel algorithm proposed in this work. This section expounds on the parallel im-
plementation of every step of the hierarchical tree computation. Section 6.4 presents
a plethora of results that demonstrate the scalability and efﬁciency of the proposed

algorithm.

121

6. 1 Introduction

It is well known that integral equation (IE) methods are well-suited to model ﬁelds
propagating in unbounded media as they impose exact radiation boundary condition
and, in many cases, requires discretization of the boundary surfaces only [183]. The
principle bottleneck to the popularity of IE based solver was its computational com-
plexity; both memory and time scale as 0(N2), where N is the number of spatial
degrees of freedom. In the past two decades, several fast algorithms have been de-
velOped to ameliorate the computational complexity of IE based solvers. These can
be broadly classiﬁed [33] into (a) fast multipole methods (FMM), (b) fast Fourier
transform (FFT) based approaches and (c) numerical compression schemas. FMM
and its variants have enjoyed a widespread popularity [6, 8, 174], and the rest of
the paper will focus on the further develOpment of FMM based methods. FMM was
ﬁrst developed for the Laplace potential used in static problems [6, 8] and was later
extended to the Helmholtz potential [30] for use in electromagnetic simulations [65].
The development of similar fast methods, combined with the explosive growth in the
computing power, have enabled simulation of realistic structures with complex geo-
metric features [198]. Many of these simulations fall under the category of multiscale
problems that exhibit multiple scales in length or frequency or both. It is well known
that classical FMM suffers from low frequency breakdown [42, 44], and several mod-
iﬁcations have been proposed to overcome this limitation [46, 49, 50, 198, 180]. In
this work, the Wideband FMM, as described in the Chapter 4, is constructed as a
hybrid combination of accelerated Cartesian expansion (ACE) and FMM is used for

multiscale simulations.

The developments of fast methods has increased the size of the problems being
analyzed from thousands to millions of unknowns [36, 199, 200]. As the problem size

exceeds few millions of unknowns, the serial implementation of above fast algorithms

122

on single processor machines face severe limitations in terms of computational mem-
ory and speed. This, along with inexpensive and widespread availability of distributed
or cluster computers serves as the motivation for exploring the parallel implementa-
tion of FMM [127, 37, 200, 201]. However, the algorithmic sophistication of the fast
methods makes the development of efﬁcient parallel algorithms difﬁcult. FMM re-
lies on (i) tree data structure to hierarchically partition the computational geometry
and (ii) an alternate representation of the Greens function using multipoles. Sev-
eral different approaches to partitioning the tree-data have been explored and they
can be broadly classiﬁed into (a) spatial partitioning, (b) direction partitioning and
(c) hierarchical partitioning. In spatial partitioning, the nodes of the tree are dis-
tributed among the P processors and harmonic expansions associated with a nodes
is completely contained within the processor it resides in. This approach is eﬂective
when the number of harmonic expansions associated with a node is constant, as in
the case of the Laplace FMM. While spatial partitioning has been successfully used
for the computation of electrostatic interactions in molecular dynamic simulations
[202, 203, 204, 205, 206, 207], it is not efﬁcient when applied to the case of Helmholtz
FMM; where the number of harmonic expansions associated with a node depend on
its level in the tree. This led to the development of direction partitioning strategy
[208, 209]. In this approach, spatial partitioning is used up to a particular level and
beyond this level the nodes are duplicated in all processors and their harmonic expan-
sions are partitioned among the processors [208]. The level beyond which direction
partitioning is used is determined by heuristics or a one time tuning analysis. Though
this approach does not ensure scalability beyond hundreds of processors, it has been
well exploited to solve problems with several millions of unknowns on smaller clusters
[210, 209]. Hierarchical partitioning was developed recently as a combination of spa-
tial and direction partitioning approaches [211, 212]. Here spatial partitioning is used

at the leaf level, and a systematic combination of spatial and direction partitioning is

123

 

 

USE

aib

0pc
SII.‘
tha
3116
par

SUE

Hen
pres

WE I

(\D

6.?

Cor.

 

that
equi
by I
CUIrl

Can

 

used at all other levels. This approach shows the promise of being provably scalable
albeit with a tight relation between N and P [212].

In this Chapter, a parallel implementation of the Wideband FMM [198] is devel-
oped that is scalable on large number of processors. This is achieved by developing
strategies that ensure self-similar distribution of tree data and lead to an algorithm
that is implicitly load balanced. This work extends the recent developments in par-
allel algorithms for Laplace FMM [206, 213] to the Helmholtz FMM. The resulting
parallel algorithm offers a seamless combination of spatial and direction partitioning
strategies. It is well known, from Amdahl’s law, that the maximum parallel speed-up
achieved is limited by the minimum time spent on serial computations of an algorithm.
Hence the parallel implementation of every step of the tree computation algorithm is
presented along with their cost analysis. The main contributions of this Chapter are:

we present,
1. a scalable parallel algorithm for hierarchical tree computations.
2. theoretical bounds on the parallel performance of the presented algorithm.

3. a scalable parallel EM solver for Wideband-FMM.

6.2 Preliminaries

Consider the electromagnetic scattering from a closed perfect electric conductor (PEC)
that is immersed in free space. Let S denote the surface of this object that is
equipped with a unit outward pointing normal f1. Electromagnetic ﬁelds denoted
by {EL (r), HL(r)} are incident upon the object. The incident ﬁeld inducae an electric
current J (r) that radiates scattered ﬁelds {E3 (r), H3 (r)}. The unknown current J (r)

can be found by solving the combined ﬁeld integral equation (CFIE), that may be

124

written as

an x a x Et(r) + (1 — (1)11 x Ht(r) = 0 r e S‘ (6.1)

where EL = EL + E3 and HL = Hi + H8 are the total electric and magnetic ﬁelds,
respectively, 8’ is a surface conformal to and just inside S and a E [0, 1] is an
arbitrary scalar constant. The scattered electromagnetic ﬁelds are related to J (r)
through the dyadic Green’s function,

a x a x 118(1) = £6{J(r)} = n x a x /3 6135.0, r') -J(r’) (6.2)
—a x H3(r) = ICm{J(r)} = a x % SdsV x [6.0310 .J(r’)] (0.3)
5.0.10 = -J'mv (7 + Q) g(r, r’) (6.4)

I e—jnh—r’]
g(r, 1') = W (5-5)

In above relations, K. is the wavenumber, 17 is the characteristic impedance of free
space, I= is the identity dyad and g is the scalar Green’s function. Typically, the
unknown currents J(r) are represented using RWG vector basis functions fm (r) [181]
and the system of matrix equations obtained by using a Galerkin testing procedure
is

21 = V (6.6)
The fast evaluation of the matrix-vector product relies on the rapid evaluation of

the scalar potential. Hence, with no loss of generality, consider the evaluation of the

scalar potential 2p due to N sources

N e—jnlr_ril
WT) = Z anwi (6-7)
i=1

Variation to the vector case is well established [65] and will not be elucidated here.

125

 

C0

3C

tha

I111;

tic

ill

He

Wa

l'.‘

M

 

6.2.1 The Fast Multipole Method

Consider the evaluation of the potential ¢(r) at a point r that is well-separated from
a cluster of sources that reside within a sphere of radius d. The FMM expansions

that enablae the fast evaluation of this potential is given as [65]

10(1‘) = 7%? dZRM(r3,k)T(ro—rs,k)wne_jk'(r_r0) (6.8a)
= 2+: [dzkwne’jk'(r_r0)£(ro,k) (6.8b)
K .

M(r3,k) = Zwie_3k'(r3-ri) (6.80)
i=1

T(x,k) = Z(—j)’(21+1)h¢(2)(~|r|)1’z(E-i) (6.8d)
l=0

£(r0,k) = M(r3,k)T(ro—rs,k) (6.8e)

where lro — r3| > 2d, Ir — r0| _<_ d, |r’ — r3| S d, k = nk, r3 (r0) is the center of
multipole (local) expansion M (L) for source (observation) cluster, T is the transla-
tion Operator, him and P1 denotes an order l spherical Hankel function of second kind
and Legendre polynomial, respectively. The translation Operator contains a spherical
Hankel function which is singular at the origin. Thus, for numerical stability, neither
the translation distance Ix] nor the wavenumber K. can be arbitrarily small [73, 76].
Hence, the classical FMM is ineﬂicient or numerically unstable when applied to sub-
wavelength problems, where the domain is discretizated at a rate higher than /\/ 10
to capture the geometric details. These limitations have been reported in detail in
[74] and several alternatives have been proposed to overcome them [46, 49, 50, 174].
In this work, the stable accelerated Cartesian expansion (ACE) [198] described in
Chapters 2 and 4 is used for the sub-wavelength problems. Following section brieﬂy
presents the ACE algorithm and its integration with FMM to create a Wideband
FMM.

126

 

by .

0p,.

W}.

ha:

 

6.2.2 Accelerated Cartesian Expansion (ACE)

ACE is a hierarchical tree computation algorithm that employs the generalized Tay-
lor’s expansion to derive alternate representation Of the Green’s function. The con-
struction of ACE algorithm is similar to FMM in that it uses the oct-tree for geometry
processing and derives equivalent operators for tree computation. In contrast, ACE
employs Cartesian harmonics as multipole and local expansions. Use of Taylor’s series
expansion for fast computation has been explored previously albeit with severe limi-
tation on accuracy and performance. ACE provides a generic framework by adopting
a tensorial formulation to exploit the full power of Taylor’s series expansion for fast
computation. In rest Of the paper, M(”) denotes a tensor Of rank n, the polyadz'c
associated with r = {725, ry, rz} is given by r(") = {r21r32r2’3} where n = 23:1 n,-
and n,- > 0, an m fold contraction between two tensors A(”) and B(m) is denoted
by A(”) . m . B(m) = C("-m) when n > m; for more details on these deﬁnitions and

operations see [182, 13].

The ACE expansions for computing the potential in (6.7) can be written as

Mr) = anM(n)(r3).n.T(")(r3,r) (6.9)
n=0
= an(r0—r)(").n.L(n)(ro) (6.10)
n=0
K w-
M(n)(r3) = Z(-1)"#(ri—rs)n (6.11)
i=1 '
(n) ne_jnlr0—r3[
T (r3,ro) = V W (6.12)
L(”)(ro) = Z $T(")(r3,ro).(m—n).M(m_") (6.13)

where M(") and L(") denote the n—th order multipole and local expansion Cartesian

harmonics and T(") is the ACE translation Operator. ACE is an almost kernel in-

127

 

Wllf

W]; I

ﬁll.

 

SlZl’

 

dependent method as (a) all quantities of the ACE algorithm, except the translation
Operator, are independent of the form of the kernel and (b) the ACE expansions are
rapidly converging for any non-oscillatory function [13, 61, 174, 214]. Readers inter-
ested in the details and other salient features of the ACE algorithm are referred to
[13]. In the case of the Helmholtz potential, the ACE translator Operator is given as

[174],

(,K,, 19211 11321 12331
T(")(r3,ro)=V" R (”1,712,710 2 Z Z(_)1)n+mR2m— —-2n 1

m1 =0 m2=0 m3=0

 

n1 n2 113 (6-14)

x
m1 m2 m3

1;"1 —2m1yn2-2mzzn3—2m3g(n _ m, KR)

where

902. ..R) = x/Q/WUKR)("+0'5’Kn+0.5(jﬂR)

n n!

= 2mm!(n — 2m)!

 

m

where, n = n1 + n2 + 713, m 2 m1 + mg + m3, Kn(-) represents the modiﬁed Hankel

 

function of order n, R = \/:I:2 + y2 + z2 and [J is the floor Operation. It is well
known that above Taylor’s series expansion is convergent when either the domain
size or frequency is small or both. Further, it has been rigorously shown that as

frequency tends to zero the ACE expansion seamlessly transitions to Laplace FMM

[198].

128

6.2.3 Hybrid algorithm for multiscale problems

Multiscale problems, by deﬁnition, contains a mixture of sub-wavelength and large-
wavelength problems. From above discussions it can be seen that the ACE algorithm
for evaluation of Helmholtz potential is stable and efﬁcient for sub-wavelength prob-
lems; while, FMM algorithm is eﬂicient and Optimal for large-wavelength problems.
Thus, individually neither of the two algorithm is efﬁcient for multiscale problems.
A hybrid approach, where both ACE and FMM expansions are used in an Optimal
and seamless fashion, is required to achieve full eﬂiciency with multiscale geometries.
This implies that one needs to develop transition Operators to switch from ACE to

FMM and vice versa [198]. These maps are given by

00

M(r.,k) = ZM<P>(rA).p.T$£2.p(k,r.-r.4) (6.15)
p=0
Loan”) 2 i‘jr—n/dszlggp(k,ro—rf1)£(k,ro) (6.16)

where r A denotes the center Of ACE multipole expansion Ml?) and the mapping
Operator Tmap. The derivation of Tmap and the proof Of convergence can be found

in [198].

The overall Wideband FMM algorithm proceeds as shown in Algorithm 2. The
computational geometry is represented using a compressed oct-tree [51, 61]. This
is constructed by ﬁrst embedding a cube enclosing the computational domain and
recursively sub-dividing the large parent cubes into eight smaller, non-overlapping
children cubes The boxes at the lowest level of the tree, beyond which no sub-division
occurs, are referred tO as the leaf boxes. In rest Of the paper, box and nodes are
used inter-changeably. Interaction list is constructed for all nodes and nearﬁeld list
is constructed for all leaf nodes only [9, 61]. Next, the tree nodes are classiﬁed as

ACE or FMM, based on the side length of the domain they represent. All nodes

129

representing domains less than a certain predetermined size, typically A / 4 or /\ / 2,
are classiﬁed as ACE nodes and rest Of the nodes are labeled as FMM. The ACE to
FMM multipole transition Operator in (6.15) is used in step 4 and FMM to ACE local

expansion transition Operator in (6.16) is used in step 6 of the Algorithm 2 [198].

 

Algorithm 2 Wideband Multilevel Fast Multipole Algorithm

1: Construct the tree representation for the given geometry (distribution of discrete
points).

2: Build interaction list for all tree nodes and the near-ﬁeld list for leaf nodes only.

3: 82M: compute multipole expansions at each leaf node from sources contained
within it.

4: M2M (upward traversal): compute the parent node multipole by combining the
multipole expansions at their children node.

5: M2L (translation): for all nodes in the tree convert the multipole expansions to
local expansions of the nodes in their interaction list.

6: L2L (downward traversal): update the local expansion information at a child
node using the local expansion of their parent node.

7: L20: use the local expansions about each leaf node to compute the farﬁeld po-
tential at its Observation points.

8: NF: use direct method for computation Of nearﬁeld potential at observation
points in each leaf node from sources contained in its near-ﬁeld nodes.

 

 

6.3 Parallel Algorithm for FMM

This section presents the details of the parallel implementation of the ACE (PACE),
FMM (PFMM) and the Wideband FMM (PACEFMM) algorithm outlined in the
previous section. First, a scheme for constructing and partitioning the oct-tree data
structure in parallel environment is presented. This is followed by the details on
parallel implementation of the individual tree computation steps in Algorithm 2. As
mentioned in the introduction, the emphasis of this work is on reducing the latency

among processors to ensure the scalability of the algorithm.

130

6.3.1 Parallel Construction of the Oct-tree

Although the construction of oct-tree is a one time effort and takes up a negligible
fraction of the overall parallel run-time, it is important because (a) tree partitioning
among the processors directly affects the load balancing of the rest of the algorithm
and (b) creation of various interaction lists at this stage are communication-intensive.
In our implementation, the tree is stored in postorder traversal order. It will be shown
that the resulting ordering of nodes enables load balanced computation Of various tree

Operations, obviating the need for explicit load balancing.

Let N denote the total number Of points (sources and observers) distributed within
a cubical domain of side length D and P be the number of processors. The average
number of points per processor is denoted by n = N / P. Given the smallest side
length do associated with leaf boxes, the total number Of levels or height of the tree is
H = log2 (D / d0). Integer coding scheme [204] is used to uniquely represent a node in
the tree. This has several advantages as (a) the keys encode a wealth of information
such as the center position Of the box represented by the node, level of the node, its
entire ancestral lineage etc., and (b) the sorted keys conform to Morton ordering [215].
Morton ordering of the sorted leaf nodes distributed across processors results in a self-
similar structure in each processor [199, 216] as shown in ﬁgure 6.2. Self-similarity
is critical to parallel processing as it ensures that each processor has an identical
number of tree-Operation. This leads to an implicitly load balanced scheme. The full
post-order tree is constructed in a recursive fashion, in each processor, by generating
the parent nodes from children nodes. Next, some comments on the distribution of

tree-nodes among the processors are provided.

Each processor contains only a part of the tree with nodes at every level as shown
in ﬁgure 6.3. It is evident that some nodes can occur in multiple processors. When

considering the global postorder traversal tree, across processors, each such node is

131

 

associated with a processor where its occurrence is appr0priate (the processor which
has the rightmost leaf box in the subtree of the node). This processor is referred to as
the native processor for that node and every tree-node has an unique native processor.
All other occurrences of the node are termed duplicate nodes and the following Lemma

6.3.1 provides a bound on number of such nodes.

Lemma 6.3.1. The number of duplicate nodes in each processor is bounded by the
height of the tree, and will appear sequentially at the end of the local postorder traversal

tree.

Proof. Let H denote the height of the tree. A processor can have at most one duplicate
nodes per level in the tree. The rationale for this statement is as follows: assume that
a processor has at least two duplicate nodes at the same level in the tree. Let v1 and
v2 be two such nodes, with v2 occurring to the right of v1 in the tree. A processor
has a node in its local tree only if at least one of the leaf boxes in the subtree under
the node falls in the same processor. Also, all the leaf boxes in a processor are
consecutive in Morton ordering. Taken together, these two Observations imply that
' the rightmost leaf box under v1 must reside in the same processor. Thus, v1 is native
to this processor and cannot be a duplicated node. This argument demonstrates
that a processor can have at most one duplicate node per level, shared with the next
processor. Similarly, one can show that the number of multiply occurring nodes that
are native to a processor are limited to one per level.

The proof that the duplicate nodes will appear sequentially at the end Of local
postorder traversal tree follows from the fact that the postorder sequencing always
places nodes before their parents. The parent of a duplicate node is also a duplicate
node in the same processor. Hence all duplicate nodes in a processor appear in

sequence at the end Of the local postorder traversal tree. El

Next section provides details on the parallel implementation of each step of the

132

tree computations shown in Algorithm 2. In rest of the Chapter, given any two nodes
A and B, A is said to be less than B if node A appears earlier than node B in the
postorder traversal sequence. This notion Of comparison between tree nodes simpliﬁes

implementation of several Of the processes detailed below.

6.3.2 Distribution Of ACE and FMM harmonic data

The above tree partitioning scheme ensures that the nodes are unifome partitioned
among the processors. If the size of data associated with each node is same, then
data is also uniformly distributed amongst the processors. This is true only for the
ACE portion of the tree; in FMM, the number Of expansions depends on the level Of
the node. Thus the total FMM expansions data contained in the native processors is
considerably high, leading to severe load imbalance during tree computations. Though
this load imbalance is bounded, due to Lemma 1, it undermines the scalability of the
algorithm. A possible remedy is to redistribute the tree nodes such that the FMM
expansions data is uniformly distributed across the processors. This results in the
following unfavorable scenarios. First, the Optimal redistribution of nodes may be
such that some processors, especially the native processors, contain only higher level
nodes. This induces latency during upward and downward tree traversals, steps 5 and
6 in Algorithm 2. Second the redistributed tree is not self-similar across processors
and affects the parallel efﬁciency during tree computations. As will be evident, this
results in highly scalable parallel scheme.

An alternative approach developed in this is paper is the adaptive direction parti-
tion. To achieve uniform distribution Of FMM expansion data, direction partitioning
[37] is employed for duplicate nodes only. The FMM expansions data of a duplicate
node is partitioned such that each copy contains an equal and distinct portion. This
scheme and the self-similar distribution of the tree automatically ensures that each

processor has an equal quantity Of FMM harmonics data. The nodes to be parti-

133

tioned and the number Of partitions are decided automatically, resulting in implicit
load balancing. This approach bears some similarity to the recently introduced hi-
erarchical partitioning approach, where the multipole data of all nodes except the
leafs are partitioned [211, 212]. This imposes a strict relation between N and P for
scalability. In contrast, the adaptive direction partition scheme is ﬂexible and differs
in the following manner: (a) it combines the spatial and direction partitioning in a
seamless manner; (b) direction partitioning is used only when its Optimal; and (c) it

provides a means of preserving the self similarity of tree computations.

6.3.3 Construction of Interaction Lists

Tree computation requires the construction of interaction and nearﬁeld lists. Inter-
action lists are built for all the nodes in the local tree except duplicate nodes. This
operation is split into serial and parallel portions. In the serial portion, the interac-
tion list of each node is built assuming that the full tree is constructed [199]. Given a
node’s key code, straightforward bit manipulation yields its parent node, the parent’s
neighbor nodes and their children. This information is used tO construct the interac-
tion list of each local node. Due to locality there is no communication cost associated
with this Operation. In the parallel portion, the non-existent nodes are eliminated
using one time communication. At this stage, different communication maps are
contructed for information exchange during tree traversal. The entire process is efﬁ-
ciently implemented with the use Of a binary tree search algorithm to identify nodes
in postorder traversal. A similar procedure is used to construct the nearﬁeld list Of

local leaf nodes.

134

6.3.4 Multipole and local expansion computation

In each processor, the multipole expansions are computed at every node in the local
postorder traversal tree. The postorder traversal order ensures that a parent node
appears only after its children nodes (in case Of duplicate nodes, all children that reside
in the same processor). Thus, when a parent node occurs the necessary children
multipoles are already computed. Multipole expansiond are computed for all the
local nodes, including the duplicate nodes. Note that the multipole expansions at the
duplicate nodes are only partially ﬁlled as they account for sources in that processor
only. Thus, after the local computation, all processors with duplicate nodes send their
multipole expansions to the appr0priate native processors of the duplicate nodes they
host. The native processor of a node simply adds the received multipole expansion
data to the appropriate local node. This algorithm is a one step update process with

the following bound on communication overhead.

Lemma 6.3.2. Total number of nodes received by a processor during multipole com-

putation is bounded by (P — 1)H.

Proof. This follows from the fact that the number Of duplicate nodes in a processor is
bounded by H (see Lemma 6.3.1). Since only the duplicate nodes are exchanged during
multipole computations, the maximum number of nodes received by any processor

will be no more than (P — 1)H. D

The computation of local expansion is a reverse analogue of multipole computa-
tion. In the downward tree traversal, the child node local expansions are updated
with the local expansion of their parent node. First, the processors with the duplicate
nodes obtain their local expansion from its native processor. Then, the downward
tree traversal is performed locally in each processor by traversing the local postorder

tree from right to left.

135

 

Cost analysis: Each processor has at least one node from every level of the tree
and their multipole expansions are computed in every processor by traversing the
local postorder traversal tree. Thus, this part of the process is load balanced if
every processor has the same number of leaf nodes. This is true even in the case
of FMM where the number of multipole harmonics increases as the level increases.
Since the number of duplicate nodes per processor is bounded, the communication
overhead involved in exchange Of their multipole information is also bounded. Hence

the overall process is load balanced.

6.3.5 Translation Operation

At each node in the global postorder traversal tree, local expansions are computed
using multipole expansions of the nodes in its interaction list. This process is divided
into a parallel and serial portion. In the initial parallel portion, multipole information
is exchanged between processors. While building the interaction lists for each node
in the local postorder traversal tree, the set Of processors that require their multipole
expansions are identiﬁed. This list of local nodes and processors is sorted accord-
ing to the processor-ID. At every processor, the requisite information is send to the
appropriate processors by traversing through this list. In implementation, this data
is exchanged in blocks whose size is deﬁned by the user. This serves two purposes
(a) the number of communication calls can be greatly reduced when compared to a
scheme where the multipole data is exchanged one node at a time, and (b) the block
size can be adjusted according to the communication architecture Of the distributed
environment to ensure an optimum performance. Once the required multipole expan-
sion data is received, the actual translation operation is performed in a serial manner
to compute the local expansion of nodes in the local tree. In case Of FMM transla-
tions, the duplicate nodes exchange and compute only part Of their FMM harmonics

data in accordance with the adaptive direction partitioning strategy. This is possi-

136

ble due to the diagonal translation operation of the Helmholtz FMM algorithm. In
actual implementation the serial and parallel parts are performed in an intertwined
fashion such that the translation Operation is performed as and when the data is
received. Further, asynchronous communications can be used to minimize the wait
time between the parallel and serial portion.

Cost analysis: The translation Operation is reciprocal. Thus, if two interacting
nodes are in different processors, then both processors need to exchange same amount
Of information. This process would be load balanced if all processors receive and
process the same amount Of multipole data. In case of ACE computations, where
the size of harmonics data is constant for all nodes, an uniform partitioning of tree
nodes automatically ensures that the data is also uniformly partitioned. The same
argument is not true in the case of FMM computation where the number of harmonics
is a function of the level of the node. However, the use of adaptive direction partition
ensures that the data is distributed uniformly across the processors. Hence, this part

of the algorithm is also load balanced.

6.3.6 Evaluation of Potential

The farﬁeld potential at the Observation points are evaluated from the leaf node local
expansion they reside in. However, the evaluation Of the potential is completed only
after accounting for the nearﬁeld interactions. These are interactions only among leaf
boxes, as speciﬁed by the nearﬁeld list. Similar to translation operation, for each leaf
node a list of processors that require its information is created and then sorted by
processor-ID. At every processor this list is used to communicate the leaf box infor-
mation, in blocks, to appropriate processors. The nearﬁeld potential is computed in
a serial manner from the received data. This completes the evaluation of potential at
every point across all processors. As in the case of translation Operation, the commu-

nication and computation parts are intertwined and asynchronous communication is

137

 

used to minimize wait time.

6.3.7 Parallel Electromagnetic (EM) Solver

Next, a brief description is provided on the use Of the above described parallel ACE
and FMM (PACEFMM) algorithm within the framework of EM solvers. When the
discretization size is in the orders of millions, the geometry processing to create basis
functions, etc. becomes computationally intensive. Though this is a one time prO-
cess, an efﬁcient parallel implementation is necessary to justify the algorithmic gains
Obtained with the PACEFMM algorithm to reduce the overall solution time. We
assume that the input to the parallel EM solver is a simple mesh ﬁle with a list Of
:0, y, z position of each node and the element-to-nOde connectivity table. Each of the
P processors reads an equal share of the Na nodes and Ne elements from the input
mesh ﬁle. Using the local element-tenode connectivity, a list of edges is created in
each Of the P processors. Each edge is represented by the two global node numbers
that make the edge and this two element integer array is sorted in parallel. Thus
every processor has approximately equal number of edges. The global node numbers
allows one to gather the {:0, y, 2} data of the nodes as they are sequentially distributed
across the processors. This allows us to compute the centers of each edge which is
then used to construct the oct-tree. Based on the distribution Of leaf boxes the edge
data are exchanged among the processors and the necessary {33, y, 2} data of related

nodes are gathered from their global node number.

6.4 Results

In this Section, we present plethora of results that exhibit scalability Of the parallel
algorithm presented here. All the results were obtained on a IBM Blue Gene / L cluster

with 1024 processors and 512 MB RAM per processor. The message passing interface

138

 

(MPI) was used for communication between the processors. In our implementation of
the classical FMM, the spherical harmonic ﬁlters [23] are used for interpolation and
anterpolation during upward and downward tree traversal, respectively. With 512
MB RAM per processor, precomputation of these ﬁlter coefﬁcients places memory
constraints and restricts the maximum number Of FMM levels to 10 or the overall
domain size to 128 A. First set of results correspond to evaluation of kernel only
which helps to study the various aspects of the parallel algorithm. This is followed
by the use Of these algorithm for solving electromagnetic scattering problems. In all
cases, the timings are reported in seconds and the parallel eﬂiciency of the algorithm
P6” is computed using

(6.17)

where Tm and Nm respectively denote the average time taken for evaluation of
potential a processor and number Of processors in the m-th processor set, m E

{32, 64, 128, 256,512, 1024} and ref is the smallest size processor set for a given N.

6.4.1 Kernel Evaluation

The PACE and PFMM algorithms were separately employed to evaluate the scalar
potential (6.7) at N random, uniformly distributed source / Observer pairs within a
volume and surface. For volume distribution, we ﬁll a cube of sidelength a and in
case of surface distribution the points were placed on a sphere of radius a.

In evaluating the PACE algorithm the overall size Of the domain was ﬁxed at
/\ and the leaf box size was the chosen such that the average number of sources /
Observers per leaf box is approximately 60. The order of ACE harmonic p = 3 was
chosen so as to evaluate the potential to an accuracy of 0(10-3) [198]. The number
of points N was varied from 1 million to 80 million and in each case the number Of

processors was varied from 32 to 512. Table 6.1 shows, as a representative sample,

139

the time spent at different stages Of the tree computation for the case of N = 40
million on diﬂerent processor sets. It is evident that the time taken for parallel part
of multipoletO-multipole (M2M) and local-to—local (L2L) evaluation are negligible.
This is in accordance with the theoretical reasoning presented in Section 6.3.4, where
Lemma 6.3.2 shows that the number of communications at this stage is bounded
and small. Notice that the timing data for rest of the process is proportional to the
number Of processors. This is a direct consequence of the self-similar tree partition
algorithm that ensures load balanced tree computation. The parallel efficiency of
PACE algorithm for different cases is shown in the ﬁgure 6.4. In all cases ref was
chosen to be 32 except for N = 80 million where ref =64 was used. The presented
algorithm exhibits efficiency as high as 98% on 512 processors. Figure 6.6 shows
the time spent by individual processors of a 128 processor set for N =40 million at
different steps of tree computation. This exhibits the excellent load balance Of the
prescribed algorithm as all the processors spend almost the same amount at every
step Of the tree computation. Next, in ﬁgure 6.5, we plot Tm as function of N, for
each processor set, to measure the cost complexity of our parallel implementation.
The slope of the linear line ﬁts are close to unity which indicates the 0(N) scaling of
our parallel implementation. Figure 6.7 plots the efﬁciency of our PACE algorithm
for the case of spherical distribution. Here again the eﬂiciency is as high as 96%
on 512 processors. This indicates the scalability of our parallel algorithm on large

number Of processors.

In evaluating the PACEFMM algorithm, the sidelength of the leaf box was ﬁxed
at A / 4 and overall size Of the domain was chosen such that the average number of
points per leaf box was approximately 60. This choice implies that the number of
ACE levels is zero. Table 6.2 shows, as a representative sample, the average time
taken by one processor at different stages of hierarchical tree computation for N =

10 million on different processor sets. As in the case Of PACE algorithm, the parallel

140

part of upward and downward tree traversal time are negligible and this is attributed
to the bounded number Of communications. Figure 6.8 shows the time taken by
the individual processors in a 128 processor set during translation with and with-
out adaptive direction partition strategy. Without the adaptive direction partition,
the native processors that host the duplicate nodes spend more time in communica-
tion and computation than others. The load balance among the processors improves
with adaptive direction partition strategy and helps to improve the scalability of the
PFMM algorithm. The eﬁiciency in case Of volume distribution of points is shown in
ﬁgure 6.10. The PFMM algorithm Offers efﬁciency as high as 96% on 512 processors.
Figure 6.9 shows the time taken by individual processors of a 128 processor set at
different stages of tree computation. The negligible variations in time taken by differ-
ent processors indicate the excellent load balance of the algorithm; which stems from
the self-similar partitioning of tree nodes. In ﬁgure 6.11, the average time taken for
potential evaluation Tm is plotted as a function of N for different processor sets m.
The slope of the linear line ﬁts are close to unity which indicates the linear complexity
Of the proposed PFMM algorithm. The efﬁciency of the PFMM algorithm for surface
distribution of points is in shown in ﬁgure 6.12. The algorithm Offers high efﬁciency

on 512 processors even for relatively small number of points N = 10 million.

6.4.2 EM Simulations

The CFIE formulation is used to solve for electromagnetic scattering from closed PEC
objects. The numerical system is solved by a parallel GMRES solver with diagonal
preconditioner.

First, the parallel solver is validated by computing the plane wave scattering from
a PEC sphere and comparing the RCS with analytical results from Mie series. Figure
6.13 shows the comparison of RCS from a PEC sphere Of radius 64A computed using

the parallel solver and Mie series. As is evident there is a excellent agreement between

141

Table 6.1: Average time spent by an individual processor at different stages of hier-

archical tree computation for N =40 million.

 

 

 

 

 

 

Proc. Local- Parallel- Trans- Parallel- Local- Direct
Multipole Multipole lation Local-exp Local-exp
32 1.97 0.00 241.35 0.01 43.50 1307.55
64 0.99 0.00 121.31 0.00 21.75 660.91
128 0.49 0.01 60.76 0.01 10.88 332.82
256 0.25 0.01 30.72 0.01 5.43 167.19
512 0.12 0.01 15.62 0.01 2.72 83.75

 

 

 

 

 

 

 

 

 

the two solutions and validates our parallel implementation. Figure 6.14 shows the
comparison of RCS of a PEC sphere of radius 128A, discretized with 14 million un-
knowns and both the solutions exhibit excellent agreement. To compute the efﬁciency
of the parallel solver, scattering from PEC spheres of different radius (and different
number of unknowns) was considered on different processors sets. Consider spheres
Of radius {16, 32,64} with {500,1500, 3240} thousand unknowns. These simulations
were performed on 64, 128, 256 and 512 processors. When computing the efficiency
using (6.17 ), the time denotes the solution time averaged across the processors and
64 processor set as reference, ref =64. As shown in ﬁgure 6.15, the parallel solver
exhibits eﬂiciency as high as 90% on 512 processors with 3.24 million unknowns. The
parallel solver was applied to two realistic geometries. The ﬁrst geometry is a PEC
toy-aircraft with ﬁne edges that is densely discretized with 1.75 million unknowns.
At 3GHz the principal dimension of the geometry was 64): and the ﬁgure 6.16a shows
the induced surface currents and the computed RCS. The total simulation time was
less than 4 hours when executed on a 256 processor cluster. The second geometry is
3 PEG sharp arrow discretized with 3.24 million unknowns. The principal dimension
was 64): long at 3 GHz and the induced surface currents and RCS are shown in ﬁgure

6.16b. The simulation time was less than 2 hours on a 256 processor cluster.

142

Table 6.2: Average time spent by an individual processor at different stages of hi-
erarchical tree computation for N =20 million points uniformly distribution within a
cube of sidelength 20A.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Proc. Local- Parallel- Trans- Parallel- Local- Direct
Multipole Multipole lation Local-exp Local-exp
32 249.98 0.04 39.01 0.02 130.99 2488.38
64 125.31 0.07 16.94 0.03 66.17 1250.01
128 62.94 0.13 9.85 0.06 33.71 626.89
256 31.77 0.28 5.09 0.11 17.52 314.1
512 16.17 0.59 3.23 0.24 9.39 157.22
5E iii
310 51:1 ”f” *1) ill
9' M""'f§l3§iii<iﬁfi;3éim
0 o ACE o (0

Figure 6.1: An example compressed tree used in ACE+FMM hybrid approach.

Level m+1

LGVOI m

 

 

 

 

 

 

Figure 6.2: The Z—space ﬁlling curves or Morton ordering formed by the sorting the
nodes Of the tree at a particular level.

143

 

 

P1 P3 [ 9, 10. 21. 11. 12. 22. 27. 13[23. 28. 30. A]

P2 | 4. 18. 25.211.13.13. 19. 7. 3, 20. 26. 29,]A || P4] 14. 23. 15. 1s. 24, 20. 30, A]

 

 

 

Figure 6.3: Illustration of the tree partitioning scheme proposed in this work. The
subsequent distribution of nodes and duplicate nodes in each processor is also shown.

 

 

 

 

 

 

1.02~~-~- ~
1 .

E 0.90".
g in:
E 096 *5"
'_6 #1014
i 0.94 —~ - *20"
I! #4011

0.02- - ~ ~ +0014

20 220 420 020 020 1020

No of Processors

Figure 6.4: Efﬁciency of the parallel-ACE algorithm for computation of Helmholtz
potential between N uniformly distributed random point within a cubical volume.

144

 

/ ﬁx) = 1.03! - 4.34

f(x) = 1.03! - 4.65

f(x) = 1.01! - 4.“

f. f(x) = 1.01x - 5.14
. f(x) = 1X - 5.34

 

   

5.0 0.0 0.0 7.1 11
Log N

Figure 6.5: Computational complexity Of the parallel-ACE algorithm for the case of
uniformly distributed random points in a cubical volume. The slope of linear line
ﬁts, shown by dotted lines, are close to unity and indicates the linear complexity of
parallel-ACE algorithm.

 

 

 

 

 

 

 

V .. "Directi

2.5 b‘ ‘- r - — :—-__I

2 -M2u

ﬂ .0- * a. * up C‘ ‘I' A a u- d‘ an d- as. q

A f

o 1.5

--E. Il-L2Li

: 1 “a! 9,. I 1.!!!-..I..e.-.'.!au'. 1! -.519' e.g.-11.....Intue— in.-..'.!.-..§!.'AH 1 his.
I.
3

0.5

o? g

- ' 'MZM:

gnu-Q'C-n-l‘hg-----§’..-§\ _Oh‘-'1.-§-I.

-0.5 ’3 ‘

1' 10 10H20msr'40‘55‘ee’73‘02 01 100100110127
Processor Index

Figure 6.6: Time spent by individual processors Of a 128 processor set at different
steps of tree computation for N =40 million using ACE expansions.

145

1.01

 

 

 

 

 

 

5 0.09 1"
e ; 3*5M
E 0.90 §"'"10M
=° ;*20M
0. I f

3 ‘80"

0.961" .. - 5
20 120 220 320 420 520

No. of Processors

Figure 6.7: Efﬁciency of the parallel-ACE algorithm for computation of Helmholtz
potential between N uniformly distributed random points placed on the surface of a
sphere.

 

 

 

 

 

 

 

'I‘Wl Dir. Part.
22 3' " ””+’wro Dir. Part
O
1: i
C .
o
0 l
o a
U I
5
e E
g .
h
12 ‘

 

 

1° 3:1...“ T‘.‘T"“‘""T‘l“' r f‘T“”‘ “:Wrr‘rrr.‘ . . . . . 1 v 1 . . . . . -v . .1
~e°eeeeesseeesses
Processor Index
Figure 6.8: Time spent on the Helmholtz FMM translation Operation by individ-

ual processors of a 128 processor set with and without adaptive direction partition
strategy.

146

 

3.2

Hap ~41 Iiﬁpudsllulﬁgnu 1.},1 61"." sv‘Jnu-ﬁ‘ﬁ'SIsg‘Q‘

 

 

" 'Direct

2.7 1.....- .. - , ,...... .. ...._... M--- .. .. .. .. . .._............-__.._.,._- --_ _ _... -

' " M2"

1bDulu-.O-aQ-g..~hstuuntstss-ﬂgh'I-‘mhaglg-uu-IQ]
A

al--ﬁdﬁdﬂuﬂvvv~w1

-— 2.2 — L2L
I'-
v
a
O
_l

131$ 3' a! g l ’

"Trens

 

 

 

1 .2 "1217? ll 11 ’1"?711117177111 ST’T‘; lflf 3:1". FT" . i :rllll'lll Till 1'll'i—ETTIWTYTYTTIIITTETffll'lll'l‘nyIml-TFIITUI l'l'l'l'llfl'l'TYIl'I"! 117 II‘

1 11 21 31 41 51 61 71 81 91101111121
Processor Index

Figure 6.9: Time spent by individual processors Of a 128 processor set at different
steps Of the tree computation for N =40 million using the Helmholtz FMM expansions.

 

1.02

 

 

o 9 8 “0“
I

 

0.94 «Mi-m

0.92 +————~- .-_._..-__.--__

 

 

Parallel Etﬂclency

 

 

 

 

 

 

0.9 . , __.. , ._
0 100 200 300 400 500 600

No. of Processors

Figure 6.10: Efficiency Of the parallel-FMM algorithm for evaluation of Helmholtz
potential between N uniformly distributed random points within a cubical volume.

147

 

 

 

 

 

 

 

 

3.5
3 . 1'32
'- "' 64
g 2 5 '7‘ 1 26
"' * 256
2-.-- #512
1 .5 W
1 .1.--- — WWW-WW W-.-...__._____-,_-___---. ———— -— . . __-. 2:
6.6 6.6 7 7.2 7.4 7.6 7.8 6

Log N

Figure 6.11: Computational complexity Of parallel-FMM algorithm as a function of
number Of unknown N for different processor sets P. The slope Of linear line ﬁts are
shown by dotted lines.

 

 

 

 

 

 

1 .1
0.9 -
.2 0.0 -
i
9:3 0.7 J * ":1 M
& *N=5u
0.0- *N=10|I
0.5 I T I l l
0 100 200 300 400 500

No. of Processors

Figure 6.12: Efﬁciency Of the parallel-FMM algorithm for evaluation Of Helmholtz
potential between uniformly distributed random points on the surface of a sphere.

148

 

 

 

 

 

 

 

 

 

 

 

l l l I I l I
m ' —1
Mle
2° ' —-— Numerical ‘
10 ..
9.3
w o ..
U)
U
1:
'10 - 1’ .1
-20 - -
.30 .. q
l l I l l l l
0 10 20 ll) 50 60

30
Then. degree
Figure 6.13: Comparison Of RCS due to plane wave scattering from 3 64A PEC sphere

computed using the parallel EM solver and Mie series solution. Only a portion Of the
RCS is shown for clarity.

149

 

 

40 ’ _Numericel
_ _ _Anaiyucai

 

 

 

30'

 

20

RCS (in as)

 

J l l l I l

 

 

 

0 5 10 15 20 25
Theta (in degrees)

30

Figure 6.14: Comparison Of RCS due to plane wave scattering from a 128): PEC
sphere, discretized with 14 million unknowns computed using the parallel EM solver

and Mie series solution.

 

1.05 . ,, .,.-.-..-.-.....-__---_____.-. ---...._ ..-...-.--..-.._.--,
l .

‘ 5

1

'1

 

 

3

 

 

 

 

Parallel Efﬁciency
0
e

 

 

l .'
+n ‘-Iv--Is-~‘."'(‘..‘ ~1- --- - ~~ - -'_‘x‘.\A-mpm .. l-_-‘-.v ~- - - .‘r-rm-uhn--‘LI --~,---- - -... n-‘wraoq-n.—~...,-.‘_-_-. al.-1m...- .. , 1.: --..I

N0. of Processors

 

Figure 6.15: Efﬁciency Of the parallel EM solver for the scattering from PEC sphere
with different number Of unknowns N and as number Of processors was varied from

64 to 512.

150

 

(b)

Figure 6.16: Induced surface currents on multiscale geometries (a) toy-aircraft with
sharp edges and (b) tetrahedron shaped arrow.

151

Chapter 7

Summary and Future Work

7. 1 Summary

In this thesis work, several different algorithms have been developed to overcome
the high computational cost Of integral equation methods used in computational
electromagnetics. A common feature among these proposed algorithms is the use Of
the recently developed accelerated Cartesian expansion (ACE) algorithm. ACE was
initially introduced for fast evaluation of polynomial potentials with real exponents.
This work has extended it for the fast evaluation Of Helmholtz and retarded potentials
that is commonly used in frequency and time domain electromagnetic simulations,
respectively. It has been rigorously shown that these ACE-based algorithms are
stable and efficient when applied to sub-wavelength or low-frequency problems in
electromagnetics. These are the class of problems where the principal dimension Of
a computational domain is much smaller the dominant frequency or analogously the

dominant frequency is very small.

The stability of the new algorithms at sub-wavelength regime is a features com-
plementary to the existing acceleration schemes, like Fast multipole method (FMM)

and plane wave time domain (PWTD), which have only limited success at these small

152

 

 

scales. Following this Observation, hybrid algorithms were developed to combine pos-
itive aspects Of the proposed and existing algorithms while their negative aspects
mutually cancel. The hybrid algorithm in frequency domain was further integrated
with an electromagnetic solver, resulting in a multiscale electromagnetic solver. This
solver was used to simulate several multiscale problem with signiﬁcant speed-up, as
high as 14 times, over the existing solvers. The capability of the hybrid algorithm
to yield arbitrary accuracy and its cost scaling were evaluated numerically. It was
Observed that the cost scaling Of the hybrid algorithm remained purely a function of

the number of unknowns, irrespective Of the nature of the geometric distribution.

In the context of efﬁciently solving the multiscale problems, a new integral equa-
tion formulation was developed to yield well-conditioned systems Of equation. Since a
well-conditioned system of equation requires less number of iteration, when using iter-
ative solver, the new formulation helps to reduce the overall solution time. This work
achieves this by reformulating the existing augmented electric ﬁeld integral equation
(AEFIE) employed both the electric charge and current as independent unknowns.
The imposition Of additional constraints, like continuity and total charge conserva—
tion, have also be addressed here. The formulation was shown to be stable across
frequency range for a variety Of 2D and 3D problems. Though the condition numbers
were in the orders of few hundreds in the 2D case, the condition number was in the
order os 106 for 3D problems. This was reduced considerably with the use of orthogo-
nal basis functions. However, it must be noted that the existing numerical procedure
to construct orthogonal basis fimction are computationally intensive in terms of both
memory and time.

Finally, novel strategies for parallel implementation of the hierarchical algorithms
were also considered. In developing the parallel implementation Of these algorithm,
the emphasis was laid on improving the scalability such that they can be efﬁciently

executed on large scale clusters with thousands of processors. A major contribution of

153

 

this work is the development of a strategy to distribute the FMM or ACE tree nodes
to reﬂect self-similarity and the introduction of adaptive direction partitioning. These
developments enable the derivation of theoretical bounds on communication overhead,
load balanced evaluation of different steps of the tree computation and results in
a parallel algorithm that is provably scalable. Plethora Of results are presented to
demonstrate the different aspects of the parallel algorithm; particularly the scalability

Of the algorithm on thousands of processors.

7 .2 Future Work

The following topics are the suggested future direction of research along the lines of

this thesis work:

7.2.1 Well conditioned formulation for EM solver

e As noted in the summary, the well conditioned AEFIE formulation developed
here demands orthogonal basis function to Obtain reasonably low condition num-
bers for 3D problems. Existing numerical procedures to construct orthogonal
basis functions are computationally expensive. Development Of methods or al-
gorithms to efﬁciently construct these orthogonal basis function, either based
on tOpOlogy or numerical decomposition, would be Of signiﬁcant merit and ﬁnd

use in several other research areas.

0 The AEFIE formulation developed in this work is speciﬁc to closed domains.
In many practical applications, particularly when analyzing antennas, Open
structures are common. Hence, these formulations should be extended to Open

domain problems also.

0 The AEFIE formulation is developed for PEC scattering only. Development

154

of similar approaches for scattering from dielectric Objects would be of great

beneﬁt and extends the reach of these algorithms.

0 Recent research work have successfully exploited Calderon operators to Obtain
well conditioned formulation for both open and closed problems. However, these
analytic preconditioners have been developed for PEC scatterers only and can

be extended to dielectric scattering also.

7 .2.2 Fast algorithms

0 The algorithms proposed in this work has extended the application of multi-
pole methods from moderate to high frequency EM problems to very low to
high frequency EM problems. This limit can be extended to include the en-
tire frequency range, very low to very high frequency problem, by integrating
FMM with ray/Optics based based algorithms. This develOpment would enable
simulation of ultra Wideband radiating structures in the presence of very large

scatterers.

e In this work, the largest problem size solved with the parallel FMM was 128
A. This limitation is due to the limited memory on the distributed computers.
Several strategies can been developed for optimal usage of memory. Particu-
larly, it was Observed that, when considering large number of FMM levels, the
memory required by interpolation and anterpolation matrix is large. Numer—
ical compression of these matrices is a viable workaround to reduce both the

memory and time spent in traversing up and down the tree structure.

e The multiscale EM solver develOped in this work considers only ﬁrst order basis
functions to represent the unknown ﬁelds. These algorithms can be suitably
modiﬁed to consider higher order geometries and basis function. This would

increase the range Of multiscale geometries that can be analyzed.

155

‘r"-‘_--._—-;l

7.2.3 Numerical solution procedures

All the solvers used in this work utilize a simple diagonal preconditioners. More
sophisticated numerical preconditioners, amenable for parallel implementation with
FMM, have been developed in the past decade. Implementation and development Of
such preconditioners would further extend the range of the problems solved by the

fast algorithms developed here.

156

Appendix A

A combined accelerated Cartesian
expansion (ACE) and fast Fourier
transform (FFT) acceleration
scheme for rapid evaluation of

diffusion potentials

This chapter explores the use Of accelerated Cartesian expansion (ACE) algorithm for
fast evaluation time domain diﬂusion potentials. Diffusion potentials are employed to
model several physical phenomena such as heat conduction, crystal growth etc. This
chapter is organized as follows: Section A.1 provides a comprehensive introduction to
the use of diffusion equation to model various phyiscal system and the existing fast
schemes. Section A.2 presents the formal deﬁnition the problem considered here. Sec-
tion A.3 introduces in detail the temporal and spatial acceleration schemes required
for rapid computation of diffusion potentials. Section A.4 presents plethora of results

to demonstrate the accuracy and efﬁciency Of the acceleration schemes presented here.

157

A.1 Introduction

The time domain diffusion equation is used to model a number Of different physical
phenomena, including eddy currents[217], heat conduction [218], crystal growth [219]
and pharmacokinetics [220]. As a result, methods to efficiently solve this equation
have widespread impact. Likewise, the solution to the lossy wave equation ﬁnds use in
ﬁelds ranging from wave propagation physics [221] to relativistic diffusive phenomena
[218]. The solution to these equations are typically obtained using either ﬁnite dif-
ference (FD) or ﬁnite element (FE) methods. The popularity of these methods may
be primarily attributed to ease Of implementation and, perhaps, a plethora of readily
available codes. However, differential equation based methods, in general, demand
the discretization of the whole domain and employ artiﬁcial or approximate bound-
ary conditions to truncate the domain. Additionally, they are susceptible to grid
dispersion. These features Of FE translate to higher computational cost. Integral
equation based methods offer a modality to overcome these computational bottle
necks. However, as is well known, integral equation based solvers are expensive, and
their computational complexity scales quadratically with the spatial and temporal
degrees Of freedom,i.e., the cost scales as 0(N3Nt2), where N, and N; represent the

spatial and temporal degrees Of freedom, respectively.

Methods to ameliorate the cost of using integral equation based solvers have been
a topic Of considerable intellectual interest. These range from fast multipole like
methods [222], Fourier-based methods [223], and methods that exploit spatial rank-
deﬁciency [224]. These methods have been used extensively for solving integral equa-
tions that arise from Poisson [225], Helmholtz [27], and wave equations [170]. In what
follows, we shall focus on fast methods that rely on fast multipole type methods. This
class of methods was ﬁrst developed to ameliorate classical N-body problems, e.g.,

evaluating Coulombic potentials in large systems [222] and has been widely used in a

158

 

number Of diﬂerent applications ranging from molecular dynamics to astrophysics to
electrical engineering to ﬂuid mechanics [34]. In this paper, the speed-up is achieved
using a series of cascaded Taylor’s series expansion in a hierarchical manner [166].
The methods used to develOp the classical FMM can be immediately extended to
computing Gauss transforms [226], that play an important role in several areas [227].
Consequently, fast Gauss transforms have seen several improvements [228]. Direct
application of this method to computing diffusion potential ameliorates the quadratic
spatial cost, i.e., the cost is reduced to 0(N3Nt2). Acceleration in computing the tem-
poral convolution was ﬁrst addressed by considering different time scales [227], and
successfully used to solve the heat equation [229]. A recent work [230] employs basis
function expansions tO develop a acceleration scheme whose cost scales as 0(N3Nt).
All these methods accelerate the solution of the diffusion equation completely in real
space. Alternatively the diffusion equation can be solved in Fourier space [231, 232]
where the cost scales as 0(NtN3 log N3).

In this chapter, an acceleration scheme is proposed for the fast evaluation Of time
domain potentials that arise from the solution Of diffusion equations. The proposed
method scales as 0(N3Nt log Nt). It is to be noted that the focus Of this work is the
development of acceleration kernels and not solving these equations for application to
speciﬁc problems. The proposed methodology is based upon integrating the recently
developed Accelerated Cartesian Expansion (ACE) [166] algorithm with either Fast
Fourier Transform (FFT) schemes. The main contributions of this work can be listed

as

e Developments of modiﬁed FGT based on ACE. This has exact translation op-

erator for traversing up and down the tree

0 Integration of ACE with FFT based temporal acceleration schemes. The overall
cost Of the scheme scales as 0(N3Nt log Nt).

159

A.2 Mathematical Preliminaries

Consider a domain, Q C R3 x R, that contains a distribution Of sources, f (r, t), that
is bandlimited to fmax and time limited to T. The ﬁeld due to this distribution of

sources satisﬁes

(V2 — a23t — 21383) <I>(r, t) = f(r,t) for r x t e 0

(A.1)
<I>(r,0) = u0(r) for r x 0 E Q

where <I>(r, t) is the dissipative wave potential at r and time t, 9 is a domain of ﬁnite
volume, u0(r) is the initial condition, a and c 6 IR are problem dependent constants,
and 6t denotes a temporal derivative. In the limit c —> 00, we recover the diffusion
equation. This corresponds to the limit of an inﬁnite velocity of propagation for
solutions, i.e. changes in the spatial proﬁle Of a solution can inﬂuence the behavior

of the solution at all points in space after an inﬁnitesimal period of time.

<I>(r, t) = <I>1(r, t) + <I>2(r, t) (A.2a)

510-, t) = [a G(r — r’, t)u0(r’)dr’ (A.2b)
t

<I>2(r, t) = f(r’, t')G(r - r',t — t')dr’dt' (A.2c)
( l

where G (r, t) is the Green’s function for the diffusion equations given by,

 

_ a -02||r||2
G(r, t) — (47rt)3/2 exp (T) O(t) (A.3)

where 9(t) is the Heaviside distribution. The literature on the application of these
Green’s function to integral equation based solvers is extensive [233]. However, it

is apparent that the computational cost is the principal bottleneck to the adOp-

160

tion of these methods, despite their several advantages [230, 232]. Indeed, it can be
shown from a straightforward discretization of these integrals that the cost scales as
0(N3Nt + N 3N3). In what follows, we will prescribe methods to tackle the latter
integral. As will be evident, evaluating the former will be a trivial use of the methods

presented here.

Assume that the temporal sample size is uniform, and is denoted by At. Without
loss of generality, the discrete version Of (A.2c) can be written as
lt/AtJ Ns
52(r, t) = Z Z G(r — r;,t — zAt)f,(r;, 1A.) (A.4)
l=0 i=1
where [o] denotes the ﬂoor Operation, r; and f,- are the position and strength asso-
ciated with the it” spatial source respectively. Testing the discretized form at N.
spatial points at time t = tk results in the following matrix equation,

k—l

Pk = Z Zk-zSz (A-5)
l=0

where the vectors are given as

Pk = {¢2(r1:kAt):¢2(r2:kAt)a°'°:‘D2(rNsakAt)}

Sl = {f(r11 LAt)1 f(r21LAt)7 ' ° ' a f(rNsalAt)}

and Z1 is N. x N3 matrix whose elements are Zl(i, j) = G(r,- — rj,lAt). In what
follows, it is assumed that S) is known only at t = IA); and not before. Other IE
formulations [230, 232, 229] can be reduced to the matrix equation of the form (B.20).
As done in [234], this equation can be cast as a space time matrix equation and it '
is readily apparent that cost for evaluating this system scales as 0(N3Nt2). In the

next section fast methods are developed to reduce this cost to either 0(N3Nt log M)

161

0r 0(N3Nt).

A.3 Acceleration Schemes

In this section we introduce the spatial and temporal acceleration schemes that form
the crux of this paper. First we employ the ACE algorithm to rapidly evaluate po-
tentials at a particular Observation time due to spatially distributed sources excited
at a single time step. This corresponds to the inner summation in (A.4) and the
evaluation of one matrix vector product in (B.20). Given the ACE expansions for
spatial acceleration, an FFT based scheme is proposed for rapid evaluation of sum-
mation over temporal basis functions corresponding to the outer summation in (A.4).
The temporal acceleration schemes presented here primarily exploit the fact that the
harmonic expansion Of ACE preserves the temporal convolution in (A.4). Then the
Toeplitz structure among ACE expansions is identiﬁed, for which the standard FFT
scheme is employed to reduce the complexity of temporal convolution from 0 (N132 ) to
0(Nt log Nt). Further, the proposed schemes are cast in a block Toeplitz fashion to
conform with the marching on in time (MOT) framework of existing integral equation

solvers.

A.3.1 ACE for Spatial Acceleration

As mentioned in the previous section, each summation term in (B20) corresponds to
a ﬁxed source and observation time which is denoted in the subscript. The individual

matrix vector product Zk_lS( corresponds to the evaluation of inner sum in (A.4).

162

This evaluation involves only spatial contributions and can be written as,

V1: 1 = Zk—lsl (A-G)

N.
vgl, = me...) = Z G(rm — rn, (k — l)At)fn(rn, 1A,) (A.7)
n=1

where Vkl is an intermediate vector of dimension N 3 introduced for the convenience
of the following discussion. It is evident that cost of computing the entire vector V“
scales as 0(N3). In this work, the accelerated Cartesian expansion (ACE) algorithm
is adopted to accelerate computation Of vector V“. Rapid evaluation of ij through
the ACE algorithm requires the deﬁnition of V"G (R, t) in evaluating the multipole
to local expansion using Theorem 2.3.5. It is evaluated using a recursive expression

as

mm) = 651652653211...immense») (As)
afoul-1:) 24—L:Ia{°-10(R,,t) — 2(k—1)8[°‘2G(Rl,t) le{a:,y,z}

3
where p,- e {0, 1, ..p} and 2 p,- = p.

i=1

The ACE procedure for Gaussian kernels bears similarity to Fast Gauss Trans-
form (FGT) [226].The use Of generalized Taylor expansion and Cartesian tensors for
Gaussian kernels is essentially a reformulation of FGT, where Hermite polynomials
and expansions are employed. The cost and storage savings achieved with the use
of totally symmetric tensors is identical to that of the graded lexicographic order
representation in improved FGT[228]. An additional advantage with ACE algorithm,
as detailed in Chapter 3, is the use of exact translation operators for multipoleto—

multipole and local-tO-local expansions.

163

A.3.2 FFT based Temporal Acceleration

The above description details the use Of ACE for the rapid evaluation of a potential
at an arbitrary Observation time due to spatially distributed sources all excited at a
particular time instant. This corresponds to the evaluation Of one Of the sum terms in
(B20), hence only a part of total potential at time step (kAt) is computed. As men-
tioned in the introduction, diffusive, lossy wave and Klein-Gordon potentials exhibit
an inﬁnite temporal tail (long history) and the computation of the potential at the
kth time step would involve k such partial potential evaluations. Consequently, the
cost of a scheme with only spatial acceleration scales as 0(N3Nt2). This complexity
is undesirable when the number Of time steps in the simulation is large, as would be
expected for any time domain simulation Of merit. A FFT based temporal accelera-
tion scheme is deverlOped here to ameliorate this cost. This scheme are formulated in
a manner such that causality is not violated i.e. evaluation of a potential at time step
(kAt) assumes the knowledge of sources at time steps (mAt), m < It only. Thus the
proposed acceleration schemes are in conformance with the existing solver framework

and can be readily integrated.

Consider the convolution in (B.20), this can be written as one matrix vector
product as illustrated in Fig. A.1, which illustrates the evaluation Of the vector Pk at
all time instants k = 1, . . . , Nt. Note that each term in the ﬁgure are themselves either
a matrix or vector quantity and depend on the spatial discretization. It is evident,
from Fig. A.1, that the matrices Z;_j form a Toeplitz matrix and beckons the use of
fast Fourier transform to perform the matrix vector product in 0(N82Nt log Nt) cost.
However causality allows one the knowledge of past sources only. In other words,
one cannot assume the knowledge of sources 8),, Sk+la . . . , S Nt when evaluating the
potential at time step It. To overcome this, the matrix is divided into sub-matrices

Tk as indicated in Fig. A.1 . The evaluation of the potential Pk involves different

164

submatrices Tl that are multiplied by past sources only at different time instances
[234, 235]. In this scheme, computation at the kth time step involves 10 past time
source vectors { S k— N: . . . , Sk_1} multiplied with the appropriate Toeplitz sub-matrix
T N to produce potential Pk at the kth time step and partially evaluated potentials
{Pk+1,.. ~1Pk+N—1} at future time steps, where N depends on It. For example,
evaluation of P2 involves the multiplication of vector {80,31} with sub-matrix T2,
however this also results in partial computation Of P3. In the next time step, P3
is completely evaluated with the computation of the matrix vector product T1 S2.
Thus the sub-division scheme shown in ﬁgure A.1 also provides a means to compute
potential within the MOT framework. Further each Of the submatrices is Toeplitz
and hence FFT can be employed to accelerate the computation. It is important to
note that the this acceleration methodology does not involve any approximations and
is exact. Consequently, this scheme is used to accelerate both near and far time

interactions.

Next, consider the above acceleration scheme within the context of ACE algorithm
introduced in previous section. The following two observations that forms the basis
Of integrating the Block-Toeplitz based temporal and ACE based spatial acceleration

acceleration schemes,

1. Except for the multipoletO-local translation Operators all other operations in

ACE depend only on either the source (1) or Observer (k) time.

2. The multipoletO-local translation Operation in Theorem 2.3.5 preserves the

convolution in time.

Consider the evaluation of potential at Observation points in domain (20 at time kAt

due to sources located in domain {23 and excited at (At. Using theorem 2.3.5 and

165

 

(2.8) Of ACE algorithm in (A.4) we get

<I>2(r,kAt) = 202...)“: n Ll") (A9)
n=0
H H )
Lk" =. 2ng (A.10)
(=0
°° m n 1 m
= Z (21w )_(m_n).mH§c_]) (A.11)
m=n (:0

where, Hg’g = VmG(rgs,(k—I)At)

From above, we infer that the evaluation Of the local expansions at the kth time step
is equivalent to the evaluation of the potential at the kth time step. Considering the
above equivalence we rewrite the discrete form (B.20) as

k— 1 00
P), = 2:09,, -.n)("> .[Z (M[m ") .—(m n). —H§,"‘)) (A.12)

 

l=0 n=0 m=n .
°° H ( —) H< ) l

= 2(pm.)("l .11.. Z: Z (Mlm n .(m— n).— nngml) (A.13)
n=0 "1:" 1:0 _

In above equation, the time dependence of translation Operator and multipole ex—
pansion corresponding to a particular tensor component Of ACE harmonic can be
written in the block Toeplitz matrix form as illustrated in Fig A.1. This immediately
suggests the evaluation of the temporal convolution in (A.13) using FFT. In addition,
as described above, the block-Toeplitz structure is utilized to conform with the MOT
framework of solvers. TO complete the discussion we deﬁne the Fourier transform Of

a nth rank Cartesian tensor M(") as follows

M(")(n1,n2,n3;w) = f{Ml(n)(n1,n2,n3)} ,where n1 + 112 + n3 = n (A14)

166

 

where .7: denotes the forward Fourier transform Operator (t —+ w) and M("’) denotes
the nth rank Cartesian tensor in Fourier space. In essence the Fourier transform of a
tensor is evaluated in a term-by-term (n1, n2, n3) fashion. Given this deﬁnition the

local expansion at kth time step Lk due to all past sources is evaluated as,

Lin)(n1,n2,n3) = Z: $.74 {211070010 . (m — n) . M(m—n) (01)} (A15)

where n1 + n2 + 113 = n, .7-"‘1 is the inverse Fourier transform Operator (or —-> t).
Since the Fourier transform is applied only in the time domain (t H w) the tensor

contraction deﬁnition is valid in the Fourier domain as well.

Computation of the time domain diffusion potential using the multi-level tree
representation Of the spatial domain requires attention to the following details. The
multipole expansions for all boxes should be evaluated as we march on in time and
stored at all time discretizations, this implies Nt upward tree traversals. The mul-
tipole expansions M(") are represented as a set of Nt x 1 column vector for each
tensor component as M[n)(n1, n2, n3). In a similar fashion the translation operator
for Observation time kAt is represented as a set of k x k matrix for each tensor com-
ponent. In actual implementation only the (2k - 1) unique entries Of this Toeplitz
matrix is stored for computation with FFT. For interacting boxes, the multipole to
local translation Operation involves the temporal convolution (A.11) and are evalu-
ated in a rapid manner using FFT as in (A.15). This evaluation is carried out for
each of the (n + 1)(n + 2) / 2 tensor components. As is evident, from ﬁgure A.1 the
size of the Toeplitz system N depends on the block size which in turn depends on
the observation time step It and can vary between 1 and N; / 2. The outcome of this
process are the N local expansions from which the potentials at N future time steps

are evaluated by downward tree traversal.

The cost of this scheme is computed in the following manner. The cost of one up-

167

ward and downward tree traversal scales as 0(N3). The cost of computing multipole
expansions at all time instants scales as 0(N3Nt). The cost multipoleto-local expan-
sion and downward tree traversal for N size Toeplitz system scales as 0(N3N log N)
and 0(N3N) respectively. In the entire scheme, Toeplitz systems of size Nt / 2 occurs

once, Nt / 4 occurs twice and so on. Thus, the total cost Of this scheme scales as

0(N3Nt+Ns (% (10g_1§£+1) +2% (log%+1) +...))

z o (NsNt log2 N.)

The error in the acceleration scheme is only due to approximations in the ACE algo-

rithm.

A.4 Results

In this section, results are presented to substantiate the above claims and demon-
strate the efﬁcacy Of the algorithm presented here. The goal here is to demonstrate
considerable speed-up with predetermined accuracy. Consequently, the results pre
sented will demonstrate convergence in error as well as linear CPU cost scaling. In
all numerical experiments, the source / Observer locations are randomly (uniform dis-
tribution) chosen. The time signature associated with the nth source is given by
(A. 16)

M0 = WWW/2.2 (A.16)

where an is the source strength, randomly chosen between [0,1], 0 = 6.366 x 10-8
s and tp = 60 s. In all simulations At was chosen as 1 second. These parameters
were chosen arbitrarily.Error is conditioned primarily by the number of ACE har-
monics used, P. While there is an error associated with the UV decomposition of the

translation Operator, it is conditioned to be at or below that of the error incurred in

168

truncating the expansion of the translation operator to P harmonics. The accuracy Of
the proposed algorithm is validated against direct computation for all cases where the

unknown count is numerically small. The relative error at nth Observer is evaluated

 

as
(I) n,t — (I) - n,t
Errorfar(n) = H fast,far( ) direct,far( )ll2 (A.17)
llq>direct,far(n1 L) l [2
where, I] - [[2 represents Lg-norm, (I) fast, for (t) and (I’dz‘rect,far(t) represent the time

history of the ﬁelds produced by the sources evaluated using the proposed algorithm
and a direct procedure, respectively. The ﬁnal error reported throughout this work
is the average error over all observers. Error is computed only with the far-ﬁeld
potential, i.e. direct data is computed only for the source/Observation pairs that
are in the far-ﬁeld Of each other, and is consequently representative Of an upper
bound or worst-case error. With the exception Of the temporal scaling experiments,
the computational time reported here is the total run-time in seconds using a 2.3
GHz Intel Pentium processor with 2GB RAM running Linux OS. High Performance
Computing Center at Michigan State University was utilized to extend into very long

time scales. All estimated or projected time values are marked by I.

The ﬁrst set Of results demonstrates the exact multipoletO-multipole and local-
tO-local translation Operators of the ACE algorithm, in the context Of evaluating
Gaussian kernels. An important implication of this feature of the ACE algorithm
is that the error does not increase as the height of the tree is increased. Consider
two domains :21 and $22 of size (0, 0.5) x (0,05) x (0, 0.5)m3 and (1,15) x (1,15) x
(1,1.5)m3 respectively. In each domain 4000 source/Observer points are randomly
distributed and we consider interaction between these two domains only, all other
interactions are neglected. As the number of levels in tree is increased, the change
in the error norm can be attributed solely to the error in multipoletO-multipole

and local-to-local translations. Table A.1 shows error computed for different P and

169

different levels in tree. For a given accuracy (ﬁxed P) it is evident that variation in

error Obtained from using different levels in the tree is accurate to double precision.

The next set of results pertain to the evaluation of the time domain diffusion po-
tential in (A.4). Table A.2 presents errors for different ACE harmonics P, numbers
of unknowns N3, and domain sizes- represented by d - sidelength Of cube enclosing all
sources/ observers. In all cases the number Of time discretizations was maintained at
a constant, Nt = 256 and the FFT based MOT scheme was utilized. As expected, an
increase in the number of ACE harmonics, P, leads to a uniform decrease in the error
for all cases. The speed-ups provided by the proposed algorithm, utilizing the FFT
scheme, are exhibited in Tables A.3 and A.4. Table A.3 presents run-times for evalu-
ating time domain diffusion potential for different size Of spatial discretizations while
the following parameters were kept as constants: total number Of time discretization
Nt = 256, size Of domain d = 0.5 and number of ACE harmonics P = 3 corresponding
to an error Of 0(1E — 5). It can be seen that the proposed algorithm is 230 times
faster than the direct method even for a small problem size N. = 4000 and Nt = 256.
Figure A.2 shows a log-scale graph of Tfast vs N, for values in Table A.3. The line
in the graph corresponds to a least-square linear ﬁt whose slope was found to be 1.1.

This validates the 0(Ns) cost scaling of the proposed algorithm.

In Table A.4 the run-time of the proposed algorithm is shown for different num-
bers Of time discretizations. Aside from Nt, all other parameters were kept constant
at: N. = 8, 000, d = 0.5m and P = 3. For the FFT experiments, the expected
0(NtlogNt) scaling cannot be veriﬁed because of the following implementation de
tails. In this workthe Open source library FFTW3 package [236] was used which does
not exhibit uniform NtlogNt scaling. This is the case with many other performance
oriented FFT packages as well. Also, it was Observed that it is efficient (faster) to use
direct evaluation than using FFT procedure when the size Of the Toeplitz system was

smaller. This is important in terms Of overall performance as smaller size Toeplitz

170

system occur more frequently ex. the Toeplitz system of size 1 x 1 and 2 x 2 occurs
Nt / 2 and N); / 4 times. In this work direct evaluation was used for any Toeplitz matrix
of size 5 16 x 16, this may vary based on the computer platform and FFT library in

use.

171

 

 

 

 

 

 

 

 

 

 

 

 

I P1 . S0
P3 S1
P3 32
P4 = 33
P5 S4
P3 35
P7 ‘50—." Se

 

Figure A.1: Illustration Of the Block-Toeplitz computational scheme

Table A.1: Exact translation operator in ACE algorithm, P denotes the number Of

ACE harmonics.
Levels P = 3

3 5.343762051614 770E—006
4 5343762051614 130E—006
5 5.343762051614 279E—006

 

P = 6
7.0123317207 40178E—008
7.0123317207 70545E—008
7.0123317207 63711E—008

 

 

 

 

 

 

 

 

Table A.2: Error convergence for different number Of ACE harmonics (P) and different
source/Observer conﬁguration (N3, d)

 

N3, d
P || 8000, 1.0 | 4000, 1.0 | 8000, 0.5 | 4000, 0.5

 

 

 

0

4.92E—02

5.01E-02

1.39E—02

1.74E—02

 

1.12E—02

1.04E—02

3.71E—03

4.70E—03

 

9.61E—05

9.39E—05

1.22E—05

2.21E—05

 

1.20E—06

1 .36E—06

3.54E—08

9.04E-08

 

1.87E—08

2.28E—08

1.20E-10

3.30E—10

 

 

19'4me

 

2.77E—10

 

3.44E—10

 

6.40E—13

 

1.21E—12

 

 

172

 

4.4

I

4.8

5.2

Log(Ns)

Figure A.2: log Tfast vs. log N3 from Table A.3, slope of linear ﬁt = 1.1

5.6

Table A.3: Time for different problem size (N3) within a cube Of sidelength d = 0.5m.

In all cases Nt = 256, P = 3 (e = 0(1E — 5))

 

 

 

 

 

 

 

 

 

 

 

 

N3 TFFT TDirect
4000 71.98 17195.35
8000 98.32 68781.25 I
16000 226.8 275125.60 I
32000 472.66 110050240 1
64000 978.07 440200960 I
128000 2282.01 17608038.40l
256000 4652.78 7043215360l

 

Table A.4: T ast for different Nt size. In all cases N3 = 8, 000, d = 0.5, P = 3

(e = 0(1E — 5)

 

 

 

 

 

 

 

 

 

I Nt I TFFT I TDirect
256 94.17 44169.36
512 231.95 176677.44 T
1024 554.57 706709.76 l
2048 1395.52 282683904 1
4096 3438.00 1130735616 1
8192 9494.96 4522942464 T
16384 28410.84 180917698.56l

 

 

 

 

 

17

3

Appendix B

Comprehensive Exam Problem:
Integral Equation Based Eddy
Current Model for Defects in
Layered Media

The goal of this chapter is to develop an efficient simulation scheme for eddy current
testing Of defects in layered media. Finite element method is the popular technique
used to model eddy current testing. However, they require discretization of entire
domain and is not favorable for modeling layered media. Alternatively, the less
used Integral equation approach, with layered medium Green’s function, demands
discretization Of the small defect region only. Such a formulation demands the eval-
uation of inﬁnite integrals which are efﬁciently handled using the recently developed
Discrete Complex Image Method (DCIM). A volume integral equation solver, based
on tetrahedral elements, is developed here. Full details of the formulation along with

results demonstrating the efﬁciency and accuracy of the method are presented.

174

 

B.1 Introduction

Eddy current testing (ECT) relies on the principle of magnetic induction to interro—
gate materials under inspection [237] at very low frequencies, of the order Of kilo—Hertz
(kHz). Since its inception, ECT has become an indispensable tool and covers a wide
spectrum Of industries from pipeline inspection to aircraft health monitoring to bio-
medical inspections. ECT can be used for any applications where the material (to
be inspected is a conducting medium. Qualitatively, the principle of Operation Of
ECT is as follows: time varying magnetic ﬁeld (produced usually by a time vary-
ing impressed currents) induces eddy currents in the conducting region, this in turn
produces a measurable ﬁeld outside the material region. Change in conductivity of
the material affects the characteristic Of the induced eddy currents—magnitude and
direction of ﬂow—causing a change in the measured ﬁeld. Eddy currents are induced
in regions only near the impressed currents, thus it is a local phenomena and aids in

precise location of defects with appropriate signal processing.

Numerical models for ECT have played a signiﬁcant role in its development and
application. These models have been used to design the geometry Of the excitation
and measurement probes, determine Optimal testing parameters and development
Of efﬁcient signal processing techniques. Eddy current phenomena is governed by
Maxwell’s equation. Under low-frequency and high conductivity assumptions, the
coupled Maxwell’s equations is reduced to a single diffusion equation. Finite ele
ment methods (FEM) has been the popular choice in NDE community to model this
phenomena. This is primarily due to ease of implementation and fast solution time.
However, FEM requires discretization of a large domain and employs artiﬁcial bound—
ary condition to truncate ﬁelds at boundary. Alternatively, integral equation solution
have also been developed tO model ECT, however these models need to be formulated

appropriately to compete with the speed of FEM solutions.

175

In this work, an integral equation solver is developed to model ECT for analyzing
defects that are embedded in a multilayered environment, e. g., cracks under rivets etc.
Contrary to the existing literature in NDE [237], the low-frequency approximation is
not used here. A detailed discussion Of the various techniques used in this develop-
ment is provided in the following sections. Section B.2 describes in detail the theory
required for the development of the integral equation model. Section B.3 presents
the numerical techniques used in implementation. Finally, Section B.4 provides some

results to substantiate the theoretical claims.

B.2 Integral Equation Formulation

B.2. 1 Formulation

Consider the homogeneous domains {21 and (22 in ﬁgure B.1. Assuming e—i’L’L time
convention the harmonic electromagnetic ﬁeld in each region are given by Maxwell’s

equation,

V X E1 = inl
(B.1)
V X H1 = -’iw€1E1 + 0'1E1 + Je = —iw€~1E1 + Je
V X E2 = in2
(B.2)

V X H2 = —iw82E2 + 02E2 = —iw€~2E2
where E, and Hk represent the total electric and magnetic ﬁelds, respectively, in the
domain 52),, J c is the impressed or excitation source, {Eb/1.190%} are the electromag-
netic material constants permittivity, permeability and conductivity, respectively, Of

the domain 9k, 87,. = 8k — ok/iw denotes the complex permittivity. Straight forward

176

manipulations reduces (B.2) to the form in (B.1)
V X H2 = —’iw§1E2 + J3 (B.3)

where J 5 = —iw(§2 — €1)E2 is the equivalent source that depends on the ﬁeld in that
region. The equivalent sources introduced here allows one to treat 92 same as 521 but
with additional sources. The total electric ﬁeld, EL in any region is divided into E8
and E3 corresponding to the contribution from impressed source J 3 and equivalent
source J 3 respectively,

Et = E8 + E3 (BA)

The above integral equation is used to solve for the equivalent source strengths to
uniquely determine the ﬁelds everywhere. Evaluation Of E3 due to the equivalent
sources J 3 requires the prescription of the Green’s function, G, for a source radiating

in presence of the homogeneous media (21. Thus, (BA) is written as
E = E6 +341, (B5)

In ECT simulations, the layered media without any defect is chosen as {21 and the
region corresponding to defect only is taken as 92. Thus the Green’s function in (B5)
corresponds to the layered media Green’s function with sources outside and inside the
layered media. The next section, provides a detailed treatment on the derivation of

the layered media Green’s function.

B.2.2 Planar Media Green’s Function

In deriving the Green function, no approximation is made to reduce the governing
wave equations of ECT to a diffusion equation. This is done intentionally to lever-

age on the existing literature in the wave prOpagation community and introduce the

177

 

appropriate modiﬁcations at a later stage. A vast source of literature address the
derivation Of Green’s function in presence of layered planar media obeying the wave
equation and radiation boundary condition. In the last decade, several signiﬁcant de
velopments have been made in terms of formulation and evaluation of these Green’s

function in a manner suitable to numerical models.

The electric and magnetic ﬁeld produced by an inﬁnitesimal electric dipole d =

6(r)d in homogeneous media can be written in terms of spherical waves as

 

_ ikr
E(r) -_- —iwu (I+ :_2v) er *d (B.6)
eikr
H(r) = V x T *d (B.7)
where r = Mr“ and * denotes spatial convolution Operation. Sommerfeld Identity

[238] is used to represent spherical waves as a product of plane waves in z direction

and cylindrical waves in p direction,

ezkr i

k (1) 'k
— - dk —pH I: 6z zlzl B.8
1‘ 2./SIP pkz ”(pp) ( )

 

where 16,, = [#02 — kg, 16 = w/c, c is the speed Of light in the homogeneous medium
and w is the angular frequency. Now consider the eﬂect Of a layered media, stacked
in z direction, whose properties vary only as a function Of 2. From the expansion in
(B8), it is evident that only the plane waves in z direction will get affected. In other
words, the effect Of introducing a layered medium in 2 can be accounted in terms
of the reﬂection and transmission of the plane waves travelling in z direction. As
an example, consider a vertically oriented dipole placed at origin in a homogeneous
media, (1,, = 0. Since Ep = 0, this corresponds to transverse magnetic (TM) wave

excitation that is fully characterized by the Ez ﬁelds. Combining (B6) and (B8), we

178

get
0 k .
E2 =iwp(1+a§/k2)l / dkp—pHél)(kpp)e’kZIzl (13.9)
2 SIP k2

When a planar layered media in z direction is introduced, its effect can be accounted
for using the TM 2 reﬂection and transmission coefﬁcients depending on the region
where ﬁeld is computed. Fields on the same side of the source are called the re
ﬂected ﬁelds and is denoted by superscript R. Fields on the Opposite side are called
as the transmitted ﬁelds denoted by superscript T. In this case, the reﬂected and

transmitted ﬁelds are given as,

__ k . .
EzR = (1+ 83/k2)% [SIP dkp—:Hél)(kpp)e’kzlzl(1 + RTMe_’2"ZIzI) (13.10)

_ k .
E3 = (1 + 6.3/9%? [51p dkp—:H((,1)(kpp)ezk2zlzl(1 + TTM) (3.11)

where RT M and TT M denotes the reﬂection and transmission coefficient for TM z
plane waves, 11:22 = 10% — 10,2, and k2 is the propagation constant in the layered
medium. Similarly, one can perform the same derivation for TE 2 ﬁelds produced by
a dipole oriented in horizontal or :r — y plane. Since any ﬁeld can be broken into TM 2
and TE; polarization, superposition of the two forms is required when considering
arbitrary excitation. Depending on the intended use, several different variations of
the layered media Green’s function can be found in the literature. In this work,
the symmetric form of the Green’s ﬁmction provided in [239] is used as it allows for
efficient treatment Of singularities in the Green’s function. In addition to the source

vector, this form requires a testing vector to attain symmetry. The derivation is

179

straight forward but laborious [239], hence only the ﬁnal form is presented here

where,

R
9TE,TM,EM
REM

Cl”

T
gTE,TM,TM1
TTMl

 

' 1
’2’? (68’ + pa avvo ’)g P (B.12a)
b
iw
Ti? (0:... a’ngE + azdngM) + (B.12b)
iwuo

471-132 __(a. VV al’gTM + 2613.V3V3.d;g§M)

rid—#0

l 1
Tu b643( angE + k2 —as. VSVS. .aggTM + kg, -—aza;gTM1)(B. 12C)

0 k .
= 3/ dkpRTE,TM,EM_iH([1)(kpp)e—ikbz(z+z’) (13.133)
2 SIP kbz

2
_b RTE + RTM)

= 2k§(
= —a’,+6;
= i / dkaTE’TM’TM 11931131)(kpp)e""bzz+”‘“zz’)(13-13b)
2 SIP kbz
_ szTM
p

where 67 is the testing vector, 6’ is the source dipole vector, subscript t and 2 denotes

the traverse and 2 components of vectors.

B.3 Numerical Implementation

This section provides details on the numerical implementation of the above theoretical

formulation for ECT simulation. This includes methods to efﬁciently evaluate the

inﬁnite integrals Of Green’s function (B.12) are presented. Followed by details on the

solution to the integral equation in (B5).

180

B.3.1 Evaluation of Green’s function

Several methods have been develOped for numerical evaluation of the inﬁnite integrals
arising with the use of Sommerfeld identity in (B8). These vary from asymptotic
expansion that yield closed form, approximate solution to specialized quadrature
rules. In this work, the recently developed Discrete Complex Image Method (DCIM)
is used evaluate these inﬁnite integrals in closed form. In DCIM, the integrand is
approximated as a sum of exponential functions in In; and the Sommerfeld identity is

used to represent these individual inﬁnite integrals as spherical waves.

Consider the evaluation of the following inﬁnite integral,

- k 2' z
(W) = $1911: éHél)(ka)e kz 13092) (314)

Note that all integrals in (B.12) can be reduced to the above form with simple ma-

nipulations. Let the coefﬁcient R(kz) in (BM) be represented as sum of exponentials
N .
R(kz) = Z Ane‘kzon (13.15)
n=0

Using the Sommerfeld identity (B.8), the individual integrals in the above summation

is represented as spherical waves,

 

N
¢<r> = :3 Aneik‘rﬁ/r: , rs. = \/p2 + (z + on)? (13.16)

n=0
These spherical waves can be interpreted as emanating from image sources of strength
An placed at complex positions rﬁ.

There exists several different approach to determine A,- and ci, for approximating
120%) in terms of exponential functions. Matrix Pencil Method (MPM) is a relatively

new technique for representing arbitrary function in terms of exponentials. MPM is

181

based on singular valued decomposition (SVD) and provides a means to control the
error in the resulting approximation. A major requirement of MPM is that the
integrand be sampled at uniform intervals. In case of planar media Green’s function,
this sampling should be done respecting the Sommerfeld integral path (SIP) shown in
ﬁgure B.2. In practice, this is achieved by using an intermediate parametric variable
t. The integrand is sampled at uniform intervals of t and the relation between t and

k2 is chosen to conform with the SIP,
k2 = 7k[it + 1] 0 S t S T0/7 (B.17)

where To deﬁnes the ﬁnite domain of integration and is chosen large enough until
oscillations in integrand have damped sufﬁciently, 7 controls the separation of SIP

with real axis as shown in ﬁgure B.2.

The relation (B.17) prescribed in literature is suitable only for scenarios where
the conductivity of the medium is zeros or negligible. When conductivity is ﬁnite
and very large, as in case of ECT, k2 determined through above relation (B.17) fails
to follow the SIP as shown in ﬁgure B.3. A modiﬁed relation (B.18) is introduced in

this work to account for materials with high conductivity and still respect the SIP
[9; = 7Re{k}[it + 1] + iIm{k} (B.18)

where Re{-} and I m{} represent the real and imaginary part respectively. Note
that when k is a pure real quantity, i.e. when conductivity is zero, the modiﬁed
relation reduces to the original relation (B.17). Figure B.3 shows the path evaluated
using the modiﬁed relation for high conductivity case. Other advantages of high
conductivity are (a) the poles and branch cuts of the integrand are well separated

from real axis and they do not affect the numerical integration; and (b) the integrands

182

 

 

 

are less oscillatory, which in turn implies less number of approximation terms. The
maximum number of DCIM terms encountered throughtout this work did not exceed

20.

B.3.2 Solution to integral equations

For numerical solution of the integral equation in (B.5), the volume region {22 is
discretized into M tetrahedrons. The unknown electric ﬁeld density D in this domain
is represented in terms of divergence conforming vector basis functions deﬁned on each

face of tetrahedron [240],

Me
D(r) = Z f,(r)D,- (3.19)
n=1

where Me is the total number of distinct tetrahedron faces, fz- is the vector basis
function deﬁned on the ith face and D,- is the unknown coefﬁcient corresponding to
this basis function. Several properties of these basis function are discussed in detail
in [240]. Employing this representation in (B5) and using Galerkin testing leads to
the following matrix equation,

zv = H (3.20)

where V and H are vectors of size Me and Z is a matrix of size Me x Me. The

elements the matrix and vectors in (B.20) are given as follows,

Z(m,n) = [V dr [V dr fm(r).G .1.,(r) (3.21)
V(m) = Dm (B22)
H(m) = [V drEi"C(r)fm(r) (3.23)

where V is the volume of domain 92. The symmetric form of the Green’s function
used in this work, allows the transfer of derivatives of the Green’s function onto the

testing and sourcing vector basis Thus the maximum singularity encountered in this

183

 

work is of the type 1/ | Ir—r’ II which can be evaluated in closed form using the methods

provided in [241].

BA Results and Discussion

Several results that demonstrate the efﬁcacy of above described scheme are presented
in this section. First the theoretical modiﬁcations introduced to the DCIM method
for high-conductivity materials are veriﬁed. Simple semi-analytical models exists for
axi-symmetric geometries. Such models exist for circular coils place over defect free
layered media. The numerical implementation of DCIM and MPM procedure with
the proposed modiﬁcations are compared with results these semi-analytical models for
various tact conﬁguration. First, the results for evaluation of the transmitted ﬁeld at
various depths inside the layered medium using both the methods are shown in ﬁgure
B.4. Figure B.5 shows the evaluation of transmitted ﬁelds at different frequencies
using both the methods. Next, impedance of the coil evaluated using DCIM+MPM
method and semi-analytical model is shown in ﬁgures B.6 and B7 corresponding to
different coil lift-off and excitation frequency, respectively. In all cases, there is an
excellent match between DCIM+MPM method and semi-analytical and this implies
that the proposed modiﬁcations to DCIM are valid and essential.

Next set of results corresponds to the case of a defect in the layered medium. A
semi-inﬁnite medium with ﬁnite conductivity of a = 30M S with a rectangular defect
of size 13 x 0.3 x 5 mm3 is considered. The frequency of excitation is 900 MHz
and the solution from the integral equation solver are compared with experimental
measurements. Impedance of the coil is computed at various .1: position, see ﬁgure B.8,
with the origin at center of defect region and a: position of coil is measured with respect
to coil center from origin. The amplitude of coil impedance evaluated from Integral

equation solution is shown in ﬁgure B.9 and the phase values of impedance is shown in

184

91 02:0

Figure B.1: Different domains in eddy current simulation

knlkpl

§§§§§§§9o

 

 

 

 

o1obozoooaoooaooosoooeooomoeooo
3°!ka ,

Figure B.2: DCIM path from existing literature for zero conductivity

ﬁgure B.10. Qualitatively comparing with the measurement results, provided in [242],
there is a good match with results from the integral equation solver. Quantitatively,

the error between the model results and measurement was evaluated to be about 10%.

185

 

‘00-!
'1 New path for complex k
300 P , — — - Old path for complex It
200 H - - - - Old path forreal k
I
'2 m J
x: I
v o '
E

      

‘

0‘
.-.-o-o-ona-e-e-e-
‘ L

6 2000 4000 0000 8000
Re{Kp}

 

.-.-.-.-..
l A

Figure B.3: Comparison between existing and Proposed DCIM path with ﬁnite con-
ductivity.

 

   

 

 

 

.3
4.4 x10
4.2
4 . ..
“3.8 -
9
m 3.6 > -
g 3.4- DCIM -
3.2 _ 0 Analytical .
3 . .
2'80 oh 0) 0:6 018 1
Re {E <l> } x10'7

Figure B.4: Transmitted ﬁeld at various depth

186

   
 

 

 

—— DCIM
0 Semi-Analy.

 

 

   

2000 4000

0000
Freq. ( Hz )

8000 10000

 

Angle{E<D}
I? o N .5

 

 

 

 

6"”

2000 4000 0000 8000 10000

Figure B.5: Transmitted ﬁeld with varying frequency

1'

 

I J l

2 3 4
Position ( mm )

b

DCIM
Semi-Analy.

 

 

1

i i s 4

Figure B.6: Z vs. coil lift—off

187

 

 

 

 

”’0 5 10 1’5 20 2'5 30

Figure B.7: Z vs. excitation frequency

 

 

 

 

Coll position
C D
I J
Sample

 

Figure B.8: Experiment to validate the integral equation model. Measurement of
impedance as the coil is scanned across the defect.

188

 

 

 

as 1 135 2
Coll pos (cm)

 

Figure B.9: Absolute value of coil impedance from the IE model.

 

 

$3.88

Angle {Z}
3.5

O

.10 .

 

 

 

 

.20 1 n n 1 n
0.5 o 0.5 1 15 2 2.5
Coll pos. (cm )

Figure B.10: Phase value of coil impedance from the IE model.

189

Appendix C

Curriculam Vitae

Journal Papers

1. M Vikram, Jun and B. Shanker, Mathematical Analysis of Single Integral
Equation (SIE) for Electromagnetic Scattering from Homoegeneous Dielectric
Bodies,under preparation, to be submitted to IEEE Trans. on Antennas and
Propagat.

2. M Vikram, B. Shanker and S. Aluru, A Scalable Parallel Wideband FMM
for Efficient Electromagnetic Simulations on Large Scale Clusters, submitted to
IEEE Trans. Antennas and Propagat, May 2009.

3. M Vikram, A. Baczewski, B. Shanker and L. Kempel, “Accelerated Carte-
sian Expansion (ACE) uniﬁed framework for the rapid evaluation of potentials
associated with the diffusion, lossy wave and Klein-Gordon equations” , under
review Journal of computational physics.

4. M Vikram, H. Huang, B. Shanker and V. Tri, “A Novel Wideband FMM for
Electromagnetic Simulation of Multiscale Structures”, accepted for publication
in IEEE Trans. Antennas and Propagat.

5. M. Vikram, B. Shanker, “An incomplete review of fast multipole methods-
from static to Wideband-as applied to problems in computational electromag-
netics”, Journal of Applied Computational Electromagnetic Society, Accepted
for publication - May 2008.

6. M. Vikram, B. Shanker, “Fast Evaluation of Time Domain Fields in Sub-
Wavelength Source/ Observer Distributions using Accelerated Cartesian Expan-

190

 

sions (ACE)”, Journal of Computational Physics, Vol. 227, Issue 2, pp. 1007-
1023, Dec. 2007.

7. K. Arunachalarn, V. R. Melapudi, L. Udpa, and S. S. Udpa, “Microwave NDT
of Cement-Based Materials Using Far Field Reﬂection Coefﬁcients”, NDT and
B International, 39, pp. 585—593, 2006.

8. N. V. Nair, V. R. Melapudi, H. R. Jimenez, X. Liu, Y. Deng, Z. Zeng, L.
Udpa, T. J. Moran, and S. S. Udpa, “A GMR-Based Eddy Current System for
NDE of Aircraft Structures”, IEEE Transactions on Magnetics, Vol. 42, Issue
10, pp. 3312-3314, Oct. 2006.

Conference Papers

1. M. Vikram, B. Shanker and S. Aluru Large Scale Parallel Electromagnetic
Solver for Multiscale Simulations, Submitted to ACES International conference,
to be held in March 2009.

2. N. V. Nair, M. Vikram and B. Shanker A Robust Generalized Method of Mo-
ments Scheme for Electromagnetic Analysis, Submitted to ACES International
conference, to be held in March 2009.

3. M. Vikram, A. Baczewski, B. Shanker and S. Aluru Parallel Accelerated Carte-
sian Expansions for Particle Dynamics Simulations, Accepted in IEEE IPDPS
2009, to be held in May 2009. (20% acceptance rate)

4. M. Vikram, B. Shanker and T. Van Fast Solvers for Electromagnetic Simulation
of Mixed-Scale Structures, Antennas Propagation Society International Sympo—
sium, San Diego, July 2008. (Received third place price in best sudent paper
contest)

5. M. Vikram, B. Shanker and S. Aluru Provably Optimal Parallel Fast Solvers
for Electromagnetic Simulations, Antennas Propagation Society International
Symposium/URSI, San Diego, July 2008.

6. M. Vikram, B. Shanker and E. Michielssen Fast Evaluation of Tiansient Fields in
Sub- to Long-wavelength Source / Observer Distributions, Antennas Propagation
Society International Symposium/URSI, San Diego, July 2008.

7. N. V. Nair, M. Vikram and B. Shanker Mathematical Analysis of the Aug-

mented Electric Field Integral Equation Operators, Antennas Propagation So-
ciety International Symposium/URSI, San Diego, July 2008.

191

 

10.

11.

12.

13.

14.

15.

16.

17.

18.

. A. Baczewski, M. Vikram and B. Shanker Fast Methods for the Evaluation

of the Diffusion Kernel with Extensions to the Dissipative Kernel, Antennas
Propagation Society International Symposium, San Diego, July 2008.

M. Vikram and B. Shanker “Fast Evaluation of Time Domain Fields in Sub-
Wavelength Source / Observer Distributions using Accelerated Cartesian Expan—
sions”, URSI/UNC International conference, Ottawa, July 2007.

M. Vikram, H. Huang and B. Shanker “Fast Evaluation of Fields from Sub—
Wavelength Structures in Frequency Domain Using Accelerated Cartesian Ex-
pansions”, URSI/UNC International conference, Ottawa, July 2007.

M. Vikram, B. Shanker, L. Udpa and S. S. Udpa “Eddy Current Simulation of
Defects in Layered Media Using Discrete Complex Image Method”, URSI/ UN C
International conference, Ottawa, July 2007.

M. Vikram, Y. Jun and B. Shanker “Eigenvalue Analysis of Single Integral
Equations”, Electromagnetic Theory and Symposium, Ottawa, July 2007.

M. Vikram, H. Grifﬁth, H. Huang and B. Shanker “Accelerated cartesian har-
monics for fast computation of time and frequency domain low-frequency ker-
nels” , Antennas PrOpagation Society International Symposium, Hawaii, June
2007.

V. R. Melapudi, S. S. Udpa, L. Udpa, and W. P. Winfree “Ray Tracing Model
for Terahertz Inspection of Spray On Foam Insulation (SOFI) ”, 12th Biennial
IEEE Conference on Electromagnetic Field Computation, 2006

N. V. Nair, V. R. Melapudi, L. Xin, L. Udpa, S. S. Udpa “A GMR based
Eddy Current Field Sensing System for NDE of Aircraft Structures”, IEEE
International Magnetics Conference (INTERMAG), 2006.

V. R. Melapudi, S. S. Udpa, L. Udpa, and W. P. Winfree “Ray-Rracing Model
for Terahertz Imaging of SOFI Inspection”, Review of Quantitative N ondestruc-
tive Evaluation, D. O. Thompson and D. E. Chimenti, Eds, American Institute
of Physics, 2005.

K. Arunachalam, R. V. Melapudi, E. J. Rothwell, L. Udpa, and S. S. Udpa, “Mi-
crowave N DE for Reinforced Concrete”, Review of Quantitative Nondestructive
Evaluation, D. O. Thompson and D. E. Chimenti, Eds, American Institute of
Physics, 2005.

N. V. Nair, V. R. Melapudi, S. Ramakrishnan, L. Udpa, S. S. Udpa and W. P.
Winfree, “A Wavelet Based Signal Processing Technique for Image Enhance-
ment in Terahertz Imaging Data” , Review of Quantitative Nondestructive Eval-
uation, D. O. Thompson and D. E. Chimenti, Eds, American Institute of
Physics, 2005.

192

19.

20.

I. Elshaﬁey, V. R. Melapudi and L. Udpa “ Electromagnetic?Acoustic Modeling
of Fields Induced by Gradient Pulses in Diffusion Tensor Magnetic Resonance

Imaging”, Review of Quantitative Nondestructive Evaluation, D. 0. Thompson
and D. E. Chimenti, Eds, American Institute of Physics, 2005.

V. R. Melapudi, S. Udpa, L. Udpa, and W. P. Winfree, “Imaging and Model-
ing Techniques for Terahertz Inspection of NASA-SOFI”, to appear in Annual
Review of Progress In Quantitative Nondestructive Evaluation 2005, Portland.

Technical Reports

1.

M Vikram, H. Huang, B. Shanker and V. Tri, A Novel Wideband FMM for
Electromagnetic Simulation of Multiscale Structures Michigan State University
Technical Report Series, 2008.

. M. Vikram and B. Shanker Fast Evaluation of Time Domain Fields in Sub-

Wavelength Source/ Observer Distributions using Accelerated Cartesian Expan-
sions (ACE), Michigan State University Technical Report Series, 2007.

. M. Vikram, U. Parriker and L. Udpa Automated detection and classiﬁcation

software for inspection of aircraft rotor structures, Technical Report submitted
to FAA, 2005.

Disclosures

1.

Disclosure (Spring 2007): A Free Scan GMR based Eddy Current Testing Sys—
tem, Inventors: M. Vikram, N. V. Nair, S. S. Udpa and L. Udpa

193

BIBLIOGRAPHY

[1] M. N. O. Sadiku, Numerical techniques in electromagnetics. CRC press, 2000.

[2] J. Jin, The ﬁnite element method in electromagnetics. John Wiley and sons,
2002.

[3] J. H. Richmond, “Digital computer solutions of the rigorous equations for scat-
tering problems,” Proceeding of the IEEE, vol. 53, pp. 796—804, 1965.

[4] A. Appel, “An efﬁcient program for many-body simulations,” SIAM J. Sci.
Comput., vol. 6, pp. 85—103, 1985.

[5] J. Barnes and P. Hut, “A hierarchical 0(n log 11) force calculation algorithm,”
Nature, vol. 324, pp. 446—449, 1986.

[6] V. Rokhlin, “Rapid solution of the integral equations of classical potential the-
ory,” Journal of Computational Physics, vol. 60, pp. 187—207, 1985.

[7] L. Greengard and V. Rokhlin, “A fast algorithm for particle simulations,” Jour-
nal of Computational Physics, vol. 20, pp. 63—71, 1987.

[8] L. Greengard, The rapid evaluation of potential ﬁelds in particle systems. Cam-
bridge, MA: MIT Press, 1988.

[9] L. Greengard and V. Rokhlin, “A new version of the fast multipole method for
the Laplace equation in three dimensions,” Acta Numerica, vol. 6, pp. 229—269,
1997.

[10] F. Zhao, “An 0(n) algorithm for three-dimensional n-body simulation,” Mas-
ter’s thesis, Massachusetts Institute of Technology, 1987.

[11] E. J. Weniger, “Addition theorems as three-dimensional Taylor expansions,”
International Journal of Quantum Chemistry, vol. 76, pp. 280—295, 2000.

[12] P. B. Visscher and D. M. Apalkov, “Charge based recursive fast multipole mi-
cromagnetics,” Physica B: Physics of Condensed Matter, vol. 343, pp. 184—188,
2004.

194

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

1211

[22]

[23]

[24]

B. Shanker and H. Huang, “Accelerated Cartesian expansions - a fast method
for computing of potentials of the form R‘” for all real 11,” Journal of Com-
putational Physics, vol. 226, pp. 732—753, 2007. Technical Report; MSU-ECE-
Report-20065.

W. D. Elliott and J. A. Board, “Multipole algorithms for molecular dynamics
simulation on high performance computers,” Tech. Rep. TR95—003.html, Duke
University, 1995.

Y. Fu, K. J. Klimkowski, G. J. Rodin, E. Berger, J. C. Browne, J. K. Singer,
R. A. van de Geijin, and K. S. Vemaganti, “A fast solution method for three-
dimensional many-particle problems of linear elastosticity,” J. Numer. Methods
Eng, vol. 42, pp. 1215—1229, 1998.

K. Yoshida, N. Nishimura, and S. Kobayashi, “Application of fast multipole
Galerkin boundary integral equationmethod to elastostatic crack problems in
3D,” Int. J. Numer. Methods Eng, vol. 50, pp. 525—527, 2001.

Y. H. Chen, W. C. Chew, and S. Zeroug, “Fast multipole method as an efﬁcient
solver for 2D elastic wavesurface integral equations,” Computational mechanics,
vol. 20, pp. 495—506, 1997.

Y. Fu and G. J. Rodin, “Fast solution method for three-dimensional Stokesian
many-particle problems,” Commun. Numer. Meth. Eng, vol. 16, pp. 145-149,
2000.

K. Nabors and J. White, “Fastcap: A multipole accelerated 2-D capacitance
extraction program,” IEEE Trans. Computer Aided Design, vol. 10, pp. 1447—
1459, 1991.

M. Kamon, M. J. Tsuk, and J. K. White, “FASTHENRY: A multipole accel-
erated 3—D inductance extraction program,” IEEE Trans. Microwave Theory
Tech, vol. 42, pp. 1750—1758, 1994.

F. Chen and D. Suter, “Using a fast multipole method to accelerate spline
evaluations,” IEEE Computational Science and Engineering, vol. 5, pp. 24—31,
1998.

J. R. Driscoll and J. D. M. Healy, “Asymptotically fast algorithms for spherical
and related transforms,” in 30th annual symposium on foundations of computer
science, (Research Triangle Park, NC), pp. 344—349, IEEE Computer Society
Press, 1989.

R. J. Chien and B. K. Alpert, “A fast spherical ﬁlter with uniform resolution,”
Journal of Computational Physics, vol. 136, pp. 580—584, 1997.

L. Greengard and J. Strain, “A fast algorithm for the evaluation of heat poten-
tials,” Comm. Pure Appl. Math., vol. 43, pp. 949—963, 1990.

195

 

[25] J. Tausch, “A fast method for solving the heat equation by layer potentials,”
Journal of Computational Physics, vol. 224, no. 2, pp. 956—969, 2007.

[26] J .-R. Li and L. Greengard, “On the numerical solution of the heat equation I:
fast solvers infree space,” Journal of Computational Physics, vol. 226, pp. 1891—
1901, 2007.

[27] V. Rokhlin, “Rapid solutions of integral equations of scattering theory in two
dimensions,” Journal of Computational Physics, vol. 86, pp. 414—439, 1990.

[28] C. C. Lu and W. C. Chew, “Fast algorithm for solving hybrid integral equa-
tions,” IEE Proceedings H: Microwaves and Antennas, vol. 140, pp. 455—460,
1992.

[29] N. Engheta, W. D. Murphy, V. Rokhlin, and M. S. Vassiliou, “The fast mul-
tipole method (FMM) for electromagnetic scattering problems,” IEEE Trans.
Antennas Propagat., vol. 40, pp. 634—641, 1992.

[30] V. Rokhlin, “Diagonal forms of translation operators for the Helmholtz equation
in three dimensions,” Applied and Computational Harmonic Analysis, vol. 1,
pp. 82—93, 1993.

[31] M. A. Epton and B. Dembart, “Multipole translation theory for three di-
mensional Laplace and Helmholtz equations,” SIAM J. Sci. Compt, vol. 16,
pp. 865-897, 1995.

[32] W. C. Chew, S. Koc, J. M. Song, C. C. Lu, and E. Michielssen, “A succint
way to diagonalize the translation matrix in three dimensions,” IEEE AP-S
Int. Symp., pp. 2072—2075, 1997.

[33] W. C. Chew, J .-M. Jin, E. Michielssen, and J. Song, eds, Fast and Efﬁcient
Algorithms in Computational Electromagnetics. Artech House Publishers, 2001.

[34] N. N ishimura, “Fast multipole accelerated boundary integral equation meth-
ods,” Appl Mech Rev, vol. 55, pp. 299—324, 2002.

[35] M. F. Gyure and M. A. Stalzer, “A prescription for the multilevel Helmhotz
F MM,” IEEE Computational Science and Engineering, vol. 5, no. 3, pp. 39—47,
1998.

[36] S. Velamparambil, J. M. Song, W. C. Chew, and G. K., “ScaleME: A portable
scalable multipole engine for electromagnetic and acoustic integral equation
solvers,” IEEE AP—S Int. Symp., vol. 3, pp. 1774—1777, 1998.

[37] S. Velamparambil, W. C. Chew, and J. Song, “10 million unknowns: is it that
big?,” IEEE Trans. Antennas and Propagat., vol. 45, pp. 43—58, 2003.

196

 

[38]

[39]

[40]

[41]

[42]

[43]

[44]

[45]

[46]

[47]

[48]

[49]

B. Hariharan, S. Aluru, and B. Shanker, “A scalable parallel fast multipole
method for analysis of scattering from perfect electrically conducting surfaces,”
in Supercomputing, ACM/IEEE 2002 Conference, pp. 42—42, 16-22 Nov. 2002.

B. Hariharan and S. Aluru, “Efﬁcient parallel algorithms and software for com-
pressed octrees with applications to hierarchical methods,” Parallel Computing,
vol. 31, pp. 311—331, 2005.

L. Gurel and O. Ergul, “Fast and accurate solutions of extremely large integral-
equation problems discretised with tens of millions of unknowns,” Electronics
Letters, vol. 43, no. 9, pp. 499—500, April 26 2007.

O. Ergul and L. Gurel, “Hierarchical parallelisation strategy for multilevel fast
multipole algorithm in computational electromagnetics,” Electronics Letters,
vol. 44, no. 1, pp. 3-5, January 3 2008.

L. Greengard, J. Huang, V. Rokhlin, and S. Wandzura, “Accelerating fast mul-
tipole methods for the Helmholtz equation at low frequencies,” IEEE Compu-
tational Science and Engineering, vol. 5, pp. 32—38, 1998.

B. Dembart and E. Yip, “The accuracy of fast multipole methods for Maxwell’s
equations,” IEEE Computational Science and Engineering, vol. 5, pp. 48—56,
1998.

J. S. Zhao and W. C. Chew, “MLFMA for solving boundary equations of 2D
electromagnetic scattering from static to electrodynamic,” Microwave and 0p-
tical Technology Letters, vol. 20, pp. 306—311, 1999.

T. J. Cui and W. C. Chew, “Fast algorithm for electromagnetic scattering by
buried conducting plates of large size,” IEEE Trans. Antennas and Propagat.,
vol. 47, no. 6, pp. 1116—1118, 1999.

J. S. Zhao and W. C. Chew, “Three-dimensional multilevel fast multipole algo-
rithm from static to electrodynamic,” Microwave and Optical Technology Let-
ters, vol. 26, pp. 43—48, 2000.

B. Hu and W. C. Chew, “Fast inhomogenous plane wave algorithm for scattering
from objects above the multilayered medium,” IEEE Trans. Geoscience and
Remote Sensing, vol. 39, pp. 1028—1038, 2001.

J. Sarvas, “Performing interpolation and anterpolation entirely by fast Fourier
transform in the 3-D multilevel fast multipole algorithm,” SIAM Journal on
Numerical Analysis, vol. 41, pp. 2180—2196, 2003.

E. Darve and P. Have, “Efﬁcient fast multipole method for low-frequency scat-
tering,” Journal of Computational Physics, vol. 197, pp. 341—363, January 2004.

197

[50] H. Cheng, W. Y. Crutchﬁeld, Z. Gimbutas, L. F. Greengard, J. F. Ethridge,
J. Huang, V. Rokhlin, and N. Y. andJ. Zhao, “A Wideband fast multipole
method for the Helmholtz equation in three dimensions,” Journal of Computa-
tional Physics, vol. 216, pp. 300—325, 2006.

[51] H. Cheng, L. Greengard, and V. Rokhlin, “A fast adaptive multipole algorithm
in three dimensions,” Journal of Computational Physics, vol. 155, pp. 468—498,
1999.

[52] S. Aluru, “Greengard’s N—body algorithm is not order N,” SIAM J. Sci. Com-
put., vol. 17, pp. 773—776, 1996.

[53] M. S. Warren and J. K. Salmon, “A parallel hashed oct-tree N-body algorithm,”
in Supercomputing ’93: Proceedings of the 1993 ACM/IEEE conference on Su-
percomputing, (New York, NY, USA), pp. 12—21, ACM, 1993.

[54] F. Sevilgen, S. Aluru, and N. Futamura, “Parallel algorithms for tree accu-
mulations,” Journal of Parallel and Distributed Computing, vol. 65, pp. 85—93,
2005.

[55] J. T. J. White, “A multiscale method for fast capacitance extraction,” in Pro-
ceedings of the IEEE/ACM DAC, 1998.

[56] S. Balasubramanian, L. N. Subramaniam, and B. Shanker, “Fast integral equa-
tion based analysis of magneto—static ﬁelds in bulk non-linear media,” IEEE
Trans. Magnetics, vol. 35, pp. 100—107, 2002.

[57] V. Rokhlin and S. M. Wandzura, “The fast multipole method for periodic struc-
ture,” in IEEE AP—S Int. Symp., pp. 424—427, 1994.

[58] J. A. Stratton, Electromagnetic theory. McGraw-Hill New York, 1941.

[59] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables. New York, NY: Dover Publica-
tions, Inc., 1972.

[60] W. D. Elliott and J. A. Board, “Fast Fourier transform accelerated fast multi-
pole algorithm,” SIAM J. Sci. Comput., vol. 17, pp. 398—415, 1996.

[61] M. Vikram and B. Shanker, “Fast evaluation of time domain ﬁelds in sub-
wavelength source / observer distributions using accelerated Cartesian expan-
sions (ACE) ,” Journal of Computational Physics, vol. 227, pp. 1007-1023, 2007.

[62] B. Hu, W. C. Chew, E. Michielssen, and J. Zhao, “An improved fast steepest
descent algorithm for the fast analysis of two-dimensional scattering problems,”
Radio Sci., vol. 34, pp. 759—772, 1999.

198

[63]

[64]

[65]

[66]

[67]

[68]

[69]

[70]

[71]

[72]

[73]

[74]

[75]

N. Yarvin and V. Rokhlin, “Generalized Gaussian quadratures and singular
value decompositions of integral operators,” tech. rep., Department of Com-
puter Science, Yale University, 1996.

B. Hu, W. C. Chew, E. Michielssen, and J. Zhano, “Fast inhomogeneous plane
wave algorithm for fast analysis of two-dimensional scattering problems,” Radio
Science, vol. 34, pp. 759—772, 1999.

R. Coifman, V. Rokhlin, and S. Wandzura, “The fast multipole method for the
wave equation: A pedestrian prescription,” IEEE Antennas and Propagation
Magazine, vol. 35, pp. 7—12, June 1993.

V. Rokhlin, “Sparse diagonal forms for translation operators for the Helmholtz
equation in two dimensions,” Tech. Rep. YALEU/DCS/RR—1095, Yale Univer-
sity, 1995.

J. Rahola, “Diagonal forms of the translation operators in the fast multipole al-
gorithm for scattering problems,” BIT Numerical Mathematics, vol. 36, pp. 333—
358, 1996.

A. D. McLaren, “Optimal numerical integration on a sphere,” Mathematics of
Computation, vol. 17, pp. 361—383, 1963.

E. Darve, “The fast multipole method: Numerical implementation,” Journal of
Computational Physics, vol. 160, pp. 195—240, 2000.

J. Song and W. C. Chew, “Error analysis for the truncation of multipole expan—
sion of vector Green’s functions,” IEEE Microwave and Wireless Components
letters, vol. 11, pp. 311—313, 2001.

J .-Y. Li, L.-W. Li, B.-L. Ooi, P.-S. Kooi, and M.-S. Leong, “On the accuracy of
the addition theorem for a scalar green’s function used in the FMM,” Microwave
and Optical Technology Letters, vol. 31, pp. 439—442, 2001.

M. L. Hastriter, S. Ohnuki, and W. C. Chew, “Error control of the translation
operator in 3D MLFMA,” Microwave and Optical Technology Letters, vol. 37,
pp. 184—188, 2003.

S. Koc, J. Song, and W. C. Chew, “Error analysis for the numerical evaluation
of the diagonal forms of the scalar spherical addition theorem,” SIAM J. Numer.
Anal, vol. 36, no. 3, pp. 906—921, 1999.

M. Nilsson, “Stability of the high frequency fast multipole method for helrnholtz
equation in three dimensions,” BIT Numerical Mathematics, vol. 44, pp. 773—
791,2004.

J. M. Song, C. C. Lu, W. C. Chew, and S. W. Lee, “Fast illinois solver code
(FISC),” IEEE Antennas Propagat. Mag, vol. 40, pp. 27—34, 1998.

199

[76] E. Darve, “The fast multipole method I: Error analysis and asymptotic com-
plexity,” SIAM J. Numer. Anal, vol. 38, pp. 98—128, 2000.

[77] C. C. Lu and W. C. Chew, “A multilevel algorithm for solving boundary integral
wave scattering,” Microwave and Optical Technology Letters, vol. 7, pp. 466—
470, 1994.

[78] J. M. Song and W. C. Chew, “Multilevel fast-multipole algorithm for solving
combined ﬁeld integral equations of electromagnetic scattering,” Microwave and
Optical Technology Letters, vol. 10, no. 1, pp. 14—19, 1995.

[79] B. Dembart and E. Yip, “A 3D fast multipole method for electromagnetics
with multiple levels,” in Proceedings of the 11th Annual Conference on Applied
Computational Electromagnetics, vol. 1, (Monterey, CA), pp. 621-628, 1995.

[80] T. Hrycak and V. Rokhlin, “An improved fast multipole algorithm for potential
ﬁelds,” SIAM Journal on Scientiﬁc Computing, vol. 19, pp. 1804—1826, 1998.

[81] N. Yarvin and V. Rokhlin, “A generalized 1D fast-multipole method, with ap-
plications to ﬁltering of spherical harmonics,” tech. rep., Yale University, 1998.

[82] B. Shanker, A. Ergin, M. Lu, and E. Michielssen, “Fast analysis of transient elec-
tromagnetic scattering phenomena using the multilevel plane wave time domain
algorithm,” IEEE Trans. Antennas and Propagat., vol. 51, no. 3, pp. 628—641,
March 2003.

[83] E. Michielssen and W. C. Chew, “Fast steepest descent path algorithm for ana-
lyzing scattering from two-dimensional objects,” Radio Sci., vol. 31, pp. 1215—
1224, 1996.

[84] L. J. J iang and W. C. Chew, “Low-frequency fast inhomogeneous plane-wave
algorithm (LF—FIPWA),” Microwave and Optical Technology Letters, vol. 20,
pp. 117-122, 2004.

[85] R. L. Wagner and W. C. Chew, “A ray-propagation fast multipole algorithm,”
Microwave and Optical Technology Letters, vol. 7, pp. 435—438, 1994.

86 B. Shanker, S.-K. Han, and E. Michielssen, “A fast multipole approach to com-
[
puting scattering from an inhomogeneous bianisotropic cylindrical object using
Beltrami ﬁelds,” Radio Science, vol. 33, pp. 17—31, 1998.

[87] T. J. Cui and W. C. Chew, “Accurate analysis of wire structures from very-
low frequency to microwave frequency,” IEEE Trans. Antennas and Propagat.,
vol. 50, pp. 301—307, 2002.

[88] W. C. Chew, Waves and Fields in Inhomogeneous Media. IEEE Press New
York, 1995.

200

 

 

[89] L. Xuan, A. Zhu, A. R. J ., and S. D. Gedney, “A broadband multilevel fast
multipole algorithm,” IEEE AP-S Int. Symp. and USNC/URSI National Radio
Science Meeting, vol. 2, pp. 1195-1198, 2004.

[90] H. Wallen and J. Sarvas, “Translation procedures for broadband MLFMA,”
Progress in Electromagnetics Research, vol. 55, pp. 47—78, 2005.

[91] J. Carrier, L. Greengard, and V. Rokhlin, “A fast adaptive multipole algorithm
for particle simulations,” SIAM J. Sci. Stat. Comput., vol. 9, pp. 669—686, 1988.

[92] K. Nabors and J. White, “Multipole-accelerated capacitance extraction for 3—D
structures with multilayer dielectrics,” IEEE Tran. Circ. Syst., vol. 39, pp. 946—
954, 1992.

[93] A. Adedoyin and P. Andrei, “3—D magnetostatic using the fast multiple
method,” IEEE Trans. Magnetics, vol. 43, no. 6, pp. 2914—2916, June 2007.

[94] I. Mayergoyz, P. Andrei, and M. Dimian, “Nonlinear magnetostatic calcula-
tions based on fast multipole method,” IEEE Trans. Magnetics, vol. 39, no. 3,
pp. 1103-1106, May 2003.

[95] W. Hafla, A. Buchau, F. Groh, and W. Rucker, “Efﬁcient integral equation
method for the solution of 3—D magnetostatic problems,” IEEE Trans. Magnet-
ics, vol. 41, no. 5, pp. 1408—1411, May 2005.

[96] Y. Takahashi, Y. Fujishima, and S. Wakao, “Large-scale analysis of surface
charge in eddy-current problems by the fast multipole method,” IEEE Trans.
Magnetics, vol. 41, no. 5, pp. 1700—1703, May 2005.

[97] G. Rubinacci, A. Tamburrino, S. Ventre, and F. Villone, “A fast 3-D multipole
method for eddy-current computation,” IEEE Trans. Magnetics, vol. 40, no. 2,
pp. 1290—1293, March 2004.

[98] C. Seberino and H. Bertram, “Concise, efﬁcient three-dimensional fast multipole
method for micromagnetics,” IEEE Trans. Magnetics, vol. 37, no. 3, pp. 1078—
1086, May 2001.

[99] D. Apalkov and P. Visscher, “Fast multipole method for micromagnetic simula-
tion of periodic systems,” IEEE Trans. Magnetics, vol. 39, no. 6, pp. 3478—3480,
Nov. 2003.

[100] J. M. Song, C. C. Lu, and W. C. Chew, “MLFMA for electromagnetic scatter-
ing by large complex objects,” IEEE Trans. Antennas and Propagat, vol. 45,
pp. 1488—1493, 1997.

[101] T. Eibert, “Some scattering results computed by surface-integral-equation and
by the multilevel fast multipole method,” IEEE Antennas and Propagation
Magazine, vol. 49, no. 2, pp. 61—69, April 2007.

201

 

[102]

[103]

[104]

[105]

[106]

[107]

[108]

(109)

[110]

[111]

[112]

[113)

L. Hamilton, P. Macdonald, M. Stalzer, R. Turley, J. Visher, and S. Wandzura,
“3D method of moments scattering computations using the fast multipole
method,” Antennas and Propagation Society International Symposium, 1994.
AP-S. Digest, vol. 1, pp. 435-438, 20—24 Jun 1994.

T. J. Cui, W. C. Chew, G. Chen, and J. Song, “Efﬁcient MLFMA, RPFMA, and
FAFFA algorithms for EM scattering by very large structures,” IEEE Trans.
Antennas and Propagat., vol. 52, no. 3, pp. 759—770, March 2004.

K. Sertel and J. L. Volakis, “Effects of the fast multipole method (FMM) param-
eters on radar cross section computations,” IEEE AP-S Int. Symp., pp. 624—627,
1999.

Z. Q. Zhang, Q. H. Liu, and X. M. Xu, “RCS computation of large inhomo-
geneous objects using fast integral equation solver,” IEEE Trans. Antennas
Propagat., vol. 51, pp. 613—618, March 2003.

J. M. Song and W. C. Chew, “Fast multipole method solution using parametric
geometry,” Micro. Opt. Tech. Lett., vol. 7, pp. 760—795, 1994.

K. Donepudi, J. M. Jin, and W. C. Chew, “A higher order multilevel fast
multipole algorithm for scattering from mixed conducting/ dielectric bodies,”
IEEE Trans. Antennas Propag., vol. 51, pp. 2814—2821, 2003.

K. Donepudi, J. Song, J .-M. Jin, G. Kang, and W. Chew, “A novel implementa-
tion of multilevel fast multipole algorithm for higher order Galerkin’s method,”
IEEE Trans. Antennas and Propagat., vol. 48, no. 8, pp. 1192—1197, Aug 2000.

J. Liu and J .-M. Jin, “A novel hybridization of higher order ﬁnite element
and boundary integral methods for electromagnetic scattering and radiation
problems,” IEEE Trans. Antennas Propagat., vol. 49, no. 12, pp. 1794-1806,
Dec 2001.

H. Wallen, S. Jarvenpaa, P. Yla-Oijala, and J. Sarvas, “Broadband Muller-—
MLFMA for electromagnetic scattering by dielectric objects,” IEEE Trans.
Antennas Propagat., vol. 55, no. 5, pp. 1423—1430, May 2007.

V. J andhyala, B. Shanker, E. Michielssen, and W. Chew, “Fast algorithm for the
analysis of scattering by dielectric rough surfaces,” Journal of Optical Society
of America, vol. 15, pp. 1877—1885, 1998.

X. Q. Sheng, J. J. M., J. M. Song, C. W. C., and C. C. Lu, “Solution of
combined-ﬁeld integral equation using multilevel fast multipole algorithm for
scattering from homogeneous bodias,” IEEE Antennas and Propagation Maga-
zine, vol. 46, pp. 1718—1726, 1998.

C.-C. Lu, “A fast algorithm based on volume integral equation for analysis
of arbitrarily shaped dielectric radomes,” IEEE Trans. Antennas Propogat.,
vol. 51, no. 3, pp. 606—612, March 2003.

202

 

[114] J .-Y. Li and L.-W. Li, “Characterizing scattering by 3d arbitrarily shaped ho-

(115]

[116]

[117]

[118]

(119)

[120]

(121]

[122]

[123]

[124]

mogeneous dielectric objects using fast multipole method,” IEEE Antennas and
Wireless Propagation Letters, vol. 3, pp. 1—4, 2004.

G. Kobidze, K. Aygun, and B. Shanker, “Efﬁcient integral equation based anal-
ysis of scattering from PEC-anisotropic bodies,” in Antennas and Propagation
Society International Symposium, 2004. IEEE, vol. 4, pp. 3887-3890Vol.4, 20-
25 June 2004.

G. Kobidze and B. Shanker, “Integral equation based analysis of scattering
from 3-d inhomogeneous anisotropic bodies,” Antennas and Propagation, IEEE
Transactions on, vol. 52, pp. 2650—2658, Oct. 2004.

R. Wagner, J. Song, and W. Chew, “Monte carlo simulation of electromag-
netic scattering from two-dimensional random rough surfaces,” Antennas and
Propagation, IEEE Transactions on, vol. 45, no. 2, pp. 235—245, 1997.

M. El-Shenawee, C. Rappaport, E. Miller, and M. Silevitch, “Three-dimensional
subsurface analysis of electromagnetic scattering from penetrable/PEC objects
buried under rough surfaces: use of the steepest descent fast multipole method,”
IEEE Trans. Geoscience and Remote Sensing, vol. 39, no. 6, pp. 1174—1182, Jun
2001.

V. Jandhyala, B. Shanker, E. Michielssen, and W. Chew, “A combined steepest
descent-fast multipole algorithm for the analysis of three-dimensional scattering

by rough surfaces,” in Antennas and Propagation Society International Sympo-
sium, 1997. IEEE, 1997 Digest, vol. 4, pp. 2308—2311vol.4, 13—18 July 1997.

V. Jandhyala, E. Michielssen, B. Shanker, and W. Chew, “SDFMM-based fast
analysis of radiation and scattering ﬁom ﬁnite microstrip structuras,” in An-
tennas and Propagation Society International Symposium, 1998. IEEE, vol. 3,

pp. 1530—1533vol.3, 21-26 June 1998.

G. Hoyler and R. Unbehauen, “The fast multipole method for EMC problem,”
Electrical Engineering, vol. 80, pp. 403—411, 1997.

H. Y. Chao, W. C. Chew, E. Michielssen, and J. M. Song, “The multilevel fast
multipole algorithm for electromagnetic compatibility analysis,” IEEE-EM C
Int. Symp., vol. 2, pp. 844—847, 1999.

T. J. Cui, W. C. Chew, J.-S. Zhao, and H.-Y. Chao, “Full-wave analysis of
complicated transmission-line circuits using wire models,” Antennas and Prop-
agation, IEEE Transactions on, vol. 50, no. 10, pp. 1350—1360, 2002.

Z. Z. 00, L. Le-Wei, J.-Y. Li, and E.-P. Li, “Analysis and design of reﬂector
and horn antennas using multilevel fast multipole algorithms (MLFMA),” IEEE
AP—S Int. Symp., vol. 2, pp. 758—761, 22-27 June 2003.

203

 

 

[125]

[126]

[127]

[128]

[129]

[130)

[131]

[132]

[133]

[134]

[135)

J. Lord, S. Benham, and R. Mittra, “Multi-level fast multipole for antenna
modelling,” Antennas and Propagation, 2003. (ICAP 2003). Twelfth Interna-
tional Conference on (Conf. Publ. No. 491), vol. 1, pp. 308-311, 31 March-3
April 2003.

H.-Y. Chao, J.-S. Zhao, and W. C. Chew, “Application of curvilinear basis
functions and MLFMA for radiation and scattering problems involving curved
PEC structures,” IEEE Trans. Antennas and Propagat., vol. 51, no. 2, pp. 331—
336, Feb 2003.

K. Donepudi, J. M. Jin, S. Velamparambil, J. M. Song, and W. C. Chew,
“A higher-order parallelized fast multipole algorithm for 3D scattering,” IEEE
Trans. Antennas Propagat., vol. 49, pp. 1069—1078, 2001.

K. Donepudi, J. M. Song, J. M. Jin, G. Kang, and W. C. Chew, “Point-based
MLFMA for Galerkin’s method,” IEEE Trans. Antennas Propagat., vol. 48,
pp. 1192—1197, 2000.

L. Gurel and M. Aksun, “Fast multipole method in layered media: 2—D elec-
tromagnetic scattering problems,” in IEEE AP—S Int. Symp., pp. 1734-1737,
1996.

L. Gurel and M. Aksun, “Electromagnetic scattering solution of conducting
strips in layered media using the fast multipole method,” IEEE Microwave and
Guided Wave Letters, vol. 6, pp. 277—??, 1996.

J. M. Jin, The Finite Element Method in Electromagnetics. New York: Wiley,
2002.

S. Bindiganavale and J. Volakis, “A hybrid FE—FMM technique for electromag-
netic scattering,” IEEE Trans. Antennas Propagat., vol. 45, no. 1, pp. 180—181,
Jan 1997.

N. Lu and J .-M. Jin, “Application of fast multipole method to ﬁnite-element
boundary-integral solution of scattering problems,” IEEE Trans. Antennas Pro-
pogat., vol. 44, no. 6, pp. 781—786, Jun 1996.

Z. Zhao, L. Li, J. Smith, and L. Carin, “Analysis of scattering from very large
three-dimensional rough surfaces using MLFMM and ray-based analyses,” IEEE
Antennas and Propagation Magazine, vol. 47, no. 3, pp. 20—30, June 2005.

P.-L. Rui, R.-S. Chen, D.-X. Wang, and E. Kai-Ning Yung, “A spectral multi-
grid method combined with MLFMM for solving electromagnetic wave scatter-
ing problems,” IEEE Trans. Antennas Propagat, vol. 55, no. 9, pp. 2571-2577,
Sept. 2007.

204

 

[136]

[137)

[138]

[139]

[140]

[141]

[142]

[143]

[144]

[145]

[146]

T. Eibert, “Modeling and design of offset parabolic reﬂector antennas using
physical Optics and multilevel fast multipole method accelerated method of
moments,” Multitopic Conference, 2006. INMIC ’06. IEEE, pp. 377—382, 23—24
Dec. 2006.

L. Gurel, A. Manyas, and O. Ergul, “PO-MLFMA hybrid technique for the so-
lution of electromagnetic scattering problems involving complex targets,” An-
tennas and Propagation, 2007. EuCAP 2007. The Second European Conference
on, pp. 1—5, 11-16 Nov. 2007.

A. Tzoulis and T. Eibert, “Antenna modeling with the hybrid ﬁnite element
- boundary integral - multilevel fast multipole - uniform geometrical theory
of diffraction method,” Antennas, 2007. INICA ’07. 2nd International ITG
Conference on, pp. 91—95, 28-30 March 2007.

X. Yin, X. Zai, W. Hong, and X. Zhu, “A fast algorithm based on MEI and
FMM for ﬁeld prediction in urban mobile communications,” IEEE AP-S' Int.
Symp., vol. 2, pp. 1384—1387, Aug 1999.

D. Pissoort, E. Michielssen, D. Ginste, and F. Olyslager, “Fast-multipole anal-
ysis of electromagnetic scattering by photonic crystal slabs,” Lightwave Tech-
nology, Journal of, vol. 25, no. 9, pp. 2847—2863, Sept. 2007.

D. Ginste, E. Michielssen, F. Olyslager, and D. De Zutter, “An efﬁcient per-
fectly matched layer based multilevel fast multipole algorithm for large pla-
nar microwave structures,” IEEE Trans. Antennas Propagat., vol. 54, no. 5,
pp. 1538—1548, May 2006.

D. Pissoort, D. Ginste, and F. Olyslager, “Including PML-based absorbing
boundary conditions in the MLFMA,” IEEE Antennas and Wireless Propaga-
tion Letters, vol. 4, pp. 312—315, 2005.

V. Jandhyala, E. Michielssen, and R. Mittra, “Multipole-accelerated capaci-
tance computatiion for 3-d structures in a stratiﬁed dielectric medium using
a closed form green’s function,” Int. J. Microwave. Millimeter Wave Comput.

Aided Eng., vol. 5, pp. 68—78, 1995.

L. Gurel and M. I. Aksun, “Electromagnetic scttering solution of conducting
strips in layered media using the fast multipole method,” IEEE Microwave
Guided Wave Letters, vol. 6, pp. 277—279, 1996.

P. A. MacDonald and T. Itoh, “Fast simulation of microstrip structures using
the fast multipole method,” Int. J. Numerical Modelling: Electron. Networks,
Devices and Fields, vol. 9, pp. 345—357, 1996.

J. Zhao, W. C. Chew, C. C. Lu, E. Michielssen, and J. M. Song, “Thin-
stratiﬁed medium fast-multipole algorithm for solving microstrip structures,”
IEEE Trans. Microwave Theory Tech, vol. 46, pp. 395—403, 1998.

205

 

[147] V. Jandhyala, E. Michielssen, B. Shanker, and W. C. Chew, “A fast algorithm
for the analysis of radiation and scattering from arrays on ﬁnite substrates,”
Micro. Opt. Tech. Lett., vol. 23, pp. 306—310, 1999.

[148] F. Ling, J. M. Song, and J. M. Jin, “Multilevel fast multipole algorithm for
analysis of large scale microstripstructures,” IEEE Microwave and Guided Wave
Letts., vol. 9, pp. 508—510, 1999.

[149] B. Hu and W. C. Chew, “Fast inhomogeneous plane wave algorithm for elec-
tromagnetic solutions in layered medium structures-2D case,” Radio Science,
vol. 35, 2000.

[150] B. Hu and W. C. Chew, “Fast inhomogeneous plane wave algorithm for multi-
layered medium problems,” IEEE Trans. Geoscience. Remote Sensing, vol. 39,
pp. 1028—1038, 2001.

[151] N. Geng, A. Sullivan, and L. Carin, “Multi—level fast-multipole algorithm for
scattering from conducting targets above or embedded in a lossy half space,”
IEEE Trans. Geoscience Remote Sensing, vol. 38, pp. 1567-1579, 2000.

[152] N. Geng, A. Sullivan, and L. Carin, “Fast multipole method analysis of scatter-
ing from a three—dimensional target in a half-space environment,” IEEE Trans.
Antennas Propagat., vol. 49, pp. 740—748, 2001.

[153] Z. Liu, J. He, Y. Xie, A. Sullivan, and C. L., “Multi-level fast multipole al-
gorithm for general targets on a half-space interface,” IEEE Tans. Antennas
Propagat., vol. 50, pp. 1838—1849, 2002.

[154] Z. Liu, R. J. Adams, and C. L., “New MLFMA formulation for closed pec
targets in the vicinity of a half space,” IEEE Trans. Antennas Propagat., vol. 51,
pp. 2822—2829, 2003.

[155] X. Dong, Z. Liu, and L. Carin, “Volume and surface MLFMA formulations for
dielectric targets in the presence of a half space,” Radio Science, vol. 39, 2004.

[156] L. Li and L. Carin, “Multi-level fast multipole method calibration of ray models
with application to wireless propagation,” IEEE Trans. Antennas Propagat.,
vol. 52, pp. 2794—2799, 2004.

[157] M. A. Seville, Multilevel Multipole Free Fast Algorithm for Elecgromagnetic
Scattering Problems in Layered Media. PhD thesis, University of Illinois at
Urbana, 2006.

[158] R. L. Wagner, Eﬂicient computational techniques for electromagnetic propaga-
tion and scattering. PhD thesis, University of Illinois at Urbana-Champaign,
1996.

206

 

 

 

[159]

[160]

[161]

[162]

[163]

[164]

[165]

[166]

[167]

[168]

[169]

R. L. Wagner, J. Song, and C. W. C., “Monte Carlo simulation of electromag-
netic scattering from two-dimensional random rough surfaces,” IEEE Trans.
Antennas Propagat., vol. 45, pp. 235—245, 1997.

V. J andhyala, B. Shanker, E. Michielssen, and W. C. Chew, “A combined steep-
est dascent fast multipole algorithm for the fast analysis of three—dimensional
scattering by rough surfaces,” IEEE Trans. Geoscience Remote Sensing, vol. 36,
pp. 738—748, 1998.

V. Jandhyala, D. Sengupta, B. Shanker, E. Michielssen, M. Feng, and G. Still-
man, “Two-dimensional rough surface couplers for broadband quantum-well
infrared photodetectors,” Appl. Phys. Lett., vol. 73, pp. 3495—3497, 1998.

V. Jandhyala, D. Sengupta, B. Shanker, E. Michielssen, M. Feng, and G. Still-
man, “Efﬁcient electromagnetic analysis of two-dimensional ﬁnite quasi-random
gratings of quantum well infrared photodetectors,” J. Appl. Phys, vol. 83,
pp. 3360—3363, 1998.

A. Ergin, B. Shanker, and E. Michielssen, “The plane-wave time-domain algo-
rithm for the fast analysis of transient wave phenomena,” Antennas and Prop-
agation Magazine, IEEE, vol. 41, pp. 39—52, Aug. 1999.

B. Shanker, A. Ergin, K. Aygun, and E. Michielssen, “Analysis of transient elec-
tromagnetic scattering phenomena using a two-level plane wave time-domain
algorithm,” Antennas and Propagation, IEEE Transactions on, vol. 48, no. 4,

pp. 510—523, 2000.

M. Vikram, H. Grifﬁth, H. Huang, and B. Shanker, “A uniform framework for
time and frequency domain low-frequency problems,” in Proceedings of the 2007
IEEE Internation Symposium on Antennas and Propagation Society Interna-
tional, 2007.

B. Shanker and H. Huang, “Accelerated cartesian expansions—a fast method
for computing of potentials of the form r’” for all real V,” J. Comp. Phys,
p. doi:10.1016/j.jcp.2007.04.033, 2007.

L. Greengard and V. Rokhlin, “A new version of the fast multipole method for
the laplace equation in three dimensions,” Acta Numerica, vol. 6, pp. 229—269,
1997.

L. Greengard and V. Rokhlin, “A fast algorithm for particle simulations,” Jour-
nal of Computational Physics, vol. 73, pp. 325-348, Dec. 1987.

H. Cheng, L. Greengard, and V. Rokhlin, “A fast adaptive multipole algorithm
in three dimensions,” J. Comput. Phys, vol. 155, no. 2, pp. 468-498, 1999.

207

 

[170]

[171]

[172]

[173]

[174]

[175]

[176]

[177]

[178]

[179]

[180]

[181]

B. Shanker, A. Ergin, M. Lu, and E. Michielssen, “Fast analysis of transient
electromagnetic scattering phenomena using the multilevel plane wave time
domain algorithm,” Antennas and Propagation, IEEE Transactions on, vol. 51,
pp. 628—641, March 2003.

K. Aygiin, Novel fast algorithms for time domain integral equation analysis of
antennas and circuits. PhD thesis, University of Illinois at Urbana, 2002.

J. Meng, M. Lu, and E. Michielssen, “Towards efﬁcient and stable low fre-
quency time domain integral equation solvers,” in Proceeding of the 2006 IEEE
International Symposium on Antennas and Propagation, pp. 2955-2958, 2006.

J. Meng, M. Lu, and E. Michielssen, “A fast space-adaptive algorithm to evalu-
ate transient wave ﬁelds due to low-frequency source constellations,” in Proceed-

ings of the 2005 IEEE International Symposium on Antennas and Propagation,
vol. 3A, pp. 171—174, July 2005.

M. Vikram and B. Shanker, “An incomplete review of fast multipole methods
-from static to Wideband -as applied to problems in computational electromag-
netics,” ACES (accepted for publication ), 2008.

J .-S. Zhao and W. C. Chew, “Integral equation solution to Maxwell’s equations
from zero frequency to microwave frequency,” IEEE Trans. Antennas Propagat.,
vol. 48, pp. 1635—1645, 2000.

J. Mautz and R. Harrington, “An E—ﬁeld solution for a conducting surface small
or comparable to the wavelength,” Antennas and Propagation, IEEE Transac-
tions on, vol. 32, pp. 330—339, Apr 1984.

E. Arvas, R. Harrington, and J. Mautz, “Radiation and scattering from electri-
cally small conducting bodies of arbitrary shape,” Antennas and Propagation,
IEEE Transactions on, vol. 34, pp. 66—77, Jan 1986.

W. Wu, W. Glisson, and D. Kajfez, “A study of two numerical solution proce-
dures for the electric ﬁeld integral equation at low frequency,” A ppl. Computat.
Electromagn. Soc. J., vol. 10, pp. 69—80, 1995.

M. Burton and S. Kashyap, “A study of a recent, moment-method algorithm
that is accurate to very low frequencies,” Appl. Computat. Electromagn. Soc.
J., vol. 10, pp. 69—80, 1995.

I. Bogaert, J. Peeters, J. Fostier, and F. Olyslager, “NSPWMLFMA: A low
frequency stable formulation of the mlfrna in three dimensions,” Antennas and
Propagation Society International Symposium, 2008. AP-S 2008. IEEE, pp. 1—
4, July 2008.

S. Rao, D. Wilton, and A. Glisson, “Electromagnetic scattering by surfaces of
arbitrary shape,” Antennas and Propagation, IEEE Transactions on, vol. 30,
pp. 409—418, May 1982.

208

 

 

[182] J. Applequist, “Traceless cartesian tensor forms for spherical harmonic func-

tions: new theorems and applications to electrostatics of dielectric media,” J.
Phys. A: Math. Gen., vol. 22, pp. 4303—4330, 1989.

[183] A. Peterson, S. Ray, and R. Mittra, Computational Methods for Electromagnet-
ics. Institute of Electrical and Electronic Engineers (IEEE) Press, 1997.

[184] D. L. Colton and R. Kress, Integral equation methods in scattering theory. New
York: Wiley, 1983.

[185] K. E. Atkinson, The numerical solution of integral equations of the second kind.
Cambridge, UK: Cambridge Univ. Press, 1997.

[186] G. Vecchi, “Loop-star decomposition of basis functions in the discretization of
the eﬁe,” IEEE Trans. Antennas and Propagat., vol. 47, pp. 339—346, 1999.

[187] G. Hsiao and R. E. Kleinman, “Mathematical foundations for error estimation
in numerical solution of integral equations in electromagnetics,” IEEE Trans.
Antennas and Propagat., vol. 45, pp. 316—328, 1997.

[188] R. J. Adams and G. S. Brown, “Stabilization procedure for electric ﬁeld integral
equation,” Electronic Letters, vol. 35, pp. 2015-2016, 1999.

[189] R. J. Adams, “Physical and analytical properties of a stabilied electric ﬁeld
integral equation,” IEEE Trans. Antennas and Propagat., vol. 52, pp. 362—372,
2004.

[190] S. H. Christiansen and J .-C. N édélec, “A preconditioner for the electric ﬁeld
integral equation based on calderon formulas,” SIAM J. Numer. Anal, vol. 40,
no. 3, pp. 1100—1135, 2002.

[191] H. Contopanagos, B. Dembart, M. Epton, J. J. Ottusch, V. Rokhlin, J. Visher,
and S. Wandzura, “Well-conditioned boundary integral equations for three-
dimensional electromagnetic scattering,” IEEE Trans. Antennas and Propagat.,
vol. 50, pp. 1824—1830, 2002.

[192] R. J. Adams and N. J. Champagne, “A numerical implementation of the mod-
iﬁed electric ﬁeld integral equation,” IEEE Trans. Antennas and Propagat.,
vol. 52, pp. 2262—2266, 2004.

[193] F. Andriulli, K. Cools, H. Bagci, F. Olyslager, A. Buffa, S. Christiansen, and
E. Michielssen, “A multiplicative calderon preconditioner for the electric ﬁeld
integral equation,” Antennas and Propagation, IEEE Transactions on, vol. 56,
pp. 2398-2412, Aug. 2008.

[194] A. D. Yaghjian, “Augmented electric- and magnetic-ﬁeld integral equation,”
Radio Science, vol. 16, pp. 987—1001, 1981.

209

 

[195]

[196]

[197]

[198]

(199]

[200)

[201]

[202]

(203]

[204]

[205]

[206]

J. R. Mautz and R. F. Harrington, “H-ﬁeld, E—ﬁeld and combined-ﬁeld solutions
for conducting bodies of revolution,” Electronics and Communications, vol. 32,
pp. 157-164, 1978.

J. Mautz and R. Harrington, “A combined-source solution for radiation and
scattering from a perfectly conducting body,” Antennas and Propagation, IEEE
Transactions on, vol. 27, pp. 445—454, Jul 1979.

R. Shore and A. Yaghjian, “Dual-surface integral equations in electromag-
netic scattering,” Antennas and Propagation, IEEE Transactions on, vol. 53,
pp. 1706—1709, May 2005.

M. Vikram, H. Huang, B. Shanker, and V. Tri, “A novel Wideband FMM for
fast integral equation solution of multiscale problems in electromagnetics,” to
appear in IEEE Trans. Antennas and Propagat., July 2009.

B. Hariharan, S. Aluru, and B. Shanker, “A scalable parallel fast multipole
method for analysis of scattering from perfect electrically conducting surfaces,”
in Supercomputing ’02: Proceedings of the 2002 ACM/IEEE conference on Su-
percomputing, (Los Alamitos, CA, USA), pp. 1—17, IEEE Computer Society
Press, 2002.

L. Gurel and O. Ergul, “Fast and accurate solutions of extremely large integral-
equation problems discretised with tens of millions of unknowns,” Electronics
Letters, vol. 43, pp. 499-500, April 2007.

C. Waltz, K. Sertel, M. Carr, B. Usner, and J. Volakis, “Massively parallel
fast multipole method solutions of large electromagnetic scattering problems,”
IEEE Dans. Antennas and Propagat., vol. 55, no. 6, pp. 1810—1816, June 2007.

P. B. Callahan and S. R. Kosaraju, “A decomposition of multidimensional point
sets with applications to k-nearest neighbors and n-body potential ﬁelds,” Jour-
nal of the ACM, vol. 42, pp. 67—90, 1995.

M. S. Warren and J. K. Salmon, “Astrophysical n-body simulations using hier-
archical tree data structures,” Proceedings of Supercomputing, p. 570576, 1992.

M. Warren and J. Salmon, “A parallel hashed oct-tree n-body algorithm,” Pro-
ceedings of Super-computing, pp. 1—12, 1993.

P. Liu and S. Bhatt, “Experiences with parallel n-body simulation,” IEEE
Transactions on Parallel and Distributed Systems, vol. 11, pp. 1306—1323, 2000.

F. Sevilgen, S. Aluru, and N. Futamura, “A provably optimal, distribution—
independent parallel fast multipole method,” Proceedings of International Par-
allel and Distributed Processing Symposium IEEE Computer Society Press,
pp. 77—84, 2000.

210

 

[207] S. H. Teng, “Provably good partitioning and load balancing algorithms for
parallel adaptive n-body simulation,” SIAM Journal on Scientiﬁc Computing,
vol. 19, pp. 635—656, 1998.

[208] S. Velamparambil, J. E. Schutt-Aine, J. G. Nickel, J. M. Song, and W. C. Chew,
“Solving large sclae electromagnetic problems using a linux cluster and parallel
MLFMA,” IEEE AP—S Int. Symp., vol. 1, pp. 636—639, 1999.

[209] O. Ergul and L. Gurel, “Efﬁcient parallelization of the multilevel fast multipole
algorithm for the solution of large-scale scattering problems,” IEEE Trans. An-
tennas Propagat., vol. 56, pp. 2335—2345, 2008.

[210] S. Velamparambil and W. C. Chew, “Analysis and performance of a distributed
memory multilevel fast multipole algorithm,” IEEE Trans. Antennas Propagat.,
vol. 8, pp. 2719—2727, 2005.

[211] O. Ergul and L. Gurel, “Hierarchical parallelisation strategy for multilevel fast
multipole algorithm in computational electromagnetics,” Electronics Letters,
vol. 44, 2008.

[212] J. Fostier and F. Olyslager, “A provably scalable parallel multilevel fast multi-
pole algorithm,” Electronics Letters, vol. 44, pp. 1111—1113, Sept. 2008.

[213] M. Vikram, A. Baczewski, B. Shanker, and S. Aluru, “Parallel accelerated carte-
sian expansions for particle dynamics simulations,” to appear in Proceedings of
International Parallel and Distributed Processing Symposium IEEE Compute
Society Press, 2009. '

[214] M. Vikram, A. Baczewski, B. Shanker, and L. Kempel, “Accelerated cartesian
expansion (ace) uniﬁed framework for the rapid evaluation of potentials asso-
ciated with the diffusion, lossy wave and klein-gordon equations,” Submitted to
Journal of Computational Physics, 2008.

[215] G. M. Morton, “A computer oriented geodetic data base and a new technique
in ﬁle sequencing,” tech. rep., Tech. Rep. IBM.

[216] M. Griebel, S. Knapek, and G. Zumbusch, Numerical simulation in Molecular
Dynamics. Springer-Verlag, 2007.

[217] A. Tamburrino, R. Fresa, S. Udpa, and Y. .Tian, “Three-dimensional defect
localization from time-of—ﬂight / eddy current testing data,” IEEE Transactions
on Magnetics, vol. 40(2), pp. 1148—1151, 2004.

[218] P. M. Morse and H. Feshbach, Methods of Theoretical Physics. McGraw-Hill
Science / Engineering / Math, 1953.

[219] W. Boettinger, J. Warren, C. Beckermann, and A. Karma, “Phase-simulation
of solidiﬁcation,” Annual Review of Materials Research, vol. 32, pp. 163—94,
2002. .

211

 

 

[220] D. Ariﬁn, L. Lee, and C. Wang, “Mathematical modeling and simulation of drug
release from microspheres: Implications to drug delivery systems,” Advanced .
Drug Delivery Reviews, vol. 58, pp. 1274—1325, 2006.

[221] A. E. Yilmaz, D. S. Weile, B. Shanker, J.-M. Jin, and E. Michielssen, “Fast
analysis of transient scattering in lossy media,” IEEE Antennas and Wireless
Propagation Letters, vol. 1, pp. 14—17, 2002.

[222] V. Rokhlin, “Rapid solution of integral equations of classical potential theory,”
J. Comput. Phy., vol. 60, pp. 187—207, 1985.

[223] E. Bleszynski, M. Bleszynski, and T. Jaroszewicz, “Aim: Adaptive integral
method for solving large-scale electromagnetic scattering and radiation prob-
lems,” Radio Sci, vol. 31, no. 5, pp. 1225—1251, 1996.

[224] S. Kapur and D. E. Long, “Ies3: Eﬂicient electrostatic and electromagnetic
simulation,” IEEE Computational Science and Engineering, vol. 5, pp. 60—67,
Oct. 1998.

[225] L. Greengard and V. Rokhlin, “A new version of the fast multipole method for
the laplace equation in three dimensions,” Acta Numerica, vol. 6, pp. 229—269,
1997.

[226] L. Greengard and J. Strain, “The fast gauss transform,” SIAM J. Sci. Stat.
Comput., vol. 12, pp. 79—94, January 1991.

[227] J. Strain, “Fast adaptive methods for the free-space heat equation,” SIAM J.
Sci. Statist. Comput., vol. 15, pp. 186—206, 1992.

[228] C. Yang, R. Duraiswami, N. A. Gumerov, and L. Davis, “Improved fast gauss
transform and efﬁcient kernel density estimation,” in I CCV ’03: Proceedings of
the Ninth IEEE International Conference on Computer Vision, (Washington,
DC, USA), p. 464, IEEE Computer Society, 2003.

[229] L. Greengard and J. Strain, “A fast algorithm for the evaluation of heat poten-
tials,” Comm. Pure Appl. Math, vol. 43, no. 8, pp. 949—963, 1990.

[230] J. Tausch, “A fast method for solving the heat equation by layer potentials,”
Journal of Computational Physics, vol. 224, no. 2, pp. 956—969, 2007.

[231] L. Greengard and P. Lin, “Spectral approximation of the free-space heat kernel,”
Appl. Comput. Harmon. Anal, vol. 9, no. 1, pp. 83—97, 2000.

[232] J .-R. Li and L. Greengard, “On the numerical solution of the heat equation i:
fast solvers in free space,” Journal (of Computational Physics, 2007.

[233] R. Harrington, Field Computation by Moment Methods. Wiley-IEEE Press,
1993.

212

 

[234] M. Lu, K. Yegin, B. Shanker, and E. Michielssen, “Fast time domain integral
equation solvers for analyzing two-dimensional scattering phenomena, part i:
temporal acceleration,” Electromagnetics, vol. 24, no. 6, pp. 425-449, 2004.

[235] A. E. Yilmaz, J. M. Jin, and E. Michielssen, “Analysis of low-frequency elec-
tromagnetic transients by an extended time-domain adaptive integral metho,”
IEEE Trans. on Advanced Packaging, vol. 30, pp. 301—312, May 2007.

[236] The Design and Implementation of FFTW3, vol. 93, 2005.

[237] S. S. Udpa (Editor) and P. O. Moore (Editor), Handbook of Nondestructive
Evaluation: Electromagnetic Testing, vol. 5. ASNT, 2004.

[238] W. C. Chew, Waves and Fields in Inhomogeneous Media. Piscataway, NJ: IEEE
Press, 1995.

[239] T. J. C111 and W. C. Chew, “Fast algorithm for electromagnetic scattering by
buried 3-D dielectric objects of large size,” IEEE Transactions on Geoscience
and Remote Sensing, vol. 37, pp. 2597—2608, Sept. 1999.

[240] D. H. Schaubert, D. R. Wilton, and A. W. Glisson, “A tetrahedral modeling
method for electromagnetic scattering by arbitrarily shaped inhomogenous di-
electric bodies,” IEEE Trans. Antennas and Propogat., vol. 32, pp. 77—85, Jan
1984.

[241] D. Wilton, S. Rao, A. Glisson, D. Schaubert, O. Al-Bundak, and C. Butler,
“Potential integrals for uniform and linear source distributions on polygonal
and polyhedral domains,” IEEE Transactions on Antennas and Propagation,
vol. 32, pp. 276—281, Mar. 1984.

[242] H. A. Sabbagh, J. R. Bowler and L. D. Sabbagh, “A Volume Integral Code for
Eddy-Current Non-destructive Evaluation,” Journal of Applied Computational
Electromagnetics Society, vol. 4, pp. 3—22, 1989.

213

 

      

 

11111111111111111111

062 6034

' 11111111111111

   

. . :::::