This is to certify that the dissertation entitled "Parallel Homotopy Algorithm for Symmetric Large Sparse Eigenproblems" presented by Liang Jiao Huang has been accepted towards fulfillment of the requirements for the Ph.D. degree in Mathematics.

PARALLEL HOMOTOPY ALGORITHM FOR LARGE SPARSE SYMMETRIC EIGENPROBLEMS

Liang Jiao Huang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Mathematics

1992

ABSTRACT

PARALLEL HOMOTOPY ALGORITHM FOR LARGE SPARSE SYMMETRIC EIGENPROBLEMS

By Liang Jiao Huang

In this work, we apply the homotopy method to solve the eigenproblem Ax = λx, λ ∈ R, x ∈ R^n \ {0}, for a large sparse symmetric matrix A. A one-parameter family of matrices A(t) = tA + (1 − t)D is introduced, and the eigenproblem is considered for t ∈ [0, 1]. The problem of choosing an optimal starting matrix A(0) = D is discussed and partial solutions are obtained. The regularity and bifurcation problems of λ(t) and x(t) are also considered. It is found that these functions can be chosen as analytic functions of t and that the bifurcations are of simple type. A homotopy continuation algorithm is then constructed, and several new techniques are developed to handle the curve-following process more efficiently. Finally, the algorithm is implemented on both parallel and vector machines, and numerical results are obtained for many typical test matrices. Our preliminary experiments show that the homotopy continuation method is very promising.

To my wife Lu Ping

iii

ACKNOWLEDGMENTS

I would like to thank Professor Tien-Yien Li, my dissertation advisor, for suggesting the problem and for the directions he provided, which made this work possible. I would also like to thank him for his encouragement and support during my graduate study at Michigan State University. I would like to thank my dissertation committee members, Professor Qiang Du, Professor Dennis Dunninger, Professor Richard Hill, and Professor David Yen, for their valuable suggestions and time.

iv

Contents

List of Tables v
List of Figures vi
1 Introduction 1
2 The Choice of Starting Matrix 4
2.1 Introduction ................................ 4
2.2 Unitarily Invariant Norms ........................ 6
2.3 Block Diagonal Approximants ...................... 7
3 Regularity and Bifurcation 14
3.1 Introduction ................................ 14
3.2 Regularity ................................. 15
3.3 Bifurcation Directions .......................... 16
3.4 Continuity of Eigenvectors ........................ 18
3.5 Close Eigenvalues ............................. 20
4 The Algorithm 23
4.1 Choice of the Starting Matrix ...................... 23
4.2 Location of the Starting Points ..................... 24
4.3 Prediction ................................. 24
4.4 Correction ................................. 26
4.5 Dynamic Subspace Iteration ....................... 32
4.6 Checking .................................. 33
4.7 Clearing Up ................................
35 4.8 Step Size Control ............................. 35 Numerical Experiments 37 5.1 Introduction ................................ 37 5.2 Test Matrices ............................... 38 5.3 Results on IBM 3090 Vector Machine .................. 41 5.4 Results on BBN Butterfly Parallel Machine ............... 44 5.5 Accuracy and Orthogonality ....................... 46 Bibliography 48 vi List of Tables 5.1 5.2 5.3 5.4 5.5 5.6 5.7 Test data from IBM ........................... 41 Interior 20 eigenpairs on IBM 3090 ................... 42 First 50 eigenpairs on IBM 3090 with perturbed starting matrix . . . 43 Speed-up for homotopy algorithm .................... 45 Speed-up over EA15 ........................... 45 The residuals of eigenpairs ....................... 46 The orthogonalities of eigenvectors ................... 47 vii List of Figures 2.1 2.2 4.1 4.2 5.1 5.2 5.3 Curves corresponding to D1 ....................... 5 Curves corresponding to D2 ....................... 5 The change of clusters from t to t + h .................. 26 An isolated eigenvalue at t becomes nonisolated at t + h ....... 28 Execution time vs. matrix order ..................... 44 Speed-up for homotopy ......................... 45 Speed-up over EA15 ........................... 45 viii Chapter 1 Introduction Large scale scientific computing is currently a very active research field. Tradi- tional methods which work well for small problems are often not suitable for large problems, and not suitable for modern computer architectures. For example, the very efficient and widely used method for solving eigenproblems of small matrices — the QR iteration method [25], becomes inapplicable for large sparse eigenvalue problems because, among other things, the process of Householder reduction can quickly de- stroy the sparse pattern. Moreover, the method is highly serial in nature, it is difficult to fully exploit the power of modern computers. The best known method which can be used to solve large scale eigenvalue problems is the Lanczos method [5]. It can take advantage of the sparseness structure of a given matrix and is good for finding a few extreme eigenvalues. However, it is not a parallel method either, and it is not efficient for finding interior eigenvalues. In this work, we propose a method, called the homotopy continuation method, which is suitable for parallel solution of large sparse symmetric eigenvalue problems. The basic idea of the method is described in the following (see [17] and [18] for more details). Given a real symmetric matrix A of order n, instead of solving for all the eigen- values of A directly, we choose another real symmetric matrix D, called the starting matrix, and consider the one-parameter family of matrices A(t) = D + t(A — D) for t 6 [0,1]. (1.1) This family has the property, A(O) = D, A(1) = A. For each t 6 [0,1], let the eigenvalues of A(t) be A1(t) S A2(t) S - - - S An(t). It is well known that eigenvalues of a matrix are continuous functions of its elements, so in this case each A;(t) is a continuous function of t for i = 1,2, - ~ - , 12. Therefore, there are n continuous curves, which we shall call eigencurves. Suppose we have the eigenvalues of D — A1(0),A2(0),-- - ,An(0). From these values, the starting points of n eigencurves are known. By following these continuous eigencurves, we can reach the ending points A1(1),A2(1),- .. ,An(1) of the curves, which are the eigenvalues of the given matrix A. 
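For a small dense example, the eigencurves of (1.1) can be sampled directly. The following sketch (illustrative Python, not the FORTRAN implementation described in Chapter 5; the matrix size, block sizes, and random seed are arbitrary choices) evaluates A(t) on a grid of t values and prints the ordered eigenvalues, which trace the n continuous curves from the spectrum of D at t = 0 to the spectrum of A at t = 1.

```python
# Illustrative sketch: sample the eigencurves of A(t) = D + t*(A - D)
# for a small dense example, so the n continuous curves from eig(D)
# at t = 0 to eig(A) at t = 1 can be seen.
import numpy as np

n = 6
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                      # symmetric test matrix

D = np.zeros((n, n))                   # block diagonal starting matrix:
D[:3, :3] = A[:3, :3]                  # two diagonal blocks copied from A
D[3:, 3:] = A[3:, 3:]

for t in np.linspace(0.0, 1.0, 5):
    At = D + t * (A - D)               # the homotopy family (1.1)
    lam = np.linalg.eigvalsh(At)       # eigenvalues in ascending order
    print(f"t = {t:4.2f}  lambda(t) = {np.round(lam, 3)}")
# At t = 0 these are the eigenvalues of D; at t = 1, those of A.
```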
The main advantages of our method are:

- It is parallel in nature: the tracing of each eigencurve is independent of the others.
- The main calculation is concentrated on solving large sparse linear systems, and unlike solving for eigenvalues directly, several good packages are available for solving such linear systems efficiently [8], [9], [11].
- It can be used to find only a few specified eigenvalues. (In contrast, the Lanczos method tends to give the extreme eigenvalues at both ends of the spectrum before interior eigenvalues emerge, and it cannot tell which eigenvalue has been found or what multiplicity it has.)

The dissertation proceeds as follows. Chapter 2 discusses the problem of choosing a starting matrix D; some optimal solutions to this question are found under certain conditions. Chapter 3 addresses the regularity and bifurcation problems. Under the assumption that A and D are symmetric, the eigensystems are analytic. Also, with A and D symmetric, the bifurcations are easy to handle: bifurcation directions can be readily computed using a simple formula. Chapter 4 describes our homotopy algorithm for following the eigencurves. Chapter 5 presents the test matrices, the software used, and the numerical results on two typical machines, one a vector machine and the other a parallel machine. The results show that the homotopy continuation method is very promising for large sparse symmetric eigenproblems.

Chapter 2

The Choice of Starting Matrix

2.1 Introduction

Choosing a good starting matrix D plays a very important role in our homotopy algorithm. This can be seen from the following example.

Example 2.1: Let A be a 10 x 10 symmetric random matrix, D1 be a random diagonal matrix, and D2 be a block diagonal matrix with two diagonal blocks taken directly from A. That is, if

A = [ A11  A12 ]
    [ A21  A22 ]

then

D2 = [ A11   0  ]
     [  0   A22 ]

where the A_ij are 5 x 5 matrices for i, j = 1, 2. We construct the homotopy (1.1) for each starting matrix D1 and D2, and plot the eigencurves in Figure 2.1 and Figure 2.2, respectively. It can be seen that the eigencurves corresponding to D2 are much straighter, and thus much easier to follow, than those corresponding to D1. Obviously, D2 is a better starting matrix than D1. The following theorem helps in deciding whether a matrix D is a good starting matrix.

Figure 2.1: Curves corresponding to D1
Figure 2.2: Curves corresponding to D2

Theorem 2.1 Let A and D be given symmetric matrices and let λ1(t), λ2(t), ..., λn(t) be the eigenvalues of A(t) in (1.1). Then |λ'_i(t)| ≤ ||A − D||_2 for each i = 1, 2, ..., n.

Proof: By Weyl's Theorem [2],

|λ_i(t + Δt) − λ_i(t)| ≤ ||A(t + Δt) − A(t)||_2 = |Δt| ||A − D||_2.

Hence

| (λ_i(t + Δt) − λ_i(t)) / Δt | ≤ ||A − D||_2.

Because λ_i(t) is a differentiable function of t [24], letting Δt → 0 yields

|λ'_i(t)| ≤ ||A − D||_2.

This completes the proof.

One of our goals in constructing a good homotopy is to find a starting matrix D so that the eigencurves are straighter. The theorem above indicates that one should look for a D which, among other conditions, minimizes ||A − D||_2. For the matrix A in Example 2.1, it will be proved in Corollary 2.4 that

||A − D2||_2 ≤ ||A − D1||_2.   (2.1)

(In fact, we will show

||A − D2||_2 = min_{D ∈ 𝒟} ||A − D||_2,   (2.2)

where 𝒟 is the set of matrices having the same structure as D2.) This explains why the eigencurves are better behaved when D2 is used as the starting matrix. So the problem of finding a good starting matrix becomes a matrix approximation problem.
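As a concrete check of this comparison, the sketch below (illustrative Python, not part of the dissertation's FORTRAN code; the helper name and random seed are arbitrary) builds one random instance of Example 2.1 and prints the two distances in the spectral norm. By Corollary 2.4 the value for D2 can never exceed the one for D1, since a diagonal matrix is itself block diagonal with this structure.

```python
# One random instance of Example 2.1: compare ||A - D||_2 for a random
# diagonal D1 and for the block diagonal D2 whose blocks are copied from A.
# (Illustrative sketch; helper name and seed are arbitrary choices.)
import numpy as np

def diagonal_blocks(A, sizes):
    """Block diagonal matrix built from the diagonal blocks of A."""
    D = np.zeros_like(A)
    start = 0
    for m in sizes:
        D[start:start + m, start:start + m] = A[start:start + m, start:start + m]
        start += m
    return D

rng = np.random.default_rng(1)
B = rng.standard_normal((10, 10))
A = (B + B.T) / 2                          # 10 x 10 symmetric random matrix
D1 = np.diag(rng.standard_normal(10))      # random diagonal starting matrix
D2 = diagonal_blocks(A, [5, 5])            # two 5 x 5 blocks taken from A

print("||A - D2||_2 =", np.linalg.norm(A - D2, 2))
print("||A - D1||_2 =", np.linalg.norm(A - D1, 2))
```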
In this chapter, an important type of norms in matrix theory is introduced first, then the approximation problem arises in our homotopy method is considered. 2.2 Unitarily Invariant Norms Some notations are needed for future reference: M = { Complex matrices of order n } Ll = { Unitary matrices of order n } D = { Block diagonal matrices of order n with a given block structure } A unitarily invariant norm is a matrix norm which satisfies “A“ = IIUAII = IIAVII for every A E M and U, V G Ll. A result of J. von Neumann characterizes all unitarily invariant norms by symmetric gauge functions of singular values [28]. Two important classes of unitarily invariant norms are: Schatten p norms llAllp =(28j(4)”)1/’, p 2 1 i=1 and Ky Fan 1: norms I: IIAllk = ZsJ-(A). k =1.2.---.n i=1 where {3j(A), j = 1,2, - - - ,n} are the singular values of A in descending order. It is easy to see that Schatten 2 norm is the Frobenius norm || - "F and Ky Fan 1 norm is the spectral norm || - "2. 2.3 Block Diagonal Approximants A good choice for the starting matrix is based on three considerations. First, the eigensystems of D should be easy to find; secondly, D should be as close to A as possible, so that the eigencurves are better behaved [l7]; finally, D itself should be easy to obtain. However, these considerations often conflict with each other in the sense that if D is too close to A, then eigensystems of D could be as difficult to find as those of A’s; on the other hand, if eigensystems of D are easy to find, then D is usually not very close to A. One natural candidate for the starting matrix D is the block diagonal matrix [ Dll ]. D D = 22 _ (2.3) L D“: . where D,-.- are smaller square matrices for 2' = 1,2,--- ,lc. For such a matrix D, its eigensystems can be found from those of D33, and each D53 eigensystems can be easily found since it is a small matrix. Furthermore, by using multi-processors, eigensystems of D can be computed in parallel. Therefore, matrices of this form satisfy our first consideration. Let D be of the form in (2.3). The next question is, how to find D of this form which is closest to a given matrix A. In other words, let D be the set of matrices of the form in (2.3), we want to find Do 6 D such that "A - D0" = mDin ”A — D“. (2.4) To the best of our knowledge, this type of matrix approximation has never been investigated before. To measure the closeness of two matrices, unitarily invariant norms are used as the underlying norms. This is because we are considering eigenvalue problems and these norms are simple functions of eigenvalues (by von Neumann’s characterization [28]). For the Frobenius norm, the solution of (2.4) is obvious. We can partition A into blocks according to the structure in (2.3) and choose Do to be the block diagonal matrix whose diagonal blocks are identical to the corresponding diagonal blocks of A. That is, if A11 A12 ° ° ' All: A A - ~ - A A = 21 22 2!: (2.5) _ Akl Akz AH: . then _ An A Do = 22 0 . (2.6) AH: For general unitarily invariant norms, however, the solution is not so obvious. One would hope that Do above is the choice since it meets all our three considerations. It turns out that this is true for an important class of matrices. In this section, we prove some positive results first, and then a counterexample is constructed to show that the result can not be extended to cover all matrices. Theorem 2.2 Let A and Do be matrices in (2.5) and (2.6). 
If there exists a unitary matrix E G Ll which commutes with every D E D such that A — D0 = —E(A — Do)EH, (2.7) then Do is the best block diagonal approximant of A for every unitarily invariant norm, i.e., [IA — D0" = minpep ||A — D”. For the proof of Theorem 2.2, the following lemmas are needed. They were first proved by Fan in [10]. Lemma 2.1 Let A,B E M ,then ”A” S ”B“ holds for all unitarily invariant norms || - I] if and only if “Alla S “Bill: 8 holds for all Ky Fan norms || - “k, k =1,2, - . - ,n. Lemma 2.2 Let A 6 M, X)‘ = {2:1,:c2,~-,xk} and Y), = {y1,y2,~--,yk} be sets of orthonormal vectors in C", then I: l: “Alli: = 231M) = 5,113]; 2 Rei-Tjiij). j=l k k I j=l Proof of Theorem 2.2: Let A0 = A — D0 = U EV” be a singular value decomposi- tion of A0, with U =(U1, U2, ' ' ' a It“), 2 = diag(31(A0)a ' ' ' I 3n(A0))a V = (‘01, 02, ° ° ' ,vn)- Because A0 = —EA0E”, A0 = —E(U2V”)E” = (—EU)E(EV)" where -EU and EV are unitary matrices. This gives another singular value decom- position of A0. Thus, (“n/1005) = {—Eu,,AoEv,-) = 31(40)- By Lemma 2.2, for any D E D and k S n, k ”A - Dllk Z Z: Rein» (A - 0M) i=1 l: I: = 2011, Ao‘vj) + Z Re(u,-, (Do — 13M) i=1 i=1 l: = ||A - Dollk + 2 new.» (Do - D)vj)- (2-8) i=1 On the other hand, since —EU and EV are unitary matrices, l: ”A - Dllk Z 2 34-511» (A - D)Ev,-) i=1 k = Z(—Euj, AoEvj) + i: Re(—E’Uj, (DO - D)Evj) i=1 i=1 k k = 231-(Ao)— Z Re(Eu,-, E(Do - Dlvj) j=l ’=l I: = ”A — D0“). — Z Re(u,-, (Do — D)v,-). (2.9) i=1 9 Combining inequalities (2.8) and (2.9), k IIA - DIII: Z llA - Dollk + I: 38064130 - D)vj)| i=1 > llA - Dollh- This completes the proof because of Lemma 2.1. I Remark 2.1: In fact, it can be seen from the proof that the following more general result has been established: Let A and Do be any two given matrices, if there exists a unitary matrix E that commutes with Do such that A -— D0 = —E(A — D0)E”, then Do is closest to A among all matrices that commute with E, that is, IIA — Doll = 02,293,, ||A - D". An important class of matrices satisfying (2.7) is the class of block tridiagonal matrices. Corollary 2.3 If A is a block tridiagonal matrix, then the best block diagonal ap- proximant of A is Do for every unitarily invariant norms. Proof: Condition (2.7) is satisfied for E = diag(Il, —Iz, - - . ,(—1)"‘11k), with each I, an identity matrix of appropriate order. I Block tridiagonal (in particular, tridiagonal) matrices arise in many applications. In fact, in matrix eigenvalue computations, a given matrix is usually transformed into a compact form ——— tridiagonal or block tridiagonal form by Householder or Lanczos transformation, then QR iteration, bisection, or homotopy method is applied to ob- tain the solutions. Corollary 2.4 Let A be any square matrix, if we partition A into then D0 = A11 0 0 A22 is the best approximant of A among all block diagonal matrices of the form for every unitarily invariant norm. Proof: This is a special case of Corollary (2.3) with I: = 2. I Inequality (2.1) and equation (2.2) of Example 2.1 at the beginning of this chapter follow from this corollary. It is well known that the time required to find eigensystems of a matrix of order n is proportional to n3. If A is divided into two blocks of approximately equal sizes as in Corollary 2.3, then D is closest to A and eigensystems of D can be solved by parallel computer using about 1 / 8 execution time for solving eigensystems of A. 
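To make the 1/8 figure explicit (assuming the usual cubic cost model for a dense eigensolver): each block of order n/2 costs about (n/2)^3 = n^3/8 operations, and with the two blocks assigned to separate processors the wall-clock time is about n^3/8, one eighth of the roughly n^3 cost of treating A directly; even on a single processor the total work 2(n/2)^3 = n^3/4 is only a quarter of it.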
Furthermore, if block sizes of D are still too large to work with, they can be divided into smaller blocks and the “divide and conquer” strategy can be employed. For general matrices, the conclusion of Theorem 2.2 may not be true. This can be seen in the following example. Example 2.2: Let ’ 0 2 2 7 A = 2 0 2 . 2 2 0 . If we choose block size to be 1, then . 0 0 0 I D0 = 0 0 0 L 0 0 0 . The eigenvalues of A are A1 = 4, A2 = —2 and A3 = —2, therefore for spectral norm ll - Ila, "A - Do||2 = {2332; IM = A: = 4 11 However, llA-1||2=112&§|M-1|=3, i.e., IIA - IIla < llA - Dolls- Thus Do is not the best approximant. In fact, it can be shown that I is the best approximant in this case: first of all, A — I has orthonormal eigenvectors r a F 1 r 1 l 0 -—2 1‘ " 1 l 1‘ " l 1 1 — 1 1 1 fl 1 2 fl ) 3 «6 a 1 —1 1 J with corresponding eigenvalues A1 = 3, A2 = —3 and A3 = —3. If there exists a matrix D such that "A — DI]; < ”A - 1]]2, then by Courant-Fischer min-max theorem, we must have Kiwi/1 - D)x,-)| = K3313 (A - [)331') +($1a(1 - Dlxjil = It; + ($1, (I - D)$j)| S ||A - D||2 < [IA-III2= Mil. i=1,2,3. Let I — D = diag(d1,d2,d3) , then the above inequalities yield ($1, (I -- D)$1) == (d1 + (I: + d3)/3 < 0, (2.10) (32, (I — D)x2) = (.12 + d3)/2 > o, (2.11) (1'3, (1 - D)$3) = (4d1 + d2 + d3)/6 > 0. (2.12) It follows from (2.10) and (2.11), d1 < 0. However, (2.10) and (2.12) imply 3d; > 0, Thus such a matrix D does not exist, and the assertion is achieved. Another example of larger order can be found in [22] where the conclusion was justified by numerical results. Although the question of choosing the starting matrix is not completely answered, the matrix D in (2.6) is the best choice in several important cases (such as block tridiagonal A) and is the best in all cases for Frobenius norm. 12 Remark 2.2: Our results provide a general guide line for choosing a starting matrix. In a practical problem, usually more special properties about the given matrix A is known, this allows other choices for D as long as it satisfies our three general considerations. For instance, when a physical problem is investigated, in order to determine the final parameters for the problem, it is necessary to do a series of experiments. In each experiment, the parameters are adjusted only by a small amount. If the question is related to an eigenvalue problem, then there is a family of matrices with the same (or similar) structures, each matrix is a small perturbation of the previous one (except the first). In such a case, the natural choice for the starting matrix is one of the matrices in the family whose eigensystems have been found, and we expect eigensystems of the new matrix to be a “small” perturbation of the starting matrix. 13 Chapter 3 Regularity and Bifurcation 3.1 Introduction It is well known that eigenvalues are continuous functions of the entries of the matrix. However, they are generally not differentiable. For example, consider A(t)=[l t]. (3.1) 1 1 its eigenvalues are A1(t) = 1 — \/t and A2(t) = 1 + fl. They are not differentiable at t = 0. This happens because, at t = 0, A(O) has multiple eigenvalues A1(0) = A2(0) = 1, that is, eigencurves have a bifurcation point. The behavior of eigenvectors at a bifurcation is even worse. They may not even be continuous. The following example is attributed to W. Givens in [21], §3.1: A(t) = [ 1+tcos(2/t) —t sin(2/t) J (3.2) —t sin(2/t) 1 — tcos(2/t) has eigenvalues A1(t) = 1 -t and A2(t) = l + t. They are analytic functions of t. 
However, the corresponding eigenvectors are cos(1/t) sin(l/t) 31“) = i $2“) = v [ —sin(1/t) ] [cos(I/t) [ 14 which have no limits as t —-v 0 ! Again the problem arises because A(t) has a double eigenvalue at t = 0. For our homotopy algorithm, handling the bifurcation efficiently becomes a very important problem. The behavior of eigencurves around such points can be quite complicated. Nevertheless, for symmetric matrices, bifurcation is not as difficult. In fact, a result proved in 19403 ([24], Chapter I) guarantees that A;(t) and x.-(t) can be chosen in such a way that they are all analytic functions of t. In such a case, the eigencurves through a bifurcation point are well behaved. A more general result of this type is considered one of the major breakthroughs in the past fifty years in eigenvalue perturbation theory [14]. 3.2 Regularity We summarize those results that related to our problem in the following theorem. Theorem 3.1 Suppose A and D are both real symmetric matrices. Let A(t)=tA+(1 -t)D= D+t(A—D), then the eigensystems (A10), 31(0), (02“), $20)), ' ' ' a(An(t)1 xn(t)) of A(t) can be chosen in such a way that all functions involved are analytic functions for real t. Furthermore, there are only finitely many t 6 [0,1] such that A(t) has multiple eigenvalues. Proof: see [14]. I Remark 3.1: In our algorithm, preserving the order A1(t) S A2(t) S - . - S A..(t) is very important. With such an order, we can implement parallel processing and compute only partial eigensystems when necessary. However, when this is imposed, 15 the conclusion of the above theorem is no longer valid. For example, let 1 1 — 2t A(t) = (3'3) 1 — 2t 1 then A1(t) = 2t, A2(t) = 2 — 2t and they are analytic. With the ordering, however, A1(t) = 1-[1—2tl, Ag(t) = 1+]1-2tl. They are not analytic at t = 1/2. Nevertheless, after the ordering, these A;(t)’s are still piecewise analytic and one-sided derivatives Ag")(t+) always exist. This is sufficient for our numerical implementation, since we trace eigencurves forward, only the right hand derivatives are needed. 3.3 Bifurcation Directions Suppose (MU), 31“», (A2“), 320)), ' ° ' 9 (Ana), $n(t)) are the eigensystems of A(t) as in Theorem 3.1, then Acme) = Aj(t)=vj(t). (3.4) Taking the right-hand derivatives with respect to t on both sides of this equation yields A'(t)x,-(t) + A(t)x;-(t+) = A;(t+)$j(t) + A,(t)x;~(t+). (3.5) Multiplying the above equation on the left by xflt), we have A}(t+) -_- xf(t)A'(t):c,-(t). But A(t) = A — D, so A;(t+) = zf(t)(A — D)x,~(t). (3.6) With this last formula, the prediction-correction scheme can be applied to numerically compute the eigensystems. However, at a bifurcation point, eigenvectors are not uniquely defined. For an eigenvalue of multiplicity 1:, any I: orthonormal vectors l6 from the Ir dimensional invariant subspace form an eigenbasis for that subspace. By Theorem 3.1, there is at least one way in choosing an appropriate set of eigenvectors for each t such that xJ-(t) becomes analytic. This choice is not known beforehand. And (3.6) can only be applied for this set of xj(t)’s. An alternative way of computing these bifurcation directions is given as follows. Theorem 3.2 Suppose A;(t) = Ag+1(t) = - - - = Ag+k-1(t) are k multiple eigenvalues of A(t). Let y1(t), y2(t), - - - , y)‘(t) be any I: orthonormal eigenvectors corresponding to these eigenvalues. Then A;(t+),A:-+1(t+),--- ,»\:-+,,_1(t+) equal the k eigenvalues of YT(A — D)Y, where Y is the matrix consisting of y.-(t),y.-+1(t),- - - ,y,-+k-1(t) as its columns. 
Proof: The proof of a more general result can be found in [14]. By using equation (3.5) a very simple proof can be obtained here. Let x.-(t), x.-+1(t), - - - , $g+k_1(t) be the analytic eigenvectors corresponding to A;(t) = Ag+1(t) = - - - = Ag+k-1 (t) as in Theorem 3.1. Multiply (3.5) by x,T(t) from the left, xl(t)(A — Din-(t) + Az(t)zi(t)x;(t+) = A}(t+)zi(t)x.-(t) + A1(t)wi(t)z;(t+)- Since A)(t) = /\,(t) and xjr(t)x,-(t) = 6;,- for i S l, j S i + k — 1, the above equation yields x,T(t)(A — D)x,-(t) = a,,1§(1+), for iS 1, j g i + k -— 1. (3.7) Let X be the matrix consisting of x,-(t), xg+1(t), - - - , $g+k_1(t) as its columns. By (3.7), P A;(t+) A2..(t+) A'(t+) a XT(A — D)X = (3.8) _ A:I'i"k--1(tnl-) J Because both {y,-(t), y.-+I(t), - - - , yg+k_1(t)} and {x.-(t), $g+1(t), - - - , xg+k_1(t)} form or- thonormal bases for the invariant subspace, there exists an orthogonal matrix Q such that Y = X Q Hence 17 YT(A - D)Y -_- QT[XT(A — D)X]Q. It follows that YT(A - D)Y and X T(A - D)X are similar matrices and have the same eigenvalues. By (3.8), they are A;(t+), A;+,(t+), - - - , Ai+k-1(t+)- I 3.4 Continuity of Eigenvectors The Rayleigh quotient iteration (RQI) is one of the most important methods in our algorithm (see Chapter 4). The convergence of ROI depends not only on a good choice of eigenvalue prediction, but also on a good eigenvector prediction. Usually, the eigenvector at step t is used as the starting vector for RQI at step t + At. In earlier discussion, homotopy algorithm computes eigenvectors at t = 0 as follows: find the eigenvectors for each diagonal block of A(O) = D then extend them to full length vectors with appropriate zero entries. The following example shows that this can be an inefficient choice. Let [11] [10] A: , D: , (3.9) 11 01 and A(t) = (l — t)D + tA, then A(t) = [ 1 t ] . (3.10) t 1 According to the earlier algorithms, eigenvectors of A(O) = D should be (by consid- ering D as a block diagonal matrix with two 1 x 1 diagonal blocks), $1(0)=[1], 32(0)=[0]. 0 1 However, for any t > 0, A(t) has eigenvectors 1 l 1 1 x1(t)_7§[l], x2(t)—7§ -1]. 18 These eigenvectors are not even continuous at t = 0. In fact there is a 45° turn in both vectors. Using x1(0) and x2(0) as starting vectors for ROI in the computation of x1(t) and xg(t) is obviously a bad choice. Again, the problem occurs because A(t) has multiple eigenvalues at t = 0, and any two orthonormal vectors can serve as base vectors for the corresponding invariant subspace. An important question is then, how can one find a set of eigenvectors at step t that can be turned continuously into a set of eigenvectors at step t + At? Using the notations in Theorem 3.2, we have Theorem 3.3 Suppose YT(A —- D)Y has eigendecomposition QA’QT. If A:‘(t'l")i Ai+1(t+)9 ' ' ' a Ai-i-k—l (t+) are distinct, then the columns of YQ are, up to a sign, the analytic eigenvectors described in Theorem 3.1. Proof: Let X be the matrix with analytic eigenvectors x.(t). 3:410), ° ° ° , $i+k—1(t) as its columns. Since both X and Y consist of orthonormal eigenvectors from the same invariant subspace, there is an orthogonal matrix Q such that X = YQ. Substituting into (3.8), yields QTO’TM - D)Y)Q = A'. 01' YT(A — D)Y = QA’QT. On the other hand, YTM — D)Y = QA'QT, QA'QT = QA’QT. 19 Because diagonal matrix A’ has distinct values A;(t+), A2+1(t+), . - - , Aj+k_1(t+) along its diagonal, the columns of Q and Q can differ at most by a sign. Hence the columns of YQ and X = YQ can differ at most by a sign. 
I In conclusion, when bifurcation occurs at t, YT(A — D)Y is formed by using the computed eigenvector matrix Y, then the eigendecomposition QA’QT is computed for this I: X It matrix YT(A — D)Y. From this decomposition, the bifurcation directions can be obtained. If these directions are mutually distinct, then YQ is formed and its columns are taken as a set of improved eigenvectors at this step, which can be used to speed up the iterations in later steps. In the example considered at the beginning of this section, a simple computation generates the improved eigenvectors at t = 0 1 1 1 1 31(0)=—‘/-§[1], 32(0)=7.2.[_1]. They turn out to be the exact eigenvectors at t = 1. 3.5 Close Eigenvalues Results in previous sections are derived under the assumption that A(t) has multiple eigenvalues at some t. In real computations, however, it is hardly distinguishable whether a matrix has true coincidental eigenvalues or pathologically close ones. For example, the famous Wilkinson matrix [101 l 19 1 W5: 10 1 191 110 20 has distinct eigenvalues since it is an irreducible symmetric tridiagonal matrix, but its first two eigenvalues are so close that they agree to 15 significant digits ([29], p.308-309). That is, these eigenvalues are practically indistinguishable. Moreover, even if the eigenvalues are distinct, formula (3.6) may not be appropriate to apply when eigenvalues are close, since in this situation the corresponding eigenvectors can be very sensitive. The computed eigenvector yj(t) may be significantly different from the true one xj(t), making yf(t)(A — D)y,-(t) inaccurate. In such a case, we will treat them together as a cluster. In the rest of this section, we demonstrate that the result of Theorem 3.2 can also be used in the case of close eigenvalues. Theorem 3.4 Let c > 0, suppose A(t) has eigenvalues A;(t),)..-+1(t),- - - ,z\.~+)¢-1(t) such that |A.-(t) — Aj(t)| S c forj = i+ l, . - - ,i+ k — 1. Then there exists a matrix E such that B(t) = A(t) + E has A,(t) as its k-multiple eigenvalue and “E”; S 6. Proof: Suppose A(t) has eigendecomposition QAQT, where Q is an orthogonal matrix and A = diag().l, A2,. - - ,An). Let A = diag( 61, 62, - . - ,6") be the diagonal matrix such that 51 = = 5a = 0, 5m = A; - Ai+la"'96i+k—l = A; - Aawe-1, 51+): = = 6,, = 0. Set E = QAQT and B(t) = A(t) + E. It is easy to see that B(t) has k-multiple eigenvalue )1,- and "Eng = IIQAQT“; = ”Aug S e. I Suppose B(t) has eigensystems (p1(t), 21(t)), (p2(t), 22(t)), - - - , ([1,,(t), 2,,(t)). When A(t) has close eigenvalues, e is small, hence B(t) is a small perturbation of A(t). Al- though eigenvectors corresponding to close eigenvalues are very sensitive, a collection of sensitive eigenvectors can define an insensitive invariant subspace provided the corresponding cluster of eigenvalues is isolated ( [12], p.199-208). So X = spanks“), $1410), ' ' ' , $i+k-1(t)} and Z = span{z.(t).z.-+1(t).- ' ° , 2141—10)} 21 are close. Let X = [x.-(t) xg+1(t)-- -x.-+k-1(t)], Z = [2,-(t) z;+1(t) . - -Z,'+k_1(t)], then the eigenvalues of XT(A — D)X and ZT(A — D)Z, are close since XT(A — D)X and ZT(A — D)Z are the projections of A — D onto X and Z , respectively. Now apply Theorem 3.2 to matrix B(t), o(ZT(A — D)Z) = {,1;(t)|j = 1,1 + 1, - . -,i + k — 1}. And the first order prediction gives #10 + At) t” #10) + #j-(UN = Mt) + Ilj-(tlAt- By Weyl’s Theorem [2], PW + At) - 2110+ At)l S Ilr‘1(t + At) - B(t + At)” = HE” S 6, hence A,(t + At) e ,1,(t + At) rs ,\.(t) + p;(t)At. It follows that we may replace #2“) by an eigenvalue of X T(A — D)X. 
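A minimal sketch of this uniform treatment (illustrative Python; the function name is hypothetical and the production code works with sparse data structures): given an orthonormal basis Y of the cluster's invariant subspace, the eigenvalues of Y^T(A − D)Y serve as the one-sided derivatives of Theorem 3.2, and YQ gives the improved eigenvectors of Theorem 3.3 when those derivatives are distinct.

```python
# Cluster treatment of Theorems 3.2-3.3 (illustrative sketch): the
# eigenvalues of Y^T (A - D) Y approximate the one-sided derivatives of
# the k curves in the cluster, and Y @ Q is an improved eigenvector basis.
import numpy as np

def cluster_directions(A, D, Y):
    """Y: n-by-k orthonormal basis of the cluster's invariant subspace."""
    H = Y.T @ (A - D) @ Y              # small k-by-k symmetric matrix
    omega, Q = np.linalg.eigh(H)       # omega ~ derivatives of the k curves
    return omega, Y @ Q                # rotated (improved) eigenvectors
```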
Hence close eigenvalues and multiple eigenvalues can be handled in a uniform way. 22 Chapter 4 The Algorithm The tracing of eigencurves consists of the following main steps: 4.1 Choice of the Starting Matrix To start the algorithm, we need to choose a starting matrix D first. From the results developed in Chapter 2, block diagonal matrix with blocks directly from A will be our choice unless more information about the problem is revealed. At this stage, the sizes of the blocks remain undecided. It is clear that D will be closer to A if its diagonal blocks are larger, and the eigencurves will be better behaved. However, because of the requirement that the eigensystems of D should be much easier to find than those of A, the sizes of the blocks of D should be kept under certain limit. In our algorithm, QR iteration is used to solve for eigensystems of D. This requires that each block size be no more than a few hundred. Another important factor in choosing D is the computer architecture. When more processors are available, one can choose more (hence smaller) blocks; otherwise, fewer (hence larger) blocks should be used. In the test examples we run in BBN Butterfly Machine, we choose the block size to be around one hundred. This seems to be optimal for most of the cases comparing with other choices. In general, no formula is set to automatically generate the block 23 sizes of D. 4.2 Location of the Starting Points When the starting matrix D is chosen, we need to find values A;(0), x;(0) for i = 1, 2, . - - , n, that is, the eigenvalues and eigenvectors of D. Since D is a block diagonal matrix, its eigenvalues consist of eigenvalues from all diagonal blocks, its eigenvectors can be obtained from eigenvectors of diagonal blocks by extending them to full di- mension vectors with appropriate zero entries. In a parallel architecture, each block is assigned a processor to perform QR iterations on the block, and the results are collected to form the eigensystems of D. The QR iteration method is used since it is one of the best methods for matrices of relatively small size. Finally the eigenvalues of D are arranged in descending order to prepare for parallel processing in later steps. It is clear that this step has high parallelization level and requires relatively small amount of execution time. 4.3 Prediction From the previous step, eigenvalues )1,(0) and eigenvectors x,(0) are available for j = 1,2,- - - ,n. Assuming in general that Aj(t),x,-(t) at t have been obtained, we shall apply the prediction — correction scheme to compute the eigenvalues A,- and eigenvectors x,- at t + h. The choice of the step size h will be discussed in a later section. Here, we first consider the prediction procedure. In this step we not only predict the eigenvalue A,(t + h), but also predict whether or not it is an isolated eigenvalue of A(t + h). For isolated eigenvalues and for clusters, different correction methods are used. Case 1: If the j-th eigenvalue Aj(t) of A(t) is simple and well separated from the other eigenvalues, then X-(t) = xf(t)(A — D)x,-(t). J 24 Let the j-th predicted eigenvalue of A(t + h) be 1,(t + h) as ,\,-(t) + 1;.(t)h = A,(t) + zf(t)(A — D)x,-(t)h. All quantities on the right hand side of the above equation are available from step t. For the eigenvector, we simply use $j(t) as the prediction for xJ-(t + h). There are several reasons for this choice: 0 Although in theory one can use xJ-(t + h) z raj-(t) + x;(t)h, (4.1) but the high cost in computating x;(t) makes it an impractical approach. 
0 Eigenvectors are more sensitive to perturbation than eigenvalues ([29], p.331- 335). It is difficult to compute the derivative xs-(t) to a high accuracy. Hence (4.1) usually does not yield significant improvement of xj(t + h). e In our correction step, the eigenvector prediction is not as important as the eigenvalue prediction. Case 2: Aj(t) is in a cluster of eigenvalues of A(t), then we proceed as follows. 1) Find It — the number of eigenvalues in the cluster; 2) Form X = [2:5, $5+1,°°-,.’Bg+k_1] where the corresponding A;,---,A,-+k_1 form a cluster; 3) Form XT(A - D)X; 4) Calculate eigendecomposition X T(A — D)X = QQQT; 5) Set X = XQ. 25 clusterof3 clustcrofS t t+h Figure 4.1: The change of clusters from t to t + h Now suppose fl=diag(w,- 025+] - - -w.-+),_1). By Theorem 3.2, Log, wi+1, -- -, wg+k_1 are the approximations of the derivatives of A,(t). Thus the predictions Aj(t + h) z Aj(t) + wjh (4.2) xJ-(t + h) z x_,-(t). (4.3) can be used. If some at; (i S j S i + k — 1) is an isolated value inside (0;, 10,-1.1, - - -, wg+g_1, then Aj(t + h) is likely to be out of the cluster at t + h (Figure 4.1). So it should be labeled as an isolated value and treated as in Case 1. If, on the other hand, some wj’s are among a cluster of tag, 025+], - - -, wg+k_1, the corresponding Aj’s are likely to form a new cluster at t + h (Figure 4.1), hence they are collected in a group and handled together at the correction step. The predictions for them are the same as in (4.2) and (4.3). 4.4 Correction From the last step, in addition to the predictions for Aj(t + h) and x_,-(t + h), the information about the isolation of A,(t + h) is also available. If it should be treated as 26 an eigenvalue in a cluster, the number of eigenvalues in the cluster is also known, then Subspace Iteration with Rayleigh-Ritz Procedure (SIRR) is used for the correction: Suppose {A,-, A,“ , - - - , L414} is a. cluster and the corresponding approximate eigenvectors are {x,-,x,°+1,- - - ,x.-+),-1}. Let 1 A = I“; + Ai-l-l + ° ° ° + Ai+k—l), X = [31° $£+1 ' ° ' 335+k-1l- Solve (A — AI)Y1 = X for Y1, then repeat the following for V = 1,2, - -- : 1) Decompose Yu = QuRu; 2) Solve (A — AI)Yy = Qu; 3) Form Hy = QZ'(A - D)-1Qu = QZYu; 4) Decompose H” = GVOVGI, where G is orthogonal matrix, Oyzdiag(0,-,0,-+1, ° - ° , 9.41:4); 5) Form Yu+1 = QuGu 6) Test for the convergence of 0;, 0;“, - - - , 0,-+).-1. Goto 1) until all 0;, 0,-+1,- - -, 9.4.1-1 converge. If A,- (t + h) can be treated as an isolated eigenvalue, Rayleigh Quotient Iteration (RQI) is the choice. However, in our algorithm, RQI is not applied directly at the very first step because of the following two observations: a) Although A,(t) is isolated, Aj(t + h) may not be so (Figure 4.2), and our prediction step can not detect such a situation. When this happens, if RQI is used, those undesirable behaviors of the algorithm, such as converging to an eigenvalue far from the original prediction, may occur [20]. b) Although RQI converges cubically, it usually needs a few iterations before achieving this rate (unless the starting values are extremely good). It is quite expensive to use when cubic rate can not be achieved. 27 P cluster of 2 isolated t (+11 Figure 4.2: An isolated eigenvalue at t becomes nonisolated at t + h To overcome these difficulties, two additional procedures are introduced. The first one is the Inverse Iteration (INVIT) procedure. It can be used to compute a better approximate eigenpair (A, x) by using inverse iteration recursively. It is cheaper than RQI. 
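For reference, the basic Rayleigh quotient iteration used on isolated eigenvalues looks as follows (simplified, dense Python sketch; the actual code uses the sparse LDL^T solver and the safeguards described below, and the function name, tolerance, and iteration limit are illustrative choices).

```python
# Simplified sketch of Rayleigh quotient iteration for correcting an
# isolated eigenpair of the symmetric matrix A(t + h); the INVIT/SUBDIM
# pre-steps and the inertia check of the algorithm are omitted here.
import numpy as np

def rqi(At, lam0, x0, tol=1e-10, maxit=10):
    """Refine the predicted pair (lam0, x0) for the symmetric matrix At."""
    x, lam = x0 / np.linalg.norm(x0), lam0
    n = At.shape[0]
    for _ in range(maxit):
        y = np.linalg.solve(At - lam * np.eye(n), x)   # shifted solve
        x = y / np.linalg.norm(y)
        lam_new = x @ At @ x                           # Rayleigh quotient
        if abs(lam_new - lam) <= tol * max(1.0, abs(lam_new)):
            return lam_new, x
        lam = lam_new
    return lam, x
```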
The second one is called SUBDIM, which is designed to detect whether an approximate eigenvalue falls inside a cluster of eigenvalues. If it does, SUBDIM will compute the dimension of the invariant subspace corresponding to this cluster of eigenvalues and also generates an approximate basis for the subspace. Otherwise, SUBDIM is equivalent to an additional step of inverse iteration. With the help of these two procedures, the correction goes as follows: First, a few steps of INVIT are used to obtain a better approximate eigenpair (A,x), then SUBDIM is called to determine if A is an isolated eigenvalue. If so, the faster method — RQI is applied for the rest of the correction steps. The starting values (A,x) for the iteration are now inside or closer to the cubic converging range of ROI. If A is not an isolated eigenvalue, then from SUBDIM, the dimension of an invariant subspace corresponding to eigenvalues close to A and an approximate basis for the subspace are available. Then, similar to the first case described at the beginning of this section, subroutine SIRR is applied to further improve the subspace to the desired accuracy. 28 The detail of INVIT and SUBDIM is described in the rest of this section. (a). INVIT: This is a slightly different version of ordinary inverse iteration procedure: 0) set xlo) = x,-(t,-), p = A;(t,-+1), k = 0 1) solve (Assn—1111“) = z“) (4.4) for yl"). 2) compute 7), = ||y(")|| and let (1:) 30+!) ._._ L. '71: 3) if 7), S M and 71/714 2 a then k=k+l, goto 1) else let A = p(x("+1)), x = xU‘“), quit; where 112””) = (x<*+”)TA(t.-..)x<*+'> is the Rayleigh quotient. There are two control parameters in this procedure, namely M and a. In our algorithm, we set M = 10‘/p and a = 1.2. It is well known that xl") will converge to an invariant subspace corresponding to the eigenvalues close to p and —) . l . mlnlp—Agl ‘71: The convergence comes in two ways: if the prediction [1 is very close to some true eigenvalues (min In — A.-I < 1/M), 7), > M is quickly satisfied; on the other hand, if p is not close to any particular eigenvalue (min lp — A,-| > 1/M), then we must wait for the second condition 7k/7k_1 < a to be satisfied. Inequality 75/71,-1 < 0 indicates that 7). will not grow significantly in the following iterations. In this case, the inverse iteration procedure has been stabilized or converged. For our purpose, this procedure 29 is not affected by close eigenvalues, since one of the two conditions must be satisfied in the end. There are other choices for M and 0. Since we will switch to a faster method eventually, it is appropriate to impose a loose converging condition. (b). SUBDIM — for computing the dimension and basis of the invariant sub- space corresponding to the eigenvalues close to A: Suppose (A19 $1)1(A29 33), ' ' ' 1(An1 37!) are n eigenpairs 0fA(tj+1)tlet {3’11 3"), ' ' ' 1 Sin} be a set of linearly independent vectors chosen beforehand. Take yl, orthogonalize it against x1 and call it yl again. Then 0) let k=l, 21: x i) solve (A(t...) — 102 = y. (4.5) for 2 ii) compute ||z||. If "z” is large, then let 21+] = z/||z||, take yk+1, use MGS to orthogonalize it against {21, 22, - - - , 2H,}, let iterat=1,k=k+l, goto i). else if iterat = 1, orthogonalize 2 against {21,Zg,'°',2k}, let 3),, = z/||z||, iterat = iterat+l, goto i). end if. where MGS stands for modified Gram—Schmidt process [12]. From this procedure, It is the dimension of invariant subspace and {21, 22, . - - , 2k} is an approximate basis. The theoretical background for this procedure is the following. 
30 Suppose A1, A2, . - - , Am are in a cluster for some 1 S m < n such that max IA) — AIS 6, min IA) — AI 2 a > e. (4.6) lSlSm m-HSlSn Expand the vector y), in (4.5) in terms of x1, x2, - - - , x": n m n w: = 2 “(on = 2 “la-TI + 2: “lib”: 314' silo) + ill”, i=1 l= l=m+l then the solution of (4.5) is Z = (A(tm) - All-1w: = (A(tm) - All-1 20951 l=l 1'! l=1 m (k) n (k) a! at = x; + x). g A) — A , Ea, A) — A The norm of z“) is not big, in fact, (I) = (al ) < Zl=l(al ) <1 II2 II J z —— _ .- l=m+1 [Al - Alz - minm+lSlSn [At - Al If k < m, y), is orthogonal to 17), = {21, 22, - - - , 2),}, hence y), has nontrivial component ylo) in Vm, and the norm of z grows rapidly during the iteration; in fact, "2“ 2 "2(0)” = \Ji “(f-[2" Z lIla/lo)"- l=1 IA; — A] 6 Hence ||z|| becomes large in one or two iterations, and when the procedure continues, the subspace dimension continues to grow. When one reaches 1: = 171, however, y), is orthogonal to the whole subspace 17", which is a good approximation to Vm, hence yl‘o) z 0, and ”z” is not large. Then the procedure ends. In such a way, we can detect the size of the cluster and generate a set of vectors which is a good approximate basis for the corresponding invariant subspace. 31 It is necessary to be specific about “large” and “not large” when this algorithm is used. Notice that keeping on iterating (4.5) yields, Ila/lo)" -* 1. llylnll -' 0. hence in the end NZ" 2 alt—t Therefore, we can say, for example, "2" is large if it is greater than l/2e, it is not large otherwise, where e is as in (4.6). Our experience is that, one needs not be too restrictive, because underestimating the cluster dimension is more serious than overestimating it, which can cause the slow convergence of the subspace iteration. One remedy for the possible incorrect dimension count is that one can use “dynamic” scheduling (describe in the next section) during the progress of the subspace iteration. 4.5 Dynamic Subspace Iteration When the convergence of the Rayleigh quotient iteraion or subspace iteration is too slow (more than three iterations, for instance), it indicates that the estimate of the subspace dimension is inappropriate, either too big or too small. The iteration process needs to be monitored more carefully, and the subspace dimension should be ajusted during the progress when it becomes necessary: after a few steps of Rayleigh-Ritz procedure [23], only those Ritz vectors with their Ritz values close together are kept for further iterations. The subspace dimension is then reduced and the condition of the subspace is improved. This in turn will reduce the computation and speed up the convergence. If, on the other hand, all the Ritz values are close and convergence is still slow, then it becomes necessary to enlarge the subspace. This can be achieved by calling procedure SUBDIM to determine a more accurate dimension for the in- variant subspace, and the subspace iteration resumes after this. Such a process often accelerates the convergence considerably. 32 Remark 4.1: For large sparse matrix, factorization is feasible, but it is still one of the major parts of the computation when solving an equation. Therefore we keep the number of factorizations to a minimum in our algorithm by including all the procedures such as INVIT, SUBDIM, SIRR, and ROI in the algorithm. INVIT, SUBDIM, and SIRR are all iterative methods. They require solving the systems with different right hand sides only, that is, for different iteration step, new factorization is not needed. 
These methods are not the fastest, but by combining with faster methods such as RQI, one can still achieve high convergent rate to reduce the computation time. 4.6 Checking When RQI or SIRR is used, equations of the type (A(t) - #03 = y are solved. By taking advantage of the symmetry, the so called symmetric Gauss elimination method [3] can be applied. That is, A(t) - 111 can be decomposed as LDLT (instead of the traditional LU form), where L is a lower triangular matrix and D is a block diagonal matrix with diagonal blocks of order one or two. Using Sylvester’s Theorem, the inertia (u, C, 7r) of A(t) — #1 is available as a by-product of the decomposition, where u,(, and r are the number of eigenvalues of A(t) which is greater than, equal to and less than a, repectively. With the availability of the inertia, the location of 11 relative to other eigenvalues of A(t) is known. To see how the checking is done, let’s assume the eigenvalues of A(t) are A1 S A; S - - . S An and ROI is applied to compute the eigenvalues. Suppose the lc-th step of RQI is: 1). Solve (A(t) — 11021)le = x“) for y“). 2). Compute 7), = ||y(")|| and set self“) = ylkl/ryk. 3). Set 110‘“) = (x("+1))TA(t)x("+1) and goto 1). 33 Suppose the convergence is observed at step m. Let the inertias corresponding to A(t) — p('”‘1)I and A(t) — ”("01 be (um-1,(m_1,1rm-1) and (um, (mtrm) respectively. If we denote i = um-“ then A.- S plm'll S Ag“. Now the checking can be done as follows: 1). If V... = i and pl”) S [Am-1), set A,- = pl”). (m) 11 l l 2., 11““) 1.1.1 2). If V... = i and 11“”) > fl(m-1), set A,“ = til”). (at) u A, “(m-1) 114-1 3). If um = i + 1, set A,“ = 11"”). l4‘01!) A, “(m-1) Am 4). If um = i — 1, set A,- = 11“"). ”(m) A, ”(m-1) At+1 Hence with minimum computations, we know whether the process converges to the eigencurve being followed. In general, jumping to a neighboring curve does not occur when eigenvalue separation is good. When the separation is poor, a cluster of eigenvalues are computed which usually includes the one we wanted. Our iterations in the correction steps are successful most of the time. In case the process does converge to a neighboring eigenvalue, there is a good chance that this value is also needed and the corresponding eigencurve has not been traced. Usually, the number of eigenvalues sought is larger than the number of processors available, so a total waste is unlikely. 34 For example, suppose we have 100 eigencurves to trace and there are 10 proces- sors available for the purpose. Assume the scheduling is arranged as follows: The first processor is assigned to trace A1(t),A2(t),---,Alo(t); the second processor is assigned to trace Au(t),A12(t),o - ',A20(t); and so on. If jumping occurs when the first processor is following A, (t), then the computed eigenvalue should be one among A2(t), A3(t), - . - , A10(t), say A3(t). Then tracing of the third eigencurve can be skipped. 4.7 Clearing Up During the tracing of eigencurves, a significant amount of time is spent on keeping the process on the right curve. There are several existing techniques for this purpose [l7],[18]. However, there are cases where these techniques are too costly. Several new approaches have been introduced in our algorithm. They are designed especially for large matrices. Instead of imposing very strong restrictions on those control parameters, our strategy is to abandon the process of tracing the curve if convergence to a desired value is not observed after reasonable efforts have been made. 
In the end, most of the needed eigenvalues of A are computed. A few of those missed are scattered across the spectrum of A and are isolated by those found, that is, there are several small intervals containing those missed values. By counting the inertias corresponding to the end points of each interval, the number of eigenvalues inside such an interval is known. These values are well separated from the neighboring ones since clusters have been found by SUBDIM in the algorithm. Under these favorite conditions, subspace iteration method can be applied in each interval efficiently to find the eigenvalues inside it, and this can be done in parallel. 4.8 Step Size Control Our extensive numerical experience shows that the step size should be chosen as large as possible. Allowing small step size can lead to inefficiency. In our algorithm, the 35 first attempt of the step size h is chosen to be max{0.25, (1 — t)/2}. If convergence is not achieved at some step and h > 0.25, the step size is cut in half and the process is repeated. If the failing occurs when h S 0.25, we will simply abandon the process of following this specific curve. Then the eigenvalue may be computed by other process (such as SUBDIM if the value is among a cluster) or may be considered missed. Because we have a back up procedure for those eigenvalues missed due to the abandence of tracing the corresponding curves, this “giving up” strategy does not cause problem, instead, it is more efficient to have a lower bound imposed on h. There are cases that, under no control limit, h becomes very small, hence causing the long execution time at a single “bad point”. In our many examples, the largest number of missed eigenvalues is usually less than 10% of the total values sought. 36 Chapter 5 Numerical Experiments 5.1 Introduction Our algorithm has been implemented in several machines using FORTRAN language. The sparse matrix A is held using coordinate scheme ([8], p.23—24), that is, the matrix is specified as a set of triples (a;,-, i, j ), they are stored in one real array and two integer arrays. Because A is symmetric, only the upper triangular part is stored. In our algorithm, sparse linear systems of the type (A(t) — AI)y = x (5.1) need to be solved frequently. The intensive research in sparse matrix techniques in the past two decades resulted in several efficient codes for solving such systems. We choose MA27 subroutines from Harwell Subroutine Library for this purpose. This code can solve indefinite symmetric systems, such as (5.1), stably and with minimum overhead above the code for positive definite systems. It can also obtain the inertia of A(t) — AI as a by-product. Both of these features are vitally important to our algorithm. The code MA27 has three separate steps: Symbolic Analyse, Factorize, and Solve. Symbolic Analyse exploits the sparse structure of the matrix and estimates the work- ing space needed for later steps. Factorize implements a version of Gauss Elimination 37 to compute the LDLT decomposition of the matrix. The last step, Solve, uses this decomposition to actually solve the matrix system. Since the underlying matrices A(t) have the same sparse structure for all t 6 [0,1], Symbolic Analyse is required only once in the whole algorithm. In addition, for inverse iterations, one needs to solve systems with different right hand sides only, new LDLT decomposition is not necessary, i.e., only the last step, Solve, is called for each iteration. 
This leads to significant saving in time because Factorize is the most expensive one of the three. 5.2 Test Matrices Four types of matrices are used to test our algorithm. The first two can be found in [5]. They are common matrices for testing purpose. 1. Diagonally-disordered matrices: These matrices arise in the study of two-dimensional N X x NY arrays of atoms in disordered systems [15]. The associate matrix A has order N = N X X NY, [B I 1' [$1 1l I B 1x A: ,3: I ".1 .I I B] L1 lx‘ The matrix A is almost block tridiagonal. I is the identity matrix of order N X . Each B block is N X X N X and there are NY blocks down the diagonal of A. The diagonal elements of B are randomly generated numbers. A scaling parameter bounds the mag- nitudes of these disordered terms. In the simplest case, these entries are chosen from a uniform random distribution over an interval [~SCALE, SCALE] determined by the user-specified scaling parameter SCALE. A second class of diagonally-disordered ma- trices is obtained by choosing the diagonal elements randomly as either 0 or SCALE. If N X and NY are coprime, then all of the eigenvalues of A are distinct. 38 2. Poisson matrices: When the Laplace equation um+uW = Au, R: {(x,y)|0 S x S X,0 S y S Y} is solved numerically, the differential equation is replaced by an algebraic linear sys- tem, Ax = Ax, obtained from discretizing the Laplace operator on the rectangle. The order of the resulting matrix is N = N X x NY where N X is the number of subdivisions in x-direction and NY is the number of subdivisions in y-direction. The matrix A is block tridiagonal, r - 1 —3c B C —sc 1 —c C B —c A: C 1 B: —c C B 30 —c l —sc 30 B ' ‘ _ —sc 1 ] The parameter c is user-specified and must be chosen such that 0 < c S 0.5. Here, C = —(0.5 — 0)]. For Dirichlet boundary conditions 3 = 1, and for Neumann con— ditions s = J2. Under Dirichlet conditions, the eigenvalues (and eigenvectors) of A are known, A;,- = [1 — 2ccos(1ri/(NX +1)) — 2(0.5 — c)cos(1rj/(NY +1))], 1SiSNX,1SjSNY. Under Neumann conditions, however, the eigenvalues and eigenvectors are not known a priori. By varying the value of c, many different eigenvalue distributions can be obtained. 3. Random matrices: The sparse matrices in this group are generated in three steps: Let N be the order of the matrix. We use rand to represent random number generator that produces uniform distribution on interval (0,1). Then 39 0) Clear the matrix: for I=1,- - -, N; J=I,- - -, N A(I, J) = 0.0. 1) Generate N entries randomly in the upper triangular part of A: for i=1,- - -, N =Int(N*rand(i)+1), J=Int(N*rand(I)+1) if I