This is to certify that the dissertation entitled "Parallel Homotopy Algorithm for Symmetric Large Sparse Eigenproblems" presented by Liang Jiao Huang has been accepted towards fulfillment of the requirements for the Ph.D. degree in Mathematics.

PARALLEL HOMOTOPY ALGORITHM FOR LARGE SPARSE SYMMETRIC EIGENPROBLEMS

Liang Jiao Huang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Mathematics

1992

ABSTRACT

PARALLEL HOMOTOPY ALGORITHM FOR LARGE SPARSE SYMMETRIC EIGENPROBLEMS

By Liang Jiao Huang

In this work, we apply the homotopy method to solve the eigenproblem Ax = λx, λ ∈ R, x ∈ R^n \ {0}, for a large sparse symmetric matrix A. A one-parameter family of matrices A(t) = tA + (1 − t)D is introduced, and the eigenproblem is considered for t ∈ [0, 1]. The problem of choosing an optimal starting matrix A(0) = D is discussed and partial solutions are obtained. The regularity and bifurcation problems of λ(t) and x(t) are also considered. It is found that these functions can be chosen as analytic functions of t and that the bifurcations are of simple type. A homotopy continuation algorithm is then constructed, and several new techniques are developed to handle the curve-following process more efficiently. Finally, the algorithm is implemented on both parallel and vector machines, and numerical results are obtained for many typical test matrices. Our preliminary experiments show that the homotopy continuation method is very promising.

To my wife Lu Ping

iii

ACKNOWLEDGMENTS

I would like to thank Professor Tien-Yien Li, my dissertation advisor, for suggesting the problem and for the directions he provided, which made this work possible. I would also like to thank him for his encouragement and support during my graduate study at Michigan State University. I would like to thank my dissertation committee members, Professor Qiang Du, Professor Dennis Dunninger, Professor Richard Hill, and Professor David Yen, for their valuable suggestions and time.

iv

Contents

List of Tables v
List of Figures vi
1 Introduction 1
2 The Choice of Starting Matrix 4
2.1 Introduction ................................ 4
2.2 Unitarily Invariant Norms ........................ 6
2.3 Block Diagonal Approximants ...................... 7
3 Regularity and Bifurcation 14
3.1 Introduction ................................ 14
3.2 Regularity ................................. 15
3.3 Bifurcation Directions .......................... 16
3.4 Continuity of Eigenvectors ........................ 18
3.5 Close Eigenvalues ............................. 20
4 The Algorithm 23
4.1 Choice of the Starting Matrix ...................... 23
4.2 Location of the Starting Points ..................... 24
4.3 Prediction ................................. 24
4.4 Correction ................................. 26
4.5 Dynamic Subspace Iteration ....................... 32
4.6 Checking .................................. 33
4.7 Clearing Up ................................
35 4.8 Step Size Control ............................. 35 Numerical Experiments 37 5.1 Introduction ................................ 37 5.2 Test Matrices ............................... 38 5.3 Results on IBM 3090 Vector Machine .................. 41 5.4 Results on BBN Butterfly Parallel Machine ............... 44 5.5 Accuracy and Orthogonality ....................... 46 Bibliography 48 vi List of Tables 5.1 5.2 5.3 5.4 5.5 5.6 5.7 Test data from IBM ........................... 41 Interior 20 eigenpairs on IBM 3090 ................... 42 First 50 eigenpairs on IBM 3090 with perturbed starting matrix . . . 43 Speed-up for homotopy algorithm .................... 45 Speed-up over EA15 ........................... 45 The residuals of eigenpairs ....................... 46 The orthogonalities of eigenvectors ................... 47 vii List of Figures 2.1 2.2 4.1 4.2 5.1 5.2 5.3 Curves corresponding to D1 ....................... 5 Curves corresponding to D2 ....................... 5 The change of clusters from t to t + h .................. 26 An isolated eigenvalue at t becomes nonisolated at t + h ....... 28 Execution time vs. matrix order ..................... 44 Speed-up for homotopy ......................... 45 Speed-up over EA15 ........................... 45 viii Chapter 1 Introduction Large scale scientific computing is currently a very active research field. Tradi- tional methods which work well for small problems are often not suitable for large problems, and not suitable for modern computer architectures. For example, the very efficient and widely used method for solving eigenproblems of small matrices — the QR iteration method [25], becomes inapplicable for large sparse eigenvalue problems because, among other things, the process of Householder reduction can quickly de- stroy the sparse pattern. Moreover, the method is highly serial in nature, it is difficult to fully exploit the power of modern computers. The best known method which can be used to solve large scale eigenvalue problems is the Lanczos method [5]. It can take advantage of the sparseness structure of a given matrix and is good for finding a few extreme eigenvalues. However, it is not a parallel method either, and it is not efficient for finding interior eigenvalues. In this work, we propose a method, called the homotopy continuation method, which is suitable for parallel solution of large sparse symmetric eigenvalue problems. The basic idea of the method is described in the following (see [17] and [18] for more details). Given a real symmetric matrix A of order n, instead of solving for all the eigen- values of A directly, we choose another real symmetric matrix D, called the starting matrix, and consider the one-parameter family of matrices A(t) = D + t(A — D) for t 6 [0,1]. (1.1) This family has the property, A(O) = D, A(1) = A. For each t 6 [0,1], let the eigenvalues of A(t) be A1(t) S A2(t) S - - - S An(t). It is well known that eigenvalues of a matrix are continuous functions of its elements, so in this case each A;(t) is a continuous function of t for i = 1,2, - ~ - , 12. Therefore, there are n continuous curves, which we shall call eigencurves. Suppose we have the eigenvalues of D — A1(0),A2(0),-- - ,An(0). From these values, the starting points of n eigencurves are known. By following these continuous eigencurves, we can reach the ending points A1(1),A2(1),- .. ,An(1) of the curves, which are the eigenvalues of the given matrix A. 
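For a small dense example, the eigencurves of (1.1) can be sampled directly. The following sketch (illustrative Python, not the FORTRAN implementation described in Chapter 5; the matrix size, block sizes, and random seed are arbitrary choices) evaluates A(t) on a grid of t values and prints the ordered eigenvalues, which trace the n continuous curves from the spectrum of D at t = 0 to the spectrum of A at t = 1.

```python
# Illustrative sketch: sample the eigencurves of A(t) = D + t*(A - D)
# for a small dense example, so the n continuous curves from eig(D)
# at t = 0 to eig(A) at t = 1 can be seen.
import numpy as np

n = 6
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                      # symmetric test matrix

D = np.zeros((n, n))                   # block diagonal starting matrix:
D[:3, :3] = A[:3, :3]                  # two diagonal blocks copied from A
D[3:, 3:] = A[3:, 3:]

for t in np.linspace(0.0, 1.0, 5):
    At = D + t * (A - D)               # the homotopy family (1.1)
    lam = np.linalg.eigvalsh(At)       # eigenvalues in ascending order
    print(f"t = {t:4.2f}  lambda(t) = {np.round(lam, 3)}")
# At t = 0 these are the eigenvalues of D; at t = 1, those of A.
```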
The main advantages of our method are:

- It is parallel in nature: the tracing of each eigencurve is independent of the others.
- The main calculation is concentrated on solving large sparse linear systems, and unlike solving for eigenvalues directly, several good packages are available for solving such linear systems efficiently [8], [9], [11].
- It can be used to find only a few specified eigenvalues. (In contrast, the Lanczos method tends to give the extreme eigenvalues at both ends of the spectrum before interior eigenvalues emerge, and it cannot tell which eigenvalue has been found or what multiplicity it has.)

The dissertation proceeds as follows. Chapter 2 discusses the problem of choosing a starting matrix D; some optimal solutions to this question are found under certain conditions. Chapter 3 addresses the regularity and bifurcation problems. Under the assumption that A and D are symmetric, the eigensystems are analytic. Also, with A and D symmetric, the bifurcations are easy to handle: bifurcation directions can be readily computed using a simple formula. Chapter 4 describes our homotopy algorithm for following the eigencurves. Chapter 5 presents the test matrices, the software used, and the numerical results on two typical machines, one a vector machine and the other a parallel machine. The results show that the homotopy continuation method is very promising for large sparse symmetric eigenproblems.

Chapter 2

The Choice of Starting Matrix

2.1 Introduction

Choosing a good starting matrix D plays a very important role in our homotopy algorithm. This can be seen from the following example.

Example 2.1: Let A be a 10 x 10 symmetric random matrix, D1 be a random diagonal matrix, and D2 be a block diagonal matrix with two diagonal blocks taken directly from A. That is, if

A = [ A11  A12 ]
    [ A21  A22 ]

then

D2 = [ A11   0  ]
     [  0   A22 ]

where the A_ij are 5 x 5 matrices for i, j = 1, 2. We construct the homotopy (1.1) for each starting matrix D1 and D2, and plot the eigencurves in Figure 2.1 and Figure 2.2, respectively. It can be seen that the eigencurves corresponding to D2 are much straighter, and thus much easier to follow, than those corresponding to D1. Obviously, D2 is a better starting matrix than D1. The following theorem helps in deciding whether a matrix D is a good starting matrix.

Figure 2.1: Curves corresponding to D1
Figure 2.2: Curves corresponding to D2

Theorem 2.1 Let A and D be given symmetric matrices and let λ1(t), λ2(t), ..., λn(t) be the eigenvalues of A(t) in (1.1). Then |λ'_i(t)| ≤ ||A − D||_2 for each i = 1, 2, ..., n.

Proof: By Weyl's Theorem [2],

|λ_i(t + Δt) − λ_i(t)| ≤ ||A(t + Δt) − A(t)||_2 = |Δt| ||A − D||_2.

Hence

| (λ_i(t + Δt) − λ_i(t)) / Δt | ≤ ||A − D||_2.

Because λ_i(t) is a differentiable function of t [24], letting Δt → 0 yields

|λ'_i(t)| ≤ ||A − D||_2.

This completes the proof.

One of our goals in constructing a good homotopy is to find a starting matrix D so that the eigencurves are straighter. The theorem above indicates that one should look for a D which, among other conditions, minimizes ||A − D||_2. For the matrix A in Example 2.1, it will be proved in Corollary 2.4 that

||A − D2||_2 ≤ ||A − D1||_2.   (2.1)

(In fact, we will show

||A − D2||_2 = min_{D ∈ 𝒟} ||A − D||_2,   (2.2)

where 𝒟 is the set of matrices having the same structure as D2.) This explains why the eigencurves are better behaved when D2 is used as the starting matrix. So the problem of finding a good starting matrix becomes a matrix approximation problem.
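As a concrete check of this comparison, the sketch below (illustrative Python, not part of the dissertation's FORTRAN code; the helper name and random seed are arbitrary) builds one random instance of Example 2.1 and prints the two distances in the spectral norm. By Corollary 2.4 the value for D2 can never exceed the one for D1, since a diagonal matrix is itself block diagonal with this structure.

```python
# One random instance of Example 2.1: compare ||A - D||_2 for a random
# diagonal D1 and for the block diagonal D2 whose blocks are copied from A.
# (Illustrative sketch; helper name and seed are arbitrary choices.)
import numpy as np

def diagonal_blocks(A, sizes):
    """Block diagonal matrix built from the diagonal blocks of A."""
    D = np.zeros_like(A)
    start = 0
    for m in sizes:
        D[start:start + m, start:start + m] = A[start:start + m, start:start + m]
        start += m
    return D

rng = np.random.default_rng(1)
B = rng.standard_normal((10, 10))
A = (B + B.T) / 2                          # 10 x 10 symmetric random matrix
D1 = np.diag(rng.standard_normal(10))      # random diagonal starting matrix
D2 = diagonal_blocks(A, [5, 5])            # two 5 x 5 blocks taken from A

print("||A - D2||_2 =", np.linalg.norm(A - D2, 2))
print("||A - D1||_2 =", np.linalg.norm(A - D1, 2))
```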
In this chapter, an important type of norms in matrix theory is introduced first, then the approximation problem arises in our homotopy method is considered. 2.2 Unitarily Invariant Norms Some notations are needed for future reference: M = { Complex matrices of order n } Ll = { Unitary matrices of order n } D = { Block diagonal matrices of order n with a given block structure } A unitarily invariant norm is a matrix norm which satisfies “A“ = IIUAII = IIAVII for every A E M and U, V G Ll. A result of J. von Neumann characterizes all unitarily invariant norms by symmetric gauge functions of singular values [28]. Two important classes of unitarily invariant norms are: Schatten p norms llAllp =(28j(4)”)1/’, p 2 1 i=1 and Ky Fan 1: norms I: IIAllk = ZsJ-(A). k =1.2.---.n i=1 where {3j(A), j = 1,2, - - - ,n} are the singular values of A in descending order. It is easy to see that Schatten 2 norm is the Frobenius norm || - "F and Ky Fan 1 norm is the spectral norm || - "2. 2.3 Block Diagonal Approximants A good choice for the starting matrix is based on three considerations. First, the eigensystems of D should be easy to find; secondly, D should be as close to A as possible, so that the eigencurves are better behaved [l7]; finally, D itself should be easy to obtain. However, these considerations often conflict with each other in the sense that if D is too close to A, then eigensystems of D could be as difficult to find as those of A’s; on the other hand, if eigensystems of D are easy to find, then D is usually not very close to A. One natural candidate for the starting matrix D is the block diagonal matrix [ Dll ]. D D = 22 _ (2.3) L D“: . where D,-.- are smaller square matrices for 2' = 1,2,--- ,lc. For such a matrix D, its eigensystems can be found from those of D33, and each D53 eigensystems can be easily found since it is a small matrix. Furthermore, by using multi-processors, eigensystems of D can be computed in parallel. Therefore, matrices of this form satisfy our first consideration. Let D be of the form in (2.3). The next question is, how to find D of this form which is closest to a given matrix A. In other words, let D be the set of matrices of the form in (2.3), we want to find Do 6 D such that "A - D0" = mDin ”A — D“. (2.4) To the best of our knowledge, this type of matrix approximation has never been investigated before. To measure the closeness of two matrices, unitarily invariant norms are used as the underlying norms. This is because we are considering eigenvalue problems and these norms are simple functions of eigenvalues (by von Neumann’s characterization [28]). For the Frobenius norm, the solution of (2.4) is obvious. We can partition A into blocks according to the structure in (2.3) and choose Do to be the block diagonal matrix whose diagonal blocks are identical to the corresponding diagonal blocks of A. That is, if A11 A12 ° ° ' All: A A - ~ - A A = 21 22 2!: (2.5) _ Akl Akz AH: . then _ An A Do = 22 0 . (2.6) AH: For general unitarily invariant norms, however, the solution is not so obvious. One would hope that Do above is the choice since it meets all our three considerations. It turns out that this is true for an important class of matrices. In this section, we prove some positive results first, and then a counterexample is constructed to show that the result can not be extended to cover all matrices. Theorem 2.2 Let A and Do be matrices in (2.5) and (2.6). 
If there exists a unitary matrix E G Ll which commutes with every D E D such that A — D0 = —E(A — Do)EH, (2.7) then Do is the best block diagonal approximant of A for every unitarily invariant norm, i.e., [IA — D0" = minpep ||A — D”. For the proof of Theorem 2.2, the following lemmas are needed. They were first proved by Fan in [10]. Lemma 2.1 Let A,B E M ,then ”A” S ”B“ holds for all unitarily invariant norms || - I] if and only if “Alla S “Bill: 8 holds for all Ky Fan norms || - “k, k =1,2, - . - ,n. Lemma 2.2 Let A 6 M, X)‘ = {2:1,:c2,~-,xk} and Y), = {y1,y2,~--,yk} be sets of orthonormal vectors in C", then I: l: “Alli: = 231M) = 5,113]; 2 Rei-Tjiij). j=l k k I j=l Proof of Theorem 2.2: Let A0 = A — D0 = U EV” be a singular value decomposi- tion of A0, with U =(U1, U2, ' ' ' a It“), 2 = diag(31(A0)a ' ' ' I 3n(A0))a V = (‘01, 02, ° ° ' ,vn)- Because A0 = —EA0E”, A0 = —E(U2V”)E” = (—EU)E(EV)" where -EU and EV are unitary matrices. This gives another singular value decom- position of A0. Thus, (“n/1005) = {—Eu,,AoEv,-) = 31(40)- By Lemma 2.2, for any D E D and k S n, k ”A - Dllk Z Z: Rein» (A - 0M) i=1 l: I: = 2011, Ao‘vj) + Z Re(u,-, (Do — 13M) i=1 i=1 l: = ||A - Dollk + 2 new.» (Do - D)vj)- (2-8) i=1 On the other hand, since —EU and EV are unitary matrices, l: ”A - Dllk Z 2 34-511» (A - D)Ev,-) i=1 k = Z(—Euj, AoEvj) + i: Re(—E’Uj, (DO - D)Evj) i=1 i=1 k k = 231-(Ao)— Z Re(Eu,-, E(Do - Dlvj) j=l ’=l I: = ”A — D0“). — Z Re(u,-, (Do — D)v,-). (2.9) i=1 9 Combining inequalities (2.8) and (2.9), k IIA - DIII: Z llA - Dollk + I: 38064130 - D)vj)| i=1 > llA - Dollh- This completes the proof because of Lemma 2.1. I Remark 2.1: In fact, it can be seen from the proof that the following more general result has been established: Let A and Do be any two given matrices, if there exists a unitary matrix E that commutes with Do such that A -— D0 = —E(A — D0)E”, then Do is closest to A among all matrices that commute with E, that is, IIA — Doll = 02,293,, ||A - D". An important class of matrices satisfying (2.7) is the class of block tridiagonal matrices. Corollary 2.3 If A is a block tridiagonal matrix, then the best block diagonal ap- proximant of A is Do for every unitarily invariant norms. Proof: Condition (2.7) is satisfied for E = diag(Il, —Iz, - - . ,(—1)"‘11k), with each I, an identity matrix of appropriate order. I Block tridiagonal (in particular, tridiagonal) matrices arise in many applications. In fact, in matrix eigenvalue computations, a given matrix is usually transformed into a compact form ——— tridiagonal or block tridiagonal form by Householder or Lanczos transformation, then QR iteration, bisection, or homotopy method is applied to ob- tain the solutions. Corollary 2.4 Let A be any square matrix, if we partition A into then D0 = A11 0 0 A22 is the best approximant of A among all block diagonal matrices of the form for every unitarily invariant norm. Proof: This is a special case of Corollary (2.3) with I: = 2. I Inequality (2.1) and equation (2.2) of Example 2.1 at the beginning of this chapter follow from this corollary. It is well known that the time required to find eigensystems of a matrix of order n is proportional to n3. If A is divided into two blocks of approximately equal sizes as in Corollary 2.3, then D is closest to A and eigensystems of D can be solved by parallel computer using about 1 / 8 execution time for solving eigensystems of A. 
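To make the 1/8 figure explicit (assuming the usual cubic cost model for a dense eigensolver): each block of order n/2 costs about (n/2)^3 = n^3/8 operations, and with the two blocks assigned to separate processors the wall-clock time is about n^3/8, one eighth of the roughly n^3 cost of treating A directly; even on a single processor the total work 2(n/2)^3 = n^3/4 is only a quarter of it.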
Furthermore, if block sizes of D are still too large to work with, they can be divided into smaller blocks and the “divide and conquer” strategy can be employed. For general matrices, the conclusion of Theorem 2.2 may not be true. This can be seen in the following example. Example 2.2: Let ’ 0 2 2 7 A = 2 0 2 . 2 2 0 . If we choose block size to be 1, then . 0 0 0 I D0 = 0 0 0 L 0 0 0 . The eigenvalues of A are A1 = 4, A2 = —2 and A3 = —2, therefore for spectral norm ll - Ila, "A - Do||2 = {2332; IM = A: = 4 11 However, llA-1||2=112&§|M-1|=3, i.e., IIA - IIla < llA - Dolls- Thus Do is not the best approximant. In fact, it can be shown that I is the best approximant in this case: first of all, A — I has orthonormal eigenvectors r a F 1 r 1 l 0 -—2 1‘ " 1 l 1‘ " l 1 1 — 1 1 1 fl 1 2 fl ) 3 «6 a 1 —1 1 J with corresponding eigenvalues A1 = 3, A2 = —3 and A3 = —3. If there exists a matrix D such that "A — DI]; < ”A - 1]]2, then by Courant-Fischer min-max theorem, we must have Kiwi/1 - D)x,-)| = K3313 (A - [)331') +($1a(1 - Dlxjil = It; + ($1, (I - D)$j)| S ||A - D||2 < [IA-III2= Mil. i=1,2,3. Let I — D = diag(d1,d2,d3) , then the above inequalities yield ($1, (I -- D)$1) == (d1 + (I: + d3)/3 < 0, (2.10) (32, (I — D)x2) = (.12 + d3)/2 > o, (2.11) (1'3, (1 - D)$3) = (4d1 + d2 + d3)/6 > 0. (2.12) It follows from (2.10) and (2.11), d1 < 0. However, (2.10) and (2.12) imply 3d; > 0, Thus such a matrix D does not exist, and the assertion is achieved. Another example of larger order can be found in [22] where the conclusion was justified by numerical results. Although the question of choosing the starting matrix is not completely answered, the matrix D in (2.6) is the best choice in several important cases (such as block tridiagonal A) and is the best in all cases for Frobenius norm. 12 Remark 2.2: Our results provide a general guide line for choosing a starting matrix. In a practical problem, usually more special properties about the given matrix A is known, this allows other choices for D as long as it satisfies our three general considerations. For instance, when a physical problem is investigated, in order to determine the final parameters for the problem, it is necessary to do a series of experiments. In each experiment, the parameters are adjusted only by a small amount. If the question is related to an eigenvalue problem, then there is a family of matrices with the same (or similar) structures, each matrix is a small perturbation of the previous one (except the first). In such a case, the natural choice for the starting matrix is one of the matrices in the family whose eigensystems have been found, and we expect eigensystems of the new matrix to be a “small” perturbation of the starting matrix. 13 Chapter 3 Regularity and Bifurcation 3.1 Introduction It is well known that eigenvalues are continuous functions of the entries of the matrix. However, they are generally not differentiable. For example, consider A(t)=[l t]. (3.1) 1 1 its eigenvalues are A1(t) = 1 — \/t and A2(t) = 1 + fl. They are not differentiable at t = 0. This happens because, at t = 0, A(O) has multiple eigenvalues A1(0) = A2(0) = 1, that is, eigencurves have a bifurcation point. The behavior of eigenvectors at a bifurcation is even worse. They may not even be continuous. The following example is attributed to W. Givens in [21], §3.1: A(t) = [ 1+tcos(2/t) —t sin(2/t) J (3.2) —t sin(2/t) 1 — tcos(2/t) has eigenvalues A1(t) = 1 -t and A2(t) = l + t. They are analytic functions of t. 
However, the corresponding eigenvectors are cos(1/t) sin(l/t) 31“) = i $2“) = v [ —sin(1/t) ] [cos(I/t) [ 14 which have no limits as t —-v 0 ! Again the problem arises because A(t) has a double eigenvalue at t = 0. For our homotopy algorithm, handling the bifurcation efficiently becomes a very important problem. The behavior of eigencurves around such points can be quite complicated. Nevertheless, for symmetric matrices, bifurcation is not as difficult. In fact, a result proved in 19403 ([24], Chapter I) guarantees that A;(t) and x.-(t) can be chosen in such a way that they are all analytic functions of t. In such a case, the eigencurves through a bifurcation point are well behaved. A more general result of this type is considered one of the major breakthroughs in the past fifty years in eigenvalue perturbation theory [14]. 3.2 Regularity We summarize those results that related to our problem in the following theorem. Theorem 3.1 Suppose A and D are both real symmetric matrices. Let A(t)=tA+(1 -t)D= D+t(A—D), then the eigensystems (A10), 31(0), (02“), $20)), ' ' ' a(An(t)1 xn(t)) of A(t) can be chosen in such a way that all functions involved are analytic functions for real t. Furthermore, there are only finitely many t 6 [0,1] such that A(t) has multiple eigenvalues. Proof: see [14]. I Remark 3.1: In our algorithm, preserving the order A1(t) S A2(t) S - . - S A..(t) is very important. With such an order, we can implement parallel processing and compute only partial eigensystems when necessary. However, when this is imposed, 15 the conclusion of the above theorem is no longer valid. For example, let 1 1 — 2t A(t) = (3'3) 1 — 2t 1 then A1(t) = 2t, A2(t) = 2 — 2t and they are analytic. With the ordering, however, A1(t) = 1-[1—2tl, Ag(t) = 1+]1-2tl. They are not analytic at t = 1/2. Nevertheless, after the ordering, these A;(t)’s are still piecewise analytic and one-sided derivatives Ag")(t+) always exist. This is sufficient for our numerical implementation, since we trace eigencurves forward, only the right hand derivatives are needed. 3.3 Bifurcation Directions Suppose (MU), 31“», (A2“), 320)), ' ° ' 9 (Ana), $n(t)) are the eigensystems of A(t) as in Theorem 3.1, then Acme) = Aj(t)=vj(t). (3.4) Taking the right-hand derivatives with respect to t on both sides of this equation yields A'(t)x,-(t) + A(t)x;-(t+) = A;(t+)$j(t) + A,(t)x;~(t+). (3.5) Multiplying the above equation on the left by xflt), we have A}(t+) -_- xf(t)A'(t):c,-(t). But A(t) = A — D, so A;(t+) = zf(t)(A — D)x,~(t). (3.6) With this last formula, the prediction-correction scheme can be applied to numerically compute the eigensystems. However, at a bifurcation point, eigenvectors are not uniquely defined. For an eigenvalue of multiplicity 1:, any I: orthonormal vectors l6 from the Ir dimensional invariant subspace form an eigenbasis for that subspace. By Theorem 3.1, there is at least one way in choosing an appropriate set of eigenvectors for each t such that xJ-(t) becomes analytic. This choice is not known beforehand. And (3.6) can only be applied for this set of xj(t)’s. An alternative way of computing these bifurcation directions is given as follows. Theorem 3.2 Suppose A;(t) = Ag+1(t) = - - - = Ag+k-1(t) are k multiple eigenvalues of A(t). Let y1(t), y2(t), - - - , y)‘(t) be any I: orthonormal eigenvectors corresponding to these eigenvalues. Then A;(t+),A:-+1(t+),--- ,»\:-+,,_1(t+) equal the k eigenvalues of YT(A — D)Y, where Y is the matrix consisting of y.-(t),y.-+1(t),- - - ,y,-+k-1(t) as its columns. 
Proof: The proof of a more general result can be found in [14]. By using equation (3.5) a very simple proof can be obtained here. Let x.-(t), x.-+1(t), - - - , $g+k_1(t) be the analytic eigenvectors corresponding to A;(t) = Ag+1(t) = - - - = Ag+k-1 (t) as in Theorem 3.1. Multiply (3.5) by x,T(t) from the left, xl(t)(A — Din-(t) + Az(t)zi(t)x;(t+) = A}(t+)zi(t)x.-(t) + A1(t)wi(t)z;(t+)- Since A)(t) = /\,(t) and xjr(t)x,-(t) = 6;,- for i S l, j S i + k — 1, the above equation yields x,T(t)(A — D)x,-(t) = a,,1§(1+), for iS 1, j g i + k -— 1. (3.7) Let X be the matrix consisting of x,-(t), xg+1(t), - - - , $g+k_1(t) as its columns. By (3.7), P A;(t+) A2..(t+) A'(t+) a XT(A — D)X = (3.8) _ A:I'i"k--1(tnl-) J Because both {y,-(t), y.-+I(t), - - - , yg+k_1(t)} and {x.-(t), $g+1(t), - - - , xg+k_1(t)} form or- thonormal bases for the invariant subspace, there exists an orthogonal matrix Q such that Y = X Q Hence 17 YT(A - D)Y -_- QT[XT(A — D)X]Q. It follows that YT(A - D)Y and X T(A - D)X are similar matrices and have the same eigenvalues. By (3.8), they are A;(t+), A;+,(t+), - - - , Ai+k-1(t+)- I 3.4 Continuity of Eigenvectors The Rayleigh quotient iteration (RQI) is one of the most important methods in our algorithm (see Chapter 4). The convergence of ROI depends not only on a good choice of eigenvalue prediction, but also on a good eigenvector prediction. Usually, the eigenvector at step t is used as the starting vector for RQI at step t + At. In earlier discussion, homotopy algorithm computes eigenvectors at t = 0 as follows: find the eigenvectors for each diagonal block of A(O) = D then extend them to full length vectors with appropriate zero entries. The following example shows that this can be an inefficient choice. Let [11] [10] A: , D: , (3.9) 11 01 and A(t) = (l — t)D + tA, then A(t) = [ 1 t ] . (3.10) t 1 According to the earlier algorithms, eigenvectors of A(O) = D should be (by consid- ering D as a block diagonal matrix with two 1 x 1 diagonal blocks), $1(0)=[1], 32(0)=[0]. 0 1 However, for any t > 0, A(t) has eigenvectors 1 l 1 1 x1(t)_7§[l], x2(t)—7§ -1]. 18 These eigenvectors are not even continuous at t = 0. In fact there is a 45° turn in both vectors. Using x1(0) and x2(0) as starting vectors for ROI in the computation of x1(t) and xg(t) is obviously a bad choice. Again, the problem occurs because A(t) has multiple eigenvalues at t = 0, and any two orthonormal vectors can serve as base vectors for the corresponding invariant subspace. An important question is then, how can one find a set of eigenvectors at step t that can be turned continuously into a set of eigenvectors at step t + At? Using the notations in Theorem 3.2, we have Theorem 3.3 Suppose YT(A —- D)Y has eigendecomposition QA’QT. If A:‘(t'l")i Ai+1(t+)9 ' ' ' a Ai-i-k—l (t+) are distinct, then the columns of YQ are, up to a sign, the analytic eigenvectors described in Theorem 3.1. Proof: Let X be the matrix with analytic eigenvectors x.(t). 3:410), ° ° ° , $i+k—1(t) as its columns. Since both X and Y consist of orthonormal eigenvectors from the same invariant subspace, there is an orthogonal matrix Q such that X = YQ. Substituting into (3.8), yields QTO’TM - D)Y)Q = A'. 01' YT(A — D)Y = QA’QT. On the other hand, YTM — D)Y = QA'QT, QA'QT = QA’QT. 19 Because diagonal matrix A’ has distinct values A;(t+), A2+1(t+), . - - , Aj+k_1(t+) along its diagonal, the columns of Q and Q can differ at most by a sign. Hence the columns of YQ and X = YQ can differ at most by a sign. 
I In conclusion, when bifurcation occurs at t, YT(A — D)Y is formed by using the computed eigenvector matrix Y, then the eigendecomposition QA’QT is computed for this I: X It matrix YT(A — D)Y. From this decomposition, the bifurcation directions can be obtained. If these directions are mutually distinct, then YQ is formed and its columns are taken as a set of improved eigenvectors at this step, which can be used to speed up the iterations in later steps. In the example considered at the beginning of this section, a simple computation generates the improved eigenvectors at t = 0 1 1 1 1 31(0)=—‘/-§[1], 32(0)=7.2.[_1]. They turn out to be the exact eigenvectors at t = 1. 3.5 Close Eigenvalues Results in previous sections are derived under the assumption that A(t) has multiple eigenvalues at some t. In real computations, however, it is hardly distinguishable whether a matrix has true coincidental eigenvalues or pathologically close ones. For example, the famous Wilkinson matrix [101 l 19 1 W5: 10 1 191 110 20 has distinct eigenvalues since it is an irreducible symmetric tridiagonal matrix, but its first two eigenvalues are so close that they agree to 15 significant digits ([29], p.308-309). That is, these eigenvalues are practically indistinguishable. Moreover, even if the eigenvalues are distinct, formula (3.6) may not be appropriate to apply when eigenvalues are close, since in this situation the corresponding eigenvectors can be very sensitive. The computed eigenvector yj(t) may be significantly different from the true one xj(t), making yf(t)(A — D)y,-(t) inaccurate. In such a case, we will treat them together as a cluster. In the rest of this section, we demonstrate that the result of Theorem 3.2 can also be used in the case of close eigenvalues. Theorem 3.4 Let c > 0, suppose A(t) has eigenvalues A;(t),)..-+1(t),- - - ,z\.~+)¢-1(t) such that |A.-(t) — Aj(t)| S c forj = i+ l, . - - ,i+ k — 1. Then there exists a matrix E such that B(t) = A(t) + E has A,(t) as its k-multiple eigenvalue and “E”; S 6. Proof: Suppose A(t) has eigendecomposition QAQT, where Q is an orthogonal matrix and A = diag().l, A2,. - - ,An). Let A = diag( 61, 62, - . - ,6") be the diagonal matrix such that 51 = = 5a = 0, 5m = A; - Ai+la"'96i+k—l = A; - Aawe-1, 51+): = = 6,, = 0. Set E = QAQT and B(t) = A(t) + E. It is easy to see that B(t) has k-multiple eigenvalue )1,- and "Eng = IIQAQT“; = ”Aug S e. I Suppose B(t) has eigensystems (p1(t), 21(t)), (p2(t), 22(t)), - - - , ([1,,(t), 2,,(t)). When A(t) has close eigenvalues, e is small, hence B(t) is a small perturbation of A(t). Al- though eigenvectors corresponding to close eigenvalues are very sensitive, a collection of sensitive eigenvectors can define an insensitive invariant subspace provided the corresponding cluster of eigenvalues is isolated ( [12], p.199-208). So X = spanks“), $1410), ' ' ' , $i+k-1(t)} and Z = span{z.(t).z.-+1(t).- ' ° , 2141—10)} 21 are close. Let X = [x.-(t) xg+1(t)-- -x.-+k-1(t)], Z = [2,-(t) z;+1(t) . - -Z,'+k_1(t)], then the eigenvalues of XT(A — D)X and ZT(A — D)Z, are close since XT(A — D)X and ZT(A — D)Z are the projections of A — D onto X and Z , respectively. Now apply Theorem 3.2 to matrix B(t), o(ZT(A — D)Z) = {,1;(t)|j = 1,1 + 1, - . -,i + k — 1}. And the first order prediction gives #10 + At) t” #10) + #j-(UN = Mt) + Ilj-(tlAt- By Weyl’s Theorem [2], PW + At) - 2110+ At)l S Ilr‘1(t + At) - B(t + At)” = HE” S 6, hence A,(t + At) e ,1,(t + At) rs ,\.(t) + p;(t)At. It follows that we may replace #2“) by an eigenvalue of X T(A — D)X. 
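A minimal sketch of this uniform treatment (illustrative Python; the function name is hypothetical and the production code works with sparse data structures): given an orthonormal basis Y of the cluster's invariant subspace, the eigenvalues of Y^T(A − D)Y serve as the one-sided derivatives of Theorem 3.2, and YQ gives the improved eigenvectors of Theorem 3.3 when those derivatives are distinct.

```python
# Cluster treatment of Theorems 3.2-3.3 (illustrative sketch): the
# eigenvalues of Y^T (A - D) Y approximate the one-sided derivatives of
# the k curves in the cluster, and Y @ Q is an improved eigenvector basis.
import numpy as np

def cluster_directions(A, D, Y):
    """Y: n-by-k orthonormal basis of the cluster's invariant subspace."""
    H = Y.T @ (A - D) @ Y              # small k-by-k symmetric matrix
    omega, Q = np.linalg.eigh(H)       # omega ~ derivatives of the k curves
    return omega, Y @ Q                # rotated (improved) eigenvectors
```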
Hence close eigenvalues and multiple eigenvalues can be handled in a uniform way. 22 Chapter 4 The Algorithm The tracing of eigencurves consists of the following main steps: 4.1 Choice of the Starting Matrix To start the algorithm, we need to choose a starting matrix D first. From the results developed in Chapter 2, block diagonal matrix with blocks directly from A will be our choice unless more information about the problem is revealed. At this stage, the sizes of the blocks remain undecided. It is clear that D will be closer to A if its diagonal blocks are larger, and the eigencurves will be better behaved. However, because of the requirement that the eigensystems of D should be much easier to find than those of A, the sizes of the blocks of D should be kept under certain limit. In our algorithm, QR iteration is used to solve for eigensystems of D. This requires that each block size be no more than a few hundred. Another important factor in choosing D is the computer architecture. When more processors are available, one can choose more (hence smaller) blocks; otherwise, fewer (hence larger) blocks should be used. In the test examples we run in BBN Butterfly Machine, we choose the block size to be around one hundred. This seems to be optimal for most of the cases comparing with other choices. In general, no formula is set to automatically generate the block 23 sizes of D. 4.2 Location of the Starting Points When the starting matrix D is chosen, we need to find values A;(0), x;(0) for i = 1, 2, . - - , n, that is, the eigenvalues and eigenvectors of D. Since D is a block diagonal matrix, its eigenvalues consist of eigenvalues from all diagonal blocks, its eigenvectors can be obtained from eigenvectors of diagonal blocks by extending them to full di- mension vectors with appropriate zero entries. In a parallel architecture, each block is assigned a processor to perform QR iterations on the block, and the results are collected to form the eigensystems of D. The QR iteration method is used since it is one of the best methods for matrices of relatively small size. Finally the eigenvalues of D are arranged in descending order to prepare for parallel processing in later steps. It is clear that this step has high parallelization level and requires relatively small amount of execution time. 4.3 Prediction From the previous step, eigenvalues )1,(0) and eigenvectors x,(0) are available for j = 1,2,- - - ,n. Assuming in general that Aj(t),x,-(t) at t have been obtained, we shall apply the prediction — correction scheme to compute the eigenvalues A,- and eigenvectors x,- at t + h. The choice of the step size h will be discussed in a later section. Here, we first consider the prediction procedure. In this step we not only predict the eigenvalue A,(t + h), but also predict whether or not it is an isolated eigenvalue of A(t + h). For isolated eigenvalues and for clusters, different correction methods are used. Case 1: If the j-th eigenvalue Aj(t) of A(t) is simple and well separated from the other eigenvalues, then X-(t) = xf(t)(A — D)x,-(t). J 24 Let the j-th predicted eigenvalue of A(t + h) be 1,(t + h) as ,\,-(t) + 1;.(t)h = A,(t) + zf(t)(A — D)x,-(t)h. All quantities on the right hand side of the above equation are available from step t. For the eigenvector, we simply use $j(t) as the prediction for xJ-(t + h). There are several reasons for this choice: 0 Although in theory one can use xJ-(t + h) z raj-(t) + x;(t)h, (4.1) but the high cost in computating x;(t) makes it an impractical approach. 
0 Eigenvectors are more sensitive to perturbation than eigenvalues ([29], p.331- 335). It is difficult to compute the derivative xs-(t) to a high accuracy. Hence (4.1) usually does not yield significant improvement of xj(t + h). e In our correction step, the eigenvector prediction is not as important as the eigenvalue prediction. Case 2: Aj(t) is in a cluster of eigenvalues of A(t), then we proceed as follows. 1) Find It — the number of eigenvalues in the cluster; 2) Form X = [2:5, $5+1,°°-,.’Bg+k_1] where the corresponding A;,---,A,-+k_1 form a cluster; 3) Form XT(A - D)X; 4) Calculate eigendecomposition X T(A — D)X = QQQT; 5) Set X = XQ. 25 clusterof3 clustcrofS t t+h Figure 4.1: The change of clusters from t to t + h Now suppose fl=diag(w,- 025+] - - -w.-+),_1). By Theorem 3.2, Log, wi+1, -- -, wg+k_1 are the approximations of the derivatives of A,(t). Thus the predictions Aj(t + h) z Aj(t) + wjh (4.2) xJ-(t + h) z x_,-(t). (4.3) can be used. If some at; (i S j S i + k — 1) is an isolated value inside (0;, 10,-1.1, - - -, wg+g_1, then Aj(t + h) is likely to be out of the cluster at t + h (Figure 4.1). So it should be labeled as an isolated value and treated as in Case 1. If, on the other hand, some wj’s are among a cluster of tag, 025+], - - -, wg+k_1, the corresponding Aj’s are likely to form a new cluster at t + h (Figure 4.1), hence they are collected in a group and handled together at the correction step. The predictions for them are the same as in (4.2) and (4.3). 4.4 Correction From the last step, in addition to the predictions for Aj(t + h) and x_,-(t + h), the information about the isolation of A,(t + h) is also available. If it should be treated as 26 an eigenvalue in a cluster, the number of eigenvalues in the cluster is also known, then Subspace Iteration with Rayleigh-Ritz Procedure (SIRR) is used for the correction: Suppose {A,-, A,“ , - - - , L414} is a. cluster and the corresponding approximate eigenvectors are {x,-,x,°+1,- - - ,x.-+),-1}. Let 1 A = I“; + Ai-l-l + ° ° ° + Ai+k—l), X = [31° $£+1 ' ° ' 335+k-1l- Solve (A — AI)Y1 = X for Y1, then repeat the following for V = 1,2, - -- : 1) Decompose Yu = QuRu; 2) Solve (A — AI)Yy = Qu; 3) Form Hy = QZ'(A - D)-1Qu = QZYu; 4) Decompose H” = GVOVGI, where G is orthogonal matrix, Oyzdiag(0,-,0,-+1, ° - ° , 9.41:4); 5) Form Yu+1 = QuGu 6) Test for the convergence of 0;, 0;“, - - - , 0,-+).-1. Goto 1) until all 0;, 0,-+1,- - -, 9.4.1-1 converge. If A,- (t + h) can be treated as an isolated eigenvalue, Rayleigh Quotient Iteration (RQI) is the choice. However, in our algorithm, RQI is not applied directly at the very first step because of the following two observations: a) Although A,(t) is isolated, Aj(t + h) may not be so (Figure 4.2), and our prediction step can not detect such a situation. When this happens, if RQI is used, those undesirable behaviors of the algorithm, such as converging to an eigenvalue far from the original prediction, may occur [20]. b) Although RQI converges cubically, it usually needs a few iterations before achieving this rate (unless the starting values are extremely good). It is quite expensive to use when cubic rate can not be achieved. 27 P cluster of 2 isolated t (+11 Figure 4.2: An isolated eigenvalue at t becomes nonisolated at t + h To overcome these difficulties, two additional procedures are introduced. The first one is the Inverse Iteration (INVIT) procedure. It can be used to compute a better approximate eigenpair (A, x) by using inverse iteration recursively. It is cheaper than RQI. 
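For reference, the basic Rayleigh quotient iteration used on isolated eigenvalues looks as follows (simplified, dense Python sketch; the actual code uses the sparse LDL^T solver and the safeguards described below, and the function name, tolerance, and iteration limit are illustrative choices).

```python
# Simplified sketch of Rayleigh quotient iteration for correcting an
# isolated eigenpair of the symmetric matrix A(t + h); the INVIT/SUBDIM
# pre-steps and the inertia check of the algorithm are omitted here.
import numpy as np

def rqi(At, lam0, x0, tol=1e-10, maxit=10):
    """Refine the predicted pair (lam0, x0) for the symmetric matrix At."""
    x, lam = x0 / np.linalg.norm(x0), lam0
    n = At.shape[0]
    for _ in range(maxit):
        y = np.linalg.solve(At - lam * np.eye(n), x)   # shifted solve
        x = y / np.linalg.norm(y)
        lam_new = x @ At @ x                           # Rayleigh quotient
        if abs(lam_new - lam) <= tol * max(1.0, abs(lam_new)):
            return lam_new, x
        lam = lam_new
    return lam, x
```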
The second one is called SUBDIM, which is designed to detect whether an approximate eigenvalue falls inside a cluster of eigenvalues. If it does, SUBDIM will compute the dimension of the invariant subspace corresponding to this cluster of eigenvalues and also generates an approximate basis for the subspace. Otherwise, SUBDIM is equivalent to an additional step of inverse iteration. With the help of these two procedures, the correction goes as follows: First, a few steps of INVIT are used to obtain a better approximate eigenpair (A,x), then SUBDIM is called to determine if A is an isolated eigenvalue. If so, the faster method — RQI is applied for the rest of the correction steps. The starting values (A,x) for the iteration are now inside or closer to the cubic converging range of ROI. If A is not an isolated eigenvalue, then from SUBDIM, the dimension of an invariant subspace corresponding to eigenvalues close to A and an approximate basis for the subspace are available. Then, similar to the first case described at the beginning of this section, subroutine SIRR is applied to further improve the subspace to the desired accuracy. 28 The detail of INVIT and SUBDIM is described in the rest of this section. (a). INVIT: This is a slightly different version of ordinary inverse iteration procedure: 0) set xlo) = x,-(t,-), p = A;(t,-+1), k = 0 1) solve (Assn—1111“) = z“) (4.4) for yl"). 2) compute 7), = ||y(")|| and let (1:) 30+!) ._._ L. '71: 3) if 7), S M and 71/714 2 a then k=k+l, goto 1) else let A = p(x("+1)), x = xU‘“), quit; where 112””) = (x<*+”)TA(t.-..)x<*+'> is the Rayleigh quotient. There are two control parameters in this procedure, namely M and a. In our algorithm, we set M = 10‘/p and a = 1.2. It is well known that xl") will converge to an invariant subspace corresponding to the eigenvalues close to p and —) . l . mlnlp—Agl ‘71: The convergence comes in two ways: if the prediction [1 is very close to some true eigenvalues (min In — A.-I < 1/M), 7), > M is quickly satisfied; on the other hand, if p is not close to any particular eigenvalue (min lp — A,-| > 1/M), then we must wait for the second condition 7k/7k_1 < a to be satisfied. Inequality 75/71,-1 < 0 indicates that 7). will not grow significantly in the following iterations. In this case, the inverse iteration procedure has been stabilized or converged. For our purpose, this procedure 29 is not affected by close eigenvalues, since one of the two conditions must be satisfied in the end. There are other choices for M and 0. Since we will switch to a faster method eventually, it is appropriate to impose a loose converging condition. (b). SUBDIM — for computing the dimension and basis of the invariant sub- space corresponding to the eigenvalues close to A: Suppose (A19 $1)1(A29 33), ' ' ' 1(An1 37!) are n eigenpairs 0fA(tj+1)tlet {3’11 3"), ' ' ' 1 Sin} be a set of linearly independent vectors chosen beforehand. Take yl, orthogonalize it against x1 and call it yl again. Then 0) let k=l, 21: x i) solve (A(t...) — 102 = y. (4.5) for 2 ii) compute ||z||. If "z” is large, then let 21+] = z/||z||, take yk+1, use MGS to orthogonalize it against {21, 22, - - - , 2H,}, let iterat=1,k=k+l, goto i). else if iterat = 1, orthogonalize 2 against {21,Zg,'°',2k}, let 3),, = z/||z||, iterat = iterat+l, goto i). end if. where MGS stands for modified Gram—Schmidt process [12]. From this procedure, It is the dimension of invariant subspace and {21, 22, . - - , 2k} is an approximate basis. The theoretical background for this procedure is the following. 
30 Suppose A1, A2, . - - , Am are in a cluster for some 1 S m < n such that max IA) — AIS 6, min IA) — AI 2 a > e. (4.6) lSlSm m-HSlSn Expand the vector y), in (4.5) in terms of x1, x2, - - - , x": n m n w: = 2 “(on = 2 “la-TI + 2: “lib”: 314' silo) + ill”, i=1 l= l=m+l then the solution of (4.5) is Z = (A(tm) - All-1w: = (A(tm) - All-1 20951 l=l 1'! l=1 m (k) n (k) a! at = x; + x). g A) — A , Ea, A) — A The norm of z“) is not big, in fact, (I) = (al ) < Zl=l(al ) <1 II2 II J z —— _ .- l=m+1 [Al - Alz - minm+lSlSn [At - Al If k < m, y), is orthogonal to 17), = {21, 22, - - - , 2),}, hence y), has nontrivial component ylo) in Vm, and the norm of z grows rapidly during the iteration; in fact, "2“ 2 "2(0)” = \Ji “(f-[2" Z lIla/lo)"- l=1 IA; — A] 6 Hence ||z|| becomes large in one or two iterations, and when the procedure continues, the subspace dimension continues to grow. When one reaches 1: = 171, however, y), is orthogonal to the whole subspace 17", which is a good approximation to Vm, hence yl‘o) z 0, and ”z” is not large. Then the procedure ends. In such a way, we can detect the size of the cluster and generate a set of vectors which is a good approximate basis for the corresponding invariant subspace. 31 It is necessary to be specific about “large” and “not large” when this algorithm is used. Notice that keeping on iterating (4.5) yields, Ila/lo)" -* 1. llylnll -' 0. hence in the end NZ" 2 alt—t Therefore, we can say, for example, "2" is large if it is greater than l/2e, it is not large otherwise, where e is as in (4.6). Our experience is that, one needs not be too restrictive, because underestimating the cluster dimension is more serious than overestimating it, which can cause the slow convergence of the subspace iteration. One remedy for the possible incorrect dimension count is that one can use “dynamic” scheduling (describe in the next section) during the progress of the subspace iteration. 4.5 Dynamic Subspace Iteration When the convergence of the Rayleigh quotient iteraion or subspace iteration is too slow (more than three iterations, for instance), it indicates that the estimate of the subspace dimension is inappropriate, either too big or too small. The iteration process needs to be monitored more carefully, and the subspace dimension should be ajusted during the progress when it becomes necessary: after a few steps of Rayleigh-Ritz procedure [23], only those Ritz vectors with their Ritz values close together are kept for further iterations. The subspace dimension is then reduced and the condition of the subspace is improved. This in turn will reduce the computation and speed up the convergence. If, on the other hand, all the Ritz values are close and convergence is still slow, then it becomes necessary to enlarge the subspace. This can be achieved by calling procedure SUBDIM to determine a more accurate dimension for the in- variant subspace, and the subspace iteration resumes after this. Such a process often accelerates the convergence considerably. 32 Remark 4.1: For large sparse matrix, factorization is feasible, but it is still one of the major parts of the computation when solving an equation. Therefore we keep the number of factorizations to a minimum in our algorithm by including all the procedures such as INVIT, SUBDIM, SIRR, and ROI in the algorithm. INVIT, SUBDIM, and SIRR are all iterative methods. They require solving the systems with different right hand sides only, that is, for different iteration step, new factorization is not needed. 
These methods are not the fastest, but by combining with faster methods such as RQI, one can still achieve high convergent rate to reduce the computation time. 4.6 Checking When RQI or SIRR is used, equations of the type (A(t) - #03 = y are solved. By taking advantage of the symmetry, the so called symmetric Gauss elimination method [3] can be applied. That is, A(t) - 111 can be decomposed as LDLT (instead of the traditional LU form), where L is a lower triangular matrix and D is a block diagonal matrix with diagonal blocks of order one or two. Using Sylvester’s Theorem, the inertia (u, C, 7r) of A(t) — #1 is available as a by-product of the decomposition, where u,(, and r are the number of eigenvalues of A(t) which is greater than, equal to and less than a, repectively. With the availability of the inertia, the location of 11 relative to other eigenvalues of A(t) is known. To see how the checking is done, let’s assume the eigenvalues of A(t) are A1 S A; S - - . S An and ROI is applied to compute the eigenvalues. Suppose the lc-th step of RQI is: 1). Solve (A(t) — 11021)le = x“) for y“). 2). Compute 7), = ||y(")|| and set self“) = ylkl/ryk. 3). Set 110‘“) = (x("+1))TA(t)x("+1) and goto 1). 33 Suppose the convergence is observed at step m. Let the inertias corresponding to A(t) — p('”‘1)I and A(t) — ”("01 be (um-1,(m_1,1rm-1) and (um, (mtrm) respectively. If we denote i = um-“ then A.- S plm'll S Ag“. Now the checking can be done as follows: 1). If V... = i and pl”) S [Am-1), set A,- = pl”). (m) 11 l l 2., 11““) 1.1.1 2). If V... = i and 11“”) > fl(m-1), set A,“ = til”). (at) u A, “(m-1) 114-1 3). If um = i + 1, set A,“ = 11"”). l4‘01!) A, “(m-1) Am 4). If um = i — 1, set A,- = 11“"). ”(m) A, ”(m-1) At+1 Hence with minimum computations, we know whether the process converges to the eigencurve being followed. In general, jumping to a neighboring curve does not occur when eigenvalue separation is good. When the separation is poor, a cluster of eigenvalues are computed which usually includes the one we wanted. Our iterations in the correction steps are successful most of the time. In case the process does converge to a neighboring eigenvalue, there is a good chance that this value is also needed and the corresponding eigencurve has not been traced. Usually, the number of eigenvalues sought is larger than the number of processors available, so a total waste is unlikely. 34 For example, suppose we have 100 eigencurves to trace and there are 10 proces- sors available for the purpose. Assume the scheduling is arranged as follows: The first processor is assigned to trace A1(t),A2(t),---,Alo(t); the second processor is assigned to trace Au(t),A12(t),o - ',A20(t); and so on. If jumping occurs when the first processor is following A, (t), then the computed eigenvalue should be one among A2(t), A3(t), - . - , A10(t), say A3(t). Then tracing of the third eigencurve can be skipped. 4.7 Clearing Up During the tracing of eigencurves, a significant amount of time is spent on keeping the process on the right curve. There are several existing techniques for this purpose [l7],[18]. However, there are cases where these techniques are too costly. Several new approaches have been introduced in our algorithm. They are designed especially for large matrices. Instead of imposing very strong restrictions on those control parameters, our strategy is to abandon the process of tracing the curve if convergence to a desired value is not observed after reasonable efforts have been made. 
In the end, most of the needed eigenvalues of A are computed. A few of those missed are scattered across the spectrum of A and are isolated by those found, that is, there are several small intervals containing those missed values. By counting the inertias corresponding to the end points of each interval, the number of eigenvalues inside such an interval is known. These values are well separated from the neighboring ones since clusters have been found by SUBDIM in the algorithm. Under these favorite conditions, subspace iteration method can be applied in each interval efficiently to find the eigenvalues inside it, and this can be done in parallel. 4.8 Step Size Control Our extensive numerical experience shows that the step size should be chosen as large as possible. Allowing small step size can lead to inefficiency. In our algorithm, the 35 first attempt of the step size h is chosen to be max{0.25, (1 — t)/2}. If convergence is not achieved at some step and h > 0.25, the step size is cut in half and the process is repeated. If the failing occurs when h S 0.25, we will simply abandon the process of following this specific curve. Then the eigenvalue may be computed by other process (such as SUBDIM if the value is among a cluster) or may be considered missed. Because we have a back up procedure for those eigenvalues missed due to the abandence of tracing the corresponding curves, this “giving up” strategy does not cause problem, instead, it is more efficient to have a lower bound imposed on h. There are cases that, under no control limit, h becomes very small, hence causing the long execution time at a single “bad point”. In our many examples, the largest number of missed eigenvalues is usually less than 10% of the total values sought. 36 Chapter 5 Numerical Experiments 5.1 Introduction Our algorithm has been implemented in several machines using FORTRAN language. The sparse matrix A is held using coordinate scheme ([8], p.23—24), that is, the matrix is specified as a set of triples (a;,-, i, j ), they are stored in one real array and two integer arrays. Because A is symmetric, only the upper triangular part is stored. In our algorithm, sparse linear systems of the type (A(t) — AI)y = x (5.1) need to be solved frequently. The intensive research in sparse matrix techniques in the past two decades resulted in several efficient codes for solving such systems. We choose MA27 subroutines from Harwell Subroutine Library for this purpose. This code can solve indefinite symmetric systems, such as (5.1), stably and with minimum overhead above the code for positive definite systems. It can also obtain the inertia of A(t) — AI as a by-product. Both of these features are vitally important to our algorithm. The code MA27 has three separate steps: Symbolic Analyse, Factorize, and Solve. Symbolic Analyse exploits the sparse structure of the matrix and estimates the work- ing space needed for later steps. Factorize implements a version of Gauss Elimination 37 to compute the LDLT decomposition of the matrix. The last step, Solve, uses this decomposition to actually solve the matrix system. Since the underlying matrices A(t) have the same sparse structure for all t 6 [0,1], Symbolic Analyse is required only once in the whole algorithm. In addition, for inverse iterations, one needs to solve systems with different right hand sides only, new LDLT decomposition is not necessary, i.e., only the last step, Solve, is called for each iteration. 
This leads to significant saving in time because Factorize is the most expensive one of the three. 5.2 Test Matrices Four types of matrices are used to test our algorithm. The first two can be found in [5]. They are common matrices for testing purpose. 1. Diagonally-disordered matrices: These matrices arise in the study of two-dimensional N X x NY arrays of atoms in disordered systems [15]. The associate matrix A has order N = N X X NY, [B I 1' [$1 1l I B 1x A: ,3: I ".1 .I I B] L1 lx‘ The matrix A is almost block tridiagonal. I is the identity matrix of order N X . Each B block is N X X N X and there are NY blocks down the diagonal of A. The diagonal elements of B are randomly generated numbers. A scaling parameter bounds the mag- nitudes of these disordered terms. In the simplest case, these entries are chosen from a uniform random distribution over an interval [~SCALE, SCALE] determined by the user-specified scaling parameter SCALE. A second class of diagonally-disordered ma- trices is obtained by choosing the diagonal elements randomly as either 0 or SCALE. If N X and NY are coprime, then all of the eigenvalues of A are distinct. 38 2. Poisson matrices: When the Laplace equation um+uW = Au, R: {(x,y)|0 S x S X,0 S y S Y} is solved numerically, the differential equation is replaced by an algebraic linear sys- tem, Ax = Ax, obtained from discretizing the Laplace operator on the rectangle. The order of the resulting matrix is N = N X x NY where N X is the number of subdivisions in x-direction and NY is the number of subdivisions in y-direction. The matrix A is block tridiagonal, r - 1 —3c B C —sc 1 —c C B —c A: C 1 B: —c C B 30 —c l —sc 30 B ' ‘ _ —sc 1 ] The parameter c is user-specified and must be chosen such that 0 < c S 0.5. Here, C = —(0.5 — 0)]. For Dirichlet boundary conditions 3 = 1, and for Neumann con— ditions s = J2. Under Dirichlet conditions, the eigenvalues (and eigenvectors) of A are known, A;,- = [1 — 2ccos(1ri/(NX +1)) — 2(0.5 — c)cos(1rj/(NY +1))], 1SiSNX,1SjSNY. Under Neumann conditions, however, the eigenvalues and eigenvectors are not known a priori. By varying the value of c, many different eigenvalue distributions can be obtained. 3. Random matrices: The sparse matrices in this group are generated in three steps: Let N be the order of the matrix. We use rand to represent random number generator that produces uniform distribution on interval (0,1). Then 39 0) Clear the matrix: for I=1,- - -, N; J=I,- - -, N A(I, J) = 0.0. 1) Generate N entries randomly in the upper triangular part of A: for i=1,- - -, N =Int(N*rand(i)+1), J=Int(N*rand(I)+1) if I