1|}!leWWW“)WWIWW“MINIMUM 12 2 0 N01 'YHESlS \ llllllllllMIIIIHIWWlllllllllllllllllllllllllllllllllllll 31293 01413 7727 This is to certify that the dissertation entitled A SCALABLE ALGORITHM FOR NON-SYMMETRIC EIGENVALUE PROBLEM presented by Xiaozhuo Yang has been accepted towards fulfillment of the requirements for Ph . D. degree in Applied Mathematics 7 r—Cvx' \7/:£L- 6’) Major professor Date May 2: 1996 MS U is an Affirmative Action/Equal Opportunity Institution 0- 12771 LIBRARY M'Chigan State ’ University PLACE IN RETURN BOX to remove thle checkout from your record. TO AVOID FINES return on or before date due. DATE DUE DATE DUE DATE DUE L_L__| T—ll—TF—J usu IeAn Affirmative Action/Ewe! Opportunity lnetltulon W m1 A SCALABLE ALGORITHM FOR NON-SYMMETRIC EIGENVALUE PROBLEM By Xiaozhuo Yang A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY Department of Mathematics 1996 ABSTRACT A SCALABLE ALGORITHM FOR NON-SYMMETRIC EIGENVALUE PROBLEM BY Xiaozhuo Yang This thesis is on iterative methods which find all the zeros of a monic complex polynomial simultaneously and an application to the algebraic eigenvalue problems. In Chapter One we investigate the convergence behavior of the Durand-Kerner method on the real plane R2 and characterize the convergence set completely in C2. The Durand-Kerner method converges if and only if the initial approximations are in the convergence set. Chapter Two presents a modified Aberth method to find the multiple zeros of a polynomial efficiently and accurately. We show that the modified Aberth method converges cubically. We also describe the property of the original Aberth method near a multiple zero and hence propose a scheme to dynamically detect the multiple zeros and their multiplicities. In Chapter Three we apply the Aberth method, along with homotopy method, to the nonsymmetric eigenvalue problem. The algorithm has been implemented on the Intel Touchstone Delta multicomputers. The result shows this algorithm is competi- tive to HQR from EISPACK. A dynamical load-balancing technique is developed to balance the load across the computing nodes. The load-balanced algorithm improves the performance by up to 15%. DEDICATION To my wife, Ling Gao and sons, Pengling and Phillip, for their understanding, support and patience. iii ACKNOWLEDGEMENTS I am most indebted to my dissertation advisor, Professor T.Y. Li, for his encourage- ment, support adn advice during my graduate study at Michigan State University. His deep insight into mathematics and persistence have always influenced me. I also would like to thank my dissertation committee members, Professor Michael Prazier, Professor Richard Hill, Professor John McCarthy, and Professor Zhengfang Zhou, for their valuable suggestions and precious time. iv Contents LIST OF TABLES . vi LIST OF FIGURES .............................. viii 1 Dynamics of the Durand-Kerner Method 1 1.1 Introduction ................................ 1 1.2 The Derivation .............................. 2 1.3 Dynamics of the Durand—Kerner Method in R2 ............ 3 1.4 The Complex Case ............................ 15 2 A Modified Aberth Method of Finding All Roots of a Polynomial with Multiple Roots 18 2.1 derivation ................................. 18 2.2 Local Convergence ............................ 21 2.3 The Aberth Method at Multiple Zeros ................. 28 2.4 Dynamic Estimation of Multiplicities .................. 37 2.5 A Numerical Example .......................... ' 40 3 Nonsymmetric Eigenvalue Problem 44 3.1 Introduction ................................ 44 3.2 Splitting Procedure and Homotopy ................... 44 3.3 Hyman’s Method ............................. 47 3.4 Initial Implementation .......................... 49 3.5 Load—Balancing .............................. 52 3.6 Conclusion ................................. 56 BIBLIOGRAPHY 57 vi List of Tables 3.1 Time on i860 with a 64 x 64 random matrix ............... 51 3.2 Time on i860 with a 128 x 128 random matrix. ............ 51 3.3 Time on i860 with a 256 x 256 random matrix. ............ 52 3.4 Unbalanced Load Distribution ..................... 53 3.5 Balanced Load Distribution ....................... 54 vii List of Figures 1.1 The Convergence Set ........................... 6 1.2 The Decomposition of R2 ......................... 8 1.3 The First Iteration ............................ 10 1.4 The Second Iteration ........................... 11 1.5 The Monotone Convergence ....................... 13 2.1 Algorithm MODABERTH ......................... 43 3.1 Algorithm ABERTH ............................ 50 3.2 Unbalanced vs. Balanced Timing .................... 55 3.3 Timing Info for 240 x 240 and 360 x 360 matrix ........... 55 viii Chapter 1 Dynamics of the Durand-Kerner Method 1 . 1 Introduction The Durand-Kerner method is an iterative method which approximates simultane- ously all the n roots of a complex monic polynomial of degree n. It was first proposed by Durand [11], and later, independently, by Kerner [20]. Since then, more work on this method has appeared in the literature [17, 18, 21]. With a monic polynomial 19(2) =zn+alz"_l+~-+an_lz+an, (1.1) the Durand—Kerner iterations are z zizzk— n Mk) , k=1,2,...,n. (1.2) I—Ijzl,j¢k(zk _ 23') Here, {21, 22, . . . , 2n} is a set of approximations to the n zeros of p(z). Empera- tively, {21, 2;, . . . , 2;} forms a set of better approximations. The method was shown to have convergence order two when the polynomial has only simple roots [3, 20]. It was conjectured in [19] that this method converges for almost all initial approx- imations in C", when the initial approximations are viewed as a point (21, 22, . . . , 2”) in C". To the best of our knowledge, this conjecture remains open, except for the case n = 2. It was mentioned in [19] that T. Terano gave a proof [33] for n = 2 in his Ph.D. thesis. In this chapter, we investigate the convergence behavior of the Durand-Kerner method on the real plane 82 and characterize the convergence set completely in C2. The Durand-Kerner method converges if and only if the initial approximations are in the convergence set. After the derivation of the Durand-Kerner method in Section 1.2, we discuss in Section 1.3 the convergence behavior of the Durand-Kerner method in R2. In Section 1.4 we show that the Durand-Kerner method converges quadratically in its convergence set in C 2. 1.2 The Derivation The Durand-Kerner method can be derived from different approaches [11, 13, 20]. Here, we give a simple derivation of the Durand—Kerner method following [2]. We also discuss some properties of the Durand-Kerner method. Suppose that $1, $2, . . . , at” are n simple roots of polynomial p(z) and 21, 22, . . . , 2,, are n initial approximations. Let 6;, = :51, — 2k. Then, polynomial (1.1) can be written as 71 10(2) = 11(2 — (2;. + 61.)). k=l Expanding the right hand side in the powers of 6;, and dropping all higher powers of 6;, yield z)E’H(z—zk) 251,1] z—z, k=1 k: 1 jyék For a fixed index k, plugging in 2;, on both sides and solving for 6;, give the first order approximation for 6k, P(Zk) 6;, = — . (1.3) Hj¢k(zk — 21') This gives formula (1.2). Notice that the sum of the Durand-Kerner iterates z], 2;, . . . , 25,, is —al [21]. 1.3 Dynamics of the Durand-Kerner Method in R2 In this section, we consider the Durand-Kerner method in R2. All numbers considered are real numbers except otherwise stated. We assume that the polynomial has degree two with distinct real roots, having the following form where a and b are two distinct real numbers. For simplicity, let A(:L', y) = (:r’, y’) be the Durand-Kerner iteration, where 1:: : Obviously, the Durand-Kerner method does not converge when the initial points are 1:- (1.4) chosen from the set D = {(x, y) E R2 : a: = y}, which makes the denominators zero. Let J = A’1(D). This set can be described in the following theorem. Theorem 1.3.1 The set J has the following form J:{(x,y):2xy—(a+b)(:r+y)+2ab=0}. Moreover, the Durand-Kerner iteration maps all points in J into one single point (new) 2’2 ' Proof. Suppose (:r,y) is in the set A‘1(D), then _ (Iv-a)(-’L'-b) = _ (y-a)(y-b) :c—y y—a: ' So, xz—xy—a:2+(a+b):c—ab_ yz—xy—y2+(a+b)y—ab x—y _ y—m ' Hence, 2xy—(a+b)(:c+y)+2ab=0. (1.5) On the other hand, let (:r,y) satisfy condition (1.5). If :1: 7E 3%, then y = W. Bring it into the Durand-Kerner iteration (1.4), yielding ,_ (:c—a)(a:—b) a+b (I? ‘- (B _ ]a+b[.1:—2czb : 2 ' a: - 2x—(a+b) Similarly, ,_ (y-a)(y-b)_a+b y _ y — ]a+b]y-Zab - 2 ' y — 2y-(a+b) Now, we show that :1: 52$ 9359 when (x,y) E J. Ifx = 13:5 and (x,y) E J, then 2a+b b y—(a+b)(y+a—:—)+2ab=0. 5 Consequently, —L%§L2 + 2ab = 0. So, (a — b)2 = 0. Hence a = b. This contradicts to our assumption that a and b are distinct. Let P = ((a + b)/2, (a + b)/2), then the Durand-Kerner iteration A maps J onto P. I Let L = {(33, y) : r + y = a + b}, a straight line on the real plane. The two fixed points P1 = (b,a) and P2 = (a,b) of the Durand-Kerner iteration function (1.4) are on L (See Figure 1.1). Theorem 1.3.2 After one Durand-Kerner iteration, (:r',y’) in (1.4) satisfies :r’+y’ = a + b. Proof. Choose any starting points a: and y such that a: 75 y. By the Durand- Kerner iteration (1.4) , , (x-a)(~'v-b) a: = :1:— 3‘31 —:ry+ (a+b)a: — ab 3 :r—y and , — —b y I y_(y a)(y ) y—a: _ -:I:y+(a+b)y—ab _ y—at , So, x'+y'—a—b = a"—a+y'—b —a:y+(a+b)a:—a(x—y)—ab :r—y —:cy+(a+b>y—b—ab y—x + 6 —:cy+b:c+ay—ab+:ry—ay—ba:+ab 3"?! Pl Lh Figure 1.1: The Convergence Set Theorem 1.3.1 and Theorem 1.3.2 show that the Durand-Kerner method maps It2 —- D onto L and maps J onto point P. We will show next that R2 — (J U D) is the convergence set on the rea1 plane where starting in which the Durand—Kerner method converges . For convenience, define (a+b):c—2ab 2x—(a+b)' d(a:) = We assume a > b. The graph of D is a straight line passing through the point P = ((a + b)/2, (a + b)/2) in R2 while the graph of d(a:), the set J, has two branches (see Figure 1.1). The function y = d(:c) has a horizontal asymptote and a vertical asymptote, Lh= 3! = 99?, Lu: :1: = gig—Q. L}, and LU divide the plane into four regions: II={(x,y):x>mandy> w}. 12={(:r,y)::r<9flandy>a—+b. I3={(:r,y):.r d(:c)}. [12 = {(x,y) E 11 : y > a: and y > d(a:)}. 113 = {(x,y) E 11 : y > .r and y < d(a:)}. [14 = {(a:,y) E II : y < a: and y < d(a:)}. I31={(:c,y)6 13 : y > :1: and y < d(:r)}. I32 : {(a:,y) 6 I3 : y < :1: and y < d(.r)}. 133 = {(x,y) E [3 : y < :1: and y > d(:c)}. 134 = {(a',y) E 13 : y > :1: and y > d(:z:)}. Line y = a and line a: = b divide the regions 12 and 14 into eight regions: 0 [21={(:I:,y)€I4.b<:rS(a+b)/2 andaaandbaandy :r, y > d(a:) and a: > (a + b)/2. For (x’, y’) = A(:c, y), by Theorem 1.3.2, we only need to show ar’ > (a + b)/2. , a+b —:cy+(a+b).r—ab a+b .r- = - 2 x—y 2 _ 25ry—2(a+b):r+2ab—(a+b)y+(a+b).r 2(y-arr) _ 2:1:y—(a+b)a:—(a+b)y+2ab 2(y-as) ' Since the denominator is positive by assumption, the numerator also needs to be positive. Now, 0 < y — d(.r) (a + b):c — 2ab 2a: — (a + b) _ 2xy—(a+b)y—(a+b)x+2ab — 2:1: — (a + b) The denominator is positive, so is the numerator. this proves cr’ — @ > 0. I Lemma 1.3.4 A(114) C [4 O L. Proof. Suppose the initial point (x,y) 6 114, then y < :L‘, y < d(:c) and :r > (a + b) / 2. The assertion can be proved by applying the same arguments as in Lemma 1.3.3. I Similarly, we can show that A(111) C [2 O L and A(113) C 12 (1 L. By symmetry, similar results hold when the starting points are in [3, namely, A(I31) C [2 n L, A(I33) C [2 n L, A(I32) C 14 (1 L and A(I34) C 14 O L (see Figure 1.3). 10 ““.‘12 I] If _____ r- ---l.!.2_:.i' .. I, +.. 4 ’ 11 Ill l 114 e l'131 I34 4 a"; 132 I-." “ .‘I .r ...... 133-- ......... ‘e .‘ L 13 I4 Figure 1.3: The First Iteration Next, we show that the Durand-Kerner iterations converge to P2 = (a,b) and P1 = (b,a) when the starting points are in I4 and 12, respectively. By symmetry, we only consider the case of 1.; (see Figure 1.4). Lemma 1.3.5 A([41) C [42 n L. Proof. Let (x,y) E I41. Notice that y < 3:, a: > a > (a + b)/2 and b < y < (a + b)/2. For (1", y’) = A(2:,y), (III—a : _$y+(a+b)x_ab_a($_y) 13—3] _. —$y+bx+ay—ab _ 53—3, _ (x—GXD—u) _ x—y < O, and I a+b —2a:y+2(a+b):z:—2ab—(a+b)x+(a+b)y 2 2(x - y) 11 —2xy + (a + b):c + (a + b)y — 2ab 2(x - y) Combining inequalities w > 2ab and a: > a > Lag—bl, yields —2a:y + (a + b)a: + (a + b)y — 2ab > —2a:y+(a+b):r+(a+b)y— (a + b)’ 2 = 0.5((a + b) — 2y)(2:1: — (a + b)) >0. So, 93’ — $13 > 0. This proves our assertion. Similarly, we can show that A(I43) C [42 n L. 12 “ 11 13 \ \ ‘ \ 132,.-. I43 -‘ ¢’ . -‘ -- Figure 1.4: The Second Iteration When the starting points are chosen in 142 we have Lemma 1.3.6 A(I42) C [44 0 L. Proof. Let (a3,y) E [42, then y < :r, 91sz < a: g a and b S y < 9:39. 12 For (:r’, y’) = A(:c,y), we need to show 93’ Z a since (:r’,y’) lies on L. , —:cy+(a+b)a:—ab—a(:r—y) a: —a = :1: — y _ (x - a)(b - y) 33 ‘y > 0 I For region 144, we have Lemma 1.3.7 AU“) C [44 0 L. Proof. When (2:,y) E 144, cc 2 a > b 2 y. Then, :c’—a : —:ry+(a+b)at—ab-a(.r—y) I ”y = (x —a)(b—y) ‘7" ‘y 2 0. and ,_b _ -xy+(a+b)y-ab-b(y—w) y — y —.’1: = (m—a)(b-—y) y—a: S 0 SO ((13,, y’) E I44 (1 L. I We have shown that after at most two iterations the Durand-Kerner iterates lie on the line L in either 144 or 122 (see Figure 1.4). Next, we will show that the Durand-Kerner iterations converge monotonically on the line L towards the points P2 2 (a,b) and P1 = (b, a) when the initial approxima- tions are in the region I44 (1 L and [22 {'1 L, respectively (see Figure 1.5). 13 ‘2 ~. 11 122 “ Pl \4 13 I4 Figure 1.5: The Monotone Convergence Theorem 1.3.8 If the starting point (:r,y) E L F1 144, then the Durand-Kerner iter- ations converge to (a,b) monotonically. Moreover, it converges quadratically. Proof. Since (2:,y) lies on L, we have a: + y = a + b. For (x’, y’) = A(:c,y), we also have ar’ + y’ = a + b. Hence, y = a+b—x. So, $I_a : ($—a)(b—y) 1'"?! : (r—a)(:c—a) 2:c—(a+b) = 2(2) Gib)”; a) 14 ° x—a Since 0 < 2(x—“—}9) < 1. Let (34231”) = (A)“($.y) E A(fink—1016.11». then 1 0 < a:(k+1)— a < —(;r(k) — a) < (§)k+l(:r — a) inductively from (1.6). So, .731") converges to a monotonically. At the same time, y”) converges to b monotonically. Therefore, (2711‘), y‘kl) converges to (a, b) monotonically. Next, we consider the rate of convergence. Since (||(:v’. y’) - (a, b)|l2)2 = (50’ - (1)2 + (y’ - b)2 (”(23.11) - (a,b)II2)4 ((11 - a)2 + (y - WV 2 rug-1311? 2 ((33 - a02 + (y - (9)2)? (:13 - a)2(x - a)2 (a + b — 2x)22(:c -— a)4 l (a +b— 2:22)2 1 (x - 11-1)? .hh—i The Durand-Kerner method converges quadratically. I Similarly, we can prove the following Theorem 1.3.9 If the starting point (.r,y) E L (1 122, then the Durand-Kerner iter- ations converge to (b,a). Moreover, it converges quadratically. In summary, after at most two iterations the Durand-Kerner method starting at 15 a point in R2 — (J U D) will converge to the solution monotonically along the line L and the rate of convergence is two. 1.4 The Complex Case This section discusses the Durand-Kerner method in C 2. Let the two zeros a and b as well as the starting points :1: and y be complex numbers. Similar to the previous section, (a, b), (b, a) and (3:, y) are viewed as points in two dimensional complex space C2. Let D = {(x,y) 6 C2: 1: = y}, J = {(a:,y) 6 C2: 2:1:y—(a+b)(a:+y)+2ab = 0} and L = {(a:,y) E C'2 : a: + y = a + b}. Notice that Theorem 1.3.1 and Theorem 1.3.2 are still true in complex case. J is the pre—image of D and after one iteration all points lie on the complex line L. We also notice that both J and D have Lebesgue measure zero in C 2. Similar to the Durand-Kerner method on the real plane, we show that the Durand- Kerner method converges for any starting points in the set C2 — (J U D U S ), where S is a set of Lebesgue measure zero in C 2. The line L can be parameterized as follows: a+b a-b 2+2t a; : (1.7) y=%+%t 1 where t is a complex number. When t = 0, —l and 1 the corresponding points are P = ((a + b)/2,(a + b)/2) , P1 = (b,a) and P2 = (a,b). Since the Durand—Kerner iterative function (1.4) maps C 2 — (J U D) onto L — P, we assume, for the moment, that the starting points are chosen from L — P. These points can be parameterized by (1.7) with complex parameter t # 0. We will see that Durand-Kerner method does not converge when the initial approximation is chosen 16 by (1.7) with an imaginary number t. Let S be the set of points in 02 corresponding to imaginary t, that is, .5' = 5,, U 5y, where a+b a—b 2 + 2 ri)} S.={(w.y)€02=(fv-y)$-($-a)($-b)=(iv-y)( and S.={(x.y)e C2:(y—x)y-(y—a)(y—b> =(y—x)(“',*b+ bgarm, where r is a real number and i = \/—1. It is easy to see that S is a geometric surface in C 2 , hence it has Lebesgue measure zero. Now, the Durand-Kerner iterative function (1.4) maps points in C 2 - (J UDUS) to points on line L which can be parameterized by (1.7) with non-zero and non-imaginary t. Let (:r,y) be on L with parametric equation (1.7). For (33’, y’) = A(a:,y), we have I _ gfl [a-b]]1+t2] a: — 2 + 2 2t _ a+b (5-02 (1+9) y' — '2— + 2 2t ‘ Therefore, the Durand—Kerner iterations are determined solely by the iterative behavior of the complex iterative function It is easy to see that f (z) is just the Newton iterative function for the quadratic polynomial q(z) = 22—1. That is, f (z) = z—:—,(L:%. The Newton iterations of quadratic functions are well understood [4]. We list some results in the following Theorem 1.4.1 The iterations of the function f(z) have the following properties: 1. f"(z) converges to 1 if the real part ofz is positive, 17 2. f"(z) converges to -1 if the real part ofz is negative, 3. f"(z) does not converge ifz is on the imaginary axis, where f”(2) = f(f“’l(2))- Therefore, the Durand-Kerner method converges when the starting points are chosen from 02 — (J U D U 5'). Since Newton’s method converges quadratically, so does the Durand-Kerner method. We conclude that the Durand-Kerner method converges if and only if the starting points are in the set C'2 — (J U D U S). Chapter 2 A Modified Aberth Method of Finding All Roots of a Polynomial with Multiple Roots 2.1 derivation The Aberth method is a parallel iterative method for finding all roots of a monic complex polynomial with simple roots simultaneously. This method was discovered many times, see Aberth [2], Borsch-Supan [5], Docev [8], and Ehrlich [12]. For a monic complex polynomial p(z)=z”-1-a12""1+---+a,,_lz+an , (2.1) with n distinct initial approximations zl, 22, ..., zn, the Aberth iterations are 2 zfczzk— p( k) k=1,2,...,n. (2.2) P’(Zk) - 19121:) zy=1g¢k 11—; ’ It was shown [2] that the Aberth method converges cubically to the roots of p(a:) when it has only simple roots. It is also observed that this method converges to the n roots simultaneously for almost all initial approximations. To be more precise, considering the initial approximations (21,22, . ..,z,.,) as a point in C", the Aberth 18 19 method converges for all initial points in C n — J, where J is a subset of C" with Lebesgue measure 0. At this stage, no proof for this observation is available. The Aberth method is not suitable for finding the roots of a polynomial with 1 multiple roots, since the quotient can be very large when two approximations 2k and z, converge to the same multiple root. Overflow would occur. To overcome this difficulty, we propose a modified Aberth method which has the same convergence rate as the original Aberth method and can find all roots of a polynomial with multiple roots. This method reduces to the original Aberth method when the polynomial has only simple roots. The modified method, like the original method, is parallel in nature and is globally convergent in practice. In this chapter, we first derive the modified Aberth method and describe its con- vergence property. In later sections we show the convergence behavior of the original Aberth method near multiple roots and discuss the issue concerning how to dynami— cally detect the multiple roots. The derivation of the modified Aberth method is based on the theory of electro— statics. Let the polynomial in (2.1) have the following form p(z) = (z — 1:1)"1(z — $2)"2...(z — 17m)”"‘ , (2.3) where 1:1, 1:2,. . . , mm are the m distinct roots of p(z) with multiplicities n1, n2, . . . , nm, respectively, where n = 22;, nk . Starting with m distinct initial approximations z1,22, . . . , 2",, the modified Aberth iterations are nkp(zk) P’(Zk) — 11121:) Z?=1;j¢k Stu—127 ,k=1,2,...,m. (2.4) ZLZZ/c— We utilize the identification of complex numbers with vectors in the complex plane C. If an 12;, unit plus charge is situated at the point 3],, for each It, the resulting 20 electromagnetic vector field at a point z is i (7.42:: = 6%?) (2.5) k=1 _ 33") Given a polynomial p(z), we wish to locate a root of p(z) by sampling the field defined by the right hand side of (2.5) at some point z;c with multiplicity nk, and finding the point where a single nk unit plus charge would be located if it were causing this field. Sampling at this new point, the cycle then could be repeated. Calling the new point z]c = z), + wk, and solving for wk in the equation (z,c — (2+ wk» : (211:3) ’ leads to the modified Newton’s method I Z nkp(zk) 2k: k— p’(2k) . (2.6) Now, if we try to locate all roots of p(z) by simultaneously applying the modified Newton’s method (2.6) to m different sampling points, some of them may converge to the same (multiple) root. To avoid this, an nk unit minus charge is assigned at the sampling point 2),. The idea is that when a sampling point z)c is near a (multiple) root, the field from the minus charge at 2;, should counteract that field from the plus charge at the root, preventing a second sampling point from converging to this root. After taking conjugates, we have the iteration equation for the k-th sampling point, n). _ P'fZ) "n1 (2:. — (z). + UM) _ P12) + Z #k 2" — 2" Solving for wk yields wk = — nkflzk) . . (2.7) p,(zk) _ 19(2):) 21%]: zk—ZJ This gives the modified Aberth iterative formula (2.4). 21 The modified Aberth iteration correction (2.7) can also be derived algebraically. Let Rk(z)— —- mfg—aw,— be a rational function, k- — 1,2,. ,m. Then 12;, has the same root 3:), as p(z). The modified Newton’s method (2.6) applied to Rk(z) gives nkRk(Zk) 32121:) wk: — le n]#k?::k-)zj)n1 (Hj¢k(zk — 2.1.)"J )2 — (II (2 2)"J)’ P’1Zk) IIj¢k(zk ‘ 21')" ’ “P120 ”1::(zZ—z:)n J . M10120 P'1Zk) _ 19121:) Eye): 27212;, . 2.2 Local Convergence In this section we discuss the local convergence property of the modified Aberth method (2.4). We will Show that it converges cubically. However, when the mul- tiplicities are estimated incorrectly in practice, only linear convergence rate can be reached. For convenience, let and Expanding Fk(Z) in the Taylor series at the point X = (x1, x2, . . . ,xm) yields 6Fk(X 82Fk(X w 2...; 1..— Hi: We..1.._..)...-a.) 22 One can show that all the first and second partial derivatives are zero, so the successive iterates converge cubically. But this involves some tedious calculation of the second derivatives. Instead, we give in this section another proof of the cubical convergence. First, we calculate the first derivatives which reveal that the modified Aberth method is only linearly convergent if the multiplicities are estimated incor- rectly. Lemma 2.2.1 . P(Zk) _ ill-£31: P’(Zk) — Proof. For simplicity, let k = 1. 1. P121) ‘33. pa.) 21 _ 1.... (21 — we. — .2)» ~12. - arm zl—nl n1(zl — x1)("1“1)(zl — 332)"? - ~ - + (21 — $1)"1n2(zl — x2)("2'1) - - - + 1. (21 — $0121" 11311)"2 ' ' ' (21 — $m)n"' = 1m z1—>x1n1(zl— 2:2)"2(21- 9:3)"3 ---+ (21 — $1)n2(21 — $2)("2‘1)~-- + --- =0. Lemma 2.2.2 Suppose j 7i k, then 8Dk(Z) ] n,- 62y Z:X— (it;c — $J)2, and 6015(2) _ —nk 82]: lZ=X— be; (13k _ 3302 Proof 23 and 8D,.(Z) _ n1 _ —n1 82" l —X (#1: (2k 21)2 |Z_X— #2]: (17k _ xl)2 Lemma 2.2.3 P12) Pm(2) n1. —1 Proof. Write p(z) = (z — xk)""g(z), where g(:1:k) # 0. Then 11(2) = no - x.)"~-‘g(z) + (z — arm's) 11212) = min. —1)(z — ark-29(2) + no - ark-1912) +m)(e§.°))2 s Mil—Tswmei”)? tp2 IX/(m - 1)T(£(0))3 — tp2 . Hence, 6(1) _<_ Wkwl)? Furthermore, .53) _ ————-—,p2 wows. ’) K(m — 1)T lpz (0) < _ tp2 K(m — 1)TC“ _ 6(0) __ k , Hence, each new disk is contained in the previous disk. Therefore, K(m — 1)T 6("7+1) < (4")? by induction. I 2.3 The Aberth Method at Multiple Zeros Though we proved in the previous section that the modified Aberth method con- verges cubically, no information is available in practice whether a given polynomial has multiple roots and what the multiplicities of multiple roots are. With n initial approximations for a polynomial of degree n, one can only start with the original Aberth method. An effective program should be able to detect the multiplicities of multiple roots of a polynomial in the process of execution. We first investigate the behavior of the original Aberth method near a multiple root. In this section we show that the approximations to a multiple root become equidistant on the circle centered at the root. This information will be used to dynamically detect multiple roots and their multiplicities. We first investigate the convergence behavior of the Aberth iterations applied to polynomial 17(2) 2 2". Let rk = if, r; = %,k = 2,3,...,n, and r1 = l,r’l = 1. 29 Assume none of the approximations reaches the root of p(z) in the process. Dividing both sides of (2.2) by 2;, we have 2k — p(zk) 1 I z; p’(zk)-p(zn=)2?=1,,¢x 3:3 rk — '—,' — 21 21 _ P051) f P’(z1)-P(21)2?=2 11—”: 51 _ PM) 21 21(P’(Zk)-P(zk)2;'=1,,¢k 371:3) _ _ 13(21) T 21(P’(zi)-P(ZI)2?=2 3:37) 51 Z? — —1 21 21012: ‘21? 23:1,];41: ,k—lgl _ 1 z? — —T 21(nz;1 —z{‘ :=2 fl) ,. k 7'1: — 71—1 1: 1 "r1. "'2' 211:1 1w: 3: l — ;——,,1—1— - 1:2 l—rJ 1 — 1 1 ""k 2;: .nek a: _ J _ Tl: l— ——l—_r_—_ "-21:21—rj Qk(7‘1, 7‘2, ..., Tn) . With Q1 2 1, let Q : (Q1,Q2,...,Qn). Forj = 1,2,...,n, let w,- = 62195411 be the roots of the polynomial P(z) = z" — 1, where i = \/—1. It is easy to see that (w1,w2, . . . ,wn) is a fixed point of Q. Theorem 2.3.1 Ifr = (r1,r2, . . . ,rn) with no zero components is a fixed point of Q, then r = w. Proof. Under the assumption, rk satisfies, 71-1 ”A. nrn—I_rn 2n: 1 k k ]= 1,);11: 'lc‘rj rkzr;c ,kzl,2,...,n. (2.10) 1—n'_2:1=121—jr 3O Canceling 7‘]; on both sides of (2.10) yields __l__ nrk- -rk Z]: 1,J#k 'k—r) l-n "8:21—11, Therefore, 1 1 n 1 —' n 1 a n 7* Zj=m¢k rk—r ” 21:2 l—r J J or n 23f221jm QJU F 1': 1j¢k Tip for k=1,2,. .,n We want to show that r1,r2, ..., rn are exactly the n roots of polynomial z" — 1, which are spaced equally on the unit circle. It is easy to see that the roots of z" — 1 satisfy (2.11), since 2 Zj¢k——— ”_rj — -'—’£,—)(,(,:—;l for any polynomial with all its roots being simple, where p’ and pm are the first and second derivatives of p. The right hand side of (2.11) is a constant C for any k = l, 2, . . . , n. Summing up both sides of (2.11) for h from 1 to n, the right hand side becomes nC. Let the left hand side of (2.11) be fn, then fn = in. Z — r k=1 1': —1.j¢k rk J n—1 n—1 1 n—1 n—1 1 = Th 7' r + T" Z r + Th - . — r r — r k=1 J:1,j;ék " 3:1 " J k=1 " " = fl4+n—l. (2H) Since the first term of the recursive equation is f2 = l, the solution of the recursive 31 relation is fn = M. Therefore, nC = £33411. Cancelling n on both sides gives 2 " 1 — 1 Tk Z = n 2 , (2.13) 7' —T‘ j=1.j¢k " J or, 2n, 2 H(rk—r1)=(n—1)H(rk—r,). (2.14) j=l,j¢k l¢k.j 1'9“: Let P(z) = n (z — Tj). Then (2.14) becomes rkP(2)(rk) = (n — 1)P'(rk) (2.15) for k = 1,2, ..., n. However, the polynomials zP(2)(z) and (n — 1)P'(z) on both sides of (2.15) have the same degree n — 1 and the same leading coefficient (n — l)(n — 2). They are also identical at n distinct points r1,r2, . . . ,rn. So those two polynomials are identical. We have P(2)(z) __ n — 1 P’(z) _ 2 ° Integrating both sides, we obtain P'(z) 2 c2:"‘1 for some constant c. Hence, P(z) = z" —1 since P(z) is monic and r1 = 1. Therefore, r1, r2, . . . , rn are the n roots of z" — 1 = 0 and w is the only fixed point of Q. I Based on Theorem 2.3.1, we can show, Theorem 2.3.2 Applying the Aberth method {2.2) to p(z) = 2", when the iterations converge to the n-ple root 0 and none of them reaches it, then they are eventually equally spaced on a circle around 0 and the modulus of each correction is ;1%. Proof. The first part is a direct consequence of Theorem 2.3.1. For the second part, assume the approximations are equally spaced on a circle around 0. Without 32 loss of generality, let the approximations 21, 22, - - - ,2n be the n roots of z" -— r = 0 for some positive number r. Then T! I zk zk _ 2k — n l l - n nzk — zk 2:5“: “—2, 2!: : zk_ n(n-l)z"—2 " ‘ k—t— Zk = 2k — T n “ T 2 = z 1— A n+1) n —1 = z 72 +1 k So, IZk—zll=lzk—n_lzkl= 2 Izkl. n+1 n+1 That is, the modulus of each correction is $. I Similar result holds for the Aberth method applied to polynomial p(z) = (z — 3:)" for a complex number 1:. For a general polynomial, when we analyze the behavior of the approximations converging to multiple roots, we may assume the approximations which converge to simple roots are the roots themselves, since these approximations converge to their targets very fast (we will show this later). Theorem 2.3.3 For a polynomial p(z) of degree n which has an m-ple root :31 (m < n) and simple roots x2, x3, ..., :rn_m+1, suppose approximations 21, 22, ..., 2m converge to $1 and other approximations 2...“, ..., 2,, take the roots ofp(z) other than 3:1 as their initial values, then 21, 22, ..., 2,., are eventually equally spaced on a circle around $1 and the modulus of each correction is #. 33 Proof. Since approximations zm+1,. . . ,2” are roots of p(z): 2:2, 3:3, ..., $n_m+1, no iterations for them are needed. We want to show that the Aberth iterations for 21, 22, . . . , 2m applied to p(z) are equivalent to the Aberth iterations applied to 2'" —:1:1. Let p(z) = (z — xl)mg(z), then the Aberth iterations applied to p(z) for 21, 22, ...,zm are zfczzk— p(z).) 1 ,k=1,2,...,m. P'(Zk) " p(z).) Z?¢k,,-=1 27.3 The denominators in the above iterations are P'(Zk) - PM) E l j¢k.j=l zk—zj = m(2k - film-19(4) + (2k ‘ $llm9'(zkl = mac—mm 19(2.)—(z.—x1)mg(zv Z kl, J¢k,j 0. Let r 2 rk and z E C with | z — a |> 1‘. Then 1 m k=0 2 — 2;, z — a with some fl satisfying I ,6 |< (Iz- al)(|r:- _a|_r) Proof. 1 1 _ 1 1 z-—z;c z—a — z—a—rkeek‘ z—a ”69,4 r(z — a — rkeak')(z — or), and ski rke 1 I IS r(2—a—rke9k‘)(z—a) (Iz—a|)(|z—a|—r)' 35 Let m—l rkepk,‘ B = 0 3 , k=0 r(z — a — rke * )(z — a) m—l l m m then, 2,50 z_2k — z_a = 7‘2 and Ifl IS (|z-o|)(|z—o|—r)' I Theorem 2.3.6 Suppose the Aberth approximations applied to p(z) converges and the approximation starting at z, converges to a simple root x1, then, eventually, I zlr+1)— :6.- IS 063, ("+1) where z,- is the approximation at step (r + l), C is a constant and e=max{|z§r)—xj |,j:1,2,...,n}. Proof. Suppose p(z) = H2":I(z — xi)” , where x1,x2, ...,xm are m distinct roots with multiplicities n1, n2, ..., nm. Then Let c be a small number such that the distance of every Aberth approximation to its target is decreasing as soon as all the approximations are within 6 distance from the roots. Let p be the minimum distance of any two distinct roots x1. and x,. Let c be far less than p, say 46 < p. For simplicity, let n1 2' 1, 21 approximates x1, 22, . . . ,zn,+n, approximate x2, . . . , Zn1+n2+...+nm_1 , . . . ,zn approximate xm. Suppose all the approximations are within 6 distance from their targets. From Lemma 2.3.5, we have, "1 1 n- J E , = + 7351' k=1 21 _ zn1+...+n]_1+k 21 _ xj . n 8n - Wltl’l r, < e and I 2, |< 17%; = 3;} SlIlCe Zn1+...+n,_1+1,. . -,Zn1+...+n, converge to x,. 4 4 36 So, , 1 21—31 — Zi—iBl—p.(21)_z,, 1 p(zl) 3:2 7.1—2, 1 = 21 — .751 — m _"J__+ _ 21:2 zl—xJ-l- zl —1:l 2m :2 zl—xJ —l—+ rj/Bj l = 21— 171‘ 1 m . . z1-1731 2' 21—1‘1 — 1 "’ Z?=2(zl _ lerjflj 1 1— 21:2(21— $07351] _ :71: 2(31 ‘5cllrjflj — (21 — 3:1)1 “ 2m=2(zl“ ‘31 )rjflj Hence, when r,- < e we have = (zl—xl) l— 8 n-l 3p 3 z —x < ———e 3p 16 Therefore, | 7.1 — x1 |< 1_L;p;1_1€ 3 when 6 satisfies the extra condition 62 < ———33— I 16(n— 1) This theorem does not imply that the rate of convergence for the approximations which converge to simple roots is cubic. When 6. is the largest error of all the errors | 2):) -- 23;. l, k = 1,2, . . . ,m, at step 7‘, the conclusion of the theorem indicates the error at step r + 1 for approximations converging to simple roots is less than C 63, for some fixed constant C. Apparently, it is much faster than the convergence rate of the approximations which converge to multiple roots where the error at step r + 1 is less than C16 with C1 < 1. In practice, the approximations converging to simple roots converge to the roots very fast while the approximations converging to multiple roots display a very slow convergence rate. 37 In [29] and [17], finding multiple roots of polynomials by Durand-Kerner method is discussed. Like the Aberth method, the Durand-Kerner method can find all the roots of a polynomial with simple roots very efficiently. It converges quadratically in this case while it converges only linearly when multiple roots exist. In [29] a result similar to Theorem 2.3.2 for the Durand-Kerner method is proved without using the modified Durand-Kerner method to refine the approximations. In [17] a different technique is used to handle the multiple roots and obtains superlinear convergence rate. Their numerical results show that both approaches can not find the multiple roots with high accuracy. 2.4 Dynamic Estimation of Multiplicities For a given polynomial p(z) with no information in advance, one can only use the original Aberth method (2.2) in the beginning. It is desirable that the multiplicities of multiple zeros of p(z) can be dynamically detected in the process of execution. The approximations can then be grouped into different groups according to the data available concerning whether they converge to the same root. Then, the modified Aberth method is applied. For a group of approximations converging to the same root, the proposed algorithm takes the average value of all the members in the group. This average value should be closer to the multiple root than any approximation in this group as the approximations are spaced equally on a circle around the root by Theorem 2.3.2 and Theorem 2.3.3. This average is used in the modified Aberth iterations. The same approach can be found in [17] which uses these properties without proof. Each 2;. takes a small random perturbation of the averaged value for the next iteration since the same Durand-Kerner iteration functions are used and the approximations must be distinct. While this approach provides a supper linear 38 convergence rate, it does not yield a high accuracy. In [29], the correction f (2k) Hj¢k(zk ‘ ZJ‘) is multiplied by 72],, the estimated multiplicity, to be added to 2],, since the modulus of the correction is 771; for the Durand-Kerner method. In this way, quadratic convergence is achieved theoretically. Again, this approach can not yield results with high accuracy since no new formulae are used in the process of iterations. None of the remedies can avoid the overflow problem because the reciprocals of small differences of close approximations are too large. The modified Aberth method introduced in this chapter can avoid the overflow problem because the approximations in the same group are treated as a single value in the new formula. Consequently, the roots of the polynomial can be obtained with high accuracy. Our technique of dividing approximations into groups is based on the following observations: 1. If the approximations to an m-ple root are sufficiently close, m approximations are situated on a small circle centering around the root by an angle of 2;". 2. Every correction directs toward the center of the circle and has almost the same magnitude. 3. In comparison with approximations of simple roots, the order of convergence is slow, i.e., the value of the residual is relatively large. 4. The magnitudes of the residuals for m approximations are almost the same. After some iterations we can group the approximations into different groups by the following method. If the k-th and j—th approximations are in the same group, they must satisfy the following conditions: 39 1. The corrections wk and w, should satisfy lwkl 1—a< lel <1+a, (am) for a small number a. 2. Let 0].,- be the angle between vector 2, — 2;, and vector wk, and 0,]. be the angle between vector zk — z,- and vector wj. These angles should satisfy | cosflk, — €0.3ij |< B , (2.17) for a small number ,8. Notice that cosfikj = W > 0 and 0050,}. = (2:;.,—zz ,m,) > 0 lzk‘zjllel . 3. We can count the number of approximations satisfying the above two crite— ria to find the multiplicity. To make sure this group contains all appropriate approximations we check the criterion: [W£r+l}]_m—1 — —< 2.1 wa’}| m+1| 7’ ( 8) for a small number 7. In our implementation, we trace the rate of convergence of each approximation by keeping track of each residual wk. At first few iterations, wk usually fluctuates {r-H} wildly. Then all 11);, start decreasing monotonically. By checking the ratio 35-m— r k which converges to (m — 1) / (m + 1) by Corollary 2.3.4, one can tell which approx— imations converge to simple roots. By Theorem 2.3.6 the ratios corresponding to approximations which converge to simple roots converge to zero very fast. We ob- served that as soon as all residuals start decreasing it takes at most 10 iterations for those approximations which converge to simple roots to converge to their targets within the chosen accuracy. Then the program begins to detect the multiplicities 40 and to execute the grouping process. At first, the residuals are sorted by their mag- nitudes and the grouping process is invoked to group the approximations according to criteria (2.16) and (2.17). The program checks criterion (2.18) for each group. If it is satisfied, the program takes the average of the approximations in the group and use it in the modified Aberth method. If criterion (2.18) is not satisfied the ap- proximations in this group are used in the modified Aberth method instead of their average. When the grouping process is successful it takes only a few iterations for the modified Aberth method to converge since the averages are very close to the roots and the method converges cubically. In our experiment, it usually takes less than 3 iterations for modified Aberth method to converge after the grouping. In practice, the grouping process may include wrong approximations in a group. In this case, the modified Aberth method will converge linearly. The program disbands the groups with linear convergence rate and convert to the beginning with the newly computed approximations as initial approximations. The flow chart of the program is given in Figure 2.1. 2.5 A Numerical Example We demonstrate our method by the following numerical example. Let p(z) = (z —1.1—1.1z')4(z — 3.2 — 2.3i)2(z — 2.1—1.5). This polynomial was also used in [29] with which we will compare our result. As in [29], we choose the initial approximations 2;, = 627”"77, k = 0,1,. . . ,6. After 12 iterations the approximations and the residual | w,- I are real imaginary residual 1. 2.1000000000000 1.5000000000000 0. In fact, the lst approximation reaches the exact root at the 10—th iteration. The 4—th, 5-th, 6-th and the 7-th are grouped as one group with multiplicity 4. Their 3.2002858826098 3.1997141178405 1.0968596312555 1 .09406242073 72 1.1059206829313 1.1032928553054 average is 41 2.2998138134465 2.3001862562026 1.0941839385686 1.1031845106896 1.0967222422851 1 . 1060241399490 6.8221947921266D—04 6.8252015101818D-04 4.2572547041395D-03 4.4900998706488D-03 4.5441352406740D-03 4.7093618963726D-03 (1.1000338975574, 1.10002287078731). It is very close to the exact root (1.1, 1.1). Grouping the 2nd and 3rd in one group gives an average which is also very close to the exact root (3.2, 2.3). These demonstrate the effective— (3.2000000002252, 2.3000000348246) ness of Theorem 2.2.6. Then the program switches to the modified Aberth method (2.4) . Only one iteration gives all the exact roots. 1. 2. 3. real 1 . 1000000000000 3.2000000000000 2.1000000000000 imaginary 1.1000000000000 2.3000000000000 1.5000000000000 42 In [29], the same groups are formed after 14 iterations. It took another 7 iterations for the program to stop. The final results for the approximations converging to the root (1.1,1.1) are listed below with accuracy to the 4—th digit. real imaginary 1. 1.09914601 1.10078329 2. 1.10092914 1.09945021 3. 1.09931441 1.09917266 4. 1.10045233 1.10079471 The result is even far less accurate than our average. In our experiments the approximations converge linearly until the modified Aberth method is applied. Once the multiplicities are detected, the modified Aberth method converges extremely fast. 43 Algorithm MODABERTH Input: polynomial p(z) and its degree n the initial approximations 2; Output: the roots of p(z) begin for i = 1 to 2 do for j = 1 to n do compute new approximations z; compute residual ué, save old one as an for j = 1 to n do compute the ratio Tj=3Ué/UU for i = 1 to 2 for j = 1 to n compute new approximation as before compute residual as before compute ratio as before and save old ones now we have 3 consecutive ratios for each approximation we want to check if the ratios are all decreasing monotonically while not decreasing do one iteration for each approximation while there is an i such that r5< smalll and w,- > smallg ! if there is an approximation converging to a simple root but ! it has not yet converged. do one iteration for each approximation while not done sort residuals un,um,~-,um. for i = 1 to n and w; > small; for j = i+1 to n check if j should be in the same group as i for each group, check criterion 3 if all groups satisfy criterion 3 set success = 1 if success = 1 while not all convergent use modified Aberth method once for each approx if linear convergence is observed break up groups and goto begin done = 1 ! all approximations are convergent else use modified Aberth method once for each approx end MODABERTH Figure 2.1: Algorithm MODABERTH Chapter 3 Nonsymmetric Eigenvalue Problem 3.1 Introduction This chapter presents a parallel algorithm to solve the eigenvalues of non-symmetric matrices. Both homotopy and divide-and-conquer strategies are used. The original matrix is split into two or more submatrices, then a homotopy between the sub- matrices and the original matrix is established such that the intermediate matrices have only simple eigenvalues. The Aberth method is then applied to refine all the ap- proximations simultaneously. Section 3.2 discusses the split-homotopy strategy, while Section 3.3 introduces the Hyman method for computing the characteristic polyno- mial and its derivative. In Section 3.4 we make a straightforward implementation on Intel iPSC/ 860 and Intel Delta multicomputers. However, this initial approach did not use the processors uniformly so we modify our technique in Section 3.5 to balance the load for message-passing distributed systems. 3.2 Splitting Procedure and Homotopy Since any real matrix A can be transformed to a similar matrix in upper Hessenberg form by an orthogonal transformation, we shall assume throughout this chapter that matrix A is an upper Hessenberg matrix, namely, 44 45 A = (as) = a3? 0 an—Ln—2 * \ anm-l ** \;_ For convenience we assume n is an even number. We further assume A is irre- ducible, that is, none of the subdiagonal entries am-“ 2 < j < n, is zero, otherwise we can consider the reduced matrices. Let k = 71/2. Let D 2 (did) be the matrix obtaining from A by making the subdiagonal entry ah“). = 0, namely, _ A11 A12 D _ ( 0 A22 ) . The eigenvalues of D is the union of the eigenvalues of smaller matrices A11 and A22. It takes much less time to compute the eigenvalues of A11 and A22 and it can be done in parallel. The following homotopy is established H(x\,t)=c(1—t)det(D—AI)+tdet(A—)\I) (3.1) where c is a random complex number and 0 S t S 1. For each t, since H (A, t) is a polynomial in A of degree 12, there are n zeros: A1(t), A2(t), . .. , An(t) . They are called the eigenpaths of system (3.1). It is well known [23] that the eigenpaths do not meet in the interval (0, 1) for a randomly chosen 0. It implies that the zeros of H (A,t) at each i, 0 < t < 1, are distinct. Our algorithm does not depend heavily on this property, though it makes the algorithm easier to implement and more efficient. Notice that the zeros of H (A, 0) are the eigenvalues of matrix D while the zeros of H(/\, 1) are the eigenvalues of matrix A. We choose 0 = to < t1 < t2 < < tm = 1 46 for some positive integer m. The idea is to use the zeros of H (A, t) at t,- as the initial approximations to the zeros of H (A, t) at ti“. Aberth method or our modified Aberth method may be applied to refine the approximations. It is also possible to use the tangent line at Ak(t,) to predict the initial approximation for Ak(t.-+1). Our numerical experiments show that the algorithm of the second [approach is slower than that of the first one because of the extra calculations of derivatives in the variable t and the global convergence property of the Aberth method in practice. Here, we only discuss the first approach. In the spirit of the Aberth method, approximations at each step must be distinct. It is possible that some of the zeros are identical at t = 0 that makes the Aberth method not applicable. In such case, we simply make random perturbations to make the initial approximations distinct. Notice that our goal is to find the zeros at t = 1. It is unnecessary to approximate the zeros with high accuracy at the intermediate steps t,- < 1. In our implementation, the algorithm is allowed to run for a fixed number of iterations at the intermediate steps regardless of the convergence. The global convergence property of the Aberth method in practice ensures the eventual convergence at t = 1. The problem of finding the zeros at t = 0, i.e., the eigenvalues of both A11 and A22 are solved by the fastest available serial codes or by applying the same splitting and homotopy strategy. In the latter case, we keep dividing the matrices until certain dimension then the fastest serial code is applied. For instance, suppose the dimension of the original matrix A is n = 2’, the number of processing nodes is k = 2’", and m < n. One can split matrix A into I: small matrices, all of them have dimension L = 2”“, the serial algorithm is then applied to those small matrices. 47 3.3 Hyman’s Method In order to extract the zeros of H (A,t) iteratively by Aberth method, H (A,t) and 8H :r.t 8A [16] which is remarkably backward stable as pointed out by Wilkinson [34]. Notice need to be evaluated efficiently. For this purpose, Hyman’s method is used that det(D — AI) = det(A“ — AI) det(A” — AI). Differentiating equation (3.1) with respect to A, we have, 6H(A,t) ax (9det(A11— AI) 6A 0det(A22 — AI) = c(1—t) 8A det(A22 — A1) + det(A“ — AI) 8det(A — AI) +t 6A . Since A11 and A22 have the same structure as A, we only need to address the problem of computing det(A — AI) and N—etaflxé—Q in more details. For A = (a;,-) and J: = ($1,$2,...,xn)T, the system of equations (A — AI):r = 0 can be written as (all — ”131+ 012132 + ° ° ° + a1n$n = 0 021151 + ((122 - ”$2 + ' ' ° + a2n$n = 0 For given A, letting 23,, = 1 in the last equation, we can solve the last n — 1 equations recursively for a:,._1, . . . , .132, :31. These values are then used to evaluate the left-hand side of the first equation, F”): (011— ”$1 + 012132 + ° ' ' + a1nl‘n- It is clear that F(A) = 0 if A is a eigenvalue of A. 48 To computer det(A — AI), we write matrix A — AI as (an—A (112 013 aln \ 021 (122 — A 023 "' "° a2n a32 033.. A ... ... as" 0 an—1,n—-2 an—lm-l " A art—1,11 \ an,n—l ann _ A Forj = 1,2, . . . ,n the jth column is multiplied by 1:,- as found above and added to the last column, thereby yielding the matrix { 011— A 012 013 al.n—l F(A) \ 021 022 - A 023 "' "' C12,n—1 0 (132 033 — A ' ' ' (13.n—1 0 0 an—l,n—2 an—l,n—l — A 0 \ an,n—l 0 ) Therefore n-l det(A — AI) = (—1)"‘1F(A)H (1,11,. i=1 To compute W, we need to compute 5%? Differentiating (3.2) with respect to A, taking into account that 1:], j = 1,2, . . . ,n — 1, are functions of (A,t), we have 81? 8:1: :1: (all._.A)_i%._.xl.+.a12_73_+.”..+.al %i?._.0 8x 8x 8x axk_ 3r 8 ahk-l ‘ (akk-A)3f-$k+~°+akn-§f=0 (3.3) 8r"- 8 ann—l—__l 'l' (ann _ A)fil _mn : 0 . . a n_ a n_ . . With 83”} = 0, we solve (3.3) success1vely for $8, ‘ , I8, 2 , . . . , %, usmg the prev1ously computed values of 1:1, 3:2, . . . , at”. The value of the left-hand side of the first equation in (3.3) is then $3. 49 3.4 Initial Implementation We implemented the algorithm on the Intel iPSC/ 860 gamma multicomputer and Intel Touchstone Delta multicomputer. Both iPSC/ 860 and Delta multicomputers are distributed memory MIMD ma- chines [15]. They consist of a collection of computing nodes each of which has a processor and a local memory. The nodes are inter-connected by networks. The iPSC / 860 system uses hypercube topology architecture while the Delta system uses 2 dimensional mesh. The computing nodes for both systems are the Intel 2860 pro- CCSSOI‘S . The 128 computing nodes of the iPSC/86O are networked as a 7 dimensional hypercube. The 512 computing nodes of the Delta system are networked as a 16 by 32 mesh. Each row has 32 computing nodes and each column has 16 nodes. The communication between processors for both systems are done by a library of explicit message passing functions. It is the programmer’s responsibility to locate the synchronization points of the program and specify the communication between the processors by explicitly passing messages. It is expected that the Delta multicomputer offers better performance since for some applications 2—D mesh is a natural topology and the Delta multicomputer uses faster switching techniques [1, 6]. In our application we have not observed any signif— icant improvement of the Delta multicomputer over the iPSC / 860 system since our program doesn’t need extensive communication. On the other hand, the communica— tion in our program is solely broadcasting in which the nodes broadcast the messages in turn with no message contention. The special faster switching technique for the Delta multicomputer lose advantage for our application. 50 Algorithm ABERTH Input: the matrix A the initial approximations A5, i=1,2,---,n. the number of processors m, where n is a multiple of m Output: the eigenvalues of A, A5, i=1,2,---,n. begin ABERTH s = n/m ! my share of computation load ! begin computation iam = mynode() ! system call, returns the id of this node for i=iam=ks+1 to (iam+1)*s update Ag! use Aberth method end for ! end of computation ! begin communication for £220 to n1-1 ! the node counts from 0 to nz—-1 if (iam = i) then send Agam.,+1,°'°,/\(gam+1)" to all other nodes. else receive Aiam13+1,---,A(,am+1)., from node i. end if end for ! end communication end ABERTH Figure 3.1: Algorithm ABERTH Both systems support Fortran 77 with message passing libraries. Their main differences are their inter-connection topologies and the switching techniques which are invisible to the users. The programs written for one system are portable to the other. In our implementation, programs were written for iPSC/ 860 multicomputer first, then ported to the Delta multicomputer without modification. The program has two phases. The first phase splits the matrix into two or more submatrices. Then subroutine H QR in EISPACK is called to find the eigenvalues of each submatrices. Those eigenvalues are used in the second phase as the initial approximations to the eigenvalues of the target matrix. The Aberth method is then 51 used iteratively to find all the eigenvalues in parallel. The basic structure of the program for one iteration in the second phase is shown in Figure 3.1. When compared to the fastest available sequential code H QR in EISPACK run- ning on one node, our parallel program was slower when less than eight nodes were used, see Table 3.1, Table 3.2 and Table 3.3 for matrices with different dimensions. For eight or more nodes, the parallel version was faster, by a factor of two on 16 nodes. Execution time improved as the number of nodes increased. As more nodes are added the improvement flattens off due to the load imbalancing problem and the communication overhead. nodes sec 1 2.72 2 1.55 4 0.97 8 0.67 HQR on 1 node 16 0.52 0.75 sec 32 0.47 64 0.60 Table 3.1: Time on i860 with a 64 x 64 random matrix. nodes sec 1 18.96 2 10.61 4 6.63 8 4.23 HQR on 1 node 16 2.83 4.56 sec 32 2.25 64 2.48 128 2.72 Table 3.2: Time on i860 with a 128 x 128 random matrix. 52 nodes sec 1 149.45 2 84.54 4 51.22 8 . 30.29 HQR on 1 node 16 17.87 32.24 sec 32 10.81 64 8.21 128 9.82 256 13.38 Table 3.3: Time on i860 with a 256 x 256 random matrix. 3.5 Load-Balancing In this section we consider the load-balancing problem. The Aberth method fits parallel computers naturally. Suppose n nodes are used for the computation of the n eigenvalues of an n x 71 matrix. Each node is responsible for finding one eigenvalue. For direct implementation, every node has a copy of the complete information of the matrix and all the 11 initial approximations. The nodes execute calculations simultaneously so that one node updates only one approximation. Since all nodes have the same amount of computation, they generally complete the calculation at the same time. Then every node broadcasts the updated approximation to the other nodes. This process continues until all the nodes find the expected eigenvalues. A natural stopping criterion is, as soon as an approximation is good enough no further iteration on that approximation is needed. In Section 3.4 we have shown that the parallel algorithm can beat the best serial algorithm, H QR in EISPACK, for a sufficiently large number of processors using an Intel Delta multicomputer. However, if the number of nodes is far less than the num- ber of eigenvalues, each node is responsible for finding more than one eigenvalue and 53 ] Si 82 S3 S4[85 S6 S7 S8 S9 810 811 S12 813 814 S15 S16] N0 10 10 10 10 9 7 7 5 5 5 3 2 1 1 1 0 N1 10 10 9 8 6 4 3 3 2 2 2 1 1 1 0 0 N2 10 10 10 10 9 9 8 8 8 7 7 6 3 2 1 1 N3 10 10 10 9 9 8 7 5 5 4 3 3 2 1 0 0 N4 10 10 10 9 9 8 7 7 6 6 6 4 4 3 1 1 Table 3.4: Unbalanced Load Distribution some nodes may complete their jobs earlier. This difference causes load-imbalance. To illustrate the load-imbalance, let’s compute the eigenvalues of a 50 x 50 matrix whose eigenvalues and the initial approximations are randomly chosen in the region {2 E C, [2] < 1}. With 5 nodes, the program assigns the first 10 eigenvalues to the first node, the second 10 eigenvalues to the second node, . . . , and the last 10 eigenvalues to the last node. Since it takes the same number of floating-point operations for any node to update one approximation, we will count the amount of time to complete this number of operations as one unit time. Table 3.4 lists the number of eigenvalues each node computes at each step—each row represents a node and each column represents a step. It takes 105 parallel time units for the program to complete. The key for the load-balancing problem for the Aberth method is the observation that if every node has a complete copy of the previous approximations, it can update any approximation. The total number of eigenvalues is sufficiently small for each node to have a copy. Based on the differences between the old approximations and the new ones it received, each node can decide which approximations are good enough and need no more updating. Once a node has complete information on which approxima- tions need to be updated and the number of nodes involved in the computation, it can determine the distribution of approximations to nodes. The number of approx- 54 SI 82 83 84 85 86 S7 88 89 810 S11 812 813 814 815 816] N0 10 10 10 10 9 8 7 6 6 5 5 4 3 2 1 1 N1 10 10 10 9 9 7 7 6 5 5 4 3 2 2 1 1 N2 10 10 10 9 8 7 6 6 5 5 4 3 2 2 1 0 N3 10 10 10 9 8 7 6 5 5 5 4 3 2 1 0 0 N4 10 10 9 9 8 7 6 5 5 4 4 3 2 1 0 0 Table 3.5: Balanced Load Distribution imations can be evenly divided among nodes, and that determination is completely distributed. Note that each node also knows the approximations other nodes are to update. This information is required when a node broadcasts the updated approxi— n1 nnodes mations later. Node i updates [ ] + 1‘.- consecutive approximations, where m is the number of approximations to be updated at this step, nnodes is the number of n1 nnodes nodes involved in the computation, r,- = 1 if i < m — [ ] and r.- = 0 otherwise. For example, when there are 50 approximations to be updated and there are 5 nodes, each node gets 10 consecutive approximations. If there are 33 approximations, then node 0, node 1 and node 2 each gets 7 consecutive approximations while node 3 and node 4 each gets 2 consecutive approximations. For this balanced program, at each step every node computes the approximations it is responsible for. Table 3.5 lists the number of approximations each node computes at each step for the above example. The program takes 97 parallel time units to complete which represents a distinct improvement over the 105 time units for the unbalanced example. Figure 3.2 shows the corresponding timing information. The unbalanced program uses static scheduling which assigns the i-th chunk of approximations to the i-th node. The balanced program, on the other hand, assigns the i-th share of approximations to the i-th node dynamically at each step. For this NodeO Nothl Nodal Nodc3 Nod¢4 55 Nacho i i I i i ' ' Nab! . . —h h h h “I Noch Lp—I—I-I-I-IIIJH New . ' 5* i—n Nodu ,, . . " "or F*hh :_ h h :p?][ o 10 a no a o n u 10 nu “commun- . m.___ III-I'l—ii m 4 l 1 L11; No.0flt‘ 0 I0 I! so co ‘9 5| U 7‘ I1 ” ”INIMWI” (a) Unbalanced Timing (b) Balanced Timing Figure 3.2: Unbalanced vs. Balanced Timing seconds 601- 45 ~~ 30 «— l 1 2 3 4 5 6 No. of Nodes 2 3 4 5 6 No. (I Nodes (a) Timing for 240 x 240 matrices (b) Timing for 360 x 360 matrices Figure 3.3: Timing Info for 240 X 240 and 360 x 360 matrix dynamical scheduling extra computation is needed to determine the shares for all the nodes at every node. Fortunately, this computation takes only 0(n) operations, while one updating of an approximation takes 0(n2) operations in Hyman’s method to evaluate the polynomial and its derivative. This extra computation is negligible. We implemented the balanced algorithm on a cluster of up to 6 DEC Alfa 3000 workstations running Message-Passing-Interface (MPI) with Fortran 77. MPI is a widely used standard for writing message-passing programs. It is available for al- most all existing high performance machines [27]. The test matrices are real upper Hessenberg matrices whose entries are randomly generated in the range (-10, 10). 56 The matrix is divided in the middle into two smaller real upper Hessenberg matrices. Subroutine H QB. in EISPACK is called to find the eigenvalues for each of the two matrices. The eigenvalues for the two matrices are merged as the initial approxima- tions to the original matrix. Aberth method is applied to refine the approximations. We tested the matrices of degree up to 360 x 360. The results show that the balanced program saves about 10% for the chosen class of matrices compared to the unbal- anced program. Figures 3.3 shows the timing information for test matrices of degree 240 x 240 and degree 360 x 360. 3.6 Conclusion In this chapter we describe a parallel algorithm to solve the eigenvalues of non- symmetric matrices. The straightforward approach leads to an unbalanced utilization of processors 80 a dynamically load balanced approach was developed and demon- strated. At this time, only six processors were available so the gains are small. Since the computations grow much faster than communication we expect significantly bet- ter results on a larger machine, especially with respect to the fastest serial algorithm, HQR. Bibliography [1] Aad J. wan der Steen, An overview of (almost) available parallel systems, Publication of the NCF, December 1993. [2] O. Aberth, Iteration Methods for Finding all Zeros of a Polynomial Simulta- neously, Math. Comp., Vol.27,1973, pp. 339-344 [3] G. Alefeld and Herzberger, 0n the Convergence speed of Some algorithms for simultaneous approximation of polynomial roots, SIAM J. Num. Anal, 11(1974), 237-243. [4] Alan F. Beardon, Iteration of Rational Functions, Springer-Verlag, 1990. [5] W. Borsch-Supan, A posteriori error bounds fro the zeros of polynomials, Nu- mer. Math. 6(1963) pp. 380-398. [6] Lionel M. Ni, Lecture Notes of CPSSQQ, Computer Science Department, Michi- gan State University, Spring, 1995. [7] J .Cuppen, A divide and comquer method for the symmetric tridiagonal matri- ces, Numer. Math., 36(1981), pp. 177-195. [8] K. Docev, A modified Newton method for the simultaneous approximation of all roots of the given algebraic equations, Fiz. - Mat. Spis Bulgar. Akad. Nauk 5 (1962) pp. 136-139 ( in Bulgarian ). [9] J. Dongarra, M. Sidani, A parallel algorithm for the nonsymmetric eigenvalue problem, SIAM J. Sci. Comput. 5(1993), pp. 542-569. [10] J. Dongarra, C. Sorensen, A fully parallel algorithm for the symmetric eigen- value problem, SIAM J. Sci. Statist. Comput. 8(1978), pp. Sl39—@154. [11] E. Durand, Solutions Numeriques des Equations Algebriques, T.I. Paris 1960. [12] L.W. Ehrlich, A modified Newton method for polynomials, , Comm. ACM 10 (1967) pp. 107-108. [13] M.R.Farmer,G. Loizou, A CLASS OF ITERATION FUNCTIONS FOR IM- PROVING,SIMULTANEOUSLY, APPROXIMATIONS TO THE ZEROS OF A POLYNOMIAL, BIT 15 (1975),pp. 250 - 158. [14] Hamilton, A Type of Variation on Newton’s Method, Amer. Math. Monthly, 57, 517-522(1950). 57 58 [15] Kai Hwang, Advanced Computer Architecture: Parallelism, Scalability, Pro- grammability, McGraw-Hill Inc. 1993. [16] M. Hyman, Eigenvalues and Eigenvectors of general matrices, presented at the 12th National Meeting of the Association for Computing Machinary, Houston, Texas, June 1957. [17] Pierre Fraigniaud, The Durand-Kerner Polynomials Roots-Finding Method in case of Multiple Roots, BIT 31(1991),112—123. [18] T.L.Freeman, Calculating Polynomial zeros on a local memory parallel com- puter, Paralle Computing 12(1989) 351-358. [19] M.W. Green, A.J.Korsak and M.C.Pease, Simultaneous iteration towards all roots of a complex polynomial, SIAM Rev. 18 (1976), pp. 501-502. [20] I Kerner, Ein Gessamtschrittverfahren zur Berechung der Nullstellen von Poly- nooem. Num. Math. 8(1966), 290-294. [21] Gbran Kjellberg, Two Observations on Durand-Kerner’s Root-Finding Method, BIT 24(1984),556—559. [22] K. Li and T.Y. Li, An Algorithm for Symmetric Tridiagonal Eigenproblems— Divide and Conquer with Homotopy Continuation, SIAM J. Sci. Statist. Com— put., 14(1993), pp. 735-751. [23] T.Y. Li, T. Sauer, J. Yorke, The random product homotopy and deficient poly- nomial systems, Numer. Math., 51, (1987), pp. 481-500. [24] T.Y. Li , Zhonggang Zeng, Homotopy-determinant algorithm for solving non- symmetric eigenvalue problems, Math. Comp. 10(1992), pp. 483-502. [25] T.Y. Li and Zhonggang Zeng, The Laguerre Iteration in Solving the Symmetric Tridiagonal Eigenproblem, Revisited, SIAM J. Sci. Comput. 9(1994), 1145- 1173. [26] T.Y.Li, Z. Zeng, and L. Cong, Solving eigenvalue Problems of real nonsymmet- ric matrices with real homotopies, SIAM J. Numer. Anal. 29(1992), 229-248. [27] Message Passing Interface Forum, MPI: A Message-Passing Interface Stan- dard, May 5, 1994. [28] M. Marden, Geometry of Polynomials, Amer. Mathematical Soc., Providence, RI, 2nd ed., 1966. [29] Tsuyako MIYAKODA, Iterative method for multiple zeros of a polynomial by clustering, J. Corn. and App. Math. 28(1989),pp. 315-326. [30] J .M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, 1970. [31] B.N.Parlett, Laguerre’s Method Applied to Matrix Eigenvalue Problem, Math. Comp. 18(1965), pp. 464-485. 59 [32] L. Pasquini and D. Trigiante, Il metodo di continuazione e l’approssimazione simultanea degli zeri dk un polinomio, Monografie di Soft. Matem., N. 30, Pubbl. dell’IAC, 1984. [33] T. Terano, On a global algorithm for algebraic equationss, PhD Thesis, Uni- versity of Tokyo, Information engineering course (1978). [34] J.H. Wilkinson, Error Analysis of Floating—point Compuation, Numer. Math. 2 (1960), 319-340. "lllllllllllllllf