r551; A: .q: -« "w 4 A m 1.2-» w. . ..:..«.4....<. w" v "am. w...- u... . ”-1 < mm. nun-a» , v ma- n 11 .. 4 we - ‘ifi555,py .3... ‘- . raft-anwfi‘ .‘ , L § but. ' H mm“ :g m “" 1:: ”.1 21“}? at: ”i“:- at 15 - V .‘ot.¢. . -.'. £19 .4” _ Tr? '..-4~ a 4 ~ I. ".4. x you. A. :y...~ . .., «an... - . "w... 44 A 4.1.» n- rl vol r w vv-.r. 1.." ~ “a..- J -. ~.A..,..- "- 4-. ‘9..- iwn. a . “‘JRO - .04 mum. , . '11‘Hfi “4'3-..“m . ....<’,.... _ v...” 4 4:1 MN. an 1 .; W. ‘1 1 .y-.. g , w, .w :i‘fiwfip’ . :‘1k 12 "" - ,- m...” “4 1, hr" ..., “w. -c l .m "I? '" " 29:. 3'5‘ . mi‘ 1 a an}: _ . ‘ . _ £17»... > 7n” s l . ‘gqll‘i' . ‘33”, -, 39' l . >- I " tasty: ‘ ‘ . ‘O' ' h, ‘ I 1 “S. 1 : ! )h""-‘?\i§zfi: . t ‘ that ‘31}; h 245v}: ’1} ‘ . Vl-t r > n 13 43?.- .1'331253: i 1 " i . 2 11!" ‘ )th .: wk". 1 if? :11“; 7‘35: :5; .3?!“ V .11.! ‘ ‘ r\?'§‘?.54 mix I : "‘ii!3' 2331‘.‘, ,. 3 1 - VJ; ("83124.3 3 1293 01421 768 “fig lllililllllllllxiu BRAKES 11111 , This is to certify that the dissertation entitled Quasi-Laguerre's Method and Its Parallel Implementation on Solving Symmetric Tridiagonal Eigenvalue Problems presented by Xiulin Zou has been accepted towards fulfillment of the requirements for Ph-Ds degree in ApplieLMaihematics . / \ ’ ,4 ~//;;?i:::~. " //C«Q_‘v/ /L; (r I Date 12/12/95 MS U is an Affirmative Action/Equal Opportunity Institution Major professor LIERARY Michigan State University PLACE IN RETURN BOX to remove this checkout from your "cord. TO AVOID FINES mum on or baton date duo. DATE DUE DATE DUE DATE DUE QUASI-LAGUERRE’S METHOD AND ITS PARALLEL IMPLEMENTATION ON SOLVING SYMMETRIC TRIDIAGONAL EIGENVALUE PROBLEMS By Xiulin Zou A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Mathematics 1995 ABSTRACT QUASI-LAGUERRE’S METHOD AND ITS PARALLEL IMPLEMENTATION ON SOLVING SYMMETRIC TRIDIAGONAL EIGENVALUE PROBLEMS By Xiulin Zou First of all, a quasi-Laguerre’s iterative method is derived for solving polyno- mial equations. Unlike the well-known Laquerre’s method that requires evaluating the second derivative, the quasi-Laguerre’s method only involves the evaluation of the function and its first derivative and still maintains the monotonical convergence property of the Laguerre’s method for solving polynomial equations with only real roots. Two different approaches to derive the quasi-Laguerre’s method are given and each approach reveals some different features of the method. It is also proven that the order of convergence of the quasi-Laguerre’s method is l + x/i Secondly, a new algorithm using split-merge and the quasi-Laguerre’s method with cloud and cluster handler is developed to solve symmetric tridiagonal eigenproblems. The cloud and cluster handler utilizes the multiplicity estimation method developed ealier in this work. Numerical results show that the new algorithm is very competitive in both speed and accuracy. Finally a parallel version of the new algorithm is designed and implemented. Nu- merical results on a substantial variety of matrices show our algorithm is the best one among the existing parallel algorithms for symmetric tridiagonal eigenvalue problems. TO MY PARENTS For their 70th birthday Contents 1 Introduction 1 2 Quasi-Laguerre’s method 4 2.1 Laguerre method ............................. 4 2.2 Quasi-Laguerre method .......................... 5 2.3 From an optimization point of view ................... 21 2.4 For polynomials with complex roots ................... 27 2.5 Convergence order ............................ 27 3 Estimate multiplicity 33 3.1 Determining multiplicity of a root .................... 33 3.2 A new stopping criteria .......................... 38 4 Application to symmetric tridiagonal eigenproblem 40 4.1 Introduction ................................ 40 4.1.1 Evaluation of the logarithmic derivative f’ / f of the determinant 40 4.1.2 The split-merge process ..................... 42 4.1.3 Deflation .............................. 45 4.1.4 Cluster and cloud handler .................... 45 4.1.5 Partial spectrum ......................... 54 4.1.6 Stopping criteria ......................... 55 4.2 Description of the new algorithm .................... 55 4.2.1 The global Newton’s formula ................... 55 4.3 4.2.2 Initial points for the quasi-Laguerre iteration .......... 4.2.3 Quasi-Laguerre iteration with cluster and cloud handler . . . . 4.2.4 Stopping test ........................... Numerical tests .............................. 4.3.1 Testing matrices ......................... 4.3.2 Speed test in evaluating eigenvalues without computing eigen- vectors ............................... 4.3.3 Accuracy test ........................... Parallel computation of eigenvalues 5.1 5.2 5.3 5.4 5.5 5.6 5.7 Introduction ................................ Issues for parallel algorithm design ................... Determining performance ......................... Existed parallel algorithms ........................ The parallel quasi-Laguerre’s method .................. 5.5.1 Load balancing .......................... 5.5.2 The pseudo-code ......................... Performance test ............................. Comparison with parallel bisection and sequential root free QR vi 55 56 58 58 61 62 69 69 70 72 73 75 75 80 81 83 List of Tables 4.1 Accuracy comparison between the new version and the old version of the quasi-Laguerre’s algorithm. The numbers in the table represents the max-error of Icomputedez'gs — trueeigsl/(lnorm), as multiples of machine precision ............................. 68 5.1 Performance test result on DEC ALPHA workstations ........ 83 vii List of Figures 2.1 2.2 2.3 2.4 2.5 2.6 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 Distribution of the roots of the polynomial in case 1 .......... Distribution of the roots of the polynomial in case 2 .......... Distribution of the roots of the polynomial in case 3 .......... Movement of quasi-Laguerre iteration ................. fl “Ik— root .................................... sign of qk = fig} and qk_1 = LAB—'71)} as the sequence approaches the Labeling the roots clockwise ....................... Split and merge processes ........................ Cluster and cloud of eigenvalues ..................... inipts .................................... qlag .................................... qlag .................................... Comparison of execution time on Dec Alpha between NQL-the new version of quasi-Laguerre Iteration and OQL—the old version of quasi- Laguerre’s method, for finding all eigenvalues without computing eigen- vectors. .................................. Execution time on Dec Alpha for finding all eigenvalues without com- puting eigenvectors. B/ M: DSTEBZ; NQL: the new quasi-Laguerre Iteration; RQR: Root-free—QR—DSTERF ................. Execution time(continued) on Dec Alpha for finding all eigenvalues without computing eigenvectors. B / M: DSTEBZ; NQL: the new quasi- Laguerre Iteration; RQR: Root—free-QR—DSTERF. .......... viii 14 14 15 17 22 45 46 57 59 60 63 64 4.9 5.1 5.2 5.3 5.4 5.5 5.6 5.7 Error(on the scale of machine double precision) , on Dec Alpha, for finding all eigenvalues without computing eigenvectors. B / M: DSTEBZ; NQL: the new quasi-Laguerre Iteration; RQR: root free QR—DSTERF. Before load balancing in shared environment, other CPU intensive jobs are running on PE # 2 also. Left: Execution time of each Alpha workstation. Right: Number of eigenvalues computed by each Alpha workstation. Matrix size 5000, type 4 .................. Before load balancing in Heterogeneous environment, PE ## 1—5 are Alpha workstations, PE ## 6—7 are SUN SparclOs. Left: Execution time of each workstation. Right: Number of eigenvalues computed by each workstation. Matrix size 5000, type 4 ............... After load balancing in shared environment, other jobs are running on PE # 2 also. Left: Execution time of each workstation. Right: Number of eigenvalues computed by each Alpha workstation. Matrix size 5000, type 4 ............................. After load balancing in Heterogeneous environment, PE ## 1-5 are Alpha workstations, PE ## 6-7 are SUN SparclOs. Left: Execution time of each workstation. Right: Number of eigenvalues computed by each workstation. Matrix size 5000, type 4 ............... Two configuration of the PVM machines. Top: formed by Dec Al- pha workstations. Bottom: formed by Dec Alpha and Sun SparclO workstations ................................ Comparison between bisection — B / M, quasi-Laguerre — NQL, and root free QR — RFQR ............................. Comparison(continued) between bisection — B/ M, quasi-Laguerre — NQL, and root free QR — RFQR, on a random matrix of order 5000 . ix 67 77 77 80 84 86 87 Acknowledgment First of all, I am indebted to my advisor, Prof. Tien-Yien Li, for his support and guidance during my study at Michigan State University. Secondly, I owe my wife, Ying Zhou, for her support and sacrifice as I was so intensively engaged in my research and teaching that she had to take full charge of all the domestic matters while still taking four courses of her own. I would like to thank all the Professors from whom I have gained more insight, more inspiration, or more knowledge about Mathematics and its application. I am very appreciative to Prof. Enbody from the Computer Science Department for giving me the privilege to take several of his courses to gain my knowledge with computer science and parallel computing and for his sponsoring a computer account in the Computer Science Department for me. I also want to thank the Advanced Computer System Lab at Michigan State University for providing the computing facilities for my research. I thank Mr. Min Jin for some valuable discusion, Ms. Wenjiang Qiao, Mr. Paul Gray, etc., for some PVM technical consulting and information exchanging. Last but not least, I thank all the committee members, Prof. Chiu, Prof. Dunninger, Prof. Frazier and Prof. Zhou, for reading my thesis and providing valuable suggestions. Chapter 1 Introduction Define n ’%f)li\/En—l)[(n—1)(p'x)2)—nfl—l P(x) P(1=) [11(33): 33 - and 13(10): L+($) if|L+(10)- xl S IL-(ft) - SCI, (12) L_(:1:) otherwise. Then Laguerre’s iteration n+1 = L($k), applied to a polynomial that has only real roots, converges monotonically and cubically to a root of the polynomial, starting from any initial point [16]. This method has been successfully used to find all the eigenvalues of symmetric tridiagonal matrices [19] with a great speed up. However, this method requires the evaluation of the second derivative, which is normally very expensive. A method that maintains the advantages of Laguerre’s iteration and in- volves only first derivative is sought in this work. The problem is formulated as follows. Let f($) = kI-IICB-ra) (1-3) be a polynomial of degree n that has only real roots. Assume that the logarithmic derivatives $113)) 2 qo and 1&5}: ql are known and no root of f (:5) lies between 3:0 and 2:1. Find, based on 2:0,qo,a:1, and q1, an iterative method that converges monotonically to a root of f (2:), starting from 3:0 and 2:1. First of all, our iteration formula for this purpose is derived for polynomials of form f(:r) = k(a: -— r)'"(:r — z)”‘m by solving 7' in the following system, m n — m : qu (1'4) 3:0 — 1' :co — z m n — m :r1— 1' x1 — z The formula for r is r _ $o+$1+ mn— [(n+m)Aq+QOQIA$]A4£ (16) 2 —mu‘firmi \/—m(n-m)(90qi +71%) + [90611 +73%: 29:2 where Aq = q1 — go, and Arc = 2:1 — are. We call (1.6) the quasi-Laguerre’s iteration formula. We proceed to derive the formula for a general polynomial of form (1.3) and begin with the following two equations, " 1 Exo—T' —(]0, (1.7) jzl .7 n 1 Z x! _ r = C11 (1-8) j=1 J Let 6:1: 2 mn — [(n + m)Aq + qoqlAm]% (1.9) 472M 2c \/—m(n — quoq, + neg)+[qoq1+ n4: M and 5 if 6 < 6-, 6(sco,:r1,q0,q1,m) = + . I +| —| I (1.10) d-1f|6_l< |6+|. Define 1' +1: QL(xo,x1,qo,q1,m) = 0 l +6(xo,:v1,qo,q1,m). (1.11) 2 Then, the iteration 33;,“ = QL(mk_1, :rk, qk_1, qt, 1), starting from 1:0 and 1:1, converges monotonically. It is shown that the limit of the iteration is a root of the polynomial. We call this iterative method the quasi-Laguerre’s method. The integer m in the formula is called the multiplicity index of the quasi-Laguerre’s formula. It is shown that if the multiplicity index of the formula matches the multiplicity of the root, the order of convergence of the iteration is \/2 + l. 2 While quasi—Laguerre’s iterative method is best suitable for solving polynomial equation with only real roots, it can also be used to solve polynomial that may have complex roots. Though the global convergence property is not guaranteed, we have shown that the order of convergence is still \/2 + 1 provided that the multiplicity index matches the multiplicity of the root. In solving polynomials that have only real roots, the two Laguerre’s iteration se— quences mt“ = 1:: + L+(:rk) and 2;“ = 2:; + L_(:ck) both converge to a root of the polynomial, starting from any initial point 330. That is, the Laguerre’s iteration sequence generated by the same sign in formula (1.1) converges to a root of the poly- nomial, no matter where the iteration starts. However, this feature is not inherited by the quasi-Laguerre’s function. A quasi-Laguerre’s iteration sequence generated by mt“ = $1213: + 6+, with m = 1, marches toward the right hand side of the initial interval [$0,313]. If there is no root of the polynomial in this direction, the iteration will wrap around and produce a point on the left hand side of the left most root of the polynomial. When this happens, the iteration can not proceed any more be- cause there are roots of the polynomial between the two iteration points now. Similar situation could occur for quasi-Laguerre’s iteration sequence generated using the — sign. A new algorithm with cloud and cluster handler is designed in Chapter 4, using Split-Merge and quasi-Laguerre’s method to solve the symmetric tridiagonal eigen- problems. The new algorithm differs from the one given in [15]. Various numerical results for a substantial variety of matrices presented in Sec- tion 4.3 indicate that our algorithm is strongly competitive in both accuracy and speed. A parallel version of our algorithm is implemented on a cluster of worksta- tions using PVM (Parallel Virtual Machine) and numerical results show that our algorithm leads all the existing algorithms on distributed memory parallel platforms. Chapter 2 Quasi-Laguerre’s method 2.1 Laguerre method Let p(a:) be a polynomial of degree n, with all its roots being real. By 12 Mi: (n—l) (n-1)(flfl)2—npnx p(~’v) 10(1‘) p(~'v) the Laguerre’s iteration is defined as [4:]:(112) = (C — 11::+1 = Li(:ck). (2.1) Let $k+l be one of {:r;+1, 55:11} which is closer to 13),. Then the iterates {wk} converges at least linearly from any guess 11:0 close to a zero 2 of p(a:), and converges cubically when 2 is a single root. In general, Laguerre’s iteration is defined via (2.2) This function is called the Laguerre iteration function with multiplicity index m. Let an,“ be one of the {Lm_(:rk), Lm+(a:k)} that is closer to xk. Then the iterates {1:1,} converges at least linearly from any guess 9:0 close to a zero of p(:z:) with multiplicity greater than or equal to m, and cubically if 1:0 is close to a root that has multiplicity exactly equal to m. Kahan [16] derived this general Laguerre function in the Riemann sphere. In the real field, Kahan [16] used the non-overshooting strategy to derive the Laguerre’s formula, which leads to the monotonical convergence of the method for m = 1. For general m < n, the method never overshoot more than m — 1 roots of the polynomial. 2.2 Quasi-Laguerre method Laguerre’s iteration is an excellent method for finding the roots of polynomials with only real roots because of its global and cubic convergence. It has been successfully applied to solving symmetric tridiagonal eigenvalue problems by Li and Zeng [19]. However Laguerre’s iteration involves evaluations of the polynomial itself, its first derivative, and its second derivative. In [7], the so—called Quasi—Laguerre’s iteration was established which maintains the global and monotonic convergence of the La— guerre’s iteration without evaluating the second derivative. We shall present in the following a different approach to derive the iteration which were obtained indepen- dently from, and almost simultaneously as, the work in [7]. The quasi-Laguerre’s formula derived from a special polynomial First of all, we assume the polynomial is of form p(a:) = a(:r — z)m(a: — t)"‘"‘. Suppose {xo,p(mo),p’(a:0)} and {x1,p(:r1),p’(:rl)} are known. Let qo = fl—ol and _ '1‘1 q, — p(rn)' Then, m n — m “ (10- (2.3) xo—z :rO—t = , 2.4 531—2 171-1 QI ( ) m n—m Eliminating t from the above two equations yields, 2 "“10 "“11 m 5170 — $1 ]: (q°‘q‘)+m(x1 — Z)($o — z)‘ n—m[qoql—($1—z :rO—z (ml—z)(:ro—z) Let Ax = x1 - :ro, Aq = q1 — go. It follows that _mqoxo + qm — (qo +q1)z m 2 _ M, 9; m(n — m) (qu1 (1:1 — z)(:1:0 — z) +(:r1 — z)(a:o — z) ( )Aa: (51:1 — z)(a:0 — z). A _ _ 2_ _ (qui'i-(n—m _q_m(q0$0+q1$1) m(q0+q1)z m m(n m). Am — (331—2)($0"Z) AC] _ "1010330 + (11331— n) " m((10 + (11):? goq1+(n m)A$- (xl_z)(x0_z) ' Let A = qoq1+(n—m)%§. B = "10101130 + (11331 — 72), (2'5) 0 = m(qo+q1). Then A(IB1 —Z)(.’130—Z) = B—CZ. (2.6) Or, A22 + [C — A(.’IIO + 1171)]? + 14330171 — B = 0. So, _ A(xo + x1) — C :l: \/[C — A(:I:o + 231)]2 — 4A(A:1:0.'1:1— B) z — 2A 1 = $0423: — am :1: \/[C -— Am + m]? — 4A(A:coa:1— 3)} $0 + $1 1 where Q = 02 — 2CA(2:0 + .731) + A2(:lro + sci)2 — 4142170131 + 4A8 = C2 — 2CA(:L'o + 1131) + A2030 — $02 + 4A3 : 02 + Q“ and Q1 = —2CA(mo + 1:1)+ A2(a:o — in)2 + 4AB = A[—2C(;ro + x1) + A(:z:0 — 51:1)2 + 48] = A{-2m(CIo + <11)($0 + x1) + l‘Io€11(Afl7)2 + (n — "UACIAJFI +4m(qoxo + qlxl — n)} = A{——2m[Cono + qlxo + qoaso + qlxl] + (10(11(Aa:)2 +(n - m)[q1:v1 — qlxo — qox1+ qoxo] + 4m[qoxo + qlxl - nl} = A{[—2m + (n — m) + 4m] [qoato + qlxl] +l—2m - (n — m)llQO$1 + (11150] — 4mn + qul(A$)2} = A{(n + m)[qo:coq1:13o — qoxl - (11500] — 477171 + qoqi(A$)2} = AU” + m)(q1— (Io)($1 — 170) - 4mn + (10(11(A$)2} = A{(n + m)AqAx - 4mn + qoq1(A$)2}- A Q = "12010 + (102 + [(10611 + (n — m)Kgll‘4mn + (n + m)AqA$ + CIOCMAJTVI A = "12(th + (11)2 - 4mnqoq1 - 4mn(n - m)—q- + (n + m)qoq1Aquv A2: +(n2 - m2)(A‘1) + (1091(133) + (71 -m)qoq1AqAx A = m2(qo + ql)2 — 4mnqoql — 4mn(n — "GK: +2nqoq1AqAa= + (n2 - m2)(A<1) + <10<11(AII¢)2 Aq = m2(qo + my — 4mnqoq1 -— 4mn(n - THO-5; + [C1041ACL' 'l' nAq]2 — m2(Aq)2 A = m2[( 1:0, the general Laguerre’s iteration formula (2.2) with multiplicity index m is obtained. An alternative form of the quasi-Laguerre’s formula Rewrite (2.6) in the following form A(z — 2:1)2 + [C + A(x1— 230)](2 — x1) + C331 — B = 0, where A, B, and C are defined by (2.5), then the solution can be expressed as, —(C + AAx) :l: x/B 2A ’ (2.8) 2 = 331 + where R = (C + AAsc)2 — 4A(C:1:1 — B) = C2 + 2ACA$ + A2(A$)2 — 4ACx1+ 4AB = C2 — 2AC(:ro + x1)+ A2(A:L‘)2 + 4AB A = "12010 + (102 — 2[(10(11+(n " Twig] "1010 + q1)(:co + $1) E An: A = m2(qo + (11)2 +(qoq1 + (n - m)A—:)2(Aiv)2 A +(qoq1 + (n " 7”) )2(A33)2 + 4[9091+(n — m)A_:] "1010330 + qiqi - n) A +2 [qoql + (n — "GAE-i m(A$Aq — 2n) Aq 2 Ag Aq _ 2 2 _ 2 _ _ 2 — m (qo + q1) + [qoq1+ n at] (Ax) 2]q0q1 + 72 at] m $(Ax) A +m2(Aq)2 + 2m [(1091 + nA_:i (Aqu — 2n) — 2m'2(A€I)2 + 472ng A A 2 A = 4m2 [4091 + fig] + [Cloth + n-A—g] (Ax)2 - 4mn [qoql + n—A—g] A A [(10% + TIA—g] ([QOQ1 ‘1' iii] (Ax)2 + 4m2 — 4mn) , and Aq C + AAx = m(qo + q1)+ (qoq1+(n — ME) A3: A = quo + (90(11 + 71—2) A11: Substituting these into (2.8) yields, 2mqo + ((1041 + 71%) i \/[(qu1 + 71%] ([0091 + 71%] (Ax)2 + 47710” — 71)) 201041 + (n — m)%§) 2’=~’L‘1-l- (2.9) Note that C331 — B = m(qo + ql):c1 — m(qo:ro + qlxl — n) 2 nm + mquac. So, z : 31+ 2(nm + mqurc) — (2mqo + ((10% + n%)) i \/[qu1 + 11-23,] ([Cqu1 + 11%] (Ax)2 + 4m(m — n)) (2.10) Equations (2.7), (2.9) and (2.10) are different formulae for the same root 2 of the polynomial p(:1:) = k(:r — z)m(:r — t)"‘"‘. In the following, we shall use those different forms in different occasions. Deriving the quasi-Laguerre’s formula for general polynomials Let p(:r) be a polynomial of degree n with only real roots r1,r2, - - - ,rn. Some of these roots may be equal to each other, i.e., they may be multiple roots. Let 3:0, 1:1, qo,q1 be given such that 2:0 < $1, “fig—0°} = (10,124)?) = q1 and the interval [$0, $1] does not contain any r23. To summarize, we emphasize that the derivations hereafter will always be under the Basic Assumptions: 1. The polynomial p(:r) has only real roots. 2. There is no root of p(:r) between 2:0 and 9:1. 3. The polynomial p(:r) has prescribed logarithmic derivatives at (1:0 and $1. Write p(a:) = kHE-‘zlcr — Tj). We want to estimate all the roots of p(:z:) under the above conditions. It turns out that the estimation gives rise to the quasi-Laguerre’s formula (2.7). To begin, notice that ” 1 (10:2 j=l " 1 (11:2 j=l $1 — rj (2.11) a {Ea—7‘1“ (2.12) Let Y,- = HEEL—T‘bjr- 1,2,~.,n, and y: Elf-Q. Then, " l (10: , (2.13) Eli/j -y 9 n 1 f; Y1 + y Adding and then subtracting the above two equations, we have n Y,- QOI = , (2.15) 12:31 3sz _ y2 —Aq " 1 = , 2.1 A2: j=l 16-2 _ y2 ( 6) where qm = 9932591 and Aq = q] — qo,A:c = 1:1 — 11:0. Lemma 2.2.1 The following inequalities hold 2 2 - _ Y,- —y >0, forallj—1,2,---,n, (2.17) and _Aq Ax > 0. (2.18) Proof. Since 132 — y2 = (Y,- + y)(YJ- — ) = (2:1 — rj)(:co — 11,-), inequality (2.17) follows. Inequality (2.2.1) follows from inequality (2.17) and equation (2.16) El Let Y = 39% — r, where r is a root of the polynomial p(a:) with multiplicity greater than or equal to m. For simplicity and without loss of generality, let 1‘ = r1 = r2 = = rm 2 = rm+k, wherek Z 0. So Y = Y1 2 Y2 = = m+k. Rewrite (2.15) and (2.16) into the following, " Y,- mY = (101 — ——, (2.19) jg?“ Y3? — 312 Y2 — y2 n 1 —Aq m —— = — — . 2.20 15;] Y]? _ 312 A55 Yz — y2 ( ) Note that Ym+1, . - -, Ym+k stay on the left hand side of the above two equations, only m equal terms are combined and moved to the right hand side. It follows from Holder’s inequality, (in: Yj )2< Zn: sz i 1 (2.21) j=m+l ’62 _ y2 - jzm+l Y? — y2 j=m+1 ’32 _ y2 n 1 n 1 = n—m+y2 Z —— Z —-——. (2.22) j=m+1 sz - yz j=m+1 sz — 92 10 Remark 2.2.2 Equality holds in (2.21) if and only if Ym+1 = Ym+2 = = Y", 50 p(.2:) has only two different roots. Using (2.19) and (2.20), we have 2 mY Aq m Aq m _ __ < _ 2 __ _ __ _ [(101 Y2 —y2 _ n m+ ( A2: Y2 -—y2)]( A2: Y2 —y2)° (2.23) Dr, M (M)2 Y AqAa: 1 2 2 €101+(n—m)'A—$-——4—-2m<101Y2 _ y2 +[m +(n—mlm—m 2 ]Y2 _ y, _<_ 0 Since Y2 — y2 > 0, so, A A 2 [(131 + (n — "ll—g - (——q)—] (Y2 - yz) — 2qu1Y + nm - mAqAx S 0 A2: 4 And hence, Aqu 2 2 All (Ag)2 1 [Tim—m 2 —y (qo1+(n—m)‘A—;-—4— W 1 Aq Aq 2 Or, A A9: A 1 1 A [nm —— m (14 — y2 (q0q1+ TVA-73H -Y—2 — 2mq01—f +q0q1+(n — mug—:1 S 0. (2.25) Lemma 2.2.3 The inequality ' A qOQl + n—A-Z- s 0 (2.26) holds, and equality holds if and only ifp(:r) = k(:r — 7‘)". Proof. Applying Holder’s inequality, we have Aq _ Aq 2 (Aq)2 (IOCII‘anm —nA:r +901 n 1 Y- : —nZY2_y2+(Z}/2_J , 2 jzl J y l y2 j=1 j-y2 |/\ | 'M "S J: J n 1 n 1 n Y2 n 1 < _ __ J _ 2 __ = _ ”Zyz_ +ZY2—y2 (gyfi_y2 ygyj2_y2) 0. 2 y j=1 11 Lemma 2.2.4 The inequality AqAa: 4 A 2 9 y qq +n— > nm 2.27 (0 1 A13) ( ) nm — m holds. Proof. The assertion follows directly from Lemma 2.2.3 and Lemma 2.2.1 E] Lemma 2.2.4 indicates that the leading coefficient of the quadratic inequality (2.24) is positive. Let a, b, c denote the coefficients of inequality (2.24), i e m AqAa: 2( + Aq) a = n —m — n— 4 y (loch Ax A A2: A2: 2 = nm—(n+m> ‘1 —qoq.( ) 4 4 b = —2mq01=—2m-%—q—l, A C = 4041+(n—mlxg- To solve the quadratic inequality, let’s solve the quadratic equation first. Denote the two roots of the quadratic equation (2.25) by 21 and 22. Then _bi‘/b2_4ac= mmi (% )2—ac 2a a mlg—g—q-L :i: m2(991.;11)2 — [nm— (11 + 71093—35 - Cloth—[Kn —m)-:—:- + (1091] nm— (n+m)A“—-Qoq1AT”- Simplifying the radicant, we have 2 radicant = r1222(q—0——— + q1)2_ m(n — m)A—q — nmCIOCh + (n2 — m2) (Alf) a: Aq A2: A A2: A2: 2 +01 + m) 4 (1091+(n — 7")th (14 + (qoq1)2( 4) A 2 A = n2(—4q)— + mzqoql — nm(n — m)A—: — nmqoql A A2: A2: 2 4'2"th 4 + (CIOQ1)2( 4 ) nA A Aq A -—- (——")‘ — m 180 + $1 __ i _ 2 21 So p(a:) may have roots on either side of the interval [2:0,:c1], and gig—fl — f1- (resp. 513-1231 — i) is the nearest possible root to the right (resp. left) hand side of the interval. See Figure 2.1. Case 2: 0 <213z2. 13 / x0 19;.) X1 \ <_X.:_Xco__12_2 (xfixohq 2 1 Figure 2.1: Distribution of the roots of the polynomial in case 1 For this case we have l- 0 and 6+ > 0, then 59% < QL+ < QL-. Proof. We prove the theorem by relating 61 to ;L; and the above three cases. Obviously, any root has multiplicity 2 1 and the numerator of 61(see (2.29)) is positive by Lemma 2.2.4. So only the denominator of 6* needs to be examined. If 6- and 6+ have different signs and 6+ < 0, then —m99—‘253-‘- < 0, which leads to (L < 0, a contradiction. So (1) is proved. If 6- > 0 and 6+ > 0, then (10+ €11 m— 0 2 > 15 and + A "‘7"?ng > \f-mm - m)(CI0q1 + nA_:l:) + (”A9 + (10413102 /4- That is, the denominators of 6- and 6+ are both positive, and the denominator of 6- is less than that of 6+, hence 6- > 5+. Furthermore, we have 5%“ < QL+ < QL-. If 6- < 0 and 6+ < 0, then 90 + (11 0 2 < —m and + A |— "2&7qu > \/—m(n — m)(q0q1 + nA—z) + (nAq + qoqlAar)2 /4. So the denominators of (L and 6+ are both negative, and the absolute value of the denominator of 6- is greater than that of 6+. Hence, 6- > 6+. Furthermore, we have QL+ < QL_ < fly. I] The following result can be easily verified. Corollary 2.2.7 Ifd- < O and 6+ < 0, then |6-| < |6+|. Ifd- > 0 and 6+ > 0, then |6_| > |6+|. Theorem 2.2.8 Among all the polynomials, p(.r) = k(:z: — r)(:c — t)”"1 is the one that has root closest to the interval [$0,131]. Proof. It follows from Remark 2.2.2 that r = QLi if only if the polynomial is of the form p(:z:) = k(:(: - r)(2: — t)"‘1. The assertion then follows from Theorem 2.2.6. I] We call the polynomial in Theorem 2.2.8 the optimal polynomial. Lemma 2.2.9 The inequality Ar): iamii > —2__ (231) holds. 16 Proof. The roots of the polynomial must be outside of the interval [2:0, 2:1]. It is clear that QLi = 59—131- + (ii are the roots of the optimal polynomial, so they are outside the interval [$0,231]. Therefore Idmil > 925. E] Figure 2.4 illustrates the three cases discussed in Theorem 2.2.6. If one identifies two ends of a straight line as 00, a uniform circle is obtained. Point A in the figure serves as a reference point so that one can tell where 00 is on the circle corresponding to the three line cases. I m A QL_ x0 x] QL+ case I m 1 QL+ QL+ A QL. x0 x1 QL. \/ case 11 m 2 uniform view of the 3 cases 1 m pomt A 18 a reference pomt x.) x1 0L. A QL_ case 111 Figure 2.4: Movement of quasi-Laguerre iteration Proposition 2.2.10 If —q1 < 0 and —-q0 < 0, then |6-| < |6+|. If —q1 > 0 and ‘qo > 0, then |5-| > I6+|. Proof. If —q1 < 0 and —q0 < 0, then the denominator of |6-| is greater than the denominator of |6+|. Hence |5_| < |6+|. The second case can be achieved similarly. U 17 For polynomials with only real roots, a monotonically convergent algorithm is obtained and we call it the quasi-Laguerre’s algorithm. We describe the algorithm below and deduce some of its properties from the above contents. Quasi-Laguerre algorithm: Let p(:c) be a polynomial with all real roots. Start with an interval [20,21] that contains no roots of p(:1:). Since any root will have multiplicity at least one, we can always use m = 1 in the quasi-Laguerre’s iteration. If the nearest root of the polynomial is known to be of multiplicity 2 m, then m should be used to accelerate the iteration. The following iterative scheme converges monotonically and the iteration sequence never cross the root. Theorem 2.2.11 1. Initial step: evaluate qo = #21:; ’41 : p’grnz 19(101) ' 2. Choose m properly if possible. Otherwise, let m = 1. 3. Compute 51: according to (2.2.9). Then, we have + 2+ (a). If 61: > 0, then :13: = 3:312:21 + 6+(x:_2,xt_l,qz_2,q:_l) converges to a + _ _ _ _ root ofp(:c), where at] — 1:1, qf' — ql, $3 — 1170s (1d — (10; (b). If 61 < 0, then x; = iii—41 + 6-(x;_2,x;_l,q;_2,q;_l) converges to a root ofp(:c), where 2:," = 2:0, qf = (10, $6 = 1171, (15 = Q1, (c). If6-0,letxfzzrozxa',xf=x1=$3,qf=CIo=(Idr q? = q1 = qr“;- Then, _ x-_ +r-_ _ _ -. -. . g, = ——,— +g_(s._.,e._.,g._.,g._.) gs to a root ergo), if —qo < 0. + +13g ° 93: = ill-lz—‘k;l +5+($Z_2sxt_1,qt_2sqt.1) converges to a root ofp(:l:), if —Q1 > 0. We call the sequence the quasi-Laguerre’s iteration sequence. 18 Proof. It follows from Theorem 2.2.6 and Lemma 2.2.9 that the quasi-Laguerre’s iteration sequences converges monotonically. In the following, we will omit super- scripts + or -. Assume the limit of the sequence {wk}:°:0 is r. Let k —-) oo in 23k + mk- $k+1 = ——2——1 + 5($k.$k—1,kaCIk—1)s then, (SUE/ca xk—la qka Qk—1)—> 0- That is, _(n + m)AQk—1Ark—1 _ (Ik—ICIkgAxk-IV _) 0 mq——’——-" ”Lg“ i\/-mm n(-m )(qk 1qk+nA:—:—‘_ l)+(nAq/c 1+qk 1qkAzr/c 1)2 /4 Lemma 2.2.4 ensures that the numerator of the above fraction is greater than or equal to mn. So, - + A _ _m2"_1_2-_q_"i\/—m(n - m)(qk_1q;c + ”A: l)+(nAqk-1 + qk-lqkAxk-1)2 /4 —> oo. -1 If p(r) 7:4 0 then the limit should be _mp’(r) :l: \I—m(n — m)((-er)-)2 + nipfl) 79 oo p(r) p(r) dw p(r) This contradiction leads to the conclusion that r is a root of p(r). [1 Remark 2.2.12 It follows from Proposition 2.2.10 that ifr is a root of p(at) such that an. —-> r+, as k —> oo, i.e., the quasi-Laguerre’s iteration sequence is approaching r from the right hand side, then i6—(l‘kgxk—laCIkaCIlc—1H < |5+(I€k,$k—1,kaQk—1)I for k large enough since —qk and —qk-1 would both be negative, see Figure 2.5. Similarly, if an, —2 r“, as k -+ 00, i.e., the quasi-Laguerre’s iteration sequence is approaching r from the left hand side, then l5—(xk,$k—1sqk,Qk—1)I > |5+($k,$k—1,qksqk—1)| 19 for 1: large enough, see Figure 2.5. Notice that if the quasi-Laguerre ’s iteration sequence converges, then the sign cho- sen in the formula is always the one that makes the magnitude of 6 smaller than the choice of the other sign. This constitutes an important feature of our algorithm. / Xk xk-l xk-l Xk \ - 0, - 0 - O, - qk< qk-l< q > qk>0 Figure 2.5: sign of qk = 21:) and qk-l = %)l as the sequence approaches the root Remark 2.2.13 Starting from any initial point 2:0, two Laguerre ’s iteration sequences, xi“ = L+(a:2') and 2;“ = L-(ccz), can be generated and both converge to a (dif- ferent) root of the polynomial. However, this is not true for the quasi-Laguerre’s iteration. The quasi-Laguerre’s iteration sequence generated using the + sign, for in- stance, moves upward ( to the right). If there is no root of the polynomial on the right hand side of the initial interval, the iteration sequence will definitely wrap around after some steps of iterations, hence produces a point that is on left hand side of all the roots. When this happens, the iteration usually collapses because there are roots of the polynomial between two consecutive iterates now, hence complex number may be generated with the quasi-Laguerre’s function. Numerical experiments have verified this phenomenon. Therefore, the choice of sign in generating quasi-Laguerre’s itera- tion sequence is more delicate. To summarize, if one starts with an interval [xo,$1] that contains no root of the polynomial, then the following guidelines can be used to generate iterates: If it is known that there is a root on the left (right, resp.) hand side of the initial interval, then the — sign (+ sign, resp.) is chosen in the quasi-Laguerre’s function to generate iteration sequence; If this information can not be obtained, use the criteria in Remark 2.2.12 to generate iteration sequence. 20 2.3 Horn an optimization point of view A Theorem from optimization The following result can be found in [22]. Bold faced letters represent point in an Euclidean space E" of dimension n. Definition 2.3.1 Let x“ E E" be a point satisfying constraints 110‘") = 0, g(X*) S 0, and let J be the set of indices j for which g,(.r"‘) = 0. Then x"' is said to be a regular point of the constraints if the gradient vectors Vh,-(x"'), ng(x*), 1 S i S m,j E J are linearly independent. Theorem 2.3.2 (Kuhn-’Ihcker Conditions) Let x“ be a solution of the problem minimize f(x) subject to h(x) = o, (2.32) g(x) g o (2.33) and suppose x" is a regular point of the constraints. Then there is a vector /\ E E" and a vector p E E” with ,u 2 0 such that Vf(:t*) + ATVh(:I:") + pTVg(x*) = ID (2.34) n g(x*) = 0. (2.35) (2.34) is called the Lagrange equation. (2.35) is called the complimentary condition. Regular point only concerns active constraints. Now we derive the quasi—Laguerre’s formula from the optimization point of view. The basic assumptions on page 9 are still valid in this section. Let’s identify the real a'Xis with the unit circle and label the roots, r1, r2, -- -, rn, of p(r) clockwise starting from go, see Figure 2.6. 21 r2 r1 x0 x1 rn r2 r1 Figure 2.6: Labeling the roots clockwise Lemma 2.3.3 If the roots ofp(:1:) is labeled clockwise, then 1 . ,forlSiSm—10. Thus, 1 1 0 < S . {130 — T‘m (Co — 1',’ That is, 1 1 2 Case II: rm > 230. We consider the following two subcases. 22 X0 X1 (2.36) II-l: r,- < 230. Then > 0 > . rm — 230 r,- — 220 11-2: r.- > x0. Then r,- 2 rm. So, 1 2 > 0 rm — x0 r, — x0 Cl Therefore, in the clockwise labeling, the farther r, is to 230, the larger the quantities l “40 is. In the following, we assume m < 72. Our goal is to look for, based on x0, x1, go, and q1, an iteration formula that will not jump over more than m — 1 roots of p(x) in the clockwise direction. Such a formula answers the question: how close h to the interval [xo,x1] is the m‘ root of p(x). In order to answer this question, we consider the following optimization problem, minimize (2.37) Tm — 1130 subject to n 1 j=l $0 — rj " 1 Z = q1, (2.39) j=l $1 — T‘j 1 1 . Z , fori=1,2,---,m—1, (2.40) Tm — $0 7“; — $0 1 1 . (2.41) rn — x0 rm — .730 The constraint (2.41) is to ensure that p(x) has at least two different roots. Lemma 2.3.4 (r1,r2,---,rn) is a regular point of the constraints (2.37)-(2.40) if 7‘1: 7‘2: - - -= rm and there exists at least onej > m such that r, :,£ rm. 23 Proof. Write the inequality constraint (2.40) into the following, 1 1 . — ZO,forz=1,2,-~-,m—1, 170—73 (ED—rm Then the Jacobian matrix of the constraints (2.38)-(2.40) is, (go—3,3, 0 0 72:03,“, 0 0 .(2.42) 0 (371,—). 0 ‘53-‘73 0 0 K 0 0 (——1:_)’ “(xo-lrmv 0 0 / Multiplying the ith row by —1 and adding the result to the first row, for i = 170""i-2 xl-ri—z 3,4, - - « ,m + 1, and then multiplying the it" row by — and adding the result to the second row, for i = 3,4, - - - ,m + 1, the above matrix becomes, 1 1 1 r 0 0 0 mm): (..-.-..)2 (M), 0 0 . . . 0 2"“ (11"3‘)! l , . . 1 (1170'--7'm)2 (II—rm+l) (II-'73:)5 1 . . . __1____ ——7(xo—r1) 0 0 (104,")? 0 0 1 1 0 (Io-1‘2)2 . i i 0 _ (IO‘rml‘) 0 O o. 1 _ 1 K 0 0 ' (3‘70-’"vn—1)2 (lb-rm)? 0 0 t (2.43) Without loss of generality, we assume r1, 75 rm. Then the submatrix __1_2 __1__2_ (to-rm) (Io-rn) 1 7." ($0402 1 (2.44) m ‘=1 (xi—rm)? (rt-Tn) is nonsingular since r1 = r2 = : rm # rn. Hence the Jacobian matrix (2.42) is of full rank. [J Theorem 2.3.5 An optimal solution can be attained only when r1 = r2 = = rm, and rm+1 = rm+2 = = 7'“. Furthermore r1 = = rm = r is given by (2.30). Proof. Let (r’f,r;,- -- r“) be an optimal solution to (2.37)—(2.41) and a regular 3 11 point of (2.38)-(2.40). According to Kuhn-Tucker Conditions (Theorem 2.3.2), there 24 exist [11, pg, 31, 32, ---, sm_1 such that 1 n n 1 V $0+H1(Z$0 .—CI0) +H2( ..— 0(from the basic assumptions), we .7 have iii—$0: —_1# f,orj=m+1,m+2, 73—231 #2 So r§=rIn+h forj=m+1,m+2,~-,n. Now all rf, for i = 1,2,--- ,m —1 must be equal to rfn. Otherwise, if r'l' 31$ r; for instance, then 31 = 0 by (2. 40) and the complimentary condition (2. 48). Therefore, :1—:—:—‘:— - F from (2 4(7) i. e. ,r‘{- — r;, contradicting to the fact that r; is different from rm (see constraints (2.41)), which implies 1'; ¢ 1",". So there are only two different rf’s , r; = r; = = fin, and r;,+1 = ran = = 71:, when the maximum is achieved. We have derived the formula for r when the roots 25 of the polynomial is of this kind (see (2.7)), and the formula is the quasi-Laguerre’s formula(see (2.30)). [1 Similarly, if the roots of p(r) is labeled counterclockwisely, starting from the right end of the interval [xo,a:1], then the mth root is closest to 3:1 only when r1 = r2 = - - - = rm 74 rm.“ 2 - - - = rn. This again yields the general quasi—Laguerre’s iteration formula. The corresponding optimization problem in this case would be maximiz- l rm‘xl ing satisfying similar constraints as (2.38)-(2.41). So we have the following properties. Corollary 2.3.6 Among all the polynomials that have only real roots (none of the roots is in the interval [xo,xl]) and satisfy the fundamental constraint (2.38) and (2.3.9), p(r) = k(:c — r)"‘(:r — z)""m is the one whose m‘h root, counting from the interval, clockwise or counterclockwise, is closest to the interval [230,331]. Theorem 2.3.7 QLmi computed by {2.30) would never overshoot more than m — 1 roots of the polynomial p(r), counting from the interval [3:0, 3:1], clockwise or counter- clockwise. Proof. Theorem 2.3.5 showed that the closest m‘h root to the interval [230,321], counting from the left (or right) end of the interval, clockwise (or counterclockwise), is given by (2.30). [1 Theorem 2.3.8 Let m be an integer, 1 S m g n — 1. In the clockwise direction, the larger the m, the farther the QLm_ is from $0. In the counterclockwise direction, the larger the m, the farther the QLm+ is from .131. Proof. Let m < k S n — 1 We only prove the case for the clockwise direction. Note QLm_ is the closest mth root and QLk_ is the closest k‘h root among all the l o o o o c o o l polynomlals With the prescribed logar1thm1c der1vat1ves, we have —_QLk_—1'0 Z _—QLm_—ro since It > m. D 26 2.4 For polynomials with complex roots The quasi—Laguerre’s method can also be applied to solve general polynomials or continuously differentiable functions that may have complex roots. The choice of the sign in (2.30), which we developed for polynomials with only real roots, is determined by how the sequence is approaching the root. Negative sign is chosen in (2.30) if the sequence is approaching the root from the right hand side, while positive Sign is chosen otherwise. However, for functions with complex roots, one must choose the sign for which the magnitude of |5| is the smaller one. 2.5 Convergence order In this section we will prove the order of convergence for quasi-Laguerre’s method is 1+\/2 if the multiplicity index matches the multiplicity of the root. The proof is in the complex plane, hence covers the real case. Write the quasi-Laguerre’s formula (2.30) in the following form $1+$0 2 + 6:1:(30axlaq0a (ham)? (2'49) 132: where -m 0 I x 2 Jib—+11 i \/—m(n — m)[qoq1+ "Xi—i] + [(1091 + ”33: 2 (92‘) 6i($0,$13 qu q1,m) : (1091 + (n - m)%% (2.50) Recall that the choice of + or — in (2.49) depends on the magnitude of (ii. For convenience, we let 5+ if |5+| S l5—l, a- if |5_| < |5+|. 6(xo,:t'1,qo,q1,m)= Theorem 2.5.1 Let {1130211 C C be a quasi-Laguerre’s iteration sequence in the complex plane, i.e., 33;.“ = W + 6($k_1,$k,qk_1,qk,m). If the sequence {$1.} converges and the multiplicity index of the formula matches the multiplicity of the converged root, then the order of convergence is 1 + {2. 27 Proof. Let r 6 C be a root of p(r) such that limk__,oo:1:;c = r. write p(x) = (a: - r)"‘cp(:r), where p(r) 7t 0. Using (2.49), we only need to show that if le — r| and I330 — rl is small, then I32 — 1‘] = 0(le — rlzlxo — r|),if I330 — r| = 0(1), and I170 — r] = 0(1). (2.51) Write (IO = + 00) (2.52) (Bo—7‘ (II = + 01) (2.53) 171 — T‘ where o,- = g(x,-), for i = 0, 1, and g(:r) = “3,65% is a continuously differentiable function in a neighborhood of r. Then m mao moo (1091 = + + + 0001, (fl-54) (ato—r)(:r1—r) at] —r :ro—r fl — — m + 3 2 55 Act — (2:0 — r)(.r1- r) Act, ( i ) where fi—‘i = fi’ Write Aq _ m(m — n) qOQI + ”AIL“ — ($1 _ 1')(.’130 _ 7‘) + 50, (2.56) where 50 _ moo mm + (7001 + ”A: (2.57) :31 —— r (to — r A3: So, Ag 1 1 Ax(qoq1+ nA_x) — m(m — n) (2:0 _ r — x1 _ r) + SoAx, (2.58) and the radicant in (2.49) can be written as, Aq (Ax)2 [ Aq]2 _ m2(n -— m)2 m(" m) [“1 + "Anti + 4 W“ + ”Ax _ (:ro — r)(x1 — r) 1 2 , 1 1 2 —m(n—m)So+— m(m—n) ( — ) 4 11:0 — r 3:1 — r 1 1 2 2] +2m(m n) (we _ r 931— r) SOASB + 50(Arr) _ m2(n — m)2 1 1 2 ~ 4 lxo—r 2:1 — rl —m(m—n)S'o +1m(m_..)( 1 1 )SA +15% )2 2 1‘0 — 7' $1 — 7‘ 0 37 4 0 17 28 1 1 + :ro—r xl—r m2(n — m)2 [ 1 1 4 ]Z+%m(n—m)( )SoAzz: 0—1‘ $1—1‘ Ax $0—7‘ +£53(Ax)2 + m(m — n) So — m(m — n)So = [m(nZ— m) ( 1 + 1 )+ éSoArc] + m(m — n) Am So — m(n — m)So CEO—7‘ $1—T' $0—T' 2 $1—T' = Sl—m(n—m) 50, (2.59) $o—T where s—1( )( 1 + 1 )+lSA (260) 1—2mn m 170—?" 171—7“ 20 :12. ' Assume le — r] < alro — rl, (2.61) for some 0 < a < 1. Since the iteration sequence converges, assumption (2.61) is feasible. Hence, 50 2 C(51) and So 2 0(512)if|:1:1- r| = 0(1), |:1:0 — r] = 0(1). (2.62) Therefore, A A 2 A 2 l/‘mW — m) [(1001 + nA—Zl + ( 2.1:) [(10% + TIA—iii _ 2 _ _ $1 —T‘§ __ J51 (1 m(m mam—r512) — is 1—l ( — )"’“’"§+ (( — )2) (263) _ 1 an mam—r3120“ r . . So, —m(CIo+CII) _ _ 33 fl 2 (gr 2 i m(n mllqOQI +nAx] + [00(11 +nAx] 2 (2-54) _-m(qo+q1) _1 _ den-{53 _ 2 _ 2 i51(1 2m(n m)$0_rsl2+o((.r1 r) )). (2.65) It follows from (2.52), (2.53), (2.65) and (2.60) that the dominant term in (2.65) is —m2:l:m(n —m) 1 (2.66) 2 $1—T 29 Thus, taking the plus sign yields a smaller magnitude in (2.66). Subtracting r from (2.49), and taking the sign in front of the square root that yields a + sign in (2.65), we have —m + mn—m —-S xl_r+x0_r 4153+31__12_12L_r_a+0($1_r) -S (132—7': + $0,”! where 52: 8'3: 52+S3 2 qu1 + (n — m)§§ : 52 + 53 (2 67) qoq1+(n-m)%%’ ' :cl—r Ito—r _ fl) ( 2 + 2 )(WW WA. (it—L2: + Ego—g) [(330 — 13:31 — 7‘) {1% xTilr (“fifx‘xfl .) + (n _ nag—g] (:31 -r xo—r) [(xm(2m-n) moo + 29L. 2 2 o—r)(:1:1—r) :cl—r :co-r +0001 — A0 Ax m(2m—n) 1 m mxl—r m(2m—n) 1 +—00+'— 01+ 2 xo—r 2 2930—1" 2 331—1" :3 —r m a: —r+:1: —r A0 m 0 God-"01+ O 1 3... _.~ 2 2 [0001+(n ””3; m(2m—n) 1 - 1 m m xl—r xo—r [ + ]+—(00+01)+—( 01+ 00) 2 xo—r :cl—r 2 2 (co—r xl—r .. - . g; £1 - .. (2.68) +0001 + (n — m) [0001 + (n — m):—:] q0+q1 m(n—m):r1—rSo 2 +51 2 xo—r51+0($1 r) m2 l 1 m l 1 l '7( + )‘§‘(”°+”1)+§m("*m)( + ) Ito—7‘ asl—r .‘L‘o—T xl—r m m(n—m)x1—'r50 2 mo—TS:+O($1_T) m(2m—n)[ 1 + 1 ]_m 2 3(00 + 0‘) +%SoA$ — xo—r xl—r m(n—m)x1—rSo 2 Ito—TS] +%SOA$ — + 0(331 — r) (2.69) 4:: A3: m xl—r xo—r xo—r—l-xl—r —( 01+ ao)+ 2 xo—r :cl—r 2 [0001 + (n — m) 30 where 1 + 1)+a.SoAa: 121-7“ 130-!" 1+1)+SOA$ xl-r .ro—r 01m(n - m) m(n — m) _(n — m)m (fifl—r + £5) — (n — m)(aoal + rag—g) m(n — m) (1,114 + $01,.) + SoAx n(n — In)??? — (n — m)(aoal — n-fifi) + alSoAa: _ n(n — m) (31%; + weir) + SoAa: = 0(01 — 00) + 0(ch — r). 31 1 m(n—m)ml—TSO _ A _ _ _ +250 in 2 $0-7‘Sl+0(xl 7') _ _r_n_ xl—r :L‘O—r :co—r+:r1—r[ g _ 2(xo—ral+x1—rao)+ 2 0001+(n m)A:c A 1(m $00 mAmO'I-i-O'OO'IACC'i-TLAO') 2 xl-r arc—r m(n—m)ml-rSo 2 $0 — 7' 51 + 0(31 T) _ ma +2231—x0—r0 _ 2 0 2 xo—r 1 a: —r+x —1' Am A0 +(0 2‘ +‘2—l("°"1+(""mlal +EAU_m(n—m)xl—r§+o(x —r) 2 2 xo—rsl l m m 2121—21' A0 = ——ao+—[——01—0‘1+01—00]+(31-T)[‘7001+(n—m)— 2 2 xo—r Am m(n—m)ml—TS’O — 2 $0—T51+0(xl 7‘) _ :cl—ra m(n—m)m—rfi _ xo—rl 2 (Bo—7‘31 A0 +(181—7‘)(0'001+(n—m)'A—$]+0($1—7‘) $1—T[a 71—77150] 2 m — — (Co-7‘ l 2 51 A0 +(x1 — r) [0001+(n - m)m] + 0(331— 7"), (2-70) 0 n_m50 n—m fis+gfi+aoal+nfifi 1- —=01— 2 s 2 §Mn~mmzfi+5fifl+%&A$ So, 52 + 53 = m: : :(0(al — 00) + 0(x1— r)) +(x1— r) (0001 + (n — "Hi—E) + 0(x1 — r) = (x1 — r) [0001+(n — m)-:—: +2.::° .(g;;o.)]..(._,. : (x1 — r)0 (g2(r) + (n — m)g'(r) + g’(r) +1) + o(x1 — r), (2.71) and, 52 + 53 ~ (x1 - r)2(:ro - 7‘)0 (292(7‘) + 2(n - m)9’(7") + g'(?‘) + 1) (IOQI + (n — 770%, m(2m - n) + 0(530 - 7‘) That is, 292(7‘) + 2(n - m)9’0‘) + g’(?‘) + 1 m(2m — n) x2 — r z 0( ) (x1 — r)2(x0 -— r). (2.72) Let the order of convergence be ,u, then 331 — r = 0((xo — r)“), 1122 — r = 0((x1— r)”). Hence, 1 #:2+_a p p2—2p—lz0, and, 2+ 8 p— 2f=1+\/2— I] Remark 2.5.2 The following local convergence property of the quasi-Laguerre’s iter- ative method in complex plan is implied by equation (2.72) that: there exits an e > 0 such that if Ixo — r| < 6, I321 — r] < e and Ix] — r| < a|xo — r| for some 0 < a < 1, then the quasi-Laguerre’s iteration sequence starting from 2:0, and x1 converges to r, and the convergence order is 1 + x/2 if the multiplicity index matches the multiplicity of the root. 32 Chapter 3 Estimate multiplicity 3.1 Determining multiplicity of a root While Quasi-Laguerre’s iteration converges super-linearly with convergence order 1+ \/2 when the multiplicity index of the method matches the multiplicity of the root, it converges linearly otherwise. It is, therefore, important to estimate the multiplicity of the root in the iteration process. Let r be a root of the polynomial p(x) with multiplicity m. For an iterative method involving first and second derivatives at one point, the following Lagouan— delle’s limiting formula [17] can be used to compute the multiplicity of the nearest root m = lim p’(xk)2 3r" P’(~”'=Ic)2 - P($k)P"(-’13k) In practice, the following formula is used instead, . (3.1) P’(37k)2 flu)? - P($k)P"($k) where int(x) is the largest integer S x. mmint< ) , (3.2) This method requires the evaluation of the second derivative of p(x), therefore it is inappropriate for our iterative method that involves only the first derivative of the polynomial. Let x0, x1, ---, xk, - -- be an iterative sequence that converges to a root r of p(x) 33 with multiplicity m. Write Let qk = M then P(l‘k)’ I m + 9 (113k), xk — r g(xk) where g(xk) # 0. This gives rise to the following formula for estimating the multi- qk: plicity m, m z int(|(:1:;c — xk+1)qk|). (3.3) This formula requires both xk_1 and 23;. to be close to the root. Otherwise, severe overestimate may occur. When both qk_1 and qk are known, then a better formula can be designed as follows. Start with the following two equations m g’($k-1) _ —— , 3.4 qkl xk—l—T 9(17k-1) ( ) I q. = m +g(‘”’°) (3.5) 33;. - r g(xk) ' Subtracting (3.4) from (3.5) then dividing the resulting equation by 113;, — xk_1 yields, M = —m ) (1—%($k—1— r)(.1:)c — r)(:I:)c — xk_1)h;€) , (3.6) (it). — xk_1 (:13)c — r)(xk_1 — r where h’ = % (93%)1) for some 5), between xk and xk_1. Multiplying (3.4) by (3.5), we obtain 2 qk_1qk = ( (1 + 6), (3.7) (Ck — T‘)(:I3k_1 — 7‘) where 6 = m(m, — r) + i1b)—(xk_1 — r) + #(xk — r)(xk_1— r)g—'(-£"—‘—‘lg—I(-r—"l). mam.) "m(n—1) 9(rk—1) 9m.) Dividing (3.7) by (3.6), we have qk—1Qk($k - mic—1) _ _m 1+ 5 4!: - (Ik—l 1— #(ivk—i — 7‘)($k -* 7‘)(I13k — ilk-1M2. (3.8) So, _ _qk_1qk(a:k - xk_1)1- flack—1 - r)(rvk - r)(:v:c — Ink—0h}. Qk — CIk—l 1+ 5 . 34 Hence, the following formula can be used to estimate the multiplicity m, qk_1qk(xk — ark—1) m z . (lie-1 — (II: This method is tested on the following polynomial of degree 23, p(x) = (x+1)3(x—1)(x—3)(x—-3.0000000999991)(x—-3.1000001)”(x—10.5)(x—20.0)2. Start from x1 = 4.5, x0 = 7.4, and aim at the root 3.1000001 of multiplicity 14. If multiplicity index mul = 1 is used in our quasi-Laguerre’s iteration, it would take 101 iteration steps to converge to the root. If we use (3.9) to estimate the multiplicity when [(ka — xk)| < 1.0 x 10‘3, then the computation result shows that after 20 iterations with mul = 1, condition |(x20 — x19)| < 1.0‘3 is satisfied, and (3.9) is then applied to obtain the true multiplicity 14. When mul = 14 is used in the quasi-Laguerre’s iteration, it takes only three more iterations to converge to the root 3.1000001. To approach root 3.1000001 of multiplicity 14, with mul = 1 it took 20 iteration steps for the condition |(x;c — xk+1)l < 1.0 x 10‘3 to be satisfied. If the multiplicity estimation is started earlier, faster convergence can be expected. But, over estimate may occur. The consequence of the over estimation could be the following, (1) Cause the convergence to march a longer distance toward the target. (2) Hurt the iteration process by jumping over the root. In case (2), a back up scheme is necessary to restore the iteration process. The following back up scheme is used in our experiments, 1. give up the new point, 2. reduce the estimated multiplicity index mul by 1 and recompute a new point using the reduced multiplicity index, 3. check overshoot, if still overshoot, reduce mul by 1 again to recompute a new point, 35 4. repeat this process until no overshoot. A more complicated issue is how to detect overshoot. The following method is useful in many situations. As the sequence is approaching to a root from one side(left or right), the sign of 171%) should stay the same. Therefore, if different signs are obtained at two consecutive iteration points, an overshoot may have occurred. For some specific problems, more reliable tools for detecting overshoot are available. For instance, when our method is used to solve the symmetric tridiagonal eigenvalue problem, the Sturm sequence provides reliable information for detecting whether an overshoot occurs. The following computation result is obtained using the backup scheme stated above. Estimation of the multiplicity starts when ka — xk_1| < 0.1, 7%31‘41 > 0, %3} > 0 and 0.1 < < 1. 0. The advantage of the multiplicity estimation formula IS fully exhibited 1n this result. Overestimated mul values, 17 and 15, are xkil‘xk xk-xk used both successfully once, without overshooting the root. It helped the iteration process advance to a closer position to the root. The total number of iteration is only 9, a 10 times speed up compared to the process using mul = 1. starting points starting logarithmic derivatives x-0= 7.4000000000000 -p’(x0)/p(x0)= -3.7424415421671 x_1= 4.5000000000000 -p’(x1)/p(x1)= -11.868803998501 new points marching distance mul-index x_2= 4.3230028988455 d1t= -O.17699710115449 mu1= 1 x_3= 3.9224125629059 d1t= -0.40059033593962 mul= 17 x_4= 3.0665236579569 dlt= -O.85588890494904 mul= 17 x_5= 3.0957747133912 dlt= -0.82663784951471 mu1= 16 x_6= 3.1240282732094 dlt= -0.79838428969651 mul= 15 x_7= 3.0989338356082 d1t= -2.5094437601137D-02 mu1= 15 x_8= 3.1005358217376 d1t= -2.3492451471832D-02 mul= 14 x_9= 3.1000001611582 d1t= -5.3566057939210D-04 mul= 14 x_10= 3.1000001000000 dlt= -6.1158159848123D-08 mul= 14 36 total time for this root = 2.89380E-O2sec. 1.71600E-O2sec. total number of iterations= 9 The following computation result is obtained when the above method is used to the polynomial. p(x) = (x+1)3(x—l)(x—3)(x—3.10000009999999)(x—3.1000001)l4(x—10.5)(x—20.0)2. Note roots 3.10000009999999 and 3.1000001000000 are not equal, but extremely close, to each other in the double precision environment. The numerical result shows that 3.1000001000000 is estimated as a root of multiplicity 15, even though it’s actual multiplicity is 14. starting points starting logarithmic derivatives x_0= 7.4oooooooooooo -p’(x0)/p(x0)= -3.7477269546723 x_1= 4.5oooooooooooo -p’(x1)/p(x1)= -11.916423052696 new points marching distance mul-index x_2= 4.3239509868566 dlt= -O.17604901314338 mul= 1 x_3= 3.9247571786955 dlt= -O.39919380816112 mu1= 17 x_4= 3.0720825767638 dlt= -O.85267460193171 mu1= 17 x_5= 3.1012166752925 dlt= -O.82354050340303 mu1= 16 x_6= 3.0999200729812 dlt= -1.2966023113043D-03 mu1= 16 x_7= 3.1000009568750 dlt= -1.2157184174845D-03 mu1= 15 x_8= 3.1000001000000 dlt= -8.56874987915350-07 mul= 15 x_9= 3.1000001000000 dlt= -4.654210059564OD-15 mu1= 15 total time for this root = 2.23110E-023ec. 1.36170E-028ec. total number of iterations= 8 Remark 3.1.1 Stopping criteria for the above numerical results is |xk+1 - xkl < e f'fl‘k) or “I“ < c, where c is the machine precision for double precision numbers. Remark 3.1.2 In this two examples where roots of the polynomials are known, over- shooting is easily detected. Genrerally, detecting overshoot is a difficult subject and we 37 do not attempt to address this problem here. However, in the application to symmetric tridiagonal eigenvalue problems, the Sturm sequence precisely detects the number of roots jumped. 3.2 A new stopping criteria Kahan [16] has suggested the following stopping criteria for Laguerre’s iteration: |$k+1— 931:1? S (ka * wk—1|-|$k+1- MDT, (3-10) where 7' denotes the error tolerance. This criteria is based on the following observa- tions. Let $k+1- 33!: Tk ‘2 —— 33/: - xk—l Then as {xk}‘,:°=l converges to /\ when k —) 00, r): is normally decreasing. Thus 00 w I/\ — xk+1| = Z($k+2+i — $k+1+il S 2 |$k+2+i “ $k+1+il i=0 i=0 00 - T'k (Bk —.fl?k Slmk+1—xk|2r2= '11:“. I i=1 _ ka+1 — 331:12 _ lurk — ark—1| — lxk+1 — x1. ' For the quasi-Laguerre iteration, we propose the following stopping criteria, 1+fi < 7' (3.11) 9k+1 117041) — 93001 X i where qk+1 = W, le = $1551. This criteria is quite efficient when it is used in solving the symmetric tridiagonal eigenvalue problem. The left hand side of the above inequality is actually a prediction for the distance |xk+2 — /\|. This prediction is based on the rate of convergence of the quasi-Laguerre’s iteration. We have shown in Theorem 2.5.1 that |$k+2 — Al % |$k+1 — Alzimk —' Al (3.12) % |$k+1 *- $k|l$k+1— Allan. —* 4|, (3-13) 38 and 3,... — A) z (3,. — All”? (3.14) It follows that ”W; (BL-+1" A ”fl |$k+1 — All/E qu z 33—h: =|xk+1- /\| I“ __ MIT/5 155k _ )(|(l+\/§)x\/§ z l$k+1— AI = l$k+l — All“ — 4|- 15131: _ ,\|1+\/§ So, by (3.13) and (3.14), we have q 1+\/§ l$k+2 — )‘I z |$k+1— xkl —L (lie-+1 In practice, power of 1 + J2 should be avoid in floating point computation, so the following inequality is tested for stopping 2 —q—"— < 7‘. (3.15) (1H1 |$k+1— xkl 39 Chapter 4 Application to symmetric tridiagonal eigenproblem 4. 1 Introduction In this chapter, we shall use the tools developed in previous chapters to approximate all eigenvalues of symmetric tridiagonal matrices. 4.1.1 Evaluation of the logarithmic derivative f’ / f of the de- terminant Let T be a symmetric tridiagonal matrix of the form {01 ,61 l 31 02 fiz 0 T=ifli—1,ai,flil= i. -. - (4-1) 0 ,Bn-Z an—l ,Bn-l \ fin—l an / We may assume, without loss of generality, that T is unreduced; that is, ,8,- 75 0, j = 1, - ' - , n — 1. For an unreduced T, the characteristic polynomial f(/\) E det(T — A1) (4.2) 40 has only real and simple zeros ([30], p300). In order to use our quasi-Laguerre iteration developed in previous chapters for finding zeros of f (A), or the eigenvalues of T, it is necessary to evaluate f and f’ efficiently with satisfactory accuracy in the first place. It is well known that the characteristic polynomial f (A) and its derivative with respect to A can be evaluated by three-term recurrences ([30], p423): =1, = oz — /\ p0 ’01 1 . (4.3) Pi = (ai _ A)pi—l _ flE—lpi—2a 2 = 2131 ° ° ' an I = 0, I = _1 P0 P1 . (4.4) pi : (Oi — A)pi-l — pi—l _ flE—lpi-Z) Z = 2139 ° ° ' an and f(/\) = pm f'(/\) = pi.- However, these recurrences may suffer from a severe underflow—overfiow problem and require constant testing and scaling. The following modified recurrence equations [19] is the result of careful investigation of the problem and is more stable than the code presented in [23]. It computes the logarithmic derivative q()\) = f’ (A) / f (A), required in our quasi-Laguerre’s formula. Let ._ p: .=_& Pi—l, 10:, £1 = 01"), 2 (4.5) éi 2 ai_A_,?-i—__lla i=2,3,---,n, :— l 770 771 £1 . .2 (4.6) ”i=E1'f[(ai—)\)77£-1+1—(€::11)77£-2]a i=2139H°1n I and f’(/\) —W :77” To prevent the algorithm from breaking down when E,- = 0 for some 1 S i S 11, an extra check [19] is provided: 0 If {1 = 0 (i.e., 01 = A), set {I = fer}; 41 522—152, €i—1 ’ where s is the machine precision. A determinant evaluation subroutine DETEVL olf§;=0,i>1,set§;= has been implemented [19] according to the recurrences (4.5) and (4.6). When 5,, i = 1, - - - ,n are known, the Sturm sequence is available ([25], p47). Thus, as a by- product, DETEVL also evaluates the number of eigenvalues of T which are less than A. Let A1 < A2 < < An be the zeros of f(A) and A1 < A2 < < An be the zeros of the numerical approximation f (A), it was shown in [19] that mo.) - 3.1 s -‘Z—5m,ax{lal + (3.4.11) + 13.15. (4.7) 4.1.2 The split-merge process Let A1 0, for all i = 1,2,... ,n — 1, since in (4.3)—(4.6), fii’s always appear in their square form. The following interlacing property for this rank-one tearing is important to our algorithm. 42 Theorem 4.1.1 Let A1 < A2 < < An and A1 S A2 < S An be eigenvalues ofT and T respectively. Then A 3\1sxlsfizsAzsmsfinsAn 0 and p < A.- or —fL'(%l < 0 and p > A,, and there is another point p0 such that no eigenvalues lie between p0 and p. So if p < A,-(determined by Sturm’s sequence) and —%%l > 0, then global Newton’s method is used to find the second initial point. Since the previous eigenvalue A.-_1 has been found, we choose p0 to be A;_1 + 6 in global Newton’s method. Otherwise, if —~‘;l(§)l < 0 and p < A, (or —f—’m > 0 and p > A.) then we use bisection method f(P) to examine the midpoint 9%9 (or 9%3, respectively); if —f7l(%l < 0 and p > A; (or --%3 > 0 and p < Ag) then we use Newton’s method plus bisection adjustment to 43 examine the point min(p — fig], Eer—b) (or max(p — %§], %3), respectively). Repeat this process until two initial points p0 and p are found with €551 > 0 if p < A.- (or £173 < 0 if p > Ag). The advantage of global Newton’s method is that the second initial point can be obtained without extra call to the subroutine DETEVL, which is the most expensive part of the whole algorithm. Once two initial points p0 and p are found, the following quasi-Laguerre’s iteration formula (see 2.10) with initial points 330 = p0, x1 = p and initial multiplicity index m = 1 is used, k- k k— n + q(x$..")(x£.1 — $5.1”) (k+l) _ (k) x“ _ 3'” + (k-l) (k) (k—I) (k) (k—l) we... ) — (x... — x... )5 i (/3 [(xm. — as... )23 + (m — n)] (4.10) where (k) (k4) S = 237,—, («3335433) + #33:: : :Efgf )) , (4.11) and (k) W551) = mm“) (4.12) 351:1) ' Multiplicity index m > 1 may be necessary when clusters exist. This will be addressed in Section 4.1.4. The iteration sequence {32‘ )}:1 obtained by (4.10) with an appropriately chosen sign (see Remark 2.2.12 in Chapter 2) converges monotonically to A.- with ultimate convergence rate \/2 + 1 by Theorem 2.5.1. The eigenvalues of T in (4.8) consist of eigenvalues of To and T1 in (4.9). To find eigenvalues of To and T1, the split-merge process described above may be applied again. Indeed, the splitting process can be applied to T recursively (See Figure 4.1) until 2 x 2 and 1 x 1 matrices are reached. After T is well split into a tree structure as shown in Figure 4.1, the merging process in the reverse direction from 2 x 2 and 1 x 1 matrices can be started. More specifically, let T, be split into Tao and T01. Let Ag, - - - , A; be eigenvalues of Ta = T, 0 ( 0 in ascending order. Then the quasi-Laguerre iteration is applied to the 0 T01 44 TOT/\ T11 A A,“ Am A TOIO THO Tlll Figure 4.1: Split and merge processes polynomial equation fa(A) E det[TgI — AI] = 0 to obtain the corresponding eigenvalue A‘-’ i = 1,2, - - - ,m, by the merging process described above. This process continues until To and T1 are merged into T. That is, in the final step all the eigenvalues of T are obtained by applying the quasi-Laguerre iteration to f (A) = det(T — AI) from eigenvalues of To and T1. 4.1.3 Deflation By Theorem 4.1.1, Ag 6 (Ag, Ag“) for each i = 1, - - - ,n with the convention An+1 = Ag, + 2,61,. If Ag+1 — Ag is less than the error tolerance, then either Ag or Ag+1 can be accepted as Ag. In general, if T has a cluster of m + 1 very close eigenvalues, for instance, AH", — A,- is less than the error tolerance for certain I S j S n — m, then m eigenvalues Aj, Ag“, - - - , Aj+m_1 of T can be obtained free of computations. They can be set to any one of Ag, - - - ,A,~+m. 4.1.4 Cluster and cloud handler Since the matrix T in (4.1) is unreduced, its eigenvalues are all simple. Therefore using m = 1 in the quasi-Laguerre iterations seems appropriate in all cases to obtain ultimate super-linear convergence with convergence rate \/2 + 1. 45 However, in some occasions, there may exist a group of eigenvalues of T, say, /\i+l < ’Ai+2 < < A£+r which are relatively close to each other, compared to other eigenvalues. We say the spectrum has a cloud. For example, type-3 matrices (see Section 4.3.1) have two clouds of eigenvalues. On some other occasions, some eigenvalues may even be numerically indistinguishable. For instance, the spectrum of Wilkinson matrix [30] contains eigenvalues mostly in pairs and numerically indistinguishable. We call this type of eigenvalues a cluster. Figure 4.2 illustrates these situations. lllll 11111 1 l l I I Spectrum has two clouds Spectrum has clusters Figure 4.2: Cluster and cloud of eigenvalues Definition 4.1.2 If m roots of f(x) are numerically indistinguishable, we say the function has a cluster of roots with cluster size m. Definition 4.1.3 If m roots of f (x) , being gathered in an interval, are relatively far from the other roots, we say f (x) has a cloud of roots in that interval and the cloud size is m. Numerical experiments exhibited slow convergence of quasi-Laguerre’s iteration with multiplicity index mul = 1 in case of clouds and clusters. Hence a cloud and cluster handler is needed to speed up the convergence. Two examples are listed below to show the slow convergence of the quasi-Laguerre’s iteration in these situations. Example 1. Wilkinson matrix W513. The eigenvalues of this matrix consist mostly of numbers in pairs that are numerically indistinguishable. In particular, we look at eigenvalue # 23, that is equal to 11.000000000000. The following numerical result is obtained from the quasi-Laguerre’s iteration process with multiplicity index equal to 1 through out the whole iteration. 46 Numerical Result of Quasi-Laguerre with Mul-index 1 starting points pO 8 11.250000000000 p1 = 11.137888560412 new points marching distance marching ratio p2 ' 11.057728240655 -8.0160319756418D-02 0.71500571262537 p3 I 11.023732381883 -3.3995858772538D-O2 0.42409834286890 p4 8 11.009560676203 -1.4171705679063D-02 0.41686564748620 p5 I 11.003851776062 -5.7089001419764D-03 0.40283789906888 p6 = 11.001550472961 -2.3013031001457D-03 0.40310796176390 p7 = 11.000624132107 -9.2634085422377D-04 0.40252883427876 p8 = 11.000251228591 -3.7290351638350D-04 0.40255540358554 p9 = 11.000101125531 -1.5010305952183D-04 0.40252519197879 p10= 11.000040705300 -6.0420231328607D-05 0.40252498197625 p11= 11.000016384787 -2.4320513144343D-05 0.40252267509654 p12= 11.000006595237 -9.7895493107211D-06 0.40252231737957 p13= 11.000002654728 -3.9405095313399D-06 0.40252205758077 p14= 11.000001068586 -1.586141701832OD-06 0.40252198077863 p15= 11.000000430129 -6.3845684136714D-O7 0.40252194405438 p16= 11.000000173137 -2.5699288044115D-07 0.40252193067718 p17= 11.000000069691 -1.0344526894853D-07 0.40252192500802 p18= 11.000000028052 -4.1638989052076D-08 0.40252192754020 p19= 11.000000011292 -1.6760605649697D-08 0.40252191590758 p20= 11.000000004545 -6.7465114403695D-09 0.40252193634135 p21= 11.000000001830 -2.7156191175802D-09 0.40252197622175 p22= 11.000000000736 -1.0930956358719D-09 0.40252170445976 p238 11.000000000296 -4.3999528135870D-10 0.40252221939185 p24= 11.000000000119 -1.7710815504780D-1O 0.40252285092899 p25= 11.000000000048 -7.1289803041994D-11 0.40252129001486 p26= 11.000000000019 -2.8695184519433D-11 0.40251457143920 p27= 11.000000000008 -1.1550485192110D-11 0.40252346815500 47 p288 p298 p30- p31all p328 p33- p34- p35= 11 11 11 11 11 11 11 11 .000000000003 .000000000001 .000000000001 .000000000000 .000000000000 .000000000000 .000000000000 .000000000000 -4.6498264506008D-12 -1.8709902983900D-12 -7.53617046123OQD-13 -3.0360953346465D-13 -1.2186073771329D-13 -4.8424972315608D-14 -1.9584490177560D-14 -6.8926651173973D-15 Total number of iterations is 35. Example 2. Type 3 matrix. As mentioned above, eigenvalues of this type of matrix form two clouds. We examine a 99 by 99 matrix of this type with a = 100.0 and b = 44.0 (see Section 4.3.1). The two clouds reside on intervals [43.928732599741, 43.999929526104] and [100.000000000000,100.071267400259] respectively. Each cloud of size m may be considered approximately as a root of multiplicity m when viewed from distance. .40256546571541 .40237852278298 .40279046169913 .40286977985244 .40137322541445 .39737960908740 .40442955857401 .35194508791937 00000000 Therefore the quasi-Laguerre’s iteration with multiplicity index 1 would converge linearly to the nearest eigenvalue of the cloud. The following numerical result is extracted from the result of Quasi-Laguerre algorithm applied to type 3 matrix of size 99x99. Multiplicity index 1 is used through out the entire computation. The slow linear convergence is quite obvious. Numerical Result of Quasi-Laguerre with Mul-index 1 starting points P0= 101.015872629335 p= 100.841454915614 new points p2 p3 p4 p5 p6 p7 100.743437006649 100.644852081645 100.563815029678 100.492668650879 100.431544760492 100.378647685865 marching distance -9.8017908965762D-02 -9.8584925003505D-02 -8.1037051967655D-02 -7.1146378798231D-O2 -6.1123890387780D-O2 -5.2897074626041D-O2 48 marching ratio 0.56197221529112 1.0057848207917 0.82200247111588 0.87794875394317 0.85912862214851 0.86540752380866 98* P9 = p108 p11= p12= p13= p14= p15= p16= p17= p18= p19= p20= p21= p22= p23= p24= p25= p26= p27= p288 p29= p30= p31= p32= p33= p342 p35= p36= 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. .075466390876 100 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 332982237172 293536689733 259485766377 230106703430 204781231369 182976233372 164233166158 148157185727 134408159176 122692599785 112756519967 104379002023 097366441507 091547492042 086768828702 082891873555 079790564254 077350095807 074045946273 073005718055 072272825651 071784008039 071484884863 071328989422 071273971357 071267449120 071267400260 071267400259 .5665448693327D-02 .94455474394590-02 .4050923355454D-02 .9379062947075D-02 .5325472061190D-02 .1804997997236D-02 .8743067214062D-02 .6075980431046D-02 .3749026550949D-02 .1715559390238D-02 .93607981874230-03 .3775179436346D-03 .0125605164542D-03 .8189494646869D-03 .77866333991680-03 .87695514658610-03 .1013093014782D-03 .4404684469887D-03 .88370493067220-03 .42044460324560-03 .0402282177712D-03 .3289240457534D-04 .8881761131248D-04 .99123176522530—04 .5589544052832D-04 .5018065412373D-05 .52223649513870-06 .8860813617266D-08 .9956651921029D-13 mNOOO000000000000000000000000 .86328873602484 .86379415002273 .86323870666811 .86279783488954 .86202450046865 .86099078210872 .85957665377625 .85770275736860 .85525275487378 .85210100852045 .84810972210352 .84314116799185 .83706899389962 .82978955419113 .82122440982119 .81130535273354 .79993427424847 .78691552816919 .77186203042146 .75406958919980 .73232579108987 .70454962868210 .66697049698000 .61193207773218 .52117472922255 .35291644980713 .11854717984453 .4914201062295D-03 .1776476818449D-06 However, the super-linear convergence can be obtained in these cases if a correct 49 multiplicity index is used. Therefore, estimating the multiplicity (i.e. the cloud or cluster size) is essential to the cluster and cloud handler. In our new algorithm, we used the following formula (see (3.9) in Chapter 3) to estimate the cluster size or cloud size of each root as the iteration evolves, k k k k . (1nt ”(Al [mini - $531)] mul = mt (k) (k+l) . (4.13) Q(xm:k) _ Q($mi ) is not close to A5. There are two features con- (m) Overestimation may occur if x, cerning over-estimating of mul, one being favorable to the convergence if it does not result in jumping because the bigger the mul used in the formula the farther the marching distance of the iteration. The other feature will hurt the convergence if it jumps over the eigenvalue. In the second case, the iteration has to back up, namely, reduce the value of mul and recompute the next iteration point. We used a sim- ple, yet efficient, back—up scheme by reducing mul to be the number of eigenvalues jumped which can be easily detected as a result of Sturm’s sequence. Considering the fact that the jumping resulted from overestimating may produce one initial point for approximating the next eigenvalue, so an overestimate of the number mul causes es- sentially no harm because our algorithm can dynamically reduce mul (see Figure 4.4), while an underestimate may result in slow convergence. The following are the numerical results of the new quasi-Laguerre’s method with cloud and cluster handler applied to Examples 1 and 2 above. A substantial speedup in each case is observed. mul is the multiplicity index used in the quasi-Laguerre’s function, mlt is the estimated cloud size or cluster size and jump is the number of eigenvalues jumped by the current iteration. Example 1’ Numerical Results of Quasi-Laguerre + Cluster Handler starting points p0 = 11.250000000000 p1 = 11.137888560412 new points marching distance marching ratio mul p2 = 11.057728240655 -8.0160319756418D-02 0.71500571262537 1 50 0.42409834286890 0.41686564748620 0.67428930324464 5.0719350393388D-04 p3 = 11.023732381883 -3.3995858772538D-02 p4 = 11.009560676203 -1.4171705679063D-02 p5 = 11.000004846655 -9.5558295481237D-03 p6 = 11.000000000001 -4.8466546715077D-06 p7 = 11.000000000000 -7.0335823226288D-13 Example 2’ Numerical Results starting points of Quasi-Laguerre + Cloud Handler 1.4512241534305D-07 p0 = 101.015872629335 p1 = 100.841454915614 new points marching distance marching ratio mul mlt 100.743437006649 -9.8017908965762D-02 0.56197221529112 1 51 100.644852081645 -9.8584925003505D-02 1.0057848207917 51 51 100.029961843314 -O.61489023833137 6.2371629162320 51 51 100.194192778959 -0.45065930268662 4.5712800681300 27 27 100.097946953508 -9.6245825450974D-02 0.21356671187569 27 27 100.065547713253 -3.2399240254871D-02 0.33663008346658 27 27 100.083910561793 -1.4036391714786D-02 0.14583896651118 9 9 100.073214269531 -1.0696292261722D-02 0.76204002275420 9 9 100.070280416514 -2.9338530174209D-03 0.27428691602977 9 9 100.072038260612 -1.1760089194859D-03 0.10994547369413 3 3 100.071242608246 -7.9565236563468D-04 0.67657000933504 3 3 100.071686253674 -3.5200693760571D-04 0.29932335696874 1 1 100.071422241996 -2.6401167823509D-04 0.75001839461135 1 1 100.071303605790 -1.1863620589650D-04 0.44935969003182 1 1 100.071269600688 -3.4005101598554D-05 0.28663342140443 1 1 100.071267404003 -2.1966854942081D-06 6.4598704045676D-O2 1 1 100.071267400259 -3.7435236860743D-09 1.7041691657475D-03 done 00000 Our cloud and cluster handler is more dynamic and more effective than the linear acceleration method used in [19] and [8]. A numerical result is reported in [8] that 51 showed more than 11 iterations are required to compute the 23rd eigenvalue of the 99 by 99 Wilkinson matrix. The starting points they used are closer to the target than the starting points 11250000000000 and 11.137888560412 used in our test. Our algorithm needs 7 iterations only. The following numerical result from the linear acceleration in [8] again shows the advantage of our cluster and cloud handler. Example 2” Numerical Results of Quasi-Laguerre + Linear Acceleration starting points p0 = 102.000000000000 p = 101.961967587691 new points marching distance marching ratio activity 101.419535791275 -0.19431401692363 1.7915217080245 EVFP 101.174041555357 -O.43980825284116 Sturm 101.174041555357 -O.10914874934945 .55818469601614 EVFP 100.926995451781 -O.24704610357598 Sturm 100.926995451781 -9.86916862735660-02 .55818469601614 EVFP 100.703617752133 -O.22337769964884 Sturm 100.703617752133 -6.7609196631664D-02 .55818469601614 EVFP 100.550591824733 -0.15302592740005 Sturm 100.550591824733 -5.4640879681183D-02 .55818469601614 EVFP 100.426918246229 -O.12367357850324 Sturm 100.426918246229 -4.0113011667144D-O2 .55818469601614 EVFP 100.336126887701 -9.07913585279660-02 Sturm 100.336126887701 -3.1160591475516D-02 .55818469601614 EVFP 100.265598340604 -7.0528547097709D-O2 Sturm 100.265598340604 -2.3378215187691D-02 .55818469601614 EVFP 100.212684340167 -5.2914000436132D-02 Sturm 100.212684340167 -1.7833578400225D-02 .55818469601614 EVFP 100.172320010912 -4.0364329255723D-02 Sturm 52 100.172320010912 -1.3373261607786D-02 0.55818469601614 EVFP 100.142051114639 -3.0268896272261D-02 Sturm 100.142051114639 -1.0001205562557D-02 0.55818469601614 EVFP 100.119414493698 -2.2636620941768D-02 Sturm 100.119414493698 -7.3347879597213D-03 0.55818469601614 EVFP 100.102813013637 -1.6601480060970D-02 Sturm 100.102813013637 -5.2628633859943D-03 0.55818469601614 EVFP 100.090901105338 -1.1911908298643D-02 Sturm 100.090901105338 -3.6378825546402D-03 0.55818469601614 EVFP 100.082667161150 -8.2339441885324D-03 Sturm 100.082667161150 -2.3913400151639D-03 0.55818469601614 EVFP 100.077254627918 -5.4125332318762D-03 Sturm 100.077254627918 -1.4615792679392D-03 0.55818469601614 EVFP 100.073946505146 -3.3081227715712D-03 Sturm 100.073946505146 -7.9939460451005D-04 0.55818469601614 EVFP 100.072137164009 -1.8093411371325D-03 Sturm 100.072137164009 -3.5153908158226D-04 0.55818469601614 EVFP 100.071341494238 -7.9566977062484D-04 Sturm 100.071341494238 -5.6772584691737D-05 0.55818469601614 EVFP 100.071212995782 -1.28498456660960-04 Sturm 100.071214206718 -1.2728752032842D-04 Sturm 100.071225469893 -1.1602434544500D-04 Sturm 100.071253032066 -8.8462172612935D-05 Sturm 100.071284721654 -5.6772584684950D-05 Sturm 100.071267733919 -1.6987735058407D-05 3.3419749301337 EVFP 100.071267400306 -3.3361304770473D-07 50.920475608743 EVFP 100.071267400259 -4.6343259817599D-11 7198.7393424155 EVFP 100.071267400259 -4.6343259817599D-11 1.3891321138799D-04 done Note that even though Example 2’ and Example 2” have different starting points, the first clearly showed faster convergence than the second one. The second took 40 53 iterations after 19 = 100.703617752133, while the first took only 17 iterations after p = 100.841454915614 that is farther from the root /\ = 100.071267400259 than 100.703617752133 is. 4.1.5 Partial spectrum In some applications, only a partial spectrum may be needed. Our algorithm, like bisection method, inherits the features of the split-merge Laguerre’s method [19] for finding partial spectrum specified by orders or by intervals. From the strong interlacing property given in Theorem 4.1.1, one can easily obtain the following: Proposition 4.1.4 If [a,b] contains 1: eigenvalues of T, then [a,b] contains at least k — 1 and at most I: + 1 eigenvalues of T. More precisely, let m(m) be the number of A 3+]. be all the eigenvalues eigenvalues of T which are less than a: E R and :\,+1, - - - , ofT in [a,b]. Thens—l Sn(a) Ss ands+k—l§n(b) Ss+k. To find eigenvalues of T in a given interval [a,b], the eigenvalues of T in [a,b] are found first, say 5x3“, - - - , :\,+k. By evaluating m(a) and m(b), the actual number of eigenvalues of T in [a,b] is a = m(b) — 5(a). Hence /\,,(a)+1,- - - ,AKMH are the eigenvalues of T in [a,b]. By Proposition 4.1.4, 5 — 1 _<_ m(a) S s and s + k — 1 S m(b) S s + It, so, a can either be It — 1, k, or k + 1. Thus, at most k + 2 values are needed to be considered as the first starting points to evaluate these a eigenvalues of T. Let A A A.9 : 0, )‘s+l = A3+1) Ass-+2 = As+2s ' ' ' a )‘s+k = ’\s+ka As+lc+l = b Then a values among them can serve as the first staring points which will lead to all a eigenvalues of T in [a, b]. To find eigenvalues from the i“ to the jth eigenvalues, we use bisection and Sturm sequence to find a, and b such that the interval [a,b] contains all the eigenvalues of interest. Then the above method is used in the split-merge process to find all these eigenvalues. 54 The capability of finding partial spectrum has direct application in parallel com- puting. We will use this feature in the parallel implementation of the algorithm (see Chapter 5). 4.1.6 Stopping criteria The following stopping criterion was suggested by Kahan [16]: |x("+1)— $00]? S (lec) _ $(k-1)| _ |$(k+l) _ 1,00”, (4.14) where 7' is the error tolerance, see Section 3.2. This criteria is used in [8]. In addition, other stopping criteria are used in our code. We use the trivial stopping criteria, Isak“ - xkl < 7', or lf’fio )l < 7'. Note that lh’; ) is the Newton iteration step size. We also estimate the magnitude of the distance the next quasi- Laguerre’s iteration can march. (3.15) derived in Section 3.2 is used for this estimate. 4.2 Description of the new algorithm 4.2.1 The global Newton’s formula Assume that there is no root of f (x) lying between $0 and 3:1, and the logarithmic derivative of ff / f1 at 1:1 is known, then the following formula can be used to obtain a second point that does not cross any root of f, 1 11:2 = 3:1— —,——_—. (4.15) 7%.]...— This formula is called the global Newton’s iteration formula [9]. In [21] this formula is generalized to treat multiple roots and more properties of this method is described. 4.2.2 Initial points for the quasi-Laguerre iteration At every stage of the merging process, we evaluate the eigenvalues A1 < ' - - < Am of an m x m sub-matrix, given m initial values :\1 S - . - 3 Am (obtained from the previous 55 merging process) and an upper bound 3...,“ that interlace those m eigenvalues: W1SA1$A2SA2$”'SIA\mSAmSS\m+1 (see Theorem 4.1.1). To evaluate an eigenvalue )1.- by the quasi-Laguerre iteration, two initial points, say 3(0) and x“), are required on the same side of A,- without any other eigenvalues lying between them and x“) is chosen to be closer to A.- than 13(0). For the 2"" eigenvalue, we start with finding the mid-point p = ’\2—+2£"—+-1- and computing '73)) and m(p). Then use the global Newton’s method or Newton’s method plus bisection adjustment to determine the next point. Several improvements are made in our practical implementation. From our com— puting experience, the high order of convergence of the quasi-Laguerre iteration occurs only when no critical point of f (i.e. zero of f’) lies between cell) and Ag. In other words, if mm is to the left (resp. right) of A5, then it is desirable that —f’(x(1))/f(:c(1)) is positive (resp. negative), see Figure 2.5. If there is one or more critical point in [X3 11.4.1], then bisection or one step Newton’s iteration is used repeatedly until the above requirement at 2:“) is satisfied. Also, if the midpoint seems to be too far from the target, then one of A; and ’1,“ might be the eigenvalue and should be tested for quick exit. Our algorithm is summarized in the algorithm INIPTS in Figure 4.3. 4.2.3 Quasi-Laguerre iteration with cluster and cloud han- dler After two initial points are found, the quasi-Laguerre’s iteration (4.10) is used with mul = 1 to begin with. In the process, the algorithm checks whether slow convergence is encountered by checking the ratio fit' If this ratio is between 0.1 and 1.0, (4.13) is used to estimate the multiplicity mul of the root. The estimated mul value is used in (4.10) to find a new point. However, an overestimated mul used in (4.10) may result in a new point that jumps over the target eigenvalue. If this happens, the iteration must back up and the value of mul must be reduced to compute a new point. Hence, the previous two points should be saved for possible future backup before using mul > 1 in (4.10) to compute a new point. Overshooting is detected by 56 Algorithm INIPTS Input: subscript i, initial end points :\,-,5\,-+1. Let A0 = X1. Local variables: pOOk, plok, p1, fp1 Outputzstarting points p0,p and K.(p0),n(p),fp0 = —%,fp = —f7%§)l. Begin INIPTS (a,b) = (Ila/1m); (n) p = 9%; (#1) Evaluate fp= —ff’(;’) , m(p) by DETEVL; If m(p) = i — 1, then if p00k and fp > 0, then go to (it). else p00k = .true.,p0 = p, pr = fp; if fp > O and p— /\,-_1 > 2tol, use G-Newton and goto (it) . else a =p, go to (#3); endif endif Else if m(p) = i‘, then b = p; if P101: and fp <0, then p0=p1,fp0=fp1, go to (it). else Plok = .true.,p1 = p, fpl = fp; if fp > O and p00k = .false., then evaluate fa = —£f'—((§)l and 5(a), set p00k=.true.; if m(a) =i, then A,- = a, goto (it) and exit. else if fa < 0 then goto (#11); else then p = min {a+fa,9%3}; goto (#1); endif endif else goto (##); endif endif Else if m(p) > i, then b=p, goto (#11); else then a =p, goto (mt); Endif (#End INIPTS Figure 0.1: Algorithm INIPTS 57 the Sturm sequence obtained in the evaluation of —%ff)l. By comparing m(xk) and m(mk_1), the algorithm detects whether a jump occurs. If jmp eigenvalues are jumped over, the algorithm reduces the mul value to min(mul — 1, jmp), the minimum of mul -l and jmp, and recompute a new point. The quasi—Laguerre’s method with the above feature is called the quasi-Laguerre’s method with cluster and cloud handler. The algorithm is illustrated in Figure 4.4. 4.2.4 Stopping test Stopping test is done in various places in the algorithm. Figure 4.5 shows when and where to check the stopping criteria. 4.3 Numerical tests Our algorithm is implemented and tested on SPARC stations and DEC Alpha stations with IEEE floating point standard. The machine precision is e z 2.2 X 10"”. 4.3.1 Testing matrices There are 12 types of matrices used for testing our algorithm. In the following description of these matrix types, 0,, i = 1,. - . ,n, denote the diagonal entries and ,85, i = 1, ~ - - , n — 1, are the sub-diagonal entries. Matrices with known eigenvalues Type 1. Toeplitz matrices [b, a, b]. Exact eigenvalues: {a + 2b cos fillsksn ([12], Example 7.4, p137). Type 2. a1 =a—b,a,-=a for i=2,~-,n-—1,an=a+b. fi,=b,j=1,~~,n—1. k— Exact eigenvalues: {a + 2b cos QTTSMlISkSn ([12], Example 7.6, p138). 58 Algorithm Q-LAG Input: p0, pr, rs:(p0), p, dltO =p—p0, no roots between p0 and p. Local variables: rat = iu—luused to test convergence rate, dltO mul--estimated multiplicity used to speed up the convergence, oldmul--multiplicity used in the last iteration, jmp = m(p) — n(p0)--used to adjust mul and upmul, upmul--dynamically adjusted upper bound to control mul. Uutput: p1, dlt1= p1 — p. Begin Q-LAG mul: 1, upmulzn—l; (it) Evaluate fp= —%(§)l, m(p) by DETEVL; STOP-CHECK1; imp = ”(19) — MPG): If |jmp| 79 0, then back up p0» fp0. p. fp; mul = ma:1:(min(|jmp|,oldmul — l), 1) , upmul = mul; Endif p1 = qlag(n,mul,dlt0,p,n,fp0,fp) by formula (2. 10); dltl = p1 — p; oldmul = mul. (for backup) If mulyé 1, store p0, pr, p, fp for possible future backup; and rat = dltO ' if (0.1 < rat < 1.0), Then estimate mul by formula (3.9); mul = min(mul, upmul); mul = maa:(mul,1); endif dltO = dltl, p = p1, p0 = p, pr = fp; STOP-CHECK2; Goto (1:); End Q-LAG Figure 4.4: Algorithm Q-LAG WITH CLUSTER AND CLOUD HANDLER 59 Algorithm STOP-CHECK Inputs: dltO, p0, pr, p, fp, tol. Outputs: EigenFound or ContinueIter. After Evaluating fp, compute q==1/fp; Begin STOP-CHECKI if (Iql < tol), then EigenFound, eig = p + q. else ContinueIter; endif End STOP-CHECKI After Q-Lag iteration, new point pl is obtained; dlt=p1—p, fprat = €12,703 Begin STOP-CHECKZ if dltl < tol, then EigenFound, eig 2 p1. else if fprat2 =1: dlt < tol, then EigenFound, eig 2 p1. else ContinueIter; endif End STOP-CHECKZ Figure 4.5: Algorithm STOP a for odd i . Type 3. a,- = , fl,- = 1. Exact eigenvalues : b for even i {a+b:l:\/(a—b)2+16coszgk-4’f—l 2 } (add {a} when n is odd ) lSksn/2 ([12], Example 7.8 and 7.9, p139). Type 4. a; = 0, fl,- = ‘/i(n—i). Exact eigenvalues: {—n + 2k — ”19:3,, ([12], Example 7.10, p140). Type 5. a; = —[(2i — 1)(n -— 1) — 2(i — 1)2], ,8,- : i(n — i). Exact eigenvalues: {—k(k — 1)}15ksn ([12], Example 7.11, p141). 60 Wilkinson and random matrices Type 6. Wilkinson matrices W: . fl,- = l, n/2—i+1 forevennandlSifin/2 _ i—n/2 forevennand n/2