r551;

A:

.q: -«
"w 4 A m
1.2-» w.
. ..:..«.4....<.
w" v "am.
w...- u...
.
”-1 <
mm.
nun-a»

,
v

ma-

n 11

..
4

we -
‘iﬁ555,py

.3... ‘- .
raft-anwﬁ‘
.‘ , L §

but. '

H

mm“ :g m

“" 1:: ”.1
21“}? at: ”i“:-

 

at

15

- V
.‘ot.¢.

.

-.'.
£19
.4” _

Tr?

'..-4~ a
4

~ I.
".4.

x you.

A. :y...~
. .., «an... - .
"w... 44 A

4.1.»
n-
rl

vol

r
w

vv-.r.
1.."

~
“a..- J -.
~.A..,..-

"-

4-. ‘9..-
iwn.
a .
“‘JRO

- .04 mum. ,

. '11‘Hﬁ “4'3-..“m
. ....<’,.... _ v...” 4
4:1 MN. an 1

.;
W.

‘1 1
.y-.. g ,
w,
.w :i‘ﬁwﬁp’
.
:‘1k 12

"" -
,-
m...”

“4 1,

hr"

...,
“w.

-c
l
.m

"I?
'" " 29:.

3'5‘

.

mi‘ 1

a

an}: _ . ‘ . _

£17»... > 7n” s l .

‘gqll‘i' . ‘33”, -,

39' l . >- I " tasty: ‘
‘ .

‘O' '
h, ‘ I 1

“S. 1 : !

)h""-‘?\i§zﬁ:
. t
‘

that ‘31};
h

245v}:

’1} ‘ . Vl-t r > n
13 43?.- .1'331253: i
1 " i . 2
11!" ‘ )th .:

wk". 1
if? :11“; 7‘35: :5; .3?!“ V
.11.! ‘ ‘

r\?'§‘?.54

mix I :

"‘ii!3' 2331‘.‘,
,.

3
1
- VJ;

 

("83124.3

 

 

 

 

3 1293 01421 768

“fig lllililllllllllxiu

BRAKES

11111 ,

This is to certify that the

dissertation entitled

Quasi-Laguerre's Method and Its Parallel Implementation
on Solving Symmetric Tridiagonal
Eigenvalue Problems

presented by

Xiulin Zou

has been accepted towards fulfillment

of the requirements for

Ph-Ds degree in ApplieLMaihematics

. / \ ’ ,4
~//;;?i:::~. " //C«Q_‘v/ /L;
(r I

 

Date 12/12/95

 

MS U is an Affirmative Action/Equal Opportunity Institution

Major professor

 

 

LIERARY
Michigan State
University

 

 

 

PLACE IN RETURN BOX to remove this checkout from your "cord.
TO AVOID FINES mum on or baton date duo.

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

QUASI-LAGUERRE’S METHOD
AND ITS PARALLEL IMPLEMENTATION ON
SOLVING SYMMETRIC TRIDIAGONAL EIGENVALUE PROBLEMS

By

Xiulin Zou

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

Department of Mathematics

1995

ABSTRACT

QUASI-LAGUERRE’S METHOD
AND ITS PARALLEL IMPLEMENTATION ON
SOLVING SYMMETRIC TRIDIAGONAL EIGENVALUE PROBLEMS

By

Xiulin Zou

First of all, a quasi-Laguerre’s iterative method is derived for solving polyno-
mial equations. Unlike the well-known Laquerre’s method that requires evaluating
the second derivative, the quasi-Laguerre’s method only involves the evaluation of
the function and its ﬁrst derivative and still maintains the monotonical convergence
property of the Laguerre’s method for solving polynomial equations with only real
roots. Two different approaches to derive the quasi-Laguerre’s method are given and
each approach reveals some different features of the method. It is also proven that
the order of convergence of the quasi-Laguerre’s method is l + x/i

Secondly, a new algorithm using split-merge and the quasi-Laguerre’s method with
cloud and cluster handler is developed to solve symmetric tridiagonal eigenproblems.
The cloud and cluster handler utilizes the multiplicity estimation method developed
ealier in this work. Numerical results show that the new algorithm is very competitive
in both speed and accuracy.

Finally a parallel version of the new algorithm is designed and implemented. Nu-

merical results on a substantial variety of matrices show our algorithm is the best one

among the existing parallel algorithms for symmetric tridiagonal eigenvalue problems.

TO MY PARENTS

For their 70th birthday

Contents

1 Introduction 1
2 Quasi-Laguerre’s method 4
2.1 Laguerre method ............................. 4
2.2 Quasi-Laguerre method .......................... 5
2.3 From an optimization point of view ................... 21
2.4 For polynomials with complex roots ................... 27
2.5 Convergence order ............................ 27

3 Estimate multiplicity 33
3.1 Determining multiplicity of a root .................... 33
3.2 A new stopping criteria .......................... 38

4 Application to symmetric tridiagonal eigenproblem 40
4.1 Introduction ................................ 40
4.1.1 Evaluation of the logarithmic derivative f’ / f of the determinant 40

4.1.2 The split-merge process ..................... 42

4.1.3 Deﬂation .............................. 45

4.1.4 Cluster and cloud handler .................... 45

4.1.5 Partial spectrum ......................... 54

4.1.6 Stopping criteria ......................... 55

4.2 Description of the new algorithm .................... 55
4.2.1 The global Newton’s formula ................... 55

4.3

4.2.2 Initial points for the quasi-Laguerre iteration ..........

4.2.3 Quasi-Laguerre iteration with cluster and cloud handler . . . .

4.2.4 Stopping test ...........................
Numerical tests ..............................
4.3.1 Testing matrices .........................

4.3.2 Speed test in evaluating eigenvalues without computing eigen-
vectors ...............................

4.3.3 Accuracy test ...........................

Parallel computation of eigenvalues

5.1
5.2
5.3
5.4
5.5

5.6
5.7

Introduction ................................
Issues for parallel algorithm design ...................
Determining performance .........................
Existed parallel algorithms ........................
The parallel quasi-Laguerre’s method ..................
5.5.1 Load balancing ..........................
5.5.2 The pseudo-code .........................
Performance test .............................

Comparison with parallel bisection and sequential root free QR

vi

55
56
58
58

61
62

69
69
70
72
73
75
75
80
81
83

List of Tables

4.1 Accuracy comparison between the new version and the old version of
the quasi-Laguerre’s algorithm. The numbers in the table represents
the max-error of Icomputedez'gs — trueeigsl/(lnorm), as multiples of

machine precision ............................. 68

5.1 Performance test result on DEC ALPHA workstations ........ 83

vii

List of Figures

2.1
2.2
2.3
2.4
2.5

2.6

4.1
4.2
4.3
4.4
4.5
4.6

4.7

4.8

Distribution of the roots of the polynomial in case 1 ..........
Distribution of the roots of the polynomial in case 2 ..........
Distribution of the roots of the polynomial in case 3 ..........
Movement of quasi-Laguerre iteration .................

fl “Ik—
root ....................................

sign of qk = ﬁg} and qk_1 = LAB—'71)} as the sequence approaches the

Labeling the roots clockwise .......................

Split and merge processes ........................
Cluster and cloud of eigenvalues .....................

inipts ....................................

qlag ....................................
qlag ....................................

Comparison of execution time on Dec Alpha between NQL-the new
version of quasi-Laguerre Iteration and OQL—the old version of quasi-
Laguerre’s method, for ﬁnding all eigenvalues without computing eigen-
vectors. ..................................
Execution time on Dec Alpha for ﬁnding all eigenvalues without com-
puting eigenvectors. B/ M: DSTEBZ; NQL: the new quasi-Laguerre
Iteration; RQR: Root-free—QR—DSTERF .................
Execution time(continued) on Dec Alpha for ﬁnding all eigenvalues
without computing eigenvectors. B / M: DSTEBZ; NQL: the new quasi-
Laguerre Iteration; RQR: Root—free-QR—DSTERF. ..........

viii

14
14
15
17

22

45
46
57
59
60

63

64

4.9

5.1

5.2

5.3

5.4

5.5

5.6

5.7

Error(on the scale of machine double precision) , on Dec Alpha, for
ﬁnding all eigenvalues without computing eigenvectors. B / M: DSTEBZ;
NQL: the new quasi-Laguerre Iteration; RQR: root free QR—DSTERF.

Before load balancing in shared environment, other CPU intensive jobs
are running on PE # 2 also. Left: Execution time of each Alpha
workstation. Right: Number of eigenvalues computed by each Alpha
workstation. Matrix size 5000, type 4 ..................
Before load balancing in Heterogeneous environment, PE ## 1—5 are
Alpha workstations, PE ## 6—7 are SUN SparclOs. Left: Execution
time of each workstation. Right: Number of eigenvalues computed by
each workstation. Matrix size 5000, type 4 ...............
After load balancing in shared environment, other jobs are running
on PE # 2 also. Left: Execution time of each workstation. Right:
Number of eigenvalues computed by each Alpha workstation. Matrix
size 5000, type 4 .............................
After load balancing in Heterogeneous environment, PE ## 1-5 are
Alpha workstations, PE ## 6-7 are SUN SparclOs. Left: Execution
time of each workstation. Right: Number of eigenvalues computed by
each workstation. Matrix size 5000, type 4 ...............
Two conﬁguration of the PVM machines. Top: formed by Dec Al-
pha workstations. Bottom: formed by Dec Alpha and Sun SparclO
workstations ................................
Comparison between bisection — B / M, quasi-Laguerre — NQL, and root
free QR — RFQR .............................
Comparison(continued) between bisection — B/ M, quasi-Laguerre —

NQL, and root free QR — RFQR, on a random matrix of order 5000 .

ix

67

77

77

80

84

86

87

Acknowledgment First of all, I am indebted to my advisor, Prof. Tien-Yien Li,
for his support and guidance during my study at Michigan State University. Secondly,
I owe my wife, Ying Zhou, for her support and sacriﬁce as I was so intensively engaged
in my research and teaching that she had to take full charge of all the domestic matters
while still taking four courses of her own. I would like to thank all the Professors
from whom I have gained more insight, more inspiration, or more knowledge about
Mathematics and its application. I am very appreciative to Prof. Enbody from
the Computer Science Department for giving me the privilege to take several of his
courses to gain my knowledge with computer science and parallel computing and for
his sponsoring a computer account in the Computer Science Department for me. I
also want to thank the Advanced Computer System Lab at Michigan State University
for providing the computing facilities for my research. I thank Mr. Min Jin for some
valuable discusion, Ms. Wenjiang Qiao, Mr. Paul Gray, etc., for some PVM technical
consulting and information exchanging. Last but not least, I thank all the committee
members, Prof. Chiu, Prof. Dunninger, Prof. Frazier and Prof. Zhou, for reading

my thesis and providing valuable suggestions.

Chapter 1

Introduction

Deﬁne

n

’%f)li\/En—l)[(n—1)(p'x)2)—nﬂ—l

P(x) P(1=)

 

[11(33): 33 -

and

13(10): L+($) if|L+(10)- xl S IL-(ft) - SCI, (12)

L_(:1:) otherwise.

Then Laguerre’s iteration n+1 = L($k), applied to a polynomial that has only real
roots, converges monotonically and cubically to a root of the polynomial, starting
from any initial point [16]. This method has been successfully used to ﬁnd all the
eigenvalues of symmetric tridiagonal matrices [19] with a great speed up. However,
this method requires the evaluation of the second derivative, which is normally very
expensive. A method that maintains the advantages of Laguerre’s iteration and in-
volves only ﬁrst derivative is sought in this work.
The problem is formulated as follows. Let

f($) = kI-IICB-ra) (1-3)
be a polynomial of degree n that has only real roots. Assume that the logarithmic
derivatives $113)) 2 qo and 1&5}: ql are known and no root of f (:5) lies between

3:0 and 2:1. Find, based on 2:0,qo,a:1, and q1, an iterative method that converges

monotonically to a root of f (2:), starting from 3:0 and 2:1.

First of all, our iteration formula for this purpose is derived for polynomials of

form f(:r) = k(a: -— r)'"(:r — z)”‘m by solving 7' in the following system,

 

 

 

 

 

 

 

m n — m
: qu (1'4)

3:0 — 1' :co — z

m n — m

:r1— 1' x1 — z

The formula for r is
r _ $o+$1+ mn— [(n+m)Aq+QOQIA$]A4£ (16)
2 —mu‘ﬁrmi \/—m(n-m)(90qi +71%) + [90611 +73%: 29:2

where Aq = q1 — go, and Arc = 2:1 — are.
We call (1.6) the quasi-Laguerre’s iteration formula.
We proceed to derive the formula for a general polynomial of form (1.3) and begin

with the following two equations,

 

 

 

 

 

" 1
Exo—T' —(]0, (1.7)
jzl .7
n 1
Z x! _ r = C11 (1-8)
j=1 J
Let
6:1: 2 mn — [(n + m)Aq + qoqlAm]% (1.9)
472M 2c \/—m(n — quoq, + neg)+[qoq1+ n4: M
and
5 if 6 < 6-,
6(sco,:r1,q0,q1,m) = + . I +| —| I (1.10)
d-1f|6_l< |6+|.
Deﬁne
1' +1:
QL(xo,x1,qo,q1,m) = 0 l +6(xo,:v1,qo,q1,m). (1.11)

2
Then, the iteration 33;,“ = QL(mk_1, :rk, qk_1, qt, 1), starting from 1:0 and 1:1, converges
monotonically. It is shown that the limit of the iteration is a root of the polynomial.
We call this iterative method the quasi-Laguerre’s method. The integer m in the
formula is called the multiplicity index of the quasi-Laguerre’s formula. It is shown
that if the multiplicity index of the formula matches the multiplicity of the root, the

order of convergence of the iteration is \/2 + l.

2

While quasi—Laguerre’s iterative method is best suitable for solving polynomial
equation with only real roots, it can also be used to solve polynomial that may have
complex roots. Though the global convergence property is not guaranteed, we have
shown that the order of convergence is still \/2 + 1 provided that the multiplicity
index matches the multiplicity of the root.

In solving polynomials that have only real roots, the two Laguerre’s iteration se—
quences mt“ = 1:: + L+(:rk) and 2;“ = 2:; + L_(:ck) both converge to a root of
the polynomial, starting from any initial point 330. That is, the Laguerre’s iteration
sequence generated by the same sign in formula (1.1) converges to a root of the poly-
nomial, no matter where the iteration starts. However, this feature is not inherited
by the quasi-Laguerre’s function. A quasi-Laguerre’s iteration sequence generated by
mt“ = $1213: + 6+, with m = 1, marches toward the right hand side of the initial
interval [$0,313]. If there is no root of the polynomial in this direction, the iteration
will wrap around and produce a point on the left hand side of the left most root
of the polynomial. When this happens, the iteration can not proceed any more be-
cause there are roots of the polynomial between the two iteration points now. Similar
situation could occur for quasi-Laguerre’s iteration sequence generated using the —
sign.

A new algorithm with cloud and cluster handler is designed in Chapter 4, using
Split-Merge and quasi-Laguerre’s method to solve the symmetric tridiagonal eigen-
problems. The new algorithm differs from the one given in [15].

Various numerical results for a substantial variety of matrices presented in Sec-
tion 4.3 indicate that our algorithm is strongly competitive in both accuracy and
speed. A parallel version of our algorithm is implemented on a cluster of worksta-
tions using PVM (Parallel Virtual Machine) and numerical results show that our

algorithm leads all the existing algorithms on distributed memory parallel platforms.

Chapter 2

Quasi-Laguerre’s method

2.1 Laguerre method

Let p(a:) be a polynomial of degree n, with all its roots being real. By

12
Mi: (n—l) (n-1)(ﬂﬂ)2—npnx
p(~’v) 10(1‘) p(~'v)

the Laguerre’s iteration is deﬁned as

 

[4:]:(112) = (C —

 

11::+1 = Li(:ck).

(2.1)

Let $k+l be one of {:r;+1, 55:11} which is closer to 13),. Then the iterates {wk} converges

at least linearly from any guess 11:0 close to a zero 2 of p(a:), and converges cubically

when 2 is a single root.

In general, Laguerre’s iteration is deﬁned via

 

 

(2.2)

This function is called the Laguerre iteration function with multiplicity index m. Let

an,“ be one of the {Lm_(:rk), Lm+(a:k)} that is closer to xk. Then the iterates {1:1,}

converges at least linearly from any guess 9:0 close to a zero of p(:z:) with multiplicity

greater than or equal to m, and cubically if 1:0 is close to a root that has multiplicity

exactly equal to m.

Kahan [16] derived this general Laguerre function in the Riemann sphere. In the
real ﬁeld, Kahan [16] used the non-overshooting strategy to derive the Laguerre’s
formula, which leads to the monotonical convergence of the method for m = 1. For

general m < n, the method never overshoot more than m — 1 roots of the polynomial.

2.2 Quasi-Laguerre method

Laguerre’s iteration is an excellent method for ﬁnding the roots of polynomials with
only real roots because of its global and cubic convergence. It has been successfully
applied to solving symmetric tridiagonal eigenvalue problems by Li and Zeng [19].
However Laguerre’s iteration involves evaluations of the polynomial itself, its ﬁrst
derivative, and its second derivative. In [7], the so—called Quasi—Laguerre’s iteration
was established which maintains the global and monotonic convergence of the La—
guerre’s iteration without evaluating the second derivative. We shall present in the
following a different approach to derive the iteration which were obtained indepen-

dently from, and almost simultaneously as, the work in [7].

The quasi-Laguerre’s formula derived from a special polynomial

First of all, we assume the polynomial is of form p(a:) = a(:r — z)m(a: — t)"‘"‘.
Suppose {xo,p(mo),p’(a:0)} and {x1,p(:r1),p’(:rl)} are known. Let qo = ﬂ—ol and

_ '1‘1
q, — p(rn)' Then,
m n — m

 

“ (10- (2.3)

xo—z :rO—t

= , 2.4
531—2 171-1 QI ( )

m n—m

 

 

Eliminating t from the above two equations yields,

2
"“10 "“11 m 5170 — $1

]: (q°‘q‘)+m(x1 — Z)($o — z)‘

 

 

 

 

 

n—m[qoql—($1—z :rO—z (ml—z)(:ro—z)

Let Ax = x1 - :ro, Aq = q1 — go. It follows that

_mqoxo + qm — (qo +q1)z m 2 _ M, 9; m(n — m)
(qu1 (1:1 — z)(:1:0 — z) +(:r1 — z)(a:o — z) ( )Aa: (51:1 — z)(a:0 — z).

 

 

 

A _ _ 2_ _
(qui'i-(n—m _q_m(q0$0+q1$1) m(q0+q1)z m m(n m).

 

 

 

 

 

 

Am — (331—2)($0"Z)
AC] _ "1010330 + (11331— n) " m((10 + (11):?
goq1+(n m)A$- (xl_z)(x0_z) '
Let
A = qoq1+(n—m)%§.
B = "10101130 + (11331 — 72), (2'5)
0 = m(qo+q1).
Then
A(IB1 —Z)(.’130—Z) = B—CZ. (2.6)
Or,
A22 + [C — A(.’IIO + 1171)]? + 14330171 — B = 0.
So,
_ A(xo + x1) — C :l: \/[C — A(:I:o + 231)]2 — 4A(A:1:0.'1:1— B)
z — 2A
1
= $0423: — am :1: \/[C -— Am + m]? — 4A(A:coa:1— 3)}
$0 + $1 1
where
Q = 02 — 2CA(2:0 + .731) + A2(:lro + sci)2 — 4142170131 + 4A8
= C2 — 2CA(:L'o + 1131) + A2030 — $02 + 4A3 : 02 + Q“
and

Q1 = —2CA(mo + 1:1)+ A2(a:o — in)2 + 4AB
= A[—2C(;ro + x1) + A(:z:0 — 51:1)2 + 48]
= A{-2m(CIo + <11)($0 + x1) + l‘Io€11(Afl7)2 + (n — "UACIAJFI
+4m(qoxo + qlxl — n)}
= A{——2m[Cono + qlxo + qoaso + qlxl] + (10(11(Aa:)2

+(n - m)[q1:v1 — qlxo — qox1+ qoxo] + 4m[qoxo + qlxl - nl}

= A{[—2m + (n — m) + 4m] [qoato + qlxl]
+l—2m - (n — m)llQO$1 + (11150] — 4mn + qul(A$)2}
= A{(n + m)[qo:coq1:13o — qoxl - (11500] — 477171 + qoqi(A$)2}
= AU” + m)(q1— (Io)($1 — 170) - 4mn + (10(11(A$)2}
= A{(n + m)AqAx - 4mn + qoq1(A$)2}-

A
Q = "12010 + (102 + [(10611 + (n — m)Kgll‘4mn + (n + m)AqA$ + CIOCMAJTVI

A
= "12(th + (11)2 - 4mnqoq1 - 4mn(n - m)—q- + (n + m)qoq1Aquv

A2:
+(n2 - m2)(A‘1) + (1091(133) + (71 -m)qoq1AqAx
A
= m2(qo + ql)2 — 4mnqoql — 4mn(n — "GK:

+2nqoq1AqAa= + (n2 - m2)(A<1) + <10<11(AII¢)2

Aq
= m2(qo + my — 4mnqoq1 -— 4mn(n - THO-5; + [C1041ACL' 'l' nAq]2 — m2(Aq)2

A
= m2[(<Io + (102 “ ((10 — (102] — 4mn90<11 — 47717101 * "UK: + [(quiAfc + ”A912

A
= —4m(n - m)(CIOQI + nA—Z) + [QOQIAIB + "ACII2

Combine the above equations, we have

$0+$1 1 (CZ—Q)

 

 

 

 

 

z = 2 ’ 27110 1 v0)
= M _ i__Ql_
2 24(0 $ x/Q)
= 1212 +1 4mn — [(n + m)Aq + qoqle]Ax
2 2 _m((I0 + ql) :l: \ﬂ-4m(n — m)(q0q1 + n2 —%)+ [(10611Ax + "A912
= $0 + 9’17"” - [(n + m)Aq + (MAI/:1a -2( 7.)
2 mm i \/"m(" — m)(40€h + n2)+l€10qi+ 12%: 29312-

We call (2.7) the Quasi-Laguerre formula with multiplicity index m. Letting
2:1 -—> 1:0, the general Laguerre’s iteration formula (2.2) with multiplicity index m is

obtained.

An alternative form of the quasi-Laguerre’s formula
Rewrite (2.6) in the following form
A(z — 2:1)2 + [C + A(x1— 230)](2 — x1) + C331 — B = 0,

where A, B, and C are deﬁned by (2.5), then the solution can be expressed as,

—(C + AAx) :l: x/B
2A ’

 

(2.8)

2 = 331 +
where

R = (C + AAsc)2 — 4A(C:1:1 — B)
= C2 + 2ACA$ + A2(A$)2 — 4ACx1+ 4AB
= C2 — 2AC(:ro + x1)+ A2(A:L‘)2 + 4AB

A
= "12010 + (102 — 2[(10(11+(n " Twig] "1010 + q1)(:co + $1)

E
An:

A
= m2(qo + (11)2 +(qoq1 + (n - m)A—:)2(Aiv)2

A
+(qoq1 + (n " 7”) )2(A33)2 + 4[9091+(n — m)A_:] "1010330 + qiqi - n)

A
+2 [qoql + (n — "GAE-i m(A$Aq — 2n)

Aq 2 Ag Aq
_ 2 2 _ 2 _ _ 2
— m (qo + q1) + [qoq1+ n at] (Ax) 2]q0q1 + 72 at] m $(Ax)

A
+m2(Aq)2 + 2m [(1091 + nA_:i (Aqu — 2n) — 2m'2(A€I)2 + 472ng

A A 2 A
= 4m2 [4091 + ﬁg] + [Cloth + n-A—g] (Ax)2 - 4mn [qoql + n—A—g]
A A
[(10% + TIA—g] ([QOQ1 ‘1' iii] (Ax)2 + 4m2 — 4mn) ,
and
Aq
C + AAx = m(qo + q1)+ (qoq1+(n — ME) A3:
A
= quo + (90(11 + 71—2)

A11:
Substituting these into (2.8) yields,

 

2mqo + ((1041 + 71%) i \/[(qu1 + 71%] ([0091 + 71%] (Ax)2 + 47710” — 71))
201041 + (n — m)%§)

 

2’=~’L‘1-l-

(2.9)

Note that C331 — B = m(qo + ql):c1 — m(qo:ro + qlxl — n) 2 nm + mquac. So,

 

 

z : 31+ 2(nm + mqurc)
— (2mqo + ((10% + n%)) i \/[qu1 + 11-23,] ([Cqu1 + 11%] (Ax)2 + 4m(m — n))

(2.10)

Equations (2.7), (2.9) and (2.10) are different formulae for the same root 2 of the
polynomial p(:1:) = k(:r — z)m(:r — t)"‘"‘. In the following, we shall use those different

forms in different occasions.

Deriving the quasi-Laguerre’s formula for general polynomials

Let p(:r) be a polynomial of degree n with only real roots r1,r2, - - - ,rn. Some
of these roots may be equal to each other, i.e., they may be multiple roots. Let
3:0, 1:1, qo,q1 be given such that 2:0 < $1, “ﬁg—0°} = (10,124)?) = q1 and the interval [$0, $1]
does not contain any r23. To summarize, we emphasize that the derivations hereafter

will always be under the

Basic Assumptions:

1. The polynomial p(:r) has only real roots.
2. There is no root of p(:r) between 2:0 and 9:1.

3. The polynomial p(:r) has prescribed logarithmic derivatives at (1:0 and $1.

Write p(a:) = kHE-‘zlcr — Tj). We want to estimate all the roots of p(:z:) under the
above conditions. It turns out that the estimation gives rise to the quasi-Laguerre’s
formula (2.7).

To begin, notice that

” 1
(10:2
j=l

" 1
(11:2

j=l $1 — rj

 

(2.11)

a
{Ea—7‘1“

 

(2.12)

Let Y,- = HEEL—T‘bjr- 1,2,~.,n, and y: Elf-Q. Then,
" l
(10: , (2.13)
Eli/j -y

 

9

n 1

 

 

 

 

 

f; Y1 + y
Adding and then subtracting the above two equations, we have
n Y,-
QOI = , (2.15)
12:31 3sz _ y2
—Aq " 1
= , 2.1
A2: j=l 16-2 _ y2 ( 6)
where qm = 9932591 and Aq = q] — qo,A:c = 1:1 — 11:0.
Lemma 2.2.1 The following inequalities hold
2 2 - _
Y,- —y >0, forallj—1,2,---,n, (2.17)
and
_Aq
Ax > 0. (2.18)

Proof. Since 132 — y2 = (Y,- + y)(YJ- — ) = (2:1 — rj)(:co — 11,-), inequality (2.17)
follows. Inequality (2.2.1) follows from inequality (2.17) and equation (2.16) El

Let Y = 39% — r, where r is a root of the polynomial p(a:) with multiplicity

greater than or equal to m. For simplicity and without loss of generality, let 1‘ = r1 =

 

 

r2 = = rm 2 = rm+k, wherek Z 0. So Y = Y1 2 Y2 = = m+k. Rewrite
(2.15) and (2.16) into the following,
" Y,- mY
= (101 — ——, (2.19)
jg?“ Y3? — 312 Y2 — y2
n 1 —Aq m
—— = — — . 2.20
15;] Y]? _ 312 A55 Yz — y2 ( )
Note that Ym+1, . - -, Ym+k stay on the left hand side of the above two equations,

only m equal terms are combined and moved to the right hand side. It follows from

Holder’s inequality,

 

 

 

(in: Yj )2< Zn: sz i 1 (2.21)
j=m+l ’62 _ y2 - jzm+l Y? — y2 j=m+1 ’32 _ y2
n 1 n 1
= n—m+y2 Z —— Z —-——. (2.22)
j=m+1 sz - yz j=m+1 sz — 92

10

Remark 2.2.2 Equality holds in (2.21) if and only if Ym+1 = Ym+2 = = Y", 50
p(.2:) has only two different roots.

Using (2.19) and (2.20), we have

 

 

 

 

 

 

 

 

 

2
mY Aq m Aq m
_ __ < _ 2 __ _ __ _
[(101 Y2 —y2 _ n m+ ( A2: Y2 -—y2)]( A2: Y2 —y2)° (2.23)
Dr,
M (M)2 Y AqAa: 1
2 2
€101+(n—m)'A—$-——4—-2m<101Y2 _ y2 +[m +(n—mlm—m 2 ]Y2 _ y, _<_ 0
Since Y2 — y2 > 0, so,
A A 2
[(131 + (n — "ll—g - (——q)—] (Y2 - yz) — 2qu1Y + nm - mAqAx S 0
A2: 4
And hence,
Aqu 2 2 All (Ag)2 1
[Tim—m 2 —y (qo1+(n—m)‘A—;-—4— W
1 Aq Aq 2
Or,
A A9: A 1 1 A
[nm —— m (14 — y2 (q0q1+ TVA-73H -Y—2 — 2mq01—f +q0q1+(n — mug—:1 S 0. (2.25)
Lemma 2.2.3 The inequality
' A
qOQl + n—A-Z- s 0 (2.26)

holds, and equality holds if and only ifp(:r) = k(:r — 7‘)".

Proof. Applying Holder’s inequality, we have

Aq _ Aq 2 (Aq)2
(IOCII‘anm —nA:r +901

 

n

1 Y-
: —nZY2_y2+(Z}/2_J

, 2
jzl J y

l

y2 j=1 j-y2

 

 

 

 

 

|/\
|

'M

"S

 

 

J: J
n 1 n 1 n Y2 n 1
< _ __ J _ 2 __ =
_ ”Zyz_ +ZY2—y2 (gyﬁ_y2 ygyj2_y2) 0.

2
y j=1

11

Lemma 2.2.4 The inequality

AqAa:
4

A

2 9
y qq +n— > nm 2.27
(0 1 A13) ( )

 

nm — m
holds.

Proof. The assertion follows directly from Lemma 2.2.3 and Lemma 2.2.1 E]

Lemma 2.2.4 indicates that the leading coefﬁcient of the quadratic inequality (2.24)
is positive. Let a, b, c denote the coefﬁcients of inequality (2.24), i e

 

 

 

m AqAa: 2( + Aq)
a = n —m — n—
4 y (loch Ax
A A2: A2: 2
= nm—(n+m> ‘1 —qoq.( )
4 4
b = —2mq01=—2m-%—q—l,
A
C = 4041+(n—mlxg-

To solve the quadratic inequality, let’s solve the quadratic equation ﬁrst. Denote

the two roots of the quadratic equation (2.25) by 21 and 22. Then

_bi‘/b2_4ac= mmi (% )2—ac

2a a
mlg—g—q-L :i: m2(991.;11)2 — [nm— (11 + 71093—35 - Cloth—[Kn —m)-:—:- + (1091]

nm— (n+m)A“—-Qoq1AT”-

 

 

 

 

Simplifying the radicant, we have

 

 

 

 

 

 

2
radicant = r1222(q—0——— + q1)2_ m(n — m)A—q — nmCIOCh + (n2 — m2) (Alf)
a:
Aq A2: A A2: A2: 2
+01 + m) 4 (1091+(n — 7")th (14 + (qoq1)2( 4)
A 2 A
= n2(—4q)— + mzqoql — nm(n — m)A—: — nmqoql
A A2: A2: 2
4'2"th 4 + (CIOQ1)2( 4 )
nA A Aq A
-—- (——")‘ — m<n — m)90€11— nm<n — m)—"— + nqoqle— + mom—‘5)
2 A2: 2 2
A
= —m(n — quoq. + n—ﬂ) + mm + (10(1le /4.

A2:

12

So,

 

"19%“ i \/-m(n - m)(qoql + "52%) + (”/34 + qoq1A$)2/4

A A1: A 2
rim—(n+m)—L4 —qoq1—f

 

(2.28)

21,2 =

Lemma 2.2.5 The radicant under the square root in (2.28) is nonnegative.

Proof. This is an easy consequence of Lemma 2.2.3. El

Lemma 2.2.5 ensures the roots 21 and 22 of equation oz2 + bz + c = 0 are all real.
Without loss of generality, we assume 21 S 22.

Since the leading coefﬁcient of the quadratic inequality (2.25) is greater than zero,
we have

21 S S 22.

1
Y

To achieve an inequality for Y, consider the following three cases.

Case 1: 21 <0 < 22.

For this case we have

01'

Since Y = 591,151 — r, so,

 

 

r g — —,
2 22
or
r > 180 + $1 __ i
_ 2 21
So p(a:) may have roots on either side of the interval [2:0,:c1], and gig—ﬂ — f1- (resp.

513-1231 — i) is the nearest possible root to the right (resp. left) hand side of the

interval. See Figure 2.1.

Case 2: 0 <213z2.

13

/ x0 19;.) X1 \
<_X.:_Xco__12_2 (xﬁxohq
2 1

Figure 2.1: Distribution of the roots of the polynomial in case 1

For this case we have

 

 

l-<Y<—1—
22 21
So
$o+$1_i<r<$o+$1_i
2 21— — 2 22

That is, all the roots are on the left hand side of the interval [$o,x1]. The root
503,531 — i is the nearest possible to the left hand side of the interval and £93231- — i

is the farthest possible root to the left of the interval. See Figure 2.2.

l'

 

/ 1‘ x0 xﬂgxl x1
_Q‘_1;‘_Xo_)__lz_l (xﬁxn) ‘47.;
2

Figure 2.2: Distribution of the roots of the polynomial in case 2

Case 3: 21 S 22 < 0.

This case is similar to case 2 and we also have

 

 

133/31.
22 21
It follows that
$o+$1__1_<r<$o+$1_i
2 21 — _ 2 22.

That is, all the roots are on the right hand side of the interval [mmxl] in this case.

The root £93131 — :11- is the nearest possible to the right hand side of the interval and

139-35351- — :12— is the farthest possible root to the right. See Figure 2.3.

14

l 1 l L1 1 1 ll

xO xﬂ:xl x1 / \
Jig—XLKJZ-l (xﬁxo) "21.;
2

Figure 2.3: Distribution of the roots of the polynomial in case 3

 

 

 

Deﬁne
and. = "m _ (n + MARIA—E _ (”Ills—ii . (2.29)
-m“°—‘§ﬂ i \/—m(n - m)(QOQI + "31%) + ("A4 + Goq1A$)2 /4
Note that the set {6m_, 6m+} is the same as {_§11"_zl_2}'
Now, deﬁne
QLm.(xo,x1,q.,q.) = “3" J; $1 + 6m.- (2.30)

This is the same as (2.7). We call QLmi the quasi-Laguerre iteration function
with multiplicity index m. When there is no ambiguity, we shall write QLmi for
QLm¢($o,$1,qo,q1). Also for simplicity, if m = 1, we write QLi for QLli, and 61

for 611;. In the following theorem, m is taken to be 1.

Theorem 2.2.6 (1). If 6- and 6+ have different signs, then 6- < 0 < 6+. And all
the roots of the polynomial p(a:) lie outside the interval [QL-,QL+].

{2). If 6- and 5+ have the same sign, then 6+ < (L. And all the roots of the
polynomial p(m) lie in the interval [QL+,QL-]. Furthermore, if 6- < 0 and 6+ < 0,
then QL+ < QL- < gig-£1; if 5- > 0 and 6+ > 0, then 59% < QL+ < QL-.

Proof. We prove the theorem by relating 61 to ;L; and the above three cases.
Obviously, any root has multiplicity 2 1 and the numerator of 61(see (2.29)) is
positive by Lemma 2.2.4. So only the denominator of 6* needs to be examined.

If 6- and 6+ have different signs and 6+ < 0, then —m99—‘253-‘- < 0, which leads to
(L < 0, a contradiction. So (1) is proved.

If 6- > 0 and 6+ > 0, then

(10+ €11
m—

0
2 >

15

and

 

+ A
"‘7"?ng > \f-mm - m)(CI0q1 + nA_:l:) + (”A9 + (10413102 /4-

That is, the denominators of 6- and 6+ are both positive, and the denominator of 6-
is less than that of 6+, hence 6- > 5+. Furthermore, we have 5%“ < QL+ < QL-.

If 6- < 0 and 6+ < 0, then
90 + (11

0
2 <

 

—m

and

 

+ A
|— "2&7qu > \/—m(n — m)(q0q1 + nA—z) + (nAq + qoqlAar)2 /4.

So the denominators of (L and 6+ are both negative, and the absolute value of the

denominator of 6- is greater than that of 6+. Hence, 6- > 6+. Furthermore, we have

QL+ < QL_ < ﬂy. I]
The following result can be easily veriﬁed.

Corollary 2.2.7
Ifd- < O and 6+ < 0, then |6-| < |6+|.
Ifd- > 0 and 6+ > 0, then |6_| > |6+|.

Theorem 2.2.8 Among all the polynomials, p(.r) = k(:z: — r)(:c — t)”"1 is the one

that has root closest to the interval [$0,131].

Proof. It follows from Remark 2.2.2 that r = QLi if only if the polynomial is of
the form p(:z:) = k(:(: - r)(2: — t)"‘1. The assertion then follows from Theorem 2.2.6.
I]

We call the polynomial in Theorem 2.2.8 the optimal polynomial.

Lemma 2.2.9 The inequality

Ar):
iamii > —2__ (231)

holds.

16

Proof. The roots of the polynomial must be outside of the interval [2:0, 2:1]. It is
clear that QLi = 59—131- + (ii are the roots of the optimal polynomial, so they are

outside the interval [$0,231]. Therefore Idmil > 925. E]

Figure 2.4 illustrates the three cases discussed in Theorem 2.2.6. If one identiﬁes
two ends of a straight line as 00, a uniform circle is obtained. Point A in the ﬁgure
serves as a reference point so that one can tell where 00 is on the circle corresponding

to the three line cases.

I m
A QL_ x0 x] QL+

case I

m 1 QL+
QL+ A QL. x0 x1 QL. \/

case 11 m
2

uniform view of the 3 cases

1 m pomt A 18 a reference pomt

x.) x1 0L. A QL_

case 111

Figure 2.4: Movement of quasi-Laguerre iteration

Proposition 2.2.10 If —q1 < 0 and —-q0 < 0, then |6-| < |6+|. If —q1 > 0 and
‘qo > 0, then |5-| > I6+|.

Proof. If —q1 < 0 and —q0 < 0, then the denominator of |6-| is greater than the

denominator of |6+|. Hence |5_| < |6+|. The second case can be achieved similarly.

U

17

For polynomials with only real roots, a monotonically convergent algorithm is
obtained and we call it the quasi-Laguerre’s algorithm. We describe the algorithm

below and deduce some of its properties from the above contents.

Quasi-Laguerre algorithm:

Let p(:c) be a polynomial with all real roots. Start with an interval [20,21] that
contains no roots of p(:1:). Since any root will have multiplicity at least one, we
can always use m = 1 in the quasi-Laguerre’s iteration. If the nearest root of the
polynomial is known to be of multiplicity 2 m, then m should be used to accelerate the
iteration. The following iterative scheme converges monotonically and the iteration

sequence never cross the root.

Theorem 2.2.11

1. Initial step: evaluate qo = #21:; ’41 : p’grnz

19(101) '

2. Choose m properly if possible. Otherwise, let m = 1.
3. Compute 51: according to (2.2.9). Then, we have

+ 2+
(a). If 61: > 0, then :13: = 3:312:21 + 6+(x:_2,xt_l,qz_2,q:_l) converges to a

+ _ _ _ _
root ofp(:c), where at] — 1:1, qf' — ql, $3 — 1170s (1d — (10;

(b). If 61 < 0, then x; = iii—41 + 6-(x;_2,x;_l,q;_2,q;_l) converges to a

root ofp(:c), where 2:," = 2:0, qf = (10, $6 = 1171, (15 = Q1,

(c). If6-<Oand6+>0,letxfzzrozxa',xf=x1=$3,qf=CIo=(Idr

q? = q1 = qr“;- Then,

_ x-_ +r-_ _ _ -. -.
. g, = ——,— +g_(s._.,e._.,g._.,g._.) gs to a root ergo),

if —qo < 0.
+ +13g
° 93: = ill-lz—‘k;l +5+($Z_2sxt_1,qt_2sqt.1) converges to a root ofp(:l:),

if —Q1 > 0.

We call the sequence the quasi-Laguerre’s iteration sequence.

18

Proof. It follows from Theorem 2.2.6 and Lemma 2.2.9 that the quasi-Laguerre’s

iteration sequences converges monotonically. In the following, we will omit super-

 

 

scripts + or -. Assume the limit of the sequence {wk}:°:0 is r. Let k —-) oo in

23k + mk-

$k+1 = ——2——1 + 5($k.$k—1,kaCIk—1)s

then,

(SUE/ca xk—la qka Qk—1)—> 0-
That is,

_(n + m)AQk—1Ark—1 _ (Ik—ICIkgAxk-IV _) 0
mq——’——-" ”Lg“ i\/-mm n(-m )(qk 1qk+nA:—:—‘_ l)+(nAq/c 1+qk 1qkAzr/c 1)2 /4

Lemma 2.2.4 ensures that the numerator of the above fraction is greater than or equal

to mn. So,

 

 

- + A _
_m2"_1_2-_q_"i\/—m(n - m)(qk_1q;c + ”A: l)+(nAqk-1 + qk-lqkAxk-1)2 /4 —> oo.
-1

If p(r) 7:4 0 then the limit should be

_mp’(r) :l: \I—m(n — m)((-er)-)2 + nipﬂ) 79 oo

 

 

p(r) p(r) dw p(r)

This contradiction leads to the conclusion that r is a root of p(r). [1

Remark 2.2.12 It follows from Proposition 2.2.10 that ifr is a root of p(at) such
that an. —-> r+, as k —> oo, i.e., the quasi-Laguerre’s iteration sequence is approaching

r from the right hand side, then

i6—(l‘kgxk—laCIkaCIlc—1H < |5+(I€k,$k—1,kaQk—1)I

for k large enough since —qk and —qk-1 would both be negative, see Figure 2.5.
Similarly, if an, —2 r“, as k -+ 00, i.e., the quasi-Laguerre’s iteration sequence is

approaching r from the left hand side, then

l5—(xk,$k—1sqk,Qk—1)I > |5+($k,$k—1,qksqk—1)|

19

for 1: large enough, see Figure 2.5.
Notice that if the quasi-Laguerre ’s iteration sequence converges, then the sign cho-
sen in the formula is always the one that makes the magnitude of 6 smaller than the

choice of the other sign. This constitutes an important feature of our algorithm.

/ Xk xk-l xk-l Xk \

- 0, - 0 - O, -
qk< qk-l< q > qk>0

Figure 2.5: sign of qk = 21:) and qk-l = %)l as the sequence approaches the root

Remark 2.2.13 Starting from any initial point 2:0, two Laguerre ’s iteration sequences,
xi“ = L+(a:2') and 2;“ = L-(ccz), can be generated and both converge to a (dif-
ferent) root of the polynomial. However, this is not true for the quasi-Laguerre’s
iteration. The quasi-Laguerre’s iteration sequence generated using the + sign, for in-
stance, moves upward ( to the right). If there is no root of the polynomial on the right
hand side of the initial interval, the iteration sequence will deﬁnitely wrap around
after some steps of iterations, hence produces a point that is on left hand side of all
the roots. When this happens, the iteration usually collapses because there are roots
of the polynomial between two consecutive iterates now, hence complex number may
be generated with the quasi-Laguerre’s function. Numerical experiments have veriﬁed
this phenomenon. Therefore, the choice of sign in generating quasi-Laguerre’s itera-
tion sequence is more delicate. To summarize, if one starts with an interval [xo,$1]
that contains no root of the polynomial, then the following guidelines can be used to
generate iterates: If it is known that there is a root on the left (right, resp.) hand side
of the initial interval, then the — sign (+ sign, resp.) is chosen in the quasi-Laguerre’s
function to generate iteration sequence; If this information can not be obtained, use

the criteria in Remark 2.2.12 to generate iteration sequence.

20

2.3 Horn an optimization point of view

A Theorem from optimization

The following result can be found in [22]. Bold faced letters represent point in an

Euclidean space E" of dimension n.
Deﬁnition 2.3.1 Let x“ E E" be a point satisfying constraints
110‘") = 0, g(X*) S 0,

and let J be the set of indices j for which g,(.r"‘) = 0. Then x"' is said to be a regular
point of the constraints if the gradient vectors Vh,-(x"'), ng(x*), 1 S i S m,j E J are

linearly independent.

Theorem 2.3.2 (Kuhn-’Ihcker Conditions) Let x“ be a solution of the problem

minimize f(x)
subject to
h(x) = o, (2.32)
g(x) g o (2.33)

and suppose x" is a regular point of the constraints. Then there is a vector /\ E E"

and a vector p E E” with ,u 2 0 such that
Vf(:t*) + ATVh(:I:") + pTVg(x*) = ID (2.34)

n g(x*) = 0. (2.35)

(2.34) is called the Lagrange equation. (2.35) is called the complimentary condition.

Regular point only concerns active constraints.

Now we derive the quasi—Laguerre’s formula from the optimization point of view.
The basic assumptions on page 9 are still valid in this section. Let’s identify the real
a'Xis with the unit circle and label the roots, r1, r2, -- -, rn, of p(r) clockwise starting

from go, see Figure 2.6.

21

r2 r1 x0 x1 rn r2

r1

Figure 2.6: Labeling the roots clockwise

Lemma 2.3.3 If the roots ofp(:1:) is labeled clockwise, then

 

 

1 .
,forlSiSm—1<n.
:1:

Proof. Consider the following two cases.

Case I: rm < 2:0.

 

 

 

 

Then
rm S r,- < 230.
So,
rm—atoSrg—xo.
Or,
:rO—rm Bro—r,- >0.
Thus,
1 1
0 < S .
{130 — T‘m (Co — 1',’
That is,
1 1
2

Case II: rm > 230.

We consider the following two subcases.

22

X0 X1

(2.36)

II-l: r,- < 230.

 

 

 

 

Then
> 0 > .
rm — 230 r,- — 220
11-2: r.- > x0.
Then
r,- 2 rm.
So,
1 2 > 0
rm — x0 r, — x0
Cl

Therefore, in the clockwise labeling, the farther r, is to 230, the larger the quantities

l

“40 is. In the following, we assume m < 72. Our goal is to look for, based on x0,

 

x1, go, and q1, an iteration formula that will not jump over more than m — 1 roots

of p(x) in the clockwise direction. Such a formula answers the question: how close

h

to the interval [xo,x1] is the m‘ root of p(x). In order to answer this question, we

consider the following optimization problem,

 

 

 

 

 

 

 

minimize (2.37)
Tm — 1130
subject to
n 1
j=l $0 — rj
" 1
Z = q1, (2.39)
j=l $1 — T‘j
1 1 .
Z , fori=1,2,---,m—1, (2.40)
Tm — $0 7“; — $0
1 1
. (2.41)
rn — x0 rm — .730

The constraint (2.41) is to ensure that p(x) has at least two different roots.

Lemma 2.3.4 (r1,r2,---,rn) is a regular point of the constraints (2.37)-(2.40) if

7‘1: 7‘2: - - -= rm and there exists at least onej > m such that r, :,£ rm.

23

Proof. Write the inequality constraint (2.40) into the following,

1 1 .
— ZO,forz=1,2,-~-,m—1,
170—73 (ED—rm

 

Then the Jacobian matrix of the constraints (2.38)-(2.40) is,

 

 

 

 

 

 

(go—3,3, 0 0 72:03,“, 0 0 .(2.42)
0 (371,—). 0 ‘53-‘73 0 0
K 0 0 (——1:_)’ “(xo-lrmv 0 0 /

 

Multiplying the ith row by —1 and adding the result to the ﬁrst row, for i =

170""i-2
xl-ri—z

3,4, - - « ,m + 1, and then multiplying the it" row by — and adding the result

to the second row, for i = 3,4, - - - ,m + 1, the above matrix becomes,

 

 

 

 

1 1 1
r 0 0 0 mm): (..-.-..)2 (M),
0 0 . . . 0 2"“ (11"3‘)! l , . . 1
(1170'--7'm)2 (II—rm+l) (II-'73:)5
1 . . . __1____
——7(xo—r1) 0 0 (104,")? 0 0
1 1
0 (Io-1‘2)2 . i i 0 _ (IO‘rml‘) 0 O
o. 1 _ 1
K 0 0 ' (3‘70-’"vn—1)2 (lb-rm)? 0 0 t
(2.43)
Without loss of generality, we assume r1, 75 rm. Then the submatrix
__1_2 __1__2_
(to-rm) (Io-rn)
1 7." ($0402 1 (2.44)
m ‘=1 (xi—rm)? (rt-Tn)
is nonsingular since r1 = r2 = : rm # rn. Hence the Jacobian matrix (2.42) is of
full rank. [J
Theorem 2.3.5 An optimal solution can be attained only when r1 = r2 = = rm,
and rm+1 = rm+2 = = 7'“. Furthermore r1 = = rm = r is given by (2.30).

Proof. Let (r’f,r;,- -- r“) be an optimal solution to (2.37)—(2.41) and a regular

3 11

point of (2.38)-(2.40). According to Kuhn-Tucker Conditions (Theorem 2.3.2), there

24

exist [11, pg, 31, 32, ---, sm_1 such that

 

 

 

1 n n 1
V $0+H1(Z$0 .—CI0) +H2( ..—<I1)
rm— j=l$ _ r] j=l 1'1 — Ti

+".12:3s.(,,n—.1 ))=0,

 

 

and
1 1 .
s,-( — )=0,forz=1,2,---,m—1.

rfn—xo rf—zro

 

We have omitted the constraint (2.41) because of the complimentary condition. That

is,

1 1

_ . — —,O 2.4
(Tin - 1130):) #1( :n)2 MOT - 7702 _7‘(m — 30V )2; S ( 5)
1 + 1 0 f ' +1 (246)

—-—— — z 1‘ : m --- .

”1(170 —7‘;)2 [12(31— 7“?)2 , 0 J ? 3n,
1 .

ﬁlm+ﬂzm+sim=0, fori=1,2,~-,m—1, (2.47)
s,-(r',",,—r;-")=O, fori=1,2,---,m—1. (2.48)

Clearly in = 0 would lead to m = 0 by (2.46) and vice versa, and this would
further lead to s,- = 0, by (2.47), for all i = 1,2, - - - ,m — 1, a contradiction to (2.45).

So none of [11,[12 is zero. From (2.46) and 2’:—:';- > 0(from the basic assumptions), we
.7

 

have
iii—$0: —_1# f,orj=m+1,m+2,
73—231 #2
So
r§=rIn+h forj=m+1,m+2,~-,n.
Now all rf, for i = 1,2,--- ,m —1 must be equal to rfn. Otherwise, if r'l' 31$ r; for

instance, then 31 = 0 by (2. 40) and the complimentary condition (2. 48). Therefore,
:1—:—:—‘:— - F from (2 4(7) i. e. ,r‘{- — r;, contradicting to the fact that r; is different
from rm (see constraints (2.41)), which implies 1'; ¢ 1",".

So there are only two different rf’s , r; = r; = = ﬁn, and r;,+1 = ran = =

71:, when the maximum is achieved. We have derived the formula for r when the roots

25

of the polynomial is of this kind (see (2.7)), and the formula is the quasi-Laguerre’s
formula(see (2.30)). [1

Similarly, if the roots of p(r) is labeled counterclockwisely, starting from the right
end of the interval [xo,a:1], then the mth root is closest to 3:1 only when r1 = r2 =
- - - = rm 74 rm.“ 2 - - - = rn. This again yields the general quasi—Laguerre’s iteration

formula. The corresponding optimization problem in this case would be maximiz-

l
rm‘xl

 

ing satisfying similar constraints as (2.38)-(2.41). So we have the following

properties.

Corollary 2.3.6 Among all the polynomials that have only real roots (none of the
roots is in the interval [xo,xl]) and satisfy the fundamental constraint (2.38) and
(2.3.9), p(r) = k(:c — r)"‘(:r — z)""m is the one whose m‘h root, counting from the

interval, clockwise or counterclockwise, is closest to the interval [230,331].

Theorem 2.3.7 QLmi computed by {2.30) would never overshoot more than m — 1
roots of the polynomial p(r), counting from the interval [3:0, 3:1], clockwise or counter-

clockwise.

Proof. Theorem 2.3.5 showed that the closest m‘h root to the interval [230,321],

counting from the left (or right) end of the interval, clockwise (or counterclockwise),

is given by (2.30). [1

Theorem 2.3.8 Let m be an integer, 1 S m g n — 1. In the clockwise direction, the
larger the m, the farther the QLm_ is from $0. In the counterclockwise direction, the

larger the m, the farther the QLm+ is from .131.

Proof. Let m < k S n — 1 We only prove the case for the clockwise direction.

Note QLm_ is the closest mth root and QLk_ is the closest k‘h root among all the

l

o o o o c o o l
polynomlals With the prescribed logar1thm1c der1vat1ves, we have —_QLk_—1'0 Z _—QLm_—ro

since It > m. D

26

2.4 For polynomials with complex roots

The quasi—Laguerre’s method can also be applied to solve general polynomials or
continuously differentiable functions that may have complex roots. The choice of the
sign in (2.30), which we developed for polynomials with only real roots, is determined
by how the sequence is approaching the root. Negative sign is chosen in (2.30) if
the sequence is approaching the root from the right hand side, while positive Sign is
chosen otherwise. However, for functions with complex roots, one must choose the

sign for which the magnitude of |5| is the smaller one.

2.5 Convergence order

In this section we will prove the order of convergence for quasi-Laguerre’s method is
1+\/2 if the multiplicity index matches the multiplicity of the root. The proof is in the
complex plane, hence covers the real case. Write the quasi-Laguerre’s formula (2.30)

in the following form

$1+$0

2

 

+ 6:1:(30axlaq0a (ham)? (2'49)

132:

where

 

-m 0 I x 2
Jib—+11 i \/—m(n — m)[qoq1+ "Xi—i] + [(1091 + ”33: 2 (92‘)

6i($0,$13 qu q1,m) :
(1091 + (n - m)%%

 

(2.50)
Recall that the choice of + or — in (2.49) depends on the magnitude of (ii. For

convenience, we let

5+ if |5+| S l5—l,
a- if |5_| < |5+|.

6(xo,:t'1,qo,q1,m)=

Theorem 2.5.1 Let {1130211 C C be a quasi-Laguerre’s iteration sequence in the
complex plane, i.e., 33;.“ = W + 6($k_1,$k,qk_1,qk,m). If the sequence {$1.}
converges and the multiplicity index of the formula matches the multiplicity of the

converged root, then the order of convergence is 1 + {2.

27

Proof. Let r 6 C be a root of p(r) such that limk__,oo:1:;c = r. write p(x) =
(a: - r)"‘cp(:r), where p(r) 7t 0. Using (2.49), we only need to show that if le — r|

and I330 — rl is small, then
I32 — 1‘] = 0(le — rlzlxo — r|),if I330 — r| = 0(1), and I170 — r] = 0(1). (2.51)

Write

 

(IO = + 00) (2.52)

(Bo—7‘

 

(II = + 01) (2.53)

171 — T‘
where o,- = g(x,-), for i = 0, 1, and g(:r) = “3,65% is a continuously differentiable function

in a neighborhood of r. Then

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

m mao moo
(1091 = + + + 0001, (fl-54)
(ato—r)(:r1—r) at] —r :ro—r
ﬂ — — m + 3 2 55
Act — (2:0 — r)(.r1- r) Act, ( i )
where ﬁ—‘i = ﬁ’ Write
Aq _ m(m — n)
qOQI + ”AIL“ — ($1 _ 1')(.’130 _ 7‘) + 50, (2.56)
where
50 _ moo mm + (7001 + ”A: (2.57)
:31 —— r (to — r A3:
So,
Ag 1 1
Ax(qoq1+ nA_x) — m(m — n) (2:0 _ r — x1 _ r) + SoAx, (2.58)
and the radicant in (2.49) can be written as,
Aq (Ax)2 [ Aq]2 _ m2(n -— m)2
m(" m) [“1 + "Anti + 4 W“ + ”Ax _ (:ro — r)(x1 — r)
1 2 , 1 1 2
—m(n—m)So+— m(m—n) ( — )
4 11:0 — r 3:1 — r
1 1 2 2]
+2m(m n) (we _ r 931— r) SOASB + 50(Arr)
_ m2(n — m)2 1 1 2
~ 4 lxo—r 2:1 — rl —m(m—n)S'o
+1m(m_..)( 1 1 )SA +15% )2
2 1‘0 — 7' $1 — 7‘ 0 37 4 0 17

28

1 1
+

:ro—r xl—r

 

 

 

m2(n — m)2 [ 1 1

4 ]Z+%m(n—m)( )SoAzz:

0—1‘ $1—1‘

Ax

$0—7‘

 

+£53(Ax)2 + m(m — n)

So — m(m — n)So

 

 

= [m(nZ— m) ( 1 + 1 )+ éSoArc] + m(m — n) Am So — m(n — m)So

 

 

 

CEO—7‘ $1—T' $0—T'
2 $1—T'

= Sl—m(n—m) 50, (2.59)
$o—T

where
s—1( )( 1 + 1 )+lSA (260)
1—2mn m 170—?" 171—7“ 20 :12. '

Assume

le — r] < alro — rl, (2.61)

for some 0 < a < 1. Since the iteration sequence converges, assumption (2.61) is

feasible. Hence,

 

 

 

 

 

 

 

 

 

50 2 C(51) and So 2 0(512)if|:1:1- r| = 0(1), |:1:0 — r] = 0(1). (2.62)
Therefore,
A A 2 A 2
l/‘mW — m) [(1001 + nA—Zl + ( 2.1:) [(10% + TIA—iii
_ 2 _ _ $1 —T‘§
__ J51 (1 m(m mam—r512)
— is 1—l ( — )"’“’"§+ (( — )2) (263)
_ 1 an mam—r3120“ r . .
So,
—m(CIo+CII) _ _ 33 ﬂ 2 (gr
2 i m(n mllqOQI +nAx] + [00(11 +nAx] 2 (2-54)
_-m(qo+q1) _1 _ den-{53 _ 2
_ 2 i51(1 2m(n m)$0_rsl2+o((.r1 r) )). (2.65)

It follows from (2.52), (2.53), (2.65) and (2.60) that the dominant term in (2.65) is

—m2:l:m(n —m) 1

 

(2.66)

2 $1—T

29

Thus, taking the plus sign yields a smaller magnitude in (2.66). Subtracting r from

(2.49), and taking the sign in front of the square root that yields a + sign in (2.65),

we have

—m + mn—m —-S
xl_r+x0_r 4153+31__12_12L_r_a+0($1_r)

-S
(132—7': + $0,”!

where

52:

8'3:

52+S3

2 qu1 + (n — m)§§

: 52 + 53 (2 67)
qoq1+(n-m)%%’ '

 

 

 

:cl—r Ito—r _ ﬂ)
( 2 + 2 )(WW WA.

(it—L2: + Ego—g) [(330 — 13:31 — 7‘) {1% xTilr
(“ﬁfx‘xﬂ .) + (n _ nag—g]
(:31 -r xo—r) [(xm(2m-n) moo + 29L.

2 2 o—r)(:1:1—r) :cl—r :co-r

 

 

 

+0001 —

 

 

 

 

A0
Ax
m(2m—n) 1 m mxl—r m(2m—n) 1
+—00+'— 01+
2 xo—r 2 2930—1" 2 331—1"
:3 —r m a: —r+:1: —r A0
m 0 God-"01+ O 1

3... _.~ 2 2 [0001+(n ””3;

m(2m—n) 1 - 1 m m xl—r xo—r
[ + ]+—(00+01)+—( 01+ 00)
2 xo—r :cl—r 2 2 (co—r xl—r
.. - . g; £1 - .. (2.68)

+0001 + (n — m)

 

 

 

 

 

 

 

 

 

 

[0001 + (n — m):—:]

q0+q1 m(n—m):r1—rSo

2 +51 2 xo—r51+0($1 r)

m2 l 1 m l 1 l
'7( + )‘§‘(”°+”1)+§m("*m)( + )

Ito—7‘ asl—r .‘L‘o—T xl—r

 

m

 

 

 

 

 

m(n—m)x1—'r50
2 mo—TS:+O($1_T)
m(2m—n)[ 1 + 1 ]_m

2 3(00 + 0‘)

 

+%SoA$ —

 

 

xo—r xl—r
m(n—m)x1—rSo
2 Ito—TS]

 

+%SOA$ — + 0(331 — r) (2.69)

4::
A3:

 

 

 

m xl—r xo—r xo—r—l-xl—r
—( 01+ ao)+
2 xo—r :cl—r 2

[0001 + (n — m)

30

where

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1 + 1)+a.SoAa:

121-7“ 130-!"

1+1)+SOA$

xl-r .ro—r

01m(n - m)

 

 

 

m(n — m)

_(n — m)m (ﬁﬂ—r + £5) — (n — m)(aoal + rag—g)

 

 

 

 

 

m(n — m) (1,114 + $01,.) + SoAx
n(n — In)??? — (n — m)(aoal — n-ﬁﬁ) + alSoAa:
_ n(n — m) (31%; + weir) + SoAa:

= 0(01 — 00) + 0(ch — r).

31

1 m(n—m)ml—TSO
_ A _ _ _
+250 in 2 $0-7‘Sl+0(xl 7')
_ _r_n_ xl—r :L‘O—r :co—r+:r1—r[ g
_ 2(xo—ral+x1—rao)+ 2 0001+(n m)A:c
A
1(m $00 mAmO'I-i-O'OO'IACC'i-TLAO')
2 xl-r arc—r
m(n—m)ml-rSo
2 $0 — 7' 51 + 0(31 T)
_ ma +2231—x0—r0
_ 2 0 2 xo—r 1
a: —r+x —1' Am A0
+(0 2‘ +‘2—l("°"1+(""mlal
+EAU_m(n—m)xl—r§+o(x —r)
2 2 xo—rsl l
m m 2121—21' A0
= ——ao+—[——01—0‘1+01—00]+(31-T)[‘7001+(n—m)—
2 2 xo—r Am
m(n—m)ml—TS’O
— 2 $0—T51+0(xl 7‘)
_ :cl—ra m(n—m)m—rﬁ
_ xo—rl 2 (Bo—7‘31
A0
+(181—7‘)(0'001+(n—m)'A—$]+0($1—7‘)
$1—T[a 71—77150]
2 m — —
(Co-7‘ l 2 51
A0
+(x1 — r) [0001+(n - m)m] + 0(331— 7"), (2-70)
0 n_m50 n—m ﬁs+gﬁ+aoal+nﬁﬁ
1- —=01—
2 s 2 §Mn~mmzﬁ+5ﬁﬂ+%&A$

So,

 

 

 

52 + 53 = m: : :(0(al — 00) + 0(x1— r))
+(x1— r) (0001 + (n — "Hi—E) + 0(x1 — r)
= (x1 — r) [0001+(n — m)-:—:
+2.::°<o<§:> .(g;;o.)]..(._,.

: (x1 — r)0 (g2(r) + (n — m)g'(r) + g’(r) +1) + o(x1 — r), (2.71)

and,
52 + 53 ~ (x1 - r)2(:ro - 7‘)0 (292(7‘) + 2(n - m)9’(7") + g'(?‘) + 1)
(IOQI + (n — 770%, m(2m - n) + 0(530 - 7‘)
That is,

 

 

292(7‘) + 2(n - m)9’0‘) + g’(?‘) + 1

m(2m — n)

 

x2 — r z 0( ) (x1 — r)2(x0 -— r). (2.72)

Let the order of convergence be ,u, then

331 — r = 0((xo — r)“), 1122 — r = 0((x1— r)”).

 

Hence,
1
#:2+_a
p
p2—2p—lz0,
and,
2+ 8
p— 2f=1+\/2—
I]

Remark 2.5.2 The following local convergence property of the quasi-Laguerre’s iter-
ative method in complex plan is implied by equation (2.72) that: there exits an e > 0
such that if Ixo — r| < 6, I321 — r] < e and Ix] — r| < a|xo — r| for some 0 < a < 1,
then the quasi-Laguerre’s iteration sequence starting from 2:0, and x1 converges to r,
and the convergence order is 1 + x/2 if the multiplicity index matches the multiplicity
of the root.

32

Chapter 3

Estimate multiplicity

3.1 Determining multiplicity of a root

While Quasi-Laguerre’s iteration converges super-linearly with convergence order
1+ \/2 when the multiplicity index of the method matches the multiplicity of the root,
it converges linearly otherwise. It is, therefore, important to estimate the multiplicity
of the root in the iteration process.

Let r be a root of the polynomial p(x) with multiplicity m. For an iterative
method involving ﬁrst and second derivatives at one point, the following Lagouan—
delle’s limiting formula [17] can be used to compute the multiplicity of the nearest

root
m = lim p’(xk)2
3r" P’(~”'=Ic)2 - P($k)P"(-’13k)

In practice, the following formula is used instead,

 

. (3.1)

P’(37k)2
ﬂu)? - P($k)P"($k)

where int(x) is the largest integer S x.

 

mmint<

) , (3.2)

 

 

This method requires the evaluation of the second derivative of p(x), therefore it
is inappropriate for our iterative method that involves only the ﬁrst derivative of the
polynomial.

Let x0, x1, ---, xk, - -- be an iterative sequence that converges to a root r of p(x)

33

with multiplicity m. Write

Let qk = M then

P(l‘k)’ I
m + 9 (113k),
xk — r g(xk)

where g(xk) # 0. This gives rise to the following formula for estimating the multi-

 

 

qk:

plicity m,
m z int(|(:1:;c — xk+1)qk|). (3.3)

This formula requires both xk_1 and 23;. to be close to the root. Otherwise, severe
overestimate may occur.
When both qk_1 and qk are known, then a better formula can be designed as

follows. Start with the following two equations

 

 

 

m g’($k-1)
_ —— , 3.4
qkl xk—l—T 9(17k-1) ( )
I
q. = m +g(‘”’°) (3.5)

33;. - r g(xk) '
Subtracting (3.4) from (3.5) then dividing the resulting equation by 113;, — xk_1
yields,

M = —m ) (1—%($k—1— r)(.1:)c — r)(:I:)c — xk_1)h;€) , (3.6)

(it). — xk_1 (:13)c — r)(xk_1 — r

 

where h’ = % (93%)1) for some 5), between xk and xk_1.

Multiplying (3.4) by (3.5), we obtain

2

 

qk_1qk = ( (1 + 6), (3.7)

(Ck — T‘)(:I3k_1 — 7‘)

where 6 = m(m, — r) + i1b)—(xk_1 — r) + #(xk — r)(xk_1— r)g—'(-£"—‘—‘lg—I(-r—"l).

mam.) "m(n—1) 9(rk—1) 9m.)

Dividing (3.7) by (3.6), we have

qk—1Qk($k - mic—1) _ _m 1+ 5
4!: - (Ik—l 1— #(ivk—i — 7‘)($k -* 7‘)(I13k — ilk-1M2.

 

 

(3.8)

So,

_ _qk_1qk(a:k - xk_1)1- ﬂack—1 - r)(rvk - r)(:v:c — Ink—0h}.
Qk — CIk—l 1+ 5 .

 

34

Hence, the following formula can be used to estimate the multiplicity m,

 

qk_1qk(xk — ark—1)
m z .
(lie-1 — (II:

This method is tested on the following polynomial of degree 23,
p(x) = (x+1)3(x—1)(x—3)(x—-3.0000000999991)(x—-3.1000001)”(x—10.5)(x—20.0)2.

Start from x1 = 4.5, x0 = 7.4, and aim at the root 3.1000001 of multiplicity 14.

If multiplicity index mul = 1 is used in our quasi-Laguerre’s iteration, it would
take 101 iteration steps to converge to the root. If we use (3.9) to estimate the
multiplicity when [(ka — xk)| < 1.0 x 10‘3, then the computation result shows that
after 20 iterations with mul = 1, condition |(x20 — x19)| < 1.0‘3 is satisﬁed, and (3.9)
is then applied to obtain the true multiplicity 14. When mul = 14 is used in the
quasi-Laguerre’s iteration, it takes only three more iterations to converge to the root
3.1000001.

To approach root 3.1000001 of multiplicity 14, with mul = 1 it took 20 iteration
steps for the condition |(x;c — xk+1)l < 1.0 x 10‘3 to be satisﬁed. If the multiplicity
estimation is started earlier, faster convergence can be expected. But, over estimate

may occur. The consequence of the over estimation could be the following,
(1) Cause the convergence to march a longer distance toward the target.
(2) Hurt the iteration process by jumping over the root.

In case (2), a back up scheme is necessary to restore the iteration process.

The following back up scheme is used in our experiments,
1. give up the new point,

2. reduce the estimated multiplicity index mul by 1 and recompute a new point

using the reduced multiplicity index,

3. check overshoot, if still overshoot, reduce mul by 1 again to recompute a new

point,

35

4. repeat this process until no overshoot.

A more complicated issue is how to detect overshoot. The following method is
useful in many situations. As the sequence is approaching to a root from one side(left
or right), the sign of 171%) should stay the same. Therefore, if different signs are
obtained at two consecutive iteration points, an overshoot may have occurred. For
some speciﬁc problems, more reliable tools for detecting overshoot are available. For
instance, when our method is used to solve the symmetric tridiagonal eigenvalue
problem, the Sturm sequence provides reliable information for detecting whether an
overshoot occurs.

The following computation result is obtained using the backup scheme stated
above. Estimation of the multiplicity starts when ka — xk_1| < 0.1, 7%31‘41 > 0,
%3} > 0 and 0.1 < < 1. 0. The advantage of the multiplicity estimation

formula IS fully exhibited 1n this result. Overestimated mul values, 17 and 15, are

xkil‘xk

xk-xk

 

 

used both successfully once, without overshooting the root. It helped the iteration
process advance to a closer position to the root. The total number of iteration is only

9, a 10 times speed up compared to the process using mul = 1.

starting points starting logarithmic derivatives
x-0= 7.4000000000000 -p’(x0)/p(x0)= -3.7424415421671
x_1= 4.5000000000000 -p’(x1)/p(x1)= -11.868803998501

new points marching distance mul-index

x_2= 4.3230028988455 d1t= -O.17699710115449 mu1= 1

x_3= 3.9224125629059 d1t= -0.40059033593962 mul= 17
x_4= 3.0665236579569 dlt= -O.85588890494904 mul= 17
x_5= 3.0957747133912 dlt= -0.82663784951471 mu1= 16
x_6= 3.1240282732094 dlt= -0.79838428969651 mul= 15
x_7= 3.0989338356082 d1t= -2.5094437601137D-02 mu1= 15
x_8= 3.1005358217376 d1t= -2.3492451471832D-02 mul= 14
x_9= 3.1000001611582 d1t= -5.3566057939210D-04 mul= 14
x_10= 3.1000001000000 dlt= -6.1158159848123D-08 mul= 14

36

total time for this root = 2.89380E-O2sec. 1.71600E-O2sec.

total number of iterations= 9

The following computation result is obtained when the above method is used to

the polynomial.
p(x) = (x+1)3(x—l)(x—3)(x—3.10000009999999)(x—3.1000001)l4(x—10.5)(x—20.0)2.

Note roots 3.10000009999999 and 3.1000001000000 are not equal, but extremely close,
to each other in the double precision environment. The numerical result shows that
3.1000001000000 is estimated as a root of multiplicity 15, even though it’s actual

multiplicity is 14.

starting points starting logarithmic derivatives
x_0= 7.4oooooooooooo -p’(x0)/p(x0)= -3.7477269546723
x_1= 4.5oooooooooooo -p’(x1)/p(x1)= -11.916423052696
new points marching distance mul-index

x_2= 4.3239509868566 dlt= -O.17604901314338 mul= 1

x_3= 3.9247571786955 dlt= -O.39919380816112 mu1= 17
x_4= 3.0720825767638 dlt= -O.85267460193171 mu1= 17
x_5= 3.1012166752925 dlt= -O.82354050340303 mu1= 16
x_6= 3.0999200729812 dlt= -1.2966023113043D-03 mu1= 16
x_7= 3.1000009568750 dlt= -1.2157184174845D-03 mu1= 15
x_8= 3.1000001000000 dlt= -8.56874987915350-07 mul= 15

x_9= 3.1000001000000 dlt= -4.654210059564OD-15 mu1= 15
total time for this root = 2.23110E-023ec. 1.36170E-028ec.

total number of iterations= 8

Remark 3.1.1 Stopping criteria for the above numerical results is |xk+1 - xkl < e

f'fl‘k)

or “I“ < c, where c is the machine precision for double precision numbers.

Remark 3.1.2 In this two examples where roots of the polynomials are known, over-

shooting is easily detected. Genrerally, detecting overshoot is a difﬁcult subject and we

37

do not attempt to address this problem here. However, in the application to symmetric
tridiagonal eigenvalue problems, the Sturm sequence precisely detects the number of

roots jumped.

3.2 A new stopping criteria
Kahan [16] has suggested the following stopping criteria for Laguerre’s iteration:
|$k+1— 931:1? S (ka * wk—1|-|$k+1- MDT, (3-10)

where 7' denotes the error tolerance. This criteria is based on the following observa-

tions. Let

$k+1- 33!:
Tk ‘2 ——

 

 

33/: - xk—l

Then as {xk}‘,:°=l converges to /\ when k —) 00, r): is normally decreasing. Thus

 

 

00 w
I/\ — xk+1| = Z($k+2+i — $k+1+il S 2 |$k+2+i “ $k+1+il
i=0 i=0

 

00
- T'k (Bk —.fl?k
Slmk+1—xk|2r2= '11:“. I

i=1
_ ka+1 — 331:12

_ lurk — ark—1| — lxk+1 — x1. '

 

For the quasi-Laguerre iteration, we propose the following stopping criteria,

1+ﬁ

< 7' (3.11)

9k+1

 

 

 

117041) — 93001 X i

where qk+1 = W, le = $1551. This criteria is quite efﬁcient when it is used
in solving the symmetric tridiagonal eigenvalue problem. The left hand side of the
above inequality is actually a prediction for the distance |xk+2 — /\|. This prediction
is based on the rate of convergence of the quasi-Laguerre’s iteration. We have shown

in Theorem 2.5.1 that
|$k+2 — Al % |$k+1 — Alzimk —' Al (3.12)

% |$k+1 *- $k|l$k+1— Allan. —* 4|, (3-13)

38

and

 

 

3,... — A) z (3,. — All”? (3.14)
It follows that
”W; (BL-+1" A ”ﬂ |$k+1 — All/E
qu z 33—h: =|xk+1- /\| I“ __ MIT/5

 

 

 

 

155k _ )(|(l+\/§)x\/§

 

z l$k+1— AI = l$k+l — All“ — 4|-

15131: _ ,\|1+\/§
So, by (3.13) and (3.14), we have
q 1+\/§
l$k+2 — )‘I z |$k+1— xkl —L
(lie-+1

 

 

In practice, power of 1 + J2 should be avoid in ﬂoating point computation, so the
following inequality is tested for stopping

2

—q—"— < 7‘. (3.15)

(1H1

|$k+1— xkl

 

 

39

Chapter 4

Application to symmetric

tridiagonal eigenproblem

4. 1 Introduction

In this chapter, we shall use the tools developed in previous chapters to approximate

all eigenvalues of symmetric tridiagonal matrices.

4.1.1 Evaluation of the logarithmic derivative f’ / f of the de-

terminant
Let T be a symmetric tridiagonal matrix of the form

{01 ,61 l
31 02 ﬁz 0

T=iﬂi—1,ai,ﬂil= i. -. - (4-1)

0 ,Bn-Z an—l ,Bn-l

\ ﬁn—l an /
We may assume, without loss of generality, that T is unreduced; that is, ,8,- 75 0, j =

 

 

1, - ' - , n — 1. For an unreduced T, the characteristic polynomial
f(/\) E det(T — A1) (4.2)

40

has only real and simple zeros ([30], p300). In order to use our quasi-Laguerre iteration
developed in previous chapters for ﬁnding zeros of f (A), or the eigenvalues of T, it is
necessary to evaluate f and f’ efﬁciently with satisfactory accuracy in the ﬁrst place.

It is well known that the characteristic polynomial f (A) and its derivative with

respect to A can be evaluated by three-term recurrences ([30], p423):

=1, = oz — /\
p0 ’01 1 . (4.3)
Pi = (ai _ A)pi—l _ ﬂE—lpi—2a 2 = 2131 ° ° ' an
I = 0, I = _1
P0 P1 . (4.4)
pi : (Oi — A)pi-l — pi—l _ ﬂE—lpi-Z) Z = 2139 ° ° ' an

and

f(/\) = pm f'(/\) = pi.-
However, these recurrences may suffer from a severe underﬂow—overﬁow problem and
require constant testing and scaling. The following modiﬁed recurrence equations [19]
is the result of careful investigation of the problem and is more stable than the code
presented in [23]. It computes the logarithmic derivative q()\) = f’ (A) / f (A), required

in our quasi-Laguerre’s formula. Let

 

._ p: .=_&
Pi—l, 10:,
£1 = 01"),
2 (4.5)
éi 2 ai_A_,?-i—__lla i=2,3,---,n,
:—

l
770 771 £1

. .2 (4.6)
”i=E1'f[(ai—)\)77£-1+1—(€::11)77£-2]a i=2139H°1n

I

 

and

f’(/\)

—W :77”

To prevent the algorithm from breaking down when E,- = 0 for some 1 S i S 11, an

extra check [19] is provided:
0 If {1 = 0 (i.e., 01 = A), set {I = fer};

41

522—152,
€i—1 ’

where s is the machine precision. A determinant evaluation subroutine DETEVL

olf§;=0,i>1,set§;=

has been implemented [19] according to the recurrences (4.5) and (4.6). When 5,,
i = 1, - - - ,n are known, the Sturm sequence is available ([25], p47). Thus, as a by-
product, DETEVL also evaluates the number of eigenvalues of T which are less than
A.

Let A1 < A2 < < An be the zeros of f(A) and A1 < A2 < < An be the zeros

of the numerical approximation f (A), it was shown in [19] that

mo.) - 3.1 s -‘Z—5m,ax{lal + (3.4.11) + 13.15. (4.7)

4.1.2 The split-merge process

Let

A1<A2<---<An

be the zeros of f in (4.2). To use our quasi-Laguerre’s iteration to approximate any
A,-, i = 1,2,- - - ,n, it is essential to provide a pair of starting points 3(0) and x11),
being either $(0) < x“) < A.- or A.- < x“) < 33“”, with no other Aj’s lying between 33(0),

33(1) and A,-. For this purpose, we split the matrix T into

. (To 0 )
T: (4.3)
0 TI

(01 .31 A (ak+l_,8k 5H1 A
To: [3‘ ,7]: m“ .(4.9)
[Bk—1 ﬂn-i

\ :Bk-l alt—ﬂk} \ [Bn—l an }

Obviously, the eigenvalues of T consist of eigenvalues of To and T1. Without loss of

where

 

 

 

 

generality, we may assume )6.- > 0, for all i = 1,2,... ,n — 1, since in (4.3)—(4.6),
ﬁi’s always appear in their square form. The following interlacing property for this

rank-one tearing is important to our algorithm.

42

Theorem 4.1.1 Let A1 < A2 < < An and A1 S A2 < S An be eigenvalues ofT

and T respectively. Then

A

3\1sxlsﬁzsAzsmsﬁnsAn<An+l
with the convention A...“ = An + 26),.

Proof. See [11, Theorem 8.6.2, p462]. I]

The eigenvalues of T will be used critically to approximate the eigenvalues of T
by our quasi-Laguerre iteration. We shall call this procedure, splitting T into To
and T1 of T and using eigenvalues of T, consisting of eigenvalues of To and T1, to
approximate eigenvalues of T, the split-merge process, similar to Cuppen’s divide-
and-conquer strategy [4] of course.

From Theorem 4.1.1, we have

511' S A,- S 3\i+1-
So, to evaluate certain eigenvalue A,-, i = 1,2,-~,n, of T, we start with the
mid-point, . .
_ A.“ + Ai+1
p _ 2

and evaluate @4813) by the subroutine DETEVL [19]. Note that the Sturm sequence
at p, which decides the position of p relative to Ag, is a by-product of this evaluation.
Based on the information obtained about p, we either use global Newton (see (4.15))
or Newton’s method plus bisection adjustment to ﬁnd the second starting point. To
use global Newton’s method, we must have —§(%l > 0 and p < A.- or —fL'(%l < 0 and
p > A,, and there is another point p0 such that no eigenvalues lie between p0 and p.
So if p < A,-(determined by Sturm’s sequence) and —%%l > 0, then global Newton’s
method is used to ﬁnd the second initial point. Since the previous eigenvalue A.-_1

has been found, we choose p0 to be A;_1 + 6 in global Newton’s method. Otherwise,

if —~‘;l(§)l < 0 and p < A, (or —f—’m > 0 and p > A.) then we use bisection method

f(P)
to examine the midpoint 9%9 (or 9%3, respectively); if —f7l(%l < 0 and p > A; (or

--%3 > 0 and p < Ag) then we use Newton’s method plus bisection adjustment to

43

examine the point min(p — ﬁg], Eer—b) (or max(p — %§], %3), respectively). Repeat
this process until two initial points p0 and p are found with €551 > 0 if p < A.- (or
£173 < 0 if p > Ag). The advantage of global Newton’s method is that the second

initial point can be obtained without extra call to the subroutine DETEVL, which is
the most expensive part of the whole algorithm. Once two initial points p0 and p are
found, the following quasi-Laguerre’s iteration formula (see 2.10) with initial points

330 = p0, x1 = p and initial multiplicity index m = 1 is used,

k- k k—
n + q(x$..")(x£.1 — $5.1”)

(k+l) _ (k)
x“ _ 3'” + (k-l) (k) (k—I) (k) (k—l)
we... ) — (x... — x... )5 i (/3 [(xm. — as... )23 + (m — n)]

 

 

 

 

(4.10)
where (k) (k4)
S = 237,—, («3335433) + #33:: : :Efgf )) , (4.11)
and (k)
W551) = mm“) (4.12)

351:1) '

Multiplicity index m > 1 may be necessary when clusters exist. This will be
addressed in Section 4.1.4.

The iteration sequence {32‘ )}:1 obtained by (4.10) with an appropriately chosen
sign (see Remark 2.2.12 in Chapter 2) converges monotonically to A.- with ultimate
convergence rate \/2 + 1 by Theorem 2.5.1.

The eigenvalues of T in (4.8) consist of eigenvalues of To and T1 in (4.9). To ﬁnd
eigenvalues of To and T1, the split-merge process described above may be applied
again. Indeed, the splitting process can be applied to T recursively (See Figure 4.1)
until 2 x 2 and 1 x 1 matrices are reached.

After T is well split into a tree structure as shown in Figure 4.1, the merging
process in the reverse direction from 2 x 2 and 1 x 1 matrices can be started. More

speciﬁcally, let T, be split into Tao and T01. Let Ag, - - - , A; be eigenvalues of Ta =

T, 0
( 0 in ascending order. Then the quasi-Laguerre iteration is applied to the
0 T01

44

TOT/\

T11

A A,“ Am A

TOIO THO Tlll

Figure 4.1: Split and merge processes

polynomial equation

fa(A) E det[TgI — AI] = 0

to obtain the corresponding eigenvalue A‘-’ i = 1,2, - - - ,m, by the merging process

described above. This process continues until To and T1 are merged into T. That is,
in the ﬁnal step all the eigenvalues of T are obtained by applying the quasi-Laguerre

iteration to f (A) = det(T — AI) from eigenvalues of To and T1.

4.1.3 Deﬂation

By Theorem 4.1.1, Ag 6 (Ag, Ag“) for each i = 1, - - - ,n with the convention An+1 =
Ag, + 2,61,. If Ag+1 — Ag is less than the error tolerance, then either Ag or Ag+1 can be
accepted as Ag. In general, if T has a cluster of m + 1 very close eigenvalues, for
instance, AH", — A,- is less than the error tolerance for certain I S j S n — m, then
m eigenvalues Aj, Ag“, - - - , Aj+m_1 of T can be obtained free of computations. They

can be set to any one of Ag, - - - ,A,~+m.

4.1.4 Cluster and cloud handler

Since the matrix T in (4.1) is unreduced, its eigenvalues are all simple. Therefore
using m = 1 in the quasi-Laguerre iterations seems appropriate in all cases to obtain

ultimate super-linear convergence with convergence rate \/2 + 1.

45

However, in some occasions, there may exist a group of eigenvalues of T, say,
/\i+l < ’Ai+2 < < A£+r

which are relatively close to each other, compared to other eigenvalues. We say
the spectrum has a cloud. For example, type-3 matrices (see Section 4.3.1) have
two clouds of eigenvalues. On some other occasions, some eigenvalues may even be
numerically indistinguishable. For instance, the spectrum of Wilkinson matrix [30]
contains eigenvalues mostly in pairs and numerically indistinguishable. We call this

type of eigenvalues a cluster. Figure 4.2 illustrates these situations.

lllll 11111 1 l l I I

Spectrum has two clouds Spectrum has clusters

Figure 4.2: Cluster and cloud of eigenvalues

Deﬁnition 4.1.2 If m roots of f(x) are numerically indistinguishable, we say the

function has a cluster of roots with cluster size m.

Deﬁnition 4.1.3 If m roots of f (x) , being gathered in an interval, are relatively far
from the other roots, we say f (x) has a cloud of roots in that interval and the cloud

size is m.

Numerical experiments exhibited slow convergence of quasi-Laguerre’s iteration
with multiplicity index mul = 1 in case of clouds and clusters. Hence a cloud and
cluster handler is needed to speed up the convergence. Two examples are listed below
to show the slow convergence of the quasi-Laguerre’s iteration in these situations.

Example 1. Wilkinson matrix W513. The eigenvalues of this matrix consist mostly
of numbers in pairs that are numerically indistinguishable. In particular, we look at
eigenvalue # 23, that is equal to 11.000000000000. The following numerical result is
obtained from the quasi-Laguerre’s iteration process with multiplicity index equal to

1 through out the whole iteration.

46

Numerical Result of Quasi-Laguerre with Mul-index 1

starting points

pO 8 11.250000000000 p1 = 11.137888560412

new points marching distance marching ratio

p2 ' 11.057728240655 -8.0160319756418D-02 0.71500571262537
p3 I 11.023732381883 -3.3995858772538D-O2 0.42409834286890
p4 8 11.009560676203 -1.4171705679063D-02 0.41686564748620
p5 I 11.003851776062 -5.7089001419764D-03 0.40283789906888
p6 = 11.001550472961 -2.3013031001457D-03 0.40310796176390
p7 = 11.000624132107 -9.2634085422377D-04 0.40252883427876
p8 = 11.000251228591 -3.7290351638350D-04 0.40255540358554
p9 = 11.000101125531 -1.5010305952183D-04 0.40252519197879
p10= 11.000040705300 -6.0420231328607D-05 0.40252498197625
p11= 11.000016384787 -2.4320513144343D-05 0.40252267509654
p12= 11.000006595237 -9.7895493107211D-06 0.40252231737957
p13= 11.000002654728 -3.9405095313399D-06 0.40252205758077
p14= 11.000001068586 -1.586141701832OD-06 0.40252198077863
p15= 11.000000430129 -6.3845684136714D-O7 0.40252194405438
p16= 11.000000173137 -2.5699288044115D-07 0.40252193067718
p17= 11.000000069691 -1.0344526894853D-07 0.40252192500802
p18= 11.000000028052 -4.1638989052076D-08 0.40252192754020
p19= 11.000000011292 -1.6760605649697D-08 0.40252191590758
p20= 11.000000004545 -6.7465114403695D-09 0.40252193634135
p21= 11.000000001830 -2.7156191175802D-09 0.40252197622175
p22= 11.000000000736 -1.0930956358719D-09 0.40252170445976
p238 11.000000000296 -4.3999528135870D-10 0.40252221939185
p24= 11.000000000119 -1.7710815504780D-1O 0.40252285092899
p25= 11.000000000048 -7.1289803041994D-11 0.40252129001486
p26= 11.000000000019 -2.8695184519433D-11 0.40251457143920
p27= 11.000000000008 -1.1550485192110D-11 0.40252346815500

47

p288
p298
p30-
p31all
p328
p33-
p34-
p35=

11
11
11
11
11
11
11
11

.000000000003
.000000000001
.000000000001
.000000000000
.000000000000
.000000000000
.000000000000
.000000000000

-4.6498264506008D-12
-1.8709902983900D-12
-7.53617046123OQD-13
-3.0360953346465D-13
-1.2186073771329D-13
-4.8424972315608D-14
-1.9584490177560D-14
-6.8926651173973D-15

Total number of iterations is 35.

Example 2. Type 3 matrix. As mentioned above, eigenvalues of this type of
matrix form two clouds. We examine a 99 by 99 matrix of this type with a = 100.0
and b = 44.0 (see Section 4.3.1). The two clouds reside on intervals [43.928732599741,
43.999929526104] and [100.000000000000,100.071267400259] respectively. Each cloud
of size m may be considered approximately as a root of multiplicity m when viewed

from distance.

.40256546571541
.40237852278298
.40279046169913
.40286977985244
.40137322541445
.39737960908740
.40442955857401
.35194508791937

00000000

Therefore the quasi-Laguerre’s iteration with multiplicity index 1

would converge linearly to the nearest eigenvalue of the cloud.

The following numerical result is extracted from the result of Quasi-Laguerre

algorithm applied to type 3 matrix of size 99x99. Multiplicity index 1 is used through

out the entire computation. The slow linear convergence is quite obvious.

Numerical Result of Quasi-Laguerre with Mul-index 1

starting points

P0=

101.015872629335 p= 100.841454915614

new points

p2
p3
p4

p5
p6
p7

100.743437006649

100.644852081645
100.563815029678

100.492668650879

100.431544760492

100.378647685865

marching distance

-9.8017908965762D-02
-9.8584925003505D-02
-8.1037051967655D-02
-7.1146378798231D-O2
-6.1123890387780D-O2
-5.2897074626041D-O2

48

marching ratio

0.56197221529112
1.0057848207917
0.82200247111588
0.87794875394317
0.85912862214851
0.86540752380866

98*
P9 =
p108
p11=
p12=
p13=
p14=
p15=
p16=
p17=
p18=
p19=
p20=
p21=
p22=
p23=
p24=
p25=
p26=
p27=
p288
p29=
p30=
p31=
p32=
p33=
p342
p35=
p36=

100.
100.
100.
100.
100.
100.
100.
100.
100.
100.
100.
100.
100.
100.
100.
100.
100.
100.
.075466390876

100

100.
100.
100.
100.
100.
100.
100.
100.
100.
100.

332982237172
293536689733
259485766377
230106703430
204781231369
182976233372
164233166158
148157185727
134408159176
122692599785
112756519967
104379002023
097366441507
091547492042
086768828702
082891873555
079790564254
077350095807

074045946273
073005718055
072272825651
071784008039
071484884863
071328989422
071273971357
071267449120
071267400260
071267400259

.5665448693327D-02
.94455474394590-02
.4050923355454D-02
.9379062947075D-02
.5325472061190D-02
.1804997997236D-02
.8743067214062D-02
.6075980431046D-02
.3749026550949D-02
.1715559390238D-02
.93607981874230-03
.3775179436346D-03
.0125605164542D-03
.8189494646869D-03
.77866333991680-03
.87695514658610-03
.1013093014782D-03
.4404684469887D-03
.88370493067220-03
.42044460324560-03
.0402282177712D-03
.3289240457534D-04
.8881761131248D-04
.99123176522530—04
.5589544052832D-04
.5018065412373D-05
.52223649513870-06
.8860813617266D-08
.9956651921029D-13

mNOOO000000000000000000000000

.86328873602484
.86379415002273
.86323870666811
.86279783488954
.86202450046865
.86099078210872
.85957665377625
.85770275736860
.85525275487378
.85210100852045
.84810972210352
.84314116799185
.83706899389962
.82978955419113
.82122440982119
.81130535273354
.79993427424847
.78691552816919
.77186203042146
.75406958919980
.73232579108987
.70454962868210
.66697049698000
.61193207773218
.52117472922255
.35291644980713
.11854717984453
.4914201062295D-03
.1776476818449D-06

However, the super-linear convergence can be obtained in these cases if a correct

49

multiplicity index is used. Therefore, estimating the multiplicity (i.e. the cloud or
cluster size) is essential to the cluster and cloud handler.
In our new algorithm, we used the following formula (see (3.9) in Chapter 3) to

estimate the cluster size or cloud size of each root as the iteration evolves,

k k k k
. (1nt ”(Al [mini - $531)]
mul = mt (k) (k+l) . (4.13)
Q(xm:k) _ Q($mi )

is not close to A5. There are two features con-

 

(m)

Overestimation may occur if x,
cerning over-estimating of mul, one being favorable to the convergence if it does not
result in jumping because the bigger the mul used in the formula the farther the
marching distance of the iteration. The other feature will hurt the convergence if it
jumps over the eigenvalue. In the second case, the iteration has to back up, namely,
reduce the value of mul and recompute the next iteration point. We used a sim-
ple, yet efﬁcient, back—up scheme by reducing mul to be the number of eigenvalues
jumped which can be easily detected as a result of Sturm’s sequence. Considering the
fact that the jumping resulted from overestimating may produce one initial point for
approximating the next eigenvalue, so an overestimate of the number mul causes es-
sentially no harm because our algorithm can dynamically reduce mul (see Figure 4.4),
while an underestimate may result in slow convergence.

The following are the numerical results of the new quasi-Laguerre’s method with
cloud and cluster handler applied to Examples 1 and 2 above. A substantial speedup
in each case is observed. mul is the multiplicity index used in the quasi-Laguerre’s
function, mlt is the estimated cloud size or cluster size and jump is the number of

eigenvalues jumped by the current iteration.

Example 1’

Numerical Results of Quasi-Laguerre + Cluster Handler

starting points

p0 = 11.250000000000 p1 = 11.137888560412
new points marching distance marching ratio mul
p2 = 11.057728240655 -8.0160319756418D-02 0.71500571262537 1

50

0.42409834286890
0.41686564748620
0.67428930324464

5.0719350393388D-04

p3 = 11.023732381883 -3.3995858772538D-02
p4 = 11.009560676203 -1.4171705679063D-02
p5 = 11.000004846655 -9.5558295481237D-03
p6 = 11.000000000001 -4.8466546715077D-06
p7 = 11.000000000000 -7.0335823226288D-13

Example 2’

Numerical Results

starting points

of Quasi-Laguerre + Cloud Handler

1.4512241534305D-07

p0 = 101.015872629335 p1 = 100.841454915614

new points marching distance marching ratio mul mlt
100.743437006649 -9.8017908965762D-02 0.56197221529112 1 51
100.644852081645 -9.8584925003505D-02 1.0057848207917 51 51
100.029961843314 -O.61489023833137 6.2371629162320 51 51
100.194192778959 -0.45065930268662 4.5712800681300 27 27
100.097946953508 -9.6245825450974D-02 0.21356671187569 27 27
100.065547713253 -3.2399240254871D-02 0.33663008346658 27 27
100.083910561793 -1.4036391714786D-02 0.14583896651118 9 9
100.073214269531 -1.0696292261722D-02 0.76204002275420 9 9
100.070280416514 -2.9338530174209D-03 0.27428691602977 9 9
100.072038260612 -1.1760089194859D-03 0.10994547369413 3 3
100.071242608246 -7.9565236563468D-04 0.67657000933504 3 3
100.071686253674 -3.5200693760571D-04 0.29932335696874 1 1
100.071422241996 -2.6401167823509D-04 0.75001839461135 1 1
100.071303605790 -1.1863620589650D-04 0.44935969003182 1 1
100.071269600688 -3.4005101598554D-05 0.28663342140443 1 1
100.071267404003 -2.1966854942081D-06 6.4598704045676D-O2 1 1
100.071267400259 -3.7435236860743D-09 1.7041691657475D-03 done

00000

Our cloud and cluster handler is more dynamic and more effective than the linear

acceleration method used in [19] and [8]. A numerical result is reported in [8] that

51

showed more than 11 iterations are required to compute the 23rd eigenvalue of the
99 by 99 Wilkinson matrix. The starting points they used are closer to the target
than the starting points 11250000000000 and 11.137888560412 used in our test. Our
algorithm needs 7 iterations only.

The following numerical result from the linear acceleration in [8] again shows the

advantage of our cluster and cloud handler.

Example 2”

Numerical Results of Quasi-Laguerre + Linear Acceleration

starting points

p0 = 102.000000000000 p = 101.961967587691

new points marching distance marching ratio activity
101.419535791275 -0.19431401692363 1.7915217080245 EVFP
101.174041555357 -O.43980825284116 Sturm
101.174041555357 -O.10914874934945 .55818469601614 EVFP
100.926995451781 -O.24704610357598 Sturm
100.926995451781 -9.86916862735660-02 .55818469601614 EVFP
100.703617752133 -O.22337769964884 Sturm
100.703617752133 -6.7609196631664D-02 .55818469601614 EVFP
100.550591824733 -0.15302592740005 Sturm
100.550591824733 -5.4640879681183D-02 .55818469601614 EVFP
100.426918246229 -O.12367357850324 Sturm
100.426918246229 -4.0113011667144D-O2 .55818469601614 EVFP
100.336126887701 -9.07913585279660-02 Sturm
100.336126887701 -3.1160591475516D-02 .55818469601614 EVFP
100.265598340604 -7.0528547097709D-O2 Sturm
100.265598340604 -2.3378215187691D-02 .55818469601614 EVFP
100.212684340167 -5.2914000436132D-02 Sturm
100.212684340167 -1.7833578400225D-02 .55818469601614 EVFP
100.172320010912 -4.0364329255723D-02 Sturm

52

100.172320010912 -1.3373261607786D-02 0.55818469601614 EVFP
100.142051114639 -3.0268896272261D-02 Sturm
100.142051114639 -1.0001205562557D-02 0.55818469601614 EVFP
100.119414493698 -2.2636620941768D-02 Sturm
100.119414493698 -7.3347879597213D-03 0.55818469601614 EVFP
100.102813013637 -1.6601480060970D-02 Sturm
100.102813013637 -5.2628633859943D-03 0.55818469601614 EVFP
100.090901105338 -1.1911908298643D-02 Sturm
100.090901105338 -3.6378825546402D-03 0.55818469601614 EVFP
100.082667161150 -8.2339441885324D-03 Sturm
100.082667161150 -2.3913400151639D-03 0.55818469601614 EVFP
100.077254627918 -5.4125332318762D-03 Sturm
100.077254627918 -1.4615792679392D-03 0.55818469601614 EVFP
100.073946505146 -3.3081227715712D-03 Sturm
100.073946505146 -7.9939460451005D-04 0.55818469601614 EVFP
100.072137164009 -1.8093411371325D-03 Sturm
100.072137164009 -3.5153908158226D-04 0.55818469601614 EVFP
100.071341494238 -7.9566977062484D-04 Sturm
100.071341494238 -5.6772584691737D-05 0.55818469601614 EVFP
100.071212995782 -1.28498456660960-04 Sturm
100.071214206718 -1.2728752032842D-04 Sturm
100.071225469893 -1.1602434544500D-04 Sturm
100.071253032066 -8.8462172612935D-05 Sturm
100.071284721654 -5.6772584684950D-05 Sturm
100.071267733919 -1.6987735058407D-05 3.3419749301337 EVFP
100.071267400306 -3.3361304770473D-07 50.920475608743 EVFP
100.071267400259 -4.6343259817599D-11 7198.7393424155 EVFP
100.071267400259 -4.6343259817599D-11 1.3891321138799D-04 done

Note that even though Example 2’ and Example 2” have different starting points,

the ﬁrst clearly showed faster convergence than the second one. The second took 40

53

iterations after 19 = 100.703617752133, while the first took only 17 iterations after
p = 100.841454915614 that is farther from the root /\ = 100.071267400259 than
100.703617752133 is.

4.1.5 Partial spectrum

In some applications, only a partial spectrum may be needed. Our algorithm, like
bisection method, inherits the features of the split-merge Laguerre’s method [19] for
ﬁnding partial spectrum speciﬁed by orders or by intervals.

From the strong interlacing property given in Theorem 4.1.1, one can easily obtain

the following:

Proposition 4.1.4 If [a,b] contains 1: eigenvalues of T, then [a,b] contains at least
k — 1 and at most I: + 1 eigenvalues of T. More precisely, let m(m) be the number of

A

3+]. be all the eigenvalues

eigenvalues of T which are less than a: E R and :\,+1, - - - ,
ofT in [a,b]. Thens—l Sn(a) Ss ands+k—l§n(b) Ss+k.

To ﬁnd eigenvalues of T in a given interval [a,b], the eigenvalues of T in [a,b]
are found ﬁrst, say 5x3“, - - - , :\,+k. By evaluating m(a) and m(b), the actual number
of eigenvalues of T in [a,b] is a = m(b) — 5(a). Hence /\,,(a)+1,- - - ,AKMH are the
eigenvalues of T in [a,b]. By Proposition 4.1.4, 5 — 1 _<_ m(a) S s and s + k — 1 S
m(b) S s + It, so, a can either be It — 1, k, or k + 1. Thus, at most k + 2 values are

needed to be considered as the ﬁrst starting points to evaluate these a eigenvalues of

T. Let

A
A

A.9 : 0, )‘s+l = A3+1) Ass-+2 = As+2s ' ' ' a )‘s+k = ’\s+ka As+lc+l = b

Then a values among them can serve as the ﬁrst staring points which will lead to all

a eigenvalues of T in [a, b].

To ﬁnd eigenvalues from the i“ to the jth eigenvalues, we use bisection and Sturm
sequence to ﬁnd a, and b such that the interval [a,b] contains all the eigenvalues of
interest. Then the above method is used in the split-merge process to ﬁnd all these

eigenvalues.

54

The capability of ﬁnding partial spectrum has direct application in parallel com-

puting. We will use this feature in the parallel implementation of the algorithm (see

Chapter 5).

4.1.6 Stopping criteria

The following stopping criterion was suggested by Kahan [16]:

|x("+1)— $00]? S (lec) _ $(k-1)| _ |$(k+l) _ 1,00”, (4.14)

where 7' is the error tolerance, see Section 3.2. This criteria is used in [8].

In addition, other stopping criteria are used in our code. We use the trivial
stopping criteria, Isak“ - xkl < 7', or lf’fio )l < 7'. Note that lh’; ) is the Newton
iteration step size. We also estimate the magnitude of the distance the next quasi-

Laguerre’s iteration can march. (3.15) derived in Section 3.2 is used for this estimate.

4.2 Description of the new algorithm

4.2.1 The global Newton’s formula

Assume that there is no root of f (x) lying between $0 and 3:1, and the logarithmic
derivative of ff / f1 at 1:1 is known, then the following formula can be used to obtain

a second point that does not cross any root of f,

1
11:2 = 3:1— —,——_—. (4.15)
7%.]...—

This formula is called the global Newton’s iteration formula [9]. In [21] this formula

is generalized to treat multiple roots and more properties of this method is described.

4.2.2 Initial points for the quasi-Laguerre iteration

At every stage of the merging process, we evaluate the eigenvalues A1 < ' - - < Am of

an m x m sub-matrix, given m initial values :\1 S - . - 3 Am (obtained from the previous

55

merging process) and an upper bound 3...,“ that interlace those m eigenvalues:
W1SA1$A2SA2$”'SIA\mSAmSS\m+1

(see Theorem 4.1.1). To evaluate an eigenvalue )1.- by the quasi-Laguerre iteration, two
initial points, say 3(0) and x“), are required on the same side of A,- without any other
eigenvalues lying between them and x“) is chosen to be closer to A.- than 13(0). For the
2"" eigenvalue, we start with ﬁnding the mid-point p = ’\2—+2£"—+-1- and computing '73))
and m(p). Then use the global Newton’s method or Newton’s method plus bisection
adjustment to determine the next point.

Several improvements are made in our practical implementation. From our com—
puting experience, the high order of convergence of the quasi-Laguerre iteration occurs
only when no critical point of f (i.e. zero of f’) lies between cell) and Ag. In other
words, if mm is to the left (resp. right) of A5, then it is desirable that —f’(x(1))/f(:c(1))
is positive (resp. negative), see Figure 2.5. If there is one or more critical point in
[X3 11.4.1], then bisection or one step Newton’s iteration is used repeatedly until the
above requirement at 2:“) is satisﬁed. Also, if the midpoint seems to be too far from

the target, then one of A; and ’1,“ might be the eigenvalue and should be tested for

quick exit. Our algorithm is summarized in the algorithm INIPTS in Figure 4.3.

4.2.3 Quasi-Laguerre iteration with cluster and cloud han-

dler

After two initial points are found, the quasi-Laguerre’s iteration (4.10) is used with
mul = 1 to begin with. In the process, the algorithm checks whether slow convergence
is encountered by checking the ratio ﬁt' If this ratio is between 0.1 and 1.0,
(4.13) is used to estimate the multiplicity mul of the root. The estimated mul value
is used in (4.10) to ﬁnd a new point. However, an overestimated mul used in (4.10)
may result in a new point that jumps over the target eigenvalue. If this happens,
the iteration must back up and the value of mul must be reduced to compute a new

point. Hence, the previous two points should be saved for possible future backup

before using mul > 1 in (4.10) to compute a new point. Overshooting is detected by

56

 

 

Algorithm INIPTS
Input: subscript i, initial end points :\,-,5\,-+1. Let A0 = X1.
Local variables: pOOk, plok, p1, fp1
Outputzstarting points p0,p and K.(p0),n(p),fp0 = —%,fp = —f7%§)l.
Begin INIPTS
(a,b) = (Ila/1m);
(n) p = 9%;
(#1) Evaluate fp= —ff’(;’) , m(p) by DETEVL;
If m(p) = i — 1, then
if p00k and fp > 0, then go to (it).
else p00k = .true.,p0 = p, pr = fp;
if fp > O and p— /\,-_1 > 2tol, use G-Newton and goto (it) .
else a =p, go to (#3);
endif
endif
Else if m(p) = i‘, then b = p;
if P101: and fp <0, then p0=p1,fp0=fp1, go to (it).
else Plok = .true.,p1 = p, fpl = fp;
if fp > O and p00k = .false., then
evaluate fa = —£f'—((§)l and 5(a), set p00k=.true.;
if m(a) =i, then A,- = a, goto (it) and exit.
else if fa < 0 then goto (#11);
else then p = min {a+fa,9%3}; goto (#1);
endif
endif
else goto (##);
endif
endif
Else
if m(p) > i, then b=p, goto (#11);
else then a =p, goto (mt);
Endif
(#End INIPTS

 

Figure 0.1: Algorithm INIPTS

57

 

the Sturm sequence obtained in the evaluation of —%ff)l. By comparing m(xk) and
m(mk_1), the algorithm detects whether a jump occurs. If jmp eigenvalues are jumped
over, the algorithm reduces the mul value to min(mul — 1, jmp), the minimum of
mul -l and jmp, and recompute a new point. The quasi—Laguerre’s method with the

above feature is called the quasi-Laguerre’s method with cluster and cloud handler.

The algorithm is illustrated in Figure 4.4.

4.2.4 Stopping test

Stopping test is done in various places in the algorithm. Figure 4.5 shows when

and where to check the stopping criteria.

4.3 Numerical tests

Our algorithm is implemented and tested on SPARC stations and DEC Alpha

stations with IEEE ﬂoating point standard. The machine precision is e z 2.2 X 10"”.

4.3.1 Testing matrices

There are 12 types of matrices used for testing our algorithm. In the following
description of these matrix types, 0,, i = 1,. - . ,n, denote the diagonal entries and ,85,

i = 1, ~ - - , n — 1, are the sub-diagonal entries.

Matrices with known eigenvalues

Type 1. Toeplitz matrices [b, a, b]. Exact eigenvalues: {a + 2b cos fillsksn ([12],
Example 7.4, p137).

Type 2. a1 =a—b,a,-=a for i=2,~-,n-—1,an=a+b. ﬁ,=b,j=1,~~,n—1.

k—
Exact eigenvalues: {a + 2b cos QTTSMlISkSn ([12], Example 7.6, p138).

58

 

 

Algorithm Q-LAG

Input: p0, pr, rs:(p0), p, dltO =p—p0, no roots between p0 and p.

Local variables: rat = iu—luused to test convergence rate,

dltO
mul--estimated multiplicity used to speed up the convergence,
oldmul--multiplicity used in the last iteration,
jmp = m(p) — n(p0)--used to adjust mul and upmul,
upmul--dynamically adjusted upper bound to control mul.
Uutput: p1, dlt1= p1 — p.
Begin Q-LAG
mul: 1, upmulzn—l;
(it) Evaluate fp= —%(§)l, m(p) by DETEVL; STOP-CHECK1;
imp = ”(19) — MPG):
If |jmp| 79 0, then
back up p0» fp0. p. fp;
mul = ma:1:(min(|jmp|,oldmul — l), 1) , upmul = mul;
Endif
p1 = qlag(n,mul,dlt0,p,n,fp0,fp) by formula (2. 10); dltl = p1 — p;
oldmul = mul. (for backup)

If mulyé 1, store p0, pr, p, fp for possible future backup;

and

rat = dltO '

if (0.1 < rat < 1.0), Then
estimate mul by formula (3.9);
mul = min(mul, upmul);
mul = maa:(mul,1);
endif
dltO = dltl, p = p1, p0 = p, pr = fp; STOP-CHECK2; Goto (1:);
End Q-LAG

 

Figure 4.4: Algorithm Q-LAG WITH CLUSTER AND CLOUD HANDLER

59

 

 

 

Algorithm STOP-CHECK
Inputs: dltO, p0, pr, p, fp, tol.

Outputs: EigenFound or ContinueIter.
After Evaluating fp, compute q==1/fp;
Begin STOP-CHECKI
if (Iql < tol), then EigenFound, eig = p + q.
else ContinueIter;
endif
End STOP-CHECKI
After Q-Lag iteration, new point pl is obtained;
dlt=p1—p, fprat = €12,703
Begin STOP-CHECKZ
if dltl < tol, then EigenFound, eig 2 p1.
else if fprat2 =1: dlt < tol, then EigenFound, eig 2 p1.
else ContinueIter;
endif

End STOP-CHECKZ

 

 

Figure 4.5: Algorithm STOP

a for odd i .
Type 3. a,- = , ﬂ,- = 1. Exact eigenvalues :

b for even i

 

 

{a+b:l:\/(a—b)2+16coszgk-4’f—l
2

} (add {a} when n is odd )
lSksn/2

([12], Example 7.8 and 7.9, p139).

Type 4. a; = 0, ﬂ,- = ‘/i(n—i). Exact eigenvalues: {—n + 2k — ”19:3,, ([12],
Example 7.10, p140).

Type 5. a; = —[(2i — 1)(n -— 1) — 2(i — 1)2], ,8,- : i(n — i). Exact eigenvalues:
{—k(k — 1)}15ksn ([12], Example 7.11, p141).

60

Wilkinson and random matrices

Type 6. Wilkinson matrices W: . ﬂ,- = l,

n/2—i+1 forevennandlSiﬁn/2

_ i—n/2 forevennand n/2<iSn
m— (n-1)/2—i+1 foroddnandlSi_<_(n+1)/2
i—(n+1)/2 foroddnand (n+1)/2<i$n

([30], pp308—309). Most of the eigenvalues are in pairs, consisting of two

numerically indistinguishable eigenvalues.

Type 7. Random matrices. a;’s and 333 are random numbers in [0, 1].
LAPACK testing matrices (generated by the LAPACK test matrix generator [2])
Type 8. Matrices with eigenvalues evenly distributed between its smallest and largest

eigenvalues.

Type 9. Matrices with geometrically distributed eigenvalues. Namely, eigenvalues

can be written as {qkhSkSn for some q E (0,1).
Type 10. Matrices with an eigenvalue 1 and the remaining eigenvalues in (—8, 5).

Type 11. Matrices with eigenvalues evenly distributed in the interval (0, 1] except

one eigenvalue with very small magnitude.

Type 12. Matrices with an eigenvalue 1 and the rest of the eigenvalues are evenly

distributed in a small interval [10"12 — 5, 10‘12 + 5].

4.3.2 Speed test in evaluating eigenvalues without comput-

ing eigenvectors

We compare the performance of the following codes for evaluating eigenvalues of

an n x n matrix:

( 1) ner-LAG: our split-merge algorithm using the quasi—Laguerre iteration with

cluster and cloud handler. Storage requirement: 9n;

61

(2) oldQ-LAG: The old split-merge algorithm using the quasi-Laguerre iteration by

[8]. Storage requirement: 9n;

(3) B/M: bisection/multi-section subroutine DSTEBZ in LAPACK. Storage re-

quirement: 12n;

(4) RFQR: root-free QR routine DSTERF in LAPACK, as recommended in LA—

PACK for evaluating eigenvalues only. Storage requirement: 2n.

First of all, we compare our speed with the old version of quasi-Laguerre’s algo-
rithm [8]. The result is presented in Table 4.6. Matrices of types 8—12 are generated
by LAPACK. The order of these matrices generated is limited to 550 by the exe—
cutable program available. For the ﬁrst ﬁve types of matrices, the new version of
quasi-Laguerre’s algorithm is faster than the old version. For types 8, 11 and 12, the
two version dose not have much difference. For the other cases, the old version is
faster.

We then compare our new version of quasi-Laguerre’s algorithm with the LAPACK
subroutines: DSTEBZ—bisection method and DSTERF—root free QR method. The
result of this test is given in Figure 4.7 and Figure 4.8. Matrices of type 10 to type 12
involve tiny eigenvalues or clusters and are used more in stability tests. The speed
comparison in this category may not be relevant. In particular, matrices of types 10
and 12 have big dense clusters, and intensive deﬂations make both Q-LAG methods
out score RFQR in speed by a wide margin.

4.3.3 Accuracy test

For those matrices with known eigenvalues (type 1 to type 5), the accuracy of a

method can be determined by

. i.- — A,
dlrect error: D = max |——-—|
' ||T||1
where X." is the approximation of the exact eigenvalue A; of T and H - “1 is the [1 norm.

62

 

  

 

 

 

 

  
 
 

 

 

 

 

  
  

 

 

    

 

 

 

 

 

 

  

 

 

 

 

 

 

 

 

Type 1 matrix 6 Type 2 matrix
A 6 I I I I I A I I I ﬂ 1
8 OQL -o— o‘ OQL *—
3, 4 NQL -+--- .w ,8, 4 NQL -+--- “,1.
e 2 - 1 e 2 ~ ' ~
'1: 0 : i 0 g . . . 1
200 400 600 800 1000 200 400 600 800 1000
Matrix order Matrix order
Type 3 matrix 6 Type 4 matrix
A 6 I l I A I I I II I
8' OQL —o— 8' OQL —o—
3 4 ‘NQL -+--- - 3,, 4 NQL -+--- -
e 2 - e “e’ 2 - -
I: o 3 1 L 1 1 l: 0 ¢ 1 1 1
200 400 600 800 1000 200 400 600 800 1000
Matrix order Matrix order
Type 5 matrix 6 Type 6 matrix
A 6 I I I I r A I I I I
8 OQL -o— ,, 3 OQL 4—
3 4 NQL -+--- " i 3 4 NQL -+--- J,“
E 2 - l .g 2 P ”*‘__r.,...«-—’* l
I" O A 1 1 1 1 l— o "' 1 1
200 400 600 800 1000 200 400 600 800 1000
Matrix order Matrix order
Type 7 matrix 2 Type 8 matrix
A 5 A I I I I I
' 4 ' 1.6 " L 7
iii 3 1% 1-2 - 381 :4: *
0 2 w 0.8 - -
.§ 1 .§ 0.4 - .
i- 0 l— o 1 1 1 4
200 400 600 800 1000 100 200 300 400 500
Matrix order Matrix order
Type 9 matrix 0 3 Type 10 matrix
A 2 I I I I I A e I I I I T
' 1.6 - - 0' 0.24 - 1
ii, 1.2 - 38:: I: -»+ - 3‘»: 0.18 - 00'- +‘ ,x ------ " «
g 0.8 ~ ,,,,,,,, - g 0.12 ~ NQL ’*“‘ " .
i: 0'3 ' """" “ '1: 0'08 ’ - , . . . i
100 200 300 400 500 100 200 300 400 500
Matrix order Matrix order
Type 11 matrix Type 12 matrix
A 2 I I I I I A I I I I I
' 1.6 - J '
i; 1.2 . - i,
g 0.8 ~ - g
i: 0.3 r 4 l: 0

100 200 300 400 500
Matrix order

 

 
 
  

 

 

   

 

 

  

 

 

 

 

 

 

 

 

 

  
  

 

 

     

 

 

 
 

 

 

 

 

 

 

 

 

 

100 200 300 400 500
Matrix order

Figure 4.6: Comparison of execution time on Dec Alpha between N QL-the new version
of quasi-Laguerre Iteration and OQL—the old version of quasi-Laguerre’s method, for

ﬁnding all eigenvalues without computing eigenvectors.

63

Type 1 matrix

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

35 . . a
B/M 4—
NQL -+---
30 RQR 9... l
25 v -
g .
£9.
i:
'1;
200 400 600 800 1000
Matrix order
Typeamatrix
35 I I I I I
N333 *—
30 RQR ~e~ _
25 - -
3 .
3
'F
-]
- --- '1’. . ”El
200 400 600 800 1000
Matrix order
TypeSmatrix
35 . r
B/M 4—
NQL --+-- .
30 RQR we
25 r ‘
8'
52
i:
200 400 600 800 1000
Matrix order
Figure 4.7:

Time(sec.)

Ti me(sec.)

Time(sec.)

Type 2 matrix
35 I I I I I
£33 *
3° RQR -a--- ‘

25- -

 

 

 

 

 

200 400 600 800 1000
Matrix order

Type4matrix
35 1 .
it"! "‘5
30 RQR ......_, l

25- -

 

 

 

 

 

200 400 600 800 1000

 

Matrix order
Type 6 matrix
35 . . .
Nat/2T *—
3° RQR ......, i
25 - -

 

 

 

200 400 600 800 1000
Matrix order

Execution time on Dec Alpha for ﬁnding all eigenvalues without com-

puting eigenvectors. B / M: DSTEBZ; NQL: the new quasi-Laguerre Iteration; RQR:
Root-free-QR—DSTERF.

 

 

   
   

 

 

 
  

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Type 7 matrix Type 8 matrix
25 I I I I I 10 I I I I f
NQL "1"“ NQL -+---
20 RQR £1 8 ” RQR ------a
T 15 - ’-‘ 6
i. i;
o a)
E _ E
i 10 i: 4
5 - 2
o —::*.'.'.'.+::.~. 2:111... 0
200 400 600 800 1000 100 200 300 400 500
Matrix order Matrix order
Type 9 matrix Type 10 matrix
10 I I I I I 8 I I I I I
B/M _._ _
~01 7 .53.! :7:
8 ’ RQR -----e
6 ' RQR "a"
’1‘ ’7 5 .
8 5 8
L”, EL 4 .
“e’ E
i: 4 i: 3 ;.
2 _
2
1 -
.......... E]
o o _‘ “mun"--.-_-§'.';;;°.'_‘;;§l. -------- t
100 200 300 400 500 100 200 300 400 500
Matrix order Matrix order
Type 11 matrix Type 12 matrix
9 I If I I I 6 I I I I I
8 - B/M +—
NQL -+--- 5 ' BM ‘9—
7 r non We NQL -+--—
f? 6 h 6 4 _ RQR ~a---
(D
E 4 ’ g
l- 3 _ l— 2
2 1-
1
1 '- ,-
o _‘p-"nitﬁfff: .... E -------- 51 O -
100 200 300 400 500 100 200 300 400 500
Matrix order Matrix order

Figure 4.8: Execution time(continued) on Dec Alpha for ﬁnding all eigenvalues with-
out computing eigenvectors. B / M: DSTEBZ; NQL: the new quasi-Laguerre Iteration;
RQR: Root-free-QR—DSTERF.

65

First of all, we compare the accuracy of the new Quasi-Laguerre’s algorithm with
the old version. Results are shown in Table 4.1, which shows the new method is no
less accurate than the old one.

We then compare the accuracy of our new algorithm with Bisection/ Multi-bisection
and Root Free QR method. The results are shown in Figure 4.9. It appears that our
algorithm Q-LAG achieves the smallest direct error on all matrices of the ﬁrst 5
types. The direct error of our algorithm as well as B / M is independent of the matrix
size, whereas RFQR seems to have larger error when the matrix size becomes larger.
Root free QR is the fastest algorithm, but is the least accurate one, compared to

Quasi-Laguerre and Bisection method.

66

Type 1 matrix

Type 2 matrix

 

 

I I

1.8 “M
1.6 $QL
1.4 ~
1.2 -
1 ..
0.8 -

I
+—
-+- -

Error(eps)

0.4 -
0.2 -

 

Error(eps)

 

 

 

I I I

 

 

 

O I l
400

600

800 1000

Matrix order

Type 3 matrix

400 600 800
Matrix order

Type 4 matrix

 

 

1.8
1.6
1.4 r
1.2 -

Error(eps)

0.8 -

0.6 -

0.4 -
0.2 -

 

I

Error(eps)

 

 

 

I I

 

J l l

 

 

200 400 600 800

Matrix order

Error(eps)

 

 

 

 

 

We ‘1 1 .
200 400 600 800 1000
Matrix order

400 600 800
Matrix order

Figure 4.9: Error(on the scale of machine double precision) , on Dec Alpha, for

ﬁnding all eigenvalues without computing eigenvectors. B / M: DSTEBZ; NQL: the
new quasi—Laguerre Iteration; RQR: root free QR—DSTERF.

67

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Order Version type 1 type 2 type 3 type 4 type 5
100 new 0.656 0.656 0.656 0.04 0.0512
100 old 0.656 0.656 0.656 0.05 0.102
200 new 0.656 0.656 0.656 0.04 0.0586
200 old 0.656 0.656 0.656 0.04 0.819
300 new 0.656 0.656 0.656 0.853 0.0683
300 old 0.656 0.656 0.656 0.0533 0.728
400 new 0.656 0.656 0.656 1.28 0.0635
400 old 0.656 0.656 0.656 1.28 0.819
500 new 0.656 0.656 0.656 1.02 0.524
500 old 0.656 0.656 0.656 1.02 0.524
600 new 0.656 0.656 0.656 0.853 0.091
600 old 0.656 0.656 0.656 0.853 0.728
700 new 0.656 0.656 0.656 1.46 1.07
700 old 0.656 0.656 0.656 0.731 1.07
800 new 0.656 0.656 0.656 1.28 0.0614
800 old 0.656 0.656 0.656 1.28 1.64
900 new 0.656 0.656 0.656 1.14 1.29
900 old 0.656 0.656 0.656 1.14 1.29
1000 new 0.656 0.656 0.656 1.02 1.05
1000 old 0.656 0.656 0.656 1.02 1.05

 

Table 4.1: Accuracy comparison between the new version and the old version of the

quasi-Laguerre’s algorithm. The numbers in the table represents the max-error of

|computedeigs — trueeigsl /(1norm), as multiples of machine precision

68

 

Chapter 5

Parallel computation of eigenvalues

5.1 Introduction

In the advent of parallel and vector computers, such as Butterﬂy, Convex, SGI,
nCUBE2 etc., and softwares such as PVM [10], it is possible to shorten the time
required to ﬁnish certain large computing works. Parallel computers often demand
parallel algorithm to subdivide the problem into smaller ones so that each CPU(or
processor) can handle part of the work simultaneously. As the whole computing is di-
vided into smaller jobs so that all of them can run at the same time, the total amount
of computing time can be decreased by a factor of the number of CPUs participated,
provided that each job can be executed independently of others. The biggest advan-
tage of our quasi-Laguerre’s iteration for solving symmetric tridiagonal eigenproblems
is its natural parallelism, in the sense that each eigenvalue can be computed fully in-
dependently of the others. In this chapter, we shall present a parallelized version of
our algorithm for symmetric tridiagonal eigenvalue problem and present some com-
putation result to show our algorithm is the most efﬁcient and the fastest one among
all the existing parallel algorithms for the problem.

There are two types of parallel architectures, one with shared memory and the
other with distributed memory. With the parallel software PVM, developed at Oak
Ridge National Laboratory, it is possible to hook a bunch of existing UNIX worksta-

tions to form a virtually parallel machine. Each workstation has its own memory, so

69

this virtually parallel machine has distributed memory architecture. Message passing
is inevitable for a parallel code to run on this architecture. There are many advantages

with this virtually parallel computer:

1. It is less costly since a real parallel computer or vector computer is normally

very expensive.

2. It has more memory space since each machine has its own considerably large

memory.

3. It is more ﬂexible since the machines can run daily routine jobs when parallel

computing is not in demanding.

4. Programs using PVM for a cluster of workstations can be easily ported to some

super computers since many super computers support PVM.

5. It is equally suitable for both SIMD and MIMD applications while many other
super computer is only better suitable for one of this type of application than

the other.

5.2 Issues for parallel algorithm design

To design an efﬁcient parallel algorithm, the following factors are often taken into
consideration.

1. Communication cost. With a message passing model for distributed mem—
ory parallel machine, including PVM machine, sending and receiving data between
processors is inevitable. This is called process communication or message passing.
Some message passing model requires the sending and receiving happen at the same
time, it is called synchronous hand-shaking. With synchronous communication, the
computation on the sending processor halts until the matching receive is executed
by the receiving processor. A better model allows sending and receiving happen at
different time, that is, messages sent are piled up in queue and stored in buffer wait-

ing for picking up by receiver(s) and the sender can continue to process other things

70

once message is on the way to its destinations. The sending process needs not to
wait for the response of the receiver(s). This is called asyncronization. PVM allows
asyncrounous communication. The time spent on message passing(including data
transferring time, processor waiting time and processor idle time etc.) is called com-
munication cost. An important statistics is the computation to communication ratio:
(time spent computing) / (time spent communicating). One should always maximize
this ratio. If the algorithm requires frequent exchange of information among the
processors, the communication cost will be very high.

2. Length of message. Longer messages take longer time to process, but short
messages require a start-up time. For example, if a PVM program wants to send out
data, it has to clear the send buffers to prepare them for packing messages. Take a
look at the following two pieces of codes, both are sending n integers to n processors

(with process IDs specified by tid[ ]).

code 1: I code 2:
l
for (i=0; i<n; i++){ I pvm_initsend(PvaataRaw);
pvm_initsend(PvaataRaw); I pvm_pkint(msg, n, 1);
pvm_pkint(&msg[i], 1, 1); I for (i=0; i<n; i++){
pvm_send(tid[i], tag); I pvm_send(tid[i], tag);
} I }

The second code is obviously more efﬁcient because the ﬁrst code starts up the sending
process n times while the second code starts the initialization only once. The packing
in the second code is also more efﬁcient than the one in the ﬁrst code because it packs
the whole array only once.

3. Message routing. An effective message routing schedule could also improve
the performance of a parallel algorithm. Message routing heavily depends on the
architecture of the parallel machine and hardware interconnection. With PVM ma-

chines that consist of a bunch of workstations located at different sites with certain

71

distances and interconnected through Ethernet(typically 30Mbps) or F DDI(typically
lOOMbps) ﬁber cables, message routing is even a more important issue. PVM sup—
ports multi-casting that most multiprocessor vendors do not. Multi-casting certainly
makes message routing more efﬁcient if one has to send messages to a group of pro-
cessors. Let’s take a look at the following two codes. Both codes are sending, with

different ways though, n messages (data) to n.proc host machines, speciﬁed by the

ti d array.
code 3: | code 4:

I
pvm_initsend(PvaataRaw); l pvm_initsend(PvaataRaw);
pvm-pkint(msg, n_msg, 1); | pvm_pkint(msg. n_msg, 1);
pvm_mcast(tid, n_proc, tag); I for (i=0; i<n_proc; i++){

| pvm_send(tid [i] , tag) ;

Code 3 is more efﬁcient than code 4 since code 3 sends the messages through the
network once while code 2 puts the messages through the network n_proc times.
Multi-casted data passes the cable (network) only once and is picked up by the receiver
when it passes by. Finding out how the PVM machines are wired up could also help
reduce the trafﬁc of message passing.

4. Buffering. If too many messages are received they must be buffered in some
way in some local area. Messages could be lost if we run out of space. With PVM,
normally there is no such a problem because each workstation has relatively large
memory.

5. Load balancing. Section 5.5.1 discuss this issue in more details.

5.3 Determining performance

1. Timing. There are different ways to record the run-time of a program. The

Fortran subroutine dtime() record the user execution time since the last call to the

72

subroutine, etime() record the user execution time since the program is started. C
function gettimeofday() records the wall time, hence it includes all the computation
time, system scheduling time and waiting time etc. In a shared or heterogeneous PVM
environment where computer loads, powers or speeds may be different, some proces—
sors may be in idle (e.g. waiting for results from other processes) while others are
heavily engaged. The performance of an algorithm should be judged by the total time
from when the process is ﬁrst started until when the process is completely ﬁnished.
This total time includes the machine idle time, waiting time, and certainly compu-
tation time. Therefore, using gettimeofday() realistically reﬂects the performance of
an algorithm.

2. Parallel efﬁciency. Parallel efﬁciency of p processors is deﬁned as

_ T(1)
E(P) - m,

where T(k) is the time required to execute the program on k processors. A parallel

algorithm with parallel efﬁciency equal to 1 is a perfectly parallelized algorithm.

5.4 Existed parallel algorithms

There are several parallel algorithms available for computing the eigenvalues of a
symmetric tridiagonal matrix. The following is a summary of all the existing algo-
rithms. We will see that our parallel algorithm with quasi-Laguerre’s method seems
to outperform the others.

1. Jacobi methods. This method has been announced to be of historical interest
only. Barlow has an introductory description of this algorithm in [3].

2. QR algorithm. A parallel QR was ﬁrst discussed by Sameh and Kuch [28].
Arbenz, Gates, and Sprenger [1] studied a modiﬁed version of QR for ﬁnding both
eigenvalues and eigenvectors. The ﬁnding of the eigenvalues was performed redun-
dantly while the calculation of the eigenvectors was done in parallel. In [8], the
sequential quasi-Laguerre’s method has been compared with the sequential QR for
ﬁnding both eigenvalues and eigenvectors, and the results showed that quasi-Laguerre

is faster than QR on a sequential machine.

73

3. Cuppen’s Divide-Conquer method. Ipsen and Jessup [14] reported that
parallel bisection is faster than the divide-and—conquer method [4]. Demmel also
reported the comparison between these two methods and showed the new version
of divide and conquer is only slightly faster than bisection+inverse iteration on a
sequential machine for random matrices, but twice as fast as the bisection+inverse
for geometrically distributed matrices [5]. Our parallel quasi-Laguerre’s method is
implemented for distributed memory model, and based on the above information, we

do not make the comparison with the method of divide-conquer.

4. Split-merge Laguerre’s method. A parallel Laguerre’s algorithm is reported
in [18]. The algorithm parallelized the sequential code along with the split-merge
process, that is, after the matrix is split, each processor computes the eigenvalues of
the corresponding smaller matrix then computes the eigenvalues of the merged matrix
level by level until eigenvalues of the entire matrix are found. With this approach,
processors must exchange eigenvalues during each level of merging processes, hence
the communication is quite high. [18] reported that when matrix order is small (such
as 128), parallel bisection performed better than parallel Laguerre’s method. This
may attribute to the communication cost of the parallel Laguerre’s algorithm. Our
parallel approach with the quasi-Laguerre’s method requires minimum (nearly zero)
communication cost as the parallel bisection does, therefore is always faster than
bisection method. Another aspect of the approach in [18] is that the original code
was written for nCube supercomputer which requires the number of processors to be
power of 2, such as 2, 4, 8, etc.. The code was then ported to PVM environment
without implementation modiﬁcation. Hence the code still requires the number of

workstations to be power of 2.

5. Bisection method. This is a fully parallelable and fully scalable algorithm.
Our algorithm maintains all the good features of this method and our parallel quasi—

Laguerre’s algorithm outscored parallel bisection method by a great margin.

74

5.5 The parallel quasi-Laguerre’s method

We implemented the parallel quasi-Laguerre’s algorithm using PVM with a master
and slave program.

The master program divides the spectrum into small chunks, l : n-1, (n_l + l) :
n.2, (n_k + 1) : n, where i : j denotes the eigenvalues from number i to number j.
Then the master spawns the slave process to all available machines that form the
PVM machine, and sends out information to the slaves. The information sent to
the slaves includes the matrix order, the diagonal and off diagonal of the symmetric
tridiagonal matrix, starting numbers and ending numbers of eigenvalue chunks, and
some other administrative information such as parent process ID and slave process
IDs and so on. The master program also serves as an administrator that is freed
from computation and also keeps the ability to process other jobs that has to be done
sequentially. So we don’t spawn slave process onto the machine the master process is
running.

The slave program spawned by the master program receives data and ’instructions’
sent by the master program and calls split-merge Quasi—Laguerre’s subroutine to
ﬁnd the respective eigenvalues of the matrix, then sends the results back to the
master program, and wait for another chunk of eigenvalues to compute until an exit

instruction is received.

5.5.1 Load balancing

With PVM machine that is composed of a cluster of general purpose, shared work-
stations interconnected by Ethernet cable or FDDI cable, load balancing is always

needed due to the following factors:
1. each workstation may have different computing power and speed,
2. each workstation may have different work loads from other users,

3. initially divided subtasks may require different amount of time to ﬁnish.

75

During the early stage of developing our code, we used uniform(or nearly uniform)
subdivision method to divide the whole job into smaller ones. We did not encounter
much uneven distribution of computation time among the slave processes until the
code is tested in a shared environment and a heterogeneous environment. We found
some processors took twice as much time as the others to ﬁnish their share in such
environment.

Here are two experimental results that exhibit the uneven computation time
among the processors due to the shared environment and heterogeneous environ-
ment. The ﬁrst experiment was done on six DEC Alpha workstations(one master
and ﬁve slaves) in a shared environment, that is, all the processes that are computing
the eigenvalues of a matrix have to share CPU with other CPU-intensive processes
(mostly from other users). The second experiment was done in a heterogeneous en-
vironment, six DEC Alpha-s and two SUN SparclOs(one Alpha machine served as
the master and all other 7 machines served as slaves). The CPU clock speeds for
the Dec Alpha workstations (model 3000/ 400) and SparclO workstations used for the
experiment are 133MHz and 33MHz, respectively.

In a shared environment, it is somewhat difﬁcult to reproduce an experiment since
other user’s processes come and go randomly. For comparison purpose, we need to be
able to control the environment in order to make the comparison more meaningful. We
don’t intend to compare apple with cat. So, both experiments were conducted during
a reserved time period, that is, all the computers are reserved for the experiment and
no other user can get onto the system. In the ﬁrst experiment, a shared environment
is created by creating two CPU-intensive jobs(called dummy processes) on one of the
slave machines, then run the parallel quasi-Laguerre program on the PVM machine.
One of the slaves has to share CPU with the other two dummy processes that are
running on the same host, hence it gets only 1 / 3 of the CPU access. Both experiments
compute all the eigenvalues of a 5000 by 5000 type 4 matrix. The time of each slave,
without load balancing, is plotted in Figure 5.1 and Figure 5.2.

Now we have identiﬁed the importance of load balancing. The following ideas are

implemented in designing the load balancing scheme for our parallel quasi-Laguerre

76

before load balance before load balance

 

 

 

 

 

 

 

 

 

 

 

 

 

 

60 I I I I T j I I I I
1200 L -
50 - -
1000 - 4
g 40 r ‘ 8’ 800 b _‘
£9, . in
g 30 i 3 600 - ~
— a:
|...
20 - ‘ 400 _ .
10 r r 200 - q
0 0
0 1 2 3 4 5 6 0 1 2 3 4 5 6
PE number PE number

Figure 5.1: Before load balancing in shared environment, other CPU intensive jobs
are running on PE # 2 also. Left: Execution time of each Alpha workstation. Right:
Number of eigenvalues computed by each Alpha workstation. Matrix size 5000, type
4

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

before load balance before load balance
60 I W I I T I I I f I I l I I
1000 - -
50 r ~
800 ~
A 40 - . i
' 8,
3; 30 - 4 m 60° [ ‘
g ‘5
p 20 _ _, at 400 " "
1o .. l . 200 - ~
0 0
O 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8
PE number PE number

Figure 5.2: Before load balancing in Heterogeneous environment, PE ## 1-5 are
Alpha workstations, PE ## 6—7 are SUN SparclOs. Left: Execution time of each
workstation. Right: Number of eigenvalues computed by each workstation. Matrix

size 5000, type 4

77

method.

0 Create uneven loads so that earlier distributed load has slightly large chunk size
than later distributed loads. This method has an effect of balancing the job as
a whole. Also the process which gets the last job won’t take too long to ﬁnish

since the last job is the smallest in chunk size.

0 Create more subtasks than the number of available host machines. Hopefully,

faster processes can ﬁnish more jobs to help the whole situation.

0 Spawn more process to each host in hope to gain more CPU favor. This strategy
will only be used in emergency, such as, the application needs result as soon

possible and must sacrifice other application’s needs.

0 Reset process priority level to a lower level for courtesy of the actual worksta-
tion owners. Lower process priority number means less CPU access, hence less -

intrusion to the actual workstation owners.

Our algorithm incorporated all of the above features, and the user can control the sit-
uation by choosing appropriate parameter values to run the program. However, here
we only discuss the ﬁrst two items, create more and uneven loads, since the other two
items are more situation dependent. We used two parameters, n_rounds(number of
rounds to distribute the subtasks) and di f f _size(chunk size difference between suc—
cessive processes), to determine how many rounds (each round has n_hosts subtasks)
of subtasks to create and how much difference in chunk size between the succes-
sive chunks. First of all, the total number of eigenvalues is split into nearly equal
chunks(with difference of at most one), then use the value n-1'ounds to further divide
the chunks into smaller ones and use the value of (12' f f _size to create difference among
the chunk sizes. In this way, a job queue is established with chunk sizes in descending
order. The rest of the program is just to distribute the chunks from this queue to the
slave processes until the queue is empty.

Since the whole job is divided into many small chunks to create more and smaller

subtasks, each job takes less time to ﬁnish and the processors that ﬁnish earlier can

78

get more subtask to process. As a whole, every host contributes and the hosts that
have less load(from other users) or faster CPU speed contribute more to the whole
problem. Hence, an overall balanced timing distribution is achieved. The experiment
results with this load balancing scheme is plotted in Figures 5.3 and 5.4. It should be
noted that time(in seconds) spent on computing is balanced among the participating
processes while the number of eigenvalues computed by each host is different.

Although items 3 and 4 above are not discussed in detail here, it is worth to
mention that two parameters, n.w(number of salve processes on each host machine)
and prio(process priority number), are used to determine how many slave processes to
spawn on each host machine and whether to honor the actual host owner by reducing
our process priority in our code.

Experiments showed that more subtasks create more overheads. In a homoge-
neous environment, uniform subdivision works slightly better than the nonuniform
subdivision method. But in heterogeneous environment or shared platform, this load

balancing scheme demonstrates a great advantage.

 

 

 

 

 

 

 

 

 

 

 

 

after load balance after load balance
60 17 I I I I T I j I I
1200 - -
50 - -
1000 - ~
A 40 - ~
3 3 800 - -
3 3° 3 600 - .
.— =13:
'—
20 t ‘ 4oo - -
10 - ‘ 200 - .
0 a 0
0 1 2 3 4 5 6 0 1 2 3 4 5 6
PE number PE number

Figure 5.3: After load balancing in shared environment, other jobs are running on PE
# 2 also. Left: Execution time of each workstation. Right: Number of eigenvalues

computed by each Alpha workstation. Matrix size 5000, type 4

79

after load balance

 

60

50

I

40

3O

Time(sec.)

20-

r

10

 

O

I

I

I

I

I

l

 

 

0

1

2

3 4 5 6 7 8
PEnumber

# of Eigs

after load balance

1 j I T I I I

 

 

 

 

 

 

 

 

 

1000» «
aoo~ -
600- .
400. .
200- ll
0

67

0 1 2 3 4 5
PEnumber

Figure 5.4: After load balancing in Heterogeneous environment, PE ## 1-5 are
Alpha workstations, PE ## 6.7 are SUN SparclOs. Left: Execution time of each

workstation. Right: Number of eigenvalues computed by each workstation. Matrix

size 5000, type 4

5.5.2 The pseudo-code

The following is the pseudo-code of the master and slave programs. The actual

master program is written by C since C handles administration better. The actual

slave program is written by Fortran since Fortran is better for scientiﬁc computation.

On the other hand, the LAPACK routines are written in Fortran.

The master program

input diagonal and off-diagonal;

get my process ID -- pid;

detect PVM configuration;

spawn slave process in this config(except me);

get the slave process IDs -- tid[];

create n\_rounds*n\_hosts uneven subtasks, successively differed

by diff\-size;

multi-cast diagonal and off-diagonal to all slave processes;

multi-cast slave IDs to all slave processes;
send the first round of subtasks to slaves;
while (received && subtask queue != empty){
unpack the results;
send another subtask to the slave process just finished;

}

output all eigenvalues;

The slave program

receive diagonal and sub-diagonal;
receive the first subtask and instructions;
call quasi-Laguerre subroutine to process the first subtask;
send results back to master;
while (received another subtask){
call quasi-Laguerre subroutine to process the subtask;
}
if (received exit instruction){

pvm-exit();

5.6 Performance test

We tested the algorithm on a cluster of DEC ALPHA workstations, a cluster of
SUN workstations and the mix of DEC and SUN workstations. We recorded total
computation time for each slave and the total time from spawning slave processes
until eigenvalues are all received. The result for a type 4 matrix of order 5000 on
different number of Alpha workstations is listed in Table 5.1. Husky, Collie, Bulldog,

Sheltie and Mongrel are workstations’ names. 100, 500, 1000, and so on are the order

81

of the symmetric tridiagonal matrix whose eigenvalues are computed. The number,
in the Collie row and 1000 column for example, is the total contributed time for
the workstation Collie to compute its share of the eigenvalues of the corresponding
matrix. Max and Min are the maximum and minimum slave time, respectively. The
effect of load balancing is reﬂected by the difference of these two numbers. Large
relative difference between these two numbers means unbalanced load. The T5H row
lists the whole time, from spawning slave processes till all eigenvalues are received,

for 5 slaves to compute all the eigenvalues. The communication to computation ratio,

CCR, is calculated by the T5S row and the Max row.

IT5S — Marl

CCR = Max

 

The Collie2 row lists the time to compute all the eigenvalues of the whole matrix
by Collie alone. The TlH row listed the whole time from spawning a slave process
to Collie until all eigenvalues are received from Collie. The relative PVM overhead,

OVHD, is calculated from these two rows,

TlH — Collie?
Collie?

 

OVHD =

The PEF Y row lists the parallel efficiency of the algorithm. It is calculated by the

following formula,

PEFY = T1H/5/T5H.

Note we did not include the master host in calculating the parallel efficiency. Table

5.1 shows our parallel quasi-Laguerre’s algorithm has the following advantages,
1. Communication cost is very low, nearly negligible for large matrices.

2. Parallel efﬁciency is very high and increases as the order of the matrix increases.

This directly reﬂects the full parallelability and full scalability of our algorithm.

3. The PVM software causes nearly no overhead to the algorithm.

82

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

100 500 1000 5000 10000 20000 40000 50000
Husky 0.04 0.33 0.98 19.28 73.88 286.71 1119.88 1780.84
Collie 0.03 0.29 0.99 19.58 73.88 292.99 1146.79 1818.00
Bulldog 0.02 0.29 0.98 19.22 74.27 288.82 1157.48 1853.14
Sheltie 0.02 0.29 0.99 19.39 74.72 289.76 1166.00 1847.76
Mongrel 0.02 0.29 0.97 19.19 73.76 289.33 1152.39 1829.34
Max 0.04 0.33 0.99 19.58 74.72 292.99 1166.00 1853.14
Min 0.02 0.29 0.97 19.22 73.76 286.71 1119.88 1780.84
T5S 0.79 0.59 1.29 19.9 75.14 293.74 1182.91 1854.66
CCR 1875% 78.8% 26.0% 1.63% 0.56% 0.26% 1.5% 0.1%
Collie2 0.059 0.98 3.74 89.55 358.51 1411.87 5663.02 9014.07
TlH 0.66 1.21 3.98 89.87 358.98 1412.87 5665.25 9016.91
OVHD 1017% 258.4% 6.42% 0.35% 0.13% 0.07% 0.04% 0.03%
PEF Y 17% 41% 62% 89% 95% 96% 96% 97%

 

 

Table 5.1: Performance test result on DEC ALPHA workstations
5.7 Comparison with parallel bisection and sequen-

tial root free QR

We tested type 1 to 7 matrices of order 5000 on six Dec Alphas workstations. Figure
5.5 shows the two conﬁgurations PVM machines used for our tests.

We run the parallel program(quasi-Laguerre and bisection) on different number
of host machines to compute all the eigenvalues of the seven types of matrices of
order 5000. We also run the root free QR program from LAPACK. The total time
for each run is recorded and the results are plotted in Figure 5.7. Root free QR
method could not take advantage of all available machines. For matrix types 1-5, our
parallel quasi-Laguerre’s algorithm beats root free QR with three or more machines.

For type 6 matrix, our parallel algorithm leads root free QR when there are four or

83

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

    

 

 

 

 

 

 

 

 

 

 

 

 

 

     

terrier

 

 

 

 

 

 

 

Figure 5.5: Two conﬁguration of the PVM machines. Top: formed by Dec Alpha
workstations. Bottom: formed by Dec Alpha and Sun Sparc10 workstations

84

more machines. For type 7 matrix, quasi-Laguerre wins root free QR with 5 or more

machines. In all cases, quasi-Laguerre outperforms bisection method.

85

Type 1 matrix Type 2 matrix

 

 

 

 

 

 

 

 

 

 

 

 

 

 

   

 

 

 

 

 

 

 

 

 

 

 

800 I I ﬁi I I 800 I I I I
700 - B/M 4— 700 - BM -0—
NQL -+--- NQL -+--—
600 ~ - 600 ~ -
’T 500 - . ’7‘ 500 r ‘
i. i.
E: 400 - - E 400 - .
r: 300 - - i: 300 - -
200 - - 200 - -
100 - + “““ - 100 - + ..... -
o 1 “if ...... T """ "l’ """ '1' 0 n T ...... T ..... 'I """ ‘1"
0 1 2 3 4 5 6 0 1 2 3 4 5 6
Number of PEs Number of PEs
Type 3 matrix Type 4 matrix
800 I I T I 800 I I I I
700 - BM -0— 700 - BM -0—
NQL -+--- NQL -+---
600 ~ - 600 - ~
<~ 500 - - a 500 - *
§ 3:
E 400 - - 27 400 ' ‘
t: 300 _ - i: 300 - -
200 - ~ 200 - -
100 - - 100 - -
0 0
0 1 2 3 4 5 6 0 1 2 3 4 5 6
Number of PEs Number of PEs
Type 5 matrix Type 6 matri
800 T 1 . . 800 I I . . .
700 - B/M -o— 700 - B/M +—
NQL -+--- NQL -+---
600 - - 600 ~ -
"-‘ 500 - - (5‘ 500 - -
3 a
g 400 - .. E 400 - -
i: 300 - - i: 300 - -
200 - - 200 ~ ~
100 - ~ 100 ~ +--,_ -
O 0 1 Ni. ...... ‘i- ----- T """ "I’
0 1 2 3 4 5 6 0 1 2 3 4 5 6
Number of PEs Number of PEs

Figure 5.6: Comparison between bisection — B / M, quasi-Laguerre — NQL, and root
free QR — RF QR

86

Type 7 matrix
80 I I ﬁ I I

70 r- B/M -¢—~
NQL -+---
60 - a
50 ~
40 -
30 -
20 f
10 -

 

d

Time(sec.)

 

 

 

 

2 3 4 5 6
Number of PBS

Figure 5.7: Comparison(c0ntinued) between bisection — B / M, quasi-Laguerre — NQL,
and root free QR - RFQR, on a random matrix of order 5000

87

Bibliography

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

P. ARBENZ, K. GATES, AND C. SPRENGER A Parallel implementation of the
symmetric tridiagonal qr algorithm, Proceedings of the Fourth Symposium on the

Frontiers of massively Parallel Computation, IEEE CS Press, 1992.

E. ANDERSON, Z. BAI, C. BISCHOF, J. DEMMEL, J. DONGARRA, J. DU
CROZ, A. GREENBAUM, S. HAMMARLING, A. MCKENNEY, S. OSTROUCHOV,
and D. SORENSON, LAPACK User’s Guide, SIAM, Philadelphia, 1992.

J. L. BARLOW The Parallel Solution of the Symmetric Eigenvalue Problem ,
Large-Scale Matrix Problems and the Numerical Solution of Partial Differential

Equations, Advances in Numerical Analysis Vol. III, Oxford, 1994

J. J. M. CUPPEN, A divide and conquer method for the symmetric tridiagonal
eigenproblem, Numer. Math., 36 (1981), pp. 177—195.

J. DEMMEL, Designing high performance symmetric eigenvalue software for par-

allel computers, http://http:.berkeley.edu/demmel, Feb 17, 1995.

J. J. DONGARRA AND D. C. SORENSEN, A fully parallel algorithm for the sym-
metric eigenvalue problem, SIAM J. Sci. Stat. Comput., 8 (1987), pp. 139—154.

Q. DU, M. JIN, T. Y. LI AND Z. ZENG, Quasi-Laguerre iteration, preprint,
Michingan State University, 1995

Q. DU, M. JIN, T.Y. Ll AND Z. ZENG Quasi-Laguerre iteration in solving

symmetric tridiagonal eigenvalue problems, Preprint, Michigan State University,

1995.

88

[9] L. V. FOSTER, Generalizations of Laguerre’s method: lower order methods,

preprint.

[10] A. GEIST, A. BEGUELIN, J. DONGARRA, W. JIANG, R. MANCHEK, V. SUN-
DERAM, PVM 3 User’s Guide and Reference Manual, September, 1994.

[11] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 2nd Ed., The
Johns Hopkins University Press, Baltimore, MD, 1989.

[12] R. T. GREGORY AND D. L. KARNEY, A Collection of Matrices for Testing
Computational Algorithms, Robert E. Krieger Publishing Company, Huntington,
New York, 1978.

[13] M. GU AND S. C. EISENSTAT, A divide-and-conquer algorithm for the symmet-
ric tridiagonal eigenproblem, SIAM J. Matrix Anal. Appl., Vol. 16, No. 1 (1995),
pp. 172-191.

[14] I. IPSEN & E. J ESSUP Solving the symmetric tridiagonal eigenvalue problem on
the hypercube, SIAM journal of Scientiﬁc and Statistical Computing, Vol. 11. pp.
203-229, March 1990.

[15] M. JlN Quasi-Laguerre ’3 method and application to symmetric tridiagonal eigen-

value problem, Ph.D thesis, 1995.

[16] W. KAHAN, Notes On Laguerre’s Iteration, preprint, University of California,
Berkeley (1992).

[17] J.L. Lagouanelle. Sur une me’thode de calcul de l'ordre de multiplicite’ des ze'ros

d'un polynéme. C. R. Acad. Sci. Paris Sér. A. 262(1966). 626-627.

[18] C. TREFFTZ, C. C. HUANG, P. MCKINLEY, T. Y. L1, AND Z. ZENG, A

scalable eigenvalue solver for symmetric tridiagonal matrices, to appear, Parallel

Comput.

89

[19] T. Y. LI AND Z. ZENG, Laguerre ’s iteration in solving the symmetric tridiagonal
eigenproblem — revisited, SIAM J. Sci. Comput., Vol. 15, No. 5 (1994), pp. 1145-
1173.

[20] T. Y. LI AND Z. ZENG, Homotopy-determinant algorithm for solving non-

symmetric eigenvalue problem , Mathematics of Computation, Vol. 59, No. 200

(1992), pp. 483-502.

[21] T.Y. LI AND X. ZOU , 0n the global Newton’s method and global Secant method,
preprint, Michigan State University, 1995.

[22] D. G. LUENBERGER Linear and nonlinear programming, Reading, Mass.
Addison-Wesley, 1984.

[23] B. N. PARLETT, The use of a reﬁned error bound when updating eigenvalues of
tridiagonal, Lin. Alg. & Appls., Vol. 68 (1985), pp. 179-219.

[24] B. N. PARLETT, Orthogonal eigenvectors without Gram-Schmidt, Dundee Nu-

merical Analysis Conference, 1995.

[25] B. N. PARLETT, The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood
Cliffs, NJ, 1980.

[26] M. Petkovié. Iterative Methods for Simultaneous Inclusion of Polynomial Zeros.

Lecture Notes in Mathematics. Springer-Verlag 1989.

[27] J. RUTTER,A serial implementation of Cuppen’s divide and conquer algorithm
for the symmetric eigenvalue problem, LAPACK lawn 69 (1994).

[28] A. SAMEH AND D. KUCK, A parallel QR algorithm for symmetric tridiagonal
matrices, IEEE Transactions on Computers, no. C-26, pp.81-91, 1977.

[29] D. C. SORENSEN AND P. T. P. TANG, 0n the orthogonality of eigenvectors
computed by divide-and—conquer techniques, SIAM. J. Numer. Anal., 28 (1991),
pp. 1752 - 1775.

90

[30] J. H. WILKINSON, The Algebraic Eigenvalue Problem, Oxford University Press,
Oxford, 1965.

91