Michigan State University Thesis

This is to certify that the thesis entitled FUNCTION MINIMIZING ALGORITHMS presented by David George McDowell has been accepted towards fulfillment of the requirements for the Ph.D. degree in Mathematics.

Major professor

Date 7-31-70

ABSTRACT

FUNCTION MINIMIZING ALGORITHMS

By David George McDowell

We consider the problem of minimizing a real valued function f on n-dimensional Euclidean space by the method of conjugate directions. First, the case when f is quadratic with a positive semi-definite coefficient matrix is examined. Two examples of the conjugate direction method are considered: the Fletcher-Powell formulation of the Davidon algorithm and the conjugate gradient algorithm of M. R. Hestenes. An extension of the Fletcher-Powell method is given and shown to be theoretically equivalent to an extension of the conjugate gradient method given by M. R. Hestenes. As a result of this equivalence we show that the conjugate gradient and Fletcher-Powell methods are equivalent.

These two methods are then applied to the minimization of non-quadratic functions. Convergence is shown and conditions on the rates of convergence are derived. The rate of convergence for the conjugate gradient method is shown to be geometric, and for the Fletcher-Powell method better than geometric.

Finally, a general algorithm for the minimization of non-quadratic functions is given. The above two methods are essentially special cases of this algorithm. The rate of convergence for this general method is shown to be at least geometric.
FUNCTION MINIMIZING ALGORITHMS

By David George McDowell

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Mathematics

1970

TABLE OF CONTENTS

Chapter I. INTRODUCTION
  1. The Problem
  2. Preliminaries
  3. Iterative Procedures

Chapter II. GENERAL ALGORITHMS
  4. Methods for Generating Conjugate Direction Vectors
  5. The Equivalence of Algorithms (4.1) and (4.3)
  6. Converse to Theorem 4.1 and Theorem 4.2
  7. The cg-Method
  8. The FP-Method
  9. Second Method for Constructing Vectors that are N-Conjugate

Chapter III. APPLICATIONS TO NON-QUADRATIC FUNCTIONS
  10. Algorithms for Non-quadratic Functions
  11. The FP-algorithm for Non-quadratic Functions
  12. The cg-algorithm for Non-quadratic Functions
  13. Minimizing Non-quadratic Functions

BIBLIOGRAPHY

INTRODUCTION

1. The Problem.

This thesis deals with the study of iterative procedures for minimizing real valued functions on Euclidean n-space. It is divided into three parts: the first two parts deal with quadratic functions and the third part with non-quadratic functions.

In particular, the first chapter considers quadratic functions whose leading coefficients are positive semi-definite. The iteration method of conjugate directions is given along with a geometric interpretation. This material is an extension of results found in [1]* and [2] by E. Stiefel and M. R. Hestenes.

The second chapter starts with a description of the two main algorithms. The first is by M. R. Hestenes [2]. The second is an extension of Davidon's method [7] as formulated by Fletcher and Powell [3].
Properties of the algorithms are also given. The equivalence of the two algorithms is then proved. A special case of this equivalence is shown by G. Myers [5].

*Numbers in square brackets refer to the bibliography.

Two special cases of the algorithms are given for minimizing a quadratic function with positive semi-definite leading coefficient. These are the conjugate gradient method and the Fletcher-Powell formulation of the Davidon method. The case when the leading coefficient is positive definite can be found in [1], [2], and [3]. It is shown that these two cases of the main algorithms are also equivalent.

In [2], M. R. Hestenes proved that every conjugate direction method is a conjugate gradient method. From this and the above results we are able to conclude that every conjugate direction method is also a Fletcher-Powell method.

The last section of Chapter II shows that minimizing the quadratic form by the method of conjugate directions is equivalent to performing Gauss elimination on the coefficient matrix. The case when the matrix is positive definite is done in [1].

The final chapter applies the minimizing algorithms to non-quadratic functions. Convergence for the algorithms is shown and conditions on the rates of convergence are derived. The Fletcher-Powell method is shown to converge faster than geometrically. Two forms of the conjugate gradient method are given, and both are shown to converge geometrically. The second scheme was suggested in [9]. In numerical tests [4], the second method did show better convergence than the first method.

The algorithms in the last chapter, rather than being given in terms of the function to be minimized, are presented in a form that is convenient for comparing and deriving convergence results. The equivalence between these algorithms and the minimization of a non-quadratic function is given in the last section.

2. Preliminaries.
We consider the quadratic form

(2.1)    E(x) = (r, Rr)

with

(2.2)    r = k - Ax,

where R and A are n x n matrices and x, k are n-dimensional vectors. It will be assumed that the elements of R, A, x, and k are real numbers, and that the matrix R is positive definite. We are concerned with finding a minimum of (2.1), which is also a solution to the linear equation

(2.3)    Ax = k

(if one exists), by searching in conjugate directions.

Given any two vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_n) we have

    x + y = (x_1 + y_1, ..., x_n + y_n),
    ax = (a x_1, ..., a x_n),

where a is a scalar. The inner product of two vectors is given by

    (x, y) = x_1 y_1 + ... + x_n y_n

and the length of the vector x is given by |x| = (x, x)^(1/2). The conjugate transpose of a matrix A will be denoted by A*, and A* satisfies the relation (Ax, y) = (x, A*y). The matrix A is said to be positive definite if (x, Ax) >= 0 for all x, with equality if and only if x = 0. It is positive semi-definite if (x, Ax) >= 0 for all x. If A is positive semi-definite then A = A*. The null space of a matrix A, denoted n(A), is the set of all vectors that A maps into the zero vector. If A is positive semi-definite and (x, Ay) = 0, with x and y not in the null space of A, then x and y are said to be A-orthogonal or A-conjugate.

At each vector or point x, the vector defined by (2.2) will be called the residue at x. The function E(x) defined by (2.1) will be called the error function. This comes from the fact that E(x) is a measure of the accuracy with which x approximates a solution to (2.3). The error function E(x) = 0 if and only if x is a solution to (2.3), that is, r(x) = 0. The error function is a positive quadratic form, and in the following work we will be looking for a minimum of E(x). In the case where (2.3) has a solution, a point minimizes E(x) if and only if it is also a solution to (2.3). Let M be the set of all points that minimize E(x), that is,

    M = {x | E(x) = minimum}.

If A is nonsingular then M contains a single point.
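The definitions above can be exercised numerically. The following sketch uses invented illustrative data (not from the thesis): A is chosen singular, so the minimizing set M contains a whole line of points.

```python
import numpy as np

# Sketch of the error function E(x) = (r, Rr) with r = k - Ax.
# All matrices here are invented illustrative data.
A = np.array([[2.0, 0.0],
              [0.0, 0.0]])          # singular: rank 1
R = np.eye(2)                       # any positive definite R
k = np.array([4.0, 0.0])

def E(x):
    r = k - A @ x
    return r @ R @ r

x_star = np.array([2.0, 0.0])       # one solution of Ax = k
assert abs(E(x_star)) < 1e-12
# adding any vector from the null space of A leaves E at its minimum,
# so here M = x_star + n(A) is a whole line
for t in (1.0, -3.0, 10.0):
    assert abs(E(x_star + t * np.array([0.0, 1.0]))) < 1e-12
```

Because R is positive definite, E vanishes exactly on the solution set of Ax = k, which illustrates why M is a single point only when A is nonsingular.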
If A is singular then M generally contains more than one point. It should be noted that any positive quadratic form can be put in the form of (2.1) plus a constant. Therefore our methods for minimizing (2.1) hold for all positive quadratic forms.

Given two vectors x and y we have

(2.4)    E(x + y) = E(x) - 2(g, y) + (y, Ny),

where

    g = A*Rr,    N = A*RA,

r is given by (2.2), and A is an arbitrary n x n matrix. The matrix N is positive semi-definite. The vector g will be called the gradient of E(x) at x; it differs from the usual gradient by the factor -1/2. Setting y = tp in equation (2.4), where t is a scalar, we obtain

(2.5)    E(x) - E(x + tp) = 2t(g, p) - t^2 (p, Np).

Using the following lemma we will obtain some properties of the error function.

Lemma 2.1  Given N = A*RA where R is positive definite, the following hold:
a) p is in n(A) if and only if p is in n(N);
b) p is in n(N) if and only if (p, Np) = 0;
c) p in n(N) implies (g, p) = 0.

Proof. p in n(A) implies p in n(N) by the definition of N. If p is in n(N) then

    0 = (p, Np) = (p, A*RAp) = (Ap, RAp).

Thus Ap = 0 and p is in n(A). This completes the proof of a), and b) follows from a) and the proof of a). To show c) we have

    (g, p) = (p, A*Rr) = (Ap, Rr) = 0

since Ap = 0.

We can now derive some elementary properties of E(x).

Lemma 2.2  Given vectors x and p, with p not in n(N), the function E(x + tp) considered as a function of t is minimized when t = a, where

    a = (g, p) / (p, Np).

We also have

    E(x) - E(x + ap) = a^2 (p, Np);

if p is in n(N), then E(x) - E(x + tp) = 0 for all t.

Proof. If p is not in n(N) then by Lemma 2.1 we have (p, Np) != 0. Hence minimizing E(x) - E(x + tp) as a function of t yields t = a and

    E(x) - E(x + ap) = 2a(g, p) - a^2 (p, Np) = a^2 (p, Np).

The last part follows directly from (2.5) and Lemma 2.1.

Lemma 2.3  Let B be a linear subspace of E^n. Then E(x + y) >= E(x) for all y in B if and only if g is orthogonal to B.

Proof. Equation (2.4) implies the inequality holds whenever (g, y) = 0 for all y in B. For the converse, note first that (g, p) = 0 for any p in B that lies in n(N), by Lemma 2.1;
if (g, p) != 0 for some vector p in B and not in n(N), then from Lemma 2.2

    E(x) - E(x + ap) = a^2 (p, Np) > 0

and the inequality does not hold when y = ap.

Recall that the set M was defined as the set of points that minimize the function E(x). We now obtain an expression for the set M.

Lemma 2.4  M = n(N) + h, where h satisfies the inequality E(h) <= E(x) for all x in E^n.

Proof. If v is in n(N) then by (2.4)

    E(h + v) = E(h) - 2(g, v) + (v, Nv) = E(h)

by Lemma 2.1. Thus h + n(N) is contained in M. Let h' be in M and set v = h' - h, so that h' = h + v and E(h') - E(h) = 0 by the definition of M. Suppose v is not in n(N); then Lemma 2.1 implies (v, Nv) != 0. We have

    E(h) - E(h + av) = a^2 (v, Nv) <= 0

since h minimizes E(x). Thus a = 0 and (g, v) = 0. Therefore

    0 = E(h') - E(h) = E(h + v) - E(h) = (v, Nv) != 0.

This is a contradiction to the assumption that v is not in n(N). Therefore v is in n(N) and M is contained in the set h + n(N).

Lemma 2.5  g(x) = 0 if and only if x is in M.

Proof. g(x) = 0 implies

    E(x) - E(x + tp) = -t^2 (p, Np) <= 0

for all t and all p, and hence x is in M. Suppose g(x) != 0. If x is in M we have E(x) - E(y) <= 0 for all y in E^n. In particular, let y = x + tg; then

    E(x) - E(x + tg) = 2t|g|^2 - t^2 (g, Ng) > 0

for t sufficiently small and positive. This is a contradiction to x being in M. Therefore x is not in M.

These lemmas give us some of the basic properties concerning the error function and the linear manifold M. Before considering finite iteration methods we note that

    g(x) = A*Rr(x) = A*R(k - Ax) = k' - Nx,    where k' = A*Rk.

The previous lemma shows that x is in M if and only if k' - Nx = 0. We know from the preliminary remarks that x in M solves Ax = k if and only if Ax = k has a solution. Therefore the two linear systems Ax = k and Nx = k' are equivalent if and only if Ax = k has a solution.

3. Iterative procedures.

We will be considering iterations

(3.1)    x_{i+1} = x_i + t_i p_i,    i = 0, 1, ...

such that x_m is in M, m <= n. Here p_i is a direction vector and t_i is a scalar.
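Before specifying how the p_i are chosen, the iteration (3.1) with the minimizing step of Lemma 2.2 can be sketched numerically. The data below are invented; the directions are taken to be eigenvectors of N, which are automatically N-conjugate, whereas Chapter II constructs conjugate directions without an eigendecomposition.

```python
import numpy as np

# Sketch (invented data) of iteration (3.1) with the optimal step
# t_i = a_i = (g_i, p_i)/(p_i, N p_i) from Lemma 2.2.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # nonsingular here, so M is one point
R = np.eye(2)
k = np.array([1.0, 5.0])
N = A.T @ R @ A

def E(x):
    r = k - A @ x
    return r @ R @ r

x = np.zeros(2)                     # arbitrary x_0
_, vecs = np.linalg.eigh(N)         # eigenvectors of N are N-conjugate
for i in range(2):                  # m = n = 2 steps
    p = vecs[:, i]
    g = A.T @ R @ (k - A @ x)       # the gradient g of (2.4)
    a = (g @ p) / (p @ N @ p)
    # Lemma 2.2: the decrease along p is exactly a^2 (p, Np)
    assert np.isclose(E(x) - E(x + a * p), a**2 * (p @ N @ p))
    x = x + a * p

assert np.allclose(A @ x, k)        # x_2 is in M: it solves Ax = k
```

After two exact line minimizations along N-conjugate directions, the iterate reaches M, which is the content of Lemma 3.1 below.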
The iterative procedures will be used to minimize the error function E(x). We set

(3.2)    r_i = k - Ax_i        a_i = c_i/d_i
         g_i = A*Rr_i          c_i = (g_i, p_i)
         N = A*RA              d_i = (p_i, Np_i)

and we have from Lemma 2.2 that E(x_{i+1}) = E(x_i + t_i p_i), as a function of t_i, attains its minimum when t_i = a_i if p_i is not in n(N), while E(x_{i+1}) - E(x_i) = 0 for all t_i when p_i is in n(N). Therefore for an arbitrary initial point x_0 our iteration will take the form

(3.3)    x_{i+1} = x_i + a_i p_i
         r_{i+1} = r_i - a_i Ap_i,    r_0 = k - Ax_0

with

    a_i = (g_i, p_i) / (p_i, Np_i)   if p_i is not in n(N),
    a_i = 0                          if p_i is in n(N).

Since E(x_{i+1}) - E(x_i) = 0 for all t_i when p_i is in n(N) and x_{i+1} = x_i + t_i p_i, we pick t_i = a_i = 0 for convenience. Thus the iteration is completely determined by our choice of p_0, p_1, ..., once x_0 and R have been chosen.

We have the following definitions concerning p_0, p_1, ..., p_r, r <= n-1. We say the vectors p_0, p_1, ..., p_r are linearly independent if

    a_0 p_0 + ... + a_r p_r = 0   implies   a_i = 0, i = 0, ..., r.

If B is an n x n matrix, then the vectors p_0, p_1, ..., p_r are B-linearly independent if

    a_0 p_0 + ... + a_r p_r in n(B)   implies   a_i = 0, i = 0, ..., r.

Clearly B-linear independence implies linear independence, and if B is nonsingular then the two definitions are equivalent.

Let m = rank of A; then n - m = dimension of n(A). We also have m = rank of N and n - m = dimension of n(N). The next two lemmas give the properties concerning the vectors p_0, p_1, ....

Lemma 3.1  If m vectors, m <= n, are chosen so that

(3.4)    d_i = (p_i, Np_i) > 0,    (p_i, Np_j) = 0,  i != j,

then p_0, ..., p_{m-1} are N-linearly independent and hence also A-linearly independent. Independent of the choice of x_0, the vector x_m defined by (3.2), (3.3) is in M. We also have the following conditions satisfied:

(3.5)    (g_i, p_j) = 0,           j = 0, ..., i-1,
(3.6)    (g_i, p_i) = (g_j, p_i),  j = 0, ..., i-1.

Proof. We first note that d_i > 0 implies that p_i is not in n(N). Condition (3.6) shows that if

    a_i = (g_i, p_i) / (p_i, Np_i),

then

    a_i = (g_0, p_i) / (p_i, Np_i).
We note that if

    c_0 p_0 + ... + c_{m-1} p_{m-1}

is in n(N), then N(c_0 p_0 + ... + c_{m-1} p_{m-1}) = 0; therefore

    c_0 Np_0 + ... + c_{m-1} Np_{m-1} = 0   and   c_j (p_j, Np_j) = 0

for j = 0, ..., m-1. Hence c_j = 0, j = 0, ..., m-1, and p_0, ..., p_{m-1} are N-linearly and also A-linearly independent.

We next prove conditions (3.5) and (3.6). We have

    g_{i+1} = A*Rr_{i+1} = g_i - a_i Np_i

and hence

    (g_{i+1}, p_j) = (g_i, p_j) - a_i (Np_i, p_j).

For i != j this becomes, by (3.4),

    (g_{i+1}, p_j) = (g_i, p_j),

and the formula for a_i implies (g_{i+1}, p_i) = 0. Therefore we get

    (g_i, p_j) = 0   for i > j.

We also obtain

    (g_{i+1}, p_j) = (g_i, p_j) - a_i (Np_i, p_j) = (g_i, p_j),   j > i,

which implies

    (g_i, p_i) = (g_{i-1}, p_i) = ... = (g_0, p_i).

This completes the proof of (3.5) and (3.6).

To show that x_m is in M we note that

    (g_m, p_j) = 0,   j = 0, ..., m-1,

and g_m is orthogonal to n(N) by Lemma 2.1. The dimension of n(N) is n - m; thus g_m is orthogonal to all of E^n. Hence g_m = 0 and Lemma 2.5 implies x_m is in M.

We have as a converse to Lemma 3.1 the following lemma.

Lemma 3.2  Let the vectors p_0, p_1, ..., p_{m-1} be A-linearly independent. Then there exists a positive definite matrix R such that (3.4) holds with N = A*RA. If in iteration (3.1) x_m is in M, then t_i = a_i.

Proof. Let P be the matrix whose columns are p_0, ..., p_{m-1}. Let Q = AP with columns v_0, ..., v_{m-1}, that is, v_i = Ap_i, i = 0, ..., m-1. The vectors v_0, ..., v_{m-1} are linearly independent since the vectors p_0, ..., p_{m-1} are A-linearly independent. Extend v_0, ..., v_{m-1} to a basis for E^n. Thus we have n linearly independent vectors v_0, ..., v_{n-1}. Let Q' be the nonsingular matrix whose columns are v_0, ..., v_{n-1}. If we set R = (Q'Q'*)^{-1}, then R is a positive definite matrix. We have

    Q'*RQ' = Q'*(Q'Q'*)^{-1}Q' = I,

that is,

    (v_i, Rv_j) = 0,  i != j,    and    (v_i, Rv_i) = 1.

Therefore if we set N = A*RA we have

    P*NP = (AP)*R(AP) = Q*RQ = I,

and (3.4) holds.

To show t_i = a_i we note that

    x_m = x_i + t_i p_i + ... + t_{m-1} p_{m-1}

and r_m = k - Ax_m. Since x_m is in M we have g_m = 0; therefore

    g_m = A*Rr_m = A*R(k - Ax_m) = k' - Nx_m = 0.
Now

    g_i = A*Rr_i = k' - Nx_i = N(x_m - x_i) = t_i Np_i + ... + t_{m-1} Np_{m-1}.

Therefore (g_i, p_i) = t_i (p_i, Np_i), with p_i not in n(N), and hence

    t_i = (g_i, p_i) / (p_i, Np_i) = a_i.

An iteration (3.3) in which the vectors p_0, ..., p_{m-1} are chosen as in (3.4) is called the method of conjugate directions (cd-method) by M. R. Hestenes and E. Stiefel [1]. We have the following theorem concerning the cd-method.

Theorem 3.1  In the cd-method the point x_i minimizes E(x) on the line x = x_{i-1} + t p_{i-1} and also on the i-dimensional plane of points

    x = x_0 + t_0 p_0 + ... + t_{i-1} p_{i-1}.

The point x_i is the center of the (i-1)-dimensional ellipsoid which is the intersection of this plane with the level curve E(x) = E(x_0).

This theorem follows from Lemmas 2.2, 2.3, and 3.1. It differs from Theorem 3.1 in [2] because there the level curve E(x) = E(x_0) is an (n-1)-dimensional ellipsoid, whereas here this level curve is an (m-1)-dimensional ellipsoidal cylinder with center M.

Recall that the set M = h + n(N), where h is a point that minimizes E(x). When the matrix A is nonsingular, the level curves E(x) = constant form a one-parameter family of (n-1)-dimensional ellipsoids with center M (a single point), the solution to Ax = k. In the case where A is singular and has rank m, the level curves E(x) = constant form a one-parameter family of (m-1)-dimensional ellipsoidal cylinders with center M. In order to see this we consider the linear mapping A: E^n -> D, where D is an m-dimensional linear subspace of E^n and the dimension of n(A) is n - m. We have E^n = n(A) + B_m, where B_m is an m-dimensional linear subspace of E^n and A: B_m -> D is one-to-one. Let A' be a nonsingular matrix such that A'x = Ax for all x in B_m, and set E'(x) = (k - A'x, R(k - A'x)), so that E'(x) = E(x) for all x in B_m. The level curves E'(x) = constant form a one-parameter family of (n-1)-dimensional ellipsoids. Thus the intersection of the level curves E'(x) = constant with B_m forms a one-parameter family of (m-1)-dimensional ellipsoids in B_m. But since E(x)
and E'(x) agree on B_m, the intersections of the level curves E(x) = constant with B_m are likewise (m-1)-dimensional ellipsoids in B_m, whose centers minimize E(x) in E^n. We also have E^n = n(A) + B_m and E(x + q) = E(x) for all q in n(A); hence the level curves E(x) = constant are (m-1)-dimensional ellipsoidal cylinders with center axis M.

If in Theorem 3.1 we let

    B_i = { t_0 p_0 + ... + t_{i-1} p_{i-1} | t_j arbitrary },    i = 1, ..., m,

then B_i is an i-dimensional linear subspace generated by i A-linearly independent vectors, and B_m is as above. We have B_i contained in B_{i+1} for each i; hence the level curve E(x) = constant intersects B_i in an (i-1)-dimensional ellipsoid.

In the cd-method we have an arbitrary initial point x_0. This results in a displacement of each B_i by an amount x_0. The geometrical interpretation is the same as above, except that we are dealing with hyperplanes that are not subspaces. The points x_0, ..., x_i are in x_0 + B_i, and x_i is the center of the (i-1)-dimensional ellipsoid which is the intersection of the level curve E(x) = E(x_0) and x_0 + B_i. The point x_m minimizes E(x) and is the intersection of x_0 + B_m and M.

The previous two sections are an extension of the results found in [1] and [2] to the case where A is a singular matrix. We now consider various methods for generating vectors that satisfy condition (3.4).

II. GENERAL ALGORITHMS

4. Methods for generating conjugate direction vectors.

Given a positive semi-definite matrix N, we can generate vectors that are N-conjugate by taking n basis vectors u_1, ..., u_n and then N-orthogonalizing them by a Gram-Schmidt process. This method will be looked at in section 9. In this section we will present two algorithms. The first is by M. R. Hestenes and can be found in [2]. The second method is an extension of the Davidon algorithm, as formulated by Fletcher and Powell in [3]. We will show in section 5 that the two algorithms are equivalent.

The following theorem is due to M. R. Hestenes and can be found in [2].
Theorem 4.1  Let K and N be positive semi-definite matrices and let g_0 be an arbitrary vector not in n(K). The algorithm

(4.1)    p_0 = Kg_0
         g_{i+1} = g_i - a_i Np_i
         p_{i+1} = Kg_{i+1} + b_i p_i
         a_i = c_i/d_i,    b_i = e_i/d_i
         d_i = (p_i, Np_i),    c_i = (g_i, p_i),    e_i = -(Np_i, Kg_{i+1})

generates non-zero vectors g_0, g_1, ... and p_0, p_1, ... satisfying the relations

(4.2)    (g_i, Kg_j) = 0,    (p_i, Np_j) = 0,    i != j.

The algorithm will terminate in r steps, where r <= min(m, s), m is the rank of N, and s is the rank of K. At the r-th step one of the following will hold:

i) g_r = 0;
ii) g_r != 0, Kg_r = 0;
iii) Kg_r != 0, Np_r = 0 with p_r != 0.

Corollary 1.  If K is positive definite and (z, g_0) = 0 for all z in n(N), then algorithm (4.1) will terminate in r steps, r <= rank of N, if and only if g_r = 0.

Proof. Clearly case ii) of Theorem 4.1 cannot occur with K positive definite. If (z, g_0) = 0 for all z in n(N), then

    (z, g_i) = (z, g_0) - a_0 (z, Np_0) - ... - a_{i-1} (z, Np_{i-1}) = 0

for all z in n(N) and i <= r. In case iii), Np_r = 0 means that

    p_r = Kg_r + b_{r-1} p_{r-1}

is in n(N). Therefore

    0 = (g_r, p_r) = (g_r, Kg_r) + b_{r-1} (g_r, p_{r-1}) = (g_r, Kg_r).

Since K is positive definite, g_r = 0, and the algorithm terminates in r steps if and only if g_r = 0.

Corollary 2.  If both K and N are positive definite, then (4.1) will terminate in r steps, r <= n, if and only if g_r = 0.

It will be convenient to describe the material dealing with the Davidon algorithm as formulated by Fletcher and Powell in terms of the Dirac bracket notation; Fletcher and Powell used this notation in [3]. The reason for this is that outer products of vectors occur in this algorithm and have a convenient formulation in this notation. A column vector (x_1, ..., x_n) is written as |x>. The row vector with these same elements is denoted by <x|. The scalar product of <x| and |y> is written <x|y>. The notation |x><y| denotes the n x n matrix formed as the outer product of the column vector |x> with the row vector <y|. Thus A|x> is a column vector and <x|A|y> is a scalar.

The following theorem gives an algorithm that generates vectors which are N-conjugate. It is an extension of the Davidon algorithm as formulated by Fletcher and Powell [3].
Theorem 4.2  Given positive semi-definite matrices H_0 and N and a vector g_0 not in n(H_0), the algorithm

(4.3)    |s_0> = H_0|g_0>
         |g_{i+1}> = |g_i> - a_i N|s_i>
         |s_{i+1}> = H_{i+1}|g_{i+1}>
         H_{i+1} = H_i + A_i + B_i,

where

    A_i = |sigma_i><sigma_i| / <sigma_i|y_i>,    B_i = -H_i|y_i><y_i|H_i / <y_i|H_i|y_i>,
    |sigma_i> = a_i|s_i>,    |y_i> = |g_i> - |g_{i+1}>,
    a_i = c_i/d_i,    c_i = <g_i|s_i>,    d_i = <s_i|N|s_i>,

generates non-zero vectors s_0, s_1, ... satisfying the relation

    <s_i|N|s_j> = 0,    i != j.

The algorithm terminates in r steps when one of the following holds:

i) |g_r> = 0;
ii) |g_r> != 0, H_{r-1}|g_r> = 0;
iii) H_{r-1}|g_r> != 0, N|s_r> = 0 with |s_r> != 0.

Theorem 4.2 will be proved by means of two lemmas.

Lemma 4.1  The scalars a_i, c_i, and d_i are positive for i < r. Moreover, the following hold:

(4.4)    a) <sigma_i|g_{i+1}> = 0
         b) H_{i+1}N|sigma_i> = |sigma_i>
         c) <sigma_i|N|sigma_{i+1}> = 0
         d) <x|H_{i+1}|x> >= 0 for all x
         e) <g_i|H_i|g_{i+1}> = 0
         f) <sigma_{i-1}|g_{i+1}> = 0
         g) <y_i|H_i|y_i> > 0.

The algorithm terminates at the r-th step if and only if one of the following holds: i) |g_r> = 0; ii) |g_r> != 0, H_{r-1}|g_r> = 0; iii) H_{r-1}|g_r> != 0, N|s_r> = 0, and in this case |s_r> != 0.

Proof. The scalar a_i is chosen so that <s_i|g_{i+1}> = 0; therefore

    <sigma_i|g_{i+1}> = a_i <s_i|g_{i+1}> = 0.

To show b) we have

    H_{i+1}N|sigma_i> = H_{i+1}|y_i> = H_i|y_i> + A_i|y_i> + B_i|y_i> = H_i|y_i> + |sigma_i> - H_i|y_i> = |sigma_i>.

For c),

    0 = <g_{i+1}|sigma_i> = <g_{i+1}|H_{i+1}N|sigma_i> = <s_{i+1}|N|sigma_i>,

implying <sigma_{i+1}|N|sigma_i> = 0.

We prove d) by induction. We are given H_0 positive semi-definite. Assume H_1, ..., H_k are positive semi-definite and let |p> = (H_k)^{1/2}|x>, |q> = (H_k)^{1/2}|y_k>, with |q> != 0. Then

    <x|H_{k+1}|x> = <p|p> - <p|q>^2 / <q|q> + <x|sigma_k>^2 / <sigma_k|y_k>.

Since <p|p><q|q> - <p|q>^2 >= 0 by Schwarz's inequality, and <sigma_k|y_k> = <sigma_k|N|sigma_k> > 0, we have <x|H_{k+1}|x> >= 0 for all x.

To show e) we observe that

    <g_i|H_i|g_{i+1}> = <s_i|g_{i+1}> = 0.

Now

    <sigma_{i-1}|g_{i+1}> = <sigma_{i-1}|g_i> - <sigma_{i-1}|N|sigma_i> = 0,

establishing f). For g) we note that

    <y_i|H_i|y_i> = <g_i - g_{i+1}|H_i|g_i - g_{i+1}> = <g_i|H_i|g_i> + <g_{i+1}|H_i|g_{i+1}>

by e). Since <g_i|H_i|g_i> > 0 for i < r (H_i is positive semi-definite and H_i|g_i> = |s_i> != 0) and <g_{i+1}|H_i|g_{i+1}> >= 0, we have <y_i|H_i|y_i> > 0. It follows that c_i = <g_i|s_i> = <g_i|H_i|g_i> > 0, d_i = <s_i|N|s_i> > 0, and a_i = c_i/d_i > 0.

We now prove the last statement of the lemma. If |g_r> = 0 then |s_r> = H_r|g_r> = 0 and the algorithm terminates. If |g_r> != 0 and H_{r-1}|g_r> = 0, then

    |s_r> = H_r|g_r> = H_{r-1}|g_r> + A_{r-1}|g_r> + B_{r-1}|g_r>

with A_{r-1}|g_r> = 0 and B_{r-1}|g_r> = 0; thus |s_r> = 0 and the algorithm terminates.
If H_{r-1}|g_r> != 0 and N|s_r> = 0, then |y_r> = N|sigma_r> = 0 and hence H_{r+1} is undefined, since both A_r and B_r are undefined. Therefore the algorithm terminates in r steps. In this case we have |s_r> != 0; for if |s_r> = 0, then H_r|g_r> = 0 and, since A_{r-1}|g_r> = 0 by a),

    H_{r-1}|g_r> + B_{r-1}|g_r> = 0.

Using H_{r-1}|y_{r-1}> = |s_{r-1}> - H_{r-1}|g_r>, writing this equation out gives

    (1 + t) H_{r-1}|g_r> = t |s_{r-1}>,    where t = <y_{r-1}|H_{r-1}|g_r> / <y_{r-1}|H_{r-1}|y_{r-1}>.

If t != 0 (note t = -1 is impossible, since then |s_{r-1}> = 0), then

    0 = <g_r|sigma_{r-1}> = a_{r-1} ((1 + t)/t) <g_r|H_{r-1}|g_r>,

implying H_{r-1}|g_r> = 0, which is a contradiction. If t = 0 then 0 = H_r|g_r> = H_{r-1}|g_r>, which again contradicts the assumption in case iii) that H_{r-1}|g_r> != 0.

If none of the three conditions hold, then |y_r> = N|sigma_r> != 0. Also H_r|y_r> != 0, for if H_r|y_r> = 0 then H_r|g_r> = H_r|g_{r+1}> and

    <g_r|H_r|g_r> = <g_r|H_r|g_{r+1}> = 0.

Hence |s_r> = H_r|g_r> = 0 and N|sigma_r> = 0, which is a contradiction to the assumption N|sigma_r> != 0; thus H_r|y_r> != 0. We therefore have

    <sigma_r|y_r> = <sigma_r|N|sigma_r> > 0    and    <y_r|H_r|y_r> > 0,

hence both A_r and B_r are defined and H_{r+1} is a positive semi-definite matrix. Thus the algorithm does not terminate at the r-th step, and

    |s_{r+1}> = H_{r+1}|g_{r+1}>,    with |g_{r+1}> = |g_r> - N|sigma_r>.

This completes the proof of the lemma.

Lemma 4.2  The vectors s_0, s_1, ... generated by algorithm (4.3) satisfy the following relations:

(4.5)    a) <sigma_i|N|sigma_j> = 0,   i != j
         b) H_k N|sigma_j> = |sigma_j>,   j < k
         c) <sigma_i|g_j> = 0,   i = 0, ..., j-1
         d) <s_i|g_j> = c_i,   j = 0, ..., i.

Proof. In view of Lemma 4.1, relations (4.5) a), b), and c) hold for i, j <= 1. Assume they hold for i, j <= k; we shall prove that they hold for i, j <= k+1. We have

    |g_k> = |g_{i+1}> - N|sigma_{i+1}> - ... - N|sigma_{k-1}>

and

    <sigma_i|g_k> = <sigma_i|g_{i+1}> - <sigma_i|N|sigma_{i+1}> - ... - <sigma_i|N|sigma_{k-1}> = 0    for i < k.

From (4.5) b) we obtain

    <sigma_i|N|sigma_k> = a_k <sigma_i|N|s_k> = a_k <sigma_i|N H_k|g_k> = a_k <sigma_i|g_k> = 0    for i < k.

Also,

    <sigma_i|g_{k+1}> = <sigma_i|g_k> - <sigma_i|N|sigma_k> = 0    for i < k + 1.

From this we observe that <y_k|sigma_i> = <sigma_k|N|sigma_i> = 0 for i < k. Therefore

    H_{k+1}N|sigma_i> = H_k N|sigma_i> + |sigma_k><sigma_k|N|sigma_i>/<sigma_k|y_k> - H_k|y_k><y_k|H_k N|sigma_i>/<y_k|H_k|y_k>
                      = H_k N|sigma_i> = |sigma_i>

for i < k, and H_{k+1}N|sigma_k> = |sigma_k> by Lemma 4.1 b). From this we get

    0 = <g_{k+1}|sigma_i> = <g_{k+1}|H_{k+1}N|sigma_i> = <s_{k+1}|N|sigma_i>    for i < k + 1.
Hence <sigma_i|N|sigma_j> = 0 for i != j, i, j <= k+1, and (4.5) a), b), and c) are proved for i, j <= k+1.

To prove d) we have

    <s_1|g_0> = <s_1|g_1> + a_0 <s_1|N|s_0> = <s_1|g_1> = c_1.

Assume d) holds for j <= k, that is, <s_j|g_i> = c_j for i = 0, ..., j. Then we note that, for j <= k+1,

    <s_{k+1}|g_j> = <s_{k+1}|g_{k+1}> + a_j <s_{k+1}|N|s_j> + ... + a_k <s_{k+1}|N|s_k> = <s_{k+1}|g_{k+1}> = c_{k+1},

and Lemma 4.2 is proved.

Theorem 4.2 follows from Lemmas 4.1 and 4.2. The vectors s_0, s_1, ..., s_{r-1} and sigma_0, sigma_1, ..., sigma_{r-1} are N-linearly independent and hence also linearly independent.

The next two lemmas give some properties of the matrices H_i, i = 0, 1, ....

Lemma 4.3  n(H_0) = n(H_i),  i > 0.

Proof. Assume z is in n(H_i) for i <= k, and we show that z is in n(H_{k+1}). We have

    H_{k+1}|z> = H_k|z> + A_k|z> + B_k|z> = A_k|z> = |sigma_k> <sigma_k|z> / <sigma_k|y_k>.

We note that |sigma_k> = a_k H_k|g_k>, hence <sigma_k|z> = a_k <g_k|H_k|z> = 0, and therefore A_k|z> = 0. Thus H_{k+1}|z> = 0 and z is in n(H_{k+1}). We therefore have n(H_i) contained in n(H_{i+1}) for each i, and hence n(H_0) contained in n(H_i) for each i.

To complete the proof we need only show that if z is in n(H_k) then z is in n(H_{k-1}). Assume z is in n(H_k); then

    0 = <z|H_k|z> = <z|H_{k-1}|z> + <z|sigma_{k-1}>^2 / <sigma_{k-1}|y_{k-1}> - <z|H_{k-1}|y_{k-1}>^2 / <y_{k-1}|H_{k-1}|y_{k-1}>.

Setting |p> = (H_{k-1})^{1/2}|z> and |q> = (H_{k-1})^{1/2}|y_{k-1}>, where |q> != 0, we obtain

(4.6)    0 = <z|H_k|z> = <p|p> - <p|q>^2 / <q|q> + <z|sigma_{k-1}>^2 / <sigma_{k-1}|y_{k-1}>.

By Schwarz's inequality we have
We complete this section with two corollaries to theorem “.2 ' Corollaryi .l_f_ H0 Ls. positive definite m I O for a l z‘Lp n(N) then algorithm (“.3) will terminate L; r stars-2. r s 24229.1. N. Lf.__and __.y.onl 1:, lg.) - o. M' 'f I 0 for all z in n(N) then . o for all i 5 r. It is only necessary to show that case Ii) and case iii) of theorem “.2 cannot occur. Clearly case ii) cannot occur since Hr-l ls positive definite by lemma “.“. in case iii) we have Hr-lgr I O and ler>> I 0 with Isr> T Hr|9r> T Hr-llgr> T Br-llgr>' Since Isr> is in n(N) and I O for all 2 In n(N), then _2 . 0 " <9r|5r> T T In order for the right hand side of this equation to be zero it is necessary that [9r> T tlyr-l> for some scalar t. If t I 0 then 3i [5,.) T Hrlgr> T tHrlyr-l> T tlc’r-l> and NI°,-}>’ I 0. This contradicts our assumption that the altliorithm terminates at the rth step. If t I 0 then Igr> I 0 and Hr-l|9r>’ I 0 , which contradicts Hr-l|9€> I 0. Therefore case iii) cannot occur and the algorithm terminates if and only if Igr> - 0. Corollary 2 if both H and N are positive definite then —“ O- _ algorithm (“.3) will terminate in r-steps, r 5 rank _i N, LL and Gill! Ll: lgr> I O. 5. The equivalence of algorithms (“.l) and (“.3). In this section we prove the equivalence of algorithms (“.l) and (h,3). We do this by showing that if Ho - K then the two algorithms generate vectors that are scalar multiples of each other. G. E. Myers proved a special case of this in [5]. There Ho and K were both equal to I and N was positive definite. in order to prove this equivalence we need the following two lemmas. Mil-$5.] <91[Hi-l[93> I O _2_r_l < j 5 r Proof. Since HI-]|Z1-]> |s,> - H1|9|> . Hi-l|9i> " and <5ilgj> '0 l‘JS" 32 by “.5 c) , we have (5.1) o - (gulf-i) - <9jIHi-i[9i> _ iglIEI-llyi-Q We note that (QJIHl-llYi-l> " ' <9lei-i[9i> and ‘> ' ‘<9i-l[Hl'l[gi-l>’ T [" Therefore (5.1) can be written as (”mi-ll”) ] (502) 0 ' g IH _ lg [T — - -- _— ‘__5 _ . 
    [ <g_{i-1}|H_{i-1}|g_{i-1}> / ( <g_{i-1}|H_{i-1}|g_{i-1}> + <g_i|H_{i-1}|g_i> ) ].

Also <g_{i-1}|H_{i-1}|g_{i-1}> > 0, since i - 1 < r. Therefore equation (5.2) holds if and only if

    <g_j|H_{i-1}|g_i> = 0    for i < j <= r.

Lemma 5.2  H_i|g_j> = H_0|g_j> for i < j <= r.

Proof. We first note that (4.5) c) implies A_i|g_j> = 0 for i = 0, ..., j-1; hence

    H_i|g_j> = H_{i-1}|g_j> + B_{i-1}|g_j>
             = H_{i-1}|g_j> - H_{i-1}|y_{i-1}> [ <g_{i-1}|H_{i-1}|g_j> - <g_i|H_{i-1}|g_j> ] / <y_{i-1}|H_{i-1}|y_{i-1}>.

However, <g_i|H_{i-1}|g_j> = 0 by the previous lemma, and <g_{i-1}|H_{i-1}|g_j> = <s_{i-1}|g_j> = 0. Therefore H_i|g_j> = H_{i-1}|g_j> for i < j <= r, and

    H_i|g_j> = H_{i-1}|g_j> = ... = H_0|g_j>,

which proves the lemma.

Theorem 5.1  If K = H_0, then algorithms (4.1) and (4.3) generate vectors in the same directions.

Proof. The proof of this theorem is by induction. We have

    |s_0> = H_0|g_0> = K|g_0> = |p_0>.

Assume |s_i> = t_i|p_i> with t_i > 0 for i <= k < r, where |p_i>, i = 0, ..., k, are generated by algorithm (4.1) and |s_i>, i = 0, ..., k, are generated by algorithm (4.3). We have

(5.3)    |s_{k+1}> = H_{k+1}|g_{k+1}> = H_k|g_{k+1}> + B_k|g_{k+1}>
                   = H_k|g_{k+1}> - H_k|y_k> <y_k|H_k|g_{k+1}> / <y_k|H_k|y_k>.

Now

(5.4)    H_k|y_k> = H_k|g_k> - H_k|g_{k+1}> = |s_k> - H_0|g_{k+1}>

and

(5.5)    <y_k|H_k|g_{k+1}> = <g_k|H_k|g_{k+1}> - <g_{k+1}|H_k|g_{k+1}> = -<g_{k+1}|H_0|g_{k+1}>,

using (4.4) e) and Lemma 5.2. Substituting (5.4) and (5.5) into (5.3) and collecting terms we obtain

(5.6)    |s_{k+1}> = t_{k+1} H_0|g_{k+1}> + B'_k|s_k>,

where

    t_{k+1} = <g_k|H_k|g_k> / ( <g_k|H_k|g_k> + <g_{k+1}|H_0|g_{k+1}> ) > 0

and

    B'_k = <g_{k+1}|H_0|g_{k+1}> / ( <g_k|H_k|g_k> + <g_{k+1}|H_0|g_{k+1}> ) = t_{k+1} <g_{k+1}|K|g_{k+1}> / <g_k|H_k|g_k>.

Using (4.1) and (4.2) we can show that

    b_k = <g_{k+1}|K|g_{k+1}> / <g_k|K|g_k>,

and <g_k|H_k|g_k> = <g_k|s_k> = t_k <g_k|p_k> = t_k <g_k|K|g_k>; hence B'_k = (t_{k+1}/t_k) b_k. Therefore (5.6) becomes

    |s_{k+1}> = t_{k+1} ( K|g_{k+1}> + b_k|p_k> ) = t_{k+1}|p_{k+1}>,

with t_{k+1} > 0, which completes the induction.

Corollary  If H_0 = K is positive definite and <z|g_0> = 0 for all z in n(N), then algorithms (4.1) and (4.3) are equivalent in the sense described above.

Lemma 5.3  <g_j|H_k|g_i> = 0 for i < j <= r and k <= i.

Proof. For k < i we have

    <g_j|H_k|g_i> = <g_j|H_{i-1}|g_i> = 0,

with Lemma 5.2 implying the first equality and Lemma 5.1 the second; for k = i, <g_j|H_i|g_i> = <g_j|s_i> = 0 for i < j. We note that in particular <g_i|H_0|g_j> = 0 for i != j.

6. Converse to Theorem 4.1 and Theorem 4.2.
Algorithms (4.1) and (4.3) can be used to generate any set of N-conjugate vectors, where N is a positive semi-definite matrix. In [2] this is shown for algorithm (4.1) when N and K are positive definite matrices.

Theorem 6.1  Let N be a positive semi-definite matrix with rank m and let p_0, ..., p_{m-1} be m vectors such that

(6.1)    d_i = (p_i, Np_i) > 0,    (p_i, Np_j) = 0,  i != j.

Let c_0, ..., c_{m-1} be m positive numbers, set a_i = c_i/d_i, i = 0, ..., m-1, let

(6.2)    g_0 = a_0 Np_0 + ... + a_{m-1} Np_{m-1},

and let g_1, ..., g_{m-1} be generated by

(6.3)    g_{i+1} = g_i - a_i Np_i.

Then there exists a positive semi-definite matrix K such that

(6.4)    p_0 = Kg_0,    p_{i+1} = Kg_{i+1} + b_i p_i,

with b_i = c_{i+1}/c_i. Also

(6.5)    c_i = (g_i, Kg_i),    (g_i, Kg_j) = 0,  i != j.

Proof. Let G and P be the n x n matrices whose columns are g_0, ..., g_{m-1}, 0, ..., 0 and p_0, ..., p_{m-1}, 0, ..., 0, respectively. Let A, C, D be the diagonal matrices whose diagonals are a_0, ..., a_{m-1}, 0, ..., 0; c_0, ..., c_{m-1}, 0, ..., 0; and d_0, ..., d_{m-1}, 0, ..., 0. Let A', C', D' be the diagonal matrices whose diagonals are 1/a_0, ..., 1/a_{m-1}, 0, ..., 0; 1/c_0, ..., 1/c_{m-1}, 0, ..., 0; and 1/d_0, ..., 1/d_{m-1}, 0, ..., 0. Set b_i = c_{i+1}/c_i, and let B be the matrix with b_0, ..., b_{m-2}, 0, ..., 0 just above the diagonal and zeros elsewhere. Let V = (v_ij) with v_ij = 1 when i = j+1 and v_ij = 0 elsewhere, and set T = I - V.

We have

    A = CD' = D'C,    B = C'V*C,    P*NP = D.

Relation (6.3) has the form GT = NPA. It follows that

    P*GT = P*NPA = DA = C.

Since C* = C, we have C = C* = (P*GT)* = T*G*P, and G*P = T*^{-1} C. If we define Q = PC'T*, then

    G*Q = G*PC'T* = T*^{-1} CC'T*,

where CC' = I', the diagonal matrix with 1 in the first m diagonal entries and 0 elsewhere. Since the columns of C beyond the m-th vanish, it follows that CQ*G = C. Define K = QCQ*; then

    KG = QCQ*G = QC.

Since QC = PC'T*C = P(I' - B) = P - PB, we have KG = P - PB. Hence P = KG + PB, establishing (6.4). Also G*KG = G*QCQ*G = C, and K is positive semi-definite, which establishes (6.5). This completes the proof of the theorem.

We can also choose K positive definite.
Let L be a positive semi-definite matrix whose null space is spanned by g_0, ..., g_{m-1}; then the matrix K + L is positive definite. The matrix K + L also satisfies conditions (6.4) and (6.5) of theorem 6.1.

Theorem 6.1 is an extension of theorem 5.1 in reference [2], from N and K positive definite to N and K positive semi-definite.

Theorem 6.2. Let N be a positive semi-definite matrix of rank m and let p_0, ..., p_{m-1} be m vectors such that

    (p_i, Np_i) > 0,  (p_i, Np_j) = 0,  i != j.

If (6.2) and (6.3) hold then there exists a positive semi-definite matrix H_0 such that

(6.6)  p_0 = H_0 g_0,  p_{i+1} = H_{i+1} g_{i+1},  i = 0, ..., m-2,

where H_{i+1} is defined as in algorithm (4.3). Also

(6.7)  (g_i, H_k g_i) > 0,  (g_i, H_k g_j) = 0,  k <= i < j.

Proof. Statement (6.6) follows from the equivalence of algorithms (4.1) and (4.3) and theorem 6.1. We get (6.7) from lemma 5.3 and the fact that H_k is positive semi-definite.

We now describe the conjugate gradient method (cg-method) and the Davidon method as formulated by Fletcher and Powell (FP-method).

7. The cg-method.

The following is essentially the method described in [1] and [2]; however, here we allow A to be a singular matrix. The method of conjugate gradients used to minimize the function E(x), where E(x) = (r, Rr) with R a positive definite matrix and r = k - Ax, is as follows. Select a positive definite matrix K and an initial estimate x_0 and compute

    r_0 = k - Ax_0,  g_0 = A*R r_0,  and  p_0 = Kg_0.

The vector g_0 is a negative scalar multiple of the usual gradient of E(x) at x = x_0, and is therefore in a direction of decrease of E(x). Having obtained x_i, r_i, g_i and p_i we compute x_{i+1}, r_{i+1}, g_{i+1}, and p_{i+1} by the following. We have, with N = A*RA,

(7.1)
    a_i = (g_i, p_i)/(p_i, Np_i) = (g_i, Kg_i)/(p_i, Np_i)
    x_{i+1} = x_i + a_i p_i
    r_{i+1} = k - Ax_{i+1} = r_i - a_i Ap_i
    g_{i+1} = A*R r_{i+1}
    b_i = -(Np_i, Kg_{i+1})/(p_i, Np_i) = (g_{i+1}, Kg_{i+1})/(g_i, Kg_i)
    p_{i+1} = Kg_{i+1}
+ b_i p_i.

We note that g_{i+1} = A*R r_{i+1} = g_i - a_i Np_i. The vectors g_0, g_1, ... are orthogonal to n(N) and hence algorithm (7.1) satisfies the hypothesis of corollary 1 to theorem 4.1. Therefore algorithm (7.1) terminates in r steps, r <= rank of N, when g_r = 0. By lemma 2.5, x_r minimizes E(x) and is therefore a solution to Ax = k if one exists.

Theorem 7.1. The cg-method is a cd-method and the solution x_r is in M. The points x_0, ..., x_i (i <= r) generated by the cg-method lie in an i-dimensional subspace π_i. The point x_i minimizes the function E(x) in π_i.

This theorem is proved in [2]; however, it is shown above that x_r is in M. The following theorem completes this section on the cg-method. It essentially shows that every cd-method is a cg-method.

Theorem 7.2. Let N be a positive semi-definite matrix of rank m and let p_0, ..., p_{m-1} be m vectors such that

    d_i = (p_i, Np_i) > 0,  (p_i, Np_j) = 0,  i != j.

Then there exists a cg-method such that the algorithm described in section 7 generates p_0, ..., p_{m-1}.

Proof. To prove this we need only show that g_0 lies in the space spanned by Np_0, ..., Np_{m-1} and then apply theorem 6.1 and the paragraph following it. If m = n then Np_0, ..., Np_{m-1} spans E^n and we are done. If m < n, then g_0 = A*R r_0 is orthogonal to n(N) and hence lies in the range of N, which is spanned by the m linearly independent vectors Np_0, ..., Np_{m-1}.

8. The FP-method.

Select an initial estimate x_0 and compute |r_0> = |k> - A|x_0>, |g_0> = A*R|r_0>, and |s_0> = H_0|g_0>, where H_0 is a given positive definite matrix. Having obtained H_i, x_i, r_i, g_i, and s_i we compute H_{i+1}, a_i, x_{i+1}, r_{i+1}, g_{i+1} and s_{i+1} by the following formulas. We have

    a_i = <g_i|s_i> / <s_i|N|s_i>
    x_{i+1} = x_i + a_i|s_i>
    |r_{i+1}> = |r_i> - a_i A|s_i>
    |g_{i+1}> = A*R|r_{i+1}> = |g_i> - a_i N|s_i>
    |σ_i> = a_i|s_i>
    |y_i> = |g_i> - |g_{i+1}> = N|σ_i>
    H_{i+1} = H_i + |σ_i><σ_i| / <σ_i|y_i> - H_i|y_i><y_i|H_i / <y_i|H_i|y_i>
    |s_{i+1}> = H_{i+1}|g_{i+1}>.

This algorithm is equivalent to algorithm (4.3), where g_0, ..., g_r are the negative gradient vectors of E(x) at x_0, ..., x_r respectively. As in the last section, the gradient vectors are orthogonal to the null space of N, and hence this algorithm satisfies the conditions of corollary 1 to theorem 4.2.
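As an aside, the quadratic cg iteration (7.1) of the preceding section can be sketched in code. The sketch below is not from the thesis; it assumes the simplest choices R = I and K = I (so that N = A*A), and all names are illustrative.

```python
import numpy as np

def cg_method(A, k, R, K, x0):
    """Sketch of iteration (7.1) for E(x) = (r, Rr), r = k - Ax."""
    N = A.T @ R @ A                       # N = A*RA
    x, r = x0.copy(), k - A @ x0
    g = A.T @ R @ r                       # g_0 = A*R r_0
    p = K @ g                             # p_0 = K g_0
    for _ in range(A.shape[1]):           # at most rank(N) steps in exact arithmetic
        if np.linalg.norm(g) < 1e-12:
            break
        a = (g @ p) / (p @ N @ p)         # a_i = (g_i, p_i)/(p_i, N p_i)
        x = x + a * p
        r = r - a * (A @ p)               # r_{i+1} = r_i - a_i A p_i
        g = A.T @ R @ r                   # g_{i+1} = A*R r_{i+1}
        b = -((N @ p) @ (K @ g)) / (p @ N @ p)   # b_i = -(Np_i, Kg_{i+1})/(p_i, Np_i)
        p = K @ g + b * p                 # p_{i+1} = K g_{i+1} + b_i p_i
    return x

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # nonsingular, so Ax = k is solvable
k_vec = rng.standard_normal(n)
x = cg_method(A, k_vec, np.eye(n), np.eye(n), np.zeros(n))
assert np.allclose(A @ x, k_vec, atol=1e-6)       # the terminal x_r solves Ax = k
```

Since A here is nonsingular, termination with g_r = 0 after at most n steps yields the solution of Ax = k, in line with the remarks following (7.1).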
Therefore the algorithm terminates if and only if g_r = 0. This implies that x_r minimizes E(x).

In view of corollary 1 to theorem 5.1 the cg-method and the FP-method are equivalent. This means that if H_0 and K are equal and the two methods start at the same initial point x_0, then they generate direction vectors that are scalar multiples of each other and hence generate the same points x_1, ..., x_r.

Theorem 8.1. Theorem 7.1 holds with cg-method replaced by FP-method.

Theorem 8.2. Every cd-method is an FP-method.

Proof. This follows from the equivalence of the cg- and FP-methods and theorem 7.2.

From theorem 8.1 we have that σ_0, ..., σ_{i-1} spans an i-dimensional subspace π_i and the point x_i minimizes E(x) in π_i. If we let W be the space spanned by σ_i, ..., σ_{r-1}, then the matrix H_i projects the gradient vector g_i into W in the direction of σ_i. This becomes the next direction of search for a minimum of E(x). We note that W is N-orthogonal to π_i. If N is positive definite then E^n = π_i + W, where W is the N-orthogonal complement of π_i.

Lemma 8.1. If N is positive semi-definite then H_r is a left inverse to N on the space generated by σ_0, ..., σ_{r-1}. If N is positive definite and r = n then H_n = N^{-1}.

Proof. We have

(8.2)  H_r N|σ_j> = |σ_j>,  j = 0, ..., r-1.

Hence H_r is a left inverse to N on the space generated by σ_0, ..., σ_{r-1}. If N is positive definite and r = n then (8.2) implies H_n = N^{-1}.

We observe that S'NS = Λ, where S is the matrix whose columns are σ_0, ..., σ_{n-1}, S' is the transpose of S, and Λ is diagonal with <σ_i|N|σ_i>, i = 0, ..., n-1, along the diagonal. The matrix N satisfies the equation

    N = S'^{-1} Λ S^{-1} = (S Λ^{-1} S')^{-1},

hence

    N^{-1} = S Λ^{-1} S' = Σ_{i=0}^{n-1} |σ_i><σ_i| / <σ_i|N|σ_i>.

9. Second method for constructing vectors that are N-conjugate.

[...]

d) The set of vectors {p_i : p_i in n(N)} forms a basis for n(N).

e) The set of vectors {p_i : p_i not in n(N)} is N-linearly independent.
f) (p_i, Nu_j) = 0 for i < j, with

    t_ij = (p_j, Nu_j)/(p_j, Np_j) when p_j is not in n(N), and t_ij = 0 when p_j is in n(N).

The following lemma indicates some of the properties concerning this algorithm.

Lemma 9.2. If p_1, ..., p_n are generated by (9.2), then the conditions of lemma 9.1 hold along with the following addition:

g) (u_i^k, Nu_j^k) = 0.

[...] r_ij^2 = r_ji^2,

    g_3 = (0, 0, k_3^2, ..., k_n^2),  Np_3 = (0, 0, r_33^2, ..., r_3n^2),

and also x_3 = x_2. If p_2 is in n(N) then, by lemma 9.6, the matrix (9.7) would look like

(9.8)
    | r_11  r_12  r_13   ...  r_1n      p_11    ...  p_1n      k_1   |
    | 0     0     0      ...  0         p_21    ...  p_2n      0     |
    | 0     0     r_33^2 ...  r_3n^2    u_31^2  ...  u_3n^2    k_3^2 |
    |  .     .      .           .          .            .        .   |
    | 0     0     r_n3^2 ...  r_nn^2    u_n1^2  ...  u_nn^2    k_n^2 |

In this case we have

    p_3 = u_3^2 = (u_31^2, ..., u_3n^2).

It should be noted that if we had not assumed the existence of a solution to Nx = k, then k_2^2 may not have been zero in the case of p_2 in n(N).

We now consider the general case. Suppose we have p_1, ..., p_m and x_1, ..., x_m; then by algorithms (9.3) and (9.4) we have

    a_m = (k, p_m)/(Np_m, p_m) if p_m is not in n(N),  a_m = 0 if p_m is in n(N),

    u_i^{m+1} = u_i^m - t_im p_m,  i = m+1, ..., n,

where

    t_im = (Nu_i^m, e_m)/(Np_m, e_m) if p_m is not in n(N),  t_im = 0 if p_m is in n(N).

If p_m is not in n(N) then we have t_im = r_im^m / r_mm^m and a_m = k_m^m / r_mm^m. The matrix is in the form

(9.9)
    | r_11  r_12  ...  r_1m    ...  r_1n      p_11    ...  p_1n      k_1   |
    |       r_22  ...  r_2m    ...  r_2n      p_21    ...  p_2n      k_2   |
    |               .                            .                     .   |
    |             r_mm^m  ...  r_mn^m         u_m1^m  ...  u_mn^m    k_m^m |
    |                .            .              .                     .   |
    |             r_nm^m  ...  r_nn^m         u_n1^m  ...  u_nn^m    k_n^m |

Multiplying the m-th row of (9.9) by t_im and subtracting it from the i-th row (i = m+1, ..., n), we obtain the new matrix (9.10), with entries

    r_ij^{m+1} = r_ij^m - t_im r_mj^m,  u_i^{m+1} = u_i^m - t_im p_m,  k_i^{m+1} = k_i^m - t_im k_m^m,  i = m+1, ..., n,

and

    p_{m+1} = (p_{m+1,1}, ..., p_{m+1,n}) = u_{m+1}^{m+1} = (u_{m+1,1}^{m+1}, ..., u_{m+1,n}^{m+1}).

The following two lemmas summarize the results.

Lemma 9.7. If p_m is not in n(N) then the following hold:

a) r_ij^{m+1} = (Nu_i^{m+1}, e_j), i = m+1, ..., n, j = m+1, ..., n
b) k_i^{m+1} = k_i^m - t_im k_m^m, i = m+1, ..., n
c) r_ij^{m+1} = r_ji^{m+1}, i, j = m+1, ..., n
d) g_{m+1} = (0, ..., 0, k_{m+1}^{m+1}, ..., k_n^{m+1})
e) Np_{m+1} = (0,
..., 0, r_{m+1,m+1}^{m+1}, ..., r_{m+1,n}^{m+1}).

Lemma 9.8. If p_m is in n(N) then r_im^m = r_mi^m = 0 (i = m, ..., n) and the five conditions in lemma 9.7 reduce to

a) r_ij^{m+1} = r_ij^m, i, j = m+1, ..., n
b) k_i^{m+1} = k_i^m, i = m+1, ..., n
c) r_ij^{m+1} = r_ji^{m+1}, i, j = m+1, ..., n; by a) this reduces to r_ij^m = r_ji^m
d) g_{m+1} = (0, ..., 0, k_{m+1}^m, ..., k_n^m)
e) Np_{m+1} = (0, ..., 0, r_{m+1,m+1}^m, ..., r_{m+1,n}^m).

In this case we also have

    p_{m+1} = u_{m+1}^{m+1} = u_{m+1}^m,

and again k_m^m = 0, since we are assuming that Nx = k has a solution.

Recall that applying the cd-method to minimize the error function E(x) is equivalent to solving the linear system Nx = k if E(x) is suitably chosen. Let E(x) = (r, Rr), where R is a positive definite matrix and r = k' - Ax. The gradient vector is

    g = A*Rr = k - Nx,  where  k = A*Rk',  N = A*RA.

Since E(x) is minimized when g(x) = 0, we see that minimizing E(x) is equivalent to solving the linear system Nx = k. From the work in section 9 we see that, in choosing our direction vectors p_1, ..., p_n by N-orthogonalizing e_1, ..., e_n, the inner products in algorithms (9.3) and (9.4) are found by performing Gauss elimination on the system Nx = k. Hence the application of Gauss elimination to Nx = k and solving Nx = k through algorithms (9.3) and (9.4) are equivalent.

Let N be positive definite in the previous material of section 9. Since every cd-method is a cg-method, there exists a positive definite matrix K such that

(9.11)
    p_1 = Kg_1
    g_{i+1} = g_i - a_i Np_i
    a_i = (g_i, p_i)/(p_i, Np_i) = (k, p_i)/(p_i, Np_i)
    p_{i+1} = Kg_{i+1} + b_i p_i
    b_i = c_{i+1}/c_i,  c_i = (g_i, p_i) = (k, p_i).

The vectors g_2, ..., g_n are generated by (9.11) since g_{i+1} = k - Nx_{i+1} = k - Nx_i - a_i Np_i = g_i - a_i Np_i. From the proof of theorem 6.1 we see that if we choose

    K = G*^{-1} C G^{-1},

where G is the matrix whose columns are g_1, ..., g_n and C is the diagonal matrix with the c_i down the diagonal, then (9.11) is satisfied. Hence the vectors p_1, ..., p_n generated by N-orthogonalization of e_1,
..., e_n are also generated by (9.11), and solving Nx = k by Gauss elimination is equivalent to solving it by the cg-method whose direction vectors are given by (9.11).

We now apply the cg-method and FP-method to the minimization of non-quadratic functions.

III. APPLICATIONS TO NON-QUADRATIC FUNCTIONS

10. Algorithms for non-quadratic functions.

The algorithms considered in this section and also in sections 11 and 12 are for minimizing a non-quadratic function f(x), bounded from below for all x and having a positive definite Hessian matrix at the minimum point. The minimum point found is of course just a local minimum. These algorithms differ only in the method for generating the direction vectors p_0, p_1, ... . These vectors are the directions in which the function f(x) is minimized. The algorithms presented in sections 10, 11, and 12 will not be given in terms of f(x) but in a form that is convenient for comparing methods and deriving convergence results. In section 13 the relationship between the minimization of the function f(x) and the algorithms in sections 10, 11, and 12 will be shown.

Let {C_i} and {T_i} be sequences of positive definite matrices converging to C and T respectively, where C and T are positive definite.

Theorem 10.1. If g_0 is a non-zero vector then the algorithm

(10.1)
    p_0 = T_0 g_0
    g_{i+1} = g_i - a_i C_i p_i
    a_i = (g_i, p_i)/(p_i, C_i p_i)
    p_{i+1} = T_{i+1} g_{i+1}

generates non-zero vectors g_0, g_1, ... and p_0, p_1, ... such that

(10.2)  (g_{i+1}, p_i) = 0.

The algorithm terminates in r steps if for some r, g_r = 0; otherwise

(10.3)  lim_{i→∞} g_i = 0.

Proof. The scalar a_i is chosen so that (10.2) holds. If g_r = 0 for some r then p_r = 0 and the algorithm clearly terminates. The proof of (10.3) is a result of the following lemma and theorem.

Lemma 10.1. There exist positive real numbers m and M such that

(10.4)  0 < m <= 1/a_i <= M.

Theorem 10.2. There exist real numbers v and w with 0 < v < 1 such that for all i > w

(10.5)  (g_{i+1}, C^{-1} g_{i+1}) <= v (g_i, C^{-1} g_i).

Proof. We have

(10.6)  (g_{i+1}, C^{-1} g_{i+1}) = (g_i, C^{-1} g_i) - 2a_i (g_i, C^{-1} C_i T_i g_i) + a_i^2 (g_i, T_i C_i C^{-1} C_i T_i g_i).
This equation can be written in the form

(10.7)  (g_{i+1}, C^{-1} g_{i+1}) = (g_i, C^{-1} g_i) - 2a_i (g_i, T_i g_i) G_i(g_i),

where

    G_i(x) = (x, C^{-1} C_i T_i x)/(x, T_i x) - (1/2)(x, T_i C_i C^{-1} C_i T_i x)/(x, T_i C_i T_i x).

Since C_i converges to C,

    lim_{i→∞} G_i(x) = 1/2

for all x != 0. Hence there exists w > 0 such that for all i > w

(10.8)  G_i(x) > 1/4.

By lemma 10.1, a_i >= 1/M. Hence from equation (10.7) we obtain

    (g_{i+1}, C^{-1} g_{i+1}) <= [1 - (a_i/2)(g_i, T_i g_i)/(g_i, C^{-1} g_i)] (g_i, C^{-1} g_i),

where m and M can be chosen so that a_i >= 1/M and (g_i, T_i g_i) >= m (g_i, C^{-1} g_i), with 0 < m <= M. By choosing v = 1 - m/2M we have 0 < v < 1 and

    (g_{i+1}, C^{-1} g_{i+1}) <= v (g_i, C^{-1} g_i)

for all i > w. As a consequence of (10.5) we have

    lim_{i→∞} (g_i, C^{-1} g_i) = 0

and hence

    lim_{i→∞} g_i = 0.

Therefore the proof of theorem 10.2 completes the proof of theorem 10.1.

Theorem 10.2 shows that for i > w the sequence {(g_i, C^{-1} g_i)} converges as fast as a geometric progression with ratio v. Before considering a special case of theorem 10.1 we finish this section with the following lemma and theorem.

Lemma 10.2. If D_i = I - a_i C_i T_i then λ is an eigenvalue of D_i if and only if (1-λ)/a_i is an eigenvalue of C_i T_i.

Proof. Since

    D_i - λI = a_i [((1-λ)/a_i) I - C_i T_i]

and a_i is bounded away from zero for all i, λ is a zero of the determinant of D_i - λI if and only if (1-λ)/a_i is a zero of the determinant of ((1-λ)/a_i) I - C_i T_i.

Theorem 10.3. If in algorithm (10.1) p_i is the eigenvector corresponding to the maximum eigenvalue of C_i T_i, then |g_{i+1}| < |g_i|.

Proof. Let λ_max^i and λ_min^i be respectively the maximum and minimum eigenvalues of D_i. By lemma 10.2, (1-λ_min^i)/a_i and (1-λ_max^i)/a_i are respectively the maximum and minimum eigenvalues of C_i T_i. We note that λ_max^i < 1 since (1-λ_max^i)/a_i > 0. Since

    a_i = (g_i, p_i)/(p_i, C_i p_i) = (p_i, T_i^{-1} p_i)/(p_i, C_i p_i),

if p_i is the eigenvector corresponding to the maximum eigenvalue of C_i T_i then 1/a_i is the maximum eigenvalue of C_i T_i. This means λ_min^i = 0 and all eigenvalues of D_i lie in the interval [0, 1). Therefore, by (10.1),

    g_{i+1} = g_i - a_i C_i T_i g_i = D_i g_i

and

    |g_{i+1}| <= ||D_i|| |g_i| < |g_i|.

11. The FP-algorithm for non-quadratic functions.
The FP-algorithm is a special case of algorithm (10.1). However, before looking at it we first consider the following result. If {T_i} is a sequence of positive definite matrices such that

    lim_{i→∞} T_i = u C^{-1},

where u is a positive real number, then algorithm (10.1) converges. With the above choice for the sequence {T_i} we get a rate of convergence which is stated in the following theorem.

Theorem 11.1. If T_i is defined as above and algorithm (10.1) does not terminate in a finite number of steps, then

(11.1)  |g_{i+1}| <= v_i |g_i|,

where v_i >= 0 and

(11.2)  lim_{i→∞} v_i = 0.

Proof. Let

    L_i(x) = (x, T_i^{-1} x)/(x, C_i x);

then lim_{i→∞} L_i(x) = 1/u for all x != 0. Therefore

    lim_{i→∞} a_i = lim_{i→∞} L_i(p_i) = 1/u

and

    lim_{i→∞} C_i T_i = uI.

If we choose v_i = ||D_i||, then

    lim_{i→∞} v_i = lim_{i→∞} ||I - a_i C_i T_i|| = 0.

Since g_{i+1} = D_i g_i, we have |g_{i+1}| <= ||D_i|| |g_i| = v_i |g_i|.

We now give the FP-algorithm for minimizing a non-quadratic function.

Theorem 11.2. The conclusion of theorem 10.1 follows if in (10.1) we replace {T_i} by {H_i}, where H_0 is a positive definite matrix and H_{i+1} = H_i + A_i + B_i, with

(11.3)  A_i = |σ_i><σ_i| / <σ_i|y_i>,  B_i = -H_i|y_i><y_i|H_i / <y_i|H_i|y_i>,

    |σ_i> = a_i|p_i>,  |y_i> = C_i|σ_i>.

Proof. To prove this theorem it is only necessary to show that H_i is positive definite for each i and converges to a positive definite matrix. The proof is by induction. The matrix H_0 is positive definite by hypothesis. We assume H_1, ..., H_k are positive definite and show that H_{k+1} is positive definite. We have

    <x|H_{k+1}|x> = <x|H_k|x> + <σ_k|x>^2 / <σ_k|y_k> - <x|H_k|y_k>^2 / <y_k|H_k|y_k>.

If |p> = (H_k)^{1/2}|x> and |q> = (H_k)^{1/2}|y_k> with |q> != 0, then

(11.4)  <x|H_{k+1}|x> = [<p|p><q|q> - <p|q>^2] / <q|q> + <σ_k|x>^2 / <σ_k|y_k>

and

(11.5)  <σ_k|y_k> = <σ_k|C_k|σ_k> > 0.

By Schwarz's inequality we have

(11.6)  <p|p><q|q> - <p|q>^2 >= 0,

with equality if and only if

(11.7)  |p> = t|q> for some non-zero t.

Since H_k is positive definite, (11.7) is equivalent to |x> = t|y_k>. If <σ_k|x> = 0 for |x> != 0, then |x> != t|y_k> by (11.5). Hence (11.6) is a strict inequality and <x|H_{k+1}|x> > 0. If <σ_k|x> != 0, then <σ_k|x>^2 / <σ_k|y_k> > 0 and <x|H_{k+1}|x> > 0.
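The update (11.3) can also be exercised numerically. The sketch below is not from the thesis and its names are illustrative: it holds C_i fixed at a positive definite C, feeds the update arbitrary steps σ_i with y_i = Cσ_i, and checks two properties used here, namely that H_{i+1} stays positive definite and that H_{i+1} C σ_i = σ_i.

```python
import numpy as np

def fp_update(H, sigma, y):
    """One step of (11.3): H + |σ><σ|/<σ|y> - H|y><y|H / <y|H|y>."""
    Hy = H @ y
    return H + np.outer(sigma, sigma) / (sigma @ y) - np.outer(Hy, Hy) / (y @ Hy)

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)        # positive definite C (C_i held fixed at its limit)
H = np.eye(n)                      # H_0 positive definite
for _ in range(20):
    sigma = rng.standard_normal(n)
    y = C @ sigma                  # y_i = C_i σ_i, so <σ_i|y_i> = <σ_i|C|σ_i> > 0
    H = fp_update(H, sigma, y)
    assert np.all(np.linalg.eigvalsh(H) > 0)   # H_{i+1} remains positive definite
    assert np.allclose(H @ y, sigma)           # H_{i+1} C σ_i = σ_i
```

The second assertion is immediate from (11.3): applying H_{i+1} to |y_i> leaves exactly |σ_i>, whatever the step σ_i.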
From the definition of H_{i+1} we have

    H_{i+1} C_i|σ_i> = |σ_i>  and  <p_{i+1}|C_i|σ_i> = 0

for each i. The definitions of H_i here and in section 4 differ in that here y_i = C_i σ_i while in section 4 y_i = Nσ_i. Since C_i converges to C we have

    lim_{i→∞} H_i = C^{-1}.

Therefore {H_i} is a sequence of positive definite matrices whose limit is positive definite. The next theorem gives a result on the convergence of the FP-algorithm.

Theorem 11.3. If T_i is equal to H_i and algorithm (10.1) does not terminate in a finite number of steps, then there exist positive real numbers v_i such that

(11.9)  |g_{i+1}| <= v_i |g_i|

and

(11.10)  lim_{i→∞} v_i = 0.

Proof. With the choice of T_i = H_i for each i, the hypothesis of theorem 11.1 is satisfied. Therefore both (11.9) and (11.10) hold.

12. The cg-algorithm for non-quadratic functions.

In this section two cg-algorithms are given for non-quadratic functions. Both methods, however, generate the same direction vectors when the function to be minimized is quadratic. We again have the sequence {C_i} of positive definite matrices converging to the positive definite matrix C.

Theorem 12.1. Given a positive definite matrix K and a non-zero vector g_0, the algorithm

(12.1)
    p_0 = Kg_0
    g_{i+1} = g_i - a_i C_i p_i
    a_i = (g_i, p_i)/(p_i, C_i p_i)
    p_{i+1} = Kg_{i+1} + b_i p_i
    b_i = -(Kg_{i+1}, C_i p_i)/(p_i, C_i p_i)

generates non-zero vectors g_0, g_1, ... and p_0, p_1, ... such that

(12.2)  (g_{i+1}, p_i) = 0,  (p_{i+1}, C_i p_i) = 0  for all i.

The algorithm terminates in r steps if for some r, g_r = 0; otherwise

(12.3)  lim_{i→∞} g_i = 0.

Proof. The scalars a_i and b_i are chosen so that (12.2) holds. If for some r, g_r = 0, then b_{r-1} = 0 and p_r = 0 and the algorithm terminates. The proof of (12.3) follows from the next lemma and theorem.

Lemma 12.1. There exist positive real numbers m and M such that

(12.4)  0 < m <= 1/a_i < M.

Proof. For non-zero vectors p_i and g_i we have by (12.1)

    a_i = (p_i, g_i)/(p_i, C_i p_i) = (g_i, Kg_i)/(p_i, C_i p_i).

We note that

    (g_i, Kg_i) = (p_i, K^{-1} p_i) - b_{i-1}^2 (p_{i-1}, K^{-1} p_{i-1}),

and hence (g_i, Kg_i) <= (p_i, K^{-1} p_i). Since C_i converges to the positive definite matrix C, the ratios (p_i, K^{-1} p_i)/(p_i, C_i p_i) are bounded above; hence there exists m > 0 such that

(12.5)  a_i
<= 1/m for all i. To show a_i is bounded away from zero we note that

(12.6)  (p_i, C_i p_i) = (g_i, KC_iKg_i) + 2b_{i-1}(g_i, KC_i p_{i-1}) + b_{i-1}^2 (p_{i-1}, C_i p_{i-1}).

Using (12.1) and (12.2), equation (12.6) becomes

(12.7)  (p_i, C_i p_i) = (g_i, KC_iKg_i) - b_{i-1}(g_i, Kg_i) [ ... ],

where the term in brackets is a ratio of inner products formed from C_i and C_{i-1}. Since C_i converges to C, the term in brackets converges to 1. We note that

    b_{i-1} = (g_i, Kg_i)/(g_{i-1}, Kg_{i-1}) > 0;

hence for i sufficiently large equation (12.7) implies

    (p_i, C_i p_i) <= (g_i, KC_iKg_i)

and

    a_i >= (g_i, Kg_i)/(g_i, KC_iKg_i).

Thus, for i sufficiently large, a_i is bounded below by 1/λ_max^i, where λ_max^i is the maximum eigenvalue of C_iK. Hence there exists M > 0 such that

(12.8)  a_i >= 1/M

for all i. From (12.5) and (12.8) we have 0 < m <= 1/a_i <= M.

Theorem 12.2. There exist real numbers v and w with 0 < v < 1 such that for i > w

(12.9)  (g_{i+1}, C^{-1} g_{i+1}) <= v (g_i, C^{-1} g_i).

Proof. Using (12.1) we have

(12.10)  (g_{i+1}, C^{-1} g_{i+1}) = (g_i, C^{-1} g_i) - 2a_i (g_i, Kg_i) G_i(g_i, p_i),

where

    G_i(x, y) = (x, C^{-1} C_i y)/(x, y) - (1/2)(y, C_i C^{-1} C_i y)/(y, C_i y).

Since C_i converges to C,

    lim_{i→∞} G_i(x, y) = 1/2

for all x and y such that (x, y) = (x, Kx) != 0. Hence there exists w > 0 such that for i > w

(12.11)  G_i(x, y) > 1/4.

From lemma 12.1 we have

(12.12)  0 < m <= 1/a_i <= M.

Combining (12.10), (12.11), and (12.12) as in the proof of theorem 10.2 and choosing v = 1 - m/2M, we have 0 < v < 1 and (12.9) holds for all i > w. Hence for i > w the sequence {(g_i, C^{-1} g_i)} converges as fast as a geometric progression with ratio v.

The second cg-method is simply algorithm (12.1) restarted every n iterations. The algorithm is given in the following theorem.

Theorem 12.3. If K is a positive definite matrix and g_0 is a non-zero vector, then the algorithm

(12.13)
    p_0 = Kg_0
    g_{j+l+1} = g_{j+l} - a_{j+l} C_{j+l} p_{j+l}
    a_{j+l} = (g_{j+l}, p_{j+l})/(p_{j+l}, C_{j+l} p_{j+l})
    p_{j+l+1} = Kg_{j+l+1} + b_{j+l} p_{j+l},  l = 0, ..., n-2
    b_{j+l} = -(Kg_{j+l+1}, C_{j+l} p_{j+l})/(p_{j+l}, C_{j+l} p_{j+l})
    p_{j+l+1} = Kg_{j+l+1},  l = n-1,

where j = 0, n, 2n, 3n, ... and for each j, l = 0, ..., n-1, generates non-zero vectors g_0, g_1, ... and p_0, p_1, ... such that

(12.14)  (g_{j+l+1}, p_{j+l}) = 0

and

(12.15)  (p_{j+l+1}, C_{j+l} p_{j+l}) = 0,  l = 0, ..., n-2.
The algorithm terminates in r steps if for some r, g_r = 0; otherwise

    lim_{i→∞} g_i = 0.

Theorem 12.4. If algorithm (12.13) does not terminate in a finite number of steps, then there exist positive real numbers v and w such that 0 <= v < 1 and for i > w

(12.16)  (g_{i+1}, C^{-1} g_{i+1}) <= v (g_i, C^{-1} g_i).

The proofs of theorems 12.3 and 12.4 duplicate the proofs of theorems 12.1 and 12.2.

Numerical tests [4] for minimizing non-quadratic functions have shown that the second conjugate gradient method converges faster than the first method. The Fletcher-Powell formulation of the Davidon algorithm (FP-method) exhibits substantially faster convergence than either of the conjugate gradient methods.

13. Minimizing non-quadratic functions.

Let f(x) be a non-quadratic function bounded below, and assume f(x) has a positive definite Hessian matrix at every local minimum. We indicate the matrix of second partial derivatives (Hessian matrix) at the point x by f''(x). The gradient vector at x is indicated by f'(x).

To minimize the function f(x) we select an initial point x_0 and set g_0 = -f'(x_0). Next select a vector p_0 that is in a direction of descent. This will be true if (g_0, p_0) > 0. The next point x_1 is found by minimizing f(x) along the line x(t) = x_0 + t p_0. This point will be characterized by (f'(x_1), p_0) = 0. Having found x_i, calculate g_i by g_i = -f'(x_i) and select p_i so that it is in a direction of descent. This will again be true if (g_i, p_i) > 0. The point x_{i+1} is found by minimizing f(x) along the line

(13.1)  x(t) = x_i + t p_i.

We again have (f'(x_{i+1}), p_i) = 0.

The sequence {f(x_i)} is a monotonic decreasing sequence of real numbers bounded below, and therefore converges. The sequence {x_i} converges to some point x̄ which is a local minimum of f(x). Hence

(13.2)  lim_{i→∞} x_i = x̄

and

(13.3)  lim_{i→∞} f(x_i) = f(x̄),

where f(x̄) <= f(x) for all x in some neighborhood of x̄. We are of course assuming f(x) is sufficiently well behaved so that (13.2) and (13.3) hold. We have

(13.4)  g_{i+1} = g_i
- f''(x_i + θ_i σ_i) σ_i,

where 0 <= θ_i < 1 and

    σ_i = x_{i+1} - x_i = a_i p_i.

Here a_i is the scalar that minimizes f(x) along the line given by (13.1). Then

    lim_{i→∞} (x_i + θ_i σ_i) = x̄,

since σ_i converges to zero. Hence

    lim_{i→∞} f''(x_i + θ_i σ_i) = f''(x̄).

Since f''(x̄) is positive definite, there exists J > 0 such that for all i > J, f''(x_i + θ_i σ_i) is positive definite. For i = 0, ..., J we have

    (g_i - g_{i+1}, σ_i) = a_i (g_i, p_i) > 0

since a_i > 0 and (g_i, p_i) > 0. Therefore there exists a positive definite matrix C_i such that

(13.5)  g_i - g_{i+1} = C_i σ_i  for i = 0, ..., J.

The vectors g_0, g_1, ... therefore satisfy

    g_{i+1} = g_i - a_i C_i p_i,

where C_i is given by (13.5) for i = 0, ..., J and C_i = f''(x_i + θ_i σ_i) for i = J+1, ... . Hence {C_i} is a sequence of positive definite matrices whose limit is f''(x̄), which is also positive definite. Now a_i satisfies the equation

    a_i = (g_i, p_i)/(p_i, C_i p_i),

since (g_{i+1}, p_i) = 0.

With the above choice for C_i the algorithm for minimizing f(x) can be written in the following form. Given an initial point x_0 and setting g_0 = -f'(x_0), we choose p_0 so that (g_0, p_0) > 0. Having found x_i, g_i, and p_i we calculate x_{i+1}, g_{i+1}, and p_{i+1} by the following formulas:

(13.6)
    g_{i+1} = g_i - a_i C_i p_i
    a_i = (g_i, p_i)/(p_i, C_i p_i)
    x_{i+1} = x_i + a_i p_i,

and p_{i+1} is chosen so that (g_{i+1}, p_{i+1}) > 0.

The algorithms in sections 10, 11, and 12 differ from algorithm (13.6) in that they give specific methods for generating the vectors p_0, p_1, ... . Hence the study of algorithms of the form given in sections 10, 11, and 12 is equivalent to studying algorithms for minimizing a large class of non-quadratic functions.

BIBLIOGRAPHY

[1] Hestenes, M. R. and E. Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems," Report, National Bureau of Standards, 1952.

[2] Hestenes, M. R., "The Conjugate Gradient Method for Solving Linear Systems," Proceedings of Symposia in Applied Mathematics, Vol. 6, New York: McGraw-Hill Book Co., 1956.

[3] Fletcher, R. and M. J. D. Powell,
"A Rapidly Convergent Descent Method for Minimization," Computer Journal, July 1963.

[4] Kelley, H. J. and G. E. Myers, "Conjugate Direction Methods for Parameter Optimization," presented at the 18th Congress of the International Astronautical Federation, Belgrade, Yugoslavia, September 1967.

[5] Myers, G. E., "Properties of the Conjugate Gradient and Davidon Methods," Westbury, New York: Analytical Mechanics Associates, Inc.

[6] Dirac, P. A. M., "The Principles of Quantum Mechanics," Oxford: Oxford University Press, 1958.

[7] Davidon, W. C., "Variable Metric Method for Minimization," A.E.C. Research and Development Report ANL-5990, 1959.

[8] Fletcher, R. and C. M. Reeves, "Function Minimization by Conjugate Gradients," Computer Journal, July 1964.

[9] Hayes, R. M., "Iterative Methods of Solving Linear Problems on Hilbert Space," National Bureau of Standards Report 1733, May 1952.