Michigan State University Thesis

This is to certify that the thesis entitled FUNCTION MINIMIZING ALGORITHMS presented by David George McDowell has been accepted towards fulfillment of the requirements for the Ph.D. degree in Mathematics.

Major professor

Date 7-31-70

ABSTRACT

FUNCTION MINIMIZING ALGORITHMS

By David George McDowell

We consider the problem of minimizing a real valued function f on n-dimensional Euclidean space by the method of conjugate directions. First, the case when f is quadratic with a positive semi-definite coefficient matrix is examined. Two examples of the conjugate direction method are considered: the Fletcher-Powell formulation of the Davidon algorithm and the conjugate gradient algorithm of M. R. Hestenes. An extension of the Fletcher-Powell method is given and shown to be theoretically equivalent to an extension of the conjugate gradient method given by M. R. Hestenes. As a result of this equivalence we show that the conjugate gradient and Fletcher-Powell methods are equivalent.

These two methods are then applied to the minimization of non-quadratic functions. Convergence is shown and conditions on the rates of convergence are derived. The rate of convergence for the conjugate gradient method is shown to be geometric, and for the Fletcher-Powell method better than geometric.

Finally, a general algorithm for the minimization of non-quadratic functions is given. The above two methods are essentially special cases of this algorithm. The rate of convergence for this general method is shown to be at least geometric.
FUNCTION MINIMIZING ALGORITHMS

By David George McDowell

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Mathematics

1970

TABLE OF CONTENTS

Chapter I. INTRODUCTION
  1. The Problem
  2. Preliminaries
  3. Iterative Procedures

Chapter II. GENERAL ALGORITHMS
  4. Methods for Generating Conjugate Direction Vectors
  5. The Equivalence of Algorithms (4.1) and (4.3)
  6. Converse to Theorem 4.1 and Theorem 4.2
  7. The cg-Method
  8. The FP-Method
  9. Second Method for Constructing Vectors that are N-Conjugate

Chapter III. APPLICATIONS TO NON-QUADRATIC FUNCTIONS
  10. Algorithms for Non-quadratic Functions
  11. The FP-algorithm for Non-quadratic Functions
  12. The cg-algorithm for Non-quadratic Functions
  13. Minimizing Non-quadratic Functions

BIBLIOGRAPHY

INTRODUCTION

1. The Problem.

This thesis deals with the study of iterative procedures for minimizing real valued functions on Euclidean n-space. It is divided into three parts: the first two parts deal with quadratic functions and the third part with non-quadratic functions.

In particular, the first chapter considers quadratic functions whose leading coefficients are positive semi-definite. The iteration method of conjugate directions is given along with a geometric interpretation. This material is an extension of results found in [1]* and [2] by E. Stiefel and M. R. Hestenes.

The second chapter starts with a description of the two main algorithms. The first is by M. R. Hestenes [2]. The second is an extension of Davidon's method [7] as formulated by Fletcher and Powell [3].
Properties of the algorithms are also given. The equivalence of the two algorithms is then proved. A special case of this equivalence is shown by G. Myers [5].

*Numbers in square brackets refer to the bibliography.

Two special cases of the algorithms are given for minimizing a quadratic function with positive semi-definite leading coefficient. These are the conjugate gradient method and the Fletcher-Powell formulation of the Davidon method. The case when the leading coefficient is positive definite can be found in [1], [2], and [3]. It is shown that these two cases of the main algorithms are also equivalent.

In [2], M. R. Hestenes proved that every conjugate direction method is a conjugate gradient method. From this and the above results we are able to conclude that every conjugate direction method is also a Fletcher-Powell method.

The last section of Chapter II shows that minimizing the quadratic form by the method of conjugate directions is equivalent to performing Gauss elimination on the coefficient matrix. The case when the matrix is positive definite is done in [1].

The final chapter applies the minimizing algorithms to non-quadratic functions. Convergence for the algorithms is shown and conditions on the rates of convergence are derived. The Fletcher-Powell method is shown to converge faster than geometrically. Two forms of the conjugate gradient method are given, and both are shown to converge geometrically. The second scheme was suggested in [9]. In numerical tests [4], the second method did show better convergence than the first method.

The algorithms in the last chapter, rather than being given in terms of the function to be minimized, are presented in a form that is convenient for comparing and deriving convergence results. The equivalence between these algorithms and the minimization of a non-quadratic function is given in the last section.

2. Preliminaries.
We consider the quadratic form

(2.1)    E(x) = (r, Rr)

with

(2.2)    r = k - Ax,

where R and A are n x n matrices and x, k are n-dimensional vectors. It will be assumed that the elements of R, A, x, and k are real numbers, and that the matrix R is positive definite. We are concerned with finding a minimum of (2.1), which is also a solution to the linear equation

(2.3)    Ax = k

(if one exists), by searching in conjugate directions.

Given any two vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_n) we have

    x + y = (x_1 + y_1, ..., x_n + y_n),
    ax = (a x_1, ..., a x_n),

where a is a scalar. The inner product of two vectors is given by

    (x, y) = x_1 y_1 + ... + x_n y_n

and the length of the vector x is given by |x| = (x, x)^(1/2). The conjugate transpose of a matrix A will be denoted by A*, and A* satisfies the relation (Ax, y) = (x, A*y). The matrix A is said to be positive definite if (x, Ax) >= 0 for all x, with equality if and only if x = 0. It is positive semi-definite if (x, Ax) >= 0 for all x. If A is positive semi-definite then A = A*. The null space of a matrix A, denoted n(A), is the set of all vectors that A maps into the zero vector. If A is positive semi-definite and (x, Ay) = 0, with x and y not in the null space of A, then x and y are said to be A-orthogonal or A-conjugate.

At each vector or point x, the vector defined by (2.2) will be called the residue at x. The function E(x) defined by (2.1) will be called the error function. This comes from the fact that E(x) is a measure of the accuracy with which x approximates a solution to (2.3). The error function E(x) = 0 if and only if x is a solution to (2.3), that is, r(x) = 0. The error function is a positive quadratic form, and in the following work we will be looking for a minimum of E(x). In the case where (2.3) has a solution, a point minimizes E(x) if and only if it is also a solution to (2.3). Let M be the set of all points that minimize E(x), that is,

    M = {x | E(x) = minimum}.

If A is nonsingular then M contains a single point.
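The definitions above can be exercised numerically. The following sketch uses invented illustrative data (not from the thesis): A is chosen singular, so the minimizing set M contains a whole line of points.

```python
import numpy as np

# Sketch of the error function E(x) = (r, Rr) with r = k - Ax.
# All matrices here are invented illustrative data.
A = np.array([[2.0, 0.0],
              [0.0, 0.0]])          # singular: rank 1
R = np.eye(2)                       # any positive definite R
k = np.array([4.0, 0.0])

def E(x):
    r = k - A @ x
    return r @ R @ r

x_star = np.array([2.0, 0.0])       # one solution of Ax = k
assert abs(E(x_star)) < 1e-12
# adding any vector from the null space of A leaves E at its minimum,
# so here M = x_star + n(A) is a whole line
for t in (1.0, -3.0, 10.0):
    assert abs(E(x_star + t * np.array([0.0, 1.0]))) < 1e-12
```

Because R is positive definite, E vanishes exactly on the solution set of Ax = k, which illustrates why M is a single point only when A is nonsingular.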
If A is singular then M generally contains more than one point. It should be noted that any positive quadratic form can be put in the form of (2.1) plus a constant. Therefore our methods for minimizing (2.1) hold for all positive quadratic forms.

Given two vectors x and y we have

(2.4)    E(x + y) = E(x) - 2(g, y) + (y, Ny),

where

    g = A*Rr,    N = A*RA,

r is given by (2.2), and A is an arbitrary n x n matrix. The matrix N is positive semi-definite. The vector g will be called the gradient of E(x) at x; it differs from the usual gradient by the factor -1/2. Setting y = tp in equation (2.4), where t is a scalar, we obtain

(2.5)    E(x) - E(x + tp) = 2t(g, p) - t^2 (p, Np).

Using the following lemma we will obtain some properties of the error function.

Lemma 2.1  Given N = A*RA where R is positive definite, the following hold:
a) p is in n(A) if and only if p is in n(N);
b) p is in n(N) if and only if (p, Np) = 0;
c) p in n(N) implies (g, p) = 0.

Proof. p in n(A) implies p in n(N) by the definition of N. If p is in n(N) then

    0 = (p, Np) = (p, A*RAp) = (Ap, RAp).

Thus Ap = 0 and p is in n(A). This completes the proof of a), and b) follows from a) and the proof of a). To show c) we have

    (g, p) = (p, A*Rr) = (Ap, Rr) = 0

since Ap = 0.

We can now derive some elementary properties of E(x).

Lemma 2.2  Given vectors x and p, with p not in n(N), the function E(x + tp) considered as a function of t is minimized when t = a, where

    a = (g, p) / (p, Np).

We also have

    E(x) - E(x + ap) = a^2 (p, Np);

if p is in n(N), then E(x) - E(x + tp) = 0 for all t.

Proof. If p is not in n(N) then by Lemma 2.1 we have (p, Np) != 0. Hence minimizing E(x) - E(x + tp) as a function of t yields t = a and

    E(x) - E(x + ap) = 2a(g, p) - a^2 (p, Np) = a^2 (p, Np).

The last part follows directly from (2.5) and Lemma 2.1.

Lemma 2.3  Let B be a linear subspace of E^n. Then E(x + y) >= E(x) for all y in B if and only if g is orthogonal to B.

Proof. Equation (2.4) implies the inequality holds whenever (g, y) = 0 for all y in B. For the converse, note first that (g, p) = 0 for any p in B that lies in n(N), by Lemma 2.1;
if (g, p) != 0 for some vector p in B and not in n(N), then from Lemma 2.2

    E(x) - E(x + ap) = a^2 (p, Np) > 0

and the inequality does not hold when y = ap.

Recall that the set M was defined as the set of points that minimize the function E(x). We now obtain an expression for the set M.

Lemma 2.4  M = n(N) + h, where h satisfies the inequality E(h) <= E(x) for all x in E^n.

Proof. If v is in n(N) then by (2.4)

    E(h + v) = E(h) - 2(g, v) + (v, Nv) = E(h)

by Lemma 2.1. Thus h + n(N) is contained in M. Let h' be in M and set v = h' - h, so that h' = h + v and E(h') - E(h) = 0 by the definition of M. Suppose v is not in n(N); then Lemma 2.1 implies (v, Nv) != 0. We have

    E(h) - E(h + av) = a^2 (v, Nv) <= 0

since h minimizes E(x). Thus a = 0 and (g, v) = 0. Therefore

    0 = E(h') - E(h) = E(h + v) - E(h) = (v, Nv) != 0.

This is a contradiction to the assumption that v is not in n(N). Therefore v is in n(N) and M is contained in the set h + n(N).

Lemma 2.5  g(x) = 0 if and only if x is in M.

Proof. g(x) = 0 implies

    E(x) - E(x + tp) = -t^2 (p, Np) <= 0

for all t and all p, and hence x is in M. Suppose g(x) != 0. If x is in M we have E(x) - E(y) <= 0 for all y in E^n. In particular, let y = x + tg; then

    E(x) - E(x + tg) = 2t|g|^2 - t^2 (g, Ng) > 0

for t sufficiently small and positive. This is a contradiction to x being in M. Therefore x is not in M.

These lemmas give us some of the basic properties concerning the error function and the linear manifold M. Before considering finite iteration methods we note that

    g(x) = A*Rr(x) = A*R(k - Ax) = k' - Nx,    where k' = A*Rk.

The previous lemma shows that x is in M if and only if k' - Nx = 0. We know from the preliminary remarks that x in M solves Ax = k if and only if Ax = k has a solution. Therefore the two linear systems Ax = k and Nx = k' are equivalent if and only if Ax = k has a solution.

3. Iterative procedures.

We will be considering iterations

(3.1)    x_{i+1} = x_i + t_i p_i,    i = 0, 1, ...

such that x_m is in M, m <= n. Here p_i is a direction vector and t_i is a scalar.
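Before specifying how the p_i are chosen, the iteration (3.1) with the minimizing step of Lemma 2.2 can be sketched numerically. The data below are invented; the directions are taken to be eigenvectors of N, which are automatically N-conjugate, whereas Chapter II constructs conjugate directions without an eigendecomposition.

```python
import numpy as np

# Sketch (invented data) of iteration (3.1) with the optimal step
# t_i = a_i = (g_i, p_i)/(p_i, N p_i) from Lemma 2.2.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # nonsingular here, so M is one point
R = np.eye(2)
k = np.array([1.0, 5.0])
N = A.T @ R @ A

def E(x):
    r = k - A @ x
    return r @ R @ r

x = np.zeros(2)                     # arbitrary x_0
_, vecs = np.linalg.eigh(N)         # eigenvectors of N are N-conjugate
for i in range(2):                  # m = n = 2 steps
    p = vecs[:, i]
    g = A.T @ R @ (k - A @ x)       # the gradient g of (2.4)
    a = (g @ p) / (p @ N @ p)
    # Lemma 2.2: the decrease along p is exactly a^2 (p, Np)
    assert np.isclose(E(x) - E(x + a * p), a**2 * (p @ N @ p))
    x = x + a * p

assert np.allclose(A @ x, k)        # x_2 is in M: it solves Ax = k
```

After two exact line minimizations along N-conjugate directions, the iterate reaches M, which is the content of Lemma 3.1 below.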
The iterative procedures will be used to minimize the error function E(x). We set

(3.2)    r_i = k - Ax_i        a_i = c_i/d_i
         g_i = A*Rr_i          c_i = (g_i, p_i)
         N = A*RA              d_i = (p_i, Np_i)

and we have from Lemma 2.2 that E(x_{i+1}) = E(x_i + t_i p_i), as a function of t_i, attains its minimum when t_i = a_i if p_i is not in n(N), while E(x_{i+1}) - E(x_i) = 0 for all t_i when p_i is in n(N). Therefore for an arbitrary initial point x_0 our iteration will take the form

(3.3)    x_{i+1} = x_i + a_i p_i
         r_{i+1} = r_i - a_i Ap_i,    r_0 = k - Ax_0

with

    a_i = (g_i, p_i) / (p_i, Np_i)   if p_i is not in n(N),
    a_i = 0                          if p_i is in n(N).

Since E(x_{i+1}) - E(x_i) = 0 for all t_i when p_i is in n(N) and x_{i+1} = x_i + t_i p_i, we pick t_i = a_i = 0 for convenience. Thus the iteration is completely determined by our choice of p_0, p_1, ..., once x_0 and R have been chosen.

We have the following definitions concerning p_0, p_1, ..., p_r, r <= n-1. We say the vectors p_0, p_1, ..., p_r are linearly independent if

    a_0 p_0 + ... + a_r p_r = 0   implies   a_i = 0, i = 0, ..., r.

If B is an n x n matrix, then the vectors p_0, p_1, ..., p_r are B-linearly independent if

    a_0 p_0 + ... + a_r p_r in n(B)   implies   a_i = 0, i = 0, ..., r.

Clearly B-linear independence implies linear independence, and if B is nonsingular then the two definitions are equivalent.

Let m = rank of A; then n - m = dimension of n(A). We also have m = rank of N and n - m = dimension of n(N). The next two lemmas give the properties concerning the vectors p_0, p_1, ....

Lemma 3.1  If m vectors, m <= n, are chosen so that

(3.4)    d_i = (p_i, Np_i) > 0,    (p_i, Np_j) = 0,  i != j,

then p_0, ..., p_{m-1} are N-linearly independent and hence also A-linearly independent. Independent of the choice of x_0, the vector x_m defined by (3.2), (3.3) is in M. We also have the following conditions satisfied:

(3.5)    (g_i, p_j) = 0,           j = 0, ..., i-1,
(3.6)    (g_i, p_i) = (g_j, p_i),  j = 0, ..., i-1.

Proof. We first note that d_i > 0 implies that p_i is not in n(N). Condition (3.6) shows that if

    a_i = (g_i, p_i) / (p_i, Np_i),

then

    a_i = (g_0, p_i) / (p_i, Np_i).
We note that if

    c_0 p_0 + ... + c_{m-1} p_{m-1}

is in n(N), then N(c_0 p_0 + ... + c_{m-1} p_{m-1}) = 0; therefore

    c_0 Np_0 + ... + c_{m-1} Np_{m-1} = 0   and   c_j (p_j, Np_j) = 0

for j = 0, ..., m-1. Hence c_j = 0, j = 0, ..., m-1, and p_0, ..., p_{m-1} are N-linearly and also A-linearly independent.

We next prove conditions (3.5) and (3.6). We have

    g_{i+1} = A*Rr_{i+1} = g_i - a_i Np_i

and hence

    (g_{i+1}, p_j) = (g_i, p_j) - a_i (Np_i, p_j).

For i != j this becomes, by (3.4),

    (g_{i+1}, p_j) = (g_i, p_j),

and the formula for a_i implies (g_{i+1}, p_i) = 0. Therefore we get

    (g_i, p_j) = 0   for i > j.

We also obtain

    (g_{i+1}, p_j) = (g_i, p_j) - a_i (Np_i, p_j) = (g_i, p_j),   j > i,

which implies

    (g_i, p_i) = (g_{i-1}, p_i) = ... = (g_0, p_i).

This completes the proof of (3.5) and (3.6).

To show that x_m is in M we note that

    (g_m, p_j) = 0,   j = 0, ..., m-1,

and g_m is orthogonal to n(N) by Lemma 2.1. The dimension of n(N) is n - m; thus g_m is orthogonal to all of E^n. Hence g_m = 0 and Lemma 2.5 implies x_m is in M.

We have as a converse to Lemma 3.1 the following lemma.

Lemma 3.2  Let the vectors p_0, p_1, ..., p_{m-1} be A-linearly independent. Then there exists a positive definite matrix R such that (3.4) holds with N = A*RA. If in iteration (3.1) x_m is in M, then t_i = a_i.

Proof. Let P be the matrix whose columns are p_0, ..., p_{m-1}. Let Q = AP with columns v_0, ..., v_{m-1}, that is, v_i = Ap_i, i = 0, ..., m-1. The vectors v_0, ..., v_{m-1} are linearly independent since the vectors p_0, ..., p_{m-1} are A-linearly independent. Extend v_0, ..., v_{m-1} to a basis for E^n. Thus we have n linearly independent vectors v_0, ..., v_{n-1}. Let Q' be the nonsingular matrix whose columns are v_0, ..., v_{n-1}. If we set R = (Q'Q'*)^{-1}, then R is a positive definite matrix. We have

    Q'*RQ' = Q'*(Q'Q'*)^{-1}Q' = I,

that is,

    (v_i, Rv_j) = 0,  i != j,    and    (v_i, Rv_i) = 1.

Therefore if we set N = A*RA we have

    P*NP = (AP)*R(AP) = Q*RQ = I,

and (3.4) holds.

To show t_i = a_i we note that

    x_m = x_i + t_i p_i + ... + t_{m-1} p_{m-1}

and r_m = k - Ax_m. Since x_m is in M we have g_m = 0; therefore

    g_m = A*Rr_m = A*R(k - Ax_m) = k' - Nx_m = 0.
Now

    g_i = A*Rr_i = k' - Nx_i = N(x_m - x_i) = t_i Np_i + ... + t_{m-1} Np_{m-1}.

Therefore (g_i, p_i) = t_i (p_i, Np_i), with p_i not in n(N), and hence

    t_i = (g_i, p_i) / (p_i, Np_i) = a_i.

An iteration (3.3) in which the vectors p_0, ..., p_{m-1} are chosen as in (3.4) is called the method of conjugate directions (cd-method) by M. R. Hestenes and E. Stiefel [1]. We have the following theorem concerning the cd-method.

Theorem 3.1  In the cd-method the point x_i minimizes E(x) on the line x = x_{i-1} + t p_{i-1} and also on the i-dimensional plane of points

    x = x_0 + t_0 p_0 + ... + t_{i-1} p_{i-1}.

The point x_i is the center of the (i-1)-dimensional ellipsoid which is the intersection of this plane with the level curve E(x) = E(x_0).

This theorem follows from Lemmas 2.2, 2.3, and 3.1. It differs from Theorem 3.1 in [2] because there the level curve E(x) = E(x_0) is an (n-1)-dimensional ellipsoid, whereas here this level curve is an (m-1)-dimensional ellipsoidal cylinder with center M.

Recall that the set M = h + n(N), where h is a point that minimizes E(x). When the matrix A is nonsingular, the level curves E(x) = constant form a one-parameter family of (n-1)-dimensional ellipsoids with center M (a single point), the solution to Ax = k. In the case where A is singular and has rank m, the level curves E(x) = constant form a one-parameter family of (m-1)-dimensional ellipsoidal cylinders with center M. In order to see this we consider the linear mapping A: E^n -> D, where D is an m-dimensional linear subspace of E^n and the dimension of n(A) is n - m. We have E^n = n(A) + B_m, where B_m is an m-dimensional linear subspace of E^n and A: B_m -> D is one-to-one. Let A' be a nonsingular matrix such that A'x = Ax for all x in B_m, and set E'(x) = (k - A'x, R(k - A'x)), so that E'(x) = E(x) for all x in B_m. The level curves E'(x) = constant form a one-parameter family of (n-1)-dimensional ellipsoids. Thus the intersection of the level curves E'(x) = constant with B_m forms a one-parameter family of (m-1)-dimensional ellipsoids in B_m. But since E(x)
and E'(x) agree on B_m, the intersections of the level curves E(x) = constant with B_m are likewise (m-1)-dimensional ellipsoids in B_m, whose centers minimize E(x) in E^n. We also have E^n = n(A) + B_m and E(x + q) = E(x) for all q in n(A); hence the level curves E(x) = constant are (m-1)-dimensional ellipsoidal cylinders with center axis M.

If in Theorem 3.1 we let

    B_i = { t_0 p_0 + ... + t_{i-1} p_{i-1} | t_j arbitrary },    i = 1, ..., m,

then B_i is an i-dimensional linear subspace generated by i A-linearly independent vectors, and B_m is as above. We have B_i contained in B_{i+1} for each i; hence the level curve E(x) = constant intersects B_i in an (i-1)-dimensional ellipsoid.

In the cd-method we have an arbitrary initial point x_0. This results in a displacement of each B_i by an amount x_0. The geometrical interpretation is the same as above, except that we are dealing with hyperplanes that are not subspaces. The points x_0, ..., x_i are in x_0 + B_i, and x_i is the center of the (i-1)-dimensional ellipsoid which is the intersection of the level curve E(x) = E(x_0) and x_0 + B_i. The point x_m minimizes E(x) and is the intersection of x_0 + B_m and M.

The previous two sections are an extension of the results found in [1] and [2] to the case where A is a singular matrix. We now consider various methods for generating vectors that satisfy condition (3.4).

II. GENERAL ALGORITHMS

4. Methods for generating conjugate direction vectors.

Given a positive semi-definite matrix N, we can generate vectors that are N-conjugate by taking n basis vectors u_1, ..., u_n and then N-orthogonalizing them by a Gram-Schmidt process. This method will be looked at in section 9. In this section we will present two algorithms. The first is by M. R. Hestenes and can be found in [2]. The second method is an extension of the Davidon algorithm, as formulated by Fletcher and Powell in [3]. We will show in section 5 that the two algorithms are equivalent.

The following theorem is due to M. R. Hestenes and can be found in [2].
Theorem 4.1  Let K and N be positive semi-definite matrices and let g_0 be an arbitrary vector not in n(K). The algorithm

(4.1)    p_0 = Kg_0
         g_{i+1} = g_i - a_i Np_i
         p_{i+1} = Kg_{i+1} + b_i p_i
         a_i = c_i/d_i,    b_i = e_i/d_i
         d_i = (p_i, Np_i),    c_i = (g_i, p_i),    e_i = -(Np_i, Kg_{i+1})

generates non-zero vectors g_0, g_1, ... and p_0, p_1, ... satisfying the relations

(4.2)    (g_i, Kg_j) = 0,    (p_i, Np_j) = 0,    i != j.

The algorithm will terminate in r steps, where r <= min(m, s), m is the rank of N, and s is the rank of K. At the r-th step one of the following will hold:

i) g_r = 0;
ii) g_r != 0, Kg_r = 0;
iii) Kg_r != 0, Np_r = 0 with p_r != 0.

Corollary 1.  If K is positive definite and (z, g_0) = 0 for all z in n(N), then algorithm (4.1) will terminate in r steps, r <= rank of N, if and only if g_r = 0.

Proof. Clearly case ii) of Theorem 4.1 cannot occur with K positive definite. If (z, g_0) = 0 for all z in n(N), then

    (z, g_i) = (z, g_0) - a_0 (z, Np_0) - ... - a_{i-1} (z, Np_{i-1}) = 0

for all z in n(N) and i <= r. In case iii), Np_r = 0 means that

    p_r = Kg_r + b_{r-1} p_{r-1}

is in n(N). Therefore

    0 = (g_r, p_r) = (g_r, Kg_r) + b_{r-1} (g_r, p_{r-1}) = (g_r, Kg_r).

Since K is positive definite, g_r = 0, and the algorithm terminates in r steps if and only if g_r = 0.

Corollary 2.  If both K and N are positive definite, then (4.1) will terminate in r steps, r <= n, if and only if g_r = 0.

It will be convenient to describe the material dealing with the Davidon algorithm as formulated by Fletcher and Powell in terms of the Dirac bracket notation; Fletcher and Powell used this notation in [3]. The reason for this is that outer products of vectors occur in this algorithm and have a convenient formulation in this notation. A column vector (x_1, ..., x_n) is written as |x>. The row vector with these same elements is denoted by <x|. The scalar product of <x| and |y> is written <x|y>. The notation |x><y| denotes the n x n matrix formed as the outer product of the column vector |x> with the row vector <y|. Thus A|x> is a column vector and <x|A|y> is a scalar.

The following theorem gives an algorithm that generates vectors which are N-conjugate. It is an extension of the Davidon algorithm as formulated by Fletcher and Powell [3].
Theorem 4.2  Given positive semi-definite matrices H_0 and N and a vector g_0 not in n(H_0), the algorithm

(4.3)    |s_0> = H_0|g_0>
         |g_{i+1}> = |g_i> - a_i N|s_i>
         |s_{i+1}> = H_{i+1}|g_{i+1}>
         H_{i+1} = H_i + A_i + B_i,

where

    A_i = |sigma_i><sigma_i| / <sigma_i|y_i>,    B_i = -H_i|y_i><y_i|H_i / <y_i|H_i|y_i>,
    |sigma_i> = a_i|s_i>,    |y_i> = |g_i> - |g_{i+1}>,
    a_i = c_i/d_i,    c_i = <g_i|s_i>,    d_i = <s_i|N|s_i>,

generates non-zero vectors s_0, s_1, ... satisfying the relation

    <s_i|N|s_j> = 0,    i != j.

The algorithm terminates in r steps when one of the following holds:

i) |g_r> = 0;
ii) |g_r> != 0, H_{r-1}|g_r> = 0;
iii) H_{r-1}|g_r> != 0, N|s_r> = 0 with |s_r> != 0.

Theorem 4.2 will be proved by means of two lemmas.

Lemma 4.1  The scalars a_i, c_i, and d_i are positive for i < r. Moreover, the following hold:

(4.4)    a) <sigma_i|g_{i+1}> = 0
         b) H_{i+1}N|sigma_i> = |sigma_i>
         c) <sigma_i|N|sigma_{i+1}> = 0
         d) <x|H_{i+1}|x> >= 0 for all x
         e) <g_i|H_i|g_{i+1}> = 0
         f) <sigma_{i-1}|g_{i+1}> = 0
         g) <y_i|H_i|y_i> > 0.

The algorithm terminates at the r-th step if and only if one of the following holds: i) |g_r> = 0; ii) |g_r> != 0, H_{r-1}|g_r> = 0; iii) H_{r-1}|g_r> != 0, N|s_r> = 0, and in this case |s_r> != 0.

Proof. The scalar a_i is chosen so that <s_i|g_{i+1}> = 0; therefore

    <sigma_i|g_{i+1}> = a_i <s_i|g_{i+1}> = 0.

To show b) we have

    H_{i+1}N|sigma_i> = H_{i+1}|y_i> = H_i|y_i> + A_i|y_i> + B_i|y_i> = H_i|y_i> + |sigma_i> - H_i|y_i> = |sigma_i>.

For c),

    0 = <g_{i+1}|sigma_i> = <g_{i+1}|H_{i+1}N|sigma_i> = <s_{i+1}|N|sigma_i>,

implying <sigma_{i+1}|N|sigma_i> = 0.

We prove d) by induction. We are given H_0 positive semi-definite. Assume H_1, ..., H_k are positive semi-definite and let |p> = (H_k)^{1/2}|x>, |q> = (H_k)^{1/2}|y_k>, with |q> != 0. Then

    <x|H_{k+1}|x> = <p|p> - <p|q>^2 / <q|q> + <x|sigma_k>^2 / <sigma_k|y_k>.

Since <p|p><q|q> - <p|q>^2 >= 0 by Schwarz's inequality, and <sigma_k|y_k> = <sigma_k|N|sigma_k> > 0, we have <x|H_{k+1}|x> >= 0 for all x.

To show e) we observe that

    <g_i|H_i|g_{i+1}> = <s_i|g_{i+1}> = 0.

Now

    <sigma_{i-1}|g_{i+1}> = <sigma_{i-1}|g_i> - <sigma_{i-1}|N|sigma_i> = 0,

establishing f). For g) we note that

    <y_i|H_i|y_i> = <g_i - g_{i+1}|H_i|g_i - g_{i+1}> = <g_i|H_i|g_i> + <g_{i+1}|H_i|g_{i+1}>

by e). Since <g_i|H_i|g_i> > 0 for i < r (H_i is positive semi-definite and H_i|g_i> = |s_i> != 0) and <g_{i+1}|H_i|g_{i+1}> >= 0, we have <y_i|H_i|y_i> > 0. It follows that c_i = <g_i|s_i> = <g_i|H_i|g_i> > 0, d_i = <s_i|N|s_i> > 0, and a_i = c_i/d_i > 0.

We now prove the last statement of the lemma. If |g_r> = 0 then |s_r> = H_r|g_r> = 0 and the algorithm terminates. If |g_r> != 0 and H_{r-1}|g_r> = 0, then

    |s_r> = H_r|g_r> = H_{r-1}|g_r> + A_{r-1}|g_r> + B_{r-1}|g_r>

with A_{r-1}|g_r> = 0 and B_{r-1}|g_r> = 0; thus |s_r> = 0 and the algorithm terminates.
If H_{r-1}|g_r> != 0 and N|s_r> = 0, then |y_r> = N|sigma_r> = 0 and hence H_{r+1} is undefined, since both A_r and B_r are undefined. Therefore the algorithm terminates in r steps. In this case we have |s_r> != 0; for if |s_r> = 0, then H_r|g_r> = 0 and, since A_{r-1}|g_r> = 0 by a),

    H_{r-1}|g_r> + B_{r-1}|g_r> = 0.

Using H_{r-1}|y_{r-1}> = |s_{r-1}> - H_{r-1}|g_r>, writing this equation out gives

    (1 + t) H_{r-1}|g_r> = t |s_{r-1}>,    where t = <y_{r-1}|H_{r-1}|g_r> / <y_{r-1}|H_{r-1}|y_{r-1}>.

If t != 0 (note t = -1 is impossible, since then |s_{r-1}> = 0), then

    0 = <g_r|sigma_{r-1}> = a_{r-1} ((1 + t)/t) <g_r|H_{r-1}|g_r>,

implying H_{r-1}|g_r> = 0, which is a contradiction. If t = 0 then 0 = H_r|g_r> = H_{r-1}|g_r>, which again contradicts the assumption in case iii) that H_{r-1}|g_r> != 0.

If none of the three conditions hold, then |y_r> = N|sigma_r> != 0. Also H_r|y_r> != 0, for if H_r|y_r> = 0 then H_r|g_r> = H_r|g_{r+1}> and

    <g_r|H_r|g_r> = <g_r|H_r|g_{r+1}> = 0.

Hence |s_r> = H_r|g_r> = 0 and N|sigma_r> = 0, which is a contradiction to the assumption N|sigma_r> != 0; thus H_r|y_r> != 0. We therefore have

    <sigma_r|y_r> = <sigma_r|N|sigma_r> > 0    and    <y_r|H_r|y_r> > 0,

hence both A_r and B_r are defined and H_{r+1} is a positive semi-definite matrix. Thus the algorithm does not terminate at the r-th step, and

    |s_{r+1}> = H_{r+1}|g_{r+1}>,    with |g_{r+1}> = |g_r> - N|sigma_r>.

This completes the proof of the lemma.

Lemma 4.2  The vectors s_0, s_1, ... generated by algorithm (4.3) satisfy the following relations:

(4.5)    a) <sigma_i|N|sigma_j> = 0,   i != j
         b) H_k N|sigma_j> = |sigma_j>,   j < k
         c) <sigma_i|g_j> = 0,   i = 0, ..., j-1
         d) <s_i|g_j> = c_i,   j = 0, ..., i.

Proof. In view of Lemma 4.1, relations (4.5) a), b), and c) hold for i, j <= 1. Assume they hold for i, j <= k; we shall prove that they hold for i, j <= k+1. We have

    |g_k> = |g_{i+1}> - N|sigma_{i+1}> - ... - N|sigma_{k-1}>

and

    <sigma_i|g_k> = <sigma_i|g_{i+1}> - <sigma_i|N|sigma_{i+1}> - ... - <sigma_i|N|sigma_{k-1}> = 0    for i < k.

From (4.5) b) we obtain

    <sigma_i|N|sigma_k> = a_k <sigma_i|N|s_k> = a_k <sigma_i|N H_k|g_k> = a_k <sigma_i|g_k> = 0    for i < k.

Also,

    <sigma_i|g_{k+1}> = <sigma_i|g_k> - <sigma_i|N|sigma_k> = 0    for i < k + 1.

From this we observe that <y_k|sigma_i> = <sigma_k|N|sigma_i> = 0 for i < k. Therefore

    H_{k+1}N|sigma_i> = H_k N|sigma_i> + |sigma_k><sigma_k|N|sigma_i>/<sigma_k|y_k> - H_k|y_k><y_k|H_k N|sigma_i>/<y_k|H_k|y_k>
                      = H_k N|sigma_i> = |sigma_i>

for i < k, and H_{k+1}N|sigma_k> = |sigma_k> by Lemma 4.1 b). From this we get

    0 = <g_{k+1}|sigma_i> = <g_{k+1}|H_{k+1}N|sigma_i> = <s_{k+1}|N|sigma_i>    for i < k + 1.
Hence <sigma_i|N|sigma_j> = 0 for i != j, i, j <= k+1, and (4.5) a), b), and c) are proved for i, j <= k+1.

To prove d) we have

    <s_1|g_0> = <s_1|g_1> + a_0 <s_1|N|s_0> = <s_1|g_1> = c_1.

Assume d) holds for j <= k, that is, <s_j|g_i> = c_j for i = 0, ..., j. Then we note that, for j <= k+1,

    <s_{k+1}|g_j> = <s_{k+1}|g_{k+1}> + a_j <s_{k+1}|N|s_j> + ... + a_k <s_{k+1}|N|s_k> = <s_{k+1}|g_{k+1}> = c_{k+1},

and Lemma 4.2 is proved.

Theorem 4.2 follows from Lemmas 4.1 and 4.2. The vectors s_0, s_1, ..., s_{r-1} and sigma_0, sigma_1, ..., sigma_{r-1} are N-linearly independent and hence also linearly independent.

The next two lemmas give some properties of the matrices H_i, i = 0, 1, ....

Lemma 4.3  n(H_0) = n(H_i),  i > 0.

Proof. Assume z is in n(H_i) for i <= k, and we show that z is in n(H_{k+1}). We have

    H_{k+1}|z> = H_k|z> + A_k|z> + B_k|z> = A_k|z> = |sigma_k> <sigma_k|z> / <sigma_k|y_k>.

We note that |sigma_k> = a_k H_k|g_k>, hence <sigma_k|z> = a_k <g_k|H_k|z> = 0, and therefore A_k|z> = 0. Thus H_{k+1}|z> = 0 and z is in n(H_{k+1}). We therefore have n(H_i) contained in n(H_{i+1}) for each i, and hence n(H_0) contained in n(H_i) for each i.

To complete the proof we need only show that if z is in n(H_k) then z is in n(H_{k-1}). Assume z is in n(H_k); then

    0 = <z|H_k|z> = <z|H_{k-1}|z> + <z|sigma_{k-1}>^2 / <sigma_{k-1}|y_{k-1}> - <z|H_{k-1}|y_{k-1}>^2 / <y_{k-1}|H_{k-1}|y_{k-1}>.

Setting |p> = (H_{k-1})^{1/2}|z> and |q> = (H_{k-1})^{1/2}|y_{k-1}>, where |q> != 0, we obtain

(4.6)    0 = <z|H_k|z> = <p|p> - <p|q>^2 / <q|q> + <z|sigma_{k-1}>^2 / <sigma_{k-1}|y_{k-1}>.

By Schwarz's inequality we have
We complete this section with two corollaries to theorem “.2 ' Corollaryi .l_f_ H0 Ls. positive definite m I O for a l z‘Lp n(N) then algorithm (“.3) will terminate L; r stars-2. r s 24229.1. N. Lf.__and __.y.onl 1:, lg.) - o. M' 'f I 0 for all z in n(N) then . o for all i 5 r. It is only necessary to show that case Ii) and case iii) of theorem “.2 cannot occur. Clearly case ii) cannot occur since Hr-l ls positive definite by lemma “.“. in case iii) we have Hr-lgr I O and ler>> I 0 with Isr> T Hr|9r> T Hr-llgr> T Br-llgr>' Since Isr> is in n(N) and I O for all 2 In n(N), then _2 . 0 " <9r|5r> T T In order for the right hand side of this equation to be zero it is necessary that [9r> T tlyr-l> for some scalar t. If t I 0 then 3i [5,.) T Hrlgr> T tHrlyr-l> T tlc’r-l> and NI°,-}>’ I 0. This contradicts our assumption that the altliorithm terminates at the rth step. If t I 0 then Igr> I 0 and Hr-l|9r>’ I 0 , which contradicts Hr-l|9€> I 0. Therefore case iii) cannot occur and the algorithm terminates if and only if Igr> - 0. Corollary 2 if both H and N are positive definite then —“ O- _ algorithm (“.3) will terminate in r-steps, r 5 rank _i N, LL and Gill! Ll: lgr> I O. 5. The equivalence of algorithms (“.l) and (“.3). In this section we prove the equivalence of algorithms (“.l) and (h,3). We do this by showing that if Ho - K then the two algorithms generate vectors that are scalar multiples of each other. G. E. Myers proved a special case of this in [5]. There Ho and K were both equal to I and N was positive definite. in order to prove this equivalence we need the following two lemmas. Mil-$5.] <91[Hi-l[93> I O _2_r_l < j 5 r Proof. Since HI-]|Z1-]> |s,> - H1|9|> . Hi-l|9i> " and <5ilgj> '0 l‘JS" 32 by “.5 c) , we have (5.1) o - (gulf-i) - <9jIHi-i[9i> _ iglIEI-llyi-Q We note that (QJIHl-llYi-l> " ' <9lei-i[9i> and ‘> ' ‘<9i-l[Hl'l[gi-l>’ T [" Therefore (5.1) can be written as (”mi-ll”) ] (502) 0 ' g IH _ lg [T — - -- _— ‘__5 _ . 
    [ <g_{i-1}|H_{i-1}|g_{i-1}> / ( <g_{i-1}|H_{i-1}|g_{i-1}> + <g_i|H_{i-1}|g_i> ) ].

Also <g_{i-1}|H_{i-1}|g_{i-1}> > 0, since i - 1 < r. Therefore equation (5.2) holds if and only if

    <g_j|H_{i-1}|g_i> = 0    for i < j <= r.

Lemma 5.2  H_i|g_j> = H_0|g_j> for i < j <= r.

Proof. We first note that (4.5) c) implies A_i|g_j> = 0 for i = 0, ..., j-1; hence

    H_i|g_j> = H_{i-1}|g_j> + B_{i-1}|g_j>
             = H_{i-1}|g_j> - H_{i-1}|y_{i-1}> [ <g_{i-1}|H_{i-1}|g_j> - <g_i|H_{i-1}|g_j> ] / <y_{i-1}|H_{i-1}|y_{i-1}>.

However, <g_i|H_{i-1}|g_j> = 0 by the previous lemma, and <g_{i-1}|H_{i-1}|g_j> = <s_{i-1}|g_j> = 0. Therefore H_i|g_j> = H_{i-1}|g_j> for i < j <= r, and

    H_i|g_j> = H_{i-1}|g_j> = ... = H_0|g_j>,

which proves the lemma.

Theorem 5.1  If K = H_0, then algorithms (4.1) and (4.3) generate vectors in the same directions.

Proof. The proof of this theorem is by induction. We have

    |s_0> = H_0|g_0> = K|g_0> = |p_0>.

Assume |s_i> = t_i|p_i> with t_i > 0 for i <= k < r, where |p_i>, i = 0, ..., k, are generated by algorithm (4.1) and |s_i>, i = 0, ..., k, are generated by algorithm (4.3). We have

(5.3)    |s_{k+1}> = H_{k+1}|g_{k+1}> = H_k|g_{k+1}> + B_k|g_{k+1}>
                   = H_k|g_{k+1}> - H_k|y_k> <y_k|H_k|g_{k+1}> / <y_k|H_k|y_k>.

Now

(5.4)    H_k|y_k> = H_k|g_k> - H_k|g_{k+1}> = |s_k> - H_0|g_{k+1}>

and

(5.5)    <y_k|H_k|g_{k+1}> = <g_k|H_k|g_{k+1}> - <g_{k+1}|H_k|g_{k+1}> = -<g_{k+1}|H_0|g_{k+1}>,

using (4.4) e) and Lemma 5.2. Substituting (5.4) and (5.5) into (5.3) and collecting terms we obtain

(5.6)    |s_{k+1}> = t_{k+1} H_0|g_{k+1}> + B'_k|s_k>,

where

    t_{k+1} = <g_k|H_k|g_k> / ( <g_k|H_k|g_k> + <g_{k+1}|H_0|g_{k+1}> ) > 0

and

    B'_k = <g_{k+1}|H_0|g_{k+1}> / ( <g_k|H_k|g_k> + <g_{k+1}|H_0|g_{k+1}> ) = t_{k+1} <g_{k+1}|K|g_{k+1}> / <g_k|H_k|g_k>.

Using (4.1) and (4.2) we can show that

    b_k = <g_{k+1}|K|g_{k+1}> / <g_k|K|g_k>,

and <g_k|H_k|g_k> = <g_k|s_k> = t_k <g_k|p_k> = t_k <g_k|K|g_k>; hence B'_k = (t_{k+1}/t_k) b_k. Therefore (5.6) becomes

    |s_{k+1}> = t_{k+1} ( K|g_{k+1}> + b_k|p_k> ) = t_{k+1}|p_{k+1}>,

with t_{k+1} > 0, which completes the induction.

Corollary  If H_0 = K is positive definite and <z|g_0> = 0 for all z in n(N), then algorithms (4.1) and (4.3) are equivalent in the sense described above.

Lemma 5.3  <g_j|H_k|g_i> = 0 for i < j <= r and k <= i.

Proof. For k < i we have

    <g_j|H_k|g_i> = <g_j|H_{i-1}|g_i> = 0,

with Lemma 5.2 implying the first equality and Lemma 5.1 the second; for k = i, <g_j|H_i|g_i> = <g_j|s_i> = 0 for i < j. We note that in particular <g_i|H_0|g_j> = 0 for i != j.

6. Converse to Theorem 4.1 and Theorem 4.2.
Algorithms (4.1) and (4.3) can be used to generate any set of N-conjugate vectors, where N is a positive semi-definite matrix. In [2] this is shown for algorithm (4.1) when N and K are positive definite matrices.

Theorem 6.1  Let N be a positive semi-definite matrix with rank m and let p_0, ..., p_{m-1} be m vectors such that

(6.1)    d_i = (p_i, Np_i) > 0,    (p_i, Np_j) = 0,  i != j.

Let c_0, ..., c_{m-1} be m positive numbers, set a_i = c_i/d_i, i = 0, ..., m-1, let

(6.2)    g_0 = a_0 Np_0 + ... + a_{m-1} Np_{m-1},

and let g_1, ..., g_{m-1} be generated by

(6.3)    g_{i+1} = g_i - a_i Np_i.

Then there exists a positive semi-definite matrix K such that

(6.4)    p_0 = Kg_0,    p_{i+1} = Kg_{i+1} + b_i p_i,

with b_i = c_{i+1}/c_i. Also

(6.5)    c_i = (g_i, Kg_i),    (g_i, Kg_j) = 0,  i != j.

Proof. Let G and P be the n x n matrices whose columns are g_0, ..., g_{m-1}, 0, ..., 0 and p_0, ..., p_{m-1}, 0, ..., 0, respectively. Let A, C, D be the diagonal matrices whose diagonals are a_0, ..., a_{m-1}, 0, ..., 0; c_0, ..., c_{m-1}, 0, ..., 0; and d_0, ..., d_{m-1}, 0, ..., 0. Let A', C', D' be the diagonal matrices whose diagonals are 1/a_0, ..., 1/a_{m-1}, 0, ..., 0; 1/c_0, ..., 1/c_{m-1}, 0, ..., 0; and 1/d_0, ..., 1/d_{m-1}, 0, ..., 0. Set b_i = c_{i+1}/c_i, and let B be the matrix with b_0, ..., b_{m-2}, 0, ..., 0 just above the diagonal and zeros elsewhere. Let V = (v_ij) with v_ij = 1 when i = j+1 and v_ij = 0 elsewhere, and set T = I - V.

We have

    A = CD' = D'C,    B = C'V*C,    P*NP = D.

Relation (6.3) has the form GT = NPA. It follows that

    P*GT = P*NPA = DA = C.

Since C* = C, we have C = C* = (P*GT)* = T*G*P, and G*P = T*^{-1} C. If we define Q = PC'T*, then

    G*Q = G*PC'T* = T*^{-1} CC'T*,

where CC' = I', the diagonal matrix with 1 in the first m diagonal entries and 0 elsewhere. Since the columns of C beyond the m-th vanish, it follows that CQ*G = C. Define K = QCQ*; then

    KG = QCQ*G = QC.

Since QC = PC'T*C = P(I' - B) = P - PB, we have KG = P - PB. Hence P = KG + PB, establishing (6.4). Also G*KG = G*QCQ*G = C, and K is positive semi-definite, which establishes (6.5). This completes the proof of the theorem.

We can also choose K positive definite.
Let L be a positive semi-definite matrix whose null space is spanned by g_0, ..., g_{m-1}; then the matrix K + L is positive definite. The matrix K + L also satisfies conditions (6.4) and (6.5) of theorem 6.1.

Theorem 6.1 is an extension of theorem 5.1 in reference [2], from N and K positive definite to N and K positive semi-definite.

Theorem 6.2. Let N be a positive semi-definite matrix of rank m and let p_0, ..., p_{m-1} be m vectors such that

    (p_i, Np_i) > 0,  (p_i, Np_j) = 0,  i != j.

If (6.2) and (6.3) hold then there exists a positive semi-definite matrix H_0 such that

(6.6)  p_0 = H_0 g_0,  p_{i+1} = H_{i+1} g_{i+1},  i = 0, ..., m-2,

where H_{i+1} is defined as in algorithm (4.3). Also

(6.7)  (g_i, H_k g_i) > 0,  (g_i, H_k g_j) = 0,  k <= i < j.

Proof. Statement (6.6) follows from the equivalence of algorithms (4.1) and (4.3) and theorem 6.1. We get (6.7) from lemma 5.3 and the fact that H_k is positive semi-definite.

We now describe the conjugate gradient method (cg-method) and the Davidon method as formulated by Fletcher and Powell (FP-method).

7. The cg-method.

The following is essentially the method described in [1] and [2]; however, here we allow A to be a singular matrix. The method of conjugate gradients used to minimize the function E(x), where E(x) = (r, Rr) with R a positive definite matrix and r = k - Ax, is as follows. Select a positive definite matrix K and an initial estimate x_0 and compute

    r_0 = k - Ax_0,  g_0 = A*R r_0,  and  p_0 = Kg_0.

The vector g_0 is a negative scalar multiple of the usual gradient of E(x) at x = x_0, and is therefore in a direction of decrease of E(x). Having obtained x_i, r_i, g_i and p_i we compute x_{i+1}, r_{i+1}, g_{i+1}, and p_{i+1} by the following. We have, with N = A*RA,

(7.1)
    a_i = (g_i, p_i)/(p_i, Np_i) = (g_i, Kg_i)/(p_i, Np_i)
    x_{i+1} = x_i + a_i p_i
    r_{i+1} = k - Ax_{i+1} = r_i - a_i Ap_i
    g_{i+1} = A*R r_{i+1}
    b_i = -(Np_i, Kg_{i+1})/(p_i, Np_i) = (g_{i+1}, Kg_{i+1})/(g_i, Kg_i)
    p_{i+1} = Kg_{i+1}
+ b_i p_i.

We note that g_{i+1} = A*R r_{i+1} = g_i - a_i Np_i. The vectors g_0, g_1, ... are orthogonal to n(N) and hence algorithm (7.1) satisfies the hypothesis of corollary 1 to theorem 4.1. Therefore algorithm (7.1) terminates in r steps, r <= rank of N, when g_r = 0. By lemma 2.5, x_r minimizes E(x) and is therefore a solution to Ax = k if one exists.

Theorem 7.1. The cg-method is a cd-method and the solution x_r is in M. The points x_0, ..., x_i (i <= r) generated by the cg-method lie in an i-dimensional subspace π_i. The point x_i minimizes the function E(x) in π_i.

This theorem is proved in [2]; however, it is shown above that x_r is in M. The following theorem completes this section on the cg-method. It essentially shows that every cd-method is a cg-method.

Theorem 7.2. Let N be a positive semi-definite matrix of rank m and let p_0, ..., p_{m-1} be m vectors such that

    d_i = (p_i, Np_i) > 0,  (p_i, Np_j) = 0,  i != j.

Then there exists a cg-method such that the algorithm described in section 7 generates p_0, ..., p_{m-1}.

Proof. To prove this we need only show that g_0 lies in the space spanned by Np_0, ..., Np_{m-1} and then apply theorem 6.1 and the paragraph following it. If m = n then Np_0, ..., Np_{m-1} spans E^n and we are done. If m < n, then g_0 = A*R r_0 is orthogonal to n(N) and hence lies in the range of N, which is spanned by the m linearly independent vectors Np_0, ..., Np_{m-1}.

8. The FP-method.

Select an initial estimate x_0 and compute |r_0> = |k> - A|x_0>, |g_0> = A*R|r_0>, and |s_0> = H_0|g_0>, where H_0 is a given positive definite matrix. Having obtained H_i, x_i, r_i, g_i, and s_i we compute H_{i+1}, a_i, x_{i+1}, r_{i+1}, g_{i+1} and s_{i+1} by the following formulas. We have

    a_i = <g_i|s_i> / <s_i|N|s_i>
    x_{i+1} = x_i + a_i|s_i>
    |r_{i+1}> = |r_i> - a_i A|s_i>
    |g_{i+1}> = A*R|r_{i+1}> = |g_i> - a_i N|s_i>
    |σ_i> = a_i|s_i>
    |y_i> = |g_i> - |g_{i+1}> = N|σ_i>
    H_{i+1} = H_i + |σ_i><σ_i| / <σ_i|y_i> - H_i|y_i><y_i|H_i / <y_i|H_i|y_i>
    |s_{i+1}> = H_{i+1}|g_{i+1}>.

This algorithm is equivalent to algorithm (4.3), where g_0, ..., g_r are the negative gradient vectors of E(x) at x_0, ..., x_r respectively. As in the last section, the gradient vectors are orthogonal to the null space of N, and hence this algorithm satisfies the conditions of corollary 1 to theorem 4.2.
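As an aside, the quadratic cg iteration (7.1) of the preceding section can be sketched in code. The sketch below is not from the thesis; it assumes the simplest choices R = I and K = I (so that N = A*A), and all names are illustrative.

```python
import numpy as np

def cg_method(A, k, R, K, x0):
    """Sketch of iteration (7.1) for E(x) = (r, Rr), r = k - Ax."""
    N = A.T @ R @ A                       # N = A*RA
    x, r = x0.copy(), k - A @ x0
    g = A.T @ R @ r                       # g_0 = A*R r_0
    p = K @ g                             # p_0 = K g_0
    for _ in range(A.shape[1]):           # at most rank(N) steps in exact arithmetic
        if np.linalg.norm(g) < 1e-12:
            break
        a = (g @ p) / (p @ N @ p)         # a_i = (g_i, p_i)/(p_i, N p_i)
        x = x + a * p
        r = r - a * (A @ p)               # r_{i+1} = r_i - a_i A p_i
        g = A.T @ R @ r                   # g_{i+1} = A*R r_{i+1}
        b = -((N @ p) @ (K @ g)) / (p @ N @ p)   # b_i = -(Np_i, Kg_{i+1})/(p_i, Np_i)
        p = K @ g + b * p                 # p_{i+1} = K g_{i+1} + b_i p_i
    return x

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # nonsingular, so Ax = k is solvable
k_vec = rng.standard_normal(n)
x = cg_method(A, k_vec, np.eye(n), np.eye(n), np.zeros(n))
assert np.allclose(A @ x, k_vec, atol=1e-6)       # the terminal x_r solves Ax = k
```

Since A here is nonsingular, termination with g_r = 0 after at most n steps yields the solution of Ax = k, in line with the remarks following (7.1).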
Therefore the algorithm terminates if and only if g_r = 0. This implies that x_r minimizes E(x).

In view of corollary 1 to theorem 5.1 the cg-method and the FP-method are equivalent. This means that if H_0 and K are equal and the two methods start at the same initial point x_0, then they generate direction vectors that are scalar multiples of each other and hence generate the same points x_1, ..., x_r.

Theorem 8.1. Theorem 7.1 holds with cg-method replaced by FP-method.

Theorem 8.2. Every cd-method is an FP-method.

Proof. This follows from the equivalence of the cg- and FP-methods and theorem 7.2.

From theorem 8.1 we have that σ_0, ..., σ_{i-1} spans an i-dimensional subspace π_i and the point x_i minimizes E(x) in π_i. If we let W be the space spanned by σ_i, ..., σ_{r-1}, then the matrix H_i projects the gradient vector g_i into W in the direction of σ_i. This becomes the next direction of search for a minimum of E(x). We note that W is N-orthogonal to π_i. If N is positive definite then E^n = π_i + W, where W is the N-orthogonal complement of π_i.

Lemma 8.1. If N is positive semi-definite then H_r is a left inverse to N on the space generated by σ_0, ..., σ_{r-1}. If N is positive definite and r = n then H_n = N^{-1}.

Proof. We have

(8.2)  H_r N|σ_j> = |σ_j>,  j = 0, ..., r-1.

Hence H_r is a left inverse to N on the space generated by σ_0, ..., σ_{r-1}. If N is positive definite and r = n then (8.2) implies H_n = N^{-1}.

We observe that S'NS = Λ, where S is the matrix whose columns are σ_0, ..., σ_{n-1}, S' is the transpose of S, and Λ is diagonal with <σ_i|N|σ_i>, i = 0, ..., n-1, along the diagonal. The matrix N satisfies the equation

    N = S'^{-1} Λ S^{-1} = (S Λ^{-1} S')^{-1},

hence

    N^{-1} = S Λ^{-1} S' = Σ_{i=0}^{n-1} |σ_i><σ_i| / <σ_i|N|σ_i>.

9. Second method for constructing vectors that are N-conjugate.

[...]

d) The set of vectors {p_i : p_i in n(N)} forms a basis for n(N).

e) The set of vectors {p_i : p_i not in n(N)} is N-linearly independent.
f) (p_i, Nu_j) = 0 for i < j, with

    t_ij = (p_j, Nu_j)/(p_j, Np_j) when p_j is not in n(N), and t_ij = 0 when p_j is in n(N).

The following lemma indicates some of the properties concerning this algorithm.

Lemma 9.2. If p_1, ..., p_n are generated by (9.2), then the conditions of lemma 9.1 hold along with the following addition:

g) (u_i^k, Nu_j^k) = 0.

[...] r_ij^2 = r_ji^2,

    g_3 = (0, 0, k_3^2, ..., k_n^2),  Np_3 = (0, 0, r_33^2, ..., r_3n^2),

and also x_3 = x_2. If p_2 is in n(N) then, by lemma 9.6, the matrix (9.7) would look like

(9.8)
    | r_11  r_12  r_13   ...  r_1n      p_11    ...  p_1n      k_1   |
    | 0     0     0      ...  0         p_21    ...  p_2n      0     |
    | 0     0     r_33^2 ...  r_3n^2    u_31^2  ...  u_3n^2    k_3^2 |
    |  .     .      .           .          .            .        .   |
    | 0     0     r_n3^2 ...  r_nn^2    u_n1^2  ...  u_nn^2    k_n^2 |

In this case we have

    p_3 = u_3^2 = (u_31^2, ..., u_3n^2).

It should be noted that if we had not assumed the existence of a solution to Nx = k, then k_2^2 may not have been zero in the case of p_2 in n(N).

We now consider the general case. Suppose we have p_1, ..., p_m and x_1, ..., x_m; then by algorithms (9.3) and (9.4) we have

    a_m = (k, p_m)/(Np_m, p_m) if p_m is not in n(N),  a_m = 0 if p_m is in n(N),

    u_i^{m+1} = u_i^m - t_im p_m,  i = m+1, ..., n,

where

    t_im = (Nu_i^m, e_m)/(Np_m, e_m) if p_m is not in n(N),  t_im = 0 if p_m is in n(N).

If p_m is not in n(N) then we have t_im = r_im^m / r_mm^m and a_m = k_m^m / r_mm^m. The matrix is in the form

(9.9)
    | r_11  r_12  ...  r_1m    ...  r_1n      p_11    ...  p_1n      k_1   |
    |       r_22  ...  r_2m    ...  r_2n      p_21    ...  p_2n      k_2   |
    |               .                            .                     .   |
    |             r_mm^m  ...  r_mn^m         u_m1^m  ...  u_mn^m    k_m^m |
    |                .            .              .                     .   |
    |             r_nm^m  ...  r_nn^m         u_n1^m  ...  u_nn^m    k_n^m |

Multiplying the m-th row of (9.9) by t_im and subtracting it from the i-th row (i = m+1, ..., n), we obtain the new matrix (9.10), with entries

    r_ij^{m+1} = r_ij^m - t_im r_mj^m,  u_i^{m+1} = u_i^m - t_im p_m,  k_i^{m+1} = k_i^m - t_im k_m^m,  i = m+1, ..., n,

and

    p_{m+1} = (p_{m+1,1}, ..., p_{m+1,n}) = u_{m+1}^{m+1} = (u_{m+1,1}^{m+1}, ..., u_{m+1,n}^{m+1}).

The following two lemmas summarize the results.

Lemma 9.7. If p_m is not in n(N) then the following hold:

a) r_ij^{m+1} = (Nu_i^{m+1}, e_j), i = m+1, ..., n, j = m+1, ..., n
b) k_i^{m+1} = k_i^m - t_im k_m^m, i = m+1, ..., n
c) r_ij^{m+1} = r_ji^{m+1}, i, j = m+1, ..., n
d) g_{m+1} = (0, ..., 0, k_{m+1}^{m+1}, ..., k_n^{m+1})
e) Np_{m+1} = (0,
..., 0, r_{m+1,m+1}^{m+1}, ..., r_{m+1,n}^{m+1}).

Lemma 9.8. If p_m is in n(N) then r_im^m = r_mi^m = 0 (i = m, ..., n) and the five conditions in lemma 9.7 reduce to

a) r_ij^{m+1} = r_ij^m, i, j = m+1, ..., n
b) k_i^{m+1} = k_i^m, i = m+1, ..., n
c) r_ij^{m+1} = r_ji^{m+1}, i, j = m+1, ..., n; by a) this reduces to r_ij^m = r_ji^m
d) g_{m+1} = (0, ..., 0, k_{m+1}^m, ..., k_n^m)
e) Np_{m+1} = (0, ..., 0, r_{m+1,m+1}^m, ..., r_{m+1,n}^m).

In this case we also have

    p_{m+1} = u_{m+1}^{m+1} = u_{m+1}^m,

and again k_m^m = 0, since we are assuming that Nx = k has a solution.

Recall that applying the cd-method to minimize the error function E(x) is equivalent to solving the linear system Nx = k if E(x) is suitably chosen. Let E(x) = (r, Rr), where R is a positive definite matrix and r = k' - Ax. The gradient vector is

    g = A*Rr = k - Nx,  where  k = A*Rk',  N = A*RA.

Since E(x) is minimized when g(x) = 0, we see that minimizing E(x) is equivalent to solving the linear system Nx = k. From the work in section 9 we see that, in choosing our direction vectors p_1, ..., p_n by N-orthogonalizing e_1, ..., e_n, the inner products in algorithms (9.3) and (9.4) are found by performing Gauss elimination on the system Nx = k. Hence the application of Gauss elimination to Nx = k and solving Nx = k through algorithms (9.3) and (9.4) are equivalent.

Let N be positive definite in the previous material of section 9. Since every cd-method is a cg-method, there exists a positive definite matrix K such that

(9.11)
    p_1 = Kg_1
    g_{i+1} = g_i - a_i Np_i
    a_i = (g_i, p_i)/(p_i, Np_i) = (k, p_i)/(p_i, Np_i)
    p_{i+1} = Kg_{i+1} + b_i p_i
    b_i = c_{i+1}/c_i,  c_i = (g_i, p_i) = (k, p_i).

The vectors g_2, ..., g_n are generated by (9.11) since g_{i+1} = k - Nx_{i+1} = k - Nx_i - a_i Np_i = g_i - a_i Np_i. From the proof of theorem 6.1 we see that if we choose

    K = G*^{-1} C G^{-1},

where G is the matrix whose columns are g_1, ..., g_n and C is the diagonal matrix with the c_i down the diagonal, then (9.11) is satisfied. Hence the vectors p_1, ..., p_n generated by N-orthogonalization of e_1,
..., e_n are also generated by (9.11), and solving Nx = k by Gauss elimination is equivalent to solving it by the cg-method whose direction vectors are given by (9.11).

We now apply the cg-method and FP-method to the minimization of non-quadratic functions.

III. APPLICATIONS TO NON-QUADRATIC FUNCTIONS

10. Algorithms for non-quadratic functions.

The algorithms considered in this section and also in sections 11 and 12 are for minimizing a non-quadratic function f(x), bounded from below for all x and having a positive definite Hessian matrix at the minimum point. The minimum point found is of course just a local minimum. These algorithms differ only in the method for generating the direction vectors p_0, p_1, ... . These vectors are the directions in which the function f(x) is minimized. The algorithms presented in sections 10, 11, and 12 will not be given in terms of f(x) but in a form that is convenient for comparing methods and deriving convergence results. In section 13 the relationship between the minimization of the function f(x) and the algorithms in sections 10, 11, and 12 will be shown.

Let {C_i} and {T_i} be sequences of positive definite matrices converging to C and T respectively, where C and T are positive definite.

Theorem 10.1. If g_0 is a non-zero vector then the algorithm

(10.1)
    p_0 = T_0 g_0
    g_{i+1} = g_i - a_i C_i p_i
    a_i = (g_i, p_i)/(p_i, C_i p_i)
    p_{i+1} = T_{i+1} g_{i+1}

generates non-zero vectors g_0, g_1, ... and p_0, p_1, ... such that

(10.2)  (g_{i+1}, p_i) = 0.

The algorithm terminates in r steps if for some r, g_r = 0; otherwise

(10.3)  lim_{i→∞} g_i = 0.

Proof. The scalar a_i is chosen so that (10.2) holds. If g_r = 0 for some r then p_r = 0 and the algorithm clearly terminates. The proof of (10.3) is a result of the following lemma and theorem.

Lemma 10.1. There exist positive real numbers m and M such that

(10.4)  0 < m <= 1/a_i <= M.

Theorem 10.2. There exist real numbers v and w with 0 < v < 1 such that for all i > w

(10.5)  (g_{i+1}, C^{-1} g_{i+1}) <= v (g_i, C^{-1} g_i).

Proof. We have

(10.6)  (g_{i+1}, C^{-1} g_{i+1}) = (g_i, C^{-1} g_i) - 2a_i (g_i, C^{-1} C_i T_i g_i) + a_i^2 (g_i, T_i C_i C^{-1} C_i T_i g_i).
This equation can be written in the form

(10.7)  (g_{i+1}, C^{-1} g_{i+1}) = (g_i, C^{-1} g_i) - 2a_i (g_i, T_i g_i) G_i(g_i),

where

    G_i(x) = (x, C^{-1} C_i T_i x)/(x, T_i x) - (1/2)(x, T_i C_i C^{-1} C_i T_i x)/(x, T_i C_i T_i x).

Since C_i converges to C,

    lim_{i→∞} G_i(x) = 1/2

for all x != 0. Hence there exists w > 0 such that for all i > w

(10.8)  G_i(x) > 1/4.

By lemma 10.1, a_i >= 1/M. Hence from equation (10.7) we obtain

    (g_{i+1}, C^{-1} g_{i+1}) <= [1 - (a_i/2)(g_i, T_i g_i)/(g_i, C^{-1} g_i)] (g_i, C^{-1} g_i),

where m and M can be chosen so that a_i >= 1/M and (g_i, T_i g_i) >= m (g_i, C^{-1} g_i), with 0 < m <= M. By choosing v = 1 - m/2M we have 0 < v < 1 and

    (g_{i+1}, C^{-1} g_{i+1}) <= v (g_i, C^{-1} g_i)

for all i > w. As a consequence of (10.5) we have

    lim_{i→∞} (g_i, C^{-1} g_i) = 0

and hence

    lim_{i→∞} g_i = 0.

Therefore the proof of theorem 10.2 completes the proof of theorem 10.1.

Theorem 10.2 shows that for i > w the sequence {(g_i, C^{-1} g_i)} converges as fast as a geometric progression with ratio v. Before considering a special case of theorem 10.1 we finish this section with the following lemma and theorem.

Lemma 10.2. If D_i = I - a_i C_i T_i then λ is an eigenvalue of D_i if and only if (1-λ)/a_i is an eigenvalue of C_i T_i.

Proof. Since

    D_i - λI = a_i [((1-λ)/a_i) I - C_i T_i]

and a_i is bounded away from zero for all i, λ is a zero of the determinant of D_i - λI if and only if (1-λ)/a_i is a zero of the determinant of ((1-λ)/a_i) I - C_i T_i.

Theorem 10.3. If in algorithm (10.1) p_i is the eigenvector corresponding to the maximum eigenvalue of C_i T_i, then |g_{i+1}| < |g_i|.

Proof. Let λ_max^i and λ_min^i be respectively the maximum and minimum eigenvalues of D_i. By lemma 10.2, (1-λ_min^i)/a_i and (1-λ_max^i)/a_i are respectively the maximum and minimum eigenvalues of C_i T_i. We note that λ_max^i < 1 since (1-λ_max^i)/a_i > 0. Since

    a_i = (g_i, p_i)/(p_i, C_i p_i) = (p_i, T_i^{-1} p_i)/(p_i, C_i p_i),

if p_i is the eigenvector corresponding to the maximum eigenvalue of C_i T_i then 1/a_i is the maximum eigenvalue of C_i T_i. This means λ_min^i = 0 and all eigenvalues of D_i lie in the interval [0, 1). Therefore, by (10.1),

    g_{i+1} = g_i - a_i C_i T_i g_i = D_i g_i

and

    |g_{i+1}| <= ||D_i|| |g_i| < |g_i|.

11. The FP-algorithm for non-quadratic functions.
The FP-algorithm is a special case of algorithm (10.1). However, before looking at it we first consider the following result. If {T_i} is a sequence of positive definite matrices such that

    lim_{i→∞} T_i = u C^{-1},

where u is a positive real number, then algorithm (10.1) converges. With the above choice for the sequence {T_i} we get a rate of convergence which is stated in the following theorem.

Theorem 11.1. If T_i is defined as above and algorithm (10.1) does not terminate in a finite number of steps, then

(11.1)  |g_{i+1}| <= v_i |g_i|,

where v_i >= 0 and

(11.2)  lim_{i→∞} v_i = 0.

Proof. Let

    L_i(x) = (x, T_i^{-1} x)/(x, C_i x);

then lim_{i→∞} L_i(x) = 1/u for all x != 0. Therefore

    lim_{i→∞} a_i = lim_{i→∞} L_i(p_i) = 1/u

and

    lim_{i→∞} C_i T_i = uI.

If we choose v_i = ||D_i||, then

    lim_{i→∞} v_i = lim_{i→∞} ||I - a_i C_i T_i|| = 0.

Since g_{i+1} = D_i g_i, we have |g_{i+1}| <= ||D_i|| |g_i| = v_i |g_i|.

We now give the FP-algorithm for minimizing a non-quadratic function.

Theorem 11.2. The conclusion of theorem 10.1 follows if in (10.1) we replace {T_i} by {H_i}, where H_0 is a positive definite matrix and H_{i+1} = H_i + A_i + B_i, with

(11.3)  A_i = |σ_i><σ_i| / <σ_i|y_i>,  B_i = -H_i|y_i><y_i|H_i / <y_i|H_i|y_i>,

    |σ_i> = a_i|p_i>,  |y_i> = C_i|σ_i>.

Proof. To prove this theorem it is only necessary to show that H_i is positive definite for each i and converges to a positive definite matrix. The proof is by induction. The matrix H_0 is positive definite by hypothesis. We assume H_1, ..., H_k are positive definite and show that H_{k+1} is positive definite. We have

    <x|H_{k+1}|x> = <x|H_k|x> + <σ_k|x>^2 / <σ_k|y_k> - <x|H_k|y_k>^2 / <y_k|H_k|y_k>.

If |p> = (H_k)^{1/2}|x> and |q> = (H_k)^{1/2}|y_k> with |q> != 0, then

(11.4)  <x|H_{k+1}|x> = [<p|p><q|q> - <p|q>^2] / <q|q> + <σ_k|x>^2 / <σ_k|y_k>

and

(11.5)  <σ_k|y_k> = <σ_k|C_k|σ_k> > 0.

By Schwarz's inequality we have

(11.6)  <p|p><q|q> - <p|q>^2 >= 0,

with equality if and only if

(11.7)  |p> = t|q> for some non-zero t.

Since H_k is positive definite, (11.7) is equivalent to |x> = t|y_k>. If <σ_k|x> = 0 for |x> != 0, then |x> != t|y_k> by (11.5). Hence (11.6) is a strict inequality and <x|H_{k+1}|x> > 0. If <σ_k|x> != 0, then <σ_k|x>^2 / <σ_k|y_k> > 0 and <x|H_{k+1}|x> > 0.
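The update (11.3) can also be exercised numerically. The sketch below is not from the thesis and its names are illustrative: it holds C_i fixed at a positive definite C, feeds the update arbitrary steps σ_i with y_i = Cσ_i, and checks two properties used here, namely that H_{i+1} stays positive definite and that H_{i+1} C σ_i = σ_i.

```python
import numpy as np

def fp_update(H, sigma, y):
    """One step of (11.3): H + |σ><σ|/<σ|y> - H|y><y|H / <y|H|y>."""
    Hy = H @ y
    return H + np.outer(sigma, sigma) / (sigma @ y) - np.outer(Hy, Hy) / (y @ Hy)

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)        # positive definite C (C_i held fixed at its limit)
H = np.eye(n)                      # H_0 positive definite
for _ in range(20):
    sigma = rng.standard_normal(n)
    y = C @ sigma                  # y_i = C_i σ_i, so <σ_i|y_i> = <σ_i|C|σ_i> > 0
    H = fp_update(H, sigma, y)
    assert np.all(np.linalg.eigvalsh(H) > 0)   # H_{i+1} remains positive definite
    assert np.allclose(H @ y, sigma)           # H_{i+1} C σ_i = σ_i
```

The second assertion is immediate from (11.3): applying H_{i+1} to |y_i> leaves exactly |σ_i>, whatever the step σ_i.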
From the definition of H_{i+1} we have

    H_{i+1} C_i|σ_i> = |σ_i>  and  <p_{i+1}|C_i|σ_i> = 0

for each i. The definitions of H_i here and in section 4 differ in that here y_i = C_i σ_i while in section 4 y_i = Nσ_i. Since C_i converges to C we have

    lim_{i→∞} H_i = C^{-1}.

Therefore {H_i} is a sequence of positive definite matrices whose limit is positive definite. The next theorem gives a result on the convergence of the FP-algorithm.

Theorem 11.3. If T_i is equal to H_i and algorithm (10.1) does not terminate in a finite number of steps, then there exist positive real numbers v_i such that

(11.9)  |g_{i+1}| <= v_i |g_i|

and

(11.10)  lim_{i→∞} v_i = 0.

Proof. With the choice of T_i = H_i for each i, the hypothesis of theorem 11.1 is satisfied. Therefore both (11.9) and (11.10) hold.

12. The cg-algorithm for non-quadratic functions.

In this section two cg-algorithms are given for non-quadratic functions. Both methods, however, generate the same direction vectors when the function to be minimized is quadratic. We again have the sequence {C_i} of positive definite matrices converging to the positive definite matrix C.

Theorem 12.1. Given a positive definite matrix K and a non-zero vector g_0, the algorithm

(12.1)
    p_0 = Kg_0
    g_{i+1} = g_i - a_i C_i p_i
    a_i = (g_i, p_i)/(p_i, C_i p_i)
    p_{i+1} = Kg_{i+1} + b_i p_i
    b_i = -(Kg_{i+1}, C_i p_i)/(p_i, C_i p_i)

generates non-zero vectors g_0, g_1, ... and p_0, p_1, ... such that

(12.2)  (g_{i+1}, p_i) = 0,  (p_{i+1}, C_i p_i) = 0  for all i.

The algorithm terminates in r steps if for some r, g_r = 0; otherwise

(12.3)  lim_{i→∞} g_i = 0.

Proof. The scalars a_i and b_i are chosen so that (12.2) holds. If for some r, g_r = 0, then b_{r-1} = 0 and p_r = 0 and the algorithm terminates. The proof of (12.3) follows from the next lemma and theorem.

Lemma 12.1. There exist positive real numbers m and M such that

(12.4)  0 < m <= 1/a_i < M.

Proof. For non-zero vectors p_i and g_i we have by (12.1)

    a_i = (p_i, g_i)/(p_i, C_i p_i) = (g_i, Kg_i)/(p_i, C_i p_i).

We note that

    (g_i, Kg_i) = (p_i, K^{-1} p_i) - b_{i-1}^2 (p_{i-1}, K^{-1} p_{i-1}),

and hence (g_i, Kg_i) <= (p_i, K^{-1} p_i). Since C_i converges to the positive definite matrix C, the ratios (p_i, K^{-1} p_i)/(p_i, C_i p_i) are bounded above; hence there exists m > 0 such that

(12.5)  a_i
<= 1/m for all i. To show a_i is bounded away from zero we note that

(12.6)  (p_i, C_i p_i) = (g_i, KC_iKg_i) + 2b_{i-1}(g_i, KC_i p_{i-1}) + b_{i-1}^2 (p_{i-1}, C_i p_{i-1}).

Using (12.1) and (12.2), equation (12.6) becomes

(12.7)  (p_i, C_i p_i) = (g_i, KC_iKg_i) - b_{i-1}(g_i, Kg_i) [ ... ],

where the term in brackets is a ratio of inner products formed from C_i and C_{i-1}. Since C_i converges to C, the term in brackets converges to 1. We note that

    b_{i-1} = (g_i, Kg_i)/(g_{i-1}, Kg_{i-1}) > 0;

hence for i sufficiently large equation (12.7) implies

    (p_i, C_i p_i) <= (g_i, KC_iKg_i)

and

    a_i >= (g_i, Kg_i)/(g_i, KC_iKg_i).

Thus, for i sufficiently large, a_i is bounded below by 1/λ_max^i, where λ_max^i is the maximum eigenvalue of C_iK. Hence there exists M > 0 such that

(12.8)  a_i >= 1/M

for all i. From (12.5) and (12.8) we have 0 < m <= 1/a_i <= M.

Theorem 12.2. There exist real numbers v and w with 0 < v < 1 such that for i > w

(12.9)  (g_{i+1}, C^{-1} g_{i+1}) <= v (g_i, C^{-1} g_i).

Proof. Using (12.1) we have

(12.10)  (g_{i+1}, C^{-1} g_{i+1}) = (g_i, C^{-1} g_i) - 2a_i (g_i, Kg_i) G_i(g_i, p_i),

where

    G_i(x, y) = (x, C^{-1} C_i y)/(x, y) - (1/2)(y, C_i C^{-1} C_i y)/(y, C_i y).

Since C_i converges to C,

    lim_{i→∞} G_i(x, y) = 1/2

for all x and y such that (x, y) = (x, Kx) != 0. Hence there exists w > 0 such that for i > w

(12.11)  G_i(x, y) > 1/4.

From lemma 12.1 we have

(12.12)  0 < m <= 1/a_i <= M.

Combining (12.10), (12.11), and (12.12) as in the proof of theorem 10.2 and choosing v = 1 - m/2M, we have 0 < v < 1 and (12.9) holds for all i > w. Hence for i > w the sequence {(g_i, C^{-1} g_i)} converges as fast as a geometric progression with ratio v.

The second cg-method is simply algorithm (12.1) restarted every n iterations. The algorithm is given in the following theorem.

Theorem 12.3. If K is a positive definite matrix and g_0 is a non-zero vector, then the algorithm

(12.13)
    p_0 = Kg_0
    g_{j+l+1} = g_{j+l} - a_{j+l} C_{j+l} p_{j+l}
    a_{j+l} = (g_{j+l}, p_{j+l})/(p_{j+l}, C_{j+l} p_{j+l})
    p_{j+l+1} = Kg_{j+l+1} + b_{j+l} p_{j+l},  l = 0, ..., n-2
    b_{j+l} = -(Kg_{j+l+1}, C_{j+l} p_{j+l})/(p_{j+l}, C_{j+l} p_{j+l})
    p_{j+l+1} = Kg_{j+l+1},  l = n-1,

where j = 0, n, 2n, 3n, ... and for each j, l = 0, ..., n-1, generates non-zero vectors g_0, g_1, ... and p_0, p_1, ... such that

(12.14)  (g_{j+l+1}, p_{j+l}) = 0

and

(12.15)  (p_{j+l+1}, C_{j+l} p_{j+l}) = 0,  l = 0, ..., n-2.
The algorithm terminates in r steps if for some r, g_r = 0; otherwise

    lim_{i→∞} g_i = 0.

Theorem 12.4. If algorithm (12.13) does not terminate in a finite number of steps, then there exist positive real numbers v and w such that 0 <= v < 1 and for i > w

(12.16)  (g_{i+1}, C^{-1} g_{i+1}) <= v (g_i, C^{-1} g_i).

The proofs of theorems 12.3 and 12.4 duplicate the proofs of theorems 12.1 and 12.2.

Numerical tests [4] for minimizing non-quadratic functions have shown that the second conjugate gradient method converges faster than the first method. The Fletcher-Powell formulation of the Davidon algorithm (FP-method) exhibits substantially faster convergence than either of the conjugate gradient methods.

13. Minimizing non-quadratic functions.

Let f(x) be a non-quadratic function bounded below, and assume f(x) has a positive definite Hessian matrix at every local minimum. We indicate the matrix of second partial derivatives (Hessian matrix) at the point x by f''(x). The gradient vector at x is indicated by f'(x).

To minimize the function f(x) we select an initial point x_0 and set g_0 = -f'(x_0). Next select a vector p_0 that is in a direction of descent. This will be true if (g_0, p_0) > 0. The next point x_1 is found by minimizing f(x) along the line x(t) = x_0 + t p_0. This point will be characterized by (f'(x_1), p_0) = 0. Having found x_i, calculate g_i by g_i = -f'(x_i) and select p_i so that it is in a direction of descent. This will again be true if (g_i, p_i) > 0. The point x_{i+1} is found by minimizing f(x) along the line

(13.1)  x(t) = x_i + t p_i.

We again have (f'(x_{i+1}), p_i) = 0.

The sequence {f(x_i)} is a monotonic decreasing sequence of real numbers bounded below, and therefore converges. The sequence {x_i} converges to some point x̄ which is a local minimum of f(x). Hence

(13.2)  lim_{i→∞} x_i = x̄

and

(13.3)  lim_{i→∞} f(x_i) = f(x̄),

where f(x̄) <= f(x) for all x in some neighborhood of x̄. We are of course assuming f(x) is sufficiently well behaved so that (13.2) and (13.3) hold. We have

(13.4)  g_{i+1} = g_i
- f''(x_i + θ_i σ_i) σ_i,

where 0 <= θ_i < 1 and

    σ_i = x_{i+1} - x_i = a_i p_i.

Here a_i is the scalar that minimizes f(x) along the line given by (13.1). Then

    lim_{i→∞} (x_i + θ_i σ_i) = x̄,

since σ_i converges to zero. Hence

    lim_{i→∞} f''(x_i + θ_i σ_i) = f''(x̄).

Since f''(x̄) is positive definite, there exists J > 0 such that for all i > J, f''(x_i + θ_i σ_i) is positive definite. For i = 0, ..., J we have

    (g_i - g_{i+1}, σ_i) = a_i (g_i, p_i) > 0

since a_i > 0 and (g_i, p_i) > 0. Therefore there exists a positive definite matrix C_i such that

(13.5)  g_i - g_{i+1} = C_i σ_i  for i = 0, ..., J.

The vectors g_0, g_1, ... therefore satisfy

    g_{i+1} = g_i - a_i C_i p_i,

where C_i is given by (13.5) for i = 0, ..., J and C_i = f''(x_i + θ_i σ_i) for i = J+1, ... . Hence {C_i} is a sequence of positive definite matrices whose limit is f''(x̄), which is also positive definite. Now a_i satisfies the equation

    a_i = (g_i, p_i)/(p_i, C_i p_i),

since (g_{i+1}, p_i) = 0.

With the above choice for C_i the algorithm for minimizing f(x) can be written in the following form. Given an initial point x_0 and setting g_0 = -f'(x_0), we choose p_0 so that (g_0, p_0) > 0. Having found x_i, g_i, and p_i we calculate x_{i+1}, g_{i+1}, and p_{i+1} by the following formulas:

(13.6)
    g_{i+1} = g_i - a_i C_i p_i
    a_i = (g_i, p_i)/(p_i, C_i p_i)
    x_{i+1} = x_i + a_i p_i,

and p_{i+1} is chosen so that (g_{i+1}, p_{i+1}) > 0.

The algorithms in sections 10, 11, and 12 differ from algorithm (13.6) in that they give specific methods for generating the vectors p_0, p_1, ... . Hence the study of algorithms of the form given in sections 10, 11, and 12 is equivalent to studying algorithms for minimizing a large class of non-quadratic functions.

BIBLIOGRAPHY

[1] Hestenes, M. R. and E. Stiefel, "Methods of Conjugate Gradients for Solving Linear Systems," Report, National Bureau of Standards, 1952.

[2] Hestenes, M. R., "The Conjugate Gradient Method for Solving Linear Systems," Proceedings of Symposia in Applied Mathematics, Vol. 6, New York: McGraw-Hill Book Co., 1956.

[3] Fletcher, R. and M. J. D. Powell,
"A Rapidly Convergent Descent Method for Minimization," Computer Journal, July 1963.

[4] Kelley, H. J. and G. E. Myers, "Conjugate Direction Methods for Parameter Optimization," presented at the 18th Congress of the International Astronautical Federation, Belgrade, Yugoslavia, September 1967.

[5] Myers, G. E., "Properties of the Conjugate Gradient and Davidon Methods," Westbury, New York: Analytical Mechanics Associates, Inc.

[6] Dirac, P. A. M., "The Principles of Quantum Mechanics," Oxford: Oxford University Press, 1958.

[7] Davidon, W. C., "Variable Metric Method for Minimization," A.E.C. Research and Development Report ANL-5990, 1959.

[8] Fletcher, R. and C. M. Reeves, "Function Minimization by Conjugate Gradients," Computer Journal, July 1964.

[9] Hayes, R. M., "Iterative Methods of Solving Linear Problems on Hilbert Space," National Bureau of Standards Report 1733, May 1952.