ALGORITHMS FOR SOLVING OVERDETERMINED SYSTEMS OF LINEAR EQUATIONS IN THE $\ell_p$ SENSE

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
ROBERT WILLIAM OWENS
1975

ABSTRACT

ALGORITHMS FOR SOLVING OVERDETERMINED SYSTEMS OF LINEAR EQUATIONS IN THE $\ell_p$ SENSE

By

Robert William Owens

In this thesis, we investigate methods for approximating a solution of an overdetermined system of linear equations. A best approximate solution of the linear system $Ax = b$ is taken to be a vector $x$ which minimizes the length of the error vector $\eta(x) = b - Ax$. We consider the two classes of approximation problems obtained when we determine the length of $\eta(x)$ first by a smooth strictly convex norm, and second by an $\ell_p$ metric for $0 < p < 1$. For each of the two approximation problems, we study a dual problem whose solution leads directly to a solution of the original problem. Algorithms for solving the dual problems are presented, and numerical results from several $\ell_p$ approximation problems, $0 < p < \infty$, are discussed.

ALGORITHMS FOR SOLVING OVERDETERMINED SYSTEMS OF LINEAR EQUATIONS IN THE $\ell_p$ SENSE

By

Robert William Owens

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Mathematics

1975

ACKNOWLEDGMENTS

I would like to express my sincere thanks to Professor V.P. Sreedharan, without whose direction and encouragement I could not have written this thesis. I also thank the other members of my committee, Professors Harvey S. Davis, Norman L. Hills, Edward C. Ingraham, and Mary J. Winter, whose helpful comments and suggestions aided in the final preparation of the thesis. Finally, special thanks are reserved for Jill Shoemaker for typing the final manuscript.

TABLE OF CONTENTS

CHAPTER I. INTRODUCTION
CHAPTER II. APPROXIMATION WITH A SMOOTH STRICTLY CONVEX NORM
CHAPTER III. THE $\ell_p$ PROBLEM, $0 < p < 1$
CHAPTER IV. NUMERICAL RESULTS
BIBLIOGRAPHY

CHAPTER I

INTRODUCTION

We consider the system of linear equations $Ax = b$, where $A$ is an $m \times n$ matrix, $m > n$, and $b \in \mathbb{R}^m \setminus \mathrm{Image}(A)$. We are interested in finding $x \in \mathbb{R}^n$ minimizing the number $\|b - Ax\|_p$. We shall refer to this as the $\ell_p$ problem. In the case of $p = \infty$, we often refer to it as the Chebyshev problem.

The case where $p = 2$ is the easiest to solve, since this is just the problem of finding the usual Euclidean distance from a point to a subspace. If the matrix $A$ has rank $n$, then $x = (A^TA)^{-1}A^Tb$ is the solution of the $\ell_2$ problem, and the only difficulties that arise are computational ones stemming from the fact that $A^TA$ may be ill conditioned.
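The following sketch is not part of the thesis; it is a modern illustration, assuming numpy, of the $p = 2$ case just described. It computes the least squares solution both by the normal equations and by an orthogonalization-based routine, which avoids forming $A^TA$ and hence the ill-conditioning noted above. The data are placeholders of ours.

```python
import numpy as np

# A small overdetermined system Ax = b (illustrative data).
A = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.], [1., 5.]])
b = np.array([1., 2., 2., 3., 5.])

# Normal equations: x = (A^T A)^{-1} A^T b.  Forming A^T A squares the
# condition number of A, the ill-conditioning mentioned in the text.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# An orthogonalization-based routine avoids forming A^T A at all.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_lstsq)                 # identical here; A is well conditioned
print(np.linalg.norm(b - A @ x_lstsq))   # the minimal l2 residual
```

For well conditioned $A$ the two answers agree to machine precision; the difference only becomes visible when the columns of $A$ are nearly dependent.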
A number of essentially different methods have been developed to handle the Chebyshev problem and, to a lesser degree, the $\ell_1$ problem. Since each of these norms is neither strictly convex nor smooth, i.e. both have "flat spots" and "sharp corners", most of the powerful theorems from approximation theory are not applicable, and special techniques must be devised. Since the unit $\ell_1$ and $\ell_\infty$ balls are convex polyhedra, the methods of linear programming are applicable for solving the $\ell_1$ and Chebyshev problems. Stiefel [22] presented such an algorithm to solve the Chebyshev problem, and more recently Barrodale and Young [4], Barrodale and Roberts [5], and Abdelmalek [1] have given improved algorithms for handling both problems by linear programming.

It has been conjectured that the solution of the $\ell_\infty$ problem is the limit of the solutions of the $\ell_p$ problem as $p \to \infty$, and similarly for the $\ell_1$ problem letting $p \to 1^+$. It is well known that if these limits exist, then they are exactly the solution that one would expect, but the question of convergence is the crucial point. Descloux [8] established the convergence of the solutions of the $\ell_p$ problem as $p \to \infty$, providing a justification for this method of solving the Chebyshev problem. The question concerning the convergence of the solutions of the $\ell_p$ problems as $p \to 1^+$, however, is still unanswered. Abdelmalek [1] incorrectly applied a theorem of Hoel [13] to conclude that the solutions of the $\ell_p$ problems converge as $p \to 1^+$. Moreover, even if such a theorem were established, there would remain the problem of computing the solutions of the $\ell_p$ problems when $p$ is near 1. This, as shall be mentioned, is another significant difficulty. Methods of solving the Chebyshev problem by trying to find the limit of the solutions of the $\ell_p$ problems as $p \to \infty$ have been presented by Fletcher, Grant, and Hebden [12]. Attempts to solve the $\ell_1$ problem by locating a limit of the solutions of the $\ell_p$ problems as $p \to 1^+$ have been investigated by Abdelmalek [1].

Another method used to solve the Chebyshev problem was developed by Lawson [14]. Using a result of Motzkin and Walsh [16], Lawson was able to compute the solution of the Chebyshev problem as the limit of the solutions of a sequence of weighted $\ell_p$ problems, where $p$ may be held fixed and the weights change in each iteration. Since weighted $\ell_2$ problems are almost as easy to solve as $\ell_2$ problems, Lawson chose $p = 2$ throughout his algorithm. An analysis of the rate of convergence of the Lawson algorithm was made by Cline [7]. Another theorem of Motzkin and Walsh [16], similar to the one guaranteeing that Lawson's algorithm converges to the solution of the Chebyshev problem, also says that the solution of the $\ell_1$ problem is identically the same as the solution of a weighted $\ell_p$ problem for some choice of weight functions depending upon $p$. So far, however, no one has discovered an algorithm for computing those weights, either directly or iteratively, even in the case when $p = 2$.
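As a modern aside, the Lawson iteration described above can be sketched as follows. Each step solves a weighted $\ell_2$ problem and the weights are updated from the current residuals; the update $w_i \leftarrow w_i|r_i| / \sum_j w_j|r_j|$ is the classical Lawson rule for the Chebyshev problem, while the tolerance and iteration cap are arbitrary choices of ours, not from the thesis.

```python
import numpy as np

def lawson_chebyshev(A, b, iters=200, tol=1e-10):
    """Approximate the Chebyshev (l_inf) solution of Ax = b by
    iteratively reweighted least squares (Lawson's weight rule)."""
    m, _ = A.shape
    w = np.full(m, 1.0 / m)                  # start with uniform weights
    x = None
    for _ in range(iters):
        # Solve the weighted l2 problem: minimize sum_i w_i * r_i^2.
        sw = np.sqrt(w)
        x, *_ = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)
        r = np.abs(b - A @ x)
        w_new = w * r
        s = w_new.sum()
        if s < tol:                          # residual is (nearly) zero
            break
        w_new /= s                           # renormalize the weights
        if np.max(np.abs(w_new - w)) < tol:  # weights have converged
            break
        w = w_new
    return x
```

The weights concentrate on the rows where the maximal residual is attained, which is why the weighted $\ell_2$ solutions drift toward the Chebyshev solution.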
The best known method for solving the Chebyshev problem is the exchange algorithm, as can be found in Cheney [6]. It exploits the fact that the $m \times n$ Chebyshev problem given by the overdetermined linear system $Ax = b$ has exactly the same solution as some particular $(n+1) \times n$ Chebyshev problem determined by $\tilde{A}x = \tilde{b}$, where $\tilde{A}$ is composed of $n+1$ rows of $A$ and $\tilde{b}$ is the $(n+1)$-vector of the corresponding entries from $b$. In fact, of all such $(n+1) \times n$ problems, the one yielding the solution of the original problem is the one for which the error $\min_{x \in \mathbb{R}^n}\|\tilde{A}x - \tilde{b}\|_\infty$ is the largest.

Although most of their work deals with the $\ell_p$ problem for $1 < p < \infty$, Duris and Sreedharan [10] have also presented algorithms for solving the Chebyshev and $\ell_1$ problems that use only the solutions of least squares problems.

Much more can usually be said about the solutions of the $\ell_p$ problem when $1 < p < \infty$, since in that case the norm is both strictly convex and smooth, i.e. the boundary of the unit ball contains no line segment, and at each point on the boundary of the unit ball there is a unique hyperplane supporting the unit ball. Also, the duality of the spaces $\ell_p$ and $\ell_q$, where $\frac{1}{p} + \frac{1}{q} = 1$, can be exploited. Sreedharan [19,20,21], Duris [9], and Anton and Duris [3] have developed several algorithms for solving the approximation problem for an arbitrary smooth strictly convex norm, and in particular for the $\ell_p$ norm when $1 < p < \infty$. In each case, the solution of a dual problem is obtained iteratively by carrying out an orthogonal projection and solving a single nonlinear equation in one real variable at each iteration. Numerical results indicate that the algorithms converge slowly when $p$ is very near 1 or very large, but work satisfactorily in all other cases.

In Chapter 2 of this thesis, another algorithm for solving the given approximation problem with a smooth strictly convex norm is presented. The algorithm uses ideas developed by Sreedharan in [20,21]. Numerical results on $\ell_p$ problems are presented in Chapter 4.

Rice and Usow [18] have extended the previously mentioned Lawson algorithm to include $\ell_p$ problems for $2 \le p < \infty$, and they have developed a method for accelerating the convergence of the algorithms. However, they observed that for $p$ large the convergence is still quite slow. Moreover, the algorithm is not applicable at all for $1 < p < 2$.

Another algorithm applicable for $2 \le p < \infty$ but not for $1 < p < 2$ has been presented by Fletcher, Grant, and Hebden [11]. Although primarily designed for $L_p$ approximation problems, i.e. continuous rather than discrete approximation, the modification is immediate. For $p \ge 3$, second order convergence is proven, but numerical results are few and inconclusive regarding this algorithm.

A final method of solving the $\ell_p$ problem is by minimizing the differentiable function of $n$ real variables
$$f(x) = \|b - Ax\|_p. \tag{1.3}$$
It is sufficient to find a zero of the gradient $\nabla f$. Newton's method and the method of steepest descent are applicable to speed convergence to a root of $\nabla f$ if $p$ is large enough to guarantee enough derivatives of $f$. Unfortunately, $p \ge 3$ is required, so again the case $1 < p < 2$ is left untreated. A sketch of this gradient-based approach appears after the summary below.

In summary, the situation is roughly as follows:

$p = 1$: Only a very few algorithms are available.
$1 < p < 1 + \epsilon$, $\epsilon$ small: Almost no effective method is known for treating these $\ell_p$ problems.
$p$ near 2, i.e. not too close to 1 nor very large: There are many algorithms available; few difficulties.
$p$ very large: There are a couple of good algorithms available to choose from, but not very many.
$p = \infty$ (Chebyshev problem): There are several efficient algorithms to solve this problem.
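To make the last method concrete, here is a hedged sketch, not from the thesis, that minimizes $f(x) = \|b - Ax\|_p^p$ (which has the same minimizer as (1.3)) by handing the exact gradient to a quasi-Newton routine; scipy is assumed, and the choice of BFGS is ours.

```python
import numpy as np
from scipy.optimize import minimize

def lp_solve(A, b, p):
    """Minimize f(x) = ||b - Ax||_p^p for smooth p (p >= 2)."""
    def f(x):
        r = b - A @ x
        return np.sum(np.abs(r) ** p)

    def grad(x):
        r = b - A @ x
        # d/dx sum_i |r_i|^p = -p * A^T (|r|^(p-1) * sign(r))
        return -p * A.T @ (np.abs(r) ** (p - 1) * np.sign(r))

    x0, *_ = np.linalg.lstsq(A, b, rcond=None)   # l2 solution as a start
    res = minimize(f, x0, jac=grad, method="BFGS")
    return res.x
```

Starting from the $\ell_2$ solution is a natural choice since $f$ is convex for $p > 1$; for $p$ close to 1 the gradient is nearly discontinuous where residuals vanish, which is exactly the difficulty the text describes.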
With a slight modification in the definition of $\|\cdot\|_p$, the $\ell_p$ problem for $0 < p < 1$ also makes mathematical sense, and we are consequently led to investigate solutions of that problem also. With the exception of Hoel [13] and Motzkin and Walsh [15,16,17], no one has considered this question. When we choose $0 < p < 1$, $\|\cdot\|_p$ suitably defined turns out to be a $p$-homogeneous metric but not a norm, and the unit ball is not convex. In Chapter 3 we consider the $\ell_p$ problem for $0 < p < 1$, establishing theoretical and computational methods for finding a solution. In Chapter 4 some numerical results are presented.

CHAPTER II

APPROXIMATION WITH A SMOOTH STRICTLY CONVEX NORM

In this chapter, we shall study the system of linear equations $Ax = b$, where $A$ is an $m \times n$ matrix, $m > n$, $x$ is an $n$-vector, and $b$ is an $m$-vector. We assume all numbers are real. Let $\|\cdot\|$ be a smooth strictly convex norm on $\mathbb{R}^m$. The problem that we shall be concerned with, referred to as problem (P), is

(P): Find $x \in \mathbb{R}^n$ minimizing $\|b - Ax\|$ over all $x \in \mathbb{R}^n$.

In [21], a dual problem (P*) was considered, the correspondence between problems (P) and (P*) established, and two algorithms for constructing the solution of problem (P*) were presented. Here we review problem (P*) and the previous results concerning that problem and (P), and then present a new algorithm for solving problem (P). Before actually beginning this development, we set down some assumptions, definitions, and notation that will remain standard throughout this chapter.

$(\cdot|\cdot)$ denotes the usual inner product on $\mathbb{R}^m$, i.e. $(x|y) = \sum_{j=1}^m x_j y_j$. We denote the transpose of a matrix $A$ by $A^T$, and $\langle v\rangle$ means the linear span of the vector $v$.

$K = \mathrm{Image}(A) = \{Ax \mid x \in \mathbb{R}^n\}$
$K^\perp = \mathrm{Ker}\,A^T = \{x \in \mathbb{R}^m \mid (k|x) = 0 \ \forall k \in K\}$
$E : \mathbb{R}^m \to K^\perp$ is the orthogonal projection of $\mathbb{R}^m$ onto $K^\perp$, where orthogonal means with respect to the inner product $(\cdot|\cdot)$ given above.
$s = Eb$
$\rho = \inf\{\|b - k\| \mid k \in K\}$

We assume that $\rho > 0$, since the problem is trivial otherwise.

Recall that a norm $\|\cdot\|$ is strictly convex if and only if $\|x\| = \|y\| = \frac{1}{2}\|x + y\|$ implies that $x = y$, and $\|\cdot\|$ is smooth if and only if through each point of unit norm there passes a unique hyperplane supporting the closed unit ball $B = \{x \in \mathbb{R}^m \mid \|x\| \le 1\}$.

Given $v \ne 0$, we define the vector $v^* \in \mathbb{R}^m$ by $(v^*|v) = \|v\|$ and $\max\{(u|v^*) \mid \|u\| \le 1\} = 1$. By Lemma 2.1 of [21], the correspondence $v \mapsto v^*$ is a continuous function from $\mathbb{R}^m\setminus\{0\}$ into itself. Moreover, in the special case when $\|\cdot\|$ is the $\ell_p$ norm with $1 < p < \infty$, $v^*$ takes on the particularly simple form
$$v_j^* = \frac{|v_j|^{p-1}}{\|v\|_p^{p-1}}\,\mathrm{sgn}\,v_j.$$

In [21], Sreedharan introduced the following dual problem (P*):

(P*): Find $z \in K\oplus\langle s\rangle$, $\|z\| = 1$, maximizing $(s|w)$ over all $w \in K\oplus\langle s\rangle$, $\|w\| = 1$.

2.1 Lemma. (i) There exists a unique $k \in K$ minimizing $\|b - y\|$, $y \in K$. (2.1.1)
(ii) Problem (P*) has a unique solution.

Proof: (i) Let $x \in K$ and set $\delta = \|b - x\|$. $S = \{y \in K \mid \|b - y\| \le \delta\}$ is a compact set, and $f : K \to \mathbb{R}$ by $f(z) = \|b - z\|$ is continuous on $S$, so $f$ must achieve its minimum value on $S$. Suppose $x, y \in S$ are such that
$$f(x) = \|b - x\| = \min\{\|b - z\| \mid z \in S\} = \|b - y\| = f(y). \tag{2.1.2}$$
Then
$$\left\|b - \tfrac{1}{2}(x + y)\right\| = \tfrac{1}{2}\|(b - x) + (b - y)\| \le \tfrac{1}{2}\|b - x\| + \tfrac{1}{2}\|b - y\| = \|b - x\| = \|b - y\|.$$
By the strict convexity of $\|\cdot\|$, $b - x = b - y$, i.e. $x = y$. Thus there is a unique $k \in K$ minimizing (2.1.1).

(ii) It is clear that problem (P*) has a solution. Suppose $y, z$ both solve problem (P*) and $y \ne z$. Since $\|\cdot\|$ is strictly convex, $\|y\| = \|z\| = 1$, and $y \ne z$,
$$\|y + z\| < 2. \tag{2.1.3}$$
Since $(s|y) = (s|z) > 0$, $y + z \ne 0$. Let $w = \frac{y + z}{\|y + z\|}$. Then $w \in K\oplus\langle s\rangle$, $\|w\| = 1$, and
$$(s|w) = \frac{(s|y) + (s|z)}{\|y + z\|} = \frac{2(s|y)}{\|y + z\|} > (s|y)$$
by (2.1.3). But this contradicts the assumption that $y$ solves problem (P*). Thus the solution of problem (P*) must be unique.

The key results of [21] used to solve problems (P) and (P*) are:

2.2 Theorem. If $z \in K\oplus\langle s\rangle$ with $\|z\| = 1$ and $z^* \in K^\perp$, then $b - (b|z^*)z \in K$ and $\rho = (b|z^*)$.

2.3 Corollary. Let $z$ be as above. Then $(b|z^*) = \dfrac{(s|s)}{(s|z)}$.

2.4 Theorem. Let $z$ solve problem (P*). Then $b - \dfrac{(s|s)}{(s|z)}\,z \in K$ and $\rho = \dfrac{(s|s)}{(s|z)}$.

2.7 Lemma. Let $z, w \in \mathbb{R}^m$ be linearly independent. Then $\alpha \in \mathbb{R}$ satisfies
$$\|z - \alpha w\| \le \|z - \lambda w\| \quad \forall \lambda \in \mathbb{R}$$
if and only if $((z - \alpha w)^*|w) = 0$.

Next we give a new algorithm for solving problem (P).

2.2 Algorithm.
Step 1. Set $k = 0$ and $y_k = \dfrac{s}{\|s\|}$.
Step 2. Compute $y_k^*$ and $v_k = AA^Ty_k^*$.
Step 3. If $v_k = 0$, go to Step 6; otherwise continue.
Step 4. Choose $\alpha_k > 0$ such that $((y_k - \alpha_k v_k)^*|v_k) = 0$.
Step 5. Set $y_{k+1} = \dfrac{z}{\|z\|}$, where $z = y_k - \alpha_k v_k$. Increment $k$ by 1 and return to Step 2.
Step 6. Using $y_k$, the solution of problem (P*), solve problem (P).

2.3 Theorem. Either Algorithm 2.2 solves problem (P) in a finite number of steps, or the $y_k$ converge to the solution of problem (P*).
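A direct transcription of Algorithm 2.2 for the $\ell_p$ norm, $1 < p < \infty$, might look as follows. This is a sketch of ours, not the thesis's code: the dual map is the formula for $v^*$ given above, the root of $g(\alpha) = ((y - \alpha v)^*|v)$ in Step 4 is found by simple bracketing and bisection (Chapter IV discusses better root-finders), and a small tolerance replaces the exact test $v_k = 0$.

```python
import numpy as np

def dual(v, p):
    """The map v -> v* for the l_p norm, 1 < p < inf."""
    nv = np.linalg.norm(v, p)
    return np.abs(v) ** (p - 1) * np.sign(v) / nv ** (p - 1)

def algorithm_2_2(A, b, p, tol=1e-10, iters=500):
    E = np.eye(len(b)) - A @ np.linalg.pinv(A)   # projection onto K-perp
    s = E @ b
    y = s / np.linalg.norm(s, p)
    for _ in range(iters):
        v = A @ (A.T @ dual(y, p))
        if np.linalg.norm(v) < tol:
            break                                 # y solves (P*)
        g = lambda a: dual(y - a * v, p) @ v      # g(0) > 0, g < 0 past the root
        lo, hi = 0.0, 1.0
        while g(hi) > 0:                          # bracket the unique root
            hi *= 2
        for _ in range(100):                      # bisection
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
        z = y - 0.5 * (lo + hi) * v
        y = z / np.linalg.norm(z, p)
    rho = (s @ s) / (s @ y)                       # Theorem 2.4
    x, *_ = np.linalg.lstsq(A, b - rho * y, rcond=None)  # Step 6
    return x, rho
```

The final least squares call realizes Step 6: by Theorem 2.4, $b - \rho y$ lies in the image of $A$, so the system $Ax = b - \rho y$ is consistent.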
Before proving Theorem 2.3, we make a couple of observations about Algorithm 2.2 and establish a few facts to be used in the proof.

In Step 1, we could have selected $y_0$ to be any element of $K\oplus\langle s\rangle$ of norm one with $(y_0|s) > 0$. Our choice is just a very convenient one. In Step 3, one might choose to stop the computations if $\|v_k\|$ or $\|\alpha_k v_k\|$ is small. The reasons for these stopping rules will be apparent in the course of the proof of the convergence of the algorithm. With respect to Step 6, Theorem 2.4 of [21] says that
$$Ax = b - \frac{(s|s)}{(s|y_k)}\,y_k = b - \rho y_k$$
has a solution, say $\bar{x}$. Since $\|b - A\bar{x}\| = \rho$, $\bar{x}$ solves problem (P).

We next prove a couple of propositions essentially saying that Algorithm 2.2 makes sense. In particular, $v_k = 0$ implies that $y_k$ solves problem (P*), and the choice of $\alpha_k$ specified in Step 4 is always possible.

2.4 Lemma. Let $v_k \ne 0$.
(i) There exists $\alpha \ge 0$ such that $\|y_k - \alpha v_k\| < 1$.
(ii) If $y_k, v_k$ are linearly independent, then for every $\alpha > 0$ such that $\|y_k - \alpha v_k\| < 1$, we have $0 < \|y_k - \alpha v_k\|$.
(iii) If $y_k, v_k$ are linearly dependent, then there is a unique $\alpha > 0$ such that $\|y_k - \alpha v_k\| = 0$.
(iv) There is a unique $\beta \in \mathbb{R}$ such that $\|y_k - \beta v_k\| \le \|y_k - \lambda v_k\|$ for all $\lambda \in \mathbb{R}$, and $\beta > 0$.
(v) $((y_k - \beta v_k)^*|v_k) = 0$.

Proof: By the definition of $y_k^*$, $\max\{(u|y_k^*) \mid \|u\| \le 1\} = 1$. Hence
$$\|y_k - \alpha v_k\| = \|y_k - \alpha v_k\|\max\{(u|y_k^*) \mid \|u\| \le 1\} \ge \left(\frac{y_k - \alpha v_k}{\|y_k - \alpha v_k\|}\,\Big|\,y_k^*\right)\|y_k - \alpha v_k\| = (y_k|y_k^*) - \alpha(y_k^*|v_k) = \|y_k\| - \alpha(y_k^*|v_k) = 1 - \alpha(y_k^*|v_k). \tag{2.4.1}$$
Observe that
$$(y_k^*|v_k) = (y_k^*|AA^Ty_k^*) = (A^Ty_k^*|A^Ty_k^*) = \|A^Ty_k^*\|_2^2. \tag{2.4.2}$$
Together with the definition $v_k = AA^Ty_k^*$, we have that
$$v_k = 0 \text{ if and only if } A^Ty_k^* = 0. \tag{2.4.3}$$
By assumption, $v_k \ne 0$, so from (2.4.1)
$$\|y_k - \alpha v_k\| > 1 \text{ for all } \alpha < 0. \tag{2.4.4}$$

Suppose that $y_k, v_k$ are linearly dependent. Then there is a $\beta \in \mathbb{R}$ such that $\|y_k - \beta v_k\| = 0$. By (2.4.4), $\beta > 0$. More specifically, $\beta = \frac{1}{\|v_k\|}$, proving (iii).

Suppose $y_k, v_k$ are linearly independent. If there were no $\alpha > 0$ such that $\|y_k - \alpha v_k\| < 1$, then
$$\|y_k - \alpha v_k\| \ge 1 = \|y_k\| \text{ for all } \alpha \in \mathbb{R}. \tag{2.4.5}$$
By Lemma 2.7 of [21], (2.4.5) implies that $(y_k^*|v_k) = 0$, which by (2.4.2) and (2.4.3) means that $v_k = 0$, contradicting the hypothesis that $v_k \ne 0$. Consequently, there must be an $\alpha > 0$ such that $\|y_k - \alpha v_k\| < 1$, establishing (i) and (ii).

Consider the strictly convex function $f : \mathbb{R} \to \mathbb{R}$ given by $f(\lambda) = \|y_k - \lambda v_k\|$. $f(0) = 1$, $f(\lambda) \to \infty$ as $\lambda \to \infty$, and by (i) there exists an $\alpha > 0$ such that $f(\alpha) < 1$. By the continuity and strict convexity of $f$, there is a unique $\bar\lambda > 0$ such that $f(\bar\lambda) = 1$, and by the strict convexity of $f$, there is a unique $\beta$, $0 < \beta < \bar\lambda$, minimizing $f$. (iv) is proven.

Finally, from (iv) and Lemma 2.7 of [21], it follows that $((y_k - \beta v_k)^*|v_k) = 0$ for the $\beta$ found in (iv). This completes the proof of Lemma 2.4.

We summarize the important points of Lemma 2.4 in

2.5 Proposition. Assume that $y_k$ and $v_k$ are linearly independent. Then there exists a unique $\alpha > 0$ such that
$$((y_k - \alpha v_k)^*|v_k) = 0. \tag{2.5.1}$$
Moreover, with this choice of $\alpha$,
$$0 < \|y_k - \alpha v_k\| < 1. \tag{2.5.2}$$

As an immediate consequence of Proposition 2.5, Step 4 of Algorithm 2.2 is guaranteed to make sense.

2.6 Proposition. For any $k \ge 0$, $y_k \in K\oplus\langle s\rangle$, $\|y_k\| = 1$, and $(y_k|s) > 0$.

Proof: The assertion is true for $k = 0$ by construction. Assume that the proposition holds for $k = 0, 1, \ldots, n$. We shall now verify its validity for $k = n+1$. If $v_n = 0$, then $y_{n+1} = \frac{y_n}{\|y_n\|} = y_n$, so the assertion is true for $k = n+1$. Assume that $v_n \ne 0$. Since $v_k = AA^Ty_k^*$, we have $v_k \in K$. Now $s \in K^\perp$, $y_n \in K\oplus\langle s\rangle$ with $(y_n|s) \ne 0$, and $v_n \in K$ imply that $y_n$ and $v_n$ are linearly independent. So by Proposition 2.5,
$$0 < \|y_n - \alpha_n v_n\| < 1. \tag{2.6.1}$$
Since $y_n \in K\oplus\langle s\rangle$ and $v_n \in K$, $y_{n+1} = \dfrac{y_n - \alpha_n v_n}{\|y_n - \alpha_n v_n\|} \in K\oplus\langle s\rangle$, and $\|y_{n+1}\| = 1$.
Finally, since $(v_n|s) = 0$,
$$(y_{n+1}|s) = \left(\frac{y_n - \alpha_n v_n}{\|y_n - \alpha_n v_n\|}\,\Big|\,s\right) = \frac{(y_n|s) - \alpha_n(v_n|s)}{\|y_n - \alpha_n v_n\|} = \frac{(y_n|s)}{\|y_n - \alpha_n v_n\|} > 0 \tag{2.6.2}$$
by the induction hypothesis. This completes the proof of Proposition 2.6.

We next show that Steps 3 and 6 of the algorithm do not lead us astray by proving

2.7 Proposition. If $v_k = 0$, then $y_k$ solves problem (P*).

Proof: By (2.4.3), $v_k = 0$ implies that $y_k^* \in \mathrm{Ker}\,A^T = K^\perp$. Also, by Proposition 2.6, $y_k \in K\oplus\langle s\rangle$ and $\|y_k\| = 1$. Applying Theorem 2.2 of [21], we have
$$b - (b|y_k^*)y_k \in K \text{ and } \rho = (b|y_k^*). \tag{2.7.1}$$
(2.7.1) together with Corollary 2.3 of [21] and Lemma 2.1(i) yields that
$$b - \frac{(s|s)}{(s|y_k)}\,y_k \tag{2.7.2}$$
is the unique point in $K$ closest to $b$. Let $z$ solve problem (P*). Then by Theorem 2.4 of [21],
$$b - \frac{(s|s)}{(s|z)}\,z \tag{2.7.3}$$
is also the unique point in $K$ closest to $b$. Equating (2.7.2) and (2.7.3), we have
$$\frac{z}{(s|z)} = \frac{y_k}{(s|y_k)}. \tag{2.7.4}$$
Taking the norm of both sides of (2.7.4) and using the facts that $\|z\| = \|y_k\| = 1$ and $(y_k|s), (s|z) > 0$, we have
$$(s|z) = (s|y_k). \tag{2.7.5}$$
Lemma 2.1(ii) now says that $y_k = z$, i.e. $y_k$ solves problem (P*).

We now prove Theorem 2.3, i.e. that Algorithm 2.2 solves problem (P). Since the proof is somewhat lengthy, we first give an outline of the steps to be taken. We show that
(i) $\forall k \ge 0$, $0 < \rho_k < \rho_{k+1}$, where $\rho_k = (y_k|s)$,
(ii) $\lim_{k\to\infty}\|y_k - \alpha_k v_k\| = 1$,
(iii) $\lim_{k\to\infty} v_k = 0$,
(iv) any limit point of $\{y_k \mid k \ge 0\}$ solves problem (P*), and
(v) $\lim_{k\to\infty} y_k = y$, which solves problem (P*).
Because of Proposition 2.7, we can assume that $v_k \ne 0$ for all $k \ge 0$.

2.8 Proof of Theorem 2.3. Let
$$\rho_k = (y_k|s). \tag{2.8.1}$$

(i) Claim: $0 < \rho_k < \rho_{k+1}$ for all $k \ge 0$.
$$\rho_{k+1} = (y_{k+1}|s) = \frac{(y_k|s)}{\|y_k - \alpha_k v_k\|} \text{ by (2.6.2)}, \quad > (y_k|s) \text{ by (2.6.1)}.$$
Thus $0 < \rho_k < \rho_{k+1}$ for all $k \ge 0$.

(ii) Claim: $\lim_{k\to\infty}\|y_k - \alpha_k v_k\| = 1$.
From (i), $0 < \rho_k < \rho_{k+1} \le \max\{(w|s) \mid w \in K\oplus\langle s\rangle, \|w\| = 1\}$, so the increasing bounded sequence $\rho_k$ converges to a positive limit, and from (2.6.2), $\|y_k - \alpha_k v_k\| = \rho_k/\rho_{k+1}$. Thus
$$\lim_{k\to\infty}\|y_k - \alpha_k v_k\| = \lim_{k\to\infty}\frac{\rho_k}{\rho_{k+1}} = 1.$$

(iii) Claim: $\lim_{k\to\infty} v_k = 0$.
Suppose the claim were false. Then there exists $\delta > 0$ such that, by taking a subsequence if necessary,
$$\|v_k\| \ge \delta \text{ for all } k \ge 0. \tag{2.8.2}$$
Then
$$1 > \|y_k - \alpha_k v_k\| \ge \|\alpha_k v_k\| - \|y_k\| = \alpha_k\|v_k\| - 1 \ge \alpha_k\delta - 1,$$
from which one obtains
$$0 < \alpha_k < \frac{2}{\delta}. \tag{2.8.3}$$
Again by taking subsequences if necessary, we can assume that
$$\lim_{k\to\infty}\alpha_k = \alpha \text{ and } \lim_{k\to\infty} y_k = y. \tag{2.8.4}$$
By Lemma 2.1 of [21], the mapping $x \mapsto x^*$ is continuous on $\mathbb{R}^m\setminus\{0\}$, so
$$\lim_{k\to\infty} v_k = \lim_{k\to\infty} AA^Ty_k^* = AA^Ty^* =: v,$$
and $\|v\| \ge \delta$ by (2.8.2). Also, by the continuity of $x \mapsto x^*$ and $\|\cdot\|$, and since $((y_k - \alpha_k v_k)^*|v_k) = 0$ for all $k \ge 0$, we have that
$$((y - \alpha v)^*|v) = 0, \tag{2.8.5}$$
$$\|y - \alpha v\| = 1 = \|y\|. \tag{2.8.6}$$
Either $\alpha = 0$ or $\alpha > 0$. We show that each of these possibilities leads to a contradiction, forcing us to conclude that $\lim_{k\to\infty} v_k = 0$.

If $\alpha = 0$, then (2.8.5) becomes $(y^*|v) = 0$. Together with (2.4.2) and (2.4.3), this means that $v = 0$, contradicting our assumption that $v \ne 0$.

Suppose $\alpha > 0$. First note that $y, v$ are linearly independent, since $(v|s) = 0$ and $(y|s) = \lim_{k\to\infty}(y_k|s) = \lim_{k\to\infty}\rho_k > 0$. By Lemma 2.7 of [21], (2.8.5) implies that
$$\|y - \alpha v\| \le \|y - \lambda v\| \text{ for all } \lambda \in \mathbb{R}. \tag{2.8.7}$$
But (2.8.6) and the strict convexity of $\|\cdot\|$ force $\|y - \frac{\alpha}{2}v\| < 1$, which contradicts (2.8.7). Hence we must conclude that $\lim_{k\to\infty} v_k = 0$.

(iv) Claim: any limit point of $\{y_k \mid k \ge 0\}$ solves problem (P*).
By taking a subsequence and reindexing if necessary, we can assume that $\lim_{k\to\infty} y_k = y$. From (iii) and the continuity of the mapping $x \mapsto x^*$, we have that
$$v = \lim_{k\to\infty} v_k = 0, \tag{2.8.8}$$
which by (2.4.3) implies that $y^* \in \mathrm{Ker}\,A^T = K^\perp$. Also, by Proposition 2.6, $y \in K\oplus\langle s\rangle$ and $\|y\| = 1$. (2.8.9)
Applying Theorem 2.2 and Corollary 2.3 of [21] and Lemma 2.1(i),
$$b - \frac{(s|s)}{(s|y)}\,y$$
is the unique point in $K$ closest to $b$. Suppose $z$ solves problem (P*). Then by Theorem 2.4 of [21],
$$b - \frac{(s|s)}{(s|z)}\,z = b - \frac{(s|s)}{(s|y)}\,y, \quad\text{i.e.}\quad \frac{z}{(s|z)} = \frac{y}{(s|y)}. \tag{2.8.10}$$
Taking the norm of both sides of (2.8.10) and using $\|z\| = \|y\| = 1$, $(s|z) > 0$, and $(s|y) = \lim_{k\to\infty}(s|y_k) = \lim_{k\to\infty}\rho_k > 0$, it follows that $(s|z) = (s|y)$. Finally, Lemma 2.1(ii) says that $z = y$, i.e. $y$ solves problem (P*).

(v) Claim: $\lim_{k\to\infty} y_k = y$, the unique solution of problem (P*).
By the uniqueness of the solution of problem (P*), any convergent subsequence of $\{y_k \mid k \ge 0\}$ converges to the unique solution of problem (P*). Due to the compactness of $B = \{x \in K\oplus\langle s\rangle \mid \|x\| = 1\}$, $\lim_{k\to\infty} y_k = y$.

This completes the proof of Theorem 2.3, establishing that Algorithm 2.2 is guaranteed to find the solution of problem (P).

CHAPTER III

THE $\ell_p$ PROBLEM, $0 < p < 1$

We shall be considering the system of linear equations $Ax = b$, where $A$ is an $m \times n$ matrix, $m > n$, $b$ is an $m$-vector, and $x$ is an $n$-vector. All numbers are assumed to be real. For $y \in \mathbb{R}^m$ and $0 < p < 1$, we define $\|y\|_p = \sum_{j=1}^m |y_j|^p$, a $p$-homogeneous, translation invariant metric. As before, let $K = \mathrm{Image}(A)$, let $E$ be the orthogonal projection of $\mathbb{R}^m$ onto $K^\perp$, set $s = Eb$, and for $x \in \mathbb{R}^m$ let $d(x,K) = \inf\{\|x - k\|_p \mid k \in K\}$. Problem (P) is to find $x \in \mathbb{R}^n$ minimizing $\|b - Ax\|_p$. We set $\rho = d(b,K)$ and assume that $\rho > 0$, since otherwise the problem is trivial. Finally, let $\langle v\rangle$ denote the linear span of the vector $v$.

Observe that $s - b \in K$, since $E(s - b) = Es - Eb = s - s = 0$. As a result,
$$d(s,K) = \inf\{\|s - k\|_p \mid k \in K\} = \inf\{\|b - k + (s - b)\|_p \mid k \in K\} = \inf\{\|b - k\|_p \mid k \in K\} = d(b,K).$$
The existence of a solution of problem (P) follows immediately from the continuity of the $\ell_p$ metric and the finite dimensionality of the subspace $K$.

Given problem (P), we associate a dual problem (P*): Find $z \in K\oplus\langle s\rangle$, $\|z\|_p = 1$, maximizing $(s|w)$ over all $w \in K\oplus\langle s\rangle$, $\|w\|_p = 1$. The relation between problem (P) and problem (P*) is given in the following theorem, which extends Theorem 2.4 of [21].

3.1 Theorem. Let $z$ solve problem (P*). Then
(i) $\rho^{1/p}(s|z) = (s|s)$ (3.1.1)
and
(ii) $b - \rho^{1/p}z \in K$. (3.1.2)

Proof: (i)
$$(s|z) = \max\{(s|w) \mid w \in K\oplus\langle s\rangle, \|w\|_p = 1\} = \max\{(s|k + \beta s) \mid k \in K, \beta \in \mathbb{R}, \|k + \beta s\|_p = 1\}$$
$$= (s|s)\max\{\beta \in \mathbb{R} \mid \|k + \beta s\|_p = 1, k \in K\} = (s|s)\max\{\beta \in \mathbb{R}\setminus\{0\} \mid \|k + s\|_p = \beta^{-p}, k \in K\}$$
$$= (s|s)\max\{\|k + s\|_p^{-1/p} \mid k \in K\} = \frac{(s|s)}{\min\{\|k + s\|_p \mid k \in K\}^{1/p}} = \frac{(s|s)}{\rho^{1/p}},$$
where the second line uses the $p$-homogeneity of $\|\cdot\|_p$ and the last step uses $d(s,K) = d(b,K) = \rho$. Thus $\rho^{1/p}(s|z) = (s|s)$, which is (3.1.1).

(ii) Let $t \in K^\perp$. We shall show that $(t|b - \rho^{1/p}z) = 0$.

Suppose $(t|s) = 0$. Since $z \in K\oplus\langle s\rangle$ and $t \in K^\perp \cap \langle s\rangle^\perp$, $(t|z) = 0$. Also $(t|b) = (Et|b) = (t|Eb) = (t|s) = 0$. Thus
$$(t|b - \rho^{1/p}z) = 0 \text{ if } (t|s) = 0. \tag{3.1.3}$$
Next suppose that $t = s$. Since $E$ is the orthogonal projection of $\mathbb{R}^m$ onto $K^\perp$ and $s = Eb$, we have that
$$(s|b) = (s|s). \tag{3.1.4}$$
Then
$$(t|b - \rho^{1/p}z) = (s|b - \rho^{1/p}z) = (s|b) - \rho^{1/p}(s|z) = (s|s) - (s|s) = 0,$$
by (3.1.4) and (3.1.1), so
$$(t|b - \rho^{1/p}z) = 0 \text{ if } t \in \langle s\rangle. \tag{3.1.5}$$
Let $F$ be the orthogonal projection of $\mathbb{R}^m$ onto $\langle s\rangle$, and let $t \in K^\perp$. Write $t = t_1 + t_2$, where $t_1 = t - Ft$ and $t_2 = Ft$. Clearly, $(t_1|s) = 0$ and $t_2 \in \langle s\rangle$, so by (3.1.3) and (3.1.5), $(t|b - \rho^{1/p}z) = 0$. Since $t \in K^\perp$ was arbitrary, $b - \rho^{1/p}z \in K$. This completes the proof of Theorem 3.1.

Observe that
$$\|b - (b - \rho^{1/p}z)\|_p = \|\rho^{1/p}z\|_p = \rho\|z\|_p = \rho,$$
so $b - \rho^{1/p}z$ is a point of $K$ of minimal distance from $b$. In the statement and proof of Theorem 3.1, we used only the $p$-homogeneity of the $\ell_p$ metric on $\mathbb{R}^m$.

Figure 1 on the following page gives a geometric interpretation of Theorem 3.1. Given $K$ and $b$ as in the theorem, we are to find $b' \in K$ which minimizes $\{\|b - k\|_p \mid k \in K\}$. Since the $\ell_p$ metric is translation invariant, this is equivalent to finding $s' \in K$ minimizing $\{\|s - k\|_p \mid k \in K\}$. Let $\rho = d(b,K)$ and let $z$ solve (P*) as indicated in the figure.
By expanding the unit ball by a factor of $\rho$, which by the $p$-homogeneity of $\|\cdot\|_p$ means multiplying the unit ball by $\rho^{1/p}$, multiplying by $-1$, and translating it by $s$, one can easily see that $z$ is taken by the above operations to the point $s'$, i.e. $s' = s - \rho^{1/p}z$. Thus $b' = b - \rho^{1/p}z$.

[Figure 1. Geometric interpretation of Theorem 3.1.]

Figure 1 also indicates in which direction one should look for a characterization of the solutions of problem (P*). By the symmetry and concavity of $\|\cdot\|_p$, a solution of problem (P*) should lie at a "corner" or on an "edge" of the unit ball, and these correspond to points at which a certain number of coordinates of the given point are zero. Before making these intuitive ideas precise, we introduce some terminology and establish a couple of lemmas.

Let $X$ be a real linear metric space, i.e. a real vector space on which a translation invariant metric is defined so that the metric space structure is compatible with the linear space structure. Denote by $X^*$ the algebraic dual of $X$, i.e. the space of all linear functions on $X$.

3.2 Definition. Let $X$ be a real linear metric space, $A \subset X$, $a \in A$, and $H$ a non-trivial homogeneous hyperplane in $X$, i.e. $H = \{x \in X \mid f(x) = 0\}$, where $f \in X^*\setminus\{0\}$. We say that $H + a = \{h + a \mid h \in H\}$ supports $A$ at $a$ if either
$$f(x) \ge f(a) \ \forall x \in A \quad\text{or}\quad f(x) \le f(a) \ \forall x \in A. \tag{3.2.1}$$

3.3 Lemma. Let $X$ be a real linear metric space, $f \in X^*\setminus\{0\}$, $A = f^{-1}(0)$, $z \in X\setminus A$, $Z$ a subspace of $X$ with $z \in Z$ and $\dim Z > 0$, and $A_1 = A \cap Z$. Then $A_1$ is a hyperplane in $Z$.

Proof: $A_1 = A \cap Z = \{x \in Z \mid f(x) = 0\}$, being the kernel of a linear functional, can fail to be a hyperplane in $Z$ only if either $\dim Z = 0$, which is not so by hypothesis, or if $f \equiv 0$ on $Z$. The latter is also impossible, since $z \in Z\setminus A$ by hypothesis, so $f(z) \ne 0$. Thus $A_1$ is a hyperplane in $Z$.

3.4 Lemma. Let $K$ be a subspace of $\mathbb{R}^m$, $f$ a linear functional on $K$ with $f \not\equiv 0$ on $K$, $H = f^{-1}(0) = \{x \in K \mid f(x) = 0\}$, and $B = \{x \in K \mid \|x\|_p \le 1\}$. Let $z \in K$ satisfy
(i) $\|z\|_p = 1$,
(ii) $z_i \ne 0$, $i = 1, 2, \ldots, m$,
(iii) $H + z$ supports $B$ at $z$.
Then $\dim K = 1$.

Proof: $\dim K \ge 1$ since $z \in K\setminus\{0\}$. Suppose $\dim K > 1$. First we show that $z \notin H$. If $z \in H$, then $f(z) = 0$. Since $\dim H = \dim K - 1 > 0$, there exists $x \in K\setminus H$ with $\|x\|_p \le 1$, i.e. $x \in B$ and $f(x) > 0$. Hence $f(x) > f(z) > f(-x)$. Since $-x \in B$, this contradicts the hypothesis that $H + z$ supports $B$ at $z$. Thus $z \notin H$. Choose $x \in H\setminus\{0\}$ and define
$$\delta_i = \begin{cases} 1 & \text{if } x_i = 0 \\ \left|\dfrac{z_i}{x_i}\right| & \text{if } x_i \ne 0 \end{cases}, \quad i = 1,\ldots,m, \tag{3.4.1}$$
$$\delta = \tfrac{1}{2}\min\{\delta_i \mid i = 1,\ldots,m\}. \tag{3.4.2}$$
Then $|z_i + \epsilon x_i| > 0$ for $i = 1,\ldots,m$ and all $\epsilon \in (-\delta,\delta)$. Let
$$\varphi(\epsilon) = \|z + \epsilon x\|_p, \quad -\delta < \epsilon < \delta. \tag{3.4.3}$$
Then
$$\frac{d^2\varphi}{d\epsilon^2} = p(p-1)\sum_{j=1}^m x_j^2\,|z_j + \epsilon x_j|^{p-2} < 0 \tag{3.4.4}$$
since not all of the $x_j = 0$ and $0 < p < 1$. This shows that $\varphi$ cannot have a local minimum at $\epsilon = 0$, i.e.
$$\|z + \bar\epsilon x\|_p < 1 \text{ for some } \bar\epsilon \text{ with } 0 < |\bar\epsilon| < \delta. \tag{3.4.5}$$
Choose any such $\bar\epsilon$. Then there must exist $\theta > 0$ such that
$$\|\alpha z + \bar\epsilon x\|_p \le 1 \quad \forall \alpha \in (1-\theta, 1+\theta). \tag{3.4.6}$$
Let $u = (1 - \frac{\theta}{2})z + \bar\epsilon x$ and $v = (1 + \frac{\theta}{2})z + \bar\epsilon x$. By construction, $u, v \in B$, and by the linearity of $f$ (recall $f(x) = 0$ since $x \in H$),
$$f(u) = (1 - \tfrac{\theta}{2})f(z) \quad\text{and}\quad f(v) = (1 + \tfrac{\theta}{2})f(z). \tag{3.4.7}$$
Since we have already verified that $f(z) \ne 0$, either $f(u) < f(z) < f(v)$ or $f(u) > f(z) > f(v)$. In either case, we have a contradiction of our hypothesis that $H + z$ supports $B$ at $z$. Thus the assumption that $\dim K > 1$ must be wrong. In other words, $\dim K = 1$, which is what we wanted to prove.
We are now ready to establish a theorem which guarantees that we need only check a finite number of points in order to find a solution of problem (P*).

3.5 Theorem. Let $K$ be an $n$ dimensional subspace of $\mathbb{R}^m$, $n < m$, with basis $b_1,\ldots,b_n$. Let $D$ be the matrix $(b_1,\ldots,b_n) = (r_1,\ldots,r_m)^T$, where $r_i$ is the $i$-th row of the $m \times n$ matrix $D$. Let $B = \{x \in K \mid \|x\|_p \le 1\}$. Suppose that $f$ is a non-zero linear functional on $K$ such that $H + z$ supports $B$ at $z$, where $\|z\|_p = 1$ and $H = \{x \in K \mid f(x) = 0\}$. Denote by $I$ the index set of $i$'s such that $z_i \ne 0$, and let $J = \{1,2,\ldots,m\}\setminus I$. Let $A$ be the matrix with rows $r_j$, $j \in J$, and set $N = \{x \in \mathbb{R}^n \mid Ax = 0\}$. Then $\dim N = 1$.

Proof: First of all, $\dim N \ne 0$. To see this, note that there exists $\beta \in \mathbb{R}^n$ such that $D\beta = z$. By the definition of the set $J$, it follows that $A\beta = 0$, i.e. $\beta \in N$. Now $\beta \ne 0$ since $z \ne 0$. Hence $\dim N = q \ge 1$.

Without loss of generality, let $I = \{1,\ldots,\mu\}$ and $J = \{\mu+1,\ldots,m\}$, and put $K^* = \{Dx \mid x \in N\}$. Denote by $f^*$ the restriction of $f$ to $K^*$, and let $H^* = H \cap K^*$ and $B^* = B \cap K^*$. Clearly, $z \in B^*$. We claim that $\dim K^* = q$. To see this, observe that the rank of the matrix $D$ is $n$, and hence by the rank-nullity theorem of linear algebra, $Dx = 0$ implies that $x = 0$. Hence $\dim K^* = \dim N = q$. By Lemma 3.3, $H^*$ is a hyperplane in $K^*$. Finally, $H^* + z$ supports $B^*$ at $z$ since $f^* = f|_{K^*}$.

At this point we pause to observe that each $k \in K^*$ satisfies $k_{\mu+1} = \cdots = k_m = 0$. To see this, recall that $k \in K^*$ if and only if there exists an $x \in N$ such that $k = Dx$. But $x \in N$ implies that $Ax = 0$, i.e. $(Dx)_j = 0$ for $j \in J = \{\mu+1,\ldots,m\}$.

We now in essence drop the $m - \mu$ trailing zero coordinates and consider our problem in $\mathbb{R}^\mu$. To make this precise, let
$$\tilde K = \{\tilde x \in \mathbb{R}^\mu \mid (\tilde x_1,\ldots,\tilde x_\mu,0,\ldots,0) \in K^*\},$$
$\tilde f : \tilde K \to \mathbb{R}$ by $\tilde f(\tilde x) = f^*(\tilde x_1,\ldots,\tilde x_\mu,0,\ldots,0)$, $\tilde H = \{\tilde x \in \tilde K \mid \tilde f(\tilde x) = 0\}$, and $\tilde B = \{\tilde x \in \tilde K \mid \|\tilde x\|_p \le 1\}$, where now $\|\tilde x\|_p = \sum_{j=1}^\mu |\tilde x_j|^p$, and set $\tilde z = (z_1,\ldots,z_\mu)$. Notice that $\tilde z$ has no coordinates equal to zero, and $\sum_{j=1}^\mu |\tilde z_j|^p = 1$. Also, $\tilde H$ is a hyperplane in $\tilde K$, and $\tilde H + \tilde z$ supports $\tilde B$ at $\tilde z$. Finally, $\dim\tilde K = \dim K^* = q$. But by Lemma 3.4, $q = 1$, which is precisely what we wanted to prove, namely $\dim N = 1$.

3.6 Definition. With an eye toward the future and a certain dislike of repetition, we define the phrase "the usual $n$ dimensional situation" to be the following: $K$ is an $n$ dimensional subspace of $\mathbb{R}^m$ with basis $b_1,\ldots,b_n$; $n < m$; $s \in K^\perp\setminus\{0\}$, where $K^\perp = \{x \in \mathbb{R}^m \mid (x|k) = 0 \ \forall k \in K\}$; $D = (b_1,\ldots,b_n,s) = (r_1,\ldots,r_m)^T$, where $r_i$ is the $i$-th row of the $m\times(n+1)$ matrix $D$; $B = \{x \in K\oplus\langle s\rangle \mid \|x\|_p \le 1\}$; $0 < p < 1$.

Under Condition A, Theorem 6 of Motzkin and Walsh asserts in part that every extremal polynomial "coincides with F(x) in at least n + 1 points of E." Condition A says that the $m\times(n+1)$ matrix $(Y_j(x_i))$ has rank $n+1$. In the same paper, Motzkin and Walsh observe that "Theorem 6 implies that every extremal polynomial P(x) is found by interpolation to F(x) in n+1 points of E: there exist but a finite number of polynomials interpolating to F(x) in n+1 points of E, so every extremal polynomial can be found merely by comparing their measures of approximation." Without making a further assumption about the matrix $Y = (Y_j(x_i))$, namely that it satisfies the Haar Condition, there need not be only a finite number of polynomials interpolating $F(x)$ at $n+1$ points of $E$. The suggested procedure for finding a polynomial of best approximation, consequently, need not be finite. Applying Corollary 3.10, we can prove that the observation of Motzkin and Walsh is correct if $Y$ satisfies the Haar Condition. When the Haar Condition is violated, however, we can easily construct, even in $\mathbb{R}^3$, counterexamples to the assertion of Motzkin and Walsh.
Suppose $m = 3$, $n = 1$, $F(x_1) = F(x_2) = 0$, $F(x_3) = Y_1(x_1) = Y_1(x_2) = Y_2(x_1) = Y_2(x_2) = Y_2(x_3) = 1$, and $Y_1(x_3) = -2$. The matrix
$$Y = (Y_j(x_i)) = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ -2 & 1 \end{pmatrix}$$
clearly has rank $n+1 = 2$, so Condition A is satisfied. For any $a \in \mathbb{R}$, $P(x) = aY_1(x) - aY_2(x)$ satisfies $P(x_1) = P(x_2) = 0$, so that $P(x)$ interpolates $F(x)$ in $n+1 = 2$ points of $E$. Clearly, there are an infinite number of these interpolating polynomials, showing that the observation of Motzkin and Walsh is incorrect. The solution of the given approximation problem is indeed included among all those functions which interpolate $F(x)$ in at least $n+1$ points, but this collection of interpolating functions need not be finite as claimed by Motzkin and Walsh. Of course, this counterexample is possible only because $Y$ does not satisfy the Haar Condition. By Theorems 3.8 and 3.13, we are guaranteed that the given approximation problem can always be solved in a finite number of steps.

Before proceeding any further toward a solution of problem (P*), we pause to consider some of the difficulties that lie ahead. Essentially, we want to find a point on an $n+1$ dimensional cross section of the $\ell_p$ unit ball in $\mathbb{R}^m$ at which a translation of a fixed subspace is tangent to the ball. So far we have reduced the problem to one of considering only a finite number of points. As one can see in the following figures, these points correspond closely to corners of a polyhedron, hence the name corner points. In the figures, the corner points have been connected by straight lines rather than by the curved arcs that one would obtain when the $\ell_p$ unit ball is intersected with the specified plane $K$.

[Figure 2(a). $m = 3$; cross sections of the $\ell_p$ unit ball by a fixed two-dimensional subspace $K$, for $p = 1.0, 0.9, 0.8, \ldots, 0.2$.]
[Figure 2(b). $m = 5$, $p = .25$; cross section of the $\ell_p$ unit ball by a two-dimensional subspace $K$.]
[Figure 2(c). $m = 5$, $p = .2$; cross section of the $\ell_p$ unit ball by a two-dimensional subspace $K$.]

Our approach to the problem up to now has been similar to the idea behind linear programming. In linear programming problems, a linear functional defined on a finite dimensional space is maximized subject to certain linear constraints on the same space. In this case, one also proceeds by reducing the number of points at which the solution may occur from an infinite number to a finite one. In linear programming, however, the points one is left with actually are corners of a convex polyhedron. To solve the linear programming problem, one moves from one corner to an adjacent one, always increasing (at least not decreasing) the value of the objective function, until a corner point is reached where all adjacent corners give no higher values for the objective function. By the convexity of the polyhedron, one then concludes that such a point is in fact a solution of the linear programming problem. The advantage of such an algorithm is that one may not have to check all of the corner points all of the time. In practice, the number of points actually computed and checked is usually far less than the total number of corners, though there are examples where the usual simplex algorithm would end up checking all of the corner points.

If possible, we would like to develop a similar exchange type algorithm to obtain a solution of problem (P*). There is no difficulty in moving from one corner to another, continually increasing the value of the objective function, until a point is reached all of whose neighbors yield no higher value of the objective function.
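A quick numerical illustration (ours, not the thesis's) of the non-convexity that blocks the linear programming argument: for $0 < p < 1$ the quantity $\|x\|_p = \sum_j|x_j|^p$ equals 1 at each standard basis vector, yet exceeds 1 at their midpoint, so the "ball" bulges inward between corners.

```python
import numpy as np

def lp_metric(x, p):
    """The p-homogeneous metric ||x||_p = sum |x_j|^p used for 0 < p < 1."""
    return np.sum(np.abs(x) ** p)

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for p in (1.0, 0.5, 0.2):
    mid = 0.5 * (e1 + e2)
    # For p < 1 the midpoint of two unit vectors lies OUTSIDE the unit
    # ball: ||mid||_p = 2 * 0.5^p = 2^(1-p) > 1.
    print(p, lp_metric(e1, p), lp_metric(mid, p))
```

At $p = 1$ the midpoint value is exactly 1 (the ball is convex); at $p = 0.5$ it is $\sqrt{2} \approx 1.41$, which is the inward bulge visible in Figure 2(a).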
The crucial step at which 42 we encounter trouble is trying to conclude that such a point solves problem (P*). Since the 2p unit ball is not con- vex, some of the corner points often lie inside the convex hull of all of the corner points. In figure 2(a), as p decreases the cross section of the unit ball changes from convex to non convex. Figures 2(b) and 2(c) are further examples of non convex cross sections of various 1 unit balls. Moreover, it is quite possible that a local maximum of the objective function may not be a global maximum. In other words, we may not have solved problem (P*). Figure 2(c) shows a situation in which this may occur Keeping this obstacle of non convexity clearly in view, we begin to consider the practical task of actually solving problem (P*). We now turn to the construction of algorithms for solving problem (P*). 3.14 Remarks. Given any point z E Kea(s), there exists a B G HJHJ' such that z = DB, where the definition of D appears in 3.6. Also, since 3 E K*, (2‘s) = (DB‘s) = Bn+l(s}s). Problem (P*) requires us to maximize (w‘s) over all w 6 K® (s), ”WHP = l, i.e. . . n+1 . max1mize Bn+1(s|s) over all 5 6 fit , HDBHP==1, i.e. maximize 5 n+1 over all B e 11?.“1 , HDBHP = 1. Using this last formulation of problem (P*) and Theorem 3.8, we see that the solution of problem (P*) entails at most the following: 43 (1) find all those 6 6 HJHJ' for which D5 is a corner point; (2) select from this collection the one with the largest n+1St coordinate, say B. Then (3) z = DB solves problem (P*). If D satisfies the Haar Condition, then it is quite 1 such that D5 is a corner easy to find all 5 6 fig“- point. From Theorem 3.13 and Corollary 3.10, we know that the solution of problem (P*) must be among the (1:11) dis- tinct corner points each of which have n coordinates equal to zero and m-n coordinates non-zero. Suppose we wish to find the corner point z for which [i‘zi = 0} = {i1,...,in}. Select any in+1 e {l,...,m}~.[i1,...,in}, and with A = T _ _ T (ri ,...,ri ) , 801ve AX - en+1 Where en+1 — (O,ooo,0,l) 1 n+1 an+1. The Haar Condition insures that x exists and is . _ x _ unique. Set 5 - “DxfiT7E . Then 2 — DB has exactly n P coordinates equal to zero, viz. zil = --- = zin = 0, and ”zup = 1. In this manner, all of the corner points can be found. The above algorithm lacks three important features: (1) a method for determining if the Haar Condition is satis- fied: (2) an orderly way of selecting the (2:) points to be computed; and (3) the ability, at least theoretically, to ignore some of the corner points some of the time. The first of these is lacking for the very good reason that checking for the Haar Condition involves more work than 44 solving the original problem itself. The second shortcoming will be corrected shortly, and the third problem will be considered afterwards. Since the set of points in KGD(s) of unit norm and having exactly n coordinates equal to zero is identically the same as the set of all corner points when the Haar Condition holds, having that condition satisfied seems like an ideal setting in which to work. The tremendous problem of verifying that the Haar Condition holds diminishes that ideal somewhat. An even more damaging blow is leveled by Lemma 3.12 which says that the greatest number of possible corner points occurs only when the Haar Condition holds. In other words, that situation requires the most work to find all of the corner points since there are more corners to find. 
Our algorithms for finding all of the corner points will work whether the Haar Condition holds or not. In fact, it is actually faster without that condition present. 3.15 Definition. Let U = {u e N“ l1 _<_ u1 < (12 < < nn 5 m} and T = [1,...,(:)}. Define Y:T -o U by the following rules. Given t E T, (1) Set t =t,u O = O, and i = 1. O (2) Find ui E N such that ui-l < ui _<_ m—n+1 ui-l -u _ m-j i and 1 5- ti-l ._ 2 (n-i) S (m-i) ° j-l+u. l-l ui-l (3) Set ti = ti-l - Z) (::2 and increment j=l+u. 45 (4) Repeat steps 2 and 3 until un has been found. Then Y(t) = u E U where the components u1,...,un of u v were found above. We adopt the convention that Z)(-) = 0 . j=u if u > v. We must verify that Definition 3.15 makes sense, i.e. that u can be found as claimed. We begin this task by proving the following lemma. 3.16 Lemma. Let m,n,i,k E 11, i'g n < m, k.g m-+i-n-l. m—n+i (m-j m-k Then Z) . = . j=l+k n-i) n-1+l proof: Since (3) = (ati)+(p;1) for all rm; 6 H, p > q, (nfitl) (m;E;1) + EIEIi) ’ (”Ekll) [(mfiklz) ($2311)] = m;511> + (mgfiz) + ----+<“;f:1) + (3::ii ( = (“3511) + mgfzz) + ----+(“;f:1) + :zi) , m— n+i = j= =l+k 1, the left inequality of (3.17.1) follows from the k-l . definition of k. If k = 1, then 2) §:i) = 0, and by i=1 47 hypothesis, t > 0. Thus we conclude from (3.17.1) that 1512— :i': _1)< <05]; which is what we were to show. We let u1 = k proving the existence of ul. We have already established the uniqueness of ul. Suppose that u1,...,ui_1 have been found satis- fying the conditions specified in Definition 3.15. We next show that there exists a unique ui, ui_1 < ui g m-n-+i, such that u.—l 1 lSti-l' Z lm(:i)— (n:- 1111') j= Wl+u By the induction hypothesis on ui_1, i.e. that ui_1 satis- fies the conditions of Definition 3.15, u. -1 1-1 -1 2 > <1 1 l 1 2 j=l+ui P(n i+l n—i+l —u. m-n+i . And by Lemma 3.16, (m }-1) = 23 (m-? . Thus n-1+l ._ n-1 j-l+u. 1-1 ti-l 5 .=133 n-i 3 i-l So there is a smallest integer k, ui _1 < k < m-n-+i, such that k— 1 k j 231(n:2) ui_1 + l, the left inequality of (3.17.2) follows from the definition of k. If k = ui-l + 1, then k-l . . 2) 2:3) = O, and by the induction hypothesis on ui_1, j=l+ui_1 ti_1‘2 1. Hence from (3.17.2) we have k-l . _ ' x m- J m-k 1.3 ti-l .=1fa (n-i ‘3 (n—i 3 1-1 By the uniqueness of u2 already proven, we conclude that there exists a unique ui satisfying (2) of Definition 3.15. This concludes the proof of Proposition 3.17. 3.18 Preposition. Y is a one-to-one function. Proof: Let 1 g t, t’ g (:2) such that Ht) = \Ht’) = 11. Using the notation of Definition 3.15, observe that m-u m—u n _ n _ 15tn5(n-n)‘(0 )_1 ' ' = ' = = ' implying that tn tn 1. Suppose that tn-k tn—k for k = O,1,...,i. We shall show that t = t n-(i+1) n-(i+1)° BY (3) of definition 3.15, un—i-l t ' = t ° + Z (“173) , n-(1+l) n—1 j=l+u . 1 n-1-l un-i-l = t’ . + Z, (mfj) n-1 '=l+u 1 ’ J n-i-l = : tn-(i+1) 49 Hence tk = t; , k = n,n-—l,...,O. But t = t0 and t6 = t’ so that t = t'. Thus Y is a one-to-one function. Both T and U have (2:) elements by construction, and since Y:T 4‘U is one-to-one, it must also be onto. -1 . . Hence Y :U 4 T exists. Moreover, by rearranging (3) of definition 3.15 and iterating, we have Y'l(U) u.—l n 1 m-j 1.+ 23 Z n_i)] , 1=l j=l+ui_1 v where we let u0 = O and 23(0) = 0 if p > v. J=u 3.19 Definition. Imfi: T and U be as in definition 3.15. Define §:U 4 T by Q = Y-l. An element u E U corresponds to the possible corner point z 6 n91 for which 21 = O, i = u1,...,un. The cor- respondence U u T is not as arbitrary as it may appear at first. 
An example with m = 6, n = 3 appears in Table 3.1. Notice that the first zero coordinates appear as high as possible for a point in 351 on the left and proceed down- ward to the right. We are now prepared to present an algorithm for solving problem (P*). Assume that we have "the usual n dimensional situation". 3.20 Algorithm. Step 1 Set q = 1, pi = 1, B. = o for . m 1 = 1,...,(11) . Step 2 Compute Y(q) = (k1,...,k 50 Q(u)=t= 1234567891011121314151617181920 l l l 1 1 l l 1 1 1.12 2 :2 2 :2 2 £3 3 13 4 u = Y(t) = 2 2 2 2 3 3 3 444 S 23 3 13 4 41 5 44 4 £5 5 3 4 5 6 4 5 6 5 6 £344 5 £5 5 6 (5 S 6 £5 6 O O O O O O O O O (J O 0 O O O C) O (3 O (3 Corresponding O O O O O O O O O 0 point in mm o o oo o o o o o o O O (D O O O 0 (D (3 O O O O (D O C) O O C) O k -r* J \-——-—~z——-——J ‘-¢——’\VJ 5 4 3 2 (2) (2) Mb) Table 3.1 Example 1: Compute Y(15). Given: u0 = 0 t0 = 15 Compute: u1 = 2 t1 = 5 u2 = 4 t2 = 2 u3 = 6 t3 = l 2 Y(15) = 4 , which agrees with the table above. 6 Example 2: Compute §(u) for u = (1,4,5)T u.-l u.-1 n 1 . 3 1 - Nu) = 1+ .23 . 23 (MPH .23 L. 23 (3-3)] 1-1 j-1i_1+1 1=1 j=l+ui_l O . 3 . 4 . 2(623)+ 2(61’>+ 23(653)=1+0+(‘i>+(i)+° i=1 i=2 i=5 = l-+4-+3 = 8, which again agrees with the table above. §£22_§ Step 4 Step 5 Step 6 Step 7 Step1§ $22.2 §£22_19 §£22_11 Step 12 51 Construct A = E and set I = [k1,...kn}. Select i E {l,...,m}~\I and form C = (:8). 1 Does C contain n-+l linearly independent rows? If yes, then go to Step 7. If no, then continue. Set A = C, I = I U {i} and return to Step 4. Solve Ax = O, (ri‘x) = 1. Compute Dx and “Dxlli’D/p = Y, Where D is given in definition 3.6. x sgn x Set 5 - l—Eill and z(q) = —————§:l-Dx. q _ Y Y Let J = {j‘(Dx)j = 0}. Form all possible sets containing exactly n different elements of J. For each set {j1,...,jn} found in Step 10, with 1,5 j1 < j2 < ... < jn g;m and j = . n . _ = (Ji)i=l’ compute 0(3) — t and set pt 0. m If Pi = O, i = 1,...,(n ), then go to Step 13. Otherwise, let q be the smallest in— teger k, l _<_ k _<_ (1:1) such that pk = 1. Return to Step 2. 52 Step 13 Select k, l _<_ k _<_ (1‘), such that Bk = maxfBi|l _<_ i < (2)]. Then z(k) solves problem (P*) and max{(s‘w) 1w 6 109(3), Hwnp = l} = Bk(s\s). In Step 5, one must eventually answer the question in the affirmative since the row rank of D equals the column rank of D which is n-rl by hypothesis. The question it- self can be answered in a number of ways. For example, one might orthogonalize the rows of C and check whether any zero rows occur. This method will also help When step 7 is reached since one then knows which rows of C yield a non— singular matrix .6 with which to solve Gx = en. Steps 10 and 11 are present to exploit Lemma 3.12 which says that some of the original (3:) possible corner points may in fact be redundant. In Step 9, one need not save all of the Bi and z(i) but only the current largest Bi and the corres- ponding z(i) which would make Step 13 unnecessary. Algorithm 3.20 solves problem (P*) for any choice of positive integers m,n with m > n. The price paid for this flexibility is a rather complex algorithm involving a considerable amount of index manipulation. In the case where n is very small or nearly equal to m, one can avoid much of this work by deve10ping special algorithms designed to solve only problems with a particular fixed choice of dim K. As with algorithm 3.20, Corollary 3.9 provides the basis for each algorithm. 
Recall that if $\dim K = n$, then $z$, the solution of problem (P*), has at most $m - n$ nonzero coordinates, and $z \in K\oplus\langle s\rangle$ implies that $z$ must also satisfy $m - (n+1)$ orthogonality constraints. For $n = m-1, m-2, m-3$, these facts lead to simpler algorithms for solving problem (P*).

3.21 Algorithm. Corollary 3.8 has already given the solution $z$ when $n = m-1$. In that case, $z = e_j\,\mathrm{sgn}\,s_j$, where $e_j$ is the usual unit basis vector in $\mathbb{R}^m$ and $j$ is determined by $|s_j| = \max\{|s_i| \mid 1 \le i \le m\}$.

3.22 Algorithm. Suppose that $n = m-2$ and that $a \in (K\oplus\langle s\rangle)^\perp\setminus\{0\}$. With $s = (s_1,\ldots,s_m)^T$, $a = (a_1,\ldots,a_m)^T$, and $z = (z_1,\ldots,z_m)^T$, problem (P*) reduces to:

Maximize $f_{ij}$, $1 \le i, j \le m$, $i \ne j$,
$$f_{ij} = s_i z_i + s_j z_j, \tag{3.22.1}$$
subject to
$$0 = a_i z_i + a_j z_j, \tag{3.22.2}$$
$$1 = |z_i|^p + |z_j|^p. \tag{3.22.3}$$
Let $i$ and $j$ be fixed. Then
$$|a_i z_i| = |a_j z_j| \text{ by (3.22.2)}, \quad\text{so}\quad |a_i|^p|z_i|^p = |a_j|^p|z_j|^p = |a_j|^p(1 - |z_i|^p) \text{ by (3.22.3)}.$$
Solving for $|z_i|$, we find that
$$|z_i| = \frac{|a_j|}{(|a_i|^p + |a_j|^p)^{1/p}}. \tag{3.22.4}$$
Similarly,
$$|z_j| = \frac{|a_i|}{(|a_i|^p + |a_j|^p)^{1/p}}. \tag{3.22.5}$$
Noticing in (3.22.1) that $f_{ij}$ is maximized by taking $\mathrm{sgn}\,z_i = \mathrm{sgn}\,s_i$ and $\mathrm{sgn}\,z_j = \mathrm{sgn}\,s_j$, we conclude that for fixed $i, j$ the maximum value of $f_{ij}$ is
$$f_{ij} = \frac{|a_j s_i| + |a_i s_j|}{(|a_i|^p + |a_j|^p)^{1/p}}. \tag{3.22.6}$$
And the corner point $z$ for which $(z|s) = f_{ij}$ has coordinates
$$z_i = \frac{|a_j|\,\mathrm{sgn}\,s_i}{(|a_i|^p + |a_j|^p)^{1/p}}, \quad z_j = \frac{|a_i|\,\mathrm{sgn}\,s_j}{(|a_i|^p + |a_j|^p)^{1/p}}, \quad z_k = 0, \ k \ne i, j, \ 1 \le k \le m.$$
We now compute the $\binom{m}{2}$ values $f_{ij}$. If $f_{\mu\nu}$ is the largest of the $f_{ij}$, then the solution of problem (P*) is $z = z_\mu e_\mu + z_\nu e_\nu$.

3.23 Algorithm. In a similar manner, the solution of problem (P*) can be found directly when $n = m-3$. Let $a = (a_1,\ldots,a_m)^T$ and $b = (b_1,\ldots,b_m)^T \in (K\oplus\langle s\rangle)^\perp$ be linearly independent. Problem (P*) can be stated as:

Maximize $g_{ijk}$, $1 \le i, j, k \le m$, $i \ne j$, $j \ne k$, $k \ne i$,
$$g_{ijk} = s_i z_i + s_j z_j + s_k z_k, \tag{3.23.1}$$
subject to
$$0 = a_i z_i + a_j z_j + a_k z_k, \tag{3.23.2}$$
$$0 = b_i z_i + b_j z_j + b_k z_k, \tag{3.23.3}$$
$$1 = |z_i|^p + |z_j|^p + |z_k|^p. \tag{3.23.4}$$
Let $i, j, k$ be fixed. Then (3.23.2) and (3.23.3) can be rewritten, after eliminating the $z_k$ term from (3.23.2) and the $z_j$ term from (3.23.3), as
$$A_i z_i + A_j z_j = 0, \tag{3.23.5}$$
$$B_i z_i + B_k z_k = 0, \tag{3.23.6}$$
where
$$A_i = \begin{vmatrix} a_i & a_k \\ b_i & b_k \end{vmatrix}, \quad A_j = \begin{vmatrix} a_j & a_k \\ b_j & b_k \end{vmatrix}, \quad B_i = \begin{vmatrix} a_i & a_j \\ b_i & b_j \end{vmatrix}, \quad B_k = \begin{vmatrix} a_k & a_j \\ b_k & b_j \end{vmatrix}.$$
We have
$$|A_i|^p|z_i|^p = |A_j|^p|z_j|^p \text{ by (3.23.5)}, \qquad |B_i|^p|z_i|^p = |B_k|^p|z_k|^p \text{ by (3.23.6)}. \tag{3.23.7}$$
Also
$$|A_jB_k|^p = |A_jB_k|^p|z_i|^p + |A_jB_k|^p|z_j|^p + |A_jB_k|^p|z_k|^p \text{ by (3.23.4)}$$
$$= |A_jB_k|^p|z_i|^p + |A_iB_k|^p|z_i|^p + |A_jB_i|^p|z_i|^p \text{ by (3.23.7)}$$
$$= (|A_jB_k|^p + |A_iB_k|^p + |A_jB_i|^p)|z_i|^p. \tag{3.23.8}$$
Let $D = (|A_jB_k|^p + |A_iB_k|^p + |A_jB_i|^p)^{1/p}$ (not to be confused with the matrix $D$), so that
$$|z_i| = \frac{|A_jB_k|}{D} \text{ by (3.23.8)}.$$
Similarly, one can show that $|z_j| = \dfrac{|A_iB_k|}{D}$ and $|z_k| = \dfrac{|A_jB_i|}{D}$. From (3.23.1), it is clear that for $i, j, k$ fixed, $g_{ijk}$ is maximized by taking $\mathrm{sgn}\,z_\mu = \mathrm{sgn}\,s_\mu$, $\mu = i, j, k$, with the maximum value being
$$g_{ijk} = \frac{|A_jB_k s_i| + |A_iB_k s_j| + |A_jB_i s_k|}{D}.$$
The corner point $z$ for which $(z|s) = g_{ijk}$ has coordinates
$$z_i = |A_jB_k|\,\mathrm{sgn}\,s_i/D, \quad z_j = |A_iB_k|\,\mathrm{sgn}\,s_j/D, \quad z_k = |A_jB_i|\,\mathrm{sgn}\,s_k/D, \quad z_l = 0, \ l \in \{1,\ldots,m\}\setminus\{i,j,k\}.$$
We now compute the $\binom{m}{3}$ values $g_{ijk}$ corresponding to the values of the objective function at all of the corner points. If $g_{\lambda\mu\nu}$ is the largest of the $g_{ijk}$, then the solution of problem (P*) is $z = z_\lambda e_\lambda + z_\mu e_\mu + z_\nu e_\nu$.
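A compact rendering of Algorithm 3.22 as code, under the stated assumption that $a$ spans $(K\oplus\langle s\rangle)^\perp$; the pair loop is ours, and degenerate pairs with $a_i = a_j = 0$ are simply skipped.

```python
import numpy as np
from itertools import combinations

def algorithm_3_22(a, s, p):
    """n = m-2 case: the dual solution has exactly two nonzero
    coordinates, given in closed form by (3.22.4)-(3.22.6)."""
    m = len(s)
    best_val, best_pair = -np.inf, None
    for i, j in combinations(range(m), 2):
        d = (abs(a[i]) ** p + abs(a[j]) ** p) ** (1.0 / p)
        if d == 0:
            continue                                   # degenerate pair
        f = (abs(a[j] * s[i]) + abs(a[i] * s[j])) / d  # value (3.22.6)
        if f > best_val:
            best_val, best_pair = f, (i, j)
    i, j = best_pair
    d = (abs(a[i]) ** p + abs(a[j]) ** p) ** (1.0 / p)
    z = np.zeros(m)
    z[i] = abs(a[j]) * np.sign(s[i]) / d
    z[j] = abs(a[i]) * np.sign(s[j]) / d
    return z, best_val
```

The $n = m-3$ case of Algorithm 3.23 is entirely analogous, with the $2\times 2$ determinants $A_i, A_j, B_i, B_k$ replacing the single coordinates of $a$.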
If K is one dimensional, say K = (a) )T where a = (31,...,am , then 2, the solution of problem (P*), by Corollary 3.9, has at least one coordinate equal to zero. We can also assume that (a1) + 131‘ {'0 for l g_i g_m since otherwise the entire problem takes place in IRm-k, k > 0. Problem (P*) can be stated as Maximize f. l < i'g m, i’ ‘— fi = (aka + 515‘s), (3.24.1) subject to O = aiai + Bisi’ (3.24.2) l:= Maia + Bisflp . (3.23.3) If :31 = 0, then ai = o by (3.24.2), and hence |le = “s”.1->1/p by (3.24.3). If ai = 0, then Bi = O by (3.24.2), - -1/p _ aiai and hence ai‘ — ”anp . If aisi f'O, then Bi - - Si a.a. . _ _ 1 1 = P _._i by (3.24.2), so that l - "aia 81 sup (Oi) ”a Si snp a. s. - _1 l/p _ ‘ 1‘ by (3.24.3). Thus (0‘11- 1/ua - s. sup — 17p . 1 Has. - a.su ‘a.‘ 1. 1 p . . . _ i Similarly, if aisi ¥'O, ‘Bi‘ — 1/5 . In all three llasi - ais”p cases, 58 s. a. ‘ai‘ = ‘ 1‘ l/p 3 and ‘Bi‘ = ‘ 1‘ lfp (3.24.4) ”as; ' aisnp H331 ‘ aisnp Since (a‘s) = 0, fi = Bi(s|s). To maximize fi’ we will choose sgn Bi = 1 whenever Bi {'0. Thus by (3.24.2) we conclude that sgn ai = —sgn(siai) if ai ¥ 0, and of no interest if ai = 0 since Bi = 0 then. Thus by (3.24.4) —si sgn ai ‘ai‘ Ct. = 1/ , and Bi = ”as. — a.su P 1 1 p l/p ' H331 ' aiSHp By evaluating the m Bi's, we have essentially - up to a factor of (s|s) - evaluated the objective function at all of the corner points. Consequently, if 5“ is the largest of the 6i, then the solution of problem (P*) is z = a a + B s. H H Algorithm 3.20 and, to a lesser degree, the four special algorithms just described have two distinguishing features - one good and the other bad. On the one hand, they always work, i.e. they give the correct solution of problem (P*). On the other hand, algorithm 3.20 in particular can involve a tremendous amount of work since every corner point must be computed. Consequently, unless m and n are fairly small numbers or the Haar Condition is so flagrantly violated that the actual number of corner points is reasonably small, algorithm 3.20 does not represent a computationally feasible method for finding the solution of problem (P*). 59 3.25 Definition. Let x,y be corner points with xi = O, iEI,yi=O, iEI’, xjalo jeJ, yjaKO jEJ’, and I U J = I’ U J' = {1,...,m}. We say that x and y are adjacent if {ri‘i 6 I n 1’} contains n-l linearly inde- pendent vectors. The idea behind this definition is most readily seen if we assume that the Haar Condition holds since otherwise the idea can be easily lost among the subscripts. In this situation, x,y being adjacent (corner) points implies that I and I' both have exactly n elements and [ri]i E I n 1’} contains n-—1 linearly independent row vectors, i.e. I n 1' contains exactly n-l elements. Thus there is an i 6 I and jEI' such that I=JU {i}\[j} and I’=Iu{j}\{i}. In terms of coordinates, all but one of the zero coordinates of either x or y is also a zero coordinate of the other. It should be noted that adjacent points can be much farther apart than one might expect a term like adjacent to allow. For example, if K is a one dimensional subspace, then each two corner points are adjacent. This follows im— mediately since corner points need only have one coordinate equal to zero. 3.26 Definition. A corner point z is a local solution of problem (P*) if (z‘s).2 (x‘s) for all x adjacent to z. It follows from the definition of adjacent corner points that there can exist corner points which are not ad- jacent. Consequently, a local solution of problem (P*) need not be a solution of problem (P*). 
We can now remedy the shortcoming of excessive computations found in Algorithm 3.20 by presenting an exchange algorithm similar to that used in linear programming. The solution found in this manner may, however, only be a local solution of problem (P*). As always, we assume "the usual $n$ dimensional situation".

3.27 Algorithm.
Step 1. Find a corner point $z = D\beta$, and set $I = \{i \mid z_i = 0\}$.
Step 2. Select $n$ linearly independent rows $r_{i_1},\ldots,r_{i_n}$ of $D$ with $i_1,\ldots,i_n \in I$.
Step 3. Pick $\mu \in \{i_1,\ldots,i_n\}$.
Step 4. Relabel the $r_{i_1},\ldots,r_{i_n}$ as $\rho_1,\ldots,\rho_n$ with $\rho_n = r_\mu$.
Step 5. Orthogonalize the $\rho_i$: set $\bar\rho_1 = \rho_1$ and, for $i > 1$,
$$\bar\rho_i = \rho_i - \sum_{j=1}^{i-1}\frac{(\rho_i|\bar\rho_j)}{(\bar\rho_j|\bar\rho_j)}\,\bar\rho_j.$$
Step 6. Pick $k \in \{1,\ldots,m\}\setminus I$.
Step 7. Set $\tilde\beta = \dfrac{\gamma_k}{\|D\gamma_k\|_p^{1/p}}\,\mathrm{sgn}\,(\gamma_k)_{n+1}$, where
$$\gamma_k = \bar\rho_n - \frac{(\bar\rho_n|r_k)}{(\beta|r_k)}\,\beta.$$
We claim that $\tilde z = D\tilde\beta$ is a corner point. To verify this, we show that $\tilde z$ satisfies the definition of a corner point. By construction, $\|\tilde z\|_p = 1$. Let $J = \{j \mid \tilde z_j = 0\}$ and $N = \{x \in \mathbb{R}^{n+1} \mid (x|r_j) = 0, j \in J\}$. We need to show that $\dim N = 1$. By construction, $\{r_{i_1},\ldots,r_{i_n},r_k\}$ are $n+1$ linearly independent vectors, and $\{i_1,\ldots,i_n,k\}\setminus\{\mu\} \subset J$, so $\dim N \le 1$. But $\dim N \ge 1$ since $\tilde\beta \in N\setminus\{0\}$. Thus $\tilde z = D\tilde\beta$ is a corner point.
Step 8. (i) If $\tilde\beta_{n+1} > \beta_{n+1}$, replace $\beta$ by $\tilde\beta$ and $z$ by $\tilde z$, and then return to Step 2. Notice that we have $n$ linearly independent rows on hand from above.
(ii) If $\tilde\beta_{n+1} \le \beta_{n+1}$, return to Step 6 and try another $k \in \{1,\ldots,m\}\setminus I$ until all of these have been tried.
(iii) When all of the $k \in \{1,\ldots,m\}\setminus I$ have been tried in (ii), return to Step 3 to choose another $\mu \in \{i_1,\ldots,i_n\}$.
(iv) When all of the $\mu \in \{i_1,\ldots,i_n\}$ have been tried in (iii), $z$ is a local solution of problem (P*) with value $\beta_{n+1}$.

Step 1 can be accomplished in the same manner that corner points were found in Algorithm 3.20. Experience with a few examples seems to indicate that a good starting corner point to find in Step 1 is that $z$ which has zero coordinates where the coordinates of the vector $s$ are the smallest in absolute value. In many cases, this corner point actually solves problem (P*).

Step 8(i) insures that Algorithm 3.27 eventually terminates, since the value of the objective function $\beta_{n+1}$ (equivalently $(s|z)$) is non-decreasing and there are only a finite number of corner points. Lemma 3.12 guarantees that in Step 8(ii) we need only check the $n$ coordinates listed rather than all of the zero coordinates. The assertion in Step 8(iv), that the point $z$ found by Algorithm 3.27 is a local solution of problem (P*), follows directly from the definition of a local solution.

If $\dim(K\oplus\langle s\rangle)$ is close to $\dim\mathbb{R}^m$, then the computations involved in choosing $n+1$ linearly independent row vectors (Step 2) and orthogonalizing them (Step 5) can become tedious. In this case, it may be advantageous to exploit the fact that when $n$ is nearly equal to $m$, then $\dim\{(K\oplus\langle s\rangle)^\perp\} = m-n-1$ is quite small. Before presenting such an algorithm with all of its details, a brief example clarifying the key ideas involved is sketched.
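First, a condensed sketch (ours) of the exchange idea in Algorithm 3.27. It leans on the `corner_point` helper from earlier instead of the incremental orthogonalization of Steps 4, 5, and 7, and it walks to adjacent corners while the objective improves, so, as the text warns, it may stop at a local solution only.

```python
import numpy as np

def best_corner(D, p, zeros):
    """Corner with the given zero set, oriented so the s-coefficient
    beta_{n+1} is positive; reuses the corner_point sketch."""
    m = D.shape[0]
    extra = next(i for i in range(m) if i not in zeros)
    z, beta = corner_point(D, p, zeros, extra)
    return z * np.sign(beta[-1]) if beta[-1] != 0 else z

def exchange(D, s, p, zero_idx):
    """Local search over corner points: swap one zero index at a time
    (Definition 3.25) while (s|z) increases."""
    m, _ = D.shape
    I = list(zero_idx)
    z = best_corner(D, p, I)
    improved = True
    while improved:
        improved = False
        for mu in list(I):
            for k in [i for i in range(m) if i not in I]:
                J = [i for i in I if i != mu] + [k]
                try:
                    z_new = best_corner(D, p, J)
                except np.linalg.LinAlgError:
                    continue
                if s @ z_new > s @ z:
                    I, z, improved = J, z_new, True
                    break
            if improved:
                break
    return z
```

Starting `zero_idx` at the indices of the smallest $|s_i|$ matches the heuristic the text recommends for Step 1.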
Assume "the usual $n$ dimensional situation". First find $m-n-1$ linearly independent vectors $b_1,\ldots,b_{m-n-1} \in \mathbb{R}^m$ which span $(K\oplus\langle s\rangle)^\perp$, i.e. $D^Tb_j = 0$, $j = 1,\ldots,m-n-1$. Letting $x_0 = (x|s)$, we can reformulate problem (P*) as: maximize $x_0$ subject to the constraints
$$-x_0 + x_1s_1 + x_2s_2 + \cdots + x_ms_m = 0,$$
$$x_1b_{1j} + x_2b_{2j} + \cdots + x_mb_{mj} = 0, \quad j = 1,\ldots,m-n-1,$$
$$\|x\|_p = 1.$$
By relabelling indices if necessary, we can assume that the linear constraints can be put in the following simpler form:
$$-x_0 + c_{m-n}x_{m-n} + \cdots + c_mx_m = 0,$$
$$x_1 + a_{m-n,1}x_{m-n} + \cdots + a_{m,1}x_m = 0,$$
$$x_2 + a_{m-n,2}x_{m-n} + \cdots + a_{m,2}x_m = 0,$$
$$\vdots$$
$$x_{m-n-1} + a_{m-n,m-n-1}x_{m-n} + \cdots + a_{m,m-n-1}x_m = 0.$$
If we set the $n$ variables $x_{m-n+1} = \cdots = x_m = 0$, we would be left with
$$-x_0 + c_{m-n}x_{m-n} = 0, \qquad x_j + a_{m-n,j}x_{m-n} = 0, \quad j = 1,\ldots,m-n-1.$$
Substituting these restrictions on $x$ into the equation $\|x\|_p = 1$, we find that
$$|x_{m-n}| = \Big(1 + \sum_{j=1}^{m-n-1}|a_{m-n,j}|^p\Big)^{-1/p},$$
and thus we conclude that
$$|x_0| = |c_{m-n}|\Big(1 + \sum_{j=1}^{m-n-1}|a_{m-n,j}|^p\Big)^{-1/p}.$$
Similarly, we could have set all of $x_{m-n}$ through $x_m$ equal to zero except $x_{m-n+k}$, $0 \le k \le n$, and found that
$$|x_0| = |c_{m-n+k}|\Big(1 + \sum_{j=1}^{m-n-1}|a_{m-n+k,j}|^p\Big)^{-1/p}$$
in that case. Since we are interested in maximizing $x_0$, choose $q$ to be that subscript which maximizes this last expression, and call the maximum value $x_0^*$. If we now interchange the roles of $x_q$ and some $x_i$, $1 \le i \le m-n-1$, we can repeat the above steps to obtain another $\tilde x_0^*$. If $\tilde x_0^* \le x_0^*$, then the objective function has not been increased, so we try another $1 \le i \le m-n-1$ until all have been considered. When that occurs, $x$ is a local solution of problem (P*) with $(x|s) = x_0^*$. If $\tilde x_0^* > x_0^*$, then set $x_0^* = \tilde x_0^*$ and repeat the computations.

With this general scheme in mind, we present

3.28 Algorithm.
Step 1. Find $m-n-1$ linearly independent vectors $b_1,\ldots,b_{m-n-1} \in \mathbb{R}^m$ such that $D^Tb_j = 0$, $j = 1,\ldots,m-n-1$.
Step 2. Select $I \subset \{1,\ldots,m\}$ containing $m-n-1$ indices $i_1,\ldots,i_{m-n-1}$, so that $x_{i_j}$ can be eliminated from $-x_0 + (s|x) = 0$ and from all but the $j$th equation of the system $(b_k|x) = 0$, $k = 1,\ldots,m-n-1$, in which $x_{i_j}$ has coefficient 1. Call the resulting system
$$-x_0 + (c|x) = 0, \qquad (a_j|x) = 0, \quad j = 1,\ldots,m-n-1.$$
Step 3. For each $k \in \{1,\ldots,m\}\setminus I$, set $x_k = \Big(1 + \sum_{j=1}^{m-n-1}|a_{k,j}|^p\Big)^{-1/p}$, and find the $k$ which maximizes $|c_k|x_k$.
Step 4. Set $x^* = c_kx_k$.
Step 5. For each $i_j \in I$, form $I_j = I \cup \{k\}\setminus\{i_j\}$. Interchange the roles of $x_{i_j}$ and $x_k$, i.e. eliminate $x_k$ from $-x_0 + (c|x) = 0$ and from all of $(a_i|x) = 0$, $i = 1,\ldots,m-n-1$, except where $i = j$, in which $x_k$ has coefficient 1. Call the resulting system
$$-x_0 + (c(j)|x) = 0, \qquad (a_i(j)|x) = 0, \quad i = 1,\ldots,m-n-1.$$
Step 6. Find $\mu \in \{1,\ldots,m\}\setminus I_j$ which maximizes $|c_\mu(j)|\,x_\mu(j)$, where $x_\mu(j) = \Big(1 + \sum_{i=1}^{m-n-1}|a_{\mu,i}(j)|^p\Big)^{-1/p}$, and set $\tilde x_j^* = x_\mu(j)\,c_\mu(j)$.
Step 7. (i) If $|\tilde x_j^*| \le |x^*|$ for all $i_j \in I$, then the vector $x$ found in Steps 2 and 3 is a local solution of problem (P*).
(ii) If any $|\tilde x_j^*| > |x^*|$, let $\tilde x_q^*$ be the largest one in absolute value. Set $I = I_q$, $a_j = a_j(q)$, $j = 1,\ldots,m-n-1$, $c = c(q)$, $k = q$, and $x^* = \tilde x_q^*$. Return to Step 5.

To see that Algorithms 3.27 and 3.28 compute the same local solution of problem (P*), observe the following equivalence of steps in the two algorithms:

Algorithm 3.28     Algorithm 3.27
Step 2             Steps 2, 5
Step 3             Steps 6, 7, 8(ii)
Step 5             Step 8(iii)
Step 7(i)          Step 8(iv)
Step 7(ii)         Step 8(i)

Before leaving the subject of algorithms for solving the $\ell_p$ problem, $0 < p < 1$, one unsuccessful attempt to solve problem (P*) may be of interest. Rather than moving from vertex to vertex in an exchange algorithm, one might try to solve a series of problems in which the dimension of the subspace $K$ changes by one at each step. Since one can find the global solution if either $K\oplus\langle s\rangle = \mathbb{R}^m$ or $\dim(K\oplus\langle s\rangle) = 2$, hopefully one could start with the solution to one of those two problems, and by successively decreasing or increasing the dimension of $K$ by 1, obtain the solution of the given $n$ dimensional problem. Any such algorithm would depend upon getting some information about the solution of an $n+1$ dimensional problem from a known solution of a related $n$ dimensional problem. In some sense, the solutions should not be very far apart.
Before leaving the subject of algorithms for solving the Lp problem, 0 < p < 1, one unsuccessful attempt to solve problem (P*) may be of interest. Rather than moving from vertex to vertex in an exchange algorithm, one might try to solve a series of problems in which the dimension of the subspace K changes by one at each step. Since one can find the global solution if either K ⊕ (s) = ℝ^m or dim(K ⊕ (s)) = 2, hopefully one could start with the solution to one of those two problems, and by successively decreasing or increasing the dimension of K by 1, obtain the solution of the given n dimensional problem. Any such algorithm would depend upon getting some information about the solution of an n±1 dimensional problem from a known solution of a related n dimensional problem. In some sense, the solutions should not be very far apart.

Assume K, K̄ satisfy "the usual n, n+1 dimensional situations" respectively with K a subspace of K̄, that z, z̄ solve problem (P*) for K, K̄, that z_i = 0, i ∈ I, z̄_i = 0, i ∈ Ī, z_j ≠ 0, j ∈ J, z̄_j ≠ 0, j ∈ J̄, and I ∪ J = Ī ∪ J̄ = {1, ..., m}. We can also assume that the Haar condition holds in both cases, so that |I| = n, |Ī| = n+1, |J| = m−n, |J̄| = m−n−1. Under these conditions, it was conjectured that I ⊂ Ī, or equivalently, J̄ ⊂ J. In ℝ³ this means that the solution of the two dimensional Lp problem must lie on one of the four edges of the unit ball passing through the vertex that was the solution of the three dimensional problem. Alternatively, the solution of the three dimensional problem is one of the two end points of the edge on which the solution of the two dimensional Lp problem lies. Unfortunately, this conjecture is not true even in ℝ³ [...].

If we assume that H + z properly supports B at z, i.e. that f(x) > f(z) for all x ∈ B \ {z}, then Lemma 3.4 is true for p = 1. In the proof, choose x, β, z̄, and ε as before, but obtain a contradiction either to the hypothesis that H + z supports B at z or to the strictness of the support. Theorem 3.5 then follows immediately for p = 1 if we again assume that H + z properly supports B at z. Theorem 3.5 and all three of its corollaries, Lemma 3.12, and Theorem 3.13 all hold as previously stated for p = 1. Algorithms 3.27 and 3.28 both solve problem (P*) when p = 1, since the convexity of the ℓ1 unit ball eliminates the possibility of finding a local solution that is not a global solution of problem (P*). Consequently, in the ℓ1 case, algorithm 3.20 is unnecessary, since all three algorithms find the same solution while algorithm 3.20 requires much more computational effort to do it.

CHAPTER IV

NUMERICAL RESULTS

In this chapter we consider some of the computational limitations placed upon the algorithms presented in chapters II and III. Since the main application of algorithm 2.2 is to the ℓp spaces, 1 < p < ∞, we discuss the computational difficulties encountered in that context, and then present numerical results from two examples. Examples of Lp approximation, 0 < p ≤ 1, conclude the chapter.

As we noted in chapter II, in the case of the Lp spaces, 1 < p < ∞, if y ≠ 0, then

y_j* = |y_j|^{p−1} sgn(y_j) / ‖y‖_p^{p−1}.

This particularly simple form for y* makes the evaluation of y_k* in step 2 and (y_k − α_k v_k)* in step 4 of algorithm 2.2 immediate. The two main computational difficulties occurring in algorithm 2.2 are finding α_k such that ((y_k − α_k v_k)* | v_k) = 0 and solving the original problem (P) once the solution of problem (P*) has been found. Since values obtained on a computer are seldom exactly correct, each of these difficulties brings up a related one also: how close to α_k is close enough, and how close to v_k = 0 is close enough to call y_k the solution of problem (P*)? Since the questions related to knowing when problem (P*) has been solved are the easier, we dispose of those first.

By Theorem 2.4 of [21], if y solves problem (P*), then z = (s | y)y is in the image of A, and any x ∈ ℝ^n such that Ax = z solves problem (P). Let the rank of A be k. Form the m × k matrix M composed of the k linearly independent columns a_{i_1}, ..., a_{i_k} of A. Then w = (M^T M)^{−1} M^T z satisfies Mw = z. Let x ∈ ℝ^n be given by

x_i = w_j if i = i_j ∈ {i_1, ..., i_k},   x_i = 0 if i ∉ {i_1, ..., i_k},

and let ȳ ∈ ℝ^n be any solution of Aȳ = 0. Then u = x + ȳ satisfies Au = z, and hence u solves problem (P).
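This recovery of a primal solution from z is mechanical once z is known to lie in image(A). A Python sketch, with numpy and scipy standing in for the original FORTRAN IV routines; the column-pivoted QR used to pick the independent columns is our choice, not the thesis's:

```python
import numpy as np
from scipy.linalg import qr

def primal_from_z(A, z):
    """Recover x with Ax = z as in the text (illustrative sketch)."""
    m, n = A.shape
    k = np.linalg.matrix_rank(A)
    # Pick k linearly independent columns a_{i_1}, ..., a_{i_k} of A;
    # a column-pivoted QR is one convenient way to find them.
    _, _, piv = qr(A, pivoting=True)
    cols = np.sort(piv[:k])
    M = A[:, cols]                          # the m x k matrix M of the text
    w = np.linalg.solve(M.T @ M, M.T @ z)   # w = (M^T M)^{-1} M^T z, Mw = z
    x = np.zeros(n)
    x[cols] = w                             # x_i = w_j at i = i_j, else 0
    return x  # adding any ybar with A @ ybar = 0 gives another solution u
```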
If the rank of A is n, then the solution of problem (P) is unique, and in general, the solutions of problem (P) are the translates of an (n − rank(A))-dimensional subspace of ℝ^n.

The answer to the question of how small v_k must be to accept y_k as the solution of problem (P*) is somewhat dependent upon the matrix A. Since

y_{k+1} = (y_k − α_k v_k)/‖y_k − α_k v_k‖,

lim_{k→∞} ‖y_k − α_k v_k‖ = 1, and ‖y_k‖ = 1 for all k ≥ 0, |(α_k v_k)_j| is almost exactly how much the jth coordinate of y_k is changing at each iteration. Consequently, in our examples the algorithm terminated if

max{|(α_k v_k)_j| : 1 ≤ j ≤ m} < 10⁻⁷,

and our computed solutions agreed with published solutions for the same problems.

The related problems of how to compute and how accurately to compute the α_k in step 4 pose the greatest problem in algorithm 2.2. Where no confusion arises, we drop the subscript k since the iteration being considered is usually irrelevant. We assume throughout the discussion that y, v are linearly independent since the problem is trivial otherwise. Let

f(α) = ((y − αv)* | v) = (1/‖y − αv‖_p^{p−1}) Σ_{j=1}^m v_j |y_j − αv_j|^{p−1} sgn(y_j − αv_j),

and define

g(α) = Σ_{j=1}^m v_j |y_j − αv_j|^{p−1} sgn(y_j − αv_j).

Clearly, f and g have the same roots, so we consider only the simpler function g. By Proposition 2.5, g has a unique root ᾱ and ᾱ > 0. Moreover, g(0) > 0 since f(0) = (y* | v) > 0 by (2.4.2) and (2.4.3). g is clearly continuous since 1 < p < ∞. With this information, a number of techniques for finding the root ᾱ of g are available. The following four methods were used.

1. The method of bisection. Choose α_1 > 0 arbitrarily and compute g(α_1). Since g(0) > 0 and the root ᾱ is unique, if g(α_1) > 0, set α_2 = 2α_1; if g(α_1) < 0, set α_2 = ½α_1. Of course, if g(α_1) = 0, then we are finished. Repeat this procedure until two numbers, say α⁺ and α⁻, have been found such that g(α⁻) < 0 < g(α⁺). Then let

ᾱ = (α⁺ + α⁻)/2.

If g(ᾱ) is positive (negative), replace α⁺ (α⁻) by ᾱ, and repeat this bisection of the interval [α⁺, α⁻] until the root of g is obtained.

2. The secant method. Begin as in the bisection method by finding α⁺, α⁻ such that g(α⁻) < 0 < g(α⁺). Instead of taking the midpoint of the interval [α⁺, α⁻], find the point at which the line through the points (α⁺, g(α⁺)) and (α⁻, g(α⁻)) crosses the x-axis, i.e.

ᾱ = (α⁻ g(α⁺) − α⁺ g(α⁻))/(g(α⁺) − g(α⁻)).

As above, if g(ᾱ) is positive (negative), replace α⁺ (α⁻) by ᾱ and repeat until the root of g is found.

3. The secant-bisection method. After finding α⁺, α⁻ as described above, alternate one iteration of the bisection method and one step of the secant method. This is similar to Dekker's algorithm.

4. Newton's method. Observe that g is a differentiable function of α except at those values α_j = y_j/v_j, j = 1, ..., m. After checking whether any of these α_j is the desired root, one can apply Newton's method to find ᾱ such that g(ᾱ) = 0. Choose α_1 > 0 arbitrarily, and compute

α_{n+1} = α_n − g(α_n)/g′(α_n),   n = 1, 2, ....

It should be noted that some care must be taken with the use of Newton's method. In our examples, it often failed to locate ᾱ for want of a sufficiently good initial guess. For p small, both the bisection method and the secant method had trouble converging to the root in some instances. It occasionally happened that one but not the other had difficulties handling a specific situation. The mixed secant-bisection method worked quite successfully in these cases, enjoying the benefits of each while avoiding many of their shortcomings.
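A Python sketch of g and of the mixed secant-bisection search (method 3), which the examples below found most robust, may be helpful. The bracketing below starts from 0 rather than halving α_1, a small simplification of the description above, and all names are ours.

```python
import numpy as np

def g(alpha, y, v, p):
    """g(alpha) = sum_j v_j |y_j - alpha v_j|^(p-1) sgn(y_j - alpha v_j)."""
    r = y - alpha * v
    return float((v * np.abs(r) ** (p - 1.0) * np.sign(r)).sum())

def mixed_secant_bisection(y, v, p, alpha1=1.0, tol=1e-10):
    """Locate the unique positive root of g by alternating one bisection
    step with one secant step (an illustrative sketch of method 3)."""
    # Bracket the root: g(0) > 0, so double alpha until g turns negative.
    a_pos, a_neg = 0.0, alpha1            # invariant: g(a_pos) > 0 > g(a_neg)
    while g(a_neg, y, v, p) > 0.0:
        a_pos, a_neg = a_neg, 2.0 * a_neg
    g_pos, g_neg = g(a_pos, y, v, p), g(a_neg, y, v, p)

    bisect = True
    while abs(a_neg - a_pos) > tol:
        if bisect:
            a = 0.5 * (a_pos + a_neg)                              # bisection
        else:
            a = (a_neg * g_pos - a_pos * g_neg) / (g_pos - g_neg)  # secant
        bisect = not bisect
        ga = g(a, y, v, p)
        if ga > 0.0:
            a_pos, g_pos = a, ga
        else:
            a_neg, g_neg = a, ga
    return 0.5 * (a_pos + a_neg)
```

Alternating the two steps guarantees that the bracketing interval at least halves every other iteration, which is the robustness the text attributes to this method.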
For values of p roughly between 1.25 and 200, few difficulties arise with any of these four methods for locating ᾱ. For small or very large values of p, however, g can become somewhat unruly.

While in theory algorithm 2.2 always solves problem (P), one should expect a little less in practice. Recall that

g(α) = Σ_{j=1}^m v_j |y_j − αv_j|^{p−1} sgn(y_j − αv_j).

The cause of our difficulties is the exponent p−1. If p is near 1, then

|y_j − αv_j|^{p−1} sgn(y_j − αv_j) ≈ sgn(y_j − αv_j),

independent of the magnitude of |y_j − αv_j|. A number that is supposed to be zero, but because of roundoff errors is actually 10⁻¹⁵ on the computer, will be greater than .7 after being raised to the p−1 power when p = 1.01, and approximately .966 when p = 1.001. Similarly, when p is very large,

|y_j − αv_j|^{p−1} ≈ 0 if |y_j − αv_j| < 1,   1 if |y_j − αv_j| = 1,   ∞ if |y_j − αv_j| > 1.

The point of these comments is that one should not expect algorithm 2.2 to be computationally feasible throughout 1 < p < ∞. With p near 1, the thirty-two figures of double precision machine accuracy were sometimes not sufficient to determine ᾱ such that |g(ᾱ)| < 10⁻⁵. The question of how accurately one must know ᾱ becomes of interest at this point; it will be taken up when the actual examples are discussed.

A second unpleasant feature of the function g must also be considered when p is near 1. We know that y_k converges to some y and that v_k converges to 0, but the behavior of α_k is not known from the theory. Numerical results indicate that the sequence {α_k | k ≥ 0} has two limit points, with {α_{2n}} and {α_{2n+1}} approaching two numbers that sometimes differ considerably. For p near 1, one of the two limits appears to be 0, making those α_k v_k go to 0 rapidly while the other α_k v_k approach 0 more slowly.

At the other end of the range 1 < p < ∞, our program came to a halt not because the algorithm was sensitive to large values of p, which it is, but because the computer can not store exponents that are too large. More to the point is that we could not compute the Lp norm when p grew too large.
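Both failure modes are easy to reproduce in modern double precision as well, not only on the CDC 6500. A quick Python check of the figures just quoted:

```python
# A "zero" stored as 1e-15 barely shrinks under the exponent p - 1:
for p in (1.01, 1.001):
    print(p, 1e-15 ** (p - 1.0))   # about 0.708 and 0.966

# For large p the same power behaves like a step function at |r| = 1:
for r in (0.99, 1.0, 1.01):
    print(r, r ** (400 - 1))       # about 0.018, 1.0, and 53
```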
Two examples were programmed in FORTRAN IV for a CDC 6500. The first is taken from Barrodale and Young [4]. The linear system to be solved is

x      = 1.52
x +  y = 1.025
x + 2y = 0.475
x + 3y = 0.01
x + 4y = −0.475
x + 5y = −1.005

The best Lp approximate solution was computed for p = 1.04, 1.06, ..., 1.5, 2, 4, 8, ..., 128. The search for α_k was terminated when |g(α)| < 10⁻⁶ or after 32 iterations of the mixed bisection-secant method, whichever occurred first. All of the solutions together were computed in less than forty seconds. Some of the solutions are listed in Table 4.1.

[Table 4.1. For each computed value of p, the table lists the solution coordinates x and y, the error ‖b − Ax‖_p, and the number of iterations required. The columns of this reproduction are too badly scrambled to realign; the solutions run from about (1.5, −.4999) with error .025 at p = 128 to (1.52, −.503) with error .073 as p approaches 1.]

The second example appears in Cheney [6, p. 44]. The overdetermined system of linear equations is

 x +  y = 3
 x −  y = 1
 x + 2y = 7
2x + 4y = 11.1
2x +  y = 6.9
3x +  y = 7.2

This system poses special difficulties because the solution of the ℓ1 problem is not unique. All points on the segment joining P₁ = (1.77, 1.89) and P₂ = (2.516667, 1.516667) solve the ℓ1 problem with a minimal ℓ1 error vector of length 4.7.

The Lp problem was solved for p = 1.06, 1.08, ..., 1.5, 2, 4, 6, 20, 40, 100, 400.

Table 4.2

p       x           y
1.50    2.0883483   1.7400827
1.46    2.0889511   1.7365094
1.42    2.0893571   1.7337896
1.38    2.0895666   1.7319498
1.34    2.0896032   1.7309038
1.30    2.0895121   1.7304580
1.26    2.0893464   1.7303686
1.22    2.0891500   1.7304290
1.18    2.088926    1.7305371
1.14    2.0884417   1.7307791
1.10    2.0880841   1.7309614
1.06    2.0872254   1.7313904

(One iteration count is missing from this reproduction; those recorded were 15, 16, 16, 14, 10, 34, 22, 11, 13, 13, and 23.)

The computed results for p ≥ 1.5 agree with those published by Cheney [6] and Duris [9]. For 1.06 ≤ p < 1.5, no published figures are available, but as p decreases (see Table 4.2), the solution of the Lp problems approximately equals a solution of the ℓ1 problem. For 1 < p < 1.06, algorithm 2.2 consistently obtained solutions about Q = (2.04615, 1.75193), which is quite far from the computed limit [1] of the solutions of the Lp problems as p → 1, (2.0883, 1.7309). It should be noted, however, that Q is an ℓ1 solution of the problem, since Q = (1 − λ)P₁ + λP₂, where λ = .369839.

The problem of determining how accurately one must know ᾱ in step 4 was particularly troublesome with this example. The smaller |g(ᾱ)| is forced to be, the longer the algorithm takes and the greater is the possibility that the computer is not capable of locating the desired ᾱ. To solve the ℓ_{1.06} problem, it was necessary to know ᾱ such that |g(ᾱ)| < 10⁻²⁴ in each iteration, and the computations took 28.5 seconds. The inability of the computer to store numbers exactly and the presence of many solutions of the ℓ1 problem near the unique solution of the Lp problem apparently teamed up to render algorithm 2.2 ineffective for values of p less than about 1.06.

On the topic of ℓp approximation when 0 < p ≤ 1, algorithm 3.20 was programmed in FORTRAN IV for a CDC 6500, and two examples were studied.

The example taken from Cheney [6, p. 44] that we discussed earlier was tested with p = n/10, n = 1, 2, ..., 10. For p = 1, the algorithm found both corner point solutions mentioned previously. All points on the line segment joining those two points are also solutions of the ℓ1 problem. For 0 < p < 1, however, the original problem has the unique solution (2.516667, 1.516667). For each case, the algorithm took less than a second to compute the solutions.
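The claims about this example are easy to verify numerically. The following Python fragment is our own check, not part of the thesis: it confirms that P₁, P₂, and the combination Q all give ℓ1 error 4.7, and that P₂ gives the smaller error when p < 1, consistent with the unique solution just mentioned.

```python
import numpy as np

A = np.array([[1., 1.], [1., -1.], [1., 2.],
              [2., 4.], [2., 1.], [3., 1.]])
b = np.array([3., 1., 7., 11.1, 6.9, 7.2])

def lp_err(x, p):
    """(sum_i |b_i - (Ax)_i|^p)^(1/p); a norm for p >= 1 and the lp
    "error" of the text for 0 < p < 1."""
    return float((np.abs(b - A @ x) ** p).sum() ** (1.0 / p))

P1 = np.array([1.77, 1.89])
P2 = np.array([2.516667, 1.516667])
lam = 0.369839
Q = (1 - lam) * P1 + lam * P2            # approximately (2.04615, 1.75193)

print([round(lp_err(x, 1.0), 4) for x in (P1, P2, Q)])  # all near 4.7
print(lp_err(P1, 0.5), lp_err(P2, 0.5))  # P2 is smaller for p = 0.5
```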
For a second example, we chose K to be the subspace of ℝ³ spanned by a single fixed vector and took b to be a vector in K^⊥. The intersections of K ⊕ (s), where s = b since b ∈ K^⊥, with the three dimensional ℓp unit balls for p = 1/10, 2/10, ..., 1 are shown in Figure 2(a) of Chapter III. s runs diagonally from the upper left to the lower right, passing through the corner points shown. As Table 4.3 indicates, the solution of problem (P*) "jumps" from the corner along s to the ones on either side for p ≈ .777, as the picture indicates.

[Table 4.3. For each value of p tested, including p = .78 and p = .775 on either side of the jump, the table lists the solutions of problem (P*); the entries of this reproduction are too badly scrambled to realign.]
BIBLIOGRAPHY

[1] Abdelmalek, N.N., Linear L1 approximation for a discrete point set and L1 solutions of overdetermined linear equations, J. ACM 18 (1971), 41-47.

[2] Abdelmalek, N.N., On the discrete linear L1 approximation and L1 solutions of overdetermined linear equations, J. Approximation Theory 11 (1974), 38-53.

[3] Anton, H. and Duris, C.S., An algorithm for best approximate solutions to Av = b in normed linear spaces, J. Approximation Theory 8 (1973), 133-141.

[4] Barrodale, I. and Young, A., Algorithms for best L1 and L∞ linear approximations on a discrete set, Numer. Math. 8 (1966), 295-306.

[5] Barrodale, I. and Roberts, F.D.K., An improved algorithm for discrete ℓ1 linear approximation, SIAM J. Numer. Anal. 10 (1973), 839-848.

[6] Cheney, E.W., Introduction to Approximation Theory, McGraw-Hill, New York, N.Y., 1966.

[7] Cline, A.K., Rate of convergence of Lawson's algorithm, Math. Comp. 26 (1972), 167-176.

[8] Descloux, J., Approximations in Lp and Chebyshev approximations, J. Soc. Indust. Appl. Math. 11 (1963).

[9] Duris, C.S., An algorithm for solving overdetermined linear equations in the Lp-sense, Mathematical Report 70-10, Drexel University, 1970.

[10] Duris, C.S. and Sreedharan, V.P., Chebyshev and ℓ1-solutions of linear equations using least squares solutions, SIAM J. Numer. Anal. 5 (1968), 491-505.

[11] Fletcher, R., Grant, J.A., and Hebden, M.D., The calculation of linear best Lp approximations, Comput. J. 14 (1971), 276-279.

[12] Fletcher, R., Grant, J.A., and Hebden, M.D., Linear minimax approximation as the limit of best Lp-approximation, SIAM J. Numer. Anal. 11 (1974), 123-136.

[13] Hoel, P.G., Certain problems in the theory of closest approximation, Amer. J. Math. 57 (1935), 891-901.

[14] Lawson, C.L., Contributions to the Theory of Linear Least Maximum Approximation, Ph.D. Thesis, University of California, Los Angeles, 1961.

[15] Motzkin, T.S. and Walsh, J.L., Least pth power polynomials on a real finite point set, Trans. AMS 78 (1955), 67-81.

[16] Motzkin, T.S. and Walsh, J.L., Polynomials of best approximation on a real finite point set. I, Trans. AMS 91 (1959), 231-245.

[17] Motzkin, T.S. and Walsh, J.L., Polynomials of best approximation on an interval, Proc. N.A.S. 45 (1959), 1523-1528.

[18] Rice, J.R. and Usow, K.H., The Lawson algorithm and extensions, Math. Comp. 22 (1968), 118-127.

[19] Sreedharan, V.P., Solutions of overdetermined linear equations which minimize error in an abstract norm, Numer. Math. 13 (1969), 146-151.

[20] Sreedharan, V.P., Least squares algorithms for finding solutions of overdetermined linear equations which minimize error in an abstract norm, Numer. Math. 17 (1971), 387-401.

[21] Sreedharan, V.P., Least squares algorithms for finding solutions of overdetermined systems of linear equations which minimize error in a smooth strictly convex norm, J. Approximation Theory 8 (1973), 46-61.

[22] Stiefel, E., Note on Jordan elimination, linear programming, and Tchebycheff approximation, Numer. Math. 2 (1960), 1-17.