ALGORITHMS FOR SGLVTNG EVEREEEIENMINEB SYSTEMS 0E LINEAR L57 LIT-7c: EQIIATIGRS IN THE A; SETTSE: ' ’2 5:1;.ffEffjiisz TIEeais Io: Ike Degrée of PIE D MTPH‘GATT STATE TTT‘TTVERSTT‘I' R‘JBERT ‘1‘iTE. LTAM OWENS ‘ ' 39T5 _ IIIIIIIITITIIIIIITTITIITITIIIITTIITTI L. 3 1293 00839 6214 SEP27W ‘9' ABSTRACT ALGORITHMS FOR SOLVING OVERDETERMINED SYSTEMS OF LINEAR EQUATIONS IN THE ‘P SENSE BY Robert William Owens In this thesis, we investigate methods for approx- imating a solution of an overdetermined system of linear‘ equations. A best approximate solution of the linear sys- tem Ax = b is taken to be a vector x which minimizes the length of the error vector n(x) = b-Ax. we consider the two classes of approximation problems obtained when we determine the length of n(x) first by a smooth strictly convex norm, and second by an LP metric for O < p < 1. For each of the two approximation problems, we study a dual problem whose solution leads directly to a solution of the original problem. Algorithms for solving the dual problems are presented, and numerical results from several LP approximation problems, 0 < p < a, are discussed. ALGORITHMS FOR SOLVING OVERDETERMINED SYSTEMS OF LINEAR EQUATIONS IN THE LP SENSE BY Robert William Owens A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR.OF PHILOSOPHY Department of Mathematics 1975 ACKNOWLEDGMENTS I would like to express my sincere thanks to Professor V.P. Sreedharan, without whose direction and encouragement I could not have written this thesis. I also thank the other members of my committee, Professors Harvey S. Davis, Norman L. Hills, Edward C. Ingraham, and Mary J. Winter, whose helpful comments and suggestions aided in the final preparation of the thesis. Finally, special thanks are reserved for Jill Shoemaker for typing the final manuscript. ii CHAPTER I . CHAPTER II . CHAPTER III. CHAPTER IV . BIBLIOGRA PHY TABLE OF CONTENTS INTRODUCTION . . . . . . . . APPROXIMATION WITH A SMOOTH STRICTLY CONVEX NORM . . . THE LP PROBLEM, o
n and b 6 ng‘\~1mage(A). we are interested in finding x 6 n9‘ min- imizing the number ”b - Axnp. We shall refer to this as the LP problem. In the case of p = a, we often refer to it as the Chebyshev problem. The case where p = 2 is the easiest to solve since this is just the problem of finding the usual Euclidean dis- tance from a point to a subspace. If the matrix A has rank n, then x = (ATA)—1ATb is the solution of the £2 problem, and the only difficulties that arise are computational ones stemming from the fact that ATA may be ill conditioned. A number of essentially different methods have been developed to handle the Chebyshev problem and to a lesser degree the ‘1 problem. Since each of these norms is neither strictly convex nor smooth, i.e. both have "flat spots" and 1'sharp corners", most of the powerful theorems from approxima-- tion theory are not applicable and special techniques must be devised. Since the unit ‘1 and La balls are convex poly- hedra, the methods of linear programming are applicable for solving the £1 and Chebyshev problems. Stiefel [22] pre- sented suCh an algorithm to solve the Chebyshev problem, and more recently Barrodale and Young [4], Barrodale and Roberts [5], and Abdelmalek [l] have given improved algorithms for handling both problems by linear programming. It has been conjectured that the solution of the La problem is the limit of the solutions of the LP problem as p 4 a, and similarly for the L1 problem letting p 4 1+. It is well known that if these limits exist, then they are exactly the solution that one would expect, but the question of convergence is the crucial point. Descloux [8] established the convergence of the solutions of the LP problem as p 4 a providing a justification for this method of solving the Chebyshev problem. The question concerning the convergence of the solutions of the ‘p problems as p 4 1+, however, is still unanswered. Abdelmalek [1] incorrectly applied a theorem of Hoel [13] to conclude that the solutions of the ‘P problems converge as p.+1+. Moreover, even if such a theorem were established, there would remain the problem of computing the solutions of the ‘P problems when p is near 1. This, as shall be mentioned, is another significant difficulty. Methods of solving the Chebyshev problem by trying to find the limit of the solutions of the LP problems as p 4 a have been pre- sented by Fletcher, Grant, and Hebden [12]. Attempts to solve the ‘1 problem by locating a limit of the solutions of the ‘p problems as p 4 1+ have been investigated by Abdelmalek [1]. Another method used to solve the Chebyshev problem was developed by Lawson [14]. Using a result of Motzkin and Walsh [16], Lawson was able to compute the solution of the Chebyshev problem as the limit of the solutions of a sequence of weighted LP problems where p may be held fixed and the weights change in each iteration. Since weighted L2 problems are almost as easy to solve as 12 problems, Lawson chose p = 2 throughout his algorithm. An analysis of the rate of convergence of the Lawson algorithm was made by Cline [7]. Another theorem of Motzkin and Walsh [16] similar to the one guaranteeing that Lawson's algorithm converges to the solution of the Chebyshev problem also says that the solution of the £1 problem is identically the same as the solution of a weighted LP problem for some choice of weight functions depending upon p. So far, however, no one has discovered an algorithm for computing those weights either directly or iter- atively even in the case when p = 2. The best known method for solving the Chebyshev problem is the exchange algorithm as can be found in Cheney [6]. It exploits the fact that the m xn Chebyshev problem given by the overdetermined linear system Ax = b has exactly the same solution as some particular (n-El) xn Chebyshev problem determined by Ax = b, where A is composed of n + 1 rows of A and S is the n + 1 vector of the corresponding entries from b. In fact, of all such (nuEl) xn problems, the one yielding the solution of the original problem is that one for which the error ufiJEIRAx - b“ is the largest. xEfi! Although most of their work deals with the LP problem for l < p < a, Duris and Sreedharan [10] have also presented algorithms for solving the Chebyshev and ‘1 prob— lems that use only the solutions of least squares problems. Much more can usually be said about the solutions of the LP problem when l < p < o since in that case the norm is both strictly convex and smooth, i.e. the boundary of the unit ball contains no line segment and at each point on the boundary of the unit ball there is a unique hyperplane sup- porting the unit ball. Also, the duality of the spaces L 1 l . nd where —-+ -= 1 can be ex loited. Sreedharan [19,20,21], Duris [9], and Anton and Duris P [3] have develOped several algorithms for solving the approx- imation problem for an arbitrary smooth strictly convex norm and in particular for the LP norm when l < p < a. In each case, the solution of a dual problem is obtained iteratively by carrying out an orthogonal projection and solving a single nonlinear equation in one real variable at each iteration. Numerical results indicate that the algorithms converge slowly when p is very near 1 or very large, but work satisfactorily in all other cases. In Chapter 2 of this thesis, another algorithm for solving the given approximation problem with a smooth strictly convex norm is presented. The algorithm uses ideas developed by Sreedharan in [20,21]. Numerical results on LP problems are presented in Chapter 4. Rice and Usow [18] have extended the previously men— tioned Lawson algorithm to include LP problems for 2.g p < a and they have developed a method for accelerating the conver- gence of the algorithms. However, they observed that for p large the convergence is still quite slow. Moreover, the algorithm is not applicable at all for l < p < 2. Another algorithm applicable for 2.5 p < a but not for l < p < 2 has been presented by Fletcher, Grant, and Hebden [11]. Although primarily designed for Lp approxi- mation problems, i.e. continuous rather than discrete approx- imation, the modification is immediate. For p12 3, second order convergence is proven, but numerical results are few and inconclusive regarding this algorithm. A final method of solving the LP problem is by min- imizing the differentiable function of n real variables f(x) = Hb-—Ax"p (1.3) It is sufficient to find a zero of the gradient vf. Newton's method and the method of steepest descent are applicable to speed convergence to a root of vf if p is large enough to guarantee enough derivatives of f. Unfortunately, p'Z 3 is required so again the case 1 < p < 2 is left untreated. In summary, the situation is roughly as follows: p = 1: Only a very few algorithms are available. 1 < p < 14-2, e small: Almost no effective method is known for treating these LP problems. p near 2, i.e. not too close to 1 nor very large: There are many algorithms available - few difficulties. p very large: There are a couple of good algorithms avail- able to choose from, but not very many. p = a (Chebyshev problem): There are several efficient algo— rithms to solve this problem. With a slight modification in the definition of "'"p’ the LP problem for O < p < 1 also makes mathematical sense and we are consequently led to investigate solutions of that problem also. With the exception of Hoel [13] and Motzkin and Walsh [15,16,17], no one has considered this question. When we choose 0 < p < 1, "°"p suitably defined turns out to be a p—homogeneous metric but not a norm, and the unit ball is not convex. In Chapter 3 we consider the LP problem for O < p < l establishing theoretical and compu- tational methods for finding a solution. In Chapter 4 some numerical results are presented. CHAPTER II APPROXIMATION WITH A SMOOTH STRICTLY CONVEX NORM In this chapter, we shall study the system of linear equations Ax = b where A is an m xn matrix, m > n, x is a n-vector, and b is an m-vector. We assume all numbers are real. Let "-R be a smooth strictly convex norm on 35‘. The problem that we shall be concerned with, referred to as problem (P), is (P): Find x 6 EMT minimizing [Rb-Ax" [x 6 n3‘}. In [21], a dual problem (P*) was considered, the correspondence between problems (P) and (P*) established, and two algorithms for constructing the solution of problem (P*) were presented. Here we review problem (P*), the previous results concerning that problem and (P), and then present a new algorithm for solving problem (P). Before actually beginning this development, we set down some assumptions, definitions, and notation that will remain standard throughout this chapter. m (.‘.) denotes the usual inner product on H! , i.e. m (x y) = Z x.y. ‘ j=1 J 3 We denote the transpose of a matrix A by AT. (v) means the linear span of the vector v. K = Image (A) = [Ax[x 5 En] l T m K = Ker A = [x E n: ](k]x) = O Vk E K] E:]R“1 4 Kl is the orthogonal projection of 35‘ onto K‘, where orthogonal means with respect to the inner product (']°) given above. 3 = Eb p = inf{||b-k” [k E K] We assume that p > 0 since the problem is trivial otherwise. Recall that a norm "‘" is strictly convex if and only if 1 . . "x" = "y” = Eflx-Ey“ implies that x = y, and "-H is smooth if and only if through each point of unit norm there passes a unique hyperplane supporting the closed unit ball B= [x 6mm]"x" 51}. Given v #'0, we define the vector v* 6 H5‘ by (v*]v) = ”v” and max[(u]v*) [Hun g l} = l. 10 By Lemma 2.1 of [21], the correspondence v H‘V* is a con- tinuous function from H§T\~[O} into itself. Moreover, in the special case when "'H is the LP norm with l < p < a, v* takes on the particularly simple form v.]P-l v; = ‘43—]? sgn V3 anp In [21], Sreedharan introduced the following dual problem (P*) (P*): Find 2 EK@(s), ”z” = 1 maximizing (s‘w) over all w 6 K®(s), "w" = 1. 2.1 Lemma. (i) There exists a unique k e K minimizing Hw-yHIYGIQ. (LlJ) (ii) Problem (P*) has a unique solution. Proof: (i) Let x E K and set 5 = Hb-xH. S = [y 6 K THb"Y".S 5] is a compact set, and f:K 4 n: by f(z) = Hb-—z” is continuous on S, so f must achieve its minimum value on S. Suppose x,y 6 S such that f(x) = ub-xn min[|]b-z|] [z 6 S] = Hb-yl] = f(y). (2.1.2) §sz- (x+y)u -§-II(b-x) + (Io-y)" <%um-xn+Im—yw =Nb-xu=Im—yu 11 By the strict convexity of H-U, b-x = b-y, i.e. x = y. Thus there is a unique k E K minimizing (2.1.1). (ii) It is clear that problem (P*) has a solution. Sup— pose y,z both solve problem (P*) and y #'2. Since ”-u is strictly convex, "y" = ”z" = 1, and y #Tz, IIY+2II < 2. (2.1.3) Since (91y) = (s|z) > o, y+z 510. Let w = ”3’12” Then w E K@(s), “w" = 1, and _ (Sly) + (812) = 2 s (s‘w) — THY +2". #120” > (s‘y) by (2.1.3). But this contradicts the assumption that y solves problem (P*). Thus the solution of problem (P*) must be unique. The key results of [21] used to solve problems (P) and (P*) are 2.2 Theorem. If 2 EK@(s) with ”z" 1 and z* eKJ’, then b - (b]z*)z E K and p (b]z*). s s) 2.3 Corollary. Let 2 be as above. Then (b]z*) = (s z 2.4 Theorem. Let 2 solve problem (P*). Then 88 82. b _ s s) (s z 2 6 K and p = ( 2.7 Lemma. Let z,W’E 351 be linearly independent. Then a 6 1R satisfies VI Us ch Ch 12 IIz-OMII g Nz-xwu vx 6 R if and only if ((z-aw)*]w) = 0. Next we give a new algorithm for solving problem (P). 2.2 Algorithm. Step 1. Set k = O and yk = "s" Step 2. Compute y; and vk = AA yk . Step 3. If vk = 0, go to step 6; otherwise continue. Step 4. Choose Gk > 0 such that ((yk-akvk)*[vk)==0. Step 5. Set yk+l = 72R where z = yk - akvk. Increment k by 1 and return to step 2. Step 6. Using yk, the solution of problem (P*), solve problem (P). 2.3 Theorem. Either algorithm 2.2 solves problem (P) in a finite number of steps or the yk converge to the solution of problem (P*). Before proving theorem 2.3, we make a couple of obser— vations about algorithm 2.2 and establish a few facts to be used in the proof. In step 1, we could have selected yO to be any element of K6 (3) of norm one with (yo|s) > 0. Our choice is just a very convenient one. In step 3, one might choose to stop the computations if ”Vk” or ”akvk” is 13 small. The reasons for these stop rules will be apparent in the course of the proof of the convergence of the algo— rithm. With respect to step 6, Theorem 2.4 of [21] says that Ax = b — Y‘all-[812)— yk = b - pyk has a solution, say 3?. Since “b-A§H = p, i solves problem (P). We next prove a couple of propositions essentially saying that algorithm 2.2 makes sense. In particular, vk = 0 implies that yk solves problem (P*), and the choice of Gk specified in step 4 is always possible. 2.4 Lemma. Let vk #'O. (i) There exists a 3 0 such that Hyk - GNkH < 1. (ii) If Yk’vk are linearly independent, then for every 0 > O sudh that ”yk - Gvk" < l, we have 0 < Hyk - avkn. (iii) If yk,vk are linearly dependent, then there is a unique a > 0 such that “yk - avk“ = 0. (iv) There is a unique 6 6 3% such that VA 6 n1 ”yk - Bbku g “yk - kau , and B > o. (v) ((yk - fivk)*|vk) = 0. Proof: By the definition of y; , max[(u]yl:)| Ru“ 5 1} = 1. My]. - wk" = 1ka - 0.ka maXIIUIYPI Hull 5 11 14 -av Yk k -owk[{TY1:‘ 2 lek' WkTHTTYk = (Yk’WRTY;) = (ykwp - aIy;(vk) “Ykn - 0(y;]vk) _ _ * - l a(ykIvk). (2.4.1) Observe that * _ * T* T1: T*_ T*2 (yk|vk) - (yk]AA yk) = (A yk]A yk) - ”A yknz . (2.4.2) Together with the definition vk = AATy;, ‘we have that . . T11- vk = 0 If and only If A yk = 0. (2.4.3) By assumption, vk #TO so from (2.4.1) IlYk-Wk" > 1 for all a < 0. (2.4.4) Suppose that Yk’vk are linearly dependent. Then there is a B 6 IR such that nyk-kau = o. By (2.4.4), I3 > o. More specifically, fl = proving (iii). Suppose Yk’vk l vkfl are linearly independent. If there were no a > 0 such that ”yk-avk“ < 1, then Ika-avk“ 2 1 = ”ka for all a e 13. (2.4.5) By Lemma 2.7 of [21], (2.4.5) implies that (yElvk) = O, which by (2.4.2) and (2.4.3) means that vk = O, contradicting the ‘hypothesis that vk i'O. Consequently, there must be an <1>C> 15 such that "yk-avku < l establishing (i) and (ii). Consider the strictly convex function f:]R 4 ]R by f(x) = [Tyk—kau . f(0) = 1, f(x) 4 a as A 4 a, and by (i) there exists an a > 0 such that f(a) < 1. By the continuity and strict convexity of f, there is a unique 1 > 0 such that f(i)==l, and by the strict convexity of f, there is a unique 5, O < B <'X, minimizing f. (iv) is proven. Finally, from (iv) and lemma 2.7 of [21], it follows that for the 5 found in (iv). This completes the proof of Lemma 2.4. We summarize the important points of Lemma 2.4 in 2.5 Proposition. Assume that yk and VR are linearly in- dependent. Then there exist a unique 0 > 0 such that _. * == ((Yk avk) [vk) 0. (2.5.1) Moreover, with this choice of a, 0 < ||yk~akaI < 1. (2.5.2) As an immediate consequence of proposition 2.5, step 4 of algorithm 2.2 is guaranteed to make sense. 2.6 Proposition. For any k 2 O, yk E Ke(s), "ka = l, and ‘YkT3) > 0. 16 Proof: The assertion is true for k = O by construction. Assume that the pr0position holds for k = O,1,...,n. We shall now verify its validity for k = ni-l. Y = = n = . If vn 0, then yn+1 T§;T' yn so the assertion is true for k = nntl. Assume that vn {’0. Since vk AATy]:, vk e K vk. s e K“, yn e K@(s) with (ynTS) ,\ o, and vn E K implies that yn and vn are linearly inde— pendent. So by prOposition 2.5, o < Hyn-anvn". (2.6.1) yn nn Since yn E K®(s) and vn 6 K, yn+1 = TTYn-anvn" E K@(s), and "yn+1" = 1. Finally, ynnanvn (Yn+1‘8) a (TTYn- anVnTT TS) (yn]S)-Gh(vn|s) TTyn - anvn “ (yn|s) a "Yn"ahvn" (2.6.2) > O by the induction hypothesis. This completes the proof of proposition 2.6. We next show that steps 3 and 6 of the algorithm do not lead us astray by proving 2.7 Proposition. If vk = 0, then yk solves problem (P*). Proof: By (2.4.3), vk = 0 implies that y; 6 Ker AT = K‘. Also, by Lemma 2.6, yk 6 K@ (s) and "Yk" = 1. Applying theorem 2.2 of [21], we have 17 b - (b(y;)yk e K and p = (b|y;). (2.7.1) (2.7.1) together with corollary 2.3 of [21] and lemma 2.1(i) yields that s s) b-Tss—‘Jy—kTYk (2.7.2) is the unique point in K closest to b. Let 2 solve problem (P*). Then by Theorem 2.4 of [21], s s) b- (S 2) 2 (2.7.3) is also the unique point in K closest to b. Equating (2.7.2) and (2.7.3), we have 2 = ——ZE—7-. (2 7 4) (3 2) (STYk . . Taking the norm of both sides of (2.7.4) and using the facts that ”z" = “yku = l and (yk]s), (st) > O, we have (s‘z) = (STYk)° (2.7.5) Lemma 2.1(ii) now says that yk = z, i.e. yk solves problem (P*). We now prove Theorem 2.3, i.e. that algorithm 2.2 solves problem (P). Since the proof is somewhat lengthly, we first give an outline of the steps to be taken. We show that (i) Vk 2_O O < pk < pk+1, where pk = (ykls), 18 (ii) 1im Hyk-Gkvku = l, k4ao (iii) 1im vk = O, k4a (iv) any limit point of [yk]k'2 O] solves problem (P*), and (v) 1im yk = y which solves problem (P*). k4a Because of PrOposition 2.7, we can assume that vk #‘o Vk.2 O. 2.8 Proof of Theorem 2.3. Let pk = (YkTS). (2.8.1) (i) Claim: 0 < pk < pk+1 Vk‘z o. _ ._§_ pk+1 = (Yk+lTS) - (yk‘S) b (2 6 2) - TTYk'akvaT y ° ° ’ > (yk]s) by (2.6.1). Thus 0 < pk < pk+1 Vk‘z 0. (ii) Claim: 1im "yk-akvk” = l. k4¢ From (i), O < pk < pk+1 _<_ max[(w]s)]w E K@(s),"w" = l} l9 and from (2.6.2), p k 3 Wk" “kvaT' pk+1 pk Thus lim “y - v = lim = 1. k... k a" k” 11.... pk+1 (iii) Claim: 1im vk = O. R 4a: Suppose the claim were false. Then there exists 5 > 0 such that, by taking a subsequence if necessary, Hvk“ 2.6 for all k.2 0. (2.8.2) 1 > TTYk‘akaTT 2 ”(1)1ka ' TTYkTT = aknvk" ' 1 20145—1 from which one obtains 0 <0:k <% . (2.8.3) Again by taking subsequences if necessary, we can assume that lim = a and lim y = y. (2.8.4) k4o ck k4o k By Lemma 2.1 of [21], the mapping x H x* is continuous on IRm\{O] , so 1im v = lim AATy* = AATy* a v. k k k4a k4¢ "v” 2 5 by (2.8.2). 20 Also, by the continuity of x 4 x* and “-H, and since ((yk-okkaIVk) = o ‘v’k 2 0, we have that ((Y-GV)*|V) = 0, (2.8.5) h-wu=1=hN (4&0 Either 0.: O or a > 0. We show that each of these possi- bilities leads to a contradiction forcing us to conclude that lim vk = O. k4a If a = 0, then (2.8.5) becomes (y*]v) = 0. Together with (2.4.2) and (2.4.3), this means that v = O, contradicting our assumption that v {'0. Suppose a > 0. First note that y,v are linearly independent since (v|s) a O and (y|s) = lim (yk]s) = lim pk > 0. By Lemma 2.7 of [21], (2.8.5) implies k4m k4a that IIY'O‘VH S IIy-xvll for all ). 6 1R. (2.8.7) But (2.8.6) and the strict convexity of "°”’ forces IIy-gvu < 1 which contradicts (2.8.7). Hence we must conclude that lim vk = O. k4¢ (iv) Claim: any limit point of [yk]k.2 O} solves problem (P*). 21 By taking a subsequence and reindexing if necessary, we can assume that lim yk = y. From (iii) and the continuity of k4a the mapping x 14 x* , we have that v = lim vk = 0, (2.8.8) k4a which by (2.4.3) implies that y* e Ker AT = K‘. Also, by proposition 2.6, y E K@(s) and "y“ = 1. (2.8.9) Applying Theorem 2.2 and Corollary 2.3 of [21] and Lemma 2.1(i), b _ s s (s y)y is the unique point in K closest to b. Suppose z solves problem (P*). Then by theorem 2.4 of [21], s s _ s s . b - (s z) z b (s y) y , i.e z = 4— (s 2) my) ' (“M") Taking the norm of both sides of (2.8.10) and using "z" = "y“ = l, (s]z) > 0,, (st) = lim (s‘yk) = lim pk > 0, it k4a R40 follows that (st) = (sTy). Finally, Lemma 2.1(ii) says that z = y, i.e. y solves problem (P*). (v) Claim: 1im yk = y, the unique solution of problem (P*). k4~ 22 By the uniqueness of the solution of problem (P*) any convergent subsequence of {Yka 2:0] converges to the unique solution of problem (P*). Due to the compactness of B= [x€K®(s) ]|]x]]=l],‘ 1im yk=y. k4~ This completes the proof of Theorem 2.3 establishing that algorithm 2.2 is guaranteed to find the solution of problem (P). CHAPTER III THE LP PROBLEM, o < p < 1 We shall be considering the system of linear equations Ax = b where A is an m xn matrix, m > n, b is an m-vector, and x is an n—vector. All numbers are assumed to be real. For yEIRm and o
0, since other- wise the problem is trivial. Finally, let (v) denote the linear span of the vector v. Observe that s-b 6 K since 0 = s-Eb = Es-Eb = E(s-b). As a result, d(s,K) = inf[IIs—k||p]k 6K] = inf[”b-k+ (s-b)TIp|k ex] = inf[]|b-k||PTk 6K] = d(b,K). The existence of a solution of problem (P) follows immediately from the continuity of the LP norm and the finite dimensionality of the subspace K. Given problem (P), we associate a dual problem (P*): Find 2 €K®(s), ]|z||p= l maximizing (sTw) over all w E K€9(s), "w”p = l. The relation between problem (P) and problem (P*) is given in the following theorem which extends Theorem 2.4 of [21]. 3.1 Theorem. Let 2 solve problem (P*). Then (1) pl/sz) = (s|s)1 (3.1.1) and (ii) b - pl/Pz e K. (3.1.2) Proof: (i) (st) max[(s|w) [w E K®(s), ”wup = l} max[(s]k+Bs)]k E K, 5 6 1R1TTk+BSITP = 1} 26 = (s‘s)max[B E 1R| [lk+BsHp = 1, k E K} = (sls)max{B 6 1R {0} 1 Hk+snp = E—lp, k ex} _ 1 _ (s|s)max{“k+s“l7p Dc 6 K} ' P l (s s)min{||k+ suglppc 6K} = (s‘s)—7—1 . pl p Thus pl/p(s‘z) = (s|s) which is (3.1.1). (ii) Let t E K‘. We shall show that (tlb-pl/pz) = 0. Suppose (t‘s) = 0. Since 2 e KGD(s) and t e KJL n (s)*, (t‘z) = 0. Also (t‘b) = (Et‘b) = (t‘Eb) = (t|s) = 0. Thus (t‘b-—p1/pz) = o if (t|s) = 0. (3.1.3) Next suppose that t = 3. Since E is the orthogonal pro- jection of mm onto K" and s = Eb, we have that (s‘b) = (s‘s). (3.1.4) (t|b—p1/pz) = (slb-pl/pz) , (S‘b) - pl/P(s‘z), (s‘s)-(s‘s), by (3.1.4) and (3.1.1), 80 (t‘b-pl/pz) = o if t e (s). (3.1.5) 27 Let F be the orthogonal projection of If“ onto (3), and let t e K‘L. Write t = t1+t2, where t1 =- t-Ft and t2= Ft. Clearly, (t1)s)==0 and t2 6 (s), so by (3.1.3) and (3.1.5), (t‘b-—p1/bz) = 0. Since t E K‘ was arbitrary, b-pl/bz E K. This completes the proof of the theorem 3.1. Observe that ((b- (b- pl/Pz) up = "pl/p2.“ = pnznp = (3. /pz so b--p1 is a point of K of minimal distance from b. In the statement and proof of Theorem 3.1, we used only the p—homogeneity of the LP norm on min Figure 1 on the following page gives a geometric inter- pretation of theorem 3.1. Given K and b as in the theorem, we are to find b’ 6 K which minimizes {"b-k"p‘k E K}. Since the Lp-norm is translation invariant, this is equivalent to finding 3’ e K minimizing [Hs-—k”p‘k 6 K}. Let p = d(b,K) and let 2 solve (P*) as indicated in the figure. By expanding the unit ball by a factor of p, which by the p—homogeneity of ”-Hp means multiplying the unit ball by l/h, multiplying by -d” and translating it by 3, one can p easily see that z is taken by the above operations to the point 3’, i.e. s’= s-pl/pz. Thus b’= b-pl/Pz. Figure 1 also indicates in which direction one should look for a characterization of the solutions of problem (P*). By the symmetry and concavity of "-"p, a solution of problem (P*) should lie at a ”corner“ or on an “edge" of the unit ball, and these correspond to points at which a certain number of coordinates of the given point are zero. Before making 28 (s) .b b' l/ \ x \ / \ ’/ \‘ \ __(\ ,>' / \\ z” (1,0) \\ I / \ S / I \ / \ / \ l \ / \\ / I \ I \ l \ I \I T Figure l. Geometric interpretation of theorem 3.1. these intuitive ideas precise, we introduce some terminology and establish a couple of lemmas. Let X be a real linear metric space, i.e. a real vector space on which a translation invariant metric is de- fined so that the metric space structure is compatible with the linear space structure. Denote by X* the algebraic dual of X, i.e. the space of all linear functions on X. 29 3.2 Definition. Let X be a real linear metric space, AcX, a E A, and H a non-trivial homogeneous hyperplane in X, i.e. H {x E X‘f(x) = 0}, where f e X* \{O}. We say that H+a= [h+a)h 6H} supports A at a if either f(x) 2f(a) vx EA or f(x) _<_f(a) Vx EA. (3.2.1) 3.3. Lemma. Let X be a real linear metric space, f €X* \{O}, A = f-l(0), z E X\A, Z a subspace of X with z E Z and dim Z > O, and A1 = A n Z. Then A1 is a hyperplane in Z. Proof: A1 = A n Z = {x 6 Z|f(x) = 0} being the kernel of a linear functional can fail to be a hyperplane in Z only if either dim Z = O, which is not so by hypothesis, or if f :0 on Z. The latter is also impossible since 2 e Z‘\A by hypothesis so f(z) a! 0. Thus Al is a hyperplane in Z. 3.4 Lemma. Let K be a subspace of mm, f a linear func- tional on K with f ,£ 0 on K, H = f-1(O) = {x €K|f(x) =0}, and B = {x e K ( "pr g 1}. Let 2 e K satisfy (i) “znp = 1, (ii) 2i 7! O, i = 1,2, ...,m, (iii) H+z supports B at 2. Then dimK=1. Proof: dim K 2 1 since 2 6 K\{O}. Suppose dim K > 1. First we show that 2 £ H. If 2 E H, then f(z) = 0. Since dim H = dim K- 1 > 0, there exists x E K\H with "xup g l, i.e. x E B and f(x) > 0. Hence f(x) > f(z) > f(-x). Since -x E B this contradicts the hypothesis that H+z supports B at 2. Thus z£H. Choose x €H\{O} and define 3O 1 if xi = O 6- = , i = 1,...,m (3.4.1) 2. 1 . ‘fi‘ 1f Xi # O l . . 6 = 5-m1n{bi‘1 = 1,...,m}. (3.4.2) Then \zi + exi‘ > O for i = 1,...,m and Va 6 ('6,6). Let 9(a) = Hz + exnp, -o < e < a. (3.4.3) c129 m 2 p—2 2 = p(p—1) Z7 Xj‘z- + eX-| < 0 (3.4.4) dc j=1 J 3 since not all of the x. = O and O < p < 1. This shows that 9 cannot have a local minimum for e = 0, i.e. "z + (‘5’pr < 1 for o < (3'( < 5. (3.4.5) Choose any such 3. Then there must exist 6 > 0 such that Haz+3xnp_<_1 va 6 (1-e,1+e). (3.4.6) Let u = (l - §)z + 3k and v = (l + §)z = 3x. By construc- tion, u,v E B and by the linearity of f f(u) = (1 - -2§)£(z) and f(v) = (1 + §)f(z). (3.4.7) Since we have already verified that f(z) ¥'O, either f(u) < f(z) < f(V) or flu) > f(z) > f(V). 31 In either case, we have a contradiction of our hypothesis that H + 2 supports B at 2. Thus the assumption that dim K > 1 must be wrong. In other words, dim K = 1 which is what we wanted to prove. We are now ready to establish a theorem which guar— antees that we need only check a finite number of points in order to find a solution of problem (P*). 3.5 Theorem. Let K be an n dimensional subspace of nf‘, n < m, with basis bl""’bn' Let D be the matrix (bl""’bn) = (rl,...,rm)T, where ri is the i-Eh row of the mxn matrix D. Let B = {x E K} "xllpg 1}. Suppose that f is a non-zero linear functional on K such that H + z sup- ports B at z, where Hznp = 1 and H = {x e K|f(x) = O}. Denote by I the index set of i's such that 21 #'O, and J = {1,2,...,m}‘~I. Let A be the matrix with rows rj, j EJ, and set N = {x E nf‘le 0}. Then dim N = 1. Proof: First of all, dim N #'0. To see this note that there exists 5 6 13‘ such that D5 = z. By the definition of the set J, it follows that AB 0, i.e. B e N. Now B #'0 since 2 #'0. Hence dim N = Q.2 1. Without loss of generality, let I = {1,...,p} and J = {p4-l,...,m}, and put K* = [Dx‘x E N}. Denote by f* the restriction of f to K*, and let H* = H n K* and B* = B n K*. Clearly, z E B*. We claim that dim K* = q. To see this, observe that the rank of the matrix D is n, and hence by the rank-nullity 32 theorem of linear algebra, Dx = 0 implies that x = 0. Hence dim K* = dim N = q. By Lemma 3.3, H* is a hyperplane in K*. Finally, H* + 2 supports B* at 2 since f* = f|K*. At this point ‘we pause to observe that each k e K* satisfies k“,+1 = --°‘ km = 0. To see this, recall that k 6 K* if and only if there exists an x E N such that k = Dx. But x E N implies that Ax = O, i.e. (Dx)j = 0 for j 6 J = {p4-l,...,m}. We now in essence dr0p the m-u trailing zero coor- dinates and consider our problem in 36‘. To make this pre- cise, let x2 u [x E 33*}(x1,...,xu,0,...,0) e K* ’ .K 4 fit by f(x) = f*(x W1 1,...,XH,O,...,O ), fi= {x eiZ'I'Euc) = 0}, ml H " H {X E K|Ihq|p g 1}, where now’ “X"p = jifi‘xj‘p’ z = (21,...,zu). Notice that E 'has no coordinates equal to zero, and E [zjlp = 1. Also, H is a hyperplane in K and H + ; gupports B’ at 2. Finally, dim K'= dim K* = q. But by lemma 3.4, q = 1, which is precisely what we wanted to prove, namely dim N = l. 3.6 Definition. With an eye toward the future and a certain dislike of repetition, we define the phrase “the usual n dimensional situation" to be the following: 33 K is an n dimensional subspace of mg“ with basis b1,...,bn: n < m: s E K*~\{0} where K‘ = {x e mm\(x|k) = o Vk e K}; D = (b1,...,bn,s) = (r1,...,rm)T where ri is the 1&2. row of the m x(n+l) matrix D: B = {x 6 K@(s)| "xnp g l}; o
0)
coincides with F(x) in at least n + 1 points
of E."
Condition A says that the m x(n-+l) matrix (Yj(xi))
has rank n+-l.
In the same paper, Motzkin and Walsh observe that
”Theorem 6 implies that every extremal polynomial P(x) is
found by interpolation to F(x) in n4—1 points of E:
there exist but a finite number of polynomials interpolating
to F(x) in n-tl points of E, so every extremal polynomial
can be found merely by comparing their measures of approxi-
mation.”
Without making a further assumption about the matrix
Y = (Yj(xi)), namely that it satisfies the Haar Condition,
there need not be only a finite number of polynomials inter—
polating F(x) at ni-l points of E. The suggested pro—
cedure for finding a polynomial of best approximation,
consequently, need not be finite. Applying Corollary 3.10,
we can prove that the observation of Motzkin and Walsh is
correct if Y satisfies the Haar Condition. When the Haar
Condition is violated, however, we can easily construct, even
3
in HR , counterexamples to the assertion of Motzkin and Walsh.
Suppose m = 3, n = l, F(xl) = F(x2) = 0, F(x3) = Y1(x1)
Yl(x2) = Y2(xl) = Y2(x2) = Y2(x3) = l, and Y1(x3) = -2. The
l 1
matrix Y = (Yj(xi)) = l 1 clearly has rank n-tl = 2,
-2 1
39
so Condition A is satisfied. For any a 6 EL
= - +
P(x) avl(x) 032(x)
satisfies P(xl) = P(x2) = 0, so that P(x) interpolates
F(x) in n-tl = 2 points of E. Clearly, there are an
infinite number of these interpolating polynomials showing
that the observation of Motzkin and Walsh is incorrect.
The solution of the given approximation problem is
indeed included among all those functions which interpolate
F(x) in at least n-tl points, but this collection of inter—
polating functions need not be finite as claimed by Motzkin
and Walsh. Of course, this counterexample is possible only
because Y does not satisfy the Haar Condition. By Theorems
3.8 and 3.13, we are guaranteed that the given approximation
problem can always be solved in a finite number of steps.
Before proceeding any further toward a solution of
problem (P*), we pause to consider some of the difficulties
that lie ahead. Essentially we want to find a point on an
n4-1 dimensional cross section of the LP unit ball in n5‘
at which a translation of a fixed subspace is tangent to the
ball. So far we have reduced the problem to one of consider—
ing only a finite number of points. As one can see in the
following figures, these points correspond closely to corners
of a polyhedron, hence the name corner points. In the figures,
the corner points have been connected by straight lines rather
than by the curved arcs that one would obtain when the 2
unit ball is intersected with the specified plane K.
Iljllllwvllc'llll‘l I‘ll!
Figure 2(a). m = 3
p = 1.0, .9, .8, ..., .2
0 10
K = span -1 , l
10
Figure 2(b). m = 5, p = 25
FF-l) [301‘
0 1 l
K= span! 1 , 2 F
2 3
K LBOJ LOJJ
Figure 2(c). m = 5, p - 2
r (so) (07‘
l -5
K = spané 5 , -l P
S -l
K LOJ LSOJ/
41
Our approach to the problem up to now has been sim-
ilar to the idea behind linear programming. In linear pro-
gramming problems, a linear functional defined on a finite
dimensional space is maximized subject to certain linear
constraints on the same space. In this case, one also pro—
ceeds by reducing the number of points at which the solution
may occur from an infinite number to a finite one. In linear
programming, however, the points one is left with actually
are corners of a convex polyhedron. To solve the linear
programming problem, one moves from one corner to an adjacent
one always increasing (at least not decreasing) the value of
the objective function, until a corner point is reached where
all adjacent corners give no higher values for the objective
function. By the convexity of the polyhedron, one then con-
cludes that such a point is in fact a solution of the linear
programming problem. The advantage of such an algorithm is
that one may not have to check all of the corner points all
of the time. In practice, the number of points actually com-
puted and checked is usually far less than the total number
of corners, though there are examples where the usual simplex
algorithm would end up checking all of the corner points.
If possible, we would like to develop a similar
exchange type algorithm to obtain a solution of problem (P*).
There is no difficulty in moving from one corner to another con—
tinually increasing the value of the objective function until
a point is reached all of whose neighbors yield no higher
value of the objective function. The crucial step at which
42
we encounter trouble is trying to conclude that such a point
solves problem (P*). Since the 2p unit ball is not con-
vex, some of the corner points often lie inside the convex
hull of all of the corner points. In figure 2(a), as p
decreases the cross section of the unit ball changes from
convex to non convex. Figures 2(b) and 2(c) are further
examples of non convex cross sections of various 1 unit
balls. Moreover, it is quite possible that a local maximum
of the objective function may not be a global maximum. In
other words, we may not have solved problem (P*). Figure
2(c) shows a situation in which this may occur
Keeping this obstacle of non convexity clearly in
view, we begin to consider the practical task of actually
solving problem (P*). We now turn to the construction of
algorithms for solving problem (P*).
3.14 Remarks. Given any point z E Kea(s), there exists a
B G HJHJ' such that z = DB, where the definition of D
appears in 3.6. Also, since 3 E K*, (2‘s) = (DB‘s) =
Bn+l(s}s). Problem (P*) requires us to
maximize (w‘s) over all w 6 K® (s), ”WHP = l, i.e.
. . n+1 .
max1mize Bn+1(s|s) over all 5 6 fit , HDBHP==1, i.e.
maximize 5
n+1 over all B e 11?.“1 , HDBHP = 1.
Using this last formulation of problem (P*) and Theorem
3.8, we see that the solution of problem (P*) entails at
most the following:
43
(1) find all those 6 6 HJHJ' for which D5 is
a corner point;
(2) select from this collection the one with the
largest n+1St coordinate, say B. Then
(3) z = DB solves problem (P*).
If D satisfies the Haar Condition, then it is quite
1 such that D5 is a corner
easy to find all 5 6 fig“-
point. From Theorem 3.13 and Corollary 3.10, we know that
the solution of problem (P*) must be among the (1:11) dis-
tinct corner points each of which have n coordinates equal
to zero and m-n coordinates non-zero. Suppose we wish to
find the corner point z for which [i‘zi = 0} = {i1,...,in}.
Select any in+1 e {l,...,m}~.[i1,...,in}, and with A =
T _ _ T
(ri ,...,ri ) , 801ve AX - en+1 Where en+1 — (O,ooo,0,l)
1 n+1
an+1. The Haar Condition insures that x exists and is
. _ x _
unique. Set 5 - “DxfiT7E . Then 2 — DB has exactly n
P
coordinates equal to zero, viz. zil = --- = zin = 0, and
”zup = 1. In this manner, all of the corner points can be
found.
The above algorithm lacks three important features:
(1) a method for determining if the Haar Condition is satis-
fied: (2) an orderly way of selecting the (2:) points to
be computed; and (3) the ability, at least theoretically, to
ignore some of the corner points some of the time.
The first of these is lacking for the very good reason
that checking for the Haar Condition involves more work than
44
solving the original problem itself. The second shortcoming
will be corrected shortly, and the third problem will be
considered afterwards.
Since the set of points in KGD(s) of unit norm and
having exactly n coordinates equal to zero is identically
the same as the set of all corner points when the Haar
Condition holds, having that condition satisfied seems like
an ideal setting in which to work. The tremendous problem
of verifying that the Haar Condition holds diminishes that
ideal somewhat. An even more damaging blow is leveled by
Lemma 3.12 which says that the greatest number of possible
corner points occurs only when the Haar Condition holds. In
other words, that situation requires the most work to find
all of the corner points since there are more corners to find.
Our algorithms for finding all of the corner points will work
whether the Haar Condition holds or not. In fact, it is
actually faster without that condition present.
3.15 Definition. Let U = {u e N“ l1 _<_ u1 < (12 < < nn 5
m} and T = [1,...,(:)}. Define Y:T -o U by the following
rules. Given t E T,
(1) Set t =t,u
O = O, and i = 1.
O
(2) Find ui E N such that ui-l < ui _<_ m—n+1
ui-l -u
_ m-j i
and 1 5- ti-l ._ 2 (n-i) S (m-i) °
j-l+u.
l-l
ui-l
(3) Set ti = ti-l - Z) (::2 and increment
j=l+u.
45
(4) Repeat steps 2 and 3 until un has been found.
Then Y(t) = u E U where the components u1,...,un of u
v
were found above. We adopt the convention that Z)(-) = 0
. j=u
if u > v.
We must verify that Definition 3.15 makes sense,
i.e. that u can be found as claimed. We begin this task
by proving the following lemma.
3.16 Lemma. Let m,n,i,k E 11, i'g n < m, k.g m-+i-n-l.
m—n+i (m-j m-k
Then Z) . = .
j=l+k n-i) n-1+l
proof: Since (3) = (ati)+(p;1) for all rm; 6 H, p > q,
(nfitl)
(m;E;1) + EIEIi) ’
(”Ekll) [(mfiklz) ($2311)]
= m;511> + (mgfiz) + ----+<“;f:1) + (3::ii
(
= (“3511) + mgfzz) + ----+(“;f:1) + :zi) ,
m— n+i
= j= =l+k f(z) for all x e
B~\[z}, then Lemma 3.4 is true for p = 1. In the proof,
68
choose x, 5, E, and e as before, but obtain a contradic-
tion either to the hypothesis that H + 2 supports B at
z or to the strictness of the support. Theorem 3.5 then
follows immediately for p = 1 if we again assume that
H + z prOperly supports B at 2. Theorem 3.5 and all
three of its corollaries, Lemma 3.12 and Theorem 3.13 all
hold as previously stated for p = 1.
Algorithms 3.27 and 3.28 both solve problem (P*)
when p = 1 since the £1 unit ball being convex eliminates
the possibility of finding a local solution that is not a
global solution of problem (P*). Consequently, in the
£1 case, algorithm 3.20 is unnecessary since all three al-
gorithms find the same solution while algorithm 3.20 requires
much more computational effort to do it.
CHA PI‘ER IV
NUMERICAL RESULTS
In this chapter we consider some of the computational
limitations placed upon the algorithms presented in chapters
II and III. Since the main application of algorithm 2.2 is
to the 2p spaces, 1 < p < a, we discuss the computational
difficulties encountered in that context, and then present
numerical results from two examples. Examples of LP approx—
imation, 0 < p,g l, conclude the Chapter.
As we noted in chapter II, in the case of the LP
spaces, 1 < p < a, if y i 0, then
* IYjIP-l
yj = 1 sgny. .
HYIIP'
This particularly simple form for y* makes the evaluation
* . * . .
of yk in step 2 and (yk-akvk) in step 4 of algorithm
2.2 immediate.
The two main computational difficulties occurring
. . . . * _
in algorithm 2.2 are finding Gk such that ((yk-akvk) ‘vk)-0
and solving the original problem (P) once the solution of
problem (P*) has been found. Since values obtained on a
computer are seldom exactly correct, each of these difficulties
69
70
brings up a related one also. How close to Gk is close
enough, and how close to vk = 0 is close enough to call
yk the solution of problem (P*)? Since the questions re-
lated to knowing when problem (P*) has been solved are
the easier, we dispose of those first.
By Theorem 2.4 of [21], if y solves problem (P*),
then
(s y)y
is in the image of A, and x 6 fig! such that Ax = z solves
problem (P).
Let the rank of A be k. Form the m)(k matrix
M composed of the k linearly independent columns
ai ,...,ai of A. Then
1 k
w = (MTM)-1MTz
satisfies
Mw = 2.
Let x 6 n5‘ be given by
w. if i E {il,...,ik]
o if ig{il,...,ik] ,
and let y 6 B9‘ be any solution of
Ay = 0.
71
Then u = x4-y satisfies
Au = z,
and hence u solves problem (P). If the rank of A is
n, then the solution of problem (P) is unique, and in
general, the solutions of problem (P) are the translates
of an n-rank (A) dimensional subspace of 35‘.
The answer to the question of how small vk must
be to accept yk as the solution of problem (P*) is somewhat
dependent upon the matrix A. Since
= Yk ‘ 0‘ka
yk+1 “yk - 0'1.ka ’
1im y - v ) = l and
k“ H k “k k.) ’
”yk" = 1 for all kip 0,
‘(akvk)j| is almost exactly how much the jth coordinate
of yk is changing at each iteration. Consequently, in
our examples the algorithm terminated if
max[‘(akvk)j‘ )l'g j gim} < 10‘7,
and our computed solutions agreed with published solutions
for the same problems.
The related problems of how to compute and how accu—
rately to compute the 0k in step 4 pose the greatest problem
in algorithm 2.2. Where no confusion arises, we drOp the
subscript k since the iteration being considered is usually
I .III III" ‘1‘. (I'illlr II I
l ['1 all! 4' {ll-'1' IIII II. ...-III ‘4' I w
72
irrelevant. We assume throughout the discussion that y,v
are linearly independent since the problem is trivial other-
wise.
Let f(a) ((y-av)*)v)
l m p—l
"y-av”p-l Z)Vj‘y avj‘ sgn(yj avj),
P
and define
‘p-lsgn(yj — av.).
J
m
9(a) = jéfl Vj‘Yj"aNj
Clearly, f and 9 have the same roots so we consider only
the simpler function 9. By Proposition 2.5, g has a
unique root a and a > 0. Moreover, g(0) > 0 since
*
f(0) = (Y IV) > O
by (2.4.2) and (2.4.3). 9 is clearly continuous since
1 < p < a.
With this information, a number of techniques for
finding the root a of g are available. The following
four methods were used.
1. The method of bisection.
Choose a1 > 0 arbitrarily and compute g(a1). Since g(0)f>0
and the root 0 is unique,
if g(a1) > 0, set 02 = 201 ,
hflH
5...;
if 9(01) < 0, set a2 = <1
73
Of course, if g(a1) = 0, then we are finished. Repeat this
procedure until two numbers, say a+ and a7, 'have been
found such that
9(6‘) < o < g(a+).
Then let
If g(a) is positive (negative), replace 0+ (07)
and repeat this bisection of the interval [d+, d']
the root of g is obtained.
2. The secant method.
Begin as in the bisection method by finding d+, a?
that
9(a') < O < g(a+).
Instead of taking the midpoint of the interval [a+
find the point at which the line through the points
9(0*)) and (07,19(07)) crosses the x-axis, i.e.
wmw)-&QW)
a? — d'
a:
As above, if g(a) is positive (negative), replace
by a and repeat until the root of g is found.
by 0,
until
such
, d”),
(w
’
0+ (0")
74
3. The secant-bisection method.
After finding 6+, a" as described above, alternate one
iteration of the bisection method and one step of the secant
method. This is similar to Dekker's algorithm.
4. Newton's method.
Observe that g is a differentiable function of a except
at those aj such that
y. =a.v., j=l,...,.m.
After checking if any of these aj is the desired root, one
can apply Newton's method to find a such that 9(a) = 0.
Choose a1 > 0 arbitrarily, and compute
9(an)
ah+l = a.n —-§77agy , n = 1,2,...
It should be noted that some care must be taken with
the use of Newton's method. In our examples, it often failed
to locate a for want of a sufficiently good initial guess.
For p small, both the bisection method and the secant method
had trouble converging to the root in some instances. It
occasionally happened that one but not the other had difficul-
ties handling a specific situation. The mixed secant-bisection
method worked quite successfully in these cases enjoying the
benefits of each while avoiding many of their shortcomings.
For values of p roughly between 1.25 and 200, few
difficulties arise with any of these four methods for locating
a. For small or very large values of p, however, 9 can be-
come somewhat unruly.
75
While in theory algorithm 2.2 always solves problem
(P), one should expect a little less in practice. Recall
that
m
"\ p-l
a = V. o-av. s n o-aVo o
9() 33:1 31y] 3) 9 (Y3 J)
The cause of our difficulties is the exponent p-l. If
p is near 1, then
1
‘yj-av.‘p-
.-av. =- .—a.
J sgn(yJ J) sgn(yJ v3)
independent of the magnitude of ‘yj-avj‘. A number that
is supposed to be zero, but because of roundoff errors is
actually 10-15 on the computer, will be greater than .7
after being raised to the p-l power when p = 1.01, and
approximately .966 when p = 1.001. Similarly, when p is
very large,
0 if )yj-'avj' < 1
[.5
H
H
\Y-
J 3
if ‘yj-avj‘
a: if )Yj'avj‘ > 1
The point of these comments is that one should not expect
algorithm 2.2 to be computationally feasible throughout
1 < p < a.
With p near 1, the thirty-two figures of double
precision machine accuracy was sometimes not sufficient to
5
determine a such that (9(a)) < 10— The question of
how accurately one must know 0 becomes of interest at this
76
point. The question will be taken up‘when the actual examples
are discussed.
A second unpleasant feature of the function 9 must
also be considered when p is near 1. We know that yk
converges to some y and that vk converges to 0, but the
behavior of Gk is not known from the theory. Numerical
results indicate that the sequence {ak‘k-Z 0} has two limit
points with {a
2n}
that sometimes differ considerably. For p near 1, one of
and {02n+1} approaching two numbers
the two limits appears to be 0 making those akvk go to
0 rapidly while the other akvk approach 0 more slowly.
At the other end of the range 1 < p < a, our program
came to a halt not because the algorithm was sensitive to
large values of p, which it is, but because the computer
can not store exponents that are too large. More to the
point is that we could not compute the LP norm when p
grew too large.
Two examples were programmed in FORTRAN IV for a
CDC 6500. The first is taken from Barrodale and Young [4].
The linear system to be solved is
x = 1.52
x + y = 1.025
x + 2y = 0.475
x + 3y = 0.01
X + 4y = - .475
x + By = -l.005
128
64
32
16
1.5
1.499972
1.499944
1.499890
1.499894
1.500651
1.503757
1.514762
1.520005
1.520126
1.520215
1.520187
1.520037
1.5200001
1.52
77
—.499955
-.499909
-.499817
-.499671
-.499637
-.500057
—.502571
-.503800
-.5037866
-.503744
-.5036206
-.5033896
-.503333
-.503
Table 4.1
.025
.0251982
.0253977
.025801
.0266422
.0285313
.032874
.043216
.050790
.0532758
.056436
.060543
.0659792
.0701003
.073
Iterations
26
21
21
21
23
53
30
45
47
37
27
364
78
The best LP approximate solution was computed for
p = l.04,l.06,...,1.5,2,4,8,...,128.
The search for Gk was terminated when (9(a)) < 10-6 or
after 32 iterations of the mixed bisection-secant method
whidhever occurred first. All of the solutions together
were computed in less than forty seconds. Some of the solu—
tions are listed in Table 4.1.
The second example appears in Cheney [6, p.44]. The
overdetermined system of linear equations is
x + y = 3
x - y = l
x + 2y = 7
2y + 4y = 11.1
2x + y = 6.9
3x + y = 7.2
This system poses special difficulties because the solution
of the ‘1 problem is not unique. All points on the segment
joining
P1 = (1.77,1.89) and P2 = (2.516667,1.516667)
solve the £1 problem with a minimal ‘1 error vector of
length 4.7.
The LP problem was solved for
p = l.06,l.08,...,1.5,2,4,6,20,40,100,400.
Ia H H‘ F4 +4 (A H H‘ F‘ +4 be H
Ul
.42
.38
.34
.26
.22
.18
.14
.10
.06
2.0883483
2.0889511
2.0893571
2.0895666
2.0896032
2.0895121
2.0893464
2.0891500
2.088926
2.0884417
2.0880841
2.0872254
Table 4.2
79
1.7400827
1.7365094
1.7337896
1.7319498
1.7309038
1.7304580
1.7303686
1.7304290
1.7305371
1.7307791
1.7309614
1.7313904
Iterations
15
16
16
14
10
34
22
11
13
13
23
80
The computed results for pip 1.5 agree with those published
by Cheney [6] and Duris [9]. For 1.06 g p < 1.5, no pub—
lished figures are available, but as p decreases (see Table
4.2), the solution of the LP problems approximately equals
a solution of the £1 problem.
For 1 < p < 1.06, algorithm 2.2 consistently obtained
solutions about
Q = (2.04615,l.75193),
which is quite far from the computed limit [1] of the solutions
of the LP problems as p 4 l, (2.0883,l.7309). It should
be noted, however, that Q is an ‘1 solution of the problem
since
Q = 1P1 + (1-x)P , 'where l = .369839.
The problem of determining how accurately one must
know a in step 2 was particularly troublesome with this
example. The smaller [9(a)] is forced, the longer the al-
gorithm takes and the greater is the possibility that the
computer is not capable of locating the desired a. To solve
the £1.06 problem, it was necessary to know a such that
[9(a)] < 10-24 in each iteration, and the computations took
28.5 seconds. The inability of the computer to store numbers
exactly and the presence of many solutions of the ‘1 problem
near the unique solution of the LP problem apparently teamed
up to render algorithm 2.2 ineffective for values of p less
than about 1.06.
81
On the topic of 2p approximation when 0 < p.g 1,
algorithm 3.20 was programmed in FORTRAN IV for a CDC 6500,
and two examples were studied.
The example taken from Cheney [6, p.44] that we dis-
cussed earlier was tested with
p='i—O", n=1,2,...,lO.
For p = l, the algorithm found both corner point solutions
mentioned previously. All points on the line segment joining
those two points are also solutions of the £1 problem.
For 0 < p < 1, however, the original problem has the unique
solution (2.516667,1.516667). For each case, the algorithm
took less than a second to compute the solutions.
For a second example, we chose K to be the subspace
S l
spanned by the single vector 1 ,, and took b to be 0
-5 1
The intersection of K9 (5), where s = b since b E KL,
. . - _ z i
and the three dimenSional 1p unit ball for p - 10,,10,...,
%% are shown in Figure 2(a) of Chapter III. 3 runs dia-
gonally from the upper left to the lower right passing through
the corner points shown. As Table 4.3 indicates, the solution
of problem (P*) "jumps" from the corner along 5 to the
ones On either side for p a .777 as the picture indicates.
.78
.775
.4545
.4383
.4160
.4107
.4093
.3856
.3442
.2886
.2163
.1291
.0433
.0014
.5000
.4629
.4204
.4112
.4089
.3715
.3150
.2500
.1768
.0992
.0313
.0010
.4545
.4383
.4160
.4107
.4093
.3856
.3442
.2886
.2163
.1291
.0433
.0014
82
.5000
.4629
.4204
.4112
.8185
.7711
.6883
.5772
.4327
.2582
.08665
0
.002891 -.0002891
Table 4.3
.0000
.08185
.08185
.07711
.07711
.06883
.06883
.05772
.05772
.04327
.04327
.02582
.02582
.008665
.008665
.0002891
.5000
.4629
.4204
.4112
.8185
.7711
.6883
.5772
.4327
.2582
0
.08665
0
.002891
0
2.000
2.000
2.000
2.000
1.998
1.998
1.949
1.949
1.896
1.896
1.861
1.861
1.845
1.845
1.848
1.848
1.873
1.873
1.923
1.923
BIBLIOGRA PHY
BIBLIOGRA PHY
[ l] Abdelmalek, N.H., Linear Ll approximation for a
discrete point set and L1 solutions for over-
determined linear equations, J. ACM 18 (1971),
41—47.
[ 2] Abdelmalek, N.H., On the discrete linear Ll approx-
imation and L1 solutions of overdetermined linear
equations, J. Approximation Theory 11 (1974), 38-53.
[ 3] Anton, H. and Duris, C.S., An algorithm for best approx-
imate solutions to Av = b in normed linear spaces,
J. Approximation Theory 8 (1973), 133-141.
[ 4] Barrodale, I. and Young, A., Algorithms for best L1
and L6° linear approximations on a discrete set,
Numer. Math. 8 (1966), 295-306.
[ 5] Barrodale, I. and Roberts, F.D.K., An improved algorithm
for discrete £1 linear approximation, SIAM J. Numer.
Anal. 10 (1973), 839-848.
F 6] Cheney, E.W., Introduction to Approximation Theory,
McGraw Hill, New York, N.Y. 1966.
r7] Cline, A.K., Rate of convergence of Lawson's algorithm,
Math. Comp. 26 (1972), 167—176.
[ 8] Descloux, J., Approximations in Lp and Chebyshev
approximations, J. Soc. Indust. Appl. Math. 11 (1963),
r“:
9] Duris, C.S., An algorithm for solving overdetermined
linear equations in the LP-sense, Mathematical
Report 70-10, Drexel University, 1970.
[10] Duris, C.S. and Sreedharan, V.P., Chebyshev and Ll-
solutions of linear equations using least squares
solutions, SIAM J. Numer. Anal. 5 (1968), 491-505.
[ll]
[12]
[13]
[14]
[l6]
[17]
[18]
[19]
[20]
[21]
[22]
The cal-
Comput.
Fletcher, R., Grant, J.A.,
culation of linear L
J. 14 (1971), 276-279.
and Hebden, M.D.,
approximations,
Fletcher, R., Grant, J.A., and Hebden, M.D., Linear
minimax approximation as the limit of best LP-
approximation, SIAM J. Numer. Anal. 11 (1974),
123-136.
Hoel, P.G., Certain problems in the theory of closest
approximation, Amer. J. Math. 57 (1935), 891-901.
Lawson, C.L., Contributions to the Theory of Linear
Least Maximum Approximation, Ph.D. Thesis, University
of California, Los Angeles, 1961.
Motzkin, T.S. and walsh, J.L., Least pth power poly—
nomials on a real finite point set, Trans. AMS 78
(1955), 67-81.
Motzkin, T.S. and Walsh, J.L., Polynomials of best
approximations on a real finite point set. I, Trans.
AMS 91 (1959), 231-245.
Motzkin, T.S. and Walsh, J.L., Polynomials of best
approximation on an interval, Proc. N.A.S. 45 (1959),
1523-1528.
Rice, J.R. and Usow, K.H., The Lawson algorithm and ex-
tensions, Math. Comp. 22 (1968), 118-127.
Sreedharan, V.P., Solutions of overdetermined linear
equations which minimize error in an abstract norm,
Sreedharan, V.P., Least squares algorithms for finding
solutions of overdetermined linear equations which
minimize error in an abstract norm, Numer, Math. 17
(1971), 387-401.
Sreedharan, V.P., Least squares algorithms for finding
solutions of overdetermined systems of linear equa-
tions which minimize error in a smooth strictly
convex norm, J. Approximation Theory 8 (1973), 46-61.
Stiefel, E., Note on Jordan elimination, linear program-
ming, and Tchebycheff approximation, Numer. Math. 2
(1960), 1-17.
»‘i]]]]]i]]i“
8
"([11]