ALGORITHMS FOR SGLVTNG

EVEREEEIENMINEB SYSTEMS 0E LINEAR L57 LIT-7c:

EQIIATIGRS IN THE A; SETTSE: ' ’2 5:1;.ffEffjiisz

TIEeais Io: Ike Degrée of PIE D

MTPH‘GATT STATE TTT‘TTVERSTT‘I'
R‘JBERT ‘1‘iTE. LTAM OWENS
‘ ' 39T5 _

 

IIIIIIIITITIIIIIITTITIITITIIIITTIITTI L.

3 1293 00839 6214

 

 

SEP27W

 

 

‘9' ABSTRACT
ALGORITHMS FOR SOLVING OVERDETERMINED SYSTEMS
OF LINEAR EQUATIONS IN THE ‘P SENSE
BY
Robert William Owens

In this thesis, we investigate methods for approx-
imating a solution of an overdetermined system of linear‘
equations. A best approximate solution of the linear sys-
tem Ax = b is taken to be a vector x which minimizes
the length of the error vector n(x) = b-Ax. we consider
the two classes of approximation problems obtained when
we determine the length of n(x) first by a smooth strictly
convex norm, and second by an LP metric for O < p < 1.

For each of the two approximation problems, we study
a dual problem whose solution leads directly to a solution
of the original problem. Algorithms for solving the dual
problems are presented, and numerical results from several

LP approximation problems, 0 < p < a, are discussed.

ALGORITHMS FOR SOLVING OVERDETERMINED SYSTEMS

OF LINEAR EQUATIONS IN THE LP SENSE

BY

Robert William Owens

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements

for the degree of
DOCTOR.OF PHILOSOPHY

Department of Mathematics

1975

ACKNOWLEDGMENTS

I would like to express my sincere thanks to
Professor V.P. Sreedharan, without whose direction and
encouragement I could not have written this thesis. I
also thank the other members of my committee, Professors
Harvey S. Davis, Norman L. Hills, Edward C. Ingraham, and
Mary J. Winter, whose helpful comments and suggestions
aided in the final preparation of the thesis. Finally,
special thanks are reserved for Jill Shoemaker for typing

the final manuscript.

ii

CHAPTER I .

CHAPTER II .

CHAPTER III.

CHAPTER IV .

BIBLIOGRA PHY

TABLE OF CONTENTS

INTRODUCTION . . . . . . . .

APPROXIMATION WITH A SMOOTH
STRICTLY CONVEX NORM . . .

THE LP PROBLEM, o<p<1

NUMERICAL RESULTS . . . . .

iii

23

69

83

Table 3.1

Table 4.1

Table 4.2

Table 4.3

LIST OF TABLES

iv

50

77

79

82

Figure 1

Figure 2

LI ST OF FIGURES

28

4O

CHAPTER I

INTRODUCTION

The problem that we shall study can be stated some-

what generally as follows:

Given a linear subspace K of 35‘ and a point

b not in K, find x E K which is closest to b.

For any metric d on ng‘, the above approximation problem
makes sense mathematically, while in practical problems only
a few of these are of interest. Moreover, even in the pres-
ence of certain well behaved naturally occurring norms on

m

H! as the metric, there are still many theoretical and

computational difficulties.

The L norms, 1.5 p'g a, given by

P
m 1/p
anp = (jglylep) 1 g p < . and (1.1)
”XIIco = max ijT (1.2)

195m

have received the most consideration with the L1,;2, and
La cases often being singled out for special attention
because of their many applications.

Since this thesis is concerned with L approximation

P
problems in the form just mentioned along with extensions to

the case where O < p < 1, we briefly review what is known
about the subject.

First, however, we rephrase the original problem.

Let A bean mxn real matrix with m >n and b 6
ng‘\~1mage(A). we are interested in finding x 6 n9‘ min-
imizing the number ”b - Axnp. We shall refer to this as
the LP problem. In the case of p = a, we often refer to
it as the Chebyshev problem.

The case where p = 2 is the easiest to solve since
this is just the problem of finding the usual Euclidean dis-
tance from a point to a subspace. If the matrix A has rank
n, then x = (ATA)—1ATb is the solution of the £2 problem,
and the only difficulties that arise are computational ones
stemming from the fact that ATA may be ill conditioned.

A number of essentially different methods have been
developed to handle the Chebyshev problem and to a lesser
degree the ‘1 problem. Since each of these norms is neither
strictly convex nor smooth, i.e. both have "flat spots" and
1'sharp corners", most of the powerful theorems from approxima--
tion theory are not applicable and special techniques must be
devised.

Since the unit ‘1 and La balls are convex poly-
hedra, the methods of linear programming are applicable for
solving the £1 and Chebyshev problems. Stiefel [22] pre-

sented suCh an algorithm to solve the Chebyshev problem,

and more recently Barrodale and Young [4], Barrodale and

Roberts [5], and Abdelmalek [l] have given improved algorithms
for handling both problems by linear programming.

It has been conjectured that the solution of the La
problem is the limit of the solutions of the LP problem as
p 4 a, and similarly for the L1 problem letting p 4 1+.
It is well known that if these limits exist, then they are
exactly the solution that one would expect, but the question
of convergence is the crucial point. Descloux [8] established
the convergence of the solutions of the LP problem as
p 4 a providing a justification for this method of solving
the Chebyshev problem. The question concerning the convergence
of the solutions of the ‘p problems as p 4 1+, however, is
still unanswered. Abdelmalek [1] incorrectly applied a theorem
of Hoel [13] to conclude that the solutions of the ‘P problems
converge as p.+1+. Moreover, even if such a theorem were
established, there would remain the problem of computing the
solutions of the ‘P problems when p is near 1. This, as
shall be mentioned, is another significant difficulty. Methods
of solving the Chebyshev problem by trying to find the limit
of the solutions of the LP problems as p 4 a have been pre-
sented by Fletcher, Grant, and Hebden [12]. Attempts to solve
the ‘1 problem by locating a limit of the solutions of the
‘p problems as p 4 1+ have been investigated by Abdelmalek
[1].

Another method used to solve the Chebyshev problem
was developed by Lawson [14]. Using a result of Motzkin and

Walsh [16], Lawson was able to compute the solution of the

Chebyshev problem as the limit of the solutions of a sequence
of weighted LP problems where p may be held fixed and

the weights change in each iteration. Since weighted L2
problems are almost as easy to solve as 12 problems, Lawson
chose p = 2 throughout his algorithm. An analysis of the
rate of convergence of the Lawson algorithm was made by Cline
[7].

Another theorem of Motzkin and Walsh [16] similar to
the one guaranteeing that Lawson's algorithm converges to the
solution of the Chebyshev problem also says that the solution
of the £1 problem is identically the same as the solution
of a weighted LP problem for some choice of weight functions
depending upon p. So far, however, no one has discovered an
algorithm for computing those weights either directly or iter-
atively even in the case when p = 2.

The best known method for solving the Chebyshev problem
is the exchange algorithm as can be found in Cheney [6]. It
exploits the fact that the m xn Chebyshev problem given by
the overdetermined linear system Ax = b has exactly the same
solution as some particular (n-El) xn Chebyshev problem
determined by Ax = b, where A is composed of n + 1 rows
of A and S is the n + 1 vector of the corresponding
entries from b. In fact, of all such (nuEl) xn problems,
the one yielding the solution of the original problem is that

one for which the error uﬁJEIRAx - b“ is the largest.
xEﬁ!

Although most of their work deals with the LP

problem for l < p < a, Duris and Sreedharan [10] have also
presented algorithms for solving the Chebyshev and ‘1 prob—
lems that use only the solutions of least squares problems.
Much more can usually be said about the solutions of
the LP problem when l < p < o since in that case the norm
is both strictly convex and smooth, i.e. the boundary of the
unit ball contains no line segment and at each point on the
boundary of the unit ball there is a unique hyperplane sup-
porting the unit ball. Also, the duality of the spaces L
1

l .
nd where —-+ -= 1 can be ex loited.

Sreedharan [19,20,21], Duris [9], and Anton and Duris

P

[3] have develOped several algorithms for solving the approx-
imation problem for an arbitrary smooth strictly convex norm
and in particular for the LP norm when l < p < a. In each
case, the solution of a dual problem is obtained iteratively
by carrying out an orthogonal projection and solving a single
nonlinear equation in one real variable at each iteration.
Numerical results indicate that the algorithms converge slowly
when p is very near 1 or very large, but work satisfactorily
in all other cases.

In Chapter 2 of this thesis, another algorithm for
solving the given approximation problem with a smooth strictly
convex norm is presented. The algorithm uses ideas developed
by Sreedharan in [20,21]. Numerical results on LP problems

are presented in Chapter 4.

Rice and Usow [18] have extended the previously men—
tioned Lawson algorithm to include LP problems for 2.g p < a
and they have developed a method for accelerating the conver-
gence of the algorithms. However, they observed that for p
large the convergence is still quite slow. Moreover, the
algorithm is not applicable at all for l < p < 2.

Another algorithm applicable for 2.5 p < a but not
for l < p < 2 has been presented by Fletcher, Grant, and
Hebden [11]. Although primarily designed for Lp approxi-
mation problems, i.e. continuous rather than discrete approx-
imation, the modification is immediate. For p12 3, second
order convergence is proven, but numerical results are few
and inconclusive regarding this algorithm.

A final method of solving the LP problem is by min-

imizing the differentiable function of n real variables
f(x) = Hb-—Ax"p (1.3)

It is sufficient to find a zero of the gradient vf. Newton's

method and the method of steepest descent are applicable to

speed convergence to a root of vf if p is large enough to

guarantee enough derivatives of f. Unfortunately, p'Z 3

is required so again the case 1 < p < 2 is left untreated.
In summary, the situation is roughly as follows:

p = 1: Only a very few algorithms are available.

1 < p < 14-2, e small: Almost no effective method is known

for treating these LP problems.

p near 2, i.e. not too close to 1 nor very large: There

are many algorithms available - few difficulties.

p very large: There are a couple of good algorithms avail-
able to choose from, but not very many.

p = a (Chebyshev problem): There are several efficient algo—
rithms to solve this problem.

With a slight modification in the definition of "'"p’
the LP problem for O < p < 1 also makes mathematical sense
and we are consequently led to investigate solutions of that
problem also. With the exception of Hoel [13] and Motzkin
and Walsh [15,16,17], no one has considered this question.

When we choose 0 < p < 1, "°"p suitably defined
turns out to be a p—homogeneous metric but not a norm, and the
unit ball is not convex. In Chapter 3 we consider the LP
problem for O < p < l establishing theoretical and compu-
tational methods for finding a solution. In Chapter 4 some

numerical results are presented.

CHAPTER II

APPROXIMATION WITH A SMOOTH STRICTLY CONVEX NORM

In this chapter, we shall study the system of linear

equations
Ax = b

where A is an m xn matrix, m > n, x is a n-vector, and
b is an m-vector. We assume all numbers are real. Let

"-R be a smooth strictly convex norm on 35‘. The problem
that we shall be concerned with, referred to as problem (P),
is

(P): Find x 6 EMT minimizing [Rb-Ax" [x 6 n3‘}.

In [21], a dual problem (P*) was considered, the
correspondence between problems (P) and (P*) established,
and two algorithms for constructing the solution of problem
(P*) were presented.

Here we review problem (P*), the previous results
concerning that problem and (P), and then present a new
algorithm for solving problem (P).

Before actually beginning this development, we set
down some assumptions, definitions, and notation that will

remain standard throughout this chapter.

m

(.‘.) denotes the usual inner product on H! , i.e.
m
(x y) = Z x.y.
‘ j=1 J 3
We denote the transpose of a matrix A by AT. (v) means
the linear span of the vector v.
K = Image (A) = [Ax[x 5 En]
l T m
K = Ker A = [x E n: ](k]x) = O Vk E K]

E:]R“1 4 Kl is the orthogonal projection of 35‘ onto K‘,
where orthogonal means with respect to the inner product

(']°) given above.
3 = Eb
p = inf{||b-k” [k E K]

We assume that p > 0 since the problem is trivial otherwise.

Recall that a norm "‘" is strictly convex if and
only if
1 . .
"x" = "y” = Eﬂx-Ey“ implies that x = y,
and "-H is smooth if and only if through each point of unit

norm there passes a unique hyperplane supporting the closed
unit ball B= [x 6mm]"x" 51}.

Given v #'0, we define the vector v* 6 H5‘ by

(v*]v) = ”v” and max[(u]v*) [Hun g l} = l.

10

By Lemma 2.1 of [21], the correspondence v H‘V* is a con-
tinuous function from H§T\~[O} into itself. Moreover, in
the special case when "'H is the LP norm with l < p < a,

v* takes on the particularly simple form

v.]P-l
v; = ‘43—]? sgn V3
anp

In [21], Sreedharan introduced the following dual

problem (P*)

(P*): Find 2 EK@(s), ”z” = 1 maximizing (s‘w) over

all w 6 K®(s), "w" = 1.
2.1 Lemma. (i) There exists a unique k e K minimizing
Hw-yHIYGIQ. (LlJ)
(ii) Problem (P*) has a unique solution.

Proof: (i) Let x E K and set 5 = Hb-xH.
S = [y 6 K THb"Y".S 5] is a compact set, and f:K 4 n: by
f(z) = Hb-—z” is continuous on S, so f must achieve its

minimum value on S. Suppose x,y 6 S such that

f(x) = ub-xn

min[|]b-z|] [z 6 S] = Hb-yl] = f(y). (2.1.2)

§sz- (x+y)u -§-II(b-x) + (Io-y)"
<%um-xn+Im—yw

=Nb-xu=Im—yu

11

By the strict convexity of H-U, b-x = b-y, i.e. x = y.
Thus there is a unique k E K minimizing (2.1.1).
(ii) It is clear that problem (P*) has a solution. Sup—

pose y,z both solve problem (P*) and y #'2. Since

”-u is strictly convex, "y" = ”z" = 1, and y #Tz,
IIY+2II < 2. (2.1.3)
Since (91y) = (s|z) > o, y+z 510. Let w = ”3’12” Then

w E K@(s), “w" = 1, and

 

_ (Sly) + (812) = 2 s
(s‘w) — THY +2". #120” > (s‘y) by (2.1.3).

But this contradicts the assumption that y solves problem
(P*). Thus the solution of problem (P*) must be unique.
The key results of [21] used to solve problems (P)

and (P*) are

2.2 Theorem. If 2 EK@(s) with ”z" 1 and z* eKJ’,

then

b - (b]z*)z E K and p (b]z*).

s s)

2.3 Corollary. Let 2 be as above. Then (b]z*) = (s z

2.4 Theorem. Let 2 solve problem (P*). Then

88
82.

b _ s s)

(s z 2 6 K and p = (

2.7 Lemma. Let z,W’E 351 be linearly independent. Then

a 6 1R satisfies

VI

Us

ch
Ch

12

IIz-OMII g Nz-xwu vx 6 R

if and only if

((z-aw)*]w) = 0.
Next we give a new algorithm for solving problem (P).

2.2 Algorithm.

 

Step 1. Set k = O and yk = "s"

Step 2. Compute y; and vk = AA yk .
Step 3. If vk = 0, go to step 6; otherwise continue.

Step 4. Choose Gk > 0 such that ((yk-akvk)*[vk)==0.

Step 5. Set yk+l = 72R where z = yk - akvk.

Increment k by 1 and return to step 2.

Step 6. Using yk, the solution of problem (P*),

solve problem (P).

2.3 Theorem. Either algorithm 2.2 solves problem (P) in a
finite number of steps or the yk converge to the solution of

problem (P*).

Before proving theorem 2.3, we make a couple of obser—
vations about algorithm 2.2 and establish a few facts to be
used in the proof. In step 1, we could have selected yO to
be any element of K6 (3) of norm one with (yo|s) > 0. Our

choice is just a very convenient one. In step 3, one might

choose to stop the computations if ”Vk” or ”akvk” is

13

small. The reasons for these stop rules will be apparent
in the course of the proof of the convergence of the algo—
rithm. With respect to step 6, Theorem 2.4 of [21] says that
Ax = b — Y‘all-[812)— yk = b - pyk has a solution, say 3?. Since
“b-A§H = p, i solves problem (P).

We next prove a couple of propositions essentially
saying that algorithm 2.2 makes sense. In particular, vk = 0
implies that yk solves problem (P*), and the choice of Gk

specified in step 4 is always possible.

2.4 Lemma. Let vk #'O.

 

(i) There exists a 3 0 such that Hyk - GNkH < 1.

(ii) If Yk’vk are linearly independent, then for
every 0 > O sudh that ”yk - Gvk" < l, we

have 0 < Hyk - avkn.

(iii) If yk,vk are linearly dependent, then there

is a unique a > 0 such that “yk - avk“ = 0.

(iv) There is a unique 6 6 3% such that VA 6 n1

”yk - Bbku g “yk - kau , and B > o.
(v) ((yk - ﬁvk)*|vk) = 0.
Proof: By the definition of y; ,
max[(u]yl:)| Ru“ 5 1} = 1.

My]. - wk" = 1ka - 0.ka maXIIUIYPI Hull 5 11

14

-av

Yk k
-owk[{TY1:‘

2 lek' WkTHTTYk
= (Yk’WRTY;)

= (ykwp - aIy;(vk)

“Ykn - 0(y;]vk)

_ _ *
- l a(ykIvk). (2.4.1)
Observe that
* _ * T* T1: T*_ T*2
(yk|vk) - (yk]AA yk) = (A yk]A yk) - ”A yknz . (2.4.2)
Together with the definition vk = AATy;, ‘we have that
. . T11-
vk = 0 If and only If A yk = 0. (2.4.3)

By assumption, vk #TO so from (2.4.1)
IlYk-Wk" > 1 for all a < 0. (2.4.4)

Suppose that Yk’vk are linearly dependent. Then there is
a B 6 IR such that nyk-kau = o. By (2.4.4), I3 > o.

More specifically, ﬂ = proving (iii). Suppose Yk’vk

l
vkﬂ
are linearly independent. If there were no a > 0 such that

”yk-avk“ < 1, then
Ika-avk“ 2 1 = ”ka for all a e 13. (2.4.5)

By Lemma 2.7 of [21], (2.4.5) implies that (yElvk) = O, which
by (2.4.2) and (2.4.3) means that vk = O, contradicting the

‘hypothesis that vk i'O. Consequently, there must be an <1>C>

15

such that "yk-avku < l establishing (i) and (ii). Consider

the strictly convex function
f:]R 4 ]R by f(x) = [Tyk—kau .

f(0) = 1, f(x) 4 a as A 4 a, and by (i) there exists an

a > 0 such that f(a) < 1. By the continuity and strict
convexity of f, there is a unique 1 > 0 such that f(i)==l,
and by the strict convexity of f, there is a unique 5,

O < B <'X, minimizing f. (iv) is proven. Finally, from

(iv) and lemma 2.7 of [21], it follows that

for the 5 found in (iv).
This completes the proof of Lemma 2.4.

We summarize the important points of Lemma 2.4 in

2.5 Proposition. Assume that yk and VR are linearly in-

dependent. Then there exist a unique 0 > 0 such that
_. * ==
((Yk avk) [vk) 0. (2.5.1)
Moreover, with this choice of a,
0 < ||yk~akaI < 1. (2.5.2)

As an immediate consequence of proposition 2.5, step

4 of algorithm 2.2 is guaranteed to make sense.

2.6 Proposition. For any k 2 O, yk E Ke(s), "ka = l, and

‘YkT3) > 0.

16

Proof: The assertion is true for k = O by construction.
Assume that the pr0position holds for k = O,1,...,n. We

shall now verify its validity for k = ni-l.

Y

= = n = .
If vn 0, then yn+1 T§;T' yn so the assertion

is true for k = nntl. Assume that vn {’0. Since vk
AATy]:, vk e K vk. s e K“, yn e K@(s) with (ynTS) ,\ o,
and vn E K implies that yn and vn are linearly inde—

pendent. So by prOposition 2.5,

o < Hyn-anvn". (2.6.1)

yn nn

Since yn E K®(s) and vn 6 K, yn+1 = TTYn-anvn" E K@(s),

and "yn+1" = 1. Finally,

ynnanvn
(Yn+1‘8) a (TTYn- anVnTT TS)

 

(yn]S)-Gh(vn|s)
TTyn - anvn “
(yn|s)

a "Yn"ahvn" (2.6.2)

 

 

> O by the induction hypothesis.

This completes the proof of proposition 2.6.
We next show that steps 3 and 6 of the algorithm do

not lead us astray by proving

2.7 Proposition. If vk = 0, then yk solves problem (P*).

Proof: By (2.4.3), vk = 0 implies that y; 6 Ker AT = K‘.
Also, by Lemma 2.6, yk 6 K@ (s) and "Yk" = 1. Applying

theorem 2.2 of [21], we have

17

b - (b(y;)yk e K and p = (b|y;). (2.7.1)

(2.7.1) together with corollary 2.3 of [21] and lemma 2.1(i)

yields that

s s)
b-Tss—‘Jy—kTYk (2.7.2)

is the unique point in K closest to b.
Let 2 solve problem (P*). Then by Theorem 2.4

of [21],

s s)
b- (S 2) 2 (2.7.3)

is also the unique point in K closest to b.

Equating (2.7.2) and (2.7.3), we have

2 = ——ZE—7-. (2 7 4)
(3 2) (STYk . .
Taking the norm of both sides of (2.7.4) and using the facts
that ”z" = “yku = l and (yk]s), (st) > O, we have
(s‘z) = (STYk)° (2.7.5)

Lemma 2.1(ii) now says that yk = z, i.e. yk solves problem
(P*).

We now prove Theorem 2.3, i.e. that algorithm 2.2
solves problem (P). Since the proof is somewhat lengthly,
we first give an outline of the steps to be taken. We show

that

(i) Vk 2_O O < pk < pk+1, where pk = (ykls),

 

18

(ii) 1im Hyk-Gkvku = l,
k4ao

(iii) 1im vk = O,
k4a

(iv) any limit point of [yk]k'2 O] solves problem

(P*), and

(v) 1im yk = y which solves problem (P*).
k4a

Because of PrOposition 2.7, we can assume that vk #‘o Vk.2 O.
2.8 Proof of Theorem 2.3. Let
pk = (YkTS). (2.8.1)

(i) Claim: 0 < pk < pk+1 Vk‘z o.
_ ._§_

pk+1 = (Yk+lTS)

 

- (yk‘S) b (2 6 2)
- TTYk'akvaT y ° ° ’
> (yk]s) by (2.6.1).

Thus 0 < pk < pk+1 Vk‘z 0.
(ii) Claim: 1im "yk-akvk” = l.
k4¢
From (i),

O < pk < pk+1 _<_ max[(w]s)]w E K@(s),"w" = l}

l9

and from (2.6.2),

 

 

p
k 3 Wk" “kvaT'
pk+1
pk
Thus lim “y - v = lim = 1.
k... k a" k” 11.... pk+1

(iii) Claim: 1im vk = O.
R
4a:

Suppose the claim were false. Then there exists 5 > 0 such

that, by taking a subsequence if necessary,

Hvk“ 2.6 for all k.2 0. (2.8.2)
1 > TTYk‘akaTT 2 ”(1)1ka ' TTYkTT

= aknvk" ' 1
20145—1
from which one obtains
0 <0:k <% . (2.8.3)

Again by taking subsequences if necessary, we can assume
that

lim = a and lim y = y. (2.8.4)
k4o ck k4o k

By Lemma 2.1 of [21], the mapping x H x* is continuous on
IRm\{O] , so

1im v = lim AATy* = AATy* a v.
k k
k4a k4¢

"v” 2 5 by (2.8.2).

20

Also, by the continuity of x 4 x* and “-H, and since
((yk-okkaIVk) = o ‘v’k 2 0,
we have that
((Y-GV)*|V) = 0, (2.8.5)
h-wu=1=hN (4&0

Either 0.: O or a > 0. We show that each of these possi-
bilities leads to a contradiction forcing us to conclude that
lim vk = O.
k4a

If a = 0, then (2.8.5) becomes (y*]v) = 0. Together
with (2.4.2) and (2.4.3), this means that v = O, contradicting
our assumption that v {'0. Suppose a > 0. First note that
y,v are linearly independent since (v|s) a O and (y|s) =
lim (yk]s) = lim pk > 0. By Lemma 2.7 of [21], (2.8.5) implies
k4m k4a
that

IIY'O‘VH S IIy-xvll for all ). 6 1R. (2.8.7)
But (2.8.6) and the strict convexity of "°”’ forces
IIy-gvu < 1
which contradicts (2.8.7). Hence we must conclude that
lim vk = O.

k4¢

(iv) Claim: any limit point of [yk]k.2 O} solves problem
(P*).

21

By taking a subsequence and reindexing if necessary, we can
assume that lim yk = y. From (iii) and the continuity of

k4a
the mapping x 14 x* , we have that

v = lim vk = 0, (2.8.8)
k4a

which by (2.4.3) implies that y* e Ker AT = K‘. Also, by

proposition 2.6,
y E K@(s) and "y“ = 1. (2.8.9)
Applying Theorem 2.2 and Corollary 2.3 of [21] and Lemma

2.1(i),

b _ s s

(s y)y

is the unique point in K closest to b.

Suppose z solves problem (P*). Then by theorem

2.4 of [21],
s s _ s s .
b - (s z) z b (s y) y , i.e
z = 4—
(s 2) my) ' (“M")

Taking the norm of both sides of (2.8.10) and using "z" =
"y“ = l, (s]z) > 0,, (st) = lim (s‘yk) = lim pk > 0, it
k4a R40
follows that (st) = (sTy). Finally, Lemma 2.1(ii) says that
z = y, i.e. y solves problem (P*).

(v) Claim: 1im yk = y, the unique solution of problem (P*).
k4~

22

By the uniqueness of the solution of problem (P*) any
convergent subsequence of {Yka 2:0] converges to the
unique solution of problem (P*). Due to the compactness

of B= [x€K®(s) ]|]x]]=l],‘ 1im yk=y.
k4~

This completes the proof of Theorem 2.3 establishing
that algorithm 2.2 is guaranteed to find the solution of

problem (P).

CHAPTER III

THE LP PROBLEM, o < p < 1

We shall be considering the system of linear equations
Ax = b

where A is an m xn matrix, m > n, b is an m-vector, and
x is an n—vector. All numbers are assumed to be real. For

yEIRm and o<p<1, let
m p
TTYTTP = j{Elij-I -

Given A,b, and p, the problem that we shall be concerned

'with, referred to as problem (P), is
(P): Find x e 35‘ minimizing {ub-Ax"P[x e nf‘}.

In order to solve the given problem, we will introduce
a dual problem, to be denoted (P*). A relation between prob-
lems (P) and (P*) is proven, and a characterization of
the solutions of problem (P*) established. Although in
general problem (P*) can not be solved in a computationally
feasible manner, it can always be solved in a finite number of
steps and efficient algorithms for solving particular cases

are given. Exchange type algorithms for finding local solutions

23

24

of problem (P*) are given and some of the difficulties
encountered in searching for a global solution of the gen-
eral problem (P*), reviewed. Problem (P*), when Hg] is
equipped with a norm, was considered in [21].

It should be noted that n "p’ O < p < l, is not a
norm on JRm . We shall, however, refer to TT'TTp as the

Lp-norm and problem (P) as the LP problem. Although not

a norm, ”'“p does satisfy the following properties:

(i) [MP 2 o VX e 112‘“ with equality if and only

if x = O.
.. m
(11) ”xi-yup g “xRP + ”yup Vx,y 6 IR .

(iii) Haxnp = IGIPHXHP vx e 112‘“, va 6 m.

m l
The usual I norm when p'2 l, “x“ ==( Z)‘x.]p) /p’ fails
p P j=1 J

to satisfy the triangle inequality if one chooses O < p < l

and consequently fails to be even a metric. We regain the
triangle inequality at the expense of substituting p-homogeneity,
see (iii), for l-homogeneity by defining "'“p as above for

O < p < 1. More generally, a function "-H satisfying (i),
(ii), (iii) above has been called a p-homogeneous norm.

Let (~]-) denote the usual inner product on ERA i.e.

(MW

)1
,dtﬂ

x

k:

Set
K = Image (A) = [Ax]x 6 nf‘}, and

K1 = Ker AT = [x 6 ﬁ§1[(x]k) = 0 Vk 6 K}.

25

. . m
Let 13:anl 4 K'L be the orthogonal prOJection of I! onto
K‘, where orthogonal means with respect to the inner product

given aboVe, and set s = Eb. Let
p = d(b,K) = inf[Hb-knp]k 6 K}.

We assume that b (‘K, or equivalently p > 0, since other-
wise the problem is trivial. Finally, let (v) denote the
linear span of the vector v.

Observe that s-b 6 K since 0 = s-Eb = Es-Eb =
E(s-b). As a result, d(s,K) = inf[IIs—k||p]k 6K] =
inf[”b-k+ (s-b)TIp|k ex] = inf[]|b-k||PTk 6K] = d(b,K).

The existence of a solution of problem (P) follows
immediately from the continuity of the LP norm and the
finite dimensionality of the subspace K.

Given problem (P), we associate a dual problem

(P*): Find 2 €K®(s), ]|z||p= l maximizing (sTw)

over all w E K€9(s), "w”p = l.

The relation between problem (P) and problem (P*) is given

in the following theorem which extends Theorem 2.4 of [21].
3.1 Theorem. Let 2 solve problem (P*). Then
(1) pl/sz) = (s|s)1 (3.1.1)

and (ii) b - pl/Pz e K. (3.1.2)

Proof: (i) (st) max[(s|w) [w E K®(s), ”wup = l}

max[(s]k+Bs)]k E K, 5 6 1R1TTk+BSITP = 1}

26

= (s‘s)max[B E 1R| [lk+BsHp = 1, k E K}

 

= (sls)max{B 6 1R {0} 1 Hk+snp = E—lp, k ex}
_ 1
_ (s|s)max{“k+s“l7p Dc 6 K}
' P
l

 

(s s)min{||k+ suglppc 6K}

= (s‘s)—7—1 .
pl p
Thus pl/p(s‘z) = (s|s) which is (3.1.1).

(ii) Let t E K‘. We shall show that (tlb-pl/pz) = 0.
Suppose (t‘s) = 0. Since 2 e KGD(s) and t e KJL n (s)*,

(t‘z) = 0. Also (t‘b) = (Et‘b) = (t‘Eb) = (t|s) = 0. Thus
(t‘b-—p1/pz) = o if (t|s) = 0. (3.1.3)

Next suppose that t = 3. Since E is the orthogonal pro-

jection of mm onto K" and s = Eb, we have that
(s‘b) = (s‘s). (3.1.4)

(t|b—p1/pz) = (slb-pl/pz) ,

(S‘b) - pl/P(s‘z),

(s‘s)-(s‘s), by (3.1.4) and (3.1.1),

80

(t‘b-pl/pz) = o if t e (s). (3.1.5)

27

Let F be the orthogonal projection of If“ onto (3), and
let t e K‘L. Write t = t1+t2, where t1 =- t-Ft and t2=
Ft. Clearly, (t1)s)==0 and t2 6 (s), so by (3.1.3) and
(3.1.5), (t‘b-—p1/bz) = 0. Since t E K‘ was arbitrary,

b-pl/bz E K. This completes the proof of the theorem 3.1.

Observe that ((b- (b- pl/Pz) up = "pl/p2.“ = pnznp = (3.

/pz

so b--p1 is a point of K of minimal distance from b.

In the statement and proof of Theorem 3.1, we used

only the p—homogeneity of the LP norm on min
Figure 1 on the following page gives a geometric inter-

pretation of theorem 3.1. Given K and b as in the theorem,

we are to find b’ 6 K which minimizes {"b-k"p‘k E K}.

Since the Lp-norm is translation invariant, this is equivalent

to finding 3’ e K minimizing [Hs-—k”p‘k 6 K}. Let p =

d(b,K) and let 2 solve (P*) as indicated in the figure.

By expanding the unit ball by a factor of p, which by the

p—homogeneity of ”-Hp means multiplying the unit ball by

l/h, multiplying by -d” and translating it by 3, one can

p
easily see that z is taken by the above operations to the
point 3’, i.e. s’= s-pl/pz. Thus b’= b-pl/Pz.

Figure 1 also indicates in which direction one should
look for a characterization of the solutions of problem (P*).
By the symmetry and concavity of "-"p, a solution of problem
(P*) should lie at a ”corner“ or on an “edge" of the unit

ball, and these correspond to points at which a certain number

of coordinates of the given point are zero. Before making

28

 

 

 

(s)
.b
b'
l/ \
x \
/ \
’/ \‘ \
__(\ ,>' /
\\ z” (1,0)
\\ I /
\ S /
I
\ /
\ /
\ l
\ /
\\ /
I
\ I
\ l
\ I
\I
T Figure l.

Geometric interpretation of theorem 3.1.

these intuitive ideas precise, we introduce some terminology
and establish a couple of lemmas.

Let X be a real linear metric space, i.e. a real
vector space on which a translation invariant metric is de-
fined so that the metric space structure is compatible with
the linear space structure. Denote by X* the algebraic

dual of X, i.e. the space of all linear functions on X.

29

3.2 Definition. Let X be a real linear metric space, AcX,

a E A, and H a non-trivial homogeneous hyperplane in X, i.e.

H {x E X‘f(x) = 0}, where f e X* \{O}. We say that H+a=

[h+a)h 6H} supports A at a if either
f(x) 2f(a) vx EA or f(x) _<_f(a) Vx EA. (3.2.1)

3.3. Lemma. Let X be a real linear metric space, f €X* \{O},

 

A = f-l(0), z E X\A, Z a subspace of X with z E Z and

dim Z > O, and A1 = A n Z. Then A1 is a hyperplane in Z.

Proof: A1 = A n Z = {x 6 Z|f(x) = 0} being the kernel of a
linear functional can fail to be a hyperplane in Z only if
either dim Z = O, which is not so by hypothesis, or if f :0
on Z. The latter is also impossible since 2 e Z‘\A by

hypothesis so f(z) a! 0. Thus Al is a hyperplane in Z.

3.4 Lemma. Let K be a subspace of mm, f a linear func-

 

tional on K with f ,£ 0 on K, H = f-1(O) = {x €K|f(x) =0},
and B = {x e K ( "pr g 1}. Let 2 e K satisfy (i)
“znp = 1, (ii) 2i 7! O, i = 1,2, ...,m, (iii) H+z supports

B at 2. Then dimK=1.

Proof: dim K 2 1 since 2 6 K\{O}. Suppose dim K > 1.
First we show that 2 £ H. If 2 E H, then f(z) = 0. Since
dim H = dim K- 1 > 0, there exists x E K\H with "xup g l,
i.e. x E B and f(x) > 0. Hence f(x) > f(z) > f(-x). Since
-x E B this contradicts the hypothesis that H+z supports

B at 2. Thus z£H. Choose x €H\{O} and define

3O

1 if xi = O
6- = , i = 1,...,m (3.4.1)

2.
1 .
‘ﬁ‘ 1f Xi # O

l . .
6 = 5-m1n{bi‘1 = 1,...,m}. (3.4.2)
Then \zi + exi‘ > O for i = 1,...,m and Va 6 ('6,6).
Let
9(a) = Hz + exnp, -o < e < a. (3.4.3)
c129 m 2 p—2
2 = p(p—1) Z7 Xj‘z- + eX-| < 0 (3.4.4)
dc j=1 J 3

since not all of the x. = O and O < p < 1. This shows

that 9 cannot have a local minimum for e = 0, i.e.
"z + (‘5’pr < 1 for o < (3'( < 5. (3.4.5)
Choose any such 3. Then there must exist 6 > 0 such that
Haz+3xnp_<_1 va 6 (1-e,1+e). (3.4.6)

Let u = (l - §)z + 3k and v = (l + §)z = 3x. By construc-

tion, u,v E B and by the linearity of f
f(u) = (1 - -2§)£(z) and f(v) = (1 + §)f(z). (3.4.7)
Since we have already verified that f(z) ¥'O, either

f(u) < f(z) < f(V) or ﬂu) > f(z) > f(V).

31

In either case, we have a contradiction of our hypothesis
that H + 2 supports B at 2. Thus the assumption that
dim K > 1 must be wrong. In other words, dim K = 1 which

is what we wanted to prove.

We are now ready to establish a theorem which guar—
antees that we need only check a finite number of points in

order to find a solution of problem (P*).

3.5 Theorem. Let K be an n dimensional subspace of nf‘,

n < m, with basis bl""’bn' Let D be the matrix
(bl""’bn) = (rl,...,rm)T, where ri is the i-Eh row of the
mxn matrix D. Let B = {x E K} "xllpg 1}. Suppose that f
is a non-zero linear functional on K such that H + z sup-
ports B at z, where Hznp = 1 and H = {x e K|f(x) = O}.
Denote by I the index set of i's such that 21 #'O, and

J = {1,2,...,m}‘~I. Let A be the matrix with rows rj, j EJ,

and set N = {x E nf‘le

0}. Then dim N = 1.

Proof: First of all, dim N #'0. To see this note that there
exists 5 6 13‘ such that D5 = z. By the definition of the

set J, it follows that AB

0, i.e. B e N. Now B #'0
since 2 #'0. Hence dim N = Q.2 1.

Without loss of generality, let I = {1,...,p} and
J = {p4-l,...,m}, and put K* = [Dx‘x E N}. Denote by f*
the restriction of f to K*, and let H* = H n K* and
B* = B n K*. Clearly, z E B*.

We claim that dim K* = q. To see this, observe that

the rank of the matrix D is n, and hence by the rank-nullity

32

theorem of linear algebra, Dx = 0 implies that x = 0.
Hence dim K* = dim N = q.

By Lemma 3.3, H* is a hyperplane in K*. Finally,
H* + 2 supports B* at 2 since f* = f|K*. At this point
‘we pause to observe that each k e K* satisfies k“,+1 = --°‘
km = 0. To see this, recall that k 6 K* if and only if
there exists an x E N such that k = Dx. But x E N implies
that Ax = O, i.e. (Dx)j = 0 for j 6 J = {p4-l,...,m}.

We now in essence dr0p the m-u trailing zero coor-

dinates and consider our problem in 36‘. To make this pre-

cise, let

x2
u

[x E 33*}(x1,...,xu,0,...,0) e K*

’

.K 4 ﬁt by f(x) = f*(x

W1

1,...,XH,O,...,O ),

ﬁ= {x eiZ'I'Euc) = 0},

ml
H

" H
{X E K|Ihq|p g 1}, where now’ “X"p = jiﬁ‘xj‘p’
z = (21,...,zu).

Notice that E 'has no coordinates equal to zero, and
E [zjlp = 1. Also, H is a hyperplane in K and H + ;
gupports B’ at 2. Finally, dim K'= dim K* = q. But by
lemma 3.4, q = 1, which is precisely what we wanted to prove,

namely dim N = l.

3.6 Definition. With an eye toward the future and a certain

dislike of repetition, we define the phrase “the usual n

 

dimensional situation" to be the following:

33

K is an n dimensional subspace of mg“ with

basis b1,...,bn: n < m: s E K*~\{0} where K‘ =

{x e mm\(x|k) = o Vk e K}; D = (b1,...,bn,s) =
(r1,...,rm)T where ri is the 1&2. row of the
m x(n+l) matrix D: B = {x 6 K@(s)| "xnp g l};

o<p<1.

3.7 Definition. Given "the usual n dimensional situation",

 

a point z E B is called a corner point or simply a corner
T
)

 

if dim{x E IRn+1}Ax = 0} =1 where A = (r. ,...,r.
11 1k

.9

and {i1,...,ik} = {i121 = 0}.

Observe that if we are working in "the usual n
dimensional situation" and if z is a corner point, then at
least n coordinates of 2 must be zero. Using the same

. . . . +
notation as earlier, Since dim{x e nfll'

(Ax = 0} = l, A
must have at least n rows, and hence 2 must have at least
n coordinates equal to zero.

We next show that the solution of problem (P*) must
be a corner point, and that there are only a finite number of
such points. We will have then reduced our problem to that
of finding and checking a finite number of points.

Recall that problem (P*) requires us to find 2 e
K©(s), ”zllp = 1, such that (z‘s) = max[(w}s)}w E K@(s),
"w”p = l}, i.e. find 2 E B such that K + 2 supports B

at 2. Using this formulation of problem (P*) and assuming

that z is a solution, we see that all of the hypotheses of

34

Theorem 3.5 are satisfied, so 2 must be a corner point.

Thus we have proven

3.8 Theorem. Given ”the usual n dimensional situation"
and that z solves problem (P*), then 2 is a corner

point.

3.9 Corollary. Under the hypotheses of Theorem 3.8, 2

has at least n coordinates equal to zero.
Proof: This is just the observation made earlier.

3.10 Corollary. If, in addition to the hypotheses of Theorem
3.8, D satisfies the Haar Condition, i.e. eadh n4-l rows
of D are linearly independent, then a solution 2 of

problem (P*) has exactly n coordinates equal to zero.

Proof: We again use the notation of Theorem 3.5. Since
dim{x E nfrH1|Ax = O} = l, the Haar Condition forces A to
be an n x(nn+l) matrix. Hence 2 has exactly n coordin-

ates equal to zero.

3.11 Corollary. Assume that ”the usual n dimensional situ-
ation" holds, that n = m-l, and that ‘Sj‘ = max{|si‘ ‘i =

l, ...,m}. Then a solution of problem (P*) is z = ej sgn 83"
where ej is the usual unit basis vector in ng‘.
Proof: By Corollary 3.9, 2 must have n = m-l coordinates

equal to zero, and since qup = 1, 2 must be one of the

vectors :ei, 1.5 i Sim. (:eils) = isi is clearly maximized

35

by ej sgnsj where |sjl = maxHi| |i = 1,...,m}. Hence
ej sgn sj solves problem (P*) .
This case in which KG (3) = mm and the trivial
case where K = {O} for which 2 = i p solves problem
HSH
P

(P*) are rare in that they can be completely solved directly
with little effort. Most of the others, as we shall see,

require considerably more work.

3.12 Lemma. Suppose that we have "the usual n dimensional
situation”, and that x,y are corner points with x = DB,
y = DY, I = {i‘xi = O}, J = {j}yj = O}, and J C I. Then

x =;:yu

PIOOf: Let M = {u E mn+l ‘(u‘ri) = 0’1 6 I} and N = {V €1Rn+l

1
(v}rj) = 0,3 6 J}. Since J c I, M c1N. And since x and y
are both corner points, dim M = dim N = 1. Thus M = N a (5).

Finally, “x“p = “yup = 1 implies that x = :3n

3.13 Theorem. In "the usual n dimensional situation”, there
are at most (111:) different corner points. Moreover, if D
satisfies the Haar Condition, then there are exactly (:2) dif-
ferent corner points. (By different corner points we mean

that if z is a corner point, then -z will not be considered

as a corner point also.)

Proof: Recall that z is a corner point if and only if

1Rn+1

dim{x E ‘(x‘ri) = 0 Vi such that zi = O} = 1, where

36

ri is the ith row if the matrix D given in Definition

3.6. Since there are TE?(:;) ways of choosing at least
n but no more than m-ifn coordinates of a point z E nf‘
to be zero, there are at most m-l(l;) corner points, i.e.
the number of corner points is figite.

Let the set of (2) distinct n element subsets
1. Let c(l),...,c(q) be

a complete enumeration of all the different corner points.

of {1,...,m} be denoted by B

We shall show that to each corner point c(i) we can assign
a distinct I E El establishing that the number of different
corner points is at most (2) .

For i = 1,...,q, define G1 = [jlcj(i) = 0}, where
cj(i) is the jth coordinate of c(i). We now define Fi
and Ei’ i = 1,...,q, iteratively by setting Fi = [ I E E1]
I chi} and Ei+l = Eia\Fi. We claim that Fi #'¢, 1 =
1,...,q. For suppose the claim were not true. Then for

some k, 1.5 k g_q, Fk = dﬂ Since c(K) is a corner point,

dim{x 6 ﬁfﬁq’}(x}rj) = O,j 6 Gk} = 1. Thus there must be

n linearly independent rows of D, say ri, i 6 I = {il,...,
. . n+1 .

in} c Gk’ such that d1m{x 6 Hi [(x|ri) = 0,1 6 I} = 1.
Note that cj(k) = 0 for all j E I since I chk. By
assumption,

¢= Fk = El\(F1U... qu_1).

So I 6 FL for some 1, 1.3 3.3 k-l, i.e. I ClGL and
n+1

hence ci(z) = 0, i E I. Since dim{x 6 Hi ‘(x‘ri) = 0,

37

i e I} = 1 =dim[x e IRn+1}(x}rj)=O,j 6‘52} and I :62,

we conclude that

{x e HJHJ‘}(x}ri) = 0,i 6 I} = {x E HJHJ'}(x}rj) = 0,i 6 GL}'

Moreover, since “c(k)“p = “C(£)Hp, c(k) .ic(£) contradict-
ing the assumption that c(k) and C(2) are different corner
points. Hence Fi #'¢, i = 1,...,q.

By construction, the F1 are mutually exclusive, so
to each corner point c(i) we can assign a distinct I EFi.
Thus there are at most (ii) different corner points.

If D satisfies the Haar Condition, then by Corollary
3.10 each corner point has exactly n coordinates equal to
zero. Each of the (1:) choices of n coordinates from the
m yields an n x(nn+l) matrix A for whidh dim{x E HJHJ}

m

Ax = 0} = l, and hence each of the (r:

) possible choices
produce a different corner point. This completes the proof

of Theorem 3.13.

Corollary 3.9 was first proven by Motzkin and Walsh
[15,Theorem 6] in the following form:

"Let E consist of the real points x1,...,xm

(m'2 n-tl), let F(x) be defined on E, let p
(0 < p < 1) be given, and let the functions

Yl(x),...,Yn+l(x) satlgfy Condition A. Then
every function P(x) 3 Z‘, a Y (x) of best
approximation measured by the deviation

38

m
kzluﬂmxk) - P(xk) (P (pk > 0)

coincides with F(x) in at least n + 1 points

of E."

Condition A says that the m x(n-+l) matrix (Yj(xi))
has rank n+-l.

In the same paper, Motzkin and Walsh observe that
”Theorem 6 implies that every extremal polynomial P(x) is
found by interpolation to F(x) in n4—1 points of E:
there exist but a finite number of polynomials interpolating
to F(x) in n-tl points of E, so every extremal polynomial
can be found merely by comparing their measures of approxi-
mation.”

Without making a further assumption about the matrix
Y = (Yj(xi)), namely that it satisfies the Haar Condition,
there need not be only a finite number of polynomials inter—
polating F(x) at ni-l points of E. The suggested pro—
cedure for finding a polynomial of best approximation,
consequently, need not be finite. Applying Corollary 3.10,
we can prove that the observation of Motzkin and Walsh is
correct if Y satisfies the Haar Condition. When the Haar
Condition is violated, however, we can easily construct, even

3

in HR , counterexamples to the assertion of Motzkin and Walsh.

Suppose m = 3, n = l, F(xl) = F(x2) = 0, F(x3) = Y1(x1)

Yl(x2) = Y2(xl) = Y2(x2) = Y2(x3) = l, and Y1(x3) = -2. The
l 1
matrix Y = (Yj(xi)) = l 1 clearly has rank n-tl = 2,

-2 1

39

so Condition A is satisfied. For any a 6 EL
= - +
P(x) avl(x) 032(x)

satisfies P(xl) = P(x2) = 0, so that P(x) interpolates
F(x) in n-tl = 2 points of E. Clearly, there are an
infinite number of these interpolating polynomials showing
that the observation of Motzkin and Walsh is incorrect.

The solution of the given approximation problem is
indeed included among all those functions which interpolate
F(x) in at least n-tl points, but this collection of inter—
polating functions need not be finite as claimed by Motzkin
and Walsh. Of course, this counterexample is possible only
because Y does not satisfy the Haar Condition. By Theorems
3.8 and 3.13, we are guaranteed that the given approximation
problem can always be solved in a finite number of steps.

Before proceeding any further toward a solution of
problem (P*), we pause to consider some of the difficulties
that lie ahead. Essentially we want to find a point on an
n4-1 dimensional cross section of the LP unit ball in n5‘
at which a translation of a fixed subspace is tangent to the
ball. So far we have reduced the problem to one of consider—
ing only a finite number of points. As one can see in the
following figures, these points correspond closely to corners
of a polyhedron, hence the name corner points. In the figures,
the corner points have been connected by straight lines rather
than by the curved arcs that one would obtain when the 2

unit ball is intersected with the specified plane K.

Iljllllwvllc'llll‘l I‘ll!

 

 

Figure 2(a). m = 3

p = 1.0, .9, .8, ..., .2

 

 

 

 

 

 

0 10
K = span -1 , l
10
Figure 2(b). m = 5, p = 25
FF-l) [301‘
0 1 l
K= span! 1 , 2 F
2 3
K LBOJ LOJJ
Figure 2(c). m = 5, p - 2
r (so) (07‘
l -5
K = spané 5 , -l P
S -l
K LOJ LSOJ/

 

 

 

 

 

 

41

Our approach to the problem up to now has been sim-
ilar to the idea behind linear programming. In linear pro-
gramming problems, a linear functional defined on a finite
dimensional space is maximized subject to certain linear
constraints on the same space. In this case, one also pro—
ceeds by reducing the number of points at which the solution
may occur from an infinite number to a finite one. In linear
programming, however, the points one is left with actually
are corners of a convex polyhedron. To solve the linear
programming problem, one moves from one corner to an adjacent
one always increasing (at least not decreasing) the value of
the objective function, until a corner point is reached where
all adjacent corners give no higher values for the objective
function. By the convexity of the polyhedron, one then con-
cludes that such a point is in fact a solution of the linear
programming problem. The advantage of such an algorithm is
that one may not have to check all of the corner points all
of the time. In practice, the number of points actually com-
puted and checked is usually far less than the total number
of corners, though there are examples where the usual simplex
algorithm would end up checking all of the corner points.

If possible, we would like to develop a similar
exchange type algorithm to obtain a solution of problem (P*).
There is no difficulty in moving from one corner to another con—
tinually increasing the value of the objective function until
a point is reached all of whose neighbors yield no higher

value of the objective function. The crucial step at which

42

we encounter trouble is trying to conclude that such a point
solves problem (P*). Since the 2p unit ball is not con-
vex, some of the corner points often lie inside the convex
hull of all of the corner points. In figure 2(a), as p
decreases the cross section of the unit ball changes from
convex to non convex. Figures 2(b) and 2(c) are further
examples of non convex cross sections of various 1 unit
balls. Moreover, it is quite possible that a local maximum
of the objective function may not be a global maximum. In
other words, we may not have solved problem (P*). Figure
2(c) shows a situation in which this may occur

Keeping this obstacle of non convexity clearly in
view, we begin to consider the practical task of actually
solving problem (P*). We now turn to the construction of

algorithms for solving problem (P*).

3.14 Remarks. Given any point z E Kea(s), there exists a
B G HJHJ' such that z = DB, where the definition of D
appears in 3.6. Also, since 3 E K*, (2‘s) = (DB‘s) =

Bn+l(s}s). Problem (P*) requires us to

maximize (w‘s) over all w 6 K® (s), ”WHP = l, i.e.
. . n+1 .
max1mize Bn+1(s|s) over all 5 6 ﬁt , HDBHP==1, i.e.

maximize 5

n+1 over all B e 11?.“1 , HDBHP = 1.

Using this last formulation of problem (P*) and Theorem
3.8, we see that the solution of problem (P*) entails at

most the following:

43

(1) find all those 6 6 HJHJ' for which D5 is
a corner point;

(2) select from this collection the one with the
largest n+1St coordinate, say B. Then

(3) z = DB solves problem (P*).

If D satisfies the Haar Condition, then it is quite

1 such that D5 is a corner

easy to find all 5 6 ﬁg“-
point. From Theorem 3.13 and Corollary 3.10, we know that

the solution of problem (P*) must be among the (1:11) dis-
tinct corner points each of which have n coordinates equal
to zero and m-n coordinates non-zero. Suppose we wish to

find the corner point z for which [i‘zi = 0} = {i1,...,in}.

Select any in+1 e {l,...,m}~.[i1,...,in}, and with A =

T _ _ T
(ri ,...,ri ) , 801ve AX - en+1 Where en+1 — (O,ooo,0,l)
1 n+1
an+1. The Haar Condition insures that x exists and is

. _ x _
unique. Set 5 - “DxﬁT7E . Then 2 — DB has exactly n
P

coordinates equal to zero, viz. zil = --- = zin = 0, and
”zup = 1. In this manner, all of the corner points can be
found.

The above algorithm lacks three important features:
(1) a method for determining if the Haar Condition is satis-
fied: (2) an orderly way of selecting the (2:) points to
be computed; and (3) the ability, at least theoretically, to
ignore some of the corner points some of the time.

The first of these is lacking for the very good reason

that checking for the Haar Condition involves more work than

44

solving the original problem itself. The second shortcoming
will be corrected shortly, and the third problem will be
considered afterwards.

Since the set of points in KGD(s) of unit norm and
having exactly n coordinates equal to zero is identically
the same as the set of all corner points when the Haar
Condition holds, having that condition satisfied seems like
an ideal setting in which to work. The tremendous problem
of verifying that the Haar Condition holds diminishes that
ideal somewhat. An even more damaging blow is leveled by
Lemma 3.12 which says that the greatest number of possible
corner points occurs only when the Haar Condition holds. In
other words, that situation requires the most work to find
all of the corner points since there are more corners to find.
Our algorithms for finding all of the corner points will work
whether the Haar Condition holds or not. In fact, it is

actually faster without that condition present.

3.15 Definition. Let U = {u e N“ l1 _<_ u1 < (12 < < nn 5
m} and T = [1,...,(:)}. Define Y:T -o U by the following
rules. Given t E T,

(1) Set t =t,u

O = O, and i = 1.

O

(2) Find ui E N such that ui-l < ui _<_ m—n+1

ui-l -u
_ m-j i
and 1 5- ti-l ._ 2 (n-i) S (m-i) °
j-l+u.
l-l
ui-l
(3) Set ti = ti-l - Z) (::2 and increment

j=l+u.

45

(4) Repeat steps 2 and 3 until un has been found.

Then Y(t) = u E U where the components u1,...,un of u
v

were found above. We adopt the convention that Z)(-) = 0

. j=u

if u > v.

We must verify that Definition 3.15 makes sense,
i.e. that u can be found as claimed. We begin this task

by proving the following lemma.

3.16 Lemma. Let m,n,i,k E 11, i'g n < m, k.g m-+i-n-l.

m—n+i (m-j m-k
Then Z) . = .
j=l+k n-i) n-1+l

proof: Since (3) = (ati)+(p;1) for all rm; 6 H, p > q,

(nfitl)

(m;E;1) + EIEIi) ’

(”Ekll) [(mﬁklz) ($2311)]

= m;511> + (mgfiz) + ----+<“;f:1) + (3::ii
(

= (“3511) + mgfzz) + ----+(“;f:1) + :zi) ,
m— n+i
= j= =l+k<m

3.17 Proposition. Y is well defined.

Proof: First observe that for any i, 1.3 i's n, if ui

exists then it is unique. For suppose that both u,v satisfy

46

the conditions for ui specified in (2) of definition 3.15,

and suppose p < v. Then

v-l
lSti-l" 21103)’
j=l+ui_

L11 - - —'
= t. - j=1-E1Hmmj) (m'l )4 3'=(1+1n m ’3

- “'25 (“‘"3 - 12:95.0

._ n-l
j—l+ui_1

IA
n
P.
l
H

. ui-l _. m-
31nce l < t. — Z} 9 < ( P) by hypothesis. This
- 1—1 ._ n-i —- n-i
j-1+ui_1

contradiction leads us to conclude that if ui exists then
it is unique.

We show inductively that the ui exists. Since uo==0,
we first construct ul. For this we must show that for some

integer k, 0 < k g m-n + 1,

<(m-k

lSt-jalm:1)'<'

m m- -n+l m-'
By hypothesis, t.g (r1)’ and by Lemma 3.16, (2 i) = Z‘ (n-i)’
j= —1

Thus there is a smallest integer k, 0 < k g_m-n-+l, such
that

k-l .

k .
m-j m-j
3.231 n_1) < t _<_ 331 n_1) . (3.17.1)

If k > 1, the left inequality of (3.17.1) follows from the
k-l .

definition of k. If k = 1, then 2) §:i) = 0, and by
i=1

47

hypothesis, t > 0. Thus we conclude from (3.17.1) that

1512— :i': _1)< <05];
which is what we were to show. We let u1 = k proving
the existence of ul. We have already established the
uniqueness of ul.
Suppose that u1,...,ui_1 have been found satis-
fying the conditions specified in Definition 3.15. We next

show that there exists a unique ui, ui_1 < ui g m-n-+i,

such that
u.—l
1
lSti-l' Z lm(:i)— (n:- 1111')
j= Wl+u
By the induction hypothesis on ui_1, i.e. that ui_1 satis-
fies the conditions of Definition 3.15,
u. -1
1-1
-1
2 > <1
1 l 1 2 j=l+ui P(n i+l n—i+l
—u. m-n+i .
And by Lemma 3.16, (m }-1) = 23 (m-? . Thus
n-1+l ._ n-1
j-l+u.
1-1
ti-l 5 .=133 n-i
3 i-l

So there is a smallest integer k, ui _1 < k < m-n-+i, such

that
k— 1 k j
231(n:2)<t. < Z) ('. . (3.17.2)
j=l+u._ 1'1 " j=l+u “‘1)

i-l

48

If k > ui_1 + l, the left inequality of (3.17.2) follows

from the definition of k. If k = ui-l + 1, then
k-l .
. 2) 2:3) = O, and by the induction hypothesis on ui_1,
j=l+ui_1
ti_1‘2 1. Hence from (3.17.2) we have
k-l .
_ ' x m- J m-k
1.3 ti-l .=1fa (n-i ‘3 (n—i
3 1-1

By the uniqueness of u2 already proven, we conclude that
there exists a unique ui satisfying (2) of Definition 3.15.

This concludes the proof of Proposition 3.17.
3.18 Preposition. Y is a one-to-one function.

Proof: Let 1 g t, t’ g (:2) such that Ht) = \Ht’) = 11.

Using the notation of Definition 3.15, observe that
m-u m—u
n _ n _
15tn5(n-n)‘(0 )_1

' ' = ' = = '
implying that tn tn 1. Suppose that tn-k tn—k for

k = O,1,...,i. We shall show that t = t

n-(i+1) n-(i+1)° BY

(3) of definition 3.15,

un—i-l
t ' = t ° + Z (“173) ,
n-(1+l) n—1 j=l+u . 1
n-1-l
un-i-l
= t’ . + Z, (mfj)
n-1 '=l+u 1 ’
J n-i-l

= :
tn-(i+1)

49

Hence tk = t; , k = n,n-—l,...,O. But t = t0 and t6 = t’
so that t = t'. Thus Y is a one-to-one function.

Both T and U have (2:) elements by construction,
and since Y:T 4‘U is one-to-one, it must also be onto.

-1 . .
Hence Y :U 4 T exists. Moreover, by rearranging (3) of

definition 3.15 and iterating, we have

Y'l(U)

u.—l
n 1 m-j
1.+ 23 Z n_i)] ,

1=l j=l+ui_1

v
where we let u0 = O and 23(0) = 0 if p > v.
J=u
3.19 Definition. Imﬁ: T and U be as in definition 3.15.

Define §:U 4 T by Q = Y-l.

 

An element u E U corresponds to the possible corner
point z 6 n91 for which 21 = O, i = u1,...,un. The cor-
respondence U u T is not as arbitrary as it may appear at
first. An example with m = 6, n = 3 appears in Table 3.1.
Notice that the first zero coordinates appear as high as
possible for a point in 351 on the left and proceed down-
ward to the right.

We are now prepared to present an algorithm for solving

problem (P*). Assume that we have "the usual n dimensional

situation".

3.20 Algorithm. Step 1 Set q = 1, pi = 1, B. = o for
. m
1 = 1,...,(11) .

Step 2 Compute Y(q) = (k1,...,k

50

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Q(u)=t= 1234567891011121314151617181920
l l l 1 1 l l 1 1 1.12 2 :2 2 :2 2 £3 3 13 4
u = Y(t) = 2 2 2 2 3 3 3 444 S 23 3 13 4 41 5 44 4 £5 5
3 4 5 6 4 5 6 5 6 £344 5 £5 5 6 (5 S 6 £5 6
O O O O O O O O O (J
O 0 O O O C) O (3 O (3
Corresponding O O O O O O O O O 0
point in mm o o oo o o o o o o
O O (D O O O 0 (D (3 O
O O O (D O C) O O C) O
k -r* J \-——-—~z——-——J ‘-¢——’\VJ
5 4 3 2
(2) (2) Mb)
Table 3.1
Example 1: Compute Y(15). Given: u0 = 0 t0 = 15
Compute: u1 = 2 t1 = 5
u2 = 4 t2 = 2
u3 = 6 t3 = l
2
Y(15) = 4 , which agrees with the table above.
6
Example 2: Compute §(u) for u = (1,4,5)T
u.-l u.-1
n 1 . 3 1 -
Nu) = 1+ .23 . 23 (MPH .23 L. 23 (3-3)]
1-1 j-1i_1+1 1=1 j=l+ui_l
O . 3 . 4 .
2(623)+ 2(61’>+ 23(653)=1+0+(‘i>+(i)+°
i=1 i=2 i=5
= l-+4-+3 = 8, which again agrees with the table above.

§£22_§

Step 4

Step 5

Step 6

Step 7

Step1§

$22.2

§£22_19

§£22_11

Step 12

51

Construct A = E and set I = [k1,...kn}.

Select i E {l,...,m}~\I and form C = (:8).
1

Does C contain n-+l linearly independent
rows? If yes, then go to Step 7. If no,

then continue.
Set A = C, I = I U {i} and return to Step 4.
Solve Ax = O, (ri‘x) = 1.

Compute Dx and “Dxlli’D/p = Y, Where D is

given in definition 3.6.

x sgn x
Set 5 - l—Eill and z(q) = —————§:l-Dx.

q _ Y Y
Let J = {j‘(Dx)j = 0}.

Form all possible sets containing exactly

n different elements of J.

For each set {j1,...,jn} found in Step 10,
with 1,5 j1 < j2 < ... < jn g;m and j =

. n . _ =
(Ji)i=l’ compute 0(3) — t and set pt 0.

m

If Pi = O, i = 1,...,(n

), then go to Step
13. Otherwise, let q be the smallest in—
teger k, l _<_ k _<_ (1:1) such that pk = 1.

Return to Step 2.

52

Step 13 Select k, l _<_ k _<_ (1‘), such that Bk =
maxfBi|l _<_ i < (2)]. Then z(k) solves
problem (P*) and max{(s‘w) 1w 6 109(3),

Hwnp = l} = Bk(s\s).

In Step 5, one must eventually answer the question
in the affirmative since the row rank of D equals the column
rank of D which is n-rl by hypothesis. The question it-
self can be answered in a number of ways. For example, one
might orthogonalize the rows of C and check whether any
zero rows occur. This method will also help When step 7 is
reached since one then knows which rows of C yield a non—
singular matrix .6 with which to solve Gx = en. Steps 10
and 11 are present to exploit Lemma 3.12 which says that
some of the original (3:) possible corner points may in fact
be redundant. In Step 9, one need not save all of the Bi
and z(i) but only the current largest Bi and the corres-
ponding z(i) which would make Step 13 unnecessary.

Algorithm 3.20 solves problem (P*) for any choice
of positive integers m,n with m > n. The price paid for
this flexibility is a rather complex algorithm involving a
considerable amount of index manipulation. In the case where
n is very small or nearly equal to m, one can avoid much
of this work by deve10ping special algorithms designed to
solve only problems with a particular fixed choice of dim K.
As with algorithm 3.20, Corollary 3.9 provides the basis for

each algorithm.

53

Recall that if dim K = n, then 2, the solution of
problem (P*), has at most m-—n nonzero coordinates, and
z E K59(s) implies that 2 must also satisfy m-(n4-l)
orthogonality constraints. For n = m-l,m-2,m-3, these

facts lead to simpler algorithms for solving problem (P*).

3.21 Algorithm. Corollary 3.8 has already given the solution
2 when n = m- 1. In that case, 2 = ej sgn sj , where ej
is the usual unit basis vector in ﬁg! and j is determined

by \sj‘ = maxf‘si‘ (1'3 1,3 m}.

3.22 Algorithm. Suppose that n = m-2 and that a E
. T T
(K€B(s))*‘\{O}. With 3 = ($1,...,sm) , a = (a1,...,am) ,

and z = (21,...,zm)T, problem (P*) reduces to

Maximize f.., 1.3 i, j g_m, i ¥’j,

1)
f.. = 5.2. + 5.2., (3.22.1)
13 1 1 J 3
subject to O = a.z. + a.z., (3.22.2)
1 1 j j
= P P
1 (21‘ + ‘23.) . (3.22.3)

Let i and j be fixed. Then

aizi‘ = ajzj‘, by (3.22.2),

‘ai‘P‘zi‘P

lag-11°12,- 1" ,

‘ajﬁ’u- (zi1P), by (3.22.3).

Solving for ‘zi‘ we find that

54

 

 

 

13-!
J
lz.‘ = . (3.22.4)
1 1/
(‘ai|p + laj]p) p
Similarly,
1a.!
1
|z.[ = . (3.22.5)
J p p l/p
((ai( + ‘aj( )
Noticing in (3.22.1) that fij is maximized by taking
sgn zi = sgn si and sgn zj = sgn Sj’ we conclude that for
fixed i,j the maximum value of fij is
‘ajsi‘ + ‘aisj
.. = —- . (3.22.6)
13 p p 17p
(‘31) + [aj‘ )
And the corner point z for which (z‘s) = fij has coordin-
ates

‘aj\ sgn si

1 (‘ai‘p + ‘aj)p)l/p ,

z. = ‘ai‘sgnsi

1 .9
J (‘ai‘P+ ‘aj‘P) /P

2k 0, k#1,j, lgRSm.

We now compute the (1;) values f... If f is the largest
ij pv

of the fij’ then the solution of problem (P*) is

z = z e + z e
u H v v

55

3.23 Algorithm. In a similar manner, the

(P*) can be found directly when n =

solution of problem

m-3. Let (al,...,am)T,

(bl,...,bm)T E (I<@(s))‘L be linearly independent. Problem
(P*) can be stated as
Maximize gijk, 1 _<_ i,j,k _<_ m,i 713's! k r’ i,
gijk = szi + sjzj + Skzk (3.23.1)
subject to O = a. + ajzj + akzk, (3.23.2)
O = bizi + bjzj + bkzk’ (3.23.3)
_ P P P
1 — (21‘ + ‘zj( + [zk| . (3.23.4)
Let i,j,k be fixed. Then (3.23.2) and (3.23.3) can be re-
written, after eliminating the 2k from (3.23.2) and the zj
term from (3.23.3), as
A.z. + A.z. = 0, (3.23.5)
1 1 3 j
Bizi + Bkzk = 0, (3.23.6)
ai ak aj ak Aj Ai
bi bk bj bk bj hi
and Bk = bkA We have
P P - P P
‘A1‘ ‘21) — ‘Aj‘ ‘zj‘ , by (3.23.5),
(3.23.7)
P P _ P P
[Bi‘ \zi] - |Bk| )zk) , by (3.23.6).

56

Also

‘7B
u

p p p P p p
AjBk AjBk‘ ‘21) + (AjBkl lzj‘ + AjBk‘ (21“ , by (3.23.4),

Bk)p‘zi‘p+ ‘Ain‘p‘zi‘p+ \Aj311p)zilp, by (3.23.7),

_ p p p
- ((AjBk\p+ (Ain‘ + Ajaﬂ )‘zi] . (3.23.8)

Let D—(ABP+ABP+ABP)1/p tht

“)jk‘ 1k) lji) :50 a

(A. Bk)
(2. 1 =-—4l——- , by (3.23.8).

. . (Ain( |A.Bi(
Similarly, one can show that lzj‘ =-——Er—— and ‘Zk‘ =-——%r——.

From (3.23.1), it is clear that for i,j,k fixed, gijk is
maximized by taking sgn 2H»: sgn SH , p = i,j,k, with the
maximum value being

|Aj Bk M‘-+‘A Bk 3. ]‘-+ A. B. 13k)

 

gijk= D
The corner point z for which (z|s) = gijk has coordinates
zi = ‘AjBkI sgn Si/D ,
zj - |Ain‘ sgn sj/D ,
zk = lAjBi‘ sgn sk/D ,

21' = o, z e {l,...,m}\{i,j,k}.

We now compute the (1;) values gijk corresponding to the
values of the objective function at all of the corner points.
If gMW is the largest of the gijk’ then the solut1on of

problem (P*) is

57

= + .
z zxex+zueH zvev

As these special algorithms indicate, the amount of
work required to find the solution of problem (P*) by this
technique increases rapidly as dim KL increases. One could,
however, try the same general approach for small values of

n = dim K.

3.24 Algorithm. If K is one dimensional, say K = (a)
)T

 

where a = (31,...,am , then 2, the solution of problem
(P*), by Corollary 3.9, has at least one coordinate equal
to zero. We can also assume that (a1) + 131‘ {'0 for

l g_i g_m since otherwise the entire problem takes place

in IRm-k, k > 0. Problem (P*) can be stated as

Maximize f. l < i'g m,

 

 

 

 

i’ ‘—
fi = (aka + 515‘s), (3.24.1)
subject to O = aiai + Bisi’ (3.24.2)
l:= Maia + Bisﬂp . (3.23.3)
If :31 = 0, then ai = o by (3.24.2), and hence |le =
“s”.1->1/p by (3.24.3). If ai = 0, then Bi = O by (3.24.2),
- -1/p _ aiai
and hence ai‘ — ”anp . If aisi f'O, then Bi - - Si
a.a. .
_ _ 1 1 = P _._i
by (3.24.2), so that l - "aia 81 sup (Oi) ”a Si snp
a. s.
- _1 l/p _ ‘ 1‘
by (3.24.3). Thus (0‘11- 1/ua - s. sup — 17p .
1 Has. - a.su
‘a.‘ 1. 1 p
. . . _ i
Similarly, if aisi ¥'O, ‘Bi‘ — 1/5 . In all three
llasi - ais”p

cases,

58

 

 

s. a.
‘ai‘ = ‘ 1‘ l/p 3 and ‘Bi‘ = ‘ 1‘ lfp (3.24.4)
”as; ' aisnp H331 ‘ aisnp
Since (a‘s) = 0, fi = Bi(s|s). To maximize fi’

we will choose sgn Bi = 1 whenever Bi {'0. Thus by (3.24.2)
we conclude that sgn ai = —sgn(siai) if ai ¥ 0, and of no

interest if ai = 0 since Bi = 0 then. Thus by (3.24.4)

—si sgn ai ‘ai‘
Ct. = 1/ , and Bi =
”as. — a.su P
1 1 p

 

l/p '

H331 ' aiSHp

By evaluating the m Bi's, we have essentially - up
to a factor of (s|s) - evaluated the objective function at
all of the corner points. Consequently, if 5“ is the largest

of the 6i, then the solution of problem (P*) is
z = a a + B s.
H H

Algorithm 3.20 and, to a lesser degree, the four
special algorithms just described have two distinguishing
features - one good and the other bad. On the one hand, they
always work, i.e. they give the correct solution of problem
(P*). On the other hand, algorithm 3.20 in particular can
involve a tremendous amount of work since every corner point
must be computed. Consequently, unless m and n are fairly
small numbers or the Haar Condition is so flagrantly violated
that the actual number of corner points is reasonably small,
algorithm 3.20 does not represent a computationally feasible

method for finding the solution of problem (P*).

59

3.25 Definition. Let x,y be corner points with xi = O,
iEI,yi=O, iEI’, xjalo jeJ, yjaKO jEJ’, and
I U J = I’ U J' = {1,...,m}. We say that x and y are
adjacent if {ri‘i 6 I n 1’} contains n-l linearly inde-
pendent vectors.
The idea behind this definition is most readily seen
if we assume that the Haar Condition holds since otherwise
the idea can be easily lost among the subscripts. In this
situation, x,y being adjacent (corner) points implies that
I and I' both have exactly n elements and [ri]i E I n 1’}
contains n-—1 linearly independent row vectors, i.e. I n 1'
contains exactly n-l elements. Thus there is an i 6 I
and jEI' such that I=JU {i}\[j} and I’=Iu{j}\{i}.
In terms of coordinates, all but one of the zero coordinates
of either x or y is also a zero coordinate of the other.
It should be noted that adjacent points can be much
farther apart than one might expect a term like adjacent to
allow. For example, if K is a one dimensional subspace,
then each two corner points are adjacent. This follows im—
mediately since corner points need only have one coordinate

equal to zero.

3.26 Definition. A corner point z is a local solution of

 

problem (P*) if (z‘s).2 (x‘s) for all x adjacent to z.
It follows from the definition of adjacent corner
points that there can exist corner points which are not ad-
jacent. Consequently, a local solution of problem (P*)

need not be a solution of problem (P*).

60

We can now remedy the shortcoming of excessive com—

putations found in algorithm 3.20 by presenting an exchange

algorithm similar to that used in linear programming. The

solution found in this manner may, however, only be a local

solution of problem (P*). As always, we assume "the usual

n dimensional situation".

3.27 Algorithm.

Step 2.

Step 3.

Step 4.

Step 5.

Step 6.

Step 7.

We claim that

‘we show that

N
2

~
2

Step 1. Find a corner point z = DB, and

set I = {i‘zi = 0}.

Select n linearly independent rows

ri1,...,rin of D w1th 11,...,1n 6 I.

Pick u 6 [il,...,in}.
Relabel the ril,...,rin as p1,...,Pn

with En = r“.

Orthogonalize the 5

and (ii) for i
i-‘l (5. ‘pi)

jéa (lepj5 pj.

Pick k 6 {l,...,m}‘\I.

II
N
u
b
:1
'0
I“
ll
'0 I
I

Set B - YknDYk" sgn(Yk)n+l, where

Y _ (pnlrk) B

k - pn - (B‘rks

= D3 is a corner point. To verify this,

satisfies the definition of a corner point.

61

By construction, HEMP = 1. Let J = {j‘Ej = O} and N =

{x e IRn+1\(x|rj) = O,j 6 J}. We need to show that dimN=l.
By construction, {ri,...,ri ,rk} are n4—l linearly inde—
pendent vectors. {il,...,i:,k}\\{u} c J, so dimhrg 1. But
dim N‘p 1 since a E N {0}. Thus E = DE is a corner

point.

~

Step 8. (i) If 5h+1 > Bn+1, replace 5 by B and
z by E, and then return to step 2.
Notice that we have n linearly inde—
pendent rows on hand from above.

(ii) If Bn+1 g 6M1,

try another k 6 {1,...,m}\\I until all

return to step 6 and

of these have been tried.

(iii) When all of the k E {l,...,m}~\I have
been tried in (ii), return to step 3 to

choose another p E {1,...,1n}.

(iv) When all of the p E {i,...,in} have
been tried in (iii), 2 is a local solu—

tion of problem (P*) with value Bn+l’

Step 1 can be accomplished in the same manner that
corner points were found in algorithm 3.20. Experience with
a few examples seems to indicate that a good starting corner
point to find in step 1 is that 2 which has zero coordinates
where the coordinates of the vector s are the smallest in abso-
lute value. In many cases, this corner point actually solves

problem (P*).

62

Step 8(i) insures that algorithm 3.27 eventually
terminates since the value of the objective function Bn+l =
(s‘z) is non decreasing and there are only a finite number
of corner points. Lemma 3.12 guarantees that in step 8(ii)
we need only check the n coordinates listed rather than
all of the zero coordinates. The assertion in step 8(iv)
that the point 2 found by algorithm 3.27 is a local solution
of problem (P*) follows directly from the definition of a
local solution.

If dim (K9 (3)) is close to dim IRm, then the
computations involved in choosing n4-l linearly independent
row vectors (step 2) and orthogonalizing them (step 5) can
become tedious. In this case, it may be advantagous to ex-
ploit the fact that when n is nearly equal to m, then
dim{(K® (s))"‘} = m-n-l is quite small. Before presenting
such an algorithm with all of its details, a brief example

clarifying the key ideas involved is sketched.

Assume "the usual n dimensional situation". First
find m-n-l linearly independent vectors bl""’bm-n-1 6
mm which span (K®(s))"', i.e. DTbj = 0 j = 1,...,m—n-l.
Letting x0 = (x|s), we can reformulate problem (P*) as:
maximize x0 subject to the constraints
-x0 + X181 + x232 + --- + xmsm a O,
xlblJ + xzb2j + + xmbhj==0, j = l, .,m—n—l,

63

By relabelling indices if necessary, we can assume that the

linear constraints can be put in the following simpler form:

- -+c x -+----+c x = 0
x0 m—n m—n m m ’
+ x +---+ x =0

xl am-n,l m—n am,l m ’

x -+a x -+----+a x = O

2 m-n,2 m-n m,2 m ’

'° +°°--+a O.

x -+a x x =
m-n-l m—n,m—n-l m-n m,m—n-l m

If we set the n variables x ... = xm = O, we would

m-n+l =
be left With -x0 + Cm—nxm—n = O,
Xj + am-n’ jxm-n = O, J a 1’ ...,m-n-lo

Substituting these restrictions on x into the equation

m-n
- ”“971 -l/b
qup 8 l, we find that ‘Xm-n‘ = (l + -Ef ‘am-n,j|P) ’
j—l.
m-n-l _1/P
and thus we conclude that ‘xol = cm-n‘(l + jEQ ‘am-n,j‘p)

Similarly we could have set all of xm-n through xm equal to

zero except xm 0's kig n, and found that \x

0‘ =

-n+k’
m'n'l p ~1/b
‘Cm-n+k‘(l + $31 ‘am-n+k,j‘ ) in that case. Since we
are interested in maximizing x , choose q to be that sub-

0

script which maximizes this last expression, and call the

O *
max1mum value x0.

and some xi, 1 (”i ng-n-l, we can repeat the above steps

If we now interchange the roles of xq
to obtain another i3. If i3.g x5, then the objective
function has not been increased, so we try another 1 g_i g

m-n-l until all have been considered. When that occurs,

64

x is a local solution of problem (P*) with (x‘s) = x3.

If 21*

O > x3, then set x* = i* and repeat the computations.

O 0

With this general scheme in mind, we present

3.28 Algorithm.

Step 1.

Step 2.

Step 3.

Step 4.

Step 5.

Find m-n—l linearly independent vectors

b

l”°°’bm-n—l 6 n?“ such that DTbj = 0,

j = 1,...,m-n-l.

Select I c {1,...,m} containing m—n-l
indices 11"°"1m-n-l’ so that xi. can

be eliminated from -x0 + (s‘x) = O and

from all but the jth equation of the

system (bk‘x) = O, k = 1,...,m-n-l, in
which xij has coefficient 1. Call the

resulting system -x + (c‘x) = O,

O

(aj‘x) = 0, j = 1,...,m-m—1.

Find k 6 {l,...,m}-I which maximizes

m-n-l _1/b
= P
(l + j§1 lak,j‘ ) .

Set x = ckxk.

For each ij 6 I, form Ij = I U {k}\\{ij}.

Interchange the roles of xi and xk, i.e.

eliminate xk from -x + (c‘x) = O and

0
from all of (ai‘x) = O, i = 1,...,m-n-l,

except where i = j in which xk has

65

coefficient 1. Call the resulting system

"X + (C(j)'X) = 09

O

(ai(j)|x) = O, i = 1,...,m-n-1.

Step 6. Find H e {l,...,m}\Ij which minimizes

m-n-l _1/
(1+ 23 (mp) p

i=1 ‘aH’i

3

XH(J)
* _ . .
and set xj - XH(])Cu(j).

Step 7. (i) If (x; .3 \x*‘ for all ij 5 I, then
the vector x found in steps 2 and 3 is

a local solution of problem (P*).

(ii) If any |x;\ > ‘x*‘, let x;

be the
largest one in absolute value. Set I =

I a. = aj(q), j = 1,...,m-n-1, c = C(q),

q’ 3
k = q, and x* = x3. Return to step 5.
To see that algorithms 3.27 and 3.28 compute the same

local solution of problem (P*), observe the following

eqivalence of steps in the two algorithms:

Algorithm 3.28 Algorithm 3.27
Step 2 Steps 2,5
Step 3 Steps 6,7,8(ii)
Step 5 Step 8(iii)
Step 7(i) Step 8(iv)

Step 7(ii) Step 8(i)

66

Before leaving the subject of algorithms for solving
the LP problem, 0 < p < 1, one unsuccessful attempt to
solve problem (P*) may be of interest. Rather than moving
from vertex to vertex in an exchange algorithm, one might
try to solve a series of problems in which the dimension of
the subspace K changes by one at each step.

Since one can find the global solution if either
KG) (3) = IRm or dim(K®(s)) = 2, hopefully one could start
with the solution to one of those two problems, and by suc-
cessively decreasing or increasing the dimension of K by
I, obtain the solution of the given n dimensional problem.
Any such algorithm would depend upon getting some information
about the solution of an njtl dimensional problem from a
known solution of a related n dimensional problem. In some
sense, the solutions should not be very far apart.

Assume K, K satisfy "the usual n,n-+l dimensional

situations" respectively with K a subspace of R, that z,z

solve problem (P*) for K,K,. that zi = 0, i e I, E. O,

1
161', zjae’o,jeJ, zj¥0,jeﬁ, and IUJ=iu3=
{1,...,m}. We can also assume that the Haar Condition holds

in both cases so that ‘1] = n, ‘I‘ = n4-l, ‘Jl = m-n, [3| =
m-n-l. Under these conditions, it was conjectured that

I CT, or equivalently, 3cJ. In IR3 this means that the
solution of the two dimensional LP problem must lie on one
of the four edges of the unit ball passing through the vertex
that was the solution of the three dimensional problem. Alter—

natively, the solution of the three dimensional problem is one

67

of the two end points of the edge on which the solution of
the two dimensional LP problem lies.
Unfortunately, this conjecture is not true even in

IR3 foreither o<p<1 or p=l.

 

l
3.29 Counterexample. Let p = 1/2, s = [i.ol] , and K =
l

(5);. By corollary 3.11, e solves problem (P*). But

2
.97
if K is taken to be (v), where v = 1.01 , the solution
.02
.009
of problem (P*) is approximately 0 , which is on an
.9

edge that does not have e2 as either of its end points.

11
3.30 Counterexample. Let p l, s = 5%[10] , and K = (s)‘.

4
Corollary 3.11 is true if p = 1 also, so el solves this

three dimensional problem. But if K = (v), where v =
-110 + 66s 3
137 - l77e , ‘with 0 < e < 135 , then the solution of the

-40 + 2613
0

two dimensional problem is l-e , which is on an edge that
s

does not have el as either of its end points.

We conclude this section by pointing out that many
of the results obtained for 0 < p < 1 also hold for the
case where p = 1, thus providing a method of solving the
£1 problem. ‘With p = l,'rheorem 3.1 is a special case of
Theorem 2.4 in [21]. If we alter Lemma 3.4 so that (iii)
reads H + z properly supports B at z, i.e. either f(x) <
f(z) for all x E B\\{z} or f(x) > f(z) for all x e

B~\[z}, then Lemma 3.4 is true for p = 1. In the proof,

68

choose x, 5, E, and e as before, but obtain a contradic-
tion either to the hypothesis that H + 2 supports B at
z or to the strictness of the support. Theorem 3.5 then
follows immediately for p = 1 if we again assume that

H + z prOperly supports B at 2. Theorem 3.5 and all
three of its corollaries, Lemma 3.12 and Theorem 3.13 all
hold as previously stated for p = 1.

Algorithms 3.27 and 3.28 both solve problem (P*)
when p = 1 since the £1 unit ball being convex eliminates
the possibility of finding a local solution that is not a
global solution of problem (P*). Consequently, in the
£1 case, algorithm 3.20 is unnecessary since all three al-
gorithms find the same solution while algorithm 3.20 requires

much more computational effort to do it.

CHA PI‘ER IV

NUMERICAL RESULTS

In this chapter we consider some of the computational
limitations placed upon the algorithms presented in chapters
II and III. Since the main application of algorithm 2.2 is
to the 2p spaces, 1 < p < a, we discuss the computational
difficulties encountered in that context, and then present
numerical results from two examples. Examples of LP approx—
imation, 0 < p,g l, conclude the Chapter.

As we noted in chapter II, in the case of the LP

spaces, 1 < p < a, if y i 0, then

* IYjIP-l
yj = 1 sgny. .
HYIIP'

This particularly simple form for y* makes the evaluation

* . * . .

of yk in step 2 and (yk-akvk) in step 4 of algorithm
2.2 immediate.

The two main computational difficulties occurring
. . . . * _
in algorithm 2.2 are finding Gk such that ((yk-akvk) ‘vk)-0
and solving the original problem (P) once the solution of
problem (P*) has been found. Since values obtained on a

computer are seldom exactly correct, each of these difficulties

69

70

brings up a related one also. How close to Gk is close
enough, and how close to vk = 0 is close enough to call
yk the solution of problem (P*)? Since the questions re-
lated to knowing when problem (P*) has been solved are
the easier, we dispose of those first.

By Theorem 2.4 of [21], if y solves problem (P*),

then

(s y)y

is in the image of A, and x 6 ﬁg! such that Ax = z solves
problem (P).

Let the rank of A be k. Form the m)(k matrix
M composed of the k linearly independent columns

ai ,...,ai of A. Then
1 k
w = (MTM)-1MTz
satisfies
Mw = 2.
Let x 6 n5‘ be given by
w. if i E {il,...,ik]
o if ig{il,...,ik] ,
and let y 6 B9‘ be any solution of

Ay = 0.

71

Then u = x4-y satisfies
Au = z,

and hence u solves problem (P). If the rank of A is
n, then the solution of problem (P) is unique, and in
general, the solutions of problem (P) are the translates
of an n-rank (A) dimensional subspace of 35‘.

The answer to the question of how small vk must
be to accept yk as the solution of problem (P*) is somewhat
dependent upon the matrix A. Since

= Yk ‘ 0‘ka
yk+1 “yk - 0'1.ka ’

1im y - v ) = l and
k“ H k “k k.) ’

”yk" = 1 for all kip 0,

‘(akvk)j| is almost exactly how much the jth coordinate

of yk is changing at each iteration. Consequently, in

our examples the algorithm terminated if

max[‘(akvk)j‘ )l'g j gim} < 10‘7,

and our computed solutions agreed with published solutions
for the same problems.

The related problems of how to compute and how accu—
rately to compute the 0k in step 4 pose the greatest problem
in algorithm 2.2. Where no confusion arises, we drOp the

subscript k since the iteration being considered is usually

I .III III" ‘1‘. (I'illlr II I

l ['1 all! 4' {ll-'1' IIII II. ...-III ‘4' I w

72

irrelevant. We assume throughout the discussion that y,v
are linearly independent since the problem is trivial other-

wise.

Let f(a) ((y-av)*)v)

l m p—l
"y-av”p-l Z)Vj‘y avj‘ sgn(yj avj),
P

 

and define

‘p-lsgn(yj — av.).

J

m
9(a) = jéﬂ Vj‘Yj"aNj

Clearly, f and 9 have the same roots so we consider only

the simpler function 9. By Proposition 2.5, g has a

unique root a and a > 0. Moreover, g(0) > 0 since
*
f(0) = (Y IV) > O

by (2.4.2) and (2.4.3). 9 is clearly continuous since
1 < p < a.

With this information, a number of techniques for
finding the root a of g are available. The following

four methods were used.

1. The method of bisection.
Choose a1 > 0 arbitrarily and compute g(a1). Since g(0)f>0

and the root 0 is unique,

if g(a1) > 0, set 02 = 201 ,

hﬂH
5...;

if 9(01) < 0, set a2 = <1

73

Of course, if g(a1) = 0, then we are finished. Repeat this

procedure until two numbers, say a+ and a7, 'have been

found such that
9(6‘) < o < g(a+).

Then let

If g(a) is positive (negative), replace 0+ (07)
and repeat this bisection of the interval [d+, d']

the root of g is obtained.

2. The secant method.
Begin as in the bisection method by finding d+, a?

that
9(a') < O < g(a+).

Instead of taking the midpoint of the interval [a+
find the point at which the line through the points

9(0*)) and (07,19(07)) crosses the x-axis, i.e.

wmw)-&QW)
a? — d'

a:

 

As above, if g(a) is positive (negative), replace

by a and repeat until the root of g is found.

by 0,

until

such

, d”),

(w

’

0+ (0")

74

3. The secant-bisection method.
After finding 6+, a" as described above, alternate one
iteration of the bisection method and one step of the secant

method. This is similar to Dekker's algorithm.

4. Newton's method.

 

Observe that g is a differentiable function of a except

at those aj such that
y. =a.v., j=l,...,.m.

After checking if any of these aj is the desired root, one
can apply Newton's method to find a such that 9(a) = 0.

Choose a1 > 0 arbitrarily, and compute
9(an)
ah+l = a.n —-§77agy , n = 1,2,...

It should be noted that some care must be taken with
the use of Newton's method. In our examples, it often failed
to locate a for want of a sufficiently good initial guess.
For p small, both the bisection method and the secant method
had trouble converging to the root in some instances. It
occasionally happened that one but not the other had difficul-
ties handling a specific situation. The mixed secant-bisection
method worked quite successfully in these cases enjoying the
benefits of each while avoiding many of their shortcomings.

For values of p roughly between 1.25 and 200, few
difficulties arise with any of these four methods for locating

a. For small or very large values of p, however, 9 can be-

come somewhat unruly.

75

While in theory algorithm 2.2 always solves problem
(P), one should expect a little less in practice. Recall

that

m
"\ p-l
a = V. o-av. s n o-aVo o
9() 33:1 31y] 3) 9 (Y3 J)
The cause of our difficulties is the exponent p-l. If

p is near 1, then

1
‘yj-av.‘p-

.-av. =- .—a.
J sgn(yJ J) sgn(yJ v3)

independent of the magnitude of ‘yj-avj‘. A number that
is supposed to be zero, but because of roundoff errors is

actually 10-15 on the computer, will be greater than .7
after being raised to the p-l power when p = 1.01, and
approximately .966 when p = 1.001. Similarly, when p is

very large,

0 if )yj-'avj' < 1

[.5
H
H

\Y-

J 3

if ‘yj-avj‘

a: if )Yj'avj‘ > 1

The point of these comments is that one should not expect
algorithm 2.2 to be computationally feasible throughout
1 < p < a.

With p near 1, the thirty-two figures of double
precision machine accuracy was sometimes not sufficient to
5

determine a such that (9(a)) < 10— The question of

how accurately one must know 0 becomes of interest at this

76

point. The question will be taken up‘when the actual examples
are discussed.

A second unpleasant feature of the function 9 must
also be considered when p is near 1. We know that yk
converges to some y and that vk converges to 0, but the
behavior of Gk is not known from the theory. Numerical
results indicate that the sequence {ak‘k-Z 0} has two limit
points with {a

2n}
that sometimes differ considerably. For p near 1, one of

and {02n+1} approaching two numbers

the two limits appears to be 0 making those akvk go to
0 rapidly while the other akvk approach 0 more slowly.
At the other end of the range 1 < p < a, our program
came to a halt not because the algorithm was sensitive to
large values of p, which it is, but because the computer
can not store exponents that are too large. More to the
point is that we could not compute the LP norm when p
grew too large.
Two examples were programmed in FORTRAN IV for a
CDC 6500. The first is taken from Barrodale and Young [4].

The linear system to be solved is

x = 1.52
x + y = 1.025
x + 2y = 0.475
x + 3y = 0.01
X + 4y = - .475

x + By = -l.005

128
64
32
16

1.5

1.499972
1.499944
1.499890
1.499894
1.500651
1.503757
1.514762

1.520005
1.520126
1.520215
1.520187

1.520037

1.5200001

1.52

77

—.499955
-.499909
-.499817
-.499671
-.499637
-.500057

—.502571

-.503800
-.5037866
-.503744
-.5036206
-.5033896

-.503333

-.503

Table 4.1

.025

.0251982
.0253977
.025801
.0266422
.0285313
.032874

.043216

.050790
.0532758
.056436
.060543
.0659792

.0701003

.073

Iterations

26
21
21
21
23
53

30
45
47
37
27

364

78

The best LP approximate solution was computed for
p = l.04,l.06,...,1.5,2,4,8,...,128.

The search for Gk was terminated when (9(a)) < 10-6 or
after 32 iterations of the mixed bisection-secant method
whidhever occurred first. All of the solutions together
were computed in less than forty seconds. Some of the solu—
tions are listed in Table 4.1.

The second example appears in Cheney [6, p.44]. The

overdetermined system of linear equations is

x + y = 3

x - y = l
x + 2y = 7
2y + 4y = 11.1
2x + y = 6.9
3x + y = 7.2

This system poses special difficulties because the solution
of the ‘1 problem is not unique. All points on the segment
joining

P1 = (1.77,1.89) and P2 = (2.516667,1.516667)

solve the £1 problem with a minimal ‘1 error vector of
length 4.7.

The LP problem was solved for

p = l.06,l.08,...,1.5,2,4,6,20,40,100,400.

Ia H H‘ F4 +4 (A H H‘ F‘ +4 be H

Ul

.42
.38
.34

.26
.22
.18
.14
.10

.06

2.0883483
2.0889511
2.0893571
2.0895666
2.0896032
2.0895121
2.0893464
2.0891500
2.088926

2.0884417
2.0880841
2.0872254

Table 4.2

79

1.7400827
1.7365094
1.7337896
1.7319498
1.7309038
1.7304580
1.7303686
1.7304290
1.7305371
1.7307791
1.7309614
1.7313904

Iterations

15
16
16
14

10

34
22
11
13
13
23

80

The computed results for pip 1.5 agree with those published
by Cheney [6] and Duris [9]. For 1.06 g p < 1.5, no pub—
lished figures are available, but as p decreases (see Table
4.2), the solution of the LP problems approximately equals
a solution of the £1 problem.

For 1 < p < 1.06, algorithm 2.2 consistently obtained

solutions about
Q = (2.04615,l.75193),

which is quite far from the computed limit [1] of the solutions
of the LP problems as p 4 l, (2.0883,l.7309). It should
be noted, however, that Q is an ‘1 solution of the problem

since

Q = 1P1 + (1-x)P , 'where l = .369839.

The problem of determining how accurately one must
know a in step 2 was particularly troublesome with this

example. The smaller [9(a)] is forced, the longer the al-

gorithm takes and the greater is the possibility that the
computer is not capable of locating the desired a. To solve
the £1.06 problem, it was necessary to know a such that
[9(a)] < 10-24 in each iteration, and the computations took

28.5 seconds. The inability of the computer to store numbers

exactly and the presence of many solutions of the ‘1 problem
near the unique solution of the LP problem apparently teamed
up to render algorithm 2.2 ineffective for values of p less

than about 1.06.

81

On the topic of 2p approximation when 0 < p.g 1,
algorithm 3.20 was programmed in FORTRAN IV for a CDC 6500,
and two examples were studied.

The example taken from Cheney [6, p.44] that we dis-

cussed earlier was tested with
p='i—O", n=1,2,...,lO.

For p = l, the algorithm found both corner point solutions
mentioned previously. All points on the line segment joining
those two points are also solutions of the £1 problem.

For 0 < p < 1, however, the original problem has the unique
solution (2.516667,1.516667). For each case, the algorithm
took less than a second to compute the solutions.

For a second example, we chose K to be the subspace

S l
spanned by the single vector 1 ,, and took b to be 0
-5 1

The intersection of K9 (5), where s = b since b E KL,

. . - _ z i
and the three dimenSional 1p unit ball for p - 10,,10,...,
%% are shown in Figure 2(a) of Chapter III. 3 runs dia-
gonally from the upper left to the lower right passing through
the corner points shown. As Table 4.3 indicates, the solution

of problem (P*) "jumps" from the corner along 5 to the

ones On either side for p a .777 as the picture indicates.

.78

.775

.4545

.4383

.4160

.4107

.4093

.3856

.3442

.2886

.2163

.1291

.0433

.0014

.5000

.4629

.4204

.4112

.4089

.3715

.3150

.2500

.1768

.0992

.0313

.0010

.4545

.4383

.4160

.4107

.4093

.3856

.3442

.2886

.2163

.1291

.0433

.0014

82

.5000

.4629

.4204

.4112

.8185

.7711

.6883

.5772

.4327

.2582

.08665

0

.002891 -.0002891

Table 4.3

.0000

.08185
.08185
.07711
.07711
.06883
.06883
.05772
.05772
.04327
.04327
.02582
.02582
.008665
.008665

.0002891

.5000

.4629

.4204

.4112

.8185

.7711

.6883

.5772

.4327

.2582

0

.08665

0

.002891

0

2.000

2.000

2.000

2.000

1.998
1.998
1.949
1.949
1.896
1.896
1.861
1.861
1.845
1.845
1.848
1.848
1.873
1.873
1.923

1.923

BIBLIOGRA PHY

BIBLIOGRA PHY

[ l] Abdelmalek, N.H., Linear Ll approximation for a
discrete point set and L1 solutions for over-
determined linear equations, J. ACM 18 (1971),
41—47.

[ 2] Abdelmalek, N.H., On the discrete linear Ll approx-
imation and L1 solutions of overdetermined linear
equations, J. Approximation Theory 11 (1974), 38-53.

[ 3] Anton, H. and Duris, C.S., An algorithm for best approx-
imate solutions to Av = b in normed linear spaces,
J. Approximation Theory 8 (1973), 133-141.

[ 4] Barrodale, I. and Young, A., Algorithms for best L1

and L6° linear approximations on a discrete set,
Numer. Math. 8 (1966), 295-306.

[ 5] Barrodale, I. and Roberts, F.D.K., An improved algorithm
for discrete £1 linear approximation, SIAM J. Numer.

Anal. 10 (1973), 839-848.

F 6] Cheney, E.W., Introduction to Approximation Theory,
McGraw Hill, New York, N.Y. 1966.

r7] Cline, A.K., Rate of convergence of Lawson's algorithm,
Math. Comp. 26 (1972), 167—176.

[ 8] Descloux, J., Approximations in Lp and Chebyshev
approximations, J. Soc. Indust. Appl. Math. 11 (1963),

r“:

9] Duris, C.S., An algorithm for solving overdetermined
linear equations in the LP-sense, Mathematical
Report 70-10, Drexel University, 1970.

[10] Duris, C.S. and Sreedharan, V.P., Chebyshev and Ll-
solutions of linear equations using least squares
solutions, SIAM J. Numer. Anal. 5 (1968), 491-505.

[ll]

[12]

[13]

[14]

[l6]

[17]

[18]

[19]

[20]

[21]

[22]

The cal-
Comput.

Fletcher, R., Grant, J.A.,
culation of linear L

J. 14 (1971), 276-279.

and Hebden, M.D.,
approximations,

Fletcher, R., Grant, J.A., and Hebden, M.D., Linear
minimax approximation as the limit of best LP-

approximation, SIAM J. Numer. Anal. 11 (1974),
123-136.

Hoel, P.G., Certain problems in the theory of closest
approximation, Amer. J. Math. 57 (1935), 891-901.

Lawson, C.L., Contributions to the Theory of Linear
Least Maximum Approximation, Ph.D. Thesis, University
of California, Los Angeles, 1961.

Motzkin, T.S. and walsh, J.L., Least pth power poly—
nomials on a real finite point set, Trans. AMS 78
(1955), 67-81.

Motzkin, T.S. and Walsh, J.L., Polynomials of best
approximations on a real finite point set. I, Trans.
AMS 91 (1959), 231-245.

Motzkin, T.S. and Walsh, J.L., Polynomials of best
approximation on an interval, Proc. N.A.S. 45 (1959),
1523-1528.

Rice, J.R. and Usow, K.H., The Lawson algorithm and ex-
tensions, Math. Comp. 22 (1968), 118-127.

Sreedharan, V.P., Solutions of overdetermined linear
equations which minimize error in an abstract norm,

Sreedharan, V.P., Least squares algorithms for finding
solutions of overdetermined linear equations which
minimize error in an abstract norm, Numer, Math. 17
(1971), 387-401.

Sreedharan, V.P., Least squares algorithms for finding
solutions of overdetermined systems of linear equa-
tions which minimize error in a smooth strictly
convex norm, J. Approximation Theory 8 (1973), 46-61.

Stiefel, E., Note on Jordan elimination, linear program-
ming, and Tchebycheff approximation, Numer. Math. 2
(1960), 1-17.

»‘i]]]]]i]]i“

8

"([11]