COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION By Mazin Abdulrasool Hameed A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Electrical Engineering 2012 ABSTRACT COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION By Mazin Abdulrasool Hameed The problem of finding a unique and sparse solution for an underdetermined linear system has attracted significant attention in recent years. In this thesis, we compare two popular algorithms that are used for finding sparse solutions of underdetermined linear systems: Orthogonal Matching Pursuit (OMP) and Least Angle Regression (LARS). We provide an in-depth description of both algorithms. Subsequently, we outline the similarities and differences between them. OMP and LARS solve different optimization problems: OMP attempts to find an approximate solution for the l0-norm minimization problem, while LARS solves the l1-norm minimization problem. However, both algorithms depend on an underlying greedy framework. They start from an all-zero solution, and then iteratively construct a sparse solution until some convergence is reached. By reformulating the overall structure and corresponding analytical expressions of OMP and LARS, we show that many of the steps of both algorithms are almost identical. Meanwhile, we highlight the primary differences between the two algorithms. In particular, based on our reformulation, we show that the key difference between these algorithms is how they update the solution vector at each iteration. In addition, we present some of the salient benefits and shortcomings of each algorithm. Moreover, we exploit parallel processing techniques to speed-up the convergence of the algorithms. ACKNOWLEDGMENTS First of all, I would like to express my deep gratitude to my advisor Professor Hayder Radha for his guidance, inspiration and encouragement. Without his invaluable advice and support, this work would not be possible. In addition, I would like to thank the government of my country (Iraq) represented by ministry of higher education and scientific research for granting me the scholarship to study in the United State and supporting me financially during my study in Michigan State University. Furthermore, I would like to thank my colleague in WAVES lab Mohammad Aghagolzadeh for helping me a lot in this work. Also, I am thankful to my friends Wahhab Albazrqaoe and Mohammed Alqizwini. Their support and encouragement are appreciated. Finally, a special thanks to my parents and my wife for their encouragement and support throughout my study. iii TABLE OF CONTENTS LIST OF TABLES ........................................................................................................................ v LIST OF FIGURES ..................................................................................................................... vi ABBREVIATIONS .................................................................................................................... viii NOTATIONS................................................................................................................................ ix 1 Introduction ............................................................................................................................. 1 1.1 Background ......................................................................................................................... 
1 1.2 Contributions....................................................................................................................... 5 1.3 Thesis Organization ............................................................................................................ 6 2 Greedy Pursuit Algorithms .................................................................................................... 8 2.1 Matching Pursuit (MP) ....................................................................................................... 8 2.2 Orthogonal Matching Pursuit (OMP) ............................................................................... 11 2.3 Stagewise Orthogonal Matching Pursuit (StOMP) ........................................................... 15 3 Least Angle Regression (LARS) .......................................................................................... 17 3.1 Basic LARS Algorithm ..................................................................................................... 17 3.3.1 Derivation of Updated Direction ................................................................................ 20 3.1.2 Derivation of Step Size ............................................................................................... 21 3.2 Modified LARS for solving the LASSO problem ............................................................ 27 4 Comparative Analysis of OMP and LARS ......................................................................... 32 4.1 Algorithmic Steps ............................................................................................................. 32 4.1.1 Reformulating step for updating the active set ........................................................... 35 4.1.2 Reformulating step for updating the solution vector .................................................. 36 4.2 Performance Analysis ....................................................................................................... 41 4.2.1 Convergence time ....................................................................................................... 41 4.2.2 Accuracy ..................................................................................................................... 42 5 Simulation Results and Discussion ...................................................................................... 48 5.1 Reconstructing images from their compressive samples .................................................. 48 5.2 Using parallel processing to speed-up convergence of the algorithms ............................. 55 5.3 Impact of measurement size and sparsity on MSE ........................................................... 58 5.4 Variation of the solution and correlation coefficients over iterations............................... 62 5.5 Study the algorithms performance through three dimensions (3D) examples .................. 67 6 Conclusion and Future Work .............................................................................................. 72 Appendix A .................................................................................................................................. 75 BIBLIOGRAPHY ....................................................................................................................... 78 iv LIST OF TABLES Table 4.1: The algorithmic steps of OMP and LARS................................................................... 
34 Table 4.2: The original step for updating the active set and the reformulated one in OMP and LARS ............................................................................................................................................ 36 Table 4.3: The original step for updating the solution vector and the reformulated one in OMP and LARS...................................................................................................................................... 39 Table 4.4: The algorithmic steps of OMP and LARS after reformulating ................................... 40 v LIST OF FIGURES Figure 1.1: A sparse solution of an underdetermined linear system ............................................... 3 Figure 2.1: The MP algorithm ...................................................................................................... 10 Figure 2.2: The OMP algorithm.................................................................................................... 12 Figure 2.3: OMP approximates the vector 𝑦 by using the columns π‘Ž1, π‘Ž2 in Example 2.1 ......... 14 Figure 2.4: The StOMP algorithm ................................................................................................ 16 Figure 3.1: LARS approximates the vector 𝑦 by using the columns π‘Ž1, π‘Ž2................................. 20 Figure 3.2: The basic LARS algorithm ......................................................................................... 26 Figure 4.1: Columns of the matrix 𝐴 and the measurement vector 𝑦 in Example 4.1 .................. 43 Figure 4.2: The selection and updating steps at the first iteration of OMP and LARS in Example 4.1.The approximation change is obtained via multiplying the matrix 𝐴 by the current solution update vector Ξ”π‘₯𝑑 .......................................................................................................................... 45 Figure 4.3: The selection and updating steps at the second (last) iteration of OMP and LARS in Example 4.1. The approximation change is obtained via multiplying the matrix 𝐴 by the current solution update vector Ξ”π‘₯𝑑 ............................................................................................................ 46 Figure 4.4: The solution coefficients of the reconstructed vector π‘₯ over iterations of OMP and LARS in Example 4.1, and Euclidean norm of error between the original sparse vector π‘₯ and the reconstructed vector π‘₯ after the algorithms terminate .................................................................. 47 Figure 5.1: Reconstruct a patch of an image from its compressive samples by using OMP or LARS ............................................................................................................................................ 50 Figure 5.2: Reconstructed images from their compressive samples by using OMP and LARS with π‘š = 32 and 𝑛 = 64.............................................................................................................. 53 Figure 5.3: Reconstructed images from their compressive samples by using OMP and LARS with π‘š = 16 and 𝑛 = 64.............................................................................................................. 54 Figure 5.4: The convergence time of OMP and LARS against the number of used processors .. 
56 Figure 5.5: The ideal and actual speedup ratios for the OMP algorithm against the number of used processors ............................................................................................................................. 57 Figure 5.6: Average MSE of 100 trials between the original sparse vectors and the reconstructed ones by OMP and LARS as a function of measurement size π‘š for different sparsity π‘˜ .............. 60 vi Figure 5.7: Average MSE of 100 trials between the original sparse vectors and the reconstructed ones by OMP and LARS as a function of sparsity π‘˜ for different measurement size π‘š .............. 61 Figure 5.8: The solution coefficients that correspond to the active columns over iterations of OMP and LARS ............................................................................................................................ 63 Figure 5.9: Absolute correlation of columns of the matrix 𝐴 in some iterations of OMP ............ 65 Figure 5.10: Absolute correlation of columns of the matrix 𝐴 in some iterations of LARS ........ 66 Figure 5.11: The columns of the matrix 𝐴 and the measurement vector 𝑦 in Example 5.1 ......... 68 Figure 5.12: The approximation change (π΄βˆ†π‘₯) at each iteration of OMP and LARS in Example 5.1, where πœ‡ = 0.5124, and the approximation change = 𝐴 Γ— βˆ†π‘₯ .............................................. 69 Figure 5.13: The columns of the matrix 𝐴 and the measurement vector 𝑦 in Example 5.2 ......... 70 Figure 5.14: The approximation change (π΄βˆ†π‘₯) at each iteration of OMP and LARS in Example 5.2, where πœ‡ = 0.8885, and the approximation change = 𝐴 Γ— βˆ†π‘₯ .............................................. 71 vii ABBREVIATIONS CS Compressed Sensing DCT Discrete Cosine Transform LARS Least Angle Regression LASSO Least Absolute Shrinkage and Selection Operator MP Matching Pursuit MSE Mean Square Error OMP Orthogonal Matching Pursuit PSNR Peak Signal to Noise Ratio StOMP Stagewise Orthogonal Matching Pursuit viii NOTATIONS π‘₯ Solution vector which is usually sparse π‘š Size of the measurement vector 𝑦 𝑦 𝑛 π‘˜ 𝐴 π‘Žπ‘– 𝐼 𝐼𝑐 𝑑 π‘Ÿπ‘‘ 𝑐𝑑 π‘₯𝑑 βˆ†π‘₯ 𝑑 𝑑𝑑 𝛾𝑑 πœ†π‘‘ Observation or measurement vector Size of the solution vector π‘₯ Sparsity (number of nonzero entries of the solution vector π‘₯ ) Projection matrix of size π‘š Γ— 𝑛 𝑖 π‘‘β„Ž column of the matrix 𝐴 Active set Inactive set Iteration counter Residual vector at iteration 𝑑 Correlation vector at iteration 𝑑 Current solution vector at iteration 𝑑 Solution update vector at iteration 𝑑 (i.e. βˆ†π‘₯ 𝑑 = π‘₯ 𝑑 βˆ’ π‘₯ π‘‘βˆ’1 ) Updated direction of LARS at iteration 𝑑 Step size of LARS at iteration 𝑑 The largest absolute entry in the correlation vector at iteration 𝑑, and represents the absolute correlation of the active columns in LARS ix Chapter 1 1 Introduction 1.1 Background Various applications in science and nature can be modeled as an underdetermined linear system which is characterized by having fewer equations than unknowns. Therefore, finding a solution for an underdetermined linear system is a central issue for a wide range of areas and applications. Some of these areas include: Compressed Sensing (CS) [1,2,3,4,5], error correction [6,7], minimum distance problems in coding theory [8], and a wide range of inverse problems [9]. In this thesis, we focus on presenting the problem of finding a solution for an underdetermined linear system in the context of the area of compressed sensing. 
In that context, instead of sensing a signal using a high sampling rate (e.g. the Nyquist rate), CS senses a compressive representation of that signal using a number of measurements that is smaller than the signal dimension. As a result, CS generates fewer measurements than traditional signal sampling methods that have been based on the Nyquist criterion for decades. Consequently, the CS framework leads to solving the following underdetermined linear system: 𝑦 = 𝐴π‘₯ (1.1) Here, 𝑦 ∈ 𝑅 π‘š is the measurement vector (compressive samples), 𝐴 is a known π‘š Γ— 𝑛 projection matrix with π‘š < 𝑛, and π‘₯ ∈ 𝑅 𝑛 is an unknown vector that represents the signal that we need to find. In the system of Equation (1.1), the number of unknowns is more than the number of equations. Therefore, there are either infinitely many solutions, or no solution if the vector 𝑦 is 1 not in the space spanned by the columns of the matrix 𝐴. To avoid the case of having no solution, we assume that the matrix 𝐴 is a full rank matrix, and its columns span the whole space 𝑅 π‘š . In practice, and for virtually all real applications, a unique solution for this system is required. It has been well-established under the area of compressed sensing and related literature that if the signal π‘₯ has a sparse representation (includes only a few nonzero entries) in some space (transform domain), it can be recovered uniquely [2,4,10,1,11,3,12]. To find the sparse solution for the underdetermined linear system, the following optimization problem should be solved: min β€–π‘₯β€–0 π‘₯ 𝑠𝑒𝑏𝑗𝑒𝑐𝑑 π‘‘π‘œ 𝐴π‘₯ = 𝑦 (1.2) where β€–π‘₯β€–0 is l0-norm, which represents the number of nonzero entries in a vector π‘₯ (i.e. β€–π‘₯β€–0 = {# π‘œπ‘“ 𝑖 ∢ π‘₯(𝑖) β‰  0}. When β€–π‘₯β€–0 β‰ͺ 𝑛 , π‘₯ is consider to be sparse. Solving the optimization problem characterized by Equation (1.2) yields to a sparse solution that represents the given vector 𝑦 as a linear combination of the fewest columns of the matrix 𝐴. Suppose that the vector π‘₯ has only π‘˜ nonzero entries with π‘˜ < π‘š < 𝑛. To find such π‘˜ nonzero entries of the vector π‘₯, π‘˜ columns of the matrix 𝐴 that best approximate the vector 𝑦 should be identified. For example in Figure 1.1, when shaded columns are recognized, nonzero shaded coefficients of the vector π‘₯ can be computed. The l0-norm minimization problem expressed by Equation (1.2) is an NP-hard problem [13]. To obtain a global solution, we need to examine the feasibility of all 2 𝑛 sparsity patterns of the vector π‘₯. Hence, the computation complexity of solving problem (1.2) is 𝑂(2 𝑛 ), which is not practical when 𝑛 is very large. As a result, some efforts in this area tend to find an approximate solution with tractable complexity by tolerating some error: 2 min β€–π‘₯β€–0 π‘₯ 𝑠𝑒𝑏𝑗𝑒𝑐𝑑 π‘‘π‘œ where πœ– > 0 is a small constant. ‖𝐴π‘₯ βˆ’ 𝑦‖2 < πœ– (1.3) The problem characterized by Equation (1.3) is commonly known as β€œsparse approximation” [13]. π‘š Γ— 𝑛 0 0 * 0 0 0 * * 0 0 0 0 π‘₯ Matrix 𝐴 𝑛 = 𝑦 π‘š * : nonzero Figure 1.1: A sparse solution of an underdetermined linear system Because l0-norm is a nonconvex function, one approach for finding a suboptimal solution of the problem (1.2) is relaxing the l0-norm by an l1-norm which is convex function [14,15,16]: min β€–π‘₯β€–1 π‘₯ 𝑠𝑒𝑏𝑗𝑒𝑐𝑑 π‘‘π‘œ 𝐴π‘₯ = 𝑦 (1.4) In the signal processing literature, the l1-norm minimization problem (1.4) is known as β€œBasis Pursuit” [14]. 
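Before turning to the algorithms, the setting of Equations (1.1)-(1.4) can be made concrete with a short sketch. The following lines (written in Python/NumPy purely for illustration; the simulations in Chapter 5 use MATLAB, and the dimensions and random data here are arbitrary) build an underdetermined system 𝑦 = 𝐴π‘₯ with π‘š < 𝑛, draw a π‘˜-sparse vector π‘₯ with π‘˜ < π‘š, normalize the columns of 𝐴 to unit l2-norm (the normalization assumed in Chapter 2), and report the l0-norm of π‘₯.

    import numpy as np

    rng = np.random.default_rng(0)

    m, n, k = 8, 20, 3                        # fewer equations (m) than unknowns (n), sparsity k < m
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)            # unit l2-norm columns (as assumed in Chapter 2)

    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)       # k-sparse solution vector

    y = A @ x                                 # measurement vector (compressive samples), Equation (1.1)

    print("l0-norm of x:", np.count_nonzero(x))   # equals k, i.e. ||x||_0 << n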
The solution that is obtained by solving problem (1.4) is equivalent to the solution of l0-norm minimization problem (1.2) under some conditions [17,18,19,20,21]. The l1-norm minimization (1.4) is a convex optimization problem, and therefore, it can be solved globally and efficiently by adopting some convex techniques such as the interior point method [22]. 3 Another general direction for finding a sparse solution of the underdetermined linear system (1.1) is to use greedy algorithms [23]. There has been a wide range of greedy algorithms proposed in the literature for finding sparse solutions for the CS problem. One popular family of greedy algorithms, which received a great deal of attention, is the family of greedy pursuit algorithms. Some of these algorithms include: Matching Pursuit (MP) [24], Orthogonal Matching Pursuit (OMP) [25,26,27], Stagewise Orthogonal Matching Pursuit (StOMP) [28], Regularized Orthogonal Matching Pursuit (ROMP) [29], Orthogonal Complementary Matching pursuit (OCMP) [30], Gradient Pursuit (GP) [31], and Compressive Sampling Matching pursuit (CoSaMP) [32]. Another popular framework, which can be considered greedy in nature, for solving the underdetermined system (1.1) is Least Angle Regression (LARS) [33]. Before elaborating further on both of these families of algorithms, namely matching pursuit and least angle regression, we briefly highlight the overall strategy used by these and other greedy algorithms. First, it is worth emphasizing that one of the primary motivations for pursuing greedy algorithms is that they are faster than convex techniques. However, greedy algorithms do not always yield as an accurate solution as convex techniques do. Being iterative, a greedy algorithm usually makes the choice that appears the best at that particular iteration of the algorithm. In essence, a greedy algorithm makes a locally optimal choice at each iteration with the hope that this choice will lead to a globally optimal solution. In other words, these algorithms iteratively make one greedy choice after another until the final solution is reached. Thus, the choice made by a greedy algorithm may depend on past choices, but not on future choices. Equally important, greedy algorithms, in general, never change their pervious choices that are made so far. 4 To find a solution for the optimization problem (1.2), greedy algorithms typically begin from an all-zero solution, and then iteratively build a sparse solution by selecting a set of columns from the matrix 𝐴 that best approximates the vector 𝑦. Furthermore, such algorithms usually update the solution π‘₯ at each iteration in a way that depends on the selected columns up to that point. This update leaves a residual signal, which represents the part of the measurement vector 𝑦 that have not been used to recover the solution vector (up to that point); and hence, this residual will be used at the next iteration. These steps are repeated until the current residual falls below a certain very small value. 1.2 Contributions In this thesis, we focus on analyzing and comparing the two, arguably most popular greedy algorithms that are used for finding a sparse solution of an underdetermined linear system: Orthogonal Matching Pursuit (OMP) and Least Angle Regression (LARS). OMP approximates the l0-norm minimization problem (1.2), while LARS solves the l1-norm minimization problem (1.4). 
Although they solve different optimization problems, we show that both OMP and LARS have almost identical steps after reformulating the underlying analytical expressions of these algorithms. The significant difference between them is how they update the solution vector. Even then, we illustrate some of the subtle similarities and differences between their update strategies; and try to shed some light on their strengths and weaknesses in that respect. More specifically, our key contributions include: β€’ Providing a comparative analysis of OMP and LARS in terms of how they find a sparse solution of an underdetermined linear system. Comparing such popular and widely used 5 algorithms is important to their understanding that could lead to better and more efficient implementations. β€’ Providing a thorough insight and detailed derivation of computing the updated direction vector and step size parameter that are used in LARS. Prior work, including the original LARS paper [33] did not provide the same level and detailed analysis that is presented in this thesis. β€’ Reformulating some algorithmic steps of OMP and LARS to discover similarities and differences between them. β€’ Presenting some of the benefits and shortcomings of OMP and LARS. β€’ Simulating the comparative analysis of OMP and LARS through different examples covering a wide range of cases. 1.3 Thesis Organization The rest of the thesis is organized as follows: β€’ In chapter 2, we review three greedy pursuit algorithms that are used for finding a sparse solution of an underdetermined linear system, and concentrate on explaining the OMP algorithm. We start from the basic MP algorithm, which is considered the ancestor of OMP. Then, we describe OMP in detail. We end the chapter by explaining StOMP, which is a modified version of OMP. β€’ In chapter 3, we review how LARS can be used for finding a sparse solution of an underdetermined linear system [34]. First, we describe the basic LARS algorithm [33] and the derivations of its main steps. Then, we explain how LARS can be modified to solve the well- 6 known Least Absolute Shrinkage and Selection Operator (LASSO) problem [35], which can be defined as a constrained version of ordinary least square for solving the l1-norm minimization problem. β€’ In chapter 4, we compare OMP and LARS in terms of their algorithmic steps and performance. We reformulate some steps of OMP and LARS to discover the similarities and differences between them. To study the performance of OMP and LARS, we compare the convergence time and solution accuracy in each algorithm. β€’ In chapter 5, we simulate the comparative analysis of OMP and LARS using MATLAB. In general, we go through different examples to demonstrate the similarities and differences between OMP and LARS from various perspectives. First, we employ OMP and LARS to reconstruct some images from their compressive samples. Afterward, we exploit parallel processing techniques to speed-up convergence of the algorithms, and observe the efficiency of using different number of processors. Next, we examine the performance of OMP and LARS in terms of mean square error (MSE) as a function of the measurement size π‘š and the sparsity π‘˜. To study the difference in updating process of OMP and LARS, we examine and plot the coefficients of the solution and correlation vectors over iterations. Note that the correlation vector represents the correlations of columns of the matrix 𝐴 with current residual vector π‘Ÿ. 
At last, we generate two different examples, and utilize OMP and LARS to reconstruct the sparse vector in each example. At each iteration of the algorithms, we plot the updating process in three dimensions (3D) view. β€’ Finally, in chapter 6, we state the conclusion of this thesis and some potential future work. 7 Chapter 2 2 Greedy Pursuit Algorithms Greedy pursuit algorithms attempt to find an approximate solution for the l0-norm minimization problem (1.2). In this chapter, we review three greedy pursuit algorithms that are used for finding a sparse solution of an underdetermined linear system, and concentrate on explaining the OMP algorithm. First, we explain the basic MP algorithm which is considered the ancestor of OMP. Then, we describe OMP in detail. We end this chapter by explaining StOMP, which is a modified version of OMP. 2.1 Matching Pursuit (MP) Matching Pursuit (MP) [24] is a basic iterative greedy algorithm that searches for a sparse solution of an underdetermined linear system. The fundamental vector in MP is the residual vector π‘Ÿ ∈ 𝑅 π‘š , which represents the remaining part of the measurement vector 𝑦 after updating the solution vector. Henceforth, we assume that the columns of the matrix 𝐴 are normalized to have unit l2-norm (i.e. β€–π‘Ž 𝑖 β€–2 = 1, 𝑖 = {1,2, … . , 𝑛}, where π‘Ž 𝑖 is the 𝑖 π‘‘β„Ž column of the matrix 𝐴). MP starts from an all zero solution, and initializes the residual with the measurement vector 𝑦 (i.e. π‘Ÿ0 = 𝑦). Then at each iteration 𝑑, MP computes the correlation vector by multiplying columns of the matrix 𝐴 with the residual vector π‘Ÿ from the previous iteration as follows: 𝑐 𝑑 = 𝐴 𝑇 π‘Ÿ π‘‘βˆ’1 (2.1) Next, it selects a column from the matrix 𝐴 that is highly correlated with the current residual π‘Ÿ. Indeed, the MP algorithm selects a column from the matrix 𝐴 that corresponds to the entry in the 8 vector 𝑐 𝑑 that has the largest magnitude (i.e. the largest entry in the vector 𝑐 𝑑 is determined by considering the absolute value of each entry in the vector 𝑐 𝑑 ). This can be expressed by the following equation: 𝑖 = arg max |𝑐 𝑑 (𝑗)| (2.2) π‘₯ 𝑑 (𝑖) = π‘₯ π‘‘βˆ’1 (𝑖) + 𝑐 𝑑 (𝑖) (2.3) where 𝑖 is the index of the selected column. 1≀j≀n Afterward, the 𝑖 π‘‘β„Ž solution coefficient π‘₯(𝑖) associated with the selected column is updated: The last step of MP is computing the current approximation to the vector 𝑦 via multiplying the matrix 𝐴 by the current solution vector π‘₯ 𝑑 , and finding a new residual vector by subtracting the vector 𝑦 from the current approximation: π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 (2.4) The residual vector π‘Ÿ 𝑑 will be used in the subsequent iteration (𝑑 + 1) to calculate a new correlation vector 𝑐 𝑑+1, and select a new column. These steps are repeated until the norm of the residual π‘Ÿ 𝑑 falls below a certain very small value (πœ–). The MP algorithm can be summarized in Figure 2.1. 9 β€’ Input: the vector 𝑦 ∈ 𝑅 π‘š , the matrix 𝐴 ∈ 𝑅 π‘šΓ—π‘› , and the termination threshold for the residual norm πœ–. β€’ Output: the sparse vector π‘₯ ∈ 𝑅 𝑛 . β€’ Task: approximate the vector 𝑦 by using the fewest columns of the matrix 𝐴. 1. Initialization: π‘₯0 = 0, π‘Ÿ0 = 𝑦, 𝑑 = 1 2. Compute the correlation vector: 𝑐 𝑑 = 𝐴 𝑇 π‘Ÿ π‘‘βˆ’1 3. Find a column index of the matrix 𝐴 that is best correlated with the current residual vector. 𝑐 𝑑: This can be achieved by determining the index of the largest absolute entry in the vector 𝑖 = arg max |𝑐 𝑑 (𝑗)| 1≀𝑗≀𝑛 4. 
Update the 𝑖th solution coefficient: π‘₯ 𝑑 (𝑖) = π‘₯ π‘‘βˆ’1 (𝑖) + 𝑐 𝑑 (𝑖) 5. Compute the new residual vector : π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 6. If β€–π‘Ÿ 𝑑 β€–2 < πœ–, terminate and return π‘₯ = π‘₯ 𝑑 as the final solution. Else, increase the iteration counter: 𝑑 = 𝑑 + 1 and return to step 2 Figure 2.1: The MP algorithm The MP algorithm is simple and intuitive for finding a sparse solution of an underdetermined linear system. However, it suffers from slow convergence because it requires a large unbounded number of iterations to find a solution. The computational complexity of the MP algorithm is 𝑂(π‘šπ‘›π‘‡), where 𝑇 is the number of iterations that are required for MP to coverage to the final solution. 10 2.2 Orthogonal Matching Pursuit (OMP) In many applications, MP is not practical since its complexity increases linearly with the number of required iterations (𝑇). Therefore, MP was revised to limit the number of required iterations by adding an orthogonalization step. The modified MP is known as Orthogonal Matching Pursuit (OMP) [25,26,27]. OMP inherits many steps from MP. The primary modification that is made by OMP is using a least square formula to achieve better approximation to the measurement vector 𝑦 over the selected columns. OMP keeps indices of the selected columns in a set called the active set 𝐼. Therefore, the selected columns are called the active columns. Like MP, OMP starts from an all-zero solution, and initializes the residual to the measurement vector 𝑦. At each iteration, OMP selects a column from the matrix 𝐴 that is best correlated with the residual vector π‘Ÿ. Then, OMP appends the index of the selected column to the active set. The next step is to find the active entries (the entries that correspond to the active set 𝐼) of the solution vector π‘₯ by solving the following least square problem over the active columns: π‘₯ 𝑑 (𝐼) = arg min‖𝑦 βˆ’ 𝐴 𝐼 𝑣‖2 2 𝑣 (2.5) Here, 𝐴 𝐼 is a sub-matrix that is formed by the active columns, and π‘₯(𝐼) is a sub-vector which holds the active entries of the vector π‘₯. By using well-known linear algebra techniques, problem (2.5) can be solved by projecting the vector 𝑦 onto the space spanned by the active columns. This can be achieved by solving for π‘₯ 𝑑 (𝐼) in the following equation: 𝐴 𝐼𝑇 𝐴 𝐼 π‘₯ 𝑑 (𝐼) = 𝐴 𝐼𝑇 𝑦 (2.6) Note that Equation (2.6), which is widely known as the normal equation, represents a linear system with a unique solution. 11 Then, and as in the MP algorithm, a new residual vector is computed using Equation (2.4). These steps are repeated until the norm of the current residual falls below a certain very small value πœ–. When OMP terminates, the vector 𝑦 is spanned by the set of active columns. More importantly, OMP ensures that the residual vector π‘Ÿ is orthogonal to all active columns at each iteration. Consequently, the correlations of the active columns will be zeros at the next iteration. Therefore in OMP, no column is selected twice, and the active set grows linearly with the iteration counter. OMP can be summarized in Figure 2.2. β€’ Input: the vector 𝑦 ∈ 𝑅 π‘š , the matrix 𝐴 ∈ 𝑅 π‘šΓ—π‘› , and the termination threshold for the residual norm πœ–. β€’ Output: the sparse vector π‘₯ ∈ 𝑅 𝑛 . β€’ Task: approximate the vector 𝑦 by using the fewest columns of the matrix 𝐴. 1. Initialization: π‘₯0 = 0, π‘Ÿ0 = 𝑦, 𝑑 = 1, 𝐼 = βˆ… 2. Compute the correlation vector: 𝑐 𝑑 = 𝐴 𝑇 π‘Ÿ π‘‘βˆ’1 3. Find a column index of the matrix 𝐴 that is best correlated with current residual vector. 
𝑐 𝑑: This can be achieved by determining the index of the largest absolute entry in the vector 𝑖 = arg max|𝑐 𝑑 (𝑗)| c π‘—βˆˆπΌ where 𝐼 c is the inactive set (the set have indices of columns of the matrix 𝐴 that are not in the 4. Add 𝑖 to the active set: 𝐼 = 𝐼 βˆͺ {𝑖} active set 5. Solve the least square problem: 𝐴 𝐼𝑇 𝐴 𝐼 π‘₯ 𝑑 (𝐼) = 𝐴 𝐼𝑇 𝑦 6. Compute the new residual vector : π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 7. If β€–π‘Ÿ 𝑑 β€–2 < πœ–, terminate and return π‘₯ = π‘₯ 𝑑 as the final solution. Else, increase the iteration counter: 𝑑 = 𝑑 + 1 and return to step 2 Figure 2.2: The OMP algorithm 12 To further illustrate the OMP algorithm, we consider the following example. Example 2.1: assume that the matrix 𝐴 contains two columns (π‘Ž1 , π‘Ž2 ) as illustrated in Figure 2.3. In this case, the solution coefficients (π‘₯(1), π‘₯(2)) that generate the vector 𝑦 as a linear combination of the columns π‘Ž1 , π‘Ž2 should be found: 𝑦 = π‘₯(1) π‘Ž1 + π‘₯(2)π‘Ž2 OMP starts by determining which column is highly correlated with the current residual. Recall that the residual vector is initialized to the vector 𝑦 at the beginning of the first iteration. In this example, the vector 𝑦 is correlated more with the vector π‘Ž1 than the vector π‘Ž2 . For this reason, OMP selects the vector π‘Ž1 during the first iteration. Then, OMP takes the largest possible step in the direction of the vector π‘Ž1 ; this is achieved by projecting the vector y onto the vector π‘Ž1 to get the current solution coefficient π‘₯1 (1) (red bold line in Figure 2.3). This leaves some error value that is represented by the residual vector π‘Ÿ1. This residual is orthogonal to the vector π‘Ž1 which in this example represents the active set. At the second iteration, a new column is selected which represents the best correlated column with the current residual π‘Ÿ1. In this simple illustrative example, the column π‘Ž2 which was left out from the first iteration, is selected now and its index is added to the active set 𝐼. Next, the vector 𝑦 is projected onto the space spanned by the columns π‘Ž1 and π‘Ž2 to obtain the current solution coefficients π‘₯2 (1) and π‘₯2 (2) (blue bold line in Figure 2.3). Since the vector 𝑦 belongs to the space spanned by π‘Ž1 and π‘Ž2 , there is no residual signal left after the second iteration (i.e. π‘Ÿ2 = 0). In this case, OMP terminates and returns the vector π‘₯2 as the final solution. 13 π‘Ž2 π‘₯1 (1) π‘₯2 (1) π‘₯2 (2) 𝑦 π‘Ÿ1 π‘Ž1 Figure 2.3: OMP approximates the vector 𝑦 by using the columns π‘Ž1 , π‘Ž2 in Example 2.1 The Red bold line represents the approximation to the vector 𝑦 at the first iteration; while the blue bold line represents the approximation at the second iteration. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis. The orthogonalization step expedites convergence of the OMP algorithms to the final solution. Because of this step, the OMP algorithm is faster than the basic MP algorithm. In OMP, a column is selected and added to the active set only once. This implies that there is no chance for the same column to be selected twice since the residual vector has zero correlation with the active columns, because it is always orthogonal to all active columns after updating the solution coefficients. On the other hand, a column may be selected more than once during MP’s computation. Therefore, MP requires more iterations than OMP to converge to the final solution. 
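To make the steps of Figure 2.2 concrete, the sketch below transcribes them into Python/NumPy (an illustration only; the function name omp and the default stopping settings are ours, and the thesis's own experiments in Chapter 5 are written in MATLAB).

    import numpy as np

    def omp(A, y, eps=1e-6, max_iter=None):
        # Orthogonal Matching Pursuit, following the steps of Figure 2.2 (illustrative sketch).
        m, n = A.shape
        x = np.zeros(n)
        r = y.copy()                          # r0 = y
        active = []                           # active set I
        max_iter = max_iter or m
        for _ in range(max_iter):
            c = A.T @ r                       # correlation vector c = A^T r
            c[active] = 0.0                   # active columns are already orthogonal to r
            i = int(np.argmax(np.abs(c)))     # best-correlated inactive column
            active.append(i)                  # I = I U {i}
            # least square step over the active columns: A_I^T A_I x(I) = A_I^T y
            x_I, *_ = np.linalg.lstsq(A[:, active], y, rcond=None)
            x = np.zeros(n)
            x[active] = x_I
            r = y - A @ x                     # new residual
            if np.linalg.norm(r) < eps:       # stopping condition ||r||_2 < eps
                break
        return x

The basic MP of Figure 2.1 differs only in the update step: it would simply add 𝑐(𝑖) to π‘₯(𝑖) and skip the least square solve, which is why the same column may be selected again in later iterations.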
Nevertheless, each iteration of the OMP algorithm requires a higher computational cost than an MP iteration. This is because of the orthogonalization step (solving a least square problem) that is used by the OMP algorithm. In the OMP algorithm, there is a high probability of recovering a π‘˜-sparse signal (one with only π‘˜ nonzero entries) in at most π‘˜ iterations using the vector 𝑦 and the matrix 𝐴 [36]. As a result, the computational complexity of the OMP algorithm is 𝑂(π‘šπ‘›π‘˜). Hence, OMP is an efficient algorithm for finding a sparse solution of an underdetermined linear system when the sparsity π‘˜ is relatively small. Note that the sparsity π‘˜ of any vector refers to the number of nonzero entries in that vector. It is worth noting that, in the statistical literature, there is an algorithm similar to OMP known as "Forward Stepwise Regression" [37].
2.3 Stagewise Orthogonal Matching Pursuit (StOMP)
In many signal processing applications, such as image inpainting [38] and cartoon-texture decomposition [39,40], the underdetermined linear system is extremely large in scale, and the unknown vector π‘₯ is not very sparse (i.e., it includes many nonzero entries). In such cases, OMP is not a practical algorithm for finding the solution vector π‘₯ because its computational complexity increases linearly with the number of nonzeros π‘˜. Therefore, OMP has been revised to find less accurate solutions for such large-scale underdetermined linear systems in a reasonable time. The modified version of OMP is called Stagewise Orthogonal Matching Pursuit (StOMP) [28]. StOMP is characterized by adding several columns to the active set at each iteration instead of a single column as in OMP. StOMP selects the columns whose absolute correlations with the current residual exceed a specific chosen threshold value. Thus, StOMP converges faster than OMP since it requires fewer iterations to find the solution vector π‘₯. The drawback of the StOMP algorithm is the accuracy of the resulting solution. In other words, although StOMP is faster than OMP, the solution calculated by StOMP is less accurate than the one calculated by OMP. It should be clear that StOMP offers a trade-off between the accuracy of the final solution and the time required to find it.
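The selection step that distinguishes StOMP is equally short in code. The sketch below (again illustrative Python/NumPy; the helper name stomp_select and the default s = 2.5 are ours) returns, in a single iteration, every inactive column whose absolute correlation with the current residual exceeds the threshold π‘‘β„Žπ‘Ÿ = π‘ πœŽ with 𝜎 = β€–π‘Ÿβ€–2/βˆšπ‘›, which is defined below in Equation (2.7); the least square update over the enlarged active set is then the same as in OMP.

    import numpy as np

    def stomp_select(A, r, active, s=2.5):
        # One StOMP selection step (sketch): return every inactive column index whose
        # absolute correlation with the current residual exceeds thr = s * ||r||_2 / sqrt(n).
        n = A.shape[1]
        c = A.T @ r                                  # correlations with the current residual
        thr = s * np.linalg.norm(r) / np.sqrt(n)     # threshold rule of Equation (2.7) below
        return [j for j in range(n) if j not in active and abs(c[j]) > thr]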
Find indices of columns that their absolute correlations with the residual vector exceed the threshold value: 𝑖 = { 𝑗: |𝑐 𝑑 (𝑗)| > π‘‘β„Žπ‘Ÿ 𝑑 , 𝑗 ∈ 𝐼 c } where 𝐼 c is the inactive set (the set have indices of columns of the matrix 𝐴 that are not in the 5. Add 𝑖 to the active set: 𝐼 = 𝐼 βˆͺ {𝑖} active set 6. Solve the least square problem: 𝐴 𝐼𝑇 𝐴 𝐼 π‘₯ 𝑑 (𝐼) = 𝐴 𝐼𝑇 𝑦 7. Compute the new residual : π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 8. If β€–π‘Ÿ 𝑑 β€–2 < πœ–, terminate and return π‘₯ = π‘₯ 𝑑 as the final solution. Else, increase the iteration counter: 𝑑 = 𝑑 + 1 and return to step 2. Figure 2.4: The StOMP algorithm 16 Chapter 3 3 Least Angle Regression (LARS) LARS was originally proposed by Efron, Hastie, Johnstone and Tibshirani as a new model selection algorithm to solve overdetermined linear systems [33]. In addition, Efron et al demonstrated how LARS can be modified to solve the known LASSO problem in a more efficient way than traditional convex optimization techniques. Later, Donoho and Tsaig proposed to use the modified LARS to find a sparse solution of an underdetermined linear system, which is known as β€œhomotopy algorithm” [34]. In this chapter, we review and study how LARS can be used for finding a sparse solution of an underdetermined linear system [33,34]. First, we describe the basic LARS algorithm [33]. Then, we provide a thorough insight about the modified LARS for solving the LASSO problem which is considered an effective way for solving the l1-norm minimization problem (1.4) [34]. In addition, we provide a detailed derivation of LARS main steps, which was not explained in the prior work [33,34] at the same level we present here. 3.1 Basic LARS Algorithm In Section 2.2, we have seen that the OMP algorithm relies on solving the least square problem to update the solution coefficients at each iteration. As we stated earlier, OMP adopts the largest possible step in the least square direction of the active columns. Therefore, OMP is considered an extreme fitting method that can be overly greedy. OMP may not select some columns that are significant in terms of accuracy of the targeted sparse solution. This is because these columns are highly correlated with columns that have already been in the active set. To overcome this problem, the Least Angle Regression (LARS) was used to find a sparse solution of an 17 underdetermined linear system [34]. LARS is considered less greedy than OMP since it adopts appropriate step that is fairly smaller than the OMP step. In other words, LARS increases the coefficients of the solution vector that associated with the active set as much as needed (as explained further below). Similar to OMP, LARS starts by initializing: the solution vector π‘₯ with an all-zero, the residual vector π‘Ÿ with the measurement vector 𝑦, and the active set 𝐼 with an empty set. Then, at each iteration, a new column is selected from the matrix 𝐴, and its index is added to the active set. At the first iteration, LARS selects a column, say π‘Ž 𝑗 , that is highly correlated with the residual vector π‘Ÿ. This implies that the residual vector π‘Ÿ has a smaller angle with the column π‘Ž 𝑗 than other columns of the matrix 𝐴. Then, LARS increases the coefficient π‘₯(𝑗) that is associated with the selected column π‘Ž 𝑗 . This causes the absolute correlation value of π‘Ž 𝑗 with the current residual to decrease as π‘₯(𝑗) is increased. 
LARS takes the smallest possible steps in the direction of the column π‘Ž 𝑗 until another column, say π‘Ž π‘˜ , has as much absolute correlation value with the current residual as the column π‘Ž 𝑗 . Now, instead of continuing in the direction of π‘Ž 𝑗 , LARS moves forward in the direction that is equiangular with the selected columns (π‘Ž 𝑗 , π‘Ž π‘˜ ) until a third column, say π‘Ž π‘š , has much absolute correlation value with the current residual as much as π‘Ž 𝑗 and π‘Ž π‘˜ . The procedure is continued to add one column until no remaining column has correlation with the current residual. The name β€œleast angle” comes from a geometrical interpretation of the LARS process, which chooses the updated direction that makes the smallest and equal angle with all active columns. To further illustrate the LARS algorithm, we consider the two-dimensional system that was described in Example 2.1. LARS initializes the solution vector to an all zero and the residual to the measurement vector 𝑦. As shown in Figure 3.1, LARS begins to select the column π‘Ž1 18 because it has absolute correlation with the initial residual (the vector 𝑦) more than π‘Ž2 (i.e. πœƒ1 (1) < πœƒ1 (2), where πœƒ 𝑑 (𝑖) is the angle between the column π‘Ž 𝑖 and the current residual π‘Ÿ at iteration 𝑑). Then, the LARS algorithm proceeds in the direction of π‘Ž1 by adding the step size 𝛾1 (the red bold line in Figure 3.1) which is chosen in a way that makes columns π‘Ž1 and π‘Ž2 have the same absolute correlation with the current residual at the next iteration (i.e. πœƒ2 (1) = πœƒ2 (2)). When the solution vector π‘₯ is updated, the solution coefficient π‘₯1 (1) = 𝛾1. At the second iteration, LARS adds the column π‘Ž2 to the active set, and proceeds in the direction that is equiangular with the columns π‘Ž1 and π‘Ž2 . Because there is no remaining column, LARS adds step size 𝛾2 (the blue bold line in Figure 3.1) that leads to the vector 𝑦. LARS terminates since the residual is zero, and the solution coefficients equals to: π‘₯2 (1) = 𝛾1 + 𝛾2 𝑑2 (1) and π‘₯2 (2) = 𝛾2 𝑑2 (2), where 𝑑2 is the updated direction at the second iteration which is equiangular with the active columns (π‘Ž1 , π‘Ž2 ). In the LARS algorithm, two important parameters are required to be computed: the updated direction vector 𝑑 that should be equiangular with the active columns, and the step size 𝛾 which is a scalar that is multiplied by the updated direction 𝑑 to update the solution vector π‘₯. The remaining steps of the LARS algorithm are roughly similar to the corresponding OMP’s steps that were described in Section 2.2. 19 π‘Ž2 πœƒ1 2 π‘Ž2 πœƒ1 1 < πœƒ1 (2) πœƒ2 2 πœƒ1 1 πœƒ2 1 𝛾1 𝛾2 πœƒ2 1 = πœƒ2 (2) π‘₯2 1 = 𝛾1 + 𝛾2 𝑑2 1 π‘₯2 2 = 𝛾2 𝑑2 (2) π‘₯1 1 = 𝛾1 π‘₯1 2 = 0 𝑦 π‘Ž1 Figure 3.1: LARS approximates the vector 𝑦 by using the columns π‘Ž1 , π‘Ž2 The red bold line represents the approximation to the vector 𝑦 at the first iteration, and the blue bold line represents the change that adds to the previous approximation to obtain the approximation at the second iteration. π‘Ž2 βˆ₯ οΏ½οΏ½οΏ½. πœƒ 𝑑 (𝑖) is the angle between the column π‘Ž 𝑖 and π‘Ž2 the current residual at iteration 𝑑. d2 is the updated direction at the second iteration. 3.3.1 Derivation of Updated Direction In this section, we provide our derivation of computing the updated direction that is used in [34]. The updated direction should form an equal angle with the active columns toward the residual vector π‘Ÿ. 
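Before going through the derivation, the equiangular requirement itself can be verified numerically. The short sketch below (illustrative Python/NumPy with random data; not part of the thesis experiments) solves the system A_I^T A_I d(I) = sign(c(I)) that the derivation arrives at below as Equation (3.6), and checks that every active column makes the same absolute angle with the resulting direction A_I d(I).

    import numpy as np

    rng = np.random.default_rng(1)
    m = 10
    A_I = rng.standard_normal((m, 3))
    A_I /= np.linalg.norm(A_I, axis=0)           # three active columns with unit l2-norm

    r = rng.standard_normal(m)                   # stand-in for the current residual
    c_I = A_I.T @ r                              # correlations of the active columns

    # equiangular direction: A_I^T A_I d(I) = sign(c(I))   (Equation (3.6) below)
    d_I = np.linalg.solve(A_I.T @ A_I, np.sign(c_I))

    v = A_I @ d_I                                # the direction seen in measurement space
    cosines = A_I.T @ v / np.linalg.norm(v)      # cosine of the angle between each active column and v
    print(np.round(np.abs(cosines), 6))          # all entries are equal: the direction is equiangular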
We start the derivation of the update direction 𝑑 𝑑 ∈ 𝑅 𝑛 at iteration 𝑑 by projecting the previous residual vector π‘Ÿ π‘‘βˆ’1 onto the space spanned by the active columns. This is achieved by solving for οΏ½ in the following normal equation: π‘₯ 𝐴 𝐼𝑇 𝐴 𝐼 οΏ½ = 𝐴 𝐼𝑇 π‘Ÿ π‘‘βˆ’1 π‘₯ By using Equation (2.1), the active entries of the correlation vector can be found as follows: 20 (3.1) Substituting (3.2) into (3.1): 𝑐 𝑑 (𝐼) = 𝐴 𝐼𝑇 π‘Ÿ π‘‘βˆ’1 (3.2) 𝐴 𝐼𝑇 𝐴 𝐼 οΏ½ = 𝑐 𝑑 (𝐼) π‘₯ (3.3) 𝑐 𝑑 (𝐼) = πœ† 𝑑 𝑠𝑖𝑠𝑛�𝑐 𝑑 (𝐼)οΏ½ (3.4) At each iteration of the LARS algorithm, the active columns have the same absolute correlation with the residual vector. This correlation value is represented by the parameter πœ† 𝑑 . Consequently, we can factorize 𝑐 𝑑 (𝐼) as follows: By using (3.4), Equation (3.3) can be rewritten as: 𝐴 𝐼𝑇 𝐴 𝐼 οΏ½ = πœ† 𝑑 𝑠𝑖𝑠𝑛 (𝑐 𝑑 (𝐼)) π‘₯ πœ† 𝑑 is a scalar; dividing Equation (3.5) by the value of πœ† 𝑑 leads to: 𝐴 𝐼𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) = 𝑠𝑖𝑠𝑛 (𝑐 𝑑 (𝐼)) (3.5) (3.6) where 𝑑 𝑑 (𝐼) = οΏ½/πœ† 𝑑 is a sub-vector of the updated direction vector 𝑑 𝑑 that holds the entries π‘₯ corresponding to the active set 𝐼. The goal of Equation (3.6) is to calculate the entries of the sub- vector 𝑑 𝑑 (𝐼) which ensures that the absolute correlations of the active columns are declined equally. LARS sets entries that are not in the active set to zero (i.e. 𝑑 𝑑 (𝐼 𝑐 ) = 0). 3.1.2 Derivation of Step Size At each iteration, the LARS algorithm computes the step size 𝛾 to update the solution coefficients that correspond to the active set. This kind of update is necessary to cause a column from the inactive set to be included in the active set at the next iteration. This can be achieved by making the active columns similar to the selected column in terms of the absolute correlation with the current residual vector. 21 Derivation of closed-form expression for calculating the step size 𝛾 was briefly stated in the original LARS paper [33]. Therefore, In this section, we obviously provide our complete and detailed derivation of that expression. We start the derivation of the step size 𝛾 by assuming the LARS algorithm is currently at iteration 𝑑. In this case, the step size 𝛾 𝑑 is required to be calculated. In LARS, the current residual vector is computed in the same way as OMP, which is expressed in Equation (2.4): And the previous residual is obtained by: π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 π‘Ÿ π‘‘βˆ’1 = 𝑦 βˆ’ 𝐴π‘₯ π‘‘βˆ’1 (3.7) (3.8) To find the relationship between the current and the previous residual vectors, we subtract Equation (3.8) from Equation (3.7): Equivalently: π‘Ÿ 𝑑 βˆ’ π‘Ÿ π‘‘βˆ’1 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 βˆ’ 𝑦 + 𝐴π‘₯ π‘‘βˆ’1 π‘Ÿ 𝑑 = π‘Ÿ π‘‘βˆ’1 βˆ’ 𝐴(π‘₯ 𝑑 βˆ’ π‘₯ π‘‘βˆ’1 ) (3.9) π‘₯ 𝑑 = π‘₯ π‘‘βˆ’1 + 𝛾 𝑑 𝑑 𝑑 (3.10) π‘Ÿ 𝑑 = π‘Ÿ π‘‘βˆ’1 βˆ’ 𝛾 𝑑 𝐴 𝑑 𝑑 (3.11) At each iteration, LARS updates the solution vector π‘₯ via adding the multiplication of the scalar 𝛾 𝑑 by the vector 𝑑 𝑑 to the previous solution vector. This is expressed as follows: By substituting (3.10) in (3.9), we obtain: Because 𝑑(𝐼 𝑐 ) = 0, Equation (3.11) can be simplified to: π‘Ÿ 𝑑 = π‘Ÿ π‘‘βˆ’1 βˆ’ 𝛾 𝑑 𝐴 𝐼 𝑑 𝑑 (𝐼) 22 (3.12) The correlation vector c in the next iteration is computed by: By substituting (3.12) in (3.13), we get: 𝑐 𝑑+1 = 𝐴 𝑇 π‘Ÿ 𝑑 𝑐 𝑑+1 = 𝐴 𝑇 (π‘Ÿ π‘‘βˆ’1 βˆ’ 𝛾 𝑑 𝐴 𝐼 𝑑 𝑑 (𝐼)) = 𝑐 𝑑 βˆ’ 𝛾 𝑑 𝐴 𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) (3.13) (3.14) At each iteration of LARS, the parameter πœ† represents the absolute correlation of the active columns. In fact, πœ† represents the largest absolute value over entries of the correlation vector. 
This can be expressed as follows: πœ† = β€–π‘β€–βˆž = |𝑐(𝐼)| where β€– β€–βˆž is the maximum norm, and β€–π‘β€–βˆž ≝ max(|𝑐(1)|, |𝑐(2)|, … … . . . , |𝑐(𝑛)|). (3.15) Hence, πœ† can be computed for the next iteration 𝑑 + 1 as follows: By using Equation (3.14), we get: From (3.6), we have: πœ† 𝑑+1 = |𝑐 𝑑+1 (𝐼)| πœ† 𝑑+1 = |𝑐 𝑑 (𝐼) βˆ’ 𝛾 𝑑 𝐴 𝐼𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼)| 𝑑 𝑑 (𝐼) = (𝐴 𝐼𝑇 𝐴 𝐼 )βˆ’1 𝑠𝑖𝑠𝑛(𝑐 𝑑 (𝐼)) By substituting 𝑑 𝑑 (𝐼) of (3.17) into (3.16), we obtain: πœ† 𝑑+1 = |𝑐 𝑑 (𝐼) βˆ’ 𝛾 𝑑 𝐴 𝐼𝑇 𝐴 𝐼 (𝐴 𝐼𝑇 𝐴 𝐼 )βˆ’1 𝑠𝑖𝑠𝑛(𝑐 𝑑 (𝐼))| πœ† 𝑑+1 = |𝑐 𝑑 (𝐼) βˆ’ 𝛾 𝑑 𝑠𝑖𝑠𝑛(𝑐 𝑑 (𝐼))| Substituting (3.4) in (3.18): πœ† 𝑑+1 = οΏ½πœ† 𝑑 𝑠𝑖𝑠𝑛�𝑐 𝑑 (𝐼)οΏ½ βˆ’ 𝛾 𝑑 𝑠𝑖𝑠𝑛(𝑐 𝑑 (𝐼))οΏ½ 23 (3.16) (3.17) (3.18) πœ† 𝑑+1 = �𝑠𝑖𝑠𝑛�𝑐 𝑑 (𝐼)οΏ½(πœ† 𝑑 βˆ’ 𝛾 𝑑 )οΏ½ πœ† 𝑑+1 = |πœ† 𝑑 βˆ’ 𝛾 𝑑 | (3.19) Recall that at each iteration, the updated direction 𝑑 ensures the absolute correlations of the active columns are declined equally. In other word, the value of πœ† is decreased at each iteration. Moreover, LARS selects the smallest step size that causes a column from inactive set to join the active set at next iteration. Therefore, the value of 𝛾 𝑑 should be positive and less than πœ† 𝑑 . As a result, Equation (3.19) can be simplified to: πœ† 𝑑+1 = πœ† 𝑑 βˆ’ 𝛾 𝑑 (3.20) |𝑐 𝑑+1 (𝑖)| = πœ† 𝑑+1 (3.21) 𝑐 𝑑+1 (𝑖) = 𝑐 𝑑 (𝑖) βˆ’ 𝛾 𝑑 π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) (3.22) We assume the 𝑖 π‘‘β„Ž column is added to the active set at the next iteration 𝑑 + 1. Therefore, the absolute value of its correlation should equal to πœ† 𝑑+1 . where 𝑖 ∈ 𝐼 𝑑𝑐 , and 𝐼 𝑑𝑐 is the inactive set at iteration 𝑑. Using Equation (3.14), the correlation of the 𝑖 π‘‘β„Ž column can be computed by the following equation: By substituting (3.22) and (3.20) in Equation (3.21), we get: �𝑐 𝑑 (𝑖) βˆ’ 𝛾 𝑑 π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼)οΏ½ = πœ† 𝑑 βˆ’ 𝛾 𝑑 Β±(𝑐 𝑑 (𝑖) βˆ’ 𝛾 𝑑 π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼)) = πœ† 𝑑 βˆ’ 𝛾 𝑑 ±𝑐 𝑑 (𝑖) βˆ“ 𝛾 𝑑 π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) = πœ† 𝑑 βˆ’ 𝛾 𝑑 𝛾 𝑑 βˆ“ 𝛾 𝑑 π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) = πœ† 𝑑 βˆ“ 𝑐 𝑑 (𝑖) 𝛾 𝑑 (1 βˆ“ π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼)) = πœ† 𝑑 βˆ“ 𝑐 𝑑 (𝑖) 24 𝛾𝑑 = Let 𝑣 𝑑 = 𝐴 𝐼 𝑑 𝑑 (𝐼) πœ† 𝑑 βˆ“ 𝑐 𝑑 (𝑖) 1 βˆ“ π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) 𝛾𝑑 = πœ† 𝑑 βˆ“ 𝑐 𝑑 (𝑖) 1 βˆ“ π‘Ž 𝑖𝑇 𝑣 𝑑 The LARS algorithm finds the minimum positive step size 𝛾 𝑑 that makes one column from the inactive set 𝐼 𝑐 join the active set 𝐼 at the next iteration 𝑑 + 1. 𝛾 𝑑 = min οΏ½ 𝑐 π‘–βˆˆπΌ πœ† 𝑑 – 𝑐 𝑑 ( 𝑖) πœ† 𝑑 + 𝑐 𝑑 ( 𝑖) , οΏ½ 1 βˆ’ π‘Ž 𝑖𝑇 𝑣 𝑑 1 + π‘Ž 𝑖𝑇 𝑣 𝑑 (3.23) In (3.23), only positive components within every choice of 𝑖 are considered when the minimum is taken. After the updated direction 𝑑 𝑑 and the step size 𝛾 𝑑 are calculated, LARS updates the solution vector as expressed by Equation (3.10), and computes the new residual as expressed by Equation (3.7). These steps are repeated as long as there is still a column from the inactive set that has correlation with the current residual. When no remaining columns have correlation with the current residual (i.e. πœ† approaches zero), LARS terminates and the vector π‘₯ 𝑑 is returned as the final solution. The basic LARS algorithm can be summarized in Figure 3.2. 25 β€’ Input: the vector 𝑦 ∈ 𝑅 π‘š the matrix 𝐴 ∈ 𝑅 π‘šΓ—π‘› , and the termination threshold for the residual norm πœ–. β€’ Output: the sparse vector π‘₯ ∈ 𝑅 𝑛 . β€’ Task: approximate the vector 𝑦 by using the fewest columns of the matrix 𝐴. 1.Initialization: π‘₯0 = 0, π‘Ÿ0 = 𝑦, 𝑑 = 1, 𝐼 = βˆ…. 2.Compute the correlation vector: 𝑐 𝑑 = 𝐴 𝑇 π‘Ÿ π‘‘βˆ’1 3.Compute the maximum absolute value in the correlation vector: πœ† 𝑑 = ‖𝑐 𝑑 β€–βˆž 4. 
If πœ† 𝑑 is zero or approaches a very small value, LARS is terminated and the vector π‘₯ 𝑑 is 5. Find the active set: 𝐼 = {𝑗: |𝑐 𝑑 (𝑗)| = πœ† 𝑑 } returned as the final solution, otherwise the following steps are implemented. 𝐴 𝐼𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) = 𝑠𝑠𝑠𝑠(𝑐 𝑑 (𝐼)) 6. Solve the following least square problem to find active entries of the updated direction: where 𝑠𝑠𝑠𝑠(𝑐 𝑑 (𝐼)) returns the sign of the active entries of the correlation vector 𝑐 𝑑 7. Set the inactive entries of the updated direction to zero: 𝑑 𝑑 (𝐼 c ) = 0 8. Calculate the step size: where 𝑣 𝑑 = 𝐴 𝐼 𝑑 𝑑 (𝐼) 𝛾 𝑑 = min οΏ½ 𝑐 π‘–βˆˆπΌ πœ† 𝑑 – 𝑐 𝑑 (𝑖) πœ† 𝑑 + 𝑐 𝑑 (𝑖) , οΏ½ 1 βˆ’ π‘Ž 𝑖𝑇 𝑣 𝑑 1 + π‘Ž 𝑖𝑇 𝑣 𝑑 9. Update the solution vector: π‘₯ 𝑑 = π‘₯ π‘‘βˆ’1 + 𝛾 𝑑 𝑑 𝑑 10. Compute the new residual vector: π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 11. If β€–π‘Ÿ 𝑑 β€–2 < πœ–, terminate and return π‘₯ = π‘₯ 𝑑 iteration counter: 𝑑 = 𝑑 + 1 and return to step 2. as the final solution. Else, increase the Figure 3.2: The basic LARS algorithm 26 3.2 Modified LARS for solving the LASSO problem In certain circumstances, the l1-norm minimization problem stated in (1.4) can successfully find the sparsest solution while OMP fails [14,27,15]. The l1-norm minimization problem can be solved by using the standard convex optimization techniques. However, the convex optimization methods require heavy computations especially when the underdetermined system is very large. The Least Absolute Shrinkage and Selection Operator (LASSO) [35] was proposed by Tibshirani to solve the l1-norm minimization problem (1.4) efficiently. LASSO minimizes the least square error subject to the l1-norm of the solution vector being less than some threshold (q): min‖𝑦 βˆ’ 𝐴π‘₯β€–2 2 π‘₯ 𝑠𝑒𝑏𝑗𝑒𝑐𝑑 π‘‘π‘œ β€–π‘₯β€–1 ≀ π‘ž (3.24) In other words, LASSO finds the least square solution subject to an l1-norm constraint on the solution vector. LASSO (3.24) can be rewritten equivalently as an unconstrained optimization problem: min π‘₯ 1 ‖𝑦 βˆ’ 𝐴π‘₯β€–2 + πœ†β€–π‘₯β€–1 2 2 (3.25) where πœ† ∈ [0, ∞) is a scalar regularization parameter that handles the trade-off between the mean square error and the l1-norm of the solution vector π‘₯. To see how Equation (3.25) solves the l1-norm minimization problem (1.4), the vector π‘₯ is initialized with an all zero for large πœ†, and then πœ† is decreased gradually. When πœ† approaches zero, the solution of LASSO (3.25) converges to the solution of l1-norm minimization problem (1.4) [34]. 27 LARS with a minor modification has been used to solve LASSO (3.25) faster than traditional convex optimization methods [33,34]. To explain this, we start solving (3.25) as a minimization problem by setting its gradient to zero and solve the following equation: This can be simplified to: 1 πœ• π‘₯ οΏ½ ‖𝑦 βˆ’ 𝐴π‘₯β€–2 + πœ†β€–π‘₯β€–1 οΏ½ = 0 2 2 βˆ’π΄ 𝑇 (𝑦 βˆ’ 𝐴π‘₯) + πœ†πœ• π‘₯ (β€–π‘₯β€–1 ) = 0 Because the residual π‘Ÿ = 𝑦 βˆ’ 𝐴π‘₯, Equation (3.27) can be rewritten as: (3.26) (3.27) βˆ’π΄ 𝑇 π‘Ÿ + πœ† πœ• π‘₯ (β€–π‘₯β€–1 ) = 0 (3.28) 𝑐 = πœ† πœ• π‘₯ (β€–π‘₯β€–1 ) (3.29) Recall that the correlation is calculated by: 𝑐 = 𝐴 𝑇 π‘Ÿ, Equation (3.28) can be expressed as: βˆ’π‘ + πœ† πœ• π‘₯ (β€–π‘₯β€–1 ) = 0 The gradient of β€–π‘₯β€–1 cannot be found because l1-norm is discontinuous at zero. 
Therefore, subgradient of l1-norm can be found as follows: πœ• π‘₯ β€–π‘₯β€–1 = �𝑣 ∈ 𝑅 𝑛 ∢ 𝑣(𝑖) = 𝑠𝑖𝑠𝑛�π‘₯(𝑖)οΏ½ 𝑣(𝑖) ∈ [βˆ’1,1] 𝑖𝑓 π‘₯(𝑖) β‰  0 οΏ½ 𝑖𝑓 π‘₯(𝑖) = 0 (3.30) At each iteration of LARS, only coefficients of the solution vector π‘₯ corresponding to the active set 𝐼 are nonzeros (i.e. π‘₯(𝑖) β‰  0 , 𝑖 ∈ 𝐼). The other coefficients are zeros (i.e. π‘₯(𝑖) = 0, 𝑖 ∈ 𝐼 𝑐 ). Hence, by using (3.30), Equation (3.29) can be expressed for the active set as follows: 𝑐(𝐼) = πœ† 𝑠𝑖𝑠𝑛(π‘₯(𝐼)) and for the inactive set, Equation (3.29) can be expressed as follows: 28 (3.31) |𝑐(𝐼 𝑐 )| ≀ πœ† (3.32) Thus, LARS can be used to solve the LASSO problem if it satisfies the constraints stated by Equations (3.31) and (3.32) at each iteration. In other words, to solve LASSO, LARS should ensure: 1- The absolute correlations of the active columns are equal to πœ†. 2- The signs of the correlation and the solution vectors are matched for the active entries. 3- The absolute correlations of the inactive columns are equal or less than πœ†. LARS maintains the condition stated by equality (3.32) at each iteration because the absolute correlations of the inactive columns are always less than πœ†. If they are equal to πœ†, the columns should be in the active set (step 5 of the LARS algorithm in Figure 3.2). Also LARS partially maintains the condition stated by Equation (3.31) which is the absolute correlations of the active columns are equal to πœ†. However, LARS does not enforce the signs matching of the active entries between the correlation and solution vectors. Some of the solution coefficients associated with the active set may change signs while the signs of corresponding correlation coefficients are still the same. Therefore, LARS should be modified to maintain the sign matching constraint that states in the following equation: 𝑠𝑖𝑠𝑛 οΏ½π‘₯(𝐼)οΏ½ = 𝑠𝑖𝑠𝑛�𝑐(𝐼)οΏ½ (3.33) LARS violates the condition (3.33) when one of the solution coefficients that is associated with the active set crosses zero. Therefore, if any solution coefficient associated with the active set hits zero during the LARS procedures, the corresponded column to this coefficient should be removed from the active set, and the updated direction 𝑑 is recalculated by solving Equation (3.6). Hence, we need to determine the step size 𝛾 which causes one of the solution coefficients 29 that is associated with the active set to be zero after updating solution process. By using Equation (3.10), the 𝑖 π‘‘β„Ž solution coefficient is updated as follows: π‘₯ 𝑑 (𝑖) = π‘₯ π‘‘βˆ’1 (𝑖) + 𝛾 𝑑 𝑑 𝑑 (𝑖) The value of π‘₯ 𝑑 (𝑖) would be zero if: 𝛾𝑑 = βˆ’ π‘₯ π‘‘βˆ’1 (𝑖) 𝑑 𝑑 (𝑖) The smallest positive step size that causes the 𝑖 π‘‘β„Ž solution coefficient associated with the active set to be zero, can be found by the following minimization problem: 𝛾 𝑑 = min οΏ½βˆ’ π‘–βˆˆπΌ π‘₯ π‘‘βˆ’1 (𝑖) οΏ½ 𝑑 𝑑 (𝑖) (3.34) where the minimum is taken only over positive value. Therefore, at each iteration, the modified LARS for solving LASSO computes two step sizes: one adds a column to the active set as stated by Equation (3.23), and the other drops a column from the active set as stated by Equation (3.34). To distinguish between these two values of the step size, 𝛾 + is used to refer to the step size computed by (3.23), and 𝛾 βˆ’ is used to refer to the step size computed by (3.34). 
The modified LARS selects the smaller of the two step sizes:

γ_t = min { γ_t^+, γ_t^− }    (3.35)

where

γ_t^+ = min_{i ∈ I^c} { (λ_t − c_t(i)) / (1 − a_i^T v_t), (λ_t + c_t(i)) / (1 + a_i^T v_t) }   and   γ_t^− = min_{i ∈ I} { −x_{t−1}(i) / d_t(i) }

The modified LARS uses the step size computed by (3.35) instead of the step size computed by (3.23) in the basic LARS algorithm. Equally important, for updating the active set, if γ_t = γ_t^+, the i-th column is added to the active set at the next iteration t + 1 as follows:

I = I ∪ {i}    (3.36)

In contrast, if γ_t = γ_t^−, the i-th column is removed from the active set at the next iteration t + 1 as follows:

I = I − {i}    (3.37)

The remaining steps are the same as in the basic LARS algorithm stated in Section 3.1. The modified LARS requires more iterations than the basic LARS because, in the modified LARS, columns are both added to and dropped from the active set, while in the basic LARS, columns are only ever added to the active set. The computational complexity of the modified LARS is O(mnT), where T is the number of required iterations, which is normally greater than or equal to the number of nonzero entries of the solution vector x (i.e., T ≥ k). Note that the constant hidden in the asymptotic (big-O) notation of the modified LARS is much higher than that of OMP, because the modified LARS executes many additional steps to compute the step size γ while OMP does not. However, the modified LARS is considered an efficient algorithm for solving the LASSO problem because it requires less computational cost than traditional convex optimization methods.

Chapter 4

4 Comparative Analysis of OMP and LARS

In the signal processing and statistical literatures, OMP and the modified LARS for solving LASSO are considered popular algorithms for finding a sparse solution of an underdetermined linear system. In this chapter, we compare OMP and the modified LARS in terms of their algorithmic steps and performance. We reformulate some steps of OMP and the modified LARS to identify the similarities and differences between them. To study the performance of OMP and the modified LARS, we compare the convergence time and the solution accuracy of each algorithm.

4.1 Algorithmic Steps

Previous work by Donoho and Tsaig [34] showed that the basic LARS algorithm stated in Section 3.1 and OMP have the same algorithmic steps, except for the solution update step. As stated earlier, OMP updates the active entries of the solution vector x by solving a least squares problem. This is achieved by projecting the vector y onto the space spanned by the active columns, which is done by solving the following normal equations:

A_I^T A_I x_t(I) = A_I^T y    (4.1)

On the other hand, the basic LARS algorithm updates the active entries of the solution vector x by solving the following penalized least squares problem:

A_I^T A_I x_t(I) = A_I^T y − λ_{t+1} sign(c_t(I))    (4.2)

Equations (4.1) and (4.2) are identical if the penalization term λ_{t+1} sign(c_t(I)) is removed from Equation (4.2). However, Donoho et al. [34] did not demonstrate the derivation of (4.2); therefore, we present our derivation of (4.2) in Appendix A.

In our work, we compare OMP and the modified LARS for solving LASSO from a different perspective. We reformulate some steps of the algorithms to reveal more of the similarities and differences between them.
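Before reformulating these steps, it may help to see the two update rules (4.1) and (4.2) written out side by side. The following MATLAB-style sketch is only an illustration under assumed variable names (I for the active-set indices, lambda_next for λ_{t+1}, and s for sign(c_t(I))); it is not taken from a specific implementation.

    % OMP update, Equation (4.1): plain least squares over the active columns.
    AI = A(:, I);
    x_omp = zeros(size(A, 2), 1);              % inactive entries stay zero
    x_omp(I) = (AI' * AI) \ (AI' * y);         % normal equations A_I^T A_I x(I) = A_I^T y

    % Basic LARS update, Equation (4.2): the same system with a penalization term.
    x_lars = zeros(size(A, 2), 1);
    x_lars(I) = (AI' * AI) \ (AI' * y - lambda_next * s);

Removing the term lambda_next * s makes the two solves identical, which is exactly the observation that motivates the comparison below.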
For simplicity, hereafter in this chapter and the remaining chapters of this thesis, we use the abbreviation "LARS" to refer to the modified LARS algorithm that is specialized for solving the LASSO problem.

OMP and LARS solve different optimization problems: OMP is used to find an approximate solution for the l0-norm minimization problem stated by Equation (1.2), while LARS is used to solve the l1-norm minimization problem stated by Equation (1.4). Nevertheless, both OMP and LARS depend on an underlying greedy framework. Indeed, they have almost the same fundamental algorithmic steps. Both algorithms start from an all-zero solution, an empty active set, and the measurement vector y as the initial residual. They depend on the correlation between the columns of the matrix A and the current residual to maintain the active set. They compute the residual vector in the same manner. Furthermore, both algorithms solve a least squares problem over the active columns to update the solution vector x.

However, OMP and LARS differ in some steps. At each iteration of LARS, a column is added to or removed from the active set, while in OMP, a column is always added to the active set. In addition, each algorithm adopts a different way of updating the solution vector x. OMP takes the largest possible step in the least squares direction of the active columns; it achieves this by updating the active entries of the solution vector x through projecting the vector y onto the space spanned by the active columns. On the other hand, LARS takes the smallest possible step that forces a column from the inactive set to join the active set, or drops a column from the active set. The algorithmic steps of OMP and LARS are compared in Table 4.1.

Initialization
  OMP:  x_0 = 0, r_0 = y, t = 1, I = ∅
  LARS: x_0 = 0, r_0 = y, t = 1, I = ∅

Compute correlation
  OMP:  c_t = A^T r_{t−1}
  LARS: c_t = A^T r_{t−1};  λ_t = ||c_t||_∞

Update active set
  OMP:  i = arg max_{j ∈ I^c} |c_t(j)|;  I = I ∪ {i}
  LARS: if (γ_{t−1} = γ_{t−1}^+ or t = 1)  // add column:  i = { j ∈ I^c : |c_t(j)| = λ_t },  I = I ∪ {i}
        if (γ_{t−1} = γ_{t−1}^−)  // remove column:  i = { j ∈ I : x_{t−1}(j) = 0 },  I = I − {i}

Update solution
  OMP:  A_I^T A_I x_t(I) = A_I^T y;  x_t(I^c) = 0
  LARS: A_I^T A_I d_t(I) = sign(c_t(I));  d_t(I^c) = 0;  v_t = A_I d_t(I);
        γ_t^+ = min_{i ∈ I^c} { (λ_t − c_t(i)) / (1 − a_i^T v_t), (λ_t + c_t(i)) / (1 + a_i^T v_t) };
        γ_t^− = min_{i ∈ I} { −x_{t−1}(i) / d_t(i) };  γ_t = min{γ_t^+, γ_t^−};  x_t = x_{t−1} + γ_t d_t

Compute residual
  OMP:  r_t = y − A x_t
  LARS: r_t = y − A x_t

Stopping condition
  OMP:  ||r_t||_2 < ε
  LARS: ||r_t||_2 < ε

Increase iteration counter
  OMP:  t = t + 1
  LARS: t = t + 1

Table 4.1: The algorithmic steps of OMP and LARS

From Table 4.1, it is observable that the algorithmic steps are identical in both algorithms, except for the steps that update the active set and the solution vector. To find more commonality between OMP and LARS, we reformulate the steps for updating the active set and the solution vector in the OMP algorithm.

4.1.1 Reformulating step for updating the active set

As we stated earlier, one of the differences between OMP and LARS is the way the active set is updated. To compare them with respect to this step, we reformulate the corresponding step in the OMP algorithm. At each iteration, OMP selects the column from the inactive set I^c that has the highest absolute correlation with the current residual.
This can be expressed as follows:

i = arg max_{j ∈ I^c} |c_t(j)|    (4.3)

where i is the index of the selected column. To make the active-set update step of OMP comparable with the corresponding step of LARS, we divide the step of finding the index i in Equation (4.3) into two steps:

1. Compute the largest absolute entry of the correlation vector: λ_t = ||c_t||_∞.
2. Determine an index of a column from the inactive set whose absolute correlation with the current residual vector equals λ_t. This can be described as follows:

i = { j ∈ I^c : |c_t(j)| = λ_t }    (4.4)

Then, i is added to the active set as follows: I = I ∪ {i}.

In Table 4.2, we state the original step for updating the active set and the reformulated one in OMP and LARS. It is noticeable that the step for updating the active set is the same in both algorithms if LARS always adds columns to the active set. Therefore, the basic LARS algorithm stated in Section 3.1 is similar to OMP in the way it updates the active set, because it always adds columns to the active set and never removes them. On the other hand, the modified LARS for solving the LASSO problem differs from OMP in this step: the modified LARS adds or removes columns to/from the active set, while OMP always adds columns to the active set.

Update active set (original)
  OMP:  i = arg max_{j ∈ I^c} |c_t(j)|;  I = I ∪ {i}
  LARS: λ_t = ||c_t||_∞;
        if (γ_{t−1} = γ_{t−1}^+ or t = 1)  // add column:  i = { j ∈ I^c : |c_t(j)| = λ_t },  I = I ∪ {i}
        if (γ_{t−1} = γ_{t−1}^−)  // remove column:  i = { j ∈ I : x_{t−1}(j) = 0 },  I = I − {i}

Update active set (after reformulating)
  OMP:  λ_t = ||c_t||_∞;  i = { j ∈ I^c : |c_t(j)| = λ_t };  I = I ∪ {i}  // add column
  LARS: λ_t = ||c_t||_∞;
        if (γ_{t−1} = γ_{t−1}^+ or t = 1)  // add column:  i = { j ∈ I^c : |c_t(j)| = λ_t },  I = I ∪ {i}
        if (γ_{t−1} = γ_{t−1}^−)  // remove column:  i = { j ∈ I : x_{t−1}(j) = 0 },  I = I − {i}

Table 4.2: The original step for updating the active set and the reformulated one in OMP and LARS

4.1.2 Reformulating step for updating the solution vector

The other difference between OMP and LARS is the way the solution vector is updated. To compare them clearly, we reformulate the step for updating the solution vector in OMP to make it comparable with the corresponding step in LARS. This can be explained as follows: let I be the active set at iteration t, and Ī be the active set at iteration t − 1. We assume that OMP is currently at iteration t. The active entries of the solution vector x are computed by solving Equation (4.1):

A_I^T A_I x_t(I) = A_I^T y    (4.5)

The residual vector at iteration t − 1 is obtained by:

r_{t−1} = y − A x_{t−1}    (4.6)

Because only the active entries of x_{t−1} are nonzero, we can rewrite (4.6) as follows:

r_{t−1} = y − A_Ī x_{t−1}(Ī)

y = A_Ī x_{t−1}(Ī) + r_{t−1}    (4.7)

By substituting the y of (4.7) into (4.5), we obtain:

A_I^T A_I x_t(I) = A_I^T ( A_Ī x_{t−1}(Ī) + r_{t−1} )

By simplifying this expression, it follows that:

A_I^T ( A_I x_t(I) − A_Ī x_{t−1}(Ī) ) = A_I^T r_{t−1}

Recall Equation (3.2), which computes the active entries of the correlation vector: c_t(I) = A_I^T r_{t−1}. This leads to:

A_I^T ( A_I x_t(I) − A_Ī x_{t−1}(Ī) ) = c_t(I)    (4.8)

At each iteration of OMP, one column is added to the active set.
Therefore, the active set I at iteration t has one column that is not included in the active set Ī of the previous iteration (t − 1). The coefficient of x_{t−1} corresponding to this column is zero. Hence, we get:

A_Ī x_{t−1}(Ī) = A_I x_{t−1}(I)    (4.9)

By substituting (4.9) into (4.8), the following equation is obtained:

A_I^T ( A_I x_t(I) − A_I x_{t−1}(I) ) = c_t(I)

Equivalently:

A_I^T A_I ( x_t(I) − x_{t−1}(I) ) = c_t(I)    (4.10)

At this point, let us define the vector Δx_t ∈ R^n, where Δx_t(I) = x_t(I) − x_{t−1}(I) and Δx_t(I^c) = 0. Therefore, Equation (4.10) can be expressed as follows:

A_I^T A_I Δx_t(I) = c_t(I)    (4.11)

The active entries of the vector Δx_t can be obtained by solving Equation (4.11), while the remaining entries of the vector are set to zero (i.e., Δx_t(I^c) = 0). The formula for updating the solution vector x can then be rewritten in terms of the vector Δx as follows:

x_t = x_{t−1} + Δx_t    (4.12)

As a result, we divide the step for updating the solution vector, expressed by Equation (4.1) in the OMP algorithm, into two steps:

1. Compute the active entries of the vector Δx as expressed in Equation (4.11), and set the remaining entries to zero.
2. Update the solution vector x as expressed in Equation (4.12).

With respect to LARS, the vector Δx can be calculated by using Equation (3.10) as follows:

Δx_t = γ_t d_t    (4.13)

The original step for updating the solution vector and the reformulated one are listed for OMP and LARS in Table 4.3.

Update solution (original)
  OMP:  A_I^T A_I x_t(I) = A_I^T y;  x_t(I^c) = 0
  LARS: A_I^T A_I d_t(I) = sign(c_t(I));  d_t(I^c) = 0;  v_t = A_I d_t(I);
        γ_t^+ = min_{i ∈ I^c} { (λ_t − c_t(i)) / (1 − a_i^T v_t), (λ_t + c_t(i)) / (1 + a_i^T v_t) };
        γ_t^− = min_{i ∈ I} { −x_{t−1}(i) / d_t(i) };  γ_t = min{γ_t^+, γ_t^−};  x_t = x_{t−1} + γ_t d_t

Update solution (after reformulating)
  OMP:  A_I^T A_I Δx_t(I) = c_t(I);  Δx_t(I^c) = 0;  x_t = x_{t−1} + Δx_t
  LARS: A_I^T A_I d_t(I) = sign(c_t(I));  d_t(I^c) = 0;  v_t = A_I d_t(I);
        γ_t^+ = min_{i ∈ I^c} { (λ_t − c_t(i)) / (1 − a_i^T v_t), (λ_t + c_t(i)) / (1 + a_i^T v_t) };
        γ_t^− = min_{i ∈ I} { −x_{t−1}(i) / d_t(i) };  γ_t = min{γ_t^+, γ_t^−};  Δx_t = γ_t d_t;  x_t = x_{t−1} + Δx_t

Table 4.3: The original step for updating the solution vector and the reformulated one in OMP and LARS

By using the reformulated steps that we derive in Sections 4.1.1 and 4.1.2, the algorithmic steps of OMP and LARS can be rewritten as shown in Table 4.4.
Initialization
  OMP:  x_0 = 0, r_0 = y, t = 1, I = ∅
  LARS: x_0 = 0, r_0 = y, t = 1, I = ∅

Compute correlation
  OMP:  c_t = A^T r_{t−1};  λ_t = ||c_t||_∞
  LARS: c_t = A^T r_{t−1};  λ_t = ||c_t||_∞

Update active set
  OMP:  i = { j ∈ I^c : |c_t(j)| = λ_t };  I = I ∪ {i}  // add column
  LARS: if (γ_{t−1} = γ_{t−1}^+ or t = 1)  // add column:  i = { j ∈ I^c : |c_t(j)| = λ_t },  I = I ∪ {i}
        if (γ_{t−1} = γ_{t−1}^−)  // remove column:  i = { j ∈ I : x_{t−1}(j) = 0 },  I = I − {i}

Compute solution update vector
  OMP:  A_I^T A_I Δx_t(I) = c_t(I);  Δx_t(I^c) = 0
  LARS: A_I^T A_I d_t(I) = sign(c_t(I));  d_t(I^c) = 0;  v_t = A_I d_t(I);
        γ_t^+ = min_{i ∈ I^c} { (λ_t − c_t(i)) / (1 − a_i^T v_t), (λ_t + c_t(i)) / (1 + a_i^T v_t) };
        γ_t^− = min_{i ∈ I} { −x_{t−1}(i) / d_t(i) };  γ_t = min{γ_t^+, γ_t^−};  Δx_t = γ_t d_t

Update solution
  OMP:  x_t = x_{t−1} + Δx_t
  LARS: x_t = x_{t−1} + Δx_t

Compute residual
  OMP:  r_t = y − A x_t
  LARS: r_t = y − A x_t

Stopping condition
  OMP:  ||r_t||_2 < ε
  LARS: ||r_t||_2 < ε

Increase iteration counter
  OMP:  t = t + 1
  LARS: t = t + 1

Table 4.4: The algorithmic steps of OMP and LARS after reformulating

From Table 4.4, it is observable that the algorithmic steps of OMP and LARS are almost identical; the only differences are in updating the active set and computing the solution update vector (the vector Δx). For updating the active set, OMP always adds a column to the active set at each iteration, and this column is never removed later. In LARS, on the other hand, a column is added to or removed from the active set at each iteration. For computing the vector Δx, both OMP and LARS solve the least squares problems stated in Equations (4.11) and (3.6), respectively. However, each algorithm relies on different quantities to do so: OMP uses the active entries of the correlation vector c, while LARS uses only their signs. Equally important, in OMP the vector Δx is computed directly by solving the least squares problem stated in Equation (4.11). In contrast, in LARS the vector Δx is computed by the following steps:

1. Determine the updated direction d by solving the least squares problem (3.6).
2. Compute the step size γ by Equation (3.35).
3. Multiply γ by d to obtain the vector Δx, as expressed in Equation (4.13).

4.2 Performance Analysis

To analyze the performance of OMP and LARS, we compare the two algorithms in terms of convergence time and the accuracy of the final solution.

4.2.1 Convergence time

OMP converges to the final solution much faster than LARS because:

1. Generally speaking, OMP requires fewer iterations than LARS to converge to the final solution. This is because OMP always adds columns to the active set, while LARS adds or removes columns to/from the active set.
2. At each iteration, OMP computes the vector Δx in one step by solving the least squares problem (4.11), whereas in LARS, computing the vector Δx involves the several steps stated in Section 4.1, which require more computation than solving the least squares problem alone.

4.2.2 Accuracy

In terms of the recovered sparse solution, OMP is considered less accurate than LARS when some columns of the matrix A are highly correlated. OMP takes the largest possible step in the least squares direction of the active columns by solving the least squares problem to update the solution vector x.
As a result, the residual vector will be orthogonal to all active columns at the next iteration. In other words, the residual vector will have zero correlation with all active columns. Consequently, columns that are highly correlated with the active columns will have small correlation with the current residual. Recall that at each iteration, OMP selects the column that has the largest absolute correlation with the current residual. Therefore, OMP does not select these columns even though they are significant for recovering the sparse solution. On the other hand, LARS does not have this problem because it uses a smaller step than the OMP step and increases the coefficients of the solution vector associated with the active set only as much as needed. For this reason, LARS selects columns that are important for recovering the sparse solution even though they may be highly correlated with columns that are already in the active set. To demonstrate this, we consider the following example:

Example 4.1: Assume the matrix A contains three columns (a_1, a_2, a_3), where the columns a_1 and a_2 are highly correlated, as shown in Figure 4.1. We generate the vector x ∈ R^3 whose first and second entries are nonzero and whose third entry is zero:

x = [0.9613, 0.2757, 0]^T

The measurement vector y is obtained by multiplying the matrix A by the vector x. As a result, the vector y is a linear combination of the columns a_1 and a_2 (i.e., y = 0.9613 a_1 + 0.2757 a_2).

Figure 4.1: Columns of the matrix A and the measurement vector y in Example 4.1

We use both OMP and LARS to find the vector that represents the vector y as a linear combination of the columns of the matrix A. In Figure 4.2 and Figure 4.3, we illustrate the approximation change to the vector y accomplished by OMP and LARS at each iteration. Note that the approximation change is calculated by multiplying the matrix A by the current solution update vector Δx.

OMP starts by selecting the column of the matrix A that has the maximum absolute correlation with the initial residual (i.e., the vector y). In this example, the column a_1 is selected at the first iteration because it is highly correlated with the initial residual, as shown in Figure 4.2, and it is added to the active set. Then, OMP takes the largest possible step in the direction of the column a_1 by projecting the vector y onto the column a_1. This leaves some error, represented by the residual vector r, which is orthogonal to a_1. At the second iteration, the column a_2 is less correlated with the residual vector since it is highly correlated with the column a_1. In this case, the column a_3 has the largest absolute correlation with the current residual. Therefore, the column a_3 is selected and added to the active set, as shown in Figure 4.3. Then, OMP takes the largest possible step in the space spanned by the columns (a_1, a_3) toward the vector y. After updating the solution vector, the residual is zero and OMP terminates. Note that the column a_2 is never selected even though it is important for recovering the original vector x.

On the other hand, similar to OMP, LARS starts by adding the column a_1 to the active set at the first iteration, as shown in Figure 4.2. However, LARS moves in the direction of the column a_1 only until the column a_2 has as much absolute correlation with the current residual as a_1.
At the second iteration, LARS adds the column a_2 to the active set and moves in the direction that is equiangular with both a_1 and a_2 toward the vector y, as shown in Figure 4.3. The residual is zero after updating the solution vector; therefore, LARS terminates at the end of the second iteration. Note that LARS selects the column a_2, which is essential for reconstructing the original vector x, while OMP does not.

Figure 4.4 shows the coefficients of the reconstructed vector over the iterations of OMP and LARS. As shown in the figure, LARS achieves a very small error norm (almost zero), while OMP obtains a high error norm. Note that the error is computed by the following equation:

error = x − x̂    (4.14)

where x is the original sparse vector and x̂ is the vector reconstructed by the algorithms. This implies that LARS can reconstruct the sparse vector x when two or more columns of the matrix A are highly correlated, while OMP fails. However, LARS is slower than OMP.

Figure 4.2: The selection and updating steps at the first iteration of OMP and LARS in Example 4.1. The approximation change is obtained by multiplying the matrix A by the current solution update vector Δx.

Figure 4.3: The selection and updating steps at the second (last) iteration of OMP and LARS in Example 4.1. The approximation change is obtained by multiplying the matrix A by the current solution update vector Δx.

Figure 4.4: The solution coefficients of the reconstructed vector over the iterations of OMP and LARS in Example 4.1, and the Euclidean norm of the error between the original sparse vector and the reconstructed vector after the algorithms terminate.

Chapter 5

5 Simulation Results and Discussion

In this chapter, we simulate the comparative analysis of the OMP and LARS algorithms stated in the previous chapters by using MATLAB. In general, we go through different examples to demonstrate the similarities and differences between OMP and LARS from different perspectives. First, we employ OMP and LARS to reconstruct some images from their compressive samples.
Afterward, we exploit parallel processing techniques to speed up the convergence of the algorithms and observe the efficiency of using different numbers of processors. Next, we examine the performance of OMP and LARS in terms of mean square error (MSE) as a function of the measurement size m and the sparsity k. To study the difference in the updating processes of OMP and LARS, we observe the coefficients of the solution and correlation vectors over the iterations. Finally, to study the performance of OMP and LARS from a different aspect, we generate two different examples of an underdetermined linear system, use OMP and LARS to reconstruct the sparse vector in each example, and plot the updating process at each iteration of the algorithms in a three-dimensional (3D) view.

5.1 Reconstructing images from their compressive samples

To demonstrate the efficiency of OMP and LARS, we reconstruct grayscale images from their compressive samples using both algorithms and measure the error between the original image and the reconstructed one. To estimate the convergence speed of each algorithm, we record the time and the number of iterations that the algorithms require to reconstruct the images.

We use images of size 512 × 512 pixels. First, we divide each image into 4096 patches of size 8 × 8 pixels. We reshape every patch into a vector of size 64 × 1, as shown in Figure 5.1. Each patch vector is normalized by dividing its entries by the maximum value found in the image. The matrix A is obtained by multiplying two matrices: an m × n random matrix φ and an n × n transformation matrix ψ:

A_{m×n} = φ_{m×n} × ψ_{n×n}    (5.1)

where n = 64. The random matrix φ is generated using a Gaussian distribution, and the transformation matrix ψ is generated using the Discrete Cosine Transform (DCT) [41]. To obtain the measurement vector y, the random matrix φ is multiplied by each patch of the image:

y_i = φ × p_i    (5.2)

where p_i represents the i-th patch, and i = 1, 2, ..., 4096. Now, the underdetermined linear system is expressed as follows:

A × x_i = y_i    (5.3)

where x_i is the sparse representation of the patch p_i. To obtain x_i for each i, we apply OMP and LARS to the system (5.3), as shown in Figure 5.1. Then, the transformation matrix ψ is multiplied by x_i to obtain the reconstructed patch p̂_i:

p̂_i = ψ × x_i    (5.4)

Figure 5.1: Reconstructing a patch of an image from its compressive samples by using OMP or LARS

After reconstructing all patches, we form the recovered image. Figure 5.2 and Figure 5.3 show the recovered images for measurement sizes m = 32 and m = 16, respectively. Then, we compute the mean square error (MSE) between the original patch and the reconstructed one:

MSE_i = (1/n) Σ_{j=1}^{n} ( p_i(j) − p̂_i(j) )^2    (5.5)

After computing the MSE for all 4096 patches, we calculate the average mean square error over them:

Average MSE = (1/4096) Σ_{i=1}^{4096} MSE_i    (5.6)

We rely on the value of the Peak Signal to Noise Ratio (PSNR) to determine the performance of the algorithms, which is computed by:

PSNR = 10 log_10 ( MAX(x) / Average MSE )    (5.7)

where MAX(x) is the maximum possible entry in the patches. Note that MAX(x) = 1 because the image patches are normalized.
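As a rough illustration of Equations (5.1) through (5.7) for a single patch, the following MATLAB-style sketch shows the order of the operations. The matrices phi and psi and the normalized patch p are assumed to be given, and sparse_solver is a hypothetical placeholder for either OMP or LARS; this is not the code used to produce the figures.

    A     = phi * psi;                 % Equation (5.1): combined sensing/transform matrix
    y     = phi * p;                   % Equation (5.2): compressive samples of the patch
    x     = sparse_solver(A, y);       % Equation (5.3): recover the sparse representation
    p_hat = psi * x;                   % Equation (5.4): reconstructed patch
    mse   = mean((p - p_hat).^2);      % Equation (5.5): per-patch mean square error
    % After averaging mse over all 4096 patches as in Equation (5.6):
    % PSNR = 10*log10(MAX / average_mse);   % Equation (5.7), with MAX = 1 for normalized patches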
If an algorithm has a high PSNR, this implies that it can successfully reconstruct the original images with small error. The values of PSNR for OMP and LARS with measurement sizes m = 32 and m = 16 are shown in Figure 5.2 and Figure 5.3, respectively.

To estimate the convergence speed of the algorithms, we record the number of iterations that each algorithm requires to converge to the final solution for each patch. Then, we compute the average number of iterations over all patches in the same way that we did with the MSE. However, the number of iterations is not an accurate measure of the algorithms' speed, because the computation involved in each LARS iteration takes longer than the computation of an OMP iteration. To get a better measurement of the algorithms' speed, we also record the time each algorithm requires to recover the whole image on a given computer. In this simulation, we use a Dell laptop equipped with an Intel i3 CPU (2.13 GHz) and 4 GB of memory. The average number of iterations and the required time for OMP and LARS with measurement sizes m = 32 and m = 16 are shown in Figure 5.2 and Figure 5.3, respectively.

From Figure 5.2 and Figure 5.3, it is observable that the number of iterations required by OMP is smaller than the number of iterations required by LARS to recover the same images. This is because OMP always adds columns to the active set, while LARS adds or removes columns to/from the active set. Equally important, OMP consumes less time than LARS. Nevertheless, LARS obtains a higher PSNR than OMP. The justification for this is that LARS selects the most important columns of the matrix A to recover the sparse solution even when these columns are highly correlated, while OMP does not.

Figure 5.2: Reconstructed images from their compressive samples by using OMP and LARS with m = 32 and n = 64.
  First image:  PSNR 26.3780 (OMP), 28.2098 (LARS); time 7.58966 s (OMP), 22.27399 s (LARS); average iterations 32 (OMP), 36 (LARS).
  Second image: PSNR 28.3421 (OMP), 30.0823 (LARS); time 5.42805 s (OMP), 16.819573 s (LARS); average iterations 23 (OMP), 27 (LARS).

Figure 5.3: Reconstructed images from their compressive samples by using OMP and LARS with m = 16 and n = 64.
  First image:  PSNR 21.8220 (OMP), 22.4618 (LARS); time 2.452597 s (OMP), 8.807559 s (LARS); average iterations 16 (OMP), 19 (LARS).
  Second image: PSNR 21.1888 (OMP), 22.5433 (LARS); time 2.138599 s (OMP), 7.632700 s (LARS); average iterations 14 (OMP), 16 (LARS).

5.2 Using parallel processing to speed-up convergence of the algorithms

In the previous section, we divided an image into a number of patches. After that, the compressive samples of each patch were obtained, and the algorithms (OMP and LARS) were used to reconstruct the original patches, as illustrated in Figure 5.1. Because the samples of each patch are independent of the samples of other patches, and the same algorithmic steps are executed on the samples of all patches, we can utilize parallel processing techniques to expedite the convergence of the algorithms.

According to the Flynn classification [42], reconstructing all patches of an image from their compressive samples falls under the single instruction multiple data (SIMD) class, because identical algorithmic steps are performed on the samples of different patches to recover the whole image. In this case, the samples of the patches are distributed over different processors.
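In MATLAB, for example, this distribution can be sketched with a parfor loop over the patches. The sketch below assumes the Parallel Computing Toolbox is available; recover_patch is a hypothetical helper that runs OMP or LARS on the samples of one patch, and Y is assumed to hold the samples of all patches column by column.

    G = A' * A;                         % Gramian matrix, computed once and shared by all workers
    num_patches = 4096;
    p_hat = cell(num_patches, 1);
    parfor i = 1:num_patches            % patches are independent, so iterations run in parallel
        p_hat{i} = recover_patch(A, G, Y(:, i));
    end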
All processors execute the same algorithmic steps to recover their own patches. Subsequently, the recovered patches are combined from the processors to form the entire image.

Both OMP and LARS calculate the Gramian matrix (A^T A) to solve the least squares problem over the active columns. Computing the Gramian matrix involves heavy computation. Equally important, the matrix A is identical for all patches. Therefore, for efficiency, the Gramian matrix (A^T A) is computed off-line first and then distributed to all processors [43].

To examine the impact of using multiple processors on the convergence time, we reconstruct an image from its compressive samples as we did in Section 5.1; however, in this section, we use more than one processor to run OMP and LARS. Figure 5.4 illustrates the variation of the convergence time with respect to the number of processors used. It is noticeable that the convergence time decreases as the number of processors increases. In addition, we observe that the reduction in convergence time is not significant once the number of processors increases above 6, because the communication overhead among processors grows as the number of processors increases.

Figure 5.4: The convergence time of OMP and LARS against the number of used processors

To calculate the gain of using multiple processors, we compute the speedup ratio by using the following equation:

Speedup = T(1) / T(p)    (5.8)

where T(1) is the convergence time of an algorithm using one processor, T(p) is the convergence time of an algorithm using p processors, and 1 ≤ Speedup ≤ p. The ideal speedup ratio equals the number of processors used (p). Normally, the speedup ratio is less than p because of the time lost in communication overhead among processors. The ideal and actual speedup ratios for the OMP algorithm are shown in Figure 5.5.

Figure 5.5: The ideal and actual speedup ratios for the OMP algorithm against the number of used processors

5.3 Impact of measurement size and sparsity on MSE

In this section, we examine the performance of OMP and LARS in terms of mean square error (MSE) as a function of the measurement size m and the sparsity k. We consider various underdetermined linear systems that have different measurement sizes m and different sparsity levels k. For all of the considered underdetermined linear systems, the value of n is fixed to 1000. For each trial of an underdetermined linear system, we randomly generate the matrix A of size m × 1000 using a Gaussian distribution. The columns of the matrix A are normalized to have unit l2 norm. Then we generate the sparse vector x ∈ R^1000, which has k nonzero entries in random positions. The nonzero entries of the vector x are also obtained from a Gaussian distribution, and all other entries are set to zero. The vector x is also normalized. We multiply the matrix A by the sparse vector x to compute the measurement vector y:

y = A × x    (5.9)

At this point, we employ OMP and LARS to reconstruct the original sparse vector of the underdetermined linear system described by Equation (5.9). Then, we compute the MSE between the original sparse vector and the reconstructed one as follows:

MSE = (1/n) Σ_{j=1}^{n} ( x(j) − x̂(j) )^2    (5.10)

where x̂ is the sparse vector reconstructed by the algorithms. In Figure 5.6, the average MSE of 100 trials is illustrated as a function of m for different values of k, using OMP and LARS.
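One trial of this experiment can be summarized by the following MATLAB-style sketch (an illustration only, under our own variable names, with sparse_solver again a hypothetical placeholder for OMP or LARS):

    n = 1000;
    A = randn(m, n);
    A = A ./ sqrt(sum(A.^2, 1));        % normalize the columns to unit l2-norm
    x = zeros(n, 1);
    pos = randperm(n, k);               % k nonzero entries in random positions
    x(pos) = randn(k, 1);
    x = x / norm(x);                    % normalize the sparse vector
    y = A * x;                          % Equation (5.9)
    x_hat = sparse_solver(A, y);        % reconstruct with OMP or LARS
    mse = mean((x - x_hat).^2);         % Equation (5.10)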
For each value of π‘˜, we observe that the average MSE decreases as the value of π‘š increases, until it settles to a very small value about (10βˆ’34) after specific value of π‘š, say οΏ½ . Having the average MSE approaching such small value implies that π‘š the algorithms successfully reconstruct the original sparse vector. Hence, the value οΏ½ is π‘š 58 considered as the minimum size of measurements to make the algorithms able to recover the original π‘˜ sparse vector. From the figure, we observe that OMP and LARS require more measurement π‘š to successfully recover the sparse vector as number of nonzeros π‘˜ is increased. In addition, we notice that LARS recovers the sparse vector with less average MSE than OMP for π‘š < οΏ½ . Therefore, LARS obtains better results when the measurement size π‘š is smaller than π‘š οΏ½. π‘š To examine the performance of the algorithms from a different viewpoint, we plot the average MSE as a function of π‘˜ for different values of π‘š as shown in Figure 5.7. We can see that the average MSE starts from almost zero; then after a certain value of π‘˜, say οΏ½ , it obviously π‘˜ increases to some values above zero. Therefore, by using the measurement vector of size π‘š, the algorithms can recover a π‘˜ sparse vector if π‘˜ < οΏ½ . Furthermore, we observe that LARS recovers π‘˜ a π‘˜ sparse vector with less average MSE than OMP when π‘˜ > οΏ½ . As a result, we can state that π‘˜ LARS is better to be used than OMP when the sparsity π‘˜ is fairly large. 59 Figure 5.6: Average MSE of 100 trials between the original sparse vectors and the reconstructed ones by OMP and LARS as a function of measurement size π‘š for different sparsity π‘˜ 60 Figure 5.7: Average MSE of 100 trials between the original sparse vectors and the reconstructed ones by OMP and LARS as a function of sparsity π‘˜ for different measurement size π‘š 61 5.4 Variation of the solution and correlation coefficients over iterations In this section, we aim to monitor the updating process of OMP and LARS to the solution vector over iterations. We use both OMP and LARS to recover a sparse solution of an underdetermined linear system. At each iteration of OMP and LARS, we plot the solution coefficients. To run an example, we consider an underdetermined linear system with π‘š = 50, 𝑛 = 100, and π‘˜ = 10. We generate the matrix 𝐴 and the sparse vector π‘₯ of the system by using Gaussian distribution in the same way we did in Section 5.3. To obtain the measurement vector 𝑦, we multiply the matrix 𝐴 by the sparse vector π‘₯ as express in Equation (5.9). At this point, we employ OMP and LARS to reconstruct the original sparse vector. The solution coefficients that corresponding to the active columns are shown over iterations of OMP and LARS in Figure 5.8. Form Figure 5.8, we observe that both OMP and LARS terminate with the same solution coefficients. However, they are different in the way of updating the solution coefficients. In OMP, when a column is added to the active set, the corresponding solution coefficient is increased to the largest possible value. The coefficients of the active set may decrease slightly in the subsequent iterations, because they will be affected by other columns that would be added to the active set later. On the other hand, at each iteration of LARS, all coefficients corresponding to the active set are increased to the smallest possible step in a way that makes a column from the inactive set join the active set at the next iteration. 
In addition, we observe that columns are added to the active set in a different order in the two algorithms, as shown in Figure 5.8. Moreover, LARS requires one more iteration than OMP to converge to the final solution, since it adds one more column (column 72 in this example) to the active set and increases its corresponding solution coefficient to a very small value (almost zero). The reason is that LARS solves the l1-norm minimization problem; therefore, some coefficients that are zero in the original sparse vector are equal to very small values, but not exactly zero, in the sparse vector reconstructed by LARS.

Figure 5.8: The solution coefficients that correspond to the active columns over the iterations of OMP and LARS

Furthermore, to see how the correlations of the columns of the matrix A with the current residual change, we illustrate them over some iterations of OMP and LARS in Figure 5.9 and Figure 5.10, respectively. At each iteration, OMP adds the column that has the maximum absolute correlation with the current residual vector to the active set, and updates the solution coefficients so that this column's correlation is zero at the next iteration. This is because OMP ensures that the current residual vector is always orthogonal to all active columns. For example, in Figure 5.9, OMP adds the 99th column to the active set at the second iteration, and its correlation is zero at the third iteration. OMP continues to add a column to the active set until the norm of the current residual approaches zero. On the other hand, LARS first selects the column that has the maximum absolute correlation with the initial residual (the vector y), and then updates the corresponding solution coefficient in a way that makes the absolute correlation of the selected column equal to the largest absolute correlation over the columns that are not in the active set. For instance, in Figure 5.10, LARS selects the 55th column at the first iteration and updates the solution coefficient to make its absolute correlation equal to the absolute correlation of the 91st column. Therefore, the 91st column is added to the active set at the second iteration. In a similar manner, LARS iteratively adds a column to the active set and decreases the absolute correlations of the active columns until no column has correlation with the current residual. We conclude that OMP updates the solution vector to make the absolute correlations of the active columns equal to zero, while LARS updates the solution vector to make the absolute correlations of the active columns equal to the same value (λ):

OMP:  |c(i)| = 0  for all i ∈ I        LARS:  |c(i)| = λ  for all i ∈ I    (5.11)

Note that λ decreases at each iteration; when λ = 0, LARS has converged to the final solution vector.

Figure 5.9: Absolute correlation of the columns of the matrix A in some iterations of OMP. The panels show the active set growing from {55} to {55, 99, 32, 34, 40, 6, 91, 35, 92, 22}.

Figure 5.10: Absolute correlation of the columns of the matrix A in some iterations of LARS. The panels show the active set growing from {55} to {55, 91, 99, 35, 40, 32, 6, 34, 92, 72, 22}.

5.5 Study the algorithms performance through three dimensions (3D) examples

In this section, we study the performance of OMP and LARS from a different perspective by using three-dimensional systems. We create two different examples of an underdetermined linear system with m = 3, n = 6, and k = 2.
Then, we use OMP and LARS to reconstruct the sparse vector in each example, and we illustrate the updating process of the algorithms at each iteration in a three-dimensional (3D) view. In both examples, we generate the matrix A and the sparse vector x using a Gaussian distribution in the same way we did in Section 5.3. We multiply the matrix A by the sparse vector x to obtain the measurement vector y, as expressed in Equation (5.9). Now, having the matrix A and the vector y, we use OMP and LARS to reconstruct the sparse vector x. To determine the performance of the algorithms, we compute the MSE between the original sparse vector x and the one reconstructed by the algorithms, as stated in Equation (5.10).

Example 5.1: We generate the matrix A in this example so that its columns are only slightly correlated. To quantify the correlation between the columns of the matrix A, we use the mutual coherence μ, which is computed as follows:

μ ≝ max_{i ≠ j} |⟨a_i, a_j⟩|    (5.12)

where a_i is the i-th column of the matrix A. In this example, μ = 0.5124, which is considered fairly small. The columns of the matrix A and the measurement vector y are drawn in Figure 5.11. We illustrate the approximation change (AΔx) to the vector y at each iteration of OMP and LARS in Figure 5.12. We notice that both OMP and LARS obtain almost the same MSE, which is approximately zero. This implies that OMP and LARS achieve the same performance and both succeed in reconstructing the sparse vector when the columns of the matrix A are only slightly correlated.

Figure 5.11: The columns of the matrix A and the measurement vector y in Example 5.1

Equally important, for low-dimensional underdetermined linear systems, a rectangular matrix (a matrix with more columns than rows) that has small mutual coherence rarely exists and is hard to find. To obtain the matrix A with μ = 0.5124 in this example, we generated 10 million matrices using a Gaussian distribution and selected the matrix with the smallest mutual coherence μ.

Figure 5.12: The approximation change (AΔx) at each iteration of OMP and LARS in Example 5.1, where μ = 0.5124

Example 5.2: We generate the matrix A in this example so that its columns are highly correlated, with μ = 0.8885. The columns of the matrix A and the measurement vector y are drawn in Figure 5.13. As we did in Example 5.1, we illustrate the approximation change (AΔx) to the vector y at each iteration of OMP and LARS, as shown in Figure 5.14. From the figure, we observe that the value of the MSE in the case of LARS is very small (almost zero), while in OMP it is high. This implies that LARS succeeds in reconstructing the sparse vector x, while OMP fails. Equally important, LARS converges to the final solution in two iterations, while OMP requires three iterations to converge to the final solution. As a result, LARS achieves better performance than OMP when the columns of the matrix A are highly correlated.
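Both examples rely on the mutual coherence of Equation (5.12). For a matrix whose columns have already been normalized to unit l2-norm, it can be computed with a few lines of MATLAB-style code (a minimal sketch under that assumption, not the script used to generate the examples):

    G  = abs(A' * A);                   % absolute inner products between all pairs of columns
    G  = G - diag(diag(G));             % discard the diagonal (the i = j terms)
    mu = max(G(:));                     % mutual coherence: largest off-diagonal entry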
Figure 5.13: The columns of the matrix A and the measurement vector y in Example 5.2

Figure 5.14: The approximation change (AΔx) at each iteration of OMP and LARS in Example 5.2, where μ = 0.8885

Chapter 6

6 Conclusion and Future Work

In this thesis, we developed a comprehensive comparison between two popular algorithms (OMP and LARS) that have been widely used for finding a sparse solution of an underdetermined linear system. We provided a thorough description of both algorithms. OMP attempts to find an approximate solution for the l0-norm minimization problem, while LARS solves the l1-norm minimization problem. Although OMP and LARS solve different minimization problems, they both depend on an underlying greedy framework. They start from an all-zero solution and then iteratively construct a sparse solution based on the correlation between the columns of the matrix A and the current residual vector. They converge to the final solution when the norm of the current residual vector approaches zero.

We showed that OMP and LARS have almost the same algorithmic steps after reformulating the corresponding analytical expressions. Nevertheless, they differ in how they maintain the active set and how they update the solution vector. To maintain the active set, LARS adds or drops columns to/from the active set, while OMP always adds columns to the active set and never removes them. Moreover, OMP and LARS adopt different ways of updating the solution vector. At each iteration, OMP updates the solution coefficients to the largest possible value that makes the residual vector orthogonal to all active columns. In contrast, LARS updates the solution coefficients by the smallest possible value that forces a column from the inactive set to join the active set, or drops a column from the active set.

In addition, we analyzed the performance of OMP and LARS in terms of convergence time and accuracy of the final solution. In general, OMP converges to the final solution in fewer iterations than LARS, because LARS may drop columns from the active set at some iterations, while OMP does not. Equally important, LARS computes the solution update vector (the vector Δx), which is used to update the solution vector x, in several steps, while OMP computes it in one step. Therefore, LARS executes many additional steps that OMP does not have. As a result, OMP requires less time than LARS to converge to the final solution. However, in some scenarios, especially when the columns of the matrix A are highly correlated, LARS can successfully reconstruct the sparse vector while OMP fails. The reason is that OMP does not select columns that are highly correlated with columns already in the active set, even though these columns are important for recovering the sparse vector. In contrast, LARS does not face this problem.

With respect to future work, we recommend studying the impact of the mutual coherence of the matrix A on the performance of OMP and LARS for different measurement sizes m and different sparsity levels k.
In addition, we believe that there is room to improve the implementations of OMP and LARS, and to propose a new algorithm that combines the benefits of OMP and LARS.

APPENDICES

Appendix A

In this appendix, we present our derivation of the penalized least squares problem expressed by Equation (4.2), which can be used to compute the active entries of the solution vector in the basic LARS algorithm stated in Section 3.1. Indeed, Equation (4.2) is used to compare LARS with OMP in [34]. We start the derivation by recalling Equation (3.6), which computes the updated direction of LARS:

A_I^T A_I d_t(I) = sign(c_t(I))    (A.1)

where I is the active set at iteration t. Multiplying Equation (A.1) by the step size γ_t, which is a scalar, gives:

A_I^T A_I γ_t d_t(I) = γ_t sign(c_t(I))    (A.2)

From Equation (3.10), we obtain:

γ_t d_t(I) = x_t(I) − x_{t−1}(I)    (A.3)

Substituting (A.3) into (A.2) gives A_I^T A_I ( x_t(I) − x_{t−1}(I) ) = γ_t sign(c_t(I)), that is:

A_I^T A_I x_t(I) = A_I^T A_I x_{t−1}(I) + γ_t sign(c_t(I))    (A.4)

Recall that the residual vector r at iteration t − 1 is computed by the following equation:

r_{t−1} = y − A x_{t−1}    (A.5)

Because only the active entries of x_{t−1} are nonzero, we can rewrite (A.5) as follows:

r_{t−1} = y − A_Ī x_{t−1}(Ī)    (A.6)

where Ī is the active set at iteration t − 1. Like OMP, the basic LARS always adds columns to the active set and never removes them. Therefore, the active set I at iteration t has one column that is not included in the active set Ī of the previous iteration (t − 1), and the coefficient of x_{t−1} corresponding to this column is zero. Hence, we obtain:

A_Ī x_{t−1}(Ī) = A_I x_{t−1}(I)    (A.7)

By substituting (A.7) into (A.6), we obtain:

A_I x_{t−1}(I) = y − r_{t−1}    (A.8)

Using (A.8) in (A.4):

A_I^T A_I x_t(I) = A_I^T ( y − r_{t−1} ) + γ_t sign(c_t(I)) = A_I^T y − A_I^T r_{t−1} + γ_t sign(c_t(I))    (A.9)

The active entries of the correlation vector are computed via c_t(I) = A_I^T r_{t−1}. Hence, we can rewrite Equation (A.9) as follows:

A_I^T A_I x_t(I) = A_I^T y − c_t(I) + γ_t sign(c_t(I))    (A.10)

Substituting c_t(I) of (3.4) into (A.10) gives A_I^T A_I x_t(I) = A_I^T y − λ_t sign(c_t(I)) + γ_t sign(c_t(I)), that is:

A_I^T A_I x_t(I) = A_I^T y − (λ_t − γ_t) sign(c_t(I))    (A.11)

Recalling Equation (3.20), λ_{t+1} = λ_t − γ_t, we can rewrite (A.11) as follows:

A_I^T A_I x_t(I) = A_I^T y − λ_{t+1} sign(c_t(I))

This leads to the penalized least squares problem expressed by Equation (4.2).

BIBLIOGRAPHY

[1] Emmanuel J. Candès, Justin Romberg, and Terence Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489-509, February 2006.

[2] David L. Donoho, "Compressed Sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, April 2006.

[3] Emmanuel J. Candès and Terence Tao, "Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?," IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406-5425, December 2006.

[4] Emmanuel J. Candès, "Compressive sampling," in Proceedings of the International Congress of Mathematicians, Madrid, Spain, August 2006.

[5] Yaakov Tsaig and David L. Donoho, "Extensions of compressed sensing," Signal Processing, vol. 86, no. 3, pp. 549-571, March 2006.
[6] Emmanuel J. Candès and Terence Tao, "Decoding by Linear Programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203-4215, December 2005.

[7] Mark Rudelson and Roman Vershynin, "Geometric approach to error-correcting codes and reconstruction of signals," International Mathematics Research Notices, vol. 2005, no. 64, pp. 4019-4041, December 2005.

[8] Alexander Vardy, "Algorithmic complexity in coding theory and the minimum distance problem," in STOC ’97: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, New York, NY, USA, 1997, pp. 92-109.

[9] Albert Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2005.

[10] Emmanuel J. Candès, Justin Romberg, and Terence Tao, "Quantitative Robust Uncertainty Principles and Optimally Sparse Decompositions," Foundations of Computational Mathematics, vol. 6, no. 2, pp. 227-254, 2006.

[11] Emmanuel J. Candès, Justin Romberg, and Terence Tao, "Stable signal recovery from incomplete and inaccurate measurements," Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207-1223, August 2006.

[12] Michael Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, 1st ed. New York, NY, USA: Springer, 2010.

[13] B. K. Natarajan, "Sparse Approximate Solutions to Linear Systems," SIAM Journal on Computing, vol. 24, no. 2, pp. 227-234, 1995.

[14] Scott S. Chen, David L. Donoho, and Michael A. Saunders, "Atomic Decomposition by Basis Pursuit," SIAM Journal on Scientific Computing, vol. 20, pp. 33-61, 1998.

[15] David L. Donoho, "For most large underdetermined systems of linear equations the minimal L1-norm solution is also the sparsest solution," Communications on Pure and Applied Mathematics, vol. 59, no. 6, pp. 797-829, June 2006.

[16] Jean-Jacques Fuchs, "Recovery of exact sparse representations in the presence of bounded noise," IEEE Transactions on Information Theory, vol. 51, no. 10, pp. 3601-3608, October 2005.

[17] Yaakov Tsaig and David L. Donoho, "Breakdown of equivalence between the minimal L1-norm solution and the sparsest solution," Signal Processing, vol. 86, no. 3, pp. 533-548, March 2006.

[18] David L. Donoho and Philip B. Stark, "Uncertainty Principles and Signal Recovery," SIAM Journal on Applied Mathematics, vol. 49, no. 3, pp. 906-931, 1989.

[19] David L. Donoho and Xiaoming Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2845-2862, November 2001.

[20] Michael Elad and Alfred M. Bruckstein, "A generalized uncertainty principle and sparse representation in pairs of bases," IEEE Transactions on Information Theory, vol. 48, no. 9, pp. 2558-2567, September 2002.

[21] Rémi Gribonval and Morten Nielsen, "Sparse representations in unions of bases," IEEE Transactions on Information Theory, vol. 49, no. 12, pp. 3320-3325, December 2003.

[22] Stephen Boyd and Lieven Vandenberghe, Convex Optimization. Cambridge University Press, 2004.

[23] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to Algorithms, 3rd ed. The MIT Press, 2009.

[24] Stephane G. Mallat and Zhifeng Zhang, "Matching Pursuits With Time-Frequency Dictionaries," IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397-3415, December 1993.
[25] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, November 1993.

[26] Geoffrey Davis, Stéphane Mallat, and Marco Avellaneda, "Adaptive Greedy Approximations," Constructive Approximation, vol. 13, pp. 57-98, 1997.

[27] Joel A. Tropp, "Greed is Good: Algorithmic Results for Sparse Approximation," IEEE Transactions on Information Theory, vol. 50, no. 10, pp. 2231-2242, October 2004.

[28] David L. Donoho, Yaakov Tsaig, Iddo Drori, and Jean-Luc Starck, "Sparse solution of underdetermined linear equations by Stagewise Orthogonal Matching Pursuit," preprint, 2006.

[29] Deanna Needell and Roman Vershynin, "Signal Recovery From Incomplete and Inaccurate Measurements Via Regularized Orthogonal Matching Pursuit," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 310-316, April 2010.

[30] Gagan Rath and Christine Guillemot, "Sparse approximation with an orthogonal complementary matching pursuit algorithm," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3325-3328, April 2009.

[31] Thomas Blumensath and Mike E. Davies, "Gradient Pursuits," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2370-2382, June 2008.

[32] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301-321, May 2009.

[33] Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani, "Least Angle Regression," The Annals of Statistics, vol. 32, no. 2, pp. 407-451, April 2004.

[34] David L. Donoho and Yaakov Tsaig, "Fast Solution of L1-norm Minimization Problems When the Solution May be Sparse," IEEE Transactions on Information Theory, vol. 54, no. 11, pp. 4789-4812, November 2008.

[35] Robert Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, vol. 58, no. 1, pp. 267-288, 1996.

[36] Joel A. Tropp and Anna C. Gilbert, "Signal recovery from random measurements via Orthogonal Matching Pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655-4666, December 2007.

[37] Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning, 2nd ed. New York, NY, USA: Springer, 2009.

[38] M. Elad, J.-L. Starck, P. Querre, and D. L. Donoho, "Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA)," Applied and Computational Harmonic Analysis, vol. 19, no. 3, pp. 340-358, November 2005.

[39] Jean-Luc Starck, Michael Elad, and David L. Donoho, "Image Decomposition: Separation of Texture from Piecewise Smooth Content," in SPIE Annual Meeting, San Diego, CA, USA, 2003.

[40] Jean-Luc Starck, Michael Elad, and David L. Donoho, "Redundant Multiscale Transforms and Their Application for Morphological Component Separation," Advances in Imaging and Electron Physics, vol. 132, pp. 287-348, 2004.

[41] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete Cosine Transform," IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90-93, January 1974.

[42] Behrooz Parhami, Introduction to Parallel Processing. New York, NY, USA: Plenum Press, 1999.

[43] Julien Mairal, SPArse Modeling Software. [Online]. Available: http://www.di.ens.fr/willow/SPAMS/