COMBINATORIAL METHODS FOR COMPRESSED SENSING

By

Abdolreza Abdolhosseini Moghadam

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
Electrical Engineering – Doctor of Philosophy

2014

ABSTRACT

COMBINATORIAL METHODS FOR COMPRESSED SENSING

By Abdolreza Abdolhosseini Moghadam

The inefficiency of classical signal acquisition systems, where a signal is sampled at the Nyquist rate and then compressed, raises the question: without knowing any prior, can we not directly sense a compressed version of the signal? In a series of papers, it has been shown that under certain conditions on the sensing operation and the signal of interest, a non-adaptive linear sampling scheme called Compressed Sensing (CS) can achieve this objective. Consequently, a number of decoders have been proposed to recover a signal from its compressive samples. Two extreme approaches among these decoders are convex relaxation and combinatorial methods. The former approach requires the fewest samples for an exact reconstruction at the price of high complexity, while the latter has low complexity but requires a higher number of samples for recovery. This thesis aims to design a CS approach that keeps only the best properties of the convex relaxation and combinatorial approaches. In particular, we plan to show that, under our proposed sensing operator, a fast combinatorial approach is an instance of convex relaxation and thus inherits all the good properties of convex relaxation approaches, namely optimality in sample requirements and robustness in the presence of noise.

Keywords: Compressed sensing, compressive sampling, sparse coding, combinatorial algorithms, image processing and signal processing.

I dedicate this dissertation to the nicest family, which I was fortunate enough to be part of, i.e. my parents (Eynollah and Farkhondeh), my brothers (Mohammad Reza, Hamid Reza –rest in peace–, Ahmad Reza and Mohammad) and my beloved wife, Sara.

ACKNOWLEDGMENTS

I would like to thank my thesis committee members Dr. Jonathan I. Hall, Dr. Selin Aviyente and Dr. Karim G. Oweiss for their time, patience and valuable comments. I also wish to express my utmost gratitude to Dr. Hayder Radha, my advisor. His guidance, support, sensible trust and so many other positive attributes of his have been my greatest assets in the last seven years. Truly, it was both a privilege and an honor to know him and to be his student. Finally, I want to thank my labmates, especially Mohammad Aghagolzadeh, Iman Barjasteh and Muhammad Usman Ilyas, for the helpful (and sometimes fun) discussions and the time we spent together.

TABLE OF CONTENTS

LIST OF FIGURES

Chapter 1  Introduction
  1.1 A new sampling scheme
  1.2 Notations
  1.3 Compressed Sensing
    1.3.1 From Compressed Sensing to under-determined system of equations
    1.3.2 Uniqueness of the solution
    1.3.3 Tractability of the decoder
  1.4 Related Works
  1.5 Motivation for the proposed research
  1.6 Challenges and proposed solutions
    1.6.1 Going to higher fields
    1.6.2 Augmenting with dense random rows
    1.6.3 Robust and deterministic solver
    1.6.4 Changing the problem formulation
  1.7 Proposal organization

Chapter 2  Compressed sensing under hybrid projections
  2.1 Recursive approach for CS
  2.2 Augmenting with dense rows
  2.3 Alignment
  2.4 Lexical order

Chapter 3  An extreme case of hybrid projections
  3.1 Introduction
  3.2 Hybrid Compressed Sensing
    3.2.1 HCS Solver
  3.3 Optimal ratio of ms and md under HCS
  3.4 Simulation Results

Chapter 4  Complex Sparse Projections for Compressed Sensing (CRISP)
  4.1 Introduction
  4.2 Forming the projection matrix
  4.3 CRISP Solver
  4.4 Optimality
    4.4.1 The Average Behavior Analysis
  4.5 Simulation Results

Chapter 5  Modified SSMP
  5.1 SERP projection matrix
  5.2 w-RIP
    5.2.1 Implications of w-RIP
  5.3 w-RIP for SERP matrices
  5.4 A Combinatorial solver for SERP
  5.5 Simulation Results

Chapter 6  Future works
  6.1 Introduction
  6.2 Another look at combinatorial algorithms
  6.3 Noise

BIBLIOGRAPHY
LIST OF FIGURES

Figure 3.1  From left to right, the k-sparse image of Monalisa, reconstructed by applying HCS, BP and OMP respectively to the whole image (m = 150746). The decoding times are tbp = 109.5077, thcs = 16.1704 and tomp = 15.1670 seconds.

Figure 3.2  From left to right, the k-sparse image of Lenna, reconstructed with m = 28 compressive samples by applying HCS, BP and OMP respectively on blocks of size 8 × 8 pixels. The decoding times are tbp = 41.0283, thcs = 4.4735 and tomp = 1.1512 seconds.

Figure 3.3  m, n and k represent number of samples, signal length and signal sparsity respectively.

Figure 3.4  m, n and k represent number of samples, signal length and signal sparsity respectively.

Figure 3.5  m, n and k represent number of samples, signal length and signal sparsity respectively.

Figure 4.1  Comparing Compressive Demosaicing with some well-known demosaicing algorithms.

Figure 4.2  Comparing Compressive Demosaicing with some well-known demosaicing algorithms.

Figure 4.3  Comparing Compressive Demosaicing with some well-known demosaicing algorithms.

Figure 5.1  Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and sampling ratio m/k, when the projection matrix is sparse binary and the solver is SSMP.

Figure 5.2  Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and sampling ratio m/k, when the projection matrix is a sparse real-valued sensing matrix (SERP) with r = 1 and the solver is MSSMP.

Figure 5.3  Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and sampling ratio m/k, when the projection matrix is a sparse real-valued sensing matrix (SERP) with r = 2 and the solver is MSSMP.

Figure 5.4  Comparing the performance of the sparse real-valued projection matrices and MSSMP with sparse binary projections under the SSMP decoder when samples are noisy.

Figure 5.5  Comparing the performance of the sparse real-valued and dense Gaussian projection matrices when BP is the decoder and samples are noisy.

Figure 5.6  Comparing the performance of the sparse real-valued and dense Gaussian projection matrices when CoSaMP is the decoder and samples are noisy.

Chapter 1
Introduction

1.1 A new sampling scheme

For many decades, the Nyquist–Shannon sampling theorem was the core tenet of signal acquisition systems. According to this theorem, a signal can be completely determined by its samples if the sampling rate is at least twice the maximum frequency of that signal. On the other hand, there are numerous applications where sampling at the Nyquist rate is impossible, very expensive, or time consuming.
For instance, sensing each sample in Magnetic Resonance Imaging (MRI) is so time consuming that a full MRI scan might take several hours, which is clearly inconvenient for patients [5]. Another example is the class of signals with frequencies so high (e.g. terahertz) that no sampling device operates at those frequencies. On the other hand, we often deal with signals that exhibit a high level of correlation in some domain. For instance, it has long been known that natural images can be approximated with virtually no perceptual loss by employing the Discrete Cosine Transform (DCT) or different types of wavelets [51, 52, 53] and retaining a fairly small number of coefficients. Indeed, this fact has been used extensively to compress these classes of signals efficiently: instead of storing the image, a few “significant” transform coefficients of that image are stored and the rest are discarded. This classical technique exposes the inefficiency of the current sampling approach, in which a highly redundant signal is sampled at the Nyquist rate and then, after compression, most of those samples are thrown away. This raises the question “why go to so much effort to acquire all the data when most of what we get will be thrown away? Can’t we just directly measure the part that won’t end up being thrown away?” [2].

Consequently, a new sampling framework was proposed independently by Donoho in [2] and Candes, Romberg and Tao in [5]. This scheme is widely referred to as “Compressed Sensing” or “Compressive Sampling” and often abbreviated as CS [1]-[15]. In those papers, it has been shown that signals with high correlation in some domain can be recovered from (sometimes much) fewer samples than traditional approaches require. This can be achieved by exploiting:

• the fact that high correlation translates into the existence of some “sparsifying domain” in which the transform coefficients of these signals are mostly zero or have very low magnitudes;

• certain conditions on the sensing operator, ensuring that each sample carries some (yet little) information about all, or at least a subset, of the signal of interest.

Furthermore, as opposed to traditional sampling methods, where each sample is computed by convolving the signal to be sensed with a shifted Dirac delta function, in CS each sample is computed by correlating the signal to be sensed with a waveform. This immediately implies that CS (at least in its original form) is a non-adaptive linear sampling scheme. Before presenting the compressed sensing framework, let us introduce the notation which will be used in this report.

1.2 Notations

In this report, we denote vectors by bold lower case letters (e.g. α) and matrices by bold upper case letters (e.g. P). For a natural number q ∈ N, define [q] ≜ {1, 2, . . . , q}. The cardinality of a set S is denoted by |S|. The set difference operation is denoted by \. The complement of S is represented by S^c. The l-th entry of the vector a is denoted by a_l. The support of a signal is denoted by Supp(·), i.e.:

  Supp(x) ≜ {j : x_j ≠ 0}   (1.1)

We use I_n to denote the identity matrix of size n. We often use MATLAB notation for identifying sub-matrices of a matrix. For instance, for P ∈ R^{n1×n2}, s1 ⊆ [n1] and s2 ⊆ [n2], by P_{s1,s2} we mean the sub-matrix of P restricted to the rows in s1 and the columns in s2. Similarly, we use the notation ‘:’, for instance P_{s1,:} (or P_{:,s2}), to keep only the rows (columns) indexed by s1 (s2). The transpose of P is denoted by P^T.
The Moore–Penrose pseudoinverse of a matrix P is denoted by P^†, that is

  P^† = (P^T P)^{−1} P^T   (1.2)

We use ⊙ for the Hadamard (point-wise) product. That is, for a field C and a, b, c ∈ C^n, if a = b ⊙ c then a_l = b_l c_l for l ∈ [n]. Also, ⊗ stands for the Kronecker tensor product. In other words, for n1, n2, m1, m2 ∈ N, A ∈ C^{n1×n2} and B ∈ C^{m1×m2} we have:

  A ⊗ B = [ A_{1,1} B   . . .   A_{1,n2} B ]
          [    ⋮         ⋱         ⋮      ]
          [ A_{n1,1} B  . . .   A_{n1,n2} B ]

For p ≥ 1, the ℓ_p norm of a real valued vector a ∈ R^n, denoted by ∥a∥_p, is:

  ∥a∥_p = ( Σ_{i=1}^{n} |a_i|^p )^{1/p}

Two special cases are p = 0 and p = ∞, where we define the pseudo-norms ℓ_0 and ℓ_∞ by:

  ∥a∥_0 ≜ #{i ∈ [n] : a_i ≠ 0},   ∥a∥_∞ ≜ max_{i∈[n]} |a_i|

In words, the ℓ_0 pseudo-norm of a vector counts the number of non-zero entries of that vector, and ℓ_∞ is the maximum absolute value of its entries. If necessary, we define other notation in the subsequent chapters.

1.3 Compressed Sensing

1.3.1 From Compressed Sensing to under-determined system of equations

Consider the real valued discrete signal s ∈ R^n of length n ∈ N (the following arguments can be extended to analog signals by first discretizing the signal). In the traditional sensing scheme, the sensing operator is I_n, the identity matrix of size n. That is, y, the vector of sensed samples, is simply

  y = I_n s = s

The theory of Compressed Sensing attempts to answer the following question: is there a linear undersampling operator Φ ∈ C^{m×n} with m < n and a decoder ∆ such that, given the samples y = Φs and Φ, the estimated solution ŝ = ∆(Φ, y) has the minimum error in some norm ℓ_p? That is, design ∆(·,·) and the sampling operator Φ such that:

  min ∥∆(Φ, y) − s∥_p   (1.3)

Unfortunately, such a solution cannot be found using classical linear-algebraic approaches alone. To see this, note that y = Φs is an under-determined system of linear equations and has infinitely many solutions. More specifically, recall that for a matrix Φ ∈ R^{m×n}, its kernel or null-space is:

  ker(Φ) = {z ∈ R^n : Φz = 0}   (1.4)

where 0 is the all-zero vector. Now note that in our problem, where m < n, the dimension of the null-space of Φ is at least n − m, meaning that ker(Φ) ≠ {0}. Now assume that z ∈ ker(Φ); then for all α ∈ R we have

  y = Φs = Φs + αΦz = Φ(s + αz)

Thus, given y and Φ, there are infinitely many solutions (of the form s + αz) and we do not know which one is the correct solution. Consequently, we first need a guarantee that the solution s which we are looking for is unique.
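The non-uniqueness argument above is easy to verify numerically. The following short Python/NumPy sketch (our own illustration; the sizes and the use of the SVD to obtain a null-space vector are arbitrary choices, not part of the thesis) builds a random under-determined operator and shows that adding any null-space vector to s leaves the samples unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 40                      # under-determined: fewer samples than unknowns
Phi = rng.standard_normal((m, n))  # a generic sensing operator
s = rng.standard_normal(n)         # the signal we wish to recover
y = Phi @ s

# Any vector in ker(Phi) can be added to s without changing the samples.
# A basis of the null space can be read off the SVD: the rows of Vt beyond rank(Phi).
_, _, Vt = np.linalg.svd(Phi)
z = Vt[m:].T @ rng.standard_normal(n - m)   # a random element of ker(Phi)

print(np.allclose(Phi @ z, 0))              # True: z lies in the null space
print(np.allclose(Phi @ (s + 3.0 * z), y))  # True: s + 3z explains the samples equally well
```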
1.3.2 Uniqueness of the solution

As noted before, for many interesting classes of signals, most notably audio and images, there exist sparsifying transforms. More specifically, by a sparsifying transform Ψ ∈ R^{q×n} for the signal s ∈ R^q we mean that:

  s = Ψx,   k = ∥x∥_0 < q   (1.5)

where x is often referred to as the transform coefficients, or the representation of s in the Ψ domain. Usually n ≥ q so that all vectors in R^q are spanned. A measure of how “good” a sparsifying transform Ψ is, is its ability to make the representations x sparse. In other words, we expect that for all signals {s^(i)}_i belonging to a specific class D (e.g. images of w1 × w2 pixels) we have:

  ∀ s^(i) ∈ D, ∃ k ≪ n, x ∈ R^n :  s^(i) = Ψx,  ∥x∥_0 ≤ k   (1.6)

Here we present the arguments of [43] to show how the existence of a good sparsifying transform drastically changes the uniqueness of a solution for an under-determined system of equations y = Φs. Suppose s has a k-sparse representation x in the Ψ domain, i.e. ∥x∥_0 = k and s = Ψx. As in the same paper [43], let us use the term “projection matrix” for the matrix P = ΦΨ. Then

  y = Φs = ΦΨx = P x

Now assume that every 2k columns of the projection matrix are linearly independent. We claim that x is the unique minimizer of the following optimization problem:

  x̂ = arg min_{x̄} ∥x̄∥_0   s.t.   y = P x̄   (1.7)

We prove this claim by contradiction. Assume there exists another solution, say v ≠ x, such that v is also at most k-sparse (∥v∥_0 ≤ k). Then P x = P v, i.e. P(x − v) = 0. Consequently the vector x − v is in the null-space of P. However, x − v is at most 2k-sparse, which violates the assumption that every 2k columns of the projection matrix are linearly independent. Hence x is the unique minimizer of (1.7). Note that an immediate consequence of this proof is that m = 2k is the minimum number of samples that one can sense from a signal with a k-sparse representation without losing any information, since otherwise the rank of P would be less than 2k and one could find 2k linearly dependent columns of P.

Now one can hope to solve (1.7) and then reconstruct the original signal via s = Ψx. Unfortunately, solving (1.7) is exponentially hard: one must first verify whether y can be spanned by a single column of P, then try every pair of columns of P, and so on. Consequently the next question, after the uniqueness of the solution, is: does there exist a decoder ∆(·,·) with polynomial time complexity for (1.7)?

1.3.3 Tractability of the decoder

The problem of the form (1.7) has been around in the signal processing and physics communities for a long time, specifically for finding the transform coefficients of a signal in an over-complete sparsifying transform where the number of atoms (or basis functions) of that transform exceeds the dimension of the signal [17, 26, 52, 53]. Heuristically, the problem (1.7) has been relaxed to the following optimization problem:

  x̂ = arg min_{x̄} ∥x̄∥_1   s.t.   y = P x̄   (1.8)

The reason for changing the ℓ_0 norm in (1.7) to the ℓ_1 norm in (1.8) is that ℓ_1 is the closest valid convex norm to the non-convex pseudo-norm ℓ_0. Furthermore, under any ℓ_p norm with p > 1, entries with magnitudes in the range (−1, 1) are damped, i.e. penalized relatively little. Thus, if a norm other than ℓ_1 were used in (1.8), the solution would more likely contain many values in the range (−1, 1), which is not desirable (recall that we are interested in finding a sparse solution).
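Problem (1.8) can be written as a standard linear program. The following sketch (our own illustration, not code from the thesis; the function name and test sizes are ours) uses the usual split trick with SciPy's `linprog` and typically recovers an exactly sparse x in this (k, m, n) regime, up to solver tolerance.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(P, y):
    """Solve min ||x||_1 s.t. P x = y as a linear program.
    Introduce u >= |x| elementwise, minimise sum(u) subject to
    -u <= x <= u and P x = y.  Decision vector is [x; u]."""
    m, n = P.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_eq = np.hstack([P, np.zeros((m, n))])
    A_ub = np.vstack([np.hstack([ np.eye(n), -np.eye(n)]),    #  x - u <= 0
                      np.hstack([-np.eye(n), -np.eye(n)])])   # -x - u <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n), A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]

rng = np.random.default_rng(1)
n, m, k = 60, 25, 4
P = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n); x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_hat = basis_pursuit(P, P @ x)
print(np.linalg.norm(x_hat - x))   # typically very small for this regime
```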
All eigen-values of the sub-matrices (with k columns) of P are in the range of [1−δk , 1+δk ]. 4. Submatrices (with k columns) of P approximately behave like an orthonormal frame. We say that a matrix P has a “Restricted Isometry Property” (RIP) of order k, if δ2k < √ 2 − 1. Now we can present the main theorem of Compressed Sensing (in a compact form) [3]. Theorem 1.3.1. Assume y = P x + e, ∥e∥2 ≤ ϵ, δ2k < √ ˆ = arg min ∥¯ 2 − 1 and let x x∥1 subject ¯ ∥2 ≤ ϵ. Then to ∥y − P x ∥ˆ x − x∥2 ≤ C0 k −1/2 ∥x − x⋆ ∥1 + C1 ϵ 8 where C0 and C1 are constants (independent from n) and x⋆ is the best k-sparse approximation to x. The theorem is extremely strong, since: • It is robust to noise e. • Even if the signal of interest x is not exactly sparse, it returns the best sparse approximation of that signal. The only question to be answered is: which matrices have RIP? Generally verifying RIP for matrices is either impossible or the bounds are not very tight. However, there are proofs of the existence of RIP for dense matrices with entries coming from (a) uniformly random (b) normal distribution (c) Bernoulli matrices and (d) partial Fourier matrices [16, 5, 2]. Furthermore the best bound on the number of rows or equivalently the number of samples required to guarantee RIP is for dense matrices with entries from uniformly random or normal when m = O (k log(n/k)). In other words, to make the recovery of a k-sparse vector x from an under-determined system of linear equations y = P x tractable, we have to increase the number of samples from m = 2k to m = O (k log(n/k)). Furthermore, this bound is tight. In other words, if the order of the number of samples is less than O (k log(n/k)), then either the recovery is not possible for “all” sparse signals or there is no tractable decoding algorithm for that amount of samples. 1.4 Related Works Broadly speaking there are three approaches to the problem compressed sensing: 1. Convex relaxation: As stated before,the first proposed protocol for CS was relaxing the non-convex cost function of min ∥.∥0 to the convex one min ∥.∥1 and use random dense 9 matrices for the projection matrix (guaranteeing RIP). Since the derived bounds in [5, 2] are tight, only a few number of research directions remain in the convex relaxation approaches. One is to design more efficient (in terms of complexity) linear programming solver to find the solution to (1.8). This can range from using gradient descent methods [19]-[24] to recent interior points in [17]. The other research direction in this category is to reformulating (1.8) such that either the complexity decreases or the quality of reconstruction increases. For instance the algorithm Basis Pursuit (BP) [17] exactly solves (1.8). While there are other formulations such as Lasso [18] which solves the following problem: arg min∥y − Ψˆ x∥2 : s.t. ∥ˆ x∥ 1 ≤ k (1.10) ˆ x In practice, the complexity of Lasso is slightly lower than BP but it usually leads to higher errors of reconstruction as well. Furthermore, usually the sparsity level k of the unknown vector x is not known beforehand, except in possibly some applications. Consequently BP has become the most popular solver in this category of decoders within the CS community. 2. 
1.4 Related Works

Broadly speaking, there are three approaches to the problem of compressed sensing:

1. Convex relaxation: As stated before, the first proposed protocol for CS was to relax the non-convex cost function min ∥·∥_0 to the convex one min ∥·∥_1 and to use random dense matrices for the projection matrix (guaranteeing RIP). Since the derived bounds in [5, 2] are tight, only a few research directions remain within the convex relaxation approach. One is to design more efficient (in terms of complexity) linear programming solvers for (1.8), ranging from gradient descent methods [19]-[24] to the recent interior point methods in [17]. The other direction is to reformulate (1.8) such that either the complexity decreases or the quality of reconstruction increases. For instance, the algorithm Basis Pursuit (BP) [17] exactly solves (1.8), while there are other formulations such as Lasso [18] which solve the following problem:

   x̂ = arg min_{x̄} ∥y − P x̄∥_2   s.t.   ∥x̄∥_1 ≤ k   (1.10)

   In practice, the complexity of Lasso is slightly lower than that of BP, but it usually leads to higher reconstruction errors as well. Furthermore, the sparsity level k of the unknown vector x is usually not known beforehand, except possibly in some applications. Consequently, BP has become the most popular solver in this category of decoders within the CS community.

2. Greedy pursuit: In these approaches [26, 28, 30, 29], a greedy strategy based on the original Matching Pursuit (MP) [25] is adapted to find the solution to (1.7): in each step of the algorithm, one coordinate of the solution is set to a non-zero value such that the residue of the approximation up to that point is minimized. That is, having samples y = P x, the projection matrix P and the current estimate x̂, the following problem is solved (a minimal sketch of this greedy selection appears after this list):

   arg max_i  |P_{:,i}^T (y − P x̂)| / ( ∥P_{:,i}∥_2 ∥y − P x̂∥_2 )   (1.11)

   and then x̂_i is updated accordingly to minimize ∥R∥_2, where R = y − P x̂ is the residue of the approximation up to that step. The reasoning behind this greedy decision is that if the correlation of the residue with the i-th column of the projection matrix is high, then there is a good chance that x_i has a large magnitude. Note that these greedy approaches directly attempt to solve the ℓ_0 minimization problem (1.7). Thus, proving the convergence of these approaches to the correct solution can be challenging. In fact, except for a few methods such as [28, 30], there is either no proof of convergence for many greedy methods, or certain conditions on P are required (e.g. very small values of δ_{3k}) whose satisfaction for a given matrix may be virtually impossible to verify.

3. Combinatorial algorithms: In these approaches [31]-[42], a carefully designed sensing matrix Φ is deployed in the sensing stage such that the effective projection matrix P = ΦΨ is a highly structured, and often binary, sparse matrix. The sparsity and structure of P are then exploited, usually in a group-testing-like algorithm, to solve (1.7). Similar to the greedy approaches, proving convergence to the correct solution is very difficult, so we do not have strong proofs, as in convex relaxation methods, guaranteeing that these methods converge to the correct solution. We discuss this approach further later in this report.
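The sketch referenced in item 2 is given below. It is our own minimal illustration of an OMP-style greedy pursuit built around the selection rule (1.11) (the residual norm in the denominator does not affect the argmax and is dropped); it is not the exact algorithm of any of the cited references, and the function name and test sizes are ours.

```python
import numpy as np

def omp(P, y, k, tol=1e-10):
    """Minimal orthogonal matching pursuit: greedily pick the column most
    correlated with the current residual (the selection rule of (1.11)),
    then refit all selected coefficients by least squares."""
    n = P.shape[1]
    support, x_hat = [], np.zeros(n)
    residual = y.astype(float).copy()
    col_norms = np.linalg.norm(P, axis=0)
    for _ in range(k):
        i = int(np.argmax(np.abs(P.T @ residual) / col_norms))
        if i not in support:
            support.append(i)
        coef, *_ = np.linalg.lstsq(P[:, support], y, rcond=None)
        x_hat = np.zeros(n)
        x_hat[support] = coef
        residual = y - P @ x_hat
        if np.linalg.norm(residual) < tol:
            break
    return x_hat

rng = np.random.default_rng(3)
n, m, k = 400, 80, 10
P = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n); x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
print(np.linalg.norm(omp(P, P @ x, k) - x))   # small when m is large enough relative to k
```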
One might ask: since linear programming solvers are available for (1.8), why should we look for alternative solutions? The answer lies solely in the complexity of the decoders belonging to the convex relaxation class. In fact, the complexity of a linear programming solver equals that of solving an inverse problem of the same size; that is, if the unknown sparse vector is x ∈ R^n, then the complexity of solving (1.8) is O(n^3). Although this is polynomial time, it is high enough that BP (or any other ℓ_1 minimizer) can only be utilized in small to medium size problems where the length of the signal is at most a few thousand. On the other hand, for many classes of signals (e.g. digital images) the length is on the order of millions (megapixel images). This puts a big restriction on the usability of BP in those applications.

Now let us present a brief comparison of the different CS approaches. The algorithms belonging to the convex relaxation class work well as long as the projection matrix satisfies RIP. Not only do they have the highest level of robustness in the presence of noise, they also exhibit excellent results even if the underlying signal x is not “exactly sparse”, i.e. if the signal has many small but non-zero entries. Furthermore, in practice, they require the fewest samples to recover x from y = P x. However, as stated before, the main disadvantage of these solvers is their high complexity. Furthermore, this class of solvers usually only operates on dense projection matrices P. Thus, in applications where storage is a concern, storing a projection matrix suitable for convex relaxation methods is another challenge (O(mn)). On the other extreme, approaches belonging to the class of combinatorial algorithms have the lowest complexities, usually either linear or sub-linear in the signal length, ≈ O(n). Furthermore, the sparsity of the projection matrix translates into fast matrix operations and efficient storage (≈ O(n)). However, these algorithms virtually always require the highest number of samples for recovery compared to the other classes of decoders. Moreover, they are usually not robust in the presence of noise and only work with exactly sparse signals. Finally, greedy approaches are intermediate in their running time, sampling efficiency and robustness; however, many greedy algorithms require parameter tuning which is often signal dependent [30, 28].

1.5 Motivation for the proposed research

Indeed, the ultimate goal in the compressed sensing framework is to design an effective projection matrix P and an associated decoder (solver) ∆ with the following characteristics:

1. The required number of samples for the recovery of x from y = P x is optimal, i.e. m = O(k log(n/k)).

2. The complexity of the solver is minimal, for example O(n) or even lower.

3. The scheme is robust to noise.

4. The algorithm works even with approximately sparse signals.

So far, all of these properties have not been satisfied by a single scheme. We plan to achieve this goal by utilizing the valuable lessons which existing CS approaches provide. Specifically, we plan to design a CS scheme with the best properties of the convex relaxation and combinatorial approaches. At first glance this task seems impossible, since convex relaxation methods virtually always operate on random dense projection matrices while combinatorial approaches only work with highly structured sparse binary matrices. However, in this thesis we plan to show and prove that by modifying the projection matrices suitable for combinatorial algorithms, and by designing an associated solver, we may be able to achieve the aforementioned goals. To that end, we must slightly deviate from classical combinatorial approaches. More specifically, we investigate the effect of substituting the binary entries of a sparse structured matrix with entries from larger fields, say the real or complex numbers. Note that this technique is reminiscent of modifying codes in classical channel coding theory [54, 55]. Consequently, new solvers must be developed for this new class of projection matrices. One of our objectives is to emulate the Luby Transform (LT) decoder [50] (a rateless, near-optimal erasure correcting code) for solving the problem of CS. In fact, it has been observed that the problem of compressed sensing (solving an under-determined system of equations when the solution is sparse) has a very close relationship with the problem of channel coding, where we have a noisy over-complete system of linear equations [9]. Thus, another objective of this thesis is to establish a strong link between the problems of channel coding and CS. Finally, we introduce a systematic perspective on the problem of CS under combinatorial approaches which might later help us find a rigorous proof for the convergence of the proposed solver.
1.6 Challenges and proposed solutions

To design a low-complexity yet robust and optimal (in terms of sample requirements) compressed sensing framework, it is beneficial to identify the reasons why combinatorial approaches to CS fail to achieve that goal. A typical combinatorial algorithm usually follows this simple procedure to recover a sparse signal x of length n from m < n samples y = P x:

1. Initialize the estimate of x with x̂ = 0, a zero vector of length n.

2. Let s_i be the column indices where the i-th row of the projection matrix is non-zero (and, in fact, one):

   s_i ≜ {j : P_{i,j} ≠ 0}   (1.12)

   We say that the i-th sample spans the signal coordinates s_i.

3. Find a subset of sample indices ω ⊆ [m] with the following properties: (a) the size of this set is greater than a threshold c (|ω| > c); (b) all samples with indices in ω have the same value, ∀i, j ∈ ω : y_i = y_j; and (c) all rows indexed by ω coincide in only one column index: |∩_{i∈ω} s_i| = 1.

4. Let {a} = ∩_{i∈ω} s_i; then update the estimate with x̂_a ← x̂_a + y_b, where b can be any member of the set ω.

5. Update the sample values: y ← y − P_{:,a} y_b.

6. Go to step 3 and continue until y is an all-zero vector.

Let us briefly explain what this algorithm does. After the initialization in the first two steps, the algorithm finds a subset of samples which all vote the same value and which all span only one common signal coefficient x_a. Since the projection matrix is binary, the algorithm assumes that this common vote comes from x_a; consequently, the signal at that coordinate is estimated as the vote. Similar to many message passing algorithms (e.g. for LDPC codes [57, 58]), after subtracting the effect of that estimate from the samples, the algorithm iterates the same procedure. It may happen that an estimate is wrong; however, one hopes that the number of correct estimates exceeds the number of wrong ones, so that the algorithm converges to the correct solution.

A few points should be highlighted here. The assumption that “if a number of samples have equal values and they span only one common signal coordinate, then that common coordinate has to be non-zero” is not valid, and one can find many counterexamples. For instance, if a number of signal coefficients are equal and there exists a subset of samples each spanning only one of those non-zeros, then those samples will be equal as well. In fact, the condition is necessary but not sufficient: if a number of samples span only one common non-zero, then they will be equal. This problem is overcome in traditional combinatorial algorithms by choosing the threshold c (the minimum number of samples that must cast the same vote) large enough to increase the level of confidence. However, as we increase c, we have to increase the number of samples as well, since otherwise no c samples with the aforementioned properties can be found. This observation explains one of the reasons why, in practice, combinatorial algorithms require a higher number of samples m than other classes of CS decoders. Furthermore, if the samples are contaminated by noise, then exactly equal samples will not be found. A remedy is to look for a subset of samples which lie within a small radius of each other; although this correction may work when the noise has very small magnitude, it usually fails. Consequently, designing robust combinatorial CS approaches still remains an open problem.
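To make the six steps above concrete, here is a minimal Python sketch of such a voting decoder (our own illustration, not an implementation from the thesis or the cited references): it assumes a binary sparse P and an exactly sparse x, groups samples by rounded value as a crude stand-in for "equal values", and simply stops when no qualifying vote set remains.

```python
import numpy as np
from functools import reduce

def voting_decoder(P, y, c=2, tol=1e-9, max_iter=10_000):
    """Sketch of the classical voting decoder for a sparse binary projection
    matrix: find > c samples with equal values whose rows overlap in exactly
    one column, credit that coordinate, peel its contribution off y, repeat."""
    m, n = P.shape
    supports = [set(np.flatnonzero(P[i])) for i in range(m)]
    x_hat = np.zeros(n)
    y = y.astype(float).copy()
    for _ in range(max_iter):
        if np.linalg.norm(y) < tol:
            break
        # group sample indices by (rounded) value; zero-valued samples carry no vote
        groups = {}
        for i in range(m):
            if abs(y[i]) > tol:
                groups.setdefault(round(float(y[i]), 6), []).append(i)
        progress = False
        for omega in groups.values():
            if len(omega) <= c:
                continue
            common = reduce(set.intersection, (supports[i] for i in omega))
            if len(common) == 1:                 # all voters span one common coordinate
                a = next(iter(common))
                val = y[omega[0]]
                x_hat[a] += val
                y -= P[:, a] * val
                progress = True
                break                            # re-group after every peeling step
        if not progress:
            break                                # no qualifying vote set: give up
    return x_hat
```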
Another negative result concerning the projection matrices of traditional combinatorial algorithms is that those binary sparse matrices do not satisfy RIP [44]. Indeed, RIP is the most important reason why convex relaxation methods are stable in the presence of noise. Thus, designing alternative robust combinatorial solvers seems challenging (if not impossible) if we keep the sampling process intact. Here we present our solutions to these inherent problems of combinatorial CS approaches that employ binary sparse projection matrices.

1.6.1 Going to higher fields

The first strategy that we employ to improve the characteristics of combinatorial approaches is to replace the binary entries of the projection matrix with numbers from larger fields, e.g. real or complex numbers. As we show later in this report, this allows us to make milder assumptions about the underlying signal for the purpose of recovery. On the other hand, this modification introduces other problems. For instance, even if a subset of samples spans only one (unique) non-zero signal coefficient, those samples will no longer be equal. We propose a simple and efficient solution to this problem in the next chapter. More specifically, we (jointly) look at a subset of l < m samples as a vector in H^l, where H is the field from which the entries of P are drawn. Then we look for a (subset of) columns of the projection matrix which align in that direction. Furthermore, we can carefully choose the entries of the projection matrix such that sub-matrices of P have a lexical order for the decoder. This technique reduces the complexity of our solver significantly, as we show in the following chapters.

1.6.2 Augmenting with dense random rows

As noted before, there are interesting links between the problems of CS and channel coding. This link is even stronger when we consider the class of combinatorial algorithms. It has been shown for some codes [50], for instance via the robust soliton distribution in LT codes, that adding “dense encoding symbols” helps stabilize the decoder against deviations from the average behavior. We consider this option in the design of our projection matrix, where some random dense rows may be added to the projection matrix. Finding the optimal ratio of dense to sparse rows in P is one of our objectives.

1.6.3 Robust and deterministic solver

Recall that in classical combinatorial algorithms some wrong decisions may be made in finding the support of the signal. These wrong decisions are corrected within a message passing algorithm; however, this allowance of wrong decisions leads to extra iterations. In this thesis, we plan to design our solver such that all decisions are correct, which translates into a lower solver complexity and running time. As we show later in this report, we require an RIP-type condition on the projection matrix to guarantee that no mistake can be made during recovery. Meanwhile, RIP translates into robustness in the presence of noise, at least for convex relaxation algorithms. We present numerical experiments indicating that our proposed projection matrices satisfy RIP and provide guidelines on how to prove that claim. Hence, we can hope for a robust, optimal combinatorial CS framework.

1.6.4 Changing the problem formulation

Another contribution of this work is to introduce a systematic formulation of the problem of CS under a combinatorial framework.
In our formulation (as we present in the next chapter), we show that combinatorial algorithms are in fact a divide and conquer approach to finding the sparsest solution of an under-determined system of equations. Then, in the last chapter of this report, we present necessary and sufficient conditions for convergence and robustness.

1.7 Proposal organization

This thesis has two major parts. In the first part, we solve the problem of CS in the ideal case, i.e. the signals of interest are exactly sparse and the measurements are noise free. Furthermore, some of the analysis in the first part predicts only the average behavior of the proposed schemes. The second part of this thesis considers the more realistic scenario where the signals are no longer exactly sparse and the measurements are noisy. In that part, all proofs will hold either with probability one (strictly or asymptotically) or with overwhelming probability. The following chapters are organized as follows:

• In Chapter 2, we introduce “hybrid projection matrices” as candidate projection matrices for the problem of CS. This class of matrices is composed of two parametric dense and sparse sub-matrices. We then present our generic solver for this class of matrices.

• In Chapter 3, we consider the simplest case of hybrid projection matrices, where the rows of the sparse sub-matrix span disjoint coordinates of the signal. We derive a bound on the sample requirements, the optimal ratio of dense rows to sparse rows, and the complexity of the scheme. Our simulation results also verify that this simple scheme can, in certain applications, outperform many state of the art algorithms in the ideal case.

• In Chapter 4, we consider the case where the rows of the sparse sub-matrix of our projection matrix coincide in some column indices, but no dense sub-matrix is present in the projection matrix. Two interesting types of incidence are those derived from the adjacency matrices of (a) expander graphs [63] and (b) block designs, especially Balanced Incomplete Block Designs (BIBD) [49]. We show that expander graphs lead to an optimal (in terms of sampling requirements) scheme.

• Chapter 5 is devoted to future work and our plans to extend this work to the case of compressible signals and noisy measurements.

Chapter 2
Compressed sensing under hybrid projections

2.1 Recursive approach for CS

In this section, we present our systematic approach to the problem of CS. Although most combinatorial approaches follow a similar path, to the best of our knowledge the abstract formulation presented below is novel. For simplicity of notation and without any loss of generality, assume the signal of interest is sparse in the standard basis, Ψ = I_n. That is, the sensing matrix and the projection matrix are the same, P = ΦΨ = Φ, and x = s. Consider two disjoint non-empty sets T ⊂ [n] and T^c, where by definition T ∪ T^c = [n]. For all vectors x ∈ R^n and norms ℓ_p we have:

  ∥x∥_p^p = ∥x_T∥_p^p + ∥x_{T^c}∥_p^p

However, the norms ℓ_0 and ℓ_1 are special since the equality holds for the norms themselves:

  ∥x∥_0 = ∥x_T∥_0 + ∥x_{T^c}∥_0,   ∥x∥_1 = ∥x_T∥_1 + ∥x_{T^c}∥_1

and a similar equality generally does not hold for other norms. We show that this special property of ℓ_0 and ℓ_1 enables us to follow a divide and conquer approach to the problem of CS. Recall that the goal of CS is to find the vector x ∈ R^n with minimum sparsity which solves the under-determined system of linear equations y = P x ∈ R^m, where m < n, i.e.:

  x̂ = arg min_{x̄} ∥x̄∥_0   s.t.   y = P x̄   (2.1)
One might follow a divide and conquer approach to this problem, where the coordinates of x are partitioned into two disjoint non-empty sets T, T^c ⊂ [n] and the following problem is solved:

  min ∥x_T∥_0 + min ∥x_{T^c}∥_0,   s.t.   y = P x   (2.2)

It can easily be verified (e.g. by contradiction) that the solution to this optimization problem equals the solution to the original form of CS (2.1). This divide and conquer approach has all the advantages of a recursive program. For instance, after dividing the signal coordinates into two disjoint sets T and T^c, one might partition each of these two sets into two more disjoint non-empty subsets. This division can be continued until we reach a point where the sub-solution with minimum ℓ_0 can be found tractably, and the tree is then traversed upward. However, this approach implies some undesirable conditions on the projection matrix. More specifically, to implement this divide and conquer algorithm even at the first level, we require that the samples y can be divided into disjoint subsets Q and Q^c such that the samples y_Q only span x_T and the samples y_{Q^c} only span x_{T^c}. (Recall that we say the i-th sample spans the coordinates ω if ω = {j : P_{i,j} ≠ 0}.) This is required since otherwise the sub-problems (min ∥x_T∥_0 and min ∥x_{T^c}∥_0) would not be independent and a divide and conquer approach would not work. Thus, in this case, there exist permutation matrices Π_1 and Π_2 such that:

  Π_1 P Π_2 = [ A  0 ]
              [ 0  B ]   (2.3)

where A ∈ R^{|Q|×|T|} and B ∈ R^{|Q^c|×|T^c|}. Unfortunately, this highly structured matrix most likely does not satisfy RIP [3, 16] and hence may lead to a non-robust solution. To avoid this problem we have to allow some overlap among the span sets of different samples. Knowing that samples span overlapping subsets of the signal coordinates, we slightly change the formulation of (2.2) to the following:

  min ∥x_{T∪a}∥_0 + min ∥x_{T^c∪a}∥_0,   s.t.   y = P x   (2.4)

where a is a (small) non-empty subset of [n]. Generally, the solution to this problem is not equal to that of the original problem (2.1) unless x_a is an all-zero vector. In this thesis, we attempt to solve the problem of CS using this formulation: find two overlapping subsets of signal coefficients, say T ∪ a and T^c ∪ a, such that x_a is zero, and solve each of the two sub-problems independently.

Suppose the i-th sample (i ∈ [m]) spans the indices s_i of a k-sparse signal of length n. Without loss of generality, assume that the support of the signal is distributed uniformly. Then the probability that y_i spans j non-zeros is:

  Pr( |s_i ∩ Supp(x)| = j ) = (k choose j) (n−k choose |s_i|−j) / (n choose |s_i|)   (2.5)

which is a hypergeometric distribution with mean |s_i| k / n. Note that for a dense matrix where all entries are non-zero, |s_i| = n and Pr(|s_i ∩ Supp(x)| = k) = 1 (meaning that every sample spans all non-zeros), which is not surprising. However, for sparse matrices, where |s_i| ≪ n and k ≪ n, this probability (and the corresponding mean) is very small. In other words, for a sparse projection matrix, and even accounting for the variance and deviation from the mean, most samples span only a few non-zeros (if any).
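The spanning probability (2.5) is a standard hypergeometric law and can be evaluated directly; the sketch below (our own illustration with arbitrary numbers) confirms that for a sparse row weight w = |s_i| most samples span zero or one non-zero.

```python
from scipy.stats import hypergeom

n, k, w = 10_000, 50, 100          # signal length, sparsity, row weight |s_i|
dist = hypergeom(M=n, n=k, N=w)    # number of non-zeros spanned by one sparse sample

print(dist.pmf(0), dist.pmf(1))    # most samples span zero or one non-zero
print(dist.mean())                 # equals w*k/n = 0.5 on average
```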
It turns out that finding samples which span only zero-valued coefficients is not trivial; otherwise we would already know the support of the signal, and the problem could be solved by forming a full-rank system of equations. As we show in the following chapters, finding isolated non-zero coefficients appears to be a simpler task. We say that the non-zero coefficient x_i is isolated in the j-th sample if Supp(x) ∩ s_j = {i}, i.e. the only non-zero coefficient sampled by y_j is x_i. In the scenario where isolated non-zeros can be identified, one may apply the following algorithm to solve the problem of Compressed Sensing.

Algorithm 1: A recursive decoder x̂ = ∆_s(P, y)
  input : samples y = P x, projection matrix P
  output: x̂
  Find a (number of) sample(s) y_j that spans an isolated non-zero, for instance x_i;
  Let a be the estimate of that isolated non-zero;
  x̂_i ← a;
  y ← y − P_{:,i} a   (subtract the effect of the estimate from the samples);
  P ← P_{[m]\j, [n]\s_j},  y ← y_{[m]\j}   (discard the used samples and the respective rows and columns of the projection matrix);
  x̂_{[n]\s_j} ← ∆(P, y);

Note that Algorithm 1 is exactly our previous recursive formulation (2.4) when ∥x_T∥_0 = 1. The same algorithm can be expressed in a non-recursive form, which is presented in Algorithm 2.

Algorithm 2: Non-recursive formulation of the decoder x̂ = ∆_s(P, y)
  input : samples y = P x, projection matrix P
  output: x̂
  while ∥y∥ ≠ 0 do
    Find a (number of) sample(s) y_j that spans an isolated non-zero, for instance x_i;
    Let a be the estimate of that isolated non-zero;
    x̂_i ← x̂_i + a;
    y ← y − P_{:,i} a;
  end

2.2 Augmenting with dense rows

Algorithm 1 succeeds in recovering the sparse vector x if we can always find a number of samples that span an isolated non-zero. This is equivalent to guaranteeing that the ripple in LT codes never becomes empty [50] before complete recovery. In LT codes, this guarantee comes from a carefully designed degree distribution over the message and parity nodes. However, unlike in channel codes, in CS we do not know the support of the signal. Consequently, it would be extremely difficult (if not impossible) to design the span sets {s_i} such that we can always find a number of samples spanning an isolated non-zero. To overcome this problem, and for other reasons to be discussed in the following chapters, we augment the projection matrix with m_d ≥ 0 random dense rows (hence hybrid projection matrices). To avoid ambiguity, from now on we use m_s to denote the number of sparse rows (samples) of a projection matrix and m_d to denote the number of dense rows of the same projection matrix. Now assume that, after a number of iterations, the algorithm ∆_s(·,·) cannot find any other isolated non-zero and halts. Furthermore, assume that up to that point the signal values at the indices ω ⊆ [n] have been recovered and the samples y_q coming from the m_s sparse rows of the projection matrix have been used. If the number of undiscovered signal coefficients n − |ω| is less than or equal to the number of unused samples m_d + (m_s − |q|), then we have a sufficient number of equations to determine the signal values at the remaining indices. Algorithm 3 summarizes this case.

Algorithm 3: The hybrid solver x̂ = ∆_h(P, y)
  input : samples y = P x, projection matrix P
  output: x̂
  x̂ ← ∆_s(P, y);
  Let q ⊆ [m_s] be the sample indices which have been used from the sparse part;
  Let ω ⊆ [n] be the indices where the signal has been recovered;
  if n − |ω| ≤ m_d + (m_s − |q|) then
    x̂_{[n]\ω} ← P^†_{:,[n]\ω} (y − P x̂);
  end
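The final step of Algorithm 3 is an ordinary least-squares solve on the still-unknown coordinates. Below is a minimal Python sketch of that step only (the function name, the rank check and the error handling are our own choices, not the thesis's); it assumes `omega` holds the indices already recovered by the peeling stage.

```python
import numpy as np

def hybrid_finish(P, y, x_hat, omega):
    """Sketch of the last line of Algorithm 3: once the peeling stage has
    recovered the coordinates in omega, solve for the remaining unknowns by
    least squares on the residual samples, provided enough independent
    equations remain."""
    n = P.shape[1]
    rest = np.setdiff1d(np.arange(n), list(omega))
    residual = y - P @ x_hat                      # what the recovered part does not explain
    A = P[:, rest]
    if np.linalg.matrix_rank(A) < len(rest):
        raise ValueError("not enough independent equations left for the tail")
    x_hat = x_hat.copy()
    x_hat[rest], *_ = np.linalg.lstsq(A, residual, rcond=None)
    return x_hat
```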
2.3 Alignment

So far we have assumed that there exists an algorithm which can find isolated non-zeros. Here we present a simple procedure for this task. Assume A ∈ R^{a×b} and that z ∈ R^b is a vector of size b, where a < b. Furthermore, define h = Az. We say that h is aligned with the i-th column of A if h is in the direction of A_{:,i}, that is:

  |A_{:,i}^T h| / ( ∥A_{:,i}∥_2 ∥h∥_2 ) = 1   (2.6)

Note that if ∥z∥_0 = 1, then h is aligned with one of the columns of A, since:

  z_j = 0 for j ≠ i,  z_i = α   ⇒   h = α A_{:,i}   (2.7)

Now, if we can prove the converse, namely that alignment of h with a column of A implies ∥z∥_0 = 1,

  h = α A_{:,i}   ⇒   z_j = 0 for j ≠ i,  z_i = α   (2.8)

then we can utilize the concept of alignment to find isolated non-zeros in the problem of CS. More specifically, we can substitute A with P_{Ω,T}, h with y_Ω and z with x_T, where T ⊆ [n] and Ω ⊆ [m] is the set of indices of the samples spanning x_T, and then verify whether ∥x_T∥_0 = 1 or not. In our simulations, we have observed that the aforementioned statement holds true for our projection matrices. Providing a rigorous proof of (2.8) is part of our future work; in Chapter 5, we present some guidelines on how we intend to prove it.

2.4 Lexical order

Indeed, the concept of alignment for a matrix A ∈ R^{a×b} and a vector h ∈ R^a only makes sense when a > 1 (for a = 1 alignment always holds). As stated before, A will be a sub-matrix of the projection matrix P and h will be a subset of the compressive samples y = P x. Clearly, we need some properties of the entries of the projection matrix to make the process of verifying alignment tractable; otherwise, we would have to check all sub-matrices of the projection matrix and the respective subsets of samples to find alignments, which has exponential complexity. This problem can be solved by either of the following two strategies:

1. Slightly modifying the decoder ∆(·,·).

2. Using a lexical order, meaning that the columns of the projection matrix have a logical order known to the decoder. That is, we can find a scoring function S : R^a → R such that:

   i, j ∈ [b], i < j :  S(A_{:,i}) < S(A_{:,j})   (2.9)

In the latter case, a quick search algorithm, similar to Algorithm 4, can be used to detect the existence of an alignment. Examples of these concepts will be presented in further detail in the following chapters.

Algorithm 4: M = B(A, h, l, u): a binary search for alignment when a lexical order exists
  input : A, h = Az, l, u
  output: M, the index of alignment
  if u < l then
    return −1 (no alignment found);
  end
  M = l + ⌊(u − l)/2⌋;
  if S(A_{:,M}) < S(h) then
    return B(A, h, M + 1, u);
  end
  if S(A_{:,M}) > S(h) then
    return B(A, h, l, M − 1);
  end
  return M;
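The two ingredients of this chapter, the alignment test (2.6) and the score-based binary search of Algorithm 4, are easy to prototype. The sketch below is our own illustration (function names, tolerances and the generic `score` callback are ours); note that a matching score is only a proxy for alignment, and in the concrete schemes of the next chapters the score is chosen so that a score match does imply alignment.

```python
import numpy as np

def is_aligned(A, h, tol=1e-9):
    """Alignment test of Eq. (2.6): return the index of the column of A that
    h is (numerically) parallel to, or -1 if no column qualifies."""
    for i in range(A.shape[1]):
        c = np.abs(A[:, i] @ h) / (np.linalg.norm(A[:, i]) * np.linalg.norm(h))
        if abs(c - 1.0) < tol:
            return i
    return -1

def aligned_column_by_score(A, h, score, tol=1e-9):
    """Iterative version of Algorithm 4, assuming the columns of A are
    already sorted by the scoring function `score` (the lexical order)."""
    lo, hi = 0, A.shape[1] - 1
    target = score(h)
    while lo <= hi:
        mid = (lo + hi) // 2
        s = score(A[:, mid])
        if abs(s - target) < tol:
            return mid
        if s < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```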
Chapter 3
An extreme case of hybrid projections

3.1 Introduction

In this chapter we present Hybrid Compressed Sensing (HCS), an instance of the class of hybrid projection matrices presented in Chapter 2. HCS is an extreme example of this class since, in its sparse sub-matrix, each sample spans a disjoint subset of signal coefficients. As we show in this chapter, HCS is not able to achieve the optimal sample requirement of m = O(k log(n/k)). However, it can be proved that the non-optimal HCS outperforms even state of the art solutions in certain regimes of the signal length n and sparsity k.

3.2 Hybrid Compressed Sensing

Recall that a matrix belonging to the class of hybrid projection matrices (after some permutation of its rows) can be decomposed as:

  P = [ P^(s) ]
      [ P^(d) ]   (3.1)

where P^(s) is an m_s × n sparse matrix and P^(d) is an m_d × n dense matrix. In HCS, we set the non-zero entries of P^(s) and P^(d) to be complex and real numbers, respectively. Note that there exist devices, for instance in radar or medical imaging, which capture complex valued samples. Nevertheless, we count each complex valued compressive sample as two real samples (one for the imaginary part and one for the real part). Hence, the projection matrix P in (3.1) senses m = 2m_s + m_d measurements. For simplicity of analysis, and without loss of generality, assume m_s divides n. Now let s = {s_1, . . . , s_{m_s}} be a set of equal-size random subsets of [n] = {1, 2, . . . , n} such that s forms a partition:

  ∀i, j ∈ [m_s], i ≠ j :  |s_i| = w,  s_i ∩ s_j = ∅,  ∪_{i=1}^{m_s} s_i = [n]

Then the i-th row of P^(s) is non-zero only in the column indices s_i. Also let f^(i) be a unit-norm 1 × w row vector with random complex entries having distinct phases in the range [0, π). We insert the entries of f^(i) into the i-th row of P^(s) at the column indices s_i. This can be written as:

  ∀i ∈ [m_s] :  P^(s)_{i,s_i} = f^(i),  P^(s)_{i,[n]\s_i} = 0

Thus the i-th sample (i ∈ [m_s]) is a function of the signal coefficients only at the indices s_i: y_i = f^(i) x_{s_i}. Appending m_d real-valued dense random rows to this matrix yields the final projection matrix.

3.2.1 HCS Solver

Since there is no overlap among the different rows of P^(s), finding the isolated non-zeros cannot be made robust unless we make some assumptions about the signal. Let us assume that the distribution of the signal coefficients is such that, for all Ω ⊂ [n] with |Ω| = w, the inner product of x_Ω with a unit-norm random vector of complex numbers is non-zero. Our simulations show that this assumption holds true for standard test signals and images in the CS community. Using this assumption we have:

Proposition 1. Let s_i ⊂ [n], |s_i| = w, be the (ordered) set of column indices where the i-th row of the sparse matrix P^(s) is non-zero, and let P^(s)_{i,s_i} = f^(i) = [e^{jϕ_1^(i)} . . . e^{jϕ_w^(i)}] be the corresponding vector of complex exponentials (on the unit circle) such that the random phases ϕ are distinct in the range [0, π):

  ∀l, t ∈ [w], l ≠ t :  ϕ_l^(i) ∈ [0, π),  ϕ_t^(i) ≠ ϕ_l^(i)

If x_{s_i} is real-valued and non-zero in at most one index (∥x_{s_i}∥_0 ≤ 1), then given y_i = P^(s)_{i,:} x = f^(i) x_{s_i} and f^(i), one can recover x_{s_i} deterministically. Moreover, given y_i and f^(i), we can detect the event that ∥x_{s_i}∥_0 ≥ 2. Specifically, if y_i = 0 then x_{s_i} = 0. If the phase of y_i equals the phase of the l-th entry of f^(i) (∠y_i = ∠f_l^(i) = ϕ_l^(i)), then x_{s_i} is non-zero only at the index s_{i,l} and has the value x_{s_{i,l}} = y_i / f_l^(i) = ∥y_i∥, where s_{i,l} is the l-th member of the ordered list s_i. If the phase of y_i is not among the phases of f^(i) ({ϕ_1^(i), . . . , ϕ_w^(i)}), then x_{s_i} is non-zero in at least two indices.

Recall that hybrid solvers have two major stages: 1) finding and recovering the subsets of signal coefficients that span at most one non-zero coefficient, and 2) forming a full-rank system of linear equations and solving it. For the first m_s complex valued compressive samples (y_i, i ∈ [m_s]), the solver looks for the phase of each compressive sample (∠y_i) among the available phases of the associated random vector. As before, let f^(i) = P^(s)_{i,s_i}, w = |s_i|, and define:

  ∠f^(i) ≜ [∠f_1^(i) . . . ∠f_w^(i)]

Assume the phase of the sample y_i equals the phase of the t-th entry of f^(i). Then, under our assumptions and by Proposition 1, x_{s_i} is zero at all indices except the index s_{i,t}. This can be written as:

  ∠y_i = ∠f_t^(i)   ⇔   x_l = { ∥y_i∥,  l = s_{i,t}
                               { 0,      l ∈ s_i \ {s_{i,t}}   (3.2)

Moreover, if the compressive sample y_i is zero, then x_{s_i} is a zero vector. Otherwise (if the sample is neither zero nor is its phase found in ∠f^(i)), this subset of coefficients (x_{s_i}) spans at least two non-zeros and Proposition 1 is not applicable to it. Also note that the idea of matching the phase of a compressive sample against the available phases of a row of the projection matrix is an instance of the concept of alignment introduced in the previous chapter. More specifically, in the formulation of (2.6), Proposition 1 can be recast as finding the alignment of the vector

  h = [ ℜ(y_i) ]
      [ ℑ(y_i) ]   (3.3)

with a column of the matrix

  A = [ ℜ(P_{i,s_i}) ]
      [ ℑ(P_{i,s_i}) ]   (3.4)

where, for a complex number c = a + bi with i^2 = −1, ℜ(c) = a and ℑ(c) = b. Define I′ as the set of sample indices such that each of these samples is either zero or its phase can be found among the available phases of the respective random vector:

  I′ = { i ∈ [m_s] : y_i = 0 or ∠y_i ∈ ∠f^(i) }   (3.5)

Thus the first stage of the HCS solver determines the signal values at the indices of

  I = ∪_{i∈I′} s_i   (3.6)

Consequently, P_{:,I} x̂_I is the effect of the identified coefficients on the compressive samples y, and by subtracting P_{:,I} x̂_I from the compressive samples y we nullify that effect:

  y − P_{:,I} x̂_I = P_{:,I^c} x̂_{I^c}

where I^c is the set complement of I (I ∪ I^c = [n]). If we consider the n signal coefficients as unknowns and the compressive samples as equations, the problem y = P x can be viewed as a system of n unknowns and m = 2m_s + m_d equations in which |I| coefficients have already been recovered in the first stage of the solver. Having n − |I| further (independent) equations, one can recover the remaining unknowns. The random dense sub-matrix P^(d) and the complex samples spanning at least two non-zeros provide us with such equations.
md ≥ n − E(|I|) − 2 ms − I ′ ms ,md (3.10) The following two lemmas outline key results for solving this problem and determine the sample requirements of the HCS solver. The first lemma provides expressions for the numbers of complex samples ms and real samples md that are required for the recovery of x under HCS. The second lemma simplifies these expressions and show that the HCS measurement bound is tight. A 33 corollary is also stated below. Lemma 3.3.1. Consider a k-sparse signal x of length n and define l = n/k. Then on average, HCS requires (√ ms ≈ k 3 l 2 √ − 3 16 1 − 729l 3 ) (3.11) complex-valued samples and md = n − nα − 2αms (3.12) real-valued samples for the perfect recovery of x where α = (1 + k/ms ) exp(−k/ms ). Proof of Lemma 1. To that end we solve the following optimization problem: ( ) arg min 2ms + md s.t. md ≥ n − E(|I|) − 2 ms − I ′ ms ,md (3.13) One can use the Lagrangian multiplier to solve (3.13). Define α = E(|I ′ |)/ms < 1, then the objective function (L) is: L = 2ms + md + λ (αn + 2(1 − α)ms + md − n) (3.14) Taking derivatives respect to three parameters λ, ms and md gives us three equations: λ = −1, αn + 2(1 − α)ms + md = n (3.15) ∂L = 0 ⇒ 2m3s + 2km2s + 2k 2 ms = nk 2 ∂ms (3.16) 34 Define l = n/k. Let us assume l2 ≫ l, then solving (3.16) yields: (√ ms = k 3 l 2 √ − 3 1 16 − 729l 3 ) √ 1 and zero otherwise. Then Pr(νi = 1) = 1 − α. Hence E(νi ) = 1 − α and σ 2 (νi ) = α(1 − α). Define σ2 = ∑ 2 (σi )/ms = σi2 . By Bennet’s inequality [59] we have: ∆ = Pr (m s ∑ ) (νi − E(νi )) > t ( ( 2 ≤ exp −ms σ h i=1 t ms σ 2 )) where: h(u) = (1 + u) log(1 + u) − u > 3u2 6 + 2u (3.19) for u ≥ 0. Solving ∆ ≤ 1/k gives: t≈ √ 2σ 2 ms ln k It is straightforward to see that when ( )3 2 n 2σ ln k ≪ k 4 (3.20) then we have t ≪ k and this completes the proof. Corollary 3.3.3. If √ 3 2l2 ≤ 2 + 1/(1 − α) then md ≤ ms . Although the HCS sampling bound is not optimal for all values of k and n, it outperforms 36 even the most complex CS solutions over a wide range of typical and practical values of k and n (e.g., as long as k/n is not excessively small). Specifically let m⋆ = Ck log n/k be the sample requirement for an optimal CS framework where C is a constant only depending on the projection √ matrix and the solver. If 3 n/k < 1.25C log(n/k), then HCS requires fewer samples for perfect recovery when compared with the sampling requirement of the hypothetical optimal solver. Now let us compute the complexity of the HCS solver. Recall that the HCS decoding process consists of two stages. In the first stage, for each complex valued compressive sample (y i , i ∈ [ms ]), we look for the phase of that sample (∠y i ) among the available phases in the respective random vector ∠f (i) . Assume the entries of (i) (i) (i) f (i) = [exp(jϕ1 ) exp(jϕ2 ) . . . exp(jϕw )] are sorted based on phases: (i) (i) (i) (i) a, b ∈ [w], a > b ⇔ ∠f a = ϕa > f b = ∠ϕb (3.21) Then the complexity of this search for each compressive sample is only log(w). Since we have to repeat this search for all complex valued compressive samples, thus the complexity of the first stage of the solver is O(ms log(w)). As before, let |I ′ | ≥ 0, denotes the number of compressive samples spanning at most one non-zero coefficient, then we have not utilized ms − |I ′ | complex sample from the first stage of the solver. The second stage of the HCS solver finds the solution of a full rank system of md + ms − |I ′ | ≤ 2ms equations. 
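To make the sample counts of Lemma 3.3.1 concrete, the short sketch below solves the cubic (3.16) numerically for m_s and evaluates m_d from the active constraint in (3.13); this is an illustrative check of the formulas, not code from the thesis, and all names are assumptions.

```python
import numpy as np

def hcs_sample_counts(n, k):
    """Average-case HCS sample requirement from (3.16) and the constraint in (3.13)."""
    # solve 2*m_s^3 + 2*k*m_s^2 + 2*k^2*m_s = n*k^2 for its positive real root
    roots = np.roots([2.0, 2.0 * k, 2.0 * k**2, -float(n) * k**2])
    m_s = float(max(r.real for r in roots if abs(r.imag) < 1e-6 * abs(r)))
    alpha = (1.0 + k / m_s) * np.exp(-k / m_s)          # alpha = E(|I'|)/m_s
    # m_d from the active constraint: m_d = n - E(|I|) - 2*(m_s - E(|I'|))
    m_d = n - n * alpha - 2.0 * (1.0 - alpha) * m_s
    return m_s, m_d, 2.0 * m_s + m_d

m_s, m_d, m = hcs_sample_counts(4800, 800)   # the Blocks signal of Fig. 3.4 uses m = 2400
print(round(m_s), round(m_d), round(m))      # -> 800 846 2446 (average-case requirement)
```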
Let $g(\beta)$ be the complexity of the linear solver employed in the second stage of the HCS solver to solve a full-rank system of $\beta$ equations. Then the complexity of the second stage of HCS is:

$$O\left(g(m_d + m_s - |I'|)\right) \;\leq\; O\left(g(2m_s)\right) \qquad (3.22)$$

Finally, the overall complexity of the HCS solver is $O\left(\max\{m_s \log(n/m_s),\ g(2m_s)\}\right)$.

As a final remark, note that the phases of the samples here play the role of the lexical-order concept presented in the previous chapter. More specifically, in the case of HCS and based on formulation (2.9), we have:

$$h \in \mathbb{C}: \quad S(h) = \angle h \qquad (3.23)$$

3.4 Simulation Results

We tested the proposed HCS framework on a large number of standard compressed sensing signals and images. Here, we present the results for the Monalisa and Lenna images and compare our results with two dominant solvers/projection matrices (Fig. 3.1 and Fig. 3.2). We performed compressive sampling on the whole Monalisa image, while for Lenna we applied block-based compressed sensing. More specifically, for Lenna the target image is formed by keeping the largest k = 8 DCT coefficients in every 8×8 block and setting the rest of the DCT coefficients to zero. For the Monalisa image, a total of k = 57979 DCT coefficients (those with the largest absolute values) out of n = 283554 coefficients have been kept, and the remaining coefficients were set to zero. In our simulations, we counted each complex-valued compressive sample as two real samples. In these figures, m, t_bp, t_hcs and t_omp represent the number of (real) compressive samples and the decoding times required by BP, HCS and OMP respectively. In all simulations we have assumed that, for a given number of samples m, the ratio of complex-valued compressive samples to real-valued dense samples (for HCS) is approximately one (m_s = ⌈m/3⌉ and m_d = m − 2m_s). It is important to note that this sample assignment is not optimal, and HCS needs fewer samples for the perfect recovery of the signal (see Lemma 3.3.1). However, in some real-world applications the number of non-zero coefficients might not be known beforehand. Although BP achieved virtually perfect reconstruction, the complexity of BP is clearly much higher than that of HCS. Further simulation results for 1D signals are presented in Fig. 3.3 - Fig. 3.5.

Figure 3.1: From left to right, the k-sparse image of Monalisa, reconstructed by applying HCS, BP and OMP respectively to the whole image (m = 150746). The decoding times are t_bp = 109.5077, t_hcs = 16.1704 and t_omp = 15.1670 seconds.

Figure 3.2: From left to right, the k-sparse image of Lenna, reconstructed with m = 28 compressive samples by applying HCS, BP and OMP respectively on blocks of size 8×8 pixels. The decoding times are t_bp = 41.0283, t_hcs = 4.4735 and t_omp = 1.1512 seconds.

Figure 3.3: Reconstruction of the Doppler test signal (n = 120, k = 20) from m = 50 samples: noiseless HCS (MSE = 0.000000, time = 0.006031 s), BP with a Gaussian projection (MSE = 0.015769, time = 0.044851 s) and OMP with a Gaussian projection (MSE = 0.024785, time = 0.011491 s). m, n and k represent the number of samples, the signal length and the signal sparsity respectively.
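The block-sparse test images described above can be produced with a simple DCT-thresholding step. The sketch below is illustrative (it uses scipy rather than any code from the thesis) and keeps the k largest-magnitude DCT coefficients in every b × b block.

```python
import numpy as np
from scipy.fftpack import dctn, idctn

def k_sparse_blocks(image, k=8, b=8):
    """Keep the k largest-magnitude DCT coefficients in every b x b block."""
    h, w = image.shape
    out = np.zeros((h, w))
    for r in range(0, h - h % b, b):
        for c in range(0, w - w % b, b):
            block = dctn(image[r:r+b, c:c+b].astype(float), norm="ortho")
            thresh = np.sort(np.abs(block), axis=None)[-k]
            block[np.abs(block) < thresh] = 0.0      # zero everything below the k-th largest
            out[r:r+b, c:c+b] = idctn(block, norm="ortho")
    return out
```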
41 Target Signal=Blocks, n=4800, k=800 Noiseless HCS, m=2400, MSE=0.000000, Time=6.450577 4 4 2 2 0 0 1000 2000 3000 4000 BP, Gaussian Projection, m=2400, MSE=0.000037, Time=375.307032 1000 2000 3000 4000 OMP, Gaussian Projection, m=2400, MSE=0.028259, Time=416.847072 10 4 5 2 0 −5 0 1000 2000 3000 4000 1000 2000 3000 4000 (a) Figure 3.4: m, n and k represent number of samples, signal length and signal sparsity respectively. 42 Target Signal=Leopold, n=1800, k=200 Noiseless HCS, m=720, MSE=0.000000, Time=0.224265 0.2 0.2 0.15 0.15 0.1 0.1 0.05 0.05 0 0 500 1000 1500 500 BP, Gaussian Projection, m=720, MSE=0.000126, Time=6.620905 1000 1500 OMP, Gaussian Projection, m=720, MSE=0.000266, Time=11.789665 0.1 0.1 0.05 0.05 0 0 500 1000 1500 500 1000 1500 (a) Figure 3.5: m, n and k represent number of samples, signal length and signal sparsity respectively. 43 Chapter 4 Complex Sparse Projections for Compressed Sensing (CRISP) 4.1 Introduction In this chapter, we present Complex Sparse Projections for Compressed Sensing (CRISP), another instance of hybrid projection matrices. There are two major differences between CRISP and HCS, presented in the previous section: • There are no dense random rows in the projection matrix, i.e. P = P s . • There are overlaps among rows of the (sparse part of) the projection matrix. We show in this chapter that if overlaps among rows of the projection matrix are carefully designed, then CRISP would be an optimal CS framework. 4.2 Forming the projection matrix ¯ would be beneficial to describe properties of CRISP projection maUsing an auxiliary matrix P ¯ . Under the proposed CRISP trices. Let us use the term base projection matrix for such matrix P ¯ has binary entries (ones and zeros), and hence, this base framework, the base projection matrix P ¯ with matrix provides an underlying structure that can be used to populate the “ones” entries of P 44 numbers from higher fields. Similar to HCS, we again use complex numbers in the final projec¯ by complex numbers generates the final tion matrix. This process of replacing the “ones” of P projection matrix P . Consequently, the underlying “structure” (and associated sparsity) of both ¯ and the final projection matrix P is the same. One straightforward approach the base matrix P ¯ is to randomly select the entries of P ¯ where “ones” are placed (Algorithm 5). for constructing P ¯ except for This simple approach does not impose any constraints on the underlying structure of P ¯. the parameter K, which is the number of “ones” that are placed randomly in each column of P Note that K has to be sufficiently small (relative to m = 2ms and n) in order to have a sparse ¯ in this case might underlying structure for the final projection matrix P . Note that the resulting P be consider as the incidence matrix of a bipartite left regular graph of degree K. Hence with high ¯ corresponds to an expander graph. A bipartite graph with n left nodes, m probability [37][62] P right nodes and regular left degree K will be called a (αn, βc) expander, for some 0 < α, β < 1 if for every subset of left nodes ν with cardinality less than or equal to αn (|V | ≤ αn) the number of neighbors connected to ν is larger than βc|ν|. That is |N (ν)| > βc|ν| (4.1) where N (ν) is the set of neighbors of ν. ¯ , where 1 is an all one vector. Algorithm 5: Generating the base projection matrix P Input: m, n and K ≥ 2 ¯ Output: The base projection matrix P ¯ for i = 1 to n (all columns of P ) do Select K distinct random integers (l = {l1 , . . . 
, lK }) uniformly from [m] = {1, . . . , m} ¯ , set the entries in the rows indexed by l to one and put zero in For the i-th column of P ¯ e,l = 1 other indices: P One might impose certain constraints on the underlying structure of the base projection matrix 45 which leads to the same structure for P . Most notably, approaches that are inspired by channel coding methods for constructing parity check matrices, and in particular ones related to Low¯ . For instance, one Density-Parity-Check (LDPC) codes can be used for the binary base matrix P might utilize the incidence matrices of Balanced Incomplete Block Designs (BIBD) [49] (if they exist for the pair of ms and n) or adjacency matrix of expander graphs to improve the performance of the CRSIP framework as we explain further below in this paper. ¯ prior to replacing the In general, one can perform some form of matrix manipulation of P ¯ onto some complex numbers. Further, one can have certain constraints on the complex “ones” of P numbers used in the projection matrix P . In this paper, we focus on applying two steps to the ¯ , random permutation and employing complex numbers with unique phases as in base matrix P ¯ and HCS. For example, the overall approach used when permuting (uniformly) the columns of P ¯ by normalized complex numbers with unique phases in range substituting the non-zero entries of P of [0, π) is shown below (Algorithm 6). Algorithm 6: Generating the projection matrix from the base projection matrix. ¯ Input: The base projection matrix P Output: The projection matrix P ¯; Permute randomly the columns of P for i = 1 to m (all rows of P ) do ¯ and let w = |si |. Let si be the non-zero entries of the i-th row of P Generate a row vector f (i) composed of w normalized complex numbers with unique phases in range of [0, π). For the i-th row of P , insert f (i) in the column indices of si : P i,si = f (i) and ∀l ∈ / si : P i,l = 0 Now we say few words regarding the BIBD design procedure. As we eluded above, the incidence matrix of a Balanced Incomplete Block Design [49] represents a special case for the design ¯ . Here, we briefly review a few key properties of such matrices. Let of the base projection matrix P m, n, K and λ be positive integers such that m > K ≥ λ. The incidence matrix of a (m, n, K, λ)46 BIBD is a m × n, matrix with the following properties: • Each row is non-zero exactly in w = λ(m − 1)/(K − 1) column indices. Also each of the n = mw/K columns is non-zero exactly in K entries1 and thus we have ∀i, j ∈ [m] : w = |si | = |sj |. Note that this matrix corresponds to the (transpose of) incidence matrix of a bipartite left-regular of degree K and right-regular of degree w. • If we consider two distinct rows of this matrix, both are non-zero in exactly λ columns. That is: ∀i, j ∈ [m], i ̸= j : |si ∩ sj | = λ. Due to the special characteristics (listed above) of the proposed projection matrix, the following properties hold: 1. Since each column of our projection matrix is non-zero in K row indices, hence each signal coefficient xi appears exactly in K random different equations/samples. ¯ is an incidence matrix of a BIBD then two distinct equations 2. If the base projection matrix P (compressive samples) have exactly λ common coefficients/unknowns. 
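A compact sketch of Algorithms 5 and 6 follows; it assumes numpy, and the function names, seeds and example dimensions are illustrative rather than taken from the thesis.

```python
import numpy as np

def base_projection(m, n, K, seed=0):
    """Algorithm 5: place K ones uniformly at random in each column of P_bar."""
    rng = np.random.default_rng(seed)
    P_bar = np.zeros((m, n))
    for i in range(n):
        rows = rng.choice(m, size=K, replace=False)   # K distinct row indices
        P_bar[rows, i] = 1.0
    return P_bar

def crisp_projection(P_bar, seed=1):
    """Algorithm 6: permute the columns of P_bar, then replace the ones of each
    row with unit-modulus complex numbers whose phases are distinct in [0, pi)."""
    rng = np.random.default_rng(seed)
    P_bar = P_bar[:, rng.permutation(P_bar.shape[1])]
    P = np.zeros(P_bar.shape, dtype=complex)
    for i in range(P_bar.shape[0]):
        s_i = np.flatnonzero(P_bar[i])
        phases = rng.uniform(0.0, np.pi, size=s_i.size)   # distinct with probability one
        P[i, s_i] = np.exp(1j * phases)
    return P

# e.g. a CRISP-style projection with m = 200 rows, n = 4950 columns and K = 2
P = crisp_projection(base_projection(200, 4950, 2))
```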
4.3 CRISP Solver Let us assume that the distribution of magnitudes of signal coefficients is such that for all Ω ⊂ [n] and |Ω| = w, the inner product of xΩ with a unit norm random vector of complex numbers is not zero (which is true for test signals and images in CS community). Similar to other combinatorial approaches, the decoding scheme of the CRISP solver is iterative due to the overlaps among rows of the projection matrix. We examine compressive samples to find 1 Note that the size of support of a signal (or sparsity measure) is denoted by k, while K is a parameter of a BIBD. 47 some subsets of the signal spanning only one non-zero coefficient. This can be achieved by looking for the phases of the compressive samples: ∠y = {∠y 1 , . . . , ∠y ms } among the phases of the entries of the random vectors f (.) . For instance, assume that ∠y i (i) equals to ∠f l . Therefore using Proposition 1 (in Chapter 3), one can determine the signal values in the i-th subset xsi . Subtracting the effect of the identified non-zero coefficient from the compressive samples leads to the following new under-determined system of equations: ˆ si = P (x − z) y (1) = y − P :,si x (4.2) where for i > 1, y (i) is the residue of the approximation in the i-th iteration of the solver and z is a vector with the same size of x such that z si = xsi and zero otherwise. Note that: ∥x − z∥0 = ∥x∥0 − 1 . Since there are exactly K distinct samples spanning the same non-zero coefficient, this reduction ¯ is the incidence affects K more samples (equations). Specifically, if the base projection matrix P matrix of a BIBD, then identifying the signal values in one subset reduces the number of unknowns (un-identified coefficients) in all equations: yi = w ∑ f (i) xsi,e e=1 by λ. Therefore the algorithm converges to the solution very fast. More specifically, consider a 48 compressive sample y t spanning two non-zeros including xsi,l : y t = αxsi,l + βxe , e ∈ [n]/si,l where α and β are two distinct entries of f (t) . Thus: (1) yt = y t − P i,si xsi = βxe Consequently after this reduction, all compressive samples spanning two non-zeros such that one of these non-zeros is among the identified coefficients in the first iteration, turn into new samples spanning only one non-zero coefficient. Therefore in the next iteration, the phases of these compressive samples can be found among the available phases in the random vectors f (.) . This event results in identifying the signal values in new subsets (and consequently new coordinates). Algorithm 7 shows the CRISP decoding process. Algorithm 7: CRISP solver Input: y and P ˆ which y = P x ˆ Output: the sparse vector x (0) ˆ = 0. Set I = 0, y = y and x repeat (I) Find all indices β = {i : ∠y i for all i ∈ β do ∈ ∠f (i) } { (I) (i) ˆe = Suppose ∠y i = ∠f l then: x ˆ si y (I+1) = y (I) − P :,si x I =I +1 until y (I) is a zero vector; 49 (I) ∥y i ∥ e = si,l 0 e ∈ si /si,l } 4.4 Optimality ¯ . ThereAs stated before, the projection matrix (P ) inherits best properties of the base projection P fore, by our design and with high probability, P corresponds to (randomly permuted) the incidence matrix of an expander graph. 
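For reference while following the analysis, here is a minimal sketch of the decoding loop of Algorithm 7. It is not the thesis code: the matrix is assumed to be a dense complex numpy array for simplicity, the tolerance is an assumption, and the real-ratio test stands in for the exact phase comparison of the pseudocode.

```python
import numpy as np

def crisp_solve(P, y, tol=1e-9, max_iter=1000):
    """Sketch of Algorithm 7: iteratively peel subsets that span one non-zero."""
    m, n = P.shape
    x_hat = np.zeros(n)
    y_res = y.astype(complex)
    for _ in range(max_iter):
        if np.all(np.abs(y_res) < tol):            # all samples explained
            break
        updated = False
        for i in range(m):
            if abs(y_res[i]) < tol:
                continue
            s_i = np.flatnonzero(np.abs(P[i]) > 0)
            ratios = y_res[i] / P[i, s_i]
            hits = np.flatnonzero(np.abs(ratios.imag) < tol * abs(y_res[i]))
            if hits.size == 1:                     # this sample spans one non-zero
                col = int(s_i[hits[0]])
                val = float(ratios[hits[0]].real)
                x_hat[col] += val
                y_res -= P[:, col] * val           # subtract its effect from all K samples
                updated = True
        if not updated:                            # no further isolated non-zeros found
            break
    return x_hat
```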
One can use the arguments of [37] that with high probability, having m = O (k log n/k)) compressive samples, 1) in the first iteration of the solver, there exists at least 1 + [K/2] samples spanning only one specific non-zero coefficient (where K is the number of non-zero entries in each column of the projection matrix) and 2) the algorithm never halts and finally 3) it converges in O(k) iterations. Therefore the algorithm is optimal in terms of the order of the number of samples for perfect recovery. However now instead of real valued samples, we are sensing complex samples which (albeit keeping the order of m fixed) doubles the number of sensed real samples (each complex sample is counted as two real measurements). On the other hand, as opposed to traditional combinatorial algorithms which require at least 1 + [K/2] samples that span only one common non-zero coefficient (for a voting like decoding algorithm), here we only need one sample (instead of 1 + [K/2]) that spans one non-zero, which in turn lowers the required number of samples for perfect recovery. This significant reduction in the required samples is attributed to our utility of unique complex phases that make it feasible to detect and identify isolated non-zero coefficients. And finally, this generalization (from binary matrices to matrices with complex entries) reduces the complexity of the solver significantly from O (n log(n/k)) to O (k log(n/k) log(n/m)). Recall that the first iteration of the solver we look for the phases of compressive samples among a predetermined set of phases (available phases in each row of the projection matrix). Similar to 50 HCS, we can assume that for each row of P , non-zero entries are sorted by their phases: (i) (i) (i) (i) a, b ∈ [w], a > b ⇔ ∠f a = ϕa > f b = ∠ϕb Then one might utilize a quick search algorithm (such as binary search) in the CRISP solver. Now for each sample, we need to search among w = O(n/m) distinct phases and the complexity of such search is only log(w). Hence the first iteration of CRISP has a complexity of order O (m log(w)). For the next iteration of the solver, we need to subtract the effect of identified coefficients from the compressive samples. Since each non-zero coefficient appears in K (distinct) samples, hence the complexity of such update is O(K) which is a constant. However the algorithm converges in O(k) iterations, hence the overall complexity of CRISP is O (m log r + kK) = O (m log(n/m)). 4.4.1 The Average Behavior Analysis In this subsection, we present the proof for our claim that CRISP on average needs only k < m ≤ 2k complex (or equivalently 2k < m ≤ 4k real) valued compressive samples for the perfect recovery of a k-sparse signal. Note that, this does not violates the optimality of the bound m⋆ = O(k log n/k). In fact, it has been observed that BP typically recovers a k-sparse signal from ≈ 4k compressive samples. Furthermore, this analysis is only for the average case and in practice the solver usually deviates from the mean behavior and thus higher number of samples is usually required. Lemma 4.4.1. On average, CRISP requires 2k < m ≤ 4k real-valued compressive samples for the perfect recovery of a k sparse signal of length n if k ≫ 1. Proof. Let C = m/k be the oversampling factor. Then the probability that a given sample y j 51 spans exactly i non-zeros is: ( ) (k)(m−1)i (m−1)k−i Pr ∥xsj ∥0 = i = i K−1 ( m )k K (4.3) K where m is the number of rows of P . 
It is important to note that since the columns of CRISP projection matrix are permuted randomly, hence the probability that any given sample spans i non-zero coefficients, is constant for all samples: ( ) ( ) ∀l, j ∈ [m] : Pr ∥xsj ∥0 = i = Pr ∥xsl ∥0 = i = ηi (4.4) Thus the expected number of samples, spanning only one non-zero is mη1 . To simplify our analysis, let us consider the asymptotic case where k (and hence m and n) tends to infinity. Then the expected number of samples spanning only one non-zero would be: ( ) K − E #j : ∥xsj ∥0 = 1 ≈ kKe C (4.5) k→∞ Note that for large values of k and fixed values of m = Ck, the expected number of samples spanning only one non-zero is a linear function of k. These samples identify the locations of isolated non-zero coefficients which consequently lead to the recovery of the associated subsets of coefficients. Here we should emphasize that not all of these isolated coefficients provide new information. For instance, it is possible that there are several samples spanning only one specific non-zero coefficient xi . So it is important to exclude this redundant information from our analysis. Let be the probability of the event that one sample y a spans only one non-zero coefficient xi , given 52 that there exists another compressive sample y b (b ̸= a) spanning the same coefficient xi : Pr (sa ∩ Supp(x) = i|∃b ∈ [m] : sb ∩ Supp(x) = i) Then we have: (m−2)(m−2)k−1 k(m − 1) K−2 K = ( m )k (4.6) (4.7) K That is choose a non-zero coefficient xi among k non-zeros and choose another sample y a (except y b ) and let y a be only function of xi . Now randomly select K − 2 other samples which would be functions of xi and distribute the remaining k − 1 non-zeros (counting their K repetitions) among remaining m − 2 samples. Fix m = Ck, for large values of k (4.7) can be approximated to: k→∞ ≈ k(K 2 − K) − 2K e C m (4.8) which is a linear function of sparsity measure k. Thus the expected number of samples which do not result to discovery of new coefficients (µ) is: µ= m K ≈ m kK(K − 1) − 2K − 2K e C = k(K − 1)e C K m (4.9) K − Recall that (on average) kKe C samples span only one non-zero coefficient. Hence, the expected number of compressive samples spanning distinct non-zeros in the first iteration of the solver is: −K µ′ = kKe C − µ = k ( ) 2K −K − Ke C − (K − 1)e C (4.10) In words, CRISP solver (on average) recovers µ′ non-zero coefficients in the first iteration. Since the columns of the CRISP projection matrix are permuted independently and uniformly, we can 53 Name=QuadChirp, k=80 (DCT), n=2500 1 CRISP Reconstructed, m=320 MSE=0.000000, Time=0.386784 1 0.5 0.5 0 0 −0.5 −0.5 −1 −1 500 1000 1500 2000 2500 500 1000 1500 2000 2500 Guassian Matrix, BP, m=320 MSE=0.060634, Time=3.678604 Guassian Matrix, OMP, m=320 MSE=0.169403, Time=1.401314 1 0.5 0.5 0 0 −0.5 −1 −0.5 500 1000 1500 2000 2500 500 1000 1500 2000 2500 Figure 4.1: Comparing Compressive Demosaicing with some well-known demosaicing algorithms. say that in the next iteration, there are k (1) non-zeros left to recover from m samples, where: k (1) = k − µ′ = kq, q = ( ) 2K −K − 1 − Ke C + (K − 1)e C (4.11) Note that for fixed values of C = m/k, q is independent of k. It is straightforward to see that, (on average) in the i-th iteration of the algorithm, there are: k (i) = kq i non-zero coefficients which have not been recovered. Also note that even for 1 < C ≤ 2, we have q < 1. 
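The claim that q < 1 for small oversampling factors can be checked numerically. The sketch below evaluates (4.11) for a few illustrative values of K and C (these values are assumptions for the example, not taken from the thesis).

```python
import numpy as np

def q_factor(K, C):
    """Per-iteration shrinkage factor q from (4.11)."""
    return 1.0 - K * np.exp(-K / C) + (K - 1.0) * np.exp(-2.0 * K / C)

for C in (1.25, 1.5, 2.0):
    print(C, [round(q_factor(K, C), 3) for K in (2, 3, 4)])
# every printed value is below 1, so k^(i) = k * q**i shrinks geometrically
```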
In other words, on average, k < m ≤ 2k complex valued samples (and hence 2k < m ≤ 4k real valued samples) is sufficient for the perfect recovery. 54 Name=Blocks, k=50 (Wavelet), n=4950 CRISP Reconstructed, m=200 MSE=0.000000, Time=0.284941 6 6 4 4 2 2 0 0 1000 2000 3000 4000 1000 2000 3000 4000 Guassian Matrix, BP, m=200 MSE=0.960739, Time=4.413219 Guassian Matrix, OMP, m=200 MSE=1.991390, Time=0.613399 10 10 5 5 0 −5 0 −10 1000 2000 3000 4000 1000 2000 3000 4000 Figure 4.2: Comparing Compressive Demosaicing with some well-known demosaicing algorithms. 4.5 Simulation Results We tested the proposed CRISP on a large number of standard signals from SparseLab [45]. Specifically we compared the performance of CRISP (in terms of quality of the recovered signal as a function of the number of compressive samples and the required time for recovery) in comparison with the popular Gaussian random projection matrix and dominant Basis Pursuit (BP) and OMP solvers. In all plots m, n and k denote the number of compressive samples, the signal length and the sparsity of the signal in the sparsifying domain (Wavelet/DCT). To provide fairness, we have counted each complex sample as two samples. We performed our simulations under various configurations of signal length and sparsity ratio which we present some of them in Fig. 4.1- Fig. 4.3.. As clearly demonstrated CRISP has significantly lower complexity compare to BP and OMP. Here we should highlight an important note: due to the strong structures and rich properties of 55 Name=Bumps, k=500 (Wavelet), n=5000 4 4 2 2 0 CRISP Reconstructed, m=2000 MSE=0.000000, Time=14.474694 0 1000 2000 3000 4000 5000 1000 2000 3000 4000 5000 Guassian Matrix, BP, m=2000 MSE=0.000003, Time=297.498630 Guassian Matrix, OMP, m=2000 MSE=0.000000, Time=24.440286 4 4 2 2 0 0 1000 2000 3000 4000 5000 1000 2000 3000 4000 5000 Figure 4.3: Comparing Compressive Demosaicing with some well-known demosaicing algorithms. BIBDs, we expect that a sensing matrix generated based on an incidence matrix of a BIBD boosts CRISP performances in terms of the solver complexity and also the sample requirements for the perfect recovery. For instance, in Fig. 4.2, we have used a Hamming matrix of weight two (all ¯ . Hence P ¯ is m tuples with K = 2 non-entries in each column) as the base projection matrix P an incidence matrix of a (m, m(m − 1)/2, 2, 1)-BIBD. In that simulation, we observed that if the incidence of a (200, 4950, 2, 1)-BIBD was used as the base projection matrix, then m = 4k (real valued) samples was enough for the perfect recovery. On the other hand, if we have used Algorithm 5 to generate the base projection, then m ≈ 6k samples was required for the perfect recovery and also the decoding time would increase as well. However we should recall that: although there are numerous methods for generating a BIBD with some given parameters [49], it is possible that there are no BIBD designs that exist in some configurations of (m, n, K, λ). 56 Chapter 5 Modified SSMP In this chapter, we are interested in finding a sensing operator P which can be utilized in all three types of CS approaches (combinatorial, convex relaxation, and greedy); and at the same time, P can provide recovery guarantees in ℓ2 norm sense. We show that by some modification in the adjacency matrices of expander graphs, one can construct a class of sensing matrices which exhibits some “weak” form of RIP. Let us clarify what we mean by weak in here. 
As stated before, the formal definition of RIP demands that ||P x||2 must be close to ||x||2 (with high probability) for all possible k-sparse signals. Although RIP is known to be a sufficient condition to guarantee the success of geometrical approaches in solving CS, it has been reported in numerous papers (e.g. [64] [65] [66]) that the empirical results of linear programming solvers shows that there might be some less constraining properties than RIP, governing the success of linear programming in CS context. This observation raised to investigation for alternative sufficient conditions which consequently led to introduction of different types of RIP, for instance statistical RIP [64], weak RIP [67] and generalized RIP [69] to name a few. This chapter is along with those efforts. More specifically, we focus on the probability of recovering any arbitrary but fixed signal and not the probability of failure of the decoder for all possible k-sparse signals which clearly is a much stronger one. Such notion of fixing the signal support has been introduced and used before, for instance in [67]. By using basic arguments, we show that under this measure (the probability of failure for a fixed arbitrary signal), a simpler and weaker version of RIP which we refer to as w-RIP (that is very similar to the concept Weak-RIP in [67]) is sufficient to recover any arbitrary fixed signal under convex 57 relaxation methods and at least one optimal greedy algorithm called Cosamp [28] with high probability. The proposed w-RIP possibly widens the admissibility of many sensing matrices which do not possess the original RIP, to be considered as “good” sensing mechanism at least in the sense of any arbitrary but fixed signal. We show that if a matrix P ∈ Rm×n is constructed by replacing the non-zero entries of the sparse binary adjacency matrix of an expander graph with a Gaussian random vector then P would benefit from w-RIP and hence could be utilized in geometrical and greedy frameworks. Also, we show that there exists at least a combinatorial algorithm (based on Sequential Sparse Matching Pursuit [68]) for the proposed P with the ℓ2 < Cℓ2 guarantee of reconstruction. To the best of our knowledge, no other class of sensing projection matrices has been introduced in prior works which can provide ℓ2 < Cℓ2 guarantee of reconstruction under all three approaches to CS. This chapter is organized as follows: in Section 5.1 and 5.2, the proposed SERP and our definition of w-RIP are introduced. We prove that if one targets for recovering an arbitrary but fixed signal, then w-RIP is a sufficient condition of success for at least one greedy and all geometrical CS frameworks. In section 5.3, we prove that SERP matrices adheres to such w-RIP and thus can be safely utilized in the aforementioned CS frameworks. In Section 5.4, we show that for the proposed class of SERP matrices, there exists a fast combinatorial algorithm which can robustly recover a signal from its noisy samples. Finally, simulation results are presented in section 5.5. 5.1 SERP projection matrix Consider matrix P = G(m, n, r, d, D) where the function G is defined in Algorithm 8 D is a probability distribution and m, n, r and d are some natural numbers with the condition that m is divisible by r. In words, Algorithm 8 generates the matrix P by following these operations: (i) 58 for each column i ∈ [n], the algorithm selects uniformly at random d indices among [m/r] = {1, 2, . . . , m/r}. Let us denote those indices by h. 
(ii) Then for each entry of h, for instance the l-th entry hl , the algorithm generates another r indices {(hl − 1)r + 1, (hl − 1)r + 2, . . . , hl r}. The collection of these rd indices forms the row-indices where the i-th column will be non-zero (i.e. Ωi ). (iii) Finally, each non-zero entry of the i-th column of P would be a random variable with distribution D. The output of Algorithm 8 could be also viewed as a matrix which each of its column is non-zero in exactly d random indices and each non-zero entry is a random vector (t1 , t2 , . . . , tr ) with ti ∼ D for i ∈ [r]. Note that, for all i and j where ⌊i/r⌋ = ⌊j/r⌋ we have ωi = ωj . It is straightforward to verify that for sufficiently large n if (with high probability) the output of G(m/r, n, 1, d, D) corresponds to a high quality expander graph, then G(m, n, r, d, D) would also (with high probability) correspond to a high quality expander. Also it should be clear that permuting columns and rows of the generated projection matrix has no effect on our arguments nor the quality of the matrix. A wide range of popular projection matrices (in CS context) could be generated by Algorithm 8. For instance if m = O (k log n/k) and D = N (0, 1/m), then G(m, n, 1, m, D) outputs a random dense Gaussian projection matrix which has RIP with high-probability [16]. On the other hand, for d = θ (log n), r = 1 and a distribution with delta-Dirac at one and zero otherwise for D (i.e. t ∼ D is a constant number with value of one), the output of the Algorithm 8 (with high probability [38, 37, 68]) will correspond to a bipartite expander graph and hence can be utilized in a combinatorial solver for CS. In this chapter, we are interested in the output of the Algorithm 8 when r = O (1) is a small natural number, d = θ (log n), m = O(k log n/k) and D = N (0, 1/rd). We call an output instance of G(m, n, r, d, D) with those values as “Sparse Expander-like Real-valued Projection” or SERP. With some realistic approximations, we show that this class of matrices adheres to a property 59 Algorithm 8: The function G(m, n, r, d, D). input : m, n, r, d and D output: P ∈ Rm×n for i = 1 to n do do T = 0m ; Let h be a random subset of [m/r] of size d (Card{h} = d) ; m/r Ωi = ∪l=1 {(hl − 1)r + 1, (hl − 1)r + 2, . . . , hl r} ; ∀j ∈ Ωi : P j,i ∼ D ; end which we refer to as w-RIP (to be defined in follow) for an arbitrary signal x with a fixed support set T = Support{x}. Then, we show that w-RIP is in fact sufficient to guarantee the recovery of that fixed signal under convex relaxation and Cosamp (an optimal greedy CS decoder) and there is no need for the stronger RIP condition. Also by using the same property, we prove that modifying only one line of the SSMP decoder [68] suffices to come up with a combinatorial algorithm with a ℓ2 < Cℓ2 guarantee of recovery for that particular signal under SERP sensing matrices. To the best of our knowledge, this is the first class of matrices which can provide ℓ2 < Cℓ2 recovery under all three approaches to the problem of compressed sensing at least in a “for any arbitrary but fixed signal” setting. Throughout this paper, we assume that parameters m, d and r are chosen properly such that, if non-zero entries of P = G(m, n, r, d, D) are replaced by one, then it would correspond to a ((c + 1)k, d, q) expander for small constants c ≥ 1 and 0 < q < 1. 5.2 w-RIP Since Restricted Isometry Property (RIP) is an essential tool in the analysis of different algorithms in Compressed Sensing, we review that concept in here. 
A matrix $P \in \mathbb{R}^{m \times n}$ has RIP of order $k \in \mathbb{N}$ [3] if, for every $k$-sparse signal $x \in \mathbb{R}^n$, $\|x\|_0 = k$, there exists a Restricted Isometry Constant (RIC) $\delta'_k$ such that:

$$\left(1 - \delta'_k\right)\|x\|_2^2 \;\leq\; \|P x\|_2^2 \;\leq\; \left(1 + \delta'_k\right)\|x\|_2^2$$

No deterministic construction is available so far for a matrix achieving RIP of order $k$ with the minimal number of rows (samples) $m = O(k \log n/k)$. However, it has been proved that some random matrices satisfy RIP with an optimal number of rows with high probability. For instance, it can be shown [16] that Gaussian random matrices have that property with a probability of at least $1 - cn^{-\gamma}$ for some positive constants $c$ and $\gamma$.

To establish a clear connection between RIP and our proposed w-RIP, let us define an auxiliary notion which we refer to as probabilistic-RIP and abbreviate as p-RIP in this paper.

Definition 1. We say that a matrix $P \in \mathbb{R}^{m \times n}$ has p-RIP of order $(k, p) \in \mathbb{N} \times [0, 1]$ if $P$ has RIP of order $k$ with a probability of at least $p$.

Hence Gaussian random matrices with $m = O(k \log n/k)$ rows have p-RIP of order $(k, p)$ with $p = 1 - cn^{-\gamma}$ for some constants $\gamma$ and $c$. Now let us present our definition of w-RIP.

Definition 2. We say that a matrix $P \in \mathbb{R}^{m \times n}$ has w-RIP of order $(k, p) \in \mathbb{N} \times [0, 1]$ if, for any arbitrary but fixed $k$-sparse signal $x \in \mathbb{R}^n$, $\|x\|_0 = k$, there exists a w-RIP constant $0 \leq \delta_{k,p} \leq 1$ such that:

$$\left(1 - \delta_{k,p}\right)\|x\|_2^2 \;\leq\; \|P x\|_2^2 \;\leq\; \left(1 + \delta_{k,p}\right)\|x\|_2^2$$

with a probability of at least $p$.

In some parts of this chapter, when $p$ and $k$ are known from the context, we drop the subscripts and simply denote $\delta_{k,p}$ by $\delta$. Note that this definition is slightly different from the definition of weak-RIP in [67] and the one in [69].

The connection between RIP and p-RIP is obvious from their definitions. The connection between w-RIP and p-RIP can be justified as follows. Assume the set $T = \{T_1, T_2, \ldots\}$ contains all subsets of $[n] = \{1, 2, \ldots, n\}$ with cardinality $k$, i.e.:

$$\forall i \in \{1, 2, \ldots, \mathrm{Card}\{T\}\}: \quad T_i \subseteq [n], \quad \mathrm{Card}\{T_i\} = k$$

Furthermore, suppose that $P$ has w-RIP of order $(k, p)$. This means that for each $T_i$, all singular values of $P_{:,T_i}$ lie in the range $[\sqrt{1 - \delta_k}, \sqrt{1 + \delta_k}]$ with a probability of $p$. If the columns of $P$ are independently distributed (which is usually the case), then $P$ has RIP of order $k$ with a probability of $p^{\mathrm{Card}\{T\}}$, i.e. p-RIP of order $\left(k, p^{\mathrm{Card}\{T\}}\right)$.

The first papers on Compressed Sensing (e.g. [2, 8, 4, 5]) focused on the failure probability of the decoder over all possible $k$-sparse signals. For instance, to target a probability of success $q$ in the case of a convex relaxation decoder, one needs p-RIP of order $(2k, q)$. Consequently, to show that a matrix has this property, the matrix must have w-RIP of order $(2k, p)$ where $p \geq q^{1/\mathrm{Card}\{T\}}$ and $T$ now contains all subsets of $[n]$ with cardinality $2k$, so that $\mathrm{Card}\{T\} = \binom{n}{2k}$. Even for small values of $n$, $\mathrm{Card}\{T\}$ is enormously large, demanding $p = q^{1/\mathrm{Card}\{T\}}$ to be extremely close to one. This theoretically deprives many sensing matrices of RIP or p-RIP at the optimal number of samples $m = O(k \log n/k)$, even though they exhibit good empirical results in practice. This has been one of the main motives for some recent efforts (e.g. [64, 67]) to focus on "the probability of failure of a CS framework for any arbitrary but fixed signal" instead of "the probability of failure of a CS framework for all possible signals".
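To get a feel for why the all-signals requirement is so punishing, the short calculation below (an illustrative example with assumed values of n, k and q, not taken from the thesis) compares the number of 2k-column submatrices with the per-support probability that such a guarantee would require.

```python
from math import comb, log10

n, k, q = 1000, 10, 0.99
card_T = comb(n, 2 * k)                        # number of 2k-column submatrices
print(f"Card(T) = C({n}, {2*k}) ~ 10^{log10(card_T):.0f}")
# per-support probability needed for an all-signals guarantee: p >= q**(1/Card(T)),
# which is astronomically close to 1, whereas the fixed-signal analysis below
# only asks for p >= 1 - 1/n**(beta + 1).
print(q ** (1.0 / card_T))                     # prints 1.0 to double precision
```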
One expects that this relaxation should have some implications on RIP as well and fortunately that is the case. By looking more closely at proofs in [bp] and [28], one can easily notice that under 62 both these frameworks, to recover an arbitrary signal x with a fixed support from its compressive samples, there is no need for the full RIP or p-RIP. Instead, to prove that an arbitrary signal with a fixed support set could be recovered from its compressive samples with a target probability of q ( √ ) under those decoders, it only suffices to have w-RIP of order k, p = z q where the constant z is (n) solver-dependent and by far smaller than 2k . For instance as we show in proof of Lemma 5.2.1, the constant z in case of a linear program based decoder is only O (n/k). Also later in this paper, we show that the proposed class of SERP matrices can easily achieve w-RIP of a proper order with some realistic approximations. 5.2.1 Implications of w-RIP The following two lemmas prove that, if a matrix has w-RIP of proper orders, then it will be a “good” sensing mechanism under Cosamp (a greedy algorithm) and also any convex relaxation decoder when one targets for recovering any arbitrary signal with a fixed support. Lemma 5.2.1. Consider any arbitrary signal x ∈ Rn with the fixed support set T of cardinality √ k. Assume that, P ∈ Rm×n has w-RIP of order (2k, p) with the constant δ2k,p < 2 − 1 for p = 1 − 1/nβ+1 where β ≥ 1. If ∆ (P , y) belongs to the class of convex relaxation methods, then ∆ (P , y) provides a ℓ2 < Cℓ1 guarantee of recovery for x from y = P x with a probability of at least p2n ≥ 1 − 2/nβ . Proof : Let x∗ = ∆ (P , y) where ∆ is any convex relaxation based decoder. In other words, x∗ is the solution to optimization problem (1.8). As in [3] define h = x∗ − x and partition [n] into sets of cardinality k, denoted by T0 , T1 , T2 , . . . such that T0 corresponds to k largest entries of x, T1 corresponds to k largest entries of hT c , T1 corresponds to the next k largest entries of hT c and 0 0 so on. Fixing the support of x, the arguments of [3] requires that RIP inequalities to be active on 63 sets T0 ∪ T1 , T0 ∪ T2 , T0 ∪ T3 , . . . , T1 ∪ T2 , T1 ∪ T3 , T1 ∪ T4 , . . . , Since k could be as low as one, there are at most 2n of these sets. Now assume that the matrix P has w-RIP of order (2k, p). Then P has the required properties in [3] with a probability of p2n . Recalling that ( )2n 1 2 ∀n ≥ 1 : 1 − β+1 ≥1− β n n it can be concluded that if one targets for recovery of a fixed signal with a probability of q = 1 − 2/nβ for some constant β ≥ 1, then if suffices to have p > 1 − 1/nβ+1 . Lemma 5.2.2. Consider any arbitrary signal x ∈ Rn with the fixed support set T of cardinality k. Assume that, P ∈ Rm×n has w-RIP of order (2k, p) with the constant δ4k,p < 0.1 for p = 1 − 1/nβ+1 where β ≥ 1. If ∆ (P , y) is Cosamp algorithm, then ∆ (P , y) provides a ℓ2 < Cℓ1 guarantee of recovery for x from y = P x with a probability of at least p2n ≥ 1 − 7/nβ . Proof : For a k-sparse signal x, Cosamp requires at most 6(k + 1) ≤ 7k iterations (for k ≥ 6). Assume that in the start of the i-th iteration, the algorithm estimates x by x{i} . Define: R{i} = x − x{i} , d{i } = y − P x{i} In words, R{i} is the residue of estimation and d{i } is the discrepancy of estimation in that iteration. Finally let Ω{i} be indices corresponding to 2k largest values of P T d{i} . 
Then on the i-th iteration of Cosamp, one needs RIP inequalities only on column indices 64 {i} ∀i ∈ [6(k + 1)] : T1 {i} ∀i ∈ [6(k + 1)] : T2 = Support{Ω{i} }\Support{R{i} } = Support{Ω{i} } ∪ Support{x} ∪ Support{x{i} } {i} {i} Note that for all i, Card{T1 } ≤ 2k and Card{T2 } ≤ 4k. Based on the proof of [28], Cosamp recovers a fixed sparse signal x when the restricted isometery constant on sets {1} T1 {2} , T1 {6(k+1)} , . . . , T1 {1} and T2 {2} , T2 {6(k+1)} , . . . , T2 are all smaller than 0.1. Therefore, if one wants to recover any arbitrary but fixed signal x with a target probability of q then the matrix P √ has to have a w-RIP of order (4k, p) with the constant δk,p ≤ 0.1 for p ≥ 6(k+1) q. For instance, if p ≥ 1 − 1/nβ+1 then q ≥ 1 − 7/nβ . 5.3 w-RIP for SERP matrices The following two lemmas prove that SERP matrices have w-RIP and hence could be utilized in Cosamp and convex relaxation methods in a “for any arbitrary but fixed signal” scenario. Lemma 5.3.1. Let P ∈ Rm×n be an output instance of G(m, n, r, d, D). If d = θ(log n), m = O(k log n/k), D = N (0, 1/rd) and r = O(1) is a constant, then for any arbitrary vector x ∈ Rn with a fixed support, there exist constants δ and α = O(r) such that ( ) 1 P r ||P x||22 ≥ (1 + δ) ||x||22 ≤ α n (5.1) Proof : Without loss of generality, assume that x has a unit norm (||x||22 = 1). To bound ||y||22 = ||P x||22 (similar to [16]) one can resort to moment generating function of the random 65 variable y: ( ) ( ( ) ) Pr ||y||22 ≥ (1 + δ) = Pr exp t||y||22 ≥ exp (t(1 + δ)) ( ( )) E exp t||y||22 ≤ exp (t(1 + δ)) (5.2) (5.3) ( ( )) for a positive value of t, where the last inequality is due to Markov’s. Clearly E exp t||y||22 is the moment generating function of the random variable ||y||22 . Note that (see [16]): yi ∼ ||xωi ||2 N (0, 1) √ rd where N (0, 1) is standard normal distribution. Consequently ||xωi ||22 χ2 (1) 2 yi ∼ rd where χ2 (1) is a chi-squared distribution with one degree of freedom. Define ||xωi ||22 ∀i ∈ [m] : zi = = rd Then the random variable ||y||22 = ∑ 2 j∈ωi xj rd ∑m 2 i=1 zi χ (1) is a weighted sum of chi-squares. It is known and easy to verify that manipulating the moment generating function of a random variable with the distribution of weighted sum of chi-squares does not lead to neat or usable formulas. Hence, a number of estimations are suggested to approximate that random variable with another one which can be manipulated more easily (e.g. [70, 71]). Here in this paper, we use the approximation in [70]. A commonly practiced approximation of a weighted chi-squared random variable is a Gamma distribution with the same mean and variance of the original random variable. In [70], 66 the authors proved that, this approximation leads to acceptable results and more importantly, the gamma approximation tends to be an over-estimate of the true distribution of ||y||22 when variance of {zi } is not small1 . Since values of {zi } in SERP are linearly proportional to ||xωi ||22 (i.e. norms of random subsets of the signal) and the magnitudes of many signals of the interest (e.g. DCT coefficients of images) almost follow a power law, hence at least for these classes of signals, it is very unlikely that ||xωi ||22 to have small variances. Consequently, this approximation seems to be reasonable for those practical signals of interest. 
Define ∑m ∑ zi ( m zi )2 i=1 a = ∑m 2 , λ = ∑i=1 2 2 i=1 zi 2 m i=1 zi (5.4) Then based on the approximation of [70], ||y||22 can be approximated by a Gamma distribution with the shape parameter λ and scale parameter 1/a. Note that m ∑ ∑m zi = i=1 2 i=1 ci xi rd , ci = Card{j : j ∈ ωi } (5.5) where ci is the number of times that i ∈ [n] occurs in ωi . Recall that, all columns of P are non-zero in exactly rd row indices. Hence, for all i ∈ [n] we have ci = rd. Since it is assumed that x is unit norm, we get m ∑ i=1 ( m )2 ∑ zi = zi =1⇒a=λ i=1 Now let us find upper and lower bounds for ∑m 2 i=1 zi . It can be proved by contradiction that given the properties of {zi }, the vector z = [z1 z2 z3 . . . zm ] ∑ 2 has to be maximally sparse when the objective function m i=1 zi attains its maximum. More specifically, assume that n is not extremely small such that rd > 1. Furthermore, assume that a 1 Here we are deriving the upper bound for equation (5.1). Hence, the derived probability from the approximation in here is higher than the actual one. 67 given sequence {zi } leads to maximum value for ∑m 2 i=1 zi , however the vector z = [z1 z2 z3 . . . zm ] is not maximally sparse. Then, one can find two indices a, b ∈ Support{z} such that za ̸= 0 and za ̸= 0 (since rd > 1 and ||z||0 ≥ rd) and form a new sequence of {w1 w2 . . . wm } where    zc c ̸= a, c ̸= b     wc = 0 c=a       z + z c = b. a b (5.6) Define w = [w1 w2 . . . wm ]. It is easy to verify that m ∑ zi = i=1 m ∑ wi , ||w||0 = ||z||0 − 1, m ∑ zi2 < i=1 i=1 m ∑ wi2 i=1 which contradicts our assumption that this sequence of {zi } leads to maximum value for ∑m 2 i=1 zi . Performing the re-distribution of signal energies successively, one can conclude that the vector z = [z1 z2 z3 . . . zm ] has to be maximally sparse when ∑m 2 i=1 zi attains its upper bound. However, the sparsity of the vector z can never be less than rd, due to two facts: (a) the matrix P corresponds to an expander and (b) each column of P is non-zero at exactly rd indices. This maximally sparse case happens when the signal is non-zero in exactly one location. Since, we have assumed that the signal is unit-norm, then it means that the signal is one at one index and zero in the rest of indices. It is straightforward to verify that in this case, concludes the upper bound for 2 i=1 zi = 1/rd. This ∑m 2 i=1 zi . It is known and easy to verify that the when the constraint value of ∑m ∑m i=1 zi = 1 is active, the minimum ∑m ∑m 2 2 i=1 zi occurs when all zi are equals to 1/m. In that case, i=1 zi equals to 1/m as well. Putting together upper and lower bounds, we have rd/2 ≤ a = λ = m/2. Based on arguments 68 above, the following holds ) E (exp (t||y||2 )) 2 Pr ||y||22 ≥ (1 + δ) ≤ = exp (t(1 + δ)) ( )−a 1 − at = f (t) et(t(1+δ)) ( (5.7) (5.8) aδ < a which is in a valid range for the for 0 < t < a. By setting ∂f (t)/∂t = 0, we get t = 1+δ corresponding moment generating function. Substituting back this value into (5.8) leads to ( ) Pr ||y||22 ≥ (1 + δ) ( ≤ ) 1+δ a eδ (5.9) Let τ= ( δ2 2 2 1 + δ + δ2 ) (5.10) Then by writing Taylor series expansion of (1 + δ)/ exp δ around δ = 0 upto δ 2 and recalling that 1 − s ≤ exp (−s), it can be concluded that ( ) Pr ||y||22 ≥ (1 + δ) ≤ exp (a)−τ (5.11) Now recall that rd/2 ≤ a and d = c1 log n for a constant c1 . Thus ( ) Pr ||y||22 ≥ (1 + δ) ≤ 1 nrc1 τ /2 (5.12) ( ) In summary, there exists a constant α = rc1 τ /2 such that P r ||y||22 ≥ (1 + δ) ≤ 1/nα . Clearly parameter r (in the projection matrix) can be tuned such that α ≥ 2. 
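The concentration claimed in Lemma 5.3.1, and in Lemma 5.3.2 below, can also be checked empirically. The sketch below builds a SERP-like matrix following the description of Algorithm 8 (the function `serp` and all parameter values are illustrative assumptions, not the thesis code) and estimates how often $\|Px\|_2^2$ leaves the interval $[(1-\delta)\|x\|_2^2, (1+\delta)\|x\|_2^2]$ for a fixed sparse signal.

```python
import numpy as np

def serp(m, n, r, d, rng):
    """SERP-like matrix following the description of Algorithm 8 (m divisible by r)."""
    P = np.zeros((m, n))
    for i in range(n):
        h = rng.choice(m // r, size=d, replace=False)
        rows = (h[:, None] * r + np.arange(r)).ravel()   # r consecutive rows per pick
        P[rows, i] = rng.normal(0.0, 1.0 / np.sqrt(r * d), size=r * d)
    return P

rng = np.random.default_rng(0)
n, k, d, r = 2000, 50, 12, 2
m = 16 * k                                # a generous sampling budget, divisible by r
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.uniform(-1.0, 1.0, size=k)

delta, trials, bad = 0.3, 200, 0
for _ in range(trials):
    P = serp(m, n, r, d, rng)
    ratio = np.linalg.norm(P @ x) ** 2 / np.linalg.norm(x) ** 2
    bad += not (1 - delta <= ratio <= 1 + delta)
print("empirical failure rate:", bad / trials)   # small (often zero) for these settings
```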
Lemma 5.3.2. Let P ∈ Rm×n be an output instance of G(m, n, r, d, D). If d = θ(log n), m = 69 O(k log n/k), D = N (0, 1/rd) and r = O(1) is a constant, then for any arbitrary vector x ∈ Rn with a fixed support, there exist constants δ and α = O(r) such that ) ( 1 Pr ||P x||22 ≥ (1 + δ) ||x||22 ≤ α n (5.13) Proof : Assume that x has a unit norm. Similar to proofs of Lemma 5.3.1, one can conclude: ( ) ( ( ) ) Pr ||y||22 ≤ (1 − δ) = P r exp −t||y||22 ≥ exp (−t(1 − δ)) ( ( )) E exp −t||y||22 ≤ exp (−t(1 − δ)) (5.14) (5.15) ( ( )) for a positive value of t. We again use the approximation of [70] to estimate E exp −t||y||22 . More specifically by using arguments provided in the proof of Lemma 5.3.1, we have ( ( )) E exp −t||y||22 e−t(1−δ) et(1−δ) ( ) ≈ 1 + at (5.16) where rd/2 ≤ a ≤ m/2. For finding the tightest bound for (5.15), one should set the derivative of (5.16) with respect to t to zero and solve for t. That value is t = aδ/(1 − δ). It is straightforward to verify that if 0 < δ < 1/2 then this computed value for t is in the valid range of the corresponding moment generating function. Plugging back this value to (5.15) yields )a ( ) ( P r ||y||22 ≤ (1 − δ) ≤ (1 − δ)eδ (5.17) Note that Taylor series expansion of (1 − δ)eδ around δ = 0 up-to degree 4 is (1 − δ)eδ = 1 − ( ) δ2 δ3 δ4 − − + O δ5 2 3 8 70 (5.18) Hence ( (1 − δ)eδ )a aδ 2 ≤ e− 2 (5.19) Considering that rd/2 ≤ a ≤ m/2 and d = c1 log n for a constant c1 , the following holds ( ) 1 P r ||y||22 ≤ (1 − δ) ≤ α n (5.20) for α = rc1 δ 2 /4. Again r can be tuned such that α ≥ 2. Theorem 5.3.3. Let P ∈ Rm×n be an output instance of G(m, n, r, d, D). If d = θ(log n), m = O(k log n/k), D = N (0, 1/rd) and r = O(1) is a constant, then P would have w-RIP of order (k, p) with the constant δ for p ≥ 1 − n2α where α = O(r). Proof : Lemmas 5.3.1, Lemma 5.3.2 and the Definition 2 directly imply that P has w-RIP of order (k, p) for p ≥ 1 − n2α where   rc1 δ 2 ), α = min  ( 2 δ 4 1+δ+ 2 rc1 δ 2  4 = ( rc1 δ 2 2 4 1 + δ + δ2 ) = O(r) Theorem 5.3.3 proves that, the proposed SERP can be utilized in at least one greedy solver and all convex relaxation methods and provide a “for any arbitrary but fixed” guarantee of recovery in ℓ2 < cℓq norm sense even in case of noisy samples and compressible signals. In the next section, we show that the proposed projection matrix could be utilized in combinatorial solvers as well. 71 5.4 A Combinatorial solver for SERP As before, assume P ∈ Rm×n is generated by using Algorithm 8. In this section, we show how an already available combinatorial solver for binary projection matrices could be easily modified to support the proposed sparse real-valued projections. To that end, we have considered the Sequential Sparse Matching Pursuit (SSMP) decoder algorithm, introduced in [68] due to its low computational costs and also optimality of sample requirements m = O(k log n/k). We prove that by changing only one line in the SSMP algorithm, one can come up with a Modified version of Sequential Sparse Matching Pursuit (MSSMP) with a ℓ2 < Cℓ2 guarantee of recovery which is stronger than the ℓ1 < Cℓq guarantee in case of the original SSMP decoder algorithm. Algorithm 9 outlines the proposed MSSMP decoder for the class of sparse real-valued projection (SERP) matrices. The only difference between the MSSMP algorithm and SSMP is in line 5 of Algorithm 9, ) ( where in [68] for a fixed index i, ||P x{j} + zei − y||1 is minimized. 
Note that for a fixed index ) ( i, the minimizer z to ||P x{j} + zei − y||2 is z= ( ) P T:,i y − P x{j} ||P T:,i ||22 ( ) P TΩ ,i yΩi − P Ωi ,: x{j} i = ||P TΩ ,i ||22 (5.21) i ( ) while the minimizer to ||P x{j} + zei −y||1 in [68] is the median of the vector yΩi −P Ωi ,: x{j} . To prove the ℓ2 < Cℓ2 guarantee of recovery for the proposed pair of sensing operator and combinatorial solver, we essentially follow the proof in [68]. However, here the w-RIP feature of the proposed projection matrix is utilized in the proof, which has different characteristics than its RIP-1 counterpart in the case of the binary projections in [68, 38, 37]. To provide an easy comparison between the presented proof and the one in SSMP, we tried to adapt the notations used in [68] as much as possible in here. 72 Algorithm 9: The Modified version of the Sequential Sparse Matching Pursuit (MSSMP), a combinatorial algorithm for class of sparse real-valued projections. For any vector x, the hard-thresholding operator Hk [x] returns a vector which keeps the k largest coefficients of x and is zero in the rest of entries. In other words, Hk [x] is the best k sparse approximation of the vector/signal x. input : P ∈ Rm×n , y ∈ Rm , c ∈ N and k ∈ N output: x ∈ Rn x{0} ← 0n ; for j = 1 to T = O (log ||x||) do x′ ← x{j−1} ; for S = 1 to (c − 1)k do ( ) Find a coordinate i and increment z that minimizes ||P x{j} + zei − y||2 ; x′ ← x′ + zei ; end x{j} ← Hk [x′ ] ; end The algorithms SSMP and hence MSSMP are composed of two nested loops. The objective of the inner loop is to reduce the discrepancy of the estimation in the j-th iteration ||P x{j} − y||l , where l = 1 in case of SSMP and l = 2 for MSSMP. The outer loop just enforces that the signal estimation to be sparse. The following theorem (which itself is built based upon some other lemmas) proves that the modified SSMP can handle the proposed real valued projection matrix and guarantee ℓ2 < Cℓ2 recovery for a fixed but arbitrary signal. Theorem 5.4.1. Given the projection matrix P , noisy samples y = P x + µ, the sparsity level k = ||x||0 , the combinatorial Algorithm 9 returns an estimate xˆ such that ||ˆ x − x||2 = O(µ) with a probability of at least 1 − log ||x||2 /nα (where α is a constant) if w-RIP constant of P δ2k,p is sufficiently small for p ≥ 1 − 1/nα+1 . Proof : Let us use x{j} to denote the estimated value of the true signal x in the beginning of the j-th iteration of the outer loop. Note that in the start of the outer loop, x{j} is k-sparse. By Lemma 5.4.2 (stated and proved below), if for a fixed regularization parameter cl , the discrepancy 73 of the estimation is still high (i.e. ||P x{j} − y||2 ≥ cl ||µ||2 ) then there exists an index i ∈ [n] and a step size z such that ||P ( x{j} + zei ) √ − y|| ≤ 1− cu ||P x{j} − y|| k (5.22) where cu is a function of w-RIP constant of P and also the expander quality of the corresponding bipartite graph. 
Thus after O(k) iterations of the inner loop ||P x′ − y|| ≤ β||P x{j−1} − y||, β = e−cu /2 (5.23) Noting that P x′ − y = P (x′ − x) − µ, triangle inequality can be applied to get ( ) ||P x′ − x || ≤ β||P (x{j−1} − x)|| + (1 + β) ||µ|| (5.24) To relate the improvement on discrepancy to the improvement on signal estimation, one could utilize w-RIP on the set of columns corresponding to Support{x′ − x} to conclude √ ||x′ − x|| ≤ β 1 + δ {j−1} 1+β ||x − x|| + √ ||µ|| 1−δ 1−δ Define constants √ α = 2β 1+δ 2 (1 + β) , cµ = √ 1−δ 1−δ (5.25) (5.26) Then by Lemma 5.4.3 (stated and proved below), we have: ||x{j} − x|| ≤ α||x{j−1} − x|| + cµ ||µ|| 74 (5.27) Now if the expander quality of the corresponding bipartite graph and also the number of samples are sufficiently high (leading to smaller values for β and δ respectively), then α would be less than one. In that case, running the outer loop for z iterations, we would have ||x{z} − x|| ≤ αz + 2cµ ||µ|| 1−α (5.28) Thus running the outer loop only for O (log ||x||) iterations leads to ||x{z} − x|| = O(µ). As a final remark, note that Algorithm 9 requires τ = O(k log ||x||) iterations and in each iteration it relies on w-RIP inequalities of P for signals which are non-zero at most in 2k indices (see proof of Lemma 5.4.2). Hence to recover an arbitrary but fixed signal x from its compressive samples with a target probability of q, P must have w-RIP of order (2k, p) where pτ ≥ q. In the following, we state the two lemmas used in the proof of the Theorem 5.4.1. Lemma 5.4.2. (Equivalent of Lemma 3 in [68]) Let y = P x+µ. If x′ is k-sparse and ||P x′ −y|| ≥ cl ||µ|| and P has a w-RIP of a proper order, then there exist an index i and a positive constant cu such that ( ( ′ ) cu ) ′ 2 ||P x − xi ei + xi ei − y|| ≤ 1 − ||P x′ − y||2 k (5.29) ∆ = x′ − x (5.30) Proof : Define Since y = P x + µ then (5.29) can be re-expressed as: √ cu ||P (∆ − ∆i ei ) || ≤ 1 − ||P ∆ − µ|| k 75 (5.31) Note that from the assumption in the lemma (cl − 1) ||µ|| ≤ ||P x′ − y|| − ||µ|| ≤ ||P x′ − y + µ|| = ||P ∆|| Since ||∆||0 ≤ 2k hence w-RIP implies that ||P ∆|| ≤ √ (5.32) 1 + δ||∆||. Consequently: c −1 ||∆|| ≥ √l ||µ|| 1+δ (5.33) T = Support{∆} = {i : ∆i ̸= 0} (5.34) Define Note that 0 ≥ Card{T } = ||x − x′ ||0 ≤ 2k. Assume sort entries of ∆ in a decreasing order, i.e. ∀i ∈ [Card{∆} − 1] : |∆i | ≥ |∆i+1 | (5.35) Consider G, the bi-partite graph corresponding to the projection matrix P . Assume one enumerate − the edges of the graph G in lexicographic order and forms the sets Ω+ i and Ωi by following this rule: if (i, j) is the first edge going to the vertex j, then j would be included in Ω− i , otherwise it goes to the set Ω− i [68]. More formally − + Ω+ i = {j : j ∈ Ωi , s.t. z < i, P j,z ̸= 0}, Ωi = Ωi \Ωi (5.36) Note that sets Ω+ i are pair-wise disjoint. As before, let D = dr. With a slight abuse and for the simplicity of notations, let us use |A| for a set to denote its cardinality. Since, P corresponds to an 76 ((c + 1)k, D, q) expander graph (where c ≥ 1), one has ∀s ≤ (c + 1)k : s ∑ |Ω− i | ≤ qDs (5.37) i=1 For a fixed constant cs > 1, define: − + T + = {i : i ∈ T and |Ω− i | < qDcs }, T = T \T (5.38) As in [16], our goal is to show that (a) most of energy of ∆ is in coordinates T + and (b) there is an index in T + , such that (5.29) holds true for that index. Suppose that, entries of T are sorted increasingly, i.e. T − = {u1 , u2 , . . .} where ui < ui+1 . 
Now, since u_k ≥ k, we have

∑_{i=1}^{k} |Ω_{u_i}^−| ≤ ∑_{i=1}^{u_k} |Ω_i^−|    (5.39)

By definition, ∀i ∈ T^− we have |Ω_i^−| ≥ qDc_s, and therefore

k q D c_s ≤ ∑_{i=1}^{k} |Ω_{u_i}^−|    (5.40)

On the other hand, since the graph G is D = rd left-regular, (5.37) gives

∑_{i=1}^{u_k} |Ω_i^−| ≤ q D u_k    (5.41)

From the last three relations we get

u_k ≥ k c_s    (5.42)

The immediate consequence (applying the same argument to the first i elements of T^−) is that for all i ∈ [k]:

|{1, 2, . . . , u_i} ∩ T^+| ≥ i (c_s − 1)    (5.43)

As in [68], for any i ∈ [k], let S_i be the set containing the smallest (c_s − 1) elements of {1, 2, . . . , u_i} ∩ T^+ \ {∪_{j=1}^{i−1} S_j}. Noting that ∀j ∈ S_i : |∆_j| ≥ |∆_{u_i}| and |S_i| ≥ (c_s − 1), one has

∥∆∥^2 ≥ ∥∆_{T^− ∪ (∪_i S_i)}∥^2 = ∑_{u_i ∈ T^−} ( ∆_{u_i}^2 + ∑_{j ∈ S_i} ∆_j^2 )    (5.44)
≥ c_s ∑_{u_i ∈ T^−} ∆_{u_i}^2 = c_s ∥∆_{T^−}∥^2    (5.45)

Recalling that ∥∆∥^2 = ∥∆_{T^−}∥^2 + ∥∆_{T^+}∥^2, it can be concluded that

∥∆_{T^+}∥ ≥ √(1 − 1/c_s) ∥∆∥    (5.46)

Define

gain_i = ∥P ∆ − µ∥^2 − ∥P(∆ − ∆_i e_i) − µ∥^2    (5.47)
       = ∥P ∆∥^2 − ∥P(∆ − ∆_i e_i)∥^2 − 2 µ^T P ∆_i e_i    (5.48)

Since the i-th column of P is non-zero only in the row indices Ω_i, we have ∀j ∈ [m]\Ω_i : (P ∆)_j = (P ∆ − P e_i ∆_i)_j. It is straightforward to verify that

gain_i = 2 (P ∆)_{Ω_i}^T (P ∆_i e_i)_{Ω_i} − ∥(P ∆_i e_i)_{Ω_i}∥^2 − 2 µ^T P e_i ∆_i    (5.49)

Using the fact that ∆ = ∆_{T^+} + ∆_{T^−} and summing gain_i over all i ∈ T^+ yields

∑_{i∈T^+} gain_i = 2 (P ∆)^T P ∆_{T^+} − ∑_{i∈T^+} ∥(P ∆_i e_i)_{Ω_i}∥^2 − 2 µ^T P ∆_{T^+}    (5.50)
= 2 ∥P ∆_{T^+}∥^2 + 2 (P ∆_{T^−})^T (P ∆_{T^+}) − ∑_{i∈T^+} ∥(P ∆_i e_i)_{Ω_i}∥^2 − 2 µ^T P ∆_{T^+}    (5.51)

By the w-RIP on the support T^+, one has

∑_{i∈T^+} ∥(P ∆_i e_i)_{Ω_i}∥^2 ≤ (1 + δ) ∑_{i∈T^+} |∆_i|^2 = (1 + δ) ∥∆_{T^+}∥^2    (5.52)

Using the lower bound of the w-RIP for T^+, ∥P ∆_{T^+}∥^2 ≥ (1 − δ) ∥∆_{T^+}∥^2, and hence (5.51) can be simplified to

∑_{i∈T^+} gain_i ≥ (1 − 3δ) ∥∆_{T^+}∥^2 + 2 (P ∆_{T^−})^T (P ∆_{T^+}) − 2 µ^T P ∆_{T^+}    (5.53)

Note that by the Cauchy–Schwarz inequality and (5.33),

µ^T P ∆_{T^+} ≤ (1 + δ)/(c_l − 1) ∥∆_{T^+}∥^2    (5.54)

Combining the last two relations, we have

∑_{i∈T^+} gain_i ≥ ( 1 − 3δ − 2(1 + δ)/(c_l − 1) ) ∥∆_{T^+}∥^2 + 2 (P ∆_{T^−})^T (P ∆_{T^+})    (5.55)

Using the w-RIP of P on the two partitions of T, we have (see [28])

∥P_{T^+}^T P_{T^−}∥ ≤ δ    (5.56)

and hence

|(P ∆_{T^−})^T (P ∆_{T^+})| ≤ δ ∥∆_{T^+}∥ ∥∆_{T^−}∥    (5.57)

Substituting (5.57) into (5.55) yields

∑_{i∈T^+} gain_i ≥ ( 1 − 3δ − 2(1 + δ)/(c_l − 1) ) ∥∆_{T^+}∥^2 − 2δ ∥∆_{T^+}∥ ∥∆_{T^−}∥    (5.58)

By (5.46) and (5.58), we have

∑_{i∈T^+} gain_i ≥ ( 1 − 3δ − 2(1 + δ)/(c_l − 1) ) ∥∆_{T^+}∥^2 − 2δ √(1/c_s − 1/c_s^2) ∥∆∥^2 ≥ C̄ ∥∆∥^2    (5.59)

for

C̄ = (1 − 1/c_s) ( 1 − 3δ − 2(1 + δ)/(c_l − 1) ) − 2δ √(1/c_s − 1/c_s^2)    (5.60)

Recall that |T^+| ≤ 2k. Thus there exists an index j ∈ [n] such that

gain_j ≥ C̄ ∥∆∥^2 / |T^+| ≥ C̄ ∥∆∥^2 / (2k)    (5.61)

To relate the gain to the discrepancy ∥P x′ − y∥, note that

√(1 + δ) ∥∆∥ ≥ ∥P ∆∥ = ∥(P x′ − y) + µ∥ ≥ ∥P x′ − y∥ − ∥µ∥    (5.62)

By (5.33),

∥∆∥ ≥ ∥P x′ − y∥ / ( √(1 + δ) (1 + 1/(c_l − 1)) )    (5.63)

Now one can simplify (5.61) to

∃ j ∈ [n] : gain_j ≥ (c_u/k) ∥P x′ − y∥^2    (5.64)

for

c_u = C̄ / ( 2 (1 + δ) (1 + 1/(c_l − 1))^2 )    (5.65)

Note that, to improve the discrepancy of the estimate and hence reduce the estimation error, we require c_u to be strictly positive. Since the denominator of c_u is strictly positive, it suffices to have C̄ > 0. This can be achieved if δ (the w-RIP constant) is sufficiently small. For instance, for c_s = 2 and c_l = 4, C̄ is positive if δ ≤ 0.058.

Lemma 5.4.3. (Equivalent of Lemma 6 in [68]) For any k-sparse vector x and any vector x′ we have

∥H_k[x′] − x∥ ≤ 2 ∥x′ − x∥    (5.66)

Proof: The proof of Lemma 6 in [68] is for the ℓ1 norm; however, it holds for any other valid norm as well.
Nevertheless, for the sake of completeness, we present the proof here. Note that H_k[x′] is the closest k-sparse vector to x′. Hence ∥H_k[x′] − x′∥ ≤ ∥x − x′∥. Consequently, by the triangle inequality we have

∥H_k[x′] − x∥ ≤ ∥H_k[x′] − x′∥ + ∥x′ − x∥ ≤ ∥x′ − x∥ + ∥x′ − x∥ = 2 ∥x′ − x∥    (5.67)

Let us conclude this section with a brief discussion of the complexity of the MSSMP decoder. Computing P x for a sparse binary projection amounts to adding a few entries of x (since all non-zero entries of P are one). For a sparse real-valued projection, computing P x must instead be performed through a series of inner products. Clearly, this update is heavier for a sparse real-valued projection matrix than for a sparse binary one, but only by a constant factor (the complexity of a multiplication divided by the complexity of an addition). On the other hand, computing P x for a sparse real-valued projection matrix is much lighter than when P is a dense random matrix. Also, as stated before, MSSMP and SSMP are identical except for line 5 of Algorithm 9. In that line, SSMP computes the median of d numbers, while MSSMP requires the orthogonal projection of one vector onto another in R^{rd}. Although this operation is heavier than computing the median of d numbers, order-wise the two operations have exactly the same complexity (since r = O(1)). Consequently, the complexity orders of MSSMP and SSMP are the same.

5.5 Simulation Results

In this section, we present numerical results on the performance of the proposed sparse real-valued projection matrices (SERP) under combinatorial, greedy, and convex-relaxation approaches. We consider two scenarios: (i) the compressive samples are noiseless, and (ii) the samples are contaminated by noise.

For the noiseless case, the performance of the traditional pair of a sparse binary projection matrix and the SSMP decoder is compared with the proposed pair of a sparse real-valued projection and the MSSMP decoder. More specifically, we fixed the signal length to n = 10,000 and considered signal sparsities k = ∥x∥_0 = 100, 200, 300, . . . , 800. For each sparsity level, we considered six sampling levels m = 3k, 4k, . . . , 8k. For each configuration of k and m, we generated a signal x ∈ R^n by selecting k of its indices uniformly at random and drawing the signal values at those indices uniformly from the range [−0.5, 0.5]. We then recorded the average relative reconstruction error (∥∆(y, P) − x∥/∥x∥) over 50 independent trials.

Figure 5.1: Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and the sampling ratio m/k, when the projection matrix is sparse binary and the solver is SSMP.

The results for the pair of a sparse binary projection matrix with d = 18 and the SSMP decoder are presented in Figure 5.1. For the same value of d, the results for MSSMP with sparse real-valued projection matrices with r = 1 and r = 2 are shown in Figure 5.2 and Figure 5.3, respectively. As shown in those plots, MSSMP and the sparse real-valued projections lead to (sometimes much) lower distortion in recovery when the number of samples is not sufficiently large (e.g. m = 4k and m = 5k for all values of k).
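For concreteness, the sketch below shows how a single trial of this noiseless experiment could be set up. The SERP construction shown here (each column non-zero in exactly rd randomly chosen rows, with Gaussian non-zero values) is our reading of the matrices of Section 5.1, the sizes are scaled down for illustration, and the helper names (serp_matrix, k_sparse_signal) as well as the commented mssmp call are ours, not part of the thesis code.

import numpy as np

def serp_matrix(m, n, d, r, rng):
    # Sketch of a SERP-style sparse real-valued projection: each column is
    # non-zero in exactly r*d rows chosen uniformly at random, with Gaussian
    # non-zero entries (an assumption; the construction is given in Section 5.1).
    P = np.zeros((m, n))
    for i in range(n):
        rows = rng.choice(m, size=r * d, replace=False)
        P[rows, i] = rng.standard_normal(r * d)
    return P

def k_sparse_signal(n, k, rng):
    # k non-zero entries at random positions, values uniform in [-0.5, 0.5],
    # as described for the experiments above.
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.uniform(-0.5, 0.5, size=k)
    return x

# One noiseless trial (the text uses n = 10000 and d = 18; smaller n here).
rng = np.random.default_rng(0)
n, k, d, r = 2000, 50, 18, 1
m = 5 * k
x = k_sparse_signal(n, k, rng)
P = serp_matrix(m, n, d, r, rng)
y = P @ x
# x_hat = mssmp(P, y, c, k)       # hypothetical decoder call (Algorithm 9)
# rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)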
However, both approaches (MSSMP and SSMP) introduce virtually no error in the recovery process when m > 6k. Note that under the MSSMP decoder, the results from sensing matrices with r = 1 are often less distorted than those from matrices with r = 2.

Figure 5.2: Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and the sampling ratio m/k, when the projection matrix is a sparse real-valued sensing matrix (SERP) with r = 1 and the solver is MSSMP.

Figure 5.3: Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and the sampling ratio m/k, when the projection matrix is a sparse real-valued sensing matrix (SERP) with r = 2 and the solver is MSSMP.

Now let us focus on the scenario where the samples are noisy, i.e. y = P x + µ with ∥µ∥ > 0. Fixing the signal length, sparsity and number of samples to n = 5000, k = 250 and m = 5k = 1250, respectively, we varied the relative sampling error (∥µ∥/∥P x∥) from 0.1 to one in steps of size 0.1. For each relative sampling error, we generated 100 random signals and projection matrices as in the noiseless scenario (i.e. selecting the non-zero indices of the signal uniformly at random and populating those indices uniformly in the range [−0.5, 0.5]). The averages of the relative reconstruction errors as a function of the relative sampling error were then recorded for all three classes of CS decoders and are presented in Figure 5.4–Figure 5.6.

Let us briefly highlight the key results of these plots. As shown in Figure 5.4, the pair of sparse real-valued projections and the MSSMP decoder leads to higher-quality recovery than the pair of sparse binary projections and SSMP. Within the class of sparse real-valued projections, when the relative noise energy is small, projection matrices with r = 1 appear to be the better sensing operators. As the noise energy increases, however, less error is introduced in the recovery process if the non-zero entries of the sparse projection matrix are Gaussian random variables in R^2 (i.e. r = 2). When BP [17] is employed as the decoder, the class of sparse real-valued projection matrices is virtually as good as dense Gaussian projection matrices (see Figure 5.5). Within the class of sparse real-valued projections and for a fixed value of d, BP consistently introduces less error in the approximation for matrices with r = 2. To evaluate the performance of the sparse real-valued projections in a greedy CS framework, we selected the CoSaMP algorithm [28] for the decoder side. Figure 5.6 shows that CoSaMP is more sensitive to the deployed projection matrix than BP when the projection matrix is from the class of sparse real-valued projections. Figure 5.5 and Figure 5.6 suggest that, for a fixed value of d and under the non-combinatorial classes of solvers, deploying sparse real-valued projection matrices with the parameter r slightly greater than one leads to less error in the recovery process.
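As a complement to the plots referenced above, the snippet below shows one way a noisy trial at a prescribed relative sampling error ∥µ∥/∥P x∥ could be generated, mirroring the sweep from 0.1 to 1.0 described in this section. The SERP construction is the same assumed one as in the previous sketch, and the decoder call is left as a comment since it depends on the chosen solver (MSSMP, BP [17] or CoSaMP [28]).

import numpy as np

rng = np.random.default_rng(1)
n, k, d, r = 5000, 250, 18, 2
m = 5 * k                                  # m = 1250, as in the text

# k-sparse test signal with non-zeros uniform in [-0.5, 0.5]
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.uniform(-0.5, 0.5, size=k)

# SERP-style sparse real-valued projection (same assumed construction as before)
P = np.zeros((m, n))
for j in range(n):
    rows = rng.choice(m, size=r * d, replace=False)
    P[rows, j] = rng.standard_normal(r * d)

# Scale a Gaussian noise vector so that ||mu|| / ||P x|| equals the target level.
rel_err = 0.3
Px = P @ x
mu = rng.standard_normal(m)
mu *= rel_err * np.linalg.norm(Px) / np.linalg.norm(mu)
y = Px + mu
# x_hat = decoder(P, y)   # MSSMP, BP or CoSaMP; record ||x_hat - x|| / ||x||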
Figure 5.4: Comparing the performance of the sparse real-valued projection matrices under the MSSMP decoder with sparse binary projections under the SSMP decoder when samples are noisy.

Figure 5.5: Comparing the performance of the sparse real-valued and dense Gaussian projection matrices when BP is the decoder and samples are noisy.

Figure 5.6: Comparing the performance of the sparse real-valued and dense Gaussian projection matrices when CoSaMP is the decoder and samples are noisy.

Chapter 6
Future works

6.1 Introduction

CRISP, HCS and many other fast combinatorial algorithms (e.g. [38, 37]) follow the traditional (i.e. non-message-passing) approach of combinatorial algorithms, in which each coefficient of the unknown signal is estimated only once. The main advantage of such an approach is that the decoding process is extremely fast. On the negative side, such algorithms suffer from high sample requirements. Algorithm 10 shows how typical combinatorial approaches to CS operate for solving (6.1) under the zero pseudo-norm p = 0. Our initial effort was to utilize the w-RIP property of SERP matrices to show that one can obtain an extremely low complexity traditional combinatorial algorithm under SERP encoding matrices without sacrificing sample requirements. As we show in the rest of this chapter, such an algorithm indeed exists when the samples are not affected by noise. The moment the samples are contaminated by noise, however, we face some challenging problems. We leave solving those challenges for future work and propose some possible research paths to that end.

6.2 Another look at combinatorial algorithms

Algorithm 10 summarizes a traditional approach of combinatorial algorithms for CS. In the following, we show that Algorithm 11 can be perceived as the recursive formulation of Algorithm 10 for solving equation (6.1) when p is set to zero in the third line of Algorithm 11.

arg min_{x̂} ∥x̂∥_p   s.t.   y = P x̂    (6.1)

Algorithm 10: A typical combinatorial algorithm for CS. The set of coefficients whose values have been determined is denoted by z.
input : P ∈ R^{m×n} and y ∈ R^m
output: x̂ ∈ R^n
ŷ = y, z = ∅ ;
while z ≠ [n] do
    Find l ⊆ [m] such that ∥x_{ω_l}∥_0 = 1 ;
    Let x̂_{ω_l} denote the estimate of the values x_{ω_l} ;
    For θ = ω_l : ŷ_{Ω_θ} ← y_{Ω_θ} − P_{Ω_θ,ω_l} x̂_{ω_l} ,  z ← z ∪ ω_l ;
end

Algorithm 11: ∆(P, θ, y), the recursive formulation of Algorithm 10.
input : P ∈ R^{m×n}, θ and y ∈ R^m
output: x̂ ∈ R^n
x̂ = 0_n ;
Find l ⊆ [m] where ∥x_{ω_l}∥_0 = 1 ;
x̂_{θ_1} = arg min_ξ ∥ξ∥_p  s.t.  y_l = P_{l,θ_1} ξ ;
x̂_{θ\θ_1} ← ∆(P, θ\θ_1, y − P x̂) ;

Initializing with θ = [n] and after finding θ_1 such that ∥x̂_{θ_1}∥_0 = 1 (i.e. finding an isolated non-zero), Algorithm 11 recursively finds a subset of signal coefficients in the unrecovered part, θ_{i+1} ⊆ [n]\∪_{j∈[i]} θ_j, such that ∥x_{θ_{i+1}}∥_0 = 1, where θ_j denotes the value of θ_1 (line 2 in Algorithm 11) at the j-th level of the recursion stack.
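To make the structure of Algorithm 10 concrete, the sketch below implements this peeling process for the special case of a sparse binary projection matrix, using the voting rule discussed in the following paragraphs (equation (6.2)) as the test for isolated non-zeros. The function name, the voting threshold (the text suggests requiring more than d/2 agreeing samples) and the numerical tolerances are ours. For real-valued SERP matrices the voting test is replaced by the alignment test introduced later in this section, and for adversarial signals the voting inference can be wrong, as discussed below.

import numpy as np

def peel_binary_voting(P, y, vote_threshold):
    # Sketch of Algorithm 10 for a sparse *binary* projection matrix P.
    # A coefficient i is declared isolated when at least `vote_threshold`
    # residual samples touching column i agree on the same non-zero value and
    # share no other undetermined coefficient (the voting rule of (6.2)).
    m, n = P.shape
    x_hat = np.zeros(n)
    residual = y.astype(float).copy()
    undetermined = set(range(n))                                  # complement of z
    col_rows = [np.flatnonzero(P[:, i]) for i in range(n)]        # Omega_i
    row_cols = [set(np.flatnonzero(P[j, :])) for j in range(m)]   # omega_j

    progress = True
    while undetermined and progress:
        progress = False
        for i in list(undetermined):
            omega_i = col_rows[i]
            vals = residual[omega_i]
            nz = vals[~np.isclose(vals, 0.0)]
            if nz.size < vote_threshold:
                continue
            votes, counts = np.unique(np.round(nz, 10), return_counts=True)
            best = int(np.argmax(counts))
            if counts[best] < vote_threshold:
                continue
            l = omega_i[np.isclose(vals, votes[best])]
            # the rows in l must share no undetermined coefficient other than i
            shared = set.intersection(*(row_cols[j] for j in l)) & undetermined
            if shared != {i}:
                continue
            # infer x_i from the common vote and peel it off the residual
            x_hat[i] = votes[best]
            residual[omega_i] -= votes[best]        # P[:, i] is binary
            # Algorithm 10 marks every coefficient spanned by the rows in l as
            # determined (z <- z union omega_l); the others remain zero.
            undetermined -= set.union(*(row_cols[j] for j in l))
            progress = True
    return x_hat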
This process continues until all non-zero coefficients are recovered. Note that for a vector x with a single non-zero entry (∥x∥_0 = 1) and y = P x, the solutions to (6.1) for p = 0 and p = 1 are equal by the triangle inequality. Also note that for any signal x and two disjoint subsets a ⊆ [n] and a^c := [n]\a, one has ∥x_a∥_p + ∥x_{a^c}∥_p = ∥x∥_p both for p = 0 and p = 1 (see lines 3 and 4 in Algorithm 11). Combining this observation with the equality of the solutions of (6.1) for p = 0 and p = 1, one can conclude that in Algorithm 11 the solutions for p = 0 and p = 1 are exactly the same. Hence, typical combinatorial approaches to CS (Algorithm 10) can be recast as a recursive ℓ1 minimizer.

The success of the aforementioned recursive algorithm depends on:
• The ability of step 3 in Algorithm 10 to find isolated non-zeros.
• The quality of the projection matrix P in terms of recoverability.

More specifically, the first condition demands that the decoder at the i-th iteration can truly find θ_i = ω_l, a subset of signal coefficients containing only one non-zero (∥x_{θ_i}∥_0 = 1). Meanwhile, the second requirement states that at least one isolated non-zero must exist throughout the recovery process, so that the process does not halt before perfect recovery. It has been shown [37, 38] that certain sparse matrices, for instance the adjacency matrices of high-quality expander graphs, satisfy the second condition. In the following, we assume that the employed sparse projection matrix satisfies the second condition, meaning that if isolated non-zeros are correctly identified in step 3 of Algorithm 10, then the recovery process will not halt. To that end, and to keep our analysis simple, we adopt the SERP matrices introduced in the previous chapter; that is, each column of P is non-zero in exactly rd = O(log n) random indices for a constant r ≥ 1.

Having the guarantee of recoverability, our concern is now to satisfy the first condition, i.e. correctly finding isolated non-zeros. The most common approach for finding isolated non-zeros (in the case of a binary sparse projection matrix) is to apply a voting mechanism (see [68, 35, 38, 37]). More specifically, assume that there exists a set of sample indices l ⊆ [m] such that all of them vote the same value and they span the same single coefficient, i.e.

∃ l ⊆ [m] : ∀ p, q ∈ l : y_p = y_q ,   ∩_{j∈l} ω_j = {i}    (6.2)

Then typical combinatorial algorithms infer that x̂_i equals that vote and x̂_j = 0 for j ∈ ω_l\{i}. Such a conclusion may imply certain presumptions about the underlying signal. For example, it has been assumed in [35] that no two subsets θ_1, θ_2 ⊆ [n] exist such that ∥x_{θ_1}∥_1 ≈ ∥x_{θ_2}∥_1; other methods rely on other assumptions, such as non-negativity of the signal. Consequently, the applicability of the aforementioned approaches is restricted to signals satisfying the corresponding conditions. On the other hand, due to the non-binary nature of the proposed sparse projection matrix P, the voting mechanism is not applicable in the case of P.

We introduced the notions of “alignment” and “isolation” in Chapter 2. In this chapter, we show that the method of voting is a special case of those notions (alignment and isolation) in the case of binary matrices. Furthermore, we present necessary and sufficient conditions for detecting isolated non-zeros under real-valued sparse projection matrices. As before, assume P ∈ R^{m×n} is a matrix where each of its columns is non-zero in exactly rd = O(log n) random indices.
Also assume that P is an instance of the SERP matrices. Since P corresponds to the adjacency matrix of a high-quality expander graph (with high probability) [37] and the analysis of [37] is not based on the values of P, the recoverability condition holds for P. Consequently, here we only discuss lines 3 and 4 in Algorithm 11: we show how one can find subsets of signal coefficients containing only one non-zero by examining the compressive samples. To that end, it is useful to briefly review the notions of isolation and alignment.

We say “x_i is isolated in the sample indices l ⊆ [m]” if ∥x_{ω_l}∥_0 = 1 and x_i ≠ 0. In other words, x_i is the only non-zero signal coefficient (among the indices ω_l) spanned by the samples y_l. Clearly, since P_{:,i} is non-zero only in the indices Ω_i, we have l ⊆ Ω_i. For q ⊆ [m] we say “y_q is aligned with the j-th column of P” if ∀i ∈ q : P_{i,j} ≠ 0 and ∃α ≠ 0 : y_q = α P_{q,j}. Note that if P is binary (i.e. P_{q,j} is an all-one vector), then the definition of alignment reads as “there exists a set of samples (y_q), all voting the same value α”. This is exactly the voting mechanism employed in combinatorial algorithms, which verifies our claim that the voting mechanism is in fact an alignment in the special case of binary matrices.

Assume that x_i ≠ 0 is isolated in the sample indices l ⊆ Ω_i ⊆ [m]. Then

y_l = P_{l,ω_l} x_{ω_l} = x_i P_{l,i}    (6.3)

In words, if x_i is isolated in the sample indices l, then y_l is aligned with the i-th column of P; thus alignment is a necessary condition for isolation. Now, if alignment is also a sufficient condition for isolation, then isolated non-zero coefficients in the signal can be found by looking for alignments of the samples with the columns of P. For now, assume that isolation and alignment are equivalent. Then steps 3 and 4 in Algorithm 11 can be implemented as follows: to see whether x_i is isolated in a set of (unknown) samples, one considers P_{Ω_i,i} and looks for q ⊆ Ω_i such that y_q = α P_{q,i}. If such a q exists, then (6.3) can be applied to infer x̂_i = α. In the rest of this section, we identify the conditions under which alignment is indeed a sufficient condition for isolation.

Here, it is important to highlight the impact of the size of the set l ⊆ Ω_i ⊆ [m] in our analysis, since in the extreme case |l| = 1 in (6.3) there always exists a meaningless alignment. Recall that each column of P is non-zero in exactly rd = O(log n) entries. In [37], it has been shown that in the case of a sparse binary projection matrix based on the adjacency matrix of a left d-regular random graph, there exist at least τ ≥ 1 + d/2 samples which span only one (and the same) non-zero at every stage of decoding. Hence, here we take |l| = τ > rd/2 = O(log n) and ignore alignments of size smaller than rd/2 in the decoder.

As before, assume y_l is aligned with the i-th column of P, where |l| > rd/2. Since by definition y_l = P_{l,ω_l} x_{ω_l} and ∥y_l∥_2 > 0, there are two possible cases: (a) ∥x_{ω_l}∥_0 = 1 or (b) ∥x_{ω_l}∥_0 > 1. In the first case, since y_l is non-zero in all entries (∥y_l∥_0 = |l|) and for all j ∈ ω_l\{i} we have ∥P_{l,j}∥_0 < |l| (this follows from the expander property of P), one can conclude that x_i is the only non-zero among the indices ω_l and its value can easily be computed by (6.3). In words, in the first case, alignment is a necessary and sufficient condition for isolation. Unfortunately, a similar argument generally does not hold for the second case (∥x_{ω_l}∥_0 > 1).
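The alignment test just described can be written down directly; a minimal sketch is given below, assuming the exact (noiseless) definition and a minimum subset size as suggested above. The function name and the tolerance parameter are ours. When P is binary, the ratios below are just the residual sample values themselves, so the test reduces to the voting rule.

import numpy as np

def find_alignment(P, y, i, min_size, tol=1e-9):
    # Look for a subset q of Omega_i on which y_q is a non-zero multiple alpha
    # of P_{q,i}; returns (q, alpha) or None.  This is the exact, noiseless
    # test -- the epsilon-alignment of Section 6.3 relaxes the equality.
    omega_i = np.flatnonzero(P[:, i])
    ratios = y[omega_i] / P[omega_i, i]          # candidate values of alpha
    for alpha in ratios:
        if abs(alpha) < tol:
            continue
        q = omega_i[np.isclose(ratios, alpha, atol=tol)]
        if q.size >= min_size:
            return q, alpha                      # y_q = alpha * P_{q,i}
    return None

# If an alignment is found, (6.3) gives the estimate x_hat_i = alpha, and the
# corresponding coefficients can be peeled off exactly as in Algorithm 10.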
Nevertheless, here we show that if the parameter r in the SERP matrices is sufficiently large, then the second case never happens, with very high probability. To that end, we follow these steps: (a) for any subset of samples l ⊆ [m] with |l| ≥ 1 + d/2 = O(log n), we first find Q = max{∥x_{ω_l}∥_0}, the maximum number of non-zeros which can be spanned (with high probability) by the samples l; (b) we then find the value of r such that every Q columns of P_{l,ω_l} are linearly independent. Finally, by elementary arguments, one can show that it is then impossible to have ∥x_{ω_l}∥_0 > 1 and y_l aligned with the i-th column of P.

For a fixed subset of sample indices l, define the binary random variable X_i as

X_i = 1 if x_i ≠ 0 and i ∈ ω_l, and X_i = 0 otherwise.    (6.4)

Note that Q = ∑_{i=1}^{n} X_i. Now, since the probability that x_i ≠ 0 is independent of the probability of the event i ∈ ω_l, we have

Pr(X_i = 1) = (k/n) ( 1 − \binom{m−|l|}{d} / \binom{m}{d} )    (6.5)

For now, assume that r = 1; later we derive how large r should be set. Assuming k (and hence m) is much larger than one, one can use the approximation 1/(m − d + 1) ≈ 1/m. Using the inequality 1 + α ≤ e^α, it can be shown that

µ = E[Q] = n Pr(X_i = 1) ≤ k (1 − exp(−|l|/m))^d    (6.6)

Recalling that d = O(log n) = c_1 log n, d/2 < |l| < d and m = O(k log n) = c_2 k log n, it can easily be derived that µ = O(1). Now it only remains to bound the deviation of Q from µ. Fortunately, since Q is a sum of Bernoulli random variables, good concentration inequalities are available for it. For instance, using the Chernoff bound for Poisson trials, it can easily be shown that for δ ≥ 6.389 (which makes log(1 + δ) > 2) we have

Pr(Q ≥ µ + δ β log n) ≤ 1/n^β    (6.7)

Now, if the spark [43] of P_{l,ω_l} is more than Q, then it is impossible for y_l to be aligned with the i-th column of P while at the same time ∥x_{ω_l}∥_0 > 1. This can be proved by contradiction. More specifically, assume 1 < ∥x_{ω_l}∥_0 ≤ Q and that y_l is aligned with the i-th column of P (i.e. y_l = P_{l,ω_l} x_{ω_l} = α P_{l,i}). Define a new vector x′ where x′_i = x_i − α and x′_j = x_j for j ∈ ω_l\{i}. Then P_{l,ω_l} x′ = 0 and ∥x′∥_0 ≤ Q, which contradicts the presumption that the spark of P_{l,ω_l} is more than Q. In summary, if the parameter r in the SERP matrices is tuned such that (6.7) is satisfied, then alignment translates into isolation with high probability.

6.3 Noise

It seems that since the w-RIP holds for SERP matrices and typical combinatorial decoding algorithms are in fact fast implementations of BP/linear programming [17] (minimizing the ℓ1 norm of the solution subject to a set of equations), such a combinatorial algorithm should be robust when the samples are noisy. A naive justification follows three steps: (a) the decoding algorithm is trying to find the sparsest solution, and for each sub-problem it never makes a mistake; (b) using the triangle inequality, the solution found for that sub-problem has the minimum ℓ1 norm; and (c) applying induction, the algorithm should find all non-zero coefficients correctly. Since the w-RIP is a sufficient condition for the robustness of BP (at least for an arbitrary but fixed signal), this suggests that our algorithm should be robust in the presence of noise. On the other hand, we face another challenge when the samples are contaminated by noise: no perfect alignment can then be found. This problem might be solved by allowing ϵ-alignment.
We say that a vector h is ϵ-aligned with the i-th column of P if

| cos^{−1}( |P_{:,i}^T h| / (∥P_{:,i}∥_2 ∥h∥_2) ) | ≤ ϵ    (6.8)

Since the argument of cos^{−1} is non-negative and cos^{−1} is strictly decreasing, one may consider only the argument inside the parentheses when using the notion of ϵ-alignment in the proofs. As noted before, the RIP translates into the fact that sub-matrices of the matrix behave like an orthonormal projection. Suppose y = P x + e where e is the noise vector, P has a small isometry constant δ_k and x is non-zero at index i. Then we can conclude that the columns of P are so far apart from each other that the cosine of the angle between the vector h = y_{Ω_i} and the vector P_{Ω_i,i} attains its maximum, as long as the noise energy ∥e∥_2 does not dominate the sample energy ∥h∥_2. Thus, we can still find the isolated non-zeros even in the presence of noise.

Assume that we are given noisy samples y = P x + e. Furthermore, suppose that a subset of samples l spans only one non-zero coefficient x_i ≠ 0 (∥x_{ω_l}∥_0 = 1 and i ∈ ω_l). The first question to be answered is how the noise distorts the angle between the pure (noiseless) samples P_{l,ω_l} x_{ω_l} = x_i P_{l,i} and P_{l,i}, compared with the angle between the noisy samples P_{l,ω_l} x_{ω_l} + e_l and P_{l,i}. We also need to find a lower bound L on |P_{:,i}^T y_l| / (∥P_{:,i}∥_2 ∥y_l∥_2) when there is an isolation, and an upper bound U on the same term when there is no isolation. If U < δ < L for a constant δ (a function of ϵ), then the notion of ϵ-alignment can be applied in the case of noisy samples. Providing a rigorous proof is left as future work; in the rest of this section, however, we highlight some directions and challenges along that path.

A statistical argument could be applied here, in which one derives the distribution of the numerator of (6.8). Note that when an isolation occurs, the numerator inside the cos^{−1} function is

P_{l,i}^T y_l = x_i ∥P_{l,i}∥_2^2 + P_{l,i}^T e_l    (6.9)

The distribution of the first term, ∥P_{l,i}∥_2^2, is simply chi-square; the distribution of the second term, P_{l,i}^T e_l, however, is chi. Since two different distributions are added here, finding the distribution of P_{l,i}^T y_l would be difficult. There are two possible remedies for this challenge: (a) using the method of moment generating functions, or (b) starting from the simple assumption that the noise e itself is normally distributed, which makes the second term also chi-square. Even in the second case, however, we have to deal with the problem of weighted chi-squares. One might think that approximations similar to those in the previous chapter could be utilized, but because of other approximations that might be applied later, this step should be taken with caution.

For the denominator, we have to find the distribution of the product ∥P_{l,i}∥_2 ∥y_l∥_2. Although the distribution of each factor can be derived fairly easily, finding the distribution of the product could be cumbersome. The problem becomes even more challenging when we want to calculate the distribution of the random variable obtained by dividing the numerator by the denominator, although a remedy here could be to utilize the w-RIP inequalities.

To find the upper bound U, the distributions of the analogous terms when there is no isolation have to be calculated. For instance, when the samples l span more than one non-zero, then

P_{l,i}^T y_l = x_i ∥P_{l,i}∥_2^2 + ∑_{j∈ω_l\{i}} x_j P_{l,i}^T P_{l,j} + P_{l,i}^T e_l    (6.10)

Again, the first part leads to a weighted chi-square distribution, while the sum of the cross terms P_{l,i}^T P_{l,j} is distributed as a sum of normal products.
Despite the fact that closed-form expressions exist for the distribution of normal-product random variables, adding those terms could be challenging. Again, however, the method of moment generating functions looks attractive, since it separates such a sum into a product of terms, for each of which there exist either good approximations or even closed-form expressions.

BIBLIOGRAPHY

[1] Emmanuel Candès and Justin Romberg, "Quantitative robust uncertainty principles and optimally sparse decompositions", Foundations of Computational Mathematics, 6(2), pp. 227-254, April 2006.
[2] David Donoho, "Compressed sensing", IEEE Trans. on Information Theory, 52(4), pp. 1289-1306, April 2006.
[3] Emmanuel Candès, "The restricted isometry property and its implications for compressed sensing", Compte Rendus de l'Academie des Sciences, Paris, Series I, 346, pp. 589-592, 2008.
[4] Emmanuel Candès and Justin Romberg, "Practical signal recovery from random projections", Preprint, Jan. 2005.
[5] Emmanuel Candès, Justin Romberg, and Terence Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information", IEEE Trans. on Information Theory, 52(2), pp. 489-509, February 2006.
[6] D. Donoho and J. Tanner, "Neighborliness of randomly projected simplices in high dimensions", Preprint, Mar. 2005.
[7] Emmanuel Candès and Terence Tao, "Near optimal signal recovery from random projections: Universal encoding strategies?", IEEE Trans. on Information Theory, 52(12), pp. 5406-5425, December 2006.
[8] David Donoho and Yaakov Tsaig, "Extensions of compressed sensing", Signal Processing, 86(3), pp. 533-548, March 2006.
[9] Emmanuel Candès and Terence Tao, "Decoding by linear programming", IEEE Trans. on Information Theory, 51(12), pp. 4203-4215, December 2005.
[10] E. Candès and T. Tao, "Error correction via linear programming", Found. of Comp. Math., 2005, submitted.
[11] Jarvis Haupt, Waheed U. Bajwa, Gil Raz, and Robert Nowak, "Toeplitz compressed sensing matrices with applications to sparse channel estimation", Preprint, 2008.
[12] Dror Baron, Shriram Sarvotham, and Richard G. Baraniuk, "Bayesian compressive sensing via belief propagation", Preprint, 2008.
[13] A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images", SIAM Review, 51(1), pp. 34-81, February 2009.
[14] Dror Baron, Marco F. Duarte, Michael B. Wakin, Shriram Sarvotham, and Richard G. Baraniuk, "Distributed compressive sensing", Preprint, 2005.
[15] Shriram Sarvotham, Dror Baron, and Richard Baraniuk, "Measurements vs. bits: Compressed sensing meets information theory", Proc. Allerton Conference on Communication, Control, and Computing, Monticello, IL, September 2006.
[16] Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin, "A simple proof of the restricted isometry property for random matrices", Constructive Approximation, 28(3), pp. 253-263, December 2008. [Formerly titled "The Johnson-Lindenstrauss lemma meets compressed sensing".]
[17] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit", Technical Report 479, Department of Statistics, Stanford University, May 1995.
[18] R. Tibshirani, "Regression shrinkage and selection via the lasso", J. Royal Statist. Soc. B, 58(1), pp. 267-288, 1996.
[19] Thomas Blumensath and Mike E. Davies, "Gradient pursuits", IEEE Trans. on Signal Processing, 56(6), pp.
2370-2382, June 2008.
[20] Ingrid Daubechies, Massimo Fornasier, and Ignace Loris, "Accelerated projected gradient method for linear inverse problems with sparsity constraints", Preprint, 2007.
[21] T. Blumensath and M. E. Davies, "Stagewise weak gradient pursuits. Part I: Fundamentals and numerical studies", Preprint, 2008.
[22] T. Blumensath and M. E. Davies, "Stagewise weak gradient pursuits. Part II: Theoretical properties", Preprint, 2008.
[23] Rahul Garg and Rohit Khandekar, "Gradient descent with sparsification: An iterative algorithm for sparse recovery with restricted isometry property", ICML 2009, Montreal, Canada.
[24] Mário A. T. Figueiredo, Robert D. Nowak, and Stephen J. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems", IEEE Journal of Selected Topics in Signal Processing: Special Issue on Convex Optimization Methods for Signal Processing, 1(4), pp. 586-598, 2007.
[25] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries", IEEE Transactions on Signal Processing, December 1993, pp. 3397-3415.
[26] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition", Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers, Nov. 1993.
[27] Thong T. Do, Lu Gan, Nam Nguyen, and Trac D. Tran, "Sparsity adaptive matching pursuit algorithm for practical compressed sensing", Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, California, October 2008.
[28] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples", Preprint, 2008.
[29] Deanna Needell and Roman Vershynin, "Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit", Preprint, 2007.
[30] David L. Donoho, Yaakov Tsaig, Iddo Drori, and Jean-Luc Starck, "Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit", Preprint, 2007.
[31] Piotr Indyk, "Explicit constructions for compressed sensing of sparse signals", Symp. on Discrete Algorithms, 2008.
[32] R. Berinde, A. C. Gilbert, P. Indyk, H. Karloff, and M. J. Strauss, "Combining geometry and combinatorics: A unified approach to sparse signal recovery", Preprint, 2008.
[33] Radu Berinde and Piotr Indyk, "Sparse recovery using sparse random matrices", Preprint, 2008.
[34] Ronald A. DeVore, "Deterministic constructions of compressed sensing matrices", J. of Complexity, 23, pp. 918-925, August 2007.
[35] Shriram Sarvotham, Dror Baron, and Richard Baraniuk, "Sudocodes - Fast measurement and reconstruction of sparse signals", Proc. IEEE Int. Symposium on Information Theory (ISIT), Seattle, Washington, July 2006.
[36] Weiyu Xu and Babak Hassibi, "Efficient compressive sensing with deterministic guarantees using expander graphs", IEEE Information Theory Workshop, Lake Tahoe, September 2007.
[37] Sina Jafarpour, Weiyu Xu, Babak Hassibi, and Robert Calderbank, "Efficient compressed sensing using high-quality expander graphs", to appear in IEEE Trans. on Information Theory, 2009.
[38] M. Amin Khajehnejad, Alexandros G. Dimakis, Weiyu Xu, and Babak Hassibi, "Sparse recovery of positive signals with minimal expansion", Preprint, 2009.
[39] Abdolreza Abdolhosseini Moghadam and Hayder Radha, "Hybrid compressed sensing", IEEE MMSP 2010.
[40] Abdolreza Abdolhosseini Moghadam and Hayder Radha, "Complex sparse projections for compressed sensing", accepted for publication in the Conference on Information Sciences and Systems (CISS10), Johns Hopkins University, Baltimore, MD, USA.
[41] Abdolreza Abdolhosseini Moghadam and Hayder Radha, "Complex randomness-in-structured projections for compressed sensing", in Proceedings of the IEEE International Conference on Image Processing (ICIP09), Egypt.
[42] Abdolreza Abdolhosseini Moghadam and Hayder Radha, "Practical compressed sensing with log-of-prime projections", in Proceedings of the Conference on Information Sciences and Systems (CISS09), Johns Hopkins University, Baltimore, MD, USA, March 18-20, 2009.
[43] Michael Elad, "Optimized projections for compressed sensing", IEEE Trans. on Signal Processing, 55(12), pp. 5695-5702, December 2007.
[44] Venkat Chandar, "A negative result concerning explicit matrices with the restricted isometry property", Preprint, 2008.
[45] http://sparselab.stanford.edu
[46] D. Donoho, "High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension", Preprint, Jan. 2005.
[47] http://www.mathpages.com/home/kmath199.htm
[48] Martin Raab and Angelika Steger, "'Balls into bins' - a simple and tight analysis", Lecture Notes in Computer Science, vol. 1518, Springer, Berlin, 1998.
[49] Douglas R. Stinson, "Combinatorial Designs: Construction and Analysis", Springer-Verlag, New York, 2004.
[50] M. Luby, "LT codes", The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002.
[51] Stéphane Mallat, "A Wavelet Tour of Signal Processing", 2nd Edition, Academic Press, 1999, ISBN 0-12-466606-X.
[52] M. N. Do and M. Vetterli, "The contourlet transform: an efficient directional multiresolution image representation", IEEE Trans. on IP, 14(12), pp. 2091-2106.
[53] E. Candès and D. Donoho, "Curvelets - a surprisingly effective nonadaptive representation for objects with edges", in: A. Cohen, C. Rabut and L. Schumaker, Editors, Curves and Surface Fitting: Saint-Malo 1999, Vanderbilt University Press, Nashville, 2000, pp. 105-120.
[54] George C. Clark, Jr. and J. Bibb Cain, "Error-Correction Coding for Digital Communications", New York: Plenum Press, 1981, ISBN 0-306-40615-2.
[55] Shu Lin and Daniel J. Costello, Jr., "Error Control Coding: Fundamentals and Applications", Englewood Cliffs, NJ: Prentice-Hall, 1983, ISBN 0-13-283796-X.
[56] http://www.math.msu.edu/~jhall/classes/codenotes/coding-notes.html
[57] R. G. Gallager, "Low Density Parity Check Codes", Monograph, M.I.T. Press, 1963.
[58] Amin Shokrollahi, "LDPC codes: An introduction", 2003.
[59] Michael Mitzenmacher and Eli Upfal, "Probability and Computing: Randomized Algorithms and Probabilistic Analysis", Cambridge University Press, 2005.
[60] Anne Greenbaum, "Iterative Methods for Solving Linear Systems", Society for Industrial and Applied Mathematics, 1997.
[61] David L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing", Preprint, 2009.
[62] L. Bassalygo and M. Pinsker, "Complexity of an optimum nonblocking switching network without reconnections", Problems of Information Transmission, vol. 9, no. 1, pp. 289-313, 1973.
[63] P. Sarnak, "What is an expander?", Notices of the American Mathematical Society, Volume 51, Number 7.
[64] R. Calderbank, S. Howard, and S. Jafarpour, "Construction of a large class of deterministic sensing matrices that satisfy a statistical isometry property", 2009. [Online].
Available: http://www.dsp.ece.rice.edu/files/cs/strip-final.pdf
[65] Lu Gan, Cong Ling, Thong T. Do, and Trac D. Tran, "Analysis of the statistical restricted isometry property for deterministic sensing matrices using Stein's method", Preprint, 2009.
[66] David L. Donoho, Arian Maleki, and Andrea Montanari, "The noise-sensitivity phase transition in compressed sensing", IEEE Transactions on Information Theory, Oct. 2011, vol. 57, issue 10, pp. 6920-6941.
[67] E. Candès, "A probabilistic and RIPless theory of compressed sensing", IEEE Transactions on Information Theory, Nov. 2011, vol. 57, issue 11, pp. 7235-7254.
[68] R. Berinde and P. Indyk, "Sequential sparse matching pursuit", IEEE Allerton 2009, pp. 36-43.
[69] J. Haupt and R. Nowak, "A generalized restricted isometry property", Preprint, 2007.
[70] H. Feiveson and F. C. Delaney, "The distribution and properties of a weighted sum of chi squares", NASA Technical Note, TN D-4575, May 1968.
[71] S. Gabler and C. Wolff, "A quick and easy approximation to the distribution of a sum of weighted chi-square variables", Statistical Papers, vol. 28, pp. 317-325, 1987.