COMBINATORIAL METHODS FOR COMPRESSED SENSING

By

Abdolreza Abdolhosseini Moghadam

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
Electrical Engineering – Doctor of Philosophy

2014

ABSTRACT

COMBINATORIAL METHODS FOR COMPRESSED SENSING

By Abdolreza Abdolhosseini Moghadam

The inefficiency of classical signal acquisition systems, where a signal is sampled at the Nyquist rate and then compressed, raises the question: without knowing any prior, can we not directly sense a compressed version of the signal? In a series of papers, it has been shown that under certain conditions on the sensing operation and the signal of interest, a non-adaptive linear sampling scheme called Compressed Sensing (CS) can achieve this objective. Consequently, a number of decoders have been proposed to recover a signal from its compressive samples. Two extreme approaches among these decoders are convex relaxation and combinatorial methods. The former approach requires the fewest samples for an exact reconstruction at the price of high complexity, while the latter has low complexity but requires a higher number of samples for recovery. This thesis aims to design a CS approach that keeps only the best properties of the convex relaxation and combinatorial approaches. In particular, we plan to show that, under our proposed sensing operator, a fast combinatorial approach is an instance of convex relaxation and thus inherits all the good properties of convex relaxation approaches, namely optimality in sample requirements and robustness in the presence of noise.

Keywords: Compressed sensing, compressive sampling, sparse coding, combinatorial algorithms, image processing and signal processing.

I dedicate this dissertation to the nicest family, which I was fortunate enough to be part of, i.e. my parents (Eynollah and Farkhondeh), my brothers (Mohammad Reza, Hamid Reza –rest in peace–, Ahmad Reza and Mohammad) and my beloved wife, Sara.

ACKNOWLEDGMENTS

I would like to thank my thesis committee members Dr. Jonathan I. Hall, Dr. Selin Aviyente and Dr. Karim G. Oweiss for their time, patience and valuable comments. I also wish to express my utmost gratitude to Dr. Hayder Radha, my advisor. His guidance, support, sensible trust and so many other positive attributes of his have been my greatest assets in the last seven years. Truly, it was both a privilege and an honor to know him and to be his student. Finally, I want to thank my labmates, especially Mohammad Aghagolzadeh, Iman Barjasteh and Muhammad Usman Ilyas, for the helpful (and sometimes fun) discussions and the time we spent together.

TABLE OF CONTENTS

LIST OF FIGURES

Chapter 1  Introduction
  1.1 A new sampling scheme
  1.2 Notations
  1.3 Compressed Sensing
    1.3.1 From Compressed Sensing to under-determined system of equations
    1.3.2 Uniqueness of the solution
    1.3.3 Tractability of the decoder
  1.4 Related Works
  1.5 Motivation for the proposed research
  1.6 Challenges and proposed solutions
    1.6.1 Going to higher fields
    1.6.2 Augmenting with dense random rows
    1.6.3 Robust and deterministic solver
    1.6.4 Changing the problem formulation
  1.7 Proposal organization

Chapter 2  Compressed sensing under hybrid projections
  2.1 Recursive approach for CS
  2.2 Augmenting with dense rows
  2.3 Alignment
  2.4 Lexical order

Chapter 3  An extreme case of hybrid projections
  3.1 Introduction
  3.2 Hybrid Compressed Sensing
    3.2.1 HCS Solver
  3.3 Optimal ratio of ms and md under HCS
  3.4 Simulation Results

Chapter 4  Complex Sparse Projections for Compressed Sensing (CRISP)
  4.1 Introduction
  4.2 Forming the projection matrix
  4.3 CRISP Solver
  4.4 Optimality
    4.4.1 The Average Behavior Analysis
  4.5 Simulation Results

Chapter 5  Modified SSMP
  5.1 SERP projection matrix
  5.2 w-RIP
    5.2.1 Implications of w-RIP
  5.3 w-RIP for SERP matrices
  5.4 A Combinatorial solver for SERP
  5.5 Simulation Results

Chapter 6  Future works
  6.1 Introduction
  6.2 Another look at combinatorial algorithms
  6.3 Noise

BIBLIOGRAPHY
LIST OF FIGURES

Figure 3.1  From left to right, the k-sparse image of Monalisa, reconstructed by applying HCS, BP and OMP respectively to the whole image (m = 150746). The decoding times are tbp = 109.5077, thcs = 16.1704 and tomp = 15.1670 seconds.

Figure 3.2  From left to right, the k-sparse image of Lenna, reconstructed with m = 28 compressive samples by applying HCS, BP and OMP respectively on blocks of size 8 × 8 pixels. The decoding times are tbp = 41.0283, thcs = 4.4735 and tomp = 1.1512 seconds.

Figure 3.3  m, n and k represent number of samples, signal length and signal sparsity respectively.

Figure 3.4  m, n and k represent number of samples, signal length and signal sparsity respectively.

Figure 3.5  m, n and k represent number of samples, signal length and signal sparsity respectively.

Figure 4.1  Comparing Compressive Demosaicing with some well-known demosaicing algorithms.

Figure 4.2  Comparing Compressive Demosaicing with some well-known demosaicing algorithms.

Figure 4.3  Comparing Compressive Demosaicing with some well-known demosaicing algorithms.

Figure 5.1  Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and sampling ratio m/k, when the projection matrix is sparse binary and the solver is SSMP.

Figure 5.2  Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and sampling ratio m/k, when the projection matrix is a sparse real-valued sensing matrix (SERP) with r = 1 and the solver is MSSMP.

Figure 5.3  Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and sampling ratio m/k, when the projection matrix is a sparse real-valued sensing matrix (SERP) with r = 2 and the solver is MSSMP.

Figure 5.4  Comparing the performance of the sparse real-valued projection matrices and MSSMP with sparse binary projections under the SSMP decoder when samples are noisy.

Figure 5.5  Comparing the performance of the sparse real-valued and dense Gaussian projection matrices when BP is the decoder and samples are noisy.

Figure 5.6  Comparing the performance of the sparse real-valued and dense Gaussian projection matrices when CoSaMP is the decoder and samples are noisy.

Chapter 1
Introduction

1.1 A new sampling scheme

For many decades, the Nyquist–Shannon sampling theorem was the core tenet of signal acquisition systems. According to this theorem, a signal can be completely determined by its samples if the sampling rate is at least twice the maximum frequency of that signal. On the other hand, there are numerous applications where sampling at the Nyquist rate is impossible, very expensive, or time consuming.
For instance, sensing each sample in Magnetic Resonance Imaging (MRI) is so time consuming that a full MRI scan might take several hours, which is clearly inconvenient for patients [5]. Another example is the class of signals with frequencies so high (e.g. terahertz) that no sampling device operates at those frequencies. On the other hand, we often deal with signals that exhibit a high level of correlation in some domain. For instance, it has long been known that natural images can be approximated with virtually no perceptual loss by employing the Discrete Cosine Transform (DCT) or different types of wavelets [51, 52, 53] and retaining a fairly small number of coefficients. Indeed, this fact has been used extensively to compress these classes of signals efficiently: instead of storing the image, a few “significant” transform coefficients of that image are stored and the rest are discarded. This classical technique exposes the inefficiency of the current sampling approach, in which a highly redundant signal is sampled at the Nyquist rate and then, after compression, most of those samples are thrown away. This raises the question “why go to so much effort to acquire all the data when most of what we get will be thrown away? Can’t we just directly measure the part that won’t end up being thrown away?” [2].

Consequently, a new sampling framework was proposed independently by Donoho in [2] and Candes, Romberg and Tao in [5]. This scheme is widely referred to as “Compressed Sensing” or “Compressive Sampling” and often abbreviated as CS [1]-[15]. In those papers, it has been shown that signals with high correlation in some domain can be recovered from (sometimes much) fewer samples than traditional approaches require. This can be achieved by exploiting:

• the fact that high correlation translates into the existence of some “sparsifying domain” in which the transform coefficients of these signals are mostly zero or have very low magnitudes;

• certain conditions on the sensing operator, ensuring that each sample carries some (yet little) information about all, or at least a subset, of the signal of interest.

Furthermore, as opposed to traditional sampling methods, where each sample is computed by convolving the signal to be sensed with a shifted Dirac delta function, in CS each sample is computed by correlating the signal to be sensed with a waveform. This immediately implies that CS (at least in its original form) is a non-adaptive linear sampling scheme. Before presenting the compressed sensing framework, let us introduce the notation which will be used in this report.

1.2 Notations

In this report, we denote vectors by bold lower case letters (e.g. α) and matrices by bold upper case letters (e.g. P). For a natural number q ∈ N, define [q] ≜ {1, 2, . . . , q}. The cardinality of a set S is denoted by |S|. The set difference operation is denoted by \. The complement of S is represented by S^c. The l-th entry of the vector a is denoted by a_l. The support of a signal is denoted by Supp(·), i.e.:

  Supp(x) ≜ {j : x_j ≠ 0}   (1.1)

We use I_n to denote the identity matrix of size n. We often use MATLAB notation for identifying sub-matrices of a matrix. For instance, for P ∈ R^{n1×n2}, s1 ⊆ [n1] and s2 ⊆ [n2], by P_{s1,s2} we mean the sub-matrix of P restricted to the rows in s1 and the columns in s2. Similarly, we use the notation ‘:’, for instance P_{s1,:} (or P_{:,s2}), to keep only the rows (columns) indexed by s1 (s2). The transpose of P is denoted by P^T.
The Moore–Penrose pseudoinverse of a matrix P is denoted by P^†, that is

  P^† = (P^T P)^{−1} P^T   (1.2)

We use ⊙ for the Hadamard (point-wise) product. That is, for a field C and a, b, c ∈ C^n, if a = b ⊙ c then a_l = b_l c_l for l ∈ [n]. Also, ⊗ stands for the Kronecker tensor product. In other words, for n1, n2, m1, m2 ∈ N, A ∈ C^{n1×n2} and B ∈ C^{m1×m2} we have:

  A ⊗ B = [ A_{1,1} B   . . .   A_{1,n2} B ]
          [    ⋮         ⋱         ⋮      ]
          [ A_{n1,1} B  . . .   A_{n1,n2} B ]

For p ≥ 1, the ℓ_p norm of a real valued vector a ∈ R^n, denoted by ∥a∥_p, is:

  ∥a∥_p = ( Σ_{i=1}^{n} |a_i|^p )^{1/p}

Two special cases are p = 0 and p = ∞, where we define the pseudo-norms ℓ_0 and ℓ_∞ by:

  ∥a∥_0 ≜ #{i ∈ [n] : a_i ≠ 0},   ∥a∥_∞ ≜ max_{i∈[n]} |a_i|

In words, the ℓ_0 pseudo-norm of a vector counts the number of non-zero entries of that vector, and ℓ_∞ is the maximum absolute value of its entries. If necessary, we define other notation in the subsequent chapters.

1.3 Compressed Sensing

1.3.1 From Compressed Sensing to under-determined system of equations

Consider the real valued discrete signal s ∈ R^n of length n ∈ N (the following arguments can be extended to analog signals by first discretizing the signal). In the traditional sensing scheme, the sensing operator is I_n, the identity matrix of size n. That is, y, the vector of sensed samples, is simply

  y = I_n s = s

The theory of Compressed Sensing attempts to answer the following question: is there a linear undersampling operator Φ ∈ C^{m×n} with m < n and a decoder ∆ such that, given the samples y = Φs and Φ, the estimated solution ŝ = ∆(Φ, y) has the minimum error in some norm ℓ_p? That is, design ∆(·,·) and the sampling operator Φ such that:

  min ∥∆(Φ, y) − s∥_p   (1.3)

Unfortunately, such a solution cannot be found using classical linear-algebraic approaches alone. To see this, note that y = Φs is an under-determined system of linear equations and has infinitely many solutions. More specifically, recall that for a matrix Φ ∈ R^{m×n}, its kernel or null-space is:

  ker(Φ) = {z ∈ R^n : Φz = 0}   (1.4)

where 0 is the all-zero vector. Now note that in our problem, where m < n, the dimension of the null-space of Φ is at least n − m, meaning that ker(Φ) ≠ {0}. Now assume that z ∈ ker(Φ); then for all α ∈ R we have

  y = Φs = Φs + αΦz = Φ(s + αz)

Thus, given y and Φ, there are infinitely many solutions (of the form s + αz) and we do not know which one is the correct solution. Consequently, we first need a guarantee that the solution s which we are looking for is unique.
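The non-uniqueness argument above is easy to verify numerically. The following short Python/NumPy sketch (our own illustration; the sizes and the use of the SVD to obtain a null-space vector are arbitrary choices, not part of the thesis) builds a random under-determined operator and shows that adding any null-space vector to s leaves the samples unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 40                      # under-determined: fewer samples than unknowns
Phi = rng.standard_normal((m, n))  # a generic sensing operator
s = rng.standard_normal(n)         # the signal we wish to recover
y = Phi @ s

# Any vector in ker(Phi) can be added to s without changing the samples.
# A basis of the null space can be read off the SVD: the rows of Vt beyond rank(Phi).
_, _, Vt = np.linalg.svd(Phi)
z = Vt[m:].T @ rng.standard_normal(n - m)   # a random element of ker(Phi)

print(np.allclose(Phi @ z, 0))              # True: z lies in the null space
print(np.allclose(Phi @ (s + 3.0 * z), y))  # True: s + 3z explains the samples equally well
```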
1.3.2 Uniqueness of the solution

As noted before, for many interesting classes of signals, most notably audio and images, there exist sparsifying transforms. More specifically, by a sparsifying transform Ψ ∈ R^{q×n} for the signal s ∈ R^q we mean that:

  s = Ψx,   k = ∥x∥_0 < q   (1.5)

where x is often referred to as the transform coefficients, or the representation of s in the Ψ domain. Usually n ≥ q so that all vectors in R^q are spanned. A measure of how “good” a sparsifying transform Ψ is, is its ability to make the representations x sparse. In other words, we expect that for all signals {s^(i)}_i belonging to a specific class D (e.g. images of w1 × w2 pixels) we have:

  ∀ s^(i) ∈ D, ∃ k ≪ n, x ∈ R^n :  s^(i) = Ψx,  ∥x∥_0 ≤ k   (1.6)

Here we present the arguments of [43] to show how the existence of a good sparsifying transform drastically changes the uniqueness of a solution for an under-determined system of equations y = Φs. Suppose s has a k-sparse representation x in the Ψ domain, i.e. ∥x∥_0 = k and s = Ψx. As in the same paper [43], let us use the term “projection matrix” for the matrix P = ΦΨ. Then

  y = Φs = ΦΨx = P x

Now assume that every 2k columns of the projection matrix are linearly independent. We claim that x is the unique minimizer of the following optimization problem:

  x̂ = arg min_{x̄} ∥x̄∥_0   s.t.   y = P x̄   (1.7)

We prove this claim by contradiction. Assume there exists another solution, say v ≠ x, such that v is also at most k-sparse (∥v∥_0 ≤ k). Then P x = P v, i.e. P(x − v) = 0. Consequently the vector x − v is in the null-space of P. However, x − v is at most 2k-sparse, which violates the assumption that every 2k columns of the projection matrix are linearly independent. Hence x is the unique minimizer of (1.7). Note that an immediate consequence of this proof is that m = 2k is the minimum number of samples that one can sense from a signal with a k-sparse representation without losing any information, since otherwise the rank of P would be less than 2k and one could find 2k linearly dependent columns of P.

Now one can hope to solve (1.7) and then reconstruct the original signal via s = Ψx. Unfortunately, solving (1.7) is exponentially hard: one must first verify whether y can be spanned by a single column of P, then try every pair of columns of P, and so on. Consequently the next question, after the uniqueness of the solution, is: does there exist a decoder ∆(·,·) with polynomial time complexity for (1.7)?

1.3.3 Tractability of the decoder

The problem of the form (1.7) has been around in the signal processing and physics communities for a long time, specifically for finding the transform coefficients of a signal in an over-complete sparsifying transform where the number of atoms (or basis functions) of that transform exceeds the dimension of the signal [17, 26, 52, 53]. Heuristically, the problem (1.7) has been relaxed to the following optimization problem:

  x̂ = arg min_{x̄} ∥x̄∥_1   s.t.   y = P x̄   (1.8)

The reason for changing the ℓ_0 norm in (1.7) to the ℓ_1 norm in (1.8) is that ℓ_1 is the closest valid convex norm to the non-convex pseudo-norm ℓ_0. Furthermore, under any ℓ_p norm with p > 1, entries with magnitudes in the range (−1, 1) are damped, i.e. penalized relatively little. Thus, if a norm other than ℓ_1 were used in (1.8), the solution would more likely contain many values in the range (−1, 1), which is not desirable (recall that we are interested in finding a sparse solution).
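Problem (1.8) can be written as a standard linear program. The following sketch (our own illustration, not code from the thesis; the function name and test sizes are ours) uses the usual split trick with SciPy's `linprog` and typically recovers an exactly sparse x in this (k, m, n) regime, up to solver tolerance.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(P, y):
    """Solve min ||x||_1 s.t. P x = y as a linear program.
    Introduce u >= |x| elementwise, minimise sum(u) subject to
    -u <= x <= u and P x = y.  Decision vector is [x; u]."""
    m, n = P.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_eq = np.hstack([P, np.zeros((m, n))])
    A_ub = np.vstack([np.hstack([ np.eye(n), -np.eye(n)]),    #  x - u <= 0
                      np.hstack([-np.eye(n), -np.eye(n)])])   # -x - u <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n), A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]

rng = np.random.default_rng(1)
n, m, k = 60, 25, 4
P = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n); x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_hat = basis_pursuit(P, P @ x)
print(np.linalg.norm(x_hat - x))   # typically very small for this regime
```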
All eigen-values of the sub-matrices (with k columns) of P are in the range of [1−δk , 1+δk ]. 4. Submatrices (with k columns) of P approximately behave like an orthonormal frame. We say that a matrix P has a “Restricted Isometry Property” (RIP) of order k, if δ2k < √ 2 − 1. Now we can present the main theorem of Compressed Sensing (in a compact form) [3]. Theorem 1.3.1. Assume y = P x + e, ∥e∥2 ≤ ϵ, δ2k < √ ˆ = arg min ∥¯ 2 − 1 and let x x∥1 subject ¯ ∥2 ≤ ϵ. Then to ∥y − P x ∥ˆ x − x∥2 ≤ C0 k −1/2 ∥x − x⋆ ∥1 + C1 ϵ 8 where C0 and C1 are constants (independent from n) and x⋆ is the best k-sparse approximation to x. The theorem is extremely strong, since: • It is robust to noise e. • Even if the signal of interest x is not exactly sparse, it returns the best sparse approximation of that signal. The only question to be answered is: which matrices have RIP? Generally verifying RIP for matrices is either impossible or the bounds are not very tight. However, there are proofs of the existence of RIP for dense matrices with entries coming from (a) uniformly random (b) normal distribution (c) Bernoulli matrices and (d) partial Fourier matrices [16, 5, 2]. Furthermore the best bound on the number of rows or equivalently the number of samples required to guarantee RIP is for dense matrices with entries from uniformly random or normal when m = O (k log(n/k)). In other words, to make the recovery of a k-sparse vector x from an under-determined system of linear equations y = P x tractable, we have to increase the number of samples from m = 2k to m = O (k log(n/k)). Furthermore, this bound is tight. In other words, if the order of the number of samples is less than O (k log(n/k)), then either the recovery is not possible for “all” sparse signals or there is no tractable decoding algorithm for that amount of samples. 1.4 Related Works Broadly speaking there are three approaches to the problem compressed sensing: 1. Convex relaxation: As stated before,the first proposed protocol for CS was relaxing the non-convex cost function of min ∥.∥0 to the convex one min ∥.∥1 and use random dense 9 matrices for the projection matrix (guaranteeing RIP). Since the derived bounds in [5, 2] are tight, only a few number of research directions remain in the convex relaxation approaches. One is to design more efficient (in terms of complexity) linear programming solver to find the solution to (1.8). This can range from using gradient descent methods [19]-[24] to recent interior points in [17]. The other research direction in this category is to reformulating (1.8) such that either the complexity decreases or the quality of reconstruction increases. For instance the algorithm Basis Pursuit (BP) [17] exactly solves (1.8). While there are other formulations such as Lasso [18] which solves the following problem: arg min∥y − Ψˆ x∥2 : s.t. ∥ˆ x∥ 1 ≤ k (1.10) ˆ x In practice, the complexity of Lasso is slightly lower than BP but it usually leads to higher errors of reconstruction as well. Furthermore, usually the sparsity level k of the unknown vector x is not known beforehand, except in possibly some applications. Consequently BP has become the most popular solver in this category of decoders within the CS community. 2. 
1.4 Related Works

Broadly speaking, there are three approaches to the problem of compressed sensing:

1. Convex relaxation: As stated before, the first proposed protocol for CS was to relax the non-convex cost function min ∥·∥_0 to the convex one min ∥·∥_1 and to use random dense matrices for the projection matrix (guaranteeing RIP). Since the derived bounds in [5, 2] are tight, only a few research directions remain within the convex relaxation approach. One is to design more efficient (in terms of complexity) linear programming solvers for (1.8), ranging from gradient descent methods [19]-[24] to the recent interior point methods in [17]. The other direction is to reformulate (1.8) such that either the complexity decreases or the quality of reconstruction increases. For instance, the algorithm Basis Pursuit (BP) [17] exactly solves (1.8), while there are other formulations such as Lasso [18] which solve the following problem:

   x̂ = arg min_{x̄} ∥y − P x̄∥_2   s.t.   ∥x̄∥_1 ≤ k   (1.10)

   In practice, the complexity of Lasso is slightly lower than that of BP, but it usually leads to higher reconstruction errors as well. Furthermore, the sparsity level k of the unknown vector x is usually not known beforehand, except possibly in some applications. Consequently, BP has become the most popular solver in this category of decoders within the CS community.

2. Greedy pursuit: In these approaches [26, 28, 30, 29], a greedy strategy based on the original Matching Pursuit (MP) [25] is adapted to find the solution to (1.7): in each step of the algorithm, one coordinate of the solution is set to a non-zero value such that the residue of the approximation up to that point is minimized. That is, having samples y = P x, the projection matrix P and the current estimate x̂, the following problem is solved (a minimal sketch of this greedy selection appears after this list):

   arg max_i  |P_{:,i}^T (y − P x̂)| / ( ∥P_{:,i}∥_2 ∥y − P x̂∥_2 )   (1.11)

   and then x̂_i is updated accordingly to minimize ∥R∥_2, where R = y − P x̂ is the residue of the approximation up to that step. The reasoning behind this greedy decision is that if the correlation of the residue with the i-th column of the projection matrix is high, then there is a good chance that x_i has a large magnitude. Note that these greedy approaches directly attempt to solve the ℓ_0 minimization problem (1.7). Thus, proving the convergence of these approaches to the correct solution can be challenging. In fact, except for a few methods such as [28, 30], there is either no proof of convergence for many greedy methods, or certain conditions on P are required (e.g. very small values of δ_{3k}) whose satisfaction for a given matrix may be virtually impossible to verify.

3. Combinatorial algorithms: In these approaches [31]-[42], a carefully designed sensing matrix Φ is deployed in the sensing stage such that the effective projection matrix P = ΦΨ is a highly structured, and often binary, sparse matrix. The sparsity and structure of P are then exploited, usually in a group-testing-like algorithm, to solve (1.7). Similar to the greedy approaches, proving convergence to the correct solution is very difficult, so we do not have strong proofs, as in convex relaxation methods, guaranteeing that these methods converge to the correct solution. We discuss this approach further later in this report.
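The sketch referenced in item 2 is given below. It is our own minimal illustration of an OMP-style greedy pursuit built around the selection rule (1.11) (the residual norm in the denominator does not affect the argmax and is dropped); it is not the exact algorithm of any of the cited references, and the function name and test sizes are ours.

```python
import numpy as np

def omp(P, y, k, tol=1e-10):
    """Minimal orthogonal matching pursuit: greedily pick the column most
    correlated with the current residual (the selection rule of (1.11)),
    then refit all selected coefficients by least squares."""
    n = P.shape[1]
    support, x_hat = [], np.zeros(n)
    residual = y.astype(float).copy()
    col_norms = np.linalg.norm(P, axis=0)
    for _ in range(k):
        i = int(np.argmax(np.abs(P.T @ residual) / col_norms))
        if i not in support:
            support.append(i)
        coef, *_ = np.linalg.lstsq(P[:, support], y, rcond=None)
        x_hat = np.zeros(n)
        x_hat[support] = coef
        residual = y - P @ x_hat
        if np.linalg.norm(residual) < tol:
            break
    return x_hat

rng = np.random.default_rng(3)
n, m, k = 400, 80, 10
P = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n); x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
print(np.linalg.norm(omp(P, P @ x, k) - x))   # small when m is large enough relative to k
```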
One might ask: since linear programming solvers are available for (1.8), why should we look for alternative solutions? The answer lies solely in the complexity of the decoders belonging to the convex relaxation class. In fact, the complexity of a linear programming solver equals that of solving an inverse problem of the same size; that is, if the unknown sparse vector is x ∈ R^n, then the complexity of solving (1.8) is O(n^3). Although this is polynomial time, it is high enough that BP (or any other ℓ_1 minimizer) can only be utilized in small to medium size problems where the length of the signal is at most a few thousand. On the other hand, for many classes of signals (e.g. digital images) the length is on the order of millions (megapixel images). This puts a big restriction on the usability of BP in those applications.

Now let us present a brief comparison of the different CS approaches. The algorithms belonging to the convex relaxation class work well as long as the projection matrix satisfies RIP. Not only do they have the highest level of robustness in the presence of noise, they also exhibit excellent results even if the underlying signal x is not “exactly sparse”, i.e. if the signal has many small but non-zero entries. Furthermore, in practice, they require the fewest samples to recover x from y = P x. However, as stated before, the main disadvantage of these solvers is their high complexity. Furthermore, this class of solvers usually only operates on dense projection matrices P. Thus, in applications where storage is a concern, storing a projection matrix suitable for convex relaxation methods is another challenge (O(mn)). On the other extreme, approaches belonging to the class of combinatorial algorithms have the lowest complexities, usually either linear or sub-linear in the signal length, ≈ O(n). Furthermore, the sparsity of the projection matrix translates into fast matrix operations and efficient storage (≈ O(n)). However, these algorithms virtually always require the highest number of samples for recovery compared to the other classes of decoders. Moreover, they are usually not robust in the presence of noise and only work with exactly sparse signals. Finally, greedy approaches are intermediate in their running time, sampling efficiency and robustness; however, many greedy algorithms require parameter tuning which is often signal dependent [30, 28].

1.5 Motivation for the proposed research

Indeed, the ultimate goal in the compressed sensing framework is to design an effective projection matrix P and an associated decoder (solver) ∆ with the following characteristics:

1. The required number of samples for the recovery of x from y = P x is optimal, i.e. m = O(k log(n/k)).

2. The complexity of the solver is minimal, for example O(n) or even lower.

3. The scheme is robust to noise.

4. The algorithm works even with approximately sparse signals.

So far, all of these properties have not been satisfied by a single scheme. We plan to achieve this goal by utilizing the valuable lessons which existing CS approaches provide. Specifically, we plan to design a CS scheme with the best properties of the convex relaxation and combinatorial approaches. At first glance this task seems impossible, since convex relaxation methods virtually always operate on random dense projection matrices while combinatorial approaches only work with highly structured sparse binary matrices. However, in this thesis we plan to show and prove that by modifying the projection matrices suitable for combinatorial algorithms, and by designing an associated solver, we may be able to achieve the aforementioned goals. To that end, we must slightly deviate from classical combinatorial approaches. More specifically, we investigate the effect of substituting the binary entries of a sparse structured matrix with entries from larger fields, say the real or complex numbers. Note that this technique is reminiscent of modifying codes in classical channel coding theory [54, 55]. Consequently, new solvers must be developed for this new class of projection matrices. One of our objectives is to emulate the Luby Transform (LT) decoder [50] (a rateless, near-optimal erasure correcting code) for solving the problem of CS. In fact, it has been observed that the problem of compressed sensing (solving an under-determined system of equations when the solution is sparse) has a very close relationship with the problem of channel coding, where we have a noisy over-complete system of linear equations [9]. Thus, another objective of this thesis is to establish a strong link between the problems of channel coding and CS. Finally, we introduce a systematic perspective on the problem of CS under combinatorial approaches which might later help us find a rigorous proof for the convergence of the proposed solver.
1.6 Challenges and proposed solutions

To design a low-complexity yet robust and optimal (in terms of sample requirements) compressed sensing framework, it is beneficial to identify the reasons why combinatorial approaches to CS fail to achieve that goal. A typical combinatorial algorithm usually follows this simple procedure to recover a sparse signal x of length n from m < n samples y = P x:

1. Initialize the estimate of x with x̂ = 0, a zero vector of length n.

2. Let s_i be the column indices where the i-th row of the projection matrix is non-zero (and, in fact, one):

   s_i ≜ {j : P_{i,j} ≠ 0}   (1.12)

   We say that the i-th sample spans the signal coordinates s_i.

3. Find a subset of sample indices ω ⊆ [m] with the following properties: (a) the size of this set is greater than a threshold c (|ω| > c); (b) all samples with indices in ω have the same value, ∀i, j ∈ ω : y_i = y_j; and (c) all rows indexed by ω coincide in only one column index: |∩_{i∈ω} s_i| = 1.

4. Let {a} = ∩_{i∈ω} s_i; then update the estimate with x̂_a ← x̂_a + y_b, where b can be any member of the set ω.

5. Update the sample values: y ← y − P_{:,a} y_b.

6. Go to step 3 and continue until y is an all-zero vector.

Let us briefly explain what this algorithm does. After the initialization in the first two steps, the algorithm finds a subset of samples which all vote the same value and which all span only one common signal coefficient x_a. Since the projection matrix is binary, the algorithm assumes that this common vote comes from x_a; consequently, the signal at that coordinate is estimated as the vote. Similar to many message passing algorithms (e.g. for LDPC codes [57, 58]), after subtracting the effect of that estimate from the samples, the algorithm iterates the same procedure. It may happen that an estimate is wrong; however, one hopes that the number of correct estimates exceeds the number of wrong ones, so that the algorithm converges to the correct solution.

A few points should be highlighted here. The assumption that “if a number of samples have equal values and they span only one common signal coordinate, then that common coordinate has to be non-zero” is not valid, and one can find many counterexamples. For instance, if a number of signal coefficients are equal and there exists a subset of samples each spanning only one of those non-zeros, then those samples will be equal as well. In fact, the condition is necessary but not sufficient: if a number of samples span only one common non-zero, then they will be equal. This problem is overcome in traditional combinatorial algorithms by choosing the threshold c (the minimum number of samples that must cast the same vote) large enough to increase the level of confidence. However, as we increase c, we have to increase the number of samples as well, since otherwise no c samples with the aforementioned properties can be found. This observation explains one of the reasons why, in practice, combinatorial algorithms require a higher number of samples m than other classes of CS decoders. Furthermore, if the samples are contaminated by noise, then exactly equal samples will not be found. A remedy is to look for a subset of samples which lie within a small radius of each other; although this correction may work when the noise has very small magnitude, it usually fails. Consequently, designing robust combinatorial CS approaches still remains an open problem.
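To make the six steps above concrete, here is a minimal Python sketch of such a voting decoder (our own illustration, not an implementation from the thesis or the cited references): it assumes a binary sparse P and an exactly sparse x, groups samples by rounded value as a crude stand-in for "equal values", and simply stops when no qualifying vote set remains.

```python
import numpy as np
from functools import reduce

def voting_decoder(P, y, c=2, tol=1e-9, max_iter=10_000):
    """Sketch of the classical voting decoder for a sparse binary projection
    matrix: find > c samples with equal values whose rows overlap in exactly
    one column, credit that coordinate, peel its contribution off y, repeat."""
    m, n = P.shape
    supports = [set(np.flatnonzero(P[i])) for i in range(m)]
    x_hat = np.zeros(n)
    y = y.astype(float).copy()
    for _ in range(max_iter):
        if np.linalg.norm(y) < tol:
            break
        # group sample indices by (rounded) value; zero-valued samples carry no vote
        groups = {}
        for i in range(m):
            if abs(y[i]) > tol:
                groups.setdefault(round(float(y[i]), 6), []).append(i)
        progress = False
        for omega in groups.values():
            if len(omega) <= c:
                continue
            common = reduce(set.intersection, (supports[i] for i in omega))
            if len(common) == 1:                 # all voters span one common coordinate
                a = next(iter(common))
                val = y[omega[0]]
                x_hat[a] += val
                y -= P[:, a] * val
                progress = True
                break                            # re-group after every peeling step
        if not progress:
            break                                # no qualifying vote set: give up
    return x_hat
```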
Another negative result concerning the projection matrices of traditional combinatorial algorithms is that those binary sparse matrices do not satisfy RIP [44]. Indeed, RIP is the most important reason why convex relaxation methods are stable in the presence of noise. Thus, designing alternative robust combinatorial solvers seems challenging (if not impossible) if we keep the sampling process intact. Here we present our solutions to these inherent problems of combinatorial CS approaches that employ binary sparse projection matrices.

1.6.1 Going to higher fields

The first strategy that we employ to improve the characteristics of combinatorial approaches is to replace the binary entries of the projection matrix with numbers from larger fields, e.g. real or complex numbers. As we show later in this report, this allows us to make milder assumptions about the underlying signal for the purpose of recovery. On the other hand, this modification introduces other problems. For instance, even if a subset of samples spans only one (unique) non-zero signal coefficient, those samples will no longer be equal. We propose a simple and efficient solution to this problem in the next chapter. More specifically, we (jointly) look at a subset of l < m samples as a vector in H^l, where H is the field from which the entries of P are drawn. Then we look for a (subset of) columns of the projection matrix which align in that direction. Furthermore, we can carefully choose the entries of the projection matrix such that sub-matrices of P have a lexical order for the decoder. This technique reduces the complexity of our solver significantly, as we show in the following chapters.

1.6.2 Augmenting with dense random rows

As noted before, there are interesting links between the problems of CS and channel coding. This link is even stronger when we consider the class of combinatorial algorithms. It has been shown for some codes [50], for instance via the robust soliton distribution in LT codes, that adding “dense encoding symbols” helps stabilize the decoder against deviations from the average behavior. We consider this option in the design of our projection matrix, where some random dense rows may be added to the projection matrix. Finding the optimal ratio of dense to sparse rows in P is one of our objectives.

1.6.3 Robust and deterministic solver

Recall that in classical combinatorial algorithms some wrong decisions may be made in finding the support of the signal. These wrong decisions are corrected within a message passing algorithm; however, this allowance of wrong decisions leads to extra iterations. In this thesis, we plan to design our solver such that all decisions are correct, which translates into a lower solver complexity and running time. As we show later in this report, we require an RIP-type condition on the projection matrix to guarantee that no mistake can be made during recovery. Meanwhile, RIP translates into robustness in the presence of noise, at least for convex relaxation algorithms. We present numerical experiments indicating that our proposed projection matrices satisfy RIP and provide guidelines on how to prove that claim. Hence, we can hope for a robust, optimal combinatorial CS framework.

1.6.4 Changing the problem formulation

Another contribution of this work is to introduce a systematic formulation of the problem of CS under a combinatorial framework.
In our formulation (as we present in the next chapter), we show that combinatorial algorithms are in fact a divide and conquer approach to finding the sparsest solution of an under-determined system of equations. Then, in the last chapter of this report, we present necessary and sufficient conditions for convergence and robustness.

1.7 Proposal organization

This thesis has two major parts. In the first part, we solve the problem of CS in the ideal case, i.e. the signals of interest are exactly sparse and the measurements are noise free. Furthermore, some of the analysis in the first part predicts only the average behavior of the proposed schemes. The second part of this thesis considers the more realistic scenario where the signals are no longer exactly sparse and the measurements are noisy. In that part, all proofs will hold either with probability one (strictly or asymptotically) or with overwhelming probability. The following chapters are organized as follows:

• In Chapter 2, we introduce “hybrid projection matrices” as candidate projection matrices for the problem of CS. This class of matrices is composed of two parametric dense and sparse sub-matrices. We then present our generic solver for this class of matrices.

• In Chapter 3, we consider the simplest case of hybrid projection matrices, where the rows of the sparse sub-matrix span disjoint coordinates of the signal. We derive a bound on the sample requirements, the optimal ratio of dense rows to sparse rows, and the complexity of the scheme. Our simulation results also verify that this simple scheme can, in certain applications, outperform many state of the art algorithms in the ideal case.

• In Chapter 4, we consider the case where the rows of the sparse sub-matrix of our projection matrix coincide in some column indices, but no dense sub-matrix is present in the projection matrix. Two interesting types of incidence are those derived from the adjacency matrices of (a) expander graphs [63] and (b) block designs, especially Balanced Incomplete Block Designs (BIBD) [49]. We show that expander graphs lead to an optimal (in terms of sampling requirements) scheme.

• Chapter 5 is devoted to future work and our plans to extend this work to the case of compressible signals and noisy measurements.

Chapter 2
Compressed sensing under hybrid projections

2.1 Recursive approach for CS

In this section, we present our systematic approach to the problem of CS. Although most combinatorial approaches follow a similar path, to the best of our knowledge the abstract formulation presented below is novel. For simplicity of notation and without any loss of generality, assume the signal of interest is sparse in the standard basis, Ψ = I_n. That is, the sensing matrix and the projection matrix are the same, P = ΦΨ = Φ, and x = s. Consider two disjoint non-empty sets T ⊂ [n] and T^c, where by definition T ∪ T^c = [n]. For all vectors x ∈ R^n and norms ℓ_p we have:

  ∥x∥_p^p = ∥x_T∥_p^p + ∥x_{T^c}∥_p^p

However, the norms ℓ_0 and ℓ_1 are special since the equality holds for the norms themselves:

  ∥x∥_0 = ∥x_T∥_0 + ∥x_{T^c}∥_0,   ∥x∥_1 = ∥x_T∥_1 + ∥x_{T^c}∥_1

and a similar equality generally does not hold for other norms. We show that this special property of ℓ_0 and ℓ_1 enables us to follow a divide and conquer approach to the problem of CS. Recall that the goal of CS is to find the vector x ∈ R^n with minimum sparsity which solves the under-determined system of linear equations y = P x ∈ R^m, where m < n, i.e.:

  x̂ = arg min_{x̄} ∥x̄∥_0   s.t.   y = P x̄   (2.1)
One might follow a divide and conquer approach to this problem, where the coordinates of x are partitioned into two disjoint non-empty sets T, T^c ⊂ [n] and the following problem is solved:

  min ∥x_T∥_0 + min ∥x_{T^c}∥_0,   s.t.   y = P x   (2.2)

It can easily be verified (e.g. by contradiction) that the solution to this optimization problem equals the solution to the original form of CS (2.1). This divide and conquer approach has all the advantages of a recursive program. For instance, after dividing the signal coordinates into two disjoint sets T and T^c, one might partition each of these two sets into two more disjoint non-empty subsets. This division can be continued until we reach a point where the sub-solution with minimum ℓ_0 can be found tractably, and the tree is then traversed upward. However, this approach implies some undesirable conditions on the projection matrix. More specifically, to implement this divide and conquer algorithm even at the first level, we require that the samples y can be divided into disjoint subsets Q and Q^c such that the samples y_Q only span x_T and the samples y_{Q^c} only span x_{T^c}. (Recall that we say the i-th sample spans the coordinates ω if ω = {j : P_{i,j} ≠ 0}.) This is required since otherwise the sub-problems (min ∥x_T∥_0 and min ∥x_{T^c}∥_0) would not be independent and a divide and conquer approach would not work. Thus, in this case, there exist permutation matrices Π_1 and Π_2 such that:

  Π_1 P Π_2 = [ A  0 ]
              [ 0  B ]   (2.3)

where A ∈ R^{|Q|×|T|} and B ∈ R^{|Q^c|×|T^c|}. Unfortunately, this highly structured matrix most likely does not satisfy RIP [3, 16] and hence may lead to a non-robust solution. To avoid this problem we have to allow some overlap among the span sets of different samples. Knowing that samples span overlapping subsets of the signal coordinates, we slightly change the formulation of (2.2) to the following:

  min ∥x_{T∪a}∥_0 + min ∥x_{T^c∪a}∥_0,   s.t.   y = P x   (2.4)

where a is a (small) non-empty subset of [n]. Generally, the solution to this problem is not equal to that of the original problem (2.1) unless x_a is an all-zero vector. In this thesis, we attempt to solve the problem of CS using this formulation: find two overlapping subsets of signal coefficients, say T ∪ a and T^c ∪ a, such that x_a is zero, and solve each of the two sub-problems independently.

Suppose the i-th sample (i ∈ [m]) spans the indices s_i of a k-sparse signal of length n. Without loss of generality, assume that the support of the signal is distributed uniformly. Then the probability that y_i spans j non-zeros is:

  Pr( |s_i ∩ Supp(x)| = j ) = (k choose j) (n−k choose |s_i|−j) / (n choose |s_i|)   (2.5)

which is a hypergeometric distribution with mean |s_i| k / n. Note that for a dense matrix where all entries are non-zero, |s_i| = n and Pr(|s_i ∩ Supp(x)| = k) = 1 (meaning that every sample spans all non-zeros), which is not surprising. However, for sparse matrices, where |s_i| ≪ n and k ≪ n, this probability (and the corresponding mean) is very small. In other words, for a sparse projection matrix, and even accounting for the variance and deviation from the mean, most samples span only a few non-zeros (if any).
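The spanning probability (2.5) is a standard hypergeometric law and can be evaluated directly; the sketch below (our own illustration with arbitrary numbers) confirms that for a sparse row weight w = |s_i| most samples span zero or one non-zero.

```python
from scipy.stats import hypergeom

n, k, w = 10_000, 50, 100          # signal length, sparsity, row weight |s_i|
dist = hypergeom(M=n, n=k, N=w)    # number of non-zeros spanned by one sparse sample

print(dist.pmf(0), dist.pmf(1))    # most samples span zero or one non-zero
print(dist.mean())                 # equals w*k/n = 0.5 on average
```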
It turns out that finding samples which span only zero-valued coefficients is not trivial; otherwise we would already know the support of the signal, and the problem could be solved by forming a full-rank system of equations. As we show in the following chapters, finding isolated non-zero coefficients appears to be a simpler task. We say that the non-zero coefficient x_i is isolated in the j-th sample if Supp(x) ∩ s_j = {i}, i.e. the only non-zero coefficient sampled by y_j is x_i. In the scenario where isolated non-zeros can be identified, one may apply the following algorithm to solve the problem of Compressed Sensing.

Algorithm 1: A recursive decoder x̂ = ∆_s(P, y)
  input : samples y = P x, projection matrix P
  output: x̂
  Find a (number of) sample(s) y_j that spans an isolated non-zero, for instance x_i;
  Let a be the estimate of that isolated non-zero;
  x̂_i ← a;
  y ← y − P_{:,i} a   (subtract the effect of the estimate from the samples);
  P ← P_{[m]\j, [n]\s_j},  y ← y_{[m]\j}   (discard the used samples and the respective rows and columns of the projection matrix);
  x̂_{[n]\s_j} ← ∆(P, y);

Note that Algorithm 1 is exactly our previous recursive formulation (2.4) when ∥x_T∥_0 = 1. The same algorithm can be expressed in a non-recursive form, which is presented in Algorithm 2.

Algorithm 2: Non-recursive formulation of the decoder x̂ = ∆_s(P, y)
  input : samples y = P x, projection matrix P
  output: x̂
  while ∥y∥ ≠ 0 do
    Find a (number of) sample(s) y_j that spans an isolated non-zero, for instance x_i;
    Let a be the estimate of that isolated non-zero;
    x̂_i ← x̂_i + a;
    y ← y − P_{:,i} a;
  end

2.2 Augmenting with dense rows

Algorithm 1 succeeds in recovering the sparse vector x if we can always find a number of samples that span an isolated non-zero. This is equivalent to guaranteeing that the ripple in LT codes never becomes empty [50] before complete recovery. In LT codes, this guarantee comes from a carefully designed degree distribution over the message and parity nodes. However, unlike in channel codes, in CS we do not know the support of the signal. Consequently, it would be extremely difficult (if not impossible) to design the span sets {s_i} such that we can always find a number of samples spanning an isolated non-zero. To overcome this problem, and for other reasons to be discussed in the following chapters, we augment the projection matrix with m_d ≥ 0 random dense rows (hence hybrid projection matrices). To avoid ambiguity, from now on we use m_s to denote the number of sparse rows (samples) of a projection matrix and m_d to denote the number of dense rows of the same projection matrix. Now assume that, after a number of iterations, the algorithm ∆_s(·,·) cannot find any other isolated non-zero and halts. Furthermore, assume that up to that point the signal values at the indices ω ⊆ [n] have been recovered and the samples y_q coming from the m_s sparse rows of the projection matrix have been used. If the number of undiscovered signal coefficients n − |ω| is less than or equal to the number of unused samples m_d + (m_s − |q|), then we have a sufficient number of equations to determine the signal values at the remaining indices. Algorithm 3 summarizes this case.

Algorithm 3: The hybrid solver x̂ = ∆_h(P, y)
  input : samples y = P x, projection matrix P
  output: x̂
  x̂ ← ∆_s(P, y);
  Let q ⊆ [m_s] be the sample indices which have been used from the sparse part;
  Let ω ⊆ [n] be the indices where the signal has been recovered;
  if n − |ω| ≤ m_d + (m_s − |q|) then
    x̂_{[n]\ω} ← P^†_{:,[n]\ω} (y − P x̂);
  end
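The final step of Algorithm 3 is an ordinary least-squares solve on the still-unknown coordinates. Below is a minimal Python sketch of that step only (the function name, the rank check and the error handling are our own choices, not the thesis's); it assumes `omega` holds the indices already recovered by the peeling stage.

```python
import numpy as np

def hybrid_finish(P, y, x_hat, omega):
    """Sketch of the last line of Algorithm 3: once the peeling stage has
    recovered the coordinates in omega, solve for the remaining unknowns by
    least squares on the residual samples, provided enough independent
    equations remain."""
    n = P.shape[1]
    rest = np.setdiff1d(np.arange(n), list(omega))
    residual = y - P @ x_hat                      # what the recovered part does not explain
    A = P[:, rest]
    if np.linalg.matrix_rank(A) < len(rest):
        raise ValueError("not enough independent equations left for the tail")
    x_hat = x_hat.copy()
    x_hat[rest], *_ = np.linalg.lstsq(A, residual, rcond=None)
    return x_hat
```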
2.3 Alignment

So far we have assumed that there exists an algorithm which can find isolated non-zeros. Here we present a simple procedure for this task. Assume A ∈ R^{a×b} and that z ∈ R^b is a vector of size b, where a < b. Furthermore, define h = Az. We say that h is aligned with the i-th column of A if h is in the direction of A_{:,i}, that is:

  |A_{:,i}^T h| / ( ∥A_{:,i}∥_2 ∥h∥_2 ) = 1   (2.6)

Note that if ∥z∥_0 = 1, then h is aligned with one of the columns of A, since:

  z_j = 0 for j ≠ i,  z_i = α   ⇒   h = α A_{:,i}   (2.7)

Now, if we can prove the converse, namely that alignment of h with a column of A implies ∥z∥_0 = 1,

  h = α A_{:,i}   ⇒   z_j = 0 for j ≠ i,  z_i = α   (2.8)

then we can utilize the concept of alignment to find isolated non-zeros in the problem of CS. More specifically, we can substitute A with P_{Ω,T}, h with y_Ω and z with x_T, where T ⊆ [n] and Ω ⊆ [m] is the set of indices of the samples spanning x_T, and then verify whether ∥x_T∥_0 = 1 or not. In our simulations, we have observed that the aforementioned statement holds true for our projection matrices. Providing a rigorous proof of (2.8) is part of our future work; in Chapter 5, we present some guidelines on how we intend to prove it.

2.4 Lexical order

Indeed, the concept of alignment for a matrix A ∈ R^{a×b} and a vector h ∈ R^a only makes sense when a > 1 (for a = 1 alignment always holds). As stated before, A will be a sub-matrix of the projection matrix P and h will be a subset of the compressive samples y = P x. Clearly, we need some properties of the entries of the projection matrix to make the process of verifying alignment tractable; otherwise, we would have to check all sub-matrices of the projection matrix and the respective subsets of samples to find alignments, which has exponential complexity. This problem can be solved by either of the following two strategies:

1. Slightly modifying the decoder ∆(·,·).

2. Using a lexical order, meaning that the columns of the projection matrix have a logical order known to the decoder. That is, we can find a scoring function S : R^a → R such that:

   i, j ∈ [b], i < j :  S(A_{:,i}) < S(A_{:,j})   (2.9)

In the latter case, a quick search algorithm, similar to Algorithm 4, can be used to detect the existence of an alignment. Examples of these concepts will be presented in further detail in the following chapters.

Algorithm 4: M = B(A, h, l, u): a binary search for alignment when a lexical order exists
  input : A, h = Az, l, u
  output: M, the index of alignment
  if u < l then
    return −1 (no alignment found);
  end
  M = l + ⌊(u − l)/2⌋;
  if S(A_{:,M}) < S(h) then
    return B(A, h, M + 1, u);
  end
  if S(A_{:,M}) > S(h) then
    return B(A, h, l, M − 1);
  end
  return M;
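The two ingredients of this chapter, the alignment test (2.6) and the score-based binary search of Algorithm 4, are easy to prototype. The sketch below is our own illustration (function names, tolerances and the generic `score` callback are ours); note that a matching score is only a proxy for alignment, and in the concrete schemes of the next chapters the score is chosen so that a score match does imply alignment.

```python
import numpy as np

def is_aligned(A, h, tol=1e-9):
    """Alignment test of Eq. (2.6): return the index of the column of A that
    h is (numerically) parallel to, or -1 if no column qualifies."""
    for i in range(A.shape[1]):
        c = np.abs(A[:, i] @ h) / (np.linalg.norm(A[:, i]) * np.linalg.norm(h))
        if abs(c - 1.0) < tol:
            return i
    return -1

def aligned_column_by_score(A, h, score, tol=1e-9):
    """Iterative version of Algorithm 4, assuming the columns of A are
    already sorted by the scoring function `score` (the lexical order)."""
    lo, hi = 0, A.shape[1] - 1
    target = score(h)
    while lo <= hi:
        mid = (lo + hi) // 2
        s = score(A[:, mid])
        if abs(s - target) < tol:
            return mid
        if s < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```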
Chapter 3
An extreme case of hybrid projections

3.1 Introduction

In this chapter we present Hybrid Compressed Sensing (HCS), an instance of the class of hybrid projection matrices presented in Chapter 2. HCS is an extreme example of this class since, in its sparse sub-matrix, each sample spans a disjoint subset of signal coefficients. As we show in this chapter, HCS is not able to achieve the optimal sample requirement of m = O(k log(n/k)). However, it can be proved that the non-optimal HCS outperforms even state of the art solutions in certain regimes of the signal length n and sparsity k.

3.2 Hybrid Compressed Sensing

Recall that a matrix belonging to the class of hybrid projection matrices (after some permutation of its rows) can be decomposed as:

  P = [ P^(s) ]
      [ P^(d) ]   (3.1)

where P^(s) is an m_s × n sparse matrix and P^(d) is an m_d × n dense matrix. In HCS, we set the non-zero entries of P^(s) and P^(d) to be complex and real numbers, respectively. Note that there exist devices, for instance in radar or medical imaging, which capture complex valued samples. Nevertheless, we count each complex valued compressive sample as two real samples (one for the imaginary part and one for the real part). Hence, the projection matrix P in (3.1) senses m = 2m_s + m_d measurements. For simplicity of analysis, and without loss of generality, assume m_s divides n. Now let s = {s_1, . . . , s_{m_s}} be a set of equal-size random subsets of [n] = {1, 2, . . . , n} such that s forms a partition:

  ∀i, j ∈ [m_s], i ≠ j :  |s_i| = w,  s_i ∩ s_j = ∅,  ∪_{i=1}^{m_s} s_i = [n]

Then the i-th row of P^(s) is non-zero only in the column indices s_i. Also let f^(i) be a unit-norm 1 × w row vector with random complex entries having distinct phases in the range [0, π). We insert the entries of f^(i) into the i-th row of P^(s) at the column indices s_i. This can be written as:

  ∀i ∈ [m_s] :  P^(s)_{i,s_i} = f^(i),  P^(s)_{i,[n]\s_i} = 0

Thus the i-th sample (i ∈ [m_s]) is a function of the signal coefficients only at the indices s_i: y_i = f^(i) x_{s_i}. Appending m_d real-valued dense random rows to this matrix yields the final projection matrix.

3.2.1 HCS Solver

Since there is no overlap among the different rows of P^(s), finding the isolated non-zeros cannot be made robust unless we make some assumptions about the signal. Let us assume that the distribution of the signal coefficients is such that, for all Ω ⊂ [n] with |Ω| = w, the inner product of x_Ω with a unit-norm random vector of complex numbers is non-zero. Our simulations show that this assumption holds true for standard test signals and images in the CS community. Using this assumption we have:

Proposition 1. Let s_i ⊂ [n], |s_i| = w, be the (ordered) set of column indices where the i-th row of the sparse matrix P^(s) is non-zero, and let P^(s)_{i,s_i} = f^(i) = [e^{jϕ_1^(i)} . . . e^{jϕ_w^(i)}] be the corresponding vector of complex exponentials (on the unit circle) such that the random phases ϕ are distinct in the range [0, π):

  ∀l, t ∈ [w], l ≠ t :  ϕ_l^(i) ∈ [0, π),  ϕ_t^(i) ≠ ϕ_l^(i)

If x_{s_i} is real-valued and non-zero in at most one index (∥x_{s_i}∥_0 ≤ 1), then given y_i = P^(s)_{i,:} x = f^(i) x_{s_i} and f^(i), one can recover x_{s_i} deterministically. Moreover, given y_i and f^(i), we can detect the event that ∥x_{s_i}∥_0 ≥ 2. Specifically, if y_i = 0 then x_{s_i} = 0. If the phase of y_i equals the phase of the l-th entry of f^(i) (∠y_i = ∠f_l^(i) = ϕ_l^(i)), then x_{s_i} is non-zero only at the index s_{i,l} and has the value x_{s_{i,l}} = y_i / f_l^(i) = ∥y_i∥, where s_{i,l} is the l-th member of the ordered list s_i. If the phase of y_i is not among the phases of f^(i) ({ϕ_1^(i), . . . , ϕ_w^(i)}), then x_{s_i} is non-zero in at least two indices.

Recall that hybrid solvers have two major stages: 1) finding and recovering the subsets of signal coefficients that span at most one non-zero coefficient, and 2) forming a full-rank system of linear equations and solving it. For the first m_s complex valued compressive samples (y_i, i ∈ [m_s]), the solver looks for the phase of each compressive sample (∠y_i) among the available phases of the associated random vector. As before, let f^(i) = P^(s)_{i,s_i}, w = |s_i|, and define:

  ∠f^(i) ≜ [∠f_1^(i) . . . ∠f_w^(i)]

Assume the phase of the sample y_i equals the phase of the t-th entry of f^(i). Then, under our assumptions and by Proposition 1, x_{s_i} is zero at all indices except the index s_{i,t}. This can be written as:

  ∠y_i = ∠f_t^(i)   ⇔   x_l = { ∥y_i∥,  l = s_{i,t}
                               { 0,      l ∈ s_i \ {s_{i,t}}   (3.2)

Moreover, if the compressive sample y_i is zero, then x_{s_i} is a zero vector. Otherwise (if the sample is neither zero nor is its phase found in ∠f^(i)), this subset of coefficients (x_{s_i}) spans at least two non-zeros and Proposition 1 is not applicable to it. Also note that the idea of matching the phase of a compressive sample against the available phases of a row of the projection matrix is an instance of the concept of alignment introduced in the previous chapter. More specifically, in the formulation of (2.6), Proposition 1 can be recast as finding the alignment of the vector

  h = [ ℜ(y_i) ]
      [ ℑ(y_i) ]   (3.3)

with a column of the matrix

  A = [ ℜ(P_{i,s_i}) ]
      [ ℑ(P_{i,s_i}) ]   (3.4)

where, for a complex number c = a + bi with i^2 = −1, ℜ(c) = a and ℑ(c) = b. Define I′ as the set of sample indices such that each of these samples is either zero or its phase can be found among the available phases of the respective random vector:

  I′ = { i ∈ [m_s] : y_i = 0 or ∠y_i ∈ ∠f^(i) }   (3.5)

Thus the first stage of the HCS solver determines the signal values at the indices of

  I = ∪_{i∈I′} s_i   (3.6)

Consequently, P_{:,I} x̂_I is the effect of the identified coefficients on the compressive samples y, and by subtracting P_{:,I} x̂_I from the compressive samples y we nullify that effect:

  y − P_{:,I} x̂_I = P_{:,I^c} x̂_{I^c}

where I^c is the set complement of I (I ∪ I^c = [n]). If we consider the n signal coefficients as unknowns and the compressive samples as equations, the problem y = P x can be viewed as a system of n unknowns and m = 2m_s + m_d equations in which |I| coefficients have already been recovered in the first stage of the solver. Having n − |I| further (independent) equations, one can recover the remaining unknowns. The random dense sub-matrix P^(d) and the complex samples spanning at least two non-zeros provide us with such equations.
md ≥ n − E(|I|) − 2 ms − I ′ ms ,md (3.10) The following two lemmas outline key results for solving this problem and determine the sample requirements of the HCS solver. The first lemma provides expressions for the numbers of complex samples ms and real samples md that are required for the recovery of x under HCS. The second lemma simplifies these expressions and show that the HCS measurement bound is tight. A 33 corollary is also stated below. Lemma 3.3.1. Consider a k-sparse signal x of length n and define l = n/k. Then on average, HCS requires (√ ms ≈ k 3 l 2 √ − 3 16 1 − 729l 3 ) (3.11) complex-valued samples and md = n − nα − 2αms (3.12) real-valued samples for the perfect recovery of x where α = (1 + k/ms ) exp(−k/ms ). Proof of Lemma 1. To that end we solve the following optimization problem: ( ) arg min 2ms + md s.t. md ≥ n − E(|I|) − 2 ms − I ′ ms ,md (3.13) One can use the Lagrangian multiplier to solve (3.13). Define α = E(|I ′ |)/ms < 1, then the objective function (L) is: L = 2ms + md + λ (αn + 2(1 − α)ms + md − n) (3.14) Taking derivatives respect to three parameters λ, ms and md gives us three equations: λ = −1, αn + 2(1 − α)ms + md = n (3.15) ∂L = 0 ⇒ 2m3s + 2km2s + 2k 2 ms = nk 2 ∂ms (3.16) 34 Define l = n/k. Let us assume l2 ≫ l, then solving (3.16) yields: (√ ms = k 3 l 2 √ − 3 1 16 − 729l 3 ) √ 1 and zero otherwise. Then Pr(νi = 1) = 1 − α. Hence E(νi ) = 1 − α and σ 2 (νi ) = α(1 − α). Define σ2 = ∑ 2 (σi )/ms = σi2 . By Bennet’s inequality [59] we have: ∆ = Pr (m s ∑ ) (νi − E(νi )) > t ( ( 2 ≤ exp −ms σ h i=1 t ms σ 2 )) where: h(u) = (1 + u) log(1 + u) − u > 3u2 6 + 2u (3.19) for u ≥ 0. Solving ∆ ≤ 1/k gives: t≈ √ 2σ 2 ms ln k It is straightforward to see that when ( )3 2 n 2σ ln k ≪ k 4 (3.20) then we have t ≪ k and this completes the proof. Corollary 3.3.3. If √ 3 2l2 ≤ 2 + 1/(1 − α) then md ≤ ms . Although the HCS sampling bound is not optimal for all values of k and n, it outperforms 36 even the most complex CS solutions over a wide range of typical and practical values of k and n (e.g., as long as k/n is not excessively small). Specifically let m⋆ = Ck log n/k be the sample requirement for an optimal CS framework where C is a constant only depending on the projection √ matrix and the solver. If 3 n/k < 1.25C log(n/k), then HCS requires fewer samples for perfect recovery when compared with the sampling requirement of the hypothetical optimal solver. Now let us compute the complexity of the HCS solver. Recall that the HCS decoding process consists of two stages. In the first stage, for each complex valued compressive sample (y i , i ∈ [ms ]), we look for the phase of that sample (∠y i ) among the available phases in the respective random vector ∠f (i) . Assume the entries of (i) (i) (i) f (i) = [exp(jϕ1 ) exp(jϕ2 ) . . . exp(jϕw )] are sorted based on phases: (i) (i) (i) (i) a, b ∈ [w], a > b ⇔ ∠f a = ϕa > f b = ∠ϕb (3.21) Then the complexity of this search for each compressive sample is only log(w). Since we have to repeat this search for all complex valued compressive samples, thus the complexity of the first stage of the solver is O(ms log(w)). As before, let |I ′ | ≥ 0, denotes the number of compressive samples spanning at most one non-zero coefficient, then we have not utilized ms − |I ′ | complex sample from the first stage of the solver. The second stage of the HCS solver finds the solution of a full rank system of md + ms − |I ′ | ≤ 2ms equations. 
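To make the sample counts of Lemma 3.3.1 concrete, the short sketch below solves the cubic (3.16) numerically for m_s and evaluates m_d from the active constraint in (3.13); this is an illustrative check of the formulas, not code from the thesis, and all names are assumptions.

```python
import numpy as np

def hcs_sample_counts(n, k):
    """Average-case HCS sample requirement from (3.16) and the constraint in (3.13)."""
    # solve 2*m_s^3 + 2*k*m_s^2 + 2*k^2*m_s = n*k^2 for its positive real root
    roots = np.roots([2.0, 2.0 * k, 2.0 * k**2, -float(n) * k**2])
    m_s = float(max(r.real for r in roots if abs(r.imag) < 1e-6 * abs(r)))
    alpha = (1.0 + k / m_s) * np.exp(-k / m_s)          # alpha = E(|I'|)/m_s
    # m_d from the active constraint: m_d = n - E(|I|) - 2*(m_s - E(|I'|))
    m_d = n - n * alpha - 2.0 * (1.0 - alpha) * m_s
    return m_s, m_d, 2.0 * m_s + m_d

m_s, m_d, m = hcs_sample_counts(4800, 800)   # the Blocks signal of Fig. 3.4 uses m = 2400
print(round(m_s), round(m_d), round(m))      # -> 800 846 2446 (average-case requirement)
```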
Let $g(\beta)$ be the complexity of the linear solver employed in the second stage of the HCS solver to solve a full-rank system of $\beta$ equations. Then the complexity of the second stage of HCS is:

$$O\left(g(m_d + m_s - |I'|)\right) \;\leq\; O\left(g(2m_s)\right) \qquad (3.22)$$

Finally, the overall complexity of the HCS solver is $O\left(\max\{m_s \log(n/m_s),\ g(2m_s)\}\right)$.

As a final remark, note that the phases of the samples here play the role of the lexical-order concept presented in the previous chapter. More specifically, in the case of HCS and based on formulation (2.9), we have:

$$h \in \mathbb{C}: \quad S(h) = \angle h \qquad (3.23)$$

3.4 Simulation Results

We tested the proposed HCS framework on a large number of standard compressed sensing signals and images. Here, we present the results for the Monalisa and Lenna images and compare our results with two dominant solvers/projection matrices (Fig. 3.1 and Fig. 3.2). We performed compressive sampling on the whole Monalisa image, while for Lenna we applied block-based compressed sensing. More specifically, for Lenna the target image is formed by keeping the largest k = 8 DCT coefficients in every 8×8 block and setting the rest of the DCT coefficients to zero. For the Monalisa image, a total of k = 57979 DCT coefficients (those with the largest absolute values) out of n = 283554 coefficients have been kept, and the remaining coefficients were set to zero. In our simulations, we counted each complex-valued compressive sample as two real samples. In these figures, m, t_bp, t_hcs and t_omp represent the number of (real) compressive samples and the decoding times required by BP, HCS and OMP respectively. In all simulations we have assumed that, for a given number of samples m, the ratio of complex-valued compressive samples to real-valued dense samples (for HCS) is approximately one (m_s = ⌈m/3⌉ and m_d = m − 2m_s). It is important to note that this sample assignment is not optimal, and HCS needs fewer samples for the perfect recovery of the signal (see Lemma 3.3.1). However, in some real-world applications the number of non-zero coefficients might not be known beforehand. Although BP achieved virtually perfect reconstruction, the complexity of BP is clearly much higher than that of HCS. Further simulation results for 1D signals are presented in Fig. 3.3 - Fig. 3.5.

Figure 3.1: From left to right, the k-sparse image of Monalisa, reconstructed by applying HCS, BP and OMP respectively to the whole image (m = 150746). The decoding times are t_bp = 109.5077, t_hcs = 16.1704 and t_omp = 15.1670 seconds.

Figure 3.2: From left to right, the k-sparse image of Lenna, reconstructed with m = 28 compressive samples by applying HCS, BP and OMP respectively on blocks of size 8×8 pixels. The decoding times are t_bp = 41.0283, t_hcs = 4.4735 and t_omp = 1.1512 seconds.

Figure 3.3: Reconstruction of the Doppler test signal (n = 120, k = 20) from m = 50 samples: noiseless HCS (MSE = 0.000000, time = 0.006031 s), BP with a Gaussian projection (MSE = 0.015769, time = 0.044851 s) and OMP with a Gaussian projection (MSE = 0.024785, time = 0.011491 s). m, n and k represent the number of samples, the signal length and the signal sparsity respectively.
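The block-sparse test images described above can be produced with a simple DCT-thresholding step. The sketch below is illustrative (it uses scipy rather than any code from the thesis) and keeps the k largest-magnitude DCT coefficients in every b × b block.

```python
import numpy as np
from scipy.fftpack import dctn, idctn

def k_sparse_blocks(image, k=8, b=8):
    """Keep the k largest-magnitude DCT coefficients in every b x b block."""
    h, w = image.shape
    out = np.zeros((h, w))
    for r in range(0, h - h % b, b):
        for c in range(0, w - w % b, b):
            block = dctn(image[r:r+b, c:c+b].astype(float), norm="ortho")
            thresh = np.sort(np.abs(block), axis=None)[-k]
            block[np.abs(block) < thresh] = 0.0      # zero everything below the k-th largest
            out[r:r+b, c:c+b] = idctn(block, norm="ortho")
    return out
```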
41 Target Signal=Blocks, n=4800, k=800 Noiseless HCS, m=2400, MSE=0.000000, Time=6.450577 4 4 2 2 0 0 1000 2000 3000 4000 BP, Gaussian Projection, m=2400, MSE=0.000037, Time=375.307032 1000 2000 3000 4000 OMP, Gaussian Projection, m=2400, MSE=0.028259, Time=416.847072 10 4 5 2 0 −5 0 1000 2000 3000 4000 1000 2000 3000 4000 (a) Figure 3.4: m, n and k represent number of samples, signal length and signal sparsity respectively. 42 Target Signal=Leopold, n=1800, k=200 Noiseless HCS, m=720, MSE=0.000000, Time=0.224265 0.2 0.2 0.15 0.15 0.1 0.1 0.05 0.05 0 0 500 1000 1500 500 BP, Gaussian Projection, m=720, MSE=0.000126, Time=6.620905 1000 1500 OMP, Gaussian Projection, m=720, MSE=0.000266, Time=11.789665 0.1 0.1 0.05 0.05 0 0 500 1000 1500 500 1000 1500 (a) Figure 3.5: m, n and k represent number of samples, signal length and signal sparsity respectively. 43 Chapter 4 Complex Sparse Projections for Compressed Sensing (CRISP) 4.1 Introduction In this chapter, we present Complex Sparse Projections for Compressed Sensing (CRISP), another instance of hybrid projection matrices. There are two major differences between CRISP and HCS, presented in the previous section: • There are no dense random rows in the projection matrix, i.e. P = P s . • There are overlaps among rows of the (sparse part of) the projection matrix. We show in this chapter that if overlaps among rows of the projection matrix are carefully designed, then CRISP would be an optimal CS framework. 4.2 Forming the projection matrix ¯ would be beneficial to describe properties of CRISP projection maUsing an auxiliary matrix P ¯ . Under the proposed CRISP trices. Let us use the term base projection matrix for such matrix P ¯ has binary entries (ones and zeros), and hence, this base framework, the base projection matrix P ¯ with matrix provides an underlying structure that can be used to populate the “ones” entries of P 44 numbers from higher fields. Similar to HCS, we again use complex numbers in the final projec¯ by complex numbers generates the final tion matrix. This process of replacing the “ones” of P projection matrix P . Consequently, the underlying “structure” (and associated sparsity) of both ¯ and the final projection matrix P is the same. One straightforward approach the base matrix P ¯ is to randomly select the entries of P ¯ where “ones” are placed (Algorithm 5). for constructing P ¯ except for This simple approach does not impose any constraints on the underlying structure of P ¯. the parameter K, which is the number of “ones” that are placed randomly in each column of P Note that K has to be sufficiently small (relative to m = 2ms and n) in order to have a sparse ¯ in this case might underlying structure for the final projection matrix P . Note that the resulting P be consider as the incidence matrix of a bipartite left regular graph of degree K. Hence with high ¯ corresponds to an expander graph. A bipartite graph with n left nodes, m probability [37][62] P right nodes and regular left degree K will be called a (αn, βc) expander, for some 0 < α, β < 1 if for every subset of left nodes ν with cardinality less than or equal to αn (|V | ≤ αn) the number of neighbors connected to ν is larger than βc|ν|. That is |N (ν)| > βc|ν| (4.1) where N (ν) is the set of neighbors of ν. ¯ , where 1 is an all one vector. Algorithm 5: Generating the base projection matrix P Input: m, n and K ≥ 2 ¯ Output: The base projection matrix P ¯ for i = 1 to n (all columns of P ) do Select K distinct random integers (l = {l1 , . . . 
, lK }) uniformly from [m] = {1, . . . , m} ¯ , set the entries in the rows indexed by l to one and put zero in For the i-th column of P ¯ e,l = 1 other indices: P One might impose certain constraints on the underlying structure of the base projection matrix 45 which leads to the same structure for P . Most notably, approaches that are inspired by channel coding methods for constructing parity check matrices, and in particular ones related to Low¯ . For instance, one Density-Parity-Check (LDPC) codes can be used for the binary base matrix P might utilize the incidence matrices of Balanced Incomplete Block Designs (BIBD) [49] (if they exist for the pair of ms and n) or adjacency matrix of expander graphs to improve the performance of the CRSIP framework as we explain further below in this paper. ¯ prior to replacing the In general, one can perform some form of matrix manipulation of P ¯ onto some complex numbers. Further, one can have certain constraints on the complex “ones” of P numbers used in the projection matrix P . In this paper, we focus on applying two steps to the ¯ , random permutation and employing complex numbers with unique phases as in base matrix P ¯ and HCS. For example, the overall approach used when permuting (uniformly) the columns of P ¯ by normalized complex numbers with unique phases in range substituting the non-zero entries of P of [0, π) is shown below (Algorithm 6). Algorithm 6: Generating the projection matrix from the base projection matrix. ¯ Input: The base projection matrix P Output: The projection matrix P ¯; Permute randomly the columns of P for i = 1 to m (all rows of P ) do ¯ and let w = |si |. Let si be the non-zero entries of the i-th row of P Generate a row vector f (i) composed of w normalized complex numbers with unique phases in range of [0, π). For the i-th row of P , insert f (i) in the column indices of si : P i,si = f (i) and ∀l ∈ / si : P i,l = 0 Now we say few words regarding the BIBD design procedure. As we eluded above, the incidence matrix of a Balanced Incomplete Block Design [49] represents a special case for the design ¯ . Here, we briefly review a few key properties of such matrices. Let of the base projection matrix P m, n, K and λ be positive integers such that m > K ≥ λ. The incidence matrix of a (m, n, K, λ)46 BIBD is a m × n, matrix with the following properties: • Each row is non-zero exactly in w = λ(m − 1)/(K − 1) column indices. Also each of the n = mw/K columns is non-zero exactly in K entries1 and thus we have ∀i, j ∈ [m] : w = |si | = |sj |. Note that this matrix corresponds to the (transpose of) incidence matrix of a bipartite left-regular of degree K and right-regular of degree w. • If we consider two distinct rows of this matrix, both are non-zero in exactly λ columns. That is: ∀i, j ∈ [m], i ̸= j : |si ∩ sj | = λ. Due to the special characteristics (listed above) of the proposed projection matrix, the following properties hold: 1. Since each column of our projection matrix is non-zero in K row indices, hence each signal coefficient xi appears exactly in K random different equations/samples. ¯ is an incidence matrix of a BIBD then two distinct equations 2. If the base projection matrix P (compressive samples) have exactly λ common coefficients/unknowns. 
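A compact sketch of Algorithms 5 and 6 follows; it assumes numpy, and the function names, seeds and example dimensions are illustrative rather than taken from the thesis.

```python
import numpy as np

def base_projection(m, n, K, seed=0):
    """Algorithm 5: place K ones uniformly at random in each column of P_bar."""
    rng = np.random.default_rng(seed)
    P_bar = np.zeros((m, n))
    for i in range(n):
        rows = rng.choice(m, size=K, replace=False)   # K distinct row indices
        P_bar[rows, i] = 1.0
    return P_bar

def crisp_projection(P_bar, seed=1):
    """Algorithm 6: permute the columns of P_bar, then replace the ones of each
    row with unit-modulus complex numbers whose phases are distinct in [0, pi)."""
    rng = np.random.default_rng(seed)
    P_bar = P_bar[:, rng.permutation(P_bar.shape[1])]
    P = np.zeros(P_bar.shape, dtype=complex)
    for i in range(P_bar.shape[0]):
        s_i = np.flatnonzero(P_bar[i])
        phases = rng.uniform(0.0, np.pi, size=s_i.size)   # distinct with probability one
        P[i, s_i] = np.exp(1j * phases)
    return P

# e.g. a CRISP-style projection with m = 200 rows, n = 4950 columns and K = 2
P = crisp_projection(base_projection(200, 4950, 2))
```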
4.3 CRISP Solver Let us assume that the distribution of magnitudes of signal coefficients is such that for all Ω ⊂ [n] and |Ω| = w, the inner product of xΩ with a unit norm random vector of complex numbers is not zero (which is true for test signals and images in CS community). Similar to other combinatorial approaches, the decoding scheme of the CRISP solver is iterative due to the overlaps among rows of the projection matrix. We examine compressive samples to find 1 Note that the size of support of a signal (or sparsity measure) is denoted by k, while K is a parameter of a BIBD. 47 some subsets of the signal spanning only one non-zero coefficient. This can be achieved by looking for the phases of the compressive samples: ∠y = {∠y 1 , . . . , ∠y ms } among the phases of the entries of the random vectors f (.) . For instance, assume that ∠y i (i) equals to ∠f l . Therefore using Proposition 1 (in Chapter 3), one can determine the signal values in the i-th subset xsi . Subtracting the effect of the identified non-zero coefficient from the compressive samples leads to the following new under-determined system of equations: ˆ si = P (x − z) y (1) = y − P :,si x (4.2) where for i > 1, y (i) is the residue of the approximation in the i-th iteration of the solver and z is a vector with the same size of x such that z si = xsi and zero otherwise. Note that: ∥x − z∥0 = ∥x∥0 − 1 . Since there are exactly K distinct samples spanning the same non-zero coefficient, this reduction ¯ is the incidence affects K more samples (equations). Specifically, if the base projection matrix P matrix of a BIBD, then identifying the signal values in one subset reduces the number of unknowns (un-identified coefficients) in all equations: yi = w ∑ f (i) xsi,e e=1 by λ. Therefore the algorithm converges to the solution very fast. More specifically, consider a 48 compressive sample y t spanning two non-zeros including xsi,l : y t = αxsi,l + βxe , e ∈ [n]/si,l where α and β are two distinct entries of f (t) . Thus: (1) yt = y t − P i,si xsi = βxe Consequently after this reduction, all compressive samples spanning two non-zeros such that one of these non-zeros is among the identified coefficients in the first iteration, turn into new samples spanning only one non-zero coefficient. Therefore in the next iteration, the phases of these compressive samples can be found among the available phases in the random vectors f (.) . This event results in identifying the signal values in new subsets (and consequently new coordinates). Algorithm 7 shows the CRISP decoding process. Algorithm 7: CRISP solver Input: y and P ˆ which y = P x ˆ Output: the sparse vector x (0) ˆ = 0. Set I = 0, y = y and x repeat (I) Find all indices β = {i : ∠y i for all i ∈ β do ∈ ∠f (i) } { (I) (i) ˆe = Suppose ∠y i = ∠f l then: x ˆ si y (I+1) = y (I) − P :,si x I =I +1 until y (I) is a zero vector; 49 (I) ∥y i ∥ e = si,l 0 e ∈ si /si,l } 4.4 Optimality ¯ . ThereAs stated before, the projection matrix (P ) inherits best properties of the base projection P fore, by our design and with high probability, P corresponds to (randomly permuted) the incidence matrix of an expander graph. 
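For reference while following the analysis, here is a minimal sketch of the decoding loop of Algorithm 7. It is not the thesis code: the matrix is assumed to be a dense complex numpy array for simplicity, the tolerance is an assumption, and the real-ratio test stands in for the exact phase comparison of the pseudocode.

```python
import numpy as np

def crisp_solve(P, y, tol=1e-9, max_iter=1000):
    """Sketch of Algorithm 7: iteratively peel subsets that span one non-zero."""
    m, n = P.shape
    x_hat = np.zeros(n)
    y_res = y.astype(complex)
    for _ in range(max_iter):
        if np.all(np.abs(y_res) < tol):            # all samples explained
            break
        updated = False
        for i in range(m):
            if abs(y_res[i]) < tol:
                continue
            s_i = np.flatnonzero(np.abs(P[i]) > 0)
            ratios = y_res[i] / P[i, s_i]
            hits = np.flatnonzero(np.abs(ratios.imag) < tol * abs(y_res[i]))
            if hits.size == 1:                     # this sample spans one non-zero
                col = int(s_i[hits[0]])
                val = float(ratios[hits[0]].real)
                x_hat[col] += val
                y_res -= P[:, col] * val           # subtract its effect from all K samples
                updated = True
        if not updated:                            # no further isolated non-zeros found
            break
    return x_hat
```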
One can use the arguments of [37] that with high probability, having m = O (k log n/k)) compressive samples, 1) in the first iteration of the solver, there exists at least 1 + [K/2] samples spanning only one specific non-zero coefficient (where K is the number of non-zero entries in each column of the projection matrix) and 2) the algorithm never halts and finally 3) it converges in O(k) iterations. Therefore the algorithm is optimal in terms of the order of the number of samples for perfect recovery. However now instead of real valued samples, we are sensing complex samples which (albeit keeping the order of m fixed) doubles the number of sensed real samples (each complex sample is counted as two real measurements). On the other hand, as opposed to traditional combinatorial algorithms which require at least 1 + [K/2] samples that span only one common non-zero coefficient (for a voting like decoding algorithm), here we only need one sample (instead of 1 + [K/2]) that spans one non-zero, which in turn lowers the required number of samples for perfect recovery. This significant reduction in the required samples is attributed to our utility of unique complex phases that make it feasible to detect and identify isolated non-zero coefficients. And finally, this generalization (from binary matrices to matrices with complex entries) reduces the complexity of the solver significantly from O (n log(n/k)) to O (k log(n/k) log(n/m)). Recall that the first iteration of the solver we look for the phases of compressive samples among a predetermined set of phases (available phases in each row of the projection matrix). Similar to 50 HCS, we can assume that for each row of P , non-zero entries are sorted by their phases: (i) (i) (i) (i) a, b ∈ [w], a > b ⇔ ∠f a = ϕa > f b = ∠ϕb Then one might utilize a quick search algorithm (such as binary search) in the CRISP solver. Now for each sample, we need to search among w = O(n/m) distinct phases and the complexity of such search is only log(w). Hence the first iteration of CRISP has a complexity of order O (m log(w)). For the next iteration of the solver, we need to subtract the effect of identified coefficients from the compressive samples. Since each non-zero coefficient appears in K (distinct) samples, hence the complexity of such update is O(K) which is a constant. However the algorithm converges in O(k) iterations, hence the overall complexity of CRISP is O (m log r + kK) = O (m log(n/m)). 4.4.1 The Average Behavior Analysis In this subsection, we present the proof for our claim that CRISP on average needs only k < m ≤ 2k complex (or equivalently 2k < m ≤ 4k real) valued compressive samples for the perfect recovery of a k-sparse signal. Note that, this does not violates the optimality of the bound m⋆ = O(k log n/k). In fact, it has been observed that BP typically recovers a k-sparse signal from ≈ 4k compressive samples. Furthermore, this analysis is only for the average case and in practice the solver usually deviates from the mean behavior and thus higher number of samples is usually required. Lemma 4.4.1. On average, CRISP requires 2k < m ≤ 4k real-valued compressive samples for the perfect recovery of a k sparse signal of length n if k ≫ 1. Proof. Let C = m/k be the oversampling factor. Then the probability that a given sample y j 51 spans exactly i non-zeros is: ( ) (k)(m−1)i (m−1)k−i Pr ∥xsj ∥0 = i = i K−1 ( m )k K (4.3) K where m is the number of rows of P . 
It is important to note that since the columns of CRISP projection matrix are permuted randomly, hence the probability that any given sample spans i non-zero coefficients, is constant for all samples: ( ) ( ) ∀l, j ∈ [m] : Pr ∥xsj ∥0 = i = Pr ∥xsl ∥0 = i = ηi (4.4) Thus the expected number of samples, spanning only one non-zero is mη1 . To simplify our analysis, let us consider the asymptotic case where k (and hence m and n) tends to infinity. Then the expected number of samples spanning only one non-zero would be: ( ) K − E #j : ∥xsj ∥0 = 1 ≈ kKe C (4.5) k→∞ Note that for large values of k and fixed values of m = Ck, the expected number of samples spanning only one non-zero is a linear function of k. These samples identify the locations of isolated non-zero coefficients which consequently lead to the recovery of the associated subsets of coefficients. Here we should emphasize that not all of these isolated coefficients provide new information. For instance, it is possible that there are several samples spanning only one specific non-zero coefficient xi . So it is important to exclude this redundant information from our analysis. Let be the probability of the event that one sample y a spans only one non-zero coefficient xi , given 52 that there exists another compressive sample y b (b ̸= a) spanning the same coefficient xi : Pr (sa ∩ Supp(x) = i|∃b ∈ [m] : sb ∩ Supp(x) = i) Then we have: (m−2)(m−2)k−1 k(m − 1) K−2 K = ( m )k (4.6) (4.7) K That is choose a non-zero coefficient xi among k non-zeros and choose another sample y a (except y b ) and let y a be only function of xi . Now randomly select K − 2 other samples which would be functions of xi and distribute the remaining k − 1 non-zeros (counting their K repetitions) among remaining m − 2 samples. Fix m = Ck, for large values of k (4.7) can be approximated to: k→∞ ≈ k(K 2 − K) − 2K e C m (4.8) which is a linear function of sparsity measure k. Thus the expected number of samples which do not result to discovery of new coefficients (µ) is: µ= m K ≈ m kK(K − 1) − 2K − 2K e C = k(K − 1)e C K m (4.9) K − Recall that (on average) kKe C samples span only one non-zero coefficient. Hence, the expected number of compressive samples spanning distinct non-zeros in the first iteration of the solver is: −K µ′ = kKe C − µ = k ( ) 2K −K − Ke C − (K − 1)e C (4.10) In words, CRISP solver (on average) recovers µ′ non-zero coefficients in the first iteration. Since the columns of the CRISP projection matrix are permuted independently and uniformly, we can 53 Name=QuadChirp, k=80 (DCT), n=2500 1 CRISP Reconstructed, m=320 MSE=0.000000, Time=0.386784 1 0.5 0.5 0 0 −0.5 −0.5 −1 −1 500 1000 1500 2000 2500 500 1000 1500 2000 2500 Guassian Matrix, BP, m=320 MSE=0.060634, Time=3.678604 Guassian Matrix, OMP, m=320 MSE=0.169403, Time=1.401314 1 0.5 0.5 0 0 −0.5 −1 −0.5 500 1000 1500 2000 2500 500 1000 1500 2000 2500 Figure 4.1: Comparing Compressive Demosaicing with some well-known demosaicing algorithms. say that in the next iteration, there are k (1) non-zeros left to recover from m samples, where: k (1) = k − µ′ = kq, q = ( ) 2K −K − 1 − Ke C + (K − 1)e C (4.11) Note that for fixed values of C = m/k, q is independent of k. It is straightforward to see that, (on average) in the i-th iteration of the algorithm, there are: k (i) = kq i non-zero coefficients which have not been recovered. Also note that even for 1 < C ≤ 2, we have q < 1. 
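The claim that q < 1 for small oversampling factors can be checked numerically. The sketch below evaluates (4.11) for a few illustrative values of K and C (these values are assumptions for the example, not taken from the thesis).

```python
import numpy as np

def q_factor(K, C):
    """Per-iteration shrinkage factor q from (4.11)."""
    return 1.0 - K * np.exp(-K / C) + (K - 1.0) * np.exp(-2.0 * K / C)

for C in (1.25, 1.5, 2.0):
    print(C, [round(q_factor(K, C), 3) for K in (2, 3, 4)])
# every printed value is below 1, so k^(i) = k * q**i shrinks geometrically
```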
In other words, on average, k < m ≤ 2k complex valued samples (and hence 2k < m ≤ 4k real valued samples) is sufficient for the perfect recovery. 54 Name=Blocks, k=50 (Wavelet), n=4950 CRISP Reconstructed, m=200 MSE=0.000000, Time=0.284941 6 6 4 4 2 2 0 0 1000 2000 3000 4000 1000 2000 3000 4000 Guassian Matrix, BP, m=200 MSE=0.960739, Time=4.413219 Guassian Matrix, OMP, m=200 MSE=1.991390, Time=0.613399 10 10 5 5 0 −5 0 −10 1000 2000 3000 4000 1000 2000 3000 4000 Figure 4.2: Comparing Compressive Demosaicing with some well-known demosaicing algorithms. 4.5 Simulation Results We tested the proposed CRISP on a large number of standard signals from SparseLab [45]. Specifically we compared the performance of CRISP (in terms of quality of the recovered signal as a function of the number of compressive samples and the required time for recovery) in comparison with the popular Gaussian random projection matrix and dominant Basis Pursuit (BP) and OMP solvers. In all plots m, n and k denote the number of compressive samples, the signal length and the sparsity of the signal in the sparsifying domain (Wavelet/DCT). To provide fairness, we have counted each complex sample as two samples. We performed our simulations under various configurations of signal length and sparsity ratio which we present some of them in Fig. 4.1- Fig. 4.3.. As clearly demonstrated CRISP has significantly lower complexity compare to BP and OMP. Here we should highlight an important note: due to the strong structures and rich properties of 55 Name=Bumps, k=500 (Wavelet), n=5000 4 4 2 2 0 CRISP Reconstructed, m=2000 MSE=0.000000, Time=14.474694 0 1000 2000 3000 4000 5000 1000 2000 3000 4000 5000 Guassian Matrix, BP, m=2000 MSE=0.000003, Time=297.498630 Guassian Matrix, OMP, m=2000 MSE=0.000000, Time=24.440286 4 4 2 2 0 0 1000 2000 3000 4000 5000 1000 2000 3000 4000 5000 Figure 4.3: Comparing Compressive Demosaicing with some well-known demosaicing algorithms. BIBDs, we expect that a sensing matrix generated based on an incidence matrix of a BIBD boosts CRISP performances in terms of the solver complexity and also the sample requirements for the perfect recovery. For instance, in Fig. 4.2, we have used a Hamming matrix of weight two (all ¯ . Hence P ¯ is m tuples with K = 2 non-entries in each column) as the base projection matrix P an incidence matrix of a (m, m(m − 1)/2, 2, 1)-BIBD. In that simulation, we observed that if the incidence of a (200, 4950, 2, 1)-BIBD was used as the base projection matrix, then m = 4k (real valued) samples was enough for the perfect recovery. On the other hand, if we have used Algorithm 5 to generate the base projection, then m ≈ 6k samples was required for the perfect recovery and also the decoding time would increase as well. However we should recall that: although there are numerous methods for generating a BIBD with some given parameters [49], it is possible that there are no BIBD designs that exist in some configurations of (m, n, K, λ). 56 Chapter 5 Modified SSMP In this chapter, we are interested in finding a sensing operator P which can be utilized in all three types of CS approaches (combinatorial, convex relaxation, and greedy); and at the same time, P can provide recovery guarantees in ℓ2 norm sense. We show that by some modification in the adjacency matrices of expander graphs, one can construct a class of sensing matrices which exhibits some “weak” form of RIP. Let us clarify what we mean by weak in here. 
As stated before, the formal definition of RIP demands that ||P x||2 must be close to ||x||2 (with high probability) for all possible k-sparse signals. Although RIP is known to be a sufficient condition to guarantee the success of geometrical approaches in solving CS, it has been reported in numerous papers (e.g. [64] [65] [66]) that the empirical results of linear programming solvers shows that there might be some less constraining properties than RIP, governing the success of linear programming in CS context. This observation raised to investigation for alternative sufficient conditions which consequently led to introduction of different types of RIP, for instance statistical RIP [64], weak RIP [67] and generalized RIP [69] to name a few. This chapter is along with those efforts. More specifically, we focus on the probability of recovering any arbitrary but fixed signal and not the probability of failure of the decoder for all possible k-sparse signals which clearly is a much stronger one. Such notion of fixing the signal support has been introduced and used before, for instance in [67]. By using basic arguments, we show that under this measure (the probability of failure for a fixed arbitrary signal), a simpler and weaker version of RIP which we refer to as w-RIP (that is very similar to the concept Weak-RIP in [67]) is sufficient to recover any arbitrary fixed signal under convex 57 relaxation methods and at least one optimal greedy algorithm called Cosamp [28] with high probability. The proposed w-RIP possibly widens the admissibility of many sensing matrices which do not possess the original RIP, to be considered as “good” sensing mechanism at least in the sense of any arbitrary but fixed signal. We show that if a matrix P ∈ Rm×n is constructed by replacing the non-zero entries of the sparse binary adjacency matrix of an expander graph with a Gaussian random vector then P would benefit from w-RIP and hence could be utilized in geometrical and greedy frameworks. Also, we show that there exists at least a combinatorial algorithm (based on Sequential Sparse Matching Pursuit [68]) for the proposed P with the ℓ2 < Cℓ2 guarantee of reconstruction. To the best of our knowledge, no other class of sensing projection matrices has been introduced in prior works which can provide ℓ2 < Cℓ2 guarantee of reconstruction under all three approaches to CS. This chapter is organized as follows: in Section 5.1 and 5.2, the proposed SERP and our definition of w-RIP are introduced. We prove that if one targets for recovering an arbitrary but fixed signal, then w-RIP is a sufficient condition of success for at least one greedy and all geometrical CS frameworks. In section 5.3, we prove that SERP matrices adheres to such w-RIP and thus can be safely utilized in the aforementioned CS frameworks. In Section 5.4, we show that for the proposed class of SERP matrices, there exists a fast combinatorial algorithm which can robustly recover a signal from its noisy samples. Finally, simulation results are presented in section 5.5. 5.1 SERP projection matrix Consider matrix P = G(m, n, r, d, D) where the function G is defined in Algorithm 8 D is a probability distribution and m, n, r and d are some natural numbers with the condition that m is divisible by r. In words, Algorithm 8 generates the matrix P by following these operations: (i) 58 for each column i ∈ [n], the algorithm selects uniformly at random d indices among [m/r] = {1, 2, . . . , m/r}. Let us denote those indices by h. 
(ii) Then for each entry of h, for instance the l-th entry hl , the algorithm generates another r indices {(hl − 1)r + 1, (hl − 1)r + 2, . . . , hl r}. The collection of these rd indices forms the row-indices where the i-th column will be non-zero (i.e. Ωi ). (iii) Finally, each non-zero entry of the i-th column of P would be a random variable with distribution D. The output of Algorithm 8 could be also viewed as a matrix which each of its column is non-zero in exactly d random indices and each non-zero entry is a random vector (t1 , t2 , . . . , tr ) with ti ∼ D for i ∈ [r]. Note that, for all i and j where ⌊i/r⌋ = ⌊j/r⌋ we have ωi = ωj . It is straightforward to verify that for sufficiently large n if (with high probability) the output of G(m/r, n, 1, d, D) corresponds to a high quality expander graph, then G(m, n, r, d, D) would also (with high probability) correspond to a high quality expander. Also it should be clear that permuting columns and rows of the generated projection matrix has no effect on our arguments nor the quality of the matrix. A wide range of popular projection matrices (in CS context) could be generated by Algorithm 8. For instance if m = O (k log n/k) and D = N (0, 1/m), then G(m, n, 1, m, D) outputs a random dense Gaussian projection matrix which has RIP with high-probability [16]. On the other hand, for d = θ (log n), r = 1 and a distribution with delta-Dirac at one and zero otherwise for D (i.e. t ∼ D is a constant number with value of one), the output of the Algorithm 8 (with high probability [38, 37, 68]) will correspond to a bipartite expander graph and hence can be utilized in a combinatorial solver for CS. In this chapter, we are interested in the output of the Algorithm 8 when r = O (1) is a small natural number, d = θ (log n), m = O(k log n/k) and D = N (0, 1/rd). We call an output instance of G(m, n, r, d, D) with those values as “Sparse Expander-like Real-valued Projection” or SERP. With some realistic approximations, we show that this class of matrices adheres to a property 59 Algorithm 8: The function G(m, n, r, d, D). input : m, n, r, d and D output: P ∈ Rm×n for i = 1 to n do do T = 0m ; Let h be a random subset of [m/r] of size d (Card{h} = d) ; m/r Ωi = ∪l=1 {(hl − 1)r + 1, (hl − 1)r + 2, . . . , hl r} ; ∀j ∈ Ωi : P j,i ∼ D ; end which we refer to as w-RIP (to be defined in follow) for an arbitrary signal x with a fixed support set T = Support{x}. Then, we show that w-RIP is in fact sufficient to guarantee the recovery of that fixed signal under convex relaxation and Cosamp (an optimal greedy CS decoder) and there is no need for the stronger RIP condition. Also by using the same property, we prove that modifying only one line of the SSMP decoder [68] suffices to come up with a combinatorial algorithm with a ℓ2 < Cℓ2 guarantee of recovery for that particular signal under SERP sensing matrices. To the best of our knowledge, this is the first class of matrices which can provide ℓ2 < Cℓ2 recovery under all three approaches to the problem of compressed sensing at least in a “for any arbitrary but fixed signal” setting. Throughout this paper, we assume that parameters m, d and r are chosen properly such that, if non-zero entries of P = G(m, n, r, d, D) are replaced by one, then it would correspond to a ((c + 1)k, d, q) expander for small constants c ≥ 1 and 0 < q < 1. 5.2 w-RIP Since Restricted Isometry Property (RIP) is an essential tool in the analysis of different algorithms in Compressed Sensing, we review that concept in here. 
A matrix $P \in \mathbb{R}^{m \times n}$ has RIP of order $k \in \mathbb{N}$ [3] if, for every $k$-sparse signal $x \in \mathbb{R}^n$, $\|x\|_0 = k$, there exists a Restricted Isometry Constant (RIC) $\delta'_k$ such that:

$$\left(1 - \delta'_k\right)\|x\|_2^2 \;\leq\; \|P x\|_2^2 \;\leq\; \left(1 + \delta'_k\right)\|x\|_2^2$$

No deterministic construction is available so far for a matrix achieving RIP of order $k$ with the minimal number of rows (samples) $m = O(k \log n/k)$. However, it has been proved that some random matrices satisfy RIP with an optimal number of rows with high probability. For instance, it can be shown [16] that Gaussian random matrices have that property with a probability of at least $1 - cn^{-\gamma}$ for some positive constants $c$ and $\gamma$.

To establish a clear connection between RIP and our proposed w-RIP, let us define an auxiliary notion which we refer to as probabilistic-RIP and abbreviate as p-RIP in this paper.

Definition 1. We say that a matrix $P \in \mathbb{R}^{m \times n}$ has p-RIP of order $(k, p) \in \mathbb{N} \times [0, 1]$ if $P$ has RIP of order $k$ with a probability of at least $p$.

Hence Gaussian random matrices with $m = O(k \log n/k)$ rows have p-RIP of order $(k, p)$ with $p = 1 - cn^{-\gamma}$ for some constants $\gamma$ and $c$. Now let us present our definition of w-RIP.

Definition 2. We say that a matrix $P \in \mathbb{R}^{m \times n}$ has w-RIP of order $(k, p) \in \mathbb{N} \times [0, 1]$ if, for any arbitrary but fixed $k$-sparse signal $x \in \mathbb{R}^n$, $\|x\|_0 = k$, there exists a w-RIP constant $0 \leq \delta_{k,p} \leq 1$ such that:

$$\left(1 - \delta_{k,p}\right)\|x\|_2^2 \;\leq\; \|P x\|_2^2 \;\leq\; \left(1 + \delta_{k,p}\right)\|x\|_2^2$$

with a probability of at least $p$.

In some parts of this chapter, when $p$ and $k$ are known from the context, we drop the subscripts and simply denote $\delta_{k,p}$ by $\delta$. Note that this definition is slightly different from the definition of weak-RIP in [67] and the one in [69].

The connection between RIP and p-RIP is obvious from their definitions. The connection between w-RIP and p-RIP can be justified as follows. Assume the set $T = \{T_1, T_2, \ldots\}$ contains all subsets of $[n] = \{1, 2, \ldots, n\}$ with cardinality $k$, i.e.:

$$\forall i \in \{1, 2, \ldots, \mathrm{Card}\{T\}\}: \quad T_i \subseteq [n], \quad \mathrm{Card}\{T_i\} = k$$

Furthermore, suppose that $P$ has w-RIP of order $(k, p)$. This means that for each $T_i$, all singular values of $P_{:,T_i}$ lie in the range $[\sqrt{1 - \delta_k}, \sqrt{1 + \delta_k}]$ with a probability of $p$. If the columns of $P$ are independently distributed (which is usually the case), then $P$ has RIP of order $k$ with a probability of $p^{\mathrm{Card}\{T\}}$, i.e. p-RIP of order $\left(k, p^{\mathrm{Card}\{T\}}\right)$.

The first papers on Compressed Sensing (e.g. [2, 8, 4, 5]) focused on the failure probability of the decoder over all possible $k$-sparse signals. For instance, to target a probability of success $q$ in the case of a convex relaxation decoder, one needs p-RIP of order $(2k, q)$. Consequently, to show that a matrix has this property, the matrix must have w-RIP of order $(2k, p)$ where $p \geq q^{1/\mathrm{Card}\{T\}}$ and $T$ now contains all subsets of $[n]$ with cardinality $2k$, so that $\mathrm{Card}\{T\} = \binom{n}{2k}$. Even for small values of $n$, $\mathrm{Card}\{T\}$ is enormously large, demanding $p = q^{1/\mathrm{Card}\{T\}}$ to be extremely close to one. This theoretically deprives many sensing matrices of RIP or p-RIP at the optimal number of samples $m = O(k \log n/k)$, even though they exhibit good empirical results in practice. This has been one of the main motives for some recent efforts (e.g. [64, 67]) to focus on "the probability of failure of a CS framework for any arbitrary but fixed signal" instead of "the probability of failure of a CS framework for all possible signals".
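To get a feel for why the all-signals requirement is so punishing, the short calculation below (an illustrative example with assumed values of n, k and q, not taken from the thesis) compares the number of 2k-column submatrices with the per-support probability that such a guarantee would require.

```python
from math import comb, log10

n, k, q = 1000, 10, 0.99
card_T = comb(n, 2 * k)                        # number of 2k-column submatrices
print(f"Card(T) = C({n}, {2*k}) ~ 10^{log10(card_T):.0f}")
# per-support probability needed for an all-signals guarantee: p >= q**(1/Card(T)),
# which is astronomically close to 1, whereas the fixed-signal analysis below
# only asks for p >= 1 - 1/n**(beta + 1).
print(q ** (1.0 / card_T))                     # prints 1.0 to double precision
```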
One expects that this relaxation should have some implications on RIP as well and fortunately that is the case. By looking more closely at proofs in [bp] and [28], one can easily notice that under 62 both these frameworks, to recover an arbitrary signal x with a fixed support from its compressive samples, there is no need for the full RIP or p-RIP. Instead, to prove that an arbitrary signal with a fixed support set could be recovered from its compressive samples with a target probability of q ( √ ) under those decoders, it only suffices to have w-RIP of order k, p = z q where the constant z is (n) solver-dependent and by far smaller than 2k . For instance as we show in proof of Lemma 5.2.1, the constant z in case of a linear program based decoder is only O (n/k). Also later in this paper, we show that the proposed class of SERP matrices can easily achieve w-RIP of a proper order with some realistic approximations. 5.2.1 Implications of w-RIP The following two lemmas prove that, if a matrix has w-RIP of proper orders, then it will be a “good” sensing mechanism under Cosamp (a greedy algorithm) and also any convex relaxation decoder when one targets for recovering any arbitrary signal with a fixed support. Lemma 5.2.1. Consider any arbitrary signal x ∈ Rn with the fixed support set T of cardinality √ k. Assume that, P ∈ Rm×n has w-RIP of order (2k, p) with the constant δ2k,p < 2 − 1 for p = 1 − 1/nβ+1 where β ≥ 1. If ∆ (P , y) belongs to the class of convex relaxation methods, then ∆ (P , y) provides a ℓ2 < Cℓ1 guarantee of recovery for x from y = P x with a probability of at least p2n ≥ 1 − 2/nβ . Proof : Let x∗ = ∆ (P , y) where ∆ is any convex relaxation based decoder. In other words, x∗ is the solution to optimization problem (1.8). As in [3] define h = x∗ − x and partition [n] into sets of cardinality k, denoted by T0 , T1 , T2 , . . . such that T0 corresponds to k largest entries of x, T1 corresponds to k largest entries of hT c , T1 corresponds to the next k largest entries of hT c and 0 0 so on. Fixing the support of x, the arguments of [3] requires that RIP inequalities to be active on 63 sets T0 ∪ T1 , T0 ∪ T2 , T0 ∪ T3 , . . . , T1 ∪ T2 , T1 ∪ T3 , T1 ∪ T4 , . . . , Since k could be as low as one, there are at most 2n of these sets. Now assume that the matrix P has w-RIP of order (2k, p). Then P has the required properties in [3] with a probability of p2n . Recalling that ( )2n 1 2 ∀n ≥ 1 : 1 − β+1 ≥1− β n n it can be concluded that if one targets for recovery of a fixed signal with a probability of q = 1 − 2/nβ for some constant β ≥ 1, then if suffices to have p > 1 − 1/nβ+1 . Lemma 5.2.2. Consider any arbitrary signal x ∈ Rn with the fixed support set T of cardinality k. Assume that, P ∈ Rm×n has w-RIP of order (2k, p) with the constant δ4k,p < 0.1 for p = 1 − 1/nβ+1 where β ≥ 1. If ∆ (P , y) is Cosamp algorithm, then ∆ (P , y) provides a ℓ2 < Cℓ1 guarantee of recovery for x from y = P x with a probability of at least p2n ≥ 1 − 7/nβ . Proof : For a k-sparse signal x, Cosamp requires at most 6(k + 1) ≤ 7k iterations (for k ≥ 6). Assume that in the start of the i-th iteration, the algorithm estimates x by x{i} . Define: R{i} = x − x{i} , d{i } = y − P x{i} In words, R{i} is the residue of estimation and d{i } is the discrepancy of estimation in that iteration. Finally let Ω{i} be indices corresponding to 2k largest values of P T d{i} . 
Then on the i-th iteration of Cosamp, one needs RIP inequalities only on column indices 64 {i} ∀i ∈ [6(k + 1)] : T1 {i} ∀i ∈ [6(k + 1)] : T2 = Support{Ω{i} }\Support{R{i} } = Support{Ω{i} } ∪ Support{x} ∪ Support{x{i} } {i} {i} Note that for all i, Card{T1 } ≤ 2k and Card{T2 } ≤ 4k. Based on the proof of [28], Cosamp recovers a fixed sparse signal x when the restricted isometery constant on sets {1} T1 {2} , T1 {6(k+1)} , . . . , T1 {1} and T2 {2} , T2 {6(k+1)} , . . . , T2 are all smaller than 0.1. Therefore, if one wants to recover any arbitrary but fixed signal x with a target probability of q then the matrix P √ has to have a w-RIP of order (4k, p) with the constant δk,p ≤ 0.1 for p ≥ 6(k+1) q. For instance, if p ≥ 1 − 1/nβ+1 then q ≥ 1 − 7/nβ . 5.3 w-RIP for SERP matrices The following two lemmas prove that SERP matrices have w-RIP and hence could be utilized in Cosamp and convex relaxation methods in a “for any arbitrary but fixed signal” scenario. Lemma 5.3.1. Let P ∈ Rm×n be an output instance of G(m, n, r, d, D). If d = θ(log n), m = O(k log n/k), D = N (0, 1/rd) and r = O(1) is a constant, then for any arbitrary vector x ∈ Rn with a fixed support, there exist constants δ and α = O(r) such that ( ) 1 P r ||P x||22 ≥ (1 + δ) ||x||22 ≤ α n (5.1) Proof : Without loss of generality, assume that x has a unit norm (||x||22 = 1). To bound ||y||22 = ||P x||22 (similar to [16]) one can resort to moment generating function of the random 65 variable y: ( ) ( ( ) ) Pr ||y||22 ≥ (1 + δ) = Pr exp t||y||22 ≥ exp (t(1 + δ)) ( ( )) E exp t||y||22 ≤ exp (t(1 + δ)) (5.2) (5.3) ( ( )) for a positive value of t, where the last inequality is due to Markov’s. Clearly E exp t||y||22 is the moment generating function of the random variable ||y||22 . Note that (see [16]): yi ∼ ||xωi ||2 N (0, 1) √ rd where N (0, 1) is standard normal distribution. Consequently ||xωi ||22 χ2 (1) 2 yi ∼ rd where χ2 (1) is a chi-squared distribution with one degree of freedom. Define ||xωi ||22 ∀i ∈ [m] : zi = = rd Then the random variable ||y||22 = ∑ 2 j∈ωi xj rd ∑m 2 i=1 zi χ (1) is a weighted sum of chi-squares. It is known and easy to verify that manipulating the moment generating function of a random variable with the distribution of weighted sum of chi-squares does not lead to neat or usable formulas. Hence, a number of estimations are suggested to approximate that random variable with another one which can be manipulated more easily (e.g. [70, 71]). Here in this paper, we use the approximation in [70]. A commonly practiced approximation of a weighted chi-squared random variable is a Gamma distribution with the same mean and variance of the original random variable. In [70], 66 the authors proved that, this approximation leads to acceptable results and more importantly, the gamma approximation tends to be an over-estimate of the true distribution of ||y||22 when variance of {zi } is not small1 . Since values of {zi } in SERP are linearly proportional to ||xωi ||22 (i.e. norms of random subsets of the signal) and the magnitudes of many signals of the interest (e.g. DCT coefficients of images) almost follow a power law, hence at least for these classes of signals, it is very unlikely that ||xωi ||22 to have small variances. Consequently, this approximation seems to be reasonable for those practical signals of interest. 
Define ∑m ∑ zi ( m zi )2 i=1 a = ∑m 2 , λ = ∑i=1 2 2 i=1 zi 2 m i=1 zi (5.4) Then based on the approximation of [70], ||y||22 can be approximated by a Gamma distribution with the shape parameter λ and scale parameter 1/a. Note that m ∑ ∑m zi = i=1 2 i=1 ci xi rd , ci = Card{j : j ∈ ωi } (5.5) where ci is the number of times that i ∈ [n] occurs in ωi . Recall that, all columns of P are non-zero in exactly rd row indices. Hence, for all i ∈ [n] we have ci = rd. Since it is assumed that x is unit norm, we get m ∑ i=1 ( m )2 ∑ zi = zi =1⇒a=λ i=1 Now let us find upper and lower bounds for ∑m 2 i=1 zi . It can be proved by contradiction that given the properties of {zi }, the vector z = [z1 z2 z3 . . . zm ] ∑ 2 has to be maximally sparse when the objective function m i=1 zi attains its maximum. More specifically, assume that n is not extremely small such that rd > 1. Furthermore, assume that a 1 Here we are deriving the upper bound for equation (5.1). Hence, the derived probability from the approximation in here is higher than the actual one. 67 given sequence {zi } leads to maximum value for ∑m 2 i=1 zi , however the vector z = [z1 z2 z3 . . . zm ] is not maximally sparse. Then, one can find two indices a, b ∈ Support{z} such that za ̸= 0 and za ̸= 0 (since rd > 1 and ||z||0 ≥ rd) and form a new sequence of {w1 w2 . . . wm } where    zc c ̸= a, c ̸= b     wc = 0 c=a       z + z c = b. a b (5.6) Define w = [w1 w2 . . . wm ]. It is easy to verify that m ∑ zi = i=1 m ∑ wi , ||w||0 = ||z||0 − 1, m ∑ zi2 < i=1 i=1 m ∑ wi2 i=1 which contradicts our assumption that this sequence of {zi } leads to maximum value for ∑m 2 i=1 zi . Performing the re-distribution of signal energies successively, one can conclude that the vector z = [z1 z2 z3 . . . zm ] has to be maximally sparse when ∑m 2 i=1 zi attains its upper bound. However, the sparsity of the vector z can never be less than rd, due to two facts: (a) the matrix P corresponds to an expander and (b) each column of P is non-zero at exactly rd indices. This maximally sparse case happens when the signal is non-zero in exactly one location. Since, we have assumed that the signal is unit-norm, then it means that the signal is one at one index and zero in the rest of indices. It is straightforward to verify that in this case, concludes the upper bound for 2 i=1 zi = 1/rd. This ∑m 2 i=1 zi . It is known and easy to verify that the when the constraint value of ∑m ∑m i=1 zi = 1 is active, the minimum ∑m ∑m 2 2 i=1 zi occurs when all zi are equals to 1/m. In that case, i=1 zi equals to 1/m as well. Putting together upper and lower bounds, we have rd/2 ≤ a = λ = m/2. Based on arguments 68 above, the following holds ) E (exp (t||y||2 )) 2 Pr ||y||22 ≥ (1 + δ) ≤ = exp (t(1 + δ)) ( )−a 1 − at = f (t) et(t(1+δ)) ( (5.7) (5.8) aδ < a which is in a valid range for the for 0 < t < a. By setting ∂f (t)/∂t = 0, we get t = 1+δ corresponding moment generating function. Substituting back this value into (5.8) leads to ( ) Pr ||y||22 ≥ (1 + δ) ( ≤ ) 1+δ a eδ (5.9) Let τ= ( δ2 2 2 1 + δ + δ2 ) (5.10) Then by writing Taylor series expansion of (1 + δ)/ exp δ around δ = 0 upto δ 2 and recalling that 1 − s ≤ exp (−s), it can be concluded that ( ) Pr ||y||22 ≥ (1 + δ) ≤ exp (a)−τ (5.11) Now recall that rd/2 ≤ a and d = c1 log n for a constant c1 . Thus ( ) Pr ||y||22 ≥ (1 + δ) ≤ 1 nrc1 τ /2 (5.12) ( ) In summary, there exists a constant α = rc1 τ /2 such that P r ||y||22 ≥ (1 + δ) ≤ 1/nα . Clearly parameter r (in the projection matrix) can be tuned such that α ≥ 2. 
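The concentration claimed in Lemma 5.3.1, and in Lemma 5.3.2 below, can also be checked empirically. The sketch below builds a SERP-like matrix following the description of Algorithm 8 (the function `serp` and all parameter values are illustrative assumptions, not the thesis code) and estimates how often $\|Px\|_2^2$ leaves the interval $[(1-\delta)\|x\|_2^2, (1+\delta)\|x\|_2^2]$ for a fixed sparse signal.

```python
import numpy as np

def serp(m, n, r, d, rng):
    """SERP-like matrix following the description of Algorithm 8 (m divisible by r)."""
    P = np.zeros((m, n))
    for i in range(n):
        h = rng.choice(m // r, size=d, replace=False)
        rows = (h[:, None] * r + np.arange(r)).ravel()   # r consecutive rows per pick
        P[rows, i] = rng.normal(0.0, 1.0 / np.sqrt(r * d), size=r * d)
    return P

rng = np.random.default_rng(0)
n, k, d, r = 2000, 50, 12, 2
m = 16 * k                                # a generous sampling budget, divisible by r
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.uniform(-1.0, 1.0, size=k)

delta, trials, bad = 0.3, 200, 0
for _ in range(trials):
    P = serp(m, n, r, d, rng)
    ratio = np.linalg.norm(P @ x) ** 2 / np.linalg.norm(x) ** 2
    bad += not (1 - delta <= ratio <= 1 + delta)
print("empirical failure rate:", bad / trials)   # small (often zero) for these settings
```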
Lemma 5.3.2. Let P ∈ Rm×n be an output instance of G(m, n, r, d, D). If d = θ(log n), m = 69 O(k log n/k), D = N (0, 1/rd) and r = O(1) is a constant, then for any arbitrary vector x ∈ Rn with a fixed support, there exist constants δ and α = O(r) such that ) ( 1 Pr ||P x||22 ≥ (1 + δ) ||x||22 ≤ α n (5.13) Proof : Assume that x has a unit norm. Similar to proofs of Lemma 5.3.1, one can conclude: ( ) ( ( ) ) Pr ||y||22 ≤ (1 − δ) = P r exp −t||y||22 ≥ exp (−t(1 − δ)) ( ( )) E exp −t||y||22 ≤ exp (−t(1 − δ)) (5.14) (5.15) ( ( )) for a positive value of t. We again use the approximation of [70] to estimate E exp −t||y||22 . More specifically by using arguments provided in the proof of Lemma 5.3.1, we have ( ( )) E exp −t||y||22 e−t(1−δ) et(1−δ) ( ) ≈ 1 + at (5.16) where rd/2 ≤ a ≤ m/2. For finding the tightest bound for (5.15), one should set the derivative of (5.16) with respect to t to zero and solve for t. That value is t = aδ/(1 − δ). It is straightforward to verify that if 0 < δ < 1/2 then this computed value for t is in the valid range of the corresponding moment generating function. Plugging back this value to (5.15) yields )a ( ) ( P r ||y||22 ≤ (1 − δ) ≤ (1 − δ)eδ (5.17) Note that Taylor series expansion of (1 − δ)eδ around δ = 0 up-to degree 4 is (1 − δ)eδ = 1 − ( ) δ2 δ3 δ4 − − + O δ5 2 3 8 70 (5.18) Hence ( (1 − δ)eδ )a aδ 2 ≤ e− 2 (5.19) Considering that rd/2 ≤ a ≤ m/2 and d = c1 log n for a constant c1 , the following holds ( ) 1 P r ||y||22 ≤ (1 − δ) ≤ α n (5.20) for α = rc1 δ 2 /4. Again r can be tuned such that α ≥ 2. Theorem 5.3.3. Let P ∈ Rm×n be an output instance of G(m, n, r, d, D). If d = θ(log n), m = O(k log n/k), D = N (0, 1/rd) and r = O(1) is a constant, then P would have w-RIP of order (k, p) with the constant δ for p ≥ 1 − n2α where α = O(r). Proof : Lemmas 5.3.1, Lemma 5.3.2 and the Definition 2 directly imply that P has w-RIP of order (k, p) for p ≥ 1 − n2α where   rc1 δ 2 ), α = min  ( 2 δ 4 1+δ+ 2 rc1 δ 2  4 = ( rc1 δ 2 2 4 1 + δ + δ2 ) = O(r) Theorem 5.3.3 proves that, the proposed SERP can be utilized in at least one greedy solver and all convex relaxation methods and provide a “for any arbitrary but fixed” guarantee of recovery in ℓ2 < cℓq norm sense even in case of noisy samples and compressible signals. In the next section, we show that the proposed projection matrix could be utilized in combinatorial solvers as well. 71 5.4 A Combinatorial solver for SERP As before, assume P ∈ Rm×n is generated by using Algorithm 8. In this section, we show how an already available combinatorial solver for binary projection matrices could be easily modified to support the proposed sparse real-valued projections. To that end, we have considered the Sequential Sparse Matching Pursuit (SSMP) decoder algorithm, introduced in [68] due to its low computational costs and also optimality of sample requirements m = O(k log n/k). We prove that by changing only one line in the SSMP algorithm, one can come up with a Modified version of Sequential Sparse Matching Pursuit (MSSMP) with a ℓ2 < Cℓ2 guarantee of recovery which is stronger than the ℓ1 < Cℓq guarantee in case of the original SSMP decoder algorithm. Algorithm 9 outlines the proposed MSSMP decoder for the class of sparse real-valued projection (SERP) matrices. The only difference between the MSSMP algorithm and SSMP is in line 5 of Algorithm 9, ) ( where in [68] for a fixed index i, ||P x{j} + zei − y||1 is minimized. 
Note that for a fixed index ) ( i, the minimizer z to ||P x{j} + zei − y||2 is z= ( ) P T:,i y − P x{j} ||P T:,i ||22 ( ) P TΩ ,i yΩi − P Ωi ,: x{j} i = ||P TΩ ,i ||22 (5.21) i ( ) while the minimizer to ||P x{j} + zei −y||1 in [68] is the median of the vector yΩi −P Ωi ,: x{j} . To prove the ℓ2 < Cℓ2 guarantee of recovery for the proposed pair of sensing operator and combinatorial solver, we essentially follow the proof in [68]. However, here the w-RIP feature of the proposed projection matrix is utilized in the proof, which has different characteristics than its RIP-1 counterpart in the case of the binary projections in [68, 38, 37]. To provide an easy comparison between the presented proof and the one in SSMP, we tried to adapt the notations used in [68] as much as possible in here. 72 Algorithm 9: The Modified version of the Sequential Sparse Matching Pursuit (MSSMP), a combinatorial algorithm for class of sparse real-valued projections. For any vector x, the hard-thresholding operator Hk [x] returns a vector which keeps the k largest coefficients of x and is zero in the rest of entries. In other words, Hk [x] is the best k sparse approximation of the vector/signal x. input : P ∈ Rm×n , y ∈ Rm , c ∈ N and k ∈ N output: x ∈ Rn x{0} ← 0n ; for j = 1 to T = O (log ||x||) do x′ ← x{j−1} ; for S = 1 to (c − 1)k do ( ) Find a coordinate i and increment z that minimizes ||P x{j} + zei − y||2 ; x′ ← x′ + zei ; end x{j} ← Hk [x′ ] ; end The algorithms SSMP and hence MSSMP are composed of two nested loops. The objective of the inner loop is to reduce the discrepancy of the estimation in the j-th iteration ||P x{j} − y||l , where l = 1 in case of SSMP and l = 2 for MSSMP. The outer loop just enforces that the signal estimation to be sparse. The following theorem (which itself is built based upon some other lemmas) proves that the modified SSMP can handle the proposed real valued projection matrix and guarantee ℓ2 < Cℓ2 recovery for a fixed but arbitrary signal. Theorem 5.4.1. Given the projection matrix P , noisy samples y = P x + µ, the sparsity level k = ||x||0 , the combinatorial Algorithm 9 returns an estimate xˆ such that ||ˆ x − x||2 = O(µ) with a probability of at least 1 − log ||x||2 /nα (where α is a constant) if w-RIP constant of P δ2k,p is sufficiently small for p ≥ 1 − 1/nα+1 . Proof : Let us use x{j} to denote the estimated value of the true signal x in the beginning of the j-th iteration of the outer loop. Note that in the start of the outer loop, x{j} is k-sparse. By Lemma 5.4.2 (stated and proved below), if for a fixed regularization parameter cl , the discrepancy 73 of the estimation is still high (i.e. ||P x{j} − y||2 ≥ cl ||µ||2 ) then there exists an index i ∈ [n] and a step size z such that ||P ( x{j} + zei ) √ − y|| ≤ 1− cu ||P x{j} − y|| k (5.22) where cu is a function of w-RIP constant of P and also the expander quality of the corresponding bipartite graph. 
Thus after O(k) iterations of the inner loop ||P x′ − y|| ≤ β||P x{j−1} − y||, β = e−cu /2 (5.23) Noting that P x′ − y = P (x′ − x) − µ, triangle inequality can be applied to get ( ) ||P x′ − x || ≤ β||P (x{j−1} − x)|| + (1 + β) ||µ|| (5.24) To relate the improvement on discrepancy to the improvement on signal estimation, one could utilize w-RIP on the set of columns corresponding to Support{x′ − x} to conclude √ ||x′ − x|| ≤ β 1 + δ {j−1} 1+β ||x − x|| + √ ||µ|| 1−δ 1−δ Define constants √ α = 2β 1+δ 2 (1 + β) , cµ = √ 1−δ 1−δ (5.25) (5.26) Then by Lemma 5.4.3 (stated and proved below), we have: ||x{j} − x|| ≤ α||x{j−1} − x|| + cµ ||µ|| 74 (5.27) Now if the expander quality of the corresponding bipartite graph and also the number of samples are sufficiently high (leading to smaller values for β and δ respectively), then α would be less than one. In that case, running the outer loop for z iterations, we would have ||x{z} − x|| ≤ αz + 2cµ ||µ|| 1−α (5.28) Thus running the outer loop only for O (log ||x||) iterations leads to ||x{z} − x|| = O(µ). As a final remark, note that Algorithm 9 requires τ = O(k log ||x||) iterations and in each iteration it relies on w-RIP inequalities of P for signals which are non-zero at most in 2k indices (see proof of Lemma 5.4.2). Hence to recover an arbitrary but fixed signal x from its compressive samples with a target probability of q, P must have w-RIP of order (2k, p) where pτ ≥ q. In the following, we state the two lemmas used in the proof of the Theorem 5.4.1. Lemma 5.4.2. (Equivalent of Lemma 3 in [68]) Let y = P x+µ. If x′ is k-sparse and ||P x′ −y|| ≥ cl ||µ|| and P has a w-RIP of a proper order, then there exist an index i and a positive constant cu such that ( ( ′ ) cu ) ′ 2 ||P x − xi ei + xi ei − y|| ≤ 1 − ||P x′ − y||2 k (5.29) ∆ = x′ − x (5.30) Proof : Define Since y = P x + µ then (5.29) can be re-expressed as: √ cu ||P (∆ − ∆i ei ) || ≤ 1 − ||P ∆ − µ|| k 75 (5.31) Note that from the assumption in the lemma (cl − 1) ||µ|| ≤ ||P x′ − y|| − ||µ|| ≤ ||P x′ − y + µ|| = ||P ∆|| Since ||∆||0 ≤ 2k hence w-RIP implies that ||P ∆|| ≤ √ (5.32) 1 + δ||∆||. Consequently: c −1 ||∆|| ≥ √l ||µ|| 1+δ (5.33) T = Support{∆} = {i : ∆i ̸= 0} (5.34) Define Note that 0 ≥ Card{T } = ||x − x′ ||0 ≤ 2k. Assume sort entries of ∆ in a decreasing order, i.e. ∀i ∈ [Card{∆} − 1] : |∆i | ≥ |∆i+1 | (5.35) Consider G, the bi-partite graph corresponding to the projection matrix P . Assume one enumerate − the edges of the graph G in lexicographic order and forms the sets Ω+ i and Ωi by following this rule: if (i, j) is the first edge going to the vertex j, then j would be included in Ω− i , otherwise it goes to the set Ω− i [68]. More formally − + Ω+ i = {j : j ∈ Ωi , s.t. z < i, P j,z ̸= 0}, Ωi = Ωi \Ωi (5.36) Note that sets Ω+ i are pair-wise disjoint. As before, let D = dr. With a slight abuse and for the simplicity of notations, let us use |A| for a set to denote its cardinality. Since, P corresponds to an 76 ((c + 1)k, D, q) expander graph (where c ≥ 1), one has ∀s ≤ (c + 1)k : s ∑ |Ω− i | ≤ qDs (5.37) i=1 For a fixed constant cs > 1, define: − + T + = {i : i ∈ T and |Ω− i | < qDcs }, T = T \T (5.38) As in [16], our goal is to show that (a) most of energy of ∆ is in coordinates T + and (b) there is an index in T + , such that (5.29) holds true for that index. Suppose that, entries of T are sorted increasingly, i.e. T − = {u1 , u2 , . . .} where ui < ui+1 . 
Now, since u_k ≥ k, we have

∑_{i=1}^{k} |Ω_{u_i}^−| ≤ ∑_{i=1}^{u_k} |Ω_i^−|    (5.39)

By definition, ∀i ∈ T^− we have |Ω_i^−| ≥ qDc_s, and therefore

k q D c_s ≤ ∑_{i=1}^{k} |Ω_{u_i}^−|    (5.40)

On the other hand, since the graph G is D = rd left-regular, (5.37) gives

∑_{i=1}^{u_k} |Ω_i^−| ≤ q D u_k    (5.41)

From the last three relations we get

u_k ≥ k c_s    (5.42)

The immediate consequence (applying the same argument to the first i elements of T^−) is that for all i ∈ [k]:

|{1, 2, . . . , u_i} ∩ T^+| ≥ i (c_s − 1)    (5.43)

As in [68], for any i ∈ [k], let S_i be the set containing the smallest (c_s − 1) elements of {1, 2, . . . , u_i} ∩ T^+ \ {∪_{j=1}^{i−1} S_j}. Noting that ∀j ∈ S_i : |∆_j| ≥ |∆_{u_i}| and |S_i| ≥ (c_s − 1), one has

∥∆∥^2 ≥ ∥∆_{T^− ∪ (∪_i S_i)}∥^2 = ∑_{u_i ∈ T^−} ( ∆_{u_i}^2 + ∑_{j ∈ S_i} ∆_j^2 )    (5.44)
≥ c_s ∑_{u_i ∈ T^−} ∆_{u_i}^2 = c_s ∥∆_{T^−}∥^2    (5.45)

Recalling that ∥∆∥^2 = ∥∆_{T^−}∥^2 + ∥∆_{T^+}∥^2, it can be concluded that

∥∆_{T^+}∥ ≥ √(1 − 1/c_s) ∥∆∥    (5.46)

Define

gain_i = ∥P ∆ − µ∥^2 − ∥P(∆ − ∆_i e_i) − µ∥^2    (5.47)
       = ∥P ∆∥^2 − ∥P(∆ − ∆_i e_i)∥^2 − 2 µ^T P ∆_i e_i    (5.48)

Since the i-th column of P is non-zero only in the row indices Ω_i, we have ∀j ∈ [m]\Ω_i : (P ∆)_j = (P ∆ − P e_i ∆_i)_j. It is straightforward to verify that

gain_i = 2 (P ∆)_{Ω_i}^T (P ∆_i e_i)_{Ω_i} − ∥(P ∆_i e_i)_{Ω_i}∥^2 − 2 µ^T P e_i ∆_i    (5.49)

Using the fact that ∆ = ∆_{T^+} + ∆_{T^−} and summing gain_i over all i ∈ T^+ yields

∑_{i∈T^+} gain_i = 2 (P ∆)^T P ∆_{T^+} − ∑_{i∈T^+} ∥(P ∆_i e_i)_{Ω_i}∥^2 − 2 µ^T P ∆_{T^+}    (5.50)
= 2 ∥P ∆_{T^+}∥^2 + 2 (P ∆_{T^−})^T (P ∆_{T^+}) − ∑_{i∈T^+} ∥(P ∆_i e_i)_{Ω_i}∥^2 − 2 µ^T P ∆_{T^+}    (5.51)

By the w-RIP on the support T^+, one has

∑_{i∈T^+} ∥(P ∆_i e_i)_{Ω_i}∥^2 ≤ (1 + δ) ∑_{i∈T^+} |∆_i|^2 = (1 + δ) ∥∆_{T^+}∥^2    (5.52)

Using the lower bound of the w-RIP for T^+, ∥P ∆_{T^+}∥^2 ≥ (1 − δ) ∥∆_{T^+}∥^2, and hence (5.51) can be simplified to

∑_{i∈T^+} gain_i ≥ (1 − 3δ) ∥∆_{T^+}∥^2 + 2 (P ∆_{T^−})^T (P ∆_{T^+}) − 2 µ^T P ∆_{T^+}    (5.53)

Note that by the Cauchy–Schwarz inequality and (5.33),

µ^T P ∆_{T^+} ≤ (1 + δ)/(c_l − 1) ∥∆_{T^+}∥^2    (5.54)

Combining the last two relations, we have

∑_{i∈T^+} gain_i ≥ ( 1 − 3δ − 2(1 + δ)/(c_l − 1) ) ∥∆_{T^+}∥^2 + 2 (P ∆_{T^−})^T (P ∆_{T^+})    (5.55)

Using the w-RIP of P on the two partitions of T, we have (see [28])

∥P_{T^+}^T P_{T^−}∥ ≤ δ    (5.56)

and hence

|(P ∆_{T^−})^T (P ∆_{T^+})| ≤ δ ∥∆_{T^+}∥ ∥∆_{T^−}∥    (5.57)

Substituting (5.57) into (5.55) yields

∑_{i∈T^+} gain_i ≥ ( 1 − 3δ − 2(1 + δ)/(c_l − 1) ) ∥∆_{T^+}∥^2 − 2δ ∥∆_{T^+}∥ ∥∆_{T^−}∥    (5.58)

By (5.46) and (5.58), we have

∑_{i∈T^+} gain_i ≥ ( 1 − 3δ − 2(1 + δ)/(c_l − 1) ) ∥∆_{T^+}∥^2 − 2δ √(1/c_s − 1/c_s^2) ∥∆∥^2 ≥ C̄ ∥∆∥^2    (5.59)

for

C̄ = (1 − 1/c_s) ( 1 − 3δ − 2(1 + δ)/(c_l − 1) ) − 2δ √(1/c_s − 1/c_s^2)    (5.60)

Recall that |T^+| ≤ 2k. Thus there exists an index j ∈ [n] such that

gain_j ≥ C̄ ∥∆∥^2 / |T^+| ≥ C̄ ∥∆∥^2 / (2k)    (5.61)

To relate the gain to the discrepancy ∥P x′ − y∥, note that

√(1 + δ) ∥∆∥ ≥ ∥P ∆∥ = ∥(P x′ − y) + µ∥ ≥ ∥P x′ − y∥ − ∥µ∥    (5.62)

By (5.33),

∥∆∥ ≥ ∥P x′ − y∥ / ( √(1 + δ) (1 + 1/(c_l − 1)) )    (5.63)

Now one can simplify (5.61) to

∃ j ∈ [n] : gain_j ≥ (c_u/k) ∥P x′ − y∥^2    (5.64)

for

c_u = C̄ / ( 2 (1 + δ) (1 + 1/(c_l − 1))^2 )    (5.65)

Note that, to improve the discrepancy of the estimate and hence reduce the estimation error, we require c_u to be strictly positive. Since the denominator of c_u is strictly positive, it suffices to have C̄ > 0. This can be achieved if δ (the w-RIP constant) is sufficiently small. For instance, for c_s = 2 and c_l = 4, C̄ is positive if δ ≤ 0.058.

Lemma 5.4.3. (Equivalent of Lemma 6 in [68]) For any k-sparse vector x and any vector x′ we have

∥H_k[x′] − x∥ ≤ 2 ∥x′ − x∥    (5.66)

Proof: The proof of Lemma 6 in [68] is for the ℓ1 norm; however, it holds for any other valid norm as well.
Nevertheless, for the sake of completeness, we present the proof here. Note that H_k[x′] is the closest k-sparse vector to x′. Hence ∥H_k[x′] − x′∥ ≤ ∥x − x′∥. Consequently, by the triangle inequality we have

∥H_k[x′] − x∥ ≤ ∥H_k[x′] − x′∥ + ∥x′ − x∥ ≤ ∥x′ − x∥ + ∥x′ − x∥ = 2 ∥x′ − x∥    (5.67)

Let us conclude this section with a brief discussion of the complexity of the MSSMP decoder. Computing P x for a sparse binary projection amounts to adding a few entries of x (since all non-zero entries of P are one). For a sparse real-valued projection, computing P x must instead be performed through a series of inner products. Clearly, this update is heavier for a sparse real-valued projection matrix than for a sparse binary one, but only by a constant factor (the complexity of a multiplication divided by the complexity of an addition). On the other hand, computing P x for a sparse real-valued projection matrix is much lighter than when P is a dense random matrix. Also, as stated before, MSSMP and SSMP are identical except for line 5 of Algorithm 9. In that line, SSMP computes the median of d numbers, while MSSMP requires the orthogonal projection of one vector onto another in R^{rd}. Although this operation is heavier than computing the median of d numbers, order-wise the two operations have exactly the same complexity (since r = O(1)). Consequently, the complexity orders of MSSMP and SSMP are the same.

5.5 Simulation Results

In this section, we present numerical results on the performance of the proposed sparse real-valued projection matrices (SERP) under combinatorial, greedy, and convex-relaxation approaches. We consider two scenarios: (i) the compressive samples are noiseless, and (ii) the samples are contaminated by noise.

For the noiseless case, the performance of the traditional pair of a sparse binary projection matrix and the SSMP decoder is compared with the proposed pair of a sparse real-valued projection and the MSSMP decoder. More specifically, we fixed the signal length to n = 10,000 and considered signal sparsities k = ∥x∥_0 = 100, 200, 300, . . . , 800. For each sparsity level, we considered six sampling levels m = 3k, 4k, . . . , 8k. For each configuration of k and m, we generated a signal x ∈ R^n by selecting k of its indices uniformly at random and drawing the signal values at those indices uniformly from the range [−0.5, 0.5]. We then recorded the average relative reconstruction error (∥∆(y, P) − x∥/∥x∥) over 50 independent trials.

Figure 5.1: Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and the sampling ratio m/k, when the projection matrix is sparse binary and the solver is SSMP.

The results for the pair of a sparse binary projection matrix with d = 18 and the SSMP decoder are presented in Figure 5.1. For the same value of d, the results for MSSMP with sparse real-valued projection matrices with r = 1 and r = 2 are shown in Figure 5.2 and Figure 5.3, respectively. As shown in those plots, MSSMP and the sparse real-valued projections lead to (sometimes much) lower distortion in recovery when the number of samples is not sufficiently large (e.g. m = 4k and m = 5k for all values of k).
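For concreteness, the sketch below shows how a single trial of this noiseless experiment could be set up. The SERP construction shown here (each column non-zero in exactly rd randomly chosen rows, with Gaussian non-zero values) is our reading of the matrices of Section 5.1, the sizes are scaled down for illustration, and the helper names (serp_matrix, k_sparse_signal) as well as the commented mssmp call are ours, not part of the thesis code.

import numpy as np

def serp_matrix(m, n, d, r, rng):
    # Sketch of a SERP-style sparse real-valued projection: each column is
    # non-zero in exactly r*d rows chosen uniformly at random, with Gaussian
    # non-zero entries (an assumption; the construction is given in Section 5.1).
    P = np.zeros((m, n))
    for i in range(n):
        rows = rng.choice(m, size=r * d, replace=False)
        P[rows, i] = rng.standard_normal(r * d)
    return P

def k_sparse_signal(n, k, rng):
    # k non-zero entries at random positions, values uniform in [-0.5, 0.5],
    # as described for the experiments above.
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.uniform(-0.5, 0.5, size=k)
    return x

# One noiseless trial (the text uses n = 10000 and d = 18; smaller n here).
rng = np.random.default_rng(0)
n, k, d, r = 2000, 50, 18, 1
m = 5 * k
x = k_sparse_signal(n, k, rng)
P = serp_matrix(m, n, d, r, rng)
y = P @ x
# x_hat = mssmp(P, y, c, k)       # hypothetical decoder call (Algorithm 9)
# rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)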
However, both approaches (MSSMP and SSMP) introduce virtually no error in the recovery process when m > 6k. Note that under the MSSMP decoder, the results from sensing matrices with r = 1 are often less distorted than those from matrices with r = 2.

Figure 5.2: Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and the sampling ratio m/k, when the projection matrix is a sparse real-valued sensing matrix (SERP) with r = 1 and the solver is MSSMP.

Figure 5.3: Relative error of reconstruction (∥∆(y, P) − x∥/∥x∥) for 50 random signals x of length n = 10,000 as a function of the sparsity level k = ∥x∥_0 and the sampling ratio m/k, when the projection matrix is a sparse real-valued sensing matrix (SERP) with r = 2 and the solver is MSSMP.

Now let us focus on the scenario where the samples are noisy, i.e. y = P x + µ with ∥µ∥ > 0. Fixing the signal length, sparsity and number of samples to n = 5000, k = 250 and m = 5k = 1250, respectively, we varied the relative sampling error (∥µ∥/∥P x∥) from 0.1 to one in steps of size 0.1. For each relative sampling error, we generated 100 random signals and projection matrices as in the noiseless scenario (i.e. selecting the non-zero indices of the signal uniformly at random and populating those indices uniformly in the range [−0.5, 0.5]). The averages of the relative reconstruction errors as a function of the relative sampling error were then recorded for all three classes of CS decoders and are presented in Figure 5.4–Figure 5.6.

Let us briefly highlight the key results of these plots. As shown in Figure 5.4, the pair of sparse real-valued projections and the MSSMP decoder leads to higher-quality recovery than the pair of sparse binary projections and SSMP. Within the class of sparse real-valued projections, when the relative noise energy is small, projection matrices with r = 1 appear to be the better sensing operators. As the noise energy increases, however, less error is introduced in the recovery process if the non-zero entries of the sparse projection matrix are Gaussian random variables in R^2 (i.e. r = 2). When BP [17] is employed as the decoder, the class of sparse real-valued projection matrices is virtually as good as dense Gaussian projection matrices (see Figure 5.5). Within the class of sparse real-valued projections and for a fixed value of d, BP consistently introduces less error in the approximation for matrices with r = 2. To evaluate the performance of the sparse real-valued projections in a greedy CS framework, we selected the CoSaMP algorithm [28] for the decoder side. Figure 5.6 shows that CoSaMP is more sensitive to the deployed projection matrix than BP when the projection matrix is from the class of sparse real-valued projections. Figure 5.5 and Figure 5.6 suggest that, for a fixed value of d and under the non-combinatorial classes of solvers, deploying sparse real-valued projection matrices with the parameter r slightly greater than one leads to less error in the recovery process.
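As a complement to the plots referenced above, the snippet below shows one way a noisy trial at a prescribed relative sampling error ∥µ∥/∥P x∥ could be generated, mirroring the sweep from 0.1 to 1.0 described in this section. The SERP construction is the same assumed one as in the previous sketch, and the decoder call is left as a comment since it depends on the chosen solver (MSSMP, BP [17] or CoSaMP [28]).

import numpy as np

rng = np.random.default_rng(1)
n, k, d, r = 5000, 250, 18, 2
m = 5 * k                                  # m = 1250, as in the text

# k-sparse test signal with non-zeros uniform in [-0.5, 0.5]
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.uniform(-0.5, 0.5, size=k)

# SERP-style sparse real-valued projection (same assumed construction as before)
P = np.zeros((m, n))
for j in range(n):
    rows = rng.choice(m, size=r * d, replace=False)
    P[rows, j] = rng.standard_normal(r * d)

# Scale a Gaussian noise vector so that ||mu|| / ||P x|| equals the target level.
rel_err = 0.3
Px = P @ x
mu = rng.standard_normal(m)
mu *= rel_err * np.linalg.norm(Px) / np.linalg.norm(mu)
y = Px + mu
# x_hat = decoder(P, y)   # MSSMP, BP or CoSaMP; record ||x_hat - x|| / ||x||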
Figure 5.4: Comparing the performance of the sparse real-valued projection matrices under the MSSMP decoder with sparse binary projections under the SSMP decoder when samples are noisy.

Figure 5.5: Comparing the performance of the sparse real-valued and dense Gaussian projection matrices when BP is the decoder and samples are noisy.

Figure 5.6: Comparing the performance of the sparse real-valued and dense Gaussian projection matrices when CoSaMP is the decoder and samples are noisy.

Chapter 6
Future works

6.1 Introduction

CRISP, HCS and many other fast combinatorial algorithms (e.g. [38, 37]) follow the traditional (i.e. non-message-passing) approach of combinatorial algorithms, in which each coefficient of the unknown signal is estimated only once. The main advantage of such an approach is that the decoding process is extremely fast. On the negative side, such algorithms suffer from high sample requirements. Algorithm 10 shows how typical combinatorial approaches to CS operate for solving (6.1) under the zero pseudo-norm p = 0. Our initial effort was to utilize the w-RIP property of SERP matrices to show that one can obtain an extremely low complexity traditional combinatorial algorithm under SERP encoding matrices without sacrificing sample requirements. As we show in the rest of this chapter, such an algorithm indeed exists when the samples are not affected by noise. The moment the samples are contaminated by noise, however, we face some challenging problems. We leave solving those challenges for future work and propose some possible research paths to that end.

6.2 Another look at combinatorial algorithms

Algorithm 10 summarizes a traditional approach of combinatorial algorithms for CS. In the following, we show that Algorithm 11 can be perceived as the recursive formulation of Algorithm 10 for solving equation (6.1) when p is set to zero in the third line of Algorithm 11.

arg min_{x̂} ∥x̂∥_p   s.t.   y = P x̂    (6.1)

Algorithm 10: A typical combinatorial algorithm for CS. The set of coefficients whose values have been determined is denoted by z.
input : P ∈ R^{m×n} and y ∈ R^m
output: x̂ ∈ R^n
ŷ = y, z = ∅ ;
while z ≠ [n] do
    Find l ⊆ [m] such that ∥x_{ω_l}∥_0 = 1 ;
    Let x̂_{ω_l} denote the estimate of the values x_{ω_l} ;
    For θ = ω_l : ŷ_{Ω_θ} ← y_{Ω_θ} − P_{Ω_θ,ω_l} x̂_{ω_l} ,  z ← z ∪ ω_l ;
end

Algorithm 11: ∆(P, θ, y), the recursive formulation of Algorithm 10.
input : P ∈ R^{m×n}, θ and y ∈ R^m
output: x̂ ∈ R^n
x̂ = 0_n ;
Find l ⊆ [m] where ∥x_{ω_l}∥_0 = 1 ;
x̂_{θ_1} = arg min_ξ ∥ξ∥_p  s.t.  y_l = P_{l,θ_1} ξ ;
x̂_{θ\θ_1} ← ∆(P, θ\θ_1, y − P x̂) ;

Initializing with θ = [n] and after finding θ_1 such that ∥x̂_{θ_1}∥_0 = 1 (i.e. finding an isolated non-zero), Algorithm 11 recursively finds a subset of signal coefficients in the unrecovered part, θ_{i+1} ⊆ [n]\∪_{j∈[i]} θ_j, such that ∥x_{θ_{i+1}}∥_0 = 1, where θ_j denotes the value of θ_1 (line 2 in Algorithm 11) at the j-th level of the recursion stack.
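To make the structure of Algorithm 10 concrete, the sketch below implements this peeling process for the special case of a sparse binary projection matrix, using the voting rule discussed in the following paragraphs (equation (6.2)) as the test for isolated non-zeros. The function name, the voting threshold (the text suggests requiring more than d/2 agreeing samples) and the numerical tolerances are ours. For real-valued SERP matrices the voting test is replaced by the alignment test introduced later in this section, and for adversarial signals the voting inference can be wrong, as discussed below.

import numpy as np

def peel_binary_voting(P, y, vote_threshold):
    # Sketch of Algorithm 10 for a sparse *binary* projection matrix P.
    # A coefficient i is declared isolated when at least `vote_threshold`
    # residual samples touching column i agree on the same non-zero value and
    # share no other undetermined coefficient (the voting rule of (6.2)).
    m, n = P.shape
    x_hat = np.zeros(n)
    residual = y.astype(float).copy()
    undetermined = set(range(n))                                  # complement of z
    col_rows = [np.flatnonzero(P[:, i]) for i in range(n)]        # Omega_i
    row_cols = [set(np.flatnonzero(P[j, :])) for j in range(m)]   # omega_j

    progress = True
    while undetermined and progress:
        progress = False
        for i in list(undetermined):
            omega_i = col_rows[i]
            vals = residual[omega_i]
            nz = vals[~np.isclose(vals, 0.0)]
            if nz.size < vote_threshold:
                continue
            votes, counts = np.unique(np.round(nz, 10), return_counts=True)
            best = int(np.argmax(counts))
            if counts[best] < vote_threshold:
                continue
            l = omega_i[np.isclose(vals, votes[best])]
            # the rows in l must share no undetermined coefficient other than i
            shared = set.intersection(*(row_cols[j] for j in l)) & undetermined
            if shared != {i}:
                continue
            # infer x_i from the common vote and peel it off the residual
            x_hat[i] = votes[best]
            residual[omega_i] -= votes[best]        # P[:, i] is binary
            # Algorithm 10 marks every coefficient spanned by the rows in l as
            # determined (z <- z union omega_l); the others remain zero.
            undetermined -= set.union(*(row_cols[j] for j in l))
            progress = True
    return x_hat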
This process continues until all non-zero coefficients are recovered. Note that for a vector x with a single non-zero entry (∥x∥_0 = 1) and y = P x, the solutions to (6.1) for p = 0 and p = 1 are equal by the triangle inequality. Also note that for any signal x and two disjoint subsets a ⊆ [n] and a^c := [n]\a, one has ∥x_a∥_p + ∥x_{a^c}∥_p = ∥x∥_p both for p = 0 and p = 1 (see lines 3 and 4 in Algorithm 11). Combining this observation with the equality of the solutions of (6.1) for p = 0 and p = 1, one can conclude that in Algorithm 11 the solutions for p = 0 and p = 1 are exactly the same. Hence, typical combinatorial approaches to CS (Algorithm 10) can be recast as a recursive ℓ1 minimizer.

The success of the aforementioned recursive algorithm depends on:
• The ability of step 3 in Algorithm 10 to find isolated non-zeros.
• The quality of the projection matrix P in terms of recoverability.

More specifically, the first condition demands that the decoder at the i-th iteration can truly find θ_i = ω_l, a subset of signal coefficients containing only one non-zero (∥x_{θ_i}∥_0 = 1). Meanwhile, the second requirement states that at least one isolated non-zero must exist throughout the recovery process, so that the process does not halt before perfect recovery. It has been shown [37, 38] that certain sparse matrices, for instance the adjacency matrices of high-quality expander graphs, satisfy the second condition. In the following, we assume that the employed sparse projection matrix satisfies the second condition, meaning that if isolated non-zeros are correctly identified in step 3 of Algorithm 10, then the recovery process will not halt. To that end, and to keep our analysis simple, we adopt the SERP matrices introduced in the previous chapter; that is, each column of P is non-zero in exactly rd = O(log n) random indices for a constant r ≥ 1.

Having the guarantee of recoverability, our concern is now to satisfy the first condition, i.e. correctly finding isolated non-zeros. The most common approach for finding isolated non-zeros (in the case of a binary sparse projection matrix) is to apply a voting mechanism (see [68, 35, 38, 37]). More specifically, assume that there exists a set of sample indices l ⊆ [m] such that all of them vote the same value and they span the same single coefficient, i.e.

∃ l ⊆ [m] : ∀ p, q ∈ l : y_p = y_q ,   ∩_{j∈l} ω_j = {i}    (6.2)

Then typical combinatorial algorithms infer that x̂_i equals that vote and x̂_j = 0 for j ∈ ω_l\{i}. Such a conclusion may imply certain presumptions about the underlying signal. For example, it has been assumed in [35] that no two subsets θ_1, θ_2 ⊆ [n] exist such that ∥x_{θ_1}∥_1 ≈ ∥x_{θ_2}∥_1; other methods rely on other assumptions, such as non-negativity of the signal. Consequently, the applicability of the aforementioned approaches is restricted to signals satisfying the corresponding conditions. On the other hand, due to the non-binary nature of the proposed sparse projection matrix P, the voting mechanism is not applicable in the case of P.

We introduced the notions of “alignment” and “isolation” in Chapter 2. In this chapter, we show that the method of voting is a special case of those notions (alignment and isolation) in the case of binary matrices. Furthermore, we present necessary and sufficient conditions for detecting isolated non-zeros under real-valued sparse projection matrices. As before, assume P ∈ R^{m×n} is a matrix where each of its columns is non-zero in exactly rd = O(log n) random indices.
Also assume that P is an instance of the SERP matrices. Since P corresponds to the adjacency matrix of a high-quality expander graph (with high probability) [37] and the analysis of [37] is not based on the values of P, the recoverability condition holds for P. Consequently, here we only discuss lines 3 and 4 in Algorithm 11: we show how one can find subsets of signal coefficients containing only one non-zero by examining the compressive samples. To that end, it is useful to briefly review the notions of isolation and alignment.

We say “x_i is isolated in the sample indices l ⊆ [m]” if ∥x_{ω_l}∥_0 = 1 and x_i ≠ 0. In other words, x_i is the only non-zero signal coefficient (among the indices ω_l) spanned by the samples y_l. Clearly, since P_{:,i} is non-zero only in the indices Ω_i, we have l ⊆ Ω_i. For q ⊆ [m] we say “y_q is aligned with the j-th column of P” if ∀i ∈ q : P_{i,j} ≠ 0 and ∃α ≠ 0 : y_q = α P_{q,j}. Note that if P is binary (i.e. P_{q,j} is an all-one vector), then the definition of alignment reads as “there exists a set of samples (y_q), all voting the same value α”. This is exactly the voting mechanism employed in combinatorial algorithms, which verifies our claim that the voting mechanism is in fact an alignment in the special case of binary matrices.

Assume that x_i ≠ 0 is isolated in the sample indices l ⊆ Ω_i ⊆ [m]. Then

y_l = P_{l,ω_l} x_{ω_l} = x_i P_{l,i}    (6.3)

In words, if x_i is isolated in the sample indices l, then y_l is aligned with the i-th column of P; thus alignment is a necessary condition for isolation. Now, if alignment is also a sufficient condition for isolation, then isolated non-zero coefficients in the signal can be found by looking for alignments of the samples with the columns of P. For now, assume that isolation and alignment are equivalent. Then steps 3 and 4 in Algorithm 11 can be implemented as follows: to see whether x_i is isolated in a set of (unknown) samples, one considers P_{Ω_i,i} and looks for q ⊆ Ω_i such that y_q = α P_{q,i}. If such a q exists, then (6.3) can be applied to infer x̂_i = α. In the rest of this section, we identify the conditions under which alignment is indeed a sufficient condition for isolation.

Here, it is important to highlight the impact of the size of the set l ⊆ Ω_i ⊆ [m] in our analysis, since in the extreme case |l| = 1 in (6.3) there always exists a meaningless alignment. Recall that each column of P is non-zero in exactly rd = O(log n) entries. In [37], it has been shown that in the case of a sparse binary projection matrix based on the adjacency matrix of a left d-regular random graph, there exist at least τ ≥ 1 + d/2 samples which span only one (and the same) non-zero at every stage of decoding. Hence, here we take |l| = τ > rd/2 = O(log n) and ignore alignments of size smaller than rd/2 in the decoder.

As before, assume y_l is aligned with the i-th column of P, where |l| > rd/2. Since by definition y_l = P_{l,ω_l} x_{ω_l} and ∥y_l∥_2 > 0, there are two possible cases: (a) ∥x_{ω_l}∥_0 = 1 or (b) ∥x_{ω_l}∥_0 > 1. In the first case, since y_l is non-zero in all entries (∥y_l∥_0 = |l|) and for all j ∈ ω_l\{i} we have ∥P_{l,j}∥_0 < |l| (this follows from the expander property of P), one can conclude that x_i is the only non-zero among the indices ω_l and its value can easily be computed by (6.3). In words, in the first case, alignment is a necessary and sufficient condition for isolation. Unfortunately, a similar argument generally does not hold for the second case (∥x_{ω_l}∥_0 > 1).
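The alignment test just described can be written down directly; a minimal sketch is given below, assuming the exact (noiseless) definition and a minimum subset size as suggested above. The function name and the tolerance parameter are ours. When P is binary, the ratios below are just the residual sample values themselves, so the test reduces to the voting rule.

import numpy as np

def find_alignment(P, y, i, min_size, tol=1e-9):
    # Look for a subset q of Omega_i on which y_q is a non-zero multiple alpha
    # of P_{q,i}; returns (q, alpha) or None.  This is the exact, noiseless
    # test -- the epsilon-alignment of Section 6.3 relaxes the equality.
    omega_i = np.flatnonzero(P[:, i])
    ratios = y[omega_i] / P[omega_i, i]          # candidate values of alpha
    for alpha in ratios:
        if abs(alpha) < tol:
            continue
        q = omega_i[np.isclose(ratios, alpha, atol=tol)]
        if q.size >= min_size:
            return q, alpha                      # y_q = alpha * P_{q,i}
    return None

# If an alignment is found, (6.3) gives the estimate x_hat_i = alpha, and the
# corresponding coefficients can be peeled off exactly as in Algorithm 10.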
Nevertheless, here we show that if the parameter r in the SERP matrices is sufficiently large, then the second case never happens, with very high probability. To that end, we follow these steps: (a) for any subset of samples l ⊆ [m] with |l| ≥ 1 + d/2 = O(log n), we first find Q = max{∥x_{ω_l}∥_0}, the maximum number of non-zeros which can be spanned (with high probability) by the samples l; (b) we then find the value of r such that every Q columns of P_{l,ω_l} are linearly independent. Finally, by elementary arguments, one can show that it is then impossible to have ∥x_{ω_l}∥_0 > 1 and y_l aligned with the i-th column of P.

For a fixed subset of sample indices l, define the binary random variable X_i as

X_i = 1 if x_i ≠ 0 and i ∈ ω_l, and X_i = 0 otherwise.    (6.4)

Note that Q = ∑_{i=1}^{n} X_i. Now, since the probability that x_i ≠ 0 is independent of the probability of the event i ∈ ω_l, we have

Pr(X_i = 1) = (k/n) ( 1 − \binom{m−|l|}{d} / \binom{m}{d} )    (6.5)

For now, assume that r = 1; later we derive how large r should be set. Assuming k (and hence m) is much larger than one, one can use the approximation 1/(m − d + 1) ≈ 1/m. Using the inequality 1 + α ≤ e^α, it can be shown that

µ = E[Q] = n Pr(X_i = 1) ≤ k (1 − exp(−|l|/m))^d    (6.6)

Recalling that d = O(log n) = c_1 log n, d/2 < |l| < d and m = O(k log n) = c_2 k log n, it can easily be derived that µ = O(1). Now it only remains to bound the deviation of Q from µ. Fortunately, since Q is a sum of Bernoulli random variables, good concentration inequalities are available for it. For instance, using the Chernoff bound for Poisson trials, it can easily be shown that for δ ≥ 6.389 (which makes log(1 + δ) > 2) we have

Pr(Q ≥ µ + δ β log n) ≤ 1/n^β    (6.7)

Now, if the spark [43] of P_{l,ω_l} is more than Q, then it is impossible for y_l to be aligned with the i-th column of P while at the same time ∥x_{ω_l}∥_0 > 1. This can be proved by contradiction. More specifically, assume 1 < ∥x_{ω_l}∥_0 ≤ Q and that y_l is aligned with the i-th column of P (i.e. y_l = P_{l,ω_l} x_{ω_l} = α P_{l,i}). Define a new vector x′ where x′_i = x_i − α and x′_j = x_j for j ∈ ω_l\{i}. Then P_{l,ω_l} x′ = 0 and ∥x′∥_0 ≤ Q, which contradicts the presumption that the spark of P_{l,ω_l} is more than Q. In summary, if the parameter r in the SERP matrices is tuned such that (6.7) is satisfied, then alignment translates into isolation with high probability.

6.3 Noise

It seems that since the w-RIP holds for SERP matrices and typical combinatorial decoding algorithms are in fact fast implementations of BP/linear programming [17] (minimizing the ℓ1 norm of the solution subject to a set of equations), such a combinatorial algorithm should be robust when the samples are noisy. A naive justification follows three steps: (a) the decoding algorithm is trying to find the sparsest solution, and for each sub-problem it never makes a mistake; (b) using the triangle inequality, the solution found for that sub-problem has the minimum ℓ1 norm; and (c) applying induction, the algorithm should find all non-zero coefficients correctly. Since the w-RIP is a sufficient condition for the robustness of BP (at least for an arbitrary but fixed signal), this suggests that our algorithm should be robust in the presence of noise. On the other hand, we face another challenge when the samples are contaminated by noise: no perfect alignment can then be found. This problem might be solved by allowing ϵ-alignment.
We say that a vector h is ϵ-aligned with the i-th column of P if

| cos^{−1}( |P_{:,i}^T h| / (∥P_{:,i}∥_2 ∥h∥_2) ) | ≤ ϵ    (6.8)

Since the argument of cos^{−1} is non-negative and cos^{−1} is strictly decreasing, one may consider only the argument inside the parentheses when using the notion of ϵ-alignment in the proofs. As noted before, the RIP translates into the fact that sub-matrices of the matrix behave like an orthonormal projection. Suppose y = P x + e where e is the noise vector, P has a small isometry constant δ_k and x is non-zero at index i. Then we can conclude that the columns of P are so far apart from each other that the cosine of the angle between the vector h = y_{Ω_i} and the vector P_{Ω_i,i} attains its maximum, as long as the noise energy ∥e∥_2 does not dominate the sample energy ∥h∥_2. Thus, we can still find the isolated non-zeros even in the presence of noise.

Assume that we are given noisy samples y = P x + e. Furthermore, suppose that a subset of samples l spans only one non-zero coefficient x_i ≠ 0 (∥x_{ω_l}∥_0 = 1 and i ∈ ω_l). The first question to be answered is how the noise distorts the angle between the pure (noiseless) samples P_{l,ω_l} x_{ω_l} = x_i P_{l,i} and P_{l,i}, compared with the angle between the noisy samples P_{l,ω_l} x_{ω_l} + e_l and P_{l,i}. We also need to find a lower bound L on |P_{:,i}^T y_l| / (∥P_{:,i}∥_2 ∥y_l∥_2) when there is an isolation, and an upper bound U on the same term when there is no isolation. If U < δ < L for a constant δ (a function of ϵ), then the notion of ϵ-alignment can be applied in the case of noisy samples. Providing a rigorous proof is left as future work; in the rest of this section, however, we highlight some directions and challenges along that path.

A statistical argument could be applied here, in which one derives the distribution of the numerator of (6.8). Note that when an isolation occurs, the numerator inside the cos^{−1} function is

P_{l,i}^T y_l = x_i ∥P_{l,i}∥_2^2 + P_{l,i}^T e_l    (6.9)

The distribution of the first term, ∥P_{l,i}∥_2^2, is simply chi-square; the distribution of the second term, P_{l,i}^T e_l, however, is chi. Since two different distributions are added here, finding the distribution of P_{l,i}^T y_l would be difficult. There are two possible remedies for this challenge: (a) using the method of moment generating functions, or (b) starting from the simple assumption that the noise e itself is normally distributed, which makes the second term also chi-square. Even in the second case, however, we have to deal with the problem of weighted chi-squares. One might think that approximations similar to those in the previous chapter could be utilized, but because of other approximations that might be applied later, this step should be taken with caution.

For the denominator, we have to find the distribution of the product ∥P_{l,i}∥_2 ∥y_l∥_2. Although the distribution of each factor can be derived fairly easily, finding the distribution of the product could be cumbersome. The problem becomes even more challenging when we want to calculate the distribution of the random variable obtained by dividing the numerator by the denominator, although a remedy here could be to utilize the w-RIP inequalities.

To find the upper bound U, the distributions of the analogous terms when there is no isolation have to be calculated. For instance, when the samples l span more than one non-zero, then

P_{l,i}^T y_l = x_i ∥P_{l,i}∥_2^2 + ∑_{j∈ω_l\{i}} x_j P_{l,i}^T P_{l,j} + P_{l,i}^T e_l    (6.10)

Again, the first part leads to a weighted chi-square distribution, while the sum of the cross terms P_{l,i}^T P_{l,j} is distributed as a sum of normal products.
Despite the fact that closed-form expressions exist for the distribution of normal-product random variables, adding those terms could be challenging. Again, however, the method of moment generating functions looks attractive, since it separates such a sum into a product of terms, for each of which there exist either good approximations or even closed-form expressions.

BIBLIOGRAPHY

[1] Emmanuel Candès and Justin Romberg, "Quantitative robust uncertainty principles and optimally sparse decompositions", Foundations of Computational Mathematics, 6(2), pp. 227-254, April 2006.
[2] David Donoho, "Compressed sensing", IEEE Trans. on Information Theory, 52(4), pp. 1289-1306, April 2006.
[3] Emmanuel Candès, "The restricted isometry property and its implications for compressed sensing", Compte Rendus de l'Academie des Sciences, Paris, Series I, 346, pp. 589-592, 2008.
[4] Emmanuel Candès and Justin Romberg, "Practical signal recovery from random projections", Preprint, Jan. 2005.
[5] Emmanuel Candès, Justin Romberg, and Terence Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information", IEEE Trans. on Information Theory, 52(2), pp. 489-509, February 2006.
[6] D. Donoho and J. Tanner, "Neighborliness of randomly projected simplices in high dimensions", Preprint, Mar. 2005.
[7] Emmanuel Candès and Terence Tao, "Near optimal signal recovery from random projections: Universal encoding strategies?", IEEE Trans. on Information Theory, 52(12), pp. 5406-5425, December 2006.
[8] David Donoho and Yaakov Tsaig, "Extensions of compressed sensing", Signal Processing, 86(3), pp. 533-548, March 2006.
[9] Emmanuel Candès and Terence Tao, "Decoding by linear programming", IEEE Trans. on Information Theory, 51(12), pp. 4203-4215, December 2005.
[10] E. Candès and T. Tao, "Error correction via linear programming", Found. of Comp. Math., 2005, submitted.
[11] Jarvis Haupt, Waheed U. Bajwa, Gil Raz, and Robert Nowak, "Toeplitz compressed sensing matrices with applications to sparse channel estimation", Preprint, 2008.
[12] Dror Baron, Shriram Sarvotham, and Richard G. Baraniuk, "Bayesian compressive sensing via belief propagation", Preprint, 2008.
[13] A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images", SIAM Review, 51(1), pp. 34-81, February 2009.
[14] Dror Baron, Marco F. Duarte, Michael B. Wakin, Shriram Sarvotham, and Richard G. Baraniuk, "Distributed compressive sensing", Preprint, 2005.
[15] Shriram Sarvotham, Dror Baron, and Richard Baraniuk, "Measurements vs. bits: Compressed sensing meets information theory", Proc. Allerton Conference on Communication, Control, and Computing, Monticello, IL, September 2006.
[16] Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin, "A simple proof of the restricted isometry property for random matrices", Constructive Approximation, 28(3), pp. 253-263, December 2008. [Formerly titled "The Johnson-Lindenstrauss lemma meets compressed sensing".]
[17] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit", Technical Report 479, Department of Statistics, Stanford University, May 1995.
[18] R. Tibshirani, "Regression shrinkage and selection via the lasso", J. Royal Statist. Soc. B, 58(1), pp. 267-288, 1996.
[19] Thomas Blumensath and Mike E. Davies, "Gradient pursuits", IEEE Trans. on Signal Processing, 56(6), pp.
2370-2382, June 2008.
[20] Ingrid Daubechies, Massimo Fornasier, and Ignace Loris, "Accelerated projected gradient method for linear inverse problems with sparsity constraints", Preprint, 2007.
[21] T. Blumensath and M. E. Davies, "Stagewise weak gradient pursuits. Part I: Fundamentals and numerical studies", Preprint, 2008.
[22] T. Blumensath and M. E. Davies, "Stagewise weak gradient pursuits. Part II: Theoretical properties", Preprint, 2008.
[23] Rahul Garg and Rohit Khandekar, "Gradient descent with sparsification: An iterative algorithm for sparse recovery with restricted isometry property", ICML 2009, Montreal, Canada.
[24] Mário A. T. Figueiredo, Robert D. Nowak, and Stephen J. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems", IEEE Journal of Selected Topics in Signal Processing: Special Issue on Convex Optimization Methods for Signal Processing, 1(4), pp. 586-598, 2007.
[25] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries", IEEE Transactions on Signal Processing, December 1993, pp. 3397-3415.
[26] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition", Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers, Nov. 1993.
[27] Thong T. Do, Lu Gan, Nam Nguyen, and Trac D. Tran, "Sparsity adaptive matching pursuit algorithm for practical compressed sensing", Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, California, October 2008.
[28] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples", Preprint, 2008.
[29] Deanna Needell and Roman Vershynin, "Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit", Preprint, 2007.
[30] David L. Donoho, Yaakov Tsaig, Iddo Drori, and Jean-Luc Starck, "Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit", Preprint, 2007.
[31] Piotr Indyk, "Explicit constructions for compressed sensing of sparse signals", Symp. on Discrete Algorithms, 2008.
[32] R. Berinde, A. C. Gilbert, P. Indyk, H. Karloff, and M. J. Strauss, "Combining geometry and combinatorics: A unified approach to sparse signal recovery", Preprint, 2008.
[33] Radu Berinde and Piotr Indyk, "Sparse recovery using sparse random matrices", Preprint, 2008.
[34] Ronald A. DeVore, "Deterministic constructions of compressed sensing matrices", J. of Complexity, 23, pp. 918-925, August 2007.
[35] Shriram Sarvotham, Dror Baron, and Richard Baraniuk, "Sudocodes - Fast measurement and reconstruction of sparse signals", Proc. IEEE Int. Symposium on Information Theory (ISIT), Seattle, Washington, July 2006.
[36] Weiyu Xu and Babak Hassibi, "Efficient compressive sensing with deterministic guarantees using expander graphs", IEEE Information Theory Workshop, Lake Tahoe, September 2007.
[37] Sina Jafarpour, Weiyu Xu, Babak Hassibi, and Robert Calderbank, "Efficient compressed sensing using high-quality expander graphs", to appear in IEEE Trans. on Information Theory, 2009.
[38] M. Amin Khajehnejad, Alexandros G. Dimakis, Weiyu Xu, and Babak Hassibi, "Sparse recovery of positive signals with minimal expansion", Preprint, 2009.
[39] Abdolreza Abdolhosseini Moghadam and Hayder Radha, "Hybrid compressed sensing", IEEE MMSP 2010.
[40] Abdolreza Abdolhosseini Moghadam and Hayder Radha, "Complex sparse projections for compressed sensing", accepted for publication in the Conference on Information Sciences and Systems (CISS10), Johns Hopkins University, Baltimore, MD, USA.
[41] Abdolreza Abdolhosseini Moghadam and Hayder Radha, "Complex randomness-in-structured projections for compressed sensing", in Proceedings of the IEEE International Conference on Image Processing (ICIP09), Egypt.
[42] Abdolreza Abdolhosseini Moghadam and Hayder Radha, "Practical compressed sensing with log-of-prime projections", in Proceedings of the Conference on Information Sciences and Systems (CISS09), Johns Hopkins University, Baltimore, MD, USA, March 18-20, 2009.
[43] Michael Elad, "Optimized projections for compressed sensing", IEEE Trans. on Signal Processing, 55(12), pp. 5695-5702, December 2007.
[44] Venkat Chandar, "A negative result concerning explicit matrices with the restricted isometry property", Preprint, 2008.
[45] http://sparselab.stanford.edu
[46] D. Donoho, "High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension", Preprint, Jan. 2005.
[47] http://www.mathpages.com/home/kmath199.htm
[48] Martin Raab and Angelika Steger, "'Balls into bins' - a simple and tight analysis", Lecture Notes in Computer Science, vol. 1518, Springer, Berlin, 1998.
[49] Douglas R. Stinson, "Combinatorial Designs: Construction and Analysis", Springer-Verlag, New York, 2004.
[50] M. Luby, "LT codes", The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002.
[51] Stéphane Mallat, "A Wavelet Tour of Signal Processing", 2nd Edition, Academic Press, 1999, ISBN 0-12-466606-X.
[52] M. N. Do and M. Vetterli, "The contourlet transform: an efficient directional multiresolution image representation", IEEE Trans. on IP, 14(12), pp. 2091-2106.
[53] E. Candès and D. Donoho, "Curvelets - a surprisingly effective nonadaptive representation for objects with edges", in: A. Cohen, C. Rabut and L. Schumaker, Editors, Curves and Surface Fitting: Saint-Malo 1999, Vanderbilt University Press, Nashville, 2000, pp. 105-120.
[54] George C. Clark, Jr. and J. Bibb Cain, "Error-Correction Coding for Digital Communications", New York: Plenum Press, 1981, ISBN 0-306-40615-2.
[55] Shu Lin and Daniel J. Costello, Jr., "Error Control Coding: Fundamentals and Applications", Englewood Cliffs, NJ: Prentice-Hall, 1983, ISBN 0-13-283796-X.
[56] http://www.math.msu.edu/~jhall/classes/codenotes/coding-notes.html
[57] R. G. Gallager, "Low Density Parity Check Codes", Monograph, M.I.T. Press, 1963.
[58] Amin Shokrollahi, "LDPC codes: An introduction", 2003.
[59] Michael Mitzenmacher and Eli Upfal, "Probability and Computing: Randomized Algorithms and Probabilistic Analysis", Cambridge University Press, 2005.
[60] Anne Greenbaum, "Iterative Methods for Solving Linear Systems", Society for Industrial and Applied Mathematics, 1997.
[61] David L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing", Preprint, 2009.
[62] L. Bassalygo and M. Pinsker, "Complexity of an optimum nonblocking switching network without reconnections", Problems of Information Transmission, vol. 9, no. 1, pp. 289-313, 1973.
[63] P. Sarnak, "What is an expander?", Notices of the American Mathematical Society, Volume 51, Number 7.
[64] R. Calderbank, S. Howard, and S. Jafarpour, "Construction of a large class of deterministic sensing matrices that satisfy a statistical isometry property", 2009. [Online].
Available: http://www.dsp.ece.rice.edu/files/cs/strip-final.pdf
[65] Lu Gan, Cong Ling, Thong T. Do, and Trac D. Tran, "Analysis of the statistical restricted isometry property for deterministic sensing matrices using Stein's method", Preprint, 2009.
[66] David L. Donoho, Arian Maleki, and Andrea Montanari, "The noise-sensitivity phase transition in compressed sensing", IEEE Transactions on Information Theory, Oct. 2011, vol. 57, issue 10, pp. 6920-6941.
[67] E. Candès, "A probabilistic and RIPless theory of compressed sensing", IEEE Transactions on Information Theory, Nov. 2011, vol. 57, issue 11, pp. 7235-7254.
[68] R. Berinde and P. Indyk, "Sequential sparse matching pursuit", IEEE Allerton 2009, pp. 36-43.
[69] J. Haupt and R. Nowak, "A generalized restricted isometry property", Preprint, 2007.
[70] H. Feiveson and F. C. Delaney, "The distribution and properties of a weighted sum of chi squares", NASA Technical Note, TN D-4575, May 1968.
[71] S. Gabler and C. Wolff, "A quick and easy approximation to the distribution of a sum of weighted chi-square variables", Statistical Papers, vol. 28, pp. 317-325, 1987.