COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION By Mazin Abdulrasool Hameed A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Electrical Engineering 2012 ABSTRACT COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION By Mazin Abdulrasool Hameed The problem of finding a unique and sparse solution for an underdetermined linear system has attracted significant attention in recent years. In this thesis, we compare two popular algorithms that are used for finding sparse solutions of underdetermined linear systems: Orthogonal Matching Pursuit (OMP) and Least Angle Regression (LARS). We provide an in-depth description of both algorithms. Subsequently, we outline the similarities and differences between them. OMP and LARS solve different optimization problems: OMP attempts to find an approximate solution for the l0-norm minimization problem, while LARS solves the l1-norm minimization problem. However, both algorithms depend on an underlying greedy framework. They start from an all-zero solution, and then iteratively construct a sparse solution until some convergence is reached. By reformulating the overall structure and corresponding analytical expressions of OMP and LARS, we show that many of the steps of both algorithms are almost identical. Meanwhile, we highlight the primary differences between the two algorithms. In particular, based on our reformulation, we show that the key difference between these algorithms is how they update the solution vector at each iteration. In addition, we present some of the salient benefits and shortcomings of each algorithm. Moreover, we exploit parallel processing techniques to speed-up the convergence of the algorithms. ACKNOWLEDGMENTS First of all, I would like to express my deep gratitude to my advisor Professor Hayder Radha for his guidance, inspiration and encouragement. Without his invaluable advice and support, this work would not be possible. In addition, I would like to thank the government of my country (Iraq) represented by ministry of higher education and scientific research for granting me the scholarship to study in the United State and supporting me financially during my study in Michigan State University. Furthermore, I would like to thank my colleague in WAVES lab Mohammad Aghagolzadeh for helping me a lot in this work. Also, I am thankful to my friends Wahhab Albazrqaoe and Mohammed Alqizwini. Their support and encouragement are appreciated. Finally, a special thanks to my parents and my wife for their encouragement and support throughout my study. iii TABLE OF CONTENTS LIST OF TABLES ........................................................................................................................ v LIST OF FIGURES ..................................................................................................................... vi ABBREVIATIONS .................................................................................................................... viii NOTATIONS................................................................................................................................ ix 1 Introduction ............................................................................................................................. 1 1.1 Background ......................................................................................................................... 
1 1.2 Contributions....................................................................................................................... 5 1.3 Thesis Organization ............................................................................................................ 6 2 Greedy Pursuit Algorithms .................................................................................................... 8 2.1 Matching Pursuit (MP) ....................................................................................................... 8 2.2 Orthogonal Matching Pursuit (OMP) ............................................................................... 11 2.3 Stagewise Orthogonal Matching Pursuit (StOMP) ........................................................... 15 3 Least Angle Regression (LARS) .......................................................................................... 17 3.1 Basic LARS Algorithm ..................................................................................................... 17 3.3.1 Derivation of Updated Direction ................................................................................ 20 3.1.2 Derivation of Step Size ............................................................................................... 21 3.2 Modified LARS for solving the LASSO problem ............................................................ 27 4 Comparative Analysis of OMP and LARS ......................................................................... 32 4.1 Algorithmic Steps ............................................................................................................. 32 4.1.1 Reformulating step for updating the active set ........................................................... 35 4.1.2 Reformulating step for updating the solution vector .................................................. 36 4.2 Performance Analysis ....................................................................................................... 41 4.2.1 Convergence time ....................................................................................................... 41 4.2.2 Accuracy ..................................................................................................................... 42 5 Simulation Results and Discussion ...................................................................................... 48 5.1 Reconstructing images from their compressive samples .................................................. 48 5.2 Using parallel processing to speed-up convergence of the algorithms ............................. 55 5.3 Impact of measurement size and sparsity on MSE ........................................................... 58 5.4 Variation of the solution and correlation coefficients over iterations............................... 62 5.5 Study the algorithms performance through three dimensions (3D) examples .................. 67 6 Conclusion and Future Work .............................................................................................. 72 Appendix A .................................................................................................................................. 75 BIBLIOGRAPHY ....................................................................................................................... 78 iv LIST OF TABLES Table 4.1: The algorithmic steps of OMP and LARS................................................................... 
34 Table 4.2: The original step for updating the active set and the reformulated one in OMP and LARS ............................................................................................................................................ 36 Table 4.3: The original step for updating the solution vector and the reformulated one in OMP and LARS...................................................................................................................................... 39 Table 4.4: The algorithmic steps of OMP and LARS after reformulating ................................... 40 v LIST OF FIGURES Figure 1.1: A sparse solution of an underdetermined linear system ............................................... 3 Figure 2.1: The MP algorithm ...................................................................................................... 10 Figure 2.2: The OMP algorithm.................................................................................................... 12 Figure 2.3: OMP approximates the vector 𝑦 by using the columns π‘Ž1, π‘Ž2 in Example 2.1 ......... 14 Figure 2.4: The StOMP algorithm ................................................................................................ 16 Figure 3.1: LARS approximates the vector 𝑦 by using the columns π‘Ž1, π‘Ž2................................. 20 Figure 3.2: The basic LARS algorithm ......................................................................................... 26 Figure 4.1: Columns of the matrix 𝐴 and the measurement vector 𝑦 in Example 4.1 .................. 43 Figure 4.2: The selection and updating steps at the first iteration of OMP and LARS in Example 4.1.The approximation change is obtained via multiplying the matrix 𝐴 by the current solution update vector Ξ”π‘₯𝑑 .......................................................................................................................... 45 Figure 4.3: The selection and updating steps at the second (last) iteration of OMP and LARS in Example 4.1. The approximation change is obtained via multiplying the matrix 𝐴 by the current solution update vector Ξ”π‘₯𝑑 ............................................................................................................ 46 Figure 4.4: The solution coefficients of the reconstructed vector π‘₯ over iterations of OMP and LARS in Example 4.1, and Euclidean norm of error between the original sparse vector π‘₯ and the reconstructed vector π‘₯ after the algorithms terminate .................................................................. 47 Figure 5.1: Reconstruct a patch of an image from its compressive samples by using OMP or LARS ............................................................................................................................................ 50 Figure 5.2: Reconstructed images from their compressive samples by using OMP and LARS with π‘š = 32 and 𝑛 = 64.............................................................................................................. 53 Figure 5.3: Reconstructed images from their compressive samples by using OMP and LARS with π‘š = 16 and 𝑛 = 64.............................................................................................................. 54 Figure 5.4: The convergence time of OMP and LARS against the number of used processors .. 
56 Figure 5.5: The ideal and actual speedup ratios for the OMP algorithm against the number of used processors ............................................................................................................................. 57 Figure 5.6: Average MSE of 100 trials between the original sparse vectors and the reconstructed ones by OMP and LARS as a function of measurement size π‘š for different sparsity π‘˜ .............. 60 vi Figure 5.7: Average MSE of 100 trials between the original sparse vectors and the reconstructed ones by OMP and LARS as a function of sparsity π‘˜ for different measurement size π‘š .............. 61 Figure 5.8: The solution coefficients that correspond to the active columns over iterations of OMP and LARS ............................................................................................................................ 63 Figure 5.9: Absolute correlation of columns of the matrix 𝐴 in some iterations of OMP ............ 65 Figure 5.10: Absolute correlation of columns of the matrix 𝐴 in some iterations of LARS ........ 66 Figure 5.11: The columns of the matrix 𝐴 and the measurement vector 𝑦 in Example 5.1 ......... 68 Figure 5.12: The approximation change (π΄βˆ†π‘₯) at each iteration of OMP and LARS in Example 5.1, where πœ‡ = 0.5124, and the approximation change = 𝐴 Γ— βˆ†π‘₯ .............................................. 69 Figure 5.13: The columns of the matrix 𝐴 and the measurement vector 𝑦 in Example 5.2 ......... 70 Figure 5.14: The approximation change (π΄βˆ†π‘₯) at each iteration of OMP and LARS in Example 5.2, where πœ‡ = 0.8885, and the approximation change = 𝐴 Γ— βˆ†π‘₯ .............................................. 71 vii ABBREVIATIONS CS Compressed Sensing DCT Discrete Cosine Transform LARS Least Angle Regression LASSO Least Absolute Shrinkage and Selection Operator MP Matching Pursuit MSE Mean Square Error OMP Orthogonal Matching Pursuit PSNR Peak Signal to Noise Ratio StOMP Stagewise Orthogonal Matching Pursuit viii NOTATIONS π‘₯ Solution vector which is usually sparse π‘š Size of the measurement vector 𝑦 𝑦 𝑛 π‘˜ 𝐴 π‘Žπ‘– 𝐼 𝐼𝑐 𝑑 π‘Ÿπ‘‘ 𝑐𝑑 π‘₯𝑑 βˆ†π‘₯ 𝑑 𝑑𝑑 𝛾𝑑 πœ†π‘‘ Observation or measurement vector Size of the solution vector π‘₯ Sparsity (number of nonzero entries of the solution vector π‘₯ ) Projection matrix of size π‘š Γ— 𝑛 𝑖 π‘‘β„Ž column of the matrix 𝐴 Active set Inactive set Iteration counter Residual vector at iteration 𝑑 Correlation vector at iteration 𝑑 Current solution vector at iteration 𝑑 Solution update vector at iteration 𝑑 (i.e. βˆ†π‘₯ 𝑑 = π‘₯ 𝑑 βˆ’ π‘₯ π‘‘βˆ’1 ) Updated direction of LARS at iteration 𝑑 Step size of LARS at iteration 𝑑 The largest absolute entry in the correlation vector at iteration 𝑑, and represents the absolute correlation of the active columns in LARS ix Chapter 1 1 Introduction 1.1 Background Various applications in science and nature can be modeled as an underdetermined linear system which is characterized by having fewer equations than unknowns. Therefore, finding a solution for an underdetermined linear system is a central issue for a wide range of areas and applications. Some of these areas include: Compressed Sensing (CS) [1,2,3,4,5], error correction [6,7], minimum distance problems in coding theory [8], and a wide range of inverse problems [9]. In this thesis, we focus on presenting the problem of finding a solution for an underdetermined linear system in the context of the area of compressed sensing. 
In that context, instead of sensing a signal using a high sampling rate (e.g. the Nyquist rate), CS senses a compressive representation of that signal using a number of measurements that is smaller than the signal dimension. As a result, CS generates fewer measurements than traditional signal sampling methods that have been based on the Nyquist criterion for decades. Consequently, the CS framework leads to solving the following underdetermined linear system: 𝑦 = 𝐴π‘₯ (1.1) Here, 𝑦 ∈ 𝑅 π‘š is the measurement vector (compressive samples), 𝐴 is a known π‘š Γ— 𝑛 projection matrix with π‘š < 𝑛, and π‘₯ ∈ 𝑅 𝑛 is an unknown vector that represents the signal that we need to find. In the system of Equation (1.1), the number of unknowns is more than the number of equations. Therefore, there are either infinitely many solutions, or no solution if the vector 𝑦 is 1 not in the space spanned by the columns of the matrix 𝐴. To avoid the case of having no solution, we assume that the matrix 𝐴 is a full rank matrix, and its columns span the whole space 𝑅 π‘š . In practice, and for virtually all real applications, a unique solution for this system is required. It has been well-established under the area of compressed sensing and related literature that if the signal π‘₯ has a sparse representation (includes only a few nonzero entries) in some space (transform domain), it can be recovered uniquely [2,4,10,1,11,3,12]. To find the sparse solution for the underdetermined linear system, the following optimization problem should be solved: min β€–π‘₯β€–0 π‘₯ 𝑠𝑒𝑏𝑗𝑒𝑐𝑑 π‘‘π‘œ 𝐴π‘₯ = 𝑦 (1.2) where β€–π‘₯β€–0 is l0-norm, which represents the number of nonzero entries in a vector π‘₯ (i.e. β€–π‘₯β€–0 = {# π‘œπ‘“ 𝑖 ∢ π‘₯(𝑖) β‰  0}. When β€–π‘₯β€–0 β‰ͺ 𝑛 , π‘₯ is consider to be sparse. Solving the optimization problem characterized by Equation (1.2) yields to a sparse solution that represents the given vector 𝑦 as a linear combination of the fewest columns of the matrix 𝐴. Suppose that the vector π‘₯ has only π‘˜ nonzero entries with π‘˜ < π‘š < 𝑛. To find such π‘˜ nonzero entries of the vector π‘₯, π‘˜ columns of the matrix 𝐴 that best approximate the vector 𝑦 should be identified. For example in Figure 1.1, when shaded columns are recognized, nonzero shaded coefficients of the vector π‘₯ can be computed. The l0-norm minimization problem expressed by Equation (1.2) is an NP-hard problem [13]. To obtain a global solution, we need to examine the feasibility of all 2 𝑛 sparsity patterns of the vector π‘₯. Hence, the computation complexity of solving problem (1.2) is 𝑂(2 𝑛 ), which is not practical when 𝑛 is very large. As a result, some efforts in this area tend to find an approximate solution with tractable complexity by tolerating some error: 2 min β€–π‘₯β€–0 π‘₯ 𝑠𝑒𝑏𝑗𝑒𝑐𝑑 π‘‘π‘œ where πœ– > 0 is a small constant. ‖𝐴π‘₯ βˆ’ 𝑦‖2 < πœ– (1.3) The problem characterized by Equation (1.3) is commonly known as β€œsparse approximation” [13]. π‘š Γ— 𝑛 0 0 * 0 0 0 * * 0 0 0 0 π‘₯ Matrix 𝐴 𝑛 = 𝑦 π‘š * : nonzero Figure 1.1: A sparse solution of an underdetermined linear system Because l0-norm is a nonconvex function, one approach for finding a suboptimal solution of the problem (1.2) is relaxing the l0-norm by an l1-norm which is convex function [14,15,16]: min β€–π‘₯β€–1 π‘₯ 𝑠𝑒𝑏𝑗𝑒𝑐𝑑 π‘‘π‘œ 𝐴π‘₯ = 𝑦 (1.4) In the signal processing literature, the l1-norm minimization problem (1.4) is known as β€œBasis Pursuit” [14]. 
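Before turning to the algorithms, the setting of Equations (1.1)-(1.4) can be made concrete with a short sketch. The following lines (written in Python/NumPy purely for illustration; the simulations in Chapter 5 use MATLAB, and the dimensions and random data here are arbitrary) build an underdetermined system 𝑦 = 𝐴π‘₯ with π‘š < 𝑛, draw a π‘˜-sparse vector π‘₯ with π‘˜ < π‘š, normalize the columns of 𝐴 to unit l2-norm (the normalization assumed in Chapter 2), and report the l0-norm of π‘₯.

    import numpy as np

    rng = np.random.default_rng(0)

    m, n, k = 8, 20, 3                        # fewer equations (m) than unknowns (n), sparsity k < m
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)            # unit l2-norm columns (as assumed in Chapter 2)

    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)       # k-sparse solution vector

    y = A @ x                                 # measurement vector (compressive samples), Equation (1.1)

    print("l0-norm of x:", np.count_nonzero(x))   # equals k, i.e. ||x||_0 << n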
The solution that is obtained by solving problem (1.4) is equivalent to the solution of l0-norm minimization problem (1.2) under some conditions [17,18,19,20,21]. The l1-norm minimization (1.4) is a convex optimization problem, and therefore, it can be solved globally and efficiently by adopting some convex techniques such as the interior point method [22]. 3 Another general direction for finding a sparse solution of the underdetermined linear system (1.1) is to use greedy algorithms [23]. There has been a wide range of greedy algorithms proposed in the literature for finding sparse solutions for the CS problem. One popular family of greedy algorithms, which received a great deal of attention, is the family of greedy pursuit algorithms. Some of these algorithms include: Matching Pursuit (MP) [24], Orthogonal Matching Pursuit (OMP) [25,26,27], Stagewise Orthogonal Matching Pursuit (StOMP) [28], Regularized Orthogonal Matching Pursuit (ROMP) [29], Orthogonal Complementary Matching pursuit (OCMP) [30], Gradient Pursuit (GP) [31], and Compressive Sampling Matching pursuit (CoSaMP) [32]. Another popular framework, which can be considered greedy in nature, for solving the underdetermined system (1.1) is Least Angle Regression (LARS) [33]. Before elaborating further on both of these families of algorithms, namely matching pursuit and least angle regression, we briefly highlight the overall strategy used by these and other greedy algorithms. First, it is worth emphasizing that one of the primary motivations for pursuing greedy algorithms is that they are faster than convex techniques. However, greedy algorithms do not always yield as an accurate solution as convex techniques do. Being iterative, a greedy algorithm usually makes the choice that appears the best at that particular iteration of the algorithm. In essence, a greedy algorithm makes a locally optimal choice at each iteration with the hope that this choice will lead to a globally optimal solution. In other words, these algorithms iteratively make one greedy choice after another until the final solution is reached. Thus, the choice made by a greedy algorithm may depend on past choices, but not on future choices. Equally important, greedy algorithms, in general, never change their pervious choices that are made so far. 4 To find a solution for the optimization problem (1.2), greedy algorithms typically begin from an all-zero solution, and then iteratively build a sparse solution by selecting a set of columns from the matrix 𝐴 that best approximates the vector 𝑦. Furthermore, such algorithms usually update the solution π‘₯ at each iteration in a way that depends on the selected columns up to that point. This update leaves a residual signal, which represents the part of the measurement vector 𝑦 that have not been used to recover the solution vector (up to that point); and hence, this residual will be used at the next iteration. These steps are repeated until the current residual falls below a certain very small value. 1.2 Contributions In this thesis, we focus on analyzing and comparing the two, arguably most popular greedy algorithms that are used for finding a sparse solution of an underdetermined linear system: Orthogonal Matching Pursuit (OMP) and Least Angle Regression (LARS). OMP approximates the l0-norm minimization problem (1.2), while LARS solves the l1-norm minimization problem (1.4). 
Although they solve different optimization problems, we show that both OMP and LARS have almost identical steps after reformulating the underlying analytical expressions of these algorithms. The significant difference between them is how they update the solution vector. Even then, we illustrate some of the subtle similarities and differences between their update strategies; and try to shed some light on their strengths and weaknesses in that respect. More specifically, our key contributions include: β€’ Providing a comparative analysis of OMP and LARS in terms of how they find a sparse solution of an underdetermined linear system. Comparing such popular and widely used 5 algorithms is important to their understanding that could lead to better and more efficient implementations. β€’ Providing a thorough insight and detailed derivation of computing the updated direction vector and step size parameter that are used in LARS. Prior work, including the original LARS paper [33] did not provide the same level and detailed analysis that is presented in this thesis. β€’ Reformulating some algorithmic steps of OMP and LARS to discover similarities and differences between them. β€’ Presenting some of the benefits and shortcomings of OMP and LARS. β€’ Simulating the comparative analysis of OMP and LARS through different examples covering a wide range of cases. 1.3 Thesis Organization The rest of the thesis is organized as follows: β€’ In chapter 2, we review three greedy pursuit algorithms that are used for finding a sparse solution of an underdetermined linear system, and concentrate on explaining the OMP algorithm. We start from the basic MP algorithm, which is considered the ancestor of OMP. Then, we describe OMP in detail. We end the chapter by explaining StOMP, which is a modified version of OMP. β€’ In chapter 3, we review how LARS can be used for finding a sparse solution of an underdetermined linear system [34]. First, we describe the basic LARS algorithm [33] and the derivations of its main steps. Then, we explain how LARS can be modified to solve the well- 6 known Least Absolute Shrinkage and Selection Operator (LASSO) problem [35], which can be defined as a constrained version of ordinary least square for solving the l1-norm minimization problem. β€’ In chapter 4, we compare OMP and LARS in terms of their algorithmic steps and performance. We reformulate some steps of OMP and LARS to discover the similarities and differences between them. To study the performance of OMP and LARS, we compare the convergence time and solution accuracy in each algorithm. β€’ In chapter 5, we simulate the comparative analysis of OMP and LARS using MATLAB. In general, we go through different examples to demonstrate the similarities and differences between OMP and LARS from various perspectives. First, we employ OMP and LARS to reconstruct some images from their compressive samples. Afterward, we exploit parallel processing techniques to speed-up convergence of the algorithms, and observe the efficiency of using different number of processors. Next, we examine the performance of OMP and LARS in terms of mean square error (MSE) as a function of the measurement size π‘š and the sparsity π‘˜. To study the difference in updating process of OMP and LARS, we examine and plot the coefficients of the solution and correlation vectors over iterations. Note that the correlation vector represents the correlations of columns of the matrix 𝐴 with current residual vector π‘Ÿ. 
At last, we generate two different examples, and utilize OMP and LARS to reconstruct the sparse vector in each example. At each iteration of the algorithms, we plot the updating process in three dimensions (3D) view. β€’ Finally, in chapter 6, we state the conclusion of this thesis and some potential future work. 7 Chapter 2 2 Greedy Pursuit Algorithms Greedy pursuit algorithms attempt to find an approximate solution for the l0-norm minimization problem (1.2). In this chapter, we review three greedy pursuit algorithms that are used for finding a sparse solution of an underdetermined linear system, and concentrate on explaining the OMP algorithm. First, we explain the basic MP algorithm which is considered the ancestor of OMP. Then, we describe OMP in detail. We end this chapter by explaining StOMP, which is a modified version of OMP. 2.1 Matching Pursuit (MP) Matching Pursuit (MP) [24] is a basic iterative greedy algorithm that searches for a sparse solution of an underdetermined linear system. The fundamental vector in MP is the residual vector π‘Ÿ ∈ 𝑅 π‘š , which represents the remaining part of the measurement vector 𝑦 after updating the solution vector. Henceforth, we assume that the columns of the matrix 𝐴 are normalized to have unit l2-norm (i.e. β€–π‘Ž 𝑖 β€–2 = 1, 𝑖 = {1,2, … . , 𝑛}, where π‘Ž 𝑖 is the 𝑖 π‘‘β„Ž column of the matrix 𝐴). MP starts from an all zero solution, and initializes the residual with the measurement vector 𝑦 (i.e. π‘Ÿ0 = 𝑦). Then at each iteration 𝑑, MP computes the correlation vector by multiplying columns of the matrix 𝐴 with the residual vector π‘Ÿ from the previous iteration as follows: 𝑐 𝑑 = 𝐴 𝑇 π‘Ÿ π‘‘βˆ’1 (2.1) Next, it selects a column from the matrix 𝐴 that is highly correlated with the current residual π‘Ÿ. Indeed, the MP algorithm selects a column from the matrix 𝐴 that corresponds to the entry in the 8 vector 𝑐 𝑑 that has the largest magnitude (i.e. the largest entry in the vector 𝑐 𝑑 is determined by considering the absolute value of each entry in the vector 𝑐 𝑑 ). This can be expressed by the following equation: 𝑖 = arg max |𝑐 𝑑 (𝑗)| (2.2) π‘₯ 𝑑 (𝑖) = π‘₯ π‘‘βˆ’1 (𝑖) + 𝑐 𝑑 (𝑖) (2.3) where 𝑖 is the index of the selected column. 1≀j≀n Afterward, the 𝑖 π‘‘β„Ž solution coefficient π‘₯(𝑖) associated with the selected column is updated: The last step of MP is computing the current approximation to the vector 𝑦 via multiplying the matrix 𝐴 by the current solution vector π‘₯ 𝑑 , and finding a new residual vector by subtracting the vector 𝑦 from the current approximation: π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 (2.4) The residual vector π‘Ÿ 𝑑 will be used in the subsequent iteration (𝑑 + 1) to calculate a new correlation vector 𝑐 𝑑+1, and select a new column. These steps are repeated until the norm of the residual π‘Ÿ 𝑑 falls below a certain very small value (πœ–). The MP algorithm can be summarized in Figure 2.1. 9 β€’ Input: the vector 𝑦 ∈ 𝑅 π‘š , the matrix 𝐴 ∈ 𝑅 π‘šΓ—π‘› , and the termination threshold for the residual norm πœ–. β€’ Output: the sparse vector π‘₯ ∈ 𝑅 𝑛 . β€’ Task: approximate the vector 𝑦 by using the fewest columns of the matrix 𝐴. 1. Initialization: π‘₯0 = 0, π‘Ÿ0 = 𝑦, 𝑑 = 1 2. Compute the correlation vector: 𝑐 𝑑 = 𝐴 𝑇 π‘Ÿ π‘‘βˆ’1 3. Find a column index of the matrix 𝐴 that is best correlated with the current residual vector. 𝑐 𝑑: This can be achieved by determining the index of the largest absolute entry in the vector 𝑖 = arg max |𝑐 𝑑 (𝑗)| 1≀𝑗≀𝑛 4. 
Update the 𝑖th solution coefficient: π‘₯ 𝑑 (𝑖) = π‘₯ π‘‘βˆ’1 (𝑖) + 𝑐 𝑑 (𝑖) 5. Compute the new residual vector : π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 6. If β€–π‘Ÿ 𝑑 β€–2 < πœ–, terminate and return π‘₯ = π‘₯ 𝑑 as the final solution. Else, increase the iteration counter: 𝑑 = 𝑑 + 1 and return to step 2 Figure 2.1: The MP algorithm The MP algorithm is simple and intuitive for finding a sparse solution of an underdetermined linear system. However, it suffers from slow convergence because it requires a large unbounded number of iterations to find a solution. The computational complexity of the MP algorithm is 𝑂(π‘šπ‘›π‘‡), where 𝑇 is the number of iterations that are required for MP to coverage to the final solution. 10 2.2 Orthogonal Matching Pursuit (OMP) In many applications, MP is not practical since its complexity increases linearly with the number of required iterations (𝑇). Therefore, MP was revised to limit the number of required iterations by adding an orthogonalization step. The modified MP is known as Orthogonal Matching Pursuit (OMP) [25,26,27]. OMP inherits many steps from MP. The primary modification that is made by OMP is using a least square formula to achieve better approximation to the measurement vector 𝑦 over the selected columns. OMP keeps indices of the selected columns in a set called the active set 𝐼. Therefore, the selected columns are called the active columns. Like MP, OMP starts from an all-zero solution, and initializes the residual to the measurement vector 𝑦. At each iteration, OMP selects a column from the matrix 𝐴 that is best correlated with the residual vector π‘Ÿ. Then, OMP appends the index of the selected column to the active set. The next step is to find the active entries (the entries that correspond to the active set 𝐼) of the solution vector π‘₯ by solving the following least square problem over the active columns: π‘₯ 𝑑 (𝐼) = arg min‖𝑦 βˆ’ 𝐴 𝐼 𝑣‖2 2 𝑣 (2.5) Here, 𝐴 𝐼 is a sub-matrix that is formed by the active columns, and π‘₯(𝐼) is a sub-vector which holds the active entries of the vector π‘₯. By using well-known linear algebra techniques, problem (2.5) can be solved by projecting the vector 𝑦 onto the space spanned by the active columns. This can be achieved by solving for π‘₯ 𝑑 (𝐼) in the following equation: 𝐴 𝐼𝑇 𝐴 𝐼 π‘₯ 𝑑 (𝐼) = 𝐴 𝐼𝑇 𝑦 (2.6) Note that Equation (2.6), which is widely known as the normal equation, represents a linear system with a unique solution. 11 Then, and as in the MP algorithm, a new residual vector is computed using Equation (2.4). These steps are repeated until the norm of the current residual falls below a certain very small value πœ–. When OMP terminates, the vector 𝑦 is spanned by the set of active columns. More importantly, OMP ensures that the residual vector π‘Ÿ is orthogonal to all active columns at each iteration. Consequently, the correlations of the active columns will be zeros at the next iteration. Therefore in OMP, no column is selected twice, and the active set grows linearly with the iteration counter. OMP can be summarized in Figure 2.2. β€’ Input: the vector 𝑦 ∈ 𝑅 π‘š , the matrix 𝐴 ∈ 𝑅 π‘šΓ—π‘› , and the termination threshold for the residual norm πœ–. β€’ Output: the sparse vector π‘₯ ∈ 𝑅 𝑛 . β€’ Task: approximate the vector 𝑦 by using the fewest columns of the matrix 𝐴. 1. Initialization: π‘₯0 = 0, π‘Ÿ0 = 𝑦, 𝑑 = 1, 𝐼 = βˆ… 2. Compute the correlation vector: 𝑐 𝑑 = 𝐴 𝑇 π‘Ÿ π‘‘βˆ’1 3. Find a column index of the matrix 𝐴 that is best correlated with current residual vector. 
𝑐 𝑑: This can be achieved by determining the index of the largest absolute entry in the vector 𝑖 = arg max|𝑐 𝑑 (𝑗)| c π‘—βˆˆπΌ where 𝐼 c is the inactive set (the set have indices of columns of the matrix 𝐴 that are not in the 4. Add 𝑖 to the active set: 𝐼 = 𝐼 βˆͺ {𝑖} active set 5. Solve the least square problem: 𝐴 𝐼𝑇 𝐴 𝐼 π‘₯ 𝑑 (𝐼) = 𝐴 𝐼𝑇 𝑦 6. Compute the new residual vector : π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 7. If β€–π‘Ÿ 𝑑 β€–2 < πœ–, terminate and return π‘₯ = π‘₯ 𝑑 as the final solution. Else, increase the iteration counter: 𝑑 = 𝑑 + 1 and return to step 2 Figure 2.2: The OMP algorithm 12 To further illustrate the OMP algorithm, we consider the following example. Example 2.1: assume that the matrix 𝐴 contains two columns (π‘Ž1 , π‘Ž2 ) as illustrated in Figure 2.3. In this case, the solution coefficients (π‘₯(1), π‘₯(2)) that generate the vector 𝑦 as a linear combination of the columns π‘Ž1 , π‘Ž2 should be found: 𝑦 = π‘₯(1) π‘Ž1 + π‘₯(2)π‘Ž2 OMP starts by determining which column is highly correlated with the current residual. Recall that the residual vector is initialized to the vector 𝑦 at the beginning of the first iteration. In this example, the vector 𝑦 is correlated more with the vector π‘Ž1 than the vector π‘Ž2 . For this reason, OMP selects the vector π‘Ž1 during the first iteration. Then, OMP takes the largest possible step in the direction of the vector π‘Ž1 ; this is achieved by projecting the vector y onto the vector π‘Ž1 to get the current solution coefficient π‘₯1 (1) (red bold line in Figure 2.3). This leaves some error value that is represented by the residual vector π‘Ÿ1. This residual is orthogonal to the vector π‘Ž1 which in this example represents the active set. At the second iteration, a new column is selected which represents the best correlated column with the current residual π‘Ÿ1. In this simple illustrative example, the column π‘Ž2 which was left out from the first iteration, is selected now and its index is added to the active set 𝐼. Next, the vector 𝑦 is projected onto the space spanned by the columns π‘Ž1 and π‘Ž2 to obtain the current solution coefficients π‘₯2 (1) and π‘₯2 (2) (blue bold line in Figure 2.3). Since the vector 𝑦 belongs to the space spanned by π‘Ž1 and π‘Ž2 , there is no residual signal left after the second iteration (i.e. π‘Ÿ2 = 0). In this case, OMP terminates and returns the vector π‘₯2 as the final solution. 13 π‘Ž2 π‘₯1 (1) π‘₯2 (1) π‘₯2 (2) 𝑦 π‘Ÿ1 π‘Ž1 Figure 2.3: OMP approximates the vector 𝑦 by using the columns π‘Ž1 , π‘Ž2 in Example 2.1 The Red bold line represents the approximation to the vector 𝑦 at the first iteration; while the blue bold line represents the approximation at the second iteration. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis. The orthogonalization step expedites convergence of the OMP algorithms to the final solution. Because of this step, the OMP algorithm is faster than the basic MP algorithm. In OMP, a column is selected and added to the active set only once. This implies that there is no chance for the same column to be selected twice since the residual vector has zero correlation with the active columns, because it is always orthogonal to all active columns after updating the solution coefficients. On the other hand, a column may be selected more than once during MP’s computation. Therefore, MP requires more iterations than OMP to converge to the final solution. 
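To make the steps of Figure 2.2 concrete, the sketch below transcribes them into Python/NumPy (an illustration only; the function name omp and the default stopping settings are ours, and the thesis's own experiments in Chapter 5 are written in MATLAB).

    import numpy as np

    def omp(A, y, eps=1e-6, max_iter=None):
        # Orthogonal Matching Pursuit, following the steps of Figure 2.2 (illustrative sketch).
        m, n = A.shape
        x = np.zeros(n)
        r = y.copy()                          # r0 = y
        active = []                           # active set I
        max_iter = max_iter or m
        for _ in range(max_iter):
            c = A.T @ r                       # correlation vector c = A^T r
            c[active] = 0.0                   # active columns are already orthogonal to r
            i = int(np.argmax(np.abs(c)))     # best-correlated inactive column
            active.append(i)                  # I = I U {i}
            # least square step over the active columns: A_I^T A_I x(I) = A_I^T y
            x_I, *_ = np.linalg.lstsq(A[:, active], y, rcond=None)
            x = np.zeros(n)
            x[active] = x_I
            r = y - A @ x                     # new residual
            if np.linalg.norm(r) < eps:       # stopping condition ||r||_2 < eps
                break
        return x

The basic MP of Figure 2.1 differs only in the update step: it would simply add 𝑐(𝑖) to π‘₯(𝑖) and skip the least square solve, which is why the same column may be selected again in later iterations.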
Nevertheless, each iteration of the OMP algorithm requires a higher computational cost than an MP iteration. This is because of the orthogonalization step (solving a least square problem) that is used by the OMP algorithm. In the OMP algorithm, there is a high probability of recovering a π‘˜-sparse signal (one with only π‘˜ nonzero entries) in at most π‘˜ iterations using the vector 𝑦 and the matrix 𝐴 [36]. As a result, the computational complexity of the OMP algorithm is 𝑂(π‘šπ‘›π‘˜). Hence, OMP is an efficient algorithm for finding a sparse solution of an underdetermined linear system when the sparsity π‘˜ is relatively small. Note that the sparsity π‘˜ of any vector refers to the number of nonzero entries in that vector. It is worth noting that, in the statistical literature, there is an algorithm similar to OMP known as "Forward Stepwise Regression" [37].
2.3 Stagewise Orthogonal Matching Pursuit (StOMP)
In many signal processing applications, such as image inpainting [38] and cartoon-texture decomposition [39,40], the underdetermined linear system is extremely large in scale, and the unknown vector π‘₯ is not very sparse (i.e., it includes many nonzero entries). In such cases, OMP is not a practical algorithm for finding the solution vector π‘₯ because its computational complexity increases linearly with the number of nonzeros π‘˜. Therefore, OMP has been revised to find less accurate solutions for such large-scale underdetermined linear systems in a reasonable time. The modified version of OMP is called Stagewise Orthogonal Matching Pursuit (StOMP) [28]. StOMP is characterized by adding several columns to the active set at each iteration instead of a single column as in OMP. StOMP selects the columns whose absolute correlations with the current residual exceed a specific chosen threshold value. Thus, StOMP converges faster than OMP since it requires fewer iterations to find the solution vector π‘₯. The drawback of the StOMP algorithm is the accuracy of the resulting solution. In other words, although StOMP is faster than OMP, the solution calculated by StOMP is less accurate than the one calculated by OMP. It should be clear that StOMP offers a trade-off between the accuracy of the final solution and the time required to find it.
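The selection step that distinguishes StOMP is equally short in code. The sketch below (again illustrative Python/NumPy; the helper name stomp_select and the default s = 2.5 are ours) returns, in a single iteration, every inactive column whose absolute correlation with the current residual exceeds the threshold π‘‘β„Žπ‘Ÿ = π‘ πœŽ with 𝜎 = β€–π‘Ÿβ€–2/βˆšπ‘›, which is defined below in Equation (2.7); the least square update over the enlarged active set is then the same as in OMP.

    import numpy as np

    def stomp_select(A, r, active, s=2.5):
        # One StOMP selection step (sketch): return every inactive column index whose
        # absolute correlation with the current residual exceeds thr = s * ||r||_2 / sqrt(n).
        n = A.shape[1]
        c = A.T @ r                                  # correlations with the current residual
        thr = s * np.linalg.norm(r) / np.sqrt(n)     # threshold rule of Equation (2.7) below
        return [j for j in range(n) if j not in active and abs(c[j]) > thr]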
Find indices of columns that their absolute correlations with the residual vector exceed the threshold value: 𝑖 = { 𝑗: |𝑐 𝑑 (𝑗)| > π‘‘β„Žπ‘Ÿ 𝑑 , 𝑗 ∈ 𝐼 c } where 𝐼 c is the inactive set (the set have indices of columns of the matrix 𝐴 that are not in the 5. Add 𝑖 to the active set: 𝐼 = 𝐼 βˆͺ {𝑖} active set 6. Solve the least square problem: 𝐴 𝐼𝑇 𝐴 𝐼 π‘₯ 𝑑 (𝐼) = 𝐴 𝐼𝑇 𝑦 7. Compute the new residual : π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 8. If β€–π‘Ÿ 𝑑 β€–2 < πœ–, terminate and return π‘₯ = π‘₯ 𝑑 as the final solution. Else, increase the iteration counter: 𝑑 = 𝑑 + 1 and return to step 2. Figure 2.4: The StOMP algorithm 16 Chapter 3 3 Least Angle Regression (LARS) LARS was originally proposed by Efron, Hastie, Johnstone and Tibshirani as a new model selection algorithm to solve overdetermined linear systems [33]. In addition, Efron et al demonstrated how LARS can be modified to solve the known LASSO problem in a more efficient way than traditional convex optimization techniques. Later, Donoho and Tsaig proposed to use the modified LARS to find a sparse solution of an underdetermined linear system, which is known as β€œhomotopy algorithm” [34]. In this chapter, we review and study how LARS can be used for finding a sparse solution of an underdetermined linear system [33,34]. First, we describe the basic LARS algorithm [33]. Then, we provide a thorough insight about the modified LARS for solving the LASSO problem which is considered an effective way for solving the l1-norm minimization problem (1.4) [34]. In addition, we provide a detailed derivation of LARS main steps, which was not explained in the prior work [33,34] at the same level we present here. 3.1 Basic LARS Algorithm In Section 2.2, we have seen that the OMP algorithm relies on solving the least square problem to update the solution coefficients at each iteration. As we stated earlier, OMP adopts the largest possible step in the least square direction of the active columns. Therefore, OMP is considered an extreme fitting method that can be overly greedy. OMP may not select some columns that are significant in terms of accuracy of the targeted sparse solution. This is because these columns are highly correlated with columns that have already been in the active set. To overcome this problem, the Least Angle Regression (LARS) was used to find a sparse solution of an 17 underdetermined linear system [34]. LARS is considered less greedy than OMP since it adopts appropriate step that is fairly smaller than the OMP step. In other words, LARS increases the coefficients of the solution vector that associated with the active set as much as needed (as explained further below). Similar to OMP, LARS starts by initializing: the solution vector π‘₯ with an all-zero, the residual vector π‘Ÿ with the measurement vector 𝑦, and the active set 𝐼 with an empty set. Then, at each iteration, a new column is selected from the matrix 𝐴, and its index is added to the active set. At the first iteration, LARS selects a column, say π‘Ž 𝑗 , that is highly correlated with the residual vector π‘Ÿ. This implies that the residual vector π‘Ÿ has a smaller angle with the column π‘Ž 𝑗 than other columns of the matrix 𝐴. Then, LARS increases the coefficient π‘₯(𝑗) that is associated with the selected column π‘Ž 𝑗 . This causes the absolute correlation value of π‘Ž 𝑗 with the current residual to decrease as π‘₯(𝑗) is increased. 
LARS takes the smallest possible steps in the direction of the column π‘Ž 𝑗 until another column, say π‘Ž π‘˜ , has as much absolute correlation value with the current residual as the column π‘Ž 𝑗 . Now, instead of continuing in the direction of π‘Ž 𝑗 , LARS moves forward in the direction that is equiangular with the selected columns (π‘Ž 𝑗 , π‘Ž π‘˜ ) until a third column, say π‘Ž π‘š , has much absolute correlation value with the current residual as much as π‘Ž 𝑗 and π‘Ž π‘˜ . The procedure is continued to add one column until no remaining column has correlation with the current residual. The name β€œleast angle” comes from a geometrical interpretation of the LARS process, which chooses the updated direction that makes the smallest and equal angle with all active columns. To further illustrate the LARS algorithm, we consider the two-dimensional system that was described in Example 2.1. LARS initializes the solution vector to an all zero and the residual to the measurement vector 𝑦. As shown in Figure 3.1, LARS begins to select the column π‘Ž1 18 because it has absolute correlation with the initial residual (the vector 𝑦) more than π‘Ž2 (i.e. πœƒ1 (1) < πœƒ1 (2), where πœƒ 𝑑 (𝑖) is the angle between the column π‘Ž 𝑖 and the current residual π‘Ÿ at iteration 𝑑). Then, the LARS algorithm proceeds in the direction of π‘Ž1 by adding the step size 𝛾1 (the red bold line in Figure 3.1) which is chosen in a way that makes columns π‘Ž1 and π‘Ž2 have the same absolute correlation with the current residual at the next iteration (i.e. πœƒ2 (1) = πœƒ2 (2)). When the solution vector π‘₯ is updated, the solution coefficient π‘₯1 (1) = 𝛾1. At the second iteration, LARS adds the column π‘Ž2 to the active set, and proceeds in the direction that is equiangular with the columns π‘Ž1 and π‘Ž2 . Because there is no remaining column, LARS adds step size 𝛾2 (the blue bold line in Figure 3.1) that leads to the vector 𝑦. LARS terminates since the residual is zero, and the solution coefficients equals to: π‘₯2 (1) = 𝛾1 + 𝛾2 𝑑2 (1) and π‘₯2 (2) = 𝛾2 𝑑2 (2), where 𝑑2 is the updated direction at the second iteration which is equiangular with the active columns (π‘Ž1 , π‘Ž2 ). In the LARS algorithm, two important parameters are required to be computed: the updated direction vector 𝑑 that should be equiangular with the active columns, and the step size 𝛾 which is a scalar that is multiplied by the updated direction 𝑑 to update the solution vector π‘₯. The remaining steps of the LARS algorithm are roughly similar to the corresponding OMP’s steps that were described in Section 2.2. 19 π‘Ž2 πœƒ1 2 π‘Ž2 πœƒ1 1 < πœƒ1 (2) πœƒ2 2 πœƒ1 1 πœƒ2 1 𝛾1 𝛾2 πœƒ2 1 = πœƒ2 (2) π‘₯2 1 = 𝛾1 + 𝛾2 𝑑2 1 π‘₯2 2 = 𝛾2 𝑑2 (2) π‘₯1 1 = 𝛾1 π‘₯1 2 = 0 𝑦 π‘Ž1 Figure 3.1: LARS approximates the vector 𝑦 by using the columns π‘Ž1 , π‘Ž2 The red bold line represents the approximation to the vector 𝑦 at the first iteration, and the blue bold line represents the change that adds to the previous approximation to obtain the approximation at the second iteration. π‘Ž2 βˆ₯ οΏ½οΏ½οΏ½. πœƒ 𝑑 (𝑖) is the angle between the column π‘Ž 𝑖 and π‘Ž2 the current residual at iteration 𝑑. d2 is the updated direction at the second iteration. 3.3.1 Derivation of Updated Direction In this section, we provide our derivation of computing the updated direction that is used in [34]. The updated direction should form an equal angle with the active columns toward the residual vector π‘Ÿ. 
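Before going through the derivation, the equiangular requirement itself can be verified numerically. The short sketch below (illustrative Python/NumPy with random data; not part of the thesis experiments) solves the system A_I^T A_I d(I) = sign(c(I)) that the derivation arrives at below as Equation (3.6), and checks that every active column makes the same absolute angle with the resulting direction A_I d(I).

    import numpy as np

    rng = np.random.default_rng(1)
    m = 10
    A_I = rng.standard_normal((m, 3))
    A_I /= np.linalg.norm(A_I, axis=0)           # three active columns with unit l2-norm

    r = rng.standard_normal(m)                   # stand-in for the current residual
    c_I = A_I.T @ r                              # correlations of the active columns

    # equiangular direction: A_I^T A_I d(I) = sign(c(I))   (Equation (3.6) below)
    d_I = np.linalg.solve(A_I.T @ A_I, np.sign(c_I))

    v = A_I @ d_I                                # the direction seen in measurement space
    cosines = A_I.T @ v / np.linalg.norm(v)      # cosine of the angle between each active column and v
    print(np.round(np.abs(cosines), 6))          # all entries are equal: the direction is equiangular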
We start the derivation of the update direction 𝑑 𝑑 ∈ 𝑅 𝑛 at iteration 𝑑 by projecting the previous residual vector π‘Ÿ π‘‘βˆ’1 onto the space spanned by the active columns. This is achieved by solving for οΏ½ in the following normal equation: π‘₯ 𝐴 𝐼𝑇 𝐴 𝐼 οΏ½ = 𝐴 𝐼𝑇 π‘Ÿ π‘‘βˆ’1 π‘₯ By using Equation (2.1), the active entries of the correlation vector can be found as follows: 20 (3.1) Substituting (3.2) into (3.1): 𝑐 𝑑 (𝐼) = 𝐴 𝐼𝑇 π‘Ÿ π‘‘βˆ’1 (3.2) 𝐴 𝐼𝑇 𝐴 𝐼 οΏ½ = 𝑐 𝑑 (𝐼) π‘₯ (3.3) 𝑐 𝑑 (𝐼) = πœ† 𝑑 𝑠𝑖𝑠𝑛�𝑐 𝑑 (𝐼)οΏ½ (3.4) At each iteration of the LARS algorithm, the active columns have the same absolute correlation with the residual vector. This correlation value is represented by the parameter πœ† 𝑑 . Consequently, we can factorize 𝑐 𝑑 (𝐼) as follows: By using (3.4), Equation (3.3) can be rewritten as: 𝐴 𝐼𝑇 𝐴 𝐼 οΏ½ = πœ† 𝑑 𝑠𝑖𝑠𝑛 (𝑐 𝑑 (𝐼)) π‘₯ πœ† 𝑑 is a scalar; dividing Equation (3.5) by the value of πœ† 𝑑 leads to: 𝐴 𝐼𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) = 𝑠𝑖𝑠𝑛 (𝑐 𝑑 (𝐼)) (3.5) (3.6) where 𝑑 𝑑 (𝐼) = οΏ½/πœ† 𝑑 is a sub-vector of the updated direction vector 𝑑 𝑑 that holds the entries π‘₯ corresponding to the active set 𝐼. The goal of Equation (3.6) is to calculate the entries of the sub- vector 𝑑 𝑑 (𝐼) which ensures that the absolute correlations of the active columns are declined equally. LARS sets entries that are not in the active set to zero (i.e. 𝑑 𝑑 (𝐼 𝑐 ) = 0). 3.1.2 Derivation of Step Size At each iteration, the LARS algorithm computes the step size 𝛾 to update the solution coefficients that correspond to the active set. This kind of update is necessary to cause a column from the inactive set to be included in the active set at the next iteration. This can be achieved by making the active columns similar to the selected column in terms of the absolute correlation with the current residual vector. 21 Derivation of closed-form expression for calculating the step size 𝛾 was briefly stated in the original LARS paper [33]. Therefore, In this section, we obviously provide our complete and detailed derivation of that expression. We start the derivation of the step size 𝛾 by assuming the LARS algorithm is currently at iteration 𝑑. In this case, the step size 𝛾 𝑑 is required to be calculated. In LARS, the current residual vector is computed in the same way as OMP, which is expressed in Equation (2.4): And the previous residual is obtained by: π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 π‘Ÿ π‘‘βˆ’1 = 𝑦 βˆ’ 𝐴π‘₯ π‘‘βˆ’1 (3.7) (3.8) To find the relationship between the current and the previous residual vectors, we subtract Equation (3.8) from Equation (3.7): Equivalently: π‘Ÿ 𝑑 βˆ’ π‘Ÿ π‘‘βˆ’1 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 βˆ’ 𝑦 + 𝐴π‘₯ π‘‘βˆ’1 π‘Ÿ 𝑑 = π‘Ÿ π‘‘βˆ’1 βˆ’ 𝐴(π‘₯ 𝑑 βˆ’ π‘₯ π‘‘βˆ’1 ) (3.9) π‘₯ 𝑑 = π‘₯ π‘‘βˆ’1 + 𝛾 𝑑 𝑑 𝑑 (3.10) π‘Ÿ 𝑑 = π‘Ÿ π‘‘βˆ’1 βˆ’ 𝛾 𝑑 𝐴 𝑑 𝑑 (3.11) At each iteration, LARS updates the solution vector π‘₯ via adding the multiplication of the scalar 𝛾 𝑑 by the vector 𝑑 𝑑 to the previous solution vector. This is expressed as follows: By substituting (3.10) in (3.9), we obtain: Because 𝑑(𝐼 𝑐 ) = 0, Equation (3.11) can be simplified to: π‘Ÿ 𝑑 = π‘Ÿ π‘‘βˆ’1 βˆ’ 𝛾 𝑑 𝐴 𝐼 𝑑 𝑑 (𝐼) 22 (3.12) The correlation vector c in the next iteration is computed by: By substituting (3.12) in (3.13), we get: 𝑐 𝑑+1 = 𝐴 𝑇 π‘Ÿ 𝑑 𝑐 𝑑+1 = 𝐴 𝑇 (π‘Ÿ π‘‘βˆ’1 βˆ’ 𝛾 𝑑 𝐴 𝐼 𝑑 𝑑 (𝐼)) = 𝑐 𝑑 βˆ’ 𝛾 𝑑 𝐴 𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) (3.13) (3.14) At each iteration of LARS, the parameter πœ† represents the absolute correlation of the active columns. In fact, πœ† represents the largest absolute value over entries of the correlation vector. 
This can be expressed as follows: πœ† = β€–π‘β€–βˆž = |𝑐(𝐼)| where β€– β€–βˆž is the maximum norm, and β€–π‘β€–βˆž ≝ max(|𝑐(1)|, |𝑐(2)|, … … . . . , |𝑐(𝑛)|). (3.15) Hence, πœ† can be computed for the next iteration 𝑑 + 1 as follows: By using Equation (3.14), we get: From (3.6), we have: πœ† 𝑑+1 = |𝑐 𝑑+1 (𝐼)| πœ† 𝑑+1 = |𝑐 𝑑 (𝐼) βˆ’ 𝛾 𝑑 𝐴 𝐼𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼)| 𝑑 𝑑 (𝐼) = (𝐴 𝐼𝑇 𝐴 𝐼 )βˆ’1 𝑠𝑖𝑠𝑛(𝑐 𝑑 (𝐼)) By substituting 𝑑 𝑑 (𝐼) of (3.17) into (3.16), we obtain: πœ† 𝑑+1 = |𝑐 𝑑 (𝐼) βˆ’ 𝛾 𝑑 𝐴 𝐼𝑇 𝐴 𝐼 (𝐴 𝐼𝑇 𝐴 𝐼 )βˆ’1 𝑠𝑖𝑠𝑛(𝑐 𝑑 (𝐼))| πœ† 𝑑+1 = |𝑐 𝑑 (𝐼) βˆ’ 𝛾 𝑑 𝑠𝑖𝑠𝑛(𝑐 𝑑 (𝐼))| Substituting (3.4) in (3.18): πœ† 𝑑+1 = οΏ½πœ† 𝑑 𝑠𝑖𝑠𝑛�𝑐 𝑑 (𝐼)οΏ½ βˆ’ 𝛾 𝑑 𝑠𝑖𝑠𝑛(𝑐 𝑑 (𝐼))οΏ½ 23 (3.16) (3.17) (3.18) πœ† 𝑑+1 = �𝑠𝑖𝑠𝑛�𝑐 𝑑 (𝐼)οΏ½(πœ† 𝑑 βˆ’ 𝛾 𝑑 )οΏ½ πœ† 𝑑+1 = |πœ† 𝑑 βˆ’ 𝛾 𝑑 | (3.19) Recall that at each iteration, the updated direction 𝑑 ensures the absolute correlations of the active columns are declined equally. In other word, the value of πœ† is decreased at each iteration. Moreover, LARS selects the smallest step size that causes a column from inactive set to join the active set at next iteration. Therefore, the value of 𝛾 𝑑 should be positive and less than πœ† 𝑑 . As a result, Equation (3.19) can be simplified to: πœ† 𝑑+1 = πœ† 𝑑 βˆ’ 𝛾 𝑑 (3.20) |𝑐 𝑑+1 (𝑖)| = πœ† 𝑑+1 (3.21) 𝑐 𝑑+1 (𝑖) = 𝑐 𝑑 (𝑖) βˆ’ 𝛾 𝑑 π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) (3.22) We assume the 𝑖 π‘‘β„Ž column is added to the active set at the next iteration 𝑑 + 1. Therefore, the absolute value of its correlation should equal to πœ† 𝑑+1 . where 𝑖 ∈ 𝐼 𝑑𝑐 , and 𝐼 𝑑𝑐 is the inactive set at iteration 𝑑. Using Equation (3.14), the correlation of the 𝑖 π‘‘β„Ž column can be computed by the following equation: By substituting (3.22) and (3.20) in Equation (3.21), we get: �𝑐 𝑑 (𝑖) βˆ’ 𝛾 𝑑 π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼)οΏ½ = πœ† 𝑑 βˆ’ 𝛾 𝑑 Β±(𝑐 𝑑 (𝑖) βˆ’ 𝛾 𝑑 π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼)) = πœ† 𝑑 βˆ’ 𝛾 𝑑 ±𝑐 𝑑 (𝑖) βˆ“ 𝛾 𝑑 π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) = πœ† 𝑑 βˆ’ 𝛾 𝑑 𝛾 𝑑 βˆ“ 𝛾 𝑑 π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) = πœ† 𝑑 βˆ“ 𝑐 𝑑 (𝑖) 𝛾 𝑑 (1 βˆ“ π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼)) = πœ† 𝑑 βˆ“ 𝑐 𝑑 (𝑖) 24 𝛾𝑑 = Let 𝑣 𝑑 = 𝐴 𝐼 𝑑 𝑑 (𝐼) πœ† 𝑑 βˆ“ 𝑐 𝑑 (𝑖) 1 βˆ“ π‘Ž 𝑖𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) 𝛾𝑑 = πœ† 𝑑 βˆ“ 𝑐 𝑑 (𝑖) 1 βˆ“ π‘Ž 𝑖𝑇 𝑣 𝑑 The LARS algorithm finds the minimum positive step size 𝛾 𝑑 that makes one column from the inactive set 𝐼 𝑐 join the active set 𝐼 at the next iteration 𝑑 + 1. 𝛾 𝑑 = min οΏ½ 𝑐 π‘–βˆˆπΌ πœ† 𝑑 – 𝑐 𝑑 ( 𝑖) πœ† 𝑑 + 𝑐 𝑑 ( 𝑖) , οΏ½ 1 βˆ’ π‘Ž 𝑖𝑇 𝑣 𝑑 1 + π‘Ž 𝑖𝑇 𝑣 𝑑 (3.23) In (3.23), only positive components within every choice of 𝑖 are considered when the minimum is taken. After the updated direction 𝑑 𝑑 and the step size 𝛾 𝑑 are calculated, LARS updates the solution vector as expressed by Equation (3.10), and computes the new residual as expressed by Equation (3.7). These steps are repeated as long as there is still a column from the inactive set that has correlation with the current residual. When no remaining columns have correlation with the current residual (i.e. πœ† approaches zero), LARS terminates and the vector π‘₯ 𝑑 is returned as the final solution. The basic LARS algorithm can be summarized in Figure 3.2. 25 β€’ Input: the vector 𝑦 ∈ 𝑅 π‘š the matrix 𝐴 ∈ 𝑅 π‘šΓ—π‘› , and the termination threshold for the residual norm πœ–. β€’ Output: the sparse vector π‘₯ ∈ 𝑅 𝑛 . β€’ Task: approximate the vector 𝑦 by using the fewest columns of the matrix 𝐴. 1.Initialization: π‘₯0 = 0, π‘Ÿ0 = 𝑦, 𝑑 = 1, 𝐼 = βˆ…. 2.Compute the correlation vector: 𝑐 𝑑 = 𝐴 𝑇 π‘Ÿ π‘‘βˆ’1 3.Compute the maximum absolute value in the correlation vector: πœ† 𝑑 = ‖𝑐 𝑑 β€–βˆž 4. 
If πœ† 𝑑 is zero or approaches a very small value, LARS is terminated and the vector π‘₯ 𝑑 is 5. Find the active set: 𝐼 = {𝑗: |𝑐 𝑑 (𝑗)| = πœ† 𝑑 } returned as the final solution, otherwise the following steps are implemented. 𝐴 𝐼𝑇 𝐴 𝐼 𝑑 𝑑 (𝐼) = 𝑠𝑠𝑠𝑠(𝑐 𝑑 (𝐼)) 6. Solve the following least square problem to find active entries of the updated direction: where 𝑠𝑠𝑠𝑠(𝑐 𝑑 (𝐼)) returns the sign of the active entries of the correlation vector 𝑐 𝑑 7. Set the inactive entries of the updated direction to zero: 𝑑 𝑑 (𝐼 c ) = 0 8. Calculate the step size: where 𝑣 𝑑 = 𝐴 𝐼 𝑑 𝑑 (𝐼) 𝛾 𝑑 = min οΏ½ 𝑐 π‘–βˆˆπΌ πœ† 𝑑 – 𝑐 𝑑 (𝑖) πœ† 𝑑 + 𝑐 𝑑 (𝑖) , οΏ½ 1 βˆ’ π‘Ž 𝑖𝑇 𝑣 𝑑 1 + π‘Ž 𝑖𝑇 𝑣 𝑑 9. Update the solution vector: π‘₯ 𝑑 = π‘₯ π‘‘βˆ’1 + 𝛾 𝑑 𝑑 𝑑 10. Compute the new residual vector: π‘Ÿ 𝑑 = 𝑦 βˆ’ 𝐴π‘₯ 𝑑 11. If β€–π‘Ÿ 𝑑 β€–2 < πœ–, terminate and return π‘₯ = π‘₯ 𝑑 iteration counter: 𝑑 = 𝑑 + 1 and return to step 2. as the final solution. Else, increase the Figure 3.2: The basic LARS algorithm 26 3.2 Modified LARS for solving the LASSO problem In certain circumstances, the l1-norm minimization problem stated in (1.4) can successfully find the sparsest solution while OMP fails [14,27,15]. The l1-norm minimization problem can be solved by using the standard convex optimization techniques. However, the convex optimization methods require heavy computations especially when the underdetermined system is very large. The Least Absolute Shrinkage and Selection Operator (LASSO) [35] was proposed by Tibshirani to solve the l1-norm minimization problem (1.4) efficiently. LASSO minimizes the least square error subject to the l1-norm of the solution vector being less than some threshold (q): min‖𝑦 βˆ’ 𝐴π‘₯β€–2 2 π‘₯ 𝑠𝑒𝑏𝑗𝑒𝑐𝑑 π‘‘π‘œ β€–π‘₯β€–1 ≀ π‘ž (3.24) In other words, LASSO finds the least square solution subject to an l1-norm constraint on the solution vector. LASSO (3.24) can be rewritten equivalently as an unconstrained optimization problem: min π‘₯ 1 ‖𝑦 βˆ’ 𝐴π‘₯β€–2 + πœ†β€–π‘₯β€–1 2 2 (3.25) where πœ† ∈ [0, ∞) is a scalar regularization parameter that handles the trade-off between the mean square error and the l1-norm of the solution vector π‘₯. To see how Equation (3.25) solves the l1-norm minimization problem (1.4), the vector π‘₯ is initialized with an all zero for large πœ†, and then πœ† is decreased gradually. When πœ† approaches zero, the solution of LASSO (3.25) converges to the solution of l1-norm minimization problem (1.4) [34]. 27 LARS with a minor modification has been used to solve LASSO (3.25) faster than traditional convex optimization methods [33,34]. To explain this, we start solving (3.25) as a minimization problem by setting its gradient to zero and solve the following equation: This can be simplified to: 1 πœ• π‘₯ οΏ½ ‖𝑦 βˆ’ 𝐴π‘₯β€–2 + πœ†β€–π‘₯β€–1 οΏ½ = 0 2 2 βˆ’π΄ 𝑇 (𝑦 βˆ’ 𝐴π‘₯) + πœ†πœ• π‘₯ (β€–π‘₯β€–1 ) = 0 Because the residual π‘Ÿ = 𝑦 βˆ’ 𝐴π‘₯, Equation (3.27) can be rewritten as: (3.26) (3.27) βˆ’π΄ 𝑇 π‘Ÿ + πœ† πœ• π‘₯ (β€–π‘₯β€–1 ) = 0 (3.28) 𝑐 = πœ† πœ• π‘₯ (β€–π‘₯β€–1 ) (3.29) Recall that the correlation is calculated by: 𝑐 = 𝐴 𝑇 π‘Ÿ, Equation (3.28) can be expressed as: βˆ’π‘ + πœ† πœ• π‘₯ (β€–π‘₯β€–1 ) = 0 The gradient of β€–π‘₯β€–1 cannot be found because l1-norm is discontinuous at zero. 
Therefore, subgradient of l1-norm can be found as follows: πœ• π‘₯ β€–π‘₯β€–1 = �𝑣 ∈ 𝑅 𝑛 ∢ 𝑣(𝑖) = 𝑠𝑖𝑠𝑛�π‘₯(𝑖)οΏ½ 𝑣(𝑖) ∈ [βˆ’1,1] 𝑖𝑓 π‘₯(𝑖) β‰  0 οΏ½ 𝑖𝑓 π‘₯(𝑖) = 0 (3.30) At each iteration of LARS, only coefficients of the solution vector π‘₯ corresponding to the active set 𝐼 are nonzeros (i.e. π‘₯(𝑖) β‰  0 , 𝑖 ∈ 𝐼). The other coefficients are zeros (i.e. π‘₯(𝑖) = 0, 𝑖 ∈ 𝐼 𝑐 ). Hence, by using (3.30), Equation (3.29) can be expressed for the active set as follows: 𝑐(𝐼) = πœ† 𝑠𝑖𝑠𝑛(π‘₯(𝐼)) and for the inactive set, Equation (3.29) can be expressed as follows: 28 (3.31) |𝑐(𝐼 𝑐 )| ≀ πœ† (3.32) Thus, LARS can be used to solve the LASSO problem if it satisfies the constraints stated by Equations (3.31) and (3.32) at each iteration. In other words, to solve LASSO, LARS should ensure: 1- The absolute correlations of the active columns are equal to πœ†. 2- The signs of the correlation and the solution vectors are matched for the active entries. 3- The absolute correlations of the inactive columns are equal or less than πœ†. LARS maintains the condition stated by equality (3.32) at each iteration because the absolute correlations of the inactive columns are always less than πœ†. If they are equal to πœ†, the columns should be in the active set (step 5 of the LARS algorithm in Figure 3.2). Also LARS partially maintains the condition stated by Equation (3.31) which is the absolute correlations of the active columns are equal to πœ†. However, LARS does not enforce the signs matching of the active entries between the correlation and solution vectors. Some of the solution coefficients associated with the active set may change signs while the signs of corresponding correlation coefficients are still the same. Therefore, LARS should be modified to maintain the sign matching constraint that states in the following equation: 𝑠𝑖𝑠𝑛 οΏ½π‘₯(𝐼)οΏ½ = 𝑠𝑖𝑠𝑛�𝑐(𝐼)οΏ½ (3.33) LARS violates the condition (3.33) when one of the solution coefficients that is associated with the active set crosses zero. Therefore, if any solution coefficient associated with the active set hits zero during the LARS procedures, the corresponded column to this coefficient should be removed from the active set, and the updated direction 𝑑 is recalculated by solving Equation (3.6). Hence, we need to determine the step size 𝛾 which causes one of the solution coefficients 29 that is associated with the active set to be zero after updating solution process. By using Equation (3.10), the 𝑖 π‘‘β„Ž solution coefficient is updated as follows: π‘₯ 𝑑 (𝑖) = π‘₯ π‘‘βˆ’1 (𝑖) + 𝛾 𝑑 𝑑 𝑑 (𝑖) The value of π‘₯ 𝑑 (𝑖) would be zero if: 𝛾𝑑 = βˆ’ π‘₯ π‘‘βˆ’1 (𝑖) 𝑑 𝑑 (𝑖) The smallest positive step size that causes the 𝑖 π‘‘β„Ž solution coefficient associated with the active set to be zero, can be found by the following minimization problem: 𝛾 𝑑 = min οΏ½βˆ’ π‘–βˆˆπΌ π‘₯ π‘‘βˆ’1 (𝑖) οΏ½ 𝑑 𝑑 (𝑖) (3.34) where the minimum is taken only over positive value. Therefore, at each iteration, the modified LARS for solving LASSO computes two step sizes: one adds a column to the active set as stated by Equation (3.23), and the other drops a column from the active set as stated by Equation (3.34). To distinguish between these two values of the step size, 𝛾 + is used to refer to the step size computed by (3.23), and 𝛾 βˆ’ is used to refer to the step size computed by (3.34). 
The modified LARS selects the smaller of the two step sizes:

γ_t = min { γ_t^+, γ_t^− }    (3.35)

where

γ_t^+ = min_{i ∈ I^c} { (λ_t − c_t(i)) / (1 − a_i^T v_t), (λ_t + c_t(i)) / (1 + a_i^T v_t) }   and   γ_t^− = min_{i ∈ I} { −x_{t−1}(i) / d_t(i) }

The modified LARS uses the step size computed by (3.35) instead of the step size computed by (3.23) in the basic LARS algorithm. Equally important, for updating the active set, if γ_t = γ_t^+, the i-th column is added to the active set at the next iteration t + 1 as follows:

I = I ∪ {i}    (3.36)

In contrast, if γ_t = γ_t^−, the i-th column is removed from the active set at the next iteration t + 1 as follows:

I = I − {i}    (3.37)

The remaining steps are the same as in the basic LARS algorithm stated in Section 3.1. The modified LARS requires more iterations than the basic LARS because, in the modified LARS, columns are both added to and dropped from the active set, while in the basic LARS, columns are only ever added to the active set. The computational complexity of the modified LARS is O(mnT), where T is the number of required iterations, which is normally greater than or equal to the number of nonzero entries of the solution vector x (i.e., T ≥ k). Note that the constant hidden in the asymptotic (big-O) notation of the modified LARS is much higher than that of OMP, because the modified LARS executes many additional steps to compute the step size γ while OMP does not. However, the modified LARS is considered an efficient algorithm for solving the LASSO problem because it requires less computational cost than traditional convex optimization methods.

Chapter 4

4 Comparative Analysis of OMP and LARS

In the signal processing and statistical literatures, OMP and the modified LARS for solving LASSO are considered popular algorithms for finding a sparse solution of an underdetermined linear system. In this chapter, we compare OMP and the modified LARS in terms of their algorithmic steps and performance. We reformulate some steps of OMP and the modified LARS to identify the similarities and differences between them. To study the performance of OMP and the modified LARS, we compare the convergence time and the solution accuracy of each algorithm.

4.1 Algorithmic Steps

Previous work by Donoho and Tsaig [34] showed that the basic LARS algorithm stated in Section 3.1 and OMP have the same algorithmic steps, except for the solution update step. As stated earlier, OMP updates the active entries of the solution vector x by solving a least squares problem. This is achieved by projecting the vector y onto the space spanned by the active columns, which is done by solving the following normal equations:

A_I^T A_I x_t(I) = A_I^T y    (4.1)

On the other hand, the basic LARS algorithm updates the active entries of the solution vector x by solving the following penalized least squares problem:

A_I^T A_I x_t(I) = A_I^T y − λ_{t+1} sign(c_t(I))    (4.2)

Equations (4.1) and (4.2) are identical if the penalization term λ_{t+1} sign(c_t(I)) is removed from Equation (4.2). However, Donoho et al. [34] did not demonstrate the derivation of (4.2); therefore, we present our derivation of (4.2) in Appendix A.

In our work, we compare OMP and the modified LARS for solving LASSO from a different perspective. We reformulate some steps of the algorithms to reveal more of the similarities and differences between them.
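Before reformulating these steps, it may help to see the two update rules (4.1) and (4.2) written out side by side. The following MATLAB-style sketch is only an illustration under assumed variable names (I for the active-set indices, lambda_next for λ_{t+1}, and s for sign(c_t(I))); it is not taken from a specific implementation.

    % OMP update, Equation (4.1): plain least squares over the active columns.
    AI = A(:, I);
    x_omp = zeros(size(A, 2), 1);              % inactive entries stay zero
    x_omp(I) = (AI' * AI) \ (AI' * y);         % normal equations A_I^T A_I x(I) = A_I^T y

    % Basic LARS update, Equation (4.2): the same system with a penalization term.
    x_lars = zeros(size(A, 2), 1);
    x_lars(I) = (AI' * AI) \ (AI' * y - lambda_next * s);

Removing the term lambda_next * s makes the two solves identical, which is exactly the observation that motivates the comparison below.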
For simplicity, hereafter in this chapter and the remaining chapters of this thesis, we use the abbreviation "LARS" to refer to the modified LARS algorithm that is specialized for solving the LASSO problem.

OMP and LARS solve different optimization problems: OMP is used to find an approximate solution for the l0-norm minimization problem stated by Equation (1.2), while LARS is used to solve the l1-norm minimization problem stated by Equation (1.4). Nevertheless, both OMP and LARS depend on an underlying greedy framework. Indeed, they have almost the same fundamental algorithmic steps. Both algorithms start from an all-zero solution, an empty active set, and the measurement vector y as the initial residual. They depend on the correlation between the columns of the matrix A and the current residual to maintain the active set. They compute the residual vector in the same manner. Furthermore, both algorithms solve a least squares problem over the active columns to update the solution vector x.

However, OMP and LARS differ in some steps. At each iteration of LARS, a column is added to or removed from the active set, while in OMP, a column is always added to the active set. In addition, each algorithm adopts a different way of updating the solution vector x. OMP takes the largest possible step in the least squares direction of the active columns; it achieves this by updating the active entries of the solution vector x through projecting the vector y onto the space spanned by the active columns. On the other hand, LARS takes the smallest possible step that forces a column from the inactive set to join the active set, or drops a column from the active set. The algorithmic steps of OMP and LARS are compared in Table 4.1.

Initialization
  OMP:  x_0 = 0, r_0 = y, t = 1, I = ∅
  LARS: x_0 = 0, r_0 = y, t = 1, I = ∅

Compute correlation
  OMP:  c_t = A^T r_{t−1}
  LARS: c_t = A^T r_{t−1};  λ_t = ||c_t||_∞

Update active set
  OMP:  i = arg max_{j ∈ I^c} |c_t(j)|;  I = I ∪ {i}
  LARS: if (γ_{t−1} = γ_{t−1}^+ or t = 1)  // add column:  i = { j ∈ I^c : |c_t(j)| = λ_t },  I = I ∪ {i}
        if (γ_{t−1} = γ_{t−1}^−)  // remove column:  i = { j ∈ I : x_{t−1}(j) = 0 },  I = I − {i}

Update solution
  OMP:  A_I^T A_I x_t(I) = A_I^T y;  x_t(I^c) = 0
  LARS: A_I^T A_I d_t(I) = sign(c_t(I));  d_t(I^c) = 0;  v_t = A_I d_t(I);
        γ_t^+ = min_{i ∈ I^c} { (λ_t − c_t(i)) / (1 − a_i^T v_t), (λ_t + c_t(i)) / (1 + a_i^T v_t) };
        γ_t^− = min_{i ∈ I} { −x_{t−1}(i) / d_t(i) };  γ_t = min{γ_t^+, γ_t^−};  x_t = x_{t−1} + γ_t d_t

Compute residual
  OMP:  r_t = y − A x_t
  LARS: r_t = y − A x_t

Stopping condition
  OMP:  ||r_t||_2 < ε
  LARS: ||r_t||_2 < ε

Increase iteration counter
  OMP:  t = t + 1
  LARS: t = t + 1

Table 4.1: The algorithmic steps of OMP and LARS

From Table 4.1, it is observable that the algorithmic steps are identical in both algorithms, except for the steps that update the active set and the solution vector. To find more commonality between OMP and LARS, we reformulate the steps for updating the active set and the solution vector in the OMP algorithm.

4.1.1 Reformulating step for updating the active set

As we stated earlier, one of the differences between OMP and LARS is the way the active set is updated. To compare them with respect to this step, we reformulate the corresponding step in the OMP algorithm. At each iteration, OMP selects the column from the inactive set I^c that has the highest absolute correlation with the current residual.
This can be expressed as follows:

i = arg max_{j ∈ I^c} |c_t(j)|    (4.3)

where i is the index of the selected column. To make the active-set update step of OMP comparable with the corresponding step of LARS, we divide the step of finding the index i in Equation (4.3) into two steps:

1. Compute the largest absolute entry of the correlation vector: λ_t = ||c_t||_∞.
2. Determine an index of a column from the inactive set whose absolute correlation with the current residual vector equals λ_t. This can be described as follows:

i = { j ∈ I^c : |c_t(j)| = λ_t }    (4.4)

Then, i is added to the active set as follows: I = I ∪ {i}.

In Table 4.2, we state the original step for updating the active set and the reformulated one in OMP and LARS. It is noticeable that the step for updating the active set is the same in both algorithms if LARS always adds columns to the active set. Therefore, the basic LARS algorithm stated in Section 3.1 is similar to OMP in the way it updates the active set, because it always adds columns to the active set and never removes them. On the other hand, the modified LARS for solving the LASSO problem differs from OMP in this step: the modified LARS adds or removes columns to/from the active set, while OMP always adds columns to the active set.

Update active set (original)
  OMP:  i = arg max_{j ∈ I^c} |c_t(j)|;  I = I ∪ {i}
  LARS: λ_t = ||c_t||_∞;
        if (γ_{t−1} = γ_{t−1}^+ or t = 1)  // add column:  i = { j ∈ I^c : |c_t(j)| = λ_t },  I = I ∪ {i}
        if (γ_{t−1} = γ_{t−1}^−)  // remove column:  i = { j ∈ I : x_{t−1}(j) = 0 },  I = I − {i}

Update active set (after reformulating)
  OMP:  λ_t = ||c_t||_∞;  i = { j ∈ I^c : |c_t(j)| = λ_t };  I = I ∪ {i}  // add column
  LARS: λ_t = ||c_t||_∞;
        if (γ_{t−1} = γ_{t−1}^+ or t = 1)  // add column:  i = { j ∈ I^c : |c_t(j)| = λ_t },  I = I ∪ {i}
        if (γ_{t−1} = γ_{t−1}^−)  // remove column:  i = { j ∈ I : x_{t−1}(j) = 0 },  I = I − {i}

Table 4.2: The original step for updating the active set and the reformulated one in OMP and LARS

4.1.2 Reformulating step for updating the solution vector

The other difference between OMP and LARS is the way the solution vector is updated. To compare them clearly, we reformulate the step for updating the solution vector in OMP to make it comparable with the corresponding step in LARS. This can be explained as follows: let I be the active set at iteration t, and Ī be the active set at iteration t − 1. We assume that OMP is currently at iteration t. The active entries of the solution vector x are computed by solving Equation (4.1):

A_I^T A_I x_t(I) = A_I^T y    (4.5)

The residual vector at iteration t − 1 is obtained by:

r_{t−1} = y − A x_{t−1}    (4.6)

Because only the active entries of x_{t−1} are nonzero, we can rewrite (4.6) as follows:

r_{t−1} = y − A_Ī x_{t−1}(Ī)

y = A_Ī x_{t−1}(Ī) + r_{t−1}    (4.7)

By substituting the y of (4.7) into (4.5), we obtain:

A_I^T A_I x_t(I) = A_I^T ( A_Ī x_{t−1}(Ī) + r_{t−1} )

By simplifying this expression, it follows that:

A_I^T ( A_I x_t(I) − A_Ī x_{t−1}(Ī) ) = A_I^T r_{t−1}

Recall Equation (3.2), which computes the active entries of the correlation vector: c_t(I) = A_I^T r_{t−1}. This leads to:

A_I^T ( A_I x_t(I) − A_Ī x_{t−1}(Ī) ) = c_t(I)    (4.8)

At each iteration of OMP, one column is added to the active set.
Therefore, the active set I at iteration t has one column that is not included in the active set Ī of the previous iteration (t − 1). The coefficient of x_{t−1} corresponding to this column is zero. Hence, we get:

A_Ī x_{t−1}(Ī) = A_I x_{t−1}(I)    (4.9)

By substituting (4.9) into (4.8), the following equation is obtained:

A_I^T ( A_I x_t(I) − A_I x_{t−1}(I) ) = c_t(I)

Equivalently:

A_I^T A_I ( x_t(I) − x_{t−1}(I) ) = c_t(I)    (4.10)

At this point, let us define the vector Δx_t ∈ R^n, where Δx_t(I) = x_t(I) − x_{t−1}(I) and Δx_t(I^c) = 0. Therefore, Equation (4.10) can be expressed as follows:

A_I^T A_I Δx_t(I) = c_t(I)    (4.11)

The active entries of the vector Δx_t can be obtained by solving Equation (4.11), while the remaining entries of the vector are set to zero (i.e., Δx_t(I^c) = 0). The formula for updating the solution vector x can then be rewritten in terms of the vector Δx as follows:

x_t = x_{t−1} + Δx_t    (4.12)

As a result, we divide the step for updating the solution vector, expressed by Equation (4.1) in the OMP algorithm, into two steps:

1. Compute the active entries of the vector Δx as expressed in Equation (4.11), and set the remaining entries to zero.
2. Update the solution vector x as expressed in Equation (4.12).

With respect to LARS, the vector Δx can be calculated by using Equation (3.10) as follows:

Δx_t = γ_t d_t    (4.13)

The original step for updating the solution vector and the reformulated one are listed for OMP and LARS in Table 4.3.

Update solution (original)
  OMP:  A_I^T A_I x_t(I) = A_I^T y;  x_t(I^c) = 0
  LARS: A_I^T A_I d_t(I) = sign(c_t(I));  d_t(I^c) = 0;  v_t = A_I d_t(I);
        γ_t^+ = min_{i ∈ I^c} { (λ_t − c_t(i)) / (1 − a_i^T v_t), (λ_t + c_t(i)) / (1 + a_i^T v_t) };
        γ_t^− = min_{i ∈ I} { −x_{t−1}(i) / d_t(i) };  γ_t = min{γ_t^+, γ_t^−};  x_t = x_{t−1} + γ_t d_t

Update solution (after reformulating)
  OMP:  A_I^T A_I Δx_t(I) = c_t(I);  Δx_t(I^c) = 0;  x_t = x_{t−1} + Δx_t
  LARS: A_I^T A_I d_t(I) = sign(c_t(I));  d_t(I^c) = 0;  v_t = A_I d_t(I);
        γ_t^+ = min_{i ∈ I^c} { (λ_t − c_t(i)) / (1 − a_i^T v_t), (λ_t + c_t(i)) / (1 + a_i^T v_t) };
        γ_t^− = min_{i ∈ I} { −x_{t−1}(i) / d_t(i) };  γ_t = min{γ_t^+, γ_t^−};  Δx_t = γ_t d_t;  x_t = x_{t−1} + Δx_t

Table 4.3: The original step for updating the solution vector and the reformulated one in OMP and LARS

By using the reformulated steps that we derive in Sections 4.1.1 and 4.1.2, the algorithmic steps of OMP and LARS can be rewritten as shown in Table 4.4.
Initialization
  OMP:  x_0 = 0, r_0 = y, t = 1, I = ∅
  LARS: x_0 = 0, r_0 = y, t = 1, I = ∅

Compute correlation
  OMP:  c_t = A^T r_{t−1};  λ_t = ||c_t||_∞
  LARS: c_t = A^T r_{t−1};  λ_t = ||c_t||_∞

Update active set
  OMP:  i = { j ∈ I^c : |c_t(j)| = λ_t };  I = I ∪ {i}  // add column
  LARS: if (γ_{t−1} = γ_{t−1}^+ or t = 1)  // add column:  i = { j ∈ I^c : |c_t(j)| = λ_t },  I = I ∪ {i}
        if (γ_{t−1} = γ_{t−1}^−)  // remove column:  i = { j ∈ I : x_{t−1}(j) = 0 },  I = I − {i}

Compute solution update vector
  OMP:  A_I^T A_I Δx_t(I) = c_t(I);  Δx_t(I^c) = 0
  LARS: A_I^T A_I d_t(I) = sign(c_t(I));  d_t(I^c) = 0;  v_t = A_I d_t(I);
        γ_t^+ = min_{i ∈ I^c} { (λ_t − c_t(i)) / (1 − a_i^T v_t), (λ_t + c_t(i)) / (1 + a_i^T v_t) };
        γ_t^− = min_{i ∈ I} { −x_{t−1}(i) / d_t(i) };  γ_t = min{γ_t^+, γ_t^−};  Δx_t = γ_t d_t

Update solution
  OMP:  x_t = x_{t−1} + Δx_t
  LARS: x_t = x_{t−1} + Δx_t

Compute residual
  OMP:  r_t = y − A x_t
  LARS: r_t = y − A x_t

Stopping condition
  OMP:  ||r_t||_2 < ε
  LARS: ||r_t||_2 < ε

Increase iteration counter
  OMP:  t = t + 1
  LARS: t = t + 1

Table 4.4: The algorithmic steps of OMP and LARS after reformulating

From Table 4.4, it is observable that the algorithmic steps of OMP and LARS are almost identical; the only differences are in updating the active set and computing the solution update vector (the vector Δx). For updating the active set, OMP always adds a column to the active set at each iteration, and this column is never removed later. In LARS, on the other hand, a column is added to or removed from the active set at each iteration. For computing the vector Δx, both OMP and LARS solve the least squares problems stated in Equations (4.11) and (3.6), respectively. However, each algorithm relies on different quantities to do so: OMP uses the active entries of the correlation vector c, while LARS uses only their signs. Equally important, in OMP the vector Δx is computed directly by solving the least squares problem stated in Equation (4.11). In contrast, in LARS the vector Δx is computed by the following steps:

1. Determine the updated direction d by solving the least squares problem (3.6).
2. Compute the step size γ by Equation (3.35).
3. Multiply γ by d to obtain the vector Δx, as expressed in Equation (4.13).

4.2 Performance Analysis

To analyze the performance of OMP and LARS, we compare the two algorithms in terms of convergence time and the accuracy of the final solution.

4.2.1 Convergence time

OMP converges to the final solution much faster than LARS because:

1. Generally speaking, OMP requires fewer iterations than LARS to converge to the final solution. This is because OMP always adds columns to the active set, while LARS adds or removes columns to/from the active set.
2. At each iteration, OMP computes the vector Δx in one step by solving the least squares problem (4.11), whereas in LARS, computing the vector Δx involves the several steps stated in Section 4.1, which require more computation than solving the least squares problem alone.

4.2.2 Accuracy

In terms of the recovered sparse solution, OMP is considered less accurate than LARS when some columns of the matrix A are highly correlated. OMP takes the largest possible step in the least squares direction of the active columns by solving the least squares problem to update the solution vector x.
As a result, the residual vector will be orthogonal to all active columns at the next iteration. In other words, the residual vector will have zero correlation with all active columns. Consequently, columns that are highly correlated with the active columns will have small correlation with the current residual. Recall that at each iteration, OMP selects the column that has the largest absolute correlation with the current residual. Therefore, OMP does not select these columns even though they are significant for recovering the sparse solution. On the other hand, LARS does not have this problem because it uses a smaller step than the OMP step and increases the coefficients of the solution vector associated with the active set only as much as needed. For this reason, LARS selects columns that are important for recovering the sparse solution even though they may be highly correlated with columns that are already in the active set. To demonstrate this, we consider the following example:

Example 4.1: Assume the matrix A contains three columns (a_1, a_2, a_3), where the columns a_1 and a_2 are highly correlated, as shown in Figure 4.1. We generate the vector x ∈ R^3 whose first and second entries are nonzero and whose third entry is zero:

x = [0.9613, 0.2757, 0]^T

The measurement vector y is obtained by multiplying the matrix A by the vector x. As a result, the vector y is a linear combination of the columns a_1 and a_2 (i.e., y = 0.9613 a_1 + 0.2757 a_2).

Figure 4.1: Columns of the matrix A and the measurement vector y in Example 4.1

We use both OMP and LARS to find the vector that represents the vector y as a linear combination of the columns of the matrix A. In Figure 4.2 and Figure 4.3, we illustrate the approximation change to the vector y accomplished by OMP and LARS at each iteration. Note that the approximation change is calculated by multiplying the matrix A by the current solution update vector Δx.

OMP starts by selecting the column of the matrix A that has the maximum absolute correlation with the initial residual (i.e., the vector y). In this example, the column a_1 is selected at the first iteration because it is highly correlated with the initial residual, as shown in Figure 4.2, and it is added to the active set. Then, OMP takes the largest possible step in the direction of the column a_1 by projecting the vector y onto the column a_1. This leaves some error, represented by the residual vector r, which is orthogonal to a_1. At the second iteration, the column a_2 is less correlated with the residual vector since it is highly correlated with the column a_1. In this case, the column a_3 has the largest absolute correlation with the current residual. Therefore, the column a_3 is selected and added to the active set, as shown in Figure 4.3. Then, OMP takes the largest possible step in the space spanned by the columns (a_1, a_3) toward the vector y. After updating the solution vector, the residual is zero and OMP terminates. Note that the column a_2 is never selected even though it is important for recovering the original vector x.

On the other hand, similar to OMP, LARS starts by adding the column a_1 to the active set at the first iteration, as shown in Figure 4.2. However, LARS moves in the direction of the column a_1 only until the column a_2 has as much absolute correlation with the current residual as a_1.
At the second iteration, LARS adds the column a_2 to the active set and moves in the direction that is equiangular with both a_1 and a_2 toward the vector y, as shown in Figure 4.3. The residual is zero after updating the solution vector; therefore, LARS terminates at the end of the second iteration. Note that LARS selects the column a_2, which is essential for reconstructing the original vector x, while OMP does not.

Figure 4.4 shows the coefficients of the reconstructed vector over the iterations of OMP and LARS. As shown in the figure, LARS achieves a very small error norm (almost zero), while OMP obtains a high error norm. Note that the error is computed by the following equation:

error = x − x̂    (4.14)

where x is the original sparse vector and x̂ is the vector reconstructed by the algorithms. This implies that LARS can reconstruct the sparse vector x when two or more columns of the matrix A are highly correlated, while OMP fails. However, LARS is slower than OMP.

Figure 4.2: The selection and updating steps at the first iteration of OMP and LARS in Example 4.1. The approximation change is obtained by multiplying the matrix A by the current solution update vector Δx.

Figure 4.3: The selection and updating steps at the second (last) iteration of OMP and LARS in Example 4.1. The approximation change is obtained by multiplying the matrix A by the current solution update vector Δx.

Figure 4.4: The solution coefficients of the reconstructed vector over the iterations of OMP and LARS in Example 4.1, and the Euclidean norm of the error between the original sparse vector and the reconstructed vector after the algorithms terminate.

Chapter 5

5 Simulation Results and Discussion

In this chapter, we simulate the comparative analysis of the OMP and LARS algorithms stated in the previous chapters by using MATLAB. In general, we go through different examples to demonstrate the similarities and differences between OMP and LARS from different perspectives. First, we employ OMP and LARS to reconstruct some images from their compressive samples.
Afterward, we exploit parallel processing techniques to speed up the convergence of the algorithms and observe the efficiency of using different numbers of processors. Next, we examine the performance of OMP and LARS in terms of mean square error (MSE) as a function of the measurement size m and the sparsity k. To study the difference in the updating processes of OMP and LARS, we observe the coefficients of the solution and correlation vectors over the iterations. Finally, to study the performance of OMP and LARS from a different aspect, we generate two different examples of an underdetermined linear system, use OMP and LARS to reconstruct the sparse vector in each example, and plot the updating process at each iteration of the algorithms in a three-dimensional (3D) view.

5.1 Reconstructing images from their compressive samples

To demonstrate the efficiency of OMP and LARS, we reconstruct grayscale images from their compressive samples using both algorithms and measure the error between the original image and the reconstructed one. To estimate the convergence speed of each algorithm, we record the time and the number of iterations that the algorithms require to reconstruct the images.

We use images of size 512 × 512 pixels. First, we divide each image into 4096 patches of size 8 × 8 pixels. We reshape every patch into a vector of size 64 × 1, as shown in Figure 5.1. Each patch vector is normalized by dividing its entries by the maximum value found in the image. The matrix A is obtained by multiplying two matrices: an m × n random matrix φ and an n × n transformation matrix ψ:

A_{m×n} = φ_{m×n} × ψ_{n×n}    (5.1)

where n = 64. The random matrix φ is generated using a Gaussian distribution, and the transformation matrix ψ is generated using the Discrete Cosine Transform (DCT) [41]. To obtain the measurement vector y, the random matrix φ is multiplied by each patch of the image:

y_i = φ × p_i    (5.2)

where p_i represents the i-th patch, and i = 1, 2, ..., 4096. Now, the underdetermined linear system is expressed as follows:

A × x_i = y_i    (5.3)

where x_i is the sparse representation of the patch p_i. To obtain x_i for each i, we apply OMP and LARS to the system (5.3), as shown in Figure 5.1. Then, the transformation matrix ψ is multiplied by x_i to obtain the reconstructed patch p̂_i:

p̂_i = ψ × x_i    (5.4)

Figure 5.1: Reconstructing a patch of an image from its compressive samples by using OMP or LARS

After reconstructing all patches, we form the recovered image. Figure 5.2 and Figure 5.3 show the recovered images for measurement sizes m = 32 and m = 16, respectively. Then, we compute the mean square error (MSE) between the original patch and the reconstructed one:

MSE_i = (1/n) Σ_{j=1}^{n} ( p_i(j) − p̂_i(j) )^2    (5.5)

After computing the MSE for all 4096 patches, we calculate the average mean square error over them:

Average MSE = (1/4096) Σ_{i=1}^{4096} MSE_i    (5.6)

We rely on the value of the Peak Signal to Noise Ratio (PSNR) to determine the performance of the algorithms, which is computed by:

PSNR = 10 log_10 ( MAX(x) / Average MSE )    (5.7)

where MAX(x) is the maximum possible entry in the patches. Note that MAX(x) = 1 because the image patches are normalized.
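As a rough illustration of Equations (5.1) through (5.7) for a single patch, the following MATLAB-style sketch shows the order of the operations. The matrices phi and psi and the normalized patch p are assumed to be given, and sparse_solver is a hypothetical placeholder for either OMP or LARS; this is not the code used to produce the figures.

    A     = phi * psi;                 % Equation (5.1): combined sensing/transform matrix
    y     = phi * p;                   % Equation (5.2): compressive samples of the patch
    x     = sparse_solver(A, y);       % Equation (5.3): recover the sparse representation
    p_hat = psi * x;                   % Equation (5.4): reconstructed patch
    mse   = mean((p - p_hat).^2);      % Equation (5.5): per-patch mean square error
    % After averaging mse over all 4096 patches as in Equation (5.6):
    % PSNR = 10*log10(MAX / average_mse);   % Equation (5.7), with MAX = 1 for normalized patches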
If an algorithm has a high PSNR, this implies that it can successfully reconstruct the original images with small error. The values of PSNR for OMP and LARS with measurement sizes m = 32 and m = 16 are shown in Figure 5.2 and Figure 5.3, respectively.

To estimate the convergence speed of the algorithms, we record the number of iterations that each algorithm requires to converge to the final solution for each patch. Then, we compute the average number of iterations over all patches in the same way that we did with the MSE. However, the number of iterations is not an accurate measure of the algorithms' speed, because the computation involved in each LARS iteration takes longer than the computation of an OMP iteration. To get a better measurement of the algorithms' speed, we also record the time each algorithm requires to recover the whole image on a given computer. In this simulation, we use a Dell laptop equipped with an Intel i3 CPU (2.13 GHz) and 4 GB of memory. The average number of iterations and the required time for OMP and LARS with measurement sizes m = 32 and m = 16 are shown in Figure 5.2 and Figure 5.3, respectively.

From Figure 5.2 and Figure 5.3, it is observable that the number of iterations required by OMP is smaller than the number of iterations required by LARS to recover the same images. This is because OMP always adds columns to the active set, while LARS adds or removes columns to/from the active set. Equally important, OMP consumes less time than LARS. Nevertheless, LARS obtains a higher PSNR than OMP. The justification for this is that LARS selects the most important columns of the matrix A to recover the sparse solution even when these columns are highly correlated, while OMP does not.

Figure 5.2: Reconstructed images from their compressive samples by using OMP and LARS with m = 32 and n = 64.
  First image:  PSNR 26.3780 (OMP), 28.2098 (LARS); time 7.58966 s (OMP), 22.27399 s (LARS); average iterations 32 (OMP), 36 (LARS).
  Second image: PSNR 28.3421 (OMP), 30.0823 (LARS); time 5.42805 s (OMP), 16.819573 s (LARS); average iterations 23 (OMP), 27 (LARS).

Figure 5.3: Reconstructed images from their compressive samples by using OMP and LARS with m = 16 and n = 64.
  First image:  PSNR 21.8220 (OMP), 22.4618 (LARS); time 2.452597 s (OMP), 8.807559 s (LARS); average iterations 16 (OMP), 19 (LARS).
  Second image: PSNR 21.1888 (OMP), 22.5433 (LARS); time 2.138599 s (OMP), 7.632700 s (LARS); average iterations 14 (OMP), 16 (LARS).

5.2 Using parallel processing to speed-up convergence of the algorithms

In the previous section, we divided an image into a number of patches. After that, the compressive samples of each patch were obtained, and the algorithms (OMP and LARS) were used to reconstruct the original patches, as illustrated in Figure 5.1. Because the samples of each patch are independent of the samples of other patches, and the same algorithmic steps are executed on the samples of all patches, we can utilize parallel processing techniques to expedite the convergence of the algorithms.

According to the Flynn classification [42], reconstructing all patches of an image from their compressive samples falls under the single instruction multiple data (SIMD) class, because identical algorithmic steps are performed on the samples of different patches to recover the whole image. In this case, the samples of the patches are distributed over different processors.
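In MATLAB, for example, this distribution can be sketched with a parfor loop over the patches. The sketch below assumes the Parallel Computing Toolbox is available; recover_patch is a hypothetical helper that runs OMP or LARS on the samples of one patch, and Y is assumed to hold the samples of all patches column by column.

    G = A' * A;                         % Gramian matrix, computed once and shared by all workers
    num_patches = 4096;
    p_hat = cell(num_patches, 1);
    parfor i = 1:num_patches            % patches are independent, so iterations run in parallel
        p_hat{i} = recover_patch(A, G, Y(:, i));
    end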
All processors execute the same algorithmic steps to recover their own patches. Subsequently, the recovered patches are combined from the processors to form the entire image.

Both OMP and LARS calculate the Gramian matrix (A^T A) to solve the least squares problem over the active columns. Computing the Gramian matrix involves heavy computation. Equally important, the matrix A is identical for all patches. Therefore, for efficiency, the Gramian matrix (A^T A) is computed off-line first and then distributed to all processors [43].

To examine the impact of using multiple processors on the convergence time, we reconstruct an image from its compressive samples as we did in Section 5.1; however, in this section, we use more than one processor to run OMP and LARS. Figure 5.4 illustrates the variation of the convergence time with respect to the number of processors used. It is noticeable that the convergence time decreases as the number of processors increases. In addition, we observe that the reduction in convergence time is not significant once the number of processors increases above 6, because the communication overhead among processors grows as the number of processors increases.

Figure 5.4: The convergence time of OMP and LARS against the number of used processors

To calculate the gain of using multiple processors, we compute the speedup ratio by using the following equation:

Speedup = T(1) / T(p)    (5.8)

where T(1) is the convergence time of an algorithm using one processor, T(p) is the convergence time of an algorithm using p processors, and 1 ≤ Speedup ≤ p. The ideal speedup ratio equals the number of processors used (p). Normally, the speedup ratio is less than p because of the time lost in communication overhead among processors. The ideal and actual speedup ratios for the OMP algorithm are shown in Figure 5.5.

Figure 5.5: The ideal and actual speedup ratios for the OMP algorithm against the number of used processors

5.3 Impact of measurement size and sparsity on MSE

In this section, we examine the performance of OMP and LARS in terms of mean square error (MSE) as a function of the measurement size m and the sparsity k. We consider various underdetermined linear systems that have different measurement sizes m and different sparsity levels k. For all of the considered underdetermined linear systems, the value of n is fixed to 1000. For each trial of an underdetermined linear system, we randomly generate the matrix A of size m × 1000 using a Gaussian distribution. The columns of the matrix A are normalized to have unit l2 norm. Then we generate the sparse vector x ∈ R^1000, which has k nonzero entries in random positions. The nonzero entries of the vector x are also obtained from a Gaussian distribution, and all other entries are set to zero. The vector x is also normalized. We multiply the matrix A by the sparse vector x to compute the measurement vector y:

y = A × x    (5.9)

At this point, we employ OMP and LARS to reconstruct the original sparse vector of the underdetermined linear system described by Equation (5.9). Then, we compute the MSE between the original sparse vector and the reconstructed one as follows:

MSE = (1/n) Σ_{j=1}^{n} ( x(j) − x̂(j) )^2    (5.10)

where x̂ is the sparse vector reconstructed by the algorithms. In Figure 5.6, the average MSE of 100 trials is illustrated as a function of m for different values of k, using OMP and LARS.
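One trial of this experiment can be summarized by the following MATLAB-style sketch (an illustration only, under our own variable names, with sparse_solver again a hypothetical placeholder for OMP or LARS):

    n = 1000;
    A = randn(m, n);
    A = A ./ sqrt(sum(A.^2, 1));        % normalize the columns to unit l2-norm
    x = zeros(n, 1);
    pos = randperm(n, k);               % k nonzero entries in random positions
    x(pos) = randn(k, 1);
    x = x / norm(x);                    % normalize the sparse vector
    y = A * x;                          % Equation (5.9)
    x_hat = sparse_solver(A, y);        % reconstruct with OMP or LARS
    mse = mean((x - x_hat).^2);         % Equation (5.10)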
For each value of π‘˜, we observe that the average MSE decreases as the value of π‘š increases, until it settles to a very small value about (10βˆ’34) after specific value of π‘š, say οΏ½ . Having the average MSE approaching such small value implies that π‘š the algorithms successfully reconstruct the original sparse vector. Hence, the value οΏ½ is π‘š 58 considered as the minimum size of measurements to make the algorithms able to recover the original π‘˜ sparse vector. From the figure, we observe that OMP and LARS require more measurement π‘š to successfully recover the sparse vector as number of nonzeros π‘˜ is increased. In addition, we notice that LARS recovers the sparse vector with less average MSE than OMP for π‘š < οΏ½ . Therefore, LARS obtains better results when the measurement size π‘š is smaller than π‘š οΏ½. π‘š To examine the performance of the algorithms from a different viewpoint, we plot the average MSE as a function of π‘˜ for different values of π‘š as shown in Figure 5.7. We can see that the average MSE starts from almost zero; then after a certain value of π‘˜, say οΏ½ , it obviously π‘˜ increases to some values above zero. Therefore, by using the measurement vector of size π‘š, the algorithms can recover a π‘˜ sparse vector if π‘˜ < οΏ½ . Furthermore, we observe that LARS recovers π‘˜ a π‘˜ sparse vector with less average MSE than OMP when π‘˜ > οΏ½ . As a result, we can state that π‘˜ LARS is better to be used than OMP when the sparsity π‘˜ is fairly large. 59 Figure 5.6: Average MSE of 100 trials between the original sparse vectors and the reconstructed ones by OMP and LARS as a function of measurement size π‘š for different sparsity π‘˜ 60 Figure 5.7: Average MSE of 100 trials between the original sparse vectors and the reconstructed ones by OMP and LARS as a function of sparsity π‘˜ for different measurement size π‘š 61 5.4 Variation of the solution and correlation coefficients over iterations In this section, we aim to monitor the updating process of OMP and LARS to the solution vector over iterations. We use both OMP and LARS to recover a sparse solution of an underdetermined linear system. At each iteration of OMP and LARS, we plot the solution coefficients. To run an example, we consider an underdetermined linear system with π‘š = 50, 𝑛 = 100, and π‘˜ = 10. We generate the matrix 𝐴 and the sparse vector π‘₯ of the system by using Gaussian distribution in the same way we did in Section 5.3. To obtain the measurement vector 𝑦, we multiply the matrix 𝐴 by the sparse vector π‘₯ as express in Equation (5.9). At this point, we employ OMP and LARS to reconstruct the original sparse vector. The solution coefficients that corresponding to the active columns are shown over iterations of OMP and LARS in Figure 5.8. Form Figure 5.8, we observe that both OMP and LARS terminate with the same solution coefficients. However, they are different in the way of updating the solution coefficients. In OMP, when a column is added to the active set, the corresponding solution coefficient is increased to the largest possible value. The coefficients of the active set may decrease slightly in the subsequent iterations, because they will be affected by other columns that would be added to the active set later. On the other hand, at each iteration of LARS, all coefficients corresponding to the active set are increased to the smallest possible step in a way that makes a column from the inactive set join the active set at the next iteration. 
In addition, we observe that columns are added to the active set in a different order in the two algorithms, as shown in Figure 5.8. Moreover, LARS requires one more iteration than OMP to converge to the final solution, since it adds one more column (column 72 in this example) to the active set and increases its corresponding solution coefficient to a very small value (almost zero). The reason is that LARS solves the l1-norm minimization problem; therefore, some coefficients that are zero in the original sparse vector are equal to very small values, but not exactly zero, in the sparse vector reconstructed by LARS.

Figure 5.8: The solution coefficients that correspond to the active columns over the iterations of OMP and LARS

Furthermore, to see how the correlations of the columns of the matrix A with the current residual change, we illustrate them over some iterations of OMP and LARS in Figure 5.9 and Figure 5.10, respectively. At each iteration, OMP adds the column that has the maximum absolute correlation with the current residual vector to the active set, and updates the solution coefficients so that this column's correlation is zero at the next iteration. This is because OMP ensures that the current residual vector is always orthogonal to all active columns. For example, in Figure 5.9, OMP adds the 99th column to the active set at the second iteration, and its correlation is zero at the third iteration. OMP continues to add a column to the active set until the norm of the current residual approaches zero. On the other hand, LARS first selects the column that has the maximum absolute correlation with the initial residual (the vector y), and then updates the corresponding solution coefficient in a way that makes the absolute correlation of the selected column equal to the largest absolute correlation over the columns that are not in the active set. For instance, in Figure 5.10, LARS selects the 55th column at the first iteration and updates the solution coefficient to make its absolute correlation equal to the absolute correlation of the 91st column. Therefore, the 91st column is added to the active set at the second iteration. In a similar manner, LARS iteratively adds a column to the active set and decreases the absolute correlations of the active columns until no column has correlation with the current residual. We conclude that OMP updates the solution vector to make the absolute correlations of the active columns equal to zero, while LARS updates the solution vector to make the absolute correlations of the active columns equal to the same value (λ):

OMP:  |c(i)| = 0  for all i ∈ I        LARS:  |c(i)| = λ  for all i ∈ I    (5.11)

Note that λ decreases at each iteration; when λ = 0, LARS has converged to the final solution vector.

Figure 5.9: Absolute correlation of the columns of the matrix A in some iterations of OMP. The panels show the active set growing from {55} to {55, 99, 32, 34, 40, 6, 91, 35, 92, 22}.

Figure 5.10: Absolute correlation of the columns of the matrix A in some iterations of LARS. The panels show the active set growing from {55} to {55, 91, 99, 35, 40, 32, 6, 34, 92, 72, 22}.

5.5 Study the algorithms performance through three dimensions (3D) examples

In this section, we study the performance of OMP and LARS from a different perspective by using three-dimensional systems. We create two different examples of an underdetermined linear system with m = 3, n = 6, and k = 2.
Then, we use OMP and LARS to reconstruct the sparse vector in each example, and we illustrate the updating process of the algorithms at each iteration in a three-dimensional (3D) view. In both examples, we generate the matrix A and the sparse vector x using a Gaussian distribution in the same way we did in Section 5.3. We multiply the matrix A by the sparse vector x to obtain the measurement vector y, as expressed in Equation (5.9). Now, having the matrix A and the vector y, we use OMP and LARS to reconstruct the sparse vector x. To determine the performance of the algorithms, we compute the MSE between the original sparse vector x and the one reconstructed by the algorithms, as stated in Equation (5.10).

Example 5.1: We generate the matrix A in this example so that its columns are only slightly correlated. To quantify the correlation between the columns of the matrix A, we use the mutual coherence μ, which is computed as follows:

μ ≝ max_{i ≠ j} |⟨a_i, a_j⟩|    (5.12)

where a_i is the i-th column of the matrix A. In this example, μ = 0.5124, which is considered fairly small. The columns of the matrix A and the measurement vector y are drawn in Figure 5.11. We illustrate the approximation change (AΔx) to the vector y at each iteration of OMP and LARS in Figure 5.12. We notice that both OMP and LARS obtain almost the same MSE, which is approximately zero. This implies that OMP and LARS achieve the same performance and both succeed in reconstructing the sparse vector when the columns of the matrix A are only slightly correlated.

Figure 5.11: The columns of the matrix A and the measurement vector y in Example 5.1

Equally important, for low-dimensional underdetermined linear systems, a rectangular matrix (a matrix with more columns than rows) that has small mutual coherence rarely exists and is hard to find. To obtain the matrix A with μ = 0.5124 in this example, we generated 10 million matrices using a Gaussian distribution and selected the matrix with the smallest mutual coherence μ.

Figure 5.12: The approximation change (AΔx) at each iteration of OMP and LARS in Example 5.1, where μ = 0.5124

Example 5.2: We generate the matrix A in this example so that its columns are highly correlated, with μ = 0.8885. The columns of the matrix A and the measurement vector y are drawn in Figure 5.13. As we did in Example 5.1, we illustrate the approximation change (AΔx) to the vector y at each iteration of OMP and LARS, as shown in Figure 5.14. From the figure, we observe that the value of the MSE in the case of LARS is very small (almost zero), while in OMP it is high. This implies that LARS succeeds in reconstructing the sparse vector x, while OMP fails. Equally important, LARS converges to the final solution in two iterations, while OMP requires three iterations to converge to the final solution. As a result, LARS achieves better performance than OMP when the columns of the matrix A are highly correlated.
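Both examples rely on the mutual coherence of Equation (5.12). For a matrix whose columns have already been normalized to unit l2-norm, it can be computed with a few lines of MATLAB-style code (a minimal sketch under that assumption, not the script used to generate the examples):

    G  = abs(A' * A);                   % absolute inner products between all pairs of columns
    G  = G - diag(diag(G));             % discard the diagonal (the i = j terms)
    mu = max(G(:));                     % mutual coherence: largest off-diagonal entry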
Figure 5.13: The columns of the matrix A and the measurement vector y in Example 5.2

Figure 5.14: The approximation change (AΔx) at each iteration of OMP and LARS in Example 5.2, where μ = 0.8885

Chapter 6

6 Conclusion and Future Work

In this thesis, we developed a comprehensive comparison between two popular algorithms (OMP and LARS) that have been widely used for finding a sparse solution of an underdetermined linear system. We provided a thorough description of both algorithms. OMP attempts to find an approximate solution for the l0-norm minimization problem, while LARS solves the l1-norm minimization problem. Although OMP and LARS solve different minimization problems, they both depend on an underlying greedy framework. They start from an all-zero solution and then iteratively construct a sparse solution based on the correlation between the columns of the matrix A and the current residual vector. They converge to the final solution when the norm of the current residual vector approaches zero.

We showed that OMP and LARS have almost the same algorithmic steps after reformulating the corresponding analytical expressions. Nevertheless, they differ in how they maintain the active set and how they update the solution vector. To maintain the active set, LARS adds or drops columns to/from the active set, while OMP always adds columns to the active set and never removes them. Moreover, OMP and LARS adopt different ways of updating the solution vector. At each iteration, OMP updates the solution coefficients to the largest possible value that makes the residual vector orthogonal to all active columns. In contrast, LARS updates the solution coefficients by the smallest possible value that forces a column from the inactive set to join the active set, or drops a column from the active set.

In addition, we analyzed the performance of OMP and LARS in terms of convergence time and accuracy of the final solution. In general, OMP converges to the final solution in fewer iterations than LARS, because LARS may drop columns from the active set at some iterations, while OMP does not. Equally important, LARS computes the solution update vector (the vector Δx), which is used to update the solution vector x, in several steps, while OMP computes it in one step. Therefore, LARS executes many additional steps that OMP does not have. As a result, OMP requires less time than LARS to converge to the final solution. However, in some scenarios, especially when the columns of the matrix A are highly correlated, LARS can successfully reconstruct the sparse vector while OMP fails. The reason is that OMP does not select columns that are highly correlated with columns already in the active set, even though these columns are important for recovering the sparse vector. In contrast, LARS does not face this problem.

With respect to future work, we recommend studying the impact of the mutual coherence of the matrix A on the performance of OMP and LARS for different measurement sizes m and different sparsity levels k.
In addition, we believe that there is room to improve the implementations of OMP and LARS, and to propose a new algorithm that combines the benefits of OMP and LARS.

APPENDICES

Appendix A

In this appendix, we present our derivation of the penalized least squares problem expressed by Equation (4.2), which can be used to compute the active entries of the solution vector in the basic LARS algorithm stated in Section 3.1. Indeed, Equation (4.2) is used to compare LARS with OMP in [34]. We start the derivation by recalling Equation (3.6), which computes the updated direction of LARS:

A_I^T A_I d_t(I) = sign(c_t(I))    (A.1)

where I is the active set at iteration t. Multiplying Equation (A.1) by the step size γ_t, which is a scalar, gives:

A_I^T A_I γ_t d_t(I) = γ_t sign(c_t(I))    (A.2)

From Equation (3.10), we obtain:

γ_t d_t(I) = x_t(I) − x_{t−1}(I)    (A.3)

Substituting (A.3) into (A.2) gives A_I^T A_I ( x_t(I) − x_{t−1}(I) ) = γ_t sign(c_t(I)), that is:

A_I^T A_I x_t(I) = A_I^T A_I x_{t−1}(I) + γ_t sign(c_t(I))    (A.4)

Recall that the residual vector r at iteration t − 1 is computed by the following equation:

r_{t−1} = y − A x_{t−1}    (A.5)

Because only the active entries of x_{t−1} are nonzero, we can rewrite (A.5) as follows:

r_{t−1} = y − A_Ī x_{t−1}(Ī)    (A.6)

where Ī is the active set at iteration t − 1. Like OMP, the basic LARS always adds columns to the active set and never removes them. Therefore, the active set I at iteration t has one column that is not included in the active set Ī of the previous iteration (t − 1), and the coefficient of x_{t−1} corresponding to this column is zero. Hence, we obtain:

A_Ī x_{t−1}(Ī) = A_I x_{t−1}(I)    (A.7)

By substituting (A.7) into (A.6), we obtain:

A_I x_{t−1}(I) = y − r_{t−1}    (A.8)

Using (A.8) in (A.4):

A_I^T A_I x_t(I) = A_I^T ( y − r_{t−1} ) + γ_t sign(c_t(I)) = A_I^T y − A_I^T r_{t−1} + γ_t sign(c_t(I))    (A.9)

The active entries of the correlation vector are computed via c_t(I) = A_I^T r_{t−1}. Hence, we can rewrite Equation (A.9) as follows:

A_I^T A_I x_t(I) = A_I^T y − c_t(I) + γ_t sign(c_t(I))    (A.10)

Substituting c_t(I) of (3.4) into (A.10) gives A_I^T A_I x_t(I) = A_I^T y − λ_t sign(c_t(I)) + γ_t sign(c_t(I)), that is:

A_I^T A_I x_t(I) = A_I^T y − (λ_t − γ_t) sign(c_t(I))    (A.11)

Recalling Equation (3.20), λ_{t+1} = λ_t − γ_t, we can rewrite (A.11) as follows:

A_I^T A_I x_t(I) = A_I^T y − λ_{t+1} sign(c_t(I))

This leads to the penalized least squares problem expressed by Equation (4.2).

BIBLIOGRAPHY

[1] Emmanuel J. Candès, Justin Romberg, and Terence Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489-509, February 2006.

[2] David L. Donoho, "Compressed Sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, April 2006.

[3] Emmanuel J. Candès and Terence Tao, "Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?," IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406-5425, December 2006.

[4] Emmanuel J. Candès, "Compressive sampling," in Proceedings of the International Congress of Mathematicians, Madrid, Spain, August 2006.

[5] Yaakov Tsaig and David L. Donoho, "Extensions of compressed sensing," Signal Processing, vol. 86, no. 3, pp. 549-571, March 2006.
[6] Emmanuel J. Candès and Terence Tao, "Decoding by Linear Programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203-4215, December 2005.

[7] Mark Rudelson and Roman Vershynin, "Geometric approach to error-correcting codes and reconstruction of signals," International Mathematics Research Notices, vol. 2005, no. 64, pp. 4019-4041, December 2005.

[8] Alexander Vardy, "Algorithmic complexity in coding theory and the minimum distance problem," in STOC ’97: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, New York, NY, USA, 1997, pp. 92-109.

[9] Albert Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2005.

[10] Emmanuel J. Candès, Justin Romberg, and Terence Tao, "Quantitative Robust Uncertainty Principles and Optimally Sparse Decompositions," Foundations of Computational Mathematics, vol. 6, no. 2, pp. 227-254, 2006.

[11] Emmanuel J. Candès, Justin Romberg, and Terence Tao, "Stable signal recovery from incomplete and inaccurate measurements," Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207-1223, August 2006.

[12] Michael Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, 1st ed. New York, NY, USA: Springer, 2010.

[13] B. K. Natarajan, "Sparse Approximate Solutions to Linear Systems," SIAM Journal on Computing, vol. 24, no. 2, pp. 227-234, 1995.

[14] Scott S. Chen, David L. Donoho, and Michael A. Saunders, "Atomic Decomposition by Basis Pursuit," SIAM Journal on Scientific Computing, vol. 20, pp. 33-61, 1998.

[15] David L. Donoho, "For most large underdetermined systems of linear equations the minimal L1-norm solution is also the sparsest solution," Communications on Pure and Applied Mathematics, vol. 59, no. 6, pp. 797-829, June 2006.

[16] Jean-Jacques Fuchs, "Recovery of exact sparse representations in the presence of bounded noise," IEEE Transactions on Information Theory, vol. 51, no. 10, pp. 3601-3608, October 2005.

[17] Yaakov Tsaig and David L. Donoho, "Breakdown of equivalence between the minimal L1-norm solution and the sparsest solution," Signal Processing, vol. 86, no. 3, pp. 533-548, March 2006.

[18] David L. Donoho and Philip B. Stark, "Uncertainty Principles and Signal Recovery," SIAM Journal on Applied Mathematics, vol. 49, no. 3, pp. 906-931, 1989.

[19] David L. Donoho and Xiaoming Huo, "Uncertainty principles and ideal atomic decomposition," IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2845-2862, November 2001.

[20] Michael Elad and Alfred M. Bruckstein, "A generalized uncertainty principle and sparse representation in pairs of bases," IEEE Transactions on Information Theory, vol. 48, no. 9, pp. 2558-2567, September 2002.

[21] Rémi Gribonval and Morten Nielsen, "Sparse representations in unions of bases," IEEE Transactions on Information Theory, vol. 49, no. 12, pp. 3320-3325, December 2003.

[22] Stephen Boyd and Lieven Vandenberghe, Convex Optimization. Cambridge University Press, 2004.

[23] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to Algorithms, 3rd ed. The MIT Press, 2009.

[24] Stephane G. Mallat and Zhifeng Zhang, "Matching Pursuits With Time-Frequency Dictionaries," IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397-3415, December 1993.
[25] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, November 1993.

[26] Geoffrey Davis, Stéphane Mallat, and Marco Avellaneda, "Adaptive Greedy Approximations," Constructive Approximation, vol. 13, pp. 57-98, 1997.

[27] Joel A. Tropp, "Greed is Good: Algorithmic Results for Sparse Approximation," IEEE Transactions on Information Theory, vol. 50, no. 10, pp. 2231-2242, October 2004.

[28] David L. Donoho, Yaakov Tsaig, Iddo Drori, and Jean-Luc Starck, "Sparse solution of underdetermined linear equations by Stagewise Orthogonal Matching Pursuit," preprint, 2006.

[29] Deanna Needell and Roman Vershynin, "Signal Recovery From Incomplete and Inaccurate Measurements Via Regularized Orthogonal Matching Pursuit," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 310-316, April 2010.

[30] Gagan Rath and Christine Guillemot, "Sparse approximation with an orthogonal complementary matching pursuit algorithm," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3325-3328, April 2009.

[31] Thomas Blumensath and Mike E. Davies, "Gradient Pursuits," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2370-2382, June 2008.

[32] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301-321, May 2009.

[33] Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani, "Least Angle Regression," The Annals of Statistics, vol. 32, no. 2, pp. 407-451, April 2004.

[34] David L. Donoho and Yaakov Tsaig, "Fast Solution of L1-norm Minimization Problems When the Solution May be Sparse," IEEE Transactions on Information Theory, vol. 54, no. 11, pp. 4789-4812, November 2008.

[35] Robert Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, vol. 58, no. 1, pp. 267-288, 1996.

[36] Joel A. Tropp and Anna C. Gilbert, "Signal recovery from random measurements via Orthogonal Matching Pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655-4666, December 2007.

[37] Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning, 2nd ed. New York, NY, USA: Springer, 2009.

[38] M. Elad, J.-L. Starck, P. Querre, and D. L. Donoho, "Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA)," Applied and Computational Harmonic Analysis, vol. 19, no. 3, pp. 340-358, November 2005.

[39] Jean-Luc Starck, Michael Elad, and David L. Donoho, "Image Decomposition: Separation of Texture from Piecewise Smooth Content," in SPIE Annual Meeting, San Diego, CA, USA, 2003.

[40] Jean-Luc Starck, Michael Elad, and David L. Donoho, "Redundant Multiscale Transforms and Their Application for Morphological Component Separation," Advances in Imaging and Electron Physics, vol. 132, pp. 287-348, 2004.

[41] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete Cosine Transform," IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90-93, January 1974.

[42] Behrooz Parhami, Introduction to Parallel Processing. New York, NY, USA: Plenum Press, 1999.

[43] Julien Mairal, SPArse Modeling Software. [Online]. Available: http://www.di.ens.fr/willow/SPAMS/