LOW RANK MODELS FOR MULTI-DIMENSIONAL DATA RECOVERY AND IMAGE SUPER-RESOLUTION

By

Mohammed Al-Qizwini

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Electrical Engineering – Doctor of Philosophy

2017

ABSTRACT

LOW RANK MODELS FOR MULTI-DIMENSIONAL DATA RECOVERY AND IMAGE SUPER-RESOLUTION

By Mohammed Al-Qizwini

In the past decade, tremendous research efforts focused on signals with specific features, especially sparse and low rank signals. Researchers showed that these signals can be recovered from a much smaller number of samples than the Nyquist rate requires. These efforts were promising for several applications in which the nature of the data is known to be sparse or low rank, but the available samples are far fewer than what traditional signal processing algorithms require to guarantee an exact recovery.

Our objective in the first part of this thesis is to develop new algorithms for low rank data recovery from few observed samples and for robust low rank and sparse data separation using Robust Principal Component Analysis (RPCA). Most current approaches in this class of algorithms rely on the computationally expensive Singular Value Decomposition (SVD) in each iteration to minimize the nuclear norm. In particular, we first develop new algorithms for low rank matrix completion that are more robust to noise and converge faster than previous algorithms. Furthermore, we generalize our recovery function to the multi-dimensional tensor domain to target applications that deal with multi-dimensional data. Based on this generalized function, we propose a new tensor completion algorithm to recover multi-dimensional tensors from few observed samples. We also used the same generalized functions for robust tensor recovery to reconstruct the sparse and low rank tensors from the tensor that is formed by the superposition of those parts. The experimental results for this application showed that our algorithms provide performance comparable to, or even better than, state-of-the-art matrix completion, tensor completion and robust tensor recovery algorithms, while at the same time converging faster.

The main objective of the second part of the thesis is to develop new algorithms for example based single image super-resolution. In this type of application, we observe a low-resolution image and, using some external "example" pairs of high-resolution and low-resolution images, we recover the underlying high-resolution image. Previous efforts in this field either assumed that there is a one-to-one mapping between low-resolution and high-resolution image patches, or assumed that the high-resolution patches span the lower dimensional space. In this thesis, we propose a new algorithm that parts away from these assumptions. Our algorithm uses a subspace similarity measure to find the closest high-resolution patch to each low-resolution patch. The experimental results showed that the proposed Dual-Manifold Clustering and Subspace Similarity (DMCSS) algorithm achieves clear visual improvements and an average of 1 dB improvement in PSNR over state-of-the-art algorithms in this field.

Beyond the work presented here, we are currently pursuing other low rank and image super-resolution applications to improve the performance of our current algorithms and to find other algorithms that can run faster and perform even better.

To my family…

ACKNOWLEDGMENTS

First of all, I wish to express my great gratitude to my advisor, Dr. Hayder Radha, for his guidance and support throughout all stages of this research.
His advice and encouragement have provided me with the skills needed to develop and refine this research. Special Thanks for Dr. Selin Aviyente, Dr. Xiaobo Tan and Dr. Xiaoming Liu for agreeing to serve on my committee. I would like to acknowledge and thank Michigan State University (MSU) for allowing me to conduct my study and research here. I also wish to thank all my friends and colleagues in the Electrical and Computer Engineering departments at MSU. A huge thank to all my friends who have assisted and supported me in so many ways during my study. Lastly, I would like to express my great gratitude to my mother, father, brothers and sisters for all their support and encouragement for me. v TABLE OF CONTENTS LIST OF TABLES ........................................................................................................... viii LIST OF FIGURES ........................................................................................................... ix Chapter 1 Introduction ...................................................................................................... 1 1.1 Matrix Completion ............................................................................................... 1 1.2 Robust Principal Component Analysis ......................................................................... 2 1.3 Image Super-resolution ...................................................................................................... 4 1.4 Notations .................................................................................................................................. 4 1.5 Literature Survey ................................................................................................. 6 1.5.1. Matrix Completion ....................................................................................................... 6 1.5.2. Tensor Completion ...................................................................................................... 9 1.5.3. Robust Tensor Recovery ............................................................................ 11 1.5.4. Image Super-resolution .............................................................................. 12 1.6 Contributions...................................................................................................... 14 1.7 The proposal outline .......................................................................................................... 15 Chapter 2 Matrix and Tensor Completion and Robust Tensor Recovery....................... 17 2.1 Matrix Rank Minimization ................................................................................ 17 2.2 Matrix Completion ............................................................................................. 21 2.3 Tensor Completion ............................................................................................................. 23 2.4 Robust Tensor Recovery .................................................................................................. 25 2.5 Truncated Nuclear Norm ................................................................................... 26 Chapter 3 Proposed Algorithms for Low Rank Models .................................................. 30 3.1 Matrix Completion Algorithms using Smooth Rank Approximation [68]…….30 3.1.1. 
Accelerated Proximal Gradient Algorithm for Reweighted Rank Approximation ......................................................................................................................... 30 3.1.2. Alternating Direction Method of Multipliers for Reweighted Rank Approximation .......................................................................................................................... 33 3.2 Smooth Rank Approximation Tensor Completion [67]..................................... 35 3.3 Smooth Rank Approximation for Robust Tensor Recovery [70] ...................... 38 3.3.1. The Proposed ALM Solution................................................................................... 40 3.3.2. The Proposed APG Solution ................................................................................... 42 3.3.3. The Proposed ADMM Solution .............................................................................. 46 3.4 Experimental Results ......................................................................................... 50 3.4.1. Matrix Completion .................................................................................................... 50 3.4.2. Tensor Completion ................................................................................................... 58 3.4.3. Robust Tensor Recovery ......................................................................................... 67 3.5 Proof of Lemmas................................................................................................ 79 3.5.1. Proof of Lemma 3.1: .................................................................................................. 79 3.5.2. Proof of Lemma 3.2: .................................................................................................. 80 3.5.3. Proof of Lemma 3.3: ................................................................................................. 80 vi 3.5.4. Proof of Lemma 3.4: …………………………..…………………….….…….…..….…....81 3.5.5. Proof of Lemma 3.5: …………………………….………….……….…...81 3.5.6. Proof of Lemma 3.6: .……………………………..………….……..…....82 3.5.7. Proof of Lemma 3.7: ………………….……………………..…….……...83 Chapter 4 Single Image Super-Resolution ....................................................................... 85 4.1 Example Based Single Image Super-resolution ................................................. 85 4.2 Subspace Clustering ........................................................................................... 86 4.3 Low Rank Subspace Clustering (LRSC) ........................................................... 88 4.4 Dual-manifold clustering and subspace similarity [86] ..................................... 89 4.4.1. Training Phase: ........................................................................................................... 90 4.4.2. Testing Phase:.............................................................................................................. 93 4.5 Experimental Results ......................................................................................... 94 Chapter 5 Conclusions and Future Works ................................................................... 100 5.1 Conclusions ...................................................................................................... 100 5.2 Future works .................................................................................................... 102 5.2.1. Matrix and Tensor Completion and Robust Tensor Recovery ..............102 5.2.2. 
Sparse\redundant representation for a single video frame ..................102 BIBLIOGRAPHY ........................................................................................................... 105 vii LIST OF TABLES Table 2.1: A comparison between some important terms according to the rank minimization and cardinality minimization concepts …………………………..…… 17 Table 3.1: Comparison of average execution time vs. SNR for matrix completion algorithms using a set of 12 grayscale images ………………………………………. 54 Table 3.2: Average execution time and PSNR comparison for tensor completion algorithms using a set of 10 colored images ………………………………………… 62 Table 3.3: Average execution time and PSNR comparison for tensor RPCA algorithms using a set of 10 colored images …………………………………………………….. 75 Table 4.1: Performance of super-resolution algorithms on different test ismages with a magnification factor of 3……………………………………………………………… 99 viii LIST OF FIGURES Figure 1.1: Visual explanation of the (. )(.) function in tensor domain. (a) the tensor in 3D domain, (b) the unfolded tensor along i1, (c) the unfolded tensor along i2 and (d) the unfolded tensor along i3…….……………………………………………………….. 5 Figure 2.1: Collaborative filtering example of bad sampling for low rank matrix recovery.……………………………………...……………………………………..... 21 Figure 3.1: (a) Original image (rank 80) (b) The observed noisy samples, (c-e) Inpainted image and the corresponding SNR (dB) using (c) ILRS-1, (d) SRAMC-APG, (e) SRAMC-ADM ……………………………………………………………….……… 51 Figure 3.2: Iterations versus signal to noise ratio for cameraman image with 50% observed entries.…………………………………………………..……………………………. 52 Figure 3.3: SNR versus the noise variance using cameraman image with 50% observed entries……….……………………………………………………………….………. 53 Figure 3.4: Columns correlation coefficient for the recovered cameraman image from 50% observed noisy entries.………………………………………………………………... 53 Figure 3.5: (a) The original façade image, (b) The noisy observation, (b-d) Inpainted image and the corresponding SNR (dB) using (b) ILRS-1, (c) SRAMC-APG, (d) SRAMC-ADM ……………………………………………………………………… 55 Figure 3.6: (a) The original drop image, (b) The masked noisy observation, (b-d) Inpainted image and the corresponding SNR (dB) using (b) ILRS-1, (c) SRAMC-APG, (d) SRAMC-ADM .……………………………………………………………………… 57 Figure 3.7: (a) The original house image, (b-d) Recovered images with the corresponding PSNR (dB) and execution time (Sec.) using (b) LRTC, (c) APG, (d) SRATC ………………………………………………………………………………………... 59 Figure 3.8: (a) The original house image, (b-d) Recovered images with the corresponding PSNR (dB) and execution time (Sec.) using (b) LRTC, (c) APG, (d) SRATC ………………………………………………………………………………………... 60 Figure 3.9: Peak Signal to Noise Ratio versus the rank of the matrix using lena colored image with 50% observed samples.………………………………..………………… 61 Figure 3.10a: (a) Original façade image (b) Recovered images with the corresponding execution time (Sec.) using LRTC.…………………………………..…………… 63 ix Figure 3.10b: Recovered images with the corresponding execution time (Sec.) using (a) APG, (b) SRATC.……………………………………...…………………………… 64 Figure 3.11: (a) The original tomato frame (number 6), (b-d) Recovered frames with the corresponding PSNR (dB) and execution time (Sec.) using (b) LRTC, (c) APG, (d) SRATC …………………………...………………………………………………… 66 Figure 3.12a: Top: The observed image. Left: low rank model. Right: Sparse model using MALM.………………………………………………...…………………………….. 68 Figure 3.12b: Left: low rank model. 
Right: Sparse model using RSTD (Top) and IADAL (Bottom).…………………………………………………………………………….. 69 Figure 3.12c: Left: low rank model. Right: Sparse model using TSRTR-ALM (Top) and TSRTR-APG (Bottom)..………………………………….………………………….. 70 Figure 3.12d: Left: low rank model. Right: Sparse model using TSRTRADMM.……………………………………………………………………..……….. 71 Figure 3.13: PSNR versus iteration number for the façade image in Figure 3.12…... 73 Figure 3.14: PSNR versus rank of the first two folds for the façade image in Figure 3.12…………………………………………………………………………………... 74 Figure 3.15: Set of images used for performance comparison in Table 3.3................. 74 Figure 3.16a: Top: The observed image. Left: low rank model. Right: Sparse model using MALM (Top) and RSTD (Bottom).............................................................................. 77 Figure 3.16b: Left: low rank model. Right: Sparse model using IADAL (Top), TSRTRALM (Middle) and TSRTR-APG (Bottom)………………………………................. 78 Figure 3.16c: Left: The observed image. Middle: low rank model. Right: Sparse model using TSRTR-ADMM.................................................................................................. 79 Figure 4.1: The test phase summary of the DMCSS algorithm. d1,d2 and d3 represent the distance between the LR test patch and each cluster in the LR manifold. T:L2-H1 is an entry in the subspace similarity table (T) and it represents the closest HR subspace to the corresponding LR subspace............................................................. 91 Figure 4.2: The training images used for the DMCSS algorithm. The pictures are proportionally resized to fit in the table........................................................................ 96 Figure 4.3: The results of applying different super-resolution methods to lena image with magnification factor of 3………………………………………………………… 97 x Figure 4.4: Testing images for super-resolution algorithms…………………………. xi 98 Chapter 1 Introduction 1.1 Matrix Completion According to the Nyquist-Shannon sampling theorem, a signal can be completely determined by its samples if the sampling rate is twice the maximum frequency of that signal. On the other hand, there are numerous applications where sampling at Nyquist frequency is either impossible, very expensive or time consuming [1]. In addition to that the tremendous number of images and videos that are being generated, transmitted and stored on the internet, researchers suggested that instead of the usual data acquisition method of collecting as much information as possible and then through away the redundant information, why not collecting only the information necessary to recover the data. Researchers proved that if the underlying signal has specific properties, then it can be recovered using much fewer samples than the Nyquist rate, for instance, if the signal is sparse or it can be transformed into a sparse domain, then only few number of samples are required to recover that signal using the Compressive Sampling framework [2]–[4]. From a similar concept to Compressed Sensing, Matrix Completion also emerged during the past decade as one of these theories and several researches are being performed in this topic to this day. The main concept behind matrix completion is recovering the underlying low rank data from only few observed samples of the original signal. 
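As a toy illustration of this setting, the short sketch below (in Python/NumPy; the size, rank and sampling ratio are arbitrary choices for illustration, not values used elsewhere in this thesis) builds a low rank matrix and keeps only a random subset of its entries, which is exactly the kind of input a matrix completion algorithm starts from:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 100x100 matrix of rank 5, formed as the product of two thin factors.
m, n, r = 100, 100, 5
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Observe 30% of the entries uniformly at random (the observed set "Omega").
mask = rng.random((m, n)) < 0.3
M_observed = np.where(mask, M, 0.0)

print("rank of M:", np.linalg.matrix_rank(M))   # 5
print("fraction observed:", mask.mean())        # ~0.3
```

A completion algorithm receives only the observed entries and their locations and attempts to reproduce the full matrix M.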
There is a large number of applications in which we observe an incomplete matrix of measurements and we wish to fill in the unobserved matrix entries using the observed measurements [5], [6]. For instance, in collaborative filtering, the famous Netflix problem, 1 in which the vendor has a large number of users and a set of movies, each user represents a row and each column represents a movie, and the challenge was to predict each user’s ratings for the movies that he/she didn’t watch based on the ratings provided by the user for the movies that he/she watched in order to recommend other movies to that user. Because there is a small number of factors that can affect users’ taste of preferences about movies, these matrices are considered to be low rank matrices [7], [8]. Other applications include recovering missing pixels in images for computer vision, system identification in control, multi-class learning in data analysis, global positioning of sensors in a network, partial distance information, remote sensing applications in signal processing and many statistical problems involving succinct factor models [7]. Here we stop to ask a question, is it possible to recover the unknown entries of a matrix given a subset of its known entries? A first thought to the problem will give an answer no to the question, however if the matrix is low rank, then it is possible to recover the original matrix with high probability using nuclear norm minimization [5], [8], [9]. 1.2 Robust Principal Component Analysis Principal Component Analysis (PCA) is one of the most used statistical techniques in dimension reduction. However, the performance of PCA-based approaches degrades significantly for non-Gaussian noise, and especially for grossly corrupted observations or in the present of outliers [9]–[11]. To overcome these shortcomings of traditional PCA and make it more robust, Candes et al. [10] suggested invoking the assumption that the observed data consists of two components, a low rank component 𝐿 and a sparse component 𝑆 and they introduced a novel algorithm based on tractable convex optimization to solve the 2 problem within a polynomial time and a strong performance guarantee. Consequently, the problem can be reduced to the recovery of 𝐿 from highly corrupted measurements 𝐷 = 𝐿 + 𝑆, in which 𝑆 can have arbitrary large elements, but must be a sparse matrix. Furthermore, to make such modeling approach viable and meaningful, the low rank component 𝐿 is assumed to be not sparse [9], [10]. For some applications, the sparse matrix 𝑆 could carry useful information. For instance, on latent semantic indexing, the input matrix 𝐷 contains entries that encode the relevance of a word to a document. If 𝐷 can be decomposed into two components, low rank and sparse, then 𝐿 could represent common words, and 𝑆 captures some key words that could be used to distinguish each document from others [9], [12]. On the other hand, some applications consider 𝑆 as merely noise or outliers and they are only interested in recovering𝐿 . For example, in the face recognition problem 𝑆 represents the noise and outliers such as facial expression changes or glasses or other objects. While 𝐿 represents the original facial expressions to be recovered [13]. Recovering the low rank and sparse components of an observed multidimensional signal (tensor) that is formed by the superposition of both components is a very interesting problem and it has a wide range of applicability in computer vision, bioinformatics and graph analysis [10],[14]. 
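As a small synthetic illustration of this superposition model (a sketch in Python/NumPy; the sizes, rank and corruption level are arbitrary and only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 80, 80, 4

# Low rank component L (rank r) and a sparse component S with a few large entries.
L = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
S = np.zeros((m, n))
idx = rng.random((m, n)) < 0.05                 # ~5% of entries are grossly corrupted
S[idx] = 10.0 * rng.standard_normal(idx.sum())

# The observation is the superposition; RPCA tries to recover L and S from D alone.
D = L + S
```

An RPCA-style algorithm observes only D and attempts to return estimates of the low rank part L and the sparse part S.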
Also, since most applications of interest, especially those related to imaging and computer vision, have to handle multidimensional data that are inherently 3D or 4D [15], [16], applying the recovery algorithms to each two-dimensional plane separately ignores the internal correlation in the signal.

1.3 Image Super-resolution

Image super-resolution is the ill-posed problem of recovering the underlying high-resolution (HR) image from one or more observed low-resolution (LR) images. Image super-resolution is important in several computer vision applications, such as license plate recognition [17], object and face recognition [18] and restoration of historical images, i.e., image inpainting [19], as it promises to overcome the shortcomings of low-cost camera sensors and allows better utilization of high-resolution screens. Many scenarios have been suggested for image super-resolution based on the availability of additional information to help with the recovery. In some scenarios, additional LR images of the same object are available but under different conditions, which is referred to as multi-frame image super-resolution [20]–[22]. In other scenarios, no training images are used; instead, researchers base their approaches for recovering the HR image on information from the available LR image itself [23]–[25]. Other approaches assume only one LR image is available and examples of training HR and LR images are used to recover the target HR image, which is called example based image super-resolution [26]–[29]. This thesis falls in the example based single image super-resolution category, since we will be using two external dictionaries, one for LR example images and the other for HR images.

1.4 Notations

Throughout this thesis, matrices are denoted using capital letters, e.g. M, and tensors by cursive capital letters, e.g. 𝓜. Also, M^T is the transpose of M. M_{i1 i2} is an entry of the matrix M ∈ ℝ^{m×n} at row i1 and column i2; and 𝓜_{i1…iN} is an entry of the tensor 𝓜 ∈ ℝ^{m1×m2×…×mN} at indices i1 through iN. For a matrix M ∈ ℝ^{m×n}, P_Ω(M) is the sampling operator, which is defined as:

$$P_\Omega(M)_{i_1 i_2} = \begin{cases} M_{i_1 i_2} & \text{if } (i_1, i_2) \in \Omega \\ 0 & \text{elsewhere} \end{cases} \qquad (1.1)$$

where Ω represents the set of observed entries.

Figure 1.1: Visual explanation of the unfolding operation (·)_(i) in the tensor domain: (a) the tensor in the 3D domain, (b) the tensor unfolded along i1, (c) along i2 and (d) along i3.

For 𝓜 ∈ ℝ^{m1×m2×…×mN}, the unfold(·)_(i) operation over dimension i unfolds the N-dimensional tensor into a matrix; fold(·)_(i) is the inverse of unfold(·)_(i), defined so that:

$$\mathcal{M} = \mathrm{fold}\big((\mathcal{M})_{(i)}\big)_{(i)} \qquad (1.2)$$

A visual example of the tensor unfold operation is shown in Figure 1.1. The inner product between two matrices is defined by:

$$\langle C, E \rangle = \mathrm{Tr}(CE) \qquad (1.3)$$

where Tr(·) is the trace function. If M = UΣV^T is the SVD decomposition of M, then the soft thresholding operator over M is defined as [30], [31]:

$$D_{Th}(M) = U \Sigma_{Th} V^T \qquad (1.4)$$

where Σ_{Th} = diag(max(σ_i − Th, 0)) is the diagonal matrix of soft-thresholded singular values.

1.5 Literature Survey

1.5.1. Matrix Completion

For the past decade, many algorithms have been proposed mainly for matrix completion, and several variations of the problem have been suggested. In this section, we discuss some of the important work on matrix completion that is related to our contributions in this field. E. J. Candès and B.
Recht, proposed the first convex programming algorithm to solve the 6 exact matrix completion algorithm in [5], they showed that low rank matrices can be recovered with very high probability if the number of observed samples (m) obey: 𝑚 ≥ 𝐶𝑛1.2 𝑟 log 𝑛 (1.5) For some positive constant 𝐶, most 𝑛×𝑛 matrices with rank 𝑟 can be recovered perfectly using convex optimization. They also showed that by replacing the 1.2 exponent with 1.25 the recovery is guaranteed for all rank values. E. J. Candès and Y. plan proposed a relaxation to the equality constrains of the matrix completion problem in [7] to reduce the effect of noise on the signal reconstruction performance. They performed numerical and quantitative analysis and showed that nuclear norm minimization can perfectly fit large low rank matrices from only few observed noisy samples. A simple Singular Value Thresholding (SVT) algorithm for matrix completion was proposed in [32], the algorithm performs matrix completion by soft thresholding the singular values of the SVD decomposition of the estimated matrix. The authors provided convergence analysis of the algorithm and the experimental results showed that 1000×1000 matrices can be recovered within one minute on an average desktop computer. A Fixed Point Continuation Algorithm with approximate SVD decomposition (FPCA) for nuclear norm minimization were proposed in [31]. The authors compared the algorithm to other semidefinite programming algorithms and showed that FPCA converges much faster than the others and at the same time it provided much better performance. Their experimental results showed that on 1000 ×1000 matrix with rank 50, they achieved 10-5 error and the algorithm converged within 3 minutes. In [33], the authors used the nuclear norm regularized least square instead of minimizing 7 the nuclear norm. The regularized least square is the sum of a convex smooth function with Lipschitz continuous gradient and a convex function on a set of matrices. An accelerated proximal gradient algorithm is proposed to solve the objective function. The experimental results showed that the algorithm is robust and efficient to solve large scale matrices of dimensions up to 105 in less than 10 minutes on an average personal computer. In [34], the authors proposed a novel method by only minimizing the smaller singular values from the nuclear norm and leave the significant ones unchanged instead of all the singular values. The authors build an algorithm using the alternative direction method of multiplier (ADM) based on the Truncated Nuclear Norm Regularization (TNNR) by only minimizing the 𝑁 − 𝑟 singular values where 𝑁 is the number of singular values and 𝑟 is the rank of the matrix. The authors compared the performance of the algorithm against state of the art algorithms on grayscale and colored images and showed that the proposed algorithm achieve significant improvement in performance. A new set of Iterative Reweighted Least Squares (IRLS- 𝑝) where 0 ≥ 𝑝 ≤ 1 algorithms was proposed by K. Mohan and M. Fazel in [35]. The algorithms are used as a computationally efficient variation of the nuclear norm minimization. The authors also present smoothed version of the algorithms sIRLS- 𝑝 which shown very fast convergence results and at the same time improved the recovery performance when compared to the conventional nuclear norm minimization algorithms. 
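Several of the algorithms surveyed above, SVT in particular, are built around the singular value soft-thresholding operator defined in (1.4). A minimal sketch of that operator (in Python/NumPy) is:

```python
import numpy as np

def soft_threshold_singular_values(M, th):
    """D_th(M) = U * max(Sigma - th, 0) * V^T, the operator in equation (1.4)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - th, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

# Example: shrinking the singular values of a random matrix lowers its rank.
M = np.random.default_rng(0).standard_normal((50, 50))
X = soft_threshold_singular_values(M, th=5.0)
print(np.linalg.matrix_rank(M), "->", np.linalg.matrix_rank(X))
```

Shrinking the singular values in this way is what drives each SVT-style iteration toward a low rank estimate.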
Also recently, the notion of matrix completion over samples collected from different times has been proposed in [36]; the author modified and proposed several algorithms that performs adaptive matrix completion when the samples are collected from a process that is being changed over time and the goal is to recover all the underlying low rank matrices. The 8 proposed algorithm that performed the best in their experiments is a variation to the SVT algorithm in [32] and they referred to it as the Adaptive Singular Value Thresholding (ASVT). ASVT performance is evaluated against state-of-the-art matrix completion algorithms and it showed that ASVT outperforms all the other algorithms using the relative error measure. 1.5.2. Tensor Completion Although the matrix completion problem has been studied thoroughly during the past years, less research was done in tensor completion due to the complexity of high dimensional mathematical analysis and the lack of definitions of simple matrix norms when we move to the higher dimensional space. Despite this difficulty, some researchers overcome these problems and proposed different algorithms to solve it. Ji Liu et.al. proposed three algorithms for Low Rank Tensor Completion (LRTC) based on nuclear norm minimization over each fold of the tensor in [30], [37]. The SiLRTC is a simple algorithm and it implement a relaxation technique to separate the dependent components and the Block Coordinate Descent (BCD) to solve for each component separately and achieve the global optimization, the FaLRTC uses a smooth scheme to change the objective function to a smooth function and solved the general nuclear norm minimization problem. Finally, HaLRTC uses the Alternative Direction Methods of Multiplier (ADM) to minimize the tensor nuclear norm. The experimental results showed that all algorithms are applicable for image and video signals and the algorithms are more robust and faster than the available tensor completion algorithms. They also showed that FaLRTC and HaLRTC are more efficient that SiLRTC while FaLRTC is the most efficient 9 algorithm. In [38], the authors suggests minimizing the nuclear norm of the tensor without metricizing the tensor which was shown theoretically to perform better than minimizing the nuclear norm. The authors also develop a series of algebraic and probabilistic techniques such as characterization of sub-differential for tensor nuclear norm and concentration inequalities for tensor martingales, which are useful in other tensor related problems. Based on the work in [34], L T Huang et.al. proposed an algorithm for tensor completion using the truncated nuclear norm in the tensor domain [39]. The used the Alternative Direction Method of Multiplier (ADM) to solve the problem, and they reduced the two-step solution in [34] to one-step solution. They also ran extensive experimental results and showed that their algorithm performs significantly better than the state-of-art algorithms. In [40], the authors proposed an algorithm for tensor completion with high ratio of missing samples, in which situations the usual low rank and smoothness assumptions do not work. The authors address this issue by applying a novel PARAFAC decomposition. The decomposition is represented as a sum of the outer product of functional smooth component vectors, which are represented by linear combinations of smooth basis functions. Based on that, they also developed an algorithm with greedy deflation and rank-one tensor decomposition. 
The experimental results showed that this algorithm performs better than state-of-the-art algorithms for tensor completion. In [41] the authors proposed an alternative method for tensor completion using Tensor Train (TT) rank which is capable of capturing hidden information from tensors. Based on that, a new optimization function is proposed for tensor completion and two algorithms were proposed for tensor completion; the first one is SiLRTC-TT which uses the usual nuclear 10 norm minimization based on TT rank and the other algorithm uses a multilinear factorization model to approximate the TT rank of a tensor, and is called tensor completion by parallel matrix factorization via TT (TMac-TT). The experimental results showed significant performance improvement of the proposed algorithms compared to the existing algorithms. 1.5.3. Robust Tensor Recovery Similar to Matrix and Tensor Completion, despite the applicability of robust tensor recovery to several real-life applications, the area of Tensor recovery witnessed less research interest than robust matrix completion. In this section, we will talk about the key research articles that tackles the problem of robust matrix and tensor recovery. E. J. Candès et.al. in [10] proved that it is possible to separate a matrix that is generated by superposing a low rank and sparse matrices into its low rank and sparse components by solving convex problem called principal component pursuit. The algorithm they proposed works by minimizing both the nuclear norm of the predicted low rank matrix and the ℓ1 norm of the sparse component. The authors also present several practical applications for this method such as video surveillance where the algorithm was able to successfully detect objects in cluttered environment, face detection where the algorithm is used to remove shadow and secularities. In [42] and similarly [43] the authors proposed a method to recover low rank matrices from corrupted observation by using robust matrix recovery. In this method, the noise matrix is considered the sparse portion and the recovered low rank matrix represents the predicted “clean” matrix. The authors provided a proof that most matrices can be recovered with very high probability using this approach. The algorithm was applied to several computer vision 11 problems and it shows promising performance. For the tensor domain, Goldfarb and Qin apply convex optimization to recover the low rank tensor using high order tensor PCA [14]. They propose an optimization algorithm based on the ADM and the Accelerated Proximal Gradient (APG) algorithms with convergence guarantees for solving both the constrained and the Lagrangian forms of the problem. They also propose a non-convex model that improves the results of the recovery from the convex model. They apply their algorithm to several real computer vision applications and showed the practical use of the proposed method. In [44] the authors used a new model for tensor Singular Value Decomposition (tSVD) that was proposed in [45] for third order tensors only. The proposed method uses ADM algorithm to separate the low rank and sparse components of the observed tensor. The algorithm was applied for colored image denoising and it showed significant performance improvements over the previous algorithms. 1.5.4. Image Super-resolution Since our contributions in this thesis are related to single image super-resolution, we will only mention publications in the same category as our contributions. J. Yang et.al. 
in [26] address the problem of image super-resolution from the compressed sensing perspective by assuming that the observed low-resolution image is a down sampled version of a high-resolution image. The patches of the high-resolution image is assumed to have a sparse representation. They also show the effect of the sparsity assumption for solving the ill-posed super-resolution problem. The authors also form a dictionary using a small set of randomly chosen raw patches from training images of similar statistical nature 12 to the input image. The experimental results showed that this method produce superior quality compared to the other super-resolution methods. In [46], the authors explore the relationship between neighbor pixels and the estimation of the high-resolution image from the observed low-resolution input. The authors propose an algorithm based on matrix completion concept by minimizing the sum of all the augmented matrices’ rank. The authors tested the algorithm on different applications and showed that the resulted image has both better visual quality and higher PSNR values. Another super-resolution algorithm based on clustered sparse representation and adaptive patch aggregation was proposed in [47]. The training patches pairs are collected randomly from example images and using K-means clustering algorithm, the patches are divided into multiple groups. The over-complete dictionary is learned over the clustered pairs. Only one cluster, sub-dictionary, is selected to represent the low-resolution patch. The authors compare the performance of the algorithm against several state-of-the-art algorithms for super-resolution and it showed that the proposed algorithm outperformed the rest of the methods. In [48], the authors propose an algorithm based on two other super-resolution algorithms, Anchored Neighborhood Regression (ANR) [49] and Simple Functions (SF) [50]. The resulted algorithm (A+) builds on the features and anchored regressors from ANR but instead of learning on the training dictionary, it learns on the entire training material, similar to SF. The authors used different example images and they showed that the PSNR improved by 0.2-0.7 dB over the ANR method, at the same time, the algorithm runs with low computational complexity. 13 A manifold linear approximation based approach for single image super-resolution was proposed in [28]. The authors exploit the non-linear space of the underlying high-resolution patches by considering it as low dimensional manifold in a high dimensional Euclidean space. The Sparse Subspace Clustering (SSC) algorithm is used to create the set of lowdimensional linear spaces that are considered as tangents in the higher resolution patches. Based on the obtained approximated tangent spaces, the structure of the underlying highresolution manifold is used to locate the corresponding high-resolution subspace. This approach requires a small number of training high-resolution samples, around 1000 patches, without any prior assumption about the low-resolution images. A comparison of the obtained results with other state-of-the-art methods clearly indicates the viability of the proposed approach. In [23], the authors proposed a learning-based method for image super-resolution using the low-dimensional manifold representation of high-resolution image patches space. The authors used the image and its down sampled scale to extract a set of training sample points using min-max algorithm. 
The ℓ1 norm of the resulted graph is then minimized to cluster these samples into a set of manifold neighborhoods and the high-resolution patch is estimated from these tangent spaces. The experimental results validate the effectiveness of the algorithm when compared to relative methods. 1.6 Contributions 1- Derive two robust algorithms for matrix completion using the Accelerated Proximal Gradient (APG) and the Alternating Direction Method of Multipliers (ADM). 14 Further, we compare both algorithms against each other and against the iterative reweighted least squares (IRLS-1) algorithm using a variety of noisy images. 2- Extend the efficient matrix completion algorithms to the N-dimensional tensor completion problem by first providing a general definition for the Schatten-P function into the N-dimensional space. We also derive a multi-dimensional Augmented Lagrangian Multiplier (ALM) optimization algorithm to solve it. 3- Formulate a new objective function for the tensor RPCA problem based on using the truncated and smoothed schatten-p functions. We also solve the formulated problem using the Augmented Lagrangian Multiplier (ALM) algorithm, which is the trivial algorithm to use with RPCA. We also proposed two other algorithms based on the Accelerated Proximal Gradient (APG) and the Alternating Directions Methods of Multipliers (ADMM). We refer to our proposed method as the Truncated and Smoothed Robust Tensor Recovery (TSRTR) algorithm. 4- Propose a novel image super-resolution framework that parts away from the similarity assumption between the low dimensional and high dimensional manifolds; instead the subspace similarity measure is used to find the closest HR subspace to each LR subspace. The test patch is projected to the closest low dimensional subspace, and the most similar HR subspace is selected and hence we get a better approximation to the LR test image than directly projecting the LR test patch to the HR space. 1.7 The proposal outline This thesis is divided into two major parts. In the first part, the low rank matrix and 15 tensor completion problems are explored and we propose new algorithms for both matrix and tensor completion that outperforms state of the art algorithms in this field. The same part also studies the Robust PCA problem and its extension to the multi-dimensional tensor domain; we also propose a new algorithm for the tensor Robust PCA problem that outperforms the state of the art algorithms. The other part of the thesis studies the single image super-resolution problem for which we proposed a novel algorithm that makes no assumption about the similarity between the LR and HR subspaces. In chapter 2, we review state-of-the-art methods related to matrix and tensor completion and robust PCA topics. Most of these methods will be used as a benchmark to compare the performance of our algorithms. The chapter will also provide a detailed description and show our experimental results of the proposed algorithms for Matrix Completion, Tensor Completion and Robust Tensor Recovery. In chapter 3, we review the available algorithms for example based single image superresolution and we present our proposed algorithm along with several experimental results to compare the performance to the state-of-the-art algorithms in this field. Finally, chapter 4 outlines our current efforts and the ongoing research directions and provides some preliminary discussion and analysis of the methods presented in this thesis. 
Chapter 2 Matrix and Tensor Completion and Robust Tensor Recovery

2.1 Matrix Rank Minimization

Matrix completion is a special case of what is known as the affine rank minimization problem, defined as:

$$\min_X \; \mathrm{rank}(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b \qquad (2.1)$$

where X ∈ ℝ^{n1×n2} is the decision variable, 𝒜 is an affine mapping 𝒜 : ℝ^{n1×n2} → ℝ^p, and the vector b ∈ ℝ^p represents the given samples.

Table 2.1: A comparison between some important terms according to the rank minimization and cardinality minimization concepts

Term                    | Rank minimization                  | Cardinality minimization
Sparsity inducing norm  | Nuclear                            | ℓ1
Hilbert space norm      | Frobenius                          | ℓ2 (Euclidean)
Dual norm               | Operator                           | ℓ∞
Norm additivity         | Orthogonal row and column spaces   | Disjoint support
Convex optimization     | Semidefinite programming           | Linear programming

The affine transformation 𝒜 is an operation on a matrix X such that it preserves all the linear properties among the data points of X in the resulting vector b. The solution to problem (2.1) can be stated as the simplest (lowest rank) feasible model that is affine in the matrix variable. In general, this type of rank minimization problem can be solved using the singular value decomposition [51]. In matrix completion we are interested only in finding the matrix with the lowest rank that fills in the unobserved entries based on the small number of samples that we have, under the constraint that, in the regions where we have samples, the values of the reconstructed samples should be the same as the original ones. The affine rank minimization problem (2.1) reduces to the cardinality minimization problem of finding the element in the affine space that has the smallest number of nonzero singular values. In this section, we provide a comparison between rank and cardinality minimization; the main comparison points are listed in Table 2.1 [51].

The main idea behind matrix completion is to find the minimum rank matrix that fits the observed samples, so if there is only one low rank matrix that fits the observed data, this would recover the original matrix M [1]. Mathematically, this can be represented by [1]–[3]:

$$\min_X \; \mathrm{rank}(X) \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M) \qquad (2.2)$$

where Ω is the region of the observed samples; M_{ij} is an observed sample and X_{ij} is an estimated sample.

It is important to mention that not all low rank matrices can be recovered from few observed samples. Let us look at the matrix M below [5]:

$$M = e_1 e_n^* = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix} \qquad (2.3)$$

The matrix M is formed by the multiplication of the first canonical basis vector e_1 by the n-th canonical basis vector e_n^*. Even though M is a rank one matrix, it cannot be recovered from partially observed samples unless we observe the one in the top right corner. The reason is that the sampling sets will mostly see only zeros, and there is no way for us to guess that there is an element of different value unless we observe it. Otherwise, the predicted matrix will always be zero no matter what recovery method we use. Therefore, not all low rank matrices can be recovered from a set of observed entries. To understand the set of matrices that can be recovered from partially observed samples, we need to look at the SVD decomposition of a matrix X [5], [31]:

$$X = \sum_{k=1}^{r} \sigma_k u_k v_k^T \qquad (2.4)$$

where u_k and v_k are the left and right singular vectors and σ_k is the singular value at index k.
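To make the argument around (2.3) concrete, the following sketch (in Python/NumPy; the matrix size and sampling ratio are arbitrary) samples half of the entries of M = e_1 e_n^* uniformly at random; unless the single nonzero entry happens to be sampled, every observed value is zero and M is indistinguishable from the all-zero matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# M = e1 * e_n^T: a rank-one matrix whose only nonzero entry sits in the top-right corner.
M = np.zeros((n, n))
M[0, n - 1] = 1.0

# Observe 50% of the entries uniformly at random.
mask = rng.random((n, n)) < 0.5
observed_values = M[mask]

print("nonzero observed entries:", np.count_nonzero(observed_values))
# Unless the (1, n) entry happens to be sampled, every observation is zero and
# no method can distinguish M from the all-zero matrix.
```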
In these terms, the generic set of recoverable low rank matrices can be thought of as those whose family {u_k}, 1 ≤ k ≤ r, is selected uniformly at random among all families of r orthonormal vectors; the same applies to {v_k}, 1 ≤ k ≤ r, and the two families may or may not be dependent on each other [5]. To show that the matrix cannot be in the null space of the sampling operator that provides the observed samples, let us take a look at the SVD representation of the rank-2 symmetric matrix M below [5]:

$$M = \sum_{k=1}^{2} \sigma_k u_k v_k^T \qquad (2.5)$$

where u_1 = (e_1 + e_2)/√2 and u_2 = (e_1 − e_2)/√2. This matrix has all zero entries except for the 2×2 top left corner. Again, we will need to observe almost all entries of this matrix in order to recover it using any method. The reason is that the singular vector representation of this matrix is concentrated in one area, which is the top-left corner in this example. In conclusion, the singular vectors of the matrix to be recovered need to be spread, i.e., uncorrelated with the standard basis, in order to reduce the number of observations required for the recovery [5], [7].

Another important aspect of successfully recovering a low rank matrix from few observed samples is the way we sample the matrix. For example, we cannot hope to recover a matrix X from partially observed samples if the sampling operator misses a full row or column. Assume X is a matrix of rank one formed by the product bc^*, where b, c ∈ ℝ^n. Each entry of this matrix can be represented by [5]:

$$X_{i,j} = b_i c_j \qquad (2.6)$$

In this case, if we do not observe any sample from the first row, for example, then there is no way for us to predict the first element b_1. This example can be extended to all rows and columns. Another example from collaborative filtering is shown in Figure 2.1; assume that the matrix represents the ratings that a user (row) has given to particular movies (column); each element in the matrix represents the rating that a user has given to the corresponding movie on a scale from 1 (not satisfied) to 5 (extremely satisfied). The question marks represent movies that were not rated by the corresponding user. The red rectangles represent users who did not rate any movies, and hence we have no information about their preferences or taste in movies. Therefore, this sampling of the matrix cannot be recovered using matrix completion.

Figure 2.1: Collaborative filtering example of bad sampling for low rank matrix recovery (rows: users; columns: movies).

So in order to recover a low rank matrix with high probability from partially observed samples, we need a set of samples that are collected uniformly at random from the underlying matrix.

2.2 Matrix Completion

Problem (2.1) is NP-hard, and all algorithms that solve it are doubly exponential in theory and practice [5], [7], [8], [51]. For the past decade, the main algorithms for matrix completion have been based on nuclear norm minimization [5], [31], [32], [34], [52]–[54], which was introduced in [55] after proving that the nuclear norm is the tightest convex approximation to the rank function. Several algorithms have been proposed to recover a low rank matrix from a subset of its observed entries, and this approach has been used in many different fields, for example computer vision [56], sensor networks [57], [58] and control [59]. The
𝑃𝛺 (𝑋) = 𝑃𝛺 (𝑀) where 𝑋 is the low rank matrix to be estimated, 𝑀 is the original matrix. Since the observed samples are usually contaminated with noise, we need to relax the equality constraint into an inequality constraint in order to make (2.7) more robust to noise [54]. 𝑚𝑖𝑛 ‖𝑋‖∗ 𝑋 𝑠. 𝑡. ‖𝑃𝛺 (𝑋 ) − (2.8) 𝑃𝛺 (𝑀)‖2𝐹 ≤𝜀 Subsequently, the authors in [60] proposed to use the reweighted least squares algorithm to recover sparse signals from few observed samples. In [61] the authors introduced the reweighted nuclear norm concept into the matrix completion problem. The same authors proposed an efficient iterative algorithm [35], and they showed that it is more computationally efficient and achieves better performance than the heuristic matrix completion algorithms that require evaluating the computationally expensive Singular Value Decomposition (SVD) in each iteration. The IRLS-p algorithm in [35] used a simple gradient minimization approach to obtain the optimal minimum of a reweighted rank approximation function. Instead of minimizing the convex nuclear norm function in (2.7), the authors in [35] suggested to use a smooth approximation to the rank function and they 22 showed that it converges faster than minimizing the nuclear norm function; they considered the smooth Schatten-p function: 𝑝 𝑓𝑝 (𝑋) = 𝑇𝑟(𝑋 𝑇 𝑋 + 𝛾𝐼)2 (2.9) Here, 𝐼 is the identity matrix and γ > 0. The function (2.9) is convex for 𝑝 ≥ 1 [35]. So the matrix rank minimization approximation problem can be written as [35]: 𝑝 𝑚𝑖𝑛 𝑇𝑟(𝑋 𝑇 𝑋 + 𝛾𝐼)2 𝑋 (2.10) 𝑠. 𝑡. 𝑃𝛺 (𝑋) = 𝑃𝛺 (𝑀) The authors also provided a theoretical guarantee that when 𝑝 = 1 the algorithm provides similar results to minimizing the nuclear norm. The main issue with this iterative algorithm is that in each iteration, the authors used a strict equality constraint for the region of observed samples, and hence, when the observed samples are contaminated with noise, the algorithm will converge to the noisy version of the data. 2.3 Tensor Completion In several computer vision applications, we usually deal with a higher dimensional space, we find few studies considered the N-dimensional tensor completion case because of the difficulties associated with higher dimensional space computations [30], [62]. Some researchers, for example in [34], applied their matrix completion algorithms to color images by considering each color channel as a separate matrix. Such algorithms don’t consider the correlation among the different channels. Previous algorithms solved the matrix and tensor completion problems based on minimizing the nuclear norm, which has been shown [55] to be the tightest approximation to the rank function [5], [30], [31], [54], [56], [63]. Some other algorithms also used a reweighted nuclear norm, for example in [61], to achieve better 23 results than using only the truncated nuclear norm. However, minimizing the nuclear norm requires evaluating the computationally expensive SVD in each iteration. As mentioned earlier, in [35] the authors proposed an extension of the work in [60] into the matrix completion problem and proposed computationally efficient algorithms for matrix completion. In this thesis we propose two matrix completion algorithms based on the schatten-p functions that are more robust to noise than the previous algorithms. Tensor completion is a generalization of the matrix completion concept explained in section 2.5.2. 
We generalize the completion algorithm for the matrix (i.e., 2- mode or 2order tensor) case to higher-order tensors by solving the following optimization problem [30], [39], [64]: 𝑚𝑖𝑛 ‖X‖∗ X (2.11) 𝑠. 𝑡. 𝑃𝛺 (X ) = 𝑃𝛺 (M) where both X and M are n-mode tensors which have the same size in each mode. One issue with this definition is that the nuclear norm of a high dimensional tensor has no unique definition, the mostly used format in tensor completion application is the average of the nuclear norm of the matrix formed by unfolding the tensor along each dimension [38], [39], [65]. This can be represented mathematically as: 𝑛 1 ‖X ‖∗ = ∑‖(X )(𝑖) ‖ ∗ 𝑛 (2.12) 𝑖=1 Since the unfolded tensors tend to form a relatively large matrix, the use of SVD decomposition for each unfold will take much longer time. In this thesis, we will also extend these efficient Schatten-p function that we defined for matrix completion in equation (2.9) to the multi-dimensional tensor domain and propose a tensor completion algorithm based 24 on these functions that outperforms state of the art tensor completion algorithms and at the same time reduces the computation time. 2.4 Robust Tensor Recovery In matrix domain, several algorithms were developed to separate the low rank and sparse models which is known as the robust matrix recovery. The main idea is based on minimizing both the nuclear norm and the ℓ1 norm of the estimated low rank and sparse components [10], [42], [43], [66]. 𝑚𝑖𝑛 ‖𝑋‖∗ + 𝜆‖𝐸‖1 𝑋,𝐸 (2.13) 𝑠. 𝑡. 𝑀 = 𝑋 + 𝐸 where 𝑋 is the estimated low rank matrix, 𝐸 is the estimated sparse matrix and 𝑀 is the observed matrix. Since most applications in real life, especially in computer vision such as colored images and video, are multidimensional data [15], [16]. Applying the recovery algorithms to each two-dimensional space separately will ignore the internal correlation in the signal. Hence, it is important to process the whole tensor data [16], [67]. In the few past years, some contributions have been made in the robust tensor recovery field. The authors in [15] proposed a robust algorithm for error data correction for image and video signals. The authors in [16] used the multi-linear augmented Lagrangian multiplier for tensor recovery and they applied it to recover corrupted multidimensional data. The authors in [14] applied different tensor recovery algorithms and they did a detailed performance analysis among them. The common problem with these algorithms is that they require computing the SVD of the tensor in every iteration, which is very computationally expensive. In this thesis, we 25 formulate a new objective function for the tensor RPCA problem based on using the truncated and smoothed Schatten-p functions. We refer to our proposed framework as the Truncated-and-Smoothed Robust Tensor Recovery (TSRTR). We solved the proposed objective function using the Augmented Lagrangian Multiplier (ALM) algorithm, which is the trivial algorithm to use with RPCA. We also proposed two other algorithms based on the Accelerated Proximal Gradient (APG) and the Alternating Directions Methods of Multipliers (ADMM). 2.5 Truncated Nuclear Norm Even though the nuclear norm based algorithms for matrix completion perform well on synthetic data, these algorithms do not perform well in practical applications due to the fact that nuclear norm doesn’t accurately approximate the rank function. The rank of a matrix treats all the non-zero singular values equally, while the nuclear norm treats singular values differently. 
Also, nuclear norm approaches do not converge sometimes due if the underlying matrix does not follow the theoretical bounds such as the coherence property [34]. Instead of modifying all the singular values of the estimated matrix we need to keep the significant singular values unchanged and only modify the smaller singular values which we refer to as the Truncated Nuclear Norm Regularization (TNNR) of the matrix. For a matrix 𝑀 ∈ ℝ𝑚×𝑛 or rank 𝑟, TNNR can be represented as min(𝑚, 𝑛) − 𝑟. The truncated nuclear norm only minimizes the 𝑟 lower singular values which produces better approximation to the rank function [34], [39]. Now the matrix completion problem can be casted as: 26 𝑚𝑖𝑛 ‖𝑋‖𝑟 𝑋 𝑠. 𝑡. ‖𝑃𝛺 (𝑋 ) − 𝑃𝛺 (𝑀)‖2𝐹 ≤ 𝜀 (2.14) However, ‖. ‖𝑟 is not convex and hence we have to find a close convex approximation to it. In [34] the authors included a theorem to prove that TNNR can be bounded by a convex function approximation: min(𝑚,𝑛) 𝑇𝑟(𝐴𝑋𝐵 𝑇) ≤ ∑ 𝜎𝑖 (𝑋) (2.15) 𝑖=1 where 𝑋 ∈ ℝ𝑚×𝑛 , 𝐴 ∈ ℝ𝑚×𝑛 , 𝐵 ∈ ℝ𝑚×𝑛 , A𝐴𝑇 = 𝐼 and 𝐵𝐵 𝑇 = 𝐼 and 𝐼 is the identity matrix. Let the SVD of the matrix 𝑋 = 𝑈Σ𝑉 𝑇 where 𝑈 = (𝑢1 , 𝑢2 , … 𝑢𝑚 ) ∈ ℝ𝑚×𝑚 , 𝑉 = (𝑣1 , 𝑣2 , … 𝑣𝑛 ) ∈ ℝ𝑛×𝑛 and Σ ∈ ℝ𝑚×𝑛 , the equality in (2.15) will hold if we set [34]: 𝐴 = (𝑢1 , 𝑢2 , … 𝑢𝑟 )𝑇 ∈ ℝ𝑟×𝑚 , 𝐵 = (𝑣1 , 𝑣2 , … 𝑣𝑟 )𝑇 ∈ ℝ𝑟×𝑛 (2.16) Thus, the matrix completion optimization function can be represented as: 𝑚𝑖𝑛 ‖𝑋‖∗ − 𝑇𝑟(𝐴𝑋𝐵 𝑇 ) 𝑋 𝑠. 𝑡. ‖𝑃𝛺 (𝑋 ) − (2.17) 𝑃𝛺 (𝑀)‖2𝐹 ≤𝜀 Since the objective function is convex by setting 𝐴 and 𝐵 as shown in (2.16), then it can be solved using any convex optimization algorithms. The solution to this function consists of two steps, the first one is to fix 𝑋 and compute 𝐴 and 𝐵, the next iteration is to estimate the new value of 𝑋. An example of this two-step optimization algorithm is shown in Algorithm 2.1 [34]. Similar approach can be applied to the tensor domain for an 𝑛 -fold tensor, which can be generally represented as [39]: 27 𝑛 𝑚𝑖𝑛 ∑ 𝛼𝑖 ‖(X )(𝑖) ‖ X 𝑟 𝑖=1 (2.18) 𝑠. 𝑡. 𝑃𝛺 (X ) = 𝑃𝛺 (M) where 𝛼𝑖 , 1 ≤ 𝑖 ≤ 𝑛 is a weighting parameter for the TNNR of each unfold of the tensor. Using similar proof to [34], equation (2.18) can be rewritten as: 𝑛 𝑚𝑖𝑛 ∑(𝛼𝑖 ‖(X )(𝑖) ‖ − 𝛼𝑖 𝐴𝑖 (X )(𝑖) 𝐵𝑖 𝑇 ) X 𝑟 (2.19) 𝑖=1 𝑠. 𝑡. 𝑃𝛺 (X ) = 𝑃𝛺 (M) 28 Algorithm 2.1: TNNR Iterative Scheme Matrix Completion Input: 𝑃𝛺 (𝑀), tolerance 𝜖 Steps: initialize 𝑋1 = 𝑃𝛺 (𝑀) Repeat 1: Given 𝑋𝑙 [𝑈𝑙 , Σ𝑙 , 𝑉𝑙 ] = 𝑆𝑉𝐷(𝑋𝑙 ) where 𝑈𝑙 = (𝑢1 , 𝑢2 , … 𝑢𝑚 ) ∈ ℝ𝑚×𝑚 𝑉𝑙 = (𝑣1 , 𝑣2 , … 𝑣𝑛 ) ∈ ℝ𝑛×𝑛 Compute 𝐴𝑙 = (𝑢1 , 𝑢2 , … 𝑢𝑟 )𝑇 ∈ ℝ𝑟×𝑚 𝐵𝑙 = (𝑣1 , 𝑣2 , … 𝑣𝑟 )𝑇 ∈ ℝ𝑟×𝑛 2: Solve 𝑋𝑙+1 = 𝑚𝑖𝑛 ‖𝑋𝑙 ‖∗ − 𝑇𝑟(𝐴𝑙 𝑋𝑙 𝐵𝑙 𝑇 ) 𝑋𝑙 𝑠. 𝑡. ‖𝑃𝛺 (𝑋 ) − 𝑃𝛺 (𝑀)‖2𝐹 ≤ 𝜀 Until ‖𝑋𝑙+1 − 𝑋𝑙 ‖𝑭 ≤ 𝜖 Output: 𝑋 29 Chapter 3 Proposed Algorithms for Low Rank Models 3.1 Matrix Completion Algorithms using Smooth Rank Approximation [68] In this work, we are interested in solving the matrix completion problem using the reweighted rank approximation minimization formulation proposed in [35]. To that end, we set the exponent parameter 𝑝 = 1 in the smooth approximation function (2.9). Further, and to make the solution to (2.10) more robust to noise, we also relax the equality constraints into an inequality constraint. Hence, we cast the matrix completion problem as follows: 1 𝑚𝑖𝑛 𝑇𝑟(𝑋 𝑇 𝑋 + 𝛾𝐼)2 𝑋 (3.1) 𝑠. 𝑡. ‖𝑃𝛺 (X-M)‖2𝐹 ≤ 𝜀 where 𝜀 > 0. The Lagrangian equivalent of (3.1) is given by: 𝜆 𝑚𝑖𝑛 𝑇𝑟(𝑋 𝑇 𝑋 + 𝛾𝐼)1/2 + ‖𝑃𝛺 (𝑋 − 𝑀)‖2𝐹 𝑋 2 (3.2) where 𝜆 is the lagrange multiplier. More importantly, we develop two algorithms to solve the reweighted-rankapproximation matrix completion problem in (3.1). 
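As a concrete reference point for the two solvers derived next, the smooth Schatten-1 objective in (3.2) and the weight matrix W = (X^T X + γI)^(−1/2) that appears in both derivations below can be evaluated as follows (a sketch in Python/NumPy; λ, γ and the inputs are placeholders):

```python
import numpy as np

def smooth_schatten1(X, gamma):
    """Tr((X^T X + gamma*I)^(1/2)): the smooth rank surrogate of (2.9) with p = 1."""
    G = X.T @ X + gamma * np.eye(X.shape[1])
    w = np.linalg.eigvalsh(G)                 # G is symmetric positive definite
    return np.sum(np.sqrt(w))                 # trace of the matrix square root

def weight_matrix(X, gamma):
    """W = (X^T X + gamma*I)^(-1/2), the weight matrix reused by both solvers."""
    G = X.T @ X + gamma * np.eye(X.shape[1])
    w, V = np.linalg.eigh(G)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def objective(X, M, mask, lam, gamma):
    """The Lagrangian form (3.2): smooth Schatten-1 term plus the data-fitting penalty."""
    misfit = np.where(mask, X - M, 0.0)       # P_Omega(X - M)
    return smooth_schatten1(X, gamma) + 0.5 * lam * np.linalg.norm(misfit, "fro") ** 2
```

Note that evaluating W only requires a symmetric eigendecomposition of X^T X + γI, which is the operation these updates use in place of the full SVD of the iterate required by nuclear norm solvers.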
These algorithms are described in the following two subsections. 3.1.1. Accelerated Proximal Gradient Algorithm for Reweighted Rank Approximation The APG algorithm solves problems of the form [69]: 𝐹(𝑥) = 𝑓(𝑥) + 𝑔(𝑥) And the APG solution to (3.2) is given by [69]: 30 (3.3) 𝜏 𝑄𝜏 (𝑥, 𝑧) = 𝑓(𝑧) + 𝑔(𝑥) + 〈𝛻𝑓(𝑧), 𝑥 − 𝑧〉 + ‖𝑥 − 𝑧‖2𝐹 2 (3.4) Completing the sum of squares and simplifying (3.4) give: 𝜏 2 1 1 𝑄𝜏 (𝑥, 𝑧) = 𝑓(𝑧) + 2 ‖𝑥 − 𝑧 + 𝜏 𝛻𝑓(𝑧)‖ − 2𝜏 ‖𝛻𝑓(𝑧)‖2𝐹 + 𝑔(𝑥) 𝐹 (3.5) Now, we cast the reweighted-rank-approximation matrix completion considering the Lagrangian dual problem (3.1), by selecting: 1 𝜆 𝑔(𝑋) = 𝑇𝑟(𝑋 𝑇 𝑋 + 𝛾𝐼)2 , 𝑓(𝑋) = ‖𝑃𝛺 (𝑋 − 𝑀)‖2𝐹 2 (3.6) Substitute from (3.6) into (3.5) we get: 2 𝜆 𝜆 𝜏 𝜆 2 2 𝑚𝑖𝑛𝑄𝜏 (𝑋, 𝑍) = ‖𝑃𝛺 (𝑍 − 𝑀)‖𝐹 − ‖𝑃𝛺 (𝑍 − 𝑀)‖𝐹 + ‖𝑋 − 𝑍 + 𝑃𝛺 (𝑍 − 𝑀)‖ 𝑋,𝑍 2 2𝜏 2 𝜏 𝐹 (3.7) + 𝑇𝑟(𝑋 𝑇 𝑋 + 𝛾𝐼)1/2 Using a similar idea to [53] for the nuclear norm minimization case, we introduce another matrix 𝑌. 𝜆 𝑌 𝑘+1 =𝑍 𝑘 − (𝑃𝛺 (𝑍 𝑘 − 𝑀)) 𝜏 Now (3.7) becomes: 31 (3.8) Algorithm 3.1: SRAMC-APG Input: 𝑃𝛺 (𝑀), λ, τ Steps: 1: initialize 𝑋1 =𝑋 0 =𝑌1 =𝑍1 = 𝑃𝛺 (𝑀), 𝑡1 =𝑡 0 =1 2: while Not converged Evaluate 𝑍 𝑘+1 using (2.12) Evaluate 𝑌 𝑘+1 using (2.8) Evaluate 𝑊 using (2.11) Evaluate 𝑋 𝑘+1 using (2.10) Update 𝑡 𝑘+1 using (2.13) End While 𝑋 Output: 𝑚𝑖𝑛𝑄𝜏 (𝑋, 𝑌, 𝑍 ) 𝑋,𝑌,𝑍 𝜆 𝜏 𝜆 = ‖𝑃𝛺 (𝑍 − 𝑀)‖2𝐹 + ‖𝑋 − 𝑌 ‖2𝐹 − ‖𝑃𝛺 (𝑍 − 𝑀)‖2𝐹 2 2 2𝜏 (3.9) + 𝑇𝑟(𝑋 𝑇 𝑋 + 𝛾𝐼)1/2 We can minimize (3.9) using Block Coordinate Descent (BCD), we fix Z,Y ; then we differentiate and solve for X. 𝑋 𝑘+1 = 𝑌 𝑘+1 − 𝜏𝑋 𝑘 𝑊 (3.10) 𝑊 = (𝑋 𝑇 𝑋 + 𝛾𝐼)−1/2 (3.11) where 𝑊 is given by: 32 As in [53] 𝑍 𝑘 can be found by: 𝑍 𝑘+1 𝑡 𝑘−1 − 1 𝑘 =𝑋 + (𝑋 − 𝑋 𝑘−1 ) 𝑡𝑘 𝑘 (3.12) And update 𝑡 𝑘+1 using: 𝑡 𝑘+1 = 1 + √1 + 4(𝑡𝑘 )2 2 (3.13) Algorithm 3.1 shows the final procedure of our Reweighted Rank Approximation Matrix Completion with APG (SRAMC-APG). 3.1.2. Alternating Direction Method of Multipliers for Reweighted Rank Approximation ADM solves general problems of the form [64]: 𝑚𝑖𝑛 𝑥∈ℝ𝑞 ,𝑦∈ℝ𝑚 𝑓(𝑥) + 𝑔(𝑦) (3.14) 𝑠. 𝑡. 𝑥 ∈ ℂ𝑥 , 𝑦 ∈ ℂ𝑦 , 𝐺𝑥 = 𝑦 where, ℂ𝑥 ⊂ ℝ𝑞 , ℂ𝑦 ⊂ ℝ𝑚 and 𝐺 ∈ ℝ𝑞∗𝑚 The Lagrangian dual problem of (2.14) is given by [52], [64]: 𝜏 𝑄(𝑋, 𝑌, 𝐿) = 𝑓(𝑋) + 𝑔(𝑌) + 〈𝐿𝑇 , 𝐺𝑋 − 𝑌〉 + 2 ‖𝐺𝑋 − 𝑌‖2𝐹 (3.15) where 𝐿 ∈ ℝ𝑚 is the Lagrange multipliers matrix. In order to formulate (3.2) as an ADM problem, we need to split the function into two variables. 1 𝜆 𝑚𝑖𝑛 𝑇𝑟(𝑋 𝑇 𝑋 + 𝛾𝐼)2 + ‖𝑃𝛺 (𝑌 − 𝑀)‖2𝐹 𝑋 2 𝑠. 𝑡. 𝑋 = 𝑌 33 (3.16) Algorithm 3.2: SRAMC-ADM Input: 𝑃𝛺 (𝑀), λ, τ Steps: 1: initialize 𝑋1 =𝑌1 =PΩ (𝑀), 𝐿1 = 0 2: while Not converged Evaluate Y 𝑘+1 using (2.20) Evaluate 𝑊 using (2.11) Evaluate X 𝑘+1 using (2.19) Evaluate L 𝑘+1 using (22) End While 𝑋 Output: The Lagrangian dual problem of (3.16) after setting 𝐺 to the identity matrix is given by: 1 𝜆 𝜏 𝑚𝑖𝑛Q(X,Y,L) = 𝑇𝑟(𝑋 𝑇 𝑋 + 𝛾𝐼)2 + ‖𝑃𝛺 (𝑌 − 𝑀)‖2𝐹 + 〈𝐿𝑇 , 𝑋 − 𝑌〉 + ‖𝑋 − 𝑌‖2𝐹 X,Y,L 2 2 (3.17) Using BCD, we reduce (3.17) for X by fixing Y and L. After completing the sum of squares and neglecting the constants we get: 𝑇 𝑚𝑖𝑛Q(X) = 𝑇𝑟(𝑋 𝑋 + X 1 𝛾𝐼)2 2 𝜏 1 + ‖X − Y + L ‖ 2 𝜏 𝐹 (3.18) Differentiating and solving (3.18) for X: 𝑋 𝑘+1 = 𝑌 𝑘+1 − 𝜏(𝑋 𝑘 𝑊 + 𝐿) (3.19) where 𝑊 is defined as in (3.11) Next, we reduce (3.17) for Y by fixing X and L and neglecting the constant terms; then we differentiate and solve for Y. 34 1 𝑌 𝑘+1 = 𝑋 𝑘 + (L 𝑘 − 𝜆 𝑃𝛺 (𝑌 𝑘 − 𝑀)) 𝜏 (3.20) Finally, we simplify (3.17) by completing the sum of squares and solve for L by fixing X and Y and neglecting the constant terms. 
2 𝜏 1 1 𝑚𝑖𝑛Q(L) = ‖X − Y+ L ‖ − ‖L ‖2𝐹 L 2 𝜏 2𝜏 𝐹 (3.21) Differentiating and solving for L: L 𝑘+1 = L 𝑘 + 𝜏(X 𝑘+1 − Y 𝑘+1 ) (3.22) The implementation of the Reweighted Rank Approximation Matrix Completion with ADM (SRAMC-ADM) is shown in Algorithm 3.2. 3.2 Smooth Rank Approximation Tensor Completion [67] In this work, we are interested in solving the matrix completion problem using the reweighted rank approximation minimization formulation proposed in [35]. In this thesis we are interested in using a generalized version of the smooth Schatten-p function to solve the tensor completion problem, and hence we need to redefine (2.9) to the higher dimensional space. Definition 3.1: The smooth Schatten-p function for an N-dimensional signal is given by: 𝑁 𝑝 1 𝑓𝑝 (X ) = ∑ 𝑇𝑟((X)𝑇(𝑖) (X)(𝑖) + 𝛾𝐼)2 𝑁 (3.23) 𝑖=1 One can verify the validity of Defintion 2.1 for the 2-dimensional Schatten-p case (𝑁 = 2) by observing that the trace of the matrix and its transpose are the same; and hence, (3.23) reduces to (2.9). 35 Now the tensor completion problem based on a smooth rank approximation function can be casted as: 𝑁 𝑝 1 𝑚𝑖𝑛 ∑ 𝑇𝑟((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 X 𝑁 (3.24) 𝑖=1 𝑠. 𝑡. 𝑃𝛺 (X ) = 𝑃𝛺 (M ) Next we derive the ALM optimization method to solve the tensor completion problem in (3.24). Using similar idea for the matrix case in [63], we can rewrite (3.24) in the tensor domain as: 𝑁 𝑝 1 𝑚𝑖𝑛 ∑ 𝑇𝑟((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 X 𝑁 𝑖=1 (3.25) 𝑠. 𝑡. X + E = 𝑃𝛺 (M ), 𝑃𝛺 (E) = 0 Here, X represents the non-observed entries of the original tensor M . The partial augmented Lagrange function for (3.25) is given by: 𝑁 𝑝 𝐿(X, E,Y, 𝜇) = ∑ 𝛼𝑖 𝑇𝑟((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 + 〈Y, 𝑃𝛺 (M ) − X − E 〉 𝑖=1 (3.26) 𝜇 + ‖𝑃𝛺 (M ) − X − E‖2𝐹 2 where Y is the Lagrange multiplier tensor. In this section we state the solution for (3.26) with respect to each variable (X, E and Y ) as three lemmas; the proofs are presented separately in section 3.5. Lemma 3.1: The optimal solution for (3.26) with respect to X is given by: X 𝑘+1 1 = 𝑃𝛺 (M ) + Y 𝜇 1 𝑁 ∑𝑖=1 𝛽𝑖 Z𝑖 𝑘+1 𝜇 𝑘 𝑘 − [E + ] ∑𝑁 𝑖=1 𝛽𝑖 where 𝛽 is a control parameter and Z𝑖 𝑘+1is given by: 36 (3.27) Algorithm 3.3: Input: Steps: SRATC 𝜇 > 0, 𝛽 ∈ ℝ𝑁 , 𝑃𝛺 (M) 1: initialize X 1 =PΩ (M), Y 1 =E 1 =0 2: while Not converged Do: For each dimension 𝑖 Evaluate 𝑊𝑖 using (2.29) Evaluate Z𝑖 k+1 using (2.28) End For Evaluate X 𝑘+1 using (2.27) Evaluate E 𝑘+1 using (2.30) Evaluate Y 𝑘+1 using (2.31) End While Output: X, E Z𝑖 𝑘+1 = 𝑓𝑜𝑙𝑑 (𝛼𝑖 (X 𝑘 )(𝑖) 𝑊𝑖 ) (3.28) (𝑖) 𝑊𝑖 is defined as: 𝑝 𝑊𝑖 = ((X 𝑘 )𝑇(𝑖) (X 𝑘 )(𝑖) + 𝛾𝐼)1− 2 37 (3.29) Lemma 3.2: The optimal solution for (3.26) with respect to E is given by: 1 𝑃𝛺̅ (E 𝑘+1 ) = 𝑃𝛺̅ (M + Y 𝜇 𝑘 −X 𝑘+1 ) (3.30) 𝑃𝛺 (E 𝑘+1 )=0 Lemma 3.3: The optimal solution for (3.26) with respect to Y is given by: Y 𝑘+1 =Y 𝑘 + 𝜇(𝑃𝛺 (M ) − X 𝑘+1 − E 𝑘+1 ) (3.31) Benefitting from the smooth Schatten-P function formula we used a code optimization idea in order to reduce the computation complexity. Since the trace of the matrix (X 𝑘 )(𝑖) ∈ ℝ𝑝×𝑞 and its transpose are the same; then while computing 𝑊𝑖 we used the transpose of (X 𝑘 )(𝑖) if 𝑞 > 𝑝 and we also used the term ((X 𝑘 )(𝑖) 𝑊𝑖 )𝑇 in order to maintain the original dimensions of the unfolded tensor. The proposed Smooth Rank Approximation Tensor Completion (SRATC) algorithm is presented in algorithm 3.3. 
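Since the main mechanical step in Algorithm 3.3 is working with the mode-i unfoldings of the tensor, the sketch below (an illustrative NumPy rendering, not the MATLAB implementation used in the experiments) shows the unfold/fold operators together with the per-mode terms W_i and Z_i of (3.28) and (3.29); with p = 1 the matrix power in W_i is taken here as the -1/2 power, matching the matrix case in (3.11) and the later definition (3.39). Function names and parameter values are ours.

import numpy as np

def unfold(T, mode):
    # Mode-i unfolding: mode-i fibers of the tensor become the columns of a matrix.
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1))

def fold(M, mode, shape):
    # Inverse of unfold: rebuild a tensor of the given shape from its mode-i unfolding.
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(np.reshape(M, full), 0, mode)

def mode_terms(X, alpha, gamma=1e-3, p=1):
    # Per-mode quantities used inside each SRATC iteration:
    #   W_i = (X_(i)^T X_(i) + gamma I)^(p/2 - 1)   (p = 1 gives the -1/2 power)
    #   Z_i = fold(alpha_i * X_(i) W_i)
    Z = []
    for i in range(X.ndim):
        Xi = unfold(X, i)
        G = Xi.T @ Xi + gamma * np.eye(Xi.shape[1])
        vals, vecs = np.linalg.eigh(G)                 # G is symmetric positive definite
        Wi = (vecs * vals ** (p / 2 - 1)) @ vecs.T     # matrix power via the eigendecomposition
        Z.append(fold(alpha[i] * Xi @ Wi, i, X.shape))
    return Z

# Example: the per-mode terms for a small random 3-mode tensor.
X = np.random.randn(20, 15, 10)
Z = mode_terms(X, alpha=[1 / 3] * 3)
print([z.shape for z in Z])   # every Z_i has the original tensor shape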
3.3 Smooth Rank Approximation for Robust Tensor Recovery [70] This work is motivated by recent works [10][11], which show that matrix and tensor completion using the truncated nuclear norm provides a significant performance improvement over the traditional nuclear norm minimization framework. This imporvement can be attributed to the observatrion that minimizing the truncated nuclear norm keeps the high singular values (related to the actual low rank signal) intact, and at the same time it minimizes the lower singular values (related to noise or sparse part). Some challenges associated with applying the objective function in [11] for the tensor recovery problem are: First, it needs computing the singular value decomposition of the tensor (HoSVD) two times in each step, which is very computationally expensive. Second, 38 the nature of the RPCA problem requires adding a minimization function to the ℓ1 term to estimate the sparse part of the observed tensor, which makes the derivation of the optimization algorithm quite challenging. In this thesis, we utilize the generalized multi-dimensional smoothed Schatten-p in [8] which was used to achieve lower execution time in tensor completion. The general objective function that we consider is: 𝑚 1 𝑚𝑖𝑛 ∑ [𝑇𝑟 [((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 ] − 𝛽𝑖 𝑇𝑟 (𝐴𝑖 (X )(𝑖) 𝐵𝑖 𝑇 ) + 𝜆‖(E)(𝑖) ‖1 ] X, E 𝑖=1 (3.32) 𝑠. 𝑡. M=X + E where X is the low rank tensor to be estimated and E is the sparse tensor and M is the observed tensor. If the SVD decomposition of a matrix 𝑋 is described as 𝑋 = 𝑈𝛴𝑉 𝑇 , then 𝐴𝑖 = (𝑢 1 … 𝑢𝑟 )𝑇 , 𝐵𝑖 = (𝑣 1 … 𝑣𝑟 )𝑇 . Since the trace function is linear and we are interested in minimizing the nuclear norm function, we set 𝑝 = 1 in the Schatten-p function. Now Equation (3.32) can be casted as: 𝑚 1 𝑚𝑖𝑛 ∑ [𝑇𝑟 [((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 − 𝛽𝑖 (𝐴𝑖 (X )(𝑖) 𝐵𝑖 𝑇 )] + 𝜆‖(E)(𝑖) ‖ ] X, E 1 𝑖=1 (3.33) 𝑠. 𝑡. M=X + E In this thesis, we propose three different solutions to (3.33) using the Augmented Lagrangian Multiplier (ALM), Accelerated Proximal Gradient (APG) and Alternating Direction Methods of Multipliers (ADMM) optimization algorithms. We also provide analytical derivations and detailed proofs for key expressions of all three algorithms. 39 3.3.1. The Proposed ALM Solution The ALM is an optimization algorithm that is used to solve constrained optimization problems due to its Q-linear solution to the problem and because of the availability of an exact solution to the ALM [7], [12]. In general, ALM solves problems of the form: min 𝑓(𝑥) + 𝑔(𝑦) (3.34) s. t. 𝑀𝑥 = 𝑦 Here 𝑥 and 𝑦 are the variables to be estimated from the optimization problem and 𝑀 is the sampling operator. Using the Lagrangian method for solving optimization problems by converting the condition into a term in the objective function, the general ALM solution to (3.34) can be represented as: 𝜆 min 𝑓(𝑥) + 𝑔(𝑦)+< 𝑙, 𝑀𝑥 − 𝑦 > + ‖𝑀𝑥 − 𝑦‖2𝐹 2 (3.35) where 𝑙 is the Lagrange multiplier vector. 
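Before deriving the ALM solution, it may help to see how the objective in (3.33) is evaluated for a candidate pair (X, E). The NumPy sketch below is our own illustration (function names, sizes and parameter values are ours); it builds A_i and B_i from the leading r_i singular vectors of each unfolding, exactly as in (2.16).

import numpy as np

def unfold(T, mode):
    # Mode-i unfolding of a tensor (mode-i fibers become matrix columns).
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1))

def tsrtr_objective(X, E, ranks, beta, lam, gamma=1e-3):
    # Evaluate the truncated, smoothed Schatten-1 objective of Eq. (3.33)
    # for a candidate low rank tensor X and sparse tensor E.
    total = 0.0
    for i in range(X.ndim):
        Xi = unfold(X, i)
        U, s, Vt = np.linalg.svd(Xi, full_matrices=False)
        # Smooth Schatten-1 term; as in the thesis' code optimization we work with the
        # smaller Gram matrix of the unfolding (the difference is only additive sqrt(gamma) terms).
        smooth = np.sum(np.sqrt(s ** 2 + gamma))
        # Truncation term: A_i, B_i from the leading r_i singular vectors, Eq. (2.16).
        r = ranks[i]
        A, B = U[:, :r].T, Vt[:r, :]
        truncation = np.trace(A @ Xi @ B.T)   # equals the sum of the r_i largest singular values
        total += smooth - beta[i] * truncation + lam * np.abs(unfold(E, i)).sum()
    return total

# Small synthetic low-rank-plus-sparse example (sizes and parameters are illustrative only).
rng = np.random.default_rng(0)
X = np.einsum('ir,jr,kr->ijk', *(rng.standard_normal((n, 3)) for n in (20, 18, 16)))
E = (rng.random(X.shape) < 0.05) * rng.standard_normal(X.shape)
print(tsrtr_objective(X, E, ranks=[3, 3, 3], beta=[0.5] * 3, lam=0.1))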
To solve (3.35) in the multi-dimensional tensor domain, we need to change all variables to tensors and we need to case our objective function (3.33) so that it can be solved using ALM, to this extent, we need to select the variables as shown below: 𝑚 1 𝑓(X) = ∑ [𝑇𝑟 [((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 − 𝛽𝑖 (𝐴𝑖 (X )(𝑖) 𝐵𝑖 𝑇 )]] 𝑖=1 (3.36) 𝑔(E ) = ‖(E)(𝑖) ‖1 40 After substituting the variables in the general ALM solution (3.35), The ALM solution to the proposed framework can be represented as: 𝑚 1 𝐿(X, E,Y, 𝜇) = ∑ [𝛼𝑖 𝑇𝑟 [((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 − 𝛽𝑖 (𝐴𝑖 (X )(𝑖) 𝐵𝑖 𝑇 )] 𝑖=1 (3.37) 𝜇 + 𝜆‖(E)(𝑖) ‖ ] + 〈Y, M − X − E 〉 + ‖M − X − E‖2𝐹 1 2 where Y is the Lagrange multiplier tensor. Lemma 3.4: For a tensor M ∈ ℝ𝑛1 ×𝑛2 …×𝑛𝑚 that is formed by the superposition of low rank and sparse tensors, X and E respectively; the optimal solution for (3.37) with respect to X is given by: X 𝑘+1 1 𝑘 = [∑𝑚 𝑖=1 𝛼𝑖 𝑓𝑜𝑙𝑑 (𝑍𝑖 − 𝜇 (X𝑖 )(𝑖) 𝑊𝑖 + 𝛽𝑖 𝜇 𝐴𝑇𝑖 𝐵𝑖 ) (𝑖) ] / ∑𝑛𝑖=1 𝛼𝑖 (3.38) where 𝑊𝑖 is represented by: 1 𝑊𝑖 = ((X 𝑘 )𝑇(𝑖) (X 𝑘 )(𝑖) + 𝛾𝐼)− 2 (3.39) and Z𝑖 is represented by: Z𝑖 = (M − E 𝑘 1 + Y 𝑘 )(𝑖) 𝜇 (3.40) Lemma 3.5: For a tensor M ∈ ℝ𝑛1 ×𝑛2 …×𝑛𝑚 that is formed by the superposition of low rank and sparse tensors, X and E respectively; the optimal solution for (3.37) with respect to E is given by: 41 E 𝑘+1 = 𝔖 𝜆 (M − X 𝑘+1 𝜇 1 + Y 𝑘) 𝜇 (3.41) The optimal solution for (3.37) with respect to Y is given by: Y 𝑘+1 =Y 𝑘 + 𝜇(M − X 𝑘+1 −E 𝑘+1 ) (3.42) The proofs for lemmas 3.4 and 3.5 are shown in section 3.5. The proposed TSRTR-ALM algorithm is presented in Algorithm 1. 3.3.2. The Proposed APG Solution The proximal gradient algorithms solve unconstrained optimization problems. It works by evaluating the proximal operator of the function. The accelerated version of these algorithms is achieved by adding a weighted difference of the estimated iterative solution which helps speeding up the convergence of the algorithm. APG solves problems of the form [13]: 𝐹(𝑥) = 𝑓(𝑧) + 𝑔(𝑥) (3.43) and the APG solution to (3.43) is given by [13]: 𝜏 𝑄𝜏 (𝑥, 𝑧) = 𝑓(𝑧) + 𝑔(𝑥) + 〈𝛻𝑓(𝑧), 𝑥 − 𝑧〉 + 2 ‖𝑥 − 𝑧‖2𝐹 (3.44) Completing the sum of squares and simplifying (3.44) give: 𝜏 2 1 1 𝑄𝜏 (𝑥, 𝑧) = 𝑓(𝑧) + 2 ‖𝑥 − 𝑧 + 𝜏 𝛻𝑓(𝑧)‖ − 2𝜏 ‖𝛻𝑓(𝑧)‖2𝐹 + 𝑔(𝑥) 𝐹 (3.45) The objective function (3.45) solves for a single variable (𝑥), while our objective function (3.32) consists of two variables (X, E). To solve this, we extend the idea in [5], which was implemented on matrix robust PCA with nuclear norm minimization, to the multidimensional tensor domain by defining a pair of two-tensor variables S = (SX = X , SE = 42 E) ∈ ℝ𝑛1 ×𝑛2 …×𝑛𝑚 ×ℝ𝑛1 ×𝑛2 …×𝑛𝑚 and Z = (ZX , ZE ) ∈ ℝ𝑛1 ×𝑛2 …×𝑛𝑚 ×ℝ𝑛1 ×𝑛2 …×𝑛𝑚 instead of the variables (𝑥 and z). 
Now, we can solve the Truncated Robust Tensor Recovery problem (3.32), by selecting: 𝑚 1 𝑔(S) = ∑ [𝑇𝑟 [((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 − 𝛽𝑖 (𝐴𝑖 (X )(𝑖) 𝐵𝑖 𝑇 )] + 𝜆‖(E)(𝑖) ‖ ] 1 𝑖=1 (3.46) 𝜇 𝑓(Z) = 2 ‖M − ZX − Z E ‖2𝐹 Substitute from (3.46) into (3.45) we get: 𝑚𝑖𝑛𝑄𝜏 (S, Z) = S, Z 𝜇 ‖M − ZX − Z E ‖2𝐹 2 𝑚 + ∑ [(𝑇𝑟(SX )𝑇(𝑖) (SX )(𝑖) 𝑖=1 + 𝑝 𝛾𝐼)2 + 𝜆‖(SE )(𝑖) ‖ ] 1 43 − 𝛽𝑖 𝑇𝑟 (𝐴𝑖 (SX )(𝑖) 𝐵𝑖 𝑇 ) (3.47) Algorithm 3.4: TSRTR-ALM Input: M Steps: 1:initialize, X 0 = M, Y 0 =E 0 = 0, 𝜆 > 0 , 𝛽𝑖 > 0, 𝛼𝑖 > 0, 𝜂 > 1, 𝜇 2: while Not converged For each dimension 𝑖 [𝑈, 𝛴, 𝑉] = 𝐻𝑜𝑆𝑉𝐷((X𝑖 𝑘 )(𝑖) ) 𝐴𝑖 = (𝑢 1 … 𝑢 𝑟𝑖 )𝑇 , 𝐵𝑖 = (𝑣 1 … 𝑣 𝑟𝑖 )𝑇 1 𝑊𝑖 = ((X 𝑘 )𝑇(𝑖) (X 𝑘 )(𝑖) + 𝛾𝐼)− 2 𝑍𝑖 = (M − E (X 𝑘+1 𝑘 1 + 𝜇 Y 𝑘 )(𝑖) 1 )(𝑖) = 𝛼𝑖 𝑓𝑜𝑙𝑑 (𝑍𝑖 − 𝜇 (X𝑖 𝑘 )(𝑖) 𝑊𝑖 + 𝛽𝑖 𝜇 𝐴𝑇𝑖 𝐵𝑖 ) (𝑖) End For X E 𝑘+1 𝑘+1 = ∑𝑚 𝑖=1(X 𝑘+1 )(𝑖) / ∑𝑛𝑖=1 𝛼𝑖 = 𝔖 𝜆 (M − X 𝑘+1 + 𝜇 Y 𝑘) 𝜇 Y 𝑘+1 =Y 𝑘 + 𝜇(M − X 1 𝑘+1 −E 𝑘+1 ) 𝜇 = η∗𝜇 End While Output: X ∗, E ∗ To simplify the mathematical solution, we introduce another tensor pair Y = (YX , YE ) ∈ ℝ𝑛1 ×𝑛2 …×𝑛𝑚 ×ℝ𝑛1 ×𝑛2 …×𝑛𝑚 . Similar idea was used in [14] for the nuclear norm minimization case and we define it for each member of the tensor pair. 44 Y 𝑘+1 𝑘 X =Z X 𝜇 − (M − ZX ) 𝜏 𝜆 Y E𝑘+1 =Z 𝑘E − (M − Z E ) 𝜏 (3.48) s.t. S = Y Now (3.48) becomes: 𝑚𝑖𝑛𝑄𝜏 (S, Y, Z ) S, Y, Z = 𝜇 𝜆 𝜏 ‖M − ZX − Z E ‖2𝐹 − ‖M − ZX − Z E ‖2𝐹 + ‖S − Y ‖2𝐹 2 2𝜏 2 𝑚 + ∑ [𝛼𝑖 𝑇𝑟 [((SX )𝑇(𝑖) (SX )(𝑖) 𝑖=1 + 1 𝛾𝐼)2 (3.49) 𝑇 − 𝛽𝑖 (𝐴𝑖 (SX )(𝑖) 𝐵𝑖 )] + 𝜆‖(E)(𝑖) ‖ ] 1 We use the default APG update rule proposed in [14] for each variable [14]: 𝑘 Z𝑘+1 X =S X 𝑡 𝑘−1 − 1 𝑘 + (S X − S 𝑘−1 X ) 𝑘 𝑡 (3.50) 𝑘 Z𝑘+1 E =S E + 𝑡 𝑘−1 −1 𝑡𝑘 (S 𝑘E − S 𝑘−1 E ) And update 𝑡 𝑘+1 using: 𝑡 𝑘+1 = 1 + √1 + 4(𝑡𝑘 )2 2 (3.51) Lemma 3.6: For a tensor M ∈ ℝ𝑛1 ×𝑛2 …×𝑛𝑚 that is formed by the superposition of low rank and sparse tensors, X and E respectively; The optimal solution for the low rank (SX ) and sparse (SE ) tensors using APG is given by: 45 S 𝑘+1 X 𝑚 𝑛𝑚 𝑖=1 𝑖=1 1 𝛽𝑖 = [∑ 𝛼𝑖 𝑓𝑜𝑙𝑑 ((Y X𝑘+1 )(𝑖) − (SX𝑘 )(𝑖) 𝑊𝑖 + 𝐴𝑇𝑖 𝐵𝑖 ) ] / ∑ 𝛼𝑖 𝜏 𝜏 (𝑖) (3.52) S 𝑘+1 = 𝔖 𝜆 (Y E𝑘+1 ) E 𝜇 where 𝑊𝑖 is as defined in (3.32). The proof of lemma 3.6 is shown in section 3.5. Algorithm 2 shows the optimization procedure for the Truncated and Smooth Robust Tensor Recovery using Accelerated Proximal Gradient (TSRTR-APG). 3.3.3. The Proposed ADMM Solution ADMM is an extension of the ALM algorithm. ADMM solves structured optimization problems of the form [15]: 𝑚𝑖𝑛 𝑥∈ℝ𝑞 ,𝑦∈ℝ𝑚 𝑓(𝑥) + 𝑔(𝑦) (3.53) s. t. 𝑥 ∈ ℂ𝑥 , 𝑦 ∈ ℂ𝑦 , 𝐺𝑥 = 𝑦 where, ℂ𝑥 ⊂ ℝ𝑞 , ℂ𝑦 ⊂ ℝ𝑚 and 𝐺 ∈ ℝ𝑞∗𝑚 Similar to ALM, the optimization is done by alternatively solving for 𝑥 and 𝑦. 46 Algorithm 2: TSRTR-APG Input: M, 𝜇, λ, τ Steps: 1: initialize S 1X =S 0X =S 1E =S 0E = 0, 𝑡1 =𝑡 0 =1 2: while Not converged 𝑘 Z𝑘+1 X =S X + 𝑘 Z𝑘+1 E =S E + 𝑡 𝑘−1 −1 𝑡𝑘 𝑡 𝑘−1 −1 𝑡𝑘 (S 𝑘X − S 𝑘−1 X ) (S 𝑘E − S 𝑘−1 E ) 𝜇 𝑘 Y 𝑘+1 X =Z X − 𝜏 (M − ZX ) 𝜆 𝑘 Y 𝑘+1 E =Z E − (M − Z E ) 𝜏 For each fold 𝑖 1 𝑊𝑖 = ((X 𝑘 )𝑇(𝑖) (X 𝑘 )(𝑖) + 𝛾𝐼)− 2 1 (S 𝑘+1 )(𝑖) = 𝛼𝑖 𝑓𝑜𝑙𝑑 ((Y X𝑘+1 )(𝑖) − 𝜏 (S 𝑘X )(𝑖) 𝑊𝑖 + X End for 𝑚 S 𝑘+1 X = [∑ 𝛼𝑖 (S 𝑚 𝑘+1 )(𝑖) X ] / ∑ 𝛼𝑖 𝑖=1 𝑖=1 S 𝑘+1 = 𝔖 𝜆 (Y E𝑘+1 ) E 𝜇 𝑡 𝑘+1 = 1 + √1 + 4(𝑡𝑘 )2 2 End While Output: S ∗X , S ∗E 47 𝛽𝑖 𝜏 𝐴𝑇𝑖 𝐵𝑖 ) (𝑖) The Lagrangian dual problem of (3.53) is given by [15], [16]: 𝑄(𝑥, 𝑦, 𝐿) = 𝑓(𝑥) + 𝑔(𝑦) + 〈𝐿, 𝐺𝑥 − 𝑦〉 (3.54) 𝜏 + ‖𝐺𝑥 − 𝑦‖2𝐹 2 where 𝐿 ∈ ℝ𝑚 is the Lagrange multipliers matrix. Due to the separable structure of ADMM, it is considered an improvement over ALM in solving convex programming problems [17], [18]. 
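To make the generic splitting above concrete before specializing it to (3.32), here is a minimal ADMM sketch on a toy problem of our own (with G set to the identity): the x sub-step is a simple quadratic minimization and the y sub-step is the soft-thresholding operator, the same shrinkage map that appears in (3.41) and (3.52).

import numpy as np

def soft(v, t):
    # Soft-thresholding (shrinkage): the proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_demo(m, lam=0.3, tau=1.0, iters=200):
    # Generic ADMM for  min_{x,y} 0.5*||x - m||^2 + lam*||y||_1  s.t.  x = y,
    # i.e. the splitting (3.53) with G = I on a small toy problem.
    x = y = L = np.zeros_like(m)
    for _ in range(iters):
        x = (m + tau * y - L) / (1.0 + tau)   # quadratic sub-problem in x
        y = soft(x + L / tau, lam / tau)      # l1 sub-problem in y (soft-thresholding)
        L = L + tau * (x - y)                 # Lagrange multiplier (dual) update
    return y

m = np.random.randn(10)
# Both expressions equal the proximal map of the l1 norm at m.
print(np.allclose(admm_demo(m), soft(m, 0.3), atol=1e-6))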
Since our cost function (3.32) already has two variables, then we can select the 𝑓 and 𝑔 functions as follow: 𝑚 1 𝑓(X ) = ∑ [𝛼𝑖 𝑇𝑟 [((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 − 𝛽𝑖 (𝐴𝑖 (X )(𝑖) 𝐵𝑖 𝑇 )]] 𝑖=1 (3.55) 𝑚 𝑔(E) = ∑ ‖(E)(𝑖) ‖ 1 𝑖=1 Using the functions defined in (3.55) the ADMM optimization function can be casted as: 𝑚𝑖𝑛 Q(X , E , L) X,E,L 𝑚 1 = ∑ [𝛼𝑖 𝑇𝑟 [((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 − 𝛽𝑖 (𝐴𝑖 (X )(𝑖) 𝐵𝑖 𝑇 )] (3.56) 𝑖=1 𝜏 + 𝜆‖(E)(𝑖) ‖ ] + 〈L, X +E − M〉 + ‖X +E − M‖2𝐹 1 2 Lemma 3.7: The optimal solution for the low rank (X ) and sparse (E) and the Lagrange multiplier (L) tensors is given by: 48 Algorithm 3: Input: Steps: TSRTR-ADMM M, 𝜇, λ, τ 1: initialize X 0 =E 0 =0, 𝐿1 = 0 2: while Not converged For each fold 𝑖 1 𝑊𝑖 = ((X 𝑘 )𝑇(𝑖) (X 𝑘 )(𝑖) + 𝛾𝐼)− 2 (X 𝑘+1 𝑘+1 ) )(𝑖) = (E 1 (𝑖) − 𝜏 (X𝑖 𝑘 )(𝑖) 𝑊𝑖 + 𝛽𝑖 𝜏 𝐴𝑇𝑖 𝐵𝑖 End for X 𝑘+1 𝑚 𝑛𝑚 𝑖=1 𝑖=1 1 1 = M − [∑ 𝛼𝑖 (S 𝑘+1 )(𝑖) ] / ∑ 𝛼𝑖 + L X 𝜏 𝜏 E 𝑘+1 1 = 𝔖 𝜆 (M − X 𝑘 + L) 𝜏 𝜏 L 𝑘+1 = L 𝑘 − 𝜏(X 𝑘+1 + E 𝑘+1 − M) End While Output: X ∗, E ∗ 𝑚 X 𝑘+1 (E 1 = M − ∑ 𝛼𝑖 𝑓𝑜𝑙𝑑 ( 𝜏 𝑖=1 [ E 𝑘+1 𝑘+1 ) (𝑖) 1 𝑚 − (X𝑖 𝑘 )(𝑖) 𝑊𝑖 + 1 𝜏 ) / ∑ 𝛼𝑖 + L 𝛽𝑖 𝑇 𝜏 𝑖=1 𝐴𝑖 𝐵𝑖 𝜏 (𝑖) ] 1 = 𝔖 𝜆 (M − X 𝑘 + L) 𝜏 𝜏 L 𝑘+1 = L 𝑘 − 𝜏(X 𝑘+1 + E 49 𝑘+1 − M) (3.57) where 𝑊𝑖 is as defined in (3.32). The proof of lemma 3.7 is shown in section 3.5. The implementation of the Truncated and Smooth Robust Tensor Recovery using with ADMM (TSRTR-ADMM) is shown in Algorithm 2. 3.4 Experimental Results 3.4.1. Matrix Completion In this work we compare our algorithms against each other and against IRLS-1. For the sake of fair comparison, we ran all algorithms using MATLAB R2012b on a computer with 3.3 GHz i-5 CPU and 4GB Memory. We also set the same maximum number of iterations to 1000 for all algorithms. Since IRLS-1 [35] has shown performance and execution time improvements over the matrix completion algorithms with nuclear norm minimization, we will consider comparing our algorithms to each other and to the IRLS-1 only. Figure 3.1 shows the results of recovering the cameraman image (rank 80) with 50% observed entries and additive Gaussian noise (zero mean and 0.007 variance). The results show that SRAMCADM achieves the highest SNR followed by SRAMC-APG and both are higher than IRLS1 algorithm. The reason is because we used the relaxed equality constraints for the smoothed rank approximation algorithms and as a result we minimized the noise variance and reduce the noise effect. Figure 3.2 shows the convergence rate for the algorithms using the same image in Figure 3.1. It shows that both SRAMC-ADM and IRLS-1 need more iterations to converge to the final SNR, while SRAMC-APG converges within less iterations. 50 (a) (b) (c) 14.37 (d) 15.16 (e) 15.44 Figure 3.1: (a) Original image (rank 80) (b) The observed noisy samples, (c-e) Inpainted image and the corresponding SNR (dB) using (c) ILRS-1, (d) SRAMCAPG, (e) SRAMC-ADM 51 16 Signal to Noise Ratio (dB) 14 12 10 8 6 SRAMC-APG SRAMC-ADM IRLS-1 4 2 0 100 200 300 400 500 600 Iteration Number 700 800 900 1000 Figure 3.2: Iterations versus signal to noise ratio for cameraman image with 50% observed entries. Figure 3.3 shows a comparison for the SNR versus the noise variance for all three algorithms using the same cameraman image with 50% random observed noisy entries. From Figure 3.3 we notice that when the noise level is close to zero, there is a slight difference in performance between all algorithms; while once the noise increases we note that SRAMCADM achieves the best performance followed by SRAMC-APG and then IRLS-1. 
52 Signal to noise ratio(dB) 18 SRAMC-APG SRAMC-A DM IRLS-1 16 14 12 10 8 6 0 0.01 0.02 0.03 0.04 0.05 Noise Variance 0.06 0.07 0.08 0.09 Figure 3.3: SNR versus the noise variance using cameraman image with 50% observed entries. 1 SRAMC-APG SRAMC-ADM IRLS-1 Correlation Coefficient 0.95 0.9 0.85 0.8 0.75 0.7 0.65 0 50 100 150 Column Index 200 250 300 Figure 3.4: Columns correlation coefficient for the recovered cameraman image from 50% observed noisy entries. 53 Table 3.1: Comparison of average execution time vs. SNR for matrix completion algorithms using a set of 12 grayscale images. Missing 30% 50% 70% Method SNR (dB) Time (Sec) IRLS-1 15.43 25 SRAMC-ADM 17.62 20 SRAMC-APG 16.74 10 IRLS-1 14.74 25 SRAMC-ADM 16.22 21 SRAMC-APG 15.76 12 IRLS-1 12.85 24 SRAMC-ADM 13.47 21 SRAMC-APG 13.16 13 Figure 3.4 shows a plot for the correlation between each column of the recovered and the original images. It also shows that the SRAMC-ADM has the largest correlation with the original image followed by the SRAMC-APG and the IRLS-1. In Table 3.1, we compare the average SNR and the execution time for the three algorithms when applied to a set of 12 standard images1 (256×256 and rank 60). We also added Gaussian noise with variance 0.01. It can be observed that the SRAMC-ADM always gives the highest SNR. On the other hand SRAMC-APG achieves the lowest execution time. 54 (a) (b) (c) 15.76 (d) 17.34 (e) 18.57 Figure 3.5: (a) The original façade image, (b) The noisy observation, (b-d) Inpainted image and the corresponding SNR (dB) using (b) ILRS-1, (c) SRAMC-APG, (d) SRAMC-ADM 55 After presenting our algorithms and using them in general image completion from random noisy measurements, we compare the performance of our methods to the IRLS-1 algorithm using image inpainting application. We apply the algorithms for color image inpainting from texture masking by completing each color channel separately as shown in Figure 3.5 for the façade image2 (256×256) with Gaussian noise (variance 0.005). The figure also shows that our algorithms achieve better results than IRLS-1. 56 (a) (b) (c) 14.91 (d) 15.76 (e) 16.69 Figure 3.6: (a) The original drop image, (b) The masked noisy observation, (b-d) Inpainted image and the corresponding SNR (dB) using (b) ILRS-1, (c) SRAMC-APG, (d) SRAMC-ADM 57 Figure 3.6 shows the results of applying the algorithms for colored image inpainting from a text mask. Again, SRAMC-ADM achieves the highest SNR, followed by SRAMCAPG and the IRLS-1 comes last. For this application, we used the drop image3 (size 220×220, rank 200). 3.4.2. Tensor Completion In this section we present the results of the experiments that we performed to compare the performance of our SRATC algorithm to the Low Rank Tensor Completion (LRTC) algorithm in [30] and the Accelerated Proximal Gradient (APG) tensor completion in [71]. The quality of the recovered data is evaluated using the Peak Signal to Noise Ratio (PSNR) and the convergence rate is evaluated based on the time needed by the algorithm to converge to the final results. We ran the algorithms with MATLAB 2012b on the same desktop computer with a 3.3 GHz i-5 CPU and a 4GB Memory. In the first application we applied the tensor completion algorithms to recover the house color image (size 256×256) [30] with rank 80 in each color channel from random 50% observed samples; the results are shown in Figure 3.7. 
From Figure 3.7 we see that the APG takes less execution time than LRTC but it converges to the lowest PSNR, while the LRTC takes more time to execute than APG but it achieves higher PSNR. Our algorithm achieves comparative PSNR to LRTC, but it needs the lowest execution time. To test the performance of the tensor completion algorithms under high number of missing entries, we used the algorithms to recover the façade image (size 256×256 and rank 80 for each color channel) in [30] and using only 30% observed samples; the results are presented in Figure 3.8. 58 (a) (b) PSNR:26.30, Time:76.5 (c) PSNR:25.79, Time:64.1 (d) PSNR:26.34, Time:37.8 Figure 3.7: (a) The original house image, (b-d) Recovered images with the corresponding PSNR (dB) and execution time (Sec.) using (b) LRTC, (c) APG, (d) SRATC 59 (a) (b) PSNR:21.34,Time: 90.7 (c) PSNR:20.41,Time:75.5 (d) PSNR: 21.27,Time:49.5 Figure 3.8: (a) The original façade image, (b-d) Recovered images with the corresponding PSNR (dB) and execution time (Sec.) using (b) LRTC, (c) APG, (d) SRATC 60 40 LRTC 38 A P G S RA TC Peak Signal to Noise Rat io (dB) 36 34 32 30 28 26 24 22 20 50 100 150 200 250 Rank Figure 3.9: Peak Signal to Noise Ratio versus the rank of the matrix using lena colored image with 50% observed samples. Again, Figure 3.8 shows that APG achieves the lowest PSNR measure but converges faster than LRTC, while the proposed SRATC algorithm converges the fastest with almost the same performance as LRTC. LRTC converges the slowest but it achieves a good PSNR value. Figure 3.9 shows the effect of changing the rank on the performance of the tensor completion algorithms. In this experiment we used the color lena image (size 256×256) with 50% observed samples. From Figure 3.9 we see that when the rank of the data is very low, SRATC achieves the highest PSNR. While when the rank starts to increase, the SRATC and LRTC algorithms both have approximately the same PSNR measure. From this experiment, we also noticed that the computation time remains approximately constant as we change the rank. 61 Table 3.2: Average execution time and PSNR comparison for tensor completion algorithms using a set of 10 colored images. Missing (%) 30 50 70 Method PSNR (dB) Time (Second) LRTC 32.84 63.5 APG 30.51 57.6 SRATC 33.85 28.7 LRTC 26.59 77.3 APG 24.93 57.1 SRATC 26.74 31.5 LRTC 21.54 77.3 APG 19.39 70.0 SRATC 21.46 40.3 To give an accurate performance evaluation for the tensor completion algorithms, we apply the algorithms to recover a set of 10 different color images1 (size 256×256 and rank 60 for each color channel) from (70, 50 and 30)% observed entries; then we average the PSNR and the execution time; the results are shown in Table 3.2 We notice that regardless of the number of observed entries, the SRATC algorithm achieves the lowest execution time while LRTC and APG require significantly longer time to converge because of using the SVD decomposition in each iteration. 62 (a) (b) Time:751.8 Figure 3.10a: (a) Original façade image (b) Recovered images with the corresponding execution time (Sec.) using LRTC. 63 (a) Time:631.4 (b) Time:395.9 Figure 3.10b: Recovered images with the corresponding execution time (Sec.) using (a) APG, (b) SRATC. Another application for testing the tensor completion algorithms is blocks image inpainting. We used the façade image of size 318×861 [30]. In this application we are trying to recover the missing parts of the façade image texture by inpainting missing blocks of the image. 
The results are shown in Figures 3.10a and 3.10b; These results show that our algorithm also converges faster than state-of-the-art tensor completion algorithms. For this particular simulation, we don’t have the original image for evaluating the PSNR values; 64 however, by observing the visual quality of the different results, one can observe that the proposed SRATC algorithm provides a visual quality similar to the LRTC algorithm while converging with about one-half of the time required by LRTC. To show the applicability of our SRATC algorithm for higher dimensional data and nonsquare tensors, we show in Figure 3.11 the result of recovering a video sequence of the tomato video [30]; we used 20 frames with each frame of size 242×320 pixels. We also compared the SRATC result to LRTC and APG. As shown in Figure 3.11, our SRATC algorithm saves about 4 minutes when compared to the APG algorithm in addition to the higher PSNR and the visually better recovered frames. Compared to LRTC, our algorithm saves approximately 10 minutes execution time and achieves the same PSNR value. 65 (a) (b) PSNR: 22.82, Time: 1892.6 (c) PSNR:21.01, Time:1530.8 (d) PSNR: 22.86, Time: 1311.4 Figure 3.11: (a) The original tomato frame (number 6), (b-d) Recovered frames with the corresponding PSNR (dB) and execution time (Sec.) using (b) LRTC, (c) APG, (d) SRATC 66 3.4.3. Robust Tensor Recovery In this section we compare the performance and execution time of our proposed algorithms against the Rank Sparsity Tensor Decomposition (RSTD) algorithm which is proposed in [6], the Multi-linear Augmented Lagrangian Multiplier (MALM) in [7] and finally the Inexact Alternating Direction Augmented Lagrangian (IADAL) in [2]. For a fair comparison, we ran all algorithms using MATLAB R2016a on a computer with 3.3 GHz i5 CPU and 4GB Memory. We also set the same maximum number of iterations to 700 for all algorithms since most of the algorithms converge before reaching this number of iterations, while the algorithms that do not converge they do not achieve significant PSNR improvements if kept running longer. Further, for all algorithms, we used the relative absolute error ‖X 𝑘+1 − X 𝑘 ‖𝐹 /‖X 𝑘 ‖𝐹 < 𝜖 as a stopping condition. Figure 3.12 shows the results of applying the robust tensor recovery algorithms for image inpainting using the facade image of size 256×256×3. We set the rank for each unfold to 80×80×3. The figure shows that our framework, in general, achieves a significant improvement in PSNR and execution time followed by IADAL, RSTD and finally MALM. 67 MALM: PSNR:21.43 dB, Time:96 Sec Figure 3.12a: Top: The observed image. Left: low rank model. Right: Sparse model using MALM. 68 RSTD: PSNR:22.79 dB, Time:113 Sec IADAL: PSNR:24.16 dB, Time:76 Sec Figure 3.12b: Left: low rank model. Right: Sparse model using RSTD (Top) and IADAL (Bottom). 69 TSRTR-ALM: PSNR: 26.71 dB, Time: 50 Sec TSRTR-APG: PSNR:25.68 dB, Time: 39 Sec Figure 3.12c: Left: low rank model. Right: Sparse model using TSRTR-ALM (Top) and TSRTR-APG (Bottom). 70 TSRTR-ADMM: PSNR:26.97 dB, Time:69 Sec Figure 3.12d: Left: low rank model. Right: Sparse model using TSRTR-ADMM. Among our proposed algorithms, we notice that using TSRTR-APG does accelerate the convergence of the recovery with a small sacrifice in terms of quality. On the other hand, TSRTR-ADMM consumes longer time to converge but achieves the best reconstruction quality due to the advantage of the separable structure over TSRTR-ALM, which came in between both algorithms. 
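For completeness, the stopping rule stated at the beginning of this subsection and the PSNR values reported for all algorithms can be computed as follows (a short sketch of our own, assuming 8-bit data on a [0, 255] scale; the constants in the demo are illustrative only).

import numpy as np

def relative_change(X_new, X_old):
    # Stopping rule used for all algorithms: ||X^{k+1} - X^k||_F / ||X^k||_F < eps.
    return np.linalg.norm(X_new - X_old) / np.linalg.norm(X_old)

def psnr(reference, estimate, peak=255.0):
    # Peak Signal to Noise Ratio in dB for images or tensors stored on a [0, peak] scale.
    mse = np.mean((np.asarray(reference, float) - np.asarray(estimate, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.random.randint(0, 256, (64, 64, 3)).astype(float)
est = ref + np.random.normal(0, 5, ref.shape)
print(psnr(ref, est))               # roughly 34 dB for additive noise of standard deviation 5
print(relative_change(est, ref))    # relative Frobenius-norm change between the two tensors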
Even though all these algorithms compute the SVD one time in each iteration, we notice that the algorithms derived based on the proposed framework still have the fastest computation time due to the fast PSNR correction throughout iterations. As shown in Figure 3.13, which plots the progress of PSNR improvements over iterations for all tensor RPCA algorithms, TSRTR-APG converges within 200 iterations, while TSRTR-ADMM consumes the whole 700 iterations without convergence. However, TSRTR-ADMM has the highest PSNR and the best visual quality than the rest of the 71 algorithms. TSRTR-ALM requires less than 300 iterations to converge and its performance is slightly higher than TSRTR-APG. In comparison to the rest of the algorithms, both MLAM and RSTD consume the maximum 700 iterations; IADAL converges within less than 500 iterations, but the PSNR values of these algorithms are more than 2dB less than the PSNR values of the proposed framework. Figure 3.14 shows the effect of increasing the rank for the first two unfolds (for example rank 100 means the rank of the unfolded tensor is 100×100×3) of the same façade image on the performance of the tensor recovery algorithms. We notice that TSRTR-ADMM algorithm achieves the highest PSNR over the whole rank range followed by IADAL by an average of 2 dB. We also see that as the rank increases, the difference in performance between TSRTR-ALM and IADAL increases. In addition to that, when the rank is high, MALM achieves higher PSNR than RSTD. For lower rank signals, RSTD achieves better PSNR than MALM. MALM is the least affected algorithm by increasing the rank of the image. In Table 3.3, we compare the average PSNR and the execution time for the four algorithms using different masks applied to a set of 10 colored images shown in Figure 3.15 after setting the rank of each one of the unfolded tensor over each dimension to 100×100×3 before adding the mask. 72 Figure 3.13: PSNR versus iteration number for the façade image in Figure 3.12. 73 Figure 3.14: PSNR versus rank of the first two folds for the façade image in Figure 3.12. Figure 3.15: Set of images used for performance comparison in Table 3.3. 74 Table 3.3: Average execution time and PSNR comparison for tensor RPCA algorithms using a set of 10 colored images. Time Mask Algorithm PSNR (dB) (Sec) MALM 24.91 94 RSTD 25.38 110 IADAL 25.71 68 TSRTR-ALM 27.46 50 TSRTR-APG 26.92 43 TSRTR-ADMM 27.83 61 MALM 30.98 87 RSTD 32.53 106 IADAL 32.98 73 TSRTR-ALM 35.02 56 TSRTR-APG 34.18 41 TSRTR-ADMM 35.64 64 MALM 30.16 83 RSTD 30.92 97 IADAL 31.74 60 TSRTR-ALM 33.57 41 TSRTR-APG 33.22 38 TSRTR-ADMM 34.06 59 75 It shows clearly that on average our proposed framework achieves higher dB improvements over state-of-the-art algorithms. For example, TSRTR-ALM has average performance compared to the others, yet it achieves about 2 dB improvement in PSNR, and TSRTRADMM achieves about 1dB over that. One important reason for this improvement is using the truncated Schatten-p functions, which will only modify the smaller singular values and leave the higher singular values intact, which capture the main information of the image, intact. In addition to that and despite the fact that our framework has similar degree of complexity to the other tensor recovery algorithms, it needs less iteration to converge and it reduces the execution time by an average of 18%. 
Finally, we apply the RPCA algorithms to recover the ocean video sequence [6] of size 112×160×3×32 and rank 50×50×3×10 that is corrupted by different random masks for different frames; the results are shown in Figure 3.16 for frame number 16 as a representative example of visual quality. From Figure 3.16 we also see that our proposed framework achieves about 2 dB enhancement in PSNR compared to IADAL, which achieved the second-best performance, after the algorithms based on the proposed framework, followed by RSTD and MALM. Even when the proposed algorithms are tested on four-dimensional data, TSRTR-ADMM still achieves the highest PSNR among the proposed algorithms with relatively high execution time, followed by TSRTR-ALM and then TSRTR-APG, which provided the lowest PSNR values when compared to the proposed algorithms. In terms of execution time, TSRTR-APG is still the fastest converging algorithm among the others followed by TSRTR-ALM by 31 seconds and then TSRTR-ADMM, which requires 58 seconds more than TSRTR-APG. 76 MALM-PSNR:23.75 dB, Time:214 Sec RSTD-PSNR: 25.21 dB, Time:263 Sec Figure 3.16a: Top: The observed image. Left: low rank model. Right: Sparse model using MALM (Top) and RSTD (Bottom). 77 IADAL-PSNR:27.12 dB, Time:185 Sec TSRTR-ALM: PSNR:29.08 dB, Time:116 Sec TSRTR-APG: PSNR:28.41 dB, Time:85 Sec Figure 3.16b: Left: low rank model. Right: Sparse model using IADAL (Top), TSRTR-ALM (Middle) and TSRTR-APG (Bottom). 78 TSRTR-ADMM: PSNR:29.67dB, Time:143 Sec Figure 3.16c: Left: The observed image. Middle: low rank model. Right: Sparse model using TSRTR-ADMM. 3.5 Proof of Lemmas 3.5.1. Proof of Lemma 3.1: From (3.26) we complete the sum of squares: 𝑁 𝐿(X, E,Y, 𝜇) = ∑ 𝛼𝑖 𝑇𝑟((X )𝑇(𝑖) (X )(𝑖) + 𝑝 𝛾𝐼)2 𝑖=1 2 𝜇 1 + ‖𝑃𝛺 (M ) − X − E + Y ‖ 2 𝜇 𝐹 (A1) 1 − ‖Y ‖2𝐹 𝜇 Deriving and solving (A1) for X : 𝑁 ∑ 𝑖=1 𝛼𝑖 ((X )𝑇(𝑖) ) ((X )𝑇(𝑖) (X )(𝑖) + 𝑝 𝛾𝐼)1−2 1 − 𝜇(𝑃𝛺 (M ) − X − E + Y ) = 0 𝜇 For simplicity, we assume 𝑊𝑖 is given by (3.32). Then the solution for X is: 79 (A2) X 𝑘+1 1 𝑁 ∑ (X )𝑇(𝑖) 𝑊𝑖 )(𝑖) 1 𝜇 𝑖=1 𝛽𝑖 𝑓𝑜𝑙𝑑 (𝛼𝑖 = 𝑃𝛺 (M ) − E + Y − ∑𝑁 𝜇 𝑖=1 𝛽𝑖 (A3) For further simplification, we assume Z𝑖 𝑘+1 as presented in (3.28) and hence X 𝑘+1 is reduced to (3.27). 3.5.2. Proof of Lemma 3.2: We start from (A1) and derive for E. But here we need to include another constraint to force the entries of E at the observed samples entries to be zero. 1 𝜇(𝑃𝛺 (M ) − X − E + Y ) = 0 𝜇 (A4) 𝑠. 𝑡. 𝑃𝛺 (E )=0 Now solving for E: 1 E𝑘+1 = 𝑃𝛺 (M ) − X + Y 𝜇 (A5) 𝑃𝛺 (E𝑘+1 )=0 Or we can rewrite (A4) as in (3.30). 3.5.3. Proof of Lemma 3.3: Also starting from (A1), we derive for Y : 1 1 𝑃𝛺 (M ) − X − E + 𝜇 Y − 𝜇 Y = 0 (A6) Solving for Y : 80 Y 𝑘+1 = Y 𝑘 + 𝜇(𝑃𝛺 (M ) − X − E) (A7) Which is the same result shown in (3.31). 3.5.4. Proof of Lemma 3.4: First we complete the sum of squares: 𝑚 1 𝐿(X, E,Y, 𝜇) = ∑ [𝛼𝑖 𝑇𝑟 [((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 − 𝛽𝑖 (𝐴𝑖 (X )(𝑖) 𝐵𝑖 𝑇 )] 𝑖=1 (A8) 2 𝜇 1 1 + 𝜆‖(E)(𝑖) ‖ ] + ‖M − X − E + Y ‖ − ‖Y ‖2𝐹 1 2 𝜇 𝜇 𝐹 Differentiating (A1) for X : 𝑚 ∑ 𝑖=1 𝛼𝑖 𝑓𝑜𝑙𝑑((X )𝑇(𝑖) ) ((X )𝑇(𝑖) (X )(𝑖) + 𝑝 𝛾𝐼)1−2 1 −𝛽𝑖 (𝐴𝑇𝑖 𝐵𝑖 ) − 𝜇(M − X − E + Y ) = 0 𝜇 (A9) Assuming 𝑊𝑖 is given by (3.32). Then the solution for X is: 1 X 𝑘+1 = M − E + 𝜇 Y − 1 𝑚 ∑ (X )𝑇(𝑖) 𝑊𝑖 )(𝑖) −𝛽𝑖 (𝐴𝑇𝑖 𝐵𝑖 )) 𝜇 𝑖=1 𝛼𝑖 𝑓𝑜𝑙𝑑 ( ∑𝑁 𝑖=1 𝛼𝑖 By letting Z𝑖 𝑘+1 as presented in (10), X 𝑘+1 is reduced to (8). 3.5.5. Proof of Lemma 3.5: We start from (A1) and differentiate the equation for E. 
81 (A10) 𝑚 2 𝜕 𝜕 𝜇 1 𝐿(X, E,Y, 𝜇) = [∑ 𝜆‖(E)(𝑖) ‖1 + ‖M − X − E + Y ‖ ] 𝜕E 𝜕E 2 𝜇 𝐹 (A11) 𝑖=1 The solution to (A4) for E is by soft thresholding [4] : E 𝑘+1 = 𝔖 𝜆 (M − X 𝑘+1 𝜇 1 + Y 𝑘) 𝜇 (A12) 3.5.6. Proof of Lemma 3.6: To find the low rank (SX ) and sparse (SE ) tensors, we can minimize (3.47) using Block Coordinate Descent (BCD), we fix Y, Z ; then we differentiate and solve for SX and SE separately. 𝑚𝑖𝑛𝑄𝜏 (SE ) = SE 𝜏 ‖S − YE ‖2𝐹 + 𝜆‖SE ‖1 2 E (A13) By removing all constant terms, equation (A6) becomes a standard ℓ1 minimization equation [19] represented in the tensor domain, and by applying the thresholding solution we can easily get the optimal solution for the sparse tensor SE in (3.52). Now, we solve (3.47) for SX and we start by removing the constant terms. 𝑚𝑖𝑛𝑄𝜏 (SX ) = 𝑆X 𝜏 2 ‖SX − YX ‖𝐹 2 (A14) 𝑚 + ∑ [𝛼𝑖 𝑇𝑟 [((SX )𝑇(𝑖) (SX )(𝑖) 𝑖=1 + 1 𝛾𝐼)2 𝑇 − 𝛽𝑖 (𝐴𝑖 (SX )(𝑖) 𝐵𝑖 )]] Diffrenciating (A8) and adding a weighting parameter 0 ≤ 𝛼𝑖 ≤ 1 to control the weight of each fold of the tensor. 82 𝑇 𝑇 ∑𝑚 𝑖=1 [𝛼𝑖 𝑓𝑜𝑙𝑑( (SX )(𝑖) 𝑊𝑖 )(𝑖) −𝛽𝑖 (𝐴𝑖 𝐵𝑖 ))] 𝜏(SX − YX ) + =0 ∑𝑚 𝑖=1 𝛼𝑖 (A15) The final step is to find the low rank tensor SX which is equivalent to (3.52). SX = YX − 𝑇 𝑇 1 ∑𝑚 𝑖=1 [𝛼𝑖 𝑓𝑜𝑙𝑑( (X )(𝑖) 𝑊𝑖 )(𝑖) −𝛽𝑖 (𝐴𝑖 𝐵𝑖 ))] ∑𝑚 𝜏 𝑖=1 𝛼𝑖 (A16) 3.5.7. Proof of Lemma 3.7: Before we start solving (3.47), we need to simplify the mathematical representation by completing the sum of squares. 𝑚𝑖𝑛 Q(X , E , L) X,E,L 𝑚 1 = ∑ [𝛼𝑖 𝑇𝑟 [((X )𝑇(𝑖) (X )(𝑖) + 𝛾𝐼)2 − 𝛽𝑖 (𝐴𝑖 (X )(𝑖) 𝐵𝑖 𝑇 )] 𝑖=1 (A17) 2 𝜏 1 1 + 𝜆‖(E)(𝑖) ‖ ] + ‖X +E − M + L ‖ − ‖L‖2𝐹 1 2 𝜏 𝜏 𝐹 To find X , we solve (3.52) using BCD. First, we fix E, L ; and differentiate for X. We also add a weighting parameter 0 ≤ 𝛼𝑖 ≤ 1 to control the weight of each fold of the tensor. ∑𝑚 𝑖=1 𝛼𝑖 𝑓𝑜𝑙𝑑((X )𝑇(𝑖) ) ((X )𝑇(𝑖) (X )(𝑖) + 𝑇 1 −𝛽𝑖 (𝐴𝑖 𝐵𝑖 ) 𝛾𝐼)2 ∑𝑚 𝑖=1 𝛼𝑖 1 + 𝜏 (X +E − M + L) = 0 𝜏 Using the definition of 𝑊𝑖 in (3.32) and solving for X. 83 (A18) 𝑚 X 𝑘+1 (E = M − ∑ 𝛼𝑖 𝑓𝑜𝑙𝑑 ( [ 𝑘+1 ) 𝑖=1 (𝑖) 1 𝑚 − (X𝑖 𝑘 )(𝑖) 𝑊𝑖 + 1 𝜏 ) / ∑ 𝛼𝑖 + L 𝛽𝑖 𝑇 𝜏 𝑖=1 𝐴𝑖 𝐵𝑖 𝜏 (𝑖) ] (A19) To solve for E, we fix the other tensor variables and remove all constant terms, equation (A10) is reduced to: 𝑚 2 𝜏 1 𝑚𝑖𝑛Q(E ) = ∑ 𝜆‖(E)(𝑖) ‖ + ‖X +E − M + L ‖ 1 E 2 𝜏 𝐹 𝑖=1 (A20) Equation (A10) can also be solved using the thresholding solution used to solve the standard ℓ1 optimization equation [19] represented in the tensor domain. Now, we differentiate (A14) and solve it for E, which results in: E 𝑘+1 1 = 𝔖 𝜆 (M − X 𝑘 + L) 𝜏 𝜏 (A21) The update step for L can be easily found by fixing X and E and neglecting the constant terms. 2 𝜏 1 1 𝑚𝑖𝑛Q(L) = ‖X +E − M + L ‖ − ‖L ‖2𝐹 L 2 𝜏 2𝜏 𝐹 (A22) Differentiating and solving for L: L 𝑘+1 = L 𝑘 − 𝜏(X 𝑘+1 + E 84 𝑘+1 − M) (A23) Chapter 4 Single Image Super-Resolution 4.1 Example Based Single Image Super-resolution Most previous example based single image super-resolution algorithms use two training dictionaries one for LR and the other for HR patches. These methods assume that there is a one-to-one mapping between LR and HR image patches. The training is usually done over a large set of pairs of LR and HR patches. During the testing phase, the test patch is approximated by a linear combination of LR patches from the over complete dictionary; and because of the one-to-one mapping assumption, the corresponding HR patches are substituted instead [26], [27], [29], [48], [72]. 
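A schematic rendering of that classical pipeline may be helpful (this is our simplified illustration, not the implementation of [26], [27], [29], [48]): the test LR patch is approximated by a least-squares combination of its nearest LR dictionary atoms, and the same weights are then applied to the paired HR atoms under the one-to-one mapping assumption.

import numpy as np

def reconstruct_hr_patch(lr_patch, D_lr, D_hr, k=5):
    # D_lr, D_hr: paired dictionaries, one column per training patch (one-to-one mapping).
    # 1) pick the k nearest LR atoms, 2) solve least-squares weights on them,
    # 3) apply the same weights to the paired HR atoms.
    dists = np.linalg.norm(D_lr - lr_patch[:, None], axis=0)
    idx = np.argsort(dists)[:k]
    w, *_ = np.linalg.lstsq(D_lr[:, idx], lr_patch, rcond=None)
    return D_hr[:, idx] @ w

# Toy dictionaries: 500 paired patches, vectorized 8x8 LR (64-dim) and 24x24 HR (576-dim).
rng = np.random.default_rng(0)
D_lr, D_hr = rng.standard_normal((64, 500)), rng.standard_normal((576, 500))
hr = reconstruct_hr_patch(rng.standard_normal(64), D_lr, D_hr)
print(hr.shape)   # (576,) -- one reconstructed HR patch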
However, the down-sampling, noise and blur operations that the HR patches went through to produce the LR patches are non-unique transformations and hence the one-to-one mapping assumption is not accurate. To mitigate some of the issues with prior work, our recent effort in this area was based on the assumption that the HR patches span a lower dimensional space, and hence, by projecting the test LR patch directly to the HR manifold, one may expect optimal approximations can be achieved [28], [73]. However, one key issue with this assumption is the problem of projecting between two spaces that are different in both geometry and dimension. We have observed that this difference in geometry and dimension can lead to inaccurate results. In particular, since the test LR patch is in general smooth; then, when projecting a LR patch into a HR manifold, the projection is steered and biased toward smooth subspaces of the HR manifold. Hence, some of the HR details are not being exploited. This issue limited the level of improvements that could be achieved. In addition, our prior work [28] used sparse subspace 85 clustering (SSC), which is not guaranteed to be robust against noise [74]. Since one can’t guarantee that the examples images are noise free, this can be another drawback for this approach. Finally, since in [28] the number of clusters is provided as an input, it may force patches that truly belong to different subspaces to be placed in the same cluster. 4.2 Subspace Clustering Many applications in real-life deal with high dimensional data, for example in computer vision applications, a video can have thousands of frames. Same issue happens in machine learning, data mining, etc. Dealing with such high dimensional data decreases the performance of the algorithms and increase the computation time. This problem is known as the curse of dimensionality [75], [76]. For the past decades, several methods have been proposed to reduce the dimensionality of the data to avoid this problem. The most commonly used method is the Principal Component Analysis (PCA). The drawback of PCA is that it models the data into a single low-dimensional subspace, while in most applications the data need to be represented in multiple low-dimensional subspaces with known, or unknown, memberships to each subspace. Hence, we need to segment the data into multiple subspaces and fit each group into a low-dimensional subspace [74]. Subspace clustering algorithms can be categorized into: • Iterative: Examples of these approaches are K-subspaces [77] and median Kflats which approximates data by a mixture of flats (subspaces) [78]. In these methods, data points are assigned to the closest subspace and subspaces are fitted into clusters. The problem with such algorithms is that they are sensitive to 86 initialization values and they require knowledge about the dimensionality of the subspaces [75]. • Factorization: For example [79], [80]. These methods are usually based on segmenting the similarity matrix that is built from the data points. Although these methods do not require knowledge about the dimensionality of subspaces, they only work if the subspaces are independent, they are also sensitive to noise and outliers in the data [75]. • Geometric: For example [81] in which the subspaces are represented by a set of homogenous polynomials whose degree is the number of subspaces. A distance function is used to assign the data points into subspaces. 
The problems with this approach is that it is sensitive to noise and outliers and its computational complexity increases exponentially with the number and dimensions of subspaces [75]. • Statistical: Some of these methods are iterative, such as [82] which is based on the assumption that the distribution of data points in subspaces follow a Gaussian distribution. Similar to iterative methods, these algorithms require a knowledge of the number of subspaces and they are also sensitive to initialization [75]. Another robust statistical methods such as RANSAC [83] which assigns data points into subspaces until there is enough inlier data points. The problem with RANSAC is that the number and dimension of subspaces should be equal and known. 87 • Spectral: There are two main groups under this category, the local spectral clustering which uses local information of each data point to build a similarity matrix and then a spectral clustering algorithm is applied to segment the similarity matrix into subspaces. Examples of these approaches can be found in [84], [85]. The problem with these approaches is dealing with points that are close to two, or more, subspaces and they are require selecting the size of the neighborhood that they use to compute the distance between local data points. The other group is global spectral clustering, in which the similarity between all data points is used to assign the data points to subspaces. To capture the global relationship between data points, researchers rely on the concept of compressed sensing[1], [2] or matrix completion. Example of these methods is the Sparse Subspace Clustering (SSC)[75], which tries to enforce the sparsity assumption on the global similarity matrix that is built from all the data points. The benefit of these algorithms is that there is no need to know the number or dimension of subspaces. 4.3 Low Rank Subspace Clustering (LRSC) LRSC belongs to the global spectral subspace clustering group and it is built on the foundation of Robust Principal Component Analysis (RPCA) that attempts to recover a lowrank component 𝑋 and a sparse component 𝐸 from a given observation matrix: 𝑀 = 𝑋 + 𝐸. The sparse matrix 𝐸 can represent an error (or noise) signal that is corrupting the low-rank matrix 𝑋. This recovery can be achieved by minimizing the ℓ1 norm of the error matrix (noise) and the nuclear norm ‖𝑋‖∗ of the data matrix 𝑋 [10]: 88 𝑚𝑖𝑛 ‖𝑋‖∗ + 𝛼‖𝐸‖1 + 𝑋,𝐸 𝜆 ‖𝑀 − (𝑋 + 𝐸)‖2𝐹 2 (4.1) In LRSC, the self-expressive property is assumed for the low-rank matrix 𝑋 = 𝑋𝐶. Hence, the observed data matrix 𝑀 is assumed to be the graph obtained from using a noisy version of the final clean low rank adjacency matrix (𝐶), in addition to additive sparse noise: 𝑀 = 𝑋𝐶 + 𝐸 (4.2) The LRSC formula is obtained by substituting (4.2) for the RPCA equation (4.1) and adding a condition to maintain the symmetric property of 𝐶 [74]: 𝑚𝑖𝑛 ‖𝐶‖∗ + 𝛼‖𝐸‖1 + 𝐶,𝑋,𝐸 𝜆 ‖𝑀 − 𝑋𝐶 − 𝐸‖2𝐹 2 (4.3) 𝑠. 𝑡. 𝐶 = 𝐶 𝑇 Using the self-expressive property of 𝑋 , i.e. 𝑋 = 𝑋𝐶 and substituting in (4.3), the minimization equation can be casted as [74]: 𝜏 𝜆 𝑚𝑖𝑛 ‖𝐶‖∗ + 𝛼‖𝐸‖1 + ‖𝑋 − 𝑋𝐶‖2𝐹 + ‖𝑀 − 𝑋 − 𝐸‖2𝐹 𝐶,𝑋,𝐸 2 2 (4.4) 𝑠. 𝑡. 𝐶 = 𝐶 𝑇 The output from (4.4) is the noise free adjacency matrix of the clustered graph (𝐶 ∗ ) and the noise free data matrix (𝑋 ∗ ). 4.4 Dual-manifold clustering and subspace similarity [86] Similar to the previous example based methods, the proposed algorithm consists of two main phases; training phase and testing phase. Each phase is explained in details in the following sections. 
89 4.4.1. Training Phase: For training, the algorithm picks 1000 randomly generated LR patches of size 8x8 pixels. For each LR patch, the spatially equivalent HR patch, in addition to its 4 direct neighbors, are selected. The experimental results showed that this approach helps capturing not only the equivalent pixels locations from the HR patch, as in previous methods, but also the direct neighbor pixels that contributed to generating that patch during the downsampling and blurring processes. In total we generate less than 6000 training patches. A low-resolution patch (𝑙) used for training is generated from a HR patch (ℎ) by downsampling, blurring and adding noise: 𝑙 = 𝐷𝐵ℎ + 𝑁 (4.5) where 𝐷, 𝐵 and 𝑁 are the down-sampling, blur and additive noise matrices, respectively. The LRSC algorithm is applied to both LR and HR patches, and hence, two clustering-based approximations of the LR and HR manifolds are generated independently. Here, it is important to emphasize that unlike previous algorithms, we don’t assume any similarity in the geometry between the LR and the HR manifolds; instead, an affinity (similarity) measure (𝑚) is used to find the closest HR subspace to the selected LR subspace. 𝑠𝑚𝑎𝑥 = 𝑚𝑎𝑥 𝑚(𝐿𝑖 , 𝐻𝑗 ) (4.6) where 1 ≤ 𝑖 ≤ 𝑘1 , 1 ≤ 𝑗 ≤ 𝑘2 . 𝑘1 , 𝑘2 represent the number of LR and HR clusters respectively. 90 Figure 4.1: The test phase summary of the DMCSS algorithm. d1,d2 and d3 represent the distance between the LR test patch and each cluster in the LR manifold. T:L2-H1 is an entry in the subspace similarity table (T) and it represents the closest HR subspace to the corresponding LR subspace. 91 The similarity between two subspaces 𝐿𝑖 and 𝐻𝑗 with PCA dimensions 𝛿𝑖 and 𝛿𝑗 is given by [87], [88]: 𝑚(𝐿𝑖 , 𝐻𝑗 ) = 𝑟 ∑𝑟𝑘=1 𝜎𝑘 (3.7) where 𝑟 = min(𝛿𝑖 , 𝛿𝑗 ), 𝜎𝑘 represents the k-th singular value of the orthonormal basis product ( 𝑃𝑖𝑇 𝑃𝑗 ). 𝑃𝑖 ∈ ℝ𝛿×𝛿𝑖 and 𝑃𝑗 ∈ ℝ𝛿×𝛿𝑗 are the orthogonal basis matrices of 𝐿𝑖 and 𝐻𝑗 respectively. 𝛿 = 64 is the length of the vector that results from the vectorization of each patch. Since the similarity measure applies to subspaces with different dimensions, it fits quite well with our application. 𝑃𝑖𝑇 𝑃𝑗 = 𝑈ΣV 𝑇 (4.8) Σ = 𝑑𝑖𝑎𝑔(𝜎1 … 𝜎𝑟 ) As the experimental results show, using the similarity measure between LR and HR subspaces gives better approximation to the test image patches than projecting the patch directly to the HR manifold. This helps preventing the problem of selecting the wrong HR subspace in case there are two HR subspaces that are close to each other or in case of the availability of a smooth subspace. 92 Algorithm 4.1: The training and testing phases for the proposed DMCSS algorithm. Algorithm 4.1: DMCSS Training Phase Input: Training Images, patch size (𝑝), 𝜏, 𝛼 and 𝜆 Build the manifolds L and H for LR and HR patches by random selection from training images. Cluster L and H independently using (3.4). For each LR subspace (𝐿𝑖 ) Evaluate 𝑚(𝐿𝑖 , 𝐻𝑗 ) to all HR clusters (𝐻𝑗 ) using (3.7). Store the indices of the pair (𝐿𝑖 , 𝐻𝑗 ) that satisfies (3.6) in T. End For Testing Phase Input: T, a set of LR and HR clusters, 𝑝, overlap (o) and test image (G). For each test patch (𝑔). Find the closest 𝐿𝑖 to 𝑔 using (3.9). Use T to find the corresponding 𝐻𝑗 . Average the HR patches in 𝐻𝑗 . End for G* Output: 4.4.2. Testing Phase: In this phase, the bicubic interpolation method is applied first to the test LR image to 93 increase its size by the desired scaling factor. The image is then divided to 8x8 patches with 7 overlapping pixels. 
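Before describing how the closest LR subspace is selected for a test patch, the following sketch gathers the two linear-algebra primitives DMCSS relies on: the subspace similarity of (4.7) used to build the table T during training, and the point-to-subspace distance used next in the testing phase. This is an illustrative NumPy version of our own; in particular, the normalization of (4.7) follows our reading of [87], [88] (the average of the singular values of the basis product), and the cluster data here are random stand-ins for real patch clusters.

import numpy as np

def orthonormal_basis(patches, dim):
    # PCA basis of a cluster: the columns of P span its 'dim'-dimensional subspace.
    centered = patches - patches.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :dim]

def subspace_similarity(Pi, Pj):
    # Eq. (4.7): average of the singular values of Pi^T Pj (cosines of the principal angles).
    r = min(Pi.shape[1], Pj.shape[1])
    s = np.linalg.svd(Pi.T @ Pj, compute_uv=False)
    return s[:r].sum() / r

def distance_to_subspace(g, P):
    # Norm of the residual after orthogonally projecting the test patch g onto span(P).
    return np.linalg.norm(g - P @ (P.T @ g))

# Toy example with 64-dimensional (8x8, vectorized) patches.
rng = np.random.default_rng(0)
L1 = orthonormal_basis(rng.standard_normal((64, 200)), dim=6)   # an LR cluster basis
H1 = orthonormal_basis(rng.standard_normal((64, 200)), dim=9)   # an HR cluster basis
print(subspace_similarity(L1, H1), distance_to_subspace(rng.standard_normal(64), L1))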
Finding the closest LR subspace to the test patch (𝑔) can be simply done by projecting the test patch onto the LR subspaces (𝐿𝑖 ) and selecting the subspace that produces the minimum distance. 𝑑𝑚𝑖𝑛 = 𝑚𝑖𝑛 𝑑(𝑔, 𝐿𝑖 ) (4.9) where 1 ≤ 𝑖 ≤ 𝑘, and 𝑘 is the number of clusters. The similarity table that was built in the training phase is then used to locate the closest HR subspace. The average of the patches in the HR cluster is used to find the final HR patch that is equivalent to the LR test patch. Similar to previous methods, the overlapped pixels in the final HR image are averaged and we added a condition to maintain the global smoothness of the result similar to [28], [89]. A demonstration of the testing phase is shown in Figure 4.1.The summary of the DualManifold Clustering and Subspace Similarity (DMCSS) algorithm is shown in Algorithm 4.1. 4.5 Experimental Results In this section the performance of DMCSS algorithm is compared against state-of-the- art super-resolution algorithms. We compared the algorithm against the standard MATLAB built-in bicubic interpolation function, the Adjusted Anchored Neighborhood Regression for Fast Super-Resolution algorithm (A+) [48], the sparse representation method (ScSR) [26] and SSC-based Linear Approximation of Manifolds algorithm (SLAM) [28], which provided superior results when compared. For the sake of fair comparison, we ran all algorithms using MATLAB R2012b on a computer with a 3.3 GHz i-5 CPU and a 4GB 94 Memory. Similar to the previous image super-resolution methods, we used the YCBCR image representation model. Since the human eye is only sensitive to luminance, we used bicubic interpolation to scale up the CB and CR channels and applied the algorithm to the luminance channel only. We used the same training set in [28] and added 5 more extra images with different texture from [26]. The HR training set is shown in Figure 4.2. Figure 4.3. shows the results of applying the super-resolution algorithms to scale up the Lena test image of size 85×85 by a magnification factor of 3. It clearly shows that our algorithm recovers most of the edges and details with more accuracy than the other algorithms. It also achieved about 0.9dB improvement in PSNR and 0.029 improvement in SSIM. Next we applied the algorithms to other standard test images which are shown in Figure 4.4 and recorded the results as shown in Table 4.1. It also shows that DMCSS achieves an average of 1dB improvement in PSNR and over 0.02 SSIM improvement. Since most of these algorithms depend on random selection of training patches, we ran each algorithm for 20 times and took the average of the results. 95 Figure 4.2: The training images used for the DMCSS algorithm. The pictures are proportionally resized to fit in the table. 96 (a) Bicubic PSNR:29.70, SSIM:0.836 (b) A+ PSNR:30.65, SSIM:0.852 (c) ScSR (PSNR:31.05, SSIM:0.861) (d) SLAM (PSNR:31.30, SSIM:0.873) (e) DMCSS (PSNR:32.11, SSIM:0.902) (f) Original HR image 97 Figure 4.3: The results of applying different super-resolution methods to lena image with magnification factor of 3. (a) Baby (b) Raccoon (c) Truck (d) Freckles (e) House (f) Mountain 98 Figure 4.4: Testing images for super-resolution algorithms. Table 4.1: Performance of super-resolution algorithms on different test images with a magnification factor of 3. 
Image title Bicubic A+ ScSR SLAM DMCSS 31.73 32.54 32.49 32.98 33.02 0.768 0.794 0.795 0.852 0.896 28.33 28.84 29.14 28.97 30.43 0.715 0.758 0.771 0.762 0.781 27.43 27.76 28.39 28.31 29.12 0.842 0.849 0.860 0.874 0.892 30.14 30.83 31.05 31.46 32.38 0.820 0.852 0.870 0.890 0.901 24.32 24.81 24.80 24.94 26.62 0.685 0.694 0.731 0.730 0.763 27.13 27.64 27.61 27.89 28.79 0.710 0.745 0.746 0.762 0.771 28.18 28.74 28.91 29.09 30.06 0.757 0.782 0.796 0.812 0.834 Baby Raccoon Truck Freckles House Mountain Average 99 Chapter 5 Conclusions and Future Works 5.1 Conclusions In the first part of this work, we have reviewed the main challenges associated with low rank matrix and tensor completion and robust tensor recovery problems, and described several state-of-the-art methods dealing with these problems and provided solutions to them based on the nuclear norm minimization. We proposed new algorithms for matrix completion using the smoothed Schatten- 𝑝 function. Moreover, we extended the Schatten𝑝 functions to the multi-dimensional tensor domain and proposed other algorithms for tensor completion and robust recovery. The experimental results showed, as expected, that in general all the proposed algorithms provide higher PSNR and better visual quality due to employing the truncated nuclear norm which leaves the most significant singular values untouched while minimizing the insignificant ones. TSRTR-ADMM achieves the highest PSNR but it has higher convergence time compared to TSRTR-ALM; on the other hand, TSRTR-APG is the fasted converging algorithm due to the acceleration of the gradient convergence, but its PSNR level falls below TSRTR-ALM. Several experiments have been carried out on three-dimensional and four-dimensional data. It has been shown that in general, our proposed framework performs much better, in terms of visual quality, PSNR and execution time, over the previous state-of-the-art algorithms. The reason for the quality improvement is due to selecting the truncated schatten-p function that approximated minimizing the rank of the low rank data more accurately than the traditional nuclear norm minimization function. The reason for the faster execution time is due to selecting more 100 efficient and less complex optimization solutions to the Robust Tensor Recovery problem that require less extensive computation and converge faster than the other algorithms. Based on the results obtained in this thesis and from an application point of view, we can categorize our algorithms as follow: if for a specific application timing is important and obtaining an acceptable quality results within small amount of time, then TSRTR-APG will be the best algorithm to be selected for that application. If for another application accurate results are required for the recovery no matter how much time it takes, then TSRTR-ADMM is the best option to select as it always achieves the highest PSNR. For general use and to provide very satisfying results within realistic timing, TSRTR-ALM satisfies that balance between execution time and quality. The second part of this thesis is related to the ill-posed problem of image superresolution. We reviewed the current main methods for image super-resolution and we introduced a new image super-resolution algorithm that makes no assumption about the structural similarity between the HR and LR manifolds. The LR and HR manifolds are trained independently and the subspace similarity measure is utilized to find the closest HR subspace to the selected LR subspace. 
In addition, we used Low Rank Subspace Clustering (LRSC) to mitigate the effect of noise and gross errors on the algorithm's performance. The experimental results showed that the algorithm achieves better results than state-of-the-art algorithms in this application, in terms of both subjective and objective measures.

5.2 Future Works

5.2.1. Matrix and Tensor Completion and Robust Tensor Recovery

Throughout this thesis, the algorithms we proposed for the matrix completion, tensor completion and tensor RPCA problems were applied to recover small visual datasets from few observed samples or from highly corrupted multi-dimensional signals. Many applications, however, deal with huge low rank datasets, such as the Netflix problem; although our algorithms are designed to avoid SVD or HoSVD computations, they are still very time consuming when applied to big datasets. In future work we will focus on reducing the computation time of these algorithms. Since the underlying matrix operations are separable, they can be run on different machines or processed in parallel and the partial results combined; we will therefore look into porting our algorithms to parallel processing environments, either by running them on powerful parallel computing devices such as GPUs or by distributing the work over several computers and combining the results. This should reduce the computation time of the algorithms even further.

5.2.2. Sparse/redundant representation for a single video frame

An interesting future direction is to adapt and apply our algorithm to multi-frame image super-resolution, in which the training manifold consists of different LR images of the same target scene captured under different conditions. In that case the HR manifold is absent, and we have to modify our algorithm to work in this scenario. We will also attempt to improve this method and apply it to video sequences and other applications that require real-time results. A known issue when dealing with video signals is that each frame needs to be super-resolved in real time for the human eye to watch the video smoothly, without any interruption; this makes it crucial for algorithms applied to video to run in real time. Our testing phase currently takes, on average, around one third of a second per frame on our computer to super-resolve a color image of size 85×85 by a factor of 3. This makes our algorithm applicable only to videos of at most 3 frames per second (FPS), which is nowhere near a good-quality playback rate of roughly 30-60 FPS. This strongly motivates us to look into ways of improving the speed of our algorithm, for example by applying fast interpolation to background and constant-colored areas while devoting more effort to edges and other boundaries that blur or produce blocking artifacts under ordinary interpolation. This should reduce the number of patches processed for super-resolution and is a promising way to improve the running time of our algorithm.

BIBLIOGRAPHY

[1] A. A. Moghadam, "Combinatorial Methods for Compressed Sensing," Michigan State University, 2014.
[2] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[3] E. J. Candès, "The Restricted Isometry Property and Its Implications for Compressed Sensing," pp. 1–4, 2008.
[4] E. Candès and J. Romberg, "Practical Signal Recovery from Random Projections," vol. 2291, no. 626.
[5] E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Found. Comput. Math., vol. 9, no. 6, pp. 717–772, 2009.
[6] P. Chen and D. Suter, "Recovering the Missing Components in a Large Noisy Low-Rank Matrix: Application to SFM," vol. 26, no. 8, pp. 1051–1063, 2004.
[7] E. J. Candès and Y. Plan, "Matrix completion with noise," Proc. IEEE, vol. 98, no. 6, pp. 925–936, 2010.
[8] E. J. Candès and T. Tao, "The power of convex relaxation: Near-optimal matrix completion," IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2053–2080, 2010.
[9] C. T. Dang, "Sparse and redundant models for data mining and consumer video summarization," Michigan State University, 2015.
[10] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?," J. ACM, vol. 58, no. 3, pp. 1–37, May 2011.
[11] G. Lerman and T. Zhang, "lp-Recovery of the most significant subspace among multiple subspaces with outliers," Constr. Approx., vol. 40, no. 1, p. 5, 2014.
[12] X. Chen, Y. Qi, B. Bai, Q. Lin, and J. G. Carbonell, "Sparse latent semantic analysis," Proc. 11th SIAM Int. Conf. Data Mining (SDM 2011), pp. 474–485, 2011.
[13] C.-P. Wei, C.-F. Chen, and Y. F. Wang, "Robust face recognition with structurally incoherent low-rank matrix decomposition," IEEE Trans. Image Process., vol. 23, no. 8, pp. 3294–3307, 2014.
[14] D. Goldfarb and Z. Qin, "Robust low-rank tensor recovery: Models and algorithms," SIAM J. Matrix Anal. Appl., vol. 35, no. 1, pp. 225–253, 2014.
[15] Y. Li, Y. Zhou, J. Yan, J. Yang, and X. He, "Tensor error correction for corrupted values in visual data," Image Process. (ICIP), …, 2010.
[16] H. Tan, B. Cheng, J. Feng, G. Feng, and Y. Zhang, "Tensor Recovery via Multilinear Augmented Lagrange Multiplier Method," 2011 Sixth Int. Conf. Image Graph., pp. 141–146, Aug. 2011.
[17] J. Yuan, S. Du, and X. Zhu, "Fast super-resolution for license plate image reconstruction," 2008 19th Int. Conf. Pattern Recognit., pp. 1–4, 2008.
[18] F. Lin, C. Fookes, V. Chandran, and S. Sridharan, "Super-resolved faces for improved face recognition from surveillance video," Adv. Biometrics, pp. 1–10, 2007.
[19] O. Le Meur and C. Guillemot, "Super-Resolution-based Inpainting," ECCV, pp. 1–14, 2012.
[20] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar, "Fast and robust multiframe super resolution," IEEE Trans. Image Process., vol. 13, no. 10, pp. 1327–1344, 2004.
[21] S. S. Patil and M. M. Patil, "Multi Frame Image Super Resolution," vol. 4, no. 5, pp. 1686–1690, 2015.
[22] X. Li, Y. Hu, X. Gao, D. Tao, and B. Ning, "A multi-frame image super-resolution method," Signal Processing, vol. 90, no. 2, pp. 405–414, 2010.
[23] C. Dang, M. Aghagolzadeh, and H. Radha, "Image Super-Resolution via Local Self-Learning Manifold Approximation," IEEE Signal Process. Lett., vol. 21, no. 10, pp. 1245–1249, 2014.
[24] Y. Tang and P. Yan, "Single-image super-resolution via local learning," Int. J. Mach. Learn., pp. 15–23, 2011.
[25] X. Lu, P. Yan, Y. Yuan, and X. Li, "Utilizing Homotopy for Single Image Super-resolution," pp. 316–320, 2011.
[26] J. Yang, J. Wright, Y. Ma, and T. Huang, "Image Super-Resolution as Sparse Representation of Raw Image Patches," IEEE Trans. Image Process., pp. 1–13, 2010.
[27] H. Chang, D.-Y. Yeung, and Y. Xiong, "Super-resolution through neighbor embedding," Proc. 2004 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR 2004), vol. 1, 2004.
[28] C. T. Dang, M. Aghagolzadeh, A. A. Moghadam, and H. Radha, "Single image super resolution via manifold linear approximation using sparse subspace clustering," 2013 IEEE Glob. Conf. Signal Inf. Process., pp. 949–952, 2013.
[29] D. L. Karambelkar and P. J. Kulkarni, "Super-Resolution Using Manifold Learning," 2011 Int. Conf. Comput. Intell. Commun. Networks, pp. 707–710, 2011.
[30] J. Liu, P. Musialski, P. Wonka, and J. Ye, "Tensor completion for estimating missing values in visual data," 2009 IEEE 12th Int. Conf. Comput. Vis., pp. 2114–2121, 2009.
[31] S. Ma, D. Goldfarb, and L. Chen, "Fixed point and Bregman iterative methods for matrix rank minimization," Math. Program., vol. 128, no. 1–2, pp. 321–353, 2011.
[32] J.-F. Cai, E. J. Candès, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM J. Optim., vol. 20, no. 4, pp. 1956–1982, 2010.
[33] K.-C. Toh and S. Yun, "An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems," Pacific J. Optim., vol. 6, pp. 615–640, 2010.
[34] D. Zhang, Y. Hu, J. Ye, X. Li, and X. He, "Matrix completion by truncated nuclear norm regularization," Comput. Vis. Pattern Recognit., pp. 2192–2199, Jun. 2012.
[35] K. Mohan and M. Fazel, "Iterative reweighted algorithms for matrix rank minimization," J. Mach. Learn. Res., vol. 13, pp. 3441–3473, 2012.
[36] R. Tripathi, B. Mohan, and K. Rajawat, "Adaptive Low-Rank Matrix Completion," vol. 65, no. 14, pp. 3603–3616, 2017.
[37] J. Liu, P. Musialski, P. Wonka, and J. Ye, "Tensor completion for estimating missing values in visual data," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 208–220, Jan. 2013.
[38] M. Yuan and C. Zhang, "On Tensor Completion via Nuclear Norm Minimization," 2014.
[39] L. Huang and H. So, "Truncated nuclear norm minimization for tensor completion," Sens. Array …, pp. 417–420, Jun. 2014.
[40] T. Yokota and A. Cichocki, "Tensor completion via functional smooth component deflation," no. 1, pp. 2–6, 2016.
[41] J. A. Bengua, H. N. Phien, H. D. Tuan, and M. N. Do, "Efficient Tensor Completion for Color Image and Video Recovery: Low-Rank Tensor Train," vol. 26, no. 5, pp. 2466–2479, 2017.
[42] J. Wright, A. Ganesh, and S. Rao, "Robust principal component analysis: Exact recovery of corrupted low-rank matrices by convex optimization," Proc. Neural Inf. Process. Syst., vol. 3, 2009.
[43] Q. Zhao, D. Meng, Z. Xu, W. Zuo, and L. Zhang, "Robust Principal Component Analysis with Complex Noise," 2010.
[44] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin, and S. Yan, "Tensor Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Tensors via Convex Optimization," 2016.
[45] M. E. Kilmer and C. D. Martin, "Factorization strategies for third-order tensors," Linear Algebra Appl., vol. 435, no. 3, pp. 641–658, 2011.
[46] C. Fang, X. Ji, Y. Zhang, and Q. Dai, "A Single Frame Super-Resolution Method Based on Matrix Completion," Data Compression Conf., pp. 297–306, Apr. 2012.
[47] W. Huang, L. Xiao, Z. Wei, X. Fei, and K. Wang, "Single image super-resolution by clustered sparse representation and adaptive patch aggregation," China Commun., vol. 10, no. 5, pp. 50–61, 2013.
[48] R. Timofte, V. De Smet, and L. Van Gool, "A+: Adjusted Anchored Neighborhood Regression for Fast Super-Resolution," Comput. Vis. - ACCV 2014, Pt. IV, vol. 9006, pp. 111–126, 2015.
[49] R. Timofte, V. De Smet, and L. Van Gool, "Anchored Neighborhood Regression for Fast Example-Based Super-Resolution," Comput. Vis. (ICCV), 2013 IEEE Int. Conf., pp. 1920–1927, 2013.
[50] C. Yang and M. Yang, "Fast Direct Super-Resolution by Simple Functions," IEEE Int. Conf. Comput. Vis., 2013.
[51] B. Recht, M. Fazel, and P. A. Parrilo, "Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization," Optim. Online, vol. 52, no. 3, pp. 1–33, 2007.
[52] Y. Xu, W. Yin, Z. Wen, and Y. Zhang, "An alternating direction algorithm for matrix completion with nonnegative factors," Front. Math. China, vol. 7, no. 2, pp. 365–384, 2012.
[53] K.-C. Toh and S. Yun, "An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems," Pacific J. Optim., vol. 6, no. 15, pp. 615–640, 2010.
[54] R. H. Keshavan, A. Montanari, and S. Oh, "Matrix completion from noisy entries," J. Mach. Learn. Res., vol. 99, pp. 2057–2078, 2010.
[55] M. Fazel, "Matrix rank minimization with applications," Stanford University, 2002.
[56] R. S. Cabral, F. Torre, J. P. Costeira, and A. Bernardino, "Matrix completion for multi-label image classification," Adv. Neural Inf. Process. Syst., pp. 1–9, 2011.
[57] F. Fazel, M. Fazel, and M. Stojanovic, "Random access sensor networks: Field reconstruction from incomplete data," 2012 Inf. Theory Appl. Work., pp. 300–305, Feb. 2012.
[58] K. Xie, X. Ning, X. Wang, D. Xie, J. Cao, G. Xie, and J. Wen, "Recover Corrupted Data in Sensor Networks: A Matrix Completion Solution," vol. 16, no. 5, pp. 1434–1448, 2017.
[59] Z. Liu and L. Vandenberghe, "Interior-point method for nuclear norm approximation with application to system identification," SIAM J. Matrix Anal. Appl., vol. 31, no. 3, pp. 1235–1256, Jan. 2009.
[60] I. Daubechies and R. DeVore, "Iteratively reweighted least squares minimization for sparse recovery," Commun. Pure Appl. Math., vol. 63, no. 1, pp. 1–38, 2010.
[61] K. Mohan and M. Fazel, "Reweighted nuclear norm minimization with application to system identification," Am. Control Conf. (ACC), 2010, no. 5, 2010.
[62] M. Signoretto, R. Van de Plas, B. De Moor, and J. A. K. Suykens, "Tensor versus matrix completion: a comparison with application to spectral data," IEEE Signal Process. Lett., vol. 18, no. 7, pp. 403–406, Jul. 2011.
[63] Z. Lin, M. Chen, and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," arXiv preprint arXiv:1009.5055, 2010.
[64] S. Gandy, B. Recht, and I. Yamada, "Tensor completion and low-n-rank tensor recovery via convex optimization," Inverse Probl., vol. 27, no. 2, 2011.
[65] S. Friedland and L. Lim, "Nuclear norm of higher-order tensors," pp. 1–23, 2016.
[66] Z. Lin, M. Chen, and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," arXiv preprint arXiv:1009.5055, 2010.
[67] M. Al-Qizwini and H. Radha, "Fast smooth rank approximation for tensor completion," 48th Annu. Conf. Inf. Sci. Syst., no. 4, pp. 1–5, Mar. 2014.
[68] M. Al-Qizwini and H. Radha, "Smoothed Rank Approximation Algorithms for Matrix Completion," Asilomar, 2014.
[69] W. Zuo and Z. Lin, "A generalized accelerated proximal gradient approach for total-variation-based image restoration," IEEE Trans. Image Process., vol. 20, no. 10, pp. 2748–2759, Oct. 2011.
[70] M. Al-Qizwini and H. Radha, "Truncated and Smoothed Schatten-p Function for Robust Tensor Recovery," Conf. Inf. Sci. Syst., pp. 31–34, 2015.
[71] Z. Shi, T. Zheng, and J. Han, "Trace norm regularized tensor classification and its online learning approaches," arXiv preprint arXiv:1109.1342, pp. 1–11, 2011.
[72] T. Peleg and M. Elad, "A Statistical Prediction Model Based on Sparse Representations for Single Image Super-Resolution," IEEE Trans. Image Process., vol. 23, no. 6, pp. 2569–2582, 2014.
[73] C. Dang and H. Radha, "Fast Image Super-resolution via Selective Manifold Learning of High-Resolution Patches," pp. 1319–1323, 2015.
[74] R. Vidal and P. Favaro, "Low rank subspace clustering (LRSC)," Pattern Recognit. Lett., vol. 43, no. 1, pp. 47–61, 2014.
[75] E. Elhamifar and R. Vidal, "Sparse Subspace Clustering," pp. 1–19.
[76] R. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
[77] P. Tseng, "Nearest q-Flat to m Points," J. Optim. Theory Appl., vol. 105, no. 1, pp. 249–252, 2000.
[78] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image Super-Resolution Via Sparse Representation," vol. 19, no. 11, pp. 2861–2873, 2010.
[79] J. P. Costeira and T. Kanade, "A Multibody Factorization Method for Independently Moving Objects," vol. 29, no. 3, pp. 159–179, 1998.
[80] K. Kanatani, "Motion Segmentation by Subspace Separation and Model Selection," pp. 586–591, 2001.
[81] R. Vidal, Y. Ma, and S. Sastry, "Generalized Principal Component Analysis (GPCA)," vol. 27, no. 12, pp. 1945–1959, 2005.
[82] M. E. Tipping and C. M. Bishop, "Mixtures of Probabilistic Principal Component Analyzers," Neural Comput., vol. 11, no. 2, pp. 443–482, 1999.
[83] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," vol. 24, no. 6, 1981.
[84] A. Goh, "Segmenting Motions of Different Types by Unsupervised Manifold Clustering," 2007.
[85] T. Zhang, A. Szlam, Y. Wang, and G. Lerman, "Hybrid Linear Modeling via Local Best-Fit Flats," pp. 217–240, 2012.
[86] M. Al-Qizwini, C. Dang, M. Aghagolzadeh, and H. Radha, "Image Super-resolution via Dual-Manifold Clustering and Subspace Similarity," Int. Conf. Image Process., vol. 683, 2016.
[87] M. Soltanolkotabi, E. Elhamifar, and E. J. Candès, "Robust subspace clustering," Ann. Stat., vol. 42, no. 2, pp. 669–699, 2014.
[88] R. Wang, S. Shan, X. Chen, and W. Gao, "Manifold-manifold distance with application to face recognition based on image set," 2008 IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 1–8, 2008.
[89] M. Irani and S. Peleg, "Super resolution from image sequences," Proc. 10th Int. Conf. Pattern Recognit., vol. 2, pp. 115–120, 1990.