ADVANCES IN MATRIX AND TENSOR ANALYSIS: FLEXIBLE AND ROBUST SAMPLING MODELS, ALGORITHMS, AND APPLICATIONS

By Bowen Su

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Applied Mathematics, Doctor of Philosophy

2025

ABSTRACT

This thesis investigates robust and flexible methods for matrix and tensor analysis, which are fundamental in data science. The primary focus of this work is the development of Guaranteed Sampling Flexibility for Low-Tubal-Rank Tensor Completion, a project aimed at addressing the limitations of existing sampling methods for tensor completion, such as Bernoulli and t-CUR sampling, which often lack flexibility across diverse applications. To overcome these challenges, we introduce Tensor Cross-Concentrated Sampling (t-CCS), an extension of the matrix Cross-Concentrated Sampling (CCS) model to tensors, and propose a novel non-convex algorithm, Iterative Tensor CUR Completion (ITCURC), specifically tailored for t-CCS-based tensor completion. Theoretical analysis provides sufficient conditions for low-rank tensor recovery and presents a detailed sampling complexity analysis. These findings are further validated through extensive testing on both real-world and synthetic datasets.

In addition to the main project, this thesis includes a complementary study. That study explores the robustness of the CCS model for matrix completion, a recent approach demonstrated to effectively capture cross-concentrated data dependencies. However, its robustness to sparse outliers has remained underexplored. To address this gap, we propose the Robust CCS Completion problem and develop a non-convex iterative algorithm, Robust CUR Completion (RCURC). Empirical results on synthetic and real-world datasets demonstrate that RCURC is both efficient and robust against outliers, making it a powerful tool for recovering incomplete data.

Collectively, these projects advance the robustness and flexibility of matrix and tensor methods, enhancing their applicability in complex, real-world data environments.

ACKNOWLEDGEMENTS

As I reach the culmination of my Ph.D. journey, I wish to extend my deepest gratitude to the Mathematics Department and the following professors, whose support and guidance have been instrumental to my growth.

I would first like to express my sincere gratitude to my committee chair and advisor, Professor Andrew Christlieb. His expertise in Computational Science and Engineering has been a constant source of inspiration, significantly shaping my research perspective and deepening my understanding of high-performance computing. His invaluable advice and steadfast encouragement, particularly during the most challenging moments of my graduate studies, have left a lasting impact on both my personal and professional development. I look forward to long-term opportunities to work with Prof. Andrew on the development of low-rank tensor approximation techniques for high-dimensional PDEs.

I am sincerely thankful to Professor Ekaterina Rapinchuk for her unwavering support and academic guidance. Her encouragement and insightful advice during my most difficult times provided me with strength and motivation. Her commitment to my success, from serving on my Ph.D. committee from the very beginning to offering steadfast support throughout my journey, has been truly indispensable. I look forward to long-term opportunities to collaborate with Prof. Ekaterina on machine learning projects.
I would like to express my sincere gratitude to Professor Yuying Xie for his continuous support and guidance. His encouragement has been invaluable, helping me stay focused and motivated throughout my doctoral journey. I look forward to long-term opportunities to collaborate with Prof. Xie on AI for science projects.

My heartfelt appreciation extends to Professor Mark Iwen for his generosity in sharing his expertise and professional wisdom. His thoughtful guidance and enduring support during his service on my Ph.D. committee have been invaluable.

Being part of the Department of Mathematics has been both an honor and a transformative experience. I am deeply grateful for the knowledge, skills, and unwavering support I have received. I sincerely extend my heartfelt appreciation to Professor Jeffery and Professor Rajesh for their dedication to fostering a supportive, friendly, and enriching academic environment. Additionally, I am deeply grateful for the collaborative and intellectually stimulating environment fostered by all math staff and all my fellow graduate students in the Mathematics Department. I would like to acknowledge Alvarado Taylor, Aldo Garcia Guinto, Bhusal Gokul, Ekblad Owen, Krupansky Nick, Kimble Jamie, Mandela Quashie, Stephen White, Whiting David, Yu Shen, Jie Yang, Peikai Qi, Shitan Xu, Boahen Edem, and many other remarkable peers, whose camaraderie and support have profoundly enriched my Ph.D. journey.

I would also like to express my heartfelt gratitude to my mentors at Los Alamos National Laboratory, Dr. Charles Abolt and Dr. Adam Atchley, for their warm invitation, invaluable guidance during my summer internship, and their proactive efforts to extend the opportunity for me to continue contributing part-time afterward. Their support and encouragement have been instrumental in my professional growth.

Last but not least, I would like to express my deepest gratitude to my family: my parents, Gang Su and Fang Wang, and my wife, Xuandi, whose unwavering support and encouragement have served as the bedrock of my journey. Your companionship and unshakable faith in my potential have been a true pillar of support, and I owe you a special debt of gratitude. Your understanding and encouragement have inspired me to push through obstacles, remain focused, and strive for excellence. Your collective love and sacrifices have made this journey not only possible but also deeply meaningful, motivating me to turn my aspirations into reality.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
CHAPTER 1   INTRODUCTION
CHAPTER 2   GUARANTEED SAMPLING FLEXIBILITY FOR LOW-TUBAL-RANK TENSOR COMPLETION
    2.1 Introduction
    2.2 Proposed sampling model
    2.3 Theoretical Results
    2.4 An efficient solver for t-CCS
    2.5 Numerical Experiments
    2.6 Proofs of Theoretical Results
    2.7 Conclusion
CHAPTER 3   ON THE ROBUSTNESS OF CROSS-CONCENTRATED SAMPLING FOR MATRIX COMPLETION
    3.1 Introduction
    3.2 Proposed Algorithm
    3.3 Numerical Experiments
    3.4 Conclusion
CHAPTER 4   CONCLUSION
    4.1 Summary of Contributions
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1  A comprehensive examination of the per-iteration computational cost for ITCURC.

Table 2.2  Image inpainting results on the Building and Window datasets. The best results are emphasized in bold, while the second-best results are underlined. ITCURC-δ refers to the ITCURC method with the percentages of selected horizontal and lateral slices set at a fixed rate of δ%. The t-CCS-based algorithms ITCURC-δ are performed under the t-CCS scheme, while the other, Bernoulli-based algorithms are performed under the Bernoulli sampling scheme.

Table 2.3  The quantitative results for MRI data completion are presented, with the best results in bold and the second-best underlined. ITCURC-δ represents the ITCURC method specifying that the selected proportion of horizontal and lateral slices is exactly δ%. The t-CCS-based algorithms ITCURC-δ are performed under the t-CCS scheme, while the other, Bernoulli-based algorithms are performed under the Bernoulli sampling scheme.

Table 2.4  Quantitative results for seismic data completion: TMac, TNN, F-TNN with Bernoulli sampling, and our method with t-CCS. Best results are in bold, and second-best are underlined. ITCURC-δ refers to the ITCURC method with the percentages of selected horizontal and lateral slices set at a fixed rate of δ%. The t-CCS-based algorithms ITCURC-δ are performed under the t-CCS scheme, while the other, Bernoulli-based algorithms are performed under the Bernoulli sampling scheme.

Table 3.1  Comparison of runtime and PSNR among RPCA, SPCP, LRMF, IRCUR-R, IRCUR-F based on full observation and RCURC based on the CCS model.

LIST OF FIGURES

Figure 2.1  Visualization of a horizontal, a lateral, and a frontal slice of a tensor T ∈ K^{n1×n2×n3}. The orange regions in the leftmost, middle, and rightmost subfigures are a horizontal slice, a lateral slice, and a frontal slice of T, respectively.

Figure 2.2  A Standard Tensor Lateral Basis.

Figure 2.3  A Standard Tubal Basis.

Figure 2.4  t-CUR decomposition.

Figure 2.5  Visual results of color image inpainting using t-CCS samples at an overall sampling rate of 20% with BCPF, TMac, TNN, and F-TNN algorithms.
Figure 2.6  (Row 1) 3D and (Row 2) 2D views illustrate ITCURC's empirical phase transition for the t-CCS model. δ = |I|/768 = |J|/768 shows the sampled index ratios, p is the Bernoulli sampling probability over the subtensors, and α is the overall tensor sampling rate. White and black in the 768 × 768 × 256 tensor results represent success and failure, respectively, across 25 tests for tubal-ranks 2, 5, and 7 (Columns 1-3). The α needed for success remains consistent across different combinations of δ and p.

Figure 2.7  The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.20.

Figure 2.8  The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.25.

Figure 2.9  The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.30.

Figure 2.10  The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.35.

Figure 2.11  The visualization of color image inpainting for the Building and Window datasets, setting tubal-rank r = 35 with the percentage of selected horizontal and lateral slices δ = 13% and overall sampling rate 20% for the ITCURC algorithm, while the other algorithms are applied under the Bernoulli sampling model with the same overall sampling rate of 20%. Additionally, the t-CCS samples on the Building for ITCURC are the same as those in Figure 2.5.

Figure 2.12  Visualizations of MRI data recovery using ITCURC with tubal-rank r = 35, lateral and horizontal slice selection rate δ = 27%, and an overall sampling rate of 30%. The other algorithms are applied under Bernoulli sampling with the same overall sampling rate. Results for slices 51, 66, 86, and 106 are shown in rows 1 to 4, with a 1.3× magnified area at the bottom left of each result for clearer comparison.

Figure 2.13  Visualization of seismic data recovery results, setting tubal-rank r = 3 for ITCURC with percentage of selected horizontal and lateral slices δ = 17% and overall sampling rate 28%, while the other methods are applied under the Bernoulli sampling model with the same overall sampling rate of 28%. Displayed are slices 15, 25, and 35 from top to bottom, with a 1.2× magnified area in each set for clearer comparison.

Figure 2.14  The structure of the proof of Theorem 2.5: The core of the proof for Theorem 2.5 relies on assessing the probability that certain conditions, specified in Proposition 1, are met. Condition I and Condition II serve as sufficient criteria to ensure the applicability of Proposition 1. Thus, the proof of Theorem 2.5 primarily involves determining the likelihood that Conditions I and II are satisfied. The probabilistic assessment of Condition I utilizes Lemma 2.8 as a fundamental instrument. Similarly, the evaluation of Condition II employs Lemmas 2.8 to 2.10 and Corollary 2.1 as essential tools.

Figure 3.1  [19] Visual comparison of sampling schemes: from uniform to CUR sampling at the same observation rate. Colored pixels indicate observed entries, black pixels indicate missing ones.
Figure 3.2  Empirical convergence of RCURC [18]. Left: c = 10 and varying r. Right: α = 0.2, r = 5 and varying c.

Figure 3.3  Video background subtraction results: Row 1 shows the original images (full observation) at the corresponding frames, while Row 2 presents the observed images generated by the CCS model at the respective frames. Rows 3 to 8 showcase the background components extracted using the RPCA, SPCP, LMRF, AccAltProj, IRCUR-R, and IRCUR-F algorithms based on the full observation model. Row 9 presents the results obtained using the RCURC algorithm under the CCS model.

LIST OF ALGORITHMS

Algorithm 2.1  t-Product based on Fast Fourier Transform (FFT)
Algorithm 2.2  Moore-Penrose inverse
Algorithm 2.3  t-SVD
Algorithm 2.4  Compact t-SVD
Algorithm 2.5  Tensor Cross-Concentrated Sampling (t-CCS)
Algorithm 2.6  Iterative CUR tensor completion for t-CCS (ITCURC)
Algorithm 2.7  Two-Step Tensor Completion (TSTC)
Algorithm 3.1  Cross-Concentrated Sampling (CCS) [19]
Algorithm 3.2  Robust CUR Completion (RCURC)

CHAPTER 1
INTRODUCTION

In an era of unprecedented data generation, extracting meaningful insights from complex, high-dimensional, and often incomplete datasets has become a cornerstone of data science [24, 37]. Matrix and tensor analysis, as foundational tools, provide versatile frameworks to address these challenges. Their applications span a diverse range of fields, including image and video processing [102, 73, 66], recommendation systems [74, 98], and scientific simulations [33, 53, 51, 52]. However, despite their versatility, existing methods often struggle in real-world scenarios characterized by noisy, sparsely observed, or intricately structured data. This thesis focuses on developing robust and flexible methodologies for matrix and tensor analysis, aiming to enhance their robustness to noise and outliers, improve their adaptability to diverse applications, and expand their theoretical underpinnings.

The primary focus of this thesis is on Guaranteed Sampling Flexibility for Low-Tubal-Rank Tensor Completion [107], addressing the limitations of conventional sampling strategies, such as Bernoulli [72, 121, 63] and t-CUR sampling [108, 100], which lack adaptability for diverse real-world applications. To overcome these challenges, this project introduces Tensor Cross-Concentrated Sampling (t-CCS), a generalization of the CCS model to higher-order tensors. Complementing this framework is the development of a novel non-convex algorithm, Iterative Tensor CUR Completion (ITCURC), specifically designed for t-CCS-based tensor completion. The project provides rigorous theoretical foundations, including sufficient conditions for low-tubal-rank tensor recovery and a detailed sampling complexity analysis. Extensive evaluations on synthetic and real-world datasets validate the superior performance of t-CCS and ITCURC in terms of accuracy, flexibility, and computational efficiency.
This work advances tensor analysis by addressing the challenges of high-dimensional, incomplete, and sparsely observed data.

The second project explores the Robustness of Cross-Concentrated Sampling (CCS) for Matrix Completion [18], a recent method that leverages cross-sectional dependencies to recover missing data. While CCS has shown promise in capturing essential patterns in data matrices, its vulnerability to sparse outliers, a common challenge in real-world datasets, remains an open question. This project introduces the Robust CCS Completion framework, extending CCS to handle noisy and incomplete data with resilience to outlier corruption. A non-convex iterative algorithm, Robust CUR Completion (RCURC), is developed to solve the Robust CCS Completion problem, and experimental results on synthetic and real-world datasets demonstrate the algorithm's robustness, efficiency, and scalability.

Together, these projects address critical gaps in matrix and tensor analysis, focusing on robustness, flexibility, and efficiency. The methodologies presented in this thesis not only tackle specific challenges but also provide a foundation for addressing a broader class of problems in data science, where noise, sparsity, and high dimensionality are pervasive. By proposing novel frameworks, designing practical algorithms, and establishing comprehensive theoretical insights, this work advances the state of the art in matrix and tensor analysis, paving the way for their application to increasingly complex and diverse data environments.

This thesis is structured as follows: Chapter 2 explores Tensor Cross-Concentrated Sampling and the Iterative Tensor CUR Completion algorithm. Chapter 3 introduces the Robust CCS Completion framework, detailing its methodology and application to matrix completion problems.

CHAPTER 2
GUARANTEED SAMPLING FLEXIBILITY FOR LOW-TUBAL-RANK TENSOR COMPLETION

ABSTRACT

While Bernoulli sampling is extensively studied in the field of tensor completion, and t-CUR sampling provides a way to approximate low-tubal-rank tensors via lateral and horizontal subtensors, both methods lack sufficient flexibility for diverse practical applications. To address this, we introduce Tensor Cross-Concentrated Sampling (t-CCS), an innovative and straightforward sampling model that advances the matrix cross-concentrated sampling concept within a tensor framework. t-CCS effectively bridges the gap between Bernoulli and t-CUR sampling, offering additional flexibility that can lead to computational savings in various contexts. A key aspect of our work is the comprehensive theoretical analysis provided. We establish a sufficient condition for the successful recovery of a low-rank tensor from its t-CCS samples. In support of this, we also develop a theoretical framework validating the feasibility of t-CUR via uniform random sampling and conduct a detailed theoretical sampling complexity analysis for tensor completion problems utilizing the general Bernoulli sampling model. Moreover, we introduce an efficient non-convex algorithm, the Iterative Tensor CUR Completion (ITCURC) algorithm, specifically designed to tackle the unique challenges of t-CCS-based tensor completion. We have intensively tested and validated the effectiveness of the t-CCS model and the ITCURC algorithm across both synthetic and real-world datasets.
2.1 Introduction

A tensor, as a multidimensional generalization of a matrix, provides an intuitive representation for handling multi-relational or multi-modal data such as hyperspectral data [16, 131, 142], videos [80, 103], seismic data [42, 95], and DNA microarrays [90]. However, in real-world scenarios, it is common to encounter situations where only partial observations of the tensor data are available due to unavoidable or unforeseen circumstances. These limitations can stem from factors such as data collection issues or errors made during data entry by researchers. The problem of recovering the missing data by effectively leveraging the available observations is commonly referred to as the Tensor Completion (TC) problem.

TC is inherently complex and often ill-posed [49, 141], necessitating the exploration of various sampling models and completion techniques. A common and crucial assumption for resolving TC is the low-rank structure of the tensor, which has been extensively utilized to enhance TC approaches [5, 80, 138]. However, the concept of tensor rank is not unique and comes with its own limitations. For example, the CANDECOMP/PARAFAC (CP) rank represents the minimum number of rank-one tensors required to achieve the CP decomposition, involving summations of these tensors [60]. Computing the CP rank, an NP-hard problem, presents difficulties in the recovery of tensors with a low CP rank [68]; thus, finding the optimal low-CP-rank approximation of the target tensor is still an open problem [141]. Other tensor ranks, such as Tucker [117], Tensor Train [91], tubal [69], and Hierarchical-Tucker [48, 54], to name a few, also play prominent roles in the field, each with its distinct computational and application implications.

In this study, we focus on the low-tubal-rank model for tensor completion. The tubal-rank is defined based on the tensor decomposition known as the tensor Singular Value Decomposition (t-SVD), which employs the tensor-tensor product (t-product) [71]. In the t-SVD, a tensor is decomposed into the t-product of two orthogonal tensors and an f-diagonal tensor. The tubal-rank is then determined by the number of non-zero singular tubes present in the f-diagonal tensor. Previous research has shown that tubal-rank-based tensor models exhibit better modeling capabilities compared to other rank-based models, particularly for tensors with fixed orientation or specific spatial-shifting characteristics [96, 133].

In the low-tubal-rank TC model, we consider T ∈ K^{n1×n2×n3} with tubal-rank r, and the observations are located in the set Ω. TC aims to recover the original tensor T from the observations on Ω. Mathematically, we aim to solve the following optimization problem:
$$\min_{\widetilde{\mathcal{T}}} \ \big\langle \mathcal{P}_{\Omega}(\mathcal{T}-\widetilde{\mathcal{T}}),\ \mathcal{T}-\widetilde{\mathcal{T}} \big\rangle, \quad \text{subject to } \operatorname{tubal\text{-}rank}(\widetilde{\mathcal{T}}) = r, \tag{2.1}$$
where ⟨·,·⟩ denotes the Frobenius inner product and P_Ω is the sampling operator defined by
$$\mathcal{P}_{\Omega}(\mathcal{T}) = \sum_{(i,j,k)\in\Omega} [\mathcal{T}]_{i,j,k}\, \mathcal{E}_{i,j,k}, \tag{2.2}$$
where E_{i,j,k} ∈ {0,1}^{n1×n2×n3} is a tensor with all elements being zero except for the element at the position indexed by (i, j, k).
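In implementation, the sampling operator P_Ω in (2.2) is simply an entrywise mask. Below is a minimal MATLAB sketch illustrating this; the variable names and the toy sizes are illustrative choices, not fixed by the thesis.

```matlab
% P_Omega as an entrywise mask: Omega is a logical array of the same size as T,
% with true at observed positions (i,j,k) and false elsewhere.
n1 = 10; n2 = 12; n3 = 8; p = 0.3;
T       = randn(n1, n2, n3);           % a toy ground-truth tensor
Omega   = rand(n1, n2, n3) < p;        % logical observation mask
P_Omega = @(X, M) X .* M;              % keeps observed entries, zeros the rest
Tobs    = P_Omega(T, Omega);           % the partial observations
```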
For successful recovery, the general setting of an efficient solver for (2.1) requires the observation set Ω to be sampled entry-wise, fiber-wise, or slab-wise through a certain unbiased stochastic process, including the Bernoulli sampling process referenced in [101, 104, 113, 120] and the uniform sampling process referenced in [62, 105, 138]. Although extensive theoretical and empirical studies have been conducted on these sampling settings, their practical applicability is sometimes limited. For instance, in collaborative filtering applications, the dimensions of the third-order tensor data typically represent users, rated items (such as movies or products), and time, respectively. The unbiased sampling models implicitly assume that all users are equally likely to rate all items over time, a premise that is often unrealistic in real-world scenarios. As another example, consider Magnetic Resonance Imaging (MRI): MRI scans face limitations with certain metal implants and can cause discomfort in prolonged sessions [1]. To address these issues, we propose a generalization of the cross-concentrated sampling model for matrix completion [19] to the tensor completion setting, termed tensor cross-concentrated sampling (t-CCS). t-CCS enables partial observations on selected horizontal and lateral subtensors, making it more practical in many applications.

2.1.1 Basic Definitions and Terminology

We use K to denote the field of real or complex numbers, i.e., R or C. We represent a matrix by a capital italic letter (e.g., A) and a tensor by a calligraphic letter (e.g., T). The notation [n] denotes the set of the first n positive integers, i.e., {1, ..., n}, for any n ∈ Z+. Submatrices and subtensors are denoted by [A]_{I,J} and [T]_{I,J,K}, respectively, with I, J, K subsets of the appropriate index sets. In particular, if I is the full index set, we write [T]_{I,J,K} as [T]_{:,J,K}, and similar rules apply to J and K. Additionally, |S| denotes the cardinality of a set S. If I is a subset of [n], then I^c denotes the set of elements in [n] that are not in I. For a given matrix A, we use A† to denote its Moore-Penrose inverse and A^⊤ for its conjugate transpose. The spectral norm of A, denoted ‖A‖, is its largest singular value. The Frobenius norm of A is denoted ‖A‖_F, where ‖A‖_F = (Σ_{i,j} |A_{i,j}|²)^{1/2}, and its nuclear norm, denoted ‖A‖_*, is the sum of all its singular values.

The Kronecker product is denoted by ⊗. The column vector e_i has a 1 in the i-th position and zeros elsewhere, with its dimension specified when used. For a tensor T ∈ K^{n1×n2×n3}, T̂ represents the tensor after applying the discrete Fourier transform along its third dimension. Given a tensor T ∈ K^{n1×n2×n3}, we call [T]_{i,:,:}, [T]_{:,j,:}, and [T]_{:,:,k} the horizontal, lateral, and frontal slices of T, for any i ∈ [n1], j ∈ [n2], and k ∈ [n3]. Figure 2.1 gives an example of a horizontal, a lateral, and a frontal slice of a tensor T ∈ K^{n1×n2×n3}.
[Figure 2.1 panels, left to right: Horizontal Slice [T]_{n1,:,:}; Lateral Slice [T]_{:,n2,:}; Frontal Slice [T]_{:,:,1}]
Figure 2.1 Visualization of a horizontal, a lateral, and a frontal slice of a tensor T ∈ K^{n1×n2×n3}. The orange regions in the leftmost, middle, and rightmost subfigures are a horizontal slice, a lateral slice, and a frontal slice of T, respectively.

Given T ∈ K^{n1×n2×n3}, one can define the associated block circulant matrix obtained from the mode-3 slabs of T, i.e.,
$$\operatorname{bcirc}(\mathcal{T}) := \begin{bmatrix} T_1 & T_{n_3} & \cdots & T_2 \\ T_2 & T_1 & \cdots & T_3 \\ \vdots & \vdots & \ddots & \vdots \\ T_{n_3} & T_{n_3-1} & \cdots & T_1 \end{bmatrix} \in \mathbb{K}^{n_1 n_3 \times n_2 n_3},$$
where T_i := [T]_{:,:,i}. For the purpose of this section, we will utilize a slight modification of the unfolding of a tensor along its second mode and define
$$\operatorname{unfold}(\mathcal{T}) := \begin{bmatrix} T_1^{\top} & \cdots & T_{n_3}^{\top} \end{bmatrix}^{\top} \in \mathbb{K}^{n_1 n_3 \times n_2}, \qquad \operatorname{fold}(\operatorname{unfold}(\mathcal{T})) = \mathcal{T}.$$
The t-product of tensors T ∈ K^{n1×n2×n3} and S ∈ K^{n2×n4×n3} is denoted by T ∗ S; it is a tensor of dimension n1 × n4 × n3 obtained via circular convolution. Specifically,
$$\mathcal{T} * \mathcal{S} = \operatorname{fold}\big(\operatorname{bcirc}(\mathcal{T}) \cdot \operatorname{unfold}(\mathcal{S})\big). \tag{2.3}$$
The computational cost of the t-product based on Equation (2.3) is O(n1 n2 n3² n4), since bcirc(T)·unfold(S) is the multiplication of an n1n3 × n2n3 matrix with an n2n3 × n4 matrix.

Define T̄ as
$$\bar{\mathcal{T}} = (F_{n_3} \otimes I_{n_1}) \cdot \operatorname{bcirc}(\mathcal{T}) \cdot (F_{n_3}^{-1} \otimes I_{n_2}),$$
where F_n represents the n × n Discrete Fourier Transform matrix and F_n^{-1} is its matrix inverse. By the property that a circulant matrix can be block-diagonalized by the DFT, T̄ is a block-diagonal matrix. Notice that, with S_i := [S]_{:,:,i} and
$$E_1 := \begin{bmatrix} I_{n_4} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \in \mathbb{K}^{n_4 n_3 \times n_4}$$
(where I_{n4} is the n4 × n4 identity matrix), we have
$$\operatorname{unfold}(\mathcal{S}) = \begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_{n_3} \end{bmatrix} = \begin{bmatrix} S_1 & S_{n_3} & \cdots & S_2 \\ S_2 & S_1 & \cdots & S_3 \\ \vdots & \vdots & \ddots & \vdots \\ S_{n_3} & S_{n_3-1} & \cdots & S_1 \end{bmatrix} \cdot E_1 = \operatorname{bcirc}(\mathcal{S}) \cdot E_1 .$$
Hence, T ∗ S can also be expressed as
$$\begin{aligned} \mathcal{T} * \mathcal{S} &= \operatorname{fold}\big(\operatorname{bcirc}(\mathcal{T}) \cdot \operatorname{unfold}(\mathcal{S})\big) = \operatorname{fold}\big(\operatorname{bcirc}(\mathcal{T}) \cdot \operatorname{bcirc}(\mathcal{S}) \cdot E_1\big) \\ &= \operatorname{fold}\Big( (F_{n_3}^{-1} \otimes I_{n_1}) \cdot (F_{n_3} \otimes I_{n_1}) \cdot \operatorname{bcirc}(\mathcal{T}) \cdot (F_{n_3}^{-1} \otimes I_{n_2}) \cdot (F_{n_3} \otimes I_{n_2}) \cdot \operatorname{bcirc}(\mathcal{S}) \cdot E_1 \Big) \\ &= \operatorname{fold}\Big( (F_{n_3}^{-1} \otimes I_{n_1}) \cdot \bar{\mathcal{T}} \cdot \bar{\mathcal{S}} \cdot (F_{n_3} \otimes I_{n_4}) \cdot E_1 \Big), \end{aligned}$$
where S̄ := (F_{n3} ⊗ I_{n2}) · bcirc(S) · (F_{n3}^{-1} ⊗ I_{n4}) is defined analogously to T̄. Since T̄ and S̄ are block-diagonal, the t-product reduces to independent matrix products of the frontal slices in the Fourier domain. Numerically, we implement the t-product of two tensors based on Algorithm 2.1.
Algorithm 2.1 t-Product based on Fast Fourier Transform (FFT)
1: Input: T ∈ K^{n1×n2×n3}, S ∈ K^{n2×n4×n3}.
2: T̂ := fft(T, [], 3); Ŝ := fft(S, [], 3).
3: for each i ∈ {1, 2, ..., n3} do
4:    [Ẑ]_{:,:,i} = [T̂]_{:,:,i} · [Ŝ]_{:,:,i}.
5: end for
6: Output: Z = ifft(Ẑ, [], 3).

Note that the computational costs of fft(T, [], 3), fft(S, [], 3), and ifft(Ẑ, [], 3) are n1n2n3 log(n3), n2n4n3 log(n3), and n1n4n3 log(n3), respectively. Thus, the t-product based on the FFT takes O(n1n2n3 log(n3) + n2n4n3 log(n3) + n1n4n3 log(n3) + n1n2n4n3) = O(n1n2n4n3) flops, which is more computationally efficient.
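The following MATLAB sketch is a direct companion to Algorithm 2.1; the function name tprod is our own illustrative choice, not fixed by the thesis.

```matlab
function Z = tprod(T, S)
% FFT-based t-product of T (n1 x n2 x n3) and S (n2 x n4 x n3), following Algorithm 2.1.
    [n1, ~, n3] = size(T);
    n4   = size(S, 2);
    That = fft(T, [], 3);                 % DFT along the third mode
    Shat = fft(S, [], 3);
    Zhat = complex(zeros(n1, n4, n3));
    for i = 1:n3
        % frontal-slice matrix products in the Fourier domain
        Zhat(:,:,i) = That(:,:,i) * Shat(:,:,i);
    end
    Z = ifft(Zhat, [], 3);                % for real-valued T, S the result is real up to roundoff
end
```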
Hence, we have (cid:107)A (cid:107) = (cid:107) (cid:0)๐น๐‘›3 โŠ— ๐ผ๐‘›1 1 โˆš ๐‘›3 (cid:1) ยท bcirc(A) ยท ๐นโˆ’1 ๐‘›3 (๐น๐‘›3 โŠ— ๐ผ๐‘›1)(cid:62) ยท (cid:0)๐น๐‘›3 โŠ— ๐ผ๐‘›1 = (cid:107) (cid:16) (cid:17) (cid:107) โŠ— ๐ผ๐‘›2 (cid:1) ยท bcirc(A) ยท (cid:16) ๐นโˆ’1 ๐‘›3 โŠ— ๐ผ๐‘›2 โˆš (cid:17) ยท ๐‘›3(๐นโˆ’1 ๐‘›3 โŠ— ๐ผ๐‘›2)(cid:62)(cid:107) = (cid:107)(๐น๐‘›3 โŠ— ๐ผ๐‘›1)(cid:62) ยท (cid:0)๐น๐‘›3 โŠ— ๐ผ๐‘›1 = (cid:107)(๐น(cid:62) ๐‘›3 โŠ— ๐ผ๐‘›1) ยท (cid:0)๐น๐‘›3 โŠ— ๐ผ๐‘›1 (cid:1) ยท bcirc(A) ยท (cid:16) (cid:1) ยท bcirc(A) ยท โŠ— ๐ผ๐‘›2 (cid:17) โŠ— ๐ผ๐‘›2 (cid:16) ๐นโˆ’1 ๐‘›3 ๐นโˆ’1 ๐‘›3 (cid:16) (cid:17) ยท (๐นโˆ’1 ๐‘›3 โŠ— ๐ผ๐‘›2)(cid:62)(cid:107) )(cid:62) โŠ— ๐ผ๐‘›2)(cid:107) ยท ((๐นโˆ’1 ๐‘›3 1 ๐‘›3 (cid:17) ยท (cid:1) ยท bcirc(A) ยท ๐นโˆ’1 ๐‘›3 โŠ— ๐ผ๐‘›2 (๐น๐‘›3 โŠ— ๐ผ๐‘›2)(cid:107) = (cid:107)๐‘›3(๐นโˆ’1 ๐‘›3 โŠ— ๐ผ๐‘›1) ยท (cid:0)๐น๐‘›3 โŠ— ๐ผ๐‘›1 = (cid:107) bcirc(A) (cid:107). Definition 2 ( ๐‘“ -diagonal tensor ). A tensor is called ๐‘“ -diagonal if each of its frontal slices is a diagonal matrix. Definition 3 (Tensor conjugate transpose). The conjugate transpose of a tensor T โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3 is the ๐‘›2 ร— ๐‘›1 ร— ๐‘›3 tensor T (cid:62) obtained by conjugate transposing each of the frontal slice and then reversing the order of the second to last frontal slices. Definition 4 (Identity tensor). The identity tensor I โˆˆ K๐‘›ร—๐‘›ร—๐‘›3 is the tensor with the only first frontal slices [T ]:,:,1 being the ๐‘› ร— ๐‘› identity matrix and with other frontal slices [T ]:,:,๐‘– are all zeros for ๐‘– = 2, ยท ยท ยท , ๐‘›3. Definition 5 (Orthogonal tensor). If a tensor of size ๐‘› ร— ๐‘› ร— ๐‘›3 is orthogonal if T (cid:62) โˆ— T = I = T โˆ— T (cid:62) = I โˆˆ K๐‘›ร—๐‘›ร—๐‘›3 11 Definition 6 (Partially orthogonal tensor). If a tensor of size ๐‘›1 ร— ๐‘›2 ร— ๐‘›3 is partially orthogonal if T (cid:62) โˆ— T = I โˆˆ K๐‘›2ร—๐‘›2ร—๐‘›3 or T โˆ— T (cid:62) = I โˆˆ K๐‘›1ร—๐‘›1ร—๐‘›3 Definition 7 (Moore-Penrose inverse [71]). T โ€  โˆˆ K๐‘›2ร—๐‘›1ร—๐‘›3 is said to be the Moore-Penrose inverse of T โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, if T โ€  satisfies the following four equations, T โˆ— T โ€  โˆ— T = T , T โ€  โˆ— T โˆ— T โ€  = T โ€ , (cid:16) (cid:17) (cid:62) = T โˆ— T โ€ , T โ€  โˆ— T (cid:16) T โˆ— T โ€ (cid:17) (cid:62) = T โ€  โˆ— T . Algorithm 2.2 Moore-Penrose inverse 1: Input: Z โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3. 2: Z โ†’ (cid:98)Z = ๏ฌ€t(Z, [], 3). 3: for each ๐‘– โˆˆ ๐‘›3 do [ (cid:98)Z]โ€  4: 5: end for 6: Output: Zโ€  = i๏ฌ€t( (cid:98)Zโ€ , [], 3) :,:,๐‘– = Moore-Penrose-inverse([ (cid:98)Z]:,:,๐‘–) Definition 8 (Tensor spectral norm and condition number). The tensor spectral norm (cid:107)T (cid:107)2 of a third-order tensor T is defined as (cid:107)T (cid:107)2 = (cid:107)bcirc(T )(cid:107)2. The condition number of T is defined as: ๐œ…(T ) = (cid:107)T โ€ (cid:107)2 ยท (cid:107)T (cid:107)2. Definition 9 (Standard tensor lateral basis). The lateral basis หš๐”ข๐‘–, is of size ๐‘›1 ร— 1 ร— ๐‘›3 with only [หš๐”ข๐‘–]๐‘–,1,1 equal to 1 and the remaining equal to zero. Definition 10 (Standard tensor tubal basis [109, 138]). A standard tubal basis (cid:164)๐”ข๐‘˜ , is a 1 ร— 1 ร— ๐‘›3 third mode tensor where all elements are zero except for a single nonzero element with a value of 1 at the (1, 1, ๐‘˜) entry. Definition 11 ( Identity tensor). 
Definition 11 (Identity tensor). The identity tensor I ∈ K^{n×n×n3} is the tensor whose first frontal slice is the n × n identity matrix and whose other frontal slices are all zeros.

Our research will focus on the subtensors of an underlying tensor with low tubal-rank. To ensure that this work is self-contained, we begin by introducing the concept of the sampling tensor as follows.

[Figure 2.2: A Standard Tensor Lateral Basis.]
[Figure 2.3: A Standard Tubal Basis.]

Definition 12 (Sampling tensor). Given a tensor T ∈ K^{n1×n2×n3} and I ⊆ [n1], the horizontal subtensor R of T with indices I can be obtained via R := [T]_{I,:,:} = [I]_{I,:,:} ∗ T, where I is defined in Definition 11. For convenience, [I]_{I,:,:} will be denoted by S_I for the given index set I. Similarly, the lateral subtensor C with indices J ⊆ [n2] can be obtained as C := [T]_{:,J,:} = T ∗ [I]_{:,J,:}. The subtensor U of T with horizontal indices I and lateral indices J can be represented as U := [T]_{I,J,:} = S_I ∗ T ∗ [I]_{:,J,:}.

2.1.2 Tensor decomposition

Tensor decompositions provide a concise representation of the underlying structure of data, revealing the low-dimensional subspace within which the data resides.

Theorem 2.1 (t-SVD). Let T ∈ K^{n1×n2×n3}. Then it can be factored as T = W ∗ Σ ∗ V^⊤, where W ∈ K^{n1×n1×n3} and V ∈ K^{n2×n2×n3} are orthogonal and Σ ∈ K^{n1×n2×n3} is an f-diagonal tensor.

Numerically, we implement the t-SVD based on Algorithm 2.3.

Algorithm 2.3 t-SVD
1: Input: Z ∈ K^{n1×n2×n3}.
2: Ẑ := fft(Z, [], 3).
3: for each i ∈ {1, 2, ..., n3} do
4:    [[Û]_{:,:,i}, [Ŝ]_{:,:,i}, [V̂]_{:,:,i}] = SVD([Ẑ]_{:,:,i})
5: end for
6: Output: U = ifft(Û, [], 3); S = ifft(Ŝ, [], 3); V = ifft(V̂, [], 3)

From Algorithm 2.3, we can see that the t-SVD is implemented by performing the matrix SVD slice by slice in a loop of length n3. Thus, the computational complexity of the t-SVD of an n1 × n2 × n3 tensor is O(min{n1² n2 n3, n1 n2² n3}).

Definition 13 (Tubal-rank and multi-rank). Suppose the tensor T ∈ K^{n1×n2×n3} satisfies rank([T̂]_{:,:,k}) = r_k for k ∈ [n3]. Then r⃗ = (r_1, r_2, ..., r_{n3}) is called the multi-rank of T, denoted by rank_m(T). In addition, max{r_k : k ∈ [n3]} is called the tubal-rank of T, denoted by rank(T). We denote the tubal-rank as r or ‖r⃗‖_∞, and write ‖r⃗‖_1 for the sum of the multi-rank.

Theorem 2.2 (Compact t-SVD). Let T ∈ K^{n1×n2×n3} with tubal-rank r. Then it can be factored as T = W ∗ Σ ∗ V^⊤, where W ∈ K^{n1×r×n3} and V ∈ K^{n2×r×n3} are partially orthogonal and Σ ∈ K^{r×r×n3} is an f-diagonal tensor.

Numerically, we implement the compact t-SVD based on Algorithm 2.4.

Algorithm 2.4 Compact t-SVD
1: Input: Z ∈ K^{n1×n2×n3}, target tubal-rank r.
2: Ẑ := fft(Z, [], 3).
3: Initialize Ŵ = zeros(n1, r, n3), Ŝ = zeros(r, r, n3), and V̂ = zeros(n2, r, n3).
4: for each i ∈ {1, 2, ..., n3} do
5:    [W, S, V] = SVD([Ẑ]_{:,:,i})
6:    [Ŵ]_{:,:,i} = [W]_{:,1:r}; [Ŝ]_{:,:,i} = [S]_{1:r,1:r}; [V̂]_{:,:,i} = [V]_{:,1:r}
7: end for
8: Output: W = ifft(Ŵ, [], 3); S = ifft(Ŝ, [], 3); V = ifft(V̂, [], 3)
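As a concrete companion to Algorithm 2.4 (and to the tubal-rank-r truncation operator H_r used later in this chapter), the following MATLAB sketch computes the compact t-SVD slice by slice in the Fourier domain. The function name tsvd_compact is an illustrative choice and assumes r ≤ min(n1, n2).

```matlab
function [W, S, V] = tsvd_compact(T, r)
% Compact t-SVD with target tubal-rank r, following Algorithm 2.4.
    [n1, n2, n3] = size(T);
    That = fft(T, [], 3);
    What = complex(zeros(n1, r, n3));
    Shat = complex(zeros(r,  r, n3));
    Vhat = complex(zeros(n2, r, n3));
    for i = 1:n3
        [U, Sig, V0] = svd(That(:,:,i), 'econ');   % matrix SVD of each frontal slice
        What(:,:,i) = U(:, 1:r);
        Shat(:,:,i) = Sig(1:r, 1:r);
        Vhat(:,:,i) = V0(:, 1:r);
    end
    W = ifft(What, [], 3);
    S = ifft(Shat, [], 3);
    V = ifft(Vhat, [], 3);
end
```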
Lemma 2.1 (Best tubal-rank-r approximation [71, 69]). Let the t-SVD of T ∈ R^{m×n×k} be T = U ∗ S ∗ V†. For a given positive integer r, define T_r = Σ_{s=1}^{r} U(:, s, :) ∗ S(s, s, :) ∗ V†(:, s, :). Then
$$\mathcal{T}_r = \operatorname*{argmin}_{\widetilde{\mathcal{T}} \in \mathbb{T}} \|\mathcal{T} - \widetilde{\mathcal{T}}\|_{\mathrm{F}}, \quad \text{where } \mathbb{T} = \{\mathcal{X} * \mathcal{Y}^{\dagger} \mid \mathcal{X} \in \mathbb{K}^{m\times r\times k}, \ \mathcal{Y} \in \mathbb{K}^{n\times r\times k}\}.$$
Note that S in the t-SVD is organized in decreasing order, i.e., ‖S(1,1,:)‖₂ ≥ ‖S(2,2,:)‖₂ ≥ ..., which is implicitly assumed in [69]. Therefore, the best tubal-rank-r approximation of tensors is analogous to PCA (principal component analysis) of matrices.

Having introduced the compact t-SVD, we now introduce two important definitions based on this type of decomposition.

Definition 14 (Tensor μ0-incoherence condition). Given a tubal-rank-r tensor T ∈ K^{n1×n2×n3} with a compact t-SVD T = W ∗ S ∗ V^⊤, we say T satisfies the μ0-incoherence condition if, for all k ∈ {1, ..., n3}, the following hold:
$$\max_{i=1,\dots,n_1} \big\| [\widehat{\mathcal{W}}]_{:,:,k}^{\top} \cdot \mathbf{e}_i \big\|_{\mathrm{F}} \le \sqrt{\frac{\mu_0 r}{n_1}}, \qquad \max_{j=1,\dots,n_2} \big\| [\widehat{\mathcal{V}}]_{:,:,k}^{\top} \cdot \mathbf{e}_j \big\|_{\mathrm{F}} \le \sqrt{\frac{\mu_0 r}{n_2}}.$$
In certain instances, to accentuate the incoherence parameter of a specific tensor T, we represent this parameter as μ_T.

In tensor decomposition, the t-CUR decomposition, a self-expressive decomposition of a given 3-mode tensor, has received significant attention [4, 28, 56, 109]. Specifically, t-CUR involves representing a tensor T ∈ K^{n1×n2×n3} as T ≈ C ∗ U ∗ R, with C = [T]_{:,J,:} and R = [T]_{I,:,:} for some J ⊆ [n2] and I ⊆ [n1]. There exist different versions of U. This work focuses on the t-CUR decomposition of the form T ≈ C ∗ U† ∗ R with U = [T]_{I,J,:}. Under certain conditions, this approximation represents T exactly; [28, 56] have detailed the conditions for exact t-CUR decomposition. For convenience, we present one theoretical result on t-CUR below.

Theorem 2.3 ([28, 56]). Let T ∈ K^{n1×n2×n3} with multi-rank rank_m(T) = r⃗, and let I ⊂ [n1] and J ⊂ [n2] be two index sets. Denote C = [T]_{:,J,:}, R = [T]_{I,:,:}, and U = [T]_{I,J,:}. Then T = C ∗ U† ∗ R if and only if rank_m(C) = rank_m(R) = r⃗.

Theorem 2.3 can be visualized as follows.

[Figure 2.4: t-CUR decomposition.]
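Theorem 2.3 can also be checked numerically in a few lines. The MATLAB sketch below assumes FFT-based helpers tprod (as above) and tpinv (a tensor Moore–Penrose inverse following Algorithm 2.2); these helper names are our own illustrative choices.

```matlab
% t-CUR reconstruction from index sets I (horizontal) and J (lateral); cf. Theorem 2.3.
C = T(:, J, :);                            % lateral subtensor    [T]_{:,J,:}
R = T(I, :, :);                            % horizontal subtensor [T]_{I,:,:}
U = T(I, J, :);                            % core subtensor       [T]_{I,J,:}
T_cur   = tprod(tprod(C, tpinv(U)), R);    % C * U^dagger * R
rel_err = norm(T(:) - T_cur(:)) / norm(T(:));   % ~0 (up to roundoff) when rank_m(C) = rank_m(R) = rank_m(T)
```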
2.1.3 Related work

Kilmer and Martin [71] introduced novel definitions of tensor multi-rank and tubal-rank characterized by the t-SVD. Researchers commonly employ a convex surrogate of the tubal-rank function augmented with regularization by the tensor nuclear norm (TNN), as indicated in [64, 66, 82, 84, 138, 143]. While a pioneering optimization method featuring TNN was initially proposed to tackle the TC problem in [139], this approach necessitates the simultaneous minimization of all singular values across tensor slices, which hinders its ability to accurately approximate the tubal-rank function [61, 130]. To circumvent this challenge, various truncated methods have been introduced as alternatives; notable examples include truncated nuclear norm regularization [61] and the tensor truncated nuclear norm (T-TNN) [130]. Furthermore, Zhang et al. [135] introduced a novel strategy for low-rank regularization, focusing on nonlocal similar patches. However, the aforementioned tensor completion algorithms are designed based on the Bernoulli sampling model. Despite its foundational role in probability theory and statistics, Bernoulli sampling frequently encounters practical limitations when applied to real-world data collection scenarios [43, 94]. In the realm of collaborative filtering, where the tensor's horizontal and lateral slices denote users and rated objects (such as movies and merchandise) over a specific time period, the application of the Bernoulli sampling model is impractical, as it implicitly assumes that every user has an equal probability of rating any given object, an assumption that is seldom valid in real-world scenarios. The variability in user preferences and interaction patterns makes this equal-probability assumption unrealistic, thereby challenging the efficacy of the Bernoulli sampling approach in such contexts.

2.2 Proposed sampling model

We aim to develop a sampling strategy that is both efficient and effective for a range of real-world scenarios. Inspired by the cross-concentrated sampling model for matrix completion [19] and the t-CUR decomposition [28, 56, 109], we introduce a novel sampling model tailored for tensor data, named Tensor Cross-Concentrated Sampling (t-CCS). The t-CCS model extracts samples from both horizontal and lateral subtensors of the original tensor. Formally, let R = [T]_{I,:,:} and C = [T]_{:,J,:} be the selected horizontal and lateral subtensors of T, determined by index sets I and J, respectively. Next, we sample entries on R and C based on the Bernoulli sampling model. The t-CCS procedure is detailed in Algorithm 2.5. Notably, t-CCS transitions to t-CUR sampling when the samples are dense enough to fully capture the subtensors, and it reverts to Bernoulli sampling when all horizontal and lateral slices are selected. The indices of the cross-concentrated samples are denoted as Ω_R and Ω_C, corresponding to the notation used for the subtensors. Our task is to recover an underlying tensor T with tubal-rank r from the observations on Ω_R ∪ Ω_C:
$$\min_{\widetilde{\mathcal{T}}} \ \big\langle \mathcal{P}_{\Omega_{\mathcal{R}} \cup \Omega_{\mathcal{C}}}(\mathcal{T} - \widetilde{\mathcal{T}}),\ \mathcal{T} - \widetilde{\mathcal{T}} \big\rangle, \quad \text{subject to } \operatorname{tubal\text{-}rank}(\widetilde{\mathcal{T}}) = r, \tag{2.6}$$
where ⟨·,·⟩ is the Frobenius inner product and P_{Ω_R ∪ Ω_C} is defined in (2.2).

Algorithm 2.5 Tensor Cross-Concentrated Sampling (t-CCS)
1: Input: T ∈ K^{n1×n2×n3}.
2: Uniformly select the horizontal and lateral indices, denoted as I and J, respectively.
3: Set R := [T]_{I,:,:} and C := [T]_{:,J,:}.
4: Sample entries from R and C based on Bernoulli sampling models. Record the locations of these samples as Ω_R and Ω_C for R and C, respectively.
5: Output: [T]_{Ω_R ∪ Ω_C}, Ω_R, Ω_C, I, J.
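A minimal MATLAB sketch of the t-CCS sampling step in Algorithm 2.5 follows; the function name and inputs (numbers of selected slices nI, nJ and Bernoulli rates pR, pC) are illustrative choices, not fixed by the thesis.

```matlab
function [Tobs, OmegaR, OmegaC, I, J] = tccs_sample(T, nI, nJ, pR, pC)
% Tensor Cross-Concentrated Sampling (cf. Algorithm 2.5): select nI horizontal and nJ
% lateral slices uniformly at random, then observe entries on them via Bernoulli sampling.
    [n1, n2, n3] = size(T);
    I = randperm(n1, nI);                    % uniformly selected horizontal slice indices
    J = randperm(n2, nJ);                    % uniformly selected lateral slice indices
    OmegaR = false(n1, n2, n3);              % observed locations on R = [T]_{I,:,:}
    OmegaC = false(n1, n2, n3);              % observed locations on C = [T]_{:,J,:}
    OmegaR(I, :, :) = rand(nI, n2, n3) < pR;
    OmegaC(:, J, :) = rand(n1, nJ, n3) < pC;
    Tobs = T .* (OmegaR | OmegaC);           % the observations [T]_{Omega_R ∪ Omega_C}
end
```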
This chapter aims to provide a theoretical well-posedness guarantee for the t-CCS model; this is our key theoretical contribution and is detailed in Theorem 2.6.

2.3 Theoretical Results

This section is dedicated to providing a theoretical analysis of the well-posedness of the t-CCS model, which represents our principal theoretical contribution. This analysis is thoroughly detailed in Theorem 2.6. Before presenting our main theoretical result, Theorem 2.6, we first introduce two important supporting theorems on which its proof relies. The first theorem, Theorem 2.4, establishes lower bounds on the number of lateral and horizontal slices required, when these slices are sampled uniformly, to ensure an exact t-CUR decomposition. Theorem 2.4 can be seen as an adaptation of [109, Corollary 3.10], featuring a different proof method specifically designed for uniform sampling and exact t-CUR, and it provides a more thorough analysis for this specific context.

Before presenting Theorem 2.4, let us briefly review the sampling schemes for matrix CUR decomposition. Various sampling schemes are designed to ensure that the chosen rows and columns validate the CUR decomposition. For example, deterministic methods are explored in works such as [6, 8, 79]. Randomized sampling algorithms for CUR decompositions and the column subset selection problem have been extensively studied, as seen in [32, 38, 40, 86, 114, 122]. For a comprehensive overview of both approaches, refer to [57]. Hybrid methods that combine both approaches are discussed in [9, 10, 17]. In particular, for a rank-r matrix in K^{n1×n2} with μ-incoherence, sampling O(μr log(n1)) rows and O(μr log(n2)) columns is sufficient to ensure the exact matrix CUR decomposition [12, 32]. In this work, we extend the uniform sampling results from the matrix setting to the tensor setting.

Theorem 2.4. Let T ∈ K^{n1×n2×n3} satisfy the tensor μ0-incoherence condition and have multi-rank r⃗. The indices I and J are selected uniformly at random without replacement from [n1] and [n2], respectively. Set C = [T]_{:,J,:}, R = [T]_{I,:,:}, and U = [T]_{I,J,:}. Then T = C ∗ U† ∗ R holds with probability at least $1 - n_1^{-\beta} - n_2^{-\beta}$, provided that
$$|I| \ge 2\beta \mu_0 \|\vec{r}\|_{\infty} \log\big(n_1 \|\vec{r}\|_1\big) \quad \text{and} \quad |J| \ge 2\beta \mu_0 \|\vec{r}\|_{\infty} \log\big(n_2 \|\vec{r}\|_1\big).$$

Another important supporting theorem is Theorem 2.5, which adapts [138, Theorem 3.1] to the recovery of a tensor with tubal-rank r under Bernoulli sampling and is essential for Theorem 2.6. Our contribution refines the theorem by explicitly detailing the numerical constants in the original sampling probability. The proof of Theorem 2.5 follows the same framework as in [84, 138].

Theorem 2.5. Let T ∈ K^{n1×n2×n3} of tubal-rank r satisfy the tensor μ0-incoherence condition, and let its compact t-SVD be T = U ∗ S ∗ V^⊤, where U ∈ K^{n1×r×n3}, S ∈ K^{r×r×n3}, and V ∈ K^{n2×r×n3}. Suppose the entries in Ω are sampled according to the Bernoulli model with probability p. If
$$p \ge \frac{256\,\beta\,(n_1+n_2)\,\mu_0\, r\, \log^2(n_1 n_3 + n_2 n_3)}{n_1 n_2} \quad \text{with } \beta \ge 1, \tag{2.7}$$
then T is the unique minimizer to
$$\min_{\widetilde{\mathcal{T}}} \ \|\widetilde{\mathcal{T}}\|_{\mathrm{TNN}}, \quad \text{subject to } \mathcal{P}_{\Omega}(\widetilde{\mathcal{T}}) = \mathcal{P}_{\Omega}(\mathcal{T}),$$
with probability at least $1 - \frac{3\log(n_1 n_3 + n_2 n_3)}{(n_1 n_3 + n_2 n_3)^{4\beta - 2}}$.

Theorem 2.6. Let T ∈ K^{n1×n2×n3} satisfy the tensor μ0-incoherence condition and have multi-rank r⃗ with condition number κ. Let I ⊆ [n1] and J ⊆ [n2] be chosen uniformly with replacement to yield R = [T]_{I,:,:} and C = [T]_{:,J,:}, and suppose that Ω_R and Ω_C are generated from R and C according to Bernoulli distributions with probabilities p_R and p_C, respectively.
If |๐ผ | โ‰ฅ 3200๐›ฝ๐œ‡0๐‘Ÿ ๐œ…2 log2(๐‘›1๐‘›3 + ๐‘›2๐‘›3), |๐ฝ | โ‰ฅ 3200๐›ฝ๐œ‡0๐‘Ÿ ๐œ…2 log2(๐‘›1๐‘›3 + ๐‘›2๐‘›3), ๐‘R โ‰ฅ ๐‘C โ‰ฅ 1600(|๐ผ | + ๐‘›2)๐œ‡0๐‘Ÿ ๐œ…2 log2((๐‘›1 + ๐‘›2)๐‘›3) |๐ผ |๐‘›2 1600(|๐ฝ | + ๐‘›1)๐œ‡0๐‘Ÿ ๐œ…2 log2((๐‘›1 + ๐‘›2)๐‘›3) |๐ฝ |๐‘›1 , 19 for some absolute constant ๐›ฝ > 1, then T can be uniquely determined from the entries on ฮฉR โˆช ฮฉC with probability at least 1 โˆ’ โˆ’ 1 (๐‘›1๐‘›3 + ๐‘›2๐‘›3)800๐›ฝ๐œ…2 log(๐‘›2) 3 log(๐‘›1๐‘›3 + |๐ฝ |๐‘›3) (๐‘›1๐‘›3 + |๐ฝ |๐‘›3)4๐›ฝโˆ’2 โˆ’ 1 (๐‘›1๐‘›3 + ๐‘›2๐‘›3)800๐›ฝ๐œ…2 log(๐‘›1) . 3 log(๐‘›2๐‘›3 + |๐ผ |๐‘›3) (๐‘›2๐‘›3 + |๐ผ |๐‘›3)4๐›ฝโˆ’2 โˆ’ Remark 1. (i) When ๐‘›1 = ๐‘›2 = ๐‘›, the results in the above theorem can be simplified to that T can be uniquely determined from the entries on ฮฉR โˆชฮฉC with probability at least 1โˆ’ 6 log(2๐‘›๐‘›3) (๐‘›๐‘›3)4๐›ฝโˆ’2 . (ii) Supposed T with multi-rank (cid:174)๐‘Ÿ of low tubal-rank ๐‘Ÿ is the underlying tensor we aim to re- cover. Notice that such T is one of feasible solutions to the optimization problem (2.1) since tubal-rank(T ) = ๐‘Ÿ. Additionally, it is evident that for any (cid:101)T with tubal-rank ๐‘Ÿ, (cid:104)Pฮฉ( (cid:101)T โˆ’ T ), (cid:101)T โˆ’ T )(cid:105) โ‰ฅ 0 and (cid:104)Pฮฉ(T โˆ’ T ), T โˆ’ T (cid:105) = 0. Thus, T is a global minimizer to the optimization problem (2.1). According to Theorem 2.6, T with low tubal-rank ๐‘Ÿ can be reliably recovered using the t-CCS model with high probabil- ity. Consequently, we can obtain a minimizer for the non-convex optimization problem (2.1) through samples that are partially observed from the t-CCS model. Theorem 2.6 elucidates that a sampling complexity of O (๐‘Ÿ ๐œ…2 max{๐‘›1, ๐‘›2}๐‘›3 log2(๐‘›1๐‘›3 + ๐‘›2๐‘›3)) is sufficient for TC on t-CCS model. This complexity is a ๐œ…2 factor worse than that of the benchmark provided by the state-of-the-art Bernoulli-sampling-based TC methods, such as the TNN method detailed by Zhang and Aeron [138], which demands O (๐‘Ÿ max{๐‘›1, ๐‘›2}๐‘›3 log2(๐‘›1๐‘›3 + ๐‘›2๐‘›3)) samples. This observation suggests the potential for identifying a more optimal lower bound, which will leave as a future direction. 20 2.4 An efficient solver for t-CCS In this section, we investigate how to effectively and efficiently solve the t-CCS-based TC prob- lem. First, we consider directly applying several existing TC algorithms including BCPF [140], TMac [129], TNN [138], and F-TNN [66] to a t-CCS-based image recovery problem, where BCPF is CP-based algorithm, TMac is Tucker-based algorithm, TNN and F-TNN are two tubal-rank-based algorithms. However, it turns out that these methods are not well-suited for the tensor completion problem based on t-CCS model. As illustrated in Figure 2.5, these approaches fail to yield reliable visualization outcomes. This indicates the necessity to develop new algorithm(s) for the proposed t-CCS model. Ground truth Observed BCPF TMac TNN F-TNN Figure 2.5 Visual results of color image inpainting using t-CCS samples at an overall sampling rate of 20% with BCPF, TMac, TNN, and F-TNN algorithms. 2.4.1 Iterative tensor CUR completion algorithm To efficiently use the t-CCS structure, we develop the Iterative Tensor CUR Completion (ITCURC), a non-convex algorithm inspired by projected gradient descent. ITCURC updates R, C, and U at each iteration to preserve the tubal-rank ๐‘Ÿ of T . 
The update formulas are:
$$[\mathcal{R}_{k+1}]_{:,J^c,:} := [\mathcal{T}_k]_{I,J^c,:} + \eta_R \big[\mathcal{P}_{\Omega_{\mathcal{R}}}(\mathcal{T} - \mathcal{T}_k)\big]_{I,J^c,:}, \tag{2.8}$$
$$[\mathcal{C}_{k+1}]_{I^c,:,:} := [\mathcal{T}_k]_{I^c,J,:} + \eta_C \big[\mathcal{P}_{\Omega_{\mathcal{C}}}(\mathcal{T} - \mathcal{T}_k)\big]_{I^c,J,:}, \tag{2.9}$$
$$\mathcal{U}_{k+1} := \mathcal{H}_r\Big( [\mathcal{T}_k]_{I,J,:} + \eta_U \big[\mathcal{P}_{\Omega_{\mathcal{R}} \cup \Omega_{\mathcal{C}}}(\mathcal{T} - \mathcal{T}_k)\big]_{I,J,:} \Big), \tag{2.10}$$
where η_R, η_C, η_U are step sizes and H_r is the truncated t-SVD operator. [R_{k+1}]_{:,J,:} and [C_{k+1}]_{I,:,:} are then set to U_{k+1}. The algorithm, starting from T_0 = 0, iterates until e_k ≤ ε, where ε is a preset tolerance and
$$e_k = \frac{\big\langle \mathcal{P}_{\Omega_{\mathcal{R}} \cup \Omega_{\mathcal{C}}}(\mathcal{T} - \mathcal{T}_k),\ \mathcal{T} - \mathcal{T}_k \big\rangle}{\big\langle \mathcal{P}_{\Omega_{\mathcal{R}} \cup \Omega_{\mathcal{C}}}(\mathcal{T}),\ \mathcal{T} \big\rangle}. \tag{2.11}$$
(2.13) 22 Similar analysis for updating [T๐‘˜+1] ๐ผ,๐ฝ(cid:251),:, the computational complexity of updating [T๐‘˜+1] ๐ผ,๐ฝ(cid:251),: is O (๐‘›2|๐ผ |๐‘Ÿ๐‘›3). And we update [T๐‘˜+1] ๐ผ,๐ฝ,: by setting [T๐‘˜+1] ๐ผ,๐ฝ,: := U๐‘˜ . (2.14) Thus, computational complexity of updating T๐‘˜+1 is O (|๐ผ |๐‘Ÿ๐‘›2๐‘›3 + |๐ฝ |๐‘Ÿ๐‘›1๐‘›3). Computation of the stopping criterion ๐‘’๐‘˜ cost O (|ฮฉ๐‘… | + |ฮฉ๐ถ | โˆ’ |ฮฉ๐‘ˆ |) flops as we only make computations on the observed locations. The computational costs per iteration are summarized in Table 2.1, showing a complexity of O (๐‘Ÿ |๐ผ |๐‘›2๐‘›3 + ๐‘Ÿ |๐ฝ |๐‘›1๐‘›3) when |๐ผ | (cid:28) ๐‘›1 and |๐ฝ | (cid:28) ๐‘›2. Table 2.1 A Comprehensive Examination of the Per-Iteration Computational Cost for ITCURC. Step Line 3: Computing the stopping criterion ๐‘’๐‘˜ Line 4: [R๐‘˜+1]:,๐ฝ (cid:251) ,: = [T๐‘˜] ๐ผ,๐ฝ (cid:251) ,: + ๐œ‚๐‘… [PฮฉR ( [T ]ฮฉR โˆชฮฉC โˆ’ T๐‘˜)] ๐ผ,๐ฝ (cid:251) ,: Line 5: [C๐‘˜+1] ๐ผ (cid:251) ,:,: = [T๐‘˜] ๐ผ (cid:251) ,๐ฝ ,: + ๐œ‚๐ถ [๐‘ƒฮฉC ( [T ]ฮฉR โˆชฮฉC โˆ’ T๐‘˜)] ๐ผ (cid:251) ,๐ฝ ,: Line 6: U๐‘˜+1 = H๐‘Ÿ ([T๐‘˜] ๐ผ, ๐ฝ ,: + ๐œ‚๐‘ˆ [PฮฉR โˆชฮฉC ( [T ]ฮฉR โˆชฮฉC โˆ’ T๐‘˜)] ๐ผ,๐ฝ ,:) O (max{|๐ผ ||๐ฝ |๐‘Ÿ๐‘›3, |๐ฝ ||๐ผ |๐‘›3 log(๐‘›3)}) O (|ฮฉ๐‘… | + |ฮฉ๐ถ | โˆ’ |ฮฉ๐‘ˆ |) O (|ฮฉ๐‘… | โˆ’ |ฮฉ๐‘ˆ |) O (|ฮฉ๐ถ | โˆ’ |ฮฉ๐‘ˆ |) Computational Complexity Line 8: Updating T๐‘˜+1 O (๐‘Ÿ |๐ผ |๐‘›2๐‘›3 + ๐‘Ÿ |๐ฝ |๐‘›1๐‘›3) 2.5 Numerical Experiments This section presents the performance of our t-CCS based ITCURC through numerical exper- iments on both synthetic and real-world data. The computations are performed on one of shared nodes of the Computing Cluster with a 64-bit Linux system (GLNXA64), featuring Intel(R) Xeon(R) Gold 6148 CPU (2.40 GHz). All experiments are carried out using MATLAB 2022a. 2.5.1 Synthetic data examples This section evaluates ITCURC for t-CCS tensor completion, exploring the needed sample sizes and the impact of Bernoulli sampling probability and fiber sampling rates on low-tubal-rank tensor recovery. We assess ITCURCโ€™s tensor recovery capability under different combinations of horizontal and lateral slice numbers |๐ผ | = ๐›ฟ๐‘›1, |๐ฝ | = ๐›ฟ๐‘›2 and Bernoulli sampling rates ๐‘ on selected subtensors. The study uses tensors of size 768 ร— 768 ร— 256 with tubal-ranks ๐‘Ÿ โˆˆ {2, 5, 7}. To counteract 23 ๐‘Ÿ = 2 ๐‘Ÿ = 5 ๐‘Ÿ = 7 Figure 2.6 (Row 1) 3D and (Row 2) 2D views illustrate ITCURCโ€™s empirical phase transition for the t-CCS model. ๐›ฟ = |๐ผ |/768 = |๐ฝ |/768 shows sampled indices ratios, ๐‘ is the Bernoulli sampling probability over subtensors, and ๐›ผ is the overall tensor sampling rate. White and black in the 768 ร— 768 ร— 256 tensor results represent success and failure, respectively, across 25 tests for tubal ranks 2, 5, and 7 (Columns 1-3). The ๐›ผ needed for success remains consistent across different combinations ๐›ฟ and ๐‘. randomness, we conduct 25 tests for each (๐›ฟ, ๐‘, ๐‘Ÿ) set, a test is successful if (cid:13) (cid:13) (cid:13) T โˆ’ C๐‘˜ โˆ— Wโ€  ๐‘˜ โˆ— R๐‘˜ (cid:13) (cid:13) (cid:13)F ๐œ€๐‘˜ := (cid:107)T (cid:107)F โ‰ค 10โˆ’3. Our empirical phase transition results are presented in Figure 2.6, with the first row showing a 3D view of the phase transition results and the second row the corresponding 2D view. White and black pixels in these visuals indicate all testsโ€™ success and failure, respectively. 
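For reference, the synthetic test tensors used in these trials and the success criterion ε_k ≤ 10^{-3} can be reproduced along the following lines. This is a hedged MATLAB sketch: the construction of T as a t-product of Gaussian factor tensors mirrors the convergence study described below, and all names are illustrative rather than taken from the original code.

% Illustrative sketch: a random tensor of tubal-rank r via the t-product of two
% Gaussian factor tensors, and the relative-error success test used above.
function T = tprod(A, B)
    % t-product of A (n1 x k x n3) and B (k x n2 x n3): frontal-slice products
    % in the Fourier domain along the third mode.
    Ahat = fft(A, [], 3);
    Bhat = fft(B, [], 3);
    n3   = size(A, 3);
    That = zeros(size(A, 1), size(B, 2), n3);
    for k = 1:n3
        That(:,:,k) = Ahat(:,:,k) * Bhat(:,:,k);
    end
    T = real(ifft(That, [], 3));
end

% Example usage for one phase-transition trial (r = 5 shown):
%   A = randn(768, 5, 256);  B = randn(5, 768, 256);
%   T = tprod(A, B);                 % ground-truth tensor of tubal-rank 5
%   ... run ITCURC on t-CCS samples of T to obtain its reconstruction Tk ...
%   success = norm(T(:) - Tk(:)) / norm(T(:)) <= 1e-3;   % eps_k <= 1e-3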
The results highlight that higher overall sampling rates are needed for successful completion at larger tubal-ranks r. Importantly, tensor completion is achievable at a sufficiently large overall sampling rate regardless of the specific horizontal and lateral slice sizes and subtensor sampling rates (see the 2D views). This demonstrates ITCURC's flexibility in sampling low-tubal-rank tensors for successful reconstruction.

In the following, we include further empirical data demonstrating the convergence behavior of the ITCURC algorithm within the t-CCS model framework. In this experiment, we form a low-tubal-rank tensor T = A ∗ B ∈ R^{n_1×n_2×n_3} from two Gaussian random tensors, where A ∈ R^{n_1×r×n_3} and B ∈ R^{r×n_2×n_3}. Our objective is to examine the convergence behavior of the ITCURC algorithm under different conditions. For the simulations, we set n_1 = n_2 = 768 and n_3 = 256, and generate partial observations using the t-CCS model by adjusting the rank r and configuring the concentrated subtensors as R ∈ R^{δn_1×n_2×n_3} and C ∈ R^{n_1×δn_2×n_3}, with 0 < δ < 1. For each fixed r, we maintain a constant overall sampling rate α. Utilizing the observed data, the ITCURC algorithm is then employed to approximate the original low-tubal-rank tensor. The algorithm runs until the stopping criterion ε_k ≤ 10^{-6} is met, where ε_k is the relative error between the estimate at the k-th iteration and the actual tensor, defined as ε_k = ‖T − T̂_k‖_F / ‖T‖_F. For each specified set of parameters (r, δ, α), we generate 10 different tensor completion scenarios. The mean relative errors ε_k, along with the specific configurations, are reported in Figures 2.7 to 2.10. One can see that ITCURC achieves an almost linear convergence rate.

(a) r = 2, α = 0.15 (b) r = 5, α = 0.25
Figure 2.7 The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.20.

(a) r = 2, α = 0.15 (b) r = 5, α = 0.25
Figure 2.8 The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.25.

(a) r = 2, α = 0.15 (b) r = 5, α = 0.25
Figure 2.9 The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.30.

(a) r = 2, α = 0.15 (b) r = 5, α = 0.25
Figure 2.10 The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.35.

2.5.2 Real-world Applications
This section presents an evaluation and comparison of the t-CCS model and the Bernoulli sampling model through tensor completion tasks across various types of data. Our goal is to assess the practical feasibility and real-world applicability of the t-CCS model, emphasizing its effectiveness in diverse operational environments. Our experiments compare the performance of ITCURC, designed for the t-CCS model, against established TC methods based on the Bernoulli sampling model, namely BCPF [140], TMac [129], TNN [138], and F-TNN [66]. Our evaluation focuses on the quality and execution time of the reconstruction.
Quality is assessed using the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM), where
\[
\mathrm{PSNR} = 10 \log_{10}\!\left( \frac{n_1 n_2 n_3 \, \|T\|_\infty^2}{\|T - \tilde{T}\|_{\mathrm{F}}^2} \right),
\]
and SSIM evaluates the structural similarity between two images, as detailed in [125]. Since the data are third-order tensors, we report the mean SSIM over all frontal slices. Higher PSNR and SSIM scores indicate better reconstruction quality.

Our experimental process is as follows. We first generate random observations via the t-CCS model: we uniformly randomly select concentrated horizontal (R) and lateral (C) subtensors, defined as R = [T]_{I,:,:} and C = [T]_{:,J,:}, with |I| = δn_1 and |J| = δn_2; entries in R and C are then sampled according to the Bernoulli sampling model, with the locations of the observed entries denoted by Ω_R and Ω_C. The t-CCS procedure thus yields a tensor that is only partially observed, with the observations concentrated in R and C. ITCURC is then applied to estimate the missing entries and thus recover the original tensor. For comparison, we also generate observations of the entire original tensor T using the Bernoulli sampling model with probability p_T := |Ω_R ∪ Ω_C| / (n_1 n_2 n_3). Additionally, we estimate the missing data using several tensor completion methods: BCPF (https://github.com/qbzhao/BCPF), which is based on the CP decomposition framework; TMac (https://xu-yangyang.github.io/TMac/), which utilizes the Tucker decomposition framework; and TNN (https://github.com/jamiezeminzhang/) and F-TNN (https://github.com/TaiXiangJiang/Framelet-TNN), which are both grounded in the t-SVD framework. To ensure reliable results, we repeat this entire procedure 30 times, averaging the PSNR and SSIM scores and the runtime to minimize the effects of randomness.

2.5.2.1 Color image completion
Color images, viewed as 3D tensors with dimensions for height, width, and color channels, are effectively modeled as low-tubal-rank tensors [80, 83]. In our tests, we focus on two large-size images: 'Building' (https://pxhere.com/en/photo/57707, of size 2579 × 3887 × 3) and 'Window' (https://pxhere.com/en/photo/1421981, of size 3009 × 4513 × 3). We present averaged test results over various overall observation rates (α) in Table 2.2, and visual comparisons at α = 20% in Figure 2.11.

Figure 2.11 Visualization of color image inpainting for the Building and Window datasets with tubal-rank r = 35 and percentage of selected horizontal and lateral slices δ = 13% at an overall sampling rate of 20% for the ITCURC algorithm, while the other algorithms are applied under the Bernoulli sampling model with the same overall sampling rate of 20%. The t-CCS samples on Building for ITCURC are the same as those in Figure 2.5 (panels: Ground truth, BCPF, TMac, TNN, F-TNN, ITCURC).

Figure 2.11 presents a clear visual comparison of the results from the different methods at a 20% overall sampling rate, where BCPF, TMac, TNN, and F-TNN are applied under the Bernoulli sampling model and ITCURC is applied under the t-CCS model. The ground truth is the original image of a building and of a window. BCPF underperforms the other methods visually. TNN shows slight deviations from the ground truth, maintaining colors and details with minor discrepancies. TMac reveals some notable differences. F-TNN improves reflection fidelity and color saturation, closely resembling the ground truth.
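Before turning to the quantitative scores in Table 2.2, we note how the reported metrics can be computed for a recovered tensor T̃. The sketch below is illustrative MATLAB, not the authors' evaluation code; it assumes the ssim function from the Image Processing Toolbox, and any per-slice SSIM implementation could be substituted.

% Illustrative sketch: PSNR as defined above and the mean SSIM over frontal slices.
function [p, s] = quality_metrics(T, That)
    err = T - That;
    p = 10 * log10( numel(T) * max(abs(T(:)))^2 / norm(err(:))^2 );  % PSNR
    n3 = size(T, 3);
    s = 0;
    for k = 1:n3
        s = s + ssim(That(:,:,k), T(:,:,k));   % SSIM of the k-th frontal slice
    end
    s = s / n3;                                % mean SSIM over all frontal slices
end

With these metrics in place, we return to the visual comparison of Figure 2.11.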
ITCURC also achieves the high similarity to Ground truth, accu- rately reproducing colors and details. Moreover, ITCURC significantly outperforms TMac, TNN, and F-TNN in the t-CCS based color image completion task, evidenced by the unsatisfactory results of BCPF, TMac, TNN, and F-TNN under t-CCS model, as illustrated in Figure 2.5. Table 2.2 Image inpainting results on the Building and Window datasets. The best results are emphasized in bold, while the second-best results are underlined. ITCURC-๐›ฟ refers to the ITCURC method with the percentages of selected horizontal and lateral slices set at a fixed rate of ๐›ฟ%. The t-CCS based algorithm ITCURC-๐›ฟ%s are performed on t-CCS scheme while other Bernoulli based algorithms are performed on Bernoulli Sampling scheme. Dataset Building Window Overall Observation Rate 12% 16% 20% 12% 16% 20% PSNR SSIM Runtime (sec) ITCURC-11 ITCURC-12 ITCURC-13 BCPF TMac TNN F-TNN ITCURC-11 ITCURC-12 ITCURC-13 BCPF TMac TNN F-TNN ITCURC-11 ITCURC-12 ITCURC-13 BCPF TMac TNN F-TNN 28.9249 28.5518 28.1893 26.7939 27.0425 26.3466 28.2529 0.8310 0.8172 0.8016 0.8639 0.8402 0.6458 0.7583 31.0050 30.8055 30.7260 28.2949 30.1755 30.3844 30.1521 0.8880 0.8818 0.8774 0.8761 0.8586 0.8257 0.8354 32.1645 31.9489 31.6825 29.4298 32.3632 31.7512 33.1660 0.9118 0.9033 0.8954 0.8873 0.9111 0.8382 0.8626 35.2830 35.1195 35.0196 30.1611 33.2673 31.8747 35.6747 0.8571 0.8535 0.8504 0.8269 0.8200 0.8333 0.8745 36.1611 36.1145 36.1215 33.9990 36.6370 34.6443 36.9233 0.8738 0.8733 0.8731 0.8554 0.8928 0.8564 0.8899 37.0236 37.0174 36.8885 35.4780 37.5877 36.7893 37.2618 0.8848 0.8850 0.8837 0.8727 0.9035 0.8804 0.9066 10.9354 10.7715 12.2208 213.6800 92.9568 3651.4556 2642.9409 18.1098 17.3187 19.7731 19.3517 22.0287 21.2458 613.3072 360.2903 108.6827 104.8518 3289.5535 3004.6557 2692.6197 2267.5622 23.8990 25.5856 28.8653 345.3425 233.8853 5801.1631 4739.2703 24.1286 26.3275 29.4986 500.3060 242.7499 6572.9697 4134.5206 25.1853 28.0392 30.8361 1629.8061 259.6068 6690.7945 4105.0327 Table 2.2 shows ITCURC typically offers quality that is comparable to that of Bernoulli Sam- pling based TC algorithms. In runtime efficiency, ITCURC leveraging the t-CCS model signifi- cantly surpasses BCPF, TMac, TNN, and F-TNN, all of which are based on the Bernoulli sampling model. This efficiency enhancement highlights the t-CCS modelโ€™s superior performance in prac- tical applications. Additionally, ITCURCโ€™s consistent performance in delivering similar quality results across different ๐›ฟ, provided the overall sampling rates are consistent. These highlight the 29 flexibility and feasibility of the t-CCS model. 2.5.2.2 MRI reconstruction In this study, we test on a MRI heart dataset7 (of size 320 ร— 320 ร— 110), where compact t-SVD with tubal-rank 35 yields less than 10% error, suggesting low-tubal-rank property of dataset. The visualization of reconstruction of MRI data using different methods at a 30% overall sampling rate are presented in Figure 2.12, and reconstruction quality and runtime are detailed in Table 2.3. Ground truth BCPF TMac TNN F-TNN ITCURC Figure 2.12 Visualizations of MRI data recovery using ITCURC with tubal rank ๐‘Ÿ = 35, lateral and horizontal slice selection rate ๐›ฟ = 27%, and an overall sampling rate of 30%. Other algorithms are applied under Bernoulli sampling with the same overall sampling rate. Results for slices 51, 66, 86, and 106 are shown in rows 1 to 4, with a 1.3ร— magnified area at the bottom left of each result for clearer comparison. 
7 http://medicaldecathlon.com/dataaws

Figure 2.12 shows recovery results for four frontal MRI slices using BCPF, TMac, TNN, and F-TNN, all under the Bernoulli sampling model, and ITCURC under the t-CCS model. The ground truth serves as the actual dataset, from which the missing values are to be predicted by the different algorithms. BCPF shows notable artifacts and lacks the sharp edges of the heart's interior structures. TMac improves over BCPF but still presents a softer representation of the cardiac anatomy. TNN enhances the detail prediction, resulting in a more accurate completion that begins to resemble the reference more closely. F-TNN maintains these improvements in detail prediction, and the edges within the cardiac structure suggest a refined completion. ITCURC yields a reconstruction in which the cardiac structures are clearly defined, reflecting the structure present in the ground truth and indicating effectiveness in predicting the missing values. The highlighted regions of interest (ROIs), marked in blue, allow a detailed comparison across the methods. In these ROIs, although ITCURC's reconstructions may not be the most visually appealing, they preserve structural integrity and texture, which are crucial for clinical applications. Table 2.3 demonstrates the flexibility and feasibility of the t-CCS model: the reconstruction quality of the t-CCS-based ITCURC generally matches that of the Bernoulli-sampling-based TC methods. Furthermore, in terms of runtime efficiency, ITCURC, implemented under the t-CCS model, significantly outperforms BCPF, TMac, TNN, and F-TNN, all of which are applied under the Bernoulli sampling scheme. This advantage underscores the effectiveness of the t-CCS model in practical applications.

2.5.2.3 Seismic data reconstruction
Geophysical 3D seismic data is often modeled as a tensor with inline, crossline, and depth dimensions. In our analysis, we focus on a seismic dataset (https://terranubis.com/datainfo/F3-Demo-2020) of size 51 × 191 × 146, where a compact t-SVD with tubal-rank 3 yields less than 5% error, suggesting the low-tubal-rank property of the dataset. The corresponding results are detailed in Figure 2.13 and Table 2.4. Figure 2.13 presents a comparative analysis of the seismic completion algorithms BCPF, TMac, TNN, and F-TNN, applied under the Bernoulli sampling model, in contrast to ITCURC, which is

Table 2.3 The quantitative results for MRI data completion are presented, with the best results in bold and the second-best underlined. ITCURC-δ denotes the ITCURC method with the proportion of selected horizontal and lateral slices set to exactly δ%. The t-CCS-based algorithms ITCURC-δ are performed under the t-CCS scheme, while the other, Bernoulli-based algorithms are performed under the Bernoulli sampling scheme.
Overall Observation Rate 10% 15% 20% 25% 30% PSNR SSIM Runtime (sec) ITCURC-23 ITCURC-25 ITCURC-27 BCPF TMac TNN F-TNN ITCURC-23 ITCURC-25 ITCURC-27 BCPF TMac TNN F-TNN ITCURC-23 ITCURC-25 ITCURC-27 BCPF TMac TNN F-TNN 22.4004 22.2548 22.1617 22.6581 22.8690 23.4779 21.8172 0.6020 0.5990 0.5990 0.6817 0.6804 0.6304 0.6442 5.7908 5.4241 5.8075 53.1651 30.4813 87.7591 91.2048 24.3553 24.0435 23.9311 24.5373 25.4225 25.3480 25.7453 0.6821 0.6769 0.6751 0.7151 0.7323 0.7494 0.7507 7.0230 8.3488 8.8408 88.2777 28.0944 84.0952 86.3112 26.9104 26.9940 27.0871 25.1663 27.7802 27.9423 27.1969 0.7584 0.7571 0.7567 0.7192 0.7873 0.7677 0.8181 29.1861 29.0219 29.0699 25.8111 29.1526 28.4522 29.3630 0.8160 0.8084 0.8086 0.7301 0.8227 0.7984 0.8562 30.3911 31.1752 31.2539 26.2042 31.1648 30.5580 31.3651 0.8451 0.8619 0.8600 0.7367 0.8924 0.8793 0.8871 4.8030 5.5303 6.0685 111.1949 28.6216 56.9761 84.0228 5.3058 5.9375 6.4371 180.6596 28.9400 57.9823 82.2064 5.9484 6.7111 7.4916 279.2789 30.0219 58.2098 81.2119 applied based on the t-CCS model. The ground truth serves as the definitive reference, with its stark textural definition. BCPF falls short of delivering optimal fidelity, with finer details lost in trans- lation. TMac is commendable for preserving the textureโ€™s integrity, providing a cohesive image. TNN improves upon this, sharpening textural nuances and closing in on the ground truthโ€™s visual quality. F-TNN excels visually, capturing essential texture information effectively, a significant ad- vantage when the emphasis is on recognizing general features. ITCURC demonstrates comparable visual results though less effective than other methods in terms of PSNR and SSIM. Table 2.4 shows that the t-CCS based method, ITCURC, achieves the fastest processing speeds while preserving satisfactory levels of PSNR and SSIM. This underscores the suitability of the t- CCS model for applications where rapid processing is essential without significant loss in visual 32 Ground truth BCPF TMac TNN F-TNN ITCURC Figure 2.13 Visualization of seismic data recovery results by setting tubal-rank ๐‘Ÿ = 3 for ITCURC with percentage of selected horizontal and lateral slices ๐›ฟ = 17% with overall sampling rate 28% while other methods are applied based on Bernoulli sampling models with the same overall sam- pling rate 28%. Displayed are slices 15, 25, and 35 from top to bottom, with a 1.2ร— magnified area in each set for clearer comparison. accuracy. Furthermore, the consistent performance of ITCURC across various subtensor sizes and sampling rates further emphasizes flexibility and feasibility of the t-CCS model in diverse opera- tional environments. Discussions on the results of real-world datasets From the above results, it is evident that our method surpasses others in runtime with signifi- cantly lower computational costs. Consider a tensor of dimensions ๐‘›1 ร— ๐‘›2 ร— ๐‘›3. When a framelet transform matrix is constructed using ๐‘› filters and ๐‘™ levels, the computational cost per iteration for framelet-based Tensor Nuclear Norm (F-TNN) is given by O ((๐‘›๐‘™ โˆ’ ๐‘™ + 1)๐‘›1๐‘›2๐‘›3(๐‘›3 + min(๐‘›1, ๐‘›2))). This formulation incorporates the processes involved in generating a framelet transform matrix, as elaborated in seminal works such as [21] and [65]. While enhancing the number of levels and filters in F-TNN can improve the quality of results, it also escalates the computational bur- den, particularly for tensors of substantial size. 
In our experiments, we have set both the framelet level and the number of filters to 1 for the F-TNN implementation. For comparison, the com- 33 Table 2.4 Quantitative results for seismic data completion: TMac, TNN, F-TNN with Bernoulli sampling, and our method with t-CCS. Best results are in bold, and second-best are underlined. ITCURC-๐›ฟ refers to the ITCURC method with the percentages of selected horizontal and lateral slices set at a fixed rate of ๐›ฟ%. The t-CCS based algorithm ITCURC-๐›ฟ%s are performed on t-CCS scheme while other Bernoulli based algorithms are performed on Bernoulli Sampling scheme. Overall Observation Rate 12 % 16 % 20 % 24 % 28 % PSNR SSIM Runtime (sec) ITCURC-15 ITCURC-16 ITCURC-17 BCPF TMac TNN F-TNN ITCURC-15 ITCURC-16 ITCURC-17 BCPF TMac TNN F-TNN ITCURC-15 ITCURC-16 ITCURC-17 BCPF TMac TNN F-TNN 24.8020 24.7386 24.8381 24.0733 24.8859 23.7395 24.0688 0.5732 0.5691 0.5724 0.5304 0.5566 0.5165 0.6607 6.4327 6.3825 7.0379 33.5759 16.6135 34.5718 22.1019 27.4143 26.4092 27.4737 26.1054 27.4768 26.1176 24.2084 24.1905 26.5349 26.9970 27.7428 26.3806 28.6408 27.5890 0.7338 0.6691 0.7349 0.6507 0.7321 0.6491 0.5420 0.5407 0.6962 0.6738 0.7577 0.6442 0.8142 0.7551 6.7701 6.2598 6.7522 6.3579 6.8325 6.6306 32.1258 33.1832 16.8581 14.3412 29.2464 31.3138 22.1420 21.4482 30.6585 29.3053 30.6905 29.3542 30.5312 28.8953 24.3015 24.2454 28.4662 30.7237 29.5430 30.9172 31.2791 29.7987 0.8596 0.8143 0.8610 0.8129 0.8523 0.7939 0.5532 0.5494 0.8504 0.7612 0.8486 0.8080 0.8814 0.8479 6.8212 6.8633 7.0215 7.0789 7.3253 7.1480 31.2663 31.7875 13.1142 13.7124 23.9876 26.1727 18.0547 17.8848 putational cost per iteration for the TNN is O (min(๐‘›1, ๐‘›2)๐‘›1๐‘›2๐‘›3 + ๐‘›1๐‘›2๐‘›3 log(๐‘›3)), and for the TMac, it is O ((๐‘Ÿ1 + ๐‘Ÿ2 + ๐‘Ÿ3)๐‘›1๐‘›2๐‘›3) where (๐‘Ÿ1, ๐‘Ÿ2, ๐‘Ÿ3) denotes the Tucker rank. As for BCPF, it is O (๐‘…3(๐‘›1๐‘›2๐‘›3) + ๐‘…2(๐‘›1๐‘›2 + ๐‘›2๐‘›3 + ๐‘›3๐‘›1)), where ๐‘… is the CP rank. In contrast, the computational expense per iteration of our proposed method is significantly reduced to O (๐‘Ÿ |๐ผ |๐‘›2๐‘›3 + ๐‘Ÿ |๐ฝ |๐‘›1๐‘›3), assuming |๐ผ | (cid:28) ๐‘›1 and |๐ฝ | (cid:28) ๐‘›2, indicating a substantial efficiency improvement over traditional methods. Note that for F-TNN, [66] have formulated the tensor nuclear norm utilizing the ๐‘€-product [70], a generalization of the t-product for 3-order tensor. In [66], they have incorporated a tight wavelet frame (framelet) as the transformation matrix ๐‘€. This meticulous design of the ๐‘€ transformation contributes to the superior reconstruction quality of F-TNN. However, the absence of a rapid imple- 34 mentation for multiplying the tensor with matrix ๐‘€ along the third mode leads to F-TNN requiring significantly more computational time compared to other evaluated methods. It is worth noting that our current approach provides an effective balance between runtime effi- ciency and reconstruction quality, making it well-suited for potential real-world applications. This balanced approach is particularly relevant in practical settings where it is essential to consider both speed and quality in big data applications. 2.6 Proofs of Theoretical Results 2.6.1 Proof of Theorem 2.4 In this section, we provide a detailed proof of Theorem 2.4, which is one of two important supporting theorems to our main result Theorem 2.6. Before proceeding to prove Theorem 2.4, we will first introduce and discuss several supporting lemmas. 
These lemmas are crucial to establish the foundation for the proof of Theorem 2.4. Lemma 2.2. Let T โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, ๐ผ โІ [๐‘›1] and ๐ฝ โІ [๐‘›2]. S๐ผ and S๐ฝ are the horizontal and lateral sampling tensors associated with indices ๐ผ and ๐ฝ respectively (see Definition 12). Then the following results hold S๐ผ โˆ— T = T โˆ— S๐ฝ = ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ [S๐ผ]:,:,1 ยท [ (cid:98)T ]:,:,1 [S๐ผ]:,:,1 ยท [ (cid:98)T ]:,:,2 [ (cid:98)T ]:,:,1 ยท [S๐ฝ]:,:,1 [ (cid:98)T ]:,:,2 ยท [S๐ฝ]:,:,1 [S๐ผ]:,:,1 ยท [ (cid:98)T ]:,:,๐‘›3 . . . . . . , . (2.15) (2.16) [ (cid:98)T ]:,:,๐‘›3 ยท [S๐ฝ]:,:,1 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป Proof. Here, we will only focus on the proof of (2.15). First, it is easy to see that S๐ผ โˆ— T = S๐ผ ยท T . 35 In addition, and [S๐ผ]:,:,1 S๐ผ = ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ [ (cid:98)T ]:,:,1 T = ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ [S๐ผ]:,:,1 . . . [ (cid:98)T ]:,:,2 . . . ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป . [S๐ผ]:,:,1 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป [ (cid:98)T ]:,:,๐‘›3 The result can thus be derived. (cid:131) Theorem 2.7 ([115, 116]). Consider a finite sequence {๐‘‹๐‘˜ } of independent, random, Hermitian matrices with common dimension ๐‘‘. Assume that 0 โ‰ค ๐œ†min (๐‘‹๐‘˜ ) and ๐œ†max (๐‘‹๐‘˜ ) โ‰ค ๐ฟ for each index ๐‘˜. Set ๐‘Œ = (cid:205)๐‘˜ ๐‘‹๐‘˜ . Let ๐œ‡min and ๐œ‡max be the minimum and maximum eigenvalues of E(๐‘Œ ) respectively. Then, P {๐œ†min(๐‘Œ ) โ‰ค (1 โˆ’ ๐œ€)๐œ‡min} โ‰ค ๐‘‘ P {๐œ†max(๐‘Œ ) โ‰ฅ (1 + ๐œ€)๐œ‡max} โ‰ค ๐‘‘ (cid:105) ๐œ‡min/๐ฟ (cid:105) ๐œ‡max/๐ฟ (cid:104) eโˆ’ ๐œ€ (1โˆ’๐œ€)1โˆ’ ๐œ€ (cid:104) e๐œ€ (1+๐œ€)1+๐œ€ for ๐œ€ โˆˆ [0, 1), and for ๐œ€ โ‰ฅ 0. Lemma 2.3. Suppose ๐ด is a block diagonal matrix, i.e. ๐ด = ๐ด1 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๐‘– ๐ด๐‘– = I๐‘Ÿ๐‘– and ๐‘Ÿ๐‘– โ‰ค ๐‘›1 for โˆ€๐‘– โˆˆ [๐‘›3]. Let ๐ผ be a random subset of [๐‘›1]. , where each ๐ด๐‘– ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๐ด๐‘›3 . . . ๐ด2 is a matrix of size ๐‘›1 ร— ๐‘Ÿ๐‘–, ๐ด(cid:62) 36 Then for any ๐›ฟ โˆˆ [0, 1), the ๐‘›3(cid:205) ๐‘–=1 ๐‘Ÿ๐‘–-th singular value of the matrix [S๐ผ]:,:,1 ๐‘ =: ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ (cid:113) (1โˆ’๐›ฟ)|๐ผ | ๐‘›1 [S๐ผ]:,:,1 . . . ๐ด1 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ [S๐ผ]:,:,1 ๐ด2 . . . ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๐ด๐‘›3 will be no less than with probability at least โˆ’(๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ)) | ๐ผ | ๐‘› 1 max ๐‘› ๐‘– โˆˆ [๐‘› 3 ] 1 (cid:107) [ ๐ด]๐‘–,: (cid:107)2 F . 1 โˆ’ (cid:107)(cid:174)๐‘Ÿ (cid:107)1๐‘’ Proof. Firstly, it is easy to check that ๐‘ = (cid:2)I โŠ— (S๐ผ):,:,1 (cid:3) ยท ๐ด = ๐‘›3(cid:213) (cid:213) ๐‘–โˆˆ๐ผ ๐‘—=1 e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:, ๐‘Œ := ๐‘ (cid:62) ยท ๐‘ = = ๐‘›3(cid:213) (cid:213) ๐‘–โˆˆ๐ผ (cid:213) ๐‘—=1 ๐‘›3(cid:213) where e( ๐‘—โˆ’1)๐‘›1+๐‘– is the standard column basis vector of K๐‘›1๐‘›3. 
Consider ๐‘›3(cid:205) ๐‘– ๐‘Ÿ๐‘– ร— ๐‘›3(cid:205) ๐‘– ๐‘Ÿ๐‘– Gram matrix (e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:)(cid:62) ยท e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,: [ ๐ด](cid:62) ( ๐‘—โˆ’1)๐‘›1+๐‘–,: [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,: =: (cid:213) ๐‘‡๐‘–, where ๐‘‡๐‘– = ๐‘–โˆˆ๐ผ [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:. It is easy to see that ๐‘Œ is a random matrix due to random- ness inherited from the random set ๐ผ. It is easy to see that each ๐‘‡๐‘– is a positive semidefinite matrix ( ๐‘—โˆ’1)๐‘›1+๐‘–,: ๐‘›3(cid:205) ๐‘—=1 [ ๐ด](cid:62) ๐‘—=1 ๐‘–โˆˆ๐ผ ๐‘›3(cid:205) ๐‘›3(cid:205) ๐‘Ÿ๐‘– ร— of size without replacement from the set (cid:8)๐‘‹1, ๐‘‹2, ยท ยท ยท ๐‘‹๐‘›1 ๐‘Ÿ๐‘–. Thus, the random matrix ๐‘Œ in fact is a sum of |๐ผ | random matrices sampled (cid:9) of positive semi-definite matrices. Notice that ๐‘– ๐‘– ๐œ†max (๐‘‡๐‘–) = ๐œ†max (cid:169) (cid:173) (cid:171) (e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:)(cid:62) ยท e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:(cid:170) (cid:174) (cid:172) e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:(cid:170) (cid:174) (cid:172) โ‰ค max ๐‘– (cid:107) [ ๐ด]๐‘–,:(cid:107)2 F . 2 (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) ๐‘›3(cid:213) ๐‘—=1 ๐‘›3(cid:213) ๐‘—=1 (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) = ๐œŽmax (cid:169) (cid:173) (cid:171) I and thus E(๐‘Œ ) = |๐ผ | By the orthogonal property of matrix ๐ด, it is easy to see that E (๐‘‡๐‘–) = 1 ๐‘›1 ๐‘›1 (๐‘) and by the Chernoff where E is the expectation operator. Thus, by the fact that ๐œ†min(๐‘Œ ) = ๐œŽ2 , min 37 inequality (see Theorem 2.7), we have ๐œŽmin(๐‘) โ‰ค P (cid:169) (cid:173) (cid:171) (cid:115) (1 โˆ’ ๐›ฟ)|๐ผ | ๐‘›1 (cid:170) (cid:174) (cid:172) โ‰ค (cid:107)(cid:174)๐‘Ÿ (cid:107)1๐‘’ โˆ’(๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ)) | ๐ผ | ๐‘› 1 max ๐‘› ๐‘– โˆˆ [๐‘› 3 ] 1 (cid:107) [ ๐ด]๐‘–,: (cid:107)2 F , โˆ€๐›ฟ โˆˆ [0, 1). (cid:131) In the following, we delve into the proof of Theorem 2.4 to tell about how likely t-CUR decom- position holds. The proof of Theorem 2.4. According to Theorem 2.3, T = C โˆ—Uโ€  โˆ—R is equivalent to rank๐‘š (T ) = rank๐‘š (C) = rank๐‘š (R). Therefore, it suffices to prove that rank๐‘š (T ) = rank๐‘š (C) = rank๐‘š (R) holds with probability at least 1 โˆ’ 1 ๐‘›๐›ฝ 1 1 โˆ’ 1 ๐‘›๐›ฝ 2 2 with the given conditions. Notice that T = = = [ (cid:98)T ]:,:,1 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃฎ ๐‘Š1ฮฃ1๐‘‰ (cid:62) ๏ฃฏ 1 ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃฎ ๐‘Š1 ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๐‘Š2 [ (cid:98)T ]:,:,2 . . . ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป [ (cid:98)T ]:,:,๐‘›3 ๐‘Š2ฮฃ2๐‘‰ (cid:62) 2 . . . ฮฃ1 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ยท ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ . . . ๐‘Š๐‘›3 ๐‘Š๐‘›3ฮฃ๐‘›3 ๐‘‰ (cid:62) ๐‘›3 ฮฃ2 . . . (2.17) ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ยท ๏ฃฎ ๐‘‰ (cid:62) ๏ฃฏ 1 ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ฮฃ๐‘›3 ๐‘‰ (cid:62) 2 . . . ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๐‘‰ (cid:62) ๐‘›3 =: ๐‘Š ยท ฮฃ ยท ๐‘‰ (cid:62), in (2.17) is the compact SVD of [T ]:,:,๐‘– for ๐‘– โˆˆ [๐‘›3]. And R = S๐ผ๐‘Šฮฃ๐‘‰ (cid:62). 
By the where ๐‘Š๐‘–ฮฃ๐‘–๐‘‰ (cid:62) ๐‘– definition of tensor multi-rank, we have ๐‘Š๐‘– โˆˆ K๐‘›1ร—๐‘Ÿ๐‘– , ฮฃ๐‘– โˆˆ K๐‘Ÿ๐‘–ร—๐‘Ÿ๐‘– , ๐‘‰๐‘– โˆˆ K๐‘›2ร—๐‘Ÿ๐‘– , W โˆˆ K๐‘›1๐‘›3ร—(cid:107)(cid:174)๐‘Ÿ (cid:107)1, ฮฃ โˆˆ K(cid:107)(cid:174)๐‘Ÿ (cid:107)1ร—(cid:107)(cid:174)๐‘Ÿ (cid:107)1, and ๐‘‰ โˆˆ K๐‘›2๐‘›3ร—(cid:107)(cid:174)๐‘Ÿ (cid:107)1. Consequently, demonstrating that rank(R) = (cid:107)(cid:174)๐‘Ÿ (cid:107)1 suffices to ensure the condition that rank๐‘š (T ) = 38 rank๐‘š (R). Observe that ฮฃ is a square matrix with full rank and ๐‘‰ has full column rank. By the Sylvester rank inequality, rank(R) = (cid:107)(cid:174)๐‘Ÿ (cid:107)1 can be guaranteed by showing rank(S ๐ผ ยท ๐‘Š) = (cid:107)(cid:174)๐‘Ÿ (cid:107)1. By applying Lemma 2.3, we have that for all ๐›ฟ โˆˆ [0, 1), (cid:17) P (cid:16) ๐œŽ(cid:107)(cid:174)๐‘Ÿ (cid:107)1 ๐›ฝ1 ๐œ‡0 (cid:107)๐‘Ÿ (cid:107)โˆž log(๐‘›1 (cid:107)(cid:174)๐‘Ÿ (cid:107)1) ๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ) (๐‘†๐ผ ยท ๐‘Š) โ‰ค (cid:112)(1 โˆ’ ๐›ฟ)|๐ผ |/๐‘›1 implies P (cid:16) ๐œŽ(cid:107)(cid:174)๐‘Ÿ (cid:107)1 |๐ผ | โ‰ฅ โ‰ค (cid:107)(cid:174)๐‘Ÿ (cid:107)1๐‘’โˆ’(๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ)) | ๐ผ | 0 (cid:107) (cid:174)๐‘Ÿ (cid:107)โˆž . ๐œ‡ (๐‘†๐ผ ยท ๐‘Š) โ‰ค (cid:112)(1 โˆ’ ๐›ฟ)|๐ผ |/๐‘›1 (cid:17) โ‰ค 1 ๐‘›1 ๐›ฝ 1 . Note that P (cid:16) rank(๐‘†๐ผ ยท ๐‘Š) < (cid:107)(cid:174)๐‘Ÿ (cid:107)1 (cid:17) We thus have when |๐ผ | โ‰ฅ P (cid:16) rank(๐‘†๐ผ ยท ๐‘Š) = (cid:107)(cid:174)๐‘Ÿ (cid:107)1 ๐›ฝ1 ๐œ‡0 (cid:107)๐‘Ÿ (cid:107)โˆž log(๐‘›1 (cid:107)(cid:174)๐‘Ÿ (cid:107)1) ๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ) (cid:17) , =1 โˆ’ P (cid:16) (cid:115) (1 โˆ’ ๐›ฟ)|๐ผ | ๐‘›1 . (cid:170) (cid:174) (cid:172) (cid:17) ๐œŽ(cid:107)(cid:174)๐‘Ÿ (cid:107)1 (๐‘†๐ผ ยท ๐‘Š) โ‰ค โ‰ค P (cid:169) (cid:173) (cid:171) rank(๐‘†๐ผ ยท ๐‘Š) < (cid:107)(cid:174)๐‘Ÿ (cid:107)1 (cid:115) โ‰ฅ1 โˆ’ P (cid:169) (cid:173) (cid:171) Similarly, one can show that rank(S๐ฝ ยท V) = (cid:107)(cid:174)๐‘Ÿ (cid:107)1 holds with probability at least 1 โˆ’ 1 ๐‘›2 (๐‘†๐ผ ยท ๐‘Š) โ‰ค ๐œŽ(cid:107)(cid:174)๐‘Ÿ (cid:107)1 โ‰ฅ 1 โˆ’ (cid:170) (cid:174) (cid:172) . (1 โˆ’ ๐›ฟ)|๐ผ | ๐‘›1 1 ๐›ฝ1 ๐‘›1 ๐›ฝ 2 provided that |๐ฝ | โ‰ฅ ๐›ฝ2 ๐œ‡0 (cid:107)๐‘Ÿ (cid:107)โˆž log(๐‘›2 (cid:107)(cid:174)๐‘Ÿ (cid:107)1) ๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ) . Combining all the statements and setting ๐›ฟ = 0.815 and ๐›ฝ1 = ๐›ฝ2 = ๐›ฝ, we conclude that T = C โˆ— Uโ€  โˆ— R holds with probability at least 1 โˆ’ 1 ๐‘›๐›ฝ 1 โˆ’ 1 ๐‘›๐›ฝ 2 and |๐ฝ | โ‰ฅ 2๐›ฝ๐œ‡0(cid:107)(cid:174)๐‘Ÿ (cid:107)โˆž log (๐‘›2(cid:107)(cid:174)๐‘Ÿ (cid:107)1). , provided that |๐ผ | โ‰ฅ 2๐›ฝ๐œ‡0(cid:107)(cid:174)๐‘Ÿ (cid:107)โˆž log (๐‘›1(cid:107)(cid:174)๐‘Ÿ (cid:107)1) (cid:131) 2.6.1.1 Some remarks on the proof of Theorem 2.4 We wish to emphasize that the techniques employed in our proof are not merely straightforward extensions of the probabilistic estimates used in matrix CUR decompositions since one cannot directly apply the union of matrix CUR probabilistic estimates to flattened tensors due to โ€˜โ€˜depen- denceโ€™โ€™ and โ€intertwinedโ€ sampling property of each sub block matrix after one flattens a tensor to a block diagonal matrix in the Fourier domain. We introduce a new tool that offers a probabilistic estimate for achieving an exact t-CUR decomposition, utilizing a novel proof methodology. 
The cornerstone of our approach is to assess the likelihood that multi-rank is preserved when select- ing horizontal or lateral slices uniformly. This approach distinguishes our method from traditional techniques applied in matrix settings. Although our method involves converting a third-order tensor 39 into a block diagonal matrix, it necessitates the introduction of innovative techniques. These are required to overcome several challenges that do not arise in matrix-based proofs. Given a three- order tensor T โˆˆ R๐‘›1ร—๐‘›2ร—๐‘›3 with multi-rank (cid:174)๐‘Ÿ = (๐‘Ÿ1, ๐‘Ÿ2, ยท ยท ยท , ๐‘Ÿ๐‘›3), its flattened version in the Fourier domain is denoted as T = [ (cid:98)T ]:,:,1 0 0 0 ... 0 [ (cid:98)T ]:,:,2 ยท ยท ยท ... 0 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ 0 0 ยท ยท ยท ... 0 ยท ยท ยท ยท ยท ยท ยท ยท ยท ... ยท ยท ยท 0 0 0 ... [ (cid:98)T ]:,:,๐‘›3 , ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป where (cid:98)T = FFT(T , [], 3). For simplicity, we denote ๐‘‡๐‘– as [ (cid:98)T ]:,:,๐‘– for ๐‘– = 1, ยท ยท ยท , ๐‘›3. It is easy to see that sampling horizontal(lateral) slices of tensor T โˆˆ R๐‘›1ร—๐‘›2ร—๐‘›3 with an index set ๐ผ is equiva- lent of sampling row(column) vectors of the matrix (cid:98)T with indexes ๐ผ, ๐‘›1 + ๐ผ, ยท ยท ยท , (๐‘›3 โˆ’ 1)๐‘›1 + ๐ผ. In other words, the process of sampling ๐ผ horizontal(lateral) slices is the same with sampling ๐ผ rows(columns) of ๐‘‡๐‘–, for ๐‘– = 1, ยท ยท ยท , ๐‘›3. Similar arguments for the lateral slice index set ๐ฝ. Consider the sample space ฮฉ = {(๐ผ, ๐ฝ), ๐ผ โŠ‚ {1, ยท ยท ยท , ๐‘›1}, ๐ฝ โŠ‚ {1, ยท ยท ยท , ๐‘›2}}. Define the events F๐‘– as for ๐‘– = 1, . . . , ๐‘›3. Lemma 2.4. {(๐ผ, ๐ฝ) โŠ‚ ฮฉ | [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–}, {(๐ผ, ๐ฝ) : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3} (cid:40) F1 โˆฉ F2 โˆฉ ยท ยท ยท โˆฉ F๐‘›3 . Proof. The set {(๐ผ, ๐ฝ) : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3} can be viewed as a product space, i.e., (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) {(๐ผ, ๐ฝ) ร— (๐ผ, ๐ฝ) ยท ยท ยท ร— (๐ผ, ๐ฝ) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:123)(cid:122) (cid:125) (cid:124) n times : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3}. However, F1 โˆฉ F2 โˆฉ ยท ยท ยท F๐‘›3 = {(๐ผ1, ๐ฝ1) ร— (๐ผ2, ๐ฝ2) ร— ยท ยท ยท (๐ผ๐‘›3 , ๐ฝ๐‘›3) : (๐ผ๐‘–, ๐ฝ๐‘–) โˆˆ F๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3}. 40 Therefore, {(๐ผ, ๐ฝ) : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3} (cid:40) F1 โˆฉ F2 โˆฉ ยท ยท ยท โˆฉ F๐‘›3 . (cid:131) Let G represent the event {(๐ผ, ๐ฝ) โŠ‚ ฮฉ | |๐ผ | โ‰ฅ ๐œ‡0|(cid:174)๐‘Ÿ | log(๐‘›1) log(๐‘Ÿ), |๐ฝ | โ‰ฅ ๐œ‡0|(cid:174)๐‘Ÿ | log(๐‘›2) log(๐‘Ÿ)}. 
Although one might have that the conditional probability inequality, based on the [32, Theorem 2.1],[19, Theorem 2], P(F๐‘– |G) โ‰ฅ 1 โˆ’ 4๐‘Ÿ 2 ๐‘›1๐‘›2 , one can find that based on Lemma 2.4: P({(๐ผ, ๐ฝ) โŠ‚ ฮฉ : rank๐‘š ( [T ] ๐ผ,๐ฝ,:) = (cid:174)๐‘Ÿ}|G) = (cid:174)๐‘Ÿ |G) = P({(๐ผ, ๐ฝ) โŠ‚ ฮฉ : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3}|G) โ‰ค P(F1 โˆฉ F2 ยท ยท ยท โˆฉ F๐‘›3 |G) The probability inequality P({(๐ผ, ๐ฝ) โŠ‚ ฮฉ : rank๐‘š ([T ] ๐ผ,๐ฝ,:) = (cid:174)๐‘Ÿ}|G) โ‰ค P(F1 โˆฉ F2 ยท ยท ยท โˆฉ F๐‘›3 |G) P(F๐‘– |G) โˆ’ (๐‘›3 โˆ’ 1). As a result, we directly stops us from applying P(F1 โˆฉ F2 ยท ยท ยท โˆฉ F๐‘›3 |G) โ‰ฅ can not get a lower bound of P({(๐ผ, ๐ฝ) โŠ‚ ฮฉ : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3}|G) = (cid:174)๐‘Ÿ |G) via a direct ๐‘›3(cid:205) ๐‘–=1 union of matrix CUR probabilistic estimate results. Furthermore, we hope to emphasize that applying the matrix Chernoff inequality to a flatten tensor also presents numerous challenges. Notice that one flatten tensor T into T in the Fourier domain, the corresponding sampling index set of rows with respect to T becomes (cid:208) ๐‘–โˆˆ๐ผ ๐ผ๐‘–, where ๐ผ๐‘– := {๐‘–, ๐‘›1 + ๐‘–, 2๐‘›1 + ๐‘–, ยท ยท ยท , (๐‘›3 โˆ’ 1)๐‘›1 + ๐‘–}, ๐‘– = 1, ยท ยท ยท , ๐‘›1, and the corresponding sampling index of columns with respect to T becomes (cid:208) ๐‘–โˆˆ๐ผ ๐ฝ๐‘–, where ๐ฝ๐‘– := {๐‘–, ๐‘›2 + ๐‘–, 2๐‘›2 + ๐‘–, ยท ยท ยท , (๐‘›3 โˆ’ 1)๐‘›2 + ๐‘–}, ๐‘– = 1, ยท ยท ยท , ๐‘›2. Without loss of generality and for the sake of brevity, in the following, we focus solely on the case of selecting horizontal slices of T , denoted by the sampling index set ๐ผ. It is easy to find that one can not directly apply the matrix Chernoff inequality to the finite set of positive-semi-definite matrices H = (cid:8)๐‘€1, ๐‘€2, ยท ยท ยท ๐‘€๐‘›3 , ๐‘€๐‘›3+1, ๐‘€๐‘›3+2, ยท ยท ยท , ๐‘€2๐‘›3 , ยท ยท ยท , ๐‘€๐‘›1๐‘›3 (cid:9) , 41 where ๐‘€ ๐‘— = [T ](cid:62) set (cid:208) ๐‘–โˆˆ๐ผ ๐ผ๐‘–. Specifically, the intertwined nature of the index set (cid:208) ๐‘–โˆˆ๐ผ ๐‘—,: ยท [T ] ๐‘—,: with ๐‘— = 1, ยท ยท ยท , ๐‘›1๐‘›3 due to the intertwined property of sampling index ๐ผ๐‘– complicates the estimation of the ๐‘›1(cid:205) ๐‘—=1 spectral bound for the sum of random matrices. Specifically, the expression |๐œŽmax( e( ๐‘— โˆ’ 1)๐‘›3 + ๐‘–ยท [T ] ( ๐‘—โˆ’1)๐‘›3+๐‘–,:)|2 does not serve as a lower bound for max ๐‘– |[T ]๐‘–,:|2 2. 2.6.2 Proof of Theorem 2.5 In this section, we provides a detailed proof of Theorem 2.5, another important supporting the- orem to our main theoretical result Theorem 2.6. To the best of our knowledge, there is no existing tensor version of the result found in [97, Theorem 1.1], which furnishes an explicit expression of numerical constants within the theoremโ€™s statement. Existing results related to tensor versions, such as [138, Theorem 3.1] in the context of tensor completion, typically only imply numerical constants implicitly. One can see that [138, Theorem 3.1] does not give an explicit expression of numerical constants of ๐‘0, ๐‘1 and ๐‘2. Theorem 2.8. [138, Theorem 3.1] Suppose M is an ๐‘›1 ร— ๐‘›2 ร— ๐‘›3 tensor and its reduced t-SVD is given by M = U โˆ— S โˆ— V(cid:62) where U โˆˆ R๐‘›1ร—๐‘Ÿร—๐‘›3, S โˆˆ R๐‘Ÿร—๐‘Ÿร—๐‘›3, and V โˆˆ R๐‘›2ร—๐‘Ÿร—๐‘›3. Suppose M satisfies the standard tensor incoherent condition with parameter ๐œ‡0 > 0. 
Then there exists constants ๐‘0, ๐‘1, ๐‘2 > 0 such that if ๐œ‡0๐‘Ÿ log(๐‘›3(๐‘›1 + ๐‘›2)) min{๐‘›1, ๐‘›2} Then M is the unique minimizer to the follow optimization ๐‘ โ‰ฅ ๐‘0 . with probability at least min X (cid:107)X(cid:107)TNN subject to Pฮฉ(X) = Pฮฉ(M), 1 โˆ’ ๐‘1((๐‘›1 + ๐‘›2)๐‘›3)โˆ’๐‘2. Our work constitutes a substantial contribution through the meticulous analysis of these numer- ical constants, yielding explicit formulations for their expressions. The details of these theoretical advancements are comprehensively elaborated in our theoretical section. Before moving forward, 42 let us introduce several notations used throughout the rest of the supplemental material but not covered in earlier sections. Definition 15. Suppose T is an ๐‘›1ร—๐‘›2ร—๐‘›3 tensor and its compact t-SVD is given by T = Uโˆ—Sโˆ—V(cid:62) where U โˆˆ K๐‘›1ร—๐‘Ÿร—๐‘›3, S โˆˆ K๐‘Ÿร—๐‘Ÿร—๐‘›3 and V โˆˆ K๐‘›2ร—๐‘Ÿร—๐‘›3. Define projection space T as (cid:41) :,๐‘˜,:) : X๐‘˜ โˆˆ K๐‘›2ร—1ร—๐‘›3, Y๐‘˜ โˆˆ K๐‘›1ร—1ร—๐‘›3 ๐‘˜ + Y๐‘˜ โˆ— [V](cid:62) ( [U]:,๐‘˜,: โˆ— X(cid:62) (cid:40) ๐‘Ÿ (cid:213) and the orthogonal projection space TโŠฅ is the orthogonal complement T in K๐‘›1ร—๐‘›2ร—๐‘›3. Define PT(X) ๐‘˜=1 and PTโŠฅ (X) as PT(X) = U โˆ— U(cid:62) โˆ— X + X โˆ— V โˆ— V(cid:62) โˆ’ U โˆ— U(cid:62) โˆ— X โˆ— V โˆ— V(cid:62), PTโŠฅ (X) = (cid:0)I๐‘›1 โˆ’ U โˆ— U(cid:62)(cid:1) โˆ— X โˆ— (cid:0)I๐‘›2 โˆ’ V โˆ— V(cid:62)(cid:1) , where I๐‘›1 is the identity tensor of size ๐‘›1 ร— ๐‘›1 ร— ๐‘›3 and I๐‘›2 is the identity tensor of size ๐‘›2 ร— ๐‘›2 ร— ๐‘›3. Definition 16. Define the operator Rฮฉ : K๐‘›1ร—๐‘›2ร—๐‘›3 โ†’ K๐‘›1ร—๐‘›2ร—๐‘›3 as: ๐›ฟ๐‘–, ๐‘—,๐‘˜ [X]๐‘–, ๐‘—,๐‘˜ หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) ๐‘— , Rฮฉ(X) = (cid:213) 1 ๐‘ ๐‘–, ๐‘—,๐‘˜ where [X]๐‘–, ๐‘—,๐‘˜ is the (๐‘–, ๐‘—, ๐‘˜)-th entry of a tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3. Definition 17. Given two tensor A โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3 and B โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, the inner product of these two tensors is defined as: (cid:104)A, B(cid:105) = 1 ๐‘›3 (cid:62) (cid:16) B (cid:17) . ยท A trace Before we introduce tensor operator norm, we need to introduce a transformed version of a tensor operator. Given a tensor operator, F : K๐‘›1ร—๐‘›2ร—๐‘›3 โ†’ K๐‘›1ร—๐‘›2ร—๐‘›3, the associated transformed operator F : B โ†’ B, where B = (cid:110) B : B โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3 (cid:111) , is defined as F (X) = F (X). Definition 18 (Tensor operator norm). Given a operator F : K๐‘›1ร—๐‘›2ร—๐‘›3 โ†’ K๐‘›1ร—๐‘›2ร—๐‘›3, the operator norm (cid:107)F (cid:107) is defined as (cid:107)F (cid:107) = (cid:107)F (cid:107) = max (cid:107)X(cid:107)F=1 (cid:107)F (X)(cid:107)F = max (cid:107)X(cid:107)F=1 (cid:13) (cid:13) (cid:13) F (X) (cid:13) (cid:13) (cid:13)F . Definition 19 (๐‘™โˆž,2 norm [138]). Given a tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, its ๐‘™โˆž,2 norm is defined as (cid:115)(cid:213) (cid:115)(cid:213) (cid:107)X(cid:107)โˆž,2 := max{max ๐‘– [X]2 ๐‘–,๐‘,๐‘˜ , max ๐‘— [T ]2 ๐‘Ž, ๐‘—,๐‘˜ }. ๐‘Ž,๐‘˜ ๐‘,๐‘˜ 43 Definition 20 (Tensor infinity norm). Given a tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, the tensor infinity norm of it is defined as (cid:107)X(cid:107)โˆž := max ๐‘–, ๐‘—,๐‘˜ | [X]๐‘–, ๐‘—,๐‘˜ |. In the following, we will present a formal definition of the tensor completion problem based on the Bernoulli sampling model. Consider a third-order tensor T โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3 with tubal-rank ๐‘Ÿ. 
We denote ฮฉ as the set of indices of the observed entries. Suppose that ฮฉ is generated according to the Bernoulli sampling model with probability ๐‘. We define the sampling operator Pฮฉ such that for a given tensor X in K๐‘›1ร—๐‘›2ร—๐‘›3, Pฮฉ(X) = (cid:213) [X]๐‘–, ๐‘—,๐‘˜ E๐‘–, ๐‘—,๐‘˜ , (๐‘–, ๐‘—,๐‘˜)โˆˆฮฉ where E๐‘–, ๐‘—,๐‘˜ is a tensor in {0, 1}๐‘›1ร—๐‘›2ร—๐‘›3 and all elements are zero except for the one at the position indexed by (๐‘–, ๐‘—, ๐‘˜). The primary goal of the tensor completion problem is to reconstruct the tensor T from the entries on ฮฉ. We utilize the approach proposed in the references [84, 138], which addresses the tensor completion issue through a specific convex optimization problem formulated as follows: min X (cid:107)X(cid:107)TNN subject to Pฮฉ(X) = Pฮฉ(T ). (2.18) Notice that TNN is convex but not strictly convex. Thus, there might be more than one local mini- mizer to the optimization problem (2.18). Therefore, we need to establish conditions to ensure that our optimization problem has a unique minimizer, which is exactly the tensor we seek to recover. The question of under what conditions T is the unique minimizer of the optimization problem (2.18) naturally arises. In response, Proposition 1 gives an affirmative answer. Before proceeding, it is important to highlight that in the following context, for convenience, we will interchangeably make use of (cid:107) ยท (cid:107) to denote the tensor spectral norm, tensor operator norm or the matrix spectral norm, depending on the specific situation. Proposition 1 ([84]). Assume that T โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3 of tubal-rank ๐‘Ÿ satisfies the incoherence condition with parameter ๐œ‡0 and its compact t-SVD is given by T = U โˆ— S โˆ— V(cid:62) where U โˆˆ K๐‘›1ร—๐‘Ÿร—๐‘›3, S โˆˆ K๐‘Ÿร—๐‘Ÿร—๐‘›3 and V โˆˆ K๐‘›2ร—๐‘Ÿร—๐‘›3. Suppose that ฮฉ is generated according to the Bernoulli sampling model 44 with probability ๐‘. Then tensor T is a unique minimizer of the optimization problem (2.18), if the following two conditions hold: Condition 1. (cid:107)PTRฮฉPT โˆ’ PT(cid:107) โ‰ค 1 2 Condition 2. There exists a tensor Y such that Pฮฉ(Y) = Y and (a) (cid:107)PT(Y) โˆ’ U โˆ— V(cid:62)(cid:107)F โ‰ค 1 4 (cid:113) ๐‘ ๐‘›3 (b) (cid:107)PTโŠฅ (Y) (cid:107) โ‰ค 1 2 Based on Proposition 1, our main result is derived through probabilistic estimation of the Con- dition 1 and Condition 2. Throughout this computation, we explicitly determine both the lower bound of the sampling probability ๐‘ and the probability of the exact recovery of T . The architecture of the entire proof is described as follows. 2.6.2.1 Architecture of the proof of Theorem 2.5 The proof of Theorem 2.5 follows the pipeline developed in [84, 138]. We first state a sufficient condition for T to be the unique optimal solution to the optimization problem (2.18) via construct- ing a dual certificate Y obeying two conditions. This result is summarized in Proposition 1. To obtain our main result Theorem 2.5, we just need to show that the conditions in Proposition 1 hold with a high probability. The Theorem 2.5 is built on the basis of Lemma 2.8, Lemma 2.9, Lemma 2.10 and Corollary 2.1. A detailed roadmap of the proof towards Theorem 2.5 is outlined in Figure 2.14. 
45 Lemma 2.8 Lemma 2.9 Lemma 2.10 Corollary 2.1 condition I condition II Proposition 1 Theorem 2.5 Figure 2.14 The structure of the proof of Theorem 2.5: The core of the proof for Theorem 2.5 relies on assessing the probability that certain conditions, specified in Proposition 1, are met. Condition I and Condition II serve as sufficient criteria to ensure the applicability of Proposition 1. Thus, the proof of Theorem 2.5 primarily involves determining the likelihood that condition I and II are satisfied. The probabilistic assessment of condition I utilizes Lemma 2.8 as a fundamental instrument. Similarly, the evaluation of condition II employs Lemmas 2.8 to 2.10, and Corollary 2.1 as essential tools. Before delving into the proof of Theorem 2.5, we will introduce several supporting lemmas to lay the necessary foundation. 2.6.2.2 Supporting lemmas for the proof of Theorem 2.5 Lemma 2.5 (Non-commutative Bernstein inequality [115]). Let ๐‘‹1, ๐‘‹2, ยท ยท ยท , ๐‘‹๐ฟ be independent zero-mean random matrices of dimension ๐‘›1 ร— ๐‘›2. Suppose ๐œŽ2 = max E[ ๐‘‹๐‘˜ ๐‘‹ (cid:62) ๐‘˜ ] , E[ and (cid:107) ๐‘‹๐‘˜ (cid:107) โ‰ค ๐‘€. Then for any ๐œ โ‰ฅ 0, (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:32)(cid:13) ๐ฟ (cid:13) (cid:213) (cid:13) (cid:13) (cid:13) ๐‘‹๐‘˜ P ๐‘˜=1 โ‰ฅ ๐œ โ‰ค (๐‘›1 + ๐‘›2) exp (cid:40)(cid:13) (cid:13) (cid:13) (cid:13) (cid:13) ๐ฟ (cid:213) ๐‘˜=1 (cid:33) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) ๐ฟ (cid:213) ๐‘˜=1 ๐‘‹ (cid:62) ๐‘˜ ๐‘‹๐‘˜ ] (cid:41) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:32) โˆ’๐œ2/2 ๐œŽ2 + ๐‘€๐œ 3 (cid:33) . The following lemma is a variant of Non-commutative Bernstein inequality. Lemma 2.6. Let ๐‘‹1, ๐‘‹2, ยท ยท ยท , ๐‘‹๐ฟ be independent zero-mean random matrices of dimension ๐‘›1 ร— ๐‘›2. Suppose ๐œŽ2 = max (cid:40)(cid:13) (cid:13) (cid:13) (cid:13) (cid:13) E[ ๐ฟ (cid:213) ๐‘˜=1 ๐‘‹๐‘˜ ๐‘‹ (cid:62) ๐‘˜ ] (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) , (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) E[ ๐ฟ (cid:213) ๐‘˜=1 ๐‘‹ (cid:62) ๐‘˜ ๐‘‹๐‘˜ ] (cid:41) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) 46 and (cid:107) ๐‘‹๐‘˜ (cid:107) โ‰ค ๐‘€, where ๐‘€ is a positive number. If we choose ๐œ = (cid:112) 2๐‘๐œŽ2 log (๐‘›1 + ๐‘›2)+๐‘๐‘€ log (๐‘›1 + ๐‘›2), we have P (cid:32)(cid:13) (cid:13) (cid:13) (cid:13) (cid:13) ๐ฟ (cid:213) ๐‘˜=1 ๐‘‹๐‘˜ (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) โ‰ฅ (cid:112) 2๐‘๐œŽ2 log (๐‘›1 + ๐‘›2) + ๐‘๐‘€ log (๐‘›1 + ๐‘›2) (cid:33) โ‰ค (๐‘›1 + ๐‘›2)1โˆ’๐‘, where ๐‘ is any positive number greater than 1. The following fact is very useful and we will make frequent use of the result for the proofs of Theorem 2.5 and Proposition 1. Lemma 2.7. Suppose T is an ๐‘›1 ร— ๐‘›2 ร— ๐‘›3 tensor with its compact t-SVD given by T = U โˆ— ฮฃ โˆ— V(cid:62) and satisfy incoherence condition with parameter ๐œ‡0. Then, (cid:107)PT( หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) ๐‘— )(cid:107)2 F โ‰ค (๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ ๐‘›1๐‘›2 . The following lemma shows how likely that the operator norm PTRฮฉPT โˆ’ PT is smaller than 1 2. Such result will help us calculate how likely the Condition 1 in Proposition 1 holds. Lemma 2.8. Assume that ฮฉ is generated according to the Bernoulli distribution with probability ๐‘, then holds with probability at least 1 โˆ’ 2๐‘›1๐‘›2๐‘›3 exp (cid:16) โˆ’ 3๐‘๐‘›1๐‘›2 28(๐‘›1+๐‘›2)๐œ‡0๐‘Ÿ (cid:17) . 
(cid:107)PTRฮฉPT โˆ’ PT(cid:107) โ‰ค 1 2 The following lemma states that given an arbitrary tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, tensor spectral norm of difference between Rฮฉ(X) and X can be bounded with tensor infinity norm and ๐‘™โˆž,2 norm with a high probability. Lemma 2.9. Given an arbitrary tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3. Assume that ฮฉ is generated according to the Bernoulli distribution with probability ๐‘. Then, for any constant ๐‘2 > 1, we have (cid:107)Rฮฉ(X) โˆ’ X(cid:107) โ‰ค (cid:107)X(cid:107)โˆž,2 2๐‘2 ๐‘ log((๐‘›1 + ๐‘›2)๐‘›3) + ๐‘2 log((๐‘›1 + ๐‘›2)๐‘›3) ๐‘ (cid:107)X(cid:107)โˆž (2.19) (cid:115) holds with probability at least 1 โˆ’ ((๐‘›1 + ๐‘›2)๐‘›3)1โˆ’๐‘2. The following lemma states that given an arbitrary tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, the bound of ๐‘™โˆž,2 distance between PTRฮฉ(X) and PT(X) can be controlled by the ๐‘™โˆž,2 norm of X and the tensor infinity norm of X with a high probability. 47 Lemma 2.10. Assume that ฮฉ is generated according to the Bernoulli distribution with probability ๐‘. For any positive number ๐‘1 โ‰ฅ 2, then we can get (cid:32) P (cid:107)(PTRฮฉ(X) โˆ’ PT(X))(cid:107)โˆž,2 โ‰ค (cid:115) 4๐‘1(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ log ((๐‘›1 + ๐‘›2)๐‘›3) ๐‘๐‘›1๐‘›2 ยท (cid:107)X(cid:107)โˆž,2 + ๐‘1 log((๐‘›1 + ๐‘›2)๐‘›3) ๐‘ (cid:115) (๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ ๐‘›1๐‘›2 โ‰ฅ 1 โˆ’ ((๐‘›1 + ๐‘›2)๐‘›3)2โˆ’๐‘1. (cid:107)X(cid:107)โˆž(cid:170) (cid:174) (cid:172) The following lemma states that, given an arbitrary tensor X โˆˆ R๐‘›1ร—๐‘›2ร—๐‘›3, the tensor infinity norm of PTRฮฉPT(X) โˆ’ PT(X) can be bounded by the tensor infinity norm of PT(X) with a high probability. Lemma 2.11. Assume that ฮฉ is generated according to the Bernoulli distribution with probability ๐‘. For any X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, then holds with probability at least 1 โˆ’ 2๐‘›1๐‘›2๐‘›3 exp (cid:107)(PTRฮฉPT โˆ’ PT) (X)(cid:107)โˆž โ‰ค (cid:16) โˆ’3๐‘๐‘›1๐‘›2 16(๐‘›1+๐‘›2)๐œ‡0๐‘Ÿ . (cid:107)PT(X)(cid:107)โˆž (cid:17) 1 2 When PT(X) = X, we can easily achieve the following corollary. Corollary 2.1. Assume that ฮฉ is generated according to the Bernoulli distribution with probability ๐‘ž. For any X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, if PT(X) = X then holds with probability at least 1 โˆ’ 2๐‘›1๐‘›2๐‘›3 exp (cid:16) โˆ’3๐‘ž๐‘›1๐‘›2 28(๐‘›1+๐‘›2)๐œ‡0๐‘Ÿ (cid:13) (cid:13) (cid:0)PTRฮฉ๐‘ก PT โˆ’ PT(cid:1) (X)(cid:13) (cid:13)โˆž โ‰ค (cid:107)X(cid:107)โˆž 1 2 (cid:17) . Corollary 2.1 is used to give a probabilistic estimate towards the lower bound of (cid:107)D๐‘ก (cid:107)โˆž, where D๐‘ก is defined in Equation (2.22) later. Now we are ready to provide the proof of Theorem 2.5. Proof of Theorem 2.5. First of all, one can get that the Condition 1 holds with probability at least (cid:18) (cid:19) 1 โˆ’ 2๐‘›1๐‘›2๐‘›3 exp โˆ’ 3๐‘๐‘›1๐‘›2 28(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ (2.20) according to Lemma 2.8. Next, our main goal is to construct a dual certificate Y that satisfies the condition 2. We do this using the Golfing Scheme [23, 50]. Choose ๐‘ก0 as (cid:24) ๐‘ก0 โ‰ฅ log2 (cid:18) (cid:114) ๐‘›3๐‘Ÿ ๐‘ 4 (cid:19)(cid:25) (2.21) 48 where (cid:100)ยท(cid:101) is the ceil function. Suppose that the set ฮฉ of observed entries is generated from ฮฉ = โˆช๐‘ก0 ๐‘ก=1ฮฉ๐‘ก with P[(๐‘–, ๐‘—, ๐‘˜) โˆˆ ฮฉ๐‘ก] = ๐‘ž := 1 โˆ’ (1 โˆ’ ๐‘) see that for any (๐‘–, ๐‘—, ๐‘˜) โˆˆ [๐‘›1] ร— [๐‘›2] ร— [๐‘›3], 1 ๐‘ก 0 and is independent of each other. 
It is easy to P[(๐‘–, ๐‘—, ๐‘˜) โˆˆ ฮฉ] =1 โˆ’ P[(๐‘–, ๐‘—, ๐‘˜) โˆ‰ โˆช๐‘ก0 ๐‘ก=1ฮฉ๐‘ก] ๐‘ก0(cid:214) =1 โˆ’ P[(๐‘–, ๐‘—, ๐‘˜) โˆˆ ฮฉ๐‘ ๐‘ก ] = 1 โˆ’ ๐‘ก0(cid:214) ๐‘ก=0 (1 โˆ’ ๐‘) 1 ๐‘ก 0 = ๐‘. ๐‘ก=1 ๐‘ก0(cid:208) ๐‘ก=1 Therefore, the construction of ฮฉ = K๐‘›1ร—๐‘›2ร—๐‘›3 : ๐‘ก = 0, ยท ยท ยท , ๐‘ก0} be a sequence of tensors with A0 = 0 and ฮฉ๐‘ก shares the same distribution as that of ฮฉ. Let {A๐‘ก โˆˆ A๐‘ก = A๐‘กโˆ’1 + Rฮฉ๐‘ก PT(U โˆ— V(cid:62) โˆ’ PT(A๐‘กโˆ’1)), where Rฮฉ๐‘ก (T ) := ๐‘ž 1ฮฉ๐‘ก (๐‘–, ๐‘—, ๐‘˜) [T ]๐‘–, ๐‘—,๐‘˜ หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) 1 Next, our goal is to prove that Pฮฉ(Y) = Y by mathematical induction. For ๐‘ก = 0, Pฮฉ(A0) = (cid:205) ๐‘–โˆˆ[๐‘›1], ๐‘— โˆˆ[๐‘›2],๐‘˜ โˆˆ[๐‘›3] ๐‘— . Set Y := A๐‘ก0. Pฮฉ(0) = 0 = A0. Notice that A1 = A0 + Rฮฉ1 PT(U โˆ— V(cid:62) โˆ’ PT(A0)) = A0 + Rฮฉ1 PT(U โˆ— V(cid:62)) = Rฮฉ1 (U โˆ— U(cid:62) โˆ— (U โˆ— V(cid:62)) + (U โˆ— V(cid:62)) โˆ— V โˆ— V(cid:62) โˆ’ U โˆ— U(cid:62) โˆ— (U โˆ— V(cid:62)) โˆ— V โˆ— V(cid:62)) = Rฮฉ1 (U โˆ— V(cid:62)). Due to ฮฉ1 โІ ฮฉ, it is easy to see that Pฮฉ(A1) = Pฮฉ(Rฮฉ1 (U โˆ— V(cid:62))) = Rฮฉ1 (U โˆ— V(cid:62)) = A1. Assume that for ๐‘˜ โ‰ค ๐‘ก0 โˆ’ 1, it holds that Pฮฉ(A๐‘˜ ) = A๐‘˜ . By linearity of operator Pฮฉ and ฮฉ๐‘ก0 โІ ฮฉ, it follows that Pฮฉ(Y) = Pฮฉ(A๐‘ก0) = Pฮฉ(A๐‘ก0โˆ’1 + Rฮฉ๐‘ก 0 PT(U โˆ— V(cid:62) โˆ’ PT(A๐‘ก0โˆ’1))) = Pฮฉ(A๐‘ก0โˆ’1) + Pฮฉ(Rฮฉ๐‘ก 0 PT(U โˆ— V(cid:62) โˆ’ PT(A๐‘ก0โˆ’1))) = A๐‘ก0โˆ’1 + Rฮฉ๐‘ก 0 PT(U โˆ— V(cid:62) โˆ’ PT(A๐‘ก0โˆ’1)) = A๐‘ก0 = Y. Therefore Y = A๐‘ก0 is the dual certificate. 49 Now letโ€™s prove that (cid:107)PT(Y) โˆ’ U โˆ— V(cid:62)(cid:107)F โ‰ค 1 4 D๐‘ก = U โˆ— V(cid:62) โˆ’ PT(A๐‘ก). (cid:113) ๐‘ ๐‘›3 . For ๐‘ก = 0, 1, ยท ยท ยท , ๐‘ก0, set Notice that ๐‘ ๐‘ก0 Thus, one can derive the following results by Lemma 2.8: for each ๐‘ก, 1 0 โ‰ฅ 1 โˆ’ (1 โˆ’ ๐‘ก ๐‘ž = 1 โˆ’ (1 โˆ’ ๐‘) ๐‘ ๐‘ก0 ) = . (cid:107)D๐‘ก (cid:107)F โ‰ค (cid:13) (cid:13)PT โˆ’ PTRฮฉ๐‘ก PT (cid:13) (cid:13) (cid:107)D๐‘กโˆ’1(cid:107)F โ‰ค 1 2 (cid:107)D๐‘กโˆ’1(cid:107)F holds with probability at least 1 โˆ’ 2๐‘›1๐‘›2๐‘›3 exp(โˆ’ 3๐‘ž๐‘›1๐‘›2 28(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ ). (2.22) (2.23) (2.24) Applying (2.24) from ๐‘ก = ๐‘ก0 to ๐‘ก = 1, we will have that (cid:107)PT(Y โˆ’ U โˆ— V(cid:62)) (cid:107)F = (cid:107)D๐‘ก0 (cid:107)F โ‰ค 1 2 (cid:107)D๐‘ก0โˆ’1(cid:107)F โ‰ค ยท ยท ยท โ‰ค ( )๐‘ก0 (cid:107)U โˆ— V(cid:62)(cid:107)F โ‰ค ( 1 2 โˆš ๐‘Ÿ )๐‘ก0 1 2 (2.25) holds with probability at least 1 โˆ’ 2๐‘ก0๐‘›1๐‘›2๐‘›3 exp(โˆ’ Since ๐‘ก0 โ‰ฅ (cid:108) log2 (cid:16) (cid:113) ๐‘›3๐‘Ÿ ๐‘ 4 (cid:17)(cid:109) , (cid:107)PT(Y โˆ’ U โˆ— V(cid:62))(cid:107)F โ‰ค 1 4 1 โˆ’ 2๐‘ก0๐‘›1๐‘›2๐‘›3 exp(โˆ’ Next, we move on to prove (cid:107)PTโŠฅ (Y)(cid:107) โ‰ค 1 Lemma 2.9 for ๐‘ก0 times, we can get 3๐‘ž๐‘›1๐‘›2 28(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ ). (cid:113) ๐‘ ๐‘›3 3๐‘ž๐‘›1๐‘›2 28(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ 2. Recall that Y = ). holds with probability at least (2.26) ๐‘ก0(cid:205) ๐‘–=1 Rฮฉ๐‘ก PTD๐‘กโˆ’1. 
By applying Lemma 2.9 $t_0$ times, we get
\[
\begin{aligned}
\|\mathcal{P}_{T^\perp}(\mathcal{Y})\|
&\le\sum_{t=1}^{t_0}\big\|\mathcal{P}_{T^\perp}\big(\mathcal{R}_{\Omega_t}\mathcal{P}_T-\mathcal{P}_T\big)(\mathcal{D}_{t-1})\big\|
=\sum_{t=1}^{t_0}\big\|\mathcal{P}_{T^\perp}\big(\mathcal{R}_{\Omega_t}-\mathcal{I}\big)(\mathcal{P}_T(\mathcal{D}_{t-1}))\big\|
\le\sum_{t=1}^{t_0}\big\|\big(\mathcal{R}_{\Omega_t}-\mathcal{I}\big)(\mathcal{P}_T(\mathcal{D}_{t-1}))\big\| && (2.27)\\
&\le\sum_{t=1}^{t_0}\left(\frac{c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{P}_T(\mathcal{D}_{t-1})\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\,\|\mathcal{P}_T(\mathcal{D}_{t-1})\|_{\infty,2}\right) && (2.28)\\
&=\sum_{t=1}^{t_0}\left(\frac{c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_{t-1}\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\,\|\mathcal{D}_{t-1}\|_{\infty,2}\right) && (2.29)
\end{aligned}
\]
holds with probability at least
\[
1-\frac{t_0}{((n_1+n_2)n_3)^{c_2-1}}.
\tag{2.30}
\]
Here (2.28) holds due to (2.23), and (2.29) is due to $\mathcal{P}_T(\mathcal{D}_t)=\mathcal{D}_t$ by the construction of $\mathcal{D}_t$ in Equation (2.22). Next, we bound (2.29) by bounding the following two terms:
(i) $\sum_{t=1}^{t_0}\frac{c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_{t-1}\|_\infty$ and
(ii) $\sum_{t=1}^{t_0}\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\|\mathcal{D}_{t-1}\|_{\infty,2}$,
via estimating the upper bounds of $\|\mathcal{D}_{t-1}\|_\infty$ and $\|\mathcal{D}_{t-1}\|_{\infty,2}$.

By applying Corollary 2.1 $t-1$ times, where $2\le t\le t_0$, we have
\[
\|\mathcal{D}_{t-1}\|_\infty=\big\|\big(\mathcal{P}_T-\mathcal{P}_T\mathcal{R}_{\Omega_{t-1}}\mathcal{P}_T\big)\cdots\big(\mathcal{P}_T-\mathcal{P}_T\mathcal{R}_{\Omega_1}\mathcal{P}_T\big)\mathcal{D}_0\big\|_\infty\le\Big(\frac{1}{2}\Big)^{t-1}\|\mathcal{D}_0\|_\infty,
\]
which holds with probability at least $1-2n_1n_2n_3(t-1)\exp\!\big(\frac{-3qn_1n_2}{28(n_1+n_2)\mu_0 r}\big)$. Therefore,
\[
\sum_{t=1}^{t_0}\frac{c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_{t-1}\|_\infty\le\frac{2c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_0\|_\infty
\tag{2.31}
\]
holds with probability at least $1-2n_1n_2n_3(t_0-1)\exp\!\big(-\frac{3qn_1n_2}{28(n_1+n_2)\mu_0 r}\big)$.

Now we estimate the upper bound of $\sum_{t=1}^{t_0}\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\|\mathcal{D}_{t-1}\|_{\infty,2}$ by bounding $\|\mathcal{D}_{t-1}\|_{\infty,2}$. For simplicity of expression, we denote
\[
a=2\sqrt{\frac{c_1\log((n_1+n_2)n_3)}{q}}\sqrt{\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}\quad\text{and}\quad b=\frac{c_1\log((n_1+n_2)n_3)}{q}\sqrt{\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}.
\]
By applying Lemma 2.10 $t-1$ times and using the fact that $\mathcal{P}_T(\mathcal{D}_s)=\mathcal{D}_s$ for all $0\le s\le t_0$, we obtain
\[
\|\mathcal{D}_{t-1}\|_{\infty,2}=\big\|\big(\mathcal{P}_T-\mathcal{P}_T\mathcal{R}_{\Omega_{t-1}}\mathcal{P}_T\big)(\mathcal{D}_{t-2})\big\|_{\infty,2}\le a\|\mathcal{D}_{t-2}\|_{\infty,2}+b\|\mathcal{D}_{t-2}\|_\infty\le\cdots\le a^{t-1}\|\mathcal{D}_0\|_{\infty,2}+b\sum_{i=0}^{t-2}a^i\|\mathcal{D}_{t-2-i}\|_\infty,
\]
which holds with probability at least $1-\frac{t-1}{(n_1n_3+n_2n_3)^{c_1-2}}$ for $2\le t\le t_0$.
Therefore,
\[
\begin{aligned}
\sum_{t=1}^{t_0}\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\,\|\mathcal{D}_{t-1}\|_{\infty,2}
&\le\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\left(\sum_{t=1}^{t_0}a^{t-1}\|\mathcal{D}_0\|_{\infty,2}+\sum_{t=2}^{t_0}b\sum_{i=0}^{t-2}a^i\|\mathcal{D}_{t-2-i}\|_\infty\right)\\
&=\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{1-a^{t_0}}{1-a}\|\mathcal{D}_0\|_{\infty,2}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot b\sum_{t=2}^{t_0}\sum_{i=0}^{t-2}a^i\|\mathcal{D}_{t-2-i}\|_\infty
\end{aligned}
\]
holds with probability at least $1-\frac{t_0-1}{(n_1n_3+n_2n_3)^{c_1-2}}$. Taking the estimate of the upper bound of $\|\mathcal{D}_t\|_\infty$ into account, we thus have
\[
\begin{aligned}
(2.29)&\le\frac{2c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_0\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{\|\mathcal{D}_0\|_{\infty,2}}{1-a}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot b\sum_{t=2}^{t_0}\sum_{i=0}^{t-2}a^i\Big(\frac{1}{2}\Big)^{t-2-i}\|\mathcal{D}_0\|_\infty && (2.32),(2.33)\\
&\le\frac{2c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_0\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{\|\mathcal{D}_0\|_{\infty,2}}{1-a}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{2b}{1-2a}\|\mathcal{D}_0\|_\infty && (2.34)
\end{aligned}
\]
holds with probability at least $1-\frac{t_0-1}{(n_1n_3+n_2n_3)^{c_1-2}}-2(t_0-1)n_1n_2n_3\exp\!\big(\frac{-3qn_1n_2}{28(n_1+n_2)\mu_0 r}\big)$ when $0<a\le\frac{1}{4}$. Therefore,
\[
\begin{aligned}
\|\mathcal{P}_{T^\perp}(\mathcal{Y})\|&\le\sum_{t=1}^{t_0}\left(\frac{c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_{t-1}\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\,\|\mathcal{D}_{t-1}\|_{\infty,2}\right)\\
&\le\frac{2c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_0\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{\|\mathcal{D}_0\|_{\infty,2}}{1-a}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{2b}{1-2a}\|\mathcal{D}_0\|_\infty
\end{aligned}
\]
holds with probability at least
\[
1-\frac{t_0-1}{(n_1n_3+n_2n_3)^{c_1-2}}-2(t_0-1)n_1n_2n_3\exp\!\left(\frac{-3qn_1n_2}{28(n_1+n_2)\mu_0 r}\right)-\frac{t_0}{(n_1n_3+n_2n_3)^{c_2-1}},
\]
provided that $0<a\le\frac{1}{4}$. Note that $\|\mathcal{D}_0\|_\infty=\|\mathcal{U}\ast\mathcal{V}^\top\|_\infty\le\frac{(n_1+n_2)\mu_0 r}{2n_1n_2}$ and $\|\mathcal{D}_0\|_{\infty,2}=\|\mathcal{U}\ast\mathcal{V}^\top\|_{\infty,2}\le\sqrt{\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}$.
Combining (2.30), (2.31), and (2.34), we thus have
\[
\begin{aligned}
\|\mathcal{P}_{T^\perp}(\mathcal{Y})\|&\le\frac{c_2\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{\sqrt{(n_1+n_2)\mu_0 r}}{\sqrt{n_1n_2}\,(1-a)}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{b}{1-2a}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}\\
&\le\frac{c_2\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}+2c_1\sqrt{2c_2}\left(\frac{\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}\right)^{3/2}+\frac{4}{3}\sqrt{2c_2\cdot\frac{\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}
\end{aligned}
\]
holds with probability at least
\[
1-\frac{t_0-1}{(n_1n_3+n_2n_3)^{c_1-2}}-2(t_0-1)n_1n_2n_3\exp\!\left(\frac{-3qn_1n_2}{28(n_1+n_2)\mu_0 r}\right)-\frac{t_0}{(n_1n_3+n_2n_3)^{c_2-1}},
\]
provided that $0<a\le\frac{1}{4}$. Since $a=2\sqrt{\frac{c_1\log((n_1+n_2)n_3)}{q}}\sqrt{\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}$, the restriction $0<a\le\frac{1}{4}$ is equivalent to
\[
q\ge 64c_1\log((n_1+n_2)n_3)\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}.
\]
Notice that
\[
p\ge\frac{256(n_1+n_2)\mu_0\beta r\log^2((n_1+n_2)n_3)}{n_1n_2}.
\]
We thus have
\[
t_0=\left\lceil\log_2\!\left(4\sqrt{\frac{n_3 r}{p}}\right)\right\rceil\le\left\lceil\log_2\!\left(4\sqrt{\frac{n_1n_2n_3}{256(n_1+n_2)\mu_0\beta\log^2((n_1+n_2)n_3)}}\right)\right\rceil\le\left\lceil\frac{1}{2}\log_2\!\left(\frac{n_1n_2n_3}{(n_1+n_2)\mu_0\beta}\right)-2\right\rceil\le\log((n_1+n_2)n_3).
\]
In addition, $q\ge\frac{p}{t_0}\ge\frac{256(n_1+n_2)\mu_0\beta r\log^2((n_1+n_2)n_3)}{n_1n_2}\cdot\frac{1}{\log((n_1+n_2)n_3)}=\frac{256(n_1+n_2)\mu_0\beta r\log((n_1+n_2)n_3)}{n_1n_2}$, i.e., $\frac{(n_1+n_2)\mu_0 r\log((n_1+n_2)n_3)}{qn_1n_2}\le\frac{1}{256\beta}$. Therefore the condition $q\ge 64c_1\log((n_1+n_2)n_3)\frac{(n_1+n_2)\mu_0 r}{n_1n_2}$ holds when $c_1=4\beta$. Hence, the condition $a\le\frac{1}{4}$ holds. In addition, by setting $c_2=12\beta$, we have
\[
\|\mathcal{P}_{T^\perp}(\mathcal{Y})\|\le\frac{c_2\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}+2c_1\sqrt{2c_2}\left(\frac{\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}\right)^{3/2}+\frac{4}{3}\sqrt{2c_2\cdot\frac{\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}
\le\frac{c_2}{256\beta}+2c_1\sqrt{2c_2}\left(\frac{1}{256\beta}\right)^{3/2}+\frac{4}{3}\sqrt{\frac{2c_2}{256\beta}}<\frac{1}{2}
\]
with probability at least
\[
1-\frac{\log(n_1n_3+n_2n_3)}{(n_1n_3+n_2n_3)^{4\beta-2}}-\frac{\log(n_1n_3+n_2n_3)}{(n_1n_3+n_2n_3)^{27\beta-2}}-\frac{\log(n_1n_3+n_2n_3)}{(n_1n_3+n_2n_3)^{12\beta-1}}\ge 1-\frac{3\log(n_1n_3+n_2n_3)}{(n_1n_3+n_2n_3)^{4\beta-2}}.
\]
Notice that the probabilistic estimate for the validity of Condition 2 is predicated on the assumption that Condition 1 holds true, where we showed $\|\mathcal{D}_t\|_\infty\le(\frac{1}{2})^{t-1}\|\mathcal{D}_0\|_\infty$ based on Condition 1. Thus, $\mathcal{T}$ is the unique minimizer with probability at least
\[
1-\frac{3\log(n_1n_3+n_2n_3)}{(n_1n_3+n_2n_3)^{4\beta-2}}.\qquad\square
\]

2.6.2.3 Proof of supporting lemmas towards Theorem 2.5

Proof of Lemma 2.6. Substitute $\tau=\sqrt{2c\sigma^2\log(n_1+n_2)}+cM\log(n_1+n_2)$ into $\frac{-\tau^2/2}{\sigma^2+M\tau/3}$ in Lemma 2.5.
We can get
\[
\frac{-\tau^2/2}{\sigma^2+\frac{M\tau}{3}}=-\frac{2c\sigma^2\log(n_1+n_2)+2\sqrt{2}\,c^{3/2}\sigma M\log^{3/2}(n_1+n_2)+c^2M^2\log^2(n_1+n_2)}{2\sigma^2+\frac{2\sqrt{2}}{3}c^{1/2}\sigma M\log^{1/2}(n_1+n_2)+\frac{2}{3}cM^2\log(n_1+n_2)}\le -c\log(n_1+n_2).\qquad\square
\]

Proof of Lemma 2.7.
\[
\begin{aligned}
\big\|\mathcal{P}_T(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top)\big\|_F^2
&=\big\langle\mathcal{P}_T(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top),\mathcal{P}_T(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top)\big\rangle
=\big\langle\mathcal{P}_T(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\\
&=\big\|\mathcal{U}^\top\ast\mathring{e}_i\big\|_F^2+\big\|\mathcal{V}^\top\ast\mathring{e}_j\big\|_F^2-\big\|\mathcal{U}^\top\ast\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\ast\mathcal{V}\big\|_F^2
\le\big\|\mathcal{U}^\top\ast\mathring{e}_i\big\|_F^2+\big\|\mathcal{V}^\top\ast\mathring{e}_j\big\|_F^2\le\frac{(n_1+n_2)\mu_0 r}{n_1n_2}.\qquad\square
\end{aligned}
\]

In the following context, we use $\delta_{i,j,k}$ to denote the indicator function $\mathbf{1}_{(i,j,k)\in\Omega}$.

Proof of Lemma 2.8. By the fact that $\mathcal{P}_T$ is a self-adjoint and idempotent operator, we get $\mathbb{E}[\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T]=\mathcal{P}_T(\mathbb{E}\mathcal{R}_\Omega)\mathcal{P}_T=\mathcal{P}_T$. It is easy to check that
\[
\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})=\sum_{i,j,k}\frac{1}{p}\delta_{i,j,k}\big\langle\mathcal{X},\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big).
\]
Fix a tensor $\mathcal{X}\in\mathbb{K}^{n_1\times n_2\times n_3}$; we can write
\[
(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T)(\mathcal{X})=\sum_{i,j,k}\Big(\frac{1}{p}\delta_{ijk}-1\Big)\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)=:\sum_{i,j,k}\mathcal{H}_{ijk}(\mathcal{X}),
\]
where $\mathcal{H}_{ijk}:\mathbb{K}^{n_1\times n_2\times n_3}\to\mathbb{K}^{n_1\times n_2\times n_3}$ is a self-adjoint random operator and $\delta_{i,j,k}$ is the indicator function. It is direct to see that $\mathbb{E}[\mathcal{H}_{ijk}]=0$, due to the fact that $\mathbb{E}(\frac{1}{p}\delta_{i,j,k}-1)=\frac{1}{p}\mathbb{E}(\delta_{i,j,k})-1=0$.

Define the operator $\bar{\mathcal{H}}_{ijk}:\mathbb{B}\to\mathbb{B}$, where $\mathbb{B}=\{\bar{\mathcal{B}}:\mathcal{B}\in\mathbb{K}^{n_1\times n_2\times n_3}\}$ denotes the set of block diagonal matrices whose blocks are the frontal slices of $\hat{\mathcal{B}}$, as
\[
\bar{\mathcal{H}}_{ijk}(\bar{\mathcal{X}}):=\overline{\mathcal{H}_{ijk}(\mathcal{X})}=\Big(\frac{1}{p}\delta_{ijk}-1\Big)\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\,\overline{\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)}.
\]
It is easy to check that $\bar{\mathcal{H}}_{ijk}$ is also self-adjoint, using the fact that the operator $\mathcal{P}_T(\cdot)$ is self-adjoint. Using the fact that $\mathbb{E}(\frac{1}{p}\delta_{i,j,k}-1)=0$ again, we have $\mathbb{E}[\bar{\mathcal{H}}_{ijk}]=0$. To prove the result by the non-commutative Bernstein inequality, we need to bound $\|\bar{\mathcal{H}}_{ijk}\|$ and $\big\|\sum_{i,j,k}\mathbb{E}[\bar{\mathcal{H}}_{ijk}^2]\big\|$.
Firstly, we have
\[
\begin{aligned}
\|\bar{\mathcal{H}}_{ijk}\|&=\sup_{\|\mathcal{X}\|_F=1}\big\|\mathcal{H}_{ijk}(\mathcal{X})\big\|_F
=\sup_{\|\mathcal{X}\|_F=1}\Big\|\Big(\tfrac{1}{p}\delta_{ijk}-1\Big)\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\Big\|_F\\
&=\sup_{\|\mathcal{X}\|_F=1}\Big\|\Big(\tfrac{1}{p}\delta_{ijk}-1\Big)\big\langle\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big),\mathcal{X}\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\Big\|_F\\
&\le\sup_{\|\mathcal{X}\|_F=1}\frac{1}{p}\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\|_F\,\|\mathcal{X}\|_F\,\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\|_F
=\frac{1}{p}\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\|_F^2\le\frac{\mu_0(n_1+n_2)r}{pn_1n_2}.
\end{aligned}
\]
Next, we move on to bound $\big\|\sum_{i,j,k}\mathbb{E}[\bar{\mathcal{H}}_{ijk}^2]\big\|$. Using the fact that $\mathcal{P}_T$ is a self-adjoint and idempotent operator, we get
\[
\bar{\mathcal{H}}_{ijk}^2(\bar{\mathcal{X}})=\Big(\frac{1}{p}\delta_{ijk}-1\Big)^2\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\rangle\,\overline{\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)}.
\]
Note that $\mathbb{E}\big[\big(\frac{1}{p}\delta_{ijk}-1\big)^2\big]=\frac{1-p}{p}\le\frac{1}{p}$. Hence
\[
\begin{aligned}
\Big\|\sum_{i,j,k}\mathbb{E}\big[\bar{\mathcal{H}}_{ijk}^2\big](\bar{\mathcal{X}})\Big\|_F
&\le\frac{1}{p}\Big\|\sum_{i,j,k}\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\rangle\,\overline{\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)}\Big\|_F\\
&\le\frac{\sqrt{n_3}}{p}\cdot\max_{i,j,k}\Big\{\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\rangle\Big\}\cdot\Big\|\sum_{i,j,k}\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\Big\|_F\\
&\le\frac{\sqrt{n_3}}{p}\cdot\Big(\max_{i,j,k}\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\|_F^2\Big)\cdot\|\mathcal{P}_T(\mathcal{X})\|_F
\le\frac{\sqrt{n_3}\,(n_1+n_2)\mu_0 r}{pn_1n_2}\,\|\mathcal{P}_T(\mathcal{X})\|_F\qquad\text{(by Lemma 2.7)}\\
&\le\frac{\sqrt{n_3}\,(n_1+n_2)\mu_0 r}{pn_1n_2}\,\|\mathcal{X}\|_F.
\end{aligned}
\]
Since $\|\bar{\mathcal{X}}\|_F=\sqrt{n_3}\,\|\mathcal{X}\|_F$, the operator norm of $\sum_{i,j,k}\mathbb{E}[\bar{\mathcal{H}}_{ijk}^2]$ is bounded by $\frac{(n_1+n_2)\mu_0 r}{pn_1n_2}$. Thus, we can apply the non-commutative Bernstein inequality. Notice that $\frac{\sigma^2}{M}=1$ since $\sigma^2=M=\frac{(n_1+n_2)\mu_0 r}{pn_1n_2}$. Thus, by Lemma 2.5, we have
\[
\mathbb{P}\Big[\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T\|>\frac{1}{2}\Big]
=\mathbb{P}\Big[\Big\|\sum_{i,j,k}\mathcal{H}_{ijk}\Big\|>\frac{1}{2}\Big]
=\mathbb{P}\Big[\Big\|\sum_{i,j,k}\bar{\mathcal{H}}_{ijk}\Big\|>\frac{1}{2}\Big]
\le 2n_1n_2n_3\exp\!\left(-\frac{3pn_1n_2}{28(n_1+n_2)\mu_0 r}\right).\qquad\square
\]

Proof of Lemma 2.9.
It is easy to check that
\[
\mathcal{R}_\Omega(\mathcal{X})-\mathcal{X}=\sum_{i,j,k}\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)[\mathcal{X}]_{i,j,k}\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top=:\sum_{i,j,k}\mathcal{E}_{i,j,k}.
\]
Notice that $\mathbb{E}[\mathcal{E}_{i,j,k}]=0$ and $\|\mathcal{E}_{i,j,k}\|\le\frac{1}{p}\|\mathcal{X}\|_\infty$. In order to use the non-commutative Bernstein inequality, we just need to check the uniform boundedness of the spectral norms of $\mathbb{E}\big(\sum_{i,j,k}\mathcal{E}_{i,j,k}^\top\mathcal{E}_{i,j,k}\big)$ and $\mathbb{E}\big(\sum_{i,j,k}\mathcal{E}_{i,j,k}\mathcal{E}_{i,j,k}^\top\big)$. Using the facts that $\mathring{e}_i^\top\ast\mathring{e}_i=\dot{e}_k^\top\ast\dot{e}_k=\dot{e}_1$, $\dot{e}_1\ast\dot{e}_k=\dot{e}_k$, and $\mathring{e}_j\ast\dot{e}_1=\mathring{e}_j$, we have the following result:
\[
\begin{aligned}
\mathcal{E}_{i,j,k}^\top\ast\mathcal{E}_{i,j,k}
&=\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2[\mathcal{X}]_{i,j,k}^2\,\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)^\top\ast\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)
=\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2[\mathcal{X}]_{i,j,k}^2\,\mathring{e}_j\ast\dot{e}_k^\top\ast\big(\mathring{e}_i^\top\ast\mathring{e}_i\big)\ast\dot{e}_k\ast\mathring{e}_j^\top\\
&=\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2[\mathcal{X}]_{i,j,k}^2\,\mathring{e}_j\ast\big(\dot{e}_k^\top\ast\dot{e}_k\big)\ast\mathring{e}_j^\top
=\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2[\mathcal{X}]_{i,j,k}^2\,\mathring{e}_j\ast\dot{e}_1\ast\mathring{e}_j^\top
=\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2[\mathcal{X}]_{i,j,k}^2\,\mathring{e}_j\ast\mathring{e}_j^\top.
\end{aligned}
\]
Notice that $\mathring{e}_j\ast\mathring{e}_j^\top$ is a zero tensor except for its $(j,j,1)$-th entry, which equals $1$. We have
\[
\begin{aligned}
\Big\|\sum_{i,j,k}\mathbb{E}\big[\mathcal{E}_{i,j,k}^\top\mathcal{E}_{i,j,k}\big]\Big\|
&=\Big\|\sum_{i,j,k}\mathbb{E}\big[\mathcal{E}_{i,j,k}^\top\ast\mathcal{E}_{i,j,k}\big]\Big\|\qquad\text{(by the definition of the tensor spectral norm)}\\
&\le\frac{1}{p}\Big\|\sum_{i,j,k}[\mathcal{X}]_{i,j,k}^2\,\mathring{e}_j\ast\mathring{e}_j^\top\Big\|
=\frac{1}{p}\left\|\mathrm{diag}\Big(\sum_{i,k}[\mathcal{X}]_{i,1,k}^2,\ \sum_{i,k}[\mathcal{X}]_{i,2,k}^2,\ \cdots,\ \sum_{i,k}[\mathcal{X}]_{i,n_2,k}^2,\ O_{n_2(n_3-1)\times n_2(n_3-1)}\Big)\right\|\\
&=\frac{1}{p}\max_{j\in[n_2]}\sum_{i,k}[\mathcal{X}]_{i,j,k}^2
=\frac{1}{p}\max_{j\in[n_2]}\big\|[\mathcal{X}]_{:,j,:}\big\|_F^2.
\end{aligned}
\]
Similarly, we can get $\big\|\sum_{i,j,k}\mathbb{E}[\mathcal{E}_{i,j,k}\mathcal{E}_{i,j,k}^\top]\big\|\le\frac{1}{p}\max_{i\in[n_1]}\|[\mathcal{X}]_{i,:,:}\|_F^2$. Thus,
\[
\max\left\{\Big\|\mathbb{E}\Big(\sum_{i,j,k}\mathcal{E}_{i,j,k}^\top\mathcal{E}_{i,j,k}\Big)\Big\|,\ \Big\|\mathbb{E}\Big(\sum_{i,j,k}\mathcal{E}_{i,j,k}\mathcal{E}_{i,j,k}^\top\Big)\Big\|\right\}\le\frac{1}{p}\|\mathcal{X}\|_{\infty,2}^2.
\]
By Lemma 2.6, for any $c_2>1$,
\[
\|\mathcal{R}_\Omega(\mathcal{X})-\mathcal{X}\|=\Big\|\sum_{i,j,k}\mathcal{E}_{i,j,k}\Big\|\le\sqrt{\frac{2c_2}{p}\|\mathcal{X}\|_{\infty,2}^2\log((n_1+n_2)n_3)}+\frac{c_2\log((n_1+n_2)n_3)}{p}\|\mathcal{X}\|_\infty
\]
holds with probability at least $1-((n_1+n_2)n_3)^{1-c_2}$. $\square$

Proof of Lemma 2.10. Consider any $b$-th lateral column of $\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X})$:
\[
\big(\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X})\big)\ast\mathring{e}_b=\sum_{i,j,k}\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)[\mathcal{X}]_{i,j,k}\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\ast\mathring{e}_b=:\sum_{i,j,k}\mathfrak{a}_{i,j,k},
\]
where the $\mathfrak{a}_{i,j,k}\in\mathbb{K}^{n_1\times 1\times n_3}$ are zero-mean independent lateral tensor columns. Denote by $\vec{\mathfrak{a}}_{i,j,k}\in\mathbb{K}^{n_1n_3\times 1}$ the vectorization of $\mathfrak{a}_{i,j,k}$. Then we have
\[
\big\|\vec{\mathfrak{a}}_{i,j,k}\big\|=\big\|\mathfrak{a}_{i,j,k}\big\|_F=\frac{1}{p}\big|[\mathcal{X}]_{i,j,k}\big|\,\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\ast\mathring{e}_b\big\|_F\le\frac{1}{p}\sqrt{\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}\,\|\mathcal{X}\|_\infty.
\]
We also have
\[
\Big|\mathbb{E}\Big(\sum_{i,j,k}\vec{\mathfrak{a}}_{i,j,k}^\top\vec{\mathfrak{a}}_{i,j,k}\Big)\Big|=\sum_{i,j,k}\mathbb{E}\big(\|\mathfrak{a}_{i,j,k}\|_F^2\big)=\frac{1-p}{p}\sum_{i,j,k}[\mathcal{X}]_{i,j,k}^2\,\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\ast\mathring{e}_b\big\|_F^2.
\]
๐‘–, ๐‘—,๐‘˜ ๐‘–, ๐‘—,๐‘˜ By the definition of PT and the incoherence condition, we have: ๐‘–, ๐‘—,๐‘˜ = ๐‘— ) โˆ— หš๐”ข๐‘) PT( หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13)F (cid:13) (U โˆ— U(cid:62) โˆ— หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ ) โˆ— หš๐”ข(cid:62) ๐‘— โˆ— หš๐”ข๐‘ + (I๐‘›1 โˆ’ U โˆ— U(cid:62)) โˆ— หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) (cid:13) (cid:13) (cid:114) ๐œ‡0๐‘Ÿ ๐‘›1 (cid:114) ๐œ‡0๐‘Ÿ ๐‘›1 (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) By Cauchy-Schwartz inequality, we have (cid:13)(I๐‘›1 โˆ’ U โˆ— U(cid:62)) โˆ— หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13)F (cid:13) (cid:13) (cid:13)F (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— หš๐”ข๐‘ ๐‘— โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13)F + (cid:13) (cid:13) (cid:13) โ‰ค โ‰ค + ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13)F ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13)F (cid:13) (cid:13) (cid:13) PT( หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) ๐‘— ) โˆ— หš๐”ข๐‘) (cid:13) 2 (cid:13) (cid:13) F โ‰ค 2๐œ‡0๐‘Ÿ ๐‘›1 (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13) 2 F + 2 (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) 2 (cid:13) (cid:13) F . Thus, (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) E( (cid:213) ๐‘–, ๐‘—,๐‘˜ (cid:174)๐”ž(cid:62) ๐‘–, ๐‘—,๐‘˜ (cid:174)๐”ž๐‘–, ๐‘—,๐‘˜ ) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) โ‰ค = โ‰ค = โ‰ค Similarly, we can bound 2๐œ‡0๐‘Ÿ ๐‘๐‘›1 2๐œ‡0๐‘Ÿ ๐‘๐‘›1 2๐œ‡0๐‘Ÿ ๐‘๐‘›1 2๐œ‡0๐‘Ÿ ๐‘๐‘›1 2๐œ‡0๐‘Ÿ ๐‘๐‘›1 [X]2 ๐‘–, ๐‘—,๐‘˜ (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— หš๐”ข๐‘ [X]2 ๐‘–, ๐‘—,๐‘˜ [X]2 ๐‘–, ๐‘—,๐‘˜ (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— หš๐”ข๐‘ (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13) 2 F (cid:13) (cid:13) (cid:13) 2 F (cid:13) (cid:13) (cid:13) 2 F + + + 2 ๐‘ 2 ๐‘ 2 ๐‘ (cid:213) ๐‘–, ๐‘—,๐‘˜ (cid:213) ๐‘— (cid:213) ๐‘— [X]2 ๐‘–, ๐‘—,๐‘˜ (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13) 2 F (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13) 2 F (cid:13) (cid:13) (cid:13) 2 F (cid:213) ๐‘–,๐‘˜ [X]2 ๐‘–, ๐‘—,๐‘˜ (cid:107)X(cid:107)2 โˆž,2 [X]2 ๐‘–,๐‘,๐‘˜ + 2 ๐‘ (cid:13) (cid:13)V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) 2 F (cid:107)X(cid:107)2 โˆž,2 (cid:213) ๐‘–, ๐‘—,๐‘˜ (cid:213) ๐‘–, ๐‘—,๐‘˜ (cid:213) ๐‘–, ๐‘—,๐‘˜ (cid:213) ๐‘–,๐‘˜ (cid:107)X(cid:107)2 โˆž,2 โ‰ค 2(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ ๐‘๐‘›1๐‘›2 (cid:107)X(cid:107)2 โˆž,2 by the same quantity. (cid:12) (cid:12) E( (cid:205) (cid:12) (cid:12) ๐‘–, ๐‘—,๐‘˜ (cid:12) (cid:107)X(cid:107)2 โˆž,2 + 2๐œ‡0๐‘Ÿ ๐‘๐‘›2 (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:113) (๐‘›1+๐‘›2)๐œ‡0๐‘Ÿ ๐‘›1๐‘›2 ๐‘–, ๐‘—,๐‘˜ ) (cid:174)๐”ž๐‘–, ๐‘—,๐‘˜ (cid:174)๐”ž(cid:62) For simplicity, we let ๐‘€ = 1 ๐‘ (cid:107)X(cid:107)โˆž and ๐œŽ2 = 2(๐‘›1+๐‘›2)๐œ‡0๐‘Ÿ ๐‘๐‘›1๐‘›2 (cid:107)X(cid:107)2 โˆž,2. 
By Lemma 2.6, for any $c_1>1$, we have
\[
\begin{aligned}
&\mathbb{P}\left(\Big\|\sum_{i,j,k}\vec{\mathfrak{a}}_{i,j,k}\Big\|\le\sqrt{2c_1\sigma^2\log((n_1+n_2)n_3)}+c_1M\log((n_1+n_2)n_3)\right)\\
&=\mathbb{P}\left(\Big\|\sum_{i,j,k}\vec{\mathfrak{a}}_{i,j,k}\Big\|\le\sqrt{\frac{4c_1\log((n_1+n_2)n_3)(n_1+n_2)\mu_0 r}{pn_1n_2}}\,\|\mathcal{X}\|_{\infty,2}+\frac{c_1\log((n_1+n_2)n_3)\sqrt{(n_1+n_2)\mu_0 r}}{p\sqrt{n_1n_2}}\,\|\mathcal{X}\|_\infty\right)\ge 1-((n_1+n_2)n_3)^{1-c_1}.
\end{aligned}
\]
Notice that $\|(\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X}))\ast\mathring{e}_b\|_F=\big\|\sum_{i,j,k}\mathfrak{a}_{i,j,k}\big\|_F=\big\|\sum_{i,j,k}\vec{\mathfrak{a}}_{i,j,k}\big\|$. Therefore,
\[
\mathbb{P}\left(\|(\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X}))\ast\mathring{e}_b\|_F\le\sqrt{\frac{4c_1(n_1+n_2)\log((n_1+n_2)n_3)\mu_0 r}{pn_1n_2}}\,\|\mathcal{X}\|_{\infty,2}+\frac{c_1\log((n_1+n_2)n_3)\sqrt{(n_1+n_2)\mu_0 r}}{p\sqrt{n_1n_2}}\,\|\mathcal{X}\|_\infty\right)\ge 1-((n_1+n_2)n_3)^{1-c_1}.
\]
Using a union bound over all the tensor lateral slices, we have
\[
\mathbb{P}\left(\max_b\big\{\|(\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X}))\ast\mathring{e}_b\|_F\big\}\le\sqrt{\frac{4c_1(n_1+n_2)\log((n_1+n_2)n_3)\mu_0 r}{pn_1n_2}}\,\|\mathcal{X}\|_{\infty,2}+\frac{c_1\log((n_1+n_2)n_3)\sqrt{(n_1+n_2)\mu_0 r}}{p\sqrt{n_1n_2}}\,\|\mathcal{X}\|_\infty\right)\ge 1-n_2((n_1+n_2)n_3)^{1-c_1}.
\]
Similarly, we can also show that
\[
\mathbb{P}\left(\max_b\big\{\|\mathring{e}_b^\top\ast(\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X}))\|_F\big\}\le\sqrt{\frac{4c_1(n_1+n_2)\log((n_1+n_2)n_3)\mu_0 r}{pn_1n_2}}\,\|\mathcal{X}\|_{\infty,2}+\frac{c_1\log((n_1+n_2)n_3)\sqrt{(n_1+n_2)\mu_0 r}}{p\sqrt{n_1n_2}}\,\|\mathcal{X}\|_\infty\right)\ge 1-n_1((n_1+n_2)n_3)^{1-c_1}.
\]
Thus, we can get
\[
\mathbb{P}\left(\|\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X})\|_{\infty,2}\le\sqrt{\frac{4c_1(n_1+n_2)\log((n_1+n_2)n_3)\mu_0 r}{pn_1n_2}}\,\|\mathcal{X}\|_{\infty,2}+\frac{c_1\log((n_1+n_2)n_3)\sqrt{(n_1+n_2)\mu_0 r}}{p\sqrt{n_1n_2}}\,\|\mathcal{X}\|_\infty\right)\ge 1-((n_1+n_2)n_3)^{2-c_1}.\qquad\square
\]
Proof of Lemma 2.11. Notice that
\[
\begin{aligned}
\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})&=\mathcal{P}_T\mathcal{R}_\Omega\left(\sum_{i,j,k}\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\right)\\
&=\mathcal{P}_T\left(\sum_{i,j,k}\frac{1}{p}\delta_{i,j,k}\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\right) && (2.35)\\
&=\sum_{i,j,k}\frac{1}{p}\delta_{i,j,k}\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big). && (2.36)
\end{aligned}
\]
Equation (2.35) is due to $\mathcal{P}_T(\mathcal{X})=\sum_{i,j,k}\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\rangle\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top$, and Equation (2.36) is due to the linearity of the operator $\mathcal{P}_T$. Notice that
\[
\mathcal{P}_T(\mathcal{X})=\mathcal{P}_T(\mathcal{P}_T(\mathcal{X}))=\mathcal{P}_T\left(\sum_{i,j,k}\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\right)=\sum_{i,j,k}\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big).
\]
Thus, the $(a,b,c)$-th entry of $\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})-\mathcal{P}_T(\mathcal{X})$ is given by
\[
\big\langle\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})-\mathcal{P}_T(\mathcal{X}),\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big\rangle
=\sum_{i,j,k}\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\big\langle\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big),\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big\rangle=:\sum_{i,j,k}h_{i,j,k}.
\]
It is easy to see that $\mathbb{E}(h_{i,j,k})=0$. Notice that
\[
\begin{aligned}
|h_{i,j,k}|&=\Big|\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\big\langle\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big),\mathcal{P}_T\big(\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big)\big\rangle\Big|\\
&\le\frac{1}{p}\|\mathcal{P}_T(\mathcal{X})\|_\infty\,\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\|_F\,\big\|\mathcal{P}_T\big(\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big)\big\|_F\le\frac{(n_1+n_2)\mu_0 r}{pn_1n_2}\|\mathcal{P}_T(\mathcal{X})\|_\infty.
\end{aligned}
\]
It is easy to check that
\[
\begin{aligned}
\Big|\sum_{i,j,k}\mathbb{E}\big[h_{i,j,k}^2\big]\Big|
&=\sum_{i,j,k}\mathbb{E}\left(\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2\Big|\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\big\langle\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big),\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big\rangle\Big|^2\right)\\
&\le\frac{1}{p}\|\mathcal{P}_T(\mathcal{X})\|_\infty^2\,\big\|\mathcal{P}_T\big(\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big)\big\|_F^2\le\frac{(n_1+n_2)\mu_0 r}{pn_1n_2}\|\mathcal{P}_T(\mathcal{X})\|_\infty^2.
\end{aligned}
\]
Thus, by the non-commutative Bernstein inequality, we have
\[
\mathbb{P}\Big[\big(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})-\mathcal{P}_T(\mathcal{X})\big)_{a,b,c}\ge\frac{1}{2}\|\mathcal{P}_T(\mathcal{X})\|_\infty\Big]
\le 2\exp\!\left(\frac{-\|\mathcal{P}_T(\mathcal{X})\|_\infty^2/8}{\frac{(n_1+n_2)\mu_0 r}{pn_1n_2}\|\mathcal{P}_T(\mathcal{X})\|_\infty^2+\frac{(n_1+n_2)\mu_0 r}{6pn_1n_2}\|\mathcal{P}_T(\mathcal{X})\|_\infty^2}\right)
=2\exp\!\left(\frac{-3pn_1n_2}{28(n_1+n_2)\mu_0 r}\right).
\]
Thus, using the union bound over every $(a,b,c)$-th entry, we have that
\[
\|(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T)(\mathcal{P}_T(\mathcal{X}))\|_\infty\le\frac{1}{2}\|\mathcal{P}_T(\mathcal{X})\|_\infty
\]
holds with probability at least $1-2n_1n_2n_3\exp\!\big(\frac{-3pn_1n_2}{28(n_1+n_2)\mu_0 r}\big)$. $\square$

Lastly, to maintain the integrity of a self-contained exposition, we offer a detailed proof of Proposition 1 in this subsection, as originally presented in [84].

Proof of Proposition 1

In the following, the symbol $\mathcal{T}$ is used to represent the tensor that we aim to recover in the optimization problem (2.18). Before we delve into the detailed proof pipeline, we wish to reiterate the purpose of Proposition 1: it asserts that $\mathcal{T}$ is the unique minimizer of the optimization problem (2.18) when Conditions 1 and 2 are met simultaneously. Notice that $\mathcal{T}$ is a feasible solution to problem (2.18). In order to show that $\mathcal{T}$ is the unique minimizer, it suffices to show $\|\mathcal{X}\|_{\mathrm{TNN}}-\|\mathcal{T}\|_{\mathrm{TNN}}>0$ for any feasible solution $\mathcal{X}$ with $\mathcal{X}\neq\mathcal{T}$. We first show that for any feasible solution $\mathcal{X}$ different from $\mathcal{T}$, there exists a tensor $\mathcal{M}$ such that
\[
\|\mathcal{X}\|_{\mathrm{TNN}}-\|\mathcal{T}\|_{\mathrm{TNN}}\ge\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp(\mathcal{T})}(\mathcal{M}),\,\mathcal{X}-\mathcal{T}\big\rangle.
\]
In this way, we transform proving $\|\mathcal{X}\|_{\mathrm{TNN}}-\|\mathcal{T}\|_{\mathrm{TNN}}>0$ into showing $\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp(\mathcal{T})}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle>0$. To prove $\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle>0$, we split it into the two parts $\langle\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle$ and $\langle\mathcal{U}\ast\mathcal{V}^\top,\mathcal{X}-\mathcal{T}\rangle$. By the construction of $\mathcal{M}$, we can show that $\langle\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle=\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}$.
As for the part $\langle\mathcal{U}\ast\mathcal{V}^\top,\mathcal{X}-\mathcal{T}\rangle$, we need to further split it into two parts by introducing the dual certificate tensor $\mathcal{Y}$:
\[
\big\langle\mathcal{P}_{T(\mathcal{T})}(\mathcal{Y})-\mathcal{U}\ast\mathcal{V}^\top,\,\mathcal{X}-\mathcal{T}\big\rangle\quad\text{and}\quad\big\langle\mathcal{P}_{T^\perp(\mathcal{T})}(\mathcal{Y}),\,\mathcal{X}-\mathcal{T}\big\rangle.
\]
The reason for this separation is that we can bound these two terms by $\frac{\sqrt{2}}{4}\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}$ and $\frac{1}{2}\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}$, respectively. By combining the bounds of the above three parts, we get
\[
\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M}),\,\mathcal{X}-\mathcal{T}\big\rangle\ge\frac{1}{8}\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}.
\]
In the end, we prove that $\|\mathcal{P}_{T^\perp(\mathcal{T})}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}$ is strictly larger than zero by contradiction.

Before we move on to the detailed proof, we give several useful lemmas which are key to the proof of Proposition 1. First, we state the characterization of the tensor nuclear norm (TNN), which can be described as a dual to the tensor spectral norm.

Lemma 2.12 ([84]). Given a tensor $\mathcal{T}\in\mathbb{K}^{n_1\times n_2\times n_3}$, we have
\[
\|\mathcal{T}\|_{\mathrm{TNN}}=\sup_{\{\mathcal{Q}\in\mathbb{K}^{n_1\times n_2\times n_3}:\ \|\mathcal{Q}\|\le 1\}}\langle\mathcal{Q},\mathcal{T}\rangle.
\]

Next, we present a characterization of the subdifferential of the TNN, which is useful for proving the uniqueness of the minimizer of the optimization problem (2.18).

Lemma 2.13 (Subdifferential of TNN [84]). Let $\mathcal{T}\in\mathbb{K}^{n_1\times n_2\times n_3}$ with compact t-SVD $\mathcal{T}=\mathcal{U}\ast\Sigma\ast\mathcal{V}^\top$. The subdifferential (the set of subgradients) of $\|\mathcal{T}\|_{\mathrm{TNN}}$ is
\[
\partial\|\mathcal{T}\|_{\mathrm{TNN}}=\big\{\mathcal{U}\ast\mathcal{V}^\top+\mathcal{W}\ \big|\ \mathcal{U}^\top\ast\mathcal{W}=0,\ \mathcal{W}\ast\mathcal{V}=0,\ \|\mathcal{W}\|\le 1\big\}.
\]

For the proof of Proposition 1, a significant challenge is proving the minimizer's uniqueness. This involves ensuring that $\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp(\mathcal{T})}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle>0$ with $\mathcal{M}$ satisfying certain conditions.

Lemma 2.14 ([84]). Assume that $\Omega$ is generated according to Bernoulli sampling with probability $p$. If $\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T\|\le\frac{1}{2}$, then
\[
\|\mathcal{P}_T(\mathcal{X})\|_F\le\sqrt{\frac{2n_3}{p}}\,\|\mathcal{P}_{T^\perp}(\mathcal{X})\|_{\mathrm{TNN}}
\]
for any $\mathcal{X}$ with $\mathcal{P}_\Omega(\mathcal{X})=0$.

Proof of Lemma 2.14. Let $\mathcal{X}$ be a tensor satisfying $\mathcal{P}_\Omega(\mathcal{X})=0$. Using the self-adjointness of the operator $\mathcal{P}_T$, we have
\[
\begin{aligned}
\|\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})\|_F^2&=\langle\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X}),\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})\rangle
=\Big\langle\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X}),\frac{1}{p}\sum_{i,j,k}\delta_{i,j,k}[\mathcal{P}_T(\mathcal{X})]_{i,j,k}\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\Big\rangle\\
&=\frac{1}{p}\langle\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X}),\mathcal{P}_\Omega(\mathcal{P}_T(\mathcal{X}))\rangle
=\frac{1}{p}\langle\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X}),\mathcal{X}\rangle
=\frac{1}{p}\langle(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T)(\mathcal{X}),\mathcal{P}_T(\mathcal{X})\rangle+\frac{1}{p}\langle\mathcal{P}_T(\mathcal{X}),\mathcal{X}\rangle\\
&\ge\frac{1}{p}\|\mathcal{P}_T(\mathcal{X})\|_F^2-\frac{1}{p}\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T\|\,\|\mathcal{P}_T(\mathcal{X})\|_F^2\ge\frac{1}{2p}\|\mathcal{P}_T(\mathcal{X})\|_F^2.
\end{aligned}
\]
Notice that if $\mathcal{P}_\Omega(\mathcal{X})=0$, then $\mathcal{R}_\Omega(\mathcal{X})$ must be the zero tensor. Thus, we have
\[
\|\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})\|_F=\|\mathcal{R}_\Omega\mathcal{P}_{T^\perp}(\mathcal{X})\|_F\le\frac{1}{p}\|\mathcal{P}_{T^\perp}(\mathcal{X})\|_F=\frac{1}{p\sqrt{n_3}}\big\|\overline{\mathcal{P}_{T^\perp}(\mathcal{X})}\big\|_F\le\frac{1}{p\sqrt{n_3}}\big\|\overline{\mathcal{P}_{T^\perp}(\mathcal{X})}\big\|_*=\frac{\sqrt{n_3}}{p}\|\mathcal{P}_{T^\perp}(\mathcal{X})\|_{\mathrm{TNN}}.
\]
As a result, we have $\|\mathcal{P}_T(\mathcal{X})\|_F\le\sqrt{2p}\,\|\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})\|_F\le\sqrt{\frac{2n_3}{p}}\,\|\mathcal{P}_{T^\perp}(\mathcal{X})\|_{\mathrm{TNN}}$. $\square$
Proof of Proposition 1. Consider any feasible solution $\mathcal{X}\neq\mathcal{T}$ to the optimization problem (2.18) with $\mathcal{P}_\Omega(\mathcal{X})=\mathcal{P}_\Omega(\mathcal{T})$. By the duality between the TNN and the tensor spectral norm shown in Lemma 2.12, there exists a tensor $\mathcal{M}$ with $\|\mathcal{M}\|\le 1$ such that
\[
\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}=\langle\mathcal{M},\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\rangle=\langle\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\rangle.
\]
Firstly, it is easy to check that $\mathcal{U}^\top\ast\mathcal{P}_{T^\perp}(\mathcal{M})=0$ and $\mathcal{P}_{T^\perp}(\mathcal{M})\ast\mathcal{V}=0$ by the definition of the operator $\mathcal{P}_{T^\perp}(\cdot)$. By Lemma 2.13, $\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M})$ is a subgradient of $\mathcal{T}$ with respect to the tensor nuclear norm. Therefore, we have
\[
\|\mathcal{X}\|_{\mathrm{TNN}}-\|\mathcal{T}\|_{\mathrm{TNN}}\ge\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M}),\,\mathcal{X}-\mathcal{T}\big\rangle.
\]
To prove $\|\mathcal{X}\|_{\mathrm{TNN}}-\|\mathcal{T}\|_{\mathrm{TNN}}\ge 0$, it is sufficient to show $\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle\ge 0$. Notice that for any $\mathcal{Y}$ with $\mathcal{P}_\Omega(\mathcal{Y})=\mathcal{Y}$, we have $\langle\mathcal{Y},\mathcal{X}-\mathcal{T}\rangle=\langle\mathcal{P}_\Omega(\mathcal{Y}),\mathcal{X}-\mathcal{T}\rangle=\langle\mathcal{P}_\Omega(\mathcal{Y}),\mathcal{P}_\Omega(\mathcal{X}-\mathcal{T})\rangle=0$. We thus have
\[
\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M}),\,\mathcal{X}-\mathcal{T}\big\rangle=\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M})-\mathcal{Y},\,\mathcal{X}-\mathcal{T}\big\rangle.
\]
Furthermore, we have
\[
\begin{aligned}
\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M})-\mathcal{Y},\,\mathcal{X}-\mathcal{T}\big\rangle
&=\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M})-\mathcal{P}_{T^\perp}(\mathcal{Y})-\mathcal{P}_T(\mathcal{Y}),\,\mathcal{X}-\mathcal{T}\big\rangle\\
&=\langle\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle-\big\langle\mathcal{P}_T(\mathcal{Y})-\mathcal{U}\ast\mathcal{V}^\top,\,\mathcal{X}-\mathcal{T}\big\rangle-\langle\mathcal{P}_{T^\perp}(\mathcal{Y}),\mathcal{X}-\mathcal{T}\rangle\\
&=\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}-\big\langle\mathcal{P}_T(\mathcal{Y})-\mathcal{U}\ast\mathcal{V}^\top,\,\mathcal{P}_T(\mathcal{X}-\mathcal{T})\big\rangle-\langle\mathcal{P}_{T^\perp}(\mathcal{Y}),\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\rangle\\
&\ge\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}-\big\|\mathcal{P}_T(\mathcal{Y})-\mathcal{U}\ast\mathcal{V}^\top\big\|_F\,\|\mathcal{P}_T(\mathcal{X}-\mathcal{T})\|_F-\|\mathcal{P}_{T^\perp}(\mathcal{Y})\|\,\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}\\
&\ge\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}-\frac{1}{4}\sqrt{\frac{p}{n_3}}\cdot\sqrt{\frac{2n_3}{p}}\,\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}-\frac{1}{2}\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}\\
&\ge\frac{1}{8}\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}.
\end{aligned}
\tag{2.37}
\]
Inequality (2.37) results from Condition 1, Condition 2, and Lemma 2.14. Next, to complete the proof, it suffices to show that $\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}$ is strictly positive. We show this by contradiction. Suppose $\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}=0$; then $\mathcal{P}_T(\mathcal{X}-\mathcal{T})=\mathcal{X}-\mathcal{T}$ and $\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X}-\mathcal{T})=0$. Therefore, we have
\[
\|\mathcal{X}-\mathcal{T}\|=\|(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T)(\mathcal{X}-\mathcal{T})\|\le\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T\|\,\|\mathcal{X}-\mathcal{T}\|\le\frac{1}{2}\|\mathcal{X}-\mathcal{T}\|,
\]
where the last step uses the assumption $\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T\|\le\frac{1}{2}$; this forces $\mathcal{X}=\mathcal{T}$, a contradiction. Thus, $\mathcal{T}$ is the unique minimizer of the optimization problem (2.18). $\square$

Next we present a detailed proof of Theorem 2.6, our main theoretical result, demonstrating that our model ensures tensor recovery with high probability.

2.6.3 Proof of Theorem 2.6

In this section, we provide a detailed proof of our main theoretical result, Theorem 2.6. The proof is based on our Two-Step Tensor Completion (TSTC) algorithm. For the ease of the reader, we state the TSTC algorithm in Algorithm 2.7. This algorithm focuses on subtensor completion before combining the results with t-CUR.
Algorithm 2.7 Two-Step Tensor Completion (TSTC)
1: Input: $[\mathcal{T}]_{\Omega_R\cup\Omega_C}$: observed data; $\Omega_R,\Omega_C$: observation locations; $I,J$: lateral and horizontal indices; $r$: target rank; TC: the chosen tensor completion solver.
2: $\widetilde{\mathcal{R}}=\mathrm{TC}([\mathcal{R}]_{\Omega_R},\,r)$
3: $\widetilde{\mathcal{C}}=\mathrm{TC}([\mathcal{C}]_{\Omega_C},\,r)$
4: $\widetilde{\mathcal{U}}=[\widetilde{\mathcal{C}}]_{I,:,:}$
5: $\widetilde{\mathcal{T}}=\widetilde{\mathcal{C}}\ast\widetilde{\mathcal{U}}^\dagger\ast\widetilde{\mathcal{R}}$
6: Output: $\widetilde{\mathcal{T}}$: approximation of $\mathcal{T}$

Based on the idea of TSTC, it is crucial to understand how the tensor incoherence properties of the original low-tubal-rank tensor transfer to its subtensors.

2.6.3.1 Incoherence passes to subtensors

Inspired by [17, Theorem 3.5], we explore how subtensors inherit the tensor incoherence conditions from the original tensor, differing from [99] in the tensor norm and in the definition of the tensor incoherence condition. Our focus is on subtensor incoherence due to its impact on the sampling rate required for accurate low-tubal-rank tensor recovery (Theorem 2.5) and our emphasis on completing subtensors in tensor completion. We begin by examining the relationship between the tensor incoherence properties of subtensors and those of the original low-tubal-rank tensor.

Lemma 2.15. Let $\mathcal{T}\in\mathbb{K}^{n_1\times n_2\times n_3}$ satisfy the tensor $\mu_0$-incoherence condition. Suppose that $\mathcal{T}$ has a compact t-SVD $\mathcal{T}=\mathcal{W}\ast\Sigma\ast\mathcal{V}^\top$ and condition number $\kappa$. Consider the subtensors $\mathcal{C}=[\mathcal{T}]_{:,J,:}$ and $\mathcal{R}=[\mathcal{T}]_{I,:,:}$, each maintaining the same tubal-rank as $\mathcal{T}$, with compact t-SVDs $\mathcal{C}=\mathcal{W}_C\ast\Sigma_C\ast\mathcal{V}_C^\top$ and $\mathcal{R}=\mathcal{W}_R\ast\Sigma_R\ast\mathcal{V}_R^\top$. Then the following results hold:
\[
\mu_C\le\kappa^2\,\big\|[\mathcal{V}]_{J,:,:}^\dagger\big\|^2\,\frac{|J|}{n_2}\,\mu_0\qquad\text{and}\qquad\mu_R\le\kappa^2\,\big\|[\mathcal{W}]_{I,:,:}^\dagger\big\|^2\,\frac{|I|}{n_1}\,\mu_0.
\]

Proof. First, let us prove that
\[
\max_i\big\|[\widehat{\mathcal{W}}_C]_{:,:,k}^\top\cdot e_i\big\|_F\le\sqrt{\frac{\mu_0 r}{n_1}}.
\]
Notice that $\mathcal{C}=[\mathcal{T}]_{:,J,:}=\mathcal{W}\ast\Sigma\ast[\mathcal{V}]_{J,:,:}^\top$. Assume the compact t-SVD of $\Sigma\ast([\mathcal{V}]_{J,:,:})^\top$ is $\Sigma\ast([\mathcal{V}]_{J,:,:})^\top=\mathcal{P}\ast\mathcal{S}\ast\mathcal{Q}^\top$. Then $\mathcal{P}\in\mathbb{K}^{r\times r\times n_3}$ is an orthogonal tensor, leading to $\mathcal{W}_C=\mathcal{W}\ast\mathcal{P}$ based on the relationship $\mathcal{C}=\mathcal{W}\ast\mathcal{P}\ast\mathcal{S}\ast\mathcal{Q}^\top$. Moreover, $\mathcal{P}^\top\ast\mathcal{P}=\mathcal{I}$ implies that $[\widehat{\mathcal{P}}]_{:,:,k}^\top\cdot[\widehat{\mathcal{P}}]_{:,:,k}=I_r$ for all $k\in[n_3]$, where $I_r$ is the $r\times r$ identity matrix. Therefore, we can establish that for $k\in[n_3]$,
\[
\max_i\big\|[\widehat{\mathcal{W}}_C]_{:,:,k}^\top\cdot e_i\big\|_F=\max_i\big\|[\widehat{\mathcal{P}}]_{:,:,k}^\top[\widehat{\mathcal{W}}]_{:,:,k}^\top\cdot e_i\big\|_F\le\max_i\big\|[\widehat{\mathcal{W}}]_{:,:,k}^\top\cdot e_i\big\|_F\le\sqrt{\frac{\mu_0 r}{n_1}}.
\tag{2.38}
\]
Next, let us prove $\max_i\|[\widehat{\mathcal{V}}_C^\top]_{:,:,k}\cdot e_i\|_F\le\kappa\,\|([\mathcal{V}]_{J,:,:})^\dagger\|\sqrt{\frac{\mu_0 r}{n_2}}$. The compact t-SVD of $\mathcal{C}$ implies $\mathcal{V}_C^\top=\Sigma_C^\dagger\ast\mathcal{W}_C^\top\ast\mathcal{C}$. Thus, for each $k\in[n_3]$, $[\widehat{\mathcal{V}}_C^\top]_{:,:,k}=[\widehat{\Sigma}_C^\dagger]_{:,:,k}\cdot[\widehat{\mathcal{W}}_C^\top]_{:,:,k}\cdot[\widehat{\mathcal{C}}]_{:,:,k}$ holds, and $\|[\widehat{\mathcal{V}}_C^\top]_{:,:,k}\cdot e_i\|_F$ can be bounded by
\[
\begin{aligned}
\big\|[\widehat{\mathcal{V}}_C^\top]_{:,:,k}\cdot e_i\big\|_F
&=\big\|[\widehat{\Sigma}_C^\dagger]_{:,:,k}\cdot[\widehat{\mathcal{W}}_C^\top]_{:,:,k}\cdot[\widehat{\mathcal{W}}]_{:,:,k}\cdot[\widehat{\Sigma}]_{:,:,k}\cdot[\widehat{\mathcal{V}}]_{J,:,k}^\top\cdot e_i\big\|_F
\le\big\|[\widehat{\Sigma}_C^\dagger]_{:,:,k}\big\|\,\big\|[\widehat{\Sigma}]_{:,:,k}\big\|\,\big\|[\widehat{\mathcal{V}}]_{J,:,k}^\top\cdot e_i\big\|_F\\
&\le\|\mathcal{C}^\dagger\|\,\|\mathcal{T}\|\,\big\|[\widehat{\mathcal{V}}]_{:,:,k}^\top\cdot e_i\big\|_F
=\big\|([\mathcal{V}]_{J,:,:}^\top)^\dagger\ast\Sigma^\dagger\ast\mathcal{W}^\top\big\|\,\|\mathcal{T}\|\,\big\|[\widehat{\mathcal{V}}]_{:,:,k}^\top\cdot e_i\big\|_F\\
&\le\big\|([\mathcal{V}]_{J,:,:}^\top)^\dagger\big\|\,\|\mathcal{T}^\dagger\|\,\|\mathcal{T}\|\,\big\|[\widehat{\mathcal{V}}]_{:,:,k}^\top\cdot e_i\big\|_F
\le\kappa\,\big\|([\mathcal{V}]_{J,:,:}^\top)^\dagger\big\|\sqrt{\frac{\mu_0 r}{n_2}}.
\end{aligned}
\]
That is,
\[
\max_i\big\|[\widehat{\mathcal{V}}_C^\top]_{:,:,k}\cdot e_i\big\|_F\le\kappa\,\big\|[\mathcal{V}]_{J,:,:}^\dagger\big\|\sqrt{\frac{\mu_0|J|}{n_2}\cdot\frac{r}{|J|}}.
\tag{2.39}
\]
Combining (2.38) and (2.39), we can conclude that $\mu_C\le\kappa^2\|[\mathcal{V}]_{J,:,:}^\dagger\|^2\frac{|J|}{n_2}\mu_0$. Applying the same process to $\mathcal{R}$, we get $\mu_R\le\kappa^2\|[\mathcal{W}]_{I,:,:}^\dagger\|^2\frac{|I|}{n_1}\mu_0$. $\square$

Following Lemma 2.15, we explore the incoherence properties of uniformly sampled subtensors, with the results summarized below.

Lemma 2.16. Let $\mathcal{T}\in\mathbb{K}^{n_1\times n_2\times n_3}$ with multi-rank $\vec{r}$, and let $\mathcal{T}=\mathcal{W}\ast\Sigma\ast\mathcal{V}^\top$ be its compact t-SVD. Additionally, $\mathcal{T}$ satisfies the tensor $\mu_0$-incoherence condition, and $\kappa$ denotes the condition number of $\mathcal{T}$. Suppose $I\subseteq[n_1]$ and $J\subseteq[n_2]$ are chosen uniformly at random with replacement. Then
\[
\mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T}),\qquad\mu_C\le\frac{25}{4}\kappa^2\mu_0\qquad\text{and}\qquad\mu_R\le\frac{25}{4}\kappa^2\mu_0
\]
hold with probability at least $1-\frac{1}{n_1^\beta}-\frac{1}{n_2^\beta}$, provided that $|I|\ge 2\beta\mu_0\|\vec{r}\|_\infty\log(n_1\|\vec{r}\|_1)$ and $|J|\ge 2\beta\mu_0\|\vec{r}\|_\infty\log(n_2\|\vec{r}\|_1)$.

Proof of Lemma 2.16. According to Lemma 2.3, by setting $\delta=0.815$ and $\beta_1=\beta_2=\beta$, we can easily get
\[
\mathbb{P}\left(\big\|[\mathcal{V}]_{J,:,:}^\dagger\big\|\le\sqrt{\frac{25n_2}{4|J|}},\ \mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T})\right)\ge 1-\frac{1}{n_2^\beta},\qquad
\mathbb{P}\left(\big\|[\mathcal{W}]_{I,:,:}^\dagger\big\|\le\sqrt{\frac{25n_1}{4|I|}},\ \mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{T})\right)\ge 1-\frac{1}{n_1^\beta}.
\tag{2.40}
\]
Therefore,
\[
\mathbb{P}\left(\mu_C\le\frac{25}{4}\kappa^2\mu_0,\ \mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T})\right)\ge 1-\frac{1}{n_2^\beta},\qquad
\mathbb{P}\left(\mu_R\le\frac{25}{4}\kappa^2\mu_0,\ \mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{T})\right)\ge 1-\frac{1}{n_1^\beta}.
\tag{2.41}
\]
Combining (2.40) and (2.41), we can conclude that
$\mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T})$, $\mu_C\le\frac{25}{4}\kappa^2\mu_0$ and $\mu_R\le\frac{25}{4}\kappa^2\mu_0$ hold with probability at least $1-\frac{1}{n_1^\beta}-\frac{1}{n_2^\beta}$, provided that $|I|\ge 2\beta\mu_0\|\vec{r}\|_\infty\log(n_1\|\vec{r}\|_1)$ and $|J|\ge 2\beta\mu_0\|\vec{r}\|_\infty\log(n_2\|\vec{r}\|_1)$. $\square$

2.6.3.2 Proof of Theorem 2.6

Proof of Theorem 2.6. Note that $I\subseteq[n_1]$ and $J\subseteq[n_2]$ are chosen uniformly with replacement. According to Lemma 2.16, we thus have that $\mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T})$, $\mu_C\le\frac{25}{4}\kappa^2\mu_0$ and $\mu_R\le\frac{25}{4}\kappa^2\mu_0$ hold with probability at least
\[
1-\frac{1}{n_1^{800\beta\kappa^2\log(n_1n_3+n_2n_3)}}-\frac{1}{n_2^{800\beta\kappa^2\log(n_1n_3+n_2n_3)}}
=1-\frac{1}{(n_1n_3+n_2n_3)^{800\beta\kappa^2\log(n_1)}}-\frac{1}{(n_1n_3+n_2n_3)^{800\beta\kappa^2\log(n_2)}},
\]
provided that
\[
|I|\ge 3200\beta\mu_0 r\kappa^2\log^2(n_1n_3+n_2n_3)\ge 800\kappa^2\log(n_1n_3+n_2n_3)\,\beta\cdot\big(2\mu_0\|\vec{r}\|_\infty\log(n_1\|\vec{r}\|_1)\big),
\]
\[
|J|\ge 3200\beta\mu_0 r\kappa^2\log^2(n_1n_3+n_2n_3)\ge 800\kappa^2\log(n_1n_3+n_2n_3)\,\beta\cdot\big(2\mu_0\|\vec{r}\|_\infty\log(n_2\|\vec{r}\|_1)\big).
\]
Additionally, the following statements hold by Theorem 2.5 and the conditions $\mu_C\le\frac{25}{4}\kappa^2\mu_0$ and $\mu_R\le\frac{25}{4}\kappa^2\mu_0$:

i) Given $\mathcal{C}\in\mathbb{K}^{n_1\times|J|\times n_3}$ with $\mathrm{rank}(\mathcal{C})=r$,
\[
p_C\ge\frac{1600\beta(n_1+|J|)\mu_0 r\kappa^2\log^2(n_1n_3+|J|n_3)}{n_1|J|}
\]
for some $\beta>1$ ensures that $\mathcal{C}$ is the unique minimizer of
\[
\min_{\mathcal{X}\in\mathbb{K}^{n_1\times|J|\times n_3}}\|\mathcal{X}\|_{\mathrm{TNN}},\quad\text{subject to }\mathcal{P}_{\Omega_C}(\mathcal{X})=\mathcal{P}_{\Omega_C}(\mathcal{C}),
\]
with probability at least $1-\frac{3\log(n_1n_3+|J|n_3)}{(n_1n_3+|J|n_3)^{4\beta-2}}$.

ii) Given $\mathcal{R}\in\mathbb{K}^{|I|\times n_2\times n_3}$ with $\mathrm{rank}(\mathcal{R})=r$,
\[
p_R\ge\frac{1600\beta(n_2+|I|)\mu_0 r\kappa^2\log^2(n_2n_3+|I|n_3)}{n_2|I|}
\]
for some $\beta>1$ ensures that $\mathcal{R}$ is the unique minimizer of
\[
\min_{\mathcal{X}\in\mathbb{K}^{|I|\times n_2\times n_3}}\|\mathcal{X}\|_{\mathrm{TNN}},\quad\text{subject to }\mathcal{P}_{\Omega_R}(\mathcal{X})=\mathcal{P}_{\Omega_R}(\mathcal{R}),
\]
with probability at least $1-\frac{3\log(n_2n_3+|I|n_3)}{(n_2n_3+|I|n_3)^{4\beta-2}}$.

Once $\mathcal{C}$ and $\mathcal{R}$ are uniquely recovered from $\Omega_C$ and $\Omega_R$, respectively, the t-CUR decomposition provides the reconstruction of $\mathcal{T}$ via $\mathcal{T}=\mathcal{C}\ast\mathcal{U}^\dagger\ast\mathcal{R}$ under the condition $\mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T})$. Combining all the statements above, we conclude that $\mathcal{T}$ can be uniquely recovered from $\Omega_C\cup\Omega_R$ with probability at least
\[
1-\frac{2}{(n_1n_3+n_2n_3)^{800\beta\kappa^2\log(n_2)}}-\frac{3\log(n_1n_3+|J|n_3)}{(n_1n_3+|J|n_3)^{4\beta-2}}-\frac{3\log(n_2n_3+|I|n_3)}{(n_2n_3+|I|n_3)^{4\beta-2}}.\qquad\square
\]

2.7 Conclusion

In this work, we present the t-CCS model, an extension of the matrix CCS model to the tensor setting. We provide both theoretical and experimental evidence demonstrating the flexibility and feasibility of the t-CCS model. The ITCURC algorithm, designed for the t-CCS model, provides a balanced trade-off between runtime efficiency and reconstruction quality. While it is not as effective as the state-of-the-art Bernoulli-based TC algorithm, it is still comparable in terms of PSNR and SSIM. Thus, one direction of our future research will focus on enhancing reconstruction quality through the integration of the $M$-product.
From the theoretical side, our current result shows that the t-CUR sampling scheme, as a special case of the t-CCS model, admits a sufficient sampling complexity of $\mathcal{O}\big(\mu_0 r n_3(n_2\log(n_1n_3)+n_1\log(n_2n_3))\big)$, which is more sampling-efficient than that of a general t-CCS scheme. This finding suggests there is potential to further improve the theoretical sampling complexity for the t-CCS model, an aspect we plan to explore in our future work. Additionally, there is a need for a comprehensive theoretical analysis of the convergence behavior of the ITCURC algorithm within the t-CCS model framework. Evaluating the algorithm's robustness against additive noise will also be a critical area of focus for future research. Furthermore, while our current work is limited to third-order tensors, we aim to extend our approach to accommodate higher-order tensor configurations in subsequent studies.

CHAPTER 3
ON THE ROBUSTNESS OF CROSS-CONCENTRATED SAMPLING FOR MATRIX COMPLETION

ABSTRACT

Matrix completion is essential in data science for recovering missing entries in partially observed data. Recently, cross-concentrated sampling (CCS), a novel approach to matrix completion, has gained attention, though its robustness against sparse outliers remains unaddressed. In this chapter, we propose the Robust CCS Completion problem to explore this robustness and introduce a non-convex iterative algorithm called Robust CUR Completion (RCURC). Our experiments with synthetic and real-world datasets confirm that RCURC is both efficient and robust to sparse outliers, making it a powerful tool for Robust Matrix Completion.

3.1 Introduction

The matrix completion problem, first introduced by Candes et al. [25] and Recht [97], aims to reconstruct a low-rank matrix $X$ from a limited subset of its observed entries. In practice, many real-world data matrices are highly incomplete, and this problem has emerged as an important tool for uncovering latent structures in the data. The significance of matrix completion lies in its broad applicability across numerous domains, such as recovering missing data in recommendation systems [7, 45], improving image quality [29, 61], and enhancing the efficiency and accuracy of signal processing techniques [22, 13].

In its simplest form, the goal of matrix completion is to estimate the unknown entries of a matrix $X$ given access only to a small fraction of the entries. Mathematically, this can be described as solving for $X$ given observations from the set $\Omega$, which contains the indices of known entries in $X$. A common assumption is that the matrix $X$ is of low rank, meaning that it can be described by a small number of underlying factors or components. The challenge arises from ensuring that the reconstructed matrix accurately captures the underlying low-rank structure without overfitting to the noise or sparsity of the observations. Traditional matrix completion methods often rely on uniform or Bernoulli sampling strategies, where each entry of the matrix is sampled independently with a fixed probability. However, this approach can be inefficient, particularly when the data exhibits specific structures or when some rows and columns are more informative than others.

Recent advancements in the matrix completion field have introduced a novel sampling method known as CCS [19].
Unlike uniform sampling, which treats all entries equally, CCS focuses on strategically sampling entries from certain rows and columns to achieve a more informative set of observations. By concentrating the samples in regions of the matrix that are more likely to contain useful information, CCS can lead to more accurate matrix recovery with fewer sampled entries. Algorithm 3.1 outlines the key steps in the CCS procedure.

Algorithm 3.1 Cross-Concentrated Sampling (CCS) [19]
1: Input: the data matrix $Y$.
2: Uniformly select a subset of row indices $I$ and column indices $J$.
3: Set $R = [Y]_{I,:}$ (rows indexed by $I$) and $C = [Y]_{:,J}$ (columns indexed by $J$).
4: Uniformly sample entries within the selected rows $R$ and columns $C$, recording the sampled locations as $\Omega_R$ and $\Omega_C$, respectively.
5: Output: Return the observed entries $[Y]_{\Omega_R\cup\Omega_C}$ and the indices $\Omega_R$, $\Omega_C$, $I$, $J$.

(a) Uniform Sampling  (b) CCS–Less Concentrated  (c) CCS–More Concentrated  (d) CUR Sampling
Figure 3.1 [19] Visual comparison of sampling schemes: from uniform to CUR sampling at the same observation rate. Colored pixels indicate observed entries, black pixels indicate missing ones.

As shown in Figure 3.1, the CCS model bridges two commonly used sampling methods in matrix completion: Uniform Sampling and CUR Sampling. Uniform sampling randomly selects entries from the entire matrix, while CUR sampling focuses on selecting entire rows and columns for observation. The CCS approach can be viewed as a hybrid method, offering additional flexibility by concentrating samples on selected rows and columns, with a theoretical basis for achieving better recovery in certain structured datasets.

Despite the advantages of CCS, matrix completion in real-world applications often encounters a significant challenge: data corruption by sparse outliers. In many scenarios, the observed matrix is not simply low-rank but is also corrupted by noise or outliers that are sparsely distributed. Such outliers can arise from various sources, such as user input errors in recommendation systems or sensor malfunctions in signal processing. To address this, Robust Matrix Completion methods have been developed, which introduce a sparse matrix $S$ to model the outliers, while ensuring that the underlying low-rank matrix $X$ is accurately recovered. A crucial question that remains is whether CCS-based matrix completion is robust to sparse outliers when used with robust recovery algorithms. Specifically, we ask:

Question 1 ([18]). Is CCS-based matrix completion robust to sparse outliers under some robust algorithms, like the uniform sampling model?

To address this, we examine the Robust CCS Completion problem, where we are given partial observations $\mathcal{P}_\Omega(Y)$ of a corrupted data matrix $Y = X + S$, where $X$ is a low-rank matrix and $S$ represents sparse outliers. The objective is to simultaneously recover both $X$ and $S$ using CCS-based sampling. The problem is formulated as follows:

$$\min_{X,S}\ \frac{1}{2}\big\langle \mathcal{P}_\Omega(X+S-Y),\, X+S-Y\big\rangle \quad \text{subject to} \quad \operatorname{rank}(X) = r,\ S \text{ is } \alpha\text{-sparse},$$

where the sampling operator $\mathcal{P}_\Omega$ is defined as

$$\mathcal{P}_\Omega(Y) = \sum_{(i,j)\in\Omega} [Y]_{i,j}\, e_i e_j^{\top},$$

and $\Omega$ denotes the set of observed indices generated by the CCS model. The sparse component $S$ accounts for outliers, enabling more accurate recovery of the underlying low-rank matrix $X$.
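To make the sampling model and the corrupted observations concrete, the following minimal NumPy sketch (not taken from the thesis; the dimensions, rates, and variable names are illustrative) generates a CCS-style observation set $\Omega = \Omega_R\cup\Omega_C$ for a corrupted matrix $Y = X + S$. For simplicity, rows and columns are drawn without replacement.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 300, 5

# Low-rank ground truth X and a sparse outlier matrix S
X = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
S = np.zeros((d, d))
corrupted = rng.random((d, d)) < 0.05               # roughly 5% corrupted entries
S[corrupted] = rng.uniform(-10, 10, corrupted.sum())
Y = X + S

# Algorithm 3.1, step 2: uniformly select row indices I and column indices J
I = rng.choice(d, size=int(0.3 * d), replace=False)
J = rng.choice(d, size=int(0.3 * d), replace=False)

# Algorithm 3.1, step 4: sample entries uniformly inside the selected rows/columns
p = 0.25
Omega_R = np.zeros((d, d), dtype=bool)
Omega_C = np.zeros((d, d), dtype=bool)
Omega_R[I, :] = rng.random((len(I), d)) < p
Omega_C[:, J] = rng.random((d, len(J))) < p

# Partial observations P_Omega(Y) for the Robust CCS Completion problem
Omega = Omega_R | Omega_C
Y_obs = np.where(Omega, Y, 0.0)
print(f"overall observation rate: {Omega.mean():.3f}")
```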
This framework extends the principles of Robust Matrix Completion, incorporating the novel CCS sampling approach. 75 3.1.1 Related Work The problem of low-rank matrix recovery in the presence of sparse outliers has been well- studied under the settings of uniform sampling and Bernoulli sampling. This problem is known as robust principal component analysis (RPCA) when the corrupted data matrix is fully observed, and it is called Robust Matrix Completion if data is partially observed. The seminal work [23] considers both RPCA and Robust Matrix Completion problems via convex relaxed formulations and provides recovery guarantees. In particular, under the ๐œ‡-incoherence assumption for the low- rank ๐‘‹, [23] requires the positions of outliers to be placed uniformly at random, and at least 0.1๐‘›2 entries are observed uniformly at random. Later, a series of non-convex algorithms [31, 78, 89, 132, 137, 12, 112, 20] tackle RPCA and/or Robust Matrix Completion problems with an im- proved, non-random ๐›ผ-sparsity assumptions for the outlier matrix ๐‘†. The typical recovery guarantee shows a linear convergence of a non-convex algorithm, provided ๐›ผ โ‰ค O (1/poly(๐œ‡๐‘Ÿ)); moreover, O (poly(๐œ‡๐‘Ÿ)polylog(๐‘›)๐‘›) random samples are typically required for the Robust Matrix Comple- tion cases. Another line of work [30, 22, 136, 11, 13] focuses on the robust recovery of structured low-rank matrices, e.g., Hankel matrices, and they typically require merely O (poly(๐œ‡๐‘Ÿ)polylog(๐‘›)) samples by utilizing the structure, even in the presence of structured outliers. More recently, [15, 17, 59] study the robust CUR decomposition problem, that is, recovering the low-rank matrix from row- and column-wise observations with entry-wise corruptions. On the other hand, [19] shows that CCS-based matrix completion requires O (๐œ‡2๐‘Ÿ 2๐‘› log2 ๐‘›) samples which is only a factor of log ๐‘› worse than the state-of-the-art result; however, its outlier tolerance has not been studied. 3.1.2 Notation For a matrix ๐‘€, [๐‘€]๐‘–, ๐‘— , [๐‘€] ๐ผ,:, [๐‘€]:,๐ฝ, and [๐‘€] ๐ผ,๐ฝ denote its (๐‘–, ๐‘—)-th entry, its row submatrix with row indices ๐ผ, its column submatrix with column indices ๐ฝ, and its submatrix with row indices ๐ผ and column indices ๐ฝ, respectively. ๐‘€ โ€  represents the Mooreโ€“Penrose inverse of ๐‘€. We use (cid:104)ยท, ยท(cid:105) to denote the Frobenius inner product. The symbol [๐‘›] denotes the set {1, 2, ยท ยท ยท , ๐‘›} for all ๐‘› โˆˆ Z+. Throughout this chapter, uniform sampling is referred to as uniform sampling with replacement. 76 3.1.3 ๐œ‡-incoherence and ๐›ผ-sparsity In Robust Matrix Completion, we often rely on certain structural assumptions about the matrix to be recovered. Two such pivotal assumptions are ๐œ‡-incoherence [26] and ๐›ผ-sparsity [23]. These assumptions play crucial roles in ensuring that the recovery algorithm can effectively reconstruct the matrix even when a significant portion of its entries are missing or corrupted by noise. The concept of ๐œ‡-incoherence is pivotal in the field of matrix completion, designed to ensure a balanced distribution of information across all rows and columns of a matrix. This balance is crucial for preventing any single row or column from disproportionately influencing the overall content of the matrix, which is particularly important when attempting to recover or approximate a matrix from a partial set of its entries. 
Informally, a matrix is described as $\mu$-incoherent when its singular vectors are such that no individual component dominates. This is quantified through boundedness conditions on the entries of the singular vectors, which ensure that the matrix's structural information is uniformly distributed. We formalize this intuitive concept with the following definition:

Definition 21 ($\mu$-incoherence [26]). Let $X \in \mathbb{R}^{n_1\times n_2}$ be a matrix of rank $r$. The matrix $X$ is said to be $\mu$-incoherent if the following conditions hold for its compact singular value decomposition $X = U\Sigma V^{\top}$:
$$\|U\|_{2,\infty} \le \sqrt{\frac{\mu r}{n_1}} \quad \text{and} \quad \|V\|_{2,\infty} \le \sqrt{\frac{\mu r}{n_2}},$$
where $\|\cdot\|_{2,\infty}$ denotes the maximum $\ell_2$-norm among the rows of $U$ and $V$, respectively, and $\mu$ is a positive scalar that quantifies the level of incoherence.

This definition encapsulates how $\mu$-incoherence functions as a safeguard against skewed data representation in matrix completion tasks, facilitating algorithms that require uniformly spread singular vectors for effective reconstruction.

The $\alpha$-sparsity assumption plays a critical role in the analysis of matrix structures, particularly within the domain of Robust Matrix Completion. This assumption pertains to the density of non-zero entries in the matrix, or more specifically, in its constituent components such as the sparse error matrix. The sparsity parameter $\alpha$ serves as a threshold, dictating the maximum allowable proportion of non-zero entries in each row and each column of the matrix, thus ensuring a controlled spread of these entries throughout the matrix.

Definition 22 ($\alpha$-sparsity [23]). Consider a matrix $S \in \mathbb{R}^{n_1\times n_2}$. We define $S$ as $\alpha$-sparse if no more than an $\alpha$ fraction of its entries in each row and each column are non-zero. Specifically, for every row index $i \in [n_1]$ and every column index $j \in [n_2]$, the matrix satisfies
$$\|[S]_{i,:}\|_0 \le \alpha n_2 \quad \text{and} \quad \|[S]_{:,j}\|_0 \le \alpha n_1,$$
where $\|\cdot\|_0$ denotes the number of non-zero entries.

The assumptions of $\mu$-incoherence and $\alpha$-sparsity are crucial because they directly influence the feasibility and complexity of the matrix recovery process in Robust Matrix Completion. These conditions ensure that the matrix has a well-distributed singular vector structure and a manageable number of outliers or corruptions, which are key for successful recovery using optimization-based methods [23, 26].
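As a concrete illustration, the following NumPy sketch (not from the thesis; sizes and names are illustrative) computes the smallest $\mu$ for which a given matrix satisfies Definition 21 and checks Definition 22 for a candidate sparse matrix:

```python
import numpy as np

def incoherence(X):
    """Smallest mu for which X satisfies Definition 21."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))                  # numerical rank
    U, V = U[:, :r], Vt[:r, :].T
    n1, n2 = X.shape
    mu_U = n1 / r * np.max(np.sum(U ** 2, axis=1))     # (n1/r) * ||U||_{2,inf}^2
    mu_V = n2 / r * np.max(np.sum(V ** 2, axis=1))     # (n2/r) * ||V||_{2,inf}^2
    return max(mu_U, mu_V)

def is_alpha_sparse(S, alpha):
    """Definition 22: at most an alpha fraction of nonzeros in every row and column."""
    n1, n2 = S.shape
    rows_ok = np.all(np.count_nonzero(S, axis=1) <= alpha * n2)
    cols_ok = np.all(np.count_nonzero(S, axis=0) <= alpha * n1)
    return bool(rows_ok and cols_ok)

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 150))   # rank-3 matrix
S = np.zeros((200, 150))
S[rng.random((200, 150)) < 0.02] = 5.0                               # ~2% outliers
print(f"mu(X) = {incoherence(X):.2f}, S is 0.1-sparse: {is_alpha_sparse(S, 0.1)}")
```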
3.1.4 CUR Approximation

CUR approximation, also referred to as skeleton decomposition, forms the foundation of our algorithm design. To provide context, we briefly review some key concepts of CUR approximation, which plays a crucial role in matrix dimensionality reduction.

CUR approximation is a powerful and interpretable technique that addresses the challenge of reducing matrix dimensionality while preserving meaningful structure. Given a rank-$r$ matrix $X \in \mathbb{R}^{n\times n}$, the matrix can be reconstructed by selecting appropriate rows and columns that span its row and column spaces, respectively. This method offers an intuitive way to approximate matrices by extracting representative submatrices. The theoretical underpinnings of this approach have been established in prior research, as outlined in the following theorem:

Theorem 3.1 ([87, 58]). Consider row and column index sets $I, J \subseteq [n]$. Denote the submatrices $C = [X]_{:,J}$, $U = [X]_{I,J}$, and $R = [X]_{I,:}$. If $\operatorname{rank}(U) = \operatorname{rank}(X)$, then $X = CU^{\dagger}R$.

An extensive body of literature has significantly contributed to the development of sampling methods in matrix CUR approximation. Key works include those by Achlioptas and McSherry [2], Ahmadi-Asl et al. [3], Drineas et al. [41], Boutsidis and Woodruff [10], Hamm and Huang [58], Cai et al. [19], and Martinsson [88], among many others [76, 123, 35, 85, 126, 119, 87, 128, 111, 77, 67, 127, 36, 92, 106, 27, 44, 124, 93, 39, 134, 34, 46, 47, 118, 55, 75]. In fact, sampling a sufficient number of rows and columns makes this condition highly likely. An example of such a sampling strategy is provided in Theorem 3.2.

Theorem 3.2 ([32, Theorem 1.1]). Let $X$ satisfy Definition 21, and suppose we sample $|I| = \mathcal{O}(r\log n)$ rows and $|J| = \mathcal{O}(r\log n)$ columns uniformly at random. Then $\operatorname{rank}(U) = \operatorname{rank}(X)$ with probability at least $1 - \mathcal{O}(rn^{-2})$.

3.2 Proposed Algorithm

In this section, we introduce a novel non-convex algorithm for solving the Robust CCS Completion problem formulated in Section 3.1. The algorithm builds upon the projected gradient descent framework, integrating the CUR decomposition to efficiently compute low-rank approximations at each iteration. This approach significantly reduces computational complexity while maintaining robust performance.

The proposed algorithm, named Robust CUR Completion (RCURC), is outlined in Algorithm 3.2. RCURC leverages the CUR approximation, where selected rows and columns are used to capture the essential structure of the matrix. By iteratively updating both the sparse and low-rank components, RCURC aims to solve the matrix completion problem with high efficiency, even in the presence of sparse outliers. Specifically, in each iteration, the algorithm alternates between updating the sparse matrix using a thresholding operator and refining the low-rank matrix through projected gradient updates on the observed data. The method ensures that the low-rank component is efficiently approximated using the CUR decomposition, which focuses on key rows and columns to reduce the dimensionality of the problem.

Algorithm 3.2 Robust CUR Completion (RCURC)
1: Input: $[Y = X + S]_{\Omega_R\cup\Omega_C}$: observed data; $\Omega_R, \Omega_C$: observation locations; $I, J$: row and column indices that define $R$ and $C$, respectively; $\eta_R, \eta_C$: step sizes; $r$: target rank; $\varepsilon$: target precision level; $\tau_0$: initial thresholding value; $\gamma$: thresholding decay parameter.
2: $X_0 = M_0$; $k = 0$
3: while $e_k > \varepsilon$ do    // $e_k$ is defined in (3.3)
4:    // Updating the sparse component
5:    $\tau_{k+1} = \gamma^k \tau_0$
6:    $[S_{k+1}]_{I,:} = \mathcal{T}_{\tau_{k+1}}([Y - X_k]_{I,:})$
7:    $[S_{k+1}]_{:,J} = \mathcal{T}_{\tau_{k+1}}([Y - X_k]_{:,J})$
8:    // Updating the low-rank component
9:    $R_{k+1} = [X_k]_{I,:} + \eta_R [\mathcal{P}_{\Omega_R}(Y - X_k - S_{k+1})]_{I,:}$
10:   $C_{k+1} = [X_k]_{:,J} + \eta_C [\mathcal{P}_{\Omega_C}(Y - X_k - S_{k+1})]_{:,J}$
11:   $U_{k+1} = \mathcal{H}_r([R_{k+1}]_{:,J} \uplus [C_{k+1}]_{I,:})$
12:   $[R_{k+1}]_{:,J} = U_{k+1}$ and $[C_{k+1}]_{I,:} = U_{k+1}$
13:   $X_{k+1} = C_{k+1}U_{k+1}^{\dagger}R_{k+1}$    // Do not compute (see (3.2))
14:   $k = k + 1$
15: end while
16: Output: $C_k$, $U_k$, $R_k$: the CUR components of the estimated low-rank matrix.
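The conceptual update $X_{k+1} = C_{k+1}U_{k+1}^{\dagger}R_{k+1}$ in Algorithm 3.2 rests on the CUR identity of Theorem 3.1. A quick numerical check of that identity (a hedged sketch with illustrative sizes, not part of the thesis) is:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 200, 4

# Rank-r matrix and uniformly sampled index sets of size O(r log n)
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
I = rng.choice(n, size=8 * r, replace=False)
J = rng.choice(n, size=8 * r, replace=False)

C, U, R = X[:, J], X[np.ix_(I, J)], X[I, :]

# Theorem 3.1: if rank(U) = rank(X), then X = C U^dagger R exactly
assert np.linalg.matrix_rank(U) == np.linalg.matrix_rank(X)
X_cur = C @ np.linalg.pinv(U) @ R
print(np.linalg.norm(X - X_cur) / np.linalg.norm(X))   # ~1e-13
```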
We will explain the algorithm step by step in the following paragraphs. For clarity, we begin with the low-rank component. To leverage the structure of cross-concentrated samples, it is efficient to enforce the low-rank constraint on $X$ using the CUR approximation technique. Let $R = [X]_{I,:}$, $C = [X]_{:,J}$, and $U = [X]_{I,J}$. Applying gradient descent directly on $R$ and $C$ yields
$$R_{k+1} = [X_k]_{I,:} + \eta_R\big[\mathcal{P}_{\Omega_R}(Y - X_k - S_{k+1})\big]_{I,:},$$
$$C_{k+1} = [X_k]_{:,J} + \eta_C\big[\mathcal{P}_{\Omega_C}(Y - X_k - S_{k+1})\big]_{:,J},$$
where $\eta_R$ and $\eta_C$ are the step sizes. However, when it comes to the intersection submatrix $U$, the update is more complicated, as $\Omega_R$ and $\Omega_C$ can have overlaps. We abuse the notation $\uplus$ and define an operator called the union sum:
$$\big[[R_{k+1}]_{:,J} \uplus [C_{k+1}]_{I,:}\big]_{i,j} =
\begin{cases}
[R_{k+1}]_{i,j} & \text{if } (i,j) \in \Omega_R \setminus \Omega_C;\\
[C_{k+1}]_{i,j} & \text{if } (i,j) \in \Omega_C \setminus \Omega_R;\\
\frac{\eta_R\eta_C}{\eta_R+\eta_C}\left(\frac{[R_{k+1}]_{i,j}}{\eta_R} + \frac{[C_{k+1}]_{i,j}}{\eta_C}\right) & \text{if } (i,j) \in \Omega_C \cap \Omega_R;\\
0 & \text{otherwise}.
\end{cases}$$
Basically, we take whatever value we have for the non-overlapped entries and take a weighted average for the overlaps, where the weights are determined by the step sizes used in the updates of $R_{k+1}$ and $C_{k+1}$.

To ensure the rank-$r$ constraint, at least one of $C_{k+1}$, $U_{k+1}$, or $R_{k+1}$ should be rank-$r$. For computational efficiency, we choose to enforce it on the smallest one. Thus,
$$U_{k+1} = \mathcal{H}_r\big([R_{k+1}]_{:,J} \uplus [C_{k+1}]_{I,:}\big),$$
where $\mathcal{H}_r$ is the best rank-$r$ approximation operator via truncated SVD. After replacing the intersection part $U_{k+1}$ in the previously updated $R_{k+1}$ and $C_{k+1}$, we have the new estimate of the low-rank component:
$$X_{k+1} = C_{k+1}U_{k+1}^{\dagger}R_{k+1}. \qquad (3.1)$$
However, (3.1) is just a conceptual step and one should never compute it. In fact, the full matrix $X_k$ is never needed and should not be formed in the algorithm, as updating the corresponding CUR components is sufficient.

We detect the outliers and put them into the sparse matrix $S$ via the hard-thresholding operator
$$[\mathcal{T}_\tau(M)]_{i,j} =
\begin{cases}
0 & \text{if } |[M]_{i,j}| < \tau;\\
[M]_{i,j} & \text{otherwise}.
\end{cases}$$
Hard-thresholding the residual $Y - X_k$, paired with iteratively decayed thresholding values $\tau_{k+1} = \gamma^k\tau_0$ for some $\gamma \in (0,1)$, has shown promising performance for outlier detection in prior art [12, 20, 14]. Notice that we only need to remove outliers located on the selected rows and columns, i.e., $R$ and $C$, since they are the only components needed to update the low-rank component later. Therefore, for computational efficiency, we should only compute $X_k$ on the selected rows and columns to update $S_{k+1}$ correspondingly; as noted, one should never compute the full $X_k$ in this algorithm. In particular,
$$[X_k]_{I,:} = [C_k]_{I,:}U_k^{\dagger}R_k \quad \text{and} \quad [X_k]_{:,J} = C_kU_k^{\dagger}[R_k]_{:,J}. \qquad (3.2)$$
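For concreteness, the two operators used above translate directly into NumPy as follows (a sketch, not the authors' implementation; the block and mask names are mine):

```python
import numpy as np

def hard_threshold(M, tau):
    """T_tau: keep entries with magnitude at least tau, zero out the rest."""
    return np.where(np.abs(M) < tau, 0.0, M)

def union_sum(R_blk, C_blk, mask_R, mask_C, eta_R, eta_C):
    """Union sum of the two estimates of the intersection block [X]_{I,J}.

    R_blk, C_blk:   the |I| x |J| blocks [R_{k+1}]_{:,J} and [C_{k+1}]_{I,:}
    mask_R, mask_C: boolean masks of Omega_R and Omega_C restricted to (I, J)
    """
    out = np.zeros_like(R_blk)
    only_R = mask_R & ~mask_C
    only_C = mask_C & ~mask_R
    both = mask_R & mask_C
    out[only_R] = R_blk[only_R]
    out[only_C] = C_blk[only_C]
    # weighted average on the overlap, weighted by the step sizes
    w = (eta_R * eta_C) / (eta_R + eta_C)
    out[both] = w * (R_blk[both] / eta_R + C_blk[both] / eta_C)
    return out
```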
The stopping criterion is set to be $e_k \le \varepsilon$, where $\varepsilon$ is the targeted accuracy and the computational error is
$$e_k = \frac{\big\langle \mathcal{P}_{\Omega_R\cup\Omega_C}(S_k + X_k - Y),\, S_k + X_k - Y\big\rangle}{\big\langle \mathcal{P}_{\Omega_R\cup\Omega_C}(Y),\, Y\big\rangle}. \qquad (3.3)$$
The recommended step sizes are $\eta_R = \frac{1}{p_R}$ and $\eta_C = \frac{1}{p_C}$, where $p_R$ and $p_C$ are the observation rates of $\Omega_R$ and $\Omega_C$, respectively. Smaller step sizes should be used with larger $\alpha$, i.e., more outliers.

3.3 Numerical Experiments

In this section, we verify the empirical performance of RCURC with both synthetic and real datasets. All experiments are implemented using MATLAB Online (R2024a, Update 6) and executed on a cloud-based Linux environment running Ubuntu 20.04 with kernel version 5.15.0-1062-AWS.

3.3.1 Synthetic Datasets

In this simulation, we assess the computational efficiency of our algorithm, RCURC, in addressing the robust CCS completion problem. We construct $Y = X + S$, a $d\times d$ matrix with $d = 3000$, where $X = WV^{\top}$ is a randomly generated rank-$r$ matrix. To create the sparse outlier matrix $S$, we randomly select an $\alpha$ fraction of the entries to form the support of $S$. The values of the non-zero entries are then uniformly sampled from the range $[-c\,\mathbb{E}(|[X]_{i,j}|),\, c\,\mathbb{E}(|[X]_{i,j}|)]$. To generate the robust CCS completion problems, we set $\frac{|I|}{d} = \frac{|J|}{d} = 30\%$ and $\frac{|\Omega_R|}{|I|d} = \frac{|\Omega_C|}{|J|d} = 25\%$. The results are obtained by averaging over 10 runs and reported in Figure 3.2. Both plots in Figure 3.2 depict the relationship between the relative error $e_k$ and computational time for our RCURC method with varying rank $r$ and outlier amplification factor $c$. It is noteworthy that RCURC consistently achieves nearly linear convergence rates across different scenarios. The empirical convergence illustrated in the left subplot of Figure 3.2 shows that smaller $r$ values allow the algorithm to achieve a given relative error in fewer iterations. This is likely because smaller $r$ values minimize the impact of noise during the iterative process, enabling the algorithm to concentrate on the dominant low-rank structure of the matrix, which results in faster convergence.

Figure 3.2 Empirical convergence of RCURC [18]. Left: $c = 10$ and varying $r$. Right: $\alpha = 0.2$, $r = 5$ and varying $c$.

3.3.2 Video Background Subtraction

We have applied RCURC under our robust CCS model to the problem of background separation. We evaluated our algorithm on the Train Station dataset [110]. The dataset is of size $173\times216\times500$. In order to transform this data into a low-rank matrix, a specific reconfiguration process is applied: each frontal slice of the tensor, which is essentially an individual frame of the video, is vectorized and stacked. That is, we flatten the height and width dimensions (173 and 216) into a single dimension while retaining the frame dimension. This reshaping converts the original 3-dimensional tensor into a 2-dimensional matrix, facilitating subsequent computations. The resulting matrix has a size of 37,368 (the product of 173 and 216) by 500 (the number of frames).
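The reshaping described above can be written in one line; the following sketch (illustrative only, with a random array standing in for the actual video data) shows the frame-wise flattening:

```python
import numpy as np

# Stand-in for the 173 x 216 x 500 video tensor (height x width x frames)
video = np.random.rand(173, 216, 500)

# Flatten height and width into one dimension, keep the frame dimension:
# column f of M is the vectorized f-th frame, so M is 37368 x 500
M = video.reshape(173 * 216, 500)
print(M.shape)   # (37368, 500)
```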
The CCS model is constructed by selecting 5% of the rows and 5% of the columns to create subrow and subcolumn matrices, with a sampling rate of 80% on each submatrix. We selected several benchmark algorithms for comparison, including Principal Component Pursuit (RPCA) [23], Stable Principal Component Pursuit (SPCP) [144], Low-Rank Matrix Factorization (LRMF) [81], Accelerated Alternating Projection (AccAltProj) [12], Iterated Robust CUR with fixed indices (IRCUR-F) [15], and Iterated Robust CUR with resampled indices (IRCUR-R) [15]. However, all six benchmark algorithms are operated under the full observation model, since we find they do not visually perform well on the constructed CCS model. Since IRCUR-F and IRCUR-R are both CUR-based algorithms, we set the sampling parameters for IRCUR-R and IRCUR-F to $1.7732\times10^2$ and $4.0227$, respectively. This configuration ensures that these two algorithms access 5% of the rows and 5% of the columns from the full observation data in each iteration. The methods RPCA, SPCP, AccAltProj, IRCUR-F, IRCUR-R, and RCURC are configured to run for a maximum of 50 iterations or until they meet a convergence tolerance of $10^{-3}$, whichever condition is satisfied first. We manually set the regularization parameter for RPCA and SPCP to 0.001, as it yielded the best visual results during tuning over the values 0.001, 0.01, 0.1, 1, and 10. The LRMF method is configured to run with a maximum of 5 iterations and a rank of 1. After manual tuning, where the rank parameter is tested with values 1, 3, and 5, we selected rank 1 as it provided the best balance between visual quality and runtime efficiency. Increasing the rank provided no significant improvement in visual quality while substantially increasing the runtime. Similarly, increasing the number of iterations beyond 5 did not result in noticeable improvements in visual quality but significantly extended the runtime. For AccAltProj, IRCUR-R, IRCUR-F, and RCURC, we select a rank parameter of 1 as it yields the best visual results. This choice is based on tuning the rank parameter over the values 1, 3, 5, 7, and 9. The visual results are shown in Figure 3.3, consisting of five selected frames (80th, 160th, 240th, 320th, and 400th). We present the corresponding quantitative results in Table 3.1, where the comparison is performed using the Peak Signal-to-Noise Ratio (PSNR) to evaluate reconstruction accuracy and computational time to assess efficiency. The PSNR is calculated by comparing the reconstructed background, obtained through the different methods, against the ground truth background. The ground truth is created by replicating the first frame, which represents a static background, across all frames in the dataset.
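As a reference for how such a PSNR value can be computed, here is a small sketch (one common convention for the peak value; the thesis does not spell out the exact formula used, and the arrays below are stand-ins):

```python
import numpy as np

def psnr(x_hat, x_true):
    """Peak signal-to-noise ratio in dB between a reconstruction and the ground truth."""
    mse = np.mean((x_hat - x_true) ** 2)
    peak = np.max(np.abs(x_true))
    return 10.0 * np.log10(peak ** 2 / mse)

# Ground truth: the first frame replicated across all 500 frames (static background)
frames = np.random.rand(37368, 500)                 # stand-in for the reshaped video
ground_truth = np.tile(frames[:, [0]], (1, 500))    # copy column 0 into every column
background_hat = ground_truth + 0.01 * np.random.randn(37368, 500)
print(f"PSNR: {psnr(background_hat, ground_truth):.2f} dB")
```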
For deterministic algorithms like RPCA, SPCP, and LRMF, each quantitative result is calculated as the average of ten independent runs to help mitigate errors from machine precision, floating-point arithmetic, or other low-level numerical issues. In contrast, IRCUR-R and IRCUR-F involve randomness in row and column selection under full observation during iterations, and our method, based on the inherently random CCS sampling model, naturally involves randomness. Thus, each quantitative result for these three randomized methods is computed as the average of ten independent runs, with the standard deviation included.

Table 3.1 Comparison of runtime and PSNR among RPCA, SPCP, LRMF, AccAltProj, IRCUR-R, and IRCUR-F based on full observation, and RCURC based on the CCS model.

Method        Runtime (sec)    PSNR
RPCA          13.55            28.35
SPCP          36.86            32.04
LRMF          41.75            34.62
AccAltProj    3.32             41.65
IRCUR-R       0.31 ± 0.02      41.40 ± 0.19
IRCUR-F       0.18 ± 0.03      41.06 ± 0.22
RCURC         0.20 ± 0.05      40.60 ± 0.58

The visualization results in Figure 3.3 illustrate the performance of the various methods in reconstructing static background components. Rows 3, 4, and 5 highlight the effectiveness of RPCA, SPCP, and LRMF when applied to full observation. While these methods generally produce consistent and comparable results, subtle imperfections are noticeable in certain frames, such as the 160th frame, where minor blurring or incomplete restoration occurs. Rows 6, 7, and 8, corresponding to AccAltProj, IRCUR-R, and IRCUR-F, demonstrate notable performance in reconstructing background components. These methods effectively handle the background separation task, delivering visually satisfactory outputs. It is evident that our method (last row) performs well in background subtraction. The results are comparable to other state-of-the-art algorithms under full observation, indicating the success of our method in the video background separation task. The results in Table 3.1 provide a quantitative comparison of runtime and PSNR across the various methods. The RCURC algorithm achieves a runtime of 0.20 ± 0.05 seconds, which is significantly faster than RPCA (13.55 seconds), SPCP (36.86 seconds), LRMF (41.75 seconds), and AccAltProj (3.32 seconds). When compared to IRCUR-R (0.31 ± 0.02 seconds) and IRCUR-F (0.18 ± 0.03 seconds), RCURC remains highly competitive in terms of runtime. Regarding PSNR, our method achieves a value of 40.60 ± 0.58, which is higher than RPCA (28.35), SPCP (32.04), and LRMF (34.62). Although the PSNR values of IRCUR-R (41.40 ± 0.19) and IRCUR-F (41.06 ± 0.22) are slightly higher, our method still demonstrates competitive performance.

80th frame    160th frame    240th frame    320th frame    400th frame
Figure 3.3 Video background subtraction results: Row 1 shows the original images (full observation) at the corresponding frames, while Row 2 presents the observed images generated by the CCS model at the respective frames. Rows 3 to 8 showcase the background components extracted using the RPCA, SPCP, LRMF, AccAltProj, IRCUR-R, and IRCUR-F algorithms based on the full observation model. Row 9 presents the results obtained using the RCURC algorithm under the CCS model.

3.4 Conclusion

This chapter introduces a novel mathematical model for robust matrix completion problems with cross-concentrated samples. A highly efficient non-convex algorithm, dubbed RCURC, has been developed for the proposed model. The key techniques are projected gradient descent and CUR approximation. The numerical experiments, with both synthetic and real datasets, show high potential. In particular, we consistently observe linear convergence of RCURC.

As for future work, we will study the statistical properties of the proposed robust CCS completion model, such as theoretical sample complexities and outlier tolerance. The recovery guarantee with a linear convergence rate will also be established for RCURC.
We will also aim to give a theoretical analysis explaining why a smaller rank accelerates the convergence of our RCURC algorithm, and we will explore other real-world applications that suit the proposed model.

CHAPTER 4
CONCLUSION

In this thesis, we have addressed critical challenges in matrix and tensor analysis by developing robust and flexible methodologies for data recovery tasks in data science. The methodologies proposed in this thesis contribute to the advancement of matrix and tensor analysis, particularly in scenarios where robustness and flexibility are critical. By addressing the challenges posed by noise, sparsity, and complex data structures, the proposed techniques have the potential to benefit a wide range of applications, such as image processing. Furthermore, the theoretical foundations established for robust sampling and decomposition provide a framework for future extensions in related fields. Our contributions span two interconnected projects, each tackling fundamental limitations in existing approaches while extending their applicability to real-world scenarios characterized by noise, sparsity, and high dimensionality. This chapter summarizes the key contributions of the thesis.

4.1 Summary of Contributions

Guaranteed Sampling Flexibility for Tensor Completion

In this project, we address the limitations of existing tensor completion methods by introducing Tensor Cross-Concentrated Sampling (t-CCS), a generalization of CCS to higher-order tensors. Accompanying this sampling framework, we develop the Iterative Tensor CUR Completion (ITCURC) algorithm, which offers theoretical guarantees for low-tubal-rank tensor recovery. Through rigorous theoretical analysis and extensive empirical validation, this project demonstrates the flexibility, accuracy, and computational efficiency of the t-CCS-based model.

Robust CCS Completion for Matrix Analysis

In this project, we explore the robustness of CCS for matrix completion. While CCS has demonstrated effectiveness in capturing cross-sectional dependencies, its sensitivity to sparse outliers posed a significant limitation. To address this, we propose the Robust CCS Completion framework, introducing a non-convex iterative algorithm designed to handle noisy and incomplete data. Experiments on synthetic and real-world datasets validate our algorithm's efficiency and robustness, establishing it as a robust tool for practical data completion tasks.

BIBLIOGRAPHY

[1] Benefits and risks of MRI. Accessed: 2023-12-19.

[2] D. Achlioptas and F. McSherry. Fast computation of low-rank matrix approximations. Journal of the ACM (JACM), 54(2):9–es, 2007.

[3] S. Ahmadi-Asl, C. F. Caiafa, A. Cichocki, A. H. Phan, T. Tanaka, I. Oseledets, and J. Wang. Cross tensor approximation methods for compression and dimensionality reduction. IEEE Access, 9:150809–150838, 2021.

[4] S. Ahmadi-Asl, A. H. Phan, A. Cichocki, A. Sozykina, Z. A. Aghbari, J. Wang, and I. Oseledets. Adaptive cross tubal tensor approximation. arXiv:2305.05030, 2023.

[5] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.

[6] H. Avron and C. Boutsidis. Faster subset selection for matrices and applications. SIAM Journal on Matrix Analysis and Applications, 34(4):1464–1499, 2013.

[7] J. Bennett, C. Elkan, B. Liu, P. Smyth, and D. Tikk. KDD cup and workshop 2007. SIGKDD Explor. Newsl., 9(2):51–52, 2007.

[8] A. Bhaskara, A. Rostamizadeh, J. Altschuler, M.
Zadimoghaddam, T. Fu, and V. Mirrokni. Greedy column subset selection: New bounds and distributed algorithms. International Con- ference on Machine Learning, 2016. C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix recon- struction. SIAM Journal on Computing, 43(2):687โ€“717, 2014. [10] C. Boutsidis and D. Woodruff. Optimal CUR matrix decompositions. SIAM Journal on Computing, 46(2):543โ€“589, 2017. [11] H. Cai, J.-F. Cai, T. Wang, and G. Yin. Accelerated structured alternating projections for ro- bust spectrally sparse signal recovery. IEEE Transactions on Signal Processing, 69:809โ€“821, 2021. [12] H. Cai, J.-F. Cai, and K. Wei. Accelerated alternating projections for robust principal com- ponent analysis. Journal of Machine Learning Research, 20(1):685โ€“717, 2019. [13] H. Cai, J.-F. Cai, and J. You. Structured gradient descent for fast robust low-rank Hankel matrix completion. SIAM Journal on Scientific Computing., 45(3):A1172โ€“A1198, 2023. [14] H. Cai, Z. Chao, L. Huang, and D. Needell. Fast robust tensor principal component analysis via fiber CUR decomposition. In Proceedings of the IEEE/CVF International Conference 90 on Computer Vision (ICCV) Workshops, pages 189โ€“197, 2021. [15] H. Cai, K. Hamm, L. Huang, J. Li, and T. Wang. Rapid robust principal component analysis: CUR accelerated in exact low rank estimation. IEEE Signal Processing Letters, 28:116โ€“120, 2020. [16] H. Cai, K. Hamm, L. Huang, and D. Needell. Mode-wise tensor decompositions: Multi- dimensional generalizations of CUR decompositions. Journal of Machine Learning Re- search, 22(185):1โ€“36, 2021. [17] H. Cai, K. Hamm, L. Huang, and D. Needell. Robust CUR decomposition: Theory and imaging applications. SIAM Journal on Imaging Sciences, 14(4):1472โ€“1503, 2021. [18] H. Cai, L. Huang, C. Kundu, and B. Su. On the robustness of cross-concentrated sampling for matrix completion. In 2024 58th Annual Conference on Information Sciences and Systems (CISS), pages 1โ€“5, 2024. [19] H. Cai, L. Huang, P. Li, and D. Needell. Matrix completion with cross-concentrated sam- pling: Bridging uniform sampling and CUR sampling. IEEE Transactions on Pattern Anal- ysis and Machine Intelligence, 2023. [20] H. Cai, J. Liu, and W. Yin. Learned robust PCA: A scalable deep unfolding approach for high-dimensional outlier detection. In Advances in Neural Information Processing Systems, volume 34, pages 16977โ€“16989, 2021. [21] J.-F. Cai, R. H. Chan, and Z. Shen. A framelet-based image inpainting algorithm. Applied and Computational Harmonic Analysis, 24(2):131โ€“149, 2008. [22] J.-F. Cai, T. Wang, and K. Wei. Fast and provable algorithms for spectrally sparse signal re- construction via low-rank Hankel matrix completion. Applied and Computational Harmonic Analysis, 46(1):94โ€“121, 2019. [23] E. Candรจs, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3):1โ€“37, 2011. [24] E. Candes and B. Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111โ€“119, 2012. [25] E. J. Candรจs and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717โ€“772, 2009. [26] E. J. Candรจs and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE transactions on information theory, 56(5):2053โ€“2080, 2010. [27] C. Chen, M. Gu, Z. Zhang, W. Zhang, and Y. Yu. Efficient spectrum-revealing cur matrix 91 decomposition. 
In International Conference on Artificial Intelligence and Statistics, pages 766โ€“775. Proceedings of Machine Learning Research, 2020. [28] J. Chen, Y. Wei, and Y. Xu. Tensor CUR decomposition under t-product and its perturbation. Numerical Functional Analysis and Optimization, 43(6):698โ€“722, 2022. [29] P. Chen and D. Suter. Recovering the missing components in a large noisy low-rank ma- trix: Application to sfm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8):1051โ€“1063, 2004. [30] Y. Chen and Y. Chi. Robust spectral compressed sensing via structured matrix completion. IEEE Transactions on Information Theory, 60(10):6576โ€“6601, 2014. [31] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis. Low-rank matrix recovery from errors and erasures. IEEE Transactions on Information Theory, 59(7):4324โ€“4337, 2013. [32] J. Chiu and L. Demanet. Sublinear randomized algorithms for skeleton decompositions. SIAM Journal on Matrix Analysis and Applications., 34(3):1361โ€“1383, 2013. [33] A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, D. P. Mandic, et al. Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompo- sitions. Foundations and Trendsยฎ in Machine Learning, 9(4-5):249โ€“429, 2016. [34] K. L. Clarkson and D. Woodruff. Numerical linear algebra in the streaming model. In Pro- ceedings of the forty-first annual ACM symposium on Theory of computing, pages 205โ€“214, 2009. [35] A. Deshpande and L. Rademacher. Efficient volume sampling for row/column subset se- In 2010 ieee 51st annual symposium on foundations of computer science, pages lection. 329โ€“338. IEEE, 2010. [36] Y. Dong and P.-G. Martinsson. Simpler is better: A comparative study of randomized algo- rithms for computing the CUR decomposition. arXiv preprint arXiv:2104.05877, 2021. [37] D. L. Donoho. Compressed sensing. IEEE Transactions on information theory, 52(4):1289โ€“1306, 2006. [38] P. Drineas, R. Kannan, and M. Mahoney. Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition. SIAM Journal on Computing, 36(1):184โ€“206, 2006. [39] P. Drineas, R. Kannan, and M. Mahoney. Fast monte carlo algorithms for matrices iii: Computing a compressed approximate matrix decomposition. SIAM Journal on Comput- ing, 36(1):184โ€“206, 2006. 92 [40] P. Drineas, M. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications., 30(2):844โ€“881, 2008. [41] P. Drineas, M. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30(2):844โ€“881, 2008. [42] G. Ely, S. Aeron, N. Hao, and M. Kilmer. 5D seismic data completion and denoising using a novel class of tensor decompositions. Geophysics, 80:V83 โ€“ V95, 2015. [43] A. Gaur and S. S. Gaur. Statistical methods for practice and research: A guide to data analysis using SPSS. Sage, 2006. [44] P. Y. Gidisu and M. E. Hochstenbach. A hybrid DEIM and leverage scores based method for CUR index selection. In European Consortium for Mathematics in Industry, pages 147โ€“153. Springer, 2021. [45] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61โ€“70, 1992. [46] S. A. Goreinov, I. Oseledets, D. V. Savostyanov, E. E. Tyrtyshnikov, and N. L. Zamarashkin. How to find a good submatrix. 
In Matrix Methods: Theory, Algorithms And Applications: Dedicated to the Memory of Gene Golub, pages 247โ€“256. World Scientific, 2010. [47] S. A. Goreinov and E. E. Tyrtyshnikov. The maximal-volume concept in approximation by low-rank matrices. Contemporary Mathematics, 280:47โ€“52, 2001. [48] L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM Journal on Matrix Analysis and Applications., 31(4):2029โ€“2054, 2010. [49] L. Grasedyck and S. Krรคmer. Stable ALS approximation in the TT-format for rank-adaptive tensor completion. Numerische Mathematik, 143(4):855โ€“904, 2019. [50] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transac- tions on Information Theory, 57:1548โ€“1566, 2009. [51] W. Guo and J.-M. Qiu. A local macroscopic conservative (lomac) low rank tensor method for the vlasov dynamics. arXiv preprint arXiv:2207.00518, 2022. [52] W. Guo and J.-M. Qiu. A low rank tensor representation of linear transport and nonlin- ear vlasov solutions and their associated flow maps. Journal of Computational Physics, 458:111089, 2022. [53] W. Guo and J.-M. Qiu. A conservative low rank tensor method for the Vlasov dynamics. 46(1):A232โ€“A263, 2024. 93 [54] W. Hackbusch and S. Kรผhn. A new scheme for the tensor representation. Journal of Fourier analysis and applications, 15(5):706โ€“722, 2009. [55] N. Halko, P.-G. Martinsson, and J. Tropp. Finding structure with randomness: Prob- abilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217โ€“288, 2011. [56] K. Hamm. Generalized pseudoskeleton decompositions. Linear Algebra and Its Applica- tions, 664:236โ€“252, 2023. [57] K. Hamm and L. Huang. Perspectives on CUR decompositions. Applied and Computational Harmonic Analysis, 48(3):1088โ€“1099, 2020. [58] K. Hamm and L. Huang. Stability of sampling for CUR decompositions. Foundations of Data Science, 2(2):83, 2020. [59] K. Hamm, M. Meskini, and H. Cai. Riemannian CUR decompositions for robust principal component analysis. In ICML Workshop on Topology, Algebra, and Geometry in Machine Learning, 2022. [60] F. Hitchcock. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematical Physics, 6(1-4):164โ€“189, 1927. [61] Y. Hu, D. Zhang, J. Ye, X. Li, and X. He. Fast and accurate matrix completion via trun- IEEE Transactions on Pattern Analysis and Machine cated nuclear norm regularization. Intelligence, 35(9):2117โ€“2130, 2012. [62] P. Jain and S. Oh. Provable tensor factorization with missing data. In Advances in Neural Information Processing Systems, volume 2, page 1431 โ€“ 1439, 2014. [63] S. Jain, A. Gutierrez, and J. Haupt. Noisy tensor completion for tensors with a sparse canonical polyadic factor. In IEEE International Symposium on Information Theory, pages 2153โ€“2157, 2017. [64] T. Jiang, T. Huang, X. Zhao, and L. Deng. Multi-dimensional imaging data recovery via minimizing the partial sum of tubal nuclear norm. Journal of Computational and Applied Mathematics, 372:112680, 2020. [65] T. Jiang, T. Huang, X. Zhao, T. Ji, and L. Deng. Matrix factorization for low-rank tensor completion using framelet prior. Information Sciences, 436:403โ€“417, 2018. [66] T. Jiang, M. K. P. Ng, X. Zhao, and T. Huang. Framelet representation of tensor nuclear norm for third-order tensor completion. IEEE Transactions on Image Processing, 29:7233โ€“7244, 2020. 94 [67] R. Jin and S. Zhu. CUR algorithm with incomplete matrix observation. arXiv preprint arXiv:1403.5647, 2014. [68] H. Johan. 
Tensor rank is NP-complete. Journal of Algorithms, 4(11):644โ€“654, 1990. [69] M. Kilmer, K. Braman, N. Hao, and R. C. Hoover. Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging. SIAM Journal on Matrix Analysis and Applications., 34(1):148โ€“172, 2013. [70] M. Kilmer, L. Horesh, H. Avron, and E. Newman. Tensor-tensor products for optimal rep- resentation and compression. arXiv:2001.00046, 2019. [71] M. Kilmer and C. Martin. Factorization strategies for third-order tensors. Linear Algebra and Its Applications, 435(3):641โ€“658, 2011. [72] A. Kolbeinsson, J. Kossaifi, Y. Panagakis, A. Bulat, A. Anandkumar, I. Tzoulaki, and P. M. Matthews. Tensor dropout for robust learning. IEEE Journal of Selected Topics in Signal Processing, 15(3):630โ€“640, 2021. [73] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455โ€“500, 2009. [74] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender sys- tems. Computer, 42(8):30โ€“37, 2009. [75] S. Kumar, M. Mohri, and A. Talwalkar. Ensemble nystrom method. Advances in Neural Information Processing Systems, 22, 2009. [76] S. Kumar, M. Mohri, and A. Talwalkar. Sampling methods for the Nystrรถm method. Journal of Machine Learning Research, 13(1):981โ€“1006, 2012. [77] C. Li, X. Wang, W. Dong, J. Yan, Q. Liu, and H. Zha. Joint active learning with feature se- lection via CUR matrix decomposition. IEEE transactions on pattern analysis and machine intelligence, 41(6):1382โ€“1396, 2018. [78] X. Li. Compressed sensing and matrix completion with constant proportion of corruptions. Constructive Approximation, 37:73โ€“99, 2013. [79] X. Li and Y. Pang. Deterministic column-based matrix decomposition. IEEE Transactions on Knowledge and Data Engineering, 22(1):145โ€“149, 2010. [80] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion for estimating missing IEEE Transactions on Pattern Analysis and Machine Intelligence, values in visual data. 35(1):208โ€“220, 2013. 95 [81] Q. Liu and X. Li. Efficient low-rank matrix factorization based on โ„“1,๐œ€-norm for online background subtraction. IEEE Transactions on Circuits and Systems for Video Technology, 32(7):4900โ€“4904, 2021. [82] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin, and S. Yan. Tensor robust principal component anal- ysis: Exact recovery of corrupted low-rank tensors via convex optimization. In Proceedings of the IEEE conference on computer vision and pattern Recognition, pages 5249โ€“5257, 2016. [83] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin, and S. Yan. Tensor robust principal component anal- ysis with a new tensor nuclear norm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(04):925โ€“938, apr 2020. [84] C. Lu, J. Feng, Z. Lin, and S. Yan. Exact low tubal rank tensor recovery from Gaussian measurements. In Proc. 27th International Joint Conference on Artificial Intelligence, pages 2504โ€“2510, 2018. [85] L. Mackey, M. Jordan, and A. Talwalkar. Divide-and-conquer matrix factorization. Advances in neural information processing systems, 24, 2011. [86] M. Mahoney and P. Drineas. CUR matrix decompositions for improved data analy- sis. Proceedings of the National Academy of Sciences of the United States of America, 106(3):697โ€“702, 2009. [87] M. Mahoney and P. Drineas. CUR matrix decompositions for improved data analy- sis. Proceedings of the National Academy of Sciences of the United States of America, 106(3):697โ€“702, 2009. [88] P.-G. Martinsson and J. Tropp. 
Randomized numerical linear algebra: Foundations and algorithms. Acta Numerica, 29:403โ€“572, 2020. [89] P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain. Non-convex robust PCA. In Advances in Neural Information Processing Systems, pages 1107โ€“1115, 2014. [90] L. Omberg, G. H. Golub, and O. Alter. A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies. Proceedings of the National Academy of Sciences of the United States of America, 104(47):18371โ€“18376, 2007. [91] I. Oseledets. 33(5):2295โ€“2317, 2011. Tensor-train decomposition. SIAM Journal on Scientific Computing, [92] U. Oswal, S. Jain, K. S. Xu, and B. Eriksson. Block CUR: Decomposing matrices using groups of columns. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018, Dublin, Ireland, September 10โ€“14, 2018, Proceedings, Part II 18, pages 360โ€“376. Springer, 2019. 96 [93] V. Y. Pan, Q. Luan, J. Svadlenka, and L. Zhao. Superfast CUR matrix algorithms, their pre-processing and extensions. arXiv preprint arXiv:1710.07946, 2017. [94] R. Peng and E. Matsui. The Art of Data Science: A guide for anyone who works with Data. Skybrude Consulting, LLC, 2015. [95] J. Popa, S. Minkoff, and Y. Lou. An improved seismic data completion algorithm using low-rank tensor optimization: Cost reduction and optimal data orientation. Geophysics, 86(3):V219โ€“V232, 2021. [96] W. Qin, H. Wang, F. Zhang, J. Wang, X. Luo, and T. Huang. Low-rank high-order ten- sor completion with applications in visual data. IEEE Transactions on Image Processing, 31:2433โ€“2448, 2022. [97] B. Recht. A simpler approach to matrix completion. Journal of Machine Learning Research, 12(12), 2011. [98] S. Rendle. Factorization machines. In 2010 IEEE International conference on data mining, pages 995โ€“1000. IEEE, 2010. [99] M. M. Salut and D. Anderson. Tensor robust CUR for compression and denoising of hyper- spectral data. IEEE Access, 2023. [100] M. M. Salut and D. V. Anderson. Tensor robust cur for compression and denoising of hyper- spectral data. IEEE Access, 2023. [101] P. Shah, N. Rao, and G. Tang. Sparse and low-rank tensor decomposition. In Advances in Neural Information Processing Systems, volume 28, 2015. [102] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos. Tensor decomposition for signal processing and machine learning. IEEE Transactions on signal processing, 65(13):3551โ€“3582, 2017. [103] A. Sobral and E. Zahzah. Matrix and tensor completion algorithms for background model initialization: A comparative evaluation. Pattern Recognition Letters, 96:22โ€“33, 2017. [104] G. Song, M. K. P. Ng, and X. Zhang. Robust tensor completion using transformed tensor singular value decomposition. Numerical Linear Algebra with Applications, 27(3):e2299, 2020. [105] G. Song, M. K. P. Ng, and X. Zhang. Tensor completion by multi-rank via unitary transfor- mation. Applied and Computational Harmonic Analysis, 65:348โ€“373, 2023. [106] D. C. Sorensen and M. Embree. A deim induced CUR factorization. SIAM Journal on Scientific Computing, 38(3):A1454โ€“A1482, 2016. 97 [107] B. Su, J. You, H. Cai, and L. Huang. Guaranteed sampling flexibility for low-tubal-rank tensor completion. arXiv preprint arXiv:2406.11092, 2024. [108] Z. Tan, L. Huang, H. Cai, and Y. Lou. Non-convex approaches for low-rank tensor com- pletion under tubal sampling. 
In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1โ€“5, 2023. [109] D. A. Tarzanagh and G. Michailidis. Fast randomized algorithms for t-product based tensor operations and decompositions with applications to imaging data. SIAM Journal on Imaging Sciences, 11(4):2629โ€“2664, 2018. [110] D. Thirde, L. Li, and F. Ferryman. Overview of the PETS2006 challenge. In Proc. 9th IEEE international workshop on performance evaluation of tracking and surveillance (PETS 2006), pages 47โ€“50, 2006. [111] C. Thurau, K. Kersting, and C. Bauckhage. Deterministic CUR for improved large-scale data analysis: An empirical study. In Proceedings of the 2012 SIAM International Conference on Data Mining, pages 684โ€“695. SIAM, 2012. [112] T. Tong, C. Ma, and Y. Chi. Accelerating ill-conditioned low-rank matrix estimation via scaled gradient descent. Journal of Machine Learning Research, 22(150):1โ€“63, 2021. [113] T. Tong, C. Ma, A. Prater-Bennette, E. Tripp, and Y. Chi. Scaling and scalability: Provable nonconvex low-rank tensor estimation from incomplete measurements. Journal of Machine Learning Research, 23(163):1โ€“77, 2022. [114] J. Tropp. Column subset selection, matrix factorization, and eigenvalue optimization. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 978โ€“986. Society for Industrial and Applied Mathematics, 2009. [115] J. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computa- tional Mathematics, 12:389โ€“434, 2010. [116] J. Tropp. An introduction to matrix concentration inequalities. Foundations and Trendsยฎ in Machine Learning, 8(1-2):1โ€“230, 2015. [117] L. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279โ€“311, 1966. [118] E. Tyrtyshnikov. Incomplete cross approximation in the mosaic-skeleton method. Comput- ing, 64:367โ€“380, 2000. [119] S. Voronin and P.-G. Martinsson. Efficient algorithms for CUR and interpolative matrix decompositions. Advances in Computational Mathematics, 43:495โ€“516, 2017. 98 [120] A. Wang and Z. Jin. Near-optimal noisy low-tubal-rank tensor completion via singular tube thresholding. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pages 553โ€“560, 2017. [121] A. Wang, D. Wei, B. Wang, and Z. Jin. Noisy low-tubal-rank tensor completion through iterative singular tube thresholding. IEEE Access, 6:35112โ€“35128, 2018. [122] S. Wang and Z. Zhang. Improving CUR matrix decomposition and the nystrรถm approxi- mation via adaptive sampling. Journal of Machine Learning Research, 14(1):2729โ€“2769, 2013. [123] S. Wang and Z. Zhang. Improving CUR matrix decomposition and the nystrรถm approxi- mation via adaptive sampling. Journal of Machine Learning Research, 14(1):2729โ€“2769, 2013. [124] S. Wang, Z. Zhang, and T. Zhang. Towards more efficient spsd matrix approximation and CUR matrix decomposition. Journal of Machine Learning Research, 17(209):1โ€“49, 2016. [125] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: IEEE Transactions on Image Processing, from error visibility to structural similarity. 13(4):600โ€“612, 2004. [126] D. Woodruff et al. Sketching as a tool for numerical linear algebra. Foundations and Trendsยฎ in Theoretical Computer Science, 10(1โ€“2):1โ€“157, 2014. [127] C. Wu, H. Zhao, and J. Hu. Near field sampling compression based on matrix CUR de- In 2021 IEEE International Symposium on Antennas and Propagation and composition. 
USNC-URSI Radio Science Meeting (APS/URSI), pages 1455โ€“1456. IEEE, 2021. [128] M. Xu, R. Jin, and Z.-H. Zhou. CUR algorithm for partially observed matrices. In In- ternational Conference on Machine Learning, pages 1412โ€“1421. Proceedings of Machine Learning Research, 2015. [129] Y. Xu, R. Hao, W. Yin, and Z. Su. Parallel matrix factorization for low-rank tensor comple- tion. Inverse Problems and Imaging, 9(2):601โ€“624, 2015. [130] S. Xue, W. Qiu, F. Liu, and X. Jin. Low-rank tensor completion by truncated nuclear norm regularization. In 2018 24th International Conference on Pattern Recognition (ICPR), pages 2600โ€“2605. IEEE, 2018. [131] J. Yang, X. Zhao, T. Ma, Y. Chen, T. Huang, and M. Ding. Remote sensing images destriping using unidirectional hybrid total variation and nonconvex low-rank regularization. Journal of Computational and Applied Mathematics, 363:124โ€“144, 2020. [132] X. Yi, D. Park, Y. Chen, and C. Caramanis. Fast algorithms for robust PCA via gradient 99 descent. In Advances in Neural Information Processing Systems, pages 4152โ€“4160, 2016. [133] F. Zhang, J. Wang, W. Wang, and C. Xu. Low-tubal-rank plus sparse tensor recovery with IEEE Transactions on Pattern Analysis and Machine Intelli- prior subspace information. gence, 43(10):3492โ€“3507, 2020. [134] G. Zhang, H. Li, and Y. Wei. CPQR-based randomized algorithms for generalized cur de- compositions. Computational and Applied Mathematics, 43(3):132, 2024. [135] L. Zhang, L. Song, B. Du, and Y. Zhang. Nonlocal low-rank tensor completion for visual data. IEEE Transactions on Cybernetics, 51(2):673โ€“685, 2019. [136] S. Zhang and M. Wang. Correction of corrupted columns through fast robust hankel matrix completion. IEEE Transactions on Signal Processing, 67(10):2580โ€“2594, 2019. [137] T. Zhang and Y. Yang. Robust PCA by manifold optimization. Journal of Machine Learning Research, 19(1):3101โ€“3139, 2018. [138] Z. Zhang and S. Aeron. Exact tensor completion using t-SVD. IEEE Transactions on Signal Processing, 65(6):1511โ€“1526, 2017. [139] Z. Zhang, G. Ely, S. Aeron, N. Hao, and M. Kilmer. Novel methods for multilinear data completion and de-noising based on tensor-SVD. In Proceedings of the IEEE conference on computer vision and pattern Recognition, pages 3842โ€“3849, 2014. [140] Q. Zhao, L. Zhang, and A. Cichocki. Bayesian CP factorization of incomplete tensors with automatic rank determination. IEEE transactions on pattern analysis and machine intelli- gence, 37(9):1751โ€“1763, 2015. [141] X. Zhao, J. Yang, T. Ma, T. Jiang, M. K. P. Ng, and T. Huang. Tensor completion via complementary global, local, and nonlocal priors. IEEE Transactions on Image Processing, 31:984โ€“999, 2022. [142] Y. Zheng, T. Huang, X. Zhao, T. Jiang, T. Ma, and T. Ji. Mixed noise removal in hyperspectral image via low-fibered-rank regularization. IEEE Transactions on Geoscience and Remote Sensing, 58(1):734โ€“749, 2019. [143] P. Zhou, C. Lu, Z. Lin, and C. Zhang. Tensor factorization for low-rank tensor completion. IEEE Transactions on Image Processing, 27(3):1152โ€“1163, 2017. [144] Z. Zhou, X. Li, J. Wright, E. Candes, and Y. Ma. Stable principal component pursuit. In 2010 IEEE international symposium on information theory, pages 1518โ€“1522. IEEE, 2010. 100