ADVANCES IN MATRIX AND TENSOR ANALYSIS: FLEXIBLE AND ROBUST SAMPLING MODELS, ALGORITHMS, AND APPLICATIONS

By Bowen Su

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Applied Mathematics, Doctor of Philosophy

2025

ABSTRACT

This thesis investigates robust and flexible methods for matrix and tensor analysis, which are fundamental in data science. The primary focus of this work is the development of Guaranteed Sampling Flexibility for Low-Tubal-Rank Tensor Completion, a project aimed at addressing the limitations of existing sampling methods for tensor completion, such as Bernoulli and t-CUR sampling, which often lack flexibility across diverse applications. To overcome these challenges, we introduce Tensor Cross-Concentrated Sampling (t-CCS), an extension of the matrix Cross-Concentrated Sampling (CCS) model to tensors, and propose a novel non-convex algorithm, Iterative Tensor CUR Completion (ITCURC), specifically tailored for t-CCS-based tensor completion. Theoretical analysis provides sufficient conditions for low-rank tensor recovery and presents a detailed sampling complexity analysis. These findings are further validated through extensive testing on both real-world and synthetic datasets.

In addition to the main project, this thesis includes a complementary study. That study explores the robustness of the CCS model for matrix completion, a recent approach demonstrated to effectively capture cross-concentrated data dependencies. However, its robustness to sparse outliers has remained underexplored. To address this gap, we propose the Robust CCS Completion problem and develop a non-convex iterative algorithm, Robust CUR Completion (RCURC). Empirical results on synthetic and real-world datasets demonstrate that RCURC is both efficient and robust against outliers, making it a powerful tool for recovering incomplete data.

Collectively, these projects advance the robustness and flexibility of matrix and tensor methods, enhancing their applicability in complex, real-world data environments.

ACKNOWLEDGEMENTS

As I reach the culmination of my Ph.D. journey, I wish to extend my deepest gratitude to the Mathematics Department and the following professors, whose support and guidance have been instrumental to my growth.

I would first like to express my sincere gratitude to my committee chair and advisor, Professor Andrew Christlieb. His expertise in Computational Science and Engineering has been a constant source of inspiration, significantly shaping my research perspective and deepening my understanding of high-performance computing. His invaluable advice and steadfast encouragement, particularly during the most challenging moments of my graduate studies, have left a lasting impact on both my personal and professional development. I look forward to long-term opportunities to work with Prof. Andrew on the development of low-rank tensor approximation techniques for high-dimensional PDEs.

I am sincerely thankful to Professor Ekaterina Rapinchuk for her unwavering support and academic guidance. Her encouragement and insightful advice during my most difficult times provided me with strength and motivation. Her commitment to my success, from serving on my Ph.D. committee from the very beginning to offering steadfast support throughout my journey, has been truly indispensable. I look forward to long-term opportunities to collaborate with Prof. Ekaterina on machine learning projects.
I would like to express my sincere gratitude to Professor Yuying Xie for his continuous support and guidance. His encouragement has been invaluable, helping me stay focused and motivated throughout my doctoral journey. I look forward to long-term opportunities to collaborate with Prof. Xie on AI for science projects.

My heartfelt appreciation extends to Professor Mark Iwen for his generosity in sharing his expertise and professional wisdom. His thoughtful guidance and enduring support during his service on my Ph.D. committee have been invaluable.

Being part of the Department of Mathematics has been both an honor and a transformative experience. I am deeply grateful for the knowledge, skills, and unwavering support I have received. I sincerely extend my heartfelt appreciation to Professor Jeffery and Professor Rajesh for their dedication to fostering a supportive, friendly, and enriching academic environment. Additionally, I am deeply grateful for the collaborative and intellectually stimulating environment fostered by all math staff and all my fellow graduate students in the Mathematics Department. I would like to acknowledge Alvarado Taylor, Aldo Garcia Guinto, Bhusal Gokul, Ekblad Owen, Krupansky Nick, Kimble Jamie, Mandela Quashie, Stephen White, Whiting David, Yu Shen, Jie Yang, Peikai Qi, Shitan Xu, Boahen Edem, and many other remarkable peers, whose camaraderie and support have profoundly enriched my Ph.D. journey.

I would also like to express my heartfelt gratitude to my mentors at Los Alamos National Laboratory, Dr. Charles Abolt and Dr. Adam Atchley, for their warm invitation, invaluable guidance during my summer internship, and their proactive efforts to extend the opportunity for me to continue contributing part-time afterward. Their support and encouragement have been instrumental in my professional growth.

Last but not least, I would like to express my deepest gratitude to my family: my parents, Gang Su and Fang Wang, and my wife, Xuandi, whose unwavering support and encouragement have served as the bedrock of my journey. Your companionship and unshakable faith in my potential have been a true pillar of support, and I owe you a special debt of gratitude. Your understanding and encouragement have inspired me to push through obstacles, remain focused, and strive for excellence. Your collective love and sacrifices have made this journey not only possible but also deeply meaningful, motivating me to turn my aspirations into reality.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
CHAPTER 1   INTRODUCTION
CHAPTER 2   GUARANTEED SAMPLING FLEXIBILITY FOR LOW-TUBAL-RANK TENSOR COMPLETION
    2.1 Introduction
    2.2 Proposed sampling model
    2.3 Theoretical Results
    2.4 An efficient solver for t-CCS
    2.5 Numerical Experiments
    2.6 Proofs of Theoretical Results
    2.7 Conclusion
CHAPTER 3   ON THE ROBUSTNESS OF CROSS-CONCENTRATED SAMPLING FOR MATRIX COMPLETION
    3.1 Introduction
    3.2 Proposed Algorithm
    3.3 Numerical Experiments
    3.4 Conclusion
CHAPTER 4   CONCLUSION
    4.1 Summary of Contributions
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1  A comprehensive examination of the per-iteration computational cost for ITCURC.

Table 2.2  Image inpainting results on the Building and Window datasets. The best results are emphasized in bold, while the second-best results are underlined. ITCURC-δ refers to the ITCURC method with the percentages of selected horizontal and lateral slices set at a fixed rate of δ%. The t-CCS-based algorithms ITCURC-δ are performed under the t-CCS scheme, while the other, Bernoulli-based algorithms are performed under the Bernoulli sampling scheme.

Table 2.3  The quantitative results for MRI data completion are presented, with the best results in bold and the second-best underlined. ITCURC-δ represents the ITCURC method specifying that the selected proportion of horizontal and lateral slices is exactly δ%. The t-CCS-based algorithms ITCURC-δ are performed under the t-CCS scheme, while the other, Bernoulli-based algorithms are performed under the Bernoulli sampling scheme.

Table 2.4  Quantitative results for seismic data completion: TMac, TNN, F-TNN with Bernoulli sampling, and our method with t-CCS. Best results are in bold, and second-best are underlined. ITCURC-δ refers to the ITCURC method with the percentages of selected horizontal and lateral slices set at a fixed rate of δ%. The t-CCS-based algorithms ITCURC-δ are performed under the t-CCS scheme, while the other, Bernoulli-based algorithms are performed under the Bernoulli sampling scheme.

Table 3.1  Comparison of runtime and PSNR among RPCA, SPCP, LRMF, IRCUR-R, IRCUR-F based on full observation and RCURC based on the CCS model.

LIST OF FIGURES

Figure 2.1  Visualization of a horizontal, a lateral, and a frontal slice of a tensor T ∈ K^{n1×n2×n3}. The orange regions in the leftmost, middle, and rightmost subfigures are a horizontal slice, a lateral slice, and a frontal slice of T, respectively.

Figure 2.2  A Standard Tensor Lateral Basis.

Figure 2.3  A Standard Tubal Basis.

Figure 2.4  t-CUR decomposition.

Figure 2.5  Visual results of color image inpainting using t-CCS samples at an overall sampling rate of 20% with BCPF, TMac, TNN, and F-TNN algorithms.
Figure 2.6  (Row 1) 3D and (Row 2) 2D views illustrate ITCURC's empirical phase transition for the t-CCS model. δ = |I|/768 = |J|/768 shows the sampled index ratios, p is the Bernoulli sampling probability over the subtensors, and α is the overall tensor sampling rate. White and black in the 768 × 768 × 256 tensor results represent success and failure, respectively, across 25 tests for tubal-ranks 2, 5, and 7 (Columns 1-3). The α needed for success remains consistent across different combinations of δ and p.

Figure 2.7  The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.20.

Figure 2.8  The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.25.

Figure 2.9  The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.30.

Figure 2.10  The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.35.

Figure 2.11  The visualization of color image inpainting for the Building and Window datasets, setting tubal-rank r = 35 with the percentage of selected horizontal and lateral slices δ = 13% and overall sampling rate 20% for the ITCURC algorithm, while the other algorithms are applied under the Bernoulli sampling model with the same overall sampling rate of 20%. Additionally, the t-CCS samples on the Building for ITCURC are the same as those in Figure 2.5.

Figure 2.12  Visualizations of MRI data recovery using ITCURC with tubal-rank r = 35, lateral and horizontal slice selection rate δ = 27%, and an overall sampling rate of 30%. The other algorithms are applied under Bernoulli sampling with the same overall sampling rate. Results for slices 51, 66, 86, and 106 are shown in rows 1 to 4, with a 1.3× magnified area at the bottom left of each result for clearer comparison.

Figure 2.13  Visualization of seismic data recovery results, setting tubal-rank r = 3 for ITCURC with percentage of selected horizontal and lateral slices δ = 17% and overall sampling rate 28%, while the other methods are applied under the Bernoulli sampling model with the same overall sampling rate of 28%. Displayed are slices 15, 25, and 35 from top to bottom, with a 1.2× magnified area in each set for clearer comparison.

Figure 2.14  The structure of the proof of Theorem 2.5: The core of the proof for Theorem 2.5 relies on assessing the probability that certain conditions, specified in Proposition 1, are met. Condition I and Condition II serve as sufficient criteria to ensure the applicability of Proposition 1. Thus, the proof of Theorem 2.5 primarily involves determining the likelihood that Conditions I and II are satisfied. The probabilistic assessment of Condition I utilizes Lemma 2.8 as a fundamental instrument. Similarly, the evaluation of Condition II employs Lemmas 2.8 to 2.10 and Corollary 2.1 as essential tools.

Figure 3.1  [19] Visual comparison of sampling schemes: from uniform to CUR sampling at the same observation rate. Colored pixels indicate observed entries, black pixels indicate missing ones.
Figure 3.2  Empirical convergence of RCURC [18]. Left: c = 10 and varying r. Right: α = 0.2, r = 5 and varying c.

Figure 3.3  Video background subtraction results: Row 1 shows the original images (full observation) at the corresponding frames, while Row 2 presents the observed images generated by the CCS model at the respective frames. Rows 3 to 8 showcase the background components extracted using the RPCA, SPCP, LMRF, AccAltProj, IRCUR-R, and IRCUR-F algorithms based on the full observation model. Row 9 presents the results obtained using the RCURC algorithm under the CCS model.

LIST OF ALGORITHMS

Algorithm 2.1  t-Product based on Fast Fourier Transform (FFT)
Algorithm 2.2  Moore-Penrose inverse
Algorithm 2.3  t-SVD
Algorithm 2.4  Compact t-SVD
Algorithm 2.5  Tensor Cross-Concentrated Sampling (t-CCS)
Algorithm 2.6  Iterative CUR tensor completion for t-CCS (ITCURC)
Algorithm 2.7  Two-Step Tensor Completion (TSTC)
Algorithm 3.1  Cross-Concentrated Sampling (CCS) [19]
Algorithm 3.2  Robust CUR Completion (RCURC)

CHAPTER 1
INTRODUCTION

In an era of unprecedented data generation, extracting meaningful insights from complex, high-dimensional, and often incomplete datasets has become a cornerstone of data science [24, 37]. Matrix and tensor analysis, as foundational tools, provide versatile frameworks to address these challenges. Their applications span a diverse range of fields, including image and video processing [102, 73, 66], recommendation systems [74, 98], and scientific simulations [33, 53, 51, 52]. However, despite their versatility, existing methods often struggle in real-world scenarios characterized by noisy, sparsely observed, or intricately structured data. This thesis focuses on developing robust and flexible methodologies for matrix and tensor analysis, aiming to enhance their robustness to noise and outliers, improve their adaptability to diverse applications, and expand their theoretical underpinnings.

The primary focus of this thesis is on Guaranteed Sampling Flexibility for Low-Tubal-Rank Tensor Completion [107], addressing the limitations of conventional sampling strategies, such as Bernoulli [72, 121, 63] and t-CUR sampling [108, 100], which lack adaptability for diverse real-world applications. To overcome these challenges, this project introduces Tensor Cross-Concentrated Sampling (t-CCS), a generalization of the CCS model to higher-order tensors. Complementing this framework is the development of a novel non-convex algorithm, Iterative Tensor CUR Completion (ITCURC), specifically designed for t-CCS-based tensor completion. The project provides rigorous theoretical foundations, including sufficient conditions for low-tubal-rank tensor recovery and a detailed sampling complexity analysis. Extensive evaluations on synthetic and real-world datasets validate the superior performance of t-CCS and ITCURC in terms of accuracy, flexibility, and computational efficiency.
This work advances tensor analysis by addressing the challenges of high-dimensional, incomplete, and sparsely observed data.

The second project explores the Robustness of Cross-Concentrated Sampling (CCS) for Matrix Completion [18], a recent method that leverages cross-sectional dependencies to recover missing data. While CCS has shown promise in capturing essential patterns in data matrices, its vulnerability to sparse outliers, a common challenge in real-world datasets, remains an open question. This project introduces the Robust CCS Completion framework, extending CCS to handle noisy and incomplete data with resilience to outlier corruption. A non-convex iterative algorithm, Robust CUR Completion (RCURC), is developed to solve the Robust CCS Completion problem, and experimental results on synthetic and real-world datasets demonstrate the algorithm's robustness, efficiency, and scalability.

Together, these projects address critical gaps in matrix and tensor analysis, focusing on robustness, flexibility, and efficiency. The methodologies presented in this thesis not only tackle specific challenges but also provide a foundation for addressing a broader class of problems in data science, where noise, sparsity, and high dimensionality are pervasive. By proposing novel frameworks, designing practical algorithms, and establishing comprehensive theoretical insights, this work advances the state of the art in matrix and tensor analysis, paving the way for their application to increasingly complex and diverse data environments.

This thesis is structured as follows: Chapter 2 explores Tensor Cross-Concentrated Sampling and the Iterative Tensor CUR Completion algorithm. Chapter 3 introduces the Robust CCS Completion framework, detailing its methodology and application to matrix completion problems.

CHAPTER 2
GUARANTEED SAMPLING FLEXIBILITY FOR LOW-TUBAL-RANK TENSOR COMPLETION

ABSTRACT

While Bernoulli sampling is extensively studied in the field of tensor completion, and t-CUR sampling provides a way to approximate low-tubal-rank tensors via lateral and horizontal subtensors, both methods lack sufficient flexibility for diverse practical applications. To address this, we introduce Tensor Cross-Concentrated Sampling (t-CCS), an innovative and straightforward sampling model that advances the matrix cross-concentrated sampling concept within a tensor framework. t-CCS effectively bridges the gap between Bernoulli and t-CUR sampling, offering additional flexibility that can lead to computational savings in various contexts. A key aspect of our work is the comprehensive theoretical analysis provided. We establish a sufficient condition for the successful recovery of a low-rank tensor from its t-CCS samples. In support of this, we also develop a theoretical framework validating the feasibility of t-CUR via uniform random sampling and conduct a detailed theoretical sampling complexity analysis for tensor completion problems utilizing the general Bernoulli sampling model. Moreover, we introduce an efficient non-convex algorithm, the Iterative Tensor CUR Completion (ITCURC) algorithm, specifically designed to tackle the unique challenges of t-CCS-based tensor completion. We have intensively tested and validated the effectiveness of the t-CCS model and the ITCURC algorithm across both synthetic and real-world datasets.
2.1 Introduction

A tensor, as a multidimensional generalization of a matrix, provides an intuitive representation for handling multi-relational or multi-modal data such as hyperspectral data [16, 131, 142], videos [80, 103], seismic data [42, 95], and DNA microarrays [90]. However, in real-world scenarios, it is common to encounter situations where only partial observations of the tensor data are available due to unavoidable or unforeseen circumstances. These limitations can stem from factors such as data collection issues or errors made during data entry by researchers. The problem of recovering the missing data by effectively leveraging the available observations is commonly referred to as the Tensor Completion (TC) problem.

TC is inherently complex and often ill-posed [49, 141], necessitating the exploration of various sampling models and completion techniques. A common and crucial assumption for resolving TC is the low-rank structure of the tensor, which has been extensively utilized to enhance TC approaches [5, 80, 138]. However, the concept of tensor rank is not unique and comes with its own limitations. For example, the CANDECOMP/PARAFAC (CP) rank represents the minimum number of rank-one tensors required to achieve the CP decomposition, involving summations of these tensors [60]. Computing the CP rank, an NP-hard problem, presents difficulties in the recovery of tensors with a low CP rank [68]; thus, finding the optimal low-CP-rank approximation of the target tensor is still an open problem [141]. Other tensor ranks, such as Tucker [117], Tensor Train [91], tubal [69], and Hierarchical-Tucker [48, 54], to name a few, also play prominent roles in the field, each with its distinct computational and application implications.

In this study, we focus on the low-tubal-rank model for tensor completion. The tubal-rank is defined based on the tensor decomposition known as the tensor Singular Value Decomposition (t-SVD), which employs the tensor-tensor product (t-product) [71]. In the t-SVD, a tensor is decomposed into the t-product of two orthogonal tensors and an f-diagonal tensor. The tubal-rank is then determined by the number of non-zero singular tubes present in the f-diagonal tensor. Previous research has shown that tubal-rank-based tensor models exhibit better modeling capabilities compared to other rank-based models, particularly for tensors with fixed orientation or specific spatial-shifting characteristics [96, 133].

In the low-tubal-rank TC model, we consider T ∈ K^{n1×n2×n3} with tubal-rank r, and the observations are located in the set Ω. TC aims to recover the original tensor T from the observations on Ω. Mathematically, we aim to solve the following optimization problem:
$$\min_{\widetilde{\mathcal{T}}} \ \big\langle \mathcal{P}_{\Omega}(\mathcal{T}-\widetilde{\mathcal{T}}),\ \mathcal{T}-\widetilde{\mathcal{T}} \big\rangle, \quad \text{subject to } \operatorname{tubal\text{-}rank}(\widetilde{\mathcal{T}}) = r, \tag{2.1}$$
where ⟨·,·⟩ denotes the Frobenius inner product and P_Ω is the sampling operator defined by
$$\mathcal{P}_{\Omega}(\mathcal{T}) = \sum_{(i,j,k)\in\Omega} [\mathcal{T}]_{i,j,k}\, \mathcal{E}_{i,j,k}, \tag{2.2}$$
where E_{i,j,k} ∈ {0,1}^{n1×n2×n3} is a tensor with all elements being zero except for the element at the position indexed by (i, j, k).
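In implementation, the sampling operator P_Ω in (2.2) is simply an entrywise mask. Below is a minimal MATLAB sketch illustrating this; the variable names and the toy sizes are illustrative choices, not fixed by the thesis.

```matlab
% P_Omega as an entrywise mask: Omega is a logical array of the same size as T,
% with true at observed positions (i,j,k) and false elsewhere.
n1 = 10; n2 = 12; n3 = 8; p = 0.3;
T       = randn(n1, n2, n3);           % a toy ground-truth tensor
Omega   = rand(n1, n2, n3) < p;        % logical observation mask
P_Omega = @(X, M) X .* M;              % keeps observed entries, zeros the rest
Tobs    = P_Omega(T, Omega);           % the partial observations
```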
For successful recovery, the general setting of an efficient solver for (2.1) requires the observation set Ω to be sampled entry-wise, fiber-wise, or slab-wise through a certain unbiased stochastic process, including the Bernoulli sampling process referenced in [101, 104, 113, 120] and the uniform sampling process referenced in [62, 105, 138]. Although extensive theoretical and empirical studies have been conducted on these sampling settings, their practical applicability is sometimes limited. For instance, in collaborative filtering applications, the dimensions of the third-order tensor data typically represent users, rated items (such as movies or products), and time, respectively. The unbiased sampling models implicitly assume that all users are equally likely to rate all items over time, a premise that is often unrealistic in real-world scenarios. As another example, consider Magnetic Resonance Imaging (MRI): MRI scans face limitations with certain metal implants and can cause discomfort in prolonged sessions [1]. To address these issues, we propose a generalization of the cross-concentrated sampling model for matrix completion [19] to the tensor completion setting, termed tensor cross-concentrated sampling (t-CCS). t-CCS enables partial observations on selected horizontal and lateral subtensors, making it more practical in many applications.

2.1.1 Basic Definitions and Terminology

We use K to denote the field of real or complex numbers, i.e., R or C. We represent a matrix by a capital italic letter (e.g., A) and a tensor by a calligraphic letter (e.g., T). The notation [n] denotes the set of the first n positive integers, i.e., {1, ..., n}, for any n ∈ Z+. Submatrices and subtensors are denoted by [A]_{I,J} and [T]_{I,J,K}, respectively, with I, J, K subsets of the appropriate index sets. In particular, if I is the full index set, we write [T]_{I,J,K} as [T]_{:,J,K}, and similar rules apply to J and K. Additionally, |S| denotes the cardinality of a set S. If I is a subset of [n], then I^c denotes the set of elements in [n] that are not in I. For a given matrix A, we use A† to denote its Moore-Penrose inverse and A^⊤ for its conjugate transpose. The spectral norm of A, denoted ‖A‖, is its largest singular value. The Frobenius norm of A is denoted ‖A‖_F, where ‖A‖_F = (Σ_{i,j} |A_{i,j}|²)^{1/2}, and its nuclear norm, denoted ‖A‖_*, is the sum of all its singular values.

The Kronecker product is denoted by ⊗. The column vector e_i has a 1 in the i-th position and zeros elsewhere, with its dimension specified when used. For a tensor T ∈ K^{n1×n2×n3}, T̂ represents the tensor after applying the discrete Fourier transform along its third dimension. Given a tensor T ∈ K^{n1×n2×n3}, we call [T]_{i,:,:}, [T]_{:,j,:}, and [T]_{:,:,k} the horizontal, lateral, and frontal slices of T, for any i ∈ [n1], j ∈ [n2], and k ∈ [n3]. Figure 2.1 gives an example of a horizontal, a lateral, and a frontal slice of a tensor T ∈ K^{n1×n2×n3}.
[Figure 2.1 panels, left to right: Horizontal Slice [T]_{n1,:,:}; Lateral Slice [T]_{:,n2,:}; Frontal Slice [T]_{:,:,1}]
Figure 2.1 Visualization of a horizontal, a lateral, and a frontal slice of a tensor T ∈ K^{n1×n2×n3}. The orange regions in the leftmost, middle, and rightmost subfigures are a horizontal slice, a lateral slice, and a frontal slice of T, respectively.

Given T ∈ K^{n1×n2×n3}, one can define the associated block circulant matrix obtained from the mode-3 slabs of T, i.e.,
$$\operatorname{bcirc}(\mathcal{T}) := \begin{bmatrix} T_1 & T_{n_3} & \cdots & T_2 \\ T_2 & T_1 & \cdots & T_3 \\ \vdots & \vdots & \ddots & \vdots \\ T_{n_3} & T_{n_3-1} & \cdots & T_1 \end{bmatrix} \in \mathbb{K}^{n_1 n_3 \times n_2 n_3},$$
where T_i := [T]_{:,:,i}. For the purpose of this section, we will utilize a slight modification of the unfolding of a tensor along its second mode and define
$$\operatorname{unfold}(\mathcal{T}) := \begin{bmatrix} T_1^{\top} & \cdots & T_{n_3}^{\top} \end{bmatrix}^{\top} \in \mathbb{K}^{n_1 n_3 \times n_2}, \qquad \operatorname{fold}(\operatorname{unfold}(\mathcal{T})) = \mathcal{T}.$$
The t-product of tensors T ∈ K^{n1×n2×n3} and S ∈ K^{n2×n4×n3} is denoted by T ∗ S; it is a tensor of dimension n1 × n4 × n3 obtained via circular convolution. Specifically,
$$\mathcal{T} * \mathcal{S} = \operatorname{fold}\big(\operatorname{bcirc}(\mathcal{T}) \cdot \operatorname{unfold}(\mathcal{S})\big). \tag{2.3}$$
The computational cost of the t-product based on Equation (2.3) is O(n1 n2 n3² n4), since bcirc(T)·unfold(S) is the multiplication of an n1n3 × n2n3 matrix with an n2n3 × n4 matrix.

Define T̄ as
$$\bar{\mathcal{T}} = (F_{n_3} \otimes I_{n_1}) \cdot \operatorname{bcirc}(\mathcal{T}) \cdot (F_{n_3}^{-1} \otimes I_{n_2}),$$
where F_n represents the n × n Discrete Fourier Transform matrix and F_n^{-1} is its matrix inverse. By the property that a circulant matrix can be block-diagonalized by the DFT, T̄ is a block-diagonal matrix. Notice that, with S_i := [S]_{:,:,i} and
$$E_1 := \begin{bmatrix} I_{n_4} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \in \mathbb{K}^{n_4 n_3 \times n_4}$$
(where I_{n4} is the n4 × n4 identity matrix), we have
$$\operatorname{unfold}(\mathcal{S}) = \begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_{n_3} \end{bmatrix} = \begin{bmatrix} S_1 & S_{n_3} & \cdots & S_2 \\ S_2 & S_1 & \cdots & S_3 \\ \vdots & \vdots & \ddots & \vdots \\ S_{n_3} & S_{n_3-1} & \cdots & S_1 \end{bmatrix} \cdot E_1 = \operatorname{bcirc}(\mathcal{S}) \cdot E_1 .$$
Hence, T ∗ S can also be expressed as
$$\begin{aligned} \mathcal{T} * \mathcal{S} &= \operatorname{fold}\big(\operatorname{bcirc}(\mathcal{T}) \cdot \operatorname{unfold}(\mathcal{S})\big) = \operatorname{fold}\big(\operatorname{bcirc}(\mathcal{T}) \cdot \operatorname{bcirc}(\mathcal{S}) \cdot E_1\big) \\ &= \operatorname{fold}\Big( (F_{n_3}^{-1} \otimes I_{n_1}) \cdot (F_{n_3} \otimes I_{n_1}) \cdot \operatorname{bcirc}(\mathcal{T}) \cdot (F_{n_3}^{-1} \otimes I_{n_2}) \cdot (F_{n_3} \otimes I_{n_2}) \cdot \operatorname{bcirc}(\mathcal{S}) \cdot E_1 \Big) \\ &= \operatorname{fold}\Big( (F_{n_3}^{-1} \otimes I_{n_1}) \cdot \bar{\mathcal{T}} \cdot \bar{\mathcal{S}} \cdot (F_{n_3} \otimes I_{n_4}) \cdot E_1 \Big), \end{aligned}$$
where S̄ := (F_{n3} ⊗ I_{n2}) · bcirc(S) · (F_{n3}^{-1} ⊗ I_{n4}) is defined analogously to T̄. Since T̄ and S̄ are block-diagonal, the t-product reduces to independent matrix products of the frontal slices in the Fourier domain. Numerically, we implement the t-product of two tensors based on Algorithm 2.1.
Algorithm 2.1 t-Product based on Fast Fourier Transform (FFT)
1: Input: T ∈ K^{n1×n2×n3}, S ∈ K^{n2×n4×n3}.
2: T̂ := fft(T, [], 3); Ŝ := fft(S, [], 3).
3: for each i ∈ {1, 2, ..., n3} do
4:    [Ẑ]_{:,:,i} = [T̂]_{:,:,i} · [Ŝ]_{:,:,i}.
5: end for
6: Output: Z = ifft(Ẑ, [], 3).

Note that the computational costs of fft(T, [], 3), fft(S, [], 3), and ifft(Ẑ, [], 3) are n1n2n3 log(n3), n2n4n3 log(n3), and n1n4n3 log(n3), respectively. Thus, the t-product based on the FFT takes O(n1n2n3 log(n3) + n2n4n3 log(n3) + n1n4n3 log(n3) + n1n2n4n3) = O(n1n2n4n3) flops, which is more computationally efficient.
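The following MATLAB sketch is a direct companion to Algorithm 2.1; the function name tprod is our own illustrative choice, not fixed by the thesis.

```matlab
function Z = tprod(T, S)
% FFT-based t-product of T (n1 x n2 x n3) and S (n2 x n4 x n3), following Algorithm 2.1.
    [n1, ~, n3] = size(T);
    n4   = size(S, 2);
    That = fft(T, [], 3);                 % DFT along the third mode
    Shat = fft(S, [], 3);
    Zhat = complex(zeros(n1, n4, n3));
    for i = 1:n3
        % frontal-slice matrix products in the Fourier domain
        Zhat(:,:,i) = That(:,:,i) * Shat(:,:,i);
    end
    Z = ifft(Zhat, [], 3);                % for real-valued T, S the result is real up to roundoff
end
```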
Hence, we have (cid:107)A (cid:107) = (cid:107) (cid:0)๐น๐‘›3 โŠ— ๐ผ๐‘›1 1 โˆš ๐‘›3 (cid:1) ยท bcirc(A) ยท ๐นโˆ’1 ๐‘›3 (๐น๐‘›3 โŠ— ๐ผ๐‘›1)(cid:62) ยท (cid:0)๐น๐‘›3 โŠ— ๐ผ๐‘›1 = (cid:107) (cid:16) (cid:17) (cid:107) โŠ— ๐ผ๐‘›2 (cid:1) ยท bcirc(A) ยท (cid:16) ๐นโˆ’1 ๐‘›3 โŠ— ๐ผ๐‘›2 โˆš (cid:17) ยท ๐‘›3(๐นโˆ’1 ๐‘›3 โŠ— ๐ผ๐‘›2)(cid:62)(cid:107) = (cid:107)(๐น๐‘›3 โŠ— ๐ผ๐‘›1)(cid:62) ยท (cid:0)๐น๐‘›3 โŠ— ๐ผ๐‘›1 = (cid:107)(๐น(cid:62) ๐‘›3 โŠ— ๐ผ๐‘›1) ยท (cid:0)๐น๐‘›3 โŠ— ๐ผ๐‘›1 (cid:1) ยท bcirc(A) ยท (cid:16) (cid:1) ยท bcirc(A) ยท โŠ— ๐ผ๐‘›2 (cid:17) โŠ— ๐ผ๐‘›2 (cid:16) ๐นโˆ’1 ๐‘›3 ๐นโˆ’1 ๐‘›3 (cid:16) (cid:17) ยท (๐นโˆ’1 ๐‘›3 โŠ— ๐ผ๐‘›2)(cid:62)(cid:107) )(cid:62) โŠ— ๐ผ๐‘›2)(cid:107) ยท ((๐นโˆ’1 ๐‘›3 1 ๐‘›3 (cid:17) ยท (cid:1) ยท bcirc(A) ยท ๐นโˆ’1 ๐‘›3 โŠ— ๐ผ๐‘›2 (๐น๐‘›3 โŠ— ๐ผ๐‘›2)(cid:107) = (cid:107)๐‘›3(๐นโˆ’1 ๐‘›3 โŠ— ๐ผ๐‘›1) ยท (cid:0)๐น๐‘›3 โŠ— ๐ผ๐‘›1 = (cid:107) bcirc(A) (cid:107). Definition 2 ( ๐‘“ -diagonal tensor ). A tensor is called ๐‘“ -diagonal if each of its frontal slices is a diagonal matrix. Definition 3 (Tensor conjugate transpose). The conjugate transpose of a tensor T โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3 is the ๐‘›2 ร— ๐‘›1 ร— ๐‘›3 tensor T (cid:62) obtained by conjugate transposing each of the frontal slice and then reversing the order of the second to last frontal slices. Definition 4 (Identity tensor). The identity tensor I โˆˆ K๐‘›ร—๐‘›ร—๐‘›3 is the tensor with the only first frontal slices [T ]:,:,1 being the ๐‘› ร— ๐‘› identity matrix and with other frontal slices [T ]:,:,๐‘– are all zeros for ๐‘– = 2, ยท ยท ยท , ๐‘›3. Definition 5 (Orthogonal tensor). If a tensor of size ๐‘› ร— ๐‘› ร— ๐‘›3 is orthogonal if T (cid:62) โˆ— T = I = T โˆ— T (cid:62) = I โˆˆ K๐‘›ร—๐‘›ร—๐‘›3 11 Definition 6 (Partially orthogonal tensor). If a tensor of size ๐‘›1 ร— ๐‘›2 ร— ๐‘›3 is partially orthogonal if T (cid:62) โˆ— T = I โˆˆ K๐‘›2ร—๐‘›2ร—๐‘›3 or T โˆ— T (cid:62) = I โˆˆ K๐‘›1ร—๐‘›1ร—๐‘›3 Definition 7 (Moore-Penrose inverse [71]). T โ€  โˆˆ K๐‘›2ร—๐‘›1ร—๐‘›3 is said to be the Moore-Penrose inverse of T โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, if T โ€  satisfies the following four equations, T โˆ— T โ€  โˆ— T = T , T โ€  โˆ— T โˆ— T โ€  = T โ€ , (cid:16) (cid:17) (cid:62) = T โˆ— T โ€ , T โ€  โˆ— T (cid:16) T โˆ— T โ€ (cid:17) (cid:62) = T โ€  โˆ— T . Algorithm 2.2 Moore-Penrose inverse 1: Input: Z โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3. 2: Z โ†’ (cid:98)Z = ๏ฌ€t(Z, [], 3). 3: for each ๐‘– โˆˆ ๐‘›3 do [ (cid:98)Z]โ€  4: 5: end for 6: Output: Zโ€  = i๏ฌ€t( (cid:98)Zโ€ , [], 3) :,:,๐‘– = Moore-Penrose-inverse([ (cid:98)Z]:,:,๐‘–) Definition 8 (Tensor spectral norm and condition number). The tensor spectral norm (cid:107)T (cid:107)2 of a third-order tensor T is defined as (cid:107)T (cid:107)2 = (cid:107)bcirc(T )(cid:107)2. The condition number of T is defined as: ๐œ…(T ) = (cid:107)T โ€ (cid:107)2 ยท (cid:107)T (cid:107)2. Definition 9 (Standard tensor lateral basis). The lateral basis หš๐”ข๐‘–, is of size ๐‘›1 ร— 1 ร— ๐‘›3 with only [หš๐”ข๐‘–]๐‘–,1,1 equal to 1 and the remaining equal to zero. Definition 10 (Standard tensor tubal basis [109, 138]). A standard tubal basis (cid:164)๐”ข๐‘˜ , is a 1 ร— 1 ร— ๐‘›3 third mode tensor where all elements are zero except for a single nonzero element with a value of 1 at the (1, 1, ๐‘˜) entry. Definition 11 ( Identity tensor). 
Definition 11 (Identity tensor). The identity tensor I ∈ K^{n×n×n3} is the tensor whose first frontal slice is the n × n identity matrix and whose other frontal slices are all zeros.

Our research will focus on the subtensors of an underlying tensor with low tubal-rank. To ensure that this work is self-contained, we begin by introducing the concept of the sampling tensor as follows.

[Figure 2.2: A Standard Tensor Lateral Basis.]
[Figure 2.3: A Standard Tubal Basis.]

Definition 12 (Sampling tensor). Given a tensor T ∈ K^{n1×n2×n3} and I ⊆ [n1], the horizontal subtensor R of T with indices I can be obtained via R := [T]_{I,:,:} = [I]_{I,:,:} ∗ T, where I is defined in Definition 11. For convenience, [I]_{I,:,:} will be denoted by S_I for the given index set I. Similarly, the lateral subtensor C with indices J ⊆ [n2] can be obtained as C := [T]_{:,J,:} = T ∗ [I]_{:,J,:}. The subtensor U of T with horizontal indices I and lateral indices J can be represented as U := [T]_{I,J,:} = S_I ∗ T ∗ [I]_{:,J,:}.

2.1.2 Tensor decomposition

Tensor decompositions provide a concise representation of the underlying structure of data, revealing the low-dimensional subspace within which the data resides.

Theorem 2.1 (t-SVD). Let T ∈ K^{n1×n2×n3}. Then it can be factored as T = W ∗ Σ ∗ V^⊤, where W ∈ K^{n1×n1×n3} and V ∈ K^{n2×n2×n3} are orthogonal and Σ ∈ K^{n1×n2×n3} is an f-diagonal tensor.

Numerically, we implement the t-SVD based on Algorithm 2.3.

Algorithm 2.3 t-SVD
1: Input: Z ∈ K^{n1×n2×n3}.
2: Ẑ := fft(Z, [], 3).
3: for each i ∈ {1, 2, ..., n3} do
4:    [[Û]_{:,:,i}, [Ŝ]_{:,:,i}, [V̂]_{:,:,i}] = SVD([Ẑ]_{:,:,i})
5: end for
6: Output: U = ifft(Û, [], 3); S = ifft(Ŝ, [], 3); V = ifft(V̂, [], 3)

From Algorithm 2.3, we can see that the t-SVD is implemented by performing the matrix SVD slice by slice in a loop of length n3. Thus, the computational complexity of the t-SVD of an n1 × n2 × n3 tensor is O(min{n1² n2 n3, n1 n2² n3}).

Definition 13 (Tubal-rank and multi-rank). Suppose the tensor T ∈ K^{n1×n2×n3} satisfies rank([T̂]_{:,:,k}) = r_k for k ∈ [n3]. Then r⃗ = (r_1, r_2, ..., r_{n3}) is called the multi-rank of T, denoted by rank_m(T). In addition, max{r_k : k ∈ [n3]} is called the tubal-rank of T, denoted by rank(T). We denote the tubal-rank as r or ‖r⃗‖_∞, and write ‖r⃗‖_1 for the sum of the multi-rank.

Theorem 2.2 (Compact t-SVD). Let T ∈ K^{n1×n2×n3} with tubal-rank r. Then it can be factored as T = W ∗ Σ ∗ V^⊤, where W ∈ K^{n1×r×n3} and V ∈ K^{n2×r×n3} are partially orthogonal and Σ ∈ K^{r×r×n3} is an f-diagonal tensor.

Numerically, we implement the compact t-SVD based on Algorithm 2.4.

Algorithm 2.4 Compact t-SVD
1: Input: Z ∈ K^{n1×n2×n3}, target tubal-rank r.
2: Ẑ := fft(Z, [], 3).
3: Initialize Ŵ = zeros(n1, r, n3), Ŝ = zeros(r, r, n3), and V̂ = zeros(n2, r, n3).
4: for each i ∈ {1, 2, ..., n3} do
5:    [W, S, V] = SVD([Ẑ]_{:,:,i})
6:    [Ŵ]_{:,:,i} = [W]_{:,1:r}; [Ŝ]_{:,:,i} = [S]_{1:r,1:r}; [V̂]_{:,:,i} = [V]_{:,1:r}
7: end for
8: Output: W = ifft(Ŵ, [], 3); S = ifft(Ŝ, [], 3); V = ifft(V̂, [], 3)
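As a concrete companion to Algorithm 2.4 (and to the tubal-rank-r truncation operator H_r used later in this chapter), the following MATLAB sketch computes the compact t-SVD slice by slice in the Fourier domain. The function name tsvd_compact is an illustrative choice and assumes r ≤ min(n1, n2).

```matlab
function [W, S, V] = tsvd_compact(T, r)
% Compact t-SVD with target tubal-rank r, following Algorithm 2.4.
    [n1, n2, n3] = size(T);
    That = fft(T, [], 3);
    What = complex(zeros(n1, r, n3));
    Shat = complex(zeros(r,  r, n3));
    Vhat = complex(zeros(n2, r, n3));
    for i = 1:n3
        [U, Sig, V0] = svd(That(:,:,i), 'econ');   % matrix SVD of each frontal slice
        What(:,:,i) = U(:, 1:r);
        Shat(:,:,i) = Sig(1:r, 1:r);
        Vhat(:,:,i) = V0(:, 1:r);
    end
    W = ifft(What, [], 3);
    S = ifft(Shat, [], 3);
    V = ifft(Vhat, [], 3);
end
```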
Lemma 2.1 (Best tubal-rank-r approximation [71, 69]). Let the t-SVD of T ∈ R^{m×n×k} be T = U ∗ S ∗ V†. For a given positive integer r, define T_r = Σ_{s=1}^{r} U(:, s, :) ∗ S(s, s, :) ∗ V†(:, s, :). Then
$$\mathcal{T}_r = \operatorname*{argmin}_{\widetilde{\mathcal{T}} \in \mathbb{T}} \|\mathcal{T} - \widetilde{\mathcal{T}}\|_{\mathrm{F}}, \quad \text{where } \mathbb{T} = \{\mathcal{X} * \mathcal{Y}^{\dagger} \mid \mathcal{X} \in \mathbb{K}^{m\times r\times k}, \ \mathcal{Y} \in \mathbb{K}^{n\times r\times k}\}.$$
Note that S in the t-SVD is organized in decreasing order, i.e., ‖S(1,1,:)‖₂ ≥ ‖S(2,2,:)‖₂ ≥ ..., which is implicitly assumed in [69]. Therefore, the best tubal-rank-r approximation of tensors is analogous to PCA (principal component analysis) of matrices.

Having introduced the compact t-SVD, we now introduce two important definitions based on this type of decomposition.

Definition 14 (Tensor μ0-incoherence condition). Given a tubal-rank-r tensor T ∈ K^{n1×n2×n3} with a compact t-SVD T = W ∗ S ∗ V^⊤, we say T satisfies the μ0-incoherence condition if, for all k ∈ {1, ..., n3}, the following hold:
$$\max_{i=1,\dots,n_1} \big\| [\widehat{\mathcal{W}}]_{:,:,k}^{\top} \cdot \mathbf{e}_i \big\|_{\mathrm{F}} \le \sqrt{\frac{\mu_0 r}{n_1}}, \qquad \max_{j=1,\dots,n_2} \big\| [\widehat{\mathcal{V}}]_{:,:,k}^{\top} \cdot \mathbf{e}_j \big\|_{\mathrm{F}} \le \sqrt{\frac{\mu_0 r}{n_2}}.$$
In certain instances, to accentuate the incoherence parameter of a specific tensor T, we represent this parameter as μ_T.

In tensor decomposition, the t-CUR decomposition, a self-expressive decomposition of a given 3-mode tensor, has received significant attention [4, 28, 56, 109]. Specifically, t-CUR involves representing a tensor T ∈ K^{n1×n2×n3} as T ≈ C ∗ U ∗ R, with C = [T]_{:,J,:} and R = [T]_{I,:,:} for some J ⊆ [n2] and I ⊆ [n1]. There exist different versions of U. This work focuses on the t-CUR decomposition of the form T ≈ C ∗ U† ∗ R with U = [T]_{I,J,:}. Under certain conditions, this approximation represents T exactly; [28, 56] have detailed the conditions for exact t-CUR decomposition. For convenience, we present one theoretical result on t-CUR below.

Theorem 2.3 ([28, 56]). Let T ∈ K^{n1×n2×n3} with multi-rank rank_m(T) = r⃗, and let I ⊂ [n1] and J ⊂ [n2] be two index sets. Denote C = [T]_{:,J,:}, R = [T]_{I,:,:}, and U = [T]_{I,J,:}. Then T = C ∗ U† ∗ R if and only if rank_m(C) = rank_m(R) = r⃗.

Theorem 2.3 can be visualized as follows.

[Figure 2.4: t-CUR decomposition.]
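Theorem 2.3 can also be checked numerically in a few lines. The MATLAB sketch below assumes FFT-based helpers tprod (as above) and tpinv (a tensor Moore–Penrose inverse following Algorithm 2.2); these helper names are our own illustrative choices.

```matlab
% t-CUR reconstruction from index sets I (horizontal) and J (lateral); cf. Theorem 2.3.
C = T(:, J, :);                            % lateral subtensor    [T]_{:,J,:}
R = T(I, :, :);                            % horizontal subtensor [T]_{I,:,:}
U = T(I, J, :);                            % core subtensor       [T]_{I,J,:}
T_cur   = tprod(tprod(C, tpinv(U)), R);    % C * U^dagger * R
rel_err = norm(T(:) - T_cur(:)) / norm(T(:));   % ~0 (up to roundoff) when rank_m(C) = rank_m(R) = rank_m(T)
```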
2.1.3 Related work

Kilmer and Martin [71] introduced novel definitions of tensor multi-rank and tubal-rank characterized by the t-SVD. Researchers commonly employ a convex surrogate of the tubal-rank function augmented with regularization by the tensor nuclear norm (TNN), as indicated in [64, 66, 82, 84, 138, 143]. While a pioneering optimization method featuring TNN was initially proposed to tackle the TC problem in [139], this approach necessitates the simultaneous minimization of all singular values across tensor slices, which hinders its ability to accurately approximate the tubal-rank function [61, 130]. To circumvent this challenge, various truncated methods have been introduced as alternatives; notable examples include truncated nuclear norm regularization [61] and the tensor truncated nuclear norm (T-TNN) [130]. Furthermore, Zhang et al. [135] introduced a novel strategy for low-rank regularization, focusing on nonlocal similar patches. However, the aforementioned tensor completion algorithms are designed based on the Bernoulli sampling model. Despite its foundational role in probability theory and statistics, Bernoulli sampling frequently encounters practical limitations when applied to real-world data collection scenarios [43, 94]. In the realm of collaborative filtering, where the tensor's horizontal and lateral slices denote users and rated objects (such as movies and merchandise) over a specific time period, the application of the Bernoulli sampling model is impractical, as it implicitly assumes that every user has an equal probability of rating any given object, an assumption that is seldom valid in real-world scenarios. The variability in user preferences and interaction patterns makes this equal-probability assumption unrealistic, thereby challenging the efficacy of the Bernoulli sampling approach in such contexts.

2.2 Proposed sampling model

We aim to develop a sampling strategy that is both efficient and effective for a range of real-world scenarios. Inspired by the cross-concentrated sampling model for matrix completion [19] and the t-CUR decomposition [28, 56, 109], we introduce a novel sampling model tailored for tensor data, named Tensor Cross-Concentrated Sampling (t-CCS). The t-CCS model extracts samples from both horizontal and lateral subtensors of the original tensor. Formally, let R = [T]_{I,:,:} and C = [T]_{:,J,:} be the selected horizontal and lateral subtensors of T, determined by index sets I and J, respectively. Next, we sample entries on R and C based on the Bernoulli sampling model. The t-CCS procedure is detailed in Algorithm 2.5. Notably, t-CCS transitions to t-CUR sampling when the samples are dense enough to fully capture the subtensors, and it reverts to Bernoulli sampling when all horizontal and lateral slices are selected. The indices of the cross-concentrated samples are denoted as Ω_R and Ω_C, corresponding to the notation used for the subtensors. Our task is to recover an underlying tensor T with tubal-rank r from the observations on Ω_R ∪ Ω_C:
$$\min_{\widetilde{\mathcal{T}}} \ \big\langle \mathcal{P}_{\Omega_{\mathcal{R}} \cup \Omega_{\mathcal{C}}}(\mathcal{T} - \widetilde{\mathcal{T}}),\ \mathcal{T} - \widetilde{\mathcal{T}} \big\rangle, \quad \text{subject to } \operatorname{tubal\text{-}rank}(\widetilde{\mathcal{T}}) = r, \tag{2.6}$$
where ⟨·,·⟩ is the Frobenius inner product and P_{Ω_R ∪ Ω_C} is defined in (2.2).

Algorithm 2.5 Tensor Cross-Concentrated Sampling (t-CCS)
1: Input: T ∈ K^{n1×n2×n3}.
2: Uniformly select the horizontal and lateral indices, denoted as I and J, respectively.
3: Set R := [T]_{I,:,:} and C := [T]_{:,J,:}.
4: Sample entries from R and C based on Bernoulli sampling models. Record the locations of these samples as Ω_R and Ω_C for R and C, respectively.
5: Output: [T]_{Ω_R ∪ Ω_C}, Ω_R, Ω_C, I, J.
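A minimal MATLAB sketch of the t-CCS sampling step in Algorithm 2.5 follows; the function name and inputs (numbers of selected slices nI, nJ and Bernoulli rates pR, pC) are illustrative choices, not fixed by the thesis.

```matlab
function [Tobs, OmegaR, OmegaC, I, J] = tccs_sample(T, nI, nJ, pR, pC)
% Tensor Cross-Concentrated Sampling (cf. Algorithm 2.5): select nI horizontal and nJ
% lateral slices uniformly at random, then observe entries on them via Bernoulli sampling.
    [n1, n2, n3] = size(T);
    I = randperm(n1, nI);                    % uniformly selected horizontal slice indices
    J = randperm(n2, nJ);                    % uniformly selected lateral slice indices
    OmegaR = false(n1, n2, n3);              % observed locations on R = [T]_{I,:,:}
    OmegaC = false(n1, n2, n3);              % observed locations on C = [T]_{:,J,:}
    OmegaR(I, :, :) = rand(nI, n2, n3) < pR;
    OmegaC(:, J, :) = rand(n1, nJ, n3) < pC;
    Tobs = T .* (OmegaR | OmegaC);           % the observations [T]_{Omega_R ∪ Omega_C}
end
```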
This chapter aims to provide a theoretical well-posedness guarantee for the t-CCS model; this is our key theoretical contribution and is detailed in Theorem 2.6.

2.3 Theoretical Results

This section is dedicated to providing a theoretical analysis of the well-posedness of the t-CCS model, which represents our principal theoretical contribution. This analysis is thoroughly detailed in Theorem 2.6. Before presenting our main theoretical result, Theorem 2.6, we first introduce two important supporting theorems on which its proof relies. The first theorem, Theorem 2.4, establishes lower bounds on the number of lateral and horizontal slices required, when these slices are sampled uniformly, to ensure an exact t-CUR decomposition. Theorem 2.4 can be seen as an adaptation of [109, Corollary 3.10], featuring a different proof method specifically designed for uniform sampling and exact t-CUR, and it provides a more thorough analysis for this specific context.

Before presenting Theorem 2.4, let us briefly review the sampling schemes for matrix CUR decomposition. Various sampling schemes are designed to ensure that the chosen rows and columns validate the CUR decomposition. For example, deterministic methods are explored in works such as [6, 8, 79]. Randomized sampling algorithms for CUR decompositions and the column subset selection problem have been extensively studied, as seen in [32, 38, 40, 86, 114, 122]. For a comprehensive overview of both approaches, refer to [57]. Hybrid methods that combine both approaches are discussed in [9, 10, 17]. In particular, for a rank-r matrix in K^{n1×n2} with μ-incoherence, sampling O(μr log(n1)) rows and O(μr log(n2)) columns is sufficient to ensure the exact matrix CUR decomposition [12, 32]. In this work, we extend the uniform sampling results from the matrix setting to the tensor setting.

Theorem 2.4. Let T ∈ K^{n1×n2×n3} satisfy the tensor μ0-incoherence condition and have multi-rank r⃗. The indices I and J are selected uniformly at random without replacement from [n1] and [n2], respectively. Set C = [T]_{:,J,:}, R = [T]_{I,:,:}, and U = [T]_{I,J,:}. Then T = C ∗ U† ∗ R holds with probability at least $1 - n_1^{-\beta} - n_2^{-\beta}$, provided that
$$|I| \ge 2\beta \mu_0 \|\vec{r}\|_{\infty} \log\big(n_1 \|\vec{r}\|_1\big) \quad \text{and} \quad |J| \ge 2\beta \mu_0 \|\vec{r}\|_{\infty} \log\big(n_2 \|\vec{r}\|_1\big).$$

Another important supporting theorem is Theorem 2.5, which adapts [138, Theorem 3.1] to the recovery of a tensor with tubal-rank r under Bernoulli sampling and is essential for Theorem 2.6. Our contribution refines the theorem by explicitly detailing the numerical constants in the original sampling probability. The proof of Theorem 2.5 follows the same framework as in [84, 138].

Theorem 2.5. Let T ∈ K^{n1×n2×n3} of tubal-rank r satisfy the tensor μ0-incoherence condition, and let its compact t-SVD be T = U ∗ S ∗ V^⊤, where U ∈ K^{n1×r×n3}, S ∈ K^{r×r×n3}, and V ∈ K^{n2×r×n3}. Suppose the entries in Ω are sampled according to the Bernoulli model with probability p. If
$$p \ge \frac{256\,\beta\,(n_1+n_2)\,\mu_0\, r\, \log^2(n_1 n_3 + n_2 n_3)}{n_1 n_2} \quad \text{with } \beta \ge 1, \tag{2.7}$$
then T is the unique minimizer to
$$\min_{\widetilde{\mathcal{T}}} \ \|\widetilde{\mathcal{T}}\|_{\mathrm{TNN}}, \quad \text{subject to } \mathcal{P}_{\Omega}(\widetilde{\mathcal{T}}) = \mathcal{P}_{\Omega}(\mathcal{T}),$$
with probability at least $1 - \frac{3\log(n_1 n_3 + n_2 n_3)}{(n_1 n_3 + n_2 n_3)^{4\beta - 2}}$.

Theorem 2.6. Let T ∈ K^{n1×n2×n3} satisfy the tensor μ0-incoherence condition and have multi-rank r⃗ with condition number κ. Let I ⊆ [n1] and J ⊆ [n2] be chosen uniformly with replacement to yield R = [T]_{I,:,:} and C = [T]_{:,J,:}, and suppose that Ω_R and Ω_C are generated from R and C according to Bernoulli distributions with probabilities p_R and p_C, respectively.
If |๐ผ | โ‰ฅ 3200๐›ฝ๐œ‡0๐‘Ÿ ๐œ…2 log2(๐‘›1๐‘›3 + ๐‘›2๐‘›3), |๐ฝ | โ‰ฅ 3200๐›ฝ๐œ‡0๐‘Ÿ ๐œ…2 log2(๐‘›1๐‘›3 + ๐‘›2๐‘›3), ๐‘R โ‰ฅ ๐‘C โ‰ฅ 1600(|๐ผ | + ๐‘›2)๐œ‡0๐‘Ÿ ๐œ…2 log2((๐‘›1 + ๐‘›2)๐‘›3) |๐ผ |๐‘›2 1600(|๐ฝ | + ๐‘›1)๐œ‡0๐‘Ÿ ๐œ…2 log2((๐‘›1 + ๐‘›2)๐‘›3) |๐ฝ |๐‘›1 , 19 for some absolute constant ๐›ฝ > 1, then T can be uniquely determined from the entries on ฮฉR โˆช ฮฉC with probability at least 1 โˆ’ โˆ’ 1 (๐‘›1๐‘›3 + ๐‘›2๐‘›3)800๐›ฝ๐œ…2 log(๐‘›2) 3 log(๐‘›1๐‘›3 + |๐ฝ |๐‘›3) (๐‘›1๐‘›3 + |๐ฝ |๐‘›3)4๐›ฝโˆ’2 โˆ’ 1 (๐‘›1๐‘›3 + ๐‘›2๐‘›3)800๐›ฝ๐œ…2 log(๐‘›1) . 3 log(๐‘›2๐‘›3 + |๐ผ |๐‘›3) (๐‘›2๐‘›3 + |๐ผ |๐‘›3)4๐›ฝโˆ’2 โˆ’ Remark 1. (i) When ๐‘›1 = ๐‘›2 = ๐‘›, the results in the above theorem can be simplified to that T can be uniquely determined from the entries on ฮฉR โˆชฮฉC with probability at least 1โˆ’ 6 log(2๐‘›๐‘›3) (๐‘›๐‘›3)4๐›ฝโˆ’2 . (ii) Supposed T with multi-rank (cid:174)๐‘Ÿ of low tubal-rank ๐‘Ÿ is the underlying tensor we aim to re- cover. Notice that such T is one of feasible solutions to the optimization problem (2.1) since tubal-rank(T ) = ๐‘Ÿ. Additionally, it is evident that for any (cid:101)T with tubal-rank ๐‘Ÿ, (cid:104)Pฮฉ( (cid:101)T โˆ’ T ), (cid:101)T โˆ’ T )(cid:105) โ‰ฅ 0 and (cid:104)Pฮฉ(T โˆ’ T ), T โˆ’ T (cid:105) = 0. Thus, T is a global minimizer to the optimization problem (2.1). According to Theorem 2.6, T with low tubal-rank ๐‘Ÿ can be reliably recovered using the t-CCS model with high probabil- ity. Consequently, we can obtain a minimizer for the non-convex optimization problem (2.1) through samples that are partially observed from the t-CCS model. Theorem 2.6 elucidates that a sampling complexity of O (๐‘Ÿ ๐œ…2 max{๐‘›1, ๐‘›2}๐‘›3 log2(๐‘›1๐‘›3 + ๐‘›2๐‘›3)) is sufficient for TC on t-CCS model. This complexity is a ๐œ…2 factor worse than that of the benchmark provided by the state-of-the-art Bernoulli-sampling-based TC methods, such as the TNN method detailed by Zhang and Aeron [138], which demands O (๐‘Ÿ max{๐‘›1, ๐‘›2}๐‘›3 log2(๐‘›1๐‘›3 + ๐‘›2๐‘›3)) samples. This observation suggests the potential for identifying a more optimal lower bound, which will leave as a future direction. 20 2.4 An efficient solver for t-CCS In this section, we investigate how to effectively and efficiently solve the t-CCS-based TC prob- lem. First, we consider directly applying several existing TC algorithms including BCPF [140], TMac [129], TNN [138], and F-TNN [66] to a t-CCS-based image recovery problem, where BCPF is CP-based algorithm, TMac is Tucker-based algorithm, TNN and F-TNN are two tubal-rank-based algorithms. However, it turns out that these methods are not well-suited for the tensor completion problem based on t-CCS model. As illustrated in Figure 2.5, these approaches fail to yield reliable visualization outcomes. This indicates the necessity to develop new algorithm(s) for the proposed t-CCS model. Ground truth Observed BCPF TMac TNN F-TNN Figure 2.5 Visual results of color image inpainting using t-CCS samples at an overall sampling rate of 20% with BCPF, TMac, TNN, and F-TNN algorithms. 2.4.1 Iterative tensor CUR completion algorithm To efficiently use the t-CCS structure, we develop the Iterative Tensor CUR Completion (ITCURC), a non-convex algorithm inspired by projected gradient descent. ITCURC updates R, C, and U at each iteration to preserve the tubal-rank ๐‘Ÿ of T . 
The update formulas are:
$$[\mathcal{R}_{k+1}]_{:,J^c,:} := [\mathcal{T}_k]_{I,J^c,:} + \eta_R \big[\mathcal{P}_{\Omega_{\mathcal{R}}}(\mathcal{T} - \mathcal{T}_k)\big]_{I,J^c,:}, \tag{2.8}$$
$$[\mathcal{C}_{k+1}]_{I^c,:,:} := [\mathcal{T}_k]_{I^c,J,:} + \eta_C \big[\mathcal{P}_{\Omega_{\mathcal{C}}}(\mathcal{T} - \mathcal{T}_k)\big]_{I^c,J,:}, \tag{2.9}$$
$$\mathcal{U}_{k+1} := \mathcal{H}_r\Big( [\mathcal{T}_k]_{I,J,:} + \eta_U \big[\mathcal{P}_{\Omega_{\mathcal{R}} \cup \Omega_{\mathcal{C}}}(\mathcal{T} - \mathcal{T}_k)\big]_{I,J,:} \Big), \tag{2.10}$$
where η_R, η_C, η_U are step sizes and H_r is the truncated t-SVD operator. [R_{k+1}]_{:,J,:} and [C_{k+1}]_{I,:,:} are then set to U_{k+1}. The algorithm, starting from T_0 = 0, iterates until e_k ≤ ε, where ε is a preset tolerance and
$$e_k = \frac{\big\langle \mathcal{P}_{\Omega_{\mathcal{R}} \cup \Omega_{\mathcal{C}}}(\mathcal{T} - \mathcal{T}_k),\ \mathcal{T} - \mathcal{T}_k \big\rangle}{\big\langle \mathcal{P}_{\Omega_{\mathcal{R}} \cup \Omega_{\mathcal{C}}}(\mathcal{T}),\ \mathcal{T} \big\rangle}. \tag{2.11}$$
(2.13) 22 Similar analysis for updating [T๐‘˜+1] ๐ผ,๐ฝ(cid:251),:, the computational complexity of updating [T๐‘˜+1] ๐ผ,๐ฝ(cid:251),: is O (๐‘›2|๐ผ |๐‘Ÿ๐‘›3). And we update [T๐‘˜+1] ๐ผ,๐ฝ,: by setting [T๐‘˜+1] ๐ผ,๐ฝ,: := U๐‘˜ . (2.14) Thus, computational complexity of updating T๐‘˜+1 is O (|๐ผ |๐‘Ÿ๐‘›2๐‘›3 + |๐ฝ |๐‘Ÿ๐‘›1๐‘›3). Computation of the stopping criterion ๐‘’๐‘˜ cost O (|ฮฉ๐‘… | + |ฮฉ๐ถ | โˆ’ |ฮฉ๐‘ˆ |) flops as we only make computations on the observed locations. The computational costs per iteration are summarized in Table 2.1, showing a complexity of O (๐‘Ÿ |๐ผ |๐‘›2๐‘›3 + ๐‘Ÿ |๐ฝ |๐‘›1๐‘›3) when |๐ผ | (cid:28) ๐‘›1 and |๐ฝ | (cid:28) ๐‘›2. Table 2.1 A Comprehensive Examination of the Per-Iteration Computational Cost for ITCURC. Step Line 3: Computing the stopping criterion ๐‘’๐‘˜ Line 4: [R๐‘˜+1]:,๐ฝ (cid:251) ,: = [T๐‘˜] ๐ผ,๐ฝ (cid:251) ,: + ๐œ‚๐‘… [PฮฉR ( [T ]ฮฉR โˆชฮฉC โˆ’ T๐‘˜)] ๐ผ,๐ฝ (cid:251) ,: Line 5: [C๐‘˜+1] ๐ผ (cid:251) ,:,: = [T๐‘˜] ๐ผ (cid:251) ,๐ฝ ,: + ๐œ‚๐ถ [๐‘ƒฮฉC ( [T ]ฮฉR โˆชฮฉC โˆ’ T๐‘˜)] ๐ผ (cid:251) ,๐ฝ ,: Line 6: U๐‘˜+1 = H๐‘Ÿ ([T๐‘˜] ๐ผ, ๐ฝ ,: + ๐œ‚๐‘ˆ [PฮฉR โˆชฮฉC ( [T ]ฮฉR โˆชฮฉC โˆ’ T๐‘˜)] ๐ผ,๐ฝ ,:) O (max{|๐ผ ||๐ฝ |๐‘Ÿ๐‘›3, |๐ฝ ||๐ผ |๐‘›3 log(๐‘›3)}) O (|ฮฉ๐‘… | + |ฮฉ๐ถ | โˆ’ |ฮฉ๐‘ˆ |) O (|ฮฉ๐‘… | โˆ’ |ฮฉ๐‘ˆ |) O (|ฮฉ๐ถ | โˆ’ |ฮฉ๐‘ˆ |) Computational Complexity Line 8: Updating T๐‘˜+1 O (๐‘Ÿ |๐ผ |๐‘›2๐‘›3 + ๐‘Ÿ |๐ฝ |๐‘›1๐‘›3) 2.5 Numerical Experiments This section presents the performance of our t-CCS based ITCURC through numerical exper- iments on both synthetic and real-world data. The computations are performed on one of shared nodes of the Computing Cluster with a 64-bit Linux system (GLNXA64), featuring Intel(R) Xeon(R) Gold 6148 CPU (2.40 GHz). All experiments are carried out using MATLAB 2022a. 2.5.1 Synthetic data examples This section evaluates ITCURC for t-CCS tensor completion, exploring the needed sample sizes and the impact of Bernoulli sampling probability and fiber sampling rates on low-tubal-rank tensor recovery. We assess ITCURCโ€™s tensor recovery capability under different combinations of horizontal and lateral slice numbers |๐ผ | = ๐›ฟ๐‘›1, |๐ฝ | = ๐›ฟ๐‘›2 and Bernoulli sampling rates ๐‘ on selected subtensors. The study uses tensors of size 768 ร— 768 ร— 256 with tubal-ranks ๐‘Ÿ โˆˆ {2, 5, 7}. To counteract 23 ๐‘Ÿ = 2 ๐‘Ÿ = 5 ๐‘Ÿ = 7 Figure 2.6 (Row 1) 3D and (Row 2) 2D views illustrate ITCURCโ€™s empirical phase transition for the t-CCS model. ๐›ฟ = |๐ผ |/768 = |๐ฝ |/768 shows sampled indices ratios, ๐‘ is the Bernoulli sampling probability over subtensors, and ๐›ผ is the overall tensor sampling rate. White and black in the 768 ร— 768 ร— 256 tensor results represent success and failure, respectively, across 25 tests for tubal ranks 2, 5, and 7 (Columns 1-3). The ๐›ผ needed for success remains consistent across different combinations ๐›ฟ and ๐‘. randomness, we conduct 25 tests for each (๐›ฟ, ๐‘, ๐‘Ÿ) set, a test is successful if (cid:13) (cid:13) (cid:13) T โˆ’ C๐‘˜ โˆ— Wโ€  ๐‘˜ โˆ— R๐‘˜ (cid:13) (cid:13) (cid:13)F ๐œ€๐‘˜ := (cid:107)T (cid:107)F โ‰ค 10โˆ’3. Our empirical phase transition results are presented in Figure 2.6, with the first row showing a 3D view of the phase transition results and the second row the corresponding 2D view. White and black pixels in these visuals indicate all testsโ€™ success and failure, respectively. 
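For reference, the synthetic test tensors used in these trials and the success criterion ε_k ≤ 10^{-3} can be reproduced along the following lines. This is a hedged MATLAB sketch: the construction of T as a t-product of Gaussian factor tensors mirrors the convergence study described below, and all names are illustrative rather than taken from the original code.

% Illustrative sketch: a random tensor of tubal-rank r via the t-product of two
% Gaussian factor tensors, and the relative-error success test used above.
function T = tprod(A, B)
    % t-product of A (n1 x k x n3) and B (k x n2 x n3): frontal-slice products
    % in the Fourier domain along the third mode.
    Ahat = fft(A, [], 3);
    Bhat = fft(B, [], 3);
    n3   = size(A, 3);
    That = zeros(size(A, 1), size(B, 2), n3);
    for k = 1:n3
        That(:,:,k) = Ahat(:,:,k) * Bhat(:,:,k);
    end
    T = real(ifft(That, [], 3));
end

% Example usage for one phase-transition trial (r = 5 shown):
%   A = randn(768, 5, 256);  B = randn(5, 768, 256);
%   T = tprod(A, B);                 % ground-truth tensor of tubal-rank 5
%   ... run ITCURC on t-CCS samples of T to obtain its reconstruction Tk ...
%   success = norm(T(:) - Tk(:)) / norm(T(:)) <= 1e-3;   % eps_k <= 1e-3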
The results highlight that higher overall sampling rates are needed for successful completion at larger tubal-ranks r. Importantly, tensor completion is achievable at a sufficiently large overall sampling rate regardless of the specific horizontal and lateral slice sizes and subtensor sampling rates (see the 2D views). This demonstrates ITCURC's flexibility in sampling low-tubal-rank tensors for successful reconstruction.

In the following, we include further empirical data demonstrating the convergence behavior of the ITCURC algorithm within the t-CCS model framework. In this experiment, we form a low-tubal-rank tensor T = A ∗ B ∈ R^{n_1×n_2×n_3} from two Gaussian random tensors, where A ∈ R^{n_1×r×n_3} and B ∈ R^{r×n_2×n_3}. Our objective is to examine the convergence behavior of the ITCURC algorithm under different conditions. For the simulations, we set n_1 = n_2 = 768 and n_3 = 256, and generate partial observations using the t-CCS model by adjusting the rank r and configuring the concentrated subtensors as R ∈ R^{δn_1×n_2×n_3} and C ∈ R^{n_1×δn_2×n_3}, with 0 < δ < 1. For each fixed r, we maintain a constant overall sampling rate α. Utilizing the observed data, the ITCURC algorithm is then employed to approximate the original low-tubal-rank tensor. The algorithm runs until the stopping criterion ε_k ≤ 10^{-6} is met, where ε_k is the relative error between the estimate at the k-th iteration and the actual tensor, defined as ε_k = ‖T − T̂_k‖_F / ‖T‖_F. For each specified set of parameters (r, δ, α), we generate 10 different tensor completion scenarios. The mean relative errors ε_k, along with the specific configurations, are reported in Figures 2.7 to 2.10. One can see that ITCURC achieves an almost linear convergence rate.

(a) r = 2, α = 0.15 (b) r = 5, α = 0.25
Figure 2.7 The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.20.

(a) r = 2, α = 0.15 (b) r = 5, α = 0.25
Figure 2.8 The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.25.

(a) r = 2, α = 0.15 (b) r = 5, α = 0.25
Figure 2.9 The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.30.

(a) r = 2, α = 0.15 (b) r = 5, α = 0.25
Figure 2.10 The averaged relative error of ITCURC under the t-CCS model with respect to iterations over 10 independent trials with δ = 0.35.

2.5.2 Real-world Applications
This section presents an evaluation and comparison of the t-CCS model and the Bernoulli sampling model through tensor completion tasks across various types of data. Our goal is to assess the practical feasibility and real-world applicability of the t-CCS model, emphasizing its effectiveness in diverse operational environments. Our experiments compare the performance of ITCURC, designed for the t-CCS model, against established TC methods based on the Bernoulli sampling model, namely BCPF [140], TMac [129], TNN [138], and F-TNN [66]. Our evaluation focuses on the quality and execution time of the reconstruction.
Quality is assessed using the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM), where
\[
\mathrm{PSNR} = 10 \log_{10}\!\left( \frac{n_1 n_2 n_3 \, \|T\|_\infty^2}{\|T - \tilde{T}\|_{\mathrm{F}}^2} \right),
\]
and SSIM evaluates the structural similarity between two images, as detailed in [125]. Since the data are third-order tensors, we report the mean SSIM over all frontal slices. Higher PSNR and SSIM scores indicate better reconstruction quality.

Our experimental process is as follows. We first generate random observations via the t-CCS model: we uniformly randomly select concentrated horizontal (R) and lateral (C) subtensors, defined as R = [T]_{I,:,:} and C = [T]_{:,J,:}, with |I| = δn_1 and |J| = δn_2; entries in R and C are then sampled according to the Bernoulli sampling model, with the locations of the observed entries denoted by Ω_R and Ω_C. The t-CCS procedure thus yields a tensor that is only partially observed, with the observations concentrated in R and C. ITCURC is then applied to estimate the missing entries and thus recover the original tensor. For comparison, we also generate observations of the entire original tensor T using the Bernoulli sampling model with probability p_T := |Ω_R ∪ Ω_C| / (n_1 n_2 n_3). Additionally, we estimate the missing data using several tensor completion methods: BCPF (https://github.com/qbzhao/BCPF), which is based on the CP decomposition framework; TMac (https://xu-yangyang.github.io/TMac/), which utilizes the Tucker decomposition framework; and TNN (https://github.com/jamiezeminzhang/) and F-TNN (https://github.com/TaiXiangJiang/Framelet-TNN), which are both grounded in the t-SVD framework. To ensure reliable results, we repeat this entire procedure 30 times, averaging the PSNR and SSIM scores and the runtime to minimize the effects of randomness.

2.5.2.1 Color image completion
Color images, viewed as 3D tensors with dimensions for height, width, and color channels, are effectively modeled as low-tubal-rank tensors [80, 83]. In our tests, we focus on two large-size images: 'Building' (https://pxhere.com/en/photo/57707, of size 2579 × 3887 × 3) and 'Window' (https://pxhere.com/en/photo/1421981, of size 3009 × 4513 × 3). We present averaged test results over various overall observation rates (α) in Table 2.2, and visual comparisons at α = 20% in Figure 2.11.

Figure 2.11 Visualization of color image inpainting for the Building and Window datasets with tubal-rank r = 35 and percentage of selected horizontal and lateral slices δ = 13% at an overall sampling rate of 20% for the ITCURC algorithm, while the other algorithms are applied under the Bernoulli sampling model with the same overall sampling rate of 20%. The t-CCS samples on Building for ITCURC are the same as those in Figure 2.5 (panels: Ground truth, BCPF, TMac, TNN, F-TNN, ITCURC).

Figure 2.11 presents a clear visual comparison of the results from the different methods at a 20% overall sampling rate, where BCPF, TMac, TNN, and F-TNN are applied under the Bernoulli sampling model and ITCURC is applied under the t-CCS model. The ground truth is the original image of a building and of a window. BCPF underperforms the other methods visually. TNN shows slight deviations from the ground truth, maintaining colors and details with minor discrepancies. TMac reveals some notable differences. F-TNN improves reflection fidelity and color saturation, closely resembling the ground truth.
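Before turning to the quantitative scores in Table 2.2, we note how the reported metrics can be computed for a recovered tensor T̃. The sketch below is illustrative MATLAB, not the authors' evaluation code; it assumes the ssim function from the Image Processing Toolbox, and any per-slice SSIM implementation could be substituted.

% Illustrative sketch: PSNR as defined above and the mean SSIM over frontal slices.
function [p, s] = quality_metrics(T, That)
    err = T - That;
    p = 10 * log10( numel(T) * max(abs(T(:)))^2 / norm(err(:))^2 );  % PSNR
    n3 = size(T, 3);
    s = 0;
    for k = 1:n3
        s = s + ssim(That(:,:,k), T(:,:,k));   % SSIM of the k-th frontal slice
    end
    s = s / n3;                                % mean SSIM over all frontal slices
end

With these metrics in place, we return to the visual comparison of Figure 2.11.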
ITCURC also achieves the high similarity to Ground truth, accu- rately reproducing colors and details. Moreover, ITCURC significantly outperforms TMac, TNN, and F-TNN in the t-CCS based color image completion task, evidenced by the unsatisfactory results of BCPF, TMac, TNN, and F-TNN under t-CCS model, as illustrated in Figure 2.5. Table 2.2 Image inpainting results on the Building and Window datasets. The best results are emphasized in bold, while the second-best results are underlined. ITCURC-๐›ฟ refers to the ITCURC method with the percentages of selected horizontal and lateral slices set at a fixed rate of ๐›ฟ%. The t-CCS based algorithm ITCURC-๐›ฟ%s are performed on t-CCS scheme while other Bernoulli based algorithms are performed on Bernoulli Sampling scheme. Dataset Building Window Overall Observation Rate 12% 16% 20% 12% 16% 20% PSNR SSIM Runtime (sec) ITCURC-11 ITCURC-12 ITCURC-13 BCPF TMac TNN F-TNN ITCURC-11 ITCURC-12 ITCURC-13 BCPF TMac TNN F-TNN ITCURC-11 ITCURC-12 ITCURC-13 BCPF TMac TNN F-TNN 28.9249 28.5518 28.1893 26.7939 27.0425 26.3466 28.2529 0.8310 0.8172 0.8016 0.8639 0.8402 0.6458 0.7583 31.0050 30.8055 30.7260 28.2949 30.1755 30.3844 30.1521 0.8880 0.8818 0.8774 0.8761 0.8586 0.8257 0.8354 32.1645 31.9489 31.6825 29.4298 32.3632 31.7512 33.1660 0.9118 0.9033 0.8954 0.8873 0.9111 0.8382 0.8626 35.2830 35.1195 35.0196 30.1611 33.2673 31.8747 35.6747 0.8571 0.8535 0.8504 0.8269 0.8200 0.8333 0.8745 36.1611 36.1145 36.1215 33.9990 36.6370 34.6443 36.9233 0.8738 0.8733 0.8731 0.8554 0.8928 0.8564 0.8899 37.0236 37.0174 36.8885 35.4780 37.5877 36.7893 37.2618 0.8848 0.8850 0.8837 0.8727 0.9035 0.8804 0.9066 10.9354 10.7715 12.2208 213.6800 92.9568 3651.4556 2642.9409 18.1098 17.3187 19.7731 19.3517 22.0287 21.2458 613.3072 360.2903 108.6827 104.8518 3289.5535 3004.6557 2692.6197 2267.5622 23.8990 25.5856 28.8653 345.3425 233.8853 5801.1631 4739.2703 24.1286 26.3275 29.4986 500.3060 242.7499 6572.9697 4134.5206 25.1853 28.0392 30.8361 1629.8061 259.6068 6690.7945 4105.0327 Table 2.2 shows ITCURC typically offers quality that is comparable to that of Bernoulli Sam- pling based TC algorithms. In runtime efficiency, ITCURC leveraging the t-CCS model signifi- cantly surpasses BCPF, TMac, TNN, and F-TNN, all of which are based on the Bernoulli sampling model. This efficiency enhancement highlights the t-CCS modelโ€™s superior performance in prac- tical applications. Additionally, ITCURCโ€™s consistent performance in delivering similar quality results across different ๐›ฟ, provided the overall sampling rates are consistent. These highlight the 29 flexibility and feasibility of the t-CCS model. 2.5.2.2 MRI reconstruction In this study, we test on a MRI heart dataset7 (of size 320 ร— 320 ร— 110), where compact t-SVD with tubal-rank 35 yields less than 10% error, suggesting low-tubal-rank property of dataset. The visualization of reconstruction of MRI data using different methods at a 30% overall sampling rate are presented in Figure 2.12, and reconstruction quality and runtime are detailed in Table 2.3. Ground truth BCPF TMac TNN F-TNN ITCURC Figure 2.12 Visualizations of MRI data recovery using ITCURC with tubal rank ๐‘Ÿ = 35, lateral and horizontal slice selection rate ๐›ฟ = 27%, and an overall sampling rate of 30%. Other algorithms are applied under Bernoulli sampling with the same overall sampling rate. Results for slices 51, 66, 86, and 106 are shown in rows 1 to 4, with a 1.3ร— magnified area at the bottom left of each result for clearer comparison. 
7 http://medicaldecathlon.com/dataaws

Figure 2.12 shows recovery results for four frontal MRI slices using BCPF, TMac, TNN, and F-TNN, all under the Bernoulli sampling model, and ITCURC under the t-CCS model. The ground truth serves as the actual dataset, from which the missing values are to be predicted by the different algorithms. BCPF shows notable artifacts and lacks the sharp edges of the heart's interior structures. TMac improves over BCPF but still presents a softer representation of the cardiac anatomy. TNN enhances the detail prediction, resulting in a more accurate completion that begins to resemble the reference more closely. F-TNN maintains these improvements in detail prediction, and the edges within the cardiac structure suggest a refined completion. ITCURC yields a reconstruction in which the cardiac structures are clearly defined, reflecting the structure present in the ground truth and indicating effectiveness in predicting the missing values. The highlighted regions of interest (ROIs), marked in blue, allow a detailed comparison across the methods. In these ROIs, although ITCURC's reconstructions may not be the most visually appealing, they preserve structural integrity and texture, which are crucial for clinical applications. Table 2.3 demonstrates the flexibility and feasibility of the t-CCS model: the reconstruction quality of the t-CCS-based ITCURC generally matches that of the Bernoulli-sampling-based TC methods. Furthermore, in terms of runtime efficiency, ITCURC, implemented under the t-CCS model, significantly outperforms BCPF, TMac, TNN, and F-TNN, all of which are applied under the Bernoulli sampling scheme. This advantage underscores the effectiveness of the t-CCS model in practical applications.

2.5.2.3 Seismic data reconstruction
Geophysical 3D seismic data is often modeled as a tensor with inline, crossline, and depth dimensions. In our analysis, we focus on a seismic dataset (https://terranubis.com/datainfo/F3-Demo-2020) of size 51 × 191 × 146, where a compact t-SVD with tubal-rank 3 yields less than 5% error, suggesting the low-tubal-rank property of the dataset. The corresponding results are detailed in Figure 2.13 and Table 2.4. Figure 2.13 presents a comparative analysis of the seismic completion algorithms BCPF, TMac, TNN, and F-TNN, applied under the Bernoulli sampling model, in contrast to ITCURC, which is

Table 2.3 The quantitative results for MRI data completion are presented, with the best results in bold and the second-best underlined. ITCURC-δ denotes the ITCURC method with the proportion of selected horizontal and lateral slices set to exactly δ%. The t-CCS-based algorithms ITCURC-δ are performed under the t-CCS scheme, while the other, Bernoulli-based algorithms are performed under the Bernoulli sampling scheme.
Overall Observation Rate 10% 15% 20% 25% 30% PSNR SSIM Runtime (sec) ITCURC-23 ITCURC-25 ITCURC-27 BCPF TMac TNN F-TNN ITCURC-23 ITCURC-25 ITCURC-27 BCPF TMac TNN F-TNN ITCURC-23 ITCURC-25 ITCURC-27 BCPF TMac TNN F-TNN 22.4004 22.2548 22.1617 22.6581 22.8690 23.4779 21.8172 0.6020 0.5990 0.5990 0.6817 0.6804 0.6304 0.6442 5.7908 5.4241 5.8075 53.1651 30.4813 87.7591 91.2048 24.3553 24.0435 23.9311 24.5373 25.4225 25.3480 25.7453 0.6821 0.6769 0.6751 0.7151 0.7323 0.7494 0.7507 7.0230 8.3488 8.8408 88.2777 28.0944 84.0952 86.3112 26.9104 26.9940 27.0871 25.1663 27.7802 27.9423 27.1969 0.7584 0.7571 0.7567 0.7192 0.7873 0.7677 0.8181 29.1861 29.0219 29.0699 25.8111 29.1526 28.4522 29.3630 0.8160 0.8084 0.8086 0.7301 0.8227 0.7984 0.8562 30.3911 31.1752 31.2539 26.2042 31.1648 30.5580 31.3651 0.8451 0.8619 0.8600 0.7367 0.8924 0.8793 0.8871 4.8030 5.5303 6.0685 111.1949 28.6216 56.9761 84.0228 5.3058 5.9375 6.4371 180.6596 28.9400 57.9823 82.2064 5.9484 6.7111 7.4916 279.2789 30.0219 58.2098 81.2119 applied based on the t-CCS model. The ground truth serves as the definitive reference, with its stark textural definition. BCPF falls short of delivering optimal fidelity, with finer details lost in trans- lation. TMac is commendable for preserving the textureโ€™s integrity, providing a cohesive image. TNN improves upon this, sharpening textural nuances and closing in on the ground truthโ€™s visual quality. F-TNN excels visually, capturing essential texture information effectively, a significant ad- vantage when the emphasis is on recognizing general features. ITCURC demonstrates comparable visual results though less effective than other methods in terms of PSNR and SSIM. Table 2.4 shows that the t-CCS based method, ITCURC, achieves the fastest processing speeds while preserving satisfactory levels of PSNR and SSIM. This underscores the suitability of the t- CCS model for applications where rapid processing is essential without significant loss in visual 32 Ground truth BCPF TMac TNN F-TNN ITCURC Figure 2.13 Visualization of seismic data recovery results by setting tubal-rank ๐‘Ÿ = 3 for ITCURC with percentage of selected horizontal and lateral slices ๐›ฟ = 17% with overall sampling rate 28% while other methods are applied based on Bernoulli sampling models with the same overall sam- pling rate 28%. Displayed are slices 15, 25, and 35 from top to bottom, with a 1.2ร— magnified area in each set for clearer comparison. accuracy. Furthermore, the consistent performance of ITCURC across various subtensor sizes and sampling rates further emphasizes flexibility and feasibility of the t-CCS model in diverse opera- tional environments. Discussions on the results of real-world datasets From the above results, it is evident that our method surpasses others in runtime with signifi- cantly lower computational costs. Consider a tensor of dimensions ๐‘›1 ร— ๐‘›2 ร— ๐‘›3. When a framelet transform matrix is constructed using ๐‘› filters and ๐‘™ levels, the computational cost per iteration for framelet-based Tensor Nuclear Norm (F-TNN) is given by O ((๐‘›๐‘™ โˆ’ ๐‘™ + 1)๐‘›1๐‘›2๐‘›3(๐‘›3 + min(๐‘›1, ๐‘›2))). This formulation incorporates the processes involved in generating a framelet transform matrix, as elaborated in seminal works such as [21] and [65]. While enhancing the number of levels and filters in F-TNN can improve the quality of results, it also escalates the computational bur- den, particularly for tensors of substantial size. 
In our experiments, we have set both the framelet level and the number of filters to 1 for the F-TNN implementation. For comparison, the com- 33 Table 2.4 Quantitative results for seismic data completion: TMac, TNN, F-TNN with Bernoulli sampling, and our method with t-CCS. Best results are in bold, and second-best are underlined. ITCURC-๐›ฟ refers to the ITCURC method with the percentages of selected horizontal and lateral slices set at a fixed rate of ๐›ฟ%. The t-CCS based algorithm ITCURC-๐›ฟ%s are performed on t-CCS scheme while other Bernoulli based algorithms are performed on Bernoulli Sampling scheme. Overall Observation Rate 12 % 16 % 20 % 24 % 28 % PSNR SSIM Runtime (sec) ITCURC-15 ITCURC-16 ITCURC-17 BCPF TMac TNN F-TNN ITCURC-15 ITCURC-16 ITCURC-17 BCPF TMac TNN F-TNN ITCURC-15 ITCURC-16 ITCURC-17 BCPF TMac TNN F-TNN 24.8020 24.7386 24.8381 24.0733 24.8859 23.7395 24.0688 0.5732 0.5691 0.5724 0.5304 0.5566 0.5165 0.6607 6.4327 6.3825 7.0379 33.5759 16.6135 34.5718 22.1019 27.4143 26.4092 27.4737 26.1054 27.4768 26.1176 24.2084 24.1905 26.5349 26.9970 27.7428 26.3806 28.6408 27.5890 0.7338 0.6691 0.7349 0.6507 0.7321 0.6491 0.5420 0.5407 0.6962 0.6738 0.7577 0.6442 0.8142 0.7551 6.7701 6.2598 6.7522 6.3579 6.8325 6.6306 32.1258 33.1832 16.8581 14.3412 29.2464 31.3138 22.1420 21.4482 30.6585 29.3053 30.6905 29.3542 30.5312 28.8953 24.3015 24.2454 28.4662 30.7237 29.5430 30.9172 31.2791 29.7987 0.8596 0.8143 0.8610 0.8129 0.8523 0.7939 0.5532 0.5494 0.8504 0.7612 0.8486 0.8080 0.8814 0.8479 6.8212 6.8633 7.0215 7.0789 7.3253 7.1480 31.2663 31.7875 13.1142 13.7124 23.9876 26.1727 18.0547 17.8848 putational cost per iteration for the TNN is O (min(๐‘›1, ๐‘›2)๐‘›1๐‘›2๐‘›3 + ๐‘›1๐‘›2๐‘›3 log(๐‘›3)), and for the TMac, it is O ((๐‘Ÿ1 + ๐‘Ÿ2 + ๐‘Ÿ3)๐‘›1๐‘›2๐‘›3) where (๐‘Ÿ1, ๐‘Ÿ2, ๐‘Ÿ3) denotes the Tucker rank. As for BCPF, it is O (๐‘…3(๐‘›1๐‘›2๐‘›3) + ๐‘…2(๐‘›1๐‘›2 + ๐‘›2๐‘›3 + ๐‘›3๐‘›1)), where ๐‘… is the CP rank. In contrast, the computational expense per iteration of our proposed method is significantly reduced to O (๐‘Ÿ |๐ผ |๐‘›2๐‘›3 + ๐‘Ÿ |๐ฝ |๐‘›1๐‘›3), assuming |๐ผ | (cid:28) ๐‘›1 and |๐ฝ | (cid:28) ๐‘›2, indicating a substantial efficiency improvement over traditional methods. Note that for F-TNN, [66] have formulated the tensor nuclear norm utilizing the ๐‘€-product [70], a generalization of the t-product for 3-order tensor. In [66], they have incorporated a tight wavelet frame (framelet) as the transformation matrix ๐‘€. This meticulous design of the ๐‘€ transformation contributes to the superior reconstruction quality of F-TNN. However, the absence of a rapid imple- 34 mentation for multiplying the tensor with matrix ๐‘€ along the third mode leads to F-TNN requiring significantly more computational time compared to other evaluated methods. It is worth noting that our current approach provides an effective balance between runtime effi- ciency and reconstruction quality, making it well-suited for potential real-world applications. This balanced approach is particularly relevant in practical settings where it is essential to consider both speed and quality in big data applications. 2.6 Proofs of Theoretical Results 2.6.1 Proof of Theorem 2.4 In this section, we provide a detailed proof of Theorem 2.4, which is one of two important supporting theorems to our main result Theorem 2.6. Before proceeding to prove Theorem 2.4, we will first introduce and discuss several supporting lemmas. 
These lemmas are crucial to establish the foundation for the proof of Theorem 2.4. Lemma 2.2. Let T โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, ๐ผ โІ [๐‘›1] and ๐ฝ โІ [๐‘›2]. S๐ผ and S๐ฝ are the horizontal and lateral sampling tensors associated with indices ๐ผ and ๐ฝ respectively (see Definition 12). Then the following results hold S๐ผ โˆ— T = T โˆ— S๐ฝ = ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ [S๐ผ]:,:,1 ยท [ (cid:98)T ]:,:,1 [S๐ผ]:,:,1 ยท [ (cid:98)T ]:,:,2 [ (cid:98)T ]:,:,1 ยท [S๐ฝ]:,:,1 [ (cid:98)T ]:,:,2 ยท [S๐ฝ]:,:,1 [S๐ผ]:,:,1 ยท [ (cid:98)T ]:,:,๐‘›3 . . . . . . , . (2.15) (2.16) [ (cid:98)T ]:,:,๐‘›3 ยท [S๐ฝ]:,:,1 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป Proof. Here, we will only focus on the proof of (2.15). First, it is easy to see that S๐ผ โˆ— T = S๐ผ ยท T . 35 In addition, and [S๐ผ]:,:,1 S๐ผ = ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ [ (cid:98)T ]:,:,1 T = ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ [S๐ผ]:,:,1 . . . [ (cid:98)T ]:,:,2 . . . ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป . [S๐ผ]:,:,1 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป [ (cid:98)T ]:,:,๐‘›3 The result can thus be derived. (cid:131) Theorem 2.7 ([115, 116]). Consider a finite sequence {๐‘‹๐‘˜ } of independent, random, Hermitian matrices with common dimension ๐‘‘. Assume that 0 โ‰ค ๐œ†min (๐‘‹๐‘˜ ) and ๐œ†max (๐‘‹๐‘˜ ) โ‰ค ๐ฟ for each index ๐‘˜. Set ๐‘Œ = (cid:205)๐‘˜ ๐‘‹๐‘˜ . Let ๐œ‡min and ๐œ‡max be the minimum and maximum eigenvalues of E(๐‘Œ ) respectively. Then, P {๐œ†min(๐‘Œ ) โ‰ค (1 โˆ’ ๐œ€)๐œ‡min} โ‰ค ๐‘‘ P {๐œ†max(๐‘Œ ) โ‰ฅ (1 + ๐œ€)๐œ‡max} โ‰ค ๐‘‘ (cid:105) ๐œ‡min/๐ฟ (cid:105) ๐œ‡max/๐ฟ (cid:104) eโˆ’ ๐œ€ (1โˆ’๐œ€)1โˆ’ ๐œ€ (cid:104) e๐œ€ (1+๐œ€)1+๐œ€ for ๐œ€ โˆˆ [0, 1), and for ๐œ€ โ‰ฅ 0. Lemma 2.3. Suppose ๐ด is a block diagonal matrix, i.e. ๐ด = ๐ด1 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๐‘– ๐ด๐‘– = I๐‘Ÿ๐‘– and ๐‘Ÿ๐‘– โ‰ค ๐‘›1 for โˆ€๐‘– โˆˆ [๐‘›3]. Let ๐ผ be a random subset of [๐‘›1]. , where each ๐ด๐‘– ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๐ด๐‘›3 . . . ๐ด2 is a matrix of size ๐‘›1 ร— ๐‘Ÿ๐‘–, ๐ด(cid:62) 36 Then for any ๐›ฟ โˆˆ [0, 1), the ๐‘›3(cid:205) ๐‘–=1 ๐‘Ÿ๐‘–-th singular value of the matrix [S๐ผ]:,:,1 ๐‘ =: ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ (cid:113) (1โˆ’๐›ฟ)|๐ผ | ๐‘›1 [S๐ผ]:,:,1 . . . ๐ด1 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ [S๐ผ]:,:,1 ๐ด2 . . . ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๐ด๐‘›3 will be no less than with probability at least โˆ’(๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ)) | ๐ผ | ๐‘› 1 max ๐‘› ๐‘– โˆˆ [๐‘› 3 ] 1 (cid:107) [ ๐ด]๐‘–,: (cid:107)2 F . 1 โˆ’ (cid:107)(cid:174)๐‘Ÿ (cid:107)1๐‘’ Proof. Firstly, it is easy to check that ๐‘ = (cid:2)I โŠ— (S๐ผ):,:,1 (cid:3) ยท ๐ด = ๐‘›3(cid:213) (cid:213) ๐‘–โˆˆ๐ผ ๐‘—=1 e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:, ๐‘Œ := ๐‘ (cid:62) ยท ๐‘ = = ๐‘›3(cid:213) (cid:213) ๐‘–โˆˆ๐ผ (cid:213) ๐‘—=1 ๐‘›3(cid:213) where e( ๐‘—โˆ’1)๐‘›1+๐‘– is the standard column basis vector of K๐‘›1๐‘›3. 
Consider ๐‘›3(cid:205) ๐‘– ๐‘Ÿ๐‘– ร— ๐‘›3(cid:205) ๐‘– ๐‘Ÿ๐‘– Gram matrix (e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:)(cid:62) ยท e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,: [ ๐ด](cid:62) ( ๐‘—โˆ’1)๐‘›1+๐‘–,: [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,: =: (cid:213) ๐‘‡๐‘–, where ๐‘‡๐‘– = ๐‘–โˆˆ๐ผ [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:. It is easy to see that ๐‘Œ is a random matrix due to random- ness inherited from the random set ๐ผ. It is easy to see that each ๐‘‡๐‘– is a positive semidefinite matrix ( ๐‘—โˆ’1)๐‘›1+๐‘–,: ๐‘›3(cid:205) ๐‘—=1 [ ๐ด](cid:62) ๐‘—=1 ๐‘–โˆˆ๐ผ ๐‘›3(cid:205) ๐‘›3(cid:205) ๐‘Ÿ๐‘– ร— of size without replacement from the set (cid:8)๐‘‹1, ๐‘‹2, ยท ยท ยท ๐‘‹๐‘›1 ๐‘Ÿ๐‘–. Thus, the random matrix ๐‘Œ in fact is a sum of |๐ผ | random matrices sampled (cid:9) of positive semi-definite matrices. Notice that ๐‘– ๐‘– ๐œ†max (๐‘‡๐‘–) = ๐œ†max (cid:169) (cid:173) (cid:171) (e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:)(cid:62) ยท e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:(cid:170) (cid:174) (cid:172) e( ๐‘—โˆ’1)๐‘›1+๐‘– ยท [ ๐ด] ( ๐‘—โˆ’1)๐‘›1+๐‘–,:(cid:170) (cid:174) (cid:172) โ‰ค max ๐‘– (cid:107) [ ๐ด]๐‘–,:(cid:107)2 F . 2 (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) ๐‘›3(cid:213) ๐‘—=1 ๐‘›3(cid:213) ๐‘—=1 (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) = ๐œŽmax (cid:169) (cid:173) (cid:171) I and thus E(๐‘Œ ) = |๐ผ | By the orthogonal property of matrix ๐ด, it is easy to see that E (๐‘‡๐‘–) = 1 ๐‘›1 ๐‘›1 (๐‘) and by the Chernoff where E is the expectation operator. Thus, by the fact that ๐œ†min(๐‘Œ ) = ๐œŽ2 , min 37 inequality (see Theorem 2.7), we have ๐œŽmin(๐‘) โ‰ค P (cid:169) (cid:173) (cid:171) (cid:115) (1 โˆ’ ๐›ฟ)|๐ผ | ๐‘›1 (cid:170) (cid:174) (cid:172) โ‰ค (cid:107)(cid:174)๐‘Ÿ (cid:107)1๐‘’ โˆ’(๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ)) | ๐ผ | ๐‘› 1 max ๐‘› ๐‘– โˆˆ [๐‘› 3 ] 1 (cid:107) [ ๐ด]๐‘–,: (cid:107)2 F , โˆ€๐›ฟ โˆˆ [0, 1). (cid:131) In the following, we delve into the proof of Theorem 2.4 to tell about how likely t-CUR decom- position holds. The proof of Theorem 2.4. According to Theorem 2.3, T = C โˆ—Uโ€  โˆ—R is equivalent to rank๐‘š (T ) = rank๐‘š (C) = rank๐‘š (R). Therefore, it suffices to prove that rank๐‘š (T ) = rank๐‘š (C) = rank๐‘š (R) holds with probability at least 1 โˆ’ 1 ๐‘›๐›ฝ 1 1 โˆ’ 1 ๐‘›๐›ฝ 2 2 with the given conditions. Notice that T = = = [ (cid:98)T ]:,:,1 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃฎ ๐‘Š1ฮฃ1๐‘‰ (cid:62) ๏ฃฏ 1 ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๏ฃฎ ๐‘Š1 ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ๐‘Š2 [ (cid:98)T ]:,:,2 . . . ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป [ (cid:98)T ]:,:,๐‘›3 ๐‘Š2ฮฃ2๐‘‰ (cid:62) 2 . . . ฮฃ1 ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ยท ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ . . . ๐‘Š๐‘›3 ๐‘Š๐‘›3ฮฃ๐‘›3 ๐‘‰ (cid:62) ๐‘›3 ฮฃ2 . . . (2.17) ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ยท ๏ฃฎ ๐‘‰ (cid:62) ๏ฃฏ 1 ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ ฮฃ๐‘›3 ๐‘‰ (cid:62) 2 . . . ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป ๐‘‰ (cid:62) ๐‘›3 =: ๐‘Š ยท ฮฃ ยท ๐‘‰ (cid:62), in (2.17) is the compact SVD of [T ]:,:,๐‘– for ๐‘– โˆˆ [๐‘›3]. And R = S๐ผ๐‘Šฮฃ๐‘‰ (cid:62). 
By the where ๐‘Š๐‘–ฮฃ๐‘–๐‘‰ (cid:62) ๐‘– definition of tensor multi-rank, we have ๐‘Š๐‘– โˆˆ K๐‘›1ร—๐‘Ÿ๐‘– , ฮฃ๐‘– โˆˆ K๐‘Ÿ๐‘–ร—๐‘Ÿ๐‘– , ๐‘‰๐‘– โˆˆ K๐‘›2ร—๐‘Ÿ๐‘– , W โˆˆ K๐‘›1๐‘›3ร—(cid:107)(cid:174)๐‘Ÿ (cid:107)1, ฮฃ โˆˆ K(cid:107)(cid:174)๐‘Ÿ (cid:107)1ร—(cid:107)(cid:174)๐‘Ÿ (cid:107)1, and ๐‘‰ โˆˆ K๐‘›2๐‘›3ร—(cid:107)(cid:174)๐‘Ÿ (cid:107)1. Consequently, demonstrating that rank(R) = (cid:107)(cid:174)๐‘Ÿ (cid:107)1 suffices to ensure the condition that rank๐‘š (T ) = 38 rank๐‘š (R). Observe that ฮฃ is a square matrix with full rank and ๐‘‰ has full column rank. By the Sylvester rank inequality, rank(R) = (cid:107)(cid:174)๐‘Ÿ (cid:107)1 can be guaranteed by showing rank(S ๐ผ ยท ๐‘Š) = (cid:107)(cid:174)๐‘Ÿ (cid:107)1. By applying Lemma 2.3, we have that for all ๐›ฟ โˆˆ [0, 1), (cid:17) P (cid:16) ๐œŽ(cid:107)(cid:174)๐‘Ÿ (cid:107)1 ๐›ฝ1 ๐œ‡0 (cid:107)๐‘Ÿ (cid:107)โˆž log(๐‘›1 (cid:107)(cid:174)๐‘Ÿ (cid:107)1) ๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ) (๐‘†๐ผ ยท ๐‘Š) โ‰ค (cid:112)(1 โˆ’ ๐›ฟ)|๐ผ |/๐‘›1 implies P (cid:16) ๐œŽ(cid:107)(cid:174)๐‘Ÿ (cid:107)1 |๐ผ | โ‰ฅ โ‰ค (cid:107)(cid:174)๐‘Ÿ (cid:107)1๐‘’โˆ’(๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ)) | ๐ผ | 0 (cid:107) (cid:174)๐‘Ÿ (cid:107)โˆž . ๐œ‡ (๐‘†๐ผ ยท ๐‘Š) โ‰ค (cid:112)(1 โˆ’ ๐›ฟ)|๐ผ |/๐‘›1 (cid:17) โ‰ค 1 ๐‘›1 ๐›ฝ 1 . Note that P (cid:16) rank(๐‘†๐ผ ยท ๐‘Š) < (cid:107)(cid:174)๐‘Ÿ (cid:107)1 (cid:17) We thus have when |๐ผ | โ‰ฅ P (cid:16) rank(๐‘†๐ผ ยท ๐‘Š) = (cid:107)(cid:174)๐‘Ÿ (cid:107)1 ๐›ฝ1 ๐œ‡0 (cid:107)๐‘Ÿ (cid:107)โˆž log(๐‘›1 (cid:107)(cid:174)๐‘Ÿ (cid:107)1) ๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ) (cid:17) , =1 โˆ’ P (cid:16) (cid:115) (1 โˆ’ ๐›ฟ)|๐ผ | ๐‘›1 . (cid:170) (cid:174) (cid:172) (cid:17) ๐œŽ(cid:107)(cid:174)๐‘Ÿ (cid:107)1 (๐‘†๐ผ ยท ๐‘Š) โ‰ค โ‰ค P (cid:169) (cid:173) (cid:171) rank(๐‘†๐ผ ยท ๐‘Š) < (cid:107)(cid:174)๐‘Ÿ (cid:107)1 (cid:115) โ‰ฅ1 โˆ’ P (cid:169) (cid:173) (cid:171) Similarly, one can show that rank(S๐ฝ ยท V) = (cid:107)(cid:174)๐‘Ÿ (cid:107)1 holds with probability at least 1 โˆ’ 1 ๐‘›2 (๐‘†๐ผ ยท ๐‘Š) โ‰ค ๐œŽ(cid:107)(cid:174)๐‘Ÿ (cid:107)1 โ‰ฅ 1 โˆ’ (cid:170) (cid:174) (cid:172) . (1 โˆ’ ๐›ฟ)|๐ผ | ๐‘›1 1 ๐›ฝ1 ๐‘›1 ๐›ฝ 2 provided that |๐ฝ | โ‰ฅ ๐›ฝ2 ๐œ‡0 (cid:107)๐‘Ÿ (cid:107)โˆž log(๐‘›2 (cid:107)(cid:174)๐‘Ÿ (cid:107)1) ๐›ฟ+(1โˆ’๐›ฟ) log(1โˆ’๐›ฟ) . Combining all the statements and setting ๐›ฟ = 0.815 and ๐›ฝ1 = ๐›ฝ2 = ๐›ฝ, we conclude that T = C โˆ— Uโ€  โˆ— R holds with probability at least 1 โˆ’ 1 ๐‘›๐›ฝ 1 โˆ’ 1 ๐‘›๐›ฝ 2 and |๐ฝ | โ‰ฅ 2๐›ฝ๐œ‡0(cid:107)(cid:174)๐‘Ÿ (cid:107)โˆž log (๐‘›2(cid:107)(cid:174)๐‘Ÿ (cid:107)1). , provided that |๐ผ | โ‰ฅ 2๐›ฝ๐œ‡0(cid:107)(cid:174)๐‘Ÿ (cid:107)โˆž log (๐‘›1(cid:107)(cid:174)๐‘Ÿ (cid:107)1) (cid:131) 2.6.1.1 Some remarks on the proof of Theorem 2.4 We wish to emphasize that the techniques employed in our proof are not merely straightforward extensions of the probabilistic estimates used in matrix CUR decompositions since one cannot directly apply the union of matrix CUR probabilistic estimates to flattened tensors due to โ€˜โ€˜depen- denceโ€™โ€™ and โ€intertwinedโ€ sampling property of each sub block matrix after one flattens a tensor to a block diagonal matrix in the Fourier domain. We introduce a new tool that offers a probabilistic estimate for achieving an exact t-CUR decomposition, utilizing a novel proof methodology. 
The cornerstone of our approach is to assess the likelihood that multi-rank is preserved when select- ing horizontal or lateral slices uniformly. This approach distinguishes our method from traditional techniques applied in matrix settings. Although our method involves converting a third-order tensor 39 into a block diagonal matrix, it necessitates the introduction of innovative techniques. These are required to overcome several challenges that do not arise in matrix-based proofs. Given a three- order tensor T โˆˆ R๐‘›1ร—๐‘›2ร—๐‘›3 with multi-rank (cid:174)๐‘Ÿ = (๐‘Ÿ1, ๐‘Ÿ2, ยท ยท ยท , ๐‘Ÿ๐‘›3), its flattened version in the Fourier domain is denoted as T = [ (cid:98)T ]:,:,1 0 0 0 ... 0 [ (cid:98)T ]:,:,2 ยท ยท ยท ... 0 ๏ฃฎ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฏ ๏ฃฐ 0 0 ยท ยท ยท ... 0 ยท ยท ยท ยท ยท ยท ยท ยท ยท ... ยท ยท ยท 0 0 0 ... [ (cid:98)T ]:,:,๐‘›3 , ๏ฃน ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃบ ๏ฃป where (cid:98)T = FFT(T , [], 3). For simplicity, we denote ๐‘‡๐‘– as [ (cid:98)T ]:,:,๐‘– for ๐‘– = 1, ยท ยท ยท , ๐‘›3. It is easy to see that sampling horizontal(lateral) slices of tensor T โˆˆ R๐‘›1ร—๐‘›2ร—๐‘›3 with an index set ๐ผ is equiva- lent of sampling row(column) vectors of the matrix (cid:98)T with indexes ๐ผ, ๐‘›1 + ๐ผ, ยท ยท ยท , (๐‘›3 โˆ’ 1)๐‘›1 + ๐ผ. In other words, the process of sampling ๐ผ horizontal(lateral) slices is the same with sampling ๐ผ rows(columns) of ๐‘‡๐‘–, for ๐‘– = 1, ยท ยท ยท , ๐‘›3. Similar arguments for the lateral slice index set ๐ฝ. Consider the sample space ฮฉ = {(๐ผ, ๐ฝ), ๐ผ โŠ‚ {1, ยท ยท ยท , ๐‘›1}, ๐ฝ โŠ‚ {1, ยท ยท ยท , ๐‘›2}}. Define the events F๐‘– as for ๐‘– = 1, . . . , ๐‘›3. Lemma 2.4. {(๐ผ, ๐ฝ) โŠ‚ ฮฉ | [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–}, {(๐ผ, ๐ฝ) : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3} (cid:40) F1 โˆฉ F2 โˆฉ ยท ยท ยท โˆฉ F๐‘›3 . Proof. The set {(๐ผ, ๐ฝ) : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3} can be viewed as a product space, i.e., (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) {(๐ผ, ๐ฝ) ร— (๐ผ, ๐ฝ) ยท ยท ยท ร— (๐ผ, ๐ฝ) (cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32)(cid:32) (cid:123)(cid:122) (cid:125) (cid:124) n times : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3}. However, F1 โˆฉ F2 โˆฉ ยท ยท ยท F๐‘›3 = {(๐ผ1, ๐ฝ1) ร— (๐ผ2, ๐ฝ2) ร— ยท ยท ยท (๐ผ๐‘›3 , ๐ฝ๐‘›3) : (๐ผ๐‘–, ๐ฝ๐‘–) โˆˆ F๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3}. 40 Therefore, {(๐ผ, ๐ฝ) : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3} (cid:40) F1 โˆฉ F2 โˆฉ ยท ยท ยท โˆฉ F๐‘›3 . (cid:131) Let G represent the event {(๐ผ, ๐ฝ) โŠ‚ ฮฉ | |๐ผ | โ‰ฅ ๐œ‡0|(cid:174)๐‘Ÿ | log(๐‘›1) log(๐‘Ÿ), |๐ฝ | โ‰ฅ ๐œ‡0|(cid:174)๐‘Ÿ | log(๐‘›2) log(๐‘Ÿ)}. 
Although one might have that the conditional probability inequality, based on the [32, Theorem 2.1],[19, Theorem 2], P(F๐‘– |G) โ‰ฅ 1 โˆ’ 4๐‘Ÿ 2 ๐‘›1๐‘›2 , one can find that based on Lemma 2.4: P({(๐ผ, ๐ฝ) โŠ‚ ฮฉ : rank๐‘š ( [T ] ๐ผ,๐ฝ,:) = (cid:174)๐‘Ÿ}|G) = (cid:174)๐‘Ÿ |G) = P({(๐ผ, ๐ฝ) โŠ‚ ฮฉ : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3}|G) โ‰ค P(F1 โˆฉ F2 ยท ยท ยท โˆฉ F๐‘›3 |G) The probability inequality P({(๐ผ, ๐ฝ) โŠ‚ ฮฉ : rank๐‘š ([T ] ๐ผ,๐ฝ,:) = (cid:174)๐‘Ÿ}|G) โ‰ค P(F1 โˆฉ F2 ยท ยท ยท โˆฉ F๐‘›3 |G) P(F๐‘– |G) โˆ’ (๐‘›3 โˆ’ 1). As a result, we directly stops us from applying P(F1 โˆฉ F2 ยท ยท ยท โˆฉ F๐‘›3 |G) โ‰ฅ can not get a lower bound of P({(๐ผ, ๐ฝ) โŠ‚ ฮฉ : [๐‘‡๐‘–] ๐ผ,๐ฝ = ๐‘Ÿ๐‘–, ๐‘– = 1, ยท ยท ยท , ๐‘›3}|G) = (cid:174)๐‘Ÿ |G) via a direct ๐‘›3(cid:205) ๐‘–=1 union of matrix CUR probabilistic estimate results. Furthermore, we hope to emphasize that applying the matrix Chernoff inequality to a flatten tensor also presents numerous challenges. Notice that one flatten tensor T into T in the Fourier domain, the corresponding sampling index set of rows with respect to T becomes (cid:208) ๐‘–โˆˆ๐ผ ๐ผ๐‘–, where ๐ผ๐‘– := {๐‘–, ๐‘›1 + ๐‘–, 2๐‘›1 + ๐‘–, ยท ยท ยท , (๐‘›3 โˆ’ 1)๐‘›1 + ๐‘–}, ๐‘– = 1, ยท ยท ยท , ๐‘›1, and the corresponding sampling index of columns with respect to T becomes (cid:208) ๐‘–โˆˆ๐ผ ๐ฝ๐‘–, where ๐ฝ๐‘– := {๐‘–, ๐‘›2 + ๐‘–, 2๐‘›2 + ๐‘–, ยท ยท ยท , (๐‘›3 โˆ’ 1)๐‘›2 + ๐‘–}, ๐‘– = 1, ยท ยท ยท , ๐‘›2. Without loss of generality and for the sake of brevity, in the following, we focus solely on the case of selecting horizontal slices of T , denoted by the sampling index set ๐ผ. It is easy to find that one can not directly apply the matrix Chernoff inequality to the finite set of positive-semi-definite matrices H = (cid:8)๐‘€1, ๐‘€2, ยท ยท ยท ๐‘€๐‘›3 , ๐‘€๐‘›3+1, ๐‘€๐‘›3+2, ยท ยท ยท , ๐‘€2๐‘›3 , ยท ยท ยท , ๐‘€๐‘›1๐‘›3 (cid:9) , 41 where ๐‘€ ๐‘— = [T ](cid:62) set (cid:208) ๐‘–โˆˆ๐ผ ๐ผ๐‘–. Specifically, the intertwined nature of the index set (cid:208) ๐‘–โˆˆ๐ผ ๐‘—,: ยท [T ] ๐‘—,: with ๐‘— = 1, ยท ยท ยท , ๐‘›1๐‘›3 due to the intertwined property of sampling index ๐ผ๐‘– complicates the estimation of the ๐‘›1(cid:205) ๐‘—=1 spectral bound for the sum of random matrices. Specifically, the expression |๐œŽmax( e( ๐‘— โˆ’ 1)๐‘›3 + ๐‘–ยท [T ] ( ๐‘—โˆ’1)๐‘›3+๐‘–,:)|2 does not serve as a lower bound for max ๐‘– |[T ]๐‘–,:|2 2. 2.6.2 Proof of Theorem 2.5 In this section, we provides a detailed proof of Theorem 2.5, another important supporting the- orem to our main theoretical result Theorem 2.6. To the best of our knowledge, there is no existing tensor version of the result found in [97, Theorem 1.1], which furnishes an explicit expression of numerical constants within the theoremโ€™s statement. Existing results related to tensor versions, such as [138, Theorem 3.1] in the context of tensor completion, typically only imply numerical constants implicitly. One can see that [138, Theorem 3.1] does not give an explicit expression of numerical constants of ๐‘0, ๐‘1 and ๐‘2. Theorem 2.8. [138, Theorem 3.1] Suppose M is an ๐‘›1 ร— ๐‘›2 ร— ๐‘›3 tensor and its reduced t-SVD is given by M = U โˆ— S โˆ— V(cid:62) where U โˆˆ R๐‘›1ร—๐‘Ÿร—๐‘›3, S โˆˆ R๐‘Ÿร—๐‘Ÿร—๐‘›3, and V โˆˆ R๐‘›2ร—๐‘Ÿร—๐‘›3. Suppose M satisfies the standard tensor incoherent condition with parameter ๐œ‡0 > 0. 
Then there exists constants ๐‘0, ๐‘1, ๐‘2 > 0 such that if ๐œ‡0๐‘Ÿ log(๐‘›3(๐‘›1 + ๐‘›2)) min{๐‘›1, ๐‘›2} Then M is the unique minimizer to the follow optimization ๐‘ โ‰ฅ ๐‘0 . with probability at least min X (cid:107)X(cid:107)TNN subject to Pฮฉ(X) = Pฮฉ(M), 1 โˆ’ ๐‘1((๐‘›1 + ๐‘›2)๐‘›3)โˆ’๐‘2. Our work constitutes a substantial contribution through the meticulous analysis of these numer- ical constants, yielding explicit formulations for their expressions. The details of these theoretical advancements are comprehensively elaborated in our theoretical section. Before moving forward, 42 let us introduce several notations used throughout the rest of the supplemental material but not covered in earlier sections. Definition 15. Suppose T is an ๐‘›1ร—๐‘›2ร—๐‘›3 tensor and its compact t-SVD is given by T = Uโˆ—Sโˆ—V(cid:62) where U โˆˆ K๐‘›1ร—๐‘Ÿร—๐‘›3, S โˆˆ K๐‘Ÿร—๐‘Ÿร—๐‘›3 and V โˆˆ K๐‘›2ร—๐‘Ÿร—๐‘›3. Define projection space T as (cid:41) :,๐‘˜,:) : X๐‘˜ โˆˆ K๐‘›2ร—1ร—๐‘›3, Y๐‘˜ โˆˆ K๐‘›1ร—1ร—๐‘›3 ๐‘˜ + Y๐‘˜ โˆ— [V](cid:62) ( [U]:,๐‘˜,: โˆ— X(cid:62) (cid:40) ๐‘Ÿ (cid:213) and the orthogonal projection space TโŠฅ is the orthogonal complement T in K๐‘›1ร—๐‘›2ร—๐‘›3. Define PT(X) ๐‘˜=1 and PTโŠฅ (X) as PT(X) = U โˆ— U(cid:62) โˆ— X + X โˆ— V โˆ— V(cid:62) โˆ’ U โˆ— U(cid:62) โˆ— X โˆ— V โˆ— V(cid:62), PTโŠฅ (X) = (cid:0)I๐‘›1 โˆ’ U โˆ— U(cid:62)(cid:1) โˆ— X โˆ— (cid:0)I๐‘›2 โˆ’ V โˆ— V(cid:62)(cid:1) , where I๐‘›1 is the identity tensor of size ๐‘›1 ร— ๐‘›1 ร— ๐‘›3 and I๐‘›2 is the identity tensor of size ๐‘›2 ร— ๐‘›2 ร— ๐‘›3. Definition 16. Define the operator Rฮฉ : K๐‘›1ร—๐‘›2ร—๐‘›3 โ†’ K๐‘›1ร—๐‘›2ร—๐‘›3 as: ๐›ฟ๐‘–, ๐‘—,๐‘˜ [X]๐‘–, ๐‘—,๐‘˜ หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) ๐‘— , Rฮฉ(X) = (cid:213) 1 ๐‘ ๐‘–, ๐‘—,๐‘˜ where [X]๐‘–, ๐‘—,๐‘˜ is the (๐‘–, ๐‘—, ๐‘˜)-th entry of a tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3. Definition 17. Given two tensor A โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3 and B โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, the inner product of these two tensors is defined as: (cid:104)A, B(cid:105) = 1 ๐‘›3 (cid:62) (cid:16) B (cid:17) . ยท A trace Before we introduce tensor operator norm, we need to introduce a transformed version of a tensor operator. Given a tensor operator, F : K๐‘›1ร—๐‘›2ร—๐‘›3 โ†’ K๐‘›1ร—๐‘›2ร—๐‘›3, the associated transformed operator F : B โ†’ B, where B = (cid:110) B : B โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3 (cid:111) , is defined as F (X) = F (X). Definition 18 (Tensor operator norm). Given a operator F : K๐‘›1ร—๐‘›2ร—๐‘›3 โ†’ K๐‘›1ร—๐‘›2ร—๐‘›3, the operator norm (cid:107)F (cid:107) is defined as (cid:107)F (cid:107) = (cid:107)F (cid:107) = max (cid:107)X(cid:107)F=1 (cid:107)F (X)(cid:107)F = max (cid:107)X(cid:107)F=1 (cid:13) (cid:13) (cid:13) F (X) (cid:13) (cid:13) (cid:13)F . Definition 19 (๐‘™โˆž,2 norm [138]). Given a tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, its ๐‘™โˆž,2 norm is defined as (cid:115)(cid:213) (cid:115)(cid:213) (cid:107)X(cid:107)โˆž,2 := max{max ๐‘– [X]2 ๐‘–,๐‘,๐‘˜ , max ๐‘— [T ]2 ๐‘Ž, ๐‘—,๐‘˜ }. ๐‘Ž,๐‘˜ ๐‘,๐‘˜ 43 Definition 20 (Tensor infinity norm). Given a tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, the tensor infinity norm of it is defined as (cid:107)X(cid:107)โˆž := max ๐‘–, ๐‘—,๐‘˜ | [X]๐‘–, ๐‘—,๐‘˜ |. In the following, we will present a formal definition of the tensor completion problem based on the Bernoulli sampling model. Consider a third-order tensor T โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3 with tubal-rank ๐‘Ÿ. 
We denote ฮฉ as the set of indices of the observed entries. Suppose that ฮฉ is generated according to the Bernoulli sampling model with probability ๐‘. We define the sampling operator Pฮฉ such that for a given tensor X in K๐‘›1ร—๐‘›2ร—๐‘›3, Pฮฉ(X) = (cid:213) [X]๐‘–, ๐‘—,๐‘˜ E๐‘–, ๐‘—,๐‘˜ , (๐‘–, ๐‘—,๐‘˜)โˆˆฮฉ where E๐‘–, ๐‘—,๐‘˜ is a tensor in {0, 1}๐‘›1ร—๐‘›2ร—๐‘›3 and all elements are zero except for the one at the position indexed by (๐‘–, ๐‘—, ๐‘˜). The primary goal of the tensor completion problem is to reconstruct the tensor T from the entries on ฮฉ. We utilize the approach proposed in the references [84, 138], which addresses the tensor completion issue through a specific convex optimization problem formulated as follows: min X (cid:107)X(cid:107)TNN subject to Pฮฉ(X) = Pฮฉ(T ). (2.18) Notice that TNN is convex but not strictly convex. Thus, there might be more than one local mini- mizer to the optimization problem (2.18). Therefore, we need to establish conditions to ensure that our optimization problem has a unique minimizer, which is exactly the tensor we seek to recover. The question of under what conditions T is the unique minimizer of the optimization problem (2.18) naturally arises. In response, Proposition 1 gives an affirmative answer. Before proceeding, it is important to highlight that in the following context, for convenience, we will interchangeably make use of (cid:107) ยท (cid:107) to denote the tensor spectral norm, tensor operator norm or the matrix spectral norm, depending on the specific situation. Proposition 1 ([84]). Assume that T โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3 of tubal-rank ๐‘Ÿ satisfies the incoherence condition with parameter ๐œ‡0 and its compact t-SVD is given by T = U โˆ— S โˆ— V(cid:62) where U โˆˆ K๐‘›1ร—๐‘Ÿร—๐‘›3, S โˆˆ K๐‘Ÿร—๐‘Ÿร—๐‘›3 and V โˆˆ K๐‘›2ร—๐‘Ÿร—๐‘›3. Suppose that ฮฉ is generated according to the Bernoulli sampling model 44 with probability ๐‘. Then tensor T is a unique minimizer of the optimization problem (2.18), if the following two conditions hold: Condition 1. (cid:107)PTRฮฉPT โˆ’ PT(cid:107) โ‰ค 1 2 Condition 2. There exists a tensor Y such that Pฮฉ(Y) = Y and (a) (cid:107)PT(Y) โˆ’ U โˆ— V(cid:62)(cid:107)F โ‰ค 1 4 (cid:113) ๐‘ ๐‘›3 (b) (cid:107)PTโŠฅ (Y) (cid:107) โ‰ค 1 2 Based on Proposition 1, our main result is derived through probabilistic estimation of the Con- dition 1 and Condition 2. Throughout this computation, we explicitly determine both the lower bound of the sampling probability ๐‘ and the probability of the exact recovery of T . The architecture of the entire proof is described as follows. 2.6.2.1 Architecture of the proof of Theorem 2.5 The proof of Theorem 2.5 follows the pipeline developed in [84, 138]. We first state a sufficient condition for T to be the unique optimal solution to the optimization problem (2.18) via construct- ing a dual certificate Y obeying two conditions. This result is summarized in Proposition 1. To obtain our main result Theorem 2.5, we just need to show that the conditions in Proposition 1 hold with a high probability. The Theorem 2.5 is built on the basis of Lemma 2.8, Lemma 2.9, Lemma 2.10 and Corollary 2.1. A detailed roadmap of the proof towards Theorem 2.5 is outlined in Figure 2.14. 
45 Lemma 2.8 Lemma 2.9 Lemma 2.10 Corollary 2.1 condition I condition II Proposition 1 Theorem 2.5 Figure 2.14 The structure of the proof of Theorem 2.5: The core of the proof for Theorem 2.5 relies on assessing the probability that certain conditions, specified in Proposition 1, are met. Condition I and Condition II serve as sufficient criteria to ensure the applicability of Proposition 1. Thus, the proof of Theorem 2.5 primarily involves determining the likelihood that condition I and II are satisfied. The probabilistic assessment of condition I utilizes Lemma 2.8 as a fundamental instrument. Similarly, the evaluation of condition II employs Lemmas 2.8 to 2.10, and Corollary 2.1 as essential tools. Before delving into the proof of Theorem 2.5, we will introduce several supporting lemmas to lay the necessary foundation. 2.6.2.2 Supporting lemmas for the proof of Theorem 2.5 Lemma 2.5 (Non-commutative Bernstein inequality [115]). Let ๐‘‹1, ๐‘‹2, ยท ยท ยท , ๐‘‹๐ฟ be independent zero-mean random matrices of dimension ๐‘›1 ร— ๐‘›2. Suppose ๐œŽ2 = max E[ ๐‘‹๐‘˜ ๐‘‹ (cid:62) ๐‘˜ ] , E[ and (cid:107) ๐‘‹๐‘˜ (cid:107) โ‰ค ๐‘€. Then for any ๐œ โ‰ฅ 0, (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:32)(cid:13) ๐ฟ (cid:13) (cid:213) (cid:13) (cid:13) (cid:13) ๐‘‹๐‘˜ P ๐‘˜=1 โ‰ฅ ๐œ โ‰ค (๐‘›1 + ๐‘›2) exp (cid:40)(cid:13) (cid:13) (cid:13) (cid:13) (cid:13) ๐ฟ (cid:213) ๐‘˜=1 (cid:33) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) ๐ฟ (cid:213) ๐‘˜=1 ๐‘‹ (cid:62) ๐‘˜ ๐‘‹๐‘˜ ] (cid:41) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:32) โˆ’๐œ2/2 ๐œŽ2 + ๐‘€๐œ 3 (cid:33) . The following lemma is a variant of Non-commutative Bernstein inequality. Lemma 2.6. Let ๐‘‹1, ๐‘‹2, ยท ยท ยท , ๐‘‹๐ฟ be independent zero-mean random matrices of dimension ๐‘›1 ร— ๐‘›2. Suppose ๐œŽ2 = max (cid:40)(cid:13) (cid:13) (cid:13) (cid:13) (cid:13) E[ ๐ฟ (cid:213) ๐‘˜=1 ๐‘‹๐‘˜ ๐‘‹ (cid:62) ๐‘˜ ] (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) , (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) E[ ๐ฟ (cid:213) ๐‘˜=1 ๐‘‹ (cid:62) ๐‘˜ ๐‘‹๐‘˜ ] (cid:41) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) 46 and (cid:107) ๐‘‹๐‘˜ (cid:107) โ‰ค ๐‘€, where ๐‘€ is a positive number. If we choose ๐œ = (cid:112) 2๐‘๐œŽ2 log (๐‘›1 + ๐‘›2)+๐‘๐‘€ log (๐‘›1 + ๐‘›2), we have P (cid:32)(cid:13) (cid:13) (cid:13) (cid:13) (cid:13) ๐ฟ (cid:213) ๐‘˜=1 ๐‘‹๐‘˜ (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) โ‰ฅ (cid:112) 2๐‘๐œŽ2 log (๐‘›1 + ๐‘›2) + ๐‘๐‘€ log (๐‘›1 + ๐‘›2) (cid:33) โ‰ค (๐‘›1 + ๐‘›2)1โˆ’๐‘, where ๐‘ is any positive number greater than 1. The following fact is very useful and we will make frequent use of the result for the proofs of Theorem 2.5 and Proposition 1. Lemma 2.7. Suppose T is an ๐‘›1 ร— ๐‘›2 ร— ๐‘›3 tensor with its compact t-SVD given by T = U โˆ— ฮฃ โˆ— V(cid:62) and satisfy incoherence condition with parameter ๐œ‡0. Then, (cid:107)PT( หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) ๐‘— )(cid:107)2 F โ‰ค (๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ ๐‘›1๐‘›2 . The following lemma shows how likely that the operator norm PTRฮฉPT โˆ’ PT is smaller than 1 2. Such result will help us calculate how likely the Condition 1 in Proposition 1 holds. Lemma 2.8. Assume that ฮฉ is generated according to the Bernoulli distribution with probability ๐‘, then holds with probability at least 1 โˆ’ 2๐‘›1๐‘›2๐‘›3 exp (cid:16) โˆ’ 3๐‘๐‘›1๐‘›2 28(๐‘›1+๐‘›2)๐œ‡0๐‘Ÿ (cid:17) . 
(cid:107)PTRฮฉPT โˆ’ PT(cid:107) โ‰ค 1 2 The following lemma states that given an arbitrary tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, tensor spectral norm of difference between Rฮฉ(X) and X can be bounded with tensor infinity norm and ๐‘™โˆž,2 norm with a high probability. Lemma 2.9. Given an arbitrary tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3. Assume that ฮฉ is generated according to the Bernoulli distribution with probability ๐‘. Then, for any constant ๐‘2 > 1, we have (cid:107)Rฮฉ(X) โˆ’ X(cid:107) โ‰ค (cid:107)X(cid:107)โˆž,2 2๐‘2 ๐‘ log((๐‘›1 + ๐‘›2)๐‘›3) + ๐‘2 log((๐‘›1 + ๐‘›2)๐‘›3) ๐‘ (cid:107)X(cid:107)โˆž (2.19) (cid:115) holds with probability at least 1 โˆ’ ((๐‘›1 + ๐‘›2)๐‘›3)1โˆ’๐‘2. The following lemma states that given an arbitrary tensor X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, the bound of ๐‘™โˆž,2 distance between PTRฮฉ(X) and PT(X) can be controlled by the ๐‘™โˆž,2 norm of X and the tensor infinity norm of X with a high probability. 47 Lemma 2.10. Assume that ฮฉ is generated according to the Bernoulli distribution with probability ๐‘. For any positive number ๐‘1 โ‰ฅ 2, then we can get (cid:32) P (cid:107)(PTRฮฉ(X) โˆ’ PT(X))(cid:107)โˆž,2 โ‰ค (cid:115) 4๐‘1(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ log ((๐‘›1 + ๐‘›2)๐‘›3) ๐‘๐‘›1๐‘›2 ยท (cid:107)X(cid:107)โˆž,2 + ๐‘1 log((๐‘›1 + ๐‘›2)๐‘›3) ๐‘ (cid:115) (๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ ๐‘›1๐‘›2 โ‰ฅ 1 โˆ’ ((๐‘›1 + ๐‘›2)๐‘›3)2โˆ’๐‘1. (cid:107)X(cid:107)โˆž(cid:170) (cid:174) (cid:172) The following lemma states that, given an arbitrary tensor X โˆˆ R๐‘›1ร—๐‘›2ร—๐‘›3, the tensor infinity norm of PTRฮฉPT(X) โˆ’ PT(X) can be bounded by the tensor infinity norm of PT(X) with a high probability. Lemma 2.11. Assume that ฮฉ is generated according to the Bernoulli distribution with probability ๐‘. For any X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, then holds with probability at least 1 โˆ’ 2๐‘›1๐‘›2๐‘›3 exp (cid:107)(PTRฮฉPT โˆ’ PT) (X)(cid:107)โˆž โ‰ค (cid:16) โˆ’3๐‘๐‘›1๐‘›2 16(๐‘›1+๐‘›2)๐œ‡0๐‘Ÿ . (cid:107)PT(X)(cid:107)โˆž (cid:17) 1 2 When PT(X) = X, we can easily achieve the following corollary. Corollary 2.1. Assume that ฮฉ is generated according to the Bernoulli distribution with probability ๐‘ž. For any X โˆˆ K๐‘›1ร—๐‘›2ร—๐‘›3, if PT(X) = X then holds with probability at least 1 โˆ’ 2๐‘›1๐‘›2๐‘›3 exp (cid:16) โˆ’3๐‘ž๐‘›1๐‘›2 28(๐‘›1+๐‘›2)๐œ‡0๐‘Ÿ (cid:13) (cid:13) (cid:0)PTRฮฉ๐‘ก PT โˆ’ PT(cid:1) (X)(cid:13) (cid:13)โˆž โ‰ค (cid:107)X(cid:107)โˆž 1 2 (cid:17) . Corollary 2.1 is used to give a probabilistic estimate towards the lower bound of (cid:107)D๐‘ก (cid:107)โˆž, where D๐‘ก is defined in Equation (2.22) later. Now we are ready to provide the proof of Theorem 2.5. Proof of Theorem 2.5. First of all, one can get that the Condition 1 holds with probability at least (cid:18) (cid:19) 1 โˆ’ 2๐‘›1๐‘›2๐‘›3 exp โˆ’ 3๐‘๐‘›1๐‘›2 28(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ (2.20) according to Lemma 2.8. Next, our main goal is to construct a dual certificate Y that satisfies the condition 2. We do this using the Golfing Scheme [23, 50]. Choose ๐‘ก0 as (cid:24) ๐‘ก0 โ‰ฅ log2 (cid:18) (cid:114) ๐‘›3๐‘Ÿ ๐‘ 4 (cid:19)(cid:25) (2.21) 48 where (cid:100)ยท(cid:101) is the ceil function. Suppose that the set ฮฉ of observed entries is generated from ฮฉ = โˆช๐‘ก0 ๐‘ก=1ฮฉ๐‘ก with P[(๐‘–, ๐‘—, ๐‘˜) โˆˆ ฮฉ๐‘ก] = ๐‘ž := 1 โˆ’ (1 โˆ’ ๐‘) see that for any (๐‘–, ๐‘—, ๐‘˜) โˆˆ [๐‘›1] ร— [๐‘›2] ร— [๐‘›3], 1 ๐‘ก 0 and is independent of each other. 
It is easy to P[(๐‘–, ๐‘—, ๐‘˜) โˆˆ ฮฉ] =1 โˆ’ P[(๐‘–, ๐‘—, ๐‘˜) โˆ‰ โˆช๐‘ก0 ๐‘ก=1ฮฉ๐‘ก] ๐‘ก0(cid:214) =1 โˆ’ P[(๐‘–, ๐‘—, ๐‘˜) โˆˆ ฮฉ๐‘ ๐‘ก ] = 1 โˆ’ ๐‘ก0(cid:214) ๐‘ก=0 (1 โˆ’ ๐‘) 1 ๐‘ก 0 = ๐‘. ๐‘ก=1 ๐‘ก0(cid:208) ๐‘ก=1 Therefore, the construction of ฮฉ = K๐‘›1ร—๐‘›2ร—๐‘›3 : ๐‘ก = 0, ยท ยท ยท , ๐‘ก0} be a sequence of tensors with A0 = 0 and ฮฉ๐‘ก shares the same distribution as that of ฮฉ. Let {A๐‘ก โˆˆ A๐‘ก = A๐‘กโˆ’1 + Rฮฉ๐‘ก PT(U โˆ— V(cid:62) โˆ’ PT(A๐‘กโˆ’1)), where Rฮฉ๐‘ก (T ) := ๐‘ž 1ฮฉ๐‘ก (๐‘–, ๐‘—, ๐‘˜) [T ]๐‘–, ๐‘—,๐‘˜ หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) 1 Next, our goal is to prove that Pฮฉ(Y) = Y by mathematical induction. For ๐‘ก = 0, Pฮฉ(A0) = (cid:205) ๐‘–โˆˆ[๐‘›1], ๐‘— โˆˆ[๐‘›2],๐‘˜ โˆˆ[๐‘›3] ๐‘— . Set Y := A๐‘ก0. Pฮฉ(0) = 0 = A0. Notice that A1 = A0 + Rฮฉ1 PT(U โˆ— V(cid:62) โˆ’ PT(A0)) = A0 + Rฮฉ1 PT(U โˆ— V(cid:62)) = Rฮฉ1 (U โˆ— U(cid:62) โˆ— (U โˆ— V(cid:62)) + (U โˆ— V(cid:62)) โˆ— V โˆ— V(cid:62) โˆ’ U โˆ— U(cid:62) โˆ— (U โˆ— V(cid:62)) โˆ— V โˆ— V(cid:62)) = Rฮฉ1 (U โˆ— V(cid:62)). Due to ฮฉ1 โІ ฮฉ, it is easy to see that Pฮฉ(A1) = Pฮฉ(Rฮฉ1 (U โˆ— V(cid:62))) = Rฮฉ1 (U โˆ— V(cid:62)) = A1. Assume that for ๐‘˜ โ‰ค ๐‘ก0 โˆ’ 1, it holds that Pฮฉ(A๐‘˜ ) = A๐‘˜ . By linearity of operator Pฮฉ and ฮฉ๐‘ก0 โІ ฮฉ, it follows that Pฮฉ(Y) = Pฮฉ(A๐‘ก0) = Pฮฉ(A๐‘ก0โˆ’1 + Rฮฉ๐‘ก 0 PT(U โˆ— V(cid:62) โˆ’ PT(A๐‘ก0โˆ’1))) = Pฮฉ(A๐‘ก0โˆ’1) + Pฮฉ(Rฮฉ๐‘ก 0 PT(U โˆ— V(cid:62) โˆ’ PT(A๐‘ก0โˆ’1))) = A๐‘ก0โˆ’1 + Rฮฉ๐‘ก 0 PT(U โˆ— V(cid:62) โˆ’ PT(A๐‘ก0โˆ’1)) = A๐‘ก0 = Y. Therefore Y = A๐‘ก0 is the dual certificate. 49 Now letโ€™s prove that (cid:107)PT(Y) โˆ’ U โˆ— V(cid:62)(cid:107)F โ‰ค 1 4 D๐‘ก = U โˆ— V(cid:62) โˆ’ PT(A๐‘ก). (cid:113) ๐‘ ๐‘›3 . For ๐‘ก = 0, 1, ยท ยท ยท , ๐‘ก0, set Notice that ๐‘ ๐‘ก0 Thus, one can derive the following results by Lemma 2.8: for each ๐‘ก, 1 0 โ‰ฅ 1 โˆ’ (1 โˆ’ ๐‘ก ๐‘ž = 1 โˆ’ (1 โˆ’ ๐‘) ๐‘ ๐‘ก0 ) = . (cid:107)D๐‘ก (cid:107)F โ‰ค (cid:13) (cid:13)PT โˆ’ PTRฮฉ๐‘ก PT (cid:13) (cid:13) (cid:107)D๐‘กโˆ’1(cid:107)F โ‰ค 1 2 (cid:107)D๐‘กโˆ’1(cid:107)F holds with probability at least 1 โˆ’ 2๐‘›1๐‘›2๐‘›3 exp(โˆ’ 3๐‘ž๐‘›1๐‘›2 28(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ ). (2.22) (2.23) (2.24) Applying (2.24) from ๐‘ก = ๐‘ก0 to ๐‘ก = 1, we will have that (cid:107)PT(Y โˆ’ U โˆ— V(cid:62)) (cid:107)F = (cid:107)D๐‘ก0 (cid:107)F โ‰ค 1 2 (cid:107)D๐‘ก0โˆ’1(cid:107)F โ‰ค ยท ยท ยท โ‰ค ( )๐‘ก0 (cid:107)U โˆ— V(cid:62)(cid:107)F โ‰ค ( 1 2 โˆš ๐‘Ÿ )๐‘ก0 1 2 (2.25) holds with probability at least 1 โˆ’ 2๐‘ก0๐‘›1๐‘›2๐‘›3 exp(โˆ’ Since ๐‘ก0 โ‰ฅ (cid:108) log2 (cid:16) (cid:113) ๐‘›3๐‘Ÿ ๐‘ 4 (cid:17)(cid:109) , (cid:107)PT(Y โˆ’ U โˆ— V(cid:62))(cid:107)F โ‰ค 1 4 1 โˆ’ 2๐‘ก0๐‘›1๐‘›2๐‘›3 exp(โˆ’ Next, we move on to prove (cid:107)PTโŠฅ (Y)(cid:107) โ‰ค 1 Lemma 2.9 for ๐‘ก0 times, we can get 3๐‘ž๐‘›1๐‘›2 28(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ ). (cid:113) ๐‘ ๐‘›3 3๐‘ž๐‘›1๐‘›2 28(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ 2. Recall that Y = ). holds with probability at least (2.26) ๐‘ก0(cid:205) ๐‘–=1 Rฮฉ๐‘ก PTD๐‘กโˆ’1. 
By applying Lemma 2.9 $t_0$ times, we get
\[
\begin{aligned}
\|\mathcal{P}_{T^\perp}(\mathcal{Y})\|
&\le\sum_{t=1}^{t_0}\big\|\mathcal{P}_{T^\perp}\big(\mathcal{R}_{\Omega_t}\mathcal{P}_T-\mathcal{P}_T\big)(\mathcal{D}_{t-1})\big\|
=\sum_{t=1}^{t_0}\big\|\mathcal{P}_{T^\perp}\big(\mathcal{R}_{\Omega_t}-\mathcal{I}\big)(\mathcal{P}_T(\mathcal{D}_{t-1}))\big\|
\le\sum_{t=1}^{t_0}\big\|\big(\mathcal{R}_{\Omega_t}-\mathcal{I}\big)(\mathcal{P}_T(\mathcal{D}_{t-1}))\big\| && (2.27)\\
&\le\sum_{t=1}^{t_0}\left(\frac{c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{P}_T(\mathcal{D}_{t-1})\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\,\|\mathcal{P}_T(\mathcal{D}_{t-1})\|_{\infty,2}\right) && (2.28)\\
&=\sum_{t=1}^{t_0}\left(\frac{c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_{t-1}\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\,\|\mathcal{D}_{t-1}\|_{\infty,2}\right) && (2.29)
\end{aligned}
\]
holds with probability at least
\[
1-\frac{t_0}{((n_1+n_2)n_3)^{c_2-1}}.
\tag{2.30}
\]
Here (2.28) holds due to (2.23), and (2.29) is due to $\mathcal{P}_T(\mathcal{D}_t)=\mathcal{D}_t$ by the construction of $\mathcal{D}_t$ in Equation (2.22). Next, we bound (2.29) by bounding the following two terms:
(i) $\sum_{t=1}^{t_0}\frac{c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_{t-1}\|_\infty$ and
(ii) $\sum_{t=1}^{t_0}\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\|\mathcal{D}_{t-1}\|_{\infty,2}$,
via estimating the upper bounds of $\|\mathcal{D}_{t-1}\|_\infty$ and $\|\mathcal{D}_{t-1}\|_{\infty,2}$.

By applying Corollary 2.1 $t-1$ times, where $2\le t\le t_0$, we have
\[
\|\mathcal{D}_{t-1}\|_\infty=\big\|\big(\mathcal{P}_T-\mathcal{P}_T\mathcal{R}_{\Omega_{t-1}}\mathcal{P}_T\big)\cdots\big(\mathcal{P}_T-\mathcal{P}_T\mathcal{R}_{\Omega_1}\mathcal{P}_T\big)\mathcal{D}_0\big\|_\infty\le\Big(\frac{1}{2}\Big)^{t-1}\|\mathcal{D}_0\|_\infty,
\]
which holds with probability at least $1-2n_1n_2n_3(t-1)\exp\!\big(\frac{-3qn_1n_2}{28(n_1+n_2)\mu_0 r}\big)$. Therefore,
\[
\sum_{t=1}^{t_0}\frac{c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_{t-1}\|_\infty\le\frac{2c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_0\|_\infty
\tag{2.31}
\]
holds with probability at least $1-2n_1n_2n_3(t_0-1)\exp\!\big(-\frac{3qn_1n_2}{28(n_1+n_2)\mu_0 r}\big)$.

Now we estimate the upper bound of $\sum_{t=1}^{t_0}\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\|\mathcal{D}_{t-1}\|_{\infty,2}$ by bounding $\|\mathcal{D}_{t-1}\|_{\infty,2}$. For simplicity of expression, we denote
\[
a=2\sqrt{\frac{c_1\log((n_1+n_2)n_3)}{q}}\sqrt{\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}\quad\text{and}\quad b=\frac{c_1\log((n_1+n_2)n_3)}{q}\sqrt{\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}.
\]
By applying Lemma 2.10 $t-1$ times and using the fact that $\mathcal{P}_T(\mathcal{D}_s)=\mathcal{D}_s$ for all $0\le s\le t_0$, we obtain
\[
\|\mathcal{D}_{t-1}\|_{\infty,2}=\big\|\big(\mathcal{P}_T-\mathcal{P}_T\mathcal{R}_{\Omega_{t-1}}\mathcal{P}_T\big)(\mathcal{D}_{t-2})\big\|_{\infty,2}\le a\|\mathcal{D}_{t-2}\|_{\infty,2}+b\|\mathcal{D}_{t-2}\|_\infty\le\cdots\le a^{t-1}\|\mathcal{D}_0\|_{\infty,2}+b\sum_{i=0}^{t-2}a^i\|\mathcal{D}_{t-2-i}\|_\infty,
\]
which holds with probability at least $1-\frac{t-1}{(n_1n_3+n_2n_3)^{c_1-2}}$ for $2\le t\le t_0$.
Therefore,
\[
\begin{aligned}
\sum_{t=1}^{t_0}\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\,\|\mathcal{D}_{t-1}\|_{\infty,2}
&\le\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\left(\sum_{t=1}^{t_0}a^{t-1}\|\mathcal{D}_0\|_{\infty,2}+\sum_{t=2}^{t_0}b\sum_{i=0}^{t-2}a^i\|\mathcal{D}_{t-2-i}\|_\infty\right)\\
&=\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{1-a^{t_0}}{1-a}\|\mathcal{D}_0\|_{\infty,2}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot b\sum_{t=2}^{t_0}\sum_{i=0}^{t-2}a^i\|\mathcal{D}_{t-2-i}\|_\infty
\end{aligned}
\]
holds with probability at least $1-\frac{t_0-1}{(n_1n_3+n_2n_3)^{c_1-2}}$. Taking the estimate of the upper bound of $\|\mathcal{D}_t\|_\infty$ into account, we thus have
\[
\begin{aligned}
(2.29)&\le\frac{2c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_0\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{\|\mathcal{D}_0\|_{\infty,2}}{1-a}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot b\sum_{t=2}^{t_0}\sum_{i=0}^{t-2}a^i\Big(\frac{1}{2}\Big)^{t-2-i}\|\mathcal{D}_0\|_\infty && (2.32),(2.33)\\
&\le\frac{2c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_0\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{\|\mathcal{D}_0\|_{\infty,2}}{1-a}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{2b}{1-2a}\|\mathcal{D}_0\|_\infty && (2.34)
\end{aligned}
\]
holds with probability at least $1-\frac{t_0-1}{(n_1n_3+n_2n_3)^{c_1-2}}-2(t_0-1)n_1n_2n_3\exp\!\big(\frac{-3qn_1n_2}{28(n_1+n_2)\mu_0 r}\big)$ when $0<a\le\frac{1}{4}$. Therefore,
\[
\begin{aligned}
\|\mathcal{P}_{T^\perp}(\mathcal{Y})\|&\le\sum_{t=1}^{t_0}\left(\frac{c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_{t-1}\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\,\|\mathcal{D}_{t-1}\|_{\infty,2}\right)\\
&\le\frac{2c_2\log((n_1+n_2)n_3)}{q}\|\mathcal{D}_0\|_\infty+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{\|\mathcal{D}_0\|_{\infty,2}}{1-a}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{2b}{1-2a}\|\mathcal{D}_0\|_\infty
\end{aligned}
\]
holds with probability at least
\[
1-\frac{t_0-1}{(n_1n_3+n_2n_3)^{c_1-2}}-2(t_0-1)n_1n_2n_3\exp\!\left(\frac{-3qn_1n_2}{28(n_1+n_2)\mu_0 r}\right)-\frac{t_0}{(n_1n_3+n_2n_3)^{c_2-1}},
\]
provided that $0<a\le\frac{1}{4}$. Note that $\|\mathcal{D}_0\|_\infty=\|\mathcal{U}\ast\mathcal{V}^\top\|_\infty\le\frac{(n_1+n_2)\mu_0 r}{2n_1n_2}$ and $\|\mathcal{D}_0\|_{\infty,2}=\|\mathcal{U}\ast\mathcal{V}^\top\|_{\infty,2}\le\sqrt{\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}$.
Combining (2.30), (2.31), and (2.34), we thus have
\[
\begin{aligned}
\|\mathcal{P}_{T^\perp}(\mathcal{Y})\|&\le\frac{c_2\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{\sqrt{(n_1+n_2)\mu_0 r}}{\sqrt{n_1n_2}\,(1-a)}+\sqrt{\frac{2c_2\log((n_1+n_2)n_3)}{q}}\cdot\frac{b}{1-2a}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}\\
&\le\frac{c_2\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}+2c_1\sqrt{2c_2}\left(\frac{\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}\right)^{3/2}+\frac{4}{3}\sqrt{2c_2\cdot\frac{\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}
\end{aligned}
\]
holds with probability at least
\[
1-\frac{t_0-1}{(n_1n_3+n_2n_3)^{c_1-2}}-2(t_0-1)n_1n_2n_3\exp\!\left(\frac{-3qn_1n_2}{28(n_1+n_2)\mu_0 r}\right)-\frac{t_0}{(n_1n_3+n_2n_3)^{c_2-1}},
\]
provided that $0<a\le\frac{1}{4}$. Since $a=2\sqrt{\frac{c_1\log((n_1+n_2)n_3)}{q}}\sqrt{\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}$, the restriction $0<a\le\frac{1}{4}$ is equivalent to
\[
q\ge 64c_1\log((n_1+n_2)n_3)\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}.
\]
Notice that
\[
p\ge\frac{256(n_1+n_2)\mu_0\beta r\log^2((n_1+n_2)n_3)}{n_1n_2}.
\]
We thus have
\[
t_0=\left\lceil\log_2\!\left(4\sqrt{\frac{n_3 r}{p}}\right)\right\rceil\le\left\lceil\log_2\!\left(4\sqrt{\frac{n_1n_2n_3}{256(n_1+n_2)\mu_0\beta\log^2((n_1+n_2)n_3)}}\right)\right\rceil\le\left\lceil\frac{1}{2}\log_2\!\left(\frac{n_1n_2n_3}{(n_1+n_2)\mu_0\beta}\right)-2\right\rceil\le\log((n_1+n_2)n_3).
\]
In addition, $q\ge\frac{p}{t_0}\ge\frac{256(n_1+n_2)\mu_0\beta r\log^2((n_1+n_2)n_3)}{n_1n_2}\cdot\frac{1}{\log((n_1+n_2)n_3)}=\frac{256(n_1+n_2)\mu_0\beta r\log((n_1+n_2)n_3)}{n_1n_2}$, i.e., $\frac{(n_1+n_2)\mu_0 r\log((n_1+n_2)n_3)}{qn_1n_2}\le\frac{1}{256\beta}$. Therefore the condition $q\ge 64c_1\log((n_1+n_2)n_3)\frac{(n_1+n_2)\mu_0 r}{n_1n_2}$ holds when $c_1=4\beta$. Hence, the condition $a\le\frac{1}{4}$ holds. In addition, by setting $c_2=12\beta$, we have
\[
\|\mathcal{P}_{T^\perp}(\mathcal{Y})\|\le\frac{c_2\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}+2c_1\sqrt{2c_2}\left(\frac{\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}\right)^{3/2}+\frac{4}{3}\sqrt{2c_2\cdot\frac{\log((n_1+n_2)n_3)}{q}\cdot\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}
\le\frac{c_2}{256\beta}+2c_1\sqrt{2c_2}\left(\frac{1}{256\beta}\right)^{3/2}+\frac{4}{3}\sqrt{\frac{2c_2}{256\beta}}<\frac{1}{2}
\]
with probability at least
\[
1-\frac{\log(n_1n_3+n_2n_3)}{(n_1n_3+n_2n_3)^{4\beta-2}}-\frac{\log(n_1n_3+n_2n_3)}{(n_1n_3+n_2n_3)^{27\beta-2}}-\frac{\log(n_1n_3+n_2n_3)}{(n_1n_3+n_2n_3)^{12\beta-1}}\ge 1-\frac{3\log(n_1n_3+n_2n_3)}{(n_1n_3+n_2n_3)^{4\beta-2}}.
\]
Notice that the probabilistic estimate for the validity of Condition 2 is predicated on the assumption that Condition 1 holds true, where we showed $\|\mathcal{D}_t\|_\infty\le(\frac{1}{2})^{t-1}\|\mathcal{D}_0\|_\infty$ based on Condition 1. Thus, $\mathcal{T}$ is the unique minimizer with probability at least
\[
1-\frac{3\log(n_1n_3+n_2n_3)}{(n_1n_3+n_2n_3)^{4\beta-2}}.\qquad\square
\]

2.6.2.3 Proof of supporting lemmas towards Theorem 2.5

Proof of Lemma 2.6. Substitute $\tau=\sqrt{2c\sigma^2\log(n_1+n_2)}+cM\log(n_1+n_2)$ into $\frac{-\tau^2/2}{\sigma^2+M\tau/3}$ in Lemma 2.5.
We can get
\[
\frac{-\tau^2/2}{\sigma^2+\frac{M\tau}{3}}=-\frac{2c\sigma^2\log(n_1+n_2)+2\sqrt{2}\,c^{3/2}\sigma M\log^{3/2}(n_1+n_2)+c^2M^2\log^2(n_1+n_2)}{2\sigma^2+\frac{2\sqrt{2}}{3}c^{1/2}\sigma M\log^{1/2}(n_1+n_2)+\frac{2}{3}cM^2\log(n_1+n_2)}\le -c\log(n_1+n_2).\qquad\square
\]

Proof of Lemma 2.7.
\[
\begin{aligned}
\big\|\mathcal{P}_T(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top)\big\|_F^2
&=\big\langle\mathcal{P}_T(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top),\mathcal{P}_T(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top)\big\rangle
=\big\langle\mathcal{P}_T(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\\
&=\big\|\mathcal{U}^\top\ast\mathring{e}_i\big\|_F^2+\big\|\mathcal{V}^\top\ast\mathring{e}_j\big\|_F^2-\big\|\mathcal{U}^\top\ast\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\ast\mathcal{V}\big\|_F^2
\le\big\|\mathcal{U}^\top\ast\mathring{e}_i\big\|_F^2+\big\|\mathcal{V}^\top\ast\mathring{e}_j\big\|_F^2\le\frac{(n_1+n_2)\mu_0 r}{n_1n_2}.\qquad\square
\end{aligned}
\]

In the following context, we use $\delta_{i,j,k}$ to denote the indicator function $\mathbf{1}_{(i,j,k)\in\Omega}$.

Proof of Lemma 2.8. By the fact that $\mathcal{P}_T$ is a self-adjoint and idempotent operator, we get $\mathbb{E}[\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T]=\mathcal{P}_T(\mathbb{E}\mathcal{R}_\Omega)\mathcal{P}_T=\mathcal{P}_T$. It is easy to check that
\[
\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})=\sum_{i,j,k}\frac{1}{p}\delta_{i,j,k}\big\langle\mathcal{X},\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big).
\]
Fix a tensor $\mathcal{X}\in\mathbb{K}^{n_1\times n_2\times n_3}$; we can write
\[
(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T)(\mathcal{X})=\sum_{i,j,k}\Big(\frac{1}{p}\delta_{ijk}-1\Big)\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)=:\sum_{i,j,k}\mathcal{H}_{ijk}(\mathcal{X}),
\]
where $\mathcal{H}_{ijk}:\mathbb{K}^{n_1\times n_2\times n_3}\to\mathbb{K}^{n_1\times n_2\times n_3}$ is a self-adjoint random operator and $\delta_{i,j,k}$ is the indicator function. It is direct to see that $\mathbb{E}[\mathcal{H}_{ijk}]=0$, due to the fact that $\mathbb{E}(\frac{1}{p}\delta_{i,j,k}-1)=\frac{1}{p}\mathbb{E}(\delta_{i,j,k})-1=0$.

Define the operator $\bar{\mathcal{H}}_{ijk}:\mathbb{B}\to\mathbb{B}$, where $\mathbb{B}=\{\bar{\mathcal{B}}:\mathcal{B}\in\mathbb{K}^{n_1\times n_2\times n_3}\}$ denotes the set of block diagonal matrices whose blocks are the frontal slices of $\hat{\mathcal{B}}$, as
\[
\bar{\mathcal{H}}_{ijk}(\bar{\mathcal{X}}):=\overline{\mathcal{H}_{ijk}(\mathcal{X})}=\Big(\frac{1}{p}\delta_{ijk}-1\Big)\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\,\overline{\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)}.
\]
It is easy to check that $\bar{\mathcal{H}}_{ijk}$ is also self-adjoint, using the fact that the operator $\mathcal{P}_T(\cdot)$ is self-adjoint. Using the fact that $\mathbb{E}(\frac{1}{p}\delta_{i,j,k}-1)=0$ again, we have $\mathbb{E}[\bar{\mathcal{H}}_{ijk}]=0$. To prove the result by the non-commutative Bernstein inequality, we need to bound $\|\bar{\mathcal{H}}_{ijk}\|$ and $\big\|\sum_{i,j,k}\mathbb{E}[\bar{\mathcal{H}}_{ijk}^2]\big\|$.
Firstly, we have
\[
\begin{aligned}
\|\bar{\mathcal{H}}_{ijk}\|&=\sup_{\|\mathcal{X}\|_F=1}\big\|\mathcal{H}_{ijk}(\mathcal{X})\big\|_F
=\sup_{\|\mathcal{X}\|_F=1}\Big\|\Big(\tfrac{1}{p}\delta_{ijk}-1\Big)\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\Big\|_F\\
&=\sup_{\|\mathcal{X}\|_F=1}\Big\|\Big(\tfrac{1}{p}\delta_{ijk}-1\Big)\big\langle\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big),\mathcal{X}\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\Big\|_F\\
&\le\sup_{\|\mathcal{X}\|_F=1}\frac{1}{p}\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\|_F\,\|\mathcal{X}\|_F\,\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\|_F
=\frac{1}{p}\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\|_F^2\le\frac{\mu_0(n_1+n_2)r}{pn_1n_2}.
\end{aligned}
\]
Next, we move on to bound $\big\|\sum_{i,j,k}\mathbb{E}[\bar{\mathcal{H}}_{ijk}^2]\big\|$. Using the fact that $\mathcal{P}_T$ is a self-adjoint and idempotent operator, we get
\[
\bar{\mathcal{H}}_{ijk}^2(\bar{\mathcal{X}})=\Big(\frac{1}{p}\delta_{ijk}-1\Big)^2\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\rangle\,\overline{\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)}.
\]
Note that $\mathbb{E}\big[\big(\frac{1}{p}\delta_{ijk}-1\big)^2\big]=\frac{1-p}{p}\le\frac{1}{p}$. Hence
\[
\begin{aligned}
\Big\|\sum_{i,j,k}\mathbb{E}\big[\bar{\mathcal{H}}_{ijk}^2\big](\bar{\mathcal{X}})\Big\|_F
&\le\frac{1}{p}\Big\|\sum_{i,j,k}\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\rangle\,\overline{\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)}\Big\|_F\\
&\le\frac{\sqrt{n_3}}{p}\cdot\max_{i,j,k}\Big\{\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\rangle\Big\}\cdot\Big\|\sum_{i,j,k}\big\langle\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top,\mathcal{P}_T(\mathcal{X})\big\rangle\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\Big\|_F\\
&\le\frac{\sqrt{n_3}}{p}\cdot\Big(\max_{i,j,k}\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\|_F^2\Big)\cdot\|\mathcal{P}_T(\mathcal{X})\|_F
\le\frac{\sqrt{n_3}\,(n_1+n_2)\mu_0 r}{pn_1n_2}\,\|\mathcal{P}_T(\mathcal{X})\|_F\qquad\text{(by Lemma 2.7)}\\
&\le\frac{\sqrt{n_3}\,(n_1+n_2)\mu_0 r}{pn_1n_2}\,\|\mathcal{X}\|_F.
\end{aligned}
\]
Since $\|\bar{\mathcal{X}}\|_F=\sqrt{n_3}\,\|\mathcal{X}\|_F$, the operator norm of $\sum_{i,j,k}\mathbb{E}[\bar{\mathcal{H}}_{ijk}^2]$ is bounded by $\frac{(n_1+n_2)\mu_0 r}{pn_1n_2}$. Thus, we can apply the non-commutative Bernstein inequality. Notice that $\frac{\sigma^2}{M}=1$ since $\sigma^2=M=\frac{(n_1+n_2)\mu_0 r}{pn_1n_2}$. Thus, by Lemma 2.5, we have
\[
\mathbb{P}\Big[\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T\|>\frac{1}{2}\Big]
=\mathbb{P}\Big[\Big\|\sum_{i,j,k}\mathcal{H}_{ijk}\Big\|>\frac{1}{2}\Big]
=\mathbb{P}\Big[\Big\|\sum_{i,j,k}\bar{\mathcal{H}}_{ijk}\Big\|>\frac{1}{2}\Big]
\le 2n_1n_2n_3\exp\!\left(-\frac{3pn_1n_2}{28(n_1+n_2)\mu_0 r}\right).\qquad\square
\]

Proof of Lemma 2.9.
It is easy to check that
\[
\mathcal{R}_\Omega(\mathcal{X})-\mathcal{X}=\sum_{i,j,k}\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)[\mathcal{X}]_{i,j,k}\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top=:\sum_{i,j,k}\mathcal{E}_{i,j,k}.
\]
Notice that $\mathbb{E}[\mathcal{E}_{i,j,k}]=0$ and $\|\mathcal{E}_{i,j,k}\|\le\frac{1}{p}\|\mathcal{X}\|_\infty$. In order to use the non-commutative Bernstein inequality, we just need to check the uniform boundedness of the spectral norms of $\mathbb{E}\big(\sum_{i,j,k}\mathcal{E}_{i,j,k}^\top\mathcal{E}_{i,j,k}\big)$ and $\mathbb{E}\big(\sum_{i,j,k}\mathcal{E}_{i,j,k}\mathcal{E}_{i,j,k}^\top\big)$. Using the facts that $\mathring{e}_i^\top\ast\mathring{e}_i=\dot{e}_k^\top\ast\dot{e}_k=\dot{e}_1$, $\dot{e}_1\ast\dot{e}_k=\dot{e}_k$, and $\mathring{e}_j\ast\dot{e}_1=\mathring{e}_j$, we have the following result:
\[
\begin{aligned}
\mathcal{E}_{i,j,k}^\top\ast\mathcal{E}_{i,j,k}
&=\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2[\mathcal{X}]_{i,j,k}^2\,\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)^\top\ast\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)
=\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2[\mathcal{X}]_{i,j,k}^2\,\mathring{e}_j\ast\dot{e}_k^\top\ast\big(\mathring{e}_i^\top\ast\mathring{e}_i\big)\ast\dot{e}_k\ast\mathring{e}_j^\top\\
&=\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2[\mathcal{X}]_{i,j,k}^2\,\mathring{e}_j\ast\big(\dot{e}_k^\top\ast\dot{e}_k\big)\ast\mathring{e}_j^\top
=\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2[\mathcal{X}]_{i,j,k}^2\,\mathring{e}_j\ast\dot{e}_1\ast\mathring{e}_j^\top
=\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2[\mathcal{X}]_{i,j,k}^2\,\mathring{e}_j\ast\mathring{e}_j^\top.
\end{aligned}
\]
Notice that $\mathring{e}_j\ast\mathring{e}_j^\top$ is a zero tensor except for its $(j,j,1)$-th entry, which equals $1$. We have
\[
\begin{aligned}
\Big\|\sum_{i,j,k}\mathbb{E}\big[\mathcal{E}_{i,j,k}^\top\mathcal{E}_{i,j,k}\big]\Big\|
&=\Big\|\sum_{i,j,k}\mathbb{E}\big[\mathcal{E}_{i,j,k}^\top\ast\mathcal{E}_{i,j,k}\big]\Big\|\qquad\text{(by the definition of the tensor spectral norm)}\\
&\le\frac{1}{p}\Big\|\sum_{i,j,k}[\mathcal{X}]_{i,j,k}^2\,\mathring{e}_j\ast\mathring{e}_j^\top\Big\|
=\frac{1}{p}\left\|\mathrm{diag}\Big(\sum_{i,k}[\mathcal{X}]_{i,1,k}^2,\ \sum_{i,k}[\mathcal{X}]_{i,2,k}^2,\ \cdots,\ \sum_{i,k}[\mathcal{X}]_{i,n_2,k}^2,\ O_{n_2(n_3-1)\times n_2(n_3-1)}\Big)\right\|\\
&=\frac{1}{p}\max_{j\in[n_2]}\sum_{i,k}[\mathcal{X}]_{i,j,k}^2
=\frac{1}{p}\max_{j\in[n_2]}\big\|[\mathcal{X}]_{:,j,:}\big\|_F^2.
\end{aligned}
\]
Similarly, we can get $\big\|\sum_{i,j,k}\mathbb{E}[\mathcal{E}_{i,j,k}\mathcal{E}_{i,j,k}^\top]\big\|\le\frac{1}{p}\max_{i\in[n_1]}\|[\mathcal{X}]_{i,:,:}\|_F^2$. Thus,
\[
\max\left\{\Big\|\mathbb{E}\Big(\sum_{i,j,k}\mathcal{E}_{i,j,k}^\top\mathcal{E}_{i,j,k}\Big)\Big\|,\ \Big\|\mathbb{E}\Big(\sum_{i,j,k}\mathcal{E}_{i,j,k}\mathcal{E}_{i,j,k}^\top\Big)\Big\|\right\}\le\frac{1}{p}\|\mathcal{X}\|_{\infty,2}^2.
\]
By Lemma 2.6, for any $c_2>1$,
\[
\|\mathcal{R}_\Omega(\mathcal{X})-\mathcal{X}\|=\Big\|\sum_{i,j,k}\mathcal{E}_{i,j,k}\Big\|\le\sqrt{\frac{2c_2}{p}\|\mathcal{X}\|_{\infty,2}^2\log((n_1+n_2)n_3)}+\frac{c_2\log((n_1+n_2)n_3)}{p}\|\mathcal{X}\|_\infty
\]
holds with probability at least $1-((n_1+n_2)n_3)^{1-c_2}$. $\square$

Proof of Lemma 2.10. Consider any $b$-th lateral column of $\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X})$:
\[
\big(\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X})\big)\ast\mathring{e}_b=\sum_{i,j,k}\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)[\mathcal{X}]_{i,j,k}\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\ast\mathring{e}_b=:\sum_{i,j,k}\mathfrak{a}_{i,j,k},
\]
where the $\mathfrak{a}_{i,j,k}\in\mathbb{K}^{n_1\times 1\times n_3}$ are zero-mean independent lateral tensor columns. Denote by $\vec{\mathfrak{a}}_{i,j,k}\in\mathbb{K}^{n_1n_3\times 1}$ the vectorization of $\mathfrak{a}_{i,j,k}$. Then we have
\[
\big\|\vec{\mathfrak{a}}_{i,j,k}\big\|=\big\|\mathfrak{a}_{i,j,k}\big\|_F=\frac{1}{p}\big|[\mathcal{X}]_{i,j,k}\big|\,\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\ast\mathring{e}_b\big\|_F\le\frac{1}{p}\sqrt{\frac{(n_1+n_2)\mu_0 r}{n_1n_2}}\,\|\mathcal{X}\|_\infty.
\]
We also have
\[
\Big|\mathbb{E}\Big(\sum_{i,j,k}\vec{\mathfrak{a}}_{i,j,k}^\top\vec{\mathfrak{a}}_{i,j,k}\Big)\Big|=\sum_{i,j,k}\mathbb{E}\big(\|\mathfrak{a}_{i,j,k}\|_F^2\big)=\frac{1-p}{p}\sum_{i,j,k}[\mathcal{X}]_{i,j,k}^2\,\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\ast\mathring{e}_b\big\|_F^2.
\]
๐‘–, ๐‘—,๐‘˜ ๐‘–, ๐‘—,๐‘˜ By the definition of PT and the incoherence condition, we have: ๐‘–, ๐‘—,๐‘˜ = ๐‘— ) โˆ— หš๐”ข๐‘) PT( หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13) (cid:13)F (cid:13) (U โˆ— U(cid:62) โˆ— หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ ) โˆ— หš๐”ข(cid:62) ๐‘— โˆ— หš๐”ข๐‘ + (I๐‘›1 โˆ’ U โˆ— U(cid:62)) โˆ— หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) (cid:13) (cid:13) (cid:114) ๐œ‡0๐‘Ÿ ๐‘›1 (cid:114) ๐œ‡0๐‘Ÿ ๐‘›1 (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) By Cauchy-Schwartz inequality, we have (cid:13)(I๐‘›1 โˆ’ U โˆ— U(cid:62)) โˆ— หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13)F (cid:13) (cid:13) (cid:13)F (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— หš๐”ข๐‘ ๐‘— โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13)F + (cid:13) (cid:13) (cid:13) โ‰ค โ‰ค + ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13)F ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13)F (cid:13) (cid:13) (cid:13) PT( หš๐”ข๐‘– โˆ— (cid:164)๐”ข๐‘˜ โˆ— หš๐”ข(cid:62) ๐‘— ) โˆ— หš๐”ข๐‘) (cid:13) 2 (cid:13) (cid:13) F โ‰ค 2๐œ‡0๐‘Ÿ ๐‘›1 (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13) 2 F + 2 (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) 2 (cid:13) (cid:13) F . Thus, (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) E( (cid:213) ๐‘–, ๐‘—,๐‘˜ (cid:174)๐”ž(cid:62) ๐‘–, ๐‘—,๐‘˜ (cid:174)๐”ž๐‘–, ๐‘—,๐‘˜ ) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) โ‰ค = โ‰ค = โ‰ค Similarly, we can bound 2๐œ‡0๐‘Ÿ ๐‘๐‘›1 2๐œ‡0๐‘Ÿ ๐‘๐‘›1 2๐œ‡0๐‘Ÿ ๐‘๐‘›1 2๐œ‡0๐‘Ÿ ๐‘๐‘›1 2๐œ‡0๐‘Ÿ ๐‘๐‘›1 [X]2 ๐‘–, ๐‘—,๐‘˜ (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— หš๐”ข๐‘ [X]2 ๐‘–, ๐‘—,๐‘˜ [X]2 ๐‘–, ๐‘—,๐‘˜ (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— หš๐”ข๐‘ (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13) 2 F (cid:13) (cid:13) (cid:13) 2 F (cid:13) (cid:13) (cid:13) 2 F + + + 2 ๐‘ 2 ๐‘ 2 ๐‘ (cid:213) ๐‘–, ๐‘—,๐‘˜ (cid:213) ๐‘— (cid:213) ๐‘— [X]2 ๐‘–, ๐‘—,๐‘˜ (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13) 2 F (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13)หš๐”ข(cid:62) (cid:13) ๐‘— โˆ— V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) (cid:13) 2 F (cid:13) (cid:13) (cid:13) 2 F (cid:213) ๐‘–,๐‘˜ [X]2 ๐‘–, ๐‘—,๐‘˜ (cid:107)X(cid:107)2 โˆž,2 [X]2 ๐‘–,๐‘,๐‘˜ + 2 ๐‘ (cid:13) (cid:13)V โˆ— V(cid:62) โˆ— หš๐”ข๐‘ (cid:13) (cid:13) 2 F (cid:107)X(cid:107)2 โˆž,2 (cid:213) ๐‘–, ๐‘—,๐‘˜ (cid:213) ๐‘–, ๐‘—,๐‘˜ (cid:213) ๐‘–, ๐‘—,๐‘˜ (cid:213) ๐‘–,๐‘˜ (cid:107)X(cid:107)2 โˆž,2 โ‰ค 2(๐‘›1 + ๐‘›2)๐œ‡0๐‘Ÿ ๐‘๐‘›1๐‘›2 (cid:107)X(cid:107)2 โˆž,2 by the same quantity. (cid:12) (cid:12) E( (cid:205) (cid:12) (cid:12) ๐‘–, ๐‘—,๐‘˜ (cid:12) (cid:107)X(cid:107)2 โˆž,2 + 2๐œ‡0๐‘Ÿ ๐‘๐‘›2 (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:113) (๐‘›1+๐‘›2)๐œ‡0๐‘Ÿ ๐‘›1๐‘›2 ๐‘–, ๐‘—,๐‘˜ ) (cid:174)๐”ž๐‘–, ๐‘—,๐‘˜ (cid:174)๐”ž(cid:62) For simplicity, we let ๐‘€ = 1 ๐‘ (cid:107)X(cid:107)โˆž and ๐œŽ2 = 2(๐‘›1+๐‘›2)๐œ‡0๐‘Ÿ ๐‘๐‘›1๐‘›2 (cid:107)X(cid:107)2 โˆž,2. 
By Lemma 2.6, for any $c_1>1$, we have
\[
\begin{aligned}
&\mathbb{P}\left(\Big\|\sum_{i,j,k}\vec{\mathfrak{a}}_{i,j,k}\Big\|\le\sqrt{2c_1\sigma^2\log((n_1+n_2)n_3)}+c_1M\log((n_1+n_2)n_3)\right)\\
&=\mathbb{P}\left(\Big\|\sum_{i,j,k}\vec{\mathfrak{a}}_{i,j,k}\Big\|\le\sqrt{\frac{4c_1\log((n_1+n_2)n_3)(n_1+n_2)\mu_0 r}{pn_1n_2}}\,\|\mathcal{X}\|_{\infty,2}+\frac{c_1\log((n_1+n_2)n_3)\sqrt{(n_1+n_2)\mu_0 r}}{p\sqrt{n_1n_2}}\,\|\mathcal{X}\|_\infty\right)\ge 1-((n_1+n_2)n_3)^{1-c_1}.
\end{aligned}
\]
Notice that $\|(\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X}))\ast\mathring{e}_b\|_F=\big\|\sum_{i,j,k}\mathfrak{a}_{i,j,k}\big\|_F=\big\|\sum_{i,j,k}\vec{\mathfrak{a}}_{i,j,k}\big\|$. Therefore,
\[
\mathbb{P}\left(\|(\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X}))\ast\mathring{e}_b\|_F\le\sqrt{\frac{4c_1(n_1+n_2)\log((n_1+n_2)n_3)\mu_0 r}{pn_1n_2}}\,\|\mathcal{X}\|_{\infty,2}+\frac{c_1\log((n_1+n_2)n_3)\sqrt{(n_1+n_2)\mu_0 r}}{p\sqrt{n_1n_2}}\,\|\mathcal{X}\|_\infty\right)\ge 1-((n_1+n_2)n_3)^{1-c_1}.
\]
Using a union bound over all the tensor lateral slices, we have
\[
\mathbb{P}\left(\max_b\big\{\|(\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X}))\ast\mathring{e}_b\|_F\big\}\le\sqrt{\frac{4c_1(n_1+n_2)\log((n_1+n_2)n_3)\mu_0 r}{pn_1n_2}}\,\|\mathcal{X}\|_{\infty,2}+\frac{c_1\log((n_1+n_2)n_3)\sqrt{(n_1+n_2)\mu_0 r}}{p\sqrt{n_1n_2}}\,\|\mathcal{X}\|_\infty\right)\ge 1-n_2((n_1+n_2)n_3)^{1-c_1}.
\]
Similarly, we can also show that
\[
\mathbb{P}\left(\max_b\big\{\|\mathring{e}_b^\top\ast(\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X}))\|_F\big\}\le\sqrt{\frac{4c_1(n_1+n_2)\log((n_1+n_2)n_3)\mu_0 r}{pn_1n_2}}\,\|\mathcal{X}\|_{\infty,2}+\frac{c_1\log((n_1+n_2)n_3)\sqrt{(n_1+n_2)\mu_0 r}}{p\sqrt{n_1n_2}}\,\|\mathcal{X}\|_\infty\right)\ge 1-n_1((n_1+n_2)n_3)^{1-c_1}.
\]
Thus, we can get
\[
\mathbb{P}\left(\|\mathcal{P}_T\mathcal{R}_\Omega(\mathcal{X})-\mathcal{P}_T(\mathcal{X})\|_{\infty,2}\le\sqrt{\frac{4c_1(n_1+n_2)\log((n_1+n_2)n_3)\mu_0 r}{pn_1n_2}}\,\|\mathcal{X}\|_{\infty,2}+\frac{c_1\log((n_1+n_2)n_3)\sqrt{(n_1+n_2)\mu_0 r}}{p\sqrt{n_1n_2}}\,\|\mathcal{X}\|_\infty\right)\ge 1-((n_1+n_2)n_3)^{2-c_1}.\qquad\square
\]
Proof of Lemma 2.11. Notice that
\[
\begin{aligned}
\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})&=\mathcal{P}_T\mathcal{R}_\Omega\left(\sum_{i,j,k}\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\right)\\
&=\mathcal{P}_T\left(\sum_{i,j,k}\frac{1}{p}\delta_{i,j,k}\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\right) && (2.35)\\
&=\sum_{i,j,k}\frac{1}{p}\delta_{i,j,k}\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big). && (2.36)
\end{aligned}
\]
Equation (2.35) is due to $\mathcal{P}_T(\mathcal{X})=\sum_{i,j,k}\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\rangle\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top$, and Equation (2.36) is due to the linearity of the operator $\mathcal{P}_T$. Notice that
\[
\mathcal{P}_T(\mathcal{X})=\mathcal{P}_T(\mathcal{P}_T(\mathcal{X}))=\mathcal{P}_T\left(\sum_{i,j,k}\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\right)=\sum_{i,j,k}\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\,\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big).
\]
Thus, the $(a,b,c)$-th entry of $\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})-\mathcal{P}_T(\mathcal{X})$ is given by
\[
\big\langle\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})-\mathcal{P}_T(\mathcal{X}),\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big\rangle
=\sum_{i,j,k}\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\big\langle\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big),\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big\rangle=:\sum_{i,j,k}h_{i,j,k}.
\]
It is easy to see that $\mathbb{E}(h_{i,j,k})=0$. Notice that
\[
\begin{aligned}
|h_{i,j,k}|&=\Big|\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\big\langle\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big),\mathcal{P}_T\big(\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big)\big\rangle\Big|\\
&\le\frac{1}{p}\|\mathcal{P}_T(\mathcal{X})\|_\infty\,\big\|\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big)\big\|_F\,\big\|\mathcal{P}_T\big(\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big)\big\|_F\le\frac{(n_1+n_2)\mu_0 r}{pn_1n_2}\|\mathcal{P}_T(\mathcal{X})\|_\infty.
\end{aligned}
\]
It is easy to check that
\[
\begin{aligned}
\Big|\sum_{i,j,k}\mathbb{E}\big[h_{i,j,k}^2\big]\Big|
&=\sum_{i,j,k}\mathbb{E}\left(\Big(\frac{1}{p}\delta_{i,j,k}-1\Big)^2\Big|\big\langle\mathcal{P}_T(\mathcal{X}),\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big\rangle\big\langle\mathcal{P}_T\big(\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\big),\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big\rangle\Big|^2\right)\\
&\le\frac{1}{p}\|\mathcal{P}_T(\mathcal{X})\|_\infty^2\,\big\|\mathcal{P}_T\big(\mathring{e}_a\ast\dot{e}_b\ast\mathring{e}_c^\top\big)\big\|_F^2\le\frac{(n_1+n_2)\mu_0 r}{pn_1n_2}\|\mathcal{P}_T(\mathcal{X})\|_\infty^2.
\end{aligned}
\]
Thus, by the non-commutative Bernstein inequality, we have
\[
\mathbb{P}\Big[\big(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})-\mathcal{P}_T(\mathcal{X})\big)_{a,b,c}\ge\frac{1}{2}\|\mathcal{P}_T(\mathcal{X})\|_\infty\Big]
\le 2\exp\!\left(\frac{-\|\mathcal{P}_T(\mathcal{X})\|_\infty^2/8}{\frac{(n_1+n_2)\mu_0 r}{pn_1n_2}\|\mathcal{P}_T(\mathcal{X})\|_\infty^2+\frac{(n_1+n_2)\mu_0 r}{6pn_1n_2}\|\mathcal{P}_T(\mathcal{X})\|_\infty^2}\right)
=2\exp\!\left(\frac{-3pn_1n_2}{28(n_1+n_2)\mu_0 r}\right).
\]
Thus, using the union bound over every $(a,b,c)$-th entry, we have that
\[
\|(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T)(\mathcal{P}_T(\mathcal{X}))\|_\infty\le\frac{1}{2}\|\mathcal{P}_T(\mathcal{X})\|_\infty
\]
holds with probability at least $1-2n_1n_2n_3\exp\!\big(\frac{-3pn_1n_2}{28(n_1+n_2)\mu_0 r}\big)$. $\square$

Lastly, to maintain the integrity of a self-contained exposition, we offer a detailed proof of Proposition 1 in this subsection, as originally presented in [84].

Proof of Proposition 1

In the following, the symbol $\mathcal{T}$ is used to represent the tensor that we aim to recover in the optimization problem (2.18). Before we delve into the detailed proof pipeline, we wish to reiterate the purpose of Proposition 1: it asserts that $\mathcal{T}$ is the unique minimizer of the optimization problem (2.18) when Conditions 1 and 2 are met simultaneously. Notice that $\mathcal{T}$ is a feasible solution to problem (2.18). In order to show that $\mathcal{T}$ is the unique minimizer, it suffices to show $\|\mathcal{X}\|_{\mathrm{TNN}}-\|\mathcal{T}\|_{\mathrm{TNN}}>0$ for any feasible solution $\mathcal{X}$ with $\mathcal{X}\neq\mathcal{T}$. We first show that for any feasible solution $\mathcal{X}$ different from $\mathcal{T}$, there exists a tensor $\mathcal{M}$ such that
\[
\|\mathcal{X}\|_{\mathrm{TNN}}-\|\mathcal{T}\|_{\mathrm{TNN}}\ge\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp(\mathcal{T})}(\mathcal{M}),\,\mathcal{X}-\mathcal{T}\big\rangle.
\]
In this way, we transform proving $\|\mathcal{X}\|_{\mathrm{TNN}}-\|\mathcal{T}\|_{\mathrm{TNN}}>0$ into showing $\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp(\mathcal{T})}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle>0$. To prove $\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle>0$, we split it into the two parts $\langle\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle$ and $\langle\mathcal{U}\ast\mathcal{V}^\top,\mathcal{X}-\mathcal{T}\rangle$. By the construction of $\mathcal{M}$, we can show that $\langle\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle=\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}$.
As for the part $\langle\mathcal{U}\ast\mathcal{V}^\top,\mathcal{X}-\mathcal{T}\rangle$, we need to further split it into two parts by introducing the dual certificate tensor $\mathcal{Y}$:
\[
\big\langle\mathcal{P}_{T(\mathcal{T})}(\mathcal{Y})-\mathcal{U}\ast\mathcal{V}^\top,\,\mathcal{X}-\mathcal{T}\big\rangle\quad\text{and}\quad\big\langle\mathcal{P}_{T^\perp(\mathcal{T})}(\mathcal{Y}),\,\mathcal{X}-\mathcal{T}\big\rangle.
\]
The reason for this separation is that we can bound these two terms by $\frac{\sqrt{2}}{4}\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}$ and $\frac{1}{2}\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}$, respectively. By combining the bounds of the above three parts, we get
\[
\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M}),\,\mathcal{X}-\mathcal{T}\big\rangle\ge\frac{1}{8}\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}.
\]
In the end, we prove that $\|\mathcal{P}_{T^\perp(\mathcal{T})}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}$ is strictly larger than zero by contradiction.

Before we move on to the detailed proof, we give several useful lemmas which are key to the proof of Proposition 1. First, we state the characterization of the tensor nuclear norm (TNN), which can be described as a dual to the tensor spectral norm.

Lemma 2.12 ([84]). Given a tensor $\mathcal{T}\in\mathbb{K}^{n_1\times n_2\times n_3}$, we have
\[
\|\mathcal{T}\|_{\mathrm{TNN}}=\sup_{\{\mathcal{Q}\in\mathbb{K}^{n_1\times n_2\times n_3}:\ \|\mathcal{Q}\|\le 1\}}\langle\mathcal{Q},\mathcal{T}\rangle.
\]

Next, we present a characterization of the subdifferential of the TNN, which is useful for proving the uniqueness of the minimizer of the optimization problem (2.18).

Lemma 2.13 (Subdifferential of TNN [84]). Let $\mathcal{T}\in\mathbb{K}^{n_1\times n_2\times n_3}$ with compact t-SVD $\mathcal{T}=\mathcal{U}\ast\Sigma\ast\mathcal{V}^\top$. The subdifferential (the set of subgradients) of $\|\mathcal{T}\|_{\mathrm{TNN}}$ is
\[
\partial\|\mathcal{T}\|_{\mathrm{TNN}}=\big\{\mathcal{U}\ast\mathcal{V}^\top+\mathcal{W}\ \big|\ \mathcal{U}^\top\ast\mathcal{W}=0,\ \mathcal{W}\ast\mathcal{V}=0,\ \|\mathcal{W}\|\le 1\big\}.
\]

For the proof of Proposition 1, a significant challenge is proving the minimizer's uniqueness. This involves ensuring that $\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp(\mathcal{T})}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle>0$ with $\mathcal{M}$ satisfying certain conditions.

Lemma 2.14 ([84]). Assume that $\Omega$ is generated according to Bernoulli sampling with probability $p$. If $\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T\|\le\frac{1}{2}$, then
\[
\|\mathcal{P}_T(\mathcal{X})\|_F\le\sqrt{\frac{2n_3}{p}}\,\|\mathcal{P}_{T^\perp}(\mathcal{X})\|_{\mathrm{TNN}}
\]
for any $\mathcal{X}$ with $\mathcal{P}_\Omega(\mathcal{X})=0$.

Proof of Lemma 2.14. Let $\mathcal{X}$ be a tensor satisfying $\mathcal{P}_\Omega(\mathcal{X})=0$. Using the self-adjointness of the operator $\mathcal{P}_T$, we have
\[
\begin{aligned}
\|\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})\|_F^2&=\langle\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X}),\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})\rangle
=\Big\langle\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X}),\frac{1}{p}\sum_{i,j,k}\delta_{i,j,k}[\mathcal{P}_T(\mathcal{X})]_{i,j,k}\,\mathring{e}_i\ast\dot{e}_k\ast\mathring{e}_j^\top\Big\rangle\\
&=\frac{1}{p}\langle\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X}),\mathcal{P}_\Omega(\mathcal{P}_T(\mathcal{X}))\rangle
=\frac{1}{p}\langle\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X}),\mathcal{X}\rangle
=\frac{1}{p}\langle(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T)(\mathcal{X}),\mathcal{P}_T(\mathcal{X})\rangle+\frac{1}{p}\langle\mathcal{P}_T(\mathcal{X}),\mathcal{X}\rangle\\
&\ge\frac{1}{p}\|\mathcal{P}_T(\mathcal{X})\|_F^2-\frac{1}{p}\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T\|\,\|\mathcal{P}_T(\mathcal{X})\|_F^2\ge\frac{1}{2p}\|\mathcal{P}_T(\mathcal{X})\|_F^2.
\end{aligned}
\]
Notice that if $\mathcal{P}_\Omega(\mathcal{X})=0$, then $\mathcal{R}_\Omega(\mathcal{X})$ must be the zero tensor. Thus, we have
\[
\|\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})\|_F=\|\mathcal{R}_\Omega\mathcal{P}_{T^\perp}(\mathcal{X})\|_F\le\frac{1}{p}\|\mathcal{P}_{T^\perp}(\mathcal{X})\|_F=\frac{1}{p\sqrt{n_3}}\big\|\overline{\mathcal{P}_{T^\perp}(\mathcal{X})}\big\|_F\le\frac{1}{p\sqrt{n_3}}\big\|\overline{\mathcal{P}_{T^\perp}(\mathcal{X})}\big\|_*=\frac{\sqrt{n_3}}{p}\|\mathcal{P}_{T^\perp}(\mathcal{X})\|_{\mathrm{TNN}}.
\]
As a result, we have $\|\mathcal{P}_T(\mathcal{X})\|_F\le\sqrt{2p}\,\|\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X})\|_F\le\sqrt{\frac{2n_3}{p}}\,\|\mathcal{P}_{T^\perp}(\mathcal{X})\|_{\mathrm{TNN}}$. $\square$
Proof of Proposition 1. Consider any feasible solution $\mathcal{X}\neq\mathcal{T}$ to the optimization problem (2.18) with $\mathcal{P}_\Omega(\mathcal{X})=\mathcal{P}_\Omega(\mathcal{T})$. By the duality between the TNN and the tensor spectral norm shown in Lemma 2.12, there exists a tensor $\mathcal{M}$ with $\|\mathcal{M}\|\le 1$ such that
\[
\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}=\langle\mathcal{M},\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\rangle=\langle\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\rangle.
\]
Firstly, it is easy to check that $\mathcal{U}^\top\ast\mathcal{P}_{T^\perp}(\mathcal{M})=0$ and $\mathcal{P}_{T^\perp}(\mathcal{M})\ast\mathcal{V}=0$ by the definition of the operator $\mathcal{P}_{T^\perp}(\cdot)$. By Lemma 2.13, $\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M})$ is a subgradient of $\mathcal{T}$ with respect to the tensor nuclear norm. Therefore, we have
\[
\|\mathcal{X}\|_{\mathrm{TNN}}-\|\mathcal{T}\|_{\mathrm{TNN}}\ge\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M}),\,\mathcal{X}-\mathcal{T}\big\rangle.
\]
To prove $\|\mathcal{X}\|_{\mathrm{TNN}}-\|\mathcal{T}\|_{\mathrm{TNN}}\ge 0$, it is sufficient to show $\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle\ge 0$. Notice that for any $\mathcal{Y}$ with $\mathcal{P}_\Omega(\mathcal{Y})=\mathcal{Y}$, we have $\langle\mathcal{Y},\mathcal{X}-\mathcal{T}\rangle=\langle\mathcal{P}_\Omega(\mathcal{Y}),\mathcal{X}-\mathcal{T}\rangle=\langle\mathcal{P}_\Omega(\mathcal{Y}),\mathcal{P}_\Omega(\mathcal{X}-\mathcal{T})\rangle=0$. We thus have
\[
\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M}),\,\mathcal{X}-\mathcal{T}\big\rangle=\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M})-\mathcal{Y},\,\mathcal{X}-\mathcal{T}\big\rangle.
\]
Furthermore, we have
\[
\begin{aligned}
\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M})-\mathcal{Y},\,\mathcal{X}-\mathcal{T}\big\rangle
&=\big\langle\mathcal{U}\ast\mathcal{V}^\top+\mathcal{P}_{T^\perp}(\mathcal{M})-\mathcal{P}_{T^\perp}(\mathcal{Y})-\mathcal{P}_T(\mathcal{Y}),\,\mathcal{X}-\mathcal{T}\big\rangle\\
&=\langle\mathcal{P}_{T^\perp}(\mathcal{M}),\mathcal{X}-\mathcal{T}\rangle-\big\langle\mathcal{P}_T(\mathcal{Y})-\mathcal{U}\ast\mathcal{V}^\top,\,\mathcal{X}-\mathcal{T}\big\rangle-\langle\mathcal{P}_{T^\perp}(\mathcal{Y}),\mathcal{X}-\mathcal{T}\rangle\\
&=\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}-\big\langle\mathcal{P}_T(\mathcal{Y})-\mathcal{U}\ast\mathcal{V}^\top,\,\mathcal{P}_T(\mathcal{X}-\mathcal{T})\big\rangle-\langle\mathcal{P}_{T^\perp}(\mathcal{Y}),\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\rangle\\
&\ge\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}-\big\|\mathcal{P}_T(\mathcal{Y})-\mathcal{U}\ast\mathcal{V}^\top\big\|_F\,\|\mathcal{P}_T(\mathcal{X}-\mathcal{T})\|_F-\|\mathcal{P}_{T^\perp}(\mathcal{Y})\|\,\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}\\
&\ge\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}-\frac{1}{4}\sqrt{\frac{p}{n_3}}\cdot\sqrt{\frac{2n_3}{p}}\,\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}-\frac{1}{2}\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}\\
&\ge\frac{1}{8}\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}.
\end{aligned}
\tag{2.37}
\]
Inequality (2.37) results from Condition 1, Condition 2, and Lemma 2.14. Next, to complete the proof, it suffices to show that $\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}$ is strictly positive. We show this by contradiction. Suppose $\|\mathcal{P}_{T^\perp}(\mathcal{X}-\mathcal{T})\|_{\mathrm{TNN}}=0$; then $\mathcal{P}_T(\mathcal{X}-\mathcal{T})=\mathcal{X}-\mathcal{T}$ and $\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(\mathcal{X}-\mathcal{T})=0$. Therefore, we have
\[
\|\mathcal{X}-\mathcal{T}\|=\|(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T)(\mathcal{X}-\mathcal{T})\|\le\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T\|\,\|\mathcal{X}-\mathcal{T}\|\le\frac{1}{2}\|\mathcal{X}-\mathcal{T}\|,
\]
where the last step uses the assumption $\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T-\mathcal{P}_T\|\le\frac{1}{2}$; this forces $\mathcal{X}=\mathcal{T}$, a contradiction. Thus, $\mathcal{T}$ is the unique minimizer of the optimization problem (2.18). $\square$

Next we present a detailed proof of Theorem 2.6, our main theoretical result, demonstrating that our model ensures tensor recovery with high probability.

2.6.3 Proof of Theorem 2.6

In this section, we provide a detailed proof of our main theoretical result, Theorem 2.6. The proof is based on our Two-Step Tensor Completion (TSTC) algorithm. For the ease of the reader, we state the TSTC algorithm in Algorithm 2.7. This algorithm focuses on subtensor completion before combining the results with t-CUR.
Algorithm 2.7 Two-Step Tensor Completion (TSTC)
1: Input: $[\mathcal{T}]_{\Omega_R\cup\Omega_C}$: observed data; $\Omega_R,\Omega_C$: observation locations; $I,J$: lateral and horizontal indices; $r$: target rank; TC: the chosen tensor completion solver.
2: $\widetilde{\mathcal{R}}=\mathrm{TC}([\mathcal{R}]_{\Omega_R},\,r)$
3: $\widetilde{\mathcal{C}}=\mathrm{TC}([\mathcal{C}]_{\Omega_C},\,r)$
4: $\widetilde{\mathcal{U}}=[\widetilde{\mathcal{C}}]_{I,:,:}$
5: $\widetilde{\mathcal{T}}=\widetilde{\mathcal{C}}\ast\widetilde{\mathcal{U}}^\dagger\ast\widetilde{\mathcal{R}}$
6: Output: $\widetilde{\mathcal{T}}$: approximation of $\mathcal{T}$

Based on the idea of TSTC, it is crucial to understand how the tensor incoherence properties of the original low-tubal-rank tensor transfer to its subtensors.

2.6.3.1 Incoherence passes to subtensors

Inspired by [17, Theorem 3.5], we explore how subtensors inherit the tensor incoherence conditions from the original tensor, differing from [99] in the tensor norm and in the definition of the tensor incoherence condition. Our focus is on subtensor incoherence due to its impact on the sampling rate required for accurate low-tubal-rank tensor recovery (Theorem 2.5) and our emphasis on completing subtensors in tensor completion. We begin by examining the relationship between the tensor incoherence properties of subtensors and those of the original low-tubal-rank tensor.

Lemma 2.15. Let $\mathcal{T}\in\mathbb{K}^{n_1\times n_2\times n_3}$ satisfy the tensor $\mu_0$-incoherence condition. Suppose that $\mathcal{T}$ has a compact t-SVD $\mathcal{T}=\mathcal{W}\ast\Sigma\ast\mathcal{V}^\top$ and condition number $\kappa$. Consider the subtensors $\mathcal{C}=[\mathcal{T}]_{:,J,:}$ and $\mathcal{R}=[\mathcal{T}]_{I,:,:}$, each maintaining the same tubal-rank as $\mathcal{T}$, with compact t-SVDs $\mathcal{C}=\mathcal{W}_C\ast\Sigma_C\ast\mathcal{V}_C^\top$ and $\mathcal{R}=\mathcal{W}_R\ast\Sigma_R\ast\mathcal{V}_R^\top$. Then the following results hold:
\[
\mu_C\le\kappa^2\,\big\|[\mathcal{V}]_{J,:,:}^\dagger\big\|^2\,\frac{|J|}{n_2}\,\mu_0\qquad\text{and}\qquad\mu_R\le\kappa^2\,\big\|[\mathcal{W}]_{I,:,:}^\dagger\big\|^2\,\frac{|I|}{n_1}\,\mu_0.
\]

Proof. First, let us prove that
\[
\max_i\big\|[\widehat{\mathcal{W}}_C]_{:,:,k}^\top\cdot e_i\big\|_F\le\sqrt{\frac{\mu_0 r}{n_1}}.
\]
Notice that $\mathcal{C}=[\mathcal{T}]_{:,J,:}=\mathcal{W}\ast\Sigma\ast[\mathcal{V}]_{J,:,:}^\top$. Assume the compact t-SVD of $\Sigma\ast([\mathcal{V}]_{J,:,:})^\top$ is $\Sigma\ast([\mathcal{V}]_{J,:,:})^\top=\mathcal{P}\ast\mathcal{S}\ast\mathcal{Q}^\top$. Then $\mathcal{P}\in\mathbb{K}^{r\times r\times n_3}$ is an orthogonal tensor, leading to $\mathcal{W}_C=\mathcal{W}\ast\mathcal{P}$ based on the relationship $\mathcal{C}=\mathcal{W}\ast\mathcal{P}\ast\mathcal{S}\ast\mathcal{Q}^\top$. Moreover, $\mathcal{P}^\top\ast\mathcal{P}=\mathcal{I}$ implies that $[\widehat{\mathcal{P}}]_{:,:,k}^\top\cdot[\widehat{\mathcal{P}}]_{:,:,k}=I_r$ for all $k\in[n_3]$, where $I_r$ is the $r\times r$ identity matrix. Therefore, we can establish that for $k\in[n_3]$,
\[
\max_i\big\|[\widehat{\mathcal{W}}_C]_{:,:,k}^\top\cdot e_i\big\|_F=\max_i\big\|[\widehat{\mathcal{P}}]_{:,:,k}^\top[\widehat{\mathcal{W}}]_{:,:,k}^\top\cdot e_i\big\|_F\le\max_i\big\|[\widehat{\mathcal{W}}]_{:,:,k}^\top\cdot e_i\big\|_F\le\sqrt{\frac{\mu_0 r}{n_1}}.
\tag{2.38}
\]
Next, let us prove $\max_i\|[\widehat{\mathcal{V}}_C^\top]_{:,:,k}\cdot e_i\|_F\le\kappa\,\|([\mathcal{V}]_{J,:,:})^\dagger\|\sqrt{\frac{\mu_0 r}{n_2}}$. The compact t-SVD of $\mathcal{C}$ implies $\mathcal{V}_C^\top=\Sigma_C^\dagger\ast\mathcal{W}_C^\top\ast\mathcal{C}$. Thus, for each $k\in[n_3]$, $[\widehat{\mathcal{V}}_C^\top]_{:,:,k}=[\widehat{\Sigma}_C^\dagger]_{:,:,k}\cdot[\widehat{\mathcal{W}}_C^\top]_{:,:,k}\cdot[\widehat{\mathcal{C}}]_{:,:,k}$ holds, and $\|[\widehat{\mathcal{V}}_C^\top]_{:,:,k}\cdot e_i\|_F$ can be bounded by
\[
\begin{aligned}
\big\|[\widehat{\mathcal{V}}_C^\top]_{:,:,k}\cdot e_i\big\|_F
&=\big\|[\widehat{\Sigma}_C^\dagger]_{:,:,k}\cdot[\widehat{\mathcal{W}}_C^\top]_{:,:,k}\cdot[\widehat{\mathcal{W}}]_{:,:,k}\cdot[\widehat{\Sigma}]_{:,:,k}\cdot[\widehat{\mathcal{V}}]_{J,:,k}^\top\cdot e_i\big\|_F
\le\big\|[\widehat{\Sigma}_C^\dagger]_{:,:,k}\big\|\,\big\|[\widehat{\Sigma}]_{:,:,k}\big\|\,\big\|[\widehat{\mathcal{V}}]_{J,:,k}^\top\cdot e_i\big\|_F\\
&\le\|\mathcal{C}^\dagger\|\,\|\mathcal{T}\|\,\big\|[\widehat{\mathcal{V}}]_{:,:,k}^\top\cdot e_i\big\|_F
=\big\|([\mathcal{V}]_{J,:,:}^\top)^\dagger\ast\Sigma^\dagger\ast\mathcal{W}^\top\big\|\,\|\mathcal{T}\|\,\big\|[\widehat{\mathcal{V}}]_{:,:,k}^\top\cdot e_i\big\|_F\\
&\le\big\|([\mathcal{V}]_{J,:,:}^\top)^\dagger\big\|\,\|\mathcal{T}^\dagger\|\,\|\mathcal{T}\|\,\big\|[\widehat{\mathcal{V}}]_{:,:,k}^\top\cdot e_i\big\|_F
\le\kappa\,\big\|([\mathcal{V}]_{J,:,:}^\top)^\dagger\big\|\sqrt{\frac{\mu_0 r}{n_2}}.
\end{aligned}
\]
That is,
\[
\max_i\big\|[\widehat{\mathcal{V}}_C^\top]_{:,:,k}\cdot e_i\big\|_F\le\kappa\,\big\|[\mathcal{V}]_{J,:,:}^\dagger\big\|\sqrt{\frac{\mu_0|J|}{n_2}\cdot\frac{r}{|J|}}.
\tag{2.39}
\]
Combining (2.38) and (2.39), we can conclude that $\mu_C\le\kappa^2\|[\mathcal{V}]_{J,:,:}^\dagger\|^2\frac{|J|}{n_2}\mu_0$. Applying the same process to $\mathcal{R}$, we get $\mu_R\le\kappa^2\|[\mathcal{W}]_{I,:,:}^\dagger\|^2\frac{|I|}{n_1}\mu_0$. $\square$

Following Lemma 2.15, we explore the incoherence properties of uniformly sampled subtensors, with the results summarized below.

Lemma 2.16. Let $\mathcal{T}\in\mathbb{K}^{n_1\times n_2\times n_3}$ with multi-rank $\vec{r}$, and let $\mathcal{T}=\mathcal{W}\ast\Sigma\ast\mathcal{V}^\top$ be its compact t-SVD. Additionally, $\mathcal{T}$ satisfies the tensor $\mu_0$-incoherence condition, and $\kappa$ denotes the condition number of $\mathcal{T}$. Suppose $I\subseteq[n_1]$ and $J\subseteq[n_2]$ are chosen uniformly at random with replacement. Then
\[
\mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T}),\qquad\mu_C\le\frac{25}{4}\kappa^2\mu_0\qquad\text{and}\qquad\mu_R\le\frac{25}{4}\kappa^2\mu_0
\]
hold with probability at least $1-\frac{1}{n_1^\beta}-\frac{1}{n_2^\beta}$, provided that $|I|\ge 2\beta\mu_0\|\vec{r}\|_\infty\log(n_1\|\vec{r}\|_1)$ and $|J|\ge 2\beta\mu_0\|\vec{r}\|_\infty\log(n_2\|\vec{r}\|_1)$.

Proof of Lemma 2.16. According to Lemma 2.3, by setting $\delta=0.815$ and $\beta_1=\beta_2=\beta$, we can easily get
\[
\mathbb{P}\left(\big\|[\mathcal{V}]_{J,:,:}^\dagger\big\|\le\sqrt{\frac{25n_2}{4|J|}},\ \mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T})\right)\ge 1-\frac{1}{n_2^\beta},\qquad
\mathbb{P}\left(\big\|[\mathcal{W}]_{I,:,:}^\dagger\big\|\le\sqrt{\frac{25n_1}{4|I|}},\ \mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{T})\right)\ge 1-\frac{1}{n_1^\beta}.
\tag{2.40}
\]
Therefore,
\[
\mathbb{P}\left(\mu_C\le\frac{25}{4}\kappa^2\mu_0,\ \mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T})\right)\ge 1-\frac{1}{n_2^\beta},\qquad
\mathbb{P}\left(\mu_R\le\frac{25}{4}\kappa^2\mu_0,\ \mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{T})\right)\ge 1-\frac{1}{n_1^\beta}.
\tag{2.41}
\]
Combining (2.40) and (2.41), we can conclude that
$\mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T})$, $\mu_C\le\frac{25}{4}\kappa^2\mu_0$ and $\mu_R\le\frac{25}{4}\kappa^2\mu_0$ hold with probability at least $1-\frac{1}{n_1^\beta}-\frac{1}{n_2^\beta}$, provided that $|I|\ge 2\beta\mu_0\|\vec{r}\|_\infty\log(n_1\|\vec{r}\|_1)$ and $|J|\ge 2\beta\mu_0\|\vec{r}\|_\infty\log(n_2\|\vec{r}\|_1)$. $\square$

2.6.3.2 Proof of Theorem 2.6

Proof of Theorem 2.6. Note that $I\subseteq[n_1]$ and $J\subseteq[n_2]$ are chosen uniformly with replacement. According to Lemma 2.16, we thus have that $\mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T})$, $\mu_C\le\frac{25}{4}\kappa^2\mu_0$ and $\mu_R\le\frac{25}{4}\kappa^2\mu_0$ hold with probability at least
\[
1-\frac{1}{n_1^{800\beta\kappa^2\log(n_1n_3+n_2n_3)}}-\frac{1}{n_2^{800\beta\kappa^2\log(n_1n_3+n_2n_3)}}
=1-\frac{1}{(n_1n_3+n_2n_3)^{800\beta\kappa^2\log(n_1)}}-\frac{1}{(n_1n_3+n_2n_3)^{800\beta\kappa^2\log(n_2)}},
\]
provided that
\[
|I|\ge 3200\beta\mu_0 r\kappa^2\log^2(n_1n_3+n_2n_3)\ge 800\kappa^2\log(n_1n_3+n_2n_3)\,\beta\cdot\big(2\mu_0\|\vec{r}\|_\infty\log(n_1\|\vec{r}\|_1)\big),
\]
\[
|J|\ge 3200\beta\mu_0 r\kappa^2\log^2(n_1n_3+n_2n_3)\ge 800\kappa^2\log(n_1n_3+n_2n_3)\,\beta\cdot\big(2\mu_0\|\vec{r}\|_\infty\log(n_2\|\vec{r}\|_1)\big).
\]
Additionally, the following statements hold by Theorem 2.5 and the conditions $\mu_C\le\frac{25}{4}\kappa^2\mu_0$ and $\mu_R\le\frac{25}{4}\kappa^2\mu_0$:

i) Given $\mathcal{C}\in\mathbb{K}^{n_1\times|J|\times n_3}$ with $\mathrm{rank}(\mathcal{C})=r$,
\[
p_C\ge\frac{1600\beta(n_1+|J|)\mu_0 r\kappa^2\log^2(n_1n_3+|J|n_3)}{n_1|J|}
\]
for some $\beta>1$ ensures that $\mathcal{C}$ is the unique minimizer of
\[
\min_{\mathcal{X}\in\mathbb{K}^{n_1\times|J|\times n_3}}\|\mathcal{X}\|_{\mathrm{TNN}},\quad\text{subject to }\mathcal{P}_{\Omega_C}(\mathcal{X})=\mathcal{P}_{\Omega_C}(\mathcal{C}),
\]
with probability at least $1-\frac{3\log(n_1n_3+|J|n_3)}{(n_1n_3+|J|n_3)^{4\beta-2}}$.

ii) Given $\mathcal{R}\in\mathbb{K}^{|I|\times n_2\times n_3}$ with $\mathrm{rank}(\mathcal{R})=r$,
\[
p_R\ge\frac{1600\beta(n_2+|I|)\mu_0 r\kappa^2\log^2(n_2n_3+|I|n_3)}{n_2|I|}
\]
for some $\beta>1$ ensures that $\mathcal{R}$ is the unique minimizer of
\[
\min_{\mathcal{X}\in\mathbb{K}^{|I|\times n_2\times n_3}}\|\mathcal{X}\|_{\mathrm{TNN}},\quad\text{subject to }\mathcal{P}_{\Omega_R}(\mathcal{X})=\mathcal{P}_{\Omega_R}(\mathcal{R}),
\]
with probability at least $1-\frac{3\log(n_2n_3+|I|n_3)}{(n_2n_3+|I|n_3)^{4\beta-2}}$.

Once $\mathcal{C}$ and $\mathcal{R}$ are uniquely recovered from $\Omega_C$ and $\Omega_R$, respectively, the t-CUR decomposition provides the reconstruction of $\mathcal{T}$ via $\mathcal{T}=\mathcal{C}\ast\mathcal{U}^\dagger\ast\mathcal{R}$ under the condition $\mathrm{rank}_m(\mathcal{R})=\mathrm{rank}_m(\mathcal{C})=\mathrm{rank}_m(\mathcal{T})$. Combining all the statements above, we conclude that $\mathcal{T}$ can be uniquely recovered from $\Omega_C\cup\Omega_R$ with probability at least
\[
1-\frac{2}{(n_1n_3+n_2n_3)^{800\beta\kappa^2\log(n_2)}}-\frac{3\log(n_1n_3+|J|n_3)}{(n_1n_3+|J|n_3)^{4\beta-2}}-\frac{3\log(n_2n_3+|I|n_3)}{(n_2n_3+|I|n_3)^{4\beta-2}}.\qquad\square
\]

2.7 Conclusion

In this work, we present the t-CCS model, an extension of the matrix CCS model to the tensor setting. We provide both theoretical and experimental evidence demonstrating the flexibility and feasibility of the t-CCS model. The ITCURC algorithm, designed for the t-CCS model, provides a balanced trade-off between runtime efficiency and reconstruction quality. While it is not as effective as the state-of-the-art Bernoulli-based TC algorithm, it is still comparable in terms of PSNR and SSIM. Thus, one direction of our future research will focus on enhancing reconstruction quality through the integration of the $M$-product.
From the theoretical side, our current result shows that the t-CUR sampling scheme, as a special case of the t-CCS model, admits a sufficient sampling complexity of $\mathcal{O}\big(\mu_0 r n_3(n_2\log(n_1n_3)+n_1\log(n_2n_3))\big)$, which is more sampling-efficient than that of a general t-CCS scheme. This finding suggests there is potential to further improve the theoretical sampling complexity for the t-CCS model, an aspect we plan to explore in our future work. Additionally, there is a need for a comprehensive theoretical analysis of the convergence behavior of the ITCURC algorithm within the t-CCS model framework. Evaluating the algorithm's robustness against additive noise will also be a critical area of focus for future research. Furthermore, while our current work is limited to third-order tensors, we aim to extend our approach to accommodate higher-order tensor configurations in subsequent studies.

CHAPTER 3
ON THE ROBUSTNESS OF CROSS-CONCENTRATED SAMPLING FOR MATRIX COMPLETION

ABSTRACT

Matrix completion is essential in data science for recovering missing entries in partially observed data. Recently, cross-concentrated sampling (CCS), a novel approach to matrix completion, has gained attention, though its robustness against sparse outliers remains unaddressed. In this chapter, we propose the Robust CCS Completion problem to explore this robustness and introduce a non-convex iterative algorithm called Robust CUR Completion (RCURC). Our experiments with synthetic and real-world datasets confirm that RCURC is both efficient and robust to sparse outliers, making it a powerful tool for Robust Matrix Completion.

3.1 Introduction

The matrix completion problem, first introduced by Candes et al. [25] and Recht [97], aims to reconstruct a low-rank matrix $X$ from a limited subset of its observed entries. In practice, many real-world data matrices are highly incomplete, and this problem has emerged as an important tool for uncovering latent structures in the data. The significance of matrix completion lies in its broad applicability across numerous domains, such as recovering missing data in recommendation systems [7, 45], improving image quality [29, 61], and enhancing the efficiency and accuracy of signal processing techniques [22, 13].

In its simplest form, the goal of matrix completion is to estimate the unknown entries of a matrix $X$ given access only to a small fraction of the entries. Mathematically, this can be described as solving for $X$ given observations from the set $\Omega$, which contains the indices of known entries in $X$. A common assumption is that the matrix $X$ is of low rank, meaning that it can be described by a small number of underlying factors or components. The challenge arises from ensuring that the reconstructed matrix accurately captures the underlying low-rank structure without overfitting to the noise or sparsity of the observations. Traditional matrix completion methods often rely on uniform or Bernoulli sampling strategies, where each entry of the matrix is sampled independently with a fixed probability. However, this approach can be inefficient, particularly when the data exhibits specific structures or when some rows and columns are more informative than others.

Recent advancements in the matrix completion field have introduced a novel sampling method known as CCS [19].
Unlike uniform sampling, which treats all entries equally, CCS focuses on strategically sampling entries from certain rows and columns to achieve a more informative set of observations. By concentrating the samples in regions of the matrix that are more likely to contain useful information, CCS can lead to more accurate matrix recovery with fewer sampled entries. Algorithm 3.1 outlines the key steps in the CCS procedure.

Algorithm 3.1 Cross-Concentrated Sampling (CCS) [19]
1: Input: the data matrix $Y$.
2: Uniformly select a subset of row indices $I$ and column indices $J$.
3: Set $R = [Y]_{I,:}$ (rows indexed by $I$) and $C = [Y]_{:,J}$ (columns indexed by $J$).
4: Uniformly sample entries within the selected rows $R$ and columns $C$, recording the sampled locations as $\Omega_R$ and $\Omega_C$, respectively.
5: Output: Return the observed entries $[Y]_{\Omega_R\cup\Omega_C}$ and the indices $\Omega_R$, $\Omega_C$, $I$, $J$.

(a) Uniform Sampling  (b) CCS–Less Concentrated  (c) CCS–More Concentrated  (d) CUR Sampling
Figure 3.1 [19] Visual comparison of sampling schemes: from uniform to CUR sampling at the same observation rate. Colored pixels indicate observed entries, black pixels indicate missing ones.

As shown in Figure 3.1, the CCS model bridges two commonly used sampling methods in matrix completion: Uniform Sampling and CUR Sampling. Uniform sampling randomly selects entries from the entire matrix, while CUR sampling focuses on selecting entire rows and columns for observation. The CCS approach can be viewed as a hybrid method, offering additional flexibility by concentrating samples on selected rows and columns, with a theoretical basis for achieving better recovery in certain structured datasets.

Despite the advantages of CCS, matrix completion in real-world applications often encounters a significant challenge: data corruption by sparse outliers. In many scenarios, the observed matrix is not simply low-rank but is also corrupted by noise or outliers that are sparsely distributed. Such outliers can arise from various sources, such as user input errors in recommendation systems or sensor malfunctions in signal processing. To address this, Robust Matrix Completion methods have been developed, which introduce a sparse matrix $S$ to model the outliers, while ensuring that the underlying low-rank matrix $X$ is accurately recovered. A crucial question that remains is whether CCS-based matrix completion is robust to sparse outliers when used with robust recovery algorithms. Specifically, we ask:

Question 1 ([18]). Is CCS-based matrix completion robust to sparse outliers under some robust algorithms, like the uniform sampling model?

To address this, we examine the Robust CCS Completion problem, where we are given partial observations $\mathcal{P}_\Omega(Y)$ of a corrupted data matrix $Y = X + S$, where $X$ is a low-rank matrix and $S$ represents sparse outliers. The objective is to simultaneously recover both $X$ and $S$ using CCS-based sampling. The problem is formulated as follows:

$$\min_{X,S}\ \frac{1}{2}\big\langle \mathcal{P}_\Omega(X+S-Y),\, X+S-Y\big\rangle \quad \text{subject to} \quad \operatorname{rank}(X) = r,\ S \text{ is } \alpha\text{-sparse},$$

where the sampling operator $\mathcal{P}_\Omega$ is defined as

$$\mathcal{P}_\Omega(Y) = \sum_{(i,j)\in\Omega} [Y]_{i,j}\, e_i e_j^{\top},$$

and $\Omega$ denotes the set of observed indices generated by the CCS model. The sparse component $S$ accounts for outliers, enabling more accurate recovery of the underlying low-rank matrix $X$.
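To make the sampling model and the corrupted observations concrete, the following minimal NumPy sketch (not taken from the thesis; the dimensions, rates, and variable names are illustrative) generates a CCS-style observation set $\Omega = \Omega_R\cup\Omega_C$ for a corrupted matrix $Y = X + S$. For simplicity, rows and columns are drawn without replacement.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 300, 5

# Low-rank ground truth X and a sparse outlier matrix S
X = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
S = np.zeros((d, d))
corrupted = rng.random((d, d)) < 0.05               # roughly 5% corrupted entries
S[corrupted] = rng.uniform(-10, 10, corrupted.sum())
Y = X + S

# Algorithm 3.1, step 2: uniformly select row indices I and column indices J
I = rng.choice(d, size=int(0.3 * d), replace=False)
J = rng.choice(d, size=int(0.3 * d), replace=False)

# Algorithm 3.1, step 4: sample entries uniformly inside the selected rows/columns
p = 0.25
Omega_R = np.zeros((d, d), dtype=bool)
Omega_C = np.zeros((d, d), dtype=bool)
Omega_R[I, :] = rng.random((len(I), d)) < p
Omega_C[:, J] = rng.random((d, len(J))) < p

# Partial observations P_Omega(Y) for the Robust CCS Completion problem
Omega = Omega_R | Omega_C
Y_obs = np.where(Omega, Y, 0.0)
print(f"overall observation rate: {Omega.mean():.3f}")
```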
This framework extends the principles of Robust Matrix Completion, incorporating the novel CCS sampling approach. 75 3.1.1 Related Work The problem of low-rank matrix recovery in the presence of sparse outliers has been well- studied under the settings of uniform sampling and Bernoulli sampling. This problem is known as robust principal component analysis (RPCA) when the corrupted data matrix is fully observed, and it is called Robust Matrix Completion if data is partially observed. The seminal work [23] considers both RPCA and Robust Matrix Completion problems via convex relaxed formulations and provides recovery guarantees. In particular, under the ๐œ‡-incoherence assumption for the low- rank ๐‘‹, [23] requires the positions of outliers to be placed uniformly at random, and at least 0.1๐‘›2 entries are observed uniformly at random. Later, a series of non-convex algorithms [31, 78, 89, 132, 137, 12, 112, 20] tackle RPCA and/or Robust Matrix Completion problems with an im- proved, non-random ๐›ผ-sparsity assumptions for the outlier matrix ๐‘†. The typical recovery guarantee shows a linear convergence of a non-convex algorithm, provided ๐›ผ โ‰ค O (1/poly(๐œ‡๐‘Ÿ)); moreover, O (poly(๐œ‡๐‘Ÿ)polylog(๐‘›)๐‘›) random samples are typically required for the Robust Matrix Comple- tion cases. Another line of work [30, 22, 136, 11, 13] focuses on the robust recovery of structured low-rank matrices, e.g., Hankel matrices, and they typically require merely O (poly(๐œ‡๐‘Ÿ)polylog(๐‘›)) samples by utilizing the structure, even in the presence of structured outliers. More recently, [15, 17, 59] study the robust CUR decomposition problem, that is, recovering the low-rank matrix from row- and column-wise observations with entry-wise corruptions. On the other hand, [19] shows that CCS-based matrix completion requires O (๐œ‡2๐‘Ÿ 2๐‘› log2 ๐‘›) samples which is only a factor of log ๐‘› worse than the state-of-the-art result; however, its outlier tolerance has not been studied. 3.1.2 Notation For a matrix ๐‘€, [๐‘€]๐‘–, ๐‘— , [๐‘€] ๐ผ,:, [๐‘€]:,๐ฝ, and [๐‘€] ๐ผ,๐ฝ denote its (๐‘–, ๐‘—)-th entry, its row submatrix with row indices ๐ผ, its column submatrix with column indices ๐ฝ, and its submatrix with row indices ๐ผ and column indices ๐ฝ, respectively. ๐‘€ โ€  represents the Mooreโ€“Penrose inverse of ๐‘€. We use (cid:104)ยท, ยท(cid:105) to denote the Frobenius inner product. The symbol [๐‘›] denotes the set {1, 2, ยท ยท ยท , ๐‘›} for all ๐‘› โˆˆ Z+. Throughout this chapter, uniform sampling is referred to as uniform sampling with replacement. 76 3.1.3 ๐œ‡-incoherence and ๐›ผ-sparsity In Robust Matrix Completion, we often rely on certain structural assumptions about the matrix to be recovered. Two such pivotal assumptions are ๐œ‡-incoherence [26] and ๐›ผ-sparsity [23]. These assumptions play crucial roles in ensuring that the recovery algorithm can effectively reconstruct the matrix even when a significant portion of its entries are missing or corrupted by noise. The concept of ๐œ‡-incoherence is pivotal in the field of matrix completion, designed to ensure a balanced distribution of information across all rows and columns of a matrix. This balance is crucial for preventing any single row or column from disproportionately influencing the overall content of the matrix, which is particularly important when attempting to recover or approximate a matrix from a partial set of its entries. 
Informally, a matrix is described as $\mu$-incoherent when its singular vectors are such that no individual component dominates. This is quantified through boundedness conditions on the entries of the singular vectors, which ensure that the matrix's structural information is uniformly distributed. We formalize this intuitive concept with the following definition:

Definition 21 ($\mu$-incoherence [26]). Let $X \in \mathbb{R}^{n_1\times n_2}$ be a matrix of rank $r$. The matrix $X$ is said to be $\mu$-incoherent if the following conditions hold for its compact singular value decomposition $X = U\Sigma V^{\top}$:
$$\|U\|_{2,\infty} \le \sqrt{\frac{\mu r}{n_1}} \quad \text{and} \quad \|V\|_{2,\infty} \le \sqrt{\frac{\mu r}{n_2}},$$
where $\|\cdot\|_{2,\infty}$ denotes the maximum $\ell_2$-norm among the rows of $U$ and $V$, respectively, and $\mu$ is a positive scalar that quantifies the level of incoherence.

This definition encapsulates how $\mu$-incoherence functions as a safeguard against skewed data representation in matrix completion tasks, facilitating algorithms that require uniformly spread singular vectors for effective reconstruction.

The $\alpha$-sparsity assumption plays a critical role in the analysis of matrix structures, particularly within the domain of Robust Matrix Completion. This assumption pertains to the density of non-zero entries in the matrix, or more specifically, in its constituent components such as the sparse error matrix. The sparsity parameter $\alpha$ serves as a threshold, dictating the maximum allowable proportion of non-zero entries in each row and each column of the matrix, thus ensuring a controlled spread of these entries throughout the matrix.

Definition 22 ($\alpha$-sparsity [23]). Consider a matrix $S \in \mathbb{R}^{n_1\times n_2}$. We define $S$ as $\alpha$-sparse if no more than an $\alpha$ fraction of its entries in each row and each column are non-zero. Specifically, for every row index $i \in [n_1]$ and every column index $j \in [n_2]$, the matrix satisfies
$$\|[S]_{i,:}\|_0 \le \alpha n_2 \quad \text{and} \quad \|[S]_{:,j}\|_0 \le \alpha n_1,$$
where $\|\cdot\|_0$ denotes the number of non-zero entries.

The assumptions of $\mu$-incoherence and $\alpha$-sparsity are crucial because they directly influence the feasibility and complexity of the matrix recovery process in Robust Matrix Completion. These conditions ensure that the matrix has a well-distributed singular vector structure and a manageable number of outliers or corruptions, which are key for successful recovery using optimization-based methods [23, 26].
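As a concrete illustration, the following NumPy sketch (not from the thesis; sizes and names are illustrative) computes the smallest $\mu$ for which a given matrix satisfies Definition 21 and checks Definition 22 for a candidate sparse matrix:

```python
import numpy as np

def incoherence(X):
    """Smallest mu for which X satisfies Definition 21."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))                  # numerical rank
    U, V = U[:, :r], Vt[:r, :].T
    n1, n2 = X.shape
    mu_U = n1 / r * np.max(np.sum(U ** 2, axis=1))     # (n1/r) * ||U||_{2,inf}^2
    mu_V = n2 / r * np.max(np.sum(V ** 2, axis=1))     # (n2/r) * ||V||_{2,inf}^2
    return max(mu_U, mu_V)

def is_alpha_sparse(S, alpha):
    """Definition 22: at most an alpha fraction of nonzeros in every row and column."""
    n1, n2 = S.shape
    rows_ok = np.all(np.count_nonzero(S, axis=1) <= alpha * n2)
    cols_ok = np.all(np.count_nonzero(S, axis=0) <= alpha * n1)
    return bool(rows_ok and cols_ok)

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 150))   # rank-3 matrix
S = np.zeros((200, 150))
S[rng.random((200, 150)) < 0.02] = 5.0                               # ~2% outliers
print(f"mu(X) = {incoherence(X):.2f}, S is 0.1-sparse: {is_alpha_sparse(S, 0.1)}")
```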
3.1.4 CUR Approximation

CUR approximation, also referred to as skeleton decomposition, forms the foundation of our algorithm design. To provide context, we briefly review some key concepts of CUR approximation, which plays a crucial role in matrix dimensionality reduction.

CUR approximation is a powerful and interpretable technique that addresses the challenge of reducing matrix dimensionality while preserving meaningful structure. Given a rank-$r$ matrix $X \in \mathbb{R}^{n\times n}$, the matrix can be reconstructed by selecting appropriate rows and columns that span its row and column spaces, respectively. This method offers an intuitive way to approximate matrices by extracting representative submatrices. The theoretical underpinnings of this approach have been established in prior research, as outlined in the following theorem:

Theorem 3.1 ([87, 58]). Consider row and column index sets $I, J \subseteq [n]$. Denote the submatrices $C = [X]_{:,J}$, $U = [X]_{I,J}$, and $R = [X]_{I,:}$. If $\operatorname{rank}(U) = \operatorname{rank}(X)$, then $X = CU^{\dagger}R$.

An extensive body of literature has significantly contributed to the development of sampling methods in matrix CUR approximation. Key works include those by Achlioptas and McSherry [2], Ahmadi-Asl et al. [3], Drineas et al. [41], Boutsidis and Woodruff [10], Hamm and Huang [58], Cai et al. [19], and Martinsson [88], among many others [76, 123, 35, 85, 126, 119, 87, 128, 111, 77, 67, 127, 36, 92, 106, 27, 44, 124, 93, 39, 134, 34, 46, 47, 118, 55, 75]. In fact, sampling a sufficient number of rows and columns makes this condition highly likely. An example of such a sampling strategy is provided in Theorem 3.2.

Theorem 3.2 ([32, Theorem 1.1]). Let $X$ satisfy Definition 21, and suppose we sample $|I| = \mathcal{O}(r\log n)$ rows and $|J| = \mathcal{O}(r\log n)$ columns uniformly at random. Then $\operatorname{rank}(U) = \operatorname{rank}(X)$ with probability at least $1 - \mathcal{O}(rn^{-2})$.

3.2 Proposed Algorithm

In this section, we introduce a novel non-convex algorithm for solving the Robust CCS Completion problem formulated in Section 3.1. The algorithm builds upon the projected gradient descent framework, integrating the CUR decomposition to efficiently compute low-rank approximations at each iteration. This approach significantly reduces computational complexity while maintaining robust performance.

The proposed algorithm, named Robust CUR Completion (RCURC), is outlined in Algorithm 3.2. RCURC leverages the CUR approximation, where selected rows and columns are used to capture the essential structure of the matrix. By iteratively updating both the sparse and low-rank components, RCURC aims to solve the matrix completion problem with high efficiency, even in the presence of sparse outliers. Specifically, in each iteration, the algorithm alternates between updating the sparse matrix using a thresholding operator and refining the low-rank matrix through projected gradient updates on the observed data. The method ensures that the low-rank component is efficiently approximated using the CUR decomposition, which focuses on key rows and columns to reduce the dimensionality of the problem.

Algorithm 3.2 Robust CUR Completion (RCURC)
1: Input: $[Y = X + S]_{\Omega_R\cup\Omega_C}$: observed data; $\Omega_R, \Omega_C$: observation locations; $I, J$: row and column indices that define $R$ and $C$, respectively; $\eta_R, \eta_C$: step sizes; $r$: target rank; $\varepsilon$: target precision level; $\tau_0$: initial thresholding value; $\gamma$: thresholding decay parameter.
2: $X_0 = M_0$; $k = 0$
3: while $e_k > \varepsilon$ do    // $e_k$ is defined in (3.3)
4:    // Updating the sparse component
5:    $\tau_{k+1} = \gamma^k \tau_0$
6:    $[S_{k+1}]_{I,:} = \mathcal{T}_{\tau_{k+1}}([Y - X_k]_{I,:})$
7:    $[S_{k+1}]_{:,J} = \mathcal{T}_{\tau_{k+1}}([Y - X_k]_{:,J})$
8:    // Updating the low-rank component
9:    $R_{k+1} = [X_k]_{I,:} + \eta_R [\mathcal{P}_{\Omega_R}(Y - X_k - S_{k+1})]_{I,:}$
10:   $C_{k+1} = [X_k]_{:,J} + \eta_C [\mathcal{P}_{\Omega_C}(Y - X_k - S_{k+1})]_{:,J}$
11:   $U_{k+1} = \mathcal{H}_r([R_{k+1}]_{:,J} \uplus [C_{k+1}]_{I,:})$
12:   $[R_{k+1}]_{:,J} = U_{k+1}$ and $[C_{k+1}]_{I,:} = U_{k+1}$
13:   $X_{k+1} = C_{k+1}U_{k+1}^{\dagger}R_{k+1}$    // Do not compute (see (3.2))
14:   $k = k + 1$
15: end while
16: Output: $C_k$, $U_k$, $R_k$: the CUR components of the estimated low-rank matrix.
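The conceptual update $X_{k+1} = C_{k+1}U_{k+1}^{\dagger}R_{k+1}$ in Algorithm 3.2 rests on the CUR identity of Theorem 3.1. A quick numerical check of that identity (a hedged sketch with illustrative sizes, not part of the thesis) is:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 200, 4

# Rank-r matrix and uniformly sampled index sets of size O(r log n)
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
I = rng.choice(n, size=8 * r, replace=False)
J = rng.choice(n, size=8 * r, replace=False)

C, U, R = X[:, J], X[np.ix_(I, J)], X[I, :]

# Theorem 3.1: if rank(U) = rank(X), then X = C U^dagger R exactly
assert np.linalg.matrix_rank(U) == np.linalg.matrix_rank(X)
X_cur = C @ np.linalg.pinv(U) @ R
print(np.linalg.norm(X - X_cur) / np.linalg.norm(X))   # ~1e-13
```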
We will explain the algorithm step by step in the following paragraphs. For clarity, we begin with the low-rank component. To leverage the structure of cross-concentrated samples, it is efficient to enforce the low-rank constraint on $X$ using the CUR approximation technique. Let $R = [X]_{I,:}$, $C = [X]_{:,J}$, and $U = [X]_{I,J}$. Applying gradient descent directly on $R$ and $C$ yields
$$R_{k+1} = [X_k]_{I,:} + \eta_R\big[\mathcal{P}_{\Omega_R}(Y - X_k - S_{k+1})\big]_{I,:},$$
$$C_{k+1} = [X_k]_{:,J} + \eta_C\big[\mathcal{P}_{\Omega_C}(Y - X_k - S_{k+1})\big]_{:,J},$$
where $\eta_R$ and $\eta_C$ are the step sizes. However, when it comes to the intersection submatrix $U$, the update is more complicated, as $\Omega_R$ and $\Omega_C$ can have overlaps. We abuse the notation $\uplus$ and define an operator called the union sum:
$$\big[[R_{k+1}]_{:,J} \uplus [C_{k+1}]_{I,:}\big]_{i,j} =
\begin{cases}
[R_{k+1}]_{i,j} & \text{if } (i,j) \in \Omega_R \setminus \Omega_C;\\
[C_{k+1}]_{i,j} & \text{if } (i,j) \in \Omega_C \setminus \Omega_R;\\
\frac{\eta_R\eta_C}{\eta_R+\eta_C}\left(\frac{[R_{k+1}]_{i,j}}{\eta_R} + \frac{[C_{k+1}]_{i,j}}{\eta_C}\right) & \text{if } (i,j) \in \Omega_C \cap \Omega_R;\\
0 & \text{otherwise}.
\end{cases}$$
Basically, we take whatever value we have for the non-overlapped entries and take a weighted average for the overlaps, where the weights are determined by the step sizes used in the updates of $R_{k+1}$ and $C_{k+1}$.

To ensure the rank-$r$ constraint, at least one of $C_{k+1}$, $U_{k+1}$, or $R_{k+1}$ should be rank-$r$. For computational efficiency, we choose to enforce it on the smallest one. Thus,
$$U_{k+1} = \mathcal{H}_r\big([R_{k+1}]_{:,J} \uplus [C_{k+1}]_{I,:}\big),$$
where $\mathcal{H}_r$ is the best rank-$r$ approximation operator via truncated SVD. After replacing the intersection part $U_{k+1}$ in the previously updated $R_{k+1}$ and $C_{k+1}$, we have the new estimate of the low-rank component:
$$X_{k+1} = C_{k+1}U_{k+1}^{\dagger}R_{k+1}. \qquad (3.1)$$
However, (3.1) is just a conceptual step and one should never compute it. In fact, the full matrix $X_k$ is never needed and should not be formed in the algorithm, as updating the corresponding CUR components is sufficient.

We detect the outliers and put them into the sparse matrix $S$ via the hard-thresholding operator
$$[\mathcal{T}_\tau(M)]_{i,j} =
\begin{cases}
0 & \text{if } |[M]_{i,j}| < \tau;\\
[M]_{i,j} & \text{otherwise}.
\end{cases}$$
Hard-thresholding the residual $Y - X_k$, paired with iteratively decayed thresholding values $\tau_{k+1} = \gamma^k\tau_0$ for some $\gamma \in (0,1)$, has shown promising performance for outlier detection in prior art [12, 20, 14]. Notice that we only need to remove outliers located on the selected rows and columns, i.e., $R$ and $C$, since they are the only components needed to update the low-rank component later. Therefore, for computational efficiency, we should only compute $X_k$ on the selected rows and columns to update $S_{k+1}$ correspondingly; as noted, one should never compute the full $X_k$ in this algorithm. In particular,
$$[X_k]_{I,:} = [C_k]_{I,:}U_k^{\dagger}R_k \quad \text{and} \quad [X_k]_{:,J} = C_kU_k^{\dagger}[R_k]_{:,J}. \qquad (3.2)$$
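For concreteness, the two operators used above translate directly into NumPy as follows (a sketch, not the authors' implementation; the block and mask names are mine):

```python
import numpy as np

def hard_threshold(M, tau):
    """T_tau: keep entries with magnitude at least tau, zero out the rest."""
    return np.where(np.abs(M) < tau, 0.0, M)

def union_sum(R_blk, C_blk, mask_R, mask_C, eta_R, eta_C):
    """Union sum of the two estimates of the intersection block [X]_{I,J}.

    R_blk, C_blk:   the |I| x |J| blocks [R_{k+1}]_{:,J} and [C_{k+1}]_{I,:}
    mask_R, mask_C: boolean masks of Omega_R and Omega_C restricted to (I, J)
    """
    out = np.zeros_like(R_blk)
    only_R = mask_R & ~mask_C
    only_C = mask_C & ~mask_R
    both = mask_R & mask_C
    out[only_R] = R_blk[only_R]
    out[only_C] = C_blk[only_C]
    # weighted average on the overlap, weighted by the step sizes
    w = (eta_R * eta_C) / (eta_R + eta_C)
    out[both] = w * (R_blk[both] / eta_R + C_blk[both] / eta_C)
    return out
```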
The stopping criterion is set to be $e_k \le \varepsilon$, where $\varepsilon$ is the targeted accuracy and the computational error is
$$e_k = \frac{\big\langle \mathcal{P}_{\Omega_R\cup\Omega_C}(S_k + X_k - Y),\, S_k + X_k - Y\big\rangle}{\big\langle \mathcal{P}_{\Omega_R\cup\Omega_C}(Y),\, Y\big\rangle}. \qquad (3.3)$$
The recommended step sizes are $\eta_R = \frac{1}{p_R}$ and $\eta_C = \frac{1}{p_C}$, where $p_R$ and $p_C$ are the observation rates of $\Omega_R$ and $\Omega_C$, respectively. Smaller step sizes should be used with larger $\alpha$, i.e., more outliers.

3.3 Numerical Experiments

In this section, we verify the empirical performance of RCURC with both synthetic and real datasets. All experiments are implemented using MATLAB Online (R2024a, Update 6) and executed on a cloud-based Linux environment running Ubuntu 20.04 with kernel version 5.15.0-1062-AWS.

3.3.1 Synthetic Datasets

In this simulation, we assess the computational efficiency of our algorithm, RCURC, in addressing the robust CCS completion problem. We construct $Y = X + S$, a $d\times d$ matrix with $d = 3000$, where $X = WV^{\top}$ is a randomly generated rank-$r$ matrix. To create the sparse outlier matrix $S$, we randomly select an $\alpha$ fraction of the entries to form the support of $S$. The values of the non-zero entries are then uniformly sampled from the range $[-c\,\mathbb{E}(|[X]_{i,j}|),\, c\,\mathbb{E}(|[X]_{i,j}|)]$. To generate the robust CCS completion problems, we set $\frac{|I|}{d} = \frac{|J|}{d} = 30\%$ and $\frac{|\Omega_R|}{|I|d} = \frac{|\Omega_C|}{|J|d} = 25\%$. The results are obtained by averaging over 10 runs and reported in Figure 3.2. Both plots in Figure 3.2 depict the relationship between the relative error $e_k$ and computational time for our RCURC method with varying rank $r$ and outlier amplification factor $c$. It is noteworthy that RCURC consistently achieves nearly linear convergence rates across different scenarios. The empirical convergence illustrated in the left subplot of Figure 3.2 shows that smaller $r$ values allow the algorithm to achieve a given relative error in fewer iterations. This is likely because smaller $r$ values minimize the impact of noise during the iterative process, enabling the algorithm to concentrate on the dominant low-rank structure of the matrix, which results in faster convergence.

Figure 3.2 Empirical convergence of RCURC [18]. Left: $c = 10$ and varying $r$. Right: $\alpha = 0.2$, $r = 5$ and varying $c$.

3.3.2 Video Background Subtraction

We have applied RCURC under our robust CCS model to the problem of background separation. We evaluated our algorithm on the Train Station dataset [110]. The dataset is of size $173\times216\times500$. In order to transform this data into a low-rank matrix, a specific reconfiguration process is applied: each frontal slice of the tensor, which is essentially an individual frame of the video, is vectorized and stacked. That is, we flatten the height and width dimensions (173 and 216) into a single dimension while retaining the frame dimension. This reshaping converts the original 3-dimensional tensor into a 2-dimensional matrix, facilitating subsequent computations. The resulting matrix has a size of 37,368 (the product of 173 and 216) by 500 (the number of frames).
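The reshaping described above can be written in one line; the following sketch (illustrative only, with a random array standing in for the actual video data) shows the frame-wise flattening:

```python
import numpy as np

# Stand-in for the 173 x 216 x 500 video tensor (height x width x frames)
video = np.random.rand(173, 216, 500)

# Flatten height and width into one dimension, keep the frame dimension:
# column f of M is the vectorized f-th frame, so M is 37368 x 500
M = video.reshape(173 * 216, 500)
print(M.shape)   # (37368, 500)
```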
The CCS model is constructed by selecting 5% of the rows and 5% of the columns to create subrow and subcolumn matrices, with a sampling rate of 80% on each submatrix. We selected several benchmark algorithms for comparison, including Principal Component Pursuit (RPCA) [23], Stable Principal Component Pursuit (SPCP) [144], Low-Rank Matrix Factorization (LRMF) [81], Accelerated Alternating Projection (AccAltProj) [12], Iterated Robust CUR with fixed indices (IRCUR-F) [15], and Iterated Robust CUR with resampled indices (IRCUR-R) [15]. However, all six benchmark algorithms are operated under the full observation model, since we find they do not visually perform well on the constructed CCS model. Since IRCUR-F and IRCUR-R are both CUR-based algorithms, we set the sampling parameters for IRCUR-R and IRCUR-F to $1.7732\times10^2$ and $4.0227$, respectively. This configuration ensures that these two algorithms access 5% of the rows and 5% of the columns from the full observation data in each iteration. The methods RPCA, SPCP, AccAltProj, IRCUR-F, IRCUR-R, and RCURC are configured to run for a maximum of 50 iterations or until they meet a convergence tolerance of $10^{-3}$, whichever condition is satisfied first. We manually set the regularization parameter for RPCA and SPCP to 0.001, as it yielded the best visual results during tuning over the values 0.001, 0.01, 0.1, 1, and 10. The LRMF method is configured to run with a maximum of 5 iterations and a rank of 1. After manual tuning, where the rank parameter is tested with values 1, 3, and 5, we selected rank 1 as it provided the best balance between visual quality and runtime efficiency. Increasing the rank provided no significant improvement in visual quality while substantially increasing the runtime. Similarly, increasing the number of iterations beyond 5 did not result in noticeable improvements in visual quality but significantly extended the runtime. For AccAltProj, IRCUR-R, IRCUR-F, and RCURC, we select a rank parameter of 1 as it yields the best visual results. This choice is based on tuning the rank parameter over the values 1, 3, 5, 7, and 9. The visual results are shown in Figure 3.3, consisting of five selected frames (80th, 160th, 240th, 320th, and 400th). We present the corresponding quantitative results in Table 3.1, where the comparison is performed using the Peak Signal-to-Noise Ratio (PSNR) to evaluate reconstruction accuracy and computational time to assess efficiency. The PSNR is calculated by comparing the reconstructed background, obtained through the different methods, against the ground truth background. The ground truth is created by replicating the first frame, which represents a static background, across all frames in the dataset.
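As a reference for how such a PSNR value can be computed, here is a small sketch (one common convention for the peak value; the thesis does not spell out the exact formula used, and the arrays below are stand-ins):

```python
import numpy as np

def psnr(x_hat, x_true):
    """Peak signal-to-noise ratio in dB between a reconstruction and the ground truth."""
    mse = np.mean((x_hat - x_true) ** 2)
    peak = np.max(np.abs(x_true))
    return 10.0 * np.log10(peak ** 2 / mse)

# Ground truth: the first frame replicated across all 500 frames (static background)
frames = np.random.rand(37368, 500)                 # stand-in for the reshaped video
ground_truth = np.tile(frames[:, [0]], (1, 500))    # copy column 0 into every column
background_hat = ground_truth + 0.01 * np.random.randn(37368, 500)
print(f"PSNR: {psnr(background_hat, ground_truth):.2f} dB")
```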
For deterministic algorithms like RPCA, SPCP, and LRMF, each quantitative result is calculated as the average of ten independent runs to help mitigate errors from machine precision, floating-point arithmetic, or other low-level numerical issues. In contrast, IRCUR-R and IRCUR-F involve randomness in row and column selection under full observation during iterations, and our method, based on the inherently random CCS sampling model, naturally involves randomness. Thus, each quantitative result for these three randomized methods is computed as the average of ten independent runs, with the standard deviation included.

Table 3.1 Comparison of runtime and PSNR among RPCA, SPCP, LRMF, AccAltProj, IRCUR-R, and IRCUR-F based on full observation, and RCURC based on the CCS model.

Method        Runtime (sec)    PSNR
RPCA          13.55            28.35
SPCP          36.86            32.04
LRMF          41.75            34.62
AccAltProj    3.32             41.65
IRCUR-R       0.31 ± 0.02      41.40 ± 0.19
IRCUR-F       0.18 ± 0.03      41.06 ± 0.22
RCURC         0.20 ± 0.05      40.60 ± 0.58

The visualization results in Figure 3.3 illustrate the performance of the various methods in reconstructing static background components. Rows 3, 4, and 5 highlight the effectiveness of RPCA, SPCP, and LRMF when applied to full observation. While these methods generally produce consistent and comparable results, subtle imperfections are noticeable in certain frames, such as the 160th frame, where minor blurring or incomplete restoration occurs. Rows 6, 7, and 8, corresponding to AccAltProj, IRCUR-R, and IRCUR-F, demonstrate notable performance in reconstructing background components. These methods effectively handle the background separation task, delivering visually satisfactory outputs. It is evident that our method (last row) performs well in background subtraction. The results are comparable to other state-of-the-art algorithms under full observation, indicating the success of our method in the video background separation task. The results in Table 3.1 provide a quantitative comparison of runtime and PSNR across the various methods. The RCURC algorithm achieves a runtime of 0.20 ± 0.05 seconds, which is significantly faster than RPCA (13.55 seconds), SPCP (36.86 seconds), LRMF (41.75 seconds), and AccAltProj (3.32 seconds). When compared to IRCUR-R (0.31 ± 0.02 seconds) and IRCUR-F (0.18 ± 0.03 seconds), RCURC remains highly competitive in terms of runtime. Regarding PSNR, our method achieves a value of 40.60 ± 0.58, which is higher than RPCA (28.35), SPCP (32.04), and LRMF (34.62). Although the PSNR values of IRCUR-R (41.40 ± 0.19) and IRCUR-F (41.06 ± 0.22) are slightly higher, our method still demonstrates competitive performance.

80th frame    160th frame    240th frame    320th frame    400th frame
Figure 3.3 Video background subtraction results: Row 1 shows the original images (full observation) at the corresponding frames, while Row 2 presents the observed images generated by the CCS model at the respective frames. Rows 3 to 8 showcase the background components extracted using the RPCA, SPCP, LRMF, AccAltProj, IRCUR-R, and IRCUR-F algorithms based on the full observation model. Row 9 presents the results obtained using the RCURC algorithm under the CCS model.

3.4 Conclusion

This chapter introduces a novel mathematical model for robust matrix completion problems with cross-concentrated samples. A highly efficient non-convex algorithm, dubbed RCURC, has been developed for the proposed model. The key techniques are projected gradient descent and CUR approximation. The numerical experiments, with both synthetic and real datasets, show high potential. In particular, we consistently observe linear convergence of RCURC.

As for future work, we will study the statistical properties of the proposed robust CCS completion model, such as theoretical sample complexities and outlier tolerance. The recovery guarantee with a linear convergence rate will also be established for RCURC.
We will also aim to give a theoretical analysis explaining why a smaller rank accelerates the convergence of our RCURC algorithm, and we will explore other real-world applications that suit the proposed model.

CHAPTER 4
CONCLUSION

In this thesis, we have addressed critical challenges in matrix and tensor analysis by developing robust and flexible methodologies for data recovery tasks in data science. The methodologies proposed in this thesis contribute to the advancement of matrix and tensor analysis, particularly in scenarios where robustness and flexibility are critical. By addressing the challenges posed by noise, sparsity, and complex data structures, the proposed techniques have the potential to benefit a wide range of applications, such as image processing. Furthermore, the theoretical foundations established for robust sampling and decomposition provide a framework for future extensions in related fields. Our contributions span two interconnected projects, each tackling fundamental limitations in existing approaches while extending their applicability to real-world scenarios characterized by noise, sparsity, and high dimensionality. This chapter summarizes the key contributions of the thesis.

4.1 Summary of Contributions

Guaranteed Sampling Flexibility for Tensor Completion

In this project, we address the limitations of existing tensor completion methods by introducing Tensor Cross-Concentrated Sampling (t-CCS), a generalization of CCS to higher-order tensors. Accompanying this sampling framework, we develop the Iterative Tensor CUR Completion (ITCURC) algorithm, which offers theoretical guarantees for low-tubal-rank tensor recovery. Through rigorous theoretical analysis and extensive empirical validation, this project demonstrates the flexibility, accuracy, and computational efficiency of the t-CCS-based model.

Robust CCS Completion for Matrix Analysis

In this project, we explore the robustness of CCS for matrix completion. While CCS has demonstrated effectiveness in capturing cross-sectional dependencies, its sensitivity to sparse outliers posed a significant limitation. To address this, we propose the Robust CCS Completion framework, introducing a non-convex iterative algorithm designed to handle noisy and incomplete data. Experiments on synthetic and real-world datasets validate our algorithm's efficiency and robustness, establishing it as a robust tool for practical data completion tasks.

BIBLIOGRAPHY

[1] Benefits and risks of MRI. Accessed: 2023-12-19.

[2] D. Achlioptas and F. McSherry. Fast computation of low-rank matrix approximations. Journal of the ACM (JACM), 54(2):9–es, 2007.

[3] S. Ahmadi-Asl, C. F. Caiafa, A. Cichocki, A. H. Phan, T. Tanaka, I. Oseledets, and J. Wang. Cross tensor approximation methods for compression and dimensionality reduction. IEEE Access, 9:150809–150838, 2021.

[4] S. Ahmadi-Asl, A. H. Phan, A. Cichocki, A. Sozykina, Z. A. Aghbari, J. Wang, and I. Oseledets. Adaptive cross tubal tensor approximation. arXiv:2305.05030, 2023.

[5] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.

[6] H. Avron and C. Boutsidis. Faster subset selection for matrices and applications. SIAM Journal on Matrix Analysis and Applications, 34(4):1464–1499, 2013.

[7] J. Bennett, C. Elkan, B. Liu, P. Smyth, and D. Tikk. KDD cup and workshop 2007. SIGKDD Explor. Newsl., 9(2):51–52, 2007.

[8] A. Bhaskara, A. Rostamizadeh, J. Altschuler, M.
Zadimoghaddam, T. Fu, and V. Mirrokni. Greedy column subset selection: New bounds and distributed algorithms. International Con- ference on Machine Learning, 2016. C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix recon- struction. SIAM Journal on Computing, 43(2):687โ€“717, 2014. [10] C. Boutsidis and D. Woodruff. Optimal CUR matrix decompositions. SIAM Journal on Computing, 46(2):543โ€“589, 2017. [11] H. Cai, J.-F. Cai, T. Wang, and G. Yin. Accelerated structured alternating projections for ro- bust spectrally sparse signal recovery. IEEE Transactions on Signal Processing, 69:809โ€“821, 2021. [12] H. Cai, J.-F. Cai, and K. Wei. Accelerated alternating projections for robust principal com- ponent analysis. Journal of Machine Learning Research, 20(1):685โ€“717, 2019. [13] H. Cai, J.-F. Cai, and J. You. Structured gradient descent for fast robust low-rank Hankel matrix completion. SIAM Journal on Scientific Computing., 45(3):A1172โ€“A1198, 2023. [14] H. Cai, Z. Chao, L. Huang, and D. Needell. Fast robust tensor principal component analysis via fiber CUR decomposition. In Proceedings of the IEEE/CVF International Conference 90 on Computer Vision (ICCV) Workshops, pages 189โ€“197, 2021. [15] H. Cai, K. Hamm, L. Huang, J. Li, and T. Wang. Rapid robust principal component analysis: CUR accelerated in exact low rank estimation. IEEE Signal Processing Letters, 28:116โ€“120, 2020. [16] H. Cai, K. Hamm, L. Huang, and D. Needell. Mode-wise tensor decompositions: Multi- dimensional generalizations of CUR decompositions. Journal of Machine Learning Re- search, 22(185):1โ€“36, 2021. [17] H. Cai, K. Hamm, L. Huang, and D. Needell. Robust CUR decomposition: Theory and imaging applications. SIAM Journal on Imaging Sciences, 14(4):1472โ€“1503, 2021. [18] H. Cai, L. Huang, C. Kundu, and B. Su. On the robustness of cross-concentrated sampling for matrix completion. In 2024 58th Annual Conference on Information Sciences and Systems (CISS), pages 1โ€“5, 2024. [19] H. Cai, L. Huang, P. Li, and D. Needell. Matrix completion with cross-concentrated sam- pling: Bridging uniform sampling and CUR sampling. IEEE Transactions on Pattern Anal- ysis and Machine Intelligence, 2023. [20] H. Cai, J. Liu, and W. Yin. Learned robust PCA: A scalable deep unfolding approach for high-dimensional outlier detection. In Advances in Neural Information Processing Systems, volume 34, pages 16977โ€“16989, 2021. [21] J.-F. Cai, R. H. Chan, and Z. Shen. A framelet-based image inpainting algorithm. Applied and Computational Harmonic Analysis, 24(2):131โ€“149, 2008. [22] J.-F. Cai, T. Wang, and K. Wei. Fast and provable algorithms for spectrally sparse signal re- construction via low-rank Hankel matrix completion. Applied and Computational Harmonic Analysis, 46(1):94โ€“121, 2019. [23] E. Candรจs, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3):1โ€“37, 2011. [24] E. Candes and B. Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111โ€“119, 2012. [25] E. J. Candรจs and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717โ€“772, 2009. [26] E. J. Candรจs and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE transactions on information theory, 56(5):2053โ€“2080, 2010. [27] C. Chen, M. Gu, Z. Zhang, W. Zhang, and Y. Yu. Efficient spectrum-revealing cur matrix 91 decomposition. 
In International Conference on Artificial Intelligence and Statistics, pages 766โ€“775. Proceedings of Machine Learning Research, 2020. [28] J. Chen, Y. Wei, and Y. Xu. Tensor CUR decomposition under t-product and its perturbation. Numerical Functional Analysis and Optimization, 43(6):698โ€“722, 2022. [29] P. Chen and D. Suter. Recovering the missing components in a large noisy low-rank ma- trix: Application to sfm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8):1051โ€“1063, 2004. [30] Y. Chen and Y. Chi. Robust spectral compressed sensing via structured matrix completion. IEEE Transactions on Information Theory, 60(10):6576โ€“6601, 2014. [31] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis. Low-rank matrix recovery from errors and erasures. IEEE Transactions on Information Theory, 59(7):4324โ€“4337, 2013. [32] J. Chiu and L. Demanet. Sublinear randomized algorithms for skeleton decompositions. SIAM Journal on Matrix Analysis and Applications., 34(3):1361โ€“1383, 2013. [33] A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, D. P. Mandic, et al. Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompo- sitions. Foundations and Trendsยฎ in Machine Learning, 9(4-5):249โ€“429, 2016. [34] K. L. Clarkson and D. Woodruff. Numerical linear algebra in the streaming model. In Pro- ceedings of the forty-first annual ACM symposium on Theory of computing, pages 205โ€“214, 2009. [35] A. Deshpande and L. Rademacher. Efficient volume sampling for row/column subset se- In 2010 ieee 51st annual symposium on foundations of computer science, pages lection. 329โ€“338. IEEE, 2010. [36] Y. Dong and P.-G. Martinsson. Simpler is better: A comparative study of randomized algo- rithms for computing the CUR decomposition. arXiv preprint arXiv:2104.05877, 2021. [37] D. L. Donoho. Compressed sensing. IEEE Transactions on information theory, 52(4):1289โ€“1306, 2006. [38] P. Drineas, R. Kannan, and M. Mahoney. Fast Monte Carlo algorithms for matrices III: Computing a compressed approximate matrix decomposition. SIAM Journal on Computing, 36(1):184โ€“206, 2006. [39] P. Drineas, R. Kannan, and M. Mahoney. Fast monte carlo algorithms for matrices iii: Computing a compressed approximate matrix decomposition. SIAM Journal on Comput- ing, 36(1):184โ€“206, 2006. 92 [40] P. Drineas, M. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications., 30(2):844โ€“881, 2008. [41] P. Drineas, M. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30(2):844โ€“881, 2008. [42] G. Ely, S. Aeron, N. Hao, and M. Kilmer. 5D seismic data completion and denoising using a novel class of tensor decompositions. Geophysics, 80:V83 โ€“ V95, 2015. [43] A. Gaur and S. S. Gaur. Statistical methods for practice and research: A guide to data analysis using SPSS. Sage, 2006. [44] P. Y. Gidisu and M. E. Hochstenbach. A hybrid DEIM and leverage scores based method for CUR index selection. In European Consortium for Mathematics in Industry, pages 147โ€“153. Springer, 2021. [45] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61โ€“70, 1992. [46] S. A. Goreinov, I. Oseledets, D. V. Savostyanov, E. E. Tyrtyshnikov, and N. L. Zamarashkin. How to find a good submatrix. 
In Matrix Methods: Theory, Algorithms And Applications: Dedicated to the Memory of Gene Golub, pages 247โ€“256. World Scientific, 2010. [47] S. A. Goreinov and E. E. Tyrtyshnikov. The maximal-volume concept in approximation by low-rank matrices. Contemporary Mathematics, 280:47โ€“52, 2001. [48] L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM Journal on Matrix Analysis and Applications., 31(4):2029โ€“2054, 2010. [49] L. Grasedyck and S. Krรคmer. Stable ALS approximation in the TT-format for rank-adaptive tensor completion. Numerische Mathematik, 143(4):855โ€“904, 2019. [50] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Transac- tions on Information Theory, 57:1548โ€“1566, 2009. [51] W. Guo and J.-M. Qiu. A local macroscopic conservative (lomac) low rank tensor method for the vlasov dynamics. arXiv preprint arXiv:2207.00518, 2022. [52] W. Guo and J.-M. Qiu. A low rank tensor representation of linear transport and nonlin- ear vlasov solutions and their associated flow maps. Journal of Computational Physics, 458:111089, 2022. [53] W. Guo and J.-M. Qiu. A conservative low rank tensor method for the Vlasov dynamics. 46(1):A232โ€“A263, 2024. 93 [54] W. Hackbusch and S. Kรผhn. A new scheme for the tensor representation. Journal of Fourier analysis and applications, 15(5):706โ€“722, 2009. [55] N. Halko, P.-G. Martinsson, and J. Tropp. Finding structure with randomness: Prob- abilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217โ€“288, 2011. [56] K. Hamm. Generalized pseudoskeleton decompositions. Linear Algebra and Its Applica- tions, 664:236โ€“252, 2023. [57] K. Hamm and L. Huang. Perspectives on CUR decompositions. Applied and Computational Harmonic Analysis, 48(3):1088โ€“1099, 2020. [58] K. Hamm and L. Huang. Stability of sampling for CUR decompositions. Foundations of Data Science, 2(2):83, 2020. [59] K. Hamm, M. Meskini, and H. Cai. Riemannian CUR decompositions for robust principal component analysis. In ICML Workshop on Topology, Algebra, and Geometry in Machine Learning, 2022. [60] F. Hitchcock. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematical Physics, 6(1-4):164โ€“189, 1927. [61] Y. Hu, D. Zhang, J. Ye, X. Li, and X. He. Fast and accurate matrix completion via trun- IEEE Transactions on Pattern Analysis and Machine cated nuclear norm regularization. Intelligence, 35(9):2117โ€“2130, 2012. [62] P. Jain and S. Oh. Provable tensor factorization with missing data. In Advances in Neural Information Processing Systems, volume 2, page 1431 โ€“ 1439, 2014. [63] S. Jain, A. Gutierrez, and J. Haupt. Noisy tensor completion for tensors with a sparse canonical polyadic factor. In IEEE International Symposium on Information Theory, pages 2153โ€“2157, 2017. [64] T. Jiang, T. Huang, X. Zhao, and L. Deng. Multi-dimensional imaging data recovery via minimizing the partial sum of tubal nuclear norm. Journal of Computational and Applied Mathematics, 372:112680, 2020. [65] T. Jiang, T. Huang, X. Zhao, T. Ji, and L. Deng. Matrix factorization for low-rank tensor completion using framelet prior. Information Sciences, 436:403โ€“417, 2018. [66] T. Jiang, M. K. P. Ng, X. Zhao, and T. Huang. Framelet representation of tensor nuclear norm for third-order tensor completion. IEEE Transactions on Image Processing, 29:7233โ€“7244, 2020. 94 [67] R. Jin and S. Zhu. CUR algorithm with incomplete matrix observation. arXiv preprint arXiv:1403.5647, 2014. [68] H. Johan. 
Tensor rank is NP-complete. Journal of Algorithms, 4(11):644โ€“654, 1990. [69] M. Kilmer, K. Braman, N. Hao, and R. C. Hoover. Third-order tensors as operators on matrices: A theoretical and computational framework with applications in imaging. SIAM Journal on Matrix Analysis and Applications., 34(1):148โ€“172, 2013. [70] M. Kilmer, L. Horesh, H. Avron, and E. Newman. Tensor-tensor products for optimal rep- resentation and compression. arXiv:2001.00046, 2019. [71] M. Kilmer and C. Martin. Factorization strategies for third-order tensors. Linear Algebra and Its Applications, 435(3):641โ€“658, 2011. [72] A. Kolbeinsson, J. Kossaifi, Y. Panagakis, A. Bulat, A. Anandkumar, I. Tzoulaki, and P. M. Matthews. Tensor dropout for robust learning. IEEE Journal of Selected Topics in Signal Processing, 15(3):630โ€“640, 2021. [73] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455โ€“500, 2009. [74] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender sys- tems. Computer, 42(8):30โ€“37, 2009. [75] S. Kumar, M. Mohri, and A. Talwalkar. Ensemble nystrom method. Advances in Neural Information Processing Systems, 22, 2009. [76] S. Kumar, M. Mohri, and A. Talwalkar. Sampling methods for the Nystrรถm method. Journal of Machine Learning Research, 13(1):981โ€“1006, 2012. [77] C. Li, X. Wang, W. Dong, J. Yan, Q. Liu, and H. Zha. Joint active learning with feature se- lection via CUR matrix decomposition. IEEE transactions on pattern analysis and machine intelligence, 41(6):1382โ€“1396, 2018. [78] X. Li. Compressed sensing and matrix completion with constant proportion of corruptions. Constructive Approximation, 37:73โ€“99, 2013. [79] X. Li and Y. Pang. Deterministic column-based matrix decomposition. IEEE Transactions on Knowledge and Data Engineering, 22(1):145โ€“149, 2010. [80] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion for estimating missing IEEE Transactions on Pattern Analysis and Machine Intelligence, values in visual data. 35(1):208โ€“220, 2013. 95 [81] Q. Liu and X. Li. Efficient low-rank matrix factorization based on โ„“1,๐œ€-norm for online background subtraction. IEEE Transactions on Circuits and Systems for Video Technology, 32(7):4900โ€“4904, 2021. [82] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin, and S. Yan. Tensor robust principal component anal- ysis: Exact recovery of corrupted low-rank tensors via convex optimization. In Proceedings of the IEEE conference on computer vision and pattern Recognition, pages 5249โ€“5257, 2016. [83] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin, and S. Yan. Tensor robust principal component anal- ysis with a new tensor nuclear norm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(04):925โ€“938, apr 2020. [84] C. Lu, J. Feng, Z. Lin, and S. Yan. Exact low tubal rank tensor recovery from Gaussian measurements. In Proc. 27th International Joint Conference on Artificial Intelligence, pages 2504โ€“2510, 2018. [85] L. Mackey, M. Jordan, and A. Talwalkar. Divide-and-conquer matrix factorization. Advances in neural information processing systems, 24, 2011. [86] M. Mahoney and P. Drineas. CUR matrix decompositions for improved data analy- sis. Proceedings of the National Academy of Sciences of the United States of America, 106(3):697โ€“702, 2009. [87] M. Mahoney and P. Drineas. CUR matrix decompositions for improved data analy- sis. Proceedings of the National Academy of Sciences of the United States of America, 106(3):697โ€“702, 2009. [88] P.-G. Martinsson and J. Tropp. 
Randomized numerical linear algebra: Foundations and algorithms. Acta Numerica, 29:403โ€“572, 2020. [89] P. Netrapalli, U. Niranjan, S. Sanghavi, A. Anandkumar, and P. Jain. Non-convex robust PCA. In Advances in Neural Information Processing Systems, pages 1107โ€“1115, 2014. [90] L. Omberg, G. H. Golub, and O. Alter. A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies. Proceedings of the National Academy of Sciences of the United States of America, 104(47):18371โ€“18376, 2007. [91] I. Oseledets. 33(5):2295โ€“2317, 2011. Tensor-train decomposition. SIAM Journal on Scientific Computing, [92] U. Oswal, S. Jain, K. S. Xu, and B. Eriksson. Block CUR: Decomposing matrices using groups of columns. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2018, Dublin, Ireland, September 10โ€“14, 2018, Proceedings, Part II 18, pages 360โ€“376. Springer, 2019. 96 [93] V. Y. Pan, Q. Luan, J. Svadlenka, and L. Zhao. Superfast CUR matrix algorithms, their pre-processing and extensions. arXiv preprint arXiv:1710.07946, 2017. [94] R. Peng and E. Matsui. The Art of Data Science: A guide for anyone who works with Data. Skybrude Consulting, LLC, 2015. [95] J. Popa, S. Minkoff, and Y. Lou. An improved seismic data completion algorithm using low-rank tensor optimization: Cost reduction and optimal data orientation. Geophysics, 86(3):V219โ€“V232, 2021. [96] W. Qin, H. Wang, F. Zhang, J. Wang, X. Luo, and T. Huang. Low-rank high-order ten- sor completion with applications in visual data. IEEE Transactions on Image Processing, 31:2433โ€“2448, 2022. [97] B. Recht. A simpler approach to matrix completion. Journal of Machine Learning Research, 12(12), 2011. [98] S. Rendle. Factorization machines. In 2010 IEEE International conference on data mining, pages 995โ€“1000. IEEE, 2010. [99] M. M. Salut and D. Anderson. Tensor robust CUR for compression and denoising of hyper- spectral data. IEEE Access, 2023. [100] M. M. Salut and D. V. Anderson. Tensor robust cur for compression and denoising of hyper- spectral data. IEEE Access, 2023. [101] P. Shah, N. Rao, and G. Tang. Sparse and low-rank tensor decomposition. In Advances in Neural Information Processing Systems, volume 28, 2015. [102] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos. Tensor decomposition for signal processing and machine learning. IEEE Transactions on signal processing, 65(13):3551โ€“3582, 2017. [103] A. Sobral and E. Zahzah. Matrix and tensor completion algorithms for background model initialization: A comparative evaluation. Pattern Recognition Letters, 96:22โ€“33, 2017. [104] G. Song, M. K. P. Ng, and X. Zhang. Robust tensor completion using transformed tensor singular value decomposition. Numerical Linear Algebra with Applications, 27(3):e2299, 2020. [105] G. Song, M. K. P. Ng, and X. Zhang. Tensor completion by multi-rank via unitary transfor- mation. Applied and Computational Harmonic Analysis, 65:348โ€“373, 2023. [106] D. C. Sorensen and M. Embree. A deim induced CUR factorization. SIAM Journal on Scientific Computing, 38(3):A1454โ€“A1482, 2016. 97 [107] B. Su, J. You, H. Cai, and L. Huang. Guaranteed sampling flexibility for low-tubal-rank tensor completion. arXiv preprint arXiv:2406.11092, 2024. [108] Z. Tan, L. Huang, H. Cai, and Y. Lou. Non-convex approaches for low-rank tensor com- pletion under tubal sampling. 
In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1โ€“5, 2023. [109] D. A. Tarzanagh and G. Michailidis. Fast randomized algorithms for t-product based tensor operations and decompositions with applications to imaging data. SIAM Journal on Imaging Sciences, 11(4):2629โ€“2664, 2018. [110] D. Thirde, L. Li, and F. Ferryman. Overview of the PETS2006 challenge. In Proc. 9th IEEE international workshop on performance evaluation of tracking and surveillance (PETS 2006), pages 47โ€“50, 2006. [111] C. Thurau, K. Kersting, and C. Bauckhage. Deterministic CUR for improved large-scale data analysis: An empirical study. In Proceedings of the 2012 SIAM International Conference on Data Mining, pages 684โ€“695. SIAM, 2012. [112] T. Tong, C. Ma, and Y. Chi. Accelerating ill-conditioned low-rank matrix estimation via scaled gradient descent. Journal of Machine Learning Research, 22(150):1โ€“63, 2021. [113] T. Tong, C. Ma, A. Prater-Bennette, E. Tripp, and Y. Chi. Scaling and scalability: Provable nonconvex low-rank tensor estimation from incomplete measurements. Journal of Machine Learning Research, 23(163):1โ€“77, 2022. [114] J. Tropp. Column subset selection, matrix factorization, and eigenvalue optimization. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 978โ€“986. Society for Industrial and Applied Mathematics, 2009. [115] J. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computa- tional Mathematics, 12:389โ€“434, 2010. [116] J. Tropp. An introduction to matrix concentration inequalities. Foundations and Trendsยฎ in Machine Learning, 8(1-2):1โ€“230, 2015. [117] L. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279โ€“311, 1966. [118] E. Tyrtyshnikov. Incomplete cross approximation in the mosaic-skeleton method. Comput- ing, 64:367โ€“380, 2000. [119] S. Voronin and P.-G. Martinsson. Efficient algorithms for CUR and interpolative matrix decompositions. Advances in Computational Mathematics, 43:495โ€“516, 2017. 98 [120] A. Wang and Z. Jin. Near-optimal noisy low-tubal-rank tensor completion via singular tube thresholding. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pages 553โ€“560, 2017. [121] A. Wang, D. Wei, B. Wang, and Z. Jin. Noisy low-tubal-rank tensor completion through iterative singular tube thresholding. IEEE Access, 6:35112โ€“35128, 2018. [122] S. Wang and Z. Zhang. Improving CUR matrix decomposition and the nystrรถm approxi- mation via adaptive sampling. Journal of Machine Learning Research, 14(1):2729โ€“2769, 2013. [123] S. Wang and Z. Zhang. Improving CUR matrix decomposition and the nystrรถm approxi- mation via adaptive sampling. Journal of Machine Learning Research, 14(1):2729โ€“2769, 2013. [124] S. Wang, Z. Zhang, and T. Zhang. Towards more efficient spsd matrix approximation and CUR matrix decomposition. Journal of Machine Learning Research, 17(209):1โ€“49, 2016. [125] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: IEEE Transactions on Image Processing, from error visibility to structural similarity. 13(4):600โ€“612, 2004. [126] D. Woodruff et al. Sketching as a tool for numerical linear algebra. Foundations and Trendsยฎ in Theoretical Computer Science, 10(1โ€“2):1โ€“157, 2014. [127] C. Wu, H. Zhao, and J. Hu. Near field sampling compression based on matrix CUR de- In 2021 IEEE International Symposium on Antennas and Propagation and composition. 
USNC-URSI Radio Science Meeting (APS/URSI), pages 1455โ€“1456. IEEE, 2021. [128] M. Xu, R. Jin, and Z.-H. Zhou. CUR algorithm for partially observed matrices. In In- ternational Conference on Machine Learning, pages 1412โ€“1421. Proceedings of Machine Learning Research, 2015. [129] Y. Xu, R. Hao, W. Yin, and Z. Su. Parallel matrix factorization for low-rank tensor comple- tion. Inverse Problems and Imaging, 9(2):601โ€“624, 2015. [130] S. Xue, W. Qiu, F. Liu, and X. Jin. Low-rank tensor completion by truncated nuclear norm regularization. In 2018 24th International Conference on Pattern Recognition (ICPR), pages 2600โ€“2605. IEEE, 2018. [131] J. Yang, X. Zhao, T. Ma, Y. Chen, T. Huang, and M. Ding. Remote sensing images destriping using unidirectional hybrid total variation and nonconvex low-rank regularization. Journal of Computational and Applied Mathematics, 363:124โ€“144, 2020. [132] X. Yi, D. Park, Y. Chen, and C. Caramanis. Fast algorithms for robust PCA via gradient 99 descent. In Advances in Neural Information Processing Systems, pages 4152โ€“4160, 2016. [133] F. Zhang, J. Wang, W. Wang, and C. Xu. Low-tubal-rank plus sparse tensor recovery with IEEE Transactions on Pattern Analysis and Machine Intelli- prior subspace information. gence, 43(10):3492โ€“3507, 2020. [134] G. Zhang, H. Li, and Y. Wei. CPQR-based randomized algorithms for generalized cur de- compositions. Computational and Applied Mathematics, 43(3):132, 2024. [135] L. Zhang, L. Song, B. Du, and Y. Zhang. Nonlocal low-rank tensor completion for visual data. IEEE Transactions on Cybernetics, 51(2):673โ€“685, 2019. [136] S. Zhang and M. Wang. Correction of corrupted columns through fast robust hankel matrix completion. IEEE Transactions on Signal Processing, 67(10):2580โ€“2594, 2019. [137] T. Zhang and Y. Yang. Robust PCA by manifold optimization. Journal of Machine Learning Research, 19(1):3101โ€“3139, 2018. [138] Z. Zhang and S. Aeron. Exact tensor completion using t-SVD. IEEE Transactions on Signal Processing, 65(6):1511โ€“1526, 2017. [139] Z. Zhang, G. Ely, S. Aeron, N. Hao, and M. Kilmer. Novel methods for multilinear data completion and de-noising based on tensor-SVD. In Proceedings of the IEEE conference on computer vision and pattern Recognition, pages 3842โ€“3849, 2014. [140] Q. Zhao, L. Zhang, and A. Cichocki. Bayesian CP factorization of incomplete tensors with automatic rank determination. IEEE transactions on pattern analysis and machine intelli- gence, 37(9):1751โ€“1763, 2015. [141] X. Zhao, J. Yang, T. Ma, T. Jiang, M. K. P. Ng, and T. Huang. Tensor completion via complementary global, local, and nonlocal priors. IEEE Transactions on Image Processing, 31:984โ€“999, 2022. [142] Y. Zheng, T. Huang, X. Zhao, T. Jiang, T. Ma, and T. Ji. Mixed noise removal in hyperspectral image via low-fibered-rank regularization. IEEE Transactions on Geoscience and Remote Sensing, 58(1):734โ€“749, 2019. [143] P. Zhou, C. Lu, Z. Lin, and C. Zhang. Tensor factorization for low-rank tensor completion. IEEE Transactions on Image Processing, 27(3):1152โ€“1163, 2017. [144] Z. Zhou, X. Li, J. Wright, E. Candes, and Y. Ma. Stable principal component pursuit. In 2010 IEEE international symposium on information theory, pages 1518โ€“1522. IEEE, 2010. 100