SPARSITY IN THE SPECTRUM: SPARSE FOURIER TRANSFORMS AND SPECTRAL METHODS FOR FUNCTIONS OF MANY DIMENSIONS

By Craig Gross

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Applied Mathematics—Doctor of Philosophy

2023

ABSTRACT

The Fourier basis has been a cornerstone of numerical approximation due in part to its amenable algebraic properties, which result in efficient algorithmic approaches. Primary among these is the fast Fourier transform (FFT), which transforms a collection of samples of a univariate function into that function's Fourier coefficients with computational complexity linear in the number of samples (with an extra logarithmic term). Extensions based on the FFT include algorithms that take advantage of sparsity in a function's Fourier coefficients (sparse Fourier transforms, or SFTs) to lower this complexity even further, as well as efficient approaches for approximating certain Fourier coefficients of multivariate functions, most often those indexed over computationally friendly hyperbolic cross structures. The ability to quickly compute a function's Fourier coefficients has additionally allowed for a variety of applications, including fast algorithms for numerically solving partial differential equations (PDEs) via spectral methods.

This dissertation considers improvements on these three applications of the FFT to produce (1) a high-dimensional Fourier transform over arbitrary index sets with reduced sampling complexity compared to current state-of-the-art methods, (2) an accurate high-dimensional, sparse Fourier transform that can dramatically drive down the sampling and computational complexity so long as a sparsity assumption is satisfied, and (3) a high-dimensional, sparse spectral method which makes use of our sparse Fourier transform to solve PDEs with multiscale structure in extremely high dimensions.

All three of these applications rely on the method of rank-1 lattices for their flexibility. By using this quasi-Monte Carlo approach for sampling in high dimensions, high-dimensional functions are converted into one-dimensional ones on which well-studied techniques can be used. We extend these approaches by first developing a fully deterministic construction of multiple, smaller rank-1 lattices to sample over simultaneously, which drives down the sampling complexity from traditional rank-1 lattice methods. Our improved technique depends only linearly on the size of the underlying set of frequencies over which Fourier coefficients are computed, rather than the previously standard quadratic dependence (with additional logarithmic terms).

We can push further beyond this linear dependence on the frequency set of interest by making use of univariate SFTs after the high-dimensional to one-dimensional conversion. However, to effectively integrate univariate SFT algorithms into the rank-1 lattice approach without ruining the derived computational speedups, we provide an alternative approach. Rather than employing multiple rank-1 lattice sampling sets, we need to employ multiple rank-1 lattice SFTs. The slightly inflated sampling cost allows for significant gains in coefficient reconstruction: we produce two methods whose dependence on the frequency set of interest is cast entirely into logarithmic terms. The complexity is then quadratically or linearly (depending on the chosen variation) dependent on an imposed sparsity parameter and linear in the dimension of the underlying function domain.
The dependence on this sparsity is then fully characterized in near-optimal approximation guarantees for the function of interest.

And just as the FFT provided the foundation for fast spectral methods for numerically approximating solutions to PDEs, so too does our high-dimensional, sparse Fourier transform provide the foundation for a high-dimensional, sparse spectral method. However, to be most effective, the underlying frequency set of interest should be primarily driven by the PDE itself rather than the user. As such, we provide a technique for efficiently converting sparse Fourier approximations of the PDE data into a Fourier basis in which the solution to the PDE is guaranteed to have a good approximation. These ingredients, combined with the rich literature on spectral methods, allow us to provide error estimates in the Sobolev norm for the solution which are fully characterized by properties of the PDE, namely the Fourier sparsity of its data and conditions related to its well-posedness.

Throughout the text, these proposed algorithms are accompanied by practical considerations and implementations. These implementations are then judged against a variety of numerical tests which demonstrate performance on par with the theoretical guarantees provided.

Copyright by CRAIG GROSS 2023

To Alan, my fellow scientist and my brother. I love you.

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor, Mark Iwen, for your incredible support throughout my time at Michigan State University. Your generosity in time, advice, ideas, and more is the reason that this work exists; it would not have been possible without your guidance. I also owe to you the space that I had throughout my studies to fully explore my interests, both mathematically and professionally, and find a path that was fulfilling. And I also have you to thank for allowing me to grow beyond the boundaries I came to MSU with, whether they be boundaries of perspective, opportunity, or geography.

In that vein, I would also like to thank my collaborators at Technische Universität Chemnitz, Lutz Kämmerer and Toni Volkmer, who introduced me to the wonderful world of rank-1 lattices and alongside whom I wrote Chapters 2 and 3. You helped me develop as a researcher and an applied mathematician through your invaluable mentorship, contributions, and conversations. And I have you and the rest of Daniel Potts' group to thank for your incredible hospitality in my unforgettable visit to Chemnitz.

And to one of my first mathematical mentors, Andrew Gillette, I thank you for showing me what it means to be a mathematician. From my first day of freshman year in the Cesar E. Chavez Building at the University of Arizona to our continued conversations at Lawrence Livermore National Laboratory, you have been there to foster my mathematical journey and afford me the opportunities to make it to this point.

I would also like to thank my fellow mathematicians with whom I had the pleasure of sharing thoughts throughout my studies. In particular, I have Ben Adcock and Simone Brugiapaglia to thank for the inspiration and motivation resulting in the sparse spectral method presented in Chapter 4. I also thank the members of my committee, Yingda Cheng, Jun Kitagawa, and Rongrong Wang, for your instruction and guidance throughout my time at MSU.

To my friends in my cohort, thank you for the long nights of analysis homework, the HopCat happy hours, and the consistent cycle of commiseration and inspiration.
And to those friends who came to MSU before or after me, thank you for making and keeping the math department bright, welcoming, and growing.

But most of all, I owe my successes, my opportunities, and everything else to my family. My heroes, my mother and father, have provided the encouragement and continual support to reach where I am today. Your perpetual care, humor, creativity, and joy form the foundation for me every day, and it is only on that foundation that I am able to grow and push myself into places, ideas, and worlds previously unknown. And to my siblings: Katie, for your empathy, drive, and spirit that keeps me moving forward; Essa, for your conversations that bring me the perspective I need; and Alan, for the everlasting knowledge that I have your love and support behind me: I can't thank each of you enough.

And finally, to Sarina, I could write another 124 pages about how this, and every day, is due to you. But I'll keep it brief. Simply put, this, and so many more of the achievements in my life, wouldn't have been able to happen without you. You've kept me together in the bad and have been the celebration of the good. You've been by my side every day, my outlet, my reflection for thoughts, joys, and all the rest. You bring me everything. Thank you.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
1.1 Overview
1.2 Notation
1.3 Fourier preliminaries

CHAPTER 2 CONSTRUCTING MULTIPLE RANK-1 LATTICES DETERMINISTICALLY
2.1 Overview of results
2.2 The proof of Theorem 2.1
2.3 Numerics

CHAPTER 3 HIGH-DIMENSIONAL SPARSE FOURIER TRANSFORMS
3.1 Overview of results and prior work
3.2 One-dimensional sparse Fourier transform results
3.3 Fast multivariate sparse Fourier transforms
3.4 Numerics

CHAPTER 4 SPARSE FOURIER SPECTRAL METHODS FOR SOLVING PDE
4.1 Overview of results and prior work
4.2 Elliptic PDE setup
4.3 Galerkin spectral methods
4.4 Stamping sets and truncation analysis
4.5 Fully sublinear-time SFTs with randomized lattices
4.6 A sparse spectral method via SFTs
4.7 Numerics

BIBLIOGRAPHY

CHAPTER 1 INTRODUCTION

1.1 Overview

This dissertation is concerned with the efficient approximation of periodic functions of many variables by Fourier series and associated applications in solving partial differential equations.
For a periodic function $g : \mathbb{T}^d \to \mathbb{C}$, where $\mathbb{T}$ is taken to be $\mathbb{R}/\mathbb{Z}$, we wish to compute its Fourier series, or at least an approximation, as quickly as possible. That is, we want to find the coefficients $\hat g$, a complex sequence indexed by multivariate frequencies $\mathbf{k} \in \mathbb{Z}^d$, of the Fourier series
\[
g = \sum_{\mathbf{k} \in \mathbb{Z}^d} \hat g_{\mathbf{k}}\, e^{2\pi i \mathbf{k} \cdot \circ}.
\]
Since the collection of multivariate trigonometric monomials $\{e^{2\pi i \mathbf{k}\cdot\circ}\}_{\mathbf{k}\in\mathbb{Z}^d}$ forms an orthonormal basis for $L^2(\mathbb{T}^d)$ (cf. Theorem 1.1), the Fourier coefficients of $g$ can be computed by
\[
\hat g_{\mathbf{k}} = \int_{\mathbb{T}^d} g(\mathbf{x})\, e^{-2\pi i \mathbf{k}\cdot\mathbf{x}}\, d\mathbf{x}.
\]
Of course, using this formulation would require full knowledge of $g$ to begin with, or at least enough information to approximate this integral. However, this is the problem we are attempting to solve in the first place.

The univariate formulation of this problem has been classically solved using the fast Fourier transform (FFT). Given a parameter $K \in \mathbb{N}$, the FFT computes approximate Fourier coefficients of a function $g^{\mathrm{1d}} : \mathbb{T} \to \mathbb{C}$ via a simple left Riemann sum over $K$ points:
\[
\hat g^{\mathrm{1d}}_\omega \approx \frac{1}{K} \sum_{j=0}^{K-1} g^{\mathrm{1d}}\!\left(\frac{j}{K}\right) e^{-2\pi i \omega j / K}.
\]
Computing all approximate Fourier coefficients with frequencies in $[K] := \{0, \dots, K-1\}$ at once can be performed by the matrix multiplication $F_K \mathbf{g}^{\mathrm{1d}}$, where
\[
F_K := \left(\frac{1}{K}\, e^{-2\pi i \omega j / K}\right)_{\omega \in [K],\, j \in [K]} \quad\text{and}\quad \mathbf{g}^{\mathrm{1d}} := \left(g^{\mathrm{1d}}\!\left(\frac{j}{K}\right)\right)_{j \in [K]}.
\]
Taking advantage of algebraic properties of the Fourier basis, the FFT algorithm performs this matrix multiplication in $O(K \log K)$ time and space, instead of the standard $O(K^2)$ computational complexity (see, e.g., [56] for a good survey of these techniques).

Returning to the multivariate setting, instead of using an equispaced sampling of the target function over an interval, we can take an equispaced sampling over the $d$-dimensional grid, denoted $(g(\mathbf{j}/K))_{\mathbf{j} \in [K]^d}$, and effectively apply $d$ FFTs along the sides of this now $d$-dimensional tensor. Thus, this multivariate FFT has a time/space-complexity of $O(K^d \log^d K)$. This exponential growth in $d$ characterizes the well-known curse of dimensionality, and therefore this multivariate FFT is only suitable for low dimensions.

Approaches to avoid this curse of dimensionality for Fourier approximation form a vast body of literature. The state of the art in the contexts we are interested in is discussed in the literature reviews of the subsequent chapters. However, we summarize a simple and effective approach upon which the remainder of this dissertation is based: using rank-1 lattices.

Definition 1.1. Given a natural number $M \in \mathbb{N}$ and a generating vector $\mathbf{z} \in \{1, \dots, M-1\}^d$, the rank-1 lattice $\Lambda(\mathbf{z}, M) \subset \mathbb{T}^d$ is defined as
\[
\Lambda(\mathbf{z}, M) := \left\{ \frac{j}{M}\,\mathbf{z} \bmod 1 \;\middle|\; j \in [M] \right\}.
\]

Intuitively, a rank-1 lattice gives a direction vector $\mathbf{z}$ along which to restrict the multivariate function $g$ to a univariate one, $g^{\mathrm{1d}}$, defined by $t \mapsto g(t\mathbf{z})$. The $M$ sampling points in the rank-1 lattice are in fact an equispaced sampling over $\mathbb{T}$ of $g^{\mathrm{1d}}$. An FFT of these equispaced samples of $g^{\mathrm{1d}}$ is then able to give us information about the Fourier coefficients of the original, high-dimensional function $g$.

To see how the FFT relates to the Fourier coefficients of $g$, we consider the Fourier series of $g^{\mathrm{1d}}$ by way of the Fourier series of $g$,
\[
g^{\mathrm{1d}}(t) = g(t\mathbf{z}) = \sum_{\mathbf{k}\in\mathbb{Z}^d} \hat g_{\mathbf{k}}\, e^{2\pi i \mathbf{k}\cdot\mathbf{z}\, t} = \sum_{\omega\in\mathbb{Z}} \Biggl( \sum_{\substack{\mathbf{k}\in\mathbb{Z}^d \\ \mathbf{k}\cdot\mathbf{z} = \omega}} \hat g_{\mathbf{k}} \Biggr) e^{2\pi i \omega t}.
\]
Thus
\[
\hat g^{\mathrm{1d}}_\omega = \sum_{\substack{\mathbf{k}\in\mathbb{Z}^d \\ \mathbf{k}\cdot\mathbf{z} = \omega}} \hat g_{\mathbf{k}}.
\]
In light of the fact that we will be using an FFT approximation of $\hat g^{\mathrm{1d}}$, let us also note the well-known aliasing effect of the FFT. For all $\omega \in [M]$,
\[
\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_\omega = \sum_{\substack{\omega' \in \mathbb{Z} \\ \omega' \equiv \omega \bmod M}} \hat g^{\mathrm{1d}}_{\omega'} \tag{1.1}
\]
(see Lemma 1.3 for the proof and further explanation).
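To make the lattice restriction and the aliasing identity (1.1) concrete, the following is a minimal numpy sketch; the frequency set, coefficients, and lattice here are illustrative choices, not taken from the text or the dissertation's own code. It samples a sparse trigonometric polynomial along a rank-1 lattice and reads each multivariate coefficient off of a single univariate FFT.

import numpy as np

# Illustrative data: a sparse trigonometric polynomial in d = 3.
freqs = np.array([[0, 0, 0], [1, 2, 0], [0, -1, 3]])     # frequencies in Z^3
coefs = np.array([2.0, 1.0 + 0.5j, 0.5])

def g(x):
    # g(x) = sum_k ghat_k * exp(2*pi*i*k.x) for points x of shape (..., 3)
    return (coefs * np.exp(2j * np.pi * (x @ freqs.T))).sum(axis=-1)

# A rank-1 lattice Lambda(z, M) that is reconstructing for these frequencies:
# the hashes k.z mod M (here 0, 7, and 24) are pairwise distinct.
z, M = np.array([1, 3, 9]), 31
nodes = (np.arange(M)[:, None] * z % M) / M              # j*z/M mod 1, j in [M]
ghat_1d = np.fft.fft(g(nodes)) / M                       # F_M g^1d, cf. (1.1)

for k, c in zip(freqs, coefs):
    omega = int(k @ z) % M                               # k.z mod M
    print(k, ghat_1d[omega], c)                          # recovered == true coeff

Each FFT bin here contains exactly one coefficient because the three hashes are distinct; for a function with energy outside the chosen frequency set, the same bins would additionally collect the aliased terms in (1.1).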
We can then assert that
\[
\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_\omega = \sum_{\substack{\mathbf{k}\in\mathbb{Z}^d \\ \mathbf{k}\cdot\mathbf{z} \equiv \omega \bmod M}} \hat g_{\mathbf{k}}.
\]
To make the most effective use of the length-$M$ FFT, a rank-1 lattice should be chosen so that this sum contains at most one Fourier coefficient of the original function in $\hat g$. In order to accomplish this, we will restrict the scope of our Fourier coefficient approximation to some chosen frequency set of interest $I \subset \mathbb{Z}^d$ and introduce the idea of the modulus mapping and a reconstructing rank-1 lattice.

Definition 1.2. Choose some $I \subset \mathbb{Z}^d$. The modulus mapping $m_{\mathbf{z},M} : I \to [M]$ is defined by $\mathbf{k} \mapsto \mathbf{k}\cdot\mathbf{z} \bmod M$. A rank-1 lattice $\Lambda(\mathbf{z}, M)$ is said to be reconstructing for $I$ if the modulus mapping is injective. An equivalent condition is that $\mathbf{k}\cdot\mathbf{z} \not\equiv \mathbf{h}\cdot\mathbf{z} \bmod M$ for all $\mathbf{k} \neq \mathbf{h} \in I$.

We find then that for any trigonometric polynomial $g \in \Pi_I := \operatorname{span}\{e^{2\pi i\mathbf{k}\cdot\circ} \mid \mathbf{k}\in I\}$, (1.1) reduces to
\[
\hat g_{\mathbf{k}} = \left(F_M \mathbf{g}^{\mathrm{1d}}\right)_{m_{\mathbf{z},M}(\mathbf{k})} \quad\text{for all } \mathbf{k} \in I,
\]
and for any $g$ with Fourier coefficients not necessarily supported on $I$,
\[
\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_{m_{\mathbf{z},M}(\mathbf{k})} = \hat g_{\mathbf{k}} + \sum_{\substack{\mathbf{h}\notin I \\ \mathbf{h}\cdot\mathbf{z} \equiv \mathbf{k}\cdot\mathbf{z} \bmod M}} \hat g_{\mathbf{h}} \quad\text{for all } \mathbf{k} \in I. \tag{1.2}
\]
The upshot is that we are able to compute all Fourier coefficients in $I$ of a periodic function $g$ up to errors related to restricting our attention to $I$. The full rank-1 lattice FFT approach is summarized in Algorithm 1.1.

Algorithm 1.1 Rank-1 lattice FFT
Input: A function $g : \mathbb{T}^d \to \mathbb{C}$, a frequency set of interest $I \subset \mathbb{Z}^d$, and a reconstructing rank-1 lattice for $I$, $\Lambda(\mathbf{z}, M)$
Output: Approximate Fourier coefficients $\hat{\mathbf{g}}^\Lambda \in \mathbb{C}^I$
1: $\mathbf{g}^{\mathrm{1d}} \leftarrow (g(j\mathbf{z}/M))_{j\in[M]}$
2: Compute $F_M \mathbf{g}^{\mathrm{1d}}$
3: for $\mathbf{k} \in I$ do
4:  $\hat g^\Lambda_{\mathbf{k}} \leftarrow \left(F_M \mathbf{g}^{\mathrm{1d}}\right)_{m_{\mathbf{z},M}(\mathbf{k})}$
5: end for

With the basic ideas behind the rank-1 lattice FFT in hand, we can motivate the remaining chapters of this dissertation.

1.1.1 Multiple rank-1 lattices and their construction

The most important ingredient for Algorithm 1.1 is a reconstructing rank-1 lattice for a chosen frequency set $I$. The size of this rank-1 lattice also has major impacts on the computational complexity of Algorithm 1.1, namely, on the sampling step and the FFT. Thus, the goal should be to find as small a reconstructing rank-1 lattice as possible.

The most popular reconstructing rank-1 lattice construction is the component-by-component (CBC) approach [39, 56, 46, 48]. The idea is to start with the set of frequency differences $\mathcal{D}(I) := \{\mathbf{k} - \mathbf{h} \mid \mathbf{k}, \mathbf{h} \in I\}$ and consider these differences one component at a time. Each component of the generating vector is chosen by a brute-force scan through these differences to ensure that there are no collisions modulo $M$, where it suffices for $M$ to be a prime number between $|\mathcal{D}(I)|/2$ and $|\mathcal{D}(I)|$. But note also that $|I| \lesssim |\mathcal{D}(I)| \lesssim |I|^2$, and in fact, there exist specific frequency sets which require any associated reconstructing rank-1 lattice to have size $M = \Omega(|I|^2)$ [12, Section 3]. See also [39, Section 5] for more information about rank-1 lattices and their often (seemingly unnecessarily) large sizes.

The goal of Chapter 2 is to reduce this quadratic dependence on $|I|$ (and therefore on the sampling and computational complexity of Algorithm 1.1) using a slight generalization of rank-1 lattices. Rather than restricting the high-dimensional function to just one lattice, we use multiple rank-1 lattices [40, 41], which can be smaller than a single reconstructing rank-1 lattice is required to be, to drive down the overall complexity.

In particular, Chapter 2 presents the first known deterministic algorithm for constructing a series of multiple rank-1 lattices for an arbitrary frequency set.
As input, it takes a single reconstructing rank-1 lattice and returns $O(\log|I|)$ lattices, each of size $O(|I| \log^2(K_I |I|))$, where $K_I$ is the sidelength of the smallest hypercube containing $I$. Each lattice handles a portion of the frequencies in $I$ so that performing FFTs over all of the smaller lattices will exactly recover the Fourier coefficients of trigonometric polynomials in $\Pi_I$. Approximation guarantees similar to (1.2) are also provided for general periodic functions. Due to the size of the full multiple rank-1 lattice returned, the quadratic dependence on $|I|$ of any single rank-1 lattice can therefore be reduced to a linear (with polylogarithmic terms) dependence without incurring significant additional errors.

1.1.2 Sparse Fourier transforms and rank-1 lattices

Though the efforts of Chapter 2 are able to reduce the amount of work necessary in a rank-1 lattice FFT approach, a linear dependence on $|I|$ in the complexity may still be intolerable. For large search spaces of multivariate frequencies $I$, such as the full hypercube of sidelength $K$, $I = \left(\left(-\frac K2, \frac K2\right] \cap \mathbb{Z}\right)^d$, these methods still suffer from the curse of dimensionality.

Rather than a more general multiple rank-1 lattice approach, Chapter 3 considers the case of functions whose Fourier series are sparse or compressible. Since the rank-1 lattice procedure reduces high-dimensional functions into one-dimensional ones, one-dimensional sparse Fourier transform (SFT) techniques [25, 27, 36, 35, 51, 62, 37, 26, 18, 45, 57, 58, 53, 3, 2, 1] become particularly appealing. SFTs are compressive sensing algorithms which are highly specialized to take advantage of the number-theoretic and algebraic structure of the Fourier basis as much as possible. As a result, SFTs rarely have to consider Fourier basis functions individually during the reconstruction process, and so can simultaneously reduce both their measurement needs and computational complexity to effectively depend only on the number of important Fourier series coefficients in the function one aims to approximate. Thus, SFTs can sidestep runtimes which are polynomially dependent on the bandwidth (in the case of a rank-1 lattice FFT, $M$), and instead run sublinearly in the magnitude of the underlying frequency space under consideration. If one desires to capture only the largest $s$ Fourier coefficients of a function, the SFT discussed in Theorem 3.1, for example, runs in $O(s^2 \log^4 M)$-time/space (with a randomized version cutting the quadratic factor of $s$ down to linear). Additionally, these techniques often furnish recovery guarantees for Fourier compressible functions in terms of best $s$-term approximations in the same vein as compressed sensing results [19, 24].

However, simply replacing the FFT $F_M \mathbf{g}^{\mathrm{1d}}$ in Line 2 of Algorithm 1.1 with a suitable SFT $\mathcal{A}_{s,M}\, g^{\mathrm{1d}}$ is not enough to relieve the linear dependence on $|I|$. The for loop from Line 3 to Line 5, which matches $d$-dimensional and one-dimensional frequencies, requires a linear scan through $I$. A simple optimization is to swap the order of this process and match the $s$-many entries of $\mathcal{A}_{s,M}\, g^{\mathrm{1d}}$ to the corresponding Fourier coefficients indexed over $I$. But even this is not enough, as it requires complete knowledge of the inverse modulus mapping $m_{\mathbf{z},M}^{-1}$, which is either built up through the rank-1 lattice construction and stored, or computed through an $O(d|I|)$ computation. All benefit in swapping the FFT along the lattice with an SFT is then lost.
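To see where the $O(d|I|)$ bottleneck comes from, consider this minimal sketch of the naive matching step; the helper below is hypothetical and is not one of the dissertation's algorithms. Even when the SFT returns only $s$ nonzero bins, inverting $m_{\mathbf{z},M}$ without stored tables forces a scan through all of $I$.

import numpy as np

def match_naively(sft_out, I, z, M):
    """Match univariate SFT output to multivariate frequencies by brute force.

    sft_out: dict mapping one-dimensional frequencies omega in [M] to the s
             coefficient estimates returned by an SFT of g^1d.
    I:       integer array of shape (|I|, d) of candidate frequencies.
    """
    hashes = (I @ z) % M                # m_{z,M}(k) for every k: the O(d|I|) scan
    recovered = {}
    for k, omega in zip(I, hashes):
        if int(omega) in sft_out:       # only s of the |I| hashes ever match
            recovered[tuple(k.tolist())] = sft_out[int(omega)]
    return recovered

The scan over $I$ dominates the runtime even though only $s$ of the $|I|$ hashes find a match, which is exactly the cost the next paragraph sets out to avoid.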
The methods given in Chapter 3 instead use samples along possibly larger lattices to produce a sparse approximation of the Fourier transform of $g$ without directly inverting $m_{\mathbf{z},M}$. Two algorithms are presented which operate on SFTs of manipulations of $g^{\mathrm{1d}}$ in order to relate the univariate coefficients to their multivariate counterparts in $o(|I|)$-time. This allows the methods to run faster and with less memory than it takes to simply enumerate the frequency set $I$ and/or store $m_{\mathbf{z},M}(I)$ whenever $g$ has a sufficiently accurate sparse approximation.

The result is a series of curse-of-dimensionality-breaking, high-dimensional SFTs with proven compressive-sensing-type guarantees for arbitrary periodic functions. The approaches are linear in $d$ in their sampling and runtime complexities and succeed deterministically in time quadratic in $s$. As with the univariate SFT discussed above, this can be reduced to time linear in $s$ via randomization. We defer to Section 3.1 for a fuller discussion in the context of the provided literature review.

Finally, though these results are able to sidestep the necessity of the inverse of the modulus mapping, $m_{\mathbf{z},M}(I)$, an existing reconstructing rank-1 lattice for $I$ is still required. As discussed above, CBC constructions, though only necessary to perform once, are still relatively expensive in the context of SFT complexities. This requirement is dropped via a randomized approach to constructing rank-1 lattices in Section 4.5, resulting in an algorithm with complexity fully sublinear in $|I|$.

1.1.3 Applications to PDE

The fast, high-dimensional SFT techniques of Chapter 3 are applied in Chapter 4 to construct an efficient, numerical PDE solver. For this exposition, we consider as a model problem an elliptic PDE with periodic boundary conditions
\[
-\nabla \cdot (a \nabla u) = f \tag{1.3}
\]
where $a, f : \mathbb{T}^d \to \mathbb{R}$ are the PDE data, and $u : \mathbb{T}^d \to \mathbb{R}$ is the solution. Solving (1.3) using a traditional Fourier spectral method amounts to replacing the data and the solution with their Fourier series, simplifying the left-hand side into a single Fourier series, matching the Fourier coefficients of both sides, and solving the resulting system of equations for the Fourier coefficients of $u$.

Two main sources of approximation error arise when implementing this technique computationally. The first is due to truncating the Fourier series involved to a finite number of terms. The second is due to numerically approximating the Fourier coefficients of the PDE data. Due to the rich theory of traditional spectral methods, these two sources of error can directly quantify the error of the resulting approximation of $u$.

Lemma 1.1 (Strang's lemma, [13]). Let $u^{\mathrm{truncation}}$ be the function which has the same Fourier series as $u$ but truncated in some manner, and let $a^{\mathrm{approximate}}$ and $f^{\mathrm{approximate}}$ be computed using approximations of the Fourier series of $a$ and $f$ truncated in the same way as $u^{\mathrm{truncation}}$. Then the procedure outlined above produces a solution $u^{\mathrm{spectral}}$ which satisfies
\[
\bigl\| u - u^{\mathrm{spectral}} \bigr\|_{H^1} \lesssim_{a,f} \bigl\| u - u^{\mathrm{truncation}} \bigr\|_{H^1} + \bigl\| a - a^{\mathrm{approximate}} \bigr\|_{L^\infty} + \bigl\| f - f^{\mathrm{approximate}} \bigr\|_{L^2}
\]
where $\lesssim_{a,f}$ denotes an upper bound with constants that depend on the PDE data.

This is a rough simplification of Strang's lemma [13], which is itself a generalization of the well-known Céa's lemma (the specific version of the lemma that we use is presented and proven in Lemma 4.6 below). Effectively, it states that the spectral method solution is optimal up to its Fourier series truncation and the approximation of the PDE data $a$ and $f$.
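For readers who want to see the coefficient-matching procedure spelled out, the following self-contained sketch solves (1.3) with $d = 1$ on a fixed band of $K$ frequencies; the diffusion coefficient, manufactured solution, and bandwidth are illustrative choices, not from the text. Matching Fourier coefficients of $-(a u')' = f$ gives the linear system $\sum_h 4\pi^2 k h\, \hat a_{k-h} \hat u_h = \hat f_k$, which the code assembles and solves on the nonzero modes.

import numpy as np

# Bandwidth, diffusion coefficient, and manufactured solution are illustrative.
K = 32                                      # number of retained Fourier modes
w = np.fft.fftfreq(K, 1.0 / K).astype(int)  # integer frequencies in B_K

x = np.arange(K) / K
a = 1.0 + 0.5 * np.cos(2 * np.pi * x)       # elliptic: a(x) >= 1/2 > 0
u_true = np.sin(2 * np.pi * x) + 0.3 * np.cos(4 * np.pi * x)
uhat = np.fft.fft(u_true) / K

# Manufacture f = -(a u')' spectrally so the truncated problem is consistent.
du = np.real(np.fft.ifft(2j * np.pi * w * np.fft.fft(u_true)))
fhat = -2j * np.pi * w * np.fft.fft(a * du) / K

# Stiffness matrix A[k, h] = 4 pi^2 k h ahat_{(k - h) mod K} on nonzero modes;
# the zero mode is not seen by the PDE (only derivatives of u appear), so the
# known mean of u is reattached after the solve.
ahat = np.fft.fft(a) / K
nz = w != 0
wn = w[nz]
A = 4 * np.pi**2 * np.outer(wn, wn) * ahat[(wn[:, None] - wn[None, :]) % K]

uhat_spectral = np.zeros(K, dtype=complex)
uhat_spectral[nz] = np.linalg.solve(A, fhat[nz])
uhat_spectral[0] = uhat[0]
print(np.max(np.abs(uhat_spectral - uhat)))  # recovers uhat to machine precision

Since the right-hand side is manufactured with the same truncation, the only errors here are round-off; for general data, the two error sources named above appear exactly as in Lemma 1.1.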
Analyzing convergence thus reduces to estimating these two errors.

Using $d$-dimensional FFTs to compute $a^{\mathrm{approximate}}$ and $f^{\mathrm{approximate}}$ in the procedure suggested in Lemma 1.1 naturally enforces a Fourier series truncation. A $d$-dimensional FFT using a tensorized grid of $K$ uniformly spaced points in each dimension will produce approximate Fourier coefficients indexed by frequencies in the $d$-dimensional hypercube on the integer lattice $\mathbb{Z}^d$ of sidelength $K$ (note that when we refer to "bandwidth" in a multidimensional sense, we are still referring to the sidelength $K$ of the hypercube containing these integer frequencies). As discussed above, each $d$-dimensional FFT in general requires more than $K^d$ operations, as does the linear-system solve (in the absence of any sparsity or other tricks). Thus, not only do traditional Fourier spectral methods suffer from the curse of dimensionality, but even in moderate dimensions, multiscale problems (i.e., PDE data which require very high bandwidth to be fully resolved) can result in intractable computations.

This is a prime opportunity to take advantage of our high-dimensional SFT algorithms to compute $a^{\mathrm{approximate}}$ and $f^{\mathrm{approximate}}$. This allows for the data terms in Strang's lemma above to converge near-optimally in terms of their compressibility in the Fourier basis. However, these SFTs only provide us with truncation information useful for $a$ and $f$, not necessarily $u$. One of the more significant contributions of Chapter 4 is resolving this truncation gap. By analyzing in detail the effect of the differential operator discretized using SFT approximations in frequency, we provide a technique for computing the most important Fourier coefficients of $u$ using knowledge of the most important Fourier coefficients of $a$ and $f$. We can then prove truncation estimates which allow for a sparse spectral method with $H^1$ error guarantees fully characterized by the Fourier compressibility of the data and terms relating to the ellipticity properties of the original PDE. Note that though we only consider a diffusion term in (1.3) for the simplicity of this overview, the analysis in Chapter 4 is actually that of a full multiscale, high-dimensional advection-diffusion-reaction equation, similar to, e.g., the governing equations for flow dynamics in a porous medium used in hydrological modeling [61].

1.1.4 A note on previous publication of this work

The three chapters following this introduction each comprise the results presented in three previously available manuscripts. With some exceptions, Chapter 2 is published as [34], Chapter 3 is published as [33], and (at the time of submission of this dissertation) Chapter 4 is publicly available at [32] and has been submitted for publication. Thus, the contents of Chapters 2 and 3 were developed collectively with Lutz Kämmerer and Toni Volkmer, and Chapters 2 to 4 with Mark Iwen. Additionally, portions of this introduction were adapted from the introductions of the original three manuscripts.

That being said, there are changes in the results given in this dissertation from their original presentations. In Chapter 2, the main modification is clarifying the Fourier recovery mechanism and the error guarantees for approximation in Corollary 2.1. Chapter 3 includes the $L^\infty$ error guarantees for the phase-encoding SFT originally provided in [32] and extends these guarantees to all algorithms analyzed.
Finally, Chapter 4 provides a complete analysis of advection-diffusion-reaction equations rather than solely the diffusion equations of the original text.

1.1.5 Organization

The remainder of this chapter consists of a section setting the notation and a section collecting some useful Fourier series related lemmas that are used throughout the text. The three following chapters respectively present the three main results summarized above. Each chapter gives a short overview, followed by the theory, and finally a numerics section with implementation details and tests demonstrating that theory in practice.

1.2 Notation

We let $d$ be the ambient dimension of function domains under consideration. The torus $\mathbb{T}$ is defined as $\mathbb{R}/\mathbb{Z}$, i.e., $[0, 1]$ with the endpoints identified. Given a natural number $M \in \mathbb{N}$, we let $[M] := \{0, \dots, M-1\}$.

Finite length vectors are denoted using boldface. For example, we often use $\mathbf{x} \in \mathbb{T}^d$ as a point in the spatial domain of a function and $\mathbf{k} \in \mathbb{Z}^d$ as a $d$-dimensional frequency to index Fourier coefficients. This also extends to multiindexed finite vectors. For example, if $I \subset \mathbb{Z}^d$ with $|I| < \infty$, then we would refer to a vector indexed over $I$ as, e.g., $\hat{\mathbf{g}} = (\hat g_{\mathbf{k}})_{\mathbf{k}\in I}$. Infinite length sequences remain in standard roman font, e.g., $\hat g = (\hat g_{\mathbf{k}})_{\mathbf{k}\in\mathbb{Z}^d}$. All finite length vectors will be implicitly extended to larger index sets by taking on the value zero wherever they are not originally defined. Additionally, the set of all complex-valued, finite length vectors or infinite length sequences supported on an index set $D$ is denoted as $\mathbb{C}^D$. Our convention is to use zero-based indexing, i.e., $\mathbb{C}^M = \mathbb{C}^{[M]}$.

In general, a multivariate function to be recovered is $g : \mathbb{T}^d \to \mathbb{C}$. Specific functions used in the context of elliptic PDEs are
• $a : \mathbb{T}^d \to \mathbb{R}$, the diffusion coefficient;
• $\mathbf{b} : \mathbb{T}^d \to \mathbb{R}^d$, the advection field;
• $c : \mathbb{T}^d \to \mathbb{R}$, the reaction coefficient;
• $f : \mathbb{T}^d \to \mathbb{R}$, the forcing function; and
• $u : \mathbb{T}^d \to \mathbb{R}$, the solution to the PDE.

Unless otherwise stated, we assume all functions are complex-valued and defined on the torus $\mathbb{T}^d$. For example, we take the inner product for $u, v \in L^2 := L^2(\mathbb{T}^d; \mathbb{C})$ to be
\[
\langle u, v \rangle_{L^2} := \int_{\mathbb{T}^d} u(\mathbf{x})\, \overline{v(\mathbf{x})}\, d\mathbf{x}
\]
where $\overline{v}$ is taken to be the complex conjugate of $v$. Additionally, we assume all vectors and sequences are complex-valued and defined on $\mathbb{Z}^d$ unless otherwise stated. For example, we take the inner product for $\hat u, \hat v \in \ell^2 := \ell^2(\mathbb{Z}^d; \mathbb{C})$ to be
\[
\langle \hat u, \hat v \rangle_{\ell^2} := \sum_{\mathbf{k}\in\mathbb{Z}^d} \hat u_{\mathbf{k}}\, \overline{\hat v_{\mathbf{k}}}.
\]
The domains and ranges for the function spaces $L^1$, $L^\infty$, $C$ (the space of continuous functions), and $C^\infty$ (the space of infinitely differentiable functions) are inferred similarly, as is the index set of the spaces of sequences $\ell^1$ and $\ell^\infty$. We now define our specific notion of periodic Sobolev spaces (see also [8, Section 2.1] and [47, Appendix A.2.2]).

Definition 1.3. For $u \in L^2$ and $\alpha \in \mathbb{N}_0^d$ a multiindex, if there exists a $v \in L^2$ such that
\[
\langle v, \phi \rangle_{L^2} = (-1)^{|\alpha|} \langle u, \partial^\alpha \phi \rangle_{L^2} \quad\text{for all } \phi \in C^\infty \subset L^2,
\]
we call $v$ the weak $\alpha$ derivative of $u$, and write $\partial^\alpha u := v$. We define the inner product
\[
\langle u, v \rangle_{H^1} := \sum_{\substack{\alpha \in \{0,1\}^d \\ \|\alpha\|_1 \le 1}} \int_{\mathbb{T}^d} \partial^\alpha u(\mathbf{x})\, \overline{\partial^\alpha v(\mathbf{x})}\, d\mathbf{x}
\]
(where all derivatives are considered in the weak sense) and have the associated norm $\|u\|_{H^1} := \sqrt{\langle u, u \rangle_{H^1}}$. The periodic Sobolev space $H^1$ is defined as $H^1 := \{u \in L^2 \mid \|u\|_{H^1} < \infty\}$.

For any $g \in L^1$ and any $\mathbf{k} \in \mathbb{Z}^d$, we define the $\mathbf{k}$th Fourier coefficient
\[
\hat g_{\mathbf{k}} := \left\langle g, e^{2\pi i\mathbf{k}\cdot\circ} \right\rangle_{L^2} = \int_{\mathbb{T}^d} g(\mathbf{x})\, e^{-2\pi i\mathbf{k}\cdot\mathbf{x}}\, d\mathbf{x}.
\]
The Wiener algebra $W := W(\mathbb{T}^d; \mathbb{C})$ is defined as the set of all functions with absolutely summable Fourier coefficients, $W := \{g \in L^1 \mid \hat g \in \ell^1\}$. For any function $g \in W$, its Fourier series is written as
\[
g = \sum_{\mathbf{k}\in\mathbb{Z}^d} \hat g_{\mathbf{k}}\, e^{2\pi i\mathbf{k}\cdot\circ}
\]
(see also Theorem 1.1 below). Given a hatted sequence $\hat g \in \mathbb{C}^{\mathbb{Z}^d}$ without having previously defined $g$, the function $g$ is then implicitly defined as the Fourier series with Fourier coefficients $\hat g$. In examples where sequences of Fourier coefficients are known to be finite length, e.g., the output of sparse Fourier transform algorithms, these coefficients are written in boldface, e.g., $\hat{\mathbf{g}}^s$. Note also that for notational aesthetics, Fourier coefficients for functions with super- or subscripts will not include the super- or subscript under the hat, e.g., the Fourier coefficients of $G^{3d}$ are $\hat G^{3d}$. There are some occasions where super- or subscripts will refer to modifications of Fourier coefficients rather than to the Fourier coefficients of a super- or subscripted function (e.g., $\hat g_s^{\mathrm{opt}}$ is the best $s$-term approximation of $\hat g$, not the Fourier coefficients of a function $g_s^{\mathrm{opt}}$), but these will be made clear from context. For univariate functions $g^{\mathrm{1d}} : \mathbb{T} \to \mathbb{C}$, we usually use $\omega$ to index the Fourier coefficients, e.g., $\hat g^{\mathrm{1d}} = (\hat g^{\mathrm{1d}}_\omega)_{\omega\in\mathbb{Z}}$.

A $d$-dimensional frequency set of interest is usually taken to be $I \subset \mathbb{Z}^d$. In general, most $d$-dimensional frequency sets are labeled using calligraphic font. For example, Chapter 4 introduces a particularly important class of frequency sets, the stamping sets denoted $\mathcal{S}_N \subset \mathbb{Z}^d$ for $N \in \mathbb{N}_0$, which are implicitly parameterized by the set of all active frequencies in a PDE's data, $\mathcal{A}$. The space of all trigonometric polynomials with frequencies in $I$ is denoted by $\Pi_I := \operatorname{span}\{e^{2\pi i\mathbf{k}\cdot\circ} \mid \mathbf{k}\in I\}$. The expansion $K_I$ of a frequency set $I \subset \mathbb{Z}^d$ is defined as
\[
K_I := \max_{j\in[d]} \left( \max_{\mathbf{k}\in I} k_j - \min_{\mathbf{l}\in I} l_j \right) + 1.
\]
Note that this can be interpreted as the sidelength of the smallest hypercube containing $I$.

For a sequence $\hat g \in \mathbb{C}^{\mathbb{Z}^d}$, its restriction to an index set $I$ is denoted by $\hat g|_I$. The same is true for vectors. This can be interpreted as either a vector in $\mathbb{C}^I$ or a sequence in $\mathbb{C}^{\mathbb{Z}^d}$ which is set to zero outside of $I$. When $\hat g$ refers to the Fourier coefficients of the function $g$, restrictions of $g$ to index sets refer to the Fourier series with Fourier coefficients restricted in the same way, i.e.,
\[
g|_I := \sum_{\mathbf{k}\in\mathbb{Z}^d} (\hat g|_I)_{\mathbf{k}}\, e^{2\pi i\mathbf{k}\cdot\circ} = \sum_{\mathbf{k}\in I} \hat g_{\mathbf{k}}\, e^{2\pi i\mathbf{k}\cdot\circ}.
\]
We will also often consider restricting a multiindexed sequence to the hypercube with a fixed sidelength $K$. We will denote this set by $B_K^d$, where the one-dimensional frequency band of length $K$, $B_K$, is defined by $B_K := \left(-\frac K2, \frac K2\right] \cap \mathbb{Z}$. Rather than subscript with this set, we use the shorthand $\hat g|_K := \hat g|_{B_K^d}$. The best $s$-term approximation of a sequence $\hat g$ is defined as the restriction of $\hat g$ to its $s$-largest magnitude entries, denoted by $\hat g_s^{\mathrm{opt}}$. The same applies to vectors.

Given a univariate function $g^{\mathrm{1d}} : \mathbb{T} \to \mathbb{C}$, we define the vector $\mathbf{g}^{\mathrm{1d}} \in \mathbb{C}^M$ as the vector of $M$ equispaced samples of $g^{\mathrm{1d}}$ on $\mathbb{T}$, that is,
\[
\mathbf{g}^{\mathrm{1d}} := \left( g^{\mathrm{1d}}\!\left(\frac jM\right) \right)_{j\in[M]}.
\]
If not explicitly stated, the length of this sampled vector will be clear from context. The length-$M$ discrete Fourier transform (DFT) of a vector $\mathbf{g}^{\mathrm{1d}}$ is defined by
\[
\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_\omega := \frac1M \sum_{j\in[M]} g^{\mathrm{1d}}_j\, e^{-2\pi i\omega j/M} = \frac1M \sum_{j\in[M]} g^{\mathrm{1d}}\!\left(\frac jM\right) e^{-2\pi i\omega j/M} \quad\text{for all } \omega \in [M],
\]
where the matrix
\[
F_M := \left( \frac1M\, e^{-2\pi i\omega j/M} \right)_{\omega\in[M],\, j\in[M]}
\]
is the discrete Fourier transform matrix.
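Note that this convention carries the $1/M$ normalization on the forward transform; many software libraries (e.g., numpy) leave the forward transform unnormalized, so translating between the two conventions costs exactly a factor of $M$. A quick sanity check (an illustration, not from the text):

import numpy as np

rng = np.random.default_rng(0)
M = 16
g1d = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# F_M as defined above, with the 1/M factor built into the matrix.
omega, j = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
F_M = np.exp(-2j * np.pi * omega * j / M) / M

# numpy's forward FFT omits the 1/M factor, so the two agree after rescaling.
assert np.allclose(F_M @ g1d, np.fft.fft(g1d) / M)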
In the context of discrete Fourier transforms, without loss of generality, frequencies $\omega$ are always taken implicitly modulo the length of the DFT, e.g., $\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_{-1} = \left(F_M \mathbf{g}^{\mathrm{1d}}\right)_{M-1}$. The same applies to the columns of the DFT matrix.

Given a natural number $M \in \mathbb{N}$ (often prime) and a generating vector $\mathbf{z} \in \{1, \dots, M-1\}^d$, the associated rank-1 lattice is denoted
\[
\Lambda(\mathbf{z}, M) := \left\{ \frac jM\,\mathbf{z} \bmod 1 \;\middle|\; j\in[M] \right\}.
\]
For any $d$-variate function $g$, we define its restriction to a rank-1 lattice as $g^{\mathrm{1d}}(t) := g(t\mathbf{z})$. Notice then that by combining our previous conventions, given $g : \mathbb{T}^d \to \mathbb{C}$, $\mathbf{g}^{\mathrm{1d}}$ is the vector of samples of $g$ on the rank-1 lattice $\Lambda(\mathbf{z}, M)$. The modulus function for a rank-1 lattice, $m_{\mathbf{z},M} : I \to [M]$, is defined by $\mathbf{k} \mapsto \mathbf{k}\cdot\mathbf{z} \bmod M$.

1.3 Fourier preliminaries

In the sequel, we will make use of various well-known results on Fourier series and discrete Fourier transforms. We provide their statements adapted to our setting here.

Theorem 1.1. The space of all infinitely differentiable periodic functions $C^\infty$ is dense in $L^2$ and $H^1$. In particular, the space of trigonometric monomials $\{e^{2\pi i\mathbf{k}\cdot\circ} \in C^\infty \mid \mathbf{k}\in\mathbb{Z}^d\}$ is a basis for $C^\infty$, an orthonormal basis for $L^2$, and an orthogonal basis for $H^1$.

Proposition 1.1 (Plancherel's identity). If $u \in L^2$, then $\hat u \in \ell^2$ with $\|u\|_{L^2} = \|\hat u\|_{\ell^2}$. If additionally $v \in L^2$, then $\langle u, v \rangle_{L^2} = \langle \hat u, \hat v \rangle_{\ell^2}$.

Proof. Consider
\[
\langle u, v \rangle_{L^2} = \left\langle \sum_{\mathbf{k}\in\mathbb{Z}^d} \hat u_{\mathbf{k}}\, e^{2\pi i\mathbf{k}\cdot\circ},\; \sum_{\mathbf{l}\in\mathbb{Z}^d} \hat v_{\mathbf{l}}\, e^{2\pi i\mathbf{l}\cdot\circ} \right\rangle_{L^2}
= \sum_{\mathbf{k},\mathbf{l}\in\mathbb{Z}^d} \hat u_{\mathbf{k}}\, \overline{\hat v_{\mathbf{l}}} \left\langle e^{2\pi i\mathbf{k}\cdot\circ}, e^{2\pi i\mathbf{l}\cdot\circ} \right\rangle_{L^2}
= \sum_{\mathbf{k},\mathbf{l}\in\mathbb{Z}^d} \hat u_{\mathbf{k}}\, \overline{\hat v_{\mathbf{l}}}\, \delta_{\mathbf{k},\mathbf{l}}
= \sum_{\mathbf{k}\in\mathbb{Z}^d} \hat u_{\mathbf{k}}\, \overline{\hat v_{\mathbf{k}}}
= \langle \hat u, \hat v \rangle_{\ell^2}
\]
where we have used the orthonormality of the basis of trigonometric monomials in $L^2$. The norm result comes from taking $v = u$.

Lemma 1.2. Let $g^{\mathrm{1d}} \in C(\mathbb{T})$ be bandlimited, that is, $\operatorname{supp}(\hat g^{\mathrm{1d}}) \subset B_M$. Then $\hat g^{\mathrm{1d}} = F_M \mathbf{g}^{\mathrm{1d}}$.

Proof. Writing $g^{\mathrm{1d}}(t) = \sum_{\omega\in B_M} \hat g^{\mathrm{1d}}_\omega e^{2\pi i\omega t}$, for any $\omega \in B_M$ we calculate
\[
\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_\omega = \frac1M \sum_{j\in[M]} g^{\mathrm{1d}}\!\left(\frac jM\right) e^{-2\pi i\omega j/M}
= \frac1M \sum_{j\in[M]} \Biggl( \sum_{\omega'\in B_M} \hat g^{\mathrm{1d}}_{\omega'} e^{2\pi i\omega' j/M} \Biggr) e^{-2\pi i\omega j/M}
= \frac1M \sum_{\omega'\in B_M} \hat g^{\mathrm{1d}}_{\omega'} \sum_{j\in[M]} e^{2\pi i(\omega'-\omega)j/M}
= \sum_{\omega'\in B_M} \hat g^{\mathrm{1d}}_{\omega'}\, \delta_{0,(\omega'-\omega \bmod M)}
= \hat g^{\mathrm{1d}}_\omega,
\]
as desired.

Lemma 1.3. For any function $g^{\mathrm{1d}} : \mathbb{T}\to\mathbb{C}$ with Fourier series $g^{\mathrm{1d}}(t) = \sum_{\omega\in\mathbb{Z}} \hat g^{\mathrm{1d}}_\omega e^{2\pi i\omega t}$, define the aliased polynomial
\[
g^{\mathrm{1d}}_{\mathrm{alias}}(t) = \sum_{\omega\in B_M} \underbrace{\Biggl( \sum_{\omega'\equiv\omega \bmod M} \hat g^{\mathrm{1d}}_{\omega'} \Biggr)}_{=: \left(\hat g^{\mathrm{1d}}_{\mathrm{alias}}\right)_\omega} e^{2\pi i\omega t}.
\]
Then the equispaced samples coincide, giving $\mathbf{g}^{\mathrm{1d}} = \mathbf{g}^{\mathrm{1d}}_{\mathrm{alias}} \in \mathbb{C}^M$ and $\hat g^{\mathrm{1d}}_{\mathrm{alias}} = F_M \mathbf{g}^{\mathrm{1d}}$.

Proof. We group frequencies in the Fourier series of $g^{\mathrm{1d}}$ by their residues in $B_M$, giving
\[
\left(\mathbf{g}^{\mathrm{1d}}\right)_j = \sum_{\omega'\in\mathbb{Z}} \hat g^{\mathrm{1d}}_{\omega'} e^{2\pi i\omega' j/M}
= \sum_{\omega\in B_M} \sum_{n\in\mathbb{Z}} \hat g^{\mathrm{1d}}_{\omega+nM}\, e^{2\pi i(\omega+nM)j/M}
= \sum_{\omega\in B_M} \Biggl( \sum_{\omega'\equiv\omega \bmod M} \hat g^{\mathrm{1d}}_{\omega'} \Biggr) e^{2\pi i\omega j/M}
= \left(\mathbf{g}^{\mathrm{1d}}_{\mathrm{alias}}\right)_j \quad\text{for all } j\in[M].
\]
Now, since $\operatorname{supp}(\hat g^{\mathrm{1d}}_{\mathrm{alias}}) \subset B_M$, Lemma 1.2 implies $\hat g^{\mathrm{1d}}_{\mathrm{alias}} = F_M \mathbf{g}^{\mathrm{1d}}_{\mathrm{alias}} = F_M \mathbf{g}^{\mathrm{1d}}$.

CHAPTER 2 CONSTRUCTING MULTIPLE RANK-1 LATTICES DETERMINISTICALLY

As discussed in Section 1.1.1, this chapter focuses on computing Fourier series representations of high-dimensional functions using multiple rank-1 lattices. We begin with a short overview of the lattice construction and associated Fourier recovery methods in Section 2.1 and present the main result in Theorem 2.1. Section 2.2 builds up the proof of Theorem 2.1 with some additional algorithmic comments. Section 2.3 provides numerical tests of our multiple rank-1 lattice construction and Fourier recovery algorithm.

2.1 Overview of results

We provide the first known deterministic algorithm for constructing multiple rank-1 lattices [40] for any given index set $I \subset \mathbb{Z}^d$ with expansion $K_I := \max_{j\in[d]} \left( \max_{\mathbf{k}\in I} k_j - \min_{\mathbf{l}\in I} l_j \right) + 1$.
The proposed algorithm takes a given generating vector $\mathbf{z} \in [M]^d$ of a reconstructing rank-1 lattice for $I$ as input and uses it to deterministically generate $L$ smaller lattice sizes $P_0, \dots, P_{L-1}$. Rather than using the single set $\Lambda(\mathbf{z}, M)$ of $M$ equispaced sampling points along the lattice generating vector $\mathbf{z}$ as in Algorithm 1.1, we use the $L$ sampling sets $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$, which are each still equispaced points in the direction of $\mathbf{z}$ but are spaced out at different intervals. The frequencies in $I$ are then partitioned into $L$ groups, each associated with one of the smaller lattices. This partitioning is tracked by a function $\nu : I \to [L]$ with the defining property that
\[
\mathbf{k}\cdot\mathbf{z} \not\equiv \mathbf{h}\cdot\mathbf{z} \bmod P_{\nu(\mathbf{k})} \quad\text{for all } \mathbf{h} \neq \mathbf{k} \in I, \tag{2.1}
\]
that is, for each lattice size $P_\ell$, the frequencies in $\nu^{-1}(\ell)$ do not collide with any of the other frequencies in $I$ modulo $P_\ell$.

This is similar to the reconstructing property underlying the standard rank-1 lattice FFT approach of Algorithm 1.1. However, to effectively use these $L$ sampling sets, we must take one FFT along each smaller lattice and match only the frequencies associated to that lattice. Note though that in total, these smaller lattices require only¹ $O(|I| \log^2(K_I |I|))$ function evaluations as opposed to the $O(|I|^2)$ function evaluations generally required by a single rank-1 lattice approach (cf. Section 1.1.1). This process is outlined in Algorithm 2.1.

¹These bounds are simplifications of those in Lemma 2.2 and Theorem 2.2 under the mild assumptions that the dimension $d$ and the size of the original single rank-1 lattice $M$ are bounded polynomially by $\max\{|I|, K_I\}$. The latter assumption holds for single rank-1 lattices constructed by CBC methods, cf. Section 2.2.1.

Algorithm 2.1 Multiple rank-1 lattice FFT
Input: A function $g : \mathbb{T}^d \to \mathbb{C}$, a frequency set of interest $I \subset \mathbb{Z}^d$, multiple rank-1 lattices $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$, and a mapping $\nu : I \to [L]$ satisfying (2.1)
Output: Approximate Fourier coefficients $\hat{\mathbf{g}}^L \in \mathbb{C}^I$
1: for $\ell \in [L]$ do
2:  $\mathbf{g}^{\mathrm{1d},\ell} \leftarrow (g(j\mathbf{z}/P_\ell))_{j\in[P_\ell]}$
3:  $\hat{\mathbf{g}}^{\mathrm{1d},\ell} \leftarrow F_{P_\ell}\, \mathbf{g}^{\mathrm{1d},\ell}$
4: end for
5: for $\mathbf{k} \in I$ do
6:  $\hat g^L_{\mathbf{k}} \leftarrow \left(\hat{\mathbf{g}}^{\mathrm{1d},\nu(\mathbf{k})}\right)_{m_{\mathbf{z},P_{\nu(\mathbf{k})}}(\mathbf{k})}$  // recall $m_{\mathbf{z},P_{\nu(\mathbf{k})}}(\mathbf{k}) := \mathbf{k}\cdot\mathbf{z} \bmod P_{\nu(\mathbf{k})}$
7: end for

In detail, this chapter is devoted to proving the following main theorem concerning the proposed Fourier coefficient reconstruction algorithm on multiple rank-1 lattices.

Theorem 2.1. Let $I \subset \mathbb{Z}^d$ be some frequency set with expansion $K_I$. If $\Lambda(\mathbf{z}, M)$ is a reconstructing single rank-1 lattice for $I$, then one can deterministically construct multiple rank-1 lattices $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$ such that the Fourier coefficients $\{\hat g_{\mathbf{k}} \mid \mathbf{k}\in I\}$ of any trigonometric polynomial $g \in \Pi_I$ can be exactly reconstructed using only samples of $g$ on these lattices by Algorithm 2.1. Moreover, the total number of function evaluations on these lattice points is bounded by
\[
\sum_{\ell\in[L]} P_\ell \le \begin{cases} 2 & \text{for } |I| = 1, \\[4pt] 6\,|I| \log_2(dK_I M)\, \log\!\left( \dfrac{3\,|I| \log_2(dK_I M)}{\log_2(|I|)} \right) & \text{for } |I| \ge 2. \end{cases}
\]
The total computational complexity for the construction of these rank-1 lattices can be bounded by
\[
O\Bigl( |I|^2 \log(|I|) \log(dK_I M) + |I| \bigl( d + \log(dK_I M)\log(\log(dK_I M)) \bigr) \Bigr),
\]
and the total computational complexity for reconstructing the Fourier coefficients can be bounded by
\[
O\Bigl( |I| \Bigl( d + \log(dK_I M)\, \log^2\!\bigl( |I| \log_{|I|}(dK_I M) \bigr) \Bigr) \Bigr). \tag{2.2}
\]

Proof. The bounds on the total number of samples from the rank-1 lattices follow from Theorem 2.2, and the bound on the computational complexity for lattice construction follows from Section 2.2.1.
The exactness of the Fourier coefficient recovery is a result of Corollary 2.1. Since Algorithm 2.1 requires an FFT of length $P_\ell$ for each $\ell\in[L]$, Line 1 to Line 4 require $O\bigl( \log(\max_{\ell\in[L]} P_\ell) \sum_{\ell\in[L]} P_\ell \bigr)$ total complexity, where the maximum is bounded in Lemma 2.2 and the sum is bounded above. The remaining lines are $O(d|I|)$ (assuming the modulus functions have not been precomputed, in which case the complexity would reduce to $O(|I|)$). Simplifying these complexities results in (2.2).

Note that Algorithm 2.1 exactly reconstructs all Fourier coefficients of multivariate trigonometric polynomials with frequencies in a specific frequency set $I$ which is assumed to be given. One can also apply these rules in order to compute approximations of the Fourier coefficients of more general periodic functions. The resulting trigonometric polynomial can be used as an approximant. For specific approximation settings, the worst case error of this approximation is almost as good as the approximation one achieves when approximating the Fourier coefficients using the lattice rule that uses all samples of the reconstructing single rank-1 lattice from which we start the construction of our rules, cf. [44] for details. From that point of view, the strategy we present here even yields a general approach for significantly reducing the number of sampling values used while only slightly increasing approximation errors. We refer to Corollary 2.1 for more details and to the numerical example in Section 2.3.2 that yields Figure 2.5 illustrating this assertion.

2.2 The proof of Theorem 2.1

We denote the $q$th prime number by $p_q$, $q \in \mathbb{N}$. For technical reasons, we define $p_0 := 1$.

Lemma 2.1. Let $\mathcal{J} := \{k_1, \dots, k_J\} \subset \mathbb{N}$ with $k_1 < \dots < k_J$ and $\tilde M \ge k_J - k_1$. Also let $q \in \mathbb{N}$ be such that $p_{q-1} < J \le p_q$, and $Q := \max\bigl\{ 1, \bigl\lceil 2(J-1)\log_{p_q}(\tilde M) \bigr\rceil - 1 \bigr\}$. Then, there exist primes $P_0, \dots, P_{L-1} \in \mathcal{P}_J := \{p_{q+\ell}\}_{\ell\in[Q]}$ with $L \le \lfloor \log_2(J) \rfloor + 1$ such that
\[
\mathcal{J} = \bigcup_{\ell\in[L]} \{ k \in \mathcal{J} \mid k \not\equiv h \bmod P_\ell \text{ for all } h \in \mathcal{J}\setminus\{k\} \}
\]
holds.

Proof. We assume $J \ge 2$ and $\tilde M > p_q$; otherwise the statement is trivial. Without loss of generality, we can also assume $\mathcal{J} \subset [\tilde M]$ by considering the residues of each $k_j \in \mathcal{J}$ modulo $\tilde M$. Note that these residues are all unique due to $\tilde M > k_J - k_1$, and therefore, any modulo $P_\ell$ collision of the residues is equivalent to a collision of their original values.

Let $\mathcal{P}_J = \{p_{q+\ell}\}_{\ell\in[Q]}$ be the set of the $Q$ smallest prime numbers not smaller than $p_q$ and
\[
Y_{i,j} := \{ p \in \mathcal{P}_J \mid k_i \equiv k_j \bmod p \}
\]
a subset which collects all primes $p$ in $\mathcal{P}_J$ for which the frequencies $k_i \in \mathcal{J}$ and $k_j \in \mathcal{J}$ collide modulo $p$. Since $|k_i - k_j|$ is divisible by each prime $p$ in $Y_{i,j}$, the Chinese Remainder Theorem implies that $\prod_{p\in Y_{i,j}} p$ divides $|k_i - k_j| < \tilde M$. Therefore, we observe
\[
p_q^{|Y_{i,j}|} \le \prod_{p\in Y_{i,j}} p < \tilde M
\]
for all $i \neq j \in \{1, \dots, J\} =: S_0$, i.e., $k_i \neq k_j$, and this implies $|Y_{i,j}| \le -1 + \bigl\lceil \log_{p_q}(\tilde M) \bigr\rceil$. Moreover, we collect all primes for which $k_i$ collides with any other $k_j$ in the sets
\[
Y_i := \{ p \in \mathcal{P}_J \mid k_i \equiv k_j \bmod p \text{ for at least one } k_j \in \mathcal{J}\setminus\{k_i\} \} = \bigcup_{k_j\in\mathcal{J}\setminus\{k_i\}} Y_{i,j}.
\]
The cardinality of each $Y_i$ is bounded by
\[
|Y_i| \le \sum_{k_j\in\mathcal{J}\setminus\{k_i\}} |Y_{i,j}| \le (J-1)\Bigl( -1 + \bigl\lceil \log_{p_q}(\tilde M) \bigr\rceil \Bigr).
\]
Accordingly, we count
\[
|\mathcal{P}_J \setminus Y_i| = |\mathcal{P}_J| - |Y_i| \ge Q - (J-1)\Bigl( -1 + \bigl\lceil \log_{p_q}(\tilde M) \bigr\rceil \Bigr) \ge |\mathcal{P}_J|/2.
\]
We define the indicator variables
\[
Z_{i,q+\ell} := \begin{cases} 1 & p_{q+\ell} \in \mathcal{P}_J \setminus Y_i, \\ 0 & p_{q+\ell} \in Y_i, \end{cases}
\]
for all $k_i \in \mathcal{J}$ and $p_{q+\ell} \in \mathcal{P}_J$.
Summing up these indicator variables and using the estimates from above yields
\[
\sum_{i\in S_0} \sum_{\ell\in[Q]} Z_{i,q+\ell} = \sum_{i\in S_0} |\mathcal{P}_J \setminus Y_i| \ge |S_0|\,|\mathcal{P}_J|/2 = J\,|\mathcal{P}_J|/2. \tag{2.3}
\]
We will now show that $\sum_{i\in S_0} Z_{i,q+\ell} \ge J/2$ holds for at least one $p_{q+\ell} \in \mathcal{P}_J$ by contradiction. To this end, suppose that $\sum_{i\in S_0} Z_{i,q+\ell} < J/2$ for all $p_{q+\ell} \in \mathcal{P}_J$. Accordingly, we estimate
\[
\sum_{\ell\in[Q]} \sum_{i\in S_0} Z_{i,q+\ell} < |S_0|\,|\mathcal{P}_J|/2 = J\,|\mathcal{P}_J|/2,
\]
which is in contradiction to (2.3). Thus, there exists at least one prime $p_{q+\ell_0} \in \mathcal{P}_J$ such that
\[
\sum_{i\in S_0} Z_{i,q+\ell_0} = \Bigl| \underbrace{\{ k_i \in \mathcal{J} \mid k_i \not\equiv k_j \bmod p_{q+\ell_0} \text{ for all } k_j \in \mathcal{J}\setminus\{k_i\} \}}_{=: \mathcal{J}_1} \Bigr| \ge J/2.
\]
We set $P_0 := p_{q+\ell_0}$, and then apply the strategy iteratively. For $r \in \mathbb{N}$, we define
\[
S_r := \{ i \in S_{r-1} \mid \exists\, k_j \in \mathcal{J}\setminus\{k_i\} \text{ with } k_i \equiv k_j \bmod P_{r-1} \}
\]
and obtain $s_r := |S_r| \le 2^{-r} J$. Obviously, we have
\[
\mathcal{J}'_r := \{ k_i \mid i \in S_r \} = \mathcal{J} \setminus \bigcup_{t=1}^{r} \mathcal{J}_t, \tag{2.4}
\]
which are the frequencies that collide modulo each of $P_0, \dots, P_{r-1}$ with some other frequency in $\mathcal{J}$. We reconsider the variables defined above, but now we restrict the indices to $i \in S_r$. For instance, we observe $\{P_0, \dots, P_{r-1}\} \subset Y_i$ for all $i \in S_r$. We estimate
\[
\sum_{i\in S_r} \sum_{\ell\in[Q]} Z_{i,q+\ell} = \sum_{i\in S_r} |\mathcal{P}_J \setminus Y_i| \ge s_r\,|\mathcal{P}_J|/2.
\]
Using the same contradiction as above, we observe that for at least one $p_{q+\ell_r} \in \mathcal{P}_J \setminus \{P_0, \dots, P_{r-1}\}$ we have
\[
\sum_{i\in S_r} Z_{i,q+\ell_r} = \Bigl| \underbrace{\{ k_i \in \mathcal{J}'_r \mid k_i \not\equiv k_j \bmod p_{q+\ell_r} \text{ for all } k_j \in \mathcal{J}\setminus\{k_i\} \}}_{=: \mathcal{J}_{r+1}} \Bigr| \ge s_r/2.
\]
We now set $P_r := p_{q+\ell_r}$ and increase $r$ up to the point where $0 = |S_{r+1}| = s_{r+1}$ holds. In order to estimate the largest possible step number $r_{\max} \ge r$, we require that $s_{r_{\max}+1} \le 2^{-(r_{\max}+1)} J < 1$. This is satisfied in particular when $r_{\max} = \lfloor \log_2(J) \rfloor$, and thus we bound the total number of primes as $L \le r_{\max} + 1 \le \lfloor \log_2(J) \rfloor + 1$.

Remark 2.1. In the proof of Lemma 2.1 we determined that there exist primes in the candidate set $\mathcal{P}_J$ fulfilling the assertion. This set contains the first $Q := \max\bigl\{1, \bigl\lceil 2(J-1)\log_{p_q}(\tilde M) \bigr\rceil - 1\bigr\}$ prime numbers not smaller than $p_q$, $p_{q-1} < J \le p_q$, which only depends on $J$. However, from a theoretical point of view, any prime number $p$ larger than $\lceil J/2 \rceil$ may fulfill $|\mathcal{J}_1| \ge J/2$. Thus, one could also start the set of prime candidates at that point, which would result in a slightly increased cardinality of the candidate set, due to the fact that $Q$ depends on the logarithm to the base of the smallest prime in the candidate set. In spite of that increased cardinality, the maximal prime number in the candidate set, $p_{q+Q-1}$, which is estimated in the next lemma, may be decreased. Analyzing this approach leads to similar statements as in the previous and the following lemmas with slightly changed constants. In more detail, both constants $C_1$ and $C_2$ can be bounded by less than 3. However, the proof requires more effort, and we could not bound the resulting constants lower than those stated in Lemma 2.2.

Lemma 2.2. Assume $J, \tilde M \in \mathbb{N}$, $J \le \tilde M$, $p_q$ is the smallest prime not smaller than $J$, and let $Q := \max\bigl\{1, \bigl\lceil 2(J-1)\log_{p_q}(\tilde M) \bigr\rceil - 1\bigr\}$. Then, we estimate
\[
p_{q+Q-1} \le \begin{cases} 2 & \text{for } J = 1, \\ C_1\, J \log_J(\tilde M)\, \log\bigl( C_2\, J \log_J(\tilde M) \bigr) & \text{for } J \ge 2, \end{cases}
\]
with absolute constants $C_1 < 2.3\,(1 + e^{-3/2}) \le 2.832$ and $C_2 \le 2.3$.

Proof. For $J = 1$, we observe $p_{q+Q-1} = p_q = 2$. When $J \ge 2$ and $p_q \ge \tilde M$, we have $Q = 1$ and $p_q < 2J$ as a result of Bertrand's postulate. We then consider $J \ge 2$ and $p_q < \tilde M$, which yields
\[
q + Q - 1 = q - 1 + \bigl\lceil 2(J-1)\log_{p_q}(\tilde M) \bigr\rceil - 1 \le q - 1 + 2(J-1)\log_{p_q}(\tilde M).
\]
We distinguish two cases, where the final constants from the lemma are determined by the second case.
In the first, we restrict to the finite range where $2 \le J \le 8$ with $p_q < \tilde M < p_q^{\lceil 10/(J-1) \rceil}$, and numerically check that the upper bound
\[
p_{q+Q-1} < 2.831\, J \log_J(\tilde M)\, \log\bigl( 2.3\, J \log_J(\tilde M) \bigr)
\]
is satisfied. In the second case, where $2 \le J \le 8$ with $\tilde M \ge p_q^{\lceil 10/(J-1) \rceil}$ or $J \ge 9$, we have $q + Q - 1 \ge 20$. We then estimate this quantity from above as
\[
q + Q - 1 \le q - 1 + 2(J-1)\log_J(\tilde M) = \left( \frac{q-1}{J\log_J(\tilde M)} + 2\,\frac{J-1}{J} \right) J \log_J(\tilde M) \le \left( \frac{q-1}{J} + 2\,\frac{J-1}{J} \right) J \log_J(\tilde M) \le 2.3\, J \log_J(\tilde M),
\]
where one achieves the last estimate by computing $\frac{q-1}{J} + 2\frac{J-1}{J}$ for $2 \le J < 66$, and for $J \ge 66$ one obtains
\[
\frac{q-1}{J} + 2\,\frac{J-1}{J} \overset{\text{[60, Eq. (3.6)]}}{\le} \frac{1.25506}{\log J} + 2 \le \frac{1.25506}{\log 66} + 2 < 2.3.
\]
By the estimate $e^{-1/2} x \log(x) \le x^{1+e^{-3/2}}$, implying
\[
\log\bigl( e^{-1/2} x \log x \bigr) = \log(x) + \log\log(x) - \tfrac12 \le (1 + e^{-3/2}) \log x
\]
for $x > 1$, an application of [60, Eq. (3.11)] gives
\[
p_{q+Q-1} < (q+Q-1)\bigl( \log(q+Q-1) + \log\log(q+Q-1) - 1/2 \bigr) \le (1 + e^{-3/2})(q+Q-1)\log(q+Q-1) \le (1 + e^{-3/2})\, 2.3\, J \log_J(\tilde M)\, \log\bigl( 2.3\, J \log_J(\tilde M) \bigr),
\]
as desired.

Lemma 2.1 ensures the existence of a set of primes $P_0, \dots, P_{L-1}$ such that each single element of a given set of integers will not collide modulo at least one $P_\ell$ with any other of these integers. We can now use these primes to convert the large reconstructing single rank-1 lattice $\Lambda(\mathbf{z}, M)$ for some frequency set $I$ into smaller rank-1 lattices which, based on their ability to avoid collisions in the frequency domain, will provide a sampling set to exactly reconstruct the Fourier coefficients of all multivariate trigonometric polynomials in $\Pi_I$.

Theorem 2.2. Let $I \subset \mathbb{Z}^d$, $|I| \ge 2$, and a generating vector $\mathbf{z} \in [M]^d$ of $\Lambda(\mathbf{z}, M)$, a reconstructing rank-1 lattice for $I$, be given. We determine
\[
\tilde M := \max\{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\} - \min\{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\} + 1.
\]
Then there exists a set of prime numbers $P_0, \dots, P_{L-1}$, $L \le \lfloor \log_2(|I|) \rfloor + 1$, such that
\[
I = \bigcup_{\ell\in[L]} \{ \mathbf{k}\in I \mid \mathbf{k}\cdot\mathbf{z} \not\equiv \mathbf{h}\cdot\mathbf{z} \bmod P_\ell \text{ for all } \mathbf{h}\in I\setminus\{\mathbf{k}\} \}. \tag{2.5}
\]
Thus, the multiple rank-1 lattices $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$ can be used as input for the multiple rank-1 lattice Fourier transform Algorithm 2.1. The total number of sampling values in these multiple rank-1 lattices can be bounded by
\[
\sum_{\ell\in[L]} P_\ell \le 2\, C_1\, |I| \log_2(\tilde M)\, \log\bigl( C_2\, |I| \log_{|I|}(\tilde M) \bigr), \tag{2.6}
\]
with constants $C_1, C_2$ from Lemma 2.2.

Proof. Define the set of hashed multivariate frequencies in $I$ as $I^{\mathrm{1d}} := \{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\}$. Applying Lemma 2.1 with $\mathcal{J} = I^{\mathrm{1d}}$ and $\tilde M = \max I^{\mathrm{1d}} - \min I^{\mathrm{1d}} + 1$ as above, we find a set of prime numbers $\{P_0, \dots, P_{L-1}\}$ with $\max_{\ell\in[L]} P_\ell \le p_{q+Q-1}$ and respective rank-1 lattices $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$ such that (2.5) holds. We estimate
\[
\Bigl| \bigcup_{\ell\in[L]} \Lambda(\mathbf{z}, P_\ell) \Bigr| \le \sum_{\ell\in[L]} P_\ell \le (\lfloor \log_2(|I|) \rfloor + 1)\, p_{q+Q-1} \overset{\text{Lem. 2.2}}{\le} 2\, C_1\, |I| \log_2(\tilde M)\, \log\bigl( C_2\, |I| \log_{|I|}(\tilde M) \bigr).
\]

Remark 2.2. We consider two crucial estimates on $\tilde M$ in Theorem 2.2:
\[
\tilde M = 1 + \max_{\mathbf{k}\in I} \sum_{i\in[d]} k_i z_i + \max_{\mathbf{h}\in I} \sum_{i\in[d]} (-h_i z_i) \le 1 + \sum_{i\in[d]} z_i \Bigl( \max_{\mathbf{k}\in I} k_i - \min_{\mathbf{h}\in I} h_i \Bigr) \le dK_I M, \tag{2.7}
\]
\[
\tilde M = 1 + \max_{\mathbf{k}\in I} \sum_{i\in[d]} k_i z_i - \min_{\mathbf{h}\in I} \sum_{i\in[d]} h_i z_i \le 2\|\mathbf{z}\|_\infty \max_{\mathbf{k}\in I} \|\mathbf{k}\|_1 + 1 \le 2M \max_{\mathbf{k}\in I} \|\mathbf{k}\|_1, \tag{2.8}
\]
where $K_I$ is the expansion of $I$.

The estimate in (2.7) is a rough but universal upper bound on $\tilde M$ that depends on the dimension $d$. The inequality in (2.8) provides a dimension-independent upper bound on $\tilde M$ in cases where the frequency set $I$ is contained in an $\ell^1$-ball of a specific size $R$, i.e., $I \subset \{\mathbf{k}\in\mathbb{Z}^d \mid \|\mathbf{k}\|_1 \le R\}$, which yields $\tilde M \le 2MR$.
We refer to Section 2.2.1, where we present and analyze the computational costs and discuss the advantages of the latter estimate.

The Fourier coefficient reconstruction process in Algorithm 2.1 allows us to prove theoretical error guarantees for the approximation of functions that are not necessarily Fourier polynomials supported on a known $I$. In particular, we are able to provide $L^\infty$ and $L^2$ bounds for the approximation error in terms of the error in truncating a function's Fourier coefficients to a chosen $I$. The proof relies on the fact that the aliasing error in a DFT is comparable to the truncation error. See, e.g., [44, Lemma 3.1] for similar results and further details. For the following, recall the Wiener algebra $W := \{g \in L^1 \mid \|\hat g\|_{\ell^1} < \infty\}$.

Corollary 2.1. Let $g \in W$ and fix a frequency set $I \subset \mathbb{Z}^d$ with $|I| < \infty$. Use the multiple rank-1 lattices for $I$ in Theorem 2.2 with Algorithm 2.1 to produce $\hat{\mathbf{g}}^L$ and $g^L := \sum_{\mathbf{k}\in I} \hat g^L_{\mathbf{k}}\, e^{2\pi i\mathbf{k}\cdot\circ} \in \Pi_I$. Then $g^L$ approximates $g$ with the error bounds
\[
\| g - g^L \|_{L^\infty} \le (1 + L)\, \| \hat g - \hat g|_I \|_{\ell^1},
\]
\[
\| g - g^L \|_{L^2} \le \sqrt{1 + L}\, \| \hat g - \hat g|_I \|_{\ell^2}.
\]

Proof. By the triangle inequality,
\[
\| g - g^L \|_{L^\infty} \le \sum_{\mathbf{k}\in I} \bigl| \hat g_{\mathbf{k}} - \hat g^L_{\mathbf{k}} \bigr| + \sum_{\mathbf{k}\in\mathbb{Z}^d\setminus I} |\hat g_{\mathbf{k}}| = \bigl\| \hat g|_I - \hat{\mathbf{g}}^L \bigr\|_{\ell^1} + \| \hat g - \hat g|_I \|_{\ell^1}.
\]
Now, note that by partitioning the frequencies $\mathbf{k}\in I$ by their values of $\nu(\mathbf{k})$, for $\hat{\mathbf{g}}^{\mathrm{1d},\ell}$ as in Line 3 of Algorithm 2.1, we obtain
\[
\bigl\| \hat g|_I - \hat{\mathbf{g}}^L \bigr\|_{\ell^1} = \sum_{\ell\in[L]} \sum_{\mathbf{k}\in\nu^{-1}(\ell)} \Bigl| \hat g_{\mathbf{k}} - \bigl( \hat{\mathbf{g}}^{\mathrm{1d},\ell} \bigr)_{\mathbf{k}\cdot\mathbf{z} \bmod P_\ell} \Bigr| = \sum_{\ell\in[L]} \sum_{\mathbf{k}\in\nu^{-1}(\ell)} \Bigl| \hat g_{\mathbf{k}} - \sum_{\omega\equiv\mathbf{k}\cdot\mathbf{z} \bmod P_\ell} \hat g^{\mathrm{1d}}_\omega \Bigr|,
\]
where the final equality follows from Lemma 1.3. Since the multidimensional frequencies $\mathbf{k}\in\mathbb{Z}^d$ of $\hat g$ map to the frequencies of $\hat g^{\mathrm{1d}}$ by $\mathbf{k}\mapsto\mathbf{k}\cdot\mathbf{z}$, and for any $\mathbf{k}\in\nu^{-1}(\ell)$ there is no $\mathbf{h}\in I\setminus\{\mathbf{k}\}$ such that $\mathbf{h}\cdot\mathbf{z} \equiv \mathbf{k}\cdot\mathbf{z} \bmod P_\ell$, we know that
\[
\Bigl| \hat g_{\mathbf{k}} - \sum_{\omega\equiv\mathbf{k}\cdot\mathbf{z} \bmod P_\ell} \hat g^{\mathrm{1d}}_\omega \Bigr| \le \sum_{\substack{\mathbf{h}\in\mathbb{Z}^d\setminus I \\ \mathbf{h}\cdot\mathbf{z}\equiv\mathbf{k}\cdot\mathbf{z} \bmod P_\ell}} |\hat g_{\mathbf{h}}|.
\]
Thus
\[
\bigl\| \hat g|_I - \hat{\mathbf{g}}^L \bigr\|_{\ell^1} \le \sum_{\ell\in[L]} \sum_{\mathbf{k}\in\nu^{-1}(\ell)} \sum_{\substack{\mathbf{h}\in\mathbb{Z}^d\setminus I \\ \mathbf{h}\cdot\mathbf{z}\equiv\mathbf{k}\cdot\mathbf{z} \bmod P_\ell}} |\hat g_{\mathbf{h}}| \le \sum_{\ell\in[L]} \sum_{\mathbf{h}\in\mathbb{Z}^d\setminus I} |\hat g_{\mathbf{h}}| = L\, \| \hat g - \hat g|_I \|_{\ell^1},
\]
finishing the proof of the $L^\infty/\ell^1$ result. The $L^2/\ell^2$ result follows by replacing the $L^\infty$ norm by the $L^2$ norm, taking squares of all terms, and taking a final square root.

As considered in [40, Subsection 4.2] for randomized lattice constructions, we can take an alternative approach to Theorem 2.2 which requires fewer samples at the cost of having only theoretical reconstruction guarantees for trigonometric polynomials (i.e., the results concerning approximation discussed in Corollary 2.1 do not apply in a straightforward manner). Rather than requiring that at each step of the lattice construction a prime $p$ is chosen so that a set of frequencies can be obtained which do not collide with any other frequency in the original frequency set modulo $p$, we instead recursively reduce the size of the set over which the resulting rank-1 lattice has the reconstruction property, without concern for other frequencies.

Theorem 2.3. Let $I \subset \mathbb{Z}^d$, $|I| \ge 1$, $d \ge 2$, and $\tilde M := \max\{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\} - \min\{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\} + 1$. For $\Lambda(\mathbf{z}, M)$ a reconstructing single rank-1 lattice for $I$, there exist primes
$P_0, \dots, P_{L-1}$, $L \le \lfloor \log_2(|I|) \rfloor + 1$, with
\[
\sum_{\ell\in[L]} P_\ell \le \begin{cases} 2 & \text{for } |I| = 1, \\ 8\,|I| \log_2(\tilde M)\, \log\bigl( 2 \log_2(\tilde M) \bigr) & \text{for } |I| \ge 2, \end{cases} \tag{2.9}
\]
such that for every $g \in \Pi_I$, the formula
\[
\hat g_{\mathbf{k}} = \frac{1}{P_{\nu(\mathbf{k})}} \sum_{j=0}^{P_{\nu(\mathbf{k})}-1} g_{\nu(\mathbf{k})-1}\!\left( \frac{j\mathbf{z} \bmod P_{\nu(\mathbf{k})}}{P_{\nu(\mathbf{k})}} \right) e^{-\frac{2\pi i\, j\, \mathbf{k}\cdot\mathbf{z}}{P_{\nu(\mathbf{k})}}} \quad\text{with}\quad g_{\nu(\mathbf{k})-1}(\mathbf{x}) := g(\mathbf{x}) - \sum_{\mathbf{h}\in\{\mathbf{l} \mid \nu(\mathbf{l}) < \nu(\mathbf{k})\}} \hat g_{\mathbf{h}}\, e^{2\pi i\mathbf{h}\cdot\mathbf{x}} \tag{2.10}
\]
holds, where $\nu : I \to [L]$ maps frequencies to the lattice used to reconstruct the corresponding Fourier coefficient; i.e., we can uniquely reconstruct each multivariate trigonometric polynomial with frequencies in $I$ using samples along the rank-1 lattices $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$.

Proof. The proof is simply a recursive application of part of the previously discussed approach, so we only provide a sketch.

We use only the first prime $P_0$ from Lemma 2.1 to determine a set of frequencies $I_0 \subset I$ such that $\Lambda(\mathbf{z}, P_0)$ is a reconstructing single rank-1 lattice for $I_0$ with $|I_0| \ge |I|/2$. Performing the reconstruction process in Theorem 2.2 for only frequencies in $I_0$ using samples from $\Lambda(\mathbf{z}, P_0)$ recovers the corresponding Fourier coefficients exactly. This then defines the correspondence $\nu(\mathbf{k}) = 0$ for all $\mathbf{k}\in I_0$. Subtracting off the recovered polynomial terms and recursively repeating the process with the frequency set $I\setminus I_0$ gives (2.10).

The upper bound on the number of samples is a result of Lemma 2.2, noting that at each step, the cardinality of the frequency set is reduced by half. Splitting the dependence on $|I|$ and $\tilde M$ in the second logarithm using the inequality $\log(xy) \le 2(\log x)(\log y)$ for $x, y \ge e$ and estimating the resulting geometric series gives (2.9).

2.2.1 Analysis of lattice construction

The approach analyzed in Theorem 2.2 provides a constructive, deterministic method for building reconstructing multiple rank-1 lattices from reconstructing single rank-1 lattices. Algorithm 2.2 summarizes the suggested approach in detail.

Algorithm 2.2 Deterministic construction of multiple rank-1 lattices suitable for reconstruction and approximation, according to Theorem 2.2 and Lemma 2.1
Input: frequency set $I \subset \mathbb{Z}^d$, generating vector $\mathbf{z} \in \mathbb{N}_0^d$ of a reconstructing single rank-1 lattice for $I$
Output: number of lattices $L$, lattice sizes $P_0, \dots, P_{L-1}$, and mapping $\nu : I \to [L]$ recording which coefficients are computed by which lattice
1: $\mathcal{J}'_0 \leftarrow \{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\}$
2: Determine $q \in \mathbb{N}$ s.t. $p_{q-1} < |I| \le p_q$  // recall $p_\ell$ is the $\ell$th prime
3: $Q \leftarrow \max\bigl\{1, \bigl\lceil 2(|I|-1)\log_{p_q}(\tilde M) \bigr\rceil - 1\bigr\}$  // recall $\tilde M := \max\mathcal{J}'_0 - \min\mathcal{J}'_0 + 1$
4: $\mathcal{P}_{|I|} \leftarrow \{p_{q+\ell}\}_{\ell\in[Q]}$
5: Initialize $r \leftarrow 0$ and $\nu : I \to \mathbb{N}$ with $\nu(\mathbf{k}) = 0$ for all $\mathbf{k}\in I$
6: repeat
7:  for all $\ell \in [Q]$ do
8:   $\mathcal{J}'_{r+1} \leftarrow \emptyset$
9:   for all $\mathbf{k}\cdot\mathbf{z} \in \mathcal{J}'_r$ do
10:   $\nu(\mathbf{k}) \leftarrow r$
11:   if $\mathbf{k}\cdot\mathbf{z} \equiv h' \bmod p_{q+\ell}$ for any $h' \in \mathcal{J}'_0 \setminus \{\mathbf{k}\cdot\mathbf{z}\}$ then
12:    $\mathcal{J}'_{r+1} \leftarrow \mathcal{J}'_{r+1} \cup \{\mathbf{k}\cdot\mathbf{z}\}$
13:   end if
14:  end for
15:  if $|\mathcal{J}'_{r+1}| \le |\mathcal{J}'_r|/2$ then
16:   $P_r \leftarrow p_{q+\ell}$
17:   break
18:  end if
19: end for
20: $r \leftarrow r + 1$
21: until $\mathcal{J}'_r = \emptyset$
22: $L \leftarrow r$

In the following, we analyze the runtime complexity. We start by analyzing Line 1, which is $O(d|I|)$. The arithmetic complexity of Lines 2 and 4 is dominated by determining the set of primes $\mathcal{P}_{|I|}$, which can be done in linear time with respect to $p_{q+Q-1} \le C_1 |I| \log_{|I|}(\tilde M) \log\bigl( C_2 |I| \log_{|I|}(\tilde M) \bigr)$ as estimated in Lemma 2.2, therefore requiring $O\bigl( |I| \log\tilde M \log\log\tilde M \bigr)$ arithmetic operations.

The goal of the loop from Lines 6 to 21 is to separate the frequencies in $I$ into $L$ groups. Each of these $\ell \in [L]$ groups is assigned a prime $P_\ell$ so that its frequencies do not collide with any others in $I$ modulo $P_\ell$. In the worst case, there will be at most $L = O(\log(|I|))$ (cf.
The first inner loop requires, at most, a scan through each of the $Q = O(|I| \log_{p_q}(\tilde M))$ primes in $P_{|I|}$. The body of this inner loop can be accomplished in $O(|I| \log(|I|))$ time. Indeed, it requires the computation of $k \bmod p_{q+\ell}$ for all $k \in J'_0$ (where we make sure to track the association between $k \bmod p_{q+\ell}$ and the original frequency $\mathbf k \in I$ with $k = \mathbf k \cdot \mathbf z$), a sort of these residues, and a linear scan to determine duplicates of the residues of elements originally in $J'_r$ (where we can rely on our function $\nu$ and the aforementioned association between $k \bmod p_{q+\ell}$ and $\mathbf k$). This is dominated by the sort complexity, $O(|I| \log(|I|))$. Thus, the total complexity for Lines 6 to 21 is $O\bigl(|I|^2 \log(|I|) \log \tilde M\bigr)$ (noting that $\log(|I|) < \log p_q$). Altogether, we observe a runtime complexity of
\[
O\Bigl(|I|^2 \log(|I|) \log \tilde M + |I|\bigl(d + \log \tilde M \log\log \tilde M\bigr)\Bigr).
\]

In the following, we comment on practical issues of Algorithm 2.2. Line 1 might suffer from integer overflow, which can be avoided by using higher precision integer representations. An alternative is to skip this precomputation and instead compute the inner products modulo $p_{q+\ell}$ on the fly in Line 11, which increases the runtime complexity by a factor of $d$ in the first summand. Note also that one does not necessarily need to compute $\tilde M$ in advance. For the loop over primes starting in Line 7, one might simply start with the prime $p_q$ and increase the prime via some "nextprime" function, which would increase the second summand in the runtime complexity.

Finally, we discuss the range of the numbers $\tilde M$ as well as the influence of the original single rank-1 lattice on the estimates herein. In general, there are two different suitable approaches for finding a single reconstructing rank-1 lattice for a given frequency index set $I$. A simple approach is to pick a rank-1 lattice $\Lambda(\mathbf z, M)$ that provides the reconstruction property from a simple number-theoretic point of view. For instance, one can choose generating vectors $\mathbf z$ and lattice sizes $M$ that fulfill
\[
z_0 \in \mathbb{N}, \qquad
z_i \ge \Bigl(1 + \max_{\mathbf k \in I} k_{i-1} - \min_{\mathbf h \in I} h_{i-1}\Bigr) z_{i-1}, \quad i = 1, \dots, d-1, \qquad
M \ge \Bigl(1 + \max_{\mathbf k \in I} k_{d-1} - \min_{\mathbf h \in I} h_{d-1}\Bigr) z_{d-1}.
\]
Clearly, even for extremely sparse frequency sets and moderate expansions of $I$, this approach leads to exponentially increasing components $z_{d-1} \ge 2^{d-1}$ and lattice sizes $M \ge 2^d$. As in Remark 2.2, it will therefore lead to exponential growth in $\tilde M$ and thus a linear dependence on the dimension $d$ in all $\log \tilde M$ terms. From a theoretical point of view, this turns out to be disadvantageous for higher dimensions $d$, since the runtime complexity of Algorithm 2.2 as well as the estimates of the total number of sampling values in Theorems 2.2 and 2.3 are affected by this factor.

A more costly way of determining reconstructing single rank-1 lattices is a suitable CBC construction as suggested in [46], which requires a computational complexity in $O(d\,|I|^2)$. The additional computational effort pays off when applying the theoretical bounds on the resulting lattice size $M$. In more detail, the CBC approach offers reconstructing rank-1 lattices with prime lattice sizes $M$ bounded from above by $M \le \max(|I|^2, 2(K_I + 1))$, cf. [39, 46]. As a consequence, the estimates in Remark 2.2 give $\tilde M \le C d K_I^2 |I|^2$, or even $\tilde M \le C' R K_I |I|^2$ for $I$ a subset of an $\ell^1$-ball of radius $R$. Thus, the estimates (2.6) on the number of sampling values required for unique reconstruction of multivariate trigonometric polynomials in $\Pi_I$ are respectively only logarithmically dependent on $d$ or even independent of $d$.
2.3 Numerics

In this section, we investigate the statements of Theorems 2.2 and 2.3 numerically². We consider different types of frequency sets $I$. In particular, we use symmetric hyperbolic cross type frequency sets
\[
I = H^d_{R,\mathrm{even}} := \Bigl\{\mathbf k := (k_0, \dots, k_{d-1})^\top \in (2\mathbb{Z})^d \,\Big|\, \prod_{t \in [d]} \max(1, |k_t|) \le R\Bigr\}
\tag{2.11}
\]
with expansion parameter $R \in \mathbb{N}$, which results in $K_I \le 2R$, in up to $d = 9$ spatial dimensions. These frequency sets $H^d_{R,\mathrm{even}}$ have the property that only even indices occur in each frequency component. This matches the behavior of the Fourier support of the test function $G_3^d$ introduced below in Section 2.3.2, which we approximate using samples on multiple rank-1 lattices; see also [50, 44] and [65, Section 2.3.5]. In addition, we use random frequency sets $I \subset ([-R, R] \cap \mathbb{Z})^d$, which yield $K_I \le 2R$, and we consider these in up to $d = 10\,000$ spatial dimensions.

²All code is available at https://www.math.msu.edu/~markiwen/Code.html
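For concreteness, the even hyperbolic cross in (2.11) can be enumerated directly. The brute-force sketch below restates the definition rather than reproducing the optimized construction used in our experiments, and is only feasible for small $d$ and $R$ since it scans the full cube of even indices.

```python
import numpy as np
from itertools import product

def hyperbolic_cross_even(R, d):
    """Enumerate H^d_{R,even} from (2.11) by brute force: all even frequency
    vectors k with prod_t max(1, |k_t|) <= R. Only for small d and R."""
    one_d = range(-2 * (R // 2), 2 * (R // 2) + 1, 2)   # even integers in [-R, R]
    cross = [k for k in product(one_d, repeat=d)
             if np.prod([max(1, abs(kt)) for kt in k]) <= R]
    return np.array(cross, dtype=int)

# e.g. hyperbolic_cross_even(8, 2) contains (0, 0), (2, 0), (-2, 4), ...
```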
2.3.1 Deterministic multiple rank-1 lattices generated by Algorithm 2.2 suitable for reconstruction and approximation

2.3.1.1 Resulting numbers of samples and oversampling factors

To begin, we determine the overall number of samples in the multiple rank-1 lattices output by Algorithm 2.2. Up to an additive term of $1 - L$, this corresponds to $\sum_{\ell \in [L]} P_\ell$ in Theorem 2.2, since the node $\mathbf 0$ (the origin) is contained in each of the resulting rank-1 lattices $\Lambda(\mathbf z, P_\ell)$.

We start with symmetric hyperbolic cross sets $I = H^d_{R,\mathrm{even}}$ as defined in (2.11) and consider three different types of reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, as input for Algorithm 2.2.

First, we use the rank-1 lattices from [50, Table 6.1], which were generated by the CBC method [38, Algorithm 3.7], as input for Algorithm 2.2. We plot the results in Figure 2.1a for spatial dimensions $d \in \{2, 3, \dots, 9\}$ and various refinements $R \in \mathbb{N}$ of $I = H^d_{R,\mathrm{even}}$. The observed numbers of samples appear to grow slightly worse than linearly with respect to the cardinality of the frequency set $I$. The corresponding theoretical upper bounds according to Theorem 2.2, using (2.8) for $\tilde M$, are also shown as filled markers with dashed lines for spatial dimensions $d \in \{2, 9\}$ in Figure 2.1a. The plotted upper bounds are distinctly larger, and their slopes appear slightly steeper than those observed in the numerical tests.

Second, we consider single reconstructing rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, with
\[
\mathbf z := \bigl(1,\ K_I + 1,\ (K_I + 1)^2,\ \dots,\ (K_I + 1)^{d-1}\bigr)^\top \quad \text{and} \quad M := (K_I + 1)^d = (2R + 1)^d,
\tag{2.12}
\]
where $K_I = 2R$ in our case, and we show the results in Figure 2.1b. We observe that the obtained numbers of samples are similar to the ones in Figure 2.1a, and the theoretical upper bounds according to Theorem 2.2, using (2.8) for $\tilde M$, are slightly higher due to the components of the generating vector $\mathbf z$ being larger.

[Figure 2.1: Overall #samples $= 1 - L + \sum_{\ell \in [L]} P_\ell$ for symmetric hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$, $d \in \{2, \dots, 9\}$; (a) $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7], (b) $\mathbf z$ and $M$ according to (2.12). Filled markers with dashed lines represent theoretical upper bounds from Theorem 2.2 for $d \in \{2, 9\}$ calculated using (2.8).]

Third, we apply Algorithm 2.2 to the reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, as considered in [37, Section 6]. In detail, we choose
\[
M := \prod_{t \in [d]} q_t \quad \text{and} \quad \mathbf z := (M/q_0,\ M/q_1,\ \dots,\ M/q_{d-1})^\top,
\tag{2.13}
\]
where $q_0 := d K_I + d + 1$ and $q_{t+1} := \min\{p \in \mathbb{N} \mid p > q_t \text{ and } p \text{ prime}\}$. Here, the observed numerical results do not differ recognizably from Figure 2.1b, and we therefore omit these plots. We would like to point out that the theoretical upper bounds for this kind of reconstructing single rank-1 lattice are slightly worse than those plotted in Figure 2.1b, cf. Remark 2.2.

Note that when running Algorithm 2.2 using single rank-1 lattices $\Lambda(\mathbf z, M)$ of type (2.12) and (2.13) in practice, one may need to deal with limited numeric precision in the computer arithmetic. For instance, for higher spatial dimensions, some components $z_t$ of the generating vector $\mathbf z$ may become larger than 64-bit integers. This means that the sets $J'_r$ may have to be computed carefully and repeatedly modulo each considered prime $p \in P_{|I|}$ when searching for the primes $P_0, \dots, P_{L-1}$ in Lines 6 to 21 of Algorithm 2.2.

In order to have a closer look at the number of samples, we visualize the oversampling factor #samples$/|I| = (1 - L + \sum_{\ell \in [L]} P_\ell)/|I|$ in Figure 2.2. For the considered test cases and the three different types of lattices, we observe that the oversampling factors stay below $1.7 \log |I| + 3$ for $|I| > 1$. This is distinctly smaller than the theoretical upper bounds in Theorem 2.2 suggest, which have a constant of $\approx 5.7$ and additional logarithmic factors depending on $\tilde M$. For instance, in Figure 2.2a, for $I = H^9_{256,\mathrm{even}}$ (cardinality $|I| = 1\,264\,513$ and #samples $= 27\,025\,383$), the oversampling factor is $\approx 21.37$, whereas the corresponding upper bound for the oversampling factor is $\approx 3\,069$ according to Theorem 2.2 using (2.8) for $\tilde M$. The plots for reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, according to (2.13) look similar to the ones according to (2.12), where the latter are shown in Figure 2.2b. Moreover, we only observe a relatively small difference compared to Figure 2.2a.

[Figure 2.2: Oversampling factors for deterministic reconstructing multiple rank-1 lattices for symmetric hyperbolic cross index sets $H^d_{R,\mathrm{even}}$, $d \in \{2, \dots, 9\}$, with reference curve $1.7\log|I| + 3$; (a) $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7], (b) $\mathbf z$ and $M$ according to (2.12).]

Next, we change the setting and use frequency sets $I$ drawn uniformly at random from the cubes $[-R, R]^d \cap \mathbb{Z}^d$. We generate reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, using [38, Algorithm 3.7]. Then, we apply Algorithm 2.2 in order to deterministically generate reconstructing multiple rank-1 lattices. We repeat the test 10 times for each setting with newly randomly chosen frequency sets $I$ and determine the maximum number of samples over the 10 repetitions.
For frequency set sizes $|I| \in \{10, 100, 1\,000, 10\,000\}$ in $d \in \{2, 3, 4, 6, 10, 100, 1\,000, 10\,000\}$ spatial dimensions, and $|I| = 100\,000$ for only some of the aforementioned spatial dimensions $d$, we visualize the resulting oversampling factors in Figure 2.3 for expansion parameter $R = 64$ ($K_I \le 128$). Using different reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, as in Figure 2.2, changes the oversampling factors only slightly, and the oversampling factors remain well below $1.7 \log |I| + 3$; compare Figures 2.3a and 2.3b. The plots for reconstructing single rank-1 lattices $\Lambda(\mathbf z, M)$ according to (2.13) are omitted since they look very similar to Figure 2.3b. As mentioned before, we have to take care of possible issues with numeric precision when running Algorithm 2.2 on reconstructing single rank-1 lattices of type (2.12) and (2.13) in practice.

[Figure 2.3: Oversampling factors for deterministic reconstructing multiple rank-1 lattices for random frequency sets $I \subset \{-64, -63, \dots, 64\}^d$, $d$ up to $10\,000$, with reference curve $1.7\log|I| + 3$; (a) $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7], (b) $\mathbf z$ and $M$ according to (2.12).]

2.3.1.2 Improvement of numbers of samples compared to single rank-1 lattices constructed component-by-component

For the deterministic reconstructing multiple rank-1 lattices generated by Algorithm 2.2 in the previous subsection, one aspect of particular interest is the total number of nodes compared to the reconstructing single rank-1 lattices given as input to the algorithm. We investigate this in more detail for the case of lattices generated component-by-component by [38, Algorithm 3.7]. These reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, are specifically tailored to the structure of the corresponding frequency sets $I$. We do not consider the case when Algorithm 2.2 is applied to single rank-1 lattices of type (2.12) or (2.13), as these are typically extremely large compared to the cardinality $|I|$ of the frequency sets $I$.

First, we start with symmetric hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$ and reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, generated by [38, Algorithm 3.7]. In Figure 2.4a, the obtained #samples from Figure 2.1a is divided by the size $M$ of the single rank-1 lattice. We observe that for smaller expansion parameters $R$, and consequently smaller cardinalities $|I|$, the generated multiple rank-1 lattices still consist of more nodes than the corresponding single rank-1 lattices, and therefore the ratio is larger than one. One main reason for this behavior is that for the component-by-component constructed single rank-1 lattices, the number of nodes is initially much smaller than the worst-case upper bounds of almost $O(|I|^2)$ suggest, cf. [38, Section 3.8.2] for a detailed discussion. Once a certain expansion $K_I$ and cardinality $|I|$ have been reached, the multiple rank-1 lattices outperform the single rank-1 lattices, yielding ratios around 0.1 in Figure 2.4a, i.e., Algorithm 2.2 reduces the number of sampling nodes by 9/10.

Second, we consider randomly generated frequency sets as in Figure 2.3a. In Figure 2.4b, we visualize the ratios of the number of nodes of the deterministic reconstructing multiple rank-1 lattices generated by Algorithm 2.2 over the lattice sizes $M$ of the reconstructing single rank-1 lattices generated by [38, Algorithm 3.7].
For the spatial dimensions $d \ge 4$ considered in Figure 2.3a, the ratios decrease rapidly with increasing cardinality $|I|$, and we do not observe any noticeable dependence on the spatial dimension $d$. Note that in the case $d = 2$, the ratios are close to or above one, since the cube $\{-64, -63, \dots, 64\}^2$ of possible frequencies only has cardinality $16\,641$ and the single rank-1 lattices already have small oversampling factors $M/|I| < 16$. Similarly, in the case $d = 3$ with cardinality $|I| = 10^5$, the frequency set $I$ fills approximately $1/20$ of the cube $\{-64, -63, \dots, 64\}^3$, and again the low oversampling factors $M/|I| < 22$ of the single rank-1 lattices are hard to beat for multiple rank-1 lattices.

[Figure 2.4: Ratio of #samples for deterministic reconstructing multiple rank-1 lattices suitable for approximation over the lattice size $M$ of the reconstructing single rank-1 lattice $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7]; (a) symmetric hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$, $d \in \{2, \dots, 9\}$, (b) random frequency sets $I \subset \{-64, -63, \dots, 64\}^d$, $d$ up to $10\,000$.]

2.3.2 Comparison of reconstructing multiple and single rank-1 lattices for function approximation

As mentioned in Corollary 2.1, we can use Algorithm 2.1 to compute approximations of functions from samples along multiple rank-1 lattices. We consider the tensor-product test functions $G_3^d : \mathbb{T}^d \to \mathbb{C}$ from [50], $G_3^d(\mathbf x) := \prod_{j \in [d]} g_3(x_j)$, where the one-dimensional function $g_3 : \mathbb{T} \to \mathbb{C}$ is defined by
\[
g_3(x) := \sqrt[4]{\frac{3\pi}{207\pi - 256}}\, \Bigl(2 + \operatorname{sgn}\bigl((x \bmod 1) - 1/2\bigr)\, \sin^3(2\pi x)\Bigr)
\]
and $\|G_3^d\|_{L^2(\mathbb{T}^d)} = 1$. The function $G_3^d$ lies in a so-called Sobolev space of dominating mixed smoothness with smoothness almost 3.5, so that its Fourier coefficients $\hat G_3^d$ decay quickly with respect to hyperbolic cross structures. In addition, $(\hat G_3^d)_{\mathbf k} = 0$ if at least one component of $\mathbf k$ is odd. Therefore, we approximate the function $G_3^d$ by multivariate trigonometric polynomials $G_3^{d,L} := \sum_{\mathbf k \in I} (\hat G_3^{d,L})_{\mathbf k}\, e^{2\pi i \mathbf k \cdot \circ}$ with Fourier coefficients supported on modified hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$ as defined in (2.11).

We compute the Fourier coefficients $\hat G_3^{d,L}$ based on samples of $G_3^d$ and determine the relative $L^2(\mathbb{T}^d)$ sampling errors $\|G_3^d - G_3^{d,L}\|_{L^2(\mathbb{T}^d)} / \|G_3^d\|_{L^2(\mathbb{T}^d)}$, where
\[
\|G_3^d - G_3^{d,L}\|_{L^2(\mathbb{T}^d)} = \sqrt{\|G_3^d\|^2_{L^2(\mathbb{T}^d)} - \sum_{\mathbf k \in I} \bigl|(\hat G_3^d)_{\mathbf k}\bigr|^2 + \sum_{\mathbf k \in I} \bigl|(\hat G_3^d)_{\mathbf k} - (\hat G_3^{d,L})_{\mathbf k}\bigr|^2}.
\]
We compare the numerical results from [44, Figure 4.3b], where reconstructing single rank-1 lattices and reconstructing random multiple rank-1 lattices were used, with new results using deterministic multiple rank-1 lattices returned by Algorithm 2.2. As input for Algorithm 2.2, we use reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, with generating vectors chosen according to (2.12).

Instead of computing the Fourier coefficients $\hat G_3^{d,L}$ of the multivariate trigonometric polynomial $G_3^{d,L}$ by Algorithm 2.1, we use [43, Algorithm 2], which averages over all single rank-1 lattices $\Lambda(\mathbf z, P_\ell)$ that are able to reconstruct a Fourier coefficient $\hat g_{\mathbf k}$ of any multivariate trigonometric polynomial $g$ for a given frequency $\mathbf k \in I$, whereas Algorithm 2.1 uses only the single rank-1 lattice $\Lambda(\mathbf z, P_{\nu(\mathbf k)})$. Note that both computation methods are based on the same samples of $G_3^d$ along the obtained deterministic multiple rank-1 lattices.
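For reference, a short sketch of the test function follows. Note that the closed form of $g_3$ is a reconstruction of a garbled source formula (a sign-modulated $\sin^3$ term with the fourth-root normalization displayed above), so the constant should be treated as an assumption rather than verified against the implementation in [50].

```python
import numpy as np

def g3(x):
    """One-dimensional factor of the tensor-product test function G_3^d,
    following the reconstructed formula above (normalized so ||G_3^d||_{L^2} = 1)."""
    c = (3 * np.pi / (207 * np.pi - 256)) ** 0.25
    x = np.mod(x, 1.0)
    return c * (2 + np.sign(x - 0.5) * np.sin(2 * np.pi * x) ** 3)

def G3(X):
    # tensor product over the d columns of a sample matrix X in [0, 1)^(n x d)
    return np.prod(g3(X), axis=1)
```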
The resulting relative $L^2(\mathbb{T}^d)$ sampling errors are visualized for spatial dimensions $d \in \{2, 3, 5, 8\}$ in Figure 2.5 as solid lines with filled markers. We observe that the errors decrease rapidly for increasing expansion parameters $R$ of the hyperbolic cross $I = H^d_{R,\mathrm{even}}$ and correspondingly increasing numbers of samples. In addition, we consider reconstructing single rank-1 lattices generated by [38, Algorithm 3.7] as input for Algorithm 2.2 and obtain results which are very close, and we therefore omit their plots. Moreover, the relative errors from [44, Figure 4.3b] when using reconstructing random multiple rank-1 lattices are shown in Figure 2.5 as dotted lines with filled markers. We observe that the obtained numbers of samples and errors are similar to the deterministic ones. The results for the deterministic multiple rank-1 lattices appear slightly better for $d \in \{3, 5, 8\}$. In addition, the relative errors from [44, Figure 4.3b] when directly sampling along reconstructing single rank-1 lattices are drawn as dashed lines with unfilled markers. It has already been observed in [44] that for smaller expansion parameters $R$, and consequently smaller numbers of samples, the single rank-1 lattices perform better until a certain expansion parameter $R$ has been reached; afterwards, the multiple rank-1 lattices clearly outperform the single ones.

[Figure 2.5: Relative $L^2(\mathbb{T}^d)$ sampling errors for $G_3^d$, $d \in \{2, 3, 5, 8\}$, with respect to the number of samples for reconstructing single rank-1 lattices (dashed lines, unfilled markers), reconstructing random multiple rank-1 lattices (dotted lines, filled markers), and reconstructing deterministic multiple rank-1 lattices (solid lines, filled markers), using the frequency index sets $I := H^d_{R,\mathrm{even}}$. Results for single rank-1 lattices from [65, Figure 2.14] and for reconstructing random multiple rank-1 lattices from [44, Figure 4.3].]

2.3.3 Deterministic multiple rank-1 lattices with decreasing lattice size for reconstruction of trigonometric polynomials

Besides generating deterministic multiple rank-1 lattices according to Theorem 2.2 and Algorithm 2.2, we have also discussed the alternate approach of Theorem 2.3, where the theoretical results for function approximation, as mentioned in Corollary 2.1, cannot be applied directly, but the number of samples required for the reconstruction of multivariate trigonometric polynomials may be distinctly smaller.

We start with symmetric hyperbolic cross type index sets $I = H^d_{R,\mathrm{even}}$ and apply the generation strategy of Theorem 2.3 to reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, generated by [38, Algorithm 3.7]. We visualize the resulting oversampling factors #samples$/|I| = (1 - L + \sum_{\ell \in [L]} P_\ell)/|I|$ in Figure 2.6a for spatial dimensions $d \in \{2, 3, \dots, 9\}$ and various expansion parameters $R$. For the considered test cases, we observe that the oversampling factors are well below 3. When starting with single rank-1 lattices according to (2.12), the observed oversampling factors differ only slightly, cf. Figure 2.6b.
[Figure 2.6: Oversampling factors for deterministic reconstructing multiple rank-1 lattices constructed according to Theorem 2.3; (a) $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7] for symmetric hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$, (b) $\mathbf z$ and $M$ according to (2.12) for symmetric hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$, (c) $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7] for random frequency sets $I \subset \{-64, -63, \dots, 64\}^d$, (d) $\mathbf z$ and $M$ according to (2.12) for random frequency sets $I \subset \{-64, -63, \dots, 64\}^d$.]

The reason for these very low oversampling factors is that during the generation process according to the proof of Theorem 2.3, the prime $P_0$ is relatively close to $|I|$, the next prime $P_1$ is relatively close to $|I \setminus I_0|$, $P_2$ is relatively close to $|I \setminus (I_0 \cup I_1)|$, and so on, where $I_0$ contains the frequencies of $I$ which can be reconstructed by the lattice $\Lambda(\mathbf z, P_0)$ and $I_1$ contains the frequencies of $I \setminus I_0$ which can be reconstructed by $\Lambda(\mathbf z, P_1)$. In particular, we do not have the fixed lower bound $|I| \le P_\ell$ for all $\ell$ as in Algorithm 2.2.

Next, we change the setting and use frequency sets $I$ drawn uniformly at random from the cubes $[-R, R]^d \cap \mathbb{Z}^d$, see Section 2.3.1. As before, we generate reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, using [38, Algorithm 3.7]. Then, we apply the strategy of Theorem 2.3 in order to deterministically generate reconstructing multiple rank-1 lattices. We repeat the test 10 times for each setting with newly randomly chosen frequency sets $I$ and determine the maximum number of samples over the 10 repetitions. For cardinalities $|I| \in \{10, 100, 1\,000, 10\,000\}$ in $d \in \{2, 3, 4, 6, 10, 100, 1\,000, 10\,000\}$ spatial dimensions, we visualize the resulting oversampling factors in Figure 2.6c for expansion parameter $R = 64$ ($K_I \le 128$). Starting with reconstructing single rank-1 lattices $\Lambda(\mathbf z, M)$ according to (2.12), as in Figure 2.3b, changes the oversampling factors only slightly, and the oversampling factors remain well below 4, cf. Figure 2.6d.

CHAPTER 3
HIGH-DIMENSIONAL SPARSE FOURIER TRANSFORMS

As discussed in Section 1.1.2, this chapter focuses on efficient sparse Fourier transforms (SFTs) for high-dimensional functions. We begin with a review of the prior work against which we compare our techniques and provide a more in-depth discussion of the methods in Section 3.1. Section 3.2 reviews and further refines the univariate SFTs from [37, 53] which we will use in our multivariate techniques. Section 3.3 presents our main multivariate approximation algorithms and their analysis. Finally, we implement these two algorithms numerically and present the empirical results in Section 3.4.

3.1 Overview of results and prior work

Much recent work has considered the problem of quickly recovering both exactly sparse multivariate trigonometric polynomials and approximating more general functions by sparse trigonometric polynomials using dimension-incremental approaches [65, 59, 17, 16]. These methods recover multivariate frequencies adaptively by searching lower-dimensional projections of $I \subset \bigl[-\frac K2, \frac K2\bigr]^d \cap \mathbb{Z}^d$ for energetic frequencies.
These lower-dimensional candidate sets are then paired together to build up a fully $d$-dimensional search space smaller than the original one, which is expected to support the most energetic frequencies (see, e.g., [42, Section 3] and the references within for a general overview).

In the context of Fourier methods, lattice-based techniques do a good job of support identification on the intermediary, lower-dimensional candidate sets, and especially recently, techniques based on multiple rank-1 lattices have shown success [43, 42] (see also Chapter 2). Though the total complexity of each of these steps is manageable and can be kept linear in the sparsity $s$ of the Fourier series to be computed, these steps must in general be repeated to ensure that no potential frequencies have been left out. In particular, this results in at least $O(d s^2 K)$ operations (up to logarithmic factors) for functions supported on arbitrary frequency sets in order to obtain approximations that are guaranteed to be accurate with high probability. Though from an implementational perspective this runtime can be mitigated by completing many of the repetitions and initial one-dimensional searches in parallel, once pairing begins, the results of previous iterations must be synchronized and communicated to future steps, necessitating serial interruptions.

Other earlier works include [37], in which previously existing univariate SFT results [36, 62] are refined and adapted to the multivariate setting. Though the resulting dependence on the dimension is well above that of the dimension-incremental approaches, deterministic guarantees are given for multivariate Fourier approximation in $O(d^4 s^2)$ (up to logarithmic factors) time and memory, as well as a random variant which drops to linear scaling in $s$, leading to a runtime on the order of $O(d^4 s)$ with respect to $s$ and $d$. Additionally, the compressed sensing type guarantees in terms of the Fourier compressibility of the function under consideration carry over from the univariate SFT analysis. The scheme essentially makes use of a reconstructing rank-1 lattice on a superset of the full integer cube $I = \bigl[-\frac{dK}2, \frac{dK}2\bigr]^d \cap \mathbb{Z}^d$ with certain number-theoretic properties that allow for fast inversion of the resulting one-dimensional coefficients by the Chinese Remainder Theorem. We note that this necessarily inflated frequency domain accounts for the suboptimal scaling in $d$ above.

In [54], another fully deterministic sampling strategy and reconstruction algorithm is given. Like [37], though, the method can only be applied to Fourier approximations over an ambient frequency space $I$ which is a full $d$-dimensional cube. Moreover, the vector space structure exploited to construct the sampling sets necessitates that the side length $K$ of this cube is a power of a prime. However, the benefits of this construction are among the best considered so far: the method is entirely deterministic, has noise-robust recovery guarantees in terms of best $s$-term estimates, the sampling sets used are on the order of $O(d^3 s^2 K)$, and the reconstruction algorithm's runtime complexity is on the order of $O(d^3 s^2 K^2)$, both up to logarithmic factors. On the other hand, this algorithm still does not scale linearly in $s$.

Finally, we discuss [15, 14], a pair of papers detailing high-dimensional Fourier recovery algorithms which offer a simplified (and therefore faster) alternative to lattice transforms and dimension-incremental methods.
These algorithms make heavy use of a one-dimensional SFT [51, 18] based on a phase modulation approach to discover energetic frequencies in a fashion similar to our Algorithm 3.1 below. The main idea is to recover the entries of multivariate frequencies by using equispaced evaluations of the function along a coordinate axis as well as samples of the function at the same points slightly shifted (the remaining dimensions are generally ignored). This shift in space produces a modulation in frequency from which frequency data can be recovered (cf. (3.6) and Algorithm 3.1 below). By supplementing this approach with simple reconstructing rank-1 lattice analysis for repetitions of the full integer cube, the runtime and number of samples are given on average as $O(ds)$ up to logarithmic factors.

However, due to the possibility of collisions of multivariate frequencies under the hashing algorithms employed, these results hold only for random signal models. In particular, theoretical results are only stated for functions with randomly generated Fourier coefficients on the unit circle with randomly chosen frequencies from a given frequency set. Additionally, the analysis of these techniques assumes that the algorithm applied to the randomly generated signal does not encounter certain low-probability (with respect to the random signal model considered therein) energetic frequency configurations. Furthermore, the method is restricted in stability, allowing spatial shifts in sampling bounded by at most the reciprocal of the side length of the multivariate frequency cube under consideration, and only exact recovery is considered (or recovery up to factors related to sample corruption by Gaussian noise in [14]). In addition, no results are proven concerning the approximation of more general periodic functions, e.g., compressible functions.

3.1.1 Main contributions

We begin with a brief summary of the benefits provided by our approach in comparison to the methods discussed above. Below, we ignore logarithmic factors in our summary of the runtime/sampling complexities.

• All variants, deterministic and random, of both algorithms presented in this paper have runtime and sampling complexities linear in $d$ with best $s$-term estimates for arbitrary signals. This is in contrast to the complexities of the dimension-incremental approaches [16, 17, 43, 42] and the number-theoretic approaches [37, 54], while still achieving similarly strong best $s$-term guarantees.

• Both algorithms proposed herein have randomized variants with runtime and sampling complexities linear in $s$ with best $s$-term estimates on arbitrary signals that hold with high probability. Thus, the randomized methods proposed in this paper achieve the efficient runtime complexities of [15, 14] while simultaneously exhibiting best $s$-term approximation guarantees for general periodic functions, thereby improving on the non-deterministic dimension-incremental approaches [16, 17, 43, 42].

• Both algorithms proposed herein have a deterministic variant with runtime and sampling complexities quadratic in $s$ with best $s$-term estimates on arbitrary signals that also hold deterministically. This is in contrast to all previously discussed methods without deterministic guarantees, [16, 17, 43, 42, 14, 15], as well as improving on the prior deterministic results [37, 54] for functions whose energetic frequency support sets $I$ are smaller than the full cube.
Overview of the methods and related theory

We will build on the fast and potentially deterministic one-dimensional SFT from [37] and its discrete variant from [53] by applying those techniques along rank-1 lattices. As previously discussed, the primary difficulty in doing so is matching energetic one-dimensional Fourier coefficients with their $d$-dimensional counterparts. We are especially interested in doing this in an efficient and provably accurate way. We propose and analyze two different methods for solving this problem herein.

The first frequency identification approach, Algorithm 3.1, involves modifications of the phase shifting technique from [51, 18, 15, 14]. We make use of the translation-to-modulation property of the Fourier transform (cf. (3.6) below) observed in these works to extract frequency data. Combining this with SFTs on rank-1 lattices gives a new class of fast methods with several benefits. Notably, we are able to maintain error guarantees for any function (not just random signals) in terms of best Fourier $s$-term approximations. Additionally, we factor the instability and potential for collisions from [15, 14] into these best $s$-term approximations. The only downside in our estimates is an additional linear factor of $K$ multiplying the terms commonly seen in standard error bounds (cf. Corollaries 3.1 and 3.2). However, we are able to maintain deterministic results with runtime and sampling complexities that are quadratic in $s$, as well as results for random variants with complexities that are linear in $s$. Additionally, the dependence on the dimension $d$ is reduced from $O(d^4)$ in [37] to only $O(d)$.

Our second technique, Algorithm 3.2, uses a different approach to applying SFTs to modifications of the multivariate function along a reconstructing rank-1 lattice. Effectively, we reduce $g$ to a two-dimensional function. This is done by mapping all but one dimension, say $\ell$, down to one using a rank-1 lattice, and leaving the $\ell$th dimension free. From here, we take a two-dimensional DFT (taking care to use SFTs where possible). The locations of Fourier coefficients in this two-dimensional DFT can then be used to determine the $\ell$th coordinate of the frequency data. This is then repeated for each dimension $\ell \in [d]$.

This process is slower but more stable than Algorithm 3.1. In particular, it produces more accurate best Fourier $s$-term approximation guarantees without the extraneous factor of $K$ (cf. Corollaries 3.3 and 3.4). The deterministic results still have a complexity quadratic in $s$, with random extensions that are linear in $s$. However, we incur an extra quadratic factor of $K$ in the complexity bounds (cf. Lemma 3.4).

We stress here that by compartmentalizing the translation from multivariate analysis to univariate analysis into the theory of rank-1 lattices, our techniques are suitable for any frequency set of interest $I$. The only constraint is the necessity of a reconstructing rank-1 lattice for $I$ (and potentially projections of $I$ in the case of Algorithm 3.2). This flexibility improves the results from [37], primarily with respect to the polynomial factor of $d$ in our runtime and sampling complexities. We remark that though the existence of the necessary reconstructing rank-1 lattice is a nontrivial requirement, there exist efficient construction algorithms for arbitrary frequency sets via deterministic component-by-component methods, see, e.g., [39, 46, 56].
In terms of implementation, we note that the multivariate techniques we employ are entirely modular with respect to the univariate SFT used. As such, the complexity estimates and error bounds for our approaches in Section 3.3 are directly derived from the chosen SFT. Finally, the methods we present are trivially parallelizable, so that in particular a large majority of the univariate SFTs in Algorithm 3.1 or Algorithm 3.2 can occur in parallel.

3.2 One-dimensional sparse Fourier transform results

Below, we summarize some of the previous work on one-dimensional sparse Fourier transforms which will be used in our multivariate algorithms. Rather than focus on the inner workings of these SFTs, we highlight five main properties concerning their recovery guarantees and computational complexity. This compartmentalization allows any SFT satisfying these properties to be easily extended for multivariate Fourier recovery simply by plugging into Algorithms 3.1 and 3.2.

We first review the sublinear-time algorithm from [37] which uses fewer than $M$ nonequispaced samples of a function to compute Fourier coefficients in $B_M$. We refer the reader interested in its implementation and mathematical explanation to [37] as well as [36, 62]. Below, we will use slightly improved error bounds over those in its original presentation. The proof of these improvements necessitates the following lemma.

Lemma 3.1. For $\mathbf x \in \mathbb{C}^K$ and $S_\tau := \{k \in [K] \mid |x_k| \ge \tau\}$, if $\tau \ge \frac{\|\mathbf x - \mathbf x^{\mathrm{opt}}_s\|_1}{s}$, then $|S_\tau| \le 2s$ and
\[
\|\mathbf x - \mathbf x|_{S_\tau}\|_2 \le \|\mathbf x - \mathbf x^{\mathrm{opt}}_{2s}\|_2 + \tau\sqrt{2s}, \qquad
\|\mathbf x - \mathbf x|_{S_\tau}\|_1 \le \|\mathbf x - \mathbf x^{\mathrm{opt}}_{2s}\|_1 + \tau \cdot 2s.
\]

Proof. Ordering the entries of $\mathbf x$ in descending magnitude (with ties broken arbitrarily) as $|x_{k_1}| \ge |x_{k_2}| \ge \dots$, we first note that
\[
\|\mathbf x - \mathbf x^{\mathrm{opt}}_s\|_1 \ge \sum_{j=s+1}^{2s} |x_{k_j}| \ge s\,|x_{k_{2s}}|.
\]
By assumption then, $\tau \ge |x_{k_{2s}}|$, and since $S_\tau$ contains the $|S_\tau|$-many largest entries of $\mathbf x$, we must have $S_\tau \subset \operatorname{supp}(\mathbf x^{\mathrm{opt}}_{2s})$. Note then that $|S_\tau| \le 2s$. Finally, we calculate
\[
\|\mathbf x - \mathbf x|_{S_\tau}\|_2 \le \|\mathbf x - \mathbf x^{\mathrm{opt}}_{2s}\|_2 + \|\mathbf x^{\mathrm{opt}}_{2s} - \mathbf x|_{S_\tau}\|_2
\le \|\mathbf x - \mathbf x^{\mathrm{opt}}_{2s}\|_2 + \sqrt{\sum_{k \in \operatorname{supp}(\mathbf x^{\mathrm{opt}}_{2s}) \setminus S_\tau} |x_k|^2}
\le \|\mathbf x - \mathbf x^{\mathrm{opt}}_{2s}\|_2 + \tau\sqrt{2s}.
\]
The $\ell^1$ estimate is proved by the same procedure, where the $2s$-many terms are bounded by $\tau$ in the last line without a square root.
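The thresholding behavior in Lemma 3.1 is easy to verify numerically; the following short sketch checks the hypothesis and the cardinality conclusion on an arbitrary coefficient vector (the function name is illustrative only).

```python
import numpy as np

def significant_set(x, s, tau):
    """Illustrate Lemma 3.1: if tau >= ||x - x_s^opt||_1 / s, then the set
    S_tau of entries with |x_k| >= tau has at most 2s elements."""
    mags = np.sort(np.abs(x))[::-1]
    tail_s = mags[s:].sum()                 # ||x - x_s^opt||_1, the best s-term tail
    assert tau >= tail_s / s, "threshold violates the lemma's hypothesis"
    S_tau = np.flatnonzero(np.abs(x) >= tau)
    assert len(S_tau) <= 2 * s              # conclusion of the lemma
    return S_tau
```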
Theorem 3.1 (Robust sublinear-time, nonequispaced SFT: [37], Theorem 7 / [53], Lemma 4). For a signal $g^{1d} \in W(\mathbb{T}) \cap C(\mathbb{T})$ corrupted by some arbitrary noise $\mu : \mathbb{T} \to \mathbb{C}$, Algorithm 3 of [37], denoted $A^{\mathrm{sub}}_{2s,M}$, will output a $2s$-sparse coefficient vector $\hat{\mathbf g}^{1d,s} \in \mathbb{C}^{B_M}$ which

1. reconstructs every frequency $\omega \in B_M$ of $\hat g^{1d}|_M \in \mathbb{C}^{B_M}$ whose corresponding Fourier coefficient meets the tolerance
\[
|\hat g^{1d}_\omega| > (4 + 2\sqrt2) \left(\frac{\bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_s\bigr\|_1}{s} + \bigl\|\hat g^{1d} - \hat g^{1d}|_M\bigr\|_1 + \|\mu\|_\infty\right),
\]
2. satisfies the $\ell^\infty$ error estimate for recovered coefficients
\[
\bigl\|(\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s})|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_\infty \le \sqrt2 \left(\frac{\bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_s\bigr\|_1}{s} + \bigl\|\hat g^{1d} - \hat g^{1d}|_M\bigr\|_1 + \|\mu\|_\infty\right),
\]
3. satisfies the $\ell^2$ error estimate
\[
\bigl\|\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s}\bigr\|_2 \le \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_2 + \frac{(8\sqrt2 + 6)\,\bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_s\bigr\|_1}{\sqrt s} + (8\sqrt2 + 6)\sqrt s \,\bigl(\bigl\|\hat g^{1d} - \hat g^{1d}|_M\bigr\|_1 + \|\mu\|_\infty\bigr),
\]
4. satisfies the $\ell^1$ error estimate
\[
\bigl\|\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s}\bigr\|_1 \le \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_1 + (6\sqrt2 + 16)\,\bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_s\bigr\|_1 + (6\sqrt2 + 16)\,s\,\bigl(\bigl\|\hat g^{1d} - \hat g^{1d}|_M\bigr\|_1 + \|\mu\|_\infty\bigr),
\]
5. and requires a number of samples of $g^{1d}$ and an operation count for $A^{\mathrm{sub}}_{2s,M}$ of
\[
O\!\left(\frac{s^2 \log^4 M}{\log s}\right).
\]

The Monte Carlo variant of $A^{\mathrm{sub}}_{2s,M}$, denoted $A^{\mathrm{sub,MC}}_{2s,M}$, referred to by Corollary 4 of [37], satisfies all of the conditions (1)–(4) simultaneously with probability $(1 - \sigma) \in [2/3, 1)$ and has number of required samples and operation count
\[
O\!\left(s \log^3(M) \log\frac{M}{\sigma}\right).
\]
The samples required by $A^{\mathrm{sub,MC}}_{2s,M}$ are a subset of those required by $A^{\mathrm{sub}}_{2s,M}$.

Proof. We refer to [37, Theorem 7] and its modification for noise robustness in [53, Lemma 4] for the proofs of properties (2) and (5). As for (1), [37, Lemma 6] and its modification in [53, Lemma 4] imply that any $\omega \in B_M$ with
\[
|\hat g^{1d}_\omega| > 4\Bigl(\bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_s\bigr\|_1/s + \bigl\|\hat g^{1d} - \hat g^{1d}|_M\bigr\|_1 + \|\mu\|_\infty\Bigr) =: 4\delta
\]
will be identified in [37, Algorithm 3]. An approximate Fourier coefficient for these and any other recovered frequencies is stored in the vector $\mathbf x$, which satisfies the same estimate as in property (2) by the proof of [37, Theorem 7] and [53, Lemma 4]. However, only the $2s$ largest magnitude values of $\mathbf x$ will be returned in $\hat{\mathbf g}^{1d,s}$. We therefore analyze what happens when some of the potentially large Fourier coefficients corresponding to frequencies in $S_{4\delta}$ do not have their approximations assigned to $\hat{\mathbf g}^{1d,s}$.

Using the definition of $S_\tau$ given in Lemma 3.1 applied to $\hat g^{1d}|_M$, we must have $|S_{4\delta}| \le 2s = |\operatorname{supp}(\hat{\mathbf g}^{1d,s})|$. If $\omega \in S_{4\delta} \setminus \operatorname{supp}(\hat{\mathbf g}^{1d,s})$, there must then exist some other $\omega' \in \operatorname{supp}(\hat{\mathbf g}^{1d,s}) \setminus S_{4\delta}$ which was identified and took the place of $\omega$ in $\operatorname{supp}(\hat{\mathbf g}^{1d,s})$. For this to happen, $|\hat g^{1d}_{\omega'}| \le 4\delta$ and $|x_{\omega'}| \ge |x_\omega|$. But by property (2) (extended to all coefficients in $\mathbf x$), we know
\[
4\delta + \sqrt2\delta \ge |\hat g^{1d}_{\omega'}| + \sqrt2\delta \ge |x_{\omega'}| \ge |x_\omega| \ge |\hat g^{1d}_\omega| - \sqrt2\delta.
\]
Thus, any frequency in $S_{4\delta}$ not chosen satisfies $|\hat g^{1d}_\omega| \le (4 + 2\sqrt2)\delta$, and so every frequency in $S_{(4+2\sqrt2)\delta}$ is in fact identified in $\hat{\mathbf g}^{1d,s}$, verifying property (1).

As for property (3), we estimate the $\ell^2$ error using property (2), Lemma 3.1, and the above argument as
\[
\begin{aligned}
\bigl\|\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s}\bigr\|_2
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_2 + \bigl\|(\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s})|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_2 \\
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{S_{4\delta} \cap \operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_2 + \sqrt2\,\delta\sqrt{2s} \\
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{S_{4\delta}}\bigr\|_2 + \bigl\|\hat g^{1d}|_{S_{4\delta} \setminus \operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_2 + 2\delta\sqrt s \\
&\le \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_2 + 4\delta\sqrt{2s} + (4 + 2\sqrt2)\delta\sqrt{2s} + 2\delta\sqrt s \\
&= \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_2 + (8\sqrt2 + 6)\sqrt s\,\delta.
\end{aligned}
\]
The proof of property (4) is very similar. We estimate the $\ell^1$ error using the same techniques as
\[
\begin{aligned}
\bigl\|\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s}\bigr\|_1
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_1 + \bigl\|(\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s})|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_1 \\
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{S_{4\delta} \cap \operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_1 + \sqrt2\,\delta \cdot 2s \\
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{S_{4\delta}}\bigr\|_1 + \bigl\|\hat g^{1d}|_{S_{4\delta} \setminus \operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_1 + 2\sqrt2\,\delta s \\
&\le \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_1 + 4\delta \cdot 2s + (4 + 2\sqrt2)\delta \cdot 2s + 2\sqrt2\,\delta s \\
&= \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_1 + (6\sqrt2 + 16)\,s\delta.
\end{aligned}
\]

Remark 3.1. In the noiseless case, if the univariate function $g^{1d}$ is Fourier $s$-sparse, i.e., is a trigonometric polynomial, and $M$ is large enough that $\operatorname{supp}(\hat g^{1d}) \subset B_M$, both $A^{\mathrm{sub}}_{2s,M}$ and $A^{\mathrm{sub,MC}}_{2s,M}$ will exactly recover $\hat g^{1d}|_M$ (the latter with probability $1 - \sigma$), and therefore $\hat g^{1d}$. In particular, note that the output of either algorithm will then actually be $s$-sparse.

Using the above SFT algorithm with the discretization process outlined in [53] leads to a fully discrete sparse Fourier transform requiring only equispaced samples of $g^{1d}$, denoted $\mathbf g^{1d}$.
However, rather than separately accounting for the truncation to the frequency band $B_M$ as above, the equispaced samples allow us to take advantage of aliasing, which is particularly important when we apply the algorithm along reconstructing rank-1 lattices. Thus, instead of approximating $\hat g^{1d}|_M \in \mathbb{C}^{B_M}$, we prefer to approximate the discrete Fourier transform of $\mathbf g^{1d}$ given by $F_M \mathbf g^{1d}$.

Eventually, we will consider techniques for the approximation of arbitrary periodic functions rather than simply polynomials. For this reason, we require noise-robust recovery results for the method in [53]. The necessary modifications to account for this robustness, as well as the improved guarantees carried over from the previous algorithm, are given below. The upshot is that we are able to state five properties of this SFT analogous to those in Theorem 3.1, which allow for modular proofs of the multivariate results later on.

Theorem 3.2 (Robust discrete sublinear-time SFT: see [53], Theorem 5). For a signal $g^{1d} \in W(\mathbb{T}) \cap C(\mathbb{T})$ corrupted by some arbitrary noise $\mu : \mathbb{T} \to \mathbb{C}$, and $1 \le r \le \frac M{36}$, Algorithm 1 of [53], denoted $A^{\mathrm{disc}}_{2s,M}$, will output a $2s$-sparse coefficient vector $\hat{\mathbf g}^{1d,s} \in \mathbb{C}^{B_M}$ which

1. reconstructs every frequency $\omega \in B_M$ of $F_M \mathbf g^{1d} \in \mathbb{C}^{B_M}$ whose corresponding aliased Fourier coefficient meets the tolerance
\[
|(F_M \mathbf g^{1d})_\omega| > 12(1 + \sqrt2)\left(\frac{\bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_s\bigr\|_1}{2s} + 2\bigl(\|\mathbf g^{1d}\|_\infty M^{-r} + \|\boldsymbol\mu\|_\infty\bigr)\right),
\]
2. satisfies the $\ell^\infty$ error estimate for recovered coefficients
\[
\bigl\|(F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s})|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_\infty \le 3\sqrt2\left(\frac{\bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_s\bigr\|_1}{2s} + 2\bigl(\|\mathbf g^{1d}\|_\infty M^{-r} + \|\boldsymbol\mu\|_\infty\bigr)\right),
\]
3. satisfies the $\ell^2$ error estimate
\[
\bigl\|F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s}\bigr\|_2 \le \bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_{2s}\bigr\|_2 + 38\,\frac{\bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_s\bigr\|_1}{\sqrt s} + 152\sqrt s\,\bigl(\|\mathbf g^{1d}\|_\infty M^{-r} + \|\boldsymbol\mu\|_\infty\bigr),
\]
4. satisfies the $\ell^1$ error estimate
\[
\bigl\|F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s}\bigr\|_1 \le \bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_{2s}\bigr\|_1 + 54\,\bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_s\bigr\|_1 + 215\,s\,\bigl(\|\mathbf g^{1d}\|_\infty M^{-r} + \|\boldsymbol\mu\|_\infty\bigr),
\]
5. and requires a number of samples of $\mathbf g^{1d}$ and an operation count for $A^{\mathrm{disc}}_{2s,M}$ of
\[
O\!\left(\frac{s^2 r^{3/2} \log^{11/2} M}{\log s}\right).
\]

The Monte Carlo variant of $A^{\mathrm{disc}}_{2s,M}$, denoted $A^{\mathrm{disc,MC}}_{2s,M}$, satisfies all of the conditions (1)–(4) simultaneously with probability $(1 - \sigma) \in [2/3, 1)$ and has number of required samples and operation count
\[
O\!\left(s\, r^{3/2} \log^{9/2}(M) \log\frac{M}{\sigma}\right).
\]

Proof. All notation in this proof matches that in [53] (in particular, we use $f$ to denote the one-dimensional function in place of $g^{1d}$ in the theorem statement, and $N = 2M + 1$). We begin by substituting for the $2\pi$-periodic Gaussian filter given in (3) on page 756 the 1-periodic Gaussian and its associated Fourier transform
\[
g(x) = \frac{1}{c_1} \sum_{n=-\infty}^{\infty} \mathrm e^{-\frac{(2\pi)^2 (x-n)^2}{2c_1^2}}, \qquad \hat g_\omega = \frac{1}{\sqrt{2\pi}}\, \mathrm e^{-\frac{c_1^2 \omega^2}{2}}.
\]
Note then that all results regarding the Fourier transform remain unchanged, and since this 1-periodic Gaussian is just a rescaling of the $2\pi$-periodic one used in [53], the bound in [53, Lemma 1] holds with a similarly compressed Gaussian; that is, for all $x \in \bigl[-\frac12, \frac12\bigr]$,
\[
g(x) \le \left(\frac{3}{c_1} + \frac{1}{\sqrt{2\pi}}\right) \mathrm e^{-\frac{(2\pi x)^2}{2 c_1^2}}.
\tag{3.1}
\]
Analogous results up to and including [53, Lemma 10] for 1-periodic functions then hold straightforwardly.

Assuming that our signal measurements $\mathbf f = (f(y_j))_{j=0}^{2M} = (f(\frac jN))_{j=0}^{2M}$ are corrupted by some discrete noise $\boldsymbol\mu = (\mu_j)_{j=0}^{2M}$, we consider for any $x \in \mathbb{T}$ a bound similar to [53, Lemma 10]. Here, $j_0 := \arg\min_j |x - y_j|$ and $\kappa := \lceil \gamma \ln N \rceil + 1$ for some $\gamma \in \mathbb{R}^+$ to be determined.
Then,
\[
\begin{aligned}
&\left|\frac1N \sum_{j=0}^{2M} f(y_j)\, g(x - y_j) - \frac1N \sum_{j=j_0-\kappa}^{j_0+\kappa} (f(y_j) + \mu_j)\, g(x - y_j)\right| \\
&\quad \le \left|\frac1N \sum_{j=0}^{2M} f(y_j)\, g(x - y_j) - \frac1N \sum_{j=j_0-\kappa}^{j_0+\kappa} f(y_j)\, g(x - y_j)\right| + \left|\frac1N \sum_{j=j_0-\kappa}^{j_0+\kappa} \mu_j\, g(x - y_j)\right| \\
&\quad \le \left|\frac1N \sum_{j=0}^{2M} f(y_j)\, g(x - y_j) - \frac1N \sum_{j=j_0-\kappa}^{j_0+\kappa} f(y_j)\, g(x - y_j)\right| + \frac{\|\boldsymbol\mu\|_\infty}{N} \sum_{k=-\kappa}^{\kappa} g(x - y_{j_0+k}).
\end{aligned}
\]
We bound the first term in this sum by a direct application of [53, Lemma 10]; however, we take this opportunity to reduce the constant in the bound given there. In particular, bounding this term by the final expression in the proof of [53, Lemma 10] and using our implicit assumption that $36 \le N$, we have
\[
\left|\frac1N \sum_{j=0}^{2M} f(y_j)\, g(x - y_j) - \frac1N \sum_{j=j_0-\kappa}^{j_0+\kappa} (f(y_j) + \mu_j)\, g(x - y_j)\right|
\le \left(\frac{3}{\sqrt{2\pi}} + \frac{1}{2\pi}\sqrt{\frac{\ln 36}{36}}\right) \|\mathbf f\|_\infty N^{-r} + \frac{\|\boldsymbol\mu\|_\infty}{N} \sum_{k=-\kappa}^{\kappa} g(x - y_{j_0+k}).
\tag{3.2}
\]
We now work on bounding the second term. First note that for all $k \in [-\kappa, \kappa] \cap \mathbb{Z}$,
\[
g(x - y_{j_0 \pm k}) = g\Bigl(x - y_{j_0} \mp \frac kN\Bigr).
\]
Assuming without loss of generality that $0 \le x - y_{j_0}$, we can bound the summands with nonnegative offsets by (3.1) as
\[
g\Bigl(x - y_{j_0} + \frac kN\Bigr) \le \left(\frac{3}{c_1} + \frac{1}{\sqrt{2\pi}}\right) \mathrm e^{-\frac{(2\pi)^2 k^2}{2 c_1^2 N^2}}.
\tag{3.3}
\]
For the negatively offset summands, the definition of $j_0 = \arg\min_j |x - y_j|$ implies that $x - y_{j_0} \le \frac{1}{2N}$. In particular,
\[
x - y_{j_0} - \frac kN \le \frac{1 - 2k}{2N} < 0
\]
implies
\[
\Bigl(x - y_{j_0} - \frac kN\Bigr)^2 \ge \Bigl|x - y_{j_0} - \frac kN\Bigr| \cdot \frac{2k-1}{2N} \ge \frac{2k - 1}{2N} \cdot \frac kN,
\]
giving
\[
g\Bigl(x - y_{j_0} - \frac kN\Bigr) \le \left(\frac{3}{c_1} + \frac{1}{\sqrt{2\pi}}\right) \mathrm e^{-\frac{(2\pi)^2 k^2}{2 c_1^2 N^2}}\, \mathrm e^{\frac{(2\pi)^2 k}{4 c_1^2 N^2}}.
\tag{3.4}
\]
We now bound the final exponential. We first recall from [53] the choices of parameters
\[
c_1 = \frac{\beta\sqrt{\ln N}}{N}, \qquad \kappa = \lceil\gamma \ln N\rceil + 1, \qquad \gamma = \frac{6r}{\sqrt2\,\pi} = \frac{\beta\sqrt r}{\sqrt2\,\pi}, \qquad \beta = 6\sqrt r,
\]
with $1 \le r \le \frac N{36}$. For $k \in [1, \kappa] \cap \mathbb{Z}$ then,
\[
\exp\left(\frac{(2\pi)^2 k}{4c_1^2 N^2}\right)
\le \exp\left(\frac{(2\pi)^2 \kappa}{4c_1^2 N^2}\right)
\le \exp\left(\frac{\pi^2\bigl(\frac{6r\ln N}{\sqrt2\,\pi} + 2\bigr)}{36\, r \ln N}\right)
\le \exp\left(\frac{\pi}{6\sqrt2} + \frac{\pi^2}{18\, r \ln N}\right)
\le \exp\left(\frac{\pi}{6\sqrt2} + \frac{\pi^2}{18 \ln 36}\right) =: A.
\]
Combining this with our bounds (3.3) and (3.4) for the summands, we have
\[
\frac1N \sum_{k=-\kappa}^{\kappa} g(x - y_{j_0+k}) \le \left(\frac{3}{\beta\sqrt{\ln N}} + \frac{1}{N\sqrt{2\pi}}\right)\left(1 + (1 + A) \sum_{k=1}^{\kappa} \mathrm e^{-\frac{(2\pi)^2 k^2}{2\beta^2 \ln N}}\right).
\]
Expressing the final sum as a truncated lower Riemann sum and applying a change of variables in the resulting integral, we have
\[
\sum_{k=1}^{\kappa} \mathrm e^{-\frac{(2\pi)^2 k^2}{2\beta^2 \ln N}} \le \frac{\sqrt2\,\beta\sqrt{\ln N}}{2\pi} \int_0^\infty \mathrm e^{-x^2}\, dx = \frac{\beta\sqrt{\ln N}}{2\sqrt{2\pi}}.
\]
Making use of our parameter values from [53] and the fact that $1 \le r \le \frac N{36}$,
\[
\frac1N \sum_{k=-\kappa}^{\kappa} g(x - y_{j_0+k})
\le \left(\frac{3}{\beta\sqrt{\ln N}} + \frac{1}{N\sqrt{2\pi}}\right)\left(1 + (1 + A)\frac{\beta\sqrt{\ln N}}{2\sqrt{2\pi}}\right)
\le \frac{3}{6\sqrt{\ln 36}} + \frac{3(1 + A)}{2\sqrt{2\pi}} + \frac{1}{36\sqrt{2\pi}} + \frac{1 + A}{4\pi}\sqrt{\frac{\ln 36}{36}} < 2.
\tag{3.5}
\]
With our revised bound (3.2) above, we re-prove [53, Theorem 4] to estimate $g * f$ by the truncated discrete convolution with noisy samples. In particular, we apply [53, Theorem 3], (3.2), (3.1), and finally our same assumption that $1 \le r \le \frac N{36}$ to obtain
\[
\begin{aligned}
&\left|(g * f)(x) - \frac1N \sum_{j = j_0 - \bigl\lceil\frac{6r\ln N}{\sqrt2\,\pi}\bigr\rceil - 1}^{j_0 + \bigl\lceil\frac{6r\ln N}{\sqrt2\,\pi}\bigr\rceil + 1} (f(y_j) + \mu_j)\, g(x - y_j)\right| \\
&\quad \le \frac{N^{1-r}}{6\sqrt{r\ln N}\,\sqrt{2\pi}}\, \|\mathbf f\|_\infty N^{-r}
+ \left(\frac{3}{\sqrt{2\pi}} + \frac{1}{2\pi}\sqrt{\frac{\ln 36}{36}}\right)\|\mathbf f\|_\infty N^{-r} + 2\|\boldsymbol\mu\|_\infty \\
&\quad \le \left(\frac{1}{6\sqrt{\ln 36}} + \frac{3}{\sqrt{2\pi}} + \frac{1}{2\pi}\sqrt{\frac{\ln 36}{36}}\right)\frac{\|\mathbf f\|_\infty}{N^r} + 2\|\boldsymbol\mu\|_\infty
< 2\left(\frac{\|\mathbf f\|_\infty}{N^r} + \|\boldsymbol\mu\|_\infty\right).
\end{aligned}
\]
Replacing all references to $3\|\mathbf f\|_\infty N^{-r}$ by $2(\|\mathbf f\|_\infty N^{-r} + \|\boldsymbol\mu\|_\infty)$ in the remainder of the steps up to the proof of [53, Theorem 5] gives the desired noise robustness (with a slightly improved constant).
Using the revised error estimates for the nonequispaced algorithm from Theorem 3.1 and redefining $\delta = 3\bigl(\|\hat{\mathbf f} - \hat{\mathbf f}^{\mathrm{opt}}_s\|_1/2s + 2(\|\mathbf f\|_\infty N^{-r} + \|\boldsymbol\mu\|_\infty)\bigr)$ as in the proof of [53, Theorem 5] (which also contains the proof of property (2)), the discretization algorithm [53, Algorithm 1] will produce candidate Fourier coefficient approximations in Lines 9 and 12 corresponding to every $|\hat f_\omega| \ge (4 + 2\sqrt2)\delta$, in place of $4\delta$ as in Theorem 3.1. The exact same argument as in the proof of Theorem 3.1 then applies to the selection of the $2s$ largest entries of this approximation, with the revised threshold values and error bounds, to give properties (1), (3), and (4).

In detail, [53, Lemma 13] and the discussion right after its statement give that property (2) holds for any approximate coefficient with frequency recovered throughout the algorithm (which, for the purposes of the following discussion, we will store in $\mathbf x$ rather than the $\hat R$ defined in [53, Algorithm 1]), not just those in the final output $\mathbf v := \mathbf x^{\mathrm{opt}}_{2s}$. Additionally, by the same lemma and our revised bounds from Theorem 3.1, any frequency $\omega \in [N]$ satisfying $|\hat f_\omega| > (4 + 2\sqrt2)\delta$ will have an associated coefficient estimate in $\mathbf x$.

By Lemma 3.1, $|S_{(4+2\sqrt2)\delta}| \le 2s = |\operatorname{supp}(\mathbf v)|$, and so if $\omega \in S_{(4+2\sqrt2)\delta} \setminus \operatorname{supp}(\mathbf v)$, there exists some $\omega' \in \operatorname{supp}(\mathbf v) \setminus S_{(4+2\sqrt2)\delta}$ such that $v_{\omega'}$ took the place of $v_\omega$. In particular, this means that $|x_{\omega'}| \ge |x_\omega|$, $|\hat f_{\omega'}| \le (4 + 2\sqrt2)\delta$, and $|\hat f_\omega| > (4 + 2\sqrt2)\delta$. Thus,
\[
(4 + 2\sqrt2)\delta + \sqrt2\delta > |\hat f_{\omega'}| + \sqrt2\delta \ge |x_{\omega'}| \ge |x_\omega| \ge |\hat f_\omega| - \sqrt2\delta,
\]
implying that $|\hat f_\omega| \le 4(1 + \sqrt2)\delta$ and therefore proving (1).

To prove (3), we use Lemma 3.1 and consider
\[
\begin{aligned}
\|\hat{\mathbf f} - \mathbf v\|_2 &\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{\operatorname{supp}(\mathbf v)}\|_2 + \|(\hat{\mathbf f} - \mathbf v)|_{\operatorname{supp}(\mathbf v)}\|_2 \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta} \cap \operatorname{supp}(\mathbf v)}\|_2 + \sqrt2\,\delta\sqrt{2s} \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta}}\|_2 + \|\hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta} \setminus \operatorname{supp}(\mathbf v)}\|_2 + 2\delta\sqrt s \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}^{\mathrm{opt}}_{2s}\|_2 + (4 + 2\sqrt2)\delta\sqrt{2s} + 4(1 + \sqrt2)\delta\sqrt{2s} + 2\delta\sqrt s \\
&= \|\hat{\mathbf f} - \hat{\mathbf f}^{\mathrm{opt}}_{2s}\|_2 + (14 + 8\sqrt2)\,\delta\sqrt s.
\end{aligned}
\]
The proof of (4) is similar, bounding the $\ell^1$ error as
\[
\begin{aligned}
\|\hat{\mathbf f} - \mathbf v\|_1 &\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{\operatorname{supp}(\mathbf v)}\|_1 + \|(\hat{\mathbf f} - \mathbf v)|_{\operatorname{supp}(\mathbf v)}\|_1 \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta} \cap \operatorname{supp}(\mathbf v)}\|_1 + \sqrt2\,\delta \cdot 2s \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta}}\|_1 + \|\hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta} \setminus \operatorname{supp}(\mathbf v)}\|_1 + 2\sqrt2\,\delta s \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}^{\mathrm{opt}}_{2s}\|_1 + (4 + 2\sqrt2)\delta \cdot 2s + 4(1 + \sqrt2)\delta \cdot 2s + 2\sqrt2\,\delta s \\
&= \|\hat{\mathbf f} - \hat{\mathbf f}^{\mathrm{opt}}_{2s}\|_1 + (16 + 14\sqrt2)\,\delta s.
\end{aligned}
\]

3.3 Fast multivariate sparse Fourier transforms

Having detailed two sublinear-time, one-dimensional SFT algorithms, we are now prepared to extend them to the multivariate setting. The general approach is to apply the one-dimensional methods to transformations of our multivariate function of interest, with samples taken along rank-1 lattices. The particular approaches for transforming the multivariate function then allow for the efficient extraction of multidimensional frequency information for the most energetic coefficients identified by univariate SFTs. In particular, our first approach, considered in Section 3.3.1, successively shifts the function in each dimension, whereas our second approach, considered in Section 3.3.2, successively collapses all but one dimension along a rank-1 lattice and samples the resulting two-dimensional function.

Since the two approaches in Algorithms 3.1 and 3.2 below can make use of any univariate SFT algorithm, their analysis is presented in a modular fashion. Each algorithm is followed by a lemma (Lemmas 3.2 and 3.4, respectively) which provides the associated error guarantees when any sufficiently accurate univariate SFT $A_{s,M}$ is employed.
The lemmas are then each followed by two corollaries (Corollaries 3.1 and 3.2, and Corollaries 3.3 and 3.4, respectively) in which we apply the lemma to the two example univariate SFTs reviewed in Section 3.2, as specified by Theorems 3.1 and 3.2.

3.3.1 Phase encoding

We begin by noting that this section makes significant use of the property of the Fourier transform that translation of a function modulates its Fourier coefficients. We denote by $S_{\ell,\alpha}$ the shift operator in the $\ell$th coordinate with shift $\alpha \in \mathbb{R}$, defined by its action on a multivariate periodic function $g : \mathbb{T}^d \to \mathbb{C}$ as
\[
S_{\ell,\alpha}(g)(x_1, \dots, x_d) := g\bigl(x_1, \dots, x_{\ell-1}, (x_\ell + \alpha) \bmod 1, x_{\ell+1}, \dots, x_d\bigr).
\]
By a change of coordinates in the integral defining a Fourier coefficient, we see that translation modulates the Fourier coefficients of $g : \mathbb{T}^d \to \mathbb{C}$ as
\[
\widehat{(S_{\ell,\alpha}\, g)}_{\mathbf k} = \mathrm e^{2\pi i k_\ell \alpha}\, \hat g_{\mathbf k}.
\tag{3.6}
\]

[Figure 3.1: The basic procedure for the phase encoding algorithm applied to the trigonometric monomial $g(\mathbf x) = \mathrm e^{2\pi i \mathbf k \cdot \mathbf x}$: sampling along $t\mathbf z$ yields $\mathrm e^{2\pi i (k_0 z_0 t + k_1 z_1 t)}$, while sampling along the shifted lattice $t\mathbf z + (1/K, 0)$ yields $\mathrm e^{2\pi i k_0 / K}\, \mathrm e^{2\pi i (k_0 z_0 t + k_1 z_1 t)}$, from whose phase $k_0$ is computed.]

The main idea of our phase encoding approach in Algorithm 3.1 is that by exploiting this spatial translation property, we can separate out the components of recovered frequencies from modulations of the function's Fourier coefficients. Before stating the algorithm in detail, we begin with a simple example.

Example 3.1 (Phase encoding on a trigonometric monomial). Let $d = 2$. Suppose that $g(\mathbf x) = \mathrm e^{2\pi i \mathbf k \cdot \mathbf x}$ is a trigonometric monomial with single frequency $\mathbf k \in I \subset \mathbb{Z}^2$ for some known, potentially large $I$. Given $\Lambda(\mathbf z, M)$, a reconstructing rank-1 lattice for $I$, we consider the one-dimensional restriction of $g$ to the lattice, $g^{1d}(t) := g(t\mathbf z)$. Since $g$ is Fourier-sparse, a lattice FFT (cf. Algorithm 1.1) on $g^{1d}$ is unnecessarily expensive. Thus, applying a much faster SFT to $g^{1d}$ returns $\hat g^{1d}_{\mathbf k \cdot \mathbf z \bmod M} = 1$. Our goal is to match this coefficient of $g^{1d}$ to the correct Fourier coefficient of $g$ without having to search all of $I$.

Figure 3.1 depicts the phase encoding method we use in Algorithm 3.1 below. In order to compute $g^{1d}$, we restrict $g$ to the line $t\mathbf z$ (dark blue in the figure). However, to get extra information about $\mathbf k$, we also consider $S_{0,1/K}\, g$, a shift of $g$ in the first coordinate by $1/K$, restricted to the same lattice. The shifted lattice that we effectively restrict $g$ to, $t\mathbf z + (1/K, 0)$, is depicted in light blue. The resulting modulation of $g$ induced by this spatial shift (as described by (3.6)) is detailed in the remainder of Figure 3.1. Thus, defining $g^{1d,0}(t) := S_{0,1/K}\, g(t\mathbf z)$, an SFT would discover $\hat g^{1d,0}_{\mathbf k \cdot \mathbf z \bmod M} = \mathrm e^{2\pi i k_0 / K}$. We can then extract $k_0$ from this modulation. Repeating this process in the $\ell = 1$ coordinate recovers $k_1$, and therefore the entirety of $\mathbf k$ is recovered using $d = 2$ SFTs. From here, we can match $\hat g^{1d}_{\mathbf k \cdot \mathbf z \bmod M} = 1$ to $\hat g_{\mathbf k}$ in faster than $O(|I|)$ time and memory, as desired. A small numerical sketch of this extraction follows.
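The sketch below carries out the phase extraction of Example 3.1 in Python. For clarity, dense length-$M$ FFTs stand in for the SFT $A_{s,M}$, and `recover_entry` is a hypothetical helper name; $\omega$ is assumed to be the aliased index $\mathbf k \cdot \mathbf z \bmod M$ of an energetic frequency.

```python
import numpy as np

def recover_entry(g, z, K, M, omega, ell, d):
    """Sketch of the phase-encoding step: estimate the ell-th component of the
    multivariate frequency aliased to index omega, using dense DFTs in place of
    the sparse transform for simplicity."""
    t = np.arange(M) / M
    base = np.fft.fft(g(np.outer(t, z) % 1.0)) / M            # coefficients of g(tz)
    shift = np.zeros(d)
    shift[ell] = 1.0 / K
    shifted = np.fft.fft(g((np.outer(t, z) + shift) % 1.0)) / M
    # by (3.6), the shift multiplies the coefficient by exp(2*pi*1j*k_ell/K),
    # so the phase of the ratio encodes k_ell (for |k_ell| <= K/2)
    return int(round(K * np.angle(shifted[omega] / base[omega]) / (2 * np.pi)))
```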
In the language of Algorithm 3.1, the original SFT of $g^{1d}$ occurs on Line 1. The SFTs of the shifts of $g$, denoted $g^{1d,0}, \dots, g^{1d,d-1}$, occur on Line 3. In this example we considered a function with only one significant Fourier mode; in general, however, the SFT algorithm will recover $s$ significant Fourier modes. Thus, the for loop from Lines 6 to 14 considers each of these recovered one-dimensional frequencies separately. Line 9 computes the modulation induced by each of the $d$ shifts and then extracts each coordinate of the $d$-dimensional frequency. The remaining check on Line 11 is useful in the theoretical analysis to ensure that spuriously recovered frequencies are ignored.

Algorithm 3.1 Simple Frequency Index Recovery by Phase Encoding
Input: A multivariate periodic function $g \in W(\mathbb{T}^d) \cap C(\mathbb{T}^d)$ (from which we are able to obtain potentially noisy samples), a multivariate frequency set $I \subset B^d_K$, a reconstructing rank-1 lattice $\Lambda(\mathbf z, M)$ for $I$, and an SFT algorithm $A_{s,M}$.
Output: Sparse coefficient vector $\hat{\mathbf g}^s = (\hat g^s_{\mathbf k})_{\mathbf k \in B^d_K}$ (optionally supported on $I$, see Line 11), an approximation to $(\hat g|_I)^{\mathrm{opt}}_s$.
1: Apply $A_{s,M}$ to the univariate restriction of $g$ to the lattice, $g^{1d}(t) = g(t\mathbf z)$, to produce $\hat{\mathbf g}^{1d,s} = A_{s,M}\, g^{1d}$, a sparse approximation of $F_M \mathbf g^{1d} \in \mathbb{C}^{B_M}$.
2: for all $\ell \in [d]$ do
3:   Apply $A_{s,M}$ to $g^{1d,\ell}(t) = S_{\ell,1/K}\, g(t\mathbf z)$ to produce $\hat{\mathbf g}^{1d,\ell,s} = A_{s,M}\, g^{1d,\ell}$, a sparse approximation of $F_M \mathbf g^{1d,\ell} \in \mathbb{C}^{B_M}$.
4: end for
5: $\hat{\mathbf g}^s \leftarrow \mathbf 0$
6: for all $\omega \in \operatorname{supp}(\hat{\mathbf g}^{1d,s}) \subset B_M$ do
7:   $\mathbf k_\omega \leftarrow \mathbf 0$
8:   for all $\ell \in [d]$ do
9:     $(k_\omega)_\ell \leftarrow \operatorname{round}\bigl(K \arg(\hat g^{1d,\ell,s}_\omega / \hat g^{1d,s}_\omega)/2\pi\bigr)$
10:  end for
11:  if $\mathbf k_\omega \cdot \mathbf z \equiv \omega \pmod M$ (and optionally $\mathbf k_\omega \in I$; see Remark 3.2) then
12:    $\hat g^s_{\mathbf k_\omega} \leftarrow \hat g^s_{\mathbf k_\omega} + \hat g^{1d,s}_\omega$
13:  end if
14: end for

3.3.1.1 Analysis of Algorithm 3.1

Having seen the phase encoding approach of Algorithm 3.1 in action, we now provide an error guarantee for its output. Notice that the assumptions on the SFT necessary for this theoretical analysis are exactly those provided by Theorems 3.1 and 3.2. When we use the complex argument function in Algorithm 3.1 and below, we use the principal branch, so that $\arg : \mathbb{C} \to (-\pi, \pi]$.

Lemma 3.2 (General recovery result for Algorithm 3.1). Let $A_{s,M}$ in the input to Algorithm 3.1 be a noise-robust SFT algorithm which, for a function $g^{1d} \in W(\mathbb{T}) \cap C(\mathbb{T})$ corrupted by some arbitrary noise $\mu : \mathbb{T} \to \mathbb{C}$, constructs an $s$-sparse Fourier approximation $A_{s,M}(g^{1d} + \mu) =: \hat{\mathbf g}^{1d,s} \in \mathbb{C}^{B_M}$ which

1. reconstructs every frequency (up to $s$ many) $\omega \in B_M$ of $F_M \mathbf g^{1d} \in \mathbb{C}^M$ whose corresponding Fourier coefficient meets the tolerance $|(F_M \mathbf g^{1d})_\omega| > \tau$,

2. satisfies the $\ell^\infty$ error estimate for recovered coefficients
\[
\bigl\|(F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s})|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_\infty \le \eta_\infty < \tau,
\]
3. satisfies the $\ell^2$ error estimate $\|F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s}\|_2 \le \eta_2$,

4. satisfies the $\ell^1$ error estimate $\|F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s}\|_1 \le \eta_1$,

5. and requires $O(P(s, M))$ total evaluations of $g^{1d}$, operating with computational complexity $O(R(s, M))$.

Additionally, assume that the parameters $\tau$ and $\eta_\infty$ hold uniformly over each SFT performed in Algorithm 3.1. Let $g$, $I$, and $\Lambda(\mathbf z, M)$ be as specified in the input to Algorithm 3.1. Collecting the $\tau$-significant frequencies of $g$ into the set $S_\tau := \{\mathbf k \in I \mid |\hat g_{\mathbf k}| > \tau\}$, assume that $|S_\tau| \le s$, and set
\[
\beta = \max\left(\tau,\ \eta_\infty\left(1 + \frac{2}{\sin\frac\pi K}\right)\right).
\]
Then Algorithm 3.1 (ignoring the optional check on Line 11) will produce an $s$-sparse approximation $\hat{\mathbf g}^s$ of the Fourier coefficients of $g$ satisfying the error estimates
\[
\|\hat{\mathbf g}^s - \hat g\|_{\ell^2(\mathbb{Z}^d)} \le \eta_2 + (\beta + \eta_\infty)\sqrt{\max(s - |S_\beta|, 0)} + \bigl\|\hat g|_I - \hat g|_{S_\beta}\bigr\|_{\ell^2(\mathbb{Z}^d)} + \bigl\|\hat g - \hat g|_I\bigr\|_{\ell^2(\mathbb{Z}^d)}
\]
and
\[
\|\hat{\mathbf g}^s - \hat g\|_{\ell^1(\mathbb{Z}^d)} \le \eta_1 + (\beta + \eta_\infty) \max(s - |S_\beta|, 0) + \bigl\|\hat g|_I - \hat g|_{S_\beta}\bigr\|_{\ell^1(\mathbb{Z}^d)} + \bigl\|\hat g - \hat g|_I\bigr\|_{\ell^1(\mathbb{Z}^d)},
\]
requiring $O(d \cdot P(s, M))$ total evaluations of $g$ and $O(d \cdot (R(s, M) + s))$ total operations.

Proof. We begin by assuming that $g$ is a trigonometric polynomial with $\operatorname{supp}(\hat g) \subset I$. Since $\Lambda(\mathbf z, M)$ is a reconstructing rank-1 lattice for $I$, there are no collisions among the one-dimensional frequencies $\{\mathbf k \cdot \mathbf z \mid \mathbf k \in I\}$ modulo $M$. Setting $g^{1d}(t) = g(t\mathbf z)$ then ensures that $\hat g_{\mathbf k} = \hat g^{1d}_{\mathbf k \cdot \mathbf z}$ for each $\mathbf k \in I$.
Since there are no frequency collisions in the lattice FFT, Lemma 1.3 implies that 𝑔ˆ k = 𝑔ˆ k·z 𝑔ˆ k = (F 𝑀 g1d )k·z mod 𝑀 . Thus, by assumption 1 on the SFT algorithm A 𝑠,𝑀 , Lines 1 and 3 of Algorithm 3.1 will produce coefficient estimates of 𝑔ˆ k for every k ∈ S𝜏 . We then write these SFT approximations as 𝑔ˆ k·z 1d,𝑠 mod 𝑀 = 𝑔ˆ k + 𝜂k and 𝑔ˆ k·z1d,ℓ,𝑠 mod 𝑀 = e2𝜋i𝑘 ℓ /𝐾 ( 𝑔ˆ k + 𝜂kℓ ) respectively, where we have made use of (3.6). Note that |𝜂k |, |𝜂kℓ | ≤ 𝜂∞ . Now, considering the estimate for 𝑘 ℓ , we have 1d,ℓ,𝑠 ! ! 𝐾 𝑔ˆ k·z 𝐾 ˆ 𝑔 k + 𝜂 ℓ arg 1d,𝑠mod 𝑀 = arg e2𝜋i𝑘 ℓ /𝐾 1d,𝑠 k 2𝜋 𝑔ˆ k·z mod 𝑀 2𝜋 𝑔ˆ k·z mod 𝑀 ! 𝐾 ˆ 𝑔 k + 𝜂 ℓ = 𝑘ℓ + arg 1d,𝑠 k 2𝜋 𝑔ˆ k·z mod 𝑀 ! 𝐾 𝜂kℓ − 𝜂k = 𝑘ℓ + arg 1 + 1d,𝑠 . 2𝜋 𝑔ˆ k·z mod 𝑀 We now only consider | 𝑔ˆ k | > 𝛽 ≥ max(𝜏, 3𝜂∞ ), that is k ∈ S𝛽 ⊂ S𝜏 , and therefore, the correspond- ing approximate coefficient satisfies | 𝑔ˆ k·z 1d,𝑠 mod 𝑀 | > 𝛽 − 𝜂∞ . Thus, the magnitude of the fraction in the argument must be strictly less than 2𝜂∞ 𝛽−𝜂∞ ≤ 1. Therefore, we consider the argument of a point lying in the right half of the complex plane, in the open disc of radius 2𝜂∞ 𝛽−𝜂∞ centered at 1. The 57 maximal absolute argument of a point in this disc will be that of a point lying on a tangent line passing through the origin. This point, the origin, and 1 then form a right triangle from which we deduce that ! 𝜂kℓ − 𝜂k   2𝜂∞ arg 1 + 1d,𝑠 < arcsin , 𝑔ˆ k·z 𝛽 − 𝜂∞ mod 𝑀 and our choice of 𝛽 ≥ 𝜂∞ (1 + 2/sin(𝜋/𝐾)) then implies that ! 𝜂kℓ − 𝜂k 𝜋 arg 1 + 1d,𝑠 < . 𝑔ˆ k·𝑧 mod 𝑀 𝐾 Thus, 1d,ℓ,𝑠 ! 𝐾 𝑔ˆ k·z 1 arg 1d,𝑠mod 𝑀 − 𝑘 ℓ < , 2𝜋 𝑔ˆ k·z mod 𝑀 2 and so after rounding to the nearest integer, Algorithm 3.1 will recover 𝑘 ℓ for all ℓ ∈ [𝑑] and k ∈ S𝛽 . We now know that the final loop of Algorithm 3.1 will properly map the one-dimensional fre- quency 𝜔 = k · z mod 𝑀 to k for all k ∈ S𝛽 . Thus, for these same k ∈ S𝛽 , Line 12 ensures that we set 𝑔ˆ k𝑠 := 𝑔ˆ k·z 1d,𝑠 mod 𝑀 . Additionally, the max(𝑠 − |S𝛽 |, 0) many coefficients 𝑔ˆ 𝜔1d,𝑠 for which 𝜔 ≠ k · z mod 𝑀 for any k ∈ S𝛽 are still available for potential assignment. If any multivariate frequency k𝜔 ∈ I is reconstructed and passes the mandatory check in Line 11 then the approximate Fourier coefficient 𝑔ˆ 𝜔1d,𝑠 properly corresponds to (F 𝑀 g1d )k 𝜔 ·z mod 𝑀 = 𝑔ˆ k 𝜔 . On the other hand, if some error introduced in the SFTs reconstructs a multivariate frequency k𝜔 ∉ I, the reconstructing property does not allow us to conclude anything about a (𝑘 𝜔 , 𝜔) pair passing the check in Line 11. Thus, it is possible that 𝑔ˆ 𝜔1d,𝑠 will contribute to some component of ĝ𝑠 not corresponding to any frequency in I. At the least however, since we know that all entries of ĝ1d,𝑠 corresponding to frequencies in S𝛽 are correctly assigned, the remaining ones satisfy | 𝑔ˆ 𝜔1d,𝑠 | ≤ 𝛽 + 𝜂∞ . Using these facts allows us to estimate the ℓ 2 error as k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ k ĝ𝑠 | Z𝑑 \I k ℓ2 (Z𝑑 ) + k ĝ𝑠 | I − 𝑔| ˆ supp(ĝ𝑠 )∩I k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ S𝛽 k ℓ2 (Z𝑑 ) q (3.7) ≤ (𝛽 + 𝜂∞ ) max(𝑠 − |S𝛽 |, 0) + 𝜂2 + k 𝑔ˆ − 𝑔| ˆ S𝛽 k ℓ2 (I) and the ℓ 1 error as k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ k ĝ𝑠 | Z𝑑 \I k ℓ1 (Z𝑑 ) + k ĝ𝑠 | I − 𝑔| ˆ supp(ĝ𝑠 )∩I k ℓ1 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ S𝛽 k ℓ1 (Z𝑑 ) (3.8) ≤ (𝛽 + 𝜂∞ ) max(𝑠 − |S𝛽 |, 0) + 𝜂1 + k 𝑔ˆ − 𝑔| ˆ S𝛽 k ℓ1 (I) 58 where we have additionally used the accuracy of the initial one-dimensional SFT and the assumption that 𝑔ˆ is supported on I. We now handle the case when 𝑔 is not necessarily a polynomial with Fourier support contained in I. 
Rather than aiming to approximate 𝑔ˆ k for every k ∈ Z𝑑 , we restrict attention to only frequen- cies in I, instead attempting to approximate the Fourier coefficients of 𝑔| I = k∈I 𝑔ˆ k e2𝜋ik·◦ . We Í then have that 𝑔 =: 𝑔| I + 𝑔| Z𝑑 \I and view potentially noisy input 𝑔 + 𝜇 to our algorithm as 𝑔 + 𝜇 = 𝑔| I + 𝑔| Z𝑑 \I + 𝜇 . | {z } 𝜇0 Algorithm 3.1 applied to 𝑔 + 𝜇 is then equivalent to applying it to 𝑔| I + 𝜇0, where now 𝜏, 𝜂∞ , 𝜂2 , and 𝜂1 depend on 𝜇0, and the output is an approximation of 𝑔| ˆ I . Since 𝜇0 represents noise on the input to A 𝑠,𝑀 in its applications to 𝑔| I (𝑡z) and 𝑆ℓ,1/𝐾 𝑔| I (𝑡z) we remark here that k𝜇0 k ∞ ≤ k𝑔| Z𝑑 \I k ∞ + k𝜇k ∞ ≤ k 𝑔ˆ − 𝑔| ˆ I k ℓ1 (Z𝑑 ) + k𝜇k ∞ (3.9) so as to help us estimate 𝜏, 𝜂∞ , 𝜂2 , and 𝜂1 in future applications of the lemma. Accounting for the truncation to I in the ℓ 2 error bound and using (3.7) applied to 𝑔| ˆ I , we estimate the ℓ 2 error as k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ k ĝ𝑠 − 𝑔|ˆ I k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ2 (Z𝑑 ) q ≤ (𝛽 + 𝜂∞ ) max(𝑠 − |S𝛽 |, 0) + 𝜂2 + k 𝑔| ˆ I − 𝑔| ˆ S𝛽 k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ2 (Z𝑑 ) and the ℓ 1 error as k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ k ĝ𝑠 − 𝑔|ˆ I k ℓ1 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ1 (Z𝑑 ) ≤ (𝛽 + 𝜂∞ ) max(𝑠 − |S𝛽 |, 0) + 𝜂1 + k 𝑔| ˆ I − 𝑔| ˆ S𝛽 k ℓ1 (Z𝑑 ) + k 𝑔ˆ − 𝑔|ˆ I k ℓ1 (Z𝑑 ) Recalling that 𝑃(𝑠, 𝑀) and 𝑅(𝑠, 𝑀) are the sampling and runtime complexity of A 𝑠,𝑀 respec- tively, since 1 + 𝑑 SFTs are required, the number of 𝑔 evaluations is O (𝑑 · 𝑃(𝑠, 𝑀)) and the associ- ated computational complexity is O (𝑑 · 𝑅(𝑠, 𝑀)). The complexity of Lines 6 to 14 is O (𝑠𝑑). 59 Remark 3.2. Since the only possible misassigned values of 𝑔ˆ 𝜔1d,𝑠 contribute to coefficients in ĝ𝑠 outside the chosen frequency set I for which Λ(z, 𝑀) is reconstructing, if it is possible to quickly (e.g., in O (𝑑) time) check a multivariate frequency’s inclusion in I (e.g., a hyperbolic cross), en- tries outside of I in ĝ𝑠 can be identified in the optional check on Line 11 and remain (correctly) unassigned. This has the effect of removing the max(𝑠 − |S𝛽 |, 0) terms in the error bounds while not increasing the computational complexity. Additionally, this outputs an approximation to ( 𝑔| opt ˆ I )𝑠 which is supported only on our supplied frequency set I as we may expect or prefer. We now apply Lemma 3.2 with the discrete sublinear-time SFT from Theorem 3.2 to give spe- cific error bounds in terms of best 𝑠-term approximation errors as well as detailed runtime and sampling complexities. Corollary 3.1 (Algorithm 3.1 with discrete sublinear-time SFT). Let 𝐾 ≥ 9. For I ⊂ B𝐾𝑑 with reconstructing rank-1 lattice Λ(z, 𝑀) and the function 𝑔 ∈ 𝑊 (T𝑑 ) ∩ 𝐶 (T𝑑 ), we consider applying Algorithm 3.1 where each function sample may be corrupted by noise at most 𝑒 ∞ ≥ 0 in absolute magnitude. Using the discrete sublinear-time SFT algorithm A2𝑠,𝑀 disc or A disc,MC with parameter 2𝑠,𝑀 36 , Algorithm 3.1 will produce ĝ𝑠 = ( 𝑔ˆ k𝑠 )k∈B 𝑑 a 2𝑠-sparse approximation of 𝑔ˆ satisfying 𝑀 1≤𝑟 ≤ 𝐾 the error estimates opt ˆ I − ( 𝑔| k 𝑔| ˆ I ) 𝑠 k1 √ k ĝ𝑠 − 𝑔k ˆ 2 ≤ (48 + 4𝐾) √ + (189 + 16𝐾) 𝑠(k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) 𝑠 opt k ĝ𝑠 − 𝑔k ˆ 1 ≤ (69 + 6𝐾) 𝑔| ˆ I − ( 𝑔| ˆ I)𝑠 + (267 + 23𝐾)𝑠 (k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) , 1 albeit with probability 1 − 𝜎 ∈ [0, 1) for the Monte Carlo version. The total number of evaluations of 𝑔 and computational complexity will be ! 𝑑𝑠2𝑟 3/2 log11/2 𝑀    𝑑𝑀 O or O 𝑑𝑠𝑟 log 𝑀 log 3/2 9/2 log 𝑠 𝜎 for A2𝑠,𝑀 disc or A disc,MC respectively. 2𝑠,𝑀 Proof. 
For the definitions of 𝜏 and 𝛽 in Lemma 3.2 with associated values given by Theorem 3.2, 60 Lemma 3.1 applied with x = 𝑔| ˆ I implies that S𝛽 can contain at most 2𝑠 elements and the bound opt √ k 𝑔| ˆ I − 𝑔|ˆ S𝛽 k ℓ2 (Z𝑑 ) ≤ k 𝑔|ˆ I − ( 𝑔| ˆ I )2𝑠 k ℓ2 (Z𝑑 ) + 𝛽 2𝑠 k 𝑔| ˆ I − ( 𝑔| opt ˆ I ) 𝑠 k ℓ1 (Z𝑑 ) √ (3.10) ≤ √ + 𝛽 2𝑠 2 𝑠 holds. Note that the last inequality follows from [24, Theorem 2.5] applied to 𝑔| ˆ I )𝑠 . opt ˆ I − ( 𝑔| Lemma 3.2 then holds with 𝑠 replaced by 2𝑠 for the 2𝑠-sparse approximations given by A2𝑠,𝑀 disc or A2𝑠,𝑀 disc,MC in Algorithm 3.1. After treating the truncation error as measurement noise as well as accounting for any noise in the input bounded by 𝑒 ∞ , Theorem 3.2 gives the values ! √ k 𝑔| opt ˆ I − ( 𝑔| ˆ I )𝑠 k1 −𝑟 𝜂∞ = 3 2 + 2(k𝑔k ∞ 𝑀 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) , 2𝑠 √ 12(1 + 2) 𝜏= √ 𝜂∞ . 3 2 Assuming 𝐾 ≥ 9, !! ! ! 2 2 2 𝛽 = max 𝜏, 𝜂∞ 1 + 𝜋  = 𝜂∞ 1 + 𝜋  ≤ 𝜂∞ 1+ 𝜋 𝐾 . sin 𝐾 sin 𝐾 9 sin 9 Inserting the estimate for k 𝑔| ˆ S𝛽 k 2 from (3.10), this bound for 𝛽, and the values for 𝜂2 (where ˆ I − 𝑔| again we use [24, Theorem 2.5]) and 𝜂1 from Theorem 3.2 opt ˆ I − ( 𝑔| 77k 𝑔| ˆ I )𝑠 k1 √ 𝜂2 ≤ √ + 152 𝑠(k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) 2 𝑠 opt 𝜂1 ≤ 55 𝑔| ˆ I − ( 𝑔| ˆ I )𝑠 + 215𝑠(k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) 1 √ into the recovery bound in Lemma 3.2 and upper bounding k 𝑔ˆ − 𝑔| ˆ I k 2 by 𝑠k 𝑔ˆ − 𝑔| ˆ I k 1 gives the final error estimate. The change to the complexity of the randomized algorithm arises from distributing the proba- bility of failure 𝜎 over the 𝑑 + 1 SFTs in a union bound. Because the nonequispaced SFTs discussed in Theorem 3.1 do not approximate the discrete Fourier transform and therefore do not alias the one-dimensional frequencies k·z into frequencies in B𝑀 , slightly modifying Algorithm 3.1 to use SFTs with a larger bandwidth allows for the following recovery result. 61 Corollary 3.2 (Algorithm 3.1 with nonequispaced sublinear-time SFT). For I ⊂ B𝐾𝑑 with 𝐾 ≥ 6, fix the new bandwidth parameter 𝑀˜ := 2 maxk∈I |k · z| + 1. For Λ(z, 𝑀), a reconstructing rank-1 lattice for I with 𝑀 ≤ 𝑀, ˜ and the function 𝑔 ∈ 𝑊 (T𝑑 )∩𝐶 (T𝑑 ), we consider applying Algorithm 3.1 where each function sample may be corrupted by noise at most 𝑒 ∞ ≥ 0 in absolute magnitude with the following modifications: 1. use the sublinear-time SFT algorithm A2𝑠, sub or A sub,MC 𝑀˜ 2𝑠, 𝑀˜ 2. and only check equality against 𝜔 in Line 11 (rather than equivalence modulo 𝑀), to produce ĝ𝑠 = ( 𝑔ˆ k𝑠 )k∈B 𝑑 a 2𝑠-sparse approximation of 𝑔ˆ satisfying the error estimates 𝐾 " # opt k ˆ 𝑔| I − ( ˆ 𝑔| ) I 𝑠 k 1 √ √ k ĝ𝑠 − 𝑔kˆ ℓ2 (Z𝑑 ) ≤ (25 + 3𝐾) √ + 𝑠k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑠𝑒 ∞ , 𝑠 h i opt k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ (35 + 3𝐾) 𝑔| ˆ I − ( 𝑔| ˆ I)𝑠 ˆ I k 1 + 𝑠𝑒 ∞ + 𝑠k 𝑔ˆ − 𝑔| 1 albeit with probability 1 − 𝜎 ∈ [0, 1) for the Monte Carlo version. For A2𝑠, sub and A sub,MC respec- 𝑀˜ 2𝑠, 𝑀˜ tively, the total number of evaluations of 𝑔 and computational complexity will be 𝑑𝑠2 log4 𝑀˜ ˜      O or O 𝑑𝑠 log ( 𝑀) 3 ˜ log 𝑑 𝑀 . log 𝑠 𝜎 Proof. The bandwidth specified ensures that B𝑀˜ ⊃ {k · z | k ∈ I}. In the case where 𝑔 is a trigonometric polynomial with supp( 𝑔) ˆ ⊂ I, so long as there exists some 𝑀 ≤ 𝑀˜ such that Λ(z, 𝑀) is reconstructing for I, we are guaranteed that a length- 𝑀˜ DFT on a polynomial supported on {k·z | k ∈ I} will not suffer from aliasing collisions. Thus, by Lemma 1.2, the one-dimensional Fourier transforms truncated to B𝑀˜ coincide with length 𝑀˜ DFTs. We can therefore view an approximation from the algorithm in Theorem 3.1 as one of a length 𝑀˜ DFT. 
The reasoning in the proofs of Lemma 3.2 and Corollary 3.1 then holds with the SFT algorithms, parameters, numbers of samples, and complexities of Theorem 3.1. Remark 3.3. As in Chapter 2, (2.7) and (2.8), we can estimate 𝑀˜ above with two different tech- 62 niques: Õ Õ 𝑀˜ = 1 + 2 max 𝑘 ℓ 𝑧ℓ ≤ 1 + 2 |𝑧ℓ | max |𝑘 ℓ | = O (𝑑𝐾I 𝑀), k∈I k∈I ℓ∈[𝑑] ℓ∈[𝑑] Õ   𝑀˜ = 1 + 2 max 𝑘 ℓ 𝑧ℓ ≤ 1 + 2kzk ∞ max kkk 1 = O 𝑀 max kkk 1 . k∈I k∈I k∈I ℓ∈[𝑑] The latter case is especially useful when I is a subset of a known ℓ 1 ball as it will provide a dimen- sion independent upper bound on 𝑀. ˜ Either of these upper bounds may then be used in practice to avoid having to estimate 𝑀. ˜ That being said however, if one is willing to perform the one-time search through the frequency set I to more accurately calculate 𝑀, ˜ one can go even further to use the minimal bandwidth 𝑀˜ 0 = maxk∈I (k · z) − mink∈I (k · z) + 1 so long as the function samples are properly modulated to shift the one-dimensional frequencies into B𝑀˜ 0 . For example, running A2𝑠, sub 𝑀˜ 0 or A2𝑠, sub,MC 𝑀˜ 0 on 𝑔 1d (𝑡) = j 0k e2𝜋i𝜙𝑡 𝑔(𝑡z) and 𝑔 1d,ℓ (𝑡) = e2𝜋i𝜙𝑡 𝑆ℓ,1/𝐾 𝑔(𝑡z) with 𝜙 = 𝑀2 − maxk∈I (k · z) is acceptable so long as ˜ this shift is accounted for in the frequency check on Line 11. Note though that these improvements will only have the effect of reducing the logarithmic factors in the computational complexity. 3.3.2 Two-dimensional DFT technique Below, we will consider a method for recovering frequencies which, rather than shifting one dimension of the multivariate periodic function 𝑔 at a time, leaves one dimension of 𝑔 out at a time. We will fix one dimension ℓ ∈ [𝑑] of 𝑔 at equispaced nodes over T and apply a lattice SFT to the other 𝑑 − 1 components. Applying a standard FFT to the results will produce a two-dimensional DFT. The indices corresponding to the standard FFT will represent frequency components in di- mension ℓ while the indices corresponding to the lattice SFT will be used to synchronize with known one-dimensional frequencies k · z mod 𝑀. Note that below, we will separate out coordinate ℓ of a multivariate point x ∈ T𝑑 or frequency k ∈ Z𝑑 , denoting the remaining coordinates as x0ℓ ∈ T𝑑−1 or k0ℓ ∈ Z𝑑−1 . With a slight abuse of notation, we can rewrite the original point or frequency as x = (𝑥ℓ , x0ℓ ) or k = (𝑘 ℓ , k0ℓ ). Again, before stating Algorithm 3.2 in detail, we present an example. 63 Example 3.2 (Two-dimensional DFT technique on a trigonometric monomial). As in Example 3.1, we let 𝑔 be the trigonometric monomial 𝑔(x) := e2𝜋ik·x . However, in this example, we let 𝑑 = 3, so k ∈ I ⊂ Z3 and the domain of 𝑔 is T3 depicted in Figure 3.2. We will consider the procedure to compute the ℓ = 0 component of k. First, we take a reconstructing rank-1 lattice Λ(z, 𝑀) for I and restrict all but the first component of 𝑔 to the lattice. This produces a two-dimensional function of the form (𝑥 0 , 𝑡) ↦→ e2𝜋i(𝑘 0 𝑥0 +𝑘 1 𝑧1 𝑡+𝑘 2 𝑧2 𝑡) . We then sample this function at 𝐾 equispaced points over T in the 𝑥0 variable. This produces 𝐾 projected lattices spaced 1/𝐾 apart in the 𝑥 0 direction on which we sample 𝑔, depicted in Figure 3.2. Fixing 𝑥 0 at each equispaced point produces the 𝐾 univariate functions which are organized into the top array of Figure 3.3. Notice that colors of the entries in this array correspond to the lattices in Figure 3.2 over which we sample 𝑔 to produce that entry. The next step is to apply an SFT to each of the univariate functions in this array. 
Each function has exactly one active frequency, k_1 z_1 + k_2 z_2, with corresponding Fourier coefficient e^{2πi k_0 j/K}. Thus, collecting the results into a matrix produces the left-most matrix in Figure 3.3 with only the (k_1 z_1 + k_2 z_2 mod M)th column filled. This column contains K equispaced samples of the function e^{2πi k_0 x_0}, and so finally applying a DFT to the matrix will produce the right-most matrix in Figure 3.3. We find only one entry, in row k_0 mod K, corresponding to the only active frequency of e^{2πi k_0 x_0}. Thus, we can read off the ℓ = 0 entry of k by determining which row contains the Fourier coefficient of g of interest. Repeating this process for all ℓ = 0, …, d − 1, we will be able to recover k.

[Figure 3.2: An example of 𝕋³ depicting the K projected rank-1 lattices, spaced 1/K apart in the x_0 direction, on which g(x) is sampled to compute the ℓ = 0 component of each d-dimensional frequency.]

[Figure 3.3: One round of the basic procedure for the two-dimensional DFT algorithm applied to the trigonometric monomial g(x) = e^{2πik·x} sampled over the sets depicted in Figure 3.2. Applying the SFT A_{s,M} to the rows of the K × M array of restrictions (e^{2πi(k_0 j/K + k_1 z_1 t + k_2 z_2 t)})_{j∈[K]} fills only the column k_1 z_1 + k_2 z_2 mod M with the entries e^{2πi k_0 j/K}; applying F_K to the columns then leaves a single nonzero entry, 1, in row k_0 mod K of that column. Each row corresponds to samples of g(x) on the shifted lattice of the corresponding color.]

We now generalize the procedure demonstrated in Example 3.2 in a lemma. In particular, we must account for functions which have more than one significant frequency. For theoretical simplicity, we use a length-M DFT in the first step rather than an SFT.

Lemma 3.3. Fix some finite multivariate frequency set I ⊂ B_K^d, let Λ(z, M) be a reconstructing rank-1 lattice for {k − k_ℓ e_ℓ | k ∈ I} (where e_ℓ ∈ ℤ^d is the canonical basis vector which has (e_ℓ)_ℓ = 1 and zeros in all other entries) for all ℓ ∈ [d], and assume that g has Fourier support supp(ĝ) ⊂ I. Fixing one dimension ℓ ∈ [d] and writing the generating vector as z = (z_ℓ, z'_ℓ) ∈ ℤ^d, define the polynomials

g^{1d,ℓ}_j(t) := g((j/K, t z'_ℓ))  for all j ∈ [K],

that is, fix coordinate ℓ at j/K and restrict the remaining coordinates to dimensions [d] ∖ {ℓ} of the rank-1 lattice. Then for all one-dimensional frequencies ω ∈ [M],

(F_M g^{1d,ℓ}_j)_ω = Σ_{h_ℓ ∈ B_K s.t. (h_ℓ, k'_ℓ) ∈ I} e^{2πi j h_ℓ/K} ĝ_{(h_ℓ, k'_ℓ)}  if ∃ k ∈ I with ω ≡ k'_ℓ · z'_ℓ (mod M),

and (F_M g^{1d,ℓ}_j)_ω = 0 otherwise. Moreover, defining the matrix G^ℓ = ((F_M g^{1d,ℓ}_j)_ω)_{j∈[K], ω∈[M]}, we have

(F_K G^ℓ)_{k_ℓ mod K, k'_ℓ·z'_ℓ mod M} = ĝ_k  for all k ∈ I,

and the remaining entries of the matrix F_K G^ℓ ∈ ℂ^{K×M} are zero.

Proof. Using the Fourier series representation of g, we have

g^{1d,ℓ}_j(t) = Σ_{k∈I} ĝ_k e^{2πi(j k_ℓ/K + k'_ℓ·z'_ℓ t)}.

We calculate for ω ∈ [M]

(F_M g^{1d,ℓ}_j)_ω = (1/M) Σ_{i∈[M]} Σ_{h∈I} e^{2πi j h_ℓ/K} ĝ_h e^{2πi(h'_ℓ·z'_ℓ − ω)i/M} = Σ_{h∈I} e^{2πi j h_ℓ/K} ĝ_h δ_{0, (h'_ℓ·z'_ℓ − ω mod M)}.

When there exists some k ∈ I such that k'_ℓ·z'_ℓ ≡ ω (mod M), the fact that Λ(z, M) is a reconstructing rank-1 lattice for {k − k_ℓ e_ℓ | k ∈ I} ensures that the k'_ℓ satisfying this equivalence is unique. Then, we can simplify this sum to

(F_M g^{1d,ℓ}_j)_ω = Σ_{h_ℓ ∈ B_K s.t. (h_ℓ, k'_ℓ) ∈ I} e^{2πi j h_ℓ/K} ĝ_{(h_ℓ, k'_ℓ)}.

When no k ∈ I exists such that k'_ℓ·z'_ℓ ≡ ω (mod M), this sum is instead zero, as desired. Applying F_K to G^ℓ then allows us to compute

(F_K G^ℓ)_{k_ℓ mod K, k'_ℓ·z'_ℓ mod M} = (1/K) Σ_{j∈[K]} Σ_{h_ℓ∈B_K} ĝ_{(h_ℓ, k'_ℓ)} e^{2πi(h_ℓ − k_ℓ mod K)j/K} = ĝ_k.

Algorithm 3.2 Frequency Index Recovery by Two-Dimensional DFT
Input: A multivariate periodic function g ∈ W(𝕋^d) ∩ C(𝕋^d) (from which we are able to obtain potentially noisy samples), a multivariate frequency set I ⊂ B_K^d, a rank-1 lattice Λ(z, M) which is reconstructing for I and {k − k_ℓ e_ℓ | k ∈ I} for all ℓ ∈ [d], and an SFT algorithm A_{s,M}.
Output: Sparse coefficient vector ĝ^s = (ĝ^s_k)_{k∈B_K^d} (optionally supported on I, see Line 16), an approximation to (ĝ|_I)^{opt}_s.
1: Apply A_{s,M} to the univariate restriction of g to the lattice, g^{1d}(t) := g(tz), to produce ĝ^{1d,s} := A_{s,M} g^{1d}, a sparse approximation of F_M g^{1d} ∈ ℂ^{B_M}.
2: for all ℓ ∈ [d] do
3:   for all j ∈ [K] do
4:     Apply A_{s,M} to g^{1d,ℓ}_j(t) := g((j/K, t z'_ℓ)) to produce ĝ^{1d,ℓ,s}_j := A_{s,M} g^{1d,ℓ}_j, a sparse approximation of F_M g^{1d,ℓ}_j.
5:     Row j of G^{ℓ,s} ← ĝ^{1d,ℓ,s}_j.
6:   end for
7:   for all nonzero columns ω of G^{ℓ,s} do
8:     Apply F_K to column ω of G^{ℓ,s} to produce F_K G^{ℓ,s}.
9:   end for
10: end for
11: ĝ^s ← 0
12: for all ω ∈ supp(ĝ^{1d,s}) do
13:   for all ℓ ∈ [d] do
14:     ((k_ω)_ℓ, ∼) ← arg min{|ĝ^{1d,s}_ω − (F_K G^{ℓ,s})_{h,ω'}| : (h, ω') ∈ B_K × [M] with h z_ℓ + ω' ≡ ω mod M}
15:   end for
16:   if k_ω · z ≡ ω mod M (and optionally k_ω ∈ I) then
17:     ĝ^s_{k_ω} ← ĝ^s_{k_ω} + ĝ^{1d,s}_ω
18:   end if
19: end for

Example 3.2 and Lemma 3.3 explain the procedure in Lines 1 through 10 of Algorithm 3.2. However, some care must be taken when we assign rows of nonzero entries in the resulting matrix to coordinates of significant frequencies. The solution is the minimization problem in Line 14. This step uses column information as well as the values of the entries in the matrix to ensure that we properly match frequency components with the correct Fourier coefficient ĝ^{1d,s}_ω. The remainder of the algorithm is the same as Algorithm 3.1. Line 16 consists of the same check to ensure that recovered frequencies are correct, and if this check passes, the one-dimensional Fourier coefficient is assigned to its matched d-dimensional frequency.

Remark 3.4. We bring special attention to the fact that Algorithm 3.2 requires as input a rank-1 lattice Λ(z, M) which is reconstructing not only for I, but also for the projections of I of the form {k − k_ℓ e_ℓ | k ∈ I} for any ℓ ∈ [d]. For frequency sets I which are downward closed (that is, if I is such that for any k ∈ I and h ∈ ℤ^d, |h| ≤ |k| component-wise implies that h ∈ I), any reconstructing rank-1 lattice for I is necessarily one for the considered projections as well. Thus, for many frequency spaces of interest, e.g., hyperbolic crosses (cf. Remarks 3.2 and 3.3 as well as Section 3.4 below), any reconstructing rank-1 lattice for I will suffice as input to Algorithm 3.2.

3.3.2.1 Analysis of Algorithm 3.2

With the conceptual explanation of Algorithm 3.2 complete, we now provide error guarantees for its output.

Lemma 3.4 (General recovery result for Algorithm 3.2). Let g, I, and Λ(z, M) be as specified in the input to Algorithm 3.2.
Additionally, let A 𝑠,𝑀 be a noise-robust SFT algorithm satisfying the same constraints as in Lemma 3.2 with parameters 𝜏 and 𝜂∞ holding uniformly for each SFT performed in Algorithm 3.2. Collect the 𝜏-significant frequencies of 𝑔 into the set S𝜏 := {k ∈ I | | 𝑔ˆ k | > 𝜏} and assume that |S𝜏 | ≤ 𝑠. Then Algorithm 3.2 (ignoring the optional check on Line 16) will produce an 𝑠-sparse approximation of the Fourier coefficients of 𝑔 satisfying the error estimates p k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ 𝜂2 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + k 𝑔| ˆ I − 𝑔| ˆ S4𝜏 k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ2 (Z𝑑 ) k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ 𝜂1 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + 𝑔| ˆ I − 𝑔| ˆ S4𝜏 ℓ 1 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ1 (Z𝑑 ) . requiring O (𝑑𝐾 · 𝑃(𝑠, 𝑀)) total evaluations of 𝑔, in O (𝑑𝐾 (𝑅(𝑠, 𝑀) + 𝑠𝐾 log 𝐾)) total operations. Proof. We begin by assuming that 𝑔 is a trigonometric polynomial with supp( 𝑔) ˆ ⊂ I. Since Λ(z, 𝑀) is a reconstructing rank-1 lattice for I, the DFT-aliasing ensures that Line 1 of Algo- rithm 3.2 will return approximate coefficients uniquely corresponding to all 𝜏-significant frequen- cies k ∈ S𝜏 which we can label 𝑔ˆ k·z 1d,𝑠 mod 𝑀 . Additionally, Line 4 recovers approximations to all 𝜏-significant frequencies of F 𝑀 g1d,ℓ 𝑗 which have the form given in Lemma 3.3. In particular, if 68 k ∈ S𝜏 , we have   𝜏 < | 𝑔ˆ k | = F𝐾 Gℓ 𝑘 ℓ mod 𝐾,kℓ0 ·zℓ0 mod 𝑀 1 Õ   −2 𝜋i 𝑗 𝑘ℓ mod 𝐾 = F 𝑀 g1d,ℓ 𝑗 e 𝐾 𝐾 kℓ0 ·zℓ0 mod 𝑀 𝑗 ∈[𝐾] 1 Õ   ≤ F 𝑀 g1d,ℓ 𝑗 𝐾 kℓ0 ·zℓ0 mod 𝑀 𝑗 ∈[𝐾] ≤ max (F 𝑀 g1d,ℓ 𝑗 )kℓ ·zℓ mod 𝑀 . 0 0 𝑗 ∈[𝐾] Thus, there exists at least one F 𝑀 g1d,ℓ 𝑗 with k0ℓ · z0ℓ mod 𝑀 recovered as a 𝜏-significant frequency in the SFT of Line 4, and k0ℓ · z0ℓ mod 𝑀 will be a nonzero column in Gℓ,𝑠 for all k ∈ S𝜏 . Analyzing these SFTs in more detail for any k ∈ I such that k0ℓ · z0ℓ mod 𝑀 is a nonzero column of Gℓ,𝑠 , we write       ĝ1d,ℓ,𝑠 𝑗 = F 𝑀 g1d,ℓ 𝑗 + 𝜂ℓ𝑗 kℓ0 ·zℓ0 mod 𝑀 kℓ0 ·zℓ0 mod 𝑀 kℓ0 ·zℓ0 mod 𝑀 where, by the ℓ ∞ and recovery guarantees for A 𝑠,𝑀 , the error satisfies    if ĝ1d,ℓ,𝑠  ≠0   𝜂∞      𝑗 kℓ0 ·zℓ0 mod 𝑀 𝜂ℓ𝑗 ≤ ≤ 𝜏. kℓ0 ·zℓ0 mod 𝑀   if ĝ1d,ℓ,𝑠   𝜏  𝑗 =0 kℓ0 ·zℓ0 mod 𝑀   Thus, in the application of F𝐾 to column k0ℓ · z0ℓ mod 𝑀 of Gℓ,𝑠 , we have   ℓ,𝑠 F𝐾 G 𝑘 ℓ mod 𝐾,kℓ0 ·zℓ0 mod 𝑀    !   = F𝐾 Gℓ + F𝐾 𝜂ℓ𝑗 𝑘 ℓ mod 𝐾,kℓ0 ·zℓ0 mod 𝑀 kℓ0 ·zℓ0 mod 𝑀 𝑗 ∈[𝐾] 𝑘 ℓ mod 𝐾 =: 𝑔ˆ k + 𝜂kℓ with 1 Õ  ℓ −2 𝜋i 𝑗 𝑘ℓ mod 𝐾   |𝜂kℓ | = 𝜂𝑗 0 0 e 𝐾 ≤ max 𝜂ℓ𝑗 0 0 ≤ 𝜏. 𝐾 kℓ ·zℓ mod 𝑀 𝑗 ∈[𝐾] kℓ ·zℓ mod 𝑀 𝑗 ∈[𝐾] 69 These same calculations apply to the computed columns of F𝐾 Gℓ,𝑠 which do not correspond to values of k0ℓ · z0ℓ mod 𝑀 for some k ∈ I since we assume supp( 𝑔) ˆ ⊂ I, and so at worst, these columns are filled with noise bounded in magnitude by 𝜏. Restricting our attention to k ∈ S4𝜏 ⊂ S𝜏 , we know that Line 14 will be run with 𝜔 = k · z mod 𝑀 and (𝑘 ℓ mod 𝐾, k0ℓ ·z0ℓ mod 𝑀) as an admissible index in the minimization. By the reconstructing property of Λ(z, 𝑀), no other h ∈ I will correspond to an admissible index (ℎℓ mod 𝐾, h0ℓ · z0ℓ mod 𝑀), and so the only remaining values of (F𝐾 Gℓ,𝑠 ) ℎ,𝜔0 in the minimization correspond to pure noise 𝜂 bounded in magnitude by 𝜏. Analyzing the objective at (𝑘 ℓ mod 𝐾, k0ℓ · z0ℓ mod 𝑀), we find 1d,𝑠 1d,𝑠 | 𝑔ˆ k·z mod 𝑀 − (F𝐾 Gℓ,𝑠 ) 𝑘 ℓ mod 𝐾,kℓ0 ·zℓ0 mod 𝑀 | ≤ 2𝜏 < | 𝑔ˆ k | − 2𝜏 ≤ | 𝑔ˆ k·z mod 𝑀 − 𝜂|, and so the value for (𝑘 𝜔 )ℓ will in fact be assigned 𝑘 ℓ . Thus, after all 𝑑 components of k𝜔 = k have been recovered, 𝑔ˆ k𝑠 will be assigned 𝑔ˆ k·z 1d,𝑠 mod 𝑀 . 
The remaining max(𝑠 − |S4𝜏 |, 0) nonzero entries of ĝ1d,𝑠 can be distributed to entries of ĝ𝑠 possibly correctly but with no guarantee; at the very least however, these values must be at most 4𝜏+ 𝜂∞ in magnitude. We split ĝ𝑠 as ĝ𝑠 = ĝ𝑠,correct + ĝ𝑠,incorrect to account for the values of ĝ𝑠 respectively assigned correctly and incorrectly and note that supp( ĝ𝑠,correct ) ⊃ S4𝜏 . We then estimate the ℓ 2 error as k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ k ĝ𝑠,correct − 𝑔| ˆ supp(ĝ𝑠,correct ) k ℓ2 (Z𝑑 ) + k ĝ𝑠,incorrect k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔|ˆ supp(ĝ𝑠,correct ) k ℓ2 (Z𝑑 ) p ≤ 𝜂2 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + k 𝑔ˆ − 𝑔| ˆ S4𝜏 k ℓ2 (Z𝑑 ) and the ℓ 1 error as k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ ĝ𝑠,correct − 𝑔| ˆ supp(ĝ𝑠,correct ) ℓ 1 (Z𝑑 ) + ĝ𝑠,incorrect ℓ 1 (Z𝑑 ) + 𝑔ˆ − 𝑔| ˆ supp(ĝ𝑠,correct ) ℓ 1 (Z𝑑 ) ≤ 𝜂1 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + 𝑔ˆ − 𝑔| ˆ S4𝜏 ℓ 1 (Z𝑑 ) . As in the proof of Lemma 3.2, we note that the mandatory check in Line 16 helps ensure that all misassigned values 𝑔ˆ 𝜔1d,𝑠 which contribute to ĝ𝑠,incorrect correspond to reconstructed k𝜔 outside of I, with the optional check in this line (see Remark 3.2) eliminating ĝ𝑠,incorrect and the corresponding term in the error estimate entirely. 70 Now, supposing that the Fourier support of 𝑔 is not limited to only I, just as in the analysis for Algorithm 3.1, we treat 𝑔 as a perturbation of 𝑔| I , and use the robust SFT algorithm and the previous argument to approximate 𝑔| ˆ I . Note again that in each SFT, the noise added when using measurements of 𝑔 as proxies for those of 𝑔| I is compounded by k𝑔| Z𝑑 \I k ∞ and is bounded by ˆ I k ℓ1 (Z𝑑 ) . Applying the guarantees above gives the ℓ 2 estimate k 𝑔ˆ − 𝑔| k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ k ĝ𝑠 − 𝑔| ˆ I k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ2 (Z𝑑 ) p ≤ 𝜂2 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + k 𝑔| ˆ I − 𝑔| ˆ S4𝜏 k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ2 (Z𝑑 ) and the ℓ 1 estimate k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ k ĝ𝑠 − 𝑔| ˆ I k ℓ1 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ1 (Z𝑑 ) ≤ 𝜂1 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + 𝑔| ˆ I − 𝑔| ˆ S4𝜏 ℓ 1 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ1 (Z𝑑 ) . Employing fast Fourier transforms for the at most 𝑑𝑠𝐾 DFTs, the computational complexity of Lines 2 to 10 is O 𝑑 (𝐾 · 𝑅(𝑠, 𝑀) + 𝑠𝐾 2 log 𝐾) (which dominates the complexity of the remainder  of the algorithm). Since 1 + 𝑑𝐾 SFTs are required, the number of 𝑔 evaluations is O (𝑑𝐾 · 𝑃(𝑠, 𝑀)). Remark 3.5. Though the number of nonzero columns of Gℓ,𝑠 can be theoretically at most 𝑠𝐾, in practice with a high quality algorithm, each of the 𝐾 SFTs should recover nearly the same frequen- cies, meaning that there are actually O (𝑠) columns. This would remove a power of 𝐾 in the second term of the runtime estimate. Note however, that even with near exact SFT algorithms, recovering exactly 𝑠 total frequencies is not a certainty. There can be cancellations for certain terms in F 𝑀 g1d,ℓ 𝑗 depending interactions between the coefficients sharing the same values on their [𝑑] \ {ℓ} entries, which makes it possible that an SFT on F 𝑀 g1d,ℓ 𝑗 will miss coefficients. If required to output 𝑠-entries, an SFT algorithm could favor some noisy value corresponding to a frequency outside the support. Remark 3.6. Though we perform an exact FFT of the nonzero columns of G1d,ℓ in Line 8 of Algo- rithm 3.2, Lemma 3.3 implies that the resulting matrix will be as sparse as the original function’s Fourier transform. Thus, for a truly compressible function, an SFT down the columns of G1d,ℓ 71 would be feasible as well. However, in especially higher dimensions, even small 𝐾 can allow for large frequency spaces I. 
In these large frequency spaces, what is perceived as relatively sparse can therefore quickly surpass 𝐾, rendering an 𝑠-sparse, length 𝐾 SFT useless. As a simple example, consider I to be the cube of side length 𝐾 = 𝑠, B𝑠𝑑 . For 𝑑 large enough, any frequency support of size 𝑠 will be small in comparison to |I| = 𝑠 𝑑 . However, using an 𝑠-sparse SFT instead of a length-𝑠 DFT in Algorithm 3.2 will actually be more expensive. Applying the discrete sublinear-time SFT from Theorem 3.2 to Lemma 3.4 analogously to the derivation of Corollary 3.1 from Lemma 3.2 allows for the following recovery bound for Algo- rithm 3.2. In particular, we observe asymptotically improved error guarantees over Corollary 3.1 at the cost of a slight increase in runtime. Corollary 3.3 (Algorithm 3.2 with discrete sublinear-time SFT). For I ⊂ Z𝑑 with reconstructing rank-1 lattice Λ(z, 𝑀) and the function 𝑔 ∈ 𝑊 (T𝑑 ) ∩ 𝐶 (T𝑑 ), we consider applying Algorithm 3.2 where each function sample may be corrupted by noise at most 𝑒 ∞ ≥ 0 in absolute magnitude. Using the discrete sublinear-time SFT algorithm A2𝑠,𝑀 disc or A disc,MC with parameter 1 ≤ 𝑟 ≤ 2𝑠,𝑀 𝑀 36 will produce ĝ𝑠 = ( 𝑔ˆ k𝑠 )k∈B 𝑑 a 2𝑠-sparse approximation of 𝑔ˆ satisfying the error estimates 𝐾 opt k 𝑔| ˆ I − ( 𝑔| ˆ I )𝑠 k1 √ 𝑠 k ĝ − 𝑔kˆ ℓ2 (Z𝑑 ) ≤ 206 √ + 821 𝑠(k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) 𝑠 opt k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ 293 𝑔|ˆ I − ( 𝑔| ˆ I )𝑠 + 1161𝑠 (k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) 1 albeit with probability 1 − 𝜎 ∈ [0, 1) for the Monte Carlo version. The total number of evaluations of 𝑔 and the computational complexity will be !! 𝑠𝑟 3/2 log11/2 𝑀 O 𝑑𝑠𝐾 + 𝐾 log 𝐾 log 𝑠      𝑑𝐾 𝑀 or O 𝑑𝑠𝐾 𝑟 log (𝑀) log 3/2 9/2 + 𝐾 log 𝐾 𝜎 for A2𝑠,𝑀 disc or A disc,MC respectively. 2𝑠,𝑀 Again, the same strategy from Corollary 3.2 of widening the frequency band and shifting the one-dimensional transforms accordingly allows us to use the nonequispaced SFT algorithm from Theorem 3.1 in Algorithm 3.2. Note here that the widening and shifting occurs on a dimension by 72 dimension basis so as to account for the differing one-dimensional frequencies of the form k0ℓ · z0ℓ for k ∈ I. Corollary 3.4 (Algorithm 3.2 with nonequispaced sublinear-time SFT). For I ⊂ B𝐾𝑑 , let 𝑀˜ be the larger one-dimensional bandwidth parameter from Corollary 3.2, and additionally define 𝑀˜ ℓ := 2 maxk∈I |k0ℓ · z0ℓ | + 1. For Λ(z, 𝑀), a reconstructing rank-1 lattice for I and where 𝑀 is such that 𝑀 ≤ min{ 𝑀, ˜ minℓ∈[𝑑] 𝑀˜ ℓ }, for the function 𝑔 ∈ 𝑊 (T𝑑 ) ∩ 𝐶 (T𝑑 ), we consider applying Algorithm 3.2 where each function sample may be corrupted by noise at most 𝑒 ∞ ≥ 0 in absolute magnitude with the following modifications: 1. use the sublinear-time SFT algorithm A2𝑠, sub or A sub,MC in Line 1 and A sub 𝑀˜ 2𝑠, 𝑀˜ 2𝑠, 𝑀˜ ℓ or A2𝑠, sub,MC 𝑀˜ ℓ in Line 4 2. and only check equality against 𝜔 in Line 14 (rather than equivalence modulo 𝑀), to produce ĝ𝑠 = ( 𝑔ˆ k𝑠 )k∈B 𝑑 a 2𝑠-sparse approximation of 𝑔ˆ satisfying the error estimates 𝐾 ! opt k ˆ 𝑔| I − ( ˆ 𝑔| I 𝑠) k 1 √ √ k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ 98 √ + 𝑠k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑠𝑒 ∞ 𝑠   opt k ĝ𝑠 − 𝑔kˆ ℓ1 (Z𝑑 ) ≤ 139 𝑔| ˆ I − ( 𝑔| ˆ I )𝑠 ˆ I k 1 + 𝑠𝑒 ∞ , + 𝑠k 𝑔ˆ − 𝑔| 1 albeit with probability 1 − 𝜎 ∈ [0, 1) for the Monte Carlo version. Letting 𝑀¯ = max( 𝑀, ˜ maxℓ∈[𝑑] 𝑀˜ ℓ ), the total number of evaluations of 𝑔 will be 𝑑𝐾 𝑠2 log4 𝑀¯ 𝑑𝐾 𝑀¯      O or O 𝑑𝐾 𝑠 log 𝑀 log 3 ¯ log 𝑠 𝜎 with associated computational complexities 𝑠 log4 𝑀¯ 𝑑𝐾 𝑀¯         O 𝑑𝐾 𝑠 + 𝐾 log 𝐾 or O 𝑑𝐾 𝑠 log 𝑀 log 3 ¯ + 𝐾 log 𝐾 log 𝑠 𝜎 for A2𝑠,· sub and A sub,MC respectively. 2𝑠,· Remark 3.7. 
The bounds in Remark 3.3 will still hold for M̃_ℓ as well; thus, one of these upper bounds can be used as the effective bandwidth parameter for every SFT without having to calculate the d + 1 bandwidths by scanning I. Again, however, if this scan is tolerable, one can reduce the overall complexity by using the analogous minimal bandwidths discussed in Remark 3.3 along with corresponding frequency shifts.

3.4 Numerics

We now demonstrate the effectiveness of our phase encoding and two-dimensional DFT algorithms for computing Fourier coefficients of multivariate functions in a series of empirical tests. The two techniques are implemented in MATLAB, with the code for the algorithms and tests in this section publicly available¹. The results below use a MATLAB implementation² of the randomized univariate sublinear-time nonequispaced algorithm A^{sub,MC}_{2s,M} (cf. Theorem 3.1) as the underlying SFT for both multivariate approaches, as this allows for the fastest runtimes and most sample-efficient implementations. In the univariate code, all parameters but one are qualitatively tuned below theoretical upper bounds to increase efficiency while maintaining accuracy, and they are kept constant between the tests below. In particular, we fix the values C := 1, sigma := 2/3, and primeShift := 0 (see the documentation and the original paper [37] for more detail). The only parameter we vary is "randomScale", which affects the rate at which the deterministic algorithm A^{sub}_{2s,M} is randomly sampled to produce the Monte Carlo version A^{sub,MC}_{2s,M}. This parameter represents a multiplicative scaling on logarithmic factors of the bandwidth which determines how many prime numbers are randomly selected from those used in the deterministic SFT implementation. Therefore, lower values of "randomScale" will result in using fewer prime numbers, decreasing the number of function samples and overall runtime at the risk of a higher probability of failure. We consider values well below the code default and theoretical upper bound of 21 given in [37].

¹available at https://gitlab.com/grosscra/Rank1LatticeSparseFourier
²available at https://gitlab.com/grosscra/SublinearSparseFourierMATLAB

3.4.1 Exactly sparse case

To begin, we consider the case of multivariate trigonometric polynomials with frequencies supported within hyperbolic cross index sets. We define the d-dimensional hyperbolic cross frequency set

H_K^d := {k ∈ ℤ^d : ∏_{ℓ∈[d]} max(1, |k_ℓ|) ≤ K/2 and max_{ℓ∈[d]} k_ℓ < K/2} ⊂ B_K^d,

where the second condition ensures that H_K^d is of expansion K ∈ ℕ. For a given sparsity s, we choose s many frequencies uniformly at random from H_K^d, and we randomly draw corresponding Fourier coefficients ĝ_k from [−1, 1] + i[−1, 1] with |ĝ_k| ≥ 10^{−3}. For each parameter setting, we perform the tests 100 times. Over these tests, we determine the success rate as the percentage of runs in which all frequencies were correctly identified in the output. We focus on frequency identification since this is the core issue that Algorithms 3.1 and 3.2 solve, with the coefficient estimates carrying over directly from the SFT algorithm. Moreover, with the s most significant frequencies identified, any alternative method for quickly computing the corresponding Fourier coefficients (if those from A_{s,M} are not tolerable) can be performed. Nevertheless, see the experiments following those in Section 3.4.1.1 for examples where we compute ℓ² errors in the coefficient vectors rather than just comparing frequencies.
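As an aside, the definition above also makes the optional membership check from Remark 3.2 concrete: testing k ∈ H_K^d costs only O(d) operations. The following MATLAB one-liner is a minimal illustration of ours, independent of the published test code.

```matlab
% O(d) membership test for the hyperbolic cross H_K^d defined above
inHC = @(k, K) prod(max(1, abs(k))) <= K/2 && max(k) < K/2;

inHC([2 -2 2 -1 zeros(1, 6)], 33)   % product 2*2*2*1*...*1 = 8 <= 16.5: true
inHC([2  2 2  2 2 zeros(1, 5)], 33) % product 2^5 = 32 > 16.5:        false
```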
3.4.1.1 Randomized frequency sets within the 10-dimensional hyperbolic cross and high- dimensional full cuboids We set the spatial dimension 𝑑 := 10, the expansion 𝐾 := 33, and use I := H33 10 as set of possible frequencies with cardinality |I| = 45 548 649. Then, the rank-1 lattice with generating vector z :=(1, 33, 579, 3 628, 21 944, 169 230, (3.11) > 1 105 193, 7 798 320, 49 768 670, 320 144 128) and lattice size 𝑀 := 2 040 484 044 is a reconstructing one. We apply Algorithm 3.1 and Algo- rithm 3.2 with the SFT algorithm A2𝑠, sub,MC 𝑀˜ . In Figure 3.4a, the success rate over 100 test runs is plotted against the sparsity values 𝑠 ∈ {10, 20, 50, 100, 200, 500, 1000} for Algorithm 3.1 and 𝑠 ∈ {10, 20, 50, 100} for Algorithm 3.2. In Figure 3.4b, the average numbers of samples over 100 tests are reported. The magenta line with circles corresponds to Algorithm 3.1 with bandwidth parameter 𝑀˜ = 𝑑𝐾 𝑀 ≈ 6.7 · 1011 and randomScale = 0.3. We observe that the number of samples grow nearly linearly with respect to the sparsity 𝑠. Moreover, the success rate is at least 0.99 (99 out of 100 test runs), where we define success such that the support of output (sparse coefficient vector) contains the true frequencies. Next, we reduce the bandwidth 𝑀˜ to 1 + 2kzk ∞ maxk∈I kkk 1 ≈ 1.6 · 1010 (see also Remark 3.3) 75 𝑀 ∼𝑠 phase bwℓ ∞ rs=0.3 phase bwℓ 1 rs=0.3 phase bwℓ ∞ rs=0.3 phase bwℓ 1 rs=0.3 phase bwℓ 1 rs=0.5 2dim bwℓ 1 rs=0.3 phase bwℓ 1 rs=0.5 2dim bwℓ 1 rs=0.3 2dim bwℓ 1 rs=0.5 2dim bwℓ 1 rs=0.5 1.00 109 success rate samples 0.95 108 107 0.90 10 20 50 100 200 500 1,000 10 20 50 100 200 500 1,000 sparsity 𝑠 sparsity 𝑠 (a) Success rates vs. sparsity 𝑠. (b) Samples vs. sparsity 𝑠. Figure 3.4 Success rates and average number of samples over 100 test runs for Algorithm 3.1 with sub,MC A2𝑠, 𝑀˜ , denoted by “phase”, and Algorithm 3.2 with A2𝑠, sub,MC 𝑀˜ , denoted by “2dim”, on random multivariate trigonometric polynomials, setting randomScale := rs. Random frequencies are cho- sen from hyperbolic cross I := H33 10 . “bwℓ ∞ ” and “bwℓ 1 ” respectively correspond to the bandwidth parameters 𝑀˜ = 𝑑𝐾 𝑀 with approximate value 6.7 · 1011 and 𝑀˜ = 1 + 2kzk ∞ maxk∈I kkk 1 with approximate value 1.6 · 1010 . and visualize this as solid blue line with squares. This smaller bandwidth causes a decrease in the number of samples of up to 50 percent while only mildly decreasing the success rates to values not below 0.90. Increasing the randomScale parameter to 0.5, denoted by dashed blue line with squares, raises the success rate to 1.00 while achieving still fewer samples than bandwidth parameter 𝑀˜ = 𝑑𝐾 𝑀 ≈ 6.7·1011 and randomScale = 0.3 (solid magenta line with circles). The numbers of samples for Algorithm 3.2 are plotted as solid and dashed red lines with triangles for randomScale = 0.3 and 0.5, respectively, choosing the bandwidth 𝑀˜ := 1 + 2kzk ∞ maxk∈I kkk 1 ≈ 1.6 · 1010 . We observe that Algorithm 3.2 requires a much larger number of samples, more than one order of magnitude, compared to Algorithm 3.1, while achieving similar success rates. For comparison, in the case of sparsity 𝑠 = 100 and randomScale = 0.5, Algorithm 3.2 takes almost 𝑀 = 2 040 484 044 samples, the number to use a non-SFT, standard rank-1 lattice FFT. 76 phase rs=0.3 2dim rs=0.3 ∼𝑑 ∼ 𝑑2 phase rs=0.3 2dim rs=0.3 1.00 success rate samples 109 0.99 0.98 108 10 11 12 13 14 15 16 17 18 19 20 10 11 12 13 14 15 16 17 18 19 20 dimension 𝑑 dimension 𝑑 (a) Success rate vs. spatial dimension 𝑑. (b) Samples vs. spatial dimension 𝑑. 
[Figure 3.5: Success rates (a) and average number of samples (b) vs. spatial dimension d over 100 test runs for Algorithm 3.1 with the SFT algorithm A^{sub,MC}_{2s,M̃}, denoted by "phase", and Algorithm 3.2 with A^{sub,MC}_{2s,M̃}, denoted by "2dim", on random multivariate trigonometric polynomials, setting randomScale := rs. Random frequencies are chosen from a full cuboid of cardinality |I| ≈ 10^12 with lattice size M = |I| and bandwidth parameter M̃ = M.]

In Figure 3.5b, we investigate the dependence of the required number of samples of Algorithms 3.1 and 3.2 on the spatial dimension d, where we consider the values d ∈ {10, 11, …, 20}. As before, the success rates are reported in Figure 3.5a. For this, we use a slightly different setting, where we choose s = 100 random frequencies from a full cuboid of cardinality ≈ 10^12. Note that a cuboid with edge lengths K_1, K_2, …, K_d has the rank-1 lattice construction

z = (1, K_1, K_1·K_2, …, K_1·K_2⋯K_{d−1}) = (∏_{j∈[ℓ]} K_j)_{ℓ∈[d]}

with lattice size M = ∏_{ℓ∈[d]} K_ℓ = |I|. The main benefit of this construction is that the map k ↦ k·z is a bijection between I and B_M. Thus, the one-dimensional bandwidth parameter M̃ = 2 max_{k∈I} |k·z| + 1 (which is usually larger than M) in this case coincides with M = |I|. By choosing cuboids in this experiment which have approximately the same cardinality, we remove any dependence on M̃ in our experiments, allowing us to focus on the dependence on d.

In our examples, the cuboids are constructed by manually tuning the edge lengths for each dimension so that the total cardinality is ≈ 10^12. One way to start this procedure is by computing (10^12)^{1/d} and then choosing d edge lengths that approximately average to this value. From here, the edge lengths can be qualitatively tweaked to arrive at a cuboid of the desired size. For instance, we utilize the cuboid I := {−8, −7, …, 7}^9 × {−7, −6, …, 7}, |I| ≈ 1.03·10^12, in the case d = 10 and I := {−2, −1, …, 2} × {−2, −1, 0, 1}^18 × {−1, 0, 1}, |I| ≈ 1.03·10^12, for d = 20. Since the expansion K is a factor in the number of samples of Algorithm 3.2 (cf. Corollary 3.4) and we want to concentrate on the dependence on the spatial dimension d, we now fix this parameter to K := 16 independent of d. Moreover, the randomScale parameter is set to 0.3. The plots indicate that the number of samples grows approximately linearly with respect to the dimension d, as stated by Corollaries 3.2 and 3.4 for Algorithms 3.1 and 3.2, respectively. The success rates are slightly better compared to the tests from Figure 3.4a.

3.4.1.2 Random frequency sets within the 10-dimensional hyperbolic cross and noisy samples

In this section, we again consider random multivariate trigonometric polynomials with frequencies supported within the hyperbolic cross index set H_33^10 of expansion K = 33 and use the reconstructing rank-1 lattice with generating vector z as stated in (3.11) and size M := 2 040 484 044. Similarly as in [43, Section 5.2], we perturb the samples of the trigonometric polynomial by additive complex (white) Gaussian noise ε_j ∈ ℂ with zero mean and standard deviation σ. The noise is generated by ε_j := (σ/√2)(ε_{1,j} + i ε_{2,j}), where ε_{1,j}, ε_{2,j} are independent standard normally distributed. Since the signal-to-noise ratio (SNR) can be approximately computed by

SNR ≈ (Σ_{j=0}^{M−1} |g(x_j)|²/M) / (Σ_{j=0}^{M−1} |ε_j|²/M) ≈ (Σ_{k∈supp(ĝ)} |ĝ_k|²) / σ²,

this leads to the choice σ := √(Σ_{k∈supp(ĝ)} |ĝ_k|² / SNR) for a targeted SNR value.
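In code, this noise model amounts to only a few lines; the MATLAB sketch below is ours, with ghat and samples standing in for the test polynomial's true coefficients and its lattice samples.

```matlab
% Additive complex white Gaussian noise scaled to a targeted linear SNR
SNR = 100;                                     % targeted linear SNR
sigma = sqrt(sum(abs(ghat).^2)/SNR);           % sigma from the formula above
noise = sigma/sqrt(2)*(randn(size(samples)) + 1i*randn(size(samples)));
noisySamples = samples + noise;
```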
The SNR is often expressed on the logarithmic decibel scale (dB), with SNR_dB = 10 log_10 SNR and SNR = 10^{SNR_dB/10}; i.e., a linear SNR = 10² corresponds to a logarithmic SNR_dB = 20, and SNR = 10³ corresponds to SNR_dB = 30. Here, our tests use sparsity s = 100 and signal-to-noise ratios SNR_dB ∈ {0, 5, 10, 15, 20, 25, 30}. Moreover, we only use the bandwidth parameter M̃ = 1 + 2‖z‖_∞ max_{k∈I} ‖k‖_1 ≈ 1.6·10^10. Besides that, we choose the algorithm parameters as in Figure 3.4.

[Figure 3.6: Average success rates (all frequencies detected, panel (a)) and relative ℓ² errors (panel (b)) vs. noise level for s = 100 over 100 test runs for Algorithm 3.1 with A^{sub,MC}_{2s,M̃}, denoted by "phase", and Algorithm 3.2 with A^{sub,MC}_{2s,M̃}, denoted by "2dim", on random multivariate trigonometric polynomials supported on the hyperbolic cross I := H_33^10, setting randomScale := rs ∈ {0.3, 0.5} and using the bandwidth parameter M̃ = 1 + 2‖z‖_∞ max_{k∈I} ‖k‖_1 ≈ 1.6·10^10.]

In Figure 3.6a, we visualize the success rates depending on the noise level. For randomScale ∈ {0.3, 0.5} and both algorithms, the success rates start at less than 0.12 for SNR_dB = 0 and grow for increasing signal-to-noise ratios until reaching at least 0.90 for SNR_dB = 30. The success rates of Algorithm 3.2 with A^{sub,MC}_{2s,M̃} ("2dim") are often higher than for Algorithm 3.1 with A^{sub,MC}_{2s,M̃} ("phase"), which may be caused by the larger number of samples for Algorithm 3.2 and the noise model used. Note that the numbers of samples correspond to those in Figure 3.4b for s = 100, independent of the noise level. For Algorithm 3.2 with randomScale = 0.3, the increase of the success rate seems to stagnate at SNR_dB = 20, while this does not seem to be the case for randomScale = 0.5 or Algorithm 3.1. This behavior can also be observed in Figure 3.6b, where we plot the average relative ℓ² error of the Fourier coefficients against the signal-to-noise ratio. Here, we observe that for randomScale = 0.3, the decrease of the errors for increasing SNR_dB values almost stops once reaching SNR_dB = 20 for both algorithms. Initially, the average error of Algorithm 3.2 is smaller, but at SNR_dB = 15 and higher, the average error of Algorithm 3.1 is smaller. In the case of randomScale = 0.5, we observe a distinct decrease for growing signal-to-noise ratios for both algorithms.

3.4.1.3 Deterministic frequency set within the 10-dimensional hyperbolic cross and noisy samples

Next, instead of randomly chosen frequencies, we now consider frequencies on a d-dimensional weighted hyperbolic cross

H_K^{d,α} := {k ∈ ℤ^d : ∏_{ℓ∈[d]} max(1, (ℓ+1)^α |k_ℓ|) ≤ K/2 and max_{ℓ∈[d]} k_ℓ < K/2}.

Here, we use d = 10, K = 33, I := H_33^10, and α = 1.7, which yields s = |H_33^{10,1.7}| = 101. Again, the Fourier coefficients ĝ_k are randomly chosen from [−1, 1] + i[−1, 1] with |ĝ_k| ≥ 10^{−3}. We use the same lattice and bandwidth parameter as in the last subsection as well as the same noise model and parameters.

[Figure 3.7: Average success rates (all frequencies detected, panel (a)) and relative ℓ² errors (panel (b)) vs. noise level for s = 100 over 100 test runs for Algorithm 3.1 with A^{sub,MC}_{2s,M̃}, denoted by "phase", and Algorithm 3.2 with A^{sub,MC}_{2s,M̃}, denoted by "2dim", on multivariate trigonometric polynomials with (deterministic) frequencies on a weighted hyperbolic cross within the hyperbolic cross I := H_33^10, setting randomScale := rs ∈ {0.3, 0.5} and bandwidth parameter M̃ = 1 + 2‖z‖_∞ max_{k∈I} ‖k‖_1 ≈ 1.6·10^10.]
In Figure 3.7, we depict the obtained results. In particular, the results in Figure 3.7a are very similar to the ones for randomly chosen frequencies in Figure 3.6a. For the case of deterministic frequencies in Figure 3.7a, the success rates are slightly better. Moreover, we do not observe the "stagnation" of the success rates for Algorithm 3.2 with randomScale = 0.3. Correspondingly, the relative ℓ² errors, as shown in Figure 3.7b, decrease distinctly for growing signal-to-noise ratios. Algorithm 3.2 performs slightly better than Algorithm 3.1, but also requires more than one order of magnitude more samples, similar to the results shown in Figure 3.4b for s = 100.

3.4.2 Compressible case in 10 dimensions

In this section, we apply the methods to a test function which is not exactly sparse but compressible. In addition, we also consider noisy samples as in Section 3.4.1.2. We use the 10-variate periodic test function g : 𝕋^10 → ℝ,

g(x) := ∏_{ℓ∈{0,2,7}} K_2(x_ℓ) + ∏_{ℓ∈{1,4,5,9}} K_4(x_ℓ) + ∏_{ℓ∈{3,6,8}} K_6(x_ℓ),    (3.12)

from [59, Section 3.3] and [43, Section 5.3], which has infinitely many non-zero Fourier coefficients ĝ_k, where K_m : 𝕋 → ℝ is the B-spline of order m ∈ ℕ,

K_m(x) := C_m Σ_{k∈ℤ} sinc(πk/m)^m (−1)^k e^{2πikx},

with a constant C_m > 0 such that ‖K_m‖_{L²(𝕋)} = 1. We remark that each B-spline K_m of order m ∈ ℕ is a piecewise polynomial of degree m − 1. We apply Algorithm 3.1 with A^{sub,MC}_{2s,M̃} and use the sparsity parameters s ∈ {50, 100, 250, 500, 1000, 2000}, which corresponds to 2s ∈ {100, 200, 500, 1000, 2000, 4000}-many frequencies and Fourier coefficients in the output of Algorithm 3.1. We use the frequency set I := H_33^10 and randomScale := rs ∈ {0.05, 0.1}. Moreover, we work with the same rank-1 lattice as in Section 3.4.1.2.

The obtained index sets supp(ĝ^s) should "consist of" the union of three lower-dimensional manifolds: a three-dimensional hyperbolic cross in the dimensions 1, 3, 8; a four-dimensional hyperbolic cross in the dimensions 2, 5, 6, 10; and a three-dimensional hyperbolic cross in the dimensions 4, 7, 9. All tests are performed 100 times, and the relative L² approximation error

‖g − g^s‖_{L²}/‖g‖_{L²} = √(‖g‖²_{L²} − Σ_{k∈supp(ĝ^s)} |ĝ_k|² + Σ_{k∈supp(ĝ^s)} |ĝ^s_k − ĝ_k|²) / ‖g‖_{L²}

is computed each time.
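Given the exact norm ‖g‖_{L²} and the exact coefficients of g on the recovered support (both available in closed form for the B-spline test function), this error reduces to a couple of lines of MATLAB; the sketch below is ours, using the placeholder names normG_L2, ghatTrue, and ghatS for the exact norm, the exact coefficients on supp(ĝ^s), and the computed coefficients.

```matlab
% Relative L^2 approximation error from the formula above
errSq = normG_L2^2 - sum(abs(ghatTrue).^2) + sum(abs(ghatS - ghatTrue).^2);
errSq = max(errSq, 0);               % guard against round-off below zero
relErr = sqrt(errSq)/normG_L2;
```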
[Figure 3.8: Average number of samples (a) and relative L² errors (b) vs. the sparsity 2s of the approximation over 100 test runs for Algorithm 3.1 with A^{sub,MC}_{2s,M̃} on the 10-dimensional test function (3.12) consisting of tensor products of B-splines of different orders. The search space is the unweighted hyperbolic cross I := H_33^10 with SFT parameters randomScale := rs ∈ {0.05, 0.1} and M̃ = 1 + 2‖z‖_∞ max_{k∈I} ‖k‖_1 ≈ 1.6·10^10.]

In Figure 3.8a, we visualize the average number of samples against the sparsity 2s of the approximation. We observe an almost linear increase with respect to 2s. In Figure 3.8b, we show the average relative errors for randomScale ∈ {0.05, 0.1} in the noiseless case as well as randomScale = 0.1 for SNR_dB ∈ {10, 20, 30}. In general, for increasing sparsity, the errors become smaller. For randomScale = 0.05 in the noiseless case and randomScale = 0.1 with SNR_dB = 10, the average error is similar and stays above 3·10^{−2} even for sparsity 2s = 4000. For higher signal-to-noise ratios, the error decreases further. For SNR_dB = 30, the obtained average error is 6.1·10^{−3} for 2s = 4000, which is only approximately twice as high as the best possible error when using the 2s largest (in magnitude) Fourier coefficients ĝ_k with the restriction k ∈ I := H_33^10. The latter is plotted in Figure 3.8b as a dashed line without markers.

CHAPTER 4

SPARSE FOURIER SPECTRAL METHODS FOR SOLVING PDE

As discussed in Section 1.1.3, this chapter focuses on a sparse spectral method for solving elliptic PDEs. We begin with a review of the literature on sparse spectral methods, against which we motivate our results, in Section 4.1. Section 4.2 gives the advection-diffusion-reaction PDE setup, and Section 4.3 converts this problem to its Galerkin representation underpinning the spectral method approach. The following three sections provide the ingredients outlined by Strang's lemma, Lemma 1.1, in Section 1.1.3:

1. a Fourier series truncation method for the solution and the resulting error analysis (Section 4.4),
2. a (sparse) Fourier series approximation technique (Section 4.5), and
3. a version of Strang's lemma that ties everything together (Section 4.6).

We close with a numerics section, Section 4.7, describing the implementation of our technique and a variety of numerical experiments demonstrating the theory.

4.1 Overview of results and prior work

We now outline some of the previous literature on spectral methods with an emphasis on exploiting sparsity. Along the way, various shortcomings will arise, and we will use these as opportunities to motivate and explain our approach in the sequel.

4.1.1 Prior attempts to relieve dependence on bandwidth via SFT-type methods

A key work pioneering the use of SFTs in computing solutions to PDEs is due to Daubechies et al. [21]. This work mostly focuses on time-dependent, one-dimensional problems where the spectral scheme is formulated as alternating Fourier projections and time steps. Thus, there is no need to impose an a priori Fourier basis truncation on the solution. The proposed projection step instead utilizes an SFT at each time step to adaptively retain the most significant frequencies throughout the time-stepping procedure. Time-independent problems like (1.3) can then be handled by stepping in time until a stationary solution is obtained.

A simplified form of this algorithm is shown to succeed numerically in [21], and it is also analyzed theoretically in the case where the diffusion coefficient consists of a known, fine-scale mode superimposed over lower frequency terms. There, the Fourier-projection step can be considered to be fixed.
However, removing the known fine-scale assumption leads to many difficulties, including the possibility of sparsity-induced omissions in early time steps cascading into larger errors later on. In this chapter, on the other hand, we focus on the case of time-independent problems. This allows us to utilize SFTs only once, initially, and by doing so we avoid the possibility of SFT-induced error accumulation over many time steps. The main difficulty in our analysis then becomes determining how the Fourier-sparse representations of the PDE data discovered by high-dimensional SFTs can be used to rapidly find a suitable Fourier representation of the solution. This takes the form of mixing the Fourier supports of the data into stamping sets (discussed in detail in Section 4.4) on which we can analyze the projection error of the solution. In fact, these stamping sets can be viewed as a modification and generalization of the techniques used in the one-dimensional, known fine-scale analysis from [21].

4.1.2 Attempts to relieve the curse of dimensionality

Many attempts to overcome the curse of dimensionality in Fourier spectral methods for PDEs have focused on using basis truncations which allow for an efficient high-dimensional Fourier transform. One of the most popular techniques is the sparse grid spectral method, which computes Fourier coefficients on the hyperbolic cross [47, 11, 29, 30, 63, 31, 20]. In general, a sparse grid method reduces the number of sampling points necessary to approximate the PDE data to O(K log^{d−1}(K)), where K acts as a type of bandwidth parameter. Algorithms to compute spectral representations using these sparse sampling grids run with similar complexity. When used in conjunction with spectral methods for solving PDEs, these sparse grid Fourier transforms produce solution approximations with error estimates similar to the full d-dimensional FFT versions, reduced by factors only on the order of 1/log^{d−1}(K).

In the context of sparse grid Fourier transforms, these methods compute Fourier coefficients with frequencies indexed on hyperbolic crosses of cardinality similar to the number of sampling points. These hyperbolic crosses have intimate links with spaces of functions of bounded mixed derivative, in the sense that they are the optimal Fourier-approximation spaces for this class. Thus, sparse grid Fourier spectral methods are particularly apt for problems where the solution is of bounded mixed derivative, as this produces an optimal solution-truncation term in Lemma 1.1 above.

Though sparse grid spectral methods can efficiently solve a variety of high-dimensional problems, there are clear downsides for the types of problems we target in this chapter. While many problems fit the bounded mixed derivative assumption [67, 68], and therefore have accurate Fourier representations on the hyperbolic cross, the multiscale, Fourier-sparse problems that we are interested in are especially problematic. In fact, since a hyperbolic cross of bandwidth K contains only those frequencies k ∈ ℤ^d with ∏_{i∈[d]} |k_i| = O(K), d-dimensional frequencies active in all dimensions can have only ‖k‖_∞ = O(K^{1/d}). Thus, in a multiscale problem with even one frequency that interacts in all dimensions, a hyperbolic cross with bandwidth exponential in d is required to properly resolve the data. This then forces the traditionally curse-of-dimensionality-mitigating log^{d−1}(K) terms characteristic of sparse grid methods to be at least on the order of d^{d−1}.
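As a quick check of this obstruction against the hyperbolic cross normalization used in Section 3.4.1 (membership there requires ∏_{ℓ∈[d]} max(1, |k_ℓ|) ≤ K/2): in d = 10 dimensions, the frequency k = (2, 2, …, 2) has ∏_ℓ max(1, |k_ℓ|) = 2^10 = 1024, so a hyperbolic cross containing it needs expansion K ≥ 2^11 even though ‖k‖_∞ = 2, whereas the full cube B_K^d contains it already for K ≥ 5.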
4.1.3 More on high-dimensional Fourier transforms As outlined in Section 4.1.1 above, this chapter uses sparse Fourier transforms to create an adaptive basis truncation suited to the PDE data. This mimics a similar evolution in the field of high-dimensional Fourier transforms from sparse grids to more flexible techniques [52, 22, 55, 49, 31, 50, 56, 34]. In particular, the rank-1 lattice based approaches for high-dimensional Fourier transforms discussed in Chapters 2 and 3 originate from a link between early high-dimensional quadrature techniques and Fourier approximations on the hyperbolic cross [49, 50]. Though many rank-1 lattice approaches take I to be the hyperbolic cross to leverage the well- studied regularity properties and cardinality bounds similarly enjoyed in the sparse-grid literature, rank-1 lattice results are available for arbitrary frequency sets. The computationally efficient exten- sion of these techniques via sparse Fourier transforms in Chapter 3 as well as the randomization trick presented in Section 4.5 take this frequency set flexibility to its limit, allowing I to be the a priori unknown set of the most important Fourier coefficients of the function to be approximated. 85 This again suggests the applicability of these methods over sparse grid (or other non-sparsity ex- ploiting) Fourier transforms in the context of multiscale problems involving even a small number of Fourier coefficients in extremely high dimensions. 4.1.4 Additional links to compressive sensing As discussed above, the SFT literature overlaps considerably with the language and techniques of compressive sensing. As previously detailed in Chapter 3, the high-dimensional SFT we use herein provides error bounds with best 𝑠-term approximation, compressive-sensing-type error guar- antees [19]. As a result, the Fourier coefficients of the PDE data are approximated with errors depending on the compressibility of their true Fourier series, and then the compressibility of the PDE’s solution in the Fourier basis is inferred from the Fourier compressibility of the data in a direct and constructive fashion. Another very successful line of work, however, aims to more directly apply standard com- pressive sensing reconstruction methods to the general spectral method framework for solving PDEs. Referred to as CORSING [9, 4, 10, 8, 6], these techniques use compressed sensing con- cepts to recover a sparse representation of the solution to the system of equations derived from the (Petrov-)Galerkin formulation of a PDE. These methods have been further extended to the case of pseudospectral methods in [5], in which a simpler-to-evaluate matrix equation is subsampled and used as measurements for a compressive sensing algorithm (as an aside, [5] and discussions with the author served as a primary inspiration for the results in this chapter). This compressive spec- tral collocation method works by finding the largest Fourier-sine coefficients of the solution with frequencies in the integer hypercube with bandwidth 𝐾 by applying Orthogonal Matching Pursuit (OMP) on a set of samples of the PDE data. By using OMP, the method is able to succeed with measurements on the order of O (𝑑 exp(𝑑)𝑠 log3 (𝑠) log(𝐾)) where 𝑠 is the imposed sparsity level of the solution’s Fourier series. Thus, while the O (𝐾 𝑑 ) dependence from a traditional Fourier (pseudo)spectral method is avoided and the method adapts well to large bandwidths, the curse of dimensionality is still apparent. 
Recently, an improvement on [5] that addresses the curse of dimensionality was made available which is therefore well-suited for similar types of problems discussed in this chapter. In [66], the approach of approximating Fourier-sine coefficients on a full hypercube is replaced with approximating Fourier coefficients on a hyperbolic cross. This has the effect of converting the linear dependence on $d$ in the sampling complexity to a $\log(d)$ due to cardinality estimates of the hyperbolic cross. However, the $\exp(d)$ term is refined using a different technique. The key theoretical ingredient for being able to apply compressive sensing to these problems is bounding the Riesz constants of the basis functions that result after applying the differential operator [6]. A careful estimation of these constants on the Fourier basis indexed by a hyperbolic cross is able to entirely remove the exponential in $d$ dependence, leading to a sampling complexity on the order of $O(C_a s \log(d) \log^3(s) \log(K))$, where $C_a$ involves terms depending on ellipticity and compressibility properties of $a$. Notably, this estimation procedure has connections to our stamping set techniques described in Section 4.4.

On the other hand, though focusing on the hyperbolic cross in compressive spectral collocation breaks the curse of dimensionality in the sampling complexity, the method still suffers from the inability to generalize to multiscale problems or generic frequency sets of interest like those described in Section 4.1.2. Additionally, as mentioned in Section 4.1.4, the compressive-sensing algorithm used for recovery (in this case OMP) suffers from a computational complexity on the order of the cardinality of the truncation set of interest. For the hyperbolic cross, this is still exponential in $\log(d)$. Finally, the error estimates are presented in terms of the compressibility of the Fourier series of the solution $u$, which may not be known a priori from the PDE data. We expect that there may be some way to link our stamping theory and convergence estimates with the compressive sensing theory to refine and generalize both approaches.

4.2 Elliptic PDE setup

We begin with a model elliptic partial differential equation.

Definition 4.1. For some $a : \mathbb{T}^d \to \mathbb{R}$, $\mathbf{b} : \mathbb{T}^d \to \mathbb{R}^d$, $c : \mathbb{T}^d \to \mathbb{R}$ sufficiently smooth, define the advection-diffusion-reaction operator in divergence form $\mathcal{L}$ by
\[
\mathcal{L}u = -\nabla \cdot (a \nabla u) + \mathbf{b} \cdot \nabla u + c u.
\]
If for some $f : \mathbb{T}^d \to \mathbb{R}$ sufficiently smooth, $u \in C^2$ satisfies
\[
\mathcal{L}u = f, \tag{SF}
\]
we say that $u$ solves the given PDE with periodic boundary conditions in the strong form.

Now, after multiplying by the complex conjugate of a test function $v \in H^1(\mathbb{T}^d)$ and integrating the first term by parts, we define the sesquilinear form associated to $\mathcal{L}$ as $\mathfrak{L} : H^1 \times H^1 \to \mathbb{C}$ with
\[
\mathfrak{L}(u, v) := \int_{\mathbb{T}^d} a(\mathbf{x}) \nabla u(\mathbf{x}) \cdot \overline{\nabla v(\mathbf{x})} + \mathbf{b}(\mathbf{x}) \cdot \nabla u(\mathbf{x}) \, \overline{v(\mathbf{x})} + c(\mathbf{x}) u(\mathbf{x}) \overline{v(\mathbf{x})} \, d\mathbf{x},
\]
and we say that $u \in H^1$ solves the given PDE with periodic boundary conditions in the weak form if
\[
\mathfrak{L}(u, v) = \langle f, v \rangle_{L^2} \quad \text{for all } v \in H^1. \tag{WF}
\]
For our purposes, we will take $a, c \in L^\infty(\mathbb{T}^d; \mathbb{R})$, $\mathbf{b} \in L^\infty(\mathbb{T}^d; \mathbb{R})^d$ (i.e., each coordinate of the advection field is in $L^\infty$), and $f \in L^2(\mathbb{T}^d; \mathbb{R})$. By the conditions specified in the Lax-Milgram theorem (see, e.g., [23]), we are guaranteed that a unique solution to (WF) exists. We use the formulation as stated in [8, Proposition 2.1] and proven in [7].

Proposition 4.1.
For $a, c \in L^\infty(\mathbb{T}^d; \mathbb{R})$, $\mathbf{b} \in L^\infty(\mathbb{T}^d; \mathbb{R})^d$, $\mathfrak{L}$ is continuous with continuity constant
\[
\beta \le \max\left\{ \|a\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \|\mathbf{b}(\mathbf{x})\|_2,\ \|c\|_{L^\infty} \right\},
\]
that is,
\[
|\mathfrak{L}(u, v)| \le \beta \|u\|_{H^1} \|v\|_{H^1} \quad \text{for all } u, v \in H^1. \tag{4.1}
\]
Additionally, assuming $\mathbf{b} \in H^1(\mathbb{T}^d; \mathbb{R})^d$, if $a(\mathbf{x}) \ge a_{\min} > 0$ and $-\frac{1}{2} \nabla \cdot \mathbf{b}(\mathbf{x}) + c(\mathbf{x}) \ge d_{\min} > 0$ a.e. on $\mathbb{T}^d$, then $\mathfrak{L}$ is also coercive with coercivity constant
\[
\alpha \ge \min\{a_{\min}, d_{\min}\},
\]
that is,
\[
|\mathfrak{L}(u, u)| \ge \alpha \|u\|_{H^1}^2 \quad \text{for all } u \in H^1. \tag{4.2}
\]
Under conditions (4.1) and (4.2), if $f \in L^2(\mathbb{T}^d; \mathbb{R})$ then (WF) has a unique solution $u \in H^1$ satisfying
\[
\|u\|_{H^1} \le \frac{\|f\|_{L^2}}{\alpha}. \tag{4.3}
\]

4.3 Galerkin spectral methods

By Theorem 1.1, it is equivalent to replace the weak PDE (WF) by
\[
\mathfrak{L}(u, \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ}) = \langle f, \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ} \rangle_{L^2} =: \hat{f}_{\mathbf{k}} \quad \text{for all } \mathbf{k} \in \mathbb{Z}^d.
\]
Rewriting the sesquilinear form on the left-hand side and using the Fourier series representations of $a$, $\mathbf{b}$ (where we collect all coordinates' Fourier coefficients at a given frequency $\mathbf{k} \in \mathbb{Z}^d$ into the vectors $\hat{\mathbf{b}}_{\mathbf{k}} \in \mathbb{C}^d$), $c$, and $u$, we obtain
\begin{align*}
\mathfrak{L}(u, \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ})
&= \sum_{\mathbf{l}_1, \mathbf{l}_2 \in \mathbb{Z}^d} \hat{a}_{\mathbf{l}_1} \hat{u}_{\mathbf{l}_2} \int_{\mathbb{T}^d} \mathrm{e}^{2\pi i \mathbf{l}_1 \cdot \mathbf{x}} \nabla \mathrm{e}^{2\pi i \mathbf{l}_2 \cdot \mathbf{x}} \cdot \overline{\nabla \mathrm{e}^{2\pi i \mathbf{k} \cdot \mathbf{x}}} \, d\mathbf{x} \\
&\quad + \sum_{\mathbf{l}_1, \mathbf{l}_2 \in \mathbb{Z}^d} \hat{u}_{\mathbf{l}_2} \int_{\mathbb{T}^d} \mathrm{e}^{2\pi i \mathbf{l}_1 \cdot \mathbf{x}} \, \hat{\mathbf{b}}_{\mathbf{l}_1} \cdot \nabla \mathrm{e}^{2\pi i \mathbf{l}_2 \cdot \mathbf{x}} \, \overline{\mathrm{e}^{2\pi i \mathbf{k} \cdot \mathbf{x}}} \, d\mathbf{x} \\
&\quad + \sum_{\mathbf{l}_1, \mathbf{l}_2 \in \mathbb{Z}^d} \hat{c}_{\mathbf{l}_1} \hat{u}_{\mathbf{l}_2} \int_{\mathbb{T}^d} \mathrm{e}^{2\pi i \mathbf{l}_1 \cdot \mathbf{x}} \mathrm{e}^{2\pi i \mathbf{l}_2 \cdot \mathbf{x}} \, \overline{\mathrm{e}^{2\pi i \mathbf{k} \cdot \mathbf{x}}} \, d\mathbf{x} \\
&= \sum_{\mathbf{l}_1, \mathbf{l}_2 \in \mathbb{Z}^d} \delta_{\mathbf{l}_1, \mathbf{k} - \mathbf{l}_2} \left[ (2\pi)^2 (\mathbf{l}_2 \cdot \mathbf{k}) \hat{a}_{\mathbf{l}_1} + 2\pi i \left( \hat{\mathbf{b}}_{\mathbf{l}_1} \cdot \mathbf{l}_2 \right) + \hat{c}_{\mathbf{l}_1} \right] \hat{u}_{\mathbf{l}_2} \\
&= \sum_{\mathbf{l} \in \mathbb{Z}^d} \left[ (2\pi)^2 (\mathbf{l} \cdot \mathbf{k}) \hat{a}_{\mathbf{k} - \mathbf{l}} + 2\pi i \left( \hat{\mathbf{b}}_{\mathbf{k} - \mathbf{l}} \cdot \mathbf{l} \right) + \hat{c}_{\mathbf{k} - \mathbf{l}} \right] \hat{u}_{\mathbf{l}} =: (L\hat{u})_{\mathbf{k}},
\end{align*}
where $L$ is an operator on $\ell^2$. This leads to the Galerkin form of our PDE,
\[
L\hat{u} = \hat{f}. \tag{GF}
\]
The computational advantages of (GF) are clear. By numerically approximating $\hat{a}$, $\hat{\mathbf{b}}$, $\hat{c}$, and $\hat{f}$ (which automatically truncates $L$), we arrive at a discretized, finite system of equations that can be solved for the Fourier coefficients of our solution.

We will use a fast sparse Fourier transform (SFT) for functions of many dimensions to approximate our PDE data which then leads to a sparse system of equations that we can quickly solve to approximate $\hat{u}$. This SFT will use the values of $a$, $\mathbf{b}$, $c$, and $f$ at equispaced nodes on a randomized rank-1 lattice in $\mathbb{T}^d$, and therefore, our technique is effectively a pseudospectral method where the discretization of the solution space $\{\hat{u} \mid u \in H^1\}$ is adapted to the PDE data.

Before we move to the detailed discussion of this SFT, we provide a more detailed analysis of the Galerkin operator in Section 4.4 to help us analyze the resulting spectral method. But first, we note that $L$ also captures the behavior of $\mathfrak{L}$ as a sesquilinear form.

Proposition 4.2. For $\hat{u}, \hat{v} \in \ell^2$ with $u, v \in H^1$, $\mathfrak{L}(u, v) = \langle L\hat{u}, \hat{v} \rangle_{\ell^2}$.

Proof. By the Fourier series representation of $v$,
\[
\mathfrak{L}(u, v) = \sum_{\mathbf{k} \in \mathbb{Z}^d} \mathfrak{L}(u, \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ}) \overline{\hat{v}_{\mathbf{k}}} = \sum_{\mathbf{k} \in \mathbb{Z}^d} (L\hat{u})_{\mathbf{k}} \overline{\hat{v}_{\mathbf{k}}} = \langle L\hat{u}, \hat{v} \rangle_{\ell^2}.
\]

4.4 Stamping sets and truncation analysis

Notably, (GF) gives us insight into the frequency support of $\hat{u}$. The structure outlined in the following proposition is crucial in constructing a fast spectral method that exploits Fourier-sparsity.

Proposition 4.3. Given $\hat{a}$, $\hat{\mathbf{b}}$, and $\hat{c}$, the Fourier coefficients of the diffusion coefficient, the advection field, and the reaction coefficient of an ADR equation respectively, denote the set of "active" frequencies
\[
\mathcal{A} := \operatorname{supp}(\hat{a}) \cup \left( \bigcup_{j \in [d]} \operatorname{supp}(\hat{b}_j) \right) \cup \operatorname{supp}(\hat{c}) \subset \mathbb{Z}^d.
\]
For any set $\mathcal{F} \subset \mathbb{Z}^d$ and $N \in \mathbb{N}_0$, recursively define the sets
\[
S^N[\mathcal{A}](\mathcal{F}) :=
\begin{cases}
\mathcal{F} & \text{if } N = 0 \\
S^{N-1}[\mathcal{A}](\mathcal{F}) + \mathcal{A} & \text{if } N > 0
\end{cases}, \qquad
S^\infty[\mathcal{A}](\mathcal{F}) := \bigcup_{N=0}^{\infty} S^N[\mathcal{A}](\mathcal{F}), \tag{4.4}
\]
where here, addition is defined as the Minkowski sum of sets. Under the conditions of Proposition 4.1, $\operatorname{supp}(\hat{u}) \subset S^\infty[\mathcal{A}](\operatorname{supp}(\hat{f}))$.
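Since (4.4) is purely combinatorial, it is straightforward to realize numerically. The following MATLAB sketch (ours; the function name and row-wise data layout are assumptions, not the dissertation's released implementation) constructs $S^N[\mathcal{A}](\mathcal{F})$ by iterated Minkowski sums, with frequencies stored as rows of integer matrices.

```matlab
function S = stamping_set(F, A, N)
% STAMPING_SET  Sketch of S^N[A](F) from (4.4) via iterated Minkowski sums.
%   F: |F| x d integer matrix of frequencies (rows), e.g., supp(f-hat).
%   A: |A| x d integer matrix of active frequencies.
%   N: stamping level.
    S = unique(F, 'rows');
    for n = 1:N
        % Minkowski sum S + A: all pairwise sums of a row of S and a row of A
        [i, j] = ndgrid(1:size(S, 1), 1:size(A, 1));
        S = unique(S(i(:), :) + A(j(:), :), 'rows');
    end
end
```

Because the data are real, $\mathcal{A} = -\mathcal{A}$, and because $\hat{a}_{\mathbf{0}} > 0$ for well-posed problems, $\mathbf{0} \in \mathcal{A}$, so each level automatically contains the previous one.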
Proof. Note first that the fact that $a$, $\mathbf{b}$, and $c$ are real implies that the supports of their Fourier series are "rotationally" symmetric in $\mathbb{Z}^d$, e.g., $\operatorname{supp}(\hat{a}) = -\operatorname{supp}(\hat{a})$.

Now, we show that $L_{\mathbf{k},\mathbf{k}} \ne 0$ for all $\mathbf{k} \in \mathbb{Z}^d$. Recall that
\[
L_{\mathbf{k},\mathbf{k}} := (2\pi)^2 (\mathbf{k} \cdot \mathbf{k}) \hat{a}_{\mathbf{0}} + 2\pi i \left( \hat{\mathbf{b}}_{\mathbf{0}} \cdot \mathbf{k} \right) + \hat{c}_{\mathbf{0}}.
\]
It suffices to show that $\hat{a}_{\mathbf{0}}$ and $\hat{c}_{\mathbf{0}}$ are strictly positive as the middle term will always be purely imaginary. Since $a$ is always strictly positive under the assumptions of Proposition 4.1, its mean $\hat{a}_{\mathbf{0}}$ is necessarily strictly positive. As for $c$, the conditions of Proposition 4.1 require
\[
-\frac{1}{2} \nabla \cdot \mathbf{b} + c > 0,
\]
which implies
\[
\hat{c}_{\mathbf{0}} > \frac{1}{2} \int_{\mathbb{T}^d} \nabla \cdot \mathbf{b}(\mathbf{x}) \, d\mathbf{x}.
\]
However, the divergence theorem implies that the right-hand side is zero, and therefore $\hat{c}_{\mathbf{0}}$ is positive as desired.

Now, since $L_{\mathbf{k},\mathbf{k}}$ is nonzero, we may rearrange the equality $(L\hat{u})_{\mathbf{k}} = \hat{f}_{\mathbf{k}}$ to obtain
\[
\hat{u}_{\mathbf{k}} = \frac{\hat{f}_{\mathbf{k}} - \sum_{\mathbf{l} \in (\{\mathbf{k}\} + \mathcal{A}) \setminus \{\mathbf{k}\}} L_{\mathbf{k},\mathbf{l}} \hat{u}_{\mathbf{l}}}{L_{\mathbf{k},\mathbf{k}}},
\]
where we have restricted the summation to only those frequencies where the entries of row $\mathbf{k}$ of $L$ are nonzero, that is, the active frequencies of the PDE data translated by $\mathbf{k}$. Thus, $\hat{u}_{\mathbf{k}}$ explicitly depends only on the values of $\hat{u}$ on $S^1[\mathcal{A}](\{\mathbf{k}\}) \setminus \{\mathbf{k}\}$, which themselves then depend only on values of $\hat{u}$ on $S^2[\mathcal{A}](\{\mathbf{k}\})$, and so on. This decouples the system of equations $L\hat{u} = \hat{f}$ into a disjoint collection of systems of equations, one for each class of frequencies $S^\infty[\mathcal{A}](\{\mathbf{k}\})$. Since Proposition 4.1 implies that $\hat{v} = 0$ is the unique solution of $L\hat{v} = 0$, the unique solution of the system of equations for $\hat{u}$ on $S^\infty[\mathcal{A}](\{\mathbf{k}\})$ for any $\mathbf{k} \notin \operatorname{supp}(\hat{f})$ is $\hat{u}|_{S^\infty[\mathcal{A}](\{\mathbf{k}\})} = 0$. Therefore, $\operatorname{supp}(\hat{u}) \subset S^\infty[\mathcal{A}](\operatorname{supp}(\hat{f}))$ as desired.

In what follows, when the set $\mathcal{F}$ (often $\operatorname{supp}(\hat{f})$) and set of active frequencies $\mathcal{A}$ are clear from context, we suppress them in the notation given by (4.4) so that $S^N := S^N[\mathcal{A}](\mathcal{F})$.

Intuitively, we can imagine constructing $S^N$ by first creating a "rubber stamp" in the shape of $\mathcal{A}$. This rubber stamp is then stamped onto every frequency in $\mathcal{F} =: S^0$ to construct $S^1$. Then, this process is repeated, stamping each element of $S^1$ to produce $S^2$, and so on. For this reason, we will colloquially refer to these as "stamping sets." Figure 4.1 gives an example of this stamping procedure for $d = 2$.

A key approach of our further analysis will be analyzing the decay of $\hat{u}$ on successive stamping levels. The stamping level will become the driving parameter in the spectral method rather than bandwidth in a traditional spectral method.

[Figure 4.1: panels showing $\mathcal{A}$, $\operatorname{supp}(\hat{f}) = S^0[\mathcal{A}](\operatorname{supp}(\hat{f}))$, and the sets $S^1[\mathcal{A}](\operatorname{supp}(\hat{f}))$, $S^2[\mathcal{A}](\operatorname{supp}(\hat{f}))$, $S^3[\mathcal{A}](\operatorname{supp}(\hat{f}))$ for $N = 0, 1, 2, 3$.] Figure 4.1 New frequencies in each stamping level up to $N = 3$ where $N = 0$ is $\operatorname{supp}(\hat{f})$.

Before moving on to this analysis however, we provide an upper bound for the cardinality of the stamping sets. This will ultimately be used to upper bound the computational complexity of our technique.

Lemma 4.1. Suppose that $\mathcal{A} = -\mathcal{A}$ with $\mathbf{0} \in \mathcal{A}$, and $|\operatorname{supp}(\hat{f})| \le |\mathcal{A}|$. Then
\[
\left| S^N[\mathcal{A}](\operatorname{supp}(\hat{f})) \right| \le 7 \max(|\mathcal{A}|, 2N+1)^{\min(|\mathcal{A}|, 2N+1)}.
\]

We prove this by first providing the following combinatorial upper bound for the cardinality of a stamping set.

Lemma 4.2. Suppose that $\mathcal{A} = -\mathcal{A}$ with $\mathbf{0} \in \mathcal{A}$. Then
\[
\left| S^N[\mathcal{A}](\operatorname{supp}(\hat{f})) \right| \le \left| \operatorname{supp}(\hat{f}) \right| \sum_{n=0}^{N} \sum_{t=0}^{\min(n, (|\mathcal{A}|-1)/2)} 2^t \binom{(|\mathcal{A}|-1)/2}{t} \binom{n-1}{t-1}. \tag{4.5}
\]

Proof. We begin by separating $S^N$ into the disjoint pieces
\[
S^N = \bigsqcup_{n=0}^{N} \left( S^n \setminus \bigcup_{i=0}^{n-1} S^i \right)
\]
and computing the cardinality of each of these sets (where we take $S^{-1} = \emptyset$). If $\mathbf{k} \in S^n \setminus \left( \bigcup_{i=0}^{n-1} S^i \right)$, then we are able to write $\mathbf{k}$ as
\[
\mathbf{k} = \mathbf{k}_f + \sum_{m=1}^{n} \mathbf{k}^m_{\mathcal{A}} \tag{4.6}
\]
where $\mathbf{k}_f \in \operatorname{supp}(\hat{f})$ and $\mathbf{k}^m_{\mathcal{A}} \in \mathcal{A} \setminus \{\mathbf{0}\}$ for all $m = 1, \ldots, n$.
Additionally, since $\mathbf{k}$ is not in any earlier stamping sets, this is the smallest $n$ for which this is possible. In particular, it is not possible for any two frequencies in the sum to be negatives of each other resulting in pairs of cancelled terms.

With this summation in mind, arbitrarily split $\mathcal{A} \setminus \{\mathbf{0}\}$ into $A \sqcup -A$ (i.e., place all frequencies which do not negate each other into $A$ and their negatives in $-A$). By collecting like frequencies that occur as a $\mathbf{k}^m_{\mathcal{A}}$ term in (4.6), we can rewrite this sum as
\[
\mathbf{k} = \mathbf{k}_f + \sum_{\mathbf{k}_A \in A} s(\mathbf{k}, \mathbf{k}_A) \, m(\mathbf{k}, \mathbf{k}_A) \, \mathbf{k}_A, \tag{4.7}
\]
where the sign function $s(\mathbf{k}, \mathbf{k}_A)$ is given by
\[
s(\mathbf{k}, \mathbf{k}_A) :=
\begin{cases}
1 & \text{if } \mathbf{k}_A \text{ is a term in the summation (4.6)} \\
-1 & \text{if } -\mathbf{k}_A \text{ is a term in the summation (4.6)} \\
0 & \text{otherwise}
\end{cases}
\]
and the multiplicity function $m(\mathbf{k}, \mathbf{k}_A)$ is defined as the number of times that $\mathbf{k}_A$ or $-\mathbf{k}_A$ appears as a $\mathbf{k}^m_{\mathcal{A}}$ term in (4.6). Letting $\mathbf{s}(\mathbf{k}) := (s(\mathbf{k}, \mathbf{k}_A))_{\mathbf{k}_A \in A}$ and $\mathbf{m}(\mathbf{k}) := (m(\mathbf{k}, \mathbf{k}_A))_{\mathbf{k}_A \in A}$, we can then identify any $\mathbf{k} \in S^n \setminus \left( \bigcup_{i=0}^{n-1} S^i \right)$ with the tuple
\[
(\mathbf{k}_f, \mathbf{s}(\mathbf{k}), \mathbf{m}(\mathbf{k})) \in \operatorname{supp}(\hat{f}) \times \{-1, 0, 1\}^A \times \{0, \ldots, n\}^A.
\]
Upper bounding the number of these tuples that can correspond to a value of $\mathbf{k} \in S^n \setminus \left( \bigcup_{i=0}^{n-1} S^i \right)$ will then upper bound the cardinality of this set.

Since any $\mathbf{k}_f \in \operatorname{supp}(\hat{f})$ can result in a valid $\mathbf{k}$ value, we will focus on the pairs of sign and multiplicity vectors. Define by $T_n \subset \{-1, 0, 1\}^A \times \{0, \ldots, n\}^A$ the set of valid sign and multiplicity pairs that can correspond to a $\mathbf{k} \in S^n \setminus \left( \bigcup_{i=0}^{n-1} S^i \right)$. In particular, for $(\mathbf{s}, \mathbf{m}) \in T_n$, $\|\mathbf{m}\|_1 = n$ and $\operatorname{supp}(\mathbf{s}) = \operatorname{supp}(\mathbf{m})$. Thus, we can write
\[
T_n \subset \bigsqcup_{t=0}^{\min(n, |A|)} \left\{ (\mathbf{s}, \mathbf{m}) \in \{-1, 0, 1\}^A \times \{0, \ldots, n\}^A \;\middle|\; \|\mathbf{m}\|_1 = n \text{ and } |\operatorname{supp}(\mathbf{s})| = |\operatorname{supp}(\mathbf{m})| = t \right\}.
\]
This inner set then corresponds to the $t$-partitions of the integer $n$ spread over the $|A|$ entries of $\mathbf{m}$ where each non-zero term is assigned a sign $-1$ or $1$. The cardinality is therefore $2^t \binom{|A|}{t} \binom{n-1}{t-1}$: the first factor is from the possible sign options, the second is the number of ways to choose the entries of $\mathbf{m}$ which are nonzero, and the last is the number of $t$-partitions of $n$ which will fill the nonzero entries of $\mathbf{m}$. Noting that $|A| = \frac{|\mathcal{A}| - 1}{2}$, our final cardinality estimate is
\[
|S^N| = \sum_{n=0}^{N} \left| S^n \setminus \bigcup_{i=0}^{n-1} S^i \right| \le \left| \operatorname{supp}(\hat{f}) \right| \sum_{n=0}^{N} |T_n| \le \left| \operatorname{supp}(\hat{f}) \right| \sum_{n=0}^{N} \sum_{t=0}^{\min(n, (|\mathcal{A}|-1)/2)} 2^t \binom{(|\mathcal{A}|-1)/2}{t} \binom{n-1}{t-1}
\]
as desired.

Though this upper bound is much tighter than the one given in the main text, it is harder to parse. As such, we simplify it to the bound presented in Lemma 4.1.

Proof of Lemma 4.1. Let $r = (|\mathcal{A}| - 1)/2$. We consider two cases:

Case 1: $r \ge N$. We estimate the innermost sum of (4.5). Since $r \ge N \ge n$, $\min(n, (|\mathcal{A}|-1)/2) = n$. By upper bounding the binomial coefficients with powers of $r$, we obtain
\[
\sum_{t=0}^{n} 2^t \binom{r}{t} \binom{n-1}{t-1} \le \sum_{t=0}^{n} 2^t (r^t)^2 \le 2(2r^2)^n,
\]
where the second estimate follows from approximating the geometric sum. Again, bounding the next geometric sum by double the largest term, we have
\[
|S^N| \le \left| \operatorname{supp}(\hat{f}) \right| \sum_{n=0}^{N} 2(2r^2)^n \le (2r+1) \cdot 4(2r^2)^N \le 2(2r+1)^{2N+1} = 2|\mathcal{A}|^{2N+1}.
\]

Case 2: $r < N$. Bounding the innermost sum of (4.5) proceeds much the same way as Case 1, but we must first split the outermost sum into the first $r + 1$ terms and last $N - r$ terms. Working with the first terms, we find
\[
\sum_{n=0}^{r} \sum_{t=0}^{n} 2^t \binom{r}{t} \binom{n-1}{t-1} \le 4(2r^2)^r
\]
using the argument in Case 1. Now, we bound
\[
\sum_{n=r+1}^{N} \sum_{t=0}^{r} 2^t \binom{r}{t} \binom{n-1}{t-1} \le \sum_{n=r+1}^{N} 2\left(2(n-1)^2\right)^r \le 2^{r+1} \int_{r}^{N} n^{2r} \, dn \le \sqrt{2} \, \frac{(\sqrt{2}N)^{2r+1}}{2r+1}.
\]
Thus,
\[
|S^N| \le \left| \operatorname{supp}(\hat{f}) \right| \left[ 4(2r^2)^r + \sqrt{2} \, \frac{(\sqrt{2}N)^{2r+1}}{2r+1} \right] \le 5\sqrt{2} \left( \sqrt{2}N \right)^{|\mathcal{A}|} \le 7(2N+1)^{|\mathcal{A}|}.
\]
Combining the two cases gives the desired upper bound.
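A quick empirical sanity check of Lemma 4.1 (ours, reusing the stamping_set sketch given after Proposition 4.3; the dimensions and set sizes below are arbitrary illustrative choices):

```matlab
% Compare |S^N| against the bound 7*max(|A|, 2N+1)^min(|A|, 2N+1) of Lemma 4.1.
rng(0); d = 3;
H = randi([-10, 10], 5, d);                    % half of the active frequencies
A = unique([zeros(1, d); H; -H], 'rows');      % symmetric active set containing 0
F = unique(randi([-10, 10], 4, d), 'rows');    % small supp(f-hat) with |F| <= |A|
for N = 0:3
    SN  = stamping_set(F, A, N);
    bnd = 7 * max(size(A, 1), 2*N + 1)^min(size(A, 1), 2*N + 1);
    fprintf('N = %d: |S^N| = %6d, bound = %g\n', N, size(SN, 1), bnd);
end
```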
Proposition 4.3 gives us a natural way to consider truncations of the solution $u$ in frequency space. We will use these truncations to discretize the Galerkin formulation (GF) in Section 4.6 below. In order to analyze the error in the resulting spectral method algorithm, we will need quantitative bounds on how the solution decays outside of the frequency sets $S^N := S^N[\mathcal{A}](\operatorname{supp}(\hat{f}))$. For $S^N$ to be finite, we assume in this section that $\mathcal{A}$ and $\operatorname{supp}(\hat{f})$ are finite. This assumption will be lifted later via Lemma 4.5.

We begin with a technical result regarding the interplay between $L$ and the supports of vectors that it acts on.

Proposition 4.4. For any $\hat{v}$ with $\operatorname{supp}(\hat{v}) \subset S^n \setminus S^{n-1}$, $\operatorname{supp}(L\hat{v}) \subset S^{n+1} \setminus S^{n-2}$.

Proof. For any $\mathbf{k} \in \mathbb{Z}^d$, recall that row $\mathbf{k}$ of $L$ is supported on $\{\mathbf{k}\} + \mathcal{A}$. Consider
\[
(L\hat{v})_{\mathbf{k}} = \sum_{\mathbf{l} \in \mathbb{Z}^d} L_{\mathbf{k},\mathbf{l}} \hat{v}_{\mathbf{l}} = \sum_{\mathbf{l} \in (\{\mathbf{k}\} + \mathcal{A}) \cap \operatorname{supp}(\hat{v})} L_{\mathbf{k},\mathbf{l}} \hat{v}_{\mathbf{l}} = \sum_{\mathbf{l} \in (\{\mathbf{k}\} + \mathcal{A}) \cap (S^n \setminus S^{n-1})} L_{\mathbf{k},\mathbf{l}} \hat{v}_{\mathbf{l}}.
\]
This sum is nonempty only if $\mathbf{k}$ is such that there exists $\mathbf{l} \in S^n \setminus S^{n-1}$ and $\mathbf{k}^*_{\mathcal{A}} \in \mathcal{A}$ with $\mathbf{k} = \mathbf{l} + \mathbf{k}^*_{\mathcal{A}}$. By definition of $\mathbf{l} \in S^n \setminus S^{n-1}$, $n$ is the minimal such number that
\[
\mathbf{l} = \mathbf{k}_f + \sum_{m=1}^{n} \mathbf{k}^m_{\mathcal{A}}, \quad \text{where } \mathbf{k}_f \in \operatorname{supp}(\hat{f}),\ \mathbf{k}^m_{\mathcal{A}} \in \mathcal{A} \text{ for all } m = 1, \ldots, n
\]
holds. In particular, this implies that $\mathbf{k}^m_{\mathcal{A}} \ne \mathbf{0}$ for all $m = 1, \ldots, n$. There are now two cases. First, if $\mathbf{k}^*_{\mathcal{A}} = -\mathbf{k}^m_{\mathcal{A}}$ for any $m$, then $\mathbf{k} = \mathbf{l} + \mathbf{k}^*_{\mathcal{A}} \in S^{n-1} \setminus S^{n-2}$, and the proposition is satisfied. On the other hand, we consider the case when $\mathbf{k}^*_{\mathcal{A}}$ does not negate any $\mathbf{k}^m_{\mathcal{A}}$ involved in the sum equalling $\mathbf{l}$. If $\mathbf{k}^*_{\mathcal{A}} = \mathbf{0}$, then clearly $\mathbf{k} = \mathbf{l} \in S^n \setminus S^{n-1}$. In any other case, we represent
\[
\mathbf{k} = \mathbf{k}_f + \sum_{m=1}^{n} \mathbf{k}^m_{\mathcal{A}} + \mathbf{k}^*_{\mathcal{A}} =: \mathbf{k}_f + \sum_{m=1}^{n+1} \mathbf{k}^m_{\mathcal{A}},
\]
where $n + 1$ is the smallest number for which this holds. Thus, $\mathbf{k} \in S^{n+1} \setminus S^n$. Altogether then, the only possible $\mathbf{k}$ values such that the sum is nonzero are those in $S^{n+1} \setminus S^{n-2}$, completing the proof.

Noting that $\operatorname{supp}(L\hat{u}) = \operatorname{supp}(\hat{f})$, we observe the following interesting relationship between the values of $\hat{u}$ on neighboring stamping levels. Below, to simplify notation, for all $m, n \in \mathbb{N}_0$, we set
\[
d_{m,n} := \langle L\hat{u}|_{S^m \setminus S^{m-1}}, \hat{u}|_{S^n \setminus S^{n-1}} \rangle_{\ell^2},
\]
with the convention that $S^{-1} = \emptyset$.

Corollary 4.1. For all $n \in \mathbb{N}_0$,
\[
d_{n+1,n} + d_{n,n} + d_{n-1,n} =
\begin{cases}
\langle \hat{f}, \hat{u}|_{S^0} \rangle_{\ell^2} & \text{if } n = 0 \\
0 & \text{otherwise.}
\end{cases}
\]

Proof. By Proposition 4.4, $\hat{u}|_{S^n \setminus S^{n-1}}$ is $\ell^2$-orthogonal to $L\hat{u}|_{S^m \setminus S^{m-1}}$ for all $m \notin \{n-1, n, n+1\}$. In our simplified notation, $d_{m,n} = 0$ for all $m \notin \{n-1, n, n+1\}$. Thus
\[
\langle \hat{f}, \hat{u}|_{S^n \setminus S^{n-1}} \rangle_{\ell^2} = \langle L\hat{u}, \hat{u}|_{S^n \setminus S^{n-1}} \rangle_{\ell^2} = \sum_{m=0}^{\infty} d_{m,n} = d_{n+1,n} + d_{n,n} + d_{n-1,n}.
\]
The proof is finished by noting that
\[
\langle \hat{f}, \hat{u}|_{S^n \setminus S^{n-1}} \rangle_{\ell^2} =
\begin{cases}
\langle \hat{f}, \hat{u}|_{S^0} \rangle_{\ell^2} & \text{if } n = 0 \\
0 & \text{otherwise.}
\end{cases}
\]

We are now ready to estimate $\hat{u}|_{S^n \setminus S^{n-1}}$ in terms of its neighbors $\hat{u}|_{S^{n+1} \setminus S^n}$ and $\hat{u}|_{S^{n-1} \setminus S^{n-2}}$. The standard approach would be to use a combination of coercivity and continuity (see, e.g., the proof of Lemma 4.6 or [13, Section 6.4] for other examples): for $n > 0$,
\[
\alpha \left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1}^2 \le |d_{n,n}| \le |d_{n+1,n}| + |d_{n-1,n}| \le \beta \left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} \left( \left\| u|_{S^{n+1} \setminus S^n} \right\|_{H^1} + \left\| u|_{S^{n-1} \setminus S^{n-2}} \right\|_{H^1} \right),
\]
and we obtain
\[
\left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} \le \frac{\beta}{\alpha} \left( \left\| u|_{S^{n+1} \setminus S^n} \right\|_{H^1} + \left\| u|_{S^{n-1} \setminus S^{n-2}} \right\|_{H^1} \right).
\]
However, we will hope to iterate this bound, and the fact that $\beta \ge \alpha$ will not allow for us to show any decay as $n \to \infty$. Thus, we require a slightly subtler estimate than simply using continuity.

Proposition 4.5. Define
\[
\beta_{-0} := \max\left\{ \|a - \hat{a}_{\mathbf{0}}\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \left\| \mathbf{b}(\mathbf{x}) - \hat{\mathbf{b}}_{\mathbf{0}} \right\|_2,\ \|c - \hat{c}_{\mathbf{0}}\|_{L^\infty} \right\}.
\]
For $n > 0$, we have
\[
|d_{n\pm1,n}| \le \beta_{-0} \left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} \left\| u|_{S^{n\pm1} \setminus S^{n\pm1-1}} \right\|_{H^1}.
\]

Proof.
Restricting all sums to the support of the vectors they index, we have
\[
d_{n\pm1,n} = \sum_{\mathbf{k} \in S^n \setminus S^{n-1}} \ \sum_{\mathbf{l} \in (\{\mathbf{k}\} + \mathcal{A}) \cap (S^{n\pm1} \setminus S^{n\pm1-1})} L_{\mathbf{k},\mathbf{l}} \hat{u}_{\mathbf{l}} \overline{\hat{u}_{\mathbf{k}}}.
\]
Clearly, choosing $\mathbf{l} = \mathbf{k} \in S^n \setminus S^{n-1}$ would not allow for $\mathbf{l} \in S^{n\pm1} \setminus S^{n\pm1-1}$. Thus, no term multiplying $L_{\mathbf{k},\mathbf{k}}$ will appear in the sum. This implies that there are no terms including the Fourier coefficients $\hat{a}_{\mathbf{0}}$, $\hat{\mathbf{b}}_{\mathbf{0}}$, or $\hat{c}_{\mathbf{0}}$. It is therefore equivalent to replace $L$ with a version $L_-$ defined using the Fourier coefficients $\hat{a} - \hat{a}_{\mathbf{0}}$, $\hat{\mathbf{b}} - \hat{\mathbf{b}}_{\mathbf{0}}$, and $\hat{c} - \hat{c}_{\mathbf{0}}$. We then have the equivalence
\[
d_{n\pm1,n} = \langle L_- \hat{u}|_{S^{n\pm1} \setminus S^{n\pm1-1}}, \hat{u}|_{S^n \setminus S^{n-1}} \rangle_{\ell^2},
\]
which by Proposition 4.2 and the standard argument used to prove the continuity upper bound implies
\[
|d_{n\pm1,n}| \le \beta_{-0} \left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} \left\| u|_{S^{n\pm1} \setminus S^{n\pm1-1}} \right\|_{H^1}
\]
as desired.

The same argument preceding Proposition 4.5 then gives the desired "neighbor" estimate.

Corollary 4.2. For all $n > 1$,
\[
\left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} \le \frac{\beta_{-0}}{\alpha} \left( \left\| u|_{S^{n+1} \setminus S^n} \right\|_{H^1} + \left\| u|_{S^{n-1} \setminus S^{n-2}} \right\|_{H^1} \right).
\]

We now have the pieces to state an estimate of the truncation error.

Lemma 4.3. Let $a$, $\mathbf{b}$, $c$, $f$, and $u$ be as in Proposition 4.1. Assume
\[
3\beta_{-0} < \alpha. \tag{4.8}
\]
Then
\[
\|u - u|_{S^N}\|_{H^1} \le \left( \frac{\beta_{-0}}{\alpha - 2\beta_{-0}} \right)^{N+1} \frac{\|f\|_{L^2}}{\alpha}.
\]

Proof. We begin by breaking $\operatorname{supp}(\hat{u}) \setminus S^N$ into the sets of new contributions $\bigcup_{n=N+1}^{\infty} (S^n \setminus S^{n-1})$ (which holds due to Proposition 4.3). Thus
\[
\|u - u|_{S^N}\|_{H^1} \le \sum_{n=N+1}^{\infty} \left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} =: T_N.
\]
Applying the neighbor bound, Corollary 4.2 (where we define $A := \beta_{-0}/\alpha$), we have
\begin{align*}
T_N &\le A \left( \sum_{n=N+1}^{\infty} \left\| u|_{S^{n+1} \setminus S^n} \right\|_{H^1} + \sum_{n=N+1}^{\infty} \left\| u|_{S^{n-1} \setminus S^{n-2}} \right\|_{H^1} \right) = A(T_{N+1} + T_{N-1}) \\
&= 2A T_N + A \left( \left\| u|_{S^N \setminus S^{N-1}} \right\|_{H^1} - \left\| u|_{S^{N+1} \setminus S^N} \right\|_{H^1} \right).
\end{align*}
After rearranging, and ignoring the negative term, we find
\[
T_N \le \frac{A}{1 - 2A} \left\| u|_{S^N \setminus S^{N-1}} \right\|_{H^1}. \tag{4.9}
\]
Noting that we always have
\[
\left\| u|_{S^N \setminus S^{N-1}} \right\|_{H^1} \le T_{N-1}, \tag{4.10}
\]
iterating (4.9) and (4.10) in turn gives
\[
\|u - u|_{S^N}\|_{H^1} \le T_N \le \left( \frac{A}{1 - 2A} \right)^{N+1} \left\| u|_{S^0} \right\|_{H^1} \le \left( \frac{A}{1 - 2A} \right)^{N+1} \frac{\|f\|_{L^2}}{\alpha}.
\]

4.5 Fully sublinear-time SFTs with randomized lattices

In Chapter 3, two methods for high-dimensional SFTs are presented, each with a deterministic and Monte Carlo variant. Below, we will be using the faster of the two algorithms (at the cost of slightly suboptimal error guarantees), the phase-encoding approach with the nonequispaced sublinear-time SFT discussed in Corollary 3.2. We focus on only the Monte Carlo variant as the improvements in this section require randomization.

To use the high-dimensional phase-encoding SFT given in Algorithm 3.1, we need to know a reconstructing rank-1 lattice in advance. Though component-by-component algorithms can deterministically construct a reconstructing rank-1 lattice given any frequency set $I$, as previously discussed, these algorithms are superlinear in $|I|$ as they effectively search the frequency space for collisions throughout construction. This section presents an alternative based on choosing a random lattice. This lattice is chosen by drawing $\mathbf{z}$ from a uniform distribution over $\{1, \ldots, M-1\}^d$ for $M$ sufficiently large. Below, we provide probability estimates for when this lattice is reconstructing for a frequency set $I$.

Lemma 4.4. Let $K_I := \max_{j \in [d]} (\max_{\mathbf{k} \in I} k_j - \min_{\mathbf{l} \in I} l_j) + 1$ be the expansion of the frequency set $I \subset \mathbb{Z}^d$. Let $\sigma \in (0, 1]$, and fix $M$ to be the smallest prime greater than $\max\left(K_I, \frac{|I|^2}{\sigma}\right)$. Then drawing each component of $\mathbf{z}$ i.i.d. from $\{1, \ldots, M-1\}$ gives that $\Lambda(\mathbf{z}, M)$ is a reconstructing rank-1 lattice for $I$ with probability at least $1 - \sigma$.

Proof. In order to show that $\Lambda(\mathbf{z}, M)$ is reconstructing for $I$, it suffices to show that for any $\mathbf{k} \ne \mathbf{l} \in I$, $\mathbf{k} \cdot \mathbf{z} \not\equiv \mathbf{l} \cdot \mathbf{z} \bmod M$ (cf. Definition 1.2). Thus, we are interested in showing that $\mathbb{P}[\exists \mathbf{k} \ne \mathbf{l} \in I \text{ s.t. } (\mathbf{k} - \mathbf{l}) \cdot \mathbf{z} \equiv 0 \bmod M]$ is small. If $\mathbf{k}, \mathbf{l} \in I$ are distinct, at least one component $k_j - l_j$ is nonzero. Since $M > K_I$, we therefore have that $k_j - l_j \not\equiv 0 \bmod M$, and since $M$ is prime, $k_j - l_j$ has a multiplicative inverse modulo $M$. Then
\[
\mathbb{P}[(\mathbf{k} - \mathbf{l}) \cdot \mathbf{z} \equiv 0 \bmod M] = \mathbb{P}\left[ z_j \equiv -(k_j - l_j)^{-1} \sum_{i \in [d], i \ne j} (k_i - l_i) z_i \bmod M \right].
\]
Since $z_j$ is uniformly distributed in $\{1, \ldots, M-1\}$, this probability is at most $\frac{1}{M-1}$. By the union bound,
\[
\mathbb{P}[\exists \mathbf{k} \ne \mathbf{l} \in I \text{ s.t. } (\mathbf{k} - \mathbf{l}) \cdot \mathbf{z} \equiv 0 \bmod M] \le \sum_{\mathbf{k} \ne \mathbf{l} \in I} \mathbb{P}[(\mathbf{k} - \mathbf{l}) \cdot \mathbf{z} \equiv 0 \bmod M] \le \frac{|I|^2}{M - 1} \le \sigma
\]
as desired.
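The random draw in Lemma 4.4 is equally simple to realize. The following MATLAB sketch (ours) draws $\mathbf{z}$ and verifies the reconstructing property for a given frequency set by checking that $\mathbf{k} \mapsto \mathbf{k} \cdot \mathbf{z} \bmod M$ is injective on $I$; the parameter choices are illustrative assumptions.

```matlab
% Draw a random rank-1 lattice as in Lemma 4.4 and test reconstruction on I.
rng(1); d = 10; sigma = 0.5;
I = unique(randi([-50, 50], 40, d), 'rows');   % frequency set (rows), K_I <= 101
M = max(101, ceil(size(I, 1)^2 / sigma));      % M > max(K_I, |I|^2 / sigma) ...
while ~isprime(M), M = M + 1; end              % ... bumped to the next prime
z = randi([1, M - 1], d, 1);                   % generating vector, i.i.d. uniform
h = mod(I * z, M);                             % k.z mod M for every k in I
isReconstructing = numel(unique(h)) == size(I, 1);
fprintf('reconstructing: %d (M = %d)\n', isReconstructing, M);
```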
One important consequence of Lemma 4.4 is that we no longer need to provide the frequency set of interest in Corollary 3.2. Having chosen $K$, the expansion, and $s$, the sparsity level, we can always take $I$ to be the frequencies corresponding to the largest $s$ Fourier coefficients of the function $g$ in the hypercube $B^d_K$. Lemma 4.4 then implies that a randomly generated lattice with length $\max(K, s^2/\sigma)$ will be reconstructing for these optimal frequencies with probability at least $1 - \sigma$. We summarize this in the following corollary.

Corollary 4.3. Given a multivariate bandwidth $K$, a sparsity level $s$, probability of failure $\sigma \in (0, 1]$, and sampling access to $g \in L^2$, there exists a fast, randomized SFT which will produce a $2s$-sparse approximation $\hat{\mathbf{g}}^s$ of $\hat{g}$ and function $g^s := \sum_{\mathbf{k} \in \operatorname{supp}(\hat{\mathbf{g}}^s)} \hat{g}^s_{\mathbf{k}} \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ}$ approximating $g$ satisfying
\[
\|g - g^s\|_{L^2} \le \|\hat{g} - \hat{\mathbf{g}}^s\|_{\ell^2} \le \frac{25 + 3K}{\sqrt{s}} \left\| \hat{g} - (\hat{g}|_K)^{\mathrm{opt}}_s \right\|_{\ell^1}
\]
with probability at least $1 - \sigma$. If $g \in L^\infty$, then $g^s$ and $\hat{\mathbf{g}}^s$ satisfy the upper bound
\[
\|g - g^s\|_{L^\infty} \le \|\hat{g} - \hat{\mathbf{g}}^s\|_{\ell^1} \le (35 + 3K) \, s \left\| \hat{g} - (\hat{g}|_K)^{\mathrm{opt}}_s \right\|_{\ell^1}
\]
with the same probability estimate. The total number of samples of $g$ and computational complexity of the algorithm can be bounded above by
\[
O\left( ds \log^3\left( dK \max(K, s/\sigma) \right) \log\left( \frac{dK \max(K, s/\sigma)}{\sigma} \right) \right).
\]
If we fix $\sigma$ (say, so that the success probability $1 - \sigma$ is $0.95$), this reduces to a complexity of
\[
O\left( ds \log^4(dK \max(K, s)) \right).
\]

4.6 A sparse spectral method via SFTs

Let $\hat{\mathbf{a}}^s$, $\hat{\mathbf{b}}^s$, $\hat{\mathbf{c}}^s$, and $\hat{\mathbf{f}}^s$ be $s$-sparse approximations of $\hat{a}$, $\hat{\mathbf{b}}$, $\hat{c}$, and $\hat{f}$ respectively, where each coordinate of $\hat{\mathbf{b}}$ is approximated separately. We will use these approximations to discretize the Galerkin formulation (GF) of our PDE. The first step is to reduce to the case where the PDE data is Fourier-sparse, which is motivated by the following lemma.

Lemma 4.5. Let $a' := a|_{\operatorname{supp}(\hat{\mathbf{a}}^s)}$, $b'_j := b_j|_{\operatorname{supp}(\hat{\mathbf{b}}^s_j)}$ for $j \in [d]$, $c' := c|_{\operatorname{supp}(\hat{\mathbf{c}}^s)}$, and $f' := f|_{\operatorname{supp}(\hat{\mathbf{f}}^s)}$. Define
\[
\beta'_- := \max\left\{ \|a - a'\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \|\mathbf{b} - \mathbf{b}'\|_2,\ \|c - c'\|_{L^\infty} \right\}.
\]
Suppose that $a'$, $\mathbf{b}'$, $c'$, and $f'$ satisfy the conditions of Proposition 4.1 and let $u'$ be the unique solution of the resulting elliptic PDE, which we write in Galerkin form as
\[
L' \hat{u}' = \hat{f}'. \tag{4.11}
\]
Then
\[
\|u - u'\|_{H^1} \le \frac{\|f - f'\|_{L^2}}{\alpha} + \frac{\beta'_- \|f'\|_{L^2}}{\alpha \alpha'},
\]
where $\alpha'$ is taken to be the coercivity coefficient of the differential operator defined using $a'$, $\mathbf{b}'$, and $c'$.

Proof. We begin by observing
\[
L(\hat{u} - \hat{u}') = L\hat{u} - L'\hat{u}' - (L - L')\hat{u}' = \hat{f} - \hat{f}' - (L - L')\hat{u}',
\]
and therefore
\[
|\langle L(\hat{u} - \hat{u}'), \hat{u} - \hat{u}' \rangle| \le |\langle \hat{f} - \hat{f}', \hat{u} - \hat{u}' \rangle| + |\langle (L - L')\hat{u}', \hat{u} - \hat{u}' \rangle|.
\]
After an application of Proposition 4.2 to convert the $\ell^2$ inner products into sesquilinear forms, we can make use of coercivity (4.2), continuity (4.1), and the Cauchy-Schwarz inequality to produce the $H^1$ approximation
\[
\alpha \|u - u'\|_{H^1} \le \left\| \hat{f} - \hat{f}' \right\|_{\ell^2} + \beta'_- \|u'\|_{H^1}.
\]
An application of the stability estimate (4.3) gives the desired bound
\[
\|u - u'\|_{H^1} \le \frac{\|f - f'\|_{L^2}}{\alpha} + \frac{\beta'_- \|f'\|_{L^2}}{\alpha \alpha'}.
\]
We can now replace the trial and test spaces in (WF) with finite dimensional approximations so as to convert (GF) to a matrix equation. Inspired by Proposition 4.3 and the truncation error analysis in Section 4.4, we use the space of functions whose Fourier coefficients are supported on $S^N := S^N[\mathcal{A}](\operatorname{supp}(\hat{f}))$. By doing so, we discretize the Galerkin formulation of the problem (GF) into the finite system of equations
\[
(\mathsf{L}^N \hat{\mathbf{u}})_{\mathbf{k}} := \sum_{\mathbf{l} \in S^N} \left[ (2\pi)^2 (\mathbf{l} \cdot \mathbf{k}) \hat{a}_{\mathbf{k}-\mathbf{l}} + 2\pi i \left( \hat{\mathbf{b}}_{\mathbf{k}-\mathbf{l}} \cdot \mathbf{l} \right) + \hat{c}_{\mathbf{k}-\mathbf{l}} \right] \hat{u}_{\mathbf{l}} = \hat{f}_{\mathbf{k}} \quad \text{for all } \mathbf{k} \in S^N. \tag{4.12}
\]
However, in practice, we do not know $\hat{a}$, $\hat{\mathbf{b}}$, $\hat{c}$, and $\hat{f}$ exactly (and indeed, they may not be exactly sparse). Thus, we substitute the SFT approximations $\hat{\mathbf{a}}^s$, $\hat{\mathbf{b}}^s$, $\hat{\mathbf{c}}^s$, and $\hat{\mathbf{f}}^s$, defining the new finite-dimensional operator $\mathsf{L}^{s,N} : \mathbb{C}^{S^N} \to \mathbb{C}^{S^N}$ by
\[
(\mathsf{L}^{s,N} \hat{\mathbf{u}})_{\mathbf{k}} := \sum_{\mathbf{l} \in S^N} \left[ (2\pi)^2 (\mathbf{l} \cdot \mathbf{k}) \hat{a}^s_{\mathbf{k}-\mathbf{l}} + 2\pi i \left( \hat{\mathbf{b}}^s_{\mathbf{k}-\mathbf{l}} \cdot \mathbf{l} \right) + \hat{c}^s_{\mathbf{k}-\mathbf{l}} \right] \hat{u}_{\mathbf{l}} \quad \text{for all } \mathbf{k} \in S^N.
\]
Our new approximate solution will be $\hat{\mathbf{u}}^{s,N} \in \mathbb{C}^{S^N}$ which solves
\[
\mathsf{L}^{s,N} \hat{\mathbf{u}}^{s,N} = \hat{\mathbf{f}}^s. \tag{4.13}
\]
We summarize our technique in Algorithm 4.1.

Algorithm 4.1 Sparse spectral method
Input: PDE data $a$, $\mathbf{b}$, $c$, and $f$, a sparsity parameter $s$, a bandwidth parameter $K$, and stamping level $N$
Output: Fourier coefficients $\hat{\mathbf{u}}^{s,N}$ of approximate solution
1: $\hat{\mathbf{a}}^s \leftarrow \mathrm{SFT}[s, K](a)$ // SFT is Algorithm 3.1 using a random rank-1 lattice (cf. Section 4.5)
2: $\mathcal{A}^s \leftarrow \operatorname{supp}(\hat{\mathbf{a}}^s)$
3: for $j \in [d]$ do
4:   $\hat{\mathbf{b}}^s_j \leftarrow \mathrm{SFT}[s, K](b_j)$
5:   $\mathcal{A}^s \leftarrow \mathcal{A}^s \cup \operatorname{supp}\left(\hat{\mathbf{b}}^s_j\right)$
6: end for
7: $\hat{\mathbf{c}}^s \leftarrow \mathrm{SFT}[s, K](c)$
8: $\mathcal{A}^s \leftarrow \mathcal{A}^s \cup \operatorname{supp}(\hat{\mathbf{c}}^s)$
9: $\hat{\mathbf{f}}^s \leftarrow \mathrm{SFT}[s, K](f)$
10: Compute $S^N[\mathcal{A}^s]\left(\operatorname{supp}\left(\hat{\mathbf{f}}^s\right)\right)$ // see, e.g., (4.4) or (4.7)
11: $(\mathsf{L}^{s,N})_{\mathbf{k} \in S^N, \mathbf{l} \in S^N} \leftarrow (2\pi)^2 (\mathbf{l} \cdot \mathbf{k}) \hat{a}^s_{\mathbf{k}-\mathbf{l}} + 2\pi i \left( \hat{\mathbf{b}}^s_{\mathbf{k}-\mathbf{l}} \cdot \mathbf{l} \right) + \hat{c}^s_{\mathbf{k}-\mathbf{l}}$
12: $\hat{\mathbf{u}}^{s,N} \leftarrow \mathsf{L}^{s,N} \backslash \hat{\mathbf{f}}^s$ // using MATLAB backslash notation for matrix solve

Showing that $u^{s,N}$ converges to $u$ now relies on a version of Strang's lemma [13, Equation (6.4.46)]. We make the assumption here that all functions' Fourier coefficients are supported on the supports of the outputs of their respective SFTs so that our use of $S^N$ is unambiguous. However, this assumption will be lifted by Lemma 4.5 in Corollary 4.4 below.

Lemma 4.6 (Strang's Lemma). Suppose that $\operatorname{supp}(\hat{a}) = \operatorname{supp}(\hat{\mathbf{a}}^s)$, $\operatorname{supp}(\hat{b}_j) = \operatorname{supp}(\hat{\mathbf{b}}^s_j)$ for all $j \in [d]$, $\operatorname{supp}(\hat{c}) = \operatorname{supp}(\hat{\mathbf{c}}^s)$, and $\operatorname{supp}(\hat{f}) = \operatorname{supp}(\hat{\mathbf{f}}^s)$. Also suppose that $a^s \ge a^s_{\min} > 0$ and $-\frac{1}{2} \nabla \cdot \mathbf{b}^s + c^s \ge d^s_{\min} > 0$ on $\mathbb{T}^d$, with $\alpha^s \ge \min\{a^s_{\min}, d^s_{\min}\}$. Additionally, define
\[
\beta^s_- := \max\left\{ \|a - a^s\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \|\mathbf{b} - \mathbf{b}^s\|_2,\ \|c - c^s\|_{L^\infty} \right\}.
\]
Let $u$ and $u^{s,N}$ be as above. Then
\[
\left\| u - u^{s,N} \right\|_{H^1} \le \left( 1 + \frac{\beta}{\alpha^s} \right) \left\| u|_{\mathbb{Z}^d \setminus S^N} \right\|_{H^1} + \frac{\beta^s_-}{\alpha^s} \left\| u|_{S^N} \right\|_{H^1} + \frac{\|f - f^s\|_{L^2}}{\alpha^s}.
\]

Proof. Define $L^s$ as $L$ where $a$, $\mathbf{b}$, and $c$ are replaced by $a^s$, $\mathbf{b}^s$, and $c^s$. Note that $L^s$ is still an infinite dimensional operator and is not truncated to $S^N$ like $\mathsf{L}^{s,N}$ is. We let $\hat{\mathbf{e}} := \hat{\mathbf{u}}^{s,N} - \hat{u}|_{S^N}$, and consider
\begin{align*}
\mathsf{L}^{s,N} \hat{\mathbf{e}} &= \mathsf{L}^{s,N} \hat{\mathbf{u}}^{s,N} - (L^s \hat{u}|_{S^N})|_{S^N} = \hat{\mathbf{f}}^s - \hat{f} + (L\hat{u})|_{S^N} - (L^s \hat{u}|_{S^N})|_{S^N} \\
&= \hat{\mathbf{f}}^s - \hat{f} + (L \hat{u}|_{\mathbb{Z}^d \setminus S^N})|_{S^N} + ((L - L^s) \hat{u}|_{S^N})|_{S^N}.
\end{align*}
Noting that $\mathsf{L}^{s,N} \hat{\mathbf{e}} = (L^s \hat{\mathbf{e}})|_{S^N}$ and owing to coercivity of $L^s$, we have
\[
\alpha^s \|e\|_{H^1}^2 \le |\langle \mathsf{L}^{s,N} \hat{\mathbf{e}}, \hat{\mathbf{e}} \rangle| \le \|f^s - f\|_{L^2} \|e\|_{H^1} + \beta \left\| u|_{\mathbb{Z}^d \setminus S^N} \right\|_{H^1} \|e\|_{H^1} + \beta^s_- \left\| u|_{S^N} \right\|_{H^1} \|e\|_{H^1}.
\]
The result then follows from rearranging to estimate $\|e\|_{H^1}$ and using the triangle inequality to estimate
\[
\left\| u - u^{s,N} \right\|_{H^1} \le \left\| u - u|_{S^N} \right\|_{H^1} + \|e\|_{H^1}.
\]
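For concreteness, the following MATLAB sketch (ours; a dense, brute-force version of lines 10–12 of Algorithm 4.1, not the object-oriented released code) assembles $\mathsf{L}^{s,N}$ entrywise from the sparse Fourier data and solves (4.13) by a direct matrix solve. The function name and data layout are our assumptions.

```matlab
function [uhat, L] = solve_sparse_galerkin(S, Ka, ahat, Kb, bhat, Kc, chat, fhat)
% SOLVE_SPARSE_GALERKIN  Dense sketch of (4.12)-(4.13) on the stamping set S.
%   S: |S| x d frequencies k (rows); fhat: |S| x 1 right-hand side on S.
%   Ka, Kc: support rows of a-hat, c-hat; ahat, chat: matching coefficients.
%   Kb: support rows shared by all b-hat_j; bhat: |Kb| x d coefficient vectors.
    m = size(S, 1);
    L = zeros(m, m);
    for p = 1:m
        for q = 1:m
            k = S(p, :); l = S(q, :); kl = k - l;
            [ta, ia] = ismember(kl, Ka, 'rows');
            [tb, ib] = ismember(kl, Kb, 'rows');
            [tc, ic] = ismember(kl, Kc, 'rows');
            v = 0;
            if ta, v = v + (2*pi)^2 * (l * k.') * ahat(ia); end  % diffusion
            if tb, v = v + 2i * pi * (bhat(ib, :) * l.');   end  % advection
            if tc, v = v + chat(ic);                        end  % reaction
            L(p, q) = v;
        end
    end
    uhat = L \ fhat;   % line 12 of Algorithm 4.1
end
```

The entry formula is exactly that of (4.12); only the $O(|S^N|^2)$ brute-force support lookups are a simplification of what a practical implementation would organize more cleverly.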
We can now thread all of our results together into a final convergence analysis. The first corollary below is a more direct application of Strang's lemma which is then followed by another corollary which takes advantage of the SFT recovery results. We will also return to the setting where the PDE data are not necessarily Fourier sparse. Thus, we again employ intermediate, compactly Fourier-supported PDE data as in Lemma 4.5.

Corollary 4.4. Let $a^s$, $\mathbf{b}^s$, $c^s$, and $f^s$ be Fourier sparse approximations of $a$, $\mathbf{b}$, $c$, and $f$. Let $a' = a|_{\operatorname{supp}(\hat{\mathbf{a}}^s)}$, $b'_j = b_j|_{\operatorname{supp}(\hat{\mathbf{b}}^s_j)}$ for all $j \in [d]$, $c' = c|_{\operatorname{supp}(\hat{\mathbf{c}}^s)}$, and $f' = f|_{\operatorname{supp}(\hat{\mathbf{f}}^s)}$. Suppose $a, \mathbf{b}, c, f$; $a', \mathbf{b}', c', f'$; and $a^s, \mathbf{b}^s, c^s, f^s$ satisfy the conditions of Proposition 4.1 with coercivity constants $\alpha$, $\alpha'$, and $\alpha^s$ respectively. Define the three modified continuity constants
\begin{align*}
\beta'_- &:= \max\left\{ \|a - a'\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \|\mathbf{b} - \mathbf{b}'\|_2,\ \|c - c'\|_{L^\infty} \right\}, \\
\beta^{\prime,0}_- &:= \max\left\{ \|a' - \hat{a}_{\mathbf{0}}'\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \left\| \mathbf{b}' - \hat{\mathbf{b}}_{\mathbf{0}}' \right\|_2,\ \|c' - \hat{c}_{\mathbf{0}}'\|_{L^\infty} \right\}, \\
\beta^{\prime,s}_- &:= \max\left\{ \|a' - a^s\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \|\mathbf{b}' - \mathbf{b}^s\|_2,\ \|c' - c^s\|_{L^\infty} \right\}.
\end{align*}
Additionally, suppose that
\[
3\beta^{\prime,0}_- < \alpha'. \tag{4.14}
\]
Then with $u$ the exact solution to (WF) and $u^{s,N}$ the output of Algorithm 4.1, we have
\[
\left\| u - u^{s,N} \right\|_{H^1} \le \frac{\|f - f'\|_{L^2}}{\alpha} + \frac{\beta'_- \|f'\|_{L^2}}{\alpha \alpha'} + \frac{\beta^{\prime,s}_- \|f'\|_{L^2}}{\alpha^s \alpha'} + \frac{\|f' - f^s\|_{L^2}}{\alpha^s} + \left( 1 + \frac{\beta'_-}{\alpha^s} \right) \left( \frac{\beta^{\prime,0}_-}{\alpha' - 2\beta^{\prime,0}_-} \right)^{N+1} \frac{\|f'\|_{L^2}}{\alpha'}. \tag{4.15}
\]

Proof. The condition (4.14) allows the use of Lemma 4.3, which upper bounds the truncation error in Lemma 4.6. Combining Lemma 4.5 with this bound from Lemma 4.6 and applying the stability estimate from Proposition 4.1 finishes the proof.

This upper bound relies on the intermediate $a'$, $\mathbf{b}'$, $c'$, and $f'$. However, in practice, it is more likely that the user of this algorithm will have knowledge regarding the well-posedness of the original problem (i.e., that $a$, $\mathbf{b}$, $c$, and $f$ satisfy Proposition 4.1) and will be able to verify the well-posedness of the sparse approximate problem (i.e., that $a^s$, $\mathbf{b}^s$, $c^s$, and $f^s$ satisfy Proposition 4.1) or at least increase the accuracy of the SFT so that the coercivity conditions of the original problem are not too far perturbed. The intermediate "prime" functions, on the other hand, are less accessible. Therefore, we rewrite this statement so the assumptions and error bounds can be quantified using only errors between the original functions and the sparse approximations, which Corollary 4.3 gives upper bounds for.

Corollary 4.5. Assume that $a, c \in L^\infty(\mathbb{T}^d; \mathbb{R})$, $\mathbf{b} \in H^1(\mathbb{T}^d; \mathbb{R})^d$, and $f \in L^2(\mathbb{T}^d; \mathbb{R})$ with $a(\mathbf{x}) \ge a_{\min} > 0$ and $-\frac{1}{2} \nabla \cdot \mathbf{b}(\mathbf{x}) + c(\mathbf{x}) \ge d_{\min} > 0$ a.e. on $\mathbb{T}^d$. Let $a^s$, $\mathbf{b}^s$, $c^s$, and $f^s$ be Fourier sparse approximations supported in frequency on $B^d_K$ of $a$, $\mathbf{b}$, $c$, and $f$ respectively with
\[
\|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1} < a_{\min}, \qquad
\|\hat{c} - \hat{\mathbf{c}}^s\|_{\ell^1} + \frac{\pi K}{2} \sum_{j \in [d]} \left\| \hat{b}_j - \hat{\mathbf{b}}^s_j \right\|_{\ell^1} + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} < d_{\min}. \tag{4.16}
\]
Define
\[
\alpha := \min\left\{ a_{\min} - \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1},\ d_{\min} - \|\hat{c} - \hat{\mathbf{c}}^s\|_{\ell^1} - \frac{\pi K}{2} \sum_{j \in [d]} \left\| \hat{b}_j - \hat{\mathbf{b}}^s_j \right\|_{\ell^1} - \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} \right\} > 0,
\]
\[
\hat{\beta}^s_- := \max\left\{ \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1},\ \sqrt{\sum_{j \in [d]} \left\| \hat{b}_j - \hat{\mathbf{b}}^s_j \right\|_{\ell^1}^2},\ \|\hat{c} - \hat{\mathbf{c}}^s\|_{\ell^1} \right\},
\]
and
\[
\hat{\beta}^0_- := \max\left\{ \|\hat{a} - \hat{a}_{\mathbf{0}}\|_{\ell^1},\ \sqrt{\sum_{j \in [d]} \left\| \hat{b}_j - (\hat{b}_j)_{\mathbf{0}} \right\|_{\ell^1}^2},\ \|\hat{c} - \hat{c}_{\mathbf{0}}\|_{\ell^1} \right\}.
\]
Additionally, suppose that $3\hat{\beta}^0_- \le \alpha$. Then with $u$ the exact solution to (WF) and $u^{s,N}$ the output of Algorithm 4.1, we have
\[
\left\| u - u^{s,N} \right\|_{H^1} \le 3 \frac{\|\hat{f}\|_{\ell^2}}{\alpha} \left( \frac{\left\| \hat{f} - \hat{\mathbf{f}}^s \right\|_{\ell^2}}{\|\hat{f}\|_{\ell^2}} + \frac{\hat{\beta}^s_-}{\alpha} + \left( \frac{\hat{\beta}^0_-}{\alpha - 2\hat{\beta}^0_-} \right)^{N+1} \right).
\]

Proof. Since $\hat{a}' = \hat{a}|_{\operatorname{supp}(\hat{\mathbf{a}}^s)}$,
\[
\|a - a'\|_{L^\infty} \le \|\hat{a} - \hat{a}'\|_{\ell^1} \le \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1}, \qquad \|a' - a^s\|_{L^\infty} \le \|\hat{a}' - \hat{\mathbf{a}}^s\|_{\ell^1} \le \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1},
\]
and analogously for $c$, $b_j$ for all $j \in [d]$, and $f$, where the latter uses $\ell^2$ norms.
This allows for the replacement of $\beta'_-$ and $\beta^{\prime,s}_-$ in (4.15) by $\hat{\beta}^s_-$ as well as the replacement of $\|f - f'\|_{L^2}$ and $\|f' - f^s\|_{L^2}$ by $\left\| \hat{f} - \hat{\mathbf{f}}^s \right\|_{\ell^2}$. A similar argument allows the replacement of $\beta^{\prime,0}_-$ by $\hat{\beta}^0_-$.

Additionally, $a^s \ge a - \|a - a^s\|_{L^\infty} \ge a - \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1}$ and $a' \ge a - \|a - a'\|_{L^\infty} \ge a - \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1}$, giving $\min(a^s_{\min}, a'_{\min}) \ge a_{\min} - \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1}$. We can bound $\min(d^s_{\min}, d'_{\min})$ from below similarly. In particular, e.g.,
\[
c' - \frac{1}{2} \nabla \cdot \mathbf{b}' \ge c - \frac{1}{2} \nabla \cdot \mathbf{b} - \|c - c'\|_{L^\infty} - \frac{1}{2} \|\nabla \cdot (\mathbf{b} - \mathbf{b}')\|_{L^\infty}.
\]
The $\|c - c'\|_{L^\infty}$ term can be bounded by $\|\hat{c} - \hat{\mathbf{c}}^s\|_{\ell^1}$. To bound the divergence term, we use
\begin{align*}
\|\nabla \cdot (\mathbf{b} - \mathbf{b}')\|_{L^\infty}
&\le \|\nabla \cdot (\mathbf{b} - \mathbf{b}')|_K\|_{L^\infty} + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} \\
&= \left\| \sum_{j \in [d]} \sum_{\mathbf{k} \in B^d_K \setminus \operatorname{supp}(\hat{\mathbf{b}}^s_j)} \left( \hat{b}_j \right)_{\mathbf{k}} \partial_j \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ} \right\|_{L^\infty} + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} \\
&\le 2\pi \sum_{j \in [d]} \sum_{\mathbf{k} \in B^d_K \setminus \operatorname{supp}(\hat{\mathbf{b}}^s_j)} \left| \left( \hat{b}_j \right)_{\mathbf{k}} k_j \right| + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} \tag{4.17} \\
&\le K\pi \sum_{j \in [d]} \left\| \hat{b}_j - \hat{b}'_j \right\|_{\ell^1} + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} \le K\pi \sum_{j \in [d]} \left\| \hat{b}_j - \hat{\mathbf{b}}^s_j \right\|_{\ell^1} + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty}.
\end{align*}
Thus $\min(\alpha', \alpha^s) \ge \alpha$ as stated, implying the satisfaction of Proposition 4.1 for the PDEs with $a', \mathbf{b}', c', f'$ and $a^s, \mathbf{b}^s, c^s, f^s$ as data. This also allows the replacement of $\alpha$, $\alpha'$, and $\alpha^s$ in (4.15) by $\alpha$. The rest follows by upper bounding $\|f'\|_{L^2}$ by $\|\hat{f}\|_{\ell^2}$, combining like terms, and simplifying.

Remark 4.1. Corollary 4.5 includes some overly cautious concessions in order to produce a fully unified result with cleaner error bounds. In particular, condition (4.16) and the resulting definition of $\alpha$ are used to avoid the need to consider well-posedness of the approximate versions of the PDE as required in Corollary 4.4. In general, this condition is less important as the SFT approximations of the PDE data become more accurate. The pessimistic advection term bounding in (4.17) is a result of the fact that $C^1$ guarantees for the SFT algorithm are not available. Again, this step is unnecessary if it is known (or assumed) that the approximate PDEs are well-posed. However, note that the truncation term $\|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty}$ can be controlled via regularity results for multivariate Fourier truncation, e.g., [64, 47], so long as the regularity of the advection field is known a priori.

Remark 4.2. We can interpret this upper bound by focusing on the sum
\[
\frac{\left\| \hat{f} - \hat{\mathbf{f}}^s \right\|_{\ell^2}}{\|\hat{f}\|_{\ell^2}} + \frac{\hat{\beta}^s_-}{\alpha} + \left( \frac{\hat{\beta}^0_-}{\alpha - 2\hat{\beta}^0_-} \right)^{N+1}. \tag{4.18}
\]
The first term is controlled by the accuracy of the SFT approximation to $f$. As a reminder, using Algorithm 3.1 for this SFT produces a near optimal error, upper bounded in Corollary 4.3 by
\[
\left\| \hat{f} - \hat{\mathbf{f}}^s \right\|_{\ell^2} \le \frac{25 + 3K}{\sqrt{s}} \left\| \hat{f} - (\hat{f}|_K)^{\mathrm{opt}}_s \right\|_{\ell^1}.
\]
The second term, $\hat{\beta}^s_-/\alpha$, is controlled by the accuracy of the SFT approximations of the coefficients defining the differential operator, $a$, $\mathbf{b}$, and $c$. Again, recall that Algorithm 3.1 produces near optimal approximations with error upper bounded by, e.g.,
\[
\|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1} \le (25 + 3K) \, s \left\| \hat{a} - (\hat{a}|_K)^{\mathrm{opt}}_s \right\|_{\ell^1}.
\]
The final term is controlled by two factors: the properties of the PDE data and the stamping level chosen. We see that the error decays exponentially as the stamping level increases. The base of this exponent is controlled by the PDE data. In particular, convergence is accelerated as $a$ and $c$ approach large constants and $\mathbf{b}$ approaches a field with divergence zero and little deviation from its mean. Indeed, $\hat{\beta}^0_-$ is reduced as the deviation of all three coefficients from their means decreases. The other piece, $\alpha$ (ignoring the SFT-dependent terms), increases as the minimums of $a$ and $c - \frac{1}{2} \nabla \cdot \mathbf{b}$ increase.

Remark 4.3.
The computational complexity of Algorithm 4.1 is
\[
O\left( ds \log^4(dK \max(K, s)) + \max(s, 2N+1)^{3\min(s, 2N+1)} \right)
\]
in the case of no advection field, and
\[
O\left( d^2 s \log^4(dK \max(K, s)) + \max(ds, 2N+1)^{3\min(ds, 2N+1)} \right)
\]
when an advection field is present. This is due to the three or $d + 3$ SFTs respectively and a matrix solve of an $|S^N| \times |S^N|$ system. Note that computing the stamping set can be done by enumerating the frequencies using the techniques in Lemma 4.2 and therefore is subject to the same upper bound as given in Lemma 4.1 for a stamping set's cardinality. Recall also that the SFT complexity can be tuned to produce SFT approximations satisfying the above bounds with higher probability.

We do not analyze the complexity of the matrix solve in depth, and instead resort to the upper bound given by Gaussian elimination on the dense matrix. However, $\mathsf{L}^{s,N}$ is relatively sparse for larger stamping levels. As the capabilities of sparse solvers depend strongly on analyzing the graph connecting interacting rows in $\mathsf{L}^{s,N}$ (cf. [28, Chapter 11]), we expect that the analysis of an efficient sparse solver could be carried out using much of the same analysis of stamping sets performed in Section 4.4.

4.7 Numerics

This section gives examples of the algorithm summarized above applied to various problems. We begin with an overview of our implementation as well as some techniques used to evaluate the accuracy of our approximations. We then present solutions to univariate and very high-dimensional multiscale problems with both exactly sparse and Fourier-compressible data. For simplicity, all experiments presented except for the last discard the advection and reaction terms, solving only a stationary diffusion equation. In this setting, solutions are unique up to constant shifts, so we always consider solutions with mean zero, that is, $\hat{u}_{\mathbf{0}} = 0$.

4.7.1 Code and testing overview

We implement Algorithm 4.1 described above in MATLAB using an object-oriented approach, with all code publicly available.¹ All SFTs are computed using the rank-1 lattice sparse Fourier transforms from Chapter 3.²

¹https://gitlab.com/grosscra/SparseADR
²This code is publicly available at https://gitlab.com/grosscra/Rank1LatticeSparseFourier

In order to evaluate the quality of our approximations, we need to choose an appropriate metric. Letting $u^{s,N}$ be the approximation returned by our algorithm, the ideal choice would be to use $\left\| u - u^{s,N} \right\|_{H^1}$. However, for the types of problems we will be investigating, the true solution $u$ is unavailable to us. Instead, we will use a proxy that takes advantage of the stability result in Proposition 4.1.

Lemma 4.7. Let $u$ be the true solution to (GF) and $u^{s,N}$ be the approximation returned by solving (4.13). Define $\hat{f}^{s,N} := L\hat{u}^{s,N}$ with $f^{s,N} = \mathcal{L}u^{s,N}$. Then
\[
\left\| u - u^{s,N} \right\|_{H^1} \le \frac{\left\| f - f^{s,N} \right\|_{L^2}}{\alpha} = \frac{\left\| \hat{f} - \hat{f}^{s,N} \right\|_{\ell^2}}{\alpha}.
\]

Proof. The result follows from the fact that $\hat{u} - \hat{u}^{s,N}$ solves $L\left( \hat{u} - \hat{u}^{s,N} \right) = \hat{f} - L\hat{u}^{s,N} = \hat{f} - \hat{f}^{s,N}$ and applying Proposition 4.1.

In the sequel, we will ignore $\alpha$ since we are mostly interested in convergence properties in $s$ and $N$, and we will compute the relative error
\[
\frac{\left\| f - f^{s,N} \right\|_{L^2}}{\|f\|_{L^2}} \quad \text{or} \quad \frac{\left\| \hat{f} - \hat{f}^{s,N} \right\|_{\ell^2}}{\|\hat{f}\|_{\ell^2}}
\]
as our proxy instead. Whenever the data are exactly Fourier-sparse, the numerator of the second of these proxies can be computed exactly due to the fact that $\operatorname{supp}(\hat{f}^{s,N})$ is known to be contained in $S^{N+1}$ (cf. Proposition 4.4).
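For exactly sparse data, this exact computation can be organized as follows (a sketch, ours, reusing the stamping_set and solve_sparse_galerkin sketches from earlier in the chapter; it assumes those variables are in scope and that $\mathbf{0} \in \mathcal{A}$ so that $S^N \subset S^{N+1}$).

```matlab
% Exact proxy error of Lemma 4.7 for exactly sparse data: f-hat^{s,N} = L u-hat^{s,N}
% is supported in S^{N+1} by Proposition 4.4, so the l2 norm is a finite sum.
S1 = stamping_set(Kf, A, N + 1);                    % S^{N+1}; Kf = supp(f-hat)
[~, iu] = ismember(S, S1, 'rows');                  % embed S^N into S^{N+1}
upad = zeros(size(S1, 1), 1); upad(iu) = uhat;
[~, jf] = ismember(Kf, S1, 'rows');                 % f-hat lives on S^0 = supp(f-hat)
fpad = zeros(size(S1, 1), 1); fpad(jf) = fhat;
[~, LsN1] = solve_sparse_galerkin(S1, Ka, ahat, Kb, bhat, Kc, chat, fpad);
relerr = norm(fpad - LsN1 * upad) / norm(fpad);     % exact relative proxy error
```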
However, in the non-sparse setting, even though $f - f^{s,N}$ can be evaluated pointwise, computing an accurate approximation of its norm on $\mathbb{T}^d$ is challenging for large $d$. For this reason, we approximate the norm via Monte Carlo sampling. We also furnish the cases where exactly computing $\left\| \hat{f} - \hat{f}^{s,N} \right\|_{\ell^2}$ is possible with the pointwise Monte Carlo estimates to show that in practice, Monte Carlo sampling does as well as the exact computation.

4.7.2 Univariate compressible

We begin by replicating the lone numerical example of solving an elliptic problem in [21, Section 5.1]. In this case, we solve the univariate problem $-(a(x)u'(x))' = f(x)$ for all $x \in \mathbb{T}$, where
\[
a(x) = \exp\left( \frac{0.6 + 0.2\cos(2\pi x)}{1 + 0.7\sin(256\pi x)} \right), \qquad f(x) = \frac{1}{10}\left( \exp(-\cos(2\pi x)) - \int_{\mathbb{T}} \exp(-\cos(2\pi x)) \, dx \right) \tag{4.19}
\]
(note that the only difference from [21] is that we use the domain $\mathbb{T} = [0, 1]$ rather than $[0, 2\pi]$). This data is not Fourier sparse, but is compressible. In the original paper, a bandwidth of $K = 1\,536$ is considered and approximations with 9 and 17 Fourier coefficients are used.

We first construct a high accuracy approximation of the solution to (4.19) by numerically integrating on an extremely fine mesh of 10 000 points. This allows us to forgo our proxy error described in Lemma 4.7. As in [21], the bandwidth of our SFT is set to $K = 1\,536$. Due to our SFT returning a $2s$-sparse approximation, we use $s = 4$ and $s = 8$ to compare with the 9 and 17 terms respectively considered in the original paper, and also provide an example with $s = 12$. We set the stamping level to $N = 1$ throughout, which, as discussed in the introduction, is similar to the technique used in [21].

[Figure 4.2: relative errors in $L^2$ and $H^1$ together with the proxy error, plotted against the sparsity $s = 4, 8, 12$.] Figure 4.2 Errors in approximating the solution to (4.19).

[Figure 4.3: (a) the approximate solutions $u^{4,1}$, $u^{8,1}$, $u^{12,1}$ plotted against $u$ over $[0, 1]$; (b) detail of the approximate derivatives $(u^{4,1})'$, $(u^{8,1})'$, $(u^{12,1})'$ plotted against $u'$ near $x = 0.685$.] Figure 4.3 Qualitative results. (a) Approximate solutions of (4.19). (b) Detail of approximate derivatives of (4.19).

The relative errors approximated in $L^2$ and $H^1$ are given in Figure 4.2. The original paper does not give numerical results, and instead gives qualitative results, comparing the approximate solutions and their derivatives with the true solution and its derivative. We have replicated this qualitative analysis in Figure 4.3 with similar results.

Figure 4.2 also shows the error computed via the proxy described by Lemma 4.7, and in particular, how pessimistic the proxy error can be. In this case, the small errors in the derivative (visualized in Figure 4.3b) are compounded by passing the approximate solution through the operator where $a'$ is often large relative to $a$. In future examples, we will see that the convergence of the proxy error is much more tolerable.

4.7.3 Multivariate exactly sparse

4.7.3.1 Low sparsity

Moving to the multivariate case, we start with a simple example with exactly sparse data. Our goal is to solve $-\nabla \cdot (a(\mathbf{x}) \nabla u(\mathbf{x})) = f(\mathbf{x})$ for all $\mathbf{x} \in \mathbb{T}^d$, where
\[
a(\mathbf{x}) = \hat{a}_{\mathbf{0}} + c_a \cos(2\pi \mathbf{k}_a \cdot \mathbf{x}), \qquad f(\mathbf{x}) = \sin\left( 2\pi \mathbf{k}_f \cdot \mathbf{x} \right). \tag{4.20}
\]
We draw $c_a \sim \mathrm{Unif}([-1, 1])$, keep it constant for each dimension, and set $\hat{a}_{\mathbf{0}} = 4$ so that our problem remains elliptic (in the specific example below, $c_a \approx -0.6$). For dimensions varying from $d = 1$ to $d = 1\,024$, we then draw $\mathbf{k}_a, \mathbf{k}_f \sim \mathrm{Unif}\left( [-499, 500]^d \cap \mathbb{Z}^d \right)$. The PDE (4.20) is then solved for stamping levels $N = 1, \ldots, 5$. The bandwidth of the SFT is set to 1 000 and the sparsity is set to 2.
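The Monte Carlo proxy estimates used throughout the experiments below can be formed as in the following MATLAB sketch (ours); ffun and fsNfun are hypothetical function handles, assumed to evaluate $f$ and $f^{s,N} = \mathcal{L}u^{s,N}$ pointwise at the rows of X.

```matlab
% Monte Carlo estimate (ours) of the relative proxy error ||f - f^{s,N}|| / ||f||
% in L^2(T^d) from uniform samples. Note that a sparse trigonometric polynomial
% with frequencies K (rows) and coefficients c evaluates as exp(2i*pi*X*K.') * c.
rng(2); nMC = 1000; d = 16;
X = rand(nMC, d);                              % uniform samples on T^d = [0,1)^d
res = ffun(X) - fsNfun(X);                     % pointwise residual
relerr = sqrt(mean(abs(res).^2)) / sqrt(mean(abs(ffun(X)).^2));
```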
We then compute a Monte Carlo approximation of the proxy error choosing 200 points drawn uniformly from $\mathbb{T}^d$ and also compute the proxy error exactly by virtue of the sparsity of $a$ and $f$. The results are given in Figure 4.4.

We see that the results do not depend on the dimension of the problem. Since all dependence on $d$ is in the runtime of the SFT, we also observe that in practice, after the SFTs of the data have been computed, re-solving the problem on different stamping levels takes about the same amount of time for each $d$. The error also converges exponentially in the stamping level as suggested by the theoretical error guarantees. Notably, we also see that the Monte Carlo approximation with 200 points captures the same proxy error as the exact computation.

[Figure 4.4: relative proxy error $\|f - f^{s,N}\|_{L^2}/\|f\|_{L^2}$ against the stamping level $N = 1, \ldots, 5$, with Monte Carlo and exact computations for each dimension.] Figure 4.4 Proxy error solving (4.20) with $d = 1, 4, 16, 64, 256, 1\,024$ and $N = 1, \ldots, 5$.

4.7.3.2 High sparsity

We expand on the exactly sparse case by testing a diffusion coefficient with much higher sparsity. Here, we solve (4.20) with
\[
a(\mathbf{x}) = \hat{a}_{\mathbf{0}} + \sum_{\mathbf{k} \in I_a} c_{\mathbf{k}} \cos(2\pi \mathbf{k} \cdot \mathbf{x}). \tag{4.21}
\]
The vector of coefficients is drawn as $\mathbf{c} \sim \mathrm{Unif}\left( [-1, 1]^{25} \right)$ once and reused in each test. For every $d$, the frequencies $\mathbf{k} \in I_a$ are each drawn uniformly from $[-499, 500]^d \cap \mathbb{Z}^d$ as before with $|I_a| = 25$. Here $\hat{a}_{\mathbf{0}} = 4\lceil \|\mathbf{c}\|_2 \rceil$ to ensure ellipticity. Again, the bandwidth of the SFT algorithm is set to 1 000, but the sparsity is now fixed to 26. The results are given in Figure 4.5.

[Figure 4.5: relative proxy error against stamping levels $N = 1, \ldots, 3$, with Monte Carlo and exact computations for each dimension.] Figure 4.5 Proxy error solving (4.20) with diffusion coefficient (4.21) in dimensions $d = 1, 4, 16, 64, 256, 1\,024$ and stamping levels $N = 1, \ldots, 3$.

Again, we see that the results do not depend on the spatial dimension except for the notable example of $d = 1$. The $d = 1$ case suffers from similar issues in a pessimistic proxy error as in Figure 4.2. Specifically, the right-hand side for this example was generated with frequency $k_f = -10$ and is therefore relatively low-frequency. Thus, the high-frequency modes leading to errors in the approximate solution are amplified by the high frequencies in $a$ when computing $f^{s,N}$. Indeed, in further experiments (not pictured here), increasing the frequencies of $f$ or decreasing the frequencies of $a$ results in a lower proxy error.

For the other dimensions, the slight offsets in the exact proxy error can be attributed to the randomized frequencies as well as slight variations in the randomized SFT code. We do see slightly more variance in the proxy error computed using Monte Carlo sampling however. This is to be expected for data with more varied frequency content, and as such, in future experiments, we increase the number of sampling points.

Note that because we consider sparsity much larger than the stamping level, the computational and memory complexity of the stamping and solution step is much higher. As suggested by Lemma 4.1, the size of the resulting stamping set (and therefore the necessary matrix solve) in the largest case is at most $7 \cdot 52^7 \approx 7 \times 10^{12}$, which pushes the memory boundaries of our computational resources.
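For concreteness, the data (4.21) used in this experiment can be generated and evaluated as in the following sketch (ours; variable names are illustrative assumptions).

```matlab
% Construct the high-sparsity diffusion coefficient (4.21) and sparse f of (4.20).
rng(3); d = 16;
Ia = randi([-499, 500], 25, d);            % 25 random frequencies for a
c  = 2 * rand(25, 1) - 1;                  % coefficients ~ Unif([-1, 1])
a0 = 4 * ceil(norm(c));                    % a-hat_0 = 4*ceil(||c||_2) for ellipticity
kf = randi([-499, 500], 1, d);             % frequency of the right-hand side
afun = @(X) a0 + cos(2*pi * X * Ia.') * c; % evaluate a at rows of X
ffun = @(X) sin(2*pi * X * kf.');          % evaluate f at rows of X
```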
4.7.4 Multivariate compressible

In order to test Fourier-compressible data which is not exactly sparse, we use a series of tensorized, periodized Gaussians. Here, we present only the details necessary to demonstrate our algorithm's effectiveness on Fourier-compressible data, but for a fuller treatment of the Fourier properties of periodized Gaussians, see e.g., [53, Section 2.1].

Here, we define the periodic Gaussian $G_r : \mathbb{T} \to \mathbb{R}$ by
\[
G_r(x) = \frac{\sqrt{2\pi}}{r} \sum_{m=-\infty}^{\infty} \mathrm{e}^{-\frac{(2\pi)^2 (2x - m)^2}{2r^2}},
\]
where the dilation-type parameter $r$ allows us to control the effective support of $\hat{G}_r$. In practice, we truncate the infinite sum to $m \in \{-10, \ldots, 10\}$ as additional terms do not change the output up to machine precision. Note here that the nonstandard multiplicative factors help control the behavior of the function in frequency rather than space. Given a multivariate modulating frequency $\mathbf{k} \in \mathbb{Z}^d$, we define the modulated, tensorized, periodic Gaussian by
\[
G_{r,\mathbf{k}}(\mathbf{x}) = \prod_{j \in [d]} \mathrm{e}^{2\pi i k_j x_j} G_r(x_j).
\]
Finally, given a set of frequencies $I \subset \mathbb{Z}^d$, dilation parameters $\mathbf{r} \in \mathbb{R}^I_+$, and coefficients $\mathbf{c} \in \mathbb{R}^I$, we can define the Gaussian series
\[
G^{\mathbf{c},\mathbf{r}}_I(\mathbf{x}) := \sum_{\mathbf{k} \in I} c_{\mathbf{k}} G_{r_{\mathbf{k}},\mathbf{k}}(\mathbf{x}).
\]
Depending on the severity of the dilations chosen (i.e., $r_{\mathbf{k}} \gg 1$), this can well approximate a Fourier series with frequencies in $I$. On the other hand, a less severe dilation results in Fourier coefficients with magnitudes forming less concentrated Gaussians centered around the "frequencies" $\mathbf{k} \in I$ and $-\mathbf{k}$. An example of a series with its associated Fourier transform is given in Figure 4.6.

[Figure 4.6: (a) the Gaussian series $c_1 G_{r_1,\mathbf{k}_1} + c_2 G_{r_2,\mathbf{k}_2}$ on $\mathbb{T}^2$; (b) its Fourier transform $c_1 \hat{G}_{r_1,\mathbf{k}_1} + c_2 \hat{G}_{r_2,\mathbf{k}_2}$ over $(k_0, k_1) \in [-40, 50]^2$.] Figure 4.6 An example Gaussian series with $c_1 = c_2 = 1$, $r_1 = 0.5$, $r_2 = 2$, $\mathbf{k}_1 = (3, 2)$, and $\mathbf{k}_2 = (-5, 15)$. The first term corresponds to the wider Gaussian shape and more spread out portions of the Fourier transform. The second term contributes to the highly oscillatory parts and the isolated spikes in the Fourier transform.

In our first experiment, we fix $d = 2$ and vary both stamping level and sparsity to again solve (4.20). The diffusion coefficient in (4.20) is replaced with a two-term Gaussian series $a = c_0 + G^{\mathbf{c},\mathbf{r}}_I$, where the two frequencies in $I$ are drawn uniformly from $[-24, 25]^2 \cap \mathbb{Z}^2$,
\[
\mathbf{c} \sim \mathrm{Unif}\left( [-1, 1]^2 \right), \quad \mathbf{r} = 1.1^2 \, \mathbf{1}, \quad c_0 = 10\lceil \|\mathbf{c}\|_2 \rceil.
\]
Note the increased constant factor from our previous examples to decrease the likelihood of sparse approximations of $a$ not satisfying the ellipticity property. The Fourier transform of the resulting $a$ used for the following test is depicted in Figure 4.7 below. The diffusion equation is then solved across various sparsities with increasing stamping level. The bandwidth parameter of the SFT is set to $K = 100$ to account for the wider effective support of $\hat{a}$. The Monte Carlo proxy error is computed with 1 000 samples and depicted in Figure 4.8.

[Figure 4.7: the magnitudes of the Fourier coefficients of $a$ over $(k_0, k_1) \in [-40, 50]^2$.] Figure 4.7 The specific $\hat{a}$ used in examples depicted in Figure 4.8.

Here, the stamping level does not affect convergence until the sparsity is above $s \ge 16$. This demonstrates the tradeoff between sparsity and stamping level in regards to the error bound (4.18). Until the SFT is able to capture enough useful information in $\hat{a}$, the $\|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1}$ piece of the error bound dominates. Eventually, this factor is reduced far enough that the stamping term becomes apparent.
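For reproducibility, a MATLAB sketch (ours) of the truncated periodized Gaussian and its modulated tensorization follows; note that the displayed formula for $G_r$ above is our reconstruction of a garbled original, so the constants inside the exponential should be checked against [53, Section 2.1].

```matlab
function v = gaussian_series_term(X, r, k)
% GAUSSIAN_SERIES_TERM  Sketch of the modulated, tensorized periodic Gaussian
% G_{r,k} at the rows of X, truncating the periodization to m = -10..10.
%   X: n x d points in T^d, r: dilation parameter, k: 1 x d frequency.
    m = -10:10;                              % truncation of the infinite sum
    v = exp(2i * pi * (X * k.'));            % modulation e^{2 pi i k.x}
    for j = 1:size(X, 2)                     % tensor product over coordinates
        Gj = (sqrt(2*pi) / r) * sum(exp(-(2*pi)^2 * (2*X(:, j) - m).^2 / (2*r^2)), 2);
        v = v .* Gj;
    end
end
```

A Gaussian series $G^{\mathbf{c},\mathbf{r}}_I$ is then the corresponding linear combination of such terms over the rows of a frequency matrix.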
[Figure 4.8: relative proxy error against stamping levels $N = 1, \ldots, 3$ for each sparsity level.] Figure 4.8 Proxy error solving (4.20) with Gaussian series diffusion coefficient with sparsity levels $s = 2, 4, 8, 16, 32, 64$, and stamping levels $N = 1, \ldots, 3$.

We provide another example, where sparsity is fixed at $s = 16$, and dimension and stamping level are increased. Again we solve (4.20) with the diffusion coefficient replaced by the two-term Gaussian series $a = c_0 + G^{\mathbf{c},\mathbf{r}}_I$, where the two frequencies in $I$ are drawn uniformly from $[-249, 250]^d \cap \mathbb{Z}^d$,
\[
\mathbf{c} \sim \mathrm{Unif}\left( [-1, 1]^2 \right), \quad \mathbf{r} = 1.1^d \, \mathbf{1}, \quad c_0 = 10\lceil \|\mathbf{c}\|_2 \rceil,
\]
and $\mathbf{c}$ and $c_0$ are not redrawn across test cases. The bandwidth of the SFT is set to 1 000 to again account for the potentially widened Fourier transform of $a$. With a 1 000 point Monte Carlo approximation of the proxy error, the results are given in Figure 4.9.

Here we observe much the same behavior as the previous test case. This is due to the fact that the dimension additionally drives the sparsity of the Gaussian Fourier transforms based on the choice of dilation $\mathbf{r} = 1.1^d \, \mathbf{1}$. In additional experiments performed at higher dimensions (not pictured here), this factor results in numerical instability and the approximation error blows up. We also see that the $d = 2$ and $d = 4$ examples are swapped from their assumed positions (and the $d = 2$ case even mildly benefits from an increased stamping level). This is attributed to the random draw of the frequency locations affecting the proxy error as well as the SFT algorithm performing better in lower dimensions when all parameters are fixed.

[Figure 4.9: relative proxy error against the stamping level for each dimension.] Figure 4.9 Approximate proxy error solving (4.20) with Gaussian series diffusion coefficient with $d = 2, 4, 8, 16$ and $N = 1, \ldots, 5$.

4.7.5 Three-dimensional exactly sparse advection-diffusion-reaction equation

We now extend our numerical experiments to the situation of a three-dimensional advection-diffusion-reaction equation. We work with the PDE
\[
-\nabla \cdot (a \nabla u) + \mathbf{b} \cdot \nabla u + c u = f \tag{4.22}
\]
with exactly sparse data
\begin{align*}
a(\mathbf{x}) &= \hat{a}_{\mathbf{0}} + \sum_{\mathbf{k} \in I^{\mathrm{sine}}_a} c^{\mathrm{sine}}_{a,\mathbf{k}} \sin(2\pi \mathbf{k} \cdot \mathbf{x}) + \sum_{\mathbf{k} \in I^{\mathrm{cosine}}_a} c^{\mathrm{cosine}}_{a,\mathbf{k}} \cos(2\pi \mathbf{k} \cdot \mathbf{x}) \\
b_j(\mathbf{x}) &= \sum_{\mathbf{k} \in I^{\mathrm{sine}}_{b_j}} c^{\mathrm{sine}}_{b_j,\mathbf{k}} \sin(2\pi \mathbf{k} \cdot \mathbf{x}) + \sum_{\mathbf{k} \in I^{\mathrm{cosine}}_{b_j}} c^{\mathrm{cosine}}_{b_j,\mathbf{k}} \cos(2\pi \mathbf{k} \cdot \mathbf{x}) \quad \text{for all } j \in [3] \\
c(\mathbf{x}) &= \hat{c}_{\mathbf{0}} + \sum_{\mathbf{k} \in I^{\mathrm{sine}}_c} c^{\mathrm{sine}}_{c,\mathbf{k}} \sin(2\pi \mathbf{k} \cdot \mathbf{x}) + \sum_{\mathbf{k} \in I^{\mathrm{cosine}}_c} c^{\mathrm{cosine}}_{c,\mathbf{k}} \cos(2\pi \mathbf{k} \cdot \mathbf{x}) \tag{4.23} \\
f(\mathbf{x}) &= \sum_{\mathbf{k} \in I^{\mathrm{sine}}_f} c^{\mathrm{sine}}_{f,\mathbf{k}} \sin(2\pi \mathbf{k} \cdot \mathbf{x}) + \sum_{\mathbf{k} \in I^{\mathrm{cosine}}_f} c^{\mathrm{cosine}}_{f,\mathbf{k}} \cos(2\pi \mathbf{k} \cdot \mathbf{x}),
\end{align*}
where
\[
\left| I^{\mathrm{sine}}_a \right| = \left| I^{\mathrm{cosine}}_a \right| = 2, \quad \left| I^{\mathrm{sine}}_{b_j} \right| = \left| I^{\mathrm{cosine}}_{b_j} \right| = \left| I^{\mathrm{sine}}_c \right| = \left| I^{\mathrm{cosine}}_c \right| = 5 \text{ for all } j \in [3], \quad \left| I^{\mathrm{sine}}_f \right| = 2, \text{ and } \left| I^{\mathrm{cosine}}_f \right| = 3.
\]
In total, there are 45 terms composing the differential operator, and 5 terms composing the forcing function. Each frequency is randomly drawn from $\mathrm{Unif}([-49, 50]^3 \cap \mathbb{Z}^3)$ and each coefficient for $a$ and $f$ from $\mathrm{Unif}([-1, 1])$. The coefficients for $\mathbf{b}$ and $c$ are drawn from $\mathrm{Unif}([0, 1])$. To ensure well-posedness,
\[
\hat{a}_{\mathbf{0}} = 4\left\lceil \sqrt{\left\| \mathbf{c}^{\mathrm{sine}}_a \right\|_2^2 + \left\| \mathbf{c}^{\mathrm{cosine}}_a \right\|_2^2} \right\rceil, \quad \text{and} \quad \hat{c}_{\mathbf{0}} = 4\left\lceil \sqrt{\left\| \mathbf{c}^{\mathrm{sine}}_c \right\|_2^2 + \left\| \mathbf{c}^{\mathrm{cosine}}_c \right\|_2^2} \right\rceil.
\]
The bandwidth of the SFT is set to $K = 100$ and we consider sparsity levels $s = 2$ and $s = 5$. Due to the large size of the stamp, we only consider stamping levels $N = 1, 2$.
Relative proxy error $\left\| f - f^{s,N} \right\|_{L^2} / \|f\|_{L^2}$:

s   N   exact   Monte Carlo
2   1   0.518   0.518
2   2   0.518   0.518
5   1   0.054   0.054
5   2   0.031   0.031

Table 4.1 Error in approximating solution to ADR equation (4.22).

[Figure 4.10: (a) slice through $f^{2,1}$; (b) slice through $f^{10,2}$; (c) slice through $f$, each over $(x_2, x_3) \in [0, 0.4]^2$.] Figure 4.10 Samples of $f^{10,2}$ and $f$ on the $x_1 = 63/128$ plane.

The resulting true and Monte Carlo proxy error (sampled over 1 000 points) is given in Table 4.1. Additionally, Figure 4.10 shows a portion of a slice through $f$ as well as $f^{2,1}$ and $f^{10,2}$, which are computed by passing $u^{2,1}$ and $u^{10,2}$ through the differential operator.

We note that $f^{10,2}$ and $f$ appear qualitatively indistinguishable. However, since the sparsity level, $s = 2$, used to compute $u^{2,1}$ is lower than the sparsity of any term in (4.23), $f^{2,1}$ loses some of the characteristics of the original source term. Though it captures some of the true behavior in both larger scales (e.g., the oscillations moving in the northeast direction) and finer scales (e.g., the oscillations moving in the southeast direction), some interfering modes which produce the "wavy" effect are left out. This is supported by the relative errors reported in Table 4.1. Note also that the stamping level affects the convergence in the $s = 5$ case, but not the $s = 2$ case. This is due to the sparsity related errors in (4.18) overwhelming the stamping term until the SFT approximations of the data are accurate enough.

BIBLIOGRAPHY

[1] Sina Bittens and Gerlind Plonka. Real sparse fast DCT for vectors with short support. Linear Algebra Appl., 582:359–390, 2019.

[2] Sina Bittens and Gerlind Plonka. Sparse fast DCT for vectors with one-block support. Numer. Algorithms, 82(2):663–697, 2019.

[3] Sina Bittens, Ruochuan Zhang, and Mark A. Iwen. A deterministic sparse FFT for functions with structured Fourier sparsity. Advances in Computational Mathematics, 45(2):519–561, 2019.

[4] Simone Brugiapaglia. COmpRessed SolvING: Sparse Approximation of PDEs based on Compressed Sensing. PhD thesis, Politecnico di Milano, Milan, Italy, January 2016.

[5] Simone Brugiapaglia. A compressive spectral collocation method for the diffusion equation under the restricted isometry property. In Marta D'Elia, Max Gunzburger, and Gianluigi Rozza, editors, Quantification of Uncertainty: Improving Efficiency and Technology: QUIET selected contributions, Lecture Notes in Computational Science and Engineering, pages 15–40. Springer International Publishing, Cham, 2020.

[6] Simone Brugiapaglia, Sjoerd Dirksen, Hans Christian Jung, and Holger Rauhut. Sparse recovery in bounded Riesz systems with applications to numerical methods for PDEs. Applied and Computational Harmonic Analysis, 53:231–269, July 2021.

[7] Simone Brugiapaglia, Stefano Micheletti, Fabio Nobile, and Simona Perotto. Supplementary material to "Wavelet–Fourier CORSING techniques for multidimensional advection–diffusion–reaction equations", September 2020.

[8] Simone Brugiapaglia, Stefano Micheletti, Fabio Nobile, and Simona Perotto. Wavelet–Fourier CORSING techniques for multidimensional advection–diffusion–reaction equations. IMA Journal of Numerical Analysis, (draa036), September 2020.

[9] Simone Brugiapaglia, Stefano Micheletti, and Simona Perotto. Compressed solving: A numerical approximation technique for elliptic PDEs based on compressed sensing. Computers & Mathematics with Applications, 70(6):1306–1335, September 2015.
[10] Simone Brugiapaglia, Fabio Nobile, Stefano Micheletti, and Simona Perotto. A theoretical study of COmpRessed SolvING for advection-diffusion-reaction problems. Mathematics of Computation, 87(309):1–38, January 2018.

[11] Hans-Joachim Bungartz and Michael Griebel. Sparse grids. Acta Numerica, 13:147–269, May 2004.

[12] Glenn Byrenheid, Lutz Kämmerer, Tino Ullrich, and Toni Volkmer. Tight error bounds for rank-1 lattice sampling in spaces of hybrid mixed smoothness. Numerische Mathematik, 136(4):993–1034, August 2017.

[13] Claudio Canuto, M. Yousuff Hussaini, Alfio Quarteroni, and Thomas A. Zang. Spectral Methods: Fundamentals in Single Domains. Scientific Computation. Springer-Verlag, Berlin Heidelberg, 2006.

[14] Bosu Choi, Andrew Christlieb, and Yang Wang. Multiscale high-dimensional sparse Fourier algorithms for noisy data. ArXiv e-prints, 2019. arXiv:1907.03692.

[15] Bosu Choi, Andrew Christlieb, and Yang Wang. High-dimensional sparse Fourier algorithms. Numerical Algorithms, 87(1):161–186, May 2021.

[16] Bosu Choi, Mark Iwen, and Toni Volkmer. Sparse harmonic transforms II: best s-term approximation guarantees for bounded orthonormal product bases in sublinear-time. Numerische Mathematik, 148(2):293–362, June 2021.

[17] Bosu Choi, Mark A. Iwen, and Felix Krahmer. Sparse harmonic transforms: A new class of sublinear-time algorithms for learning functions of many variables. Found. Comput. Math., 2020.

[18] Andrew Christlieb, David Lawlor, and Yang Wang. A multiscale sub-linear time Fourier algorithm for noisy data. Appl. Comput. Harmon. Anal., 40(3):553–574, 2016.

[19] Albert Cohen, Wolfgang Dahmen, and Ronald DeVore. Compressed sensing and best k-term approximation. Journal of the American Mathematical Society, 22(1):211–231, January 2009.

[20] Dinh Dũng, Vladimir Temlyakov, and Tino Ullrich. Hyperbolic Cross Approximation. Advanced Courses in Mathematics - CRM Barcelona. Springer International Publishing, Cham, 2018.

[21] Ingrid Daubechies, Olof Runborg, and Jing Zou. A sparse spectral method for homogenization multiscale problems. Multiscale Modeling & Simulation, 6(3):711–740, January 2007.

[22] Michael Döhler, Stefan Kunis, and Daniel Potts. Nonequispaced hyperbolic cross fast Fourier transform. SIAM Journal on Numerical Analysis, 47(6):4415–4428, January 2010.

[23] Lawrence C. Evans. Partial differential equations. Number 19 in Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, second edition, 2010.

[24] Simon Foucart and Holger Rauhut. A mathematical introduction to compressive sensing. Springer, 2013.

[25] A. C. Gilbert, S. Muthukrishnan, and M. Strauss. Improved time bounds for near-optimal sparse Fourier representations. In Manos Papadakis, Andrew F. Laine, and Michael A. Unser, editors, Wavelets XI, volume 5914, pages 398–412. International Society for Optics and Photonics, SPIE, 2005.

[26] Anna C. Gilbert, Piotr Indyk, Mark Iwen, and Ludwig Schmidt. Recent developments in the sparse Fourier transform: A compressed Fourier transform for big data. IEEE Signal Processing Magazine, 31(5):91–100, 2014.

[27] Anna C. Gilbert, Martin J. Strauss, and Joel A. Tropp. A tutorial on fast Fourier sampling. IEEE Signal Process. Mag., 25(2):57–66, 2008.

[28] Gene H. Golub and Charles F. Van Loan. Matrix computations. Johns Hopkins Studies in the Mathematical Sciences.
Johns Hopkins University Press, Baltimore, MD, fourth edition, 2013. [29] V Gradinaru. Fourier transform on sparse grids: Code design and the time dependent Schrödinger equation. Computing (Wien. Print), 80(1):1–22, January 2007. Place: Wien Publisher: Springer. [30] Michael Griebel and Jan Hamaekers. Sparse grids for the Schrödinger equation. Special issue on molecular modelling, 41(2):215–247, January 2007. Place: Les Ulis Publisher: EDP Sciences. [31] Michael Griebel and Jan Hamaekers. Fast discrete Fourier transform on generalized sparse grids. In Jochen Garcke and Dirk Pflüger, editors, Sparse Grids and Applications - Munich 2012, volume 97, pages 75–107. Springer International Publishing, Cham, 2014. Series Title: Lecture Notes in Computational Science and Engineering. [32] Craig Gross and Mark Iwen. Sparse spectral methods for solving high-dimensional and mul- tiscale elliptic PDEs. ArXiv e-prints, 2023. arXiv:2302.00752. [33] Craig Gross, Mark Iwen, Lutz Kämmerer, and Toni Volkmer. Sparse Fourier transforms on rank-1 lattices for the rapid and low-memory approximation of functions of many variables. Sampling Theory, Signal Processing, and Data Analysis, 20(1):1, December 2021. [34] Craig Gross, Mark A Iwen, Lutz Kämmerer, and Toni Volkmer. A deterministic algorithm for constructing multiple rank-1 lattices of near-optimal size. Advances in Computational Mathematics, 47(6):1–24, 2021. [35] Haitham Hassanieh, Piotr Indyk, Dina Katabi, and Eric Price. Simple and practical algo- rithm for sparse Fourier transform. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 1183–1194. SIAM, 2012. [36] Mark A Iwen. Combinatorial sublinear-time Fourier algorithms. Foundations of Computa- tional Mathematics, 10(3):303–338, 2010. [37] Mark A. Iwen. Improved approximation guarantees for sublinear-time Fourier algorithms. Appl. Comput. Harmon. Anal., 34:57–82, 2013. [38] Lutz Kämmerer. High Dimensional Fast Fourier Transform Based on Rank-1 Lattice Sam- pling. Ph.D, Universitätsverlag Chemnitz, 2014. [39] Lutz Kämmerer. Reconstructing multivariate trigonometric polynomials from samples along rank-1 lattices. In Gregory E. Fasshauer and Larry L. Schumaker, editors, Approximation Theory XIV: San Antonio 2013, pages 255–271. Springer International Publishing, 2014. [40] Lutz Kämmerer. Multiple rank-1 lattices as sampling schemes for multivariate trigonometric polynomials. Journal of Fourier Analysis and Applications, 24(17):17–44, 2018. 122 [41] Lutz Kämmerer. Constructing spatial discretizations for sparse multivariate trigonometric polynomials that allow for a fast discrete Fourier transform. Applied and Computational Har- monic Analysis, 47(3):702–729, 2019. [42] Lutz Kämmerer, Felix Krahmer, and Toni Volkmer. A sample efficient sparse FFT for arbitrary frequency candidate sets in high dimensions. Numerical Algorithms, 89(4):1479–1520, Apr 2022. [43] Lutz Kämmerer, Daniel Potts, and Toni Volkmer. High-dimensional sparse FFT based on sampling along multiple rank-1 lattices. Appl. Comput. Harmon. Anal., 51:225–257, 2021. [44] Lutz Kämmerer and Toni Volkmer. Approximation of multivariate periodic functions based on sampling along multiple rank-1 lattices. Journal of Approximation Theory, 246:1–27, 2019. [45] Michael Kapralov. Sparse Fourier Transform in Any Constant Dimension with Nearly-Optimal Sample Complexity in Sublinear Time, page 264–277. Assoc. Comput. Mach., New York, NY, USA, 2016. [46] Frances Kuo, Giovanni Migliorati, Fabio Nobile, and Dirk Nuyens. 
Function integra- tion, reconstruction and approximation using rank-1 lattices. Mathematics of Computation, 90(330):1861–1897, July 2021. [47] Friedrich Kupka. Sparse grid spectral methods for the numerical solution of partial differen- tial equations with periodic boundary conditions. Ph.D., Universität Wien, Vienna, Austria, November 1997. [48] Lutz Kämmerer. A fast probabilistic component-by-component construction of exactly inte- grating rank-1 lattices and applications. ArXiv e-prints, 2020. arXiv:2012.14263. [49] Lutz Kämmerer, Stefan Kunis, and Daniel Potts. Interpolation lattices for hyperbolic cross trigonometric polynomials. Journal of Complexity, 28(1):76–92, February 2012. [50] Lutz Kämmerer, Daniel Potts, and Toni Volkmer. Approximation of multivariate periodic functions by trigonometric polynomials based on rank-1 lattice sampling. Journal of Com- plexity, 31(4):543–576, August 2015. [51] David Lawlor, Yang Wang, and Andrew Christlieb. Adaptive sub-linear time Fourier algo- rithms. Adv. Adapt. Data Anal., 05(01):1350003, 2013. [52] Dong Li and Fred J. Hickernell. Trigonometric spectral collocation methods on lattices. In Recent advances in scientific computing and partial differential equations (Hong Kong, 2002), volume 330 of Contemp. Math., pages 121–132. Amer. Math. Soc., Providence, RI, 2003. [53] Sami Merhi, Ruochuan Zhang, Mark A. Iwen, and Andrew Christlieb. A new class of fully discrete sparse Fourier transforms: Faster stable implementations with guarantees. Journal of Fourier Analysis and Applications, 25(3):751–784, June 2019. [54] Lucia Morotti. Explicit universal sampling sets in finite vector spaces. Appl. Comput. Harmon. Anal., 43(2):354–369, 2017. 123 [55] Hans Munthe-Kaas and Tor Sørevik. Multidimensional pseudo-spectral methods on lattice grids. Applied Numerical Mathematics, 62(3):155–165, March 2012. [56] Gerlind Plonka, Daniel Potts, Gabriele Steidl, and Manfred Tasche. Numerical Fourier Anal- ysis. Applied and Numerical Harmonic Analysis. Springer International Publishing, Cham, 2018. [57] Gerlind Plonka and Katrin Wannenwetsch. A sparse fast Fourier algorithm for real non- negative vectors. J. Comput. Appl. Math., 321:532–539, 2017. [58] Gerlind Plonka, Katrin Wannenwetsch, Annie Cuyt, and Wen-shin Lee. Deterministic sparse FFT for 𝑀-sparse vectors. Numer. Algorithms, 78(1):133–159, 2018. [59] Daniel Potts and Toni Volkmer. Sparse high-dimensional FFT based on rank-1 lattice sam- pling. Appl. Comput. Harmon. Anal., 41(3):713–748, 2016. [60] J. Barkley Rosser and Lowell Schoenfeld. Approximate formulas for some functions of prime numbers. Illinois Journal of Mathematics, 6(1):64–94, 1962. [61] A.D. Rubio, A. Zalts, and C.D. El Hasi. Numerical solution of the advection-reaction- diffusion equation at different scales. Environmental Modelling & Software, 23(1):90–95, January 2008. [62] Ben Segal and MA Iwen. Improved sparse Fourier approximation results: faster implementa- tions and stronger guarantees. Numer. Algorithms, 63(2):239–263, 2013. [63] Jie Shen and Li-Lian Wang. Sparse spectral approximations of high-dimensional problems based on hyperbolic cross. SIAM Journal on Numerical Analysis, 48(3):1087–1109, January 2010. Publisher: Society for Industrial and Applied Mathematics. [64] V. N. Temlyakov. Approximation of periodic functions. Comput. Math. Anal. Ser. Nova Sci. Publ., Inc., Commack, NY, 1993. [65] Toni Volkmer. Multivariate Approximation and High-Dimensional Sparse FFT Based on Rank-1 Lattice Sampling. Ph.D, Universitätsverlag Chemnitz, 2017. 
[66] Weiqi Wang and Simone Brugiapaglia. Compressive Fourier collocation methods for high- dimensional diffusion equations with periodic boundary conditions. ArXiv e-prints, 2022. arxiv:2206.01255. [67] Harry Yserentant. On the regularity of the electronic Schrödinger equation in Hilbert spaces of mixed derivatives. Numerische Mathematik, 98(4):731–759, October 2004. [68] Harry Yserentant. Sparse grid spaces for the numerical solution of the electronic Schrödinger equation. Numerische Mathematik, 101(2):381–389, August 2005. 124