SPARSITY IN THE SPECTRUM: SPARSE FOURIER TRANSFORMS AND SPECTRAL METHODS FOR FUNCTIONS OF MANY DIMENSIONS

By Craig Gross

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Applied Mathematics—Doctor of Philosophy

2023

ABSTRACT

The Fourier basis has been a cornerstone of numerical approximation due in part to its amenable algebraic properties, which result in efficient algorithmic approaches. Primary among these is the fast Fourier transform (FFT), which transforms a collection of samples of a univariate function into that function's Fourier coefficients with computational complexity linear in the number of samples (with an extra logarithmic term). Extensions based on the FFT include algorithms that take advantage of sparsity in a function's Fourier coefficients (sparse Fourier transforms, or SFTs) to lower this complexity even further, as well as efficient approaches for approximating certain Fourier coefficients of multivariate functions, most often those indexed over computationally friendly hyperbolic cross structures. The ability to quickly compute a function's Fourier coefficients has additionally allowed for a variety of applications, including fast algorithms for numerically solving partial differential equations (PDEs) via spectral methods.

This dissertation considers improvements on these three applications of the FFT to produce (1) a high-dimensional Fourier transform over arbitrary index sets with reduced sampling complexity compared to current state-of-the-art methods, (2) an accurate high-dimensional, sparse Fourier transform that can dramatically drive down the sampling and computational complexity so long as a sparsity assumption is satisfied, and (3) a high-dimensional, sparse spectral method which makes use of our sparse Fourier transform to solve PDEs with multiscale structure in extremely high dimensions.

All three of these applications rely on the method of rank-1 lattices for their flexibility. By using this quasi-Monte Carlo approach for sampling in high dimensions, high-dimensional functions are converted into one-dimensional ones on which well-studied techniques can be used. We extend these approaches by first developing a fully deterministic construction of multiple, smaller rank-1 lattices to sample over simultaneously, which drives down the sampling complexity from traditional rank-1 lattice methods. Our improved technique depends only linearly on the size of the underlying set of frequencies over which Fourier coefficients are computed, rather than the previously standard quadratic dependence (with additional logarithmic terms).

We can push further beyond this linear dependence on the frequency set of interest by making use of univariate SFTs after the high-dimensional to one-dimensional conversion. However, to effectively integrate univariate SFT algorithms into the rank-1 lattice approach without ruining the derived computational speedups, we provide an alternative approach. Rather than employing multiple rank-1 lattice sampling sets, we need to employ multiple rank-1 lattice SFTs. The slightly inflated sampling cost allows for significant gains in coefficient reconstruction: we produce two methods whose dependence on the frequency set of interest is cast entirely into logarithmic terms. The complexity is then quadratically or linearly (depending on the chosen variation) dependent on an imposed sparsity parameter and linear in the dimension of the underlying function domain.
The dependence on this sparsity is then fully characterized in near-optimal approximation guarantees for the function of interest.

And just as the FFT provided the foundation for fast spectral methods for numerically approximating solutions to PDEs, so too does our high-dimensional, sparse Fourier transform provide the foundation for a high-dimensional, sparse spectral method. However, to be most effective, the underlying frequency set of interest should be primarily driven by the PDE itself rather than the user. As such, we provide a technique for efficiently converting sparse Fourier approximations of the PDE data into a Fourier basis in which the solution to the PDE is guaranteed to have a good approximation. These ingredients, combined with the rich literature on spectral methods, allow us to provide error estimates in the Sobolev norm for the solution which are fully characterized by properties of the PDE, namely the Fourier sparsity of its data and conditions related to its well-posedness.

Throughout the text, these proposed algorithms are accompanied by practical considerations and implementations. These implementations are then judged against a variety of numerical tests which demonstrate performance on par with the theoretical guarantees provided.

Copyright by CRAIG GROSS 2023

To Alan, my fellow scientist and my brother. I love you.

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my advisor, Mark Iwen, for your incredible support throughout my time at Michigan State University. Your generosity in time, advice, ideas, and more is the reason that this work exists; it would not have been possible without your guidance. I also owe to you the space that I had throughout my studies to fully explore my interests, both mathematically and professionally, and find a path that was fulfilling. And I also have you to thank for allowing me to grow beyond the boundaries I came to MSU with, whether they be boundaries of perspective, opportunity, or geography.

In that vein, I would also like to thank my collaborators at Technische Universität Chemnitz, Lutz Kämmerer and Toni Volkmer, who introduced me to the wonderful world of rank-1 lattices and alongside whom I wrote Chapters 2 and 3. You helped me develop as a researcher and an applied mathematician through your invaluable mentorship, contributions, and conversations. And I have you and the rest of Daniel Potts' group to thank for your incredible hospitality in my unforgettable visit to Chemnitz.

And to one of my first mathematical mentors, Andrew Gillette, I thank you for showing me what it means to be a mathematician. From my first day of freshman year in the Cesar E. Chavez Building at the University of Arizona to our continued conversations at Lawrence Livermore National Laboratory, you have been there to foster my mathematical journey and afford me the opportunities to make it to this point.

I would also like to thank my fellow mathematicians with whom I had the pleasure of sharing thoughts throughout my studies. In particular, I have Ben Adcock and Simone Brugiapaglia to thank for the inspiration and motivation resulting in the sparse spectral method presented in Chapter 4. I also thank the members of my committee, Yingda Cheng, Jun Kitagawa, and Rongrong Wang, for your instruction and guidance throughout my time at MSU.

To my friends in my cohort, thank you for the long nights of analysis homework, the HopCat happy hours, and the consistent cycle of commiseration and inspiration.
And to those friends who came to MSU before or after me, thank you for making and keeping the math department bright, welcoming, and growing.

But most of all, I owe my successes, my opportunities, and everything else to my family. My heroes, my mother and father, have provided the encouragement and continual support to reach where I am today. Your perpetual care, humor, creativity, and joy form the foundation for me every day, and it is only on that foundation that I am able to grow and push myself into places, ideas, and worlds previously unknown. And to my siblings: Katie, for your empathy, drive, and spirit that keeps me moving forward; Essa, for your conversations that bring me the perspective I need; and Alan, for the everlasting knowledge that I have your love and support behind me: I can't thank each of you enough.

And finally, to Sarina, I could write another 124 pages about how this, and every day, is due to you. But I'll keep it brief. Simply put, this, and so many more of the achievements in my life, wouldn't have been able to happen without you. You've kept me together in the bad and have been the celebration of the good. You've been by my side every day, my outlet, my reflection for thoughts, joys, and all the rest. You bring me everything. Thank you.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
1.1 Overview
1.2 Notation
1.3 Fourier preliminaries

CHAPTER 2 CONSTRUCTING MULTIPLE RANK-1 LATTICES DETERMINISTICALLY
2.1 Overview of results
2.2 The proof of Theorem 2.1
2.3 Numerics

CHAPTER 3 HIGH-DIMENSIONAL SPARSE FOURIER TRANSFORMS
3.1 Overview of results and prior work
3.2 One-dimensional sparse Fourier transform results
3.3 Fast multivariate sparse Fourier transforms
3.4 Numerics

CHAPTER 4 SPARSE FOURIER SPECTRAL METHODS FOR SOLVING PDE
4.1 Overview of results and prior work
4.2 Elliptic PDE setup
4.3 Galerkin spectral methods
4.4 Stamping sets and truncation analysis
4.5 Fully sublinear-time SFTs with randomized lattices
4.6 A sparse spectral method via SFTs
4.7 Numerics

BIBLIOGRAPHY

CHAPTER 1 INTRODUCTION

1.1 Overview

This dissertation is concerned with the efficient approximation of periodic functions of many variables by Fourier series and associated applications in solving partial differential equations.
For a periodic function $g : \mathbb{T}^d \to \mathbb{C}$, where $\mathbb{T}$ is taken to be $\mathbb{R}/\mathbb{Z}$, we wish to compute its Fourier series, or at least an approximation, as quickly as possible. That is, we want to find the coefficients $\hat g$, a complex sequence indexed by multivariate frequencies $\mathbf{k} \in \mathbb{Z}^d$, of the Fourier series
\[
g = \sum_{\mathbf{k} \in \mathbb{Z}^d} \hat g_{\mathbf{k}}\, e^{2\pi i \mathbf{k} \cdot \circ}.
\]
Since the collection of multivariate trigonometric monomials $\{e^{2\pi i \mathbf{k}\cdot\circ}\}_{\mathbf{k}\in\mathbb{Z}^d}$ forms an orthonormal basis for $L^2(\mathbb{T}^d)$ (cf. Theorem 1.1), the Fourier coefficients of $g$ can be computed by
\[
\hat g_{\mathbf{k}} = \int_{\mathbb{T}^d} g(\mathbf{x})\, e^{-2\pi i \mathbf{k}\cdot\mathbf{x}}\, d\mathbf{x}.
\]
Of course, using this formulation would require full knowledge of $g$ to begin with, or at least enough information to approximate this integral. However, this is the problem we are attempting to solve in the first place.

The univariate formulation of this problem has been classically solved using the fast Fourier transform (FFT). Given a parameter $K \in \mathbb{N}$, the FFT computes approximate Fourier coefficients of a function $g^{\mathrm{1d}} : \mathbb{T} \to \mathbb{C}$ via a simple left Riemann sum over $K$ points:
\[
\hat g^{\mathrm{1d}}_\omega \approx \frac{1}{K} \sum_{j=0}^{K-1} g^{\mathrm{1d}}\!\left(\frac{j}{K}\right) e^{-2\pi i \omega j / K}.
\]
Computing all approximate Fourier coefficients with frequencies in $[K] := \{0, \dots, K-1\}$ at once can be performed by the matrix multiplication $F_K \mathbf{g}^{\mathrm{1d}}$, where
\[
F_K := \left(\frac{1}{K}\, e^{-2\pi i \omega j / K}\right)_{\omega \in [K],\, j \in [K]} \quad\text{and}\quad \mathbf{g}^{\mathrm{1d}} := \left(g^{\mathrm{1d}}\!\left(\frac{j}{K}\right)\right)_{j \in [K]}.
\]
Taking advantage of algebraic properties of the Fourier basis, the FFT algorithm performs this matrix multiplication in $O(K \log K)$ time and space, instead of the standard $O(K^2)$ computational complexity (see, e.g., [56] for a good survey of these techniques).

Returning to the multivariate setting, instead of using an equispaced sampling of the target function over an interval, we can take an equispaced sampling over the $d$-dimensional grid, denoted $(g(\mathbf{j}/K))_{\mathbf{j} \in [K]^d}$, and effectively apply $d$ FFTs along the sides of this now $d$-dimensional tensor. Thus, this multivariate FFT has a time/space-complexity of $O(K^d \log^d K)$. This exponential growth in $d$ characterizes the well-known curse of dimensionality, and therefore this multivariate FFT is only suitable for low dimensions.

Approaches to avoid this curse of dimensionality for Fourier approximation form a vast body of literature. The state of the art in the contexts we are interested in is discussed in the literature reviews of the subsequent chapters. However, we summarize a simple and effective approach upon which the remainder of this dissertation is based: using rank-1 lattices.

Definition 1.1. Given a natural number $M \in \mathbb{N}$ and a generating vector $\mathbf{z} \in \{1, \dots, M-1\}^d$, the rank-1 lattice $\Lambda(\mathbf{z}, M) \subset \mathbb{T}^d$ is defined as
\[
\Lambda(\mathbf{z}, M) := \left\{ \frac{j}{M}\,\mathbf{z} \bmod 1 \;\middle|\; j \in [M] \right\}.
\]

Intuitively, a rank-1 lattice gives a direction vector $\mathbf{z}$ along which to restrict the multivariate function $g$ to a univariate one, $g^{\mathrm{1d}}$, defined by $t \mapsto g(t\mathbf{z})$. The $M$ sampling points in the rank-1 lattice are in fact an equispaced sampling over $\mathbb{T}$ of $g^{\mathrm{1d}}$. An FFT of these equispaced samples of $g^{\mathrm{1d}}$ is then able to give us information about the Fourier coefficients of the original, high-dimensional function $g$.

To see how the FFT relates to the Fourier coefficients of $g$, we consider the Fourier series of $g^{\mathrm{1d}}$ by way of the Fourier series of $g$,
\[
g^{\mathrm{1d}}(t) = g(t\mathbf{z}) = \sum_{\mathbf{k}\in\mathbb{Z}^d} \hat g_{\mathbf{k}}\, e^{2\pi i \mathbf{k}\cdot\mathbf{z}\, t} = \sum_{\omega\in\mathbb{Z}} \Biggl( \sum_{\substack{\mathbf{k}\in\mathbb{Z}^d \\ \mathbf{k}\cdot\mathbf{z} = \omega}} \hat g_{\mathbf{k}} \Biggr) e^{2\pi i \omega t}.
\]
Thus
\[
\hat g^{\mathrm{1d}}_\omega = \sum_{\substack{\mathbf{k}\in\mathbb{Z}^d \\ \mathbf{k}\cdot\mathbf{z} = \omega}} \hat g_{\mathbf{k}}.
\]
In light of the fact that we will be using an FFT approximation of $\hat g^{\mathrm{1d}}$, let us also note the well-known aliasing effect of the FFT. For all $\omega \in [M]$,
\[
\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_\omega = \sum_{\substack{\omega' \in \mathbb{Z} \\ \omega' \equiv \omega \bmod M}} \hat g^{\mathrm{1d}}_{\omega'} \tag{1.1}
\]
(see Lemma 1.3 for the proof and further explanation).
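To make the lattice restriction and the aliasing identity (1.1) concrete, the following is a minimal numpy sketch; the frequency set, coefficients, and lattice here are illustrative choices, not taken from the text or the dissertation's own code. It samples a sparse trigonometric polynomial along a rank-1 lattice and reads each multivariate coefficient off of a single univariate FFT.

import numpy as np

# Illustrative data: a sparse trigonometric polynomial in d = 3.
freqs = np.array([[0, 0, 0], [1, 2, 0], [0, -1, 3]])     # frequencies in Z^3
coefs = np.array([2.0, 1.0 + 0.5j, 0.5])

def g(x):
    # g(x) = sum_k ghat_k * exp(2*pi*i*k.x) for points x of shape (..., 3)
    return (coefs * np.exp(2j * np.pi * (x @ freqs.T))).sum(axis=-1)

# A rank-1 lattice Lambda(z, M) that is reconstructing for these frequencies:
# the hashes k.z mod M (here 0, 7, and 24) are pairwise distinct.
z, M = np.array([1, 3, 9]), 31
nodes = (np.arange(M)[:, None] * z % M) / M              # j*z/M mod 1, j in [M]
ghat_1d = np.fft.fft(g(nodes)) / M                       # F_M g^1d, cf. (1.1)

for k, c in zip(freqs, coefs):
    omega = int(k @ z) % M                               # k.z mod M
    print(k, ghat_1d[omega], c)                          # recovered == true coeff

Each FFT bin here contains exactly one coefficient because the three hashes are distinct; for a function with energy outside the chosen frequency set, the same bins would additionally collect the aliased terms in (1.1).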
We can then assert that
\[
\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_\omega = \sum_{\substack{\mathbf{k}\in\mathbb{Z}^d \\ \mathbf{k}\cdot\mathbf{z} \equiv \omega \bmod M}} \hat g_{\mathbf{k}}.
\]
To make the most effective use of the length-$M$ FFT, a rank-1 lattice should be chosen so that this sum contains at most one Fourier coefficient of the original function in $\hat g$. In order to accomplish this, we will restrict the scope of our Fourier coefficient approximation to some chosen frequency set of interest $I \subset \mathbb{Z}^d$ and introduce the idea of the modulus mapping and a reconstructing rank-1 lattice.

Definition 1.2. Choose some $I \subset \mathbb{Z}^d$. The modulus mapping $m_{\mathbf{z},M} : I \to [M]$ is defined by $\mathbf{k} \mapsto \mathbf{k}\cdot\mathbf{z} \bmod M$. A rank-1 lattice $\Lambda(\mathbf{z}, M)$ is said to be reconstructing for $I$ if the modulus mapping is injective. An equivalent condition is that $\mathbf{k}\cdot\mathbf{z} \not\equiv \mathbf{h}\cdot\mathbf{z} \bmod M$ for all $\mathbf{k} \neq \mathbf{h} \in I$.

We find then that for any trigonometric polynomial $g \in \Pi_I := \operatorname{span}\{e^{2\pi i\mathbf{k}\cdot\circ} \mid \mathbf{k}\in I\}$, (1.1) reduces to
\[
\hat g_{\mathbf{k}} = \left(F_M \mathbf{g}^{\mathrm{1d}}\right)_{m_{\mathbf{z},M}(\mathbf{k})} \quad\text{for all } \mathbf{k} \in I,
\]
and for any $g$ with Fourier coefficients not necessarily supported on $I$,
\[
\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_{m_{\mathbf{z},M}(\mathbf{k})} = \hat g_{\mathbf{k}} + \sum_{\substack{\mathbf{h}\notin I \\ \mathbf{h}\cdot\mathbf{z} \equiv \mathbf{k}\cdot\mathbf{z} \bmod M}} \hat g_{\mathbf{h}} \quad\text{for all } \mathbf{k} \in I. \tag{1.2}
\]
The upshot is that we are able to compute all Fourier coefficients in $I$ of a periodic function $g$ up to errors related to restricting our attention to $I$. The full rank-1 lattice FFT approach is summarized in Algorithm 1.1.

Algorithm 1.1 Rank-1 lattice FFT
Input: A function $g : \mathbb{T}^d \to \mathbb{C}$, a frequency set of interest $I \subset \mathbb{Z}^d$, and a reconstructing rank-1 lattice for $I$, $\Lambda(\mathbf{z}, M)$
Output: Approximate Fourier coefficients $\hat{\mathbf{g}}^\Lambda \in \mathbb{C}^I$
1: $\mathbf{g}^{\mathrm{1d}} \leftarrow (g(j\mathbf{z}/M))_{j\in[M]}$
2: Compute $F_M \mathbf{g}^{\mathrm{1d}}$
3: for $\mathbf{k} \in I$ do
4:  $\hat g^\Lambda_{\mathbf{k}} \leftarrow \left(F_M \mathbf{g}^{\mathrm{1d}}\right)_{m_{\mathbf{z},M}(\mathbf{k})}$
5: end for

With the basic ideas behind the rank-1 lattice FFT in hand, we can motivate the remaining chapters of this dissertation.

1.1.1 Multiple rank-1 lattices and their construction

The most important ingredient for Algorithm 1.1 is a reconstructing rank-1 lattice for a chosen frequency set $I$. The size of this rank-1 lattice also has major impacts on the computational complexity of Algorithm 1.1, namely, on the sampling step and the FFT. Thus, the goal should be to find as small a reconstructing rank-1 lattice as possible.

The most popular reconstructing rank-1 lattice construction is the component-by-component (CBC) approach [39, 56, 46, 48]. The idea is to start with the set of frequency differences $\mathcal{D}(I) := \{\mathbf{k} - \mathbf{h} \mid \mathbf{k}, \mathbf{h} \in I\}$ and consider these differences one component at a time. Each component of the generating vector is chosen by a brute-force scan through these differences to ensure that there are no collisions modulo $M$, where it suffices for $M$ to be a prime number between $|\mathcal{D}(I)|/2$ and $|\mathcal{D}(I)|$. But note also that $|I| \lesssim |\mathcal{D}(I)| \lesssim |I|^2$, and in fact, there exist specific frequency sets which require any associated reconstructing rank-1 lattice to have size $M = \Omega(|I|^2)$ [12, Section 3]. See also [39, Section 5] for more information about rank-1 lattices and their often (seemingly unnecessarily) large sizes.

The goal of Chapter 2 is to reduce this quadratic dependence on $|I|$ (and therefore on the sampling and computational complexity of Algorithm 1.1) using a slight generalization of rank-1 lattices. Rather than restricting the high-dimensional function to just one lattice, we use multiple rank-1 lattices [40, 41], which can be smaller than a single reconstructing rank-1 lattice is required to be, to drive down the overall complexity.

In particular, Chapter 2 presents the first known deterministic algorithm for constructing a series of multiple rank-1 lattices for an arbitrary frequency set.
As input, it takes a single reconstructing rank-1 lattice and returns $O(\log|I|)$ lattices, each of size $O(|I| \log^2(K_I |I|))$, where $K_I$ is the sidelength of the smallest hypercube containing $I$. Each lattice handles a portion of the frequencies in $I$ so that performing FFTs over all of the smaller lattices will exactly recover the Fourier coefficients of trigonometric polynomials in $\Pi_I$. Approximation guarantees similar to (1.2) are also provided for general periodic functions. Due to the size of the full multiple rank-1 lattice returned, the quadratic dependence on $|I|$ of any single rank-1 lattice can therefore be reduced to a linear (with polylogarithmic terms) dependence without incurring significant additional errors.

1.1.2 Sparse Fourier transforms and rank-1 lattices

Though the efforts of Chapter 2 are able to reduce the amount of work necessary in a rank-1 lattice FFT approach, a linear dependence on $|I|$ in the complexity may still be intolerable. For large search spaces of multivariate frequencies $I$, such as the full hypercube of sidelength $K$, $I = \left(\left(-\frac K2, \frac K2\right] \cap \mathbb{Z}\right)^d$, these methods still suffer from the curse of dimensionality.

Rather than a more general multiple rank-1 lattice approach, Chapter 3 considers the case of functions whose Fourier series are sparse or compressible. Since the rank-1 lattice procedure reduces high-dimensional functions into one-dimensional ones, one-dimensional sparse Fourier transform (SFT) techniques [25, 27, 36, 35, 51, 62, 37, 26, 18, 45, 57, 58, 53, 3, 2, 1] become particularly appealing. SFTs are compressive sensing algorithms which are highly specialized to take advantage of the number-theoretic and algebraic structure of the Fourier basis as much as possible. As a result, SFTs rarely have to consider Fourier basis functions individually during the reconstruction process, and so can simultaneously reduce both their measurement needs and computational complexity to effectively depend only on the number of important Fourier series coefficients in the function one aims to approximate. Thus, SFTs can sidestep runtimes which are polynomially dependent on the bandwidth (in the case of a rank-1 lattice FFT, $M$), and instead run sublinearly in the magnitude of the underlying frequency space under consideration. If one desires to capture only the largest $s$ Fourier coefficients of a function, the SFT discussed in Theorem 3.1, for example, runs in $O(s^2 \log^4 M)$-time/space (with a randomized version cutting the quadratic factor of $s$ down to linear). Additionally, these techniques often furnish recovery guarantees for Fourier compressible functions in terms of best $s$-term approximations in the same vein as compressed sensing results [19, 24].

However, simply replacing the FFT $F_M \mathbf{g}^{\mathrm{1d}}$ in Line 2 of Algorithm 1.1 with a suitable SFT $\mathcal{A}_{s,M}\, g^{\mathrm{1d}}$ is not enough to relieve the linear dependence on $|I|$. The for loop from Line 3 to Line 5, which matches $d$-dimensional and one-dimensional frequencies, requires a linear scan through $I$. A simple optimization is to swap the order of this process and match the $s$-many entries of $\mathcal{A}_{s,M}\, g^{\mathrm{1d}}$ to the corresponding Fourier coefficients indexed over $I$. But even this is not enough, as it requires complete knowledge of the inverse modulus mapping $m_{\mathbf{z},M}^{-1}$, which is either built up through the rank-1 lattice construction and stored, or computed through an $O(d|I|)$ computation. All benefit in swapping the FFT along the lattice with an SFT is then lost.
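To see where the $O(d|I|)$ bottleneck comes from, consider this minimal sketch of the naive matching step; the helper below is hypothetical and is not one of the dissertation's algorithms. Even when the SFT returns only $s$ nonzero bins, inverting $m_{\mathbf{z},M}$ without stored tables forces a scan through all of $I$.

import numpy as np

def match_naively(sft_out, I, z, M):
    """Match univariate SFT output to multivariate frequencies by brute force.

    sft_out: dict mapping one-dimensional frequencies omega in [M] to the s
             coefficient estimates returned by an SFT of g^1d.
    I:       integer array of shape (|I|, d) of candidate frequencies.
    """
    hashes = (I @ z) % M                # m_{z,M}(k) for every k: the O(d|I|) scan
    recovered = {}
    for k, omega in zip(I, hashes):
        if int(omega) in sft_out:       # only s of the |I| hashes ever match
            recovered[tuple(k.tolist())] = sft_out[int(omega)]
    return recovered

The scan over $I$ dominates the runtime even though only $s$ of the $|I|$ hashes find a match, which is exactly the cost the next paragraph sets out to avoid.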
The methods given in Chapter 3 instead use samples along possibly larger lattices to produce a sparse approximation of the Fourier transform of $g$ without directly inverting $m_{\mathbf{z},M}$. Two algorithms are presented which operate on SFTs of manipulations of $g^{\mathrm{1d}}$ in order to relate the univariate coefficients to their multivariate counterparts in $o(|I|)$-time. This allows the methods to run faster and with less memory than it takes to simply enumerate the frequency set $I$ and/or store $m_{\mathbf{z},M}(I)$ whenever $g$ has a sufficiently accurate sparse approximation.

The result is a series of curse-of-dimensionality-breaking, high-dimensional SFTs with proven compressive-sensing-type guarantees for arbitrary periodic functions. The approaches are linear in $d$ in their sampling and runtime complexities and succeed deterministically in time quadratic in $s$. As with the univariate SFT discussed above, this can be reduced to time linear in $s$ via randomization. We defer to Section 3.1 for a fuller discussion in the context of the provided literature review.

Finally, though these results are able to sidestep the necessity of the inverse of the modulus mapping, $m_{\mathbf{z},M}(I)$, an existing reconstructing rank-1 lattice for $I$ is still required. As discussed above, CBC constructions, though only necessary to perform once, are still relatively expensive in the context of SFT complexities. This requirement is dropped via a randomized approach to constructing rank-1 lattices in Section 4.5, resulting in an algorithm with complexity fully sublinear in $|I|$.

1.1.3 Applications to PDE

The fast, high-dimensional SFT techniques of Chapter 3 are applied in Chapter 4 to construct an efficient, numerical PDE solver. For this exposition, we consider as a model problem an elliptic PDE with periodic boundary conditions
\[
-\nabla \cdot (a \nabla u) = f \tag{1.3}
\]
where $a, f : \mathbb{T}^d \to \mathbb{R}$ are the PDE data, and $u : \mathbb{T}^d \to \mathbb{R}$ is the solution. Solving (1.3) using a traditional Fourier spectral method amounts to replacing the data and the solution with their Fourier series, simplifying the left-hand side into a single Fourier series, matching the Fourier coefficients of both sides, and solving the resulting system of equations for the Fourier coefficients of $u$.

Two main sources of approximation error arise when implementing this technique computationally. The first is due to truncating the Fourier series involved to a finite number of terms. The second is due to numerically approximating the Fourier coefficients of the PDE data. Due to the rich theory of traditional spectral methods, these two sources of error can directly quantify the error of the resulting approximation of $u$.

Lemma 1.1 (Strang's lemma, [13]). Let $u^{\mathrm{truncation}}$ be the function which has the same Fourier series as $u$ but truncated in some manner, and let $a^{\mathrm{approximate}}$ and $f^{\mathrm{approximate}}$ be computed using approximations of the Fourier series of $a$ and $f$ truncated in the same way as $u^{\mathrm{truncation}}$. Then the procedure outlined above produces a solution $u^{\mathrm{spectral}}$ which satisfies
\[
\bigl\| u - u^{\mathrm{spectral}} \bigr\|_{H^1} \lesssim_{a,f} \bigl\| u - u^{\mathrm{truncation}} \bigr\|_{H^1} + \bigl\| a - a^{\mathrm{approximate}} \bigr\|_{L^\infty} + \bigl\| f - f^{\mathrm{approximate}} \bigr\|_{L^2}
\]
where $\lesssim_{a,f}$ denotes an upper bound with constants that depend on the PDE data.

This is a rough simplification of Strang's lemma [13], which is itself a generalization of the well-known Céa's lemma (the specific version of the lemma that we use is presented and proven in Lemma 4.6 below). Effectively, it states that the spectral method solution is optimal up to its Fourier series truncation and the approximation of the PDE data $a$ and $f$.
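For readers who want to see the coefficient-matching procedure spelled out, the following self-contained sketch solves (1.3) with $d = 1$ on a fixed band of $K$ frequencies; the diffusion coefficient, manufactured solution, and bandwidth are illustrative choices, not from the text. Matching Fourier coefficients of $-(a u')' = f$ gives the linear system $\sum_h 4\pi^2 k h\, \hat a_{k-h} \hat u_h = \hat f_k$, which the code assembles and solves on the nonzero modes.

import numpy as np

# Bandwidth, diffusion coefficient, and manufactured solution are illustrative.
K = 32                                      # number of retained Fourier modes
w = np.fft.fftfreq(K, 1.0 / K).astype(int)  # integer frequencies in B_K

x = np.arange(K) / K
a = 1.0 + 0.5 * np.cos(2 * np.pi * x)       # elliptic: a(x) >= 1/2 > 0
u_true = np.sin(2 * np.pi * x) + 0.3 * np.cos(4 * np.pi * x)
uhat = np.fft.fft(u_true) / K

# Manufacture f = -(a u')' spectrally so the truncated problem is consistent.
du = np.real(np.fft.ifft(2j * np.pi * w * np.fft.fft(u_true)))
fhat = -2j * np.pi * w * np.fft.fft(a * du) / K

# Stiffness matrix A[k, h] = 4 pi^2 k h ahat_{(k - h) mod K} on nonzero modes;
# the zero mode is not seen by the PDE (only derivatives of u appear), so the
# known mean of u is reattached after the solve.
ahat = np.fft.fft(a) / K
nz = w != 0
wn = w[nz]
A = 4 * np.pi**2 * np.outer(wn, wn) * ahat[(wn[:, None] - wn[None, :]) % K]

uhat_spectral = np.zeros(K, dtype=complex)
uhat_spectral[nz] = np.linalg.solve(A, fhat[nz])
uhat_spectral[0] = uhat[0]
print(np.max(np.abs(uhat_spectral - uhat)))  # recovers uhat to machine precision

Since the right-hand side is manufactured with the same truncation, the only errors here are round-off; for general data, the two error sources named above appear exactly as in Lemma 1.1.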
Analyzing convergence thus reduces to estimating these two errors.

Using $d$-dimensional FFTs to compute $a^{\mathrm{approximate}}$ and $f^{\mathrm{approximate}}$ in the procedure suggested in Lemma 1.1 naturally enforces a Fourier series truncation. A $d$-dimensional FFT using a tensorized grid of $K$ uniformly spaced points in each dimension will produce approximate Fourier coefficients indexed by frequencies in the $d$-dimensional hypercube on the integer lattice $\mathbb{Z}^d$ of sidelength $K$ (note that when we refer to "bandwidth" in a multidimensional sense, we are still referring to the sidelength $K$ of the hypercube containing these integer frequencies). As discussed above, each $d$-dimensional FFT in general requires more than $K^d$ operations, as does the linear-system solve (in the absence of any sparsity or other tricks). Thus, not only do traditional Fourier spectral methods suffer from the curse of dimensionality, but even in moderate dimensions, multiscale problems (i.e., PDE data which require very high bandwidth to be fully resolved) can result in intractable computations.

This is a prime opportunity to take advantage of our high-dimensional SFT algorithms to compute $a^{\mathrm{approximate}}$ and $f^{\mathrm{approximate}}$. This allows for the data terms in Strang's lemma above to converge near-optimally in terms of their compressibility in the Fourier basis. However, these SFTs only provide us with truncation information useful for $a$ and $f$, not necessarily $u$. One of the more significant contributions of Chapter 4 is resolving this truncation gap. By analyzing in detail the effect of the differential operator discretized using SFT approximations in frequency, we provide a technique for computing the most important Fourier coefficients of $u$ using knowledge of the most important Fourier coefficients of $a$ and $f$. We can then prove truncation estimates which allow for a sparse spectral method with $H^1$ error guarantees fully characterized by the Fourier compressibility of the data and terms relating to the ellipticity properties of the original PDE. Note that though we only consider a diffusion term in (1.3) for the simplicity of this overview, the analysis in Chapter 4 is actually that of a full multiscale, high-dimensional advection-diffusion-reaction equation, similar to, e.g., the governing equations for flow dynamics in a porous medium used in hydrological modeling [61].

1.1.4 A note on previous publication of this work

The three chapters following this introduction each comprise the results presented in three previously available manuscripts. With some exceptions, Chapter 2 is published as [34], Chapter 3 is published as [33], and (at the time of submission of this dissertation) Chapter 4 is publicly available at [32] and has been submitted for publication. Thus, the contents of Chapters 2 and 3 were developed collectively with Lutz Kämmerer and Toni Volkmer, and Chapters 2 to 4 with Mark Iwen. Additionally, portions of this introduction were adapted from the introductions of the original three manuscripts.

That being said, there are changes in the results given in this dissertation from their original presentations. In Chapter 2, the main modification is clarifying the Fourier recovery mechanism and the error guarantees for approximation in Corollary 2.1. Chapter 3 includes the $L^\infty$ error guarantees for the phase-encoding SFT originally provided in [32] and extends these guarantees to all algorithms analyzed.
Finally, Chapter 4 provides a complete analysis of advection-diffusion-reaction equations rather than solely the diffusion equations of the original text.

1.1.5 Organization

The remainder of this chapter consists of a section setting the notation and a section collecting some useful Fourier series related lemmas that are used throughout the text. The three following chapters respectively present the three main results summarized above. Each chapter gives a short overview, followed by the theory, and finally a numerics section with implementation details and tests demonstrating that theory in practice.

1.2 Notation

We let $d$ be the ambient dimension of function domains under consideration. The torus $\mathbb{T}$ is defined as $\mathbb{R}/\mathbb{Z}$, i.e., $[0, 1]$ with the endpoints identified. Given a natural number $M \in \mathbb{N}$, we let $[M] := \{0, \dots, M-1\}$.

Finite length vectors are denoted using boldface. For example, we often use $\mathbf{x} \in \mathbb{T}^d$ as a point in the spatial domain of a function and $\mathbf{k} \in \mathbb{Z}^d$ as a $d$-dimensional frequency to index Fourier coefficients. This also extends to multiindexed finite vectors. For example, if $I \subset \mathbb{Z}^d$ with $|I| < \infty$, then we would refer to a vector indexed over $I$ as, e.g., $\hat{\mathbf{g}} = (\hat g_{\mathbf{k}})_{\mathbf{k}\in I}$. Infinite length sequences remain in standard roman font, e.g., $\hat g = (\hat g_{\mathbf{k}})_{\mathbf{k}\in\mathbb{Z}^d}$. All finite length vectors will be implicitly extended to larger index sets by taking on the value zero wherever they are not originally defined. Additionally, the set of all complex-valued, finite length vectors or infinite length sequences supported on an index set $D$ is denoted as $\mathbb{C}^D$. Our convention is to use zero-based indexing, i.e., $\mathbb{C}^M = \mathbb{C}^{[M]}$.

In general, a multivariate function to be recovered is $g : \mathbb{T}^d \to \mathbb{C}$. Specific functions used in the context of elliptic PDEs are
• $a : \mathbb{T}^d \to \mathbb{R}$, the diffusion coefficient;
• $\mathbf{b} : \mathbb{T}^d \to \mathbb{R}^d$, the advection field;
• $c : \mathbb{T}^d \to \mathbb{R}$, the reaction coefficient;
• $f : \mathbb{T}^d \to \mathbb{R}$, the forcing function; and
• $u : \mathbb{T}^d \to \mathbb{R}$, the solution to the PDE.

Unless otherwise stated, we assume all functions are complex-valued and defined on the torus $\mathbb{T}^d$. For example, we take the inner product for $u, v \in L^2 := L^2(\mathbb{T}^d; \mathbb{C})$ to be
\[
\langle u, v \rangle_{L^2} := \int_{\mathbb{T}^d} u(\mathbf{x})\, \overline{v(\mathbf{x})}\, d\mathbf{x}
\]
where $\overline{v}$ is taken to be the complex conjugate of $v$. Additionally, we assume all vectors and sequences are complex-valued and defined on $\mathbb{Z}^d$ unless otherwise stated. For example, we take the inner product for $\hat u, \hat v \in \ell^2 := \ell^2(\mathbb{Z}^d; \mathbb{C})$ to be
\[
\langle \hat u, \hat v \rangle_{\ell^2} := \sum_{\mathbf{k}\in\mathbb{Z}^d} \hat u_{\mathbf{k}}\, \overline{\hat v_{\mathbf{k}}}.
\]
The domains and ranges for the function spaces $L^1$, $L^\infty$, $C$ (the space of continuous functions), and $C^\infty$ (the space of infinitely differentiable functions) are inferred similarly, as is the index set of the spaces of sequences $\ell^1$ and $\ell^\infty$. We now define our specific notion of periodic Sobolev spaces (see also [8, Section 2.1] and [47, Appendix A.2.2]).

Definition 1.3. For $u \in L^2$ and $\alpha \in \mathbb{N}_0^d$ a multiindex, if there exists a $v \in L^2$ such that
\[
\langle v, \phi \rangle_{L^2} = (-1)^{|\alpha|} \langle u, \partial^\alpha \phi \rangle_{L^2} \quad\text{for all } \phi \in C^\infty \subset L^2,
\]
we call $v$ the weak $\alpha$ derivative of $u$, and write $\partial^\alpha u := v$. We define the inner product
\[
\langle u, v \rangle_{H^1} := \sum_{\substack{\alpha \in \{0,1\}^d \\ \|\alpha\|_1 \le 1}} \int_{\mathbb{T}^d} \partial^\alpha u(\mathbf{x})\, \overline{\partial^\alpha v(\mathbf{x})}\, d\mathbf{x}
\]
(where all derivatives are considered in the weak sense) and have the associated norm $\|u\|_{H^1} := \sqrt{\langle u, u \rangle_{H^1}}$. The periodic Sobolev space $H^1$ is defined as $H^1 := \{u \in L^2 \mid \|u\|_{H^1} < \infty\}$.

For any $g \in L^1$ and any $\mathbf{k} \in \mathbb{Z}^d$, we define the $\mathbf{k}$th Fourier coefficient
\[
\hat g_{\mathbf{k}} := \left\langle g, e^{2\pi i\mathbf{k}\cdot\circ} \right\rangle_{L^2} = \int_{\mathbb{T}^d} g(\mathbf{x})\, e^{-2\pi i\mathbf{k}\cdot\mathbf{x}}\, d\mathbf{x}.
\]
The Wiener algebra $W := W(\mathbb{T}^d; \mathbb{C})$ is defined as the set of all functions with absolutely summable Fourier coefficients, $W := \{g \in L^1 \mid \hat g \in \ell^1\}$. For any function $g \in W$, its Fourier series is written as
\[
g = \sum_{\mathbf{k}\in\mathbb{Z}^d} \hat g_{\mathbf{k}}\, e^{2\pi i\mathbf{k}\cdot\circ}
\]
(see also Theorem 1.1 below). Given a hatted sequence $\hat g \in \mathbb{C}^{\mathbb{Z}^d}$ without having previously defined $g$, the function $g$ is then implicitly defined as the Fourier series with Fourier coefficients $\hat g$. In examples where sequences of Fourier coefficients are known to be finite length, e.g., the output of sparse Fourier transform algorithms, these coefficients are written in boldface, e.g., $\hat{\mathbf{g}}^s$. Note also that for notational aesthetics, Fourier coefficients for functions with super- or subscripts will not include the super- or subscript under the hat, e.g., the Fourier coefficients of $G^{3d}$ are $\hat G^{3d}$. There are some occasions where super- or subscripts will refer to modifications of Fourier coefficients rather than to the Fourier coefficients of a super- or subscripted function (e.g., $\hat g_s^{\mathrm{opt}}$ is the best $s$-term approximation of $\hat g$, not the Fourier coefficients of a function $g_s^{\mathrm{opt}}$), but these will be made clear from context. For univariate functions $g^{\mathrm{1d}} : \mathbb{T} \to \mathbb{C}$, we usually use $\omega$ to index the Fourier coefficients, e.g., $\hat g^{\mathrm{1d}} = (\hat g^{\mathrm{1d}}_\omega)_{\omega\in\mathbb{Z}}$.

A $d$-dimensional frequency set of interest is usually taken to be $I \subset \mathbb{Z}^d$. In general, most $d$-dimensional frequency sets are labeled using calligraphic font. For example, Chapter 4 introduces a particularly important class of frequency sets, the stamping sets denoted $\mathcal{S}_N \subset \mathbb{Z}^d$ for $N \in \mathbb{N}_0$, which are implicitly parameterized by the set of all active frequencies in a PDE's data, $\mathcal{A}$. The space of all trigonometric polynomials with frequencies in $I$ is denoted by $\Pi_I := \operatorname{span}\{e^{2\pi i\mathbf{k}\cdot\circ} \mid \mathbf{k}\in I\}$. The expansion $K_I$ of a frequency set $I \subset \mathbb{Z}^d$ is defined as
\[
K_I := \max_{j\in[d]} \left( \max_{\mathbf{k}\in I} k_j - \min_{\mathbf{l}\in I} l_j \right) + 1.
\]
Note that this can be interpreted as the sidelength of the smallest hypercube containing $I$.

For a sequence $\hat g \in \mathbb{C}^{\mathbb{Z}^d}$, its restriction to an index set $I$ is denoted by $\hat g|_I$. The same is true for vectors. This can be interpreted as either a vector in $\mathbb{C}^I$ or a sequence in $\mathbb{C}^{\mathbb{Z}^d}$ which is set to zero outside of $I$. When $\hat g$ refers to the Fourier coefficients of the function $g$, restrictions of $g$ to index sets refer to the Fourier series with Fourier coefficients restricted in the same way, i.e.,
\[
g|_I := \sum_{\mathbf{k}\in\mathbb{Z}^d} (\hat g|_I)_{\mathbf{k}}\, e^{2\pi i\mathbf{k}\cdot\circ} = \sum_{\mathbf{k}\in I} \hat g_{\mathbf{k}}\, e^{2\pi i\mathbf{k}\cdot\circ}.
\]
We will also often consider restricting a multiindexed sequence to the hypercube with a fixed sidelength $K$. We will denote this set by $B_K^d$, where the one-dimensional frequency band of length $K$, $B_K$, is defined by $B_K := \left(-\frac K2, \frac K2\right] \cap \mathbb{Z}$. Rather than subscript with this set, we use the shorthand $\hat g|_K := \hat g|_{B_K^d}$. The best $s$-term approximation of a sequence $\hat g$ is defined as the restriction of $\hat g$ to its $s$-largest magnitude entries, denoted by $\hat g_s^{\mathrm{opt}}$. The same applies to vectors.

Given a univariate function $g^{\mathrm{1d}} : \mathbb{T} \to \mathbb{C}$, we define the vector $\mathbf{g}^{\mathrm{1d}} \in \mathbb{C}^M$ as the vector of $M$ equispaced samples of $g^{\mathrm{1d}}$ on $\mathbb{T}$, that is,
\[
\mathbf{g}^{\mathrm{1d}} := \left( g^{\mathrm{1d}}\!\left(\frac jM\right) \right)_{j\in[M]}.
\]
If not explicitly stated, the length of this sampled vector will be clear from context. The length-$M$ discrete Fourier transform (DFT) of a vector $\mathbf{g}^{\mathrm{1d}}$ is defined by
\[
\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_\omega := \frac1M \sum_{j\in[M]} g^{\mathrm{1d}}_j\, e^{-2\pi i\omega j/M} = \frac1M \sum_{j\in[M]} g^{\mathrm{1d}}\!\left(\frac jM\right) e^{-2\pi i\omega j/M} \quad\text{for all } \omega \in [M],
\]
where the matrix
\[
F_M := \left( \frac1M\, e^{-2\pi i\omega j/M} \right)_{\omega\in[M],\, j\in[M]}
\]
is the discrete Fourier transform matrix.
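Note that this convention carries the $1/M$ normalization on the forward transform; many software libraries (e.g., numpy) leave the forward transform unnormalized, so translating between the two conventions costs exactly a factor of $M$. A quick sanity check (an illustration, not from the text):

import numpy as np

rng = np.random.default_rng(0)
M = 16
g1d = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# F_M as defined above, with the 1/M factor built into the matrix.
omega, j = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
F_M = np.exp(-2j * np.pi * omega * j / M) / M

# numpy's forward FFT omits the 1/M factor, so the two agree after rescaling.
assert np.allclose(F_M @ g1d, np.fft.fft(g1d) / M)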
In the context of discrete Fourier transforms, without loss of generality, frequencies $\omega$ are always taken implicitly modulo the length of the DFT, e.g., $\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_{-1} = \left(F_M \mathbf{g}^{\mathrm{1d}}\right)_{M-1}$. The same applies to the columns of the DFT matrix.

Given a natural number $M \in \mathbb{N}$ (often prime) and a generating vector $\mathbf{z} \in \{1, \dots, M-1\}^d$, the associated rank-1 lattice is denoted
\[
\Lambda(\mathbf{z}, M) := \left\{ \frac jM\,\mathbf{z} \bmod 1 \;\middle|\; j\in[M] \right\}.
\]
For any $d$-variate function $g$, we define its restriction to a rank-1 lattice as $g^{\mathrm{1d}}(t) := g(t\mathbf{z})$. Notice then that by combining our previous conventions, given $g : \mathbb{T}^d \to \mathbb{C}$, $\mathbf{g}^{\mathrm{1d}}$ is the vector of samples of $g$ on the rank-1 lattice $\Lambda(\mathbf{z}, M)$. The modulus function for a rank-1 lattice, $m_{\mathbf{z},M} : I \to [M]$, is defined by $\mathbf{k} \mapsto \mathbf{k}\cdot\mathbf{z} \bmod M$.

1.3 Fourier preliminaries

In the sequel, we will make use of various well-known results on Fourier series and discrete Fourier transforms. We provide their statements adapted to our setting here.

Theorem 1.1. The space of all infinitely differentiable periodic functions $C^\infty$ is dense in $L^2$ and $H^1$. In particular, the space of trigonometric monomials $\{e^{2\pi i\mathbf{k}\cdot\circ} \in C^\infty \mid \mathbf{k}\in\mathbb{Z}^d\}$ is a basis for $C^\infty$, an orthonormal basis for $L^2$, and an orthogonal basis for $H^1$.

Proposition 1.1 (Plancherel's identity). If $u \in L^2$, then $\hat u \in \ell^2$ with $\|u\|_{L^2} = \|\hat u\|_{\ell^2}$. If additionally $v \in L^2$, then $\langle u, v \rangle_{L^2} = \langle \hat u, \hat v \rangle_{\ell^2}$.

Proof. Consider
\[
\langle u, v \rangle_{L^2} = \left\langle \sum_{\mathbf{k}\in\mathbb{Z}^d} \hat u_{\mathbf{k}}\, e^{2\pi i\mathbf{k}\cdot\circ},\; \sum_{\mathbf{l}\in\mathbb{Z}^d} \hat v_{\mathbf{l}}\, e^{2\pi i\mathbf{l}\cdot\circ} \right\rangle_{L^2}
= \sum_{\mathbf{k},\mathbf{l}\in\mathbb{Z}^d} \hat u_{\mathbf{k}}\, \overline{\hat v_{\mathbf{l}}} \left\langle e^{2\pi i\mathbf{k}\cdot\circ}, e^{2\pi i\mathbf{l}\cdot\circ} \right\rangle_{L^2}
= \sum_{\mathbf{k},\mathbf{l}\in\mathbb{Z}^d} \hat u_{\mathbf{k}}\, \overline{\hat v_{\mathbf{l}}}\, \delta_{\mathbf{k},\mathbf{l}}
= \sum_{\mathbf{k}\in\mathbb{Z}^d} \hat u_{\mathbf{k}}\, \overline{\hat v_{\mathbf{k}}}
= \langle \hat u, \hat v \rangle_{\ell^2}
\]
where we have used the orthonormality of the basis of trigonometric monomials in $L^2$. The norm result comes from taking $v = u$.

Lemma 1.2. Let $g^{\mathrm{1d}} \in C(\mathbb{T})$ be bandlimited, that is, $\operatorname{supp}(\hat g^{\mathrm{1d}}) \subset B_M$. Then $\hat g^{\mathrm{1d}} = F_M \mathbf{g}^{\mathrm{1d}}$.

Proof. Writing $g^{\mathrm{1d}}(t) = \sum_{\omega\in B_M} \hat g^{\mathrm{1d}}_\omega e^{2\pi i\omega t}$, for any $\omega \in B_M$ we calculate
\[
\left(F_M \mathbf{g}^{\mathrm{1d}}\right)_\omega = \frac1M \sum_{j\in[M]} g^{\mathrm{1d}}\!\left(\frac jM\right) e^{-2\pi i\omega j/M}
= \frac1M \sum_{j\in[M]} \Biggl( \sum_{\omega'\in B_M} \hat g^{\mathrm{1d}}_{\omega'} e^{2\pi i\omega' j/M} \Biggr) e^{-2\pi i\omega j/M}
= \frac1M \sum_{\omega'\in B_M} \hat g^{\mathrm{1d}}_{\omega'} \sum_{j\in[M]} e^{2\pi i(\omega'-\omega)j/M}
= \sum_{\omega'\in B_M} \hat g^{\mathrm{1d}}_{\omega'}\, \delta_{0,(\omega'-\omega \bmod M)}
= \hat g^{\mathrm{1d}}_\omega,
\]
as desired.

Lemma 1.3. For any function $g^{\mathrm{1d}} : \mathbb{T}\to\mathbb{C}$ with Fourier series $g^{\mathrm{1d}}(t) = \sum_{\omega\in\mathbb{Z}} \hat g^{\mathrm{1d}}_\omega e^{2\pi i\omega t}$, define the aliased polynomial
\[
g^{\mathrm{1d}}_{\mathrm{alias}}(t) = \sum_{\omega\in B_M} \underbrace{\Biggl( \sum_{\omega'\equiv\omega \bmod M} \hat g^{\mathrm{1d}}_{\omega'} \Biggr)}_{=: \left(\hat g^{\mathrm{1d}}_{\mathrm{alias}}\right)_\omega} e^{2\pi i\omega t}.
\]
Then the equispaced samples coincide, giving $\mathbf{g}^{\mathrm{1d}} = \mathbf{g}^{\mathrm{1d}}_{\mathrm{alias}} \in \mathbb{C}^M$ and $\hat g^{\mathrm{1d}}_{\mathrm{alias}} = F_M \mathbf{g}^{\mathrm{1d}}$.

Proof. We group frequencies in the Fourier series of $g^{\mathrm{1d}}$ by their residues in $B_M$, giving
\[
\left(\mathbf{g}^{\mathrm{1d}}\right)_j = \sum_{\omega'\in\mathbb{Z}} \hat g^{\mathrm{1d}}_{\omega'} e^{2\pi i\omega' j/M}
= \sum_{\omega\in B_M} \sum_{n\in\mathbb{Z}} \hat g^{\mathrm{1d}}_{\omega+nM}\, e^{2\pi i(\omega+nM)j/M}
= \sum_{\omega\in B_M} \Biggl( \sum_{\omega'\equiv\omega \bmod M} \hat g^{\mathrm{1d}}_{\omega'} \Biggr) e^{2\pi i\omega j/M}
= \left(\mathbf{g}^{\mathrm{1d}}_{\mathrm{alias}}\right)_j \quad\text{for all } j\in[M].
\]
Now, since $\operatorname{supp}(\hat g^{\mathrm{1d}}_{\mathrm{alias}}) \subset B_M$, Lemma 1.2 implies $\hat g^{\mathrm{1d}}_{\mathrm{alias}} = F_M \mathbf{g}^{\mathrm{1d}}_{\mathrm{alias}} = F_M \mathbf{g}^{\mathrm{1d}}$.

CHAPTER 2 CONSTRUCTING MULTIPLE RANK-1 LATTICES DETERMINISTICALLY

As discussed in Section 1.1.1, this chapter focuses on computing Fourier series representations of high-dimensional functions using multiple rank-1 lattices. We begin with a short overview of the lattice construction and associated Fourier recovery methods in Section 2.1 and present the main result in Theorem 2.1. Section 2.2 builds up the proof of Theorem 2.1 with some additional algorithmic comments. Section 2.3 provides numerical tests of our multiple rank-1 lattice construction and Fourier recovery algorithm.

2.1 Overview of results

We provide the first known deterministic algorithm for constructing multiple rank-1 lattices [40] for any given index set $I \subset \mathbb{Z}^d$ with expansion $K_I := \max_{j\in[d]} \left( \max_{\mathbf{k}\in I} k_j - \min_{\mathbf{l}\in I} l_j \right) + 1$.
The proposed algorithm takes a given generating vector $\mathbf{z} \in [M]^d$ of a reconstructing rank-1 lattice for $I$ as input and uses it to deterministically generate $L$ smaller lattice sizes $P_0, \dots, P_{L-1}$. Rather than using the single set $\Lambda(\mathbf{z}, M)$ of $M$ equispaced sampling points along the lattice generating vector $\mathbf{z}$ as in Algorithm 1.1, we use the $L$ sampling sets $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$, which are each still equispaced points in the direction of $\mathbf{z}$ but are spaced out at different intervals. The frequencies in $I$ are then partitioned into $L$ groups, each associated with one of the smaller lattices. This partitioning is tracked by a function $\nu : I \to [L]$ with the defining property that
\[
\mathbf{k}\cdot\mathbf{z} \not\equiv \mathbf{h}\cdot\mathbf{z} \bmod P_{\nu(\mathbf{k})} \quad\text{for all } \mathbf{h} \neq \mathbf{k} \in I, \tag{2.1}
\]
that is, for each lattice size $P_\ell$, the frequencies in $\nu^{-1}(\ell)$ do not collide with any of the other frequencies in $I$ modulo $P_\ell$.

This is similar to the reconstructing property underlying the standard rank-1 lattice FFT approach of Algorithm 1.1. However, to effectively use these $L$ sampling sets, we must take one FFT along each smaller lattice and match only the frequencies associated to that lattice. Note though that in total, these smaller lattices require only¹ $O(|I| \log^2(K_I |I|))$ function evaluations as opposed to the $O(|I|^2)$ function evaluations generally required by a single rank-1 lattice approach (cf. Section 1.1.1). This process is outlined in Algorithm 2.1.

¹These bounds are simplifications of those in Lemma 2.2 and Theorem 2.2 under the mild assumptions that the dimension $d$ and the size of the original single rank-1 lattice $M$ are bounded polynomially by $\max\{|I|, K_I\}$. The latter assumption holds for single rank-1 lattices constructed by CBC methods, cf. Section 2.2.1.

Algorithm 2.1 Multiple rank-1 lattice FFT
Input: A function $g : \mathbb{T}^d \to \mathbb{C}$, a frequency set of interest $I \subset \mathbb{Z}^d$, multiple rank-1 lattices $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$, and a mapping $\nu : I \to [L]$ satisfying (2.1)
Output: Approximate Fourier coefficients $\hat{\mathbf{g}}^L \in \mathbb{C}^I$
1: for $\ell \in [L]$ do
2:  $\mathbf{g}^{\mathrm{1d},\ell} \leftarrow (g(j\mathbf{z}/P_\ell))_{j\in[P_\ell]}$
3:  $\hat{\mathbf{g}}^{\mathrm{1d},\ell} \leftarrow F_{P_\ell}\, \mathbf{g}^{\mathrm{1d},\ell}$
4: end for
5: for $\mathbf{k} \in I$ do
6:  $\hat g^L_{\mathbf{k}} \leftarrow \left(\hat{\mathbf{g}}^{\mathrm{1d},\nu(\mathbf{k})}\right)_{m_{\mathbf{z},P_{\nu(\mathbf{k})}}(\mathbf{k})}$  // recall $m_{\mathbf{z},P_{\nu(\mathbf{k})}}(\mathbf{k}) := \mathbf{k}\cdot\mathbf{z} \bmod P_{\nu(\mathbf{k})}$
7: end for

In detail, this chapter is devoted to proving the following main theorem concerning the proposed Fourier coefficient reconstruction algorithm on multiple rank-1 lattices.

Theorem 2.1. Let $I \subset \mathbb{Z}^d$ be some frequency set with expansion $K_I$. If $\Lambda(\mathbf{z}, M)$ is a reconstructing single rank-1 lattice for $I$, then one can deterministically construct multiple rank-1 lattices $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$ such that the Fourier coefficients $\{\hat g_{\mathbf{k}} \mid \mathbf{k}\in I\}$ of any trigonometric polynomial $g \in \Pi_I$ can be exactly reconstructed using only samples of $g$ on these lattices by Algorithm 2.1. Moreover, the total number of function evaluations on these lattice points is bounded by
\[
\sum_{\ell\in[L]} P_\ell \le \begin{cases} 2 & \text{for } |I| = 1, \\[4pt] 6\,|I| \log_2(dK_I M)\, \log\!\left( \dfrac{3\,|I| \log_2(dK_I M)}{\log_2(|I|)} \right) & \text{for } |I| \ge 2. \end{cases}
\]
The total computational complexity for the construction of these rank-1 lattices can be bounded by
\[
O\Bigl( |I|^2 \log(|I|) \log(dK_I M) + |I| \bigl( d + \log(dK_I M)\log(\log(dK_I M)) \bigr) \Bigr),
\]
and the total computational complexity for reconstructing the Fourier coefficients can be bounded by
\[
O\Bigl( |I| \Bigl( d + \log(dK_I M)\, \log^2\!\bigl( |I| \log_{|I|}(dK_I M) \bigr) \Bigr) \Bigr). \tag{2.2}
\]

Proof. The bounds on the total number of samples from the rank-1 lattices follow from Theorem 2.2, and the bound on the computational complexity for lattice construction follows from Section 2.2.1.
The exactness of the Fourier coefficient recovery is a result of Corollary 2.1. Since Algorithm 2.1 requires an FFT of length $P_\ell$ for each $\ell\in[L]$, Line 1 to Line 4 require $O\bigl( \log(\max_{\ell\in[L]} P_\ell) \sum_{\ell\in[L]} P_\ell \bigr)$ total complexity, where the maximum is bounded in Lemma 2.2 and the sum is bounded above. The remaining lines are $O(d|I|)$ (assuming the modulus functions have not been precomputed, in which case the complexity would reduce to $O(|I|)$). Simplifying these complexities results in (2.2).

Note that Algorithm 2.1 exactly reconstructs all Fourier coefficients of multivariate trigonometric polynomials with frequencies in a specific frequency set $I$ which is assumed to be given. One can also apply these rules in order to compute approximations of the Fourier coefficients of more general periodic functions. The resulting trigonometric polynomial can be used as an approximant. For specific approximation settings, the worst case error of this approximation is almost as good as the approximation one achieves when approximating the Fourier coefficients using the lattice rule that uses all samples of the reconstructing single rank-1 lattice from which we start the construction of our rules, cf. [44] for details. From that point of view, the strategy we present here even yields a general approach for significantly reducing the number of sampling values used while only slightly increasing approximation errors. We refer to Corollary 2.1 for more details and to the numerical example in Section 2.3.2 that yields Figure 2.5 illustrating this assertion.

2.2 The proof of Theorem 2.1

We denote the $q$th prime number by $p_q$, $q \in \mathbb{N}$. For technical reasons, we define $p_0 := 1$.

Lemma 2.1. Let $\mathcal{J} := \{k_1, \dots, k_J\} \subset \mathbb{N}$ with $k_1 < \dots < k_J$ and $\tilde M \ge k_J - k_1$. Also let $q \in \mathbb{N}$ be such that $p_{q-1} < J \le p_q$, and $Q := \max\bigl\{ 1, \bigl\lceil 2(J-1)\log_{p_q}(\tilde M) \bigr\rceil - 1 \bigr\}$. Then, there exist primes $P_0, \dots, P_{L-1} \in \mathcal{P}_J := \{p_{q+\ell}\}_{\ell\in[Q]}$ with $L \le \lfloor \log_2(J) \rfloor + 1$ such that
\[
\mathcal{J} = \bigcup_{\ell\in[L]} \{ k \in \mathcal{J} \mid k \not\equiv h \bmod P_\ell \text{ for all } h \in \mathcal{J}\setminus\{k\} \}
\]
holds.

Proof. We assume $J \ge 2$ and $\tilde M > p_q$; otherwise the statement is trivial. Without loss of generality, we can also assume $\mathcal{J} \subset [\tilde M]$ by considering the residues of each $k_j \in \mathcal{J}$ modulo $\tilde M$. Note that these residues are all unique due to $\tilde M > k_J - k_1$, and therefore, any modulo $P_\ell$ collision of the residues is equivalent to a collision of their original values.

Let $\mathcal{P}_J = \{p_{q+\ell}\}_{\ell\in[Q]}$ be the set of the $Q$ smallest prime numbers not smaller than $p_q$ and
\[
Y_{i,j} := \{ p \in \mathcal{P}_J \mid k_i \equiv k_j \bmod p \}
\]
a subset which collects all primes $p$ in $\mathcal{P}_J$ for which the frequencies $k_i \in \mathcal{J}$ and $k_j \in \mathcal{J}$ collide modulo $p$. Since $|k_i - k_j|$ is divisible by each prime $p$ in $Y_{i,j}$, the Chinese Remainder Theorem implies that $\prod_{p\in Y_{i,j}} p$ divides $|k_i - k_j| < \tilde M$. Therefore, we observe
\[
p_q^{|Y_{i,j}|} \le \prod_{p\in Y_{i,j}} p < \tilde M
\]
for all $i \neq j \in \{1, \dots, J\} =: S_0$, i.e., $k_i \neq k_j$, and this implies $|Y_{i,j}| \le -1 + \bigl\lceil \log_{p_q}(\tilde M) \bigr\rceil$. Moreover, we collect all primes for which $k_i$ collides with any other $k_j$ in the sets
\[
Y_i := \{ p \in \mathcal{P}_J \mid k_i \equiv k_j \bmod p \text{ for at least one } k_j \in \mathcal{J}\setminus\{k_i\} \} = \bigcup_{k_j\in\mathcal{J}\setminus\{k_i\}} Y_{i,j}.
\]
The cardinality of each $Y_i$ is bounded by
\[
|Y_i| \le \sum_{k_j\in\mathcal{J}\setminus\{k_i\}} |Y_{i,j}| \le (J-1)\Bigl( -1 + \bigl\lceil \log_{p_q}(\tilde M) \bigr\rceil \Bigr).
\]
Accordingly, we count
\[
|\mathcal{P}_J \setminus Y_i| = |\mathcal{P}_J| - |Y_i| \ge Q - (J-1)\Bigl( -1 + \bigl\lceil \log_{p_q}(\tilde M) \bigr\rceil \Bigr) \ge |\mathcal{P}_J|/2.
\]
We define the indicator variables
\[
Z_{i,q+\ell} := \begin{cases} 1 & p_{q+\ell} \in \mathcal{P}_J \setminus Y_i, \\ 0 & p_{q+\ell} \in Y_i, \end{cases}
\]
for all $k_i \in \mathcal{J}$ and $p_{q+\ell} \in \mathcal{P}_J$.
Summing up these indicator variables and using the estimates from above yields
\[
\sum_{i\in S_0} \sum_{\ell\in[Q]} Z_{i,q+\ell} = \sum_{i\in S_0} |\mathcal{P}_J \setminus Y_i| \ge |S_0|\,|\mathcal{P}_J|/2 = J\,|\mathcal{P}_J|/2. \tag{2.3}
\]
We will now show that $\sum_{i\in S_0} Z_{i,q+\ell} \ge J/2$ holds for at least one $p_{q+\ell} \in \mathcal{P}_J$ by contradiction. To this end, suppose that $\sum_{i\in S_0} Z_{i,q+\ell} < J/2$ for all $p_{q+\ell} \in \mathcal{P}_J$. Accordingly, we estimate
\[
\sum_{\ell\in[Q]} \sum_{i\in S_0} Z_{i,q+\ell} < |S_0|\,|\mathcal{P}_J|/2 = J\,|\mathcal{P}_J|/2,
\]
which is in contradiction to (2.3). Thus, there exists at least one prime $p_{q+\ell_0} \in \mathcal{P}_J$ such that
\[
\sum_{i\in S_0} Z_{i,q+\ell_0} = \Bigl| \underbrace{\{ k_i \in \mathcal{J} \mid k_i \not\equiv k_j \bmod p_{q+\ell_0} \text{ for all } k_j \in \mathcal{J}\setminus\{k_i\} \}}_{=: \mathcal{J}_1} \Bigr| \ge J/2.
\]
We set $P_0 := p_{q+\ell_0}$, and then apply the strategy iteratively. For $r \in \mathbb{N}$, we define
\[
S_r := \{ i \in S_{r-1} \mid \exists\, k_j \in \mathcal{J}\setminus\{k_i\} \text{ with } k_i \equiv k_j \bmod P_{r-1} \}
\]
and obtain $s_r := |S_r| \le 2^{-r} J$. Obviously, we have
\[
\mathcal{J}'_r := \{ k_i \mid i \in S_r \} = \mathcal{J} \setminus \bigcup_{t=1}^{r} \mathcal{J}_t, \tag{2.4}
\]
which are the frequencies that collide modulo each of $P_0, \dots, P_{r-1}$ with some other frequency in $\mathcal{J}$. We reconsider the variables defined above, but now we restrict the indices to $i \in S_r$. For instance, we observe $\{P_0, \dots, P_{r-1}\} \subset Y_i$ for all $i \in S_r$. We estimate
\[
\sum_{i\in S_r} \sum_{\ell\in[Q]} Z_{i,q+\ell} = \sum_{i\in S_r} |\mathcal{P}_J \setminus Y_i| \ge s_r\,|\mathcal{P}_J|/2.
\]
Using the same contradiction as above, we observe that for at least one $p_{q+\ell_r} \in \mathcal{P}_J \setminus \{P_0, \dots, P_{r-1}\}$ we have
\[
\sum_{i\in S_r} Z_{i,q+\ell_r} = \Bigl| \underbrace{\{ k_i \in \mathcal{J}'_r \mid k_i \not\equiv k_j \bmod p_{q+\ell_r} \text{ for all } k_j \in \mathcal{J}\setminus\{k_i\} \}}_{=: \mathcal{J}_{r+1}} \Bigr| \ge s_r/2.
\]
We now set $P_r := p_{q+\ell_r}$ and increase $r$ up to the point where $0 = |S_{r+1}| = s_{r+1}$ holds. In order to estimate the largest possible step number $r_{\max} \ge r$, we require that $s_{r_{\max}+1} \le 2^{-(r_{\max}+1)} J < 1$. This is satisfied in particular when $r_{\max} = \lfloor \log_2(J) \rfloor$, and thus we bound the total number of primes as $L \le r_{\max} + 1 \le \lfloor \log_2(J) \rfloor + 1$.

Remark 2.1. In the proof of Lemma 2.1 we determined that there exist primes in the candidate set $\mathcal{P}_J$ fulfilling the assertion. This set contains the first $Q := \max\bigl\{1, \bigl\lceil 2(J-1)\log_{p_q}(\tilde M) \bigr\rceil - 1\bigr\}$ prime numbers not smaller than $p_q$, $p_{q-1} < J \le p_q$, which only depends on $J$. However, from a theoretical point of view, any prime number $p$ larger than $\lceil J/2 \rceil$ may fulfill $|\mathcal{J}_1| \ge J/2$. Thus, one could also start the set of prime candidates at that point, which would result in a slightly increased cardinality of the candidate set, due to the fact that $Q$ depends on the logarithm to the base of the smallest prime in the candidate set. In spite of that increased cardinality, the maximal prime number in the candidate set, $p_{q+Q-1}$, which is estimated in the next lemma, may be decreased. Analyzing this approach leads to similar statements as in the previous and the following lemmas with slightly changed constants. In more detail, both constants $C_1$ and $C_2$ can be bounded by less than 3. However, the proof requires more effort, and we could not bound the resulting constants lower than those stated in Lemma 2.2.

Lemma 2.2. Assume $J, \tilde M \in \mathbb{N}$, $J \le \tilde M$, $p_q$ is the smallest prime not smaller than $J$, and let $Q := \max\bigl\{1, \bigl\lceil 2(J-1)\log_{p_q}(\tilde M) \bigr\rceil - 1\bigr\}$. Then, we estimate
\[
p_{q+Q-1} \le \begin{cases} 2 & \text{for } J = 1, \\ C_1\, J \log_J(\tilde M)\, \log\bigl( C_2\, J \log_J(\tilde M) \bigr) & \text{for } J \ge 2, \end{cases}
\]
with absolute constants $C_1 < 2.3\,(1 + e^{-3/2}) \le 2.832$ and $C_2 \le 2.3$.

Proof. For $J = 1$, we observe $p_{q+Q-1} = p_q = 2$. When $J \ge 2$ and $p_q \ge \tilde M$, we have $Q = 1$ and $p_q < 2J$ as a result of Bertrand's postulate. We then consider $J \ge 2$ and $p_q < \tilde M$, which yields
\[
q + Q - 1 = q - 1 + \bigl\lceil 2(J-1)\log_{p_q}(\tilde M) \bigr\rceil - 1 \le q - 1 + 2(J-1)\log_{p_q}(\tilde M).
\]
We distinguish two cases, where the final constants from the lemma are determined by the second case.
In the first, we restrict to the finite range where $2 \le J \le 8$ with $p_q < \tilde M < p_q^{\lceil 10/(J-1) \rceil}$, and numerically check that the upper bound
\[
p_{q+Q-1} < 2.831\, J \log_J(\tilde M)\, \log\bigl( 2.3\, J \log_J(\tilde M) \bigr)
\]
is satisfied. In the second case, where $2 \le J \le 8$ with $\tilde M \ge p_q^{\lceil 10/(J-1) \rceil}$ or $J \ge 9$, we have $q + Q - 1 \ge 20$. We then estimate this quantity from above as
\[
q + Q - 1 \le q - 1 + 2(J-1)\log_J(\tilde M) = \left( \frac{q-1}{J\log_J(\tilde M)} + 2\,\frac{J-1}{J} \right) J \log_J(\tilde M) \le \left( \frac{q-1}{J} + 2\,\frac{J-1}{J} \right) J \log_J(\tilde M) \le 2.3\, J \log_J(\tilde M),
\]
where one achieves the last estimate by computing $\frac{q-1}{J} + 2\frac{J-1}{J}$ for $2 \le J < 66$, and for $J \ge 66$ one obtains
\[
\frac{q-1}{J} + 2\,\frac{J-1}{J} \overset{\text{[60, Eq. (3.6)]}}{\le} \frac{1.25506}{\log J} + 2 \le \frac{1.25506}{\log 66} + 2 < 2.3.
\]
By the estimate $e^{-1/2} x \log(x) \le x^{1+e^{-3/2}}$, implying
\[
\log\bigl( e^{-1/2} x \log x \bigr) = \log(x) + \log\log(x) - \tfrac12 \le (1 + e^{-3/2}) \log x
\]
for $x > 1$, an application of [60, Eq. (3.11)] gives
\[
p_{q+Q-1} < (q+Q-1)\bigl( \log(q+Q-1) + \log\log(q+Q-1) - 1/2 \bigr) \le (1 + e^{-3/2})(q+Q-1)\log(q+Q-1) \le (1 + e^{-3/2})\, 2.3\, J \log_J(\tilde M)\, \log\bigl( 2.3\, J \log_J(\tilde M) \bigr),
\]
as desired.

Lemma 2.1 ensures the existence of a set of primes $P_0, \dots, P_{L-1}$ such that each single element of a given set of integers will not collide modulo at least one $P_\ell$ with any other of these integers. We can now use these primes to convert the large reconstructing single rank-1 lattice $\Lambda(\mathbf{z}, M)$ for some frequency set $I$ into smaller rank-1 lattices which, based on their ability to avoid collisions in the frequency domain, will provide a sampling set to exactly reconstruct the Fourier coefficients of all multivariate trigonometric polynomials in $\Pi_I$.

Theorem 2.2. Let $I \subset \mathbb{Z}^d$, $|I| \ge 2$, and a generating vector $\mathbf{z} \in [M]^d$ of $\Lambda(\mathbf{z}, M)$, a reconstructing rank-1 lattice for $I$, be given. We determine
\[
\tilde M := \max\{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\} - \min\{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\} + 1.
\]
Then there exists a set of prime numbers $P_0, \dots, P_{L-1}$, $L \le \lfloor \log_2(|I|) \rfloor + 1$, such that
\[
I = \bigcup_{\ell\in[L]} \{ \mathbf{k}\in I \mid \mathbf{k}\cdot\mathbf{z} \not\equiv \mathbf{h}\cdot\mathbf{z} \bmod P_\ell \text{ for all } \mathbf{h}\in I\setminus\{\mathbf{k}\} \}. \tag{2.5}
\]
Thus, the multiple rank-1 lattices $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$ can be used as input for the multiple rank-1 lattice Fourier transform Algorithm 2.1. The total number of sampling values in these multiple rank-1 lattices can be bounded by
\[
\sum_{\ell\in[L]} P_\ell \le 2\, C_1\, |I| \log_2(\tilde M)\, \log\bigl( C_2\, |I| \log_{|I|}(\tilde M) \bigr), \tag{2.6}
\]
with constants $C_1, C_2$ from Lemma 2.2.

Proof. Define the set of hashed multivariate frequencies in $I$ as $I^{\mathrm{1d}} := \{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\}$. Applying Lemma 2.1 with $\mathcal{J} = I^{\mathrm{1d}}$ and $\tilde M = \max I^{\mathrm{1d}} - \min I^{\mathrm{1d}} + 1$ as above, we find a set of prime numbers $\{P_0, \dots, P_{L-1}\}$ with $\max_{\ell\in[L]} P_\ell \le p_{q+Q-1}$ and respective rank-1 lattices $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$ such that (2.5) holds. We estimate
\[
\Bigl| \bigcup_{\ell\in[L]} \Lambda(\mathbf{z}, P_\ell) \Bigr| \le \sum_{\ell\in[L]} P_\ell \le (\lfloor \log_2(|I|) \rfloor + 1)\, p_{q+Q-1} \overset{\text{Lem. 2.2}}{\le} 2\, C_1\, |I| \log_2(\tilde M)\, \log\bigl( C_2\, |I| \log_{|I|}(\tilde M) \bigr).
\]

Remark 2.2. We consider two crucial estimates on $\tilde M$ in Theorem 2.2:
\[
\tilde M = 1 + \max_{\mathbf{k}\in I} \sum_{i\in[d]} k_i z_i + \max_{\mathbf{h}\in I} \sum_{i\in[d]} (-h_i z_i) \le 1 + \sum_{i\in[d]} z_i \Bigl( \max_{\mathbf{k}\in I} k_i - \min_{\mathbf{h}\in I} h_i \Bigr) \le dK_I M, \tag{2.7}
\]
\[
\tilde M = 1 + \max_{\mathbf{k}\in I} \sum_{i\in[d]} k_i z_i - \min_{\mathbf{h}\in I} \sum_{i\in[d]} h_i z_i \le 2\|\mathbf{z}\|_\infty \max_{\mathbf{k}\in I} \|\mathbf{k}\|_1 + 1 \le 2M \max_{\mathbf{k}\in I} \|\mathbf{k}\|_1, \tag{2.8}
\]
where $K_I$ is the expansion of $I$.

The estimate in (2.7) is a rough but universal upper bound on $\tilde M$ that depends on the dimension $d$. The inequality in (2.8) provides a dimension-independent upper bound on $\tilde M$ in cases where the frequency set $I$ is contained in an $\ell^1$-ball of a specific size $R$, i.e., $I \subset \{\mathbf{k}\in\mathbb{Z}^d \mid \|\mathbf{k}\|_1 \le R\}$, which yields $\tilde M \le 2MR$.
We refer to Section 2.2.1, where we present and analyze the computational costs and discuss the advantages of the latter estimate.

The Fourier coefficient reconstruction process in Algorithm 2.1 allows us to prove theoretical error guarantees for the approximation of functions that are not necessarily Fourier polynomials supported on a known $I$. In particular, we are able to provide $L^\infty$ and $L^2$ bounds for the approximation error in terms of the error in truncating a function's Fourier coefficients to a chosen $I$. The proof relies on the fact that the aliasing error in a DFT is comparable to the truncation error. See, e.g., [44, Lemma 3.1] for similar results and further details. For the following, recall the Wiener algebra $W := \{g \in L^1 \mid \|\hat g\|_{\ell^1} < \infty\}$.

Corollary 2.1. Let $g \in W$ and fix a frequency set $I \subset \mathbb{Z}^d$ with $|I| < \infty$. Use the multiple rank-1 lattices for $I$ in Theorem 2.2 with Algorithm 2.1 to produce $\hat{\mathbf{g}}^L$ and $g^L := \sum_{\mathbf{k}\in I} \hat g^L_{\mathbf{k}}\, e^{2\pi i\mathbf{k}\cdot\circ} \in \Pi_I$. Then $g^L$ approximates $g$ with the error bounds
\[
\| g - g^L \|_{L^\infty} \le (1 + L)\, \| \hat g - \hat g|_I \|_{\ell^1},
\]
\[
\| g - g^L \|_{L^2} \le \sqrt{1 + L}\, \| \hat g - \hat g|_I \|_{\ell^2}.
\]

Proof. By the triangle inequality,
\[
\| g - g^L \|_{L^\infty} \le \sum_{\mathbf{k}\in I} \bigl| \hat g_{\mathbf{k}} - \hat g^L_{\mathbf{k}} \bigr| + \sum_{\mathbf{k}\in\mathbb{Z}^d\setminus I} |\hat g_{\mathbf{k}}| = \bigl\| \hat g|_I - \hat{\mathbf{g}}^L \bigr\|_{\ell^1} + \| \hat g - \hat g|_I \|_{\ell^1}.
\]
Now, note that by partitioning the frequencies $\mathbf{k}\in I$ by their values of $\nu(\mathbf{k})$, for $\hat{\mathbf{g}}^{\mathrm{1d},\ell}$ as in Line 3 of Algorithm 2.1, we obtain
\[
\bigl\| \hat g|_I - \hat{\mathbf{g}}^L \bigr\|_{\ell^1} = \sum_{\ell\in[L]} \sum_{\mathbf{k}\in\nu^{-1}(\ell)} \Bigl| \hat g_{\mathbf{k}} - \bigl( \hat{\mathbf{g}}^{\mathrm{1d},\ell} \bigr)_{\mathbf{k}\cdot\mathbf{z} \bmod P_\ell} \Bigr| = \sum_{\ell\in[L]} \sum_{\mathbf{k}\in\nu^{-1}(\ell)} \Bigl| \hat g_{\mathbf{k}} - \sum_{\omega\equiv\mathbf{k}\cdot\mathbf{z} \bmod P_\ell} \hat g^{\mathrm{1d}}_\omega \Bigr|,
\]
where the final equality follows from Lemma 1.3. Since the multidimensional frequencies $\mathbf{k}\in\mathbb{Z}^d$ of $\hat g$ map to the frequencies of $\hat g^{\mathrm{1d}}$ by $\mathbf{k}\mapsto\mathbf{k}\cdot\mathbf{z}$, and for any $\mathbf{k}\in\nu^{-1}(\ell)$ there is no $\mathbf{h}\in I\setminus\{\mathbf{k}\}$ such that $\mathbf{h}\cdot\mathbf{z} \equiv \mathbf{k}\cdot\mathbf{z} \bmod P_\ell$, we know that
\[
\Bigl| \hat g_{\mathbf{k}} - \sum_{\omega\equiv\mathbf{k}\cdot\mathbf{z} \bmod P_\ell} \hat g^{\mathrm{1d}}_\omega \Bigr| \le \sum_{\substack{\mathbf{h}\in\mathbb{Z}^d\setminus I \\ \mathbf{h}\cdot\mathbf{z}\equiv\mathbf{k}\cdot\mathbf{z} \bmod P_\ell}} |\hat g_{\mathbf{h}}|.
\]
Thus
\[
\bigl\| \hat g|_I - \hat{\mathbf{g}}^L \bigr\|_{\ell^1} \le \sum_{\ell\in[L]} \sum_{\mathbf{k}\in\nu^{-1}(\ell)} \sum_{\substack{\mathbf{h}\in\mathbb{Z}^d\setminus I \\ \mathbf{h}\cdot\mathbf{z}\equiv\mathbf{k}\cdot\mathbf{z} \bmod P_\ell}} |\hat g_{\mathbf{h}}| \le \sum_{\ell\in[L]} \sum_{\mathbf{h}\in\mathbb{Z}^d\setminus I} |\hat g_{\mathbf{h}}| = L\, \| \hat g - \hat g|_I \|_{\ell^1},
\]
finishing the proof of the $L^\infty/\ell^1$ result. The $L^2/\ell^2$ result follows by replacing the $L^\infty$ norm by the $L^2$ norm, taking squares of all terms, and taking a final square root.

As considered in [40, Subsection 4.2] for randomized lattice constructions, we can take an alternative approach to Theorem 2.2 which requires fewer samples at the cost of having only theoretical reconstruction guarantees for trigonometric polynomials (i.e., the results concerning approximation discussed in Corollary 2.1 do not apply in a straightforward manner). Rather than requiring that at each step of the lattice construction a prime $p$ is chosen so that a set of frequencies can be obtained which do not collide with any other frequency in the original frequency set modulo $p$, we instead recursively reduce the size of the set over which the resulting rank-1 lattice has the reconstruction property, without concern for other frequencies.

Theorem 2.3. Let $I \subset \mathbb{Z}^d$, $|I| \ge 1$, $d \ge 2$, and $\tilde M := \max\{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\} - \min\{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\} + 1$. For $\Lambda(\mathbf{z}, M)$ a reconstructing single rank-1 lattice for $I$, there exist primes
$P_0, \dots, P_{L-1}$, $L \le \lfloor \log_2(|I|) \rfloor + 1$, with
\[
\sum_{\ell\in[L]} P_\ell \le \begin{cases} 2 & \text{for } |I| = 1, \\ 8\,|I| \log_2(\tilde M)\, \log\bigl( 2 \log_2(\tilde M) \bigr) & \text{for } |I| \ge 2, \end{cases} \tag{2.9}
\]
such that for every $g \in \Pi_I$, the formula
\[
\hat g_{\mathbf{k}} = \frac{1}{P_{\nu(\mathbf{k})}} \sum_{j=0}^{P_{\nu(\mathbf{k})}-1} g_{\nu(\mathbf{k})-1}\!\left( \frac{j\mathbf{z} \bmod P_{\nu(\mathbf{k})}}{P_{\nu(\mathbf{k})}} \right) e^{-\frac{2\pi i\, j\, \mathbf{k}\cdot\mathbf{z}}{P_{\nu(\mathbf{k})}}} \quad\text{with}\quad g_{\nu(\mathbf{k})-1}(\mathbf{x}) := g(\mathbf{x}) - \sum_{\mathbf{h}\in\{\mathbf{l} \mid \nu(\mathbf{l}) < \nu(\mathbf{k})\}} \hat g_{\mathbf{h}}\, e^{2\pi i\mathbf{h}\cdot\mathbf{x}} \tag{2.10}
\]
holds, where $\nu : I \to [L]$ maps frequencies to the lattice used to reconstruct the corresponding Fourier coefficient; i.e., we can uniquely reconstruct each multivariate trigonometric polynomial with frequencies in $I$ using samples along the rank-1 lattices $\Lambda(\mathbf{z}, P_0), \dots, \Lambda(\mathbf{z}, P_{L-1})$.

Proof. The proof is simply a recursive application of part of the previously discussed approach, so we only provide a sketch.

We use only the first prime $P_0$ from Lemma 2.1 to determine a set of frequencies $I_0 \subset I$ such that $\Lambda(\mathbf{z}, P_0)$ is a reconstructing single rank-1 lattice for $I_0$ with $|I_0| \ge |I|/2$. Performing the reconstruction process in Theorem 2.2 for only frequencies in $I_0$ using samples from $\Lambda(\mathbf{z}, P_0)$ recovers the corresponding Fourier coefficients exactly. This then defines the correspondence $\nu(\mathbf{k}) = 0$ for all $\mathbf{k}\in I_0$. Subtracting off the recovered polynomial terms and recursively repeating the process with the frequency set $I\setminus I_0$ gives (2.10).

The upper bound on the number of samples is a result of Lemma 2.2, noting that at each step, the cardinality of the frequency set is reduced by half. Splitting the dependence on $|I|$ and $\tilde M$ in the second logarithm using the inequality $\log(xy) \le 2(\log x)(\log y)$ for $x, y \ge e$ and estimating the resulting geometric series gives (2.9).

2.2.1 Analysis of lattice construction

The approach analyzed in Theorem 2.2 provides a constructive, deterministic method for building reconstructing multiple rank-1 lattices from reconstructing single rank-1 lattices. Algorithm 2.2 summarizes the suggested approach in detail.

Algorithm 2.2 Deterministic construction of multiple rank-1 lattices suitable for reconstruction and approximation, according to Theorem 2.2 and Lemma 2.1
Input: frequency set $I \subset \mathbb{Z}^d$, generating vector $\mathbf{z} \in \mathbb{N}_0^d$ of a reconstructing single rank-1 lattice for $I$
Output: number of lattices $L$, lattice sizes $P_0, \dots, P_{L-1}$, and mapping $\nu : I \to [L]$ recording which coefficients are computed by which lattice
1: $\mathcal{J}'_0 \leftarrow \{\mathbf{k}\cdot\mathbf{z} \mid \mathbf{k}\in I\}$
2: Determine $q \in \mathbb{N}$ s.t. $p_{q-1} < |I| \le p_q$  // recall $p_\ell$ is the $\ell$th prime
3: $Q \leftarrow \max\bigl\{1, \bigl\lceil 2(|I|-1)\log_{p_q}(\tilde M) \bigr\rceil - 1\bigr\}$  // recall $\tilde M := \max\mathcal{J}'_0 - \min\mathcal{J}'_0 + 1$
4: $\mathcal{P}_{|I|} \leftarrow \{p_{q+\ell}\}_{\ell\in[Q]}$
5: Initialize $r \leftarrow 0$ and $\nu : I \to \mathbb{N}$ with $\nu(\mathbf{k}) = 0$ for all $\mathbf{k}\in I$
6: repeat
7:  for all $\ell \in [Q]$ do
8:   $\mathcal{J}'_{r+1} \leftarrow \emptyset$
9:   for all $\mathbf{k}\cdot\mathbf{z} \in \mathcal{J}'_r$ do
10:   $\nu(\mathbf{k}) \leftarrow r$
11:   if $\mathbf{k}\cdot\mathbf{z} \equiv h' \bmod p_{q+\ell}$ for any $h' \in \mathcal{J}'_0 \setminus \{\mathbf{k}\cdot\mathbf{z}\}$ then
12:    $\mathcal{J}'_{r+1} \leftarrow \mathcal{J}'_{r+1} \cup \{\mathbf{k}\cdot\mathbf{z}\}$
13:   end if
14:  end for
15:  if $|\mathcal{J}'_{r+1}| \le |\mathcal{J}'_r|/2$ then
16:   $P_r \leftarrow p_{q+\ell}$
17:   break
18:  end if
19: end for
20: $r \leftarrow r + 1$
21: until $\mathcal{J}'_r = \emptyset$
22: $L \leftarrow r$

In the following, we analyze the runtime complexity. We start by analyzing Line 1, which is $O(d|I|)$. The arithmetic complexity of Lines 2 and 4 is dominated by determining the set of primes $\mathcal{P}_{|I|}$, which can be done in linear time with respect to $p_{q+Q-1} \le C_1 |I| \log_{|I|}(\tilde M) \log\bigl( C_2 |I| \log_{|I|}(\tilde M) \bigr)$ as estimated in Lemma 2.2, therefore requiring $O\bigl( |I| \log\tilde M \log\log\tilde M \bigr)$ arithmetic operations.

The goal of the loop from Lines 6 to 21 is to separate the frequencies in $I$ into $L$ groups. Each of these $\ell \in [L]$ groups is assigned a prime $P_\ell$ so that its frequencies do not collide with any others in $I$ modulo $P_\ell$. In the worst case, there will be at most $L = O(\log(|I|))$ (cf.
The first inner loop requires, at most, a scan through each of the $Q = O(|I| \log_{p_q}(\tilde M))$ primes in $P_{|I|}$. The body of this inner loop can be accomplished in $O(|I| \log(|I|))$ time. Indeed, it requires the computation of $k \bmod p_{q+\ell}$ for all $k \in J'_0$ (where we make sure to track the association between $k \bmod p_{q+\ell}$ and the original frequency $\mathbf k \in I$ with $k = \mathbf k \cdot \mathbf z$), a sort of these residues, and a linear scan to determine duplicates of the residues of elements originally in $J'_r$ (where we can rely on our function $\nu$ and the aforementioned association between $k \bmod p_{q+\ell}$ and $\mathbf k$). This is dominated by the sort complexity, $O(|I| \log(|I|))$. Thus, the total complexity for Lines 6 to 21 is $O\bigl(|I|^2 \log(|I|) \log \tilde M\bigr)$ (noting that $\log(|I|) < \log p_q$). Altogether, we observe a runtime complexity of
\[
O\Bigl(|I|^2 \log(|I|) \log \tilde M + |I|\bigl(d + \log \tilde M \log\log \tilde M\bigr)\Bigr).
\]

In the following, we comment on practical issues of Algorithm 2.2. Line 1 might suffer from integer overflow, which can be avoided by using higher precision integer representations. An alternative is to skip this precomputation and instead compute the inner products modulo $p_{q+\ell}$ on the fly in Line 11, which increases the runtime complexity by a factor of $d$ in the first summand. Note also that one does not necessarily need to compute $\tilde M$ in advance. For the loop over primes starting in Line 7, one might simply start with the prime $p_q$ and increase the prime via some "nextprime" function, which would increase the second summand in the runtime complexity.

Finally, we discuss the range of the numbers $\tilde M$ as well as the influence of the original single rank-1 lattice on the estimates herein. In general, there are two different suitable approaches for finding a single reconstructing rank-1 lattice for a given frequency index set $I$. A simple approach is to pick a rank-1 lattice $\Lambda(\mathbf z, M)$ that provides the reconstruction property from a simple number-theoretic point of view. For instance, one can choose generating vectors $\mathbf z$ and lattice sizes $M$ that fulfill
\[
z_0 \in \mathbb{N}, \qquad
z_i \ge \Bigl(1 + \max_{\mathbf k \in I} k_{i-1} - \min_{\mathbf h \in I} h_{i-1}\Bigr) z_{i-1}, \quad i = 1, \dots, d-1, \qquad
M \ge \Bigl(1 + \max_{\mathbf k \in I} k_{d-1} - \min_{\mathbf h \in I} h_{d-1}\Bigr) z_{d-1}.
\]
Clearly, even for extremely sparse frequency sets and moderate expansions of $I$, this approach leads to exponentially increasing components $z_{d-1} \ge 2^{d-1}$ and lattice sizes $M \ge 2^d$. As in Remark 2.2, it will therefore lead to exponential growth in $\tilde M$ and thus a linear dependence on the dimension $d$ in all $\log \tilde M$ terms. From a theoretical point of view, this turns out to be disadvantageous for higher dimensions $d$, since the runtime complexity of Algorithm 2.2 as well as the estimates of the total number of sampling values in Theorems 2.2 and 2.3 are affected by this factor.

A more costly way of determining reconstructing single rank-1 lattices is a suitable CBC construction as suggested in [46], which requires a computational complexity in $O(d\,|I|^2)$. The additional computational effort pays off when applying the theoretical bounds on the resulting lattice size $M$. In more detail, the CBC approach offers reconstructing rank-1 lattices with prime lattice sizes $M$ bounded from above by $M \le \max(|I|^2, 2(K_I + 1))$, cf. [39, 46]. As a consequence, the estimates in Remark 2.2 give $\tilde M \le C d K_I^2 |I|^2$, or even $\tilde M \le C' R K_I |I|^2$ for $I$ a subset of an $\ell^1$-ball of radius $R$. Thus, the estimates (2.6) on the number of sampling values required for unique reconstruction of multivariate trigonometric polynomials in $\Pi_I$ are respectively only logarithmically dependent on $d$ or even independent of $d$.
2.3 Numerics

In this section, we investigate the statements of Theorems 2.2 and 2.3 numerically². We consider different types of frequency sets $I$. In particular, we use symmetric hyperbolic cross type frequency sets
\[
I = H^d_{R,\mathrm{even}} := \Bigl\{\mathbf k := (k_0, \dots, k_{d-1})^\top \in (2\mathbb{Z})^d \,\Big|\, \prod_{t \in [d]} \max(1, |k_t|) \le R\Bigr\}
\tag{2.11}
\]
with expansion parameter $R \in \mathbb{N}$, which results in $K_I \le 2R$, in up to $d = 9$ spatial dimensions. These frequency sets $H^d_{R,\mathrm{even}}$ have the property that only even indices occur in each frequency component. This matches the behavior of the Fourier support of the test function $G_3^d$ introduced below in Section 2.3.2, which we approximate using samples on multiple rank-1 lattices; see also [50, 44] and [65, Section 2.3.5]. In addition, we use random frequency sets $I \subset ([-R, R] \cap \mathbb{Z})^d$, which yield $K_I \le 2R$, and we consider these in up to $d = 10\,000$ spatial dimensions.

²All code is available at https://www.math.msu.edu/~markiwen/Code.html
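For concreteness, the even hyperbolic cross in (2.11) can be enumerated directly. The brute-force sketch below restates the definition rather than reproducing the optimized construction used in our experiments, and is only feasible for small $d$ and $R$ since it scans the full cube of even indices.

```python
import numpy as np
from itertools import product

def hyperbolic_cross_even(R, d):
    """Enumerate H^d_{R,even} from (2.11) by brute force: all even frequency
    vectors k with prod_t max(1, |k_t|) <= R. Only for small d and R."""
    one_d = range(-2 * (R // 2), 2 * (R // 2) + 1, 2)   # even integers in [-R, R]
    cross = [k for k in product(one_d, repeat=d)
             if np.prod([max(1, abs(kt)) for kt in k]) <= R]
    return np.array(cross, dtype=int)

# e.g. hyperbolic_cross_even(8, 2) contains (0, 0), (2, 0), (-2, 4), ...
```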
2.3.1 Deterministic multiple rank-1 lattices generated by Algorithm 2.2 suitable for reconstruction and approximation

2.3.1.1 Resulting numbers of samples and oversampling factors

To begin, we determine the overall number of samples in the multiple rank-1 lattices output by Algorithm 2.2. Up to an additive term of $1 - L$, this corresponds to $\sum_{\ell \in [L]} P_\ell$ in Theorem 2.2, since the node $\mathbf 0$ (the origin) is contained in each of the resulting rank-1 lattices $\Lambda(\mathbf z, P_\ell)$.

We start with symmetric hyperbolic cross sets $I = H^d_{R,\mathrm{even}}$ as defined in (2.11) and consider three different types of reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, as input for Algorithm 2.2.

First, we use the rank-1 lattices from [50, Table 6.1], which were generated by the CBC method [38, Algorithm 3.7], as input for Algorithm 2.2. We plot the results in Figure 2.1a for spatial dimensions $d \in \{2, 3, \dots, 9\}$ and various refinements $R \in \mathbb{N}$ of $I = H^d_{R,\mathrm{even}}$. The observed numbers of samples appear to grow slightly worse than linearly with respect to the cardinality of the frequency set $I$. The corresponding theoretical upper bounds according to Theorem 2.2, using (2.8) for $\tilde M$, are also shown as filled markers with dashed lines for spatial dimensions $d \in \{2, 9\}$ in Figure 2.1a. The plotted upper bounds are distinctly larger, and their slopes appear slightly steeper than those observed in the numerical tests.

Second, we consider single reconstructing rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, with
\[
\mathbf z := \bigl(1,\ K_I + 1,\ (K_I + 1)^2,\ \dots,\ (K_I + 1)^{d-1}\bigr)^\top \quad \text{and} \quad M := (K_I + 1)^d = (2R + 1)^d,
\tag{2.12}
\]
where $K_I = 2R$ in our case, and we show the results in Figure 2.1b. We observe that the obtained numbers of samples are similar to the ones in Figure 2.1a, and the theoretical upper bounds according to Theorem 2.2, using (2.8) for $\tilde M$, are slightly higher due to the components of the generating vector $\mathbf z$ being larger.

[Figure 2.1: Overall #samples $= 1 - L + \sum_{\ell \in [L]} P_\ell$ for symmetric hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$, $d \in \{2, \dots, 9\}$; (a) $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7], (b) $\mathbf z$ and $M$ according to (2.12). Filled markers with dashed lines represent theoretical upper bounds from Theorem 2.2 for $d \in \{2, 9\}$ calculated using (2.8).]

Third, we apply Algorithm 2.2 to the reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, as considered in [37, Section 6]. In detail, we choose
\[
M := \prod_{t \in [d]} q_t \quad \text{and} \quad \mathbf z := (M/q_0,\ M/q_1,\ \dots,\ M/q_{d-1})^\top,
\tag{2.13}
\]
where $q_0 := d K_I + d + 1$ and $q_{t+1} := \min\{p \in \mathbb{N} \mid p > q_t \text{ and } p \text{ prime}\}$. Here, the observed numerical results do not differ recognizably from Figure 2.1b, and we therefore omit these plots. We would like to point out that the theoretical upper bounds for this kind of reconstructing single rank-1 lattice are slightly worse than those plotted in Figure 2.1b, cf. Remark 2.2.

Note that when running Algorithm 2.2 using single rank-1 lattices $\Lambda(\mathbf z, M)$ of type (2.12) and (2.13) in practice, one may need to deal with limited numeric precision in the computer arithmetic. For instance, for higher spatial dimensions, some components $z_t$ of the generating vector $\mathbf z$ may become larger than 64-bit integers. This means that the sets $J'_r$ may have to be computed carefully and repeatedly modulo each considered prime $p \in P_{|I|}$ when searching for the primes $P_0, \dots, P_{L-1}$ in Lines 6 to 21 of Algorithm 2.2.

In order to have a closer look at the number of samples, we visualize the oversampling factor #samples$/|I| = (1 - L + \sum_{\ell \in [L]} P_\ell)/|I|$ in Figure 2.2. For the considered test cases and the three different types of lattices, we observe that the oversampling factors stay below $1.7 \log |I| + 3$ for $|I| > 1$. This is distinctly smaller than the theoretical upper bounds in Theorem 2.2 suggest, which have a constant of $\approx 5.7$ and additional logarithmic factors depending on $\tilde M$. For instance, in Figure 2.2a, for $I = H^9_{256,\mathrm{even}}$ (cardinality $|I| = 1\,264\,513$ and #samples $= 27\,025\,383$), the oversampling factor is $\approx 21.37$, whereas the corresponding upper bound for the oversampling factor is $\approx 3\,069$ according to Theorem 2.2 using (2.8) for $\tilde M$. The plots for reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, according to (2.13) look similar to the ones according to (2.12), where the latter are shown in Figure 2.2b. Moreover, we only observe a relatively small difference compared to Figure 2.2a.

[Figure 2.2: Oversampling factors for deterministic reconstructing multiple rank-1 lattices for symmetric hyperbolic cross index sets $H^d_{R,\mathrm{even}}$, $d \in \{2, \dots, 9\}$, with reference curve $1.7\log|I| + 3$; (a) $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7], (b) $\mathbf z$ and $M$ according to (2.12).]

Next, we change the setting and use frequency sets $I$ drawn uniformly at random from the cubes $[-R, R]^d \cap \mathbb{Z}^d$. We generate reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, using [38, Algorithm 3.7]. Then, we apply Algorithm 2.2 in order to deterministically generate reconstructing multiple rank-1 lattices. We repeat the test 10 times for each setting with newly randomly chosen frequency sets $I$ and determine the maximum number of samples over the 10 repetitions.
For frequency set sizes $|I| \in \{10, 100, 1\,000, 10\,000\}$ in $d \in \{2, 3, 4, 6, 10, 100, 1\,000, 10\,000\}$ spatial dimensions, and $|I| = 100\,000$ for only some of the aforementioned spatial dimensions $d$, we visualize the resulting oversampling factors in Figure 2.3 for expansion parameter $R = 64$ ($K_I \le 128$). Using different reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, as in Figure 2.2, changes the oversampling factors only slightly, and the oversampling factors remain well below $1.7 \log |I| + 3$; compare Figures 2.3a and 2.3b. The plots for reconstructing single rank-1 lattices $\Lambda(\mathbf z, M)$ according to (2.13) are omitted since they look very similar to Figure 2.3b. As mentioned before, we have to take care of possible issues with numeric precision when running Algorithm 2.2 on reconstructing single rank-1 lattices of type (2.12) and (2.13) in practice.

[Figure 2.3: Oversampling factors for deterministic reconstructing multiple rank-1 lattices for random frequency sets $I \subset \{-64, -63, \dots, 64\}^d$, $d$ up to $10\,000$, with reference curve $1.7\log|I| + 3$; (a) $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7], (b) $\mathbf z$ and $M$ according to (2.12).]

2.3.1.2 Improvement of numbers of samples compared to single rank-1 lattices constructed component-by-component

For the deterministic reconstructing multiple rank-1 lattices generated by Algorithm 2.2 in the previous subsection, one aspect of particular interest is the total number of nodes compared to the reconstructing single rank-1 lattices given as input to the algorithm. We investigate this in more detail for the case of lattices generated component-by-component by [38, Algorithm 3.7]. These reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, are specifically tailored to the structure of the corresponding frequency sets $I$. We do not consider the case when Algorithm 2.2 is applied to single rank-1 lattices of type (2.12) or (2.13), as these are typically extremely large compared to the cardinality $|I|$ of the frequency sets $I$.

First, we start with symmetric hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$ and reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, generated by [38, Algorithm 3.7]. In Figure 2.4a, the obtained #samples from Figure 2.1a is divided by the size $M$ of the single rank-1 lattice. We observe that for smaller expansion parameters $R$, and consequently smaller cardinalities $|I|$, the generated multiple rank-1 lattices still consist of more nodes than the corresponding single rank-1 lattices, and therefore the ratio is larger than one. One main reason for this behavior is that for the component-by-component constructed single rank-1 lattices, the number of nodes is initially much smaller than the worst-case upper bounds of almost $O(|I|^2)$ suggest, cf. [38, Section 3.8.2] for a detailed discussion. Once a certain expansion $K_I$ and cardinality $|I|$ have been reached, the multiple rank-1 lattices outperform the single rank-1 lattices, yielding ratios around 0.1 in Figure 2.4a, i.e., Algorithm 2.2 reduces the number of sampling nodes by 9/10.

Second, we consider randomly generated frequency sets as in Figure 2.3a. In Figure 2.4b, we visualize the ratios of the number of nodes of the deterministic reconstructing multiple rank-1 lattices generated by Algorithm 2.2 over the lattice sizes $M$ of the reconstructing single rank-1 lattices generated by [38, Algorithm 3.7].
For the spatial dimensions $d \ge 4$ considered in Figure 2.3a, the ratios decrease rapidly with increasing cardinality $|I|$, and we do not observe any noticeable dependence on the spatial dimension $d$. Note that in the case $d = 2$, the ratios are close to or above one, since the cube $\{-64, -63, \dots, 64\}^2$ of possible frequencies only has cardinality $16\,641$ and the single rank-1 lattices already have small oversampling factors $M/|I| < 16$. Similarly, in the case $d = 3$ with cardinality $|I| = 10^5$, the frequency set $I$ fills approximately $1/20$ of the cube $\{-64, -63, \dots, 64\}^3$, and again the low oversampling factors $M/|I| < 22$ of the single rank-1 lattices are hard to beat for multiple rank-1 lattices.

[Figure 2.4: Ratio of #samples for deterministic reconstructing multiple rank-1 lattices suitable for approximation over the lattice size $M$ of the reconstructing single rank-1 lattice $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7]; (a) symmetric hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$, $d \in \{2, \dots, 9\}$, (b) random frequency sets $I \subset \{-64, -63, \dots, 64\}^d$, $d$ up to $10\,000$.]

2.3.2 Comparison of reconstructing multiple and single rank-1 lattices for function approximation

As mentioned in Corollary 2.1, we can use Algorithm 2.1 to compute approximations of functions from samples along multiple rank-1 lattices. We consider the tensor-product test functions $G_3^d : \mathbb{T}^d \to \mathbb{C}$ from [50], $G_3^d(\mathbf x) := \prod_{j \in [d]} g_3(x_j)$, where the one-dimensional function $g_3 : \mathbb{T} \to \mathbb{C}$ is defined by
\[
g_3(x) := \sqrt[4]{\frac{3\pi}{207\pi - 256}}\, \Bigl(2 + \operatorname{sgn}\bigl((x \bmod 1) - 1/2\bigr)\, \sin^3(2\pi x)\Bigr)
\]
and $\|G_3^d\|_{L^2(\mathbb{T}^d)} = 1$. The function $G_3^d$ lies in a so-called Sobolev space of dominating mixed smoothness with smoothness almost 3.5, so that its Fourier coefficients $\hat G_3^d$ decay quickly with respect to hyperbolic cross structures. In addition, $(\hat G_3^d)_{\mathbf k} = 0$ if at least one component of $\mathbf k$ is odd. Therefore, we approximate the function $G_3^d$ by multivariate trigonometric polynomials $G_3^{d,L} := \sum_{\mathbf k \in I} (\hat G_3^{d,L})_{\mathbf k}\, e^{2\pi i \mathbf k \cdot \circ}$ with Fourier coefficients supported on modified hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$ as defined in (2.11).

We compute the Fourier coefficients $\hat G_3^{d,L}$ based on samples of $G_3^d$ and determine the relative $L^2(\mathbb{T}^d)$ sampling errors $\|G_3^d - G_3^{d,L}\|_{L^2(\mathbb{T}^d)} / \|G_3^d\|_{L^2(\mathbb{T}^d)}$, where
\[
\|G_3^d - G_3^{d,L}\|_{L^2(\mathbb{T}^d)} = \sqrt{\|G_3^d\|^2_{L^2(\mathbb{T}^d)} - \sum_{\mathbf k \in I} \bigl|(\hat G_3^d)_{\mathbf k}\bigr|^2 + \sum_{\mathbf k \in I} \bigl|(\hat G_3^d)_{\mathbf k} - (\hat G_3^{d,L})_{\mathbf k}\bigr|^2}.
\]
We compare the numerical results from [44, Figure 4.3b], where reconstructing single rank-1 lattices and reconstructing random multiple rank-1 lattices were used, with new results using deterministic multiple rank-1 lattices returned by Algorithm 2.2. As input for Algorithm 2.2, we use reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, with generating vectors chosen according to (2.12).

Instead of computing the Fourier coefficients $\hat G_3^{d,L}$ of the multivariate trigonometric polynomial $G_3^{d,L}$ by Algorithm 2.1, we use [43, Algorithm 2], which averages over all single rank-1 lattices $\Lambda(\mathbf z, P_\ell)$ that are able to reconstruct a Fourier coefficient $\hat g_{\mathbf k}$ of any multivariate trigonometric polynomial $g$ for a given frequency $\mathbf k \in I$, whereas Algorithm 2.1 uses only the single rank-1 lattice $\Lambda(\mathbf z, P_{\nu(\mathbf k)})$. Note that both computation methods are based on the same samples of $G_3^d$ along the obtained deterministic multiple rank-1 lattices.
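For reference, a short sketch of the test function follows. Note that the closed form of $g_3$ is a reconstruction of a garbled source formula (a sign-modulated $\sin^3$ term with the fourth-root normalization displayed above), so the constant should be treated as an assumption rather than verified against the implementation in [50].

```python
import numpy as np

def g3(x):
    """One-dimensional factor of the tensor-product test function G_3^d,
    following the reconstructed formula above (normalized so ||G_3^d||_{L^2} = 1)."""
    c = (3 * np.pi / (207 * np.pi - 256)) ** 0.25
    x = np.mod(x, 1.0)
    return c * (2 + np.sign(x - 0.5) * np.sin(2 * np.pi * x) ** 3)

def G3(X):
    # tensor product over the d columns of a sample matrix X in [0, 1)^(n x d)
    return np.prod(g3(X), axis=1)
```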
The resulting relative $L^2(\mathbb{T}^d)$ sampling errors are visualized for spatial dimensions $d \in \{2, 3, 5, 8\}$ in Figure 2.5 as solid lines with filled markers. We observe that the errors decrease rapidly for increasing expansion parameters $R$ of the hyperbolic cross $I = H^d_{R,\mathrm{even}}$ and correspondingly increasing numbers of samples. In addition, we consider reconstructing single rank-1 lattices generated by [38, Algorithm 3.7] as input for Algorithm 2.2 and obtain results which are very close, and we therefore omit their plots. Moreover, the relative errors from [44, Figure 4.3b] when using reconstructing random multiple rank-1 lattices are shown in Figure 2.5 as dotted lines with filled markers. We observe that the obtained numbers of samples and errors are similar to the deterministic ones. The results for the deterministic multiple rank-1 lattices appear slightly better for $d \in \{3, 5, 8\}$. In addition, the relative errors from [44, Figure 4.3b] when directly sampling along reconstructing single rank-1 lattices are drawn as dashed lines with unfilled markers. It has already been observed in [44] that for smaller expansion parameters $R$, and consequently smaller numbers of samples, the single rank-1 lattices perform better until a certain expansion parameter $R$ has been reached; afterwards, the multiple rank-1 lattices clearly outperform the single ones.

[Figure 2.5: Relative $L^2(\mathbb{T}^d)$ sampling errors for $G_3^d$, $d \in \{2, 3, 5, 8\}$, with respect to the number of samples for reconstructing single rank-1 lattices (dashed lines, unfilled markers), reconstructing random multiple rank-1 lattices (dotted lines, filled markers), and reconstructing deterministic multiple rank-1 lattices (solid lines, filled markers), using the frequency index sets $I := H^d_{R,\mathrm{even}}$. Results for single rank-1 lattices from [65, Figure 2.14] and for reconstructing random multiple rank-1 lattices from [44, Figure 4.3].]

2.3.3 Deterministic multiple rank-1 lattices with decreasing lattice size for reconstruction of trigonometric polynomials

Besides generating deterministic multiple rank-1 lattices according to Theorem 2.2 and Algorithm 2.2, we have also discussed the alternate approach of Theorem 2.3, where the theoretical results for function approximation, as mentioned in Corollary 2.1, cannot be applied directly, but the number of samples required for the reconstruction of multivariate trigonometric polynomials may be distinctly smaller.

We start with symmetric hyperbolic cross type index sets $I = H^d_{R,\mathrm{even}}$ and apply the generation strategy of Theorem 2.3 to reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, generated by [38, Algorithm 3.7]. We visualize the resulting oversampling factors #samples$/|I| = (1 - L + \sum_{\ell \in [L]} P_\ell)/|I|$ in Figure 2.6a for spatial dimensions $d \in \{2, 3, \dots, 9\}$ and various expansion parameters $R$. For the considered test cases, we observe that the oversampling factors are well below 3. When starting with single rank-1 lattices according to (2.12), the observed oversampling factors differ only slightly, cf. Figure 2.6b.
[Figure 2.6: Oversampling factors for deterministic reconstructing multiple rank-1 lattices constructed according to Theorem 2.3; (a) $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7] for symmetric hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$, (b) $\mathbf z$ and $M$ according to (2.12) for symmetric hyperbolic cross index sets $I = H^d_{R,\mathrm{even}}$, (c) $\Lambda(\mathbf z, M)$ generated by [38, Algorithm 3.7] for random frequency sets $I \subset \{-64, -63, \dots, 64\}^d$, (d) $\mathbf z$ and $M$ according to (2.12) for random frequency sets $I \subset \{-64, -63, \dots, 64\}^d$.]

The reason for these very low oversampling factors is that during the generation process according to the proof of Theorem 2.3, the prime $P_0$ is relatively close to $|I|$, the next prime $P_1$ is relatively close to $|I \setminus I_0|$, $P_2$ is relatively close to $|I \setminus (I_0 \cup I_1)|$, and so on, where $I_0$ contains the frequencies of $I$ which can be reconstructed by the lattice $\Lambda(\mathbf z, P_0)$ and $I_1$ contains the frequencies of $I \setminus I_0$ which can be reconstructed by $\Lambda(\mathbf z, P_1)$. In particular, we do not have the fixed lower bound $|I| \le P_\ell$ for all $\ell$ as in Algorithm 2.2.

Next, we change the setting and use frequency sets $I$ drawn uniformly at random from the cubes $[-R, R]^d \cap \mathbb{Z}^d$, see Section 2.3.1. As before, we generate reconstructing single rank-1 lattices for $I$, $\Lambda(\mathbf z, M)$, using [38, Algorithm 3.7]. Then, we apply the strategy of Theorem 2.3 in order to deterministically generate reconstructing multiple rank-1 lattices. We repeat the test 10 times for each setting with newly randomly chosen frequency sets $I$ and determine the maximum number of samples over the 10 repetitions. For cardinalities $|I| \in \{10, 100, 1\,000, 10\,000\}$ in $d \in \{2, 3, 4, 6, 10, 100, 1\,000, 10\,000\}$ spatial dimensions, we visualize the resulting oversampling factors in Figure 2.6c for expansion parameter $R = 64$ ($K_I \le 128$). Starting with reconstructing single rank-1 lattices $\Lambda(\mathbf z, M)$ according to (2.12), as in Figure 2.3b, changes the oversampling factors only slightly, and the oversampling factors remain well below 4, cf. Figure 2.6d.

CHAPTER 3
HIGH-DIMENSIONAL SPARSE FOURIER TRANSFORMS

As discussed in Section 1.1.2, this chapter focuses on efficient sparse Fourier transforms (SFTs) for high-dimensional functions. We begin with a review of the prior work against which we compare our techniques and provide a more in-depth discussion of the methods in Section 3.1. Section 3.2 reviews and further refines the univariate SFTs from [37, 53] which we will use in our multivariate techniques. Section 3.3 presents our main multivariate approximation algorithms and their analysis. Finally, we implement these two algorithms numerically and present the empirical results in Section 3.4.

3.1 Overview of results and prior work

Much recent work has considered the problem of quickly recovering both exactly sparse multivariate trigonometric polynomials and approximating more general functions by sparse trigonometric polynomials using dimension-incremental approaches [65, 59, 17, 16]. These methods recover multivariate frequencies adaptively by searching lower-dimensional projections of $I \subset \bigl[-\frac K2, \frac K2\bigr]^d \cap \mathbb{Z}^d$ for energetic frequencies.
These lower-dimensional candidate sets are then paired together to build up a fully $d$-dimensional search space smaller than the original one, which is expected to support the most energetic frequencies (see, e.g., [42, Section 3] and the references within for a general overview).

In the context of Fourier methods, lattice-based techniques do a good job of support identification on the intermediary, lower-dimensional candidate sets, and especially recently, techniques based on multiple rank-1 lattices have shown success [43, 42] (see also Chapter 2). Though the total complexity of each of these steps is manageable and can be kept linear in the sparsity $s$ of the Fourier series to be computed, these steps must in general be repeated to ensure that no potential frequencies have been left out. In particular, this results in at least $O(d s^2 K)$ operations (up to logarithmic factors) for functions supported on arbitrary frequency sets in order to obtain approximations that are guaranteed to be accurate with high probability. Though from an implementational perspective this runtime can be mitigated by completing many of the repetitions and initial one-dimensional searches in parallel, once pairing begins, the results of previous iterations must be synchronized and communicated to future steps, necessitating serial interruptions.

Other earlier works include [37], in which previously existing univariate SFT results [36, 62] are refined and adapted to the multivariate setting. Though the resulting dependence on the dimension is well above that of the dimension-incremental approaches, deterministic guarantees are given for multivariate Fourier approximation in $O(d^4 s^2)$ (up to logarithmic factors) time and memory, as well as a random variant which drops to linear scaling in $s$, leading to a runtime on the order of $O(d^4 s)$ with respect to $s$ and $d$. Additionally, the compressed sensing type guarantees in terms of the Fourier compressibility of the function under consideration carry over from the univariate SFT analysis. The scheme essentially makes use of a reconstructing rank-1 lattice on a superset of the full integer cube $I = \bigl[-\frac{dK}2, \frac{dK}2\bigr]^d \cap \mathbb{Z}^d$ with certain number-theoretic properties that allow for fast inversion of the resulting one-dimensional coefficients by the Chinese Remainder Theorem. We note that this necessarily inflated frequency domain accounts for the suboptimal scaling in $d$ above.

In [54], another fully deterministic sampling strategy and reconstruction algorithm is given. Like [37], though, the method can only be applied to Fourier approximations over an ambient frequency space $I$ which is a full $d$-dimensional cube. Moreover, the vector space structure exploited to construct the sampling sets necessitates that the side length $K$ of this cube is a power of a prime. However, the benefits of this construction are among the best considered so far: the method is entirely deterministic, has noise-robust recovery guarantees in terms of best $s$-term estimates, the sampling sets used are on the order of $O(d^3 s^2 K)$, and the reconstruction algorithm's runtime complexity is on the order of $O(d^3 s^2 K^2)$, both up to logarithmic factors. On the other hand, this algorithm still does not scale linearly in $s$.

Finally, we discuss [15, 14], a pair of papers detailing high-dimensional Fourier recovery algorithms which offer a simplified (and therefore faster) alternative to lattice transforms and dimension-incremental methods.
These algorithms make heavy use of a one-dimensional SFT [51, 18] based on a phase modulation approach to discover energetic frequencies in a fashion similar to our Algorithm 3.1 below. The main idea is to recover the entries of multivariate frequencies by using equispaced evaluations of the function along a coordinate axis as well as samples of the function at the same points slightly shifted (the remaining dimensions are generally ignored). This shift in space produces a modulation in frequency from which frequency data can be recovered (cf. (3.6) and Algorithm 3.1 below). By supplementing this approach with simple reconstructing rank-1 lattice analysis for repetitions of the full integer cube, the runtime and number of samples are given on average as $O(ds)$ up to logarithmic factors.

However, due to the possibility of collisions of multivariate frequencies under the hashing algorithms employed, these results hold only for random signal models. In particular, theoretical results are only stated for functions with randomly generated Fourier coefficients on the unit circle with randomly chosen frequencies from a given frequency set. Additionally, the analysis of these techniques assumes that the algorithm applied to the randomly generated signal does not encounter certain low-probability (with respect to the random signal model considered therein) energetic frequency configurations. Furthermore, the method is restricted in stability, allowing spatial shifts in sampling bounded by at most the reciprocal of the side length of the multivariate frequency cube under consideration, and only exact recovery is considered (or recovery up to factors related to sample corruption by Gaussian noise in [14]). In addition, no results are proven concerning the approximation of more general periodic functions, e.g., compressible functions.

3.1.1 Main contributions

We begin with a brief summary of the benefits provided by our approach in comparison to the methods discussed above. Below, we ignore logarithmic factors in our summary of the runtime/sampling complexities.

• All variants, deterministic and random, of both algorithms presented in this paper have runtime and sampling complexities linear in $d$ with best $s$-term estimates for arbitrary signals. This is in contrast to the complexities of the dimension-incremental approaches [16, 17, 43, 42] and the number-theoretic approaches [37, 54], while still achieving similarly strong best $s$-term guarantees.

• Both algorithms proposed herein have randomized variants with runtime and sampling complexities linear in $s$ with best $s$-term estimates on arbitrary signals that hold with high probability. Thus, the randomized methods proposed in this paper achieve the efficient runtime complexities of [15, 14] while simultaneously exhibiting best $s$-term approximation guarantees for general periodic functions, thereby improving on the non-deterministic dimension-incremental approaches [16, 17, 43, 42].

• Both algorithms proposed herein have a deterministic variant with runtime and sampling complexities quadratic in $s$ with best $s$-term estimates on arbitrary signals that also hold deterministically. This is in contrast to all previously discussed methods without deterministic guarantees, [16, 17, 43, 42, 14, 15], as well as improving on the prior deterministic results [37, 54] for functions whose energetic frequency support sets $I$ are smaller than the full cube.
Overview of the methods and related theory

We will build on the fast and potentially deterministic one-dimensional SFT from [37] and its discrete variant from [53] by applying those techniques along rank-1 lattices. As previously discussed, the primary difficulty in doing so is matching energetic one-dimensional Fourier coefficients with their $d$-dimensional counterparts. We are especially interested in doing this in an efficient and provably accurate way. We propose and analyze two different methods for solving this problem herein.

The first frequency identification approach, Algorithm 3.1, involves modifications of the phase shifting technique from [51, 18, 15, 14]. We make use of the translation-to-modulation property of the Fourier transform (cf. (3.6) below) observed in these works to extract frequency data. Combining this with SFTs on rank-1 lattices gives a new class of fast methods with several benefits. Notably, we are able to maintain error guarantees for any function (not just random signals) in terms of best Fourier $s$-term approximations. Additionally, we factor the instability and potential for collisions from [15, 14] into these best $s$-term approximations. The only downside in our estimates is an additional linear factor of $K$ multiplying the terms commonly seen in standard error bounds (cf. Corollaries 3.1 and 3.2). However, we are able to maintain deterministic results with runtime and sampling complexities that are quadratic in $s$, as well as results for random variants with complexities that are linear in $s$. Additionally, the dependence on the dimension $d$ is reduced from $O(d^4)$ in [37] to only $O(d)$.

Our second technique, Algorithm 3.2, uses a different approach to applying SFTs to modifications of the multivariate function along a reconstructing rank-1 lattice. Effectively, we reduce $g$ to a two-dimensional function. This is done by mapping all but one dimension, say $\ell$, down to one using a rank-1 lattice, and leaving the $\ell$th dimension free. From here, we take a two-dimensional DFT (taking care to use SFTs where possible). The locations of Fourier coefficients in this two-dimensional DFT can then be used to determine the $\ell$th coordinate of the frequency data. This is then repeated for each dimension $\ell \in [d]$.

This process is slower but more stable than Algorithm 3.1. In particular, it produces more accurate best Fourier $s$-term approximation guarantees without the extraneous factor of $K$ (cf. Corollaries 3.3 and 3.4). The deterministic results still have a complexity quadratic in $s$, with random extensions that are linear in $s$. However, we incur an extra quadratic factor of $K$ in the complexity bounds (cf. Lemma 3.4).

We stress here that by compartmentalizing the translation from multivariate analysis to univariate analysis into the theory of rank-1 lattices, our techniques are suitable for any frequency set of interest $I$. The only constraint is the necessity of a reconstructing rank-1 lattice for $I$ (and potentially projections of $I$ in the case of Algorithm 3.2). This flexibility improves the results from [37], primarily with respect to the polynomial factor of $d$ in our runtime and sampling complexities. We remark that though the existence of the necessary reconstructing rank-1 lattice is a nontrivial requirement, there exist efficient construction algorithms for arbitrary frequency sets via deterministic component-by-component methods, see, e.g., [39, 46, 56].
In terms of implementation, we note that the multivariate techniques we employ are entirely modular with respect to the univariate SFT used. As such, the complexity estimates and error bounds for our approaches in Section 3.3 are directly derived from the chosen SFT. Finally, the methods we present are trivially parallelizable, so that in particular a large majority of the univariate SFTs in Algorithm 3.1 or Algorithm 3.2 can occur in parallel.

3.2 One-dimensional sparse Fourier transform results

Below, we summarize some of the previous work on one-dimensional sparse Fourier transforms which will be used in our multivariate algorithms. Rather than focus on the inner workings of these SFTs, we highlight five main properties concerning their recovery guarantees and computational complexity. This compartmentalization allows any SFT satisfying these properties to be easily extended for multivariate Fourier recovery simply by plugging into Algorithms 3.1 and 3.2.

We first review the sublinear-time algorithm from [37] which uses fewer than $M$ nonequispaced samples of a function to compute Fourier coefficients in $B_M$. We refer the reader interested in its implementation and mathematical explanation to [37] as well as [36, 62]. Below, we will use slightly improved error bounds over those in its original presentation. The proof of these improvements necessitates the following lemma.

Lemma 3.1. For $\mathbf x \in \mathbb{C}^K$ and $S_\tau := \{k \in [K] \mid |x_k| \ge \tau\}$, if $\tau \ge \frac{\|\mathbf x - \mathbf x^{\mathrm{opt}}_s\|_1}{s}$, then $|S_\tau| \le 2s$ and
\[
\|\mathbf x - \mathbf x|_{S_\tau}\|_2 \le \|\mathbf x - \mathbf x^{\mathrm{opt}}_{2s}\|_2 + \tau\sqrt{2s}, \qquad
\|\mathbf x - \mathbf x|_{S_\tau}\|_1 \le \|\mathbf x - \mathbf x^{\mathrm{opt}}_{2s}\|_1 + \tau \cdot 2s.
\]

Proof. Ordering the entries of $\mathbf x$ in descending magnitude (with ties broken arbitrarily) as $|x_{k_1}| \ge |x_{k_2}| \ge \dots$, we first note that
\[
\|\mathbf x - \mathbf x^{\mathrm{opt}}_s\|_1 \ge \sum_{j=s+1}^{2s} |x_{k_j}| \ge s\,|x_{k_{2s}}|.
\]
By assumption then, $\tau \ge |x_{k_{2s}}|$, and since $S_\tau$ contains the $|S_\tau|$-many largest entries of $\mathbf x$, we must have $S_\tau \subset \operatorname{supp}(\mathbf x^{\mathrm{opt}}_{2s})$. Note then that $|S_\tau| \le 2s$. Finally, we calculate
\[
\|\mathbf x - \mathbf x|_{S_\tau}\|_2 \le \|\mathbf x - \mathbf x^{\mathrm{opt}}_{2s}\|_2 + \|\mathbf x^{\mathrm{opt}}_{2s} - \mathbf x|_{S_\tau}\|_2
\le \|\mathbf x - \mathbf x^{\mathrm{opt}}_{2s}\|_2 + \sqrt{\sum_{k \in \operatorname{supp}(\mathbf x^{\mathrm{opt}}_{2s}) \setminus S_\tau} |x_k|^2}
\le \|\mathbf x - \mathbf x^{\mathrm{opt}}_{2s}\|_2 + \tau\sqrt{2s}.
\]
The $\ell^1$ estimate is proved by the same procedure, where the $2s$-many terms are bounded by $\tau$ in the last line without a square root.
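The thresholding behavior in Lemma 3.1 is easy to verify numerically; the following short sketch checks the hypothesis and the cardinality conclusion on an arbitrary coefficient vector (the function name is illustrative only).

```python
import numpy as np

def significant_set(x, s, tau):
    """Illustrate Lemma 3.1: if tau >= ||x - x_s^opt||_1 / s, then the set
    S_tau of entries with |x_k| >= tau has at most 2s elements."""
    mags = np.sort(np.abs(x))[::-1]
    tail_s = mags[s:].sum()                 # ||x - x_s^opt||_1, the best s-term tail
    assert tau >= tail_s / s, "threshold violates the lemma's hypothesis"
    S_tau = np.flatnonzero(np.abs(x) >= tau)
    assert len(S_tau) <= 2 * s              # conclusion of the lemma
    return S_tau
```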
Theorem 3.1 (Robust sublinear-time, nonequispaced SFT: [37], Theorem 7 / [53], Lemma 4). For a signal $g^{1d} \in W(\mathbb{T}) \cap C(\mathbb{T})$ corrupted by some arbitrary noise $\mu : \mathbb{T} \to \mathbb{C}$, Algorithm 3 of [37], denoted $A^{\mathrm{sub}}_{2s,M}$, will output a $2s$-sparse coefficient vector $\hat{\mathbf g}^{1d,s} \in \mathbb{C}^{B_M}$ which

1. reconstructs every frequency $\omega \in B_M$ of $\hat g^{1d}|_M \in \mathbb{C}^{B_M}$ whose corresponding Fourier coefficient meets the tolerance
\[
|\hat g^{1d}_\omega| > (4 + 2\sqrt2) \left(\frac{\bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_s\bigr\|_1}{s} + \bigl\|\hat g^{1d} - \hat g^{1d}|_M\bigr\|_1 + \|\mu\|_\infty\right),
\]
2. satisfies the $\ell^\infty$ error estimate for recovered coefficients
\[
\bigl\|(\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s})|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_\infty \le \sqrt2 \left(\frac{\bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_s\bigr\|_1}{s} + \bigl\|\hat g^{1d} - \hat g^{1d}|_M\bigr\|_1 + \|\mu\|_\infty\right),
\]
3. satisfies the $\ell^2$ error estimate
\[
\bigl\|\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s}\bigr\|_2 \le \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_2 + \frac{(8\sqrt2 + 6)\,\bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_s\bigr\|_1}{\sqrt s} + (8\sqrt2 + 6)\sqrt s \,\bigl(\bigl\|\hat g^{1d} - \hat g^{1d}|_M\bigr\|_1 + \|\mu\|_\infty\bigr),
\]
4. satisfies the $\ell^1$ error estimate
\[
\bigl\|\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s}\bigr\|_1 \le \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_1 + (6\sqrt2 + 16)\,\bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_s\bigr\|_1 + (6\sqrt2 + 16)\,s\,\bigl(\bigl\|\hat g^{1d} - \hat g^{1d}|_M\bigr\|_1 + \|\mu\|_\infty\bigr),
\]
5. and requires a number of samples of $g^{1d}$ and an operation count for $A^{\mathrm{sub}}_{2s,M}$ of
\[
O\!\left(\frac{s^2 \log^4 M}{\log s}\right).
\]

The Monte Carlo variant of $A^{\mathrm{sub}}_{2s,M}$, denoted $A^{\mathrm{sub,MC}}_{2s,M}$, referred to by Corollary 4 of [37], satisfies all of the conditions (1)–(4) simultaneously with probability $(1 - \sigma) \in [2/3, 1)$ and has number of required samples and operation count
\[
O\!\left(s \log^3(M) \log\frac{M}{\sigma}\right).
\]
The samples required by $A^{\mathrm{sub,MC}}_{2s,M}$ are a subset of those required by $A^{\mathrm{sub}}_{2s,M}$.

Proof. We refer to [37, Theorem 7] and its modification for noise robustness in [53, Lemma 4] for the proofs of properties (2) and (5). As for (1), [37, Lemma 6] and its modification in [53, Lemma 4] imply that any $\omega \in B_M$ with
\[
|\hat g^{1d}_\omega| > 4\Bigl(\bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_s\bigr\|_1/s + \bigl\|\hat g^{1d} - \hat g^{1d}|_M\bigr\|_1 + \|\mu\|_\infty\Bigr) =: 4\delta
\]
will be identified in [37, Algorithm 3]. An approximate Fourier coefficient for these and any other recovered frequencies is stored in the vector $\mathbf x$, which satisfies the same estimate as in property (2) by the proof of [37, Theorem 7] and [53, Lemma 4]. However, only the $2s$ largest magnitude values of $\mathbf x$ will be returned in $\hat{\mathbf g}^{1d,s}$. We therefore analyze what happens when some of the potentially large Fourier coefficients corresponding to frequencies in $S_{4\delta}$ do not have their approximations assigned to $\hat{\mathbf g}^{1d,s}$.

Using the definition of $S_\tau$ given in Lemma 3.1 applied to $\hat g^{1d}|_M$, we must have $|S_{4\delta}| \le 2s = |\operatorname{supp}(\hat{\mathbf g}^{1d,s})|$. If $\omega \in S_{4\delta} \setminus \operatorname{supp}(\hat{\mathbf g}^{1d,s})$, there must then exist some other $\omega' \in \operatorname{supp}(\hat{\mathbf g}^{1d,s}) \setminus S_{4\delta}$ which was identified and took the place of $\omega$ in $\operatorname{supp}(\hat{\mathbf g}^{1d,s})$. For this to happen, $|\hat g^{1d}_{\omega'}| \le 4\delta$ and $|x_{\omega'}| \ge |x_\omega|$. But by property (2) (extended to all coefficients in $\mathbf x$), we know
\[
4\delta + \sqrt2\delta \ge |\hat g^{1d}_{\omega'}| + \sqrt2\delta \ge |x_{\omega'}| \ge |x_\omega| \ge |\hat g^{1d}_\omega| - \sqrt2\delta.
\]
Thus, any frequency in $S_{4\delta}$ not chosen satisfies $|\hat g^{1d}_\omega| \le (4 + 2\sqrt2)\delta$, and so every frequency in $S_{(4+2\sqrt2)\delta}$ is in fact identified in $\hat{\mathbf g}^{1d,s}$, verifying property (1).

As for property (3), we estimate the $\ell^2$ error using property (2), Lemma 3.1, and the above argument as
\[
\begin{aligned}
\bigl\|\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s}\bigr\|_2
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_2 + \bigl\|(\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s})|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_2 \\
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{S_{4\delta} \cap \operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_2 + \sqrt2\,\delta\sqrt{2s} \\
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{S_{4\delta}}\bigr\|_2 + \bigl\|\hat g^{1d}|_{S_{4\delta} \setminus \operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_2 + 2\delta\sqrt s \\
&\le \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_2 + 4\delta\sqrt{2s} + (4 + 2\sqrt2)\delta\sqrt{2s} + 2\delta\sqrt s \\
&= \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_2 + (8\sqrt2 + 6)\sqrt s\,\delta.
\end{aligned}
\]
The proof of property (4) is very similar. We estimate the $\ell^1$ error using the same techniques as
\[
\begin{aligned}
\bigl\|\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s}\bigr\|_1
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_1 + \bigl\|(\hat g^{1d}|_M - \hat{\mathbf g}^{1d,s})|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_1 \\
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{S_{4\delta} \cap \operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_1 + \sqrt2\,\delta \cdot 2s \\
&\le \bigl\|\hat g^{1d}|_M - \hat g^{1d}|_{S_{4\delta}}\bigr\|_1 + \bigl\|\hat g^{1d}|_{S_{4\delta} \setminus \operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_1 + 2\sqrt2\,\delta s \\
&\le \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_1 + 4\delta \cdot 2s + (4 + 2\sqrt2)\delta \cdot 2s + 2\sqrt2\,\delta s \\
&= \bigl\|\hat g^{1d}|_M - (\hat g^{1d}|_M)^{\mathrm{opt}}_{2s}\bigr\|_1 + (6\sqrt2 + 16)\,s\delta.
\end{aligned}
\]

Remark 3.1. In the noiseless case, if the univariate function $g^{1d}$ is Fourier $s$-sparse, i.e., is a trigonometric polynomial, and $M$ is large enough that $\operatorname{supp}(\hat g^{1d}) \subset B_M$, both $A^{\mathrm{sub}}_{2s,M}$ and $A^{\mathrm{sub,MC}}_{2s,M}$ will exactly recover $\hat g^{1d}|_M$ (the latter with probability $1 - \sigma$), and therefore $\hat g^{1d}$. In particular, note that the output of either algorithm will then actually be $s$-sparse.

Using the above SFT algorithm with the discretization process outlined in [53] leads to a fully discrete sparse Fourier transform requiring only equispaced samples of $g^{1d}$, denoted $\mathbf g^{1d}$.
However, rather than separately accounting for the truncation to the frequency band $B_M$ as above, the equispaced samples allow us to take advantage of aliasing, which is particularly important when we apply the algorithm along reconstructing rank-1 lattices. Thus, instead of approximating $\hat g^{1d}|_M \in \mathbb{C}^{B_M}$, we prefer to approximate the discrete Fourier transform of $\mathbf g^{1d}$ given by $F_M \mathbf g^{1d}$.

Eventually, we will consider techniques for the approximation of arbitrary periodic functions rather than simply polynomials. For this reason, we require noise-robust recovery results for the method in [53]. The necessary modifications to account for this robustness, as well as the improved guarantees carried over from the previous algorithm, are given below. The upshot is that we are able to state five properties of this SFT analogous to those in Theorem 3.1, which allow for modular proofs of the multivariate results later on.

Theorem 3.2 (Robust discrete sublinear-time SFT: see [53], Theorem 5). For a signal $g^{1d} \in W(\mathbb{T}) \cap C(\mathbb{T})$ corrupted by some arbitrary noise $\mu : \mathbb{T} \to \mathbb{C}$, and $1 \le r \le \frac M{36}$, Algorithm 1 of [53], denoted $A^{\mathrm{disc}}_{2s,M}$, will output a $2s$-sparse coefficient vector $\hat{\mathbf g}^{1d,s} \in \mathbb{C}^{B_M}$ which

1. reconstructs every frequency $\omega \in B_M$ of $F_M \mathbf g^{1d} \in \mathbb{C}^{B_M}$ whose corresponding aliased Fourier coefficient meets the tolerance
\[
|(F_M \mathbf g^{1d})_\omega| > 12(1 + \sqrt2)\left(\frac{\bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_s\bigr\|_1}{2s} + 2\bigl(\|\mathbf g^{1d}\|_\infty M^{-r} + \|\boldsymbol\mu\|_\infty\bigr)\right),
\]
2. satisfies the $\ell^\infty$ error estimate for recovered coefficients
\[
\bigl\|(F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s})|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_\infty \le 3\sqrt2\left(\frac{\bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_s\bigr\|_1}{2s} + 2\bigl(\|\mathbf g^{1d}\|_\infty M^{-r} + \|\boldsymbol\mu\|_\infty\bigr)\right),
\]
3. satisfies the $\ell^2$ error estimate
\[
\bigl\|F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s}\bigr\|_2 \le \bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_{2s}\bigr\|_2 + 38\,\frac{\bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_s\bigr\|_1}{\sqrt s} + 152\sqrt s\,\bigl(\|\mathbf g^{1d}\|_\infty M^{-r} + \|\boldsymbol\mu\|_\infty\bigr),
\]
4. satisfies the $\ell^1$ error estimate
\[
\bigl\|F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s}\bigr\|_1 \le \bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_{2s}\bigr\|_1 + 54\,\bigl\|F_M \mathbf g^{1d} - (F_M \mathbf g^{1d})^{\mathrm{opt}}_s\bigr\|_1 + 215\,s\,\bigl(\|\mathbf g^{1d}\|_\infty M^{-r} + \|\boldsymbol\mu\|_\infty\bigr),
\]
5. and requires a number of samples of $\mathbf g^{1d}$ and an operation count for $A^{\mathrm{disc}}_{2s,M}$ of
\[
O\!\left(\frac{s^2 r^{3/2} \log^{11/2} M}{\log s}\right).
\]

The Monte Carlo variant of $A^{\mathrm{disc}}_{2s,M}$, denoted $A^{\mathrm{disc,MC}}_{2s,M}$, satisfies all of the conditions (1)–(4) simultaneously with probability $(1 - \sigma) \in [2/3, 1)$ and has number of required samples and operation count
\[
O\!\left(s\, r^{3/2} \log^{9/2}(M) \log\frac{M}{\sigma}\right).
\]

Proof. All notation in this proof matches that in [53] (in particular, we use $f$ to denote the one-dimensional function in place of $g^{1d}$ in the theorem statement, and $N = 2M + 1$). We begin by substituting for the $2\pi$-periodic Gaussian filter given in (3) on page 756 the 1-periodic Gaussian and its associated Fourier transform
\[
g(x) = \frac{1}{c_1} \sum_{n=-\infty}^{\infty} \mathrm e^{-\frac{(2\pi)^2 (x-n)^2}{2c_1^2}}, \qquad \hat g_\omega = \frac{1}{\sqrt{2\pi}}\, \mathrm e^{-\frac{c_1^2 \omega^2}{2}}.
\]
Note then that all results regarding the Fourier transform remain unchanged, and since this 1-periodic Gaussian is just a rescaling of the $2\pi$-periodic one used in [53], the bound in [53, Lemma 1] holds with a similarly compressed Gaussian; that is, for all $x \in \bigl[-\frac12, \frac12\bigr]$,
\[
g(x) \le \left(\frac{3}{c_1} + \frac{1}{\sqrt{2\pi}}\right) \mathrm e^{-\frac{(2\pi x)^2}{2 c_1^2}}.
\tag{3.1}
\]
Analogous results up to and including [53, Lemma 10] for 1-periodic functions then hold straightforwardly.

Assuming that our signal measurements $\mathbf f = (f(y_j))_{j=0}^{2M} = (f(\frac jN))_{j=0}^{2M}$ are corrupted by some discrete noise $\boldsymbol\mu = (\mu_j)_{j=0}^{2M}$, we consider for any $x \in \mathbb{T}$ a bound similar to [53, Lemma 10]. Here, $j_0 := \arg\min_j |x - y_j|$ and $\kappa := \lceil \gamma \ln N \rceil + 1$ for some $\gamma \in \mathbb{R}^+$ to be determined.
Then,
\[
\begin{aligned}
&\left|\frac1N \sum_{j=0}^{2M} f(y_j)\, g(x - y_j) - \frac1N \sum_{j=j_0-\kappa}^{j_0+\kappa} (f(y_j) + \mu_j)\, g(x - y_j)\right| \\
&\quad \le \left|\frac1N \sum_{j=0}^{2M} f(y_j)\, g(x - y_j) - \frac1N \sum_{j=j_0-\kappa}^{j_0+\kappa} f(y_j)\, g(x - y_j)\right| + \left|\frac1N \sum_{j=j_0-\kappa}^{j_0+\kappa} \mu_j\, g(x - y_j)\right| \\
&\quad \le \left|\frac1N \sum_{j=0}^{2M} f(y_j)\, g(x - y_j) - \frac1N \sum_{j=j_0-\kappa}^{j_0+\kappa} f(y_j)\, g(x - y_j)\right| + \frac{\|\boldsymbol\mu\|_\infty}{N} \sum_{k=-\kappa}^{\kappa} g(x - y_{j_0+k}).
\end{aligned}
\]
We bound the first term in this sum by a direct application of [53, Lemma 10]; however, we take this opportunity to reduce the constant in the bound given there. In particular, bounding this term by the final expression in the proof of [53, Lemma 10] and using our implicit assumption that $36 \le N$, we have
\[
\left|\frac1N \sum_{j=0}^{2M} f(y_j)\, g(x - y_j) - \frac1N \sum_{j=j_0-\kappa}^{j_0+\kappa} (f(y_j) + \mu_j)\, g(x - y_j)\right|
\le \left(\frac{3}{\sqrt{2\pi}} + \frac{1}{2\pi}\sqrt{\frac{\ln 36}{36}}\right) \|\mathbf f\|_\infty N^{-r} + \frac{\|\boldsymbol\mu\|_\infty}{N} \sum_{k=-\kappa}^{\kappa} g(x - y_{j_0+k}).
\tag{3.2}
\]
We now work on bounding the second term. First note that for all $k \in [-\kappa, \kappa] \cap \mathbb{Z}$,
\[
g(x - y_{j_0 \pm k}) = g\Bigl(x - y_{j_0} \mp \frac kN\Bigr).
\]
Assuming without loss of generality that $0 \le x - y_{j_0}$, we can bound the summands with nonnegative offsets by (3.1) as
\[
g\Bigl(x - y_{j_0} + \frac kN\Bigr) \le \left(\frac{3}{c_1} + \frac{1}{\sqrt{2\pi}}\right) \mathrm e^{-\frac{(2\pi)^2 k^2}{2 c_1^2 N^2}}.
\tag{3.3}
\]
For the negatively offset summands, the definition of $j_0 = \arg\min_j |x - y_j|$ implies that $x - y_{j_0} \le \frac{1}{2N}$. In particular,
\[
x - y_{j_0} - \frac kN \le \frac{1 - 2k}{2N} < 0
\]
implies
\[
\Bigl(x - y_{j_0} - \frac kN\Bigr)^2 \ge \Bigl|x - y_{j_0} - \frac kN\Bigr| \cdot \frac{2k-1}{2N} \ge \frac{2k - 1}{2N} \cdot \frac kN,
\]
giving
\[
g\Bigl(x - y_{j_0} - \frac kN\Bigr) \le \left(\frac{3}{c_1} + \frac{1}{\sqrt{2\pi}}\right) \mathrm e^{-\frac{(2\pi)^2 k^2}{2 c_1^2 N^2}}\, \mathrm e^{\frac{(2\pi)^2 k}{4 c_1^2 N^2}}.
\tag{3.4}
\]
We now bound the final exponential. We first recall from [53] the choices of parameters
\[
c_1 = \frac{\beta\sqrt{\ln N}}{N}, \qquad \kappa = \lceil\gamma \ln N\rceil + 1, \qquad \gamma = \frac{6r}{\sqrt2\,\pi} = \frac{\beta\sqrt r}{\sqrt2\,\pi}, \qquad \beta = 6\sqrt r,
\]
with $1 \le r \le \frac N{36}$. For $k \in [1, \kappa] \cap \mathbb{Z}$ then,
\[
\exp\left(\frac{(2\pi)^2 k}{4c_1^2 N^2}\right)
\le \exp\left(\frac{(2\pi)^2 \kappa}{4c_1^2 N^2}\right)
\le \exp\left(\frac{\pi^2\bigl(\frac{6r\ln N}{\sqrt2\,\pi} + 2\bigr)}{36\, r \ln N}\right)
\le \exp\left(\frac{\pi}{6\sqrt2} + \frac{\pi^2}{18\, r \ln N}\right)
\le \exp\left(\frac{\pi}{6\sqrt2} + \frac{\pi^2}{18 \ln 36}\right) =: A.
\]
Combining this with our bounds (3.3) and (3.4) for the summands, we have
\[
\frac1N \sum_{k=-\kappa}^{\kappa} g(x - y_{j_0+k}) \le \left(\frac{3}{\beta\sqrt{\ln N}} + \frac{1}{N\sqrt{2\pi}}\right)\left(1 + (1 + A) \sum_{k=1}^{\kappa} \mathrm e^{-\frac{(2\pi)^2 k^2}{2\beta^2 \ln N}}\right).
\]
Expressing the final sum as a truncated lower Riemann sum and applying a change of variables in the resulting integral, we have
\[
\sum_{k=1}^{\kappa} \mathrm e^{-\frac{(2\pi)^2 k^2}{2\beta^2 \ln N}} \le \frac{\sqrt2\,\beta\sqrt{\ln N}}{2\pi} \int_0^\infty \mathrm e^{-x^2}\, dx = \frac{\beta\sqrt{\ln N}}{2\sqrt{2\pi}}.
\]
Making use of our parameter values from [53] and the fact that $1 \le r \le \frac N{36}$,
\[
\frac1N \sum_{k=-\kappa}^{\kappa} g(x - y_{j_0+k})
\le \left(\frac{3}{\beta\sqrt{\ln N}} + \frac{1}{N\sqrt{2\pi}}\right)\left(1 + (1 + A)\frac{\beta\sqrt{\ln N}}{2\sqrt{2\pi}}\right)
\le \frac{3}{6\sqrt{\ln 36}} + \frac{3(1 + A)}{2\sqrt{2\pi}} + \frac{1}{36\sqrt{2\pi}} + \frac{1 + A}{4\pi}\sqrt{\frac{\ln 36}{36}} < 2.
\tag{3.5}
\]
With our revised bound (3.2) above, we re-prove [53, Theorem 4] to estimate $g * f$ by the truncated discrete convolution with noisy samples. In particular, we apply [53, Theorem 3], (3.2), (3.1), and finally our same assumption that $1 \le r \le \frac N{36}$ to obtain
\[
\begin{aligned}
&\left|(g * f)(x) - \frac1N \sum_{j = j_0 - \bigl\lceil\frac{6r\ln N}{\sqrt2\,\pi}\bigr\rceil - 1}^{j_0 + \bigl\lceil\frac{6r\ln N}{\sqrt2\,\pi}\bigr\rceil + 1} (f(y_j) + \mu_j)\, g(x - y_j)\right| \\
&\quad \le \frac{N^{1-r}}{6\sqrt{r\ln N}\,\sqrt{2\pi}}\, \|\mathbf f\|_\infty N^{-r}
+ \left(\frac{3}{\sqrt{2\pi}} + \frac{1}{2\pi}\sqrt{\frac{\ln 36}{36}}\right)\|\mathbf f\|_\infty N^{-r} + 2\|\boldsymbol\mu\|_\infty \\
&\quad \le \left(\frac{1}{6\sqrt{\ln 36}} + \frac{3}{\sqrt{2\pi}} + \frac{1}{2\pi}\sqrt{\frac{\ln 36}{36}}\right)\frac{\|\mathbf f\|_\infty}{N^r} + 2\|\boldsymbol\mu\|_\infty
< 2\left(\frac{\|\mathbf f\|_\infty}{N^r} + \|\boldsymbol\mu\|_\infty\right).
\end{aligned}
\]
Replacing all references to $3\|\mathbf f\|_\infty N^{-r}$ by $2(\|\mathbf f\|_\infty N^{-r} + \|\boldsymbol\mu\|_\infty)$ in the remainder of the steps up to the proof of [53, Theorem 5] gives the desired noise robustness (with a slightly improved constant).
Using the revised error estimates for the nonequispaced algorithm from Theorem 3.1 and redefining $\delta = 3\bigl(\|\hat{\mathbf f} - \hat{\mathbf f}^{\mathrm{opt}}_s\|_1/2s + 2(\|\mathbf f\|_\infty N^{-r} + \|\boldsymbol\mu\|_\infty)\bigr)$ as in the proof of [53, Theorem 5] (which also contains the proof of property (2)), the discretization algorithm [53, Algorithm 1] will produce candidate Fourier coefficient approximations in Lines 9 and 12 corresponding to every $|\hat f_\omega| \ge (4 + 2\sqrt2)\delta$, in place of $4\delta$ as in Theorem 3.1. The exact same argument as in the proof of Theorem 3.1 then applies to the selection of the $2s$ largest entries of this approximation, with the revised threshold values and error bounds, to give properties (1), (3), and (4).

In detail, [53, Lemma 13] and the discussion right after its statement give that property (2) holds for any approximate coefficient with frequency recovered throughout the algorithm (which, for the purposes of the following discussion, we will store in $\mathbf x$ rather than the $\hat R$ defined in [53, Algorithm 1]), not just those in the final output $\mathbf v := \mathbf x^{\mathrm{opt}}_{2s}$. Additionally, by the same lemma and our revised bounds from Theorem 3.1, any frequency $\omega \in [N]$ satisfying $|\hat f_\omega| > (4 + 2\sqrt2)\delta$ will have an associated coefficient estimate in $\mathbf x$.

By Lemma 3.1, $|S_{(4+2\sqrt2)\delta}| \le 2s = |\operatorname{supp}(\mathbf v)|$, and so if $\omega \in S_{(4+2\sqrt2)\delta} \setminus \operatorname{supp}(\mathbf v)$, there exists some $\omega' \in \operatorname{supp}(\mathbf v) \setminus S_{(4+2\sqrt2)\delta}$ such that $v_{\omega'}$ took the place of $v_\omega$. In particular, this means that $|x_{\omega'}| \ge |x_\omega|$, $|\hat f_{\omega'}| \le (4 + 2\sqrt2)\delta$, and $|\hat f_\omega| > (4 + 2\sqrt2)\delta$. Thus,
\[
(4 + 2\sqrt2)\delta + \sqrt2\delta > |\hat f_{\omega'}| + \sqrt2\delta \ge |x_{\omega'}| \ge |x_\omega| \ge |\hat f_\omega| - \sqrt2\delta,
\]
implying that $|\hat f_\omega| \le 4(1 + \sqrt2)\delta$ and therefore proving (1).

To prove (3), we use Lemma 3.1 and consider
\[
\begin{aligned}
\|\hat{\mathbf f} - \mathbf v\|_2 &\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{\operatorname{supp}(\mathbf v)}\|_2 + \|(\hat{\mathbf f} - \mathbf v)|_{\operatorname{supp}(\mathbf v)}\|_2 \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta} \cap \operatorname{supp}(\mathbf v)}\|_2 + \sqrt2\,\delta\sqrt{2s} \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta}}\|_2 + \|\hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta} \setminus \operatorname{supp}(\mathbf v)}\|_2 + 2\delta\sqrt s \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}^{\mathrm{opt}}_{2s}\|_2 + (4 + 2\sqrt2)\delta\sqrt{2s} + 4(1 + \sqrt2)\delta\sqrt{2s} + 2\delta\sqrt s \\
&= \|\hat{\mathbf f} - \hat{\mathbf f}^{\mathrm{opt}}_{2s}\|_2 + (14 + 8\sqrt2)\,\delta\sqrt s.
\end{aligned}
\]
The proof of (4) is similar, bounding the $\ell^1$ error as
\[
\begin{aligned}
\|\hat{\mathbf f} - \mathbf v\|_1 &\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{\operatorname{supp}(\mathbf v)}\|_1 + \|(\hat{\mathbf f} - \mathbf v)|_{\operatorname{supp}(\mathbf v)}\|_1 \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta} \cap \operatorname{supp}(\mathbf v)}\|_1 + \sqrt2\,\delta \cdot 2s \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta}}\|_1 + \|\hat{\mathbf f}|_{S_{(4+2\sqrt2)\delta} \setminus \operatorname{supp}(\mathbf v)}\|_1 + 2\sqrt2\,\delta s \\
&\le \|\hat{\mathbf f} - \hat{\mathbf f}^{\mathrm{opt}}_{2s}\|_1 + (4 + 2\sqrt2)\delta \cdot 2s + 4(1 + \sqrt2)\delta \cdot 2s + 2\sqrt2\,\delta s \\
&= \|\hat{\mathbf f} - \hat{\mathbf f}^{\mathrm{opt}}_{2s}\|_1 + (16 + 14\sqrt2)\,\delta s.
\end{aligned}
\]

3.3 Fast multivariate sparse Fourier transforms

Having detailed two sublinear-time, one-dimensional SFT algorithms, we are now prepared to extend them to the multivariate setting. The general approach is to apply the one-dimensional methods to transformations of our multivariate function of interest, with samples taken along rank-1 lattices. The particular approaches for transforming the multivariate function then allow for the efficient extraction of multidimensional frequency information for the most energetic coefficients identified by univariate SFTs. In particular, our first approach, considered in Section 3.3.1, successively shifts the function in each dimension, whereas our second approach, considered in Section 3.3.2, successively collapses all but one dimension along a rank-1 lattice and samples the resulting two-dimensional function.

Since the two approaches in Algorithms 3.1 and 3.2 below can make use of any univariate SFT algorithm, their analysis is presented in a modular fashion. Each algorithm is followed by a lemma (Lemmas 3.2 and 3.4, respectively) which provides the associated error guarantees when any sufficiently accurate univariate SFT $A_{s,M}$ is employed.
The lemmas are then each followed by two corollaries (Corollaries 3.1 and 3.2, and Corollaries 3.3 and 3.4, respectively) in which we apply the lemma to the two example univariate SFTs reviewed in Section 3.2, as specified by Theorems 3.1 and 3.2.

3.3.1 Phase encoding

We begin by noting that this section makes significant use of the property of the Fourier transform that translation of a function modulates its Fourier coefficients. We denote by $S_{\ell,\alpha}$ the shift operator in the $\ell$th coordinate with shift $\alpha \in \mathbb{R}$, defined by its action on a multivariate periodic function $g : \mathbb{T}^d \to \mathbb{C}$ as
\[
S_{\ell,\alpha}(g)(x_1, \dots, x_d) := g\bigl(x_1, \dots, x_{\ell-1}, (x_\ell + \alpha) \bmod 1, x_{\ell+1}, \dots, x_d\bigr).
\]
By a change of coordinates in the integral defining a Fourier coefficient, we see that translation modulates the Fourier coefficients of $g : \mathbb{T}^d \to \mathbb{C}$ as
\[
\widehat{(S_{\ell,\alpha}\, g)}_{\mathbf k} = \mathrm e^{2\pi i k_\ell \alpha}\, \hat g_{\mathbf k}.
\tag{3.6}
\]

[Figure 3.1: The basic procedure for the phase encoding algorithm applied to the trigonometric monomial $g(\mathbf x) = \mathrm e^{2\pi i \mathbf k \cdot \mathbf x}$: sampling along $t\mathbf z$ yields $\mathrm e^{2\pi i (k_0 z_0 t + k_1 z_1 t)}$, while sampling along the shifted lattice $t\mathbf z + (1/K, 0)$ yields $\mathrm e^{2\pi i k_0 / K}\, \mathrm e^{2\pi i (k_0 z_0 t + k_1 z_1 t)}$, from whose phase $k_0$ is computed.]

The main idea of our phase encoding approach in Algorithm 3.1 is that by exploiting this spatial translation property, we can separate out the components of recovered frequencies from modulations of the function's Fourier coefficients. Before stating the algorithm in detail, we begin with a simple example.

Example 3.1 (Phase encoding on a trigonometric monomial). Let $d = 2$. Suppose that $g(\mathbf x) = \mathrm e^{2\pi i \mathbf k \cdot \mathbf x}$ is a trigonometric monomial with single frequency $\mathbf k \in I \subset \mathbb{Z}^2$ for some known, potentially large $I$. Given $\Lambda(\mathbf z, M)$, a reconstructing rank-1 lattice for $I$, we consider the one-dimensional restriction of $g$ to the lattice, $g^{1d}(t) := g(t\mathbf z)$. Since $g$ is Fourier-sparse, a lattice FFT (cf. Algorithm 1.1) on $g^{1d}$ is unnecessarily expensive. Thus, applying a much faster SFT to $g^{1d}$ returns $\hat g^{1d}_{\mathbf k \cdot \mathbf z \bmod M} = 1$. Our goal is to match this coefficient of $g^{1d}$ to the correct Fourier coefficient of $g$ without having to search all of $I$.

Figure 3.1 depicts the phase encoding method we use in Algorithm 3.1 below. In order to compute $g^{1d}$, we restrict $g$ to the line $t\mathbf z$ (dark blue in the figure). However, to get extra information about $\mathbf k$, we also consider $S_{0,1/K}\, g$, a shift of $g$ in the first coordinate by $1/K$, restricted to the same lattice. The shifted lattice that we effectively restrict $g$ to, $t\mathbf z + (1/K, 0)$, is depicted in light blue. The resulting modulation of $g$ induced by this spatial shift (as described by (3.6)) is detailed in the remainder of Figure 3.1. Thus, defining $g^{1d,0}(t) := S_{0,1/K}\, g(t\mathbf z)$, an SFT would discover $\hat g^{1d,0}_{\mathbf k \cdot \mathbf z \bmod M} = \mathrm e^{2\pi i k_0 / K}$. We can then extract $k_0$ from this modulation. Repeating this process in the $\ell = 1$ coordinate recovers $k_1$, and therefore the entirety of $\mathbf k$ is recovered using $d = 2$ SFTs. From here, we can match $\hat g^{1d}_{\mathbf k \cdot \mathbf z \bmod M} = 1$ to $\hat g_{\mathbf k}$ in faster than $O(|I|)$ time and memory, as desired. A small numerical sketch of this extraction follows.
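The sketch below carries out the phase extraction of Example 3.1 in Python. For clarity, dense length-$M$ FFTs stand in for the SFT $A_{s,M}$, and `recover_entry` is a hypothetical helper name; $\omega$ is assumed to be the aliased index $\mathbf k \cdot \mathbf z \bmod M$ of an energetic frequency.

```python
import numpy as np

def recover_entry(g, z, K, M, omega, ell, d):
    """Sketch of the phase-encoding step: estimate the ell-th component of the
    multivariate frequency aliased to index omega, using dense DFTs in place of
    the sparse transform for simplicity."""
    t = np.arange(M) / M
    base = np.fft.fft(g(np.outer(t, z) % 1.0)) / M            # coefficients of g(tz)
    shift = np.zeros(d)
    shift[ell] = 1.0 / K
    shifted = np.fft.fft(g((np.outer(t, z) + shift) % 1.0)) / M
    # by (3.6), the shift multiplies the coefficient by exp(2*pi*1j*k_ell/K),
    # so the phase of the ratio encodes k_ell (for |k_ell| <= K/2)
    return int(round(K * np.angle(shifted[omega] / base[omega]) / (2 * np.pi)))
```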
In the language of Algorithm 3.1, the original SFT of $g^{1d}$ occurs on Line 1. The SFTs of the shifts of $g$, denoted $g^{1d,0}, \dots, g^{1d,d-1}$, occur on Line 3. In this example we considered a function with only one significant Fourier mode; in general, however, the SFT algorithm will recover $s$ significant Fourier modes. Thus, the for loop from Lines 6 to 14 considers each of these recovered one-dimensional frequencies separately. Line 9 computes the modulation induced by each of the $d$ shifts and then extracts each coordinate of the $d$-dimensional frequency. The remaining check on Line 11 is useful in the theoretical analysis to ensure that spuriously recovered frequencies are ignored.

Algorithm 3.1 Simple Frequency Index Recovery by Phase Encoding
Input: A multivariate periodic function $g \in W(\mathbb{T}^d) \cap C(\mathbb{T}^d)$ (from which we are able to obtain potentially noisy samples), a multivariate frequency set $I \subset B^d_K$, a reconstructing rank-1 lattice $\Lambda(\mathbf z, M)$ for $I$, and an SFT algorithm $A_{s,M}$.
Output: Sparse coefficient vector $\hat{\mathbf g}^s = (\hat g^s_{\mathbf k})_{\mathbf k \in B^d_K}$ (optionally supported on $I$, see Line 11), an approximation to $(\hat g|_I)^{\mathrm{opt}}_s$.
1: Apply $A_{s,M}$ to the univariate restriction of $g$ to the lattice, $g^{1d}(t) = g(t\mathbf z)$, to produce $\hat{\mathbf g}^{1d,s} = A_{s,M}\, g^{1d}$, a sparse approximation of $F_M \mathbf g^{1d} \in \mathbb{C}^{B_M}$.
2: for all $\ell \in [d]$ do
3:   Apply $A_{s,M}$ to $g^{1d,\ell}(t) = S_{\ell,1/K}\, g(t\mathbf z)$ to produce $\hat{\mathbf g}^{1d,\ell,s} = A_{s,M}\, g^{1d,\ell}$, a sparse approximation of $F_M \mathbf g^{1d,\ell} \in \mathbb{C}^{B_M}$.
4: end for
5: $\hat{\mathbf g}^s \leftarrow \mathbf 0$
6: for all $\omega \in \operatorname{supp}(\hat{\mathbf g}^{1d,s}) \subset B_M$ do
7:   $\mathbf k_\omega \leftarrow \mathbf 0$
8:   for all $\ell \in [d]$ do
9:     $(k_\omega)_\ell \leftarrow \operatorname{round}\bigl(K \arg(\hat g^{1d,\ell,s}_\omega / \hat g^{1d,s}_\omega)/2\pi\bigr)$
10:  end for
11:  if $\mathbf k_\omega \cdot \mathbf z \equiv \omega \pmod M$ (and optionally $\mathbf k_\omega \in I$; see Remark 3.2) then
12:    $\hat g^s_{\mathbf k_\omega} \leftarrow \hat g^s_{\mathbf k_\omega} + \hat g^{1d,s}_\omega$
13:  end if
14: end for

3.3.1.1 Analysis of Algorithm 3.1

Having seen the phase encoding approach of Algorithm 3.1 in action, we now provide an error guarantee for its output. Notice that the assumptions on the SFT necessary for this theoretical analysis are exactly those provided by Theorems 3.1 and 3.2. When we use the complex argument function in Algorithm 3.1 and below, we use the principal branch, so that $\arg : \mathbb{C} \to (-\pi, \pi]$.

Lemma 3.2 (General recovery result for Algorithm 3.1). Let $A_{s,M}$ in the input to Algorithm 3.1 be a noise-robust SFT algorithm which, for a function $g^{1d} \in W(\mathbb{T}) \cap C(\mathbb{T})$ corrupted by some arbitrary noise $\mu : \mathbb{T} \to \mathbb{C}$, constructs an $s$-sparse Fourier approximation $A_{s,M}(g^{1d} + \mu) =: \hat{\mathbf g}^{1d,s} \in \mathbb{C}^{B_M}$ which

1. reconstructs every frequency (up to $s$ many) $\omega \in B_M$ of $F_M \mathbf g^{1d} \in \mathbb{C}^M$ whose corresponding Fourier coefficient meets the tolerance $|(F_M \mathbf g^{1d})_\omega| > \tau$,

2. satisfies the $\ell^\infty$ error estimate for recovered coefficients
\[
\bigl\|(F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s})|_{\operatorname{supp}(\hat{\mathbf g}^{1d,s})}\bigr\|_\infty \le \eta_\infty < \tau,
\]
3. satisfies the $\ell^2$ error estimate $\|F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s}\|_2 \le \eta_2$,

4. satisfies the $\ell^1$ error estimate $\|F_M \mathbf g^{1d} - \hat{\mathbf g}^{1d,s}\|_1 \le \eta_1$,

5. and requires $O(P(s, M))$ total evaluations of $g^{1d}$, operating with computational complexity $O(R(s, M))$.

Additionally, assume that the parameters $\tau$ and $\eta_\infty$ hold uniformly over each SFT performed in Algorithm 3.1. Let $g$, $I$, and $\Lambda(\mathbf z, M)$ be as specified in the input to Algorithm 3.1. Collecting the $\tau$-significant frequencies of $g$ into the set $S_\tau := \{\mathbf k \in I \mid |\hat g_{\mathbf k}| > \tau\}$, assume that $|S_\tau| \le s$, and set
\[
\beta = \max\left(\tau,\ \eta_\infty\left(1 + \frac{2}{\sin\frac\pi K}\right)\right).
\]
Then Algorithm 3.1 (ignoring the optional check on Line 11) will produce an $s$-sparse approximation $\hat{\mathbf g}^s$ of the Fourier coefficients of $g$ satisfying the error estimates
\[
\|\hat{\mathbf g}^s - \hat g\|_{\ell^2(\mathbb{Z}^d)} \le \eta_2 + (\beta + \eta_\infty)\sqrt{\max(s - |S_\beta|, 0)} + \bigl\|\hat g|_I - \hat g|_{S_\beta}\bigr\|_{\ell^2(\mathbb{Z}^d)} + \bigl\|\hat g - \hat g|_I\bigr\|_{\ell^2(\mathbb{Z}^d)}
\]
and
\[
\|\hat{\mathbf g}^s - \hat g\|_{\ell^1(\mathbb{Z}^d)} \le \eta_1 + (\beta + \eta_\infty) \max(s - |S_\beta|, 0) + \bigl\|\hat g|_I - \hat g|_{S_\beta}\bigr\|_{\ell^1(\mathbb{Z}^d)} + \bigl\|\hat g - \hat g|_I\bigr\|_{\ell^1(\mathbb{Z}^d)},
\]
requiring $O(d \cdot P(s, M))$ total evaluations of $g$ and $O(d \cdot (R(s, M) + s))$ total operations.

Proof. We begin by assuming that $g$ is a trigonometric polynomial with $\operatorname{supp}(\hat g) \subset I$. Since $\Lambda(\mathbf z, M)$ is a reconstructing rank-1 lattice for $I$, there are no collisions among the one-dimensional frequencies $\{\mathbf k \cdot \mathbf z \mid \mathbf k \in I\}$ modulo $M$. Setting $g^{1d}(t) = g(t\mathbf z)$ then ensures that $\hat g_{\mathbf k} = \hat g^{1d}_{\mathbf k \cdot \mathbf z}$ for each $\mathbf k \in I$.
Since there are no frequency collisions in the lattice FFT, Lemma 1.3 implies that 𝑔ˆ k = 𝑔ˆ k·z 𝑔ˆ k = (F 𝑀 g1d )k·z mod 𝑀 . Thus, by assumption 1 on the SFT algorithm A 𝑠,𝑀 , Lines 1 and 3 of Algorithm 3.1 will produce coefficient estimates of 𝑔ˆ k for every k ∈ S𝜏 . We then write these SFT approximations as 𝑔ˆ k·z 1d,𝑠 mod 𝑀 = 𝑔ˆ k + 𝜂k and 𝑔ˆ k·z1d,ℓ,𝑠 mod 𝑀 = e2𝜋i𝑘 ℓ /𝐾 ( 𝑔ˆ k + 𝜂kℓ ) respectively, where we have made use of (3.6). Note that |𝜂k |, |𝜂kℓ | ≤ 𝜂∞ . Now, considering the estimate for 𝑘 ℓ , we have 1d,ℓ,𝑠 ! ! 𝐾 𝑔ˆ k·z 𝐾 ˆ 𝑔 k + 𝜂 ℓ arg 1d,𝑠mod 𝑀 = arg e2𝜋i𝑘 ℓ /𝐾 1d,𝑠 k 2𝜋 𝑔ˆ k·z mod 𝑀 2𝜋 𝑔ˆ k·z mod 𝑀 ! 𝐾 ˆ 𝑔 k + 𝜂 ℓ = 𝑘ℓ + arg 1d,𝑠 k 2𝜋 𝑔ˆ k·z mod 𝑀 ! 𝐾 𝜂kℓ − 𝜂k = 𝑘ℓ + arg 1 + 1d,𝑠 . 2𝜋 𝑔ˆ k·z mod 𝑀 We now only consider | 𝑔ˆ k | > 𝛽 ≥ max(𝜏, 3𝜂∞ ), that is k ∈ S𝛽 ⊂ S𝜏 , and therefore, the correspond- ing approximate coefficient satisfies | 𝑔ˆ k·z 1d,𝑠 mod 𝑀 | > 𝛽 − 𝜂∞ . Thus, the magnitude of the fraction in the argument must be strictly less than 2𝜂∞ 𝛽−𝜂∞ ≤ 1. Therefore, we consider the argument of a point lying in the right half of the complex plane, in the open disc of radius 2𝜂∞ 𝛽−𝜂∞ centered at 1. The 57 maximal absolute argument of a point in this disc will be that of a point lying on a tangent line passing through the origin. This point, the origin, and 1 then form a right triangle from which we deduce that ! 𝜂kℓ − 𝜂k   2𝜂∞ arg 1 + 1d,𝑠 < arcsin , 𝑔ˆ k·z 𝛽 − 𝜂∞ mod 𝑀 and our choice of 𝛽 ≥ 𝜂∞ (1 + 2/sin(𝜋/𝐾)) then implies that ! 𝜂kℓ − 𝜂k 𝜋 arg 1 + 1d,𝑠 < . 𝑔ˆ k·𝑧 mod 𝑀 𝐾 Thus, 1d,ℓ,𝑠 ! 𝐾 𝑔ˆ k·z 1 arg 1d,𝑠mod 𝑀 − 𝑘 ℓ < , 2𝜋 𝑔ˆ k·z mod 𝑀 2 and so after rounding to the nearest integer, Algorithm 3.1 will recover 𝑘 ℓ for all ℓ ∈ [𝑑] and k ∈ S𝛽 . We now know that the final loop of Algorithm 3.1 will properly map the one-dimensional fre- quency 𝜔 = k · z mod 𝑀 to k for all k ∈ S𝛽 . Thus, for these same k ∈ S𝛽 , Line 12 ensures that we set 𝑔ˆ k𝑠 := 𝑔ˆ k·z 1d,𝑠 mod 𝑀 . Additionally, the max(𝑠 − |S𝛽 |, 0) many coefficients 𝑔ˆ 𝜔1d,𝑠 for which 𝜔 ≠ k · z mod 𝑀 for any k ∈ S𝛽 are still available for potential assignment. If any multivariate frequency k𝜔 ∈ I is reconstructed and passes the mandatory check in Line 11 then the approximate Fourier coefficient 𝑔ˆ 𝜔1d,𝑠 properly corresponds to (F 𝑀 g1d )k 𝜔 ·z mod 𝑀 = 𝑔ˆ k 𝜔 . On the other hand, if some error introduced in the SFTs reconstructs a multivariate frequency k𝜔 ∉ I, the reconstructing property does not allow us to conclude anything about a (𝑘 𝜔 , 𝜔) pair passing the check in Line 11. Thus, it is possible that 𝑔ˆ 𝜔1d,𝑠 will contribute to some component of ĝ𝑠 not corresponding to any frequency in I. At the least however, since we know that all entries of ĝ1d,𝑠 corresponding to frequencies in S𝛽 are correctly assigned, the remaining ones satisfy | 𝑔ˆ 𝜔1d,𝑠 | ≤ 𝛽 + 𝜂∞ . Using these facts allows us to estimate the ℓ 2 error as k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ k ĝ𝑠 | Z𝑑 \I k ℓ2 (Z𝑑 ) + k ĝ𝑠 | I − 𝑔| ˆ supp(ĝ𝑠 )∩I k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ S𝛽 k ℓ2 (Z𝑑 ) q (3.7) ≤ (𝛽 + 𝜂∞ ) max(𝑠 − |S𝛽 |, 0) + 𝜂2 + k 𝑔ˆ − 𝑔| ˆ S𝛽 k ℓ2 (I) and the ℓ 1 error as k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ k ĝ𝑠 | Z𝑑 \I k ℓ1 (Z𝑑 ) + k ĝ𝑠 | I − 𝑔| ˆ supp(ĝ𝑠 )∩I k ℓ1 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ S𝛽 k ℓ1 (Z𝑑 ) (3.8) ≤ (𝛽 + 𝜂∞ ) max(𝑠 − |S𝛽 |, 0) + 𝜂1 + k 𝑔ˆ − 𝑔| ˆ S𝛽 k ℓ1 (I) 58 where we have additionally used the accuracy of the initial one-dimensional SFT and the assumption that 𝑔ˆ is supported on I. We now handle the case when 𝑔 is not necessarily a polynomial with Fourier support contained in I. 
Rather than aiming to approximate 𝑔ˆ k for every k ∈ Z𝑑 , we restrict attention to only frequen- cies in I, instead attempting to approximate the Fourier coefficients of 𝑔| I = k∈I 𝑔ˆ k e2𝜋ik·◦ . We Í then have that 𝑔 =: 𝑔| I + 𝑔| Z𝑑 \I and view potentially noisy input 𝑔 + 𝜇 to our algorithm as 𝑔 + 𝜇 = 𝑔| I + 𝑔| Z𝑑 \I + 𝜇 . | {z } 𝜇0 Algorithm 3.1 applied to 𝑔 + 𝜇 is then equivalent to applying it to 𝑔| I + 𝜇0, where now 𝜏, 𝜂∞ , 𝜂2 , and 𝜂1 depend on 𝜇0, and the output is an approximation of 𝑔| ˆ I . Since 𝜇0 represents noise on the input to A 𝑠,𝑀 in its applications to 𝑔| I (𝑡z) and 𝑆ℓ,1/𝐾 𝑔| I (𝑡z) we remark here that k𝜇0 k ∞ ≤ k𝑔| Z𝑑 \I k ∞ + k𝜇k ∞ ≤ k 𝑔ˆ − 𝑔| ˆ I k ℓ1 (Z𝑑 ) + k𝜇k ∞ (3.9) so as to help us estimate 𝜏, 𝜂∞ , 𝜂2 , and 𝜂1 in future applications of the lemma. Accounting for the truncation to I in the ℓ 2 error bound and using (3.7) applied to 𝑔| ˆ I , we estimate the ℓ 2 error as k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ k ĝ𝑠 − 𝑔|ˆ I k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ2 (Z𝑑 ) q ≤ (𝛽 + 𝜂∞ ) max(𝑠 − |S𝛽 |, 0) + 𝜂2 + k 𝑔| ˆ I − 𝑔| ˆ S𝛽 k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ2 (Z𝑑 ) and the ℓ 1 error as k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ k ĝ𝑠 − 𝑔|ˆ I k ℓ1 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ1 (Z𝑑 ) ≤ (𝛽 + 𝜂∞ ) max(𝑠 − |S𝛽 |, 0) + 𝜂1 + k 𝑔| ˆ I − 𝑔| ˆ S𝛽 k ℓ1 (Z𝑑 ) + k 𝑔ˆ − 𝑔|ˆ I k ℓ1 (Z𝑑 ) Recalling that 𝑃(𝑠, 𝑀) and 𝑅(𝑠, 𝑀) are the sampling and runtime complexity of A 𝑠,𝑀 respec- tively, since 1 + 𝑑 SFTs are required, the number of 𝑔 evaluations is O (𝑑 · 𝑃(𝑠, 𝑀)) and the associ- ated computational complexity is O (𝑑 · 𝑅(𝑠, 𝑀)). The complexity of Lines 6 to 14 is O (𝑠𝑑). 59 Remark 3.2. Since the only possible misassigned values of 𝑔ˆ 𝜔1d,𝑠 contribute to coefficients in ĝ𝑠 outside the chosen frequency set I for which Λ(z, 𝑀) is reconstructing, if it is possible to quickly (e.g., in O (𝑑) time) check a multivariate frequency’s inclusion in I (e.g., a hyperbolic cross), en- tries outside of I in ĝ𝑠 can be identified in the optional check on Line 11 and remain (correctly) unassigned. This has the effect of removing the max(𝑠 − |S𝛽 |, 0) terms in the error bounds while not increasing the computational complexity. Additionally, this outputs an approximation to ( 𝑔| opt ˆ I )𝑠 which is supported only on our supplied frequency set I as we may expect or prefer. We now apply Lemma 3.2 with the discrete sublinear-time SFT from Theorem 3.2 to give spe- cific error bounds in terms of best 𝑠-term approximation errors as well as detailed runtime and sampling complexities. Corollary 3.1 (Algorithm 3.1 with discrete sublinear-time SFT). Let 𝐾 ≥ 9. For I ⊂ B𝐾𝑑 with reconstructing rank-1 lattice Λ(z, 𝑀) and the function 𝑔 ∈ 𝑊 (T𝑑 ) ∩ 𝐶 (T𝑑 ), we consider applying Algorithm 3.1 where each function sample may be corrupted by noise at most 𝑒 ∞ ≥ 0 in absolute magnitude. Using the discrete sublinear-time SFT algorithm A2𝑠,𝑀 disc or A disc,MC with parameter 2𝑠,𝑀 36 , Algorithm 3.1 will produce ĝ𝑠 = ( 𝑔ˆ k𝑠 )k∈B 𝑑 a 2𝑠-sparse approximation of 𝑔ˆ satisfying 𝑀 1≤𝑟 ≤ 𝐾 the error estimates opt ˆ I − ( 𝑔| k 𝑔| ˆ I ) 𝑠 k1 √ k ĝ𝑠 − 𝑔k ˆ 2 ≤ (48 + 4𝐾) √ + (189 + 16𝐾) 𝑠(k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) 𝑠 opt k ĝ𝑠 − 𝑔k ˆ 1 ≤ (69 + 6𝐾) 𝑔| ˆ I − ( 𝑔| ˆ I)𝑠 + (267 + 23𝐾)𝑠 (k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) , 1 albeit with probability 1 − 𝜎 ∈ [0, 1) for the Monte Carlo version. The total number of evaluations of 𝑔 and computational complexity will be ! 𝑑𝑠2𝑟 3/2 log11/2 𝑀    𝑑𝑀 O or O 𝑑𝑠𝑟 log 𝑀 log 3/2 9/2 log 𝑠 𝜎 for A2𝑠,𝑀 disc or A disc,MC respectively. 2𝑠,𝑀 Proof. 
For the definitions of 𝜏 and 𝛽 in Lemma 3.2 with associated values given by Theorem 3.2, 60 Lemma 3.1 applied with x = 𝑔| ˆ I implies that S𝛽 can contain at most 2𝑠 elements and the bound opt √ k 𝑔| ˆ I − 𝑔|ˆ S𝛽 k ℓ2 (Z𝑑 ) ≤ k 𝑔|ˆ I − ( 𝑔| ˆ I )2𝑠 k ℓ2 (Z𝑑 ) + 𝛽 2𝑠 k 𝑔| ˆ I − ( 𝑔| opt ˆ I ) 𝑠 k ℓ1 (Z𝑑 ) √ (3.10) ≤ √ + 𝛽 2𝑠 2 𝑠 holds. Note that the last inequality follows from [24, Theorem 2.5] applied to 𝑔| ˆ I )𝑠 . opt ˆ I − ( 𝑔| Lemma 3.2 then holds with 𝑠 replaced by 2𝑠 for the 2𝑠-sparse approximations given by A2𝑠,𝑀 disc or A2𝑠,𝑀 disc,MC in Algorithm 3.1. After treating the truncation error as measurement noise as well as accounting for any noise in the input bounded by 𝑒 ∞ , Theorem 3.2 gives the values ! √ k 𝑔| opt ˆ I − ( 𝑔| ˆ I )𝑠 k1 −𝑟 𝜂∞ = 3 2 + 2(k𝑔k ∞ 𝑀 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) , 2𝑠 √ 12(1 + 2) 𝜏= √ 𝜂∞ . 3 2 Assuming 𝐾 ≥ 9, !! ! ! 2 2 2 𝛽 = max 𝜏, 𝜂∞ 1 + 𝜋  = 𝜂∞ 1 + 𝜋  ≤ 𝜂∞ 1+ 𝜋 𝐾 . sin 𝐾 sin 𝐾 9 sin 9 Inserting the estimate for k 𝑔| ˆ S𝛽 k 2 from (3.10), this bound for 𝛽, and the values for 𝜂2 (where ˆ I − 𝑔| again we use [24, Theorem 2.5]) and 𝜂1 from Theorem 3.2 opt ˆ I − ( 𝑔| 77k 𝑔| ˆ I )𝑠 k1 √ 𝜂2 ≤ √ + 152 𝑠(k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) 2 𝑠 opt 𝜂1 ≤ 55 𝑔| ˆ I − ( 𝑔| ˆ I )𝑠 + 215𝑠(k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) 1 √ into the recovery bound in Lemma 3.2 and upper bounding k 𝑔ˆ − 𝑔| ˆ I k 2 by 𝑠k 𝑔ˆ − 𝑔| ˆ I k 1 gives the final error estimate. The change to the complexity of the randomized algorithm arises from distributing the proba- bility of failure 𝜎 over the 𝑑 + 1 SFTs in a union bound. Because the nonequispaced SFTs discussed in Theorem 3.1 do not approximate the discrete Fourier transform and therefore do not alias the one-dimensional frequencies k·z into frequencies in B𝑀 , slightly modifying Algorithm 3.1 to use SFTs with a larger bandwidth allows for the following recovery result. 61 Corollary 3.2 (Algorithm 3.1 with nonequispaced sublinear-time SFT). For I ⊂ B𝐾𝑑 with 𝐾 ≥ 6, fix the new bandwidth parameter 𝑀˜ := 2 maxk∈I |k · z| + 1. For Λ(z, 𝑀), a reconstructing rank-1 lattice for I with 𝑀 ≤ 𝑀, ˜ and the function 𝑔 ∈ 𝑊 (T𝑑 )∩𝐶 (T𝑑 ), we consider applying Algorithm 3.1 where each function sample may be corrupted by noise at most 𝑒 ∞ ≥ 0 in absolute magnitude with the following modifications: 1. use the sublinear-time SFT algorithm A2𝑠, sub or A sub,MC 𝑀˜ 2𝑠, 𝑀˜ 2. and only check equality against 𝜔 in Line 11 (rather than equivalence modulo 𝑀), to produce ĝ𝑠 = ( 𝑔ˆ k𝑠 )k∈B 𝑑 a 2𝑠-sparse approximation of 𝑔ˆ satisfying the error estimates 𝐾 " # opt k ˆ 𝑔| I − ( ˆ 𝑔| ) I 𝑠 k 1 √ √ k ĝ𝑠 − 𝑔kˆ ℓ2 (Z𝑑 ) ≤ (25 + 3𝐾) √ + 𝑠k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑠𝑒 ∞ , 𝑠 h i opt k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ (35 + 3𝐾) 𝑔| ˆ I − ( 𝑔| ˆ I)𝑠 ˆ I k 1 + 𝑠𝑒 ∞ + 𝑠k 𝑔ˆ − 𝑔| 1 albeit with probability 1 − 𝜎 ∈ [0, 1) for the Monte Carlo version. For A2𝑠, sub and A sub,MC respec- 𝑀˜ 2𝑠, 𝑀˜ tively, the total number of evaluations of 𝑔 and computational complexity will be 𝑑𝑠2 log4 𝑀˜ ˜      O or O 𝑑𝑠 log ( 𝑀) 3 ˜ log 𝑑 𝑀 . log 𝑠 𝜎 Proof. The bandwidth specified ensures that B𝑀˜ ⊃ {k · z | k ∈ I}. In the case where 𝑔 is a trigonometric polynomial with supp( 𝑔) ˆ ⊂ I, so long as there exists some 𝑀 ≤ 𝑀˜ such that Λ(z, 𝑀) is reconstructing for I, we are guaranteed that a length- 𝑀˜ DFT on a polynomial supported on {k·z | k ∈ I} will not suffer from aliasing collisions. Thus, by Lemma 1.2, the one-dimensional Fourier transforms truncated to B𝑀˜ coincide with length 𝑀˜ DFTs. We can therefore view an approximation from the algorithm in Theorem 3.1 as one of a length 𝑀˜ DFT. 
The reasoning in the proofs of Lemma 3.2 and Corollary 3.1 then holds with the SFT algorithms, parameters, numbers of samples, and complexities of Theorem 3.1. Remark 3.3. As in Chapter 2, (2.7) and (2.8), we can estimate 𝑀˜ above with two different tech- 62 niques: Õ Õ 𝑀˜ = 1 + 2 max 𝑘 ℓ 𝑧ℓ ≤ 1 + 2 |𝑧ℓ | max |𝑘 ℓ | = O (𝑑𝐾I 𝑀), k∈I k∈I ℓ∈[𝑑] ℓ∈[𝑑] Õ   𝑀˜ = 1 + 2 max 𝑘 ℓ 𝑧ℓ ≤ 1 + 2kzk ∞ max kkk 1 = O 𝑀 max kkk 1 . k∈I k∈I k∈I ℓ∈[𝑑] The latter case is especially useful when I is a subset of a known ℓ 1 ball as it will provide a dimen- sion independent upper bound on 𝑀. ˜ Either of these upper bounds may then be used in practice to avoid having to estimate 𝑀. ˜ That being said however, if one is willing to perform the one-time search through the frequency set I to more accurately calculate 𝑀, ˜ one can go even further to use the minimal bandwidth 𝑀˜ 0 = maxk∈I (k · z) − mink∈I (k · z) + 1 so long as the function samples are properly modulated to shift the one-dimensional frequencies into B𝑀˜ 0 . For example, running A2𝑠, sub 𝑀˜ 0 or A2𝑠, sub,MC 𝑀˜ 0 on 𝑔 1d (𝑡) = j 0k e2𝜋i𝜙𝑡 𝑔(𝑡z) and 𝑔 1d,ℓ (𝑡) = e2𝜋i𝜙𝑡 𝑆ℓ,1/𝐾 𝑔(𝑡z) with 𝜙 = 𝑀2 − maxk∈I (k · z) is acceptable so long as ˜ this shift is accounted for in the frequency check on Line 11. Note though that these improvements will only have the effect of reducing the logarithmic factors in the computational complexity. 3.3.2 Two-dimensional DFT technique Below, we will consider a method for recovering frequencies which, rather than shifting one dimension of the multivariate periodic function 𝑔 at a time, leaves one dimension of 𝑔 out at a time. We will fix one dimension ℓ ∈ [𝑑] of 𝑔 at equispaced nodes over T and apply a lattice SFT to the other 𝑑 − 1 components. Applying a standard FFT to the results will produce a two-dimensional DFT. The indices corresponding to the standard FFT will represent frequency components in di- mension ℓ while the indices corresponding to the lattice SFT will be used to synchronize with known one-dimensional frequencies k · z mod 𝑀. Note that below, we will separate out coordinate ℓ of a multivariate point x ∈ T𝑑 or frequency k ∈ Z𝑑 , denoting the remaining coordinates as x0ℓ ∈ T𝑑−1 or k0ℓ ∈ Z𝑑−1 . With a slight abuse of notation, we can rewrite the original point or frequency as x = (𝑥ℓ , x0ℓ ) or k = (𝑘 ℓ , k0ℓ ). Again, before stating Algorithm 3.2 in detail, we present an example. 63 Example 3.2 (Two-dimensional DFT technique on a trigonometric monomial). As in Example 3.1, we let 𝑔 be the trigonometric monomial 𝑔(x) := e2𝜋ik·x . However, in this example, we let 𝑑 = 3, so k ∈ I ⊂ Z3 and the domain of 𝑔 is T3 depicted in Figure 3.2. We will consider the procedure to compute the ℓ = 0 component of k. First, we take a reconstructing rank-1 lattice Λ(z, 𝑀) for I and restrict all but the first component of 𝑔 to the lattice. This produces a two-dimensional function of the form (𝑥 0 , 𝑡) ↦→ e2𝜋i(𝑘 0 𝑥0 +𝑘 1 𝑧1 𝑡+𝑘 2 𝑧2 𝑡) . We then sample this function at 𝐾 equispaced points over T in the 𝑥0 variable. This produces 𝐾 projected lattices spaced 1/𝐾 apart in the 𝑥 0 direction on which we sample 𝑔, depicted in Figure 3.2. Fixing 𝑥 0 at each equispaced point produces the 𝐾 univariate functions which are organized into the top array of Figure 3.3. Notice that colors of the entries in this array correspond to the lattices in Figure 3.2 over which we sample 𝑔 to produce that entry. The next step is to apply an SFT to each of the univariate functions in this array. 
Each function has exactly one active frequency, k_1 z_1 + k_2 z_2, with corresponding Fourier coefficient e^{2πi k_0 j/K}. Thus, collecting the results into a matrix produces the left-most matrix in Figure 3.3 with only the (k_1 z_1 + k_2 z_2 mod M)th column filled. This column contains K equispaced samples of the function e^{2πi k_0 x_0}, and so finally applying a DFT to the matrix will produce the right-most matrix in Figure 3.3. We find only one entry, in row k_0 mod K, corresponding to the only active frequency of e^{2πi k_0 x_0}. Thus, we can read off the ℓ = 0 entry of k by determining which row contains the Fourier coefficient of g of interest. Repeating this process for all ℓ = 0, …, d − 1, we will be able to recover k.

[Figure 3.2: An example of 𝕋³ depicting the K projected rank-1 lattices, spaced 1/K apart in the x_0 direction, on which g(x) is sampled to compute the ℓ = 0 component of each d-dimensional frequency.]

[Figure 3.3: One round of the basic procedure for the two-dimensional DFT algorithm applied to the trigonometric monomial g(x) = e^{2πik·x} sampled over the sets depicted in Figure 3.2. Applying the SFT A_{s,M} to the rows of the K × M array of restrictions (e^{2πi(k_0 j/K + k_1 z_1 t + k_2 z_2 t)})_{j∈[K]} fills only the column k_1 z_1 + k_2 z_2 mod M with the entries e^{2πi k_0 j/K}; applying F_K to the columns then leaves a single nonzero entry, 1, in row k_0 mod K of that column. Each row corresponds to samples of g(x) on the shifted lattice of the corresponding color.]

We now generalize the procedure demonstrated in Example 3.2 in a lemma. In particular, we must account for functions which have more than one significant frequency. For theoretical simplicity, we use a length-M DFT in the first step rather than an SFT.

Lemma 3.3. Fix some finite multivariate frequency set I ⊂ B_K^d, let Λ(z, M) be a reconstructing rank-1 lattice for {k − k_ℓ e_ℓ | k ∈ I} (where e_ℓ ∈ ℤ^d is the canonical basis vector which has (e_ℓ)_ℓ = 1 and zeros in all other entries) for all ℓ ∈ [d], and assume that g has Fourier support supp(ĝ) ⊂ I. Fixing one dimension ℓ ∈ [d] and writing the generating vector as z = (z_ℓ, z'_ℓ) ∈ ℤ^d, define the polynomials

g^{1d,ℓ}_j(t) := g((j/K, t z'_ℓ))  for all j ∈ [K],

that is, fix coordinate ℓ at j/K and restrict the remaining coordinates to dimensions [d] ∖ {ℓ} of the rank-1 lattice. Then for all one-dimensional frequencies ω ∈ [M],

(F_M g^{1d,ℓ}_j)_ω = Σ_{h_ℓ ∈ B_K s.t. (h_ℓ, k'_ℓ) ∈ I} e^{2πi j h_ℓ/K} ĝ_{(h_ℓ, k'_ℓ)}  if ∃ k ∈ I with ω ≡ k'_ℓ · z'_ℓ (mod M),

and (F_M g^{1d,ℓ}_j)_ω = 0 otherwise. Moreover, defining the matrix G^ℓ = ((F_M g^{1d,ℓ}_j)_ω)_{j∈[K], ω∈[M]}, we have

(F_K G^ℓ)_{k_ℓ mod K, k'_ℓ·z'_ℓ mod M} = ĝ_k  for all k ∈ I,

and the remaining entries of the matrix F_K G^ℓ ∈ ℂ^{K×M} are zero.

Proof. Using the Fourier series representation of g, we have

g^{1d,ℓ}_j(t) = Σ_{k∈I} ĝ_k e^{2πi(j k_ℓ/K + k'_ℓ·z'_ℓ t)}.

We calculate for ω ∈ [M]

(F_M g^{1d,ℓ}_j)_ω = (1/M) Σ_{i∈[M]} Σ_{h∈I} e^{2πi j h_ℓ/K} ĝ_h e^{2πi(h'_ℓ·z'_ℓ − ω)i/M} = Σ_{h∈I} e^{2πi j h_ℓ/K} ĝ_h δ_{0, (h'_ℓ·z'_ℓ − ω mod M)}.

When there exists some k ∈ I such that k'_ℓ·z'_ℓ ≡ ω (mod M), the fact that Λ(z, M) is a reconstructing rank-1 lattice for {k − k_ℓ e_ℓ | k ∈ I} ensures that the k'_ℓ satisfying this equivalence is unique. Then, we can simplify this sum to

(F_M g^{1d,ℓ}_j)_ω = Σ_{h_ℓ ∈ B_K s.t. (h_ℓ, k'_ℓ) ∈ I} e^{2πi j h_ℓ/K} ĝ_{(h_ℓ, k'_ℓ)}.

When no k ∈ I exists such that k'_ℓ·z'_ℓ ≡ ω (mod M), this sum is instead zero, as desired. Applying F_K to G^ℓ then allows us to compute

(F_K G^ℓ)_{k_ℓ mod K, k'_ℓ·z'_ℓ mod M} = (1/K) Σ_{j∈[K]} Σ_{h_ℓ∈B_K} ĝ_{(h_ℓ, k'_ℓ)} e^{2πi(h_ℓ − k_ℓ mod K)j/K} = ĝ_k.

Algorithm 3.2 Frequency Index Recovery by Two-Dimensional DFT
Input: A multivariate periodic function g ∈ W(𝕋^d) ∩ C(𝕋^d) (from which we are able to obtain potentially noisy samples), a multivariate frequency set I ⊂ B_K^d, a rank-1 lattice Λ(z, M) which is reconstructing for I and {k − k_ℓ e_ℓ | k ∈ I} for all ℓ ∈ [d], and an SFT algorithm A_{s,M}.
Output: Sparse coefficient vector ĝ^s = (ĝ^s_k)_{k∈B_K^d} (optionally supported on I, see Line 16), an approximation to (ĝ|_I)^{opt}_s.
1: Apply A_{s,M} to the univariate restriction of g to the lattice, g^{1d}(t) := g(tz), to produce ĝ^{1d,s} := A_{s,M} g^{1d}, a sparse approximation of F_M g^{1d} ∈ ℂ^{B_M}.
2: for all ℓ ∈ [d] do
3:   for all j ∈ [K] do
4:     Apply A_{s,M} to g^{1d,ℓ}_j(t) := g((j/K, t z'_ℓ)) to produce ĝ^{1d,ℓ,s}_j := A_{s,M} g^{1d,ℓ}_j, a sparse approximation of F_M g^{1d,ℓ}_j.
5:     Row j of G^{ℓ,s} ← ĝ^{1d,ℓ,s}_j.
6:   end for
7:   for all nonzero columns ω of G^{ℓ,s} do
8:     Apply F_K to column ω of G^{ℓ,s} to produce F_K G^{ℓ,s}.
9:   end for
10: end for
11: ĝ^s ← 0
12: for all ω ∈ supp(ĝ^{1d,s}) do
13:   for all ℓ ∈ [d] do
14:     ((k_ω)_ℓ, ∼) ← arg min{|ĝ^{1d,s}_ω − (F_K G^{ℓ,s})_{h,ω'}| : (h, ω') ∈ B_K × [M] with h z_ℓ + ω' ≡ ω mod M}
15:   end for
16:   if k_ω · z ≡ ω mod M (and optionally k_ω ∈ I) then
17:     ĝ^s_{k_ω} ← ĝ^s_{k_ω} + ĝ^{1d,s}_ω
18:   end if
19: end for

Example 3.2 and Lemma 3.3 explain the procedure in Lines 1 through 10 of Algorithm 3.2. However, some care must be taken when we assign rows of nonzero entries in the resulting matrix to coordinates of significant frequencies. The solution is the minimization problem in Line 14. This step uses column information as well as the values of the entries in the matrix to ensure that we properly match frequency components with the correct Fourier coefficient ĝ^{1d,s}_ω. The remainder of the algorithm is the same as Algorithm 3.1. Line 16 consists of the same check to ensure that recovered frequencies are correct, and if this check passes, the one-dimensional Fourier coefficient is assigned to its matched d-dimensional frequency.

Remark 3.4. We bring special attention to the fact that Algorithm 3.2 requires as input a rank-1 lattice Λ(z, M) which is reconstructing not only for I, but also for the projections of I of the form {k − k_ℓ e_ℓ | k ∈ I} for any ℓ ∈ [d]. For frequency sets I which are downward closed (that is, if I is such that for any k ∈ I and h ∈ ℤ^d, |h| ≤ |k| component-wise implies that h ∈ I), any reconstructing rank-1 lattice for I is necessarily one for the considered projections as well. Thus, for many frequency spaces of interest, e.g., hyperbolic crosses (cf. Remarks 3.2 and 3.3 as well as Section 3.4 below), any reconstructing rank-1 lattice for I will suffice as input to Algorithm 3.2.

3.3.2.1 Analysis of Algorithm 3.2

With the conceptual explanation of Algorithm 3.2 complete, we now provide error guarantees for its output.

Lemma 3.4 (General recovery result for Algorithm 3.2). Let g, I, and Λ(z, M) be as specified in the input to Algorithm 3.2.
Additionally, let A 𝑠,𝑀 be a noise-robust SFT algorithm satisfying the same constraints as in Lemma 3.2 with parameters 𝜏 and 𝜂∞ holding uniformly for each SFT performed in Algorithm 3.2. Collect the 𝜏-significant frequencies of 𝑔 into the set S𝜏 := {k ∈ I | | 𝑔ˆ k | > 𝜏} and assume that |S𝜏 | ≤ 𝑠. Then Algorithm 3.2 (ignoring the optional check on Line 16) will produce an 𝑠-sparse approximation of the Fourier coefficients of 𝑔 satisfying the error estimates p k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ 𝜂2 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + k 𝑔| ˆ I − 𝑔| ˆ S4𝜏 k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ2 (Z𝑑 ) k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ 𝜂1 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + 𝑔| ˆ I − 𝑔| ˆ S4𝜏 ℓ 1 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ1 (Z𝑑 ) . requiring O (𝑑𝐾 · 𝑃(𝑠, 𝑀)) total evaluations of 𝑔, in O (𝑑𝐾 (𝑅(𝑠, 𝑀) + 𝑠𝐾 log 𝐾)) total operations. Proof. We begin by assuming that 𝑔 is a trigonometric polynomial with supp( 𝑔) ˆ ⊂ I. Since Λ(z, 𝑀) is a reconstructing rank-1 lattice for I, the DFT-aliasing ensures that Line 1 of Algo- rithm 3.2 will return approximate coefficients uniquely corresponding to all 𝜏-significant frequen- cies k ∈ S𝜏 which we can label 𝑔ˆ k·z 1d,𝑠 mod 𝑀 . Additionally, Line 4 recovers approximations to all 𝜏-significant frequencies of F 𝑀 g1d,ℓ 𝑗 which have the form given in Lemma 3.3. In particular, if 68 k ∈ S𝜏 , we have   𝜏 < | 𝑔ˆ k | = F𝐾 Gℓ 𝑘 ℓ mod 𝐾,kℓ0 ·zℓ0 mod 𝑀 1 Õ   −2 𝜋i 𝑗 𝑘ℓ mod 𝐾 = F 𝑀 g1d,ℓ 𝑗 e 𝐾 𝐾 kℓ0 ·zℓ0 mod 𝑀 𝑗 ∈[𝐾] 1 Õ   ≤ F 𝑀 g1d,ℓ 𝑗 𝐾 kℓ0 ·zℓ0 mod 𝑀 𝑗 ∈[𝐾] ≤ max (F 𝑀 g1d,ℓ 𝑗 )kℓ ·zℓ mod 𝑀 . 0 0 𝑗 ∈[𝐾] Thus, there exists at least one F 𝑀 g1d,ℓ 𝑗 with k0ℓ · z0ℓ mod 𝑀 recovered as a 𝜏-significant frequency in the SFT of Line 4, and k0ℓ · z0ℓ mod 𝑀 will be a nonzero column in Gℓ,𝑠 for all k ∈ S𝜏 . Analyzing these SFTs in more detail for any k ∈ I such that k0ℓ · z0ℓ mod 𝑀 is a nonzero column of Gℓ,𝑠 , we write       ĝ1d,ℓ,𝑠 𝑗 = F 𝑀 g1d,ℓ 𝑗 + 𝜂ℓ𝑗 kℓ0 ·zℓ0 mod 𝑀 kℓ0 ·zℓ0 mod 𝑀 kℓ0 ·zℓ0 mod 𝑀 where, by the ℓ ∞ and recovery guarantees for A 𝑠,𝑀 , the error satisfies    if ĝ1d,ℓ,𝑠  ≠0   𝜂∞      𝑗 kℓ0 ·zℓ0 mod 𝑀 𝜂ℓ𝑗 ≤ ≤ 𝜏. kℓ0 ·zℓ0 mod 𝑀   if ĝ1d,ℓ,𝑠   𝜏  𝑗 =0 kℓ0 ·zℓ0 mod 𝑀   Thus, in the application of F𝐾 to column k0ℓ · z0ℓ mod 𝑀 of Gℓ,𝑠 , we have   ℓ,𝑠 F𝐾 G 𝑘 ℓ mod 𝐾,kℓ0 ·zℓ0 mod 𝑀    !   = F𝐾 Gℓ + F𝐾 𝜂ℓ𝑗 𝑘 ℓ mod 𝐾,kℓ0 ·zℓ0 mod 𝑀 kℓ0 ·zℓ0 mod 𝑀 𝑗 ∈[𝐾] 𝑘 ℓ mod 𝐾 =: 𝑔ˆ k + 𝜂kℓ with 1 Õ  ℓ −2 𝜋i 𝑗 𝑘ℓ mod 𝐾   |𝜂kℓ | = 𝜂𝑗 0 0 e 𝐾 ≤ max 𝜂ℓ𝑗 0 0 ≤ 𝜏. 𝐾 kℓ ·zℓ mod 𝑀 𝑗 ∈[𝐾] kℓ ·zℓ mod 𝑀 𝑗 ∈[𝐾] 69 These same calculations apply to the computed columns of F𝐾 Gℓ,𝑠 which do not correspond to values of k0ℓ · z0ℓ mod 𝑀 for some k ∈ I since we assume supp( 𝑔) ˆ ⊂ I, and so at worst, these columns are filled with noise bounded in magnitude by 𝜏. Restricting our attention to k ∈ S4𝜏 ⊂ S𝜏 , we know that Line 14 will be run with 𝜔 = k · z mod 𝑀 and (𝑘 ℓ mod 𝐾, k0ℓ ·z0ℓ mod 𝑀) as an admissible index in the minimization. By the reconstructing property of Λ(z, 𝑀), no other h ∈ I will correspond to an admissible index (ℎℓ mod 𝐾, h0ℓ · z0ℓ mod 𝑀), and so the only remaining values of (F𝐾 Gℓ,𝑠 ) ℎ,𝜔0 in the minimization correspond to pure noise 𝜂 bounded in magnitude by 𝜏. Analyzing the objective at (𝑘 ℓ mod 𝐾, k0ℓ · z0ℓ mod 𝑀), we find 1d,𝑠 1d,𝑠 | 𝑔ˆ k·z mod 𝑀 − (F𝐾 Gℓ,𝑠 ) 𝑘 ℓ mod 𝐾,kℓ0 ·zℓ0 mod 𝑀 | ≤ 2𝜏 < | 𝑔ˆ k | − 2𝜏 ≤ | 𝑔ˆ k·z mod 𝑀 − 𝜂|, and so the value for (𝑘 𝜔 )ℓ will in fact be assigned 𝑘 ℓ . Thus, after all 𝑑 components of k𝜔 = k have been recovered, 𝑔ˆ k𝑠 will be assigned 𝑔ˆ k·z 1d,𝑠 mod 𝑀 . 
The remaining max(𝑠 − |S4𝜏 |, 0) nonzero entries of ĝ1d,𝑠 can be distributed to entries of ĝ𝑠 possibly correctly but with no guarantee; at the very least however, these values must be at most 4𝜏+ 𝜂∞ in magnitude. We split ĝ𝑠 as ĝ𝑠 = ĝ𝑠,correct + ĝ𝑠,incorrect to account for the values of ĝ𝑠 respectively assigned correctly and incorrectly and note that supp( ĝ𝑠,correct ) ⊃ S4𝜏 . We then estimate the ℓ 2 error as k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ k ĝ𝑠,correct − 𝑔| ˆ supp(ĝ𝑠,correct ) k ℓ2 (Z𝑑 ) + k ĝ𝑠,incorrect k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔|ˆ supp(ĝ𝑠,correct ) k ℓ2 (Z𝑑 ) p ≤ 𝜂2 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + k 𝑔ˆ − 𝑔| ˆ S4𝜏 k ℓ2 (Z𝑑 ) and the ℓ 1 error as k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ ĝ𝑠,correct − 𝑔| ˆ supp(ĝ𝑠,correct ) ℓ 1 (Z𝑑 ) + ĝ𝑠,incorrect ℓ 1 (Z𝑑 ) + 𝑔ˆ − 𝑔| ˆ supp(ĝ𝑠,correct ) ℓ 1 (Z𝑑 ) ≤ 𝜂1 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + 𝑔ˆ − 𝑔| ˆ S4𝜏 ℓ 1 (Z𝑑 ) . As in the proof of Lemma 3.2, we note that the mandatory check in Line 16 helps ensure that all misassigned values 𝑔ˆ 𝜔1d,𝑠 which contribute to ĝ𝑠,incorrect correspond to reconstructed k𝜔 outside of I, with the optional check in this line (see Remark 3.2) eliminating ĝ𝑠,incorrect and the corresponding term in the error estimate entirely. 70 Now, supposing that the Fourier support of 𝑔 is not limited to only I, just as in the analysis for Algorithm 3.1, we treat 𝑔 as a perturbation of 𝑔| I , and use the robust SFT algorithm and the previous argument to approximate 𝑔| ˆ I . Note again that in each SFT, the noise added when using measurements of 𝑔 as proxies for those of 𝑔| I is compounded by k𝑔| Z𝑑 \I k ∞ and is bounded by ˆ I k ℓ1 (Z𝑑 ) . Applying the guarantees above gives the ℓ 2 estimate k 𝑔ˆ − 𝑔| k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ k ĝ𝑠 − 𝑔| ˆ I k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ2 (Z𝑑 ) p ≤ 𝜂2 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + k 𝑔| ˆ I − 𝑔| ˆ S4𝜏 k ℓ2 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ2 (Z𝑑 ) and the ℓ 1 estimate k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ k ĝ𝑠 − 𝑔| ˆ I k ℓ1 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ1 (Z𝑑 ) ≤ 𝜂1 + (4𝜏 + 𝜂∞ ) max(𝑠 − |S4𝜏 |, 0) + 𝑔| ˆ I − 𝑔| ˆ S4𝜏 ℓ 1 (Z𝑑 ) + k 𝑔ˆ − 𝑔| ˆ I k ℓ1 (Z𝑑 ) . Employing fast Fourier transforms for the at most 𝑑𝑠𝐾 DFTs, the computational complexity of Lines 2 to 10 is O 𝑑 (𝐾 · 𝑅(𝑠, 𝑀) + 𝑠𝐾 2 log 𝐾) (which dominates the complexity of the remainder  of the algorithm). Since 1 + 𝑑𝐾 SFTs are required, the number of 𝑔 evaluations is O (𝑑𝐾 · 𝑃(𝑠, 𝑀)). Remark 3.5. Though the number of nonzero columns of Gℓ,𝑠 can be theoretically at most 𝑠𝐾, in practice with a high quality algorithm, each of the 𝐾 SFTs should recover nearly the same frequen- cies, meaning that there are actually O (𝑠) columns. This would remove a power of 𝐾 in the second term of the runtime estimate. Note however, that even with near exact SFT algorithms, recovering exactly 𝑠 total frequencies is not a certainty. There can be cancellations for certain terms in F 𝑀 g1d,ℓ 𝑗 depending interactions between the coefficients sharing the same values on their [𝑑] \ {ℓ} entries, which makes it possible that an SFT on F 𝑀 g1d,ℓ 𝑗 will miss coefficients. If required to output 𝑠-entries, an SFT algorithm could favor some noisy value corresponding to a frequency outside the support. Remark 3.6. Though we perform an exact FFT of the nonzero columns of G1d,ℓ in Line 8 of Algo- rithm 3.2, Lemma 3.3 implies that the resulting matrix will be as sparse as the original function’s Fourier transform. Thus, for a truly compressible function, an SFT down the columns of G1d,ℓ 71 would be feasible as well. However, in especially higher dimensions, even small 𝐾 can allow for large frequency spaces I. 
In these large frequency spaces, what is perceived as relatively sparse can therefore quickly surpass 𝐾, rendering an 𝑠-sparse, length 𝐾 SFT useless. As a simple example, consider I to be the cube of side length 𝐾 = 𝑠, B𝑠𝑑 . For 𝑑 large enough, any frequency support of size 𝑠 will be small in comparison to |I| = 𝑠 𝑑 . However, using an 𝑠-sparse SFT instead of a length-𝑠 DFT in Algorithm 3.2 will actually be more expensive. Applying the discrete sublinear-time SFT from Theorem 3.2 to Lemma 3.4 analogously to the derivation of Corollary 3.1 from Lemma 3.2 allows for the following recovery bound for Algo- rithm 3.2. In particular, we observe asymptotically improved error guarantees over Corollary 3.1 at the cost of a slight increase in runtime. Corollary 3.3 (Algorithm 3.2 with discrete sublinear-time SFT). For I ⊂ Z𝑑 with reconstructing rank-1 lattice Λ(z, 𝑀) and the function 𝑔 ∈ 𝑊 (T𝑑 ) ∩ 𝐶 (T𝑑 ), we consider applying Algorithm 3.2 where each function sample may be corrupted by noise at most 𝑒 ∞ ≥ 0 in absolute magnitude. Using the discrete sublinear-time SFT algorithm A2𝑠,𝑀 disc or A disc,MC with parameter 1 ≤ 𝑟 ≤ 2𝑠,𝑀 𝑀 36 will produce ĝ𝑠 = ( 𝑔ˆ k𝑠 )k∈B 𝑑 a 2𝑠-sparse approximation of 𝑔ˆ satisfying the error estimates 𝐾 opt k 𝑔| ˆ I − ( 𝑔| ˆ I )𝑠 k1 √ 𝑠 k ĝ − 𝑔kˆ ℓ2 (Z𝑑 ) ≤ 206 √ + 821 𝑠(k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) 𝑠 opt k ĝ𝑠 − 𝑔k ˆ ℓ1 (Z𝑑 ) ≤ 293 𝑔|ˆ I − ( 𝑔| ˆ I )𝑠 + 1161𝑠 (k𝑔k ∞ 𝑀 −𝑟 + k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑒∞) 1 albeit with probability 1 − 𝜎 ∈ [0, 1) for the Monte Carlo version. The total number of evaluations of 𝑔 and the computational complexity will be !! 𝑠𝑟 3/2 log11/2 𝑀 O 𝑑𝑠𝐾 + 𝐾 log 𝐾 log 𝑠      𝑑𝐾 𝑀 or O 𝑑𝑠𝐾 𝑟 log (𝑀) log 3/2 9/2 + 𝐾 log 𝐾 𝜎 for A2𝑠,𝑀 disc or A disc,MC respectively. 2𝑠,𝑀 Again, the same strategy from Corollary 3.2 of widening the frequency band and shifting the one-dimensional transforms accordingly allows us to use the nonequispaced SFT algorithm from Theorem 3.1 in Algorithm 3.2. Note here that the widening and shifting occurs on a dimension by 72 dimension basis so as to account for the differing one-dimensional frequencies of the form k0ℓ · z0ℓ for k ∈ I. Corollary 3.4 (Algorithm 3.2 with nonequispaced sublinear-time SFT). For I ⊂ B𝐾𝑑 , let 𝑀˜ be the larger one-dimensional bandwidth parameter from Corollary 3.2, and additionally define 𝑀˜ ℓ := 2 maxk∈I |k0ℓ · z0ℓ | + 1. For Λ(z, 𝑀), a reconstructing rank-1 lattice for I and where 𝑀 is such that 𝑀 ≤ min{ 𝑀, ˜ minℓ∈[𝑑] 𝑀˜ ℓ }, for the function 𝑔 ∈ 𝑊 (T𝑑 ) ∩ 𝐶 (T𝑑 ), we consider applying Algorithm 3.2 where each function sample may be corrupted by noise at most 𝑒 ∞ ≥ 0 in absolute magnitude with the following modifications: 1. use the sublinear-time SFT algorithm A2𝑠, sub or A sub,MC in Line 1 and A sub 𝑀˜ 2𝑠, 𝑀˜ 2𝑠, 𝑀˜ ℓ or A2𝑠, sub,MC 𝑀˜ ℓ in Line 4 2. and only check equality against 𝜔 in Line 14 (rather than equivalence modulo 𝑀), to produce ĝ𝑠 = ( 𝑔ˆ k𝑠 )k∈B 𝑑 a 2𝑠-sparse approximation of 𝑔ˆ satisfying the error estimates 𝐾 ! opt k ˆ 𝑔| I − ( ˆ 𝑔| I 𝑠) k 1 √ √ k ĝ𝑠 − 𝑔k ˆ ℓ2 (Z𝑑 ) ≤ 98 √ + 𝑠k 𝑔ˆ − 𝑔| ˆ I k 1 + 𝑠𝑒 ∞ 𝑠   opt k ĝ𝑠 − 𝑔kˆ ℓ1 (Z𝑑 ) ≤ 139 𝑔| ˆ I − ( 𝑔| ˆ I )𝑠 ˆ I k 1 + 𝑠𝑒 ∞ , + 𝑠k 𝑔ˆ − 𝑔| 1 albeit with probability 1 − 𝜎 ∈ [0, 1) for the Monte Carlo version. Letting 𝑀¯ = max( 𝑀, ˜ maxℓ∈[𝑑] 𝑀˜ ℓ ), the total number of evaluations of 𝑔 will be 𝑑𝐾 𝑠2 log4 𝑀¯ 𝑑𝐾 𝑀¯      O or O 𝑑𝐾 𝑠 log 𝑀 log 3 ¯ log 𝑠 𝜎 with associated computational complexities 𝑠 log4 𝑀¯ 𝑑𝐾 𝑀¯         O 𝑑𝐾 𝑠 + 𝐾 log 𝐾 or O 𝑑𝐾 𝑠 log 𝑀 log 3 ¯ + 𝐾 log 𝐾 log 𝑠 𝜎 for A2𝑠,· sub and A sub,MC respectively. 2𝑠,· Remark 3.7. 
The bounds in Remark 3.3 will still hold for M̃_ℓ as well; thus, one of these upper bounds can be used as the effective bandwidth parameter for every SFT without having to calculate the d + 1 bandwidths by scanning I. Again, however, if this scan is tolerable, one can reduce the overall complexity by using the analogous minimal bandwidths discussed in Remark 3.3 along with corresponding frequency shifts.

3.4 Numerics

We now demonstrate the effectiveness of our phase encoding and two-dimensional DFT algorithms for computing Fourier coefficients of multivariate functions in a series of empirical tests. The two techniques are implemented in MATLAB, with the code for the algorithms and tests in this section publicly available¹. The results below use a MATLAB implementation² of the randomized univariate sublinear-time nonequispaced algorithm A^{sub,MC}_{2s,M} (cf. Theorem 3.1) as the underlying SFT for both multivariate approaches, as this allows for the fastest runtimes and most sample-efficient implementations. In the univariate code, all parameters but one are qualitatively tuned below theoretical upper bounds to increase efficiency while maintaining accuracy, and they are kept constant between the tests below. In particular, we fix the values C := 1, sigma := 2/3, and primeShift := 0 (see the documentation and the original paper [37] for more detail). The only parameter we vary is "randomScale", which affects the rate at which the deterministic algorithm A^{sub}_{2s,M} is randomly sampled to produce the Monte Carlo version A^{sub,MC}_{2s,M}. This parameter represents a multiplicative scaling on logarithmic factors of the bandwidth which determines how many prime numbers are randomly selected from those used in the deterministic SFT implementation. Therefore, lower values of "randomScale" will result in using fewer prime numbers, decreasing the number of function samples and overall runtime at the risk of a higher probability of failure. We consider values well below the code default and theoretical upper bound of 21 given in [37].

¹available at https://gitlab.com/grosscra/Rank1LatticeSparseFourier
²available at https://gitlab.com/grosscra/SublinearSparseFourierMATLAB

3.4.1 Exactly sparse case

To begin, we consider the case of multivariate trigonometric polynomials with frequencies supported within hyperbolic cross index sets. We define the d-dimensional hyperbolic cross frequency set

H_K^d := {k ∈ ℤ^d : ∏_{ℓ∈[d]} max(1, |k_ℓ|) ≤ K/2 and max_{ℓ∈[d]} k_ℓ < K/2} ⊂ B_K^d,

where the second condition ensures that H_K^d is of expansion K ∈ ℕ. For a given sparsity s, we choose s many frequencies uniformly at random from H_K^d, and we randomly draw corresponding Fourier coefficients ĝ_k from [−1, 1] + i[−1, 1] with |ĝ_k| ≥ 10^{−3}. For each parameter setting, we perform the tests 100 times. Over these tests, we determine the success rate as the percentage of runs in which all frequencies were correctly identified in the output. We focus on frequency identification since this is the core issue that Algorithms 3.1 and 3.2 solve, with the coefficient estimates carrying over directly from the SFT algorithm. Moreover, with the s most significant frequencies identified, any alternative method for quickly computing the corresponding Fourier coefficients (if those from A_{s,M} are not tolerable) can be performed. Nevertheless, see the experiments following those in Section 3.4.1.1 for examples where we compute ℓ² errors in the coefficient vectors rather than just comparing frequencies.
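As an aside, the definition above also makes the optional membership check from Remark 3.2 concrete: testing k ∈ H_K^d costs only O(d) operations. The following MATLAB one-liner is a minimal illustration of ours, independent of the published test code.

```matlab
% O(d) membership test for the hyperbolic cross H_K^d defined above
inHC = @(k, K) prod(max(1, abs(k))) <= K/2 && max(k) < K/2;

inHC([2 -2 2 -1 zeros(1, 6)], 33)   % product 2*2*2*1*...*1 = 8 <= 16.5: true
inHC([2  2 2  2 2 zeros(1, 5)], 33) % product 2^5 = 32 > 16.5:        false
```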
3.4.1.1 Randomized frequency sets within the 10-dimensional hyperbolic cross and high- dimensional full cuboids We set the spatial dimension 𝑑 := 10, the expansion 𝐾 := 33, and use I := H33 10 as set of possible frequencies with cardinality |I| = 45 548 649. Then, the rank-1 lattice with generating vector z :=(1, 33, 579, 3 628, 21 944, 169 230, (3.11) > 1 105 193, 7 798 320, 49 768 670, 320 144 128) and lattice size 𝑀 := 2 040 484 044 is a reconstructing one. We apply Algorithm 3.1 and Algo- rithm 3.2 with the SFT algorithm A2𝑠, sub,MC 𝑀˜ . In Figure 3.4a, the success rate over 100 test runs is plotted against the sparsity values 𝑠 ∈ {10, 20, 50, 100, 200, 500, 1000} for Algorithm 3.1 and 𝑠 ∈ {10, 20, 50, 100} for Algorithm 3.2. In Figure 3.4b, the average numbers of samples over 100 tests are reported. The magenta line with circles corresponds to Algorithm 3.1 with bandwidth parameter 𝑀˜ = 𝑑𝐾 𝑀 ≈ 6.7 · 1011 and randomScale = 0.3. We observe that the number of samples grow nearly linearly with respect to the sparsity 𝑠. Moreover, the success rate is at least 0.99 (99 out of 100 test runs), where we define success such that the support of output (sparse coefficient vector) contains the true frequencies. Next, we reduce the bandwidth 𝑀˜ to 1 + 2kzk ∞ maxk∈I kkk 1 ≈ 1.6 · 1010 (see also Remark 3.3) 75 𝑀 ∼𝑠 phase bwℓ ∞ rs=0.3 phase bwℓ 1 rs=0.3 phase bwℓ ∞ rs=0.3 phase bwℓ 1 rs=0.3 phase bwℓ 1 rs=0.5 2dim bwℓ 1 rs=0.3 phase bwℓ 1 rs=0.5 2dim bwℓ 1 rs=0.3 2dim bwℓ 1 rs=0.5 2dim bwℓ 1 rs=0.5 1.00 109 success rate samples 0.95 108 107 0.90 10 20 50 100 200 500 1,000 10 20 50 100 200 500 1,000 sparsity 𝑠 sparsity 𝑠 (a) Success rates vs. sparsity 𝑠. (b) Samples vs. sparsity 𝑠. Figure 3.4 Success rates and average number of samples over 100 test runs for Algorithm 3.1 with sub,MC A2𝑠, 𝑀˜ , denoted by “phase”, and Algorithm 3.2 with A2𝑠, sub,MC 𝑀˜ , denoted by “2dim”, on random multivariate trigonometric polynomials, setting randomScale := rs. Random frequencies are cho- sen from hyperbolic cross I := H33 10 . “bwℓ ∞ ” and “bwℓ 1 ” respectively correspond to the bandwidth parameters 𝑀˜ = 𝑑𝐾 𝑀 with approximate value 6.7 · 1011 and 𝑀˜ = 1 + 2kzk ∞ maxk∈I kkk 1 with approximate value 1.6 · 1010 . and visualize this as solid blue line with squares. This smaller bandwidth causes a decrease in the number of samples of up to 50 percent while only mildly decreasing the success rates to values not below 0.90. Increasing the randomScale parameter to 0.5, denoted by dashed blue line with squares, raises the success rate to 1.00 while achieving still fewer samples than bandwidth parameter 𝑀˜ = 𝑑𝐾 𝑀 ≈ 6.7·1011 and randomScale = 0.3 (solid magenta line with circles). The numbers of samples for Algorithm 3.2 are plotted as solid and dashed red lines with triangles for randomScale = 0.3 and 0.5, respectively, choosing the bandwidth 𝑀˜ := 1 + 2kzk ∞ maxk∈I kkk 1 ≈ 1.6 · 1010 . We observe that Algorithm 3.2 requires a much larger number of samples, more than one order of magnitude, compared to Algorithm 3.1, while achieving similar success rates. For comparison, in the case of sparsity 𝑠 = 100 and randomScale = 0.5, Algorithm 3.2 takes almost 𝑀 = 2 040 484 044 samples, the number to use a non-SFT, standard rank-1 lattice FFT. 76 phase rs=0.3 2dim rs=0.3 ∼𝑑 ∼ 𝑑2 phase rs=0.3 2dim rs=0.3 1.00 success rate samples 109 0.99 0.98 108 10 11 12 13 14 15 16 17 18 19 20 10 11 12 13 14 15 16 17 18 19 20 dimension 𝑑 dimension 𝑑 (a) Success rate vs. spatial dimension 𝑑. (b) Samples vs. spatial dimension 𝑑. 
[Figure 3.5: Success rates (a) and average number of samples (b) vs. spatial dimension d over 100 test runs for Algorithm 3.1 with the SFT algorithm A^{sub,MC}_{2s,M̃}, denoted by "phase", and Algorithm 3.2 with A^{sub,MC}_{2s,M̃}, denoted by "2dim", on random multivariate trigonometric polynomials, setting randomScale := rs. Random frequencies are chosen from a full cuboid of cardinality |I| ≈ 10^12 with lattice size M = |I| and bandwidth parameter M̃ = M.]

In Figure 3.5b, we investigate the dependence of the required number of samples of Algorithms 3.1 and 3.2 on the spatial dimension d, where we consider the values d ∈ {10, 11, …, 20}. As before, the success rates are reported in Figure 3.5a. For this, we use a slightly different setting, where we choose s = 100 random frequencies from a full cuboid of cardinality ≈ 10^12. Note that a cuboid with edge lengths K_1, K_2, …, K_d has the rank-1 lattice construction

z = (1, K_1, K_1·K_2, …, K_1·K_2⋯K_{d−1}) = (∏_{j∈[ℓ]} K_j)_{ℓ∈[d]}

with lattice size M = ∏_{ℓ∈[d]} K_ℓ = |I|. The main benefit of this construction is that the map k ↦ k·z is a bijection between I and B_M. Thus, the one-dimensional bandwidth parameter M̃ = 2 max_{k∈I} |k·z| + 1 (which is usually larger than M) in this case coincides with M = |I|. By choosing cuboids in this experiment which have approximately the same cardinality, we remove any dependence on M̃ in our experiments, allowing us to focus on the dependence on d.

In our examples, the cuboids are constructed by manually tuning the edge lengths for each dimension so that the total cardinality is ≈ 10^12. One way to start this procedure is by computing (10^12)^{1/d} and then choosing d edge lengths that approximately average to this value. From here, the edge lengths can be qualitatively tweaked to arrive at a cuboid of the desired size. For instance, we utilize the cuboid I := {−8, −7, …, 7}^9 × {−7, −6, …, 7}, |I| ≈ 1.03·10^12, in the case d = 10 and I := {−2, −1, …, 2} × {−2, −1, 0, 1}^18 × {−1, 0, 1}, |I| ≈ 1.03·10^12, for d = 20. Since the expansion K is a factor in the number of samples of Algorithm 3.2 (cf. Corollary 3.4) and we want to concentrate on the dependence on the spatial dimension d, we now fix this parameter to K := 16 independent of d. Moreover, the randomScale parameter is set to 0.3. The plots indicate that the number of samples grows approximately linearly with respect to the dimension d, as stated by Corollaries 3.2 and 3.4 for Algorithms 3.1 and 3.2, respectively. The success rates are slightly better compared to the tests from Figure 3.4a.

3.4.1.2 Random frequency sets within the 10-dimensional hyperbolic cross and noisy samples

In this section, we again consider random multivariate trigonometric polynomials with frequencies supported within the hyperbolic cross index set H_33^10 of expansion K = 33 and use the reconstructing rank-1 lattice with generating vector z as stated in (3.11) and size M := 2 040 484 044. Similarly as in [43, Section 5.2], we perturb the samples of the trigonometric polynomial by additive complex (white) Gaussian noise ε_j ∈ ℂ with zero mean and standard deviation σ. The noise is generated by ε_j := (σ/√2)(ε_{1,j} + i ε_{2,j}), where ε_{1,j}, ε_{2,j} are independent standard normally distributed. Since the signal-to-noise ratio (SNR) can be approximately computed by

SNR ≈ (Σ_{j=0}^{M−1} |g(x_j)|²/M) / (Σ_{j=0}^{M−1} |ε_j|²/M) ≈ (Σ_{k∈supp(ĝ)} |ĝ_k|²) / σ²,

this leads to the choice σ := √(Σ_{k∈supp(ĝ)} |ĝ_k|² / SNR) for a targeted SNR value.
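In code, this noise model amounts to only a few lines; the MATLAB sketch below is ours, with ghat and samples standing in for the test polynomial's true coefficients and its lattice samples.

```matlab
% Additive complex white Gaussian noise scaled to a targeted linear SNR
SNR = 100;                                     % targeted linear SNR
sigma = sqrt(sum(abs(ghat).^2)/SNR);           % sigma from the formula above
noise = sigma/sqrt(2)*(randn(size(samples)) + 1i*randn(size(samples)));
noisySamples = samples + noise;
```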
The SNR is often expressed on the logarithmic decibel scale (dB), with SNR_dB = 10 log_10 SNR and SNR = 10^{SNR_dB/10}; i.e., a linear SNR = 10² corresponds to a logarithmic SNR_dB = 20, and SNR = 10³ corresponds to SNR_dB = 30. Here, our tests use sparsity s = 100 and signal-to-noise ratios SNR_dB ∈ {0, 5, 10, 15, 20, 25, 30}. Moreover, we only use the bandwidth parameter M̃ = 1 + 2‖z‖_∞ max_{k∈I} ‖k‖_1 ≈ 1.6·10^10. Besides that, we choose the algorithm parameters as in Figure 3.4.

[Figure 3.6: Average success rates (all frequencies detected, panel (a)) and relative ℓ² errors (panel (b)) vs. noise level for s = 100 over 100 test runs for Algorithm 3.1 with A^{sub,MC}_{2s,M̃}, denoted by "phase", and Algorithm 3.2 with A^{sub,MC}_{2s,M̃}, denoted by "2dim", on random multivariate trigonometric polynomials supported on the hyperbolic cross I := H_33^10, setting randomScale := rs ∈ {0.3, 0.5} and using the bandwidth parameter M̃ = 1 + 2‖z‖_∞ max_{k∈I} ‖k‖_1 ≈ 1.6·10^10.]

In Figure 3.6a, we visualize the success rates depending on the noise level. For randomScale ∈ {0.3, 0.5} and both algorithms, the success rates start at less than 0.12 for SNR_dB = 0 and grow for increasing signal-to-noise ratios until reaching at least 0.90 for SNR_dB = 30. The success rates of Algorithm 3.2 with A^{sub,MC}_{2s,M̃} ("2dim") are often higher than for Algorithm 3.1 with A^{sub,MC}_{2s,M̃} ("phase"), which may be caused by the larger number of samples for Algorithm 3.2 and the noise model used. Note that the numbers of samples correspond to those in Figure 3.4b for s = 100, independent of the noise level. For Algorithm 3.2 with randomScale = 0.3, the increase of the success rate seems to stagnate at SNR_dB = 20, while this does not seem to be the case for randomScale = 0.5 or Algorithm 3.1. This behavior can also be observed in Figure 3.6b, where we plot the average relative ℓ² error of the Fourier coefficients against the signal-to-noise ratio. Here, we observe that for randomScale = 0.3, the decrease of the errors for increasing SNR_dB values almost stops once reaching SNR_dB = 20 for both algorithms. Initially, the average error of Algorithm 3.2 is smaller, but at SNR_dB = 15 and higher, the average error of Algorithm 3.1 is smaller. In the case of randomScale = 0.5, we observe a distinct decrease for growing signal-to-noise ratios for both algorithms.

3.4.1.3 Deterministic frequency set within the 10-dimensional hyperbolic cross and noisy samples

Next, instead of randomly chosen frequencies, we now consider frequencies on a d-dimensional weighted hyperbolic cross

H_K^{d,α} := {k ∈ ℤ^d : ∏_{ℓ∈[d]} max(1, (ℓ+1)^α |k_ℓ|) ≤ K/2 and max_{ℓ∈[d]} k_ℓ < K/2}.

Here, we use d = 10, K = 33, I := H_33^10, and α = 1.7, which yields s = |H_33^{10,1.7}| = 101. Again, the Fourier coefficients ĝ_k are randomly chosen from [−1, 1] + i[−1, 1] with |ĝ_k| ≥ 10^{−3}. We use the same lattice and bandwidth parameter as in the last subsection as well as the same noise model and parameters.

[Figure 3.7: Average success rates (all frequencies detected, panel (a)) and relative ℓ² errors (panel (b)) vs. noise level for s = 100 over 100 test runs for Algorithm 3.1 with A^{sub,MC}_{2s,M̃}, denoted by "phase", and Algorithm 3.2 with A^{sub,MC}_{2s,M̃}, denoted by "2dim", on multivariate trigonometric polynomials with (deterministic) frequencies on a weighted hyperbolic cross within the hyperbolic cross I := H_33^10, setting randomScale := rs ∈ {0.3, 0.5} and bandwidth parameter M̃ = 1 + 2‖z‖_∞ max_{k∈I} ‖k‖_1 ≈ 1.6·10^10.]
In Figure 3.7, we depict the obtained results. In particular, the results in Figure 3.7a are very similar to the ones for randomly chosen frequencies in Figure 3.6a. For the case of deterministic frequencies in Figure 3.7a, the success rates are slightly better. Moreover, we do not observe the "stagnation" of the success rates for Algorithm 3.2 with randomScale = 0.3. Correspondingly, the relative ℓ² errors, as shown in Figure 3.7b, decrease distinctly for growing signal-to-noise ratios. Algorithm 3.2 performs slightly better than Algorithm 3.1, but also requires more than one order of magnitude more samples, similar to the results shown in Figure 3.4b for s = 100.

3.4.2 Compressible case in 10 dimensions

In this section, we apply the methods to a test function which is not exactly sparse but compressible. In addition, we also consider noisy samples as in Section 3.4.1.2. We use the 10-variate periodic test function g : 𝕋^10 → ℝ,

g(x) := ∏_{ℓ∈{0,2,7}} K_2(x_ℓ) + ∏_{ℓ∈{1,4,5,9}} K_4(x_ℓ) + ∏_{ℓ∈{3,6,8}} K_6(x_ℓ),    (3.12)

from [59, Section 3.3] and [43, Section 5.3], which has infinitely many non-zero Fourier coefficients ĝ_k, where K_m : 𝕋 → ℝ is the B-spline of order m ∈ ℕ,

K_m(x) := C_m Σ_{k∈ℤ} sinc(πk/m)^m (−1)^k e^{2πikx},

with a constant C_m > 0 such that ‖K_m‖_{L²(𝕋)} = 1. We remark that each B-spline K_m of order m ∈ ℕ is a piecewise polynomial of degree m − 1. We apply Algorithm 3.1 with A^{sub,MC}_{2s,M̃} and use the sparsity parameters s ∈ {50, 100, 250, 500, 1000, 2000}, which corresponds to 2s ∈ {100, 200, 500, 1000, 2000, 4000}-many frequencies and Fourier coefficients in the output of Algorithm 3.1. We use the frequency set I := H_33^10 and randomScale := rs ∈ {0.05, 0.1}. Moreover, we work with the same rank-1 lattice as in Section 3.4.1.2.

The obtained index sets supp(ĝ^s) should "consist of" the union of three lower-dimensional manifolds: a three-dimensional hyperbolic cross in the dimensions 1, 3, 8; a four-dimensional hyperbolic cross in the dimensions 2, 5, 6, 10; and a three-dimensional hyperbolic cross in the dimensions 4, 7, 9. All tests are performed 100 times, and the relative L² approximation error

‖g − g^s‖_{L²}/‖g‖_{L²} = √(‖g‖²_{L²} − Σ_{k∈supp(ĝ^s)} |ĝ_k|² + Σ_{k∈supp(ĝ^s)} |ĝ^s_k − ĝ_k|²) / ‖g‖_{L²}

is computed each time.
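Given the exact norm ‖g‖_{L²} and the exact coefficients of g on the recovered support (both available in closed form for the B-spline test function), this error reduces to a couple of lines of MATLAB; the sketch below is ours, using the placeholder names normG_L2, ghatTrue, and ghatS for the exact norm, the exact coefficients on supp(ĝ^s), and the computed coefficients.

```matlab
% Relative L^2 approximation error from the formula above
errSq = normG_L2^2 - sum(abs(ghatTrue).^2) + sum(abs(ghatS - ghatTrue).^2);
errSq = max(errSq, 0);               % guard against round-off below zero
relErr = sqrt(errSq)/normG_L2;
```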
[Figure 3.8: Average number of samples (a) and relative L² errors (b) vs. the sparsity 2s of the approximation over 100 test runs for Algorithm 3.1 with A^{sub,MC}_{2s,M̃} on the 10-dimensional test function (3.12) consisting of tensor products of B-splines of different orders. The search space is the unweighted hyperbolic cross I := H_33^10 with SFT parameters randomScale := rs ∈ {0.05, 0.1} and M̃ = 1 + 2‖z‖_∞ max_{k∈I} ‖k‖_1 ≈ 1.6·10^10.]

In Figure 3.8a, we visualize the average number of samples against the sparsity 2s of the approximation. We observe an almost linear increase with respect to 2s. In Figure 3.8b, we show the average relative errors for randomScale ∈ {0.05, 0.1} in the noiseless case as well as randomScale = 0.1 for SNR_dB ∈ {10, 20, 30}. In general, for increasing sparsity, the errors become smaller. For randomScale = 0.05 in the noiseless case and randomScale = 0.1 with SNR_dB = 10, the average error is similar and stays above 3·10^{−2} even for sparsity 2s = 4000. For higher signal-to-noise ratios, the error decreases further. For SNR_dB = 30, the obtained average error is 6.1·10^{−3} for 2s = 4000, which is only approximately twice as high as the best possible error when using the 2s largest (in magnitude) Fourier coefficients ĝ_k with the restriction k ∈ I := H_33^10. The latter is plotted in Figure 3.8b as a dashed line without markers.

CHAPTER 4

SPARSE FOURIER SPECTRAL METHODS FOR SOLVING PDE

As discussed in Section 1.1.3, this chapter focuses on a sparse spectral method for solving elliptic PDEs. We begin with a review of the literature on sparse spectral methods, against which we motivate our results, in Section 4.1. Section 4.2 gives the advection-diffusion-reaction PDE setup, and Section 4.3 converts this problem to its Galerkin representation underpinning the spectral method approach. The following three sections provide the ingredients outlined by Strang's lemma, Lemma 1.1, in Section 1.1.3:

1. a Fourier series truncation method for the solution and the resulting error analysis (Section 4.4),
2. a (sparse) Fourier series approximation technique (Section 4.5), and
3. a version of Strang's lemma that ties everything together (Section 4.6).

We close with a numerics section, Section 4.7, describing the implementation of our technique and a variety of numerical experiments demonstrating the theory.

4.1 Overview of results and prior work

We now outline some of the previous literature on spectral methods with an emphasis on exploiting sparsity. Along the way, various shortcomings will arise, and we will use these as opportunities to motivate and explain our approach in the sequel.

4.1.1 Prior attempts to relieve dependence on bandwidth via SFT-type methods

A key work pioneering the use of SFTs in computing solutions to PDEs is due to Daubechies et al. [21]. This work mostly focuses on time-dependent, one-dimensional problems where the spectral scheme is formulated as alternating Fourier projections and time steps. Thus, there is no need to impose an a priori Fourier basis truncation on the solution. The proposed projection step instead utilizes an SFT at each time step to adaptively retain the most significant frequencies throughout the time-stepping procedure. Time-independent problems like (1.3) can then be handled by stepping in time until a stationary solution is obtained.

A simplified form of this algorithm is shown to succeed numerically in [21], and it is also analyzed theoretically in the case where the diffusion coefficient consists of a known, fine-scale mode superimposed over lower frequency terms. There, the Fourier-projection step can be considered to be fixed.
However, removing the known fine-scale assumption leads to many difficulties, including the possibility of sparsity-induced omissions in early time steps cascading into larger errors later on. In this chapter, on the other hand, we focus on the case of time-independent problems. This allows us to utilize SFTs only once, initially, and by doing so we avoid the possibility of SFT-induced error accumulation over many time steps. The main difficulty in our analysis then becomes determining how the Fourier-sparse representations of the PDE data discovered by high-dimensional SFTs can be used to rapidly find a suitable Fourier representation of the solution. This takes the form of mixing the Fourier supports of the data into stamping sets (discussed in detail in Section 4.4) on which we can analyze the projection error of the solution. In fact, these stamping sets can be viewed as a modification and generalization of the techniques used in the one-dimensional, known fine-scale analysis from [21].

4.1.2 Attempts to relieve the curse of dimensionality

Many attempts to overcome the curse of dimensionality in Fourier spectral methods for PDEs have focused on using basis truncations which allow for an efficient high-dimensional Fourier transform. One of the most popular techniques is the sparse grid spectral method, which computes Fourier coefficients on the hyperbolic cross [47, 11, 29, 30, 63, 31, 20]. In general, a sparse grid method reduces the number of sampling points necessary to approximate the PDE data to O(K log^{d−1}(K)), where K acts as a type of bandwidth parameter. Algorithms to compute spectral representations using these sparse sampling grids run with similar complexity. When used in conjunction with spectral methods for solving PDEs, these sparse grid Fourier transforms produce solution approximations with error estimates similar to the full d-dimensional FFT versions, reduced by factors only on the order of 1/log^{d−1}(K).

In the context of sparse grid Fourier transforms, these methods compute Fourier coefficients with frequencies indexed on hyperbolic crosses of cardinality similar to the number of sampling points. These hyperbolic crosses have intimate links with spaces of functions of bounded mixed derivative, in the sense that they are the optimal Fourier-approximation spaces for this class. Thus, sparse grid Fourier spectral methods are particularly apt for problems where the solution is of bounded mixed derivative, as this produces an optimal solution-truncation term in Lemma 1.1 above.

Though sparse grid spectral methods can efficiently solve a variety of high-dimensional problems, there are clear downsides for the types of problems we target in this chapter. While many problems fit the bounded mixed derivative assumption [67, 68], and therefore have accurate Fourier representations on the hyperbolic cross, the multiscale, Fourier-sparse problems that we are interested in are especially problematic. In fact, since a hyperbolic cross of bandwidth K contains only those frequencies k ∈ ℤ^d with ∏_{i∈[d]} |k_i| = O(K), d-dimensional frequencies active in all dimensions can have only ‖k‖_∞ = O(K^{1/d}). Thus, in a multiscale problem with even one frequency that interacts in all dimensions, a hyperbolic cross with bandwidth exponential in d is required to properly resolve the data. This then forces the traditionally curse-of-dimensionality-mitigating log^{d−1}(K) terms characteristic of sparse grid methods to be at least on the order of d^{d−1}.
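As a quick check of this obstruction against the hyperbolic cross normalization used in Section 3.4.1 (membership there requires ∏_{ℓ∈[d]} max(1, |k_ℓ|) ≤ K/2): in d = 10 dimensions, the frequency k = (2, 2, …, 2) has ∏_ℓ max(1, |k_ℓ|) = 2^10 = 1024, so a hyperbolic cross containing it needs expansion K ≥ 2^11 even though ‖k‖_∞ = 2, whereas the full cube B_K^d contains it already for K ≥ 5.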
4.1.3 More on high-dimensional Fourier transforms As outlined in Section 4.1.1 above, this chapter uses sparse Fourier transforms to create an adaptive basis truncation suited to the PDE data. This mimics a similar evolution in the field of high-dimensional Fourier transforms from sparse grids to more flexible techniques [52, 22, 55, 49, 31, 50, 56, 34]. In particular, the rank-1 lattice based approaches for high-dimensional Fourier transforms discussed in Chapters 2 and 3 originate from a link between early high-dimensional quadrature techniques and Fourier approximations on the hyperbolic cross [49, 50]. Though many rank-1 lattice approaches take I to be the hyperbolic cross to leverage the well- studied regularity properties and cardinality bounds similarly enjoyed in the sparse-grid literature, rank-1 lattice results are available for arbitrary frequency sets. The computationally efficient exten- sion of these techniques via sparse Fourier transforms in Chapter 3 as well as the randomization trick presented in Section 4.5 take this frequency set flexibility to its limit, allowing I to be the a priori unknown set of the most important Fourier coefficients of the function to be approximated. 85 This again suggests the applicability of these methods over sparse grid (or other non-sparsity ex- ploiting) Fourier transforms in the context of multiscale problems involving even a small number of Fourier coefficients in extremely high dimensions. 4.1.4 Additional links to compressive sensing As discussed above, the SFT literature overlaps considerably with the language and techniques of compressive sensing. As previously detailed in Chapter 3, the high-dimensional SFT we use herein provides error bounds with best 𝑠-term approximation, compressive-sensing-type error guar- antees [19]. As a result, the Fourier coefficients of the PDE data are approximated with errors depending on the compressibility of their true Fourier series, and then the compressibility of the PDE’s solution in the Fourier basis is inferred from the Fourier compressibility of the data in a direct and constructive fashion. Another very successful line of work, however, aims to more directly apply standard com- pressive sensing reconstruction methods to the general spectral method framework for solving PDEs. Referred to as CORSING [9, 4, 10, 8, 6], these techniques use compressed sensing con- cepts to recover a sparse representation of the solution to the system of equations derived from the (Petrov-)Galerkin formulation of a PDE. These methods have been further extended to the case of pseudospectral methods in [5], in which a simpler-to-evaluate matrix equation is subsampled and used as measurements for a compressive sensing algorithm (as an aside, [5] and discussions with the author served as a primary inspiration for the results in this chapter). This compressive spec- tral collocation method works by finding the largest Fourier-sine coefficients of the solution with frequencies in the integer hypercube with bandwidth 𝐾 by applying Orthogonal Matching Pursuit (OMP) on a set of samples of the PDE data. By using OMP, the method is able to succeed with measurements on the order of O (𝑑 exp(𝑑)𝑠 log3 (𝑠) log(𝐾)) where 𝑠 is the imposed sparsity level of the solution’s Fourier series. Thus, while the O (𝐾 𝑑 ) dependence from a traditional Fourier (pseudo)spectral method is avoided and the method adapts well to large bandwidths, the curse of dimensionality is still apparent. 
Recently, an improvement on [5] that addresses the curse of dimensionality was made available which is therefore well-suited for similar types of problems discussed in this chapter. In [66], the approach of approximating Fourier-sine coefficients on a full hypercube is replaced with approximating Fourier coefficients on a hyperbolic cross. This has the effect of converting the linear dependence on $d$ in the sampling complexity to a $\log(d)$ due to cardinality estimates of the hyperbolic cross. However, the $\exp(d)$ term is refined using a different technique. The key theoretical ingredient for being able to apply compressive sensing to these problems is bounding the Riesz constants of the basis functions that result after applying the differential operator [6]. A careful estimation of these constants on the Fourier basis indexed by a hyperbolic cross is able to entirely remove the exponential in $d$ dependence, leading to a sampling complexity on the order of $O(C_a s \log(d) \log^3(s) \log(K))$, where $C_a$ involves terms depending on ellipticity and compressibility properties of $a$. Notably, this estimation procedure has connections to our stamping set techniques described in Section 4.4.

On the other hand, though focusing on the hyperbolic cross in compressive spectral collocation breaks the curse of dimensionality in the sampling complexity, the method still suffers from the inability to generalize to multiscale problems or generic frequency sets of interest like those described in Section 4.1.2. Additionally, as mentioned in Section 4.1.4, the compressive-sensing algorithm used for recovery (in this case OMP) suffers from a computational complexity on the order of the cardinality of the truncation set of interest. For the hyperbolic cross, this is still exponential in $\log(d)$. Finally, the error estimates are presented in terms of the compressibility of the Fourier series of the solution $u$, which may not be known a priori from the PDE data. We expect that there may be some way to link our stamping theory and convergence estimates with the compressive sensing theory to refine and generalize both approaches.

4.2 Elliptic PDE setup

We begin with a model elliptic partial differential equation.

Definition 4.1. For some $a : \mathbb{T}^d \to \mathbb{R}$, $\mathbf{b} : \mathbb{T}^d \to \mathbb{R}^d$, $c : \mathbb{T}^d \to \mathbb{R}$ sufficiently smooth, define the advection-diffusion-reaction operator in divergence form $\mathcal{L}$ by
\[
\mathcal{L}u = -\nabla \cdot (a \nabla u) + \mathbf{b} \cdot \nabla u + c u.
\]
If for some $f : \mathbb{T}^d \to \mathbb{R}$ sufficiently smooth, $u \in C^2$ satisfies
\[
\mathcal{L}u = f, \tag{SF}
\]
we say that $u$ solves the given PDE with periodic boundary conditions in the strong form.

Now, after multiplying by the complex conjugate of a test function $v \in H^1(\mathbb{T}^d)$ and integrating the first term by parts, we define the sesquilinear form associated to $\mathcal{L}$ as $\mathfrak{L} : H^1 \times H^1 \to \mathbb{C}$ with
\[
\mathfrak{L}(u, v) := \int_{\mathbb{T}^d} a(\mathbf{x}) \nabla u(\mathbf{x}) \cdot \overline{\nabla v(\mathbf{x})} + \mathbf{b}(\mathbf{x}) \cdot \nabla u(\mathbf{x}) \, \overline{v(\mathbf{x})} + c(\mathbf{x}) u(\mathbf{x}) \overline{v(\mathbf{x})} \, d\mathbf{x},
\]
and we say that $u \in H^1$ solves the given PDE with periodic boundary conditions in the weak form if
\[
\mathfrak{L}(u, v) = \langle f, v \rangle_{L^2} \quad \text{for all } v \in H^1. \tag{WF}
\]
For our purposes, we will take $a, c \in L^\infty(\mathbb{T}^d; \mathbb{R})$, $\mathbf{b} \in L^\infty(\mathbb{T}^d; \mathbb{R})^d$ (i.e., each coordinate of the advection field is in $L^\infty$), and $f \in L^2(\mathbb{T}^d; \mathbb{R})$. By the conditions specified in the Lax-Milgram theorem (see, e.g., [23]), we are guaranteed that a unique solution to (WF) exists. We use the formulation as stated in [8, Proposition 2.1] and proven in [7].

Proposition 4.1.
For $a, c \in L^\infty(\mathbb{T}^d; \mathbb{R})$, $\mathbf{b} \in L^\infty(\mathbb{T}^d; \mathbb{R})^d$, $\mathfrak{L}$ is continuous with continuity constant
\[
\beta \le \max\left\{ \|a\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \|\mathbf{b}(\mathbf{x})\|_2,\ \|c\|_{L^\infty} \right\},
\]
that is,
\[
|\mathfrak{L}(u, v)| \le \beta \|u\|_{H^1} \|v\|_{H^1} \quad \text{for all } u, v \in H^1. \tag{4.1}
\]
Additionally, assuming $\mathbf{b} \in H^1(\mathbb{T}^d; \mathbb{R})^d$, if $a(\mathbf{x}) \ge a_{\min} > 0$ and $-\frac{1}{2} \nabla \cdot \mathbf{b}(\mathbf{x}) + c(\mathbf{x}) \ge d_{\min} > 0$ a.e. on $\mathbb{T}^d$, then $\mathfrak{L}$ is also coercive with coercivity constant
\[
\alpha \ge \min\{a_{\min}, d_{\min}\},
\]
that is,
\[
|\mathfrak{L}(u, u)| \ge \alpha \|u\|_{H^1}^2 \quad \text{for all } u \in H^1. \tag{4.2}
\]
Under conditions (4.1) and (4.2), if $f \in L^2(\mathbb{T}^d; \mathbb{R})$ then (WF) has a unique solution $u \in H^1$ satisfying
\[
\|u\|_{H^1} \le \frac{\|f\|_{L^2}}{\alpha}. \tag{4.3}
\]

4.3 Galerkin spectral methods

By Theorem 1.1, it is equivalent to replace the weak PDE (WF) by
\[
\mathfrak{L}(u, \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ}) = \langle f, \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ} \rangle_{L^2} =: \hat{f}_{\mathbf{k}} \quad \text{for all } \mathbf{k} \in \mathbb{Z}^d.
\]
Rewriting the sesquilinear form on the left-hand side and using the Fourier series representations of $a$, $\mathbf{b}$ (where we collect all coordinates' Fourier coefficients at a given frequency $\mathbf{k} \in \mathbb{Z}^d$ into the vectors $\hat{\mathbf{b}}_{\mathbf{k}} \in \mathbb{C}^d$), $c$, and $u$, we obtain
\begin{align*}
\mathfrak{L}(u, \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ})
&= \sum_{\mathbf{l}_1, \mathbf{l}_2 \in \mathbb{Z}^d} \hat{a}_{\mathbf{l}_1} \hat{u}_{\mathbf{l}_2} \int_{\mathbb{T}^d} \mathrm{e}^{2\pi i \mathbf{l}_1 \cdot \mathbf{x}} \nabla \mathrm{e}^{2\pi i \mathbf{l}_2 \cdot \mathbf{x}} \cdot \overline{\nabla \mathrm{e}^{2\pi i \mathbf{k} \cdot \mathbf{x}}} \, d\mathbf{x} \\
&\quad + \sum_{\mathbf{l}_1, \mathbf{l}_2 \in \mathbb{Z}^d} \hat{u}_{\mathbf{l}_2} \int_{\mathbb{T}^d} \mathrm{e}^{2\pi i \mathbf{l}_1 \cdot \mathbf{x}} \, \hat{\mathbf{b}}_{\mathbf{l}_1} \cdot \nabla \mathrm{e}^{2\pi i \mathbf{l}_2 \cdot \mathbf{x}} \, \overline{\mathrm{e}^{2\pi i \mathbf{k} \cdot \mathbf{x}}} \, d\mathbf{x} \\
&\quad + \sum_{\mathbf{l}_1, \mathbf{l}_2 \in \mathbb{Z}^d} \hat{c}_{\mathbf{l}_1} \hat{u}_{\mathbf{l}_2} \int_{\mathbb{T}^d} \mathrm{e}^{2\pi i \mathbf{l}_1 \cdot \mathbf{x}} \mathrm{e}^{2\pi i \mathbf{l}_2 \cdot \mathbf{x}} \, \overline{\mathrm{e}^{2\pi i \mathbf{k} \cdot \mathbf{x}}} \, d\mathbf{x} \\
&= \sum_{\mathbf{l}_1, \mathbf{l}_2 \in \mathbb{Z}^d} \delta_{\mathbf{l}_1, \mathbf{k} - \mathbf{l}_2} \left[ (2\pi)^2 (\mathbf{l}_2 \cdot \mathbf{k}) \hat{a}_{\mathbf{l}_1} + 2\pi i \left( \hat{\mathbf{b}}_{\mathbf{l}_1} \cdot \mathbf{l}_2 \right) + \hat{c}_{\mathbf{l}_1} \right] \hat{u}_{\mathbf{l}_2} \\
&= \sum_{\mathbf{l} \in \mathbb{Z}^d} \left[ (2\pi)^2 (\mathbf{l} \cdot \mathbf{k}) \hat{a}_{\mathbf{k} - \mathbf{l}} + 2\pi i \left( \hat{\mathbf{b}}_{\mathbf{k} - \mathbf{l}} \cdot \mathbf{l} \right) + \hat{c}_{\mathbf{k} - \mathbf{l}} \right] \hat{u}_{\mathbf{l}} =: (L\hat{u})_{\mathbf{k}},
\end{align*}
where $L$ is an operator on $\ell^2$. This leads to the Galerkin form of our PDE,
\[
L\hat{u} = \hat{f}. \tag{GF}
\]
The computational advantages of (GF) are clear. By numerically approximating $\hat{a}$, $\hat{\mathbf{b}}$, $\hat{c}$, and $\hat{f}$ (which automatically truncates $L$), we arrive at a discretized, finite system of equations that can be solved for the Fourier coefficients of our solution.

We will use a fast sparse Fourier transform (SFT) for functions of many dimensions to approximate our PDE data which then leads to a sparse system of equations that we can quickly solve to approximate $\hat{u}$. This SFT will use the values of $a$, $\mathbf{b}$, $c$, and $f$ at equispaced nodes on a randomized rank-1 lattice in $\mathbb{T}^d$, and therefore, our technique is effectively a pseudospectral method where the discretization of the solution space $\{\hat{u} \mid u \in H^1\}$ is adapted to the PDE data.

Before we move to the detailed discussion of this SFT, we provide a more detailed analysis of the Galerkin operator in Section 4.4 to help us analyze the resulting spectral method. But first, we note that $L$ also captures the behavior of $\mathfrak{L}$ as a sesquilinear form.

Proposition 4.2. For $\hat{u}, \hat{v} \in \ell^2$ with $u, v \in H^1$, $\mathfrak{L}(u, v) = \langle L\hat{u}, \hat{v} \rangle_{\ell^2}$.

Proof. By the Fourier series representation of $v$,
\[
\mathfrak{L}(u, v) = \sum_{\mathbf{k} \in \mathbb{Z}^d} \mathfrak{L}(u, \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ}) \overline{\hat{v}_{\mathbf{k}}} = \sum_{\mathbf{k} \in \mathbb{Z}^d} (L\hat{u})_{\mathbf{k}} \overline{\hat{v}_{\mathbf{k}}} = \langle L\hat{u}, \hat{v} \rangle_{\ell^2}.
\]

4.4 Stamping sets and truncation analysis

Notably, (GF) gives us insight into the frequency support of $\hat{u}$. The structure outlined in the following proposition is crucial in constructing a fast spectral method that exploits Fourier-sparsity.

Proposition 4.3. Given $\hat{a}$, $\hat{\mathbf{b}}$, and $\hat{c}$, the Fourier coefficients of the diffusion coefficient, the advection field, and the reaction coefficient of an ADR equation respectively, denote the set of "active" frequencies
\[
\mathcal{A} := \operatorname{supp}(\hat{a}) \cup \left( \bigcup_{j \in [d]} \operatorname{supp}(\hat{b}_j) \right) \cup \operatorname{supp}(\hat{c}) \subset \mathbb{Z}^d.
\]
For any set $\mathcal{F} \subset \mathbb{Z}^d$ and $N \in \mathbb{N}_0$, recursively define the sets
\[
S^N[\mathcal{A}](\mathcal{F}) :=
\begin{cases}
\mathcal{F} & \text{if } N = 0 \\
S^{N-1}[\mathcal{A}](\mathcal{F}) + \mathcal{A} & \text{if } N > 0
\end{cases}, \qquad
S^\infty[\mathcal{A}](\mathcal{F}) := \bigcup_{N=0}^{\infty} S^N[\mathcal{A}](\mathcal{F}), \tag{4.4}
\]
where here, addition is defined as the Minkowski sum of sets. Under the conditions of Proposition 4.1, $\operatorname{supp}(\hat{u}) \subset S^\infty[\mathcal{A}](\operatorname{supp}(\hat{f}))$.
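Since (4.4) is purely combinatorial, it is straightforward to realize numerically. The following MATLAB sketch (ours; the function name and row-wise data layout are assumptions, not the dissertation's released implementation) constructs $S^N[\mathcal{A}](\mathcal{F})$ by iterated Minkowski sums, with frequencies stored as rows of integer matrices.

```matlab
function S = stamping_set(F, A, N)
% STAMPING_SET  Sketch of S^N[A](F) from (4.4) via iterated Minkowski sums.
%   F: |F| x d integer matrix of frequencies (rows), e.g., supp(f-hat).
%   A: |A| x d integer matrix of active frequencies.
%   N: stamping level.
    S = unique(F, 'rows');
    for n = 1:N
        % Minkowski sum S + A: all pairwise sums of a row of S and a row of A
        [i, j] = ndgrid(1:size(S, 1), 1:size(A, 1));
        S = unique(S(i(:), :) + A(j(:), :), 'rows');
    end
end
```

Because the data are real, $\mathcal{A} = -\mathcal{A}$, and because $\hat{a}_{\mathbf{0}} > 0$ for well-posed problems, $\mathbf{0} \in \mathcal{A}$, so each level automatically contains the previous one.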
Proof. Note first that the fact that $a$, $\mathbf{b}$, and $c$ are real implies that the supports of their Fourier series are "rotationally" symmetric in $\mathbb{Z}^d$, e.g., $\operatorname{supp}(\hat{a}) = -\operatorname{supp}(\hat{a})$.

Now, we show that $L_{\mathbf{k},\mathbf{k}} \ne 0$ for all $\mathbf{k} \in \mathbb{Z}^d$. Recall that
\[
L_{\mathbf{k},\mathbf{k}} := (2\pi)^2 (\mathbf{k} \cdot \mathbf{k}) \hat{a}_{\mathbf{0}} + 2\pi i \left( \hat{\mathbf{b}}_{\mathbf{0}} \cdot \mathbf{k} \right) + \hat{c}_{\mathbf{0}}.
\]
It suffices to show that $\hat{a}_{\mathbf{0}}$ and $\hat{c}_{\mathbf{0}}$ are strictly positive as the middle term will always be purely imaginary. Since $a$ is always strictly positive under the assumptions of Proposition 4.1, its mean $\hat{a}_{\mathbf{0}}$ is necessarily strictly positive. As for $c$, the conditions of Proposition 4.1 require
\[
-\frac{1}{2} \nabla \cdot \mathbf{b} + c > 0,
\]
which implies
\[
\hat{c}_{\mathbf{0}} > \frac{1}{2} \int_{\mathbb{T}^d} \nabla \cdot \mathbf{b}(\mathbf{x}) \, d\mathbf{x}.
\]
However, the divergence theorem implies that the right-hand side is zero, and therefore $\hat{c}_{\mathbf{0}}$ is positive as desired.

Now, since $L_{\mathbf{k},\mathbf{k}}$ is nonzero, we may rearrange the equality $(L\hat{u})_{\mathbf{k}} = \hat{f}_{\mathbf{k}}$ to obtain
\[
\hat{u}_{\mathbf{k}} = \frac{\hat{f}_{\mathbf{k}} - \sum_{\mathbf{l} \in (\{\mathbf{k}\} + \mathcal{A}) \setminus \{\mathbf{k}\}} L_{\mathbf{k},\mathbf{l}} \hat{u}_{\mathbf{l}}}{L_{\mathbf{k},\mathbf{k}}},
\]
where we have restricted the summation to only those frequencies where the entries of row $\mathbf{k}$ of $L$ are nonzero, that is, the active frequencies of the PDE data translated by $\mathbf{k}$. Thus, $\hat{u}_{\mathbf{k}}$ explicitly depends only on the values of $\hat{u}$ on $S^1[\mathcal{A}](\{\mathbf{k}\}) \setminus \{\mathbf{k}\}$, which themselves then depend only on values of $\hat{u}$ on $S^2[\mathcal{A}](\{\mathbf{k}\})$, and so on. This decouples the system of equations $L\hat{u} = \hat{f}$ into a disjoint collection of systems of equations, one for each class of frequencies $S^\infty[\mathcal{A}](\{\mathbf{k}\})$. Since Proposition 4.1 implies that $\hat{v} = 0$ is the unique solution of $L\hat{v} = 0$, the unique solution of the system of equations for $\hat{u}$ on $S^\infty[\mathcal{A}](\{\mathbf{k}\})$ for any $\mathbf{k} \notin \operatorname{supp}(\hat{f})$ is $\hat{u}|_{S^\infty[\mathcal{A}](\{\mathbf{k}\})} = 0$. Therefore, $\operatorname{supp}(\hat{u}) \subset S^\infty[\mathcal{A}](\operatorname{supp}(\hat{f}))$ as desired.

In what follows, when the set $\mathcal{F}$ (often $\operatorname{supp}(\hat{f})$) and set of active frequencies $\mathcal{A}$ are clear from context, we suppress them in the notation given by (4.4) so that $S^N := S^N[\mathcal{A}](\mathcal{F})$.

Intuitively, we can imagine constructing $S^N$ by first creating a "rubber stamp" in the shape of $\mathcal{A}$. This rubber stamp is then stamped onto every frequency in $\mathcal{F} =: S^0$ to construct $S^1$. Then, this process is repeated, stamping each element of $S^1$ to produce $S^2$, and so on. For this reason, we will colloquially refer to these as "stamping sets." Figure 4.1 gives an example of this stamping procedure for $d = 2$.

A key approach of our further analysis will be analyzing the decay of $\hat{u}$ on successive stamping levels. The stamping level will become the driving parameter in the spectral method rather than bandwidth in a traditional spectral method.

[Figure 4.1: panels showing $\mathcal{A}$, $\operatorname{supp}(\hat{f}) = S^0[\mathcal{A}](\operatorname{supp}(\hat{f}))$, and the sets $S^1[\mathcal{A}](\operatorname{supp}(\hat{f}))$, $S^2[\mathcal{A}](\operatorname{supp}(\hat{f}))$, $S^3[\mathcal{A}](\operatorname{supp}(\hat{f}))$ for $N = 0, 1, 2, 3$.] Figure 4.1 New frequencies in each stamping level up to $N = 3$ where $N = 0$ is $\operatorname{supp}(\hat{f})$.

Before moving on to this analysis however, we provide an upper bound for the cardinality of the stamping sets. This will ultimately be used to upper bound the computational complexity of our technique.

Lemma 4.1. Suppose that $\mathcal{A} = -\mathcal{A}$ with $\mathbf{0} \in \mathcal{A}$, and $|\operatorname{supp}(\hat{f})| \le |\mathcal{A}|$. Then
\[
\left| S^N[\mathcal{A}](\operatorname{supp}(\hat{f})) \right| \le 7 \max(|\mathcal{A}|, 2N+1)^{\min(|\mathcal{A}|, 2N+1)}.
\]

We prove this by first providing the following combinatorial upper bound for the cardinality of a stamping set.

Lemma 4.2. Suppose that $\mathcal{A} = -\mathcal{A}$ with $\mathbf{0} \in \mathcal{A}$. Then
\[
\left| S^N[\mathcal{A}](\operatorname{supp}(\hat{f})) \right| \le \left| \operatorname{supp}(\hat{f}) \right| \sum_{n=0}^{N} \sum_{t=0}^{\min(n, (|\mathcal{A}|-1)/2)} 2^t \binom{(|\mathcal{A}|-1)/2}{t} \binom{n-1}{t-1}. \tag{4.5}
\]

Proof. We begin by separating $S^N$ into the disjoint pieces
\[
S^N = \bigsqcup_{n=0}^{N} \left( S^n \setminus \bigcup_{i=0}^{n-1} S^i \right)
\]
and computing the cardinality of each of these sets (where we take $S^{-1} = \emptyset$). If $\mathbf{k} \in S^n \setminus \left( \bigcup_{i=0}^{n-1} S^i \right)$, then we are able to write $\mathbf{k}$ as
\[
\mathbf{k} = \mathbf{k}_f + \sum_{m=1}^{n} \mathbf{k}^m_{\mathcal{A}} \tag{4.6}
\]
where $\mathbf{k}_f \in \operatorname{supp}(\hat{f})$ and $\mathbf{k}^m_{\mathcal{A}} \in \mathcal{A} \setminus \{\mathbf{0}\}$ for all $m = 1, \ldots, n$.
Additionally, since $\mathbf{k}$ is not in any earlier stamping sets, this is the smallest $n$ for which this is possible. In particular, it is not possible for any two frequencies in the sum to be negatives of each other resulting in pairs of cancelled terms.

With this summation in mind, arbitrarily split $\mathcal{A} \setminus \{\mathbf{0}\}$ into $A \sqcup -A$ (i.e., place all frequencies which do not negate each other into $A$ and their negatives in $-A$). By collecting like frequencies that occur as a $\mathbf{k}^m_{\mathcal{A}}$ term in (4.6), we can rewrite this sum as
\[
\mathbf{k} = \mathbf{k}_f + \sum_{\mathbf{k}_A \in A} s(\mathbf{k}, \mathbf{k}_A) \, m(\mathbf{k}, \mathbf{k}_A) \, \mathbf{k}_A, \tag{4.7}
\]
where the sign function $s(\mathbf{k}, \mathbf{k}_A)$ is given by
\[
s(\mathbf{k}, \mathbf{k}_A) :=
\begin{cases}
1 & \text{if } \mathbf{k}_A \text{ is a term in the summation (4.6)} \\
-1 & \text{if } -\mathbf{k}_A \text{ is a term in the summation (4.6)} \\
0 & \text{otherwise}
\end{cases}
\]
and the multiplicity function $m(\mathbf{k}, \mathbf{k}_A)$ is defined as the number of times that $\mathbf{k}_A$ or $-\mathbf{k}_A$ appears as a $\mathbf{k}^m_{\mathcal{A}}$ term in (4.6). Letting $\mathbf{s}(\mathbf{k}) := (s(\mathbf{k}, \mathbf{k}_A))_{\mathbf{k}_A \in A}$ and $\mathbf{m}(\mathbf{k}) := (m(\mathbf{k}, \mathbf{k}_A))_{\mathbf{k}_A \in A}$, we can then identify any $\mathbf{k} \in S^n \setminus \left( \bigcup_{i=0}^{n-1} S^i \right)$ with the tuple
\[
(\mathbf{k}_f, \mathbf{s}(\mathbf{k}), \mathbf{m}(\mathbf{k})) \in \operatorname{supp}(\hat{f}) \times \{-1, 0, 1\}^A \times \{0, \ldots, n\}^A.
\]
Upper bounding the number of these tuples that can correspond to a value of $\mathbf{k} \in S^n \setminus \left( \bigcup_{i=0}^{n-1} S^i \right)$ will then upper bound the cardinality of this set.

Since any $\mathbf{k}_f \in \operatorname{supp}(\hat{f})$ can result in a valid $\mathbf{k}$ value, we will focus on the pairs of sign and multiplicity vectors. Define by $T_n \subset \{-1, 0, 1\}^A \times \{0, \ldots, n\}^A$ the set of valid sign and multiplicity pairs that can correspond to a $\mathbf{k} \in S^n \setminus \left( \bigcup_{i=0}^{n-1} S^i \right)$. In particular, for $(\mathbf{s}, \mathbf{m}) \in T_n$, $\|\mathbf{m}\|_1 = n$ and $\operatorname{supp}(\mathbf{s}) = \operatorname{supp}(\mathbf{m})$. Thus, we can write
\[
T_n \subset \bigsqcup_{t=0}^{\min(n, |A|)} \left\{ (\mathbf{s}, \mathbf{m}) \in \{-1, 0, 1\}^A \times \{0, \ldots, n\}^A \;\middle|\; \|\mathbf{m}\|_1 = n \text{ and } |\operatorname{supp}(\mathbf{s})| = |\operatorname{supp}(\mathbf{m})| = t \right\}.
\]
This inner set then corresponds to the $t$-partitions of the integer $n$ spread over the $|A|$ entries of $\mathbf{m}$ where each non-zero term is assigned a sign $-1$ or $1$. The cardinality is therefore $2^t \binom{|A|}{t} \binom{n-1}{t-1}$: the first factor is from the possible sign options, the second is the number of ways to choose the entries of $\mathbf{m}$ which are nonzero, and the last is the number of $t$-partitions of $n$ which will fill the nonzero entries of $\mathbf{m}$. Noting that $|A| = \frac{|\mathcal{A}| - 1}{2}$, our final cardinality estimate is
\[
|S^N| = \sum_{n=0}^{N} \left| S^n \setminus \bigcup_{i=0}^{n-1} S^i \right| \le \left| \operatorname{supp}(\hat{f}) \right| \sum_{n=0}^{N} |T_n| \le \left| \operatorname{supp}(\hat{f}) \right| \sum_{n=0}^{N} \sum_{t=0}^{\min(n, (|\mathcal{A}|-1)/2)} 2^t \binom{(|\mathcal{A}|-1)/2}{t} \binom{n-1}{t-1}
\]
as desired.

Though this upper bound is much tighter than the one given in the main text, it is harder to parse. As such, we simplify it to the bound presented in Lemma 4.1.

Proof of Lemma 4.1. Let $r = (|\mathcal{A}| - 1)/2$. We consider two cases:

Case 1: $r \ge N$. We estimate the innermost sum of (4.5). Since $r \ge N \ge n$, $\min(n, (|\mathcal{A}|-1)/2) = n$. By upper bounding the binomial coefficients with powers of $r$, we obtain
\[
\sum_{t=0}^{n} 2^t \binom{r}{t} \binom{n-1}{t-1} \le \sum_{t=0}^{n} 2^t (r^t)^2 \le 2(2r^2)^n,
\]
where the second estimate follows from approximating the geometric sum. Again, bounding the next geometric sum by double the largest term, we have
\[
|S^N| \le \left| \operatorname{supp}(\hat{f}) \right| \sum_{n=0}^{N} 2(2r^2)^n \le (2r+1) \cdot 4(2r^2)^N \le 2(2r+1)^{2N+1} = 2|\mathcal{A}|^{2N+1}.
\]

Case 2: $r < N$. Bounding the innermost sum of (4.5) proceeds much the same way as Case 1, but we must first split the outermost sum into the first $r + 1$ terms and last $N - r$ terms. Working with the first terms, we find
\[
\sum_{n=0}^{r} \sum_{t=0}^{n} 2^t \binom{r}{t} \binom{n-1}{t-1} \le 4(2r^2)^r
\]
using the argument in Case 1. Now, we bound
\[
\sum_{n=r+1}^{N} \sum_{t=0}^{r} 2^t \binom{r}{t} \binom{n-1}{t-1} \le \sum_{n=r+1}^{N} 2\left(2(n-1)^2\right)^r \le 2^{r+1} \int_{r}^{N} n^{2r} \, dn \le \sqrt{2} \, \frac{(\sqrt{2}N)^{2r+1}}{2r+1}.
\]
Thus,
\[
|S^N| \le \left| \operatorname{supp}(\hat{f}) \right| \left[ 4(2r^2)^r + \sqrt{2} \, \frac{(\sqrt{2}N)^{2r+1}}{2r+1} \right] \le 5\sqrt{2} \left( \sqrt{2}N \right)^{|\mathcal{A}|} \le 7(2N+1)^{|\mathcal{A}|}.
\]
Combining the two cases gives the desired upper bound.
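A quick empirical sanity check of Lemma 4.1 (ours, reusing the stamping_set sketch given after Proposition 4.3; the dimensions and set sizes below are arbitrary illustrative choices):

```matlab
% Compare |S^N| against the bound 7*max(|A|, 2N+1)^min(|A|, 2N+1) of Lemma 4.1.
rng(0); d = 3;
H = randi([-10, 10], 5, d);                    % half of the active frequencies
A = unique([zeros(1, d); H; -H], 'rows');      % symmetric active set containing 0
F = unique(randi([-10, 10], 4, d), 'rows');    % small supp(f-hat) with |F| <= |A|
for N = 0:3
    SN  = stamping_set(F, A, N);
    bnd = 7 * max(size(A, 1), 2*N + 1)^min(size(A, 1), 2*N + 1);
    fprintf('N = %d: |S^N| = %6d, bound = %g\n', N, size(SN, 1), bnd);
end
```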
Proposition 4.3 gives us a natural way to consider truncations of the solution $u$ in frequency space. We will use these truncations to discretize the Galerkin formulation (GF) in Section 4.6 below. In order to analyze the error in the resulting spectral method algorithm, we will need quantitative bounds on how the solution decays outside of the frequency sets $S^N := S^N[\mathcal{A}](\operatorname{supp}(\hat{f}))$. For $S^N$ to be finite, we assume in this section that $\mathcal{A}$ and $\operatorname{supp}(\hat{f})$ are finite. This assumption will be lifted later via Lemma 4.5.

We begin with a technical result regarding the interplay between $L$ and the supports of vectors that it acts on.

Proposition 4.4. For any $\hat{v}$ with $\operatorname{supp}(\hat{v}) \subset S^n \setminus S^{n-1}$, $\operatorname{supp}(L\hat{v}) \subset S^{n+1} \setminus S^{n-2}$.

Proof. For any $\mathbf{k} \in \mathbb{Z}^d$, recall that row $\mathbf{k}$ of $L$ is supported on $\{\mathbf{k}\} + \mathcal{A}$. Consider
\[
(L\hat{v})_{\mathbf{k}} = \sum_{\mathbf{l} \in \mathbb{Z}^d} L_{\mathbf{k},\mathbf{l}} \hat{v}_{\mathbf{l}} = \sum_{\mathbf{l} \in (\{\mathbf{k}\} + \mathcal{A}) \cap \operatorname{supp}(\hat{v})} L_{\mathbf{k},\mathbf{l}} \hat{v}_{\mathbf{l}} = \sum_{\mathbf{l} \in (\{\mathbf{k}\} + \mathcal{A}) \cap (S^n \setminus S^{n-1})} L_{\mathbf{k},\mathbf{l}} \hat{v}_{\mathbf{l}}.
\]
This sum is nonempty only if $\mathbf{k}$ is such that there exists $\mathbf{l} \in S^n \setminus S^{n-1}$ and $\mathbf{k}^*_{\mathcal{A}} \in \mathcal{A}$ with $\mathbf{k} = \mathbf{l} + \mathbf{k}^*_{\mathcal{A}}$. By definition of $\mathbf{l} \in S^n \setminus S^{n-1}$, $n$ is the minimal such number that
\[
\mathbf{l} = \mathbf{k}_f + \sum_{m=1}^{n} \mathbf{k}^m_{\mathcal{A}}, \quad \text{where } \mathbf{k}_f \in \operatorname{supp}(\hat{f}),\ \mathbf{k}^m_{\mathcal{A}} \in \mathcal{A} \text{ for all } m = 1, \ldots, n
\]
holds. In particular, this implies that $\mathbf{k}^m_{\mathcal{A}} \ne \mathbf{0}$ for all $m = 1, \ldots, n$. There are now two cases. First, if $\mathbf{k}^*_{\mathcal{A}} = -\mathbf{k}^m_{\mathcal{A}}$ for any $m$, then $\mathbf{k} = \mathbf{l} + \mathbf{k}^*_{\mathcal{A}} \in S^{n-1} \setminus S^{n-2}$, and the proposition is satisfied. On the other hand, we consider the case when $\mathbf{k}^*_{\mathcal{A}}$ does not negate any $\mathbf{k}^m_{\mathcal{A}}$ involved in the sum equalling $\mathbf{l}$. If $\mathbf{k}^*_{\mathcal{A}} = \mathbf{0}$, then clearly $\mathbf{k} = \mathbf{l} \in S^n \setminus S^{n-1}$. In any other case, we represent
\[
\mathbf{k} = \mathbf{k}_f + \sum_{m=1}^{n} \mathbf{k}^m_{\mathcal{A}} + \mathbf{k}^*_{\mathcal{A}} =: \mathbf{k}_f + \sum_{m=1}^{n+1} \mathbf{k}^m_{\mathcal{A}},
\]
where $n + 1$ is the smallest number for which this holds. Thus, $\mathbf{k} \in S^{n+1} \setminus S^n$. Altogether then, the only possible $\mathbf{k}$ values such that the sum is nonzero are those in $S^{n+1} \setminus S^{n-2}$, completing the proof.

Noting that $\operatorname{supp}(L\hat{u}) = \operatorname{supp}(\hat{f})$, we observe the following interesting relationship between the values of $\hat{u}$ on neighboring stamping levels. Below, to simplify notation, for all $m, n \in \mathbb{N}_0$, we set
\[
d_{m,n} := \langle L\hat{u}|_{S^m \setminus S^{m-1}}, \hat{u}|_{S^n \setminus S^{n-1}} \rangle_{\ell^2},
\]
with the convention that $S^{-1} = \emptyset$.

Corollary 4.1. For all $n \in \mathbb{N}_0$,
\[
d_{n+1,n} + d_{n,n} + d_{n-1,n} =
\begin{cases}
\langle \hat{f}, \hat{u}|_{S^0} \rangle_{\ell^2} & \text{if } n = 0 \\
0 & \text{otherwise.}
\end{cases}
\]

Proof. By Proposition 4.4, $\hat{u}|_{S^n \setminus S^{n-1}}$ is $\ell^2$-orthogonal to $L\hat{u}|_{S^m \setminus S^{m-1}}$ for all $m \notin \{n-1, n, n+1\}$. In our simplified notation, $d_{m,n} = 0$ for all $m \notin \{n-1, n, n+1\}$. Thus
\[
\langle \hat{f}, \hat{u}|_{S^n \setminus S^{n-1}} \rangle_{\ell^2} = \langle L\hat{u}, \hat{u}|_{S^n \setminus S^{n-1}} \rangle_{\ell^2} = \sum_{m=0}^{\infty} d_{m,n} = d_{n+1,n} + d_{n,n} + d_{n-1,n}.
\]
The proof is finished by noting that
\[
\langle \hat{f}, \hat{u}|_{S^n \setminus S^{n-1}} \rangle_{\ell^2} =
\begin{cases}
\langle \hat{f}, \hat{u}|_{S^0} \rangle_{\ell^2} & \text{if } n = 0 \\
0 & \text{otherwise.}
\end{cases}
\]

We are now ready to estimate $\hat{u}|_{S^n \setminus S^{n-1}}$ in terms of its neighbors $\hat{u}|_{S^{n+1} \setminus S^n}$ and $\hat{u}|_{S^{n-1} \setminus S^{n-2}}$. The standard approach would be to use a combination of coercivity and continuity (see, e.g., the proof of Lemma 4.6 or [13, Section 6.4] for other examples): for $n > 0$,
\[
\alpha \left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1}^2 \le |d_{n,n}| \le |d_{n+1,n}| + |d_{n-1,n}| \le \beta \left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} \left( \left\| u|_{S^{n+1} \setminus S^n} \right\|_{H^1} + \left\| u|_{S^{n-1} \setminus S^{n-2}} \right\|_{H^1} \right),
\]
and we obtain
\[
\left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} \le \frac{\beta}{\alpha} \left( \left\| u|_{S^{n+1} \setminus S^n} \right\|_{H^1} + \left\| u|_{S^{n-1} \setminus S^{n-2}} \right\|_{H^1} \right).
\]
However, we will hope to iterate this bound, and the fact that $\beta \ge \alpha$ will not allow for us to show any decay as $n \to \infty$. Thus, we require a slightly subtler estimate than simply using continuity.

Proposition 4.5. Define
\[
\beta_{-0} := \max\left\{ \|a - \hat{a}_{\mathbf{0}}\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \left\| \mathbf{b}(\mathbf{x}) - \hat{\mathbf{b}}_{\mathbf{0}} \right\|_2,\ \|c - \hat{c}_{\mathbf{0}}\|_{L^\infty} \right\}.
\]
For $n > 0$, we have
\[
|d_{n\pm1,n}| \le \beta_{-0} \left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} \left\| u|_{S^{n\pm1} \setminus S^{n\pm1-1}} \right\|_{H^1}.
\]

Proof.
Restricting all sums to the support of the vectors they index, we have
\[
d_{n\pm1,n} = \sum_{\mathbf{k} \in S^n \setminus S^{n-1}} \ \sum_{\mathbf{l} \in (\{\mathbf{k}\} + \mathcal{A}) \cap (S^{n\pm1} \setminus S^{n\pm1-1})} L_{\mathbf{k},\mathbf{l}} \hat{u}_{\mathbf{l}} \overline{\hat{u}_{\mathbf{k}}}.
\]
Clearly, choosing $\mathbf{l} = \mathbf{k} \in S^n \setminus S^{n-1}$ would not allow for $\mathbf{l} \in S^{n\pm1} \setminus S^{n\pm1-1}$. Thus, no term multiplying $L_{\mathbf{k},\mathbf{k}}$ will appear in the sum. This implies that there are no terms including the Fourier coefficients $\hat{a}_{\mathbf{0}}$, $\hat{\mathbf{b}}_{\mathbf{0}}$, or $\hat{c}_{\mathbf{0}}$. It is therefore equivalent to replace $L$ with a version $L_-$ defined using the Fourier coefficients $\hat{a} - \hat{a}_{\mathbf{0}}$, $\hat{\mathbf{b}} - \hat{\mathbf{b}}_{\mathbf{0}}$, and $\hat{c} - \hat{c}_{\mathbf{0}}$. We then have the equivalence
\[
d_{n\pm1,n} = \langle L_- \hat{u}|_{S^{n\pm1} \setminus S^{n\pm1-1}}, \hat{u}|_{S^n \setminus S^{n-1}} \rangle_{\ell^2},
\]
which by Proposition 4.2 and the standard argument used to prove the continuity upper bound implies
\[
|d_{n\pm1,n}| \le \beta_{-0} \left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} \left\| u|_{S^{n\pm1} \setminus S^{n\pm1-1}} \right\|_{H^1}
\]
as desired.

The same argument preceding Proposition 4.5 then gives the desired "neighbor" estimate.

Corollary 4.2. For all $n > 1$,
\[
\left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} \le \frac{\beta_{-0}}{\alpha} \left( \left\| u|_{S^{n+1} \setminus S^n} \right\|_{H^1} + \left\| u|_{S^{n-1} \setminus S^{n-2}} \right\|_{H^1} \right).
\]

We now have the pieces to state an estimate of the truncation error.

Lemma 4.3. Let $a$, $\mathbf{b}$, $c$, $f$, and $u$ be as in Proposition 4.1. Assume
\[
3\beta_{-0} < \alpha. \tag{4.8}
\]
Then
\[
\|u - u|_{S^N}\|_{H^1} \le \left( \frac{\beta_{-0}}{\alpha - 2\beta_{-0}} \right)^{N+1} \frac{\|f\|_{L^2}}{\alpha}.
\]

Proof. We begin by breaking $\operatorname{supp}(\hat{u}) \setminus S^N$ into the sets of new contributions $\bigcup_{n=N+1}^{\infty} (S^n \setminus S^{n-1})$ (which holds due to Proposition 4.3). Thus
\[
\|u - u|_{S^N}\|_{H^1} \le \sum_{n=N+1}^{\infty} \left\| u|_{S^n \setminus S^{n-1}} \right\|_{H^1} =: T_N.
\]
Applying the neighbor bound, Corollary 4.2 (where we define $A := \beta_{-0}/\alpha$), we have
\begin{align*}
T_N &\le A \left( \sum_{n=N+1}^{\infty} \left\| u|_{S^{n+1} \setminus S^n} \right\|_{H^1} + \sum_{n=N+1}^{\infty} \left\| u|_{S^{n-1} \setminus S^{n-2}} \right\|_{H^1} \right) = A(T_{N+1} + T_{N-1}) \\
&= 2A T_N + A \left( \left\| u|_{S^N \setminus S^{N-1}} \right\|_{H^1} - \left\| u|_{S^{N+1} \setminus S^N} \right\|_{H^1} \right).
\end{align*}
After rearranging, and ignoring the negative term, we find
\[
T_N \le \frac{A}{1 - 2A} \left\| u|_{S^N \setminus S^{N-1}} \right\|_{H^1}. \tag{4.9}
\]
Noting that we always have
\[
\left\| u|_{S^N \setminus S^{N-1}} \right\|_{H^1} \le T_{N-1}, \tag{4.10}
\]
iterating (4.9) and (4.10) in turn gives
\[
\|u - u|_{S^N}\|_{H^1} \le T_N \le \left( \frac{A}{1 - 2A} \right)^{N+1} \left\| u|_{S^0} \right\|_{H^1} \le \left( \frac{A}{1 - 2A} \right)^{N+1} \frac{\|f\|_{L^2}}{\alpha}.
\]

4.5 Fully sublinear-time SFTs with randomized lattices

In Chapter 3, two methods for high-dimensional SFTs are presented, each with a deterministic and Monte Carlo variant. Below, we will be using the faster of the two algorithms (at the cost of slightly suboptimal error guarantees), the phase-encoding approach with the nonequispaced sublinear-time SFT discussed in Corollary 3.2. We focus on only the Monte Carlo variant as the improvements in this section require randomization.

To use the high-dimensional phase-encoding SFT given in Algorithm 3.1, we need to know a reconstructing rank-1 lattice in advance. Though component-by-component algorithms can deterministically construct a reconstructing rank-1 lattice given any frequency set $I$, as previously discussed, these algorithms are superlinear in $|I|$ as they effectively search the frequency space for collisions throughout construction. This section presents an alternative based on choosing a random lattice. This lattice is chosen by drawing $\mathbf{z}$ from a uniform distribution over $\{1, \ldots, M-1\}^d$ for $M$ sufficiently large. Below, we provide probability estimates for when this lattice is reconstructing for a frequency set $I$.

Lemma 4.4. Let $K_I := \max_{j \in [d]} (\max_{\mathbf{k} \in I} k_j - \min_{\mathbf{l} \in I} l_j) + 1$ be the expansion of the frequency set $I \subset \mathbb{Z}^d$. Let $\sigma \in (0, 1]$, and fix $M$ to be the smallest prime greater than $\max\left(K_I, \frac{|I|^2}{\sigma}\right)$. Then drawing each component of $\mathbf{z}$ i.i.d. from $\{1, \ldots, M-1\}$ gives that $\Lambda(\mathbf{z}, M)$ is a reconstructing rank-1 lattice for $I$ with probability at least $1 - \sigma$.

Proof. In order to show that $\Lambda(\mathbf{z}, M)$ is reconstructing for $I$, it suffices to show that for any $\mathbf{k} \ne \mathbf{l} \in I$, $\mathbf{k} \cdot \mathbf{z} \not\equiv \mathbf{l} \cdot \mathbf{z} \bmod M$ (cf. Definition 1.2). Thus, we are interested in showing that $\mathbb{P}[\exists \mathbf{k} \ne \mathbf{l} \in I \text{ s.t. } (\mathbf{k} - \mathbf{l}) \cdot \mathbf{z} \equiv 0 \bmod M]$ is small. If $\mathbf{k}, \mathbf{l} \in I$ are distinct, at least one component $k_j - l_j$ is nonzero. Since $M > K_I$, we therefore have that $k_j - l_j \not\equiv 0 \bmod M$, and since $M$ is prime, $k_j - l_j$ has a multiplicative inverse modulo $M$. Then
\[
\mathbb{P}[(\mathbf{k} - \mathbf{l}) \cdot \mathbf{z} \equiv 0 \bmod M] = \mathbb{P}\left[ z_j \equiv -(k_j - l_j)^{-1} \sum_{i \in [d], i \ne j} (k_i - l_i) z_i \bmod M \right].
\]
Since $z_j$ is uniformly distributed in $\{1, \ldots, M-1\}$, this probability is at most $\frac{1}{M-1}$. By the union bound,
\[
\mathbb{P}[\exists \mathbf{k} \ne \mathbf{l} \in I \text{ s.t. } (\mathbf{k} - \mathbf{l}) \cdot \mathbf{z} \equiv 0 \bmod M] \le \sum_{\mathbf{k} \ne \mathbf{l} \in I} \mathbb{P}[(\mathbf{k} - \mathbf{l}) \cdot \mathbf{z} \equiv 0 \bmod M] \le \frac{|I|^2}{M - 1} \le \sigma
\]
as desired.
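The random draw in Lemma 4.4 is equally simple to realize. The following MATLAB sketch (ours) draws $\mathbf{z}$ and verifies the reconstructing property for a given frequency set by checking that $\mathbf{k} \mapsto \mathbf{k} \cdot \mathbf{z} \bmod M$ is injective on $I$; the parameter choices are illustrative assumptions.

```matlab
% Draw a random rank-1 lattice as in Lemma 4.4 and test reconstruction on I.
rng(1); d = 10; sigma = 0.5;
I = unique(randi([-50, 50], 40, d), 'rows');   % frequency set (rows), K_I <= 101
M = max(101, ceil(size(I, 1)^2 / sigma));      % M > max(K_I, |I|^2 / sigma) ...
while ~isprime(M), M = M + 1; end              % ... bumped to the next prime
z = randi([1, M - 1], d, 1);                   % generating vector, i.i.d. uniform
h = mod(I * z, M);                             % k.z mod M for every k in I
isReconstructing = numel(unique(h)) == size(I, 1);
fprintf('reconstructing: %d (M = %d)\n', isReconstructing, M);
```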
One important consequence of Lemma 4.4 is that we no longer need to provide the frequency set of interest in Corollary 3.2. Having chosen $K$, the expansion, and $s$, the sparsity level, we can always take $I$ to be the frequencies corresponding to the largest $s$ Fourier coefficients of the function $g$ in the hypercube $B^d_K$. Lemma 4.4 then implies that a randomly generated lattice with length $\max(K, s^2/\sigma)$ will be reconstructing for these optimal frequencies with probability at least $1 - \sigma$. We summarize this in the following corollary.

Corollary 4.3. Given a multivariate bandwidth $K$, a sparsity level $s$, probability of failure $\sigma \in (0, 1]$, and sampling access to $g \in L^2$, there exists a fast, randomized SFT which will produce a $2s$-sparse approximation $\hat{\mathbf{g}}^s$ of $\hat{g}$ and function $g^s := \sum_{\mathbf{k} \in \operatorname{supp}(\hat{\mathbf{g}}^s)} \hat{g}^s_{\mathbf{k}} \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ}$ approximating $g$ satisfying
\[
\|g - g^s\|_{L^2} \le \|\hat{g} - \hat{\mathbf{g}}^s\|_{\ell^2} \le \frac{25 + 3K}{\sqrt{s}} \left\| \hat{g} - (\hat{g}|_K)^{\mathrm{opt}}_s \right\|_{\ell^1}
\]
with probability at least $1 - \sigma$. If $g \in L^\infty$, then $g^s$ and $\hat{\mathbf{g}}^s$ satisfy the upper bound
\[
\|g - g^s\|_{L^\infty} \le \|\hat{g} - \hat{\mathbf{g}}^s\|_{\ell^1} \le (35 + 3K) \, s \left\| \hat{g} - (\hat{g}|_K)^{\mathrm{opt}}_s \right\|_{\ell^1}
\]
with the same probability estimate. The total number of samples of $g$ and computational complexity of the algorithm can be bounded above by
\[
O\left( ds \log^3\left( dK \max(K, s/\sigma) \right) \log\left( \frac{dK \max(K, s/\sigma)}{\sigma} \right) \right).
\]
If we fix $\sigma$ (say, so that the success probability $1 - \sigma$ is $0.95$), this reduces to a complexity of
\[
O\left( ds \log^4(dK \max(K, s)) \right).
\]

4.6 A sparse spectral method via SFTs

Let $\hat{\mathbf{a}}^s$, $\hat{\mathbf{b}}^s$, $\hat{\mathbf{c}}^s$, and $\hat{\mathbf{f}}^s$ be $s$-sparse approximations of $\hat{a}$, $\hat{\mathbf{b}}$, $\hat{c}$, and $\hat{f}$ respectively, where each coordinate of $\hat{\mathbf{b}}$ is approximated separately. We will use these approximations to discretize the Galerkin formulation (GF) of our PDE. The first step is to reduce to the case where the PDE data is Fourier-sparse, which is motivated by the following lemma.

Lemma 4.5. Let $a' := a|_{\operatorname{supp}(\hat{\mathbf{a}}^s)}$, $b'_j := b_j|_{\operatorname{supp}(\hat{\mathbf{b}}^s_j)}$ for $j \in [d]$, $c' := c|_{\operatorname{supp}(\hat{\mathbf{c}}^s)}$, and $f' := f|_{\operatorname{supp}(\hat{\mathbf{f}}^s)}$. Define
\[
\beta'_- := \max\left\{ \|a - a'\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \|\mathbf{b} - \mathbf{b}'\|_2,\ \|c - c'\|_{L^\infty} \right\}.
\]
Suppose that $a'$, $\mathbf{b}'$, $c'$, and $f'$ satisfy the conditions of Proposition 4.1 and let $u'$ be the unique solution of the resulting elliptic PDE, which we write in Galerkin form as
\[
L' \hat{u}' = \hat{f}'. \tag{4.11}
\]
Then
\[
\|u - u'\|_{H^1} \le \frac{\|f - f'\|_{L^2}}{\alpha} + \frac{\beta'_- \|f'\|_{L^2}}{\alpha \alpha'},
\]
where $\alpha'$ is taken to be the coercivity coefficient of the differential operator defined using $a'$, $\mathbf{b}'$, and $c'$.

Proof. We begin by observing
\[
L(\hat{u} - \hat{u}') = L\hat{u} - L'\hat{u}' - (L - L')\hat{u}' = \hat{f} - \hat{f}' - (L - L')\hat{u}',
\]
and therefore
\[
|\langle L(\hat{u} - \hat{u}'), \hat{u} - \hat{u}' \rangle| \le |\langle \hat{f} - \hat{f}', \hat{u} - \hat{u}' \rangle| + |\langle (L - L')\hat{u}', \hat{u} - \hat{u}' \rangle|.
\]
After an application of Proposition 4.2 to convert the $\ell^2$ inner products into sesquilinear forms, we can make use of coercivity (4.2), continuity (4.1), and the Cauchy-Schwarz inequality to produce the $H^1$ approximation
\[
\alpha \|u - u'\|_{H^1} \le \left\| \hat{f} - \hat{f}' \right\|_{\ell^2} + \beta'_- \|u'\|_{H^1}.
\]
An application of the stability estimate (4.3) gives the desired bound
\[
\|u - u'\|_{H^1} \le \frac{\|f - f'\|_{L^2}}{\alpha} + \frac{\beta'_- \|f'\|_{L^2}}{\alpha \alpha'}.
\]
We can now replace the trial and test spaces in (WF) with finite dimensional approximations so as to convert (GF) to a matrix equation. Inspired by Proposition 4.3 and the truncation error analysis in Section 4.4, we use the space of functions whose Fourier coefficients are supported on $S^N := S^N[\mathcal{A}](\operatorname{supp}(\hat{f}))$. By doing so, we discretize the Galerkin formulation of the problem (GF) into the finite system of equations
\[
(\mathsf{L}^N \hat{\mathbf{u}})_{\mathbf{k}} := \sum_{\mathbf{l} \in S^N} \left[ (2\pi)^2 (\mathbf{l} \cdot \mathbf{k}) \hat{a}_{\mathbf{k}-\mathbf{l}} + 2\pi i \left( \hat{\mathbf{b}}_{\mathbf{k}-\mathbf{l}} \cdot \mathbf{l} \right) + \hat{c}_{\mathbf{k}-\mathbf{l}} \right] \hat{u}_{\mathbf{l}} = \hat{f}_{\mathbf{k}} \quad \text{for all } \mathbf{k} \in S^N. \tag{4.12}
\]
However, in practice, we do not know $\hat{a}$, $\hat{\mathbf{b}}$, $\hat{c}$, and $\hat{f}$ exactly (and indeed, they may not be exactly sparse). Thus, we substitute the SFT approximations $\hat{\mathbf{a}}^s$, $\hat{\mathbf{b}}^s$, $\hat{\mathbf{c}}^s$, and $\hat{\mathbf{f}}^s$, defining the new finite-dimensional operator $\mathsf{L}^{s,N} : \mathbb{C}^{S^N} \to \mathbb{C}^{S^N}$ by
\[
(\mathsf{L}^{s,N} \hat{\mathbf{u}})_{\mathbf{k}} := \sum_{\mathbf{l} \in S^N} \left[ (2\pi)^2 (\mathbf{l} \cdot \mathbf{k}) \hat{a}^s_{\mathbf{k}-\mathbf{l}} + 2\pi i \left( \hat{\mathbf{b}}^s_{\mathbf{k}-\mathbf{l}} \cdot \mathbf{l} \right) + \hat{c}^s_{\mathbf{k}-\mathbf{l}} \right] \hat{u}_{\mathbf{l}} \quad \text{for all } \mathbf{k} \in S^N.
\]
Our new approximate solution will be $\hat{\mathbf{u}}^{s,N} \in \mathbb{C}^{S^N}$ which solves
\[
\mathsf{L}^{s,N} \hat{\mathbf{u}}^{s,N} = \hat{\mathbf{f}}^s. \tag{4.13}
\]
We summarize our technique in Algorithm 4.1.

Algorithm 4.1 Sparse spectral method
Input: PDE data $a$, $\mathbf{b}$, $c$, and $f$, a sparsity parameter $s$, a bandwidth parameter $K$, and stamping level $N$
Output: Fourier coefficients $\hat{\mathbf{u}}^{s,N}$ of approximate solution
1: $\hat{\mathbf{a}}^s \leftarrow \mathrm{SFT}[s, K](a)$ // SFT is Algorithm 3.1 using a random rank-1 lattice (cf. Section 4.5)
2: $\mathcal{A}^s \leftarrow \operatorname{supp}(\hat{\mathbf{a}}^s)$
3: for $j \in [d]$ do
4:   $\hat{\mathbf{b}}^s_j \leftarrow \mathrm{SFT}[s, K](b_j)$
5:   $\mathcal{A}^s \leftarrow \mathcal{A}^s \cup \operatorname{supp}\left(\hat{\mathbf{b}}^s_j\right)$
6: end for
7: $\hat{\mathbf{c}}^s \leftarrow \mathrm{SFT}[s, K](c)$
8: $\mathcal{A}^s \leftarrow \mathcal{A}^s \cup \operatorname{supp}(\hat{\mathbf{c}}^s)$
9: $\hat{\mathbf{f}}^s \leftarrow \mathrm{SFT}[s, K](f)$
10: Compute $S^N[\mathcal{A}^s]\left(\operatorname{supp}\left(\hat{\mathbf{f}}^s\right)\right)$ // see, e.g., (4.4) or (4.7)
11: $(\mathsf{L}^{s,N})_{\mathbf{k} \in S^N, \mathbf{l} \in S^N} \leftarrow (2\pi)^2 (\mathbf{l} \cdot \mathbf{k}) \hat{a}^s_{\mathbf{k}-\mathbf{l}} + 2\pi i \left( \hat{\mathbf{b}}^s_{\mathbf{k}-\mathbf{l}} \cdot \mathbf{l} \right) + \hat{c}^s_{\mathbf{k}-\mathbf{l}}$
12: $\hat{\mathbf{u}}^{s,N} \leftarrow \mathsf{L}^{s,N} \backslash \hat{\mathbf{f}}^s$ // using MATLAB backslash notation for matrix solve

Showing that $u^{s,N}$ converges to $u$ now relies on a version of Strang's lemma [13, Equation (6.4.46)]. We make the assumption here that all functions' Fourier coefficients are supported on the supports of the outputs of their respective SFTs so that our use of $S^N$ is unambiguous. However, this assumption will be lifted by Lemma 4.5 in Corollary 4.4 below.

Lemma 4.6 (Strang's Lemma). Suppose that $\operatorname{supp}(\hat{a}) = \operatorname{supp}(\hat{\mathbf{a}}^s)$, $\operatorname{supp}(\hat{b}_j) = \operatorname{supp}(\hat{\mathbf{b}}^s_j)$ for all $j \in [d]$, $\operatorname{supp}(\hat{c}) = \operatorname{supp}(\hat{\mathbf{c}}^s)$, and $\operatorname{supp}(\hat{f}) = \operatorname{supp}(\hat{\mathbf{f}}^s)$. Also suppose that $a^s \ge a^s_{\min} > 0$ and $-\frac{1}{2} \nabla \cdot \mathbf{b}^s + c^s \ge d^s_{\min} > 0$ on $\mathbb{T}^d$, with $\alpha^s \ge \min\{a^s_{\min}, d^s_{\min}\}$. Additionally, define
\[
\beta^s_- := \max\left\{ \|a - a^s\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \|\mathbf{b} - \mathbf{b}^s\|_2,\ \|c - c^s\|_{L^\infty} \right\}.
\]
Let $u$ and $u^{s,N}$ be as above. Then
\[
\left\| u - u^{s,N} \right\|_{H^1} \le \left( 1 + \frac{\beta}{\alpha^s} \right) \left\| u|_{\mathbb{Z}^d \setminus S^N} \right\|_{H^1} + \frac{\beta^s_-}{\alpha^s} \left\| u|_{S^N} \right\|_{H^1} + \frac{\|f - f^s\|_{L^2}}{\alpha^s}.
\]

Proof. Define $L^s$ as $L$ where $a$, $\mathbf{b}$, and $c$ are replaced by $a^s$, $\mathbf{b}^s$, and $c^s$. Note that $L^s$ is still an infinite dimensional operator and is not truncated to $S^N$ like $\mathsf{L}^{s,N}$ is. We let $\hat{\mathbf{e}} := \hat{\mathbf{u}}^{s,N} - \hat{u}|_{S^N}$, and consider
\begin{align*}
\mathsf{L}^{s,N} \hat{\mathbf{e}} &= \mathsf{L}^{s,N} \hat{\mathbf{u}}^{s,N} - (L^s \hat{u}|_{S^N})|_{S^N} = \hat{\mathbf{f}}^s - \hat{f} + (L\hat{u})|_{S^N} - (L^s \hat{u}|_{S^N})|_{S^N} \\
&= \hat{\mathbf{f}}^s - \hat{f} + (L \hat{u}|_{\mathbb{Z}^d \setminus S^N})|_{S^N} + ((L - L^s) \hat{u}|_{S^N})|_{S^N}.
\end{align*}
Noting that $\mathsf{L}^{s,N} \hat{\mathbf{e}} = (L^s \hat{\mathbf{e}})|_{S^N}$ and owing to coercivity of $L^s$, we have
\[
\alpha^s \|e\|_{H^1}^2 \le |\langle \mathsf{L}^{s,N} \hat{\mathbf{e}}, \hat{\mathbf{e}} \rangle| \le \|f^s - f\|_{L^2} \|e\|_{H^1} + \beta \left\| u|_{\mathbb{Z}^d \setminus S^N} \right\|_{H^1} \|e\|_{H^1} + \beta^s_- \left\| u|_{S^N} \right\|_{H^1} \|e\|_{H^1}.
\]
The result then follows from rearranging to estimate $\|e\|_{H^1}$ and using the triangle inequality to estimate
\[
\left\| u - u^{s,N} \right\|_{H^1} \le \left\| u - u|_{S^N} \right\|_{H^1} + \|e\|_{H^1}.
\]
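For concreteness, the following MATLAB sketch (ours; a dense, brute-force version of lines 10–12 of Algorithm 4.1, not the object-oriented released code) assembles $\mathsf{L}^{s,N}$ entrywise from the sparse Fourier data and solves (4.13) by a direct matrix solve. The function name and data layout are our assumptions.

```matlab
function [uhat, L] = solve_sparse_galerkin(S, Ka, ahat, Kb, bhat, Kc, chat, fhat)
% SOLVE_SPARSE_GALERKIN  Dense sketch of (4.12)-(4.13) on the stamping set S.
%   S: |S| x d frequencies k (rows); fhat: |S| x 1 right-hand side on S.
%   Ka, Kc: support rows of a-hat, c-hat; ahat, chat: matching coefficients.
%   Kb: support rows shared by all b-hat_j; bhat: |Kb| x d coefficient vectors.
    m = size(S, 1);
    L = zeros(m, m);
    for p = 1:m
        for q = 1:m
            k = S(p, :); l = S(q, :); kl = k - l;
            [ta, ia] = ismember(kl, Ka, 'rows');
            [tb, ib] = ismember(kl, Kb, 'rows');
            [tc, ic] = ismember(kl, Kc, 'rows');
            v = 0;
            if ta, v = v + (2*pi)^2 * (l * k.') * ahat(ia); end  % diffusion
            if tb, v = v + 2i * pi * (bhat(ib, :) * l.');   end  % advection
            if tc, v = v + chat(ic);                        end  % reaction
            L(p, q) = v;
        end
    end
    uhat = L \ fhat;   % line 12 of Algorithm 4.1
end
```

The entry formula is exactly that of (4.12); only the $O(|S^N|^2)$ brute-force support lookups are a simplification of what a practical implementation would organize more cleverly.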
We can now thread all of our results together into a final convergence analysis. The first corollary below is a more direct application of Strang's lemma which is then followed by another corollary which takes advantage of the SFT recovery results. We will also return to the setting where the PDE data are not necessarily Fourier sparse. Thus, we again employ intermediate, compactly Fourier-supported PDE data as in Lemma 4.5.

Corollary 4.4. Let $a^s$, $\mathbf{b}^s$, $c^s$, and $f^s$ be Fourier sparse approximations of $a$, $\mathbf{b}$, $c$, and $f$. Let $a' = a|_{\operatorname{supp}(\hat{\mathbf{a}}^s)}$, $b'_j = b_j|_{\operatorname{supp}(\hat{\mathbf{b}}^s_j)}$ for all $j \in [d]$, $c' = c|_{\operatorname{supp}(\hat{\mathbf{c}}^s)}$, and $f' = f|_{\operatorname{supp}(\hat{\mathbf{f}}^s)}$. Suppose $a, \mathbf{b}, c, f$; $a', \mathbf{b}', c', f'$; and $a^s, \mathbf{b}^s, c^s, f^s$ satisfy the conditions of Proposition 4.1 with coercivity constants $\alpha$, $\alpha'$, and $\alpha^s$ respectively. Define the three modified continuity constants
\begin{align*}
\beta'_- &:= \max\left\{ \|a - a'\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \|\mathbf{b} - \mathbf{b}'\|_2,\ \|c - c'\|_{L^\infty} \right\}, \\
\beta^{\prime,0}_- &:= \max\left\{ \|a' - \hat{a}_{\mathbf{0}}'\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \left\| \mathbf{b}' - \hat{\mathbf{b}}_{\mathbf{0}}' \right\|_2,\ \|c' - \hat{c}_{\mathbf{0}}'\|_{L^\infty} \right\}, \\
\beta^{\prime,s}_- &:= \max\left\{ \|a' - a^s\|_{L^\infty},\ \sup_{\mathbf{x} \in \mathbb{T}^d} \|\mathbf{b}' - \mathbf{b}^s\|_2,\ \|c' - c^s\|_{L^\infty} \right\}.
\end{align*}
Additionally, suppose that
\[
3\beta^{\prime,0}_- < \alpha'. \tag{4.14}
\]
Then with $u$ the exact solution to (WF) and $u^{s,N}$ the output of Algorithm 4.1, we have
\[
\left\| u - u^{s,N} \right\|_{H^1} \le \frac{\|f - f'\|_{L^2}}{\alpha} + \frac{\beta'_- \|f'\|_{L^2}}{\alpha \alpha'} + \frac{\beta^{\prime,s}_- \|f'\|_{L^2}}{\alpha^s \alpha'} + \frac{\|f' - f^s\|_{L^2}}{\alpha^s} + \left( 1 + \frac{\beta'_-}{\alpha^s} \right) \left( \frac{\beta^{\prime,0}_-}{\alpha' - 2\beta^{\prime,0}_-} \right)^{N+1} \frac{\|f'\|_{L^2}}{\alpha'}. \tag{4.15}
\]

Proof. The condition (4.14) allows the use of Lemma 4.3, which upper bounds the truncation error in Lemma 4.6. Combining Lemma 4.5 with this bound from Lemma 4.6 and applying the stability estimate from Proposition 4.1 finishes the proof.

This upper bound relies on the intermediate $a'$, $\mathbf{b}'$, $c'$, and $f'$. However, in practice, it is more likely that the user of this algorithm will have knowledge regarding the well-posedness of the original problem (i.e., that $a$, $\mathbf{b}$, $c$, and $f$ satisfy Proposition 4.1) and will be able to verify the well-posedness of the sparse approximate problem (i.e., that $a^s$, $\mathbf{b}^s$, $c^s$, and $f^s$ satisfy Proposition 4.1) or at least increase the accuracy of the SFT so that the coercivity conditions of the original problem are not too far perturbed. The intermediate "prime" functions, on the other hand, are less accessible. Therefore, we rewrite this statement so the assumptions and error bounds can be quantified using only errors between the original functions and the sparse approximations, which Corollary 4.3 gives upper bounds for.

Corollary 4.5. Assume that $a, c \in L^\infty(\mathbb{T}^d; \mathbb{R})$, $\mathbf{b} \in H^1(\mathbb{T}^d; \mathbb{R})^d$, and $f \in L^2(\mathbb{T}^d; \mathbb{R})$ with $a(\mathbf{x}) \ge a_{\min} > 0$ and $-\frac{1}{2} \nabla \cdot \mathbf{b}(\mathbf{x}) + c(\mathbf{x}) \ge d_{\min} > 0$ a.e. on $\mathbb{T}^d$. Let $a^s$, $\mathbf{b}^s$, $c^s$, and $f^s$ be Fourier sparse approximations supported in frequency on $B^d_K$ of $a$, $\mathbf{b}$, $c$, and $f$ respectively with
\[
\|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1} < a_{\min}, \qquad
\|\hat{c} - \hat{\mathbf{c}}^s\|_{\ell^1} + \frac{\pi K}{2} \sum_{j \in [d]} \left\| \hat{b}_j - \hat{\mathbf{b}}^s_j \right\|_{\ell^1} + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} < d_{\min}. \tag{4.16}
\]
Define
\[
\alpha := \min\left\{ a_{\min} - \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1},\ d_{\min} - \|\hat{c} - \hat{\mathbf{c}}^s\|_{\ell^1} - \frac{\pi K}{2} \sum_{j \in [d]} \left\| \hat{b}_j - \hat{\mathbf{b}}^s_j \right\|_{\ell^1} - \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} \right\} > 0,
\]
\[
\hat{\beta}^s_- := \max\left\{ \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1},\ \sqrt{\sum_{j \in [d]} \left\| \hat{b}_j - \hat{\mathbf{b}}^s_j \right\|_{\ell^1}^2},\ \|\hat{c} - \hat{\mathbf{c}}^s\|_{\ell^1} \right\},
\]
and
\[
\hat{\beta}^0_- := \max\left\{ \|\hat{a} - \hat{a}_{\mathbf{0}}\|_{\ell^1},\ \sqrt{\sum_{j \in [d]} \left\| \hat{b}_j - (\hat{b}_j)_{\mathbf{0}} \right\|_{\ell^1}^2},\ \|\hat{c} - \hat{c}_{\mathbf{0}}\|_{\ell^1} \right\}.
\]
Additionally, suppose that $3\hat{\beta}^0_- \le \alpha$. Then with $u$ the exact solution to (WF) and $u^{s,N}$ the output of Algorithm 4.1, we have
\[
\left\| u - u^{s,N} \right\|_{H^1} \le 3 \frac{\|\hat{f}\|_{\ell^2}}{\alpha} \left( \frac{\left\| \hat{f} - \hat{\mathbf{f}}^s \right\|_{\ell^2}}{\|\hat{f}\|_{\ell^2}} + \frac{\hat{\beta}^s_-}{\alpha} + \left( \frac{\hat{\beta}^0_-}{\alpha - 2\hat{\beta}^0_-} \right)^{N+1} \right).
\]

Proof. Since $\hat{a}' = \hat{a}|_{\operatorname{supp}(\hat{\mathbf{a}}^s)}$,
\[
\|a - a'\|_{L^\infty} \le \|\hat{a} - \hat{a}'\|_{\ell^1} \le \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1}, \qquad \|a' - a^s\|_{L^\infty} \le \|\hat{a}' - \hat{\mathbf{a}}^s\|_{\ell^1} \le \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1},
\]
and analogously for $c$, $b_j$ for all $j \in [d]$, and $f$, where the latter uses $\ell^2$ norms.
This allows for the replacement of $\beta'_-$ and $\beta^{\prime,s}_-$ in (4.15) by $\hat{\beta}^s_-$ as well as the replacement of $\|f - f'\|_{L^2}$ and $\|f' - f^s\|_{L^2}$ by $\left\| \hat{f} - \hat{\mathbf{f}}^s \right\|_{\ell^2}$. A similar argument allows the replacement of $\beta^{\prime,0}_-$ by $\hat{\beta}^0_-$.

Additionally, $a^s \ge a - \|a - a^s\|_{L^\infty} \ge a - \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1}$ and $a' \ge a - \|a - a'\|_{L^\infty} \ge a - \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1}$, giving $\min(a^s_{\min}, a'_{\min}) \ge a_{\min} - \|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1}$. We can bound $\min(d^s_{\min}, d'_{\min})$ from below similarly. In particular, e.g.,
\[
c' - \frac{1}{2} \nabla \cdot \mathbf{b}' \ge c - \frac{1}{2} \nabla \cdot \mathbf{b} - \|c - c'\|_{L^\infty} - \frac{1}{2} \|\nabla \cdot (\mathbf{b} - \mathbf{b}')\|_{L^\infty}.
\]
The $\|c - c'\|_{L^\infty}$ term can be bounded by $\|\hat{c} - \hat{\mathbf{c}}^s\|_{\ell^1}$. To bound the divergence term, we use
\begin{align*}
\|\nabla \cdot (\mathbf{b} - \mathbf{b}')\|_{L^\infty}
&\le \|\nabla \cdot (\mathbf{b} - \mathbf{b}')|_K\|_{L^\infty} + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} \\
&= \left\| \sum_{j \in [d]} \sum_{\mathbf{k} \in B^d_K \setminus \operatorname{supp}(\hat{\mathbf{b}}^s_j)} \left( \hat{b}_j \right)_{\mathbf{k}} \partial_j \mathrm{e}^{2\pi i \mathbf{k} \cdot \circ} \right\|_{L^\infty} + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} \\
&\le 2\pi \sum_{j \in [d]} \sum_{\mathbf{k} \in B^d_K \setminus \operatorname{supp}(\hat{\mathbf{b}}^s_j)} \left| \left( \hat{b}_j \right)_{\mathbf{k}} k_j \right| + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} \tag{4.17} \\
&\le K\pi \sum_{j \in [d]} \left\| \hat{b}_j - \hat{b}'_j \right\|_{\ell^1} + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty} \le K\pi \sum_{j \in [d]} \left\| \hat{b}_j - \hat{\mathbf{b}}^s_j \right\|_{\ell^1} + \|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty}.
\end{align*}
Thus $\min(\alpha', \alpha^s) \ge \alpha$ as stated, implying the satisfaction of Proposition 4.1 for the PDEs with $a', \mathbf{b}', c', f'$ and $a^s, \mathbf{b}^s, c^s, f^s$ as data. This also allows the replacement of $\alpha$, $\alpha'$, and $\alpha^s$ in (4.15) by $\alpha$. The rest follows by upper bounding $\|f'\|_{L^2}$ by $\|\hat{f}\|_{\ell^2}$, combining like terms, and simplifying.

Remark 4.1. Corollary 4.5 includes some overly cautious concessions in order to produce a fully unified result with cleaner error bounds. In particular, condition (4.16) and the resulting definition of $\alpha$ are used to avoid the need to consider well-posedness of the approximate versions of the PDE as required in Corollary 4.4. In general, this condition is less important as the SFT approximations of the PDE data become more accurate. The pessimistic advection term bounding in (4.17) is a result of the fact that $C^1$ guarantees for the SFT algorithm are not available. Again, this step is unnecessary if it is known (or assumed) that the approximate PDEs are well-posed. However, note that the truncation term $\|\nabla \cdot (\mathbf{b} - \mathbf{b}|_K)\|_{L^\infty}$ can be controlled via regularity results for multivariate Fourier truncation, e.g., [64, 47], so long as the regularity of the advection field is known a priori.

Remark 4.2. We can interpret this upper bound by focusing on the sum
\[
\frac{\left\| \hat{f} - \hat{\mathbf{f}}^s \right\|_{\ell^2}}{\|\hat{f}\|_{\ell^2}} + \frac{\hat{\beta}^s_-}{\alpha} + \left( \frac{\hat{\beta}^0_-}{\alpha - 2\hat{\beta}^0_-} \right)^{N+1}. \tag{4.18}
\]
The first term is controlled by the accuracy of the SFT approximation to $f$. As a reminder, using Algorithm 3.1 for this SFT produces a near optimal error, upper bounded in Corollary 4.3 by
\[
\left\| \hat{f} - \hat{\mathbf{f}}^s \right\|_{\ell^2} \le \frac{25 + 3K}{\sqrt{s}} \left\| \hat{f} - (\hat{f}|_K)^{\mathrm{opt}}_s \right\|_{\ell^1}.
\]
The second term, $\hat{\beta}^s_-/\alpha$, is controlled by the accuracy of the SFT approximations of the coefficients defining the differential operator, $a$, $\mathbf{b}$, and $c$. Again, recall that Algorithm 3.1 produces near optimal approximations with error upper bounded by, e.g.,
\[
\|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1} \le (25 + 3K) \, s \left\| \hat{a} - (\hat{a}|_K)^{\mathrm{opt}}_s \right\|_{\ell^1}.
\]
The final term is controlled by two factors: the properties of the PDE data and the stamping level chosen. We see that the error decays exponentially as the stamping level increases. The base of this exponent is controlled by the PDE data. In particular, convergence is accelerated as $a$ and $c$ approach large constants and $\mathbf{b}$ approaches a field with divergence zero and little deviation from its mean. Indeed, $\hat{\beta}^0_-$ is reduced as the deviation of all three coefficients from their means decreases. The other piece, $\alpha$ (ignoring the SFT-dependent terms), increases as the minimums of $a$ and $c - \frac{1}{2} \nabla \cdot \mathbf{b}$ increase.

Remark 4.3.
The computational complexity of Algorithm 4.1 is
\[
O\left( ds \log^4(dK \max(K, s)) + \max(s, 2N+1)^{3\min(s, 2N+1)} \right)
\]
in the case of no advection field, and
\[
O\left( d^2 s \log^4(dK \max(K, s)) + \max(ds, 2N+1)^{3\min(ds, 2N+1)} \right)
\]
when an advection field is present. This is due to the three or $d + 3$ SFTs respectively and a matrix solve of an $|S^N| \times |S^N|$ system. Note that computing the stamping set can be done by enumerating the frequencies using the techniques in Lemma 4.2 and therefore is subject to the same upper bound as given in Lemma 4.1 for a stamping set's cardinality. Recall also that the SFT complexity can be tuned to produce SFT approximations satisfying the above bounds with higher probability.

We do not analyze the complexity of the matrix solve in depth, and instead resort to the upper bound given by Gaussian elimination on the dense matrix. However, $\mathsf{L}^{s,N}$ is relatively sparse for larger stamping levels. As the capabilities of sparse solvers depend strongly on analyzing the graph connecting interacting rows in $\mathsf{L}^{s,N}$ (cf. [28, Chapter 11]), we expect that the analysis of an efficient sparse solver could be carried out using much of the same analysis of stamping sets performed in Section 4.4.

4.7 Numerics

This section gives examples of the algorithm summarized above applied to various problems. We begin with an overview of our implementation as well as some techniques used to evaluate the accuracy of our approximations. We then present solutions to univariate and very high-dimensional multiscale problems with both exactly sparse and Fourier-compressible data. For simplicity, all experiments presented except for the last discard the advection and reaction terms, solving only a stationary diffusion equation. In this setting, solutions are unique up to constant shifts, so we always consider solutions with mean zero, that is, $\hat{u}_{\mathbf{0}} = 0$.

4.7.1 Code and testing overview

We implement Algorithm 4.1 described above in MATLAB using an object-oriented approach, with all code publicly available.¹ All SFTs are computed using the rank-1 lattice sparse Fourier transforms from Chapter 3.²

¹https://gitlab.com/grosscra/SparseADR
²This code is publicly available at https://gitlab.com/grosscra/Rank1LatticeSparseFourier

In order to evaluate the quality of our approximations, we need to choose an appropriate metric. Letting $u^{s,N}$ be the approximation returned by our algorithm, the ideal choice would be to use $\left\| u - u^{s,N} \right\|_{H^1}$. However, for the types of problems we will be investigating, the true solution $u$ is unavailable to us. Instead, we will use a proxy that takes advantage of the stability result in Proposition 4.1.

Lemma 4.7. Let $u$ be the true solution to (GF) and $u^{s,N}$ be the approximation returned by solving (4.13). Define $\hat{f}^{s,N} := L\hat{u}^{s,N}$ with $f^{s,N} = \mathcal{L}u^{s,N}$. Then
\[
\left\| u - u^{s,N} \right\|_{H^1} \le \frac{\left\| f - f^{s,N} \right\|_{L^2}}{\alpha} = \frac{\left\| \hat{f} - \hat{f}^{s,N} \right\|_{\ell^2}}{\alpha}.
\]

Proof. The result follows from the fact that $\hat{u} - \hat{u}^{s,N}$ solves $L\left( \hat{u} - \hat{u}^{s,N} \right) = \hat{f} - L\hat{u}^{s,N} = \hat{f} - \hat{f}^{s,N}$ and applying Proposition 4.1.

In the sequel, we will ignore $\alpha$ since we are mostly interested in convergence properties in $s$ and $N$, and we will compute the relative error
\[
\frac{\left\| f - f^{s,N} \right\|_{L^2}}{\|f\|_{L^2}} \quad \text{or} \quad \frac{\left\| \hat{f} - \hat{f}^{s,N} \right\|_{\ell^2}}{\|\hat{f}\|_{\ell^2}}
\]
as our proxy instead. Whenever the data are exactly Fourier-sparse, the numerator of the second of these proxies can be computed exactly due to the fact that $\operatorname{supp}(\hat{f}^{s,N})$ is known to be contained in $S^{N+1}$ (cf. Proposition 4.4).
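For exactly sparse data, this exact computation can be organized as follows (a sketch, ours, reusing the stamping_set and solve_sparse_galerkin sketches from earlier in the chapter; it assumes those variables are in scope and that $\mathbf{0} \in \mathcal{A}$ so that $S^N \subset S^{N+1}$).

```matlab
% Exact proxy error of Lemma 4.7 for exactly sparse data: f-hat^{s,N} = L u-hat^{s,N}
% is supported in S^{N+1} by Proposition 4.4, so the l2 norm is a finite sum.
S1 = stamping_set(Kf, A, N + 1);                    % S^{N+1}; Kf = supp(f-hat)
[~, iu] = ismember(S, S1, 'rows');                  % embed S^N into S^{N+1}
upad = zeros(size(S1, 1), 1); upad(iu) = uhat;
[~, jf] = ismember(Kf, S1, 'rows');                 % f-hat lives on S^0 = supp(f-hat)
fpad = zeros(size(S1, 1), 1); fpad(jf) = fhat;
[~, LsN1] = solve_sparse_galerkin(S1, Ka, ahat, Kb, bhat, Kc, chat, fpad);
relerr = norm(fpad - LsN1 * upad) / norm(fpad);     % exact relative proxy error
```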
However, in the non-sparse setting, even though $f - f^{s,N}$ can be evaluated pointwise, computing an accurate approximation of its norm on $\mathbb{T}^d$ is challenging for large $d$. For this reason, we approximate the norm via Monte Carlo sampling. We also furnish the cases where exactly computing $\left\| \hat{f} - \hat{f}^{s,N} \right\|_{\ell^2}$ is possible with the pointwise Monte Carlo estimates to show that in practice, Monte Carlo sampling does as well as the exact computation.

4.7.2 Univariate compressible

We begin by replicating the lone numerical example of solving an elliptic problem in [21, Section 5.1]. In this case, we solve the univariate problem $-(a(x)u'(x))' = f(x)$ for all $x \in \mathbb{T}$, where
\[
a(x) = \exp\left( \frac{0.6 + 0.2\cos(2\pi x)}{1 + 0.7\sin(256\pi x)} \right), \qquad f(x) = \frac{1}{10}\left( \exp(-\cos(2\pi x)) - \int_{\mathbb{T}} \exp(-\cos(2\pi x)) \, dx \right) \tag{4.19}
\]
(note that the only difference from [21] is that we use the domain $\mathbb{T} = [0, 1]$ rather than $[0, 2\pi]$). This data is not Fourier sparse, but is compressible. In the original paper, a bandwidth of $K = 1\,536$ is considered and approximations with 9 and 17 Fourier coefficients are used.

We first construct a high accuracy approximation of the solution to (4.19) by numerically integrating on an extremely fine mesh of 10 000 points. This allows us to forgo our proxy error described in Lemma 4.7. As in [21], the bandwidth of our SFT is set to $K = 1\,536$. Due to our SFT returning a $2s$-sparse approximation, we use $s = 4$ and $s = 8$ to compare with the 9 and 17 terms respectively considered in the original paper, and also provide an example with $s = 12$. We set the stamping level to $N = 1$ throughout, which, as discussed in the introduction, is similar to the technique used in [21].

[Figure 4.2: relative errors in $L^2$ and $H^1$ together with the proxy error, plotted against the sparsity $s = 4, 8, 12$.] Figure 4.2 Errors in approximating the solution to (4.19).

[Figure 4.3: (a) the approximate solutions $u^{4,1}$, $u^{8,1}$, $u^{12,1}$ plotted against $u$ over $[0, 1]$; (b) detail of the approximate derivatives $(u^{4,1})'$, $(u^{8,1})'$, $(u^{12,1})'$ plotted against $u'$ near $x = 0.685$.] Figure 4.3 Qualitative results. (a) Approximate solutions of (4.19). (b) Detail of approximate derivatives of (4.19).

The relative errors approximated in $L^2$ and $H^1$ are given in Figure 4.2. The original paper does not give numerical results, and instead gives qualitative results, comparing the approximate solutions and their derivatives with the true solution and its derivative. We have replicated this qualitative analysis in Figure 4.3 with similar results.

Figure 4.2 also shows the error computed via the proxy described by Lemma 4.7, and in particular, how pessimistic the proxy error can be. In this case, the small errors in the derivative (visualized in Figure 4.3b) are compounded by passing the approximate solution through the operator where $a'$ is often large relative to $a$. In future examples, we will see that the convergence of the proxy error is much more tolerable.

4.7.3 Multivariate exactly sparse

4.7.3.1 Low sparsity

Moving to the multivariate case, we start with a simple example with exactly sparse data. Our goal is to solve $-\nabla \cdot (a(\mathbf{x}) \nabla u(\mathbf{x})) = f(\mathbf{x})$ for all $\mathbf{x} \in \mathbb{T}^d$, where
\[
a(\mathbf{x}) = \hat{a}_{\mathbf{0}} + c_a \cos(2\pi \mathbf{k}_a \cdot \mathbf{x}), \qquad f(\mathbf{x}) = \sin\left( 2\pi \mathbf{k}_f \cdot \mathbf{x} \right). \tag{4.20}
\]
We draw $c_a \sim \mathrm{Unif}([-1, 1])$, keep it constant for each dimension, and set $\hat{a}_{\mathbf{0}} = 4$ so that our problem remains elliptic (in the specific example below, $c_a \approx -0.6$). For dimensions varying from $d = 1$ to $d = 1\,024$, we then draw $\mathbf{k}_a, \mathbf{k}_f \sim \mathrm{Unif}\left( [-499, 500]^d \cap \mathbb{Z}^d \right)$. The PDE (4.20) is then solved for stamping levels $N = 1, \ldots, 5$. The bandwidth of the SFT is set to 1 000 and the sparsity is set to 2.
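The Monte Carlo proxy estimates used throughout the experiments below can be formed as in the following MATLAB sketch (ours); ffun and fsNfun are hypothetical function handles, assumed to evaluate $f$ and $f^{s,N} = \mathcal{L}u^{s,N}$ pointwise at the rows of X.

```matlab
% Monte Carlo estimate (ours) of the relative proxy error ||f - f^{s,N}|| / ||f||
% in L^2(T^d) from uniform samples. Note that a sparse trigonometric polynomial
% with frequencies K (rows) and coefficients c evaluates as exp(2i*pi*X*K.') * c.
rng(2); nMC = 1000; d = 16;
X = rand(nMC, d);                              % uniform samples on T^d = [0,1)^d
res = ffun(X) - fsNfun(X);                     % pointwise residual
relerr = sqrt(mean(abs(res).^2)) / sqrt(mean(abs(ffun(X)).^2));
```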
We then compute a Monte Carlo approximation of the proxy error choosing 200 points drawn uniformly from $\mathbb{T}^d$ and also compute the proxy error exactly by virtue of the sparsity of $a$ and $f$. The results are given in Figure 4.4.

We see that the results do not depend on the dimension of the problem. Since all dependence on $d$ is in the runtime of the SFT, we also observe that in practice, after the SFTs of the data have been computed, re-solving the problem on different stamping levels takes about the same amount of time for each $d$. The error also converges exponentially in the stamping level as suggested by the theoretical error guarantees. Notably, we also see that the Monte Carlo approximation with 200 points captures the same proxy error as the exact computation.

[Figure 4.4: relative proxy error $\|f - f^{s,N}\|_{L^2}/\|f\|_{L^2}$ against the stamping level $N = 1, \ldots, 5$, with Monte Carlo and exact computations for each dimension.] Figure 4.4 Proxy error solving (4.20) with $d = 1, 4, 16, 64, 256, 1\,024$ and $N = 1, \ldots, 5$.

4.7.3.2 High sparsity

We expand on the exactly sparse case by testing a diffusion coefficient with much higher sparsity. Here, we solve (4.20) with
\[
a(\mathbf{x}) = \hat{a}_{\mathbf{0}} + \sum_{\mathbf{k} \in I_a} c_{\mathbf{k}} \cos(2\pi \mathbf{k} \cdot \mathbf{x}). \tag{4.21}
\]
The vector of coefficients is drawn as $\mathbf{c} \sim \mathrm{Unif}\left( [-1, 1]^{25} \right)$ once and reused in each test. For every $d$, the frequencies $\mathbf{k} \in I_a$ are each drawn uniformly from $[-499, 500]^d \cap \mathbb{Z}^d$ as before with $|I_a| = 25$. Here $\hat{a}_{\mathbf{0}} = 4\lceil \|\mathbf{c}\|_2 \rceil$ to ensure ellipticity. Again, the bandwidth of the SFT algorithm is set to 1 000, but the sparsity is now fixed to 26. The results are given in Figure 4.5.

[Figure 4.5: relative proxy error against stamping levels $N = 1, \ldots, 3$, with Monte Carlo and exact computations for each dimension.] Figure 4.5 Proxy error solving (4.20) with diffusion coefficient (4.21) in dimensions $d = 1, 4, 16, 64, 256, 1\,024$ and stamping levels $N = 1, \ldots, 3$.

Again, we see that the results do not depend on the spatial dimension except for the notable example of $d = 1$. The $d = 1$ case suffers from similar issues in a pessimistic proxy error as in Figure 4.2. Specifically, the right-hand side for this example was generated with frequency $k_f = -10$ and is therefore relatively low-frequency. Thus, the high-frequency modes leading to errors in the approximate solution are amplified by the high frequencies in $a$ when computing $f^{s,N}$. Indeed, in further experiments (not pictured here), increasing the frequencies of $f$ or decreasing the frequencies of $a$ results in a lower proxy error.

For the other dimensions, the slight offsets in the exact proxy error can be attributed to the randomized frequencies as well as slight variations in the randomized SFT code. We do see slightly more variance in the proxy error computed using Monte Carlo sampling however. This is to be expected for data with more varied frequency content, and as such, in future experiments, we increase the number of sampling points.

Note that because we consider sparsity much larger than the stamping level, the computational and memory complexity of the stamping and solution step is much higher. As suggested by Lemma 4.1, the size of the resulting stamping set (and therefore the necessary matrix solve) in the largest case is at most $7 \cdot 52^7 \approx 7 \times 10^{12}$, which pushes the memory boundaries of our computational resources.
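For concreteness, the data (4.21) used in this experiment can be generated and evaluated as in the following sketch (ours; variable names are illustrative assumptions).

```matlab
% Construct the high-sparsity diffusion coefficient (4.21) and sparse f of (4.20).
rng(3); d = 16;
Ia = randi([-499, 500], 25, d);            % 25 random frequencies for a
c  = 2 * rand(25, 1) - 1;                  % coefficients ~ Unif([-1, 1])
a0 = 4 * ceil(norm(c));                    % a-hat_0 = 4*ceil(||c||_2) for ellipticity
kf = randi([-499, 500], 1, d);             % frequency of the right-hand side
afun = @(X) a0 + cos(2*pi * X * Ia.') * c; % evaluate a at rows of X
ffun = @(X) sin(2*pi * X * kf.');          % evaluate f at rows of X
```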
4.7.4 Multivariate compressible

In order to test Fourier-compressible data which is not exactly sparse, we use a series of tensorized, periodized Gaussians. Here, we present only the details necessary to demonstrate our algorithm's effectiveness on Fourier-compressible data, but for a fuller treatment of the Fourier properties of periodized Gaussians, see e.g., [53, Section 2.1].

Here, we define the periodic Gaussian $G_r : \mathbb{T} \to \mathbb{R}$ by
\[
G_r(x) = \frac{\sqrt{2\pi}}{r} \sum_{m=-\infty}^{\infty} \mathrm{e}^{-\frac{(2\pi)^2 (2x - m)^2}{2r^2}},
\]
where the dilation-type parameter $r$ allows us to control the effective support of $\hat{G}_r$. In practice, we truncate the infinite sum to $m \in \{-10, \ldots, 10\}$ as additional terms do not change the output up to machine precision. Note here that the nonstandard multiplicative factors help control the behavior of the function in frequency rather than space. Given a multivariate modulating frequency $\mathbf{k} \in \mathbb{Z}^d$, we define the modulated, tensorized, periodic Gaussian by
\[
G_{r,\mathbf{k}}(\mathbf{x}) = \prod_{j \in [d]} \mathrm{e}^{2\pi i k_j x_j} G_r(x_j).
\]
Finally, given a set of frequencies $I \subset \mathbb{Z}^d$, dilation parameters $\mathbf{r} \in \mathbb{R}^I_+$, and coefficients $\mathbf{c} \in \mathbb{R}^I$, we can define the Gaussian series
\[
G^{\mathbf{c},\mathbf{r}}_I(\mathbf{x}) := \sum_{\mathbf{k} \in I} c_{\mathbf{k}} G_{r_{\mathbf{k}},\mathbf{k}}(\mathbf{x}).
\]
Depending on the severity of the dilations chosen (i.e., $r_{\mathbf{k}} \gg 1$), this can well approximate a Fourier series with frequencies in $I$. On the other hand, a less severe dilation results in Fourier coefficients with magnitudes forming less concentrated Gaussians centered around the "frequencies" $\mathbf{k} \in I$ and $-\mathbf{k}$. An example of a series with its associated Fourier transform is given in Figure 4.6.

[Figure 4.6: (a) the Gaussian series $c_1 G_{r_1,\mathbf{k}_1} + c_2 G_{r_2,\mathbf{k}_2}$ on $\mathbb{T}^2$; (b) its Fourier transform $c_1 \hat{G}_{r_1,\mathbf{k}_1} + c_2 \hat{G}_{r_2,\mathbf{k}_2}$ over $(k_0, k_1) \in [-40, 50]^2$.] Figure 4.6 An example Gaussian series with $c_1 = c_2 = 1$, $r_1 = 0.5$, $r_2 = 2$, $\mathbf{k}_1 = (3, 2)$, and $\mathbf{k}_2 = (-5, 15)$. The first term corresponds to the wider Gaussian shape and more spread out portions of the Fourier transform. The second term contributes to the highly oscillatory parts and the isolated spikes in the Fourier transform.

In our first experiment, we fix $d = 2$ and vary both stamping level and sparsity to again solve (4.20). The diffusion coefficient in (4.20) is replaced with a two-term Gaussian series $a = c_0 + G^{\mathbf{c},\mathbf{r}}_I$, where the two frequencies in $I$ are drawn uniformly from $[-24, 25]^2 \cap \mathbb{Z}^2$,
\[
\mathbf{c} \sim \mathrm{Unif}\left( [-1, 1]^2 \right), \quad \mathbf{r} = 1.1^2 \, \mathbf{1}, \quad c_0 = 10\lceil \|\mathbf{c}\|_2 \rceil.
\]
Note the increased constant factor from our previous examples to decrease the likelihood of sparse approximations of $a$ not satisfying the ellipticity property. The Fourier transform of the resulting $a$ used for the following test is depicted in Figure 4.7 below. The diffusion equation is then solved across various sparsities with increasing stamping level. The bandwidth parameter of the SFT is set to $K = 100$ to account for the wider effective support of $\hat{a}$. The Monte Carlo proxy error is computed with 1 000 samples and depicted in Figure 4.8.

[Figure 4.7: the magnitudes of the Fourier coefficients of $a$ over $(k_0, k_1) \in [-40, 50]^2$.] Figure 4.7 The specific $\hat{a}$ used in examples depicted in Figure 4.8.

Here, the stamping level does not affect convergence until the sparsity is above $s \ge 16$. This demonstrates the tradeoff between sparsity and stamping level in regards to the error bound (4.18). Until the SFT is able to capture enough useful information in $\hat{a}$, the $\|\hat{a} - \hat{\mathbf{a}}^s\|_{\ell^1}$ piece of the error bound dominates. Eventually, this factor is reduced far enough that the stamping term becomes apparent.
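For reproducibility, a MATLAB sketch (ours) of the truncated periodized Gaussian and its modulated tensorization follows; note that the displayed formula for $G_r$ above is our reconstruction of a garbled original, so the constants inside the exponential should be checked against [53, Section 2.1].

```matlab
function v = gaussian_series_term(X, r, k)
% GAUSSIAN_SERIES_TERM  Sketch of the modulated, tensorized periodic Gaussian
% G_{r,k} at the rows of X, truncating the periodization to m = -10..10.
%   X: n x d points in T^d, r: dilation parameter, k: 1 x d frequency.
    m = -10:10;                              % truncation of the infinite sum
    v = exp(2i * pi * (X * k.'));            % modulation e^{2 pi i k.x}
    for j = 1:size(X, 2)                     % tensor product over coordinates
        Gj = (sqrt(2*pi) / r) * sum(exp(-(2*pi)^2 * (2*X(:, j) - m).^2 / (2*r^2)), 2);
        v = v .* Gj;
    end
end
```

A Gaussian series $G^{\mathbf{c},\mathbf{r}}_I$ is then the corresponding linear combination of such terms over the rows of a frequency matrix.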
[Figure 4.8: relative proxy error against stamping levels $N = 1, \ldots, 3$ for each sparsity level.] Figure 4.8 Proxy error solving (4.20) with Gaussian series diffusion coefficient with sparsity levels $s = 2, 4, 8, 16, 32, 64$, and stamping levels $N = 1, \ldots, 3$.

We provide another example, where sparsity is fixed at $s = 16$, and dimension and stamping level are increased. Again we solve (4.20) with the diffusion coefficient replaced by the two-term Gaussian series $a = c_0 + G^{\mathbf{c},\mathbf{r}}_I$, where the two frequencies in $I$ are drawn uniformly from $[-249, 250]^d \cap \mathbb{Z}^d$,
\[
\mathbf{c} \sim \mathrm{Unif}\left( [-1, 1]^2 \right), \quad \mathbf{r} = 1.1^d \, \mathbf{1}, \quad c_0 = 10\lceil \|\mathbf{c}\|_2 \rceil,
\]
and $\mathbf{c}$ and $c_0$ are not redrawn across test cases. The bandwidth of the SFT is set to 1 000 to again account for the potentially widened Fourier transform of $a$. With a 1 000 point Monte Carlo approximation of the proxy error, the results are given in Figure 4.9.

Here we observe much the same behavior as the previous test case. This is due to the fact that the dimension additionally drives the sparsity of the Gaussian Fourier transforms based on the choice of dilation $\mathbf{r} = 1.1^d \, \mathbf{1}$. In additional experiments performed at higher dimensions (not pictured here), this factor results in numerical instability and the approximation error blows up. We also see that the $d = 2$ and $d = 4$ examples are swapped from their assumed positions (and the $d = 2$ case even mildly benefits from an increased stamping level). This is attributed to the random draw of the frequency locations affecting the proxy error as well as the SFT algorithm performing better in lower dimensions when all parameters are fixed.

[Figure 4.9: relative proxy error against the stamping level for each dimension.] Figure 4.9 Approximate proxy error solving (4.20) with Gaussian series diffusion coefficient with $d = 2, 4, 8, 16$ and $N = 1, \ldots, 5$.

4.7.5 Three-dimensional exactly sparse advection-diffusion-reaction equation

We now extend our numerical experiments to the situation of a three-dimensional advection-diffusion-reaction equation. We work with the PDE
\[
-\nabla \cdot (a \nabla u) + \mathbf{b} \cdot \nabla u + c u = f \tag{4.22}
\]
with exactly sparse data
\begin{align*}
a(\mathbf{x}) &= \hat{a}_{\mathbf{0}} + \sum_{\mathbf{k} \in I^{\mathrm{sine}}_a} c^{\mathrm{sine}}_{a,\mathbf{k}} \sin(2\pi \mathbf{k} \cdot \mathbf{x}) + \sum_{\mathbf{k} \in I^{\mathrm{cosine}}_a} c^{\mathrm{cosine}}_{a,\mathbf{k}} \cos(2\pi \mathbf{k} \cdot \mathbf{x}) \\
b_j(\mathbf{x}) &= \sum_{\mathbf{k} \in I^{\mathrm{sine}}_{b_j}} c^{\mathrm{sine}}_{b_j,\mathbf{k}} \sin(2\pi \mathbf{k} \cdot \mathbf{x}) + \sum_{\mathbf{k} \in I^{\mathrm{cosine}}_{b_j}} c^{\mathrm{cosine}}_{b_j,\mathbf{k}} \cos(2\pi \mathbf{k} \cdot \mathbf{x}) \quad \text{for all } j \in [3] \\
c(\mathbf{x}) &= \hat{c}_{\mathbf{0}} + \sum_{\mathbf{k} \in I^{\mathrm{sine}}_c} c^{\mathrm{sine}}_{c,\mathbf{k}} \sin(2\pi \mathbf{k} \cdot \mathbf{x}) + \sum_{\mathbf{k} \in I^{\mathrm{cosine}}_c} c^{\mathrm{cosine}}_{c,\mathbf{k}} \cos(2\pi \mathbf{k} \cdot \mathbf{x}) \tag{4.23} \\
f(\mathbf{x}) &= \sum_{\mathbf{k} \in I^{\mathrm{sine}}_f} c^{\mathrm{sine}}_{f,\mathbf{k}} \sin(2\pi \mathbf{k} \cdot \mathbf{x}) + \sum_{\mathbf{k} \in I^{\mathrm{cosine}}_f} c^{\mathrm{cosine}}_{f,\mathbf{k}} \cos(2\pi \mathbf{k} \cdot \mathbf{x}),
\end{align*}
where
\[
\left| I^{\mathrm{sine}}_a \right| = \left| I^{\mathrm{cosine}}_a \right| = 2, \quad \left| I^{\mathrm{sine}}_{b_j} \right| = \left| I^{\mathrm{cosine}}_{b_j} \right| = \left| I^{\mathrm{sine}}_c \right| = \left| I^{\mathrm{cosine}}_c \right| = 5 \text{ for all } j \in [3], \quad \left| I^{\mathrm{sine}}_f \right| = 2, \text{ and } \left| I^{\mathrm{cosine}}_f \right| = 3.
\]
In total, there are 45 terms composing the differential operator, and 5 terms composing the forcing function. Each frequency is randomly drawn from $\mathrm{Unif}([-49, 50]^3 \cap \mathbb{Z}^3)$ and each coefficient for $a$ and $f$ from $\mathrm{Unif}([-1, 1])$. The coefficients for $\mathbf{b}$ and $c$ are drawn from $\mathrm{Unif}([0, 1])$. To ensure well-posedness,
\[
\hat{a}_{\mathbf{0}} = 4\left\lceil \sqrt{\left\| \mathbf{c}^{\mathrm{sine}}_a \right\|_2^2 + \left\| \mathbf{c}^{\mathrm{cosine}}_a \right\|_2^2} \right\rceil, \quad \text{and} \quad \hat{c}_{\mathbf{0}} = 4\left\lceil \sqrt{\left\| \mathbf{c}^{\mathrm{sine}}_c \right\|_2^2 + \left\| \mathbf{c}^{\mathrm{cosine}}_c \right\|_2^2} \right\rceil.
\]
The bandwidth of the SFT is set to $K = 100$ and we consider sparsity levels $s = 2$ and $s = 5$. Due to the large size of the stamp, we only consider stamping levels $N = 1, 2$.
Relative proxy error $\left\| f - f^{s,N} \right\|_{L^2} / \|f\|_{L^2}$:

s   N   exact   Monte Carlo
2   1   0.518   0.518
2   2   0.518   0.518
5   1   0.054   0.054
5   2   0.031   0.031

Table 4.1 Error in approximating solution to ADR equation (4.22).

[Figure 4.10: (a) slice through $f^{2,1}$; (b) slice through $f^{10,2}$; (c) slice through $f$, each over $(x_2, x_3) \in [0, 0.4]^2$.] Figure 4.10 Samples of $f^{10,2}$ and $f$ on the $x_1 = 63/128$ plane.

The resulting true and Monte Carlo proxy error (sampled over 1 000 points) is given in Table 4.1. Additionally, Figure 4.10 shows a portion of a slice through $f$ as well as $f^{2,1}$ and $f^{10,2}$, which are computed by passing $u^{2,1}$ and $u^{10,2}$ through the differential operator.

We note that $f^{10,2}$ and $f$ appear qualitatively indistinguishable. However, since the sparsity level, $s = 2$, used to compute $u^{2,1}$ is lower than the sparsity of any term in (4.23), $f^{2,1}$ loses some of the characteristics of the original source term. Though it captures some of the true behavior in both larger scales (e.g., the oscillations moving in the northeast direction) and finer scales (e.g., the oscillations moving in the southeast direction), some interfering modes which produce the "wavy" effect are left out. This is supported by the relative errors reported in Table 4.1. Note also that the stamping level affects the convergence in the $s = 5$ case, but not the $s = 2$ case. This is due to the sparsity related errors in (4.18) overwhelming the stamping term until the SFT approximations of the data are accurate enough.

BIBLIOGRAPHY

[1] Sina Bittens and Gerlind Plonka. Real sparse fast DCT for vectors with short support. Linear Algebra Appl., 582:359–390, 2019.

[2] Sina Bittens and Gerlind Plonka. Sparse fast DCT for vectors with one-block support. Numer. Algorithms, 82(2):663–697, 2019.

[3] Sina Bittens, Ruochuan Zhang, and Mark A. Iwen. A deterministic sparse FFT for functions with structured Fourier sparsity. Advances in Computational Mathematics, 45(2):519–561, 2019.

[4] Simone Brugiapaglia. COmpRessed SolvING: Sparse Approximation of PDEs based on Compressed Sensing. PhD thesis, Politecnico di Milano, Milan, Italy, January 2016.

[5] Simone Brugiapaglia. A compressive spectral collocation method for the diffusion equation under the restricted isometry property. In Marta D'Elia, Max Gunzburger, and Gianluigi Rozza, editors, Quantification of Uncertainty: Improving Efficiency and Technology: QUIET selected contributions, Lecture Notes in Computational Science and Engineering, pages 15–40. Springer International Publishing, Cham, 2020.

[6] Simone Brugiapaglia, Sjoerd Dirksen, Hans Christian Jung, and Holger Rauhut. Sparse recovery in bounded Riesz systems with applications to numerical methods for PDEs. Applied and Computational Harmonic Analysis, 53:231–269, July 2021.

[7] Simone Brugiapaglia, Stefano Micheletti, Fabio Nobile, and Simona Perotto. Supplementary material to "Wavelet–Fourier CORSING techniques for multidimensional advection–diffusion–reaction equations", September 2020.

[8] Simone Brugiapaglia, Stefano Micheletti, Fabio Nobile, and Simona Perotto. Wavelet–Fourier CORSING techniques for multidimensional advection–diffusion–reaction equations. IMA Journal of Numerical Analysis, (draa036), September 2020.

[9] Simone Brugiapaglia, Stefano Micheletti, and Simona Perotto. Compressed solving: A numerical approximation technique for elliptic PDEs based on compressed sensing. Computers & Mathematics with Applications, 70(6):1306–1335, September 2015.
[10] Simone Brugiapaglia, Fabio Nobile, Stefano Micheletti, and Simona Perotto. A theoretical study of COmpRessed SolvING for advection-diffusion-reaction problems. Mathematics of Computation, 87(309):1–38, January 2018.

[11] Hans-Joachim Bungartz and Michael Griebel. Sparse grids. Acta Numerica, 13:147–269, May 2004.

[12] Glenn Byrenheid, Lutz Kämmerer, Tino Ullrich, and Toni Volkmer. Tight error bounds for rank-1 lattice sampling in spaces of hybrid mixed smoothness. Numerische Mathematik, 136(4):993–1034, August 2017.

[13] Claudio Canuto, M. Yousuff Hussaini, Alfio Quarteroni, and Thomas A. Zang. Spectral Methods: Fundamentals in Single Domains. Scientific Computation. Springer-Verlag, Berlin Heidelberg, 2006.

[14] Bosu Choi, Andrew Christlieb, and Yang Wang. Multiscale high-dimensional sparse Fourier algorithms for noisy data. ArXiv e-prints, 2019. arXiv:1907.03692.

[15] Bosu Choi, Andrew Christlieb, and Yang Wang. High-dimensional sparse Fourier algorithms. Numerical Algorithms, 87(1):161–186, May 2021.

[16] Bosu Choi, Mark Iwen, and Toni Volkmer. Sparse harmonic transforms II: best s-term approximation guarantees for bounded orthonormal product bases in sublinear-time. Numerische Mathematik, 148(2):293–362, June 2021.

[17] Bosu Choi, Mark A. Iwen, and Felix Krahmer. Sparse harmonic transforms: A new class of sublinear-time algorithms for learning functions of many variables. Found. Comput. Math., 2020.

[18] Andrew Christlieb, David Lawlor, and Yang Wang. A multiscale sub-linear time Fourier algorithm for noisy data. Appl. Comput. Harmon. Anal., 40(3):553–574, 2016.

[19] Albert Cohen, Wolfgang Dahmen, and Ronald DeVore. Compressed sensing and best k-term approximation. Journal of the American Mathematical Society, 22(1):211–231, January 2009.

[20] Dinh Dũng, Vladimir Temlyakov, and Tino Ullrich. Hyperbolic Cross Approximation. Advanced Courses in Mathematics - CRM Barcelona. Springer International Publishing, Cham, 2018.

[21] Ingrid Daubechies, Olof Runborg, and Jing Zou. A sparse spectral method for homogenization multiscale problems. Multiscale Modeling & Simulation, 6(3):711–740, January 2007.

[22] Michael Döhler, Stefan Kunis, and Daniel Potts. Nonequispaced hyperbolic cross fast Fourier transform. SIAM Journal on Numerical Analysis, 47(6):4415–4428, January 2010.

[23] Lawrence C. Evans. Partial differential equations. Number 19 in Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, second edition, 2010.

[24] Simon Foucart and Holger Rauhut. A mathematical introduction to compressive sensing. Springer, 2013.

[25] A. C. Gilbert, S. Muthukrishnan, and M. Strauss. Improved time bounds for near-optimal sparse Fourier representations. In Manos Papadakis, Andrew F. Laine, and Michael A. Unser, editors, Wavelets XI, volume 5914, pages 398–412. International Society for Optics and Photonics, SPIE, 2005.

[26] Anna C. Gilbert, Piotr Indyk, Mark Iwen, and Ludwig Schmidt. Recent developments in the sparse Fourier transform: A compressed Fourier transform for big data. IEEE Signal Processing Magazine, 31(5):91–100, 2014.

[27] Anna C. Gilbert, Martin J. Strauss, and Joel A. Tropp. A tutorial on fast Fourier sampling. IEEE Signal Process. Mag., 25(2):57–66, 2008.

[28] Gene H. Golub and Charles F. Van Loan. Matrix computations. Johns Hopkins Studies in the Mathematical Sciences.
Johns Hopkins University Press, Baltimore, MD, fourth edition, 2013. [29] V Gradinaru. Fourier transform on sparse grids: Code design and the time dependent Schrödinger equation. Computing (Wien. Print), 80(1):1–22, January 2007. Place: Wien Publisher: Springer. [30] Michael Griebel and Jan Hamaekers. Sparse grids for the Schrödinger equation. Special issue on molecular modelling, 41(2):215–247, January 2007. Place: Les Ulis Publisher: EDP Sciences. [31] Michael Griebel and Jan Hamaekers. Fast discrete Fourier transform on generalized sparse grids. In Jochen Garcke and Dirk Pflüger, editors, Sparse Grids and Applications - Munich 2012, volume 97, pages 75–107. Springer International Publishing, Cham, 2014. Series Title: Lecture Notes in Computational Science and Engineering. [32] Craig Gross and Mark Iwen. Sparse spectral methods for solving high-dimensional and mul- tiscale elliptic PDEs. ArXiv e-prints, 2023. arXiv:2302.00752. [33] Craig Gross, Mark Iwen, Lutz Kämmerer, and Toni Volkmer. Sparse Fourier transforms on rank-1 lattices for the rapid and low-memory approximation of functions of many variables. Sampling Theory, Signal Processing, and Data Analysis, 20(1):1, December 2021. [34] Craig Gross, Mark A Iwen, Lutz Kämmerer, and Toni Volkmer. A deterministic algorithm for constructing multiple rank-1 lattices of near-optimal size. Advances in Computational Mathematics, 47(6):1–24, 2021. [35] Haitham Hassanieh, Piotr Indyk, Dina Katabi, and Eric Price. Simple and practical algo- rithm for sparse Fourier transform. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 1183–1194. SIAM, 2012. [36] Mark A Iwen. Combinatorial sublinear-time Fourier algorithms. Foundations of Computa- tional Mathematics, 10(3):303–338, 2010. [37] Mark A. Iwen. Improved approximation guarantees for sublinear-time Fourier algorithms. Appl. Comput. Harmon. Anal., 34:57–82, 2013. [38] Lutz Kämmerer. High Dimensional Fast Fourier Transform Based on Rank-1 Lattice Sam- pling. Ph.D, Universitätsverlag Chemnitz, 2014. [39] Lutz Kämmerer. Reconstructing multivariate trigonometric polynomials from samples along rank-1 lattices. In Gregory E. Fasshauer and Larry L. Schumaker, editors, Approximation Theory XIV: San Antonio 2013, pages 255–271. Springer International Publishing, 2014. [40] Lutz Kämmerer. Multiple rank-1 lattices as sampling schemes for multivariate trigonometric polynomials. Journal of Fourier Analysis and Applications, 24(17):17–44, 2018. 122 [41] Lutz Kämmerer. Constructing spatial discretizations for sparse multivariate trigonometric polynomials that allow for a fast discrete Fourier transform. Applied and Computational Har- monic Analysis, 47(3):702–729, 2019. [42] Lutz Kämmerer, Felix Krahmer, and Toni Volkmer. A sample efficient sparse FFT for arbitrary frequency candidate sets in high dimensions. Numerical Algorithms, 89(4):1479–1520, Apr 2022. [43] Lutz Kämmerer, Daniel Potts, and Toni Volkmer. High-dimensional sparse FFT based on sampling along multiple rank-1 lattices. Appl. Comput. Harmon. Anal., 51:225–257, 2021. [44] Lutz Kämmerer and Toni Volkmer. Approximation of multivariate periodic functions based on sampling along multiple rank-1 lattices. Journal of Approximation Theory, 246:1–27, 2019. [45] Michael Kapralov. Sparse Fourier Transform in Any Constant Dimension with Nearly-Optimal Sample Complexity in Sublinear Time, page 264–277. Assoc. Comput. Mach., New York, NY, USA, 2016. [46] Frances Kuo, Giovanni Migliorati, Fabio Nobile, and Dirk Nuyens. 
Function integra- tion, reconstruction and approximation using rank-1 lattices. Mathematics of Computation, 90(330):1861–1897, July 2021. [47] Friedrich Kupka. Sparse grid spectral methods for the numerical solution of partial differen- tial equations with periodic boundary conditions. Ph.D., Universität Wien, Vienna, Austria, November 1997. [48] Lutz Kämmerer. A fast probabilistic component-by-component construction of exactly inte- grating rank-1 lattices and applications. ArXiv e-prints, 2020. arXiv:2012.14263. [49] Lutz Kämmerer, Stefan Kunis, and Daniel Potts. Interpolation lattices for hyperbolic cross trigonometric polynomials. Journal of Complexity, 28(1):76–92, February 2012. [50] Lutz Kämmerer, Daniel Potts, and Toni Volkmer. Approximation of multivariate periodic functions by trigonometric polynomials based on rank-1 lattice sampling. Journal of Com- plexity, 31(4):543–576, August 2015. [51] David Lawlor, Yang Wang, and Andrew Christlieb. Adaptive sub-linear time Fourier algo- rithms. Adv. Adapt. Data Anal., 05(01):1350003, 2013. [52] Dong Li and Fred J. Hickernell. Trigonometric spectral collocation methods on lattices. In Recent advances in scientific computing and partial differential equations (Hong Kong, 2002), volume 330 of Contemp. Math., pages 121–132. Amer. Math. Soc., Providence, RI, 2003. [53] Sami Merhi, Ruochuan Zhang, Mark A. Iwen, and Andrew Christlieb. A new class of fully discrete sparse Fourier transforms: Faster stable implementations with guarantees. Journal of Fourier Analysis and Applications, 25(3):751–784, June 2019. [54] Lucia Morotti. Explicit universal sampling sets in finite vector spaces. Appl. Comput. Harmon. Anal., 43(2):354–369, 2017. 123 [55] Hans Munthe-Kaas and Tor Sørevik. Multidimensional pseudo-spectral methods on lattice grids. Applied Numerical Mathematics, 62(3):155–165, March 2012. [56] Gerlind Plonka, Daniel Potts, Gabriele Steidl, and Manfred Tasche. Numerical Fourier Anal- ysis. Applied and Numerical Harmonic Analysis. Springer International Publishing, Cham, 2018. [57] Gerlind Plonka and Katrin Wannenwetsch. A sparse fast Fourier algorithm for real non- negative vectors. J. Comput. Appl. Math., 321:532–539, 2017. [58] Gerlind Plonka, Katrin Wannenwetsch, Annie Cuyt, and Wen-shin Lee. Deterministic sparse FFT for 𝑀-sparse vectors. Numer. Algorithms, 78(1):133–159, 2018. [59] Daniel Potts and Toni Volkmer. Sparse high-dimensional FFT based on rank-1 lattice sam- pling. Appl. Comput. Harmon. Anal., 41(3):713–748, 2016. [60] J. Barkley Rosser and Lowell Schoenfeld. Approximate formulas for some functions of prime numbers. Illinois Journal of Mathematics, 6(1):64–94, 1962. [61] A.D. Rubio, A. Zalts, and C.D. El Hasi. Numerical solution of the advection-reaction- diffusion equation at different scales. Environmental Modelling & Software, 23(1):90–95, January 2008. [62] Ben Segal and MA Iwen. Improved sparse Fourier approximation results: faster implementa- tions and stronger guarantees. Numer. Algorithms, 63(2):239–263, 2013. [63] Jie Shen and Li-Lian Wang. Sparse spectral approximations of high-dimensional problems based on hyperbolic cross. SIAM Journal on Numerical Analysis, 48(3):1087–1109, January 2010. Publisher: Society for Industrial and Applied Mathematics. [64] V. N. Temlyakov. Approximation of periodic functions. Comput. Math. Anal. Ser. Nova Sci. Publ., Inc., Commack, NY, 1993. [65] Toni Volkmer. Multivariate Approximation and High-Dimensional Sparse FFT Based on Rank-1 Lattice Sampling. Ph.D, Universitätsverlag Chemnitz, 2017. 
[66] Weiqi Wang and Simone Brugiapaglia. Compressive Fourier collocation methods for high- dimensional diffusion equations with periodic boundary conditions. ArXiv e-prints, 2022. arxiv:2206.01255. [67] Harry Yserentant. On the regularity of the electronic Schrödinger equation in Hilbert spaces of mixed derivatives. Numerische Mathematik, 98(4):731–759, October 2004. [68] Harry Yserentant. Sparse grid spaces for the numerical solution of the electronic Schrödinger equation. Numerische Mathematik, 101(2):381–389, August 2005. 124