SUB-LINEAR FOURIER ALGORITHMS: THEORY AND IMPLEMENTATION

By

David Lawlor

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Applied Mathematics

2012

ABSTRACT

SUB-LINEAR FOURIER ALGORITHMS: THEORY AND IMPLEMENTATION

By David Lawlor

We present a new deterministic algorithm for the sparse Fourier transform problem, in which we seek to identify k ≪ N significant Fourier coefficients from a signal of bandwidth N. Previous deterministic algorithms exhibit quadratic runtime scaling in k, while our algorithm scales linearly with k in the average case. Underlying our algorithm are a few simple observations relating the Fourier coefficients of time-shifted samples to unshifted samples of the input function. This allows us to detect when aliasing between two or more frequencies has occurred, as well as to determine the value of unaliased frequencies. We also show that the small time shifts required for our algorithm can be implemented using two much larger shifts, an important point for applications where the sampling rate is limited by hardware capacity. Empirically our algorithm is orders of magnitude faster than competing algorithms.

We study the effect of noise on the performance of our algorithm, and derive a relationship between its magnitude and the rate at which the signal is sampled. This in turn is used to devise modified versions of our basic algorithm that are robust to noise, although with higher runtime and sampling requirements than in the noiseless case. The first of these is a minor change, and exhibits poor performance relative to the noise level. To remedy this we propose a multi-scale algorithm that allows for error correction in the frequency estimates in an iterative fashion. It is shown empirically that this multi-scale algorithm is robust to noise even with coarse sampling rates.

Finally, we discuss the application of our algorithms to the multi-dimensional setting. A straightforward extension, while simple to describe and implement, exhibits exponential scaling with the dimension. A second extension is then proposed that uses number-theoretic techniques to reduce the multi-dimensional problem to the one-dimensional case, which can then be solved efficiently with the algorithm proposed in Chapter 2. An empirical evaluation shows that this second method significantly outperforms highly optimized FFT implementations in two dimensions.

Copyright by DAVID LAWLOR 2012

To my family

ACKNOWLEDGMENTS

I would first like to thank my advisor, Andrew Christlieb, for his advice and guidance over the past five years. In our very first meeting he showed the patience and wisdom necessary to mold my vague notions of research problems into definite plans. He made sure I kept on pace throughout my graduate career, which sometimes meant pushing me outside of my comfort zone. Beyond his scientific guidance, his advice on navigating an academic career has been invaluable. I would like to thank Anna Gilbert for the many hours spent working through ideas in her office, acting at times as a proxy advisor for me in Ann Arbor. She generously invited me to attend seminars, group meetings, and workshops at the University of Michigan and elsewhere that have helped to define my research interests. I would like to thank Yang Wang for providing the spark that would become my dissertation project, and for finding time in his busy schedule as chair of our department to work through problems and provide feedback on drafts.
I would like to thank Jianliang Qian and Di Liu for serving on my defense committee and for ably teaching the core applied mathematics courses during my graduate studies. I would like to thank the support staff of the math department, and Barbara Miller in particular, for all their help with administrative matters.

I would like to thank my fellow math students for their friendship over the past five years. The nights spent playing softball, sharing dinner, or laughing over beers made the experience of graduate school more rewarding. The applied math group deserves special mention for organizing a student seminar where I was able to practice giving research talks in front of a friendly audience. My friends from college and outside the math department have given me many fond memories having nothing to do with academics. Many thanks go to them for helping me to not miss out on life while pursuing my studies.

My parents, Jack and Marian, my brother Patrick, and my grandfather Bill have been incredibly supportive throughout my graduate career, and I dedicate this dissertation to them. Lastly I would like to express my profound gratitude to my partner Melanie, who has shown me the deepest love, kindness, and patience throughout our time together. It is no exaggeration to say that this dissertation would not have been possible without her support and encouragement.

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter 1  Introduction
  1.1 Sparse Fourier approximation
  1.2 Relationship to compressed sensing
  1.3 Related work
  1.4 Thesis outline

Chapter 2  Adaptive sub-linear Fourier algorithms
  2.1 Introduction
  2.2 Mathematical background
    2.2.1 Notation and preliminaries
    2.2.2 Technical lemmas
  2.3 Algorithms
    2.3.1 Non-adaptive
    2.3.2 Adaptive
  2.4 Average-case analysis
    2.4.1 While loop runtime and sampling complexity
    2.4.2 Random signal model
    2.4.3 Markov analysis of collisions
    2.4.4 Average-case runtime and sampling complexity
  2.5 Empirical Evaluation
    2.5.1 Setup
    2.5.2 Sampling Complexity
    2.5.3 Runtime Complexity
  2.6 Multiple Shifts
    2.6.1 Two shifts
    2.6.2 Three or more shifts
  2.7 Conclusion

Chapter 3  Multi-scale sub-linear Fourier algorithms for noisy data
  3.1 Introduction
  3.2 Mathematical background
    3.2.1 Noise model
    3.2.2 Sensitivity to noise
    3.2.3 Earth mover distance
  3.3 A minor modification
    3.3.1 Rounding
    3.3.2 Empirical evaluation
  3.4 Multi-scale methods
    3.4.1 Preliminaries
    3.4.2 Algorithms
    3.4.3 Empirical evaluation
  3.5 Conclusion

Chapter 4  Sub-linear Fourier algorithms in multiple dimensions
  4.1 Introduction
  4.2 Notation
  4.3 Straightforward extension to multiple dimensions
    4.3.1 Preliminaries
    4.3.2 Algorithm
  4.4 Signal unwrapping and the Chinese Remainder Theorem
    4.4.1 Preliminaries
    4.4.2 Algorithm
  4.5 Empirical evaluation
    4.5.1 Setup
    4.5.2 Runtime and sampling complexity
    4.5.3 Sample lengths and aliased frequencies
  4.6 Conclusion

Bibliography

LIST OF TABLES

Table 2.1: Implementations used in the empirical evaluation.

LIST OF FIGURES

Figure 2.1: (a) Sampling complexity with fixed bandwidth N = 2^22 for PS-Det (blue solid line), GFFT-RS (red solid line), AAFFT (black dashed line), and FFTW (magenta dashed line). (b) Runtime complexity with fixed bandwidth N = 2^22 for PS-Det (blue solid line), GFFT-RF (red solid line), AAFFT (black dashed line), and FFTW (magenta dashed line). For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.

Figure 2.2: (a) Sampling complexity with fixed sparsity k = 60 for PS-Det (blue solid line), GFFT-RS (red solid line), AAFFT (black dashed line), and FFTW (magenta dashed line). (b) Runtime complexity with fixed sparsity k = 60 for PS-Det (blue solid line), GFFT-RF (red solid line), AAFFT (black dashed line), and FFTW (magenta dashed line).

Figure 3.1: (left) Mean phase error without rounding (shaded attribute) vs. noise level σ and sample length p, in logarithmic scale. (right) Mean phase error with rounding (shaded attribute) vs. noise level σ and sample length p, in logarithmic scale. The bandwidth for both plots is N = 2^22.

Figure 3.2: EMD(1) error as a function of the noise level σ with the sparsity and bandwidth fixed at k = 64 and N = 2^22, respectively. Different values of the parameter c_1 are shown: c_1 = 4 (blue solid line), c_1 = 16 (black dashed line), c_1 = 64 (red solid line), and c_1 = 256 (magenta dashed line).

Figure 3.3: EMD(1) error as a function of the noise level σ with the sparsity and bandwidth fixed at k = 64 and N = 2^22, respectively. Different values of the parameter c_1 are shown: c_1 = 4 (blue solid line), c_1 = 16 (black dashed line), c_1 = 64 (red solid line), and c_1 = 256 (magenta dashed line). (a) Non-adaptive shift algorithm (B = 4), (b) adaptive shift algorithm (α = 1).

Figure 3.4: (a) Samples to achieve EMD(1) error σ/√k, as a function of the sparsity k with fixed σ = 0.512, N = 2^22, for rounding only (blue solid line), non-adaptive shift (red dash-dot line), and adaptive shift (magenta dashed line) algorithms. (b) Runtime to achieve the same error bound, as a function of the sparsity k.

Figure 4.1: (a) 2D sampling complexity with fixed bandwidth N_1 = N_2 = 2^11 for Naive (blue solid line), Naive-Sqrt (black dashed line), Unwrapped (magenta dashed line), and FFTW (red dashed line). (b) 2D runtime complexity with fixed bandwidth N_1 = N_2 = 2^11 for Naive (blue solid line), Naive-Sqrt (black dashed line), Unwrapped (magenta dashed line), and FFTW (red dashed line).

Figure 4.2: (a) Average number of sample lengths used in 2D with fixed bandwidth N_1 = N_2 = 2^11 for Naive (blue solid line), Naive-Sqrt (black dashed line), and Unwrapped (magenta dashed line). Also shown for comparison is 1D PS-Det (red solid line) with fixed bandwidth N = 2^22. (b) Average fraction of aliased frequencies in 2D with fixed bandwidth N_1 = N_2 = 2^11 for Naive (blue solid line), Naive-Sqrt (black dashed line), and Unwrapped (magenta dashed line). Also shown for comparison is 1D PS-Det (red solid line) with fixed bandwidth N = 2^22.

Chapter 1

Introduction

1.1 Sparse Fourier approximation

The Fast Fourier Transform (FFT) [12] is arguably the most ubiquitous numerical algorithm in scientific computing. In addition to being named one of the "Top Ten Algorithms" of the past century [15], the FFT is a critical tool in myriad applications, ranging from signal processing to computational PDE and machine learning.
At the time of its introduction, it represented a major leap forward in the size of problems that could be solved on available hardware, as it reduces the runtime complexity of computing the Discrete Fourier Transform (DFT) of a length-N array from O(N^2) to O(N log N). Even with advanced hardware and optimized implementations, for large N this still represents the major computational bottleneck in many applications.

Any algorithm which computes all N Fourier coefficients necessarily has a runtime complexity of Ω(N), since it takes that much time merely to report the output. However, in many applications it is known that the DFT of the signal of interest is highly sparse – that is, only a small number of coefficients are non-zero. In this case it is possible to break the Ω(N) barrier by asking only for the largest k terms in the signal's DFT. When k ≪ N, existing algorithms for this sparse Fourier approximation problem can significantly outperform even highly optimized FFT implementations [30, 28, 26].

Any sub-linear¹ Fourier algorithm cannot read all of the input, and so must either be approximate or succeed with high probability (or both). The discrete uncertainty principle of Donoho and Stark [16] provides a heuristic justification for sub-sampling in the time domain if the signal of interest is highly sparse. One version states that if N_t, N_ω are the number of non-zero values of a length-N signal in the time and frequency domains, respectively, then it must hold that N_t N_ω ≥ N. Thus when N_ω ≪ N, the number of non-zero time samples must be large. This suggests that the few samples taken by a sub-linear algorithm should still carry significant information about the frequency content of the signal.

To date, most sparse Fourier approximation algorithms have exhibited an empirical speedup over the FFT only for limited sparsity regimes due to their complicated structure, and so are sub-linear only in an asymptotic sense.² In contrast, the algorithms presented in this dissertation have a simple structure which permits a straightforward, fast implementation, resulting in improvements over the FFT for a large range of sparsity values.

¹ That is, an algorithm with runtime o(N).
² The recent algorithm [26] has been shown to outperform the FFT for a much larger range of k, comparable to that of the noiseless algorithm presented in Chapter 2 of this dissertation.

1.2 Relationship to compressed sensing

The term "compressed sensing" refers to a new paradigm in signal processing which seeks to recover a compressible signal from a number of linear measurements roughly proportional to its information content, rather than its nominal dimension. There has been tremendous recent progress in this area, as evidenced by the daily updates of a popular research blog devoted to the subject [9]. While this dissertation does not make explicit use of the results or algorithms of compressed sensing, there are parallels in the approaches used. The purpose of this section is to clarify the relationship between the two.

All algorithms for the sparse approximate DFT problem take a small number of samples of the input x, either at random or in a deterministic fashion. These samples are then processed in a highly non-linear, algorithm-dependent manner to produce a k-term Fourier representation of x – that is, a list \{(\omega_\ell, a_\ell)\}_{\ell=1}^{k} of significant frequency/coefficient pairs.
In other words, these algorithms approximately solve the severely underdetermined system R F^* \hat{x} = R x, where R is the restriction to the samples used by the algorithm, F^* is the adjoint of the N × N discrete Fourier matrix with entries

F_{jk} = \frac{1}{\sqrt{N}} e^{-2\pi i jk/N}, \quad 0 \le j, k < N,   (1.1)

and \hat{x} is the discrete Fourier transform of x.

In [8], Candès, Romberg, and Tao considered the dual problem: that of recovering a given signal from highly incomplete Fourier measurements. Specifically, suppose that a signal x of length N is the superposition of k spikes at times t = τ_j:

x[t] = \sum_{j=1}^{k} x[\tau_j]\, \delta(t - \tau_j).   (1.2)

The authors show that, with high probability, x can be recovered exactly from a randomly chosen set Ω of m frequencies from the discrete Fourier transform of x, provided

m \ge C k \log N   (1.3)

for some constant C whose value depends on the desired probability of success. This can be viewed as the severely underdetermined linear system dual to the system described above: R F x = R \hat{x}. The recovery algorithm in this case is the ℓ1 minimization

g^* = \operatorname{argmin} \|g\|_1 \quad \text{subject to} \quad \hat{g}(\omega) = \hat{x}(\omega) \text{ for all } \omega \in \Omega.   (1.4)

The idea of using ℓ1 minimization to recover sparse vectors has been studied extensively in a number of research communities, including seismic imaging [42], image processing [41], and signal processing [10], where it is commonly referred to as basis pursuit. The theoretical foundations of ℓ1 approximation are treated in depth in the monograph [39].

Sparse Fourier approximation and compressed sensing are therefore broadly similar in both their goals (sparse approximation of signals) and methods (in particular, the use of randomization). There are, however, substantial differences between the two, which we now enumerate.

1. Sampling requirements. The compressed sensing model requires measurement matrices to satisfy the Restricted Isometry Property, which has been shown to hold with high probability for random Gaussian, Bernoulli, and Fourier ensembles. Sparse Fourier algorithms, on the other hand, generally require more structure in their sampling sets. This is obviously true for the deterministic algorithms, and also for some of the randomized versions – in particular, [21] requires samples that lie on arithmetic progressions.

2. Reconstruction costs. As mentioned above, the reconstruction of the target signal in the compressed sensing model is achieved by a convex optimization problem (which can be recast as a linear program). This is expected to incur a computational cost of O(N^3) for a signal of length N. Most sparse Fourier approximation algorithms are sub-linear in time, and so dramatically faster.

3. Allocation of resources. The balance between the two previous items is the major point of distinction. Indeed, we view the comparison of the two paradigms as an "apples-to-oranges" scenario: in the seismic imaging environment (where practitioners have recently implemented compressed sensing methods [35], [14]), high acquisition costs make long processing times on the back end more palatable. Sparse Fourier approximation algorithms, however, were developed with data streaming applications in mind. In this arena, low signal acquisition costs and enormous problem sizes necessitate fast algorithms with sparing use of memory resources.

1.3 Related work

The first works to implicitly address the sparse approximate DFT problem appeared in the theoretical computer science literature in the early 1990s.
In [36], a variant of the Fourier transform for Boolean functions was shown to have applications for learnability. A polynomial-time algorithm to find large coefficients in this basis was given in [33], while the interpolation of sparse polynomials over finite fields was considered in [37]. It was later realized [21] that this last algorithm could be considered as an approximate DFT for the special case when N is a power of two.

In the past ten or so years, a number of algorithms have appeared which directly address the problem of computing sparse approximate Fourier transforms. When comparing the results in the literature, care must be taken to identify the class of signals over which a specific algorithm is to perform, as well as to identify the error bounds of a given method. Different algorithms have been devised in different research communities, and so have varying assumptions on the underlying signals as well as different levels of acceptable error.

The first result with sub-linear runtime and sampling requirements appeared in [20]. They give a poly(k, log N, log(1/δ), 1/ε) time algorithm for finding, with probability 1 − δ, an approximation y of the DFT of the input x that is nearly optimal, in the sense that

\|\hat{x} - y\|_2^2 \le (1 + \varepsilon) \|\hat{x} - \hat{x}_{\mathrm{opt}}\|_2^2,   (1.5)

where \hat{x}_{\mathrm{opt}} is the best k-term approximation to \hat{x}. This type of error bound is known as an "ℓ2/ℓ2" guarantee. Here the exponent of k in the runtime is two, so the algorithm is quadratic in the sparsity. Moreover, the algorithm is non-adaptive in the sense that the samples used are independent of the input x.

This algorithm was modified in [21] to bring the dependence on k down to linear.³ This was accomplished mainly by replacing uniform random variables (used to sample the input) by random arithmetic progressions, which allowed the use of the nonequispaced fast Fourier transform to sample from intermediate representations and to estimate the coefficients in near-linear time. The increased overhead of this procedure, however, limited the range of k for which the algorithm outperformed a standard FFT implementation [30].

Around the same time, a similar algorithm was developed in the context of list decoding for proving hard-core predicates for one-way functions [3]. This can be considered an extension of [33], and like [20, 21] is a randomized algorithm. Since the goal in this work was to give a polynomial-time algorithm for list decoding, no effort was made to optimize the dependence on k; it stands at k^{11/2}, considerably higher than [20, 21]. The randomness in this algorithm is used only to construct a sample set on which norms are estimated, and in [2] this set is replaced with a deterministic construction. This construction is based on the notion of ε-approximating the uniform distribution over arithmetic progressions, and relies on existing constructions of ε-biased sets of small size [32, 1]. Depending on the size of the ε-biased sets used, the sampling and runtime complexities are O(k^4 log^c N) and O(k^6 log^c N), respectively, for some c > 4.⁴

³ See [22] for a "user-friendly" description of the improved algorithm.
⁴ Specifically, the runtime is O(k^2 · log N · |S|), where S is the set of samples read by the algorithm. This set takes the form S = \bigcup_{\ell=1}^{\log N} (A - B_\ell), where A has ε-discrepancy on rank-2 Bohr sets, B_\ell ε-approximates the uniform distribution on [0, 2^\ell) ∩ Z, and A - B_\ell is the difference set.
Using constructions from [32] one has |A| = O(ε^{-1} log^4 N) and |B_\ell| = O(ε^{-3} log^4 N); setting ε = Θ(k^{-1}) and noting that |\bigcup_\ell (A - B_\ell)| = O(\sum_\ell |A - B_\ell|) and |A - B_\ell| = O(|A| |B_\ell|) (see, e.g., [45]), one obtains the stated sampling and runtime complexities.

In the series of works [27, 28, 29], a different deterministic algorithm for sparse Fourier approximation was given that relies on the combinatorial properties of aliasing, or collisions among frequencies in sub-sampled DFTs. By taking enough short DFTs of co-prime lengths, and employing the Chinese Remainder Theorem to reconstruct energetic frequencies from their residues modulo these sample lengths, the author is able to prove sampling and runtime bounds of O(k^2 log^4 N). The error bound is of the form

\|\hat{x} - y\|_2 \le \|\hat{x} - \hat{x}_{\mathrm{opt}}\|_2 + \frac{1}{\sqrt{k}} \|\hat{x} - \hat{x}_{\mathrm{opt}}\|_1;   (1.6)

it has been shown that the stronger "ℓ2/ℓ2" guarantee (1.5) cannot hold for a sub-linear, deterministic algorithm [11]. Moreover, the range of k for which this algorithm is faster than the FFT is smaller in practice than that of [21].

Most recently, the authors of [26] presented a randomized algorithm that extends by an order of magnitude the range of sparsity for which it is faster than the FFT. This is accomplished by removing the iterative aspect from [21] and using more efficient filters, which are nearly flat within the passband and which decay exponentially outside. In contrast, the box-car filters used in [21] have a frequency response which oscillates and decays like |ω|^{-1}. In addition, the identification of significant frequencies is done by direct estimation after hashing into a large number of bins, rather than the binary search technique of [21]. These changes give a runtime bound of O(\log N \sqrt{N k \log N}) and a somewhat stronger error bound

\|\hat{x} - y\|_\infty^2 \le \frac{\varepsilon}{k} \|\hat{x} - \hat{x}_{\mathrm{opt}}\|_2^2 + \delta \|x\|_1^2   (1.7)

with probability 1 − 1/N, where ε > 0 and δ = N^{-O(1)} is a precision parameter.

These existing algorithms generally take one of two approaches to the sparse Fourier transform problem. In [20, 3, 21, 26], the spectrum of the input is randomly permuted and then run through a low-pass filter to isolate and identify frequencies which carry a large fraction of the signal's energy. This leads to randomized algorithms that fail on a non-negligible set of possible inputs. On the other hand, [28] takes advantage of the combinatorial properties of aliasing in order to identify the significant frequencies. This leads to a deterministic algorithm with higher runtime and sampling requirements than the randomized algorithms mentioned.

Both of these randomized and deterministic approaches have drawbacks. Randomized algorithms are not suitable for failure-intolerant applications, while the process used to reconstruct significant frequencies in [28] relies on the Chinese Remainder Theorem (CRT), which is highly unstable to errors in the residues. While there do exist algorithms for "noisy Chinese Remaindering" [23, 5, 44], these have thus far not found application to the sparse DFT problem, and we leave this as future work.

As this dissertation was being prepared, the author became aware of an independent work using very similar methods for frequency estimation in the noiseless case [25]. Both methods consider the phase difference between Fourier samples to extract frequency information, but are based on different techniques for binning significant frequencies. The authors of [25] use random dilations and the efficient filters of [26], whereas we use different sample lengths in the spirit of [28].
We believe both contributions are of interest, and reinforce the notion that exploiting phase information is critical for developing fast, robust algorithms for sparse Fourier approximation.

1.4 Thesis outline

The remainder of this dissertation is structured as follows. In Chapter 2 we give an average-case Θ(k log k) algorithm for the sparse Fourier transform in the noiseless setting. The runtime bound is proven over a certain class of random signals, and it is verified empirically with numerical simulations. We also show how to implement the algorithm using lower-resolution hardware, an important consideration for applications. In Chapter 3 we extend the algorithm given in Chapter 2 to handle noisy signals. We first give a minor modification of the algorithm in Chapter 2 to handle low-level noise, and we discuss the merits of measuring the error of approximation using the earth-mover distance (EMD) in place of the standard Euclidean norm. We then give two algorithms to handle more general noise. This is done in a multi-scale manner, where we learn chunks of significant frequencies in an iterative fashion. Finally, in Chapter 4 we present extensions of our algorithm to multiple dimensions. Since the runtime of the FFT is O(N^d log N) in d dimensions, we expect to see the greatest improvement over traditional Fourier transform methods in this setting. The first extension is quite natural and follows from the separability of the Fourier exponentials e^{2πi ω·x} in multiple dimensions, but has runtime and sampling complexities that scale exponentially in the dimension d. We also give a second extension which relies on a clever application of the Chinese Remainder Theorem to construct an equivalent one-dimensional problem that is solved using the methods of Chapter 2. This second algorithm significantly outperforms both the previous extension and highly-optimized FFT implementations, as shown empirically in two dimensions.

Chapter 2

Adaptive sub-linear Fourier algorithms

2.1 Introduction

In this chapter we describe a simple, deterministic algorithm that avoids reconstruction with the Chinese Remainder Theorem. As mentioned in Chapter 1, existing deterministic algorithms for the sparse Fourier approximation problem rely on the CRT for energetic frequency reconstruction, and so are very unstable to errors in the residues. Our method relies on sampling the signal in the time domain at slightly shifted points, and thus it assumes access to an underlying continuous-time signal. The shifted time samples allow us to determine the value of significant frequencies in sub-sampled FFTs and also indicate when two or more frequencies have been aliased in such a sub-sampled FFT. These two key facts allow us to significantly reduce (by up to two orders of magnitude) the average-case sampling and runtime complexity of the sparse FFT over a certain class of random signals. Our worst-case bounds improve by a constant factor those of prior deterministic algorithms. We present both adaptive and non-adaptive versions of our algorithms. If the application allows samples to be acquired adaptively (that is, dependent on previous samples), we are able to improve further on our average-case bounds.

The remainder of this chapter is organized as follows. In section 2.2 we introduce notation and prove the technical lemmas underlying our algorithms. In section 2.3 we introduce randomized and deterministic versions of our algorithm.
In section 2.4 we prove that our algorithm has average-case runtime and sampling complexities of Θ(k log k) and Θ(k), respectively. In section 2.5 we present the results of an empirical evaluation of our algorithm and compare its runtime and sampling requirements to competing algorithms. In section 2.6 we describe how to combine two or more large shifts to reduce the sampling rate requirements of the algorithm, making them more suitable for hardware implementation. Finally in section 2.7 we provide some concluding remarks.

2.2 Mathematical background

In this section we lay the groundwork for the remainder of this chapter, and indeed the rest of this dissertation. In section 2.2.1 we explain the notation used in the sequel and give some elementary results concerning the relationship between Fourier series and the discrete Fourier transform. We also derive two key results concerning the phase of DFT coefficients and the relative magnitude of aliased coefficients that are at the heart of our algorithms. In section 2.2.2 we prove two lemmas which will be useful in deriving tests for aliasing and worst-case bounds for our algorithms.

2.2.1 Notation and preliminaries

Throughout this dissertation we shall be concerned with frequency-sparse band-limited signals S : [0, 1) → C of the form

S(t) = \sum_{j=1}^{k} a_j e^{2\pi i \omega_j t},   (2.1)

where ω_j ∈ [−N/2, N/2) ∩ Z, a_j ∈ C, and k ≪ N. The Fourier series of S is given by

\hat{S}(\omega) = \int_0^1 S(t)\, e^{-2\pi i \omega t}\, dt, \quad \omega \in Z.   (2.2)

For signals of the form (2.1) we have

\hat{S}(\omega_\ell) = \int_0^1 \Big( \sum_{j=1}^{k} a_j e^{2\pi i \omega_j t} \Big) e^{-2\pi i \omega_\ell t}\, dt = \sum_{j=1}^{k} a_j \int_0^1 e^{2\pi i (\omega_j - \omega_\ell) t}\, dt.   (2.3)

Since ω_j and ω_ℓ are integers, the integral in (2.3) is simply the Kronecker delta δ_{jℓ}, so we have \hat{S}(ω_ℓ) = a_ℓ for ℓ = 1, . . . , k and \hat{S}(ω) = 0 for all other ω ∈ [−N/2, N/2) ∩ Z.

We use the "big-oh" asymptotic notation common in computer science, which we summarize here briefly. For two functions f, g : N → R⁺ we say f = O(g) if there is a positive constant C so that for all n large enough we have f(n) ≤ C g(n). We say f = Θ(g) if f = O(g) and g = O(f). Finally, we say f = o(g) if lim_{n→∞} f(n)/g(n) = 0.

We use the notation P[A] to denote the probability of an event A, and the notation E[X] to denote the expected value of a random variable X. The underlying probability measure will always be clear from context.

Given any finite sequence S = (s_0, s_1, . . . , s_{p−1}) of length p we define its discrete Fourier transform (DFT) by

\hat{S}[h] = \sum_{j=0}^{p-1} s_j e^{-2\pi i jh/p} = \sum_{j=0}^{p-1} S[j]\, W_p^{jh},   (2.4)

where h = 0, 1, . . . , p − 1, S[j] := s_j, and W_p := e^{−2πi/p} is the primitive p-th root of unity. For a detailed discussion of the DFT and its relationship to Fourier series see [6]. The Fast Fourier Transform (FFT) [12] allows the computation of \hat{S} in O(p log p) time for any integer p. See [46] for an in-depth treatment of the FFT and its many possible implementations.

The simplest way to determine the non-zero Fourier coefficients of a signal of the form (2.1) is to sample at the Nyquist rate 1/N and compute the full DFT. While guaranteed to give the correct output, this procedure has sampling and runtime complexities of Θ(N) and Θ(N log N), respectively. This is computationally wasteful, since the assumption k ≪ N means that almost all coefficients of the Nyquist-rate DFT are zero. We are thus interested in sub-linear algorithms, for which both the runtime and sampling complexities are o(N).
A frequency-sparse signal with k non-zero Fourier coefficients can be described using only Θ(k) space, since we need only to store the locations of the non-zero Fourier coefficients and the values of those coefficients. In section 2.3 we use the notation

R := \{ (\omega_j, a_j) \}_{j=1}^{k}   (2.5)

to denote such a representation.

We apply the DFT to discrete samples of S(t) to compute the Fourier coefficients a_j of S(t). For an integer p and real ε > 0 we form two discrete arrays S_p and S_{p,ε} by sampling S(t) at rate 1/p starting at t = 0 and t = ε, respectively. That is, for j = 0, 1, . . . , p − 1,

S_p[j] = S(j/p),   (2.6)
S_{p,ε}[j] = S(j/p + ε).   (2.7)

If S(t) is of the form (2.1), we have for the unshifted samples that

\hat{S}_p[h] = \sum_{j=0}^{p-1} S_p[j]\, e^{-2\pi i hj/p}
            = \sum_{j=0}^{p-1} S(j/p)\, e^{-2\pi i hj/p}
            = \sum_{\ell=1}^{k} a_\ell \sum_{j=0}^{p-1} e^{2\pi i j(\omega_\ell - h)/p}
            = p \sum_{\omega_\ell \equiv h \ (\mathrm{mod}\ p)} a_\ell.   (2.8)

Here we used in the last equality the fact that \sum_{j=0}^{p-1} e^{2\pi i j \alpha} equals p if α ∈ Z and zero otherwise. For the shifted samples we have, similarly,

\hat{S}_{p,\varepsilon}[h] = \sum_{j=0}^{p-1} S_{p,\varepsilon}[j]\, e^{-2\pi i hj/p}
                          = \sum_{\ell=1}^{k} a_\ell e^{2\pi i \omega_\ell \varepsilon} \sum_{j=0}^{p-1} e^{2\pi i j(\omega_\ell - h)/p}
                          = p \sum_{\omega_\ell \equiv h \ (\mathrm{mod}\ p)} a_\ell e^{2\pi i \omega_\ell \varepsilon}.   (2.9)

Now assume that all ω_j (mod p), 1 ≤ j ≤ k, are distinct. In this case we have from (2.8) that

\hat{S}_p[h] = p a_j  if h ≡ ω_j (mod p),  and 0 otherwise.

By examining the peaks of \hat{S}_p we can determine ω_j (mod p) for 1 ≤ j ≤ k. Furthermore, from (2.9) we have

\hat{S}_{p,\varepsilon}[h] = p a_j e^{2\pi i \varepsilon \omega_j}  if h ≡ ω_j (mod p),  and 0 otherwise.

Using both the ε-shifted and the unshifted samples, we can recover the frequencies ω_j (mod ε^{-1}) by noting that, for h ≡ ω_j (mod p), we have \hat{S}_{p,ε}[h] / \hat{S}_p[h] = e^{2πi ε ω_j}. Hence

2\pi \varepsilon \omega_j \equiv \mathrm{Arg}\Big( \frac{\hat{S}_{p,\varepsilon}[h]}{\hat{S}_p[h]} \Big) \pmod{2\pi},   (2.10)

where Arg(z) denotes the phase angle of the complex number z in [−π, π). Assume that we take ε ≤ 1/N. Then ω_j is completely determined by (2.10), as there will be no wrap-around aliasing, and we have

\omega_j = \frac{1}{2\pi\varepsilon}\, \mathrm{Arg}\Big( \frac{\hat{S}_{p,\varepsilon}[h]}{\hat{S}_p[h]} \Big).   (2.11)

Remark 2.2.1. In fact, more generally, if we have an estimate of ω_j, say |ω_j| < L/2, then by taking ε ≤ 1/L the same reconstruction formula (2.11) holds; we will use this observation in Chapter 3 when we develop a multi-scale version of our algorithm for the case of noisy input.

The fact that taking slightly shifted samples allows us to identify frequencies in S(t) underlies the algorithms which follow, and the remainder of this chapter analyzes various aspects of the proposed algorithms, such as efficiency and robustness.

Equation (2.8) represents the phenomenon of aliasing, which occurs when two or more frequencies present in S(t) are congruent modulo the sampling rate p. This can happen whenever p < N, and in this case it is impossible to distinguish the aliased frequencies using only samples at t = j/p. When aliasing occurs, reconstruction via (2.11) is no longer valid. The aliasing phenomenon presents a serious challenge for any method with sub-linear sampling complexity. In the next section we develop a simple test to determine whether or not aliasing has occurred in a p-length DFT, which then allows us to effectively overcome this challenge and develop provably correct sub-linear algorithms.
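To make the preceding derivation concrete, here is a minimal NumPy sketch of frequency recovery from shifted sub-samples. It builds a signal of the form (2.1), computes the unshifted and shifted sub-sampled DFTs (2.8)–(2.9), and applies (2.11) at each energetic bin; it also uses the magnitude test developed in the next subsection (cf. (2.17)) to skip aliased bins. All parameter values and the tolerance are illustrative choices, not those of our reference implementation.

```python
import numpy as np

N = 2**16                      # bandwidth
k = 4                          # sparsity
p = 23                         # sub-sampled DFT length, p >= c*k
eps = 1.0 / (2 * N)            # shift; any eps <= 1/N avoids phase wrap-around

rng = np.random.default_rng(0)
omegas = rng.choice(np.arange(-N // 2, N // 2), size=k, replace=False)
coeffs = np.exp(2j * np.pi * rng.random(k))        # unit-modulus coefficients

def S(t):
    """Evaluate the frequency-sparse signal (2.1) at the points t."""
    return (coeffs[:, None] * np.exp(2j * np.pi * np.outer(omegas, t))).sum(axis=0)

t = np.arange(p) / p
Sp = np.fft.fft(S(t))          # unshifted sub-sampled DFT, cf. (2.8)
Spe = np.fft.fft(S(t + eps))   # eps-shifted sub-sampled DFT, cf. (2.9)

tau = p / N                    # aliasing tolerance, cf. (2.17)
for h in np.argsort(-np.abs(Sp))[:k]:
    if np.abs(Sp[h]) < 1e-8:   # empty bin (possible when frequencies collide)
        continue
    ratio = Spe[h] / Sp[h]
    if np.abs(np.abs(ratio) - 1) <= tau:           # bin passes the aliasing test
        omega = round(np.angle(ratio) / (2 * np.pi * eps))   # formula (2.11)
        print(h, omega, omega % p == h, np.round(Sp[h] / p, 3))
```

With these parameters a collision is unlikely, so all k bins typically pass the test, each recovered ω satisfies ω ≡ h (mod p), and the normalized peak value \hat{S}_p[h]/p returns the corresponding coefficient.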
2.2.2 Technical lemmas

To effectively apply the sub-sampling idea in a Fourier algorithm one must first overcome the aliasing challenge. Previous approaches applied the Chinese Remainder Theorem to reconstruct the frequencies ω_j from their residues modulo a suitable number of p's; these approaches must overcome a registration problem, matching up the residues whenever a new p is used. The complexity of this matching problem becomes combinatorial in k, and clever techniques which require more samples from S(t) must be used to maintain sub-linearity; see [28, 29] for details. Using shifted sub-samples gives us a simple yet extremely effective criterion to determine whether or not aliasing has occurred at a given location in a p-length DFT, without resorting to complicated combinatorial techniques.

Denote by I(S, h; p) the set of frequencies present in S(t) that are congruent to h (mod p), so that I(S, h; p) := { j : ω_j ≡ h (mod p) }. From equations (2.8) and (2.9) we have

|\hat{S}_{p,\varepsilon}[h]|^2 - |\hat{S}_p[h]|^2 = p^2 \sum_{j,\ell \in I(S,h;p)} a_j \bar{a}_\ell\, e^{2\pi i \varepsilon (\omega_j - \omega_\ell)} - p^2 \Big| \sum_{j \in I(S,h;p)} a_j \Big|^2.   (2.12)

Lemma 2.2.2. Let p > 1 and h ∈ {0, 1, . . . , p − 1}. Assume that q = |I(S, h; p)| > 1, i.e. ω_j ≡ h (mod p) for more than one j in S(t). Then we have the following:

(A) Let ε > 0 and E := { ω_j − ω_m : j, m ∈ I(S, h; p) }. Suppose that all elements of εE are distinct (mod 1). Then |\hat{S}_{p,mε}[h]| ≠ |\hat{S}_p[h]| for some 1 ≤ m ≤ q² − q.

(B) For almost all ε > 0 we have |\hat{S}_{p,ε}[h]| ≠ |\hat{S}_p[h]|.

Proof. The proof of part (B) is immediate from (2.12). Observe that f(ε) := |\hat{S}_{p,ε}[h]|² − |\hat{S}_p[h]|² is a trigonometric polynomial in ε, and it is not identically 0 given that q = |I(S, h; p)| > 1. Thus it has at most finitely many zeros for ε ∈ [0, 1), and hence (B) is clearly true.

We resort to the Vandermonde matrix to prove part (A). Recall that a Vandermonde matrix V satisfies V_{ij} = x_i^{j−1} for some values x_i. It is well-known that if the x_i are distinct then V is invertible [24]. For simplicity we write f(t) = \sum_{\alpha \in E} c_\alpha e^{2\pi i \alpha t}. Set r_\alpha := e^{2\pi i \alpha \varepsilon}, where ε satisfies the hypothesis of the lemma, which implies that all r_\alpha are distinct. Assume the claim of part (A) is false. Then we have f(mε) = 0 for all 0 ≤ m ≤ q² − q. Here f(0) = 0 is automatic because \hat{S}_{p,0} = \hat{S}_p. Thus we have

\sum_{\alpha \in E} c_\alpha r_\alpha^m = 0, \quad m = 0, 1, \ldots, q^2 - q.   (2.13)

But the cardinality of E is at most q² − q + 1, which means that there are at most q² − q + 1 terms in the sum in (2.13). By hypothesis all r_α are distinct, so the matrix [r_α^m] is a non-singular Vandermonde matrix. Thus for (2.13) to hold all c_α must be zero. This is clearly not the case, and a contradiction.

Remark 2.2.3. Any irrational ε, or ε = a/b with a, b coprime and b ≥ 2N, will satisfy the hypothesis of part (A) of Lemma 2.2.2. For irrational ε, suppose to the contrary that there were a quadruplet (j_1, j_2, j_3, j_4) of indices so that

\varepsilon (\omega_{j_1} - \omega_{j_2}) \equiv \varepsilon (\omega_{j_3} - \omega_{j_4}) \pmod{1}.   (2.14)

Then we would have

\varepsilon (\omega_{j_1} - \omega_{j_2} - \omega_{j_3} + \omega_{j_4}) \in Z,   (2.15)

and since all ω_j are integers this would imply ε ∈ Q, a contradiction. Similarly, in the case ε = a/b in lowest terms with b ≥ 2N, suppose there were a quadruplet of indices so that (2.14) held. Since each |ω_j| ≤ N/2, the term in parentheses in (2.15) is less than 2N in absolute value. Thus we would have

n = \frac{a}{b} (\omega_{j_1} - \omega_{j_2} - \omega_{j_3} + \omega_{j_4})

for some n ∈ Z, but this is impossible since by assumption b ≥ 2N.

Remark 2.2.4. It is also easy to show that in the special case where all coefficients a_j are real and |I(S, h; p)| = 2, we have |\hat{S}_{p,ε}[h]| ≠ |\hat{S}_p[h]| for any ε = a/b with a, b coprime and b ≥ N. In this case, two frequencies are aliased modulo p, so that ω_2 = ω_1 + np for some nonzero n ∈ Z with −N/(2p) ≤ n < N/(2p).
Moreover, for h ≡ ω_1 ≡ ω_2 (mod p), we have

|\hat{S}_p[h]| = p\, |a_1 + a_2|,
|\hat{S}_{p,\varepsilon}[h]| = p\, |a_1 e^{2\pi i \omega_1 \varepsilon} + a_2 e^{2\pi i \omega_2 \varepsilon}| = p\, |a_1 + a_2 e^{2\pi i \varepsilon (\omega_2 - \omega_1)}| = p\, |a_1 + a_2 e^{2\pi i \varepsilon np}|.   (2.16)

Since the coefficients a_j are assumed to be real, for the remark to hold it suffices that εnp be non-integral. This is clear since εnp = anp/b and by assumption −1/2 ≤ np/b < 1/2.

Lemma 2.2.2 allows us to determine whether aliasing has occurred by checking whether |\hat{S}_{p,ε}[h]| / |\hat{S}_p[h]| = 1 for a few values of ε. It offers both a deterministic (part (A)) and a random (part (B)) procedure to identify aliasing in the sub-sampled DFTs. In practice we use a test related to part (B), and need to set a tolerance τ in order to accept or reject frequencies according to the criterion

\Big| \frac{|\hat{S}_{p,\varepsilon}[h]|}{|\hat{S}_p[h]|} - 1 \Big| \le \tau.   (2.17)

In our algorithms we choose ε = 1/(cN) for some small constant c ≥ 1, which would satisfy the hypothesis of part (A) of Lemma 2.2.2. A tolerance τ on the order of p/N works well in general, which is what we use in our experiments in section 2.5 below.

In our algorithms we will take a number of sub-sampled DFTs of an input signal S(t) of the form (2.1), whose lengths we denote p_ℓ. Lemma 2.2.2 allows us to determine whether or not two or more frequencies are aliased (mod p_ℓ), so that we only add non-aliased terms to our representation. Since it is unlikely that two or more frequencies are aliased modulo two different sampling rates, using a different p_ℓ in a subsequent iteration lets us quickly discover all frequencies present in S(t).

Lemma 2.2.6 gives a worst-case bound on the number of p_ℓ's required by our deterministic algorithm to identify all k frequencies in a given Fourier-sparse signal. It is similar to [28, Lemma 1], but with a smaller constant. In its proof we use the CRT, which we quote here for completeness (see, e.g., [38]).

Theorem 2.2.5 (Chinese Remainder Theorem). Any integer n is uniquely specified modulo N by its remainders modulo m pairwise relatively prime numbers p_ℓ, provided \prod_{\ell=1}^{m} p_\ell \ge N.

Lemma 2.2.6. Let M > 1. It suffices to take 1 + (k − 1) log_M N pairwise relatively prime p_ℓ's with p_ℓ ≥ M to ensure that each frequency ω_j is isolated (i.e. not aliased) (mod p_ℓ) for at least one ℓ.

Proof. Assume otherwise, namely that given p_ℓ for ℓ = 1, 2, . . . , L with L ≥ 1 + (k − 1) log_M N, there exists some ω_j that is aliased (mod p_ℓ) for every ℓ. By the Pigeonhole Principle there exists at least one ω_m ≠ ω_j such that ω_j − ω_m ≡ 0 (mod p_ℓ) at least q times, where q > log_M N. Without loss of generality we assume that ω_j − ω_m ≡ 0 (mod p_ℓ) for ℓ = 1, 2, . . . , q. Now by the fact that p_ℓ ≥ M we have

\prod_{\ell=1}^{q} p_\ell \ge M^q \ge N.

By the CRT we would then have ω_j ≡ ω_m (mod N), a contradiction.

Remark 2.2.7. The algorithm in [28] requires taking 1 + 2k log_k N co-prime sample lengths, since that algorithm requires each ω_j to be isolated in at least half of the DFTs of length p_ℓ. This requirement stems from the fact that that algorithm cannot distinguish between aliased and non-aliased frequencies in a given sub-sampled DFT. Our worst-case bound is approximately a factor of two better, though in practice our algorithms never use all those sample lengths on random input. The fact that we can tell which frequencies are "good" for a given p allows us to construct our Fourier representation one term at a time, and quit when we have achieved a prescribed stopping criterion.
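Before turning to the algorithms themselves, a small numerical check of Lemma 2.2.2 may be instructive. The NumPy sketch below (all concrete values are illustrative) aliases two frequencies modulo p and shows that the shifted and unshifted DFT magnitudes at the shared bin differ by far more than the tolerance p/N of (2.17):

```python
import numpy as np

p, N = 11, 2**10
w1, w2 = 7, 7 + 40 * p         # w1 = w2 (mod p): a collision at bin h = 7
a1, a2 = 1.0, 0.5              # real coefficients, as in Remark 2.2.4
eps = 1.0 / N

n = np.arange(p)
unshifted = a1 * np.exp(2j * np.pi * w1 * n / p) \
          + a2 * np.exp(2j * np.pi * w2 * n / p)
shifted = a1 * np.exp(2j * np.pi * w1 * (n / p + eps)) \
        + a2 * np.exp(2j * np.pi * w2 * (n / p + eps))

h = w1 % p
m0 = np.abs(np.fft.fft(unshifted)[h])  # equals p * |a1 + a2|, cf. (2.16)
m1 = np.abs(np.fft.fft(shifted)[h])    # equals p * |a1 + a2 * exp(2*pi*1j*eps*(w2 - w1))|
print(np.abs(m1 / m0 - 1), ">>", p / N)   # the test (2.17) flags the collision
```

Note that by (2.16) the magnitude gap shrinks when the colliding frequencies are close (small n), so the choice of τ trades false negatives against false positives in the aliasing test.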
2.3 Algorithms

Both of our algorithms proceed along a similar course; in fact they differ only in the choice of the sample lengths p_ℓ. We assume that we are given access to the continuous-time signal S(t) whose Fourier coefficients we would like to determine, and further that we can sample from S at arbitrary points t in unit time. This is an appropriate model for analog signals, but not for discrete ones. In the discrete case, one could interpolate between given samples to approximate the required S-values, though we have not implemented or analyzed this case. (The same assumptions hold for the algorithms in [28], while those in [20, 21, 26] are formulated purely in the discrete realm.)

In this chapter we limit ourselves to the noiseless case. Though this is a highly unrealistic assumption, it permits a simple description of the underlying algorithm. In Chapter 3 we address the issue of noise specifically and give a number of modifications to our algorithms to handle noise levels of varying strengths.

2.3.1 Non-adaptive

Our algorithms start by choosing a sample length p_1 such that p_1 ≥ ck for some constant c > 1. For a fixed ε ≤ 1/N, we then compute \hat{S}_{p_1} and \hat{S}_{p_1,ε}, sort the results by magnitude, and compute frequencies ω via (2.11) for the k largest coefficients in absolute value. We then check whether or not each of those frequencies is aliased via (2.17), and if it is not, we add it to our list. The coefficient is given by the unshifted sample value \hat{S}_{p_1}[h] at that frequency. After this, we combine terms with the same frequency and prune small coefficients from the list. We then iterate until a stopping criterion is reached. In the empirical study described in section 2.5, we stopped when the number of distinct frequencies in our list equaled the desired sparsity.

Our deterministic algorithm chooses p_ℓ to be the ℓ-th prime greater than ck. This ensures that all sample lengths are co-prime, at the expense of taking slightly more samples than necessary. By Lemma 2.2.6, 1 + (k − 1) log_{ck} N such p_ℓ's suffice to isolate every ω_j at least once. This gives us worst-case sampling and runtime complexity on the same order as [28], though the results in section 2.5 indicate that on average we significantly outperform those pessimistic bounds.

Our Las Vegas algorithm chooses p_ℓ uniformly at random from the interval [c_1 k, c_2 k] for constants 1 < c_1 < c_2. In this case we cannot make a worst-case guarantee on the number of iterations needed by the algorithm to converge. However, the results in section 2.5 indicate that the Las Vegas version performs similarly to the deterministic version on the class of signals tested.

2.3.2 Adaptive

The algorithms can also be implemented in an adaptive fashion, by which we mean that the size of the current representation is taken into account in subsequent iterations. In particular, if R is our current representation, we let k* = k − |R| and choose the next p_ℓ with respect to k* instead of k. Moreover, before taking DFTs, we subtract off the contribution from the current representation, so that effort is not expended re-identifying portions of the spectrum already discovered. This idea is similar to that in [20, 21], though in our empirical studies the evaluation of the representation is done directly, rather than as an unequally-spaced FFT. This gives our algorithms asymptotically slower runtime, but the effect is negligible for the values of k studied in section 2.5. A formal description appears below in Algorithm 1.

2.4 Average-case analysis

In this section we prove that the average-case runtime and sampling complexity of our algorithm are Θ(k log k) and Θ(k), respectively.
This is shown over a class of random signals described in section 2.4.2. Before giving this result on the expected runtime and sampling complexity, in section 2.4.1 we estimate the cost of a single iteration of the while loop in Algorithm 1, lines 5–25. We then describe in section 2.4.2 the random signal model over which we prove our average-case bounds. In section 2.4.3 we prove that the expected number of iterations of the while loop is constant, and in section 2.4.4 we use this result to prove our average-case bounds.

Algorithm 1 Phaseshift
 1: Input: function pointer S; integers c_1, c_2, k, N; real ε
 2: Output: R, a sparse representation for S
 3: R ← ∅, ε_0 ← 0, ε_1 ← ε, ℓ ← 1
 4: while |R| < k do
 5:   k* ← k − |R|   {or k if non-adaptive}
 6:   p_ℓ ← first prime ≥ c_1 k*   {or Uniform(c_1 k*, c_2 k*) if Las Vegas}
 7:   for m = 0 to 1 do
 8:     for j = 0 to p_ℓ − 1 do
 9:       S_{ℓ,m}[j] ← S(j/p_ℓ + ε_m)
10:       S_rep[j] ← Σ_{(ω, c_ω) ∈ R} c_ω e^{2πiω(j/p_ℓ + ε_m)}   {omit if non-adaptive}
11:     end for
12:     Ŝ_{ℓ,m} ← FFT(S_{ℓ,m} − S_rep)
13:     Ŝ_{ℓ,m}^{sort} ← Sort(Ŝ_{ℓ,m})
14:     for j = 1 to k* do
15:       ω_{j,ℓ} ← (1/2πε) Arg( Ŝ_{ℓ,1}^{sort}[j] / Ŝ_{ℓ,0}^{sort}[j] )
16:     end for
17:   end for
18:   for j = 1 to k* do
19:     if | |Ŝ_{ℓ,0}^{sort}[j]| / |Ŝ_{ℓ,1}^{sort}[j]| − 1 | < p_ℓ/N then
20:       R ← R ∪ { (ω_{j,ℓ}, Ŝ_{ℓ,0}[ω_{j,ℓ}]) }
21:     end if
22:   end for
23:   collect terms in R with same ω
24:   prune small coefficients from R
25:   ℓ ← ℓ + 1
26: end while

2.4.1 While loop runtime and sampling complexity

The computational cost of the while loop in Algorithm 1, lines 5–25, is dominated by three operations. The first is the evaluation of the current representation R of k − k* terms at the O(k*) points j/p_ℓ in line 10. In our implementation, we simply calculated this directly, looping over both the sample points and the terms in the representation. The complexity of this implementation is O(p_ℓ (k − k*)) = O(k* (k − k*)) = O(k²), and while non-equispaced fast Fourier transforms [18, 4] yield an asymptotically faster runtime of O(k log k), they also incur large overhead costs. For the values of k considered in this dissertation, the direct evaluation seems to have little effect on the overall runtime.

The other two dominant computational tasks in the inner loop are the FFTs of O(k) samples and the subsequent sorting of these DFT coefficients. It is well-known that both of these operations can be done in time Θ(k log k) [13]. Thus the inner loop has overall time complexity Θ(k log k), assuming the use of non-equispaced FFTs.

2.4.2 Random signal model

For both the average-case analysis and for the empirical evaluation described in section 2.5 we considered test signals with frequencies drawn uniformly over the bandwidth and coefficients chosen uniformly from the complex unit circle. In other words, given k and N, we choose k frequencies ω_j uniformly at random (without replacement) from [−N/2, N/2) ∩ Z. The corresponding Fourier coefficients a_j are of the form e^{2πiθ_j}, where θ_j is drawn uniformly from [0, 1). The signal is then given by

S(t) = \sum_{j=1}^{k} a_j e^{2\pi i \omega_j t}.   (2.18)

This is the standard signal model considered in previous empirical evaluations of sub-linear Fourier algorithms [30, 28, 26].
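For reference, the following sketch draws a test signal from this model (the function and variable names are ours, not part of any reference implementation):

```python
import numpy as np

def random_sparse_signal(k, N, rng=None):
    """Draw a test signal of the form (2.18): k frequencies chosen uniformly
    without replacement from [-N/2, N/2), with unit-modulus coefficients."""
    rng = rng or np.random.default_rng()
    omegas = rng.choice(np.arange(-N // 2, N // 2), size=k, replace=False)
    coeffs = np.exp(2j * np.pi * rng.random(k))    # a_j = e^{2 pi i theta_j}

    def S(t):
        t = np.atleast_1d(t)
        return (coeffs[:, None] * np.exp(2j * np.pi * np.outer(omegas, t))).sum(axis=0)

    return S, omegas, coeffs
```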
2.4.3 Markov analysis of collisions

In order to analyze the expected runtime and sampling complexity of our algorithms, we must estimate the expected number of collisions among frequencies modulo the sample lengths used by the algorithms. Recall that in the noiseless case, our algorithms are able to detect when a collision between two or more frequencies has occurred, and for those that are not aliased we are able to calculate the value of the frequency. Thus we seek to estimate the expected fraction of frequencies that are aliased modulo a given sample length p, since this determines how many passes the algorithm makes. In this section we derive bounds on the expected value of this quantity.

In the random signal model considered in section 2.4.2, we assume the k frequencies are uniformly distributed over the bandwidth [−N/2, N/2), and so the residues ω_j mod p are also uniformly distributed over [0, p − 1] ∩ Z. Our problem then becomes a classical occupancy problem: the number of collisions among the frequencies is equivalent to the number of multiple-occupancy bins when k balls are thrown uniformly at random into p bins.

Define X_m to be the number of single-occupancy bins after m balls are thrown, Y_m to be the number of multiple-occupancy bins after m balls are thrown, and Z_m to be the number of zero-occupancy bins after m balls are thrown. Since p is constant, we have the trivial relationship Z_m = p − X_m − Y_m, so it suffices to consider only the pair (X_m, Y_m). When the (m + 1)-st ball is thrown, we have the following possibilities:

- it lands in an unoccupied bucket, with probability Z_m/p = 1 − (X_m + Y_m)/p;
- it lands in a single-occupancy bucket, with probability X_m/p;
- it lands in a multiple-occupancy bucket, with probability Y_m/p.

In the first case, we have X_{m+1} = X_m + 1, Y_{m+1} = Y_m; in the second case, we have X_{m+1} = X_m − 1, Y_{m+1} = Y_m + 1; and in the third case, we have X_{m+1} = X_m, Y_{m+1} = Y_m. Conditioning on the values of X_m, Y_m, we have

E\begin{bmatrix} X_{m+1} \\ Y_{m+1} \end{bmatrix} = \begin{bmatrix} 1 - 2/p & -1/p \\ 1/p & 1 \end{bmatrix} \begin{bmatrix} X_m \\ Y_m \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix},   (2.19)

so that the system forms a Markov chain. By recursively conditioning on the values of X_{m−1}, Y_{m−1}, we can calculate the expected values of X_k, Y_k for any k > 0 using the initial condition X_1 = 1, Y_1 = 0. Denoting by A the matrix in the right-hand side of equation (2.19), we have

E\begin{bmatrix} X_k \\ Y_k \end{bmatrix} = \sum_{m=0}^{k-1} A^m \begin{bmatrix} 1 \\ 0 \end{bmatrix}.   (2.20)

Since ρ(A) = 1 − 1/p < 1, where ρ is the spectral radius, the geometric matrix series can be written

\sum_{m=0}^{k-1} A^m = (I - A)^{-1} (I - A^k).   (2.21)

After some linear algebra, we obtain

E\begin{bmatrix} X_k \\ Y_k \end{bmatrix} = \begin{bmatrix} k (1 - 1/p)^{k-1} \\ p (1 - (1 - 1/p)^k) - k (1 - 1/p)^{k-1} \end{bmatrix}.   (2.22)

Since Z_k = p − X_k − Y_k, we have E[Z_k] = p (1 − 1/p)^k.

In our algorithms we choose p = ck for some small integer c. Using this and the approximation (1 + x/n)^n ≈ e^x, we have

E\begin{bmatrix} X_k \\ Y_k \end{bmatrix} \approx \begin{bmatrix} k e^{-1/c} \\ ck (1 - e^{-1/c}) - k e^{-1/c} \end{bmatrix}.   (2.23)

This gives a nonlinear equation for the expected number of collisions among k frequencies as a function of the parameter c. Newton's method can then be used to determine the value c required to ensure a desired fraction of the frequencies are not aliased. For example, to ensure that 90% of frequencies are isolated on average, it suffices to take c = 5; this value for the parameter c had already been found to give good performance in our empirical evaluation of the algorithms.
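The recursion (2.19) and the closed form (2.22) are straightforward to sanity-check numerically. The sketch below (with illustrative values of k and c) iterates the expectation update once per ball and compares the result against (2.22):

```python
import numpy as np

k, c = 64, 5
p = c * k

A = np.array([[1 - 2 / p, -1 / p],
              [1 / p,      1.0 ]])
b = np.array([1.0, 0.0])

v = np.zeros(2)              # (E[X_0], E[Y_0]) = (0, 0)
for _ in range(k):           # one update of (2.19) per ball thrown
    v = A @ v + b

EX = k * (1 - 1 / p) ** (k - 1)
EY = p * (1 - (1 - 1 / p) ** k) - EX
print(v)                     # iterated expectations...
print(EX, EY)                # ...match the closed form (2.22)
print(v[1] / k)              # expected fraction of multiple-occupancy bins
```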
2.4.4 Average-case runtime and sampling complexity

In this section we will use a probabilistic recurrence relation due to Karp [31, 17] to give average-case performance bounds and concentration results for the case when the algorithm is halted after identifying k or more terms. In particular, we use the following theorem for recurrences of the form

T(k) = a(k) + T(H(k)),   (2.24)

where T(k) denotes the time required to solve an instance of size k, a(k) is the amount of work done on a problem of size k, and 0 ≤ H(k) ≤ k is a random variable denoting the size of the subproblem generated by the algorithm.

Theorem 2.4.1. [31, Theorem 1.2] Suppose a(k) is nondecreasing, continuous, and strictly increasing on {x : a(x) > 0}, and that E[H(k)] ≤ m(k) for a nondecreasing continuous function m(k) such that m(k)/k is also nondecreasing. Denote by u(k) the solution to the deterministic recurrence

u(k) = a(k) + u(m(k)).   (2.25)

Then for k > 0 and t ∈ N,

P[T(k) > u(k) + t\, a(k)] \le \Big( \frac{m(k)}{k} \Big)^t.   (2.26)

In section 2.4.1 we showed that our algorithm does work a(k) = Θ(k log k) on input of size k, while in section 2.4.3 we showed that it generates a subproblem whose average size is m(k) = k/10. (Recall that with the parameter c = 5, on average over 90% of the frequencies were not aliased modulo p = O(ck).) The associated deterministic recurrence is then

u(k) = Θ(k log k) + u(k/10),   (2.27)

whose solution is u(k) = Θ(k log k) (see, e.g., [13]). A straightforward application of Theorem 2.4.1 yields

P[T(k) > Θ(k log k) + t k log k] \le 10^{-t},   (2.28)

so that the runtime is tightly concentrated about its mean Θ(k log k).

The sampling complexity S(k) can be handled in an analogous manner, since in this case a(k) = Θ(k) and m(k) = k/10 as before. The associated deterministic recurrence becomes

u(k) = Θ(k) + u(k/10),   (2.29)

whose solution is u(k) = Θ(k). Applying Theorem 2.4.1 again we have

P[S(k) > Θ(k) + tk] \le 10^{-t},   (2.30)

so that we again have tight concentration around the mean Θ(k).
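As a quick numerical illustration of the deterministic recurrence (2.27), the short script below (constants illustrative) unrolls u(k) = k log₂ k + u(k/10) and confirms that u(k)/(k log₂ k) remains bounded, consistent with u(k) = Θ(k log k):

```python
import math

def u(k, shrink=10):
    """Unroll the deterministic recurrence u(k) = a(k) + u(k/shrink)
    with a(k) = k*log2(k), cf. (2.27)."""
    total = 0.0
    while k > 1:
        total += k * math.log2(k)
        k /= shrink
    return total

for k in [2**10, 2**14, 2**18]:
    print(k, u(k), u(k) / (k * math.log2(k)))   # the ratio stays bounded
```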
2.5 Empirical Evaluation

In this section we describe the results of an empirical evaluation of the adaptive deterministic and Las Vegas variants of the Phaseshift algorithm described above. Both algorithms were implemented in C++ using FFTW 3.0 [19] for the FFTs, using FFTW_ESTIMATE plans since the sample lengths are not known in advance for the Las Vegas variant. For comparison we also ran the same tests on the four variants of GFFT as well as on AAFFT and FFTW itself. The FFTW runs utilized FFTW_PATIENT plans with wisdom enabled, and so are highly optimized. The experiments were run on a single core of an Intel Xeon E5620 CPU with a clock speed of 2.4 GHz and 24 GB of RAM, running SUSE Linux with kernel 2.6.16.60-0.81.2smp for x86_64. All code was compiled with the Intel compiler using the -fast optimization. As in [29], timing is reported in CPU ticks using the cycle.h file included with the source code for FFTW.

In the following sections we refer to our algorithm as "Phaseshift", since by taking shifted time samples of the input signal we also shift the phase of the Fourier coefficients. To keep the plots readable, we only show data for the adaptive, deterministic variant of our algorithm; the other variants perform similarly. The algorithms of [29] are denoted GFFT-XY, where X ∈ {D,R} and Y ∈ {F,S}. The D/R stands for deterministic or randomized, while the F/S stands for fast or slow. The fast variants use more samples but less runtime, while the slow variants use fewer samples but more runtime. In the plots below, we always show the GFFT variant with the most favorable sampling or runtime complexity. Finally, AAFFT denotes the algorithm of [21]. The implementations tested are summarized in Table 2.1 along with their average-case sampling and runtime complexities and the associated references.

Table 2.1: Implementations used in the empirical evaluation.

    Algorithm   R/D   Samples        Runtime        Reference
    PS-Det      D     k              k log k        Section 2.4
    PS-LV       R     k              k log k        Section 2.4
    GFFT-DF     D     k^2 log^4 N    k^2 log^4 N    [29]
    GFFT-DS     D     k^2 log^2 N    N k log^2 N    [29]
    GFFT-RF     R     k log^4 N      k log^5 N      [29]
    GFFT-RS     R     k log^2 N      N log N        [29]
    AAFFT       R     k log^c N      k log^c N      [21]
    FFTW        D     N              N log N        [19]

2.5.1 Setup

Each data point in Figures 2.1–2.2 is the average of 100 independent trials of the associated algorithm for the given values of the bandwidth N and the sparsity k. The lower and upper bars associated with each data point represent the minimum and maximum number of samples or runtime of the algorithm over the 100 test functions. The values of k tested were 2, 4, 8, ..., 4096, while the values of N were 2^17, 2^18, ..., 2^26. For the larger values of k, the slow GFFT variants and AAFFT took too long to complete on our hardware, so we only present partial data for these algorithms. Nevertheless, the trend seen in the plots below continues for higher values of the sparsity.

The test signals were generated according to the signal model described in section 2.4.2. The Phaseshift and deterministic GFFT variants will always recover such signals exactly. The randomized GFFT variants are Monte Carlo algorithms, and so, when they succeed, will also recover the signal exactly. AAFFT, on the other hand, is an approximation algorithm which will fail on a non-negligible set of input signals. However, for the runs depicted in Figures 2.1–2.2, AAFFT always produced an answer with ℓ2 error less than 10^{−4}. The randomized GFFT variants failed a total of 7 times out of 2200 test signals, a relatively small amount that can be reduced by parameter tuning. For the Phaseshift variants, we chose the parameters c1 = 5, c2 = 10, and took the shift ε to be 1/2N. Finally, for the randomized GFFT variants, we chose the Monte Carlo parameter to be 1.2.

2.5.2 Sampling Complexity

In Fig. 2.1(a), we compare the average number of samples of the input signal S required by each algorithm when the bandwidth N is fixed at 2^22. The sparsity of the test signal was varied from 2 to 4096 by powers of two. We can see that the Phaseshift variants require over an order of magnitude fewer samples than GFFT-RS, the GFFT variant with the lowest sampling requirements. Both Phaseshift variants also require over an order of magnitude fewer samples than AAFFT. The comparison with the deterministic GFFT variants is even starker; Phaseshift-Det requires two orders of magnitude fewer samples than GFFT-DS (not shown), and four orders of magnitude fewer samples than GFFT-DF (not shown).

In Fig. 2.2(a), we compare the average number of samples of the input signal S required by each algorithm when the sparsity k is fixed at 60. The bandwidth N was varied from 2^17 to 2^26 by powers of two. Using powers of two for the bandwidth allows the best performance for both FFTW and AAFFT, though this fact is more relevant for the runtime comparisons in the following section. We can see that the Phaseshift variants require many fewer samples than all four GFFT variants as well as AAFFT and FFTW, for all values of N tested. The Phaseshift variants exhibit almost no dependence on the bandwidth for all values of N, a feature not shared by the other deterministic algorithms.

2.5.3 Runtime Complexity

In Fig. 2.1(b), we compare the average runtime of each algorithm over 100 test signals when the bandwidth N is fixed at 2^22.
The range of sparsity k considered is the same as in section 2.5.2. For all values of k the Phaseshift variants are faster than GFFT-RF (the fastest GFFT variant) and AAFFT by more than an order of magnitude. When compared to GFFT-RS (not shown), GFFT-DS (not shown), and FFTW, the difference in runtime is closer to three orders of magnitude.

In Fig. 2.2(b), we compare the average runtime of each algorithm over 100 test signals when the sparsity k is fixed at 60. The range of bandwidth considered is the same as in section 2.5.2. The Phaseshift variants are the only algorithms that outperform FFTW for all values of N tested. The other implementations tested only become competitive with the standard FFT for N ≳ 2^20, while ours are faster even for modest N.

[Figure 2.1: (a) Sampling complexity with fixed bandwidth N = 2^22 for PS-Det (blue solid line), GFFT-RS (red solid line), AAFFT (black dashed line), and FFTW (magenta dashed line). (b) Runtime complexity with fixed bandwidth N = 2^22 for PS-Det (blue solid line), GFFT-RF (red solid line), AAFFT (black dashed line), and FFTW (magenta dashed line). For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.]

[Figure 2.2: (a) Sampling complexity with fixed sparsity k = 60 for PS-Det (blue solid line), GFFT-RS (red solid line), AAFFT (black dashed line), and FFTW (magenta dashed line). (b) Runtime complexity with fixed sparsity k = 60 for PS-Det (blue solid line), GFFT-RF (red solid line), AAFFT (black dashed line), and FFTW (magenta dashed line).]

2.6 Multiple Shifts

Our algorithm requires access to samples of the input function with a very small time shift ε < 1/N, where N is the assumed bandwidth of the signal. For signals with a very large bandwidth N ≳ 10^12, this time shift is impractical to implement with current hardware. Following [47], however, it is possible to combine two or more much larger shifts ε1, ε2 to achieve the same effect.

2.6.1 Two shifts

Let r1, r2 be co-prime integers with r1 r2 ≥ N, and set εi = 1/ri for i = 1, 2. Since |ω| > ri in general, we no longer have the relationship

\[ \omega = \frac{1}{2\pi\varepsilon_i} \operatorname{Arg}\!\left( \frac{\widehat{S_{p,\varepsilon_i}}[h]}{\widehat{S_p}[h]} \right) \qquad (2.31) \]

for h ≡ ω (mod p), due to the wrap-around of the frequencies modulo ri. Rather, we consider the related quantities

\[ \sigma_i \stackrel{\mathrm{def}}{=} \frac{1}{2\pi} \operatorname{Arg}\!\left( \frac{\widehat{S_{p,\varepsilon_i}}[h]}{\widehat{S_p}[h]} \right), \qquad (2.32) \]

so that σi ∈ [−1/2, 1/2). We then have

\[ \sigma_i = \frac{\omega}{r_i} + k_i \qquad (2.33) \]

for some ki ∈ Z. Determining ω is then equivalent to determining the values ki. Note that (2.33) gives two equations, one for each index i. Solving both for ω and setting them equal gives the linear diophantine equation

\[ r_1 k_1 - r_2 k_2 = r_1 \sigma_1 - r_2 \sigma_2, \qquad (2.34) \]

which is solvable for (k1, k2) with any integer right-hand side, since r1, r2 were taken to be co-prime. (See, e.g., [38] for a basic treatment of linear diophantine equations.) Indeed, let b denote the right-hand side of (2.34), and let (x, y) be integers such that

\[ r_1 x + r_2 y = 1. \qquad (2.35) \]

(These are easily furnished by the extended Euclidean algorithm [13].)
The general solution to (2.34) is given by

\[ k_1 = xb + r_2 n, \qquad k_2 = -yb + r_1 n, \qquad (2.36) \]

for any integer n. (Note that the signs of the homogeneous terms must agree for (2.36) and (2.37) to be consistent.) Plugging the first of these into (2.33) and solving for ω, we have

\[ \omega = r_1 \sigma_1 - r_1 b x - r_1 r_2 n. \qquad (2.37) \]

Since r1 r2 ≥ N and different solutions ω differ by at least r1 r2, there is a unique n such that −N/2 ≤ ω < N/2. Knowing this, we can determine the value of ω (mod N) efficiently. Specifically, n must take one of the values

\[ \left\lfloor \frac{r_1 \sigma_1 - r_1 b x}{r_1 r_2} \right\rfloor \qquad \text{or} \qquad \left\lceil \frac{r_1 \sigma_1 - r_1 b x}{r_1 r_2} \right\rceil, \]

so we can simply check whether the first of these puts ω in the correct range, and if not take the second.

The only requirements on r1, r2 are that they be co-prime with product greater than N. In particular, we can take r1 = ⌈N^{1/2}⌉ and r2 = r1 + 1, so that the shifts required by the algorithm are of the order N^{−1/2} instead of N^{−1} when using only one ε. For N = 10^12, say, this allows clock speeds in the megahertz range instead of terahertz. In practice, one would prefer more separation between the values r1 and r2; this can easily be achieved by taking r1 to be an even number greater than N^{1/2}/α and r2 an odd prime greater than αN^{1/2} for some α > 1.
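The following self-contained C++ sketch walks through the two-shift recovery (2.34)–(2.37) in the noiseless case. To keep it free-standing we simulate the measured quantities r_i σ_i directly as the signed residues of the true frequency modulo r_i, rather than computing them from DFT ratios as in (2.32); the chosen values of N, r1, r2, and ω are arbitrary test inputs, not the parameters used in our experiments.

```cpp
#include <cmath>
#include <cstdio>

// Extended Euclidean algorithm: returns gcd(a, b) and x, y with a*x + b*y = gcd.
long long ext_gcd(long long a, long long b, long long &x, long long &y) {
    if (b == 0) { x = 1; y = 0; return a; }
    long long x1, y1, g = ext_gcd(b, a % b, x1, y1);
    x = y1; y = x1 - (a / b) * y1;
    return g;
}

int main() {
    const long long N = 1 << 22;
    const long long r1 = 2048, r2 = 2049;  // co-prime with r1*r2 >= N
    const long long M = r1 * r2;

    long long x, y;
    ext_gcd(r1, r2, x, y);                 // r1*x + r2*y = 1, as in (2.35)

    const long long omega = -1234567;      // "true" frequency in [-N/2, N/2)

    // Simulated measurements: r_i*sigma_i is the signed residue of omega mod r_i,
    // i.e. r_i*sigma_i = omega - k_i*r_i with sigma_i in [-1/2, 1/2), cf. (2.33).
    long long s1 = std::llround(std::remainder((double)omega, (double)r1));
    long long s2 = std::llround(std::remainder((double)omega, (double)r2));

    long long b = s1 - s2;                 // right-hand side of (2.34)
    long long w = s1 - r1 * x * b;         // omega up to a multiple of r1*r2, cf. (2.37)

    w = ((w % M) + M) % M;                 // reduce modulo r1*r2, then pick the
    if (w >= (M + 1) / 2) w -= M;          // unique representative with |w| < N/2

    std::printf("recovered %lld (true %lld)\n", w, omega);
}
```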
2.6.2 Three or more shifts

In the previous section we saw that using two shifts εi reduced the required shift size, and hence the clock rate requirements, from O(N^{−1}) to O(N^{−1/2}). We will show in this section that a similar procedure with m shifts can reduce this to O(N^{−1/m}), at the cost of solving more complicated systems than (2.34). To this end, suppose {r_i}_{i=1}^m are integers such that ∏_{i=1}^m r_i ≥ N, and define ε_i = r_i^{−1}. Using (2.32) to define σ_i, we again have a system of equations of the form (2.33) for i = 1, ..., m. Solving these for ω and summing the resulting equations gives

\[ m\omega = \sum_{i=1}^{m} r_i (\sigma_i - k_i); \qquad (2.38) \]

on the other hand, multiplying any one of the equations by m and solving for ω yields

\[ m\omega = m\, r_i (\sigma_i - k_i). \qquad (2.39) \]

Choosing i = 1 in (2.39) and combining it with (2.38) then gives the linear diophantine equation in the m variables (k1, ..., km)

\[ (m-1)\, r_1 k_1 - \sum_{i=2}^{m} r_i k_i = (m-1)\, r_1 \sigma_1 - \sum_{i=2}^{m} r_i \sigma_i. \qquad (2.40) \]

This is solvable for any integer right-hand side when gcd((m − 1)r1, r2, ..., rm) = 1.

The analogous system (2.34) with two shifts admitted an efficient method to find the unique solution (k1, k2) such that ω ∈ [−N/2, N/2). In the case of three or more shifts the problem is considerably more difficult, since there are m − 1 ≥ 2 degrees of freedom in the solution space. Moreover, distinct solutions do not give rise to values of ω separated by N or more. In fact, for any value of m one can express the solutions to (2.40) in the form

\[ \mathbf{k} = A\mathbf{z} + \mathbf{d}, \qquad (2.41) \]

where k ∈ Z^m and z ∈ Z^{m−1} are variable and A ∈ Q^{m×(m−1)} and d ∈ Z^m are fixed [43]. In addition we have the inequality constraints

\[ -\frac{N}{2 r_i} + \sigma_i < k_i \le \frac{N}{2 r_i} + \sigma_i. \qquad (2.42) \]

Let u_i, l_i denote respectively the upper and lower bounds on k_i from (2.42). Given these and the values d_i, one can consider (2.41) as a feasible integer programming problem: does there exist z ∈ Z^{m−1} such that

\[ \begin{pmatrix} A \\ -A \end{pmatrix} \mathbf{z} \le \begin{pmatrix} \mathbf{u} - \mathbf{d} \\ \mathbf{d} - \mathbf{l} \end{pmatrix} ? \qquad (2.43) \]

For fixed values of m this problem is solvable in polynomial time [34], while the general case is known to be NP-complete [43].

2.7 Conclusion

In this chapter we presented deterministic and Las Vegas algorithms for the sparse Fourier transform problem that empirically outperform existing algorithms in average-case sampling and runtime complexity. While our worst-case bounds do not improve the asymptotic complexity, we proved average-case bounds over a representative class of random signals that significantly improve the computational complexity of our algorithm with respect to its competitors. We are moreover able to extend by an order of magnitude the range of sparsity for which our algorithm is faster than FFTW in the average case.

The improved performance of our algorithm can be attributed to two major factors: adaptivity and the ability to detect aliasing. In particular, we are able to extract more information from a small number of function samples by considering the phase of the DFT coefficients in addition to their magnitudes. This represents a significant improvement over the current state of the art for the sparse Fourier transform problem. Finally, we showed that our algorithm can be implemented efficiently using two large co-prime shifts in place of one very small shift. This has significant implications in practical situations where hardware with high clock rates costs significantly more than lower-rate hardware.

Chapter 3

Multi-scale sub-linear Fourier algorithms for noisy data

3.1 Introduction

In chapter 2 we developed algorithms for approximating the discrete Fourier transform of noiseless signals using much fewer computational resources than a traditional FFT. Unfortunately, any signal recorded from real-world data will contain errors of one sort or another, due to imprecise instrumentation or noisy environments. In this chapter we develop modified versions of our noiseless algorithm to address the challenge of noisy, real-world data.

The remainder of this chapter is organized as follows. In section 3.2 we describe our noise model, discuss some of the problems associated with noisy signals, and argue that in certain applications the ℓ2 error metric is inappropriate and should be replaced with a form of Earth Mover Distance. In section 3.3 we give a simple modification of our algorithm suitable for low-level noise. The runtime complexity for this algorithm is still Θ(k log k), but with a much larger constant than in the noiseless case. We illustrate its performance with an empirical evaluation. In section 3.4 we give another modification of our algorithm which uses multiple shifts εq to determine energetic frequency values in a multi-scale manner. The resulting algorithm has runtime complexity Θ(k² log k), but is robust to higher-level noise. This is confirmed in an empirical evaluation of the two versions of our multi-scale algorithm.

3.2 Mathematical background

In this section we describe the mathematical setting for our study of sub-linear Fourier algorithms in the presence of noise. In section 3.2.1 we describe the specific noise model used in our analysis and our empirical studies, and we discuss its advantages and limitations. In section 3.2.2 we derive some fundamental results on the stability of the operations underpinning our algorithm to noise.

3.2.1 Noise model

In order to derive some analytic estimates for use in our algorithms below, we limit ourselves in this dissertation to noisy signals of a specific type. Namely, we consider i.i.d. complex gaussian noise with covariance Σ = σ²I, where I is the two-by-two identity matrix. If S(t) is a signal of the form (2.1), our discrete measurements are of the form

\[ S_p^n[j] = S\!\left(\frac{j}{p}\right) + \sigma\left(\eta_j^1 + i\,\eta_j^2\right), \qquad (3.1) \]

where η_j^i are i.i.d. standard normal random variables (mean zero, unit variance) for i = 1, 2 and j = 1, ..., p, and the superscript "n" denotes that these are noisy measurements.
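For concreteness, a minimal C++ generator of the measurements (3.1) is sketched below; the test frequencies, coefficients, and noise level are arbitrary placeholder values. (Our actual experiments additionally truncate the noise to two standard deviations, which is omitted here.)

```cpp
#include <cmath>
#include <complex>
#include <cstdio>
#include <random>
#include <vector>

// Noisy equispaced samples of a k-sparse trigonometric polynomial, per (3.1):
// S_p^n[j] = S(j/p) + sigma*(eta_j^1 + i*eta_j^2) with standard normal eta's.
int main() {
    const double PI = std::acos(-1.0);
    const int p = 128;
    const double sigma = 0.1;
    const std::vector<long long> freq = {5, -300, 1021};               // placeholder
    const std::vector<std::complex<double>> coef = {{1, 0}, {0, 1}, {0.5, 0.5}};

    std::mt19937 gen(42);
    std::normal_distribution<double> eta(0.0, 1.0);                    // mean 0, variance 1

    std::vector<std::complex<double>> samples(p);
    for (int j = 0; j < p; ++j) {
        std::complex<double> s = 0.0;
        for (std::size_t l = 0; l < freq.size(); ++l)
            s += coef[l] * std::exp(std::complex<double>(0.0, 2 * PI * freq[l] * j / (double)p));
        samples[j] = s + sigma * std::complex<double>(eta(gen), eta(gen));
    }
    std::printf("S_p^n[0] = %f + %fi\n", samples[0].real(), samples[0].imag());
}
```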
Noise of the form (3.1) has the advantage that it is amenable to analytic estimates, at the expense of perhaps being overly pessimistic in modeling real-world phenomena. To emphasize this last point, we note that since gaussian random variables are unbounded, there is always a small probability that the noise overwhelms the signal in a given measurement. This is not realistic for real-world signals, and in our empirical studies in sections 3.3.2 and 3.4.3 we truncate the noise to two standard deviations. We note further that uncorrelated white noise is only a model for real-world noise; depending on the application area, some dependence or color may be present.

3.2.2 Sensitivity to noise

As in section 2.2.1, we apply the DFT to our samples to approximate the significant Fourier coefficients of the underlying signal S(t). With noisy measurements of the form (3.1), we have

\[ \widehat{S_p^n}[h] = \sum_{j=0}^{p-1} S_p^n[j]\, e^{-2\pi i h j/p} = \sum_{j=0}^{p-1} \left( S\!\left(\frac{j}{p}\right) + \sigma(\eta_j^1 + i\eta_j^2) \right) e^{-2\pi i h j/p} = \widehat{S_p}[h] + \sigma\left(\hat\eta_h^1 + i\,\hat\eta_h^2\right). \qquad (3.2) \]

Here we have written

\[ \hat\eta_h^i \stackrel{\mathrm{def}}{=} \sum_{j=0}^{p-1} \eta_j^i\, e^{-2\pi i h j/p} \qquad (3.3) \]

to denote the DFT of the noise terms η_j^i. Note that E[η̂_h^i] = 0 and E[|η̂_h^i|²] = p for i = 1, 2 and h = 0, ..., p − 1. Since these noise terms have mean zero, we have

\[ \mathbb{E}\,\widehat{S_p^n}[h] = \widehat{S_p}[h]. \qquad (3.4) \]

Moreover, the variance of Ŝ_p^n[h] can be calculated as

\[ \operatorname{Var}\widehat{S_p^n}[h] = \mathbb{E}\left| \widehat{S_p^n}[h] - \widehat{S_p}[h] \right|^2 = \sigma^2\,\mathbb{E}\left( |\hat\eta_h^1|^2 + |\hat\eta_h^2|^2 \right) = 2\sigma^2 p. \qquad (3.5) \]

Thus a typical noisy DFT coefficient Ŝ_p^n[h] will deviate from the true value Ŝ_p[h] by an amount proportional to σ√p. Recall from section 2.2.1 that the discrete Fourier coefficients are given by Ŝ_p[h] = p·a_ℓ for ω_ℓ ≡ h (mod p). A typical noisy DFT coefficient will therefore satisfy

\[ \widehat{S_p^n}[h] = p\left( a_\ell + O\!\left( \frac{\sigma}{\sqrt{p}} \right) \right). \qquad (3.6) \]

The locations of the peaks of Ŝ_p^n[h] will therefore give the correct values of ω (mod p) as long as σ/√p ≪ |a_ℓ|. This will be the case in any reasonable setting, since in general there can be no hope of recovering Fourier coefficients whose magnitude is less than that of the noise.

Our noiseless algorithm relies crucially on the computation of the phase angle of complex numbers to determine the frequency index associated with a significant Fourier coefficient. The numbers whose phase angles we are interested in calculating are the ratios Ŝ_{p,ε}[h]/Ŝ_p[h], which must be replaced in the noisy case by Ŝ^n_{p,ε}[h]/Ŝ^n_p[h]. Suppose that ω ≡ h (mod p) is one of the frequencies appearing in (2.1), and let a be the corresponding Fourier coefficient (so that a = p^{−1} Ŝ_p[h]). Using the binomial approximation (1 + z)^α = 1 + αz + O(z²) and a typical value for the noisy DFT coefficients, we have

\[ \frac{\widehat{S^n_{p,\varepsilon}}[h]}{\widehat{S^n_p}[h]} = \frac{\widehat{S_p}[h]\, e^{2\pi i \omega\varepsilon} + O(\sigma\sqrt{p})}{\widehat{S_p}[h] + O(\sigma\sqrt{p})} = \frac{e^{2\pi i \omega\varepsilon} + O(\sigma/a\sqrt{p})}{1 + O(\sigma/a\sqrt{p})} = e^{2\pi i \omega\varepsilon} + O\!\left( \frac{\sigma}{a\sqrt{p}} \right). \qquad (3.7) \]

Thus the ratio of noisy DFT coefficients agrees with the noiseless ratio up to an error term of order σ/(|a|√p).

Given this estimate for the ratio of noisy DFT coefficients, we can derive bounds for the error in the phase angle computed via Arg(z). Note that Arg(z) is differentiable except across the negative real axis, so assume for the moment that Arg(z) is sufficiently far from ±π. Since Arg(z) is constant on rays from the origin and linear (with unit slope) along circles centered at the origin, we have

\[ |\operatorname{Arg}(z + \eta) - \operatorname{Arg}(z)| \le |\eta|, \qquad (3.8) \]

as long as −π + |η| < Arg(z) < π − |η|.
Combining the bound (3.8) with the estimate (3.7), we have

\[ \left| \operatorname{Arg}\!\left( \frac{\widehat{S^n_{p,\varepsilon}}[h]}{\widehat{S^n_p}[h]} \right) - 2\pi\omega\varepsilon \right| \le O\!\left( \frac{\sigma}{|a|\sqrt{p}} \right); \qquad (3.9) \]

equivalently, if \( \tilde\omega = \frac{1}{2\pi\varepsilon}\operatorname{Arg}\!\left( \widehat{S^n_{p,\varepsilon}}[h] / \widehat{S^n_p}[h] \right) \) is our noisy frequency estimate, we have

\[ |\tilde\omega - \omega| \le O\!\left( \frac{\sigma}{2\pi\varepsilon |a| \sqrt{p}} \right). \qquad (3.10) \]

For fixed ε and |a|, we can see from this last inequality that the ratio σ/√p is critical in determining the sensitivity of our phase calculation to noise. In the remainder of this chapter we examine how the choices of p and ε affect the quality of the frequency approximation in our algorithm.

3.2.3 Earth mover distance

In the existing literature on the sparse Fourier transform, the ℓ2 norm is most often used to assess the quality of approximation. There are many reasons for this choice, the two most convincing perhaps being the completeness of the complex exponentials with respect to the ℓ2 norm and Parseval's theorem. For certain applications, however, this choice of norm is inappropriate. For example, in wide-band spectral estimation and radar applications, one is interested in identifying a set of frequency intervals containing active Fourier modes. In this case, an estimate ω̃ of the true frequency ω with |ω̃ − ω| ≪ N is useful, but unless ω̃ = ω the ℓ2 metric will report an O(1) error.

For these reasons, we propose measuring the approximation error of sparse Fourier transform problems with the Earth Mover Distance (EMD) [40]. Originally developed in the context of content-based image retrieval, EMD measures the minimum cost that must be paid (with a user-specified cost function) to transform one distribution of points into another. EMD can be calculated efficiently as the solution of a linear program corresponding to a certain flow-minimization problem. In our situation, we consider the cost to move a set of k estimated Fourier modes and coefficients (ω̃_j, c̃_{ω̃_j})_{j=1}^k to the true values (ω_j, c_{ω_j})_{j=1}^k under the cost function

\[ d_1\!\left( (\tilde\omega, \tilde c_{\omega}), (\omega, c_{\omega});\, N \right) \stackrel{\mathrm{def}}{=} \frac{|\tilde\omega - \omega|}{N} + |\tilde c_{\omega} - c_{\omega}|. \qquad (3.11) \]

This choice of cost function strikes a balance between the fidelity of the frequency estimate (as a fraction of the bandwidth) and that of the coefficient estimate. We denote the EMD using d1 as the cost function by EMD(1) in our empirical studies in sections 3.3.2 and 3.4.3 below.

3.3 A minor modification

In this section we give a simple modification of our noiseless algorithm to improve the robustness of the frequency estimation in the presence of noise. In section 3.3.1 we describe the modification and study its performance as a function of the noise and the sampling rate p. In section 3.3.2 we report the results of an empirical evaluation of the accuracy of the modified algorithm as a function of the noise level.

3.3.1 Rounding

As noted in section 3.2.2, the frequency reconstruction from noisy measurements is correct up to an error term of size O(σ/(2πε|a|√p)). Moreover, the locations of the peaks in the noisy DFT are robust to noise, so that the values of ω (mod p) given by the indices of the k largest coefficients in Ŝ^n_p[h] can be taken as exact. By combining these two measures we can more reliably estimate the frequency indices of significant coefficients.

Our proposed modification is therefore to simply round the noisy frequency estimate \( \tilde\omega = \frac{1}{2\pi\varepsilon}\operatorname{Arg}\!\left( \widehat{S^n_{p,\varepsilon}}[h] / \widehat{S^n_p}[h] \right) \) to the nearest integer of the form np + h. This improved estimate ω̄ is given by

\[ \bar\omega = p \cdot \operatorname{round}\!\left( \frac{\tilde\omega - h}{p} \right) + h, \qquad (3.12) \]

where round(x) returns the nearest integer to x.
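The rounding step (3.12) amounts to one line of integer arithmetic; the sketch below applies it to a synthetic noisy estimate. All numeric values are illustrative, and the residue h is taken as exact, as discussed above.

```cpp
#include <cmath>
#include <cstdio>

// Rounding step (3.12): snap a noisy phase-based frequency estimate to the
// nearest integer congruent to h modulo p.
long long round_to_residue(double omega_noisy, long long h, long long p) {
    return p * std::llround((omega_noisy - (double)h) / (double)p) + h;
}

int main() {
    const long long p = 320, h = 137;              // h = omega mod p, assumed exact
    const long long omega_true = 42 * p + h;       // 13577
    const double omega_noisy = omega_true + 83.7;  // phase error smaller than p/2
    std::printf("recovered %lld (true %lld)\n",
                round_to_residue(omega_noisy, h, p), omega_true);
}
```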
For low levels of noise this modification will return the true value ω, while for larger noise levels it is possible that ω̃ deviates by more than p/2 from the true frequency ω. In this case the estimate ω̄ will be wrong by a multiple of p. Choosing larger values of p (i.e., increasing the parameter c1) will help to reduce the likelihood of an error in frequency estimation. Furthermore, to ensure that the estimated frequencies are sufficiently far from the branch cut of Arg(·) along the negative real axis, we take the shift ε ≤ 1/2N. The estimated frequencies then satisfy −N ≤ ω̃ < N, so that deviations due to the noise will not push the estimates across the discontinuity.

We saw in the previous section that the error in the phase estimation is proportional to σp^{−1/2}, so we would expect similar behavior for the improved estimate ω̄. To test this, we generated 100 frequencies uniformly at random from [−N/2, N/2) with N = 2^22 for a range of parameters (σ, p), and set the corresponding coefficient to unity. Thus our test signals for this empirical trial were one-term trigonometric polynomials. We then reconstructed the frequencies in two ways: first, simply using the formula (2.11), and second, by combining this estimate with the rounding procedure (3.12). In figure 3.1 we plot the average phase error in logarithmic scale as a function of both σ and p, which were varied from 0.001 to 0.512 and from 10 to 5120, respectively, by powers of two.

In the plot on the left, which corresponds to reconstruction using only (2.11), we can clearly see the contours of constant phase error obeying the relationship log2(p) = 2 log2(σ) + c for some constant c. This confirms our analytic estimate from section 3.2.2 that the phase error is proportional to σ/√p. In the plot on the right, which corresponds to the improved reconstruction using (3.12), we can see that for large values of σ and small values of p the same relationship holds. However, for smaller σ and larger p we see an abrupt transition to exact reconstruction (the white area in the upper-left). The boundary of this region appears to follow the relationship log2(p) = (2/3) log2(σ) + c, indicating that for small enough values of the ratio σ/p^{3/2} the rounding procedure is exact.

[Figure 3.1: (left) Mean phase error without rounding (shaded attribute) vs. noise level σ and sample length p, in logarithmic scale. (right) Mean phase error with rounding (shaded attribute) vs. noise level σ and sample length p, in logarithmic scale. The bandwidth for both plots is N = 2^22.]

3.3.2 Empirical evaluation

In this section we describe the results of an empirical evaluation of the algorithm with the rounding procedure (3.12). This modification does not change the runtime or sampling complexity significantly, so we focus here on the error in the approximation as a function of the noise level σ and the parameter c1.

[Figure 3.2: EMD(1) error as a function of the noise level σ with the sparsity and bandwidth fixed at k = 64 and N = 2^22, respectively. Different values of the parameter c1 are shown: c1 = 4 (blue solid line), c1 = 16 (black dashed line), c1 = 64 (red solid line), and c1 = 256 (magenta dashed line).]
In figure 3.2 we report the average EMD(1) error over 100 test signals as a function of the input noise level σ (note that σ is not given in dB), for various choices of the parameter c1. The test signals were generated in the same manner as in section 2.4.2: k-term trigonometric polynomials with frequencies drawn uniformly at random from [−N/2, N/2) and coefficients drawn uniformly from the complex unit circle, with complex gaussian noise of the form (3.1) added to each measurement. In this experiment, the sparsity and bandwidth are fixed at k = 64 and N = 2^22, respectively. As expected, the error decreases as c1 increases, since the rounding procedure is more likely to result in the true frequency. Moreover, for larger values of c1 the error increases linearly with the noise level, indicating the procedure's robustness in the presence of noise. For smaller values of c1 the EMD(1) error is saturated by the frequency error, so that there is only a mild dependence on σ.

We remark that in the noiseless case the choice c1 = 5 was found to be sufficient, while figure 3.2 indicates that the much larger value c1 ≈ 256 is necessary for good approximation in the EMD(1) metric. The larger sample lengths imply an increase in both the runtime and sampling complexity, and indicate that the rounding procedure should be complemented by other modifications. In section 3.4 we combine the rounding procedure with the use of larger shifts εq in a multi-scale approach to frequency estimation that is robust to higher noise levels even at coarse sampling rates (i.e., small values of c1).

3.4 Multi-scale methods

In this section we describe two approaches to multi-scale frequency estimation using multiple shifts εq. These algorithms exhibit a form of error correction in their estimation of significant frequency values, which allows them to use much coarser sampling rates than the simple rounding method described in section 3.3 to achieve a desired error. This comes, however, at the cost of increased average-case runtime and sampling complexities of Θ(k² log(k) log(N)) and Θ(k² log(N)), respectively.

We note that the algorithms presented below, while simple to describe and implement, are not optimal in terms of computational complexity: the authors of [25] have recently given a Θ(k log(N) log(N/k))-time algorithm for the general (noisy) case. That algorithm, however, is considerably more complicated than ours, and moreover has not yet been shown to empirically outperform standard FFT implementations.

In section 3.4.1 we give some background on our multi-scale methods and introduce the main idea of our algorithms. In section 3.4.2 we describe two variations on the basic multi-scale algorithm, and in section 3.4.3 we report the results of an empirical evaluation of the algorithms.

3.4.1 Preliminaries

In both the noiseless algorithm and the modified version given by rounding the frequencies to the nearest integers with the correct remainder modulo p, we required that ε ≤ 1/N. This was due to the fact that the range of the principal argument function Arg(z) is [−π, π), so to identify a frequency bandlimited to [−N/2, N/2) without any wrap-around aliasing we needed to ensure that 2πεω ∈ [−π, π). Moreover, we noted in remark 2.2.1 that, given an estimate of the magnitude of a particular frequency, |ω| < L, taking ε < 1/(2L) suffices to recover ω without wrap-around aliasing. This is the key observation for our multi-scale algorithms, which build up an estimate for ω in an iterative fashion.
By increasing the shift ε while simultaneously demodulating by our current frequency estimate, we obtain a new estimate on a finer scale that allows us to correct for errors introduced in previous iterations.

3.4.2 Algorithms

Let us describe the general architecture of these algorithms in some detail. We begin by choosing an appropriate p and ε < 1/2N and calculating our initial frequency estimates ω_j^0, 1 ≤ j ≤ k*, as in section 3.3. Thus for ω_j ≡ h (mod p) we have

\[ \omega_j^0 = p \cdot \operatorname{round}\!\left( \frac{1}{2\pi\varepsilon p}\operatorname{Arg}\!\left( \frac{\widehat{S^n_{p,\varepsilon}}[h]}{\widehat{S^n_p}[h]} \right) - \frac{h}{p} \right) + h, \qquad (3.13) \]

which will in general differ from the true frequency ω_j by approximately σ/(2πε√p). For an appropriate choice of p, the difference between the estimated and true frequencies will satisfy

\[ |\omega_j^0 - \omega_j| \ll |\omega_j|. \qquad (3.14) \]

We then begin an iterative frequency estimation procedure that tries to correct for the error in phase. For each of the k* estimated frequencies, we increase the shift ε (say by a multiplicative factor of 2^B, where B is a parameter) and demodulate the input by our first estimate ω_j^0. That is, for 1 ≤ j ≤ k* we collect the samples

\[ S_p^n[q] = e^{-2\pi i \omega_j^0 q/p}\, S\!\left( \frac{q}{p} \right), \qquad (3.15) \]

\[ S_{p,\varepsilon}^n[q] = e^{-2\pi i \omega_j^0 (q/p + \varepsilon 2^B)}\, S\!\left( \frac{q}{p} + \varepsilon 2^B \right), \qquad (3.16) \]

and perform DFTs on them. Note that by elementary properties of Fourier series, the peak location h = ω_j (mod p) will be shifted down by ω_j^0 (mod p). From (3.13) we see that ω_j^0 (mod p) = h, so that the peak corresponding to ω_j will appear at the zero frequency in the demodulated signal.

Moreover, the phase difference between the shifted and unshifted DFT values at the zero frequency will be proportional to ω_j − ω_j^0, and we can estimate this phase error using the principal argument function as before. We therefore have the improved estimate

\[ \omega_j^1 = \omega_j^0 + p \cdot \operatorname{round}\!\left( \frac{1}{2\pi\varepsilon 2^B p}\operatorname{Arg}\!\left( \frac{\widehat{S^n_{p,\varepsilon}}[0]}{\widehat{S^n_p}[0]} \right) \right). \qquad (3.17) \]

Note that increasing the shift by a factor of 2^B ensures that the correction term will lie in the range \(\left[ -\frac{1}{2\varepsilon 2^B}, \frac{1}{2\varepsilon 2^B} \right)\), and will differ from the true error ω_j − ω_j^0 by approximately σ2^{−B}/(2πε√p). Thus the correction term operates on scales a factor of 2^B smaller than the original estimate.

The algorithms then iterate this procedure: demodulate the input samples by the previous frequency estimate ω_j^{i−1}, increase the shift by a factor of 2^B, and estimate the correction term as in (3.17). In this way the estimates "zoom in" on the correct frequency value in a multi-scale manner. Since we must demodulate by different amounts for each of the k frequencies, one iteration of this procedure requires Θ(k²) samples and Θ(k² log k) time. Note that each iteration of the frequency estimation procedure provides a new estimate of a significant frequency's corresponding coefficient in Ŝ^n_p[0]. We use this fact to improve the robustness of our coefficient estimation by taking the median of the real and imaginary parts of the estimates.

Our two versions differ in how the shifts are increased, which in turn affects how many iterations of the estimation procedure are performed. In the "non-adaptive shift" algorithm, B is a user-specified constant, so the number of iterations in the non-adaptive shift algorithm is ⌈log2(N)/B⌉. The "adaptive shift" algorithm uses the size of the previous correction term to estimate an appropriate value for ε_j^i, the shift used in the i-th iteration on the j-th frequency. Specifically, we set

\[ \varepsilon_j^i = \begin{cases} \dfrac{2^{\alpha + \log_2 |\omega_j^{i-1}|}}{N} & \text{if } \omega_j^{i-1} \neq 0, \\[2ex] \dfrac{2^{\alpha + \log_2(p)}}{N} & \text{otherwise.} \end{cases} \qquad (3.18) \]

The "safety parameter" α ≥ 1 guards against underestimating the magnitude of the true error.
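Before describing the stopping criteria, we sketch the core of the non-adaptive correction loop (3.17) in self-contained C++. The DFT ratio is replaced by its ideal value 2πε(ω − estimate) plus a crude deterministic stand-in for the phase noise, so only the multi-scale update logic is illustrated; all constants are placeholder values rather than those of our implementation.

```cpp
#include <cmath>
#include <cstdio>

// Wrap an angle into [-pi, pi), mimicking the range of Arg(.).
double wrap(double t) {
    const double PI = std::acos(-1.0);
    return t - 2 * PI * std::floor((t + PI) / (2 * PI));
}

int main() {
    const double PI = std::acos(-1.0);
    const long long N = 1 << 22, p = 320, h = 137;
    const int B = 4;                           // shift grows by 2^B per iteration
    const long long omega = 3857 * p + h;      // true frequency
    long long est = omega - 12 * p;            // initial estimate, off by 12p

    double eps = 1.0 / (2.0 * N);
    for (int it = 1; it * B <= 22; ++it) {     // roughly log2(N)/B iterations
        eps *= 1 << B;
        // Ideal phase of the demodulated ratio in (3.17), plus mock phase noise.
        double theta = wrap(2 * PI * eps * (omega - est) + 0.02 * std::sin(7.0 * it));
        est += p * std::llround(theta / (2 * PI * eps * p));  // correction: a multiple of p
        std::printf("iteration %d: estimate %lld (true %lld)\n", it, est, omega);
    }
}
```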
In this adaptive version we stop the estimation procedure when the correction term is zero in two consecutive iterations for all frequencies. While we cannot make an analytic claim on the number of iterations the adaptive shift algorithm makes, in practice it performs similarly to the non-adaptive version.

Finally, we note that for small values of the sparsity k, only requiring p ≥ c1 k would allow for very coarse sampling rates. As we saw in section 3.2.2, the frequency error is proportional to σ/(ε√p). If p is too small, this phase error could be very large, and it may not be the case that (3.14) holds. To guard against this we require in addition that p ≥ cσ²2^{2B}, where the constant c is determined from a fit to the data in figure 3.1 (left).

3.4.3 Empirical evaluation

To test both versions of our multi-scale algorithms, we performed an empirical evaluation with test signals identical to those in section 3.3.2. We varied the noise level and the parameter c1 over the same range of values, and set B = 4 in the non-adaptive shift algorithm and α = 1 in the adaptive shift algorithm. In figure 3.3 we report the average EMD(1) error as a function of the noise level σ for different choices of the parameter c1. The sparsity k = 64 and bandwidth N = 2^22 are fixed, as in figure 3.2. Note that for both the non-adaptive shift and adaptive shift algorithms the error is well-behaved for all values of c1, whereas we had to take c1 ≈ 256 to achieve similar behavior when using the rounding-only algorithm of section 3.3. Thus with our multi-scale algorithms we can achieve a prescribed error tolerance using much coarser samplings.

In figure 3.4 we report the number of samples and the runtime required for each of our three algorithms to approximate the true spectrum to within σ/√k in the EMD(1) metric. In these plots, the bandwidth N = 2^22 and noise level σ = 0.512 are fixed, but the parameter c1 varies for each data point. As seen in figures 3.2 and 3.3, the rounding-only algorithm generally needs a larger c1 value to achieve the same error as the multi-scale algorithms. To make a fair comparison, we find the smallest c1 that achieves the error bound σ/√k and plot the corresponding number of samples and runtime.

[Figure 3.3: EMD(1) error as a function of the noise level σ with the sparsity and bandwidth fixed at k = 64 and N = 2^22, respectively. Different values of the parameter c1 are shown: c1 = 4 (blue solid line), c1 = 16 (black dashed line), c1 = 64 (red solid line), and c1 = 256 (magenta dashed line). (a) Non-adaptive shift algorithm (B = 4); (b) adaptive shift algorithm (α = 1).]

From figure 3.4(a) we can see that the rounding-only algorithm requires fewer samples to achieve the error bound than the multi-scale algorithms, even though the parameter c1 is much larger in general. From figure 3.4(b) we can see that the same holds true for the runtime. This is to be expected, since the iterative frequency estimation procedure in the multi-scale algorithms is currently a computational bottleneck.

3.5 Conclusion

In this chapter we extended the algorithm of chapter 2 to handle noisy signals.
The first modification we gave is a minor alteration to the noiseless algorithm that does not change the computational complexity significantly, though it does require a relatively fine sampling rate to produce quality approximations in the EMD(1) metric. We also gave two multi-scale algorithms that exhibit error correction in their frequency estimation procedures. While these algorithms allow the use of much coarser samplings to provide good approximations, they are slower both in theory and in practice than the minor modification.

We remark that our multi-scale algorithms are somewhat wasteful in their frequency estimation procedures, since currently only the values of the samples at the zero frequency are used to estimate the frequency error. It remains an open question whether more information can be extracted from the demodulated samples so as to reduce the complexity of the frequency estimation procedure.

[Figure 3.4: (a) Samples required to achieve EMD(1) error σ/√k, as a function of the sparsity k with fixed σ = 0.512 and N = 2^22, for the rounding-only (blue solid line), non-adaptive shift (red dash-dot line), and adaptive shift (magenta dashed line) algorithms. (b) Runtime required to achieve the same error bound, as a function of the sparsity k.]

Chapter 4

Sub-linear Fourier algorithms in multiple dimensions

4.1 Introduction

In this chapter we shift our focus from one-dimensional signals to the multidimensional case, concentrating (as in chapter 2) on the noiseless case. We give three extensions of our basic one-dimensional noiseless algorithm to the multi-dimensional setting. Two of these are a result of the separability properties of the exponential function, while the third uses a number-theoretic construction to generate an equivalent one-dimensional problem on which our original algorithm can be applied. In this chapter we show that the resulting algorithms require fewer samples of the input and execute in less CPU time than the d-dimensional FFT.

The remainder of this chapter is organized as follows. In section 4.2 we define our notation for the multidimensional case. In section 4.3, we describe a straightforward extension of our one-dimensional algorithm to multiple dimensions. The resulting algorithm is simple to describe and implement, but suffers from increased computational complexity that scales exponentially in the dimension. In order to overcome this barrier, in section 4.4 we propose an alternative method that computes the one-dimensional Fourier transform of an "unwrapped" version of the multi-dimensional signal. Finally, in section 4.5 we describe the results of an empirical evaluation of the proposed methods in two dimensions.

4.2 Notation

We denote the ambient dimension by d, change the independent variable from t to x, and use boldface to emphasize the multidimensional character of the input. We write the components of d-dimensional vectors with subscripts, so that x = (x1, ..., xd). The Euclidean inner product is written

\[ \boldsymbol\omega\cdot\mathbf{x} \stackrel{\mathrm{def}}{=} \sum_{q=1}^{d} \omega_q x_q, \qquad (4.1) \]

and we denote by

\[ Q = Q(N_1, \ldots, N_d) \stackrel{\mathrm{def}}{=} \prod_{q=1}^{d} \left[ -\frac{N_q}{2}, \frac{N_q}{2} \right) \cap \mathbb{Z}^d \qquad (4.2) \]

the d-dimensional orthotope with side lengths N1, ..., Nd centered at the origin. The letter N (without subscripts) denotes the cardinality of Q, so that

\[ N = |Q| = \prod_{q=1}^{d} N_q. \qquad (4.3) \]
Our principal objects of study in this chapter are frequency-sparse band-limited signals S : [0, 1]^d → C of the form

\[ S(\mathbf{x}) = \sum_{j=1}^{k} a_j\, e^{2\pi i \boldsymbol\omega_j\cdot\mathbf{x}}, \qquad (4.4) \]

where a_j ∈ C. In this setting, "band-limited" means that ω_j ∈ Q(N1, ..., Nd) for some integers N1, ..., Nd, and "frequency-sparse" means that k ≪ N = |Q|. The d-dimensional Fourier series Ŝ(ω) of a signal S(x) is defined analogously to (2.2) by

\[ \widehat{S}(\boldsymbol\omega) = \int_{[0,1]^d} S(\mathbf{x})\, e^{-2\pi i \boldsymbol\omega\cdot\mathbf{x}}\, d\mathbf{x}, \qquad \boldsymbol\omega \in \mathbb{Z}^d. \qquad (4.5) \]

For signals of the form (4.4) we have

\[ \widehat{S}(\boldsymbol\omega_\ell) = \int_{[0,1]^d} \left( \sum_{j=1}^{k} a_j\, e^{2\pi i \boldsymbol\omega_j\cdot\mathbf{x}} \right) e^{-2\pi i \boldsymbol\omega_\ell\cdot\mathbf{x}}\, d\mathbf{x} = \sum_{j=1}^{k} a_j \int_{[0,1]^d} e^{2\pi i (\boldsymbol\omega_j - \boldsymbol\omega_\ell)\cdot\mathbf{x}}\, d\mathbf{x}. \qquad (4.6) \]

The separability of the exponential, e^{x_1 + \cdots + x_d} = e^{x_1} \cdots e^{x_d}, implies that the integral in (4.6) factors into the product of d one-dimensional integrals of the form

\[ \int_0^1 e^{2\pi i (\omega_{j,q} - \omega_{\ell,q}) x_q}\, dx_q, \qquad q = 1, \ldots, d. \qquad (4.7) \]

Since all components of the frequency vectors ω_j and ω_ℓ are integers, each of these integrals is just the Kronecker delta δ_{jℓ}, so that Ŝ(ω_ℓ) = a_ℓ for ℓ = 1, ..., k and 0 for all other ω ∈ Q.

Given a finite d-dimensional array S of size p1 × · · · × pd, we define its discrete Fourier transform by

\[ \widehat{S}[\mathbf{h}] = \sum_{\mathbf{j}=0}^{\mathbf{p}-1} S[\mathbf{j}]\, e^{-2\pi i \mathbf{h}\cdot(\mathbf{j}/\mathbf{p})}, \qquad (4.8) \]

where p = (p1, ..., pd), the indices hq and jq range from 0 to pq − 1 for q = 1, ..., d, the division j/p = (j1/p1, ..., jd/pd) is defined elementwise, and the sum over the vector index j is nested and performed over each dimension, so that

\[ \sum_{\mathbf{j}=0}^{\mathbf{p}-1} \stackrel{\mathrm{def}}{=} \sum_{j_1=0}^{p_1-1} \cdots \sum_{j_d=0}^{p_d-1}. \qquad (4.9) \]

For p = p1 · · · pd, multi-dimensional FFT algorithms [46] permit the computation of Ŝ in O(p log p) time.

As in chapter 2, the simplest method to determine the significant frequency/coefficient pairs (ω_j, a_j)_{j=1}^k in the Fourier transform of a signal S of the form (4.4) is to sample at the Nyquist rate in each dimension and compute the full d-dimensional DFT. For a signal bandlimited to Q(N1, ..., Nd) this procedure has sampling and runtime complexities of Θ(N) and Θ(N log N), respectively, where N = |Q|. When k ≪ N this is computationally wasteful, and in the following sections we develop methods which are sub-linear in N.

4.3 Straightforward extension to multiple dimensions

In this section we describe a straightforward extension of the algorithm given in chapter 2 to multiple dimensions. Recall that in the one-dimensional case, we used the fact that the phase modulation in the DFT coefficients of time-shifted samples of the input is linear in the frequency index. In section 4.3.1 we show that the same is true in multiple dimensions. In section 4.3.2 we exploit this fact to develop an algorithm with sampling and runtime complexities of Θ(k^d) and Θ(k^d log k^d), respectively.

4.3.1 Preliminaries

As in chapter 2, we apply the d-dimensional DFT to discrete samples of S(x) to determine the significant frequency/coefficient pairs (ω_j, a_j)_{j=1}^k. Fix integers p1, ..., pd and positive real numbers ε1, ..., εd. We form d + 1 discrete d-dimensional arrays S_p, S_{p,ε1}, ..., S_{p,εd} by sampling S(x) at rate 1/pq in the q-th dimension, starting at xq = 0 and xq = εq respectively. That is, for 0 ≤ jq ≤ pq − 1,

\[ S_{\mathbf{p}}[j_1, \ldots, j_d] = S\!\left( \frac{j_1}{p_1}, \ldots, \frac{j_d}{p_d} \right), \qquad S_{\mathbf{p},\varepsilon_q}[j_1, \ldots, j_d] = S\!\left( \frac{j_1}{p_1}, \ldots, \frac{j_q}{p_q} + \varepsilon_q, \ldots, \frac{j_d}{p_d} \right). \qquad (4.10) \]

If S(x) is of the form (4.4), we have (as in chapter 2)

\[ \widehat{S_{\mathbf{p}}}[\mathbf{h}] = \sum_{\mathbf{j}=0}^{\mathbf{p}-1} S\!\left( \frac{\mathbf{j}}{\mathbf{p}} \right) e^{-2\pi i \mathbf{h}\cdot(\mathbf{j}/\mathbf{p})} = \sum_{\ell=1}^{k} a_\ell \sum_{\mathbf{j}=0}^{\mathbf{p}-1} e^{2\pi i (\boldsymbol\omega_\ell - \mathbf{h})\cdot(\mathbf{j}/\mathbf{p})} = p_1 \cdots p_d \sum_{\boldsymbol\omega_\ell \equiv \mathbf{h}\ (\mathrm{mod}\ \mathbf{p})} a_\ell. \qquad (4.11) \]
Here as before, the division of vectors j/p = (j1/p1, ..., jd/pd) is to be interpreted elementwise, and ω_ℓ ≡ h (mod p) is shorthand for the d simultaneous congruences ω_{ℓ,q} ≡ h_q (mod p_q), q = 1, ..., d. The last equality in (4.11) can be seen as follows: in expanded form, the inner sum in the second-to-last line of (4.11) reads

\[ \sum_{j_1=0}^{p_1-1} \cdots \sum_{j_d=0}^{p_d-1} e^{2\pi i \left( (\omega_{\ell,1} - h_1) j_1/p_1 + \cdots + (\omega_{\ell,d} - h_d) j_d/p_d \right)}, \qquad (4.12) \]

which can be rewritten as the product of d sums

\[ \prod_{q=1}^{d} \sum_{j_q=0}^{p_q-1} e^{2\pi i (\omega_{\ell,q} - h_q) j_q/p_q}. \qquad (4.13) \]

As in chapter 2, each of the sums in (4.13) is equal to p_q if (ω_{ℓ,q} − h_q)/p_q is integral, and zero otherwise. The product is thus equal to p1 · · · pd if ω_{ℓ,q} ≡ h_q (mod p_q) for 1 ≤ q ≤ d and zero otherwise, hence (4.11). Similarly, for the shifted samples we have

\[ \widehat{S_{\mathbf{p},\varepsilon_q}}[\mathbf{h}] = \sum_{\mathbf{j}=0}^{\mathbf{p}-1} S_{\mathbf{p},\varepsilon_q}[\mathbf{j}]\, e^{-2\pi i \mathbf{h}\cdot(\mathbf{j}/\mathbf{p})} = \sum_{\ell=1}^{k} a_\ell\, e^{2\pi i \omega_{\ell,q}\varepsilon_q} \sum_{\mathbf{j}=0}^{\mathbf{p}-1} e^{2\pi i (\boldsymbol\omega_\ell - \mathbf{h})\cdot(\mathbf{j}/\mathbf{p})} = p_1 \cdots p_d \sum_{\boldsymbol\omega_\ell \equiv \mathbf{h}\ (\mathrm{mod}\ \mathbf{p})} a_\ell\, e^{2\pi i \omega_{\ell,q}\varepsilon_q}. \qquad (4.14) \]

Assuming for the moment that the residues (ω_ℓ mod p)_{ℓ=1}^k are distinct (as d-vectors), we then have for the unshifted samples

\[ \widehat{S_{\mathbf{p}}}[\mathbf{h}] = \begin{cases} p_1 \cdots p_d\, a_\ell & \text{if } \boldsymbol\omega_\ell \equiv \mathbf{h}\ (\mathrm{mod}\ \mathbf{p}), \\ 0 & \text{otherwise}, \end{cases} \qquad (4.15) \]

while for the shifted samples we have, for 1 ≤ q ≤ d,

\[ \widehat{S_{\mathbf{p},\varepsilon_q}}[\mathbf{h}] = \begin{cases} p_1 \cdots p_d\, a_\ell\, e^{2\pi i \varepsilon_q \omega_{\ell,q}} & \text{if } \boldsymbol\omega_\ell \equiv \mathbf{h}\ (\mathrm{mod}\ \mathbf{p}), \\ 0 & \text{otherwise}. \end{cases} \qquad (4.16) \]

As in the one-dimensional case, if we take ε_q ≤ 1/N_q, we can recover ω_{ℓ,q} by noting that, for h ≡ ω_ℓ (mod p),

\[ \omega_{\ell,q} = \frac{1}{2\pi\varepsilon_q} \operatorname{Arg}\!\left( \frac{\widehat{S_{\mathbf{p},\varepsilon_q}}[\mathbf{h}]}{\widehat{S_{\mathbf{p}}}[\mathbf{h}]} \right). \qquad (4.17) \]
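A minimal illustration of the per-dimension recovery formula (4.17): for an unaliased bin, each shifted DFT value differs from the unshifted one only by the phase 2πε_q ω_{ℓ,q}, so the common factor p1 · · · pd cancels in the ratio. The C++ sketch below fabricates the two DFT values for a single frequency directly from (4.15)–(4.16) rather than running FFTs; the dimension, bandwidths, frequency, and coefficient are arbitrary test values.

```cpp
#include <cmath>
#include <complex>
#include <cstdio>

int main() {
    const double PI = std::acos(-1.0);
    const int d = 2;
    const long long Nq[2] = {2048, 2048};
    const long long omega[2] = {-731, 402};    // true frequency vector in Q
    const std::complex<double> a(0.6, -0.8);   // its Fourier coefficient

    // DFT values at the bin h = omega (mod p), assuming no aliasing; the factor
    // p1*...*pd from (4.15)-(4.16) cancels in the ratio, so we drop it.
    const std::complex<double> unshifted = a;
    for (int q = 0; q < d; ++q) {
        double eps = 1.0 / Nq[q];              // eps_q <= 1/N_q
        std::complex<double> shifted =
            a * std::exp(std::complex<double>(0.0, 2 * PI * eps * omega[q]));
        // Recovery formula (4.17), rounded to the nearest integer frequency.
        long long w = std::llround(std::arg(shifted / unshifted) / (2 * PI * eps));
        std::printf("omega_%d recovered as %lld (true %lld)\n", q + 1, w, omega[q]);
    }
}
```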
4.3.2 Algorithm

Equation (4.17) allows us to quickly extract the significant frequency values in the multidimensional DFT of an input signal S(x), provided that no aliasing occurs in a given (d-dimensional) frequency bin h. Recall that in the one-dimensional case, we tested whether or not a collision had occurred in a frequency bin by comparing the magnitudes of the shifted and unshifted coefficients for that bin. Moreover, we took the sample lengths p_j to be prime numbers greater than a constant multiple of k (or k* in the adaptive case). In section 2.4 we proved that with this choice of p_j, the expected number of collisions among k frequencies is a constant fraction of k, so that the computational cost of each iteration of Algorithm 1 decreases geometrically in the iteration number. This in turn implied that the total computational cost is dominated by that of the first iteration, which gave us sampling and runtime complexities of Θ(k) and Θ(k log k), respectively.

In multiple dimensions, collisions among frequencies can occur in any of the d dimensions. A naive approach is then to only calculate frequency values via (4.17) for those frequency bins h for which the ratio of shifted to unshifted coefficients is sufficiently close to unity in all dimensions. That is, for given tolerances τ_q, we require all of the inequalities

\[ \left|\; \left| \frac{\widehat{S_{\mathbf{p},\varepsilon_q}}[\mathbf{h}]}{\widehat{S_{\mathbf{p}}}[\mathbf{h}]} \right| - 1 \;\right| \le \tau_q, \qquad 1 \le q \le d, \qquad (4.18) \]

to hold in order to accept the frequency bin h. For the simulations described in section 4.5 we took τ_q = p1 · · · pd / (N1 · · · Nd). In order to apply the average-case analysis of section 2.4 we therefore require the sampling rate in each dimension to be at least a constant multiple of the sparsity; that is, we take p_{j,q} ≥ ck (or ck*) for 1 ≤ q ≤ d.

As discussed in section 2.4.1, the computational costs of our algorithm are dominated by the FFTs and sorts of the sample arrays S_p and S_{p,εq}. The sampling complexity of this naive algorithm is then Θ(k^d), since we require that many signal values on the grid defined by the cartesian product of the p_{j,q}. As the d-dimensional FFT of an array of size Θ(k) × · · · × Θ(k) has time complexity Θ(k^d log k^d), this naive algorithm will execute in time Θ(k^d log k^d) as well.

Remark 4.3.1. The empirical evaluation in section 4.5 indicates that this choice of p_{j,q} values is overly cautious, and that a coarser sampling of the input may suffice for energetic frequency identification. Specifically, it was observed that the naive algorithm described above almost never detected an aliased frequency bin, so that on average only one set of sample rates p was used. The explanation for this observation is that by choosing the sample rates p_{j,q} ≥ ck, we are effectively hashing the k frequencies into O(k^d) d-dimensional bins, since the discrete grid defined by the cartesian product of the p_{j,q} has (ck)^d nodes. For d > 1 we should therefore expect many fewer multiple-occupancy d-dimensional bins than in the d = 1 case.

We therefore tested a variant of the naive algorithm in which we take p_{j,q} ≥ (ck)^{1/d} (or (ck*)^{1/d}) for q = 1, ..., d. In this manner the total number of d-dimensional bins is ck (or ck*), so the sampling and runtime complexities of one iteration of the inner loop are Θ(k) and Θ(k log k), respectively. On average a constant fraction of the frequencies end up in single-occupancy d-dimensional frequency bins, which by (4.14) is sufficient for calculating the frequency index. Since we test for aliasing in every dimension, we would expect this variant of the naive algorithm to reject more frequencies than our one-dimensional algorithm. The results of the empirical evaluation support this observation.

4.4 Signal unwrapping and the Chinese Remainder Theorem

In this section we provide an alternative to the naive algorithm described above that avoids the exponential scaling of the sampling and runtime complexities in the dimension d. We achieve this by using a clever number-theoretic technique to "unwrap" the d-dimensional input signal S(x) into a related one-dimensional signal S^u(x). We then apply the one-dimensional algorithm of chapter 2 and transform back to d-dimensional indices. As we will show in section 4.4.1, this dimension-reduction technique follows from the Chinese Remainder Theorem (Theorem 2.2.5), which we used previously for worst-case bounds on the number of p_j used by the one-dimensional algorithm.

4.4.1 Preliminaries

In this section we provide the details of the dimension-reducing unwrapping technique, based on ideas presented in [29] and [7]. Let S(x) be a signal of the form (4.4), and let Q = Q(N1, ..., Nd) be as in (4.2), so that supp Ŝ ⊂ Q. Next, choose d pairwise relatively prime integers Ñ_q, 1 ≤ q ≤ d, such that

\[ \widetilde{N}_q \ge d N_q, \qquad (4.19) \]

and set

\[ \widetilde{N} = \prod_{q=1}^{d} \widetilde{N}_q. \qquad (4.20) \]

Finally, let y^{−1} (mod Ñ_q) denote the multiplicative inverse of y in Z_{Ñ_q}, so that y · y^{−1} (mod Ñ_q) ≡ 1 (mod Ñ_q). Note that y^{−1} (mod Ñ_q) exists, and is unique, whenever y is relatively prime to Ñ_q. Define the function ι : Q(N1, ..., Nd) → Z_{Ñ} by

\[ \iota(\mathbf{x}) = \left( \sum_{q=1}^{d} \frac{\widetilde{N}}{\widetilde{N}_q}\, x_q \right) \ (\mathrm{mod}\ \widetilde{N}). \qquad (4.21) \]

Since the Ñ_q are coprime, the Chinese Remainder Theorem implies that ι is a bijection, whose inverse x = ι^{−1}(x̃) has components

\[ x_q = \tilde{x} \cdot \left( \frac{\widetilde{N}}{\widetilde{N}_q} \right)^{-1} \ (\mathrm{mod}\ \widetilde{N}_q). \qquad (4.22) \]

Given these parameters, we can define a one-dimensional function S^u : [0, 1] → C (the "unwrapped" version of S) by

\[ S^u(x) = S\!\left( \frac{\widetilde{N}}{\widetilde{N}_1} x, \ldots, \frac{\widetilde{N}}{\widetilde{N}_d} x \right). \qquad (4.23) \]

For notational convenience in (4.24) below, let x denote the vector argument of S in (4.23).
If S is of the form (4.4), we then have

\[ \widehat{S^u}(\omega) = \int_0^1 S^u(x)\, e^{-2\pi i \omega x}\, dx = \sum_{j=1}^{k} \widehat{S}(\boldsymbol\omega_j) \int_0^1 e^{2\pi i \boldsymbol\omega_j\cdot\mathbf{x}}\, e^{-2\pi i \omega x}\, dx = \sum_{j=1}^{k} \widehat{S}(\boldsymbol\omega_j) \int_0^1 \exp\!\left( -2\pi i x \left( \omega - \sum_{q=1}^{d} \frac{\widetilde{N}}{\widetilde{N}_q}\,\omega_{j,q} \right) \right) dx. \qquad (4.24) \]

From (4.24) we see that Ŝ^u is non-zero only at the frequencies

\[ \tilde\omega_j = \sum_{q=1}^{d} \frac{\widetilde{N}}{\widetilde{N}_q}\, \omega_{j,q} = \iota(\boldsymbol\omega_j), \qquad (4.25) \]

and moreover we have Ŝ^u(ω̃) = Ŝ(ι^{−1}(ω̃)). Finally, we note that if Ŝ(ω) is band-limited to Q, then Ŝ^u(ω) is band-limited to [−Ñ/2, Ñ/2). To see this, note that (4.25) implies

\[ |\tilde\omega| \le \sum_{q=1}^{d} \frac{\widetilde{N}}{\widetilde{N}_q}\, |\omega_q| \le \sum_{q=1}^{d} \frac{\widetilde{N}}{\widetilde{N}_q}\, \frac{N_q}{2} \le \sum_{q=1}^{d} \frac{\widetilde{N}}{2d} = \frac{\widetilde{N}}{2}, \qquad (4.26) \]

where we used the band-limitedness of ω_q to [−N_q/2, N_q/2) in the second inequality and the choice of Ñ_q from (4.19) in the third. Thus S^u(x) is frequency-sparse and band-limited whenever S(x) is, so we may apply all the results of chapter 2 to obtain a multi-dimensional algorithm.

4.4.2 Algorithm

Using the bijective unwrapping ι from (4.21), we have a simple prescription for a multidimensional algorithm. Given Q we first precompute Ñ_q, Ñ, and (Ñ/Ñ_q)^{−1} (mod Ñ_q) for 1 ≤ q ≤ d using the extended Euclidean algorithm [13]. This allows us to evaluate S^u at any desired point x using (4.23). We then use the one-dimensional algorithm of chapter 2 to approximate Ŝ^u, and finally transform back to d dimensions using ι^{−1}. The sampling and runtime complexities of this "unwrapped" algorithm are Θ(k) and Θ(k log k), respectively.

Remark 4.4.1. Since this "unwrapped" algorithm merely invokes the one-dimensional algorithm of chapter 2, we would expect it to perform similarly to that algorithm in practice. While it does seem to attain the stated sampling complexity, the results of the empirical evaluation in section 4.5 imply that it performs most similarly to the variation on the naive algorithm mentioned in remark 4.3.1. This is somewhat surprising, and we leave the relationship between the two algorithms as future work.
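The unwrapping map (4.21) and its inverse (4.22) are a few lines of modular arithmetic. The C++ sketch below performs a round trip on a single frequency vector for d = 2; the moduli 4099 and 4097 are our own hypothetical choices satisfying Ñ_q ≥ dN_q for N_q = 2048, and the re-centering at the end maps each residue back into the signed range [−Ñ_q/2, Ñ_q/2).

```cpp
#include <cstdio>

// Extended Euclid: returns gcd(a, b) and x, y with a*x + b*y = gcd.
long long ext_gcd(long long a, long long b, long long &x, long long &y) {
    if (b == 0) { x = 1; y = 0; return a; }
    long long x1, y1, g = ext_gcd(b, a % b, x1, y1);
    x = y1; y = x1 - (a / b) * y1;
    return g;
}

long long mod(long long a, long long m) { return ((a % m) + m) % m; }

long long inv_mod(long long a, long long m) {  // a^{-1} mod m, assuming gcd(a, m) = 1
    long long x, y;
    ext_gcd(mod(a, m), m, x, y);
    return mod(x, m);
}

int main() {
    const int d = 2;
    const long long Nt[2] = {4099, 4097};      // co-prime, Nt_q >= d*N_q for N_q = 2048
    const long long Ntot = Nt[0] * Nt[1];      // unwrapped bandwidth

    const long long omega[2] = {-731, 402};    // frequency vector in Q

    // Unwrap (4.21): u = sum_q (Ntot/Nt_q) * omega_q  (mod Ntot).
    long long u = 0;
    for (int q = 0; q < d; ++q)
        u = mod(u + (Ntot / Nt[q]) * mod(omega[q], Nt[q]), Ntot);

    // Wrap back (4.22): omega_q = u * (Ntot/Nt_q)^{-1}  (mod Nt_q), then re-center.
    for (int q = 0; q < d; ++q) {
        long long r = mod(u * inv_mod(Ntot / Nt[q], Nt[q]), Nt[q]);
        if (r >= (Nt[q] + 1) / 2) r -= Nt[q];
        std::printf("omega_%d = %lld (true %lld)\n", q + 1, r, omega[q]);
    }
}
```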
4.5 Empirical evaluation

In this section we describe the results of an empirical evaluation in two dimensions of the three multi-dimensional algorithms described in sections 4.3 and 4.4. We only tested the adaptive versions of these algorithms, where the sample lengths are chosen with respect to k*, the number of frequencies left in the residual. As in chapter 2, all three variants were implemented in C++ using FFTW 3.0 [19] for the FFTs, and all code was compiled with the Intel compiler using the -fast optimization. The experiments were run on the same machine used in section 2.5. In the figures below we refer to the three algorithms as "Naive" (section 4.3), "Naive-Sqrt" (section 4.3, remark 4.3.1), and "Unwrapped" (section 4.4).

4.5.1 Setup

For all of the experiments below, we fixed the bandwidth in both dimensions to be 2^11, so in the notation of section 4.2 we have Q = [−1024, 1023) × [−1024, 1023). This value was chosen to facilitate comparison with the one-dimensional results in section 2.5, where the (one-dimensional) bandwidth was taken to be 2^22. We varied the sparsity k from 2 to 512 by powers of two. All of the experiments were performed in the absence of noise, and we fixed the parameter c1 = 5. For the Naive algorithm and its variant Naive-Sqrt we took the shifts εq = 1/2Nq, while for the Unwrapped version we took ε = 1/2Ñ. The test signals were of the form (4.4), and were generated by drawing frequencies ωj uniformly from Q and coefficients aj uniformly from the complex unit circle.

Each data point in figures 4.1 and 4.2 represents the average over 100 independent trials, and the upper and lower bars in figure 4.1 represent the maximum and minimum values over all 100 trials. All variants of the algorithms recovered each test signal exactly (to within 10^{−12} in the ℓ2 norm). Finally, we also compared our algorithms with the two-dimensional FFTW routine using FFTW_MEASURE plans.

4.5.2 Runtime and sampling complexity

In figure 4.1(a) we compare the average number of samples of the input S(x) required by each algorithm when the bandwidth Q is fixed. We can see that the Naive algorithm scales approximately like Θ(k²), while the Naive-Sqrt and Unwrapped algorithms scale approximately like Θ(k), as expected. Note that for k ≳ 256, the Naive algorithm samples the input above the Nyquist rate 1/(N1 N2), while for all values of k tested the Naive-Sqrt and Unwrapped algorithms take orders of magnitude fewer samples than either FFTW or the Naive algorithm.

In figure 4.1(b) we compare the average runtime (in CPU ticks) of each algorithm when the bandwidth Q is fixed. The Naive algorithm is faster than FFTW for k ≤ 64, while the Naive-Sqrt and Unwrapped algorithms are faster for k ≤ 256. We can also see that the Naive algorithm scales approximately like Θ(k² log k), and while the Naive-Sqrt and Unwrapped algorithms are generally two orders of magnitude faster, they also appear to scale at the same rate. This is in contrast to the expected Θ(k log k) runtime complexity. A potential explanation for this observation is that, as in the one-dimensional case, we evaluated the intermediate representations by simply looping over the sample points and the terms in the representation. As mentioned in section 2.4.1, this has time complexity Θ(k²), while non-uniform FFTs can be used to reduce this to Θ(k log k). Whereas in the one-dimensional case this did not affect the overall runtime, it appears that in two dimensions it represents a significant fraction of the runtime. We leave the replacement of the direct evaluation with non-uniform FFTs in both the one- and multi-dimensional algorithms in this dissertation as future work.

4.5.3 Sample lengths and aliased frequencies

In figure 4.2(a) we compare the average number of sets of sample lengths p used by each algorithm in the fixed-bandwidth case. We include the one-dimensional algorithm with N = 2^22 for comparison. We can see that the Naive algorithm almost always uses only one set of sample lengths, which implies that the choice of p_{ℓ,q} ≥ ck* for 1 ≤ q ≤ d is overly cautious. When we reduce the number of frequency bins by only requiring p_{ℓ,q} ≥ √(ck*), the Naive-Sqrt algorithm uses many more sample-length sets. Moreover, we see that the Unwrapped algorithm uses approximately the same number of sample lengths as the Naive-Sqrt algorithm, which is unexpected. What is more, both the Naive-Sqrt and Unwrapped algorithms use many more sample lengths than the one-dimensional algorithm on an equivalent bandwidth.

In figure 4.2(b) we compare the average fraction of aliased frequencies encountered by each algorithm over its execution. This quantity was calculated simply by incrementing a counter whenever the aliasing test (4.18) (or its equivalent in the Unwrapped and one-dimensional cases) failed, and then dividing the total number of failures by the sparsity k. Note that by the Markov chain analysis of the one-dimensional algorithm in section 2.4.3 this quantity should be independent of k, and this is indeed observed for the one-dimensional algorithm.
As mentioned in remark 4.3.1, we can see that the Naive algorithm almost never encounters an aliased frequency, while the Naive-Sqrt and Unwrapped algorithms reject many more frequencies than in the equivalent one-dimensional case.

[Figure 4.1: (a) 2D sampling complexity with fixed bandwidth N1 = N2 = 2^11 for Naive (blue solid line), Naive-Sqrt (black dashed line), Unwrapped (magenta dashed line), and FFTW (red dashed line). (b) 2D runtime complexity with fixed bandwidth N1 = N2 = 2^11 for the same algorithms.]

[Figure 4.2: (a) Average number of sample lengths used in 2D with fixed bandwidth N1 = N2 = 2^11 for Naive (blue solid line), Naive-Sqrt (black dashed line), and Unwrapped (magenta dashed line); also shown for comparison is 1D PS-Det (red solid line) with fixed bandwidth N = 2^22. (b) Average fraction of aliased frequencies in 2D with fixed bandwidth N1 = N2 = 2^11 for the same algorithms, with 1D PS-Det again shown for comparison.]

A possible explanation for the increased number of sample-length sets p and the increased fraction of aliased frequencies observed in the Naive-Sqrt algorithm is the test used for aliasing in a given frequency bin. In the one-dimensional case, we tested whether the ratio between the magnitudes of the shifted and unshifted peaks was sufficiently close to unity, whereas in the multi-dimensional case we require that a frequency bin pass this test in all dimensions. The requirements for accepting a frequency bin are therefore more stringent in higher dimensions, and we would expect a commensurate increase in the number of rejected frequencies. As shown in figure 4.2, this is indeed observed in two dimensions for the Naive-Sqrt algorithm.

The effect of this increase in the number of rejected frequencies can also be seen in the runtime of the two-dimensional Naive-Sqrt algorithm. As more frequencies are rejected, more sample lengths p are used to identify the remaining frequencies. Recall that in the adaptive algorithm (which is what we implemented), whenever a new sample rate is used we subtract off the contribution of the already-discovered Fourier terms. Since our implementation currently performs this operation directly, with time complexity Θ(k²), the increased number of rejected frequencies leads to the overall Θ(k²) time complexity observed in figure 4.1(b).

Finally, we note that the preceding discussion only seems to apply to the Naive-Sqrt algorithm, and not the Unwrapped version. The aliasing test in the Unwrapped algorithm is identical to the one-dimensional version, so we would not expect an increase in the number of rejected frequencies. The fact that the Unwrapped algorithm has near-identical empirical performance to the Naive-Sqrt algorithm remains somewhat mysterious, and we leave the elucidation of the relationship between the two as future work.
77 4.6 Conclusion In this chapter we gave three different extensions of our basic algorithm to the multidimensional setting. Two of these rely on the separability properties of the exponential function, while the third relies on a clever number-theoretic construction using the Chinese Remainder Theorem. The resulting algorithms have runtime and sampling requirements that are less stringent than the d-dimensional FFT for sparsity levels up to k = 256. We remark again that the implementation of non-uniform FFTs in place of direct evaluation of intermediate representations and the explanation of the similarity in the observed behavior of the Naive-Sqrt and Unwrapped algorithms are left as possible directions for future research. 78 BIBLIOGRAPHY 79 BIBLIOGRAPHY [1] M. Ajtai, H. Iwaniec, J. Koml´s, J. Pintz, and E. Szemer´di, Construction of a thin set o e with small Fourier coefficients, Bull. London Math. Soc. 22 (1990), no. 6, 583–590. [2] A. Akavia, Deterministic Sparse Fourier Approximation via Fooling Arithmetic Progressions, Conference on Learning Theory (CoLT), 2010. [3] A. Akavia, S. Goldwasser, and S. Safra, Proving hard-core predicates using list decoding, Annual Symposium on Foundations of Computer Science, vol. 44, 2003, pp. 146–159. [4] C. Anderson and M. D. Dahleh, Rapid computation of the discrete Fourier transform, SIAM J. Sci. Comput. 17 (1996), no. 4, 913–919. [5] D. Boneh, Finding smooth integers in short intervals using crt decoding, Journal of Computer and System Sciences 64 (2002), no. 4, 768–784. [6] W. L. Briggs and V. E. Henson, The DFT: an owner’s manual for the discrete Fourier transform, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1995. [7] C. Burrus, Index mappings for multidimensional formulation of the DFT and convolution, IEEE Transactions on Acoustics, Speech and Signal Processing 25 (1977), no. 3, 239–242. [8] E. J. Cand`s, J. K. Romberg, and T. Tao, Robust uncertainty principles: exact sige nal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory 52 (2006), no. 2, 489–509. [9] I. Carron, Nuit blanche, http://nuit-blanche.blogspot.com. [10] S.S. Chen, D.L. Donoho, and M.A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1998), no. 1, 33–61. [11] A. Cohen, W. Dahmen, and R. DeVore, Compressed sensing and best k-term approximation, J. AMS 22 (2009), no. 1, 211–231. 80 [12] J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series, Math. Comp. 19 (1965), 297–301. [13] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms, second ed., MIT Press, Cambridge, MA, 2001. [14] L. Demanet and G. Peyr´, e Compressive wave computation, Submitted to Found. Comput. Math. (2008), Preprint available at http://math.mit.edu/ laurent/html/publications.html. [15] J. Dongarra and F. Sullivan, Guest editors’ introduction: The top 10 algorithms, Computing in Science and Engineering (2000), 22–23. [16] D. L. Donoho and P. B. Stark, Uncertainty principles and signal recovery, SIAM J. Appl. Math. 49 (1989), no. 3, 906–931. [17] D. P. Dubhashi and A. Panconesi, Concentration of measure for the analysis of randomized algorithms, Cambridge University Press, Cambridge, 2009. [18] A. Dutt and V. Rokhlin, Fast Fourier transforms for nonequispaced data, SIAM J. Sci. Comput. 14 (1993), no. 6, 1368–1393. [19] M. Frigo and S. G. Johnson, The design and implementation of FFTW3, Proceedings of the IEEE 93 (2005), no. 
2, 216–231, Special issue on “Program Generation, Optimization, and Platform Adaptation”. [20] A. Gilbert, S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss, Near-optimal sparse Fourier representations via sampling, Symposium on Theory of Computing, 2002, pp. 152–161. [21] A. Gilbert, S. Muthukrishnan, and M. Strauss, Improved time bounds for near-optimal sparse Fourier representations, SPIE Wavelets XI, 2005. [22] A. Gilbert, M. Strauss, and J. Tropp, A tutorial on fast Fourier sampling, IEEE Signal Processing Magazine 25 (2008), no. 2, 57–66. [23] O. Goldreich, D. Ron, and M. Sudan, Chinese remaindering with errors, IEEE Transactions on Information Theory 46 (2000), no. 4, 1330–1338. [24] G. Golub and C. Van Loan, Matrix computations, third ed., Johns Hopkins Studies in the Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, 1996. 81 [25] H. Hassanieh, P. Indyk, D. Katabi, and E. Price, Nearly optimal sparse fourier transform, to appear in ACM Symposium on Theory of Computing (STOC), 2012. [26] , Simple and practical algorithms for sparse Fourier transform, to appear in ACM-SIAM Symposium on Discrete Algorithms (SODA), 2012. [27] M. Iwen, A deterministic sub-linear time sparse Fourier algorithm via non-adaptive compressed sensing methods, Proceedings of the nineteenth annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008, pp. 20–29. [28] , Combinatorial sublinear-time Fourier algorithms, Found. Comput. Math. 10 (2010), no. 3, 303–338. [29] , Improved approximation guarantees for sublinear-time Fourier algorithms, Submitted (2011), Available at http://www.math.duke.edu/ markiwen/. [30] M. Iwen, A. Gilbert, and M. Strauss, Empirical evaluation of a sub-linear time sparse DFT algorithm, Commun. Math. Sci. 5 (2007), no. 4, 981–998. [31] R. M. Karp, Probabilistic recurrence relations, J. Assoc. Comput. Mach. 41 (1994), no. 6, 1136–1150. [32] N. Katz, An estimate for character sums, J. Amer. Math. Soc. 2 (1989), no. 2, 197–200. [33] E. Kushilevitz and Y. Mansour, Learning decision trees using the Fourier spectrum, SIAM J. Comput. 22 (1993), no. 6, 1331–1348. [34] H. W. Lenstra, Jr., Integer programming with a fixed number of variables, Math. Oper. Res. 8 (1983), no. 4, 538–548. [35] T. Lin and F.J. Herrmann, Compressed wavefield extrapolation, Geophysics 72 (2007), no. 5, SM77–SM93. [36] N. Linial, Y. Mansour, and N. Nisan, Constant depth circuits, Fourier transform, and learnability, J. Assoc. Comput. Mach. 40 (1993), no. 3, 607–620. [37] Y. Mansour, Randomized interpolation and approximation of sparse polynomials, SIAM Journal on Computing 24 (1995), no. 2, 357–368. 82 [38] I. Niven, H.S. Zuckerman, and H.L. Montgomery, An introduction to the theory of numbers, fifth ed., John Wiley & Sons Inc., New York, 1991. [39] A.M. Pinkus, On L1 -approximation, Cambridge Tracts in Mathematics, vol. 93, Cambridge University Press, Cambridge, 1989. [40] Y. Rubner, C. Tomasi, and L.J. Guibas, The earth mover’s distance as a metric for image retrieval, International Journal of Computer Vision 40 (2000), no. 2, 99–121. [41] L.I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena 60 (1992), no. 1-4, 259–268. [42] F. Santosa and W.W. Symes, Linear inversion of band-limited reflection seismograms, SIAM J. Sci. Statist. Comput. 7 (1986), no. 4, 1307–1330. [43] A. 
Schrijver, Theory of linear and integer programming, Wiley-Interscience Series in Discrete Mathematics, John Wiley & Sons Ltd., Chichester, 1986. [44] I.E. Shparlinski and R. Steinfeld, Noisy chinese remaindering in the Lee norm, Journal of Complexity 20 (2004), no. 2-3, 423–437. [45] T. Tao and V. Vu, Additive combinatorics, Cambridge Studies in Advanced Mathematics, vol. 105, Cambridge University Press, Cambridge, 2006. [46] C. Van Loan, Computational frameworks for the fast Fourier transform, Frontiers in Applied Mathematics, vol. 10, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1992. [47] Y. Wang and G. Zhou, On the use of high-order ambiguity function for multi-component polynomial phase signals, Signal Processing 65 (1998), no. 2, 283–296. 83